Q&A: Why Reformatting a USB Key can Change its Size & Performance

Shortly after I posted my review of the Sandisk Cruzer Facet USB key, a regular reader e-mailed me, somewhat concerned, to say that reformatting the device causes it to lose about 12-13MB of storage, and that some of his units performed slower than others. While I wouldn’t rule out manufacturing variation as a cause of performance differences, storage devices present themselves as block devices with a fixed number of sectors, and should remain the same physical size regardless of the format.

Let’s just say I decided to check whether it was true … and indeed, the drive is bigger before reformatting than after formatting with Windows 7, by the amount he had claimed:

[Images: drive size as supplied; drive size after reformatting]

This is not the first time a reader has asked me why USB keys and memory cards can seemingly have different sizes depending on how they are formatted. Other readers have previously asked me whether it is “safe” to format a card a certain way, or whether it is best to format the card in camera or with a certain tool.

As it turns out, the answer to this question is a little more complicated than it appears, and explaining it will take quite a while.

Types of Format

When we talk about flash memory such as a USB key or a memory card, sometimes it’s hard to understand what sort of formatting is actually happening. Flash memory has fixed-geometry cells determined at manufacturing, and the number of device sectors is fixed in the manufacturing process. So what actually happens when you format one of these devices?

In essence, formatting lays down the high level data structures which describe how the data is stored. Two major types of formats are used.

The first type is known as VFAT or superfloppy, and originates from how MS-DOS handled floppy disks and, later on, superfloppies and MO disks. On those types of disks, storage was at a premium and partitioning was not useful. Instead of losing storage to a master boot record and partition table, VFAT-style devices start with the partition data itself – namely the boot sector and file system. This format is the default for Windows, but it did not have very good compatibility with BIOSes for booting, hence the need for third-party formatting tools such as the HP USB Disk Storage Formatter to generate the second style of format.

The second type is a full-blown MBR and partition table, as you would have on a regular hard drive. This features maximum flexibility of being able to have multiple partitions (although Windows will only mount the first one), and better boot compatibility with BIOSes. It also can offer better alignment opportunities, by virtue of having the partition “start” at a certain sector. However, this format does use up some additional space – typically 16KiB at the least, and often (depending on the tool) 1MiB, 4MiB or even 16MiB. More modern versions of Windows (7 and above) will format USB devices with this type of format if that was the format on the device already.

As a result, having just a different format of the drive can result in 16kiB to a few megabytes of difference in drive size.

In the case of the Sandisk Cruzer Facet, it was supplied with a full MBR format, and retained this format post-formatting with Windows 7.

[Images: Before Format; After Format]

[Images: partition table Before Format; partition table After Format]

A check of the partition table values both pre- and post-formatting shows that they are essentially identical. Thus the difference in capacity is not due to the type of formatting – although should the VFAT format be used, it is likely an extra 16kiB would be had.

The Role of the FAT32 Filesystem

After the partitions, or lack thereof, have been established, the next major data structure is the filesystem itself. For the purposes of this post, we will talk about FAT32 as this is commonly used on flash storage devices of 4GiB to 32GiB.

When we think of a filesystem, we think of its role in terms of providing an index to where files are stored and the related metadata for the file. The FAT32 system is fairly old and well established, but it too has a few surprises up its sleeve when it comes to formatting.

When formatting FAT32 filesystems, several parameters can be altered depending on what you use to format the device. Parameters of relevance include the sectors per cluster (otherwise known as cluster size) and reserved sectors.

The sectors per cluster, or cluster size, basically determines the size of the minimum logical storage unit on the device. Every file occupies at least one cluster, and can only occupy an integer number of clusters. A side effect is that the cluster size also determines the size of the FAT itself (the index to the clusters). With smaller cluster sizes, there are more clusters that need to be indexed, and thus the FAT is larger. The converse is also true.

Indeed, with the Sandisk Cruzer Facet, this is where we get our first lead: the chkdsk output for a fresh drive and a reformatted drive shows that the stock format has 16kiB clusters and the reformatted drive has 4kiB clusters by default (see bytes in each allocation unit).

[Image: chkdsk output showing the cluster size difference]

Of course, life would be simple if reformatting with the same cluster size resulted in the same size … but alas … it doesn’t! Note the number of allocation units on disk – there is a 17 unit difference, or 278,528 bytes. We are much closer to the original capacity though.

[Image: chkdsk output after reformatting with the same cluster size]

This clearly demands a closer look at the actual result of the format – this is where looking at the boot sector with its BIOS Parameter Block and FAT32 information is important.

[Images: boot sector As supplied; After reformatting (4kiB); After reformatting (16kiB)]

The key point to note is that after reformatting, even with the same cluster size as the original drive, the reserved sector number is 562 instead of 18. This means that 544 extra sectors are lost in the reformatting process merely because the formatting program decided to reserve them. The mathematics show that 544 sectors * 512 bytes per sector = 278,528 bytes. This is exactly the size discrepancy noted earlier. The drive itself remains the same physical size throughout all of this.
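For readers who want to pull these values out of a boot sector themselves, here is a minimal Python sketch. The field offsets follow the standard FAT32 BIOS Parameter Block layout; the function names and the synthetic boot sectors are my own illustration (the FAT size of 1000 sectors in particular is a placeholder, not a value read from the drive), so treat this as a sketch rather than the tool used for the screenshots.

```python
import struct

def parse_fat32_bpb(boot_sector: bytes) -> dict:
    """Pull the capacity-related fields out of a FAT32 boot sector."""
    if len(boot_sector) < 40:
        raise ValueError("need at least the first 40 bytes of the boot sector")
    # Offset 11: bytes/sector (u16), sectors/cluster (u8),
    # reserved sectors (u16), number of FATs (u8).
    bytes_per_sector, sectors_per_cluster, reserved, num_fats = struct.unpack_from(
        "<HBHB", boot_sector, 11)
    # Offset 28: hidden sectors, total sectors, sectors per FAT (all u32).
    hidden, total_sectors, fat_sectors = struct.unpack_from("<III", boot_sector, 28)
    return {
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "reserved_sectors": reserved,
        "num_fats": num_fats,
        "hidden_sectors": hidden,
        "total_sectors": total_sectors,
        "fat_sectors": fat_sectors,
        # Relative to the start of the volume, not the start of the device.
        "first_data_sector": reserved + num_fats * fat_sectors,
    }

def synthetic_boot_sector(reserved: int, fat_sectors: int = 1000) -> bytes:
    """Build a fake boot sector for demonstration (placeholder values)."""
    sector = bytearray(512)
    struct.pack_into("<HBHB", sector, 11, 512, 32, reserved, 2)
    struct.pack_into("<III", sector, 28, 0, 0, fat_sectors)
    return bytes(sector)

supplied = parse_fat32_bpb(synthetic_boot_sector(reserved=18))
reformatted = parse_fat32_bpb(synthetic_boot_sector(reserved=562))
extra = reformatted["reserved_sectors"] - supplied["reserved_sectors"]
print(extra, extra * supplied["bytes_per_sector"])  # 544 278528
```

On a real drive, the 512 bytes would come from reading the first sector of the volume instead of `synthetic_boot_sector`, which normally needs administrator rights.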

So why does Windows decide to reserve these extra sectors? I’m not entirely sure. It could be because other devices expect to have some reserved sectors to play with, as they may bury their encryption keys, license keys, advanced boot-loader programs or other state information in these areas as “regular users” aren’t aware of it and generally will not corrupt it unless repartitioning or reformatting. However, this is a loss of space that could otherwise possibly be used.

Choosing larger cluster sizes can increase the amount of space available, as it reduces the overhead of the FAT itself. Unfortunately, the trade-off is that drives which contain many files smaller than a cluster will see inefficient use of space, with storage lost to file slack. For example, if you copy a bunch of 17kiB documents to a drive with a 4kiB cluster size, each document will occupy just 5 clusters or 20kiB of storage. If the same drive was formatted with a 16kiB cluster size, each would occupy two clusters or 32kiB of storage. On a drive formatted with 64kiB clusters, each would use one cluster, or 64kiB of storage. As a result, even though you may see more available space, whether it will actually be useful depends on the mix of files that you store.
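The slack arithmetic above is easy to reproduce. This is a small Python sketch of it; `allocated_bytes` is just a name I picked, and it only models rounding a non-empty file up to a whole number of clusters, ignoring directory entries and the FAT itself.

```python
def allocated_bytes(file_size: int, cluster_size: int) -> int:
    """Space a file occupies once rounded up to whole clusters."""
    clusters = (file_size + cluster_size - 1) // cluster_size  # ceiling division
    return clusters * cluster_size

file_size = 17 * 1024  # the 17kiB document from the example
for cluster_kib in (4, 16, 64):
    used = allocated_bytes(file_size, cluster_kib * 1024)
    print(cluster_kib, used // 1024, (used - file_size) // 1024)
# 4 20 3
# 16 32 15
# 64 64 47
```

The last column is the slack in kiB per file, which is what adds up quickly when a drive holds thousands of small files.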

To see this effect in practice, I used mkfs.vfat under Linux to produce VFAT-formatted devices with no attention to alignment and the minimum number of reserved sectors (2). The command was of the format ‘mkfs.vfat -a -I -s <number of sectors per cluster> -R 2 /dev/sd*’.

[Images: results for cluster sizes of 512 bytes, 1024 bytes, 2048 bytes, 4096 bytes, 8192 bytes, 16384 bytes, 32768 bytes and 65536 bytes]

The capacity ranged from 7,879,611,392 bytes at 512 byte clusters, to 8,001,748,992 bytes for 64kiB clusters, a difference of 119,275kiB.
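The two extremes can be reproduced with a rough model of the FAT32 layout: reserved sectors, then two copies of the FAT at 4 bytes per cluster entry (plus two unused entries), then the data clusters. The Python sketch below is my own simplification and not how mkfs.vfat actually computes its layout. The device size of 15,630,336 sectors is also an assumption: it is not quoted anywhere above, it is simply the value that makes both reported capacities come out.

```python
SECTOR = 512           # bytes per sector
FAT_ENTRY_BYTES = 4    # FAT32 uses 32-bit entries
NUM_FATS = 2           # the usual two copies of the FAT
TOTAL_SECTORS = 15_630_336  # assumed device size, inferred from the results

def fat_sectors(clusters: int) -> int:
    """Sectors needed for one FAT covering `clusters` data clusters."""
    fat_bytes = (clusters + 2) * FAT_ENTRY_BYTES  # two entries are unused
    return (fat_bytes + SECTOR - 1) // SECTOR

def sectors_used(clusters: int, sectors_per_cluster: int, reserved: int) -> int:
    return reserved + NUM_FATS * fat_sectors(clusters) + clusters * sectors_per_cluster

def max_clusters(total_sectors: int, sectors_per_cluster: int, reserved: int) -> int:
    """Largest cluster count whose layout still fits (binary search)."""
    lo, hi = 0, total_sectors // sectors_per_cluster
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if sectors_used(mid, sectors_per_cluster, reserved) <= total_sectors:
            lo = mid
        else:
            hi = mid - 1
    return lo

for spc in (1, 128):  # 512 byte and 64kiB clusters
    clusters = max_clusters(TOTAL_SECTORS, spc, reserved=2)
    print(spc * SECTOR, clusters * spc * SECTOR)
# 512 7879611392
# 65536 8001748992
```

With 512 byte clusters the two FATs alone take roughly 240,000 sectors, while with 64kiB clusters they take under 2,000, which is where the 119,275kiB goes.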

As far as the mysterious size difference of the Sandisk Cruzer Facet goes, the mystery is considered solved as the difference is in the size of the FAT due to cluster size and the reserved sectors. Of course, there are still more considerations which deserve discussion.

Optimizing for Size

[Image: boot sector of the size-optimised mkfs.vfat format]

Knowing all of this, it is indeed possible to make a drive as big as possible within the physical parameters. The first step is to opt for a VFAT layout, and the second step is to opt for the largest cluster size possible. Add to this the minimum reserved sectors and zero hidden sectors, and you get something that has 8,001,748,992 bytes of storage, as in the above cluster size demonstration.

I personally wouldn’t recommend doing this, for many reasons that are covered in the next section, but it does show that it is indeed possible to use these format parameters and have the drive work successfully.

No data corruption was witnessed, and the drive did work, but performance was impacted. The reasoning behind this will be explained later.

[Images: benchmark results for the size-optimised format]

[Image: root directory area of the size-optimised format]

In the end, this results in a root directory (first storage cluster) beginning at sector 1910 (977,920 bytes).

The Art of Formatting, or why YOU should NOT format a device …

Flash devices are not as simple as meets the eye. In order to understand why you should not tinker with the formatting of a flash device, the article titled How to Damage a FLASH Storage Device from OLPC’s Wiki is mandatory reading.

For those who don’t want to read the article in full, the key points are as follows:

  • Flash devices have a limited endurance – ensuring that you avoid causing extraneous writes will allow these devices to live longer and perform better.
  • Flash devices have a fixed geometry. While they expose themselves as having 512 byte sectors like hard drives, the devices internally often have 2kiB, 4kiB, 8kiB, 16kiB or larger pages which are the minimum unit.
  • Writes typically involve multiple pages being erased and written together.
  • Manufacturers understand their devices and take pains to craft a geometry that best matches logical sector boundaries with the actual physical erase block boundaries to avoid unnecessary performance and endurance penalties from mis-aligned writes.

This explains why many partitioners and formatting utilities may reserve blocks which are not strictly needed – in order to achieve alignment. Unfortunately, their idea of alignment isn’t necessarily the right alignment for a given device. As knowledge about the internal layout of the device is limited, and the special behaviour of controllers in regards to physical to logical sector mapping is involved, achieving perfect results is hard.

As a result, it is often NOT recommended to reformat any flash device unless absolutely necessary, and if possible, to understand or record the original geometry before destroying or re-creating its format.

You may have heard of the alignment problem when 4kiB sectored “advanced format” hard drives first appeared on the market. This is, in essence, a very similar issue. I’ve produced a simplified image to show the importance of alignment:

alignment-importance

Here, in an oversimplified schematic, I show the underlying 512 byte sectors exposed by the device (bottom row). Above this is the actual page size, say 16384 bytes. If the drive was formatted with aligned 16kiB clusters, each cluster would nicely match a page at each boundary. If it was formatted with aligned 4kiB clusters, no cluster would cross a page boundary and this would still be an acceptable alignment. However, in many cases, the unaligned result at the top is what you see – where writing one cluster will result in two pages being reprogrammed at a minimum.
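The same point can be made with a few lines of Python that count how many pages a single cluster write lands on. This is only a toy model of the schematic: the 16384 byte page is the example figure from above, the 3 sector offset is made up for illustration, and real controllers remap blocks in ways this ignores.

```python
def pages_touched(start_byte: int, length: int, page_size: int) -> int:
    """Number of flash pages overlapped by a write of `length` bytes."""
    first_page = start_byte // page_size
    last_page = (start_byte + length - 1) // page_size
    return last_page - first_page + 1

PAGE = 16384
print(pages_touched(0, 16384, PAGE))        # aligned 16kiB cluster -> 1
print(pages_touched(4096, 4096, PAGE))      # aligned 4kiB cluster -> 1
print(pages_touched(3 * 512, 16384, PAGE))  # 16kiB cluster shifted by 3 sectors -> 2
```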

Actually achieving alignment is not easy, part of the problem being that even if you align the beginning of partitions, the actual clusters themselves are often unaligned because the FAT itself occupies a non-aligned number of sectors!

[Image: root directory location in the original format]

Looking at the original format, the first storage cluster is at sector 7648, which implies alignment to 16kiB. From there, each cluster is 16kiB, meaning the data is aligned to 16kiB boundaries. In the size-optimised format above, the first cluster is at sector 1910, which is not aligned in the same way. This causes performance penalties because it produces exactly the split-page issue illustrated in the crude diagram.
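A quick way to check what a starting sector is aligned to is to find the largest power of two that divides it. The Python sketch below does this for the two sector numbers just mentioned, assuming 512 byte sectors; the helper name is my own.

```python
def largest_alignment_bytes(sector: int, sector_size: int = 512) -> int:
    """Largest power-of-two byte boundary that `sector` sits on."""
    if sector <= 0:
        raise ValueError("sector must be positive")
    # sector & -sector isolates the lowest set bit, i.e. the largest
    # power of two that divides the sector number.
    return (sector & -sector) * sector_size

print(largest_alignment_bytes(7648))  # 16384 -> aligned to 16kiB
print(largest_alignment_bytes(1910))  # 1024  -> aligned to only 1kiB
```

So the size-optimised format is not even on a 4kiB boundary, let alone a 16kiB one.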

Even though we can observe an alignment to 16kiB, is it really necessary to have this? Might the drive actually have a smaller block size, with the alignment just happening to land on a multiple of it? In order to determine this, a simple experiment based on this page was used, where 256MiB of random data was written to the flash drive using dd with various block sizes, and each run was timed.
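For anyone who would rather not script dd, the idea can be sketched in Python. This is not the exact procedure I followed: it writes to an ordinary file (a temporary one in the demo) rather than the raw device, it repeats one random block instead of generating fresh data, and it calls fsync after every block so the page cache does not hide the device’s behaviour. The function name and the sizes in the demo are arbitrary.

```python
import os
import tempfile
import time

def time_block_writes(path: str, total_bytes: int, block_size: int) -> float:
    """Write total_bytes to path in block_size chunks; return elapsed seconds."""
    block = os.urandom(block_size)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(total_bytes // block_size):
            os.write(fd, block)
            os.fsync(fd)  # force the block out instead of leaving it cached
        return time.perf_counter() - start
    finally:
        os.close(fd)

if __name__ == "__main__":
    # Demo on a temporary file; point `target` at a file on the flash
    # drive under test to measure that drive instead.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "probe.bin")
        for block_size in (512, 1024, 2048, 4096, 8192, 16384):
            elapsed = time_block_writes(target, 256 * 1024, block_size)
            print(f"{block_size:>6} bytes: {elapsed:.3f} s")
```

The number to look for is the block size below which the time per byte rises sharply, as that suggests writes have become smaller than the device’s internal unit.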

[Image: write timing results for various block sizes]

It seems quite clear that the device has a page size of 4kiB, and the 16kiB alignment observed earlier is mainly due to it being a multiple of 4kiB. As a result, I’ve decided to stick with a 4kiB cluster format, and align to 4kiB to ensure best performance.

The resulting format was achieved under Linux using ‘mkfs.vfat -a -I -s 8 -R 250 /dev/sd*’

[Images: root directory location and size of the aligned format]

The first cluster now starts at sector 30720 (divisible by 8, so on a 4kiB boundary).

We have lost a little storage because we have decided to give up 248 sectors in the name of alignment (250 reserved sectors rather than the minimum of 2).

In theory, the 4kiB page size means that sacrificing fewer sectors was a possibility. However, we must keep in mind that erase blocks are normally many pages wide, and will be a power of two in size. As a result, I decided to have the first cluster at a MiB boundary as well, and it turns out this puts it at 15MiB.
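The reserved sector count needed to hit a given boundary can be worked out rather than found by trial and error. Here is a hedged Python sketch: the FAT size of 15,235 sectors is not quoted above, I have derived it from the first cluster at sector 30720 with 250 reserved sectors and an assumed two copies of the FAT. It also treats the FAT size as fixed, whereas in reality the FAT can shrink slightly as more sectors are reserved, so a real tool would have to iterate.

```python
def reserved_for_alignment(fat_sectors: int, align_sectors: int,
                           num_fats: int = 2, min_reserved: int = 2) -> int:
    """Smallest reserved-sector count that puts the first data cluster
    on a multiple of `align_sectors`."""
    fat_area = num_fats * fat_sectors
    earliest = min_reserved + fat_area
    # Round the earliest possible start up to the next boundary.
    first_data = (earliest + align_sectors - 1) // align_sectors * align_sectors
    return first_data - fat_area

FAT_SECTORS = 15_235  # inferred: (30720 - 250) / 2

print(reserved_for_alignment(FAT_SECTORS, 2048))  # 1MiB boundary -> 250
print(reserved_for_alignment(FAT_SECTORS, 8))     # 4kiB boundary -> 2
```

Under these assumptions, the minimum of 2 reserved sectors would already have landed on a 4kiB boundary, and the extra 248 sectors are purely the cost of moving up to a MiB boundary.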

The choice of 4kiB clusters was made as I would likely have a few small files on this drive, so the space savings of less file slack seemed to make sense to me in terms of losing a little storage compared to having a larger cluster size.

[Images: benchmark results for the aligned format]

The big benefit was in restoring performance: about 0.5MB/s of write performance was gained by having this alignment compared with the size-maximised version.

Peak performance in ATTO was also slightly improved over the original formatting tested in the review, although sustained performance remains similar, so it seems that this alignment may have benefits with the drive’s internal cache.

Reformatting the original MBR-formatted Sandisk Cruzer Facet in Windows 7 leads to the first cluster being at sector 32768, with the MBR format retained (2304 reserved sectors, 32 hidden sectors, 7,985,938,432 bytes). Reformatting a VFAT-formatted one also leads to the first cluster being at sector 32768 (with 2302 reserved sectors and 0 hidden sectors, for a total size of 7,985,954,816 bytes, which is 16kiB more as there is no MBR). Sector 32768 is a nice power of two, meaning alignment issues should not be a problem.

Some MBR formats gain an additional benefit by spacing the partition a whole number of MiB (1MiB or more) away from the partition table, as this ensures the first erase block is never actually erased – it can be treated as read only. This avoids any possible wear-out or corruption of that block, which could make the whole drive inaccessible. Of course, VFAT doesn’t have such a structure, but a similar result can be had by increasing the number of reserved sectors. Whether this is really necessary is up for debate.

Using SDFormatter results in an MBR format with the first partition starting at sector 8192 (a total of 4MiB protecting the MBR), but with the partition itself configured such that 8192 hidden sectors are used with 4382 reserved sectors, resulting in the root directory cluster being at sector 8192. It also leaves an interesting 7.4MB of unpartitionable space at the end, likely for compatibility and protection reasons. It chooses a 32kiB cluster size by default.

That being said, as the endurance of most devices is still sufficient for most users, and losing some performance may be an acceptable trade-off for having a working device, I think users shouldn’t lose too much sleep on the above points except for where they do have the necessary understanding to produce the necessary alignments. In the end, your data is still safe – the drive just needs to work harder and wear itself out faster to do it.

Conclusion

Reformatting your flash based media can result in different user accessible sizes due to the type of formatting used, and the way the filesystem is configured in regards to reserved sectors and hidden sectors. These parameters are rarely user-controlled, but do serve a purpose of ensuring device alignment to maximise performance and minimise unnecessary wear on the flash memory cells which have a limited endurance.

On the principle that the manufacturer knows best in terms of the memory’s physical arrangement and block/page boundaries, it is inadvisable to reformat devices unless absolutely necessary. However, unless you choose to be obtuse and maximise the size of your drive with mkfs.vfat like I did, Windows 7 seems to align the clusters starting from sector 32768 (16MiB into the drive), which seems to be a very safe option to ensure the data is aligned, but which may result in an unnecessary loss of capacity.

Older versions of Windows may not be as gentle – XP and older may end up with strange alignments, at sector 63 for example, which will cause premature wear of your media. Standard partitioning tools in Linux generally create partitions aligned to 1MiB boundaries, but the actual clusters themselves are often not aligned, as the FAT sits within the partition boundary and is rarely a whole number of clusters in size.

However, losing sleep over this is generally not necessary as the devices will still operate even if they do wear out faster, or perform slower. The data is generally not lost unless the device decides to fail …

About lui_gough

I’m a bit of a nut for electronics, computing, photography, radio, satellite and other technical hobbies.


2 Responses to Q&A: Why Reformatting a USB Key can Change its Size & Performance

  1. sparcie says:

    Interesting, the format seems to make a significant difference, although considering what most flash drives are used for, it’s probably not a big deal.

    Any experience on what effect other formats have? such as NTFS, ext(1-3) or UFS?

    I’ve got two flash drives I formatted as UFS for my NetBSD system as simple backup drives. It’s not a huge deal if they are slow, and breaking early is only an issue if I have a data loss before I can replace them. I have one plugged in at a time and swap them every now and then, using rsync to only copy stuff that is updated. It would be interesting to see if formatting them this way had any adverse effects.

    Cheers
    Sparcie

    • lui_gough says:

      For most non-MS filesystems, the layout is determined more by the partitioner (i.e. partition alignment) and then by the architecture of the filesystem itself. As most filesystems have the same concept of clusters, and most are binary powers of two, as long as the file system metadata is also “cluster sized”, then alignment should be all sweet provided the partition alignment is good. As far as I can tell fdisk under Linux refuses to align partitions any earlier than 2048 sectors by default, so the 1MiB alignment of the partition should be just fine for most flash drives and not too wasteful.

      Of course, I haven’t had the time to clearly analyze this, as I’m aware UFS has inodes scattered throughout the disk, so it’s unlike FAT in the sense the metadata is more distributed … and I’m not familiar enough with inode sizes etc to be able to comment on that.

      – Gough
