Flash memory has been a commodity item for a while now. Almost everyone has at least a few USB flash drives (sticks) and maybe even a few memory cards. When it comes to rapidly transferring large files between devices, or even storing working documents, USB flash drives are the universal choice. In recent years, we’ve also seen the popularization of solid-state drives (SSDs), which have put high demands on flash manufacturers in terms of volume as they rapidly displace moderately high capacity hard drive storage from mainstream computers.
However, while flash continues to grow in popularity, it is not without its downsides. Flash memory operates on the principle of charge trapping and tunneling, which is not a perfect process. The charges that represent the data can be lost over time due to the insulator being leaky or damaged, and the whole process of writing and erasing data relies on tunneling that induces damage in the insulator which is not readily repairable in-place. Worse still, owing to price pressure, the inclination to get more data onto less silicon has resulted in smaller feature sizes (e.g. 16nm), which mean smaller cells that hold less charge and have a shorter lifetime, presumably due to the smaller insulator area as well. This is compounded by the move to triple-level cell (TLC) storage, which requires accurately distinguishing between eight levels of cell voltage rather than the four of MLC or two of SLC, thus reducing the margin for error. As a result, at least for planar NAND, the move to smaller lithography and TLC has resulted in reduced endurance.
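The relationship between bits per cell and the number of voltage levels to be distinguished is a simple power of two. A minimal shell sketch makes this explicit; the function name `levels_per_cell` is purely illustrative.

```shell
# Number of distinct voltage levels a cell must resolve to store N bits.
levels_per_cell() {
    echo $((1 << $1))
}

for bits in 1 2 3; do   # SLC, MLC, TLC
    echo "$bits bit(s) per cell: $(levels_per_cell "$bits") levels"
done
```

Each extra bit doubles the number of levels that must fit within the same voltage window, which is why the margin between adjacent levels shrinks so quickly.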
For the most part, endurance attracts less attention now than it did in the early days of SSDs. Larger drives, forced overprovisioning and better wear levelling technologies mean that consumer drives see only a limited number of write-erase cycles before being retired because they’re too slow or too small. A simplistic calculation of, say, 10GB written a day to a 1TB SSD results in only 18.25 cycles (assuming no write amplification and perfect wear levelling) within a lifetime of five years. As a result, manufacturers seem to warrant SSDs based on terabytes written, which can work out to anywhere between 150-500 cycles for TLC drives in general, even though most of their drives will go further. However, the situation for inexpensive USB flash drives and memory cards is not so clear. Part of the issue is a lack of diagnostic data (e.g. SMART as on SSDs) which would allow users to understand the condition of their drives. Another is the unwillingness of manufacturers to state any information as to the endurance of their products.
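The simplistic cycle calculation above can be reproduced with a few lines of shell. This is only a sketch under the same assumptions (no write amplification, perfect wear levelling, decimal units); the function name `cycles_used` is made up for illustration.

```shell
# Full-drive write cycles consumed over a drive's service life.
# Arguments: GB written per day, drive capacity in GB, years of service.
# Assumes perfect wear levelling and a write amplification of 1.
cycles_used() {
    awk -v daily="$1" -v cap="$2" -v years="$3" \
        'BEGIN { printf "%.2f\n", daily * 365 * years / cap }'
}

cycles_used 10 1000 5   # 10 GB/day on a 1 TB drive for five years -> 18.25
```

Real drives will consume more cycles than this for the same host writes, since write amplification is rarely 1 in practice.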
As a result, I decided to run a little experiment to try and find out just how robust inexpensive USB flash drives of today are.
The Contenders and Methodology
Three contenders were chosen for this experiment, all drives I have reviewed in the past on this site and of which I still had brand new, never used samples. The drives are the Sandisk Cruzer Facet, the Comsol UF4-8000 and the Verbatim Store’n’Go.
They were attached to a computer with a USB 2.0 port and formatted to exFAT (to allow for large files >4GB to be stored). Cygwin was used to continually write to the drive and log the progress using the following command:
while :; do dd if=/dev/random of=/cygdrive/X/rawout.raw bs=8M &>> stress.log; sleep 5; done
Note that the command continually recreates a file filled with pseudo-random contents, thus the drive cannot “cheat”. A large block size was chosen to avoid write amplification due to small-block accesses being immediately flushed to the drive. However, no effort was made to verify the validity of written data, or to verify that any stored data would be retained over a period of cold storage. The experiment would be terminated, and the log file examined to tally the number of write cycles endured, as soon as the drive failed to continue to receive data. A sleep time was set for each loop to avoid excessive CPU utilization on failure (although I should have probably checked the return value of dd to terminate the loop instead). Post-failure examination of the drive status was undertaken.
This experiment began in November, prior to my leaving for a holiday, but for a variety of reasons (including a power failure and loss of system control due to loss of internet connectivity while I was overseas) it was not completed until recently.
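Tallying the cycles from the log can be automated. The sketch below is not the method used for the original count; it assumes the log contains GNU dd's usual summary lines (for example `953+0 records out`) and counts only passes that wrote at least one block, so that attempts made after the drive stopped accepting data are ignored. The function name `count_cycles` is hypothetical, and the exact log format may differ between dd versions.

```shell
# Count dd passes in a log that wrote at least one block.
# Assumes GNU dd's summary format, e.g. "953+1 records out".
count_cycles() {
    awk '$2 == "records" && $3 == "out" {
             split($1, n, "+")          # full blocks + partial blocks
             if (n[1] + n[2] > 0) passes++
         }
         END { print passes + 0 }' "$1"
}

# Usage: count_cycles stress.log
```

A pass only equals one full write cycle if it filled the drive, so a pass cut short by a drop-out is still counted here as a whole cycle.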
Under the same conditions, the first drive to fail was the Sandisk Cruzer Facet, at 632 cycles written. This aligns with the rough expectation that planar TLC NAND may only achieve anywhere from 300-1000 cycles.
The more devastating result was its failure mode. The drive exhibited a reluctance to conduct writes which resulted in the filesystem becoming corrupted. On a removal and reinsertion, the drive needed to be formatted as no valid filesystem was recognized.
The damage was, however, more serious than that as the internal drive geometry appears to have been lost. The drive was unable to report its original size, and thus was unable to be formatted, nor read. Recovery of data is very unlikely even with some expertise, as the whole drive is a single system-on-package and thus the NAND and controller are encapsulated in the one package.
The next drive to fail was the Comsol UF4-8000, which first hiccuped at 743 cycles by dropping off the USB bus with partial corruption. A reformat was successful, and another 216 write cycles were completed before the drive again dropped out, for a total of 959 cycles. Partial corruption, especially of the filesystem, was experienced, but the drive remained readable.
Under this condition, it seems likely that data would be recoverable when the drive first falters. However, wishing to explore the failure mode further, I reformatted the unit and attempted to run H2testW on it, only for it to fail in less than a cycle of writes.
Unfortunately, as it turns out, the act of trying to write further to the drive seems to have destroyed it: it now wants to be formatted, but fails to be formatted. It’s likely that the drive has now failed in a read-only state, but a corrupted one. This suggests that when a drive exhibits drop-out symptoms from high cycle use, one should avoid any repairs to the filesystem and instead image the drive and recover from a copy of the data, as any writes may well exhaust the few spare cells that remain.
It’s interesting to see that while both were based around Sandisk NAND, the Comsol lasted a little longer (but not by that much) and failed differently, as it used a third party controller. Another likely factor is the greater overprovisioning: 7.26GiB user accessible for the Comsol versus 7.44GiB for the Sandisk Cruzer Facet.
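Imaging a faltering drive without writing to it can be done with dd itself. The following is a minimal sketch rather than a tested recovery procedure; the function name `image_drive` and the device path in the usage comment are placeholders.

```shell
# Copy a failing drive to an image file without writing to the drive.
# conv=noerror continues past read errors; conv=sync pads each short or
# failed read with zeros so offsets in the image stay aligned.
image_drive() {
    dd if="$1" of="$2" bs=65536 conv=noerror,sync
}

# Usage (device path is a placeholder; check it carefully before running):
#   image_drive /dev/sdX rescue.img
```

Any filesystem repair or file recovery is then attempted on the image, leaving the original drive untouched.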
The last to fail was the Verbatim Store’n’Go which achieved an impressive 9751 cycles. This drive was made of Toshiba eMMC memory, which is of a grade expected to be embedded in tablet devices, and thus its endurance was likely to be greater purely due to this fact. Another benefit may have been the greater overprovisioning on the Verbatim, which only exposed 7.21GiB of user accessible storage.
At failure, the drive was recognized for size and format, with data still partially readable. After a format, running H2testW resulted in all writes appearing to succeed, very slowly, but the written data failed verification.
This may be a preferable end-of-life behaviour, as it would appear to “accept” any filesystem repair writes despite not being able to commit them to memory, while allowing whatever is still salvageable to be read. The cycle life result approaches that claimed for MLC memory (10,000 cycles), and thus is quite respectable.
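To put the overprovisioning differences between the three drives into rough numbers, the spare fraction can be computed from the raw and user-visible capacities. The raw NAND capacity of these drives is not known; the 8 GiB figure below is purely an assumption for illustration, and `spare_percent` is a made-up helper name.

```shell
# Percentage of raw NAND held back from the user.
# Arguments: raw capacity, user-visible capacity (same unit for both).
spare_percent() {
    awk -v raw="$1" -v user="$2" \
        'BEGIN { printf "%.1f\n", (raw - user) / raw * 100 }'
}

# ASSUMPTION: 8 GiB of raw NAND in every drive; the true figure is not known.
for user in 7.44 7.26 7.21; do   # Cruzer Facet, Comsol, Verbatim
    echo "$user GiB visible: $(spare_percent 8 "$user")% spare"
done
```

Under that assumption the differences in spare area are small compared with the order-of-magnitude gap in cycle life, which would point to the grade of the NAND as the bigger factor.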
Discussion and Other Key Points
Most users of cheap commodity USB memory sticks will probably be wondering why any of this is important. After all, many of them may not even use 100 cycles before the stick is lost or damaged, so even 600 cycles is plenty. However, there are a few things to consider.
For one, this cycle life test did not evaluate the data accuracy after writing. Only when the drive failed to write, or the filesystem got mangled, did we conclude the test and examine the drive. It’s fairly probable that some data corruption or access failures would have been detected earlier had verification been undertaken. As a result, the numbers we are getting are an upper bound.
Secondly, the cycle life test did not evaluate the persistence of the data stored on the drive. With continued cycling, the damage to the insulator is expected to increase the charge leakage rate, and thus it is quite probable that data stored on drives which have undergone cyclic writes will not be retained for as long, especially given the more stringent voltage margins of TLC storage and the lack of sophistication in low-cost USB memory controllers. It seems quite probable that after even less than a year of storage, data could be permanently lost, to the point of the drive failing due to a loss of firmware or geometry metadata (note my experiences with microSD cards and even a Samsung 850 EVO SSD).
Due to the cost pressures on this segment of the market, the majority of the USB memory sticks on the market are likely to be planar TLC NAND and suffer such issues. The cheaper, slower, all-integrated miniature USB devices appear to be particularly vulnerable and problematic, as any recovery from them will be complicated by the fact that the NAND cannot be directly accessed. Regardless, the failure modes also vary, and in some cases result in sudden and complete loss of access to data, so “self rescue” by software-based recovery tools is not a possibility.
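A verification step of the kind that was left out could look something like the following sketch. It was not part of the experiment, the function name `write_and_verify` and the paths are placeholders, and the read-back may be served from the operating system’s cache rather than from the flash itself, so a rigorous test would need to remount or reinsert the drive before comparing.

```shell
# Write a file to the drive, then confirm it reads back identically.
# Returns 0 only if the copy succeeded and every byte matches.
write_and_verify() {
    cp "$1" "$2" && sync && cmp -s "$1" "$2"
}

# Usage (paths are placeholders):
#   write_and_verify pattern.raw /cygdrive/X/rawout.raw || echo "verify failed"
```

Comparing against a source file kept on a different disk means silent corruption is caught as well as outright write failures.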
With this in mind, these conclusions are likely to extend to many memory cards, particularly microSD cards where high densities and low costs prevail. In this application, it seems quite likely that cycle-life exhaustion could occur, especially when used in embedded computing with data logging, or in dashcams and security cameras which record in a loop. Such use voids the warranty of many microSD cards, but the risk is still important to keep in mind, as the solution would be rendered completely ineffective if the card were to fail or become unreadable when extracted from the camera. For long-term storage, loss of data is also a real possibility, as such flash is not well suited to it.
Usage patterns will also make a difference, as will the sophistication of the wear levelling algorithm on the controller. Regardless, if a USB key is acting “iffy”, it could well be a sign that it is running out of spare cells to reallocate, and it could soon lock up into read-only mode or become unreadable. With that in mind, it’s unwise to rely on cheap USB sticks as your sole storage of working documents, despite the popularity of doing this. However, there’s no guarantee that even more expensive units are better.
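How quickly loop recording could exhaust a card can be estimated with the same kind of simplistic arithmetic. The sketch below assumes perfect wear levelling and no write amplification, and the card size and bitrate in the example are hypothetical rather than measured; only the 632-cycle figure comes from the test above.

```shell
# Days of continuous loop recording before a card reaches a given cycle count.
# Arguments: capacity in GB, rated or observed cycles, video bitrate in Mbit/s.
# Assumes perfect wear levelling and a write amplification of 1.
days_to_wear_out() {
    awk -v cap="$1" -v cycles="$2" -v mbps="$3" 'BEGIN {
        gb_per_day = mbps / 8 * 86400 / 1000   # Mbit/s -> GB per day
        printf "%.1f\n", cap * cycles / gb_per_day
    }'
}

# Hypothetical example: 32 GB card, 632 cycles, 15 Mbit/s camera
days_to_wear_out 32 632 15
```

Since real cards suffer some write amplification, the true figure would be lower than this estimate.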
The endurance of cheap USB flash drives was examined in a write-only scenario, with the Sandisk Cruzer Facet failing at 632 cycles with complete failure to read any data, the Comsol UF4-8000 failing at 959 cycles with an earlier hiccup and partial corruption but in a readable state that rejected writes, and finally, the Verbatim Store’n’Go failing at 9751 cycles with partial corruption but a partially readable state that accepted writes but could not commit them to memory. The failure modes were varied and have an impact on the “recoverability” of the drive once end of life is reached.
This test produces an upper bound figure on the endurance of cheap drives. Had verification been performed, the drives would likely have been judged to have failed earlier, and data retention over time is likely to have been degraded as well. Even though the cycle figures may seem ample for most consumers, there are a number of applications (e.g. embedded data-logging, dash cam/surveillance) where they can be exhausted. Use of flash drives as primary storage for working documents or long term storage is probably unwise. Drives with drop-out symptoms or random write or verification failures are likely showing pre-failure symptoms and should be imaged/recovered without further writes to avoid complete drive failure.
That said, the drives are still very suitable for occasional use and data interchange.
Technologies other than flash, for example crosspoint memory as demonstrated in Intel Optane modules, may well overcome some of these problems. Improvements in NAND geometry, such as 3D V-NAND by Samsung and BiCS by Toshiba, may also offset some of the loss in cycle life endurance. However, it is unlikely that you will find such improved technologies when shopping at the price sensitive bottom-end of the market, as many people do. Paying more is no guarantee of quality either, although it seems quite likely that the NAND and controllers in SSDs are made to be significantly more reliable than those of such cheap drives.