I’ve been quite busy lately with a few things, so many posts remain on the back burner. Getting these posts out often requires some work, whether experimental work or just tidying up images. To make my workflow as efficient as possible, I’ve now pretty much switched to SSD-only storage for my “work in progress”. As this can be a fairly sizeable amount of raw data, especially with a number of projects “in flight”, I opted to upgrade my measly 256GB 840 PRO to the 1TB 850 EVO I reviewed earlier.
I don’t normally condone the use of TLC flash memory in SSDs due to its lower endurance. However, it seems to have become the “standard” for cheap consumer SSDs, and many back-of-the-envelope calculations suggest it has sufficient endurance for “most” users. The other consideration is how well such flash cells retain data, as TLC reduces the margin between states and is likely to be less tolerant of charge loss.
Samsung was a pioneer in bringing planar TLC to the SSD market to reduce costs. Their 840 EVO drive became fairly popular, until it was discovered that data retention problems caused slow access speeds for rarely used data. Samsung then issued a series of fixes that periodically rewrote old data to keep it refreshed, which itself consumes NAND write cycles and potentially shortens the drive’s lifetime. It also seemed to me that drives simply “left in storage” unpowered could well suffer from retention issues, since no refresh can take place while the drive is off.
My personal feeling at the time was that planar TLC was not ready for consumer applications. Its estimated endurance of about 300-500 cycles was not to my liking, and was a number I had already exceeded with my 840 PRO at the time. I also felt that retention issues were possible, so I held off.
But the 850 EVO promised something new: the 3D V-NAND structure, which brought endurance back up to (roughly) MLC levels, and by all accounts the 850 EVO was not affected by the data retention problems. Or at least, that was the theory.
First Hint of Trouble
The first hint of trouble was that file access times seemed slightly slower than I remembered. Specifically, loading times for some games I had downloaded earlier and played only occasionally were noticeably longer. Suspecting an issue, I ran a benchmark to check.
Throughput was all over the place. Sure, the average value isn’t too bad, but there are sustained stretches under 100MB/s, which just isn’t right for an SSD that doesn’t compress its data and has no good reason for its throughput to vary. The SATA controller is not at fault: the drive does reach the roughly 520MB/s it should in places, so the interface is working as expected.
A quick check with trimcheck also showed that TRIM is working, which corroborates my experience when I accidentally deleted a file and … tried rather unsuccessfully to restore it. So I’m not doing anything particularly bad to the SSD to cause this. With many older SSDs, some loss of performance with use is considered normal and TRIM cannot fully restore it, but that loss usually presents as an almost “even” slowdown across the whole drive rather than the uneven pattern seen here. A quick check of the SMART attributes at the time showed nothing alarming either.
As a result, I had to consider whether this behaviour was related to the 840 EVO data retention issue. To check, I ran SSD Read Speed Tester, which was developed specifically for this purpose, against the drive in question. Because I hadn’t disabled timestamp updates (such as last-access times), the file dates the tool relies on were accurate, so the results could be trusted.
Of specific note is the loose correlation between read speed loss and file age for files 12 weeks and older, which seems to worsen as the files get older. This provides some evidence that older data may be degrading and becoming more difficult to read or decode, and that the controller is struggling with it.
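I can’t vouch for how SSD Read Speed Tester works internally, but the idea behind this kind of test can be sketched in a few lines of Python: read each file in full, time it, and group the throughput by file age in weeks. This is a minimal, hypothetical sketch (`read_throughput` and `speed_by_age` are illustrative names, not the tool’s code), assuming a file’s modification time is a fair proxy for when its data was written. It makes no attempt to bypass the operating system’s file cache, so anything read recently will look faster than the drive really is.

```python
import os
import time
from collections import defaultdict

SECONDS_PER_WEEK = 7 * 24 * 3600
CHUNK = 1 << 20  # read in 1 MiB chunks


def read_throughput(path):
    """Read a file end to end and return (bytes_read, seconds_elapsed).

    Assumption: no attempt is made to bypass the OS file cache, so
    recently read files will appear faster than the drive really is.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def speed_by_age(root, now=None, min_size=1 << 20):
    """Group files under root by age in whole weeks (from mtime) and
    return {weeks: MB/s}, computed as total bytes / total seconds."""
    now = time.time() if now is None else now
    totals = defaultdict(lambda: [0, 0.0])  # weeks -> [bytes, seconds]
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
                if st.st_size < min_size:
                    continue  # tiny files are dominated by per-file overhead
                nbytes, secs = read_throughput(path)
            except OSError:
                continue  # skip locked, unreadable or vanished files
            weeks = int(max(0.0, now - st.st_mtime) // SECONDS_PER_WEEK)
            totals[weeks][0] += nbytes
            totals[weeks][1] += secs
    return {w: (b / s / 1e6 if s > 0 else float("inf"))
            for w, (b, s) in sorted(totals.items())}
```

For example, `speed_by_age("D:/")` returns a dict mapping age in weeks to MB/s, which can then be plotted to look for the kind of age-related slowdown described above.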
The last thing I wanted was data loss, so I immediately archived all the files on the drive to another drive. I chose archiving rather than a plain copy because a copy would not have preserved the files’ dates and times.
Initially, I planned to wipe the whole drive with a secure erase and then restore the data. However, I was running a long-term experiment that meant the machine could not be reset (which would have been needed to release the ATA security freeze), and the drive also hosted my swapfile, so I couldn’t do this.
So I did the next best thing: I extracted the archive over the top of the existing files, choosing to replace them. After this “quick fix”, the file benchmark shows the following:
Immediate relief, and a big change. This rules out the contents of the files as the cause of the slowdown, and shows that once the same data has been rewritten to the drive, performance returns to the expected level. Note that the “weeks old” figures are no longer accurate, as the files were restored with their original dates even though their data is freshly written, but the result serves to prove the point.
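Since rewriting identical data was enough to restore performance, the same effect could in principle be achieved without a separate archive, by rewriting each file’s own contents in place and then restoring its timestamps. The following is a hypothetical Python sketch (`refresh_file` and `refresh_tree` are illustrative names), assuming the files are neither locked nor read-only. It relies on the fact that an SSD cannot overwrite flash in place, so a logical rewrite normally lands on freshly erased pages. Like Samsung’s 840 EVO fix, it consumes write cycles equal to the amount of data refreshed.

```python
import os

CHUNK = 1 << 20  # rewrite in 1 MiB pieces to keep memory use small


def refresh_file(path):
    """Rewrite a file's own bytes in place, then restore its timestamps.

    The bytes written back are identical to those just read, so an
    interruption part-way through leaves the file's contents unchanged.
    Returns the number of bytes rewritten.
    """
    st = os.stat(path)
    offset = 0
    with open(path, "r+b") as f:
        while True:
            f.seek(offset)
            chunk = f.read(CHUNK)
            if not chunk:
                break
            f.seek(offset)
            f.write(chunk)  # same data; an SSD normally stores it in new flash pages
            offset += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the rewrite out of the OS cache to the drive
    # Put back the original access/modification times so age-based tools
    # still report the real file dates.
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
    return offset


def refresh_tree(root):
    """Refresh every regular file under root; return total bytes rewritten."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # don't follow symlinks to data elsewhere
            total += refresh_file(path)
    return total
```

Run against a folder such as a games library, `refresh_tree` returns the number of bytes rewritten. Files that are open or locked (such as a swapfile) will cause an `OSError` and would need to be skipped or handled separately.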
Even the HD Tune Pro benchmark shows a mostly normal result. Based on these results, it seems clear that something is wrong with data retention on 850 EVO SSDs.
A Helping Hand?
As it turns out, my SSD was running firmware version EMT01B6Q, which has since been superseded by EMT02B6Q. When I installed the drive, the new firmware had only just been released, and due to download restrictions I could not install it. Once the drive was in daily production use, I didn’t want to risk the update bricking it, as there were a number of reports of bricked 850 EVOs involving this particular firmware.
Unfortunately, Samsung doesn’t help matters, as firmware changelogs are nowhere to be found. I didn’t know whether the update was vital or just a compatibility fix for certain troublesome controllers. There was no clarity about whether data might be lost or whether performance changes had been made. But seeing as a real problem had been detected, it’s quite possible that this update includes fixes “developed” in some way from the 840 EVO experience, without Samsung actually admitting it, to avoid further harm to a brand image that has taken a battering in recent times.
Whatever the case may be, I decided to pull the trigger and do the update.
The dread of receiving one of these messages is real. Thankfully, the drive was not harmed: it was simply an incompatibility between the AMD SATA driver and the update software. Reverting to the Microsoft AHCI driver allowed the update to complete successfully.
Will this be the cure? I don’t know. Unless Samsung tells me what the firmware does, I have no idea what it’s going to do.
The SMART data after the update and refresh doesn’t show anything unusual. A few ICRC (interface CRC) errors occurred, but this isn’t unexpected with many drives in one case and cables that may at times be borderline. Since the count is low, the impact is at most an occasional throughput blip. Nor has this drive spent much of its time powered off: I’ve had it for around a year (~8760 hours), and it has been powered up for 5363 hours, or about 61% of the time. Most office machines wouldn’t even see this level of usage (Monday to Friday, 9am to 5pm, is about 24% of the time).
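The duty-cycle comparison works out as follows, taking a year as 365 × 24 = 8760 hours and an office week as five 8-hour days out of 168 hours:

```latex
\frac{5363\,\mathrm{h}}{8760\,\mathrm{h}} \approx 0.612 \approx 61\%,
\qquad
\frac{5 \times 8\,\mathrm{h}}{7 \times 24\,\mathrm{h}} = \frac{40}{168} \approx 0.238 \approx 24\%
```

In other words, this drive has been powered for roughly two and a half times as many hours as a typical office machine would be.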
We all know that flash memory relies on charge retention, and its suitability for long-term data storage is, at best, limited. That being said, having witnessed the 840 EVO retention issues, we were fairly confident that the 3D V-NAND of the 850 EVO would improve the situation and that the 850 EVO was not affected. While I can’t claim to fully understand the underlying issues, the slowdown was noticeable and the benchmarks pointed to problems especially with older files, symptoms that resemble the 840 EVO retention issue. To me, this discovery is rather disappointing, and suggests that data stored for even just one year is potentially at risk of becoming inaccessible.
Of course, this was experienced under the older EMT01B6Q firmware, and I have since updated to EMT02B6Q without knowing exactly what it fixes or whether this behaviour might reappear in the future. The lack of a changelog is disappointing. That being said, Samsung SSDs are the most popular in this market segment, so I hope such issues don’t also affect their newer TLC-based products.
Updates and Musings
A friend who also uses an 850 EVO checked his 500GB unit, and it did not seem to be affected. Further research turned up the specifications at AnandTech, which show that the 500GB and smaller units use a new Samsung MGX controller, whereas the 1TB model uses the older Samsung MEX controller shared with the 850 PRO and 840 EVO.
Could this be a clue that the 840 EVO and the 1TB 850 EVO share a common shortcoming in their controller or firmware that caused the above behaviour? Could it also vindicate the 500GB and smaller 850 EVOs? I’m not sure, as the number of data samples is unfortunately in the “single digits”.