Project: 20th Anniversary Pentium G3258 – Part 2: Overclocking & Benching

As an engineer, I get excited whenever I have any form of optimization to perform. Overclocking is a kind of optimization, depending on how you do it – in this case it’s attempting to extract the maximum clock rate, with the lowest core voltage (and heat) practicable, maintaining a level of stability which is high enough for everyday use, using parts of the lowest cost, within the constraints of the stock CPU cooler, paste and weak VRMs on the motherboard. Having previous experience with Haswell overclocking, I dove right in and tried my hand at several things.

Initial Experiences

From my initial experiences with the Asrock Z97M Anniversary, overclocking was slightly more difficult than I would have liked. Due to some conflicting settings somewhere (don’t know what), despite selecting the overclock defaults provided, the board booted into Windows at stock speed and could only be overclocked with their A-Tuning software. Resetting the CMOS using the UEFI Setup Utility didn’t help. The strange thing was that the UEFI utility insisted the CPU was overclocked, and everything else insisted that it wasn’t, including A-Tuning.

It was only after clearing the CMOS with the hardware jumper that I managed to get the board to boot with the overclocked settings.

I started with the pre-sets, but soon opted to “go it my own”. Important parameters to tweak are:

  • CPU Multiplier (selects the Core speed)
  • Cache Multiplier (selects the Uncore speed)
  • GPU Clock Rate (overclocks the integrated GPU)
  • CPU Core Voltage (selects the voltage the FIVR gives the cores)
  • CPU Cache Voltage (selects the voltage the FIVR gives the uncore)
  • Current Limit (allows for higher power usage)
  • Turbo Power Limit (allows for higher power usage)
  • FIVR Operating Mode (select high performance over efficiency)
  • FIVR Faults (disable to prevent issues)
  • GPU Voltage (selects the voltage the FIVR gives the GPU)
  • RAM Clock Base (selects 100/133Mhz based RAM increments)
  • RAM Rate (selects the multiplier)
  • RAM Latencies (to fine tune the RAM)
  • Analog I/O voltage (to add stability for RAM overclocking)
  • Digital I/O voltage (to add stability for RAM overclocking)
  • System Agent voltage (to add stability for RAM overclocking)
  • DIMM Voltage (changes the voltage to the RAM)

Ultimately, given the price of the CPU, I didn’t mind killing it at all, so I went through it in a rather cut-throat way.

Overclocking Methodology

The methodology behind overclocking Haswell CPUs can be simplified to five stages. By following them in order, it is possible to reduce the number of headaches and “I don’t know what caused it to BSOD” problems, while also reducing the time it takes to reach full stability.

0. Set the BIOS settings

The first thing to do is to set the BIOS settings correctly. I would suggest you load a low overclock default as a basis if you're not comfortable setting them from scratch. You will need to disable current limits and faults, and identify where your multipliers are. You should also check and verify (say, with CPU-Z) that your board is doing what it says it's doing (setting multipliers correctly). Consider setting the fans to full speed to avoid losing cooling performance, and you might want to disable thermal throttling (at your own risk) to ensure the CPU performs at full speed at all times regardless of high temperatures.

1. Optimize Core

The first overclocking stage is to optimize the core independently of everything else. The fastest approach is to increase the core voltage to the highest you can accept (likely to be 1.400v or less for air cooling) within thermal limits (check temperatures using HWMonitor or CoreTemp), and then increase the multiplier until it no longer boots. Then dial back the multiplier and try to pass a 24-hour Prime95 mixed torture test. Once you can pass 24 consecutive hours, you know you're stable at that given multiplier.

Remember that the temperatures reported are junction temperatures, and the temperatures reported on the datasheets are case temperatures. It would be wise to stay under 95 degrees C at the peak, and others will advise you to be even more conservative. But if it doesn’t crash, you’re pretty much good as real life workloads are unlikely to cause as much stress as Prime95 will.

Once you have this, it is advisable to start the voltage optimization routine: repeat the above, but instead of increasing the multiplier, reduce the voltage in small steps (0.025v at most, preferably 0.010v) until it breaks, and then go back up one or two steps. This will reduce the heat produced and improve the electrical efficiency of your system.
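Conceptually, these two passes are just linear searches wrapped around a very expensive stability test. Here's a minimal sketch of the order of operations — purely illustrative, since the settings live in the UEFI and the "test" is a full day of Prime95, so none of this can actually be automated from the OS; the starting values and step sizes are just the ones suggested above:

```python
# Illustrative sketch of the core-tuning order described above.

def is_stable(multiplier, vcore):
    """Placeholder for 'set in UEFI, boot, pass ~24h of Prime95 within temps'."""
    raise NotImplementedError("stability is verified by hand, not by this script")

def find_max_multiplier(vcore=1.400, start=33, stop=50):
    """Phase 1: fix a high (but thermally acceptable) Vcore, raise the multiplier."""
    best = None
    for multiplier in range(start, stop + 1):
        if is_stable(multiplier, vcore):
            best = multiplier          # last multiplier that passed
        else:
            break                      # first failure: keep the last good one
    return best

def trim_vcore(multiplier, vcore=1.400, step=0.010, floor=1.000):
    """Phase 2: fix the multiplier, lower Vcore in small steps until it breaks."""
    last_good = vcore
    v = round(vcore - step, 3)
    while v >= floor and is_stable(multiplier, v):
        last_good = v
        v = round(v - step, 3)
    return round(last_good + step, 3)  # back up a step or two from the failure point
```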

2. Optimize Uncore

The next step is to optimize the uncore – this is the multiplier which controls the cache. While the chip normally ships with core and uncore running at synchronous rates (i.e. 32 and 32), for high overclocks it is likely that this isn't possible. The uncore seems to top out at 39-42 depending on the chip.

While it isn’t an absolute necessity to run the cache synchronously with the CPU, there are slight performance benefits to overclocking the uncore due to the increased cache bandwidth. As a result, you should increase your uncore, starting from 38-39, see where it fails the 24-hour Prime95 test, and then back it off.

The voltage to the uncore is generally less important as it contributes less heat; I would suggest setting it to 1.2 or 1.25v and leaving it.

3. Optimize RAM

The third step is to optimize the RAM. It’s helpful if you have the stock latencies from the SPD at hand (see CPU-Z). Generally, the Intel CPUs benefit greatly from increased RAM clock rate even at the expense of latency.

To prepare for RAM overclocking, I would suggest increasing the Digital I/O, Analog I/O and System Agent voltages by +0.2v and the Vdimm to 1.65v, and leaving them there. Then, start pushing the clock rate while manually adjusting latencies, as not all boards will make sensible decisions. Do change the DRAM base clock to 133Mhz if that helps you eke a little more out of your RAM. You should also consider changing from 2T/2N to 1T/1N to improve performance as well.

To work out the latency, I’ve made this handy table (especially handy for cheap RAM):

Clock-Basis

For example, let’s say we have cheap DDR3 1600Mhz RAM with a latency of 11-11-11-28 at 1600Mhz. You can see that the CAS latency (first number) corresponds to ~6.88ns. If you want to try running this at 1866Mhz, you look up the number nearest to this and find that a CAS latency of 13 at 1866Mhz corresponds to 6.96ns, which should be safe. Likewise, if you then decide to push it to 2400, you should probably try a CAS latency of 15 to keep a similar amount of “absolute” time.

How do you determine the big number at the end? Well, as a very rough rule of thumb, you should be using something around twice the CAS latency + 1 or 2 as a minimum. So if you’re trying 15-15-15-?, the ? should be 31 or 32 as a minimum.

This gives you a nearly safe starting point to decide what manual timings to apply.
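The table’s arithmetic is simple enough to reproduce. Here’s a small sketch that regenerates it, using the same convention as above of dividing the CAS count by the DDR3 transfer rate (which halves the true figure, since the real clock is half the transfer rate, but comparisons between speeds remain valid):

```python
# Regenerate the CAS-latency-in-nanoseconds table for common DDR3 speeds.
# Convention as in the text: latency (ns) = cycles / transfer rate (MT/s) * 1000.

def cas_ns(cas_cycles, rate_mts):
    return cas_cycles / rate_mts * 1000.0

rates = (1333, 1600, 1866, 2133, 2400, 2666)

print("CAS " + "".join(f"{r:>9}" for r in rates))
for cas in range(9, 17):
    print(f"{cas:>3} " + "".join(f"{cas_ns(cas, r):>9.2f}" for r in rates))

# Worked example from the text: CL11 at DDR3-1600 is ~6.88 ns, so CL13 at
# DDR3-1866 (~6.96 ns) keeps a similar absolute latency.
print(cas_ns(11, 1600), cas_ns(13, 1866))
```

For what it’s worth, the finalized 2666Mhz 14-14-14-30 settings later in this post work out to about 5.25ns – tighter in absolute terms than the 1600Mhz starting point, despite the bigger numbers.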

Then you can tighten these and see if you can still keep it stable. Memtest-86 can be helpful, although this needs to be followed by a Prime95 run to be sure. Note that Memtest doesn’t show the right CPU clock rate – this is normal.

The important thing to realize is that while overclockers term this loosening the timings, the timings are counts of clock cycles at a given clock rate: if you increase the clock rate, more clocks elapse in the same amount of time. As a result, you might still get a net benefit from increasing your clock rate (bandwidth) even if you have to loosen your timings.

4. Optimize GPU

The final piece is to optimize the GPU. For this, I like to push the voltage up to about 1.300v and then move the clock up in 100Mhz increments and run a 3D test application (even the Windows System Assessment Tool is enough in most cases). When the GPU is unstable, the graphics subsystem driver resets and the screen blinks. Back off by 100Mhz once you hit this point and you’re often fine. The GPU can be tweaked in 50Mhz increments if necessary.

It might seem complicated at first, and in some ways, it can be. But if you’re willing to live with a more limited overclock, then the existing “one touch” settings might be more to your taste.

Finalized Settings

As it turns out, here are my finalized settings which pass 24-hours of Prime95 with the stock cooler never exceeding 91 degrees C:

  • Core Multiplier: 46x (stock: 32x)
  • Cache Multiplier: 40x (stock: 32x)
  • Vcore: 1.375v (stock: 1.090v)
  • Vcache: 1.200v
  • Boost Power Max: 1000
  • Short Power Max: 1000
  • GPU Clock: 1700Mhz (stock: 1100Mhz)
  • GPU Voltage: 1.300v
  • System Agent: +0.2v (stock: 0v)
  • Analog I/O Voltage: +0.2v (stock: 0v)
  • Digital I/O Voltage: +0.2v (stock: 0v)
  • RAM Clock: 2666Mhz (stock: 1600Mhz)
  • RAM Timings: 14-14-14-30-1T (stock: 11-11-11-28-2T)
  • RAM Voltage: 1.65v (stock: 1.50v)
  • CPU Fan: 100% Fixed Mode

The proof is in the screenshot:

46-40-1375-1200

Looking at the stock VID of 1.090v, the chip is considered an “okay” to “good” chip. It’s not an excellent gem, as noted by the high 1.375v core voltage required to achieve 4.6Ghz. Anandtech’s sample managed 4.7Ghz at 1.375v – so mine’s not far off the pace.

The biggest surprise was the level to which the Samsung OEM RAM (which has no heatspreaders or any fancy features) managed to overclock while simultaneously pushing the timings somewhat (despite how it appears). This actually has a positive performance benefit – as I will demonstrate in the Benchmarking section.

One unfortunate thing is that this seems to be the limit of the motherboard as well as the CPU. The VRMs squeal fairly loudly under heavy computational load, normally indicative of slight overloading or stress. Nothing’s blown up or gotten too hot yet … but it means we are at the limit of the cooling, VRMs, CPU and RAM almost simultaneously – the kind of coincidence which helps you wring every last drop from your hardware.

Validation Fun

Before all of that serious benching, I wanted to see just how fast this thing could go, irrespective of long-term stability. The way to achieve this is to push the voltage up to the maximum you (and your board) can tolerate in the short term, and then push the core rate up until you freeze and/or fail to submit your validation. For me, my CPU managed this impressive performance on both cores:

ipnrce

My first validation, and my first CPU at 5Ghz. This is a pretty cool present, Intel!

Benchmarking

Raw clock numbers are rarely entirely indicative of performance gains in actual use cases. As a result, it is important to benchmark with a wide suite of applications which reflect your actual uses and may be subject to bottlenecks from various other components within the system. It also provides an opportunity to investigate the performance difference between running DDR3 1600 at 11-11-11-28 and DDR3 2666 at 14-14-14-30 (fairly “lousy” latencies by 2666 standards).

Unfortunately, there are many, many benchmarks available and it would be impossible to cover them all; some also have restrictive licensing agreements or software limitations that make them less useful. The benchmarks used are listed in the results table below.

The first results column represents running at stock speed with stock RAM; the next column is running overclocked with the RAM at 1600Mhz 11-11-11-28, and the column next to it shows the percentage improvement. This is repeated for running overclocked with the RAM at 2666Mhz 14-14-14-30.

BenchResults

Overall, it seems that running the RAM with the looser timings but higher clock has a nearly 5% performance advantage on average, although some benchmarks showed no improvement or even a marginal decrease. The CPU increased in frequency by 43.75% and the benchmark results improved by 34.34-39.12% – a fairly significant amount, although slightly less than the core frequency increase would suggest.
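For reference, the percentage figures are just the relative change between runs; the same arithmetic applied to the clocks gives the 43.75% quoted above (a trivial sketch, with the clock speeds taken from the settings earlier):

```python
# Percentage improvement as used in the results columns: (new / old - 1) * 100.
def pct_gain(old, new):
    return (new / old - 1.0) * 100.0

print(f"{pct_gain(3.2, 4.6):.2f} %")   # core clock 3.2 GHz -> 4.6 GHz = 43.75 %
```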

Conclusion

It seems that my overclocking has yielded fairly significant benefits, although not without some effort. The CPU managed to get to 4.6Ghz on both cores, stably, under stock cooling and push a 1600Mhz DDR3 pair up to 2666Mhz. Quite impressive indeed for the price.

In terms of performance, the Pentium has been elevated into a mid-high Core i3 level, sitting between the 4350 and 4360 in terms of Passmark CPUBenchmark scores. It’s a decent chunk of performance for a basic machine, and a decent price. Add to that, a weekend of fun for me, and that’s money well spent!

Again, I will state: if you have no intention of overclocking, the CPU is a pretty “regularly” priced low-end CPU with no real distinguishing features. But if you do, you can definitely get some improvement for free – just don’t count on it. It’s not for everyone.


Project: 20th Anniversary Pentium G3258 – Part 1: The Nostalgia & the Parts

It’s not often in technology that something lasts a good 20 years, but it seems the Pentium name is one of them. To celebrate this, Intel released the special Pentium 20th Anniversary Edition G3258 in mid-2014, based on the Haswell design (LGA 1150), with an unlocked multiplier at a relatively low price, as a bit of a “gift” to overclockers.

A Bit of Nostalgia

When I first started with computers, the only thing I knew was the Intel 80386SX-16 that was in the box at home. From that, I eventually grew to building computers from ex-business scraps, mainly 486-DX2s.

At that time, the Pentium name was well and truly the name for performance. I would look at the benchmarks and it was phenomenal – they literally ate my 486s for breakfast! The 486s were relegated to the budget end. I still remember, very vividly, the TV advertisements which pushed the Intel name with the “Intel sound”. As part of your duties as a computer buyer, you were supposed to check it had “Intel Inside”.

This was no doubt because Intel was feeling the pressure from competing CPU manufacturers such as AMD, who had a rather good run with its 386 and 486 clones, plus legal victories establishing that the 386 and 486 designations couldn’t be trademarked as they were mostly numbers. Not to mention, the competition also involved Cyrix, IDT, Winbond and TI – many more players than we see today.

As a result, the Pentium name came about, with Pent signifying the fifth generation, and -ium making it sound like an “element”. The funny thing? The Pentium name lived on through the Pentium, Pentium MMX, Pentium Pro, Pentium II, Pentium III, Pentium 4, Pentium-M, Pentium D and Pentium Dual-Core. There were four “big” generations with the Pentium name as the performance choice.

The second machine I bought second-hand had a Pentium 133Mhz (non-MMX) CPU in it. In fact, I made this machine stretch – from 1998 to 2005, I was still (painfully, and stubbornly) choosing to use the machine as my primary machine until the motherboard packed up and I leaped forward several generations to an AMD Athlon XP 2500+. It happily ran Windows 98SE on 48Mb of 72-pin SIMMs, a PCI 56k softmodem, a PCI-USB 2.0 host controller, two hard drives (4.3Gb each) and a CD-RW drive. Part of it was being as efficient as you could with your limited resources – unfortunately, while everyone was happily enjoying DivX3 encoded files, my CPU couldn’t cope. Even MPEG2 wasn’t in reach. In high school, people were rather astonished when I told them I had one hundred and thirty three megahertz. They thought I was mistaken!

Not everything about the Pentium’s history is good. For one, with the FDIV bug, the Pentium name introduced the wider public to the fact that CPUs are not infallible. Maybe that inspired the whole “upgradeable microcode” feature of most modern CPUs, as we’ve come to accept that errata are the norm, even for the modern Haswell CPUs. Prior to this, the Intel 80386 had some buggy units which couldn’t run 32-bit software.

The Pentium CPUs also ran hot for the time, requiring a small heatsink and fan where older CPUs were content with just a heatsink. This repeated itself with the Netburst Pentium 4, which ran very toasty compared to its AMD counterparts.

But even with all of that, the Pentium often provided the yardstick by which all its competitors were measured, and this wasn’t by accident. If anything, the next most long-lived name may be the Celeron … possibly followed by the Athlon (and its variants, XP/MP, 64, X2) [citation needed].

Here’s a montage of every (really?) Intel animation (1971-2013) made by Flake Songs (the audio could do with some normalization), but you can see just how much the Pentium branding was the mainstay for four generations of CPUs.

On that note, I kinda got distracted … looks like Girls’ Generation did a whole song to advertise Intel’s Core CPUs (for the kpop lovers) …

… which marked the beginning of the “retirement” of the Pentium name. No longer did Pentium mean flagship – instead, Pentium now denotes “one notch above Celeron (i.e. the bottom) and one notch below the Core i3”, which in turn sits below the i5 and i7. In some ways, it’s a bit of an undignified retirement of the name.

So, I guess it’s Happy 20th Birthday Intel Pentium … but maybe this is the last hurrah for the name.

C’mon Intel – Are You Being Serious?

Intel announced the Pentium Anniversary Edition G3258 for release in mid-2014 and it was received with mixed reactions. On the one hand, it seems that overclockers appreciated a low-cost unlocked part on which we could take our gambles. Unfortunately, this wasn’t the CPU we had wished for – as a Pentium-series CPU, it doesn’t have some of the special instruction sets and improved graphics cores that the Core i3 and above have. It also has only two cores and no HyperThreading, and as a result, its performance would be relatively limited regardless of how high it was overclocked. In fact, it turns out the CPU is just the unlocked cousin of the existing Intel Pentium G3420.

Relatively old-hat overclockers will remember the days when locked CPUs were not the norm, and when the Front Side Bus still existed and could be decoupled from other buses for very fine overclocking of any chip. You didn’t need to play by their rules by paying more for an unlocked chip and buying a more versatile board based on the Z-series chipsets.

As a result, some of the overclockers see no point in giving the G3258 any attention at all. Why would one spend significant amounts on an overclocking-motherboard, and maybe aftermarket cooling and high performance RAM when the CPU is likely to only give performance approaching a bottom end i3 CPU?

On the other hand, the CPU is AU$78, which can be considered inexpensive enough to play with. To try and address the motherboard cost issue for the Z97 series chipset, stripped down versions of motherboards have been offered – for example, the Asrock Z97M Anniversary edition for AU$112 and the Asrock Z97 Anniversary edition for AU$122, both claimed by Asrock to be the best motherboards for overclocking the Pentium Anniversary Edition (yeah right). I’m sure other manufacturers are coming out with theirs too, but they are less available at this time.

Again, I would have to give the cynical overclockers some credit, as neither of these boards is particularly good. In fact, it’s downright weird. As a “gateway drug” to overclocking higher-range CPUs, the Z97M Anniversary lets us down by utilizing a 3+1 phase arrangement which is rather limited when it comes to providing the power we need. The Z97 Anniversary board has a better 4+2 phase arrangement, but it has only an HDMI graphics output, omitting the DVI and VGA that the Z97M Anniversary has.

In all, I wouldn’t purchase either motherboard if you want to upgrade in the future. Those boards are best used with the G3258 for a “value” system with a bit of additional kick.

Others will yell loudly that the CPUs themselves should have been marked faster and that this isn’t real overclocking. Bulldust! Overclocking only comes about because of the binning process which CPU manufacturers use. This is mostly a marketing exercise, as chips have to be sorted into model numbers based on their tested performance. Sometimes you will have chips which only just meet the requirements, and in other cases you will have considerable headroom which you can utilize (i.e. the overclocking).

When the bins match the yield very closely, then you will find CPUs with very little headroom at each model number. Overclockers then cry in disappointment because they’re getting no free rides. Conversely, when the bins match the yield very loosely, then overclockers get excited because this means free performance to be gained.

Everything is still a gamble – there’s no saying your particular chip will go faster than the 3.2Ghz they’ve posted on the sticker, although, I suppose you’re likely to be very very disappointed if it doesn’t. I don’t think Intel puts effort into making sure every chip overclocks equally well, or at all. So yes, it is overclocking.

Keeping it Cheap

I think it’s clear that if you’re going to make the G3258 an economical proposition, you’re pretty much going to have to forego spending on any exotic components at all and instead work hard at trying to juice everything from what you already have.

As a result, I picked up the following:

  • Intel Pentium Anniversary Edition G3258 AU$78
  • Asrock Z97M Anniversary Edition AU$112
  • 2x4Gb Samsung DDR3 1600Mhz OEM AU$86

This brings the total for the system “core” parts to AU$276. The choice to go with two RAM modules was mainly to make use of the dual channel RAM capabilities, and improve performance, but limiting the amount of RAM and number of sticks should help with overclocking (as you’re often limited by your weakest part).

I decided to forego a case (for now, as I’ve entered this competition on OCAU and I’m hoping to win), and re-use an old spare crappy power supply. I’ll bench the components on the motherboard box, in open air, and use an old 100Gb laptop 2.5″ SATA drive for booting. The interest is on overclocking as a hobby.

Most overclockers will agree that a quality CPU cooler is a prerequisite for decent overclocking. To that, I will instead yawn and remind you all that your “pricey” Noctua NH-D14s are more expensive than the CPU itself, and it’s hardly a sensible spend. Instead, I will set myself the constraints of stock CPU cooling and the provided Intel paste (albeit, fan on maximum).

The Goodies – The CPU

The CPU comes in the classic Intel cardboard packaging, with blue colouration. The design, however, has been changed to reflect the anniversary nature.

DSC_7939

This particular chip’s details are as follows:

DSC_7940

The box also includes a cooling solution and a three year warranty, as noted on the rear. This particular chip comes from their Costa Rica plant.

DSC_7941

The sides list the details of the features supported, and the top has a window for you to admire the chip inside. The underside has the regular multilingual details.

DSC_7942 DSC_7943

DSC_7944

Inside, there is an installation booklet and McAfee offer.

DSC_7945

The rear of the booklet has the case badge.

DSC_7947

The CPU comes safely ensconced in a plastic bubble.

DSC_7949 DSC_7950

The chip, outside of the carrier, looks like this:

DSC_7956 DSC_7957

And nestled safely in the motherboard … almost ready to go!

DSC_7958

The included cooling solution is a regular Intel orb-style cooler – a bit thinner than the LGA 775 ones I’ve dealt with in the past.

DSC_7951 DSC_7952

The Goodies – The Motherboard

DSC_7953

The Asrock Z97M Anniversary motherboard is a mATX board with a decent feature set. The features include four DIMM slots, 1x PCI-e x16, 2x PCI-e x1, 6 SATA III ports, 2xUSB 3.0 on rear + 2x USB 3.0 on headers, 4xUSB 2.0 on rear + 4xUSB2.0 on headers, Realtek GbE, Onboard Audio, VGA+DVI+HDMI graphics output, PS/2 Keyboard and Mouse connectors. It also claims to have Elna capacitors in the audio path, but there’s a lot of marketing exaggeration as to the quality of the onboard sound solution …

DSC_7954

Unfortunately, at this price point, the 3+1 phase power delivery circuitry is a little limiting when it comes to overclocking, especially with higher-end CPUs. There is no Intel Gigabit LAN, nor support for M.2 SSDs or SATA Express, which are features of the Z97 chipset. It also seems, from a perusal of the BIOS, that the voltage going into the FIVR is not adjustable either.

DSC_7963

The board itself also seems to have the Z97M Anniversary model number labelled onto the PCB. The PCB itself is very similar to their H97M Anniversary board, which isn’t much cheaper at AU$106. The power is taken from a single four-pin auxiliary connector, which tells me that the board’s VRMs aren’t very beefy at all.

Conclusion

The Pentium name has been around for a long time. While in the later part of its life it has represented value-end products in a rather big “fall from grace”, it was the yardstick and flagship for many, many series of CPUs and remains a memorable name.

The Pentium Anniversary Edition is a commemorative release which doesn’t suit everyone. As rightly pointed out by some overclockers, its value proposition is, at best, suspect considering everything required to maximise the overclock; however, as a “free gamble” with good odds of reaching 4.5Ghz, it’s not a bad deal.

However, it’s not a foregone conclusion, as it may be possible to achieve acceptable performance gains without investing too heavily, through careful “no-frills” part selection.

Stay tuned for Part 2 here, where I push the G3258 to its absolute limits and find out just how high it can go.


An SSD in a USB 3.0 Enclosure: Part 3 – Sandisk Extreme II & Transcend SSD340 (again)

Those interested in the topic might also want to see Part 1 and Part 2 to better understand the caveats and limitations, as they will put this article in better context.

So far, I’ve been recommending SandForce-based drives for external enclosures, as they typically cope better with the lack of TRIM support thanks to sufficient overprovisioning, help from compression and sophisticated wear-levelling algorithms.

As discovered in the previous part, the Transcend StoreJet 25S3 supports UASP, and thus this set of experiments was performed on a UASP-capable machine (i.e. my Asus K55A refurbished laptop).

Using the Sandisk Extreme II 480Gb as a USB-SSD

In the previous part, it was determined that the Jmicron based Transcend SSD340 performed horribly once filled due to a lack of overprovisioning for wear levelling and garbage collection to work effectively. As a result, I wondered just how well the Sandisk Extreme II would perform given that it does NOT use a SandForce controller, but it DOES have a larger overprovisioning similar to SandForce drives.

As a result, my methodology was to fill the drive repeatedly and completely using HD Tune Pro’s write feature. If the drive can maintain its performance under these circumstances, it should operate well despite a lack of TRIM.

28-August-2014_00-44 28-August-2014_01-06 28-August-2014_01-52

From these results, we can see no dramatic loss in write speeds as we observed with the Jmicron-based drive before. Let’s take a look at the read performance …

28-August-2014_02-13

Pretty decent for a USB 3.0 connected SSD. With the SSD still in its dirty state, I decided to check what CrystalDiskMark thought of it.

cdm

All in all, the drive doesn’t seem to be hurting at all – the performance is very good. H2testw saw no corruption issues, so I suppose the Extreme II is a good candidate for TRIM-less usage, although I cannot comment on how well it resists power interruption.

h2testw

The Asmedia chipset used in the Transcend StoreJet 25S3 supports passing SMART data, so you can see how hard my SSD has been hammered in the name of research …

cdi

Of note is that the E9 parameter (likely to be actual flash writes) isn’t much higher than the F1 parameter, indicating low write amplification and, therefore, slower rates of wear even in TRIM-less situations. The overprovisioning provides enough wriggle room to rotate blocks around and optimize their wear.
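As a rough sanity check – and bearing in mind that reading E9 as NAND writes and F1 as host writes is this post’s assumption, since SMART attribute meanings are vendor-specific – write amplification can be estimated as the ratio of the two raw values:

```python
# Rough write-amplification estimate from SMART raw values, assuming
# 0xE9 ~ NAND/flash writes and 0xF1 ~ host writes (vendor-specific).
# The example values below are hypothetical, not read from the actual drive.

def write_amplification(nand_writes_raw, host_writes_raw):
    return nand_writes_raw / host_writes_raw

print(f"WA ~ {write_amplification(5300, 5100):.2f}")  # ~1.04: close to 1 is good
```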

Saving the Transcend SSD340 for USB-SSD by Manual Overprovisioning?

In doing the above, a little light bulb went off in my head. As we have already established that the SSD340 restores its performance upon a secure erase, what would happen if we chose never to write to some sectors at the end of the drive? The drive would know that those sectors were never used since the secure erase and should theoretically be able to use them as free blocks for block rotation.

So how can we establish the limits of the drive? Similarly to the above: by using HD Tune Pro’s Write feature with Short Stroke turned on, we can choose to use only the first x gigabytes (decimal) of the drive. By writing over and over, we can determine if there is degradation over time.

The first point to start at, for a 256Gb drive, is 240Gb (the default SandForce overprovision).
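If you don’t have HD Tune Pro handy, the same idea can be roughly approximated at the filesystem level with a throwaway script – fill a fixed amount of the drive, time it, and repeat, watching whether the average rate degrades. The drive letter and sizes below are placeholders, and HD Tune Pro’s write test works on the raw device, so treat this only as an approximation:

```python
# Approximate "fill the first X decimal GB and time it" at the filesystem level.
# TARGET and FILL_BYTES are placeholders; adjust to the drive under test.
import os
import time

TARGET = "E:/fill.bin"            # hypothetical mount point of the USB SSD
FILL_BYTES = 240 * 10**9          # first 240 decimal GB, as in the text
CHUNK = 16 * 1024 * 1024          # write in 16 MiB chunks

buf = os.urandom(CHUNK)           # incompressible data, generated once
start = time.time()
written = 0
with open(TARGET, "wb", buffering=0) as f:
    while written < FILL_BYTES:
        f.write(buf)
        written += CHUNK
elapsed = time.time() - start
print(f"average write rate: {written / elapsed / 1e6:.1f} MB/s")
```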

28-August-2014_03-27 28-August-2014_03-39 28-August-2014_03-52

Surprise surprise, the write performance is maintained across three fills of the first 240Gb. The “user” overprovision is noticeable as a “hump” in the write pattern which shifts along: the unused blocks are being rotated in on the next write, causing the hump to run along the graph. Cool! There’s hope after all!

Let’s be a little more greedy and try to get 248Gb out of it.

28-August-2014_10-48 28-August-2014_11-13 28-August-2014_11-30

There is a slight dip at the end, which is very small, but no permanent loss of write performance is evidenced, unlike for a full drive fill. So we’re good at 248Gb. Let’s try upping it a little more to 250Gb (i.e. the same as a Samsung 840 Evo).

28-August-2014_11-44 28-August-2014_12-10 28-August-2014_12-36 28-August-2014_12-56 28-August-2014_13-27 28-August-2014_13-57

This is where everything falls apart. The drive starts losing write speed in parts of the drive on subsequent runs, before the whole drive’s write speed is cut in half. In order, the average write rate goes:

  1. 303.2Mb/s
  2. 232.6Mb/s
  3. 188.3Mb/s
  4. 220.3Mb/s
  5. 143.1Mb/s

It doesn’t quite decrease consistently, but it certainly does go down. As a result, this drive can have about 248Gb of its surface filled before the performance goes downhill.

So how can we enforce this? By secure-erasing the drive and then partitioning it, of course. It’s important to note that we have to do a conversion – the magic number is 248Gb in decimal gigabytes, but most partitioners want the partition size in binary megabytes.

We can do the conversion by multiplying 248 (decimal Gb) by 1,000,000,000 (bytes in a decimal Gb) and then dividing by 1,048,576 (bytes in a binary Mb). This gives us 236,511.23, so let’s round down to 236,511Mb. Remember, there’s no harm in over-overprovisioning!
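A one-liner makes the conversion (and the rounding direction) explicit; the result is the same 236,511 used for the partition:

```python
# Convert a decimal-gigabyte target into the binary megabytes most
# partitioning tools expect, rounding down so we never under-provision.
def decimal_gb_to_binary_mb(gb):
    return gb * 1_000_000_000 // 1_048_576   # floor division = round down

print(decimal_gb_to_binary_mb(248))   # -> 236511
```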

In order to check if the drive is able to maintain its performance, I decided to fill the drive five times with H2testW and make a note of the write speeds. Overall, we witnessed no significant degradation in speed, implying that the strategy worked.

h2tw5 h2tw4 h2tw3 h2tw2 h2tw

After this, with the drive still completely dirty, I decided to give it a few benchmarks on my non-UASP capable system just to check that it’s all good.

transcend-nonuasp-cdm trasncend-AS-SSD-nonuasp StoreJet Transcend USB Device_256GB_1GB-20140828-1850 transcend-nonuasp-Atto

Of course, the results aren’t as fast as the UASP machine reports, but it’s miles faster than the dog-slow performance before.

Conclusion

Drives which feature significant overprovisioning (e.g. 120Gb vs 128Gb, 240Gb vs 256Gb, 480Gb vs 512Gb) might seem to be the worse buy to an end consumer. They seem to offer less space than their competitors without any tangible benefits in specifications.

The truth is that both drives contain the same amount of flash memory, and the additional flash memory improves the I/O consistency of the drive, especially in TRIM-disabled environments (i.e. RAID or USB). It is an important part of reducing write amplification, keeping write performance high and allowing wear levelling to work most effectively.

We have demonstrated that, through manual overprovisioning, it is possible to turn a drive which works poorly without TRIM into one which performs acceptably in non-TRIM circumstances, provided the controller has a “sane” block rotation and wear-levelling algorithm. It’s likely, however, that should a significant number of blocks fail and be reallocated, the drive’s performance will fall again due to a lack of spare blocks for rotation.
