As an engineer, I get excited whenever I have any form of optimization to perform. Overclocking is a kind of optimization, depending on how you do it – in this case it’s attempting to extract the maximum clock rate, with the lowest core voltage (and heat) practicable, maintaining a level of stability which is high enough for everyday use, using parts of the lowest cost, within the constraints of the stock CPU cooler, paste and weak VRMs on the motherboard. Having previous experience with Haswell overclocking, I dove right in and tried my hand at several things.
From my initial experiences with the ASRock Z97M Anniversary, overclocking was slightly more difficult than I would have liked. Due to a settings conflict somewhere (I don’t know where), the board booted into Windows at stock speed despite my selecting the provided overclock defaults, and could only be overclocked with the A-Tuning software. Resetting the CMOS using the UEFI Setup Utility didn’t help. The strange thing was that the UEFI utility insisted the CPU was overclocked, while everything else, including A-Tuning, insisted that it wasn’t.
It was only after clearing the CMOS with the hardware jumper that I managed to get the board to boot with the overclocked settings.
I started with the pre-sets, but soon opted to “go it my own”. Important parameters to tweak are:
- CPU Multiplier (selects the Core speed)
- Cache Multiplier (selects the Uncore speed)
- GPU Clock Rate (overclocks the integrated GPU)
- CPU Core Voltage (selects the voltage the FIVR gives the cores)
- CPU Cache Voltage (selects the voltage the FIVR gives the uncore)
- Current Limit (allows for higher power usage)
- Turbo Power Limit (allows for higher power usage)
- FIVR Operating Mode (select high performance over efficiency)
- FIVR Faults (disable to prevent issues)
- GPU Voltage (selects the voltage the FIVR gives the GPU)
- RAM Clock Base (selects 100/133MHz-based RAM increments)
- RAM Rate (selects the multiplier)
- RAM Latencies (to fine tune the RAM)
- Analog I/O voltage (to add stability for RAM overclocking)
- Digital I/O voltage (to add stability for RAM overclocking)
- System Agent voltage (to add stability for RAM overclocking)
- DIMM Voltage (changes the voltage to the RAM)
Ultimately, given the price of the CPU, I didn’t mind killing it at all, so I went through it in a rather cut-throat way.
The methodology behind overclocking Haswell CPUs can be simplified to five stages. By following them in order, it is possible to reduce the number of headaches and “I don’t know what caused it to BSOD” problems, while also reducing the time it takes to reach full stability.
0. Set the BIOS settings
The first thing to do is to set the BIOS settings correctly. If you’re not comfortable setting them from scratch, I would suggest loading a low overclock default as a basis. You will need to disable the current limits and FIVR faults, and identify where your multipliers are. You also need to verify (say, with CPU-Z) that your board is doing what it says it’s doing, i.e. setting the multipliers correctly. Consider setting the fans to full speed so that cooling performance stays constant. You might also want to disable thermal throttling (at your own risk) so that the CPU runs at full speed at all times, regardless of temperature.
1. Optimize Core
The first overclocking stage is to optimize the core independently of everything else. The fastest way to do this is to increase the core voltage to the highest you can accept (likely to be 1.400v or less for air cooling) within thermal limits (check temperatures using HWMonitor or CoreTemp), and then increase the multiplier until the machine no longer boots. Then dial back the multiplier, and try to pass 24 hours of the Prime95 mixed torture test. Once you can pass 24 consecutive hours, you know you’re stable at that multiplier.
Remember that the temperatures reported are junction temperatures, and the temperatures reported on the datasheets are case temperatures. It would be wise to stay under 95 degrees C at the peak, and others will advise you to be even more conservative. But if it doesn’t crash, you’re pretty much good as real life workloads are unlikely to cause as much stress as Prime95 will.
Once you have this, it would be advisable to start the voltage optimization routine, by repeating the above but instead of increasing the multiplier, your job is to reduce the voltage in small steps (0.025v at the most, preferably 0.010v) until it breaks, and then go back up one or two steps. This will reduce the heat produced and improve the electrical efficiency of your system.
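The two passes described above (raise the multiplier at a fixed voltage, then lower the voltage at the chosen multiplier) can be sketched as a simple search. This is only an illustration under stated assumptions: `is_stable` is a hypothetical stand-in for the manual work of applying the values in the UEFI, booting and running Prime95, and `toy_chip` is an invented stability model, not a measurement of any real CPU.

```python
from typing import Callable

# Hypothetical callback: stands in for "set the values in the UEFI, boot,
# and run the Prime95 torture test". Returns True if the run passed.
StabilityTest = Callable[[int, float], bool]


def find_max_multiplier(is_stable: StabilityTest, vcore: float,
                        start: int, limit: int = 60) -> int:
    """Raise the multiplier one step at a time at a fixed core voltage."""
    if not is_stable(start, vcore):
        raise ValueError("starting multiplier is not stable at this voltage")
    mult = start
    while mult < limit and is_stable(mult + 1, vcore):
        mult += 1
    return mult  # highest multiplier that passed


def find_min_vcore(is_stable: StabilityTest, mult: int, vcore_start: float,
                   step_mv: int = 10, back_off_steps: int = 2) -> float:
    """Lower the voltage in small steps until a run fails, then back off.

    back_off_steps is counted from the failing voltage, so 1 returns the
    lowest voltage that passed and 2 adds one extra step of margin.
    """
    start_mv = round(vcore_start * 1000)  # integer millivolts avoid float drift
    if not is_stable(mult, start_mv / 1000):
        raise ValueError("not stable at the starting voltage")
    lowest_pass = start_mv
    while lowest_pass > step_mv and is_stable(mult, (lowest_pass - step_mv) / 1000):
        lowest_pass -= step_mv
    chosen = lowest_pass + (back_off_steps - 1) * step_mv
    return min(chosen, start_mv) / 1000


# Invented toy model for demonstration only: pretend the chip needs
# 1.000 V at 32x plus 25 mV for every multiplier step above that.
def toy_chip(mult: int, vcore: float) -> bool:
    return round(vcore * 1000) >= 1000 + 25 * (mult - 32)


if __name__ == "__main__":
    print(find_max_multiplier(toy_chip, vcore=1.400, start=32))
    print(find_min_vcore(toy_chip, mult=46, vcore_start=1.400))
```

In practice each call to `is_stable` costs a reboot and a long torture run, so it pays to start the voltage search from a multiplier you have already dialed back rather than from the absolute boot limit.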
2. Optimize Uncore
The next step is to optimize the uncore – this is the multiplier which controls the cache. While the chip is shipped normally running both core and uncore at synchronous rates (i.e. 32 and 32), for high overclocks, it is likely that this isn’t possible. Uncore seems to top out at 39-42 depending on the chip.
While it isn’t an absolute necessity to run the cache synchronously with the CPU, there are slight performance benefits to overclocking the uncore due to the increased cache bandwidth. As a result, you should increase your uncore multiplier, starting from 38-39, see where it fails the 24-hour Prime95 test, and then back it off.
The voltage to the uncore is generally less important as it contributes less heat. I would suggest setting it to 1.2 or 1.25v and leaving it.
3. Optimize RAM
The third step is to optimize the RAM. It’s helpful if you have the stock latencies from the SPD at hand (see CPU-Z). Generally, the Intel CPUs benefit greatly from increased RAM clock rate even at the expense of latency.
To prepare for RAM overclocking, I would suggest increasing the Digital I/O, Analog I/O and System Agent voltages by +0.2v and the Vdimm to 1.65v, and leaving them there. Then, start pushing the clock rate while manually adjusting latencies, as not all boards will make sensible decisions. Do change the DRAM base clock to 133MHz if that helps you eke a little more out of your RAM. You should also consider changing from 2T/2N to 1T/1N to improve performance.
To work out the latency, I’ve made this handy table (especially handy for cheap RAM):
For example, let’s say we have cheap DDR3 1600MHz RAM with a latency of 11-11-11-28 at 1600MHz. You can see that the CAS latency (first number) corresponds to ~6.88ns. If you want to try running this at 1866MHz, you look up the number nearest to this and find that a CAS latency of 13 at 1866MHz corresponds to 6.96ns, which should be safe. Likewise, if you then decide to push it to 2400MHz, you should probably try a CAS latency of 15 to keep a similar amount of “absolute” time.
How do you determine the big number at the end? Well, as a very rough rule of thumb, you should be using something around twice the CAS latency + 1 or 2 as a minimum. So if you’re trying 15-15-15-?, the ? should be 31 or 32 as a minimum.
This gives you a nearly safe starting point to decide what manual timings to apply.
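The lookup can also be done with a few lines of arithmetic. The sketch below is my own reconstruction, not the original table. The figures quoted above (6.88ns for CAS 11 at 1600) match the number of clocks divided by the advertised rate. Because DDR memory transfers twice per clock, the real I/O clock is half the advertised rate and the true time is double that figure; the ratios between settings are unaffected. All function names are hypothetical.

```python
import math
from typing import Tuple


def quoted_ns(clocks: int, rate: float) -> float:
    """Latency as quoted in the text: clocks divided by the advertised rate."""
    return clocks / rate * 1000.0


def true_ns(clocks: int, rate: float) -> float:
    """Latency against the real I/O clock, which is half the advertised rate."""
    return 2.0 * quoted_ns(clocks, rate)


def starting_cas(base_cas: int, base_rate: float, new_rate: float) -> int:
    """Smallest CAS at new_rate that is no shorter in time than the base."""
    return math.ceil(base_cas * new_rate / base_rate)


def starting_tras(cas: int) -> Tuple[int, int]:
    """Rule of thumb from the text: twice the CAS latency plus 1 or 2."""
    return 2 * cas + 1, 2 * cas + 2


if __name__ == "__main__":
    print(f"CAS 11 at 1600: {quoted_ns(11, 1600):.2f} ns quoted, "
          f"{true_ns(11, 1600):.2f} ns true")
    for rate in (1866, 2400):
        cas = starting_cas(11, 1600, rate)
        print(f"{rate}: start at CAS {cas}, tRAS {starting_tras(cas)}")
```

Note that the strict round-up gives a CAS of 17 at 2400, so a CAS of 15 there is already somewhat tighter in absolute terms (6.25ns against 6.88ns in the quoted units) and may need more testing.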
Then you can tighten these and see if you can still keep it stable. Memtest-86 can be helpful, although this needs to be followed by a Prime95 run to be sure. Note that Memtest doesn’t show the right CPU clock rate – this is normal.
The important thing to realize is that although overclockers call raising these numbers “loosening” the timings, the timings are counts of clock cycles at a given clock rate. If you increase the clock rate, each cycle is shorter, so it takes more cycles for the same amount of time to pass. As a result, you might still get a net benefit from increasing your clock rate (bandwidth) even if you have to loosen your timings.
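To make the clocks-versus-time distinction concrete, here is a small sketch comparing the two RAM configurations used in this post (1600 at CAS 11 and 2666 at CAS 14). The `RamSetting` class is hypothetical, and the bandwidth figure is the theoretical peak for a single 64-bit channel, not a measured result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RamSetting:
    rate: float  # advertised rate in MT/s (the "MHz" figure on the label)
    cas: int     # CAS latency in clock cycles

    @property
    def cas_ns(self) -> float:
        # CAS is counted in I/O clock cycles, and that clock is rate / 2.
        return self.cas / (self.rate / 2.0) * 1000.0

    @property
    def peak_gb_per_s(self) -> float:
        # One 64-bit (8-byte) channel; theoretical peak, not measured.
        return self.rate * 8 / 1000.0


stock = RamSetting(rate=1600, cas=11)
overclocked = RamSetting(rate=2666, cas=14)

if __name__ == "__main__":
    for name, s in (("stock", stock), ("overclocked", overclocked)):
        print(f"{name}: CL{s.cas} = {s.cas_ns:.2f} ns, "
              f"{s.peak_gb_per_s:.1f} GB/s per channel")
```

Even though the CAS number goes up from 11 to 14, the time it represents drops from 13.75ns to about 10.50ns, while the peak per-channel bandwidth rises from 12.8 to about 21.3 GB/s.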
4. Optimize GPU
The final piece is to optimize the GPU. For this, I like to push the voltage up to about 1.300v and then move the clock up in 100MHz increments, running a 3D test application at each step (even the Windows System Assessment Tool is enough in most cases). When the GPU is unstable, the graphics subsystem driver resets and the screen blinks. Back off by 100MHz once you hit this point and you’re often fine. The GPU can be tweaked in 50MHz increments if necessary.
It might seem complicated at first, and in some ways, it can be. But if you’re willing to live with a more limited overclock, then the existing “one touch” settings might be more to your taste.
Here are my finalized settings, which pass 24 hours of Prime95 on the stock cooler without ever exceeding 91 degrees C:
- Core Multiplier: 46x (stock: 32x)
- Cache Multiplier: 40x (stock: 32x)
- Vcore: 1.375v (stock: 1.090v)
- Vcache: 1.200v
- Boost Power Max: 1000
- Short Power Max: 1000
- GPU Clock: 1700MHz (stock: 1100MHz)
- GPU Voltage: 1.300v
- System Agent: +0.2v (stock: 0v)
- Analog I/O Voltage: +0.2v (stock: 0v)
- Digital I/O Voltage: +0.2v (stock: 0v)
- RAM Clock: 2666MHz (stock: 1600MHz)
- RAM Timings: 14-14-14-30-1T (stock: 11-11-11-28-2T)
- RAM Voltage: 1.65v (stock: 1.50v)
- CPU Fan: 100% Fixed Mode
The proof is in the screenshot:
Looking at the stock VID of 1.090v, the chip is considered an “okay” to “good” chip. It’s not an excellent gem, as shown by the high 1.375v core voltage required to achieve 4.6GHz. Anandtech’s sample managed to do 4.7GHz at 1.375v, so mine’s not far off the pace.
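As a rough illustration of why the voltage matters so much for heat, the sketch below uses the common first-order approximation that dynamic power scales with voltage squared times frequency. This is an assumption on my part rather than a measurement: it ignores leakage, and the stock VID is not necessarily the voltage the chip actually sees under load.

```python
def relative_dynamic_power(v_old: float, f_old: float,
                           v_new: float, f_new: float) -> float:
    """Ratio of dynamic power, assuming P is proportional to V^2 * f."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Stock (1.090v, 32x) against the final overclock (1.375v, 46x).
stock_to_oc = relative_dynamic_power(1.090, 32, 1.375, 46)

# Effect of shaving a single 0.025v step at the same 46x multiplier.
one_step_down = relative_dynamic_power(1.375, 46, 1.350, 46)

if __name__ == "__main__":
    print(f"overclock vs stock: {stock_to_oc:.2f}x dynamic power")
    print(f"one 0.025v step lower: {(1 - one_step_down) * 100:.1f}% less")
```

Under this approximation the overclock draws roughly 2.3 times the stock dynamic power, and each 0.025v step removed saves a few percent, which is why the voltage reduction pass is worth the time on a stock cooler.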
The biggest surprise was the level to which the Samsung OEM RAM (which has no heatspreaders or any fancy features) managed to overclock while simultaneously tightening the timings somewhat in absolute terms (despite how it appears). This actually has a positive performance benefit, as I will demonstrate in the Benchmarking section.
One unfortunate thing seems to be that this is the limit of the CPU and the motherboard as well. The VRMs are squealing fairly loudly under heavy computational load, normally indicative of slight overloading or stress. Nothing’s blown up or gotten too hot yet … but it means we are at the limit of the cooling, VRMs, CPU and RAM almost simultaneously – the kind of coincidence which helps you optimize every last drop from your hardware.
Before all of that serious benching, I wanted to see just how fast this thing can go, irrespective of long term stability. The way to achieve this is to push the voltage up to the maximum you (and your board) can tolerate in the short term, and then push the core rate up until you freeze and/or fail to submit your validation. My CPU managed this impressive performance on both cores:
My first validation, and my first CPU at 5GHz. This is a pretty cool present, Intel!
Raw clock numbers are rarely entirely indicative of the performance gains in actual use cases. As a result, it is important to benchmark with a wide suite of applications which reflect your actual uses and may be subject to bottlenecks from various other components within the system. It also provides us an opportunity to investigate the performance difference between running DDR3 1600 at 11-11-11-28 and DDR3 2666 at 14-14-14-30 (fairly “lousy” latencies by 2666 standards).
Unfortunately, there are a great many benchmarks available and it would be impossible to cover them all. Some have restrictive licensing agreements or software limitations that make them less useful. The benchmarks used below are:
- Intel Burn Test
- Maxon Cinebench R15
- Intel Extreme Tuning Utility (XTU)
- Passmark PerformanceTest 8.0
- PC Wizard 2014
The first results column represents running at stock speed with stock RAM; the next column represents running at overclocked speed with the RAM at 1600MHz 11-11-11-28. The column next to it gives the percentage improvement. This is repeated for running overclocked with the RAM at 2666MHz 14-14-14-30.
Overall, it seems that running the RAM with the looser timings but higher clock has a nearly 5% performance advantage on average. Some benchmarks, however, showed no improvement or even a marginal decrease. Overall, the CPU frequency increased by 43.75% and the benchmark results rose by 34.34-39.12%, which is a fairly significant amount, although slightly less than the core frequency increase would suggest.
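The arithmetic behind these percentages is simple enough to check by hand. The sketch below recomputes the frequency increase from the 32x and 46x multipliers, and expresses the two ends of the benchmark range as a fraction of it. The "scaling efficiency" name is my own shorthand, not a standard metric.

```python
def percent_increase(old: float, new: float) -> float:
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0


def scaling_efficiency(gain_pct: float, freq_pct: float) -> float:
    """Fraction of the clock increase that showed up as benchmark gain."""
    return gain_pct / freq_pct


freq_gain = percent_increase(32, 46)  # stock and overclocked multipliers

if __name__ == "__main__":
    print(f"core frequency: +{freq_gain:.2f}%")
    for gain in (34.34, 39.12):  # low and high ends of the range above
        eff = scaling_efficiency(gain, freq_gain)
        print(f"+{gain:.2f}% benchmark gain = {eff:.0%} of the clock increase")
```

By this measure, between roughly 78% and 89% of the extra clock speed turned into benchmark gains.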
It seems that my overclocking has yielded fairly significant benefits, although not without some effort. The CPU managed to reach 4.6GHz on both cores, stably, under stock cooling, and to push a 1600MHz DDR3 pair up to 2666MHz. Quite impressive indeed for the price.
In terms of performance, the Pentium has been elevated to a mid-to-high Core i3 level, sitting between the 4350 and 4360 in terms of Passmark CPUBenchmark scores. It’s a decent chunk of performance for a basic machine, at a decent price. Add to that a weekend of fun for me, and that’s money well spent!
Again, I will state, if you have no intention of overclocking, the CPU is a pretty “regularly” priced low-end CPU with no real distinguishing features. But if you do, you can definitely get some improvement for free – just don’t expect it. It’s not for everyone.