Video Compression Testing: x264 vs x265 CRF in Handbrake 0.10.5

Having played around with video since the days when I had a few multimedia CD-ROMs and a BT878-based TV tuner card, video compression is one area that has always amazed me. I watched as early “simple” compression efforts such as Cinepak and Indeo brought multimedia to CD-ROMs running at 1x to 2x, good enough for interactive encyclopedias and music video clips. The quality wasn’t as good as TV, but it was constrained by the computing power available at the time.

Because of the continual increase in computing power, I watched as MPEG-1 brought VCDs, with roughly VHS quality, into the same amount of storage as normally taken by uncompressed CD-quality audio. Then MPEG-2 heralded the era of the DVD, SVCD and most DVB-T/DVB-S transmissions, with a claimed doubling of compression efficiency. Before long, MPEG-4 Part 2 (ASP) was upon us, with another doubling, enabling a lot of “internet” video (e.g. DivX/Xvid). Another bump was achieved with MPEG-4 Part 10 (H.264/AVC), which improved efficiency to the point where standard-definition “near-DVD-quality” video could fit into the same sort of space as CD-quality audio.

Throughout the whole journey, I have been doing my own video comparisons, though mostly empirically: testing out several settings and seeing how I liked them. In the “early” days of each of these standards, it was a painful but almost necessary procedure to optimize the encoding workflow and achieve the required quality. I had to endure encode rates of about an hour for each minute of video when I first started with MPEG-1, and then again with MPEG-2, MPEG-4 ASP and MPEG-4 AVC in turn. Luckily, the decode rates were often “sufficiently fast” that the output could be rendered in real-time.

Developments in compression don’t stop. Increased computing power allows more sophisticated algorithms to be implemented. Increasing use of internet distribution and continual pressure on storage and bandwidth provide motivation to transition to an even more efficient form of compression, trading off computational time for better efficiency. Higher resolutions, such as UHD 4K and 8K, are likely to demand such improvements to become mainstream and to avoid overtaxing the limited bandwidth available in distribution channels.

The successor, at least in the MPEG suite of codecs, is MPEG-H Part 2, otherwise known as High Efficiency Video Coding (HEVC) or H.265. First completed in 2013, the standard promises almost another halving of bitrate for the same perceptual quality, and is slowly seeing adoption owing to the increase in 4K cameras and smartphone SoCs with inbuilt hardware-accelerated decoding/encoding. Unfortunately, licensing appears to be one of the areas holding HEVC back.

Of course, it’s not the only “next generation” codec available. VP9 (from Google) directly competes with HEVC, and has been shown by some to have superior encoding speed and similar video performance, although support is more limited. Its successor has been rolled into AOMedia Video 1, which is somewhat obscure at this time. From the Xiph.Org team there is Daala, and from Cisco there is Thor. However, in my opinion, none of these codecs has quite reached the “critical mass” of adoption needed to make them as hardware-embraced and universally accessible as the MPEG suite of codecs.

I did some initial informal testing on H.265 using x265 late last year, but it was not particularly extensive because of time limitations and needing to complete my PhD. As a result, I didn’t end up writing anything about it. This time around, I’ve decided to be a little more scientific to see what would turn up.

Before I go any further, I’ll point out that video compression testing is an area where there are many differing opinions and objections to certain types of testing and certain sorts of metrics. As a science, it’s quite imprecise because the human physiological perception of video isn’t fully understood, so there are many dissenting views. There are also many settings in the encoding software which can impact the output quality, and some people have very strong opinions about how some things should be done. The purpose of this article isn’t to debate such issues, although where there are foreseeable objections, I will enclose some details in blockquotes, such as this paragraph.

Motivation

The main motivation of the experiment was to understand how x265 compares with x264 in encoding efficiency. Specifically, I was motivated by this tooltip dialog in Handbrake that basically says “you’re on your own”.

[Image: rf-window]

As a result, I had quite a few questions I wanted to answer in as short a time as possible:

  • What is the approximate bitrate scale for the CRF values and how does it differ for x264 vs. x265?
  • How does this differ for content that’s moderately easy to encode, and others which are more difficult?
  • How do x264 CRF values and x265 CRF values compare in subjective and synthetic video quality benchmarks?
  • What are the encoding speed differences for different CRF values (and consequently bitrates), and how does x264 speed compare to x265 speed?
  • How do my different CPUs compare in terms of encoding speed?
  • Does x265 handle interlaced content properly?

I therefore had to develop a test methodology to try to address these issues.

Methodology

Two computers running Windows 7 (updated to the latest set of patches at publication) were used throughout the experiment – an AMD Phenom II x6 1090T BE @ 3.9GHz was used to encode the “difficult case” set of clips, and an Intel i7-4770k @ 3.9GHz was used to encode the “average case” set of clips. The encoding software was Handbrake 0.10.5 64-bit edition. The x264 encoding was performed by x264 core 142 r2479 dd79a61, and the x265 encoding was performed by x265 1.9.

The test clips were encoded with Handbrake in H.264 and H.265 for comparison at 11 different CRF values, evenly spaced from 8 to 48 inclusive (i.e. spaced by 4). For both formats, the preset was set to Very Slow, and no encoder tuning was used. The H.264 profile selected was High/L4.1, whereas for H.265, the profile selected was Main. It was later determined that the H.265 level was L5, so there is some disparity in the feature sets; however, High/L4.1 is most common for Blu-Ray quality 1080p content, and a matching setting was not available in Handbrake for x265. In the additional options box, interlace=tff was used for the difficult case to correspond with the interlaced status of the content. No picture processing (cropping, deinterlacing, detelecining, etc.) within Handbrake was enabled.
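Although I drove the encodes through the GUI, the same sweep can be scripted with HandBrakeCLI. This is only a minimal batch-file sketch of the idea – the file names are placeholders, and the exact spelling of the preset flag (--x265-preset here) should be verified against HandBrakeCLI --help for your version:

:: encode the difficult case at each CRF value with x265
for %%C in (8 12 16 20 24 28 32 36 40 44 48) do (
  HandBrakeCLI -i source.m2ts -o x265-crf%%C.mp4 -e x265 -q %%C --x265-preset veryslow -x interlace=tff
)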

Final bitrates were determined using Media Player Classic – Home Cinema’s information dialog and confirmed with MediaInfo. Encoding rate was determined from the encode logs. As the AMD system was my “day to day” system, it was in use during several encodes, resulting in outlying reduced encode-rate numbers; these have been marked as outliers.
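For scripted collection, the CLI build of MediaInfo can report the figure directly; a sketch, typed at a command prompt (the file name is a placeholder, and inside a batch file the % signs would need doubling):

MediaInfo --Inform="Video;%BitRate%" x265-crf20.mp4

This prints the video stream’s bitrate in bits per second.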

The encoded files and the source file were then transcoded into lossless FFV1 AVI files using FFmpeg (version N-80066-g566be4f built by Zeranoe) for comparison (noting that no colourspace conversion occurred; the files remained YUV 4:2:0). This was because skipping this step resulted in unusual behaviour with implausible SSIM/PSNR figures. Frame alignment of the files was verified using VirtualDub by checking for scene-change frames – in the case of the “difficult case” video, the first frame of the source file was discarded, as Handbrake did not encode that frame, to maintain video length and frame alignment. The “average case” video did not need any adjustments.
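The transcode itself is a one-liner per file; a sketch along the lines of what was done, where the file names are placeholders and dropping the audio (-an) is my assumption, since only the video was compared:

ffmpeg -i x265-crf20.mp4 -an -c:v ffv1 -pix_fmt yuv420p x265-crf20-ffv1.avi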

Pairs of files were compared for SSIM and PSNR using the following command:

ffmpeg -i [test] -i [ref] -lavfi "ssim;[0:v][1:v]psnr" -f null -
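For finding the worst frames, the same filters can also write per-frame logs via their stats_file option (the log file names here are arbitrary):

ffmpeg -i [test] -i [ref] -lavfi "ssim=stats_file=ssim.log;[0:v][1:v]psnr=stats_file=psnr.log" -f null -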

Results were recorded and reported; the produced data is available in the Appendix at the end of this post.

Two frames from each video were extracted, and a 320×200 crop from a detailed section was assembled into a collage for still image comparison. The frames were chosen to be at least two frames away from a scene cut to avoid picking a keyframe. This was performed by extracting .bmp files with FFmpeg (a conversion from YUV 4:2:0 to RGB24), then assembling in Photoshop and exporting to lossless PNG to avoid corrupting the output.
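A sketch of the extraction step for one frame – the frame number matches a later example, but the crop offsets are purely illustrative:

ffmpeg -i x265-crf20-ffv1.avi -vf "select=eq(n\,215),crop=320:200:640:300" -vframes 1 frame0215.bmp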

Subjective video quality was assessed using my Lenovo E431 laptop connected to a Kogan 50″ LED TV, previously calibrated by eye to ensure highlights and shadows do not clip. Viewing was done at 2.5*H distance from the screen in a darkened room. Overscan correction was applied; however, all other driver-related enhancements were disabled. Frame-rate mode switching in MPC-HC was used to avoid software frame-rate conversion. TV motion smoothing was not available, ensuring the viewed result is consistent with the encoded data. Subjective opinions at each rate were recorded.

The clips used were:

  • Average case: a Gfriend music video (progressive content)
  • Difficult case: a Girls’ Generation clip (interlaced 29.97fps, Blu-Ray sourced)

Approximations of the clips can be found on YouTube; however, the actual video files differ slightly (especially for the difficult case, where the online video is missing a few tens of seconds). The encoding by YouTube is also relatively poor compared to the source. Unfortunately, as the source clips are copyrighted, I can’t distribute them.

The clips were chosen for several reasons – I had good quality sources of both samples, which meant a better chance of seeing encoding issues; I was familiar with both clips; and both clips feature segments with high-sharpness detail. The difficult case is especially tricky to encode, as the background has high spatial-frequency detail whereas the “focal point” of the dancing girl-group members has relatively “low” frequency detail, so encoders often get it wrong and devote a lot of attention to the background. It also has a lot of flashing patterns which are quite “random” and require high bitrates to avoid turning into “mush”. (I did consider using T-ARA – Bo Peep as the difficult case clip, but that was mostly “fast cuts” increasing the difficulty, rather than any tricky imagery, plus my source quality was slightly lower.)

At this point, some people will object to the use of compressed material as the source. Normal objections include the potential to preference H.264, as the material was H.264-coded before, and the potential for detail already lost in the source to render high-CRF encodes “meaningless”.

However, I think it’s important to keep in mind that if you expect the output to resemble the potentially imperfect compressed input, this is less of an issue – the reference is the once-encoded video.

The second thing to note is that I’ve chosen the sample clips with the highest bitrate and cleanest quality available to me – this maximises the potential for noticing encoding problems.

Thirdly, it’s also important to note that transcoding is a legitimate use of the codec – most people do not have the equipment to acquire raw footage, and most consumer-grade cameras have already compressed the footage. Other users are likely to be format-shifting and transcoding compressed to compressed. Thus, testing in a compressed-to-compressed scenario is not invalid.

Results: Bitrate vs CRF

It’s an often-touted piece of advice that a change of CRF by +/- 6 will halve/double the bitrate, and suggested rate factors normally fall roughly in the 19 to 23 range. Because I had no idea what bitrate a given CRF value would produce, and whether x265 adheres to the same convention, I found out by plotting the resulting bitrates on a semi-log plot and curve fitting.

[Image: bitrate-vs-crf-value]

In the case of the difficult case for x264, the bitrate at the upper-end CRF 8 fell off because it had reached the limits of the High@L4.1 profile. Aside from that, the lines are somewhat wavy but still close to an exponential function, with exponent magnitudes ranging from 0.108 to 0.136.

As a result, from the curve fits, it seems that for x265 it takes a CRF movement of 5.09667 to 5.5899 to see a halving/doubling in size, while for x264 it took 5.68153 to 6.41801. It seems that x265 is slightly more sensitive to the CRF value in setting its bitrate (average ~5.34 as opposed to ~6.05).
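For those wondering how these figures fall out of the fits: with the fitted exponential form, the halving/doubling step depends only on the exponent:

bitrate(CRF) ≈ A · e^(−k · CRF)
ΔCRF for halving = ln(2) / k

For example, ln(2) / 0.136 ≈ 5.10 and ln(2) / 0.108 ≈ 6.42, which bracket the figures above.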

Readers may be concerned that my x264 examples use a different profile and level (High@L4.1) versus the x265 (Main@L5). It is acknowledged that this will cap the output quality – in future, I’ll try to match the encode levels, but that is not directly configurable for x265 at present from Handbrake.

Results: Bitrate Savings at CRF Value

On the assumption that the CRF values correspond to the same quality of output, how much bitrate do we save? I tried to find out by comparing the bitrate values at given CRFs.

[Image: bitrate-ratio-graph]

The answer is less straightforward than expected. For the difficult case, the x265 output averaged 92% of the x264 output size, but varied quite a bit – in some cases, at higher CRFs, being larger than the x264 output. The average case displayed an average size of 59%, which is more in line with expectations and is mostly stable around the commonly-used CRF ranges.

Then, naturally, comes the actual question of whether the CRF values provide the same perceived quality.

Results: SSIM and PSNR

There are two main methods used to evaluate video quality – namely Structural Similarity (SSIM) and Peak Signal-to-Noise Ratio (PSNR). These metrics are widely used, and are easily accessible thanks to FFmpeg filters. Their characteristics differ somewhat, with SSIM attempting to be more perceptual, so it’s helpful to look at both.

At this point, many encodists may point out the existence of many other, potentially better, video quality judgement schemes. Unfortunately, they’re less easily accessible, they’re less widely used, and there will almost certainly be debates as to whether they correlate with perception or not.

This area is continually being contested, so I’d rather stick to something which has been widely used and whose caveats are known to some extent. In the case of SSIM and PSNR, one of the biggest disadvantages to my knowledge is that they make no temporal assessment of quality. They are also source-material sensitive, and are not very valid when comparing across different codecs. Of course, we can’t rely solely on synthetic benchmarks.
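As an aside, the SSIM “dB” figures below follow FFmpeg’s convention of expressing how far SSIM falls short of 1:

SSIM (dB) = −10 · log10(1 − SSIM)

For example, an SSIM of 0.995211 gives −10 · log10(0.004789) ≈ 23.20dB, matching the appendix tables.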

[Image: ssimnorm-vs-crf-value]

We first take a look at the SSIM versus CRF graph. In this graph, using the normalized (to 1) scale of SSIM, we can see the quality “fall-off” as CRF values are increased. The slope is steeper for the difficult case clips than for the average case. For the average case, the SSIM is almost tit-for-tat between x265 and x264 at each CRF value, with the exception of CRF 48. Between the difficult case clips, there is a ~0.015 quality difference favouring x264.

[Image: ssimnorm-vs-bitrate]

For fun, we can also plot this against bitrate to see what happens. In the average case, the lines are very close together, and the quality takes an abrupt turn for the worse at about 4Mbit/s. In all but the highest bitrates, x265 has an advantage. The difficult case shows a less pronounced knee, and has x264 leading. A potential explanation for this can be seen in the subjective viewing section.

[Image: ssimdb-vs-crf-value]

To see differences at the high end more clearly, we can plot the dB value of SSIM. We can see that at lower CRFs (<20) for the average case, x264 actually pulls ahead with a higher SSIM. Whether this is visible, or even a positive impact, will need to be checked, as cross-codec comparisons are not as straightforward.

[Image: ssimdb-vs-bitrate]

Repeating for bitrate, we see the same sort of story as we saw with the normalized values.

[Image: psnraverage-vs-crf-value]

Looking at the PSNR behaviour shows that there are only minor differences throughout, with an exception at the lowest CRF. The minimum PSNR also seems to “level out” at high CRF values, so the “difference” in quality between the best and worst frames is lower. In all, there’s really no big difference in PSNR for the average case between x264 and x265 on a CRF value basis.

[Image: psnrdifficult-vs-crf-value]

The difficult case shows a fairly similar result, without major differences except at the low-CRF end, where H.264 profile restrictions prevented the bitrate from going any higher, limiting the potential PSNR. Interestingly, the PSNR variance for x264 increased as the CRF was lowered into the bitrate limits – so while the PSNR average is better, the worst frame was more poorly encoded to make that happen.

[Image: psnr-vs-bitrate]

Plotting the same plots versus bitrate doesn’t reveal much more.

It seems that, on the whole, both PSNR and SSIM achieved similar values for corresponding x264 and x265 CRF values. As a result, at least from a synthetic quality standpoint, the quality of x264 and x265 encodes at the same CRF is nearly identical, implying an average bitrate saving of 41% can be achieved in the average case (and just 8% for the difficult case).

Results: Encode Rate

Of course, with every bitrate saving comes a compute penalty, so it’s time to work that out.

[Image: encrate-vs-crf-value]

First, by plotting against CRF values, we can see that the Intel machine that encoded the “average case” files was much faster than the older AMD machine that encoded the “difficult case” files. Interestingly, the encode speed increased as the CRF increased (i.e. lower bitrates) for the Intel machine, but the AMD machine didn’t show as strong a relationship. The fall-off in encode rate as CRF increased to 48 may have to do with reaching “other” resource limitations within the CPU.

[Image: encrate-vs-bitrate]

The same thing is plotted versus the resulting bitrate. Overall, the encode rates (excluding the purple outlier data points) show that x265 achieves on average just 15.7% of the speed of x264 on the Intel machine, and 4.8% on the AMD machine. Older machines are probably best sticking to x264 because of the significant speed difference. The difference in encode rates at lower bitrates/higher CRFs may be due to different performance optimizations and cache sizes between the CPUs.

This also highlights a potential pitfall for buyers deciding whether or not to upgrade based on a single metric such as CPUBenchmark scores. In our case:

AMD Phenom II x6 1090T BE
5676 @ 3.2GHz
6918 @ 3.9GHz (scaled for clock rate)

Intel Core i7-4770k
10131 @ 3.5GHz
11289 @ 3.9GHz (scaled for clock rate)

From this, we would expect the i7-4770k to perform at about 163% of the AMD Phenom II x6 1090T BE. In reality, it performed at 213% on x264 and 637% on x265 – quite a big margin of difference.
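For clarity, the expected ratio comes straight from scaling the scores by clock: 5676 × (3.9 / 3.2) ≈ 6918 and 10131 × (3.9 / 3.5) ≈ 11289, giving 11289 / 6918 ≈ 1.63.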

Results: Still Image Samples

Let’s take a look at some selected still image samples to see how the different CRFs compare. I suppose publishing small image samples to illustrate encoding quality is fair use … and while I could theoretically use artificially generated clips or self-shot clips, I don’t think that would represent the quality and characteristics of a professionally produced presentation, which would skew the encoding results.

Yes, I know, you’re going to scream at me because the human eye doesn’t perceive video as “each frame being a still picture” and some of the quality degradation might not be noticeable. But hey, this is the next best thing …

Average Case #1

This is frame #215 from the source, where SinB stares inquisitively into a sideways camera. This frame was chosen for its pure detail, especially in the shadows.

[Image: comparison-nav0]

For x265, starting at CRF 20, I can notice some alterations in hair structure where some of the finer hairs have been “spread” slightly. Even CRF 16 isn’t immune to this, but its image quality is good, and CRF 12 is indistinguishable from the source. CRF 24 continues the quality slide and makes things a bit blotchy, whereas CRF 28 is obviously corrupting the quality of the eyebrows, which are now just a smear, and subtle details in the eyebrows and lower eyelid edge are missing.

The character of x264 is different: the impairments are not primarily detail loss initially; instead, edges seem to gain noise. At CRF 20, the hair has some odd coloured blocks, and the skin edge seems to be tainted with edge-colour issues; the hair is also slightly smoother than at CRF 16, which appears much sharper and “straighter”. CRF 24 makes a royal mess of the hair, turning it into blotches, and CRF 28 turns it into an almost solid block while losing details in the eyebrows and eyelid.

Average Case #2

This is frame #4484 from the source, a bridge scene where the members of Gfriend are seen running across. The scene is particularly sharp, and the bars of the bridge form a difficult encoding challenge, with high detail in the planks and the water running below.

[Image: comparison-nav1]

The x265 encode at CRF 16 seems indistinguishable for the most part. However, at CRF 20, Yuju’s finger has a “halo” to the left of it, and Sowon’s red pants are starting to “merge” into the bars of the bridge somewhat. CRF 24 seems to worsen the halos around the fingers; noise around heads passing the concrete can now be seen, and the pants merging with the bridge bars is getting worse. CRF 28 is obviously starting to smooth a lot, and blockiness is obvious in the pants.

For x264, the impairments at CRF 28 were more sparkles and blocky posterization/quilting. CRF 24 showed a “pre-echo” of Yuju’s finger as well, which disappeared at CRF 20. CRF 20 appears to have lost some detail in the concrete beam behind, but isn’t bad at all.

Difficult Case #1

This is frame #1092, where Jessica (now ex-member of Girls’ Generation) had a solo shot. The frame was chosen because of the high detail in the eyes and hair.

[Image: comparison-gg0]

Unfortunately, in the case of this clip, some of the detail was already lost in the encoding at the “source”, so we need to compare with an obviously degraded original.

For x265, the most obvious quality losses begin at about CRF 24, where the hair to the side seems to go slightly flatter in definition and some of the original blockiness (a desirable quality here) is lost. By CRF 28, the hair looks like it’s pasted on, with the loose strands a little ill-defined, and CRF 32 causes her to lose her eyebrows entirely.

For x264, CRF 20 maintains some of the original blockiness, but CRF 24 is visibly less defined in the hair in terms of that blockiness. The difference is very minor, but by CRF 28, a similar loss of hair fidelity is seen; it instead looks a little sharper but much noisier.

Difficult Case #2

This was frame #5827 where Yoona (left) and Tiffany (right) are dancing in front of the LED display board.

[Image: comparison-gg1]

In the x265 case, in light of the messiness of the source, even CRF 24 looks acceptable. By CRF 28, Yoona has almost completely lost her eyebrows and most of her facial definition, whereas Tiffany’s nose has a secondary “echo” outline. By comparison, the x264 encode looks a bit sharper, with some more visual noise around the facial features, as if they’ve been sharpened, resulting in some bright noise spots at CRF 24 and CRF 28. This clip is particularly tough to judge.

Summary

The still image samples seem to show that the CRF necessary to attain visually acceptable performance varies as a function of the input material. This is not unexpected. In the case of the clearer, simpler material, CRF 12 was indistinguishable, CRF 16 was extremely good and CRF 20 was considered acceptable. For the more complex material, CRF 20 was considered good, and CRF 24 somewhat acceptable.

Results: Subjective Viewing

I spent quite a few hours in front of my large TV checking out the quality of the video. In this way, the temporal quality and perception-based quality of the videos can be assessed.

[Image: average-case-summary-table]

On the whole, I would have to agree that x264 and x265 produce very similar acceptance levels at the same CRF values. I would probably accept CRF 12 as visually lossless for the average case material, CRF 16 as hard-to-discern near-lossless, and CRF 20 as “watchable”. This is because I’m especially picky when it comes to quality and minor flaws when I watch material that I’m familiar with (and I always wonder how people put up with YouTube and other streaming services, which so obviously haven’t got enough bitrate).

The key difference is the type of impairments that occur with x264 vs x265. Under bitrate starvation, x264 appears sharper and goes into a blocky mode of degradation, preferring to retain sharp details even if it makes things look noisy. In contrast, x265 starts smoothing areas of lower detail while “popping” sharpness into the areas that have finer details, which does sometimes look a bit unnatural. It also starts dropping small motions, resulting in motion artifacts and jumpiness, but on the whole, this might be slightly less objectionable depending on your personal opinion.

[Image: difficult-case-summary-table]

With the difficult case data, the opinion differs a bit: CRF 16 is visually indistinguishable, and CRF 20 is almost indistinguishable. I would have to agree that x264 is better for this case, appearing more visually clean even at higher CRFs. This seems to be because the noise in x264 is “disguised” better by the patterning of the LED lights, whereas the smoothing in x265 becomes more obvious.

A second, and more important, issue is the presence of a field oddity post-deinterlacing in the x265 clips, especially at CRF > 20.

[Image: decode-field-oddity]

The oddity results in “stripes” appearing every n pixels vertically, as if there is something wrong with the fields there.

[Image: block-boundaries]

Examining FFmpeg’s FFV1-decoded lossless file seems to show that the encoded result actually does have the oddity in the fields. The reason for it isn’t clear at this stage, but it may be related to an encoding-unit block boundary condition of sorts, or a poor implementation of interlaced encoding. Whatever the case, it makes interlaced files at CRF > 20 difficult to watch, especially during panning sequences.

This may explain why the SSIM/PSNR curves were smoother and lower compared to the “average” case – these errors were not critical to the comparison, but are very temporally evident patterns.

Speaking of interlaced video, it’s a sad fact of life that we still have to deal with it, due to the storage of old videos and due to some cameras still recording true interlaced content despite the majority of the world using progressive displays. Apparently H.265 supports interlaced encoding, although there was some confusion. One naive solution some users may think of is simply to deinterlace the video first and then encode it. The problem is that you will lose information through deinterlacing – if you’re going from 50 fields per second to 25 frames, you’ve lost half the temporal information. If you frame-double, you can keep the temporal resolution but will have to generate the missing field for each frame – computationally intensive, and it can potentially introduce artifacts. It can also result in a file that is incompatible with many players, and if your motion compensation/prediction algorithm is poor, you might lose sharpness in some areas. I personally prefer to keep each format (progressive/interlaced) as-is through to the final display stage, where the “best” deinterlacing for the situation can be applied.
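For what it’s worth, if you do choose to deinterlace before encoding, FFmpeg’s yadif filter implements both of the approaches above; a sketch, with file names and CRF values as placeholders:

ffmpeg -i interlaced.mkv -vf yadif=0 -c:v libx264 -crf 18 out-25p.mp4
ffmpeg -i interlaced.mkv -vf yadif=1 -c:v libx264 -crf 18 out-50p.mp4

yadif=0 emits one frame per frame (50i to 25p, halving temporal resolution), while yadif=1 emits one frame per field (50i to 50p, keeping the temporal resolution at double the frame rate).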

However, as it turns out, while the difficult case video is a Blu-Ray-standard video, it isn’t native interlaced material at all despite being 29.97fps. It’s 23.976fps material that has gone through a telecine process to make it 29.97fps. Why they would do such a thing, I don’t know, as Blu-Ray supports 23.976p natively.
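For the re-test, the inverse telecine can be done with FFmpeg’s field-matching chain; a sketch under assumptions (file names are placeholders and the CRF is illustrative):

ffmpeg -i telecined.m2ts -vf fieldmatch,yadif=deint=interlaced,decimate -c:v libx265 -crf 20 progressive.mkv

Here fieldmatch reassembles the original progressive frames, yadif deinterlaces only the frames still flagged as combed, and decimate drops the duplicated frame in each cycle of five, returning the stream to 23.976fps.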

Conclusion

After a week and a bit of encoding and playing around with things, I think there are some rather interesting results.

On the whole, for the average case, x265 produced bitrates of about 59% of x264’s at the same CRF. The CRF sensitivity of x265 was slightly higher than x264’s, at about +/- 5.34 for a doubling/halving rather than +/- 6.05. Synthetically, corresponding CRF values produced very similar SSIM and PSNR values for both x264 and x265, so the same “rules of thumb” might be applied, although the bitrate saving will vary depending on the specific CRF selected.

Encode rates for x265 were significantly slower than x264, as is to be expected due to the increased computational complexity. However, it seemed that higher CRF values (i.e. lower bitrates) were much faster to encode on modern hardware (possibly due to better cache use). This wasn’t reflected on my older AMD Phenom II based system (possibly due to differences in instruction sets and optimization).

Subjectively speaking, I’d have to say CRF 12 is indistinguishable and CRF 16 is good enough for virtually all cases. For the less discerning, CRF 20 is probably fine for watching, but CRF 24 is beginning to become annoying and CRF 28 is about the lowest quality that could still be considered acceptable. The result seems to be consistent across x264 and x265, although (unexpectedly) the difficult case seemed to tolerate higher CRF values, probably because the harsh patterns were not as easily resolved by the eye and noise was less easily seen. As a result, even having a “rule of thumb” CRF can be hard, as it depends on the viewer, viewing equipment, source characteristics and sensitivity to artifacts.

Unfortunately, it seems that the “difficult case” data is really hard to interpret. This appears to be because x265 isn’t very good at handling interlaced content, and in using this “experimental” feature, the output wasn’t quite correct, as seen in the subjective viewing. The synthetic benchmarks may therefore have been reflecting the strange field blending at the edges of blocks, resulting in a loss of fidelity that only resolved at fairly high quality values (CRF <= 20). The mature x264 encoder was much more adept at handling interlaced content correctly, so I suppose we should take the difficult case data as “atypical” and not representative of what properly encoded interlaced H.265 video would be like.

It looks like I’ve got another round of encoding ahead for testing the difficult case – as I discovered that the material was actually 23.976fps pulled up to 29.97fps, I’ll perform an inverse telecine on it and encode the progressive output to see what happens. This time, I’ll use H.264 Main@L5 for consistency as well. With any luck, the results might be more consistent with the average case.

Appendix: Table of Data

x265 - Average Case
CRF Bitrate SSIM     SSIM (dB) PSNR (avg) PSNR (min) fps
8   37545   0.995211 23.197804 51.363782  46.213167  0.40533746
12  18834   0.991562 20.737413 48.847675  43.550076  0.571760862
16  8407    0.987726 19.110261 47.004687  39.892563  0.847564601
20  4315    0.983769 17.896507 45.373892  36.558467  1.094531205
24  2504    0.978525 16.680696 43.635386  33.687427  1.289760018
28  1520    0.971082 15.388295 41.74252   30.954092  1.596318738
32  936     0.96073  14.059367 39.729187  28.777697  1.947472064
36  575     0.94749  12.797598 37.674602  27.641992  2.371282528
40  346     0.931052 11.614763 35.585072  27.202179  2.881390385
44  212     0.911069 10.509464 33.549725  26.877077  3.672094814
48  160     0.892783 9.697347  31.99061   26.683718  3.85216387

x265 - Difficult Case
CRF Bitrate SSIM     SSIM (dB) PSNR (avg) PSNR (min) fps
8   83914   0.994746 22.79505  47.241291  41.851423  0.150718854
12  58270   0.991083 20.4976   44.149468  38.15019   0.161308334
16  39154   0.98508  18.262177 41.057735  34.518727  0.177631469
20  25251   0.97563  16.131467 38.046595  30.987662  0.194232091
24  15318   0.961485 14.143724 35.185632  27.964271  0.093471634
28  8747    0.941772 12.348706 32.649622  25.227401  0.235893164
32  4855    0.915883 10.751168 30.507201  22.978117  0.236307704
36  2633    0.881909 9.277838  28.55634   21.244047  0.246284614
40  1409    0.839445 7.943765  26.817959  20.279733  0.297332074
44  975     0.80528  7.105894  25.836161  19.5892    0.245301742
48  888     0.791061 6.799803  25.438011  16.554815  0.234813531

x264 - Average Case
CRF Bitrate SSIM     SSIM (dB) PSNR (avg) PSNR (min) fps
8   44940   0.997364 25.791192 53.422334  47.377392  4.026217
12  28489   0.994607 22.681598 50.2906    44.523548  4.934854
16  14837   0.989849 19.934854 47.552363  41.977158  6.346291
20  6964    0.984727 18.16071  45.473564  37.964469  8.55346
24  3795    0.979337 16.848072 43.587039  34.802245  10.813952
28  2325    0.972033 15.53357  41.603963  31.924008  12.368113
32  1509    0.961974 14.199156 39.536816  29.436038  13.44052
36  1022    0.948866 12.912879 37.445484  27.7433    14.142429
40  716     0.932261 11.691611 35.328038  26.990109  14.590753
44  517     0.91127  10.519294 33.163862  25.74449   15.208363
48  380     0.8856   9.415741  30.925813  24.26836   15.538453

x264 - Difficult Case
CRF Bitrate SSIM     SSIM (dB) PSNR (avg) PSNR (min) fps
8   57884   0.997158 22.334516 45.401482  32.920994  2.67663
12  50338   0.993274 21.722554 44.885056  34.287518  2.975561
16  36946   0.990023 20.0101   42.79283   35.122098  3.339178
20  25202   0.983669 17.86938  39.792569  33.580799  3.86772
24  16425   0.972965 15.680784 36.641095  29.975891  4.494949
28  10094   0.95588  13.553604 33.580348  26.595447  5.168905
32  5977    0.931332 11.632443 30.855433  23.918678  5.85861
36  3589    0.898831 9.949544  28.509815  21.358371  6.376635
40  2264    0.858916 8.505224  26.492488  19.204705  6.556086
44  1511    0.812144 7.261751  24.741765  17.527457  6.636509
48  1040    0.762812 6.24908   23.281506  15.783191  6.661346


36 Responses to Video Compression Testing: x264 vs x265 CRF in Handbrake 0.10.5

  1. anon says:

    This article is fantastic and it is insane there are no other comments… the internet crowd nowadays is losing its IQ on YouTube or Twitch

    Very interesting results, since one of the most asked questions online is how x264 CRF corresponds to x265

    Another interesting result is that the visually lossless threshold value you found was 12
    I thought for transparent copies the value considered by many was CRF 18 for x264 for HD and CRF 16 for SD

    • lui_gough says:

      Thanks for the comment – I was afraid nobody was even asking these sorts of questions anymore.

      It’s interesting, because I’ve personally assumed it to be 16 myself, but I suspect this is dependent on a load of factors:
      – Source material – something with sharp edges but smooth tones on either side seems to exacerbate the “noise” generated by the lossy compression, resulting in sparkly artifacts which are easy to spot. Contrary to my expectations, the more complex material is more “messy”, making such noise more difficult to spot. In this regard, I’d say x265 is better, as it tends to produce less sharp noise but instead seems to sacrifice sharpness to some degree.
      – Display set-up – many people are using TVs with internal noise reduction, edge enhancement, motion interpolation – or even computers with graphics cards that have driver “features” enabled – which obscure, and have their own effects on, the decoded video. This might mask some of the codec deficiencies to some degree, and/or create new ones. I’ve tried my best to ensure enhancements are disabled when subjectively evaluating – and the image samples are based on ffmpeg software decoding. There is also the issue of correctly calibrating colours, gamma and black/white levels so as to reduce the possibility of clipping and make posterization artifacts in solid colour regions more apparent.
      – Viewing area conditions – a darkened room vs a lit room apparently can make a difference, but the biggest factors are the viewing distance and the eyesight of the person judging the video. If you can’t visually distinguish every pixel on the screen, chances are you aren’t going to be as harsh as someone who can – this is one reason why 4K vs 1080p sees diminishing returns – most people sit too far from their TV to be able to benefit. Further to this, there’s also the issue of small displays – I’ve found some differences assessing footage on a 24″ computer monitor vs a 50″ TV – the larger screen can make defects more obvious, especially if you’re willing to get closer.
      – Source material quality – if it’s already compressed very harshly and has visual impairments, my guess would be that you would need to pay more attention and use lower CRFs (higher quality) to avoid further diluting the image (i.e. a bad approximation of an already bad approximation), but a good source that’s got a high bitrate can have more “loss” attributed to it without it being as visually noticeable.
      – Encoder mode decision – I’ve left things on automatic, but professional encodists often will selectively re-encode portions, enforcing certain frame types, filters, bit-rate limits on scenes to maximise quality and ensure no scene changes are missed. Others may try to push down or eliminate B-frames entirely, as they tend to produce a bigger quantization step difference (by default) which can result in noise “strobing” between each GOP. Of course, there’s also GOP length tuning which can increase compression efficiency (long GOP) at the risk of introducing more accumulated errors from P/B frames and causing problems with precise seeking or device compatibility, or the use of shorter GOPs with less compression efficiency (due to more I frames) but more consistent image quality.

      I’m going to be looking at some more video encoding related tidbits in the near future – but unfortunately, time is limited as blogging is a bit of a hobby on the side, so hopefully the next few articles will prove to be interesting.

      Thanks,
      Gough.

      • anon says:

        Thanks for the insightful response, and I think you are 100% correct that the perfect, aka visually lossless, CRF holy grail value is actually a variable depending on the factors you insightfully reported.

        Actually, I did some experiments earlier this year using some VHS (from the 80s/early 90s) transfer masters that I have archived in lossless Lagarith (huge files) on an external drive, and I kept some notes about the encodes (I usually keep the masters, give away the encodes to relatives and then delete them)

        I did some exports from the masters using x264 to check the well-sought “transparent” CRF value, so I could do casual transfers for family and relatives and hand them over on a USB flash disk… it would be useful because all the masters have the same feel/vibe on average, and the same S-VHS/TBC machine was used for the capture.

        Export one was CRF 0, aka lossless x264 with the qp 0 switch
        Export two was CRF 18
        Export three was CRF 14
        Export four was CRF 12

        All exports used L4.1, preset low, subme 10 and some other custom switches, and were deinterlaced with QTGMC preset low

        Compared with the lossless x264 encode, I think the CRF 18 copy was very, very acceptable for casual people’s eyes… CRF 14 was fantastic but not quite visually transparent, though in general excellent… at CRF 12 I could not see any difference, but I am not super picky with 100% 1:1 pixel viewing

        So, all in all, back when I did my tests I decided to give family and relatives the CRF 18 encodes.

        I find video codecs a very entertaining subject, even though I still have a lot of stuff to study and experiment with.

        • lui_gough says:

          Very interesting, as I was involved in some VHS transfers just recently – albeit noisy “on their last legs” multiple-generation copies of home videos. I’ve got some of mine archived in Huffyuv (older transfers) or FFV1 myself, mainly as when I started, my CPU wasn’t strong enough to handle other lossless codecs. Unfortunately, the earliest transfers were then converted to MPEG2 for DVD and the original .avis deleted, which I now regret. Anyhow, FFV1 is the codec I most prefer for lossless archival mainly as FFV1 is open source, documented, predictable, and used by other archivists at museums and libraries around the world. Huffyuv is my preferred codec where editing might take place, as FFV1 is not an “all I-frame” format, so re-encode free cutting at any point in the file is not guaranteed.

          Regardless, I wanted to make the transfers more accessible, so I did quick encodes with x264. CRF 18 produced “watchable” footage – the noisy deinterlaced VHS was not severely blocky (as was the case for lower bitrates), but it wasn’t visually perfect by any measure. Good enough to be not too annoying, and still weighed in at about 6.2Mbit/s (which tells you just how difficult it was to encode well). This time, I’m sure to keep the original FFV1 encoded .avis.

          I suspect this brings in another factor I didn’t mention earlier – sometimes differences of opinion come down to how familiar you are with the footage itself. If it’s something you’ve watched a lot, and you always look into the “tough” areas (e.g. dark shadows in a bright scene, background areas where there’s a lot of foreground details), you can be a lot harsher on the material. But with “dirty” source material, it’s hard to judge the codec noise versus the “intrinsic” noise of the source if they have similar characteristics (e.g. random soft-ish spotty noise). It’s where the codec noise is distinctly different (e.g. blocky sparkles only at edges) that makes me single them out as a “deficiency” as it stands out from the wanted signal. So I suppose the relationship between CRF and complexity of material is not straightforward, and that probably explains the wide variations of opinion.

          The other thing is the “effective” bits per area – if we’re watching old fashioned 576i footage on a 1080i display, each source pixel is going to be magnified to several pixels of the display, so accordingly if comparing subjective quality to “native resolution” footage from my experience, the quality of data for each pixel needs to be higher to avoid perceiving artifacts as any coding artifacts are also magnified. This probably explains the reason why often the recommendation for CRF values is lower CRF (higher quality) for SD, and slightly higher CRF (lower quality) for HD.

          – Gough

  2. jd says:

    Hello, Gough.

    first of all – thank you for the amazing analysis.
    You have done incredible work here!

    The stills are very interesting to me. When I tried to find the right x264 “CRF sweetspot” for myself a good while ago, I always compared short scenes of moving picture.

    I ended up at CRF16 (for regular 1080p / 8 Bit / 23.976p / BT.709 Blu-ray sources).

    I still think that it is impossible to distinguish the results from the Blu-ray source – as long as the picture is actually moving.
    However, I do see the minor difference in your stills. 🙂

    For the sources in question (common Blu-rays), CRF16 is not only a picture quality sweetspot to me. It is a setting, where I still end up with considerable file size savings in many cases.

    Going a bit higher in quality (lower CRF) results in file sizes, that are in most cases equal to or even larger than the source. That is probably due to most Blu-rays being compressed already.

    Your investigation made me do another test of my own to see if CRF14 is a viable option for me, this is what I ended up with:

    Source: 28.8GB
    CRF16: 20.5GB
    CRF14: 30.5GB

    So I will be sticking with CRF16 I think. 😉

    If you start another test series (which you indicated?), I was wondering if you are open to a few suggestions. 🙂
    Since you also seem to be looking for the best possible picture quality, why not stop at CRF24 and start with CRF12 (which you already identified as lossless looking)?
    You could then advance in closer steps of 1 or 2 in the important ranges, for instance CRF 12, 13, 14, 15, 16, 17, 18, 20, 22, 24.
    That would be a not too massive series of 10 and nobody who is interested in quality goes beyond 24 anyhow. 😉

    I am suggesting this because I am not really fond of the x265 “smoothing”.
    CRF16 for x264 looks great to me, however I would not be quite satisfied with x265 CRF16. This is why I would be interested in the filesizes of CRF14 or CRF15 with x265. Is there even an advantage over x264?

    Also, use Handbrake nightly builds. There are major improvements. 🙂
    https://handbrake.fr/nightly.php

    Anyhow – great work!
    Keep it up. 🙂

    Thank you again!

    Regards, jd

    p.s.
    Have you heard of “VMAF” as a metric?
    I think this should be interesting to you. 🙂

    http://techblog.netflix.com/2016/06/toward-practical-perceptual-video.html

    Maybe you can use VMAF to quantify your findings? 🙂

    • lui_gough says:

      Thanks JD for your input.

      Initially when I was running it, I was going blind and I really wanted to know the bitrate change vs CRF so I wanted to see if the relationship held towards the ends of the spectrum as well. As you validly noted, CRF steps of 4 are a bit too big in the area of interest, and nobody really cares beyond the late 20’s because the image quality is so poor.

      That being said, I’ve got a few other experiments that are still concerned with the bitrate changes vs CRF, so I’ve retained the present steps, although if I do end up repeating the experiment, I will keep this in mind.

      As to whether a given CRF will yield a saving or not depends highly on the source material bitrate. If the source was already overly compressed or has a relatively low average bitrate (say 16Mbit/s or less), the chances of saving anything at the lower CRFs (higher quality) reduce, as the encoder is instead spending *more* bits trying to make sure that every sparkly artifact is retained. In that case, I’d argue you’d be better off remuxing if the source is in the desired target codec and profile to avoid *generational loss* – comparing, say, a 16Mbit/s source vs a 16Mbit/s re-encode, the re-encode is always going to lose out on detail. In fact, one of my upcoming posts looks at x264 generational encoding loss behaviour (up to an extreme case as well … yeah I’m crazy!).

      If you do the calculations based on the bitrates, the x265 bitrates are sometimes very close for complex materials, and can be about 60% less for relatively clean material … savings of 40-60% are typically quoted, and judging from my graphs, it can be achieved over the “middle” portion of the CRF range. At high quality (low CRF) the differences narrow somewhat … so I’d say x264 is still a good choice due to the reduced computational load.

      In terms of nightlies and that sort of thing – I might even be better off playing with ffmpeg directly (as I did later), but that being said, I seem not to see major differences until the underlying library has a major version change of some sort.

      I did see the note about the Netflix stuff – they seem to have developed their own metric, but they’re not the only ones who have tried to “improve” upon the whole computed metrics side of things. The problem is that some are licensed, others are poorly documented, and ultimately I think SSIM and PSNR are still most used and their caveats are most understood. To “add” another metric can take quite a bit of programming work and sleuthing, and I’ll probably confess now that I’m no image quality expert, so I’d probably make a mistake somewhere and screw it up. As soon as ffmpeg adds it in as a filter, I’ll be happy to use it :).

      – Gough

      • jd says:

        Wow, that was a quick reply. Thank you. 🙂

        You are right, if a source is already highly compressed and I end up with barely any saving with CRF16, I throw away the encode and keep an untouched remux. This is how I always handled it and it applies to approx. 10% of the sources.

        I am looking forward to your post on generational x264 encoding. 🙂

        Regards, jd

  3. Alex says:

    Dude! It is THE analysis I was looking for! You are just the real man in video processing!

    Let me try to contribute a little. I’m messing about with a bunch of videos captured by a Samsung x900, all in 1080p/50 quality. My goal is to encode them all in a batch to supply to people at a smaller size. However, quality matters. I was choosing between x264/x265/vp8/vp9. Well, internet-wise thinking excludes x265 as only a handful of browsers support it, but still, I was moving on. What I discovered may be interesting to you.

    1. I have proven that the officially declared “lossless” mode initiated by CRF=0 is NOT lossless in both x264 and x265. I checked it by taking raw data (yuv), encoding, converting back to raw and comparing byte-by-byte.
    2. I have found a glitch in the veryfast preset of x264, as it consistently produces files smaller than even placebo. I reported this to the devs but they dismiss any responsibility 🙂 I could not pinpoint a mistake in the code yet, but my observation is based on encoding 34 quite different videos so far. It is not just one random case.

    These 2 points make me think that x264 and x265 are not free of either mistakes or hidden assumptions which we are not aware of.

    3. Continuing anyway with x264 and x265, I found out that in most cases SSIM is lower for me. Roughly speaking, at CRF=12 it is often a little bit below 99, but still close. The only reason I can see is the frame rate: I have 50, you have 30 and 24. Even though it is a plausible assumption, I want to stress that some confusion is on the way, as usually we are accustomed to bitrate in bits/second and NOT in bits/frame. I’m thinking on this, as it is not obvious whether a combination of a lower SSIM and a higher bitrate changes human perception or not.

    4. vp9 is way more consistent in my opinion. First, its lossless mode IS lossless (proven by a yuv->vp9->yuv and compare sequence). It has its own “crf” and it can again be analysed as you did. I don’t have enough computational power and/or time to do this, but it is interesting and I believe it will prove useful. Are you up for it? It seems that the dependence of SSIM on CRF is similar for x264/5.
    5. vp8 seems weird. It doesn’t have just CRF; you need to specify both CRF and a desired bitrate, so you are obliged to work with a 2-parameter space of values to adjust, and this complicates things.

    It is all interesting and important, as long as you like/need to mess about with video processing – especially in batch, not one clip a month. You are welcome to contact me directly (by e-mail) for an even more technical discussion.

    Finally, Thanks for your great work!

    With the very best regards,
    Alex

    • lui_gough says:

      Dear Alex,

      Interesting observations – I would agree outright that both x264 and x265 are unlikely to be free of mistakes – they are open source software which is very fluid, with nightly builds and patches left, right and centre. However, having some more “core” functionality broken seems rather surprising to me, although at this stage I don’t really have the time to investigate it. As for the lossless mode not being entirely lossless, that’s news to me, and although I haven’t explored it, it may also come down to YUV subsampling, as it is YUV 4:2:0 by default; if you’re decoding to some other format (e.g. 4:2:2 YUY2), then things will happen, just as they will if you have a filter chain somewhere which causes colourspace conversion. I came across a point where the metadata in the .MP4 file specified a particular colourspace, and depending on the decoder, the output would vary (ffmpeg versions had different results depending on what you were doing – e.g. .MP4 to .MP4 would be different to .MP4 -> “lossless” FFV1 using YUV420 space -> .MP4). The lossless mode in x264, at least, is known not to be “decoder compatible” with all decoders as well. All I can say is that each tool and each version of the tool can produce different results, so it’s worth testing again from time to time, and not knowing the inner parts of the software (as I’m not a dev myself) makes things a little less clear to me as to where the problems might creep in.

      I suspect, in regards to placebo, they’re not interested in it anyway, as it’s really not a truly useful preset. Smaller files on a CRF basis are not expected from a faster preset, as that normally precludes more efficient forms of optimization, so it does seem to be a strange result.

      Frame rate complicates things slightly – on the one hand, you have more “information” to start with, but on the other, you expect more frame-to-frame correlation due to the smaller time difference between frames. As a result, higher frame rates can often be accommodated with a slight increase in bitrate to offset the overhead of throwing in more frame headers, although you need to pay attention to GOP lengths as well, as it could result in twice as many keyframes being inserted if you reach your GOP length limits, reducing compression efficiency. Bitrate is only one “simple” metric, but it really depends on how efficient the encoder is as to whether the bits translate into visual quality or not.

      Personally speaking, the content itself has a bigger impact on SSIM figures, mainly due to the whole way CRF is evaluated by the encoder. If the material is simple, there’s a good chance that higher SSIM figures might be required at the same CRF as simple plain objects make encoding deficiencies more visible. This does seem to appear to be the case, at least in my testing, that simpler material does expose deficiencies more.

      VP9 has been a format I’ve kept my eye on. It has very similar compression efficiencies, but because of its patchy support (esp. hardware accelerated) and not-so-wide adoption, I can’t really justify running a comparison with it quite just yet. Maybe in the future, but I had always hoped that VP9 would have been the format of choice due to the patent encumbrances surrounding the MPEG series of standards.

      I don’t do as much video coding as I used to – but I felt that testing it was important to see whether it was worth my while to begin transitioning to the “next” generation format. Each transcoding opportunity is something not to be taken lightly, as with each encoding generation there are losses, and the computational time and energy spent is not trivial for larger sets of video done properly. Unfortunately, because I’m getting quite busy with a few other things towards the end of the year, I don’t have any time for this right now, but thanks for your comment anyhow and interesting insights.

      Sincerely,
      Gough.

    • Jia Chie says:

      I suppose that you were trying to use Handbrake to do lossless encoding. Although Handbrake uses x264 and x265, it does not include every feature of x264 and x265, including the actual lossless mode. To be clear, lossless encoding is only supported in certain profiles of H.264 and H.265: for H.264, x264 can do lossless encoding under the High 4:4:4 Predictive profile, and for H.265, x265 can do it under the Main 4:4:4 profile. Neither profile is supported by Handbrake, so what you will get under Handbrake is nearly lossless encoding, which is lossy. The command line tool ffmpeg, which includes both x264 and x265, does support actual lossless encoding, but if the source is chroma subsampled, yuv444p has to be specified to avoid chroma subsampling. Also, the way to trigger lossless encoding differs between x264 and x265: CRF 0 will trigger lossless encoding under x264, but CRF 0 under x265 is lossy.

      Under ffmpeg:
      ffmpeg -i "input.mkv" -pix_fmt "yuv444p" -vcodec "libx264" -crf "0" "output.mkv"
      ffmpeg -i "input.mkv" -pix_fmt "yuv444p" -vcodec "libx265" -x265-params "lossless=1" "output.mkv"

      Most media players do not support lossless H.264 or H.265 video, as they usually assume a pixel format of yuv420p; Handbrake only supports yuv420p. Under x265, there is a separate lossless mode – no matter what qp or crf value is chosen, the encoded video will be lossy under x265 unless lossless mode is specifically triggered. When lossless mode is triggered under x265, the qp value is 4, whereas qp=0 under x264 will trigger lossless mode.

      • lui_gough says:

        In fact, I am aware of the lossless mode in x264, but no, I was not targeting actual lossless video – merely looking at the influence of CRF on bitrate and perceived quality (to me), as I’m a little more discerning than the average viewer. I’m also aware of the playback issues with lossless x264, so I typically avoid it in favour of the less efficient but more “standardized for archival” FFV1 in an .avi container when doing intermediate processing.

        – Gough

    • nhyone says:

      It is not surprising that veryfast produces smaller files than placebo. In my experience, it often produces the smallest file because it throws more detail away.

      In my limited testing, x264 placebo is slower than all the other presets put together, and yet it is no better than veryslow – the size is sometimes even a little bigger.

      The x264 presets are highly correlated with trellis:
      0: ultrafast, superfast, veryfast
      1: faster, fast, medium, slow
      2: slower, veryslow, placebo

      Use veryfast for speed and size, at some expense of quality
      Use tweaked slow for optimal trellis 1 setting
      Use tweaked veryslow for optimal trellis 2 setting

      Why tweaked? slow is hampered by its low ref/b-frame limits. Increase them for better efficiency.

      veryslow, on the other hand, has ref/b-frame limits that are too high. Lower them for faster encoding with very little size increase.

      If you check the logs, you’ll find that x264 seldom uses more than 4 ref frames and 6 b-frames – hence the tweaks sketched below.
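
      For example, tweaks along these lines (x264 command-line syntax; the exact values are illustrative and should be tuned to your content):
      x264 --preset slow --ref 6 --bframes 5 --crf 18 -o out.mkv in.y4m
      x264 --preset veryslow --ref 5 --bframes 6 --crf 18 -o out.mkv in.y4m
      The first raises slow’s ref/b-frame limits for better efficiency; the second lowers veryslow’s for faster encoding at nearly the same size.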

  4. Steve says:

    In this post-truth era, it is refreshing to see hard data. Respect, mate!

  5. orgest says:

    Hello there Gough!!
    First of all, thank you for the detailed comparison between the x264 and x265 codecs.
    I have a few questions about encoding, mainly movies/TV shows from a BD source.
    Until now I’ve been using Staxrip to encode in x264.
    I usually use these settings (copy-pasted from the Staxrip software): [--preset veryslow --tune film --pass 2 --bitrate 13998 --level 4.1 --keyint 48 --ref 4 --bframes 3 --b-pyramid strict --vbv-bufsize 30000 --vbv-maxrate 40000 --aud --slices 4 --nal-hrd vbr --bluray-compat --]
    I’ve been wondering about switching to x265 since it came out, but I’ve been waiting for it to get more stable before making the switch.
    I want to know: can I make the switch now?
    And if I use x265, I want to get better visual quality at the same filesize as I would get with x264.
    From your “Appendix: Table of Data” I see that if I use CRF 13/14 with x265, then maybe I’ll have roughly the same bitrate (14Mbps).
    But, for example, if I keep the same 14Mbps bitrate for BD encoding with x265, how much would the improvement be in comparison to the x264 codec?
    Thanks again.
    Regards.
    Orgest.

    • lui_gough says:

      Dear Orgest,

      First of all, your current settings are 2-pass bitrate-targeted VBR. This type of encoding is generally used where a given file size or average bitrate needs to be hit for the whole video, and it tends to mean that graphically simple films are encoded with very good quality, whereas graphically complex films are encoded at slightly lower quality.

      Where an average bitrate or file size is not the aim, CRF encoding is used instead, which basically maintains a given perceived quality using however many bits are necessary. This also means that it isn’t necessary to run multiple passes, but the file size will vary significantly depending on the source material and settings.
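
      As a rough sketch of the two approaches in ffmpeg syntax (purely for illustration – your Staxrip x264 settings translate similarly):
      ffmpeg -i input.mkv -c:v libx264 -b:v 14000k -pass 1 -an -f null -
      ffmpeg -i input.mkv -c:v libx264 -b:v 14000k -pass 2 output.mkv
      ffmpeg -i input.mkv -c:v libx264 -crf 18 output.mkv
      The first two lines are a 2-pass encode targeting an average of 14Mbit/s; the third is a single-pass CRF encode whose file size depends on the content.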

      You can make the switch to x265; however, I don’t think it’s necessarily the best idea given the longer encoding times and marginal gains in compression efficiency. First of all, you mentioned CRF 13/14, which is actually a very high quality setting and not at all necessary – in terms of perceived quality, going below CRF 16 (i.e. higher quality) is unlikely to produce any visible difference. I suppose if you are archiving, though, there might be a good reason for it.

      As for bitrate savings, that’s really what I focused on, rather than quality improvement at the same bitrate, as I was not doing CBR or VBR encoding but CRF encoding. Based on the CRF results, if we look at the SSIM (dB) vs Bitrate chart using average-case values, at 14Mbit/s the difference is less than 0.5dB SSIM in favour of x265. Repeating the same for PSNR, it was only just under 1dB in favour of x265. In other words, the difference is small to non-existent, at least for the source material I used. Lengthening your encode times six-fold and risking compatibility issues to achieve this is not really what I’d call “efficient” at this stage.

      – Gough

      • Orgest says:

        Thanks for your quick reply Gough 🙂
        Actually, when I experimented with CRF encoding a few years back (mainly values 16-18), I saw either a very big final filesize or, strangely, a smaller than expected one. After doing some research, I found that it depends a lot on the source you are using, the scenes, etc.
        At the time I didn’t have a lot of storage on hand, and that’s why I decided to go with 2-pass fixed bitrate: after experimenting for a while, I saw that at CRF 18 the average bitrate was roughly 14Mbps.
        But now that storage is not a big problem for me, I’m thinking of going the CRF way.
        And yes, I’m doing the encodings for archival purposes, so I want them to be as “future-proof” as possible. But I also don’t want them to have compatibility problems in the future.
        One more question.
        From your “Appendix: Table of Data/Average Case” I see that for the same CRF value of 16, the bitrate for the x264 encode is “14837 Kbps” and for the x265 one is “8407 Kbps”.
        If we compare these two x264/x265 encodings at the same CRF (16 in this case), the visual quality should be the same, right?
        And as far as compatibility goes (hardware/software), how are x265 encodings right now?
        Is there going to be more support in the future?
        Sorry for all these questions.
        Again, thank you for your time.
        Regards.
        Orgest.

        • lui_gough says:

          It’s hard to say. Intuitively the quality should be the same, which is why I also provided my subjective opinions in the section above. I don’t think they’re a perfect 1:1 match at the same CRF values, although I suppose that depends on whether you’re more sensitive to an image being soft than to it being blocky, etc. It’s highly subjective.

          H.265 HEVC compatibility is a mixed bag. Depending on your settings, it may or may not be hardware-decodable (e.g. choose 10-bit and you are likely to have issues). It also depends on the vintage of your equipment. In general, anything H.265-related requires more processing – both encoding and decoding – so if you’re targeting older devices, it’s not a good idea. You need to try this out for yourself. However, many graphics cards and mobile SoCs already have H.265 accelerated decoding support now, and despite patent royalty issues, it still seems to be taking off faster than its main competitor (VP9).

          Again, you need to test this for yourself, because there is no “magic bullet” answer that will suit everyone.

          – Gough

  6. Jack says:

    Dear Lui,
    Dear all,

    Since you represent a high level of professionalism and interest in video compression (processing),
    I would be interested in setting up an R&D team to develop smart algorithms to build depth maps from video material.
    Depth maps are already generated on-the-fly by modern 3D TV sets to let us watch 2D video input converted to 3D live.

    I study personal drone crashes in the US, following the statutory registration of personal drones implemented by the FAA in December 2015, based on data recorded by the black box.

    Modern personal drones come equipped with distance-sensing, obstacle-avoidance technology (LIDAR, ultrasonic, or laser distance meters).
    These obstacle-avoidance technologies are still expensive, and video-based depth map generation is underfunded.

    I can provide the maths, the algorithms to be implemented, and all the technology behind depth map generation.
    What remains to be solved is the logistics: whether high quality video should be sent to the cloud for real-time processing, or lower quality video should be processed live by an add-on computer installed on the drone.

    I prefer video-camera-based obstacle avoidance, since a camera is readily available and already embedded in the drone’s hardware and software.

    My live depth map generation approach may require sophisticated video compression, object extraction, and 3D topology algorithms.

    Please let me know your opinion.

  7. Hunter says:

    Wow. I tried it myself using Handbrake and the result is awesome: I’ve taken an 8GB 1080p BluRay movie down to around 1.5GB and the quality of the movie is the same. The only problem is that it uses a hell of a lot of CPU. Other than that, everything worked fine for me.

  8. Hernan says:

    Wow, AWESOME JOB!!!!!!!!! I had been testing and making measurements – processing time, configurations, CPU usage, memory, etc. – for 2 or 3 days. Your work is much appreciated. (I’m sorry, my English is not very good, I’m from LA)

  9. I loved reading your analysis. I had done some basic tests on my own, but nothing as detailed as this. I ended up settling on CRF 24 as a good size/quality tradeoff and I’m glad to see that the reasons I chose that CRF were similar to your observations.

  10. freddyzdead says:

    Hello Gough,
    I know you’re busy atm and for the near future, so I’m not surprised if you don’t have time to answer questions for a while.

    I got a real appreciation of the impact the quality of the source can have when I acquired a 4K encoding of “Lucy”, which was my first real opportunity to evaluate real content at this resolution. The filesize was >22GB. The hardware I was playing it on could only manage 3840×2160@25Hz. The playback, not surprisingly, was stuttering and the audio lagged behind. The picture quality, though, was excellent, played through the HDMI input of a 40″ 4K television I use as a monitor.

    So, I needed to downsize the source in order to actually watch it. I used Avidemux with x264 at CQ 22 to re-encode it to 1080p. When it was finished, the resulting mkv file was only 1.1GB. I played it, not expecting much, but was astonished to see a quite flawless picture – no artifacts, no smoothing, no loss of detail, at least that I could detect. I suspect I could have done it at CRF 18 or 19, and it would have been well under 1GB and still good quality.

    The first time I got a good appreciation of how important the source quality is was many years ago, when I tried to encode “Waterworld” to fit on one CD. This was an impossible task, since almost the entire movie takes place on the ocean, and moving water is one of the most difficult things for an encoder to deal with. It was very frustrating; I could not get a watchable picture no matter what I did.

    About MPEG licensing, doesn’t the open-source nature of x265 circumvent the licensing quagmire around MPEG? Though, if VP9 is comparable to x265 and doesn’t have licensing issues, then I don’t see why it shouldn’t eventually have its day.

  11. plonk420 says:

    “About MPEG licensing, doesn’t the open-source nature of x265 circumvent the licensing quagmire around MPEG?”

    As much as I’m for an “open” filesharing market, there’s no getting around MPEG-LA’s patents.

  12. John says:

    Thanks for your analysis.

    I just started encoding again myself and ended up at CRF 16, great to see it confirmed!

  13. JungleBoy says:

    The BBC and ITV use VBR encoders these days for HDTV. Most other HD channels still use the old CBR encoders at a fixed 8.5Mbps. The BBC and ITV use a quantizer of about 21 for their 1080i broadcasts. If encoding 1080p (from a 1080i/H.264 source), it’s a waste of time using a quantizer less than 21, otherwise the 1080p will be bigger than the 1080i source. I usually encode 1080p at CRF 21 to 23 with x264 8bit High@L4.1, or 1080p at CRF 23 with x265 10bit Main10@L4. I find that 1080p 10bit x265 gives roughly half the filesize of the 1080i/H.264 source.

    4K TVs can’t play 10bit x264, so it’s a waste of time using 10bit x264.
    4K TVs can play 8bit x264 4:2:0 and either 8bit or 10bit x265 4:2:0.
    Current 4K broadcasts on Astra 2 use 10bit H.265 4:2:0 (some with HLG HDR).
    HEVC players can’t seem to play 4:2:2 or 4:4:4, but some 4K TVs can.

    It would be best to compare the CRF values between x264 8bit 4:2:0 and x265 10bit 4:2:0, as these are the two modes that are currently the most used. To convert an 8bit 1080i source into 10bit (for x265), I use the AviSynth dither tools.
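
    For those who would rather stay within ffmpeg than AviSynth, a minimal sketch of an 8bit 1080i source to 10bit 1080p x265 conversion (assuming a 10-bit-capable libx265 build; yadif used for deinterlacing):
    ffmpeg -i input1080i.mkv -vf yadif -pix_fmt yuv420p10le -c:v libx265 -crf 23 output.mkv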

  14. I always come back to this article when thinking about making changes to encoding profiles; it continues to be the reference work for understanding the options. Thanks for the remarkable work.

    As 10-bit video becomes more widely supported in video decoders and displays, have you thought about conducting this analysis for HEVC 8-bit vs 10-bit perceived quality? I have recently settled on a new profile of H.265 10-bit @ 20 RF, which in a handful of tests produces files similar in size to my previous profile of H.265 8-bit @ 19 RF, but to my eye produces a noticeable improvement in quality. The most obvious is in dark areas, which are always a tough spot for this kind of encoding. To get good results there, you’d typically need to accept a very large increase in file size from a lower RF just for a few scenes. That wasn’t worth it to me.

    With 10-bit it’s a rather remarkable improvement. The blockiness usually associated with very low contrast dark areas at moderate RF is pretty much gone, and there seems to be no cost other than an increase in encoding time.

    But I am always hesitant to adopt a totally new default profile without more analysis. To my eye this looks like a big win, but I wonder what I might not be noticing yet by reducing the RF by one. Leaving it the same adds about 25% to the file size vs. 8-bit in a couple of videos I tested. I suspect it’s OK to reduce it without necessarily making the same sacrifice you would with 8-bit, because perhaps the extra color depth improves the encoder’s ability to make good edge decisions. But I’m just guessing, and this is based on just a handful of anecdotal comparisons I’ve made… I would love to see a professional analysis like this for the greater bit depth.

  15. Excellent topic.
    One question – can you tell us anything about Average Bitrate settings, as opposed to CRF? I have seen articles which recommend using the average bitrate option instead of CRF.

    • lui_gough says:

      ABR modes are generally similar to CBR in the sense that you can predict the size of the file, while retaining some variability, as in VBR modes, to allocate bits to the more complex scenes that need them. Of course, when running in a single pass, the encoder is limited to only a small amount of look-ahead, rather than being able to analyze the whole file as in 2-pass or multi-pass modes.

      Generally speaking, ABR/CBR/VBR (multi-pass) are useful when you are targeting a particular filesize. CRF/CQ modes are useful if you are adamant about maintaining a certain level of quality without any predictability of the size.
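
      As a minimal illustration in ffmpeg syntax (example commands only, not Handbrake itself):
      ffmpeg -i input.mkv -c:v libx264 -b:v 8000k output_abr.mkv
      ffmpeg -i input.mkv -c:v libx264 -crf 18 output_crf.mkv
      The first targets an 8Mbit/s average in a single pass; the second holds quality constant and lets the bitrate land wherever the content demands.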

      – Gough

  16. It’s June 2019 now, and this write-up is still fantastic. Many, many thanks to you – you answered all the questions I had about h264 vs. h265, as I now have a CPU with the power to encode not only h264 but also h265 “in usable time”, and I did not know which CRF to use for my videos in order to save space, of course, without sacrificing too much quality. And finally, whether it’s worth re-encoding my videos to h265. Actually, I think it’s worth doing, but it will take its time.

    Many, many greetings from Germany, you did a wonderful job!!

    Regards
    Rainer

    P.S.: My new CPU is an AMD Ryzen 5 2600; I had the AMD 1090 before, and it’s a bigger step than from night to day…..

  17. graphs says:

    Hello! Fantastic test!

    Quick question though: How did you make those graphs out of the data from ffmpeg? Thank you so much!

  18. WM says:

    I worked out the optimal rate between 23 and 27 -> how’s that for a coincidence?

    hevc_amf -quality 0 -qp_p 23 -qp_i 27
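
    (Those look like ffmpeg’s hevc_amf encoder options; a full invocation might be something like the following sketch, assuming an AMD GPU with AMF support, and noting that the fixed QP values likely require CQP rate control to apply:)
    ffmpeg -i input.mkv -c:v hevc_amf -rc cqp -quality 0 -qp_p 23 -qp_i 27 output.mkv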

  19. Thanks!
    I had downloaded an x265 file when I had a choice of x264 or x265 – the filesize was the same – because newer is better, right? Then I wondered if I’d made the right choice. I was googling “x264 vs x265” and your article popped up! Other articles were split between the two, with a slight edge to x264, but with no explanation as to why.

    • lui_gough says:

      Well, newer is better … but it depends on how you define better.

      H.265/HEVC (which is what x265 produces) is a more modern codec which is more efficient with encoding as compared to H.264/AVC which x264 produces. That means, for the same amount of bits, HEVC offers better quality (or conversely, for the same quality, HEVC file sizes will be smaller).

      The trade-off of this is added computational complexity. What this means is that encoding takes a lot longer (which means more energy use) and that decoding may take a little longer as well. In the early days, this may have been a problem for older computers which did not have a GPU or CPU capable of accelerated decoding, resulting in choppy playback. For older mobile devices, this could have led to more battery usage or, again, an inability to play back.

      It is also important to note when an encode was made, as codecs – especially encoders – improve over time as they become more feature-complete. Earlier versions of the x265 encoder may not have had all the bit-rate efficiency enhancements, while some users may deliberately select options that disable techniques which cost long encoding times, favouring fast encoding (e.g. preset fast) over efficiency. As a result, depending on the encoder set-up and age, a comparison could well and truly make x265 look bad (e.g. an old version of x265 configured to take the same amount of time to encode as x264).

      However, in general, HEVC allows for encoding techniques which can reduce bit-rate by around 50%, although this depends on the material and viewer preferences – it’s up to encoders to implement them and encodists to choose the right settings. It is also up to the viewer to ensure their player doesn’t use “speed-up” optimisations that skip certain steps (e.g. deblocking), as that will reduce the decoded quality as well. Any edge to x264 may be due to other considerations – such as the wide availability of H.264/AVC video decode acceleration, its near-universal acceptance (e.g. YouTube famously does not support H.265/HEVC due to patent reasons), its modest computational requirements on modern computers, etc.

      – Gough

  20. Dima says:

    This is a fantastic article. Thank you so much. A pleasure to read. I admire people like you. A+++.

  21. Tom says:

    By far the most helpful article on the subject I’ve seen – even now in (almost) 2023! Thank you for publishing this.

  22. Dave says:

    In my extensive tests, CRF 18 in H.264 has never been virtually transparent, with minimal loss of quality compared to the source; there is just too much loss of fine detail. CRF 16 appears to be the acceptable sweet spot for both SD and HD material, I found, assuming the source is of good quality. Your A+ article thankfully challenges and asks the questions many wanted to ask on those so-called techie forums but always got shouted down over – the “you must accept CRF 18 as gospel” nonsense! Thank you for this!
