Video Codec Tests: x264 CRF vs enc_qsv CQ vs x265 CRF in Handbrake 0.10.5

After my earlier attempt at some codec comparisons, I realized that something had gone badly wrong with the “difficult case” clip. Specifically, I wanted to use an interlaced video clip, and I naturally assumed that interlaced encoding would work fine as the option was present in x265. However, it seems that the feature is experimental, and the output had odd combing faults recurring every so many lines.

As the clip was actually shot progressive and pulled up to make a 29.97fps interlaced clip, I decided to use ffmpeg to pull it down to a 23.976fps progressive clip and use this as the difficult case source clip. The clip is now technically different, although visually it still looks the same. Consequently, everything needed to be re-encoded for the test, with new sample frames chosen and viewing checks undertaken.

Because of my perpetual “busy-ness” and a small lack of motivation, this post took a whole month longer than expected to compile.

Methodology

The clip used is Girls’ Generation – Galaxy Supernova, with the source file being H.264 [email protected] (27163kbit/s) 3m29s [email protected] 8-bit 4:2:0. A low-quality YouTube version that is similar (but not identical) can be seen here. The file was pulled down to 23.976fps using ffmpeg and stored as a lossless ffv1 .avi file, which was then encoded using Handbrake 0.10.5.0 (64-bit) into the various test files. This time I settled on the [email protected] profile to even the playing field for all contenders (as x265 does not have a High profile setting) and to remove bitrate caps at the low CRF end of the scale.
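The exact ffmpeg invocation used for the pull-down is not recorded here. As a rough sketch only, assuming ffmpeg's `fieldmatch` and `decimate` filters were used for the inverse telecine (my assumption) and with placeholder file names, the command line could be assembled like this:

```python
import shlex


def build_pulldown_cmd(src, dst):
    """Return an ffmpeg argument list for inverse telecine to lossless FFV1.

    Assumption: fieldmatch + decimate undoes the pull-up. The post does not
    say which filters were actually chosen, and the file names are made up.
    """
    return [
        "ffmpeg",
        "-i", src,
        "-vf", "fieldmatch,decimate",  # re-pair fields, then drop the duplicate frames
        "-c:v", "ffv1",                # lossless intermediate codec
        dst,
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be inspected first.
    print(shlex.join(build_pulldown_cmd("source.mp4", "pulldown.avi")))
```

Printing the command first makes it easy to check the filter chain before committing to a long lossless encode.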

As everything needed to be re-encoded, I decided to expand the scope to add in another competitor – Intel Quicksync, which Handbrake supports for H.264 encoding through enc_qsv. This utilizes special hardware elements of Intel 4th generation and newer Core i-series CPUs to perform accelerated video decoding (not used) and encoding. It was used in CQ mode, as it has no CRF mode, using the Best Quality preset. CQ represents Constant Quantizer and maintains a constant quality without regard to visual perception, whereas CRF used in x264/x265 is like an “upgraded” version of CQ which takes visual perception into account to try and improve perceived quality for the same size. My interest was to see if Intel Quicksync is a viable compressor (hardware compression has a relatively bad reputation in general), and how its CQ scale compares to the CRF scale used by x264/x265. Encoding with x264/x265 used the Very Slow preset.

To make encoding times more comparable, all encoding was done on my Intel Core i7-4770k system, overclocked to 3.9GHz and dedicated solely to the project. After encoding, files were checked for their SSIM and PSNR values using FFmpeg (version N-80066-g566be4f built by Zeranoe) and the encoding frame rates were extracted from the Handbrake log files. The synthetic metrics were analyzed and graphed. The raw data is provided in the appendix section at the end of the article.
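For anyone wanting to repeat the measurements, FFmpeg's `ssim` and `psnr` filters compare an encoded file against a reference. The sketch below only builds the command line. The file names are placeholders, and the precise options used for this article are not recorded, so treat it as one plausible form rather than the exact command.

```python
def build_metric_cmd(encoded, reference, metric):
    """Return an ffmpeg argument list that reports SSIM or PSNR.

    The first input is the file under test and the second is the reference.
    The decoded output is discarded through the null muxer; the summary is
    printed in ffmpeg's log output at the end of the run.
    """
    if metric not in ("ssim", "psnr"):
        raise ValueError("metric must be 'ssim' or 'psnr'")
    return [
        "ffmpeg",
        "-i", encoded,
        "-i", reference,
        "-lavfi", metric,   # two-input filter: test file vs reference
        "-f", "null", "-",  # no output file is written
    ]


if __name__ == "__main__":
    for m in ("ssim", "psnr"):
        print(" ".join(build_metric_cmd("x264_crf20.mp4", "pulldown.avi", m)))
```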

Subjective opinions based on still frames were constructed. The frames selected from the pulled-down footage were frame 314 containing sharp-edged text and frame 3695 containing Sooyoung’s face and hair details. Opinions based on watching the encoded footage are also presented and are based upon viewing on a 50″ LCD TV in a dark room, with all enhancements turned off and calibrated as best as I could to ensure no details were clipped.

Results: Bitrate vs CRF/CQ

hbenc2-br-vs-crfval

Plotting the bitrates versus CRF/CQ values shows significant deviations in the scale of enc_qsv versus x264/x265. Noting the log scale, it seems that enc_qsv consumes roughly two to three times the bitrate of x264 at the same numerical value. In terms of scaling, it was found that x264 needs a movement of 5.975 units to halve/double the bitrate, x265 is slightly more sensitive, needing 5.458 units, and enc_qsv is less sensitive, needing a movement of 7.001 units to halve/double the bitrate.
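One way to arrive at such "units per halving" figures is a straight-line fit of log2(bitrate) against the CRF/CQ value. The sketch below does this with a plain least-squares fit over the appendix bitrates. The fitting method is my assumption about how such numbers could be derived, so the results should land close to, but not necessarily exactly on, the values quoted above.

```python
import math

CRF = list(range(8, 49, 4))  # 8, 12, ..., 48

# Bitrates in kbit/s, copied from the appendix tables.
BITRATE = {
    "x264": [80431, 56200, 37980, 24748, 15310, 8945, 5209, 3169, 2014, 1360, 921],
    "enc_qsv": [150207, 112393, 83636, 59988, 42045, 28690, 18493, 11569, 7209, 4576, 3123],
    "x265": [67999, 46754, 31004, 19626, 11598, 6446, 3573, 2012, 1117, 724, 634],
}


def units_per_halving(crf, bitrate):
    """Least-squares slope of log2(bitrate) vs CRF, inverted.

    A slope of -1/k means the bitrate halves for every k units of CRF/CQ.
    """
    y = [math.log2(b) for b in bitrate]
    n = len(crf)
    mean_x = sum(crf) / n
    mean_y = sum(y) / n
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(crf, y))
    sxx = sum((x - mean_x) ** 2 for x in crf)
    return -1.0 / (sxy / sxx)


if __name__ == "__main__":
    for name, rates in BITRATE.items():
        print(f"{name}: {units_per_halving(CRF, rates):.3f} units per halving")
```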

Results: Bitrate Savings at CRF Value

hbenc2-br-ratio

Using x264 as our “reference” for bitrate, at the same numerical value, enc_qsv used an average of 290% of the bitrate x264 used. The same comparison with x265 saw an average of 71.5% of the bitrate x264 used, so a slight saving there. The savings do vary as a function of CRF and in a semi-unpredictable way. However, it seems for x265, the savings are least at the high-quality end, which is unfortunate.
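The averages above can be checked against the appendix data by taking the bitrate ratio at each CRF/CQ value and averaging the ratios. This is a minimal sketch of that arithmetic, assuming a simple unweighted mean across the eleven test points.

```python
# Bitrates in kbit/s at CRF/CQ 8, 12, ..., 48, copied from the appendix tables.
X264 = [80431, 56200, 37980, 24748, 15310, 8945, 5209, 3169, 2014, 1360, 921]
QSV = [150207, 112393, 83636, 59988, 42045, 28690, 18493, 11569, 7209, 4576, 3123]
X265 = [67999, 46754, 31004, 19626, 11598, 6446, 3573, 2012, 1117, 724, 634]


def mean_ratio(candidate, reference):
    """Unweighted mean of candidate/reference bitrate ratios, point by point."""
    ratios = [c / r for c, r in zip(candidate, reference)]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    print(f"enc_qsv vs x264: {mean_ratio(QSV, X264):.1%}")
    print(f"x265 vs x264:    {mean_ratio(X265, X264):.1%}")
```

The per-point ratios are worth printing as well if the variation across the CRF range is of interest, since the mean hides it.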

Looking at bitrates at CRF/CQ values alone is misleading, as this doesn’t take into account the quality delivered at each bitrate.

Results: Synthetic Metrics (SSIM and PSNR)

We begin by examining the normalized SSIM value, as this is considered “less sensitive” to minor quality improvements.

hbenc2-ssimnor-vs-crfval

When considering CRF/CQ values, enc_qsv gives better SSIMs for the same value. This is expected, since it’s using almost three times the bitrate, so the comparison is somewhat misleading. Comparing x264 with x265, they actually cross over at CRF28, with a slight lead to x264 for values less than 28 and a slightly bigger lead to x265 for values greater than 28. This suggests that at high CRF values x265 degrades the image less than x264 at the same CRF, even though lower bitrates are involved.

hbenc2-ssimnor-vs-br

To even this out, the results can be plotted versus bitrate to show what level of quality is being achieved per bitrate. In this case, it can be seen that enc_qsv is at a notable disadvantage, with the difference between x265 and x264 being smaller than the difference between enc_qsv and x264. This implies that enc_qsv, as a hardware-optimized encoder, still trades away some compression efficiency compared to pure-software implementations. The bitrate-quality curves converge at higher bitrates, as SSIM is not as sensitive to small quality changes. However, at bitrates below 25Mbit/s, x265 is seen to have a clear advantage over x264, even if it appears to be small.

hbenc2-ssimdb-vs-crfval

The same information can be represented in the decibel scale to make small changes more apparent.
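The decibel figures in the appendix are consistent with the usual conversion of SSIM to decibels, -10·log10(1 - SSIM), which stretches out values close to 1. A short sketch of that conversion:

```python
import math


def ssim_to_db(ssim):
    """Convert a normalized SSIM value (below 1.0) to decibels."""
    if not 0.0 <= ssim < 1.0:
        raise ValueError("SSIM must be in the range [0, 1)")
    return -10.0 * math.log10(1.0 - ssim)


if __name__ == "__main__":
    # Two x264 values from the appendix (CRF8 and CRF48).
    for value in (0.996259, 0.771202):
        print(f"SSIM {value} -> {ssim_to_db(value):.2f} dB")
```

Small differences from the tabulated values in the later decimal places are expected, as the table's SSIM column is rounded to six digits.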

hbenc2-ssimdb-vs-br

For SSIM, it appears that enc_qsv lags behind at any given bitrate by 2-3dB which is quite a notable difference. In this graph, the difference between x264 and x265 at each bitrate is much smaller, barely 1dB or thereabouts. It’s notable that in this scale, it’s clear that x265 is superior at all bitrates – as you would hope it to be.
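Because each encoder produced a different set of bitrates, comparing them "at the same bitrate" means reading between the measured points. The sketch below interpolates SSIM (dB) linearly against log-bitrate using the appendix data. The choice of interpolation is my assumption, standing in for reading the values off the graph.

```python
import math

# (bitrate in kbit/s, SSIM in dB) pairs copied from the appendix tables.
X264 = [(80431, 24.270122), (56200, 21.733926), (37980, 19.4097),
        (24748, 17.181453), (15310, 15.024512), (8945, 13.001395),
        (5209, 11.223395), (3169, 9.691405), (2014, 8.387561),
        (1360, 7.306731), (921, 6.405481)]
QSV = [(150207, 26.998622), (112393, 24.479861), (83636, 22.301061),
       (59988, 20.280631), (42045, 17.902841), (28690, 16.12245),
       (18493, 14.331049), (11569, 12.663095), (7209, 11.184576),
       (4576, 9.823159), (3123, 8.656299)]
X265 = [(67999, 23.173434), (46754, 20.902864), (31004, 18.734503),
        (19626, 16.664254), (11598, 14.724241), (6446, 12.995272),
        (3573, 11.515755), (2012, 10.148233), (1117, 8.812468),
        (724, 7.761241), (634, 7.336581)]


def ssim_db_at(bitrate, points):
    """Interpolate SSIM (dB) at a bitrate, linear in log(bitrate)."""
    pts = sorted(points)
    if not pts[0][0] <= bitrate <= pts[-1][0]:
        raise ValueError("bitrate outside the measured range")
    for (b0, s0), (b1, s1) in zip(pts, pts[1:]):
        if b0 <= bitrate <= b1:
            t = (math.log(bitrate) - math.log(b0)) / (math.log(b1) - math.log(b0))
            return s0 + t * (s1 - s0)


if __name__ == "__main__":
    for kbps in (8000, 20000, 50000):
        row = {name: ssim_db_at(kbps, pts)
               for name, pts in (("x264", X264), ("enc_qsv", QSV), ("x265", X265))}
        print(kbps, {k: round(v, 2) for k, v in row.items()})
```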

We now move on to examining the PSNR plots, which are more sensitive to small changes and are typically used to examine and fine-tune codecs.

hbenc2-psnr-vs-crfval

At each CRF/CQ value, it seems that x264 and x265 are a close match for CRF values of 28 or less, with x264 appearing slightly ahead. At higher CRFs (lower quality), x265 deviates upwards and produces better PSNR values for the same CRF selection. enc_qsv, with its completely different scale, shows a PSNR about 5dB higher at low CQ values, but also has a nearly straight-line PSNR compared to the curved lines of x264/x265. Its quality degradation, as measured by PSNR, is therefore more directly in line with the CQ value selected. Another difference is that the gap between the average PSNR and minimum PSNR seems to increase as the CQ values get higher with enc_qsv, implying that the quality variation increases as lower quality is selected. This might be undesirable, as it might be observed as blurry frames which suddenly “snap” back to clarity at the next keyframe. x264 and x265 both seem to keep the difference “limited”.

hbenc2-psnr-vs-br

When the bitrate is considered with the delivered quality, for PSNR, both x264 and x265 have about a 1.5dB increase in average PSNR and a 3dB increase in minimum PSNR over enc_qsv at the same bitrate across most of the chart. The values converge towards the low end (sub 10Mbit/s), which means that the compression efficiency difference may not be of significance to those looking to transcode to fit mobile devices (where low bitrates are more common). In this metric, x265 beats x264 at all bitrates, with a margin of about 0.5dB at the high bitrate end, and up to about 2.5dB at 5Mbit/s.

Results: Encode Rate

hbenc2-encr-vs-crfval

Keeping in mind that the encoding was performed on an Intel Core i7-4770k, the encode rates for x264 and x265 both varied as a function of CRF, with higher CRFs (lower quality) encoding faster than lower CRFs. The encoding rate for x264 reached a peak of 7.8fps, and x265 reached a peak of 2.7fps. At CRF8, this fell to 2.4fps and 0.3fps respectively.

As enc_qsv uses a hardware based encoder, its encode rate was mostly stable and reached about 18-19fps. This speed may not represent the maximum encoding rate possible, as it may have been constrained by the speed of decoding and piping in the ffv1 encoded source material. Regardless, this is still more than double the speed of x264.
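The speed advantage can be put into numbers from the appendix encode rates. This sketch divides the enc_qsv frame rate by the x264 frame rate at each numerical CRF/CQ value. Note that this pairing is by number only and is my simplification: equal CRF and CQ values do not represent equal quality or bitrate.

```python
CRF = list(range(8, 49, 4))  # 8, 12, ..., 48

# Encode rates in frames per second, copied from the appendix tables.
X264_FPS = [2.409331, 2.669777, 3.001713, 3.494829, 4.007893, 4.59113,
            5.198533, 5.945302, 6.691361, 7.260373, 7.846578]
QSV_FPS = [17.920824, 17.885168, 18.606039, 18.590631, 18.709183, 18.778584,
           18.748028, 18.732244, 18.78739, 18.738522, 18.724827]


def speedups(fast, slow, labels):
    """Map each label to the ratio fast/slow at that test point."""
    return {label: f / s for label, f, s in zip(labels, fast, slow)}


if __name__ == "__main__":
    ratios = speedups(QSV_FPS, X264_FPS, CRF)
    for value, ratio in ratios.items():
        print(f"value {value:2d}: enc_qsv is {ratio:.2f}x the speed of x264")
    print(f"smallest speed-up: {min(ratios.values()):.2f}x")
```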

hbenc2-encr-vs-br

The same information can be plotted against bitrate. As enc_qsv’s scale is different, the bitrates produced by enc_qsv can be quite high. At the higher bitrates, it seems the speed suffers slightly. For x264 and x265, it seems there is a drastic increase in encoding speed at very low bitrates, below about 5000kbit/s. Speeds tail off as bitrates increase.

Results: Still Image Samples

Frame 314

hbenc2-frame314

This frame was chosen due to the high detail in the background and sharp edges of the text which should allow encoding deficiencies to be identified.

With x264, no visual anomalies can be easily identified in the CRF8 to CRF20 samples. At CRF24, the purple stripe across the background near the top of the image seems to have slightly less pink vibrance, and the straight-line solid colour part of the N is developing some false colour bleeding. The pink-purple stripe on the lower part of the frame is a little rough in appearance. By CRF28, the upper stripe has lost some of its detail and is blending into the background, and some of the shadowing in the O in the text is being lost. The green stripe at the lower edge of the frame has noticeable quilting. CRF32 shows noticeable quilting with some blocks being smoothed, and by CRF36 most of them have been smoothed. From there on, quality rapidly deteriorates.

For x265, CRF8 to CRF16 are visually transparent. CRF20 appears to have a slight hint of geometry changes in the upper purple stripe, which suggests that some quilting “might” be developing. It’s really hard to pick up, and I might even be imagining it. CRF24 shows the upper purple stripe developing more hints of sharp edges, and the N seems to have become slightly “transparent”, being encroached upon by the background and slightly softened. The lower pink stripe has more hardened edges, and by CRF28, the text seems noticeably softer and is losing shadow detail, with noticeable quilting. CRF32 has some noticeable quilting and geometric disruptions, with larger blocks by CRF36 and similar rapid quality degradation from there.

enc_qsv was visually transparent through to around CQ28. At CQ32, the background details appear to be contaminated with a little noise and smoothing, but the text is well rendered. It seems that the enc_qsv video preserves the text detail quite strongly, even through to CQ48, although background detail is rapidly lost from CQ36 onward where some of the background details appear to be blended.

Frame 3695

hbenc2-frame3695

This frame was chosen as there were some details in foreground and background to preserve. This particular one is tricky to judge, as the frame in question does have some encoding artifacts in the reference which need to be faithfully preserved in the output to avoid introducing further losses.

x264 does a transparent job up to CRF16, with CRF20 seeing some of the blockiness in the hair starting to be disturbed and smoothed out. CRF24 continues this further with CRF28 starting to introduce some encoding disturbances in the form of sparkles and blocking near sharp edges of the face. Geometrical irregularities in the background are seen in CRF32 along with more loss of face skin-tone. From there, CRF36 shows the background being destroyed and quality rapidly declining.

x265 does a mostly transparent job up to CRF20, with CRF24 showing the same smoothing out of the blocking in the hair, although the result actually looks quite visually acceptable and consistent. CRF28 shows further smoothing and slight disruptions to the edge of the face. CRF32 shows strong macroblocking of various sizes and more loss of skin tone. By CRF36, background disruption is occurring and hair texture is beginning to be completely lost.

enc_qsv is transparent until CQ20, with CQ24 showing a very slight softening of the blockiness in the hair. At CQ28, the softening becomes more obvious but most details are retained. Some noise seems to be introduced along the edges of the cheeks, which gets more severe with CQ32. CQ36 continues this, with the hair detail suffering dramatically, and CQ40 has the hair detail almost completely lost and cheek/nose structure being damaged.

Results: Subjective Viewing

hbenc2-svtable

The x264/x265 figures are broadly similar, with CRF20 producing indistinguishable results and CRF24 producing very good results. enc_qsv is a little different as its bitrate scale is different, with CQ24 producing indistinguishable results, and CQ28 producing very good results. Again, the difference in low-bitrate degradation is most notable, with my personal preference being for x265’s bitrate-constrained behaviour of “smoothing” the image rather than introducing strong macroblocking/quilting and false details. At higher bitrates, they’re mostly on par at an identical CRF level for this difficult case footage.

That being said, there is some variation between CRF levels considered “visually transparent” and this is likely due to the source material quality and the nature of the source material itself. As the material source is already compressed (most sources will be in some way), any artifacts which exist in the source are considered details to be preserved in future encodes. If the detail was already lost in the source, it cannot be recovered, and a higher CRF (lower quality) may be necessary to preserve the remaining detail without blowing out the bitrate. Highly complex material with lots of transient scenes such as this particular clip may make codec deficiencies easier to “hide” as we may not have enough time to fully evaluate brief scenes before they are removed from the screen. Visually complex material with strong patterning also provides places for noise to “hide” without being visually obvious. So rather counter-intuitively, the cleaner “average case” material required a higher CRF to avoid producing observable artifacts, as compared to the “difficult case” material.

Conclusion

It seems that for the difficult case material, the advantages of x265 in terms of space savings are more limited than for the average case. While the average savings amounted to about 28.5%, the synthetic metrics appeared to show x265 producing the larger savings for low-bitrate scenarios, rather than high-quality high-bitrate scenarios. This may be acceptable where bandwidth-limited transmission is the target, but isn’t optimal for storage and archival, especially considering the significant hit to encode rates to achieve this. At the same CRF values, the PSNR and SSIM metrics tended to show agreement across x264 and x265, implying similar visual quality was being achieved.

Intel’s Quicksync encoding is effective, showing a >2-fold speed increase compared to pure x264 software encoding. However, its CQ encoding scale is not directly translated to the CRF values that x264 uses, and there is a slight compression efficiency disadvantage as well. The speed increase, however, could make it worthwhile for certain users who may not be interested in squeezing out every last bit.

On the whole, subjective viewing and still frame analysis both yielded similar results and agreed with the synthetic metrics, suggesting that for this difficult case clip, CRF20 was visually transparent (for x264/x265) and CRF24 was very good. For enc_qsv, the respective values are approximately CQ24 and CQ28. Counter-intuitively, more complex material seems to tolerate slightly higher CRF values, possibly due to the transient nature of the scenes and strong patterns which can make deficiencies less obvious.

Appendix: Raw Data

x264
crf br     ssim     ssim (dB) psnr-av   psnr-min  enc-fps
8    80431 0.996259 24.270122 49.079511 42.799438 2.409331
12   56200 0.993292 21.733926 45.872697 40.598088 2.669777
16   37980 0.988544 19.4097   42.639772 36.824497 3.001713
20   24748 0.980864 17.181453 39.386707 33.235945 3.494829
24   15310 0.968555 15.024512 36.20401  29.81061  4.007893
28    8945 0.949897 13.001395 33.274627 26.354434 4.59113
32    5209 0.92455  11.223395 30.796128 23.510602 5.198533
36    3169 0.892636 9.691405  28.645187 20.9171   5.945302
40    2014 0.855041 8.387561  26.704712 18.49257  6.691361
44    1360 0.81408  7.306731  25.075933 16.915291 7.260373
48     921 0.771202 6.405481  23.655747 15.703609 7.846578

enc_qsv
crf br     ssim     ssim (dB) psnr-av   psnr-min  enc-fps
8   150207 0.998004 26.998622 53.610681 49.913408 17.920824
12  112393 0.996435 24.479861 50.712053 46.636429 17.885168
16   83636 0.994113 22.301061 47.97989  43.539322 18.606039
20   59988 0.990626 20.280631 45.065929 39.953024 18.590631
24   42045 0.983793 17.902841 41.703423 36.282667 18.709183
28   28690 0.975579 16.12245  39.017255 33.058641 18.778584
32   18493 0.963111 14.331049 36.285278 29.824356 18.748028
36   11569 0.945839 12.663095 33.828592 26.923752 18.732244
40    7209 0.923872 11.184576 24.568336 67.168087 18.78739
44    4576 0.895844 9.823159  29.676427 22.165962 18.738522
48    3123 0.863739 8.656299  27.981151 20.207771 18.724827

x265
crf br     ssim     ssim (dB) psnr-av   psnr-min  enc-fps
8    67999 0.995184 23.173434 48.232495 43.026082 0.336379
12   46754 0.991877 20.902864 45.165876 39.340194 0.392566
16   31004 0.986617 18.734503 42.077445 35.808481 0.466758
20   19626 0.978444 16.664254 39.043464 32.509358 0.620252
24   11598 0.966304 14.724241 36.173184 29.336083 0.767254
28    6446 0.949827 12.995272 33.686856 26.353374 0.950986
32    3573 0.929462 11.515755 31.68793  24.076365 1.180606
36    2012 0.903356 10.148233 29.894034 22.547193 1.484253
40    1117 0.868552 8.812468  28.114236 21.458328 1.856564
44     724 0.832554 7.761241  26.829709 21.090787 2.312682
48     634 0.815353 7.336581  26.347943 20.948869 2.702679

About lui_gough

I'm a bit of a nut for electronics, computing, photography, radio, satellite and other technical hobbies.