After my earlier attempt at some codec comparisons, I realized that something had gone badly wrong with the “difficult case” clip. It was an interlaced clip, and I naturally assumed that interlaced encoding would work fine since the option was present in x265. However, the feature appears to be experimental, and the output showed odd combing faults at regular line intervals.
As the clip was actually shot progressive and pulled up to make a 29.97fps interlaced clip, I decided to use ffmpeg to pull it back down to a 23.976fps progressive clip and use that as the difficult case source. The clip is therefore technically different, although visually it still looks the same. As a result, everything had to be re-encoded, sample frames re-chosen and the auditioning checks repeated.
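For readers unfamiliar with pulldown, the frame-rate arithmetic behind the 29.97fps-to-23.976fps conversion is sketched below. It assumes a standard 2:3 cadence, which the post does not state explicitly. The exact ffmpeg filter chain used is not given either, so none is shown.

```python
from fractions import Fraction

# "29.97fps" is shorthand for this exact rational rate.
INTERLACED_RATE = Fraction(30000, 1001)

# Assumed 2:3 cadence: four progressive frames are held for 2, 3, 2 and 3 fields.
CADENCE = (2, 3, 2, 3)
FIELDS_PER_CYCLE = sum(CADENCE)                      # 10 fields
INTERLACED_FRAMES_PER_CYCLE = FIELDS_PER_CYCLE // 2  # 5 interlaced frames

# Pulling down recovers the 4 original frames from every 5 interlaced frames.
progressive_rate = INTERLACED_RATE * Fraction(len(CADENCE), INTERLACED_FRAMES_PER_CYCLE)

print(f"{progressive_rate} = {float(progressive_rate):.3f} fps")
```

The result is exactly 24000/1001, which is why the pulled-down clip is described as 23.976fps rather than 24fps.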
Because of my perpetual “busy-ness” and a small lack of motivation, this post took a whole month longer than expected to compile.
The clip used is Girls’ Generation – Galaxy Supernova, with the source file being an H.264 stream (27163kbit/s, 3m29s, 29.97fps, 8-bit 4:2:0). A low-quality YouTube version that is similar (but not identical) can be seen here. The file was pulled down to 23.976fps using ffmpeg and encoded as a lossless FFV1 .avi file. That file was then encoded into the various test files using Handbrake 0.10.5.0 (64-bit). This time I settled on the Main profile to even the playing field for all contenders (as x265 does not have a High profile setting) and to remove bitrate caps at the low CRF end of the scale.
As everything needed to be re-encoded, I decided to expand the scope by adding another competitor: Intel Quicksync, which Handbrake supports for H.264 encoding through enc_qsv. This uses dedicated hardware in Intel 4th generation and newer Core i-series CPUs to accelerate video decoding (not used here) and encoding. Because enc_qsv has no CRF mode, it was used in CQ mode with the Best Quality preset. CQ stands for Constant Quantizer and holds quality constant without regard to visual perception. The CRF mode used in x264/x265 is like an “upgraded” version of CQ: it takes visual perception into account to improve perceived quality for the same size. I wanted to see whether Intel Quicksync is a viable compressor (hardware compression has a relatively bad reputation in general), and how its CQ scale compares to the CRF scale used by x264/x265. Encoding with x264/x265 used the Very Slow preset.
To make encoding times comparable, all encoding was done on my Intel Core i7-4770K system, overclocked to 3.9GHz and dedicated solely to the project. After encoding, the SSIM and PSNR values of each file were measured using FFmpeg (version N-80066-g566be4f built by Zeranoe). The encoding frame rates were extracted from the Handbrake log files. The synthetic metrics were then analyzed and graphed. The raw data is provided in the appendix at the end of the article.
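The post doesn't give the exact FFmpeg or Handbrake invocations. As a sketch, the appendix columns could be pulled out of the logs with a few regular expressions. The line formats matched below are assumptions, based on the summary lines that FFmpeg's ssim and psnr filters and Handbrake's job log typically print. Check them against your own builds before relying on this. The sample log text is abbreviated with "...", and its numbers are the x264 CRF8 row from the appendix.

```python
import re

# Assumed summary-line formats; verify against your FFmpeg/Handbrake versions.
SSIM_RE = re.compile(r"SSIM .*All:([\d.]+) \(([\d.]+|inf)\)")
PSNR_RE = re.compile(r"PSNR .*average:([\d.]+|inf) min:([\d.]+|inf)")
FPS_RE = re.compile(r"average encoding speed for job is ([\d.]+) fps")


def parse_metrics(ffmpeg_log: str, handbrake_log: str) -> dict:
    """Extract the appendix columns (SSIM, SSIM dB, PSNR avg/min, fps)."""
    ssim = SSIM_RE.search(ffmpeg_log)
    psnr = PSNR_RE.search(ffmpeg_log)
    fps = FPS_RE.search(handbrake_log)
    if not (ssim and psnr and fps):
        raise ValueError("expected summary line not found in log")
    return {
        "ssim": float(ssim.group(1)),
        "ssim_db": float(ssim.group(2)),  # float() also accepts "inf"
        "psnr_avg": float(psnr.group(1)),
        "psnr_min": float(psnr.group(2)),
        "enc_fps": float(fps.group(1)),
    }


# Abbreviated, illustrative log excerpts (not real captured logs).
SAMPLE_FFMPEG_LOG = (
    "[Parsed_ssim_0 @ ...] SSIM Y:... All:0.996259 (24.270122)\n"
    "[Parsed_psnr_1 @ ...] PSNR y:... average:49.079511 min:42.799438 max:...\n"
)
SAMPLE_HANDBRAKE_LOG = "work: average encoding speed for job is 2.409331 fps\n"

result = parse_metrics(SAMPLE_FFMPEG_LOG, SAMPLE_HANDBRAKE_LOG)
print(result)
```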
Subjective opinions were also formed from still frames. The frames selected from the pulled-down footage were frame 314, containing sharp-edged text, and frame 3695, containing Sooyoung’s face and hair details. Opinions based on watching the encoded footage are also presented. These come from viewing on a 50″ LCD TV in a dark room, with all enhancements turned off and the set calibrated as best I could to ensure no details were clipped.
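To locate the sample frames in a player, the frame numbers can be converted to timestamps. This sketch assumes the numbers are zero-based indices into the 23.976fps pulled-down stream. The post doesn't say which convention was used, so the result may be off by one frame (about 42ms).

```python
from fractions import Fraction

# Pulled-down frame rate: 23.976fps is exactly 24000/1001.
RATE = Fraction(24000, 1001)


def frame_to_timestamp(frame: int, rate: Fraction = RATE) -> str:
    """Return HH:MM:SS.mmm for a zero-based frame index (milliseconds truncated)."""
    seconds = frame / rate  # exact rational number of seconds
    whole = int(seconds)
    millis = int((seconds - whole) * 1000)
    hours, rem = divmod(whole, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


for f in (314, 3695):
    print(f, frame_to_timestamp(f))
```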
Results: Bitrate vs CRF/CQ
Plotting bitrate against CRF/CQ value shows that the enc_qsv scale deviates significantly from that of x264/x265. Bearing in mind the logarithmic scale, enc_qsv consumes roughly two to three times the bitrate of x264 at the same numerical value. In terms of scaling, x264 needs a change of 5.975 units to halve/double the bitrate. x265 is slightly more sensitive, needing 5.458 units, and enc_qsv is less sensitive, needing 7.001 units.
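The post doesn't say how the “units to halve/double” figures were derived. One plausible method, assumed here, is a least-squares fit of log2(bitrate) against the CRF/CQ value. The negative reciprocal of the slope is then the number of steps per halving. Applied to the appendix bitrates, this gives values close to, though not necessarily identical with, the quoted figures.

```python
import math

CRF = [8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48]
# Bitrates in kbit/s, copied from the appendix.
BITRATE = {
    "x264": [80431, 56200, 37980, 24748, 15310, 8945, 5209, 3169, 2014, 1360, 921],
    "x265": [67999, 46754, 31004, 19626, 11598, 6446, 3573, 2012, 1117, 724, 634],
    "enc_qsv": [150207, 112393, 83636, 59988, 42045, 28690, 18493, 11569, 7209, 4576, 3123],
}


def units_per_halving(x, bitrates):
    """Fit log2(bitrate) = a + b*x by least squares and return -1/b.

    Each CRF/CQ step multiplies bitrate by 2**b (b < 0 here), so -1/b
    steps halve the bitrate.
    """
    y = [math.log2(br) for br in bitrates]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return -sxx / sxy  # equals -1/b, since b = sxy / sxx


halving = {name: units_per_halving(CRF, brs) for name, brs in BITRATE.items()}
for name, units in halving.items():
    print(f"{name}: {units:.3f} units to halve bitrate")
```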
Results: Bitrate Savings at CRF Value
Using x264 as the bitrate “reference”, at the same numerical value enc_qsv used on average 290% of the bitrate x264 used. The same comparison for x265 gave an average of 71.5%, a saving of about 28.5%. The savings vary with CRF in a somewhat irregular way. However, for x265 they appear to be smallest at the high-quality end, which is unfortunate.
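The averages above appear to be arithmetic means of the per-CRF bitrate ratios. That this is how they were computed is an assumption, but doing so with the appendix data gives figures in line with the quoted 290% and 71.5%.

```python
# Bitrates in kbit/s at CRF/CQ 8, 12, ..., 48, copied from the appendix.
X264 = [80431, 56200, 37980, 24748, 15310, 8945, 5209, 3169, 2014, 1360, 921]
X265 = [67999, 46754, 31004, 19626, 11598, 6446, 3573, 2012, 1117, 724, 634]
ENC_QSV = [150207, 112393, 83636, 59988, 42045, 28690, 18493, 11569, 7209, 4576, 3123]


def mean_ratio(candidate, reference):
    """Arithmetic mean of per-level bitrate ratios candidate/reference."""
    ratios = [c / r for c, r in zip(candidate, reference)]
    return sum(ratios) / len(ratios)


qsv_ratio = mean_ratio(ENC_QSV, X264)
x265_ratio = mean_ratio(X265, X264)
print(f"enc_qsv: {qsv_ratio:.1%} of x264's bitrate")
print(f"x265:    {x265_ratio:.1%} of x264's bitrate (saving {1 - x265_ratio:.1%})")
```

A mean of ratios weights every CRF step equally. A bitrate-weighted average would be dominated by the high-bitrate, low-CRF end, where x265's savings are smallest.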
Looking at bitrates at CRF/CQ values alone is misleading, as this doesn’t take into account the quality delivered at each bitrate.
Results: Synthetic Metrics (SSIM and PSNR)
We begin by examining the normalized SSIM value, as this is considered “less sensitive” to minor quality improvements.
When compared at equal CRF/CQ values, enc_qsv gives better SSIM values. This is expected, since it uses almost three times the bitrate, so the comparison is somewhat misleading. Comparing x264 with x265, the curves cross over at about CRF28. x264 has a slight lead below CRF28, and x265 a somewhat bigger lead above it. This suggests that, at higher CRF values, x265 degrades the image less than x264 at the same CRF value, even though it uses a lower bitrate.
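The crossover point can be estimated from the appendix SSIM values by finding where the x264 minus x265 difference changes sign. Linear interpolation between the two bracketing CRF steps is assumed here. On this data the difference is still marginally positive at CRF28 and clearly negative at CRF32, so the estimate lands just above 28.

```python
CRF = [8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48]
SSIM_X264 = [0.996259, 0.993292, 0.988544, 0.980864, 0.968555, 0.949897,
             0.92455, 0.892636, 0.855041, 0.81408, 0.771202]
SSIM_X265 = [0.995184, 0.991877, 0.986617, 0.978444, 0.966304, 0.949827,
             0.929462, 0.903356, 0.868552, 0.832554, 0.815353]


def crossover(x, a, b):
    """Return the first x where a - b changes sign (linear interpolation), or None."""
    diffs = [ai - bi for ai, bi in zip(a, b)]
    for i in range(len(x) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return x[i]
        if d0 * d1 < 0:
            return x[i] + (x[i + 1] - x[i]) * d0 / (d0 - d1)
    return None


print(f"SSIM crossover at about CRF {crossover(CRF, SSIM_X264, SSIM_X265):.2f}")
```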
To even this out, the results can be plotted against bitrate to show the quality achieved per bitrate. Here, enc_qsv is at a notable disadvantage: the gap between x265 and x264 is smaller than the gap between x264 and enc_qsv. This suggests that enc_qsv, as a hardware-optimized encoder, trades away some compression efficiency compared to pure-software implementations. The bitrate-quality curves converge at higher bitrates, as SSIM is not very sensitive to small quality changes. However, below 25Mbit/s, x265 shows a consistent advantage over x264, even if it appears small.
The same information can be represented in the decibel scale to make small changes more apparent.
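The SSIM (dB) column in the appendix matches the usual conversion −10·log10(1 − SSIM); several rows were checked against it. The helper below shows why the decibel scale makes small changes visible. SSIM values of 0.990 and 0.995 look nearly identical, but they sit at 20dB and about 23dB.

```python
import math


def ssim_to_db(ssim: float) -> float:
    """Express SSIM on a decibel scale: -10 * log10(1 - SSIM)."""
    if ssim >= 1.0:
        return math.inf  # identical images
    return -10.0 * math.log10(1.0 - ssim)


for value in (0.990, 0.995, 0.996259):
    print(f"SSIM {value} -> {ssim_to_db(value):.3f} dB")
```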
For SSIM, enc_qsv appears to lag behind by 2-3dB at any given bitrate, which is quite a notable difference. In this graph, the difference between x264 and x265 at each bitrate is much smaller, barely 1dB or thereabouts. On this scale, it is clear that x265 is superior at all bitrates, as you would hope.
We now move on to examining the PSNR plots, which are more sensitive to small changes and are typically used to examine and fine-tune codecs.
At each CRF/CQ value, x264 and x265 appear closely matched for CRF values of 28 or less, with x264 slightly ahead. At higher CRFs (lower quality), x265 deviates upwards and produces better PSNR values for the same CRF setting. enc_qsv, with its completely different scale, shows about 5dB higher PSNR at low CQ values. Its PSNR also follows a nearly straight line compared to the curved lines of x264/x265, so its PSNR degradation tracks the selected CQ value more linearly. Another difference is that with enc_qsv, the gap between average and minimum PSNR grows as the CQ value increases. This implies that quality varies more from frame to frame as lower quality is selected. That may be undesirable, as it might be seen as blurry frames which suddenly “snap” back to clarity at the next keyframe. x264 and x265 both seem to keep this gap more limited.
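The average-to-minimum PSNR gap can be tabulated directly from the appendix. The appendix's CQ40 row for enc_qsv lists a minimum PSNR (67.168087dB) above the average (24.568336dB), which cannot be right. This sketch therefore skips any such inconsistent row rather than guessing at the correct values. x265 is included for comparison.

```python
LEVELS = [8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48]
QSV_AVG = [53.610681, 50.712053, 47.97989, 45.065929, 41.703423, 39.017255,
           36.285278, 33.828592, 24.568336, 29.676427, 27.981151]
QSV_MIN = [49.913408, 46.636429, 43.539322, 39.953024, 36.282667, 33.058641,
           29.824356, 26.923752, 67.168087, 22.165962, 20.207771]
X265_AVG = [48.232495, 45.165876, 42.077445, 39.043464, 36.173184, 33.686856,
            31.68793, 29.894034, 28.114236, 26.829709, 26.347943]
X265_MIN = [43.026082, 39.340194, 35.808481, 32.509358, 29.336083, 26.353374,
            24.076365, 22.547193, 21.458328, 21.090787, 20.948869]


def psnr_gaps(levels, averages, minimums):
    """Map each CRF/CQ level to (average - minimum) PSNR in dB.

    Rows where the minimum exceeds the average are inconsistent and skipped.
    """
    gaps = {}
    for level, avg, mn in zip(levels, averages, minimums):
        if mn > avg:
            continue
        gaps[level] = avg - mn
    return gaps


qsv_gaps = psnr_gaps(LEVELS, QSV_AVG, QSV_MIN)
x265_gaps = psnr_gaps(LEVELS, X265_AVG, X265_MIN)
for name, gaps in (("enc_qsv", qsv_gaps), ("x265", x265_gaps)):
    print(name, {level: round(gap, 2) for level, gap in gaps.items()})
```

On this data, the enc_qsv gap roughly doubles from about 3.7dB at CQ8 to about 7.8dB at CQ48. The x265 gap stays within roughly 5 to 8dB and shrinks again at the highest CRFs.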
When bitrate is taken into account, both x264 and x265 deliver about 1.5dB higher average PSNR and 3dB higher minimum PSNR than enc_qsv at the same bitrate across most of the chart. The enc_qsv curves converge with the others towards the low end (below 10Mbit/s). The compression efficiency difference may therefore not matter much to those transcoding for mobile devices, where low bitrates are more common. In this metric, x265 beats x264 at all bitrates. Its margin is about 0.5dB at the high-bitrate end, rising to about 2.5dB at 5Mbit/s.
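Comparing codecs at equal bitrate requires values between the measured points. The sketch below assumes linear interpolation on a logarithmic bitrate axis; the post itself reads values off graphs. This is a simple spot check, not a full Bjøntegaard-delta calculation. enc_qsv is left out because its inconsistent CQ40 row would distort the interpolation around 7Mbit/s.

```python
import math

# (bitrate kbit/s, average PSNR dB) pairs copied from the appendix.
X264_BR = [80431, 56200, 37980, 24748, 15310, 8945, 5209, 3169, 2014, 1360, 921]
X264_PSNR = [49.079511, 45.872697, 42.639772, 39.386707, 36.20401, 33.274627,
             30.796128, 28.645187, 26.704712, 25.075933, 23.655747]
X265_BR = [67999, 46754, 31004, 19626, 11598, 6446, 3573, 2012, 1117, 724, 634]
X265_PSNR = [48.232495, 45.165876, 42.077445, 39.043464, 36.173184, 33.686856,
             31.68793, 29.894034, 28.114236, 26.829709, 26.347943]


def metric_at_bitrate(target, bitrates, values):
    """Interpolate a metric at `target` kbit/s, linear in log(bitrate)."""
    points = sorted(zip(bitrates, values))
    for (b0, v0), (b1, v1) in zip(points, points[1:]):
        if b0 <= target <= b1:
            t = math.log(target / b0) / math.log(b1 / b0)
            return v0 + t * (v1 - v0)
    raise ValueError(f"{target} kbit/s is outside the measured range")


for kbps in (5000, 10000, 25000, 60000):
    gain = (metric_at_bitrate(kbps, X265_BR, X265_PSNR)
            - metric_at_bitrate(kbps, X264_BR, X264_PSNR))
    print(f"{kbps:>6} kbit/s: x265 ahead by {gain:.2f} dB")
```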
Results: Encode Rate
Keeping in mind that encoding was performed on an Intel Core i7-4770K, the encode rates for both x264 and x265 varied with CRF, with higher CRFs (lower quality) encoding faster than lower CRFs. The encoding rate for x264 reached a peak of 7.8fps, and x265 a peak of 2.7fps. At CRF8, these fell to 2.4fps and 0.3fps respectively.
As enc_qsv uses a hardware encoder, its encode rate was mostly stable at about 18-19fps. This may not be the maximum possible rate, as it may have been constrained by the speed of decoding and piping in the FFV1-encoded source material. Regardless, it is still more than double the speed of x264.
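Frames-per-second figures are hard to intuit, so it can help to translate them into wall-clock time for this clip. The sketch assumes the quoted 3m29s duration (rounded to the second) applies to the 23.976fps pulled-down clip. It also ignores decoding, muxing and other overheads.

```python
from fractions import Fraction

CLIP_SECONDS = 3 * 60 + 29          # 3m29s as quoted; assumed rounded to the second
FRAME_RATE = Fraction(24000, 1001)  # pulled-down rate (23.976fps)
FRAMES = round(CLIP_SECONDS * FRAME_RATE)  # about 5011 frames


def encode_minutes(fps: float) -> float:
    """Wall-clock minutes to encode the clip at a steady `fps` (overheads ignored)."""
    return FRAMES / fps / 60


# Slowest (CRF/CQ 8) encode rates from the appendix.
for name, fps in (("x264", 2.409331), ("x265", 0.336379), ("enc_qsv", 17.920824)):
    print(f"{name}: {encode_minutes(fps):.1f} min")
```

At CRF8 this works out to roughly half an hour for x264, around four hours for x265, and under five minutes for enc_qsv.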
The same information can be plotted against bitrate. As enc_qsv’s scale is different, the bitrates it produces can be quite high, and at the highest bitrates its speed appears to suffer slightly. For x264 and x265, encoding speed rises drastically at very low bitrates, below about 5000kbit/s, and tails off as bitrate increases.
Results: Still Image Samples
Frame 314 was chosen due to the high detail in the background and the sharp edges of the text, which should make encoding deficiencies easy to identify.
With x264, no visual anomalies can be easily identified in the CRF8 to CRF20 samples. At CRF24, the purple stripe across the background near the top of the image seems to have slightly less pink vibrance. The straight, solid-colour part of the N is also developing some false colour bleeding, and the pink-purple stripe in the lower part of the frame is a little rough in appearance. By CRF28, the upper stripe has lost some of its detail and is blending into the background, and some of the shadowing in the O of the text is being lost. The green stripe near the lower edge of the frame shows noticeable quilting. CRF32 shows noticeable quilting with some blocks being smoothed, and by CRF36 most of them have been smoothed. From there on, quality deteriorates rapidly.
For x265, CRF8 to CRF16 are visually transparent. CRF20 shows a slight hint of geometry changes in the upper purple stripe, suggesting that some quilting “might” be developing. It is really hard to pick up, and I might even be imagining it. CRF24 shows the upper purple stripe developing more hints of sharp edges. The N also appears slightly “transparent”, being encroached upon by the background and slightly softened. The lower pink stripe has harder edges, and by CRF28 the text is noticeably softer and losing shadow detail, with noticeable quilting. CRF32 has some noticeable quilting and geometric disruptions. Larger blocks appear by CRF36, with similarly rapid quality degradation from there.
enc_qsv was visually transparent through to around CQ28. At CQ32, the background details appear to be contaminated with a little noise and smoothing, but the text is well rendered. It seems that the enc_qsv video preserves the text detail quite strongly, even through to CQ48, although background detail is rapidly lost from CQ36 onward where some of the background details appear to be blended.
Frame 3695 was chosen as it has details in both the foreground and the background to preserve. It is tricky to judge, as the reference frame itself contains some encoding artifacts. These need to be faithfully preserved in the output to avoid introducing further losses.
x264 does a transparent job up to CRF16. At CRF20, some of the blockiness in the hair starts to be disturbed and smoothed out. CRF24 continues this, and CRF28 starts to introduce encoding disturbances in the form of sparkles and blocking near the sharp edges of the face. CRF32 shows geometric irregularities in the background along with further loss of skin tone. From there, CRF36 shows the background being destroyed and quality declining rapidly.
x265 does a mostly transparent job up to CRF20, with CRF24 showing the same smoothing out of the blocking in the hair, although the result actually looks quite visually acceptable and consistent. CRF28 shows further smoothing and slight disruptions to the edge of the face. CRF32 shows strong macroblocking of various sizes and more loss of skin tone. By CRF36, background disruption is occurring and hair texture is beginning to be completely lost.
enc_qsv is transparent until CQ20, with CQ24 showing a very slight softening of the blockiness in the hair. At CQ28, the softening becomes more obvious, but most details are retained. Some noise seems to be introduced along the edges of the cheeks, which gets more severe at CQ32. CQ36 continues this, with the hair detail suffering dramatically. At CQ40, the hair detail is almost completely lost and the cheek/nose structure is damaged.
Results: Subjective Viewing
The x264/x265 results are broadly similar, with CRF20 producing indistinguishable results and CRF24 very good results. enc_qsv is a little different because its bitrate scale differs: CQ24 produces indistinguishable results and CQ28 very good results. Again, the difference in low-bitrate degradation is the most notable. My personal preference is for x265’s bitrate-constrained behaviour of “smoothing” the image, rather than introducing strong macroblocking/quilting and false details. At higher bitrates, the codecs are mostly at parity on this difficult case footage at an identical CRF level.
That said, the CRF level considered “visually transparent” varies, likely due to the quality and nature of the source material. As the source is already compressed (most sources will be in some way), any artifacts in it are treated as details to be preserved in subsequent encodes. Detail already lost in the source cannot be recovered. A higher CRF (lower quality) may therefore be needed to preserve the remaining detail without blowing out the bitrate. Highly complex material with lots of transient scenes, such as this clip, may make codec deficiencies easier to “hide”. We may not have enough time to fully evaluate brief scenes before they leave the screen. Visually complex material with strong patterning also provides places for noise to “hide” without being obvious. So, rather counter-intuitively, the cleaner “average case” material required a lower CRF (higher quality) to avoid observable artifacts than the “difficult case” material did.
For the difficult case material, the space savings of x265 appear more limited than for the average case. The average saving amounted to about 28.5%. However, the synthetic metrics suggested that x265 delivers its larger savings in low-bitrate scenarios rather than in high-quality, high-bitrate ones. This may be acceptable where bandwidth-limited transmission is the target. It is not optimal for storage and archival, especially considering the significant hit to encode rates needed to achieve it. At the same CRF values, the PSNR and SSIM metrics tended to agree across x264 and x265, implying that similar visual quality was being achieved.
Intel’s Quicksync encoding is effective, showing a more than 2-fold speed increase over pure x264 software encoding. However, its CQ scale does not translate directly to the CRF values that x264 uses, and there is a slight compression efficiency disadvantage as well. The speed increase could nonetheless make it worthwhile for users who are not interested in squeezing out every last bit.
On the whole, subjective viewing and still frame analysis yielded similar results, in agreement with the synthetic metrics. For this difficult case clip, CRF20 was visually transparent (for x264/x265) and CRF24 was very good. For enc_qsv, the respective values are approximately CQ24 and CQ28. Counter-intuitively, more complex material seems to tolerate slightly higher CRF values. This is possibly because transient scenes and strong patterns can make deficiencies less obvious.
Appendix: Raw Data
x264
CRF  Bitrate (kbit/s)  SSIM      SSIM (dB)  PSNR avg (dB)  PSNR min (dB)  Encode rate (fps)
8    80431             0.996259  24.270122  49.079511      42.799438      2.409331
12   56200             0.993292  21.733926  45.872697      40.598088      2.669777
16   37980             0.988544  19.4097    42.639772      36.824497      3.001713
20   24748             0.980864  17.181453  39.386707      33.235945      3.494829
24   15310             0.968555  15.024512  36.20401       29.81061       4.007893
28   8945              0.949897  13.001395  33.274627      26.354434      4.59113
32   5209              0.92455   11.223395  30.796128      23.510602      5.198533
36   3169              0.892636  9.691405   28.645187      20.9171        5.945302
40   2014              0.855041  8.387561   26.704712      18.49257       6.691361
44   1360              0.81408   7.306731   25.075933      16.915291      7.260373
48   921               0.771202  6.405481   23.655747      15.703609      7.846578

enc_qsv
CQ   Bitrate (kbit/s)  SSIM      SSIM (dB)  PSNR avg (dB)  PSNR min (dB)  Encode rate (fps)
8    150207            0.998004  26.998622  53.610681      49.913408      17.920824
12   112393            0.996435  24.479861  50.712053      46.636429      17.885168
16   83636             0.994113  22.301061  47.97989       43.539322      18.606039
20   59988             0.990626  20.280631  45.065929      39.953024      18.590631
24   42045             0.983793  17.902841  41.703423      36.282667      18.709183
28   28690             0.975579  16.12245   39.017255      33.058641      18.778584
32   18493             0.963111  14.331049  36.285278      29.824356      18.748028
36   11569             0.945839  12.663095  33.828592      26.923752      18.732244
40   7209              0.923872  11.184576  24.568336      67.168087      18.78739
44   4576              0.895844  9.823159   29.676427      22.165962      18.738522
48   3123              0.863739  8.656299   27.981151      20.207771      18.724827

x265
CRF  Bitrate (kbit/s)  SSIM      SSIM (dB)  PSNR avg (dB)  PSNR min (dB)  Encode rate (fps)
8    67999             0.995184  23.173434  48.232495      43.026082      0.336379
12   46754             0.991877  20.902864  45.165876      39.340194      0.392566
16   31004             0.986617  18.734503  42.077445      35.808481      0.466758
20   19626             0.978444  16.664254  39.043464      32.509358      0.620252
24   11598             0.966304  14.724241  36.173184      29.336083      0.767254
28   6446              0.949827  12.995272  33.686856      26.353374      0.950986
32   3573              0.929462  11.515755  31.68793       24.076365      1.180606
36   2012              0.903356  10.148233  29.894034      22.547193      1.484253
40   1117              0.868552  8.812468   28.114236      21.458328      1.856564
44   724               0.832554  7.761241   26.829709      21.090787      2.312682
48   634               0.815353  7.336581   26.347943      20.948869      2.702679