Faster ARM64 encodes using x265

x265 contains a significant amount of assembly optimization for its compute kernels, which enables speed-ups on the order of 5x compared to running pure C code. While support for the x86 architecture is extensive (kernels exist all the way from SSSE3 to AVX512), support for other architectures such as ARM is limited. Until now, x265’s ARM support has been limited to the 32-bit ARMv7 architecture.

In the recently released v3.4 of x265, a fresh set of hand-tuned assembly implementations for the 64-bit ARM architecture (aarch64) has been introduced for some compute-intensive kernels. The figure below shows the acceleration that these kernels enable across presets. On average, the kernels speed up the encode by 10%, with up to 21% acceleration for the default medium preset.

Fig. 1. Speed-up across presets for crowd_run_1080p

These kernels were developed by video codec engineers at Huawei, increasing the number of companies that contribute to x265. When asked why they chose to focus on x265, the Huawei team described their intent to build the open-source ARM ecosystem for use cases such as Big Data, Web, Storage, Database, Acceleration libraries, and so on. Given x265’s popularity in the video domain, they chose to contribute to this project, enabling a win-win for both the open-source community and the ARM ecosystem.

In addition, Huawei has made ARM resources available to the open-source community. The community can use the two 8U16G and one 32U64G VMs that have been donated, to work on such aarch64-focused optimizations.

If you are interested in using these VMs, please write to us at

Happy compressing!!

v3.4 is released!

Version 3.4 of x265 is out with cool new features: automated generation of efficient ABR ladders, speedy ARM64 encodes, and improved 2-pass encoding. The complete feature list and the associated performance numbers are available in our release notes. The release can be downloaded here.

Happy compressing!

All new v3.3!

Happy compressing!!

The revamped “veryslow” and “slower” presets of x265 Encoder ver. 3.0

By  Praveen Kumar Karadugattu and Kalyan Goswami


In x265 Encoder ver 3.0, the default parameters for the “veryslow” and “slower” presets have changed. In this blog post we discuss this modification in detail.

The all new ‘veryslow’ Preset

The “veryslow” preset is one of the presets most preferred by offline encoding vendors, for whom video quality is paramount. For this preset, the quality of the encoded bitstream matters more than the encoding speed, so it targets improving the quality of the encoded video at a given bitrate at the cost of extensive computation. This preset can be enabled by specifying the CLI option “--preset veryslow” or “--preset 8” with x265. After a series of evaluations, we changed the default settings of four key parameters under the “veryslow” preset in x265 ver 3.0.

The significance of these modified params is described below.

  1. “--limit-refs” means that x265 will limit the references for a PU at a given CU depth by using information from its higher depth and/or other PU modes at the same depth. Previously, the default value for this preset was “1”, which restricts analysis at the current depth based on the references used to code the four sub-blocks at the next depth. For example, a 16×16 CU will only use the references used to code its four 8×8 CUs. In version 3.0 of x265, this restriction on references is relaxed.
  2. “--limit-modes” restricts the mode analysis for each CU using cost metrics from its 4 sub-CUs. When multiple inter modes, such as “--rect” and/or “--amp”, are enabled, this feature uses motion cost heuristics from the 4 sub-CUs to bypass modes that are unlikely to be the best choice. It significantly improves encoding speed when “--rect” and/or “--amp” are enabled, with a minimal loss of compression efficiency. In x265 version 3.0, this restriction on mode analysis is relaxed.
  3. During motion estimation, the encoder merges the motion of neighbouring blocks to predict the motion of the current block. The CLI option “--max-merge” defines the maximum number of such neighbour candidate blocks (spatial and temporal) that the encoder may consider for merging motion predictions. If a merge candidate results in no residual, it is immediately coded as a “skip”. Otherwise, the merge candidates are tested when searching for the least-cost inter-mode option.
  4. “--limit-tu” enables an early exit from TU depth recursion for inter-coded blocks. Previously, “--limit-tu” was set to 4 for this preset, which means the depth of the 1st sub-TU is taken as the limiting depth for the other sub-TUs. In x265 version 3.0, this feature is disabled under the “veryslow” preset.
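The depth-based reference restriction from item 1 can be pictured with a small toy model. The sketch below is not x265 source code: the function name, the data layout, and the sample values are all hypothetical, and it only mirrors the rule stated above (a parent CU searches just the references that its four sub-CUs used).

```python
# Toy illustration of the depth-based "--limit-refs" rule described above.
# This is NOT x265 source code; names and data layout are hypothetical.

def allowed_refs(all_refs, sub_cu_refs, limit_by_depth):
    """Return the reference frames a parent CU is allowed to search.

    all_refs       -- every reference index available to the encoder
    sub_cu_refs    -- one set per sub-CU, holding the references that
                      sub-CU ended up using
    limit_by_depth -- True to apply the depth-based restriction
    """
    if not limit_by_depth:
        # Restriction relaxed: search every available reference.
        return sorted(all_refs)
    # Restriction on: keep only references used by the sub-CUs.
    used = set().union(*sub_cu_refs)
    return sorted(used & set(all_refs))


refs = [0, 1, 2, 3]
# Hypothetical outcome of coding the four 8x8 sub-CUs of a 16x16 CU.
sub_cus = [{0}, {0, 1}, {0}, {1}]

print(allowed_refs(refs, sub_cus, limit_by_depth=True))   # [0, 1]
print(allowed_refs(refs, sub_cus, limit_by_depth=False))  # [0, 1, 2, 3]
```

With the restriction on, the parent skips references 2 and 3 entirely, which saves search time but can miss a better match. Relaxing it trades that time back for quality.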

All the above-mentioned features have a direct impact on the visual quality (both subjective and objective) of the encoded bitstream, but with the penalty of slightly increased encoding time. We have performed a series of experiments to evaluate the speed vs quality trade-off with the new “veryslow” preset against the previous version.

The all-new ‘slower’ Preset

Since most of our OTT clients use the “veryslow” preset, the burden of longer encoding times might hamper their applications. To address this issue, we have redefined the “slower” preset to match the quality of the previous “veryslow” preset. Hence, the “slower” preset of x265 version 3.0 should provide the same visual quality as the older versions’ “veryslow” preset while maintaining the same encoding speed (fps) as before.

Experimental Results

The sequences that we have chosen for our experiments are listed in Table 1. We have used 4 different resolutions with different source FPS (not to be confused with the encoding fps) and bit depths. All the experiments were carried out sequentially on a dual-Xeon Linux server with 56 cores and 112 threads.

Table 1: Test Sequences used

Table 2 shows the improvements in quality under the “slower” and “veryslow” presets in version 3.0 over version 2.9. We have computed the BD-PSNR and BD-SSIM metrics to represent the improvement in encoding efficiency in x265 version 3.0, using version 2.9 as the anchor. Positive values of BD-PSNR and BD-SSIM imply an improvement in encoding efficiency (quality). Each row of the table gives the average over all the test samples at a particular resolution. From this table, it is clear that the new version of x265 delivers a significant quality improvement over the older one for these presets.
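For readers unfamiliar with BD-PSNR, the idea is to average the PSNR gap between two rate-distortion curves over their common (log) bitrate range. The sketch below is a simplified illustration and not the script used for Table 2: Bjøntegaard's method fits a cubic polynomial to each curve, whereas this version swaps in piecewise-linear interpolation to stay dependency-free. The function names and sample points are hypothetical, and each curve is assumed to have distinct bitrates.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation of the curve (xs, ys) at x."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the sampled bitrate range")


def _mean_on(lo, hi, xs, ys, steps=1000):
    """Average curve height on [lo, hi], using the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = min(lo + i * h, hi)  # clamp against floating-point overshoot
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * _interp(x, xs, ys)
    return total * h / (hi - lo)


def bd_psnr(anchor, test):
    """Average PSNR gain (dB) of `test` over `anchor` at equal bitrate.

    Each argument is a list of (bitrate_kbps, psnr_db) points.
    """
    curves = []
    for points in (anchor, test):
        points = sorted(points)
        xs = [math.log10(rate) for rate, _ in points]
        ys = [psnr for _, psnr in points]
        curves.append((xs, ys))
    (ax, ay), (tx, ty) = curves
    lo = max(ax[0], tx[0])    # overlapping log-bitrate interval
    hi = min(ax[-1], tx[-1])
    if hi <= lo:
        raise ValueError("the two curves do not overlap in bitrate")
    return _mean_on(lo, hi, tx, ty) - _mean_on(lo, hi, ax, ay)


# Hypothetical rate-distortion points, for illustration only.
anchor = [(1000, 34.0), (2000, 36.5), (4000, 38.5), (8000, 40.0)]
test = [(rate, psnr + 1.0) for rate, psnr in anchor]
print(round(bd_psnr(anchor, test), 3))  # 1.0
```

A positive result means the test encoder delivers higher PSNR than the anchor at the same bitrate, which matches the sign convention used in the text.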

Table 2: Quality improvement with “slower” and “veryslow” presets in the x265 ver 3.0 over ver 2.9

We can clearly see that there is a significant improvement in the visual quality of the “veryslow” preset of x265 ver 3.0 over the “veryslow” preset of x265 ver 2.9.

As pointed out in the previous section, the improvement in encoding efficiency is expected to slow down the performance (encoding fps). Hence, it is important to gauge the effect of the newer presets on performance. Table 3 shows the performance results of x265 version 3.0 in comparison with those of version 2.9 under the “slower” and “veryslow” presets. These are the encoding FPS values averaged under each resolution.

Table 3: Performance decrement for “slower” and “veryslow” presets in the x265 ver 3.0 over ver 2.9

From Table 3 it is evident that the new “veryslow” and “slower” presets are significantly slower than their counterparts in the older version. However, the performance drop from the older “veryslow” to the new “slower” (shown in the column “ΔFPS% (slower_3.0 vs veryslow_2.9)”) is negligible. Hence, customers who need to maintain the same performance as the “veryslow” preset of x265 version 2.9 are advised to use the new “slower” preset of x265 version 3.0.
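A ΔFPS% figure of this kind is a relative change in encoding speed against an anchor. The sketch below shows the arithmetic, assuming the usual convention that a negative value means the test configuration is slower than the anchor. The function name and the sample numbers are hypothetical and are not taken from Table 3.

```python
def delta_fps_percent(anchor_fps, test_fps):
    """Relative change in encoding speed of `test` versus `anchor`, in percent.

    Negative values mean the test configuration encodes more slowly.
    """
    if anchor_fps <= 0:
        raise ValueError("anchor_fps must be positive")
    return (test_fps - anchor_fps) / anchor_fps * 100.0


# Hypothetical numbers: anchor runs at 2.0 fps, test at 1.5 fps.
print(delta_fps_percent(2.0, 1.5))  # -25.0
```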


The new “veryslow” preset of x265 ver 3.0 gives much higher compression efficiency than the old “veryslow” of ver 2.9. However, it comes with a significant drop in performance (encoding fps). Hence, we have moved the old “veryslow” settings to the new “slower” preset in ver 3.0, which retains a compression efficiency similar to that of the old “veryslow” (of ver 2.9) without sacrificing much performance.


Both the versions of x265 can be downloaded from here.

The CLIs we ran to perform these tests are:

x265 --psnr --ssim --input <input.yuv> --input-res <width>x<height> --fps <fps> --bitrate <target bitrate in kbps> --preset slower --output "out_slower.hevc"

x265 --psnr --ssim --input <input.yuv> --input-res <width>x<height> --fps <fps> --bitrate <target bitrate in kbps> --preset veryslow --output "out_veryslow.hevc"
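When repeating these runs over several sequences and presets, it can help to generate the command lines programmatically. The sketch below is one possible way to do that and is not the script we used. It uses only the options shown in the commands above; the helper name and the sample values (file name, resolution, frame rate, bitrate) are hypothetical.

```python
import shlex


def build_x265_cmd(input_yuv, width, height, fps, bitrate_kbps, preset):
    """Build the argument list for one test encode, mirroring the CLIs above."""
    return [
        "x265", "--psnr", "--ssim",
        "--input", input_yuv,
        "--input-res", f"{width}x{height}",
        "--fps", str(fps),
        "--bitrate", str(bitrate_kbps),
        "--preset", preset,
        "--output", f"out_{preset}.hevc",
    ]


for preset in ("slower", "veryslow"):
    # Hypothetical source clip and target bitrate.
    cmd = build_x265_cmd("input.yuv", 1920, 1080, 50, 4000, preset)
    print(shlex.join(cmd))
    # To actually encode, pass cmd to subprocess.run(cmd, check=True).
```

Keeping the arguments as a list rather than a single string avoids shell-quoting problems when file names contain spaces.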


x265 and SVT-HEVC in the same house

With changeset a41325fc854f, the x265 library can invoke the SVT-HEVC library for encoding through the --svt option. We have mapped presets and command-line options supported by the x265 application into the equivalent options of SVT-HEVC, and have added a few specific options that are available only when the SVT-HEVC library is invoked. This page in our documentation describes the steps to build and invoke the SVT-HEVC library in more detail.

Our reason for this integration was to enable our users to evaluate additional relative trade-offs between performance and compression efficiency while working behind the familiar API of the x265 library. In the long term, we plan to leverage this integration to further improve x265’s ability to handle real-time and low turn-around scenarios in pure software; this is the space that SVT-HEVC was focused on. In parallel, we will continue to innovate on our flagship presets that are used in offline encoding where x265 dominates.  You can expect to see these changes in the coming releases of x265, increasing the reach of open-source for video compression!

v3.0 is now out!

We are happy to announce version 3.0 of x265. The main focus of this version is to improve quality, especially for the ‘veryslow’ and ‘slower’ presets. Moreover, Dolby Vision support is included in this version. A detailed description of all the new features is available in our release notes.

x265 delivers Dolby Vision streams!

By Aruna Matheswaran and Kirithika Kalirathnam

Dolby Vision transforms the way you experience movies, TV shows, and games with incredible brightness, contrast, and color that bring entertainment to life before your eyes. By fully leveraging the maximum potential of new cinema projection technology and new TVs’ display capabilities, Dolby Vision delivers high-dynamic-range (HDR) and wide-color-gamut content.

Dolby Vision compliant bitstreams can now be easily generated with x265 by specifying the preferred Dolby Vision profile through the new command-line option --dolby-vision-profile.

Dolby Vision provides a rich set of profiles to support various ecosystems, from over-the-top streaming to Blu-ray Discs. For more information, please refer to the Dolby Vision Profiles and Levels document. Here is the list of Dolby Vision profiles that x265 supports today:

  • Profile 5 single layer with Dolby Vision-only support  
  • Profile 8.1 single layer with HDR10 compatibility
  • Profile 8.2 single layer with SDR compatibility

All these encodes take as input the 10-bit YCbCr 4:2:0 base layer, which is generated from a Dolby Vision mezzanine source that has gone through profile-specific Dolby Vision pre-processing.

The single layer encoding approach includes a base video essence, while the dual-layer encoding approach contains a base layer and enhancement layer video essence. Multiple video essences can be either carried separately or interleaved as a single video essence within a media container.

Comparison of Dolby Vision profiles supported in x265

* Dolby Vision-proprietary IPT is similar to BT.2100 ICtCp, where I is similar to I, P similar to Cp, and T similar to Ct.

Metadata muxing enabled x265 Encoder

Dolby Vision’s processing pipeline includes 4 major stages: source mezzanine pre-processing, encoding, metadata muxing with the elementary bitstream, and post-processing.

To reduce the workload of the Dolby Vision processing pipeline, x265 not only generates Dolby Vision compliant elementary streams but also handles Dolby Vision metadata muxing in its workflow. The command-line option --dolby-vision-rpu, which we have introduced in x265, takes in the Dolby Vision RPU metadata generated by Dolby Vision pre-processors and muxes it with the elementary bitstream.

Muxing enabled x265 Encoder



Who gets more bits? Chroma? or Luma?

Due to the larger color volume that IPT delivers, more bits than usual may be allocated for chroma. Since the human visual system is more sensitive to the compression artifacts in luma, increasing chroma QP offset values may improve video quality when more bits are needed for luma. Hence, we have optimized the chroma QP offsets for Dolby Vision profile 5 encodes.

Sample x265 command line to try out:

 ./x265 --input <Profile specific 10bit YCbCr 4:2:0 source> --input-res <wxh> --fps <fps> --input-depth 10 --input-csp i420 --dolby-vision-profile <5|8.1|8.2> --dolby-vision-rpu <Dolby Vision metadata RPU file> --vbv-bufsize <vbv bufsize> --vbv-maxrate <vbv maxrate> -o Dolby_Vision_stream.hevc
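If you script such encodes, a small wrapper can catch an unsupported profile before the encoder is launched. The sketch below is a hedged illustration rather than part of x265: the helper name, its parameters, and the sample values are hypothetical, and it uses only the options and the three profiles listed above.

```python
SUPPORTED_PROFILES = ("5", "8.1", "8.2")  # profiles listed in this post


def build_dolby_vision_cmd(source, width, height, fps, profile, rpu_file,
                           vbv_bufsize, vbv_maxrate,
                           output="Dolby_Vision_stream.hevc"):
    """Build the argument list for the sample Dolby Vision command line."""
    if profile not in SUPPORTED_PROFILES:
        raise ValueError(f"unsupported Dolby Vision profile: {profile}")
    return [
        "./x265",
        "--input", source,
        "--input-res", f"{width}x{height}",
        "--fps", str(fps),
        "--input-depth", "10",
        "--input-csp", "i420",
        "--dolby-vision-profile", profile,
        "--dolby-vision-rpu", rpu_file,
        "--vbv-bufsize", str(vbv_bufsize),
        "--vbv-maxrate", str(vbv_maxrate),
        "-o", output,
    ]


# Hypothetical file names and VBV values, for illustration only.
cmd = build_dolby_vision_cmd("source_p5.yuv", 3840, 2160, 24, "5",
                             "metadata.rpu", 20000, 20000)
print(" ".join(cmd))
```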

Snapshots captured from LG OLED55C8PTA TV with Dolby Atmos and 4k cinema HDR with Dolby Vision

Dolby Vision profile 5 HDR vs SDR

Dolby Vision profile 8.1’s HDR10 vs Conventional HDR10

Dolby Vision profile 8.2’s SDR vs Conventional SDR



Meet us at IBC 2018!

Make sure to visit us at our kiosk in the Intel booth at 5.B65, where we will be demonstrating enhancements to x265 that improve the throughput of encoding for ABR streaming by over 2X by leveraging Machine Learning on the CPU. We continue to innovate in this space to bring ground-breaking improvement in performance and quality by leveraging our expertise in video compression algorithms, machine learning technology, and microarchitecture-aware optimizations that enabled us to use Intel AVX512 instructions while encoding.
You can reach out to us on our usual channels (the developer mailing list, doom9, or Facebook) to talk.