Comprimato JPEG2000 SDK 2.6 Release: 290% speed-up and more than 1000 FPS

We are rolling out version 2.6 of our GPU- and CPU-accelerated JPEG2000 Codec SDK. The new release brings up to 290% faster encoding and 200% faster decoding. It is now possible to decode FHD video at more than 1000 FPS or to encode 4K 60p video at 3.7x real time. Starting with this release, we officially support the NVIDIA Turing GPU architecture, including all Tesla, Quadro RTX, and GeForce RTX GPU boards.
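To put these figures in more familiar units, the short sketch below converts a real-time multiple into frames per second and a frame rate into a number of concurrent real-time streams. The function names are hypothetical and are not part of the SDK. The stream count is a simple lower bound that assumes the codec is the only bottleneck.

```python
def realtime_multiple_to_fps(multiple: float, stream_fps: float) -> float:
    """Frames per second processed when running at `multiple` times real time."""
    return multiple * stream_fps


def concurrent_streams(codec_fps: float, stream_fps: float) -> int:
    """Whole number of real-time streams that fit into the codec throughput."""
    return int(codec_fps // stream_fps)


if __name__ == "__main__":
    # 4K 60p encoding at 3.7x real time, expressed in frames per second.
    print(f"{realtime_multiple_to_fps(3.7, 60):.0f} FPS")
    # FHD decoding at 1000 FPS, expressed as whole 1080p60 streams.
    print(concurrent_streams(1000, 60), "streams")
```

Under these assumptions, 3.7x real time for 60p video corresponds to about 222 frames per second, and 1000 FPS is enough for 16 simultaneous 1080p60 streams.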

Performance boost with NVIDIA Turing

With 2.6, we are officially launching support for Turing, the latest generation of NVIDIA GPUs. Compared to the previous Pascal generation, Turing brings significantly higher memory throughput and a substantial boost in raw TFLOPS performance. Comprimato JPEG2000 SDK leverages the new GPU architecture to enable almost 3x higher encoding density with the single-slot Quadro RTX 4000, the successor to the popular Quadro P4000.

“Comprimato JPEG2000 SDK leverages the new NVIDIA Turing GPU architecture to enable 3x higher encoding throughput.”

Below, you can see a comparison of the Quadro P4000 with the new Quadro RTX 4000. The new Comprimato JPEG2000 2.6 Codec encodes FHD video streams almost 3x faster on the RTX 4000 than on the P4000. Decoding is about 2x faster. We see a similar speed-up for UHD 4K video streams.

Performance comparison for NVIDIA Quadro P4000 and Quadro RTX 4000 for FHD and UHD video

Comparing two of the high-end Quadro GPUs, the P6000 and the RTX 6000, we also see significant performance improvements, mainly in encoding, which is about 2x faster for both FHD and UHD 4K video. Decoding was already fast, but it improves as well: FHD decoding is about 1.4x faster and UHD decoding about 1.2x faster.

Performance comparison for NVIDIA Quadro P6000 and Quadro RTX 6000 for FHD and UHD video

Note that performance was evaluated on a standard workstation with an Intel Xeon E5-1620 v3 (4 cores, 8 threads) and 32 GB of RAM. The test data were 1920×1080 and 3840×2160 videos with 4:2:2 sampling and 10-bit color depth. The JPEG2000 encoder was set to produce bitrates of 200 Mbps for the FHD (1080p60) and 600 Mbps for the UHD (2160p60) video streams.
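For context on these target bitrates, the following sketch estimates the uncompressed data rate of the test material and the resulting compression ratio. It is a back-of-the-envelope calculation, not part of the SDK. It assumes only active pixels are counted (no blanking or container overhead), that 1 Mbps equals 10^6 bits per second, and that 4:2:2 sampling averages two samples per pixel (one luma sample plus two chroma samples at half horizontal resolution). The function names are hypothetical.

```python
def raw_bitrate_mbps(width: int, height: int, fps: float,
                     bit_depth: int = 10,
                     samples_per_pixel: float = 2.0) -> float:
    """Uncompressed video bitrate in Mbps (10^6 bits per second).

    samples_per_pixel is 2.0 for 4:2:2 sampling: one luma sample per pixel
    plus two chroma samples shared between every two horizontal pixels.
    """
    return width * height * fps * bit_depth * samples_per_pixel / 1e6


def compression_ratio(raw_mbps: float, target_mbps: float) -> float:
    """Ratio of uncompressed to compressed bitrate."""
    return raw_mbps / target_mbps


if __name__ == "__main__":
    # Test profiles from the text: (name, width, height, fps, target Mbps).
    profiles = [
        ("FHD 1080p60", 1920, 1080, 60, 200),
        ("UHD 2160p60", 3840, 2160, 60, 600),
    ]
    for name, width, height, fps, target in profiles:
        raw = raw_bitrate_mbps(width, height, fps)
        ratio = compression_ratio(raw, target)
        print(f"{name}: raw {raw:.2f} Mbps, ratio {ratio:.1f}:1")
```

Under these assumptions, the FHD profile comes to roughly 2488 Mbps uncompressed (about 12:1 at 200 Mbps) and the UHD profile to roughly 9953 Mbps (about 17:1 at 600 Mbps).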

You can check the performance of more GPUs and video profiles in our performance calculator.

Fixed bugs

  • Fixed decoder crash when reading an invalid codestream (with specific invalid marker ordering).
  • Fixed incorrect GPU rate control in encoder with CUDA 10.
  • Fixed incorrect input size when postprocessing images with varying sizes on GPU.
  • Fixed repeated calls to GPU postprocessing size callback for the same image.
  • Fixed precision of V210 formatter in the decoder.
  • Fixed decoder parser for some damaged codestreams (SOT marker starting 3 bytes before codestream end).
  • Fixed CUDA page-locked memory handling.

