We at Comprimato are developing a GPU-accelerated JPEG2000 codec. With the latest NVIDIA cards based on the Pascal architecture, such as the Quadro GP100 or Quadro P6000, we found that the speed of PCI Express 3.0 has become a limiting factor for the speed of our codec.
The theoretical bandwidth of PCI Express 3.0 x16 is about 15.75 GB/s. The real-world rate for NVIDIA cards, as measured by the NVIDIA bandwidth test, is slightly above 12 GB/s. In our 8K tests we are able to use up to 11.5 GB/s, a little less than that, because data is not being copied over the bus for the entire duration of decoding.
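The theoretical PCIe 3.0 x16 figure follows from the signaling rate of 8 GT/s per lane and the 128b/130b line encoding. A short Python sketch of that arithmetic, assuming decimal units (1 GB = 10^9 bytes) and one transfer direction; it ignores packet and protocol overhead, which is part of why measured rates come out lower:

```python
# Theoretical one-direction bandwidth of a PCIe 3.0 link (1 GB = 10**9 bytes).
TRANSFER_RATE_GT_S = 8.0          # PCIe 3.0: 8 giga-transfers per second per lane
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b encoding: 128 data bits per 130 sent


def pcie3_bandwidth_gb_s(lanes: int = 16) -> float:
    """Data rate after line encoding, before packet/protocol overhead."""
    bits_per_second = TRANSFER_RATE_GT_S * 1e9 * ENCODING_EFFICIENCY * lanes
    return bits_per_second / 8 / 1e9


theoretical = pcie3_bandwidth_gb_s(16)
print(f"PCIe 3.0 x16 theoretical: {theoretical:.2f} GB/s")

# Share of that peak represented by the measured figures quoted in the text
for label, gb_s in [("bandwidth test", 12.0), ("8K decoding", 11.5)]:
    print(f"{label}: {gb_s} GB/s = {gb_s / theoretical:.0%} of theoretical")
```

This prints 15.75 GB/s for the x16 link; the bandwidth-test and 8K decoding figures correspond to roughly 76% and 73% of it.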
To illustrate this, I chose the 8K 4:2:2 10-bit video profile, because it allows two commonly used output formats: 10-bit samples stored in 16-bit words, or V210.
• With 4:2:2 10-bit samples stored in 16 bits, I achieved 102 FPS when decoding an 8192×3456 footage sample, which corresponds to a bandwidth of 102 × 113.2 MB (uncompressed image size) ≈ 11.5 GB/s.
• With the 4:2:2 10-bit format called V210, the image size is reduced to 75.6 MB, and the decoding speed increased to 153 FPS while the bandwidth remained essentially the same: 153 × 75.6 MB ≈ 11.6 GB/s.
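The uncompressed frame sizes in the two bullets can be reproduced from the frame dimensions. A minimal Python sketch, assuming decimal megabytes and that V210 lines are padded to a multiple of 48 pixels (128 bytes), as is common for that format; with this padding the V210 frame comes out at the 75.6 MB quoted above, whereas unpadded packing would give about 75.5 MB:

```python
# Uncompressed 4:2:2 frame sizes for the 8192x3456 sample (1 MB = 10**6 bytes).
WIDTH, HEIGHT = 8192, 3456


def size_422_16bit(width: int, height: int) -> int:
    # 4:2:2 carries 2 samples per pixel on average (Y for every pixel,
    # Cb and Cr for every second pixel); each 10-bit sample uses a 2-byte word.
    return width * height * 2 * 2


def size_v210(width: int, height: int) -> int:
    # V210 packs 6 pixels (12 ten-bit samples) into 16 bytes, i.e. 128 bytes
    # per 48 pixels; assumption: each line is padded to whole 48-pixel groups.
    groups_per_line = -(-width // 48)  # ceiling division
    return groups_per_line * 128 * height


for name, size, fps in [
    ("10-bit in 16-bit", size_422_16bit(WIDTH, HEIGHT), 102),
    ("V210", size_v210(WIDTH, HEIGHT), 153),
]:
    mb = size / 1e6
    print(f"{name}: {mb:.1f} MB per frame, {fps} FPS -> {fps * mb / 1000:.2f} GB/s")
```

Both formats end up moving roughly 11.5 to 11.6 GB/s, which is why the frame rate scales almost inversely with frame size (113.2 / 75.6 ≈ 1.5 ≈ 153 / 102).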
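Finally, I also measured 10-bit decoding with only 8 bits per sample transferred over the PCIe bus. In this case the speed increased to 164 FPS, which corresponds to a bandwidth of 164 × 56.6 MB ≈ 9.3 GB/s. Here the bus is no longer saturated (9.3 GB/s versus about 11.5 GB/s in the 10-bit tests), which suggests that the decoder itself, rather than the transfer, now sets the frame rate. Taken together, these results show that our JPEG2000 codec reaches the speed limit of PCIe 3.0. We are looking forward to trying the new NVIDIA NVLink bridge with 80 GB/s of bandwidth, and we hope that PCI Express 4.0 will arrive soon…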
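The 8-bit result can be compared with the two 10-bit runs by setting the bandwidth each run actually used against the bus peak. A rough Python sketch, assuming the ~12 GB/s bandwidth-test figure quoted earlier as the peak and the same 8192×3456 4:2:2 frame sizes as in the measurements (with V210 lines assumed padded to 128 bytes per 48 pixels); it is a back-of-the-envelope check, not a performance model:

```python
# Compare bandwidth used in each test with the measured bus peak.
# Assumption: ~12 GB/s, the approximate NVIDIA bandwidth-test figure.
MEASURED_PEAK_GB_S = 12.0
WIDTH, HEIGHT = 8192, 3456

# Uncompressed 4:2:2 frame sizes in bytes (2 samples per pixel on average).
FRAME_BYTES = {
    "10-bit in 16-bit": WIDTH * HEIGHT * 2 * 2,  # 2 bytes per sample
    "V210": -(-WIDTH // 48) * 128 * HEIGHT,      # 128 bytes per 48 pixels, padded lines
    "8-bit": WIDTH * HEIGHT * 2 * 1,             # 1 byte per sample
}
MEASURED_FPS = {"10-bit in 16-bit": 102, "V210": 153, "8-bit": 164}


def used_gb_s(fps: float, frame_bytes: int) -> float:
    """Bandwidth needed to move `fps` frames of `frame_bytes` each per second."""
    return fps * frame_bytes / 1e9


def bus_ceiling_fps(frame_bytes: int, peak_gb_s: float = MEASURED_PEAK_GB_S) -> float:
    """Frame rate at which transfers alone would saturate the bus."""
    return peak_gb_s * 1e9 / frame_bytes


for name, size in FRAME_BYTES.items():
    fps = MEASURED_FPS[name]
    used = used_gb_s(fps, size)
    print(f"{name}: {fps} FPS uses {used:.1f} GB/s "
          f"({used / MEASURED_PEAK_GB_S:.0%} of peak); "
          f"bus alone would allow ~{bus_ceiling_fps(size):.0f} FPS")
```

Under these assumptions the two 10-bit runs use about 96% of the peak, while the 8-bit run uses about 77% and stays well below the ~212 FPS at which transfers alone would saturate the bus, consistent with the decoder rather than the bus limiting that run.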