
Essential Video Compressing Terms

by comprimato
August 9th, 2016

Are you a newbie to video compression? Don’t worry, we know how confusing all these difficult terms can be. That’s why we’ve put together a list explaining the essential terminology to clear things up a bit. Let us know if any important terms are missing so we can update the list accordingly.



4K

The term 4K stands for video with four times the resolution of regular Full HD, which means it has four times as many pixels. Such high definition delivers sharp image detail and vivid, realistic colors.

The usage of the term 4K is slightly messy, as there are two main 4K standards and several subordinate ones. The first, the DCI 4K standard used mainly in the movie industry, has a resolution of 4096 x 2160 px. You are most likely to encounter this resolution when you visit a 4K cinema.

The second standard has a resolution of 3840 x 2160 px and is used in television and video games.

The 4K resolution is also commonly called Ultra HD or simply UHD.



8K

The term 8K stands for an ultra-high resolution of 7680 x 4320 px. It is currently the highest resolution in use, with sixteen times as many pixels as Full HD.

Besides an extremely sharp image and the ability to see the smallest of details, recording in such a high resolution provides a number of options in the post-production process – you can easily zoom in or crop the image.

The terms 8K UHD or 8K Ultra HD are also commonly used in relation to such a high resolution.


Bit Depth

The term Bit Depth tells you how large a color palette is used for the given video or image. It specifies how many bits the system uses to store each color component of an individual pixel.

For example, the color of each pixel on your screen is made of three individual components – red, green and blue. Together they create the desired color and light up the pixel accordingly. If you use 8 bits for each component, you get 256 different shades of red, green and blue, which add up to 16,777,216 different colors that can be used to light up the pixels.

The deeper the bit depth, the larger the color palette it provides. However, you have to take the increased storage requirements into account.
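The relationship between bit depth and palette size is simple arithmetic; here is a small sketch to check the numbers above (the `palette_size` helper is made up for illustration):

```python
def palette_size(bits_per_component: int, components: int = 3) -> int:
    """Number of distinct colors: 2^bits shades per component,
    multiplied across all components."""
    return (2 ** bits_per_component) ** components

print(palette_size(8))   # 16777216 -> the 16,777,216 colors mentioned above
print(palette_size(10))  # a 10-bit palette: over a billion colors
```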


Bit Rate

The bit rate parameter determines how much data is present in every second of a video. It is commonly given in megabits per second (Mbps). The overall size of a video can be found by multiplying its length by its bit rate. So, for example, a video which is 80 seconds long with a bit rate of 1 Mbps has an overall size of 80 Mb, or 10 MB (8 bits = 1 Byte, so 80 Mbits = 10 MBytes). The bit rate also indicates the internet connection speed needed for streaming: to load and play the video from the example above, we would need a connection of at least 1 Mbps.
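The size arithmetic from the example can be sketched in a couple of lines (`video_size_mb` is an illustrative name, not an established API):

```python
def video_size_mb(duration_s: float, bitrate_mbps: float) -> float:
    """Total size in megabytes: megabits accumulated per second, divided by 8."""
    return duration_s * bitrate_mbps / 8

print(video_size_mb(80, 1))  # 10.0 -> an 80-second video at 1 Mbps is 10 MB
```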


Color Sampling

Color sampling is a technique for discarding part of the color information in an image while maintaining acceptable visual quality, so that the image does not take up so much space. We call this process subsampling.

First of all, the image is converted into the YCbCr color space, which separates luminance (Y) from the blue-difference (Cb) and red-difference (Cr) chrominance components. When subsampling, the luma component is kept intact while the chroma is reduced, because the human eye is more sensitive to changes in luminance than to changes in color hue.

The next step is converting the image back into the RGB model, which is used by the overwhelming majority of today’s LCD displays. The green component, which has no direct counterpart in YCbCr, is recovered at this stage: the system derives it from the luma component by subtracting the weighted contributions of the red and blue components.
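As a sketch of the conversions described above, here is a round trip between RGB and YCbCr using approximate BT.601 coefficients (the exact constants vary between standards; these are illustrative):

```python
def rgb_to_ycbcr(r, g, b):
    # Luma: a weighted sum in which green dominates, matching eye sensitivity
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y)   # blue-difference chroma
    cr = 0.713 * (r - y)   # red-difference chroma
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    r = y + cr / 0.713                       # undo the red difference
    b = y + cb / 0.564                       # undo the blue difference
    g = (y - 0.299 * r - 0.114 * b) / 0.587  # green falls out of the luma equation
    return r, g, b

y, cb, cr = rgb_to_ycbcr(100, 150, 200)
print(ycbcr_to_rgb(y, cb, cr))  # recovers (100, 150, 200) up to rounding
```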

When compressing the color information, neighboring pixels are grouped so that they share at least one color component. The pixels are divided into conceptual regions, usually 4 x 2 pixels in size. By merging pixels together, samples emerge in each region which take their color from one of the original pixels in that region. The number of samples in a region defines the sampling scheme, written as a three-part ratio J:a:b, where:

  • J: the width of the conceptual region, usually 4 pixels (the height is always 2 pixels).
  • a: the number of chroma samples in the first row.
  • b: the number of chroma samples in the second row.

The most common ratios are 4:4:4 (no subsampling) and 4:2:0.
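The storage saving implied by a J:a:b scheme can be computed directly; a sketch (the `chroma_fraction` helper is made up for illustration):

```python
def chroma_fraction(j, a, b):
    """Fraction of full 4:4:4 data kept by a J:a:b scheme.

    A conceptual region is J pixels wide and 2 rows tall: it always keeps
    2*J luma samples, plus (a + b) samples for each of the two chroma
    components (Cb and Cr)."""
    luma = 2 * j
    chroma = 2 * (a + b)
    return (luma + chroma) / (3 * 2 * j)

print(chroma_fraction(4, 4, 4))  # 1.0 -> no subsampling
print(chroma_fraction(4, 2, 0))  # 0.5 -> half the data of full 4:4:4
```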




Compression

The goal of video compression is to reduce the size of the desired file while maintaining the highest possible quality of the original. When compressing a video, redundancies – for example a group of adjacent pixels of the same color – are erased. We distinguish two types of compression:

  • Lossy compression: We inevitably lose data from the original file, so the difference between the original and the compressed image can be visible. However, by reducing the quality we gain a compressed file of a smaller size.
  • Lossless compression: The original image, including all details, is preserved. The drawback of this method is the limited degree to which the file size can be reduced.
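A minimal illustration of removing the redundancy mentioned above – a run of same-colored neighboring pixels – is run-length encoding, a simple lossless scheme (function names are illustrative):

```python
def rle_encode(pixels):
    """Collapse runs of identical values into [value, count] pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1       # extend the current run
        else:
            runs.append([p, 1])    # start a new run
    return runs

def rle_decode(runs):
    return [p for p, n in runs for _ in range(n)]

row = ["blue"] * 6 + ["white"] * 2
encoded = rle_encode(row)
print(encoded)                      # [['blue', 6], ['white', 2]]
assert rle_decode(encoded) == row   # lossless: the original is recovered exactly
```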


Container format

Container format is a specification of the wrapper in which the actual encoded image, audio or metadata is stored. This enables us to interpret the encoded data (such as a movie containing both sound and subtitles). The most commonly used containers in the Media & Entertainment field are MXF, DCP, IMF and MPEG-TS.



CUDA

CUDA is a parallel computing platform developed by NVIDIA and released in 2007. The CUDA platform uses the GPU (Graphics Processing Unit) as a computing device, together with a special programming language and a compiler for creating programs that run on the GPU. CUDA therefore allows us to significantly increase the computing power of a computer by harnessing the power of the GPU.

CUDA programs can be created using the programming languages C, C++ and Fortran, which enable the user to send code straight to the GPU. This simplifies the process of creating new applications that use the power of the graphics processing unit. A rather large ecosystem has emerged around CUDA, offering not only tools and libraries but also numerous exercises, manuals and webinars.

Source: http://www.nvidia.com/object/cuda_home_new.html



DCI

DCI is an abbreviation for Digital Cinema Initiatives, an organization which brings together the most important movie studios, such as Paramount Pictures, 20th Century Fox, Warner Bros. and others. The DCI organization sets standards for digital cinemas, such as the format and quality of video and sound. For example, JPEG2000 is the video coding standard for digital cinema, with 250 Mbps as its highest allowed bit rate. This means that today every movie screened in a digital cinema is in JPEG2000 format.



DCP

DCP is an abbreviation for Digital Cinema Package, the container format in which digital movies – made up of video and sound files – are delivered to cinemas. The package consists of the video and sound tracks, subtitles and a copy-protection mechanism. The video data is in JPEG2000 format.



Decoding

Decoding is the reverse of encoding: its goal is to convert the encoded data back to its original form, i.e. data which describes the image color pixel by pixel. Decoding makes it possible to play or view the desired file.



Encoding

Encoding is the process of converting the original file into a different form. Generally this means compressing, or decreasing the size of, the data. The result is an image file of a smaller size which can be transmitted and stored more easily.


HDR (High Dynamic Range)

HDR technology is directly linked to bit depth. Images and videos commonly use a bit depth of 8 bits, but on many occasions this is not enough to portray the scene faithfully. HDR increases the bit depth to achieve a more realistic portrayal of colors.

For instance, HDR in photography works by exposing the scene multiple times with a different setting each time. This captures more information about both the bright and the dark parts of the image, which the camera (or a specialized program) blends together into the final shot.

To shoot video in HDR you need a camera which supports it, and to play the video back you need an HDR-ready computer or TV.



Gamut

Gamut is the range of colors which the human eye is able to perceive, or which a given device is able to reproduce. The spectrum of all colors visible to the naked human eye is described by the CIE 1931 chromaticity diagram.




GPU

GPU is an abbreviation for graphics processing unit, the part of a computer’s hardware that performs the calculations needed to create images in a frame buffer intended for output to a display. The GPU can be present as part of a processor or on a dedicated video card.

The development of GPU computing led to GPUs being used not only for graphics but for other workloads as well – financial and mathematical calculations, for example, and the creation of meteorological models. Comprimato uses the GPU for such purposes as well.



IMF

IMF stands for Interoperable Master Format, a new format for digital content. IMF is used for mastering files such as images, sound or subtitles, and it helps us combine and create different versions of a given file.

This format is used in movie production, where a number of language versions and edits are needed for different countries and distribution networks. IMF minimizes storage requirements, as it only saves the changes between the original file and each new version. It therefore helps the user easily create the final file bundles.

IMF is standardized by the SMPTE organization – a world leader in standards for movie and television production and development.

You can find more information about this format in this YouTube video.


Intra-frame compression

This compression method finds redundancies within each frame. In contrast to inter-frame compression, the compression algorithm is applied to a single image at a time, never across the other images in the video sequence. By shrinking each individual frame we reduce the size of the whole video, while every frame keeps the same quality. That is why this compression is used in the post-production phase of the film-making process.


Inter-frame compression

This is another compression method, in which the compression algorithm processes video frames in groups where only the first, key frame is compressed as a whole. For the remaining frames, the algorithm compares them with the key frame and records only their differences.

The similarities, on the other hand, are erased as redundancies, which reduces the size of the compressed file. This method is mainly used for video streaming on the internet, as the varying quality of individual frames makes it unsuitable for the film’s post-production process.
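The key-frame-plus-differences idea can be sketched in a few lines; here frames are flat lists of pixel values and each delta stores only the pixels that differ from the key frame (a toy model, not a real codec):

```python
def encode_gop(frames):
    """Keep the first (key) frame whole; for every other frame store only
    the pixels that differ from the key frame, as {index: new_value}."""
    key = list(frames[0])
    deltas = [
        {i: v for i, (k, v) in enumerate(zip(key, frame)) if k != v}
        for frame in frames[1:]
    ]
    return key, deltas

def decode_gop(key, deltas):
    frames = [list(key)]
    for delta in deltas:
        # rebuild each frame from the key frame plus its recorded differences
        frames.append([delta.get(i, p) for i, p in enumerate(key)])
    return frames

gop = [[5, 5, 5, 5], [5, 5, 9, 5], [5, 7, 9, 5]]
key, deltas = encode_gop(gop)
print(deltas)                        # [{2: 9}, {1: 7, 2: 9}]
assert decode_gop(key, deltas) == gop
```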


IP Infrastructure

This is a term for video distribution infrastructure built on top of an IP network. This type of infrastructure is typically used inside video production houses such as broadcast production facilities. IP networks and their reliability play a key role for broadcasters; their main goal is to interconnect video production – including editing and distribution – with broadcasting to the end audience. Today, IP infrastructures are gradually replacing outdated infrastructures based on the SDI interface.


IP Stream

This is a term for a video stream transmitted across an IP network. It can be a stream inside an IP infrastructure or any video stream on the internet. Examples of IP streams on the internet include:

  • Securing an office or household with security cameras,
  • Sharing live video from your mobile device,
  • Publicly monitoring long-term events, such as the construction of an important building, the birth of baby animals in a zoo, current weather conditions in a given location, etc.



JPEG2000

JPEG2000 is a coding system for image data created in 2000. Its predecessor was the older and better-known JPEG format, which gained its popularity thanks to highly efficient lossy compression.

With the development of multimedia in the 1990s, there was an increase in the amount of data, which led to more demanding requirements on compression algorithms, such as lossless compression or format openness.

This led to the development of the new JPEG2000 format which offers a number of other benefits. You can learn more about this format in this article.


JPEG2000 Broadcast Profiles

While DCI defines the JPEG2000 standards used in cinemas, JPEG2000 Broadcast Profiles is a standard followed by television broadcasters in their video workflows. There are a number of broadcast profiles, described in the ISO/IEC 15444-1:2004/Amd 3:2010 specification.


MPEG2-TS (TS = transport stream)

MPEG2-TS is a container format used for streaming video on the internet. Because it is internally synchronized, it is suitable for digital broadcasting, where the video can be played even before the whole file is fully downloaded.



MXF

MXF is an open container format standardized by the SMPTE organization. It is a professional format developed to solve a number of issues (for example, the storing of metadata) which kept arising with non-professional formats. As MXF is able to deal with such problems, it is suitable for interoperability between professional applications.

Many different implementations have appeared over the years, and they negatively affected interoperability; the MXF standard therefore had to be specified in more detail.



Post-production

Post-production is generally understood as the work done on a film or recording after filming or recording has taken place. The video post-production phase consists of editing, color correction, post-production graphics and sound editing, and the addition of special effects if necessary.


Progressive decoding

Progressive decoding enables the user to decode only selected parts of a compressed file: for instance, you can decode only an HD image from a 4K video, or only 8 bits from a video with 16-bit color depth. Comprimato significantly speeds up decoding, as it can decode video previews in low resolution without decoding the whole video.


Real-time processing

This is a term for the real-time processing of individual frames of image data. Individual frames in a live stream have to be encoded and decoded at least as fast as the stream itself plays. So if the stream runs at, for example, 60 fps, each frame has to be processed within 1/60 of a second.
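The per-frame time budget follows directly from the frame rate:

```python
def frame_budget_ms(fps: float) -> float:
    """Maximum time available to process one frame of a live stream."""
    return 1000.0 / fps

print(round(frame_budget_ms(60), 2))  # 16.67 -> roughly 1/60 of a second
```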



Resolution

Resolution is one of the basic parameters determining the quality of a video, based on the number of rows and columns of pixels in an image. There are a number of different resolutions, such as:

  • 720p – HD (High Definition) – 1280×720 px
  • 1080p – Full HD – 1920×1080 px
  • 2160p – 4K – 3840×2160 px
  • 4320p – 8K – 7680×4320 px
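Multiplying out the rows and columns confirms the 4x and 16x pixel-count relationships mentioned under 4K and 8K:

```python
RESOLUTIONS = {
    "720p (HD)":       (1280, 720),
    "1080p (Full HD)": (1920, 1080),
    "2160p (4K UHD)":  (3840, 2160),
    "4320p (8K UHD)":  (7680, 4320),
}

full_hd = 1920 * 1080
for name, (w, h) in RESOLUTIONS.items():
    # total pixels, and how many Full HD frames would fit into them
    print(f"{name}: {w * h:,} px = {w * h / full_hd:g}x Full HD")
```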



SDI

SDI stands for Serial Digital Interface, an interface for digital video transmission. The SDI interface is mainly used for HD or 4K video, as it is capable of transmitting large amounts of data without compressing them first.

For that very reason, the SDI interface is used when transmitting data from cameras in TV studios or when giving a presentation in HD quality.


UHD – Ultra HD

See 4K



Wavelet

A wavelet transform is a kind of digital signal transformation whose main goal is to separate the important parts of an image from the less important ones, so that the transformed image can be compressed easily. To preprocess image data before the compression itself, the JPEG2000 compression standard uses the so-called Discrete Wavelet Transform (DWT).

During the wavelet transform, the image is split into high and low frequencies, creating two images of half the original size. The human eye is more sensitive to the low-frequency image, while the amount of data in the high-frequency image can be reduced without us noticing the difference.

The wavelet transform enables images in JPEG2000 format to be split up several times, reducing their size further with each level.
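The low/high frequency split can be illustrated with the Haar wavelet, the simplest wavelet (JPEG2000 itself uses the more sophisticated 5/3 and 9/7 filters; this is only a conceptual sketch, in one dimension):

```python
def haar_1d(signal):
    """One level of the Haar wavelet: averages give the low-frequency
    half-size image, pairwise differences give the high-frequency detail."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high

def haar_1d_inverse(low, high):
    out = []
    for l, h in zip(low, high):
        out += [l + h, l - h]   # each (average, difference) pair restores two samples
    return out

row = [10, 12, 10, 10, 80, 82, 80, 80]
low, high = haar_1d(row)
print(low)   # smooth half-size approximation: [11.0, 10.0, 81.0, 80.0]
print(high)  # detail coefficients, mostly small: [-1.0, 0.0, -1.0, 0.0]
assert haar_1d_inverse(low, high) == row  # the transform is perfectly reversible
```

The small detail coefficients in `high` are exactly the data a codec can quantize or discard without the viewer noticing.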