Design Article
Video Compression Overview
Andrew Davis
2/8/1998 12:00 AM EST
Video compression technology is the key ingredient for cost-effective transmission of video images over any digital communications link. Compression is also the gating factor for video delivery in multimedia authoring, entertainment, and education systems, in digital CATV, and in digital direct broadcast satellite TV. The history of the videoconferencing industry is intertwined with the history of compression technology. For example, in videoconferencing, a full CIF (Common Intermediate Format) image contains 352 x 288 luminance pixels, 176 x 144 chroma R-Y pixels, and 176 x 144 chroma B-Y pixels. At 30 frames per second with 8 bits of information per pixel, this represents over 36 Mbps; squeezing this down into an ISDN-BRI communications channel of 128 kbps is a formidable challenge. Doing it in real time, while synchronizing with audio, adds further complications.
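The arithmetic behind the 36 Mbps figure can be checked with a short sketch. It uses only the numbers quoted above; the function names are illustrative, and the 128 kbps channel is assumed to be available entirely to video, which ignores audio and framing overhead.

```python
# Raw bit rate of uncompressed CIF video, using the figures quoted in the text.
LUMA_W, LUMA_H = 352, 288        # luminance samples per frame
CHROMA_W, CHROMA_H = 176, 144    # size of each chroma plane (R-Y and B-Y)
BITS_PER_SAMPLE = 8
FRAMES_PER_SECOND = 30
CHANNEL_BPS = 128_000            # assumption: the whole 128 kbps carries video


def raw_bits_per_second():
    """Bits per second needed to send every sample of every frame."""
    samples_per_frame = LUMA_W * LUMA_H + 2 * CHROMA_W * CHROMA_H
    return samples_per_frame * BITS_PER_SAMPLE * FRAMES_PER_SECOND


def required_compression_ratio():
    """How many times smaller the stream must become to fit the channel."""
    return raw_bits_per_second() / CHANNEL_BPS


if __name__ == "__main__":
    print(f"raw rate: {raw_bits_per_second() / 1e6:.1f} Mbps")
    print(f"required ratio: about {required_compression_ratio():.0f}:1")
```

Under these assumptions the raw stream is roughly 36.5 Mbps, so the codec has to deliver a compression ratio on the order of 285:1.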
A generic video compression scheme is implemented in three steps:
- Signal Analysis
This operation performs measurements on the input video stream, computes prediction errors and transform coefficients, divides the signal into subbands, and carries out any correlation analysis. Input pixels are transformed into another representation (for example, DCT coefficients), but no data reduction is involved.
- Quantization
Here is where most of the compression gains are attained because the quantizer can throw away values that represent video information indiscernible to the viewer. The quantization step is usually lossy, but lossless schemes with lower compression ratios are also possible.
- Variable Length Coding (Entropy Coding)
Each event is assigned a code with a variable number of bits. Commonly occurring events have codes with few bits; rare events have codes with many bits. The expectation is that the average code length will be shorter than the fixed-length code that would otherwise be required.
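The variable length coding step can be illustrated with a minimal Huffman sketch. It only computes the code length assigned to each event, not the bit patterns, and the sample data (a run of quantizer outputs dominated by zeros) is invented for illustration.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code of the sequence."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)   # the two least frequent subtrees
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def average_bits(symbols):
    """Average number of bits per event when the Huffman code is used."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols) / len(symbols)


if __name__ == "__main__":
    # Hypothetical events: mostly zeros, a few small non-zero values.
    sample = [0] * 12 + [1] * 2 + [-1] + [5]
    print(huffman_code_lengths(sample))
    print(average_bits(sample))
```

With four distinct events a fixed-length code needs 2 bits per event. For this sample the common event gets a 1-bit code and the rare ones get 2 or 3 bits, so the average falls below 2 bits.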
Video compression is accomplished through a digital signal processing algorithm. Many products and approaches are available today, spread across four technology classes:
- Discrete Cosine Transform (DCT)
- Vector Quantization (VQ)
- Fractal Compression
- Discrete Wavelet Transform (DWT)
The DCT is the basis of the major international compression standards: JPEG, MPEG, H.261, and H.263.

Figure 1: Forward discrete cosine transform (DCT) based encoder
During the DCT encoding process, a picture is broken up into blocks of 8x8 pixels, each of which serves as the input to a discrete cosine transform (DCT). The transform concentrates most of the block's information into a few coefficients, which exposes the redundancy in the data. The DCT coefficients are then quantized using weighting functions optimized for the human visual system (which is less sensitive to some color and spatial frequencies), and the resulting data are entropy encoded using a Huffman variable-length code. To decompress the image, the process is carried out in reverse.
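The transform stage can be sketched directly from the textbook definition of the 2-D DCT. This is a slow, unoptimized illustration in plain Python, assuming the orthonormal DCT-II; real encoders use fast factorizations, and the helper names are illustrative.

```python
import math

N = 8  # block size used by the DCT-based schemes described in the text


def _scale(k):
    # Normalization that makes the transform orthonormal.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(i, k):
    # Cosine basis function: sample position i, frequency index k.
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct_2d(block):
    """Forward 2-D DCT-II of an N x N block of pixel values."""
    return [[_scale(u) * _scale(v)
             * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                   for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct_2d(coeffs):
    """Inverse transform: rebuilds the pixel block from its coefficients."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v]
                 * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


if __name__ == "__main__":
    flat = [[100] * N for _ in range(N)]   # a block with no detail at all
    coeffs = dct_2d(flat)
    print(round(coeffs[0][0], 3))          # all the energy lands in the DC term
```

A perfectly flat block produces a single non-zero (DC) coefficient, and applying `idct_2d` to unquantized coefficients returns the original pixels to within floating-point error. The transform on its own therefore loses nothing; the savings come from the quantization and coding stages that follow.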
Quantization is the lossy component of video compression. After the DCT operation, quantization is applied to the resultant frequency matrix. Since human vision is less sensitive to color than to brightness, the color values can be quantized more coarsely than the intensity values. This provides a higher degree of compression for the color components without causing much perceptible image degradation. Similarly, given that human vision is less sensitive to high frequency details than to low frequency details, the high frequency values can be quantized more coarsely, gaining extra compression without sacrificing perceived image quality.
Because it compresses the overall range of the data, a quantization operation on the frequency matrix outputs an altered matrix that contains more frequent and longer runs of the same value. In particular, a quantized matrix typically contains many more zeros. The quantized matrix can be rescaled (dequantized) to approximate the original full-range frequency matrix. However, the finer gradations between values that were eliminated are permanently lost.
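The effect of quantization on a frequency matrix can be shown with a small sketch. Both the step-size rule and the sample coefficients are invented for illustration; they are not taken from JPEG, MPEG, or any other standard's quantization tables.

```python
N = 8


def step_size(u, v, base=8, slope=4):
    """Hypothetical quantizer step: coarser as the frequency (u + v) rises."""
    return base + slope * (u + v)


def quantize(coeffs):
    """Divide each coefficient by its step and round to the nearest integer."""
    return [[round(coeffs[u][v] / step_size(u, v)) for v in range(N)]
            for u in range(N)]


def dequantize(levels):
    """Rescale the integer levels; the rounding error cannot be recovered."""
    return [[levels[u][v] * step_size(u, v) for v in range(N)]
            for u in range(N)]


def count_zeros(matrix):
    return sum(1 for row in matrix for value in row if value == 0)


def sample_coefficients():
    """Made-up frequency matrix whose magnitude falls off with frequency."""
    return [[400.0 / (1 + u + v) ** 2 for v in range(N)] for u in range(N)]


if __name__ == "__main__":
    coeffs = sample_coefficients()
    levels = quantize(coeffs)
    print("zeros before:", count_zeros(coeffs))
    print("zeros after: ", count_zeros(levels))
    print("first row restored:", dequantize(levels)[0])
```

For this made-up matrix none of the 64 input coefficients is zero, while 49 of the quantized values are. The restored values differ from the originals by at most half a step, and that difference is the information that has been discarded.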
In vector quantization (VQ), a group of pixels is treated as a single vector and is represented by the index of the closest matching entry in a codebook of representative vectors. VQ relies less on discarding color data than does DCT, but tends to blur motion at the edges of a rapidly moving image. With VQ, a transmitted digital word represents the quantized value of more than one sample of a signal. For imaging, a single vector usually represents a 4x4 pixel array. VQ is a highly asymmetrical algorithm: encoding requires a complex search to decide which vector to transmit, while the decoder merely uses a look-up table to display the corresponding value.
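The asymmetry between VQ encoding and decoding can be seen in a minimal sketch. The four-entry codebook of flattened 4x4 blocks is invented for illustration; a practical codebook would be far larger and trained on real images.

```python
# Hypothetical codebook: each entry is a 4x4 block flattened to 16 values.
CODEBOOK = [
    [0] * 16,                  # 0: flat dark block
    [255] * 16,                # 1: flat bright block
    [0, 0, 255, 255] * 4,      # 2: vertical edge
    [0] * 8 + [255] * 8,       # 3: horizontal edge
]


def squared_error(block, entry):
    return sum((a - b) ** 2 for a, b in zip(block, entry))


def encode_block(block, codebook=CODEBOOK):
    """Encoder: search the whole codebook for the closest entry."""
    return min(range(len(codebook)),
               key=lambda i: squared_error(block, codebook[i]))


def decode_index(index, codebook=CODEBOOK):
    """Decoder: a plain table look-up, with no searching."""
    return list(codebook[index])


if __name__ == "__main__":
    noisy_edge = [10, 5, 250, 240] * 4   # 16 samples resembling entry 2
    index = encode_block(noisy_edge)
    print("transmitted index:", index)
    print("decoded block:", decode_index(index))
```

The encoder compares the block against every codebook entry, while the decoder does a single indexed read. Only the index is transmitted, so one short code word stands in for all 16 samples.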
Fractals and wavelets are newer technologies. The wavelet transform decomposes an image into frequency components by iterative low- and high-pass filtering performed on the entire image, not on 8x8 blocks. The result is a hierarchical representation of an image, where each layer represents a frequency band. Fractal compression is slow because it is computationally complex, but it does produce a much smaller file size than most other approaches, and decompression can be done in software. Iterated Systems, Inc., which is pushing fractal technology hard, is going after the desktop market, promising broadcast quality images from CD-ROM. Iterated is also active in Internet video delivery, and has hopes of two-way videoconferencing capability in the future. Aware Inc. is a company promoting wavelets as a way to improve the quality of video compression. The company has licensed much of its technology to Analog Devices, which is also a leader in the wavelet field. Unlike DCT, which operates on small blocks of pixels, the Discrete Wavelet Transform (DWT) operates on the entire image, thereby eliminating the blocky artifacts common with DCT technology.
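The iterative low- and high-pass filtering behind the wavelet transform can be sketched in one dimension. This example assumes the Haar wavelet (pairwise averages and differences), the simplest possible choice, and works on a single row of samples rather than a full image; commercial wavelet codecs use longer filters.

```python
def haar_step(signal):
    """Split a signal into a low-pass (average) and high-pass (detail) half."""
    pairs = range(0, len(signal), 2)
    low = [(signal[i] + signal[i + 1]) / 2 for i in pairs]
    high = [(signal[i] - signal[i + 1]) / 2 for i in pairs]
    return low, high


def haar_decompose(signal, levels):
    """Filter the low-pass half repeatedly to build a hierarchy of bands."""
    low = list(signal)
    bands = []                       # detail bands, finest first
    for _ in range(levels):
        low, high = haar_step(low)
        bands.append(high)
    return low, bands


def haar_reconstruct(low, bands):
    """Undo the decomposition, starting from the coarsest band."""
    low = list(low)
    for high in reversed(bands):
        rebuilt = []
        for average, detail in zip(low, high):
            rebuilt.extend([average + detail, average - detail])
        low = rebuilt
    return low


if __name__ == "__main__":
    row = [10, 10, 10, 10, 20, 20, 20, 20]   # length must be a power of two
    low, bands = haar_decompose(row, levels=3)
    print("coarsest average:", low)
    print("detail bands (finest first):", bands)
    print("reconstructed:", haar_reconstruct(low, bands))
```

Each pass halves the resolution and peels off one band of detail, giving the layered representation described above. Smooth regions yield detail values of zero, which is where a subsequent quantizer and entropy coder would find their savings.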
Compression algorithms can also be classified by whether they use data from a single frame only (spatial compression), an approach known as intra-frame coding, or achieve additional compression by comparing frames and coding only the differences between them (spatial and temporal compression), an approach dubbed inter-frame coding. JPEG is the best-known intra-frame algorithm. Inter-frame approaches provide much higher compression ratios and themselves span a range of complexity and compression ratio options. A simple inter-frame approach is to calculate the difference between two adjacent frames, then compress and transmit only this difference. A more complex approach is to estimate the inter-frame motion of blocks of pixels and transmit the motion vector.
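Both inter-frame approaches can be sketched on a toy example. The code below assumes an exhaustive block-matching search that minimizes the sum of absolute differences (SAD), which is one common way to estimate motion but not the only one; the frame contents, block size, and search range are invented for illustration.

```python
def make_frame(square_x, square_y, size=8):
    """Toy frame: black background with a bright 2x2 square."""
    frame = [[0] * size for _ in range(size)]
    for y in range(square_y, square_y + 2):
        for x in range(square_x, square_x + 2):
            frame[y][x] = 255
    return frame


def frame_difference(prev, cur):
    """Simple inter-frame coding: keep only what changed."""
    return [[c - p for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, cur)]


def sad(prev, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between a block and a shifted candidate."""
    return sum(abs(cur[by + y][bx + x] - prev[by + y + dy][bx + x + dx])
               for y in range(size) for x in range(size))


def best_motion_vector(prev, cur, bx, by, size=4, search=2):
    """Find the (dx, dy) into the previous frame that best matches the block."""
    height, width = len(prev), len(prev[0])
    best, best_cost = (0, 0), sad(prev, cur, bx, by, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + size <= height
                      and 0 <= bx + dx and bx + dx + size <= width)
            if not inside:
                continue
            cost = sad(prev, cur, bx, by, dx, dy, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


if __name__ == "__main__":
    prev = make_frame(2, 2)
    cur = make_frame(3, 2)               # the square moved one pixel right
    diff = frame_difference(prev, cur)
    print("changed pixels:", sum(1 for row in diff for v in row if v != 0))
    print("vector, residual:", best_motion_vector(prev, cur, bx=2, by=2))
```

In this toy case the difference frame still has four non-zero pixels, whereas the motion search finds that the block matches the previous frame exactly one pixel to the left, so a single vector and a zero residual describe the change.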

Figure 2: Intraframe compression removes redundancies and detail within a single frame while interframe compression removes redundancies between adjacent frames. Modeled after the Microprocessor Report, February 14, 1994.



