Video codecs, part 1: Intraframe coding and bitrates
John W. Woods
7/9/2008 12:00 PM EDT
This series is excerpted from "Multidimensional Signal, Image, and Video Processing and Coding."
Part 2 looks at interframe coding.
Digital Video Compression
Video coders compress a sequence of images, so this chapter is closely related to Chapter 8 on image compression. Many of the techniques introduced there will be used and expanded upon in this chapter. At the highest level of abstraction, video coders fall into two classes, interframe and intraframe, based on whether they use statistical dependency between frames or are restricted to using dependencies only within a frame. The most common intraframe video coders use the DCT and are very close to JPEG; the video version is called M-JPEG, wherein the "M" can be thought of as standing for "motion," as in "motion picture," although the coder does not involve any motion compensation. Another common intraframe video coder is that used in consumer DV camcorders. By contrast, interframe coders exploit dependencies across frame boundaries to gain increased efficiency. The most efficient of these coders make use of the apparent motion between video frames to achieve a compression ratio that is generally much larger. An interframe coder will code the first frame using intraframe coding, but then use predictive coding on the following frames. The new HDV format uses interframe coding for HD video.
MPEG coders restrict their interframe coding to a group of pictures (GOP) of relatively small size, say 12–60 frames, to prevent error propagation. These MPEG coders use a transform compression method for both the frames and the predictive residual data, thus exemplifying hybrid coding, since the coder is a hybrid of the block-based transform spatial coder of Chapter 8 and a predictive or DPCM temporal coder. Sometimes transforms are used in both the spatial and time domains; the coder is then called a 3-D transform coder.
All video coders share the need for a source buffer to smooth the output of the variable length coder for each frame. The overall video coder can be constant bitrate (CBR) or variable bitrate (VBR), depending on whether the buffer output bitrate is constant. Often, intraframe video coders are CBR, in that they assign or use a fixed number of bits for each frame. However, for interframe video coders, the bitrate is more highly variable. For example, an action sequence may need a much higher bitrate to achieve a good quality than would a so-called "talking head," but this is only apparent from the interframe viewpoint.
Table 11-1 shows various types of digital video, along with frame size in pixels and frame rate in frames per second (fps). Uncompressed bitrate assumes an 8-bit pixel depth, except for 12-bit digital cinema (DC), and includes a factor of three for RGB color. The last column gives an estimate of compressed bitrates using technology such as H.263 and MPEG-2. Comparing the last two columns of the table, we see fairly impressive compression ratios, upwards of 50 in several cases. While the given "pixel size" may not relate directly to a chosen display size, it does give an indication of recommended display size for general visual content, with SD and above generally displayed at full screen height, multimedia at half screen height, and teleconference displayed at one-quarter screen height, on a normal display terminal. In terms of distance from a viewing screen, it is conservatively recommended to view SD at 6 times the picture height, HD at 3 times the picture height, and DC at 1.5 times the picture height (see Chapter 6).

Table 11-1. Types of video with uncompressed and compressed bitrates
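The uncompressed bitrates described for Table 11-1 follow from simple arithmetic: pixels per frame, times frame rate, times bits per pixel, times three for RGB. The short Python sketch below illustrates that calculation. The function names, the 720 x 480 at 30 fps frame format, and the 5 Mbps compressed rate are assumptions chosen for illustration; they are not values read from the table.

```python
def uncompressed_bitrate(width, height, fps, bit_depth=8, channels=3):
    """Raw bitrate in bits per second.

    bit_depth=8 and channels=3 (RGB) follow the assumptions stated in
    the text; digital cinema would use bit_depth=12.
    """
    return width * height * fps * bit_depth * channels


def compression_ratio(uncompressed_bps, compressed_bps):
    """Ratio of raw bitrate to coded bitrate."""
    return uncompressed_bps / compressed_bps


# Hypothetical SD-like format; NOT taken from Table 11-1.
raw = uncompressed_bitrate(720, 480, 30)
print(f"uncompressed: {raw / 1e6:.1f} Mbps")  # uncompressed: 248.8 Mbps

# Hypothetical coded rate of 5 Mbps, for illustration only.
print(f"ratio: {compression_ratio(raw, 5e6):.1f}")
```

With these assumed numbers the ratio comes out close to 50, the same order as the ratios the text points to.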
11.1 Intraframe Coding
In this section we look at three popular intraframe coding methods for video. They are block-DCT, motion JPEG (M-JPEG), and subband/wavelet transform (SWT). The new aspect over the image coding problem is the need for rate control, which arises because we may need variable bit assignment across the frames to get more uniform quality. In fact, if we would like to have constant quality across the frames that make up our video, then the bit assignment must adapt to frame complexity, resulting in a variable bitrate (VBR) output of the coder.
Figure 11-1 shows the system diagram of a transform-based intraframe video coder. We see the familiar transform, quantize, and VLC structure. What is new is the buffer on the output of the VLC and the feedback of a rate control from the buffer. Here we have shown the feedback as controlling the quantizer. If we need constant bitrate (CBR) video, then the buffer output bitrate must be constant, so its input bitrate must be controlled so as to avoid overflow or underflow, the latter corresponding to the case where this output buffer is empty and so unable to supply the required CBR. The common way of controlling the bitrate is to monitor buffer fullness, and then feed back this information to the quantizer. Usually the step-size of the quantizer is adjusted to keep the buffer around the midpoint, or half full.

Figure 11-1. Illustration of intraframe video coding with rate control.
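The feedback loop in the figure can be sketched as a toy simulation. The Python code below is only an illustration under stated assumptions, not the rate control of any particular coder: the rate model (bits per frame equal to a frame "complexity" divided by the step-size), the linear feedback law, and all names and parameter values are hypothetical.

```python
def simulate_rate_control(complexities, channel_bits_per_frame, buffer_size,
                          base_step=8.0, gain=2.0, min_step=1.0):
    """Toy buffer-feedback rate control for a CBR channel.

    Assumed rate model (hypothetical): coding a frame of complexity c
    with quantizer step-size s produces c / s bits.
    """
    fullness = buffer_size / 2.0  # start at the midpoint
    history = []
    for c in complexities:
        # Feedback: a buffer above the midpoint enlarges the step-size
        # (fewer bits); a buffer below the midpoint shrinks it.
        deviation = (fullness - buffer_size / 2.0) / buffer_size
        step = max(min_step, base_step * (1.0 + gain * deviation))
        bits_in = c / step

        # The channel drains a constant number of bits per frame time.
        fullness += bits_in - channel_bits_per_frame
        overflow = fullness > buffer_size
        underflow = fullness < 0.0
        fullness = min(max(fullness, 0.0), buffer_size)

        history.append({"step": step, "bits": bits_in, "fullness": fullness,
                        "overflow": overflow, "underflow": underflow})
    return history


# A run of frames that are more complex than the base step-size can
# afford at this channel rate: the step-size should rise.
run = simulate_rate_control([12000.0] * 200, channel_bits_per_frame=1000.0,
                            buffer_size=10000.0)
print(run[0]["step"], round(run[-1]["step"], 2), round(run[-1]["bits"], 1))
```

In this toy model the step-size rises until the bits produced per frame match the channel rate, and the buffer settles above its midpoint. A practical controller would add further terms to pull the buffer back toward half full.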
As mentioned earlier, a uniform quantizer has a constant step-size, so that if the number of output levels is odd and the input domain is symmetric around zero, then zero will be an output value. If we enlarge the step-size around zero, say by a factor of two, then the quantizer is said to be a uniform threshold quantizer (UTQ) (see Figure 11-2). The bin that gets mapped into zero is called the deadzone, and it can be very helpful in reducing noise and "insignificant" details. Another nice property of such UTQs is that they can be easily nested for scalable coding, as mentioned in Chapter 8 and discussed later in this chapter.

Figure 11-2. Illustration of uniform threshold quantizer, named for its deadzone at the origin.
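A minimal Python sketch of such a quantizer follows, assuming a deadzone exactly twice the regular step-size and midpoint reconstruction of the nonzero bins. Both choices, and the function names, are assumptions made for illustration. The last function shows one way the nesting property can be seen: halving the index magnitude gives the index of a UTQ with twice the step-size.

```python
import math


def utq_quantize(x, step):
    """Return the UTQ index of x.

    The zero bin (deadzone) is (-step, step), twice as wide as the
    other bins, which each have width step.
    """
    sign = -1 if x < 0 else 1
    return sign * int(math.floor(abs(x) / step))


def utq_reconstruct(q, step):
    """Map an index back to a value (bin midpoint; an assumed choice)."""
    if q == 0:
        return 0.0
    sign = -1 if q < 0 else 1
    return sign * (abs(q) + 0.5) * step


def coarsen(q):
    """Index of the same sample in a UTQ with step-size 2 * step."""
    sign = -1 if q < 0 else 1
    return sign * (abs(q) // 2)


for x in (-5.3, -0.9, 0.4, 3.7):
    q = utq_quantize(x, 1.0)
    print(x, q, utq_reconstruct(q, 1.0), coarsen(q) == utq_quantize(x, 2.0))
```

Small inputs such as -0.9 and 0.4 fall in the deadzone and are mapped to zero, which is the noise-suppressing behavior described above.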



