Video codecs, part 2: Interframe coding, MPEG-2 & MPEG-4
John W. Woods
7/16/2008 12:00 PM EDT
This series is excerpted from "Multidimensional Signal, Image, and Video Processing and Coding."
Part 1 looks at intraframe coding. Part 3 looks at H.264 and video over networks.
We now turn to the generally more efficient interframe coders. While these coders offer more compression efficiency than the intraframe coders just discussed, they are not as artifact-free, and are mainly used for distribution-quality purposes. Intraframe coders, with the exception of the digital cinema distribution standard, have been mainly used for contribution quality, i.e., professional applications, because of their high quality at moderate bitrates and their ease of editing.
11.2 Interframe Coding
Now we look at coders that make use of the dependence between frames or interframe dependence. We consider spatiotemporal (3-D) DPCM, basic MC-DCT concepts, MPEGx multimedia coding, H.26x visual conferencing, and MC-SWT coding. We start out with the 1-D DPCM coder and generalize it to a sequence of temporal frames.
11.2.1 Generalizing 1-D DPCM to Interframe Coding
We replace the 1-D scalar value x(n) with the video frame x(n1,n2,n) and, in general, do the prediction from a nonsymmetric half-space (NSHS) region. In practice, however, the prediction is usually based only on the prior frame or frames. We then quantize the prediction error, or residual. We thus have the system shown in Figure 11-8. We can perform conditional replenishment [4], which transmits only the significant pixel differences, i.e., those beyond a certain threshold value. These significant differences tend to be clustered in the frame, so we can efficiently transmit the cluster positions with a context-adaptive VLC. The average bitrate at the buffer output can be controlled by varying the threshold, so that only sufficiently significant differences are passed. (For an excellent summary of this and other early contributions, see the 1980 review article by Netravali and Limb [32].)
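The thresholding step of conditional replenishment can be sketched in a few lines. This is only an illustration under simplifying assumptions: frames are small lists of integer rows, the previous frame is assumed to be available as reconstructed at the decoder, and the function name `conditional_replenishment` and the `(row, col, difference)` update format are hypothetical, not taken from [4]. Entropy coding of the cluster positions is omitted.

```python
def conditional_replenishment(prev_recon, current, threshold):
    """Return (updates, new_recon) for one frame.

    updates lists (row, col, difference) for every pixel whose change from
    the previous reconstructed frame exceeds the threshold. All other
    pixels are simply repeated from the previous reconstruction.
    """
    updates = []
    new_recon = [row[:] for row in prev_recon]  # start from the old frame
    for r, (prev_row, cur_row) in enumerate(zip(prev_recon, current)):
        for c, (p, x) in enumerate(zip(prev_row, cur_row)):
            d = x - p
            if abs(d) > threshold:          # significant difference only
                updates.append((r, c, d))
                new_recon[r][c] = p + d     # what the decoder will rebuild
    return updates, new_recon


prev = [[10, 10, 10], [10, 10, 10]]
cur = [[10, 11, 30], [10, 10, 29]]
updates, recon = conditional_replenishment(prev, cur, threshold=4)
print(updates)
print(recon)
```

Raising `threshold` shrinks the list of updates, which is the rate-control knob described above: fewer pixels are replenished, at the cost of leaving small changes uncorrected.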

Figure 11-8. Spatiotemporal generalization of DPCM with spatiotemporal predictor.
Looking at Figure 11-8, we can generalize 3-D DPCM by putting any 2-D spatial or intraframe coder in place of the scalar quantizer Q[·]. If we use block DCT for the spatial coder that replaces the scalar quantizer, we have a hybrid coder, being temporally DPCM and spatially transform-based. We typically use a frame-based predictor, whose most general case would be a 3-D nonlinear predictor operating on a number of past frames. Often, though, just a frame delay is used, effectively assuming the current frame will be the same as the motion-warped version of the previous frame.
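To make the temporal DPCM loop concrete, here is a minimal sketch that uses the simplest choices: a pure frame-delay predictor (no motion warping) and a uniform scalar quantizer standing in for the spatial coder. The names `quantize`, `encode_sequence`, and `decode_sequence`, the all-zero initial prediction, and the step size are assumptions made for illustration; they do not come from the text or from any standard.

```python
def quantize(value, step):
    """Uniform quantizer: index of the reconstruction level nearest to value."""
    return round(value / step)


def reconstruct(pred, indices, step):
    """Add the dequantized residual back onto the prediction."""
    return [[p + q * step for p, q in zip(prow, qrow)]
            for prow, qrow in zip(pred, indices)]


def encode_sequence(frames, step):
    """Temporal DPCM encoder; the predictor is the previous decoded frame."""
    recon = [[0] * len(frames[0][0]) for _ in frames[0]]  # assumed start state
    coded = []
    for frame in frames:
        indices = [[quantize(x - p, step) for x, p in zip(frow, prow)]
                   for frow, prow in zip(frame, recon)]
        coded.append(indices)
        # The encoder tracks the decoder's reconstruction, not its own input.
        recon = reconstruct(recon, indices, step)
    return coded


def decode_sequence(coded, step):
    """Matching decoder: the same loop, driven by the received indices."""
    recon = [[0] * len(coded[0][0]) for _ in coded[0]]
    out = []
    for indices in coded:
        recon = reconstruct(recon, indices, step)
        out.append(recon)
    return out


frames = [[[100, 102], [98, 101]],
          [[101, 103], [99, 120]],
          [[103, 103], [99, 121]]]
step = 4
coded = encode_sequence(frames, step)
decoded = decode_sequence(coded, step)
```

Because the prediction is formed from the reconstructed frame rather than the original, quantization error does not accumulate from frame to frame: in this sketch every decoded pixel stays within half a step of the original.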
In some current coding standards, a spatial filter (called a loop filter) is used to shape the quantizer noise spectrum and to add temporal stability to this otherwise only marginally stable system. Another way to stabilize the DPCM loop is to insert an intra frame every so often. Calling this frame an I frame, and the intercoded frames P frames, we can denote the coded sequence as IPPP···PIPPP···PIPPP···. Another idea, in the same context, is to randomly insert I blocks instead of complete I frames. While the former refresh method is used in the MPEG-2 entertainment standard, the latter random I block refresh method is used in the H.263 visual conferencing standard.
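The two refresh strategies can be illustrated with a short sketch. Both functions are hypothetical helpers: the fixed `intra_period` and the uniform random choice of blocks are assumptions made for the example, not the rules actually specified by MPEG-2 or H.263.

```python
import random


def frame_types(num_frames, intra_period):
    """Periodic refresh: an I frame every intra_period frames, P otherwise."""
    return "".join("I" if n % intra_period == 0 else "P"
                   for n in range(num_frames))


def random_intra_blocks(num_blocks, blocks_per_frame, rng):
    """Random refresh: pick which blocks of one frame are intra coded."""
    return sorted(rng.sample(range(num_blocks), blocks_per_frame))


print(frame_types(10, 4))  # IPPPIPPPIP

rng = random.Random(0)     # seeded only to make the sketch repeatable
blocks = random_intra_blocks(num_blocks=99, blocks_per_frame=5, rng=rng)
print(blocks)
```

Periodic I frames give clean random-access points but produce bitrate spikes, while spreading intra blocks over many frames keeps the rate smoother, which suits low-delay conferencing.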
11.2.2 MC Spatiotemporal Prediction
There are two types of motion-compensated hybrid coders. They differ in the kind of motion estimation they employ: forward motion estimation or backward motion estimation. The MC block in Figure 11-9 performs the motion compensation (warping) after the motion estimation block computes forward motion. The reconstructed frame seen in the figure is also the decoded output, i.e., the encoder contains a decoder.

Figure 11-9. Illustrative system diagram for forward motion-compensated DPCM.
The actual decoder is shown in Figure 11-10. It consists of a spatial decoder followed by the familiar DPCM temporal decoding loop, modified by the motion compensation warping operation MC, which is controlled by the received motion vectors. In forward MC, motion vectors are estimated at the encoder between the current input frame and the prior decoded frame. The motion vectors must then be coded and sent along with the MC residual data as side information, since the decoder does not have access to the input frame x(n1,n2,n).
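As a rough illustration of forward motion estimation and compensation, the sketch below uses full-search block matching with a sum-of-absolute-differences (SAD) cost. Block matching is a common choice but is an assumption here, since the text does not name an estimator; the function names, block size `bs`, search range, and synthetic test frames are likewise illustrative. Frame dimensions are assumed to be multiples of the block size.

```python
def sad(cur, ref, r0, c0, dr, dc, bs):
    """SAD between the current block at (r0, c0) and the reference block
    displaced by (dr, dc)."""
    return sum(abs(cur[r][c] - ref[r + dr][c + dc])
               for r in range(r0, r0 + bs)
               for c in range(c0, c0 + bs))


def estimate_motion(cur, ref, bs, search):
    """Full-search block matching: one (dr, dc) vector per block."""
    rows, cols = len(cur), len(cur[0])
    vectors = {}
    for r0 in range(0, rows, bs):
        for c0 in range(0, cols, bs):
            best = None
            for dr in range(-search, search + 1):
                for dc in range(-search, search + 1):
                    # Skip candidates that reach outside the reference frame.
                    if not (0 <= r0 + dr and r0 + dr + bs <= rows and
                            0 <= c0 + dc and c0 + dc + bs <= cols):
                        continue
                    key = (sad(cur, ref, r0, c0, dr, dc, bs),
                           abs(dr) + abs(dc), dr, dc)  # ties: shortest vector
                    if best is None or key < best:
                        best = key
            vectors[(r0, c0)] = (best[2], best[3])
    return vectors


def motion_compensate(ref, vectors, bs):
    """Warp the reference frame block by block using the motion vectors."""
    pred = [[0] * len(ref[0]) for _ in ref]
    for (r0, c0), (dr, dc) in vectors.items():
        for r in range(r0, r0 + bs):
            for c in range(c0, c0 + bs):
                pred[r][c] = ref[r + dr][c + dc]
    return pred


# Synthetic test: the current frame is the reference moved right by one pixel.
ref = [[(7 * r + 13 * c) % 31 for c in range(8)] for r in range(8)]
cur = [[ref[r][max(c - 1, 0)] for c in range(8)] for r in range(8)]

vectors = estimate_motion(cur, ref, bs=4, search=2)  # sent as side information
pred = motion_compensate(ref, vectors, bs=4)
residual = [[x - p for x, p in zip(crow, prow)]
            for crow, prow in zip(cur, pred)]
```

In a forward MC coder the encoder would transmit both `vectors` and the (spatially coded) `residual`; the decoder only needs `motion_compensate`, never `estimate_motion`. In this synthetic example the two right-hand blocks are matched exactly at displacement (0, -1), so their residual is zero.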
In backward MC, we base the motion estimation on the previous two decoded frames, at times n − 1 and n − 2, as shown in Figure 11-11. Then there is no need to transmit motion vectors as side information, since in the absence of channel errors, the decoder can perform the same calculation as the encoder and come up with the same motion vectors. Of course, from the viewpoint of total computation, backward MC means doing the computationally intensive motion estimation and compensation work twice, once at the encoder and once at the decoder. (You are asked to write the decoder block diagram for backward motion-compensated hybrid coding as an end-of-chapter problem.)
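The key property of backward MC, that encoder and decoder derive identical motion from data both already hold, can be shown with a deliberately simplified sketch. The assumptions are substantial and are made only to keep the example short: motion is a single global translation with circular wraparound, it is assumed to continue unchanged from one frame to the next, and the residual is sent without quantization. All function names are hypothetical.

```python
def shift(frame, dr, dc):
    """Circularly shift a frame down by dr rows and right by dc columns."""
    rows, cols = len(frame), len(frame[0])
    return [[frame[(r - dr) % rows][(c - dc) % cols] for c in range(cols)]
            for r in range(rows)]


def estimate_global_motion(older, newer, search):
    """Return the (dr, dc) that best maps older onto newer (minimum SAD)."""
    best = None
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            cand = shift(older, dr, dc)
            cost = sum(abs(a - b) for ra, rb in zip(cand, newer)
                       for a, b in zip(ra, rb))
            if best is None or cost < best[0]:
                best = (cost, dr, dc)
    return best[1], best[2]


def backward_predict(recon_nm2, recon_nm1, search):
    """Predict frame n using only the decoded frames n-2 and n-1."""
    dr, dc = estimate_global_motion(recon_nm2, recon_nm1, search)
    return shift(recon_nm1, dr, dc)  # assume the motion simply continues


def encode_frame(current, recon_nm2, recon_nm1, search):
    """Encoder: transmit the residual only, with no motion vectors."""
    pred = backward_predict(recon_nm2, recon_nm1, search)
    return [[x - p for x, p in zip(xr, pr)] for xr, pr in zip(current, pred)]


def decode_frame(residual, recon_nm2, recon_nm1, search):
    """Decoder: repeat the same motion estimation, then add the residual."""
    pred = backward_predict(recon_nm2, recon_nm1, search)
    return [[e + p for e, p in zip(er, pr)] for er, pr in zip(residual, pred)]


frame0 = [[(7 * r + 13 * c) % 31 for c in range(8)] for r in range(6)]
frame1 = shift(frame0, 0, 1)
frame2 = shift(frame1, 0, 1)

residual = encode_frame(frame2, frame0, frame1, search=2)
decoded = decode_frame(residual, frame0, frame1, search=2)
```

Note that `estimate_global_motion` runs in both `encode_frame` and `decode_frame`, which is exactly the doubled computation mentioned above. The scheme works only while both sides hold identical decoded frames; a channel error would make their motion estimates diverge.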

Figure 11-10. Hybrid decoder for forward motion compensation.

Figure 11-11. Illustration of video encoder using backward motion compensation.
One problem with interframe coding is the required accuracy in representing the motion vectors. There has been some theoretical work on this subject [12] that uses a very simple motion model. Even so, insight is gained on required motion vector accuracy, especially as regards the level of noise in the input frames.



