Granular scheme tunes network video
Rama Kalluri, Member of Research Staff, Philips Research, Briarcliff Manor, N.Y.
6/17/2002 11:11 AM EDT
Data networks are becoming the most prevalent form of communication, with the Internet and wireless networks (such as Wi-Fi) leading the way. Video transmission is moving from the traditional broadcast environment (terrestrial, cable and satellite) to those data networks. Decoded video quality is very sensitive to network performance. In this context, the ability of a network, and ultimately of the content provider, to control the bandwidth used by video streams is important.
A newly standardized scalable scheme, called fine-granular scalability (FGS), allows adaptation to bandwidth variations in real time by transmitting enhancement information incrementally. FGS relies on a nonscalable base layer, whose bandwidth is fixed but low enough to be guaranteed, and an incrementally coded enhancement layer that can be cut at any point to fit the available bandwidth. As more bandwidth becomes available, more per-frame enhancement information is transmitted, improving the decoded picture quality. FGS performance depends strongly on the quality of the base layer.
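To make the cutting step concrete, here is a minimal sketch of how a sender might size the enhancement layer per frame. The function names, the bit rates and the frame rate are illustrative assumptions, not values from the study or from the FGS standard.

```python
def enhancement_budget_bytes(available_bps, base_bps, fps):
    """Bytes of enhancement data that fit in one frame interval.

    Assumes the base layer is always sent in full and that only the
    remaining bandwidth is shared evenly across frames.
    """
    spare_bps = max(available_bps - base_bps, 0)
    return int(spare_bps // (8 * fps))


def cut_enhancement(frame_payload, budget_bytes):
    """Truncate one frame's enhancement data to the byte budget.

    An embedded (bit-plane-ordered) stream stays decodable after such a
    cut; the decoder simply sees less refinement information.
    """
    return frame_payload[:budget_bytes]


# Hypothetical example: 384 kbit/s channel, 128 kbit/s base layer, 10 frames/s.
budget = enhancement_budget_bytes(384_000, 128_000, 10)
sent = cut_enhancement(bytes(5000), budget)
```

If the channel drops to the base-layer rate or below, the budget becomes zero and only the base layer is delivered.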
Two competing base-layer coding standards are nearing completion: MPEG-4 and H.26L. It has been shown that the International Telecommunication Union's new H.26L standard outperforms MPEG-4 coding by 1 to 3 dB at the same bit rate, thanks to features not present in the MPEG-4 standard, such as variable-block-size motion vector coding, in-loop filtering and multiple past-reference frames. H.26L has put scalable coding methodologies on its horizon, which raises the question of how best to design the corresponding enhancement layer.
The fundamental design factor is the choice of transform to use in the enhancement layer, since it affects the overall performance of the scalable coding system. An optimal transform concentrates the energy into the fewest coefficients, such that it costs fewer bits to code the most active coefficients.
Transform design
In the base layers, MPEG uses an 8 x 8 discrete cosine transform (DCT), whereas H.26L uses a 4 x 4 transform. The enhancement layer may be designed around either of those or around a number of other transforms. We examined systems using both of those transforms as well as systems based on wavelet transforms.
First, we compared the compaction ratios of the H.26L 4 x 4 transform and the MPEG 8 x 8 DCT, applying each of them to the base-layer residual signal. The compaction ratio measures how well a transform packs the signal energy into the fewest coefficients. A large ratio means the energy is concentrated in a few coefficients, so fewer coefficients must be coded; a small ratio means the energy is spread out across the transform coefficients. It is well-known that the MPEG 8 x 8 DCT does an excellent job in this regard. Here we compare it with the H.26L 4 x 4 transform.
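The article does not give the formula used for the compaction ratio. The sketch below assumes one plausible definition: the number of largest-magnitude samples needed to hold 90 percent of the energy before the transform, divided by the number needed after it. It also substitutes an orthonormal 4 x 4 DCT for the H.26L integer transform, so it illustrates the measurement rather than reproducing the study. All names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal n x n DCT-II matrix; each row is one basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def transform_block(block, t):
    """Separable 2-D transform of one block: T * block * T^T."""
    n = len(t)
    tmp = [[sum(t[k][i] * block[i][x] for i in range(n)) for x in range(n)]
           for k in range(n)]
    return [[sum(tmp[k][j] * t[l][j] for j in range(n)) for l in range(n)]
            for k in range(n)]


def blockwise_coefficients(residual, n):
    """Transform every n x n block and return all coefficients as a flat list."""
    t = dct_matrix(n)
    coeffs = []
    for y in range(0, len(residual), n):
        for x in range(0, len(residual[0]), n):
            block = [row[x:x + n] for row in residual[y:y + n]]
            for row in transform_block(block, t):
                coeffs.extend(row)
    return coeffs


def count_for_energy(values, fraction=0.9):
    """Smallest number of largest-energy samples reaching the energy fraction."""
    energies = sorted((v * v for v in values), reverse=True)
    target = fraction * sum(energies)
    running = 0.0
    for count, energy in enumerate(energies, start=1):
        running += energy
        if running >= target:
            return count
    return len(energies)


def compaction_ratio(residual, n, fraction=0.9):
    """Assumed definition: samples needed before / coefficients needed after."""
    pixels = [v for row in residual for v in row]
    coeffs = blockwise_coefficients(residual, n)
    return count_for_energy(pixels, fraction) / count_for_energy(coeffs, fraction)


# Synthetic smooth 16 x 16 "residual" (a diagonal ramp), not real video data.
ramp = [[x + y for x in range(16)] for y in range(16)]
ratio_8x8 = compaction_ratio(ramp, 8)
ratio_4x4 = compaction_ratio(ramp, 4)
```

Under this definition a ratio above 1 means the transform concentrated the energy, and a ratio below 1 means it spread the energy over more values than the untransformed signal used.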
We examined the compaction ratio of the H.26L base-layer coding residual and made the following observations:
- The MPEG 8 x 8 DCT had a greater compaction ratio than the H.26L 4 x 4 transform. On some extremely rare occasions, the H.26L 4 x 4 transform was better.
- Regardless of the parameters used in the base layer, the MPEG 8 x 8 DCT outperformed the H.26L 4 x 4 transform.
- When the base layer was coded using a low quantization parameter (for example, between 1 and 5), the compaction ratio of the H.26L 4 x 4 transform was less than 1.
This last observation is critical. It implies that the H.26L 4 x 4 transform actually expanded the information in certain instances. In those cases the transform falls far short of optimal: simply transmitting the original pixels would be more efficient than transmitting the H.26L 4 x 4 transform coefficients.
Having established a reason for choosing the MPEG 8 x 8 DCT, we proceeded to verify that this choice was the best among those investigated by implementing an FGS-like system to encode the residual. Given the output of a transform, the coder produces a stream ordered bit plane by bit plane. For the MPEG-4 FGS case, we applied the MPEG 8 x 8 DCT to the base-layer residual and coded the transform coefficients using the MPEG-4 FGS variable-length coding tables.
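The bit-plane ordering is what makes the stream cuttable. The following is a simplified sketch of that idea only: it splits integer coefficient magnitudes into planes from most to least significant and rebuilds them from however many planes arrive. It omits the variable-length coding tables and the sign-coding rules of MPEG-4 FGS, and the function names are hypothetical.

```python
def encode_bit_planes(coeffs):
    """Split integer coefficients into a sign list and magnitude bit planes.

    Planes are ordered from the most significant bit to the least, so the
    start of the stream carries the largest share of the magnitude.
    """
    signs = [c < 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    n_planes = max(mags, default=0).bit_length()
    planes = [[(m >> b) & 1 for m in mags]
              for b in range(n_planes - 1, -1, -1)]
    return signs, planes


def decode_bit_planes(signs, planes, keep):
    """Rebuild coefficients from only the first `keep` bit planes."""
    n_planes = len(planes)
    mags = [0] * len(signs)
    for i, plane in enumerate(planes[:keep]):
        weight = 1 << (n_planes - 1 - i)
        mags = [m + weight * bit for m, bit in zip(mags, plane)]
    return [-m if negative else m for m, negative in zip(mags, signs)]


# Hypothetical quantized coefficients for illustration.
signs, planes = encode_bit_planes([37, -5, 0, 12])
coarse = decode_bit_planes(signs, planes, keep=3)
exact = decode_bit_planes(signs, planes, keep=len(planes))
```

Each additional plane halves the worst-case magnitude error, which is why transmitting more of the stream steadily improves the decoded picture.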
We compared the performance of four methods: MPEG-4 FGS, arithmetic coding of the 8 x 8 DCT residual, arithmetic coding of the H.26L 4 x 4 transformed residual, and set partitioning in hierarchical trees (SPIHT) coding of wavelet coefficients. In each case we applied the appropriate transform and coded according to the corresponding algorithm. Efficiency was determined by cutting the generated streams to meet a set of target bit rates and measuring the decoded picture quality at each rate. These observations were made:
- The performance was similar for MPEG-4 FGS, arithmetic coding of the 8 x 8 DCT residual and SPIHT coding of wavelet coefficients.
- The arithmetic-coded H.26L 4 x 4 transformed residual was consistently worse than any other method.
- The SPIHT method gave slightly better performance than the methods using the 8 x 8 DCT, but the gain was insignificant.
- The SPIHT method is significantly more complex than the adaptive arithmetic-coded 8 x 8 DCT residual method.
- The MPEG 8 x 8 DCT has good compaction properties compared with the H.26L 4 x 4 transform. Also, coders such as MPEG-4 FGS and adaptive-arithmetic coding of 8 x 8 DCT coefficients, designed around this transform, have better coding efficiency.
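The comparison above rests on measuring decoded picture quality at each target bit rate. The article does not name the quality metric; the sketch below assumes peak signal-to-noise ratio (PSNR) in dB for 8-bit pictures, a common choice for this kind of evaluation. The function name and sample values are illustrative.

```python
import math


def psnr(reference, decoded, peak=255.0):
    """PSNR in dB between two equally sized pictures given as lists of rows."""
    squared_error = 0.0
    count = 0
    for ref_row, dec_row in zip(reference, decoded):
        for r, d in zip(ref_row, dec_row):
            squared_error += (r - d) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(peak * peak / mse)


# Hypothetical 2 x 2 pictures: every decoded sample is off by one level.
original = [[10, 20], [30, 40]]
off_by_one = [[11, 21], [31, 41]]
quality_db = psnr(original, off_by_one)
```

Repeating such a measurement for each stream at each cut point yields the rate-quality curves on which a comparison like this one is based.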
MPEG-4 FGS provides nearly the same performance as SPIHT coding of wavelet coefficients, with far less complexity. The efficiency and flexibility of MPEG-4 FGS make it ideal for adaptation to varying network conditions.
This article will be presented at the ICCE show in Los Angeles, in a paper called "Fine Granular Scalability for H.26L-based Video Streaming."



