Codec designs fine-tuned with spectral band replication
6/24/2002 7:32 AM EDT
Oliver Kunz, Vice President, Strategic Marketing, Coding Technologies, Nuernberg, Germany
Over the past two years, two new audio codecs have made inroads into the market. Both codecs, mp3PRO and CT-aacPlus, are enhanced variants of existing audio codecs, namely MP3 and Advanced Audio Coding (AAC), and use a technology called Spectral Band Replication (SBR) to achieve a significant improvement in compression efficiency.
SBR-enhanced codecs achieve a significantly higher compression efficiency than their non-enhanced counterparts. The quality of mp3PRO at 64 kbits/s stereo is comparable to that of MP3 at a bitrate above 100 kbits/s. CT-aacPlus, building on the more efficient AAC as its base codec, performs another 30 percent better than mp3PRO. Several independent listening tests indicate that CT-aacPlus is the most efficient audio codec available today, which was one of the main reasons for the use of CT-aacPlus in the XM Satellite Radio digital radio system.
AAC will remain state-of-the-art in conventional perceptual coding for the foreseeable future. Given the close connection between the efficiency of the base codec and the efficiency of its SBR-enhanced counterpart, CT-aacPlus can be expected to remain the best codec of its kind for the next few years. Minor improvements of SBR performance may occur in the course of the ongoing maintenance.
Another development that could lead to even higher compression efficiency, albeit at the risk of slightly more noticeable differences between the original and the coded signal, is the use of technologies that reconstruct stereo images. Such technologies, possibly in combination with SBR-enhanced codecs, could offer good-quality stereo signals with full audio bandwidth at bitrates below 30 kbits/s. This would open even more opportunities for low-bitrate audio applications such as streaming over mobile networks. Stereo reconstruction technologies are about to be introduced into the market.
Traditional waveform codecs such as MPEG Layer-2, MP3, AAC, Atrac or Windows Media Audio all have their performance limits, albeit at different levels. Increased compression efficiency beyond the performance level of AAC using only the conventional approaches of perceptual audio coding is very hard to achieve. When waveform codecs operate below their reasonable operating point--which is a function of data rate, number of channels, audio bandwidth and sample rate--the quantization noise significantly violates the masking threshold, and the codec will generate audible artifacts.
Two main methods have been used so far to overcome this problem in perceptual waveform codecs. The most important one is to limit the audio bandwidth of the signal in or prior to the coding process. As there is no high frequency energy left to be coded, more bits remain available for the low frequency region of the spectrum, resulting in a clean but hollow-sounding signal. The other method, called intensity stereo, can only be used for stereo signals. In intensity stereo, only one channel plus panning information is transmitted instead of a left and a right channel. However, this is of rather limited use for increasing compression efficiency, because in many cases the stereo image of the audio signal is destroyed, which must itself be considered a coding artifact.
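The intensity stereo idea can be illustrated with a minimal sketch. It is not taken from any codec specification: the function names are hypothetical, and it works on one block of plain sample values, whereas real codecs apply the technique per frequency band. It shows why a simply panned source survives the process while a wide stereo image collapses.

```python
import math

def intensity_encode(left, right):
    """Reduce a stereo block to one carrier signal plus a panning value."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    total = e_left + e_right
    summed = [l + r for l, r in zip(left, right)]
    e_sum = sum(x * x for x in summed)
    # Rescale the sum so the carrier holds the combined energy of both channels.
    scale = math.sqrt(total / e_sum) if e_sum > 0 else 0.0
    carrier = [scale * x for x in summed]
    # Panning value: share of the total energy that belongs to the left channel.
    pan = e_left / total if total > 0 else 0.5
    return carrier, pan

def intensity_decode(carrier, pan):
    """Rebuild both channels as scaled copies of the single carrier."""
    left = [math.sqrt(pan) * x for x in carrier]
    right = [math.sqrt(1.0 - pan) * x for x in carrier]
    return left, right

# A source that is merely panned between the channels is reproduced closely.
source = [1.0, -0.5, 0.25, 0.75]
panned_l = [0.8 * s for s in source]
panned_r = [0.6 * s for s in source]
dec_l, dec_r = intensity_decode(*intensity_encode(panned_l, panned_r))

# Two unrelated channels come back as identical signals: the image collapses.
wide_l = [1.0, 0.0, -1.0, 0.0]
wide_r = [0.0, 1.0, 0.0, -1.0]
col_l, col_r = intensity_decode(*intensity_encode(wide_l, wide_r))
```

In the second case the decoder can only output two scaled copies of the same carrier, so the difference between the channels is lost.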
At this point, SBR technology comes into play. SBR uses a hybrid waveform/parametric coding method. It is based on the fact that in most cases there are dependencies between the lower and higher frequency components of an audio signal, so the high frequency part can be reconstructed from the low frequency part and does not need to be transmitted. Only a small amount of SBR control data needs to be carried in the bit stream to guarantee an accurate reconstruction of the high frequencies.
The mere generation of high frequency content, however, is not sufficient for accurate high frequency reconstruction, since the reconstructed part does not reflect the spectral envelope of the original. Therefore, careful adjustment of the spectrum is essential for the performance of the system. The adjustment is controlled by the SBR information carried in the bit stream and results in a correctly shaped high frequency part.
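The two steps described here, generating high frequency content and then shaping it, can be sketched as follows. This is a deliberately simplified illustration, not the SBR algorithm itself: it assumes the spectrum is a plain list of coefficients, that the high band is produced by copying the low band upward, and that the control data consists of one target energy per band. All names are hypothetical.

```python
import math

def replicate_high_band(low_band):
    """Generate raw high frequency content by copying the low band upward."""
    return list(low_band)

def adjust_envelope(patched, target_energies, band_size):
    """Scale each band of the patched spectrum to its transmitted energy."""
    shaped = []
    for b, target in enumerate(target_energies):
        band = patched[b * band_size:(b + 1) * band_size]
        current = sum(c * c for c in band)
        # The gain maps the energy of the copied band onto the target energy.
        gain = math.sqrt(target / current) if current > 0 else 0.0
        shaped.extend(gain * c for c in band)
    return shaped

def band_energies(coeffs, band_size):
    """Energy per band, used here only to check the result."""
    return [sum(c * c for c in coeffs[i:i + band_size])
            for i in range(0, len(coeffs), band_size)]

low_band = [4.0, -2.0, 1.0, 3.0]    # decoded low frequency coefficients
targets = [0.5, 0.1]                # assumed envelope data from the bit stream
raw_high = replicate_high_band(low_band)
high_band = adjust_envelope(raw_high, targets, band_size=2)
```

Before adjustment the copied bands carry the energies of the low band (20 and 10 in this example), which is far too much for a typical high band. After adjustment, the band energies match the transmitted targets.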
SBR-enhanced codecs are preferably used as a dual-rate system, with the underlying codec operating at half the original sampling rate while SBR operates at the original sampling rate. This not only ensures maximum coding efficiency, it also keeps the computational complexity of the overall system lower.
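The rate relationships in a dual-rate system follow from simple arithmetic, shown here as a small sketch. The function name and the 44.1 kHz example are illustrative assumptions. The only facts used are that the core codec runs at half the output sampling rate and that a sampled signal can carry frequencies up to half its sampling rate.

```python
def dual_rate_plan(output_rate_hz):
    """Sampling rates and maximum audio bandwidths in a dual-rate setup."""
    core_rate = output_rate_hz / 2          # underlying codec runs at half rate
    return {
        "core_rate_hz": core_rate,
        "core_max_bandwidth_hz": core_rate / 2,       # Nyquist limit of core
        "sbr_rate_hz": output_rate_hz,
        "sbr_max_bandwidth_hz": output_rate_hz / 2,   # Nyquist limit of output
    }

plan = dual_rate_plan(44100)
# The core codec handles at most 11025 Hz of bandwidth at a 22050 Hz rate;
# everything between that and 22050 Hz is left to SBR.
```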
SBR works like a shell around the codec. It contains a pre-processing step on the encoding side and a post-processing step on the decoding side. On the encoding side, the incoming audio signal is analyzed to extract information about the high frequency components of the signal. This information is later transmitted inside the encoded bit stream. In the case of mp3PRO, the so-called "ancillary data" fields in the MP3 bit stream are used to transmit this SBR helper information. Therefore, an mp3PRO bit stream can still be decoded by any compliant MP3 decoder, although in this case the high frequency components would be missing. Further tasks performed by the SBR preprocessing part in the encoder are the sample rate conversion and the low pass filtering to a bitrate-dependent cutoff-frequency above which the spectral components will be reconstructed by SBR.
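The encoder-side pre-processing can be pictured with a short sketch. It assumes, for illustration only, that one frame is available as a list of spectral coefficients and that the helper information is simply one energy value per high frequency band. The actual SBR analysis and bit stream syntax are more elaborate, and the names below are hypothetical.

```python
def sbr_analyze(spectrum, cutoff_bin, band_size):
    """Split one frame at the cutoff and describe the discarded high part."""
    low_part = spectrum[:cutoff_bin]     # handed to the underlying codec
    high_part = spectrum[cutoff_bin:]    # not transmitted as a waveform
    # Helper information: one energy value per band of the high part.
    envelope = [sum(c * c for c in high_part[i:i + band_size])
                for i in range(0, len(high_part), band_size)]
    return low_part, envelope

frame = [4.0, -2.0, 1.0, 3.0, 0.5, -0.5, 0.1, 0.3]
low_part, envelope = sbr_analyze(frame, cutoff_bin=4, band_size=2)
# low_part keeps the first four coefficients; envelope holds two energies,
# approximately 0.5 and 0.1, in place of the four high coefficients.
```

The saving comes from replacing the high frequency coefficients with a much smaller set of envelope values.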
Most of the actual SBR algorithm is contained in the decoder component. This part is responsible for creating and adjusting the high frequency components to finally achieve a high quality audio output signal. The interfacing between the underlying audio codec and the SBR postprocessor is straightforward. The required input data for the SBR algorithm are the decoded audio signal at half the original sample rate and the SBR helper information which needs to be extracted from the bit stream. SBR uses these two pieces of information to recreate the high frequency components of the original audio signal. Thanks to that lean interface, implementers of SBR-enhanced decoders can usually re-use their existing MP3 or AAC decoder implementations without major modifications.
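The lean interface between the core decoder and the SBR postprocessor can be sketched as plain function composition. Everything in this example is a hypothetical stand-in: the frame is modeled as a dictionary, the core decoder just returns its input, and the postprocessor only doubles the sample count without reconstructing any high frequencies. The point is the data flow, including the case of a decoder that ignores the helper information.

```python
def decode_frame(frame, core_decode, sbr_process=None):
    """Run the core decoder, then the SBR postprocessor if one is present."""
    low_rate_pcm = core_decode(frame["main_data"])
    side_info = frame.get("ancillary")
    if sbr_process is None or side_info is None:
        # Decoder without SBR support: band-limited output at half rate.
        return low_rate_pcm
    return sbr_process(low_rate_pcm, side_info)

def fake_core_decode(main_data):
    """Stand-in for an existing MP3 or AAC decoder (assumption)."""
    return list(main_data)

def fake_sbr_process(low_rate_pcm, side_info):
    """Stand-in postprocessor: doubles the sample count by repetition only."""
    out = []
    for sample in low_rate_pcm:
        out.extend([sample, sample])
    return out

frame = {"main_data": [0.1, 0.2, 0.3], "ancillary": {"envelope": [0.5, 0.1]}}
plain = decode_frame(frame, fake_core_decode)
enhanced = decode_frame(frame, fake_core_decode, fake_sbr_process)
```

Because the postprocessor only needs the decoded signal and the helper information, the core decoder function is passed in unchanged in both cases.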
Audio codecs are usually standardized using a normative description of the decoder and bit stream, and no normative encoder description. Because the encoder must be carefully tuned to achieve maximum benefit from the combination with SBR, re-using an existing encoder implementation requires a diligent review of that implementation.
Decoders usually operate under much tighter complexity constraints than encoders. It should be noted that an mp3PRO encoder does not consume more MIPS than a full sample rate MP3 encoder does. When comparing the computational requirements of an MP3 and an mp3PRO decoder, you will find a moderate increase in MIPS demand of around 40 percent and an increase in the amount of RAM needed by roughly a factor of two. Both numbers assume a highly optimized MP3 decoder.
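As a rough budgeting aid, the figures above can be turned into a small calculation. The baseline numbers in the example are made up for illustration; only the 40 percent MIPS increase and the factor of two in RAM come from the text.

```python
def estimate_mp3pro_decoder(mp3_mips, mp3_ram_kb,
                            mips_increase=0.40, ram_factor=2.0):
    """Scale the budget of a highly optimized MP3 decoder to mp3PRO."""
    return mp3_mips * (1.0 + mips_increase), mp3_ram_kb * ram_factor

# Hypothetical baseline: an MP3 decoder needing 20 MIPS and 16 KB of RAM.
mips, ram_kb = estimate_mp3pro_decoder(20.0, 16.0)
# This gives roughly 28 MIPS and 32 KB for the mp3PRO decoder.
```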
The SBR component of an mp3PRO or CT-aacPlus decoder contains an analysis and a synthesis QMF filter bank--the high frequency reconstruction is done in the frequency domain--plus a number of processing blocks that adjust the reconstructed high frequency components to match those of the original audio signal. Thanks to the dual-rate approach, the complexity requirements of the underlying audio codec are significantly lower than usual.
Since the MP3 decoder part and the SBR part are processed sequentially, certain memory areas, such as the output buffer of the MP3 decoder, can be re-used. Detailed guidance on this, as well as template code for implementations of mp3PRO on fixed-point processors, is available to implementers as part of the licensing package for mp3PRO. The same is true for CT-aacPlus.


See related chart
