Endpoint techniques bolster audio quality for VoIP

6/24/2002 7:37 AM EDT

Philip Bednarz, President, CEO, Netergy Microelectronics, Inc., Santa Clara, Calif.

Once a novelty, Voice-over-Internet-Protocol (VoIP) has become a serious alternative for enterprise phone systems. Various algorithmic techniques that operate at the endpoints of a VoIP network are key enablers for the technology, concealing packet loss and packet jitter. VoIP processors in IP phones and gateways fill an important voice-quality role by handling the intense processing requirements at the endpoints without adding significant latency.

IP protocols were never designed to provide telephone service. A user may be willing to wait an hour to download a large software update, but a delay of greater than 250 milliseconds can make a two-way telephone conversation difficult or impossible. Even a 100-millisecond delay gives the participants the feeling that they are speaking to a very cautious and thoughtful conversationalist. But Voice over IP can be made to work at near toll quality.

In IP telephony, a voice conversation is digitized, compressed, broken up into 10 to 30 millisecond audio packets and sent over the network. There is no guarantee that packets will arrive at their destination in sequence, on time or even at all. Packet loss over the Internet, depending on network congestion, can be as great as 30 percent, and packet jitter (the variation in the arrival time of sequential packets) can be as high as 70 to 100 milliseconds or more. In one-way streaming audio applications where some delay can be tolerated, buffering is used to conceal packet jitter and lost packets can be recovered by retransmission. However, these strategies all increase latency and do not fit well into the 250-millisecond psychoacoustic tolerance for telephone conversational delay.

The ultimate answer is adequate network bandwidth and network packet management that gives priority to time-sensitive packets. Local control by routers over the network manages packet loss, packet jitter and latency through preventive techniques such as prioritization. At the network endpoints, additional corrections such as packet loss concealment (PLC), adaptive jitter buffering (AJB) and latency reduction are needed to give reasonable voice quality under adverse and unpredictable conditions. Let's take a look at these techniques for the network endpoints.

Lost packets result in dropouts in the received speech. Due to the natural redundancy in speech information, a 10 percent packet loss still provides intelligible speech; however, even a packet loss of one percent is annoying. The simplest way to conceal a lost packet is simply to replay the previous packet. Although this method is better than losing data, it still results in annoying distortion and breaks down under heavy packet drop-out or burst loss conditions.
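The replay approach can be sketched in a few lines of Python. This is only an illustration: the function name, the use of None to mark a lost packet and the list-of-samples packet format are assumptions made for the example, not part of any particular VoIP product.

```python
def conceal_by_repetition(packets, frame_len):
    """Return the play-out stream, repeating the last good packet for each loss.

    `packets` is a list in play-out order; a lost packet is marked as None
    (an assumption of this sketch). `frame_len` is the number of samples
    per packet, used only to build silence when nothing has arrived yet.
    """
    last_good = [0] * frame_len  # silence until the first packet arrives
    played = []
    for packet in packets:
        if packet is None:
            # Lost packet: replay a copy of the most recent good one
            played.append(list(last_good))
        else:
            last_good = packet
            played.append(list(packet))
    return played


stream = [[1, 2], [3, 4], None, None, [5, 6]]
print(conceal_by_repetition(stream, frame_len=2))
```

Note that a burst of consecutive losses replays the same audio over and over, which is the breakdown under burst loss described above.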

Sending additional information with the basic voice packet stream is another way to conceal packet loss. For example, placing a redundant copy of every packet in the next packet provides perfect recovery for single packet drop out. In more complex algorithms, additional information about the characteristics of the voice signal is carried in extra packets on the network. These approaches can work well but they require additional network bandwidth and are not standardized, which may cause interoperability problems between VoIP system components.
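The simple redundancy scheme, in which each packet also carries a copy of the previous one, might look like the following Python sketch. The dictionary packet layout and the function names are hypothetical and chosen only for illustration; a real system would use its own packet format.

```python
def send_with_redundancy(frames):
    """Build packets where packet n carries frame n plus a copy of frame n-1."""
    packets = []
    previous = None
    for seq, frame in enumerate(frames):
        packets.append({"seq": seq, "frame": frame, "prev_frame": previous})
        previous = frame
    return packets


def receive_with_redundancy(packets, total):
    """Rebuild the frame sequence from whichever packets arrived.

    A missing frame is recovered from the redundant copy in the following
    packet; frames that cannot be recovered stay None.
    """
    frames = [None] * total
    for p in sorted(packets, key=lambda p: p["seq"]):
        seq = p["seq"]
        frames[seq] = p["frame"]
        if seq > 0 and frames[seq - 1] is None:
            frames[seq - 1] = p["prev_frame"]
    return frames


sent = send_with_redundancy(["a", "b", "c", "d", "e"])
arrived = [p for p in sent if p["seq"] != 2]  # packet 2 is dropped
print(receive_with_redundancy(arrived, total=5))
```

In this sketch a single dropped packet is fully recovered, but when two consecutive packets are lost only the second can be rebuilt, and each packet carries roughly twice the voice payload.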

The receiving endpoint can also maintain statistical information on the incoming audio stream such as voice pitch and the spectral response characteristics, and use this information to recreate a best estimate of a lost packet. The quality of the algorithm largely depends on its complexity. Some common low-bit-rate codecs such as the ITU's G.723.1 and G.729 algorithms have packet loss concealment capability built-in, but enhanced, proprietary methods can be used as well. For example, using the excitation and synthesis filter response of previous or following frames, the software can estimate those same parameters for the lost frame and the frame can be reconstructed with minimal distortion. Often estimates for multiple frames produce only moderate distortion.
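As a rough illustration of estimating a lost frame's parameters from its neighbors, the Python sketch below averages the parameters of the previous and following frames, or reuses the previous frame at reduced gain when no following frame is available yet. The parameter names (pitch_lag, gain, spectrum) and the fade factor are assumptions made for the example; this is not the concealment procedure defined in G.723.1 or G.729, nor any vendor's proprietary method.

```python
def estimate_lost_frame(prev, nxt=None, fade=0.5):
    """Estimate codec parameters for a lost frame from neighboring frames.

    `prev` and `nxt` are dicts with hypothetical keys:
      pitch_lag - pitch period in samples (int)
      gain      - excitation gain (float)
      spectrum  - list of spectral (synthesis filter) parameters
    """
    if nxt is None:
        # Only the past is known: reuse it, but fade the gain so a long
        # burst of losses decays toward silence instead of buzzing
        return {
            "pitch_lag": prev["pitch_lag"],
            "gain": prev["gain"] * fade,
            "spectrum": list(prev["spectrum"]),
        }
    # Both neighbors are known: take the midpoint of each parameter
    return {
        "pitch_lag": (prev["pitch_lag"] + nxt["pitch_lag"]) // 2,
        "gain": (prev["gain"] + nxt["gain"]) / 2,
        "spectrum": [(a + b) / 2 for a, b in zip(prev["spectrum"], nxt["spectrum"])],
    }


before = {"pitch_lag": 40, "gain": 1.0, "spectrum": [0.25, 0.5]}
after = {"pitch_lag": 44, "gain": 0.5, "spectrum": [0.75, 1.0]}
print(estimate_lost_frame(before, after))
```

Using the following frame gives a better estimate but means waiting for it to arrive, so the choice trades concealment quality against added delay.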

A packet that does not arrive in time to be played out correctly is the same as a lost packet. Additional buffering prevents late-arriving packets from being lost to the audio stream. With the additional buffering, late packets can play on time and in order. However, increasing the play-out buffer length also increases latency. A dynamic algorithm continually monitors packet arrival statistics and minimizes the buffer length while still maintaining a low percentage of late packets.

For instance, if the packet arrival statistics indicate that packets are arriving sequentially and very regularly, the AJB latency can be reduced to zero. If the jitter increases, the AJB latency increases as well until the fraction of late-arriving packets drops to a given level. The algorithm that inserts extra packets or removes packets from the playout queue uses techniques similar to the PLC algorithm to minimize the perceptual artifacts.
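One way such a dynamic algorithm could choose its buffer length is sketched below in Python. The class name, the sliding window size and the 5 percent late-packet target are all assumptions for illustration; the sketch only computes the target buffer delay and leaves out the insertion and removal of packets in the playout queue.

```python
from collections import deque


class AdaptiveJitterBuffer:
    """Pick the smallest play-out delay that keeps late packets under a target."""

    def __init__(self, window=100, max_late_fraction=0.05):
        self.delays = deque(maxlen=window)  # recent transit delays, in ms
        self.max_late_fraction = max_late_fraction

    def observe(self, send_time_ms, arrival_time_ms):
        """Record the transit delay of one received packet."""
        self.delays.append(arrival_time_ms - send_time_ms)

    def target_delay_ms(self):
        """Buffer delay, measured relative to the fastest recent packet."""
        if not self.delays:
            return 0
        ordered = sorted(self.delays)
        # Number of recent packets we are willing to let arrive too late
        allowed_late = int(len(ordered) * self.max_late_fraction)
        cutoff = ordered[len(ordered) - 1 - allowed_late]
        return cutoff - ordered[0]


ajb = AdaptiveJitterBuffer()
for i in range(100):
    delay = 90 if i % 10 == 0 else 50  # every tenth packet arrives 40 ms late
    ajb.observe(send_time_ms=i * 20, arrival_time_ms=i * 20 + delay)
print(ajb.target_delay_ms())
```

With perfectly regular arrivals every recorded delay is equal and the target falls to zero; as more packets straggle, the target grows until the late fraction is back under the chosen level.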

Latency is the delay caused by network transmission, codec algorithm delay and local packet processing. A latency of more than 100 ms is noticeable, and at 250 ms a conversation feels like the push-to-talk, half-duplex operation of an international call via satellite.
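A latency budget can be checked with simple arithmetic, as in the Python sketch below. The 100 ms and 250 ms thresholds come from the text; the function name, the category labels and the component values in the usage line are hypothetical and serve only to show the sum.

```python
NOTICEABLE_MS = 100  # above this, delay is noticeable
DIFFICULT_MS = 250   # at this point, conversation feels half-duplex


def assess_latency(network_ms, codec_ms, processing_ms):
    """Sum the three delay components and classify the total."""
    total = network_ms + codec_ms + processing_ms
    if total >= DIFFICULT_MS:
        verdict = "difficult"
    elif total > NOTICEABLE_MS:
        verdict = "noticeable"
    else:
        verdict = "acceptable"
    return total, verdict


# Illustrative numbers only, not measurements
print(assess_latency(network_ms=90, codec_ms=30, processing_ms=20))
```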

Network endpoints cannot provide much help for packet transmission and algorithm delays. However, they can address processing delays. Processing delay is the delay added by the computational processor and system I/O. To achieve toll-quality service, a specialized, embedded processor is required with a real-time operating system that can rapidly execute highly optimized algorithms.

The VoIP processor should ideally be an efficient, unified high-performance RISC processor with high-powered DSP capability optimized for voice-over-packet operations. Specific audio subsystem software is required to provide the sophisticated audio quality enhancement algorithms that run on this processor.




