
How VoIP works: protocols, codecs, and more

David Katz, Tomasz Lukasiak, Rick Gentile, and Wayne Meyer, Analog Devices

7/28/2006 12:00 PM EDT

The age of voice-over-Internet-protocol (VoIP) is here, bringing together telephony and data communications to provide packetized voice and fax data streamed over low-cost Internet links. The transition from circuit-switched to packet-switched networking, continuing right now at breakneck speed, is encouraging applications that go far beyond simple voice transmission, embracing other forms of data and allowing them to all travel over the same infrastructure.

What is VoIP?
Today's voice networks—such as the public switched telephone network (PSTN)—utilize digital switching technology to establish a dedicated link between the caller and the receiver. While this connection offers only limited bandwidth, it does provide an acceptable quality level without the burden of a complicated encoding algorithm.

The VoIP alternative uses Internet protocol (IP) to send digitized voice traffic over the Internet or private networks. An IP packet consists of a train of digits containing a control header and a data payload. The header provides network navigation information for the packet, and the payload contains the compressed voice data.

While circuit-switched telephony deals with the entire message, VoIP-based data transmission is packet-based: the digitized voice is compressed, packetized (separated into units for transmission), and sent across the network, then reassembled at the receiving end. The key point is that there is no need for a dedicated link between transmitter and receiver.

Packetization is a good match for transporting data (for example, a JPEG file or email) across a network, because the delivery falls into a non-time-critical "best-effort" category. For voice applications, however, "best-effort" is not adequate, because variable-length delays as the packets make their way across the network can degrade the quality of the decoded audio signal at the receiving end. For this reason, VoIP protocols, via quality-of-service (QoS) techniques, focus on managing network bandwidth to prevent delays from degrading voice quality.

Packetizing voice data involves adding header and trailer information to the data blocks. Packetization overhead (additional time and data introduced by this process) must be reduced to minimize added latencies (time delays through the system). The process must therefore balance minimizing transmission delay against using network bandwidth efficiently. Smaller packets take less time to fill and can be sent more often, while larger packets take longer to compose. On the other hand, larger packets amortize the header and trailer information across a bigger chunk of voice data, so they use network bandwidth more efficiently than smaller packets do.
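The trade-off can be put into rough numbers. The sketch below is illustrative only: the 64 kbit/s voice stream and the 40 bytes of per-packet header are assumed values, not figures from this article, and the function name is hypothetical.

```python
# Hypothetical figures for illustration; none come from the article.
CODEC_BITRATE_BPS = 64_000   # assumed 64 kbit/s voice stream
HEADER_BYTES = 40            # assumed per-packet header/trailer overhead


def packet_stats(frame_ms):
    """Return (payload_bytes, efficiency, wire_bitrate_bps) for one packet duration."""
    # bits per second * milliseconds / 8000 = bytes of voice per packet
    payload_bytes = CODEC_BITRATE_BPS * frame_ms // 8000
    total_bytes = payload_bytes + HEADER_BYTES
    efficiency = payload_bytes / total_bytes        # share of each packet that is voice
    packets_per_second = 1000 / frame_ms
    wire_bitrate_bps = total_bytes * 8 * packets_per_second
    return payload_bytes, efficiency, wire_bitrate_bps


for frame_ms in (10, 20, 40):
    payload, eff, rate = packet_stats(frame_ms)
    print(f"{frame_ms:>2} ms packets: {payload:>3} payload bytes, "
          f"{eff:.0%} efficient, {rate / 1000:.0f} kbit/s on the wire")
```

Under these assumptions, doubling the packet duration raises efficiency but also doubles the time spent waiting for the packet to fill, which is the latency cost described above.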

By their nature, packet networks cause the transit delay to vary quite a bit from one packet to the next. This variation, known as jitter, is removed by buffering the packets long enough to ensure that the slowest packets arrive in time to be decoded in the correct sequence. Naturally, a larger jitter buffer adds to overall system latency.
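A fixed-delay jitter buffer can be sketched in a few lines. This is a simplified model, not a production design: the 20 ms packet interval, the arrival times, and the function name are all hypothetical, and real buffers usually adapt their depth over time.

```python
FRAME_MS = 20  # assumed interval between packets; hypothetical value


def playout_schedule(arrivals, buffer_ms):
    """Split packets into those played on time and those that arrive too late.

    arrivals: list of (sequence_number, arrival_time_ms) in arrival order.
    Packet `seq` is due for playout at
    first_arrival + buffer_ms + (seq - first_seq) * FRAME_MS.
    """
    first_seq, first_arrival = arrivals[0]
    played, late = [], []
    for seq, arrival_ms in arrivals:
        deadline = first_arrival + buffer_ms + (seq - first_seq) * FRAME_MS
        if arrival_ms <= deadline:
            played.append(seq)
        else:
            late.append(seq)
    # Sorting restores the original order of packets that arrived out of sequence.
    return sorted(played), late


# Hypothetical arrivals: packet 3 overtakes packet 2, and packet 4 is badly delayed.
arrivals = [(0, 50), (1, 75), (3, 112), (2, 118), (4, 190)]
for buffer_ms in (10, 30, 60):
    played, late = playout_schedule(arrivals, buffer_ms)
    print(f"buffer {buffer_ms} ms -> played {played}, late {late}")
```

With these made-up arrivals, the deepest buffer plays every packet in order, while the shallowest one drops the slow packets; the price of the deeper buffer is the extra delay it adds to every packet.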

As mentioned above, latency represents the time delay through the IP system. One-way latency is the time from when a word is spoken to when the person on the other end of the call hears it. Round-trip latency is simply the sum of the two one-way latencies. The lower the latency, the more natural a conversation will sound. For the PSTN phone system in North America, the round-trip latency is less than 150 ms.

For VoIP systems, a one-way latency of up to 200 ms is considered acceptable. The largest contributors to latency in a VoIP system are the network and the gateways at either end of the call. The voice encoders and decoders (codecs) add some latency—but this is usually small by comparison (<20 ms).

When the delay is large in a voice network application, the main challenges are to cancel echoes and eliminate overlap. Echo cancellation directly affects perceived quality; it becomes important when the round-trip delay exceeds 50 ms. Voice overlap becomes a concern when the one-way latency is more than 200 ms.
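The thresholds quoted above lend themselves to a simple latency budget check. In the sketch below, the 50 ms and 200 ms limits come from the text; the individual delay contributions and the function name are hypothetical, and the round trip is assumed to be symmetric.

```python
ECHO_ROUND_TRIP_MS = 50     # from the text: echo cancellation matters above this
OVERLAP_ONE_WAY_MS = 200    # from the text: voice overlap is a concern above this


def assess_latency(components_ms):
    """Sum one-way delay contributions (in ms) and compare against the thresholds."""
    one_way = sum(components_ms.values())
    round_trip = 2 * one_way  # assumes both directions have the same delay
    return {
        "one_way_ms": one_way,
        "round_trip_ms": round_trip,
        "needs_echo_cancellation": round_trip > ECHO_ROUND_TRIP_MS,
        "overlap_risk": one_way > OVERLAP_ONE_WAY_MS,
    }


# Hypothetical budget; the individual figures are illustrative only.
budget = {
    "codec": 15,
    "packetization": 20,
    "jitter_buffer": 40,
    "network": 60,
    "gateways": 30,
}
print(assess_latency(budget))
```

For this made-up budget the one-way delay stays under the overlap limit, but the round trip is well past the point where echo cancellation becomes important.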

Because most of the time elapsed during a voice conversation is "dead time," during which no speaker is talking, codecs take advantage of this silence by not transmitting any data during these intervals. Such "silence compression" techniques detect voice activity and stop transmitting data when there is none; "comfort" noise is generated in its place to ensure that the line does not appear dead when no one is talking.
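A very simple form of voice activity detection compares the energy of each frame against a threshold. The sketch below assumes 20 ms frames at 8 kHz and a fixed threshold; these values and the function names are hypothetical, and real codecs use adaptive thresholds and more robust detectors.

```python
import math

FRAME_SAMPLES = 160        # assumed 20 ms frames at 8 kHz; hypothetical values
ENERGY_THRESHOLD = 1e-3    # assumed fixed mean-square threshold


def frame_energy(frame):
    """Mean-square energy of one frame of samples in the range -1.0..1.0."""
    return sum(s * s for s in frame) / len(frame)


def silence_compress(samples):
    """Return a list of ("voice", frame) or ("comfort_noise", energy) entries.

    Active frames are kept for transmission; silent frames are replaced by a
    small noise-level descriptor that the far end could use for comfort noise.
    """
    out = []
    for start in range(0, len(samples) - FRAME_SAMPLES + 1, FRAME_SAMPLES):
        frame = samples[start:start + FRAME_SAMPLES]
        energy = frame_energy(frame)
        if energy >= ENERGY_THRESHOLD:
            out.append(("voice", frame))
        else:
            out.append(("comfort_noise", energy))
    return out


# One frame of a 400 Hz tone followed by two frames of near-silence.
tone = [0.5 * math.sin(2 * math.pi * 400 * n / 8000) for n in range(160)]
quiet = [0.001 * math.sin(2 * math.pi * 400 * n / 8000) for n in range(320)]
print([kind for kind, _ in silence_compress(tone + quiet)])
```

Only the first frame would be transmitted as voice here; the two quiet frames collapse to a noise-level value, which is where the bandwidth saving comes from.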

In a standard PSTN telephone system, echoes that degrade perceived quality can happen for a variety of reasons. The two most common causes are impedance mismatches in the circuit-switched network ("line echo") and acoustic coupling between the microphone and speaker in a telephone ("acoustic echo"). Line echoes are common when there is a two-wire-to-four-wire conversion in the network (e.g., where analog signaling is converted into a T1 system).

Because VoIP systems can link to the PSTN, they must be able to deal with line echo, and IP phones can also fall victim to acoustic echo. Echo cancellers can be optimized to operate on line echo, acoustic echo, or both. The effectiveness of the cancellation depends directly on the quality of the algorithm used.

An important parameter for an echo canceller is the length of the packet on which it operates. Put simply, the echo canceller keeps a copy of the signal that was transmitted. For a given time after the signal is sent, it seeks to correlate and subtract the transmitted signal from the returning reflected signal—which is, of course, delayed and diminished in amplitude. To achieve effective cancellation, it usually suffices to use a standard correlation window size (e.g., 32 ms, 64 ms, or 128 ms), but larger sizes may be necessary.
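One common way to implement the "keep a copy and subtract" idea is an adaptive filter. The sketch below uses a normalized least-mean-squares (NLMS) filter, which is a technique chosen here for illustration and not necessarily the one the authors have in mind. The 8 kHz sample rate, the echo path, and all names are assumptions; the demo uses a short 32-tap filter only to keep the pure-Python run fast.

```python
import random


def window_taps(window_ms, sample_rate_hz=8000):
    """Filter length in samples for a given window (8 kHz is an assumed rate)."""
    return window_ms * sample_rate_hz // 1000


def nlms_echo_cancel(far_end, mic, taps, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the far-end echo from the mic signal.

    far_end: samples that were transmitted (the stored reference copy).
    mic:     returning samples containing a delayed, attenuated echo.
    taps:    filter length in samples, i.e. the window the canceller covers.
    Returns the residual signal after cancellation.
    """
    weights = [0.0] * taps
    history = [0.0] * taps                   # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history.insert(0, x)
        history.pop()
        estimate = sum(w * h for w, h in zip(weights, history))
        err = d - estimate                   # what remains after subtracting the echo
        norm = sum(h * h for h in history) + eps
        step = mu * err / norm               # normalized step size
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(err)
    return residual


# Hypothetical echo path: 10-sample delay, half amplitude, no near-end speech.
random.seed(0)
far = [random.gauss(0.0, 1.0) for _ in range(2000)]
delay, gain = 10, 0.5
mic = [0.0] * delay + [gain * s for s in far[:-delay]]
res = nlms_echo_cancel(far, mic, taps=32)

echo_power = sum(s * s for s in mic[-200:]) / 200
residual_power = sum(s * s for s in res[-200:]) / 200
print(f"echo power {echo_power:.4f}, residual power {residual_power:.2e}")
print("taps for a 32 ms window at 8 kHz:", window_taps(32))
```

The filter can only cancel echoes whose delay falls inside its window, which is why the window size matters: an echo arriving later than `taps` samples after transmission would pass through untouched.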




