News & Analysis

Hot Chips' burning issue: packets

Will Wade

8/18/2000 3:30 PM EDT

PALO ALTO, Calif. — Network processing technology took center stage this week (Aug. 13-15) at the annual Hot Chips technical conference, where nearly a third of the presentations focused on the mounting problem of moving more digital packets at higher speeds while attending to packet data's special processing needs.

Some presenters suggested using coprocessors alongside conventional network processors to turbocharge performance, and industry heavyweights detailed how they hope to eke more power out of existing architectures. Other ideas included several designs that tap very wide embedded memories to increase on-chip bandwidth, one chip optimized solely for voice traffic, and an approach that redefines and relabels the headers used to direct packets to their final destination.

Startup Entridia Corp. detailed a packet-forwarding engine that can speed data routing by redefining the way the data is labeled. While standard routers examine each packet's header to look for the destination address, Entridia's architecture calls for a chip that adds new, smaller tags to the headers, said Paramesh Gopi, the company's co-founder and vice president of technical marketing. That gives the systems less data to read and allows packets to be processed faster.
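The general idea of forwarding on a short tag rather than a full header can be sketched in a few lines of Python. This is only an illustration of tag-based forwarding in general: the article does not describe Entridia's actual tag format or hardware, and the `TagTable` class, its methods and the port names are hypothetical.

```python
class TagTable:
    """Toy tag table: the ingress side assigns a small integer tag per next hop,
    and downstream stages resolve the tag with a single array index."""

    def __init__(self):
        self._next_hops = []   # list index doubles as the tag value
        self._tag_of = {}      # next hop -> tag already assigned to it

    def assign(self, next_hop):
        # Called once at ingress, after a full header lookup has chosen next_hop.
        if next_hop not in self._tag_of:
            self._tag_of[next_hop] = len(self._next_hops)
            self._next_hops.append(next_hop)
        return self._tag_of[next_hop]

    def resolve(self, tag):
        # Called by later stages: no header parsing, just an index operation.
        return self._next_hops[tag]


table = TagTable()
packet = {"tag": table.assign("port-2"), "payload": b"..."}  # hypothetical packet
print(table.resolve(packet["tag"]))
```

The saving comes from doing the expensive header examination once and letting every later stage read only the few bits of the tag.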

"There is a big market for traditional network processor architectures in places where having sustained wire-speed rates is not critical," he said. "But this approach can work for customers that need wire-speed performance guaranteed."

Entridia already has products out that run at OC-3 (155-Mbit/second) and OC-12 (622-Mbit/s) rates, and its OC-48 (2.5-Gbit/s) entry will launch in November. But Gopi said those chips were mainly created as a foundation for the very high-speed products the company plans to launch over the next few years, including an OC-192 (10-Gbit/s) chip expected next spring and an OC-768 (40-Gbit/s) device that will follow by the end of next year. Some of the major networking system companies are already using Entridia's products, and Gopi said he expects to land several more key design wins later this year.
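For readers who want the arithmetic behind those line rates, the short Python sketch below derives each OC-n rate from the 51.84-Mbit/s OC-1 base rate and estimates how long one small packet occupies the wire. The 40-byte packet size is an assumption chosen for illustration, and the calculation ignores SONET framing overhead.

```python
OC1_MBPS = 51.84  # SONET OC-1 line rate in Mbit/s; OC-n is n times this


def oc_rate_mbps(n):
    """Line rate of an OC-n link in Mbit/s."""
    return n * OC1_MBPS


def packet_time_ns(n, packet_bytes=40):
    """Time one packet occupies an OC-n link, in nanoseconds.

    packet_bytes=40 is an illustrative assumption; framing overhead is ignored.
    """
    bits = packet_bytes * 8
    return bits / (oc_rate_mbps(n) * 1e6) * 1e9


for n in (3, 12, 48, 192, 768):
    print(f"OC-{n}: {oc_rate_mbps(n):.2f} Mbit/s, "
          f"{packet_time_ns(n):.1f} ns per 40-byte packet")
```

The per-packet time is, roughly, the budget a device has for each lookup if it is to keep up with back-to-back small packets at wire speed.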

Bob Merritt, director of emerging markets for market analysts Semico Research Corp., said Entridia's approach has some strengths but advised that the company pay attention to details beyond the purely technological. "This idea definitely has some routing advantages, but the disadvantage is that the communications industry has always been paranoid of using sole-source chips," he said. "Communications is heavily regulated by the government and is almost considered a national resource. They will be reluctant to support a technology if there's only one vendor providing the chips."

One key to Entridia's design is its memory architecture, which features an on-die cache using a 128-bit-wide bus. Several other network processors also have used wide memory buses to increase on-chip data transfers.

Mike O'Connor, director of advanced architecture for Silicon Access Networks, said that more than 70 percent of his company's iFlow address processor is embedded DRAM, which is used to speed the lookup phase (during which a routing engine compares the address in a packet to addresses stored in a lookup table to determine where to send the data).
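In software terms, the lookup phase is usually a longest-prefix match: of all the table entries that cover the destination address, the most specific one wins. The Python sketch below shows one common way to do that with a binary trie. It is a generic illustration for IPv4 only, not a description of how the iFlow performs lookups in embedded DRAM; the `PrefixTrie` class and the sample routes are hypothetical.

```python
import ipaddress


class PrefixTrie:
    """Minimal binary trie for IPv4 longest-prefix matching."""

    def __init__(self):
        # Nested dicts: integer keys 0/1 lead to child nodes,
        # the string key "hop" holds a next hop stored at that prefix.
        self.root = {}

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1   # walk from the most significant bit
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, address):
        bits = int(ipaddress.ip_address(address))
        node = self.root
        best = node.get("hop")             # default route, if one was inserted
        for i in range(32):
            node = node.get((bits >> (31 - i)) & 1)
            if node is None:
                break
            best = node.get("hop", best)   # remember the deepest match so far
        return best


trie = PrefixTrie()
trie.insert("0.0.0.0/0", "default")
trie.insert("10.0.0.0/8", "A")
trie.insert("10.1.0.0/16", "B")
print(trie.lookup("10.1.2.3"))
```

Each lookup can take up to 32 dependent memory reads in this naive form, which helps explain why designers want the table in fast, wide on-chip memory.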

Indeed, the key to Silicon Access Networks' approach is not just the on-die cache but the use of memory blocks with exceptionally wide custom DRAM, which allows for faster communication within the chip, O'Connor said. Besides 1.2 Mbits of SRAM, with a width exceeding 2,000 bits, the iFlow packs 52 Mbits of DRAM, in three different configurations. Two Mbits feature blocks 256 bits wide running at 133 MHz; 25 Mbits run at the same speed but are 100 bits wide; and the final 25 Mbits are slower, running at 66 MHz, but are 3,200 bits wide. O'Connor said that the total aggregate bandwidth for the memory is 252 Gbits/s.
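A back-of-the-envelope check of those figures is easy to run. The Python sketch below multiplies each DRAM block's width by its clock rate, assuming one full-width transfer per clock cycle, which is an assumption and not something O'Connor stated. The total for the three DRAM configurations alone comes out roughly in line with, though not identical to, the quoted 252-Gbit/s aggregate; the article does not say how that figure was derived.

```python
# (capacity in Mbits, width in bits, clock in MHz), as quoted in the article
DRAM_BLOCKS = [
    (2, 256, 133),
    (25, 100, 133),
    (25, 3200, 66),
]


def peak_gbits_per_s(width_bits, clock_mhz):
    """Peak bandwidth in Gbit/s, assuming one full-width transfer per cycle."""
    return width_bits * clock_mhz * 1e6 / 1e9


total_capacity = sum(capacity for capacity, _, _ in DRAM_BLOCKS)
total_bandwidth = sum(peak_gbits_per_s(w, f) for _, w, f in DRAM_BLOCKS)

for capacity, width, clock in DRAM_BLOCKS:
    print(f"{capacity:>2} Mbits, {width:>4} bits @ {clock} MHz: "
          f"{peak_gbits_per_s(width, clock):.1f} Gbit/s")
print(f"total: {total_capacity} Mbits, {total_bandwidth:.1f} Gbit/s peak")
```

The calculation makes one point clear: the slowest block contributes most of the bandwidth, because its width more than compensates for its lower clock.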

The chip can support systems running at OC-192 rates, and up to four of the devices can be implemented in a cascade structure. The iFlow will be fabricated in a 0.18-micron process at Taiwan Semiconductor Manufacturing Co. Ltd. and is slated to be available next year.

A number of other network processor designers, including EZchip and PixelFusion Ltd., are turning to embedded DRAM to achieve OC-192 speeds.

Peter Glaskowsky, senior analyst for multimedia at MicroDesign Resources, said embedded memory has some inherent advantages because on-die buses can be created that are much wider, and therefore much faster, than the buses that link separate chips. Using custom DRAM can also allow designers to create bus widths that closely match the size of packets, allowing the systems to process a packet with every clock cycle.
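Glaskowsky's point about matching bus width to packet size comes down to simple division, sketched here in Python. The packet size and bus widths are illustrative assumptions, not figures from any of the chips discussed.

```python
def cycles_per_packet(packet_bytes, bus_width_bits):
    """Clock cycles needed to move one packet across a bus of the given width."""
    packet_bits = packet_bytes * 8
    return -(-packet_bits // bus_width_bits)   # integer ceiling division


PACKET_BYTES = 40  # hypothetical small packet
for width in (32, 128, 512):
    print(f"{width:>3}-bit bus: {cycles_per_packet(PACKET_BYTES, width)} cycle(s)")
```

Once the bus is at least as wide as the packet, the whole packet moves in one cycle, and widening it further buys nothing for that packet size.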

No-go for graphics
Glaskowsky earlier saw embedded memories come into vogue in the graphics market, where the same high data transfer rates were valued. But the graphics market also demanded that the chips process and manipulate the data — and it proved nearly impossible to integrate enough memory on-chip to store all the data. In the networking segment, where the packets flow off-chip as fast as they come on and where little processing is required, embedded memory is likely to become a more common technology, he predicted.

"A lot of network processor designs are moving in that direction," Glaskowsky said. "In the graphics space it isn't very common anymore, but in the networking segment it still makes sense."

The iFlow is designed not as a standalone network processor but as a coprocessor dedicated to address lookups and to supporting a primary processor. O'Connor said use of the iFlow can free the main processor to perform up to an additional 1 billion operations per second.

SwitchOn Networks Inc., like many other companies, is focused on Layer 7 classification, a step in which packets are examined down to such details as whether the packet is part of a music file or e-mail message. Many companies believe a separate coprocessor is required to perform such work at high speeds.

SwitchOn detailed a coprocessor chip that focuses on the classification stage, wherein packets are examined to determine whether they qualify for any special treatment, such as voice and video streams that demand higher quality of service than simple e-mail messages. According to chief executive officer Ajit Shelat, the ClassiPI chip features a very wide memory bus and has some on-die SRAM cache but is mainly linked to off-chip memory. Its primary task is to classify packets: It not only reads their headers to determine where to route them but also determines whether they are standard data packets or require special treatment, such as the QoS classification demanded by voice and video traffic.
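A software analogue of that classification step is a rule table checked against each packet's header fields. The Python sketch below is a minimal first-match classifier and says nothing about how SwitchOn's chip is built; the `Packet` fields, the rule list and the port ranges are hypothetical values chosen for illustration. True Layer 7 classification would also inspect the payload, which this sketch does not do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    protocol: str   # "tcp" or "udp"
    dst_port: int


# Hypothetical rules: (protocol, low port, high port, traffic class).
# Rules are checked in order and the first match wins.
RULES = [
    ("udp", 16384, 32767, "voice"),   # illustrative port range for voice streams
    ("tcp", 554, 554, "video"),       # RTSP control port
    ("tcp", 25, 25, "email"),         # SMTP
]


def classify(pkt, rules=RULES, default="best-effort"):
    """Return the traffic class of the first rule matching the packet header."""
    for protocol, low, high, traffic_class in rules:
        if pkt.protocol == protocol and low <= pkt.dst_port <= high:
            return traffic_class
    return default


print(classify(Packet("udp", 20000)))
print(classify(Packet("tcp", 80)))
```

A linear scan like this slows down as the rule list grows, which is one reason to hand the job to a dedicated device with wide memory.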

Shelat said the design delivers as much as 58 terabits/s of aggregate memory bandwidth and can handle OC-192 network speeds. Up to four of the chips can be combined to serve networks running at OC-768. "That's the kind of performance that will be required for the next generation of switches that are coming to the market," he said.

Analyst Merritt said coprocessor designs will become more common as Internet traffic continues to swell. They let the main processor shift some of its workload to a specialized device, which allows the overall system to work more efficiently. "I think coprocessors are going to be the next wave in the network processing world," he predicted.

Voice traffic is one application that is particularly suited to a coprocessor approach. Jayan Ramankutty, founder and board member of EmpowerTel Networks, gave a presentation detailing his company's design: a special processor aimed at the burgeoning voice-over-Internet Protocol (VoIP) market that uses embedded memory to decrease latency as the chip routes huge numbers of packets at high speed. Latency in routing packetized voice traffic through digital networks can lead to gaps between packets, and voice users will not tolerate the resultant breaks in the flow of conversation.

"Lookup is a very, very real issue," Ramankutty said. "This is where the bottleneck can occur. The key to solving this is low latency."

EmpowerTel's Media Express Processor (MxP) uses four 32-MIPS processor cores and has at its heart an embedded content-addressable memory to limit latency, according to Ramankutty. With a 128-bit FIFO buffer to store search results, the chip will route the first packet to its destination, and subsequent packets will arrive with no delays because the MxP can process them in just a single clock cycle.
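The behavior Ramankutty describes, a full lookup for the first packet and a near-instant result for the ones that follow, resembles a flow cache. In the Python sketch below a dictionary stands in for the content-addressable memory. The article gives no further detail on the MxP's internals, so the `FlowCache` class, the flow key and the `slow_lookup` function are all hypothetical.

```python
class FlowCache:
    """Toy flow cache: a dict plays the role of a content-addressable memory."""

    def __init__(self, slow_lookup):
        self._slow_lookup = slow_lookup   # full routing lookup, used on a miss
        self._cam = {}                    # flow key -> cached destination
        self.misses = 0

    def route(self, flow_key):
        if flow_key not in self._cam:
            # First packet of a flow: pay for the full lookup once.
            self.misses += 1
            self._cam[flow_key] = self._slow_lookup(flow_key)
        # Later packets of the same flow are answered from the cache.
        return self._cam[flow_key]


def slow_lookup(flow_key):
    # Hypothetical stand-in for an expensive routing-table search.
    src, dst = flow_key
    return f"port-for-{dst}"


cache = FlowCache(slow_lookup)
for _ in range(3):
    cache.route(("10.0.0.1", "10.0.0.2"))
print(cache.misses)
```

For voice traffic, where a call produces a long stream of packets between the same two endpoints, nearly every packet after the first would be served from the cache.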

Other companies detailing conventional network processors included C-Port, which is now part of Motorola; Vitesse Semiconductor Corp.; and IBM Corp. Many of the current high-powered network processor designs are based on parallel computing: C-Port's chip, for example, features 16 processors working together, and the Vitesse design has four CPUs.

Merritt noted that as RISC engines are strung together, the overhead to keep them running in sync increases faster than the performance gains obtained from having additional CPUs. The concept thus may not be the best way to move into the generation of very high-bandwidth networking systems.

Meanwhile, it remains to be seen which of the current crop of coprocessors will prove most effective.

"These are the devices that are going to make the Internet go in the future, and designers today have to solve problems that haven't even been identified," Merritt said.




