Architectural diversity marks net processor intros
Peter Clarke and Craig Matsumoto
6/14/2001 3:31 PM EDT
SAN MATEO, Calif. - Chip makers ladled out a mulligan stew of network processor architectures at the Embedded Processor Forum in San Jose, Calif., this week, proffering such diverse approaches to packet forwarding that it was hard to pick the winners from the sinners.
Some vendors served up what they called "packet processors" at the OC-768 (40-Gbit/second) speed grade, while others claimed greater flexibility and broader applications potential for devices tailored for lower speeds. And between the extremes of fast biprocessors and 16-way multiprocessors there arose hybrid architectures that contained some combination of multiprocessing or very long instruction word (VLIW) processing with single-instruction, multiple-data (SIMD) parallelism.
The bounty of options detailed at the forum and its new adjunct, the Network Processor Forum, never jelled into a vision of a single way forward for network processors.
"It's clear there are too many companies in this space," said Linley Gwennap, president and principal analyst with The Linley Group (Mountain View, Calif.). "Who can deliver their chip, get to market on schedule [is key]. There's just not enough design wins to go around, and it's not necessarily the best technology that will win."
Even as the new network processor designs begin to roll, the earlier wave of NPUs from the likes of C-Port Corp. (now owned by Motorola Inc.) and Sitera (now owned by Vitesse Semiconductor Corp.) is just beginning to hit volume production. "The architectures haven't stabilized yet," said Bob Merritt, an analyst with Semico Research Corp. (Phoenix). "There's just not enough data back from this first wave of processors" to know what works and what doesn't.
Moreover, no one architecture is likely to excel at all of the tasks required of the NPU, in Gwennap's view. "There's no benchmark here," he said. "There's megapackets per second, which usually means something like IPv4 layer 3 packet forwarding, but that's just one task. Multiprotocol handling, translation, encapsulation: that's what's really being done" in a network.
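To put the megapackets-per-second figure in context, the rough arithmetic below converts a line rate into a worst-case packet rate and a per-packet time budget. It is a back-of-the-envelope sketch, not a benchmark: the 40-byte minimum packet size and the helper name `packet_budget` are assumptions made for illustration, framing overhead is ignored, and the line rates are the nominal figures quoted in this article.

```python
# Assumption: a minimum-size IPv4 packet of 40 bytes
# (20-byte IP header + 20-byte TCP header), with no framing overhead.
MIN_PACKET_BYTES = 40


def packet_budget(line_rate_bps, packet_bytes=MIN_PACKET_BYTES):
    """Return (packets per second, nanoseconds available per packet)."""
    pps = line_rate_bps / (packet_bytes * 8)
    ns_per_packet = 1e9 / pps
    return pps, ns_per_packet


# Nominal rates as quoted in the article: OC-48 = 2.5 Gbit/s, OC-768 = 40 Gbit/s.
for name, rate in [("OC-48", 2.5e9), ("OC-768", 40e9)]:
    pps, ns = packet_budget(rate)
    print(f"{name}: {pps / 1e6:.1f} Mpackets/s, {ns:.0f} ns per packet")
```

Under these assumptions, every forwarding, translation and encapsulation step for a minimum-size packet has to fit in roughly 128 ns at OC-48 and roughly 8 ns at OC-768, which is why a single forwarding figure says little about the harder tasks Gwennap lists.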
The lack of clear benchmarks makes it difficult to gauge which architectures are best suited for which tasks. The result is a panorama of approaches, with no obvious way to separate the leaders from the laggards.
"You can't look at a block diagram [of a chip design] and say this or that architecture will or won't work," Gwennap said. "It's somewhere down in the details that an architecture or implementation is bottlenecked."
Many vs. few
One obvious distinction among the welter of architectures described this week lies in the number of processor cores used. In most cases, companies are combining lots of small cores, as in the 16-processor design from Lexra Inc. (San Jose). Such designs typically are targeted at the data plane, the portion of the networking architecture that examines every packet for forwarding and classification information. Speed, or at least throughput, is of the essence in the data plane, while programming complexity may be trivial, Gwennap said.
Multiple cores make it possible to dice the packet-forwarding problem into pieces that can be handled in parallel. Lexra and Clearwater Networks Inc. (Los Gatos, Calif.) presented different multithreaded schemes at the forum, while Xelerated Packet Devices AB (Stockholm, Sweden) showed off an approach in which individual tasks are assigned to particular processors.
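The two ways of dividing the work described above can be sketched in a few lines. This is an illustration only, not any vendor's design: the stage functions, packet fields and the names `run_to_completion` and `pipeline` are hypothetical, and the "cores" are simulated with plain lists rather than real threads. The first function spreads whole packets across cores by flow, with each core doing every step; the second binds each task to its own processor and passes packets down the line.

```python
from zlib import crc32

NUM_CORES = 16  # echoes the 16-processor designs mentioned; otherwise arbitrary


# Hypothetical placeholder tasks; real data-plane code does far more.
def parse(pkt):
    return {**pkt, "parsed": True}


def classify(pkt):
    return {**pkt, "klass": "voice" if pkt["dport"] == 5060 else "bulk"}


def forward(pkt):
    return {**pkt, "out_port": pkt["dst"] % 4}


STAGES = [parse, classify, forward]


def run_to_completion(packets, num_cores=NUM_CORES):
    """Parallel scheme: each core runs every stage on the packets it is given.

    Packets are assigned by a hash of the flow, so one flow stays on one core.
    """
    queues = [[] for _ in range(num_cores)]
    for pkt in packets:
        key = f'{pkt["src"]}-{pkt["dst"]}-{pkt["dport"]}'.encode()
        core = crc32(key) % num_cores
        done = pkt
        for stage in STAGES:
            done = stage(done)
        queues[core].append(done)
    return queues


def pipeline(packets):
    """Task-per-processor scheme: each stage handles every packet in turn."""
    batch = list(packets)
    for stage in STAGES:  # one stage corresponds to one dedicated processor
        batch = [stage(p) for p in batch]
    return batch


packets = [
    {"id": i, "src": i % 5, "dst": 100 + i % 7, "dport": 5060 if i % 3 == 0 else 80}
    for i in range(32)
]
per_core = run_to_completion(packets)
in_order = pipeline(packets)
```

Both schemes produce the same processed packets; what differs is where the work lands. The first scales by adding identical cores, while the second depends on no single stage becoming the slow link.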
An alternative scheme is to use very few cores running at extremely high speeds, as in the dual, 1-GHz processor presented by PMC-Sierra Inc. (Burnaby, British Columbia). Architectures like this one are best suited for the control plane, which handles exceptions discovered by the data plane. Here, clock speed is less important; instead, the control plane "does require a more broad approach" that can be suited to a MIPS uniprocessor, Gwennap said.
MIPS compatibility is important in the control plane for one particular reason: Networking-equipment giant "Cisco has millions of lines of MIPS code," Gwennap said. "In the data plane there is a lot less legacy code. It's an easier market for companies to enter, and so there may be many new architectures" for that realm.
Among startups, architectural ideas went beyond the combining of familiar cores. Clearspeed Technology Ltd. (Bristol, England) previewed a chip based on four processors, each containing 64 SIMD elements.
Another unusual architecture was presented by Cognigine Corp., which described its Variable Instruction Set Communications concept as a hybrid of VLIW and SIMD. Cognigine's chip configures its processing elements on the fly, depending on the instructions they have to execute, never fully hardwiring itself.
"It gives them a different way of scaling than we're used to, whether it's programmable silicon or an ASIC," analyst Merritt said.
The problem with most architectures is that Moore's Law actually restricts their progress, said Merritt. Whether hardwired or software-programmable, once an architecture is etched into silicon, its speed improvements are bound by Moore's Law, which describes a slower rate of growth than that of network bandwidth.
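The compounding behind that argument can be worked through with a short calculation. The doubling periods below are assumptions chosen purely for illustration (the article gives no figures): 18 months for silicon, a common reading of Moore's Law, and 12 months for network bandwidth. The function names are likewise hypothetical.

```python
CHIP_DOUBLING_YEARS = 1.5       # assumption: 18-month reading of Moore's Law
BANDWIDTH_DOUBLING_YEARS = 1.0  # assumption: illustrative only, not from the article


def growth(years, doubling_period_years):
    """Growth factor after `years` for something that doubles every period."""
    return 2 ** (years / doubling_period_years)


def gap(years):
    """How far bandwidth growth outpaces silicon growth after `years`."""
    return growth(years, BANDWIDTH_DOUBLING_YEARS) / growth(years, CHIP_DOUBLING_YEARS)


for years in (3, 6):
    print(
        f"after {years} years: silicon x{growth(years, CHIP_DOUBLING_YEARS):.0f}, "
        f"bandwidth x{growth(years, BANDWIDTH_DOUBLING_YEARS):.0f}, "
        f"gap x{gap(years):.0f}"
    )
```

With these assumed rates the shortfall itself doubles every three years, so an architecture that relies on process improvements alone falls steadily further behind the line rates it must serve.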
Over and above the choice of how many processors to use and what size they should be (full-featured 64-bit, or standard or extended 32-bit) is the difficult issue of interconnection within the chip.
"A nonblocking crossbar [switch] is the way to go," Gwennap said. "One of the problems with C-Port was that [its chip] has two levels of interconnect, a local cluster and a chip-wide [interconnect]. In such a system, getting the balance right is important. The long-distance connection was a bottleneck and prevented meeting OC-48 [2.5-Gbit/s] data rates."
More generally, on-chip bandwidth was a concern for many of the architectures presented at the Embedded Processor Forum. Clearwater and Brecis Communications highlighted specialty on-chip buses, while PMC-Sierra described plans to use the HyperTransport bus from Advanced Micro Devices Inc.
Memory is also an obstacle, since access times can become a bottleneck at high data rates. Clearwater officials said they took particular pains to improve memory throughput, or at least to hide memory latency, in their architecture.
Software is key
While some multiprocessor vendors seem to believe that hand optimization of code is commonplace, Linley Group analyst Bob Wheeler was adamant that in order to gain market acceptance, multiprocessor designs must be as programmable as single processors, and they must be accompanied by a C compiler.
"The single-image programming model is a lot simpler," Wheeler said. "One way to do that is to provide prebuilt code. The trouble is, although it may work, it often isn't as good as will eventually be needed. So having a compiler is a big advantage."
By using a C compiler, companies can mask the complexity of their chips. Cognigine's device, for example, can use C programs to hide the variable instruction set from the user.
C compilers could become ubiquitous in network processing, Gwennap believes. "Almost every new [NPU] vendor we talk to is working on one," he said. "The customer may not use it for everything. Typically they would use it for early code, which you can then profile to find the hot spots to tune."



