
Packet classes are optimized for use

Andy Gottlieb, Vice President of Marketing, AMCC Switching and Network Processing, San Diego, Calif.

9/25/2001 12:14 PM EDT

Before deciding where to handle packet classification, designers first must determine what type of classification is required. With the convergence of heterogeneous traffic over shared networks and the simultaneous proliferation of bandwidth-hungry and/or latency-sensitive applications, the need for flexible, deep packet classification is becoming increasingly important.

In addition, emerging content-aware applications such as content switching, load balancing and stateful firewalls all require on-the-fly deep-packet look-up functions. At the same time, the escalation of optical interface bandwidth means wire-speed classification tasks must occur much more quickly.

Here, we review the application-level factors that are driving the need for higher-performance, more adaptable packet classification, and we explore the implementation alternatives.

When it comes to implementing network search and packet classification, system designers face a three-dimensional squeeze play. On the first axis, the breadth of requirements for deep-packet classification is continually expanding to accommodate applications such as policy-based quality of service (QoS), Voice-over-Internet Protocol (VoIP), virtual private networks (VPNs), multiprotocol label switching (MPLS), streaming data, and a variety of policy-based security applications and firewalls. It is clear that wire-speed packet classification based on comprehensive QoS criteria will play a vital role in the success of heterogeneous multiservice network environments that combine data with latency-sensitive applications such as voice.

In addition, bandwidth management applications such as load balancing, accounting and billing, along with policy-based network security applications such as access control and prevention of distributed denial of service (DDoS) attacks, are being designed into state-of-the-art systems, thereby requiring additional classification processing bandwidth.

All of these QoS and policy-based management applications are pushing networking infrastructure equipment to deliver much higher capacity for more searches per packet.

On the second axis, the sheer amount of information that must be efficiently stored and retrieved in network databases is rising exponentially. Only a few years ago, typical router databases included only 1,000 to 10,000 entries. However, with tables now consisting of forwarding addresses, access control lists, Layer 2 MAC address lists, QoS parameters, etc., overall database size and complexity are increasing dramatically. Next-generation networking equipment will require database capacities of 250,000 entries to as high as 4 million entries, representing an increase of as much as 100 times over today’s requirements.

Finally, on the third axis, the escalation of network speeds to multi-gigabit levels (e.g., OC-48, OC-192 and OC-768) is dramatically compressing the time available for accomplishing packet classification at sustained wire-speed levels. As these higher speeds migrate outward from the core toward the network edge, the next few years will likely see a 20- to 50-fold increase in packets per second flowing through the average router.
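
To put rough numbers on that time compression, here is a back-of-the-envelope sketch in Python. It assumes nominal SONET line rates (OC-n is n times 51.84 Mbit/s), a worst case of back-to-back 40-byte packets, and no framing overhead. The 40-byte packet size and the function names are assumptions made for illustration, not figures from this article.

```python
# Nominal SONET line rates in bits per second: OC-n = n * 51.84 Mbit/s.
OC1_BPS = 51.84e6
LINE_RATES = {f"OC-{n}": n * OC1_BPS for n in (48, 192, 768)}

# Assumption: worst case is a stream of minimum-size 40-byte packets,
# with SONET/PPP framing overhead ignored.
MIN_PACKET_BYTES = 40


def packets_per_second(line_rate_bps, packet_bytes=MIN_PACKET_BYTES):
    """Packets arriving per second at sustained wire speed."""
    return line_rate_bps / (packet_bytes * 8)


def packet_budget_ns(line_rate_bps, packet_bytes=MIN_PACKET_BYTES):
    """Time available to classify one packet, in nanoseconds."""
    return 1e9 / packets_per_second(line_rate_bps, packet_bytes)


for name, rate in LINE_RATES.items():
    mpps = packets_per_second(rate) / 1e6
    print(f"{name}: {mpps:.1f} Mpackets/s, {packet_budget_ns(rate):.1f} ns per packet")
```

Under these assumptions the per-packet budget shrinks from roughly 129 ns at OC-48 to about 32 ns at OC-192 and about 8 ns at OC-768, and every search needed for a packet has to fit inside that window.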

The alternatives for implementing packet classification can be generally grouped into three major approaches:

  • A pure software approach, running directly on the network processor hardware;
  • Use of an on-chip hardware-optimized coprocessor core for targeted classification functions;
  • Use of off-chip hardware coprocessors that can provide scalability for handling deeper and more complex classification requirements.

It has already become quite clear that using a pure software model on a general-purpose processor is fundamentally inadequate for today’s high-speed deep search and classification functions. Even optimized network processor unit (NPU) architectures have turned to on-chip hardware coprocessors rather than trying to accomplish all classification tasks in software on the core processor. Recently, with the dramatic expansion of networked databases and increased wire-speed requirements, the industry has further responded with specialized off-chip devices that offer complete flexibility to handle part or all of the search/classification functions.

While at first these three approaches may appear to be distinctly separate methodologies, in pragmatic, real-world system designs the best answer often is a blended implementation that leverages both on-chip and off-chip resources to address various aspects of the packet classification challenge. Such a comprehensive blended approach allows for an efficient cascading of processing functions, in which the initial classifications are performed on-chip and then subsequent deeper classification tasks are handed off to specialized off-chip processing resources.

However, to make the blended approach practical, designers need access to a uniform application programming interface (API) and a modular programming model that allows for efficient distribution of classification/search functions and that essentially makes the underlying hardware implementation transparent to the software. In effect, the entire processing pipeline, from the NPU core through the on-chip coprocessor and any off-chip coprocessors, must function as a seamless logical whole. For example, a modular programming model can deliver the flexibility on the front-end to handle a wide range of packet types as well as the power on the back-end to rapidly perform deep searches of large databases.

A modularized programming model also promotes smooth extensibility of system architectures because it allows for relatively straightforward software upgrades. For instance, an initial system design could focus on delivering IPv4 and ATM packet classification, after which IPv6 and/or MPLS could be added simply by reprogramming the on-chip policy engine to recognize the new packet types and to parse them out to off-chip processors.

As the first step in the classification process, the on-chip coprocessor must handle the full bandwidth of incoming packets, while making a relatively limited set of decisions to determine the packet type. For today’s requirements this means that an on-chip processor must be able to carry out as many as 100 million searches per second (MSPS). In addition, the interface between the NPU core and the on-chip coprocessor must offer optimal efficiency with a minimal number of clock cycles.

After accomplishing the initial type classification in a single-cycle look-up process on-chip, the coprocessor can simply point to an application-specific subroutine (e.g. IPv4, IPv6, MPLS, IP multicast, ACL, etc.) to initiate the subsequent off-chip processing.
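
The two-stage hand-off described above can be pictured as a small exact-match look-up followed by a dispatch to a per-type routine. The Python sketch below is illustrative only. The `PacketType` enum, the handler functions, `register` and the choice of the Ethernet EtherType field as the first-stage key are all assumptions made for the example, not the actual programming model of the on-chip policy engine.

```python
from enum import Enum


class PacketType(Enum):
    IPV4 = "ipv4"
    IPV6 = "ipv6"
    MPLS = "mpls"
    UNKNOWN = "unknown"


# Stage 1 (the on-chip step in the article's model): one exact-match
# look-up on a small key. The initial design knows only IPv4.
ETHERTYPE_TABLE = {0x0800: PacketType.IPV4}


# Stage 2 (the off-chip step in the article's model): hypothetical
# routines standing in for deeper, type-specific searches.
def deep_search_ipv4(payload):
    return f"ipv4: deep search over {len(payload)} bytes"


def deep_search_ipv6(payload):
    return f"ipv6: deep search over {len(payload)} bytes"


HANDLERS = {PacketType.IPV4: deep_search_ipv4}


def register(ethertype, ptype, handler):
    """Teach the classifier a new packet type by updating both tables."""
    ETHERTYPE_TABLE[ethertype] = ptype
    HANDLERS[ptype] = handler


def classify(ethertype, payload):
    """Determine the packet type, then hand off to its routine."""
    ptype = ETHERTYPE_TABLE.get(ethertype, PacketType.UNKNOWN)
    handler = HANDLERS.get(ptype)
    if handler is None:
        return f"{ptype.value}: no handler installed"
    return handler(payload)
```

In this sketch, calling `register(0x86DD, PacketType.IPV6, deep_search_ipv6)` adds IPv6 support without touching `classify`, which mirrors the idea that new packet types are a reprogramming step and not a redesign.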

The use of external off-chip coprocessors then provides maximum flexibility and scalability for handling the large databases and deeper look-ups required in many new applications. For instance, the need to support 256k IP header look-ups or store as many as 1 million MPLS labels is not something that can be practically accomplished within an embedded on-chip coprocessor. Going off-chip also allows the coprocessor devices to handle a full range of application-specific search/classification requirements, including longest-prefix matches, wild-card look-ups, range matching, etc.
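
For readers less familiar with the three search types just named, the following Python sketch shows what each one computes. It uses linear scans over tiny tables purely for clarity. A hardware search engine would be expected to answer the same questions without iterating entry by entry, and the function names and table contents here are invented for the example.

```python
import ipaddress


def longest_prefix_match(addr, routes):
    """Return the value stored under the most specific prefix containing addr."""
    ip = ipaddress.ip_address(addr)
    best = None  # (prefix length, value) of the best match so far
    for prefix, value in routes.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, value)
    return best[1] if best else None


def wildcard_match(key, rules):
    """Return the action of the first (value, mask, action) rule that matches.

    Bits that are 0 in the mask are wild cards and are not compared.
    """
    for value, mask, action in rules:
        if (key & mask) == (value & mask):
            return action
    return None


def range_match(number, ranges):
    """Return the action of the first (low, high, action) range containing number."""
    for low, high, action in ranges:
        if low <= number <= high:
            return action
    return None


# Hypothetical example tables.
ROUTES = {"0.0.0.0/0": "default", "10.0.0.0/8": "port1", "10.1.0.0/16": "port2"}
RULES = [(0xA0, 0xF0, "permit"), (0x00, 0x00, "deny")]
PORT_RANGES = [(0, 1023, "well-known"), (1024, 65535, "other")]
```

With these tables, `longest_prefix_match("10.1.2.3", ROUTES)` selects the /16 entry over the /8 and the default route, `wildcard_match(0xA6, RULES)` matches the first rule because only the upper four bits are compared, and `range_match(80, PORT_RANGES)` falls in the first range.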

Figure 2 shows the key relationships and data flow between an NPU (nP7250) with on-chip policy engine coprocessor and an external search/classification coprocessor (XSM).

By standardizing the NPU’s search-coprocessor interface, overall design flexibility is further enhanced because the NPU doesn’t need to know or care about where and how the search is being conducted. It simply sends out the request and receives the needed response, regardless of whether the search is handled by the on-chip coprocessor or by off-chip devices.
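
In software terms, this location transparency resembles programming against a single interface with interchangeable back ends. The Python sketch below illustrates the idea only. `SearchEngine`, `TableEngine`, `SearchRouter` and the table names are hypothetical, not part of any AMCC API.

```python
from abc import ABC, abstractmethod


class SearchEngine(ABC):
    """The one interface the requesting code is written against."""

    @abstractmethod
    def search(self, table, key):
        """Return the entry for key in the named table, or None."""


class TableEngine(SearchEngine):
    """A search resource holding some tables (stands in for on-chip or off-chip)."""

    def __init__(self, name, tables):
        self.name = name
        self.requests = 0  # how many searches this engine has served
        self._tables = tables

    def search(self, table, key):
        self.requests += 1
        return self._tables[table].get(key)


class SearchRouter(SearchEngine):
    """Forwards each request to whichever engine holds the table."""

    def __init__(self, placement):
        self._placement = placement  # table name -> engine

    def search(self, table, key):
        return self._placement[table].search(table, key)


# Hypothetical placement: a small type table on-chip, a large label table off-chip.
on_chip = TableEngine("on-chip", {"pkt_type": {0x0800: "ipv4"}})
off_chip = TableEngine("off-chip", {"mpls_labels": {1001: "lsp-a"}})
engine = SearchRouter({"pkt_type": on_chip, "mpls_labels": off_chip})
```

Code that calls `engine.search("mpls_labels", 1001)` looks the same whether the table sits on-chip or off-chip, so moving a table between resources changes only the placement map.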

Efficient standards-based interfaces are required to support both data-plane and control-plane integration of multiple devices and to allow database search capacity to scale independently of other processing tasks. In essence, the designer is free to scale either side of the equation by combining a single NPU with multiple cascading search engines or using many NPUs to access a single search engine, which automatically arbitrates between the different search tasks.

For example, standardization of the search-coprocessor interface allows for straightforward implementation of scalable external coprocessor databases, which can be seamlessly accessed by multiple NPUs for optimal efficiency. By enabling multiple NPU pipelines on the same line card to share common search tables, such a design can maximize port densities and processing performance while minimizing space consumption and power dissipation. The sharing of an external search coprocessor with a standardized interface eliminates the need for multiple memory banks and the cost and complexity of FPGA-based glue logic, and greatly simplifies the CPU’s task of managing and updating the search tables.
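
The sharing arrangement can be sketched as several requesters queuing searches against one table, with the engine deciding whose request to serve next. The article does not say which arbitration policy is used, so the round-robin scheme below is an assumption chosen for simplicity. `SharedSearchEngine` and the NPU identifiers are likewise hypothetical.

```python
from collections import deque


class SharedSearchEngine:
    """One search table shared by several NPUs, served round-robin (assumed policy)."""

    def __init__(self, table, npu_ids):
        self._table = table
        # One request queue per NPU, visited in the order given.
        self._queues = {npu: deque() for npu in npu_ids}

    def submit(self, npu, key):
        """Queue a search request on behalf of one NPU."""
        self._queues[npu].append(key)

    def run(self):
        """Serve at most one request per NPU per pass until all queues are empty.

        Returns a list of (npu, key, result) tuples in service order.
        """
        served = []
        while any(self._queues.values()):
            for npu, queue in self._queues.items():
                if queue:
                    key = queue.popleft()
                    served.append((npu, key, self._table.get(key)))
        return served
```

Because each pass takes at most one request from each queue, a busy NPU in this sketch cannot starve the others, and the table itself exists only once no matter how many NPUs use it.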

The emergence of standard coprocessor interfaces also has opened the door for technology partnerships, thereby bringing together leading-edge NPU devices with high-performance network search engines and classification coprocessors. For example, the same hardware interface and modular software programming model that are used for AMCC’s internal NPUs and XSM devices also can provide for glueless and seamless integration of industry-leading search and classification devices from NetLogic. In addition, the same interface methodologies are currently under review by the Network Processor Forum’s working group on LookAside processors and are being considered for adoption as an industry standard.

For system designers, the bottom line is the flexibility to efficiently distribute network search functions across a seamlessly unified hardware/software architecture. This allows for tailoring the entire packet classification process to conform to dynamically evolving application-specific requirements. Instead of having to periodically start from scratch to develop new capabilities for previously unanticipated requirements, designers can leverage an intelligently segmented and independently scalable NPU and search/classification architecture to smoothly accommodate higher speeds, proliferating packet types and deeper packet classification requirements.




