Design Article
Distributed switch fabric standards: more performance, choices
Chuck Hill, System Architect, Motorola Computer Group, Tempe, Ariz., Active PICMG 2.20 and PICMG 3.x Committees
7/15/2002 9:04 AM EDT
Mesh fabrics are becoming the topology of choice for open architecture systems. A mesh adds point-to-point connections until every node is connected to every other node. At that point, the hierarchy found in a typical star topology disappears. Each node can be an endpoint, a router, or both. Mesh networks can be used as a distributed switch fabric: each node switches its own traffic, and there is no dependence on a central resource. All nodes are equal in a peer-to-peer system.
Carrier-grade switch fabrics, found in much of today's telecom equipment, vary in implementation but share similar characteristics. High-end switch fabrics are generally partitioned into multiple elements arranged in a star topology. The "traffic managers" receive the layer-two traffic, then classify and schedule it. They also encapsulate the layer-two traffic into a format specific to the lower-layer fabric. In this way, the fabric can transport a variety of protocols by connecting traffic managers specialized for each protocol type.
A mesh network creates a "distributed fabric". The mesh interconnect creates a fully populated, non-blocking fabric. Because each node has a direct route to all destinations, a node does not have to route traffic for other nodes. Each node contains a portion of the fabric. Instead of a single NxN switch, the fabric consists of M 1xN switches distributed among the line cards.
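As a rough illustration of how a full mesh scales, the following Python sketch counts the point-to-point links in the backplane and the fan-out each node's 1xN switch must handle. The function names and slot counts are illustrative assumptions, not values from any specification.

```python
def full_mesh_links(nodes: int) -> int:
    """Point-to-point links needed so every node reaches every other node directly."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return nodes * (nodes - 1) // 2


def ports_per_node(nodes: int) -> int:
    """Fan-out of each node's 1xN switch: one link to every other node."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return nodes - 1


if __name__ == "__main__":
    # Hypothetical slot counts, chosen only to show the growth.
    for slots in (4, 8, 16):
        print(slots, "slots:", full_mesh_links(slots), "links,",
              ports_per_node(slots), "ports per node")
```

The link count grows with the square of the slot count, but it is fixed copper in the backplane; the switching on each card grows only linearly with the number of peers.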
Most commercial off-the-shelf (COTS) fabrics have industry-standard interfaces to the traffic managers, but use proprietary interfaces and protocols between the traffic managers and the fabric. This creates a problem for open architecture systems with blades from multiple vendors. The arrival of network processors as programmable traffic managers has created demand for more interoperability.
The Network Processing Forum has defined a standard "common switch interface" specification (CSIX). This interface defines electrical and packet protocol layers through which traffic managers and fabrics communicate. The electrical layer is a parallel bus based on UTOPIA (Universal Test and Operations PHY Interface for ATM). CSIX helps different silicon components interoperate, but it does not solve the need for interoperability in a backplane/blade system configuration. With a proprietary fabric, every component that interfaces at the backplane level must come from the same vendor.
Open architecture systems need interoperability at the fabric layer. There are several serial fabric standards, such as InfiniBand and RapidIO, but network processors do not communicate using these standards. Furthermore, these protocols contain much more capability than is needed to move packet data between network processors.
The CSIX protocol is optimized for moving packets through a carrier-grade switch fabric, and it is the standard that most of the next generation of network processors will implement. A mesh fabric can take the CSIX packet as it emerges from the network processor and transport it unchanged over the serial interface. CSIX contains provisions for applying priority and managing quality of service (QoS). It also defines packet types for flow control, broadcast, and multicast.
The CSIX Unicast packet offers payloads of up to 256 bytes with only 10 bytes of overhead. The Ready D and Ready C bits offer link-layer flow control between the network processor and the fabric interface. The 12-bit Destination Address allows 4096 unique destination addresses. The Class field allows arriving data to be treated with 256 different classes of service, or associated with 256 individual flows at each destination.
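The figures quoted for the unicast packet can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers given above; the constant names are illustrative, and the 8-bit width of the Class field is an assumption inferred from the 256 classes, not a reproduction of the actual header layout.

```python
MAX_PAYLOAD_BYTES = 256   # from the text
OVERHEAD_BYTES = 10       # from the text
DEST_ADDR_BITS = 12       # from the text
CLASS_BITS = 8            # assumption: 256 classes implies an 8-bit field


def efficiency(payload_bytes: int, overhead_bytes: int = OVERHEAD_BYTES) -> float:
    """Fraction of transmitted bytes that carry payload."""
    if not 0 < payload_bytes <= MAX_PAYLOAD_BYTES:
        raise ValueError("payload must be 1..256 bytes")
    return payload_bytes / (payload_bytes + overhead_bytes)


if __name__ == "__main__":
    print("destinations:", 2 ** DEST_ADDR_BITS)          # 4096
    print("classes per destination:", 2 ** CLASS_BITS)   # 256
    for size in (64, 128, 256):
        print(f"{size}-byte payload: {efficiency(size):.1%} efficient")
```

At the maximum payload the overhead is under 4 percent; efficiency falls as payloads shrink, which is worth keeping in mind when sizing links for small-packet traffic.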
The simplest path to an open fabric standard is to extend the CSIX protocol into the serial layer of a mesh fabric using a common serial physical layer. The Gigabit Ethernet (IEEE 802.3z) physical layer, with its 8b/10b encoding, has been around the longest.
It is straightforward to serialize the CSIX packet and encapsulate it using the K-Codes for START and END. The IDLE code is a "comma code" that also establishes byte boundaries, because its unique bit pattern does not occur in normal data. Frame delineation is accomplished by transmitting the CSIX frame (CFrame) contiguously between the START and END codes.
Receipt of an IDLE character at any time resets the frame boundary, and any incomplete packet is lost. The XON and XOFF K-Codes provide link-level flow control for the serial link layer. The logic to implement a mesh fabric interface using this serial protocol can be built easily with existing Field Programmable Gate Arrays.
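The framing rules above can be modeled in software before committing them to gate-array logic. The following Python sketch is only a behavioral model under stated assumptions: control symbols are represented as an enum rather than real 8b/10b code groups, XON/XOFF handling is omitted, and the treatment of a START that arrives in the middle of a frame (drop the partial packet and begin a new one) is an assumption, not something the text specifies.

```python
from enum import Enum


class K(Enum):
    """Symbolic stand-ins for the serial link's control codes."""
    IDLE = "IDLE"
    START = "START"
    END = "END"


def frame(packet: bytes) -> list:
    """Wrap one serialized CSIX packet between START and END."""
    return [K.START, *packet, K.END]


def deframe(symbols):
    """Recover packets from a symbol stream.

    Returns (packets, dropped), where dropped counts incomplete packets
    discarded because IDLE (or, by assumption, START) arrived mid-frame.
    """
    packets = []
    dropped = 0
    current = None  # None means the receiver is between frames
    for sym in symbols:
        if sym is K.START:
            if current is not None:
                dropped += 1          # assumption: restart on nested START
            current = bytearray()
        elif sym is K.END:
            if current is not None:
                packets.append(bytes(current))
            current = None
        elif sym is K.IDLE:
            if current is not None:
                dropped += 1          # IDLE resets the frame boundary
            current = None
        elif current is not None:
            current.append(sym)       # data byte inside a frame
        # data bytes outside a frame are ignored
    return packets, dropped


if __name__ == "__main__":
    stream = ([K.IDLE] + frame(b"\x01\x02")
              + [K.IDLE, K.START, 0x03, K.IDLE]   # frame cut short by IDLE
              + frame(b"\x04"))
    print(deframe(stream))
```

In the example stream, the two complete frames are recovered and the frame interrupted by IDLE is counted as one lost packet.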
Distributed interfaces
The fabric interface on each board has to aggregate incoming traffic, and distribute outgoing traffic. The switching at each node is 1-to-N which allows each node's fabric interface to be much simpler. The head end determines the capacity of the 1-to-N fabric interface. Each board's fabric interface does not need to implement more capacity than the board can utilize.
In a distributed fabric, the serial links do not necessarily have to operate at the same capacity as the head end. The traffic needs to be adequately distributed among the links such that the average traffic matches the head end capacity. A 2.5Gbit/second head end, for example, could be adequately supported by 1Gbit/second links. A 10Gbit/second board would more likely need to have 2.5Gbit/second links.
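Whether slower links can carry a given head end depends on how the traffic spreads across destinations. The Python sketch below is a simple check under stated assumptions: the traffic shares are hypothetical, the head end is assumed to run at full rate, and burstiness and protocol overhead are ignored.

```python
def overloaded_links(head_end_gbps: float, link_gbps: float, traffic_share: dict) -> list:
    """Return destinations whose average offered load exceeds the link rate.

    traffic_share maps each destination to the fraction of head-end
    traffic sent to it; the fractions are expected to sum to 1.
    """
    total = sum(traffic_share.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return [dest for dest, share in traffic_share.items()
            if head_end_gbps * share > link_gbps]


if __name__ == "__main__":
    # 2.5 Gbit/s head end over 1 Gbit/s links, evenly spread to five peers.
    even = {f"slot{i}": 0.2 for i in range(1, 6)}
    print(overloaded_links(2.5, 1.0, even))      # no link exceeds 1 Gbit/s

    # Same head end, but half the traffic goes to a single peer.
    skewed = {"slot1": 0.5, "slot2": 0.25, "slot3": 0.25}
    print(overloaded_links(2.5, 1.0, skewed))    # slot1 would need 1.25 Gbit/s
```

The second case shows the caveat behind "adequately distributed": once one destination attracts more than 40 percent of a 2.5 Gbit/second head end, a 1 Gbit/second link to it becomes the bottleneck.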
A nice feature of the mesh is that the links can individually operate at different speeds. The distributed fabric has less contention for resources. A mesh essentially creates an over-provisioned interconnect so that each node does not have to compete with other nodes for resources.
Each board can implement fabric features suited to its application. The fabric interfaces may need buffering, multiple layers of prioritization, and flow management. Each board implements traffic management according to the needs of its application. Distributed fabric implementations can be less complex than centralized fabrics. The distributed nature of the fabric still offers opportunities to add value. The difference from current COTS fabrics is a re-partitioning of resources and the embrace of an open standard protocol.
The distributed nature of the fabric interface allows costs to be added incrementally. The only fixed cost in a mesh system is the cost of the fixed copper interconnects. Capacity, and cost, is added with each card. Capacity can also be added in different increments. A 2.5Gbit/second board does not have to be as costly as a 10Gbit/second board.
With a star network, the central resource has to have full system capacity, even if that capacity is not immediately used. If the capacity of the star is exceeded, the central router has to be discarded and replaced with a higher-capacity device. A star system also has to have two redundant routers of the same capacity, only one of which is used at any given time. A distributed fabric can spare a node at only the incremental cost of that node's fabric interface. The cost of redundancy is N+1 instead of 2N.
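One way to read the N+1 versus 2N comparison is to count fabric capacity in per-slot units. The Python sketch below follows that reading; the unit model and function names are assumptions made for illustration, and real costs will depend on the components involved.

```python
def star_fabric_units(slots: int) -> int:
    """Two central routers, each sized for the full N-slot system."""
    return 2 * slots


def mesh_fabric_units(slots: int) -> int:
    """One fabric interface per working slot plus one spare node."""
    return slots + 1


def spare_overhead(units: int, slots: int) -> float:
    """Extra fabric capacity purchased for redundancy, as a fraction of N."""
    return (units - slots) / slots


if __name__ == "__main__":
    n = 16  # hypothetical slot count
    print("star:", star_fabric_units(n), "units,",
          f"{spare_overhead(star_fabric_units(n), n):.0%} overhead")
    print("mesh:", mesh_fabric_units(n), "units,",
          f"{spare_overhead(mesh_fabric_units(n), n):.2%} overhead")
```

Under this model, the star always pays 100 percent extra for redundancy, while the overhead of a mesh shrinks as 1/N.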
Distributed fabrics are ideal for open architecture, multi-service platforms. Each node is a peer in an orthogonal system. There are no special slots, no special routers. A system is populated with modules according to the application. As the application changes or scales, the distributed fabric changes and scales with each new module. As the telecom industry moves toward more outsourcing, the demand for open architecture platforms is increasing. The COTS industry is responding with mesh-based platforms that offer scalability and interoperability.
The PCI Industrial Computer Manufacturers Group (PICMG), the organization responsible for CompactPCI, has two emerging standards that offer mesh networks in open architecture platforms. The CompactPCI Serial Mesh Backplane (CSMB), PICMG 2.20, adds a mesh network to existing PICMG 2.x systems. CSMB overlays existing PICMG 2.x specifications such as the CompactPCI bus (2.0), Hot Swap (2.1), Ethernet (2.16), and StarFabric (2.17). It provides infrastructure for applications like Asynchronous Transfer Mode (ATM), 3G wireless, frame relay, and other proprietary or consortium-based transport protocols. The CSMB infrastructure provides for up to 700 Gbit/s of data transport capability.
PICMG is also developing a new family of specifications for an Advanced Telecom and Computing Architecture (AdvancedTCA), in which systems are based entirely on a mesh network. The network in these systems scales to 2.4 terabit/second, based on the same high-speed serial interconnects. Technology developed under one standard will be able to transition to the other. The availability of open standard fabrics makes high-speed serial interconnects and mesh backplanes the topology of choice for future open specification systems.



