News & Analysis
RapidIO streamlines system design
Victor Menasce, Director, Lead Architect, New Product Definition, Tundra Semiconductor Corp., Kanata, Ontario
1/17/2002 12:09 PM EST
System designers face a barrage of interconnect standards, often all of them present on a single board. Tying together many of these single-purpose interconnects is a cumbersome affair involving a lot of FPGA design and selecting among specifications as diverse as H.100, Utopia II, POS-Phy, GMII, PCI, the DSP Host port bus, the processor bus and about a dozen others. The permutations and combinations for connecting these disparate standards are too numerous to commercially support dedicated "bridges" from one standard to another.
A good example of the dilemma facing the board designer is the digital hardware that might exist in a third-generation wireless Internet basestation. Unfortunately, such a system is not unique but rather representative of many systems today.
It has a number of subsystems that require specialized functions. There is a high-speed modem pool and a processor subsystem for routing and control. There are both channelized and connectionless communication paths, and both telephony and Internet Protocol access. Several protocols are involved and they need to be terminated at wire speed. All of this is processing intensive.
Linking together many of these single-purpose interconnects involves FPGA design, which is roughly 10 times more labor-intensive than connecting application-specific standard products on a board. Many organizations keep metrics of design effort for a variety of design tasks. While FPGA and ASIC solutions are certainly appropriate for some applications, those metrics show that the engineering effort to design one is at least an order of magnitude larger than the effort to incorporate a standard product in a board design.
The investment required to implement an interface is a function of the complexity in the interface. Designers choosing the FPGA or ASIC route need to consider many factors.
For example, the interface needs to have low latency and high bandwidth. The software running in a microprocessor needs to address I/O devices using memory-mapped I/O techniques. Software processes often communicate through messages sent to mailboxes. Interrupts need to be available for both software and I/O devices to affect program control flow.
RapidIO is based on a low-latency memory-mapped programming model. A microprocessor interface needs to support cache coherency for multiprocessing environments, and RapidIO's full-featured global-shared-memory (GSM) scheme is a directory-based cache-coherence protocol modeled after the Stanford DASH project.
RapidIO supports the notion of coherence domains. In this scheme, microprocessors can be grouped together. The cache-coherence traffic is limited to members of that domain, eliminating the broadcast traffic that is commonly associated with symmetric multiprocessors. The GSM protocol uses a three-state modified-shared-local model.
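To illustrate why a directory confined to a coherence domain avoids broadcast traffic, here is a minimal sketch in Python. It is a generic directory model, not the RapidIO GSM state machine: the class name, the processor IDs and the address are all assumptions made for illustration, and the three directory states are not modeled.

```python
# Toy directory for a single coherence domain. This is a generic
# directory-based sketch, NOT the RapidIO GSM protocol; all names and
# values here are hypothetical.

class Directory:
    def __init__(self, members):
        # Processors that belong to this coherence domain.
        self.members = frozenset(members)
        # address -> set of member IDs currently holding a copy
        self.sharers = {}

    def _check(self, cpu):
        if cpu not in self.members:
            raise ValueError(f"processor {cpu} is outside this coherence domain")

    def read(self, cpu, addr):
        """Record that a domain member now holds a shared copy of addr."""
        self._check(cpu)
        self.sharers.setdefault(addr, set()).add(cpu)

    def write(self, cpu, addr):
        """Return the members to invalidate: current sharers only, no broadcast."""
        self._check(cpu)
        targets = self.sharers.get(addr, set()) - {cpu}
        self.sharers[addr] = {cpu}
        return sorted(targets)


directory = Directory(members=[0, 1, 2, 3])
directory.read(0, 0x1000)
directory.read(2, 0x1000)
invalidated = directory.write(1, 0x1000)  # processor 3 is never contacted
```

In this model only the two processors that actually hold the line receive an invalidation; a snooping symmetric multiprocessor would have to notify every processor on the bus.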
In microprocessor-based systems, latency is of utmost importance, and the RapidIO protocol was designed to minimize it. In RapidIO architectures, switches have a minimum of intelligence: they route traffic based on a set of nailed-up (statically configured) mapping tables, placing the burden of knowing the network topology on the end points.
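The table-driven forwarding described above can be sketched in a few lines of Python. This is an illustration only: the `Switch` class, the device IDs and the port numbers are hypothetical and are not drawn from the RapidIO specification. The point is that the switch performs a single lookup and holds no knowledge of the topology.

```python
# Hypothetical model of a switch with a static routing table.
# Device IDs, port numbers and names are illustrative assumptions.

class Switch:
    def __init__(self, routes, default_port=None):
        # routes: destination device ID -> output port, loaded once at setup
        self.routes = dict(routes)
        self.default_port = default_port

    def forward(self, dest_id):
        """Return the output port for a packet: one table lookup, nothing more."""
        port = self.routes.get(dest_id, self.default_port)
        if port is None:
            raise KeyError(f"no route for destination ID {dest_id}")
        return port


switch = Switch({0x01: 0, 0x02: 1, 0x10: 3})
ports = [switch.forward(dest) for dest in (0x02, 0x10, 0x01)]
```

Whoever fills in the table (in this sketch, the code that constructs `Switch`) is the party that must understand the network layout, which mirrors the burden the article places on the end points.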
Today's DSP interface is generally a simple slave interface that can be asynchronously addressed by a general-purpose I/O interface. There is no good general-purpose solution for connecting DSPs and microprocessors today. In a DSP farm setting, the CPU must poll through the array of DSPs to extract the data. This polling is extremely inefficient and consumes a lot of CPU time.
In a RapidIO system, a serial switch fabric device with integrated direct memory access (DMA) devices would manage the data flows from the DSPs. Traffic would be aggregated by the switch and the DMAs would place the data in microprocessor memory. This ensures that the data is available in memory queues for software to begin processing. In a wireless application, the data streams from the DSPs have a constant data rate. Offloading the CPU from polling the DSPs simplifies the programming of this application significantly. It also dramatically improves performance.
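The difference between polling and DMA-filled queues can be made concrete with a small Python model. Everything here is an assumption for illustration: the eight-DSP farm, the `MemoryQueue` class standing in for a queue in microprocessor memory, and the `dma_write` method standing in for the switch's DMA engine.

```python
from collections import deque

def poll_all(dsps):
    """Polling model: the CPU reads every DSP, whether or not it has data."""
    data, reads = [], 0
    for dsp_id in sorted(dsps):
        reads += 1
        data.extend((dsp_id, block) for block in dsps[dsp_id])
    return data, reads


class MemoryQueue:
    """Stand-in for a queue in processor memory that a DMA engine fills."""
    def __init__(self):
        self._entries = deque()

    def dma_write(self, dsp_id, block):
        # Performed by the (hypothetical) DMA engine, not by the CPU.
        self._entries.append((dsp_id, block))

    def drain(self):
        # The CPU touches only entries that actually contain data.
        out = list(self._entries)
        self._entries.clear()
        return out


# Eight DSPs, only two of which have data ready.
dsps = {i: [] for i in range(8)}
dsps[2] = [b"frame-a"]
dsps[5] = [b"frame-b"]

polled, cpu_reads = poll_all(dsps)

queue = MemoryQueue()
for dsp_id in sorted(dsps):
    for block in dsps[dsp_id]:
        queue.dma_write(dsp_id, block)
queued = queue.drain()
```

Both paths deliver the same two frames, but the polling path costs the CPU eight device reads while the queue path costs it two queue entries.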
The Asynchronous Transfer Mode (ATM) standard is a packet-based spec in which packets are of a fixed size. The physical layer in ATM is generally Sonet. ATM packets have both a header (5 bytes) and a payload (48 bytes).
A RapidIO interface can easily accept ATM packets by encapsulating them within a RapidIO packet. RapidIO packets are variable in length and can have a payload from 0 up to 256 bytes. ATM packets would be aligned on 64-byte boundaries within a RapidIO packet, and up to four ATM packets can be encapsulated within a single RapidIO packet.
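The arithmetic behind that packing is simple: a 53-byte ATM packet (5-byte header plus 48-byte payload) occupies one 64-byte slot, and 256 / 64 = 4 slots fit in the largest RapidIO payload. The Python sketch below shows one way this could look. Placing the padding after each ATM packet and filling it with zeros are assumptions, as are the function names; only the payload is modeled, not the RapidIO header.

```python
ATM_HEADER = 5
ATM_PAYLOAD = 48
ATM_CELL = ATM_HEADER + ATM_PAYLOAD        # 53 bytes
SLOT = 64                                  # alignment within the payload
MAX_PAYLOAD = 256                          # largest RapidIO payload
CELLS_PER_PACKET = MAX_PAYLOAD // SLOT     # 4

def encapsulate(cells):
    """Pack ATM packets into payloads, one per 64-byte slot, up to four each."""
    payloads = []
    for start in range(0, len(cells), CELLS_PER_PACKET):
        payload = bytearray()
        for cell in cells[start:start + CELLS_PER_PACKET]:
            if len(cell) != ATM_CELL:
                raise ValueError(f"ATM packet must be {ATM_CELL} bytes")
            # Assumption: zero padding after each ATM packet.
            payload += cell + bytes(SLOT - ATM_CELL)
        payloads.append(bytes(payload))
    return payloads

def extract(payload):
    """Recover the ATM packets from one payload by stepping through the slots."""
    return [payload[off:off + ATM_CELL] for off in range(0, len(payload), SLOT)]


cells = [bytes([i]) * ATM_CELL for i in range(5)]
payloads = encapsulate(cells)   # five ATM packets -> one full payload plus one partial
```

The alignment costs 11 padding bytes per slot, so a full 256-byte payload carries 212 bytes of ATM data.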
ATM packets have strict ordering rules that are the same as the ordering rules for RapidIO. Packets are generally transmitted and received in order. In some cases, packets of a higher priority are permitted to pass packets of a lower priority. This is required to ensure forward progress, and to prevent deadlocks and head-of-queue blocking. The bandwidth of parallel RapidIO interfaces is ideally suited to the requirements of today's ATM standards. An OC-48 interface uses only 25 percent of an 8-bit link's available bandwidth.
The PCI standard has achieved ubiquitous status by any account. The rich marketplace of I/O devices available in PCI is too compelling to ignore. However, PCI is reaching limits of bandwidth and scalability for today's systems. The RapidIO interoperability specification documents how RapidIO packets relate to PCI transactions.
Ensuring PCI interoperability requires support for the ordering rules of PCI. These ordering rules are mapped to priorities in RapidIO. RapidIO defines four priorities today, Levels 0-3, with provision for more priority levels in the future. Level 0 represents the lowest and Level 3 the highest. Higher-priority packets are permitted to pass lower-priority ones. PCI transactions are mapped to RapidIO transactions.
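The passing rule can be sketched as a set of per-priority queues. This Python model is an assumption-laden illustration: the `PriorityPort` class and the packet labels are hypothetical, and strict priority (always serving the highest non-empty level first) is just one policy consistent with "permitted to pass", not necessarily what any given device does. No particular PCI-to-priority mapping is implied.

```python
from collections import deque

NUM_PRIORITIES = 4  # Levels 0-3; Level 3 is the highest

class PriorityPort:
    """Hypothetical output port: in order within a level, higher levels may pass."""
    def __init__(self):
        self._queues = [deque() for _ in range(NUM_PRIORITIES)]

    def enqueue(self, priority, packet):
        if not 0 <= priority < NUM_PRIORITIES:
            raise ValueError(f"priority must be 0-{NUM_PRIORITIES - 1}")
        self._queues[priority].append(packet)

    def dequeue(self):
        # Strict priority (an assumed policy): serve the highest non-empty level.
        for level in range(NUM_PRIORITIES - 1, -1, -1):
            if self._queues[level]:
                return self._queues[level].popleft()
        return None


port = PriorityPort()
port.enqueue(0, "low-1")
port.enqueue(0, "low-2")
port.enqueue(2, "mid-1")
port.enqueue(0, "low-3")
port.enqueue(3, "high-1")

order = []
while True:
    packet = port.dequeue()
    if packet is None:
        break
    order.append(packet)
```

The Level 3 and Level 2 packets leave ahead of the Level 0 packets that arrived earlier, while the three Level 0 packets keep their original relative order.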



