Design Article
I2O in Next Generation DSP Based Telecom Systems
Yogendra Jain and Richard Dargusch
12/4/1997 12:00 AM EST
I2O is an emerging standard for Intelligent Input/Output. A primary benefit of I2O is the ease with which multiple peripheral devices and boards can be integrated into a single system. Such systems are common in DSP applications, where different types of I/O and processing boards are required. Implementing I2O requires placing intelligence on the peripheral board in the form of a local processor. With this, the host CPU no longer has to act as the data distributor; the peripherals can themselves carry out peer-to-peer communications. As a result, CPU response time is faster and more deterministic, and a DSP can focus on the signal processing task at hand. This article presents the benefits of I2O for signal processing and telecommunications applications.
Figure 1: A typical multi-board system for DSP applications with a CPU, coprocessor, DSP board, and multiple I/O channels.
Consider the dashed line arrows in Figure 1. These indicate the devices that need to communicate with one another. Writing a single (non-I2O) device driver can be challenging. Implementing a multi-platform intercommunicating driver is far more difficult. Imagine writing the device drivers to make all these devices talk to one another as indicated, without intelligence on the boards; it's a daunting task. Making those drivers actually work is another. All the interrupts need to be configured, priority structures must be established, and custom code to support all the intercommunication must be written and debugged.
Here, large volumes of data are transferred from board to board, and control passes between multiple layers of the architecture. Without intelligence in the I/O and co-processor boards, the host can spend significant time handling I/O requests and transfers. The throughput of the I/O and co-processor boards suffers as well, and overall performance is limited.
Figure 2: A simplified DSP peripheral platform with multiple I/O devices and a local processor for I2O
Figure 2 shows a simplified I2O device platform, which illustrates the use of multiple "devices" on a single board. The platform also contains a local processor (the "intelligence"); it processes I2O messages and executes the device driver modules, both of which are described below.
Figure 3: Split driver model for I2O. Traditional device drivers are written as a single block of code, interfacing to both the O/S and the hardware device. I2O splits the driver into two pieces and defines a standard set of messages to be used for communication between the two.
Historically, device drivers have been written specifically for a particular O/S and a particular peripheral device. In I2O, the driver is split into two parts: the OSM (O/S Services Module), which provides the interface only to the O/S, and the HDM (Hardware Driver Module), which provides the interface only to the peripheral device. The two communicate (Figure 3) via standard message packets across a layered system composed of a messaging layer that resides on a transport layer. The messages are passed between the OSMs and HDMs via two virtual FIFO queues: one outbound, one inbound.
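The exchange between the two driver halves can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the I2O message format: the `Osm` and `Hdm` classes, the dictionary fields, and the `read_block` function name are hypothetical, and two in-memory deques stand in for the virtual FIFO queues (named here from the IOP's point of view).

```python
from collections import deque

# Two in-memory queues stand in for the virtual FIFOs. This is an assumption
# for illustration; the article does not describe how the queues are built.
inbound = deque()   # requests travelling from the host (OSM) to the IOP (HDM)
outbound = deque()  # replies travelling from the IOP (HDM) back to the host


class Hdm:
    """Hypothetical hardware driver module: knows the device, not the O/S."""

    def __init__(self, blocks):
        self.blocks = blocks  # pretend device storage: block index -> bytes

    def service(self):
        # Drain every pending request and post exactly one reply for each.
        while inbound:
            msg = inbound.popleft()
            data = self.blocks.get(msg["block"])
            status = "ok" if data is not None else "error"
            outbound.append({"id": msg["id"], "status": status, "data": data})


class Osm:
    """Hypothetical O/S services module: knows the O/S, not the device."""

    def __init__(self):
        self.next_id = 0

    def request_read(self, block):
        # Post a request message and return its id so the reply can be matched.
        self.next_id += 1
        inbound.append({"id": self.next_id, "function": "read_block", "block": block})
        return self.next_id

    def collect_replies(self):
        # Gather all waiting replies, keyed by the id of the original request.
        replies = {}
        while outbound:
            reply = outbound.popleft()
            replies[reply["id"]] = reply
        return replies


osm = Osm()
hdm = Hdm({0: b"boot", 1: b"data"})
first = osm.request_read(0)
second = osm.request_read(7)  # block 7 does not exist on the pretend device
hdm.service()
replies = osm.collect_replies()
print(replies[first]["data"], replies[second]["status"])
```

Neither side calls the other directly; each only reads and writes queue entries, which is what lets the two halves be written independently.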
To accomplish this communication, an I/O processor (IOP) is required on the peripheral side to run the HDMs. Intel's i960RP processor is specifically tailored for I2O; however, the HDM (Figure 3) can be hosted on any suitable processor. Because the messages are standardized, platforms communicate without knowledge of the underlying bus architectures, operating systems, device specifics, and I/O hardware. Thus buses such as VME, PCI, cPCI, and so on are all potential candidates for use with this specification.
The device classes covered by the specification include:
- Random Block Storage (HDD or CD-ROM)
- Sequential Storage (tape drives)
- LAN (Ethernet or Token Ring)
- WAN (ATM controller)
- Fibre Channel Port
- SCSI Peripheral
- ATE Port (ATE controller)
- ATE Peripheral (an ATE device)
- Floppy Controller
- Floppy Device
- Bus Adapter Port
- Peer-to-Peer
Additionally, RadiSys (DSPD) is currently designing the specification of a class for Telecom devices.
Messages in I2O are passed between the OSM(s) and HDM(s) via two virtual FIFO queues (one outbound, one inbound), which substantially reduces the number of interrupts the other processors in the system need to handle. In ordinary systems, interrupts rob the CPU of a significant portion of its processing power. By lightening that load, I2O enables the main processor to devote its power to running code rather than managing I/O traffic.
Multi-Processor, Multi-Peripheral Platforms
The messaging structure of I2O enables it to operate in systems with any number of hosts, I/O, and DSP platforms. In Figure 4, host CPUs operating under two different operating systems are communicating with three different classes of devices (A, B, C) on two separate I/O platforms (or boards). Communication is via messages between host and peripheral, or peripheral to peripheral.
Figure 4: I2O can communicate across multiple operating systems, and between peripheral platforms on a peer-to-peer basis
System Control
Without I2O, a system such as those shown in Figures 1 and 7 would be brought to a halt should the controller processor need to be reset. I2O loosens the coupling of boards so that resetting any one processor, including the host(s), will not force any other peripheral to stop its processing.
Figure 5: I2O capable peripheral board with integrated DSP, multiple I/O connections, and an H.100 style switch. The i960 handles all on board I/O transactions as well as communication with the host(s).
The board shown in Figure 5, RadiSys' SPIRIT-6000, places all of its components on a single cPCI form factor. The i960 is the IOP. The H.100 (a computer telephony bus standard) switch selects from several input sources (e.g. T1/E1 framer, ISDN, Frame Relay, POTS). The i960 controls the switch, as well as the data flow into and out of the DSP, be it directly through a host port to the TMS320C602 DSP, or via the H.100 through a high speed serial port. The i960 (or IOP) also handles all communication with the host, or with other devices external to the card.
Since all these tasks are now performed by the IOP (the i960), the DSP is freed up to do intensive signal processing with minimal servicing of the host, resulting in greater throughput and greater determinism. DSP software development and maintenance are easier, and fewer DSP external memory accesses are required since host communication code is no longer in DSP memory. Applications become modular and more robust.
Consider again the multi-host, multi-I/O platform environment shown above in Figure 4. Such systems present particular I/O complexities that the distributed intelligence of I2O can greatly simplify. As Figure 4 indicates, I2O supports multi-host (and multiple O/S) and multi-IOP systems, due to its structured nature and standardized classes of devices. The host no longer needs to manage all of the data, and the headaches of inter-peripheral communication are eliminated. Implementing I2O forces distributed intelligence, which is a natural requirement for large I/O and computationally intense systems.
Distributed, or local, intelligence is not a new concept. As an example, designers have used a 386 as a controller for this task. In "system on a chip" designs, a controller core may accompany a processor and an I/O core. These local intelligent processors are advantageous, but they can create system-wide havoc: getting them all to communicate over a standard bus with a standard set of APIs has been quite difficult.
The difference with I2O is that regardless of the IOP selected, the APIs for peripherals within the same class are identical.
Distributed intelligence can provide another important feature. The IOP minimizes DSP interruptions when processing host commands. For general I/O applications, this is a plus. For real-time applications where the DSP's determinism is critical, it is a major advantage.
In fact, studies with other classes of I/O platforms have shown at least a threefold (3X) improvement in IOP-to-host data throughput, while reducing the load on the host processor by up to 50%.
Figure 6: Simple parallel processing DSP application. An image is broken into four pieces, and each DSP board is given one piece to process.
Consider a parallel processing application involving four DSPs, where a controlling processor divides an image into four parts and distributes one piece to each DSP board for processing. In such systems, three main factors determine the overall throughput:
- Processor Node Speed (how fast each DSP board can process)
- Processor to Processor Link Speed (how fast data can be transferred among boards)
- Processor Overhead on Data Transfers (how much of the DSP processor's power is required for data transfers).
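The image-division step in this example can be sketched as follows. The article does not say how the image is cut, so the quadrant layout, the `split_into_quadrants` helper, and the `dsp0` to `dsp3` board names are assumptions made purely for illustration.

```python
def split_into_quadrants(image):
    """Split a 2-D list of pixel rows into four quadrants.

    Returns the pieces in the order: top-left, top-right,
    bottom-left, bottom-right.
    """
    rows, cols = len(image), len(image[0])
    mid_r, mid_c = rows // 2, cols // 2
    top, bottom = image[:mid_r], image[mid_r:]
    return [
        [row[:mid_c] for row in top],
        [row[mid_c:] for row in top],
        [row[:mid_c] for row in bottom],
        [row[mid_c:] for row in bottom],
    ]


# A tiny 4x4 "image" whose pixel values are just their row-major index.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
pieces = split_into_quadrants(image)

# The controlling processor hands one piece to each (hypothetical) DSP board.
assignments = {f"dsp{i}": piece for i, piece in enumerate(pieces)}
for board, piece in assignments.items():
    print(board, piece)
```

Each board then works on a quarter of the pixels, so node speed, link speed, and transfer overhead apply to each piece independently.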
Where many DSPs work together with I/O and application-specific boards, custom architecture-specific programs have had to be written. The DSPs themselves had to process data transfers, using up precious DSP MIPS and reducing node speed. With the CPU also involved in all data transfers between peripheral boards, the bus bandwidth used is twice what a peer-to-peer communication system would require.
With a standardized model such as I2O, the DSP's node speed is increased, since the I/O tasks are now off-loaded. The i960 acts as the data pump / DMA engine, and the link can take full advantage of the host bus speed. Data transfers go directly from peer to peer, rather than through the CPU, effectively halving the bus bandwidth usage.
Further, the host processor can still maintain control and do load balancing by monitoring the percent utilization of each DSP and routing incoming data to the least-used device.
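The factor of two follows from counting bus crossings: a transfer relayed by the CPU crosses the shared bus twice (source board to CPU, then CPU to destination board), while a peer-to-peer transfer crosses it once. The short sketch below does that arithmetic; the `bus_bytes` helper, the 64 KB payload, and the transfer count are illustrative assumptions, not measurements.

```python
def bus_bytes(payload_bytes, transfers, via_host):
    """Total bytes crossing the shared bus for board-to-board transfers."""
    # Relayed through the host: source -> CPU, then CPU -> destination.
    # Peer to peer: source -> destination only.
    crossings = 2 if via_host else 1
    return payload_bytes * transfers * crossings


payload = 64 * 1024  # illustrative block size, not a figure from the article
transfers = 100      # illustrative number of board-to-board transfers

through_cpu = bus_bytes(payload, transfers, via_host=True)
peer_to_peer = bus_bytes(payload, transfers, via_host=False)
print(through_cpu, peer_to_peer, peer_to_peer / through_cpu)
```

The ratio is one half regardless of the payload size or transfer count chosen.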
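A least-used routing policy of this kind might look like the sketch below. The utilization figures, the board names, and the fixed `cost_per_frame` estimate are all hypothetical; a real host would read utilization back from each DSP board rather than estimate it.

```python
def pick_least_used(utilization):
    """Return the name of the DSP with the lowest utilization percentage."""
    return min(utilization, key=utilization.get)


def route(frames, utilization, cost_per_frame=10.0):
    """Assign each incoming frame to the currently least-used DSP.

    The load estimate of the chosen DSP is bumped by cost_per_frame after
    each assignment so that work spreads out instead of piling onto one board.
    """
    load = dict(utilization)  # work on a copy; the caller's data is untouched
    plan = []
    for frame in frames:
        target = pick_least_used(load)
        plan.append((frame, target))
        load[target] += cost_per_frame
    return plan


# Hypothetical percent utilization reported by four DSP boards.
utilization = {"dsp0": 80.0, "dsp1": 35.0, "dsp2": 40.0, "dsp3": 90.0}
plan = route(["f0", "f1", "f2"], utilization)
print(plan)
```

Because the host only chooses destinations, the data itself can still travel peer to peer.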
Data flow emulation is also made easier by I2O. The messages that are passed contain a header and a payload. The header contains information about the data, which can include: source, destination, type and size of data, parameters, coefficients, algorithm and processing requirements, and so on. The payload contains the data itself. Thus, the CPU (by writing to the IOP) can control the destination of the data for optimal load balancing.
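The header-plus-payload idea can be modeled with two small data classes. This is a sketch of the concept only: the field names follow the list in the paragraph above, but the `Header` and `Message` types, the `make_message` and `redirect` helpers, and the example values are hypothetical and do not reflect the actual I2O frame layout.

```python
from dataclasses import dataclass, field, replace


@dataclass
class Header:
    """Describes the data: where it came from, where it goes, what it is."""
    source: str
    destination: str
    data_type: str
    size: int
    parameters: dict = field(default_factory=dict)


@dataclass
class Message:
    header: Header
    payload: bytes


def make_message(source, destination, data_type, payload, **parameters):
    """Build a message whose header size always matches the payload length."""
    header = Header(source, destination, data_type, len(payload), parameters)
    return Message(header, payload)


def redirect(message, new_destination):
    """Return a copy aimed at a different device; the payload is not copied."""
    return replace(message, header=replace(message.header, destination=new_destination))


msg = make_message("io0", "dsp1", "pcm16", b"\x00\x01" * 4, algorithm="compress")
moved = redirect(msg, "dsp2")
print(msg.header.destination, moved.header.destination, moved.header.size)
```

Rerouting touches only the header, which is why the host can steer data for load balancing without handling the data itself.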
Multiple processors and boards can easily be integrated together to handle the data for the DSP(s), facilitating powerful real time processing environments with a minimum of overhead.
Figure 7 shows a typical cPCI based system for Telecom applications. Again, the host is a Pentium CPU based controller; the co-processors consist of an x86 board and a DSP board. The I/O is either a daughter card on the co-processor boards or a separate I/O board on the cPCI bus. A secondary bus, the H.100, is also shown. The H.100 is dedicated to high speed telecom traffic. The host O/S is Windows NT with real-time extensions and an algorithm execution environment called TASK (Telecom Application Specific Kernel).
Figure 7: RadiSys' integrated solution for multi-platform processing
Here, all the potential complexities noted in the first section of this article are painfully apparent. This system requires a multi-platform intercommunicating driver (system) that must be fully integrated before any of the boards can talk with one another.
Enter I2O. Utilizing the split driver model (Figure 3), one hardware driver module (HDM) is written for each hardware device (the DSP(s) and the I/O devices), independent of all the other devices and the O/S. For most I/O devices, this HDM will be provided by the board or device vendor. Standard O/S side drivers (OSMs) can be implemented, or purchased from a vendor such as Wind River, independent of the specifics of the device to be controlled. Inter-device communication is handled by the I2O system; the developer need not even be concerned with it. Product development cycle time is thus dramatically reduced. The results are modular in nature and easily reproducible.
In telecom data-logging applications, I2O can significantly decrease the system integration time. Typically in these applications, many channels of voice pass through the I/O. The channels are processed (compressed) and stored. Upon a request by the host, the stored channels are read from the disk, decompressed, and played back on the host.
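The payoff of the split can be illustrated with a small sketch in which one O/S-side routine works unchanged with two different vendors' hardware modules. Everything named here is hypothetical: `BlockStorageHdm` stands in for a device-class interface, `VendorADisk` and `VendorBDisk` for vendor-supplied HDMs, and `osm_read_all` for the O/S-side driver. Direct method calls are used for brevity where I2O would use messages.

```python
from abc import ABC, abstractmethod


class BlockStorageHdm(ABC):
    """Hypothetical interface shared by every HDM in one device class."""

    @abstractmethod
    def read_block(self, index: int) -> bytes:
        """Return the contents of one block."""


class VendorADisk(BlockStorageHdm):
    """One vendor keeps its blocks in a list."""

    def __init__(self):
        self._blocks = [b"A0", b"A1"]

    def read_block(self, index):
        return self._blocks[index]


class VendorBDisk(BlockStorageHdm):
    """Another vendor keeps one flat image with two-byte blocks."""

    def __init__(self):
        self._image = b"B0B1"

    def read_block(self, index):
        return self._image[index * 2:(index + 1) * 2]


def osm_read_all(hdm: BlockStorageHdm, count: int):
    """O/S-side code: depends only on the class interface, never the vendor."""
    return [hdm.read_block(i) for i in range(count)]


print(osm_read_all(VendorADisk(), 2))
print(osm_read_all(VendorBDisk(), 2))
```

Swapping one board for another changes which HDM is loaded, not the O/S-side code.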
With the availability of I2O drivers, the I/O device can be configured easily to pass the data from the I/O to its IOP. The IOP sends data directly from the I/O to the DSP board's IOP which locally transfers the data to the DSP processor. After the compression, the DSP transfers the data back to its IOP which sends the data to disk. For playback, the same sequence takes place in reverse; but instead of playing out over the output, the data may be played to the host CPU. A full duplex application (simultaneous record and playback) with no loss of data requires six to eight months of painful driver and application integration. The promise of I2O, assuming all the boards support it, is that such integration can be done in less than one fourth the time.
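The record and playback paths can be traced with a toy pipeline. This is a sketch under clear assumptions: `zlib` is used as a lossless stand-in for whatever voice compression the DSP would actually run, a dictionary stands in for the disk, and the hops between IOPs are shown only as comments because no real transport is modeled.

```python
import zlib

disk = {}  # stand-in for the storage device: channel number -> stored bytes


def dsp_compress(samples: bytes) -> bytes:
    # Stand-in for the DSP's voice compression (zlib is an assumption).
    return zlib.compress(samples)


def dsp_decompress(stored: bytes) -> bytes:
    # Stand-in for the DSP's decompression on the playback path.
    return zlib.decompress(stored)


def record(channel: int, samples: bytes) -> int:
    """Record path: I/O -> I/O IOP -> DSP board IOP -> DSP -> IOP -> disk."""
    at_dsp = samples                    # delivered peer to peer, host not involved
    compressed = dsp_compress(at_dsp)   # DSP does only signal processing
    disk[channel] = compressed          # DSP board's IOP sends the result to disk
    return len(compressed)


def playback(channel: int) -> bytes:
    """Playback path: disk -> DSP board IOP -> DSP -> host."""
    return dsp_decompress(disk[channel])


voice = bytes(range(16)) * 64  # 1024 bytes of repetitive pretend samples
stored_size = record(3, voice)
restored = playback(3)
print(len(voice), stored_size, restored == voice)
```

A real voice codec would usually be lossy, so the exact round trip shown here is a property of the stand-in, not of the application.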
Although I2O is new and initial efforts from board and CPU vendors will be required, once we all cross this painful threshold, a decrease will be seen in application development costs, board support costs, development cycle time, and time to market.



