More architectural innovations needed for the broadband edge

Greg Christison, Communications Processor, Development Manager, Texas Instruments, Dallas

3/10/2003 8:52 AM EST

The maturing broadband market brings with it a wide range of customer-premises equipment (CPE) at the network edge that offers different combinations of functionality for homes and small offices.

Broadband CPE in this complex networking environment frequently offers Internet Protocol bridging and routing, digital-subscriber-line and cable modems, wired and wireless Ethernet, voice over IP (VoIP), universal serial bus, asynchronous transfer mode, firewalls, virtual private network clients and other forms of security, to name only the most common interfaces, protocols and functions.

While individual CPE products at the network edge vary widely in the capabilities they offer, one thing they all have in common is sensitivity to cost. Developers are faced with the need to build many types of CPE systems as inexpensively as possible to cover the wide range of broadband applications.

To meet those requirements, developers must rely on underlying silicon solutions that offer cost efficiency and flexibility along with performance. A processor platform that provides a complete hardware and software solution with the versatility of a modular architecture can serve as the basis for a full line of broadband products, saving development resources and reducing time-to-market for the CPE manufacturer.

Fortunately for system developers, architectural innovations are appearing in broadband processor platforms that improve system performance and enhance design modularity.

The interrelated hardware and software architectural techniques being employed include DMA management that is distributed among interface peripherals, a protocol-independent common programming interface for peripherals, and a switching central resource that replaces the standard bus architecture. These architectural innovations, together with tuning techniques such as cache sizing, support greater performance and flexibility not only in system-on-chip (SoC) integration but also in system-wide design.

In the broadband environment, direct-memory-access control does not have to be designed to handle data transfers for every conceivable interface. Distributed DMA control, as the name implies, removes the function from the central controller and distributes it among the various peripheral interfaces. Each peripheral can serve as its own bus master for DMA, in contrast to the usual architecture, in which bus mastery is claimed by either the CPU or the central DMA controller.

Each distributed DMA control function is tailored to the requirements of its own peripheral interface, with no additional features or flexibility. Since buffering and burst sizes are optimized for the type of data being transferred, the hardware is compact. Even when the various distributed DMA control functions are added together, they do not exceed the size of a general-purpose DMA controller. In addition, several small control functions fit more efficiently on the die than a single large controller, which reduces silicon cost.

The efficiencies of distributed DMA control can be enhanced with the use of a protocol-independent programming interface for multichannel packet-oriented communication peripherals.

An example is the Communications Port Programming Interface (CPPI) that is implemented on some broadband processors. Essentially, CPPI offers developers a common way of handling different protocol interfaces that may require multiple priorities and multiple channels on a single port. CPPI defines the register set, data structures, interrupts and buffer handling for all peripherals, regardless of protocol.

CPPI is based on a buffer scatter/gather scheme, in which individual packets are broken up and stored in small buffers, from which they are later retrieved and reassembled (as opposed to using buffers that are located contiguously in memory). When protocol translation is required, a new packet header can be attached in a small buffer of its own, saving the CPU from having to rewrite the entire packet and header by copying from one large buffer to another. The considerable savings in CPU cycles that result make scatter/gather buffering the most efficient scheme for bridging and routing.
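The header-swap idea can be sketched in a few lines of Python. This is only an illustration of scatter/gather buffering in general, not the CPPI data structures themselves: the `BufferDescriptor` class, the function names and the placeholder header strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BufferDescriptor:
    """Hypothetical descriptor: a view of one small buffer plus a link to the next."""
    data: memoryview
    next_desc: Optional["BufferDescriptor"] = None


def build_chain(fragments: List[bytes]) -> BufferDescriptor:
    """Link separately allocated buffers into one packet, in order."""
    head: Optional[BufferDescriptor] = None
    for fragment in reversed(fragments):
        head = BufferDescriptor(memoryview(fragment), head)
    if head is None:
        raise ValueError("a packet needs at least one buffer")
    return head


def replace_header(packet: BufferDescriptor, new_header: bytes) -> BufferDescriptor:
    """Protocol translation: link a new header buffer to the existing payload chain.

    Only the small header buffer is written; the payload buffers are reused as-is.
    """
    return BufferDescriptor(memoryview(new_header), packet.next_desc)


def gather(packet: BufferDescriptor) -> bytes:
    """Walk the chain and reassemble the packet, as a transmit DMA would."""
    parts = []
    node: Optional[BufferDescriptor] = packet
    while node is not None:
        parts.append(bytes(node.data))
        node = node.next_desc
    return b"".join(parts)


# Placeholder contents; real headers would be binary protocol fields.
ethernet_packet = build_chain([b"ETH-HDR|", b"payload-part-1|", b"payload-part-2"])
atm_packet = replace_header(ethernet_packet, b"ATM-HDR|")
```

Both packets share the same payload descriptors, so the translation costs one small header write rather than a copy of the whole packet.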

Efficient processing

To this inherent efficiency, CPPI adds a common method of interfacing to different modules on a chip, enabling more efficient processing of the data by the CPU. That is, less translation and reformatting of data and buffer descriptors is required to forward data, cutting down on the overhead associated with the management of linked buffers. Transmit and receive operations are highly symmetrical, leading to maximum efficiency of bus and memory usage. When storing packets, CPPI can offset the data so that, for example, IP addresses are not broken across 32-bit words, saving extra fetches and reassembly when the CPU needs an address.
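A short worked example shows why an offset helps. It assumes an untagged Ethernet II frame carrying IPv4 and a buffer that starts on a 32-bit boundary; the function names are illustrative, and the 2-byte offset is simply the value that works for this frame type, not a documented CPPI setting.

```python
ETHERNET_HEADER_LEN = 14   # destination MAC (6) + source MAC (6) + EtherType (2)
IPV4_SRC_OFFSET = 12       # source address offset within the IPv4 header
IPV4_DST_OFFSET = 16       # destination address offset within the IPv4 header
WORD = 4                   # bytes in a 32-bit word


def address_offsets(buffer_offset: int) -> tuple:
    """Byte positions of the IPv4 source and destination addresses in the buffer."""
    ip_start = buffer_offset + ETHERNET_HEADER_LEN
    return ip_start + IPV4_SRC_OFFSET, ip_start + IPV4_DST_OFFSET


def words_touched(start: int, length: int = 4) -> int:
    """Number of 32-bit words spanned by a field of `length` bytes at `start`."""
    first_word = start // WORD
    last_word = (start + length - 1) // WORD
    return last_word - first_word + 1


# Frame stored at the start of the buffer: addresses land at bytes 26 and 30,
# so each one straddles two words.
unaligned = [words_touched(pos) for pos in address_offsets(0)]

# Frame stored 2 bytes into the buffer: addresses land at bytes 28 and 32,
# so each one sits in a single word.
aligned = [words_touched(pos) for pos in address_offsets(2)]
```

With no offset, the CPU needs two fetches plus a shift-and-merge for each address; with the 2-byte offset, one fetch per address is enough.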

The performance efficiency of distributed DMA control and CPPI is enhanced by replacing the common data bus on chip with a switched central resource (SCR). Switching enables multiple data transfers to take place simultaneously between different master-slave sets.

At first glance, it might seem that implementing a switch would require considerable space on the die and a corresponding increase in cost. But our experience is that an SCR requires only about 15,000 more gates than a conventional bus, a small investment in die space compared with the increase in performance that results from simultaneous transfers. Also, since the peripheral interfaces are able to transmit much faster with a switch, port buffer sizes can be decreased, offsetting the increased size of the switching fabric.

Although multiple ports can transfer simultaneously to each other and to different memory areas, only one transfer can take place at a time through each memory/slave interface. Therefore, while the SCR eliminates the bus as a data bottleneck, care must be taken not to create a new bottleneck at the external memory interface (EMIF) to system memory. That potential problem can be minimized, if not eliminated, by storing CPPI buffer address descriptors in on-chip SRAM instead of in the main memory. The CPU can then access the on-chip SRAM for descriptors, instead of clogging the EMIF with stores and fetches. The EMIF is thus freed to handle DMAs much more efficiently.
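The effect of the one-transfer-per-slave rule can be seen in a toy scheduling model. Everything here is an assumption made for illustration: the port names, the cycle counts and the greedy in-order scheduling are not measurements or a description of any real SCR arbitration scheme.

```python
from collections import defaultdict
from typing import List, Tuple

Transfer = Tuple[str, str, int]  # (master, slave, duration in cycles)


def shared_bus_cycles(transfers: List[Transfer]) -> int:
    """A single shared bus carries one transfer at a time."""
    return sum(cycles for _, _, cycles in transfers)


def switched_cycles(transfers: List[Transfer]) -> int:
    """Transfers overlap unless they share a master or a slave port.

    Greedy, in-order model: each transfer starts as soon as both its
    master and its slave are free.
    """
    free_at = defaultdict(int)  # port -> cycle at which it becomes free
    finish = 0
    for master, slave, cycles in transfers:
        start = max(free_at[("m", master)], free_at[("s", slave)])
        end = start + cycles
        free_at[("m", master)] = end
        free_at[("s", slave)] = end
        finish = max(finish, end)
    return finish


# Illustrative workloads: two peripheral DMAs plus CPU descriptor accesses.
descriptors_in_main_memory = [
    ("ethernet", "emif", 100),
    ("dsl", "emif", 100),
    ("cpu", "emif", 40),
]
descriptors_on_chip = [
    ("ethernet", "emif", 100),
    ("dsl", "emif", 100),
    ("cpu", "sram", 40),
]
```

In this model the switch gives no gain while every transfer targets the EMIF (240 cycles either way). Moving the descriptor accesses to on-chip SRAM lets them overlap with the DMAs, and the total drops to the 200 cycles the EMIF needs for packet data alone.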

The size of the instruction and data caches inside the CPU can be a critical performance and cost factor. Tests across a wide range of broadband operations indicate that packet throughput increases gradually with cache size. At certain cache sizes, however, routing throughput rises dramatically. One jump in performance occurs when cache size is increased to 8 kbytes for instructions and 4 kbytes for data, corresponding to many cache implementations on broadband processors today.

An even greater jump occurs when the cache is increased to 16 kbytes for instructions and 8 kbytes for data. While the first jump increases packet throughput by roughly 50 to 100 percent, depending on the operating system tested, the second jump doubles packet throughput with all operating systems tested. For those who are curious, increasing the data cache to the same size as the instruction cache produces only a slight rise in throughput.

When a 16-kbyte/8-kbyte cache is used in conjunction with a scheme for storing CPPI descriptors on-chip, the result is the largest gain in packet throughput, and moving packets is the primary application of a broadband processor.

See related chart




