High Bandwidth Systems for High Performance DSP Applications

Gerard Vichniac

12/17/1998 12:00 AM EST



As signal processing systems become more sophisticated, they present an ever-growing, ever-more-demanding stream of data to the computers embedded within them. For example, advanced airborne radar applications such as space-time adaptive processing (STAP) deliver a data stream from the radome to the computer on a continuing basis. Every bit in the stream is critical, even if it represents an empty cell of space, because the only safe assumption about a missing cell is that it contains a threat. In fact, the ability to deploy high-end applications is often determined by whether small enough, fast enough computers exist to handle their data streams.

These "stream computing" applications are most often deployed on multiprocessor systems built around the VMEbus. This is due in part to the mature set of auxiliary data buses available in this space. These auxiliary buses provide a pathway for data exchange between system elements, eliminating the bottleneck of carrying data, as well as system communications, on the VMEbus itself. Many provide multiple simultaneous transfers at data rates that greatly exceed the bandwidth of the VMEbus, with the determinism and low latency that real-time applications require.

Mercury Computer Systems' RACEway Interlink was one of the earliest VME auxiliary bus architectures. Introduced in 1993, it was adopted as an ANSI/VITA standard in 1995. It provides multiple 160 MB/s pathways within systems, implemented through a switched-fabric crossbar solution that supports networks of up to 1000 processor nodes and more than 1 GB/s in aggregate bandwidth.

Mercury's own implementation of the RACEway Interlink standard is its RACE family of heterogeneous multicomputing systems. While RACE remains the architecture of choice for many government, medical, and commercial signal and image processing applications, such as high-end MRI systems, airborne STAP radar, and signal intelligence processors, Mercury recognized the need for a compatible migration path to even higher performance, aggregate bandwidth, and processing power within multicomputer systems.


The RACE++ Architecture
Mercury recently announced RACE++, the next generation of its RACE switched-fabric architecture (since this announcement, the original RACE architecture has come to be known as RACE 1.0).

The RACE++ architecture achieves revolutionary performance increases for stream computing system designers through a number of targeted, evolutionary refinements to the original RACE 1.0 (see Table 1).

Metric                                  RACE 1.0                RACE++
Port-to-Port Bandwidth                  160 MB/s                267 MB/s
Crossbar Aggregate Transfer Bandwidth   480 MB/s                1 GB/s
Crossbar Broadcast Bandwidth            800 MB/s                1.8 GB/s
System Processor Scalability            Up to 1000 processors   Up to 4000 processors
Bisection Bandwidth @ 16 Boards         640 MB/s                2.1 GB/s

Table 1:  RACE 1.0 and RACE++ Comparison

Much of the bandwidth enhancement comes from the addition of two more ports to the original crossbar's six, for a total of eight ports per crossbar. A second contributor is a modest increase in clock rate, from 40 to 66 MHz.
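
As a rough check on how port count and clock rate combine, the sketch below recomputes the crossbar figures in Table 1. It rests on assumptions the article does not state: a 32-bit (4-byte) data path moving one word per clock, aggregate bandwidth counted as ports paired off into simultaneous transfers, broadcast counted as one sender delivering to every other port, and a 66.67 MHz clock (the article rounds this to 66 MHz). The function names are illustrative only.

```python
BUS_WIDTH_BYTES = 4  # assumption: 32-bit data path (not stated in the article)


def port_bandwidth(clock_mhz, width_bytes=BUS_WIDTH_BYTES):
    """Peak port-to-port bandwidth in MB/s, assuming one word per clock."""
    return clock_mhz * width_bytes


def aggregate_bandwidth(ports, port_bw):
    """Peak crossbar bandwidth when ports pair off into simultaneous transfers."""
    return (ports // 2) * port_bw


def broadcast_bandwidth(ports, port_bw):
    """Peak delivered bandwidth when one port sends to all of the others."""
    return (ports - 1) * port_bw


# (name, crossbar ports, clock in MHz); 200/3 MHz is an assumed exact clock.
for name, ports, clock in [("RACE 1.0", 6, 40.0), ("RACE++", 8, 200.0 / 3)]:
    bw = port_bandwidth(clock)
    print(
        f"{name}: port {bw:.0f} MB/s, "
        f"aggregate {aggregate_bandwidth(ports, bw):.0f} MB/s, "
        f"broadcast {broadcast_bandwidth(ports, bw):.0f} MB/s"
    )
```

Under these assumptions the RACE 1.0 row comes out at exactly 160, 480, and 800 MB/s, and the RACE++ row lands close to the rounded 267 MB/s, 1 GB/s, and 1.8 GB/s that Table 1 reports.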

Improved topologies multiply these gains by a factor of up to 4X. At the system level, adaptive routing raises sustained bisection bandwidth by up to 6X over RACE 1.0, and system processor scalability rises by up to 4X. The RACE++ crossbar ASIC also offers higher connectivity, allowing more fully connected topologies.

In addition to higher data speed and greater system bandwidth, the RACE++ architecture also provides the following extensions to RACE 1.0 functionality:

  • Scalability
    The RACE++ architecture offers a more richly connected network, together with increased bandwidth, which accelerates interprocessor communication and allows designers to build larger computing systems with mixes of DSP, RISC, and specialty processors. Whereas the RACE 1.0 architecture supports systems containing up to 1000 processors, RACE++ raises this limit to as many as 4000 processors in a single system.

  • Adaptive Routing
    In the RACE 1.0 architecture, the system chooses, in real time, the least congested data paths between data sources and destinations. This adaptive routing is executed entirely in hardware; it does not incur any system or user software overhead. The RACE++ architecture greatly extends the routing adaptability. In RACE 1.0, two (out of six) ports can implement adaptive routing. In RACE++, all eight ports can implement adaptive routing.

  • Enhanced Endpoints
    In the RACE 1.0 architecture, "endpoints" are the source and destination terminations for data transfers. These include arrays in local memory, shared memory buffers, and foreign bus address spaces. They need not be limited to an intelligent processor's memory or bound to any specific process. The RACE++ architecture extends the concept of endpoints to encompass the crossbar ASIC itself. This generalization gives users "handles" for controlling adaptive routing, along with new ways to reconfigure routing in real time to boost determinism.
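
The adaptive routing idea described in the list above can be illustrated with a toy software model. This is only a sketch: the real selection happens in the crossbar hardware, and the article does not say how congestion is measured. The `OutputPort` class, the queued-transfer count used as a congestion measure, and the tie-breaking rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OutputPort:
    port_id: int
    queued_transfers: int = 0  # hypothetical congestion measure


def choose_port(candidates):
    """Pick the least congested candidate port; ties go to the lowest port id."""
    if not candidates:
        raise ValueError("no candidate output ports")
    return min(candidates, key=lambda p: (p.queued_transfers, p.port_id))


def route(candidates):
    """Send one transfer through the chosen port and return that port's id."""
    port = choose_port(candidates)
    port.queued_transfers += 1
    return port.port_id


def worst_queue_depth(num_adaptive_ports, num_transfers):
    """Deepest queue after routing transfers across the adaptive ports."""
    ports = [OutputPort(i) for i in range(num_adaptive_ports)]
    for _ in range(num_transfers):
        route(ports)
    return max(p.queued_transfers for p in ports)


# Two adaptive ports (as in RACE 1.0) versus eight (as in RACE++).
for n in (2, 8):
    print(f"{n} adaptive ports, 16 transfers: worst queue depth {worst_queue_depth(n, 16)}")
```

In this model, more adaptive ports means more alternative paths, so the same burst of transfers spreads out and the worst-case backlog on any one port shrinks.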

With an array of off-the-shelf and custom solutions for applications ranging from computed tomography to battlefield surveillance, the RACE++ architecture offers a fast development environment for new and enhanced applications. RACE++ provides a compatible upgrade path for customers in commercial and military markets as well as for the more than 45 third-party vendors manufacturing more than 60 RACEway-compatible products.


RACE++ Meets Market Demands
The RACE++ architecture is a significant early outcome of Mercury's $100 million commitment to research and development through the year 2003. This investment is designed to satisfy the defense and medical imaging markets' requirement for systems offering 100 Gflops/cubic foot in the near term, en route to 1 teraflop/cubic foot early in the new century.

According to John Entzminger, former Deputy for Technology of the Defense Airborne Reconnaissance Office, and now a private consultant to technology companies including Mercury, the demands of advanced military stream computing applications in the years 2001 through 2003 will require systems with more than 1,000 processors. Such systems would allow the development of a new generation of applications, designed for the uninterruptible processing of large data streams arriving continuously from sensors and other sources.

The performance levels made possible in part by the RACE++ architecture will also have a profound effect in the diagnostic medical field. Three-dimensional images from MRI, CT, and ultrasound scanners may become commonplace, and real-time imaging could facilitate advanced procedures that cannot be accomplished with existing technologies. Advanced imaging applications that fuse data sets from different modalities, such as CT and nuclear medicine, may allow the physician to identify cancerous lesions and locate them precisely for surgical procedures. They may also provide early warnings of cancer recurrence and allow the physician to differentiate between benign and malignant lesions. These procedures, however, will require breaking through the current bottlenecks in data processing.


RACE++ Availability
Shipments of products based on the RACE++ architecture will commence in 1999. Mercury's two VME product lines, the RACE Series multicomputer family and the RACE Series MultiPort family of embedded supercomputers, will provide the increased computational power required for high-end government, medical, and commercial digital signal and image processing. New systems will feature up to 4000 computer nodes in rugged systems that can be deployed aboard ships, on aircraft, and in emergency medical settings.

Mercury also plans to announce RACE++ versions of its two PCI-based systems. These systems scale to hundreds of processors and are primarily directed to the medical imaging market (segments of which are moving from VME- to PCI-based designs), the ground-based defense electronics market, and new commercial markets such as semiconductor testing and digital communications.

Mercury will also begin shipments of the basic RACE++ components in early 1999. The company has already begun an early access program for the many third-party companies that offer "RACEway Ready" products based on the ANSI/VITA standard RACEway Interlink architecture. These products include I/O devices, frame buffers, digital radios, and many other components. Mercury anticipates that several third-party companies will introduce RACE++-compatible versions of their products in parallel with Mercury's system introductions.




