DSPs Are Hot
Ray Weiss
6/13/2000 12:00 AM EDT
Today, DSP processing is a hot technology arena. The growth rate for programmable DSP processors has been running at some 35+% for a number of years. That growth has attracted a number of alternative technologies, including reconfigurable DSP functions. And the FPGA vendors have pretty much recognized DSP as a specialized growth area. Every major FPGA vendor (Xilinx, Altera, QuickLogic, and Atmel) is addressing the DSP designer's needs.
Fueling the FPGA charge into DSP processing is the ever-present silicon curve. Silicon developments have moved FPGA process geometries down to 0.15 micron and lower, enabling FPGA-based designers to pack more and faster logic onto a single chip.
Driving this FPGA move toward DSP functions has been the emergence of the wireless market, in particular the equipment needed to deploy wireless infrastructure. "Wireless is in many cases the driver for DSP processing on reconfigurable logic," said Will Strauss, president of Forward Concepts, the leading analyst firm for the DSP marketplace and technology. "The processing power needed for the wireless infrastructure has been increasing, and in many instances is greater than past and most current DSPs could supply. Thus, designers have turned to FPGAs to do the on-the-fly, front-end processing."
Consequently, the market numbers for FPGA-based DSP processing are rising. According to Will Strauss, reconfigurable (FPGA-based) DSPs will show a compound annual growth rate of some 30%. Sales are projected to grow from $84.5 M in 1999 to $185 M in 2002. That said, we should recognize that the DSP market is much, much larger and that reconfigurable DSP logic will represent at most some 10% of that market.
| YEAR | RECONFIGURABLE LOGIC (FPGAs) | DSP PROGRAMMABLE PROCESSORS |
| 1999 | $84.5 M | $4.4 B |
| 2000 | $109 M | $6.1 B |
| 2001 | $142 M | $8.2 B |
| 2002 | $185 M | $19.2 B |
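The growth figure of roughly 30% can be checked against the table's endpoints. The sketch below is a minimal check, assuming three yearly compounding steps between 1999 and 2002; the helper name `cagr` is illustrative and not from the source.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate over `periods` yearly steps."""
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# Reconfigurable-logic DSP sales from the table above, in $M.
fpga_sales = {1999: 84.5, 2000: 109.0, 2001: 142.0, 2002: 185.0}

rate = cagr(fpga_sales[1999], fpga_sales[2002], 2002 - 1999)
print(f"Reconfigurable DSP CAGR, 1999-2002: {rate:.1%}")
```

The result comes out just under 30%, in line with the analyst's figure.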
Approach 1—More, Faster Gates
No designer wants fewer gates or slower logic. One way to handle complex processing problems is to throw lots of fast logic and state blocks at it. Lots of fast gates and state elements can handle a surprising number of problems. Faster logic lets you do more between clocks and more logic lets you spread the problem out vertically, putting more logic to work in the same clock period. With more logic modules or engines to do the work in parallel, you get even more done in the same clock time.
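To put rough numbers on the parallelism argument, consider a hypothetical 64-tap FIR filter, a workload chosen here for illustration and not taken from the article. The sketch assumes each multiply-accumulate (MAC) unit handles one tap per clock and ignores pipeline latency and adder-tree delay; the function names are made up for the example.

```python
import math


def cycles_per_output(num_taps, num_mac_units):
    """Clocks needed per output sample when taps are shared among MAC units."""
    return math.ceil(num_taps / num_mac_units)


def output_rate_hz(clock_hz, num_taps, num_mac_units):
    """Output sample rate for a given clock and degree of parallelism."""
    return clock_hz / cycles_per_output(num_taps, num_mac_units)


# Assumed figures: 64 taps, 200 MHz internal clock.
for units in (1, 4, 16, 64):
    rate = output_rate_hz(200e6, 64, units)
    print(f"{units:2d} MAC units -> {rate / 1e6:7.3f} M samples/s")
```

Under these assumptions, throughput scales with the number of MAC units until there is one unit per tap, which is the kind of scaling that a large gate budget makes possible.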
Today's top-of-the-line FPGAs have already passed the 1 M usable gate barrier. They are also driving internal system clock rates to 200 MHz, 400 MHz and beyond, while supporting I/O rates on the order of 600 to 800 MHz to support Telecom OC-2 and higher links. The leading FPGA architecture for DSPs today is Xilinx's Virtex I, which packs up to 1 M available gates into a single chip. And that's not all. Xilinx just announced its next-generation FPGA family, Virtex II, which goes to 10 M gates, providing a veritable bonanza of available gates for DSP processing. Altera is also pushing the gate count limits with its APEX 20K family. These FPGAs deliver up to 1.52 M usable gates.
Approach 2—Hybrid FPGAs With Special Blocks
More and more gates and logic cells may not be the answer. Many FPGA vendors have realized that you can't build everything with the basic FPGA logic/state cell, and that the FPGA tinker-toy approach doesn't always work for large datapath elements. So they've shifted to adding specialized blocks for higher datapath element efficiencies.
The first and most important blocks added were RAM blocks. These blocks of RAM are placed strategically in the silicon die data-flow and provide high-density RAM without the logic wastage that was common in earlier FPGAs when the logic cell flip-flops were used as RAM, leaving much of the cell logic unused.
More RAM, and more efficient RAM, is critical to many of the newer applications, which are "data-flow" oriented rather than control oriented. Data-flow applications can be characterized as involving a flow of data that passes through one or more processing stages. These stages process the data and move the results on to the next stage.
Many of the growth applications—Telecom, Datacom, and wireless base stations—are all examples of such data-flow. For these applications, RAM is needed to hold the incoming data, hold intermediate results, and to buffer output flows. Buffering inputs, intermediate and output results enables the logic to maintain the processing flow, producing high throughput efficiencies.
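The buffered, staged flow described above can be modeled in a few lines. This is a software sketch only, assuming one item per stage per clock tick; the names `run_pipeline`, `scale`, and `offset` are hypothetical stand-ins for real processing stages.

```python
from collections import deque


def run_pipeline(samples, stages):
    """Push samples through processing stages separated by FIFO buffers."""
    # fifos[0] buffers the input, fifos[-1] buffers the output, and the
    # ones in between hold intermediate results.
    fifos = [deque() for _ in range(len(stages) + 1)]
    fifos[0].extend(samples)

    # Each pass of the loop models one clock tick.
    while any(fifos[:-1]):
        # Walk the stages back to front so an item advances one stage per tick.
        for i in reversed(range(len(stages))):
            if fifos[i]:
                fifos[i + 1].append(stages[i](fifos[i].popleft()))
    return list(fifos[-1])


def scale(sample):
    return sample * 2


def offset(sample):
    return sample + 1


result = run_pipeline([1, 2, 3], [scale, offset])
print(result)  # [3, 5, 7]
```

Once the pipeline fills, every stage is busy on every tick, which is why buffer RAM between stages matters for sustained throughput.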
Today, the FPGA-based memory is there. For example, QuickLogic deploys 83 Kbits of block RAM in its QuickDSP FPGA family; these are RAM blocks at the top and bottom of its FPGA array. Altera packs in some 442 Kbits of block memory. This memory is distributed in Embedded System Blocks (ESBs) of 2 Kbits; the ESBs can be configured as RAM, ROM, FIFO, or CAM memory. And, last but not least, Xilinx's Virtex I integrates some 131 Kbits of block SRAM.
FPGA vendors add more than memory to the FPGA DSP mix. They're adding processing blocks as well. These are specialized arithmetic functional blocks that combine ALU, MPU, and MAC operations. Generally, they can be scaled, i.e., cascaded to provide larger field arithmetic.
For example, QuickLogic is taking an almost ASIC-like flow approach to its blocks. Its Eclipse family of FPGAs integrates I/O, RAM blocks, and FPGA logic cells with up to 18 Embedded Computation Units (ECUs), which are hardwired logic not built on FPGA cells. These ECUs are mathematical processing units with 8-bit multipliers, 16-bit adders, and internal registers. They can be cascaded for larger arithmetic processing. The ECUs are supplemented with up to 662 K usable gates and 82 Kbits of block RAM.
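The article does not say how ECUs are wired together when cascaded. As a generic illustration of building wider arithmetic from 8-bit multipliers, the sketch below forms a 16x16 unsigned product from four 8x8 partial products. The function names are hypothetical, and `mul8` merely stands in for one hardwired multiplier.

```python
def mul8(a, b):
    """Stand-in for one hardwired 8x8 unsigned multiplier (16-bit product)."""
    assert 0 <= a < 256 and 0 <= b < 256
    return a * b


def mul16_from_mul8(x, y):
    """16x16 unsigned multiply built from four 8x8 partial products."""
    assert 0 <= x < 65536 and 0 <= y < 65536
    x_hi, x_lo = x >> 8, x & 0xFF
    y_hi, y_lo = y >> 8, y & 0xFF
    # Shift each partial product to its weight, then sum them.
    return ((mul8(x_hi, y_hi) << 16)
            + ((mul8(x_hi, y_lo) + mul8(x_lo, y_hi)) << 8)
            + mul8(x_lo, y_lo))


print(mul16_from_mul8(1234, 5678) == 1234 * 5678)  # True
```

In hardware the four partial products could be computed in parallel by four multipliers or sequentially by one; that choice trades area for clock cycles.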
Approach 3—Reconfigurable Logic Supported by Fixed Blocks
A third approach to FPGA DSP processing is to integrate the DSP functions in FPGA logic with an on-chip standard processor and support blocks. With this approach, you'd use the FPGA structure to build the specialized DSP processing functions, but also rely on fixed support blocks to handle the normal I/O, coordination, state and data storage functions.
Thus, you have a relatively fixed hardware architecture integrated with the flexibility of FPGA logic blocks. This architecture relies more on software control, via an on-chip processor, than on hardwired control. The software also provides a level of flexibility, while the design gains hardware efficiencies from fixed, hardwired blocks that are not built from FPGA logic cells.
An even greater advantage can come from using the fixed control resources (the on-chip processor and memory) to dynamically reconfigure the FPGA cell resources. The processor can control the swapping of logic into FPGA banks, using the on-chip memory to hold, or cache, the FPGA logic cell descriptions that are loaded into the FPGA's cell-defining RAM. Specialized DSP processing or signal conditioning functions can be loaded to process the incoming data, then swapped out for the next DSP processing function. A design can use two FPGA banks, one running the current logic while the second is loaded for the next logic iteration.
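The two-bank scheme can be sketched as a toy software model. Everything here is an assumption made for illustration: the class `TwoBankFabric`, its method names, and the use of plain functions to stand in for cached logic cell descriptions. It does not describe any vendor's actual reconfiguration interface.

```python
class TwoBankFabric:
    """Toy model of two FPGA logic banks managed by an on-chip processor."""

    def __init__(self, config_cache):
        # name -> function; stands in for cached logic cell descriptions
        self.config_cache = config_cache
        self.banks = [None, None]
        self.active = 0

    def preload(self, name):
        """Load a cached configuration into the idle bank."""
        self.banks[1 - self.active] = self.config_cache[name]

    def swap(self):
        """Make the freshly loaded bank the active one."""
        if self.banks[1 - self.active] is None:
            raise RuntimeError("idle bank has no configuration loaded")
        self.active = 1 - self.active

    def process(self, block):
        """Run a block of samples through the active bank's logic."""
        logic = self.banks[self.active]
        if logic is None:
            raise RuntimeError("active bank has no configuration loaded")
        return [logic(sample) for sample in block]


cache = {
    "gain": lambda s: s * 2,
    "clip": lambda s: max(-100, min(100, s)),
}
fabric = TwoBankFabric(cache)
fabric.preload("gain")
fabric.swap()
fabric.preload("clip")                    # idle bank loads while "gain" is active
stage1 = fabric.process([10, 60, -70])    # [20, 120, -140]
fabric.swap()
stage2 = fabric.process(stage1)           # [20, 100, -100]
print(stage1, stage2)
```

The point of the second bank is that loading the next function overlaps with running the current one, so the swap itself is the only time the data path is interrupted.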
Atmel has created such a combined FPGA product, called FPSLIC. This architecture integrates Atmel's AVR 8-bit, 40 MHz RISC microcontroller, with block RAM and a set of RAM-based FPGA blocks.
The microprocessor controls operations and comes with a set of on-chip peripherals. It has an on-chip hardware multiplier accelerator (8 x 8). The microprocessor runs from up to 32 KB of on-chip RAM and can support external memory as well. The FPSLIC includes up to 40 K FPGA system gates that are RAM based. Additionally, the FPGA logic incorporates up to 18.4 Kbits of distributed, single-port RAM.
Interestingly, each logic cell can be configured into a number of modes. These include a 2-bit adder/counter, 4-bit random logic, a 2-bit multiplexer (with select and enable), or a 2-bit multiplier cell. The multiplier cell configurations can be connected together to form large multiplier arrays for DSP processing.
The Atmel engineers took a very straightforward approach to reconfigurable logic, or as they refer to it, Cached Logic. The underlying RAM that defines a block of FPGA logic cells can be loaded to reconfigure the logic. That RAM can be loaded from the chip's block RAM, which can be said to be "caching the logic," much as a RISC CPU's cache memory "caches" executable code and programs.
| FPGA | # Usable Gates | RAM Blocks (Kbits) | System Clock (MHz) | Function Blocks | Comments |
| Altera APEX | 1.5 M (RAM based) | 442.3 Kbits | 200 MHz | Mixed cells with PLD logic, LUT-based logic | |
| Atmel FPSLIC | 40 K (RAM based) | 18.4 Kbits | | 8-bit uP, 36 KB I & D SRAM, uP peripherals | SOC with CPU, memory, & reconfigurable FPGA logic |
| QuickLogic QuickDSP | 660 K (antifuse) | 82.9 Kbits dual-ported | 400 MHz | 18 ECUs (8-bit Adder, 16-bit MPY/MAC) | Flow-thru arch with layers: RAM, CLB, FPGA logic, RAM |
| Xilinx Virtex I | 1.124 M (RAM based) | 131 Kbits | 200 MHz | RAM based, opens underlying LUTs, logic to use | |
| Xilinx Virtex II | 10 M (RAM based) | 262 Kbits | 200 MHz | 1.5 V CMOS, 0.12 micron transistors, 8-layer metal | |
MathWorks, the leading vendor of DSP algorithm design tools, has linked up with FPGAs. Its MATLAB tools, which many DSP developers use to create and test their algorithms, are now integrated with the FPGA vendor toolsets. For example, MathWorks has, in conjunction with Xilinx, brought out the Xilinx System Generator. This tool bridges the System Design Domain (MATLAB and Simulink) and the Hardware Design Domain (HDL design, hardware simulation, synthesis/placement/routing, and timing verification).
The Xilinx System Generator provides a link and a feedback path between the two domains, between system and algorithm design and FPGA hardware implementation. The tool automatically calls the Xilinx CORE generator and maps the MATLAB design into FPGA terms. This tool implements fully parameterized, high-performance LogiCORE DSP cores for the Xilinx FPGAs. These cores are implemented in FPGA logic cells, both at the logic level and at the inner cell LUT level (using the FPGA's internal logic mapping resources).
Designers can build a full DSP system and then automatically generate an HDL representation that is compatible with the Xilinx FPGA architecture. Designs can be generated from the MATLAB side, or HDL-based designs can be pushed up to the MATLAB interface for algorithm development and checking. The Xilinx DSP System Generator will be available from Xilinx in 4Q00.



