Reconfigurable processor combines extensible core with SRAM-based FPGA

Michele Borgatti, Francesco Lertora, Benoit Fort, Lorenzo Cal, STMicroelectronics, Innovative Systems Design, Agrate Brianza, Italy

5/13/2002 10:30 AM EDT

This system-chip features an embedded reconfigurable processor built by joining a configurable and extensible processor core with an SRAM-based embedded FPGA. The following article will be presented at the Custom Integrated Circuits Conference and contains excerpts from the paper titled "A Reconfigurable System featuring Dynamically Extensible Embedded Microprocessor, FPGA and Customizable I/O," used with permission from IEEE 2002 CICC.

These days we are witnessing two conflicting trends in the electronics industry. On one side, the economics of system integration pushes logic suppliers towards ever more complex system-chip devices. On the other side, increasing design complexity and its associated risks, rising non-recurring engineering expenses, and shorter time-to-market and product life are causing OEMs to look for design solutions with faster turnaround and lower risk.

The recent introduction of embedded programmable logic allows ASIC and ASSP vendors to broaden the appeal of their products. Also, hardware programmability can be exploited by system integrators for product customization. We propose a pragmatic approach that introduces flexibility in system-chip design and exploits embedded programmable silicon fabrics to enhance system performance. In particular, application-specific configurations that adapt the underlying hardware architecture to time-varying application demands can improve execution speed and reduce power consumption compared to a general-purpose programmable solution.

Embedded programmable logic allows static or dynamic configuration of the instruction set in an embedded microprocessor, the creation of bus-mapped application-specific hardware coprocessors and accelerators, and the customization of the system I/O. The latter feature allows the device to connect to potentially any external unit or sensor, provided that its communication protocol can be mapped to the on-chip programmable logic. Also, some computations can be performed on the fly as data is captured.

The system has been built using a set of state-of-the-art IP cores. In particular, a configurable and extensible processor with associated tools, and an embedded FPGA were used. The resulting system has been developed to target image and voice processing and recognition applications.

One of the main goals of this work was to build a flexible architecture, working at a reasonably high clock frequency, using an embedded FPGA and an extensible 32-bit microprocessor. The base processor comes with a complete set of tools for configuration and performance analysis. The main features of the processor core are a 5-stage pipeline, direct-mapped data and instruction caches, a 24- or 16-bit instruction format for improved code density, a 64-bit processor interface (PIF) with burst transfers for cache-page refill, and 13 interrupt lines organized in four priority interrupt levels.

Application-specific bus-mapped coprocessors and flexible I/O can also be added and dynamically modified by reconfiguring the embedded FPGA. The chip was fabricated in a 0.18 micron CMOS technology, and the embedded FPGA accounts for about 40% of the area.

A PIF/AHB bridge translates processor cycles to the AMBA AHB bus with support for fast burst and locked transfers. An external memory interface (EMI) exploits the available peak throughput of the fastest commercial external non-volatile flash memories. It allows a wide range of burst mode and page mode configurations under software control and supports low-voltage, low-swing operations. If required, an external RAM port allows the extension of the on-chip 48-kbyte SRAM.

At the heart of the system is the embedded FPGA and its multiple interfaces to main system units. In particular, the embedded FPGA functions as an extension to the processor datapath supporting a set of additional special-purpose instructions (TIE). This is done by connecting the processor datapath through a wide bus and a specific interface (TIE bus/interface). Hardware units mapped into the e-FPGA can be interfaced to the system bus through an AHB bus master/slave.

Another use for the e-FPGA is for flexible I/O. The programmable general-purpose I/O pad interface is used to connect external units or sensors with their application-specific communication protocol. All of these possibilities may be mixed in a single configuration for the FPGA and this results in a highly configurable device. To accelerate communications between the configurable hardware and software tasks running on the processor, four interrupt channels can be driven by logic mapped into the e-FPGA. Two-way HW/SW communication can be implemented by the joint use of these interrupt channels and dedicated AMBA APB registers.

Processor based on an embedded FPGA achieves a combination of both speed and flexibility. A high-speed link between the hardware and software segments of an algorithm is implemented with logic downloaded to the FPGA that drives four interrupt channels. The AMBA APB bus connects all configuration and general purpose registers to the bus.
Source: STMicroelectronics

The FPGA bitstream is downloaded through a flexible programming interface. To allow validation of the FPGA configuration, hardware support is provided for reading the bitstream back. Most audio or video applications require storage buffers to interface fast decoding hardware with slower software running on the processor. With this in mind, a one-kbyte dual-port buffer has been added and organized as 4 x 256-byte rows. One port of this buffer is connected to the AHB bus, while the second port is accessed directly by the FPGA through its dual-port buffer interface. The AMBA APB bus connects all the configuration and general-purpose registers to the system.
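The buffer organization described above can be illustrated with a small software model. This is only a sketch: the article gives the size (one kbyte) and the organization (4 x 256-byte rows), but the linear address mapping, the class name `DualPortBuffer` and the method names are assumptions made for illustration, not the chip's actual interface.

```python
ROWS = 4          # from the article: four rows
ROW_BYTES = 256   # from the article: 256 bytes per row


class DualPortBuffer:
    """Toy model of the one-kbyte dual-port buffer (hypothetical interface)."""

    def __init__(self):
        self.mem = bytearray(ROWS * ROW_BYTES)

    @staticmethod
    def split(addr):
        """Assumed mapping: upper 2 bits select the row, lower 8 bits the byte."""
        if not 0 <= addr < ROWS * ROW_BYTES:
            raise IndexError("address outside the 1-kbyte buffer")
        return addr >> 8, addr & 0xFF

    def fpga_write_row(self, row, data):
        """FPGA-side port: fill one complete row with decoded data."""
        if not 0 <= row < ROWS or len(data) != ROW_BYTES:
            raise ValueError("expected a valid row index and exactly 256 bytes")
        start = row * ROW_BYTES
        self.mem[start:start + ROW_BYTES] = data

    def ahb_read(self, addr):
        """AHB-side port: the processor reads one byte by linear address."""
        row, col = self.split(addr)
        return self.mem[row * ROW_BYTES + col]


buf = DualPortBuffer()
buf.fpga_write_row(2, bytes(range(256)))   # hardware fills row 2
last_byte = buf.ahb_read(0x2FF)            # software reads the end of row 2
```

With a row-based layout like this, hardware mapped into the e-FPGA could fill one row while software drains another, which is the usual reason for placing such a buffer between a fast producer and a slower consumer.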

On the same bus, an I2C master interface has been added to connect external devices or sensors such as an LCD display or a CMOS camera. A programmable general-purpose I/O module features unidirectional (input or output) and bidirectional pads under the control of both the e-FPGA and the microprocessor.

The configurable processor allows the user to add custom instructions. In the proposed architecture, this capability was mapped exclusively into the e-FPGA, allowing runtime re-configuration of the instruction set. This implies that the number of user-defined instructions available at a given time is limited by the e-FPGA logic capacity and instruction logic complexity. However, a set of additional instructions can be defined to target specific application needs. If the logic size of the set of additional instructions exceeds the logic capacity of the e-FPGA, it might be split into a number of contexts fitting the size constraints of the e-FPGA.
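The splitting of an oversized instruction set into contexts can be sketched as a bin-packing problem. The article does not say how the partitioning is done, so the first-fit-decreasing heuristic below is an assumption, as are the function name, the instruction names and their logic sizes. The capacity of 3000 cells is borrowed from the MFC count the article quotes for the e-FPGA.

```python
def split_into_contexts(instr_sizes, capacity):
    """First-fit-decreasing packing of instructions into e-FPGA contexts.

    instr_sizes: dict mapping instruction name -> logic size in cells
    capacity:    logic capacity of the e-FPGA in the same unit
    Returns a list of contexts, each a list of instruction names.
    """
    contexts = []  # each entry is [remaining_capacity, [instruction names]]
    # Place the largest instructions first; ties are broken by name.
    for name, size in sorted(instr_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > capacity:
            raise ValueError(f"{name} does not fit in the e-FPGA at all")
        for ctx in contexts:
            if ctx[0] >= size:          # first context with enough room
                ctx[0] -= size
                ctx[1].append(name)
                break
        else:                           # no existing context fits: open a new one
            contexts.append([capacity - size, [name]])
    return [names for _, names in contexts]


# Hypothetical instructions and sizes, for illustration only.
sizes = {"sad16": 1800, "mac4": 1400, "dct_row": 1200, "clip": 300}
contexts = split_into_contexts(sizes, capacity=3000)
```

A practical partitioner would probably also group instructions that are used together in the same program phase, since every switch between contexts costs a reconfiguration.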

These contexts might be used to dynamically re-program the FPGA to support application needs. This flexibility advantage implies a speed penalty for the part of logic mapped inside the e-FPGA. In particular, specific processor instructions mapped into the reconfigurable fabric may be one to 10 times slower than their equivalent implementation in standard cells.

As the additional instruction set is part of the processor pipeline, slowing down this logic results in a drastic reduction of processor maximum speed, hence affecting the processor performance when using the baseline general-purpose instruction set. A mechanism is introduced to allow the processor to be clocked at its maximum speed while executing standard instructions, whereas it is slowed down by a programmable, instruction-dependent number of cycles (1-16) when executing processor instructions mapped into the FPGA.

A clock control system allows the processor to be synchronized with the e-FPGA for the number of cycles an instruction takes to execute. A dedicated module identifies instructions whose execution time is not aligned with the processor clock. Because each of these instructions needs to be associated with its execution time, the set was partitioned. A predefined map table divides the whole set of opcodes reserved for user-defined instructions into four sets. For each set that belongs to a configuration, a number, mapped as a constant output of the FPGA, defines the number of times the clock needs to be stretched to synchronize the execution of the pipeline between the FPGA and the base processor.
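The effect of this clock-stretching scheme on cycle counts can be sketched with a simple model. The four opcode sets and the 1-16 range come from the article; everything else is an assumption made for illustration: the map table is modeled as consecutive opcode ranges, a standard instruction is charged one base-clock cycle, and an FPGA-mapped instruction is charged its set's stretch count.

```python
NUM_SETS = 4       # from the article: user opcodes are divided into four sets
MAX_STRETCH = 16   # from the article: slowdown is programmable from 1 to 16


def opcode_set(user_opcode, opcodes_per_set):
    """Hypothetical map table: consecutive opcode ranges form the four sets."""
    index = user_opcode // opcodes_per_set
    if not 0 <= index < NUM_SETS:
        raise ValueError("opcode outside the user-defined range")
    return index


def total_cycles(trace, stretch_per_set, opcodes_per_set=16):
    """Count base-clock cycles for an instruction trace.

    trace:           list where None is a standard instruction and an int
                     is the opcode of a user-defined (FPGA-mapped) instruction
    stretch_per_set: four stretch counts, one per opcode set, standing in for
                     the constants the current FPGA configuration drives
    """
    if len(stretch_per_set) != NUM_SETS:
        raise ValueError("one stretch count per opcode set is required")
    if any(not 1 <= s <= MAX_STRETCH for s in stretch_per_set):
        raise ValueError("stretch counts must lie in 1..16")
    cycles = 0
    for op in trace:
        if op is None:
            cycles += 1   # standard instruction runs at full speed
        else:
            cycles += stretch_per_set[opcode_set(op, opcodes_per_set)]
    return cycles


# Four standard instructions plus one from set 0 and one from set 1.
trace = [None, None, 3, None, 20, None]
cycles = total_cycles(trace, stretch_per_set=[4, 8, 1, 1])
```

In this model only the two FPGA-mapped instructions pay the stretch penalty, so the trace costs 16 cycles rather than the 48 it would cost if every instruction were slowed to the slowest stretch count in use.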

The architecture of the e-FPGA is organized as a hierarchical multi-level interconnect network. An array of logic elements called multi-function logic cells (MFC) allows implementation of digital logic. The MFC is a four-input, one-output programmable structure combining a four-input look-up table and a storage element (D flip-flop or latch). There are three thousand MFCs shared among 24 clusters. The global interconnect network links the clusters together and to the IPads and OPads peripheral cells. At a lower level, a local interconnect network links MFCs together and to the global network.
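A logic cell of the kind described here, a four-input look-up table followed by an optional storage element, can be modeled in a few lines. This is a behavioral sketch only: the class name `MFC`, the bit ordering of the LUT contents and the method names are assumptions, and the per-cluster figure assumes the three thousand cells are spread evenly over the 24 clusters, which the article does not state.

```python
MFCS_PER_CLUSTER = 3000 // 24   # 125, assuming an even split across clusters


class MFC:
    """Behavioral model of a 4-input LUT with an optional flip-flop."""

    def __init__(self, lut_bits, registered=False):
        if not 0 <= lut_bits < (1 << 16):
            raise ValueError("a 4-input LUT holds exactly 16 configuration bits")
        self.lut_bits = lut_bits      # bit i is the output for input pattern i
        self.registered = registered  # True: output comes from the flip-flop
        self.state = 0                # flip-flop contents

    def lut(self, a, b, c, d):
        """Combinational look-up: inputs a..d form a 4-bit index (a is the LSB)."""
        index = (a & 1) | ((b & 1) << 1) | ((c & 1) << 2) | ((d & 1) << 3)
        return (self.lut_bits >> index) & 1

    def clock(self, a, b, c, d):
        """Model one clock edge and return the cell output after it."""
        value = self.lut(a, b, c, d)
        if self.registered:
            self.state = value
            return self.state
        return value


# Configure a cell as a 4-input parity (XOR) function.
PARITY = sum((bin(i).count("1") & 1) << i for i in range(16))
cell = MFC(PARITY)
out = cell.lut(1, 0, 1, 1)   # three inputs set, so parity is 1
```

The 16 configuration bits per look-up table are part of what the bitstream has to load, alongside the routing configuration of the local and global interconnect.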

The full chip has been implemented in a standard CMOS 1.8V/3.3V, 0.18 micron technology featuring six metal layers. The layout of the system has been integrated using commercial place and route tools for digital ASICs. To avoid multiple external power supplies, an internal DC (3V to 1.8V) voltage regulator has been integrated. The chip is being tested and is fully functional at a clock rate of 175MHz. The processor system is able to reconfigure the e-FPGA at full speed. Reconfiguration takes about 500 microseconds at a clock rate of 100MHz.

The system is being tested using both a face recognition application and a speech recognition application. During architecture development, we reported speedups of between four and eight times by using instruction extensions to accelerate face recognition computing kernels.
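The reconfiguration time and the speedups quoted above can be combined into a rough break-even estimate. The figures (about 500 microseconds at 100MHz, speedups of four to eight times) come from the article; the model itself is an assumption: it supposes that the processor does no useful work while the e-FPGA is reconfigured, and that the kernel runs at the same clock rate before and after.

```python
import math

RECONFIG_TIME_S = 500e-6   # from the article: about 500 microseconds
CLOCK_HZ = 100e6           # from the article: at a 100MHz clock

# Reconfiguration overhead expressed in processor cycles.
reconfig_cycles = round(RECONFIG_TIME_S * CLOCK_HZ)


def break_even_cycles(speedup, overhead=reconfig_cycles):
    """Smallest software cycle count N for which N / speedup + overhead <= N.

    Below this size the time saved by the custom instructions does not
    repay one reconfiguration (under the assumptions stated above).
    """
    if speedup <= 1:
        raise ValueError("no break-even point without a speedup above 1")
    return math.ceil(overhead * speedup / (speedup - 1))


low = break_even_cycles(4)    # kernel accelerated four times
high = break_even_cycles(8)   # kernel accelerated eight times
```

Under these assumptions a reconfiguration costs 50,000 cycles, so a kernel has to represent roughly 57,000 to 67,000 cycles of software work before loading a new context pays for itself.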