
Power-aware FPGA design (Part 2)

Hichem Belhadj, Vishal Aggrawal, Ajay Pradhan, and Amal Zerrouki, Actel

2/11/2009 12:46 PM EST

Editor's Note: Hi there, and welcome to Part 2 of our three-part mini-series on Power-Aware FPGA design. Since this whole article adds up to around 25 pages, it's probably a good idea to briefly summarize the overall scheme of things as follows:

Part 1
– Abstract
– Introduction
– FPGA Power Components and System Profile
– Fighting Static Power
– Fighting Dynamic Power

Part 2
– Fighting Dynamic Power (continued)

Part 3
– Fighting Dynamic Power (continued)
– Proposed Power Reduction Methodology
– Conclusions
– References


Other techniques to reduce RAM power
There are more opportunities to reduce wasted power, in particular when cascading multiple blocks to build a large RAM, or when the data and/or address bits do not change on every clock cycle. The following sections address these cases.

RAM Cascading
FPGAs offer several embedded RAM blocks, each with a fixed size but a configurable aspect ratio. This feature opens the door to different cascading schemes. Fig 9 illustrates two alternatives that have different timing and power attributes.


Fig 9. Potential embedded RAM cascading schemes for 4Kx4 RAM.

In the first scheme, all the RAM blocks toggle on every clock cycle because their outputs are concatenated to build the output word. In the second scheme, only one RAM block is active at a time; however, this scheme requires overhead logic that may consume extra power and will certainly affect timing. Users should check whether the address-generation logic accesses one RAM block many times before moving on to another. If this address locality is guaranteed, then cascading schemes in which only one RAM is active at a time are viable. The next section on gating the clock and enable signals includes some actual silicon power results.
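As a rough way to compare the two schemes before committing to one, the short Python sketch below counts block accesses and checks address locality for a given address trace. The block geometries (four 4Kx1 blocks versus four 1Kx4 blocks) are assumptions chosen to match a 4Kx4 RAM, and the function names are hypothetical; this is a counting model only, not a power estimate.

```python
# Counting model for two ways of building a 4Kx4 RAM (assumed geometries).

def accesses_width_cascade(addresses, n_blocks=4):
    """Assumed scheme A: four 4Kx1 blocks, one per data bit.
    Every block is accessed on every cycle."""
    return n_blocks * len(addresses)


def accesses_depth_cascade(addresses):
    """Assumed scheme B: four 1Kx4 blocks.
    The upper address bits select the single block that is accessed."""
    return len(addresses)


def block_switches(addresses, block_depth=1024):
    """Count how often consecutive accesses land in different blocks.
    A low count relative to the trace length indicates good address locality."""
    blocks = [a // block_depth for a in addresses]
    return sum(1 for prev, cur in zip(blocks, blocks[1:]) if prev != cur)
```

A sequential sweep through all 4,096 addresses changes block only three times, whereas a trace that strides by 1,024 changes block on every access; the second pattern is the one for which the decoding overhead is least likely to pay off.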

Root and Leaf Clock and Enable Gating
Several experiments were carried out to further reduce the RAM power dissipation in a design that uses a CAM. These experiments considered the RAM cascading options and the root and leaf gating of the read and write clocks, as well as the gating of the read-enable (RE) and write-enable (WE) signals with the address decoding. The silicon power measurement results are summarized in Fig 10.


Fig 10. RAM Cascading Clock and WE/RE gating effect on RAM.

A New Technique to Mitigate Peak RAM Power
Usually, because of timing constraints, FPGA designers do not revisit the read and write clocks and the relationship between them. Peak power is at its worst when the read and write clocks are driven from the same source, because this can lead to simultaneous accesses on dual-port or two-port RAMs.

The proposed technique, inspired by the DDR concept, is to clock the two ports on opposite edges. This guarantees that the accesses are staggered in time, so the power dissipated by each access is spread out rather than coinciding. The technique has been proven on multiple designs, provided that the timing constraints at the input and the output of the RAM port whose clock is inverted are still met.
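The effect of opposite-edge clocking on peak (as opposed to average) power can be illustrated with a toy timeline in Python. Each clock period is split into two half-cycle slots; the per-access energy values are arbitrary placeholders, and the function name is hypothetical. The sketch only shows that the total is unchanged while the per-slot peak drops.

```python
def peak_and_total(n_cycles, read_energy, write_energy, opposite_edges):
    """Return (peak energy in any half-cycle slot, total energy).

    Reads always occur on the rising edge (slot 0 of each period).
    Writes occur on the rising edge too, or on the falling edge (slot 1)
    when opposite_edges is True.
    """
    slots = [0.0] * (2 * n_cycles)
    write_offset = 1 if opposite_edges else 0
    for cycle in range(n_cycles):
        slots[2 * cycle] += read_energy
        slots[2 * cycle + write_offset] += write_energy
    return max(slots), sum(slots)
```

With a read cost of 1.0 and a write cost of 1.5 (arbitrary units), same-edge clocking gives a per-slot peak of 2.5, while opposite-edge clocking gives 1.5 for the same total.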

I/O power dissipation
We have come across a large number of customer designs in which the power burned in the I/O banks is the largest component of the power profile. In addition to the techniques for mitigating static power described in the "Fighting Static Power" section in Part 1, FPGA designers need to work very closely with the system architect and system board designers to challenge their decisions regarding I/O standards selection, interface timing requirements, electrical requirements, pinout constraints, etc.

Power-Aware I/O Standards Selection
Differential I/Os (LVDS, LVPECL) and resistively-terminated I/Os (HSTL, SSTLs, etc.) have relatively high static power but the lowest dynamic power because of the limited voltage swing. The rule is to use these for the highest toggling frequencies.

For low frequencies and relaxed timing, using single-ended I/Os such as LVCMOS has the advantage of lowering the dynamic power – especially when the FPGA offers I/Os supporting voltages as low as 1.2 V.
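The trade-off between the two families can be sketched with a first-order model in which total I/O power is a static term plus a dynamic term proportional to load capacitance, the square of the voltage swing, and the effective switching frequency. This model and all the numbers used with it are simplifying assumptions for illustration, not datasheet values; real terminated standards need the vendor's power calculator.

```python
def io_power(static_w, load_f, swing_v, toggle_hz):
    """First-order estimate: static power plus C * V^2 * f dynamic power (watts)."""
    return static_w + load_f * swing_v ** 2 * toggle_hz


def crossover_hz(static_a, swing_a, static_b, swing_b, load_f):
    """Switching frequency at which standards A and B dissipate equal power.

    Assumes A has the higher static power and the smaller swing
    (for example a differential standard) and B the opposite.
    """
    return (static_a - static_b) / (load_f * (swing_b ** 2 - swing_a ** 2))
```

For example, with placeholder values of 10 mW static power and a 0.35 V swing for the low-swing standard, no static power and a 2.5 V swing for the single-ended one, and a 10 pF load, `crossover_hz(0.01, 0.35, 0.0, 2.5, 10e-12)` gives the frequency above which the low-swing standard wins in this simplified model.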

Other Techniques for I/O Power Reduction
The first technique is to reduce the number of I/Os by reconsidering how the design or function is partitioned over several devices, or by eliminating I/Os that can be time-multiplexed.

To reduce the activity or toggling rates of the I/Os, designers need to eliminate unnecessary glitches at the output of the I/O drivers. If that is too complex, the alternative is to use a tristate output buffer instead of a simple output and to pay close attention to the logic that generates the tristate enable signal. Another technique, widely adopted as good design practice, is to select a bus encoding that reduces the number of toggling bits by keeping successive values on the bus correlated.
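As one illustration of how bus encoding affects toggling, the Python sketch below compares plain binary with Gray code for a counting sequence. Gray code is our choice of example here, not an encoding the article prescribes, and it only helps when successive bus values are consecutive.

```python
def gray(n):
    """Convert a non-negative integer to its reflected binary Gray code."""
    return n ^ (n >> 1)


def toggles(sequence):
    """Total number of bit flips between successive values on the bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(sequence, sequence[1:]))


counts = list(range(16))
binary_toggles = toggles(counts)                   # 26 bit flips
gray_toggles = toggles([gray(n) for n in counts])  # 15 bit flips, one per step
```

For a 4-bit counter stepping from 0 to 15, binary encoding flips 26 bits in total while Gray encoding flips exactly one bit per step.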

In the area of "know your FPGA," users need to know that for some terminated I/O standards, the power dissipated when driving low is slightly different from the power dissipated when driving high. Using a bus encoding that favors the state that draws less current, or even inverting the output, will reduce the total current for these I/Os. To make this choice, users should study the static probabilities of the signals, that is, the fraction of the device up-time during which each signal is at logic '1'.
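The inversion decision can be sketched in a few lines of Python. The helper names are hypothetical and the current figures in the example are placeholders; the actual high and low drive currents have to come from the device datasheet for the chosen standard.

```python
def static_probability(samples):
    """Fraction of samples (0 or 1) in which the signal is at logic '1'."""
    return sum(samples) / len(samples)


def average_current(p_high, i_high, i_low):
    """Expected driver current given the static probability and per-state currents."""
    return p_high * i_high + (1 - p_high) * i_low


def should_invert(p_high, i_high, i_low):
    """True if sending the inverted signal lowers the expected current."""
    inverted = average_current(1 - p_high, i_high, i_low)
    return inverted < average_current(p_high, i_high, i_low)
```

For a signal that is high 75% of the time on a standard that (hypothetically) draws 8 mA when high and 6 mA when low, `should_invert(0.75, 8e-3, 6e-3)` returns `True`; with the two currents swapped it returns `False`.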

FPGA designers should also determine the speed and waveform requirements in order to use the lowest possible drive strength that meets them. Working closely with the board layout team in defining pin assignment to I/O banks, the designers may be able to reduce the number of compatible I/O standards and voltages required.

New Techniques to Mitigate I/O Power Consumption
A design technique normally used to mitigate simultaneously switching outputs (SSOs) can also be applied to reduce I/O peak power, provided the interface timing allows it. This technique staggers the I/O transitions over time. The simplest method of staggering an I/O bus that is driven internally by sequential elements is to divide the active outputs into two groups, one switching on the rising edge of the clock and the other on the falling edge.
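The benefit of splitting a bus into two edge groups can be checked on a recorded or simulated sequence of bus values. The Python sketch below is a counting model under the assumption that each output bit switches on exactly one clock edge; the function name and the `falling_mask` parameter are hypothetical.

```python
def peak_simultaneous_toggles(words, width, falling_mask=0):
    """Return the largest number of outputs switching on any single clock edge.

    Bits set in falling_mask are assumed to be registered on the falling edge;
    all other bits of the width-bit bus switch on the rising edge.
    """
    all_bits = (1 << width) - 1
    rising_mask = all_bits & ~falling_mask
    peak = 0
    for prev, cur in zip(words, words[1:]):
        changed = (prev ^ cur) & all_bits
        rising = bin(changed & rising_mask).count("1")
        falling = bin(changed & falling_mask).count("1")
        peak = max(peak, rising, falling)
    return peak
```

For an 8-bit bus alternating between 0x00 and 0xFF, the peak is eight outputs per edge with a single group, and four per edge when the upper nibble (`falling_mask=0xF0`) is moved to the falling edge.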

Board designers must design each trace for the lowest possible capacitance without compromising the signal integrity of high-speed signals.

