Modularity Holds Key to Accurate DSP Power Estimations

Todd Hiers, Texas Instruments Inc.

10/28/2003 11:33 AM EST

Power consumption has always been an important parameter for engineers designing with digital signal processors (DSPs), but it's more crucial than ever, not only in battery-powered appliances but also in larger systems where power and heat management present special challenges.

Modern DSPs consume far less power than their predecessors, but estimating this consumption has become increasingly difficult. Today's devices incorporate more functionality than ever before, so traditional methods of estimating power are becoming less accurate and less helpful. A modular, parameterized way of estimating power consumption, described in this article, accounts for these new factors.

The Old Method
It's instructive first to look at how most engineers and vendors make power estimates. They model power as a combination of fixed activity levels for the entire device. These fixed levels may represent generic conditions such as "low activity" and "high activity," or they may represent a specific application that a user might want to run. A savvy user might take a time-weighted average of several data points for a more accurate picture, or the data might already be presented that way. However, this averaging is often the only tailoring the data allows.

Often, a "high activity" number that approximates full power for a device will be quoted. For this model, the device runs optimized code that makes extensive use of all on-chip resources. The level of activity, though, can vary widely among devices, even among members of the same family. For instance, some chips have enough internal memory to store an algorithm's code and data, which reduces the amount of sustained I/O needed to implement it. Because I/O operations account for a significant share of a DSP's total power, having sufficient on-chip memory reduces the power requirement. This "high activity" number might be complemented by a "low activity" number representing the time the DSP spends setting up registers, executing less optimized code segments that leave some of its resources unused, running IDLE instructions, or operating in power-down modes that disable the clocks to various modules.

Engineers can generally obtain these figures directly from device manufacturers, who often write special "canned" tests that move bits around and exercise all parts of a DSP but don't necessarily do useful work. The data comes from simply running these tests on a variety of sample parts and measuring the current drawn from the device's power supply.

A user must then determine how the target application compares with the conditions under which the power data was taken. This may mean estimating the ratio of high to low activity the DSP experiences, or determining which data point best matches the system. It may then be useful to interpolate between the boundaries of the power-consumption data space. Experience shows that in real-world applications DSPs typically spend between 50 and 75% of their time on high-activity tasks and the remainder on low-activity operations, so vendors may present interpolated data directly.
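As a minimal sketch of this time-weighted averaging, the Python function below blends a device's "high activity" and "low activity" figures according to the fraction of time spent in each. The function name and the 900 mW and 300 mW figures are hypothetical, chosen only for illustration.

```python
def blended_power_mw(p_high_mw: float, p_low_mw: float, high_fraction: float) -> float:
    """Time-weighted average of whole-device "high" and "low" activity power figures."""
    if not 0.0 <= high_fraction <= 1.0:
        raise ValueError("high_fraction must be between 0 and 1")
    return high_fraction * p_high_mw + (1.0 - high_fraction) * p_low_mw


# Hypothetical datasheet figures, evaluated at both ends of the typical 50-75% range.
for fraction in (0.50, 0.75):
    print(f"{fraction:.0%} high activity -> {blended_power_mw(900.0, 300.0, fraction):.0f} mW")
```

The whole device is reduced to two numbers and one ratio, which is exactly why this approach offers so little room for tailoring.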

Using this method, a designer can perhaps make some minor adjustments for temperature or clock speed, which are indeed important parameters in power estimation. Beyond that, there's not much else the designer can do, and there's no real insight into what lies behind the final power figure. The method is constrained, rigid, and imprecise.

A Better Way
Given developments in device technology, the relatively simplistic method of power estimation described above is starting to fall short. First, it gives only a rough estimate, while some modern applications are very power conscious. A difference of 10% in consumption can have a major impact on other design decisions, perhaps even prompting the designer to switch DSPs.

Second, early DSPs contained a processing core, a small amount of internal memory, and interfaces to external memory and peripherals, so the whole-chip approach was reasonable. Contrast those early devices with today's DSPs, which can contain multiple computational units in their cores along with large Flash memory or onboard RAM. These chips also pack many peripheral functions drawn from a wide range of possibilities, including sophisticated debug ports and support circuitry, high-speed serial ports, host-port interfaces, wide external-memory interfaces, PCI-bus interfaces, and DMA controllers. Some of the latest devices even incorporate a built-in Ethernet media-access controller or ports that handle audio or video I/O.

Of course, not all of these peripherals are active at the same time. Power consumption varies widely depending on the use of these on-chip resources. Thus it's impossible to accurately estimate power without an understanding of these function modules and their usage patterns.

Estimating power for newer devices takes a different angle from the traditional method: it breaks consumption into baseline power and activity power. To make reasonably accurate predictions across the spectrum of power scenarios for different applications, however, the methodology must account for how each of the various modules behaves when computing the activity component. Modularizing the analysis, breaking a DSP into manageable subsystems and evaluating each one individually, makes sense, although it also makes the estimating process more complicated.

Baseline consumption is the power that a device consumes independent of the task, if any, it is performing. It includes static power along with power for the PLL and clock tree. Note, however, that baseline power does vary with the supply voltage and temperature.

Activity power is the power consumed by active subsystems on the DSP, such as the CPU, direct memory access (DMA) channels, or various peripherals. The power for each subsystem is independent of temperature, but it does depend on the supply voltage and activity level. Total activity power is the sum of the contributions from all the major modules in the device, and the goal is to estimate each contribution separately.
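The split between baseline and activity power can be expressed as a simple sum. The sketch below, in Python, assumes each module's activity power has already been estimated; the class name, module names, and milliwatt values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PowerEstimate:
    """Total device power split into a baseline term and per-module activity terms."""
    baseline_mw: float  # static + PLL + clock tree; varies with supply voltage and temperature
    activity_mw: dict[str, float] = field(default_factory=dict)  # one entry per active module

    def total_mw(self) -> float:
        return self.baseline_mw + sum(self.activity_mw.values())


est = PowerEstimate(baseline_mw=250.0,
                    activity_mw={"CPU": 400.0, "DMA": 60.0, "EMIF": 120.0})
print(est.total_mw())  # 830.0
```

Keeping the modules as separate entries means an unused, unclocked peripheral is simply left out, and each contribution can be revisited independently.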

Among the possible parameters used to determine the activity level of a module are:

  • Operating frequency, or that of an external interface to the module.
  • Percent utilization, the relative amount of time the module is in use.
  • Percent writes, the relative amount of active time the module is sending out data versus reading data into the DSP.
  • The number of bits used in a selectable-width interface.
  • Percentage switching, the probability that any one data bit changes state from one cycle to the next.
  • Loading, as measured by the length of the trace an I/O line must drive.

Note, though, that not all of these parameters apply to every module. Further, experience shows that of these, frequency and percentage utilization are the dominant factors.
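Because frequency and utilization dominate, a first-order model can scale a characterized full-activity figure by just those two parameters. The sketch below assumes activity power is linear in both, which is a simplification rather than a vendor formula; the per-MHz coefficients and the module list are hypothetical.

```python
def module_activity_mw(mw_per_mhz_at_full: float, freq_mhz: float, utilization: float) -> float:
    """Activity power of one module, scaled from a full-utilization reference figure.

    ASSUMPTION: power is linear in operating frequency and percent utilization;
    writes, switching, and loading are ignored in this first-order sketch.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return mw_per_mhz_at_full * freq_mhz * utilization


# Hypothetical coefficients: (mW per MHz at 100% utilization, clock in MHz, utilization)
modules = {
    "CPU": (0.8, 600.0, 0.60),
    "Timer": (0.01, 100.0, 1.0),      # on/off peripheral: simply running
    "Serial": (0.05, 0.0, 0.0),       # unused and unclocked: contributes nothing
}
for name, (k, f, u) in modules.items():
    print(f"{name:6s} {module_activity_mw(k, f, u):7.1f} mW")
```

Real characterization data may add terms for the other parameters in the list; the point here is only that each module gets its own inputs.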

The Real Work: Getting the Values
In the old method, the task was to find the ratio of high to low activity for the entire device or to match a fixed test case. Now, designers must determine the usage characteristics of the core and then of each module. Some usage parameters, such as the clock frequency and number of bits, are easy to determine because the overall system design dictates their values. In addition, designers can easily identify which modules are unused and unclocked. To choose appropriate values for the other parameters, a designer needs a good understanding of the user application.

As noted earlier, percent utilization is one of the two dominant parameters in a power-consumption study. Start with peripherals that don't have a range of uses and are typically either on or off, such as a timer.

CPU utilization isn't as straightforward because of the core's complex structure, which means it can experience varying degrees of use depending on how well the code is optimized. Consider that many DSPs now have multiple functional units. For example, one commercially available DSP has eight functional units. In this case, zero percent utilization means that the CPU is idle. In contrast, 100% utilization means that all of the CPU's functional blocks are active every cycle and the maximum amount of data is being acquired from the program and data caches every cycle (Figure 1).


Figure 1: Tools such as the CPU analysis tool shown here give valuable data to assist in finding an overall usage factor.

Few DSP algorithms achieve 100% utilization. Even DSP-intensive applications don't spend all of their time in highly parallel loops, because they also need time for control code or less demanding algorithms. Such code might execute only a few instructions in parallel, significantly reducing both the I/O and the number of active functional units in the CPU, and thus the overall utilization. Although gauging core utilization can be tricky, one aid is to examine the output from development tools that indicate CPU activity at any point in time. Those numbers can prove very useful when estimating an overall utilization rate.
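One way to turn such tool output into a single number is a time-weighted average over code regions. The sketch below assumes a CPU with eight functional units, as in the example above, and approximates each region's utilization as the average number of busy units divided by eight; it ignores the cache-traffic part of the 100% definition. The function names and the profile figures are hypothetical.

```python
def segment_utilization(avg_active_units: float, total_units: int = 8) -> float:
    """Fraction of functional-unit slots busy per cycle (0 = idle, 1 = all units every cycle)."""
    return avg_active_units / total_units


def overall_utilization(segments: list[tuple[float, float]]) -> float:
    """Time-weighted average of (time_share, utilization) pairs; shares need not sum to 1."""
    total_time = sum(t for t, _ in segments)
    return sum(t * u for t, u in segments) / total_time


# Hypothetical profile: a tight parallel loop, control code, and idle time.
profile = [
    (0.4, segment_utilization(7)),  # 40% of time, ~7 of 8 units busy
    (0.5, segment_utilization(2)),  # 50% of time, ~2 units busy
    (0.1, segment_utilization(0)),  # 10% of time idle
]
print(f"overall CPU utilization ~ {overall_utilization(profile):.1%}")
```

Even with a highly optimized inner loop, the control code pulls the overall figure well below 100%.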

For peripherals that have I/O, designers can estimate utilization by comparing used bandwidth with the theoretical maximum. For instance, if an external memory interface performs reads and writes a quarter of the time and otherwise has no data to move, you would assign it 25% utilization. As another example, if an application must transfer 50 Mbytes/s over a PCI port with a theoretical maximum transfer rate of 132 Mbytes/s, the port utilization is roughly 40%.
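The bandwidth comparison is a simple ratio. This sketch reproduces the PCI example above; the function name is hypothetical, and clamping to 100% reflects that a port cannot move more data than its theoretical maximum.

```python
def io_utilization(required_mbytes_per_s: float, max_mbytes_per_s: float) -> float:
    """Utilization of an I/O peripheral: used bandwidth over theoretical maximum, capped at 1."""
    if max_mbytes_per_s <= 0:
        raise ValueError("theoretical maximum must be positive")
    return min(required_mbytes_per_s / max_mbytes_per_s, 1.0)


# PCI example from the text: 50 Mbytes/s over a port with a 132 Mbytes/s maximum.
print(f"PCI utilization: {io_utilization(50.0, 132.0):.0%}")  # PCI utilization: 38%
```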

Don't forget, however, that system-level issues can reduce utilization on various modules. It isn't realistic to expect all peripherals to run at 100% simultaneously. As the system consumes memory and DMA bandwidth, these bottlenecks throttle back peripheral activity. Thus, for applications with considerable memory or DMA usage, keep module utilization numbers at a reasonable level.

Next, consider the percentage of writes compared with reads. This parameter matters because during writes the DSP must power its I/O drivers to send signals over the physical connection. In the simple case, a peripheral that moves data out of the DSP as much as it moves data in has 50% writes, and the other half of its active time consists of reads. In some applications, designers might know that the code moves data in only one direction or has a known balance of data movement. Otherwise, 50% is a good default value.
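A sketch of the percent-writes parameter, assuming transfer time is proportional to the number of bytes moved in each direction (a simplification that ignores per-direction overheads). The function falls back to the 50% default when the traffic mix is unknown; its name and arguments are hypothetical.

```python
from typing import Optional


def percent_writes(bytes_out: Optional[float] = None, bytes_in: Optional[float] = None) -> float:
    """Fraction of a peripheral's active time spent driving data out of the DSP.

    ASSUMPTION: active time in each direction is proportional to bytes moved.
    Returns the 50% default when the mix is unknown or no traffic is given.
    """
    if bytes_out is None or bytes_in is None or bytes_out + bytes_in == 0:
        return 0.5
    return bytes_out / (bytes_out + bytes_in)


print(percent_writes(3.0, 1.0))  # 0.75: three bytes sent for every byte received
print(percent_writes())          # 0.5: default when the balance is unknown
```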

Next, we need to address percent switching, which determines the power needed to change bits' states. In this discussion it applies only to I/O peripherals that send data over an interface. Because each bit of random data has a 50% chance of changing from one cycle to the next, 50% is a good default for most cases. However, in some algorithms you might be able to predict the percentage of bit changes.
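When representative bus data is available, percent switching can be measured rather than assumed by counting bit transitions between consecutive words. The sketch below is a hypothetical helper, not part of any vendor tool; it illustrates that random data lands near 50% while a simple counter, whose upper bits rarely change, switches far less.

```python
import random


def percent_switching(words: list[int], width: int) -> float:
    """Average fraction of bus bits that change state between consecutive words."""
    if len(words) < 2:
        raise ValueError("need at least two words to measure transitions")
    mask = (1 << width) - 1
    # XOR marks the bits that differ; counting the 1s gives transitions per cycle.
    toggles = sum(bin((a ^ b) & mask).count("1") for a, b in zip(words, words[1:]))
    return toggles / (width * (len(words) - 1))


random.seed(0)
data = [random.getrandbits(32) for _ in range(10_000)]
print(f"random data: {percent_switching(data, 32):.1%}")  # close to the 50% default

counter = list(range(10_000))
print(f"counter:     {percent_switching(counter, 32):.1%}")  # mostly low-order bits toggle
```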

A final aspect to consider, I/O power or loading, is more of an overall system parameter and is not tied strictly to the DSP itself. For loads with CMOS inputs, the power required to drive the signal trace dominates, so trace length is a better measure of load than the number of inputs or a lumped load capacitance. Because most of the power consumed in a toggling I/O bit goes into driving the transmission line rather than the load at its end, the length of the attached trace directly affects how much I/O power the DSP uses.

Only a Guideline, But Very Useful
Using the methodology described in this article, we can arrive at a good estimate of power consumption. Although this modular approach is more accurate than older methods, users should recognize that they can't expect an exact answer; it's still an estimate. However, an estimate doesn't need to be dead-on to provide useful information. The results can be close enough, for example, to help decide whether forced cooling is needed, whether to use heat sinks or fans, or whether the device is suitable for a system where cooling isn't possible.

As described, the strategy in estimating power consumption in modern DSPs is to modularize and parameterize contributing factors. The problem then becomes one of finding the actual power-consumption figures that bracket the possible utilization extremes for each module.

It's not practical for individual engineers to create and run a "canned" test that isolates each module to produce baseline and activity power figures, especially since device vendors are in a much better position to do it once, and more accurately. Some device vendors recognize the value of this approach to their users and have started to package these benchmark figures and calculations into a spreadsheet (Figure 2). Users should encourage device manufacturers who don't yet supply such a tool to do so.


Figure 2: A spreadsheet that includes power-consumption values for baseline and high-activity operation of the core and all internal peripherals can save users considerable time in coming up with an accurate estimate of how much power a DSP draws when running a specific algorithm.

Users, though, aren't off the hook completely; they must examine the final application code very closely, coming up with the usage parameters for each module and plugging that data into the spreadsheet. This task itself isn't trivial, but the effort is well worthwhile.

About the Author
Todd Hiers is a DSP applications engineer for the TMS320 DSP group at Texas Instruments. His current focus is verification and validation of new device revisions or fixes, along with characterization of device performance and system requirements. Hiers received his BS in EECS and a Master of Engineering in EECS from MIT. Todd can be reached at tchiers@ti.com.