Design Article
PRODUCT HOW TO: Pushing performance limitations in microcontrollers
Kristian Saether
3/5/2010 2:22 PM EST
Interrupts, however, can only signal that a real-time event has occurred. Developers must still directly involve the CPU to read I/O and peripherals before data is lost. Handling an interrupt can preempt other latency-sensitive tasks, incurs context-switching overhead, and introduces a range of esoteric challenges, such as managing latency when multiple interrupts occur concurrently. All of these reduce predictability and processor efficiency.
To handle the high data rates and frequencies of real-time I/O and peripherals, microcontrollers must achieve higher processing efficiency. This efficiency, however, needs to come not from increased clock frequency (which comes at the expense of higher power consumption) but from internal changes in microcontroller architecture. Specifically, microcontrollers have begun to integrate coprocessors that offload specific task blocks, multi-channel DMA controllers that facilitate penalty-free memory access, and event systems that route signals between internal subsystems to offload I/O and peripheral management.
More than one way to offload a CPU
Integrated coprocessors have become fairly widespread across embedded microcontrollers. Among the more commonly recognized coprocessors are encryption and TCP/IP offload engines. Effectively, coprocessors offload entire tasks or assist in the more compute-intensive portions of complex algorithms.
For example, an encryption engine reduces AES computations on the CPU from thousands of cycles to hundreds of cycles per operation, while a TCP/IP offload engine makes it possible to terminate an Ethernet connection with little CPU overhead. In addition, offload engines simplify implementation of these tasks: developers access the advanced functionality through simple APIs instead of writing extensive driver code.
DMA and event system technologies are less familiar to developers and are often left unused for that reason. DMA controllers offload the management of data movement from the CPU by performing data transfers, such as from peripheral registers to internal or external SRAM, in the background. For example, a developer can configure the DMA controller to preload a block of data into on-chip RAM so that it is available for fast access before the CPU needs it, thus eliminating wait states and dependency delays. Alternatively, a DMA controller can assume most of the burden of managing communication peripherals (see Table 1).
Table 1: DMA Controllers can assume most of the burden of managing communication peripherals.
The savings in cycles from using a DMA controller can be significant. Many embedded developers have found themselves unable to fit an application within the resource limits of a microcontroller, only to be introduced to DMA by the manufacturer and suddenly find themselves with extra cycles available, sometimes on the order of 30 to 50 percent across the system.
It is only when they face a processing wall that many developers first discover the untapped potential that has been available to them from the start.
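The difference between servicing every byte in an interrupt and letting a DMA channel collect a block can be illustrated with a small host-side model. This is only a sketch: the `dma_channel_t` structure and the functions `dma_configure`, `dma_on_request`, and `dma_simulate_rx` are hypothetical names invented for illustration, not a vendor register map or driver API, and the "peripheral register" is an ordinary variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of one DMA channel (hypothetical, not a vendor register map). */
typedef struct {
    const volatile uint8_t *src; /* fixed source: peripheral data register */
    uint8_t *dst;                /* incrementing destination in SRAM */
    size_t remaining;            /* bytes left in the block */
    unsigned cpu_interrupts;     /* interrupts the CPU actually had to service */
} dma_channel_t;

/* One-time CPU work: program source, destination, and block length. */
void dma_configure(dma_channel_t *ch, const volatile uint8_t *periph_reg,
                   uint8_t *buf, size_t len)
{
    assert(ch != NULL && periph_reg != NULL && buf != NULL);
    ch->src = periph_reg;
    ch->dst = buf;
    ch->remaining = len;
    ch->cpu_interrupts = 0;
}

/* Models the hardware reacting to a peripheral "data ready" request.
 * Copies one byte without CPU involvement and returns true only when the
 * block has just completed, the single point at which the CPU is interrupted. */
bool dma_on_request(dma_channel_t *ch)
{
    if (ch->remaining == 0) {
        return false;            /* channel idle: request ignored */
    }
    *ch->dst++ = *ch->src;       /* move one byte, advance destination */
    ch->remaining--;
    if (ch->remaining == 0) {
        ch->cpu_interrupts++;    /* block-complete interrupt */
        return true;
    }
    return false;
}

/* Simulation helper: feed a byte stream through a fake data register and
 * report how many interrupts the CPU had to service for the whole block. */
unsigned dma_simulate_rx(const uint8_t *stream, size_t len, uint8_t *buf)
{
    volatile uint8_t fake_data_reg = 0;
    dma_channel_t ch;

    dma_configure(&ch, &fake_data_reg, buf, len);
    for (size_t i = 0; i < len; i++) {
        fake_data_reg = stream[i];   /* peripheral receives a byte */
        (void)dma_on_request(&ch);   /* "hardware" moves it to SRAM */
    }
    return ch.cpu_interrupts;
}
```

In this model a block of any non-zero length costs the CPU one configuration call and one completion interrupt, whereas a purely interrupt-driven receiver would take one interrupt per byte. Real savings depend on the device and the application.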
Even fewer developers are familiar with event systems, which work in conjunction with DMA controllers to further offload CPU cycles as well as reduce overall power consumption. An event system is a bus that can connect internal signals from one peripheral on a microcontroller to another. When an event occurs at one peripheral, it can trigger an action in other peripherals without involving the CPU and within a two-cycle latency, much the way the human body handles reflexes such as pulling a hand out of a fire without first consulting the brain.
Specifically, an event system routes signals throughout the microcontroller using a dedicated network connecting the CPU, data bus, peripherals, and DMA controller. Normally peripherals must interrupt the CPU to initiate any action, including reading the peripheral. By routing events directly between peripherals, the event system in effect offloads these interrupts from the CPU. Developers have the flexibility to configure peripherals to follow different event channels, thus defining the particular event routing required to meet the specific needs of the application.
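The routing idea can be sketched as a small table that maps event sources to peripheral actions. The model below runs on a host and is an assumption-laden illustration: the channel count, the event identifiers, the `event_system_t` type, and the functions `event_route` and `event_raise` are all hypothetical, and the fake ADC stands in for any peripheral that can be triggered by an event. An event with no configured route falls back to a CPU interrupt, mirroring the "normal" case described above.

```c
#include <assert.h>
#include <stddef.h>

#define EVENT_CHANNELS 4      /* hypothetical channel count */
#define EVENT_NONE (-1)       /* marks an unused channel */

/* Hypothetical event source identifiers. */
enum { EVENT_TIMER_OVERFLOW = 0, EVENT_PIN_CHANGE = 1 };

typedef void (*event_action_t)(void *peripheral);

typedef struct {
    int source;               /* event source this channel listens to */
    event_action_t action;    /* action triggered in the target peripheral */
    void *peripheral;         /* target peripheral state */
} event_channel_t;

typedef struct {
    event_channel_t ch[EVENT_CHANNELS];
    unsigned cpu_interrupts;  /* events the CPU had to handle itself */
} event_system_t;

void event_system_init(event_system_t *es)
{
    assert(es != NULL);
    for (size_t i = 0; i < EVENT_CHANNELS; i++) {
        es->ch[i].source = EVENT_NONE;
        es->ch[i].action = NULL;
        es->ch[i].peripheral = NULL;
    }
    es->cpu_interrupts = 0;
}

/* Configure one channel; returns 0 on success, -1 on invalid arguments. */
int event_route(event_system_t *es, size_t channel, int source,
                event_action_t action, void *peripheral)
{
    if (es == NULL || channel >= EVENT_CHANNELS || action == NULL) {
        return -1;
    }
    es->ch[channel].source = source;
    es->ch[channel].action = action;
    es->ch[channel].peripheral = peripheral;
    return 0;
}

/* A peripheral signals an event. Routed events trigger their targets
 * directly; unrouted events are counted as CPU interrupts. */
void event_raise(event_system_t *es, int source)
{
    int routed = 0;
    for (size_t i = 0; i < EVENT_CHANNELS; i++) {
        if (es->ch[i].action != NULL && es->ch[i].source == source) {
            es->ch[i].action(es->ch[i].peripheral);
            routed = 1;
        }
    }
    if (!routed) {
        es->cpu_interrupts++;
    }
}

/* Fake ADC used as an example event target. */
typedef struct {
    unsigned conversions_started;
} fake_adc_t;

void adc_start_conversion(void *peripheral)
{
    fake_adc_t *adc = peripheral;
    adc->conversions_started++;
}
```

Routing `EVENT_TIMER_OVERFLOW` to `adc_start_conversion` on a channel means each timer overflow starts a conversion without the CPU being counted as interrupted, while an unrouted `EVENT_PIN_CHANGE` still lands on the CPU.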




KarlS
3/9/2010 11:30 AM EST
I am working on the design of a component intended for use as a real-time coprocessor rather than for compute-intensive functions. It fits right into the environment you described. It is programmed in C (it is not a C-to-HDL gadget) and executes C in a minimum number of cycles. It could be used to provide additional control functions for the DMA. The size scales with the code size because it is mainly embedded RAM.
CEngine is a hardware design block diagram file that can be compiled into a custom component used with Altera's SOPC builder just as the FIFO is used for DMA. There is also a C# program that parses C statements and expressions to load the memory blocks to personalize the hardware to the code function. The program also has a simulation step that creates a cycle log showing the data manipulation for the code execution.