Design Article

Providing memory system and compiler support for MPSoC designs: Part 1

Mahmut Kandemir and Nikil Dutt

1/5/2009 6:50 PM EST

System-on-chip (SoC) architectures are being increasingly employed to solve a diverse spectrum of problems in the embedded and mobile systems domain. The resulting increase in the complexity of applications ported to SoC architectures places a tremendous burden on the computational resources needed to deliver the required functionality.

An emerging architectural solution places multiple processor cores on a single chip to manage the computational requirements. Such a multiprocessor system-on-chip (MPSoC) architecture has several advantages over a conventional strategy that employs a single, more powerful (but complex) processor on the chip.

First, the design of an on-chip multiprocessor composed of multiple simple processor cores is simpler than that of a complex single-processor system. This simplicity also helps reduce the time spent in verification and validation.

Second, an on-chip multiprocessor is expected to result in better utilization of the silicon space. The extra logic that would be spent on register renaming, instruction wake-up, speculation/predication, and register bypass on a complex single processor can be spent on providing higher bandwidth on an on-chip multiprocessor.

Third, an MPSoC architecture can exploit loop-level parallelism at the software level in array-intensive embedded applications. In contrast, a complex single-processor architecture needs to convert loop-level parallelism to instruction-level parallelism at run time (that is, dynamically) using sophisticated (and power-hungry) strategies. During this process, some loss in parallelism is inevitable.
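As a rough illustration of exploiting loop-level parallelism in software, the sketch below splits the iterations of a simple array loop into contiguous blocks, one per core. It is a minimal sketch, not a method taken from this article: the names `partition_iterations`, `scale_block`, and `parallel_scale` are hypothetical, and the blocks are executed one after another here only to simulate what independent cores would each do.

```python
def partition_iterations(n_iters, n_cores):
    """Split range(n_iters) into contiguous (start, end) blocks, one per core."""
    base, extra = divmod(n_iters, n_cores)
    blocks = []
    start = 0
    for core in range(n_cores):
        # The first `extra` cores take one additional iteration each.
        size = base + (1 if core < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def scale_block(src, dst, lo, hi, factor):
    """Work assigned to one core: iterations lo..hi-1 of the original loop."""
    for i in range(lo, hi):
        dst[i] = factor * src[i]


def parallel_scale(src, factor, n_cores):
    """Simulate running the loop on n_cores by processing each block in turn.

    Assumption: iterations are independent (no loop-carried dependence),
    so the blocks could run concurrently on separate cores.
    """
    dst = [0] * len(src)
    for lo, hi in partition_iterations(len(src), n_cores):
        scale_block(src, dst, lo, hi, factor)
    return dst
```

Because each block touches a disjoint part of the array, no synchronization is needed inside the loop body in this simple case; loops with dependences between iterations would require more care.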

Finally, a multiprocessor configuration provides an opportunity for energy savings through careful and selective management of individual processors. Overall, an on-chip multiprocessor is a suitable platform for executing array-intensive computations commonly found in embedded image and video processing applications.

One of the most critical components that determine the success of an MPSoC-based architecture is its memory system. This is because many applications spend a significant portion of their cycles in the memory hierarchy. In fact, one can expect this to be even more so in the future, considering the ever-increasing dataset sizes, coupled with the widening processor-memory gap.

In addition, from an energy consumption angle, the memory system can contribute up to 90% of the overall system power. In fact, one can expect that a significant portion of the transistors in an MPSoC-based architecture will be devoted to the memory hierarchy.

There are at least two major (and complementary) ways of optimizing the memory performance of an MPSoC-based system: (1) constructing a suitable memory organization/hierarchy and (2) optimizing the software (application) for it. This chapter focuses on these two issues and discusses different potential solutions for them.

On the architecture side, one can employ a traditional cache-based hierarchy or can opt to build a customized memory hierarchy, which can consist of caches, scratch pad memories, stream buffers, LIFOs, or a combination of these. It is also possible to make some architectural features reconfigurable and tune their parameters at run time according to the needs of the application being executed.

Traditional compilation techniques for multiprocessor architectures focus only on performance (execution cycles). However, in an MPSoC-based environment, one might want to include other metrics of interest as well, such as energy/power consumption and memory space usage. Therefore, the compiler's job is much more difficult in our context compared with the case of traditional high-end multiprocessors.

Memory Architectures

The application-specific nature of embedded systems presents new opportunities for aggressive customization and exploration of architectural issues. Since embedded systems typically implement a fixed application or problem in a particular domain, knowledge of the applications can be used to tailor the system architecture to suit the needs of the given application.

Such an architectural exploration scheme is quite different from the development of general-purpose computer systems that are designed for good average performance over a set of typical benchmark programs that cover a wide range of applications with different behaviors.

However, in the case of embedded systems, the features of the given application can be used to determine the architectural parameters. This is particularly true for MPSoC-based systems, in which we have numerous power-hungry components. For example, if an application does not use floating point arithmetic, then the floating point unit can be removed from the MPSoC, thereby saving area and power in the implementation.

Since the memory subsystem will dominate the cost (area), performance, and power of an MPSoC, we have to pay special attention to how the memory subsystem can benefit from customization. Unlike a general-purpose processor, in which a standard cache hierarchy is employed, the memory hierarchy - indeed the overall memory organization - of an MPSoC-based system can be tailored in various ways.

The memory can be selectively cached; the cache line size can be determined by the application; the designer can opt to discard the cache completely and employ specialized memory configurations such as FIFOs and stream buffers; and so on. The exploration space of different possible memory architectures is vast, and there have been attempts to automate or semiautomate this exploration process.
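To make the idea of application-driven exploration concrete, the following sketch counts misses for a direct-mapped cache on a given address trace and sweeps a few candidate line sizes. It is an illustrative toy under stated assumptions, not one of the automated exploration tools referred to above: the function names, the direct-mapped organization, and the byte-address trace format are all assumptions made for this example.

```python
def count_misses(addresses, cache_bytes, line_bytes):
    """Count misses of a direct-mapped cache on a trace of byte addresses.

    Assumptions: cache_bytes is a multiple of line_bytes, the cache starts
    empty, and reads and writes are treated alike.
    """
    n_lines = cache_bytes // line_bytes
    resident = [None] * n_lines  # memory block currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes   # which memory block the address is in
        index = block % n_lines      # which cache line that block maps to
        if resident[index] != block:
            resident[index] = block
            misses += 1
    return misses


def explore_line_sizes(addresses, cache_bytes, line_sizes):
    """Return {line_size: miss_count} for each candidate line size."""
    return {ls: count_misses(addresses, cache_bytes, ls) for ls in line_sizes}
```

For a purely sequential trace such as `range(0, 256, 4)`, larger lines reduce the miss count because each miss brings in more of the data that is about to be used; a trace with poor spatial locality could show the opposite trend, which is why the choice depends on the application.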

Traditionally, memory issues have been separately addressed by disparate research groups: computer architects, compiler writers, and the CAD/embedded systems community. Memory architectures have been studied extensively by computer architects. Memory hierarchy, implemented with cache structures, has received considerable attention from researchers.

Cache parameters such as line size, associativity, and write policy, and their impact on typical applications have been studied in detail. Recent studies have also quantified the impact of dynamic memory (DRAM) architectures. Since architectures are closely associated with compilation issues, compiler researchers have addressed the problem of generating efficient code for a given memory architecture by appropriately transforming the program and data. Compiler transformations such as blocking/tiling are examples of such optimizations. Note that many of these designs/optimizations need a fresh look when an MPSoC-based system is under consideration.
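The blocking/tiling transformation mentioned above can be sketched on matrix multiplication. The example below is a minimal illustration with hypothetical function names; the tile size is a free parameter that a compiler would normally choose based on the capacity of the target cache or scratch pad memory. Python is used here only to show the loop structure, not to demonstrate any performance effect.

```python
def matmul_naive(a, b, n):
    """Reference n x n matrix multiply with the usual i, j, k loop nest."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile):
    """Same computation, with all three loops blocked into tile-sized chunks.

    The outer loops walk over tiles; the inner loops work within one tile,
    so the data touched by the inner loops stays small enough to be reused
    while it is still in fast memory.
    """
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # min(...) handles the case where tile does not divide n.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a_ik = a[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            c[i][j] += a_ik * b[k][j]
    return c
```

Both functions compute the same result; only the order in which the iterations are visited differs, which is what makes tiling a legal reordering for this loop nest.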

Finally, researchers in the area of computer-aided design (CAD)/embedded systems have typically employed memory structures such as register files, static memory (SRAMs), and DRAMs in generating application-specific designs. Although the optimizations identified by the architecture and compiler community are still applicable in MPSoC design, the architectural flexibility available in the new context adds a new exploration dimension.

To be really effective, these optimizations need to be integrated into the design process as well as enhanced with new optimization and estimation techniques. In this section, we first present an overview of different memory architectures and then survey some of the ways in which these architectures have been customized.

