Design Article

An application modeling & hardware description for network-on-chip benchmarking

Erno Salminen, Cristian Grecu, Timo D. Hamalainen, and Andre Ivanov

1/14/2009 1:06 AM EST

Measuring and comparing performance, cost, and other features of advanced communication architectures for complex multicore/multiprocessor systems on chip is a significant challenge which has hardly been addressed so far.

The NoC Benchmarking Workgroup of OCP-IP presents a modeling concept for applications running on multicore systems and defines an XML format for documenting and distributing network-on-chip benchmarks.

It defines a black-box view of the processing elements that discloses only the computational aspects that are relevant in interacting with the on-chip data transport mechanism. The purpose is to lay the groundwork for a standardized NoC benchmark set.

No NoC is optimal in the general case, and a brute-force search is impossible due to the vast design space. Benchmarking, however, makes it possible to identify the most promising solutions, which are then selected for more detailed and time-consuming analysis.

This reduces the design time notably once the major characteristics and requirements of the system are known. Furthermore, a common point of reference is required for detailed comparison of approaches. Hence, NoC benchmarking aims to answer two basic questions:

1) NoC developer: What gain does my novel feature bring?
2) System integrator: Which NoC should I choose? How should I configure it?

The model and the corresponding XML description are divided into four main sections:

1) Application defines the workload in terms of computation and communication.
2) Mapping binds the application tasks to the resources.
3) Platform defines the resources and the NoC interconnecting them.
4) Measurement defines how to perform the evaluation, for example the metrics and the simulation length.

Figure 1: Modular view of the NoC benchmarks and corresponding pseudo-XML


Separation into distinct parts is necessary to handle complex architectures and applications. Orthogonality allows exercising or modifying one of the components while keeping the rest at their previous (default) configuration. Thus, the mapping, for example, may be varied without touching the application or hardware models.

Similarly, one can describe a particular NoC once but change the application when needed. Figure 1 above depicts the proposed NoC system model. It shows a simple task graph consisting of six tasks (A-F) and two triggering events, which trigger tasks A and D. The tasks are grouped into five groups (I-V) that are mapped onto the platform, namely to four processing elements (PE0-PE3).

The data amounts (in bytes) are expressed beside each edge. Assuming that the events in this application are periodical and their time interval is set properly, all the PEs 0-3 can execute tasks simultaneously.
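The orthogonality of the four sections can be illustrated with a minimal Python sketch. The class and field names below are hypothetical and are not taken from the XML format, and the task, group, and measurement values are invented for illustration rather than read off Figure 1. The sketch only shows that the mapping can be varied while the other three sections are reused unchanged.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Benchmark:
    """Hypothetical container mirroring the four sections of the model."""
    application: dict   # task name -> list of (destination task, bytes)
    mapping: dict       # group name -> PE name
    platform: tuple     # names of the available processing elements
    measurement: dict   # e.g. metrics and simulation length


# Invented example values, for illustration only.
base = Benchmark(
    application={"A": [("B", 64)], "B": []},
    mapping={"I": "PE0", "II": "PE1"},
    platform=("PE0", "PE1", "PE2", "PE3"),
    measurement={"metrics": ["latency"], "sim_length_cycles": 100_000},
)

# Vary only the mapping; application, platform and measurement are reused.
variant = replace(base, mapping={"I": "PE2", "II": "PE3"})
```

Because each section is a separate field, a design-space exploration loop under these assumptions would only need to generate new `mapping` dictionaries.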

The benchmark set covers the three uppermost sections of Figure 1 (application, mapping, computation resources) and the measurement settings but leaves the network definition to the NoC designer/evaluator.

A message-passing communication paradigm is assumed at this stage of research, but no assumptions are made about the abstraction level of the NoC. The XML description is given as input to a traffic generator, which is used during the simulation/evaluation. Figure 2 below shows the top-level tags used for describing the NoC system model.

Application tasks are the primary means of expressing the communication and computation load. Tasks communicate via unidirectional ports. A task is triggered for execution according to a condition that depends on the received data tokens and possibly on the internal state of the task. A task may include several behaviors but exactly one is executed at a time.

Figure 2: Major tags of the XML description of a NoC benchmark


For example, a task may behave differently on its first execution and identically on all subsequent ones. As another example, a task could perform function A on every even execution count and function B on every odd one. In the extreme case, the task behavior is different on each execution.

This is suitable for capturing a trace for highly varying tasks for which the average values are misleading. Hence, the same file format can store both the captured trace of events and the more abstract workload models. Each execution behavior is characterized by three elements:

1) operation count,
2) data amount to be sent (in bytes) and the output ports where the data will be sent, and,
3) next state of the task after execution.
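The three elements above can be sketched as a small state machine in Python. This is a simplified illustration, not the benchmark format itself: the names `Behavior`, `Task`, and `execute` are hypothetical, the numbers are invented, and the sketch selects the behavior from the internal state only, ignoring the received data tokens that may also take part in the trigger condition.

```python
from dataclasses import dataclass


@dataclass
class Behavior:
    op_count: int     # 1) operation count
    sends: list       # 2) (output port, bytes) pairs
    next_state: str   # 3) state of the task after execution


@dataclass
class Task:
    state: str
    behaviors: dict   # state name -> Behavior

    def execute(self):
        """Run exactly one behavior, chosen by the current state."""
        behavior = self.behaviors[self.state]
        self.state = behavior.next_state
        return behavior.op_count, behavior.sends


# A task that behaves differently on its first execution only.
task = Task(
    state="first",
    behaviors={
        "first": Behavior(op_count=500, sends=[("out0", 128)], next_state="steady"),
        "steady": Behavior(op_count=100, sends=[("out0", 32)], next_state="steady"),
    },
)
results = [task.execute() for _ in range(3)]
```

An even/odd alternation would use two states that point at each other, and a captured trace would use one state per recorded execution.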

The operation count and data amount are expressed either with a statistical distribution (uniform, normal, or Poisson) or as a polynomial function of the received data amount (a constant value is a special case of the latter). Polynomial and statistical are mutually exclusive choices.

However, the choice is made per task: certain tasks may have polynomial count/amount values and others statistical ones. The actual values can be obtained, for example, by simulating a virtual prototype when the application exists, whereas educated guesses are needed when estimating the workload of future applications.

By varying the values, the designer can search for corner cases, for example, finding the maximum allowed operation counts for certain tasks. These may serve as boundary conditions for the application developers.
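The two ways of expressing an operation count or data amount can be sketched in Python as follows. The function names are hypothetical, the coefficients and ranges are invented, and clamping the normal distribution at zero is an assumption of this sketch. The Poisson case is left out for brevity.

```python
import random


def polynomial(coeffs):
    """Value = sum(coeffs[i] * received_bytes**i); [c] alone is a constant."""
    def value(received_bytes, rng=None):
        return sum(c * received_bytes ** i for i, c in enumerate(coeffs))
    return value


def uniform(low, high):
    """Value drawn uniformly from [low, high], independent of the input."""
    def value(received_bytes, rng):
        return rng.uniform(low, high)
    return value


def normal(mean, std_dev):
    """Value drawn from a normal distribution, clamped at zero (assumption)."""
    def value(received_bytes, rng):
        return max(0.0, rng.gauss(mean, std_dev))
    return value


rng = random.Random(1)
op_count = polynomial([10, 2])   # 10 + 2 * received_bytes
data_out = uniform(32, 64)       # bytes to send, statistical

ops = op_count(100, rng)         # 10 + 2 * 100
```

Sweeping the coefficients of `polynomial` in a loop is one way a designer might search for the largest operation count a given mapping still tolerates.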

There is one send tag for each destination. Each output is assigned a certain probability. The same output is always chosen if its probability is equal to 1.0, i.e. in 100% of cases. Smaller probability values allow a more compact model, as fewer triggering conditions are needed.
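A traffic generator might evaluate the send probabilities along the following lines. This sketch assumes that each send is decided independently of the others, which the text above does not state explicitly, and the function name `choose_sends` and the tuple layout are hypothetical.

```python
import random


def choose_sends(sends, rng):
    """Return the (port, bytes) pairs that fire on this execution.

    sends: list of (port, bytes, probability) tuples, one per destination.
    Each entry is evaluated independently (an assumption of this sketch).
    rng.random() lies in [0.0, 1.0), so probability 1.0 always fires.
    """
    return [(port, nbytes) for port, nbytes, prob in sends if rng.random() < prob]


rng = random.Random(0)
sends = [("out0", 64, 1.0), ("out1", 16, 0.25)]
fired = choose_sends(sends, rng)   # always contains ("out0", 64)
```

With a probability of 0.25 on `out1`, a single behavior covers what would otherwise need separate triggering conditions for the executions that do and do not send on that port.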

Mapping is performed in two stages: grouping of tasks together, and mapping the groups to resources. If uncertain, users can have just one task per group. The main ideas in grouping are to model operating system threads and to restrict mapping exploration.

Both tasks and groups may have parameters related to scheduling, such as priorities. Design automation tools may modify the contents of a group if allowed (mutable="yes") and may move the group elsewhere if allowed (movable="yes"). The mutable restriction can also be set for a PE and the movable restriction for a task.
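The two-stage mapping and the movable restriction can be sketched in Python as follows. The `Group` class and the helper functions are hypothetical, the group contents are invented, and only the group-level flags are modeled. The PE-level and task-level restrictions are omitted.

```python
from dataclasses import dataclass


@dataclass
class Group:
    tasks: list
    mutable: bool = True   # may a tool change the contents of the group?
    movable: bool = True   # may a tool map the group to another PE?


def resolve(task, groups, group_to_pe):
    """Stage 1: find the task's group. Stage 2: look up the group's PE."""
    for name, group in groups.items():
        if task in group.tasks:
            return group_to_pe[name]
    raise KeyError(task)


def move_group(name, new_pe, groups, group_to_pe):
    """Remap a group, honoring its movable flag."""
    if not groups[name].movable:
        raise PermissionError(f"group {name} is not movable")
    group_to_pe[name] = new_pe


# Invented grouping, for illustration only.
groups = {"I": Group(["A", "B"]), "II": Group(["C"], movable=False)}
group_to_pe = {"I": "PE0", "II": "PE1"}

move_group("I", "PE2", groups, group_to_pe)
pe_of_b = resolve("B", groups, group_to_pe)
```

Pinning group II with `movable=False` restricts the mapping exploration in the way described above: a tool may still relocate group I, but an attempt to move group II is rejected.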

