Design Article
Selecting an appropriate programming model
Larry Huston, Principal Software Architect, Network Processor Division, Intel Corp., Tempe, Ariz.
8/5/2002 7:39 AM EDT
Programming a network processor offers its own challenges. That's because it differs in some fundamental ways from the general-purpose processors that have in the past been the staple of embedded design. Conventional programming models on general-purpose processors are based on a sequential programming paradigm and do not account for the parallelism and hardware features offered in the network processor. To take advantage of these features, new programming models are needed.
Most network processors use parallelism to achieve high performance. This can be accomplished with thread-level parallelism as well as multiple processing elements within the network processor. Programming these parallel devices introduces several important challenges.
One is serialization. Because multiple threads or processing elements may be accessing the same data, mutual exclusion must be provided to keep two threads from updating the data at the same time.
A second is maintaining packet order. Frequently, packets must be processed in the order they arrived. For example, a forwarding device must not reorder related packets because of the effect on the end-station throughput. In addition, some packet processing (such as header compression and header decompression) requires that packets be processed in the order they are received. Because independent threads may be working on related packets, the programming model must provide some way to ensure that the ordering constraints are not violated.
When a programmer wishes to map an application onto a network processor, a programming model must be chosen that addresses these and other challenges. The next two sections will look at two common programming models and some of their advantages and disadvantages. At the end of this article we will look at a sample networking application that mixes both programming models.
The sequential programming model is similar to what a programmer of a general-purpose processor would expect. The user writes a program that takes a packet (or other unit of work) and performs a set of operations on the packet. The program will continue processing the packet until all of the necessary work has been performed. When the current packet is finished, the program will get a new packet and repeat the procedure. To take advantage of the parallelism within a network processor, this program is instantiated on each thread in parallel.
The programmer using this model is responsible for providing serialization between threads. To help the programmer, most network processor vendors provide a combination of special hardware and software libraries. The hardware may include efficient lock hardware as well as special features to perform common tasks without locking. One example would be hardware that atomically increments a memory location; this feature allows an application to keep statistics without locking the relevant data structures.
The network processor vendors normally provide some libraries that allow the programmer to abstract the locking functions. These libraries will typically provide access to hardware functions in a manner familiar to the programmer.
In the sequential model, the programmer is also responsible for maintaining packet order for applications where order is required, with the complexity of maintaining packet order depending on the specific application. As with serialization, most network processor vendors provide a combination of hardware features and software libraries that can be used by the programmer to maintain packet order. It is the programmer's responsibility to understand when order must be preserved and to use the appropriate libraries to implement ordering.
Feature support
At the high level, the sequential programming model is similar to what most programmers would expect on a general-purpose processor. If a specific application has locking or ordering requirements, then writing the application may become more challenging. The additional complexity of writing the application will depend on how efficiently the network processor supports the required features and how effectively the provided libraries hide the underlying complexity.
The sequential model works well when the amount of effort needed varies from packet to packet. This will likely be the case when many different protocols and data types have to be processed. In this case, the programmer can look at the expected packet distributions and processing requirements to estimate performance.
The second programming model may be called the pipelined model. With this method, the programmer takes a network application and divides it into separate processing stages. Each of the processing stages is mapped onto specific forwarding engines. The packet logically moves through each of the stages similar to the way a hardware implementation would work. Variations of this model allow the programmer to map multiple functions onto a single processing element and instantiate them in parallel.
An important characteristic of the pipeline model is that threads maintain execution order through each of the blocks or set of blocks. If Thread 1 starts processing a packet before Thread 2, then Thread 1 must complete processing before Thread 2 can complete its own processing.
The mechanics of enforcing thread ordering are specific to each network processor. With some network processors, the hardware itself controls the movement of data from one stage to another. In other network processors, the programmer maintains order by communicating between the threads.
The pipelined model helps to solve concurrency problems. Because the programmer controls the execution order, it is possible to ensure that only one thread (or group of threads) is updating shared data at a same time.
Packet ordering problems are also solved in this model because the packets are implicitly ordered at each of the stages. Order is maintained for all packets processed by the same group of threads.
As with any pipeline architecture, the throughput will be limited by the slowest pipeline stage. The pipelined model is well suited for tasks where the amount of time spent on each packet is approximately the same.
One challenge the pipeline model presents is balancing the work performed by each pipeline stage to ensure that no stage takes significantly longer than the other stages. When new features are added to an existing design, this may require rearranging some of the code to rebalance the pipeline stages.
An application that is well suited to a mix of sequential and pipelined programming if the underlying network processor supports it is a basic router implementing IPv4 and IPv6 forwarding and IPv4 and IPv6 interoperability. It maps well to a network processor configuration, in this case the IXP2800, consisting of 16 processing elements, each running at 1.4 gigahertz and in which each microengine has eight threads.
The pipelined model is the best choice for media receive and transmit, as well as queuing and scheduling, for several reasons. First, the amount of work that must be performed on each packet/cell is the same. Second, the blocks are all updating shared state, so efficient mutual exclusion is important.
Using the pipelined model we can efficiently implement this mutual exclusion without requiring locks in memory. Another reason for choosing the pipeline model is that all of these blocks require packets to be processed in the order they are received. The pipelined model naturally achieves this ordering requirement without additional overheads or complexity.
The sequential model is ideal for packet processing because the amount of work required for each packet can vary. One packet may need to go to IPv4 forwarding, then to the interoperability block for encapsulation and then to IPv6 to forward the encapsulated packet. Another packet may require only IPv4 forwarding. Using the sequential model, we allow the thread processing of the second packet to complete its work and start processing another packet while the first packet is still being processed. Correct packet ordering is maintained as items are sent to the queue manager.


See related chart
