Design Article
Minimizing latency in diverse embedded system design environments
Miguel Rodriguez
9/14/2009 3:02 PM EDT
These applications are designed to take advantage of PCIe's high-throughput capabilities but also face performance-limiting latency that's masked by the large amounts of data moving throughout the system. Device latency, therefore, plays a hidden yet critical role in how well embedded and communications systems perform.
This article will address how latency issues in embedded (and other) systems have been successfully and measurably countered in PCIe's first two generations and what designers can expect with PCIe Gen 3 on the horizon.
"High performance" in many instances is associated with high throughput. Though bandwidth and performance go hand in hand, there are other factors that make a significant contribution to system performance where high throughput is not part of the system application. In such applications, latency -- specifically device latency -- plays a larger role in the overall performance.
System latency can be narrowed down to two key contributors: the time an endpoint takes to respond to a read request, and the time a packet takes to traverse a device such as a PCIe switch. For posted transactions, the system latency is the sum of the latencies of the individual components.
For non-posted transactions, such as memory and configuration reads, the latency is doubled to account for the round-trip delay. Depending on the number of endpoints in a system, multiple switches can be cascaded to increase the number of PCIe connections. The more switches in a PCIe fabric, the higher the aggregate device latency.
Figure 1. Latency as measured in a PCIe switch
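The latency model described above for posted and non-posted transactions can be written as a short Python sketch. The function names and the nanosecond figures in the usage lines are hypothetical and chosen only for illustration; they are not measurements of any real device.

```python
def path_latencies_ns(num_switches, switch_latency_ns, endpoint_latency_ns):
    """Per-device latencies along a path: cascaded switches, then the endpoint."""
    return [switch_latency_ns] * num_switches + [endpoint_latency_ns]


def posted_latency_ns(latencies_ns):
    """Posted transaction: one-way trip, so sum the individual device latencies."""
    return sum(latencies_ns)


def non_posted_latency_ns(latencies_ns):
    """Non-posted transaction (e.g. a memory read): doubled for the round trip."""
    return 2 * posted_latency_ns(latencies_ns)


# Hypothetical numbers: two cascaded switches at 150 ns each, endpoint at 400 ns.
path = path_latencies_ns(num_switches=2, switch_latency_ns=150, endpoint_latency_ns=400)
print(posted_latency_ns(path))      # 700
print(non_posted_latency_ns(path))  # 1400
```

Under this model, each additional switch in the path adds its latency once to a posted write and twice to a read.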
Protocol efficiency, although not directly affected by system latency, also plays a key role in overall performance. PCIe is a packet protocol: each packet consists of a header of up to 4DW, which carries the routing information, and the actual data payload. A PCIe packet can carry up to 4KB of data payload per the PCIe specification. Every PCIe packet, regardless of the data payload associated with it, must include a header.
Storage and server applications move many megabytes of data through the system at any given time. A posted transaction from a Fibre Channel (FC) controller, for example, can transfer up to 4KB of data. This large transaction is broken down into multiple PCIe packets according to the PCIe maximum payload size (MPS), which is typically 128B or 256B and is determined by the least capable device in the system.
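To make the effect of MPS on protocol efficiency concrete, the following Python sketch splits a transfer into packets and computes the fraction of bytes that are payload. It assumes a 4DW (16-byte) header on every packet and ignores all other link overhead, so the results are an upper bound on efficiency rather than a measured figure. The function names are hypothetical.

```python
HEADER_BYTES = 16  # assumption: every packet carries the 4DW maximum header


def packets_needed(transfer_bytes, mps_bytes):
    """Number of PCIe packets required to carry the transfer at a given MPS."""
    return -(-transfer_bytes // mps_bytes)  # integer ceiling division


def header_efficiency(transfer_bytes, mps_bytes, header_bytes=HEADER_BYTES):
    """Payload bytes divided by payload plus header bytes (other overhead ignored)."""
    n = packets_needed(transfer_bytes, mps_bytes)
    return transfer_bytes / (transfer_bytes + n * header_bytes)


# A 4KB posted write at the two MPS values mentioned in the text.
for mps in (128, 256):
    print(mps, packets_needed(4096, mps), round(header_efficiency(4096, mps), 4))
# 128 32 0.8889
# 256 16 0.9412
```

Doubling the MPS halves the number of headers, which is why the least capable device in the system can limit throughput for every other device.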
The completions associated with read requests are also broken down into multiple PCIe packets, depending on the system's PCIe MPS. To reduce the protocol overhead and increase system throughput, an FC controller will issue multiple outstanding requests at once, which masks the system latency.
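One way to see why multiple outstanding requests mask latency is a simple bandwidth-delay model, sketched below in Python. This model is an illustrative assumption, not something taken from a specific controller: throughput grows with the number of requests in flight until the link itself becomes the limit. The link rate, round-trip time, and request size in the usage lines are hypothetical.

```python
def read_throughput(outstanding, request_bytes, round_trip_ns, link_bytes_per_ns):
    """Sustained read throughput in bytes/ns under a simple in-flight model."""
    latency_bound = outstanding * request_bytes / round_trip_ns
    return min(link_bytes_per_ns, latency_bound)


def requests_to_saturate(request_bytes, round_trip_ns, link_bytes_per_ns):
    """Smallest number of outstanding requests that keeps the link busy."""
    bytes_in_flight = link_bytes_per_ns * round_trip_ns
    return -(-bytes_in_flight // request_bytes)  # integer ceiling division


# Hypothetical system: 2 bytes/ns link, 1000 ns round trip, 512-byte reads.
print(read_throughput(1, 512, 1000, 2))     # 0.512 (latency-bound)
print(requests_to_saturate(512, 1000, 2))   # 4
print(read_throughput(4, 512, 1000, 2))     # 2 (link-bound)
```

With a single request in flight, the hypothetical link sits idle most of the time. With four in flight, the round-trip latency is hidden behind the data already moving.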




