Network data flow is key to SRAM choice
J. Thomas Pawlowski
5/9/2003 10:24 AM EDT
The insatiable desire for higher-bandwidth networks has made it necessary to create increasingly innovative memory architectures to support new system implementations.
The network design community first adopted the synchronous burst SRAM, also known as the SyncBurst or pipelined burst SRAM, for this purpose in the mid-1990s. At the time, these implementations had to rely on unidirectional data streaming, keeping the SRAM bus in one direction for several cycles before reversing the data flow direction. This was necessary for reasonable bus utilization employing a device originally designed for cache line operations; that is, four data words associated with each address.
As network speeds increased, such an approach to data streaming became inadequate and rapid bus turnarounds (read, then write, then read, etc.) became essential. But this device definition could not perform that way. Wait states added to permit contention-free bus turnarounds affected performance. Higher frequencies were required to achieve the necessary data transfers. Both data and address bus efficiency were inadequate.
Once it was understood how these new systems were exercising the SyncBurst SRAMs, manufacturers began to develop architectural improvements, resulting in the zero bus turnaround SRAM pioneered by Micron, IDT and others. Optimized for network data flow, this type of SRAM has an equal read and write cycle pipeline length. Thus, reads and writes could be randomly interspersed with no bus turnaround penalty. The performance-hungry network design community snapped them up, and this architecture remains the most widely used SRAM in network applications.
There is, however, a problem. Simply put, the zero bus turnaround architecture assumes a specific frequency target, basically a range of 50 through 166 MHz. Only systems with frequencies in that range can use these SRAMs to avoid wasted bus turnaround cycles. To be sure, zero bus turnaround designs can be clocked at higher frequencies, but only with the addition of the dreaded wait state to avoid bus contention. So, now what?
SRAMs are used in a plethora of places in network applications, so many that it is easiest to classify them in terms of the nature of traffic flows that the SRAM bus needs to accommodate.
Lookup-style accesses entail mostly read cycles and occasional writes to update the table. Packet-buffering-style accesses entail balanced read and write operations, acting essentially as an addressable buffer. Packet-classification or quality-of-service processing touches the packet more than once, resulting in an unbalanced ratio between reads and writes and, indeed, in ratios that vary depending on network traffic. Any new architectures will need to deal with this diversity of bus operations.
Increased bandwidth also must be addressed. Obviously, bandwidth can be increased by simply making the bus wider. A 100-MHz, 144-bit zero turnaround bus could easily provide a sustainable bandwidth of 14.4 Gbits/second. However, employing 144 data signals is very costly. And bandwidth in network data-flow management configurations cannot be provided blindly, but demands maximum pin efficiency. If, for example, an ASIC technology is capable of 400-Mbit/s/pin toggle rates while satisfying signal integrity issues, then it must be used at that frequency or the implementation will sacrifice pin efficiency.
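The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the model assumes every clock cycle carries data and that all pins toggle at the same rate.

```python
def peak_bandwidth_gbps(clock_mhz: float, bus_bits: int,
                        transfers_per_clock: int = 1) -> float:
    """Peak bus bandwidth in Gbit/s, assuming no idle cycles (a simplification)."""
    return clock_mhz * 1e6 * bus_bits * transfers_per_clock / 1e9


def pins_needed(target_mbps: int, pin_rate_mbps: int) -> int:
    """Smallest number of data pins that reaches target_mbps at the given
    per-pin rate (integer ceiling division)."""
    return -(-target_mbps // pin_rate_mbps)


if __name__ == "__main__":
    # The 100-MHz, 144-bit single-data-rate bus from the text.
    print(peak_bandwidth_gbps(100, 144))
    # Data pins needed for the same bandwidth at 400 Mbit/s per pin.
    print(pins_needed(14400, 400))
```

Under these assumptions, the same 14.4 Gbits/second that takes 144 data signals at 100 MHz would take 36 signals at 400 Mbit/s per pin, which is the pin-efficiency argument in numeric form.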
Data transfer size is also important. Each SRAM bus design uses the smallest word as the basis to set the device burst length requirement, or how many bits must be transferred per address.
Device latency, while not the most critical parameter, is nevertheless a factor. Since the classic two-stage SRAM pipeline (array access plus data transfer) is a reasonably efficient solution, it is best to maintain this concept wherever possible.
This diversity of issues has resulted in a solution requiring three seemingly different yet complementary SRAM architectures, the product of an industrywide effort, the Quad Data Rate SRAM Consortium, consisting of Micron, Cypress Semiconductor, IDT, NEC and Samsung Electronics, among others.
The first of the network-optimized architectures, quad-data-rate SRAM, is so named to reflect how many data transfers occur during one clock cycle. This SRAM comprises separate data-in and data-out buses. Each bus runs at double data rate (DDR) and transfers two data bits per pin per clock cycle. Each bus runs concurrently, hence, 2 x 2 = quad data rate. This also means that read and write operations can be simultaneous.
In fact, the two-word burst device (acting on two bus widths of data per address) requires that a read and a write be initiated each clock cycle to saturate both data buses. This device is optimized for the case in which read and write operations are balanced and, even more important, balanced in the short term. To better illustrate: If there were only one bus, it would Ping-Pong between read and write, then read, then write, etc. That is what is meant by short-term balanced read/write. If this device is used in, say, a lookup application, the SRAM data input bus would be mostly idle and the pins would not all be efficiently utilized. In contrast, a packet-buffering-style application can use the two separate buses to great advantage.
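A rough model makes the effect of traffic mix on a separate-bus device concrete. The sketch below is an assumption-laden simplification, not a description of any real part: it supposes the busier of the two buses is kept saturated, so the other bus is used only in proportion to the read/write ratio. The function name is hypothetical.

```python
def qdr_utilization(read_fraction: float) -> float:
    """Aggregate utilization of the two data buses (0.0 to 1.0) for a given
    fraction of reads, assuming the busier bus runs every cycle."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    busier = max(read_fraction, 1.0 - read_fraction)
    # Busier bus is 100% used; the other carries (1 - busier) / busier.
    # The average of the two simplifies to 0.5 / busier.
    return 0.5 / busier


if __name__ == "__main__":
    for mix in (0.5, 0.75, 0.9, 1.0):
        print(f"reads={mix:.0%}  aggregate utilization={qdr_utilization(mix):.0%}")
```

In this model a balanced packet buffer keeps both buses full, while a read-only lookup table leaves the input bus idle and drops aggregate utilization to one half.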
The second of the net-optimized architectures, DDR SRAM, comprises a single common data I/O bus. This device is therefore more accurately referred to as the DDR CIO (common I/O) SRAM. The bus runs at double data rate.
This architecture is optimized for the case in which the bus remains in one direction for long periods: the longer, the better. Unlike QDR, only one read or write can be in flight at one time. If this device is used in, say, a lookup application, the SRAM data bus is mostly used for read cycles with few bus turnaround cycles, so all pins are efficiently utilized. But a packet-buffering-style application can efficiently use this SRAM only by data streaming, that is, executing multiple read cycles back to back, turning the bus around, executing multiple write cycles back to back, turning the bus around, etc.
Data bus efficiency
The goal here is to minimize the number of bus turnaround operations to maintain highest data bus efficiency. At high frequencies (above 180 MHz), two idle cycles are required after a read cycle if a write cycle is to follow.
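The cost of those idle cycles can be estimated with a simple model of data streaming on a common I/O bus. The sketch assumes each read or write occupies the bus for one clock cycle and that traffic repeats as a run of reads followed by an equal run of writes. The two-cycle read-to-write penalty comes from the text; the write-to-read penalty is not stated there, so it is a parameter that defaults to zero as an assumption. The function name is hypothetical.

```python
def cio_efficiency(run_length: int,
                   read_to_write_idle: int = 2,
                   write_to_read_idle: int = 0) -> float:
    """Fraction of clock cycles that carry data when run_length reads are
    followed by run_length writes, repeated indefinitely."""
    if run_length < 1:
        raise ValueError("run_length must be at least 1")
    data_cycles = 2 * run_length  # one run of reads plus one run of writes
    idle_cycles = read_to_write_idle + write_to_read_idle
    return data_cycles / (data_cycles + idle_cycles)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"run length {n:2d}: {cio_efficiency(n):.1%} of cycles carry data")
```

With these assumptions, alternating single reads and writes uses only half the cycles, while longer runs amortize the turnaround and approach full efficiency.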
The third SRAM type, known as DDR SIO (separate I/O), is a cross between QDR and DDR. Like QDR, it has separate data input and output buses. Like DDR, only one read or write can be in progress at once. Unlike DDR, there is never a need to insert idle cycles for bus turnarounds, so the device can always accept one random read or write each clock cycle. The aggregate bus utilization is always 50 percent if you count both data buses.
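The 50 percent figure follows directly from having two buses but only one operation per clock. A minimal sketch, assuming one operation is issued every cycle and each operation fills exactly one bus for that cycle (the function name is hypothetical):

```python
def sio_bus_utilization(read_fraction: float) -> tuple[float, float, float]:
    """Return (output-bus, input-bus, aggregate) utilization for a device
    with separate buses that accepts one read or write per clock."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    out_bus = read_fraction        # reads drive the data output bus
    in_bus = 1.0 - read_fraction   # writes drive the data input bus
    return out_bus, in_bus, (out_bus + in_bus) / 2


if __name__ == "__main__":
    for mix in (0.0, 0.5, 0.9):
        out_bus, in_bus, total = sio_bus_utilization(mix)
        print(f"reads={mix:.0%}: out={out_bus:.0%} in={in_bus:.0%} aggregate={total:.0%}")
```

Whatever the read/write mix, the two per-bus figures sum to one, so the aggregate stays at one half.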
While choosing the optimal SRAM for a new application still remains a matter of establishing clear goals after looking at trade-offs among data signal count, bus width, frequency, bus use and optimal data transfer size, the good news is that these network-optimized SRAM architectures accommodate nearly every reasonably imaginable set of goals.
J. Thomas Pawlowski is senior fellow at Micron Technology Inc. (Boise, Idaho).
http://www.eet.com



