Design Article
Implementing Parallel Packet Buffering: Part 2
Sailesh Kumar, Raja Venkatesh, Joji Philip, and Sunil Shukla, Paxonet Communications
4/29/2002 2:34 AM EDT
Here's the second installment in our two-part set on parallel packet buffering (PPB) architectures. In the Part 1 of this series, we explored how the PPB architecture works, looking at how it improves memory utilization and overall memory bandwidth in network architectures. Now, we'll explore some of the challenges that designers will face when implementing this technology. We'll also deliver some requirements for choosing the number of DRAM memory modules when implementing PPB.
While PPB provides many positives, such as higher bandwidth and better bandwidth utilization, it does come with one disadvantage--increased overflow rate.13,14,15 Let's look at the impact of overflow on a PPB architecture for different traffic profiles.
The PPB architecture can lead to an early overflow even when there is some space available in one of the memory modules. If the less occupied memory module is being read, then all the incoming packets will be directed to other idle modules, which may already be full or having higher occupancy, causing an overflow or increased imbalance in the occupancies of the memories.
To illustrate this situation, let the occupancies of the memory modules 1, 2, and n be X1, X2, and Xn respectively. Then the state of the PPB at any time can be represented by the vector { X1, X2, ...,Xn}.
Due to random read and write sequences, there can be small discrepancies in the value of Xi's. However, ideally, in the steady state, the values of Xi should remain almost same for all i.
Statistically all the Xi's should tend towards each other because PPB's selection policy chooses the memory module with the least occupancy for writing any new arriving packet. However, if all the memory modules with lesser occupancies are getting read consistently, then the arriving packet will be directed to a memory module with a higher occupancy. This will lead to an increase in already existing discrepancies in the memory occupancies. These increasing discrepancies in the memory occupancies might ultimately lead the overflow of the PPB even when there is some space left in some of the memory modules.
An early overflow state in a PPB with N=2 is depicted in Figure 6.

As a worst case in PPB, it is quite possible that one of the memory module overflows while remaining (n -- 1) modules remains completely empty. Thus, in the worst-case scenario, PPB architectures will perform like an n times faster buffer with n times less buffering space.
Fortunately, in practice, such cases are very rare. Even during simulation, this scenario is difficult to recreate.
Overall, simulation results showed that the amount of memory getting wasted due to increased overflow rate is independent of the total buffer size and the bandwidth. It depends on the traffic profile and is inversely proportional to the value of n.
Estimating the overflow rate
To estimate the overflow rate, let's look at the impact an n=2 PPB architecture as on input-queued switches.16,17. In this example, assume that the PPB memory buffering space is used to emulate n virtual packet queues in a common memory buffer. Qi represents the ithqueue and event of arrival of a packet to this queue is represented by Ai.
For the arrival process, let's consider Ai's distributed function as a uniform independent identically distributed (i.i.d.) Bernoulli function.19. The selection policy of scheduler (arbiter) to select one of the virtual queues for reads is made random. It selects a queue out of the n queues each with uniform probability of selection (1/n) independent of the queue size.
Finally, we will only present simulation results for the case when offered load is high or the virtual queues are written and read very rapidly, utilizing the maximum possible bandwidth. The reason we do this is because we are interested in the overflow rate and overflows only occur in significant numbers (and therefore only become measurable) when the offered load is high. Also we define the wastage factor (ω) as the ratio of memory buffering space left in the memory when overflow occurs to the total memory size. Equation 2 highlights this scenario:

where Oi is the occupancy of the ith memory module when the overflow occurs.
The simulation results for uniform i.i.d. Bernoulli traffic without any burstiness has been shown in Figure 7. Based on the results highlighted in this figure, it appears that the wastage factor, ω is inversely proportional to the total PPB size M. However, the total memory wastage in terms of number of buffers remains almost same irrespective of the PPB size (M).

In practice, it can be concluded that beyond certain M, the total memory wastage remains almost constant and we have found that it is as low as 10 packets worth space and wastage factor keeps on decreasing as the total buffer size increases. Thus by putting few additional buffers in the PPB, the overflow rate will become comparable to that of a logically single memory based architecture.
Now let's consider whether burstiness in the arriving traffic cause higher rate of overflows. Intuitively, it can be said that the traffic arrival rate can lead to burstiness in the traffic departure, if the device is not performing any traffic shaping function. This kind of burstiness can result in higher imbalances in the memory occupancies and hence higher overflow.
To see the effect of burstiness on the performance of the PPB architecture, let's consider a two-state Markov-modulated Bernoulli arrival method.18. Often used as a simple model of burstiness, this method models a simple on-off process that alternately produces a burst of packets, all with the same destination, followed by an idle period. The length of the bursty and idle periods is geometrically distributed.
Through this simulation, it can be seen that, contrary to expectations, burstiness in the incoming traffic reduces the fraction of buffering space wasted by the PPB architecture. This is also shown by Figure 7 above.
Overall, the authors of this article believe that increased burstiness will decrease the wastage factor. Here's why.
First, recall the example in Figure 7 that shows how discrepancy (and hence increased wastage factor) can arise in the PPB architecture. The discrepancy was caused by a sequence of constrained writes (while other memory modules were being read out) that forced an arriving stream of packets to be written into only one of the memory module.
Later, when this sequence of packets departed (all from one memory), they forced another sequence of arriving packets to be written into the other memory. In other words, when a sequence of arriving packets causes a discrepancy in the occupancy distribution, the same sequence of packets will cause further discrepancy when they depart. In this way, discrepancy propagates in a FIFO PPB buffer queue.
Now, suppose that the packets had not departed in FIFO order. In fact, let's assume that they departed in a random order. This would mean that the first sequence of arriving packets would not force all of the packets in a future sequence to be written into the same memory module. In other words, the random departure order breaks this propagation of discrepancy from one sequence of packets to the next.
Now consider what happens when the buffer is arranged as a set of virtual output queues rather than a single FIFO queue. The scheduling algorithm will cause the packets to depart in non-FIFO order. Thus, the wastage factor will decrease. Furthermore, if arriving packets are burstier, they will depart even further away from the FIFO queue.
Since the scheduling algorithm is independent of the queue size, it is unlikely to service the same queue twice in a row. This means that packets arriving in a burst are unlikely to depart consecutively, or even close to each other in time. If, on the other hand, the arrivals are not bursty, then successively arriving packets are likely to be destined for different queues. If this analysis is accurate, it is therefore quite possible that the scheduler will cause them to depart consecutively, or close to each other in time.
Currently, the authors are running simulations on this theory. At this point, however, the authors have not been able to prove that the scenario detailed above will occur on a regular basis.
Wastage Factor Variations
All of the results that we have presented have been for the case when the offered load is high. Although the examples above show that the wastage factor is low for a reasonably larger sized memory, it is worth asking whether the wastage factor is still small when we reduce the offered load. Indeed, simulation results show that as we reduce the offered load for different types of traffic, the wastage factor decreases.
For brevity, we will not reproduce lots of results here. However, Figure 7 shows that as the offered load decreases, the waste factor also decreases.
Intuitively, the relationship between offered load and wastage factor is simple. Recall that wastage in the PPB architecture is caused by discrepancies in the occupancy level of the memory modules, which in turn is caused by reads and writes in a particular order. When the offered load is high, there are a large number of reads, which constrain the writes of the (plentiful) new packet arrivals. Therefore, the probability that an incoming packet faces a constrained write increases with the offered load. Discrepancy is increased, leading to larger wastage.
Therefore, we can infer that when the offered load is reduced, the wastage factor will also decrease. In fact, we found that the wastage factor is inversely proportional to the offered load on the PPB.
To support this point, let's look at the discrepancy of the memory occupancies at the times of overflows. The discrepancy is independent of memory size. However, if the buffer is small, then it constrains the discrepancy to small values. A small increase in memory size significantly reduces the probability that a discrepancy will cause an overflow, but increases the maximum discrepancy (Figure 8).

On the other hand, if the buffer is large, a small increase in memory size barely changes the probability of a discrepancy causing an overflow. In addition, the maximum discrepancy remains almost same.
Wastage Factor as Function of n
Recall that all the simulation results have been shown for a PPB with n=2. It can also be proven that as the value of n increases, the wastage factor decreases rapidly in an inverse relation. This can be hypothetically proved by the following argument:
- It is known fact that the wastage factor is directly proportional to the memory discrepancy.
- The memory discrepancy in PPB occurs when a write is constrained to happen on a memory with higher occupancy. This happens because the memories with lower occupancy may be getting read while the new packet arrives.
- Consider that a packet to be read out of PPB has equal probability to exist each of the n memory modules. Thus, the probability that the packet to be read exists in memory module i is 1/n for all i. Therefore for each read operation, PPB can select a memory module i with probability 1/n.
- Earlier we have proved that with the highest possible offered load on PPB, n/2 out of n memory modules will be getting read at any point of time. Therefore for arriving packet, PPB will select one memory module out of the n/2 idle memory modules. Now the discrepancy will increase only if all the n/2 idle modules have higher occupancy than all the n/2 modules that are getting read. From point 3, it can be concluded that probability of such an event is 1/(2xnx(n-1)). Thus the probability that the discrepancy will increase has an inverse relation with n.
Logically speaking, as the value of n increases, the PPB arbiter has more memory modules (n/2) to select from during each write operation. The PPB arbiter will select the memory modules will the least occupancy. Therefore, by increasing the values of n, memory discrepancies (if they exist) will begin to decrease because the PPB arbiter will be able to select the least occupant memory modules out of the lot of n/2 modules.
Choosing the appropriate n
The real trick for designers implementing the PPB architecture is choosing the appropriate n value for their design. To help out, designers can use the following parameters:
- Bandwidth requirements.
- Traffic profile (mean and variance of packet size).
- The available memory technology.
As the value of n goes up, the memory bandwidth requirement of a single memory module goes down. Actually each memory is required to support n times lesser bandwidth than the line rate. Thus to achieve very high bandwidth that can't be supported by a logically single memory, there is no option but to use PPB. The value of n can be chosen depending on the buffering bandwidth required and the maximum bandwidth capacity of the memory technology.
The n value also depends on the traffic profiles and the packet sizes. If there is little variation in the packet sizes, then lower values of n will suffice the purpose. For very high fluctuation in the packet sizes, to provide constant and deterministic access times and better utilization of memory buffering space, higher values of n should be chosen.
Also as the n value goes up, the data width of individual memory modules goes down. Their bandwidth requirement also goes down. Thus cheaper memories with narrower data bus and lower bandwidth can be used.
Summing it Up
That wraps up the second part in our two part-set on packet buffering. As can be seen from the discussion, the PPB architecture provides the best memory utilization in a networking architecture while also delivering the scalability needed to keep pace with increasing line rates.
Editor's Note: To view Part 1 of this article, Click here.
References
- Obara, H., "Optimum architecture for input queueing ATM switches," IEE Electronics Letters, pp. 555-557, March 28, 1991.
- Marwan Krunz, Herman Hughes, and Parviz Yegani, "Design and analysis of a buffer management scheme for multimedia traffic with loss and delay priorities," Proceedings of the IEEE GLOBECOM '94 Conference, pp. 1560-1564, San Francisco, November 1994.
- J. Liebeherr and N. Christin, "JoBS: Joint Buffer Management and Scheduling for Differentiated Services", Proceedings of IEEE/IFIP International Workshop on Quality of Service (IWQoS '2001). Karlsruhe, Germany, June 2001
- H. Fu and E. Knightly, "Aggregation and Scalable QoS: A Performance Study", Proceedings of IWQoS '01, Karlsruhe, Germany, June 2001.
- M. Shor, K. Li, and J. Walpole, "Application of Control Theory to Modeling and Analysis of Computer Systems," Proceedings of Japan-USA-Vietnam Workshop on Research and Education in Systems, 2000.
- Eric M. Mantion, "Tomorrow.s Internet Broadband Capillaries, Need Terabit Arteries", Cahners IN-STAT Group, August 2001.
- Gideon Kaempfer, "The Challenges of Building a Terabit Router", Charlotte's Web Networks.
- Youngmi Joo, and Nick McKeown, "Doubling Memory Bandwidths for Network Buffers", IEEE Infocom, April 1998, Vol 2, pp. 808-815, San Francisco.
- Sundar Iyer, Ramana Rao Kompella, and Nick McKeown, "Analysis of a Memory Architecture for Fast Packet Buffers", IEEE Workshop on High Performance Switching and Routing, Dallas, Texas, May 2001.
- D. K. Hunter, M. C. Chia, and I. Andonovic, "Buffering in optical packet switches," J. Lightwave Technol., vol. 16, no. 12, pp.
- M. I. Irland, "Buffer Management in a Packet Switch," IEEE Trans. Commun., vol. 26, pp. 328-- 337, Mar. 1978.
- D.K. Hunter, B. Qiu, K. M. Guild, M.C. Chia, A. C. Bryce, M. H. M. Nizam, Y. Qian, I. Andonovic, J. S. Aitchison, J. Marsh, M. J. O'Mahony: "Buffering Strategies for Optical Packet Switches", OSA Topical Meeting on Photonics in Switching, 21-23 July 1999, Santa Barbara, CA
- Jin Cao and Kavita Ramanan, "A Poisson Limit for Buffer Overflow Probabilities" Proceedings from INFOCOMM, 2002.
- C. Courcoubetis and R. Weber, "Buffer overflow asymptotics for a buffer handling many traffic sources," Journal of Applied Probability, vol. 33, pp. 886--903, 1996.
- Michel Mandjes and Sem Borst, "Overflow behavior in queues with any long-tailed inputs," Advances in Applied Probability, vol. 32, pp. 1150-- 1167, 2001.
- Marco Ajmone Marsan, Andrea Bianco, Paolo Giaccone, Emilio Leonardi, Fabio Neri, "Packet Scheduling in Input-Queued Cell-Based Switches", IEEE INFOCOM'01, Ankorage, Alaska, USA, April 2001.
- N.McKeown, "Scheduling algorithms for input-queued cell switches", Ph.D. Thesis, Un. of California at Berkeley, 1995.
- A. B. DRAGUT, M. Dragut, "Upper Bound Decision Process for a Time-Constrained Radical New Product Development Process (TR-NPD)-A Finite Horizon Discrete-Time Markov Nonstationary Approach", Proceedings of the X-th International Symposium on Applied Stochastic Models and Data Analysis, June 12-15, 2001, Compigne, France, pp.392-397.
- B.D.Choi, Y.C.Kim, D.I.Choi, D.K.Sung, "An Analysis of M, MMPP/G/1 Queues with QLT Scheduling Policy and Bernoulli Schedule", IEICE Transactions on Communication, Vol.E81-B, pp.1-10, No.1, 1998.
About the Authors
Joji Philip, Raja Venkatesh, Sailesh Kumar, and Sunil Shukla are design engineers at Paxonet Communications. Joji can be reached at joji@pune.paxonet.com; Raja can be reached at raja@pune.paxonet.com; Sailesh can be reached at sailesh@pune.paxonet.com; and Sunil can be reached at sunil@pune.paxonet.com.



