INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)
ISSN 0976 – 6464 (Print), ISSN 0976 – 6472 (Online)
Volume 4, Issue 1, January-February (2013), pp. 79-84
© IAEME: www.iaeme.com/ijecet.asp
Journal Impact Factor (2012): 3.5930 (calculated by GISI), www.jifactor.com
FPGA IMPLEMENTATION OF SCALABLE QUEUE MANAGER
Ms. Sharada Kesarkar¹, Prof. Prabha Kasliwal²
Department of Electronics, MAE, Alandi (D)
University of Pune, India.
¹sharada.kesarkar@gmail.com, ²prabha.kasliwal@gmail.com
ABSTRACT
The memory system is very often the main issue in designing a network system, mainly due to the constantly changing nature of network traffic and the demand to realise quality of service (QoS) in networks. Per-flow queuing is an effective solution for guaranteeing QoS, but its brute-force implementation consumes a huge amount of memory because it assigns a dedicated physical queue to each in-progress flow, so the queue manager (QM) must maintain a large number of queues. This makes memory scalability a critical issue, and system performance may degrade as the number of flows increases. To achieve per-flow queuing performance with less memory, a scalable queue manager with a fixed architecture and efficient queue management is vital. In this paper, we present a proposed FPGA implementation of a scalable QM architecture which manages memory resources efficiently through a dynamic queue sharing (DQS) technique. DQS isolates each incoming active flow by dynamically allocating ongoing active flows onto a limited number of physical queues instead of assigning a dedicated queue to each in-progress flow. In practice, the number of active flows is always low, which significantly reduces the required physical queues from millions to hundreds, and hence the required memory resources. DQS for per-flow and per-class operation is designed, implemented and simulated in the Xilinx ISE simulator using the Xilinx Spartan-3 family device xc3s4000. The proposed advanced algorithm enables the architecture to work in per-flow and per-class mode, which dramatically reduces queue exhaustion.
Keywords: Field programmable gate arrays (FPGAs), queue manager (QM), DQS, active flow, per-flow queuing
I. INTRODUCTION
The constantly changing nature of network traffic and the demand to realise quality of service (QoS) in networks require the packet-processing functionality in networking devices to support varying QoS levels [1]. The queuing system is mainly responsible for holding arriving packets during traffic congestion, smoothing the bursts of Internet traffic. In a packet buffering system, related studies fall into three categories: the enqueue mechanism, the dequeue mechanism and the queue organisation mechanism [2]. Queue organisation is the basis of the queuing system: it decides how queues are implemented in a buffer and how flows are assigned to physical queues. If the number of physical queues is larger than the number of active flows (flows that have packets in the queue), an isolated queue can be assigned to each flow to maintain the required QoS; otherwise several different flows share the same queue, the QoS guarantee of each flow is violated, and queue exhaustion increases.
Per-flow designs assign a separate physical queue to each in-progress flow, but the number of flows carried over links in current networks is continuously increasing [3]. Whenever there are more flows than physical queues, several flows compete for the same resources and the QoS guarantee of each flow is violated, so the QM needs to maintain a large number of queues. With an increasing number of flows it becomes difficult to maintain and manage so many queues, and the memory required becomes huge, making memory scalability a critical issue. Thus, efficient use of memory resources is necessary to achieve a scalable queue management system.
To overcome these limitations, we propose an FPGA implementation of a scalable QM based on the proposed advanced algorithm. We monitor the number of active flows continuously. In practice, most active flows have short time scales [2], [4], [5], and the number of active flows is always low compared to the total number of flows. Based on this, we propose techniques to implement a scalable QM. The idea is to assign physical queues only to active flows, instead of to all in-progress flows in the network, while maintaining the per-flow queuing feature. We use a total of ten fixed queues to store packets from the different incoming active flows. If a queue is full, a queue-full indication is raised so that another free queue can be used. Whenever a particular output port becomes available, the packet is sent to that port and the queue is made free. Hence the number of required physical queues can be reduced from millions to hundreds. The proposed advanced algorithm enables the architecture to work in both per-flow and per-class mode depending on the current traffic condition (i.e. whether it is increasing or smooth), dramatically reducing queue exhaustion as well as the number of packet losses.
The key features are:
- dynamically assign a separate physical queue only to each active flow, to maintain the per-flow queuing feature;
- use memory resources efficiently by reducing the required number of physical queues in per-class mode; and
- support a larger number of input flows on the same number of queues.
The queue exhaustion phenomenon is observed in traditional per-flow QMs when there are more flows than available physical queues [7]. A straightforward solution is to increase the number of physical queues to equal the number of input flows, but this requires more memory, and such a huge number of physical queues is difficult to maintain. By allocating dedicated queues only to simultaneously active flows (an active flow is defined as a flow having packets buffered in the device) instead of to all in-progress flows, the QM is able to reduce the required physical queues from millions to hundreds [10]. However, in practice the traffic of current networks varies continuously, which can impact system performance: unfavourable network traffic can contend for the available queue resources, leading to queue exhaustion (i.e. all physical queues being occupied) and causing major performance degradation. Dynamic queue sharing (DQS) shares a small number of physical queues among the active flows in a system, which solves queue exhaustion to some extent. However, no existing QM solution implements active-flow sharing for queue management in a scalable fashion.
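The core of dynamic queue sharing — assigning a physical queue only while a flow actually has packets buffered, and releasing it as soon as the flow drains — can be illustrated with a minimal sketch. The paper's design is in VHDL; the Python below is only an illustration, and the class and method names are our own, not from the paper.

```python
from collections import deque

class DQSQueuePool:
    """Illustrative sketch of dynamic queue sharing: a small pool of
    physical queues is assigned only to *active* flows (flows with
    packets currently buffered), not to every in-progress flow."""

    def __init__(self, num_queues=10):
        self.free = list(range(num_queues))   # ids of free physical queues
        self.flow_to_queue = {}               # active flow -> queue id
        self.queues = [deque() for _ in range(num_queues)]

    def enqueue(self, flow_id, packet):
        # A flow becomes active on its first buffered packet and is
        # only then given a dedicated physical queue.
        if flow_id not in self.flow_to_queue:
            if not self.free:
                raise RuntimeError("queue exhaustion: no free physical queue")
            self.flow_to_queue[flow_id] = self.free.pop()
        self.queues[self.flow_to_queue[flow_id]].append(packet)

    def dequeue(self, flow_id):
        qid = self.flow_to_queue[flow_id]
        packet = self.queues[qid].popleft()
        if not self.queues[qid]:
            # Flow is no longer active: release its physical queue.
            del self.flow_to_queue[flow_id]
            self.free.append(qid)
        return packet
```

Because queues are recycled as soon as a flow drains, the pool only needs to cover the peak number of simultaneously active flows, which in practice is far below the total number of in-progress flows.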
II. THE QUEUE MANAGER
The block diagram of the proposed queue manager is shown in Fig. 1. In this architecture, packets arrive at the input port and depart from the output port. The system is controlled by the queue controller, and memory devices implement the queues, i.e. the queue bank. The queue controller is responsible for writing data packets to, and reading them from, the queues. The main function of the QM is to store incoming packets in individual queues in the data memory, then read the stored packets and forward them to the data packet flow whenever the output ports are free. Fig. 1 outlines the architecture of the QM.
Fig.1: QM Architecture
The architecture consists of the following main blocks: queue controller (QC), queue bank, segmentation, active flow indication, and data packet flow. A data packet received from the input port is checked by the active flow indication block; if the input flow is an active flow, the packet is sent to the segmentation module, which performs the de-framing operation and separates the data bits from the packet. The QC takes the data and searches for a free queue to write it into. If a free physical queue is available, the QC writes the data into that queue and marks it busy; if the queue becomes full, a queue-full indication triggers a search for a new free queue. This is the queue write process. The queue read process sends the data to the data packet flow block, which frames the packet, sends it to the output port when the port is available, and makes the queue free. As long as there are enough free queues, they are used for new incoming active flows. The QC also monitors the number of free queues: if it falls below the cut-off level, it selects per-class mode; otherwise the architecture works in the default per-flow mode.
The architecture can work in two modes, per-flow QM and per-class QM. Whenever traffic increases and the number of free queues falls below the cut-off level, the QC switches the architecture to per-class mode. This reduces queue exhaustion in the per-flow QM by reducing the number of incoming active flows, which is done by organising the input active flows into classes of flows. Similarly, the QC performs the reverse operation, resuming per-flow mode, whenever the traffic is smooth and the number of free queues is greater than the cut-off level. The system can thus protect itself from queue exhaustion by selecting either mode (per-flow QM or per-class QM). This is implemented by designing the advanced algorithm.
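The mode decision and the per-class aggregation described above can be sketched as follows. This is an assumed simplification in Python (the hardware itself uses VHDL); `select_mode`, `queue_key` and the fixed class count are illustrative names, not the paper's.

```python
def select_mode(num_free_queues, cutoff):
    """Fall back to per-class mode when the free-queue count drops
    below the cut-off level; resume the default per-flow mode when
    traffic is smooth again."""
    return "per-class" if num_free_queues < cutoff else "per-flow"

def queue_key(flow_id, mode, num_classes=4):
    """In per-flow mode every active flow gets its own queue; in
    per-class mode many flows fold into a small number of classes,
    reducing the number of queues the incoming traffic can demand."""
    return flow_id if mode == "per-flow" else flow_id % num_classes
```

Folding flows into classes trades per-flow isolation for protection against exhaustion, which is why the QC only selects per-class mode under queue pressure.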
III. ALGORITHMS
Arrival
    search for an active flow
    if found:
        check whether its queue is full
        if not full:
            write the packet into the queue
        else:
            allot a new free queue, write the packet into it, and mark it busy
Departure
    on a read signal:
        send a packet from the corresponding queue to the output port
        if the queue is now empty:
            make it free
Main algorithm
    if the number of busy queues has reached the cut-off:
        work in per-class mode
    else:
        work in per-flow mode
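An executable rendering of the three procedures above is given below, again as a hedged Python sketch rather than the VHDL design itself; the queue count, depth, cut-off and overflow handling are illustrative assumptions.

```python
class QueueManager:
    """Sketch of the Arrival / Departure / Main logic. A flow may
    occupy several queues in order if its current queue fills up,
    mirroring the 'allot a new free queue' step in the pseudocode."""

    def __init__(self, num_queues=10, depth=4, cutoff=8):
        self.queues = [[] for _ in range(num_queues)]
        self.free = list(range(num_queues))
        self.flow_queues = {}    # flow id -> its busy queue ids, in order
        self.depth = depth
        self.cutoff = cutoff

    def arrival(self, flow_id, packet):
        qids = self.flow_queues.setdefault(flow_id, [])
        # If the flow has no queue yet, or its current queue is full,
        # allot a new free queue and mark it busy.
        if not qids or len(self.queues[qids[-1]]) >= self.depth:
            if not self.free:
                return False     # exhaustion: packet dropped
            qids.append(self.free.pop(0))
        self.queues[qids[-1]].append(packet)
        return True

    def departure(self, flow_id):
        qids = self.flow_queues[flow_id]
        packet = self.queues[qids[0]].pop(0)
        if not self.queues[qids[0]]:      # queue empty: make it free
            self.free.append(qids.pop(0))
            if not qids:
                del self.flow_queues[flow_id]
        return packet

    def mode(self):
        # Main algorithm: per-class once busy queues reach the cut-off.
        busy = len(self.queues) - len(self.free)
        return "per-class" if busy >= self.cutoff else "per-flow"
```

Note that `arrival` signals a drop only when no free queue remains, so the drop rate falls as soon as the mode switch reduces the number of distinct queue keys competing for the pool.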
IV. FPGA IMPLEMENTATION
The described architecture was implemented in the VHDL hardware description language using Xilinx ISE 9.2i. We implemented ten fixed-size queues for a minimum of twelve input flows. The QM stores packets and forwards them to the output port. The whole architecture is controlled by the QC, which takes an input packet and writes it to a free queue; if that queue is full, it allots another free queue for writing packets. Whenever the output port is free, the QC sends packets from the corresponding queue to that port. Throughout, the QC monitors the number of free queues: if it is less than the cut-off level, it selects per-class mode and runs the same architecture in that mode; otherwise the architecture runs in per-flow mode.
We ran the architecture in per-flow and per-class mode on the 3s400fg456-5 device and used the simulation results to calculate the required number of physical queues, logic elements and memory utilization. The results are observed in the following table.
Parameter                   | DQS algorithm | Advanced algorithm | Decrease | Decrease rate w.r.t. DQS algorithm
Memory size (Mb)            | 242.2         | 308.7              | -66.5    | 27.06%
Number of slices            | 5257          | 2544               | 2713     | 51.60%
Number of slice flip-flops  | 5904 (82%)    | 1020 (14%)         | 4884     | 82.72%
Number of 4-input LUTs      | 5441 (75%)    | 4972 (69%)         | 469      | 8.61%
(Percentages in parentheses are device utilization.)
V. CONCLUSION
The objective of the algorithm implemented on the FPGA is to achieve efficient memory utilization together with a minimum number of physical queues and low device utilization. In this study, the algorithm for the proposed design is described in the VHDL hardware description language and the logic is tested in the Xilinx ISE simulator. The simulated design is optimized for memory resources and device utilization using the Xilinx Spartan-3 family device xc3s4000. The DQS algorithm operates at 126 MHz while the advanced algorithm works at 137 MHz. The results show that without the scalable QM mechanism, per-flow queuing suffers from very high queue exhaustion and requires more memory resources compared to per-class mode; moreover, by implementing the advanced algorithm, the packet drop rate is again reduced dramatically, and the per-class QM requires less device utilization and memory.
VI. REFERENCES
[1] Qi Zhang, Roger Woods and Alan Marshall, "An On-Demand Queue Management Architecture for a Programmable Traffic Manager," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pp. 1849-1862, October 2012.
[2] C. Hu, Y. Tang, X. Chen and B. Liu, "Dynamic Queuing Sharing Mechanism for Per-flow QoS Control," IET Communications, Vol. 4, No. 4, pp. 472-483, Mar. 2010.
[3] Y. Xiao, et al., "Internet Protocol Television (IPTV): The Killer Application for the Next-Generation Internet," IEEE Communications Magazine, pp. 126-134, Nov. 2007.
[4] C. Fraleigh, S. Moon, B. Lyles, C. Cotton, M. Khan, D. Moll, R. Rockell, T. Seely and C. Diot, "Packet-level Traffic Measurements from the Sprint IP Backbone," IEEE Network, Vol. 17, No. 6, pp. 6-16, Nov.-Dec. 2003.
[5] A. Kortebi, L. Muscariello, S. Oueslati and J. Roberts, "Evaluating the Number of Active Flows in a Scheduler Realizing Fair Statistical Bandwidth Sharing," ACM SIGMETRICS '05, Banff, Canada, Jun. 2005.
[6] A. Nikologiannis, I. Papaefstathiou, G. Kornaros and C. Kachris, "An FPGA-based Queue Management System for High Speed Networking Devices," Elsevier Microprocessors and Microsystems, special issue on FPGAs, Vol. 28, Issues 5-6, pp. 223-236, Aug. 2004.
[7] H. Jonathan Chao and Bin Liu, "High Performance Switches and Routers," John Wiley & Sons, ISBN 9780470053676, 2007.
[8] M. Alisafaee, S. M. Fakhraie and M. Tehranipoor, "Architecture of an Embedded Queue Management Engine for High-Speed Network Devices," IEEE 48th Midwest Symposium on Circuits and Systems, Vol. 2, pp. 1907-1910, August 2005.
[9] Jindou Fan, Chengchen Hu and Bin Liu, "Experiences with Active Per-flow Queuing for Traffic Manager in High Performance Routers," Proc. of IEEE ICC 2010, Cape Town, South Africa, May 23-27, 2010.
[10] Chengchen Hu, Yi Tang, Xuefei Chen and Bin Liu, "Per-flow Queueing by Dynamic Queue Sharing," 26th IEEE International Conference on Computer Communications, pp. 1613-1621, May 2007.
[11] Qi Zhang, Roger Woods and Alan Marshall, "A Scalable and Programmable Modular Queue Manager Architecture," ECIT Institute, Queen's University Belfast, Queen's Road, Queen's Island, Belfast, BT3 9DT, N. Ireland.
[12] I. Hadzic and J. M. Smith, "Balancing Performance and Flexibility with Hardware Support for Network Architectures," ACM Trans. Comput. Syst., Vol. 21, No. 4, pp. 375-411, Nov. 2003.
[13] Sriadibhatla Sridevi, Ravindra Dhuli and P. L. H. Varaprasad, "FPGA Implementation of Low Complexity Linear Periodically Time Varying Filter," International Journal of Electronics and Communication Engineering & Technology (IJECET), Volume 3, Issue 1, 2012, pp. 130-138.
[14] Bhavana L. Mahajan, Sampada Pimpale and Kshitija S. Patil, "FPGA Implemented AHB Protocol," International Journal of Electronics and Communication Engineering & Technology (IJECET), Volume 3, Issue 3, 2012, pp. 162-169.