SlideShare une entreprise Scribd logo
1  sur  5
Télécharger pour lire hors ligne
A Scalable Network Monitoring and Bandwidth
Throttling System for Cloud Computing
N.F. Huysamen and A.E. Krzesinski
Department of Mathematical Sciences
University of Stellenbosch
7600 Stellenbosch, South Africa
Email: Nico.Huysamen@bsg.co.za, aek1@cs.sun.ac.za.
Abstract—With the increase in the demand for affordable,
high capacity processing power, the use of Cloud Computing
systems such as the Amazon Elastic Compute Cloud (EC2) has
increased significantly. This also increases the potential for users
to intentionally, to accidentally, abuse such systems. We present a
scalable Network Bandwidth Monitoring and Throttling System,
which monitors the users on the cloud, identifies those users
abusing the available, non-local bandwidth and throttles their
available bandwidth so as to normalize the network usage.
I. INTRODUCTION
This paper gives a brief overview of cloud computing
and describes a scalable Network Monitoring and Throttling
System. It explains which algorithms and data structures were
used so as to make the system scalable and efficient. We iden-
tify the external libraries we used and give a brief description
of their workings. We present a simulation framework that we
developed to test the throttling system on a large scale. We
conclude by presenting our findings, and we show that the
system provides a scalable solution for the bandwidth abuse
problem. The project was initiated by Amazon.com, and all
references to the infrastructure and/or working of a cloud refer
to the Amazon Elastic Compute cloud (EC2) [1]. Much of the
detail concerning the infrastructure of EC2 was not disclosed
to us.
II. CLOUD COMPUTING
A. Overview
When working with cloud computing systems, a cloud
should not be confused with a grid. A grid is a form of
distributed computing where a single or virtual computing unit
consists of a cluster of computers connected via a network and
working together to perform, usually very large, computations
or tasks. A cloud on the other hand, is also a cluster of
computers connected via a network, but the computers do
not usually work together to perform a single task. In the
commercial domain, a cloud is mainly used as follows. A
user, be it an individual or a company, buys computational
power from the cloud which is presented in the form of a
subset of machines from the cloud. With the progression in
virtualization technology, the user is more often presented
AEK is supported by grant numbers 2054027 and 2677 from the South
African National Research Foundation, Nokia-Siemens Networks and Telkom
SA Limited.
with a number of virtual machines running on the physical
machines in the cloud. The set of virtual machines presented to
the user can be scattered across the cloud and is not restricted
by physical proximity.
The abstraction in the working of the cloud has played an
important role in the increase of cloud computing popularity.
Since all the technical details are hidden from the user, who is
presented with a number of virtual machines, any individual
or company can benefit from this technology without having
any knowledge of the underlying workings or architecture of
the cloud.
B. The EC2 Infrastructure
The following information about the EC2 infrastructure was
provided to us. The EC2 system consists of many physical
machines, located in the USA and Europe [1]. At each of these
locations, the machines are partitioned into smaller clusters.
Each cluster consists of an undisclosed number of physical
machines that are in the same physical location.
The EC2 system makes use of virtualization technology
and each physical machine runs an undisclosed number of
virtual machines. This enables users (the term “user” refers to
any entity, be it individual or company, that has purchased
computational power on the cloud) to load their operating
system of choice onto their virtual machines. The processing
power bought by the users is presented in units of EC2
Compute Units and Virtual Cores. Each virtual machine has a
single network interface through which it accesses the Internet.
All the virtual machines on a single physical machine share
the physical network interface of the machine.
Each cluster of physical machines has a single aggregation
router that links it to the second level of the EC2 infrastructure.
Sets of aggregation routers connect to higher level routers. The
number of levels in the infrastructure was not disclosed. At the
highest level, the routers link up with the EC2 local loop. This
loop has three high-speed connections to the Internet. Note that
the infrastructure of the EC2 does not allow the broadcasting
of packets.
III. THE IMPLEMENTATION OF THE MONITORING SYSTEM
The monitoring system is based on the client server model.
Each virtual machine in the cloud runs the client application,
while one server is run for each cluster of physical machines
reporting to a single aggregation router. The reason for running
multiple servers is two-fold. First it reduces the workload on
each server. Second, by examining each cluster individually,
and throttling the bandwidth for each cluster separately, the
network usage of the entire cloud will also be controlled. As
mentioned above, since broadcasting is not allowed, the clients
have to be set up manually to know which server is responsible
for it.
A user can interact with the cloud either via a web interface,
or the SSH protocol. Since each physical machine may run
multiple virtual machines, there is no guarantee that all the
virtual machine instances on any physical machine belong to
the same user. By requiring each virtual machine to monitor
its own traffic, the distinction between users is automatically
made.
A. Restrictions
A restriction which had a major impact on the design of the
monitoring system was that we were instructed by Amazon
that the entire system must run on the virtual machines.
We had no access to the underlying operating system and/or
architecture and we were not given permission to access the
physical hardware that the virtual machines run on. We were
presented with the same interface that a normal user of the
cloud would have.
This restriction limits us to an implementation where the
users of the cloud must install the application on their indi-
vidual instances (which can be forced by policy). Since this
was a limitation placed on us by Amazon, investigating mech-
anisms to implement the same algorithms on the underlying
architecture falls beyond the scope of this project, and could
be the subject of further research.
B. Scalability and Efficiency
The major design consideration was the scalability of the
monitoring system. This stems from the fact that the number
of nodes (the term “node” refers to a single physical machine
in the cloud) in the cloud typically ranges in the region of
thousands to tens of thousands. This implies that the number
of clients (the term “client” refers to a client of a single virtual
machine) is in the range of millions. We therefore need an
efficient and scalable way to parse, interpret, store and recall
the data that we gather from the clients. We achieve this by
choosing appropriate algorithms and data structures for the
monitoring system.
A second consideration is the volume of data processed
by the servers. Since all the servers frequently receive large
amounts of data from the clients that they are responsible for,
the decision was made to store all the data in memory. If
the data were stored on disk this would make for too much
overhead in disk seek and read times, and thus reduce the
efficiency of the software. We also had to store the minimum
amount of data required by the server to be able to make
useful decisions. We discuss later how we relieve some of the
processing of data that the server needs to do, by delegating
it to the clients.
C. The Client Software
The client application is a small program that runs on each
virtual machine. It collects data on the non-local network
usage of the virtual machine it is run on, and sends update
information to the server responsible for its cluster domain.
In the event that the server decides to throttle specific users,
each client belonging to those users will receive a message
from the server instructing it to start limiting the available
bandwidth to a level determined by the server of that virtual
machine. The client will then run a script (which updates the
operating system’s network queueing disciplines using tc) to
limit the bandwidth of the virtual machine to the level defined
by the server. After an amount of time has passed (specified in
a configuration file, see below) the bandwidth limit is lifted.
1) Design Considerations: A major considerations was to
design the client application to have the smallest possible
resource usage. The client application should run on the virtual
machines without adversely affecting the performance of the
user programs. The client application does not need to store
much data or process much information. This works to our
benefit, as we can assign some functionality to the client to
pre-process its data (e.g. calculating the average kilobytes per
second rather than the number of packets, which is the raw data
we receive, or bytes per second, which can easily be obtained
from the raw packet data) before sending it to the server. This
relieves some of the processing load of the server.
2) Configuration: When the client application starts to exe-
cute, it needs certain configuration details in order to function
properly. Some of these configuration variables have default
values. However, most of the values need to be provided by
the network administrator. This gives greater control to the
network administrator as to how he/she would like the system
to behave, which levels of upload and download bandwidth
usage to enforce and how many users to monitor. The layout
of the configuration file is beyond the scope of this article.
D. The Server Software
The server, in contrast to the client, runs on a single,
dedicated virtual machine. One server is necessary per cluster
of nodes. The server gathers and aggregates usage informa-
tion from all the client applications that it is responsible
for (the server’s cluster domain). It continuously checks the
total amount of non-local network traffic generated by the
clients. Once the upload and/or download threshold (set in
the configuration file) is reached, the server starts throttling
the users.
1) Design Considerations: The most important design con-
sideration was to make the server scalable and efficient. This
was done by choosing appropriate data structures to store
and process the information gathered from the clients. In
Section VI we report that we successfully tested the server
with up to a million simulated clients, which is much more
than the normal amount of clients per cluster. The number of
clients per cluster usually range in the number of hundreds,
not thousands or millions.
Another design consideration was the method of limiting the
bandwidth of the users. According to Amazon, when an abuse
of bandwidth occurs, it is usually caused by one, or at most
two users. A field was added to the server configuration file
to specify the number of top users to take into consideration
when throttling. This significantly reduces the processing load
of the server.
2) Data Structures: The server contains two main data
structures [2]. A hash map is used to store the usage data of
all the clients in the server’s domain. A hash map provides
amortized constant time for the two most used operations,
namely insertion and searching. A hash map has a slower
insertion time than some other structures, but since we only
insert each client object once, a hash map adequately meets
our needs.
The second data structure stores references to the top
bandwidth using users. We used the Java implementation of a
priority queue, which is an implementation of a priority based
heap structure. This data structure is appropriate since we only
need to store a small number n of object references in a sorted
manner. The priority queue provides constant time to find the
smallest element, O(log n) time to insert an element, O(log n)
time to remove an element and O(log n) time to remove the
smallest element and reorder the queue.
3) Throttling: When the upload and/or download thresholds
are exceeded, the server calculates the rate (in kilobytes per
second) to which each client must be throttled. This is done
per user for the top users being monitored. If any throttling
needs to take place, the server calculates this value and sends
it to all the clients belonging to the respective abusive users.
The throttled rate λi of each client of user i is given by
λi = (τi − δτi/
i
τi)/ni
where τi = j τij is the current rate of user i, τij is the
current rate of client j of user i, ni denotes the number of
clients of user i and δ denotes the difference between the
actual bandwidth usage and the threshold level.
4) Configuration: As with the client, the server reads a
configuration file during application start-up. This file contains
information regarding the set-up of the server. Some of the
variables have default values. The configuration file gives
the network administrator control over the behaviour of the
system. The layout of the configuration file is beyond the scope
of this article.
IV. THE SIMULATOR – SIMNET
A simulation framework was developed to test the monitor-
ing system. The account we received from Amazon on the EC2
limited us to the use of twenty clients. This, while sufficient
to test the basic working of the monitor, was not sufficient to
test the scalability of the monitoring system.
We developed SimNet, a real-time simulation framework
that enabled us to test the monitoring system in an environment
with more than a million simulated clients.
A. Overview
The basic principles of the real-time simulator are the same
as those of a discrete-time simulator. Our simulator has a
calendar containing a chronologically ordered list of references
to actors. Each actor has a specific time at which its action
must take place. Whereas in a discrete-time simulation the
actor is removed from the front of the calendar and the
simulation time is advanced to coincide with the time of
occurrence of the dequeued actor, simulation time in our
simulator advances linearly: our simulator waits until the
simulation time coincides with the time at which the actor
must be executed. This simulates the elapsed time between
the updates sent by the clients to the server. When an actor is
removed from the calendar, the handler method of the actor
is executed which sends a simulated update to the server and
the actor schedules itself into the calendar. This process is
repeated for the lifetime of the simulation.
B. Design Considerations
Two major considerations were taken into account when
designing the simulator. First, the simulator had to be robust
and had to be easily extensible to provide functionality for
other projects. In other words, we wanted to build a general
purpose real-time network simulation framework. We did this
by implementing the basic structure of the simulation in as
general a way as possible, and provide an interface to the user
from which to derive an application-specific actor class from
an abstract item class. This provides the necessary interface
to interact with the rest of the simulation framework. The
user can make the actors perform as required by designing
application-specific handler methods. This was proven to
work, and SimNet was successfully used in another project [8].
Second, the the simulator had to be as efficient as possible.
Since we are simulating a cloud computing environment, we
must to be able to simulate a very large number of actors.
Our simulations have successfully simulated more than a
million actors, at which stage the performance of the server
deteriorates.
V. EXTERNAL LIBRARIES AND APPLICATIONS
The monitoring system is written in the Java programming
language using the JDK version 1.6. All of the code apart
from standard libraries was written from scratch without
reference to existing code. Our implementation runs on a
Linux platform. We use one external library called Jpcap,
which is used for network packet capture and analysis. We
also use the tc program, which is a traffic control application
that is standard with most versions of Linux.
Jpcap [3] is a library for capturing and sending network
packets in Java. It is based on the libpcap [4] and WinPcap
[5] libraries, and as such any operating system that supports
libpcap or WinPcap will support Jpcap. Jpcap provides a
simple application programming interface (API) that allows
developers to incorporate the functionality of libpcap and
WinPcap in Java programs.
0
500000
1e+06
20 40 60 80 100
rateKB/s
time
down
threshold down
up
threshold up
(a) Without throttling
0
40000
80000
120000
160000
20 40 60 80 100
rateKB/s
time
max down
down
threshold down
max up
up
threshold up
(b) With throttling
Fig. 1. Simulation of 1 server and 1,000,000 clients
libpcap, or the Packet Capture Library, provides a high
level interface to packet capture systems. All packets on the
network are accessible through these libraries, even packets
not destined for the host [4]. WinPcap is a port of libpcap for
the Microsoft Windows platform.
tc is a command-line application, running in user space, that
can be used to configure the traffic control components in the
kernel [6, 7]. The Linux traffic control system determines the
shaping, scheduling policy and dropping of network traffic.
The configuration of the scheduling and queuing disciplines
can be altered with the tc utility. The usage of tc (and its
underlying working) is beyond the scope of this article. Our
client application uses the tc program “as-is” by running it
natively via a bash script.
VI. RESULTS
This Section presents initial results from simulations of
the monitoring system. We modified the server to output a
comma separated value (csv) file that contains entries con-
cerning bandwidth usage. These entries are the same that the
UsageAnalyser class reads periodically. We present only one
of the result sets. The simulation outputs present the following
information
• the observed (throttled) download and upload rates
• the download and upload rates if the server is disabled
(not throttled)
• the maximum download and upload rates
• the download and upload thresholds.
The simulator was used to model the scenario where server
manages 1 million clients. This is an unlikely scenario, but it
was chosen to demonstrate the scalability of the monitoring
system.
Fig. 1(a) displays the results that were obtained when the
clients were not throttled: the bandwidth usage of the mis-
behaving clients increases without bounds. Fig. 1(b) displays
the results that were obtained when the misbehaving clients
were throttled. The figures show that the cloud monitoring and
throttling system works reasonably well. There is a significant
reduction in the bandwidth usage of misbehaving clients, to
the extent that the bandwidth usage is maintained within the
specified limits enforced by the server. The values on the y-
axis are in kilobytes per second. The values i on the x-axis
denote instants in time ti when the system was measured.
The peaks in the bandwidth usage can be explained by the
time-out period of the server. If the bandwidth usages of the
clients are extremely large, and if the time-out value of the
server between bandwidth usage checks is relatively large, then
during the time-out phase the bandwidth usage of the clients
can exceed the maximum allowed bandwidth usage. At the
next check, the server observes that the bandwidth usage is
too large and starts throttling the misbehaving users, hence the
sudden drop in bandwidth usage. This can easily be avoided
by shortening the time between the bandwidth usage checks
done by the server. This can however be a potential problem
when working with very large clusters, as not enough time for
calculation is left to the server.
VII. CONCLUSION AND FUTURE WORK
This paper describes a scalable Network Monitoring and
Throttling System for cloud computing systems. It explains
which algorithms and data structures were used so as to make
the system scalable and efficient. We present a simulation
framework that was used to test the throttling system on a
large scale. Initial simulations show that the system provides
a scalable solution for detecting and constraining bandwidth
abuse.
The next step is to adapt the bandwidth monitor to execute
on a real cloud infrastructure. The bandwidth monitor should
work without modification if all the assumptions we made
regarding virtual machines and running in user-space hold true.
Otherwise, modifications will have to be made to allow the
clients to run on the underlying infrastructure.
The functionality of the simulator will also be extended.
Even though the simulator was developed to test the capabil-
ities of the monitoring system, it is in itself a useful software
system.
REFERENCES
[1] Amazon. Amazon Web Services, Amazon Elastic Compute Cloud,
http://aws.amazon.com/ec2
[2] M.T. Goodrich and R. Tamassia. Data Structures & Algorithms in
Java, 4th Edition.
[3] Jpcap. A library for capturing and sending network packets,
http://netresearch.ics.uci.edu/kfujii/jpcap/doc/index.html
[4] tcpdump/libpcap. http://www.tcpdump.org/,
http://www.tcpdump.org/pcap3 man.htm
[5] WinPcap. The Windows Packet Capture Library,
http://www.winpcap.org/
[6] W. Almesberger. Linux Traffic Control - Next Generation, 2002,
http://tcng.sourceforge.net/doc/tcng-overview.pdf
[7] M.P. Stanic. tc - traffic control, Linux QoS control tool, 2001,
http://www-igm.univ-mlv.fr/ badis/IR3/QoS/TP/linux-tc-english.pdf
[8] M.C. de Klerk and A.E. Krzesinski. Visualizing and Managing Cloud
Computing Networks, University of Stellenbosch, 2009.
Nico Huysamen obtained the BSc (Hons) degree in Computer Science from
the University of Stellenbosch, South Africa.
Anthony Krzesinski is a Professor of Computer Science at the University
of Stellenbosch, South Africa.

Contenu connexe

Tendances

Application of Virtualisation and CloudComputing for Development and Runtime ...
Application of Virtualisation and CloudComputing for Development and Runtime ...Application of Virtualisation and CloudComputing for Development and Runtime ...
Application of Virtualisation and CloudComputing for Development and Runtime ...Ingenieurbuero Arno-Can Uestuensoez
 
Distributed Cloud Computing Environment Enhanced With Capabilities For Wide-A...
Distributed Cloud Computing Environment Enhanced With Capabilities For Wide-A...Distributed Cloud Computing Environment Enhanced With Capabilities For Wide-A...
Distributed Cloud Computing Environment Enhanced With Capabilities For Wide-A...ijcsit
 
cloud computing final year project
cloud computing final year projectcloud computing final year project
cloud computing final year projectAmeya Vashishth
 
Cloud Computing System models for Distributed and cloud computing & Performan...
Cloud Computing System models for Distributed and cloud computing & Performan...Cloud Computing System models for Distributed and cloud computing & Performan...
Cloud Computing System models for Distributed and cloud computing & Performan...hrmalik20
 
CS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question BankCS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question Bankpkaviya
 
Unit 1.1 introduction to cloud computing
Unit 1.1   introduction to cloud computingUnit 1.1   introduction to cloud computing
Unit 1.1 introduction to cloud computingeShikshak
 
Overview of computing paradigm
Overview of computing paradigmOverview of computing paradigm
Overview of computing paradigmRipal Ranpara
 
Introduction of Cloud Computing By Pawan Thakur HOD CS & IT
Introduction of Cloud Computing By Pawan Thakur HOD CS & ITIntroduction of Cloud Computing By Pawan Thakur HOD CS & IT
Introduction of Cloud Computing By Pawan Thakur HOD CS & ITGovt. P.G. College Dharamshala
 
Synopsis on cloud computing by Prashant upta
Synopsis on cloud computing by Prashant uptaSynopsis on cloud computing by Prashant upta
Synopsis on cloud computing by Prashant uptaPrashant Gupta
 
Cyber forensics in cloud computing
Cyber forensics in cloud computingCyber forensics in cloud computing
Cyber forensics in cloud computingAlexander Decker
 
Openstack & why cloud for enterprise ppt
Openstack & why cloud for enterprise pptOpenstack & why cloud for enterprise ppt
Openstack & why cloud for enterprise pptAsmaa Ibrahim
 
Fundamental concepts and models
Fundamental concepts and modelsFundamental concepts and models
Fundamental concepts and modelsAsmaa Ibrahim
 
Seminar report on cloud computing
Seminar report on cloud computingSeminar report on cloud computing
Seminar report on cloud computingJagan Mohan Bishoyi
 

Tendances (19)

Cloud Technology_Concepts
Cloud Technology_ConceptsCloud Technology_Concepts
Cloud Technology_Concepts
 
Sensor Cloud
Sensor CloudSensor Cloud
Sensor Cloud
 
Application of Virtualisation and CloudComputing for Development and Runtime ...
Application of Virtualisation and CloudComputing for Development and Runtime ...Application of Virtualisation and CloudComputing for Development and Runtime ...
Application of Virtualisation and CloudComputing for Development and Runtime ...
 
Distributed Cloud Computing Environment Enhanced With Capabilities For Wide-A...
Distributed Cloud Computing Environment Enhanced With Capabilities For Wide-A...Distributed Cloud Computing Environment Enhanced With Capabilities For Wide-A...
Distributed Cloud Computing Environment Enhanced With Capabilities For Wide-A...
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 
cloud computing final year project
cloud computing final year projectcloud computing final year project
cloud computing final year project
 
Latest Research Topics on Cloud Computing
Latest Research Topics on Cloud ComputingLatest Research Topics on Cloud Computing
Latest Research Topics on Cloud Computing
 
Cloud Computing System models for Distributed and cloud computing & Performan...
Cloud Computing System models for Distributed and cloud computing & Performan...Cloud Computing System models for Distributed and cloud computing & Performan...
Cloud Computing System models for Distributed and cloud computing & Performan...
 
CS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question BankCS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question Bank
 
Unit 1.1 introduction to cloud computing
Unit 1.1   introduction to cloud computingUnit 1.1   introduction to cloud computing
Unit 1.1 introduction to cloud computing
 
Overview of computing paradigm
Overview of computing paradigmOverview of computing paradigm
Overview of computing paradigm
 
Introduction of Cloud Computing By Pawan Thakur HOD CS & IT
Introduction of Cloud Computing By Pawan Thakur HOD CS & ITIntroduction of Cloud Computing By Pawan Thakur HOD CS & IT
Introduction of Cloud Computing By Pawan Thakur HOD CS & IT
 
Synopsis on cloud computing by Prashant upta
Synopsis on cloud computing by Prashant uptaSynopsis on cloud computing by Prashant upta
Synopsis on cloud computing by Prashant upta
 
Understanding Cloud Computing
Understanding Cloud ComputingUnderstanding Cloud Computing
Understanding Cloud Computing
 
Cyber forensics in cloud computing
Cyber forensics in cloud computingCyber forensics in cloud computing
Cyber forensics in cloud computing
 
Openstack & why cloud for enterprise ppt
Openstack & why cloud for enterprise pptOpenstack & why cloud for enterprise ppt
Openstack & why cloud for enterprise ppt
 
Cluster and Grid Computing
Cluster and Grid ComputingCluster and Grid Computing
Cluster and Grid Computing
 
Fundamental concepts and models
Fundamental concepts and modelsFundamental concepts and models
Fundamental concepts and models
 
Seminar report on cloud computing
Seminar report on cloud computingSeminar report on cloud computing
Seminar report on cloud computing
 

En vedette

En vedette (11)

Residential apartments company brochure beglinwoods architects
Residential apartments company brochure beglinwoods architectsResidential apartments company brochure beglinwoods architects
Residential apartments company brochure beglinwoods architects
 
Master_Thesis
Master_ThesisMaster_Thesis
Master_Thesis
 
มาทำความรู้จักเครื่องสำอาง
มาทำความรู้จักเครื่องสำอางมาทำความรู้จักเครื่องสำอาง
มาทำความรู้จักเครื่องสำอาง
 
HR Resume_Arjun Kumar
HR Resume_Arjun KumarHR Resume_Arjun Kumar
HR Resume_Arjun Kumar
 
Обувь Christian Louboutin
Обувь Christian LouboutinОбувь Christian Louboutin
Обувь Christian Louboutin
 
ib-pulse-check-executive-summary
ib-pulse-check-executive-summaryib-pulse-check-executive-summary
ib-pulse-check-executive-summary
 
Portafolio Visión Market
Portafolio Visión MarketPortafolio Visión Market
Portafolio Visión Market
 
StudentDaily(MyVNZ)
StudentDaily(MyVNZ)StudentDaily(MyVNZ)
StudentDaily(MyVNZ)
 
SUPERIOR-GUIDE-TO-NEW-ANSI-EN388-CUT-LEVELS
SUPERIOR-GUIDE-TO-NEW-ANSI-EN388-CUT-LEVELSSUPERIOR-GUIDE-TO-NEW-ANSI-EN388-CUT-LEVELS
SUPERIOR-GUIDE-TO-NEW-ANSI-EN388-CUT-LEVELS
 
Blogger y wiki
Blogger y wikiBlogger y wiki
Blogger y wiki
 
Servicios que google nos ofrece.
Servicios que google nos ofrece.Servicios que google nos ofrece.
Servicios que google nos ofrece.
 

Similaire à Scalable Network Monitoring and Bandwidth Throttling System for Cloud

Cloud ready reference
Cloud ready referenceCloud ready reference
Cloud ready referenceHelly Patel
 
Implementation of the Open Source Virtualization Technologies in Cloud Computing
Implementation of the Open Source Virtualization Technologies in Cloud ComputingImplementation of the Open Source Virtualization Technologies in Cloud Computing
Implementation of the Open Source Virtualization Technologies in Cloud Computingijccsa
 
Implementation of the Open Source Virtualization Technologies in Cloud Computing
Implementation of the Open Source Virtualization Technologies in Cloud ComputingImplementation of the Open Source Virtualization Technologies in Cloud Computing
Implementation of the Open Source Virtualization Technologies in Cloud Computingneirew J
 
Cloud computing
Cloud computingCloud computing
Cloud computingMisha Ali
 
DYNAMIC ALLOCATION METHOD FOR EFFICIENT LOAD BALANCING IN VIRTUAL MACHINES FO...
DYNAMIC ALLOCATION METHOD FOR EFFICIENT LOAD BALANCING IN VIRTUAL MACHINES FO...DYNAMIC ALLOCATION METHOD FOR EFFICIENT LOAD BALANCING IN VIRTUAL MACHINES FO...
DYNAMIC ALLOCATION METHOD FOR EFFICIENT LOAD BALANCING IN VIRTUAL MACHINES FO...acijjournal
 
e-suap cloud computing- English version
e-suap cloud computing- English versione-suap cloud computing- English version
e-suap cloud computing- English versionSabino Labarile
 
11.cyber forensics in cloud computing
11.cyber forensics in cloud computing11.cyber forensics in cloud computing
11.cyber forensics in cloud computingAlexander Decker
 
HOW-CLOUD-IMPLEMENTATION-CAN-ENSURE-MAXIMUM-ROI.pdf
HOW-CLOUD-IMPLEMENTATION-CAN-ENSURE-MAXIMUM-ROI.pdfHOW-CLOUD-IMPLEMENTATION-CAN-ENSURE-MAXIMUM-ROI.pdf
HOW-CLOUD-IMPLEMENTATION-CAN-ENSURE-MAXIMUM-ROI.pdfAgaram Technologies
 
Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...IEEEFINALYEARPROJECTS
 
JAVA 2013 IEEE PARALLELDISTRIBUTION PROJECT Dynamic resource allocation using...
JAVA 2013 IEEE PARALLELDISTRIBUTION PROJECT Dynamic resource allocation using...JAVA 2013 IEEE PARALLELDISTRIBUTION PROJECT Dynamic resource allocation using...
JAVA 2013 IEEE PARALLELDISTRIBUTION PROJECT Dynamic resource allocation using...IEEEGLOBALSOFTTECHNOLOGIES
 
A Secure Cloud Storage System with Data Forwarding using Proxy Re-encryption ...
A Secure Cloud Storage System with Data Forwarding using Proxy Re-encryption ...A Secure Cloud Storage System with Data Forwarding using Proxy Re-encryption ...
A Secure Cloud Storage System with Data Forwarding using Proxy Re-encryption ...IJTET Journal
 
Cloud Computing Introduction
Cloud Computing IntroductionCloud Computing Introduction
Cloud Computing Introductionguest90f660
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Introduction To Cloud Computing
Introduction To Cloud ComputingIntroduction To Cloud Computing
Introduction To Cloud ComputingLiming Liu
 

Similaire à Scalable Network Monitoring and Bandwidth Throttling System for Cloud (20)

H017113842
H017113842H017113842
H017113842
 
Cloud ready reference
Cloud ready referenceCloud ready reference
Cloud ready reference
 
Implementation of the Open Source Virtualization Technologies in Cloud Computing
Implementation of the Open Source Virtualization Technologies in Cloud ComputingImplementation of the Open Source Virtualization Technologies in Cloud Computing
Implementation of the Open Source Virtualization Technologies in Cloud Computing
 
Implementation of the Open Source Virtualization Technologies in Cloud Computing
Implementation of the Open Source Virtualization Technologies in Cloud ComputingImplementation of the Open Source Virtualization Technologies in Cloud Computing
Implementation of the Open Source Virtualization Technologies in Cloud Computing
 
cloud computing basics
cloud computing basicscloud computing basics
cloud computing basics
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 
unit 4-1.pptx
unit 4-1.pptxunit 4-1.pptx
unit 4-1.pptx
 
DYNAMIC ALLOCATION METHOD FOR EFFICIENT LOAD BALANCING IN VIRTUAL MACHINES FO...
DYNAMIC ALLOCATION METHOD FOR EFFICIENT LOAD BALANCING IN VIRTUAL MACHINES FO...DYNAMIC ALLOCATION METHOD FOR EFFICIENT LOAD BALANCING IN VIRTUAL MACHINES FO...
DYNAMIC ALLOCATION METHOD FOR EFFICIENT LOAD BALANCING IN VIRTUAL MACHINES FO...
 
e-suap cloud computing- English version
e-suap cloud computing- English versione-suap cloud computing- English version
e-suap cloud computing- English version
 
11.cyber forensics in cloud computing
11.cyber forensics in cloud computing11.cyber forensics in cloud computing
11.cyber forensics in cloud computing
 
HOW-CLOUD-IMPLEMENTATION-CAN-ENSURE-MAXIMUM-ROI.pdf
HOW-CLOUD-IMPLEMENTATION-CAN-ENSURE-MAXIMUM-ROI.pdfHOW-CLOUD-IMPLEMENTATION-CAN-ENSURE-MAXIMUM-ROI.pdf
HOW-CLOUD-IMPLEMENTATION-CAN-ENSURE-MAXIMUM-ROI.pdf
 
Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...Dynamic resource allocation using virtual machines for cloud computing enviro...
Dynamic resource allocation using virtual machines for cloud computing enviro...
 
JAVA 2013 IEEE PARALLELDISTRIBUTION PROJECT Dynamic resource allocation using...
JAVA 2013 IEEE PARALLELDISTRIBUTION PROJECT Dynamic resource allocation using...JAVA 2013 IEEE PARALLELDISTRIBUTION PROJECT Dynamic resource allocation using...
JAVA 2013 IEEE PARALLELDISTRIBUTION PROJECT Dynamic resource allocation using...
 
SOFTWARE COMPUTING
SOFTWARE COMPUTINGSOFTWARE COMPUTING
SOFTWARE COMPUTING
 
A Secure Cloud Storage System with Data Forwarding using Proxy Re-encryption ...
A Secure Cloud Storage System with Data Forwarding using Proxy Re-encryption ...A Secure Cloud Storage System with Data Forwarding using Proxy Re-encryption ...
A Secure Cloud Storage System with Data Forwarding using Proxy Re-encryption ...
 
Cloud Computing Introduction
Cloud Computing IntroductionCloud Computing Introduction
Cloud Computing Introduction
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Introduction To Cloud Computing
Introduction To Cloud ComputingIntroduction To Cloud Computing
Introduction To Cloud Computing
 
106248842 cc
106248842 cc106248842 cc
106248842 cc
 

Scalable Network Monitoring and Bandwidth Throttling System for Cloud

  • 1. A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing N.F. Huysamen and A.E. Krzesinski Department of Mathematical Sciences University of Stellenbosch 7600 Stellenbosch, South Africa Email: Nico.Huysamen@bsg.co.za, aek1@cs.sun.ac.za. Abstract—With the increase in the demand for affordable, high capacity processing power, the use of Cloud Computing systems such as the Amazon Elastic Compute Cloud (EC2) has increased significantly. This also increases the potential for users to intentionally, to accidentally, abuse such systems. We present a scalable Network Bandwidth Monitoring and Throttling System, which monitors the users on the cloud, identifies those users abusing the available, non-local bandwidth and throttles their available bandwidth so as to normalize the network usage. I. INTRODUCTION This paper gives a brief overview of cloud computing and describes a scalable Network Monitoring and Throttling System. It explains which algorithms and data structures were used so as to make the system scalable and efficient. We iden- tify the external libraries we used and give a brief description of their workings. We present a simulation framework that we developed to test the throttling system on a large scale. We conclude by presenting our findings, and we show that the system provides a scalable solution for the bandwidth abuse problem. The project was initiated by Amazon.com, and all references to the infrastructure and/or working of a cloud refer to the Amazon Elastic Compute cloud (EC2) [1]. Much of the detail concerning the infrastructure of EC2 was not disclosed to us. II. CLOUD COMPUTING A. Overview When working with cloud computing systems, a cloud should not be confused with a grid. A grid is a form of distributed computing where a single or virtual computing unit consists of a cluster of computers connected via a network and working together to perform, usually very large, computations or tasks. A cloud on the other hand, is also a cluster of computers connected via a network, but the computers do not usually work together to perform a single task. In the commercial domain, a cloud is mainly used as follows. A user, be it an individual or a company, buys computational power from the cloud which is presented in the form of a subset of machines from the cloud. With the progression in virtualization technology, the user is more often presented AEK is supported by grant numbers 2054027 and 2677 from the South African National Research Foundation, Nokia-Siemens Networks and Telkom SA Limited. with a number of virtual machines running on the physical machines in the cloud. The set of virtual machines presented to the user can be scattered across the cloud and is not restricted by physical proximity. The abstraction in the working of the cloud has played an important role in the increase of cloud computing popularity. Since all the technical details are hidden from the user, who is presented with a number of virtual machines, any individual or company can benefit from this technology without having any knowledge of the underlying workings or architecture of the cloud. B. The EC2 Infrastructure The following information about the EC2 infrastructure was provided to us. The EC2 system consists of many physical machines, located in the USA and Europe [1]. At each of these locations, the machines are partitioned into smaller clusters. Each cluster consists of an undisclosed number of physical machines that are in the same physical location. The EC2 system makes use of virtualization technology and each physical machine runs an undisclosed number of virtual machines. This enables users (the term “user” refers to any entity, be it individual or company, that has purchased computational power on the cloud) to load their operating system of choice onto their virtual machines. The processing power bought by the users is presented in units of EC2 Compute Units and Virtual Cores. Each virtual machine has a single network interface through which it accesses the Internet. All the virtual machines on a single physical machine share the physical network interface of the machine. Each cluster of physical machines has a single aggregation router that links it to the second level of the EC2 infrastructure. Sets of aggregation routers connect to higher level routers. The number of levels in the infrastructure was not disclosed. At the highest level, the routers link up with the EC2 local loop. This loop has three high-speed connections to the Internet. Note that the infrastructure of the EC2 does not allow the broadcasting of packets. III. THE IMPLEMENTATION OF THE MONITORING SYSTEM The monitoring system is based on the client server model. Each virtual machine in the cloud runs the client application, while one server is run for each cluster of physical machines
  • 2. reporting to a single aggregation router. The reason for running multiple servers is two-fold. First it reduces the workload on each server. Second, by examining each cluster individually, and throttling the bandwidth for each cluster separately, the network usage of the entire cloud will also be controlled. As mentioned above, since broadcasting is not allowed, the clients have to be set up manually to know which server is responsible for it. A user can interact with the cloud either via a web interface, or the SSH protocol. Since each physical machine may run multiple virtual machines, there is no guarantee that all the virtual machine instances on any physical machine belong to the same user. By requiring each virtual machine to monitor its own traffic, the distinction between users is automatically made. A. Restrictions A restriction which had a major impact on the design of the monitoring system was that we were instructed by Amazon that the entire system must run on the virtual machines. We had no access to the underlying operating system and/or architecture and we were not given permission to access the physical hardware that the virtual machines run on. We were presented with the same interface that a normal user of the cloud would have. This restriction limits us to an implementation where the users of the cloud must install the application on their indi- vidual instances (which can be forced by policy). Since this was a limitation placed on us by Amazon, investigating mech- anisms to implement the same algorithms on the underlying architecture falls beyond the scope of this project, and could be the subject of further research. B. Scalability and Efficiency The major design consideration was the scalability of the monitoring system. This stems from the fact that the number of nodes (the term “node” refers to a single physical machine in the cloud) in the cloud typically ranges in the region of thousands to tens of thousands. This implies that the number of clients (the term “client” refers to a client of a single virtual machine) is in the range of millions. We therefore need an efficient and scalable way to parse, interpret, store and recall the data that we gather from the clients. We achieve this by choosing appropriate algorithms and data structures for the monitoring system. A second consideration is the volume of data processed by the servers. Since all the servers frequently receive large amounts of data from the clients that they are responsible for, the decision was made to store all the data in memory. If the data were stored on disk this would make for too much overhead in disk seek and read times, and thus reduce the efficiency of the software. We also had to store the minimum amount of data required by the server to be able to make useful decisions. We discuss later how we relieve some of the processing of data that the server needs to do, by delegating it to the clients. C. The Client Software The client application is a small program that runs on each virtual machine. It collects data on the non-local network usage of the virtual machine it is run on, and sends update information to the server responsible for its cluster domain. In the event that the server decides to throttle specific users, each client belonging to those users will receive a message from the server instructing it to start limiting the available bandwidth to a level determined by the server of that virtual machine. The client will then run a script (which updates the operating system’s network queueing disciplines using tc) to limit the bandwidth of the virtual machine to the level defined by the server. After an amount of time has passed (specified in a configuration file, see below) the bandwidth limit is lifted. 1) Design Considerations: A major considerations was to design the client application to have the smallest possible resource usage. The client application should run on the virtual machines without adversely affecting the performance of the user programs. The client application does not need to store much data or process much information. This works to our benefit, as we can assign some functionality to the client to pre-process its data (e.g. calculating the average kilobytes per second rather than the number of packets, which is the raw data we receive, or bytes per second, which can easily be obtained from the raw packet data) before sending it to the server. This relieves some of the processing load of the server. 2) Configuration: When the client application starts to exe- cute, it needs certain configuration details in order to function properly. Some of these configuration variables have default values. However, most of the values need to be provided by the network administrator. This gives greater control to the network administrator as to how he/she would like the system to behave, which levels of upload and download bandwidth usage to enforce and how many users to monitor. The layout of the configuration file is beyond the scope of this article. D. The Server Software The server, in contrast to the client, runs on a single, dedicated virtual machine. One server is necessary per cluster of nodes. The server gathers and aggregates usage informa- tion from all the client applications that it is responsible for (the server’s cluster domain). It continuously checks the total amount of non-local network traffic generated by the clients. Once the upload and/or download threshold (set in the configuration file) is reached, the server starts throttling the users. 1) Design Considerations: The most important design con- sideration was to make the server scalable and efficient. This was done by choosing appropriate data structures to store and process the information gathered from the clients. In Section VI we report that we successfully tested the server with up to a million simulated clients, which is much more than the normal amount of clients per cluster. The number of clients per cluster usually range in the number of hundreds, not thousands or millions.
  • 3. Another design consideration was the method of limiting the bandwidth of the users. According to Amazon, when an abuse of bandwidth occurs, it is usually caused by one, or at most two users. A field was added to the server configuration file to specify the number of top users to take into consideration when throttling. This significantly reduces the processing load of the server. 2) Data Structures: The server contains two main data structures [2]. A hash map is used to store the usage data of all the clients in the server’s domain. A hash map provides amortized constant time for the two most used operations, namely insertion and searching. A hash map has a slower insertion time than some other structures, but since we only insert each client object once, a hash map adequately meets our needs. The second data structure stores references to the top bandwidth using users. We used the Java implementation of a priority queue, which is an implementation of a priority based heap structure. This data structure is appropriate since we only need to store a small number n of object references in a sorted manner. The priority queue provides constant time to find the smallest element, O(log n) time to insert an element, O(log n) time to remove an element and O(log n) time to remove the smallest element and reorder the queue. 3) Throttling: When the upload and/or download thresholds are exceeded, the server calculates the rate (in kilobytes per second) to which each client must be throttled. This is done per user for the top users being monitored. If any throttling needs to take place, the server calculates this value and sends it to all the clients belonging to the respective abusive users. The throttled rate λi of each client of user i is given by λi = (τi − δτi/ i τi)/ni where τi = j τij is the current rate of user i, τij is the current rate of client j of user i, ni denotes the number of clients of user i and δ denotes the difference between the actual bandwidth usage and the threshold level. 4) Configuration: As with the client, the server reads a configuration file during application start-up. This file contains information regarding the set-up of the server. Some of the variables have default values. The configuration file gives the network administrator control over the behaviour of the system. The layout of the configuration file is beyond the scope of this article. IV. THE SIMULATOR – SIMNET A simulation framework was developed to test the monitor- ing system. The account we received from Amazon on the EC2 limited us to the use of twenty clients. This, while sufficient to test the basic working of the monitor, was not sufficient to test the scalability of the monitoring system. We developed SimNet, a real-time simulation framework that enabled us to test the monitoring system in an environment with more than a million simulated clients. A. Overview The basic principles of the real-time simulator are the same as those of a discrete-time simulator. Our simulator has a calendar containing a chronologically ordered list of references to actors. Each actor has a specific time at which its action must take place. Whereas in a discrete-time simulation the actor is removed from the front of the calendar and the simulation time is advanced to coincide with the time of occurrence of the dequeued actor, simulation time in our simulator advances linearly: our simulator waits until the simulation time coincides with the time at which the actor must be executed. This simulates the elapsed time between the updates sent by the clients to the server. When an actor is removed from the calendar, the handler method of the actor is executed which sends a simulated update to the server and the actor schedules itself into the calendar. This process is repeated for the lifetime of the simulation. B. Design Considerations Two major considerations were taken into account when designing the simulator. First, the simulator had to be robust and had to be easily extensible to provide functionality for other projects. In other words, we wanted to build a general purpose real-time network simulation framework. We did this by implementing the basic structure of the simulation in as general a way as possible, and provide an interface to the user from which to derive an application-specific actor class from an abstract item class. This provides the necessary interface to interact with the rest of the simulation framework. The user can make the actors perform as required by designing application-specific handler methods. This was proven to work, and SimNet was successfully used in another project [8]. Second, the the simulator had to be as efficient as possible. Since we are simulating a cloud computing environment, we must to be able to simulate a very large number of actors. Our simulations have successfully simulated more than a million actors, at which stage the performance of the server deteriorates. V. EXTERNAL LIBRARIES AND APPLICATIONS The monitoring system is written in the Java programming language using the JDK version 1.6. All of the code apart from standard libraries was written from scratch without reference to existing code. Our implementation runs on a Linux platform. We use one external library called Jpcap, which is used for network packet capture and analysis. We also use the tc program, which is a traffic control application that is standard with most versions of Linux. Jpcap [3] is a library for capturing and sending network packets in Java. It is based on the libpcap [4] and WinPcap [5] libraries, and as such any operating system that supports libpcap or WinPcap will support Jpcap. Jpcap provides a simple application programming interface (API) that allows developers to incorporate the functionality of libpcap and WinPcap in Java programs.
  • 4. 0 500000 1e+06 20 40 60 80 100 rateKB/s time down threshold down up threshold up (a) Without throttling 0 40000 80000 120000 160000 20 40 60 80 100 rateKB/s time max down down threshold down max up up threshold up (b) With throttling Fig. 1. Simulation of 1 server and 1,000,000 clients libpcap, or the Packet Capture Library, provides a high level interface to packet capture systems. All packets on the network are accessible through these libraries, even packets not destined for the host [4]. WinPcap is a port of libpcap for the Microsoft Windows platform. tc is a command-line application, running in user space, that can be used to configure the traffic control components in the kernel [6, 7]. The Linux traffic control system determines the shaping, scheduling policy and dropping of network traffic. The configuration of the scheduling and queuing disciplines can be altered with the tc utility. The usage of tc (and its underlying working) is beyond the scope of this article. Our client application uses the tc program “as-is” by running it natively via a bash script. VI. RESULTS This Section presents initial results from simulations of the monitoring system. We modified the server to output a comma separated value (csv) file that contains entries con- cerning bandwidth usage. These entries are the same that the UsageAnalyser class reads periodically. We present only one of the result sets. The simulation outputs present the following
  • 5. information • the observed (throttled) download and upload rates • the download and upload rates if the server is disabled (not throttled) • the maximum download and upload rates • the download and upload thresholds. The simulator was used to model the scenario where server manages 1 million clients. This is an unlikely scenario, but it was chosen to demonstrate the scalability of the monitoring system. Fig. 1(a) displays the results that were obtained when the clients were not throttled: the bandwidth usage of the mis- behaving clients increases without bounds. Fig. 1(b) displays the results that were obtained when the misbehaving clients were throttled. The figures show that the cloud monitoring and throttling system works reasonably well. There is a significant reduction in the bandwidth usage of misbehaving clients, to the extent that the bandwidth usage is maintained within the specified limits enforced by the server. The values on the y- axis are in kilobytes per second. The values i on the x-axis denote instants in time ti when the system was measured. The peaks in the bandwidth usage can be explained by the time-out period of the server. If the bandwidth usages of the clients are extremely large, and if the time-out value of the server between bandwidth usage checks is relatively large, then during the time-out phase the bandwidth usage of the clients can exceed the maximum allowed bandwidth usage. At the next check, the server observes that the bandwidth usage is too large and starts throttling the misbehaving users, hence the sudden drop in bandwidth usage. This can easily be avoided by shortening the time between the bandwidth usage checks done by the server. This can however be a potential problem when working with very large clusters, as not enough time for calculation is left to the server. VII. CONCLUSION AND FUTURE WORK This paper describes a scalable Network Monitoring and Throttling System for cloud computing systems. It explains which algorithms and data structures were used so as to make the system scalable and efficient. We present a simulation framework that was used to test the throttling system on a large scale. Initial simulations show that the system provides a scalable solution for detecting and constraining bandwidth abuse. The next step is to adapt the bandwidth monitor to execute on a real cloud infrastructure. The bandwidth monitor should work without modification if all the assumptions we made regarding virtual machines and running in user-space hold true. Otherwise, modifications will have to be made to allow the clients to run on the underlying infrastructure. The functionality of the simulator will also be extended. Even though the simulator was developed to test the capabil- ities of the monitoring system, it is in itself a useful software system. REFERENCES [1] Amazon. Amazon Web Services, Amazon Elastic Compute Cloud, http://aws.amazon.com/ec2 [2] M.T. Goodrich and R. Tamassia. Data Structures & Algorithms in Java, 4th Edition. [3] Jpcap. A library for capturing and sending network packets, http://netresearch.ics.uci.edu/kfujii/jpcap/doc/index.html [4] tcpdump/libpcap. http://www.tcpdump.org/, http://www.tcpdump.org/pcap3 man.htm [5] WinPcap. The Windows Packet Capture Library, http://www.winpcap.org/ [6] W. Almesberger. Linux Traffic Control - Next Generation, 2002, http://tcng.sourceforge.net/doc/tcng-overview.pdf [7] M.P. Stanic. tc - traffic control, Linux QoS control tool, 2001, http://www-igm.univ-mlv.fr/ badis/IR3/QoS/TP/linux-tc-english.pdf [8] M.C. de Klerk and A.E. Krzesinski. Visualizing and Managing Cloud Computing Networks, University of Stellenbosch, 2009. Nico Huysamen obtained the BSc (Hons) degree in Computer Science from the University of Stellenbosch, South Africa. Anthony Krzesinski is a Professor of Computer Science at the University of Stellenbosch, South Africa.