Dr. Konstantinos Giannoutakis presents the CloudLightning simulator, a bespoke cloud simulation engine built for modelling and simulating heterogeneous resources as well as self-organising systems.
This presentation was given at the CloudLightning Conference held in conjunction with NC4 2017 in Dublin City University on 11th April 2017.
2. Overview
• Cloud simulation & Cloud simulation frameworks
• Limitations and requirements for hyper-scale simulations
• Design of CloudLightning simulator
• Extensibility
• Graphical User Interface
3. Cloud Simulation
• Cloud simulation is an essential tool for understanding the impact
of different technologies or hardware integrated into modern
Clouds, or of diverse workloads caused by new applications
migrating to Clouds.
• Cloud simulators are divided into two major categories:
- Discrete Event Simulators (DES): Avoid building and
processing small simulation objects (such as packets); instead,
the effect of object interaction is captured. Examples: CloudSim,
CloudAnalyst, CloudSched, DCSim, GDCSim, iCanCloud.
- Packet Level Simulators (PLS): Whenever a data message
has to be transmitted between simulator entities, a packet
structure with its protocol headers is allocated in memory
and all the associated protocol processing is performed.
Examples: NetworkCloudSim, GreenCloud.
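To make the DES idea concrete, here is a minimal sketch of a discrete-event engine in C++ (the document's own implementation language). This is an illustrative toy, not CloudSim's or CloudLightning's actual code: events sit in a time-ordered priority queue and the clock jumps from one event directly to the next, so no per-packet objects are ever built.

```cpp
#include <functional>
#include <queue>
#include <vector>

// A minimal discrete-event engine (illustrative sketch only): events are
// held in a priority queue ordered by timestamp, and the simulation clock
// jumps from one event directly to the next.
struct Event {
    double time;                     // simulated time at which the event fires
    std::function<void()> action;    // the event's effect on the system
    bool operator>(const Event& o) const { return time > o.time; }
};

class DesEngine {
public:
    void schedule(double t, std::function<void()> a) {
        queue_.push(Event{t, std::move(a)});
    }
    // Process all events in time order; returns the final simulated time.
    double run() {
        double clock = 0.0;
        while (!queue_.empty()) {
            Event e = queue_.top();
            queue_.pop();
            clock = e.time;          // advance the clock to the next event
            e.action();
        }
        return clock;
    }
private:
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> queue_;
};
```

Note how the clock advances only at event timestamps; this is exactly what makes DES fast at macro scale, and what makes it blind to anything happening between events.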
4. Cloud Simulation frameworks
Simulator GUI VMs Scheduling Network Energy Parallel
CloudAnalyst x x x L
CloudSim x x L L
CloudSched x x x L L
DCSim x x L
GDCSim x x L
GreenCloud L x x
iCanCloud x x x x x x
NetworkCloudSim x x x L x
L: Limited support x: Full support
Table 1: Characteristics of Cloud simulators
5. Cloud Simulation frameworks
Simulator CPU Network Memory Storage PSU Cooling model
CloudSim x
CloudSched x
DCSim x
GDCSim x x
GreenCloud x x x x
iCanCloud x x x x x
Table 2: Energy consumption models of Cloud simulators
6. Cloud Simulation
• Packet level simulators cannot be used for hyper-scale
simulations, since they exhaustively simulate certain components
of the Cloud (e.g. the network on a packet-level, micro-scale basis).
• DES simulators build a list of events, sort the list and then compute
the effect of each event on the system. The list is not populated by
packet-level events.
• This substantially increases computational performance and
allows for large scale simulations.
• However, simulating only macro-scale events results in a loss of
accuracy, since various parameters are neglected.
• DES simulators are suited for evaluating scheduling policies,
energy consumption, throughput of tasks, the effect of live
migration, etc.
7. Cloud Simulation
• The most popular DES Cloud simulator is CloudSim (based on
SimJava).
• CloudSim has been used to simulate Cloud environments with up
to 100,000 servers.
• Various toolboxes have been built to simulate network and storage
in CloudSim; however, they are separate and do not interact with
the rest of the Cloud model. For example, NetworkCloudSim
simulates only the network, while StorageCloudSim simulates only
the storage, separately from CloudSim.
• Parallel versions have also been proposed, namely CloudSimEx
and Cloud2Sim, used for conducting multiple simulations
concurrently (CloudSimEx) or for parallel (distributed-memory)
simulation (Cloud2Sim).
8. DES Simulators
• DES simulators, such as CloudSim, have some disadvantages.
• The running time for hyper-scale simulations (1,000,000 servers) is
substantial (measured in dozens of hours or even days); the use of
Java also limits performance.
• Dynamic components, such as SOSM components, cannot be
simulated easily, since these components may need to react even
when no event is occurring in the system.
• Hybrid distributed-memory parallel systems cannot be used
effectively to accelerate the simulation, since the timeline of events
needs to be separated into independent subsets, which is a
very difficult task when events number in the millions.
• No support for heterogeneous resources so far.
9. Requirements for hyper-scale simulations
What are the key requirements for hyper-scale simulations that
limit existing DES simulators?
• A very large number of computations.
• Accurate models for power consumption based on adequate
interpolating models.
• A native parallel design, in order to be able to execute in HPC
environments.
• Support for tasks that can span multiple Virtual Machines
(VMs).
• Support for accelerators (GPU, MIC, DFE).
• The simulator should be written in a language that is built for
high-performance computation (i.e. C or C++).
With the above in mind, the CloudLightning simulator has been built.
10. CL-Simulator
• In order to design a simulator for large scale phenomena, we can
borrow the design of large scale Engineering and Physics
simulations.
• These simulations are based on a time-advancing loop with a
prescribed time granularity.
• The time advances from t0 = 0 to tend with a prescribed sampling
step tstep (in seconds, milliseconds, etc.).
• This design enables the integration of dynamical components,
since the state of these components can be updated with respect
to tstep.
• This time-stepping approach allows for a dynamic resolution of the
results: a large time-step will only reveal a coarse picture of
the system, while a small time-step will reveal finer
interactions.
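The time-advancing loop described above can be sketched as follows. All names (`Server`, `update`, `advance`) and the power formula are illustrative assumptions, not the simulator's real API; the point is that every component's state is refreshed once per `t_step`, whether or not any event has arrived, which is what accommodates dynamic self-organising components.

```cpp
#include <vector>

// Illustrative time-advancing loop: the clock moves from t0 to t_end in
// fixed increments of t_step, and every component updates its state each
// step, independent of any event arrivals.
struct Server {
    double load = 0.0;    // current utilisation in [0, 1] (assumed field)
    double power = 0.0;   // watts, recomputed every step (assumed model)
};

// Hypothetical per-step state update: recompute power from current load.
void update(Server& s) { s.power = 100.0 + 50.0 * s.load; }

// Run the loop; returns the number of sampling steps taken.
int advance(double t0, double t_end, double t_step, std::vector<Server>& cloud) {
    int steps = 0;
    for (double t = t0; t < t_end; t += t_step, ++steps)
        for (Server& s : cloud) update(s);   // resolution is set by t_step
    return steps;
}
```

Shrinking `t_step` trades running time for finer temporal resolution, exactly as the slide describes.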
12. Parallelization
• Modern parallel systems are composed of a network of multi-core
nodes, which are partitioned into clusters according to requests
from the Job scheduler.
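One plausible shape for the intra-node part of such a design is sketched below; this is an assumption about the general MPI+OpenMP pattern, not the project's actual code. Each MPI rank would own one partition of the simulated resources, and OpenMP threads sweep that partition's servers; without OpenMP enabled, the pragma is ignored and the loop runs serially.

```cpp
#include <vector>

struct Server {
    double load = 0.0;
    double power = 0.0;
};

// Sketch of intra-node parallelisation (assumed pattern): OpenMP threads
// update the servers of the partition owned by the local MPI rank.
// Without -fopenmp the pragma is ignored and the loop degrades to serial.
void update_partition(std::vector<Server>& partition) {
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(partition.size()); ++i)
        partition[i].power = 100.0 + 50.0 * partition[i].load;
}
```

The per-server updates are independent, which is what makes this loop trivially parallel across the cores of a node.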
13. Simulation models
• Power consumption models for
- CPUs
- GPUs
- MICs
- FPGAs
• Network and storage
- Treated as resources
- Task-dependent penalties can be applied to the execution of a
task
• Execution model
- Space-Time shared model
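A common way to realise the "interpolating" power models mentioned above is piecewise-linear interpolation between measured (utilisation, watts) samples, e.g. benchmark readings at 0%, 10%, ..., 100% load. The sketch below follows that general approach; the function name and all sample values are made up for illustration, and the same shape could serve CPUs, GPUs, MICs or FPGAs with different sample tables.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Piecewise-linear power model: interpolate power between measured
// (utilisation, watts) pairs. Values outside the sampled range are
// clamped to the nearest sample.
double power_watts(const std::vector<std::pair<double, double>>& samples,
                   double util) {
    if (util <= samples.front().first) return samples.front().second;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        const double u0 = samples[i - 1].first, p0 = samples[i - 1].second;
        const double u1 = samples[i].first,     p1 = samples[i].second;
        if (util <= u1)  // linear interpolation on the segment [u0, u1]
            return p0 + (p1 - p0) * (util - u0) / (u1 - u0);
    }
    return samples.back().second;  // clamp above the last sample
}
```

Swapping in a measured sample table per device type is all that is needed to cover a heterogeneous resource mix.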
15. Design and extensibility
• The selected decomposed approach enables easy extension of
the simulator.
• The extension procedure only requires inserting methods into the
appropriate class. For example, a new power consumption model
can be inserted in the Power Consumption component.
• Adding models can be performed with minimal interaction with the
source code.
• The addition of a new component, for example a second statistics
engine, requires designing the new class, updating the Cell class
to include it and adding the update procedure in the Update and
Statistics Engine.
• Finally, MPI is responsible for scaling across Compute Nodes,
while OpenMP is responsible for scaling update and search
procedures across the available Cores of a compute node.
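The extension pattern described above might look roughly like the following; the class and method names here are illustrative stand-ins, not CloudLightning's real interfaces. New power models are registered with the power-consumption component as named functions, so extending the simulator means adding a model rather than editing the engine's internals.

```cpp
#include <functional>
#include <map>
#include <string>

// Sketch of an extensible power-consumption component (names assumed):
// models are stored by name, and new ones can be registered at any time
// without touching the component's own source.
class PowerConsumption {
public:
    using Model = std::function<double(double /*utilisation*/)>;

    void add_model(const std::string& name, Model m) {
        models_[name] = std::move(m);
    }
    double estimate(const std::string& name, double util) const {
        return models_.at(name)(util);   // throws if the model is unknown
    }

private:
    std::map<std::string, Model> models_;
};
```

A new accelerator model, say for a MIC card, would then be a one-line registration call rather than a change to the engine.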