1. Integration of a batch scheduler with a private cloud solution
Internship project:
Research Computing & Facilitating Services, University College London
Riccardo Samperna
riccardo.samperna.13@ucl.ac.uk
2. The aim of the project
Investigate solutions for deploying an open-source private cloud inside the Legion cluster.
Integrate the batch scheduler already in use so that researchers can submit virtual machine instances and work on different operating systems.
3. Internship itinerary
1. Preliminary steps
2. Hacking the virtualization
3. Cloud platform choice
4. Deployment and configuration
5. Conclusions and results
4. Preliminary steps
Set up a test environment: two machines installed with CentOS 6.5, in order to reproduce the software running on the Legion platform.
Learn about the new technologies: a lot of documentation and research papers, in particular about virtualization, hypervisors, and batch schedulers.
5. Investigate similar solutions
During my first weeks I was advised to read about a similar project at the University of Victoria.
They built Cloud Scheduler, a cloud-enabled distributed resource manager backend that manages virtual machines on clouds such as Nimbus, OpenNebula, Eucalyptus or EC2 to create an environment for batch job execution.
It is based on the Torque batch scheduler.
7. Internship itinerary
1. Preliminary steps
2. Hacking the virtualization
3. Cloud platform choice
4. Deployment and configuration
5. Conclusions and results
8. Hacking the virtualization
In the first stage of the research I built a prototype of the project without using any cloud software solution.
This step helped me to better understand the underlying mechanism and to familiarize myself with the Grid Engine batch scheduler.
10. VMs on Grid Engine
- starter_method: checks whether a job requires the creation of a virtual machine; if so, it creates one using Vagrant.
- prolog: runs the job submitted by the user inside the virtual environment.
- epilog: cleans up the virtual environment and collects the results of the job.
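The three hooks above can be sketched as a single shell script. This is a minimal illustration only: the "#VM_REQUIRED" marker convention and the function names are assumptions of mine, not the project's actual code. Grid Engine invokes a starter_method with the path of the job script as its first argument.

```shell
#!/bin/sh
# Hypothetical sketch of the VM-on-Grid-Engine hooks, collapsed into one
# starter_method-style script. Marker convention and names are illustrative.

# starter_method step: does this job ask for a virtual machine?
# (assumed convention: the job script carries a "#VM_REQUIRED" marker line)
needs_vm() {
    grep -q '^#VM_REQUIRED' "$1"
}

# prolog step: boot the VM with Vagrant and run the job inside it.
run_in_vm() {
    vagrant up                     # boot the VM described by the Vagrantfile
    vagrant ssh -c 'sh -s' < "$1"  # feed the user's job script to the guest
}

# epilog step: tear the virtual environment down again.
clean_up() {
    vagrant destroy -f
}

if [ -n "${1:-}" ]; then           # only act when a job script is passed
    if needs_vm "$1"; then
        run_in_vm "$1"
        clean_up
    else
        exec sh "$1"               # plain job: run directly on the host
    fi
fi
```

In a real setup the VM-running and cleanup parts would live in the scheduler's separate prolog and epilog hooks rather than in one script; collapsing them here just keeps the flow readable.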
12. Internship itinerary
1. Preliminary steps
2. Hacking the virtualization
3. Cloud platform choice
4. Deployment and configuration
5. Conclusions and results
13. Cloud computing
According to the official NIST definition:
"Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction."
The cloud model is composed of four deployment models.
14. Private cloud
A private cloud is a particular deployment model of cloud computing.
The cloud (the pool of resources) is accessible only by a single organisation, providing that organisation with greater control and privacy.
16. OpenStack
OpenStack is a free and open-source cloud computing software platform.
The technology consists of a series of interrelated projects that control pools of processing, storage, and networking resources throughout a data center, which users manage through a web-based dashboard, command-line tools, or a RESTful API.
18. Eucalyptus
Eucalyptus is the acronym for Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems.
It enables pooling compute, storage, and network resources that can be dynamically scaled up or down as application workloads change.
It is Amazon Web Services compatible.
21. About Nimbus
The Nimbus Infrastructure is an open-source EC2/S3-compatible Infrastructure-as-a-Service implementation.
It specifically targets features of interest to the scientific community, such as support for proxy credentials, batch schedulers, best-effort allocations and others.
22. Nimbus architecture
Three main components:
- Nimbus IaaS/Cumulus: master node and repository of the infrastructure.
- VMMs: Virtual Machine Managers, the executor nodes.
- Cloud client.
24. So… Why Nimbus?
● Support for batch scheduler integration;
● Reduced complexity compared to the other available solutions;
● It specifically targets the scientific community.
25. Internship itinerary
1. Preliminary steps
2. Hacking the virtualization
3. Cloud platform choice
4. Deployment and configuration
5. Conclusions and results
26. Deployment and configuration
For the deployment of the cloud in my test environment I installed the Nimbus IaaS/Cumulus on one machine, and the VMM and the cloud client on the second machine.
I used a similar configuration for the Grid Engine scheduler.
27. Network configuration
During the installation of the cloud software I ran into a network problem: it wasn't possible to use a DHCP server, because both machines were using a static IP on the UCL network.
To overcome this I connected the two machines with a crossover cable, creating a small private network.
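On CentOS 6 this kind of private link can be configured with a static interface file on each machine; the device name and addresses below are illustrative, not the ones actually used in the project:

```ini
# /etc/sysconfig/network-scripts/ifcfg-eth1 on the first machine
# (the second machine mirrors this with IPADDR=192.168.10.2)
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.10.1
NETMASK=255.255.255.0
ONBOOT=yes
```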
28. First test
Nimbus currently supports two hypervisors: KVM and Xen. I decided to go with KVM, which is easier and faster to install.
After fixing some dependencies and resolving a few other problems I succeeded in setting everything up: I was able to create virtual machines and SSH into them.
29. First test (pilot mode)
The Nimbus installation process uses the default mode, which does not integrate the batch scheduler. The next step was therefore to switch to pilot mode, so that VM-creation requests could be submitted to the Grid Engine scheduler.
At this point I found out something unexpected: the online documentation didn't mention that only Xen is supported in pilot mode.
30. First test problem
Shell output after a VM creation request with the Nimbus cloud client (using the KVM hypervisor):

Start time: will be set after lease is secured
Shutdown time: will be set after lease is secured
Termination time: will be set after lease is secured

workspacepilot.py (the Python script for the pilot mode) was trying, without success, to create a virtual machine with Xen.
31. Second test
When the hypervisor problem came up I decided to reinstall the VMM, this time using Xen.
Everything was set up again, but when I tried to launch a virtual machine instance I ran into another problem: something was wrong with the kernel that Xen was using to boot the virtual environment.
32. Second test problem
Console output when Xen creates a virtual machine using the Nimbus cloud client (it enters a loop and crashes):

dracut: Starting plymouth daemon
alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
blkfront: xvda1: flush diskcache: enabled using persistent grants
kjournald starting. Commit interval 5 seconds
EXT3-fs (xvda1): mounted filesystem with ordered data mode
dracut: Remounting /dev/xvda1 with -o errors=remount-ro,ro
kjournald starting. Commit interval 5 seconds
EXT3-fs (xvda1): mounted filesystem with ordered data mode
dracut: Mounted root filesystem /dev/xvda1
dracut: Switching root
EXT3-fs (xvda1): using internal journal
33. Second test problem (part II)
While looking for a solution to the problem I also found out that Red Hat dropped support for Xen from version 6 onwards.
The problem may therefore be related to this, but I didn't find any other clue to solve the issue.
34. Internship itinerary
1. Preliminary steps
2. Hacking the virtualization
3. Cloud platform choice
4. Deployment and configuration
5. Conclusions and results
35. Conclusion and results
Nimbus seemed the best choice for the purpose of this project thanks to its batch scheduler support and its relatively easy installation, but in the end I would reconsider its use.
I think it would be better to adopt a more mainstream solution such as OpenStack, given its greater support and flexibility, in order to achieve the goal of the project.