This document discusses budgeting for cloud computing resources at the Leibniz Supercomputing Centre (LRZ). After an introduction and outline, it describes the LRZ Compute Cloud setup and its growing user base. It then proposes a cost function for budgeting based on resources such as cores, RAM, datastore space, and IP addresses. It outlines budgeting plans that cover hardware classes, user classes, and a prepaid model to avoid budget overflows. Finally, it describes the current budgeting implementation and the next steps: updating OpenNebula, upgrading the hardware, and focusing on VM security.
1. Budgeting: the Ugly Duckling of Cloud Computing?
Dr. Matteo Lanati (matteo.lanati@lrz.de)
25th October 2016
2. Outline
LRZ, Distributed Resources Group, Matteo Lanati
● Introduction
● Update on the last year's activity
● Budgeting for OpenNebula
● Why it is needed
● Rationale and ideas
● Current status
● Next steps
3. Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities
● Scope:
– Munich
– Bavaria
– Germany
– Europe
– Worldwide
● Provision of traditional IT services
● High performance systems
5. LRZ Compute Cloud: OpenNebula setup
[Diagram: SSH commands and monitoring probes reach Worker node 1 ... Worker node 88]
● Worker nodes: 88 physical nodes, 736 cores / 7.5 TB RAM
● VMware high availability: 8 cores / 32 GB RAM
● NetApp NAS: 300 TB (Datastore, System store 1 ... System store 10)
6. Update on last year's activity
Our user base
March 2015 – October 2015: 200 accounts
October 2015 – October 2016: 250 new accounts
[Pie charts: user base by field; labels in extraction order]
LRZ 28%
Other 15%
Math / CS 28%
Mech. Eng. 12%
Other 40%
LRZ 19%
Math / CS 15%
Bio 10%
7. Update on last year's activity
Resource usage: computation
October 2015 – October 2016
Computation: ~3 million CPU-hours
Storage: ~30 TB
CPU-hours by field:
Math / CS 1.2 M (41%)
Geo 496 K (17%)
Mech. Eng. 459 K (15%)
Other 443 K (14%)
LRZ 191 K (6%)
March 2015 – October 2015
Computation: ~1 million CPU-hours
Storage: ~10 TB
CPU-hours by field:
Geo 396 K (35%)
Math / CS 128 K (12%)
Other 219 K (20%)
LRZ 88 K (8%)
Mech. Eng. 64 K (6%)
Physics 108 K (10%)
8. What budgeting means
Goal: efficient use of resources (i.e., few idle VMs)
Manage the lifetime of a group of VMs according to:
● Number of cores (Nc)
● RAM (Mem)
● Datastore space (Ds)
● IPs
● Time
Cost function
(A * Nc + B * Mem + C * Ds + D * IPs) * <running time>
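The cost function can be read as a weighted sum of the allocated resources, scaled by the running time. A minimal Python sketch follows; the function name `vm_cost` and the coefficient values are hypothetical, since the slide leaves A, B, C and D open:

```python
def vm_cost(nc, mem, ds, ips, running_time, a, b, c, d):
    """Generic cost: (A * Nc + B * Mem + C * Ds + D * IPs) * <running time>.

    The coefficients a, b, c, d are supplied by the caller; the values used
    below are placeholders, not figures from the slides.
    """
    return (a * nc + b * mem + c * ds + d * ips) * running_time


# Hypothetical VM: 4 cores, 8 GB RAM, 50 GB datastore, 1 IP, 10 time units.
example = vm_cost(nc=4, mem=8, ds=50, ips=1, running_time=10,
                  a=1.0, b=0.5, c=0.1, d=2.0)
print(example)
```

Keeping the coefficients as parameters makes it easy to try different weightings without touching the formula itself.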
9. What budgeting means
A concrete proposal for the cost factors
0.01 * Nc * <hours> + 0.001 * Mem * <hours> + 0.01 * Ds * <months> + 0.50 * IPpublic * <months> + 0.10 * IPprivate * <months>
Item                  Time period   Cost
Core                  Hour          0.01 €
GB of RAM             Hour          0.001 €
GB in image store     Month         0.01 €
Public IP             Month         0.50 €
Private (campus) IP   Month         0.10 €
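As a worked example of the proposed rates, the following Python sketch bills compute per hour and storage and IPs per month. The rates are taken from the table; the function name `proposed_cost` and the example VM are assumptions made for illustration:

```python
# Rates from the table above (EUR).
CORE_PER_HOUR = 0.01
RAM_GB_PER_HOUR = 0.001
STORE_GB_PER_MONTH = 0.01
PUBLIC_IP_PER_MONTH = 0.50
PRIVATE_IP_PER_MONTH = 0.10


def proposed_cost(cores, ram_gb, store_gb, public_ips, private_ips,
                  hours, months):
    """Cost in EUR: compute is billed per hour, storage and IPs per month."""
    hourly = (CORE_PER_HOUR * cores + RAM_GB_PER_HOUR * ram_gb) * hours
    monthly = (STORE_GB_PER_MONTH * store_gb
               + PUBLIC_IP_PER_MONTH * public_ips
               + PRIVATE_IP_PER_MONTH * private_ips) * months
    return hourly + monthly


# Hypothetical VM: 4 cores, 8 GB RAM, 100 GB image, 1 public IP,
# running 720 hours and billed for 1 month of storage and addressing.
cost = proposed_cost(cores=4, ram_gb=8, store_gb=100,
                     public_ips=1, private_ips=0, hours=720, months=1)
print(f"{cost:.2f} EUR")
```

For this made-up VM the compute part dominates (about 34.56 EUR against 1.50 EUR for storage and the public IP), which suggests that under these rates idle cores are what a budget would mostly penalise.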
10. Why budgeting
Use cases
● Computational bursts
– 200 to 400 cores for a few weeks to a few months
● Multitenancy inside a group / project
– Support students' training activities
– Important feature: avoid budget overflow
● Resource management and planning
– To help us decide how / in which direction to grow
11. Budgeting: the big plan
Hardware Classes
● Regular
– Paid by LRZ
● Reserved
– Brought in by the user
– Exclusive access
User Classes
● Normal (uninterruptible)
– No guarantees on start time
● Privileged (golden)
– Immediate start
13. Budgeting: the big plan
Hardware Classes
● Regular
– Paid by LRZ
● Reserved
– Brought in by the user
– Exclusive access
Usage optimisation
User Classes
● Normal (uninterruptible)
– No guarantees on start time
● Privileged (golden)
– Immediate start
● Opportunistic
● Interruptible
14. Budgeting: possible implementation
Hardware Classes
● Regular
● Reserved
– Permission/ownership
– Scheduling requirements
– Scheduler
User Classes
● Selected in the template
● Possible customisation of the GUI
15. Budgeting: important features
● Prepaid model
– Avoid budget overflow
– Mitigation in case the budget is exceeded => undeploy VMs
● External implementation
– Split the budget management from the sysadmin view
– Easier to use the cost function to run a prediction model
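The prepaid rule and the idea of running a prediction from the cost function can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the prediction simply assumes a constant hourly cost, and the rates in the example are the per-hour core and RAM rates proposed earlier in the talk.

```python
def hours_until_exhausted(remaining_budget, hourly_cost):
    """Predict how long a group can keep running at its current hourly cost."""
    if hourly_cost <= 0:
        return float("inf")  # nothing is being consumed
    return max(remaining_budget, 0.0) / hourly_cost


def prepaid_action(remaining_budget):
    """Prepaid rule: once the budget is used up, the VMs are undeployed."""
    return "undeploy" if remaining_budget <= 0 else "keep running"


# Hypothetical group: 50 EUR left, 20 cores and 40 GB RAM in use,
# priced at 0.01 EUR per core-hour and 0.001 EUR per GB-hour of RAM.
hourly = 0.01 * 20 + 0.001 * 40
print(hours_until_exhausted(50.0, hourly))
print(prepaid_action(50.0), prepaid_action(-1.0))
```

Because both functions work on plain numbers, they could live outside the cloud manager, which matches the idea of keeping budget management separate from the sysadmin view.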
19. Budgeting: the implementation so far
[Diagram: VM submission, hook script, VM running, cron jobs, VM undeployed; data: ONE DB, budget, budget thresholds, consumption]
Consumption: <# cores> * <running time>
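The consumption measure on this slide, cores multiplied by running time, can be illustrated with a small Python sketch of a periodic budget check. It is not the LRZ implementation and does not use any OpenNebula API: the `VM` record, the function names, and the budget figures are all hypothetical stand-ins for whatever the hook script and cron jobs read from the ONE DB.

```python
from dataclasses import dataclass


@dataclass
class VM:
    group: str
    cores: int
    running_hours: float


def consumption_by_group(vms):
    """Sum <# cores> * <running time> (core-hours) for each group."""
    totals = {}
    for vm in vms:
        totals[vm.group] = totals.get(vm.group, 0.0) + vm.cores * vm.running_hours
    return totals


def groups_over_budget(vms, budgets):
    """Return the groups whose consumption has reached their budget."""
    used = consumption_by_group(vms)
    return sorted(g for g, total in used.items()
                  if total >= budgets.get(g, float("inf")))


# Hypothetical records, as a periodic job might read them from a database.
vms = [VM("physics", 8, 100.0), VM("physics", 4, 50.0), VM("geo", 16, 10.0)]
budgets = {"physics": 900.0, "geo": 500.0}  # core-hours, made-up thresholds
print(groups_over_budget(vms, budgets))
```

In this sketch a group without a configured budget is never flagged; a real deployment would need to decide whether a missing budget means unlimited or zero.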
20. Next Steps
● Update to ONE 5.0.x
● Upgrade the hardware
● Focus on the security of VMs: LRZ Security Scanner (LSS)
– Detect weak passwords
– Identify vulnerabilities