In addition to providing bare-metal access to large amounts of compute, FAS Research Computing (FASRC) at Harvard also builds and fully maintains custom virtual machines tailored to faculty and researchers' needs, including lab websites, portals, databases, project development environments, and more, both locally and on public clouds. FASRC recently converted its internal VM infrastructure from a completely home-made KVM cluster to a more robust and reliable system powered by OpenNebula and Ceph, configured with public cloud integration. Over the years, as the number of VMs grew, our home-made solution started to show signs of wear and tear with respect to scheduling, provisioning, management, inventory, and performance. Our new deployment improves on all of these areas and provides APIs and features that both help us serve clients more efficiently and improve our internal processes for testing new system configurations and dynamically spinning up resources for continuous integration and deployment. The new VM infrastructure is fully automated via Puppet and has been used to provision a multi-datacenter, fault-tolerant VM infrastructure with a multi-tiered backup system and robust VM and virtual disk monitoring. We will describe our internal system architecture and deployment, the challenges we faced, and the innovations we made along the way while deploying OpenNebula and Ceph. We will also discuss a new client-facing OpenNebula cloud deployment we're currently beta testing with select users, where users have full control over the creation and configuration of their VMs on FASRC compute resources via the OpenNebula dashboard and APIs.
OpenNebulaConf 2017 US: Paying down technical debt with "one" dollar bills by Justin Riley, Harvard University
1. Paying Down Technical Debt
Research Computing
Faculty of Arts and Sciences
Harvard University
2. - Introduction
- Virtual Machines @ FASRC
- The Old System
- The New System
- Why OpenNebula and Ceph?
- Architecture
- Integration
- Operations
- What’s next?
Outline
3. How VMs come to life @ FASRC:
1. A user submits a ticket requesting resources
2. Admins review the request and decide the best route forward
3. If a virtual machine is the best option, we deploy it
4. We ensure the VM is up and running along with any critical services
Virtual Machines @ FASRC
4. Examples:
1. Research group web sites
2. Development environments
3. Web-based visualization tools
4. API servers
5. Web-based workflow engines (e.g., Galaxy)
6. Private git(lab) servers
7. Various (web) front-ends
8. Internal services
… and more
Virtual Machines @ FASRC
5. The Old System
- KVM + 2-node gluster storage system
- Single datacenter
- Manual VM scheduling
- Full OS installs from scratch every time (PXE)
- Manual VM resizing (virsh edit)
- Numerous custom helper scripts to simplify core tasks (build_vm, hypervisor_stats, etc.)
The Old System
6. The New System
- OpenNebula + KVM + Ceph
- Multi-datacenter hypervisor and storage clusters
- Automated VM scheduling
- Image-based OS deployments
- Dynamic VM resizing via OpenNebula (see the sketch below)
- Multi-tiered backup system
The New System
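Dynamic resizing goes through OpenNebula's XML-RPC API instead of hand-editing libvirt XML. A minimal sketch using python-oca; the endpoint, credentials, VM id, and new capacity are all examples:

import oca

# Illustrative endpoint and credentials; VM id 42 and the capacity values
# are examples, not real FASRC settings.
client = oca.Client('oneadmin:password', 'http://one-frontend:2633/RPC2')

# one.vm.resize takes the VM id, a template with the new capacity, and an
# enforce flag (True = respect host capacity limits).
client.call('one.vm.resize', 42, 'VCPU = 4\nMEMORY = 8192', True)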
7. - Introduction
- Virtual Machines @ FASRC
- The Old System
- The New System
- Why OpenNebula and Ceph?
- Architecture
- Integration
- Operations
- What’s next?
Outline
8. Why OpenNebula?
- FLEXIBLE: fully open source and customizable to fit any data center and its policies
- ROBUST: production-ready, highly scalable, reliable, and supported
- LIGHT & SIMPLE: lightweight and easy to install, maintain, operate, upgrade, and use
- POWERFUL: innovative functionality for private/hybrid clouds and DC virtualization
10. - Open-source Red Hat project
- Distributed file, block, and object store
- Object store at its core (RADOS; see the sketch below)
- User-accessible S3/Swift
- No single point of failure
- Self-healing
- Exabyte scale
Why Ceph?
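To make "object store at its core" concrete, here is a minimal sketch that talks to RADOS directly via Ceph's Python binding; the pool and object names are examples:

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')   # any existing pool works here
ioctx.write_full('hello-object', b'stored and replicated by RADOS')
print(ioctx.read('hello-object'))
ioctx.close()
cluster.shutdown()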
11. - Introduction
- Virtual Machines @ FASRC
- The Old System
- The New System
- Why OpenNebula and Ceph?
- Architecture
- Integration
- Operations
- What’s next?
Outline
13. Load Testing
- OneFlow-Deployed Salt + Diamond + Grafana/Graphite Stack
- Running Bonnie++ and HPL benchmarks
- Live migration between datacenters
- Pulled OSDs and killed MONs
- Ceph tuning (CRUSH, rebalance, etc)
- Monitored file ACKs and system uptime
- Base VM Image tuning (swap, SCSI timeout, etc)
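As one example of the monitoring above, a "file ACK" style probe can time a synced write and push the latency to Graphite's plaintext listener on port 2003; the mount point, Graphite host, and metric name here are illustrative:

import os
import socket
import time

start = time.time()
with open('/mnt/cephtest/ack_probe', 'wb') as f:
    f.write(b'ping')
    f.flush()
    os.fsync(f.fileno())   # don't trust the page cache; wait for the ACK
latency = time.time() - start

# Graphite plaintext protocol: "<metric> <value> <timestamp>\n"
sock = socket.create_connection(('graphite.example.org', 2003))
sock.sendall(('loadtest.file_ack.latency %f %d\n'
              % (latency, time.time())).encode())
sock.close()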
15. OpenNebula and Puppet
- We needed our private cloud deployment to integrate with existing configuration management at FASRC
- Extended epost-dev's OpenNebula puppet module: https://github.com/fasrc/opennebula-puppet-module
- Automatically provisions multiple separate OpenNebula clusters (repeatable deployments)
- Manages resources in ongoing operations
Integration
16. Backups
Three tiers (see the sketch below):
1. Production Ceph cluster: daily VM disk snapshots (2 weeks' worth); only the delta objects between today's and yesterday's snapshots are transferred.
2. Backup Ceph cluster: the delta objects are applied to a sister block device on the separate backup cluster.
3. QCOW2 deep backups: RBD devices are exported from the backup cluster to qcow2 files kept on a separate filesystem in a separate datacenter.
Integration
https://github.com/fasrc/ceph_rsnapshot
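A minimal sketch of one snapshot/diff/apply cycle, assuming the rbd and qemu-img CLIs and illustrative pool, image, and cluster names (the production pipeline is the fasrc/ceph_rsnapshot project above):

import datetime
import subprocess

pool, image = 'one', 'one-42-disk-0'            # illustrative names
today = datetime.date.today().isoformat()
yesterday = (datetime.date.today() - datetime.timedelta(days=1)).isoformat()

# 1. Daily snapshot on the production cluster.
subprocess.check_call(['rbd', 'snap', 'create',
                       '%s/%s@%s' % (pool, image, today)])

# 2./3. Ship only the objects that changed since yesterday's snapshot and
# apply them to the sister block device on the backup cluster.
export = subprocess.Popen(
    ['rbd', 'export-diff', '--from-snap', yesterday,
     '%s/%s@%s' % (pool, image, today), '-'],
    stdout=subprocess.PIPE)
subprocess.check_call(
    ['rbd', '--cluster', 'backup', 'import-diff', '-',
     '%s/%s' % (pool, image)],
    stdin=export.stdout)
export.stdout.close()
export.wait()

# 4. Deep export from the backup cluster to qcow2 on another filesystem.
subprocess.check_call(
    ['qemu-img', 'convert', '-O', 'qcow2',
     'rbd:%s/%s:conf=/etc/ceph/backup.conf' % (pool, image),
     '/deep-backups/%s.qcow2' % image])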
19. VM Leaderboard (“Who’s Hammering Disk?”)
Diamond collector that gathers Ceph client performance counters for OpenNebula VM disks and sends them to Graphite/Grafana (see the sketch below): https://github.com/fasrc/nebula-ceph-diamond-collector
Integration
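The counters come from the admin sockets that librbd clients expose on each hypervisor. A hedged sketch of the collection step; the socket paths and counter names can vary by Ceph version:

import glob
import json
import subprocess

# Each qemu/librbd client exposes an admin socket; "perf dump" returns JSON.
for sock in glob.glob('/var/run/ceph/ceph-client.*.asok'):
    raw = subprocess.check_output(
        ['ceph', '--admin-daemon', sock, 'perf', 'dump'])
    perf = json.loads(raw)
    for section, counters in perf.items():
        # librbd sections are named per image, e.g. librbd-<id>-<pool>-<image>
        if section.startswith('librbd'):
            print(sock, section,
                  'wr_bytes=%s' % counters.get('wr_bytes'),
                  'rd_bytes=%s' % counters.get('rd_bytes'))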
20. OpenNebula Hooks (plugins)
- Shared networks between VMs and physical systems
- Developed a hook/plugin to prevent IP conflicts (events: CREATE, RUNNING); a sketch follows below
- If a mismatch is found, the VM is paused to protect the existing physical infrastructure
- A notification is added as a VM attribute visible via Sunstone (the slide showed screenshots of the mismatch warning and the fixed state)
Integration
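A hedged sketch of such a hook, assuming it is registered in oned.conf as a VM_HOOK with arguments = "$ID $TEMPLATE" (VM id plus base64-encoded template XML); the arping check and the attribute name are illustrative, not the exact FASRC implementation:

#!/usr/bin/env python
import base64
import subprocess
import sys
import tempfile
import xml.etree.ElementTree as ET

vm_id, template_b64 = sys.argv[1], sys.argv[2]
vm = ET.fromstring(base64.b64decode(template_b64))

for nic in vm.findall('.//NIC'):
    ip, mac = nic.findtext('IP'), nic.findtext('MAC')
    if not ip or not mac:
        continue
    # Ask the network who answers for this IP; output format varies by
    # arping implementation, so this parsing is deliberately loose.
    out = subprocess.run(['arping', '-c', '3', ip],
                         capture_output=True, text=True).stdout.lower()
    replied = 'from' in out
    if replied and mac.lower() not in out:
        # Someone other than the VM owns the address: pause the VM and
        # leave a note visible as a VM attribute in Sunstone.
        subprocess.check_call(['onevm', 'suspend', vm_id])
        with tempfile.NamedTemporaryFile('w', suffix='.tmpl',
                                         delete=False) as t:
            t.write('IP_CONFLICT = "%s already in use on the network"\n' % ip)
        subprocess.check_call(['onevm', 'update', vm_id, t.name, '--append'])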
21. OneDNS
Dynamic DNS resolution for OpenNebula
https://github.com/fasrc/onedns
Features:
- Dynamic DNS generation for all VMs based on (sanitized) name
- Automatic forward (A) and reverse (PTR) records per VM per NIC
- Deployable via Python pip or docker
- Built using dnslib and python-oca
Example:
$ nslookup my-vm-name.onevms.my.domain.com
….
Address: 192.168.2.4
Integration
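A minimal sketch of the idea using dnslib and python-oca (the libraries onedns is built on); the zone, credentials, and the NIC attribute path on the oca VM object are assumptions:

import re
import oca
from dnslib import RR
from dnslib.server import BaseResolver, DNSServer

ZONE = 'onevms.my.domain.com.'   # example zone
client = oca.Client('user:password', 'http://one-frontend:2633/RPC2')

class OneResolver(BaseResolver):
    def resolve(self, request, handler):
        reply = request.reply()
        qname = str(request.q.qname).lower()
        pool = oca.VirtualMachinePool(client)
        pool.info()   # simplest approach: refresh the VM pool per query
        for vm in pool:
            name = re.sub(r'[^a-z0-9-]', '-', vm.name.lower())  # sanitize
            ip = vm.template.nics[0].ip   # first NIC; attribute path assumed
            if qname == name + '.' + ZONE:
                reply.add_answer(*RR.fromZone('%s 60 A %s' % (qname, ip)))
        return reply

DNSServer(OneResolver(), port=5353).start()   # blocking serve loop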
23. OpenNebula Images
Using Cangallo to create clean VM images:
https://github.com/jfontan/cangallo
- Like a “Dockerfile”, but for VM images*
- Simple YAML file format
- Exports to *.qcow2
- No cruft (no booting the OS and snapshotting)
- Uses libguestfs + qemu-img under the hood (see the sketch below)
* No fine-grained image layer caching
Single git repo for all base + derived images.
Operations
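A rough sketch of what such a build does under the hood: derive a qcow2 from a base image, then customize it offline with libguestfs (no boot, no snapshot). This illustrates the mechanism, not Cangallo's actual code; the paths are made up and the -F backing-format flag depends on your qemu-img version:

import subprocess
import guestfs

# qcow2 backing files give the base/derived image relationship.
subprocess.check_call(['qemu-img', 'create', '-f', 'qcow2',
                       '-b', 'centos7-base.qcow2', '-F', 'qcow2',
                       'fasrc-centos7.qcow2'])

g = guestfs.GuestFS(python_return_dict=True)
g.add_drive_opts('fasrc-centos7.qcow2', format='qcow2')
g.launch()
root = g.inspect_os()[0]        # locate the guest's root filesystem
g.mount(root, '/')
g.write('/etc/motd', 'FASRC base image\n')   # offline customization
g.shutdown()
g.close()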
24. OpenNebula Images
Example of a base image and a derived image: the FASRC base CentOS 7 image is built from a clean CentOS 7 image plus the OpenNebula context package.
Operations
25. VM Bootstrapping
Initialization script for bootstrapping a VM from a git repo: https://github.com/fasrc/one-vm-bootstrap
- Clones a git repo
- Checks out a specific branch if specified
- Runs a script from the repo (the exact path can be configured via context variables)
- The script gets all VM context variables via its environment
- Logs stdout/stderr for the entire run, plus total run time, to a file (see the sketch below)
Operations
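A minimal sketch of that flow; the context variable names (BOOTSTRAP_REPO, BOOTSTRAP_BRANCH, BOOTSTRAP_SCRIPT) and paths here are illustrative, not the module's actual names:

#!/usr/bin/env python
import os
import subprocess
import time

repo = os.environ['BOOTSTRAP_REPO']            # illustrative context variable
branch = os.environ.get('BOOTSTRAP_BRANCH')    # optional branch to check out
script = os.environ.get('BOOTSTRAP_SCRIPT', 'bootstrap.sh')

start = time.time()
with open('/var/log/vm-bootstrap.log', 'ab') as log:
    subprocess.check_call(['git', 'clone', repo, '/opt/bootstrap'],
                          stdout=log, stderr=log)
    if branch:
        subprocess.check_call(
            ['git', '-C', '/opt/bootstrap', 'checkout', branch],
            stdout=log, stderr=log)
    # VM context variables are already in the environment, so the repo's
    # script sees them without extra plumbing.
    subprocess.check_call(['/bin/bash', os.path.join('/opt/bootstrap', script)],
                          stdout=log, stderr=log, env=os.environ)
    log.write(('total run time: %.1fs\n' % (time.time() - start)).encode())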
26. VM Bootstrapping
Use cases:
- Provisioning hosts using puppet, chef,
ansible, etc.
- Launching OpenNebula + Ceph on
OpenNebula (for testing)
- Gitlab CI Runners
- RPM builders (slurm, libvirt, ceph, etc.)
- Anything a script can do...
Operations
27. OneFlow
What does it do?
- Provisions service VMs in a defined order (see the sketch below)
- Autoscales services
Use cases:
- Load testing: salt master and worker nodes running benchmarks
- OpenNebula-on-OpenNebula testing
Operations
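A hedged sketch of a OneFlow service template for the salt load-testing stack, posted to the OneFlow HTTP API; the role names, VM template IDs, endpoint, and credentials are examples:

import requests

# "straight" deployment provisions roles in dependency order: the salt
# master first, then the benchmark workers that depend on it.
service = {
    "name": "loadtest",
    "deployment": "straight",
    "roles": [
        {"name": "salt-master", "cardinality": 1, "vm_template": 10},
        {"name": "workers", "cardinality": 8, "vm_template": 11,
         "parents": ["salt-master"]},
    ],
}
requests.post('http://one-frontend:2474/service_template',
              json=service, auth=('oneadmin', 'password'))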
28. - Introduction
- Virtual Machines @ FASRC
- The Old System
- The New System
- Why OpenNebula and Ceph?
- Architecture
- Integration
- Operations
- What’s next?
Outline
29. User-Facing OpenNebula Cloud (ALPHA)
- Compute hardware refresh
- Current Odyssey cluster --> ONE
- Direct access to OpenNebula APIs
- IaaS for FASRC Users
- FASRC Private Cloud “2.0”
What’s Next?
30. John Noss: Infrastructure Engineer, Harvard University Faculty of Arts and Sciences Research Computing
Wes Dillingham: Infrastructure Engineer, Harvard University Faculty of Arts and Sciences Research Computing
Dr. Ignacio M. Llorente: Visiting Scholar at Harvard University, Professor at Complutense University, and Project Director at OpenNebula
Special thanks to Javier Fontan and the entire OpenNebula Development Team!
Acknowledgements