Paying Down Technical Debt
Research Computing
Faculty of Arts and Sciences
Harvard University
Outline
- Introduction
- Virtual Machines @ FASRC
- The Old System
- The New System
- Why OpenNebula and Ceph?
- Architecture
- Integration
- Operations
- What’s next?
Virtual Machines @ FASRC
How VMs come to life @ FASRC:
1. A user submits a ticket requesting resources
2. Admins review the request and decide the best route forward
3. If a virtual machine is the best option, we deploy it
4. We ensure the VM is up and running, along with any critical services
Virtual Machines @ FASRC
Examples:
1. Research group websites
2. Development environments
3. Web-based visualization tools
4. API servers
5. Web-based workflow engines (e.g. Galaxy)
6. Private git(lab) servers
7. Various (web) front-ends
8. Internal services
... and more
The Old System
- KVM + 2-node Gluster storage system
- Single datacenter
- Manual VM scheduling
- Full OS installs from scratch every time (PXE)
- Manual VM resizing (virsh edit)
- Numerous custom helper scripts to simplify core tasks (build_vm, hypervisor_stats, etc.)
The New System
- OpenNebula + KVM + Ceph
- Multi-datacenter hypervisor and storage clusters
- Automated VM scheduling
- Image-based OS deployments
- Dynamic VM resizing via OpenNebula
- Multi-tiered backup system
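To make "automated VM scheduling" concrete, here is a minimal sketch of capacity-based placement: drop hosts that cannot fit the request, then rank the rest. The host names, fields, and the "most free memory wins" ranking are assumptions for illustration only; OpenNebula's scheduler applies its own configurable policies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str         # hypothetical hypervisor name
    free_cpu: float   # free CPU cores
    free_mem_mb: int  # free memory in MB


def place_vm(hosts: List[Host], cpu: float, mem_mb: int) -> Optional[str]:
    """Return the fitting host with the most free memory, or None if none fits."""
    candidates = [h for h in hosts if h.free_cpu >= cpu and h.free_mem_mb >= mem_mb]
    if not candidates:
        return None
    # Assumed ranking: spread load by preferring the emptiest host.
    best = max(candidates, key=lambda h: (h.free_mem_mb, h.free_cpu))
    return best.name


hosts = [
    Host("hv-a01", free_cpu=2.0, free_mem_mb=4096),
    Host("hv-b01", free_cpu=8.0, free_mem_mb=32768),
    Host("hv-b02", free_cpu=1.0, free_mem_mb=65536),
]
print(place_vm(hosts, cpu=2.0, mem_mb=8192))
```

With manual scheduling, an admin performs this filter-and-rank step by hand for every request.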
Why OpenNebula?
- FLEXIBLE: Fully open-source and customizable to fit into any data center and policies
- ROBUST: Production-ready, highly scalable, reliable, and supported
- LIGHT & SIMPLE: Lightweight and easy to install, maintain, operate, upgrade, and use
- POWERFUL: Innovative functionality for private/hybrid clouds and DC virtualization
CORONA
COntaineRized OpenNebulA
- https://github.com/fasrc/corona
- OpenNebula Services in Docker
- ONED, Sunstone, OneGate, OneFlow, libvirt/KVM, etc.
- Simplified upgrade/rollback
Why Ceph?
- Open-source Red Hat project
- Distributed file, block, and object store
- Object store at its core (RADOS)
- User-accessible S3/Swift
- No single point of failure
- Self-healing
- Exabyte scale
Architecture
Load Testing
- OneFlow-Deployed Salt + Diamond + Grafana/Graphite Stack
- Running Bonnie++ and HPL benchmarks
- Live migration between datacenters
- Pulled OSDs and killed MONs
- Ceph tuning (CRUSH, rebalance, etc)
- Monitored file ACKs and system uptime
- Base VM Image tuning (swap, SCSI timeout, etc)
Outline
- ...
- Architecture
- Integration
  - OpenNebula and Puppet
  - Backups
  - Monitoring
  - VM Leaderboard
  - OpenNebula Hooks
  - OneDNS
- Operations
- What’s next?
OpenNebula and Puppet
- We needed our private cloud deployment to integrate with existing configuration management at FASRC
- Extended epost-dev’s OpenNebula puppet module: https://github.com/fasrc/opennebula-puppet-module
- Automatically provision multiple separate OpenNebula clusters (repeatable deployment)
- Manage resources in ongoing operations
Backups
https://github.com/fasrc/ceph_rsnapshot
Production Ceph Cluster
- Daily VM disk snapshots (2 weeks' worth)
- Delta objects between today and yesterday transferred
Backup Ceph Cluster
- Delta objects applied to sister block device on separate backup cluster
QCOW2 Deep Backups
- Export RBD devices from backup cluster to qcow2 file format and keep in a separate filesystem in a separate datacenter
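The three backup tiers can be sketched as a daily command plan. This is a hedged illustration built on standard `rbd` and `qemu-img` subcommands, not the ceph_rsnapshot implementation; the pool and image names, the `backup` cluster name, the `/backups` path, and date-named snapshots are all assumptions. The code only builds command strings and does not run them.

```python
from datetime import date, timedelta
from typing import List

RETENTION_DAYS = 14  # "2 weeks' worth" of daily snapshots


def snap_name(day: date) -> str:
    """Assumed convention: snapshots are named by ISO date."""
    return day.strftime("%Y-%m-%d")


def backup_commands(pool: str, image: str, today: date) -> List[str]:
    """Build the shell commands for one image's daily backup."""
    new = snap_name(today)
    prev = snap_name(today - timedelta(days=1))
    spec = f"{pool}/{image}"
    raw = f"/backups/{image}-{new}.raw"      # hypothetical staging path
    qcow = f"/backups/{image}-{new}.qcow2"   # hypothetical deep-backup path
    return [
        # Tier 1: snapshot the production image.
        f"rbd snap create {spec}@{new}",
        # Tier 2: ship only the delta since yesterday to the backup cluster.
        f"rbd export-diff --from-snap {prev} {spec}@{new} - "
        f"| rbd --cluster backup import-diff - {spec}",
        # Tier 3: export from the backup cluster and convert to qcow2.
        f"rbd --cluster backup export {spec}@{new} {raw}",
        f"qemu-img convert -f raw -O qcow2 {raw} {qcow}",
    ]


def expired_snapshots(snaps: List[str], today: date) -> List[str]:
    """Snapshots older than the retention window (ISO dates sort as strings)."""
    cutoff = snap_name(today - timedelta(days=RETENTION_DAYS))
    return sorted(s for s in snaps if s < cutoff)


for cmd in backup_commands("one", "one-42-0", date(2017, 6, 20)):
    print(cmd)
```

Because only deltas cross between clusters, the daily transfer stays proportional to what changed, while the qcow2 export gives a copy that can be restored without Ceph.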
Ceph Monitoring
Ceph Nagios Alerting
VM Leaderboard (“Who’s Hammering Disk?”)
Diamond collector to gather Ceph client performance counters for OpenNebula VM disks and send them to graphite/grafana:
https://github.com/fasrc/nebula-ceph-diamond-collector
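The leaderboard idea reduces to turning cumulative counters into rates and sorting. A minimal sketch, assuming two samples of a per-disk cumulative write-op counter; the disk names and counter values are made up, and this is not the collector's actual code.

```python
from typing import Dict, List, Tuple


def top_writers(before: Dict[str, int], after: Dict[str, int],
                interval_s: float, limit: int = 3) -> List[Tuple[str, float]]:
    """Rank VM disks by write ops/sec between two counter samples."""
    rates = []
    for disk, now in after.items():
        prev = before.get(disk)
        if prev is None or now < prev:
            # New disk or counter reset: no valid rate for this interval.
            continue
        rates.append((disk, (now - prev) / interval_s))
    return sorted(rates, key=lambda r: r[1], reverse=True)[:limit]


# Hypothetical samples taken 60 seconds apart.
before = {"one-12-0": 1000, "one-31-0": 500, "one-44-0": 90}
after = {"one-12-0": 1600, "one-31-0": 6500, "one-44-0": 120, "one-50-0": 10}

for disk, rate in top_writers(before, after, interval_s=60):
    print(f"{disk}: {rate:.1f} write ops/s")
```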
OpenNebula Hooks (plugins)
- Shared networks between VMs and physical systems
- Developed hook/plugin to prevent IP conflicts (events: CREATE, RUNNING)
- If a mismatch is found, the VM is paused to protect existing physical infrastructure
- Notification added as a VM attribute visible via Sunstone
[Screenshots: Sunstone VM attribute in the "Mismatch" and "Fixed" states]
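One possible reading of the mismatch check, sketched under assumptions: the hook compares the address on the VM's NIC with the address already recorded for that name, and asks for a pause when they differ. The slides do not show the hook's logic; the `lookup` callable, the return values, and the sample records are hypothetical.

```python
from typing import Callable, Optional, Tuple


def check_vm_ip(vm_name: str, nic_ip: str,
                lookup: Callable[[str], Optional[str]]) -> Tuple[str, str]:
    """Return (action, note) for a VM entering CREATE or RUNNING."""
    expected = lookup(vm_name)
    if expected is None:
        return ("ok", "no existing record for this name")
    if expected == nic_ip:
        return ("ok", "address matches existing record")
    # The note stands in for the attribute shown to the user in Sunstone.
    return ("pause", f"mismatch: {vm_name} expects {expected}, VM has {nic_ip}")


# Hypothetical record source; a real hook might query DNS or an IPAM system.
known = {"db01": "10.0.0.15"}

print(check_vm_ip("db01", "10.0.0.99", known.get))
print(check_vm_ip("db01", "10.0.0.15", known.get))
```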
OneDNS
Dynamic DNS resolution for OpenNebula
https://github.com/fasrc/onedns
Features:
- Dynamic DNS generation for all VMs based on (sanitized) name
- Automatic forward and reverse PTR records per VM per NIC
- Deployable via Python pip or docker
- Built using dnslib and python-oca
Example:
$ nslookup my-vm-name.onevms.my.domain.com
….
Address: 192.168.2.4
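A short sketch of how a VM name and NIC address could map to forward and reverse records. The sanitization rules here are an assumption and may differ from what OneDNS does; only the Python standard library is used.

```python
import ipaddress
import re
from typing import Dict, Tuple


def sanitize(vm_name: str) -> str:
    """Turn an arbitrary VM name into a DNS label (assumed rules)."""
    label = re.sub(r"[^a-z0-9-]+", "-", vm_name.lower()).strip("-")
    return label[:63].strip("-")  # DNS labels are at most 63 characters


def records(vm_name: str, ip: str, zone: str) -> Dict[str, Tuple[str, str]]:
    """Build one forward (A) and one reverse (PTR) record for a VM NIC."""
    fqdn = f"{sanitize(vm_name)}.{zone}"
    ptr_name = ipaddress.ip_address(ip).reverse_pointer
    return {"A": (fqdn, ip), "PTR": (ptr_name, fqdn)}


print(records("My VM_name", "192.168.2.4", "onevms.my.domain.com"))
```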
Outline
- ...
- Architecture
- Integration
- Operations
  - OpenNebula Images
  - VM Bootstrapping
  - OneFlow
- What’s next?
OpenNebula Images
Using Cangallo to create clean VM images:
https://github.com/jfontan/cangallo
- “Dockerfile” for containers*
- Simple YAML file format
- Exports to *.qcow2
- No cruft (no booting the OS and snapshotting)
- Uses libguestfs + qemu-img under the hood
* No fine-grained image layer caching
Single git repo for all base + derived images.
OpenNebula Images
Example of a base image and a derived image:
- Base image: CentOS 7 + OpenNebula Context Package
- Derived image: FASRC Base CentOS 7 Image
VM Bootstrapping
Initialization script for bootstrapping a VM from a git repo:
https://github.com/fasrc/one-vm-bootstrap
- Clones git repo
- Checks out a specific branch if specified
- Runs a script from the repo (exact path can be configured via context variables)
- Script gets all VM context variables via environment
- Logs stdout/stderr for the entire run + total run time to a file
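The steps above can be sketched as follows. This is an illustration under assumptions, not the one-vm-bootstrap script: the function names, the `ROLE` variable, and the log format are hypothetical. The demo runs a small Python one-liner in place of a repo script so it works without network access.

```python
import os
import subprocess
import sys
import tempfile
import time
from typing import Dict, List, Optional


def bootstrap_plan(repo_url: str, dest: str, script: str,
                   branch: Optional[str] = None) -> List[List[str]]:
    """Commands to clone the repo, optionally switch branch, and run the script."""
    plan = [["git", "clone", repo_url, dest]]
    if branch:
        plan.append(["git", "-C", dest, "checkout", branch])
    plan.append([os.path.join(dest, script)])
    return plan


def run_logged(cmd: List[str], log_path: str, context: Dict[str, str]) -> int:
    """Run cmd with context variables in its environment; log output and run time."""
    env = {**os.environ, **context}
    start = time.monotonic()
    with open(log_path, "a") as log:
        proc = subprocess.run(cmd, env=env, stdout=log, stderr=subprocess.STDOUT)
        elapsed = time.monotonic() - start
        log.write(f"exit={proc.returncode} elapsed={elapsed:.2f}s\n")
    return proc.returncode


# Demo: the "script" just echoes a (hypothetical) context variable.
log_file = os.path.join(tempfile.mkdtemp(), "bootstrap.log")
code = run_logged(
    [sys.executable, "-c", "import os; print(os.environ['ROLE'])"],
    log_file,
    {"ROLE": "gitlab-runner"},
)
print(code, open(log_file).read())
```

Passing context through the environment keeps the repo script free of any OpenNebula-specific parsing.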
VM Bootstrapping
Use cases:
- Provisioning hosts using puppet, chef, ansible, etc.
- Launching OpenNebula + Ceph on OpenNebula (for testing)
- GitLab CI runners
- RPM builders (slurm, libvirt, ceph, etc.)
- Anything a script can do...
- etc.
OneFlow
What does it do?
- Provisions service VMs in a defined order
- Autoscales services
Use cases:
- Load testing: Salt master and worker nodes running benchmarks
- OpenNebula-on-OpenNebula testing
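"Provision in a defined order" amounts to a dependency-ordered rollout. A minimal sketch, assuming each role lists the roles it must wait for; the role names are made up and this is not OneFlow's code. It uses `graphlib` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List


def deployment_waves(parents: Dict[str, List[str]]) -> List[List[str]]:
    """Group roles into waves; each wave starts after its parents are up."""
    sorter = TopologicalSorter(parents)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # roles whose parents are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical load-testing service: workers and dashboards wait for the master.
roles = {
    "salt-master": [],
    "salt-minion": ["salt-master"],
    "grafana": ["salt-master"],
}
print(deployment_waves(roles))
```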
What’s Next?
User-Facing OpenNebula Cloud (ALPHA)
- Compute hardware refresh
- Current Odyssey cluster --> ONE
- Direct access to OpenNebula APIs
- IaaS for FASRC users
- FASRC Private Cloud “2.0”
Acknowledgements
John Noss: Infrastructure Engineer, Harvard University Faculty of Arts and Sciences Research Computing
Wes Dillingham: Infrastructure Engineer, Harvard University Faculty of Arts and Sciences Research Computing
Dr. Ignacio M. Llorente: Visiting Scholar at Harvard University, Professor at Complutense University, and Project Director at OpenNebula
Special thanks to Javier Fontan and the entire OpenNebula Development Team!
Learn More...
https://www.computer.org/csdl/mags/co/2017/06/mco2017060092.html
Thank you!

OpenNebulaconf2017US: Paying down technical debt with "one" dollar bills by Justin Riley, Harvard University
