OpenStack at the Sanger Institute
Dave Holland
From zero knowledge to “Pan-Prostate Genome Blaster” in 18 months
What I’ll talk about
● The Sanger Institute
● Motivations for using OpenStack
● Our journey
● Some decisions we made (and why)
● Some problems we encountered (and how we addressed them)
● Projects that are using it so far
● Next steps
The Sanger Institute
LSF 9
~10,000 cores in main compute farm
~10,000 cores across smaller project-specific farms
13PB Lustre storage
Mostly everything is available everywhere - “isolation” is based on POSIX file
permissions
Motivations
LSF is great for HPC utilisation, but…
● It doesn’t address data size/sharing/locality
● It’s quicker to move an image (or an image definition) to the data
○ benefit from existing data security arrangements
○ benefit from tenant isolation
LSF isn’t going away - complementary to cloud-style computing
Our journey
● 2015, June: sysadmin training
● July: experiments with RHOSP6 (Juno)
● August: RHOSP7 (Kilo) released
● December: pilot “beta” system opened to testers
● 2016, first half: Science As A Service
● July: pilot “gamma” system opened using proper Ceph hardware
● August: datacentre shutdown
● September: production system hardware installation
● 2017, January: “delta” system opened to early adopters
● February: Sanger Flexible Compute Platform announced
Science As A Service
First half of 2016
Proof-of-concept of a user-friendly orchestration portal (CloudForms) on top
of OpenStack and VMware
Consultancy and development input from Red Hat
Presented at the Scientific Working Group at the Barcelona summit, October 2016
Decisions we made
Hardware
We approached our current vendors, and SuperMicro via BIOS-IT
We wanted to get the most bang for our buck
Arista provided seed switch kit and offered VXLAN support
Production OpenStack (1)
• 107 Compute nodes (Supermicro), each with:
• 512GB of RAM, 2 * 25Gb/s network interfaces
• 1 * 960GB local SSD, 2 * Intel E5-2690 v4 (14 cores @ 2.6GHz)
• 6 Control nodes (Supermicro) allow 2 OpenStack deployments
• 256GB of RAM, 2 * 100Gb/s network interfaces
• 1 * 120GB local SSD, 1 * Intel P3600 NVMe (/var)
• 2 * Intel E5-2690 v4 (14 cores @ 2.6GHz)
• Total of 53TB of RAM, 2996 cores, 5992 with hyperthreading (quick arithmetic check below)
• RHOSP8 (Liberty) deployed with Triple-O
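The headline totals follow directly from the per-node figures above; a quick, purely illustrative check:

```python
# Sanity check of the compute totals quoted above (per-node figures from this slide).
nodes = 107
ram_gib_per_node = 512
cores_per_node = 2 * 14          # two 14-core E5-2690 v4 CPUs

print(nodes * ram_gib_per_node / 1024)   # ~53.5 TiB of RAM
print(nodes * cores_per_node)            # 2996 physical cores (5992 hyperthreads)
```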
Production OpenStack (2)
• 9 Storage nodes (Supermicro), each with:
• 512GB of RAM
• 2 * 100Gb/s network interfaces
• 60 * 6TB SAS discs, 2 system SSDs
• 2 * Intel E5-2690 v4 (14 cores @ 2.6GHz)
• 4TB of Intel P3600 NVMe used for journal
• Ubuntu Xenial
• 3PB of raw disc space, 1PB usable
• Single instance: 1.3 GBytes/sec write, 200 MBytes/sec read
• Ceph benchmarks imply 7 GBytes/sec
Production OpenStack (3)
• 3 racks of equipment, 24kW load per rack
• 10 Arista 7060CX-32S switches
• 1U, 32 * 100Gb/s -> 128 * 25Gb/s
• Hardware VXLAN support integrated with OpenStack *
• Layer two traffic limited to rack, VXLAN used inter-rack
• Layer three between racks and interconnect to legacy systems
• All network switch software can be upgraded without disruption
• True Linux systems
• 400 Gb/s from racks to spine, 160 Gb/s from spine to legacy systems
* VXLAN in the ml2 plugin not used in the first iteration because of software issues
OpenStack installation
RHOSP vs Packstack vs …
• Paid-for support from Red Hat
• Terminology confusion: Triple-O undercloud and overcloud
• Need wellness checks of undercloud and overcloud before each
(re)deploy
• Keep deployment configuration in git and deploy with a script for
consistency
Ceph installation
Integrated or standalone?
• Deployment by RHOSP is easier but is tied to that OpenStack
• A separate self-supported Ceph was more cost effective and a
better fit for staff knowledge at the time
• It’s possible to share a Ceph between multiple OpenStacks
• ceph-ansible is seductive but brings some headaches
• e.g. running with --check can cause problems like changing the fsid (one mitigation is sketched below)
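One way to guard against that (a sketch; the variable names are from the ceph-ansible versions we looked at, so check your release) is to pin the cluster fsid in group_vars rather than letting ceph-ansible generate it:

```yaml
# group_vars/all.yml - pin the cluster identity instead of generating it on each run
generate_fsid: false
fsid: 00000000-0000-0000-0000-000000000000   # placeholder: use your real cluster fsid
```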
Networking
We wanted VXLAN support in switches to enable metal-as-a-service
Unfortunately we’re not there yet…
e.g. ml2 driver bugs: “reserved” is not a valid UUID
We currently have VXLAN double encapsulation
Local customisations
Puppet or what?
We chose to use Ansible
• There’s only a single Puppet post-deploy hook
• Wider strategic use of Ansible within Sanger IT
• Keep configuration in git
Our customisations
• scheduler tweaks (stack not spread, CPU/RAM overcommit - example below)
• hypervisor tweaks (instance root disk on Ceph or hypervisor)
• enable SSL for Horizon and API
• change syslog destination
• add “MOTD” to Horizon login page
• change session timeouts
• register systems with Red Hat
• and more...
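As an illustration of the scheduler tweaks listed above, a minimal Ansible sketch. This is not our production playbook: the nova.conf option names are the stock ones, the values are examples, and paths/service names may differ on your deployment.

```yaml
# Post-deploy nova scheduler tweaks (illustrative only; values are examples)
- name: overcommit CPU/RAM and prefer stacking over spreading
  ini_file:
    path: /etc/nova/nova.conf
    section: DEFAULT
    option: "{{ item.option }}"
    value: "{{ item.value }}"
  with_items:
    - { option: cpu_allocation_ratio, value: "4.0" }
    - { option: ram_allocation_ratio, value: "1.5" }
    - { option: ram_weight_multiplier, value: "-1.0" }  # negative weight => fill hosts (stack) first
  notify: restart nova scheduler   # handler would restart openstack-nova-scheduler
```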
Customisation pitfalls
Some customisations become obsolete when moving to a newer
version of OpenStack - can’t blindly carry them forward
A redeploy (e.g. to add compute nodes) overwrites configuration so
the customisations need to be reapplied - and there’s a window when
they’re absent
Restarting too many services too quickly upsets HAproxy, rabbitmq...
Flavours and host aggregates
Three main flavour types:
1. Standard “m1.*”
• True cloud-style compute; root disk on hypervisor; 90% of compute
nodes
2. Ceph “c1.*”
• Root disk on Ceph allows live migration; 6 compute nodes support this
3. Reserved “h1.*”
• Limited to tenants running essential availability services
Flavours and host aggregates
Per-project flavours:
• For Cancer group “k1.*”
• True cloud-style compute, like “m1.*”
• Sized to fit two instances on each hypervisor: half the disk, half the CPUs,
half the RAM
• Trying to prevent Ceph “double load” caused by data movement:
Ceph→S3→instance→Cinder volume→Ceph
• Only viable with homogeneous hypervisors and known/predictable
resource requirements
Deployment thoughts
“Premature optimisation is the root of all evil” - Knuth
“Get it working, then make it faster” - my boss Pete
“Keep it simple (because I’m) stupid” - me
Turn off h/w acceleration (10GbE offloads guilty until proven innocent)
Find some enthusiastic early adopters to shake the problems out
Deploy, monitor, tweak, rinse, repeat
Metrics, monitoring, logging
Metrics
Find the balance between
“if it moves, graph it”
and
“don’t overload the metrics server”
50,000 metrics every 10 seconds is optimistic
Architecture
We’re using collectd → graphite/carbon → grafana (a sketch of carbon’s plaintext protocol follows below)
Modular plugins make it easy to record new metrics e.g.
entropy_avail
Using the collectd libvirt plugin means new instances are
automatically measured
...although the automatic naming isn’t great:
openstack_flex2.instance-00000097_bbb85e84-6c0c-4fe8-9b3c-db17a665e7ef.libvirt.virt_cpu_total
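Carbon also accepts one-off metrics over its plaintext protocol (TCP port 2003), which is handy for ad-hoc counters that don’t justify a collectd plugin. A minimal sketch; the hostname and metric path are placeholders, not our real ones:

```python
import socket
import time

def send_metric(path, value, host="graphite.example.org", port=2003):
    """Push one datapoint to carbon using the plaintext protocol: '<path> <value> <timestamp>'."""
    line = f"{path} {value} {int(time.time())}\n"
    with socket.create_connection((host, port)) as sock:
        sock.sendall(line.encode("ascii"))

# e.g. a custom per-tenant gauge alongside the collectd-generated metrics
send_metric("openstack_flex2.custom.tenants.cancer.running_instances", 42)
```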
Per-tenant graphs
Logging
We wanted something like Splunk
...but without the £££
We’re using ELK
Today it’s a syslog destination; we’re planning to use rsyslog to watch the
OpenStack component log files
Monitoring
Bare minimum in Opsview (Nagios)
• Horizon and API availability
• Controllers up
• radosgw S3 availability
• Ceph nodes up
We’d like hardware status reporting but SuperMicro IPMI is not helpful
Pitfalls and problems
“Space,” it says, “is big. Really big. You just won't believe how vastly,
hugely, mindbogglingly big it is.”
OpenStack is a bit like that: there’s a substantial learning curve for admins and developers
Problems with Docker
Docker likes to use 172.17.0.0/16 for its bridge network
Sanger uses 172.16.0.0/12 for its internal network
...oh.
Also problems with bridge MTU > instance MTU and PMTUD not
working. Fix: --bip=192.168.3.3/24 --mtu=1400
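The same fix can be made persistent in the Docker daemon configuration instead of on the command line; the values below are just the ones from the flags above:

```json
{
  "bip": "192.168.3.3/24",
  "mtu": 1400
}
```

(That is /etc/docker/daemon.json; "bip" and "mtu" are the daemon.json equivalents of --bip and --mtu.)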
Problems with radosgw
Ceph radosgw implements most but not all AWS S3 features
ACLs are implemented, policies are not
We’re trying to implement a write-only bucket using nginx as a proxy
to rewrite the auth header
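In the meantime, the parts radosgw does implement work fine with standard S3 tooling. A minimal boto3 sketch against radosgw; the endpoint, credentials and bucket name are placeholders:

```python
import boto3

# radosgw speaks enough of the S3 API for boto3; endpoint and credentials are placeholders
s3 = boto3.client(
    "s3",
    endpoint_url="https://radosgw.example.org",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="pipeline-input")
# Canned ACLs work against radosgw; bucket policies (at this point) did not
s3.put_bucket_acl(Bucket="pipeline-input", ACL="authenticated-read")

for obj in s3.list_objects_v2(Bucket="pipeline-input").get("Contents", []):
    print(obj["Key"], obj["Size"])
```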
Problems with DHCP
On Ceph nodes, Ubuntu DHCP client doesn’t request a default
gateway
Infoblox DHCP server sends Classless Static Routes option
DHCP client can override a server-supplied value but not ignore it
The Ceph nodes’ default route ends up pointing down the 1GbE
management NIC not the 2x100GbE bond
...oh.
Problems with rabbitmq
rabbitmq partitions are really painful
We sometimes end up rebooting all the controllers - there must be a
better way (one candidate is sketched below)
Fortunately running instances aren’t affected
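One thing worth evaluating (we haven’t settled on it) is rabbitmq’s own partition-handling mode, e.g. in the classic rabbitmq.config format used by the rabbitmq versions of this era:

```erlang
%% /etc/rabbitmq/rabbitmq.config - let rabbitmq resolve partitions itself
[
  {rabbit, [
    {cluster_partition_handling, pause_minority}   %% or 'autoheal'
  ]}
].
```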
Problems with deployment
Running the overcloud deployment from the wrong directory is
very bad
The deployer doesn’t find the file containing the service
passwords and proceeds to change them all, which is very tedious
to recover from
The deployment script really really really needs to have
cd ~stack
to prevent accidents
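A hedged sketch of the kind of guard we mean - a thin wrapper that always changes into the stack user’s home directory before deploying. The template and environment file names are placeholders, not our real ones:

```python
#!/usr/bin/env python3
"""Thin wrapper around 'openstack overcloud deploy' that removes the wrong-directory footgun."""
import os
import subprocess
import sys

STACK_HOME = os.path.expanduser("~stack")

def main():
    os.chdir(STACK_HOME)  # the 'cd ~stack' the deployment script really needs
    cmd = [
        "openstack", "overcloud", "deploy",
        "--templates",
        "-e", os.path.join(STACK_HOME, "templates/network-environment.yaml"),  # placeholder
        "-e", os.path.join(STACK_HOME, "templates/storage-environment.yaml"),  # placeholder
    ]
    sys.exit(subprocess.call(cmd))

if __name__ == "__main__":
    main()
```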
Problems with cinder
When a volume is destroyed, cinder overwrites the volume with
zeroes
If a user is running a pipeline which creates and destroys many 1TB
volumes this produces a lot of I/O
Consider setting volume_clear and/or volume_clear_size in
cinder.conf
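For reference, the relevant cinder.conf knobs; the values here are illustrative, not a recommendation:

```ini
[DEFAULT]
# 'zero' wipes deleted volumes, 'none' skips wiping entirely, 'shred' overwrites multiple times
volume_clear = zero
# MiB to wipe from the start of each deleted volume; 0 means wipe the whole volume
volume_clear_size = 100
```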
Use cases
Prostate cancer analysis
Pan-Prostate builds on previous Pan-Cancer work
Multiple participating institutes using Docker to provide a consistent
analysis framework
In the past that required admin time to build an isolated network;
now OpenStack gives us that for free - and lets the scientists drive it
themselves
wr - Workflow Runner
Reimplementation of Vertebrate Resequencing Group’s pipeline
manager in Go
Designed to be fast, powerful and easy to use
Can manage LSF like the existing version, and adds OpenStack support
https://github.com/VertebrateResequencing/wr
wr - Workflow Runner
Lessons learned:
• “There’s a surprising amount of stuff you have to do to get
everything working well”
• There are annoying gaps in the Go SDK
• Lots of things can go wrong if end users bring up servers, so handle
all the details for them
New Pipeline Group
Using s3fs as a shim on top of radosgw S3 speeds development
s3fs presents a bucket as a filesystem (but it’s turtles all the way
down)
In tests launching up to 240 instances for read-only access to a few
GB of reference sequence data (with caching turned on), up to ~8
might get stuck
Human Genetics Informatics
Working towards a production Arvados system
Speedbumps around many tools/SDKs assuming real AWS S3, not
some S3-alike
Sending patches to open-source projects (Packer, Terraform…)
What next?
More Ceph
...because 1PB isn’t enough…
This has implications for DC placement (due to cooling requirements)
and Ceph CRUSH map (to ensure data replicas are properly
separated)
Should we split rbd pools from radosgw pools?
OpenStack version upgrade
We will probably skip to RHOSP10 (Newton)
Need Arista driver integrations for VXLAN for metal-as-a-service
We will install a new system alongside the current one and migrate
users and then compute nodes
$THING-as-a-service
metal - deploy instance on bare-metal (Ironic)
key management (Barbican) to enable encrypted volumes
DNS (Designate)
shared filesystem (Manila)
…though many of these can already be achieved with creative use of
images/heat/user-data
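For example, a shared-filesystem-ish setup can often be improvised from a Heat stack plus cloud-init user-data. A minimal HOT sketch; the image, flavour, network and NFS export names are placeholders:

```yaml
heat_template_version: 2015-10-15

resources:
  worker:
    type: OS::Nova::Server
    properties:
      image: ubuntu-xenial          # placeholder image name
      flavor: m1.medium             # placeholder flavour
      networks:
        - network: tenant-net       # placeholder network
      user_data_format: RAW
      user_data: |
        #cloud-config
        packages:
          - nfs-common
        runcmd:
          - mount -t nfs fileserver.example:/export /mnt   # placeholder NFS export
```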
Federation
JISC Assent looks interesting
Lots of internal process to work through first
Open questions about:
• scheduling - pre-emptible instances would help
• charging - market-based instance pricing?
Lustre
We have 13PB of Lustre storage
Consider exposing some of it to tenants using Lustre routers, NID
mapping and sub-mounts
Little things
• expose hypervisor RNG to instances
• could make instance key generation go faster
• have Logstash report “logs per host” metrics
• to spot log volume anomalies
• ...
Thanks
My colleagues at Sanger - both in Systems and across the institute
The OpenStack community
Helpful people on mailing lists
Questions?
dh3@sanger.ac.uk