CERN uses cloud computing and virtualization to manage the large computing infrastructure needed for particle physics experiments at the Large Hadron Collider. Five years ago CERN adopted open source tools such as OpenStack, Puppet, and Ceph to automate the management of its infrastructure across two data centres and to improve agility, efficiency, and sustainability. This has enabled CERN to scale from around 8,000 managed servers to a cloud of more than 33,000 virtual machines on 9,000 hypervisors, while maintaining high performance and responding rapidly to security issues such as Meltdown.
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
20181219 ucc open stack 5 years v3
1.
2. Clouds at CERN: A 5 year perspective
Utility and Cloud Computing Conference, December 19, 2018
Tim Bell
@noggin143
UCC 2018 2
3. About Tim
• Responsible for Compute and Monitoring in the CERN IT department
• Elected member of the OpenStack Foundation management board
• Member of the OpenStack user committee from 2013-2015
UCC 2018 3
9. ATLAS, CMS, ALICE and LHCb
[Infographic: the LHC detectors are heavier than the Eiffel Tower]
Image credit: CERN
UCC 2018 9
10. 40 million pictures per second, 1 PB/s
Image credit: CERN
UCC 2018 10
11. About the CERN IT Department
UCC 2018 11
Enable the laboratory to fulfill its mission
- Main data centre on Meyrin site
- Wigner data centre in Budapest (since 2013)
- Connected via three dedicated 100 Gb/s links
- Where possible, resources at both sites (plus disaster recovery)
Drone footage of the CERN CC
13. Outline
UCC 2018
13
• Fabric Management before 2012
• The Agile Infrastructure (AI) Project
• The three AI areas
- Configuration Management
- Monitoring
- Resource provisioning
• Review
14. CERN IT Tools up to 2011 (1)
UCC 2018
14
• Developed in a series of EU-funded projects
- 2001-2004: European DataGrid
- 2004-2010: EGEE
• Work package 4 – Fabric management:
“Deliver a computing fabric comprised of all the necessary tools to
manage a centre providing grid services on clusters of thousands of
nodes.”
15. CERN IT Tools up to 2011 (2)
UCC 2018
15
• The WP4 software was developed from scratch
- Scale and experience needed for LHC Computing was special
- Config’ mgmt, monitoring, secret store, service status, state mgmt, service databases, …
LEMON – LHC Era Monitoring
- client/server based monitoring
- local agent with sensors
- samples stored in a cache & sent to server
- UDP or TCP, with or without encryption
- support for remote entities
System administration toolkit
- automated installation, configuration & management of clusters
- clients interact with a configuration database (CMDB) & an installation infrastructure (AII)
Around 8’000 servers managed!
16. 2012: A Turning Point for CERN IT
UCC 2018
16
• EU projects finished in 2010: decreasing development and support
• LHC compute and data requirements increasing
- Moore’s law would help, but not enough
• Staff would not grow with managed resources
- Standardization & automation, current tools not apt
• Other deployments have surpassed the CERN one
- Mostly commercial companies like Google, Facebook, Rackspace, Amazon, Yahoo!, …
- We were no longer special! Can we profit?
[Chart: projected compute needs (GRID, ATLAS, CMS, LHCb, ALICE) from Run 1 to Run 4, showing "we are here" (2012) versus "what we can afford"]
LS1 (2013) ahead, next window for change would only open in 2019 …
17. UCC 2018
17
How we began …
• Formed a small team of service managers from …
- Large services (e.g. batch, plus)
- Existing fabric services (e.g. monitoring)
- Existing virtualization service
• ... to define project goals
- What issues do we need to address?
- What forward looking features do we need?
http://iopscience.iop.org/article/10.1088/1742-6596/396/4/042002/pdf
18. Agile Infrastructure Project Goals
UCC 2018
18
New data centre support
- Overcome limits of CC in Meyrin
- Disaster recovery and business continuity
- ‘Smart hands’ approach
1
19. Agile Infrastructure Project Goals
UCC 2018
19
Sustainable tool support
- Tools used at our scale need ongoing maintenance
- Tools with only a small community take newcomers longer to master, and the skills are less transferable afterwards
2
20. Agile Infrastructure Project Goals
UCC 2018
20
Improve user response time
- Reduce the resource provisioning time span
(current virtualization service reached scaling limits)
- Self-service kiosk
3
21. Agile Infrastructure Project Goals
UCC 2018
21
Enable cloud interfaces
- Experiments already started to use EC2
- Enable libraries such as Apache’s libcloud
4
22. Agile Infrastructure Project Goals
UCC 2018
22
Precise monitoring and accounting
- Enable timely monitoring for debugging
- Showback usage to the cloud users
- Consolidate accounting data for usage of CPU, network, storage … across batch, physical nodes and grid resources
5
23. Agile Infrastructure Project Goals
UCC 2018
23
Improve resource efficiency
- Adapt provisioned resources to services’ needs
- Streamline the provisioning workflows (e.g. burn-in, repair or retirement)
6
24. Our Approach: Tool Chain and DevOps
UCC 2018
24
• CERN’s requirements are no longer special!
• A set of tools emerged when looking at other places
• Small dedicated tools allowed for rapid validation & prototyping
• Adapted our processes, policies and work flows to the tools!
• Join (and contribute to) existing communities!
25. IT Policy Changes for Services
UCC 2018
25
• Services shall be virtual …
- Within reason
- Exceptions are costly!
• Puppet managed, and …
• … monitored!
- (Semi-)automatic with Puppet
Expected benefits: decrease provisioning time, increase resource efficiency, simplify infrastructure mgmt, profit from others’ work, speed up deployment, ‘automatic’ documentation, centralized monitoring, integrated alarm handling
26. UCC 2018
26
Tools + Policies:
Sounds simple!
From tools to services is complex!
- Integration w/ sec services?
- Incident handling?
- Request work flows?
- Change management?
- Accounting and charging?
- Life cycle management?
- …
Image: Subbu Allamaraju
28. Resource Provisioning: IaaS
UCC 2018
28
• Based on OpenStack
- Collection of open source projects for cloud orchestration
- Started by NASA and Rackspace in 2010
- Grown into a global software community
30. The CERN Cloud Service
UCC 2018
30
• Production since July 2013
- Several rolling upgrades since, now on Rocky
- Many sub-services deployed
• Spans two data centers
- One region, one API entry point
• Deployed using RDO + Puppet
- Mostly upstream, patched where needed
• Many sub-services run on VMs!
- Bootstrapping
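To make the self-service model concrete, here is a minimal sketch of how a user provisions and removes a VM through the standard OpenStack CLI; the image, flavor, key and server names are illustrative placeholders, not CERN's actual defaults.

    # Load OpenStack credentials (e.g. from an openrc file provided by the cloud)
    source openrc.sh

    # Create a VM from an image and a flavor (all names below are examples only)
    openstack server create --image "CC7 - x86_64" --flavor m2.medium --key-name mykey my-dev-vm

    # Check its status, and delete it when it is no longer needed
    openstack server show my-dev-vm -c status
    openstack server delete my-dev-vm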
32. Agility in the Cloud
UCC 2018
32
• Use case spectrum
- Batch service (physics analysis)
- IT services (built on each other)
- Experiment services (build)
- Engineering (chip design)
- Infrastructure (hotel, bikes)
- Personal (development)
• Hardware spectrum
- Processor archs (features, NUMA, …)
- Core-to-RAM ratio (1:2, 1:3, 1:5, …)
- Core-to-disk ratio (2x or 4x SSDs)
- Disk layout (2, 3, 4, mixed)
- Network (1/10GbE, FC, domain)
- Location (DC, power)
- SLC6, CC7, RHEL, Windows
- …
33. What about our initial goals?
UCC 2018
33
• The remote DC is seamlessly integrated
- No difference from provisioning PoV
- Easily accessible by users
- Local DC limits overcome (business continuity?)
• Sustainable tools
- Number of managed machines has multiplied
- Good collaboration with upstream communities
- Newcomers know the tools and can use the knowledge afterwards
• Provisioning time span is ~minutes
- Was several months before
- Self-service kiosk with automated workflows
• Cloud interfaces
- Good OpenStack adoption, EC2 support
• Flexible monitoring infra
- Automatic for simple cases
- Powerful tool set for more complex ones
- Accounting for local and grid resources
• Increased resource efficiency
- ‘Packing’ of services
- Overcommit
- Adapted to services’ needs
- Quick draining & back filling
So … 100% success?
34. Cloud Architecture Overview
UCC 2018
34
• Top and child cells for scaling
- API, DB, MQ, Compute nodes
- Remote DC is set of cells
• Nova HA only on top cell
- Simplicity vs impact
• Other projects global
- Load balanced controllers
- RabbitMQ clusters
• Three Ceph instances
- Volumes (Cinder), images (Glance), shares (Manila)
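Block storage in this setup is Ceph-backed through Cinder; as a minimal sketch (with placeholder names), a user would create a volume and attach it to a server like this:

    # Create a 100 GB volume (served by Ceph via Cinder) and attach it to a VM
    openstack volume create --size 100 my-data-vol
    openstack server add volume my-dev-vm my-data-vol

    # List volumes to confirm the attachment
    openstack volume list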
36. Tech. Challenge: Scaling
• OpenStack Cells provides composable units
• Cells V1 – special custom developments
• Cells V2 – now the standard deployment model
• Broadcast vs targeted queries
• Handling down cells
• Quota
• Academic and scientific instances push the limits
• Now many enterprise clouds above 1,000 hypervisors
• CERN running 73 cells in production
UCC 2018 36
https://www.openstack.org/analytics
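For a Cells V2 deployment, the cell layout can be inspected from a controller with the nova-manage CLI; a minimal sketch (output format varies by release, and the cell UUID below is a placeholder):

    # List the cells known to the Nova API database (cell0 plus the compute cells)
    nova-manage cell_v2 list_cells

    # Show the compute hosts mapped into one cell
    nova-manage cell_v2 list_hosts --cell_uuid <cell-uuid>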
37. Tech. Challenge: CPU Performance
UCC 2018
37
• The benchmark score of full-node VMs was about 20% lower than that of the underlying host
- Smaller VMs did much better
• Investigated various tuning options
- KSM*, EPT**, PAE, Pinning, … +hardware type dependencies
- Discrepancy down to ~10% between virtual and physical
• Comparison with Hyper-V: no general issue
- Loss w/o tuning ~3% (full-node), <1% for small VMs
- … NUMA-awareness!
*KSM on/off: beware of memory reclaim! **EPT on/off: beware of expensive page table walks!
38. CPU Performance: NUMA
UCC 2018
38
• NUMA-awareness identified as the most efficient setting
• “EPT off” side-effect
- Small number of hosts, but very visible there
• Use 2MB huge pages
- Keep the “EPT off” performance gain with “EPT on”
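Both settings are exposed to Nova through flavor extra specs; a minimal sketch of how such a flavor could be configured (the flavor name and values are illustrative, not CERN's exact settings):

    # Expose two NUMA nodes to the guest and back its memory with 2 MB huge pages
    openstack flavor set m2.xlarge --property hw:numa_nodes=2 --property hw:mem_page_size=2MB

    # Inspect the resulting extra specs
    openstack flavor show m2.xlarge -c properties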
39. NUMA roll-out
UCC 2018
39
• Rolled out on ~2’000 batch hypervisors (~6’000 VMs)
- HP allocation as boot parameter (requires a reboot)
- VM NUMA awareness as flavor metadata (requires delete/recreate)
• Cell-by-cell (~200 hosts):
- Queue-reshuffle to minimize resource impact
- Draining & deletion of batch VMs
- Hypervisor reconfiguration (Puppet) & reboot
- Recreation of batch VMs
• Whole update took about 8 weeks
- Organized between batch and cloud teams
- No performance issue observed since

VM       Before   After
4x 8     8%
2x 16    16%
1x 24    20%      5%
1x 32    20%      3%
41. VM Expiry
UCC 2018 41
• Each personal instance will have an expiration date
• Set shortly after creation and evaluated daily
• Configured to 180 days, renewable
• Reminder mails starting 30 days before expiration
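A minimal sketch of how such an expiry date could be tracked through instance metadata; the expire_at property name is a purely hypothetical stand-in, as the slide only states that the date is set shortly after creation and evaluated daily.

    # Record an expiry date on a personal VM as instance metadata
    # ("expire_at" is a hypothetical property name used only for this example)
    openstack server set --property expire_at=2019-06-17 my-dev-vm

    # A daily job could read the property back and compare it against today's date
    openstack server show my-dev-vm -f value -c properties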
43. Tech. Challenge: Bare Metal
UCC 2018 43
• VMs not suitable for all of our use cases
- Storage and database nodes, HPC clusters, bootstrapping, critical network equipment or specialised network setups, precise/repeatable benchmarking for s/w frameworks, …
• Complete our service offerings
- Physical nodes (in addition to VMs and containers)
- OpenStack UI as the single pane of glass
• Simplify hardware provisioning workflows
- For users: openstack server create/delete
- For procurement & h/w provisioning team: initial on-boarding, server re-assignments
• Consolidate accounting & bookkeeping
- Resource accounting input will come from fewer sources
- Machine re-assignments will be easier to track
44. Adapt the Burn-In Process
• “Burn-in” before acceptance
- Compliance with technical spec (e.g. performance)
- Find failed components (e.g. broken RAM)
- Find systematic errors (e.g. bad firmware)
- Provoke early failures through stress
• Tests include
- CPU: burnK7, burnP6, burnMMX (cooling)
- RAM: memtest; Disk: badblocks
- Network: iperf(3) between pairs of nodes, with automatic node pairing
- Benchmarking: HEPSpec06 (& fio), a derivative of SPEC06; we buy total compute capacity (not the newest processors)
UCC 2018 44
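A sketch of what one burn-in round could look like using the open-source tools named above; device names, durations and the peer host are placeholders, and the actual CERN harness differs.

    # CPU/cooling stress (cpuburn family), here limited to one hour
    timeout 3600 burnP6 &

    # Memory test (memtester shown as a stand-in for an offline memtest run): 4 GB, 3 passes
    memtester 4G 3

    # Disk surface scan (read-only; -w would be destructive)
    badblocks -sv /dev/sdb

    # Network throughput between a pair of nodes (the peer runs "iperf3 -s")
    iperf3 -c peer-node.example.org -t 60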
46. Tech. Challenge: Containers
UCC 2018 46
An OpenStack API service that allows creation of container clusters
● Use your OpenStack credentials, quota and roles
● You choose your cluster type
● Multi-tenancy
● Quickly create new clusters with advanced features such as multi-master
● Integrated monitoring and CERN storage access
● Making it easy to do the right thing
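This corresponds to the OpenStack Magnum API; a minimal sketch of the user-side workflow with the unified CLI (template, node count and cluster name are illustrative):

    # List the cluster templates offered by the service
    openstack coe cluster template list

    # Create a Kubernetes cluster from a template and fetch its credentials
    openstack coe cluster create --cluster-template kubernetes --node-count 2 my-cluster
    openstack coe cluster config my-cluster

    # The generated kubeconfig can then be used with kubectl as usual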
47. Scale Testing using Rally
• An OpenStack benchmark test tool
• Easily extended by plugins
• Test results in HTML reports
• Used by many projects
• Context: set up the environment
• Scenario: run the benchmark
• Recommended for a production service, to verify that the service behaves as expected at all times
UCC 2018 47
[Diagram: Rally drives a Kubernetes cluster (pods, containers) and produces a Rally report]
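A minimal sketch of driving such a benchmark from the Rally CLI; the task file name is a placeholder, and its contents (contexts and scenarios) are defined separately:

    # Run a benchmark task described in a YAML/JSON task file (placeholder name)
    rally task start magnum-scale-test.yaml

    # Produce an HTML report of the results
    rally task report --out rally-report.html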
48. First Attempt – 1M requests/Seq
• 200 nodes
• Found multiple limits:
- Heat orchestration scaling
- Authentication caches
- Volume deletion
- Site services
UCC 2018 48
50. Tech. Challenge: Meltdown
UCC 2018 50
• In January 2018, a security vulnerability was disclosed, requiring a new kernel everywhere
• Staged campaign
- 7 reboot days, 7 tidy-up days
- By availability zone
• Benefits
- Automation now in place to reboot the cloud if needed - 33,000 VMs on 9,000 hypervisors
- Latest QEMU and RBD user code on all VMs
• Then L1TF came along
- And we had to do it all again......
51. UCC 2018 51
[Timeline: LHC schedule from 2009 to ~2030: First run, LS1, Second run, LS2, Third run, LS3, HL-LHC Run 4]
• Significant part of the cost comes from global operations
• Even with a technology increase of ~15%/year, we still have a big gap if we keep trying to do things with our current compute models
• Raw data volume increases significantly for High Luminosity LHC
53. Non-Technical Challenges (1)
UCC 2018
53
• Agile Infrastructure Paradigm Adoption
- ‘VMs are slower than physical machines.’
- ‘I need to keep control on the full stack.’
- ‘This would not have happened with physical machines.’
- ‘It’s the cloud, so it should be able to do X!’
- ‘Using a config’ management tool is too dangerous!’
- ‘They are my machines’
54. Non-Technical Challenges (2)
UCC 2018
54
• Agility can bring great benefits …
• … but mind (adapted) Hooke’s Law!
- Avoid irreversible deformations
• Ensure the tail is moving as well as the head
- Application support
- Cultural changes
- Workflow adoption
- Open source community culture can help
55. Non-Technical Challenges (3)
• Contributor License Agreements
• Patches needed, but merge/review times can be long
• Regular staff changes limit karma
• Need to be a polyglot
- Python, Ruby, Go, … and legacy Perl etc.
• Keep riding the release wave
- Avoid end-of-life scenarios
UCC 2018 55
56. Ongoing Work Areas
• Spot Market / Pre-emptible instances
• Software Defined Networking
• Regions
• GPUs
• Containers on Bare Metal
• …
UCC 2018 56
57. Summary
UCC 2018
57
Positive results 5 years into the project!
- LHC needs met without additional staff
- Tools and workflows widely adopted and accepted
- Many technical challenges mastered, with solutions returned upstream
- Integration with open source communities successful
- Use of common tools increased CERN’s attractiveness to talent
Further enhancements in function & scale needed for HL-LHC
58. Further Information
• CERN information outside the auditorium
• Jobs at CERN – wide range of options
• http://jobs.cern
• CERN blogs
• http://openstack-in-production.blogspot.ch
• https://techblog.web.cern.ch/techblog/
• Recent Talks at OpenStack summits
• https://www.openstack.org/videos/search?search=cern
• Source code
• https://github.com/cernops and https://github.com/openstack
UCC 2018 58
61. Agile Infrastructure Core Areas
UCC 2018
61
• Resource provisioning (IaaS)
- Based on OpenStack
• Centralized Monitoring
- Based on Collectd (sensor) + ‘ELK’ stack
• Configuration Management
- Based on Puppet
62. Configuration Management
UCC 2018
62
• Client/server architecture
- ‘agents’ running on hosts plus horizontally scalable ‘masters’
• Desired state of hosts described in ‘manifests’
- Simple, declarative language
- ‘resource’ basic unit for system modeling, e.g. package or service
• ‘agent’ discovers system state using ‘facter’
- Sends current system state to masters
• Master compiles data and manifests into ‘catalog’
- Agent applies catalog on the host
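The agent/master cycle described above can be exercised directly from a node with the standard Puppet and Facter CLIs; a minimal sketch (the --noop flag makes the first run a dry run):

    # Show a couple of the facts the agent reports to the master
    facter os.family networking.fqdn

    # Request a catalog from the master and show what would change, without applying it
    puppet agent --test --noop

    # Apply the catalog for real
    puppet agent --test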
63. Status: Config’ Management (1)
UCC 2018
63
[Slide statistics omitted: managed hosts (virtual and physical, private and public cloud), the ‘base’ configuration every Puppet node gets, catalog compilations (spread out), changes (including dev changes), and the number of Puppet code committers]
65. Status: Config’ Management (3)
UCC 2018
65
• Changes to QA are announced publicly
• QA duration: 1 week
• All Service Managers can stop a change!
66. Monitoring: Scope
UCC 2018
66
Data Centre Monitoring
• Two DCs at CERN and Wigner
• Hardware, O/S, and services
• PDUs, temp sensors, …
• Metrics and logs
Experiment Dashboards
- WLCG Monitoring
- Sites availability, data transfers, job information, reports
- Used by WLCG, experiments, sites and users
67. UCC 2018
67
Status: (Unified) Monitoring (1)
• Offering: monitor, collect, aggregate, process,
visualize, alarm … for metrics and logs!
• ~400 (virtual) servers, 500GB/day, 1B docs/day
- Mon data management from CERN IT and WLCG
- Infrastructure and tools for CERN IT and WLCG
• Migrations ongoing (double maintenance)
- CERN IT: From Lemon sensor to collectd
- WLCG: From former infra, tools, and dashboards
68. Status: (Unified) Monitoring (2)
UCC 2018
68
[Architecture diagram: data sources (FTS, Rucio, XRootD, jobs, Lemon metrics, syslog, application logs, DBs, HTTP feeds) feed AMQ, DB, HTTP, log and metric gateways into Flume; a Kafka cluster buffers the data; processing covers data enrichment, data aggregation and batch processing; storage & search in HDFS, Elasticsearch and others (InfluxDB); data access via CLI, API, user views and user jobs]
Today: > 500 GB/day, 72h buffering
Editor’s notes
Reference: Fabiola’s talk @ Univ of Geneva
https://www.unige.ch/public/actualites/2017/le-boson-de-higgs-et-notre-vie/
European Organization for Nuclear Research
Founded in 1954, today 22 member states
World’s largest particle physics laboratory
~2,300 staff, 13k users on site
Budget ~1,000 MCHF
Mission
Answer fundamental questions about the universe
Advance the technology frontiers
Train the scientists of tomorrow
Bring nations together
https://communications.web.cern.ch/fr/node/84
For all this fundamental research, CERN provides different facilities to scientists, for example the LHC.
It is a ring 27 km in circumference, crossing two countries, 100 m underground; it accelerates two particle beams to near the speed of light and makes them collide at four different points, where detectors observe the fireworks.
2,500 people employed by CERN, >10k users on site
Talk about the LHC here, describe the experiments, Lake Geneva, Mont Blanc, and then jump in
The big ring is the LHC, the small one is the SPS; the computer centre is not far away.
Pushing the boundaries of technology.
CERN facilitates research: we just run the accelerators; the experiments are done by institutes, member states and universities.
On the Franco-Swiss border, very close to Geneva
Our flagship programme is the LHC
Trillions of protons race around the 27 km ring in opposite directions over 11,000 times a second, travelling at 99.9999991 per cent of the speed of light.
The largest machine on Earth
With an operating temperature of about -271 degrees Celsius, just 1.9 degrees above absolute zero, the LHC is one of the coldest places in the universe.
120 t of helium; only at that temperature is there no electrical resistance.
https://home.cern/about/engineering/vacuum-empty-interstellar-space
Inside, the beams travel in a very high vacuum, comparable to the vacuum on the Moon; there are actually two proton beams going in opposite directions, and the vacuum avoids the protons interacting with other particles.
The detectors are very advanced beasts; there are four of them. ATLAS and CMS are the best known: general-purpose detectors testing Standard Model properties, in which the Higgs particle was discovered in 2012.
In the picture you can see physicists. ALICE and LHCb
To sample and record the debris from up to 600 million proton collisions per second, scientists are building gargantuan devices that measure particles with micron precision.
A 100 Mpixel camera, 40 million pictures per second
https://www.ethz.ch/en/news-and-events/eth-news/news/2017/03/new-heart-for-cerns-cms.html