CERN Mass and Agility
OSCON, 23 July 2014
Tim Bell
@noggin143
tim.bell@cern.ch
About Tim
• Runs IT Infrastructure group at CERN
• Member of OpenStack management board
and user committee
• Previously worked at
• Deutsche Bank running European Private
Banking Infrastructure
• IBM as a consultant and kernel developer
CERN was founded in 1954 by 12 European states: “Science for Peace”
Today: 21 Member States
Member States: Austria, Belgium, Bulgaria, the Czech Republic, Denmark,
Finland, France, Germany, Greece, Hungary, Israel, Italy, the Netherlands,
Norway, Poland, Portugal, Slovakia, Spain, Sweden, Switzerland and
the United Kingdom
Candidate for Accession: Romania
Associate Members in Pre-Stage to Membership: Serbia
Applicant States for Membership or Associate Membership:
Brazil, Cyprus (awaiting ratification), Pakistan, Russia, Slovenia, Turkey, Ukraine
Observers to Council: India, Japan, Russia, Turkey, United States of America;
European Commission and UNESCO
~ 2,300 staff
~ 1,000 other paid personnel
> 11,000 users
Budget (2013) ~1,000 MCHF
What are the Origins of Mass?
Matter/Antimatter: Symmetric?
Where is 95% of the Universe?
Collisions
A Big Data Challenge
In 2014,
• ~ 100PB archive with additional 35PB/year
• ~ 11,000 servers
• ~ 75,000 disk drives
• ~ 45,000 tapes
• Data should be kept for at least 20 years
In 2015, we start the accelerator again
• Upgrade to double the energy of the beams
• Expect a significant increase in data rate
LHC data growth
• Plan to record 400PB/year by 2023
• Compute needs expected to be around 50x current levels if budget available
[Chart: data recorded per year in PB by experiment (ALICE, ATLAS, CMS, LHCb) for Run 1 (2010), Run 2 (2015), Run 3 (2018) and Run 4 (2023), with the scale running from 0 to 450 PB per year]
Tier-0 (CERN):
• Data recording
• Initial data reconstruction
• Data distribution
Tier-1 (11 centres):
• Permanent storage
• Re-processing
• Analysis
Tier-2 (~200 centres):
• Simulation
• End-user analysis
• Data is recorded at CERN and Tier-1s and analysed in the Worldwide LHC Computing Grid
• In a normal day, the grid provides 100,000 CPU days executing over 2 million jobs
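To get a feel for those grid numbers, a back-of-the-envelope sketch (the figures come from the bullet above; the script itself is purely illustrative):

```python
# Rough scale of the Worldwide LHC Computing Grid, using the figures above.
cpu_days_per_day = 100_000   # CPU days delivered on a normal day
jobs_per_day = 2_000_000     # jobs executed per day

# 100,000 CPU days delivered per calendar day is the equivalent of
# ~100,000 cores kept busy around the clock.
avg_job_cpu_hours = cpu_days_per_day * 24 / jobs_per_day
print(f"average job length: {avg_job_cpu_hours:.1f} CPU-hours")  # ~1.2
```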
The CERN Meyrin Data Centre
New Data Centre in Budapest
Good News, Bad News
• Additional data centre in Budapest now online
• Increasing use of facilities as data rates increase
But…
• Staff numbers are fixed, no more people
• Materials budget decreasing, no more money
• Legacy tools are high maintenance and brittle
• User expectations are for fast self-service
Public Procurement Cycle
Step                                  Time (Days)                        Elapsed (Days)
User expresses requirement            -                                  0
Market Survey prepared                15                                 15
Market Survey for possible vendors    30                                 45
Specifications prepared               15                                 60
Vendor responses                      30                                 90
Test systems evaluated                30                                 120
Offers adjudicated                    10                                 130
Finance committee                     30                                 160
Hardware delivered                    90                                 250
Burn in and acceptance                30 days typical, 380 worst case    280
Total                                                                    280+ days
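The elapsed column is just the running sum of the step durations; a minimal sketch that reproduces it (step names and day counts copied from the table, using the typical 30-day burn-in figure):

```python
# Cumulative elapsed time of the public procurement cycle.
steps = [
    ("Market Survey prepared", 15),
    ("Market Survey for possible vendors", 30),
    ("Specifications prepared", 15),
    ("Vendor responses", 30),
    ("Test systems evaluated", 30),
    ("Offers adjudicated", 10),
    ("Finance committee", 30),
    ("Hardware delivered", 90),
    ("Burn in and acceptance", 30),   # 30 days typical, 380 worst case
]

elapsed = 0
for name, days in steps:
    elapsed += days
    print(f"{name:<36} +{days:>3} -> {elapsed:>3} days")
# Ends at 280 days in the best case, hence "280+ days" overall.
```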
Approach
• There is no Moore’s Law for people
• Automation needs APIs, not documented procedures
• Focus on high people effort activities
• Are those requirements really justified?
• Accumulating technical debt stifles agility
• Find open source communities and contribute
• Understand ethos and architecture
• Stay mainstream
O’Reilly Consideration
Indeed.com Consideration
[Diagram: the CERN tool chain, comprising Bamboo; Koji, Mock; AIMS/PXE; Foreman; Yum repo; Pulp; PuppetDB; mcollective, yum; JIRA; Lemon / Hadoop / LogStash / Kibana; git; OpenStack Nova; a hardware database; Puppet; and Active Directory / LDAP]
Puppet Configuration
• Over 10,000 hosts in Puppet
• 160 different hostgroups
• Tool chain built on PuppetDB, Foreman and Git
• Scaling issues resolved with the communities
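In the spirit of automating through APIs rather than documented procedures, a hypothetical sketch of asking PuppetDB how many hosts it knows about (the host name is made up, and the /v3/nodes endpoint reflects the PuppetDB API generation of that era; adjust for your deployment):

```python
# Hypothetical sketch: count the nodes registered in PuppetDB via its REST API.
import requests

resp = requests.get("http://puppetdb.example.org:8080/v3/nodes")
resp.raise_for_status()
nodes = resp.json()          # one JSON object per registered node

print(f"{len(nodes)} hosts in Puppet")   # "over 10,000" in CERN's case
```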
Monitoring - Flume, Elasticsearch, Kibana
[Diagram: the OpenStack infrastructure feeds a Flume gateway, which writes to HDFS and Elasticsearch; Kibana provides the dashboards]
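At the Kibana end of that pipeline, dashboards are ultimately just queries against Elasticsearch; a minimal sketch with the official Python client (the index pattern and field name are assumptions about the log schema, not CERN's actual one):

```python
# Illustrative sketch: pull recent error-level log events out of Elasticsearch.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://elasticsearch.example.org:9200"])
result = es.search(
    index="logstash-*",   # assumed Logstash-style daily indices
    body={"query": {"match": {"severity": "error"}}, "size": 10},
)
for hit in result["hits"]["hits"]:
    print(hit["_source"])
```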
[Diagram: the CERN OpenStack deployment: Horizon, Keystone, Glance, Ceilometer and Nova (compute, scheduler, network), with Cinder block storage backed by Ceph & NetApp, integrated with Microsoft Active Directory, the account management system, CERN DB on Demand, the CERN network database and CERN accounting]
Scaling Architecture Overview
[Diagram: a load balancer in Geneva, Switzerland fronts the top-cell controllers in Geneva; child cells in Geneva, Switzerland and Budapest, Hungary each run their own controllers and compute nodes]
Status
• Multi-data centre cloud in production since July 2013 (Geneva and Budapest) with nearly 1,000 users
• Currently running OpenStack Havana
• KVM and Hyper-V deployed
• All configured automatically with Puppet
• ~70,000 cores on ~3,000 servers
• 3PB Ceph pool available for volumes, images and other physics storage
The Agile Experience
Cultural Barriers
Agility and Elasticity Limits
• Communities help to set good behaviour
• Internal demonstrations build momentum
• Finding the right speed is key
• Keeping up with releases takes focus
• Coping with legacy requires compromise
• Travel budget needs significant increase!
Next Steps: Scale with Physics
• Scaling to >100,000 cores by 2015
• Around 100 hypervisors per week with fixed staff
• Deploying and configuring latest releases
• Need to stay close … but not too close
• Legacy systems retirement
• Server consolidation
• Home grown configuration and monitoring
• Analytics of processor, disk and network
• Focus on efficiency
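A rough feel for that installation rate, assuming the cores-per-hypervisor implied by the earlier status slide (the per-server figure is a derived assumption, not a quoted number):

```python
# Back-of-the-envelope: how long does 100 hypervisors/week take to grow
# from ~70,000 cores to >100,000?
current_cores = 70_000
target_cores = 100_000
cores_per_hypervisor = 70_000 / 3_000      # ~23, derived from the status slide
hypervisors_per_week = 100

weeks = (target_cores - current_cores) / (cores_per_hypervisor * hypervisors_per_week)
print(f"~{weeks:.0f} weeks of installations")   # roughly 13 weeks
```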
Next Steps: Federated Clouds
[Diagram: the CERN Private Cloud (70K cores) federating with the ATLAS Trigger farm (28K cores), the CMS Trigger farm (12K cores), IN2P3 Lyon, Brookhaven National Labs, NecTAR Australia and public clouds such as Rackspace, with many others on their way]
Summary
• Open source tools have successfully replaced CERN’s legacy fabric management system
• Scaling to 100,000s of cores with OpenStack and Puppet is in sight
• Cultural change to an Agile approach has required time and patience but is paying off
Community collaboration needed to reach 400PB/year
Questions?
• Details at http://openstack-in-production.blogspot.fr
• Previous presentations at http://information-technology.web.cern.ch/book/cern-private-cloud-user-guide/openstack-information
• CERN code is at http://github.com/cernops
http://www.eucalyptus.com/blog/2013/04/02/cy13-q1-community-analysis-%E2%80%94-openstack-vs-opennebula-vs-eucalyptus-vs-cloudstack
Monitoring - Kibana
[Screenshots: Kibana dashboards, over two slides]
Architecture Components
Top Cell controller: rabbitmq, Flume, HDFS, Elasticsearch, Kibana, MySQL, MongoDB, Glance api, Glance registry, Keystone, Nova api, Nova consoleauth, Nova novncproxy, Nova cells, Horizon, Ceilometer api, Cinder api, Cinder volume, Cinder scheduler, plus Stacktach and Ceph
Children Cells controller: rabbitmq, Flume, Keystone, Nova api, Nova conductor, Nova scheduler, Nova network, Nova cells, Glance api, Ceilometer agent-central, Ceilometer collector
Children Cells compute node: Flume, Nova compute, Ceilometer agent-compute
Upgrade Strategy
• Surely “OpenStack can’t be upgraded”
• Our Essex, Folsom and Grizzly clouds were ‘tear-down’ migrations
• Puppet-managed VMs are typical Cattle cases – re-create
• User VMs: snapshot, download the image and upload to a new instance
• One month window to migrate
• Users of production services expect more
• Physicists accept not creating/changing VMs for a short period
• Running VMs must not be affected
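A hypothetical sketch of that snapshot-and-re-create path using the Havana-era Python and command-line clients; the endpoint, credentials and names are placeholders, not CERN's actual tooling:

```python
# Hypothetical sketch: migrate a user VM between clouds by snapshotting it.
from novaclient import client as nova_client

old_cloud = nova_client.Client("2", "user", "secret", "project",
                               "https://old-cloud.example.org:5000/v2.0")

server = old_cloud.servers.find(name="pussinboots")
image_id = old_cloud.servers.create_image(server, "pussinboots-snapshot")

# The snapshot is then downloaded from the old cloud's Glance and uploaded
# to the new cloud's, where a fresh instance is booted from it:
#   glance image-download <image_id> --file pussinboots.img     (old cloud)
#   glance image-create --name pussinboots --disk-format qcow2 \
#       --container-format bare --file pussinboots.img          (new cloud)
#   nova boot --image pussinboots --flavor m1.medium pussinboots
```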
Phased Migration
• Migrated by Component
• Choose an approach (online with load balancer, offline)
• Spin up ‘teststack’ instance with production software
• Clone production databases to test environment
• Run through upgrade process
• Validate existing functions, Puppet configuration and monitoring
• Order by complexity and need
• Ceilometer, Glance, Keystone
• Cinder, Client CLIs, Horizon
• Nova
Upgrade Experience
• No significant outage of the cloud
• During upgrade window, creation not possible
• Small incidents (see blog for details)
• Puppet can be enthusiastic! - we told it to be ☺
• Community response has been great
• Bugs fixed, and points raised at the Juno design summit
• Rolling upgrades in Icehouse will make it easier
Duplication and Divergence
[Diagram: two service models side by side. Service Silos: each service (Web, Database, Custom, Windows) runs its own Compute, Storage, Network, Hardware and Facilities stack. Functional Layers: a shared stack of Facilities, Hardware, Network, Infrastructure as a Service (Compute, Storage) and Platform as a Service, with Windows alongside]
Service Models
• Pets are given names like pussinboots.cern.ch
• They are unique, lovingly hand raised and cared for
• When they get ill, you nurse them back to health
• Cattle are given numbers like vm0042.cern.ch
• They are almost identical to other cattle
• When they get ill, you get another one
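The naming convention alone captures the distinction; a throwaway sketch of cattle-style naming (the helper function is hypothetical, mirroring the vm0042.cern.ch example above):

```python
# Cattle get numbers, not names: interchangeable hosts named by formula.
def cattle_name(n: int, domain: str = "cern.ch") -> str:
    return f"vm{n:04d}.{domain}"

print(cattle_name(42))   # vm0042.cern.ch
# A pet, by contrast, is unique: there is no formula for pussinboots.cern.ch.
```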
Speaker notes
  1. Over 1,600 magnets lowered down shafts and cooled to -271 C to become superconducting. Two beam pipes, vacuum 10 times less than the moon
  2. These collisions produce data, lots of it. Over 100PB currently 45,000 tapes… data rates of up to 35 PB/year currently and expected to significantly increase in the next run in 2015. The data must be kept at least 20 years so we’re expecting exabytes….
  3. The Worldwide LHC Computing grid is used to record and analyse this data. The grid currently runs over 2 million jobs/day, less than 10% of the work is done at CERN. There is an agreed set of protocols for running jobs, data distribution and accounting between all the sites which co-operate in order to support the physicists across the globe.
  4. Recording and analysing the data takes a lot of computing power. The CERN computer centre was built in the 1970s for mainframes and crays. Now running at 3.5MW of power, it houses 11,000 servers but is at the limit of cooling and electrical power. It is also a tourist attraction with over 80,000 visitors last year! As you can see, racks are only partially empty in view of the limits on cooling.
  5. We asked our 20 member states to make us an offer for server hosting using public procurement. 27 proposals and Wigner centre in Budapest, Hungary was chosen. This allows us to envisage sufficient computing and online storage for the run from 2015.
  6. With the new data centre in Budapast, we could now look at address the upcoming data increases but there were a number of constraints. In the current economic climate, CERN cannot be asking for additional staff to run the computer systems. At the same time, the budget for hardware is also under restrictions. The prices are coming down gradually so we can get more for the same but we need to find ways to maximise the efficicency of the hardware. Our tools for management were written in 2000s, consist of 100,000 of lines of perl over 10 years, often by students, and in need of maintenance. Changes such as IPv6 or new operating systems would require major effort just to keep up. Finally, the users are expected a more responsive central IT service… their expectations are set by the services they use at home, you don’t have to fill out a ticket to get a dropbox account so why should you need to at work ?
  7. However, CERN is a publically funded body with strict purchasing rules to make sure that the contributions from our contributing countries are also provided back to the member states, our hardware purchases should be distributed to each of the countries in ratio of their contributions., So, we have a public procurement cycle that takes 280 days in the best case… we define the specifications 6 months before we actually have the h/w available and that is in the best case. Worst case, we find issues when the servers are delivered. We’ve had cases such as swapping out 7,000 disk drives where you stop tracking by the drive but measure it by the pallet of disks. With these constraints, we needed to find an approach that allows us to be flexible for the physicists while still being compliant with the rules.
  8. We came up with a number of guiding principles. We took the approach that CERN was not special. Culturally, for a research organisation, this is a big challenge: many continue to feel that our requirements would be best met by starting again from scratch, but with the modern requirements. In the past, we had extensive written procedures for sysadmins to execute, with lots of small tools to run; these were error prone, and staff often did not read the latest version before performing an operation. We needed to find ways to scale the productivity of the team to match the additional servers, and one of the highest people costs was the tooling. We had previously been constructing requirements lists, with detailed must-have needs for acceptance. Instead, we asked ourselves how the other big centres could run using these open source tools while we had special requirements; often, the root cause was that we did not understand the best way to use the tools, rather than that we were special. The maintenance burden of our tools was high, with skilled and experienced staff spending more and more of their time on the custom code, so we took an approach of deploy rather than develop. This meant finding the open source tools that made sense for us and trying them out. Where we found something missing, we challenged it again and again. Finally, we would develop, in collaboration with the community, generalised solutions to the problems, solutions that can be maintained by the community afterwards. Long-term forking is not sustainable.
  9. So how did we choose our tools? The technical requirements are a significant factor, but there is also the need to look at the community ecosystem; open source on its own is not enough. Our fragile legacy tools were open source but lacked a community. A typical indicator is the O'Reilly book: once the O'Reilly book is out, the tool is worth a good look. Furthermore, it greatly helps to train new staff; you can buy them a copy and let them work through it to learn, rather than needing guru mentoring.
  10. CERN staff are generally on short-term contracts of 2-5 years and come from all over the member states. They arrive at CERN often straight out of university or their first jobs, and we look for potential rather than specific skills in the current tools. After their time at CERN, they leave with expert skills and experience in our tools, which is a great help in finding future job opportunities and ensures motivation to the end of their contracts.
  11. We adopted a Google-style toolchain approach. The majority of home-written software was replaced by open source projects. Commercial tools which were already working well, such as JIRA and Active Directory, were kept. The approach was to select a tool, prototype, fail early and then refine requirements (following the 'we are not special' approach). Key technologies were Puppet for configuration management and OpenStack for the private cloud (a sketch of how the two combine appears after these notes).
  12. For monitoring, we had invested significantly in a home-grown solution called Lemon. Parts of this system were preserved, such as the agent, since it was scaling well and cheap to maintain. However, we needed a better way of mining the data to understand efficiency at scale, along with avoiding the 'new project, new dashboard, new data warehouse' mentality. We have not been completely successful in aligning the architectures, but there is gradual progress. Already, correlations between the user application, the CPU usage and the network loads are yielding good results, and consolidating operations data such as alarms and capacity planning helps to identify trends (an illustrative sketch of such a correlation appears after these notes).
  13. Account management automation; CERN legacy network database; no Neutron yet.
  14. HAProxy load balancers ensure high availability, with redundant controllers for the compute nodes. Cells are used by the largest sites, such as Rackspace and NeCTAR; they are the recommended configuration above roughly 1,000 hypervisors.
  15. There are already three independent clouds, and federation is now being studied: with Rackspace inside CERN openlab, and with Helix Nebula, as discussed later.
  16. So, we assembled a team made up of experienced service managers and new students. By freezing development on legacy projects, we were able to free up resources, but only as long as we could rapidly implement new functions; many of the staff had to do their 'day' jobs as well as work on the new implementations. There were several effects: newcomers often had experience of the tools from university; people learnt very rapidly by following mailing lists, going to conferences and interacting with the community; contributions covered governance, use cases and testing in addition to standard development; and short-term staff saw major improvements in their post-CERN job prospects, since they left with very relevant skills.
  17. The agile approach is a major cultural change, and that change is an ongoing process. To illustrate it, here are some characteristics to watch out for, shown through extreme examples from Tolkien; luckily, we never had characters like this at CERN. 'Don't be hasty, let's go slowly': transformations such as this cannot be done in a reasonable time by incremental change. Move away from silos: from a top-to-bottom model, application to hardware managed by a single team, to a layered model with shared budget and resources. Knowledge management responsibilities change: the guru who wrote the tool and trains others on how to use it is replaced by an outside community in which people participate. Everything can appear to be research if you start with a blank piece of paper. The server or application manager of 'precious' applications that need special handling and care has to be understood; some cases are inevitable, but many reflect non-technical aspects of the application or server management and may justify changes of process.
  18. As we implemented the tool chain, we started to notice some interesting characteristics. Staff got heavily involved in mailing lists and IRC, helping others and learning themselves; the open source collaboration culture then started to affect how they worked with their colleagues. Ownership became more shared, and pull requests came in for enhancements rather than bug reports. Many people had good ideas, and these often competed; spinning up a VM with a new tool, demonstrating it at a public town hall meeting and debating the potential benefits was a good way to reach an initial yes/no decision (or to put it on hold to look at later). The speed of adoption varied: some of the team immediately understood the concepts and approach and became highly productive, using CI for testing, Puppet even for single servers, and cloud architectures. Others were more cautious; even though they used the new tools, their approach remained the same: release once a quarter, test manually and carefully, hand-configure. This caused tension in the teams and dissatisfaction with the tools, since trying to use these tools without changing the approach is sub-optimal. We organised boot camp training; initially, people suggested sending all newcomers to the department for it, but we often found those people already had the knowledge from their studies. The key group to train was actually the people who had been at CERN longer and had significant professional experience with other approaches. Many of those used to conventional enterprise software had difficulties with the rate of change: new releases each week with additional features, potentially changing behaviour, can be misinterpreted as instability by an enterprise sysadmin. Adoption of CI helped, but some applications are difficult to handle in these circumstances; a classic case was backup software, where the vendor did not test against every release, which required configuration freezing and snapshots. Discussions such as hostname conventions became very intense, and town hall meetings to gather the different perspectives for community decisions helped, though sometimes these discussions would drag on for weeks before converging. Part of the collaboration with communities involved face-to-face time; the travel budget was rapidly used up as we sent people to the OpenStack summits, PuppetConf and FOSDEM to keep in touch, and we found cases where the travel costs for our 'free' software exceeded the costs of the commercial products. Overall, we tried to keep the gap between the front runners and the main body of administrators to a minimum. We have not completed 100% of the transition, but the bulk of the cultural change is over.
  19. The trigger farms are the servers nearest the accelerator; they are not needed while the accelerator is shut down until 2015. Public clouds are interesting for burst load (such as in the run-up to a conference) or when prices drop, as on the spot market. Private clouds allow universities and other research labs to collaborate in processing the LHC data.
  20. Child cells have their own Keystone in view of the load from Ceilometer. This requires care to set up and test.
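Sketch referenced from note 11: a hedged illustration, in Python with the openstacksdk client, of booting a VM through the OpenStack API and handing configuration over to Puppet via cloud-init user data. The Puppet master hostname and the image and flavor IDs are hypothetical placeholders, not CERN's actual values.

import base64
import openstack

conn = openstack.connect(cloud='envvars')  # credentials taken from OS_* environment variables

# cloud-init script that installs Puppet and points the agent at a
# hypothetical master; from then on, configuration is Puppet's job
user_data = """#!/bin/bash
yum install -y puppet
puppet agent --server puppetmaster.example.org --test
"""

server = conn.compute.create_server(
    name='vm0043',
    image_id='<image-uuid>',    # placeholder
    flavor_id='<flavor-id>',    # placeholder
    user_data=base64.b64encode(user_data.encode()).decode())
conn.compute.wait_for_server(server)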
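Sketch referenced from note 12: a toy Python/pandas example of the kind of correlation described there, joining per-host CPU usage with batch job counts to flag inefficient nodes. The file and column names are invented for illustration; this is not the actual monitoring pipeline.

import pandas as pd

# hypothetical extracts from the monitoring data warehouse
cpu = pd.read_csv('cpu_usage.csv', parse_dates=['timestamp'])    # columns: host, timestamp, cpu_pct
jobs = pd.read_csv('batch_jobs.csv', parse_dates=['timestamp'])  # columns: host, timestamp, running_jobs

# align each CPU sample with the most recent job count for the same host
merged = pd.merge_asof(cpu.sort_values('timestamp'),
                       jobs.sort_values('timestamp'),
                       on='timestamp', by='host')

# hosts busy with jobs but nearly idle on CPU are candidates for investigation
suspicious = merged[(merged.running_jobs > 10) & (merged.cpu_pct < 20.0)]
print(suspicious[['host', 'timestamp', 'running_jobs', 'cpu_pct']])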