Accelerating Science
  with OpenStack

           Tim Bell
       Tim.Bell@cern.ch
         @noggin143

  OpenStack Summit San Diego
      17th October 2012
What is CERN?
• Conseil Européen pour la
  Recherche Nucléaire – aka
  European Laboratory for
  Particle Physics
• Between Geneva and the
  Jura mountains, straddling
  the Swiss-French border
• Founded in 1954 with an
  international treaty
• Our business is fundamental
  physics: what is the
  universe made of and how
  does it work?
  OpenStack Summit October 2012       Tim Bell, CERN   2
Answering fundamental questions…
• How do particles acquire mass?
   We have theories and accumulating experimental evidence.. Getting close…

• What is 96% of the universe made of ?
   We can only see 4% of its estimated mass!

• Why isn’t there anti-matter
  in the universe?
   Nature should be symmetric…

• What was the state of matter just
  after the "Big Bang"?
   Travelling back to the earliest instants of
   the universe would help…

Community collaboration on an international scale




The Large Hadron Collider




The Large Hadron Collider (LHC) tunnel




Accumulating events in 2009-2011




Heavy Ion Collisions




Tier-0 (CERN):
• Data recording
• Initial data reconstruction
• Data distribution

Tier-1 (11 centres):
• Permanent storage
• Re-processing
• Analysis

Tier-2 (~200 centres):
• Simulation
• End-user analysis

• Data is recorded at CERN and Tier-1s and analysed in the Worldwide LHC Computing Grid
• In a normal day, the grid provides 100,000 CPU days executing over 2 million jobs
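As a rough sanity check, a sketch using only the figures quoted above gives the average job size:

```python
# Back-of-envelope arithmetic from the slide's figures (illustrative only):
# 100,000 CPU-days of work delivered across 2 million jobs in a day.
cpu_days_per_day = 100_000
jobs_per_day = 2_000_000

avg_cpu_hours_per_job = cpu_days_per_day * 24 / jobs_per_day
print(f"average job consumes ~{avg_cpu_hours_per_job:.1f} CPU-hours")  # ~1.2
```

So the typical grid job is of the order of an hour of CPU time.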
• Data Centre by Numbers
    – Hardware installation & retirement
        • ~7,000 hardware movements/year; ~1,800 disk failures/year

  Racks: 828 | Servers: 11,728 | Processors: 15,694 | Cores: 64,238 | HEPSpec06: 482,507
  Disks: 64,109 | Raw disk capacity (TiB): 63,289 | Memory modules: 56,014 | Memory capacity (TiB): 158 | RAID controllers: 3,749
  Tape Drives: 160 | Tape Cartridges: 45,000 | Tape slots: 56,000 | Tape Capacity (TiB): 73,000
  High Speed Routers (640 Mbps → 2.4 Tbps): 24 | Ethernet Switches: 350 | 10 Gbps ports: 2,000 | Switching Capacity: 4.8 Tbps | 1 Gbps ports: 16,939 | 10 Gbps ports: 558
  IT Power Consumption: 2,456 kW | Total Power Consumption: 3,890 kW

  [Pie charts: breakdown of server CPU models (Xeon 3GHz, 5150, 5160, E5335, E5345, E5405, E5410, L5420, L5520) and disk vendors (Western Digital, Hitachi, Seagate, HP, Fujitsu, Maxtor)]
Our Challenges - Data storage
• >20 years retention
• 6 GB/s average
• 25 GB/s peaks
• 30 PB/year to record
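A quick illustrative calculation (decimal units assumed) relates the average rate to the annual recorded volume:

```python
# Illustrative arithmetic from the figures above (assumes decimal PB -> GB).
avg_rate_gb_s = 6
recorded_pb_per_year = 30

seconds_at_avg = recorded_pb_per_year * 1_000_000 / avg_rate_gb_s  # PB -> GB
days_at_avg = seconds_at_avg / 86_400
print(f"~{days_at_avg:.0f} days of continuous recording at the average rate")  # ~58
```

Roughly 58 days of continuous streaming at 6 GB/s per year, consistent with the accelerator not delivering beam year-round.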
45,000 tapes holding 73PB of physics data




New data centre to expand capacity
• Data centre in Geneva at the limit of electrical capacity at 3.5MW
• New centre chosen in Budapest, Hungary
• Additional 2.7MW of usable power
• Hands-off facility
• Deploying from 2013 with 200Gbit/s network to CERN
Time to change strategy
• Rationale
      – Need to manage twice as many servers as today
      – No increase in staff numbers
      – Tools becoming increasingly brittle and will not scale as-is
• Approach
      – CERN is no longer a special case for compute
      – Adopt an open source tool chain model
      – Our engineers rapidly iterate
            • Evaluate solutions in the problem domain
            • Identify functional gaps and challenge them
            • Select first choice but be prepared to change in future
      – Contribute new function back to the community

Building Blocks

[Diagram: the tool chain — Puppet, Foreman, mcollective, yum, Bamboo, AIMS/PXE, JIRA, OpenStack Nova, git, Koji, Mock, Yum repo (Pulp), Active Directory / LDAP, Hardware database, Lemon / Hadoop, Puppet-DB]
Training and Support
•   Buy the book rather than guru mentoring
•   Follow the mailing lists to learn
•   Newcomers are rapidly productive (and often know more than us)
•   Community and Enterprise support means we’re not on our own




Staff Motivation
• Skills remain valuable outside of CERN when an engineer's contract
  ends




Prepare the move to the clouds
• Improve operational efficiency
      – Machine ordering, reception and testing
      – Hardware interventions with long running programs
      – Multiple operating system demand
• Improve resource efficiency
      – Exploit idle resources, especially waiting for disk and tape I/O
      – Highly variable load such as interactive or build machines
• Enable cloud architectures
      – Gradual migration to cloud interfaces and workflows
• Improve responsiveness
      – Self-Service with coffee break response time


Public Procurement Purchase Model

Step                               | Time (days)  | Elapsed (days)
User expresses requirement         |              | 0
Market Survey prepared             | 15           | 15
Market Survey for possible vendors | 30           | 45
Specifications prepared            | 15           | 60
Vendor responses                   | 30           | 90
Test systems evaluated             | 30           | 120
Offers adjudicated                 | 10           | 130
Finance committee                  | 30           | 160
Hardware delivered                 | 90           | 250
Burn in and acceptance             | 30 (typical) | 280 (380 worst case)
Total                              |              | 280+ days
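The elapsed column is just the running total of the per-step durations; a minimal sketch of the typical case:

```python
# Cumulative elapsed time for the procurement steps above (typical durations).
steps = [
    ("Market Survey prepared", 15),
    ("Market Survey for possible vendors", 30),
    ("Specifications prepared", 15),
    ("Vendor responses", 30),
    ("Test systems evaluated", 30),
    ("Offers adjudicated", 10),
    ("Finance committee", 30),
    ("Hardware delivered", 90),
    ("Burn in and acceptance", 30),  # 30 typical; worst case pushes the total to 380
]

elapsed = 0
for name, days in steps:
    elapsed += days
print(f"typical total: {elapsed} days")  # 280
```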
Service Model
                                 • Pets are given names like
                                   pussinboots.cern.ch
                                 • They are unique, lovingly hand raised
                                   and cared for
                                 • When they get ill, you nurse them back
                                   to health

                                 • Cattle are given numbers like
                                   vm0042.cern.ch
                                 • They are almost identical to other cattle
                                 • When they get ill, you get another one



          • Future application architectures should use Cattle but Pets with
            strong configuration management are viable and still needed
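The cattle model above can be caricatured in a few lines; the naming scheme and replace-don't-repair policy are a hypothetical sketch, not CERN's actual provisioning code:

```python
# Hypothetical sketch of "cattle" provisioning: hosts get sequential,
# disposable names and a sick host is replaced rather than nursed.
import itertools

_counter = itertools.count(42)

def next_cattle_name(domain="cern.ch"):
    """Return the next disposable VM name, e.g. vm0042.cern.ch."""
    return f"vm{next(_counter):04d}.{domain}"

def handle_sick_vm(name):
    """Cattle policy: don't nurse it back to health -- get another one."""
    return next_cattle_name()

print(next_cattle_name())  # vm0042.cern.ch
```

A pet, by contrast, would keep its name and be repaired in place.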
Supporting the Pets with OpenStack
• Network
      – Interfacing with legacy site DNS and IP management
      – Ensuring Kerberos identity before VM start
• Puppet
      – Ease use of configuration management tools with our users
      – Exploit mcollective for orchestration/delegation
• External Block Storage
      – Currently using nova-volume with Gluster backing store
• Live migration to maximise availability
      – KVM live migration using Gluster
      – KVM and Hyper-V block migration

Current Status of OpenStack at CERN
• Working on an Essex code base from the EPEL repository
      – Excellent experience with the Fedora cloud-sig team
      – Cloud-init for contextualisation, oz for images with RHEL/Fedora
• Components
      – Current focus is on Nova with KVM and Hyper-V
      – Tests with Swift are ongoing but require significant experiment code
        changes
• Pre-production facility with around 150 hypervisors and 2,000 VMs,
  integrated with CERN infrastructure, deployed with Puppet, and used
  for simulation of magnet placement using LHC@Home and for batch work

When communities combine…
• OpenStack’s many components and options make
  configuration complex out of the box
• Puppet forge module from PuppetLabs does our configuration
• The Foreman adds OpenStack provisioning — from a user kiosk request
  to a configured machine in 15 minutes




Foreman to manage Puppetized VM




Active Directory Integration
• CERN’s Active Directory
     –   Unified identity management across the site
     –   44,000 users
     –   29,000 groups
     –   200 arrivals/departures per month
• Full integration with Active Directory via LDAP
     – Uses the OpenLDAP backend with some particular configuration
       settings
     – Aim for minimal changes to Active Directory
     – 7 patches submitted around hard coded values and additional filtering
• Now in use in our pre-production instance
     – Map project roles (admins, members) to groups
     – Documentation in the OpenStack wiki
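The role-to-group mapping can be sketched as a pure lookup; the DN layout and group names below are illustrative assumptions, not CERN's actual Active Directory schema:

```python
# Hypothetical sketch of mapping OpenStack project roles to LDAP groups.
# The ou/dc components and naming convention are invented for illustration.
ROLE_GROUPS = {
    "admin":  "cn=openstack-{project}-admins,ou=groups,dc=cern,dc=ch",
    "member": "cn=openstack-{project}-members,ou=groups,dc=cern,dc=ch",
}

def group_dn(project, role):
    """Resolve the LDAP group DN that carries a given role in a project."""
    return ROLE_GROUPS[role].format(project=project)

def membership_filter(user_dn, project, role):
    """Build an LDAP search filter checking a user's membership of the role group."""
    return (f"(&(objectClass=group)"
            f"(distinguishedName={group_dn(project, role)})"
            f"(member={user_dn}))")

print(group_dn("atlas-sim", "admin"))
```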
Welcome Back Hyper-V!
• We currently use Hyper-V/System Centre for our server
  consolidation activities
      – But need to scale to 100x current installation size
• Choice of hypervisors should be tactical
      – Performance
      – Compatibility/Support with integration components
      – Image migration from legacy environments
• CERN is working closely with the Hyper-V OpenStack team
      – Puppet to configure hypervisors on Windows
      – Most functions work well but further work on Console, Ceilometer, …


Opportunistic Clouds in online experiment farms
• The CERN experiments have farms of 1000s of Linux servers
  close to the detectors to filter the 1PByte/s down to 6GByte/s
  to be recorded to tape
• When the accelerator is not running, these machines are
  currently idle
      – Accelerator has regular maintenance slots of several days
      – Long Shutdown due from March 2013-November 2014
• One of the experiments is deploying OpenStack on their farm
      – Simulation (low I/O, high CPU)
      – Analysis (high I/O, high CPU, high network)


Federated European Clouds
• Two significant European projects around Federated Clouds
      – European Grid Initiative Federated Cloud as a federation of grid sites providing
        IaaS
      – HELiX Nebula European Union funded project to create a scientific cloud
        based on commercial providers
      EGI Federated Cloud Sites: CESGA, CESNET, INFN, SARA, Cyfronet,
      FZ Jülich, SZTAKI, IPHC, GRIF, GRNET, KTH, Oxford, GWDG, IGI,
      TCD, IN2P3, STFC
Federated Cloud Commonalities
• Basic building blocks
      – Each site gives an IaaS endpoint with an API
            • OCCI? CDMI ? EC2 ? Libcloud ? Jclouds ?
      – Image stores available across the sites
      – Federated identity management based on X.509 certificates
      – Consolidation of accounting information to validate pledges and usage
      – Common security policies and computing rules
• Multiple cloud technologies in use
      – OpenStack
      – OpenNebula
      – Proprietary

Next Steps
• Deploy into production at the start of 2013 with Folsom running the Grid
  software on top of OpenStack IaaS
• Support multi-site operations with 2nd data centre in Hungary
• Exploit new functionality
    – Ceilometer for metering
    – Bare metal for non-virtualised use cases such as high I/O servers
    – X.509 user certificate authentication
    – Load balancing as a service


Ramping to 15,000 hypervisors with
100,000 to 300,000 VMs by 2015
What are we missing (or haven’t found yet) ?
• Best practice documentation for
      – Monitoring and KPIs as part of core functionality
      – Guest disaster recovery solutions
      – Migration between versions of OpenStack
• Roles within multi-user projects
      – VM owner allowed to manage their own resources (start/stop/delete)
      – Project admins allowed to manage all resources
      – Other members should not have high rights over other members' VMs
• Global quota management for non-elastic private cloud
      – Manage resource prioritisation and allocation centrally
      – Capacity management / utilisation for planning
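One way to picture the missing global quota management is a single allocator that tracks per-project grants against total capacity; this is a hypothetical sketch of the idea, not an OpenStack feature:

```python
# Hypothetical sketch of central quota management for a non-elastic
# private cloud: one allocator owns the total capacity and grants
# per-project core quotas against it.
class GlobalQuota:
    def __init__(self, total_cores):
        self.total = total_cores
        self.granted = {}

    def allocate(self, project, cores):
        """Grant cores to a project, refusing over-commitment of the total."""
        if sum(self.granted.values()) + cores > self.total:
            raise ValueError("request exceeds total capacity")
        self.granted[project] = self.granted.get(project, 0) + cores
        return self.granted[project]

    def utilisation(self):
        """Fraction of total capacity currently granted, for planning."""
        return sum(self.granted.values()) / self.total

q = GlobalQuota(total_cores=100)   # capacity figure is illustrative
q.allocate("atlas", 40)
q.allocate("cms", 35)
print(f"utilisation: {q.utilisation():.0%}")  # 75%
```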

Conclusions
• Production at CERN in next few months on Folsom
      – Our emphasis will shift to focus on stability
      – Integrate CERN legacy systems via formal user exits
      – Work together with others on scaling improvements
• Community is key to shared success
      – Our problems are often resolved before we raise them
      – Packaging teams are producing reliable builds promptly
• CERN contributes and benefits
      – Thanks to everyone for their efforts and enthusiasm
      – Not just code but documentation, tests, blogs, …
References
CERN                                            http://public.web.cern.ch/public/
Scientific Linux                                http://www.scientificlinux.org/
Worldwide LHC Computing Grid                    http://lcg.web.cern.ch/lcg/
                                                http://rtm.hep.ph.ic.ac.uk/
Jobs                                            http://cern.ch/jobs
Detailed Report on Agile Infrastructure         http://cern.ch/go/N8wp
HELiX Nebula                                    http://helix-nebula.eu/
EGI Cloud Taskforce                             https://wiki.egi.eu/wiki/Fedcloud-tf




Backup Slides




CERN’s tools
• The world’s most powerful accelerator: LHC
      –   A 27 km long tunnel filled with high-tech instruments
      –   Equipped with thousands of superconducting magnets
      –   Accelerates particles to energies never before obtained
      –   Produces particle collisions creating microscopic “big bangs”
• Very large sophisticated detectors
      – Four experiments each the size of a cathedral
      – Hundred million measurement channels each
      – Data acquisition systems treating Petabytes per second
• Top level computing to distribute and analyse the data
      – A Computing Grid linking ~200 computer centres around the globe
      – Sufficient computing power and storage to handle 25 Petabytes per
        year, making them available to thousands of physicists for analysis
Our Infrastructure
• Hardware is generally based on commodity, white-box servers
      – Open tendering process based on SpecInt/CHF, CHF/Watt and GB/CHF
      – Compute nodes typically dual processor, 2GB per core
      – Bulk storage on 24x2TB disk storage-in-a-box with a RAID card
• Vast majority of servers run Scientific Linux, developed by
  Fermilab and CERN, based on Red Hat Enterprise Linux
      – Focus is on stability in view of the number of centres on the WLCG
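For the storage-in-a-box nodes above, the usable capacity depends on the RAID level; assuming (not stated on the slide) a RAID-6 layout losing two disks to parity:

```python
# Illustrative capacity arithmetic for a 24 x 2TB storage-in-a-box node.
# RAID-6 (two parity disks) is an assumption, not confirmed by the slide.
disks, disk_tb = 24, 2
parity_disks = 2  # RAID-6 assumption
usable_tb = (disks - parity_disks) * disk_tb
print(f"~{usable_tb} TB usable per box")  # 44
```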




New architecture data flows




[Chart: growth of virtualisation on SCVMM/Hyper-V from Mar-2010 to Oct-2012, rising from 0 to ~3,500 machines, split between Linux and Windows guests]
Scaling up with Puppet and OpenStack
• Use LHC@Home, based on BOINC, for simulating the magnets
  guiding particles around the LHC
• Naturally, there is a puppet module puppet-boinc
• 1000 VMs spun up to stress test the hypervisors with Puppet,
  Foreman and OpenStack





 
20150924 rda federation_v1
20150924 rda federation_v120150924 rda federation_v1
20150924 rda federation_v1Tim Bell
 
OpenStack Paris 2014 - Federation, are we there yet ?
OpenStack Paris 2014 - Federation, are we there yet ?OpenStack Paris 2014 - Federation, are we there yet ?
OpenStack Paris 2014 - Federation, are we there yet ?Tim Bell
 
20141103 cern open_stack_paris_v3
20141103 cern open_stack_paris_v320141103 cern open_stack_paris_v3
20141103 cern open_stack_paris_v3Tim Bell
 
CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014Tim Bell
 
20140509 cern open_stack_linuxtag_v3
20140509 cern open_stack_linuxtag_v320140509 cern open_stack_linuxtag_v3
20140509 cern open_stack_linuxtag_v3Tim Bell
 
Open stack operations feedback loop v1.4
Open stack operations feedback loop v1.4Open stack operations feedback loop v1.4
Open stack operations feedback loop v1.4Tim Bell
 
CERN clouds and culture at GigaOm London 2013
CERN clouds and culture at GigaOm London 2013CERN clouds and culture at GigaOm London 2013
CERN clouds and culture at GigaOm London 2013Tim Bell
 
20130529 openstack cee_day_v6
20130529 openstack cee_day_v620130529 openstack cee_day_v6
20130529 openstack cee_day_v6Tim Bell
 
Academic cloud experiences cern v4
Academic cloud experiences cern v4Academic cloud experiences cern v4
Academic cloud experiences cern v4Tim Bell
 
Ceilometer lsf-intergration-openstack-summit
Ceilometer lsf-intergration-openstack-summitCeilometer lsf-intergration-openstack-summit
Ceilometer lsf-intergration-openstack-summitTim Bell
 
Havana survey results-final-v2
Havana survey results-final-v2Havana survey results-final-v2
Havana survey results-final-v2Tim Bell
 

Plus de Tim Bell (20)

CERN IT Monitoring
CERN IT Monitoring CERN IT Monitoring
CERN IT Monitoring
 
CERN Status at OpenStack Shanghai Summit November 2019
CERN Status at OpenStack Shanghai Summit November 2019CERN Status at OpenStack Shanghai Summit November 2019
CERN Status at OpenStack Shanghai Summit November 2019
 
20190620 accelerating containers v3
20190620 accelerating containers v320190620 accelerating containers v3
20190620 accelerating containers v3
 
20190314 cern register v3
20190314 cern register v320190314 cern register v3
20190314 cern register v3
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
 
20170926 cern cloud v4
20170926 cern cloud v420170926 cern cloud v4
20170926 cern cloud v4
 
The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack Nordic
 
20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN Barcelona20161025 OpenStack at CERN Barcelona
20161025 OpenStack at CERN Barcelona
 
20150924 rda federation_v1
20150924 rda federation_v120150924 rda federation_v1
20150924 rda federation_v1
 
OpenStack Paris 2014 - Federation, are we there yet ?
OpenStack Paris 2014 - Federation, are we there yet ?OpenStack Paris 2014 - Federation, are we there yet ?
OpenStack Paris 2014 - Federation, are we there yet ?
 
20141103 cern open_stack_paris_v3
20141103 cern open_stack_paris_v320141103 cern open_stack_paris_v3
20141103 cern open_stack_paris_v3
 
CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014
 
20140509 cern open_stack_linuxtag_v3
20140509 cern open_stack_linuxtag_v320140509 cern open_stack_linuxtag_v3
20140509 cern open_stack_linuxtag_v3
 
Open stack operations feedback loop v1.4
Open stack operations feedback loop v1.4Open stack operations feedback loop v1.4
Open stack operations feedback loop v1.4
 
CERN clouds and culture at GigaOm London 2013
CERN clouds and culture at GigaOm London 2013CERN clouds and culture at GigaOm London 2013
CERN clouds and culture at GigaOm London 2013
 
20130529 openstack cee_day_v6
20130529 openstack cee_day_v620130529 openstack cee_day_v6
20130529 openstack cee_day_v6
 
Academic cloud experiences cern v4
Academic cloud experiences cern v4Academic cloud experiences cern v4
Academic cloud experiences cern v4
 
Ceilometer lsf-intergration-openstack-summit
Ceilometer lsf-intergration-openstack-summitCeilometer lsf-intergration-openstack-summit
Ceilometer lsf-intergration-openstack-summit
 
Havana survey results-final-v2
Havana survey results-final-v2Havana survey results-final-v2
Havana survey results-final-v2
 

Dernier

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Dernier (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Accelerating Science with OpenStack at CERN

  • 1. Accelerating Science with OpenStack Tim Bell Tim.Bell@cern.ch @noggin143 OpenStack Summit San Diego 17th October 2012
  • 2. What is CERN? • Conseil Européen pour la Recherche Nucléaire – aka European Laboratory for Particle Physics • Between Geneva and the Jura mountains, straddling the Swiss-French border • Founded in 1954 with an international treaty • Our business is fundamental physics: what is the universe made of, and how does it work? OpenStack Summit October 2012 Tim Bell, CERN 2
  • 3. Answering fundamental questions… • How to explain that particles have mass? We have theories and accumulating experimental evidence… Getting close… • What is 96% of the universe made of? We can only see 4% of its estimated mass! • Why isn’t there anti-matter in the universe? Nature should be symmetric… • What was the state of matter just after the « Big Bang »? Travelling back to the earliest instants of the universe would help… OpenStack Summit October 2012 Tim Bell, CERN 3
  • 4. Community collaboration on an international scale OpenStack Summit October 2012 Tim Bell, CERN 4
  • 5. The Large Hadron Collider OpenStack Summit October 2012 Tim Bell, CERN 5
  • 6. The Large Hadron Collider (LHC) tunnel OpenStack Summit October 2012 Tim Bell, CERN 6
  • 7. OpenStack Summit October 2012 Tim Bell, CERN 7
  • 8. Accumulating events in 2009-2011 OpenStack Summit October 2012 Tim Bell, CERN 8
  • 9. OpenStack Summit October 2012 Tim Bell, CERN 9
  • 10. Heavy Ion Collisions OpenStack Summit October 2012 Tim Bell, CERN 10
  • 11. OpenStack Summit October 2012 Tim Bell, CERN 11
  • 12. Tier-0 (CERN): •Data recording •Initial data reconstruction •Data distribution Tier-1 (11 centres): •Permanent storage •Re-processing •Analysis Tier-2 (~200 centres): • Simulation • End-user analysis • Data is recorded at CERN and Tier-1s and analysed in the Worldwide LHC Computing Grid • In a normal day, the grid provides 100,000 CPU days executing over 2 million jobs OpenStack Summit October 2012 Tim Bell, CERN 12
  • 13. Data Centre by Numbers – Hardware installation & retirement • ~7,000 hardware movements/year; ~1,800 disk failures/year • Racks 828 • Servers 11,728 • Processors 15,694 • Cores 64,238 • HEPSpec06 482,507 • Memory modules 56,014 • Memory capacity (TiB) 158 • RAID controllers 3,749 • Disks 64,109 • Raw disk capacity (TiB) 63,289 • Tape Drives 160 • Tape Cartridges 45,000 • Tape slots 56,000 • Tape Capacity (TiB) 73,000 • High Speed Routers (640 Mbps → 2.4 Tbps) 24 • Ethernet Switches 350 • 10 Gbps ports 2,000 • Switching Capacity 4.8 Tbps • 1 Gbps ports 16,939 • 10 Gbps ports 558 • IT Power Consumption 2,456 KW • Total Power Consumption 3,890 KW • [pie charts: processor model mix (Xeon L5520, E5410, E5345, E5335, E5405, 5150, 5160, 3GHz, other) and disk vendor mix (Western Digital, Hitachi, Seagate, Fujitsu, HP, Maxtor)] OpenStack Summit October 2012 Tim Bell, CERN 13
  • 14. OpenStack Summit October 2012 Tim Bell, CERN 14
  • 15. Our Challenges - Data storage • >20 years retention • 6GB/s average • 25GB/s peaks • 30PB/year to record OpenStack Summit October 2012 Tim Bell, CERN 15
  • 16. 45,000 tapes holding 73PB of physics data OpenStack Summit October 2012 Tim Bell, CERN 16
  • 17. New data centre to expand capacity • Data centre in Geneva at the limit of electrical capacity at 3.5MW • New centre chosen in Budapest, Hungary • Additional 2.7MW of usable power • Hands off facility • Deploying from 2013 with 200Gbit/s OpenStack Summit October 2012 Tim Bell, CERN network to CERN 17
  • 18. Time to change strategy • Rationale – Need to manage twice as many servers as today – No increase in staff numbers – Tools becoming increasingly brittle and will not scale as-is • Approach – CERN is no longer a special case for compute – Adopt an open source tool chain model – Our engineers rapidly iterate • Evaluate solutions in the problem domain • Identify functional gaps and challenge them • Select first choice but be prepared to change in future – Contribute new function back to the community OpenStack Summit October 2012 Tim Bell, CERN 18
  • 19. Building Blocks mcollective, yum Bamboo Puppet AIMS/PXE Foreman JIRA OpenStack Nova git Koji, Mock Yum repo Active Directory / Pulp LDAP Lemon / Hardware Hadoop database Puppet-DB OpenStack Summit October 2012 Tim Bell, CERN 19
  • 20. Training and Support • Buy the book rather than guru mentoring • Follow the mailing lists to learn • Newcomers are rapidly productive (and often know more than us) • Community and Enterprise support means we’re not on our own OpenStack Summit October 2012 Tim Bell, CERN 20
  • 21. Staff Motivation • Skills valuable outside of CERN when an engineer’s contracts end OpenStack Summit October 2012 Tim Bell, CERN 21
  • 22. Prepare the move to the clouds • Improve operational efficiency – Machine ordering, reception and testing – Hardware interventions with long running programs – Multiple operating system demand • Improve resource efficiency – Exploit idle resources, especially waiting for disk and tape I/O – Highly variable load such as interactive or build machines • Enable cloud architectures – Gradual migration to cloud interfaces and workflows • Improve responsiveness – Self-Service with coffee break response time OpenStack Summit October 2012 Tim Bell, CERN 22
  • 23. Public Procurement Purchase Model • Step / Time (Days) / Elapsed (Days) • User expresses requirement – / 0 • Market Survey prepared 15 / 15 • Market Survey for possible vendors 30 / 45 • Specifications prepared 15 / 60 • Vendor responses 30 / 90 • Test systems evaluated 30 / 120 • Offers adjudicated 10 / 130 • Finance committee 30 / 160 • Hardware delivered 90 / 250 • Burn in and acceptance 30 typical / 280 (380 worst case) • Total 280+ Days OpenStack Summit October 2012 Tim Bell, CERN 23
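The elapsed-time column on the procurement slide is just the running sum of the per-step durations; a small sketch reproducing it (step names and durations taken from the slide, with the 30-day typical burn-in):

```python
# Public-procurement steps and their durations in days, as listed on the slide.
steps = [
    ("Market Survey prepared", 15),
    ("Market Survey for possible vendors", 30),
    ("Specifications prepared", 15),
    ("Vendor responses", 30),
    ("Test systems evaluated", 30),
    ("Offers adjudicated", 10),
    ("Finance committee", 30),
    ("Hardware delivered", 90),
    ("Burn in and acceptance", 30),  # typical; 130 days in the worst case
]

elapsed = 0
for name, days in steps:
    elapsed += days
    print(f"{name:36s} {days:4d} {elapsed:4d}")

print(f"Total: {elapsed}+ days")  # 280+ days, matching the slide
```

Swapping the final step's 30 days for the worst-case 130 days yields the slide's 380-day figure.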
  • 24. Service Model • Pets are given names like pussinboots.cern.ch • They are unique, lovingly hand raised and cared for • When they get ill, you nurse them back to health • Cattle are given numbers like vm0042.cern.ch • They are almost identical to other cattle • When they get ill, you get another one • Future application architectures should use Cattle but Pets with strong configuration management are viable and still needed OpenStack Summit October 2012 Tim Bell, CERN 24
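The cattle naming convention on the slide can be sketched in a few lines; the helper names here are hypothetical illustrations, not CERN's actual tooling:

```python
# Hypothetical sketch of the "cattle" naming convention: instances get
# sequential numbers rather than individual pet names, so any one of
# them is disposable and replaceable.
def cattle_hostname(index: int, domain: str = "cern.ch") -> str:
    return f"vm{index:04d}.{domain}"

def replacement_for(sick_hostname: str, next_index: int) -> str:
    # When a cattle VM gets "ill", you don't nurse it back to health --
    # you get another one.
    return cattle_hostname(next_index)

print(cattle_hostname(42))                    # vm0042.cern.ch, as on the slide
print(replacement_for("vm0042.cern.ch", 43))  # vm0043.cern.ch
```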
  • 25. Supporting the Pets with OpenStack • Network – Interfacing with legacy site DNS and IP management – Ensuring Kerberos identity before VM start • Puppet – Ease use of configuration management tools with our users – Exploit mcollective for orchestration/delegation • External Block Storage – Currently using nova-volume with Gluster backing store • Live migration to maximise availability – KVM live migration using Gluster – KVM and Hyper-V block migration OpenStack Summit October 2012 Tim Bell, CERN 25
  • 26. Current Status of OpenStack at CERN • Working on an Essex code base from the EPEL repository – Excellent experience with the Fedora cloud-sig team – Cloud-init for contextualisation, oz for images with RHEL/Fedora • Components – Current focus is on Nova with KVM and Hyper-V – Tests with Swift are ongoing but require significant experiment code changes • Pre-production facility with around 150 Hypervisors, with 2000 VMs integrated with CERN infrastructure, Puppet deployed and used for simulation of magnet placement using LHC@Home and batch OpenStack Summit October 2012 Tim Bell, CERN 26
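The slide mentions cloud-init for contextualisation; a minimal `#cloud-config` sketch of the kind of user data involved (the hostname and package list are illustrative assumptions, not CERN's actual configuration):

```yaml
#cloud-config
# Illustrative user data only -- not CERN's production contextualisation.
hostname: vm0042
packages:
  - puppet
runcmd:
  - [puppet, agent, --onetime, --no-daemonize]
```

On first boot, cloud-init applies this user data: it sets the hostname, installs the listed packages, and runs the commands, which is how a freshly started VM can hand itself over to configuration management.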
  • 27. OpenStack Summit October 2012 Tim Bell, CERN 27
  • 28. When communities combine… • OpenStack’s many components and options make configuration complex out of the box • Puppet forge module from PuppetLabs does our configuration • The Foreman adds OpenStack provisioning for user kiosk to a configured machine in 15 minutes OpenStack Summit October 2012 Tim Bell, CERN 28
  • 29. Foreman to manage Puppetized VM OpenStack Summit October 2012 Tim Bell, CERN 29
  • 30. Active Directory Integration • CERN’s Active Directory – Unified identity management across the site – 44,000 users – 29,000 groups – 200 arrivals/departures per month • Full integration with Active Directory via LDAP – Uses the OpenLDAP backend with some particular configuration settings – Aim for minimal changes to Active Directory – 7 patches submitted around hard coded values and additional filtering • Now in use in our pre-production instance – Map project roles (admins, members) to groups – Documentation in the OpenStack wiki OpenStack Summit October 2012 Tim Bell, CERN 30
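The role-to-group mapping described on the slide can be sketched as a simple lookup; the group DNs below are invented for illustration (the talk does not show CERN's actual mapping):

```python
# Hypothetical sketch of mapping OpenStack project roles (admins,
# members) to directory groups, as described on the slide.
role_to_group = {
    ("myproject", "admin"):  "cn=myproject-admins,ou=groups,dc=cern,dc=ch",
    ("myproject", "member"): "cn=myproject-members,ou=groups,dc=cern,dc=ch",
}

def group_for(project: str, role: str) -> str:
    """Return the directory group holding users with `role` on `project`."""
    return role_to_group[(project, role)]

print(group_for("myproject", "admin"))
```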
  • 31. Welcome Back Hyper-V! • We currently use Hyper-V/System Centre for our server consolidation activities – But need to scale to 100x current installation size • Choice of hypervisors should be tactical – Performance – Compatibility/Support with integration components – Image migration from legacy environments • CERN is working closely with the Hyper-V OpenStack team – Puppet to configure hypervisors on Windows – Most functions work well but further work on Console, Ceilometer, … OpenStack Summit October 2012 Tim Bell, CERN 31
  • 32. Opportunistic Clouds in online experiment farms • The CERN experiments have farms of 1000s of Linux servers close to the detectors to filter the 1PByte/s down to 6GByte/s to be recorded to tape • When the accelerator is not running, these machines are currently idle – Accelerator has regular maintenance slots of several days – Long Shutdown due from March 2013-November 2014 • One of the experiments is deploying OpenStack on their farm – Simulation (low I/O, high CPU) – Analysis (high I/O, high CPU, high network) OpenStack Summit October 2012 Tim Bell, CERN 32
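The two rates quoted on the slide imply the scale of the online filtering; a quick check of the reduction factor:

```python
# Reduction implied by the slide's numbers: the online farms filter
# ~1 PB/s of detector output down to ~6 GB/s recorded to tape.
detector_rate = 1e15  # bytes/s (1 PByte/s)
recorded_rate = 6e9   # bytes/s (6 GByte/s)

reduction = detector_rate / recorded_rate
print(f"roughly a {reduction:,.0f}-fold reduction before tape")
```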
  • 33. Federated European Clouds • Two significant European projects around Federated Clouds – European Grid Initiative Federated Cloud as a federation of grid sites providing IaaS – HELiX Nebula European Union funded project to create a scientific cloud based on commercial providers EGI Federated Cloud Sites CESGA CESNET INFN SARA Cyfronet FZ Jülich SZTAKI IPHC GRIF GRNET KTH Oxford GWDG IGI TCD IN2P3 STFC OpenStack Summit October 2012 Tim Bell, CERN 33
  • 34. Federated Cloud Commonalities • Basic building blocks – Each site gives an IaaS endpoint with an API • OCCI? CDMI ? EC2 ? Libcloud ? Jclouds ? – Image stores available across the sites – Federated identity management based on X.509 certificates – Consolidation of accounting information to validate pledges and usage – Common security policies and computing rules • Multiple cloud technologies in use – OpenStack – OpenNebula – Proprietary OpenStack Summit October 2012 Tim Bell, CERN 34
  • 35. Next Steps • Deploy into production at the start of 2013 with Folsom running the Grid software on top of OpenStack IaaS • Support multi-site operations with 2nd data centre in Hungary • Exploit new functionality – Ceilometer for metering – Bare metal for non-virtualised use cases such as high I/O servers – X.509 user certificate authentication – Load balancing as a service Ramping to 15,000 hypervisors with 100,000 to 300,000 VMs by 2015 OpenStack Summit October 2012 Tim Bell, CERN 35
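The 2015 target on the slide implies a modest VM density per hypervisor; a quick back-of-the-envelope check:

```python
# VM density implied by the 2015 target: 15,000 hypervisors hosting
# 100,000 to 300,000 VMs.
hypervisors = 15_000
vms_low, vms_high = 100_000, 300_000

density_low = vms_low / hypervisors    # about 7 VMs per hypervisor
density_high = vms_high / hypervisors  # 20 VMs per hypervisor
print(f"{density_low:.1f} to {density_high:.0f} VMs per hypervisor")
```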
  • 36. What are we missing (or haven’t found yet) ? • Best practice documentation for – Monitoring and KPIs as part of core functionality – Guest disaster recovery solutions – Migration between versions of OpenStack • Roles within multi-user projects – VM owner allowed to manage their own resources (start/stop/delete) – Project admins allowed to manage all resources – Other members should not have high rights over other members VMs • Global quota management for non-elastic private cloud – Manage resource prioritisation and allocation centrally – Capacity management / utilisation for planning OpenStack Summit October 2012 Tim Bell, CERN 36
  • 37. Conclusions • Production at CERN in next few months on Folsom – Our emphasis will shift to focus on stability – Integrate CERN legacy integrations via formal user exits – Work together with others on scaling improvements • Community is key to shared success – Our problems are often resolved before we raise them – Packaging teams are producing reliable builds promptly • CERN contributes and benefits – Thanks to everyone for their efforts and enthusiasm – Not just code but documentation, tests, blogs, … OpenStack Summit October 2012 Tim Bell, CERN 37
  • 38.
  • 39. References CERN http://public.web.cern.ch/public/ Scientific Linux http://www.scientificlinux.org/ Worldwide LHC Computing Grid http://lcg.web.cern.ch/lcg/ http://rtm.hep.ph.ic.ac.uk/ Jobs http://cern.ch/jobs Detailed Report on Agile Infrastructure http://cern.ch/go/N8wp HELiX Nebula http://helix-nebula.eu/ EGI Cloud Taskforce https://wiki.egi.eu/wiki/Fedcloud-tf OpenStack Summit October 2012 Tim Bell, CERN 39
  • 40. Backup Slides OpenStack Summit October 2012 Tim Bell, CERN 40
  • 41. OpenStack Summit October 2012 Tim Bell, CERN 41
  • 42. CERN’s tools • The world’s most powerful accelerator: LHC – A 27 km long tunnel filled with high-tech instruments – Equipped with thousands of superconducting magnets – Accelerates particles to energies never before obtained – Produces particle collisions creating microscopic “big bangs” • Very large sophisticated detectors – Four experiments each the size of a cathedral – Hundred million measurement channels each – Data acquisition systems treating Petabytes per second • Top level computing to distribute and analyse the data – A Computing Grid linking ~200 computer centres around the globe – Sufficient computing power and storage to handle 25 Petabytes per year, making them available to thousands of physicists for analysis OpenStack Summit October 2012 Tim Bell, CERN 42
  • 43. Our Infrastructure • Hardware is generally based on commodity, white-box servers – Open tendering process based on SpecInt/CHF, CHF/Watt and GB/CHF – Compute nodes typically dual processor, 2GB per core – Bulk storage on 24x2TB disk storage-in-a-box with a RAID card • Vast majority of servers run Scientific Linux, developed by Fermilab and CERN, based on Redhat Enterprise – Focus is on stability in view of the number of centres on the WLCG OpenStack Summit October 2012 Tim Bell, CERN 43
  • 44. New architecture data flows OpenStack Summit October 2012 Tim Bell, CERN 44
  • 45. [chart: number of virtual machines on SCVMM/Hyper-V, Mar-10 through Oct-12, split by Linux and Windows, growing from 0 to ~3,500] OpenStack Summit October 2012 Tim Bell, CERN 45
  • 46. Scaling up with Puppet and OpenStack • Use LHC@Home based on BOINC for simulating magnetics guiding particles around the LHC • Naturally, there is a puppet module puppet-boinc • 1000 VMs spun up to stress test the hypervisors with Puppet, Foreman and OpenStack OpenStack Summit October 2012 Tim Bell, CERN 46

Editor's notes

  1. Established by an international treaty after the Second World War as a place where scientists could work together on fundamental research. Nuclear is part of the name, but our field is particle physics.
  2. Our current understanding of the universe is incomplete. A theory, called the Standard Model, proposes particles and forces, many of which have been experimentally observed. However, there are open questions: Why do some particles have mass and others not? The Higgs boson is a theory, but we need experimental evidence. Our theory of forces does not explain how gravity works. Cosmologists can only find 4% of the matter in the universe; we have lost the other 96%. We should have 50% matter and 50% anti-matter, so why is there an asymmetry (although it is a good thing that there is, since the two annihilate each other)? When we go back through time 13 billion years towards the Big Bang, we move back through planets, stars, atoms and protons/electrons towards a soup-like quark–gluon plasma. What were the properties of this?
3. The biggest international scientific collaboration in the world: over 10,000 scientists from 100 countries. The annual budget is around 1.1 billion USD. Funding for CERN, the laboratory itself, comes from the 20 member states in ratio to their gross domestic product; other countries contribute to the experiments, including a substantial US contribution towards the LHC experiments.
4. The LHC is CERN's largest accelerator: a 17-mile ring 100 metres underground where two beams of particles are sent in opposite directions and collided at the four experiments: ATLAS, CMS, LHCb and ALICE. Lake Geneva and the airport are visible at the top to give a sense of scale.
5. The ring consists of two beam pipes, with a vacuum pressure 10 times lower than on the Moon, which contain the beams of protons accelerated to just below the speed of light. These go round 11,000 times per second, bent by superconducting magnets cooled to 2K (-450F) by liquid helium, colder than outer space. The beams themselves have a total energy similar to a high-speed train, so care needs to be taken to make sure they turn the corners correctly and don't bump into the walls of the pipe.
6. At 4 points around the ring, the beams are made to cross at points where detectors the size of cathedrals, weighing up to 12,500 tonnes, surround the pipe. These are like digital cameras, but they take 100-megapixel photos 40 million times a second, producing up to 1 petabyte/s.
7. Collisions can be visualised by the tracks left in the various parts of the detectors. With many collisions, the statistics allow particle identification, such as mass and charge. This is a simple one…
8. To improve the statistics, we send round beams of multiple bunches; as they cross, there are multiple collisions as the 100 billion protons per bunch pass through each other. Software close to the detector, and later offline in the computer centre, then has to examine the tracks to understand the particles involved.
9. To get quark-gluon plasma, the material closest to the Big Bang, we also collide lead ions, which is much more intensive… the temperatures reach 100,000 times those in the sun.
10. We cannot record 1PB/s, so there are hardware filters to remove uninteresting collisions, such as those whose physics we understand already. The data is then sent to the CERN computer centre for recording via 10Gbit optical connections.
11. The Worldwide LHC Computing Grid is used to record and analyse this data. The grid currently runs over 2 million jobs/day; less than 10% of the work is done at CERN. There is an agreed set of protocols for running jobs, data distribution and accounting between all the sites, which co-operate in order to support physicists across the globe.
12. So, to the Tier-0 computer centre at CERN… we are unusual in that we are public about our environment, as there is no competitive advantage for us. We have thousands of visitors a year coming for tours and education, and the computer centre is a popular visit. The data centre has around 2.9MW of usable power looking after 12,000 servers; in comparison, the accelerator uses 120MW, like a small town. With 64,000 disks, we have around 1,800 failing each year… this is much higher than the manufacturers' MTBFs, which is consistent with results from Google. Servers have mainly Intel processors, some AMD, with dual-core Xeon being the most common configuration.
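The failure figures in this note make for a quick sanity check. A minimal sketch using only the numbers quoted above (64,000 disks, ~1,800 failures per year):

```python
# Back-of-envelope annualised failure rate (AFR) from the talk's figures.
disks = 64_000
failures_per_year = 1_800

afr = failures_per_year / disks          # fraction of the fleet failing per year
per_week = failures_per_year / 52        # how often operators swap drives

print(f"AFR = {afr:.1%}")                # 2.8%
print(f"about {per_week:.0f} disk replacements per week")
```

An AFR near 3% is indeed well above typical datasheet MTBF claims (which would imply well under 1%), matching the note's point and Google's published field data.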
13. Upstairs in the computer centre: a high roof was the fashion in the 1980s for mainframes, but is now very difficult to cool efficiently.
14. Our data storage system has to record and preserve 30PB/year with an expected lifetime of 20 years. Keeping the old data is required to get the maximum statistics for discoveries; at times, physicists will want to skim this data looking for new physics. Data rates are around 6GB/s on average, with peaks of 25GB/s.
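The recording figures above turn into large totals quickly. A small back-of-envelope sketch, using decimal units and only the numbers quoted in the note:

```python
# Rough archive arithmetic from the talk's figures: 30 PB recorded per
# year, preserved over a 20-year experiment lifetime (decimal units).
PB = 10**15
yearly = 30 * PB
lifetime_bytes = yearly * 20
print(lifetime_bytes // PB, "PB over 20 years")          # 600 PB

# Sustained write rate needed just to keep up with recording alone.
seconds_per_year = 365 * 24 * 3600
sustained = yearly / seconds_per_year / 10**9            # in GB/s
print(round(sustained, 2), "GB/s sustained for recording")
```

The recording-only rate comes out under 1 GB/s; the 6GB/s average quoted in the note covers the whole storage system, including reads, re-processing and redistribution, not just new data being written.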
15. Tape robots from IBM and Oracle. Around 60,000 tape mounts/week, so the robots are kept busy. Data is copied to new media every two years to keep up with the latest densities.
16. Asked member states for offers. 200Gbit/s links connecting the centres. Expect to double computing capacity compared to today by 2015.
17. Double the capacity with the same manpower. We needed to rethink how to solve the problem and look at how others approach it. We had our own tools in 2002, and as they became more sophisticated, it was not possible to take advantage of developments elsewhere without a major break. Staff are doing this while doing their 'day' jobs, so it reinforces the approach of taking what we can from the community.
18. The model is based on the Google toolchain; Puppet is key for many operations. We've only had to write one significant new custom CERN software component, which is in the certificate authority. Other parts, such as Lemon for monitoring, come from our previous implementation, as we did not want to change everything at once, and they scale.
  19. We’ve been very pleased with our choices. Along with the obvious benefits of the functionality, there are soft benefits from the community model.
20. Many staff at CERN are on short-term contracts… it is a good benefit for those staff to leave with skills that are in demand.
21. Standardise hardware… buy in bulk, pile it up, then work out what to use it for. Hardware interventions are for memory, motherboards, cables or disks. Users waiting for I/O means wasted cycles: build machines are busy at night but unused during the day, while interactive machines are used mainly during the day. Move to cloud APIs… we need to support them but also maintain our existing applications. Details later on reception and testing.
  22. Puppet applies well to the cattle model but we’re also using it to handle the pet cases that can’t yet move over due to software limitations. So, they get cloud provisioning but flexible configuration management.
23. Communities are integrating… when a new option is used at CERN in OpenStack, we contribute the changes back to the Puppet Forge, such as certificate handling. We are even looking at Hyper-V/Windows OpenStack configuration…
24. CERN is more than just the LHC: CNGS sends neutrinos to Gran Sasso; CLOUD demonstrates the impact of cosmic rays on weather patterns; anti-hydrogen atoms have been contained for minutes in a magnetic vessel. However, for those of you who have read Dan Brown's Angels and Demons or seen the film, there are no maniacal monks with pounds of anti-matter running around the campus.
25. We purchase on an annual cycle, replacing around ¼ of the servers. Purchasing is based on performance metrics such as cost per SpecInt or cost/GB. Generally, we are seeing dual-processor compute servers with Intel or AMD processors, and bulk storage servers with 24 or 36 2TB disks. The operating system is a Red Hat-based Linux distribution called Scientific Linux; we share its development and maintenance with Fermilab in Chicago. The choice of a Red Hat-based distribution comes from the need for stability across the grid, where the 200 centres must keep running compatible Linux distributions.
26. LHC@Home is not an instruction on how to build your own accelerator, but a magnet simulation tool that tests multiple passes around the ring. We wanted to use it as a stress-test tool, and within half a day it was running on 1000 VMs.