Accelerating Science with OpenStack

Tim Bell
Tim.Bell@cern.ch
@noggin143

OpenStack Summit San Diego
17th October 2012

What is CERN?
• Conseil Européen pour la Recherche Nucléaire – aka the European Laboratory for Particle Physics
• Situated between Geneva and the Jura mountains, straddling the Swiss-French border
• Founded in 1954 by an international treaty
• Our business is fundamental physics: what is the universe made of and how does it work?

Answering fundamental questions…
• How can we explain that particles have mass?
  We have theories and accumulating experimental evidence… getting close…
• What is 96% of the universe made of?
  We can only see 4% of its estimated mass!
• Why isn’t there anti-matter in the universe?
  Nature should be symmetric…
• What was the state of matter just after the “Big Bang”?
  Travelling back to the earliest instants of the universe would help…

Community collaboration on an international scale

The Large Hadron Collider

The Large Hadron Collider (LHC) tunnel

Accumulating events in 2009-2011

Heavy Ion Collisions

Tier-0 (CERN):
• Data recording
• Initial data reconstruction
• Data distribution

Tier-1 (11 centres):
• Permanent storage
• Re-processing
• Analysis

Tier-2 (~200 centres):
• Simulation
• End-user analysis

• Data is recorded at CERN and the Tier-1s and analysed in the Worldwide LHC Computing Grid
• In a normal day, the grid provides 100,000 CPU days executing over 2 million jobs

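A back-of-envelope check of those grid figures, as a minimal Python sketch (nothing CERN-specific is assumed):

    # 100,000 CPU-days delivered across 2 million jobs in a normal day
    cpu_days = 100000
    jobs = 2000000
    print(cpu_days * 24.0 / jobs, "CPU-hours per job on average")  # ~1.2

So a typical grid job consumes a little over one CPU-hour.
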
Data Centre by Numbers
• Hardware installation & retirement
  • ~7,000 hardware movements/year; ~1,800 disk failures/year

Racks                          828
Servers                     11,728
Processors                  15,694
Cores                       64,238
HEPSpec06                  482,507

Disks                       64,109
Raw disk capacity (TiB)     63,289
Memory modules              56,014
Memory capacity (TiB)          158
RAID controllers             3,749

Tape drives                    160
Tape cartridges             45,000
Tape slots                  56,000
Tape capacity (TiB)         73,000

High speed routers (640 Mbps → 2.4 Tbps)        24
Ethernet switches                              350
10 Gbps ports                                2,000
Switching capacity                        4.8 Tbps
1 Gbps ports                                16,939
10 Gbps ports                                  558

IT power consumption                      2,456 kW
Total power consumption                   3,890 kW

[Pie charts: processor mix – Xeon L5520 33%, E5410 16%, E5345 14%, 5160 10%, L5420 8%, E5335 7%, E5405 6%, 3GHz 4%, 5150 2%; disk vendor mix – Western Digital 59%, Hitachi 23%, Seagate 15%, Fujitsu 3%, HP/Maxtor/other ~0%]

Our Challenges - Data storage
• >20 years retention
• 6GB/s average
• 25GB/s peaks
• 30PB/year to record

45,000 tapes holding 73PB of physics data

New data centre to expand capacity
• Data centre in Geneva at the limit of electrical capacity at 3.5MW
• New centre chosen in Budapest, Hungary
• Additional 2.7MW of usable power
• Hands-off facility
• Deploying from 2013 with a 200Gbit/s network to CERN

Time to change strategy
• Rationale
  – Need to manage twice as many servers as today
  – No increase in staff numbers
  – Tools becoming increasingly brittle and will not scale as-is
• Approach
  – CERN is no longer a special case for compute
  – Adopt an open source tool chain model
  – Our engineers rapidly iterate
    • Evaluate solutions in the problem domain
    • Identify functional gaps and challenge them
    • Select a first choice but be prepared to change in future
  – Contribute new functionality back to the community

Building Blocks
[Diagram: the tool chain links OpenStack Nova, Puppet and The Foreman with mcollective, yum, AIMS/PXE, Bamboo, JIRA, git, Koji, Mock, a Pulp yum repository, Active Directory / LDAP, the hardware database, Puppet-DB and Lemon / Hadoop monitoring]

Training and Support
• Buy the book rather than guru mentoring
• Follow the mailing lists to learn
• Newcomers are rapidly productive (and often know more than us)
• Community and Enterprise support means we’re not on our own

Staff Motivation
• Skills remain valuable outside of CERN when an engineer’s contract ends

Prepare the move to the clouds
• Improve operational efficiency
  – Machine ordering, reception and testing
  – Hardware interventions with long-running programs
  – Multiple operating system demand
• Improve resource efficiency
  – Exploit idle resources, especially those waiting for disk and tape I/O
  – Highly variable load such as interactive or build machines
• Enable cloud architectures
  – Gradual migration to cloud interfaces and workflows
• Improve responsiveness
  – Self-service with coffee-break response time

Public Procurement Purchase Model

Step                                  Time (days)        Elapsed (days)
User expresses requirement                                           0
Market survey prepared                         15                   15
Market survey for possible vendors             30                   45
Specifications prepared                        15                   60
Vendor responses                               30                   90
Test systems evaluated                         30                  120
Offers adjudicated                             10                  130
Finance committee                              30                  160
Hardware delivered                             90                  250
Burn in and acceptance               30 days typical,              280
                                      380 worst case
Total                                                        280+ days

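The elapsed column is just the running sum of the step times; a small illustrative sketch to verify it:

    steps = [("Market survey prepared", 15),
             ("Market survey for possible vendors", 30),
             ("Specifications prepared", 15),
             ("Vendor responses", 30),
             ("Test systems evaluated", 30),
             ("Offers adjudicated", 10),
             ("Finance committee", 30),
             ("Hardware delivered", 90),
             ("Burn in and acceptance", 30)]
    elapsed = 0
    for name, days in steps:
        elapsed += days
        print("%-36s %4d days elapsed" % (name, elapsed))
    # the final line prints 280 days, matching the "280+ days" total
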
Service Model
• Pets are given names like pussinboots.cern.ch
• They are unique, lovingly hand-raised and cared for
• When they get ill, you nurse them back to health

• Cattle are given numbers like vm0042.cern.ch
• They are almost identical to other cattle
• When they get ill, you get another one

• Future application architectures should use cattle, but pets with strong configuration management are viable and still needed

Supporting the Pets with OpenStack
• Network
  – Interfacing with legacy site DNS and IP management
  – Ensuring Kerberos identity before VM start
• Puppet
  – Ease the use of configuration management tools for our users
  – Exploit mcollective for orchestration/delegation
• External block storage
  – Currently using nova-volume with a Gluster backing store
• Live migration to maximise availability
  – KVM live migration using Gluster
  – KVM and Hyper-V block migration

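As an aside, a minimal sketch of attaching one of these nova-volume block devices to a pet through the Essex-era python-novaclient v1_1 API; the credentials, endpoint and names below are placeholders, not CERN's actual setup:

    from novaclient.v1_1 import client

    nova = client.Client("username", "password", "tenant",
                         "http://keystone.example.org:5000/v2.0/")

    # create a 100 GB volume on the Gluster-backed store
    vol = nova.volumes.create(size=100, display_name="pussinboots-data")

    # attach it to the pet as /dev/vdb
    server = nova.servers.find(name="pussinboots")
    nova.volumes.create_server_volume(server.id, vol.id, "/dev/vdb")
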
Current Status of OpenStack at CERN
• Working on an Essex code base from the EPEL repository
  – Excellent experience with the Fedora cloud-sig team
  – cloud-init for contextualisation, Oz for building RHEL/Fedora images
• Components
  – Current focus is on Nova with KVM and Hyper-V
  – Tests with Swift are ongoing but require significant experiment code changes
• Pre-production facility with around 150 hypervisors and 2,000 VMs, integrated with CERN infrastructure, deployed with Puppet, and used for batch work and simulation of magnet placement using LHC@Home

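To illustrate the contextualisation flow mentioned above, a hedged sketch of booting a VM with cloud-init user-data through python-novaclient; the image and flavor names are hypothetical:

    from novaclient.v1_1 import client

    nova = client.Client("username", "password", "tenant",
                         "http://keystone.example.org:5000/v2.0/")

    # cloud-init runs this on first boot, e.g. to trigger a Puppet run
    user_data = "#cloud-config\nruncmd:\n - [puppet, agent, --test]\n"

    image = nova.images.find(name="slc6-base")     # hypothetical image name
    flavor = nova.flavors.find(name="m1.small")
    nova.servers.create("vm0042", image, flavor, userdata=user_data)
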
When communities combine…
• OpenStack’s many components and options make configuration complex out of the box
• The Puppet Forge module from PuppetLabs does our configuration
• The Foreman adds OpenStack provisioning, going from user kiosk to a configured machine in 15 minutes

Foreman to manage Puppetized VM

Active Directory Integration
• CERN’s Active Directory
  – Unified identity management across the site
  – 44,000 users
  – 29,000 groups
  – 200 arrivals/departures per month
• Full integration with Active Directory via LDAP
  – Uses the OpenLDAP backend with some particular configuration settings
  – Aim for minimal changes to Active Directory
  – 7 patches submitted around hard-coded values and additional filtering
• Now in use in our pre-production instance
  – Map project roles (admins, members) to groups
  – Documentation in the OpenStack wiki

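For illustration, this is roughly what a group-membership lookup against Active Directory looks like with python-ldap; the server, bind DN and group name are hypothetical, not CERN's configuration:

    import ldap

    conn = ldap.initialize("ldap://ad.example.org")
    conn.simple_bind_s("CN=svc-openstack,OU=Services,DC=example,DC=org", "password")

    # members of a hypothetical per-project admin group
    results = conn.search_s("OU=Groups,DC=example,DC=org",
                            ldap.SCOPE_SUBTREE,
                            "(cn=cloud-project42-admins)",
                            ["member"])
    for dn, attrs in results:
        for member in attrs.get("member", []):
            print(member)
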
Welcome Back Hyper-V!
• We currently use Hyper-V/System Center for our server consolidation activities
  – But need to scale to 100x the current installation size
• Choice of hypervisors should be tactical
  – Performance
  – Compatibility/support with integration components
  – Image migration from legacy environments
• CERN is working closely with the Hyper-V OpenStack team
  – Puppet to configure hypervisors on Windows
  – Most functions work well, but further work is needed on the console, Ceilometer, …

Opportunistic Clouds in online experiment farms
• The CERN experiments have farms of 1000s of Linux servers close to the detectors to filter the 1PByte/s down to the 6GByte/s to be recorded to tape
• When the accelerator is not running, these machines are currently idle
  – The accelerator has regular maintenance slots of several days
  – A Long Shutdown is due from March 2013 to November 2014
• One of the experiments is deploying OpenStack on its farm
  – Simulation (low I/O, high CPU)
  – Analysis (high I/O, high CPU, high network)

Federated European Clouds
• Two significant European projects around federated clouds
  – European Grid Initiative Federated Cloud as a federation of grid sites providing IaaS
  – HELiX Nebula, a European Union funded project to create a scientific cloud based on commercial providers

EGI Federated Cloud Sites:
CESGA      CESNET      INFN       SARA
Cyfronet   FZ Jülich   SZTAKI     IPHC
GRIF       GRNET       KTH        Oxford
GWDG       IGI         TCD        IN2P3
STFC

Federated Cloud Commonalities
• Basic building blocks
  – Each site provides an IaaS endpoint with an API and a common security policy
    • OCCI? CDMI? Libcloud? Jclouds?
  – Image stores available across the sites
  – Federated identity management based on X.509 certificates
  – Consolidation of accounting information to validate pledges and usage
• Multiple cloud technologies
  – OpenStack
  – OpenNebula
  – Proprietary

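Of the API candidates above, Apache Libcloud is the Python option; a minimal sketch of pointing it at an OpenStack endpoint, assuming placeholder credentials and URL:

    from libcloud.compute.types import Provider
    from libcloud.compute.providers import get_driver

    OpenStack = get_driver(Provider.OPENSTACK)
    driver = OpenStack("username", "password",
                       ex_force_auth_url="http://keystone.example.org:5000/v2.0/tokens",
                       ex_force_auth_version="2.0_password")

    # the same call works against any provider libcloud supports
    for node in driver.list_nodes():
        print(node.name, node.state)
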
Next Steps
• Deploy into production at the start of 2013 with Folsom, running the Grid software on top of OpenStack IaaS
• Support multi-site operations with the 2nd data centre in Hungary
• Exploit new functionality
  – Ceilometer for metering
  – Bare metal for non-virtualised use cases such as high I/O servers
  – X.509 user certificate authentication
  – Load balancing as a service

Ramping to 15,000 hypervisors with 100,000 to 300,000 VMs by 2015

What are we missing (or haven’t found yet)?
• Best practice for
  – Monitoring and KPIs as part of core functionality
  – Guest disaster recovery
  – Migration between versions of OpenStack
• Roles within multi-user projects
  – VM owners allowed to manage their own resources (start/stop/delete)
  – Project admins allowed to manage all resources
  – Other members should not have high rights over other members’ VMs
• Global quota management for a non-elastic private cloud
  – Manage resource prioritisation and allocation centrally
  – Capacity management / utilisation for planning

Conclusions
• Production at CERN in the next few months on Folsom
  – Our emphasis will shift to focus on stability
  – Integrate CERN legacy systems via formal user exits
  – Work together with others on scaling improvements
• Community is key to shared success
  – Our problems are often resolved before we raise them
  – Packaging teams are producing reliable builds promptly
• CERN contributes and benefits
  – Thanks to everyone for their efforts and enthusiasm
  – Not just code but documentation, tests, blogs, …

References
CERN                                        http://public.web.cern.ch/public/
Scientific Linux                            http://www.scientificlinux.org/
Worldwide LHC Computing Grid                http://lcg.web.cern.ch/lcg/ and http://rtm.hep.ph.ic.ac.uk/
Jobs                                        http://cern.ch/jobs
Detailed Report on Agile Infrastructure     http://cern.ch/go/N8wp
HELiX Nebula                                http://helix-nebula.eu/
EGI Cloud Taskforce                         https://wiki.egi.eu/wiki/Fedcloud-tf

Backup Slides

CERN’s tools
• The world’s most powerful accelerator: the LHC
  – A 27 km long tunnel filled with high-tech instruments
  – Equipped with thousands of superconducting magnets
  – Accelerates particles to energies never before obtained
  – Produces particle collisions creating microscopic “big bangs”
• Very large, sophisticated detectors
  – Four experiments, each the size of a cathedral
  – A hundred million measurement channels each
  – Data acquisition systems treating petabytes per second
• Top-level computing to distribute and analyse the data
  – A computing grid linking ~200 computer centres around the globe
  – Sufficient computing power and storage to handle 25 petabytes per year, making them available to thousands of physicists for analysis

Our Infrastructure
• Hardware is generally based on commodity, white-box servers
  – Open tendering process based on SpecInt/CHF, CHF/Watt and GB/CHF
  – Compute nodes typically dual processor, 2GB per core
  – Bulk storage on 24x2TB disk storage-in-a-box with a RAID card
• The vast majority of servers run Scientific Linux, developed by Fermilab and CERN, based on Red Hat Enterprise Linux
  – Focus is on stability in view of the number of centres on the WLCG

New architecture data flows

[Chart: growth of virtualisation on SCVMM/Hyper-V from March 2010 to October 2012, rising from 0 to ~3,500 VMs, split between Linux and Windows guests]

Scaling up with Puppet and OpenStack
• Use LHC@Home, based on BOINC, for simulating the magnetic fields guiding particles around the LHC
• Naturally, there is a Puppet module: puppet-boinc
• 1000 VMs spun up to stress test the hypervisors with Puppet, Foreman and OpenStack

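A rough sketch of what such a bulk spin-up looks like against the Nova API; the image and flavor names are placeholders, and the real test drove Puppet and Foreman as well:

    from novaclient.v1_1 import client

    nova = client.Client("username", "password", "tenant",
                         "http://keystone.example.org:5000/v2.0/")

    image = nova.images.find(name="boinc-worker")   # hypothetical image
    flavor = nova.flavors.find(name="m1.small")

    # boot 1000 identical BOINC worker VMs to stress the hypervisors
    for i in range(1000):
        nova.servers.create("boinc-%04d" % i, image, flavor)
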


Speaker notes

  1. Established by an international treaty at the end of 2nd world war as a place where scientists could work together for fundamental researchNuclear is part of the name but our world is particle physics
  2. Our current understanding of the universe is incomplete. A theory, called the Standard Model, proposes particles and forces, many of which have been experimentally observed. However, there are open questions- Why do some particles have mass and others not ? The Higgs Boson is a theory but we need experimental evidence.Our theory of forces does not explain how Gravity worksCosmologists can only find 4% of the matter in the universe, we have lost the other 96%We should have 50% matter, 50% anti-matter… why is there an asymmetry (although it is a good thing that there is since the two anhialiate each other) ?When we go back through time 13 billion years towards the big bang, we move back through planets, stars, atoms, protons/electrons towards a soup like quark gluon plasma. What were the properties of this?
3. The biggest international scientific collaboration in the world: over 10,000 scientists from 100 countries. The annual budget is around 1.1 billion USD. Funding for CERN, the laboratory itself, comes from the 20 member states in ratio to their gross domestic product; other countries contribute to the experiments, including a substantial US contribution towards the LHC experiments.
4. The LHC is CERN’s largest accelerator: a 17-mile (27 km) ring 100 metres underground where two beams of particles are sent in opposite directions and collided at the four experiments, ATLAS, CMS, LHCb and ALICE. Lake Geneva and the airport are visible at the top to give a sense of scale.
5. The ring consists of two beam pipes, with a vacuum pressure ten times lower than on the Moon, which contain the beams of protons accelerated to just below the speed of light. These go round 11,000 times per second, bent by superconducting magnets cooled to 2 K (about -456 °F) by liquid helium, colder than outer space. The beams themselves carry a total energy similar to that of a high-speed train, so care must be taken that they turn the corners correctly and do not bump into the walls of the pipe.
6. At four points around the ring, the beams are made to cross where detectors, the size of cathedrals and weighing up to 12,500 tonnes, surround the pipe. These are like digital cameras, but they take 100-megapixel photos 40 million times a second, producing up to 1 petabyte per second (a rough order-of-magnitude check follows).
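As a back-of-the-envelope check of that figure, a minimal calculation; the bytes-per-channel value is purely an illustrative assumption, not a detector specification.

    # Order-of-magnitude check of the raw detector data rate from the note.
    channels = 100e6          # ~100 megapixels per "photo"
    rate_hz = 40e6            # 40 million bunch crossings per second
    bytes_per_channel = 0.25  # assumed average payload per channel (illustrative)

    bytes_per_second = channels * rate_hz * bytes_per_channel
    print("%.1f PB/s" % (bytes_per_second / 1e15))  # -> 1.0 PB/s, matching the note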
7. Collisions can be visualised by the tracks left in the various parts of the detectors. With many collisions, the statistics allow identification of particle properties such as mass and charge. This is a simple one…
8. To improve the statistics, we send round beams of multiple bunches; as they cross, there are multiple collisions as the 100 billion protons per bunch pass through each other. Software close to the detector, and later offline in the computer centre, then has to examine the tracks to understand the particles involved.
9. To get quark-gluon plasma, the material closest to the Big Bang, we also collide lead ions, which is much more intense… the temperatures reach 100,000 times that in the Sun.
10. We cannot record 1 PB/s, so hardware filters remove uninteresting collisions, such as those whose physics we already understand. The data is then sent to the CERN computer centre for recording over 10 Gbit optical connections.
11. The Worldwide LHC Computing Grid is used to record and analyse this data. The grid currently runs over 2 million jobs per day, with less than 10% of the work done at CERN. There is an agreed set of protocols for running jobs, data distribution and accounting between all the sites, which co-operate to support physicists across the globe.
12. So, to the Tier-0 computer centre at CERN… We are unusual in that we are open about our environment, as there is no competitive advantage for us. We have thousands of visitors a year coming for tours and education, and the computer centre is a popular stop. The data centre has around 2.9 MW of usable power looking after 12,000 servers; in comparison, the accelerator uses 120 MW, like a small town. With 64,000 disks, we have around 1,800 failing each year, much higher than the manufacturers’ MTBF figures would suggest, which is consistent with results from Google (a quick rate calculation follows). Servers are mainly Intel processors, some AMD, with dual-core Xeon being the most common configuration.
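For concreteness, the annualised failure rate implied by those two numbers:

    # Annualised disk failure rate implied by the figures in the note.
    disks = 64000.0
    failures_per_year = 1800.0
    print("%.1f%% per year" % (100 * failures_per_year / disks))  # -> ~2.8% per year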
13. Upstairs in the computer centre: a high roof was the fashion in the 1980s for mainframes, but nowadays it is very difficult to cool efficiently.
14. Our data storage system has to record and preserve 30 PB/year with an expected lifetime of 20 years. Keeping the old data is required to get maximum statistics for discoveries, and at times physicists will want to skim it looking for new physics. Data rates are around 6 GB/s on average, with peaks of 25 GB/s (a quick consistency check follows).
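A quick consistency check on those figures: recording 30 PB/year corresponds to roughly 1 GB/s sustained, so the 6 GB/s average presumably also covers reads and re-processing; that interpretation is our inference, not something stated in the note.

    # Sustained write rate implied by recording 30 PB/year.
    seconds_per_year = 365.0 * 24 * 3600             # ~3.15e7 seconds
    pb_per_year = 30.0
    gb_per_s = pb_per_year * 1e6 / seconds_per_year  # 1 PB = 1e6 GB
    print("%.2f GB/s sustained" % gb_per_s)          # -> about 0.95 GB/s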
15. Tape robots from IBM and Oracle. Around 60,000 tape mounts per week, so the robots are kept busy. Data is copied every two years to keep up with the latest media densities.
16. We asked the member states for offers. 200 Gbit/s links connect the centres. We expect to double computing capacity compared to today by 2015.
17. Double the capacity with the same manpower. We need to rethink how to solve the problem… and look at how others approach it. We had our own tools from 2002 and, as they became more sophisticated, it was not possible to take advantage of developments elsewhere without a major break. The team is doing this alongside their ‘day’ jobs, which reinforces the approach of taking what we can from the community.
18. The model is based on the Google toolchain; Puppet is key for many operations. We have only had to write one significant new custom CERN software component, which is the certificate authority. Other parts, such as Lemon for monitoring, come from our previous implementation, as we did not want to change everything at once and they scale.
  19. We’ve been very pleased with our choices. Along with the obvious benefits of the functionality, there are soft benefits from the community model.
20. Many staff at CERN are on short-term contracts… it is a real benefit for those staff to leave with skills that are in demand.
21. Standardise hardware… buy in bulk, pile it up, then work out what to use it for. Interventions cover memory, motherboards, cables or disks. Users waiting for I/O means wasted cycles: build machines at night, when they would otherwise sit unused, since interactive machines are busy mainly during the day. Move to cloud APIs… we need to support them while also maintaining our existing applications. Details later on reception and testing.
22. Puppet applies well to the cattle model, but we are also using it to handle the pet cases that cannot yet move over due to software limitations. So they get cloud provisioning combined with flexible configuration management.
23. Communities are integrating… when a new OpenStack option is brought into use at CERN, we contribute the changes back to the Puppet Forge, for example certificate handling. We are even looking at Hyper-V/Windows OpenStack configuration…
24. CERN is more than just the LHC: CNGS sends neutrinos to Gran Sasso; CLOUD demonstrates the impact of cosmic rays on weather patterns; anti-hydrogen atoms have been contained for minutes in a magnetic vessel. However, for those of you who have read Dan Brown’s Angels and Demons or seen the film, there are no maniacal monks with pounds of anti-matter running around the campus.
25. We purchase on an annual cycle, replacing around a quarter of the servers. This purchasing is based on performance metrics such as cost per SpecInt or cost per GB. Generally, we are seeing dual-core compute servers with Intel or AMD processors, and bulk storage servers with 24 or 36 2 TB disks. The operating system is a Red Hat Linux based distribution called Scientific Linux, whose development and maintenance we share with Fermilab in Chicago. The choice of a Red Hat based distribution comes from the need for stability across the grid, keeping the 200 centres running compatible Linux distributions.
26. LHC@Home is not an instruction manual for building your own accelerator, but a magnet simulation tool that tests multiple passes around the ring. We wanted to use it as a stress-test workload, and within half a day it was running on 1,000 VMs.