SlideShare une entreprise Scribd logo
1  sur  22
Télécharger pour lire hors ligne
CERN Agile Infrastructure
                             Road to Production
                                    Steve Traylen,
                                steve.traylen@cern.ch
                                    @traylenator
                                CERN, IT Department
                             HEPiX Autumn 2012 Workshop


 CERN IT Department
  CH-1211 Genève 23
         Switzerland
                                                          1
www.cern.ch/it

Monday, October 15, 2012                                      1
CERN Agile Infrastructure

                       • Motivation
                       • Component Releases
                          – Configuration
                             • Puppet, Foreman and Hiera
                             • Punch -> Judy
                          – Provision
                             • OpenStack
                          – Other Services
                             • koji , git, jira
                       • Community Interactions.
                       • AI as a Production Service
                          – Expanding user base

 CERN IT Department
  CH-1211 Genève 23
         Switzerland
www.cern.ch/it                                             2
Monday, October 15, 2012                                       2
Motivation for CERN AI.

                       • CERN IT is changing strategy for machine
                         provision and configuration.
                       • Rationale
                         – Need to manage twice as many servers as today
                         – No increase in staff numbers
                         – Our deployment of configuration tools becoming
                           increasingly brittle.
                         – New services take far to long to deploy.
                       • Approach
                         – We are no longer a special case for compute.
                         – Adopt open source tool chain model
 CERN IT Department
                         – Contribute new function back to community.
  CH-1211 Genève 23
         Switzerland
                                                                          3
www.cern.ch/it

Monday, October 15, 2012                                                      3
Configuration Components
                  • Puppet (2.7)
                        – Responsible for configuration, an industry standard.
                  • Foreman (1.0)
                        – Groups hosts into hostgroups of similar
                          configuration.
                        – Generates kickstart files from where puppet can
                          take over.
                  • Hiera (1.0)
                        – A data store used by puppet.
                  • Mcollective (2.2)
                        – pub sub messaging to control and query hosts.
                  • CDB legacy (old)
 CERN IT Department
  CH-1211 Genève 23
         Switzerland    – Still some items in CDB... e.g warranty information.
                                                                            4
www.cern.ch/it

Monday, October 15, 2012                                                         4
Configuration - Punch Service
                       • First puppet infrastructure known as “Punch”
                         – One 4 core node, set up “by hand”.
                         – puppet, foreman running behind passenger
                           (mod_ruby)
                         – In built own puppetca (cert authority)
                         – All project members with root access.
                            • Secret files uploaded by hand.
                            • Secret files being distributed by puppet
                         – Node started to struggle once 400 puppet agents
                           attached - CPU limitation on server.
                            • This was with reconfigurations every 15 minutes which
                              is excessive.
                       • Punch ran for 6 months.
 CERN IT Department
  CH-1211 Genève 23
         Switzerland     – Punch was never a scalable solution.                  5
www.cern.ch/it

Monday, October 15, 2012                                                              5
Configuration - Judy Service
                       • Punch replaced by Judy in August 2012.
                         – All components are deployed with puppet.
                         – 2 backend puppetmasters, 2 backend foreman.
                         – mod_loadbalence redirecting requests.
                                                • Using CERN CA.
                                                • CertBaby Service
                                                  • Hooks up users
                                                    kerberos identity,
                                                    machine
                                                    ownership and
                                                    certificate
                                                    requests.
 CERN IT Department
  CH-1211 Genève 23
         Switzerland
                                                                         6
www.cern.ch/it

Monday, October 15, 2012                                                     6
Judy Service Scale

                       • Currently 1200 puppet agents.
                         – 500 node added in the last week.
                         – 100 a day being added right now.
                         – Agents are running on
                            • Hardware
                            • CVI Service (hyper-v)
                            • OpenStack Nova (kvm) (all new ones)
                         – Organized in 37 hostgroups with 60 subgroups.
                       • Adding more puppetmasters or foreman
                         backends is easy.
                         – Same problem as scaling web pages, e.g
                            • Number of active connections at redirector.
 CERN IT Department         • Consistency across back end servers.
  CH-1211 Genève 23
         Switzerland
                                                                            7
www.cern.ch/it

Monday, October 15, 2012                                                        7
Puppet Manifests.

                       • Puppet manifests are very (too?) quick to
                         develop.
                         – Takes little longer than configuring the service.
                         – e.g an apollo module written in two days.
                            • while apollo configuration was being learnt.
                         – later paramatization of hardcoded values easy.
                       • Puppet code to be executed on nodes is
                         distributed by puppet first.
                         – i.e no need to package any puppet modules.
                         – Makes new feature development, deployment
                           very fast.
                       • We and others will get better at sharing
                         puppet manifests as hiera becomes normal.
 CERN IT Department
  CH-1211 Genève 23
         Switzerland
                                                                               8
www.cern.ch/it

Monday, October 15, 2012                                                           8
Puppet Git and Environments

                       • Git used for puppet modules & manifests.
                       • Git branches map to dynamic environments
                         – local development can be ‘puppet apply’d.
                         – admins push changes to a (gitolite) repository
                         – puppet masters pull branches and translate to
                           environments
                         – Production, Testing & Devel branches
                         – Topic branches for major changes
                         – Some services live in their own branches
                            • risk of divergence...
                       • Atlassian Crucible & Fisheye for module
                         review process ... not really started.
 CERN IT Department
  CH-1211 Genève 23
         Switzerland
                                                                            9
www.cern.ch/it

Monday, October 15, 2012                                                        9
Foreman

                       • Groups hosts of similar configuration.
                       • Top group -> service. e.g lxbatch, cernfts, ...
                       • Subgroups may be very different e.g
                          – cvmfs/stratum0 vs cvmfs/lxcvmfs.




 CERN IT Department
  CH-1211 Genève 23
         Switzerland
                                                                      10
www.cern.ch/it

Monday, October 15, 2012                                                   10
Separate Code and Data

                       • Quattor separated code and data well:
                         – It was one motivation to write Quattor and drop
                           LCFGng in the first place.
                       • hiera takes the separation to a new level:
                         – puppet asks for a value from hiera?
                            • $myNTP = hiera(‘ntpservers’)
                         – result can be string , array, hash, ....
                         – The lookup is based on a nodes properties, e.g
                            • Since I am at CERN answer is ntp1.cern.ch
                            • Since I am in Budapest answer is ntp2.cern.ch
                         – The schema of results for CERN nodes,
                           Budapest nodes, SLC5 nodes, debian nodes can
 CERN IT Department
  CH-1211 Genève 23
                           be arranged and changed as we please.
         Switzerland
                                                                              11
www.cern.ch/it

Monday, October 15, 2012                                                           11
Hiera and Hostgroups
                  • We arrange nodes in to (sub)hostgroups in
                    foreman.
                  • A tree of YAML files stored in git maps on to
                    these. e.g for castor hostgroups
                        –   hostgroup/castor/diskserver/atlas.yaml
                        –   hostgroup/castor/diskserver.yaml
                        –   hostgroup/castor.yaml               # A YAML file.
                        –   os/slc5.yaml                        ---
                                                                castorns: ns.cern.ch
                        –   common.yaml
                  • The files above contain increasingly general
                    keyvalues for look up in hiera.
                  • Schema and can be fully customized to CERN
 CERN IT Department
  CH-1211 Genève 23
         Switzerland
                    space with no fear of polluting the code.
                                                                                 12
www.cern.ch/it

Monday, October 15, 2012                                                               12
Configuration Next Steps
                       • Deploy puppetdb
                         – Performance improvements - community raving.
                         – Repository for configuration data mining.
                       • Deploy mcollective
                         – Pub and Sub system for sending action
                           commands to hosts.
                         – Message broker needs ACLs on queues
                           corresponding to full diversity of CERN hosts and
                           actions.
                         – Data mine puppetdb.
                       • Workflow
                         – Move to git pull request process for central
 CERN IT Department        configuration.
  CH-1211 Genève 23
         Switzerland
                                                                          13
www.cern.ch/it

Monday, October 15, 2012                                                       13
OpenStack Deployment
                   • Currently Essex code base from the EPEL
                     repository
                   • Good experience with the Fedora cloud-sig
                     team
                   • Cloud-init for contextualisation, oz for images
                     with RHEL/Fedora
                   • Components
                        –   Nova on KVM and Hyper-V
                        –   Keystone integrated with Active Directory
                        –   Glance with Oz
                        –   Horizon

 CERN IT Department
                   • Test bed of 100 Hypervisors, 2000 VMs
  CH-1211 Genève 23
         Switzerland integrated with CERN infrastructure, Puppet14
www.cern.ch/it

Monday, October 15, 2012                                                14
OpenStack AD Integration

                       • CERN’s Active Directory
                       • Unified identity management across the site
                          – 44,000 users, 29,000 groups
                          – 200 arrivals/departures per month
                       • Full integration with Active Directory via
                         LDAP
                          – Slightly different schema from OpenLDAP
                          – Aim to minimise changes to AD Schema
                          – 7 patches submitted around hard coded values
                            and additional filtering
                       • Now in use for our pre-production instance
 CERN IT Department       – Model project definitions in Active Directory
  CH-1211 Genève 23
         Switzerland
www.cern.ch/it            – Map roles to groups                             15

Monday, October 15, 2012                                                         15
Welcome Back Hyper V

                       • We currently use Hyper-V/System Centre for
                         our server consolidation
                         – Over 3,200 VMs, 60% Linux/40% Windows
                       • Choice of hypervisors should be tactical
                         – Performance
                         – Compatibility/Support with integration
                           components
                         – Image migration
                       • CERN is working closely with the Hyper-V
                         OpenStack team
                         – Puppet to configure hypervisors on Windows
 CERN IT Department
                         – Most functions work well but further work on
  CH-1211 Genève 23
         Switzerland       Console, Ceilometer, …                         16
www.cern.ch/it

Monday, October 15, 2012                                                       16
OpenStack Next Steps

                       • Deploy into production
                          – Target for production is start of 2013 with Folsom
                          – Use current grid model running on top of OpenStack
                       • Deploy multi-site
                          – Extend to 2nd data centre in Hungary and disaster
                            recovery
                       • Deploy new functionality
                          – Ceilometer for accounting
                          – Bare metal for non-virtualised use cases such as high I/
                            O servers
                          – PKI and X.509 user certificate authentication
                          – Load balancing as a service
                       • Deploy at scale
 CERN IT Department       – Move towards 15,000 hypervisors over next two years
  CH-1211 Genève 23
         Switzerland
www.cern.ch/it
                          – Estimate 100-300,000 virtual machines             17

Monday, October 15, 2012                                                               17
Community Interactions
                       • CERN presenting to community/vendors.
                         –   PuppetConf , San Francisco, Sep 2012
                         –   Openstack Summit, San Francisco Apr 2012
                         –   Openstack Summit, San Diego , Oct 2012 (now)
                         –   PuppetCamp, Geneva, July 2012
                       • CERN has code contributions to:
                         – facter, the foreman, puppet, various puppet
                           modules, mcollective, openstack nova, keystone
                           and swift.
                         – This is increasing as new students/fellows are
                           employed for their puppet, ruby, .. skills.
                       • CERN puppet-users meeting , IT, ATLAS pit, ..
 CERN IT Department
  CH-1211 Genève 23    • Share our own http://github.com/cernops
         Switzerland
                                                                       18
www.cern.ch/it

Monday, October 15, 2012                                                    18
Other AI Services

                       • Agile is not just Puppet and Openstack.
                       • AI created a gitolite ACL’ed GIT service.
                          – CERN IT is now provisioning a public GIT service
                            based on this.
                          – AI will migrate its projects ASAP.
                       • AI created a Koji service for RPMs.
                          – Creates RPMS and publishes to yum.
                          – The service is now being used by others with in IT.
                            e.g castor builds, data management, lemon, ...
                       • AI ran jira early before a central service was
                         created.
 CERN IT Department
                          – AI already migrated to central service.
  CH-1211 Genève 23
         Switzerland
                                                                           19
www.cern.ch/it

Monday, October 15, 2012                                                        19
AI Service in Production

                       • Several Services running now on AI.
                         –   Some CVMFS components.
                         –   SLC6 batch services
                         –   SLC build machines
                         –   GIT gateways.
                         –   CASTOR (compass VO)
                         –   Test systems, glusterfs, swift, ..
                         –   New top level hostgroups every week now.
                       • From November AI opening up more.
                         – Experiment services (voboxes) will start to use AI
                           service.
 CERN IT Department
                         – Documentation to be updated/consolidated.
  CH-1211 Genève 23
         Switzerland
                                                                          20
www.cern.ch/it

Monday, October 15, 2012                                                        20
Conclusions

                       • Agile Infrastructure Project
                       • We are ready for hardware arriving in
                         Budapest in 2013.
                         – Puppet configured VMs on Puppet configured
                           OpenStack.
                       • Documentation:
                         – More user facing documentation needed.
                       • Configuration with Puppet:
                         – Services needing knowledge of everything
                         – Inter sysadmin trust.
                         – Test facility for AI.
 CERN IT Department
  CH-1211 Genève 23
                       • OpenStack deployment
         Switzerland
www.cern.ch/it           – Increase scale.                              21

Monday, October 15, 2012                                                     21
URLs

                       •   AI Project Pages: http://cern.ch/go/7vFF
                       •   CERN modules http://github.com/cernops
                       •   CERN agile tickets https://agileinf.its.cern.ch/jira
                       •   AI Presentations : http://cern.ch/go/6qRG




 CERN IT Department
  CH-1211 Genève 23
         Switzerland
                                                                              22
www.cern.ch/it

Monday, October 15, 2012                                                           22

Contenu connexe

Tendances

Apache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesApache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesDataWorks Summit
 
Dell openstack boston meetup dell crowbar and open stack
Dell openstack boston meetup   dell crowbar and open stackDell openstack boston meetup   dell crowbar and open stack
Dell openstack boston meetup dell crowbar and open stackDellCloudEdge
 
Access to Open Test Infrastructures using Panlab2 - Anastasius Gavras (Eurescom)
Access to Open Test Infrastructures using Panlab2 - Anastasius Gavras (Eurescom)Access to Open Test Infrastructures using Panlab2 - Anastasius Gavras (Eurescom)
Access to Open Test Infrastructures using Panlab2 - Anastasius Gavras (Eurescom)NGN Test Centre
 
Oda as an enterprise solution at walgreens oow 2012 v7
Oda as an enterprise solution at walgreens oow 2012 v7Oda as an enterprise solution at walgreens oow 2012 v7
Oda as an enterprise solution at walgreens oow 2012 v7Fuad Arshad
 
Virtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In ChineseVirtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In Chinese天青 王
 
Dell Crowbar and OpenStack at OSCON
Dell Crowbar and OpenStack at OSCONDell Crowbar and OpenStack at OSCON
Dell Crowbar and OpenStack at OSCONOpen Stack
 
Teradata vs-exadata
Teradata vs-exadataTeradata vs-exadata
Teradata vs-exadataLouis liu
 
Engineered Systems: Oracle’s Vision for the Future
Engineered Systems: Oracle’s Vision for the FutureEngineered Systems: Oracle’s Vision for the Future
Engineered Systems: Oracle’s Vision for the FutureBob Rhubart
 
Managing Oracle Solaris Systems with Puppet
Managing Oracle Solaris Systems with PuppetManaging Oracle Solaris Systems with Puppet
Managing Oracle Solaris Systems with Puppetglynnfoster
 
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...Frank Munz
 
Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...
Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...
Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...Cloudera, Inc.
 
Exadata x3 workshop
Exadata x3 workshopExadata x3 workshop
Exadata x3 workshopFran Navarro
 
Operate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineOperate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineDataWorks Summit
 
Simplify IT: Oracle SuperCluster
Simplify IT: Oracle SuperCluster Simplify IT: Oracle SuperCluster
Simplify IT: Oracle SuperCluster Fran Navarro
 
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...DESMOND YUEN
 
2011 04-dsi-javaee-in-the-cloud-andreadis
2011 04-dsi-javaee-in-the-cloud-andreadis2011 04-dsi-javaee-in-the-cloud-andreadis
2011 04-dsi-javaee-in-the-cloud-andreadisdandre
 
Talk IT_ Oracle_정병선_110928
Talk IT_ Oracle_정병선_110928Talk IT_ Oracle_정병선_110928
Talk IT_ Oracle_정병선_110928Cana Ko
 
Siebel Server Cloning available in 8.1.1.9 / 8.2.2.2
Siebel Server Cloning available in 8.1.1.9 / 8.2.2.2Siebel Server Cloning available in 8.1.1.9 / 8.2.2.2
Siebel Server Cloning available in 8.1.1.9 / 8.2.2.2Jeroen Burgers
 

Tendances (20)

Sparc solaris servers
Sparc solaris serversSparc solaris servers
Sparc solaris servers
 
Apache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesApache Hadoop on Virtual Machines
Apache Hadoop on Virtual Machines
 
Dell openstack boston meetup dell crowbar and open stack
Dell openstack boston meetup   dell crowbar and open stackDell openstack boston meetup   dell crowbar and open stack
Dell openstack boston meetup dell crowbar and open stack
 
Access to Open Test Infrastructures using Panlab2 - Anastasius Gavras (Eurescom)
Access to Open Test Infrastructures using Panlab2 - Anastasius Gavras (Eurescom)Access to Open Test Infrastructures using Panlab2 - Anastasius Gavras (Eurescom)
Access to Open Test Infrastructures using Panlab2 - Anastasius Gavras (Eurescom)
 
Oda as an enterprise solution at walgreens oow 2012 v7
Oda as an enterprise solution at walgreens oow 2012 v7Oda as an enterprise solution at walgreens oow 2012 v7
Oda as an enterprise solution at walgreens oow 2012 v7
 
Virtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In ChineseVirtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In Chinese
 
Dell Crowbar and OpenStack at OSCON
Dell Crowbar and OpenStack at OSCONDell Crowbar and OpenStack at OSCON
Dell Crowbar and OpenStack at OSCON
 
Exalogic Bcn
Exalogic BcnExalogic Bcn
Exalogic Bcn
 
Teradata vs-exadata
Teradata vs-exadataTeradata vs-exadata
Teradata vs-exadata
 
Engineered Systems: Oracle’s Vision for the Future
Engineered Systems: Oracle’s Vision for the FutureEngineered Systems: Oracle’s Vision for the Future
Engineered Systems: Oracle’s Vision for the Future
 
Managing Oracle Solaris Systems with Puppet
Managing Oracle Solaris Systems with PuppetManaging Oracle Solaris Systems with Puppet
Managing Oracle Solaris Systems with Puppet
 
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...
 
Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...
Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...
Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...
 
Exadata x3 workshop
Exadata x3 workshopExadata x3 workshop
Exadata x3 workshop
 
Operate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineOperate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmine
 
Simplify IT: Oracle SuperCluster
Simplify IT: Oracle SuperCluster Simplify IT: Oracle SuperCluster
Simplify IT: Oracle SuperCluster
 
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
3rd Generation Intel® Xeon® Scalable Processor - Achieving 1 Tbps IPsec with ...
 
2011 04-dsi-javaee-in-the-cloud-andreadis
2011 04-dsi-javaee-in-the-cloud-andreadis2011 04-dsi-javaee-in-the-cloud-andreadis
2011 04-dsi-javaee-in-the-cloud-andreadis
 
Talk IT_ Oracle_정병선_110928
Talk IT_ Oracle_정병선_110928Talk IT_ Oracle_정병선_110928
Talk IT_ Oracle_정병선_110928
 
Siebel Server Cloning available in 8.1.1.9 / 8.2.2.2
Siebel Server Cloning available in 8.1.1.9 / 8.2.2.2Siebel Server Cloning available in 8.1.1.9 / 8.2.2.2
Siebel Server Cloning available in 8.1.1.9 / 8.2.2.2
 

Similaire à CERN Agile Infrastructure, Road to Production

Puppet Camp CERN Geneva
Puppet Camp CERN GenevaPuppet Camp CERN Geneva
Puppet Camp CERN GenevaSteve Traylen
 
20120524 cern data centre evolution v2
20120524 cern data centre evolution v220120524 cern data centre evolution v2
20120524 cern data centre evolution v2Tim Bell
 
London Ceph Day: Ceph at CERN
London Ceph Day: Ceph at CERNLondon Ceph Day: Ceph at CERN
London Ceph Day: Ceph at CERNCeph Community
 
Lessons Learned: Novell Open Enterprise Server Upgrades Made Easy
Lessons Learned: Novell Open Enterprise Server Upgrades Made EasyLessons Learned: Novell Open Enterprise Server Upgrades Made Easy
Lessons Learned: Novell Open Enterprise Server Upgrades Made EasyNovell
 
20121115 open stack_ch_user_group_v1.2
20121115 open stack_ch_user_group_v1.220121115 open stack_ch_user_group_v1.2
20121115 open stack_ch_user_group_v1.2Tim Bell
 
CERN Data Centre Evolution
CERN Data Centre EvolutionCERN Data Centre Evolution
CERN Data Centre EvolutionGavin McCance
 
Build Your Private Cloud with Ezilla and Haduzilla
Build Your Private Cloud with Ezilla and HaduzillaBuild Your Private Cloud with Ezilla and Haduzilla
Build Your Private Cloud with Ezilla and HaduzillaJazz Yao-Tsung Wang
 
Puppet Camp Dublin - 06/2012
Puppet Camp Dublin - 06/2012Puppet Camp Dublin - 06/2012
Puppet Camp Dublin - 06/2012Roland Tritsch
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user storylaurabeckcahoon
 
Configuration Management Evolution at CERN
Configuration Management Evolution at CERNConfiguration Management Evolution at CERN
Configuration Management Evolution at CERNGavin McCance
 
Simplifying network management with Platespin
Simplifying network management with PlatespinSimplifying network management with Platespin
Simplifying network management with PlatespinAdvanced Logic Industries
 
Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Belmiro Moreira
 
Tanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata MigrationsTanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata MigrationsTanel Poder
 
Using Community Clouds for Load Testing- the ProActive CLIF solution, OW2con'...
Using Community Clouds for Load Testing- the ProActive CLIF solution, OW2con'...Using Community Clouds for Load Testing- the ProActive CLIF solution, OW2con'...
Using Community Clouds for Load Testing- the ProActive CLIF solution, OW2con'...OW2
 
Running without a ZFS system pool
Running without a ZFS system poolRunning without a ZFS system pool
Running without a ZFS system poolBill Pijewski
 
Axceleon Presentation at Siggraph 2009
Axceleon Presentation at Siggraph 2009Axceleon Presentation at Siggraph 2009
Axceleon Presentation at Siggraph 2009Axceleon Inc
 
D’une infrastructure de virtualisation scripté à un cloud privé OpenNebula
D’une infrastructure de virtualisation scripté à un cloud privé OpenNebulaD’une infrastructure de virtualisation scripté à un cloud privé OpenNebula
D’une infrastructure de virtualisation scripté à un cloud privé OpenNebulaOpenNebula Project
 
Accelerate and unify network deployment with Puppet on Juniper
Accelerate and unify network deployment with Puppet on JuniperAccelerate and unify network deployment with Puppet on Juniper
Accelerate and unify network deployment with Puppet on JuniperPuppet
 

Similaire à CERN Agile Infrastructure, Road to Production (20)

Puppet Camp CERN Geneva
Puppet Camp CERN GenevaPuppet Camp CERN Geneva
Puppet Camp CERN Geneva
 
20120524 cern data centre evolution v2
20120524 cern data centre evolution v220120524 cern data centre evolution v2
20120524 cern data centre evolution v2
 
London Ceph Day: Ceph at CERN
London Ceph Day: Ceph at CERNLondon Ceph Day: Ceph at CERN
London Ceph Day: Ceph at CERN
 
Lessons Learned: Novell Open Enterprise Server Upgrades Made Easy
Lessons Learned: Novell Open Enterprise Server Upgrades Made EasyLessons Learned: Novell Open Enterprise Server Upgrades Made Easy
Lessons Learned: Novell Open Enterprise Server Upgrades Made Easy
 
20121115 open stack_ch_user_group_v1.2
20121115 open stack_ch_user_group_v1.220121115 open stack_ch_user_group_v1.2
20121115 open stack_ch_user_group_v1.2
 
Lxcloud
LxcloudLxcloud
Lxcloud
 
CERN Data Centre Evolution
CERN Data Centre EvolutionCERN Data Centre Evolution
CERN Data Centre Evolution
 
Build Your Private Cloud with Ezilla and Haduzilla
Build Your Private Cloud with Ezilla and HaduzillaBuild Your Private Cloud with Ezilla and Haduzilla
Build Your Private Cloud with Ezilla and Haduzilla
 
My sql tutorial-oscon-2012
My sql tutorial-oscon-2012My sql tutorial-oscon-2012
My sql tutorial-oscon-2012
 
Puppet Camp Dublin - 06/2012
Puppet Camp Dublin - 06/2012Puppet Camp Dublin - 06/2012
Puppet Camp Dublin - 06/2012
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user story
 
Configuration Management Evolution at CERN
Configuration Management Evolution at CERNConfiguration Management Evolution at CERN
Configuration Management Evolution at CERN
 
Simplifying network management with Platespin
Simplifying network management with PlatespinSimplifying network management with Platespin
Simplifying network management with Platespin
 
Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015
 
Tanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata MigrationsTanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata Migrations
 
Using Community Clouds for Load Testing- the ProActive CLIF solution, OW2con'...
Using Community Clouds for Load Testing- the ProActive CLIF solution, OW2con'...Using Community Clouds for Load Testing- the ProActive CLIF solution, OW2con'...
Using Community Clouds for Load Testing- the ProActive CLIF solution, OW2con'...
 
Running without a ZFS system pool
Running without a ZFS system poolRunning without a ZFS system pool
Running without a ZFS system pool
 
Axceleon Presentation at Siggraph 2009
Axceleon Presentation at Siggraph 2009Axceleon Presentation at Siggraph 2009
Axceleon Presentation at Siggraph 2009
 
D’une infrastructure de virtualisation scripté à un cloud privé OpenNebula
D’une infrastructure de virtualisation scripté à un cloud privé OpenNebulaD’une infrastructure de virtualisation scripté à un cloud privé OpenNebula
D’une infrastructure de virtualisation scripté à un cloud privé OpenNebula
 
Accelerate and unify network deployment with Puppet on Juniper
Accelerate and unify network deployment with Puppet on JuniperAccelerate and unify network deployment with Puppet on Juniper
Accelerate and unify network deployment with Puppet on Juniper
 

CERN Agile Infrastructure, Road to Production

  • 1. CERN Agile Infrastructure Road to Production Steve Traylen, steve.traylen@cern.ch @traylenator CERN, IT Department HEPiX Autumn 2012 Workshop CERN IT Department CH-1211 Genève 23 Switzerland 1 www.cern.ch/it Monday, October 15, 2012 1
  • 2. CERN Agile Infrastructure • Motivation • Component Releases – Configuration • Puppet, Foreman and Hiera • Punch -> Judy – Provision • OpenStack – Other Services • koji , git, jira • Community Interactions. • AI as a Production Service – Expanding user base CERN IT Department CH-1211 Genève 23 Switzerland www.cern.ch/it 2 Monday, October 15, 2012 2
  • 3. Motivation for CERN AI. • CERN IT is changing strategy for machine provision and configuration. • Rationale – Need to manage twice as many servers as today – No increase in staff numbers – Our deployment of configuration tools becoming increasingly brittle. – New services take far to long to deploy. • Approach – We are no longer a special case for compute. – Adopt open source tool chain model CERN IT Department – Contribute new function back to community. CH-1211 Genève 23 Switzerland 3 www.cern.ch/it Monday, October 15, 2012 3
  • 4. Configuration Components • Puppet (2.7) – Responsible for configuration, an industry standard. • Foreman (1.0) – Groups hosts into hostgroups of similar configuration. – Generates kickstart files from where puppet can take over. • Hiera (1.0) – A data store used by puppet. • Mcollective (2.2) – pub sub messaging to control and query hosts. • CDB legacy (old) CERN IT Department CH-1211 Genève 23 Switzerland – Still some items in CDB... e.g warranty information. 4 www.cern.ch/it Monday, October 15, 2012 4
  • 5. Configuration - Punch Service • First puppet infrastructure known as “Punch” – One 4 core node, set up “by hand”. – puppet, foreman running behind passenger (mod_ruby) – In built own puppetca (cert authority) – All project members with root access. • Secret files uploaded by hand. • Secret files being distributed by puppet – Node started to struggle once 400 puppet agents attached - CPU limitation on server. • This was with reconfigurations every 15 minutes which is excessive. • Punch ran for 6 months. CERN IT Department CH-1211 Genève 23 Switzerland – Punch was never a scalable solution. 5 www.cern.ch/it Monday, October 15, 2012 5
  • 6. Configuration - Judy Service • Punch replaced by Judy in August 2012. – All components are deployed with puppet. – 2 backend puppetmasters, 2 backend foreman. – mod_loadbalence redirecting requests. • Using CERN CA. • CertBaby Service • Hooks up users kerberos identity, machine ownership and certificate requests. CERN IT Department CH-1211 Genève 23 Switzerland 6 www.cern.ch/it Monday, October 15, 2012 6
  • 7. Judy Service Scale • Currently 1200 puppet agents. – 500 node added in the last week. – 100 a day being added right now. – Agents are running on • Hardware • CVI Service (hyper-v) • OpenStack Nova (kvm) (all new ones) – Organized in 37 hostgroups with 60 subgroups. • Adding more puppetmasters or foreman backends is easy. – Same problem as scaling web pages, e.g • Number of active connections at redirector. CERN IT Department • Consistency across back end servers. CH-1211 Genève 23 Switzerland 7 www.cern.ch/it Monday, October 15, 2012 7
  • 8. Puppet Manifests. • Puppet manifests are very (too?) quick to develop. – Takes little longer than configuring the service. – e.g an apollo module written in two days. • while apollo configuration was being learnt. – later paramatization of hardcoded values easy. • Puppet code to be executed on nodes is distributed by puppet first. – i.e no need to package any puppet modules. – Makes new feature development, deployment very fast. • We and others will get better at sharing puppet manifests as hiera becomes normal. CERN IT Department CH-1211 Genève 23 Switzerland 8 www.cern.ch/it Monday, October 15, 2012 8
  • 9. Puppet Git and Environments • Git used for puppet modules & manifests. • Git branches map to dynamic environments – local development can be ‘puppet apply’d. – admins push changes to a (gitolite) repository – puppet masters pull branches and translate to environments – Production, Testing & Devel branches – Topic branches for major changes – Some services live in their own branches • risk of divergence... • Atlassian Crucible & Fisheye for module review process ... not really started. CERN IT Department CH-1211 Genève 23 Switzerland 9 www.cern.ch/it Monday, October 15, 2012 9
  • 10. Foreman • Groups hosts of similar configuration. • Top group -> service. e.g lxbatch, cernfts, ... • Subgroups may be very different e.g – cvmfs/stratum0 vs cvmfs/lxcvmfs. CERN IT Department CH-1211 Genève 23 Switzerland 10 www.cern.ch/it Monday, October 15, 2012 10
  • 11. Separate Code and Data • Quattor separated code and data well: – It was one motivation to write Quattor and drop LCFGng in the first place. • hiera takes the separation to a new level: – puppet asks for a value from hiera? • $myNTP = hiera(‘ntpservers’) – result can be string , array, hash, .... – The lookup is based on a nodes properties, e.g • Since I am at CERN answer is ntp1.cern.ch • Since I am in Budapest answer is ntp2.cern.ch – The schema of results for CERN nodes, Budapest nodes, SLC5 nodes, debian nodes can CERN IT Department CH-1211 Genève 23 be arranged and changed as we please. Switzerland 11 www.cern.ch/it Monday, October 15, 2012 11
  • 12. Hiera and Hostgroups • We arrange nodes in to (sub)hostgroups in foreman. • A tree of YAML files stored in git maps on to these. e.g for castor hostgroups – hostgroup/castor/diskserver/atlas.yaml – hostgroup/castor/diskserver.yaml – hostgroup/castor.yaml # A YAML file. – os/slc5.yaml --- castorns: ns.cern.ch – common.yaml • The files above contain increasingly general keyvalues for look up in hiera. • Schema and can be fully customized to CERN CERN IT Department CH-1211 Genève 23 Switzerland space with no fear of polluting the code. 12 www.cern.ch/it Monday, October 15, 2012 12
  • 13. Configuration Next Steps • Deploy puppetdb – Performance improvements - community raving. – Repository for configuration data mining. • Deploy mcollective – Pub and Sub system for sending action commands to hosts. – Message broker needs ACLs on queues corresponding to full diversity of CERN hosts and actions. – Data mine puppetdb. • Workflow – Move to git pull request process for central CERN IT Department configuration. CH-1211 Genève 23 Switzerland 13 www.cern.ch/it Monday, October 15, 2012 13
  • 14. OpenStack Deployment • Currently Essex code base from the EPEL repository • Good experience with the Fedora cloud-sig team • Cloud-init for contextualisation, oz for images with RHEL/Fedora • Components – Nova on KVM and Hyper-V – Keystone integrated with Active Directory – Glance with Oz – Horizon CERN IT Department • Test bed of 100 Hypervisors, 2000 VMs CH-1211 Genève 23 Switzerland integrated with CERN infrastructure, Puppet14 www.cern.ch/it Monday, October 15, 2012 14
  • 15. OpenStack AD Integration • CERN’s Active Directory • Unified identity management across the site – 44,000 users, 29,000 groups – 200 arrivals/departures per month • Full integration with Active Directory via LDAP – Slightly different schema from OpenLDAP – Aim to minimise changes to AD Schema – 7 patches submitted around hard coded values and additional filtering • Now in use for our pre-production instance CERN IT Department – Model project definitions in Active Directory CH-1211 Genève 23 Switzerland www.cern.ch/it – Map roles to groups 15 Monday, October 15, 2012 15
  • 16. Welcome Back Hyper V • We currently use Hyper-V/System Centre for our server consolidation – Over 3,200 VMs, 60% Linux/40% Windows • Choice of hypervisors should be tactical – Performance – Compatibility/Support with integration components – Image migration • CERN is working closely with the Hyper-V OpenStack team – Puppet to configure hypervisors on Windows CERN IT Department – Most functions work well but further work on CH-1211 Genève 23 Switzerland Console, Ceilometer, … 16 www.cern.ch/it Monday, October 15, 2012 16
  • 17. OpenStack Next Steps • Deploy into production – Target for production is start of 2013 with Folsom – Use current grid model running on top of OpenStack • Deploy multi-site – Extend to 2nd data centre in Hungary and disaster recovery • Deploy new functionality – Ceilometer for accounting – Bare metal for non-virtualised use cases such as high I/ O servers – PKI and X.509 user certificate authentication – Load balancing as a service • Deploy at scale CERN IT Department – Move towards 15,000 hypervisors over next two years CH-1211 Genève 23 Switzerland www.cern.ch/it – Estimate 100-300,000 virtual machines 17 Monday, October 15, 2012 17
  • 18. Community Interactions • CERN presenting to community/vendors. – PuppetConf , San Francisco, Sep 2012 – Openstack Summit, San Francisco Apr 2012 – Openstack Summit, San Diego , Oct 2012 (now) – PuppetCamp, Geneva, July 2012 • CERN has code contributions to: – facter, the foreman, puppet, various puppet modules, mcollective, openstack nova, keystone and swift. – This is increasing as new students/fellows are employed for their puppet, ruby, .. skills. • CERN puppet-users meeting , IT, ATLAS pit, .. CERN IT Department CH-1211 Genève 23 • Share our own http://github.com/cernops Switzerland 18 www.cern.ch/it Monday, October 15, 2012 18
  • 19. Other AI Services • Agile is not just Puppet and Openstack. • AI created a gitolite ACL’ed GIT service. – CERN IT is now provisioning a public GIT service based on this. – AI will migrate its projects ASAP. • AI created a Koji service for RPMs. – Creates RPMS and publishes to yum. – The service is now being used by others with in IT. e.g castor builds, data management, lemon, ... • AI ran jira early before a central service was created. CERN IT Department – AI already migrated to central service. CH-1211 Genève 23 Switzerland 19 www.cern.ch/it Monday, October 15, 2012 19
  • 20. AI Service in Production • Several Services running now on AI. – Some CVMFS components. – SLC6 batch services – SLC build machines – GIT gateways. – CASTOR (compass VO) – Test systems, glusterfs, swift, .. – New top level hostgroups every week now. • From November AI opening up more. – Experiment services (voboxes) will start to use AI service. CERN IT Department – Documentation to be updated/consolidated. CH-1211 Genève 23 Switzerland 20 www.cern.ch/it Monday, October 15, 2012 20
  • 21. Conclusions • Agile Infrastructure Project • We are ready for hardware arriving in Budapest in 2013. – Puppet configured VMs on Puppet configured OpenStack. • Documentation: – More user facing documentation needed. • Configuration with Puppet: – Services needing knowledge of everything – Inter sysadmin trust. – Test facility for AI. CERN IT Department CH-1211 Genève 23 • OpenStack deployment Switzerland www.cern.ch/it – Increase scale. 21 Monday, October 15, 2012 21
  • 22. URLs • AI Project Pages: http://cern.ch/go/7vFF • CERN modules http://github.com/cernops • CERN agile tickets https://agileinf.its.cern.ch/jira • AI Presentations : http://cern.ch/go/6qRG CERN IT Department CH-1211 Genève 23 Switzerland 22 www.cern.ch/it Monday, October 15, 2012 22