glideinWMS for users




      Introduction to glideinWMS
                     by Igor Sfiligoi (UCSD)




CERN, Dec 2012             glideinWMS Intro    1
Scope of this talk

                         This talk provides a
                   user perspective of glideinWMS
                 for users with previous experience
                        with Grid computing.

                  It does not provide much detail
                 but concentrates on the concepts
                     behind the system instead.



The problem(s)
 ●   Users have many jobs that must be run
      ●   Each user has multiple tasks at once
 ●   Resources are provided by O(100) Grid sites

 ●   How do we schedule them to get the results
     in the shortest amount of time?
      ●   Assuming one result per task
 ●   How do we treat all users in a fair way?
      ●   Independently of how many jobs they submit

glideinWMS approach
 ●   Separates
      ●   Resource provisioning from
      ●   Resource scheduling
 ●   In practice, sends out pilot jobs to the Grid (never the user jobs!)
      ●   And creates an “overlay batch system”
 ●   Pilot jobs get ownership of the Grid slots
      ●   At least for a limited time
 ●   Known also as the pilot approach

 [Figure: an overlay batch system spanning five Grid sites]

Job scheduling
 ●   Once we have the overlay batch system,
     job scheduling works like in any dedicated B.S.
      ●   We own the B.S. and can set the
          job scheduling policies
 ●   glideinWMS is based on HTCondor
      ●   So it can do whatever HTCondor can do
      ●   Which is quite flexible
            –    But also nothing more...
      ●   HTCondor (formerly known as Condor) is a widely used batch system
            –    More details later on
 ●   HTCondor-based pilots are usually called glideins
      ●   Thus glideinWMS stands for “glidein based Workflow Management System”

Creating the overlay
                           i.e. resource provisioning



 ●   glideinWMS will grow and shrink the
     overlay B.S. automatically
      ●   No human intervention needed
 ●   Expansion based on user jobs in the queue
      ●   The more jobs, the faster it will try to grow
      ●   Since not all jobs can run at all sites,
          different attempted growth rates for different sites
 ●   Shrinks automatically if resources are unused
      ●   Again, based on user jobs in the queue
 ●   Each glidein should run at least one user job,
     but will try to run many if the Grid slot is long enough

glideinWMS in a picture


 [Figure: Grid sites, glideinWMS, an HTCondor job repository, and an HTCondor CPU handler running a user job at a Grid site]




From the user point of view
 ●   Users see just a “regular” HTCondor system
      ●   Just a dynamic one
 ●   However
      ●   Have to be aware of the resource provisioning logic
           –     No native HTCondor tools to help with this
      ●   Debugging system problems much harder
           –     Again, no native HTCondor tools to help with this
           –     Most likely question a user will ask is
                 “Why is my job not starting?”




A few more details
 ●   So, glideinWMS is really HTCondor++
      ●   First you have to understand how HTCondor works
      ●   If you do, 99% of the problems are solved

 ●   HTCondor is composed of 3 logical pieces
      ●   Submit nodes – keep the job queue(s)
      ●   Execute nodes – own and operate the resources
      ●   A central manager – glues the other two categories
                             together and executes policies



HTCondor in a picture

 [Figure: three submit nodes and five execute nodes, all running Condor, connected through a central manager that also runs Condor]




Even more details
 ●   The actual work quanta are
      ●   Jobs – on the submit node, typically many/node
      ●   Slots – on the execute node, typically only a few /node
 ●   While internally implemented differently,
     jobs and slots are conceptually very similar
      ●   Both describe a logical entity
      ●   Both have attributes describing it
      ●   Both have requirements
      ●   Such a description is a “ClassAd” in HTCondor speak
 ●   HTCondor policy engine is really all about
     matchmaking jobs to slots

Matchmaking
 ●   Jobs that are not running
     (i.e. are “idle” in HTCondor speak)
     will be matched against Slots
     that don't yet run anything
     (i.e. are “Unclaimed” in HTCondor speak)
 ●   Requirements expressions can (and usually do)
     reference attributes in the other ClassAd
 ●   Both sides must evaluate to True for a match
      ●   Although we encourage all logic to reside in the
          Slot Requirement in glideinWMS setups
          (more on this later on)



A non-technical example

  Buyer Ad
  MyType = "Buyer"
  TargetType = "Pet"
  Requirements =
      (PetType == "Dog") &&
      (Price <= AcctBalance) &&
      (Size == "Large" || Size == "Very Large")
  AcctBalance = 1000
  DogLover = True
  LegalName = "Curly Howard"
  ...

  Pet Ad
  MyType = "Pet"
  TargetType = "Buyer"
  Requirements =
      DogLover == True
  PetType = "Dog"
  Color = "Brown"
  Price = 75
  Breed = "Saint Bernard"
  Size = "Very Large"
  ...

  Buyer ~= Job          Dog == Resource ~= Slot
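
The two ads above can be turned into a small runnable illustration of symmetric matchmaking. This is only a sketch in plain Python, not the real ClassAd language or any HTCondor API: the ads are dictionaries, each Requirements expression is a function of "my own ad" and "the other ad", and the helper name `ads_match` is made up for this example.

```python
# Toy symmetric matchmaking, modeled on the Buyer/Pet ads.
# Plain Python only; this is NOT the ClassAd language or an HTCondor API.

buyer_ad = {
    "MyType": "Buyer",
    "TargetType": "Pet",
    "AcctBalance": 1000,
    "DogLover": True,
    "LegalName": "Curly Howard",
    # Requirements may read both the ad's own attributes (my)
    # and the attributes of the candidate ad (target).
    "Requirements": lambda my, target: (
        target.get("PetType") == "Dog"
        and target.get("Price", float("inf")) <= my["AcctBalance"]
        and target.get("Size") in ("Large", "Very Large")
    ),
}

pet_ad = {
    "MyType": "Pet",
    "TargetType": "Buyer",
    "PetType": "Dog",
    "Color": "Brown",
    "Price": 75,
    "Breed": "Saint Bernard",
    "Size": "Very Large",
    "Requirements": lambda my, target: target.get("DogLover") is True,
}


def ads_match(ad_a, ad_b):
    """Return True only if the types line up and BOTH Requirements hold."""
    types_ok = (ad_a["TargetType"] == ad_b["MyType"]
                and ad_b["TargetType"] == ad_a["MyType"])
    return (types_ok
            and ad_a["Requirements"](ad_a, ad_b)
            and ad_b["Requirements"](ad_b, ad_a))


if __name__ == "__main__":
    print(ads_match(buyer_ad, pet_ad))
```

Changing a single attribute on either side (for example, a pet that is not a dog, or a buyer whose balance is below the price) is enough to break the match, because both sides must evaluate to True.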

Matching order
 ●   Most of the time, there are
     way more idle jobs than Unclaimed slots
      ●   So order is important
 ●   Two policies
      1) Jobs from the highest priority user first
      2) Priority-FIFO policy for jobs of the same user
 ●   User priority based on usage
      ●   The more resources you use, the lower the priority
          (with priority recovery over time)
      ●   But some users may be marked as “more important”
          (priority multipliers and group quotas)
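
The ordering described above can be sketched in a few lines of Python. This is a simplified illustration, not HTCondor's actual negotiator code: the names `Job`, `decayed_usage`, `effective_priority` and `matching_order` are invented here, the half-life decay is just one plausible way to model "priority recovery over time", and group quotas are not modeled at all.

```python
from dataclasses import dataclass


@dataclass
class Job:
    owner: str
    job_prio: int      # within one user, higher runs first
    submit_time: int   # on equal job_prio, earlier runs first (FIFO)


def decayed_usage(usage, elapsed, half_life):
    """Halve the accumulated usage every `half_life` time units (assumed model)."""
    return usage * 0.5 ** (elapsed / half_life)


def effective_priority(usage, factor=1.0):
    """Lower value is served first; factor < 1 marks a 'more important' user."""
    return usage * factor


def matching_order(jobs, user_usage, user_factor=None):
    """Order idle jobs: best user first, then Priority-FIFO within each user."""
    user_factor = user_factor or {}

    def key(job):
        prio = effective_priority(user_usage.get(job.owner, 0.0),
                                  user_factor.get(job.owner, 1.0))
        # owner is part of the key so that one user's jobs stay together on ties
        return (prio, job.owner, -job.job_prio, job.submit_time)

    return sorted(jobs, key=key)


if __name__ == "__main__":
    jobs = [Job("alice", 0, 10), Job("bob", 0, 5), Job("alice", 5, 20)]
    usage = {"alice": 100.0, "bob": 400.0}
    for job in matching_order(jobs, usage):
        print(job)
```

In this toy run alice has used fewer resources than bob, so both of her jobs come first, and her job with the higher job priority precedes the one she submitted earlier.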


The glideinWMS part
 ●   i.e. the layer on top of HTCondor that
     finds resources on which to start the
     “Execute node” daemons
      ●   i.e. glideins
 ●   Composed of two parts:
      ●   Glidein Factory – The abstraction layer
      ●   VO Frontend     – The brain




Glidein Factory
●   The splitting in two allows for the
    Glidein Factory (G.F.) to be generic
     ●   i.e. it can (and should) be shared between many VOs,
         serving many VO Frontends (VO FEs)
●   The G.F. is really just an abstraction layer
     ●   It insulates the VO from the provisioning details
           –     e.g. knowing the name of the Grid CE and relative RSL
     ●   Allows new technology to be added seamlessly
         (e.g. Clouds)
●   It also provides a troubleshooting service
     ●   The factory operators are supposed to
         address any Grid related problems they observe

The VO Frontend
 ●   The name may be misleading
      ●   It is really the “matchmaker of Grid resources”
 ●   Introduces a new quantum
      ●   Entry – logical equivalent of a “queue at a site”
                  Basic working block of a G.F.
 ●   The VO Frontend
      1) Matches idle Jobs to Entries
      2) Instructs the affected G.F. to
         increase or decrease the number
         of glideins on that Entry
      ●   Thus it regulates the resource provisioning
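
The two Frontend steps above can be sketched as a small Python function. This is a deliberately naive illustration, not the actual glideinWMS Frontend algorithm: the function name `requested_glideins`, the `desired_sites` job attribute, the `site` Entry attribute and the `max_per_entry` cap are all assumptions made for this example, and a job that matches several Entries is simply counted once for each of them.

```python
def requested_glideins(idle_jobs, entries, max_per_entry=10):
    """Map each Entry name to the number of glideins to ask the Factory for.

    Step 1: match idle jobs to Entries using job attributes only.
    Step 2: derive a per-Entry request; 0 lets that Entry shrink.
    """
    requests = {}
    for name, entry in entries.items():
        matching = sum(1 for job in idle_jobs
                       if entry["site"] in job["desired_sites"])
        requests[name] = min(matching, max_per_entry)
    return requests


if __name__ == "__main__":
    # Hypothetical Entries and idle jobs, for illustration only.
    entries = {
        "entry_A": {"site": "SiteA"},
        "entry_B": {"site": "SiteB"},
        "entry_C": {"site": "SiteC"},
    }
    idle_jobs = ([{"desired_sites": {"SiteA"}}] * 3
                 + [{"desired_sites": {"SiteB"}}])
    print(requested_glideins(idle_jobs, entries))
```

Because the requests are recomputed from the current idle jobs, an Entry with no matching jobs gets a request of zero, which is how the overlay shrinks without human intervention in this sketch.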

Updated glideinWMS picture
 [Figure: the HTCondor picture (submit nodes, central manager, execute nodes, all running Condor), extended with a VO FE that sends requests (+3, +1) to two G.F.s, which start the execute nodes on the Grid]




VO FE Matchmaking logic
 ●   Based on Job attributes
      ●   Jobs don't have “FE-specific requirements”
 ●   The exact matchmaking policy
     depends on the VO FE instance
      ●   Will describe CMS policies in a different talk
 ●   glideinWMS has 2 level matchmaking
      ●   Once in the FE, then in the HTCondor C.M.
      ●   Recommended to avoid explicit
          “HTCondor requirements” in the Job ClassAd
           –     The glideins should set “Slot requirements” based on the
                 same attributes used by the VO FE, instead
           –     Since the VO FE configures the glideins

What is the user to do?
 0) Learn how to use HTCondor
 1) Learn what the VO FE policy is
 2) Create the HTCondor submit file (i.e. JDL)
   containing the necessary attributes
 3) Submit jobs
 4) Wait for the results to come back
 5) Rinse and repeat (from (2))
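
Step (2) can be illustrated with a short Python helper that writes out an HTCondor submit description. This is a sketch under assumptions: the helper name `render_submit_file`, the file names and the attribute names `DesiredSites` and `MaxWallTimeMins` are placeholders invented here, since the attributes actually required are defined by each VO FE policy. The `+Name = value` lines add custom attributes to the Job ClassAd, and no explicit requirements line is emitted, in line with the recommendation to leave that logic to the slots.

```python
def render_submit_file(executable, n_jobs, custom_attrs):
    """Render a minimal HTCondor submit description as text.

    custom_attrs holds the job attributes that the VO FE policy asks for;
    the names used in the example below are hypothetical placeholders.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "output = job.$(Cluster).$(Process).out",
        "error = job.$(Cluster).$(Process).err",
        "log = job.$(Cluster).log",
    ]
    for name, value in custom_attrs.items():
        # ClassAd strings need double quotes; numbers and booleans do not.
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        lines.append(f"+{name} = {rendered}")
    lines.append(f"queue {n_jobs}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_submit_file(
        "analyze.sh",
        5,
        {"DesiredSites": "SiteA,SiteB", "MaxWallTimeMins": 120},
    )
    print(text)
```

The resulting text can be saved to a file and handed to `condor_submit` for step (3).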




This is it


             ●   Hopefully you have a high level view of
                 the system now
             ●   More details in separate talks




Pointers
 ●   glideinWMS Home Page
     http://tinyurl.com/glideinWMS
 ●   HTCondor Home Page
     http://research.cs.wisc.edu/htcondor/
 ●   HTCondor support
     htcondor-users@cs.wisc.edu
     htcondor-admin@cs.wisc.edu
 ●   glideinWMS support
     glideinwms-support@fnal.gov


Acknowledgments
 ●   The creation of this document was sponsored
     by grants from the US NSF and US DOE,
     and by the University of California system





O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accounting
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resources
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rate
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance compute
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific Output
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobs
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYRO
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud Burst
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with Admiralty
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACC
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUs
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public Clouds
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud links
 

Dernier

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 

Dernier (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

Introduction to glideinWMS

  • 1. glideinWMS for users Introduction to glideinWMS by Igor Sfiligoi (UCSD) CERN, Dec 2012 glideinWMS Intro 1
  • 2. Scope of this talk This talk provides a user perspective of glideinWMS for users with previous experience with Grid computing. It does not provide much detail but concentrates on the concepts behind the system instead.
  • 3. The problem(s) ● Users have many jobs that must be run ● Each user has multiple tasks at once ● Resources provided by O(100) Grid sites ● How do we schedule them to get the results in the shortest amount of time? ● Assuming one result per task ● How do we treat all users in a fair way? ● Independently of how many jobs they submit
  • 4. glideinWMS approach ● Separates ● Resource provisioning from ● Resource scheduling ● In practice, sends out pilot jobs to the Grid (never the user jobs!) ● And creates an “overlay batch system” ● Pilot jobs get ownership of the Grid slots ● At least for a limited time ● Also known as the pilot approach [Figure: several Grid sites joined into one overlay batch system]
  • 5. Job scheduling ● Once we have the overlay batch system (B.S.), job scheduling works like in any dedicated B.S. ● We own the B.S. and can set the job scheduling policies ● glideinWMS is based on HTCondor ● So it can do whatever HTCondor can do ● Which is quite flexible – But also nothing more... ● HTCondor (formerly known as Condor) is a widely used batch system. ● HTCondor-based pilots are usually called glideins. Thus glideinWMS stands for “glidein based Workflow Management System”. More details later on.
  • 6. Creating the overlay (i.e. resource provisioning) ● glideinWMS will grow and shrink the overlay B.S. automatically ● No human intervention needed ● Expansion based on user jobs in the queue ● The more jobs, the faster it will try to grow ● Since not all jobs can run at all sites, different attempted growth rates for different sites ● Shrinks automatically if resources are unused ● Again, based on user jobs in the queue ● Each glidein should run at least one user job, but will try to run many if the Grid slot is long enough
  • 7. glideinWMS in a picture [Figure labels: Grid sites, glideinWMS, HTCondor CPU Handler, User Job, HTCondor Job Repository]
  • 8. From the user point of view ● Users see just a “regular” HTCondor system ● Just a dynamic one ● However ● They have to be aware of the resource provisioning logic – No native HTCondor tools to help with this ● Debugging system problems is much harder – Again, no native HTCondor tools to help with this – The most likely question a user will ask is “Why is my job not starting?”
  • 9. A few more details ● So, glideinWMS is really HTCondor++ ● First you have to understand how HTCondor works ● If you do, 99% of the problems are solved ● HTCondor is composed of 3 logical pieces ● Submit nodes – keep the job queue(s) ● Execute nodes – own and operate a resource ● A central manager – glues the other two categories together and executes policies
  • 10. HTCondor in a picture [Figure labels: Central manager, Submit nodes, Execute nodes, Condor]
  • 11. Even more details ● The actual work quanta are ● Jobs – on the submit node, typically many/node ● Slots – on the execute node, typically only a few/node ● While internally implemented differently, jobs and slots are conceptually very similar ● Both describe a logical entity (a “ClassAd” in HTCondor speak) ● Both have attributes describing it ● Both have requirements ● HTCondor policy engine is really all about matchmaking jobs to slots
  • 12. Matchmaking ● Jobs that are not running (i.e. are “idle” in HTCondor speak) will be matched against Slots that don't yet run anything (i.e. are “Unclaimed” in HTCondor speak) ● Requirements expressions can (and usually do) reference attributes in the other ClassAd ● Both sides must evaluate to True for a match ● Although we encourage all logic to reside in the Slot Requirement in glideinWMS setups (more on this later on)
  • 13. A non-technical example ● Buyer Ad: MyType = "Buyer"; TargetType = "Pet"; Requirements = (PetType == "Dog") && (Price <= AcctBalance) && (Size == "Large" || Size == "Very Large"); AcctBalance = 1000; DogLover = True; LegalName = "Curly Howard"; ... ● Pet Ad: MyType = "Pet"; TargetType = "Buyer"; Requirements = DogLover == True; PetType = "Dog"; Color = "Brown"; Price = 75; Breed = "Saint Bernard"; Size = "Very Large"; ... ● Buyer ~= Job ● Dog == Resource ~= Slot
  • 14. Matching order ● Most of the time, there are way more idle jobs than Unclaimed slots ● So order is important ● Two policies 1) Jobs from the highest priority user first 2) Priority-FIFO policy for jobs of the same user ● User priority is based on usage ● The more resources you use, the lower the priority (with priority recovery over time) ● But some users may be marked as “more important” (priority multipliers and group quotas)
  • 15. The glideinWMS part ● i.e. the layer on top of HTCondor that finds resources on which to start the “Execute node” daemons ● i.e. glideins ● Composed of two parts: ● Glidein Factory – The abstraction layer ● VO Frontend – The brain
  • 16. Glidein Factory ● The splitting in two allows for the Glidein Factory to be generic (i.e. serve many VO FEs) ● I.e. it can (and should) be shared between many VOs ● The G.F. is really just an abstraction layer ● It insulates the VO from the provisioning details – e.g. knowing the name of the Grid CE and relative RSL ● Allows new technology to be added seamlessly (e.g. Clouds) ● It also provides a troubleshooting service ● The factory operators are supposed to address any Grid related problems they observe
  • 17. The VO Frontend ● The name may be misleading ● It is really the “matchmaker of Grid resources” ● Introduces a new quantum ● Entry – logical equivalent of a “queue at a site”; the basic working block of a G.F. ● The VO Frontend 1) Matches idle Jobs to Entries 2) Instructs the affected G.F. to increase or decrease the number of glideins on that Entry ● Thus it regulates the resource provisioning
  • 18. Updated glideinWMS picture [Figure labels: VO FE, G.F. (+3), G.F. (+1), Grid, Central manager, Submit nodes, Execute nodes, Condor]
  • 19. VO FE Matchmaking logic ● Based on Job attributes ● Jobs don't have “FE-specific requirements” ● The exact matchmaking policy depends on the VO FE instance (CMS policies will be described in a different talk) ● glideinWMS has 2-level matchmaking ● Once in the FE, then in the HTCondor C.M. ● Recommended to avoid explicit “HTCondor requirements” in the Job ClassAd – The glideins should set “Slot requirements” based on the same attributes used by the VO FE instead (since the VO FE configures the glideins)
  • 20. What is the user to do? 0) Learn how to use HTCondor 1) Learn what the VO FE policy is 2) Create the HTCondor submit file (i.e. JDL) containing the necessary attributes 3) Submit jobs 4) Wait for the results to come back 5) Rinse and repeat (from (2))
  • 21. This is it ● Hopefully you have a high level view of the system now ● More details in separate talks
  • 22. Pointers ● glideinWMS Home Page: http://tinyurl.com/glideinWMS ● HTCondor Home Page: http://research.cs.wisc.edu/htcondor/ ● HTCondor support: htcondor-users@cs.wisc.edu, htcondor-admin@cs.wisc.edu ● glideinWMS support: glideinwms-support@fnal.gov
  • 23. Acknowledgments ● The creation of this document was sponsored by grants from the US NSF and US DOE, and by the University of California system
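
The two-sided matchmaking of slides 12 and 13, and the matching order of slide 14, can be illustrated with a small sketch. This is a toy model in plain Python, not HTCondor code and not the real ClassAd language: the `Ad` class, `is_match`, `match_order` and `negotiate` are hypothetical names introduced here, requirements are modelled as Python functions, and user priority is reduced to a single usage number per user (no decay over time, no priority multipliers, no group quotas). The attribute names `Name`, `Owner`, `JobPrio` and `QDate` are used for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Attrs = Dict[str, object]


@dataclass
class Ad:
    """Toy stand-in for a ClassAd: attributes plus a requirements predicate."""
    attrs: Attrs
    # requirements(my_attrs, target_attrs) -> True if the other ad is acceptable
    requirements: Callable[[Attrs, Attrs], bool]


def is_match(a: Ad, b: Ad) -> bool:
    """Both sides' requirements must evaluate to True (slide 12)."""
    return a.requirements(a.attrs, b.attrs) and b.requirements(b.attrs, a.attrs)


# The two ads from slide 13, with Requirements rewritten as Python lambdas.
buyer = Ad(
    attrs={"MyType": "Buyer", "AcctBalance": 1000, "DogLover": True,
           "LegalName": "Curly Howard"},
    requirements=lambda my, target: (
        target.get("PetType") == "Dog"
        and target.get("Price", float("inf")) <= my["AcctBalance"]
        and target.get("Size") in ("Large", "Very Large")
    ),
)
pet = Ad(
    attrs={"MyType": "Pet", "PetType": "Dog", "Color": "Brown", "Price": 75,
           "Breed": "Saint Bernard", "Size": "Very Large"},
    requirements=lambda my, target: target.get("DogLover") is True,
)


def match_order(jobs: List[Ad], usage: Dict[str, float]) -> List[Ad]:
    """Order idle jobs as on slide 14 (simplified).

    1) Users with less accumulated usage go first.
    2) Within one user: higher job priority first, then first-in first-out.
    """
    return sorted(
        jobs,
        key=lambda j: (usage[j.attrs["Owner"]], -j.attrs["JobPrio"], j.attrs["QDate"]),
    )


def negotiate(jobs: List[Ad], slots: List[Ad],
              usage: Dict[str, float]) -> List[Tuple[str, str]]:
    """Give each idle job, in matching order, the first unclaimed slot that matches."""
    unclaimed = list(slots)
    matched = []
    for job in match_order(jobs, usage):
        for slot in unclaimed:
            if is_match(job, slot):
                matched.append((job.attrs["Name"], slot.attrs["Name"]))
                unclaimed.remove(slot)  # the slot is now claimed
                break
    return matched
```

With these definitions `is_match(buyer, pet)` is true, because the pet satisfies the buyer's three conditions and the buyer is a dog lover. Lowering `AcctBalance` below 75, or setting `DogLover` to `False`, breaks the match from one side only, which is enough to reject it.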