SlideShare une entreprise Scribd logo
1  sur  17
glideinWMS training



                         glideinWMS
                                               -

                      The Larger Picture
                  i.e. Is it something you would be interested in?

                              by Igor Sfiligoi (UCSD)




glideinWMS training             glideinWMS - The Larger Picture      1
Why this talk?


         If you never heard of glideinWMS before,
              you likely have no idea if this is
        a product you would be interested in using.

                              This talk presents
                      glideinWMS in a larger context,
                         allowing you to understand
                       what this product is all about.


glideinWMS training            glideinWMS - The Larger Picture   2
The basics
 ●   glideinWMS has been designed to address the
     needs of High Throughput Computing (HTC)
      ●
          Better known as batch processing
 ●   In a nutshell, we are trying to facilitate
     the effective use
     of a large number of CPUs
     by a large number of users




glideinWMS training    glideinWMS - The Larger Picture   3
High Throughput Computing
 ●   The basic premise of HTC is that there is
     always more demand than available CPUs
 ●   We should make good use of those CPUs
      ●   Keep them busy, ideally, 24x7x365
 ●   Sustained utilization is thus
     more important than peak performance
      ●   Measure of success is
          FLOPY = Floating Points per Year
          not
          FLOPS = Floating Points per Second
glideinWMS training     glideinWMS - The Larger Picture   4
HTC from the user point of view
 ●   As a side effect, users must be HTC-aware
 ●   There are some negative aspects
      ●   No interactive access, only process queuing
            –   Usually referred to as user jobs
      ●   Waiting in line to get access to CPUs
 ●   But the payoff is potentially huge
      ●   A single user can use 1000s CPUs at a time
      ●   Performing in few days
          computations that would
          take several years on a single machine
glideinWMS training           glideinWMS - The Larger Picture   5
HTC in simplified picture

                                   User scheduling
                                   usually not FIFO



                      Repository

                                        Scheduler




glideinWMS training           glideinWMS - The Larger Picture   6
HTC products
 ●   There are many HTC products available
      ●   Although most call themselves “batch systems”
 ●   A non exhaustive list:
      ●   Condor
      ●   PBS, with variants like Torque/Maui
      ●   LSF
      ●   SGE, also known as Oracle Grid Engine




glideinWMS training      glideinWMS - The Larger Picture   7
Why another system?
 ●   All of the mentioned HTC systems
     assume full control
     of the compute resources (i.e. CPUs)
      ●   And there are many places where this is the case
 ●   glideinWMS developed to
     support non-dedicated use
     of compute resources
      ●   i.e. when CPUs are given
          to the system only
          for limited duration at a time

glideinWMS training        glideinWMS - The Larger Picture   8
Non-dedicated resources
 ●   In the past decade, two paradigms emerged
      ●   Grid computing
      ●   Cloud computing
 ●   Both allow a user community to use compute
     resources they don't own
      ●   Often called resource elasticity
 ●   Managing large number of Grid and Cloud
     resources by hand impractical
      ●   glideinWMS creates a HTC system using them

glideinWMS training        glideinWMS - The Larger Picture   9
Grid vs Cloud
                              (a short summary)

●    Grid computing is                     ●   (Commercial) Clouds are
     basically a federation                    about leasing resources
     of HTC clusters                           on a pay-as-you-go basis
       ●   Thus recently called                 ●    And they happen to use
           Distributed HTC                           virtualization
●    Job queuing is a                      ●   Instances expected to
     native paradigm                           start almost immediately
               ●   So-called “scientific clouds” are typically
                   just Grid systems that use virtualization
                   (and a different middleware stack)
    glideinWMS training        glideinWMS - The Larger Picture                10
Grid vs Cloud
                              (a short summary)

●    Grid computing is                     ●   (Commercial) Clouds are
     basically a federation                    about leasing resources
     of HTC clusters                           on a pay-as-you-go basis
                                                 glideinWMS
                                           currently optimized
       ●   Thus recently called               ● And they happen to use
                                            for the Grid model
           Distributed HTC                       virtualization
●    Job queuing is a                      ●   Instances expected to
     native paradigm                           start almost immediately
               ●   So-called “scientific clouds” are typically
                   just Grid systems that use virtualization
                   (and a different middleware stack)
    glideinWMS training        glideinWMS - The Larger Picture           11
glideinWMS and the Grid
                      (Cloud resources are used in a similar way)


 ●   glideinWMS creates
     an overlay system on top of
     the various HTC clusters
                                                                            HTC
      ●   From the user community                               HTC
          point of view,                                              glideinWMS
          a single HTC system                                             HTC      HTC

                                                                HTC
      ●   Just a dynamic one                                               HTC
 ●   glideinWMS
     completely automates
     the process

glideinWMS training           glideinWMS - The Larger Picture                            12
Implementation and support
 ●   glideinWMS heavily based on Condor
      ●   Essentially a thin layer on top of it
 ●   Most of the software support thus coming
     from the Condor development team
      ●   At University of Wisconsin – Madison
          http://research.cs.wisc.edu/condor/
 ●   The glideinWMS-specific layer supported by a
     team spanning Fermilab, UCSD and ISI
     http://tinyurl.com/glideinWMS



glideinWMS training           glideinWMS - The Larger Picture   13
glideinWMS and Condor
 ●   Condor handles the HTC system
      ●   Most Condor features thus available
 ●   glideinWMS role limited to scheduling,
     configuring and starting the Condor process
     on the compute resources
                                                                                   HTC
                           glideinWMS
                                                                       Condor
                                                                     CPU Handler

                                                                            User Job
                       Condor
                         Job
                      Repository


glideinWMS training                glideinWMS - The Larger Picture                       14
Summary
 ●   glideinWMS is a HTC product
      ●   i.e. enables effective use of a large number
          of CPUs by a large number of users
 ●   glideinWMS creates a HTC system out of
     non‑dedicated compute resources
      ●   e.g. Grid and Cloud resources
 ●   glideinWMS is heavily based on Condor
      ●   thus benefits from the Condor team support


glideinWMS training      glideinWMS - The Larger Picture   15
Pointers
 ●   glideinWMS development team is reachable at
     glideinwms-support@fnal.gov
 ●   The official project Web page is
     http://tinyurl.com/glideinWMS
 ●   OSG glidein factory at UCSD
     http://hepuser.ucsd.edu/twiki2/bin/view/UCSDTier2/OSGgfactory
     http://glidein-1.t2.ucsd.edu:8319/glidefactory/monitor/glidein_Production_v4_1/factoryStatus.html




glideinWMS training                      glideinWMS - The Larger Picture                                 16
Acknowledgments
 ●   This document was sponsored by grants from
     the US NSF and US DOE,
     and by the UC system




glideinWMS training      glideinWMS - The Larger Picture   17

Contenu connexe

Similaire à glideinWMS - The Larger Picture

Cloud Computing & Windows Azure
Cloud Computing & Windows AzureCloud Computing & Windows Azure
Cloud Computing & Windows Azure
yeschandana
 

Similaire à glideinWMS - The Larger Picture (20)

Web scale with-nutanix_rev
Web scale with-nutanix_revWeb scale with-nutanix_rev
Web scale with-nutanix_rev
 
Up is Down, Black is White: Using SCCM for Wrong and Right
Up is Down, Black is White: Using SCCM for Wrong and RightUp is Down, Black is White: Using SCCM for Wrong and Right
Up is Down, Black is White: Using SCCM for Wrong and Right
 
VMworld 2015: The “Snappy” Virtual Desktop User Experience
VMworld 2015: The “Snappy” Virtual Desktop User ExperienceVMworld 2015: The “Snappy” Virtual Desktop User Experience
VMworld 2015: The “Snappy” Virtual Desktop User Experience
 
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
DevOpsDays Tel Aviv DEC 2022 | Building A Cloud-Native Platform Brick by Bric...
 
Kernel Recipes 2017 - The state of kernel self-protection - Kees Cook
Kernel Recipes 2017 - The state of kernel self-protection - Kees CookKernel Recipes 2017 - The state of kernel self-protection - Kees Cook
Kernel Recipes 2017 - The state of kernel self-protection - Kees Cook
 
Webinar: Enterprise Cloud Migration - 4 Problems to Solve
Webinar: Enterprise Cloud Migration - 4 Problems to SolveWebinar: Enterprise Cloud Migration - 4 Problems to Solve
Webinar: Enterprise Cloud Migration - 4 Problems to Solve
 
Where Did All These Cycles Go?
Where Did All These Cycles Go?Where Did All These Cycles Go?
Where Did All These Cycles Go?
 
Red Hat Enterprise Linux: Open, hyperconverged infrastructure
Red Hat Enterprise Linux: Open, hyperconverged infrastructureRed Hat Enterprise Linux: Open, hyperconverged infrastructure
Red Hat Enterprise Linux: Open, hyperconverged infrastructure
 
Cloud 101: The Basics of Cloud Computing
Cloud 101: The Basics of Cloud ComputingCloud 101: The Basics of Cloud Computing
Cloud 101: The Basics of Cloud Computing
 
glideinWMS, The OSG overlay DHTC system - OSG School 2014
glideinWMS, The OSG overlay DHTC system - OSG School 2014glideinWMS, The OSG overlay DHTC system - OSG School 2014
glideinWMS, The OSG overlay DHTC system - OSG School 2014
 
Building Android for the Cloud: Android as a Server (Mobile World Congress 2014)
Building Android for the Cloud: Android as a Server (Mobile World Congress 2014)Building Android for the Cloud: Android as a Server (Mobile World Congress 2014)
Building Android for the Cloud: Android as a Server (Mobile World Congress 2014)
 
Top 10 DevOps Areas Need To Focus
Top 10 DevOps Areas Need To FocusTop 10 DevOps Areas Need To Focus
Top 10 DevOps Areas Need To Focus
 
Why Cloud Computing has to go the FOSS way
Why Cloud Computing has to go the FOSS wayWhy Cloud Computing has to go the FOSS way
Why Cloud Computing has to go the FOSS way
 
Cloud Computing & Windows Azure
Cloud Computing & Windows AzureCloud Computing & Windows Azure
Cloud Computing & Windows Azure
 
Virtualization and its importance and implementation levels
Virtualization and its importance and implementation levelsVirtualization and its importance and implementation levels
Virtualization and its importance and implementation levels
 
Building android for the Cloud: Android as a Server (AnDevConBoston 2014)
Building android for the Cloud: Android as a Server (AnDevConBoston 2014)Building android for the Cloud: Android as a Server (AnDevConBoston 2014)
Building android for the Cloud: Android as a Server (AnDevConBoston 2014)
 
Cloud computing basics
Cloud computing basicsCloud computing basics
Cloud computing basics
 
A Seminar on Cloud Computing
A Seminar on Cloud ComputingA Seminar on Cloud Computing
A Seminar on Cloud Computing
 
A guide to modern software development 2018
A guide to modern software development 2018A guide to modern software development 2018
A guide to modern software development 2018
 
Webinar: Preparing for Disasters that Will Actually Happen
Webinar: Preparing for Disasters that Will Actually HappenWebinar: Preparing for Disasters that Will Actually Happen
Webinar: Preparing for Disasters that Will Actually Happen
 

Plus de Igor Sfiligoi

Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...
Igor Sfiligoi
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accounting
Igor Sfiligoi
 

Plus de Igor Sfiligoi (20)

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYRO
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accounting
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resources
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rate
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance compute
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific Output
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobs
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYRO
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud Burst
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with Admiralty
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACC
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUs
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public Clouds
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud links
 

Dernier

Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
panagenda
 
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
Muhammad Subhan
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
FIDO Alliance
 

Dernier (20)

Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideCollecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & Ireland
 
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
“Iamnobody89757” Understanding the Mysterious of Digital Identity.pdf
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 

glideinWMS - The Larger Picture

  • 1. glideinWMS training glideinWMS - The Larger Picture i.e. Is it something you would be interested in? by Igor Sfiligoi (UCSD) glideinWMS training glideinWMS - The Larger Picture 1
  • 2. Why this talk? If you never heard of glideinWMS before, you likely have no idea if this is a product you would be interested in using. This talk presents glideinWMS in a larger context, allowing you to understand what this product is all about. glideinWMS training glideinWMS - The Larger Picture 2
  • 3. The basics ● glideinWMS has been designed to address the needs of High Throughput Computing (HTC) ● Better known as batch processing ● In a nutshell, we are trying to facilitate the effective use of a large number of CPUs by a large number of users glideinWMS training glideinWMS - The Larger Picture 3
  • 4. High Throughput Computing ● The basic premise of HTC is that there is always more demand than available CPUs ● We should make good use of those CPUs ● Keep them busy, ideally, 24x7x365 ● Sustained utilization is thus more important than peak performance ● Measure of success is FLOPY = Floating Points per Year not FLOPS = Floating Points per Second glideinWMS training glideinWMS - The Larger Picture 4
  • 5. HTC from the user point of view ● As a side effect, users must be HTC-aware ● There are some negative aspects ● No interactive access, only process queuing – Usually referred to as user jobs ● Waiting in line to get access to CPUs ● But the payoff is potentially huge ● A single user can use 1000s CPUs at a time ● Performing in few days computations that would take several years on a single machine glideinWMS training glideinWMS - The Larger Picture 5
  • 6. HTC in simplified picture User scheduling usually not FIFO Repository Scheduler glideinWMS training glideinWMS - The Larger Picture 6
  • 7. HTC products ● There are many HTC products available ● Although most call themselves “batch systems” ● A non exhaustive list: ● Condor ● PBS, with variants like Torque/Maui ● LSF ● SGE, also known as Oracle Grid Engine glideinWMS training glideinWMS - The Larger Picture 7
  • 8. Why another system? ● All of the mentioned HTC systems assume full control of the compute resources (i.e. CPUs) ● And there are many places where this is the case ● glideinWMS developed to support non-dedicated use of compute resources ● i.e. when CPUs are given to the system only for limited duration at a time glideinWMS training glideinWMS - The Larger Picture 8
  • 9. Non-dedicated resources ● In the past decade, two paradigms emerged ● Grid computing ● Cloud computing ● Both allow a user community to use compute resources they don't own ● Often called resource elasticity ● Managing large number of Grid and Cloud resources by hand impractical ● glideinWMS creates a HTC system using them glideinWMS training glideinWMS - The Larger Picture 9
  • 10. Grid vs Cloud (a short summary) ● Grid computing is ● (Commercial) Clouds are basically a federation about leasing resources of HTC clusters on a pay-as-you-go basis ● Thus recently called ● And they happen to use Distributed HTC virtualization ● Job queuing is a ● Instances expected to native paradigm start almost immediately ● So-called “scientific clouds” are typically just Grid systems that use virtualization (and a different middleware stack) glideinWMS training glideinWMS - The Larger Picture 10
  • 11. Grid vs Cloud (a short summary) ● Grid computing is ● (Commercial) Clouds are basically a federation about leasing resources of HTC clusters on a pay-as-you-go basis glideinWMS currently optimized ● Thus recently called ● And they happen to use for the Grid model Distributed HTC virtualization ● Job queuing is a ● Instances expected to native paradigm start almost immediately ● So-called “scientific clouds” are typically just Grid systems that use virtualization (and a different middleware stack) glideinWMS training glideinWMS - The Larger Picture 11
  • 12. glideinWMS and the Grid (Cloud resources are used in a similar way) ● glideinWMS creates an overlay system on top of the various HTC clusters HTC ● From the user community HTC point of view, glideinWMS a single HTC system HTC HTC HTC ● Just a dynamic one HTC ● glideinWMS completely automates the process glideinWMS training glideinWMS - The Larger Picture 12
  • 13. Implementation and support ● glideinWMS heavily based on Condor ● Essentially a thin layer on top of it ● Most of the software support thus coming from the Condor development team ● At University of Wisconsin – Madison http://research.cs.wisc.edu/condor/ ● The glideinWMS-specific layer supported by a team spanning Fermilab, UCSD and ISI http://tinyurl.com/glideinWMS glideinWMS training glideinWMS - The Larger Picture 13
  • 14. glideinWMS and Condor ● Condor handles the HTC system ● Most Condor features thus available ● glideinWMS role limited to scheduling, configuring and starting the Condor process on the compute resources HTC glideinWMS Condor CPU Handler User Job Condor Job Repository glideinWMS training glideinWMS - The Larger Picture 14
  • 15. Summary ● glideinWMS is a HTC product ● i.e. enables effective use of a large number of CPUs by a large number of users ● glideinWMS creates a HTC system out of non‑dedicated compute resources ● e.g. Grid and Cloud resources ● glideinWMS is heavily based on Condor ● thus benefits from the Condor team support glideinWMS training glideinWMS - The Larger Picture 15
  • 16. Pointers ● glideinWMS development team is reachable at glideinwms-support@fnal.gov ● The official project Web page is http://tinyurl.com/glideinWMS ● OSG glidein factory at UCSD http://hepuser.ucsd.edu/twiki2/bin/view/UCSDTier2/OSGgfactory http://glidein-1.t2.ucsd.edu:8319/glidefactory/monitor/glidein_Production_v4_1/factoryStatus.html glideinWMS training glideinWMS - The Larger Picture 16
  • 17. Acknowledgments ● This document was sponsored by grants from the US NSF and US DOE, and by the UC system glideinWMS training glideinWMS - The Larger Picture 17