SlideShare une entreprise Scribd logo
1  sur  39
Télécharger pour lire hors ligne
Condor Week 2012


          An argument for moving the
         requirements out of user hands
                                             -

                                The CMS
                                experience
                              presented by Igor Sfiligoi
                                      UC San Diego




Condor Week 2012 - May 2012         No user requirements   1
The basics
 ●   Condor architecture clearly separates
      ●   Resource providers                       Machines (aka worker nodes)
                                                        CPUs, Memory, IO,...
          from
      ●   Resource consumers                           Job queues (aka submit nodes)
                                                           Jobs submitted by users
 ●   Each has a daemon
     process to represent it                                    Schedd
                                                                         Negotiator
          Startd for resource provides




                                                                ...
      ●                                                                  Collector
                                                                Schedd
                                                                                                   Startd
      ●   Schedd for resource consumers                                                      ...
                                                                                         Startd
 ●   A central service connects them all                                           ...
                                                                               Startd
      ●
          Managed by a Collector/Negotiator pair

Condor Week 2012 - May 2012     No user requirements                                                 2
Matchmaking
 ●    In order for a job to start running on a resource
       ●   The job requirements
           must evaluate to True
       ●   The machine requirements
           must evaluate to True



     There is also the ranking,
     but that's 2nd level optimization




Condor Week 2012 - May 2012              No user requirements   3
Matchmaking
 ●    In order for a job to start running on a resource
       ●   The job requirements
           must evaluate to True
       ●   The machine requirements                               Most manuals
           must evaluate to True                                focus on job reqs




     There is also the ranking,
     but that's 2nd level optimization




Condor Week 2012 - May 2012              No user requirements                       4
Matchmaking
 ●    In order for a job to start running on a resource
       ●   The job requirements
           must evaluate to True
       ●   The machine requirements                                   Most manuals
           must evaluate to True                                    focus on job reqs




     There is also the ranking,
     but that's 2nd level optimization                    Machine reqs
                                                       deemed for handling
                                                         Owner state only



Condor Week 2012 - May 2012              No user requirements                           5
My argument


                          Get rid of
                      Job Requirements

               Put all logic in the
              Machine Requirements

Condor Week 2012 - May 2012     No user requirements   6
Some background
 ●   I am a big glideinWMS user                                          http://tinyurl.com/glideinWMS



 ●   And glideinWMS has 2 level matchmaking
      ●   One at VO Frontend level – where to send glideins
      ●   One at Negotiator level – which job to start in a glidein
                                 Collector
       Schedd                    Negotiator

                                                               glidein
                                                             Execution node

                                                                Startd

          Frontend                                                   Job
                                Factory

Condor Week 2012 - May 2012           No user requirements                                       7
Matchmaking problem
 ●   Both levels must be in sync, or you either
      ●   Ask for glideins which never match any jobs


                                   Job matched to site
                                      in VO Frontend
                                             but
                                   not to site's machine
                                     in the Negotiator




Condor Week 2012 - May 2012   No user requirements         8
Matchmaking problem
 ●   Both levels must be in sync, or you either
      ●   Ask for glideins which never match any jobs, or
      ●   Have job waiting in the queue when site available


                                        Job doesn't match to site
                                              in VO Frontend
                                                    but
                                         would to site's machine
                                             in the Negotiator
                                       if glideins were requested




Condor Week 2012 - May 2012   No user requirements                  9
Matchmaking problem
 ●   Both levels must be in sync, or you either
      ●   Ask for glideins which never match any jobs, or
      ●   Have job waiting in the queue when site available
 ●   But site and machine adds have
     different attributes
      ●   Not all machines on a site are exactly the same

                                  So I need
                                 2 different
                                requirements



Condor Week 2012 - May 2012   No user requirements            10
The usual dilemma
 ●   Where do I define these requirements?
      ●   In user job ClassAd?
      ●   Or in the Resource ClassAds?
            –   And we have 2 different resource types here




Condor Week 2012 - May 2012      No user requirements         11
The usual dilemma
 ●   Where do I define these requirements?
      ●   In user job ClassAd?
      ●   Or in the Resource ClassAds?
            –   And we have 2 different resource types here




      The 2 resources
         are handled
     by the same admin
      (in VO Frontend)




Condor Week 2012 - May 2012      No user requirements         12
The usual dilemma
 ●   Where do I define these requirements?
      ●   In user job ClassAd?
      ●   Or in the Resource ClassAds?
            –   And we have 2 different resource types here


                                                             Potentially
                                                               O(1k)
      The 2 resources                                          users!
         are handled
     by the same admin
      (in VO Frontend)
                              Typically

                              O(1)
Condor Week 2012 - May 2012               No user requirements             13
The usual dilemma
 ●   Where do I define these requirements?
      ●   In user job ClassAd?
      ●   Or in the Resource ClassAds?
            –   And we have 2 different resource types here


                                                       And do you really
                                                        trust your users
      The 2 resources                                to do the right thing?
         are handled
     by the same admin
      (in VO Frontend)                                                        Potentially
                              Typically                                   O(1k)
                              O(1)
Condor Week 2012 - May 2012               No user requirements                              14
Moving reqs to the resources
 ●   So we went for setting the requirements
     in the resources themselves




Condor Week 2012 - May 2012   No user requirements   15
Moving reqs to the resources
 ●   So we went for setting the requirements
     in the resources themselves
 ●   But users still need a way to select resources!
      ●   How do they do it???




Condor Week 2012 - May 2012   No user requirements     16
Moving reqs to the resources
 ●   So we went for setting the requirements
     in the resources themselves
 ●   But users still need a way to select resources!
      ●   How do they do it???




                              They express their needs
                                 through attributes!



Condor Week 2012 - May 2012      No user requirements    17
Fixed schema
  ●       The resource provider defines the requirements
                              a.k.a startd + VO Frontend
          ●   The VO Frontend admin in our case
  ●       Those requirements look for well-defined
          user-provided attributes             Site attribute
                                               Machine attribute
                                                                     Job attribute


          entry_req = stringListMember(GLIDEIN_Site,DESIRED_Sites)&&
                      ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False)
Example




          Start   =   stringListMember(GLIDEIN_Site,DESIRED_Sites)&&
                      ((Memory>DESIRED_Mem)=!=False)



Condor Week 2012 - May 2012                   No user requirements                   18
Simple user job submit file
 ●   No complex requirements to write
 ●   Very little user training
 ●   Low error rate
                                    a.submit

                         Executable = a.sh
                         Output = a.out
                         +DESIRED_Sites=”UCSD,Nebraska”
                         Requirements=True
                         queue




Condor Week 2012 - May 2012        No user requirements   19
How well does it work?
 ●   Advantages                                 ●    Disadvantages
      ●   Easy to keep the two                        ●    Rigid schema
          levels in sync
      ●   Easy to define
          reasonable defaults
      ●   Easy on the users
      ●   And more...
          (wait for later slides)




Condor Week 2012 - May 2012         No user requirements                  20
How well does it work?
 ●   Advantages                                 ●    Disadvantages
      ●   Easy to keep the two                        ●    Rigid schema
          levels in sync
      ●   Easy to define
          reasonable defaults
      ●   Easy on the users
      ●   And more...                                         In our experience,
          (wait for later slides)                          does not need to change
                                                                  more than
                                                           a couple of times a year




Condor Week 2012 - May 2012         No user requirements                              21
Is this glideinWMS specific?
 ●   Advantages                           ●    Disadvantages
      ●   Easy to keep the two
              Don't think so!
                                                ●    Rigid schema
          levels in sync


      ●   Easy to define
          reasonable defaults
                                                The ease of use
      ●   Easy on the users                       is there for
                                               any Condor setup



Condor Week 2012 - May 2012   No user requirements                  22
Is this glideinWMS specific?
 ●   Advantages                           ●    Disadvantages
      ●   Easy to keep the two
              Don't think so!
                                                ●    Rigid schema
          levels in sync


      ●  Easy to define
         reasonable defaults
      ● Easy on the users
                                                The ease of use
      I would argue it                            is there for
      should become                            any Condor setup
     standard practice


Condor Week 2012 - May 2012   No user requirements                  23
And there is
                  still more to it


Condor Week 2012 - May 2012   No user requirements   24
Side effect
 ●   Discovered one unexpected nice side effect




Condor Week 2012 - May 2012     No user requirements   25
Side effect
 ●   Discovered one unexpected nice side effect


                              We can outsmart
                                our users!




Condor Week 2012 - May 2012       No user requirements   26
The overflow use case
  ●        Normally, CMS jobs run near the data
           ●   So users provide a whitelist of sites to run on
           ●   And we have the appropriate glideinWMS expression
                        a.submit

          Executable = a.sh
          Output = a.out
          +DESIRED_Sites=”UCSD,Nebraska”
          Requirements=True
           entry_req = stringListMember(GLIDEIN_Site,DESIRED_Sites)&&
          queue
                          ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False)
Example




           Start   =   stringListMember(GLIDEIN_Site,DESIRED_Sites)&&
                       ((Memory>DESIRED_Mem)=!=False)

Condor Week 2012 - May 2012         No user requirements                27
The overflow use case
 ●   Normally, CMS jobs run near the data
 ●   But some jobs could run over the WAN
      ●   It is just slightly less efficient



                              Xrootd based WAN access
                                 if you are interested




Condor Week 2012 - May 2012      No user requirements    28
The overflow use case
 ●   Normally, CMS jobs run near the data
 ●   Jobs could run over the WAN
      ●   It is just slightly less efficient
 ●   But if some CPUs are idle due to low demand
      ●   A low efficiency job is still better than no job!
      ●   As long as it results
          in users getting                    We call this
          their results sooner               “overflowing”
            –   i.e. only if
                “optimal resources” are not available

Condor Week 2012 - May 2012       No user requirements        29
The overflow use case
 ●   Normally, CMS jobs run near the data
 ●   Jobs could run over the WAN
      ●   It is just slightly less efficient
 ●   But if some CPUs are idle due to low demand
      ●   A low efficiency job is still better than no job!
      ●   As long as it results
          in users getting
          their results sooner              So, how do we
           – i.e. only if                  implement this?
               “optimal resources” are not available

Condor Week 2012 - May 2012      No user requirements         30
Overflow configuration
  ●       Essentially, we change the rules!
          ●   Without involving the users
  ●       We write the requirements based on where the
          data is, not where the CPUs are
          ●   Since not all sites have xrootd installed

          entry_req = ((CurrentTime-QDate)>21600)&&(Country=?=”US”)&&
                         stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&&
Example




                         ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False)
          Start = ((CurrentTime-QDate)>21600)&&
                    stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&&
                    ((Memory>DESIRED_Mem)=!=False)

Condor Week 2012 - May 2012       No user requirements                   31
Overflow configuration
  ●        Essentially, we change the rules!
            ●   Without involving the users                No change to the
                                                           user submit file!
  ●        We write the requirements based on where the
           data is, not where the CPUs are
                    a.submit

          Executable = a.sh
            ● Since not all sites have xrootd installed
          Output = a.out
          +DESIRED_Sites=”UCSD,Nebraska”
          Requirements=True
           entry_req = ((CurrentTime-QDate)>21600)&&(Country=?=”US”)&&
          queue            stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&&
Example




                           ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False)
           Start = ((CurrentTime-QDate)>21600)&&
                      stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&&
                      ((Memory>DESIRED_Mem)=!=False)

Condor Week 2012 - May 2012         No user requirements                       32
Overflow configuration
  ●        Essentially,decide
           And we can we change the rules!
  where to overflow from users
    ● Without involving the                                No change to the
        hours after                                        user submit file!
 the jobs werethe requirements
 ● We write submitted                based on where the
           data is, not where the CPUs are
                    a.submit

          Executable = a.sh
            ● Since not all sites have xrootd installed
          Output = a.out
          +DESIRED_Sites=”UCSD,Nebraska”
          Requirements=True
           entry_req = ((CurrentTime-QDate)>21600)&&(Country=?=”US”)&&
          queue                          ”UCSD,Wisc”
                           stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&&
Example




                           ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False)
           Start = ((CurrentTime-QDate)>21600)&&
                      stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&&
                      ((Memory>DESIRED_Mem)=!=False)

Condor Week 2012 - May 2012         No user requirements                       33
Looking
                      at the future


Condor Week 2012 - May 2012   No user requirements   34
Looking at the future
 ●   The attribute schema now a fixed one
      ●   At least regarding matchmaking
      ●   Opens up new interesting possibilities




Condor Week 2012 - May 2012   No user requirements   35
Looking at the future
 ●   The attribute schema now a fixed one
      ●   At least regarding matchmaking
 ●   Can consider RDBMS techniques
      ●   Example uses
            –   RDMBS driven matchmaking
            –   Tracking of job and/or machine history in a DB




Condor Week 2012 - May 2012       No user requirements           36
Looking at the future
 ●   The attribute schema now a fixed one
      ●   At least regarding matchmaking
 ●   Can consider RDBMS techniques
      ●   Example uses
            –   RDMBS driven matchmaking
            –   Tracking of job and/or machine history in a DB
      ●   Will likely need code changes in Condor
            –   UCSD committed to try it our in the near future
            –   If the results end up as good as hoped for,
                will push for official adoption


Condor Week 2012 - May 2012       No user requirements            37
Conclusions
 ●   Matchmaking is made of requirements
      ●   But there are two places where they can be defined
 ●   Usually, users expected to set requirements
      ●   CMS glideinWMS setup moves it completely
          into the resource domain
 ●   Experience with no-user-req setup very positive
      ●   Advocating that this should be the recommended
          way for all Condor deployments
 ●   Also opens up interesting new venues

Condor Week 2012 - May 2012     No user requirements       38
Acknowledgments
 ●   This work is partially sponsored by
      ●   the US National Science Foundation under Grants
          No. PHY-1104549 (AAA) and PHY-0612805
          (CMS Maintenance & Operations)
          and
      ●   the US Department of Energy under Grant No. DE-
          FC02-06ER41436 subcontract No. 647F290 (OSG).




Condor Week 2012 - May 2012       No user requirements   39

Contenu connexe

Similaire à An argument for moving the requirements out of user hands - The CMS Experience

glideinWMS Frontend Installation - Part 2 - Frontend Installation -glideinWM...
 glideinWMS Frontend Installation - Part 2 - Frontend Installation -glideinWM... glideinWMS Frontend Installation - Part 2 - Frontend Installation -glideinWM...
glideinWMS Frontend Installation - Part 2 - Frontend Installation -glideinWM...Igor Sfiligoi
 
Matchmaking in glideinWMS in CMS
Matchmaking in glideinWMS in CMSMatchmaking in glideinWMS in CMS
Matchmaking in glideinWMS in CMSIgor Sfiligoi
 
Monitoring and troubleshooting a glideinWMS-based HTCondor pool
Monitoring and troubleshooting a glideinWMS-based HTCondor poolMonitoring and troubleshooting a glideinWMS-based HTCondor pool
Monitoring and troubleshooting a glideinWMS-based HTCondor poolIgor Sfiligoi
 
Introduction to glideinWMS
Introduction to glideinWMSIntroduction to glideinWMS
Introduction to glideinWMSIgor Sfiligoi
 
Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User Experience
Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User ExperienceNagios Conference 2012 - Nathan Vonnahme - Monitoring the User Experience
Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User ExperienceNagios
 
Evolving to Cloud-Native - Anand Rao
Evolving to Cloud-Native - Anand RaoEvolving to Cloud-Native - Anand Rao
Evolving to Cloud-Native - Anand RaoVMware Tanzu
 
glideinWMS validation scirpts - glideinWMS Training Jan 2012
glideinWMS validation scirpts - glideinWMS Training Jan 2012glideinWMS validation scirpts - glideinWMS Training Jan 2012
glideinWMS validation scirpts - glideinWMS Training Jan 2012Igor Sfiligoi
 
Client Vs. Server Rendering
Client Vs. Server RenderingClient Vs. Server Rendering
Client Vs. Server RenderingDavid Amend
 
glideinWMS Frontend Installation - Part 1 - Condor Installation - glideinWMS ...
glideinWMS Frontend Installation - Part 1 - Condor Installation - glideinWMS ...glideinWMS Frontend Installation - Part 1 - Condor Installation - glideinWMS ...
glideinWMS Frontend Installation - Part 1 - Condor Installation - glideinWMS ...Igor Sfiligoi
 
glideinWMS Architecture - glideinWMS Training Jan 2012
glideinWMS Architecture - glideinWMS Training Jan 2012glideinWMS Architecture - glideinWMS Training Jan 2012
glideinWMS Architecture - glideinWMS Training Jan 2012Igor Sfiligoi
 
Backwards Compatibility: Strategies and Tactics
Backwards Compatibility: Strategies and TacticsBackwards Compatibility: Strategies and Tactics
Backwards Compatibility: Strategies and TacticsCommonsWare
 
Introductiontorails 120804023905-phpapp02
Introductiontorails 120804023905-phpapp02Introductiontorails 120804023905-phpapp02
Introductiontorails 120804023905-phpapp02Sopheak Sem
 
Pavel Prischepa. Wodby
Pavel Prischepa. WodbyPavel Prischepa. Wodby
Pavel Prischepa. Wodbyi20 Group
 
Pavel Prischepa - Wodby
Pavel Prischepa - WodbyPavel Prischepa - Wodby
Pavel Prischepa - WodbyDrupalSib
 
Best Practices in PHP Application Deployment
Best Practices in PHP Application DeploymentBest Practices in PHP Application Deployment
Best Practices in PHP Application DeploymentShahar Evron
 
Drools planner - 2012-10-23 IntelliFest 2012
Drools planner - 2012-10-23 IntelliFest 2012Drools planner - 2012-10-23 IntelliFest 2012
Drools planner - 2012-10-23 IntelliFest 2012Geoffrey De Smet
 

Similaire à An argument for moving the requirements out of user hands - The CMS Experience (20)

glideinWMS Frontend Installation - Part 2 - Frontend Installation -glideinWM...
 glideinWMS Frontend Installation - Part 2 - Frontend Installation -glideinWM... glideinWMS Frontend Installation - Part 2 - Frontend Installation -glideinWM...
glideinWMS Frontend Installation - Part 2 - Frontend Installation -glideinWM...
 
Matchmaking in glideinWMS in CMS
Matchmaking in glideinWMS in CMSMatchmaking in glideinWMS in CMS
Matchmaking in glideinWMS in CMS
 
Monitoring and troubleshooting a glideinWMS-based HTCondor pool
Monitoring and troubleshooting a glideinWMS-based HTCondor poolMonitoring and troubleshooting a glideinWMS-based HTCondor pool
Monitoring and troubleshooting a glideinWMS-based HTCondor pool
 
Glidein internals
Glidein internalsGlidein internals
Glidein internals
 
Introduction to glideinWMS
Introduction to glideinWMSIntroduction to glideinWMS
Introduction to glideinWMS
 
Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User Experience
Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User ExperienceNagios Conference 2012 - Nathan Vonnahme - Monitoring the User Experience
Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User Experience
 
Evolving to Cloud-Native - Anand Rao
Evolving to Cloud-Native - Anand RaoEvolving to Cloud-Native - Anand Rao
Evolving to Cloud-Native - Anand Rao
 
glideinWMS validation scirpts - glideinWMS Training Jan 2012
glideinWMS validation scirpts - glideinWMS Training Jan 2012glideinWMS validation scirpts - glideinWMS Training Jan 2012
glideinWMS validation scirpts - glideinWMS Training Jan 2012
 
Client Vs. Server Rendering
Client Vs. Server RenderingClient Vs. Server Rendering
Client Vs. Server Rendering
 
glideinWMS Frontend Installation - Part 1 - Condor Installation - glideinWMS ...
glideinWMS Frontend Installation - Part 1 - Condor Installation - glideinWMS ...glideinWMS Frontend Installation - Part 1 - Condor Installation - glideinWMS ...
glideinWMS Frontend Installation - Part 1 - Condor Installation - glideinWMS ...
 
glideinWMS Architecture - glideinWMS Training Jan 2012
glideinWMS Architecture - glideinWMS Training Jan 2012glideinWMS Architecture - glideinWMS Training Jan 2012
glideinWMS Architecture - glideinWMS Training Jan 2012
 
Backwards Compatibility: Strategies and Tactics
Backwards Compatibility: Strategies and TacticsBackwards Compatibility: Strategies and Tactics
Backwards Compatibility: Strategies and Tactics
 
Introduction to rails
Introduction to railsIntroduction to rails
Introduction to rails
 
Introductiontorails 120804023905-phpapp02
Introductiontorails 120804023905-phpapp02Introductiontorails 120804023905-phpapp02
Introductiontorails 120804023905-phpapp02
 
Pavel Prischepa. Wodby
Pavel Prischepa. WodbyPavel Prischepa. Wodby
Pavel Prischepa. Wodby
 
Pavel Prischepa - Wodby
Pavel Prischepa - WodbyPavel Prischepa - Wodby
Pavel Prischepa - Wodby
 
Best Practices in PHP Application Deployment
Best Practices in PHP Application DeploymentBest Practices in PHP Application Deployment
Best Practices in PHP Application Deployment
 
Drools planner - 2012-10-23 IntelliFest 2012
Drools planner - 2012-10-23 IntelliFest 2012Drools planner - 2012-10-23 IntelliFest 2012
Drools planner - 2012-10-23 IntelliFest 2012
 
Java interfaces design perspective
Java interfaces design perspectiveJava interfaces design perspective
Java interfaces design perspective
 
Phpers day 2019
Phpers day 2019Phpers day 2019
Phpers day 2019
 

Plus de Igor Sfiligoi

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROIgor Sfiligoi
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...Igor Sfiligoi
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Igor Sfiligoi
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingIgor Sfiligoi
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesIgor Sfiligoi
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateIgor Sfiligoi
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsIgor Sfiligoi
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeIgor Sfiligoi
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Igor Sfiligoi
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessIgor Sfiligoi
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputIgor Sfiligoi
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsIgor Sfiligoi
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROIgor Sfiligoi
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstIgor Sfiligoi
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyIgor Sfiligoi
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCIgor Sfiligoi
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Igor Sfiligoi
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsIgor Sfiligoi
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsIgor Sfiligoi
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksIgor Sfiligoi
 

Plus de Igor Sfiligoi (20)

Preparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYROPreparing Fusion codes for Perlmutter - CGYRO
Preparing Fusion codes for Perlmutter - CGYRO
 
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
O&C Meeting - Evaluation of ARM CPUs for IceCube available through Google Kub...
 
Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...Comparing single-node and multi-node performance of an important fusion HPC c...
Comparing single-node and multi-node performance of an important fusion HPC c...
 
The anachronism of whole-GPU accounting
The anachronism of whole-GPU accountingThe anachronism of whole-GPU accounting
The anachronism of whole-GPU accounting
 
Auto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resourcesAuto-scaling HTCondor pools using Kubernetes compute resources
Auto-scaling HTCondor pools using Kubernetes compute resources
 
Speeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rateSpeeding up bowtie2 by improving cache-hit rate
Speeding up bowtie2 by improving cache-hit rate
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
 
Comparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance computeComparing GPU effectiveness for Unifrac distance compute
Comparing GPU effectiveness for Unifrac distance compute
 
Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...Managing Cloud networking costs for data-intensive applications by provisioni...
Managing Cloud networking costs for data-intensive applications by provisioni...
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific Output
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobs
 
Modest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYROModest scale HPC on Azure using CGYRO
Modest scale HPC on Azure using CGYRO
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud Burst
 
Scheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with AdmiraltyScheduling a Kubernetes Federation with Admiralty
Scheduling a Kubernetes Federation with Admiralty
 
Accelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACCAccelerating microbiome research with OpenACC
Accelerating microbiome research with OpenACC
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
 
Porting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUsPorting and optimizing UniFrac for GPUs
Porting and optimizing UniFrac for GPUs
 
Demonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public CloudsDemonstrating 100 Gbps in and out of the public Clouds
Demonstrating 100 Gbps in and out of the public Clouds
 
TransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud linksTransAtlantic Networking using Cloud links
TransAtlantic Networking using Cloud links
 

Dernier

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 

Dernier (20)

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 

An argument for moving the requirements out of user hands - The CMS Experience

  • 1. Condor Week 2012 An argument for moving the requirements out of user hands - The CMS experience presented by Igor Sfiligoi UC San Diego Condor Week 2012 - May 2012 No user requirements 1
  • 2. The basics ● Condor architecture clearly separates ● Resource providers Machines (aka worker nodes) CPUs, Memory, IO,... from ● Resource consumers Job queues (aka submit nodes) Jobs submitted by users ● Each has a daemon process to represent it Schedd Negotiator Startd for resource provides ... ● Collector Schedd Startd ● Schedd for resource consumers ... Startd ● A central service connects them all ... Startd ● Managed by a Collector/Negotiator pair Condor Week 2012 - May 2012 No user requirements 2
  • 3. Matchmaking ● In order for a job to start running on a resource ● The job requirements must evaluate to True ● The machine requirements must evaluate to True There is also the ranking, but that's 2nd level optimization Condor Week 2012 - May 2012 No user requirements 3
  • 4. Matchmaking ● In order for a job to start running on a resource ● The job requirements must evaluate to True ● The machine requirements Most manuals must evaluate to True focus on job reqs There is also the ranking, but that's 2nd level optimization Condor Week 2012 - May 2012 No user requirements 4
  • 5. Matchmaking ● In order for a job to start running on a resource ● The job requirements must evaluate to True ● The machine requirements Most manuals must evaluate to True focus on job reqs There is also the ranking, but that's 2nd level optimization Machine reqs deemed for handling Owner state only Condor Week 2012 - May 2012 No user requirements 5
  • 6. My argument Get rid of Job Requirements Put all logic in the Machine Requirements Condor Week 2012 - May 2012 No user requirements 6
  • 7. Some background ● I am a big glideinWMS user http://tinyurl.com/glideinWMS ● And glideinWMS has 2 level matchmaking ● One at VO Frontend level – where to send glideins ● One at Negotiator level – which job to start in a glidein Collector Schedd Negotiator glidein Execution node Startd Frontend Job Factory Condor Week 2012 - May 2012 No user requirements 7
  • 8. Matchmaking problem ● Both levels must be in sync, or you either ● Ask for glideins which never match any jobs Job matched to site in VO Frontend but not to site's machine in the Negotiator Condor Week 2012 - May 2012 No user requirements 8
  • 9. Matchmaking problem ● Both levels must be in sync, or you either ● Ask for glideins which never match any jobs, or ● Have job waiting in the queue when site available Job doesn't match to site in VO Frontend but would to site's machine in the Negotiator if glideins were requested Condor Week 2012 - May 2012 No user requirements 9
  • 10. Matchmaking problem ● Both levels must be in sync, or you either ● Ask for glideins which never match any jobs, or ● Have job waiting in the queue when site available ● But site and machine adds have different attributes ● Not all machines on a site are exactly the same So I need 2 different requirements Condor Week 2012 - May 2012 No user requirements 10
  • 11. The usual dilemma ● Where do I define these requirements? ● In user job ClassAd? ● Or in the Resource ClassAds? – And we have 2 different resource types here Condor Week 2012 - May 2012 No user requirements 11
  • 12. The usual dilemma ● Where do I define these requirements? ● In user job ClassAd? ● Or in the Resource ClassAds? – And we have 2 different resource types here The 2 resources are handled by the same admin (in VO Frontend) Condor Week 2012 - May 2012 No user requirements 12
  • 13. The usual dilemma ● Where do I define these requirements? ● In user job ClassAd? ● Or in the Resource ClassAds? – And we have 2 different resource types here Potentially O(1k) The 2 resources users! are handled by the same admin (in VO Frontend) Typically O(1) Condor Week 2012 - May 2012 No user requirements 13
  • 14. The usual dilemma ● Where do I define these requirements? ● In user job ClassAd? ● Or in the Resource ClassAds? – And we have 2 different resource types here And do you really trust your users The 2 resources to do the right thing? are handled by the same admin (in VO Frontend) Potentially Typically O(1k) O(1) Condor Week 2012 - May 2012 No user requirements 14
  • 15. Moving reqs to the resources ● So we went for setting the requirements in the resources themselves Condor Week 2012 - May 2012 No user requirements 15
  • 16. Moving reqs to the resources ● So we went for setting the requirements in the resources themselves ● But users still need a way to select resources! ● How do they do it??? Condor Week 2012 - May 2012 No user requirements 16
  • 17. Moving reqs to the resources ● So we went for setting the requirements in the resources themselves ● But users still need a way to select resources! ● How do they do it??? They express their needs through attributes! Condor Week 2012 - May 2012 No user requirements 17
  • 18. Fixed schema ● The resource provider defines the requirements a.k.a startd + VO Frontend ● The VO Frontend admin in our case ● Those requirements look for well-defined user-provided attributes Site attribute Machine attribute Job attribute entry_req = stringListMember(GLIDEIN_Site,DESIRED_Sites)&& ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False) Example Start = stringListMember(GLIDEIN_Site,DESIRED_Sites)&& ((Memory>DESIRED_Mem)=!=False) Condor Week 2012 - May 2012 No user requirements 18
  • 19. Simple user job submit file ● No complex requirements to write ● Very little user training ● Low error rate a.submit Executable = a.sh Output = a.out +DESIRED_Sites=”UCSD,Nebraska” Requirements=True queue Condor Week 2012 - May 2012 No user requirements 19
  • 20. How well does it work? ● Advantages ● Disadvantages ● Easy to keep the two ● Rigid schema levels in sync ● Easy to define reasonable defaults ● Easy on the users ● And more... (wait for later slides) Condor Week 2012 - May 2012 No user requirements 20
  • 21. How well does it work? ● Advantages ● Disadvantages ● Easy to keep the two ● Rigid schema levels in sync ● Easy to define reasonable defaults ● Easy on the users ● And more... In our experience, (wait for later slides) does not need to change more than a couple of times a year Condor Week 2012 - May 2012 No user requirements 21
  • 22. Is this glideinWMS specific? ● Advantages ● Disadvantages ● Easy to keep the two Don't think so! ● Rigid schema levels in sync ● Easy to define reasonable defaults The ease of use ● Easy on the users is there for any Condor setup Condor Week 2012 - May 2012 No user requirements 22
  • 23. Is this glideinWMS specific? ● Advantages ● Disadvantages ● Easy to keep the two Don't think so! ● Rigid schema levels in sync ● Easy to define reasonable defaults ● Easy on the users The ease of use I would argue it is there for should become any Condor setup standard practice Condor Week 2012 - May 2012 No user requirements 23
  • 24. And there is still more to it Condor Week 2012 - May 2012 No user requirements 24
  • 25. Side effect ● Discovered one unexpected nice side effect Condor Week 2012 - May 2012 No user requirements 25
  • 26. Side effect ● Discovered one unexpected nice side effect We can outsmart our users! Condor Week 2012 - May 2012 No user requirements 26
  • 27. The overflow use case ● Normally, CMS jobs run near the data ● So users provide a whitelist of sites to run on ● And we have the appropriate glideinWMS expression a.submit Executable = a.sh Output = a.out +DESIRED_Sites=”UCSD,Nebraska” Requirements=True entry_req = stringListMember(GLIDEIN_Site,DESIRED_Sites)&& queue ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False) Example Start = stringListMember(GLIDEIN_Site,DESIRED_Sites)&& ((Memory>DESIRED_Mem)=!=False) Condor Week 2012 - May 2012 No user requirements 27
  • 28. The overflow use case ● Normally, CMS jobs run near the data ● But some jobs could run over the WAN ● It is just slightly less efficient Xrootd based WAN access if you are interested Condor Week 2012 - May 2012 No user requirements 28
  • 29. The overflow use case ● Normally, CMS jobs run near the data ● Jobs could run over the WAN ● It is just slightly less efficient ● But if some CPUs are idle due to low demand ● A low efficiency job is still better than no job! ● As long as it results in users getting We call this their results sooner “overflowing” – i.e. only if “optimal resources” are not available Condor Week 2012 - May 2012 No user requirements 29
  • 30. The overflow use case ● Normally, CMS jobs run near the data ● Jobs could run over the WAN ● It is just slightly less efficient ● But if some CPUs are idle due to low demand ● A low efficiency job is still better than no job! ● As long as it results in users getting their results sooner So, how do we – i.e. only if implement this? “optimal resources” are not available Condor Week 2012 - May 2012 No user requirements 30
  • 31. Overflow configuration ● Essentially, we change the rules! ● Without involving the users ● We write the requirements based on where the data is, not where the CPUs are ● Since not all sites have xrootd installed entry_req = ((CurrentTime-QDate)>21600)&&(Country=?=”US”)&& stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&& Example ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False) Start = ((CurrentTime-QDate)>21600)&& stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&& ((Memory>DESIRED_Mem)=!=False) Condor Week 2012 - May 2012 No user requirements 31
  • 32. Overflow configuration ● Essentially, we change the rules! ● Without involving the users No change to the user submit file! ● We write the requirements based on where the data is, not where the CPUs are a.submit Executable = a.sh ● Since not all sites have xrootd installed Output = a.out +DESIRED_Sites=”UCSD,Nebraska” Requirements=True entry_req = ((CurrentTime-QDate)>21600)&&(Country=?=”US”)&& queue stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&& Example ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False) Start = ((CurrentTime-QDate)>21600)&& stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&& ((Memory>DESIRED_Mem)=!=False) Condor Week 2012 - May 2012 No user requirements 32
  • 33. Overflow configuration ● Essentially,decide And we can we change the rules! where to overflow from users ● Without involving the No change to the hours after user submit file! the jobs werethe requirements ● We write submitted based on where the data is, not where the CPUs are a.submit Executable = a.sh ● Since not all sites have xrootd installed Output = a.out +DESIRED_Sites=”UCSD,Nebraska” Requirements=True entry_req = ((CurrentTime-QDate)>21600)&&(Country=?=”US”)&& queue ”UCSD,Wisc” stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&& Example ((GLIDEIN_Min_Mem>DESIRED_Mem)=!=False) Start = ((CurrentTime-QDate)>21600)&& stringListsIntersect(DESIRED_Sites,”UCSD,Wisc”)&& ((Memory>DESIRED_Mem)=!=False) Condor Week 2012 - May 2012 No user requirements 33
  • 34. Looking at the future Condor Week 2012 - May 2012 No user requirements 34
  • 35. Looking at the future ● The attribute schema now a fixed one ● At least regarding matchmaking ● Opens up new interesting possibilities Condor Week 2012 - May 2012 No user requirements 35
  • 36. Looking at the future ● The attribute schema now a fixed one ● At least regarding matchmaking ● Can consider RDBMS techniques ● Example uses – RDMBS driven matchmaking – Tracking of job and/or machine history in a DB Condor Week 2012 - May 2012 No user requirements 36
  • 37. Looking at the future ● The attribute schema now a fixed one ● At least regarding matchmaking ● Can consider RDBMS techniques ● Example uses – RDMBS driven matchmaking – Tracking of job and/or machine history in a DB ● Will likely need code changes in Condor – UCSD committed to try it our in the near future – If the results end up as good as hoped for, will push for official adoption Condor Week 2012 - May 2012 No user requirements 37
  • 38. Conclusions ● Matchmaking is made of requirements ● But there are two places where they can be defined ● Usually, users expected to set requirements ● CMS glideinWMS setup moves it completely into the resource domain ● Experience with no-user-req setup very positive ● Advocating that this should be the recommended way for all Condor deployments ● Also opens up interesting new venues Condor Week 2012 - May 2012 No user requirements 38
  • 39. Acknowledgments ● This work is partially sponsored by ● the US National Science Foundation under Grants No. PHY-1104549 (AAA) and PHY-0612805 (CMS Maintenance & Operations) and ● the US Department of Energy under Grant No. DE- FC02-06ER41436 subcontract No. 647F290 (OSG). Condor Week 2012 - May 2012 No user requirements 39