Towards Autonomic Grids

         Cécile Germain-Renaud
Laboratoire de Recherche en Informatique
  Université Paris-Sud - CNRS - INRIA
e-science infrastructures



  2003 NSF Atkins Report:
  Revolutionizing Science and Engineering
  through Cyberinfrastructure
      Grids of computational centers
      Comprehensive libraries of digital
      objects
      Well-curated collections of
      scientific data
      Online instruments and vast sensor
      arrays
      Convenient software toolkits
  Examples:
      Online instruments: the largest (circumference 26 km),
      fastest (14 TeV), coldest (1.9 K), emptiest (10^-13 atm)
      machine.
      Scientific data: storage and analysis of 15 PB/year.
      Grids of computational centers: the largest (40,000 CPUs),
      most complex (200 VOs), most distributed (250 sites),
      most used (300K jobs/day) computing machine.
How we configure our grids




   Courtesy of James Casey, talk at EGEE'09
Outline


   1   The grid ecosystem

   2   Grids and Autonomic Computing

   3   The Grid Observatory

   4   Learning grid models
         On-line fault detection
         Model Selection

   5   Model-free policies
        Policy evaluation
        Reinforcement learning for responsive grids
The grid ecosystem   Grids and Autonomic Computing   The Grid Observatory   Learning grid models   Model-free policies




e-science infrastructures

      The classical definition of grids
      A computational grid is a hardware and software infrastructure
      that provides dependable, consistent, pervasive, and inexpensive
      access to high computational capabilities.
      I. Foster, C. Kesselman, The Grid, 1998

      An old dream




      UCLA press release on the creation of Arpanet, 1969

The niches in the ecosystem

Grids are not about technology, but about sharing

       Ian Foster's definition (2000): grids are defined by
       coordinated resource sharing and problem solving in
       dynamic, multi-institutional virtual organizations.
       The sharing is, necessarily, highly controlled, with
       resource providers and consumers defining clearly
       and carefully just what is shared, who is allowed to
       share, and the conditions under which sharing
       occurs. A set of individuals and/or institutions
       defined by such sharing rules forms a virtual
       organization.

       Consumers: large-scale international collaborations,
       with different users having differentiated requirements
       across and within the collaborations
       Providers: national and regional institutions, organized
       in National Grid Initiatives coordinated by EGI
       Operators: local sites, with temporary EU support
       (EGI-InSPIRE): configuration, prioritization,
       monitoring, accounting, . . .

Do datacenters and clouds make grids obsolete?

*-aaS




               Courtesy of William Vambenepe, slides from the Cloud Connect keynote "Freeing SaaS from Cloud"

Grids and Clouds
      IaaS: on-demand, elastic, virtualization-based provisioning
              A single-objective optimization target: pay less by turning
              resources on and off at the scale of minutes rather than days
              or weeks
              Convergence path: Grids over Clouds or Clouds of Grids?
              EU project StratusLab

       SaaS: the core of the IT process lies in deploying and
       orchestrating heterogeneous software components, and
       having them "in the cloud" does not help much

Autonomic Computing

      Computing systems that manage themselves in accordance with
      high-level objectives from humans
      Kephart and Chess, The Vision of Autonomic Computing, IEEE
      Computer, 2003
      AUTONOMIC VISION & MANIFESTO
      http://www.research.ibm.com/autonomic/manifesto/
      Relation with Machine Learning: I. Rish, tutorial @ECML 2006
           Self-managing systems with the following abilities:
                     Self-healing: detect, diagnose and repair failures
                     Self-configuring: automatically incorporate and configure
                     components
                     Self-optimizing: ensure optimal functioning with respect to
                     high-level requirements
                     Self-protecting: anticipate and defend against security breaches
              On dynamic, non-steady-state systems

Autonomic Grids

              Emergent behaviour resulting from the decisions of sites and
              stakeholders
              Coupled usage: Virtual Organizations, community software
              and activity
              Feedback loops in the middleware
              Incomplete and noisy information

      We need
              Inference of models for middleware components and
              applications, users and usage profiles, user interactions,
              inconsistencies
              Self-configuration and self-optimization for management
              policies
              Self-healing across middleware and applications

Goals


              Grid digital assets curation
                     Collecting verifiable digital assets
                     Providing digital asset search and retrieval
                     Certification of the trustworthiness and integrity of the
                     collection content
                     Semantic and ontological continuity and comparability of the
                     collection
              Building the domain knowledge
                     Dimensionality and volume reduction: getting rid of the
                     massive redundancy in operational logs
                     Answering operational issues
                     Descriptive/generative/predictive models
                     Design and validation of model-free policies

Support and collaborations

Methods


     Focused on EGEE/EGI                                         www.grid-observatory.org
             The best approximation of the current needs of
             e-science
             Extensive monitoring facilities
             Traces were discarded after operational usage, and
             were in any case not available to the scientific
             community
             Now available without a grid certificate

Grids are complex systems




      Users/Files/Clients worker nodes graph display with AVIZ GraphDice




      Users in green, file groups in purple; the rightmost are the most "active"
      See also [Lovro Iliasic, PhD thesis, Computational Grids as Complex Networks]

Issues

      Large non-stationary system




       Courtesy of M. Lassnig et al., Austrian Grid Symposium 2009




              Trends
              Academic events
              Scientific events
              Software events
On-line fault detection


Abrupt changepoint detection

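       Page-Hinkley statistics: jumps in the mean
       $p_t$: the monitored variable, whose distribution may change
       $\bar p_t = \frac{1}{t}\sum_{\ell=1}^{t} p_\ell$
       $m_t = \sum_{\ell=1}^{t} (p_\ell - \bar p_\ell + \delta)$
       $M_t = \max_{1 \le \ell \le t} m_\ell$
       $PH_t = M_t - m_t$
       CUSUM test: if $PH_t > \lambda$, a change is detected

       First application: black-hole detection
       Validation requires expert interpretation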
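A minimal Python sketch of the Page-Hinkley statistics above, for illustration only. It follows the formulas as written (running mean including the current sample, drift term δ, alarm when PH_t > λ); resetting the statistics after each alarm is an assumption of this sketch, not something stated on the slide. With the +δ sign convention, PH_t grows when the mean of p_t drops.

```python
def page_hinkley(stream, delta, lam):
    """Return the (1-based) positions at which PH_t exceeds lam.

    Assumption: all statistics restart after each detected change.
    """
    alarms = []
    t, total, m, M = 0, 0.0, 0.0, float("-inf")
    for pos, p in enumerate(stream, start=1):
        t += 1
        total += p
        p_bar = total / t          # running mean \bar p_t
        m += p - p_bar + delta     # m_t = sum_l (p_l - \bar p_l + delta)
        M = max(M, m)              # M_t = max_{l <= t} m_l
        if M - m > lam:            # PH_t = M_t - m_t > lambda
            alarms.append(pos)
            t, total, m, M = 0, 0.0, 0.0, float("-inf")
    return alarms


# A mean that drops from 1 to 0 after 50 samples.
signal = [1.0] * 50 + [0.0] * 50
print(page_hinkley(signal, delta=0.01, lam=5.0))  # → [56]
```

The detection delay (6 samples here) shrinks as λ decreases, at the price of more false alarms on noisy data, which is why the threshold is later self-adapted.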

StrAP: on-line clustering, aka Streaming Affinity Propagation

      Affinity Propagation (AP)                                               [Frey2007]
               a statistical physics algorithm for clustering
               (based on message passing)
               a cluster = an exemplar
               (akin to k-centers)
               the model = set of {exemplar, frequency}


       Why AP?
                Traceability: real jobs serve as exemplars, which matters
                because of categorical variables, e.g., userid, queue name, etc.
                No prior knowledge of K, the number of clusters
                Quasi-optimality with respect to information loss,
                hence stability                                                  [Meila2006]
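For readers unfamiliar with AP, here is a compact NumPy sketch of the standard responsibility/availability message passing of Frey and Dueck, modeled on common public implementations. It is not the StrAP code: the damping factor, iteration count, tiny symmetry-breaking noise and the median preference are conventional choices assumed here, and StrAP's streaming machinery (reservoir, change detection, rebuilding) is not shown. The exemplars it returns are actual input points, which is the traceability property cited above.

```python
import numpy as np


def affinity_propagation(S, damping=0.5, max_iter=200, seed=0):
    """Cluster with Affinity Propagation; S[i, k] is the similarity of i to k.

    The diagonal S[k, k] holds the 'preference' of k to be an exemplar.
    Returns (exemplar indices, label of each point = index of its exemplar).
    """
    S = np.array(S, dtype=float)  # copy, so the caller's matrix is untouched
    n = S.shape[0]
    # Tiny deterministic noise breaks ties between symmetric candidates.
    S += 1e-12 * np.random.default_rng(seed).standard_normal((n, n))
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars
    return exemplars, labels


# Two well-separated 1-D groups; similarity = negative squared distance.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
S = -((x[:, None] - x[None, :]) ** 2)
np.fill_diagonal(S, np.median(S))  # usual default preference
exemplars, labels = affinity_propagation(S)
print(exemplars, labels)
```

The {exemplar, frequency} model can then be read off with `np.unique(labels, return_counts=True)`. Note that K is never given: it emerges from the preference on the diagonal.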

From AP to Large-scale Data Streaming
       1. Scalability: from $O(N^2 \log N)$ to $O(N^{(h+2)/(h+1)})$
       with Hierarchical Affinity Propagation

                negligible information loss                                       (proof in the paper)
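A back-of-the-envelope check of the exponent, under assumptions not stated on the slide: the first level splits the N items into blocks of size $b = N^{1/(h+1)}$, plain AP on a block of size b costs $O(b^2)$ (ignoring the log factor), and the later levels only process the much smaller set of exemplars, so the first level dominates.

```latex
\text{cost}_1
  = \underbrace{\frac{N}{b}}_{\text{blocks}} \cdot \underbrace{O\!\left(b^{2}\right)}_{\text{AP per block}}
  = O(N\,b)
  = O\!\left(N \cdot N^{1/(h+1)}\right)
  = O\!\left(N^{(h+2)/(h+1)}\right)
```

For h = 1 this gives $O(N^{3/2})$, against $O(N^2 \log N)$ for plain AP.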

From AP to Large-scale Data Streaming
       2. Non-stationary distribution
                various Virtual Organizations
                number and expertise of users

       Streaming AP (StrAP)

Adaptive change detection test
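       Self-adapting λ ≡ an optimization problem

       BIC: $F_\lambda = \frac{1}{|C|}\sum_{i=1}^{|C|}\frac{1}{n_i}\sum_{e_j \in C_i} d(e_j, e_{i^*}) + \frac{\varphi}{2}\,\rho \log N + \eta\, O_t$
            ∝ loss + size of model + fraction of outliers
       where C is the set of clusters, $n_i$ the size of cluster $C_i$, $e_{i^*}$ its
       exemplar, and $O_t$ the fraction of outliers

       OPTIMIZATION:
                 ε-greedy search over a finite set of λ values,
                 $\lambda^* = \arg\min_\lambda E(F_\lambda)$
                 candidates λ1, λ2, λ3, λ4, ..., each with its estimate
                 E(Fλ1), E(Fλ2), E(Fλ3), E(Fλ4), ...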
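The ε-greedy search can be sketched as a small multi-armed bandit over the candidate λ values: each round either explores a random candidate (probability ε) or exploits the one with the lowest running mean of observed F_λ. In this sketch `observe_F` is a hypothetical stand-in for running the change-detection test with a given λ and measuring F_λ; ε, the number of rounds and the try-each-once start are illustrative assumptions.

```python
import random


def epsilon_greedy_lambda(lambdas, observe_F, n_rounds=500, eps=0.1, seed=0):
    """Pick the lambda with the lowest estimated E(F_lambda).

    observe_F(lam) is a hypothetical callback returning one noisy F_lambda sample.
    """
    rng = random.Random(seed)
    counts = {lam: 0 for lam in lambdas}
    means = {lam: 0.0 for lam in lambdas}
    for _ in range(n_rounds):
        untried = [lam for lam in lambdas if counts[lam] == 0]
        if untried:                      # try every candidate once first
            lam = untried[0]
        elif rng.random() < eps:         # explore
            lam = rng.choice(lambdas)
        else:                            # exploit the current best estimate
            lam = min(lambdas, key=means.get)
        f = observe_F(lam)
        counts[lam] += 1
        means[lam] += (f - means[lam]) / counts[lam]  # running mean of F
    return min(lambdas, key=means.get), means


# Toy objective: F is smallest around lambda = 3, plus bounded noise.
noise = random.Random(1)
best, estimates = epsilon_greedy_lambda(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    lambda lam: (lam - 3.0) ** 2 + noise.uniform(-0.1, 0.1),
)
print(best)  # → 3.0
```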

                 Gaussian Process Regression on the pairs {(λi, Fλi)}
                 generates a continuous value of λ
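A minimal NumPy sketch of that last step, assuming a GP on centered targets with a squared-exponential kernel, a fixed length scale and a small noise term (all hypothetical choices): the posterior mean is fitted on the observed (λi, F_λi) pairs and minimized over a fine grid, which yields a λ lying between the candidates.

```python
import numpy as np


def rbf(a, b, length=1.0):
    """Squared-exponential kernel matrix between 1-D arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)


def gp_argmin(lams, F, grid, length=1.0, noise=1e-6):
    """Fit a GP posterior mean to (lams, F) and return its argmin over grid."""
    lams = np.asarray(lams, dtype=float)
    F = np.asarray(F, dtype=float)
    mu = F.mean()                                    # centre the targets
    K = rbf(lams, lams, length) + noise * np.eye(len(lams))
    alpha = np.linalg.solve(K, F - mu)
    mean = mu + rbf(grid, lams, length) @ alpha      # posterior mean on the grid
    return grid[np.argmin(mean)], mean


lams = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
F = (lams - 3.2) ** 2                                # toy E(F_lambda) estimates
grid = np.linspace(1.0, 5.0, 401)
lam_star, _ = gp_argmin(lams, F, grid)
print(round(float(lam_star), 2))
```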

G-StrAP: A Grid Dashboard
       Online monitoring
       [Figure, two panels: percentage of jobs assigned (%) per cluster, each
       exemplar shown as a job vector, plus the reservoir; the second panel is
       annotated "LogMonitor is getting clogged"]

     Off-line Analysis
Model Selection


The Piecewise Autoregressive model

       AR process: $X_t = \gamma + \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + \varepsilon_t$

     The model parameters for piecewise AR:
             Number of segments m
             Breakpoint locations / segment sizes $(n_j)_{j=1\dots m}$
             AR orders $(p_j)_{j=1\dots m}$
             AR parameters $(\Psi_j)_{j=1\dots m}$
     Very large model space

     Example:
             Segment 1, $0 < t \le 512$: $X_t = 0.9 X_{t-1} + \varepsilon_t$
             Segment 2, $512 < t \le 768$: $X_t = 1.69 X_{t-1} - 0.81 X_{t-2} + \varepsilon_t$
             Segment 3, $768 < t \le 1024$: $X_t = 1.32 X_{t-1} - 0.81 X_{t-2} + \varepsilon_t$
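The three-segment example can be simulated and checked with a few lines of NumPy. This is an illustrative sketch: unit-variance Gaussian noise ε_t, γ = 0, a zero initial state and an ordinary least-squares AR fit are assumptions made here, not details from the slide. Segment boundaries are written as 0-based, exclusive end indices.

```python
import numpy as np

# (exclusive 0-based end index, AR coefficients phi_1..phi_p) per segment
SEGMENTS = [
    (512, [0.9]),
    (768, [1.69, -0.81]),
    (1024, [1.32, -0.81]),
]


def simulate_par(segments, gamma=0.0, seed=0):
    """Simulate a piecewise AR process with unit-variance Gaussian noise."""
    rng = np.random.default_rng(seed)
    n = segments[-1][0]
    eps = rng.standard_normal(n)
    x = np.zeros(n)
    start = 0
    for end, phi in segments:
        for t in range(start, end):
            ar = sum(c * x[t - k] for k, c in enumerate(phi, start=1) if t >= k)
            x[t] = gamma + ar + eps[t]
        start = end
    return x


def fit_ar(x, p):
    """Least-squares fit of X_t = gamma + phi_1 X_{t-1} + ... + phi_p X_{t-p}."""
    design = np.array([np.r_[1.0, x[t - p:t][::-1]] for t in range(p, len(x))])
    coef, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return coef  # [gamma, phi_1, ..., phi_p]


x = simulate_par(SEGMENTS)
print(fit_ar(x[:512], 1))      # phi_1 should come out close to 0.9
print(fit_ar(x[512:768], 2))   # roughly (1.69, -0.81)
```

Fitting one AR model across a breakpoint instead mixes two regimes, which is exactly what the segmentation search must avoid.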

Minimum Description Length model selection for PAR


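       [Davis, Lee, Rodriguez-Yam, J. American Statist. Assoc. 2006.]
       The MDL principle: the best-fitting model is the one that produces
       the shortest code length that completely describes the observed
       data y:
                        $CL_F(y) = CL_F(\hat F) + CL_F(\hat e \mid \hat F)$

              $CL_F(\hat F)$: description of the model
              $CL_F(\hat e \mid \hat F)$: description of the residuals, i.e. what the
              model does not explain
       For a PAR model of length n with m breakpoints (hence m+1 segments):
       $CL = \log m + (m+1)\log n + \sum_{j=1}^{m+1}\log p_j + \sum_{j=1}^{m+1}\frac{p_j+2}{2}\log n_j + \sum_{j=1}^{m+1}\frac{n_j}{2}\log(2\pi\hat\sigma_j^2)$
       where $\hat\sigma_j^2$ is the estimated residual variance of segment j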
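The code length above can be transcribed directly. The sketch below only evaluates the formula for one candidate segmentation; the search over segmentations (Davis et al. use a genetic algorithm) is not shown. Natural logarithms are assumed, and the log m and log p_j terms are dropped when m = 0 or p_j = 0, an assumption of this sketch since log 0 is undefined.

```python
import math


def par_code_length(n, orders, seg_lengths, sigma2):
    """MDL code length of a piecewise AR model (natural logs).

    orders[j], seg_lengths[j], sigma2[j]: AR order p_j, length n_j and
    residual variance estimate of segment j; there are m + 1 segments.
    """
    m = len(orders) - 1                       # number of breakpoints
    if not (len(seg_lengths) == len(sigma2) == m + 1):
        raise ValueError("one order, length and variance per segment")
    cl = (math.log(m) if m > 0 else 0.0) + (m + 1) * math.log(n)
    for p, nj, s2 in zip(orders, seg_lengths, sigma2):
        cl += math.log(p) if p > 0 else 0.0   # log p_j (skipped for p_j = 0)
        cl += (p + 2) / 2 * math.log(nj)      # cost of the AR parameters
        cl += nj / 2 * math.log(2 * math.pi * s2)  # cost of the residuals
    return cl


# Two AR(1) segments of 50 points, residual variance chosen so the last term is 0.
print(round(par_code_length(100, [1, 1], [50, 50], [1 / (2 * math.pi)] * 2), 3))  # → 20.946
```

Adding a breakpoint pays off only when the drop in the residual term outweighs the extra log n and parameter costs.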

Results on the workload processes

       The amount of unterminated work in the system

           Smoothed workload difference
           Typically low-order AR models
           Long segments

           CE      No. of     Segment       Segment      Smallest root    Ljung-Box test on
                   segments   start [days]  end [days]   abs. value       residuals (p-value)
           CE-A    18         158.91        196.53       1.5915           0.05
           CE-B    19         109.61        160.65       2.1563           0.04
           CE-C    17         104.86        149.31       5.5711           0.21
           CE-D    27         151.39        190.16       1.1062           0.05

       [T. Éltető et al., Discovering Piecewise Linear Models of Grid Workload, CCGrid 2010]

Model validation

                  PAR: Ljung-Box test for whiteness of the AR residuals
                  Stability: bootstrapping, for stable breakpoints
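The Ljung-Box statistic behind the whiteness check is $Q = n(n+2)\sum_{k=1}^{h} \hat\rho_k^2/(n-k)$, where $\hat\rho_k$ is the lag-k sample autocorrelation of the residuals; under the white-noise hypothesis Q is approximately χ² distributed with h degrees of freedom (minus the number of fitted AR parameters when applied to AR residuals). A NumPy sketch of Q only: the lag h = 10 is an arbitrary choice, and the p-value would come from a χ² survival function such as `scipy.stats.chi2.sf`, not computed here.

```python
import numpy as np


def ljung_box_q(resid, h=10):
    """Ljung-Box Q statistic over lags 1..h for a residual series."""
    r = np.asarray(resid, dtype=float)
    r = r - r.mean()
    n = len(r)
    denom = np.dot(r, r)
    q = 0.0
    for k in range(1, h + 1):
        rho_k = np.dot(r[k:], r[:-k]) / denom   # lag-k sample autocorrelation
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q


rng = np.random.default_rng(0)
white = rng.standard_normal(500)
alternating = np.tile([1.0, -1.0], 250)          # strongly autocorrelated
print(ljung_box_q(white), ljung_box_q(alternating))
```

A small Q (large p-value) means the residuals look white, i.e. the AR segment has captured the linear dependence; the p-values in the table above are of that kind.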

Model reconciliation – bootstrap aggregation

              Outcome: a simple and robust model describing the essential
              part of the workload process.
Policy evaluation


Evaluation of the matchmaking scheduling policy



               ART: Actual Response Time = queuing delay at the CE
               ERT: Expected Response Time (gLite), based on the Copernican principle
               Question: how good is the prediction?
               Question: what is your definition of a good predictor?
                     Root Mean Squared Error?
                     Close statistical distribution, in the normal regime, in the tail?
                     Correlation of time series?
                     ROC (Receiver Operating Characteristic): cost-benefit relation
               Heterogeneous data

Evaluation of the matchmaking scheduling policy




       Overall:
           The distributions are not consistent
               RMSE: ATLAS 7.94E4, biomed 7.2E3
               Correlation (subsampling at 900 s) is not convincing

Evaluation of the matchmaking scheduling policy

              À la BQP (Batch Queue Predictor): how often does
              the prediction lie within a reasonable distance of the
              actual value? Modified, because BQP considers only upper
              bounds
              ERT is treated as a classifier: the classes are intervals
              of the value range, of exponentially increasing size
              ROC: True Positive Rate vs. False Positive Rate
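One way to read this set-up in code: both ERT and ART are mapped to the same exponentially growing intervals, and for each interval class one (FPR, TPR) point is computed, treating "predicted value falls in class c" as the positive decision. The interval edges below (60 s, doubling) and the toy numbers are hypothetical; this is a sketch of the idea, not the evaluation code behind the slides.

```python
from bisect import bisect_right


def interval_class(value, edges):
    """Class i means edges[i-1] <= value < edges[i]; 0 is below edges[0]."""
    return bisect_right(edges, value)


def roc_point(predicted, actual, cls, edges):
    """(FPR, TPR) of the decision 'predicted value falls in class cls'."""
    tp = fp = pos = neg = 0
    for p, a in zip(predicted, actual):
        hit = interval_class(p, edges) == cls
        if interval_class(a, edges) == cls:
            pos += 1
            tp += hit
        else:
            neg += 1
            fp += hit
    tpr = tp / pos if pos else 0.0
    fpr = fp / neg if neg else 0.0
    return fpr, tpr


edges = [60 * 2 ** k for k in range(4)]   # 60, 120, 240, 480 seconds
ert = [30, 100, 200, 500, 90]             # predicted response times (toy data)
art = [45, 110, 400, 600, 50]             # actual response times (toy data)
print(roc_point(ert, art, cls=1, edges=edges))  # → (0.25, 1.0)
print(roc_point(ert, art, cls=0, edges=edges))  # → (0.0, 0.5)
```

Exponentially growing intervals make "reasonable distance" relative: a 30 s error matters for short queues but not for multi-hour ones.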
Reinforcement learning for responsive grids


Reinforcement learning for resource provisioning in grids

     A multi-objective scheduling and dimensioning problem
         Users: differentiated QoS
         Stakeholders: fairness
         Administrators: utilization

     [Figure: probability vs. execution time [s] (log scale, 10^0 to 10^6)
     for all data, atlas and biomed]

       Goals
           Elastic resource provisioning: the context is Grids over Clouds
           - Infrastructure as a Service (IaaS)
               Realistic hypotheses: organized sharing and mutualization, no
               central control
               Autonomics: model-free policies and configuration-free
               implementations

Formalisation

       The scheduling MDP
               State: descriptive variables of a site (queue, cluster)
               Action: descriptive variables of a job (VO, execution time)

       The dimensioning MDP
           Action: number of computing nodes to keep active

       Policy learning
            SARSA algorithm
               Continuous state-action space: nonlinear regression of
               Q: (s, a) → r
               Neural Networks and Echo State Networks
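For reference, the SARSA update that the learned Q approximates is Q(s,a) ← Q(s,a) + α[r + γQ(s',a') − Q(s,a)], where a' is the action actually chosen in s' by the current (e.g. ε-greedy) policy. The sketch below is tabular, a deliberate simplification: the slides use neural-network and echo-state-network regressors for the continuous state-action space. α, γ, ε and the state and action names are illustrative assumptions.

```python
import random
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One on-policy SARSA step on a tabular Q (dict keyed by (state, action))."""
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


def epsilon_greedy(Q, s, actions, eps, rng):
    """Random action with probability eps, otherwise the greedy one."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])


Q = defaultdict(float)
rng = random.Random(0)
nodes = [4, 8, 16]                      # hypothetical actions: nodes to keep active
a = epsilon_greedy(Q, "busy", nodes, eps=0.1, rng=rng)
a_next = epsilon_greedy(Q, "idle", nodes, eps=0.1, rng=rng)
print(sarsa_update(Q, "busy", a, 1.0, "idle", a_next, alpha=0.5))  # → 0.5
```

Because a' comes from the same exploring policy, SARSA learns the value of the policy actually run, exploration included, which suits a production grid where exploration has a real cost.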

The Rewards
               The responsiveness utility for job j is
               $W_j = \frac{\text{execution time}_j}{\text{execution time}_j + \text{waiting time}_j}$   (1)
               The fairness utility for job j is
               $F_j = 1 - \frac{\max_k (w_k - S_{kj})^+}{M}$,   (2)
               where $x^+ = x$ if $x > 0$ and 0 otherwise, $w_k$ is the target share of VO
               k, and $S_{kj}$ the share received by VO k up to the selection of job j.
               The utilization reward $U_n$ at time $T_n$, for $1 \le n < N$, is
               $U_n = \frac{f_n}{\sum_{k=0}^{n} P_k (T_{k+1} - T_k)}$   (3)
               where $(T_1, \dots, T_N)$ are the decision instants, $P_k$ the
               number of processors allocated in the interval $[T_k, T_{k+1}]$,
               and $f_n$ the sum of the execution times of jobs completed
               at time $T_n$.
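The three rewards translate directly into code. In this sketch M in (2) is passed in as a parameter because the slide does not define it, and the utilization sum follows (3) literally, so the lists `T` and `P` are indexed from 0 and must contain T_0, ..., T_{n+1} and P_0, ..., P_n; the function and argument names are illustrative.

```python
def responsiveness(exec_time, wait_time):
    """W_j, eq. (1): 1 when the job did not wait, towards 0 as waiting dominates."""
    return exec_time / (exec_time + wait_time)


def fairness(targets, shares, M):
    """F_j, eq. (2): targets[k] = w_k, shares[k] = S_kj for every VO k."""
    deficit = max(max(targets[k] - shares[k], 0.0) for k in targets)
    return 1.0 - deficit / M


def utilization(f_n, T, P, n):
    """U_n, eq. (3): f_n / sum_{k=0}^{n} P_k (T_{k+1} - T_k)."""
    allocated = sum(P[k] * (T[k + 1] - T[k]) for k in range(n + 1))
    return f_n / allocated


print(responsiveness(30.0, 10.0))  # → 0.75
print(fairness({"atlas": 0.5, "biomed": 0.5},
               {"atlas": 0.75, "biomed": 0.25}, M=1.0))  # → 0.75
print(utilization(30.0, T=[0.0, 10.0, 20.0], P=[4, 2], n=1))  # → 0.5
```

Only the most under-served VO counts in (2): an over-served VO (atlas here) contributes nothing, since its deficit is clipped to 0.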

Experimental results on EGEE traces

            [Figure, three panels; source: J. Perez et al., JoGC 8/3, Sep. 2010]
            - Queuing delays - interactive jobs - Rigid: CDF vs. queueing delay (sec, log scale, 10^1 to 10^5); curves EGEE-INTER, ORA-INTER-0.5, ORA-INTER-1.0, EST-INTER-0.5, EST-INTER-1.0
            - Queuing delays - interactive jobs - Elastic: CDF vs. queueing delay (sec, log scale, 10^1 to 10^5); curves ELA-ORA-0.5, ELA-ORA-1.0, ELA-EST-0.5, ELA-EST-1.0, RIG-ORA-0.5, RIG-ORA-1.0
            - Dynamics of the fairshare - All jobs - Rigid: fairshare difference (x 10^-3) vs. arrival time (sec, x 10^6); curve ELA-ORA-0.5 minus EGEE
Conclusion


Towards Autonomic Grids

  • 1. Towards Autonomic Grids C´cile Germain-Renaud e Laboratoire de Recherche en Informatique Universit´ Paris-Sud - CNRS - INRIA e
  • 2. e-science infrastructures 2003 NSF Atkins Report : Revolutionizing Science and Engineering through Cyberinfrastructure Grids of computational centers Comprehensive libraries of digital objects Well-curated collections of scientific data Online instruments and vast sensor arrays Convenient software toolkits
  • 3. e-science infrastructures 2003 NSF Atkins Report : Revolutionizing Science and Engineering through Cyberinfrastructure Grids of computational centers Comprehensive libraries of digital objects Well-curated collections of scientific data Online instruments and vast sensor arrays The largest (circ 26km), Convenient software toolkits fastest(14TeV), coldest (1.9K), emptiest (10−13 atm) machine.
  • 4. e-science infrastructures 2003 NSF Atkins Report : Revolutionizing Science and Engineering through Cyberinfrastructure Grids of computational centers Comprehensive libraries of digital objects Well-curated collections of Storage and analysis of scientific data 15PB/year Online instruments and vast sensor arrays Convenient software toolkits
  • 5. e-science infrastructures 2003 NSF Atkins Report : Revolutionizing Science and Engineering through Cyberinfrastructure Grids of computational centers Comprehensive libraries of digital objects Well-curated collections of The largest (40000 CPUs), scientific data most complex (200 VOs), most distributed (250 sites), Online instruments and vast sensor most used (300K jobs/day) computing machine arrays Convenient software toolkits
  • 6. How we configure our grids Courtesy James Casey talk @EGEE09
  • 7. Outline 1 The grid ecosystem 2 Grids and Autonomic Computing 3 The Grid Observatory 4 Learning grid models On-line fault detection Model Selection 5 Model-free policies Policy evaluation Reinforcement learning for responsive grids
  • 8. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies e-science infrastructures The classical definition of grids A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high computational capabilities. I. Foster, C. Kesselman, The Grid, 1998 An old dream UCLA press release on the creation of Arpanet, 1969
  • 9. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies The niches in the ecosystem
  • 10. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Grids are not about technology, but about sharing Consumers: Large scale Ian Foster’s definition 2000 international collaborations Grid are defined by coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations The sharing is necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing Different users with occurs. A set of individuals and/or institutions differentiated requirements defined by such sharing rules form a virtual across and within the organization collaborations
  • 11. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Grids are not about technology, but about sharing Ian Foster’s definition 2000 Providers: national and Grid are defined by regional institutions coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations The sharing is necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. A set of individuals and/or institutions Organized in National Grid defined by such sharing rules form a virtual Initiatives, coordinated by EGI organization
  • 12. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Grids are not about technology, but about sharing Ian Foster’s definition 2000 Operators: local sites, with temporary EU support Grid are defined by (EGI-Inspire) coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations The sharing is necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. A set of individuals and/or institutions defined by such sharing rules form a virtual Configuration, prioritization, organization monitoring, accounting, . . .
  • 13. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Do Datacenters and Cloud make Grid obsolete?
  • 14. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies *-aaS Courtesy William Vambenepe - slides from the Cloud Connect keynote Freeing SaaS from Cloud
  • 15. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Grids and Clouds IaaS : on-demand, elastic, virtualization-based provisioning A single-objective optimization target: pay less by turning on and off at the minute rather than days or weeks scale Convergence path: Grids over Clouds or Clouds of Grids? EU project Stratuslab SaaS: the core of the IT process lies in deploying and orchestrating heterogeneous software components, and having them ”in the cloud” does not help much
  • 16. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Autonomic Computing Computing systems that manage themselves in accordance with high-level objectives from humans Kephart and Chess A vision of Autonomic Computing, IEEE Computer 2003 AUTONOMIC VISION & MANIFESTO http://www.research.ibm.com/autonomic/manifesto/ Relation with Machine Learning : I. Rish tutorial @ECML 2006, Self-managing system with the ability of Self-healing: detect, diagnose and repair failures Self-configuring: automatically incorporate and configure components Self-optimizing: ensure the optimal functioning wrt high-level requirements Self-protecting: anticipate and defend against security breaches On dynamical non-steady state systems
  • 17. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Autonomic Computing
  • 18. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Autonomic Computing
  • 19. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Autonomic Grids Emerging behaviour as the result of sites and stakeholders decisions Coupled usage: Virtual Organizations, community software and activity Feedback loops in the middleware Incomplete and noisy information We need Inference of models for middleware components and applications, users and usage profiles, users interactions, inconsistencies Self-configuration and self-optimization for management policies Self-healing across middleware and applications
  • 20. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Goals Grid digital assets curation Collecting verifiable digital assets Providing digital asset search and retrieval Certification of the trustworthiness and integrity of the collection content Semantic and ontological continuity and comparability of the collection Building the domain knowledge Dimensionality and volume reduction: getting rid of the massive redundancy in operational logs Answering operational issues Descriptive/generative/predictive models Design and validation of model-free policies
  • 21. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Support and collaborations
  • 22. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Methods Focused on EGEE/EGI www.grid-observatory.org The best approximation of the current needs of e-science Extensive monitoring facilities Traces were discarded after operational usage, and in any case not available to the scientific community Now available without grid certificate
  • 23. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Methods Focused on EGEE/EGI The best approximation of the current needs of e-science Extensive monitoring facilities Traces were discarded after operational usage, and in any case not available to the scientific community Now available without grid certificate
  • 24. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Grids are complex systems Users/Files/Clients worker nodes graph display with AVIZ GraphDice
  • 25. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Grids are complex systems Users in green, File groups in purple. Rightmost is most ”active” And also [Lovro Iliasic PhD Computational Grids as Complex Networks]
  • 26. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Issues Large non-stationary system Courtesy M. Lassnig et al. Austrian Grid Symp. 09 Trends Academic events Scientific events Software events
  • 27. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies On-line fault detection Abrupt changepoint detection Page-Hinkley Statistics - jumps in the mean pt changing distribution pt = 1 t =1 p P ¯ t Pt mt = =1 (p − p + δ) ¯ Mt = max{m } PHt = Mt − mt CUSUM test: if PHt > λ, change detected First Application Blackhole detection Validation requires expert interpretation
  • 28. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies On-line fault detection StrAP: On-line clustering aka Streaming Affinity Propagation (AP) [Frey2007] statistical physics algorithm for clustering (based on message passing ) a cluster = an exemplar (akin k-centers) the model = set of {exemplar, frequency} Why AP ? Traceability: real jobs as exemplars because of categorical variables, e.g., userid, queue name etc No prior knowledge of K , number of clusters quasi optimality wrt. information loss —> stability [Meila2006]
  • 29. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies On-line fault detection From AP to Large-scale Data Streaming h+2 1 SCALABILITY : from O(N 2 log N) to O(N h+1 ) Hierarchical Affinity Propagation negligible infromation loss (proof in the paper)
  • 30. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies On-line fault detection From AP to Large-scale Data Streaming 2 Non stationary distribution various Virtual Organization number and expertise of users Streaming AP (StrAP)
  • 31. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies On-line fault detection Adaptive change detection test Self-adapt λ ≡ An optimization problem |C | 1 1 BIC: Fλ = |C | i=1 ni d(ej , ei∗ ) + ϕ ρ log N + ηOt ej ∈Ci 2 ∝ loss + size of model + fraction of outliers OPTIMIZATION: -greedy search from a finite set of λ values λ = argmin{E(Fλ }), λ1 λ2 λ3 λ4 ... E(Fλ1 ) E(Fλ2 ) E(Fλ3 ) E(Fλ4 ) ... Gaussian Process Regression based on {λi , Fλi } a continuous value of λ is generated
  • 32. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies On-line fault detection G-StrAP: A Grid Dashboard Online Monitoring 100 Percentage of jobs assigned (%) 8 100 18 exemplar shown 24 LogMonitor is 80 80 as a job vector 30 595 getting clogged 10 18 139 29 60 10 60 20091 7 7 47 395 0 13 8 276 54 7 9 18 6 40 0 14 0 18 129 40 0 10 24 5 0 24 0 25 0 0 47 30 10 0 9728 0 0 54 20110 595 14 0 0 20 0 19190 0 129 0 139 127 20 0 0 Reservoir 0 0 10854 Reservoir 0 0 0 0 1 2 Clusters3 4 5 1 2 3 4 5 6 7 8 Off-line Analysis
  • 33. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies Model Selection The Piecewise Autoregressive model AR process: Xt = γ + φ1 Xt−1 + . . . + φp Xt−p + t The model Parameters for piecewise AR Number of segments m Breakpoints location/segment size (nj )j=1...m AR orders.(pj )j=1...m Segment 1, 0 < t ≤ 512: AR parameters Xt = 0.9Xt−1 + t Segment 2, 512 < t ≤ 768: (Ψj )j=1...m Xt = 1.69Xt−1 − 0.81Xt−2 + t Segment 3, 768 < t ≤ 1024: Very large model space Xt = 1.32Xt−1 − 0.81Xt−2 + t
  • 34. Model Selection: Minimum Description Length model selection for PAR [Davis, Lee, Rodriguez-Yam, J. American Statist. Assoc. 2006]
  The MDL principle: the best-fitting model is the one that produces the shortest code length that completely describes the observed data y:
  $\mathrm{CL}_{\mathcal{F}}(y) = \mathrm{CL}_{\mathcal{F}}(\hat{\mathcal{F}}) + \mathrm{CL}_{\mathcal{F}}(\hat{e} \mid \hat{\mathcal{F}})$
  where $\mathrm{CL}_{\mathcal{F}}(\hat{\mathcal{F}})$ is the description of the model and $\mathrm{CL}_{\mathcal{F}}(\hat{e} \mid \hat{\mathcal{F}})$ the description of the residuals, i.e. what is not explained by the model. In the formula below, m denotes the number of breakpoints, hence m+1 segments:
  $\mathrm{CL} = \log m + (m+1)\log n + \sum_{j=1}^{m+1}\log p_j + \sum_{j=1}^{m+1}\frac{p_j+2}{2}\log n_j + \sum_{j=1}^{m+1}\frac{n_j}{2}\log(2\pi\hat{\sigma}_j^2)$
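A direct transcription of the code-length formula, for comparing candidate segmentations. Assumptions of this sketch: each segment's AR model (with intercept γ) is fitted by ordinary least squares in pure Python, σ̂²_j is the mean squared residual, logs are natural (code length in nats), and `log(max(k, 1))` is used so that a model with no breakpoint or an AR order of 0 keeps a finite length. Only the scoring function is shown; the search over (m, breakpoints, orders) is not.

```python
import math
import random
from typing import List, Sequence

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def ar_residual_variance(seg: Sequence[float], p: int) -> float:
    """OLS fit of X_t = gamma + phi_1 X_{t-1} + ... + phi_p X_{t-p} + e_t; mean squared residual."""
    rows = [[1.0] + [seg[t - i] for i in range(1, p + 1)] for t in range(p, len(seg))]
    y = [seg[t] for t in range(p, len(seg))]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [v - sum(bi * ri for bi, ri in zip(beta, r)) for r, v in zip(rows, y)]
    return sum(e * e for e in resid) / len(resid)

def code_length(x: Sequence[float], breakpoints: List[int], orders: List[int]) -> float:
    """MDL code length (nats) of a piecewise AR model with m breakpoints and m+1 segments."""
    n, m = len(x), len(breakpoints)
    assert len(orders) == m + 1
    bounds = [0] + breakpoints + [n]
    # log(max(k, 1)) is a convention of this sketch so that m = 0 or p_j = 0 stays finite.
    cl = math.log(max(m, 1)) + (m + 1) * math.log(n)
    for j, p in enumerate(orders):
        seg = x[bounds[j]:bounds[j + 1]]
        nj = len(seg)
        s2 = ar_residual_variance(seg, p)
        cl += math.log(max(p, 1)) + (p + 2) / 2 * math.log(nj) + nj / 2 * math.log(2 * math.pi * s2)
    return cl

if __name__ == "__main__":
    rng = random.Random(1)
    x = [0.0]
    for t in range(1, 800):          # AR(1) with phi = 0.9, then phi = -0.9 from t = 400
        x.append((0.9 if t < 400 else -0.9) * x[-1] + rng.gauss(0.0, 1.0))
    print("one break:", code_length(x, [400], [1, 1]))
    print("no break: ", code_length(x, [], [1]))
```

The usage example shows the trade-off the criterion encodes: the extra breakpoint costs a few nats of model description but saves far more in residual description.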
  • 35. Model Selection: Results on the workload processes
  The amount of unterminated work in the system; smoothed workload difference. Typically low-order AR models; long segments.

  CE     No. of     Segment start  Segment end  Smallest root  Ljung-Box test on
         segments   [days]         [days]       abs. value     residuals (p-value)
  CE-A   18         158.91         196.53       1.5915         0.05
  CE-B   19         109.61         160.65       2.1563         0.04
  CE-C   17         104.86         149.31       5.5711         0.21
  CE-D   27         151.39         190.16       1.1062         0.05

  [T. Éltető et al., Discovering Piecewise Linear Models of Grid Workload, CCGrid 2010]
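The "smallest root abs. value" column is a stationarity check: an AR(p) segment is stationary (causal) when every root of $1 - \phi_1 z - \dots - \phi_p z^p$ lies outside the unit circle, i.e. when the smallest root modulus exceeds 1. Since the fitted models are typically of low order, this sketch handles only p ≤ 2 with the quadratic formula; for higher orders a general polynomial root finder (such as `numpy.roots`) would be needed. The function name is illustrative.

```python
import cmath
import math
from typing import Sequence

def smallest_root_modulus(phi: Sequence[float]) -> float:
    """Smallest |z| among the roots of 1 - phi_1 z - ... - phi_p z^p, for p <= 2.

    The AR model is stationary (causal) when this value is greater than 1."""
    if len(phi) == 2 and phi[1] != 0.0:
        a, b, c = -phi[1], -phi[0], 1.0              # a z^2 + b z + c = 0
        disc = cmath.sqrt(b * b - 4 * a * c)
        roots = ((-b + disc) / (2 * a), (-b - disc) / (2 * a))
        return min(abs(r) for r in roots)
    if len(phi) >= 1 and phi[0] != 0.0 and all(v == 0.0 for v in phi[1:]):
        return abs(1.0 / phi[0])                     # single root z = 1 / phi_1
    if all(v == 0.0 for v in phi):
        return math.inf                              # white noise: no finite root
    raise NotImplementedError("orders above 2 need a general root finder")
```

All three segments of the slide 33 example give 1/0.9 ≈ 1.111; values close to 1 indicate a segment near the non-stationary boundary.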
  • 36. Model Selection: Model validation
  PAR: Ljung-Box test of the whiteness of the AR residuals.
  Stability: bootstrapping, to check that the breakpoints are stable.
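The whiteness check can be sketched as the standard Ljung-Box portmanteau test: with residual autocorrelations r_k, $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$ is compared with a χ² distribution with h − p degrees of freedom (p fitted AR coefficients). To keep the example dependency-free, the χ² tail probability is computed from the regularized incomplete gamma function (series for small x, continued fraction otherwise); in practice a statistics library would be used. The number of lags h is a user choice that the slide does not specify, and it must exceed p.

```python
import math
from typing import Sequence, Tuple

def _gamma_q(a: float, x: float, eps: float = 1e-14, max_iter: int = 10_000) -> float:
    """Regularized upper incomplete gamma Q(a, x)."""
    if x <= 0.0:
        return 1.0
    log_pref = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series for the lower function P(a, x); Q = 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(max_iter):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return max(0.0, 1.0 - total * math.exp(log_pref))
    # Modified Lentz continued fraction for Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_pref) * h

def chi2_sf(q: float, dof: int) -> float:
    """Tail probability P(chi^2_dof > q)."""
    return _gamma_q(dof / 2.0, q / 2.0)

def ljung_box(resid: Sequence[float], lags: int, fitted_params: int = 0) -> Tuple[float, float]:
    """Ljung-Box statistic Q = n(n+2) sum_k r_k^2 / (n-k) and its p-value."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [e - mean for e in resid]
    denom = sum(v * v for v in dev)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    q *= n * (n + 2)
    return q, chi2_sf(q, lags - fitted_params)
```

A small p-value rejects whiteness, meaning the AR segment leaves structure in its residuals; the p-values in the results table are read the same way.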
  • 37. Model Selection: Model reconciliation by bootstrap aggregation
  Outcome: a simple and robust model describing the essential part of the workload process.
  • 38. Policy evaluation: Evaluation of the matchmaking scheduling policy
  ART: Actual Response Time, the queuing delay at the CE. ERT: Expected Response Time (gLite; Copernican principle).
  Question: how good is the prediction? Question: what is your definition of a good predictor? Root Mean Squared Error? A close statistical distribution, in the normal regime or in the tail? Correlation of time series? ROC (Receiver Operating Characteristic): the cost-benefit relation?
  Heterogeneous data.
  • 39. Policy evaluation: Evaluation of the matchmaking scheduling policy, overall
  The distributions are not consistent. RMSE: Atlas 7.94E4, Biomed 7.2E3. Correlation (subsampling at 900 s) is not convincing.
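The two headline measures can be computed on per-job data in a few lines. This sketch assumes each job is a tuple (submission time in seconds, ERT, ART), computes the RMSE over all jobs, and, for the correlation, keeps only the first job submitted in each 900 s window; the slide does not say which sample per window was kept, so that rule is an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple

Job = Tuple[float, float, float]   # (submission time [s], ERT, ART)

def rmse(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predicted and actual values."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def subsample(jobs: Sequence[Job], period: float = 900.0) -> List[Job]:
    """Keep the first job submitted in each `period`-second window."""
    first: Dict[int, Job] = {}
    for job in sorted(jobs):
        first.setdefault(int(job[0] // period), job)
    return [first[w] for w in sorted(first)]
```

RMSE is dominated by the few very long delays, which is one reason it says little about the bulk of the distribution and motivates the interval-based view.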
  • 40. Policy evaluation: Evaluation of the matchmaking scheduling policy à la BQP (Batch Queue Predictor)
  How often does the prediction lie within a reasonable distance of the actual value? The method is modified because BQP considers only upper bounds.
  ERT is treated as a classifier whose classes are intervals of the value range, of exponentially increasing size.
  ROC: True Positive Rate vs False Positive Rate.
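One way to turn ERT into a classifier along these lines. Assumptions of this sketch: the intervals double in width (base 2) above a hypothetical lowest bound `v0`, and each interval is scored one-vs-rest, giving one (TPR, FPR) operating point per class; a rate is `None` when its denominator is empty.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

def interval_class(v: float, v0: float) -> int:
    """Class 0 is [0, v0); class k >= 1 is [v0 * 2^(k-1), v0 * 2^k)."""
    return 0 if v < v0 else 1 + math.floor(math.log2(v / v0))

def per_class_rates(pred: Sequence[float], actual: Sequence[float],
                    v0: float) -> Dict[int, Tuple[Optional[float], Optional[float]]]:
    """One-vs-rest (TPR, FPR) of the predictor over exponentially growing intervals."""
    pc = [interval_class(p, v0) for p in pred]
    ac = [interval_class(a, v0) for a in actual]
    rates: Dict[int, Tuple[Optional[float], Optional[float]]] = {}
    for c in sorted(set(pc) | set(ac)):
        pos = sum(1 for a in ac if a == c)
        neg = len(ac) - pos
        tp = sum(1 for p, a in zip(pc, ac) if p == c and a == c)
        fp = sum(1 for p, a in zip(pc, ac) if p == c and a != c)
        rates[c] = (tp / pos if pos else None, fp / neg if neg else None)
    return rates
```

Exponential intervals make "a reasonable distance" relative: predicting 70 s for 80 s counts as correct, while predicting 200 s for 300 s does not.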
  • 41. Reinforcement learning for responsive grids: Reinforcement learning for resource provisioning in grids
  A multi-objective scheduling and dimensioning problem. Users: differentiated QoS. Stakeholders: fairness. Administrators: utilization.
  [Figure: cumulative probability of execution time [s], log scale, for all data, atlas and biomed.]
  Goals: elastic resource provisioning, in the context of Grids over Clouds, i.e. Infrastructure as a Service (IaaS). Realistic hypotheses: organized sharing and mutualization, no central control. Autonomics: model-free policies and configuration-free implementations.
  • 42. Reinforcement learning for responsive grids: Formalisation
  The scheduling MDP. State: descriptive variables of a site (queue, cluster). Action: descriptive variables of a job (VO, execution time).
  The dimensioning MDP. Action: number of computing nodes to keep active.
  Policy learning: SARSA algorithm. Continuous state-action space: non-linear regression of $Q: (s, a) \mapsto r$ with a Neural Network and an Echo State Network.
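The SARSA update at the core of the policy learning, sketched in tabular form for readability. This is a deliberate simplification: in the setting above the state-action space is continuous and Q is approximated by a Neural Network or an Echo State Network, whereas here Q is a dictionary keyed by (state, action). The learning rate `alpha`, discount `gamma` and exploration rate `eps` are generic hyperparameters, not values from the paper.

```python
import random
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]

def epsilon_greedy(q: QTable, state: Hashable, actions: Sequence[Hashable],
                   eps: float, rng: random.Random) -> Hashable:
    """Pick a random action with probability eps, else the action with the highest Q."""
    if rng.random() < eps:
        return rng.choice(list(actions))
    return max(actions, key=lambda a: q.get((state, a), 0.0))

def sarsa_update(q: QTable, s: Hashable, a: Hashable, r: float,
                 s_next: Hashable, a_next: Hashable,
                 alpha: float, gamma: float) -> float:
    """On-policy TD update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    old = q.get((s, a), 0.0)
    target = r + gamma * q.get((s_next, a_next), 0.0)
    q[(s, a)] = old + alpha * (target - old)
    return q[(s, a)]
```

SARSA is on-policy: the target uses the action a' actually chosen by the ε-greedy policy, not the greedy maximum. In the scheduling MDP, `actions` would be the descriptors of the jobs eligible for a site; in the dimensioning MDP, the candidate numbers of active nodes.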
  • 43. Reinforcement learning for responsive grids: The Rewards
  The Responsiveness utility for job j is
  $W_j = \frac{\text{execution time}_j}{\text{execution time}_j + \text{waiting time}_j}$   (1)
  The Fairness utility for job j is
  $F_j = 1 - \frac{\max_k (w_k - S_{kj})^+}{M}$,   (2)
  where $x^+ = x$ if $x > 0$ and 0 otherwise, $w_k$ is the target share of VO k, and $S_{kj}$ the share received by VO k up to the election of job j.
  The Utilization reward $U_n$ at time $T_n$ is
  $U_n = \frac{f_n}{\sum_{k=0}^{n} P_k (T_{k+1} - T_k)}$   (3)
  where $(T_1, \dots, T_N)$ are the instants of decision making, $P_k$ the number of processors allocated in the interval $[T_k, T_{k+1}]$ for $1 \le n < N$, and $f_n$ the sum of the execution times of jobs completed at time $T_n$.
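The three rewards transcribed into code, as a sketch. The slide does not define M in Eq. (2), so it is passed here as a normalising constant. The dictionaries map VO names to shares (the names used in the tests are only example keys), and `times` must hold $T_0, \dots, T_{n+1}$ so that Eq. (3) can be evaluated.

```python
from typing import Dict, Sequence

def responsiveness(exec_time: float, wait_time: float) -> float:
    """Eq. (1): W_j = exec / (exec + wait); equals 1 when the job did not wait."""
    return exec_time / (exec_time + wait_time)

def fairness(target: Dict[str, float], received: Dict[str, float], m: float) -> float:
    """Eq. (2): F_j = 1 - max_k (w_k - S_kj)^+ / M; only under-served VOs are penalised."""
    worst = max(max(target[k] - received.get(k, 0.0), 0.0) for k in target)
    return 1.0 - worst / m

def utilization(f_n: float, procs: Sequence[float], times: Sequence[float], n: int) -> float:
    """Eq. (3): U_n = f_n / sum_{k=0..n} P_k (T_{k+1} - T_k)."""
    busy = sum(procs[k] * (times[k + 1] - times[k]) for k in range(n + 1))
    return f_n / busy
```

The positive part in Eq. (2) means a VO that received more than its target share does not raise F_j; only the most under-served VO lowers it.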
  • 44. Reinforcement learning for responsive grids: Experimental results on EGEE traces
  [Figure, three panels: "Queuing delays - interactive jobs - Rigid" and "Queuing delays - interactive jobs - Elastic" (CDF vs queueing delay in seconds, log scale), and "Dynamics of the fairshare - All jobs - Rigid" (fairshare difference, ×10^-3, vs arrival times in seconds, ×10^6, for ELA-ORA-0.5 minus EGEE). Curves: EGEE-INTER, ORA-INTER-0.5, ORA-INTER-1.0, EST-INTER-0.5, EST-INTER-1.0, ELA-ORA-0.5, ELA-ORA-1.0, ELA-EST-0.5, ELA-EST-1.0, RIG-ORA-0.5, RIG-ORA-1.0.]
  [J. Perez et al., JoGC 8(3), Sep. 2010]