Towards Autonomic Grids
1. Towards Autonomic Grids
Cécile Germain-Renaud
Laboratoire de Recherche en Informatique
Université Paris-Sud - CNRS - INRIA
2. e-science infrastructures
2003 NSF Atkins Report :
Revolutionizing Science and Engineering
through Cyberinfrastructure
Grids of computational centers
Comprehensive libraries of digital
objects
Well-curated collections of
scientific data
Online instruments and vast sensor
arrays
Convenient software toolkits
3. e-science infrastructures
Online instruments and vast sensor arrays
The largest (circumference 26 km), fastest (14 TeV), coldest (1.9 K), emptiest (10^-13 atm) machine.
4. e-science infrastructures
Well-curated collections of scientific data
Storage and analysis of 15 PB/year
5. e-science infrastructures
Grids of computational centers
The largest (40000 CPUs), most complex (200 VOs), most distributed (250 sites), most used (300K jobs/day) computing machine
7. Outline
1 The grid ecosystem
2 Grids and Autonomic Computing
3 The Grid Observatory
4 Learning grid models
On-line fault detection
Model Selection
5 Model-free policies
Policy evaluation
Reinforcement learning for responsive grids
8. The grid ecosystem Grids and Autonomic Computing The Grid Observatory Learning grid models Model-free policies
e-science infrastructures
The classical definition of grids
A computational grid is a hardware and software infrastructure
that provides dependable, consistent, pervasive, and inexpensive
access to high computational capabilities.
I. Foster, C. Kesselman, The Grid, 1998
An old dream
UCLA press release on the creation of Arpanet, 1969
9. The niches in the ecosystem
10. Grids are not about technology, but about sharing
Ian Foster's definition (2000)
Grids are defined by coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.
The sharing is necessarily highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. A set of individuals and/or institutions defined by such sharing rules form a virtual organization.
Consumers: large-scale international collaborations
Different users with differentiated requirements across and within the collaborations
11. Grids are not about technology, but about sharing
Providers: national and regional institutions
Organized in National Grid Initiatives, coordinated by EGI
12. Grids are not about technology, but about sharing
Operators: local sites, with temporary EU support (EGI-Inspire)
Configuration, prioritization, monitoring, accounting, ...
13. Do Datacenters and Clouds make Grids obsolete?
14. *-aaS
Courtesy William Vambenepe - slides from the Cloud Connect keynote Freeing SaaS from Cloud
15. Grids and Clouds
IaaS: on-demand, elastic, virtualization-based provisioning
A single-objective optimization target: pay less by switching resources on and off at the scale of minutes rather than days or weeks
Convergence path: Grids over Clouds or Clouds of Grids?
EU project StratusLab
SaaS: the core of the IT process lies in deploying and orchestrating heterogeneous software components, and having them "in the cloud" does not help much
16. Autonomic Computing
Computing systems that manage themselves in accordance with high-level objectives from humans
Kephart and Chess, The Vision of Autonomic Computing, IEEE Computer 2003
AUTONOMIC VISION & MANIFESTO
http://www.research.ibm.com/autonomic/manifesto/
Relation with Machine Learning: I. Rish tutorial @ECML 2006
Self-managing systems with the ability of
Self-healing: detect, diagnose and repair failures
Self-configuring: automatically incorporate and configure components
Self-optimizing: ensure optimal functioning with respect to high-level requirements
Self-protecting: anticipate and defend against security breaches
On dynamical, non-steady-state systems
17. Autonomic Computing
18. Autonomic Computing
19. Autonomic Grids
Emerging behaviour as the result of sites' and stakeholders' decisions
Coupled usage: Virtual Organizations, community software and activity
Feedback loops in the middleware
Incomplete and noisy information
We need
Inference of models for middleware components and applications, users and usage profiles, user interactions, inconsistencies
Self-configuration and self-optimization for management policies
Self-healing across middleware and applications
20. Goals
Grid digital assets curation
Collecting verifiable digital assets
Providing digital asset search and retrieval
Certification of the trustworthiness and integrity of the
collection content
Semantic and ontological continuity and comparability of the
collection
Building the domain knowledge
Dimensionality and volume reduction: getting rid of the
massive redundancy in operational logs
Answering operational issues
Descriptive/generative/predictive models
Design and validation of model-free policies
21. Support and collaborations
22. Methods
Focused on EGEE/EGI (www.grid-observatory.org)
The best approximation of the current needs of e-science
Extensive monitoring facilities
Traces were discarded after operational usage, and in any case not available to the scientific community
Now available without grid certificate
24. Grids are complex systems
Users/Files/Clients worker nodes graph display with AVIZ GraphDice
25. Grids are complex systems
Users in green, file groups in purple. Rightmost is most "active"
And also [Lovro Iliasic PhD Computational Grids as Complex Networks]
26. Issues
Large non-stationary system
Courtesy M. Lassnig et al. Austrian Grid Symp. 09
Trends
Academic events
Scientific events
Software events
27. On-line fault detection
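Abrupt changepoint detection
Page-Hinkley Statistics - jumps in the mean
$p_t$: changing distribution
$\bar{p}_t = \frac{1}{t} \sum_{\ell=1}^{t} p_\ell$
$m_t = \sum_{\ell=1}^{t} (p_\ell - \bar{p}_\ell + \delta)$
$M_t = \max_{\ell \le t} \{m_\ell\}$
$PH_t = M_t - m_t$
CUSUM test: if $PH_t > \lambda$, change detected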
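The Page-Hinkley recursion can be sketched in a few lines of Python. This is a minimal illustration, not the Grid Observatory implementation: the class name, the helper `first_change`, and the default values of `delta` and `lam` are assumptions chosen for the example. With the sign convention of the slide, the statistic grows when the mean of the stream drops.

```python
class PageHinkley:
    """Page-Hinkley test following PH_t = M_t - m_t (illustrative sketch)."""

    def __init__(self, delta=0.05, lam=2.0):
        self.delta = delta          # tolerance added at each step
        self.lam = lam              # detection threshold (lambda)
        self.t = 0                  # number of samples seen
        self.mean = 0.0             # running mean, bar p_t
        self.m = 0.0                # cumulative sum m_t
        self.M = float("-inf")      # running maximum M_t

    def update(self, p):
        """Feed one sample; return True when PH_t exceeds lambda."""
        self.t += 1
        self.mean += (p - self.mean) / self.t
        self.m += p - self.mean + self.delta
        self.M = max(self.M, self.m)
        return (self.M - self.m) > self.lam


def first_change(stream, delta=0.05, lam=2.0):
    """Index of the first sample at which a change is flagged, else None."""
    detector = PageHinkley(delta, lam)
    for i, p in enumerate(stream):
        if detector.update(p):
            return i
    return None
```

On a stream that holds at 1.0 and then falls to 0.0, `first_change` flags the change a few samples after the jump; a larger `lam` delays detection but reduces false alarms.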
First Application
Blackhole detection
Validation requires expert
interpretation
28. On-line fault detection
StrAP: On-line clustering aka Streaming
Affinity Propagation (AP) [Frey2007]
statistical physics algorithm for clustering (based on message passing)
a cluster = an exemplar (akin to k-centers)
the model = set of {exemplar, frequency}
Why AP?
Traceability: real jobs as exemplars, because of categorical variables, e.g., userid, queue name, etc.
No prior knowledge of K, the number of clusters
quasi-optimality wrt. information loss → stability [Meila2006]
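To illustrate why exemplars suit job data with categorical fields, here is a small sketch that picks a real job as the representative of an already-formed cluster and records the {exemplar, frequency} pair. It is not Affinity Propagation itself (no message passing): it only shows the medoid idea. The field names `user`, `queue`, `runtime`, the distance function, and the reading of "frequency" as the cluster size are all assumptions made for the example.

```python
def job_distance(a, b):
    """Hypothetical mixed distance: 1 per differing categorical field,
    plus the runtime gap expressed in hours."""
    d = 0.0
    d += a["user"] != b["user"]
    d += a["queue"] != b["queue"]
    d += abs(a["runtime"] - b["runtime"]) / 3600.0
    return d


def exemplar(cluster):
    """Return the real job minimizing the summed distance to its cluster.
    A mean is undefined for userid or queue name; a medoid always exists."""
    return min(cluster, key=lambda c: sum(job_distance(c, o) for o in cluster))


def build_model(clusters):
    """Model = list of (exemplar, frequency), frequency taken as cluster size."""
    return [(exemplar(c), len(c)) for c in clusters]
```

Because the exemplar is an actual job record, an operator can trace any cluster back to a concrete submission.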
29. On-line fault detection
From AP to Large-scale Data Streaming
1. Scalability: from $O(N^2 \log N)$ to $O(N^{\frac{h+2}{h+1}})$
Hierarchical Affinity Propagation
negligible information loss (proof in the paper)
30. On-line fault detection
From AP to Large-scale Data Streaming
2. Non-stationary distribution
various Virtual Organizations
number and expertise of users
Streaming AP (StrAP)
31. On-line fault detection
Adaptive change detection test
Self-adapt $\lambda$ ≡ an optimization problem
BIC: $F_\lambda = \frac{1}{|C|} \sum_{i=1}^{|C|} \frac{1}{n_i} \sum_{e_j \in C_i} d(e_j, e_i^*) + \frac{\varphi \rho}{2} \log N + \eta O_t$
∝ loss + size of model + fraction of outliers
OPTIMIZATION:
ε-greedy search from a finite set of λ values: $\lambda = \arg\min \{E(F_\lambda)\}$
λ1 λ2 λ3 λ4 ...
E(Fλ1) E(Fλ2) E(Fλ3) E(Fλ4) ...
Gaussian Process Regression based on $\{\lambda_i, F_{\lambda_i}\}$: a continuous value of λ is generated
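The ε-greedy search over a finite set of λ values can be sketched as follows. This is an illustration under stated assumptions: `cost` stands in for an evaluation of $F_\lambda$ on the current stream (here any callable), and the function name, `rounds`, `epsilon`, and `seed` are hypothetical parameters, not values from the talk. The Gaussian Process variant is not shown.

```python
import random


def epsilon_greedy_lambda(candidates, cost, rounds=200, epsilon=0.1, seed=0):
    """Pick the candidate lambda with the lowest empirical mean cost E(F_lambda)."""
    rng = random.Random(seed)
    rounds = max(rounds, len(candidates))    # every candidate is tried once
    totals = {lam: 0.0 for lam in candidates}
    counts = {lam: 0 for lam in candidates}

    def mean_cost(lam):
        return totals[lam] / counts[lam]

    for step in range(rounds):
        if step < len(candidates):
            lam = candidates[step]                  # initial sweep
        elif rng.random() < epsilon:
            lam = rng.choice(candidates)            # explore
        else:
            lam = min(candidates, key=mean_cost)    # exploit current best
        totals[lam] += cost(lam)
        counts[lam] += 1
    return min(candidates, key=mean_cost)
```

For instance, with a synthetic cost `lambda lam: (lam - 2.0) ** 2` and candidates `[0.5, 1.0, 2.0, 4.0, 8.0]`, the search settles on 2.0. In a non-stationary setting the exploration step is what lets the threshold drift away from a value that used to be best.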
33. Model Selection
The Piecewise Autoregressive model
AR process: $X_t = \gamma + \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \epsilon_t$
The model parameters for piecewise AR
Number of segments $m$
Breakpoint locations/segment sizes $(n_j)_{j=1 \ldots m}$
AR orders $(p_j)_{j=1 \ldots m}$
AR parameters $(\Psi_j)_{j=1 \ldots m}$
Very large model space
Example
Segment 1, $0 < t \le 512$: $X_t = 0.9 X_{t-1} + \epsilon_t$
Segment 2, $512 < t \le 768$: $X_t = 1.69 X_{t-1} - 0.81 X_{t-2} + \epsilon_t$
Segment 3, $768 < t \le 1024$: $X_t = 1.32 X_{t-1} - 0.81 X_{t-2} + \epsilon_t$
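The three-segment example can be simulated directly, which helps to see what a piecewise AR series looks like before any model selection is attempted. The coefficients and breakpoints are those of the slide; the Gaussian noise with unit variance, the zero constant term, and the seed are assumptions of this sketch.

```python
import random

# (last time index of the segment, AR coefficients phi_1..phi_p)
SEGMENTS = [
    (512, [0.9]),
    (768, [1.69, -0.81]),
    (1024, [1.32, -0.81]),
]


def simulate_piecewise_ar(segments=SEGMENTS, sigma=1.0, seed=0):
    """Generate X_1..X_n, switching AR coefficients at each breakpoint."""
    rng = random.Random(seed)
    x = []
    t = 0  # 0-based index, so segment 1 covers indices 0..511
    for end, phis in segments:
        while t < end:
            # Weighted sum of the available past values.
            past = sum(phi * x[t - 1 - i]
                       for i, phi in enumerate(phis) if t - 1 - i >= 0)
            x.append(past + rng.gauss(0.0, sigma))
            t += 1
    return x
```

Calling `simulate_piecewise_ar()` returns 1024 values; plotting them shows the change of dynamics at t = 512 and t = 768 even though the noise level never changes.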
34. Model Selection
Minimum Description Length model selection for PAR
[Davis, Lee, Rodriguez-Yam, J. American Statist. Assoc. 2006.]
The MDL principle: the best-fitting model is the one that produces the shortest code length that completely describes the observed data $y$
$CL_F(y) = CL_F(\hat{F}) + CL_F(\hat{e} \mid \hat{F})$
$CL_F(\hat{F})$: description of the model
$CL_F(\hat{e} \mid \hat{F})$: description of the residuals, i.e. what is not explained by the model
$CL = \log m + (m+1) \log n + \sum_{j=1}^{m+1} \left[ \log p_j + \frac{p_j+2}{2} \log n_j + \frac{n_j}{2} \log(2\pi\hat{\sigma}_j^2) \right]$
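The code length above is cheap to evaluate once a segmentation is fixed, which is what makes it usable as a search criterion. The sketch below computes it for a list of segments; the function name and the tuple layout are assumptions, and so is the convention of treating $\log m$ and $\log p_j$ as 0 when their argument is 0, which the slide does not specify.

```python
import math


def par_code_length(segments):
    """MDL code length of a piecewise AR model.

    segments: list of (n_j, p_j, sigma2_j) = segment size, AR order,
    estimated residual variance. m is the number of breakpoints.
    """
    m = len(segments) - 1
    n = sum(n_j for n_j, _, _ in segments)
    # Assumed convention: log of 0 contributes nothing.
    cl = (math.log(m) if m > 0 else 0.0) + (m + 1) * math.log(n)
    for n_j, p_j, sigma2_j in segments:
        cl += math.log(p_j) if p_j > 0 else 0.0
        cl += (p_j + 2) / 2 * math.log(n_j)
        cl += n_j / 2 * math.log(2 * math.pi * sigma2_j)
    return cl
```

Comparing two candidate segmentations then reduces to comparing two numbers: a model with more segments or higher orders pays in the first terms and must earn it back through smaller residual variances.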
35. Model Selection
Results on the workload processes
The amount of unterminated work in the system
Smoothed workload difference
Typically low-order AR models
Long segments
CE    No. of segments  Segment start [days]  Segment end [days]  Smallest root abs. value  Ljung-Box test on residuals (p-value)
CE-A  18               158.91                196.53              1.5915                    0.05
CE-B  19               109.61                160.65              2.1563                    0.04
CE-C  17               104.86                149.31              5.5711                    0.21
CE-D  27               151.39                190.16              1.1062                    0.05
[T. Éltető et al. Discovering Piecewise Linear Models of Grid Workload, CCGrid 2010]
36. Model Selection
Model validation
PAR: Ljung-Box test - whiteness of the AR residuals
Stability: Bootstrapping - stable breakpoints
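The Ljung-Box statistic used for the whiteness check is $Q = n(n+2) \sum_{k=1}^{h} \hat{\rho}_k^2 / (n-k)$, where $\hat{\rho}_k$ is the sample autocorrelation of the residuals at lag $k$. A minimal sketch follows; the function name and the `max_lag` parameter are assumptions. The p-value, which requires a chi-square distribution, is deliberately left out.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    q = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation at lag k.
        rho_k = sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q
```

A large Q means the residuals are still correlated, so the fitted AR segment has not captured all the structure; white residuals give a small Q.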
37. Model Selection
Model reconciliation – bootstrap aggregation
Outcome: a simple and robust model describing the essential
part of the workload process.
38. Policy evaluation
Evaluation of the matchmaking scheduling policy
ART: Actual Response Time = queuing delay at the CE
ERT: Expected Response Time, Copernican principle, gLite
Question: how good is the prediction?
Question: what is your definition of a good predictor?
Root Mean Squared Error?
Close statistical distribution, at normal regime, in the tail?
Correlation of time series?
ROC (Receiver Operating Characteristic): cost-benefit relation
Heterogeneous data
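Two of the candidate criteria listed above, RMSE and correlation, can be computed on paired ERT/ART samples as in this sketch. The function names are illustrative, and the inputs are assumed to be equal-length lists of response times in seconds.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predictions (ERT) and observations (ART)."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

RMSE is dominated by the few very long delays in the tail, while correlation ignores scale entirely, which is one reason a single number is a poor summary on heterogeneous data.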
39. Policy evaluation
Evaluation of the matchmaking scheduling policy
Overall
The distributions are not consistent
RMSE: Atlas 7.94E4, Biomed 7.2E3
Correlation (subsampling at 900s) is not convincing
40. Policy evaluation
Evaluation of the matchmaking scheduling policy
À la BQP (Batch Queue Predictor): how often does the prediction lie within a reasonable distance of the actual? Modified because BQP considers only upper bounds
ERT is a classifier; the classes are intervals of the value range, with intervals of exponentially increasing size
ROC: True Positive Rate vs False Positive Rate
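Treating ERT as a classifier over exponentially growing intervals can be sketched as below. The base-10 intervals ([0, 10), [10, 100), ...) and the function names are assumptions; the talk does not state the base. Each call yields one (TPR, FPR) operating point for one class, not a full ROC curve.

```python
def interval_class(seconds, base=10.0):
    """Index of the exponentially sized interval containing the value:
    0 for [0, base), 1 for [base, base^2), and so on."""
    k, upper = 0, base
    while seconds >= upper:
        upper *= base
        k += 1
    return k


def roc_point(predicted, actual, cls):
    """(TPR, FPR) of the prediction 'the response time falls in class cls'."""
    pred = [interval_class(p) == cls for p in predicted]
    act = [interval_class(a) == cls for a in actual]
    tp = sum(p and a for p, a in zip(pred, act))
    fp = sum(p and not a for p, a in zip(pred, act))
    pos = sum(act)
    neg = len(act) - pos
    tpr = tp / pos if pos else 0.0
    fpr = fp / neg if neg else 0.0
    return tpr, fpr
```

Exponential intervals make a miss of 50 s on a 10 s job count as an error while the same miss on a 10-hour job does not, which matches the "reasonable distance" notion above.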
41. Reinforcement learning for responsive grids
Reinforcement learning for resource provisioning in grids
A multi-objective scheduling and dimensioning problem
Users: Differentiated QoS
Stakeholders: Fairness
Administrators: Utilization
[Figure: probability vs. execution time [s], log scale from 10^0 to 10^6; curves: all data, atlas, biomed]
Goals
Elastic resource provisioning: the context is Grids over Clouds
- Infrastructure as a Service (IaaS)
Realistic hypotheses: organized sharing and mutualization, no
central control
Autonomics: Model-free policies and configuration-free
implementations
42. Reinforcement learning for responsive grids
Formalisation
The scheduling MDP
State: descriptive variables of a site (queue, cluster)
Action: descriptive variables of a job (VO, execution time)
The dimensioning MDP
Action: number of computing nodes to maintain in activity
Policy learning
SARSA algorithm
Continuous state-action space: non-linear regression of Q : (s, a) → r
Neural Network and Echo State Network
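For readers unfamiliar with SARSA, the core update can be shown on a tabular Q function. This is a deliberate simplification: the talk uses a continuous state-action space with neural regression of Q, whereas the sketch below stores Q in a dictionary. The function names and the values of `alpha`, `gamma`, and `epsilon` are assumptions.

```python
import random


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the best known one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


def sarsa_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    old = q.get((s, a), 0.0)
    target = reward + gamma * q.get((s_next, a_next), 0.0)
    q[(s, a)] = old + alpha * (target - old)
    return q[(s, a)]
```

The update uses the action actually chosen in the next state, which is what distinguishes SARSA from Q-learning. Replacing the dictionary with a regressor trained on (s, a) pairs gives the continuous variant named on the slide.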
43. Reinforcement learning for responsive grids
The Rewards
The Responsiveness utility for job $j$ is
$W_j = \frac{\text{execution time}_j}{\text{execution time}_j + \text{waiting time}_j}$. (1)
The Fairness utility for job $j$ is
$F_j = 1 - \frac{\max_k (w_k - S_{kj})^+}{M}$, (2)
where $x^+ = x$ if $x > 0$ and 0 otherwise, $w_k$ the target share of VO $k$, and $S_{kj}$ the share received by VO $k$ up to the election of job $j$
The Utilization reward $U_n$ at time $T_n$ is
$U_n = \frac{f_n}{\sum_{k=0}^{n} P_k (T_{k+1} - T_k)}$ (3)
where $(T_1, \ldots, T_N)$ are the instants of decision making, $P_k$ the number of processors allocated in the interval $[T_k, T_{k+1}]$ for $1 \le n < N$, and $f_n$ the sum of the execution times of jobs completed at time $T_n$.
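The three rewards translate into short functions, which makes their ranges easy to check on small numbers. The function names and argument layout are assumptions; in particular the normalizer $M$ is passed in explicitly because the slide does not define it.

```python
def responsiveness(execution_time, waiting_time):
    """W_j: fraction of the total sojourn time spent executing (Eq. 1)."""
    return execution_time / (execution_time + waiting_time)


def fairness(target_shares, received_shares, m):
    """F_j: 1 minus the largest share deficit over VOs, normalized by M (Eq. 2).

    target_shares / received_shares: dicts mapping VO name to share.
    """
    worst = max(
        max(target_shares[vo] - received_shares.get(vo, 0.0), 0.0)
        for vo in target_shares
    )
    return 1.0 - worst / m


def utilization(completed_work, processors, instants):
    """U_n: completed execution time over allocated processor time (Eq. 3).

    processors[k] is P_k on [instants[k], instants[k+1]].
    """
    allocated = sum(p * (instants[k + 1] - instants[k])
                    for k, p in enumerate(processors))
    return completed_work / allocated
```

For example, a job that runs 30 s after waiting 10 s has responsiveness 0.75, and 100 s of completed work on 10 processors held for 10 s gives utilization 1.0.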
44. Reinforcement learning for responsive grids
Experimental results on EGEE traces
[Figure: Queuing delays - interactive jobs - Rigid. CDF vs. queueing delay (sec), log scale]
[Figure: Queuing delays - interactive jobs - Elastic. CDF vs. queueing delay (sec), log scale]
[Figure: Dynamics of the fairshare - All jobs - Rigid. Fairshare difference (ELA-ORA-0.5 − EGEE) vs. arrival times (sec)]
Legend entries: EGEE-INTER, ORA-INTER-0.5, ORA-INTER-1.0, EST-INTER-0.5, EST-INTER-1.0, ELA-ORA-0.5, ELA-ORA-1.0, ELA-EST-0.5, ELA-EST-1.0, RIG-ORA-0.5, RIG-ORA-1.0
[J. Perez et al. JoGC 8/3 Sep. 2010]