Rafael Ferreira da Silva – rafael.silva@creatis.insa-lyon.fr
On-line, Non-Clairvoyant Optimization of
Workflow Activity Granularity on Grids
Rafael FERREIRA DA SILVA, Tristan GLATARD
University of Lyon, CNRS, INSERM, CREATIS
Villeurbanne, France
Frédéric DESPREZ
INRIA, University of Lyon, LIP, ENS Lyon
Lyon, France
Euro-Par 2013
August 26-30, 2013
Outline
  Context
  The Virtual Imaging Platform
  Problem definition
  Task granularity
  Self-healing of workflow executions on grids
  Task granularity control process
  Experiments and results
  Conclusion
Context
  Virtual Imaging Platform (VIP)
  Medical imaging science-gateway
  Grid of ~180 sites (EGI – http://www.egi.eu)
  Significant usage
  452 registered users from 50 countries
  Consumed 472 CPU years from
August 2012 to July 2013
http://dirac.france-grilles.fr
VIP consumption since August 2012
Workflow Execution
1. Input data upload
2. User launches a simulation
3. MOTEUR generates invocations
4. GASW generates grid jobs
5. Jobs are submitted to DIRAC
6. Pilot jobs are submitted to EGI
7. Pilot jobs fetch grid jobs
8. Inputs download
9. Execution
10. Results upload
11. Download results
Task Granularity
  Low performance of lightweight (a.k.a. fine-grained) tasks:
  High queuing times
  Communication overhead
  Lightweight task executions are delayed
  Grouping them into coarse-grained tasks reduces the cost of data transfers when grouped tasks share input data, and saves queuing time
[Figure: tasks t1 to t5 scheduled over time on resources R1 to R3, first as lightweight tasks and then with t1 and t2 grouped]
Workflow Self-Healing
  Problem: costly manual operations
  Rescheduling tasks, restarting services or replicating data files
  In this work: task granularity in distributed workflows
  Objective: automated platform administration
  Autonomous detection of fine-grained tasks
  Perform appropriate set of actions
  Assumptions: online and non-clairvoyant
  Only partial information available
  Decisions must be fast
  Production conditions: no prediction of user activity or workloads
General MAPE-K loop
Monitoring
  Triggered by an event (job completion and failures) or a timeout
  Produces monitoring data
Analysis
  Incident 1: degree η = 0.8
  Incident 2: degree η = 0.4
  Incident 3: degree η = 0.1
  Each incident degree is quantified into level 1, 2 or 3
  Roulette wheel selection: incident i is selected with probability $\eta_i / \sum_{j=1}^{n} \eta_j$
  Incident 1 selected
Planning
  Association rules for incident 1:
  Rule      Confidence (ρ)   ρ × η
  2 → 1     0.8              0.32
  3 → 1     0.2              0.02
  1 → 1     1.0              0.80
  Roulette wheel selection based on association rules
  Incident 2 selected
Execution
  Set of actions (x2)
Knowledge
R. Ferreira da Silva, T. Glatard, F. Desprez, Self-healing of workflow activity incidents
on distributed computing infrastructures, Future Generation Computer Systems
(FGCS), in press, 2013.
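The roulette wheel selection used in the loop can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: the function names `roulette_wheel` and `selection_probabilities` are hypothetical, and the weights are the incident degrees (0.8, 0.4, 0.1) and the ρ × η products (0.32, 0.02, 0.80) shown on the slide.

```python
import random


def selection_probabilities(weights):
    """Normalise weights so that entry i becomes weights[i] / sum(weights)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


def roulette_wheel(weights, rng=random):
    """Return an index drawn with probability proportional to its weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    threshold = rng.random() * total
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point round-off


# Incident degrees from the slide (incidents 1, 2 and 3).
incident_degrees = [0.8, 0.4, 0.1]
selected_incident = roulette_wheel(incident_degrees) + 1

# Rule weights rho x eta from the slide (rules 2->1, 3->1, 1->1).
rule_weights = [0.32, 0.02, 0.80]
selected_rule = roulette_wheel(rule_weights)
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible, which is convenient when replaying logged incidents.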
Incident Levels and Actions
  Incident degrees are quantified in discrete incident levels
  Thresholds are determined from visual mode clustering or K-means
  Thresholds cluster platform configurations into groups
[Figure: incident degrees split by a threshold into two groups, labelled "No actions are triggered" and "Triggers a set of actions"]
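Quantifying a degree into a level amounts to locating it among sorted thresholds. The sketch below assumes that a degree equal to a threshold stays in the lower level; the function name `incident_level` and the threshold value 0.5 are hypothetical and do not come from the platform logs.

```python
from bisect import bisect_left


def incident_level(degree, thresholds):
    """Map an incident degree to a 1-based level.

    thresholds must be sorted in ascending order; a degree equal to a
    threshold is kept in the lower level (assumption).
    """
    return bisect_left(thresholds, degree) + 1


HYPOTHETICAL_THRESHOLDS = [0.5]  # illustrative value only

print(incident_level(0.3, HYPOTHETICAL_THRESHOLDS))  # 1
print(incident_level(0.5, HYPOTHETICAL_THRESHOLDS))  # 1
print(incident_level(0.7, HYPOTHETICAL_THRESHOLDS))  # 2
```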
Fineness control: degree
  Task execution
  Phases: Queued Time, Shared Input Data, Other Input Data, Application Execution
[Figure: task execution timeline annotated with $q_j$, $\tilde{t}_{\text{shared}}$ and $t$]
  Incident degree
$$\eta_f = \max_{i \in [1,m]} \{ f_i = d_i \cdot r_i \}$$
$$d_i = \frac{\tilde{t}_{\text{shared}}}{\tilde{t}_{\text{shared}} + n_i (\tilde{t} - \tilde{t}_{\text{shared}})}$$
$$r_i = \frac{\max_{j \in [1,n_i]} q_j}{\max_{j \in [1,n_i]} q_j + \tilde{t}_{\text{shared}} + n_i (\tilde{t} - \tilde{t}_{\text{shared}})}$$
  $\tilde{t}$, $\tilde{t}_{\text{shared}}$: median task phase durations
  i = waiting task
  n = number of waiting tasks
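The fineness degree can be computed directly from the three formulas on this slide. The following is a sketch under stated assumptions: the `Activity` container and its field names are hypothetical, and returning 0 when no task is waiting is a choice made here, not something the slides specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Activity:
    t_median: float            # median task duration (t~), in seconds
    t_shared_median: float     # median shared-input transfer time (t~_shared)
    queued_times: List[float]  # q_j of each of the n_i waiting tasks


def activity_fineness(activity):
    """Return f_i = d_i * r_i for one activity."""
    n = len(activity.queued_times)
    if n == 0:
        return 0.0  # assumption: nothing is waiting, so nothing to group
    # Shared transfer paid once, plus n_i times the non-shared part.
    grouped_time = activity.t_shared_median + n * (
        activity.t_median - activity.t_shared_median
    )
    if grouped_time <= 0:
        return 0.0
    d = activity.t_shared_median / grouped_time
    q_max = max(activity.queued_times)
    r = q_max / (q_max + grouped_time)
    return d * r


def fineness_degree(activities):
    """Return eta_f, the maximum f_i over all activities."""
    return max((activity_fineness(a) for a in activities), default=0.0)


fine = Activity(t_median=100.0, t_shared_median=80.0, queued_times=[200.0, 300.0])
coarse = Activity(t_median=1000.0, t_shared_median=10.0, queued_times=[5.0])
eta_f = fineness_degree([fine, coarse])
```

In this made-up example the first activity dominates: most of its duration is the shared-input transfer and its tasks queue longer than they would run once grouped.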
Fineness control: task estimation
  Estimation of task durations
  Job phases: setup → inputs download → execution → outputs upload
  Assumption: bag of tasks (all jobs have equal durations)
  Median-based estimation:
  Phase             Median duration of job phases   Real job duration   Estimated job duration
  setup             50s                             42s (completed)     42s
  inputs download   250s                            300s (completed)    300s
  execution         400s                            20s (current)       400s*
  outputs upload    15s                             ?                   15s
  Total             $\tilde{t}$ = 715s                                  $\tilde{t}_i$ = 757s
  *: max(400s, 20s) = 400s
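The median-based estimation can be reproduced with the numbers from this slide: completed phases keep their real duration, the current phase takes the larger of its median and its elapsed time, and remaining phases take their median. The function and key names below are hypothetical.

```python
from statistics import median

PHASES = ("setup", "inputs_download", "execution", "outputs_upload")


def median_phase_durations(completed_jobs):
    """Median duration of each phase over jobs that already finished."""
    return {p: median(job[p] for job in completed_jobs) for p in PHASES}


def estimate_job_duration(medians, completed, current_phase, elapsed):
    """Estimate the total duration of a job that is still running."""
    total = 0
    for phase in PHASES:
        if phase in completed:
            total += completed[phase]                 # real, measured duration
        elif phase == current_phase:
            total += max(medians[phase], elapsed)     # at least what has elapsed
        else:
            total += medians[phase]                   # not started yet
    return total


# Values from the slide.
medians = {"setup": 50, "inputs_download": 250, "execution": 400, "outputs_upload": 15}
estimate = estimate_job_duration(
    medians,
    completed={"setup": 42, "inputs_download": 300},
    current_phase="execution",
    elapsed=20,
)
print(sum(medians.values()), estimate)  # 715 757
```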
Fineness control: levels and actions
  Levels: identified from the platform logs
  Threshold $\tau_f$ on the incident degree separates Level 1 (no actions) from Level 2 (action: task grouping)
  Actions
  Task grouping
  Tasks are grouped pairwise until $\eta_f \le \tau_f$ or the number of waiting groups Q is smaller than or equal to the number of running groups R
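The grouping action can be sketched as a loop that merges waiting groups two at a time until one of the two stopping conditions holds. The slides do not say which pair is merged at each step; merging the two smallest groups is an assumption made here to keep group sizes balanced. `fineness_of` is a hypothetical callback that recomputes the fineness degree for the current set of waiting groups.

```python
def group_pairwise(waiting_groups, running_count, fineness_of, threshold):
    """Merge waiting groups pairwise until eta_f <= threshold or Q <= R.

    waiting_groups: list of groups, each a list of task identifiers.
    running_count:  R, the number of running groups.
    fineness_of:    callable returning eta_f for a list of waiting groups.
    """
    groups = [list(g) for g in waiting_groups]
    while (
        len(groups) > running_count      # Q > R
        and len(groups) > 1              # need at least two groups to merge
        and fineness_of(groups) > threshold
    ):
        groups.sort(key=len)             # stable sort: smallest groups first
        first, second = groups[0], groups[1]
        groups = groups[2:] + [first + second]
    return groups


waiting = [[1], [2], [3], [4], [5]]
always_fine = lambda groups: 1.0         # illustrative: tasks always too fine
result = group_pairwise(waiting, running_count=2, fineness_of=always_fine, threshold=0.5)
```

With five waiting tasks and two running groups, the loop stops as soon as two waiting groups remain, because Q is then no longer greater than R.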
Coarseness control
  Non-stationary load
  Loss of parallelism
  Task de-grouping
  Incident degree:
$$\eta_c = \frac{R}{Q + R}$$
  Threshold: $\tau_c = 0.5$
  De-group tasks when R > Q
[Figure: tasks t1 to t5 on resources R1 to R3 over time; labels: "Tasks at t1", "Grouped tasks at t2" (t2+t3, t4+t5), "Loss of parallelism"]
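With the threshold at 0.5, the coarseness test reduces to comparing R with Q, as the short sketch below shows. The function names are hypothetical, and returning 0 when nothing is running or queued is an assumption.

```python
def coarseness_degree(running, queued):
    """Return eta_c = R / (Q + R)."""
    total = running + queued
    if total == 0:
        return 0.0  # assumption: no groups at all, nothing to de-group
    return running / total


def should_degroup(running, queued, threshold=0.5):
    """De-group when eta_c exceeds the threshold (R > Q for threshold 0.5)."""
    return coarseness_degree(running, queued) > threshold


print(should_degroup(running=3, queued=1))  # True
print(should_degroup(running=2, queued=2))  # False
print(should_degroup(running=1, queued=3))  # False
```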
Workload for Case Studies
  Based on the workload of VIP
  January 2011 to April 2012
  Case Studies on:
  Pilot Jobs
  User accounting
  Task analysis
  Bag of tasks
  Workflows
  112 users
  2,941 workflow executions
  339,545 pilot jobs
  680,988 tasks:
  338,989 completed
  138,480 error
  105,488 aborted
  15,576 aborted replicas
  48,293 stalled
  34,162 queued
R. Ferreira da Silva, T. Glatard, A science-gateway workload archive to study pilot jobs, user activity, bag of tasks, task sub-steps, and workflow executions, CoreGRID/ERCIM Workshop on Grids, Clouds and P2P Computing (CGWS), Rhodes Island, Greece, 2012.
Experiment Conditions
  Experiment 1
  Evaluate the fineness control process under stationary load
  Experiment 2
  Evaluate the de-grouping control process under non-stationary load
  Workflows characteristics
Results: stationary load
Fineness yields significant makespan reduction for all repetitions
Results: stationary load (2)
Task grouping speeds up SimuBloch and FIELD-II by up to a factor of 2.6, and PET-SORTEO/emission by up to a factor of 2.5
Not able to group all SimuBloch tasks in a single group because 2
tasks must be completed for the task estimation process
Results: non-stationary load
[Figure panels: "Resources appear progressively" and "Resources appear suddenly"]
Speeds up executions by up to a factor of 1.5 for Fineness, and 2.1 for Fineness-Coarseness
Fineness is penalized by its lack of adaptation: slowdown of 20%
Results: non-stationary load (2)
Linear correlation coefficient between the makespan and the
average queuing time is 0.91, which indicates they are correlated
Concluding remarks
  Context
  Autonomous handling of unfairness among workflow executions
  No strong assumptions on resource characteristics and workload
  Summary of the proposed method
  Implements a generic MAPE-K loop
  Determines task fineness based on queue waiting time and estimated
data transfer time of shared input data
  Tasks are grouped pairwise as long as Q > R and tasks are too fine
  Tasks are ungrouped when the number of available resources increases
  Optimizing task granularity
  Properly detects and handles lightweight tasks
  Stationary load: fineness control significantly reduces the makespan of
all applications
  Non-stationary load: the de-grouping algorithm compensates for the lack of
adaptation of task grouping
Thank you for your attention.
Questions?
Acknowledgments:
VIP users and project members
French National Agency for Research (ANR-09-COSI-03, ANR-11-LABX-0063)
EC FP7 Programme (312579 ER-flow)
European Grid Initiative (EGI)
France-Grilles

Contenu connexe

En vedette

A science-gateway workload archive application to the self-healing of workflo...
A science-gateway workload archive application to the self-healing of workflo...A science-gateway workload archive application to the self-healing of workflo...
A science-gateway workload archive application to the self-healing of workflo...Rafael Ferreira da Silva
 
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...Rafael Ferreira da Silva
 
VIP: design and implementation of the portal and execution service
VIP: design and implementation of the portal and execution serviceVIP: design and implementation of the portal and execution service
VIP: design and implementation of the portal and execution serviceRafael Ferreira da Silva
 
A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...Rafael Ferreira da Silva
 
Multi-infrastructure workflow execution for medical simulation in the Virtual...
Multi-infrastructure workflow execution for medical simulation in the Virtual...Multi-infrastructure workflow execution for medical simulation in the Virtual...
Multi-infrastructure workflow execution for medical simulation in the Virtual...Rafael Ferreira da Silva
 
Pegasus - automate, recover, and debug scientific computations
Pegasus - automate, recover, and debug scientific computationsPegasus - automate, recover, and debug scientific computations
Pegasus - automate, recover, and debug scientific computationsRafael Ferreira da Silva
 
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...Rafael Ferreira da Silva
 
Human origins (Darwin theory,gentic engineering)
Human origins (Darwin theory,gentic engineering)Human origins (Darwin theory,gentic engineering)
Human origins (Darwin theory,gentic engineering)Zeeshan Sajid
 
Ics Isac Overview V0.1pub
Ics Isac   Overview V0.1pubIcs Isac   Overview V0.1pub
Ics Isac Overview V0.1pubbradblask
 
Digiped facebook 2.0
Digiped facebook 2.0Digiped facebook 2.0
Digiped facebook 2.0rajjnori
 
Monestir de poblet
Monestir de pobletMonestir de poblet
Monestir de pobletJose Marcos
 
Task Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and WorkflowsTask Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and WorkflowsRafael Ferreira da Silva
 

En vedette (12)

A science-gateway workload archive application to the self-healing of workflo...
A science-gateway workload archive application to the self-healing of workflo...A science-gateway workload archive application to the self-healing of workflo...
A science-gateway workload archive application to the self-healing of workflo...
 
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
 
VIP: design and implementation of the portal and execution service
VIP: design and implementation of the portal and execution serviceVIP: design and implementation of the portal and execution service
VIP: design and implementation of the portal and execution service
 
A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...
 
Multi-infrastructure workflow execution for medical simulation in the Virtual...
Multi-infrastructure workflow execution for medical simulation in the Virtual...Multi-infrastructure workflow execution for medical simulation in the Virtual...
Multi-infrastructure workflow execution for medical simulation in the Virtual...
 
Pegasus - automate, recover, and debug scientific computations
Pegasus - automate, recover, and debug scientific computationsPegasus - automate, recover, and debug scientific computations
Pegasus - automate, recover, and debug scientific computations
 
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
 
Human origins (Darwin theory,gentic engineering)
Human origins (Darwin theory,gentic engineering)Human origins (Darwin theory,gentic engineering)
Human origins (Darwin theory,gentic engineering)
 
Ics Isac Overview V0.1pub
Ics Isac   Overview V0.1pubIcs Isac   Overview V0.1pub
Ics Isac Overview V0.1pub
 
Digiped facebook 2.0
Digiped facebook 2.0Digiped facebook 2.0
Digiped facebook 2.0
 
Monestir de poblet
Monestir de pobletMonestir de poblet
Monestir de poblet
 
Task Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and WorkflowsTask Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and Workflows
 

Similaire à On-line, non-clairvoyant optimization of workflow activity granularity task on grids

Fast Person Re-Identification for Intelligent Video Surveillance Systems
Fast Person Re-Identification for Intelligent Video Surveillance SystemsFast Person Re-Identification for Intelligent Video Surveillance Systems
Fast Person Re-Identification for Intelligent Video Surveillance SystemsBahram Lavi
 
Validating Procedural Knowledge in the Open Virtual Collaboration Environment
Validating Procedural Knowledge in the Open Virtual Collaboration EnvironmentValidating Procedural Knowledge in the Open Virtual Collaboration Environment
Validating Procedural Knowledge in the Open Virtual Collaboration Environmentstreamspotter
 
Otto Vinter - Analysing Your Defect Data for Improvement Potential
Otto Vinter - Analysing Your Defect Data for Improvement PotentialOtto Vinter - Analysing Your Defect Data for Improvement Potential
Otto Vinter - Analysing Your Defect Data for Improvement PotentialTEST Huddle
 
Exploring Failure Transparency and the Limits of Generic Recovery
Exploring Failure Transparency and the Limits of Generic RecoveryExploring Failure Transparency and the Limits of Generic Recovery
Exploring Failure Transparency and the Limits of Generic RecoveryMiro Cupak
 
Automating Environmental Computing Applications with Scientific Workflows
Automating Environmental Computing Applications with Scientific WorkflowsAutomating Environmental Computing Applications with Scientific Workflows
Automating Environmental Computing Applications with Scientific WorkflowsRafael Ferreira da Silva
 
Machine learning for functional connectomes
Machine learning for functional connectomesMachine learning for functional connectomes
Machine learning for functional connectomesGael Varoquaux
 
Multi-Perspective Comparison of Business Processes Variants Based on Event Logs
Multi-Perspective Comparison of Business Processes Variants Based on Event LogsMulti-Perspective Comparison of Business Processes Variants Based on Event Logs
Multi-Perspective Comparison of Business Processes Variants Based on Event LogsMarlon Dumas
 
My Postdoctoral Research
My Postdoctoral ResearchMy Postdoctoral Research
My Postdoctoral ResearchPo-Ting Wu
 
Sw metrics for regression testing
Sw metrics for regression testingSw metrics for regression testing
Sw metrics for regression testingJyotsna Sharma
 
Shielding Federated Learning Systems against Inference Attacks with ARM Trust...
Shielding Federated Learning Systems against Inference Attacks with ARM Trust...Shielding Federated Learning Systems against Inference Attacks with ARM Trust...
Shielding Federated Learning Systems against Inference Attacks with ARM Trust...vschiavoni
 
Modern Operations at Scale within Viasat – How to Structure Teams and Build A...
Modern Operations at Scale within Viasat – How to Structure Teams and Build A...Modern Operations at Scale within Viasat – How to Structure Teams and Build A...
Modern Operations at Scale within Viasat – How to Structure Teams and Build A...Atlassian
 
When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Pro...
When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Pro...When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Pro...
When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Pro...Anis Nasir
 
incident analysis - procedure and approach
incident analysis - procedure and approachincident analysis - procedure and approach
incident analysis - procedure and approachDerek Chang
 
Overcoming (organizational) scalability issues in your Prometheus ecosystem
Overcoming (organizational) scalability issues in your Prometheus ecosystemOvercoming (organizational) scalability issues in your Prometheus ecosystem
Overcoming (organizational) scalability issues in your Prometheus ecosystemQAware GmbH
 
What if computers invigilate examinations - Cypher 2018
What if computers invigilate examinations - Cypher 2018What if computers invigilate examinations - Cypher 2018
What if computers invigilate examinations - Cypher 2018Gourab Nath
 

Similaire à On-line, non-clairvoyant optimization of workflow activity granularity task on grids (20)

Fast Person Re-Identification for Intelligent Video Surveillance Systems
Fast Person Re-Identification for Intelligent Video Surveillance SystemsFast Person Re-Identification for Intelligent Video Surveillance Systems
Fast Person Re-Identification for Intelligent Video Surveillance Systems
 
Validating Procedural Knowledge in the Open Virtual Collaboration Environment
Validating Procedural Knowledge in the Open Virtual Collaboration EnvironmentValidating Procedural Knowledge in the Open Virtual Collaboration Environment
Validating Procedural Knowledge in the Open Virtual Collaboration Environment
 
Otto Vinter - Analysing Your Defect Data for Improvement Potential
Otto Vinter - Analysing Your Defect Data for Improvement PotentialOtto Vinter - Analysing Your Defect Data for Improvement Potential
Otto Vinter - Analysing Your Defect Data for Improvement Potential
 
Exploring Failure Transparency and the Limits of Generic Recovery
Exploring Failure Transparency and the Limits of Generic RecoveryExploring Failure Transparency and the Limits of Generic Recovery
Exploring Failure Transparency and the Limits of Generic Recovery
 
Innoslate 4.5 and Sopatra
Innoslate 4.5 and SopatraInnoslate 4.5 and Sopatra
Innoslate 4.5 and Sopatra
 
Software metrics
Software metricsSoftware metrics
Software metrics
 
Automating Environmental Computing Applications with Scientific Workflows
Automating Environmental Computing Applications with Scientific WorkflowsAutomating Environmental Computing Applications with Scientific Workflows
Automating Environmental Computing Applications with Scientific Workflows
 
Machine learning for functional connectomes
Machine learning for functional connectomesMachine learning for functional connectomes
Machine learning for functional connectomes
 
Multi-Perspective Comparison of Business Processes Variants Based on Event Logs
Multi-Perspective Comparison of Business Processes Variants Based on Event LogsMulti-Perspective Comparison of Business Processes Variants Based on Event Logs
Multi-Perspective Comparison of Business Processes Variants Based on Event Logs
 
My Postdoctoral Research
My Postdoctoral ResearchMy Postdoctoral Research
My Postdoctoral Research
 
markomanolis_phd_defense
markomanolis_phd_defensemarkomanolis_phd_defense
markomanolis_phd_defense
 
Sw metrics for regression testing
Sw metrics for regression testingSw metrics for regression testing
Sw metrics for regression testing
 
Shielding Federated Learning Systems against Inference Attacks with ARM Trust...
Shielding Federated Learning Systems against Inference Attacks with ARM Trust...Shielding Federated Learning Systems against Inference Attacks with ARM Trust...
Shielding Federated Learning Systems against Inference Attacks with ARM Trust...
 
Modern Operations at Scale within Viasat – How to Structure Teams and Build A...
Modern Operations at Scale within Viasat – How to Structure Teams and Build A...Modern Operations at Scale within Viasat – How to Structure Teams and Build A...
Modern Operations at Scale within Viasat – How to Structure Teams and Build A...
 
When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Pro...
When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Pro...When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Pro...
When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Pro...
 
incident analysis - procedure and approach
incident analysis - procedure and approachincident analysis - procedure and approach
incident analysis - procedure and approach
 
Overcoming (organizational) scalability issues in your Prometheus ecosystem
Overcoming (organizational) scalability issues in your Prometheus ecosystemOvercoming (organizational) scalability issues in your Prometheus ecosystem
Overcoming (organizational) scalability issues in your Prometheus ecosystem
 
Esem15.ppt
Esem15.pptEsem15.ppt
Esem15.ppt
 
Esem15.ppt
Esem15.pptEsem15.ppt
Esem15.ppt
 
What if computers invigilate examinations - Cypher 2018
What if computers invigilate examinations - Cypher 2018What if computers invigilate examinations - Cypher 2018
What if computers invigilate examinations - Cypher 2018
 

Plus de Rafael Ferreira da Silva

Towards an Infrastructure for Enabling Systematic Development and Research of...
Towards an Infrastructure for Enabling Systematic Development and Research of...Towards an Infrastructure for Enabling Systematic Development and Research of...
Towards an Infrastructure for Enabling Systematic Development and Research of...Rafael Ferreira da Silva
 
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...Rafael Ferreira da Silva
 
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...Rafael Ferreira da Silva
 
WorkflowHub: Community Framework for Enabling Scientific Workflow Research a...
WorkflowHub: Community Framework for Enabling  Scientific Workflow Research a...WorkflowHub: Community Framework for Enabling  Scientific Workflow Research a...
WorkflowHub: Community Framework for Enabling Scientific Workflow Research a...Rafael Ferreira da Silva
 
Bridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven EngineeringBridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven EngineeringRafael Ferreira da Silva
 
Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
Accurately Simulating Energy Consumption of I/O-intensive Scientific WorkflowsAccurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
Accurately Simulating Energy Consumption of I/O-intensive Scientific WorkflowsRafael Ferreira da Silva
 
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...Rafael Ferreira da Silva
 
WRENCH: Workflow Management System Simulation Workbench
WRENCH: Workflow Management System Simulation WorkbenchWRENCH: Workflow Management System Simulation Workbench
WRENCH: Workflow Management System Simulation WorkbenchRafael Ferreira da Silva
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningRafael Ferreira da Silva
 
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific WorkflowsOn the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific WorkflowsRafael Ferreira da Silva
 
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...Rafael Ferreira da Silva
 
Analysis of User Submission Behavior on HPC and HTC
Analysis of User Submission Behavior on HPC and HTCAnalysis of User Submission Behavior on HPC and HTC
Analysis of User Submission Behavior on HPC and HTCRafael Ferreira da Silva
 
Automating Real-time Seismic Analysis Through Streaming and High Throughput W...
Automating Real-time Seismic Analysis Through Streaming and High Throughput W...Automating Real-time Seismic Analysis Through Streaming and High Throughput W...
Automating Real-time Seismic Analysis Through Streaming and High Throughput W...Rafael Ferreira da Silva
 
Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud a...
Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud a...Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud a...
Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud a...Rafael Ferreira da Silva
 

Plus de Rafael Ferreira da Silva (14)

Towards an Infrastructure for Enabling Systematic Development and Research of...
Towards an Infrastructure for Enabling Systematic Development and Research of...Towards an Infrastructure for Enabling Systematic Development and Research of...
Towards an Infrastructure for Enabling Systematic Development and Research of...
 
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
 
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
 
WorkflowHub: Community Framework for Enabling Scientific Workflow Research a...
On-line, non-clairvoyant optimization of workflow activity granularity task on grids

  • 1. On-line, Non-Clairvoyant Optimization of Workflow Activity Granularity on Grids
      Rafael FERREIRA DA SILVA (rafael.silva@creatis.insa-lyon.fr), Tristan GLATARD: University of Lyon, CNRS, INSERM, CREATIS, Villeurbanne, France
      Frédéric DESPREZ: INRIA, University of Lyon, LIP, ENS Lyon, Lyon, France
      Euro-Par 2013, August 26-30, 2013
  • 2. Outline
      - Context
      - The Virtual Imaging Platform
      - Problem definition
      - Task granularity
      - Self-healing of workflow executions on grids
      - Task granularity control process
      - Experiments and results
      - Conclusion
  • 4. Context
      - Virtual Imaging Platform (VIP)
          - Medical imaging science-gateway
          - Grid of ~180 sites (EGI, http://www.egi.eu)
      - Significant usage
          - 452 registered users from 50 countries
          - Consumed 472 CPU years from August 2012 to July 2013
      - Figure: VIP consumption since August 2012 (http://dirac.france-grilles.fr)
  • 5. Workflow Execution
      1. Input data upload
      2. User launches a simulation
      3. MOTEUR generates invocations
      4. GASW generates grid jobs
      5. Jobs are submitted to DIRAC
      6. Pilot jobs are submitted to EGI
      7. Pilot jobs fetch grid jobs
      8. Inputs download
      9. Execution
      10. Results upload
      11. Download results
  • 6. Task Granularity
      - Low performance of lightweight (a.k.a. fine-grained) tasks: high queuing times and communication overhead
      - Figure: timeline of tasks t1 to t5 on resources R1 to R3; lightweight task executions are delayed
      - Grouping lightweight tasks into coarse-grained tasks reduces the cost of data transfers when the grouped tasks share input data, and saves queuing time
  • 7. Workflow Self-Healing
      - Problem: costly manual operations
          - Rescheduling tasks, restarting services, or replicating data files
          - In this work: task granularity in distributed workflows
      - Objective: automated platform administration
          - Autonomous detection of fine-grained tasks
          - Perform an appropriate set of actions
      - Assumptions: online and non-clairvoyant
          - Only partial information is available
          - Decisions must be fast
          - Production conditions: no prediction of user activity or workloads
  • 8. General MAPE-K loop
      - Loop stages: Monitoring, Analysis, Planning, and Execution, organized around shared Knowledge; the loop is triggered by an event (job completion or failure) or a timeout, and is fed by monitoring data
      - Each incident has a degree η that is quantified into levels (level 1, level 2, level 3); in the example, incident 1 has η = 0.8, incident 2 has η = 0.4, and incident 3 has η = 0.1
      - Roulette wheel selection over the incident degrees: incident i is weighted by $\eta_i / \sum_{j=1}^{n} \eta_j$; incident 1 is selected in the example
      - Roulette wheel selection based on association rules, weighted by ρ × η: incident 2 is selected in the example, which determines the set of actions
      - Association rules for incident 1:
          Rule     Confidence (ρ)   ρ × η
          2 → 1    0.8              0.32
          3 → 1    0.2              0.02
          1 → 1    1.0              0.80
      - Reference: R. Ferreira da Silva, T. Glatard, F. Desprez, Self-healing of workflow activity incidents on distributed computing infrastructures, Future Generation Computer Systems (FGCS), in press, 2013.
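The two roulette wheel selections on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `selection_probabilities` and `roulette_wheel` are hypothetical, and the only values taken from the slide are the incident degrees and rule confidences.

```python
import random


def selection_probabilities(degrees):
    """Normalize incident degrees: p_i = eta_i / sum_j eta_j."""
    total = sum(degrees.values())
    if total <= 0:
        raise ValueError("at least one degree must be positive")
    return {name: eta / total for name, eta in degrees.items()}


def roulette_wheel(weights, rng=random):
    """Pick one key with probability proportional to its (non-negative) weight."""
    total = sum(w for w in weights.values() if w > 0)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    pick = rng.random() * total  # uniform in [0, total)
    cumulative = 0.0
    chosen = None
    for name, weight in weights.items():
        if weight <= 0:
            continue  # zero-weight entries can never be selected
        cumulative += weight
        chosen = name
        if pick < cumulative:
            break
    return chosen  # falls back to the last positive entry on rounding error


# Values from the slide: incident degrees and rule confidences for incident 1.
degrees = {"incident 1": 0.8, "incident 2": 0.4, "incident 3": 0.1}
confidence = {"incident 1": 1.0, "incident 2": 0.8, "incident 3": 0.2}
rule_weights = {name: confidence[name] * degrees[name] for name in degrees}
```

Here `rule_weights` reproduces the ρ × η column of the table (0.80, 0.32, 0.02), and `roulette_wheel(rule_weights)` draws one incident in proportion to those weights.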
  • 9. Incident Levels and Actions
      - Incident degrees are quantified into discrete incident levels
      - Thresholds are determined from visual mode clustering or K-means
      - Thresholds cluster platform configurations into groups: in one group no actions are triggered, in the other a set of actions is triggered
  • 11. Fineness control: degree
      - Task execution phases: queued time, shared input data, other input data, application execution
      - Incident degree:
          $\eta_f = \max_{i \in [1,m]} f_i, \quad f_i = d_i \cdot r_i$
          $d_i = \dfrac{\tilde{t}_{shared}}{\tilde{t}_{shared} + n_i (\tilde{t} - \tilde{t}_{shared})}$
          $r_i = \dfrac{\max_{j \in [1,n_i]} q_j}{\max_{j \in [1,n_i]} q_j + \tilde{t}_{shared} + n_i (\tilde{t} - \tilde{t}_{shared})}$
      - Notation: $\tilde{t}$ and $\tilde{t}_{shared}$ are median task phase durations, $q_j$ is a queued time, i = waiting task, n = number of waiting tasks
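The fineness degree above can be evaluated with a few lines of code. The sketch below is only an illustration under stated assumptions: the function name `fineness_degree` is hypothetical, each waiting group is represented simply by the list of queued times q_j of its tasks, and all durations are in seconds.

```python
def fineness_degree(groups, t_med, t_shared_med):
    """Return eta_f = max_i (d_i * r_i) over the waiting groups.

    groups       -- one list of queued times q_j per waiting group (assumed layout)
    t_med        -- median task duration (t~)
    t_shared_med -- median transfer time of the shared input data (t~_shared)
    """
    if not groups:
        raise ValueError("no waiting groups")
    degrees = []
    for queued in groups:
        n_i = len(queued)
        # Denominator shared by d_i and r_i: shared data is transferred once,
        # the rest of each task is repeated n_i times.
        work = t_shared_med + n_i * (t_med - t_shared_med)
        d_i = t_shared_med / work
        q_max = max(queued)
        r_i = q_max / (q_max + work)
        degrees.append(d_i * r_i)
    return max(degrees)
```

For example, with `t_med = 200` and `t_shared_med = 100`, a single task queued for 100 s gives f = 0.5 × 1/3, while a group of two such tasks gives f = 1/3 × 1/4, so grouping lowers the fineness degree.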
  • 12. Fineness control: task estimation
      - Estimation of task durations
          - Job phases: setup, inputs download, execution, outputs upload
          - Assumption: bag of tasks (all jobs have equal durations)
      - Median-based estimation (example):
          Phase                         setup   inputs download   execution   outputs upload
          Median duration of job phases 50s     250s              400s        15s             ($\tilde{t}$ = 715s)
          Real job duration             42s     300s              20s         ?
          Estimated job duration        42s     300s              400s*       15s             ($\tilde{t}_i$ = 757s)
          Phase status                  completed  completed      current
          *: max(400s, 20s) = 400s
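The median-based estimation can be sketched as below. This is an illustrative reading of the slide rather than the authors' code: the phase keys, `phase_medians`, and `estimate_job_duration` are hypothetical names, and the rule for phases that have not started yet (use the median) is inferred from the 15s entry in the example.

```python
from statistics import median

# Job phases in execution order (key names are illustrative).
PHASES = ("setup", "inputs_download", "execution", "outputs_upload")


def phase_medians(completed_jobs):
    """Median duration of each phase over the completed jobs."""
    return {p: median(job[p] for job in completed_jobs) for p in PHASES}


def estimate_job_duration(medians, completed, current_phase=None, elapsed=0.0):
    """Estimate the total duration of a running job.

    completed     -- real durations of the phases already finished
    current_phase -- phase in progress, with `elapsed` seconds spent so far
    """
    total = 0.0
    for phase in PHASES:
        if phase in completed:
            total += completed[phase]                # real duration is known
        elif phase == current_phase:
            total += max(medians[phase], elapsed)    # at least the time already spent
        else:
            total += medians[phase]                  # not started: use the median
    return total


# Example from the slide.
medians = {"setup": 50, "inputs_download": 250, "execution": 400, "outputs_upload": 15}
estimate = estimate_job_duration(
    medians, {"setup": 42, "inputs_download": 300}, "execution", elapsed=20
)
```

With the slide's numbers, the medians sum to 715 s and `estimate` is 42 + 300 + max(400, 20) + 15 = 757 s.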
  • 13. Fineness control: levels and actions
      - Levels: identified from the platform logs; the threshold $\tau_f$ separates level 1 (no actions) from level 2 (action: task grouping)
      - Actions: task grouping
          - Tasks are grouped pairwise until $\eta_f \le \tau_f$ or the number of waiting groups Q is smaller than or equal to the number of running groups R
  • 14. Coarseness control
      - Non-stationary load: task grouping can cause a loss of parallelism
      - Incident degree: $\eta_c = \dfrac{R}{Q + R}$
      - Levels: threshold $\tau_c = 0.5$
      - Action: task de-grouping; tasks are de-grouped when R > Q
      - Figure: timeline of tasks t1, t2+t3, and t4+t5 on resources R1 to R3, illustrating the loss of parallelism of grouped tasks
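The coarseness degree and its threshold translate into a simple check, since η_c > 0.5 is equivalent to R > Q. The sketch below is a hedged illustration: the function names are hypothetical, and splitting every group back into single tasks is an assumption about what de-grouping does, not something the slide specifies.

```python
def coarseness_degree(running, waiting):
    """eta_c = R / (Q + R), with R running groups and Q waiting groups."""
    if running + waiting == 0:
        return 0.0  # nothing running or waiting: no incident
    return running / (waiting + running)


def should_degroup(running, waiting, tau_c=0.5):
    """True when eta_c exceeds the threshold, i.e. when R > Q for tau_c = 0.5."""
    return coarseness_degree(running, waiting) > tau_c


def degroup(groups):
    """Split every waiting group back into single-task groups (assumed policy)."""
    return [[task] for group in groups for task in group]
```

For instance, with 3 running groups and 1 waiting group, η_c = 0.75 and de-grouping is triggered; with 2 running and 2 waiting groups, η_c = 0.5 and nothing happens.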
  • 15. Workload for Case Studies
      - Based on the workload of VIP, January 2011 to April 2012
      - Case studies on: pilot jobs, user accounting, task analysis, bag of tasks, workflows
      - Content: 112 users; 2,941 workflow executions; 339,545 pilot jobs; 680,988 tasks (338,989 completed, 138,480 error, 105,488 aborted, 15,576 aborted replicas, 48,293 stalled, 34,162 queued)
      - Reference: R. Ferreira da Silva, T. Glatard, A science-gateway workload archive to study pilot jobs, user activity, bag of tasks, task sub-steps, and workflow executions, CoreGRID/ERCIM Workshop on Grids, Clouds and P2P Computing (CGWS), Rhodes Island, Greece, 2012.
  • 17. Experiment Conditions
      - Experiment 1: evaluate the fineness control process under stationary load
      - Experiment 2: evaluate the de-grouping control process under non-stationary load
      - Table: workflow characteristics
  • 18. Results: stationary load
      - Fineness control yields a significant makespan reduction for all repetitions
  • 19. Results: stationary load (2)
      - Task grouping speeds up SimuBloch and FIELD-II by up to a factor of 2.6, and PET-SORTEO/emission by up to a factor of 2.5
      - Not all SimuBloch tasks can be grouped into a single group, because 2 tasks must be completed for the task estimation process
  • 20. Results: non-stationary load
      - Two scenarios: resources appear progressively, and resources appear suddenly
      - Executions are sped up by up to a factor of 1.5 for Fineness, and 2.1 for Fineness-Coarseness
      - Fineness is penalized by its lack of adaptation: slowdown of 20%
  • 21. Results: non-stationary load (2)
      - The linear correlation coefficient between the makespan and the average queuing time is 0.91, which indicates that they are strongly correlated
  • 23. Concluding remarks
      - Context
          - Autonomous handling of unfairness among workflow executions
          - No strong assumptions on resource characteristics and workload
      - Summary of the proposed method
          - Implements a generic MAPE-K loop
          - Determines task fineness based on queue waiting time and the estimated transfer time of shared input data
          - Tasks are grouped pairwise as long as Q > R and tasks are too fine
          - Tasks are ungrouped when the number of available resources increases
      - Optimizing task granularity
          - Properly detects and handles lightweight tasks
          - Stationary load: fineness control significantly reduces the makespan of all applications
          - Non-stationary load: the de-grouping algorithm compensates for the lack of adaptation of task grouping
  • 24. Thank you for your attention. Questions?
      - Acknowledgments: VIP users and project members; French National Agency for Research (ANR-09-COSI-03, ANR-11-LABX-0063); EC FP7 Programme (312579 ER-flow); European Grid Initiative (EGI); France-Grilles