SlideShare une entreprise Scribd logo
1  sur  26
Télécharger pour lire hors ligne
WorkflowSim: A Toolkit for Simulating
  Scientific Workflows in Distributed
             Environments
         Weiwei Chen, Ewa Deelman
Outline
§  Introduction
  –  Scientific workflows?
  –  Distributed environments?
§  Challenge
  –  Large scale, system overhead
§  Solution
  –  Workflow Overhead and Failure Model
§  Validation and Application
  –  Overhead Robustness
  –  Fault Tolerant Clustering

                           2
Scientific Applications
§  Scientists often need to
  –  Integrate diverse components and data
  –  Automate data processing steps
  –  Reproduce/analyze/share previous results
  –  Track the provenance of data products
  –  Execute processing steps efficiently
  –  Reliably execute applications

       Scientific Workflows provide
       solutions to these problems


                         3
Scientific Workflows
§  DAG model (Directed Acyclic Graph)
  –  Node: computational activities
  –  Directed edge: data dependencies
  –  Task: a process that users would like to execute
  –  Job: a single unit for execution with one of more tasks
  –  Task clustering: the process of grouping tasks to jobs




                             4
Workflow Management System
§  Common Features of WMS
  –  Maps abstract workflows to executable workflows
  –  Handles data dependencies
  –  Replica selection, transfers, registration, cleanup
  –  Task clustering, workflow partitioning, scheduling
  –  Reliability and fault tolerance
  –  Monitoring and troubleshooting
§  Existing WMS
  –  Pegasus, Askalon, Taverna, Kepler, Triana



                           5
Workflow Simulation
§  Benefits
  –  Save efforts in system setup, execution
  –  Repeat experimental results
  –  Control system environments (failures)
§  Trace based Workflow Simulation
  –  Import trace from a completed execution
  –  Vary workflow structures and system environments
§  Challenges (CloudSim, GridSim, etc.)
  –  System overhead and failures
  –  Multiple levels of computational activities(task/job)
  –  A hierarchy of management components
                           6
Comparison
                          CloudSim           WorkflowSim
                         Task, Bag of
  Execution Model                           Task, Job, DAG
                            Tasks
Failure and Monitoring        No                 Yes

                                          Data Transfer Delay
                         Data Transfer
  Overhead Model                         Workflow Engine Delay
                            Delay
                                          Clustering Delay …

    Site selection          Single             Multiple
                                         Scheduling, Job retry
    Optimization
                          Scheduling         Clustering,
    Techniques
                                            Partitioning …
     WorkflowSim is an extension of
     CloudSim, but it is workflow aware
                             7
System Architecture
§  Submit Host
  –  Workflow Mapper
  –  Clustering Engine
  –  Workflow Engine
  –  Local Scheduler
§  Execution Site
  –  Remote Scheduler
  –  Worker Nodes
  –  Failure Generator
  –  Failure Monitor

                         8
Workflow Overhead
–  Workflow Engine       –  Postscript Delay
   Delay                 –  Clustering Delay
–  Queue Delay           –  Data Transfer Delay




                     9
Example: Clustering Delay
                         n is the number of tasks per
                         level. k is number of jobs per
                         level.
                          !"#$%&'()*!!"#$%|!!! ! ! !
                                              =   =
                          !"#$%&'()*!!"#$%|!!! ! ! !



                         mProjectPP, mDiffFit, and
                         mBackground are the major
                         jobs of Montage

Overhead is not a constant variable.
It has diverse distribution and patterns

                 10
Validation
           !"#$%&'#$!!"#$%&&!!"#$%&'   Ideal Case: Accuracy=1.0
!!!"#$!% =
             !"#$!!"#$%&&!!"#$%&'      k: Maximum jobs per level




                                        Overheads have
                                        the biggest impact




                               11
Application: Overhead Robustness
§  Overhead robustness: the influence of
    overheads on the workflow runtime for DAG
    scheduling heuristics.
§  Inaccurate estimation (under- or over-estimated)
    of workflow overheads influences the overall
    runtime of workflows.
§  Research Merits:
     –  Sensitivity of heuristics
     –  Overhead friendly heuristics



                         12
Application: Overhead Robustness
§  Increase or Reduce Overhead by a Factor
    (Weight)
  –  Under estimation and over estimation
§  Heuristics Evaluated
  –  FCFS: First Come First Serve
  –  MCT: Minimum Completion Time
  –  MinMin: The job with the minimum completion time is
     selected and assigned to the fastest resource.
  –  MaxMin: The job with the maximum completion time
     and assigns it to its best available resource.



                           13
Application: Overhead Robustness
                           MCT<MinMin MCT<MinMin
         MCT>MinMin

MCT<MinMin




    Accurate estimation of Clustering
    Delay is more important

                      14
Workflow Failure
§  Failures have significant influence on the
    performance
§  Classifying Transient Failures
   –  Task Failure: A task fails, other tasks within the
      same job may not fail
   –  Job Failure: A job fails, all of its tasks fail




                            15
System Architecture
§  Submit Host
  –  Job Retry
  –  Reclustering
§  Execution Site
  –  Failure Generator
  –  Failure Monitor




                         16
Application: Fault Tolerant Clustering
§  Task clustering can reduce execution overhead
§  A job composed of multiple tasks may have a
    greater risk of suffering from failures
§  Reclustering and Job Retry are proposed
  –  No Optimization (NOOP) retries the failed jobs.
  –  Dynamic Clustering (DC) decreases the clusters.size if
     the measured job failure rate is high.




                           17
Application: Fault Tolerant Clustering
–  Selective Reclustering (SR) retries the failed tasks in a job
–  Dynamic Reclustering (DR) retires the failed tasks in a job and also
   decreases the clusters.size if the measured job failure rate is high.




                               18
Performance
                       §  Reclustering (DR/DC/SR) reduces the influence of
                           failures significantly compared to NOOP
                       §  DR outperforms other techniques.
The lower the better




                                                 19
Conclusion
§  WorkflowSim assists researchers to evaluate
    their workflow optimization techniques with
    better accuracy and wider support.
  –  Modeling Overhead and Failures
  –  Distributed and Hierarchical components
  –  Workflow Techniques
§  It is necessary to consider both data
    dependencies, workflow failures and system
    overhead.



                           20
Acknowledgements
§    Pegasus Team: http://pegasus.isi.edu
§    FutureGrid & XSEDE
§    Funded by NSF grants IIS-0905032.
§    More info: wchen@isi.edu
§    Available on request
§    Q&A




                               21
Overhead Distribution




                   Montage




          22
Overhead Distribution




                   CyberShake




          23
Broadband




            24
Task Failure Model and Job Failure Model




                n                                       4d
      k* =                           −d + d 2 −
                r                                    ln(1− α )
                              k* =                               ,   if   n >> r
                                                2t
     *       (kt + d)
    ttotal
        =                    *   n(k *t + d)
              1− β           t =
                             total          *
                                 rk(1− α )k




                        25                               25
Related Papers


§    Integration of Workflow Partitioning and Resource Provisioning, CCGrid 2012.
§    Improving Scientific Workflow Performance using Policy Based Data
      Placement, IEEE Policy 2012.
§    Fault Tolerant Clustering in Scientific Workflows, SWF, IEEE Services 2012
§    Workflow Overhead Analysis and Optimizations, WORKS 2011.
§    Partitioning and Scheduling Workflows across Multiple Sites with Storage
      Constraints, PPAM 2011
§    Pegasus: a Framework for Mapping Complex Scientific Workflows onto
      Distributed Systems, Scientific Programming Journal 2005




                                          26

Contenu connexe

Tendances

Oops concepts || Object Oriented Programming Concepts in Java
Oops concepts || Object Oriented Programming Concepts in JavaOops concepts || Object Oriented Programming Concepts in Java
Oops concepts || Object Oriented Programming Concepts in JavaMadishetty Prathibha
 
LCFS - Storage Driver for Docker
LCFS - Storage Driver for DockerLCFS - Storage Driver for Docker
LCFS - Storage Driver for DockerFred Love
 
Java Exception Handling and Applets
Java Exception Handling and AppletsJava Exception Handling and Applets
Java Exception Handling and AppletsTanmoy Roy
 
Rulette : A pragmatic business rule management library
Rulette : A pragmatic business rule management libraryRulette : A pragmatic business rule management library
Rulette : A pragmatic business rule management libraryKislay Verma
 
Performance testing presentation
Performance testing presentationPerformance testing presentation
Performance testing presentationBelatrix Software
 
Manual and Automation notes.pdf
Manual and Automation notes.pdfManual and Automation notes.pdf
Manual and Automation notes.pdfsynamedia
 
Getting start Java EE Action-Based MVC with Thymeleaf
Getting start Java EE Action-Based MVC with ThymeleafGetting start Java EE Action-Based MVC with Thymeleaf
Getting start Java EE Action-Based MVC with ThymeleafMasatoshi Tada
 
Learn Java with Dr. Rifat Shahriyar
Learn Java with Dr. Rifat ShahriyarLearn Java with Dr. Rifat Shahriyar
Learn Java with Dr. Rifat ShahriyarAbir Mohammad
 
The Art of Metaprogramming in Java
The Art of Metaprogramming in Java  The Art of Metaprogramming in Java
The Art of Metaprogramming in Java Abdelmonaim Remani
 
Classes, objects in JAVA
Classes, objects in JAVAClasses, objects in JAVA
Classes, objects in JAVAAbhilash Nair
 
Java Classes | Java Tutorial for Beginners | Java Classes and Objects | Java ...
Java Classes | Java Tutorial for Beginners | Java Classes and Objects | Java ...Java Classes | Java Tutorial for Beginners | Java Classes and Objects | Java ...
Java Classes | Java Tutorial for Beginners | Java Classes and Objects | Java ...Edureka!
 
Java exception handling ppt
Java exception handling pptJava exception handling ppt
Java exception handling pptJavabynataraJ
 

Tendances (20)

Oops concepts || Object Oriented Programming Concepts in Java
Oops concepts || Object Oriented Programming Concepts in JavaOops concepts || Object Oriented Programming Concepts in Java
Oops concepts || Object Oriented Programming Concepts in Java
 
Java Programming
Java ProgrammingJava Programming
Java Programming
 
LCFS - Storage Driver for Docker
LCFS - Storage Driver for DockerLCFS - Storage Driver for Docker
LCFS - Storage Driver for Docker
 
Arrays in Java
Arrays in Java Arrays in Java
Arrays in Java
 
Java Exception Handling and Applets
Java Exception Handling and AppletsJava Exception Handling and Applets
Java Exception Handling and Applets
 
Unit 7 Java
Unit 7 JavaUnit 7 Java
Unit 7 Java
 
Performance testing locust
Performance testing   locustPerformance testing   locust
Performance testing locust
 
Efficient Android Threading
Efficient Android ThreadingEfficient Android Threading
Efficient Android Threading
 
Rulette : A pragmatic business rule management library
Rulette : A pragmatic business rule management libraryRulette : A pragmatic business rule management library
Rulette : A pragmatic business rule management library
 
Unit 2 Java
Unit 2 JavaUnit 2 Java
Unit 2 Java
 
Performance testing presentation
Performance testing presentationPerformance testing presentation
Performance testing presentation
 
Manual and Automation notes.pdf
Manual and Automation notes.pdfManual and Automation notes.pdf
Manual and Automation notes.pdf
 
Getting start Java EE Action-Based MVC with Thymeleaf
Getting start Java EE Action-Based MVC with ThymeleafGetting start Java EE Action-Based MVC with Thymeleaf
Getting start Java EE Action-Based MVC with Thymeleaf
 
Learn Java with Dr. Rifat Shahriyar
Learn Java with Dr. Rifat ShahriyarLearn Java with Dr. Rifat Shahriyar
Learn Java with Dr. Rifat Shahriyar
 
Packages in java
Packages in javaPackages in java
Packages in java
 
The Art of Metaprogramming in Java
The Art of Metaprogramming in Java  The Art of Metaprogramming in Java
The Art of Metaprogramming in Java
 
Classes, objects in JAVA
Classes, objects in JAVAClasses, objects in JAVA
Classes, objects in JAVA
 
Java Classes | Java Tutorial for Beginners | Java Classes and Objects | Java ...
Java Classes | Java Tutorial for Beginners | Java Classes and Objects | Java ...Java Classes | Java Tutorial for Beginners | Java Classes and Objects | Java ...
Java Classes | Java Tutorial for Beginners | Java Classes and Objects | Java ...
 
Java exception handling ppt
Java exception handling pptJava exception handling ppt
Java exception handling ppt
 
Threads in JAVA
Threads in JAVAThreads in JAVA
Threads in JAVA
 

Similaire à Workflowsim escience12

High Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of ViewHigh Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of Viewaragozin
 
Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCOlga Lavrentieva
 
Velocity 2018 preetha appan final
Velocity 2018   preetha appan finalVelocity 2018   preetha appan final
Velocity 2018 preetha appan finalpreethaappan
 
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...DataWorks Summit
 
Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)Steve Min
 
Camunda BPM 7.2: Performance and Scalability (English)
Camunda BPM 7.2: Performance and Scalability (English)Camunda BPM 7.2: Performance and Scalability (English)
Camunda BPM 7.2: Performance and Scalability (English)camunda services GmbH
 
apidays Paris 2022 - Of graphQL, DX friction, and surgical monolithectomy, Fr...
apidays Paris 2022 - Of graphQL, DX friction, and surgical monolithectomy, Fr...apidays Paris 2022 - Of graphQL, DX friction, and surgical monolithectomy, Fr...
apidays Paris 2022 - Of graphQL, DX friction, and surgical monolithectomy, Fr...apidays
 
Java scalability considerations yogesh deshpande
Java scalability considerations   yogesh deshpandeJava scalability considerations   yogesh deshpande
Java scalability considerations yogesh deshpandeIndicThreads
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudRevolution Analytics
 
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesDataWorks Summit
 
OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomFacultad de Informática UCM
 
Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...
Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...
Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...Nane Kratzke
 
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster RecoveryStop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster RecoveryDoKC
 
Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Lu Wei
 
HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010Cloudera, Inc.
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliencyMasashi Narumoto
 
Single Page Applications with AngularJS 2.0
Single Page Applications with AngularJS 2.0 Single Page Applications with AngularJS 2.0
Single Page Applications with AngularJS 2.0 Sumanth Chinthagunta
 

Similaire à Workflowsim escience12 (20)

High Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of ViewHigh Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of View
 
Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPC
 
Velocity 2018 preetha appan final
Velocity 2018   preetha appan finalVelocity 2018   preetha appan final
Velocity 2018 preetha appan final
 
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
 
Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)
 
Camunda BPM 7.2: Performance and Scalability (English)
Camunda BPM 7.2: Performance and Scalability (English)Camunda BPM 7.2: Performance and Scalability (English)
Camunda BPM 7.2: Performance and Scalability (English)
 
IEEE CLOUD \'11
IEEE CLOUD \'11IEEE CLOUD \'11
IEEE CLOUD \'11
 
apidays Paris 2022 - Of graphQL, DX friction, and surgical monolithectomy, Fr...
apidays Paris 2022 - Of graphQL, DX friction, and surgical monolithectomy, Fr...apidays Paris 2022 - Of graphQL, DX friction, and surgical monolithectomy, Fr...
apidays Paris 2022 - Of graphQL, DX friction, and surgical monolithectomy, Fr...
 
Java scalability considerations yogesh deshpande
Java scalability considerations   yogesh deshpandeJava scalability considerations   yogesh deshpande
Java scalability considerations yogesh deshpande
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
NoSQL and ACID
NoSQL and ACIDNoSQL and ACID
NoSQL and ACID
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
 
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
 
OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroom
 
Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...
Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...
Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...
 
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster RecoveryStop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
 
Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]
 
HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliency
 
Single Page Applications with AngularJS 2.0
Single Page Applications with AngularJS 2.0 Single Page Applications with AngularJS 2.0
Single Page Applications with AngularJS 2.0
 

Dernier

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 

Dernier (20)

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Workflowsim escience12

  • 1. WorkflowSim: A Toolkit for Simulating Scientific Workflows in Distributed Environments Weiwei Chen, Ewa Deelman
  • 2. Outline §  Introduction –  Scientific workflows? –  Distributed environments? §  Challenge –  Large scale, system overhead §  Solution –  Workflow Overhead and Failure Model §  Validation and Application –  Overhead Robustness –  Fault Tolerant Clustering 2
  • 3. Scientific Applications §  Scientists often need to –  Integrate diverse components and data –  Automate data processing steps –  Reproduce/analyze/share previous results –  Track the provenance of data products –  Execute processing steps efficiently –  Reliably execute applications Scientific Workflows provide solutions to these problems 3
  • 4. Scientific Workflows §  DAG model (Directed Acyclic Graph) –  Node: computational activities –  Directed edge: data dependencies –  Task: a process that users would like to execute –  Job: a single unit for execution with one of more tasks –  Task clustering: the process of grouping tasks to jobs 4
  • 5. Workflow Management System §  Common Features of WMS –  Maps abstract workflows to executable workflows –  Handles data dependencies –  Replica selection, transfers, registration, cleanup –  Task clustering, workflow partitioning, scheduling –  Reliability and fault tolerance –  Monitoring and troubleshooting §  Existing WMS –  Pegasus, Askalon, Taverna, Kepler, Triana 5
  • 6. Workflow Simulation §  Benefits –  Save efforts in system setup, execution –  Repeat experimental results –  Control system environments (failures) §  Trace based Workflow Simulation –  Import trace from a completed execution –  Vary workflow structures and system environments §  Challenges (CloudSim, GridSim, etc.) –  System overhead and failures –  Multiple levels of computational activities(task/job) –  A hierarchy of management components 6
  • 7. Comparison CloudSim WorkflowSim Task, Bag of Execution Model Task, Job, DAG Tasks Failure and Monitoring No Yes Data Transfer Delay Data Transfer Overhead Model Workflow Engine Delay Delay Clustering Delay … Site selection Single Multiple Scheduling, Job retry Optimization Scheduling Clustering, Techniques Partitioning … WorkflowSim is an extension of CloudSim, but it is workflow aware 7
  • 8. System Architecture §  Submit Host –  Workflow Mapper –  Clustering Engine –  Workflow Engine –  Local Scheduler §  Execution Site –  Remote Scheduler –  Worker Nodes –  Failure Generator –  Failure Monitor 8
  • 9. Workflow Overhead –  Workflow Engine –  Postscript Delay Delay –  Clustering Delay –  Queue Delay –  Data Transfer Delay 9
  • 10. Example: Clustering Delay n is the number of tasks per level. k is number of jobs per level. !"#$%&'()*!!"#$%|!!! ! ! ! = = !"#$%&'()*!!"#$%|!!! ! ! ! mProjectPP, mDiffFit, and mBackground are the major jobs of Montage Overhead is not a constant variable. It has diverse distribution and patterns 10
  • 11. Validation !"#$%&'#$!!"#$%&&!!"#$%&' Ideal Case: Accuracy=1.0 !!!"#$!% = !"#$!!"#$%&&!!"#$%&' k: Maximum jobs per level Overheads have the biggest impact 11
  • 12. Application: Overhead Robustness §  Overhead robustness: the influence of overheads on the workflow runtime for DAG scheduling heuristics. §  Inaccurate estimation (under- or over-estimated) of workflow overheads influences the overall runtime of workflows. §  Research Merits: –  Sensitivity of heuristics –  Overhead friendly heuristics 12
  • 13. Application: Overhead Robustness §  Increase or Reduce Overhead by a Factor (Weight) –  Under estimation and over estimation §  Heuristics Evaluated –  FCFS: First Come First Serve –  MCT: Minimum Completion Time –  MinMin: The job with the minimum completion time is selected and assigned to the fastest resource. –  MaxMin: The job with the maximum completion time and assigns it to its best available resource. 13
  • 14. Application: Overhead Robustness MCT<MinMin MCT<MinMin MCT>MinMin MCT<MinMin Accurate estimation of Clustering Delay is more important 14
  • 15. Workflow Failure §  Failures have significant influence on the performance §  Classifying Transient Failures –  Task Failure: A task fails, other tasks within the same job may not fail –  Job Failure: A job fails, all of its tasks fail 15
  • 16. System Architecture §  Submit Host –  Job Retry –  Reclustering §  Execution Site –  Failure Generator –  Failure Monitor 16
  • 17. Application: Fault Tolerant Clustering §  Task clustering can reduce execution overhead §  A job composed of multiple tasks may have a greater risk of suffering from failures §  Reclustering and Job Retry are proposed –  No Optimization (NOOP) retries the failed jobs. –  Dynamic Clustering (DC) decreases the clusters.size if the measured job failure rate is high. 17
  • 18. Application: Fault Tolerant Clustering –  Selective Reclustering (SR) retries the failed tasks in a job –  Dynamic Reclustering (DR) retires the failed tasks in a job and also decreases the clusters.size if the measured job failure rate is high. 18
  • 19. Performance §  Reclustering (DR/DC/SR) reduces the influence of failures significantly compared to NOOP §  DR outperforms other techniques. The lower the better 19
  • 20. Conclusion §  WorkflowSim assists researchers to evaluate their workflow optimization techniques with better accuracy and wider support. –  Modeling Overhead and Failures –  Distributed and Hierarchical components –  Workflow Techniques §  It is necessary to consider both data dependencies, workflow failures and system overhead. 20
  • 21. Acknowledgements §  Pegasus Team: http://pegasus.isi.edu §  FutureGrid & XSEDE §  Funded by NSF grants IIS-0905032. §  More info: wchen@isi.edu §  Available on request §  Q&A 21
  • 22. Overhead Distribution Montage 22
  • 23. Overhead Distribution CyberShake 23
  • 24. Broadband 24
  • 25. Task Failure Model and Job Failure Model n 4d k* = −d + d 2 − r ln(1− α ) k* = , if n >> r 2t * (kt + d) ttotal = * n(k *t + d) 1− β t = total * rk(1− α )k 25 25
  • 26. Related Papers §  Integration of Workflow Partitioning and Resource Provisioning, CCGrid 2012. §  Improving Scientific Workflow Performance using Policy Based Data Placement, IEEE Policy 2012. §  Fault Tolerant Clustering in Scientific Workflows, SWF, IEEE Services 2012 §  Workflow Overhead Analysis and Optimizations, WORKS 2011. §  Partitioning and Scheduling Workflows across Multiple Sites with Storage Constraints, PPAM 2011 §  Pegasus: a Framework for Mapping Complex Scientific Workflows onto Distributed Systems, Scientific Programming Journal 2005 26