Versioning for Workflow Evolution

Roger Barga, Nelson Araujo
Microsoft Research, Microsoft Corporation, Redmond, Washington

Eran Chinthaka Withana, Beth Plale
School of Informatics and Computing, Indiana University, Bloomington, Indiana

3rd International Workshop on Data Intensive Distributed Computing, Chicago, IL, US
"Versioning for Workflow Evolution"; June 22, 2010; Eran C. Withana

Workflow Evolution
- Computational science experiments
  - Sequence of activities
  - Set of configurable parameters and input data
  - Produce outputs to be analyzed and evaluated further
- Evolution of research
  - Changes in research artifacts

Workflow Evolution
- Workflows are a good tool for tracking the evolution of research
  - They automate repeatable tasks efficiently
  - Algorithms and experimental procedures are encoded into workflows
  - Tracking workflows therefore tracks the research too
- Tracking effects over time
  - Provenance of data products
  - Lineage of data products, the roots of errors, and the data products they affect
- Comparing results
  - More than one research direction in a given experiment
  - Comparing outputs from different paths of the research
- Attribution
  - Attribution of credit based on who performed the work, who created or owns it, and who owns the data products
  - Sharing and attribution of research can and should be an integral part of research
  - E.g., sub-modules from myexperiment.org
- Workflow evolution framework and versioning model
  - Enables the management of knowledge encoded in workflow executions

Related Work
- Workflow evolution shares a lot in common with provenance collection frameworks
  - I. T. Foster, J.-S. Vöckler, M. Wilde, and Y. Zhao. Chimera: A virtual data system for representing, querying, and automating data derivation. In Proceedings of the 14th International Conference on Scientific and Statistical Database Management, pages 37-46, Washington, DC, USA, 2002. IEEE Computer Society.
- Existing evolution frameworks
  - J. Freire, C. Silva, S. Callahan, E. Santos, C. Scheidegger, and H. Vo. Managing rapidly-evolving scientific workflows. Lecture Notes in Computer Science, 4145:10, 2006.
- Evolution data models
  - L. Bavoil, S. P. Callahan, P. J. Crossno, J. Freire, C. E. Scheidegger, C. T. Silva, and H. T. Vo. VisTrails: Enabling interactive multiple-view visualizations. In IEEE Visualization, 2005. VIS 05, pages 135-142.
- Versioning at different levels
  - Application level: D. Santry, M. Feeley, N. Hutchinson, and A. Veitch. Elephant: The file system that never forgets. In Workshop on Hot Topics in Operating Systems, pages 2-7. IEEE Computer Society, 1999.
  - System/database level: R. Chatterjee, G. Arun, S. Agarwal, B. Speckhard, and R. Vasudevan. Using applications of data versioning in database application development. In ICSE '04: Proceedings of the 26th International Conference on Software Engineering, pages 315-325, Washington, DC, USA, 2004. IEEE Computer Society.
  - Disk storage level: M. Flouris and A. Bilas. Clotho: Transparent data versioning at the block I/O level. In Proceedings of the 12th NASA Goddard, 21st IEEE Conference on Mass Storage Systems and Technologies (MSST 2004), pages 315-328, 2004.

Use Cases
1. Research reproduction
2. Scientific workflows
   - In LEAD: tracking namelist input files and visualizations
   - Tracking activity binaries

Versioning Model
- Dimensions of workflow evolution
  - Direct evolution occurs when a user of the workflow performs one of the following actions:
    - Changes the flow and arrangement of the components within the system
    - Changes the components within the workflow
    - Changes the input, output, or configuration parameters of components within the workflow
  - Contribution tracks components that are reused from a previous system
- Workflow evolution capturing stages
  - The user explicitly saves the workflow
  - The user closes the workflow editor
  - A workflow is executed
  - Warning: this granularity might not capture all edits

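The direct-evolution dimensions and the contribution dimension can be made concrete with a small diff between two workflow snapshots. This is a minimal sketch, not Trident's versioning model: `WorkflowVersion`, `classify_direct_evolution`, and `reused_components` are hypothetical names, and a workflow is assumed to be nothing more than a map of component ids to component types, a set of links, and per-component parameters.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkflowVersion:
    """Hypothetical, simplified workflow snapshot (not Trident's data model)."""
    components: dict[str, str]                # component id -> component type/version
    links: frozenset[tuple[str, str]]         # (source id, target id) connections
    params: dict[str, dict[str, str]] = field(default_factory=dict)  # id -> {name: value}


def classify_direct_evolution(old: WorkflowVersion, new: WorkflowVersion) -> set[str]:
    """Return which direct-evolution dimensions differ between two snapshots."""
    kinds: set[str] = set()
    if old.links != new.links:
        kinds.add("flow")            # flow/arrangement of components changed
    if old.components != new.components:
        kinds.add("components")      # components added, removed, or replaced
    shared = old.components.keys() & new.components.keys()
    if any(old.params.get(c, {}) != new.params.get(c, {}) for c in shared):
        kinds.add("parameters")      # input/output/configuration parameters changed
    return kinds


def reused_components(previous: WorkflowVersion, current: WorkflowVersion) -> set[str]:
    """Ids of components carried over unchanged from a previous workflow (contribution)."""
    return {c for c, t in current.components.items() if previous.components.get(c) == t}


v1 = WorkflowVersion({"read": "Reader:1", "model": "Model:1"},
                     frozenset({("read", "model")}), {"model": {"dt": "30"}})
v2 = WorkflowVersion(dict(v1.components), v1.links, {"model": {"dt": "60"}})
v3 = WorkflowVersion({"read": "Reader:1", "model": "Model:2", "plot": "Plot:1"},
                     frozenset({("read", "model"), ("model", "plot")}),
                     {"model": {"dt": "60"}})

print(classify_direct_evolution(v1, v2))   # only a parameter changed
print(classify_direct_evolution(v2, v3))   # a component was replaced and a link added
print(reused_components(v2, v3))           # components carried over unchanged
```

A diff like this would run at each capture stage (save, editor close, execution); edits made and undone between two stages are invisible to it, which is the granularity limitation noted on the slide.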
System Architecture within the Trident Scientific Workflow Workbench
[Figure: Trident architecture and Trident evolution framework architecture. Labeled components include the Workbench (Design, Monitor, Administration, Browser), Trident Registry and Registry Management, Workflow Packages, Trident Runtime Services, Trident Data Model, Publish-Subscribe Blackboard, Data Access Layer, Windows Workflow Foundation, Scientific Workflows, Evolution Framework, Versioning Model, Local Storage, and other local/remote versioning systems.]

User View (within Trident)
[Figures: Workflow Evolution View; Versioned Objects in Registry]

Performance Evaluation
- Evaluation strategies
  - Delta: the difference between two consecutive versions
  - Checkpointing: a complete version is saved after a fixed number of versions
- Options evaluated
  - No delta, no checkpointing: each version is saved as is
  - With delta, no checkpointing: each version is saved as a delta against the previous version
  - With delta, with checkpointing: a complete version is checkpointed after every n versions
- Workflows used

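The three options can be sketched as one store parameterized by whether deltas are used and how often a full checkpoint is written. This is a minimal illustration under stated assumptions, not the implementation measured in the evaluation: versions are treated as lists of text lines, deltas are computed line by line with Python's standard `difflib.SequenceMatcher`, and `VersionStore`, `make_delta`, and `apply_delta` are hypothetical names.

```python
import difflib
from typing import Optional

Delta = list[tuple[int, int, list[str]]]  # (start, end, replacement lines) against the old version


def make_delta(old: list[str], new: list[str]) -> Delta:
    """Line-based delta: the non-equal opcodes needed to turn `old` into `new`."""
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def apply_delta(old: list[str], delta: Delta) -> list[str]:
    """Rebuild the newer version by splicing replacement lines into `old`."""
    out: list[str] = []
    pos = 0
    for i1, i2, repl in delta:        # opcodes are ordered and non-overlapping
        out.extend(old[pos:i1])
        out.extend(repl)
        pos = i2
    out.extend(old[pos:])
    return out


class VersionStore:
    """use_delta=False: every version saved as is.
    use_delta=True, checkpoint_every=None: delta against the previous version.
    use_delta=True, checkpoint_every=n: full copy every n versions, deltas in between."""

    def __init__(self, use_delta: bool = True, checkpoint_every: Optional[int] = None):
        self.use_delta = use_delta
        self.checkpoint_every = checkpoint_every
        self.records: list[tuple[str, object]] = []   # ("full", lines) or ("delta", Delta)
        self._latest: list[str] = []

    def save(self, lines: list[str]) -> int:
        k = len(self.records)
        full = (not self.use_delta or k == 0
                or (self.checkpoint_every is not None and k % self.checkpoint_every == 0))
        if full:
            self.records.append(("full", list(lines)))
        else:
            self.records.append(("delta", make_delta(self._latest, list(lines))))
        self._latest = list(lines)
        return k

    def recover(self, k: int) -> tuple[list[str], int]:
        """Return version k and the number of stored records read to rebuild it."""
        start = k
        while self.records[start][0] != "full":   # walk back to the nearest full copy
            start -= 1
        lines = list(self.records[start][1])
        for _, delta in self.records[start + 1 : k + 1]:
            lines = apply_delta(lines, delta)
        return lines, k - start + 1


history: list[list[str]] = []
doc = [f"step{i} = default" for i in range(10)]
stores = {
    "no delta, no checkpointing": VersionStore(use_delta=False),
    "delta, no checkpointing": VersionStore(use_delta=True),
    "delta, checkpoint every 3": VersionStore(use_delta=True, checkpoint_every=3),
}
for v in range(7):
    doc = list(doc)
    doc[v] = f"step{v} = edited in version {v}"
    history.append(doc)
    for store in stores.values():
        store.save(doc)

for name, store in stores.items():
    print(name, "-> records read to recover version 5:", store.recover(5)[1])
```

Full copies skip both the diff on save and the replay on recovery, at the cost of storage; the checkpoint interval bounds how many deltas a recovery has to replay.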
Performance Evaluation: File Write Time
[Figure panels: O Workflow; M Workflow]

Performance Evaluation: Version Recovery Time
[Figure panels: O Workflow; M Workflow]

Performance Evaluation: Space Usage for a Version
[Figure panels: O Workflow; M Workflow]

Performance Evaluation: Data Retrieved per Version
[Figure panels: O Workflow; M Workflow]

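A back-of-the-envelope model can help when reading the space and data-retrieval results. It assumes, purely for illustration, N saved versions indexed 0 to N-1, a full version size S, a constant delta size d, and a checkpoint interval n with version 0 always stored in full. Real deltas vary from version to version, so this is not a model fitted to the measured results.

```latex
\begin{aligned}
\text{Space}_{\text{no delta}} &= N S \\
\text{Space}_{\text{delta}} &= S + (N-1)\,d \\
\text{Space}_{\text{delta+ckpt}} &= c\,S + (N-c)\,d, \qquad c = \lceil N/n \rceil \\[4pt]
\text{Reads}_{\text{no delta}}(k) &= 1, \quad
\text{Reads}_{\text{delta}}(k) = k + 1, \quad
\text{Reads}_{\text{delta+ckpt}}(k) = (k \bmod n) + 1
\end{aligned}
```

With illustrative values N = 10, S = 100 KB, d = 5 KB, and n = 5 (so c = 2), the three options need 1000 KB, 145 KB, and 240 KB respectively, and recovering version 9 reads 1, 10, and 5 stored records.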
Discussion
- The "no delta, no checkpointing" option performs poorly with respect to storage usage
  - 4-5 times worse for the smaller workflow with a smaller delta, and 2 times worse for the larger workflow with a large delta
- but outperforms both other options with respect to
  - version save time: 20-30 times for the larger workflow with a large delta, and 5 times for the smaller workflow with a small delta
  - version recovery time: 10 times for the smaller workflow with a small delta, and 5 times for the larger workflow with a large delta
- Criteria for selecting an object maintenance strategy
  - Size of the data objects
  - Average change to a data object between different versions of the same object
  - Response time to the user and the system
- Challenges in working with different types of artifacts

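The selection criteria above could drive a per-object choice among the three options, which is also the direction of the dynamic strategy listed under future work. The rule below is a hypothetical sketch: `ObjectProfile`, `choose_strategy`, and both thresholds are invented for illustration and were not derived from the evaluation. The response-time budget is expressed as the maximum number of stored records a recovery may read, which a checkpoint interval of n bounds at n.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectProfile:
    size_bytes: int                     # typical size of one full version
    avg_change_ratio: float             # mean fraction changed between consecutive versions (0-1)
    max_recovery_reads: Optional[int]   # response-time budget in records read; None = unbounded


def choose_strategy(p: ObjectProfile,
                    small_object_bytes: int = 64 * 1024,   # hypothetical threshold
                    delta_ratio_cutoff: float = 0.5,       # hypothetical threshold
                    ) -> tuple[str, Optional[int]]:
    """Return (strategy, checkpoint interval) for one versioned object."""
    # Small or heavily rewritten objects gain little from deltas,
    # and full copies give the fastest save and recovery.
    if p.size_bytes <= small_object_bytes or p.avg_change_ratio >= delta_ratio_cutoff:
        return ("no_delta", None)
    if p.max_recovery_reads is None:
        return ("delta", None)          # storage matters most and recovery is unbounded
    if p.max_recovery_reads <= 1:
        return ("no_delta", None)       # only a full copy meets a one-read budget
    # With a checkpoint every n versions, recovery reads at most n records.
    return ("delta_checkpoint", p.max_recovery_reads)


print(choose_strategy(ObjectProfile(8_000, 0.05, 4)))       # ('no_delta', None)
print(choose_strategy(ObjectProfile(5_000_000, 0.02, 4)))   # ('delta_checkpoint', 4)
```

A dynamic version of this rule would recompute the profile from observed sizes and deltas as an object accumulates versions, rather than fixing the strategy once.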
Future Work
- Dynamic strategy to adjust the versioning technique depending on object properties
- Challenges
  - Unavailability of visualization software
  - Visualizing different types of data products and integrating other visualization tools
- LEAD II Vortex2 use case
  - Tracking different workflow activity library versions

Thank You!
Questions?

 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Versioning for Workflow Evolution

  • 1. Versioning for Workflow Evolution. Roger Barga, Nelson Araujo (Microsoft Research, Microsoft Corporation, Redmond, Washington); Eran Chinthaka Withana, Beth Plale (School of Informatics and Computing, Indiana University, Bloomington, Indiana). 3rd International Workshop on Data Intensive Distributed Computing, Chicago, IL, US; "Versioning for Workflow Evolution"; June 22, 2010; Eran C. Withana.
  • 2. Workflow Evolution. Computational science experiments consist of a sequence of activities and a set of configurable parameters and input data, and they produce outputs that are analyzed and evaluated further. Evolution of research: changes in these research artifacts over time.
  • 3. Workflow Evolution. Workflows are a good tool for tracking the evolution of research: they automate repeatable tasks efficiently, and algorithms and experimental procedures are encoded into them, so tracking workflows also tracks the research. Tracking effects over time: provenance of data products; lineage, tracing errors back to their roots and identifying the data products they affected. Comparing results: a single experiment may follow more than one research direction, and outputs from the different paths can be compared. Attribution: credit based on who performed the work, who owns or created the workflow, and who owns the data products; sharing and attribution can and should be an integral part of research (e.g., sub-modules from myexperiments.org). Workflow evolution framework and versioning model: enables the management of the knowledge encoded in workflow executions.
  • 4. Related Work. Workflow evolution shares much in common with provenance collection frameworks: I. T. Foster, J.-S. Vöckler, M. Wilde, and Y. Zhao. Chimera: A virtual data system for representing, querying, and automating data derivation. In Proceedings of the 14th International Conference on Scientific and Statistical Database Management, pages 37-46, Washington, DC, USA, 2002. IEEE Computer Society. Existing evolution frameworks: J. Freire, C. Silva, S. Callahan, E. Santos, C. Scheidegger, and H. Vo. Managing rapidly-evolving scientific workflows. Lecture Notes in Computer Science, 4145:10, 2006. Evolution data models: L. Bavoil, S. P. Callahan, P. J. Crossno, J. Freire, C. E. Scheidegger, C. T. Silva, and H. T. Vo. VisTrails: Enabling interactive multiple-view visualizations. In IEEE Visualization 2005 (VIS 05), pages 135-142. Versioning at different levels. Application level: D. Santry, M. Feeley, N. Hutchinson, and A. Veitch. Elephant: The file system that never forgets. In Workshop on Hot Topics in Operating Systems, pages 2-7. IEEE Computer Society, 1999. System/database level: R. Chatterjee, G. Arun, S. Agarwal, B. Speckhard, and R. Vasudevan. Using applications of data versioning in database application development. In ICSE '04: Proceedings of the 26th International Conference on Software Engineering, pages 315-325, Washington, DC, USA, 2004. IEEE Computer Society. Disk storage level: M. Flouris and A. Bilas. Clotho: Transparent data versioning at the block I/O level. In Proceedings of the 12th NASA Goddard, 21st IEEE Conference on Mass Storage Systems and Technologies (MSST 2004), pages 315-328, 2004.
  • 5. Use Cases. (1) Research reproduction. (2) Scientific workflows: in LEAD, tracking namelist input files and visualizations; tracking activity binaries.
  • 6. Versioning Model. Dimensions of workflow evolution: direct evolution occurs when a user of the workflow changes the flow and arrangement of the components within the system, changes the components within the workflow, or changes the input, output, or configuration parameters of components within the workflow. Contribution tracks components that are reused from a previous system. Workflow evolution is captured when the user explicitly saves the workflow, when the user closes the workflow editor, or when the workflow is executed. Warning: this granularity might not capture all edits.
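The evolution dimensions and capture stages above can be illustrated with a small data model. This is a minimal sketch, not the Trident implementation: the names `ChangeKind`, `CaptureTrigger`, `WorkflowVersion`, and `WorkflowHistory` are hypothetical. Edits accumulate until one of the three capture events occurs, so several edits made between captures collapse into a single version, which is the granularity limitation the slide warns about.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class ChangeKind(Enum):
    """Hypothetical labels for the three kinds of direct evolution."""
    FLOW = auto()       # flow/arrangement of components changed
    COMPONENT = auto()  # components added, removed, or replaced
    PARAMETER = auto()  # input, output, or configuration parameters changed


class CaptureTrigger(Enum):
    """The three events at which a new version is captured."""
    EXPLICIT_SAVE = auto()
    EDITOR_CLOSED = auto()
    EXECUTION = auto()


@dataclass
class WorkflowVersion:
    number: int
    trigger: CaptureTrigger
    changes: set  # ChangeKind values seen since the previous capture
    reused_from: list = field(default_factory=list)  # contribution: (workflow, version) pairs


@dataclass
class WorkflowHistory:
    name: str
    versions: list = field(default_factory=list)
    _pending: set = field(default_factory=set, repr=False)

    def record_edit(self, kind: ChangeKind) -> None:
        # Edits are only remembered by kind; intermediate states are not kept.
        self._pending.add(kind)

    def capture(self, trigger: CaptureTrigger, reused_from=()) -> WorkflowVersion:
        # Everything edited since the last capture becomes one version.
        version = WorkflowVersion(len(self.versions), trigger,
                                  set(self._pending), list(reused_from))
        self.versions.append(version)
        self._pending.clear()
        return version
```

For example, two parameter edits followed by a save produce one version whose `changes` is `{ChangeKind.PARAMETER}`; the intermediate parameter values are lost, exactly as the capture granularity implies.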
  • 7. Versioning System Architecture within Trident (Trident: scientific workflow workbench). [Figures: Trident Architecture and Trident Evolution Framework Architecture. Labeled components: Trident Workbench, Trident Registry Management, Workflow Packages, Design, Trident Runtime Services, Trident Registry, Data Model, Publish-Subscribe Blackboard, Monitor, Data Access Layer, Scientific Workflows, Evolution Framework, Administration, Browser, Versioning Model, Registry Management, Windows Workflow Foundation, Local Storage, Other Local/Remote.]
  • 8. User View (within Trident). [Screenshots: Workflow Evolution View; Versioned Objects in Registry.]
  • 9. Performance Evaluation. Evaluation strategies: a delta is the difference between two consecutive versions; checkpointing saves a complete version after a fixed number of versions. Three configurations were compared: No Delta, No Checkpointing (each version saved as is); With Delta, No Checkpointing (each version stored as a delta against the previous version); With Delta, With Checkpointing (deltas, with a complete version checkpointed after every n versions). Workflows used: [table not recoverable].
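The three storage configurations can be sketched as follows, assuming a workflow is serialized as a list of text lines. The delta encoding here (copy/insert operations derived from Python's `difflib.SequenceMatcher`) is a stand-in chosen for illustration; the slides do not say which diff algorithm the framework used, and `VersionStore`, `make_delta`, and `apply_delta` are hypothetical names.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Encode `new` as ops against `old`: ("copy", i1, i2) or ("insert", lines)."""
    ops = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))          # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))    # store only the new lines
        # "delete": nothing is copied from old, so nothing is recorded
    return ops


def apply_delta(old, ops):
    """Rebuild the newer version from `old` and its delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class VersionStore:
    """Hypothetical store covering the three evaluated configurations."""

    def __init__(self, use_delta=True, checkpoint_every=None):
        self.use_delta = use_delta
        self.checkpoint_every = checkpoint_every
        self.entries = []     # ("full", lines) or ("delta", ops)
        self._latest = None

    def save(self, lines):
        lines = list(lines)
        n = len(self.entries)
        full = (not self.use_delta or n == 0 or
                (self.checkpoint_every is not None and n % self.checkpoint_every == 0))
        if full:
            self.entries.append(("full", lines))
        else:
            self.entries.append(("delta", make_delta(self._latest, lines)))
        self._latest = lines
        return n  # version number

    def recover(self, version):
        # Walk back to the nearest complete copy, then replay deltas forward.
        start = version
        while self.entries[start][0] != "full":
            start -= 1
        lines = self.entries[start][1]
        for k in range(start + 1, version + 1):
            lines = apply_delta(lines, self.entries[k][1])
        return list(lines)
```

`VersionStore(use_delta=False)` corresponds to No Delta, No Checkpointing, `VersionStore()` to With Delta, No Checkpointing, and `VersionStore(checkpoint_every=n)` to With Delta, With Checkpointing. Because recovery must replay every delta since the last full copy, checkpointing bounds recovery work at the cost of storing extra complete versions.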
  • 10. Performance Evaluation: File Write Time. [Figure panels: O Workflow; M Workflow.]
  • 11. Performance Evaluation: Version Recovery Time. [Figure panels: O Workflow; M Workflow.]
  • 12. Performance Evaluation: Space Usage for a Version. [Figure panels: O Workflow; M Workflow.]
  • 13. Performance Evaluation: Data Retrieved per Version. [Figure panels: O Workflow; M Workflow.]
  • 14. Discussion. The "No Delta, No Checkpointing" option performs poorly with respect to storage usage, by a factor of 4-5 for the smaller workflow with small deltas and 2 for the larger workflow with large deltas. However, it outperforms both other options with respect to version save time (by 20-30 times for the larger workflow with large deltas and 5 times for the smaller workflow with small deltas) and version recovery time (by 10 times for the smaller workflow with small deltas and 5 times for the larger workflow with large deltas). Criteria for selecting an object maintenance strategy: the size of the data objects; the average change between different versions of the same object; and the response time required by the user and the system. Challenges remain in working with different types of artifacts.
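The storage versus recovery trade-off can be made concrete with a simplified cost model. It assumes, hypothetically, that every complete version has size S and every delta has size D, which real workflows will not satisfy exactly; it only shows why full copies cost storage while long delta chains cost recovery work. For k versions (numbered from 0) and a checkpoint every n versions, storage is kS without deltas, S + (k - 1)D with deltas only, and ⌈k/n⌉S + (k - ⌈k/n⌉)D with checkpointing; recovering version v replays 0, v, or v mod n deltas respectively. The function names are illustrative, not part of the framework.

```python
import math


def storage_cost(k, full_size, delta_size, use_delta, checkpoint_every=None):
    """Total storage for k versions under the simplified constant-size model."""
    if k <= 0:
        return 0
    if not use_delta:
        return k * full_size                      # every version stored whole
    # Full copies sit at versions 0, n, 2n, ...; without checkpoints only version 0.
    n_full = 1 if checkpoint_every is None else math.ceil(k / checkpoint_every)
    return n_full * full_size + (k - n_full) * delta_size


def deltas_to_replay(version, use_delta, checkpoint_every=None):
    """Number of deltas applied to recover `version` (0-based)."""
    if not use_delta:
        return 0
    if checkpoint_every is None:
        return version                            # replay from version 0
    return version % checkpoint_every             # replay from the last checkpoint
```

With k = 10 versions, S = 100, and D = 10 (arbitrary units), the three configurations need 1000, 190, and 370 units (checkpointing every 4 versions), while recovering version 9 replays 0, 9, and 1 deltas. This mirrors the criteria on the slide: object size and typical change size drive storage, while the required response time limits how long a delta chain can grow.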
  • 15. Future Work. A dynamic strategy that adjusts the versioning technique to the properties of each object. Challenges: unavailability of visualization software; visualizing different types of data products and integrating other visualization tools. LEAD II: the Vortex2 use case; tracking different versions of the workflow activity library.
  • 16. Thank you! Questions?