Versioning for Workflow Evolution
1. Versioning for Workflow Evolution
   Roger Barga, Nelson Araujo: Microsoft Research, Microsoft Corporation, Redmond, Washington
   Eran Chinthaka Withana, Beth Plale: School of Informatics and Computing, Indiana University, Bloomington, Indiana
   3rd International Workshop on Data Intensive Distributed Computing, Chicago, IL, US; June 22, 2010; Eran C. Withana
2. Workflow Evolution
   - Computational Science Experiments
     - Sequence of activities
     - Set of configurable parameters and input data
     - Produces outputs to be analyzed and evaluated further
   - Evolution of Research
     - Changes in research artifacts
3. Workflow Evolution
   - Workflows as a good tool to track the evolution of research
     - Automate repeatable tasks in an efficient manner
     - Algorithms and experimental procedures are encoded into workflows
     - Tracking workflows tracks research too
   - Tracking effects over time
     - Provenance of data products
     - Lineage and roots of errors, and the data products they affect
   - Comparing Results
     - More than one research direction in a given experiment
     - Comparing outputs from different paths of the research
   - Attribution
     - Attribution of credit based on who performed the work, who owns or created the workflow, and who owns the data products
     - Sharing and attribution of research can and should be an integral part of research
     - E.g., sub-modules from myexperiments.org
   - Workflow Evolution Framework and versioning model
     - Enables the management of knowledge encoded in workflow executions
4. Related Work
   - Workflow evolution shares a lot in common with provenance collection frameworks
     - I. T. Foster, J.-S. Vockler, M. Wilde, and Y. Zhao. Chimera: A virtual data system for representing, querying, and automating data derivation. In Proceedings of the 14th International Conference on Scientific and Statistical Database Management, pages 37-46, Washington, DC, USA, 2002. IEEE Computer Society.
   - Existing evolution frameworks
     - J. Freire, C. Silva, S. Callahan, E. Santos, C. Scheidegger, and H. Vo. Managing rapidly-evolving scientific workflows. Lecture Notes in Computer Science, 4145:10, 2006.
   - Evolution data models
     - L. Bavoil, S. P. Callahan, P. J. Crossno, J. Freire, C. E. Scheidegger, C. T. Silva, and H. T. Vo. Vistrails: Enabling interactive multiple-view visualizations. In IEEE Visualization, 2005. VIS 05, pages 135-142.
   - Versioning at different levels
     - Application level: D. Santry, M. Feeley, N. Hutchinson, and A. Veitch. Elephant: The file system that never forgets. In Workshop on Hot Topics in Operating Systems, pages 2-7. IEEE Computer Society, 1999.
     - System/database level: R. Chatterjee, G. Arun, S. Agarwal, B. Speckhard, and R. Vasudevan. Using applications of data versioning in database application development. In ICSE '04: Proceedings of the 26th International Conference on Software Engineering, pages 315-325, Washington, DC, USA, 2004. IEEE Computer Society.
     - Disk storage level: M. Flouris and A. Bilas. Clotho: Transparent data versioning at the block I/O level. In Proceedings of the 12th NASA Goddard, 21st IEEE Conference on Mass Storage Systems and Technologies (MSST 2004), pages 315-328, 2004.
5. Use Cases
   1. Research Reproduction
   2. Scientific Workflows
      - In LEAD: tracking namelist input files and visualizations
      - Tracking activity binaries
6. Versioning Model
   - Dimensions of workflow evolution
     - Direct evolution occurs when a user of the workflow performs one of the following actions:
       - Changes the flow and arrangement of the components within the system
       - Changes the components within the workflow
       - Changes input and/or output parameters, or configuration parameters, of components within the workflow
     - Contributions track components that are reused from a previous system
   - Workflow evolution capturing stages
     - User explicitly saves the workflow
     - User closes the workflow editor
     - Execution of a workflow
     - Warning: this granularity might not capture all edits
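The capturing stages on this slide can be illustrated with a minimal sketch. Everything here is hypothetical and is not Trident's API: the names `CaptureStage` and `VersionHistory`, the use of a SHA-256 digest to detect changes, and the choice to skip a capture when nothing changed are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from enum import Enum


class CaptureStage(Enum):
    """The three capture points named on the slide (names are hypothetical)."""
    EXPLICIT_SAVE = "user explicitly saved the workflow"
    EDITOR_CLOSED = "user closed the workflow editor"
    EXECUTION = "workflow was executed"


@dataclass
class VersionHistory:
    # Each entry is (version number, stage that triggered it, content digest).
    versions: list = field(default_factory=list)

    def capture(self, definition: str, stage: CaptureStage):
        """Record a new version if the workflow changed since the last capture.

        Assumption: a version is created only when the serialized workflow
        differs from the previously captured one.
        """
        digest = hashlib.sha256(definition.encode("utf-8")).hexdigest()
        if self.versions and self.versions[-1][2] == digest:
            return None  # unchanged since the last capture point
        number = len(self.versions) + 1
        self.versions.append((number, stage, digest))
        return number


history = VersionHistory()
first = history.capture("A -> B", CaptureStage.EXPLICIT_SAVE)
unchanged = history.capture("A -> B", CaptureStage.EDITOR_CLOSED)
# Any intermediate edits made between these two calls are never seen,
# which is the granularity limitation the slide warns about.
second = history.capture("A -> C -> B", CaptureStage.EXECUTION)
```

Because versions are taken only at these three events, a sequence of edits made between two events collapses into a single version.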
7. Trident Evolution Framework Architecture
   [Architecture diagrams: "Trident Architecture" and "Versioning System Architecture within Trident". Component labels, in the order they appear on the slide: Trident Workbench, Trident Registry, Management, Workflow Packages, Design, Trident Runtime Services, Trident Registry, Data Model, Publish-Subscribe Blackboard, Workbench, Trident Data Model, Monitor, Data Access Layer, Scientific Workflows, Evolution Framework, Administration, Browser, Versioning Model, Registry Management, Windows Workflow Foundation, Local Storage, Other Local/remote, Scientific workflow workbench.]
8. User View (within Trident)
   [Screenshots: Workflow Evolution View; Versioned Objects in Registry]
9. Performance Evaluation
   - Evaluation strategies
     - Delta: the difference between two consecutive versions
     - Checkpointing: a complete version saved after a fixed number of versions
   - No Delta, No Checkpointing
     - Each version saved as it is
   - With Delta, No Checkpointing
     - Each version stored as a delta against the previous version
   - With Delta, With Checkpointing
     - A complete version checkpointed after every n versions
   - Workflows used
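The three storage options can be sketched as one small in-memory store. This is an illustrative sketch, not the implementation that was evaluated: `VersionStore`, `make_delta`, and `apply_delta` are hypothetical names, versions are assumed to be lists of text lines, and the delta encoding (copy/insert operations derived from Python's `difflib.SequenceMatcher`) is a stand-in for whatever delta format the framework actually uses.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Encode `new` as copy/insert operations against `old` (lists of lines)."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # store only the new lines
        # "delete" needs no operation: the old lines are simply not copied
    return ops


def apply_delta(old, ops):
    """Rebuild the newer version from `old` and a delta made by make_delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class VersionStore:
    """use_delta=False                     -> No Delta, No Checkpointing
    use_delta=True, checkpoint_every=None -> With Delta, No Checkpointing
    use_delta=True, checkpoint_every=n    -> With Delta, With Checkpointing
    """

    def __init__(self, use_delta=True, checkpoint_every=None):
        self.use_delta = use_delta
        self.checkpoint_every = checkpoint_every
        self.entries = []    # ("full", lines) or ("delta", ops)
        self._latest = None  # most recent version, kept to compute deltas

    def save(self, lines):
        n = len(self.entries)
        is_checkpoint = bool(self.checkpoint_every) and n % self.checkpoint_every == 0
        if not self.use_delta or n == 0 or is_checkpoint:
            self.entries.append(("full", list(lines)))
        else:
            self.entries.append(("delta", make_delta(self._latest, lines)))
        self._latest = list(lines)
        return n  # version index

    def load(self, version):
        # Walk back to the nearest complete copy, then replay deltas forward.
        base = version
        while self.entries[base][0] != "full":
            base -= 1
        lines = list(self.entries[base][1])
        for i in range(base + 1, version + 1):
            lines = apply_delta(lines, self.entries[i][1])
        return lines
```

The sketch makes the trade-off visible: without deltas, `save` and `load` do no diffing or replaying but store every version whole; with deltas and no checkpoints, `load` may replay the entire chain; checkpointing bounds that chain to at most n - 1 deltas.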
14. Discussion
   - The "No Delta, No Checkpointing" option
     - performs poorly with respect to storage usage: by a factor of 4-5 for the smaller workflow with small deltas, and by a factor of 2 for the larger workflow with large deltas
     - outperforms both other options with respect to
       - version save time: by a factor of 20-30 for the larger workflow with large deltas, and by a factor of 5 for the smaller workflow with small deltas
       - version recovery time: by a factor of 10 for the smaller workflow with small deltas, and by a factor of 5 for the larger workflow with large deltas
   - Criteria for selecting an object maintenance strategy
     - size of the data objects
     - average change in a data object between versions of the same object
     - response time to the user and the system
   - Challenges in working with different types of artifacts
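The selection criteria above could feed a simple rule of thumb such as the following sketch. The function name `choose_strategy`, its parameters, and both thresholds are hypothetical values chosen purely for illustration; the slides do not give concrete cut-offs, and the rule only mirrors the qualitative trade-off reported above (full copies are fastest, deltas save the most space when changes are small).

```python
# Hypothetical thresholds; the talk does not specify any cut-off values.
SMALL_OBJECT_BYTES = 64 * 1024   # assumption: objects at or below this are "small"
SMALL_CHANGE_RATIO = 0.10        # assumption: <= 10% changed per version is "small"

FULL_COPIES = "no delta, no checkpointing"
DELTA_WITH_CHECKPOINTS = "with delta, with checkpointing"


def choose_strategy(size_bytes, avg_change_ratio, latency_sensitive):
    """Pick a storage option from the three criteria listed on the slide.

    size_bytes        -- size of the data object
    avg_change_ratio  -- average fraction of the object changed between versions
    latency_sensitive -- whether save/recovery response time matters most
    """
    if latency_sensitive:
        # Full copies had the best save and recovery times in the evaluation.
        return FULL_COPIES
    if size_bytes <= SMALL_OBJECT_BYTES and avg_change_ratio > SMALL_CHANGE_RATIO:
        # Small objects that change a lot: little storage to gain from deltas.
        return FULL_COPIES
    # Otherwise favor storage savings, with checkpoints to bound recovery cost.
    return DELTA_WITH_CHECKPOINTS
```

For example, a large object that changes slightly between versions and is not on a latency-critical path would be routed to delta storage with checkpoints under these assumed thresholds.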
15. Future Work
   - Dynamic strategy to adjust the versioning technique depending on object properties
   - Challenges
     - Unavailability of visualization software
     - Visualizing different types of data products, integrating other visualization tools
   - LEAD II Vortex2 use case
     - Tracking different workflow activity library versions