Slides from our presentation at the 1st International Workshop on Reproducibility in Parallel Computing (REPPAR'14) in conjunction with Euro-Par 2014 (August 25-29)
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
A Semantic-Based Approach to Attain Reproducibility of Computational Environments in Scientific Workflows: A Case Study
1. A Semantic-Based Approach to
Attain Reproducibility of
Computational Environments in
Scientific Workflows: A Case Study
Idafen Santana-Perez1, Rafael Ferreira da Silva2, Mats Rynge2
Ewa Deelman2, María S. Pérez-Hernández1, Oscar Corcho1
1Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
2Univ. of Southern California, Information Sciences Institute, Marina Del Rey, CA, USA
REPPAR'14
1st International Workshop on Reproducibility in Parallel
Computing. Porto, August 2014
2. Index
• Introduction
• Reproducibility tools
• WICUS
• Pegasus & PRECIP
• Reproducibility process
• Annotation
• Infrastructure Specification Algorithm
• Use case
• Conclusion & Future Work
2A Semantic-Based Approach to Attain Reproducibility...
3. Introduction
• Experiments in empirical science
• Primary component of the scientific method
• Main method for validating a hypothesis
• Repeatable procedure
• Scientific publications
• Announce a result
• Convince readers that the result is correct
• Computational science
• In silico science
• Computational scientific workflow: “a precise, executable
description of a scientific procedure” [De Roure, 2011]
• “Reproducibility in principle underpins the scientific
method” [Goble,2012]
• “Its about capturing, preserving, reusing and
curating” [Goble 2012]
3A Semantic-Based Approach to Attain Reproducibility...
4. Introduction
• Reproducibility in Scientific Experiments
4A Semantic-Based Approach to Attain Reproducibility...
INPUT DATA SCIENTIFIC PROCEDURE EQUIPMENT
INVIVO/VITROINSILICO
5. Introduction
• Reproducibility in Scientific Experiments
5A Semantic-Based Approach to Attain Reproducibility...
INPUT DATA SCIENTIFIC PROCEDURE EQUIPMENT
INVIVO/VITROINSILICO
6. CLOUD
Introduction
• Reproducibility in Scientific Experiments
6A Semantic-Based Approach to Attain Reproducibility...
FORMER
EQUIPMENT
ANNOTATE REPRODUCE
SEMANTIC
ANNOTATIONS
EQUIVALENT
EXECUTION
ENVIRONMENT
• “Its about capturing, preserving, reusing and
curating” [Goble 2012]
7. CLOUD
Semantics
• Reproducibility in Scientific Experiments
7A Semantic-Based Approach to Attain Reproducibility...
FORMER
EQUIPMENT
ANNOTATE REPRODUCE
SEMANTIC
ANNOTATIONS
EQUIVALENT
EXECUTION
ENVIRONMENT
• “Its about capturing, preserving, reusing and
curating” [Goble 2012]
8. Semantics
• Vocabularies for documenting the main resources
involved on the execution of a WF.
• Software
• Hardware
• Computational resources
• Workflow
• Increasing the understanding of the underlying
components
• Making this knowledge explicit
• Standard technology: RDF & OWL
• Easy to extend and integrate
8A Semantic-Based Approach to Attain Reproducibility...
15. CLOUD
PRECIP
• Reproducibility in Scientific Experiments
15A Semantic-Based Approach to Attain Reproducibility...
FORMER
EQUIPMENT
ANNOTATE REPRODUCE
SEMANTIC
ANNOTATIONS
EQUIVALENT
EXECUTION
ENVIRONMENT
• “Its about capturing, preserving, reusing and
curating” [Goble 2012]
16. PRECIP
• Pegasus Repeatable Experiments for the Clouds In
Python
• Experiment management control API
• Works with commercial and academic Clouds
• Tag-based system
• No need of pre-installed software on the VM image
• Create VM, transfer files, run commands remotely
• Linux
16A Semantic-Based Approach to Attain Reproducibility...
17. Pegasus
• Pegasus WMS
• Million-task workflows
• Records
• Data about execution
• Intermediate results
• Replica Catalog
• DAX (Direct Acyclic Graph in XML)
• Transformation Catalog
• HTCondor for executing individual tasks
17A Semantic-Based Approach to Attain Reproducibility...
18. CLOUD
Reproducibility process
• Reproducibility in Scientific Experiments
18A Semantic-Based Approach to Attain Reproducibility...
FORMER
EQUIPMENT
ANNOTATE REPRODUCE
SEMANTIC
ANNOTATIONS
EQUIVALENT
EXECUTION
ENVIRONMENT
• “Its about capturing, preserving, reusing and
curating” [Goble 2012]
20. Reproducibility process
• Infrastructure Specification Algorithm
• Goal: obtain an specification defining what VMs need to be
created, what software components must be deployed and
their configuration.
20A Semantic-Based Approach to Attain Reproducibility...
21. Reproducibility process
• Infrastructure Specification Algorithm
21A Semantic-Based Approach to Attain Reproducibility...
GET WF
REQUIREMENTS
GET
<REQ,STACKS>
GET
<REQ,D-GRAPH>
GET AVAILABLE
SVA
GET
<SVA,STACKS>
CALCULATE
REQ-SVA
COMPATIBILITY
GET MAX
COMPATIBLE
REQ-SVA
CLEAN REQ
D-GRAPH
37. Use case
• Montage Workflow
• Astronomy workflow
• Construct large image mosaics of the sky
• Montage Software distribution
• 59 binaries
• Target IaaS Cloud Providers
• Amazon EC2
• FutureGrid
37A Semantic-Based Approach to Attain Reproducibility...
RO available at http://pegasus.isi.edu/publications/reppar
38. Use case
• Goal
38A Semantic-Based Approach to Attain Reproducibility...
AWS
MONTAGE
WORKFLOW
ENVIRONMENT
ANNOTATE REPRODUCE
SEMANTIC
ANNOTATIONS
EQUIVALENT
EXECUTION
ENVIRONMENT
FG
42. Use case
• Annotations: C. Resources
42A Semantic-Based Approach to Attain Reproducibility...
43. Use case
• Results
• 2 PRECIP scripts (AWS and FG): creates VM, deploys and
configure software and executes the WF.
• Successfully executed
• Same results as the original WF
43A Semantic-Based Approach to Attain Reproducibility...
44. Conclusions
• Semantic modelling approach to conserve computational
resources
• PRECIP for reproducing the execution environment on
the Cloud
• WICUS annotations + PRECIP scripting capabilities
• Apply those ideas to Montage on AWS EC2 and FG
• Assume that the binaries are available
44A Semantic-Based Approach to Attain Reproducibility...
45. Future Work
• Apply to other workflows
• Improve the annotation process
• Extend WICUS ontology network (new release coming
soon)
• Software variants
• Low level libraries
• Incompatibilities
• User policies
45A Semantic-Based Approach to Attain Reproducibility...