SlideShare une entreprise Scribd logo
1  sur  46
Télécharger pour lire hors ligne
A Semantic-Based Approach to
Attain Reproducibility of
Computational Environments in
Scientific Workflows: A Case Study
Idafen Santana-Perez1, Rafael Ferreira da Silva2, Mats Rynge2
Ewa Deelman2, María S. Pérez-Hernández1, Oscar Corcho1
1Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
2Univ. of Southern California, Information Sciences Institute, Marina Del Rey, CA, USA
REPPAR'14
1st International Workshop on Reproducibility in Parallel
Computing. Porto, August 2014
Index
•  Introduction
•  Reproducibility tools
•  WICUS
•  Pegasus & PRECIP
•  Reproducibility process
•  Annotation
•  Infrastructure Specification Algorithm
•  Use case
•  Conclusion & Future Work
2A Semantic-Based Approach to Attain Reproducibility...
Introduction
•  Experiments in empirical science
•  Primary component of the scientific method
•  Main method for validating a hypothesis
•  Repeatable procedure
•  Scientific publications
•  Announce a result
•  Convince readers that the result is correct
•  Computational science
•  In silico science
•  Computational scientific workflow: “a precise, executable
description of a scientific procedure” [De Roure, 2011]
•  “Reproducibility in principle underpins the scientific
method” [Goble,2012]
•  “Its about capturing, preserving, reusing and
curating” [Goble 2012]
3A Semantic-Based Approach to Attain Reproducibility...
Introduction
•  Reproducibility in Scientific Experiments
4A Semantic-Based Approach to Attain Reproducibility...
INPUT DATA SCIENTIFIC PROCEDURE EQUIPMENT
INVIVO/VITROINSILICO
Introduction
•  Reproducibility in Scientific Experiments
5A Semantic-Based Approach to Attain Reproducibility...
INPUT DATA SCIENTIFIC PROCEDURE EQUIPMENT
INVIVO/VITROINSILICO
CLOUD
Introduction
•  Reproducibility in Scientific Experiments
6A Semantic-Based Approach to Attain Reproducibility...
FORMER
EQUIPMENT
ANNOTATE REPRODUCE
SEMANTIC
ANNOTATIONS
EQUIVALENT
EXECUTION
ENVIRONMENT
•  “Its about capturing, preserving, reusing and
curating” [Goble 2012]
CLOUD
Semantics
•  Reproducibility in Scientific Experiments
7A Semantic-Based Approach to Attain Reproducibility...
FORMER
EQUIPMENT
ANNOTATE REPRODUCE
SEMANTIC
ANNOTATIONS
EQUIVALENT
EXECUTION
ENVIRONMENT
•  “Its about capturing, preserving, reusing and
curating” [Goble 2012]
Semantics
•  Vocabularies for documenting the main resources
involved on the execution of a WF.
•  Software
•  Hardware
•  Computational resources
•  Workflow
•  Increasing the understanding of the underlying
components
•  Making this knowledge explicit
•  Standard technology: RDF & OWL
•  Easy to extend and integrate
8A Semantic-Based Approach to Attain Reproducibility...
Semantics
•  WICUS ontology network
•  Workflow Infrastructure Conservation Using Semantics
•  http://purl.org/net/wicus
•  5 ontologies
•  WICUS Software Stack ontology
•  WICUS Hardware Specs ontology
•  WICUS Scientific Virtual Appliance ontology
•  WICUS Workflow Execution Requirements ontology
•  WICUS Ontology: links the previous ontologies
9A Semantic-Based Approach to Attain Reproducibility...
WICUS ontology network
•  WICUS Software Stack ontology
•  http://purl.org/net/wicus-stack
10A Semantic-Based Approach to Attain Reproducibility...
WICUS ontology network
•  WICUS Hardware Specs ontology
•  http://purl.org/net/wicus-hwspecs
11A Semantic-Based Approach to Attain Reproducibility...
WICUS ontology network
12A Semantic-Based Approach to Attain Reproducibility...
•  WICUS Scientific Virtual Appliance ontology
•  http://purl.org/net/wicus-sva
WICUS ontology network
•  WICUS Workflow Execution Requirements ontology
•  http://purl.org/net/wicus-reqs
13A Semantic-Based Approach to Attain Reproducibility...
WICUS ontology network
•  WICUS ontology network
•  http://purl.org/net/wicus
14A Semantic-Based Approach to Attain Reproducibility...
CLOUD
PRECIP
•  Reproducibility in Scientific Experiments
15A Semantic-Based Approach to Attain Reproducibility...
FORMER
EQUIPMENT
ANNOTATE REPRODUCE
SEMANTIC
ANNOTATIONS
EQUIVALENT
EXECUTION
ENVIRONMENT
•  “Its about capturing, preserving, reusing and
curating” [Goble 2012]
PRECIP
•  Pegasus Repeatable Experiments for the Clouds In
Python
•  Experiment management control API
•  Works with commercial and academic Clouds
•  Tag-based system
•  No need of pre-installed software on the VM image
•  Create VM, transfer files, run commands remotely
•  Linux
16A Semantic-Based Approach to Attain Reproducibility...
Pegasus
•  Pegasus WMS
•  Million-task workflows
•  Records
•  Data about execution
•  Intermediate results
•  Replica Catalog
•  DAX (Direct Acyclic Graph in XML)
•  Transformation Catalog
•  HTCondor for executing individual tasks
17A Semantic-Based Approach to Attain Reproducibility...
CLOUD
Reproducibility process
•  Reproducibility in Scientific Experiments
18A Semantic-Based Approach to Attain Reproducibility...
FORMER
EQUIPMENT
ANNOTATE REPRODUCE
SEMANTIC
ANNOTATIONS
EQUIVALENT
EXECUTION
ENVIRONMENT
•  “Its about capturing, preserving, reusing and
curating” [Goble 2012]
Reproducibility process
19A Semantic-Based Approach to Attain Reproducibility...
Pegasus
Transf.
Catalog
DAX
xml DAX annotator
TC annotator
WF
Annot
SW
Comp
Catalog
WF &
Config
Annot
Inf. Spec.
Algorithm
SVA
Catalog
Precip
Script
1
4
2
5
6
7
8
9
WMS
Annot
3
Reproducibility process
•  Infrastructure Specification Algorithm
•  Goal: obtain an specification defining what VMs need to be
created, what software components must be deployed and
their configuration.
20A Semantic-Based Approach to Attain Reproducibility...
Reproducibility process
•  Infrastructure Specification Algorithm
21A Semantic-Based Approach to Attain Reproducibility...
GET WF
REQUIREMENTS
GET
<REQ,STACKS>
GET
<REQ,D-GRAPH>
GET AVAILABLE
SVA
GET
<SVA,STACKS>
CALCULATE
REQ-SVA
COMPATIBILITY
GET MAX
COMPATIBLE
REQ-SVA
CLEAN REQ
D-GRAPH
Infrastructure Specification Algorithm
22A Semantic-Based Approach to Attain Reproducibility...
Infrastructure Specification Algorithm
23A Semantic-Based Approach to Attain Reproducibility...
S1
S2
S3
S4
S5
S6
Infrastructure Specification Algorithm
24A Semantic-Based Approach to Attain Reproducibility...
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S15
S13
S14
S15
Infrastructure Specification Algorithm
25A Semantic-Based Approach to Attain Reproducibility...
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S15
S13
S14
S15
Infrastructure Specification Algorithm
26A Semantic-Based Approach to Attain Reproducibility...
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S15
S13
S14
S13
S15
S15
S14
S14
Infrastructure Specification Algorithm
27A Semantic-Based Approach to Attain Reproducibility...
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S15
S13
S14
S13
S15
S15
S14
S14
Infrastructure Specification Algorithm
28A Semantic-Based Approach to Attain Reproducibility...
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S15
S13
S14
S13
S15
S15
S14
S14
Infrastructure Specification Algorithm
29A Semantic-Based Approach to Attain Reproducibility...
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S15
S13
S14
S13
S15
S15
S14
S14
Infrastructure Specification Algorithm
30A Semantic-Based Approach to Attain Reproducibility...
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S15
S13
S14
S13
S15S15
S14
S15
Infrastructure Specification Algorithm
31A Semantic-Based Approach to Attain Reproducibility...
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S15
S13
S14
S13
S15S15
S14
S15
Infrastructure Specification Algorithm
32A Semantic-Based Approach to Attain Reproducibility...
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S15
S13
S14
S13
S15S15
S14
S15
Infrastructure Specification Algorithm
33A Semantic-Based Approach to Attain Reproducibility...
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S15
S14
S15
Infrastructure Specification Algorithm
34A Semantic-Based Approach to Attain Reproducibility...
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S15
S14
S15
Infrastructure Specification Algorithm
35A Semantic-Based Approach to Attain Reproducibility...
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S15
S14
S15
Infrastructure Specification Algorithm
36A Semantic-Based Approach to Attain Reproducibility...
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S15
S14
S15
Use case
•  Montage Workflow
•  Astronomy workflow
•  Construct large image mosaics of the sky
•  Montage Software distribution
•  59 binaries
•  Target IaaS Cloud Providers
•  Amazon EC2
•  FutureGrid
37A Semantic-Based Approach to Attain Reproducibility...
RO available at http://pegasus.isi.edu/publications/reppar
Use case
•  Goal
38A Semantic-Based Approach to Attain Reproducibility...
AWS
MONTAGE
WORKFLOW
ENVIRONMENT
ANNOTATE REPRODUCE
SEMANTIC
ANNOTATIONS
EQUIVALENT
EXECUTION
ENVIRONMENT
FG
Use case
•  Annotations
39A Semantic-Based Approach to Attain Reproducibility...
Use case
•  Annotations: Workflow
40A Semantic-Based Approach to Attain Reproducibility...
Use case
•  Annotations: Software
41A Semantic-Based Approach to Attain Reproducibility...
Use case
•  Annotations: C. Resources
42A Semantic-Based Approach to Attain Reproducibility...
Use case
•  Results
•  2 PRECIP scripts (AWS and FG): creates VM, deploys and
configure software and executes the WF.
•  Successfully executed
•  Same results as the original WF
43A Semantic-Based Approach to Attain Reproducibility...
Conclusions
•  Semantic modelling approach to conserve computational
resources
•  PRECIP for reproducing the execution environment on
the Cloud
•  WICUS annotations + PRECIP scripting capabilities
•  Apply those ideas to Montage on AWS EC2 and FG
•  Assume that the binaries are available
44A Semantic-Based Approach to Attain Reproducibility...
Future Work
•  Apply to other workflows
•  Improve the annotation process
•  Extend WICUS ontology network (new release coming
soon)
•  Software variants
•  Low level libraries
•  Incompatibilities
•  User policies
45A Semantic-Based Approach to Attain Reproducibility...
Questions
46Conservation of Scientific Workflow Infrastructures by Using Semantics

Contenu connexe

Tendances

Scientific
Scientific Scientific
Scientific marpierc
 
The ECP Exascale Computing Project
The ECP Exascale Computing ProjectThe ECP Exascale Computing Project
The ECP Exascale Computing Projectinside-BigData.com
 
Building Reproducible Network Data Analysis / Visualization Workflows
Building Reproducible Network Data Analysis / Visualization WorkflowsBuilding Reproducible Network Data Analysis / Visualization Workflows
Building Reproducible Network Data Analysis / Visualization WorkflowsKeiichiro Ono
 
The Exascale Computing Project and the future of HPC
The Exascale Computing Project and the future of HPCThe Exascale Computing Project and the future of HPC
The Exascale Computing Project and the future of HPCinside-BigData.com
 
Ogce Workflow Suite
Ogce Workflow SuiteOgce Workflow Suite
Ogce Workflow Suitesmarru
 
Simulagora (Euroscipy2014 - Logilab)
Simulagora (Euroscipy2014 - Logilab)Simulagora (Euroscipy2014 - Logilab)
Simulagora (Euroscipy2014 - Logilab)Logilab
 
Network Visualization and Analysis with Cytoscape
Network Visualization and Analysis with CytoscapeNetwork Visualization and Analysis with Cytoscape
Network Visualization and Analysis with CytoscapeAlexander Pico
 
Exascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing WorldExascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing Worldinside-BigData.com
 
Available technologies: algorithm for flexible bandwidth reservations for dat...
Available technologies: algorithm for flexible bandwidth reservations for dat...Available technologies: algorithm for flexible bandwidth reservations for dat...
Available technologies: algorithm for flexible bandwidth reservations for dat...balmanme
 
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...Keiichiro Ono
 
Update on the Exascale Computing Project (ECP)
Update on the Exascale Computing Project (ECP)Update on the Exascale Computing Project (ECP)
Update on the Exascale Computing Project (ECP)inside-BigData.com
 
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesDiscovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesIan Foster
 
OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019OpenACC
 
Assessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformaticsAssessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformaticsPeter van Heusden
 
WEKA: A Useful Tool for Air Quality Forecasting
WEKA: A Useful Tool for Air Quality ForecastingWEKA: A Useful Tool for Air Quality Forecasting
WEKA: A Useful Tool for Air Quality Forecastingbutest
 
Avogadro: Open Source Libraries and Application for Computational Chemistry
Avogadro: Open Source Libraries and Application for Computational ChemistryAvogadro: Open Source Libraries and Application for Computational Chemistry
Avogadro: Open Source Libraries and Application for Computational ChemistryMarcus Hanwell
 
Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Sanjay Padhi, Ph.D
 
EOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryEOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryCarole Goble
 
Open Chemistry: Realizing Open Data, Open Standards, and Open Source
Open Chemistry: Realizing Open Data, Open Standards, and Open SourceOpen Chemistry: Realizing Open Data, Open Standards, and Open Source
Open Chemistry: Realizing Open Data, Open Standards, and Open SourceMarcus Hanwell
 

Tendances (20)

Scientific
Scientific Scientific
Scientific
 
The ECP Exascale Computing Project
The ECP Exascale Computing ProjectThe ECP Exascale Computing Project
The ECP Exascale Computing Project
 
Building Reproducible Network Data Analysis / Visualization Workflows
Building Reproducible Network Data Analysis / Visualization WorkflowsBuilding Reproducible Network Data Analysis / Visualization Workflows
Building Reproducible Network Data Analysis / Visualization Workflows
 
The Exascale Computing Project and the future of HPC
The Exascale Computing Project and the future of HPCThe Exascale Computing Project and the future of HPC
The Exascale Computing Project and the future of HPC
 
Automating biostatistics workflows using R-based webtools
Automating biostatistics workflows using R-based webtoolsAutomating biostatistics workflows using R-based webtools
Automating biostatistics workflows using R-based webtools
 
Ogce Workflow Suite
Ogce Workflow SuiteOgce Workflow Suite
Ogce Workflow Suite
 
Simulagora (Euroscipy2014 - Logilab)
Simulagora (Euroscipy2014 - Logilab)Simulagora (Euroscipy2014 - Logilab)
Simulagora (Euroscipy2014 - Logilab)
 
Network Visualization and Analysis with Cytoscape
Network Visualization and Analysis with CytoscapeNetwork Visualization and Analysis with Cytoscape
Network Visualization and Analysis with Cytoscape
 
Exascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing WorldExascale Computing Project - Driving a HUGE Change in a Changing World
Exascale Computing Project - Driving a HUGE Change in a Changing World
 
Available technologies: algorithm for flexible bandwidth reservations for dat...
Available technologies: algorithm for flexible bandwidth reservations for dat...Available technologies: algorithm for flexible bandwidth reservations for dat...
Available technologies: algorithm for flexible bandwidth reservations for dat...
 
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
 
Update on the Exascale Computing Project (ECP)
Update on the Exascale Computing Project (ECP)Update on the Exascale Computing Project (ECP)
Update on the Exascale Computing Project (ECP)
 
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesDiscovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
 
OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019
 
Assessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformaticsAssessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformatics
 
WEKA: A Useful Tool for Air Quality Forecasting
WEKA: A Useful Tool for Air Quality ForecastingWEKA: A Useful Tool for Air Quality Forecasting
WEKA: A Useful Tool for Air Quality Forecasting
 
Avogadro: Open Source Libraries and Application for Computational Chemistry
Avogadro: Open Source Libraries and Application for Computational ChemistryAvogadro: Open Source Libraries and Application for Computational Chemistry
Avogadro: Open Source Libraries and Application for Computational Chemistry
 
Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021
 
EOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryEOSC-Life Workflow Collaboratory
EOSC-Life Workflow Collaboratory
 
Open Chemistry: Realizing Open Data, Open Standards, and Open Source
Open Chemistry: Realizing Open Data, Open Standards, and Open SourceOpen Chemistry: Realizing Open Data, Open Standards, and Open Source
Open Chemistry: Realizing Open Data, Open Standards, and Open Source
 

Similaire à A Semantic-Based Approach to Attain Reproducibility of Computational Environments in Scientific Workflows: A Case Study

Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Globus
 
Scalable Preservation Workflows
Scalable Preservation WorkflowsScalable Preservation Workflows
Scalable Preservation WorkflowsSCAPE Project
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqEnis Afgan
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
Open Source Lambda Architecture for deep learning
Open Source Lambda Architecture for deep learningOpen Source Lambda Architecture for deep learning
Open Source Lambda Architecture for deep learningPatrick Nicolas
 
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
Sa introduction to big data pipelining with cassandra &amp; spark   west mins...Sa introduction to big data pipelining with cassandra &amp; spark   west mins...
Sa introduction to big data pipelining with cassandra &amp; spark west mins...Simon Ambridge
 
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...QuantUniversity
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataPaco Nathan
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonSpark Summit
 
Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay Rai
Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay RaiConquering Hadoop and Apache Spark with Operational Intelligence with Akshay Rai
Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay RaiDatabricks
 
Grid is Dead ? Nimrod on the Cloud
Grid is Dead ? Nimrod on the CloudGrid is Dead ? Nimrod on the Cloud
Grid is Dead ? Nimrod on the CloudAdianto Wibisono
 
Big data analysing genomics and the bdg project
Big data   analysing genomics and the bdg projectBig data   analysing genomics and the bdg project
Big data analysing genomics and the bdg projectsree navya
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkDatabricks
 
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014NoSQLmatters
 
Pathways for EOSC-hub and MaX collaboration
Pathways for EOSC-hub and MaX collaborationPathways for EOSC-hub and MaX collaboration
Pathways for EOSC-hub and MaX collaborationEOSC-hub project
 

Similaire à A Semantic-Based Approach to Attain Reproducibility of Computational Environments in Scientific Workflows: A Case Study (20)

Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
 
Scalable Preservation Workflows
Scalable Preservation WorkflowsScalable Preservation Workflows
Scalable Preservation Workflows
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Open Source Lambda Architecture for deep learning
Open Source Lambda Architecture for deep learningOpen Source Lambda Architecture for deep learning
Open Source Lambda Architecture for deep learning
 
Beyond the Science Gateway
Beyond the Science GatewayBeyond the Science Gateway
Beyond the Science Gateway
 
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
Sa introduction to big data pipelining with cassandra &amp; spark   west mins...Sa introduction to big data pipelining with cassandra &amp; spark   west mins...
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
 
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big Data
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
 
Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay Rai
Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay RaiConquering Hadoop and Apache Spark with Operational Intelligence with Akshay Rai
Conquering Hadoop and Apache Spark with Operational Intelligence with Akshay Rai
 
Grid is Dead ? Nimrod on the Cloud
Grid is Dead ? Nimrod on the CloudGrid is Dead ? Nimrod on the Cloud
Grid is Dead ? Nimrod on the Cloud
 
Big data analysing genomics and the bdg project
Big data   analysing genomics and the bdg projectBig data   analysing genomics and the bdg project
Big data analysing genomics and the bdg project
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache Spark
 
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
 
Pathways for EOSC-hub and MaX collaboration
Pathways for EOSC-hub and MaX collaborationPathways for EOSC-hub and MaX collaboration
Pathways for EOSC-hub and MaX collaboration
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 
Master thesis
Master thesisMaster thesis
Master thesis
 

Dernier

SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...KokoStevan
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 

Dernier (20)

SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 

A Semantic-Based Approach to Attain Reproducibility of Computational Environments in Scientific Workflows: A Case Study

  • 1. A Semantic-Based Approach to Attain Reproducibility of Computational Environments in Scientific Workflows: A Case Study Idafen Santana-Perez1, Rafael Ferreira da Silva2, Mats Rynge2 Ewa Deelman2, María S. Pérez-Hernández1, Oscar Corcho1 1Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain 2Univ. of Southern California, Information Sciences Institute, Marina Del Rey, CA, USA REPPAR'14 1st International Workshop on Reproducibility in Parallel Computing. Porto, August 2014
  • 2. Index •  Introduction •  Reproducibility tools •  WICUS •  Pegasus & PRECIP •  Reproducibility process •  Annotation •  Infrastructure Specification Algorithm •  Use case •  Conclusion & Future Work 2A Semantic-Based Approach to Attain Reproducibility...
  • 3. Introduction •  Experiments in empirical science •  Primary component of the scientific method •  Main method for validating a hypothesis •  Repeatable procedure •  Scientific publications •  Announce a result •  Convince readers that the result is correct •  Computational science •  In silico science •  Computational scientific workflow: “a precise, executable description of a scientific procedure” [De Roure, 2011] •  “Reproducibility in principle underpins the scientific method” [Goble,2012] •  “Its about capturing, preserving, reusing and curating” [Goble 2012] 3A Semantic-Based Approach to Attain Reproducibility...
  • 4. Introduction •  Reproducibility in Scientific Experiments 4A Semantic-Based Approach to Attain Reproducibility... INPUT DATA SCIENTIFIC PROCEDURE EQUIPMENT INVIVO/VITROINSILICO
  • 5. Introduction •  Reproducibility in Scientific Experiments 5A Semantic-Based Approach to Attain Reproducibility... INPUT DATA SCIENTIFIC PROCEDURE EQUIPMENT INVIVO/VITROINSILICO
  • 6. CLOUD Introduction •  Reproducibility in Scientific Experiments 6A Semantic-Based Approach to Attain Reproducibility... FORMER EQUIPMENT ANNOTATE REPRODUCE SEMANTIC ANNOTATIONS EQUIVALENT EXECUTION ENVIRONMENT •  “Its about capturing, preserving, reusing and curating” [Goble 2012]
  • 7. CLOUD Semantics •  Reproducibility in Scientific Experiments 7A Semantic-Based Approach to Attain Reproducibility... FORMER EQUIPMENT ANNOTATE REPRODUCE SEMANTIC ANNOTATIONS EQUIVALENT EXECUTION ENVIRONMENT •  “Its about capturing, preserving, reusing and curating” [Goble 2012]
  • 8. Semantics •  Vocabularies for documenting the main resources involved on the execution of a WF. •  Software •  Hardware •  Computational resources •  Workflow •  Increasing the understanding of the underlying components •  Making this knowledge explicit •  Standard technology: RDF & OWL •  Easy to extend and integrate 8A Semantic-Based Approach to Attain Reproducibility...
  • 9. Semantics •  WICUS ontology network •  Workflow Infrastructure Conservation Using Semantics •  http://purl.org/net/wicus •  5 ontologies •  WICUS Software Stack ontology •  WICUS Hardware Specs ontology •  WICUS Scientific Virtual Appliance ontology •  WICUS Workflow Execution Requirements ontology •  WICUS Ontology: links the previous ontologies 9A Semantic-Based Approach to Attain Reproducibility...
  • 10. WICUS ontology network •  WICUS Software Stack ontology •  http://purl.org/net/wicus-stack 10A Semantic-Based Approach to Attain Reproducibility...
  • 11. WICUS ontology network •  WICUS Hardware Specs ontology •  http://purl.org/net/wicus-hwspecs 11A Semantic-Based Approach to Attain Reproducibility...
  • 12. WICUS ontology network 12A Semantic-Based Approach to Attain Reproducibility... •  WICUS Scientific Virtual Appliance ontology •  http://purl.org/net/wicus-sva
  • 13. WICUS ontology network •  WICUS Workflow Execution Requirements ontology •  http://purl.org/net/wicus-reqs 13A Semantic-Based Approach to Attain Reproducibility...
  • 14. WICUS ontology network •  WICUS ontology network •  http://purl.org/net/wicus 14A Semantic-Based Approach to Attain Reproducibility...
  • 15. CLOUD PRECIP •  Reproducibility in Scientific Experiments 15A Semantic-Based Approach to Attain Reproducibility... FORMER EQUIPMENT ANNOTATE REPRODUCE SEMANTIC ANNOTATIONS EQUIVALENT EXECUTION ENVIRONMENT •  “Its about capturing, preserving, reusing and curating” [Goble 2012]
  • 16. PRECIP •  Pegasus Repeatable Experiments for the Clouds In Python •  Experiment management control API •  Works with commercial and academic Clouds •  Tag-based system •  No need of pre-installed software on the VM image •  Create VM, transfer files, run commands remotely •  Linux 16A Semantic-Based Approach to Attain Reproducibility...
  • 17. Pegasus •  Pegasus WMS •  Million-task workflows •  Records •  Data about execution •  Intermediate results •  Replica Catalog •  DAX (Direct Acyclic Graph in XML) •  Transformation Catalog •  HTCondor for executing individual tasks 17A Semantic-Based Approach to Attain Reproducibility...
  • 18. CLOUD Reproducibility process •  Reproducibility in Scientific Experiments 18A Semantic-Based Approach to Attain Reproducibility... FORMER EQUIPMENT ANNOTATE REPRODUCE SEMANTIC ANNOTATIONS EQUIVALENT EXECUTION ENVIRONMENT •  “Its about capturing, preserving, reusing and curating” [Goble 2012]
  • 19. Reproducibility process 19A Semantic-Based Approach to Attain Reproducibility... Pegasus Transf. Catalog DAX xml DAX annotator TC annotator WF Annot SW Comp Catalog WF & Config Annot Inf. Spec. Algorithm SVA Catalog Precip Script 1 4 2 5 6 7 8 9 WMS Annot 3
  • 20. Reproducibility process •  Infrastructure Specification Algorithm •  Goal: obtain an specification defining what VMs need to be created, what software components must be deployed and their configuration. 20A Semantic-Based Approach to Attain Reproducibility...
  • 21. Reproducibility process •  Infrastructure Specification Algorithm 21A Semantic-Based Approach to Attain Reproducibility... GET WF REQUIREMENTS GET <REQ,STACKS> GET <REQ,D-GRAPH> GET AVAILABLE SVA GET <SVA,STACKS> CALCULATE REQ-SVA COMPATIBILITY GET MAX COMPATIBLE REQ-SVA CLEAN REQ D-GRAPH
  • 22. Infrastructure Specification Algorithm 22A Semantic-Based Approach to Attain Reproducibility...
  • 23. Infrastructure Specification Algorithm 23A Semantic-Based Approach to Attain Reproducibility... S1 S2 S3 S4 S5 S6
  • 24. Infrastructure Specification Algorithm 24A Semantic-Based Approach to Attain Reproducibility... S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S15 S13 S14 S15
  • 25. Infrastructure Specification Algorithm 25A Semantic-Based Approach to Attain Reproducibility... S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S15 S13 S14 S15
  • 26. Infrastructure Specification Algorithm 26A Semantic-Based Approach to Attain Reproducibility... S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S15 S13 S14 S13 S15 S15 S14 S14
  • 27. Infrastructure Specification Algorithm 27A Semantic-Based Approach to Attain Reproducibility... S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S15 S13 S14 S13 S15 S15 S14 S14
  • 28. Infrastructure Specification Algorithm 28A Semantic-Based Approach to Attain Reproducibility... S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S15 S13 S14 S13 S15 S15 S14 S14
  • 29. Infrastructure Specification Algorithm 29A Semantic-Based Approach to Attain Reproducibility... S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S15 S13 S14 S13 S15 S15 S14 S14
  • 30. Infrastructure Specification Algorithm 30A Semantic-Based Approach to Attain Reproducibility... S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S15 S13 S14 S13 S15S15 S14 S15
  • 31. Infrastructure Specification Algorithm 31A Semantic-Based Approach to Attain Reproducibility... S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S15 S13 S14 S13 S15S15 S14 S15
  • 32. Infrastructure Specification Algorithm 32A Semantic-Based Approach to Attain Reproducibility... S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S15 S13 S14 S13 S15S15 S14 S15
  • 33. Infrastructure Specification Algorithm 33A Semantic-Based Approach to Attain Reproducibility... S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S15 S14 S15
  • 34. Infrastructure Specification Algorithm 34A Semantic-Based Approach to Attain Reproducibility... S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S15 S14 S15
  • 35. Infrastructure Specification Algorithm 35A Semantic-Based Approach to Attain Reproducibility... S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S15 S14 S15
  • 36. Infrastructure Specification Algorithm 36A Semantic-Based Approach to Attain Reproducibility... S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S15 S14 S15
  • 37. Use case •  Montage Workflow •  Astronomy workflow •  Construct large image mosaics of the sky •  Montage Software distribution •  59 binaries •  Target IaaS Cloud Providers •  Amazon EC2 •  FutureGrid 37A Semantic-Based Approach to Attain Reproducibility... RO available at http://pegasus.isi.edu/publications/reppar
  • 38. Use case •  Goal 38A Semantic-Based Approach to Attain Reproducibility... AWS MONTAGE WORKFLOW ENVIRONMENT ANNOTATE REPRODUCE SEMANTIC ANNOTATIONS EQUIVALENT EXECUTION ENVIRONMENT FG
  • 39. Use case •  Annotations 39A Semantic-Based Approach to Attain Reproducibility...
  • 40. Use case •  Annotations: Workflow 40A Semantic-Based Approach to Attain Reproducibility...
  • 41. Use case •  Annotations: Software 41A Semantic-Based Approach to Attain Reproducibility...
  • 42. Use case •  Annotations: C. Resources 42A Semantic-Based Approach to Attain Reproducibility...
  • 43. Use case •  Results •  2 PRECIP scripts (AWS and FG): creates VM, deploys and configure software and executes the WF. •  Successfully executed •  Same results as the original WF 43A Semantic-Based Approach to Attain Reproducibility...
  • 44. Conclusions •  Semantic modelling approach to conserve computational resources •  PRECIP for reproducing the execution environment on the Cloud •  WICUS annotations + PRECIP scripting capabilities •  Apply those ideas to Montage on AWS EC2 and FG •  Assume that the binaries are available 44A Semantic-Based Approach to Attain Reproducibility...
  • 45. Future Work •  Apply to other workflows •  Improve the annotation process •  Extend WICUS ontology network (new release coming soon) •  Software variants •  Low level libraries •  Incompatibilities •  User policies 45A Semantic-Based Approach to Attain Reproducibility...
  • 46. Questions 46Conservation of Scientific Workflow Infrastructures by Using Semantics