The role of annotation in
reproducibility
ESWC2014 Empirical workshop
26/05/2014
Contributors: my PhD students Olga Giraldo, Daniel
Garijo, and Idafen Santana, and the Wf4Ever team
Oscar Corcho
ocorcho@fi.upm.es
@ocorcho
https://www.slideshare.com/ocorcho
Setting the context of this presentation
Our main assumption
“We are not so good at describing our
experiments, and this has a negative impact
on reproducibility (and understandability, and
conservation, and reconstruction)”
• Let’s see if this happens in different areas of scientific
research
• In vitro experiments in Plant Biology
• In silico experiments in several domains
• The challenge
• Let’s use annotation as a means to increase reproducibility
• Note: see the last slide on terminology
Ingredients for reproducibility
The role of laboratory protocols in Life Sciences
Laboratory protocols support the scientific results
http://mibbi.sourceforge.net/about.shtml
Laboratory Protocols
• Written in natural language
• Generally, presented in a “recipe” style
• Description of a sequence of operations that include inputs and outputs
• Step-by-step descriptions of procedures
• A protocol is a type of workflow
• They must be described in sufficient and unambiguous detail to enable another agent (human or machine) to replicate the original experiment.
• Specific journals: Biotechniques, CSH protocols, Current protocols, GMR, Jove, Protocol exchange, Plant methods, Plos One, Springer protocols
Detailed instructions in journals’ guides for authors
And other useful elements, including ontologies
• It maintains checklists that promote how to report an experiment.
• It models the design of an investigation, including protocols, instrumentation, materials and data generated.
• EXPO: aims to formalize knowledge about the organization, execution and analysis of scientific experiments.
• EXACT: provides a model for the description of experiment actions.
Minimal information models, checklists, and even ontologies
However…
• Ambiguity is the norm
• Let’s make an analysis
on protocols written for
the plant biology
community
Protocol:
• Incubate the centrifuge tubes in a water bath.
• Incubate the samples for 5 min with gentle shaking.
• Rinse DNA briefly in 1-2 ml of wash.
• Incubate at -20C overnight.
Analysis of Laboratory Protocols
Repository Number of Protocols
Biotechniques 8
CSH protocols 11
Current protocols 25
GMR 4
Jove 21
Protocol exchange 12
Plant methods 10
Plos One 3
Springer protocols 5
Total 99
Minimal Information to Report a Laboratory Protocol
Our model    Occurrence in other models
TITLE 100%
AUTHOR 100%
INTRODUCTION
Purpose 89%
Provenance of the protocol 89%
Applications of the protocol 89%
Comparison with other protocols 89%
Limitations 89%
MATERIALS
Sample 100%
· strain or line genotype
· Developmental stage
· Organism part (tissue)
Laboratory consumables/supplies
· Laboratory consumable name 22%
· Manufacturer name 11%
· Laboratory consumable ID (catalog number) 11%
Buffer recipes
· Buffer name 67%
· Chemical compound name 67%
· Initial concentration of chemical compound 67%
· Final concentration or amount of chemical compound 56%
· Storage conditions 56%
· Cautions 56%
· Hints 67%
Reagent
· Reagent name 100%
· Reagent vendor or manufacturer 100%
· Reagent ID (catalog number) 100%
Kit
· Kit name 100%
· Kit vendor or manufacturer 100%
· Kit ID (catalog number) 56%
Primer
· Primer name 67%
· Primer sequence 89%
· Primer vendor or manufacturer 33%
Equipment
· Equipment name 67%
· Equipment vendor or Manufacturer 67%
· Equipment ID (catalog number) 67%
Software
· Software name 67%
· Software version 67%
METHODS/PROCEDURE
Protocol 100%
· Cautions 56%
· Critical steps 56%
· Pause point 33%
· Hints 22%
· Troubleshooting 44%
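A checklist like the one above can be applied mechanically to a protocol record. The following is a minimal sketch, assuming a protocol is stored as a plain dictionary; the field names and the selection of required elements are hypothetical, chosen here from rows reported at 100%, and are not part of the model itself.

```python
# Hypothetical required fields, loosely based on checklist rows reported at 100%.
REQUIRED_FIELDS = [
    "title",
    "author",
    "sample",
    "reagent_name",
    "reagent_vendor",
    "reagent_id",
    "procedure",
]


def missing_fields(protocol: dict) -> list:
    """Return the required fields that are absent or empty in a protocol record."""
    return [name for name in REQUIRED_FIELDS if not protocol.get(name)]


# Illustrative record (invented values, not taken from any analysed protocol).
example = {
    "title": "DNA extraction from leaves",
    "author": "A. Researcher",
    "sample": "Arabidopsis thaliana rosette leaves",
    "reagent_name": "96% ethanol",
    "procedure": ["Incubate at -20C overnight."],
}

print(missing_fields(example))
```

Here the report would flag the reagent vendor and catalog number, which is the kind of gap the analysis found to be common.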
How to Formalize the Protocols?
Protocol:
• Incubate the centrifuge tubes at 65°C in a water bath for 10 min.
  Action: incubate.
  Object: centrifuge tubes, water bath.
  Unit of measure: 65°C, 10 min.
• Rinse DNA briefly in 1-2 ml of wash.
  “Briefly” may indicate different lengths of time: 2 seconds? 5-10 seconds?...
• Incubate at -20°C overnight.
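The action / object / unit-of-measure breakdown can be approximated with simple pattern matching. This is only a sketch under strong assumptions: the action vocabulary and the regular expressions are hypothetical and tiny, whereas a real annotator would rely on an ontology rather than on regexes.

```python
import re

# Hypothetical, deliberately small action vocabulary.
ACTIONS = ("incubate", "rinse", "centrifuge")
# A number followed by an optional degree sign and "C", e.g. "65°C" or "-20C".
TEMPERATURE = re.compile(r"(-?\d+(?:\.\d+)?)\s*°?\s*C\b")
# A number followed by a time unit, e.g. "10 min".
DURATION = re.compile(r"(\d+(?:\.\d+)?)\s*(s|sec|min|h)\b")


def annotate(step: str) -> dict:
    """Extract action, temperature and duration from one protocol step."""
    words = step.lower().split()
    action = words[0] if words and words[0] in ACTIONS else None
    temp = TEMPERATURE.search(step)
    dur = DURATION.search(step)
    return {
        "action": action,
        "temperature_celsius": float(temp.group(1)) if temp else None,
        "duration": (float(dur.group(1)), dur.group(2)) if dur else None,
    }


print(annotate("Incubate the centrifuge tubes at 65°C in a water bath for 10 min."))
print(annotate("Rinse DNA briefly in 1-2 ml of wash."))
```

For the second step the duration comes back empty: "briefly" carries no machine-readable value, which is exactly the ambiguity that explicit annotation is meant to remove.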
SMARTProtocols ontology
• http://vocab.linkeddata.es/SMARTProtocols/
Currently working on protocol annotation
Source: Biotechniques
Meta-information about content    Content
Plant material                    Arabidopsis thaliana (rosette leaves, flowers, siliques),… and Larix decidua (young needles)
Instrument name                   Leitz DMRB microscope
Manufacturer                      Leica Micro-systems
Buffer recipe                     50 mM EDTA, 1.4% SDS
Reagent name                      96% ethanol ~ absolute ethanol
Laboratory consumable name        2-mL tube, zeolite beads
From the wet lab to our computers
[Figure labels: Lab book, Digital Log, Laboratory Protocol (recipe), Workflow, Experiment]
Scientific Workflows
“Template defining the set of tasks needed
to carry out a computational experiment”
[1]
•Inputs
•Steps
•Intermediate results
•Outputs
•Data driven, usually represented as
Directed Acyclic Graphs (DAGs)
[1] Ewa Deelman, Dennis Gannon, Matthew Shields, Ian Taylor, Workflows and e-science: an overview of
workflow system features and capabilities, Future Generation Computer Systems 25 (5) (2009) 528–540.
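Because a workflow is usually a DAG, a valid execution order can be derived from the dependencies alone. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the step names are hypothetical and do not come from any workflow system named in this deck.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies: dict) -> list:
    """Return one valid execution order for a workflow.

    `dependencies` maps each step to the set of steps it needs as input.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical four-step workflow: inputs -> steps -> intermediate results -> outputs.
workflow = {
    "retrieve_data": set(),
    "clean_data": {"retrieve_data"},
    "analyse": {"clean_data"},
    "plot": {"analyse"},
}

print(execution_order(workflow))

# A cyclic dependency is not a DAG and is rejected.
try:
    execution_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("not a DAG: the steps depend on each other")
```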
Plenty of workflow tools and platforms: Taverna, Wings, LONI Pipeline
What do I want from these workflows and repositories?
• As a designer: Discovery
•Workflows with similar functionality fragments/methods
•Design based on previous templates.
• As user/reuser/reviewer: Understandability, Exploration
•Search workflows by functionality
•Commonalities between execution runs
•Component categorization
•Reproducibility
Working on different aspects of workflow preservation
•Workflow representation
•Plan/template representation
•Provenance trace representation
•Link between templates and traces
•Creation of abstractions/motifs in scientific workflows
•Abstraction catalog
•Find how different workflows are related
•Understandability and reuse of scientific workflows
•Relation between the workflows involved in the same experiment (Research Objects)
CH1: Can we export an abstract template of the method being represented?
CH2: How do we interoperate with other workflow results?
CH3: How do we access the workflow results?
CH4: How do we link an abstract method with several implementations?
CH5: How can we detect what are the typical operations in scientific workflows?
CH6: How can we detect them automatically?
CH7: Which workflow parts are related to other workflows?
CH8: How do workflows depend on the other parts of the experiments?
Overview
• Empirical analysis of 260 workflow templates from Taverna, Wings, Galaxy and Vistrails
• Catalog of recurring patterns: scientific
workflow motifs.
• Data Oriented Motifs
• Workflow Oriented Motifs
•Understandability and reuse
http://sensefinancial.com/wp-content/uploads/2012/02/contribution.jpg
Common motifs in scientific workflows: An empirical analysis. Garijo, D.; Alper, P.; Belhajjame, K.; Corcho, O.; Gil, Y.; and Goble, C. Future Generation Computer Systems, 2013.
Approach
•Reverse-engineer the set of current practices in workflow
development through an analysis of empirical evidence
•Identify workflow abstractions that would facilitate
understandability and therefore effective re-use
Motif Catalog
Data-Oriented Motifs (What?)
  Data Retrieval
  Data Preparation
    Format Transformation
    Input Augmentation and Output Splitting
    Data Organisation
  Data Analysis
  Data Curation/Cleaning
  Data Moving
  Data Visualisation
Workflow-Oriented Motifs (How?)
  Intra-Workflow Motifs
    Stateful (Asynchronous) Invocations
    Stateless (Synchronous) Invocations
    Internal Macros
    Human Interactions
  Inter-Workflow Motifs
    Atomic Workflows
    Composite Workflows
    Workflow Overloading
Ontology PURL: http://purl.org/net/wf-motifs
Macro abstraction detection
Problem statement:
Given a repository of workflow templates (either abstract or specific) or workflow execution traces, what are the workflow fragments I can deduce from it?
Useful for:
•Systems like Taverna and Wings (many templates, little annotation to relate them):
  •Finding relationships between workflows and sub-workflows.
  •Most used fragments, most executed, etc.
•Systems like GenePattern, LONI Pipeline and Galaxy (many runs, nearly no templates published):
  •Proposing new templates with the popular fragments.
Common workflow fragment detection
[Holder et al 1994]: Substructure Discovery in the SUBDUE System L. B. Holder, D. J. Cook, and S.
Djoko. AAAI Workshop on Knowledge Discovery, pages 169-180, 1994.
•Given a collection of workflows, which are the most common fragments?
•Common sub-graphs among the collection
•Sub-graph isomorphism (NP-complete)
•We use subgraph mining algorithms
•Graph Grammar learning
•The rules of the grammar are the workflow fragments
•Graph based hierarchical clustering
•Each cluster corresponds to a workflow fragment
•Iterative algorithm with two measures for compressing the graph:
•Minimum Description Length (MDL)
•Size
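To make the idea of "most common fragments" concrete, the sketch below counts the smallest possible fragment, a single labelled edge between two steps, across a collection of workflows. This is a deliberate simplification and not the SUBDUE algorithm: it involves no graph grammar, no hierarchical clustering and no MDL-based compression. The workflows and step labels are hypothetical.

```python
from collections import Counter


def common_fragments(workflows, min_support=2):
    """Return two-step fragments (edges) found in at least `min_support` workflows.

    Each workflow is a list of (source_step, target_step) edges.
    """
    support = Counter()
    for edges in workflows:
        # Count a fragment once per workflow, however often it occurs inside it.
        support.update(set(edges))
    return {frag: n for frag, n in support.items() if n >= min_support}


# Hypothetical collection of three small workflows.
workflows = [
    [("retrieve", "clean"), ("clean", "analyse")],
    [("retrieve", "clean"), ("clean", "visualise")],
    [("retrieve", "transform"), ("transform", "analyse")],
]

print(common_fragments(workflows))
```

Larger fragments require real subgraph mining, since matching them involves sub-graph isomorphism, which is the reason for the approach listed above.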
Exporting the fragment results: Wf-FD model
http://purl.org/net/wf-fd
Preserving the infrastructure
http://vocab.linkeddata.es/wicus/
What is a Research Object?
•Aggregation of resources that bundles together the
contents of a research work:
•Data
•Experiments
•Examples
•Bibliography
•Annotations
•Provenance
•ROs
•Etc.
http://www.researchobject.org/
Workflow-Centric Research Objects: First Class Citizens in Scholarly Discourse. Belhajjame, K.; Corcho, O.; Garijo, D.; Zhao, J.; Missier, P.; Newman, D.; Palma, R.; Bechhofer, S.; García, E.; Gómez-Pérez, J. M.; Klyne, G.; Page, K.; Roos, M.; Ruiz, J. E.; Soiland-Reyes, S.; Verdes-Montenegro, L.; De Roure, D.; and Goble, C. In Proceedings of the Second International Conference on the Future of Scholarly Communication and Scientific Publishing (Sepublica 2012), pages 1-12, Hersonissos, 2012.
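The aggregation idea can be sketched as a small data structure. This is an illustration only: the class names, fields and URIs below are hypothetical and do not reproduce the Research Object model or its vocabularies.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    uri: str
    kind: str  # e.g. "data", "workflow", "annotation", "provenance"


@dataclass
class ResearchObject:
    title: str
    resources: list = field(default_factory=list)

    def aggregate(self, uri: str, kind: str) -> None:
        """Add a resource to the bundle."""
        self.resources.append(Resource(uri, kind))

    def by_kind(self, kind: str) -> list:
        """List the URIs of all aggregated resources of one kind."""
        return [r.uri for r in self.resources if r.kind == kind]


# Hypothetical bundle with placeholder URIs.
ro = ResearchObject("Example experiment")
ro.aggregate("http://example.org/input.csv", "data")
ro.aggregate("http://example.org/workflow.t2flow", "workflow")
ro.aggregate("http://example.org/run-1.prov", "provenance")

print(ro.by_kind("data"))
```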
ROHub and rohub.linkeddata.es
http://www.rohub.org/rodl/ http://rohub.linkeddata.es/
Workflow (and RO) Preservation Checklists
Acknowledgements
[Figure: acknowledgement graph. Nodes: :oscarCorcho, :olgaGiraldo, :danielGarijo, :idafenSantana, :yolandGil, :khalidBelhajjame, :varunRatnakar, :caroleGoble, :pinarAlper. Edge labels: :supervises, :collaboratesWith. Other labels: OEG, Laboratory Protocols, Wf Infrastructure.]
A final note on terminology
Source: Idafen Santana; Inspired by [Goble, 2012]

Contenu connexe

Tendances

A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo...
A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo...A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo...
A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo...
María Poveda Villalón
 
NLP2RDF Wortschatz and Linguistic LOD draft
NLP2RDF Wortschatz and Linguistic LOD draftNLP2RDF Wortschatz and Linguistic LOD draft
NLP2RDF Wortschatz and Linguistic LOD draft
Sebastian Hellmann
 

Tendances (19)

Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?
 
FAIRy Stories
FAIRy StoriesFAIRy Stories
FAIRy Stories
 
Building OBO Foundry ontology using semantic web tools
Building OBO Foundry ontology using semantic web toolsBuilding OBO Foundry ontology using semantic web tools
Building OBO Foundry ontology using semantic web tools
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
 
Aspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceAspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth Science
 
OEG-Tools for supporting Ontology Engineering
OEG-Tools for supporting Ontology EngineeringOEG-Tools for supporting Ontology Engineering
OEG-Tools for supporting Ontology Engineering
 
ROHub
ROHubROHub
ROHub
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
 
FAIRer Research
FAIRer ResearchFAIRer Research
FAIRer Research
 
A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo...
A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo...A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo...
A Reuse-based Lightweight Method for Developing Linked Data Ontologies and Vo...
 
FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
 
SMART Protocols in LISC-2014
SMART Protocols in LISC-2014 SMART Protocols in LISC-2014
SMART Protocols in LISC-2014
 
Reproducibility of model-based results: standards, infrastructure, and recogn...
Reproducibility of model-based results: standards, infrastructure, and recogn...Reproducibility of model-based results: standards, infrastructure, and recogn...
Reproducibility of model-based results: standards, infrastructure, and recogn...
 
NLP2RDF Wortschatz and Linguistic LOD draft
NLP2RDF Wortschatz and Linguistic LOD draftNLP2RDF Wortschatz and Linguistic LOD draft
NLP2RDF Wortschatz and Linguistic LOD draft
 
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)
 
4V - WP3 Progress Report (TIN2013-46238)
4V - WP3 Progress Report (TIN2013-46238)4V - WP3 Progress Report (TIN2013-46238)
4V - WP3 Progress Report (TIN2013-46238)
 

En vedette

Session 48 - Principles of Semantic metadata management
Session 48 - Principles of Semantic metadata management Session 48 - Principles of Semantic metadata management
Session 48 - Principles of Semantic metadata management
ISSGC Summer School
 
DynaLearn: Problem-based learning supported by semantic techniques
DynaLearn: Problem-based learning supported by semantic techniquesDynaLearn: Problem-based learning supported by semantic techniques
DynaLearn: Problem-based learning supported by semantic techniques
Oscar Corcho
 
Salesforce.com Prezo
Salesforce.com PrezoSalesforce.com Prezo
Salesforce.com Prezo
minihane88
 
Semantic Techniques for Enabling Knowledge Reuse in Conceptual Modelling
Semantic Techniques for Enabling Knowledge Reuse in Conceptual ModellingSemantic Techniques for Enabling Knowledge Reuse in Conceptual Modelling
Semantic Techniques for Enabling Knowledge Reuse in Conceptual Modelling
Oscar Corcho
 

En vedette (16)

Experiences in the Development of Geographical Ontologies and Linked Data
Experiences in the Development of Geographical Ontologies and Linked DataExperiences in the Development of Geographical Ontologies and Linked Data
Experiences in the Development of Geographical Ontologies and Linked Data
 
Efficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data StreamsEfficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data Streams
 
Session 48 - Principles of Semantic metadata management
Session 48 - Principles of Semantic metadata management Session 48 - Principles of Semantic metadata management
Session 48 - Principles of Semantic metadata management
 
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
 
Introduction to Linked Data
Introduction to Linked DataIntroduction to Linked Data
Introduction to Linked Data
 
Semantics and optimisation of the SPARQL1.1 federation extension
Semantics and optimisation of the SPARQL1.1 federation extensionSemantics and optimisation of the SPARQL1.1 federation extension
Semantics and optimisation of the SPARQL1.1 federation extension
 
DynaLearn: Problem-based learning supported by semantic techniques
DynaLearn: Problem-based learning supported by semantic techniquesDynaLearn: Problem-based learning supported by semantic techniques
DynaLearn: Problem-based learning supported by semantic techniques
 
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
 
Presentación de la red de excelencia de Open Data y Smart Cities
Presentación de la red de excelencia de Open Data y Smart CitiesPresentación de la red de excelencia de Open Data y Smart Cities
Presentación de la red de excelencia de Open Data y Smart Cities
 
Salesforce.com Prezo
Salesforce.com PrezoSalesforce.com Prezo
Salesforce.com Prezo
 
Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016
 
Semantic Techniques for Enabling Knowledge Reuse in Conceptual Modelling
Semantic Techniques for Enabling Knowledge Reuse in Conceptual ModellingSemantic Techniques for Enabling Knowledge Reuse in Conceptual Modelling
Semantic Techniques for Enabling Knowledge Reuse in Conceptual Modelling
 
Gentle Introduction to Semantic Enrichment
Gentle Introduction to Semantic EnrichmentGentle Introduction to Semantic Enrichment
Gentle Introduction to Semantic Enrichment
 
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
 
Introduction to the Semantic Web
Introduction to the Semantic WebIntroduction to the Semantic Web
Introduction to the Semantic Web
 
Linked Data Tutorial (Florianópolis)
Linked Data Tutorial (Florianópolis)Linked Data Tutorial (Florianópolis)
Linked Data Tutorial (Florianópolis)
 

Similaire à The role of annotation in reproducibility (Empirical 2014)

Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015
dgarijo
 
21 Microbiological Laboratory Techniques
21 Microbiological Laboratory Techniques21 Microbiological Laboratory Techniques
21 Microbiological Laboratory Techniques
Princeton Freeman
 
Download-manuals-water quality-wq-manuals-21microbiologicallaboratorytechniques
 Download-manuals-water quality-wq-manuals-21microbiologicallaboratorytechniques Download-manuals-water quality-wq-manuals-21microbiologicallaboratorytechniques
Download-manuals-water quality-wq-manuals-21microbiologicallaboratorytechniques
hydrologyproject0
 

Similaire à The role of annotation in reproducibility (Empirical 2014) (20)

SMART Protocols
SMART ProtocolsSMART Protocols
SMART Protocols
 
Summary ph dtesis_oxg
Summary ph dtesis_oxgSummary ph dtesis_oxg
Summary ph dtesis_oxg
 
Phd tesis olga giraldo 10mayo
Phd tesis olga giraldo 10mayoPhd tesis olga giraldo 10mayo
Phd tesis olga giraldo 10mayo
 
Towards Reproducible Science: a few building blocks from my personal experience
Towards Reproducible Science: a few building blocks from my personal experienceTowards Reproducible Science: a few building blocks from my personal experience
Towards Reproducible Science: a few building blocks from my personal experience
 
Credible workshop
Credible workshopCredible workshop
Credible workshop
 
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOM
 
FINAL APPLIED LAB PROJECT (1 CREDIT LAB COMPONENT)
FINAL APPLIED LAB PROJECT (1 CREDIT LAB COMPONENT)FINAL APPLIED LAB PROJECT (1 CREDIT LAB COMPONENT)
FINAL APPLIED LAB PROJECT (1 CREDIT LAB COMPONENT)
 
Results may vary: Collaborations Workshop, Oxford 2014
Results may vary: Collaborations Workshop, Oxford 2014Results may vary: Collaborations Workshop, Oxford 2014
Results may vary: Collaborations Workshop, Oxford 2014
 
Collaborative Ontology building: So much more than authoring an Ontology
Collaborative Ontology building: So much more than authoring an Ontology Collaborative Ontology building: So much more than authoring an Ontology
Collaborative Ontology building: So much more than authoring an Ontology
 
Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
New Approach Methods - What is That?
 
Ethics reproducibility and data stewardship
Ethics reproducibility and data stewardshipEthics reproducibility and data stewardship
Ethics reproducibility and data stewardship
 
From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
 
Active research management and sharing
Active research management and sharingActive research management and sharing
Active research management and sharing
 
21 Microbiological Laboratory Techniques
21 Microbiological Laboratory Techniques21 Microbiological Laboratory Techniques
21 Microbiological Laboratory Techniques
 
21 microbiological laboratory techniques
21 microbiological laboratory techniques21 microbiological laboratory techniques
21 microbiological laboratory techniques
 
Download-manuals-water quality-wq-manuals-21microbiologicallaboratorytechniques
 Download-manuals-water quality-wq-manuals-21microbiologicallaboratorytechniques Download-manuals-water quality-wq-manuals-21microbiologicallaboratorytechniques
Download-manuals-water quality-wq-manuals-21microbiologicallaboratorytechniques
 
FAIR data requires FAIR ontologies, how do we do?
FAIR data requires FAIR ontologies, how do we do?FAIR data requires FAIR ontologies, how do we do?
FAIR data requires FAIR ontologies, how do we do?
 
Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...
 

Plus de Oscar Corcho

Plus de Oscar Corcho (20)

Organisational Interoperability in Practice at Universidad Politécnica de Madrid
Organisational Interoperability in Practice at Universidad Politécnica de MadridOrganisational Interoperability in Practice at Universidad Politécnica de Madrid
Organisational Interoperability in Practice at Universidad Politécnica de Madrid
 
Introducción a los Datos Abiertos - Open Data Day 2020
Introducción a los Datos Abiertos - Open Data Day 2020Introducción a los Datos Abiertos - Open Data Day 2020
Introducción a los Datos Abiertos - Open Data Day 2020
 
Open Data (and Software, and other Research Artefacts) - A proper management
Open Data (and Software, and other Research Artefacts) -A proper managementOpen Data (and Software, and other Research Artefacts) -A proper management
Open Data (and Software, and other Research Artefacts) - A proper management
 
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos
Adiós a los ficheros, hola a los grafos de conocimientos estadísticosAdiós a los ficheros, hola a los grafos de conocimientos estadísticos
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos
 
Ontology Engineering at Scale for Open City Data Sharing
Ontology Engineering at Scale for Open City Data SharingOntology Engineering at Scale for Open City Data Sharing
Ontology Engineering at Scale for Open City Data Sharing
 
Situación de las iniciativas de Open Data internacionales (y algunas recomen...
Situación de las iniciativas de Open Data internacionales (y algunas recomen...Situación de las iniciativas de Open Data internacionales (y algunas recomen...
Situación de las iniciativas de Open Data internacionales (y algunas recomen...
 
STARS4ALL - Contaminación Lumínica
STARS4ALL - Contaminación LumínicaSTARS4ALL - Contaminación Lumínica
STARS4ALL - Contaminación Lumínica
 
Publishing Linked Statistical Data: Aragón, a case study
Publishing Linked Statistical Data: Aragón, a case studyPublishing Linked Statistical Data: Aragón, a case study
Publishing Linked Statistical Data: Aragón, a case study
 
An initial analysis of topic-based similarity among scientific documents base...
An initial analysis of topic-based similarity among scientific documents base...An initial analysis of topic-based similarity among scientific documents base...
An initial analysis of topic-based similarity among scientific documents base...
 
Linked Statistical Data 101
Linked Statistical Data 101Linked Statistical Data 101
Linked Statistical Data 101
 
Aplicando los principios de Linked Data en AEMET
Aplicando los principios de Linked Data en AEMETAplicando los principios de Linked Data en AEMET
Aplicando los principios de Linked Data en AEMET
 
Educando sobre datos abiertos: desde el colegio a la universidad
Educando sobre datos abiertos: desde el colegio a la universidadEducando sobre datos abiertos: desde el colegio a la universidad
Educando sobre datos abiertos: desde el colegio a la universidad
 
STARS4ALL general presentation at ALAN2016
STARS4ALL general presentation at ALAN2016STARS4ALL general presentation at ALAN2016
STARS4ALL general presentation at ALAN2016
 
Generación de datos estadísticos enlazados del Instituto Aragonés de Estadística
Generación de datos estadísticos enlazados del Instituto Aragonés de EstadísticaGeneración de datos estadísticos enlazados del Instituto Aragonés de Estadística
Generación de datos estadísticos enlazados del Instituto Aragonés de Estadística
 
Linked Statistical Data: does it actually pay off?
Linked Statistical Data: does it actually pay off?Linked Statistical Data: does it actually pay off?
Linked Statistical Data: does it actually pay off?
 
Big Data - El Futuro a través de los Datos
Big Data - El Futuro a través de los DatosBig Data - El Futuro a través de los Datos
Big Data - El Futuro a través de los Datos
 
Aspectos técnicos de la ontología PPROC
Aspectos técnicos de la ontología PPROCAspectos técnicos de la ontología PPROC
Aspectos técnicos de la ontología PPROC
 
AragoDBpedia
AragoDBpediaAragoDBpedia
AragoDBpedia
 
A Linked Data Dataset for Madrid Transport Authority's Datasets
A Linked Data Dataset for Madrid Transport Authority's DatasetsA Linked Data Dataset for Madrid Transport Authority's Datasets
A Linked Data Dataset for Madrid Transport Authority's Datasets
 
Best practices for Archival Processing of Research Objects (a librarian view)
Best practices for Archival Processing of Research Objects (a librarian view)Best practices for Archival Processing of Research Objects (a librarian view)
Best practices for Archival Processing of Research Objects (a librarian view)
 


The role of annotation in reproducibility (Empirical 2014)

  • 1. The role of annotation in reproducibility ESWC2014 Empirical workshop 26/05/2014 Contributors: my PhD students Olga Giraldo, Daniel Garijo, and Idafen Santana, and the Wf4Ever team Oscar Corcho ocorcho@fi.upm.es @ocorcho https://www.slideshare.com/ocorcho
  • 2. Setting the context of this presentation Our main assumption “We are not so good at describing our experiments, and this has a negative impact in reproducibility (and understandability, and conservation, and reconstruction)” • Let’s see if this happens in different areas of scientific research • In vitro experiments in Plant Biology • In silico experiments in several domains • The challenge • Let’s use annotation as a means to increase reproducibility • Note: see the last slide on terminology
  • 5. The role of laboratory protocols in Life Sciences Laboratory Protocols http://mibbi.sourceforge.net/about.shtml Laboratory protocols support the scientific results
  • 6. Laboratory Protocols • Written in natural language • Generally, presented in a “recipe” style • Description of a sequence of operations that include inputs and outputs • Step-by-step descriptions of procedures • A protocol is a type of workflow • They must be described in sufficient and unambiguous detail. • To enable another agent (human or machine) to replicate the original experiment. • Specific journals: Biotechniques, CSH protocols, Current protocols, GMR, Jove, Protocol exchange, Plant methods, Plos One, Springer protocols
  • 7. Detailed instructions on journal’s guides for authors
  • 8. And other useful elements, including ontologies: minimal information models, checklists, and even ontologies
      • It maintains checklists that promote how to report an experiment.
      • It models the design of an investigation, including protocols, instrumentation, materials and data generated.
      • EXPO: aims to formalize knowledge about the organization, execution and analysis of scientific experiments.
      • EXACT: provides a model for the description of experiment actions.
  • 9. However…
      • Ambiguity is the norm
      • Let’s analyse protocols written for the plant biology community
      • Example protocol steps:
          • Incubate the centrifuge tubes in a water bath.
          • Incubate the samples for 5 min with gentle shaking.
          • Rinse DNA briefly in 1-2 ml of wash.
          • Incubate at -20°C overnight.
  • 10. Analysis of Laboratory Protocols (repository: number of protocols)
      • Biotechniques: 8
      • CSH protocols: 11
      • Current protocols: 25
      • GMR: 4
      • Jove: 21
      • Protocol exchange: 12
      • Plant methods: 10
      • Plos One: 3
      • Springer protocols: 5
      • Total: 99
  • 11. Minimal Information to Report a Laboratory Protocol (element of our model: occurrence in other models)
      • TITLE: 100%
      • AUTHOR: 100%
      • INTRODUCTION
          • Purpose: 89%
          • Provenance of the protocol: 89%
          • Applications of the protocol: 89%
          • Comparison with other protocols: 89%
          • Limitations: 89%
      • MATERIALS
          • Sample: 100%
              · Strain or line genotype
              · Developmental stage
              · Organism part (tissue)
          • Laboratory consumables/supplies
              · Laboratory consumable name: 22%
              · Manufacturer name: 11%
              · Laboratory consumable ID (catalog number): 11%
          • Buffer recipes
              · Buffer name: 67%
              · Chemical compound name: 67%
              · Initial concentration of chemical compound: 67%
              · Final concentration or amount of chemical compound: 56%
              · Storage conditions: 56%
              · Cautions: 56%
              · Hints: 67%
          • Reagent
              · Reagent name: 100%
              · Reagent vendor or manufacturer: 100%
              · Reagent ID (catalog number): 100%
          • Kit
              · Kit name: 100%
              · Kit vendor or manufacturer: 100%
              · Kit ID (catalog number): 56%
          • Primer
              · Primer name: 67%
              · Primer sequence: 89%
              · Primer vendor or manufacturer: 33%
          • Equipment
              · Equipment name: 67%
              · Equipment vendor or manufacturer: 67%
              · Equipment ID (catalog number): 67%
          • Software
              · Software name: 67%
              · Software version: 67%
      • METHODS/PROCEDURE
          • Protocol: 100%
              · Cautions: 56%
              · Critical steps: 56%
              · Pause point: 33%
              · Hints: 22%
              · Troubleshooting: 44%
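A checklist like the one on slide 11 can be applied mechanically to a protocol record. The following is a minimal sketch, not the authors' tooling: the field names in `REQUIRED_FIELDS` and the example record are hypothetical and only loosely follow the checklist headings.

```python
from typing import Dict, List

# Hypothetical required fields, loosely following the checklist headings
# (TITLE, AUTHOR, INTRODUCTION/Purpose, MATERIALS, METHODS/PROCEDURE).
REQUIRED_FIELDS = {"title", "author", "purpose", "sample", "reagents", "procedure"}


def missing_fields(protocol: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or empty, sorted by name."""
    return sorted(name for name in REQUIRED_FIELDS if not protocol.get(name))


# Made-up protocol record, for illustration only.
example = {
    "title": "DNA extraction from rosette leaves",
    "author": "A. Researcher",
    "sample": {"organism": "Arabidopsis thaliana", "organism_part": "rosette leaves"},
    "procedure": ["Incubate the centrifuge tubes at 65°C in a water bath for 10 min."],
}

print(missing_fields(example))
```

A report of this kind only says that a field is present, not that its content is precise enough to replicate the experiment.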
  • 12. How to Formalize the Protocols?
      • Example protocol steps:
          • Incubate the centrifuge tubes at 65°C in a water bath for 10 min.
              · Action: incubate
              · Object: centrifuge tubes, water bath
              · Unit of measure: 65°C, 10 min
          • Rinse DNA briefly in 1-2 ml of wash.
              · “briefly” may indicate different lengths of time: 2 seconds? 5-10 seconds?...
          • Incubate at -20°C overnight.
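The Action / Object / Unit of measure breakdown on slide 12 can be sketched as a small structured record. This is an illustrative assumption, not the ontology the authors are designing: the `Step` class, the regular expressions and the `KNOWN_OBJECTS` vocabulary are all hypothetical, and the parser only handles sentences shaped like the slide examples.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical vocabulary of lab objects; a real system would use an ontology.
KNOWN_OBJECTS = ("centrifuge tubes", "water bath", "samples")

TEMPERATURE = re.compile(r"(-?\d+(?:\.\d+)?)\s*°?\s*C\b")  # e.g. "65°C", "-20C"
DURATION = re.compile(r"(\d+(?:\.\d+)?)\s*min\b")          # e.g. "10 min"


@dataclass
class Step:
    action: str
    objects: List[str] = field(default_factory=list)
    temperature_c: Optional[float] = None
    duration_min: Optional[float] = None


def parse_step(text: str) -> Step:
    """Extract action, known objects, temperature and duration from one step."""
    temp = TEMPERATURE.search(text)
    dur = DURATION.search(text)
    return Step(
        action=text.split()[0].lower(),  # assumes the step starts with its verb
        objects=[o for o in KNOWN_OBJECTS if o in text.lower()],
        temperature_c=float(temp.group(1)) if temp else None,
        duration_min=float(dur.group(1)) if dur else None,
    )


def ambiguities(step: Step) -> List[str]:
    """List the quantities a reader would have to guess."""
    issues = []
    if step.temperature_c is None:
        issues.append("no temperature")
    if step.duration_min is None:
        issues.append("no duration in minutes")
    return issues


precise = parse_step("Incubate the centrifuge tubes at 65°C in a water bath for 10 min.")
vague = parse_step("Incubate the centrifuge tubes in a water bath.")
print(precise)
print(ambiguities(vague))
```

Words such as "briefly" or "overnight" are not resolved by this sketch; they are exactly the cases where an explicit annotation is needed.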
  • 14. Currently working on protocol annotation (source: Biotechniques). Meta-information about content: content
      • Plant material: Arabidopsis thaliana (rosette leaves, flowers, siliques),… and Larix decidua (young needles)
      • Instrument name: Leitz DMRB microscope
      • Manufacturer: Leica Micro-systems
      • Buffer recipe: 50 mM EDTA, 1.4% SDS
      • Reagent name: 96% ethanol ~ absolute ethanol
      • Laboratory consumable name: 2-mL tube, zeolite beads
  • 15. From the wet lab to our computers (diagram labels: Lab book, Digital Log, Laboratory Protocol (recipe), Workflow, Experiment)
  • 17. Scientific Workflows: “Template defining the set of tasks needed to carry out a computational experiment” [1]
      • Inputs
      • Steps
      • Intermediate results
      • Outputs
      • Data driven, usually represented as Directed Acyclic Graphs (DAGs)
      [1] Ewa Deelman, Dennis Gannon, Matthew Shields, Ian Taylor, Workflows and e-science: an overview of workflow system features and capabilities, Future Generation Computer Systems 25 (5) (2009) 528–540.
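To make the DAG view of a workflow concrete, here is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). The step names are hypothetical and do not come from any of the workflow systems mentioned in the slides; each step maps to the set of steps it depends on.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical workflow: each step maps to the steps whose results it consumes.
workflow: Dict[str, Set[str]] = {
    "retrieve": set(),
    "prepare": {"retrieve"},
    "analyse": {"prepare"},
    "visualise": {"prepare", "analyse"},
}


def execution_order(wf: Dict[str, Set[str]]) -> List[str]:
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(wf).static_order())


def input_steps(wf: Dict[str, Set[str]]) -> List[str]:
    """Steps with no dependencies, i.e. where the workflow inputs enter."""
    return sorted(step for step, deps in wf.items() if not deps)


def output_steps(wf: Dict[str, Set[str]]) -> List[str]:
    """Steps whose results no other step consumes, i.e. the final outputs."""
    consumed = set().union(*wf.values())
    return sorted(step for step in wf if step not in consumed)


print(execution_order(workflow))
print(input_steps(workflow), output_steps(workflow))
```

Every step that is neither an input step nor an output step produces what the slide calls intermediate results.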
  • 18. Plenty of workflow tools and platforms: Taverna, Wings, LONI Pipeline
  • 19. What do I want from these workflows and repositories?
      • As a designer: Discovery
          • Workflows with similar functionality fragments/methods
          • Design based on previous templates
      • As user/reuser/reviewer: Understandability, Exploration
          • Search workflows by functionality
          • Commonalities between execution runs
          • Component categorization
          • Reproducibility
  • 20. Working on different aspects of workflow preservation
      • Workflow representation
          • Plan/template representation
          • Provenance trace representation
          • Link between templates and traces
      • Creation of abstractions/motifs in scientific workflows
          • Abstraction catalog
          • Find how different workflows are related
      • Understandability and reuse of scientific workflows
          • Relation between the workflows involved in the same experiment (Research Objects)
      • CH1: Can we export an abstract template of the method being represented?
      • CH2: How do we interoperate with other workflow results?
      • CH3: How do we access the workflow results?
      • CH4: How do we link an abstract method with several implementations?
      • CH5: How can we detect what are the typical operations in scientific workflows?
      • CH6: How can we detect them automatically?
      • CH7: Which workflow parts are related to other workflows?
      • CH8: How do workflows depend on the other parts of the experiments?
  • 21. Overview
      • Empirical analysis on 260 workflow templates from Taverna, Wings, Galaxy and Vistrails
      • Catalog of recurring patterns: scientific workflow motifs
          • Data Oriented Motifs
          • Workflow Oriented Motifs
      • Understandability and reuse
      http://sensefinancial.com/wp-content/uploads/2012/02/contribution.jpg
      Common motifs in scientific workflows: An empirical analysis. Garijo, D.; Alper, P.; Belhajjame, K.; Corcho, O.; Gil, Y.; and Goble, C. Future Generation Computer Systems, 2013
  • 22. Approach
      • Reverse-engineer the set of current practices in workflow development through an analysis of empirical evidence
      • Identify workflow abstractions that would facilitate understandability and therefore effective re-use
  • 23. Motif Catalog (ontology PURL: http://purl.org/net/wf-motifs)
      • Data-Oriented Motifs (What?)
          • Data Retrieval
          • Data Preparation
              · Format Transformation
              · Input Augmentation and Output Splitting
              · Data Organisation
          • Data Analysis
          • Data Curation/Cleaning
          • Data Moving
          • Data Visualisation
      • Workflow-Oriented Motifs (How?)
          • Intra-Workflow Motifs
              · Stateful (Asynchronous) Invocations
              · Stateless (Synchronous) Invocations
              · Internal Macros
              · Human Interactions
          • Inter-Workflow Motifs
              · Atomic Workflows
              · Composite Workflows
              · Workflow Overloading
  • 24. Macro abstraction detection
      Problem statement: Given a repository of workflow templates (either abstract or specific) or workflow execution traces, what are the workflow fragments I can deduce from it?
      Useful for:
      • Systems like Taverna and Wings (many templates, little annotation to relate them)
          • Finding relationships between workflows and sub-workflows
          • Most used fragments, most executed, etc.
      • Systems like GenePattern, LONI Pipeline and Galaxy (many runs, nearly no templates published)
          • Proposing new templates with the popular fragments
  • 25. Common workflow fragment detection
      • Given a collection of workflows, which are the most common fragments?
          • Common sub-graphs among the collection
          • Sub-graph isomorphism (NP-complete)
      • We use subgraph mining algorithms
          • Graph Grammar learning: the rules of the grammar are the workflow fragments
          • Graph based hierarchical clustering: each cluster corresponds to a workflow fragment
      • Iterative algorithm with two measures for compressing the graph:
          • Minimum Description Length (MDL)
          • Size
      [Holder et al 1994]: Substructure Discovery in the SUBDUE System. L. B. Holder, D. J. Cook, and S. Djoko. AAAI Workshop on Knowledge Discovery, pages 169-180, 1994.
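The question "which are the most common fragments?" can be illustrated with a deliberately simplified stand-in. The sketch below is not SUBDUE and does no graph grammar learning or MDL compression: it only counts, for the smallest possible fragment (a single labelled edge between two step types), how many workflows contain it. The workflows and step labels are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (producer step type, consumer step type)


def frequent_edges(workflows: List[List[Edge]], min_support: int) -> Dict[Edge, int]:
    """Return edges that occur in at least `min_support` workflows."""
    support: Counter = Counter()
    for edges in workflows:
        support.update(set(edges))  # count each edge once per workflow
    return {edge: n for edge, n in support.items() if n >= min_support}


# Made-up collection of three workflows, each given as a list of labelled edges.
collection = [
    [("retrieve", "transform"), ("transform", "analyse")],
    [("retrieve", "transform"), ("transform", "visualise")],
    [("retrieve", "clean"), ("clean", "analyse")],
]

print(frequent_edges(collection, min_support=2))
```

Real subgraph mining grows such fragments beyond single edges, which is where the trade-off between fragment size and frequency appears.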
  • 26. Exporting the fragment results: Wf-FD model http://purl.org/net/wf-fd
  • 27. Exporting the fragment results: Wf-FD model
  • 31. What is a Research Object? (http://www.researchobject.org/)
      • Aggregation of resources that bundles together the contents of a research work:
          • Data
          • Experiments
          • Examples
          • Bibliography
          • Annotations
          • Provenance
          • ROs
          • Etc.
      Workflow-Centric Research Objects: First Class Citizens in Scholarly Discourse. Belhajjame, K.; Corcho, O.; Garijo, D.; Zhao, J.; Missier, P.; Newman, D.; Palma, R.; Bechhofer, S.; García, E.; Manuel, .G. J.; Klyne, G.; Page, K.; Roos, M.; Ruiz, J. E.; Soiland-Reyes, S.; Verdes-Montenegro, L.; De Roure, D.; and Goble, C. In Proceedings of the Second International Conference on the Future of Scholarly Communication and Scientific Publishing (Sepublica2012), pages 1-12, Hersonissos, 2012
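The idea of a Research Object as an aggregation that can itself contain other ROs can be sketched as a small recursive data structure. This is only an illustration under assumed names: `Resource`, `ResearchObject`, the `kind` labels and the example.org URIs are hypothetical and are not the Research Object model's actual vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    uri: str
    kind: str  # hypothetical label, e.g. "data", "workflow", "provenance"


@dataclass
class ResearchObject:
    title: str
    resources: List[Resource] = field(default_factory=list)
    nested: List["ResearchObject"] = field(default_factory=list)  # ROs inside ROs

    def kinds(self) -> Dict[str, int]:
        """Count aggregated resources per kind, including those of nested ROs."""
        counts: Dict[str, int] = {}
        for resource in self.resources:
            counts[resource.kind] = counts.get(resource.kind, 0) + 1
        for child in self.nested:
            for kind, n in child.kinds().items():
                counts[kind] = counts.get(kind, 0) + n
        return counts


# Made-up example: an experiment RO that bundles one earlier RO.
earlier = ResearchObject(
    "Pilot run",
    resources=[Resource("http://example.org/pilot/trace", "provenance")],
)
experiment = ResearchObject(
    "Main experiment",
    resources=[
        Resource("http://example.org/main/input.csv", "data"),
        Resource("http://example.org/main/workflow", "workflow"),
        Resource("http://example.org/main/trace", "provenance"),
    ],
    nested=[earlier],
)

print(experiment.kinds())
```

A summary like this could feed a preservation checklist, for instance by flagging an RO that aggregates a workflow but no provenance trace.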
  • 33. Workflow (and RO) Preservation Checklists
  • 36. A final note on terminology Source: Idafen Santana; Inspired by [Goble, 2012]

Editor's notes

  1. However, when protocols are published, some of them present problems such as insufficient granularity, and their instructions can be imprecise or ambiguous because they are written in natural language. To avoid arbitrary interpretations, we are designing an ontological structure that facilitates the formal representation of experimental protocols.
  2. This is the What vs How vs Why.
  3. Why test more than one subgraph algorithm? Because we want to compare the fragments obtained from the algorithms. There is always a trade-off between the size of the subgraph and its frequency.