The beauty of
workflows and models
Workflows for research.
Reproducible research.
Professor Carole Goble
The University of Manchester, UK
The Software Sustainability Institute
carole.goble@manchester.ac.uk
@caroleannegoble
RDMF Meeting, Westminster, 20 June 2014
Scientific publications have at least
two goals:
(i) to announce a result and
(ii) to convince readers that the
result is correct
…..
papers in experimental science
should describe the results and
provide a clear enough protocol to
allow successful repetition and
extension
Jill Mesirov
Accessible Reproducible Research
Science 22 Jan 2010: 327(5964): 415-416
DOI: 10.1126/science.1179653
Virtual Witnessing*
*Leviathan and the Air-Pump: Hobbes, Boyle, and the
Experimental Life (1985) Shapin and Schaffer.
“An article about computational
science in a scientific publication
is not the scholarship itself, it is
merely advertising of the
scholarship. The actual
scholarship is the complete
software development
environment, [the complete
data] and the complete set of
instructions which generated the
figures.”
David Donoho, “Wavelab and Reproducible
Research,” 1995
datasets
data collections
standard operating
procedures
software
algorithms
configurations
tools and apps
codes
workflows
scripts
code libraries
services,
system software
infrastructure,
compilers
hardware
Morin et al Shining Light into Black Boxes
Science 13 April 2012: 336(6078) 159-160
Ince et al The case for open computer programs
Nature 482, 2012
“Executable Data”
Biodiversity
marine monitoring and health assessment
ecological niche modelling
Data Intensive Science
Collaborative Science
Pilumnus hirtellus
Enclosed sea problem
(Ready et al., 2010)
Sarah Bourlat http://www.biovel.eu
Data discovery
Data assembly, cleaning, and refinement
Ecological Niche Modeling
Statistical analysis
Analytical cycle
Data collection
Insights
Scholarly Communication & Reporting
BioSTIF
method
instruments and laboratory
Workflows: capture the steps
assembly & interoperability
shielding & optimising
flexible variant reuse
pipelines & exploration
repetition & comparison
record & set-up
provenance collection
report & embed
multi-code and multi-resource experiments
in-house and external
workflow management systems
materials
http://www.taverna.org.uk
Application
Generalist
Specialist
Infrastructure
Scientific Workflow Management Systems
Systems Biology
Modelling Cycle
Virtual Physiological Human Morphology
Microbiology Metabolic Pathways
http://www.vph-share.eu/
standards, standards, standards
Data
Models
Articles
External
Databases
http://www.seek4science.org
Metadata
http://www.isatools.org
Aggregated Content Infrastructure
sharing and interlinking multi-stewarded, mixed,
methods, models, data, samples…
Preservation Planning & Watch
continuous preservation management
Environment and
users
Repository
access
ingest
harvest
Monitored environment and users
Watch
Planning Operations
create/
reevaluate
plans
deploy plan
monitored
actions
Monitored content and events
execute action plan
policies
SCOUT
c3po
PLATO
Taverna
workflows
RODA
Long term preservation of digital data. Maintaining scans of newspapers, books,
records of data; Metadata maintenance; large and automated.
Preservation Policy: Collection level Control Policy: Low level actions & constraints
http://www.scape-project.eu/
Merge a
Preservation
Action Plan….
… with an
Access
Workflow
Execution Workflow
Preservation Planning & Watch
Publish
Use
Components
Workflows (and Scripts and Models) are….
…provenance of data
…general technique for describing and enacting a process
…precise, unambiguous, transparent protocols and records.
…often complex, so they need explaining.
…often challenging and expensive to develop.
…know-how and best practice.
…collaborations.
…first class citizens of research
…support the process of research
Workflow publishing
[Scott Edmunds]
Publishing
Journals
Portals
Integrative Frameworks
galaxyproject.org/
Reproducibility = Hard Work
Code in sourceforge under GPLv3:
http://soapdenovo2.sourceforge.net/
>5000 downloads
http://homolog.us/wiki/index.php?title=SOAPdenovo2
Data sets
Analyses
Linked to
Linked to
DOI
DOI
Open-Paper
Open-Review
DOI:10.1186/2047-217X-1-18
>11000 accesses
Open-Code
8 reviewers tested data in ftp server & named reports published
DOI:10.5524/100044
Open-Pipelines
Open-Workflows
DOI:10.5524/100038
Open-Data
78GB CC0 data
Enabled code to be picked apart by bloggers in a wiki
[Scott Edmunds]
http://ropensci.org
Repositories
Libraries
Registries
Archiving
Publishing
Component Libraries
Preserving
Recording
Storing
Exchanging
Versioning
Sharing
PACKS
Repositories
Data Operations in Workflows in the Wild
Analysis of 260 publicly available workflows in Taverna, WINGS, Galaxy and Vistrails
Garijo et al Common Motifs in Scientific Workflows: An Empirical Analysis, FGCS, 36, July 2014, 338–351
Research Method Stewardship
Management, Publishing, Preservation
Workflows &
Scripts
Services &
Codes
Standard
Operating
Procedures
Descriptions
Standards
Portal
Different systems
Formats
Web Services
Code Libraries
Executables
Models &
Algorithms
Mark-up Languages,
Mathematical
descriptions
Standards
BioModels
Reuse
Organised Groups
Trust
Reciprocity
Visibility
Roll your own
from standard
parts
Complementarity
Design and Instruction
Curation
Non-intrusive, Non-invasive, Not invisible
Enclaves
Specialist Flirts
Blue collar
Incremental
JIJIT not JIC
Victoria Stodden, AMP 2011 http://www.stodden.net/AMP2011/,
Special Issue Reproducible Research Computing in Science and Engineering July/August 2012, 14(4)
Howison and Herbsleb (2013) "Incentives and Integration In Scientific Software Production" CSCW 2013.
Data Stewardship
Making practices
Sustainability
Management planning
Deposition
Long term access
Credit
Journals
Licensing
Open source / access
Best Practices for Scientific Computing http://arxiv.org/abs/1210.0530
1st Workshop on Maintainable Software Practices in e-Science – e-Science 2012
Stodden, Reproducible Research Standard, Intl J Comm Law & Policy, 13 2009
Software
Models
Workflows
Services
http://sciencecodemanifesto.org/
http://matt.might.net/articles/crapl/
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software release paradigm
Some of your data isn’t data
Not a static document paradigm
• Release
research
• Methods in
motion.
• Versioning
• Forks & merges
• F1000, PeerJ
GitHub….
Pivot around method / software / data
rather than paper
Citation semantics: software as was? software as is?
The multi-dimensional paper
methods,
reproducibility
what does it
mean for
content managers
and the research
workflow?
Replication Gap
1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14
2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
Out of 18 microarray papers, results
from 10 could not be reproduced
re-compute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
regenerate
the figure
redo
“When I use a word," Humpty
Dumpty said in rather a
scornful tone, "it means just
what I choose it to mean -
neither more nor less.”*
*Lewis Carroll, Through the Looking-Glass, and What
Alice Found There (1871)
reuse
reproduce
repeat
replicate
Drummond C Replicability is not Reproducibility: Nor is it Good Science, online
Peng RD, Reproducible Research in Computational Science Science 2 Dec 2011: 1226-1227.
Methods
(techniques, algorithms,
spec. of the steps)
Materials
(datasets, parameters,
algorithm seeds)
Experiment
Instruments
(codes, services, scripts,
underlying libraries)
Laboratory
(sw and hw infrastructure,
systems software,
integrative platforms)
Setup
same experiment
same set up
same lab
same experiment
same set up
different lab
same experiment
different set up
different experiment
some of same
validate
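The conditions above can be read as a small decision rule. The sketch below is illustrative only: it assumes the four terms repeat, replicate, reproduce and reuse pair with the four conditions in the order listed, which the slide text does not state explicitly. The function name `classify` and its parameters are hypothetical.

```python
def classify(same_experiment: bool, same_setup: bool, same_lab: bool) -> str:
    """Name the kind of re-doing, under the assumed pairing of terms and conditions."""
    if not same_experiment:
        # different experiment, some of the same parts
        return "reuse"
    if not same_setup:
        # same experiment, different set up
        return "reproduce"
    # same experiment and same set up: only the lab distinguishes the two cases
    return "repeat" if same_lab else "replicate"


for case in [(True, True, True), (True, True, False),
             (True, False, False), (False, False, False)]:
    print(case, "->", classify(*case))
```

When the set up differs, the lab is treated as irrelevant, matching the slide, which lists no lab for that case.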
Design
Execution
Result Analysis
Collection
Publish / Report
Peer Review
Peer Reuse
Modelling
Can I repeat &
defend my
method?
Can I review / reproduce
and compare my results /
method with your results /
method?
Can I review /
replicate and certify
your method?
Can I transfer your
results into my
research and reuse
this method?
* Adapted from Mesirov, J. Accessible Reproducible Research Science 327(5964), 415-416 (2010)
Research Report
Prediction
Monitoring
Cleaning
Record
Everything
Automate
Everything
recomputation.org
sciencecodemanifesto.org
[Adapted Freire, 2013]
Authoring
Exec. Papers
Link docs to experiment
Sweave
Provenance
Tracking,
Versioning
Replay, Record, Repair
Workflows,
makefiles
ProvStore
open
accessible
available
description
intelligible
machine-readable
provenance
gather dependencies
capture steps
track & keep results
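A minimal sketch of what "gather dependencies, capture steps, track & keep results" might look like for a small script. It is not taken from any tool named in this talk; the helper names (`run_step`, `environment`) and the record layout are assumptions made for illustration, and only the Python standard library is used.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return a hex digest so inputs and results can be compared later."""
    return hashlib.sha256(data).hexdigest()


def run_step(name, func, data: bytes, log: list) -> bytes:
    """Run one step and append a provenance record for it (capture steps)."""
    started = datetime.now(timezone.utc).isoformat()
    result = func(data)
    log.append({
        "step": name,
        "started": started,
        "input_sha256": sha256_of(data),
        "output_sha256": sha256_of(result),
    })
    return result


def environment() -> dict:
    """Gather a minimal description of the software environment (dependencies)."""
    return {"python": sys.version.split()[0], "platform": platform.platform()}


# Hypothetical two-step pipeline: strip whitespace, then sort the values.
log = []
raw = b"3,1,2\n"
cleaned = run_step("strip", lambda b: b.strip(), raw, log)
ordered = run_step("sort", lambda b: b",".join(sorted(b.split(b","))), cleaned, log)

# Keep the result together with how it was produced (track & keep results).
record = {"environment": environment(), "steps": log, "result": ordered.decode()}
print(json.dumps(record, indent=2))
```

Because each record stores the hash of its input and output, a later reader can check that the output of one step really was the input of the next.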
http://nbviewer.ipython.org/github/myGrid/DataHackLeiden/blob/alan/Player_example.ipynb
https://www.youtube.com/watch?v=QVQwSOX5S08 ?
notebooks
Build into the workflows of research….
RDataTracker and DDG Explorer
[Barbara S. Lerner and Emery R. Boose]
Components
Dependencies
Change
• 35 kinds of annotations
• 5 Main Workflows
• 14 Nested Workflows
• 25 Scripts
• 11 Configuration files
• 10 Software dependencies
• 1 Web Service
• Dataset: 90 galaxies
observed in 3 bands
• Multiple platforms
• Multiple systems
José Enrique Ruiz (IAA-CSIC)
Galaxy Luminosity Profiling
specialist codes
libraries, platforms, tools
services
(cloud)
hosted
services
commodity
platforms
data collections
catalogues software
repositories
my data
my process
my codes
integrative
frameworks
gateways
Document vs Instrument
Reproducibility by Inspection
Read It
Reproducibility by Invocation
Run It
Instrument Entropy
all experiments become less reproducible
Zhao, Gomez-Perez, Belhajjame, Klyne, Garcia-Cuesta, Garrido, Hettne, Roos, De Roure and Goble.
Why workflows break - Understanding and combating decay in Taverna workflows,
8th Intl Conf e-Science 2012
Mitigate
Detect, Repair
Preserve
Partial replication
Approx reproduce
Verification
Benchmarks
Environmental Ecosystem
Joppa et al SCIENCE 340 May 2013; Morin et al Science 336 2012
Black boxes
Mixed systems, mixed stewardship
Distributed,
hosted systems
Workflow Planning &
Watch of Workflows
Watch
Operations
Planning
Env & Users
Repository
plan
deploy
monitor monitor
monitor
access
ingest,
harvest
Decay, Service Deprecation,
Data source monitoring,
Checklists,
Minimal Models
Workflows, myExperiment
Workflows for managing workflows
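One way to picture the "watch" part of this slide is a routine that checks each workflow's external dependencies and reports the ones that have gone away. The sketch below is hypothetical: `find_decayed`, the workflow names and the example.org addresses are invented for illustration, and the liveness check is passed in as a function so that no real service is contacted.

```python
from typing import Callable, Dict, List


def find_decayed(workflows: Dict[str, List[str]],
                 is_alive: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Map each workflow to the dependencies that no longer respond."""
    report = {}
    for name, dependencies in workflows.items():
        dead = [d for d in dependencies if not is_alive(d)]
        if dead:
            report[name] = dead
    return report


# Hypothetical workflows and the services or data sources they call.
workflows = {
    "niche_model": ["http://example.org/occurrence", "http://example.org/climate"],
    "stats": ["http://example.org/r-service"],
}

# Stand-in for a real probe: only these addresses are treated as reachable.
alive = {"http://example.org/occurrence", "http://example.org/r-service"}

report = find_decayed(workflows, lambda url: url in alive)
print(report)
```

In practice the probe would be replaced by whatever check suits the dependency, such as an HTTP request or a checksum comparison against a monitored data source.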
portability
variability tolerance
[Adapted Freire, 2013]
preservation
packaging
provenance
gather dependencies
capture steps
track & keep results
versioning
host
service
Open Source/Store
Sci as a Service
Integrative frameworks
Virtual Machines
Recompute, limited
installation, Black Box
Byte execution, copies
Descriptive read,
White Box
Archived record
Read & Run, Co-location
No installation
Portable Package
White Box, Installation
Archived record
portability
[Adapted Freire, 2013]
preservation
packaging
provenance
gather dependencies
capture steps
track & keep results
host
service
ReproZip
variability tolerance
versioning
Levels of Reproducibility
Coverage: how much of an experiment is reproducible
Original Experiment
Similar Experiment
Different Experiment
Portability
Depth: how much of an experiment is available
Binaries + Data
Source Code / Workflow + Data
Binaries + Data + Dependencies
Source Code / Workflow + Data + Dependencies
Virtual Machine
Binaries + Data + Dependencies
Virtual Machine
Source Code / Workflow + Data + Dependencies
Figures + Data
[Freire, 2014]
Findable
Accessible
Interoperable
Reusable
http://datafairport.org/
Packs
(Dynamic) Research Objects
• Bundle and relate multi-hosted digital resources of a scientific experiment or
investigation using standard mechanisms, Currency of exchange
• Exchange, Releasing paradigm for publishing
http://www.researchobject.org/
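As a rough illustration of "bundle and relate multi-hosted digital resources", the sketch below builds a small manifest that aggregates resources by URI and records relations between them. It is a simplified, hypothetical structure and not the Research Object specification: the class names, field names, predicate strings and example.org URIs are all assumptions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Resource:
    uri: str   # where the resource is hosted; it stays where it is
    role: str  # e.g. "data", "workflow", "paper"


@dataclass
class ResearchObject:
    identifier: str
    aggregates: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def add(self, uri: str, role: str) -> None:
        """Aggregate a resource by reference rather than by copying it."""
        self.aggregates.append(Resource(uri, role))

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        """Record how two aggregated resources are connected."""
        self.relations.append(
            {"subject": subject, "predicate": predicate, "object": obj})

    def to_json(self) -> str:
        """Serialise the bundle so it can be exchanged or released."""
        return json.dumps(asdict(self), indent=2)


ro = ResearchObject("http://example.org/ro/1")
ro.add("http://example.org/data/input.csv", "data")
ro.add("http://example.org/workflows/analysis", "workflow")
ro.add("http://example.org/papers/draft", "paper")
ro.relate("http://example.org/workflows/analysis", "used",
          "http://example.org/data/input.csv")
print(ro.to_json())
```

The manifest only points at resources, so each one can remain with its own steward while the bundle acts as the unit of exchange.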
OAI-ORE
W3C OAM
H. Van de Sompel et al. Persistent Identifiers for
Scholarly Assets and the Web: The Need for an
Unambiguous Mapping, 9th International Digital
Curation Conference; Trusty URLs
Machine readable metadata*
Machine actionable systems**
* Especially Linked Data and RDF ** Especially REST APIs
Sys Bio Research Object
Adobe UCF
Research Object
Bundle
ORE, PROV, ODF
• Aggregation
• Annotations/provenance
• Ad-hoc domain-specific
specification
OMEX archive
Systems Biology:
Needed a common
archive format for
reuse across tools
The research lifecycle
IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION
Authoring
Tools
Lab
Notebooks
Data
Capture
Software
Repositories
Analysis
Tools
Visualization
Scholarly
Communication
Commercial &
Public Tools
Git-like
Resources
By Discipline
Data Journals
Discipline-
Based Metadata
Standards
Community Portals
Institutional Repositories
New Reward
Systems
Commercial Repositories
Training
[Phil Bourne]
productivity
reproducibility
personal
side effect
public
side effect
The Cameron Neylon Equation
towards
Steps
born
reproducible
“may all your problems be technical” ...Jim Gray
Social
Matters
Organisation
Metrics
Culture
Process
[Adapted, Daron Green]
Summary
• Workflow & modelling models in Science
• Software-style Stewardship
• Born reproducible
• Collective cost & responsibility
• Social factors dominate
http://www.force11.org
Force2015
12 - 13 January, 2015
Oxford University
• myGrid
– http://www.mygrid.org.uk
• Taverna
– http://www.taverna.org.uk
• myExperiment
– http://www.myexperiment.org
• BioCatalogue
– http://www.biocatalogue.org
• Biodiversity Catalogue
– http://www.biodiversitycatalogue.org
• Seek
– http://www.seek4science.org
• Rightfield
– http://www.rightfield.org.uk
• VPH-Share
– http://www.vph-share.eu/
• Wf4ever
– http://www.wf4ever-project.org
• Software Sustainability Institute
– http://www.software.ac.uk
• BioVeL
– http://www.biovel.eu
• Force11
– http://www.force11.org
• SCAPE
– http://www.scape-project.eu/
Acknowledgements
• David De Roure
• Tim Clark
• Sean Bechhofer
• Robert Stevens
• Christine Borgman
• Victoria Stodden
• Marco Roos
• Jose Enrique Ruiz del Mazo
• Oscar Corcho
• Ian Cottam
• Steve Pettifer
• Magnus Rattray
• Chris Evelo
• Katy Wolstencroft
• Robin Williams
• Pinar Alper
• C. Titus Brown
• Greg Wilson
• Kristian Garza
• Donal Fellows
• Wf4ever, SysMO, BioVel, UTOPIA and myGrid teams
• Juliana Freire
• Jill Mesirov
• Simon Cockell
• Paolo Missier
• Paul Watson
• Gerhard Klimeck
• Matthias Obst
• Jun Zhao
• Daniel Garijo
• Yolanda Gil
• James Taylor
• Alex Pico
• Sean Eddy
• Cameron Neylon
• Barend Mons
• Kristina Hettne
• Stian Soiland-Reyes
• Rebecca Lawrence
• Alan Williams

Contenu connexe

Tendances

FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...Carole Goble
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research ObjectsCarole Goble
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Carole Goble
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceCarole Goble
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...Carole Goble
 
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...GigaScience, BGI Hong Kong
 
Aspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceAspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceRaul Palma
 
Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Carole Goble
 
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Alejandra Gonzalez-Beltran
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...Carole Goble
 
MESUR: Making sense and use of usage data
MESUR: Making sense and use of usage dataMESUR: Making sense and use of usage data
MESUR: Making sense and use of usage dataHerbert Van de Sompel
 
Scott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data PublishingScott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data PublishingGigaScience, BGI Hong Kong
 
From peer-reviewed to peer-reproduced: a role for research objects in scholar...
From peer-reviewed to peer-reproduced: a role for research objects in scholar...From peer-reviewed to peer-reproduced: a role for research objects in scholar...
From peer-reviewed to peer-reproduced: a role for research objects in scholar...Alejandra Gonzalez-Beltran
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use CasesCarole Goble
 
Text and Data Mining explained at FTDM
Text and Data Mining explained at FTDMText and Data Mining explained at FTDM
Text and Data Mining explained at FTDMpetermurrayrust
 

Tendances (20)

NETTAB 2013
NETTAB 2013NETTAB 2013
NETTAB 2013
 
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
 
OpenTox Europe 2013
OpenTox Europe 2013OpenTox Europe 2013
OpenTox Europe 2013
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
ISMB Workshop 2014
ISMB Workshop 2014ISMB Workshop 2014
ISMB Workshop 2014
 
SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...
 
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
 
Aspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceAspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth Science
 
Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how
 
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
 
MESUR: Making sense and use of usage data
MESUR: Making sense and use of usage dataMESUR: Making sense and use of usage data
MESUR: Making sense and use of usage data
 
Scott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data PublishingScott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data Publishing
 
CSHALS 2013
CSHALS 2013CSHALS 2013
CSHALS 2013
 
From peer-reviewed to peer-reproduced: a role for research objects in scholar...
From peer-reviewed to peer-reproduced: a role for research objects in scholar...From peer-reviewed to peer-reproduced: a role for research objects in scholar...
From peer-reviewed to peer-reproduced: a role for research objects in scholar...
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
 
Text and Data Mining explained at FTDM
Text and Data Mining explained at FTDMText and Data Mining explained at FTDM
Text and Data Mining explained at FTDM
 

Similaire à The beauty of workflows and models

Mtsr2015 goble-keynote
Mtsr2015 goble-keynoteMtsr2015 goble-keynote
Mtsr2015 goble-keynoteCarole Goble
 
Keynote speech - Carole Goble - Jisc Digital Festival 2015
Keynote speech - Carole Goble - Jisc Digital Festival 2015Keynote speech - Carole Goble - Jisc Digital Festival 2015
Keynote speech - Carole Goble - Jisc Digital Festival 2015Jisc
 
Scott Edmunds at #GAMe2017: GigaGalaxy & publishing workflows for publishing ...
Scott Edmunds at #GAMe2017: GigaGalaxy & publishing workflows for publishing ...Scott Edmunds at #GAMe2017: GigaGalaxy & publishing workflows for publishing ...
Scott Edmunds at #GAMe2017: GigaGalaxy & publishing workflows for publishing ...GigaScience, BGI Hong Kong
 
Reproducibility, Research Objects and Reality, Leiden 2016
Reproducibility, Research Objects and Reality, Leiden 2016Reproducibility, Research Objects and Reality, Leiden 2016
Reproducibility, Research Objects and Reality, Leiden 2016Carole Goble
 
Results may vary: Collaborations Workshop, Oxford 2014
Results may vary: Collaborations Workshop, Oxford 2014Results may vary: Collaborations Workshop, Oxford 2014
Results may vary: Collaborations Workshop, Oxford 2014Carole Goble
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8Scott Edmunds
 
Research Objects for FAIRer Science
Research Objects for FAIRer Science Research Objects for FAIRer Science
Research Objects for FAIRer Science Carole Goble
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceCarole Goble
 
RDA Scholarly Infrastructure 2015
RDA Scholarly Infrastructure 2015RDA Scholarly Infrastructure 2015
RDA Scholarly Infrastructure 2015William Gunn
 
The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...
The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...
The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...Carole Goble
 
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...GigaScience, BGI Hong Kong
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Sciencedgarijo
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsGaignard Alban
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Greg Landrum
 
Scott Edmunds talk at G3 (Great GigaScience & Galaxy) workshop: Open Data: th...
Scott Edmunds talk at G3 (Great GigaScience & Galaxy) workshop: Open Data: th...Scott Edmunds talk at G3 (Great GigaScience & Galaxy) workshop: Open Data: th...
Scott Edmunds talk at G3 (Great GigaScience & Galaxy) workshop: Open Data: th...GigaScience, BGI Hong Kong
 
FAIR data and model management for systems biology (and SOPs too!)
FAIR data and model management for systems biology (and SOPs too!)FAIR data and model management for systems biology (and SOPs too!)
FAIR data and model management for systems biology (and SOPs too!)FAIRDOM
 

The beauty of workflows and models

  • 1. The beauty of workflows and models Workflows for research. Reproducible research. Professor Carole Goble The University of Manchester, UK The Software Sustainability Institute carole.goble@manchester.ac.uk @caroleannegoble RDMF Meeting, Westminster, 20 June 2014
  • 2. Scientific publications have at least two goals: (i) to announce a result and (ii) to convince readers that the result is correct ….. papers in experimental science should describe the results and provide a clear enough protocol to allow successful repetition and extension Jill Mesirov Accessible Reproducible Research Science 22 Jan 2010: 327(5964): 415-416 DOI: 10.1126/science.1179653 Virtual Witnessing* *Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985) Shapin and Schaffer.
  • 4. “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995 datasets data collections standard operating procedures software algorithms configurations tools and apps codes workflows scripts code libraries services, system software infrastructure, compilers hardware Morin et al Shining Light into Black Boxes Science 13 April 2012: 336(6078) 159-160 Ince et al The case for open computer programs Nature 482, 2012
  • 5. datasets data collections standard operating procedures software algorithms configurations tools and apps codes workflows scripts code libraries services, system software infrastructure, compilers hardware Morin et al Shining Light into Black Boxes Science 13 April 2012: 336(6078) 159-160 Ince et al The case for open computer programs Nature 482, 2012 “Executable Data”
  • 6. Biodiversity marine monitoring and health assessment ecological niche modelling Data Intensive Science Collaborative Science Pilumnus hirtellus Enclosed sea problem (Ready et al., 2010) Sarah Bourlat http://www.biovel.eu
  • 8. Analytical cycle: Data discovery; Data assembly, cleaning, and refinement; Ecological Niche Modeling; Statistical analysis; Data collection; Insights; Scholarly Communication & Reporting
  • 9. BioSTIF method instruments and laboratory Workflows: capture the steps assembly & interoperability shielding & optimising flexible variant reuse pipelines & exploration repetition & comparison record & set-up provenance collection report & embed multi-code and multi-resource experiments in-house and external workflow management systems materials http://www.taverna.org.uk
  • 12. Virtual Physiological Human Morphology Microbiology Metabolic Pathways http://www.vph-share.eu/
  • 15. Preservation Planning & Watch continuous preservation management Environment and users Repository access ingest harvest Monitored environment and users Watch Planning Operations create/reevaluate plans deploy plan monitored actions Monitored content and events execute action plan policies SCOUT c3po PLATO Taverna workflows RODA Long term preservation of digital data. Maintaining scans of newspapers, books, records of data; Metadata maintenance; large and automated. Preservation Policy: Collection level Control Policy: Low level actions & constraints http://www.scape-project.eu/
  • 16. Merge a Preservation Action Plan…. … with an Access Workflow Execution Workflow Preservation Planning & Watch Publish Use Components
  • 17. Workflows (and Scripts and Models) are…. …provenance of data …general technique for describing and enacting a process …precise, unambiguous, transparent protocols and records. …often complex, so they need explaining. …often challenging and expensive to develop. …know-how and best practice. …collaborations. …first class citizens of research …support the process of research
  • 19. Reproducibility = Hard Work Code in SourceForge under GPLv3: http://soapdenovo2.sourceforge.net/ >5000 downloads http://homolog.us/wiki/index.php?title=SOAPdenovo2 Data sets Analyses Linked to Linked to DOI DOI Open-Paper Open-Review DOI:10.1186/2047-217X-1-18 >11000 accesses Open-Code 8 reviewers tested data in ftp server & named reports published DOI:10.5524/100044 Open-Pipelines Open-Workflows DOI:10.5524/100038 Open-Data 78GB CC0 data Enabled code to be picked apart by bloggers in wiki [Scott Edmunds]
  • 22. Data Operations in Workflows in the Wild Analysis of 260 publicly available workflows in Taverna, WINGS, Galaxy and Vistrails Garijo et al Common Motifs in Scientific Workflows: An Empirical Analysis, FGCS, 36, July 2014, 338–351
  • 23. Research Method Stewardship Management, Publishing, Preservation Workflows & Scripts Services & Codes Standard Operating Procedures Descriptions Standards Portal Different systems Formats Web Services Code Libraries Executables Models & Algorithms Mark-up Languages, Mathematical descriptions Standards BioModels
  • 24. Reuse Organised Groups Trust Reciprocity Visibility Roll your own from standard parts Complementarity Design and Instruction
  • 25. Curation Non-intrusive, Non-invasive, Not invisible Enclaves Specialist Flirts Blue collar Incremental JIJIT not JIC
  • 26. Victoria Stodden, AMP 2011 http://www.stodden.net/AMP2011/, Special Issue Reproducible Research Computing in Science and Engineering July/August 2012, 14(4) Howison and Herbsleb (2013) "Incentives and Integration In Scientific Software Production" CSCW 2013.
  • 27. Data Stewardship Making practices Sustainability Management planning Deposition Long term access Credit Journals Licensing Open source / access Best Practices for Scientific Computing http://arxiv.org/abs/1210.0530 1st Workshop on Maintainable Software Practices in e-Science – e-Science 2012 Stodden, Reproducible Research Standard, Intl J Comm Law & Policy, 13 2009 Software Models Workflows Services
  • 30. Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012 Software release paradigm Some of your data isn’t data Not a static document paradigm • Release research • Methods in motion. • Versioning • Forks & merges • F1000, PeerJ GitHub….
  • 31. Pivot around method / software / data rather than paper Citation semantics: software as was? software as is? The multi-dimensional paper
  • 32. methods, reproducibility what does it mean for content managers and the research workflow?
  • 33. Replication Gap 1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14 2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html 3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950 Out of 18 microarray papers, results from 10 could not be reproduced
  • 34. re-compute replicate rerun repeat re-examine repurpose recreate reuse restore reconstruct review regenerate revise recycle regenerate the figure redo “When I use a word," Humpty Dumpty said in rather a scornful tone, "it means just what I choose it to mean - neither more nor less.”* *Lewis Carroll, Through the Looking-Glass, and What Alice Found There (1871)
  • 35. reuse reproduce repeat replicate Drummond C Replicability is not Reproducibility: Nor is it Good Science, online Peng RD, Reproducible Research in Computational Science Science 2 Dec 2011: 1226-1227. Methods (techniques, algorithms, spec. of the steps) Materials (datasets, parameters, algorithm seeds) Experiment Instruments (codes, services, scripts, underlying libraries) Laboratory (sw and hw infrastructure, systems software, integrative platforms) Setup
  • 36. same experiment same set up same lab same experiment same set up different lab same experiment different set up different experiment some of same validate Drummond C Replicability is not Reproducibility: Nor is it Good Science, online Peng RD, Reproducible Research in Computational Science Science 2 Dec 2011: 1226-1227. reuse reproduce repeat replicate
  • 37. Design Execution Result Analysis Collection Publish / Report Peer Review Peer Reuse Modelling Can I repeat & defend my method? Can I review / reproduce and compare my results / method with your results / method? Can I review / replicate and certify your method? Can I transfer your results into my research and reuse this method? * Adapted from Mesirov, J. Accessible Reproducible Research Science 327(5964), 415-416 (2010) Research Report Prediction Monitoring Cleaning
  • 39. [Adapted Freire, 2013] Authoring Exec. Papers Link docs to experiment Sweave Provenance Tracking, Versioning Replay, Record, Repair Workflows, makefiles ProvStore open accessible available description intelligible machine-readable provenance gather dependencies capture steps track & keep results
  • 41. RDataTracker and DDG Explorer Build into the workflows of research…. [Barbara S. Lerner and Emery R. Boose]
  • 42. Components Dependencies Change • 35 kinds of annotations • 5 Main Workflows • 14 Nested Workflows • 25 Scripts • 11 Configuration files • 10 Software dependencies • 1 Web Service • Dataset: 90 galaxies observed in 3 bands • Multiple platforms • Multiple systems José Enrique Ruiz (IAA-CSIC) Galaxy Luminosity Profiling
  • 43. specialist codes libraries, platforms, tools services (cloud) hosted services commodity platforms data collections catalogues software repositories my data my process my codes integrative frameworks gateways
  • 44. Document vs Instrument Reproducibility by Inspection Read It Reproducibility by Invocation Run It
  • 45. Instrument Entropy all experiments become less reproducible Zhao, Gomez-Perez, Belhajjame, Klyne, Garcia-Cuesta, Garrido, Hettne, Roos, De Roure and Goble. Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012 Mitigate Detect, Repair Preserve Partial replication Approx reproduce Verification Benchmarks
  • 46. Environmental Ecosystem Joppa et al SCIENCE 340 May 2013; Morin et al Science 336 2012 Black boxes Mixed systems, mixed stewardship Distributed, hosted systems
  • 47. Workflow Planning & Watch of Workflows Watch Operations Planning Env & Users Repository plan deploy monitor monitor monitor access ingest, harvest Decay, Service Deprecation, Data source monitoring, Checklists, Minimal Models Workflows, myExperiment Workflows for managing workflows
  • 48. portability variability tolerance [Adapted Freire, 2013] preservation packaging provenance gather dependencies capture steps track & keep results versioning host service Open Source/Store Sci as a Service Integrative fws Virtual Machines Recompute, limited installation, Black Box Byte execution, copies Descriptive read, White Box Archived record Read & Run, Co-location No installation Portable Package White Box, Installation Archived record
  • 49. portability [Adapted Freire, 2013] preservation packaging provenance gather dependencies capture steps track & keep results host service ReproZip variability tolerance versioning
  • 50. Levels of Reproducibility Coverage: how much of an experiment is reproducible Original Experiment, Similar Experiment, Different Experiment Portability Depth: how much of an experiment is available Binaries + Data Source Code / Workflow + Data Binaries + Data + Dependencies Source Code / Workflow + Data + Dependencies Virtual Machine Binaries + Data + Dependencies Virtual Machine Source Code / Workflow + Data + Dependencies Figures + Data [Freire, 2014]
  • 52. (Dynamic) Research Objects • Bundle and relate multi-hosted digital resources of a scientific experiment or investigation using standard mechanisms, Currency of exchange • Exchange, Releasing paradigm for publishing http://www.researchobject.org/
  • 54. • Bundle and relate multi-hosted digital resources of a scientific experiment or investigation using standard mechanisms, Currency of exchange • Exchange, Releasing paradigm for publishing http://www.researchobject.org/ (Dynamic) Research Objects OAI-ORE W3C OAM H. Van de Sompel et al. Persistent Identifiers for Scholarly Assets and the Web: The Need for an Unambiguous Mapping 9th International Digital Curation Conference; Trusty URLs
  • 55. Machine readable metadata* Machine actionable systems** * Especially Linked Data and RDF ** Especially REST APIs
  • 56.
  • 57. Sys Bio Research Object Adobe UCF Research Object Bundle ORE PROV ODF • Aggregation • Annotations/provenance • Ad-hoc domain-specific specification OMEX archive Systems Biology: Needed a common archive format for reuse across tools
  • 58. The research lifecycle IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION Authoring Tools Lab Notebooks Data Capture Software Repositories Analysis Tools Visualization Scholarly Communication Commercial & Public Tools Git-like Resources By Discipline Data Journals Discipline- Based Metadata Standards Community Portals Institutional Repositories New Reward Systems Commercial Repositories Training [Phil Bourne]
  • 59. The Cameron Neylon Equation: productivity; reproducibility; personal side effect; public side effect. Steps towards born reproducible
  • 60. “may all your problems be technical” ...Jim Gray. Social Matters: Organisation, Metrics, Culture, Process [Adapted, Daron Green]
  • 61. Summary • Workflow & modelling models in Science • Software-style Stewardship • Born reproducible • Collective cost & responsibility • Social factors dominate http://www.force11.org Force2015 12 - 13 January, 2015 Oxford University
  • 62. • myGrid – http://www.mygrid.org.uk • Taverna – http://www.taverna.org.uk • myExperiment – http://www.myexperiment.org • BioCatalogue – http://www.biocatalogue.org • Biodiversity Catalogue – http://www.biodiversitycatalogue.org • Seek – http://www.seek4science.org • Rightfield – http://www.rightfield.org.uk • VPH-Share – http://www.vph-share.eu/ • Wf4ever – http://www.wf4ever-project.org • Software Sustainability Institute – http://www.software.ac.uk • BioVeL – http://www.biovel.eu • Force11 – http://www.force11.org • SCAPE – http://www.scape-project.eu/
  • 63. Acknowledgements • David De Roure • Tim Clark • Sean Bechhofer • Robert Stevens • Christine Borgman • Victoria Stodden • Marco Roos • Jose Enrique Ruiz del Mazo • Oscar Corcho • Ian Cottam • Steve Pettifer • Magnus Rattray • Chris Evelo • Katy Wolstencroft • Robin Williams • Pinar Alper • C. Titus Brown • Greg Wilson • Kristian Garza • Donal Fellows • Wf4ever, SysMO, BioVel, UTOPIA and myGrid teams • Juliana Freire • Jill Mesirov • Simon Cockell • Paolo Missier • Paul Watson • Gerhard Klimeck • Matthias Obst • Jun Zhao • Pinar Alper • Daniel Garijo • Yolanda Gil • James Taylor • Alex Pico • Sean Eddy • Cameron Neylon • Barend Mons • Kristina Hettne • Stian Soiland-Reyes • Rebecca Lawrence • Alan Williams
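
The Research Object idea on slides 52 to 57 (bundle and relate multi-hosted resources, with aggregation and annotations) can be illustrated with a small sketch. This is only an assumption-laden illustration, not the Research Object Bundle specification: the field names (`id`, `aggregates`, `annotations`, `about`), the helper functions and the example.org / example.net URIs are all hypothetical.

```python
import hashlib
import json


def describe_resource(uri, content=None):
    """Identity: refer to a resource by URI; add a checksum when the bytes are at hand."""
    entry = {"uri": uri}
    if content is not None:
        entry["sha256"] = hashlib.sha256(content).hexdigest()
    return entry


def build_research_object(ro_id, resources, annotations):
    """Aggregation plus annotation: list the parts, then attach statements about them."""
    aggregated = {r["uri"] for r in resources}
    for note in annotations:
        # An annotation may describe an aggregated resource or the aggregation itself.
        if note["about"] not in aggregated and note["about"] != ro_id:
            raise ValueError(f"annotation targets unknown resource: {note['about']}")
    return {"id": ro_id, "aggregates": resources, "annotations": annotations}


# Hypothetical, multi-hosted example: data on one host, workflow on another.
ro = build_research_object(
    "https://example.org/ro/demo/",
    [
        describe_resource("https://example.org/data/input.csv", b"a,b\n1,2\n"),
        describe_resource("https://example.net/workflows/analysis.t2flow"),
    ],
    [
        {
            "about": "https://example.net/workflows/analysis.t2flow",
            "content": "Illustrative note: workflow used for the analysis",
        }
    ],
)
manifest_json = json.dumps(ro, indent=2, sort_keys=True)
```

Keeping the manifest as plain JSON beside the resources, rather than inside any one tool, is what lets the bundle act as a currency of exchange between tools; a real bundle would follow the specifications linked from http://www.researchobject.org/.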

Editor's notes

  1. how, why and what matters benchmarks for codes plan to preserve repair on demand description persists use frameworks partial replication approximate reproduction verification
  2. Minute taking. Leviathan and the Air-Pump examines the debate between Robert Boyle and Thomas Hobbes over Boyle's air-pump experiments in the 1660s. In 2005, Shapin and Schaffer were awarded the Erasmus Prize for this work.
  3. http://www.mysharedprotocols.com/ Openwetware.org Molmeth.org
  4. Ballast water in shipping lanes; invasive species risk assessments. When the tools are integrated on the Lifewatch portal, we will have access to the environmental data through the workflow. Global marine layers used in our study came from Bio-Oracle (http://www.biooracle.ugent.be/) with a resolution of 5 arc-minutes (9.2 km) (Tyberghein et al. 2012), and were used to study abiotic factors such as mean sea surface temperature (SST) (°C), mean surface salinity (SSS) (PSU) and mean photosynthetically available radiation (PAR) (Einstein/m2/day). Other marine layers used in our analyses came from AquaMaps (http://www.aquamaps.org/download/main.php) with a resolution of 30 arc-minutes, i.e. 1 km2 (Kaschner et al. 2010). If layers are needed from SMHI, they can be uploaded via the workflow. Not all things are batch. VPH-Share opens a VNC connection to a spawned instance. Taverna Interaction Service: users interact with a workflow (wherever it is running) in a web browser. Interaction Service Workbench Plug-in
  5. Collecting, processing and management of big data. Metagenomics, genotyping, genome sequencing, phylogenetics, gene expression analysis, proteomics, metabolomics, auto sampling. Analytics and management of broad data from many different disciplines. Coupling analytical metagenomics with meaningful ecological interpretations. Continuous development of novel methods and technologies. Functional trait-based ecology approach proposed by Barberán et al. 2012.
  6. Example of an extreme of the software issue Multi-code experiments platform libraries, plugins Infrastructure components, services infrastructure
  7. Variety: common metadata models, rich metadata collection, ecosystem. Validity: auto record of experiment set-up, citable and shareable descriptions, curation, publication, mixed stewardship, third party availability, model executability, citability, QC/QA, trust. Social issues of understanding the culture of risk, reward, sharing and reporting.
  8. C3PO content profiling tool What kind of platform are my users running on Purpose: show how the system components work together and support the overall conceptual framework we are building on. The example story that we use is a real-world case from the British Library where a plan was created previously with Plato3 This kind of scenario is very common, of course - it could have been one of many other organisations that have similar, in particular digitized, collections. In parallel to the Plato case study we are relying on, we had conducted several others that led to different conclusions; and the same problem structure applies to the case study we just conducted in SCAPE with the SB-DK. Hence, the tools demonstrated here apply equally across many scenarios - the concepts used are completely scenario-agnostic, and decision support applies equally across scenarios, content sets, and organisations.
  9. Running migration tools
  10. Designing for reuse is hard No closing the feedback loop Off site collaboration (behind closed doors and invisible) Poor reciprocity – find, grab, piss off Not much reuse (too specific) Components Enclaves and groups are better File and forget Most workflows decay. Group enclaves Personal archives rather than sharing Credit….. Designing for reuse
  11. Curators Credit? Funded? Blue-Collar. Personal and institutional visibility Scholarly (citation metrics) Federate workloads Unpopular with the big data providers. Hell is other people’s metadata Jean-Paul Sartre Invisible vs nonintrusive (discreet but controlled) credit Takes time, fears for description Non-intrusiveness vs control and invisible Off site collaboration Not much reuse (too specific) Components Enclaves and groups are better File and forget Most workflows decay.
  12. Designing for stewardship
  13. “As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software” An aside
  14. TOLERANCE The explicit documentation of designed-in and anticipated variation. Methods matter; results may vary
  15. Added afterwards. 1. Required as condition of publication, certain exceptions permitted (e.g. preserving confidentiality of human subjects) 2. Required but may not affect editorial/publication decisions 3. Explicitly encouraged/addressed; may be reviewed and/or hosted 4. Implied 5. No mention 59% of papers in the 50 highest-IF journals comply with (often weak) data sharing rules. Alsheikh-Ali et al Public Availability of Published Research Data in High-Impact Journals. PLoS ONE 6(9) 2011
  16. conceptual replication “show A is true by doing B rather than doing A again” verify but not falsify [Yong, Nature 485, 2012] The letter or the spirit of the experiment indirect and direct reproducibility Reproduce the same effect? Or same result? Concept drift towards bottom. As an old ontologist I wanted an ontology or a framework or some sort of property based classification.
  17. Reproducible Research Environment Integrated infrastructure for producing and working with reproducible research. make reproducible, born reproducible Reproducible Research Publication Environment Distributing and reviewing; credit; licensing etc
  18. FAIRport* Reproducibility: Find, Access, Interoperate, Reuse, Port. Preservation - Lots of copies keeps stuff safe. Stability dimension. Add two more dimensions to our classification of themes. A virtual machine (VM) is a software implementation of a machine (i.e. a computer) that executes programs like a physical machine. Virtual machines are separated into two major classifications, based on their use and degree of correspondence to any real machine: System Overlap of course Static vs dynamic. GRANULARITY This model for audit and target of your systems overcoming data type silos public integrative data sets transparency matters cloud Recomputation.org Reproducibility by Execution: Run It. Reproducibility by Inspection: Read It. Availability – coverage Gathered: scattered across resources, across the paper and supplementary materials Availability of dependencies: Know and have all necessary elements Change management: Data? Services? Methods? Prevent, Detect, Repair. Execution and Making Environments: Skills/Infrastructure to run it: Portability and the Execution Platform (which can be people…), Skills/Infrastructure for authoring and reading Description: Explicit: How, Why, What, Where, Who, When, Comprehensive: Just Enough, Comprehensible: Independent understanding Documentation vs Bits (VMs) reproducibility Learn/understand (reproduce and validate, reproduce using different codes) vs Run (reuse, validate, repeat, reproduce under different configs/settings)
  19. Electronic lab notebooks. Issues: non-secure HTML served over http inside a secure https iframe in IPython doesn’t work; need to update the Interaction Service to deliver over https. I’ll come back to the IPython Notebook later. Client package: currently under development; will be available via the Python Package Index (PyPI) for installation on all major platforms (Linux, Mac, Windows). Allows for calling Taverna workflows available via Taverna Player. The list of available workflows can be retrieved from the BioVeL Portal (Taverna Player). Users can enter the input values using the IPython Notebook (these values can be the results of code previously run in the Notebook). The outputs from running the workflow (the results) are returned to the Notebook and processed further. The full workflow run and the overall process (provenance) can be saved in the IPython Notebook format. For an example (static for now), see http://nbviewer.ipython.org/urls/raw.githubusercontent.com/myGrid/DataHackLeiden/alan/Player_example.ipynb?create=1
  20. The Goldilocks paradox: “the description needed to make an experiment reproducible is too much for the author and too little for the reader”. Just enough, just in time. Not just ONE system. Many systems. Bewildering range of standards for formats, terminologies and checklists. The Economics of Curation: Curate Incrementally, Early and When Worthwhile. Ramps: Automation & Integrated Tools. Copy editing Methods
  21. The reproducibility ecosystem. For peer and author: complicated and scattered - super fragmentation – supplementary materials, multi-hosted, multi-stewarded. We must use the right platforms for the right tools. The trials and tribulations of review. It’s Complicated. www.biostars.org/ Apache. Service based Science; Science as a Service
  22. Shapin 1984 Boyle’s instruments
  23. Check numbers of workflows The how, why and what plan to preserve prepare to repair description persists common frameworks partial replication approximate reproduction verification benchmarks for codes
  24. “let’s copy the box that the internet is in” black boxes: closed codes & services, proprietary licences, magic cloud services, manual manipulations, poor provenance/version reporting, unknown peer review, mis-use, platform calculation dependencies. Mixed systems, mixed stewardship, mixed provenance, different resources
  25. Decay monitoring, traffic lights contribute to Watch/Monitoring Customise to User characteristics, e.g. their version of workflow engine myExp and RODL provide both Operations and Repository Next steps: Planning component Explicit roles of archival and curation organisations Scientific users play different roles but it tends to be the same user Considering SCAPE brings up some questions about wf4ever What are the high level policies for workflow preservation? Who’s playing the role of, e.g. Curator or Archival Organisation? So far, use cases tend to have the same individual playing multiple roles rather than different individuals. Who’s our Designated Community? Meta-Question: How do we answer these questions? Opportunity to develop workflow preservation policies.
  28. ENCODE threads exchange between tools and researchers bundles and relates digital resources of a scientific experiment or investigation using standard mechanisms
  29. ENCODE threads exchange between tools and researchers bundles and relates digital resources of a scientific experiment or investigation using standard mechanisms. Three principles underlie the approach: Identity: referring to resources (and the aggregation itself). Aggregation: describing the aggregation structure and its constituent parts. Annotation: associating information with aggregated resources.
  30. ROs as Information Packages in OAIS. DOIs – the resolution of a DOI URL is a landing page, not RDF metadata for a client (designed for people). HTTP URIs provide both access and identification. PIDs: Persistent Identifiers (e.g. DOIs) tend to resolve to human-readable landing pages, with embedded links to further (possibly machine-readable) resources. ROs seen as non-information resources with descriptive (RDF) metadata. Redirection/negotiation. Standard patterns for Linked Data resources. Bidirectional mappings between URIs and PIDs. Versioning through, e.g. Memento. ENCODE threads exchange between tools and researchers bundles and relates digital resources of a scientific experiment or investigation using standard mechanisms. Three principles underlie the approach: Identity: referring to resources (and the aggregation itself). Aggregation: describing the aggregation structure and its constituent parts. Annotation: associating information with aggregated resources.
  31. http://www.youtube.com/watch?v=p-W4iLjLTrQ&list=PLC44A300051D052E5 Our collaboration with ISA/GigaScience/nanopublication is finally being written up and will be submitted to ECCB this Friday. We will upload a copy to arXiv after the deadline. - We will continue our workshop at ISMB, with BioMED Central. And Kaitlin will also join us on the Panel. You can find more details about agenda and panel planning in other emails. Posted on December 11, 2013 by Kaitlin Thaney. Part of the Science Lab’s mission is to work with other community members to build technical prototypes that move science on the web forward. In particular, we want to show that many problems can be solved by making existing tools and technology work together, rather than by starting from scratch. The reason behind that is two-fold: (1) most of the stuff needed to change behaviors already exists in some form and (2) the smartest minds are usually outside of your organization. Our newest project extends our existing work around “code as a research object”, exploring how we can better integrate code and scientific software into the scholarly workflow. The project will test building a bridge that will allow users to push code from their GitHub repository to figshare, providing a Digital Object Identifier for the code (a gold standard of sorts in science, allowing persistent reference linking). We will also be working on a “best practice” standard (think a MIAME standard for code), so that each research object has sufficient documentation to make it possible to meaningfully use. The project will be a collaboration of the Science Lab with Arfon Smith (GitHub; co-founder Zooniverse) and Mark Hahnel and his team at figshare. Why code? Scientific research is becoming increasingly reliant on software. 
But despite there being an ever-increasing amount of the academic process described in code, research communities do not yet treat these products as a fundamental component or “first-class research object” (see our background post here for more). Up until recent years, the sole “research object” in discussion was the published paper, the main means of packaging together the data, methods and research to communicate findings. The web is changing that, making it easier to unpack the components such as data and code for the community to remix, reuse, and build upon. A number of scientists are pushing the envelope, testing out new ways of bundling their code, data and methods together. But outside of copy and pasting lines of code into a paper or, if we’re lucky, having it included in a supplementary information file alongside a paper, the code is still often separated from the documentation needed for others to meaningfully use it to validate and reproduce experiments. And that’s if it’s shared openly at all. Code can go a long way in helping academia move toward the holy grail that is reproducibility. Unfortunately, academics whose main research output is the code they produce often cannot get the recognition they deserve for creating it. There is also a problem with versioning: citing a paper written about software (as is common practice) gives no indication of which version, or release in GitHub terms, was used to generate the results. What we’re testing: figshare and GitHub are two of the leading repositories for data and code (figshare for data; GitHub for code). Open data repositories like figshare have led the way in recent years in changing our practices in relation to data, championing the idea of data as a first-class research object. 
figshare and others such as Harvard’s Dataverse and Dryad have helped change how we think of data as part of the research process, providing citable endpoints for the data itself that the community trusts (DOIs), as well as clear licensing and making it easy to download, remix, and reuse information. One of the main objectives here is that the exact code used in particular investigations, can be accessed by anyone and persists in the form it was in when cited. This project will test whether having a means of linking code repositories to those commonly used for data will allow for software and code to be better incorporated into existing credit systems (by having persistent identifiers for code snapshots) and how seamless we can make these workflows for academics using tools they are already familiar with. We’ve seen this tested with data over recent years, with sharing of detailed research data associated with increased citation rates (Piwowar, 2007). This culture shift of publishing more of the product of research is an ongoing process and we’re keen to see software and code elevated to the same status as the academic manuscript. We believe that by having the code closely associated with the data it executes on (or generates) will help reduce the barriers when trying to reproduce and build upon the work of others. This is already being tested in the reverse, with computational scientists nesting their data with the code in GitHub (Carl and Ethan, for example). We want to find out if formally linking the two to help ease that pain will change behavior. We are also looking to foster a culture of reuse with academic code. While we know there are lots of variables in this space, we are actively soliciting feedback from the community to help determine best practices for licensing and workflows. How to get involved (UPDATE: Want to help us test? Instead of sending us an email, how about adding yourself to this issue in our GitHub repository? More about that here.) 
Mark and Arfon will be joining us for our next Mozilla Science Lab community call on December 12, 2013. Join us to hear more about the project. Have a question you’d like to ask? Add it to the etherpad! We’re also looking for computational researchers and publishers to help us test out the implementation. Shoot us an email if you’d like to participate.
  32. APIs – DOIs on landing pages for people.
  33. Scientist works with local folder structure. Version management via GitHub. Local tooling produces metadata description. Metadata about the aggregation (and its resources) provided by “hidden folder”. Zenodo/figshare pull snapshot from GitHub, providing DOIs for the aggregations. Additional release cycles can prompt new DOIs
  34. Cameron Neylon, BOSC 2013, http://cameronneylon.net/
  35. SKIPPED