FAIR Workflows and
Research Objects get a Workout
Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk
DataVerse Community Conference 2021, 15th June 2021
EOSC-Life pan-national data & method thematic commons for
bioscience data and methods
Using and sharing data, tools and workflows in the cloud
Infrastructure Zoo
Flows around a Federated & Diverse System
1466 data repositories / archives
916 data format and metadata
standards*
Not including the institutional or
national repositories like
DataVerse
https://fairsharing.org/ accessed May 2021
From compounds to clinical trials
Primary data - Secondary use
Infrastructure Zoo
Flows around a Federated & Diverse System
https://fairsharing.org/ accessed May 2021
Community domain enclaves
fragmented resources
flow across platforms & sovereignties
Workflows as an entry point and
integration mechanism
Legacy
• data repositories & data platforms
• processing and workflow
platforms
CryoEM Image Analysis Metagenomic Pipelines Drug Discovery
Quality control
Replication
Scrutiny
Shared know-how
Repetition
SARS-CoV-2 pre-processing, monitoring, analysis
https://elixir-europe.org/news/covid-19-variants-galaxy
Beyond Data: Computational Workflows as method objects
to be shared, ported and reused & repurposed
Multi-step
Leverage third party codes
Scalable processing of data
Transparent research
Computational Workflows
Specification
description
Software
Execution
A special kind of software
Separation of the workflow specification from its execution
Precise description of a procedure: multi-step process coordinated by input/output data relationships (data types).
Execution of computational processes (run a code, invoke a service…).
Data is consumed and produced by each step.
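To make the separation of specification from execution concrete, here is a minimal Python sketch. It is not taken from any particular workflow system: the step names, data names and toy functions are hypothetical, and the scheduler simply runs a step once all of its inputs exist.

```python
from typing import Callable, Dict, List

# Specification: a declarative description of the steps and the data they
# exchange. All names here are hypothetical and for illustration only.
SPEC: List[dict] = [
    {"step": "trim", "inputs": ["raw_reads"], "output": "trimmed_reads"},
    {"step": "count", "inputs": ["trimmed_reads"], "output": "read_count"},
]

# Execution: the concrete code bound to each step name.
TOOLS: Dict[str, Callable[..., object]] = {
    "trim": lambda reads: [r.strip("N") for r in reads],
    "count": lambda reads: len([r for r in reads if r]),
}


def run(spec: List[dict], tools: Dict[str, Callable[..., object]], data: dict) -> dict:
    """Run each step as soon as all of its named inputs are available."""
    data = dict(data)
    pending = list(spec)
    while pending:
        ready = [s for s in pending if all(name in data for name in s["inputs"])]
        if not ready:
            stuck = ", ".join(s["step"] for s in pending)
            raise ValueError("steps with unsatisfied inputs: " + stuck)
        for step in ready:
            args = [data[name] for name in step["inputs"]]
            # Each step consumes its inputs and produces one named output.
            data[step["output"]] = tools[step["step"]](*args)
            pending.remove(step)
    return data


result = run(SPEC, TOOLS, {"raw_reads": ["NNACGT", "NN", "TTGAN"]})
print(result["read_count"])
```

Because `SPEC` is plain data, it could be shared, registered or ported to another engine without touching `TOOLS`, which is the point of keeping the two apart.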
Beyond Data: Computational Workflows as method objects
to be shared, ported and reused & repurposed
Multi-step
Leverage third party codes
Scalable processing of data
Transparent research
Computational Workflows
<my scripts>
A Zoo of Workflow Systems and “systems”*
Native repositories
*https://s.apache.org/existing-workflow-systems
EMBL-EBI MGnify
Metagenomics
pipelines
Command line tools
Sub-Workflows
Containers
Beyond Data: Multi-part Research Objects
dependencies and associates scattered across repositories and within repositories
made at different times by different people
Workflow itself Workflow associated Objects
Specification
descriptions
Parameters
Input
Datasets
Output
Datasets
Runtime details & Provenance
Documentation
Bind to Dependencies
- Containers
- Codes
- Sub-workflows
Bind to particular test engines
Publications
Image
Other workflows
Sub workflows
Software
Execution
Inputs and outputs
Author
Beyond Data: Computational Workflows as multi-part method objects
to be shared, ported and reused & repurposed
Services for FAIR Workflows
• Describe workflows with PIDs and metadata
• Flow: Move workflows between services and
platforms
• Parts: Package (scattered) objects linked
together by context (metadata files with their objects)
Honouring
• the legacy and diverse ecosystem
• buy-in from platforms
Be KISSy
• practical and developer friendly standards,
and webby mechanisms
• extensible openendedness – unknown
unknowns & diversity….
Workflow
Registry
Workflow
Systems
Repos Containers Deploys
Testing
Monitoring
Open Registry for Workflows
Perpetual Development in the open by an open community
https://workflowhub.eu
Towards FAIR workflows and FAIR registry
• Find and Access Workflows
– Workflows may remain in their native repositories in
their native form. Or can deposit.
– Register (push) / Harvest (pull)
• Workflow interoperability and reusability
– Using metadata standards framework
Makers are the custodians
• people organisation: spaces, teams, organisations …
• workflow organisation: collections, tagging, facets ...
• credit: for submitters and authors
Open to any platform,
any subject, any person
WorkflowHub Club
TRS (Tool Registry Service) API
Access:
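As a rough illustration of access through TRS, the sketch below only builds the URL of a workflow descriptor and makes no network call. It assumes the registry exposes the API under the conventional `/ga4gh/trs/v2` prefix and follows the TRS v2 path layout; the tool id `42`, version `1` and the helper name `trs_descriptor_url` are hypothetical.

```python
from urllib.parse import quote


def trs_descriptor_url(base: str, tool_id: str, version_id: str, descriptor_type: str) -> str:
    """Build a TRS v2-style descriptor URL (path layout assumed, not verified here)."""
    # Identifiers may contain characters such as '/' or '#', so encode them fully.
    tool = quote(str(tool_id), safe="")
    version = quote(str(version_id), safe="")
    return f"{base.rstrip('/')}/tools/{tool}/versions/{version}/{descriptor_type}/descriptor"


# Assumed base URL; check the registry's own documentation before relying on it.
BASE = "https://workflowhub.eu/ga4gh/trs/v2"
url = trs_descriptor_url(BASE, "42", "1", "CWL")
print(url)
```

A client would then issue a plain HTTP GET against that URL, which is what lets workflow platforms fetch a registered workflow without a registry-specific client.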
FAIR Workflows are FAIR Software
living and with dependencies…workflow history/provenance
Indicators of Status
Workflow
monitoring
Register versions
(Support Github actions)
Incremental metadata and
supplementary materials
(Tracking & Lifting
out subworkflows)
Which Workflow Objects are FAIR?
• workflow specification with test or
exemplar data?
• implementation of that design in a
particular WfMS?
• instantiation of that implementation
ready to run with input data, parameters
set, computational services spun up?
• run result with intermediate/final data
products and provenance logs?
• In practice this is a bit blurry.
A metadata
framework
extensible
enough to cope
FAIR Workflows are FAIR Digital Objects
Descriptive, machine actionable metadata framework from the community
practical and developer friendly standards, extensible openendedness
Standardised
metadata about the
workflows
for registration,
discovery
Schema.org profile and types
ComputationalWorkflow
FormalParameter
ComputationalTool
Canonical workflow
description of the
workflow itself
Executable and
Abstract form
Type the input and
output data formats
of the steps
Ontology of types of data
and data identifiers, data
formats, operations in life
sciences
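A minimal sketch of what such a description could look like, built as JSON-LD from Python. It borrows the RO-Crate 1.1 JSON-LD context, which defines the `ComputationalWorkflow` and `FormalParameter` terms. The workflow name, parameter names and the ontology identifiers used for the formats are illustrative assumptions and should be checked against the profile and the ontology.

```python
import json


def formal_parameter(name: str, format_iri: str) -> dict:
    """Describe one typed input or output of the workflow."""
    return {
        "@type": "FormalParameter",
        "name": name,
        # Link the parameter to a data-format term from a domain ontology.
        "encodingFormat": {"@id": format_iri},
    }


workflow = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@type": "ComputationalWorkflow",
    "name": "example-alignment-workflow",  # hypothetical name
    "programmingLanguage": {"@type": "ComputerLanguage", "name": "Common Workflow Language"},
    # The format identifiers below are illustrative; verify them before use.
    "input": [formal_parameter("reads", "http://edamontology.org/format_1930")],
    "output": [formal_parameter("alignments", "http://edamontology.org/format_2572")],
}

print(json.dumps(workflow, indent=2))
```

Typing the inputs and outputs this way is what lets a registry answer questions such as "which workflows accept this format?" from metadata alone.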
Upload and Download the parts?
Exchange between services & platforms?
Sharing & archiving the components of science
Let's step back!
Beyond Data: Multi-part Research = Multi-part ROs
Each object has its own
metadata and repositories
Integrated view & context over
fragmented resources using
their PIDs and metadata
Need a way of packaging up,
describing the package and
parts, citing, shipping around,
storing, archiving, sharing.
Reference real things. Like
people, mice and equipment.
Beyond Data: Multi-part Research Objects
Describing a Dataset as a
Digital Object
A way of packaging up,
describing the package and
parts, citing, shipping around,
storing, archiving, sharing.
Even reference real things. Like
people, mice and equipment.
Image Courtesy of Peter Sefton: https://arkisto-platform.github.io/standards/ro-crate/
The dataset may contain any kind of
data resource, about anything, in any
format as a file or URL. They can be
scattered across repositories.
Each resource can have a machine
readable description in JSON-LD
format
A human-readable description and
preview can be in an HTML file
that lives alongside the metadata
Provenance and workflow information
can be included - to assist in data and
research-process re-use
RO-Crate Digital Objects may be
packaged for distribution, e.g. via Zip,
BagIt and OCFL Objects
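The description above can be sketched as a small `ro-crate-metadata.json` generator. This is a minimal illustration following the RO-Crate 1.1 layout (a metadata descriptor plus a root dataset in a flat `@graph`); the file names, URL, date and helper name are hypothetical, and a fully conformant crate would carry more properties, such as a license.

```python
import json


def minimal_crate(name: str, description: str, parts: list) -> dict:
    """Assemble a flat JSON-LD graph: descriptor, root dataset, then the parts."""
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": "2021-06-15",  # hypothetical date
        # The root dataset lists its parts by reference only.
        "hasPart": [{"@id": part["@id"]} for part in parts],
    }
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *parts],
    }


parts = [
    # A file packaged inside the crate.
    {"@id": "data/counts.csv", "@type": "File", "name": "Read counts"},
    # A resource that stays in its own repository and is only referenced.
    {"@id": "https://example.org/large-dataset.zip", "@type": "File", "name": "Large dataset"},
]
crate = minimal_crate("Example crate", "A toy RO-Crate", parts)
print(json.dumps(crate, indent=2))
```

Writing that JSON next to the payload files, then zipping or bagging the directory, is one way to arrive at the packaged form the slide mentions.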
Courtesy Peter Sefton, https://arkisto-platform.github.io/standards/ro-crate/
A data
repository
perspective
Not just for workflows!
For any kind of object
data, publications, SOPs, software …
and data repositories!
especially data repositories!
Aggregate files, any URI-addressable content, another
RO-Crate, along with contextual information, into a citable
RO-Crate which has its own metadata.
Can use as a bag of references:
large/sensitive datasets
citation aggregator
FAIR
here
FAIR
here
Unbounded Research Objects
Anything referenceable that may be scattered
across different repositories and/or different
datasets in the same repository.
Self describing integrated view spanning over
fragmented resources using PIDs and metadata
Metadata held alongside heterogeneous data
Infrastructure independent
• Exchange between repositories, registries and
services.
• Avoid vendor lock-in
Practical, lightweight approach. Machine
and human readable, search engine friendly
and developer familiar, blah blah
FAIR Object middleware/underware
Standard Web Native PIDs + JSON-LD +
Schema.org, off the shelf archiving formats
Self-describing, Typed by profiles + add
more schema.org and domain ontologies
Extensible, descriptive and content
openendedness, honouring legacy, diversity,
and known and unknown unknowns - one size
does not fit all, blah blah
A Graph inside the RO-Crate
PIDs connect the Graph to the
outside world
http://www.researchobject.org/ro-crate/
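One way to picture the graph and its links to the outside world: walk the `@graph` of a crate and separate entities held inside the package from those identified by a web-resolvable identifier. The sample crate and the function name are hypothetical, and treating any `http(s)` identifier as "outside" is a simplifying assumption.

```python
from urllib.parse import urlparse

# A hypothetical, heavily trimmed crate graph.
SAMPLE = {
    "@graph": [
        {"@id": "./", "@type": "Dataset"},
        {"@id": "workflow/main.cwl", "@type": "File"},
        {"@id": "https://orcid.org/0000-0000-0000-0000", "@type": "Person"},
        {"@id": "https://doi.org/10.1162/dint_a_00033", "@type": "ScholarlyArticle"},
    ]
}


def split_references(crate: dict) -> tuple:
    """Return (local ids, external ids) for every entity in the graph."""
    local, external = [], []
    for entity in crate["@graph"]:
        scheme = urlparse(entity["@id"]).scheme
        (external if scheme in ("http", "https") else local).append(entity["@id"])
    return local, external


local_ids, external_ids = split_references(SAMPLE)
print("inside the crate:", local_ids)
print("outside world:", external_ids)
```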
RO-Crate variants: Profiles are extensible typing
RO-Crates collect metadata
Workflow-RO-Crate Workflow-Testing-RO-Crate
Workflow-Run-RO-Crate
*https://repository.publisso.de/resource/frl:6423291 https://www.researchobject.org/ro-crate/profiles.html
BioComputeObject-RO-Crate
Galaxy-Workflow-RO-Crate
maDMP
RO-Crate*
DataRepo-RO-Crate
DataRepo-DataCube-RO-Crate
Aggregated
DataCitation
RO-Crate
Secure Bags of
PIDs to sensitive
/ large data
A step towards FAIR Digital Objects*
“To be FAIR each digital object
type has its own metadata
requirements,
and may have its own repositories
and registries”
FAIR Digital Objects for Science: From Data Pieces to Actionable
Knowledge Units: https://doi.org/10.3390/publications8020021
https://fairdo.org
FAIR Digital Objects
Actionable knowledge unit
Digital butterfly – digital twins
Bags of references
courtesy Dimitris Koureas
Coordinator DiSSCo EU
Research Infrastructure
Specimen object image
courtesy of Alex Hardisty
Specimen Data Refinery
Workflows to Digitise Natural History Specimens
FAIR Digital Objects -> Packaged + Actionable
+
FAIR Digital Object
Framework
Open Digital Specimen
Workflow Infrastructure
courtesy of Alex Hardisty and Laurence Livermore
Real Use Cases Considered Essential!
• Building out in the open accelerated progress
RO-Crate is metadata middleware
• smart use of wheels already invented
• it takes a village: get tools, services on board
• developer friendly, firm best practice
A little bit of semantics goes a long way…
• Schema.org + JSON-LD
…prepare for more
Known and Unknown unknowns, One size does not fit all
• descriptive openendedness, multi-interpretation
Metadata sucks
• auto-curation is the way forward folks!
What about
the workout?
What about
FAIR?
FAIR at multiple levels & granularities
• Workflows & RO-Crates are composite and
nested, with dependencies
• FAIR all the way down
• Not always compatible – e.g. licenses
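"FAIR all the way down" implies checking every nested part, not just the top-level object. A minimal sketch, assuming a hypothetical nested structure with `name`, `license` and `parts` keys: it gathers the license declared at each level and lists the parts that declare none. It does not judge whether the licenses found are compatible.

```python
# Hypothetical nested object: a crate holding a workflow, a sub-workflow and a dataset.
CRATE = {
    "name": "crate",
    "license": "CC-BY-4.0",
    "parts": [
        {
            "name": "workflow",
            "license": "Apache-2.0",
            "parts": [{"name": "sub-workflow", "license": "GPL-3.0-only"}],
        },
        {"name": "dataset"},  # no license declared
    ],
}


def collect_licenses(obj: dict, path: str = "") -> dict:
    """Map each nested part's path to its declared license (None if absent)."""
    here = f"{path}/{obj['name']}"
    found = {here: obj.get("license")}
    for part in obj.get("parts", []):
        found.update(collect_licenses(part, here))
    return found


licenses = collect_licenses(CRATE)
missing = [p for p, lic in licenses.items() if lic is None]
distinct = sorted({lic for lic in licenses.values() if lic})
print("distinct licenses:", distinct)
print("parts without a license:", missing)
```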
FAIR+
• Reusable and Usable workflows: testing &
parameter validation. Documentation.
FAIR software paradigm is pervasive
• Applies to RO-Crate Research Objects
FAIR takes a village, of course
C. Goble, S. Cohen-Boulakia, S. Soiland-Reyes,
D. Garijo, Y. Gil, M.R. Crusoe, K. Peters & D.
Schober. FAIR computational workflows. Data
Intelligence 2 (2020), 108–121.
doi: 10.1162/dint_a_00033
What about DataVerse?
Workflows have data and software
characteristics
RO-Crate preserves metadata and the objects
– workflow, data, datasets whatever…
• Archive/republish independent of
WorkflowHub
• Move content from one repository to
another, one service to another
• Point to content and don’t move it
• Sharing reproducible results & methods
Set data and
workflows and their
metadata free!
RO-Crate RepositoryCollection, RepositoryObject
represents records in a repository to describe an export from a repository or
digital library
https://www.researchobject.org/ro-crate/community
https://about.workflowhub.eu/community/

FAIR Workflows and Research Objects get a Workout

  • 1. FAIR Workflows and Research Objects get a Workout Carole Goble The University of Manchester, UK carole.goble@manchester.ac.uk DataVerse Community Conference 2021, 15th June 2021
  • 2. EOSC-Life pan-national data & method thematic commons for bioscience data and methods Using and sharing data, tools and workflows in the cloud
  • 3. Infrastructure Zoo Flows around a Federated & Diverse System 1466 data repositories / archives 916 data format and metadata standards* Not including the institutional or national repositories like DataVerse https://fairsharing.org/ accessed May 2021 From compounds to clinical trials Primary data - Secondary use
  • 4. Infrastructure Zoo Flows around a Federated & Diverse System https://fairsharing.org/ accessed May 2021 Community domain enclaves fragmented resources flow across platforms & sovereignties Workflows as an entry point and integration mechanism Legacy • data repositories & data platforms • processing and workflow platforms
  • 5. CryoEM Image Analysis Metagenomic Pipelines Drug Discovery Quality control Replication Scrutiny Shared know-how Repetition
  • 6. SARS-CoV-2 pre-processing, monitoring, analysis https://elixir-europe.org/news/covid-19-variants-galaxy
  • 7. Beyond Data: Computational Workflows as method objects to be shared, ported and reused & repurposed Multi-step Leverage third party codes Scalable processing of data Transparent research Computational Workflows Specification description Software Execution A special kind of software Separation of the workflow specification from its execution Precise description of a procedure: multi-step process coordinated by input/output data relationships (data types). Execution of computational processes (run a code, invoke a service…). Data is consumed and produced by each step.
  • 8. Beyond Data: Computational Workflows as method objects to be shared, ported and reused & repurposed Multi-step Leverage third party codes Scalable processing of data Transparent research Computational Workflows <my scripts> A Zoo of Workflow Systems and “systems”* Native repositories *https://s.apache.org/existing-workflow-systems
  • 10. Beyond Data: Multi-part Research Objects dependencies and associates scattered across repositories and within repositories made at different times by different people Workflow itself Workflow associated Objects Specification descriptions Parameters Input Datasets Output Datasets Runtime details & Provenance Documentation Bind to Dependencies - Containers - Codes - Sub-workflows Bind to particular test engines Publications Image Other workflows Sub workflows Software Execution Inputs and outputs Author
  • 11. Beyond Data: Computational Workflows as multi-part method objects to be shared, ported and reused & repurposed Services for FAIR Workflows • Describe workflows with PIDs and metadata • Flow: Move workflows between services and platforms • Parts: Package (scattered) objects linked together by context (metadata files with their objects) Honouring • the legacy and diverse ecosystem • buy-in from platforms Be KISSy • practical and developer friendly standards, and webby mechanisms • extensible openendedness – unknown unknowns & diversity… Workflow Registry Workflow Systems Repos Containers Deploys Testing Monitoring
  • 12. Open Registry for Workflows Perpetual Development in the open by an open community https://workflowhub.eu Towards FAIR workflows and FAIR registry • Find and Access Workflows – Workflows may remain in their native repositories in their native form. Or can deposit. – Register (push) / Harvest (pull) • Workflows interoperability and reusability – Using metadata standards framework Makers are the custodians • people organisation: spaces, teams, organisations … • workflow organisation: collections, tagging, facets ... • credit: for submitters and authors Open to any platform, any subject, any person WorkflowHub Club
  • 13. TRS - Tool Registry Service API access
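
Slide 13 names the Tool Registry Service (TRS) API as the programmatic access route to the registry. As a minimal sketch, the helpers below build request URLs following the GA4GH TRS v2 path layout. The base URL, the tool and version identifiers, and the helper names are assumptions made for illustration; check the registry's own API documentation before relying on them.

```python
from urllib.parse import quote

# Assumed base URL of a TRS v2 endpoint (illustrative, not confirmed by the slides).
TRS_BASE = "https://workflowhub.eu/ga4gh/trs/v2"


def tools_url(base: str = TRS_BASE) -> str:
    """URL that lists the tools (workflows) known to the registry."""
    return f"{base}/tools"


def versions_url(tool_id: str, base: str = TRS_BASE) -> str:
    """URL that lists the registered versions of one tool."""
    return f"{base}/tools/{quote(tool_id, safe='')}/versions"


def descriptor_url(tool_id: str, version_id: str, descriptor_type: str,
                   base: str = TRS_BASE) -> str:
    """URL of the primary descriptor (e.g. the CWL file) for one tool version."""
    tool = quote(tool_id, safe="")
    version = quote(version_id, safe="")
    return f"{base}/tools/{tool}/versions/{version}/{descriptor_type}/descriptor"


# Hypothetical identifiers, used only to show the URL shapes.
print(tools_url())
print(versions_url("42"))
print(descriptor_url("42", "1", "CWL"))
```

Identifiers are percent-encoded because TRS tool ids may themselves contain slashes. The sketch only builds URLs; fetching them is left to whichever HTTP client the caller prefers.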
  • 14. FAIR Workflows are FAIR Software living and with dependencies… workflow history/provenance Indicators of Status Workflow monitoring Register versions (Support GitHub Actions) Incremental metadata and supplementary materials (Tracking & Lifting out subworkflows)
  • 15. Which Workflow Objects are FAIR? • workflow specification with test or exemplar data? • implementation of that design in a particular WfMS? • instantiation of that implementation ready to run with input data, parameters set, computational services spun up? • run result with intermediate/final data products and provenance logs? • In practice this is a bit blurry. A metadata framework extensible enough to cope
  • 16. FAIR Workflows are FAIR Digital Objects Descriptive, machine actionable metadata framework from the community practical and developer friendly standards, extensible openendedness Standardised metadata about the workflows for registration, discovery Schema.org profile and types ComputationalWorkflow FormalParameter ComputationalTool Canonical workflow description of the workflow itself Executable and Abstract form Type the input and output data formats of the steps Ontology of types of data and data identifiers, data formats, operations in life sciences Upload and Download the parts? Exchange between services & platforms? Sharing & archiving the components of science
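
Slide 16 lists the types used to describe a workflow (ComputationalWorkflow, FormalParameter) and the idea of typing input and output data formats. The sketch below shows one way such a description might be assembled as JSON-LD entities. The function name, the `#`-prefixed identifiers, the workflow file name and the format URLs are hypothetical; the `input` and `output` properties follow the Bioschemas ComputationalWorkflow profile as used in RO-Crate, which is an assumption about how the slide's types are applied.

```python
import json


def describe_workflow(workflow_id, name, language_id, inputs, outputs):
    """Return JSON-LD entities for a workflow and its typed parameters.

    `inputs` and `outputs` are lists of dicts with 'name' and 'format' keys.
    """
    def parameter(p):
        return {
            "@id": f"#{p['name']}",
            "@type": "FormalParameter",
            "name": p["name"],
            # The data format of the parameter, given as a URL or media type.
            "encodingFormat": p["format"],
        }

    workflow = {
        "@id": workflow_id,
        "@type": ["SoftwareSourceCode", "ComputationalWorkflow"],
        "name": name,
        "programmingLanguage": {"@id": language_id},
        "input": [{"@id": f"#{p['name']}"} for p in inputs],
        "output": [{"@id": f"#{p['name']}"} for p in outputs],
    }
    return [workflow] + [parameter(p) for p in inputs + outputs]


# Hypothetical example values.
entities = describe_workflow(
    workflow_id="variant-calling.cwl",
    name="Example variant calling workflow",
    language_id="#cwl",
    inputs=[{"name": "reads", "format": "https://example.org/formats/fastq"}],
    outputs=[{"name": "variants", "format": "https://example.org/formats/vcf"}],
)
print(json.dumps(entities, indent=2))
```

The workflow refers to its parameters by `@id` rather than nesting them, so the same flat entity list can be dropped into a larger metadata graph.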
  • 17. Let's step back! Beyond Data: Multi-part Research = Multi-part ROs Each object has its own metadata and repositories Integrated view & context over fragmented resources using their PIDs and metadata Need a way of packaging up, describing the package and parts, citing, shipping around, storing, archiving, sharing. Reference real things. Like people, mice and equipment.
  • 18. Beyond Data: Multi-part Research Objects Describing a Dataset as a Digital Object A way of packaging up, describing the package and parts, citing, shipping around, storing, archiving, sharing. Even reference real things. Like people, mice and equipment. Image Courtesy of Peter Sefton: https://arkisto-platform.github.io/standards/ro-crate/
  • 19. The dataset may contain any kind of data resource, about anything, in any format as a file or URL. They can be scattered across repositories. Each resource can have a machine readable description in JSON-LD format. A human-readable description and preview can be in an HTML file that lives alongside the metadata. Provenance and workflow information can be included to assist in data and research-process re-use. RO-Crate Digital Objects may be packaged for distribution, e.g. via Zip, BagIt and OCFL Objects. Courtesy Peter Sefton, https://arkisto-platform.github.io/standards/ro-crate/ A data repository perspective
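
Slide 19 describes a dataset whose resources, local files or remote URLs, each carry a JSON-LD description. A minimal sketch of the corresponding `ro-crate-metadata.json`, assuming the RO-Crate 1.1 layout (a metadata descriptor plus a root Dataset with `hasPart`), might look like this. The function name, file paths, URL, date and licence are illustrative values, not taken from the slides.

```python
import json

RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"


def build_crate_metadata(name, description, date_published, license_id, parts):
    """Build the JSON-LD document for a minimal RO-Crate.

    `parts` is a list of entity dicts; each '@id' is either a path relative
    to the crate root or an absolute URL for content stored elsewhere.
    """
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "license": {"@id": license_id},
        # The root dataset only references its parts by identifier.
        "hasPart": [{"@id": p["@id"]} for p in parts],
    }
    return {"@context": RO_CRATE_CONTEXT, "@graph": [descriptor, root, *parts]}


# Hypothetical contents: one file inside the crate, one referenced by URL.
crate = build_crate_metadata(
    name="Example crate",
    description="A local result file plus a reference to remote data.",
    date_published="2021-06-15",
    license_id="https://creativecommons.org/licenses/by/4.0/",
    parts=[
        {"@id": "data/results.csv", "@type": "File", "name": "Results table"},
        {"@id": "https://example.org/large-dataset.csv", "@type": "File",
         "name": "Large dataset kept in its own repository"},
    ],
)
print(json.dumps(crate, indent=2))
```

Because the second part is only a URL, the crate acts as a bag of references: the metadata travels with the package while the large file stays where it is.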
  • 20. Not just for workflows! For any kind of object data, publications, SOPs, software … and data repositories! especially data repositories! Aggregate files, any URI-addressable content, another RO-Crate, along with contextual information, into a citable RO-Crate which has its own metadata. Can use as a bag of references: large/sensitive datasets citation aggregator FAIR here FAIR here
  • 21. Unbounded Research Objects Anything referenceable that may be scattered across different repositories and/or different datasets in the same repository. Self describing integrated view spanning over fragmented resources using PIDs and metadata Metadata held alongside heterogeneous data Infrastructure independent • Exchange between repositories, registries and services. • Avoid vendor lock-in
  • 22. Practical, lightweight approach Machine and human readable, search engine friendly and developer familiar, blah blah FAIR Object middleware/underware Standard Web Native PIDs + JSON-LD + Schema.org, off the shelf archiving formats Self-describing, Typed by profiles + add more schema.org and domain ontologies Extensible, descriptive and content openendedness, honouring legacy, diversity, and known and unknown unknowns - one size does not fit all, blah blah A Graph inside the RO-Crate PIDs connect the Graph to the outside world http://www.researchobject.org/ro-crate/
  • 23. RO-Crate variants: Profiles are extensible typing RO-Crates collect metadata Workflow-RO-Crate Workflow-Testing-RO-Crate Workflow-Run-RO-Crate *https://repository.publisso.de/resource/frl:6423291 https://www.researchobject.org/ro-crate/profiles.html BioComputeObject-RO-Crate Galaxy-Workflow-RO-Crate maDMP RO-Crate* DataRepo-RO-Crate DataRepo-DataCube-RO-Crate Aggregated DataCitation RO-Crate Secure Bags of PIDs to sensitive / large data
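
Slide 23 presents profiles as a way of typing RO-Crates. As a rough sketch of what that typing means in practice, the check below tests whether a crate looks like a Workflow RO-Crate, taking that profile to require a root dataset whose `mainEntity` is typed as a ComputationalWorkflow. It assumes the root dataset has the identifier `./`, the helper names are hypothetical, and it is a loose shape check rather than a validator.

```python
def as_list(value):
    """Normalise a JSON-LD value that may be absent, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_entity(crate, entity_id):
    """Return the entity with the given @id from the crate's @graph, or None."""
    for entity in crate.get("@graph", []):
        if entity.get("@id") == entity_id:
            return entity
    return None


def looks_like_workflow_crate(crate) -> bool:
    """True if the root dataset's mainEntity is typed ComputationalWorkflow."""
    root = find_entity(crate, "./")  # assumption: root dataset id is "./"
    if root is None:
        return False
    for ref in as_list(root.get("mainEntity")):
        main = find_entity(crate, ref.get("@id"))
        if main and "ComputationalWorkflow" in as_list(main.get("@type")):
            return True
    return False


# Hypothetical crates used to exercise the check.
workflow_crate = {"@graph": [
    {"@id": "./", "@type": "Dataset", "mainEntity": {"@id": "main.cwl"}},
    {"@id": "main.cwl",
     "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]},
]}
data_crate = {"@graph": [
    {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "data.csv"}]},
    {"@id": "data.csv", "@type": "File"},
]}
print(looks_like_workflow_crate(workflow_crate))
print(looks_like_workflow_crate(data_crate))
```

The same pattern, looking up entities by `@id` and inspecting their `@type`, would extend to other profiles by swapping the expected type.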
  • 24. A step towards FAIR Digital Objects* “To be FAIR each digital object type has its own metadata requirements, and may have its own repositories and registries” FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units: https://doi.org/10.3390/publications8020021 https://fairdo.org
  • 25. FAIR Digital Objects Actionable knowledge unit Digital butterfly – digital twins Bags of references courtesy Dimitris Koureas Coordinator DiSSCo EU Research Infrastructure Specimen object image courtesy of Alex Hardisty
  • 26. Specimen Data Refinery Workflows to Digitise Natural History Specimens FAIR Digital Objects -> Packaged + Actionable + FAIR Digital Object Framework Open Digital Specimen Workflow Infrastructure courtesy of Alex Hardisty and Laurence Livermore
  • 27. Real Use Cases Considered Essential! • Building out in the open accelerated progress RO-Crate is metadata middleware • smart use of wheels already invented • it takes a village: get tools, services on board • developer friendly, firm best practice A little bit of semantics goes a long way… • Schema.org + JSON-LD …prepare for more Known and Unknown unknowns, One size does not fit all • descriptive openendedness, multi-interpretation Metadata sucks • auto-curation is the way forward folks! What about the workout?
  • 28. What about FAIR? FAIR at multiple levels & granularities • Workflows & RO-Crates are composite and nested, with dependencies • FAIR all the way down • Not always compatible – e.g. licenses FAIR+ • Reusable and Usable workflows: testing & parameter validation. Documentation. FAIR software paradigm is pervasive • Applies to RO-Crate Research Objects FAIR takes a village, of course C. Goble, S. Cohen-Boulakia, S. Soiland-Reyes, D. Garijo, Y. Gil, M.R. Crusoe, K. Peters & D. Schober. FAIR computational workflows. Data Intelligence 2 (2020), 108–121. doi: 10.1162/dint_a_00033
  • 29. What about DataVerse? Workflows have data and software characteristics RO-Crate preserves metadata and the objects – workflow, data, datasets whatever… • Archive/republish independent of WorkflowHub • Move content from one repository to another, one service to another • Point to content and don’t move it • Sharing reproducible results & methods Set data and workflows and their metadata free! RO-Crate RepositoryCollection, RepositoryObject represents records in a repository to describe an export from a repository or digital library