Presentation at the FAIReScience 2021 online workshop (https://researchsoft.github.io/FAIReScience/), virtually co-located with the 17th IEEE International Conference on eScience (eScience 2021)
FAIR Computational Workflows
1. FAIR Computational Workflows
Professor Carole Goble
The University of Manchester UK
EU Research Infrastructures ELIXIR, IBISBA, EOSC-Life
BioExcel Centre of Excellence
Software Sustainability Institute UK
FAIRDOM Consortium
carole.goble@manchester.ac.uk
FAIReScience, IEEE eScience, 20th September 2021
2. Computational Workflows for Data-intensive Bioscience
Prepare, analyze, and share increasing volumes of complex data
CryoEM Image Analysis
Metagenomic Pipelines
Protein-Ligand Simulation
[Adam Hospital]
[Rob Finn]
[Carlos Oscar Sorzano Sanchez]
Nature 573, 149-150 (2019)
https://doi.org/10.1038/d41586-019-02619-z
Multi-step processes to coordinate and execute multiple codes and handle data and processing dependencies
Typically dataflows
Benefit from FAIR data with machine-processable metadata
A precise description
A special kind of software
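To make "multi-step processes ... handle data and processing dependencies" concrete, here is a minimal sketch that treats a workflow as a graph of steps. Each step declares which upstream results it consumes, and the steps run in dependency order via Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The step names and toy functions are hypothetical and are not taken from the pipelines named above. A real WfMS adds scheduling, parallelism, containers and retries on top of this ordering.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable

# A step is a function plus the names of the upstream results it consumes.
Step = tuple[Callable[..., Any], list[str]]

def run_workflow(steps: dict[str, Step], inputs: dict[str, Any]) -> dict[str, Any]:
    """Run every step once, in an order that respects its data dependencies."""
    # Dependency graph: each node maps to the set of nodes it depends on.
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    results: dict[str, Any] = dict(inputs)  # workflow inputs are ready from the start
    for name in TopologicalSorter(graph).static_order():
        if name in results:
            continue  # a workflow input, not a step to execute
        func, deps = steps[name]
        # Feed the upstream results to the step in the order they were declared.
        results[name] = func(*(results[d] for d in deps))
    return results

# Hypothetical four-step dataflow: clean -> (count, gc) -> report
steps: dict[str, Step] = {
    "clean": (lambda reads: [r.strip().upper() for r in reads], ["reads"]),
    "count": (lambda cleaned: len(cleaned), ["clean"]),
    "gc": (lambda cleaned: sum(r.count("G") + r.count("C") for r in cleaned), ["clean"]),
    "report": (lambda n, gc: f"{n} reads, {gc} G/C bases", ["count", "gc"]),
}
out = run_workflow(steps, {"reads": [" acgt", "ggcc "]})
print(out["report"])  # 2 reads, 6 G/C bases
```

The specification (`steps`) is plain data, separate from the code that executes it. `TopologicalSorter` rejects a cyclic specification with `graphlib.CycleError`.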
3. Workflow Management Systems FAIR bits
Abstraction: Separation of the workflow specification from its execution & tools
FAIR stratification, FAIR all the way down: FAIR Software, FAIR Data, FAIR Services
4. Workflow Management Systems FAIR bits
Composition & Portability: different components, codes, languages, third parties
Image credit: BioExcel Centre of Excellence
Composition: modularisation, FAIR parts & dependencies, propagation of FAIR properties
FAIR all the way down, versions, parts recycled, repurposed, remixed, citable credit
5. Workflow System Landscape
Intertwingled: mix and match
Scripting environments
Interactive Electronic Research Notebooks
Repositories, Registries
Workflow Management Systems & execution platforms
https://s.apache.org/existing-workflow-systems
298 systems, general and specialised
General repositories
Identifiable
Community
6. FAIR Principles for Workflows
Hybrid Processual Digital Objects
Method “Data” Objects
Workflows as FAIR Software
FAIR+R and FAIR++
Quality, maturity, maintainability
The principles revised
Workflows as FAIR Digital Objects
Data-like method objects
Associated objects
The principles adapted
Workflows as FAIR Data Instruments
FAIRification of the dataflow
The data principles supported
C. Goble, S. Cohen-Boulakia, S. Soiland-Reyes, D. Garijo, Y. Gil, M.R. Crusoe, K. Peters & D. Schober. FAIR computational workflows. Data Intelligence 2 (2020), 108–121. doi: 10.1162/dint_a_000
Workflow Objects
Software Objects
Data FAIRification
7. Efforts: Workflow Findability and Accessibility
Registries: lifecycle support for living workflows and associated objects
Identifiers: DOIs, ORCID, ROR, etc.
Licensing, credit, attribution
Support versions, reuse & remix
Workflow libraries
Access workflows at source, GitHub support
Automatically or manually harvested metadata
Registry – execution integration
Execution monitoring services
Onboard WfMS platforms
Metadata standards framework
Metadata by stealth
https://workflowhub.eu
Publishing Services
Journals
scripts
Repos
Containers Deploys
Tools
https://dockstore.org/
Registries
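Registries depend on persistent identifiers such as ORCID iDs for credit and attribution. One small check a registry or metadata harvester might run before accepting an ORCID iD is to validate its final check character, which ORCID documents as ISO 7064 MOD 11-2. The sketch below does that. It checks syntax and checksum only, not whether the iD is actually registered. The function names are illustrative and do not belong to any registry's API.

```python
import re

# 16 characters in four hyphenated groups; the last may be "X" (value 10).
ORCID_SHAPE = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")

def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """Syntax and checksum check only; it does not confirm the iD is registered."""
    if not ORCID_SHAPE.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]

print(is_valid_orcid("0000-0002-1825-0097"))  # True (sample iD used in ORCID's documentation)
print(is_valid_orcid("0000-0002-1825-0098"))  # False (wrong check digit)
```

The same "cheap local validation before a network lookup" pattern applies to other identifiers that carry a checksum.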
8. Efforts: Workflow Metadata Frameworks
Metadata for machines & people, for WfMS, Registries & Services
Common metadata about the workflow, tools & parameters
Canonical workflow description of the steps of the workflow
Typed inputs and outputs of the steps
Run provenance / histories / tests
RO-Crate format for packaging a workflow, its metadata and companion objects (links to containers, data, etc.) for exchange, archiving, reporting, citing
FAIR Digital Object
Open Communities
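Here is a rough sketch of the metadata file an RO-Crate package for a single workflow might contain, built with only the standard library. The entity layout follows our reading of RO-Crate 1.1 and the Workflow RO-Crate profile: a metadata descriptor, a root `Dataset`, and a workflow file typed as `ComputationalWorkflow` and marked as the `mainEntity`. The file name, workflow name and licence are placeholders. A real crate should be checked against the specifications or built with a dedicated library such as ro-crate-py.

```python
import json
import tempfile
from pathlib import Path

def write_workflow_crate(crate_dir: Path, workflow_file: str, name: str, license_url: str) -> Path:
    """Write a minimal ro-crate-metadata.json describing one workflow file (sketch)."""
    graph = [
        {   # Metadata descriptor: states that this file describes the crate root "./"
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # Root data entity: the package as a whole
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "license": {"@id": license_url},
            "hasPart": [{"@id": workflow_file}],
            "mainEntity": {"@id": workflow_file},
        },
        {   # The workflow itself: a file that is also source code and a workflow
            "@id": workflow_file,
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
            "name": name,
            "programmingLanguage": {"@id": "#cwl"},
        },
        {   # Contextual entity describing the workflow language
            "@id": "#cwl",
            "@type": "ComputerLanguage",
            "name": "Common Workflow Language",
            "url": {"@id": "https://www.commonwl.org/"},
        },
    ]
    crate = {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}
    out = crate_dir / "ro-crate-metadata.json"
    out.write_text(json.dumps(crate, indent=2), encoding="utf-8")
    return out

with tempfile.TemporaryDirectory() as d:
    path = write_workflow_crate(Path(d), "workflow.cwl", "Example workflow",
                                "https://spdx.org/licenses/Apache-2.0")
    crate = json.loads(path.read_text(encoding="utf-8"))
print(crate["@graph"][1]["mainEntity"])  # {'@id': 'workflow.cwl'}
```

The workflow file itself, plus any test data or container references, would sit alongside this metadata file in the crate directory. Because the metadata is JSON-LD, registries and humans read the same description.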
9. Efforts: Workflow Interoperability
1. Workflow spec & WfMS interoperability: describe workflows independently of the WfMS. Platform-independent pipeline exchange and comparison.
2. Workflow composability: software interoperates through APIs and metadata standards (FAIR4RS*). Workflow-ready tools. Recycle tested & validated canonical workflow blocks.
https://openwdl.org/
https://www.commonwl.org
Design for FAIR Data & FAIR Workflow Reuse
Review
Curation
Certification
Governance
Licence combinations
Access permissions
Local -> Global identifiers
Best Practice
* FAIR4RS: first draft of the FAIR4RS principles
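Composability depends on blocks declaring typed inputs and outputs, so that a tool or registry can reject a bad connection before anything runs. The sketch below assumes a hypothetical `Block` model in which each port is labelled with a data format string, and reports why one block's output cannot feed another block's input. Real systems may use ontology terms (for example EDAM formats) rather than bare strings such as `"fastq"`.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    """A reusable workflow block with typed ports (hypothetical model)."""
    name: str
    inputs: dict[str, str] = field(default_factory=dict)   # port name -> data format
    outputs: dict[str, str] = field(default_factory=dict)  # port name -> data format

def link_problems(src: Block, out_port: str, dst: Block, in_port: str) -> list[str]:
    """Return reasons why src.out_port cannot feed dst.in_port (empty list means OK)."""
    problems = []
    if out_port not in src.outputs:
        problems.append(f"{src.name} has no output '{out_port}'")
    if in_port not in dst.inputs:
        problems.append(f"{dst.name} has no input '{in_port}'")
    # Only compare formats once both ports are known to exist.
    if not problems and src.outputs[out_port] != dst.inputs[in_port]:
        problems.append(
            f"format mismatch: {src.name}.{out_port} is {src.outputs[out_port]}, "
            f"{dst.name}.{in_port} expects {dst.inputs[in_port]}"
        )
    return problems

trim = Block("trim", inputs={"reads": "fastq"}, outputs={"trimmed": "fastq"})
align = Block("align", inputs={"reads": "fastq", "reference": "fasta"}, outputs={"alignment": "bam"})
call = Block("call_variants", inputs={"alignment": "bam"}, outputs={"variants": "vcf"})

print(link_problems(trim, "trimmed", align, "reads"))        # []
print(link_problems(align, "alignment", call, "alignment"))  # []
print(link_problems(trim, "trimmed", call, "alignment"))
# ['format mismatch: trim.trimmed is fastq, call_variants.alignment expects bam']
```

The design choice is to return all problems as data instead of raising. A registry can then show them to a user or attach them to a review, which fits the curation and certification steps listed on this slide.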
10. Efforts: Workflow Reusability and Usability
FAIR+R, FAIR++, FAIR4RS
Reusable: “can be understood, modified, built upon or incorporated into other software workflows”
Composability + Associated Objects + Metadata
Usable: “can be executed”
Containers & packaging, testing & monitoring, execution standards, APIs
Tool Registry Service API
checker workflows
test data
A2. Metadata are accessible, even when the workflow is no longer available
Enough metadata that a workflow remains read-reproducible as a method description even if it no longer runs
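Checker workflows and test data let a registry or monitoring service confirm that a workflow is still "usable". The minimal sketch below runs a workflow on bundled test data and compares a checksum of the result with one recorded when the test was written. The names `check_workflow` and `digest` and the pass/fail/error result shape are hypothetical. This is not the interface of Dockstore's checker workflows or of any registry.

```python
import hashlib
import json
from typing import Any, Callable

def digest(value: Any) -> str:
    """SHA-256 of a canonical JSON rendering, so equal results give equal digests."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def check_workflow(workflow: Callable[[Any], Any], test_input: Any,
                   expected_sha256: str) -> dict[str, str]:
    """Run a workflow on test data and compare against a recorded digest."""
    try:
        actual = workflow(test_input)
    except Exception as exc:  # report a crash instead of raising, like a monitoring probe
        return {"status": "error", "detail": repr(exc)}
    actual_sha256 = digest(actual)
    status = "pass" if actual_sha256 == expected_sha256 else "fail"
    return {"status": status, "actual_sha256": actual_sha256}

# Hypothetical workflow under test: GC fraction of a set of sequences.
def gc_fraction(seqs: list[str]) -> float:
    bases = "".join(seqs)
    return round((bases.count("G") + bases.count("C")) / len(bases), 3)

expected = digest(0.75)  # in practice recorded when the test data was published
print(check_workflow(gc_fraction, ["ACGT", "GGCC"], expected)["status"])  # pass
print(check_workflow(gc_fraction, ["AAAA"], expected)["status"])          # fail
print(check_workflow(gc_fraction, [], expected)["status"])                # error
```

Storing only a digest keeps the expected result small enough to publish with the workflow's metadata. Exact-match digests are too strict for outputs with floating-point noise or timestamps, so real checkers often compare with tolerances instead.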
11. Efforts: Workflows as functions for FAIR Data
Data FAIRification of Workflows, assisted by WfMS & reporting
Challenge: diverse API & AAI landscape, formats and packaging
Review
Curation
Certification
Governance
Best Practice
Golden Examples
Canonical workflows
Manage AAI, format, packaging choices
Design for FAIR Data and Reuse
12. FAIR Computational Workflows
Hybrid Processual Digital Objects
Data + Software FAIR Principles
Data FAIRification methods
WfMS support
FAIR takes a village
Community of projects, WfMS, platforms & environments, stakeholders.
Long-tail pattern: collective action by a few WfMS and services nails the 80:20.
FAIR by stealth.
Borgman, C. L., & Bourne, P. E. (2021). Why it takes a village to manage and share data. Harvard Data Science Review (under review), arXiv:2109.01694v1.
13. EOSC-Life https://www.eosc-life.eu/
RO-Crate https://www.researchobject.org/ro-crate/
WorkflowHub https://workflowhub.eu/
Galaxy Europe https://galaxyproject.eu/
Bioschemas https://bioschemas.org/
Common Workflow Language https://www.commonwl.org/
Dockstore https://dockstore.org/
WorkflowsRI https://workflowsri.org/
Acknowledgements
Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael Crusoe, Kristian Peters, Daniel Schober