Keynote presented at the workshop FAIRe Data Infrastructures, 15 October 2020
https://www.gmds.de/aktivitaeten/medizinische-informatik/projektgruppenseiten/faire-dateninfrastrukturen-fuer-die-biomedizinische-informatik/workshop-2020/
Remarkably, it was only in 2016 that the ‘FAIR Guiding Principles for scientific data management and stewardship’ appeared in Scientific Data. The paper was intended to launch a dialogue within the research and policy communities: to start a journey to wider accessibility and reusability of data, and to prepare for automation-readiness by supporting findability, accessibility, interoperability and reusability for machines. Many of the authors (including myself) came from biomedical and associated communities. The paper succeeded in its aim, at least at the policy, enterprise and professional data infrastructure level. Whether FAIR has impacted the researcher at the bench or bedside is open to doubt. It certainly inspired a great deal of activity, many projects and a lot of positioning of interests, and it raised awareness. COVID has injected impetus and urgency into the FAIR cause (good) and also highlighted its politicisation (not so good).
In this talk I’ll make some personal reflections on how we are faring with FAIR: as one of the original principles authors; as a participant in many current FAIR initiatives (particularly in the biomedical sector and for research objects other than data) and as a veteran of FAIR before we had the principles.
How are we Faring with FAIR? (and what FAIR is not)
1. How are we
Faring with FAIR
(and what FAIR is not)
Carole Goble
The University of Manchester
FAIRDOM
ELIXIR, EOSC-Life, IBISBA, BioExcel CoE, FAIRplus
carole.goble@manchester.ac.uk
The views expressed in this talk are my own
Workshop: FAIRe Data Infrastructures, 15 October 2020
2. Data discovery and reuse at scale
through good data management
2016
A set of PRINCIPLES to enhance the
value of all digital resources and their
reuse by PEOPLE and by MACHINES
Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18 [Credit: Susanna Sansone]
3. Branding a trend, stimulating a movement …
2014
2016
2015
Open Science
Data Sharing
Recognition & Credit
Reproducibility Data-driven Science
Automation, AI
4.
5. The compulsory COVID-19 reference
“COVID is a good example .. there must be loads of legacy data.
We’re desperately trying to go back and look at what we knew from
SARS 10 years ago” – Pharma manager, FAIRplus project
https://www.covid19dataportal.org/
https://www.nature.com/articles/s41431-020-0635-7
https://www.rd-alliance.org/group/rda-covid19-rda-covid19-omics-rda-covid19-epidemiology-rda-covid19-clinical-rda-covid19-1
https://doi.org/10.15497/rda00052
6. The compulsory COVID-19 reference
+ve • Data sharing boost –
• Impossible becomes normal
• Data infrastructure investments
• Mobilising rapid response
-ve • Political, technical and territorial issues
• Licensing, access to datasets, quality …
• Short-term vs long term sustainability
• Collection and governance bottlenecks
https://covid19.galaxyproject.org/
7. The compulsory what are the FAIR principles slide
… in a break out box,
without explanation or justification.
Aspirational, not a standard.
Relaunch a dialogue within the
research and policy communities.
Reboot a journey to wider
accessibility and reusability of data.
Prepare the community for
automation-readiness by supporting
FAIR for machines.
In the paper… 15 overlapping and ambiguous ….
Jacobsen et al FAIR Principles: Interpretations and
Implementation Considerations, J Data Intelligence (2020)
Mons et al Cloudy, increasingly FAIR; Revisiting the FAIR Data guiding
principles for the European Open Science Cloud. Information Services &
Use. 37. 1-8. 10.3233/ISU-170824 (2017)
8. FAIR Principles in spirit
identifiers, metadata, availability, standards
[adapted from Susanna Sansone]
Findable
Accessible
Interoperable
Reusable
Globally unique, resolvable, and persistent identifiers
▪ To retrieve and connect data
Community defined descriptive metadata that is catalogued / searchable
▪ To enhance discoverability and reusability
Common terminologies and standards
▪ To use the same terms and they mean the same thing
Detailed provenance
▪ To contextualize the data and facilitate reproducibility
Terms of access
▪ Open as possible, closed as necessary
Terms of use
▪ Clear licences, ideally to enable innovation and reuse
Automation
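A minimal sketch of the elements on this slide gathered into one metadata record. The class name, field names and the example.org identifier are illustrative assumptions, not something defined in the talk or in the FAIR paper. The check only reports which elements are absent; it says nothing about their quality.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical record holding the elements named on the slide."""
    identifier: str = ""       # globally unique, resolvable, persistent
    description: str = ""      # community-defined descriptive metadata
    vocabulary_terms: list = field(default_factory=list)  # common terminologies
    provenance: str = ""       # context for the data, aids reproducibility
    access_terms: str = ""     # open as possible, closed as necessary
    licence: str = ""          # terms of use


def missing_elements(record: MetadataRecord) -> list:
    """Return the names of the elements that are empty in the record."""
    return [name for name, value in vars(record).items() if not value]


# Placeholder values for illustration only.
record = MetadataRecord(
    identifier="https://example.org/id/dataset-1",
    description="Example dataset description",
    licence="CC-BY-4.0",
)
print(missing_elements(record))
```

A machine can only act on what is recorded, so an empty `access_terms` or `provenance` field is a gap for automation even when a human could find the answer by asking the provider.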
9. FAIR Services in practice
identifiers, metadata, availability, standards
Findable
Accessible
Interoperable
Reusable
Persistent Identifiers
Metadata Services
Access Control
Repositories & registries
Data applications
“the internet of data and services”
10. Not one size fits all
FAIR is a set of guiding principles that
provide for a contract of expectation
between data providers and users.
A continuum of features, attributes and
behaviours, via many different
implementations for different use cases.
Communities will need to develop their own FAIR profile:
• for their portfolios of data, processes, governance, policies, assessment
A limited pan-discipline profile.
12. Credit to:
A global movement rather quickly…a FAIR frenzy!
Many of these
projects will
speak today.
Researcher,
clinician
confusion.
Community
coordination
cacophony
13. Movement Start up Phase:
Picking apart the Principles, Inventing Indicators, Assessment and Maturity Frameworks
1. (Re)define the principles and what they mean
2. Measure the “FAIRness level” of data
The target, before & after levels of data FAIRness in various
“FAIRification” processes
3. Measure an organisation’s capability & performance
for FAIR data generation & management
Support strategic investment decisions, cost/benefit analysis, processes &
monitoring, capacity building, change management for FAIR by Design
DOI: 10.15497/rda00050
Dataset maturity model
Data Management
Infrastructure maturity model
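As a rough sketch of what "before and after levels of data FAIRness" could look like in practice, the snippet below scores a dataset as the fraction of indicators met per principle. The indicator names and the equal weighting are assumptions made for illustration; they are not taken from the RDA maturity model or any other framework mentioned in the talk.

```python
def fairness_level(indicators: dict) -> dict:
    """Fraction of indicators met for each principle (F, A, I, R)."""
    return {
        principle: sum(checks.values()) / len(checks)
        for principle, checks in indicators.items()
    }


# Hypothetical indicator results before and after a FAIRification process.
before = {
    "F": {"has_persistent_id": False, "is_catalogued": True},
    "A": {"access_terms_stated": True},
    "I": {"uses_common_vocabulary": False},
    "R": {"has_licence": False, "has_provenance": False},
}
after = {
    "F": {"has_persistent_id": True, "is_catalogued": True},
    "A": {"access_terms_stated": True},
    "I": {"uses_common_vocabulary": False},
    "R": {"has_licence": True, "has_provenance": False},
}

before_level = fairness_level(before)
after_level = fairness_level(after)
change = {p: after_level[p] - before_level[p] for p in before_level}
print(change)
```

A per-principle score like this makes the target explicit, but which indicators count, and how much, is exactly the part a community has to agree on.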
14. What were the principles about?
Contract
Compliance
Certification?
Judgement
Endorsement
Regulation
Trusted Repositories
Commitment of a community of providers
Federation of FAIR data, registries and services
Comparison, Monitoring, Review, Quality
Assessment
Expectation setting of a data provider
Self-evaluation
Awareness, Reporting
Community respectful
Context aware
Health data - Regulation
15. What were the principles about?
Contract
Compliance
Certification?
EOSC Strategic Research &
Innovation Agenda consultation 2020:
metrics & certification least popular
action area
FAIRware
The Tyranny of Metrics
16. FAIR is a Spectrum
Not all data are equal, not all will be worth it
Spectrum of FAIR indicators
Different levels of maturity and importance to
different stakeholders and communities
Communities define levels, depths, coverage
[Barend Mons]
FAIRify
• just in case
• just in time
• just enough
Dataset portfolio
17. FAIR is not
Not Fuzzy “enhancing the ability of machines to automatically find and use data or any
digital object, and support its reuse by individuals” INCF Statement
Not Free
Not Fast
Not Simple
Needs experts, stewards, infrastructure, processes, maintenance …
“FAIR is non-trivial, and domain specific at anything other than the most
superficial level” - Mark Wilkinson
From high effort high gain to low effort light gain
All require consensus, process change and maintenance.
• Mons et al Cloudy, increasingly FAIR; Revisiting the FAIR Data guiding principles for the European Open Science Cloud. Information Services & Use. 37. 1-8. 10.3233/ISU-170824.
• Dunning et al. Are the FAIR Data Principles fair? IDCC17
Not One Approach Not about turning everything into RDF
18. Coverage and Implementation Choices Vary
Findable
Accessible
Interoperable
Reusable
Shallow wide,
low cost,
loose federation
restricted value
Deep narrow,
tight federation
harmonisation
high cost, high value
https://fairplus.github.io/cookbook-dev/intro
https://bioschemas.org/
https://www.go-fair.org/2020/07/08/a-three-point-framework-for-fairification/
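The "shallow wide, low cost" end of the spectrum can be illustrated with generic schema.org markup embedded in a dataset's landing page, which is the kind of approach Bioschemas builds on. This is a sketch under assumptions: the property values and example.org URLs are placeholders, and the record is not claimed to conform to any particular Bioschemas profile.

```python
import json

# schema.org "Dataset" description with a handful of generic properties.
dataset_markup = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example dataset",
    "description": "Placeholder description for illustration.",
    "identifier": "https://example.org/id/dataset-1",
    "url": "https://example.org/datasets/1",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["example"],
}


def to_script_tag(markup: dict) -> str:
    """Wrap the markup as a JSON-LD script element for a landing page."""
    body = json.dumps(markup, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_script_tag(dataset_markup))
```

Markup like this helps findability across many resources at little cost, but it does not by itself harmonise the data; that is the "deep narrow" end, with higher cost and higher value.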
19. FAIR by Design
At the start of a collection, built in throughout the life cycle
change management, capacity building
FAIRifying Retrospectively
Legacy datasets, build a cohort,
cost benefit and FAIR readiness over a collection of datasets
20. Other FAIR Variants
FA(I)R Interoperability is
the hardest and
most costly to
define, implement
& maintain
Interoperability
is usually for a
purpose not
“just in case”
FAIR is not about
harmonising all
metadata to one
schema
FAIR+R
Reproducibility is
not the same as
Reusability
FAIR for all digital
objects – software,
workflows, SOPs,
models, containers,
training materials …
Depends on
availability and
metadata
Containers
FAIR++ Business and
change analysis.
Cost Benefit
Analysis.
Scientific /
Business Value
Quality control
Impact
Process maturity
Sustainability
EC Report:
Turning FAIR into Reality, 2018
https://www.natureindex.com/news-blog/what-scientists-need-to-know-about-fair-data, 2019
47% respondents
needed greater
clarification
21. Step back…
Why do we need FAIR?
Knowledge Turning, Information Flow [Josh Sommer, Chordoma Foundation, 2011]
Promote information flow
• groups and disciplines
• organisational boundaries at all levels
• technical infrastructure
• enable federation
Biomedical flow
• needs to control the information flow
– Ethical & governance frameworks, GDPR, consent
• federation feasibility varies
Fragmented and independent resources,
infrastructures, governance
Community enclaves.
Churn which leads to knowledge loss.
Scattered Fragmentation and
knowledge churn
22. FAIR is not synonymous with Open
respect authority and governance frameworks
23. Federated System of Catalogues and Repos
federation means common agreements and compliance
respecting accountability and responsibility
• Connected by PIDs
• Moving metadata around
• Common vocabularies
• Common cataloguing data elements
• Term cross-walks and mappings
• Shared API standards.
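A term cross-walk can be as simple as an agreed mapping between the field names of two catalogues. The sketch below assumes two hypothetical catalogues, A and B, with made-up field names; it is not the mapping used by any registry named in this talk.

```python
# Hypothetical cross-walk from catalogue A's field names to catalogue B's.
CROSSWALK = {
    "title": "name",
    "doi": "identifier",
    "subject": "keywords",
}


def apply_crosswalk(record: dict, crosswalk: dict) -> tuple:
    """Translate field names; return (mapped record, unmapped field names)."""
    mapped = {crosswalk[k]: v for k, v in record.items() if k in crosswalk}
    unmapped = sorted(k for k in record if k not in crosswalk)
    return mapped, unmapped


# Placeholder record from catalogue A.
record_a = {
    "title": "Example dataset",
    "doi": "https://example.org/id/dataset-1",
    "subject": ["example"],
    "internal_notes": "not covered by the agreement",
}

mapped, unmapped = apply_crosswalk(record_a, CROSSWALK)
print(mapped)
print(unmapped)
```

Reporting the unmapped fields matters as much as the mapping: they show where the common agreement ends and where metadata would be lost when it moves between catalogues.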
24. FAIR Profile for Biomedical Community
Findable
Accessible
Interoperable
Reusable
Awareness of data
Thin metadata sparingly shared
Data visiting
Not open but access controlled
Limited exchange under strict governance
Authentication and authorisation protocols
Data visiting
Ethics, consent, privacy preservation
Compliance, Governance
Standards and Regulation are different things
Federated analysis
Combining clinical, research and
public health data. Different scales,
collected for different purposes
Standards for Usage Access
• Beacon API for genomic variants
• Data Use Ontology – usage restrictions
• GA4GH Passports – access policies
Community standards
Trusted Research Environments
Community implementation profiles
GO-FAIR
implementation
networks
federated
25. FAIR Data Management for Projects
FAIR by Design because not everything can be fixed at the end
Platform to build Project Product Hubs
Projects to collaborate and get their
results organised and retained
• Metadata cataloguing: collection &
organisation
• PIDs, ontologies, metadata for machines
• Controlled sharing and access
• Plug into ecosystem of other registries
and repositories
• Data submission brokering to community
repositories
• Credit to contributors
FAIR itself
• FAIR Services are not necessarily FAIR
themselves (e.g. COVID Data Portal)
http://fair-dom.org
29. Mixture of shared and private objects
organised by metadata
https://www.health-atlas.de/
30. e.g. (Pillar III)
in-house data
in-house data
All LiSyM
Patient-related
clinical data
Aggregated data
API
External Tools
API
LiSyM: German Liver Systems Medicine Network
FAIR but never Open
[Wolfgang Mueller, HITS]
Share table
structure
Share
common
code
Share summaries
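The pattern on this slide (share table structure, share common code, share summaries) can be sketched as follows. This is an illustration under assumptions, not LiSyM's actual implementation: each site runs the same function on its in-house data and only the aggregate leaves the site. The site names and values are invented.

```python
def local_summary(values: list) -> dict:
    """Common code run at each site; returns aggregates only, never rows."""
    return {"n": len(values), "total": sum(values)}


def combine(summaries: list):
    """Pooled mean computed from per-site summaries alone."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return total / n if n else None


# In-house data stays where it is; these lists stand in for each site's table.
site_a = [2.0, 4.0]
site_b = [6.0]

shared = [local_summary(site_a), local_summary(site_b)]
print(combine(shared))
```

Agreeing the table structure first is what makes the common code portable between sites. Note that a summary over very few records can still be disclosive, so governance decides what may be shared even at the aggregate level.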
32. FAIR along the Pipeline
Understanding the pipeline,
moving metadata across
resources
FAIR stewardship effort and
pipeline design
ELIXIR RDMToolkit
33. The FAIR data infrastructure needed ….
tools, services, registries & catalogues, repositories…
Federation of fair repositories & registries
with mixed authority models
FAIR services – metadata services,
ontology servers, search engines, PID
services, validators, integrators,
annotators, assessors, brokers …
Interfaces - shared APIs, common terms,
cross-walks.
FAIR digital objects.
FAIRsFAIR confusagram, FAIR Ecosystem Components: Vision, V2.0
10.5281/zenodo.3734273 (missing the processing of data…) https://www.eoscsecretariat.eu/working-groups/fair-working-group
34. More than indicators, metrics and tech Infrastructure
“digital technologies (hardware, software), resources (data, services, digital libraries,
standards), comms (protocols, access rights, networks), people and organisational structures”
Data stewardship
Software Engs
Professionalisation
POSSIBLE
Processes
Organisational &
Cultural change
NORMATIVE
Incentives
Cost Benefit
REWARDING
Governance
Regulatory
Frameworks
ACCEPTABLE
Policy
REQUIRED
Education
Training
UNDERSTOOD
Sustainability - TRUSTED
35. How is FAIR faring? Maturing…
• Shifted the conversation
• Concept & mobilisation frenzy
• Rush to measure & certify
• Community specific
• FAIR not FOIR
It’s now the phase to embed, sustain and yield benefits for users
A marathon journey not a sprint
It is not simple, but it is no longer optional
36. Acknowledgements
Special thanks to
• Susanna Sansone (University of Oxford, OERC)
• Frederik Coppens (VIB)
• Barend Mons (GO-FAIR)
• Oya Deniz Beyan (Fraunhofer)
• Ibrahim Emam (Imperial College)
• FAIRDOM, GO-FAIR, FAIRplus and ELIXIR colleagues
FAIRDOM is sponsored by
Editor's notes
perform in a specified way in a particular situation or over a particular period.
Personal reflections
with a keynote presentation about the FAIR guiding principles for scientific data management and stewardship and their implementation in the life sciences,
3000 google scholar citations
Hidden slide
Linking and moving metadata around
Common metadata and ids, shared APIs and cross-walks
HIDDEN SLIDE
Like a goldrush
HIDDEN SLIDE
Not everything that can be counted counts. Not everything that counts can be counted – Einstein & William Bruce Cameron
FAIRsFAIR will partner with the Research on Research Institute (RoRI) in FAIRware, a new initiative to build open source software tools and systems with the potential to transform uptake of the FAIR principles by researchers.
https://www.eoscsecretariat.eu/open-consultation-eosc-strategic-research-and-innovation-agenda
https://www.fairsfair.eu/fairsfair-data-object-assessment-metrics-request-comments
There will be a "mixed economy" of FAIR data in a community. Something may be FAIR but not useful or high quality. Something may be essential but not fully FAIR.
Simple word but is not simple to do.
And be incremental
Not one approach - RDF
https://www.nature.com/articles/s41431-020-0635-7
https://www.go-fair.org/implementation-networks/overview/vodan/
‘recipes’ for making different types of data FAIR
Different stages where the FAIR goes
The steps towards FAIR
FAIR is not about a resource’s
Quality or
Impact or
Scientific value or Business value
across and within groups and disciplines
across and within organisational boundaries at all levels: the lab, the project, the organisation, the community, the country
across all technical infrastructure platforms, repositories, registries, tools
enable federation
Fit for purpose scholarly comms
Fit for purpose publishing
while collaborating to compete in churning
Secure data sharing
Different levels
FAIR Accountability and Responsibility
Links in interfaces, common vocabularies, common minimum cataloguing terms and cross-walks, shared API standards.
Generic approaches (e.g. using schema.org), Generic infrastructure (e.g. Ontology Lookup Service), Generic catalogues (e.g. WorkflowHub)
Federated System of Systems: an ecosystem of catalogues and repositories of different types. Degree to which you can get compliance
National and regional level…
Tiered levels….
Registries federated or not
Degree you can get everyone working together
Neylon, Knowledge Exchange Report: http://www.knowledge-exchange.info/event/ke-approach-open-scholarship
Zoo of catalogues and registries. LOTS of catalogues
Some flock around 1
Some rely on federated metadata
Moving between stages, moving things through boundaries.
NIH meeting for institutional repositories https://www.scgcorp.com/repositories2020/regclosed.
in the
Commons goods
Micro, meso and macro players.
OpenAIRE repository – helps find, access and reuse. Not meant to be interoperable.
What is special about biomedical
EOSCLife recommendations
Heterogeneous, fragmented
But its content may vary in its FAIRness!
Curating retrospectively….
HIDDEN SLIDE
LiSyM Data Management
Share table structure
Create & share common code
Make it easy-to-install
Create and share summaries
HIDDEN SLIDE Basic research data management solution, simplifying daily study
management tasks
• GUI effective in engaging users in the platform
• Main structure modeled after biological research
• Next steps
– Testing large data files
– Adapting labels to be more suitable for our environment
– Establish more automation facilitating the RESTful service
– Looking into advanced features of SEEK
FAIRDOM SEEK + NeLS
First mile - FAIR data management for projects at the “first mile”
Last mile - Organise data / models to deposit / link to (ELIXIR) datasets
FAIR across the pipeline, across infrastructures, across platforms
The FAIR chain of custody
Provenance metadata propagation
Synch registered SOPs and data transformations in NeLS
Permissions and AAI through NeLS Portal and FAIRDOM SEEK
FAIR stewardship effort and pipeline design
Manual and automated
Ramps and skills (see Filip Pattyn, Tuesday)
Lots about “my silo is fair”
In SEEK we register data and make it available in samples, so they can look at the samples when they are made public with no problem.
There is an issue if they want to access the original data, as only Norwegian researchers can log in to the NeLS portal.
Technically, SOPs recorded in SEEK can and should carry information about data transformations that happened in NeLS.
However, that relies on researchers adding it, which means in practice it won't exist.
Provenance of data analysis is provided by our Galaxy instances, and we would like to capture some of that into NeLS and SBI (for possibly later reopening in Galaxy or elsewhere), but that is not implemented yet.
If you think of more generic data provenance over time (what is added to a dataset, when, etc.), NeLS does not support that.
So we can track provenance from the registration point in SEEK.
Given NeLS is run by ELIXIR Norway, I would expect them at the very least to have a plan, if not some things implemented, but we will see.
Moving DOs across
FAIR digital objects – moving around
Globally unique, resolvable, and persistent identifiers
Community defined descriptive metadata that is catalogued / searchable
Common terminologies and standards
Detailed provenance
Terms of access Terms of use
HIDDEN SLIDE
Was 7 recommendations, now 6
Mentions certification but does not argue for it.
PEOPLE MATTER! It's about community building and consensus
BTW: Majority of researchers have never heard of FAIR
https://www.natureindex.com/news-blog/what-scientists-need-to-know-about-fair-data