Science is knowledge work. The scientific method and scholarly communication are about facilitating “knowledge turns” – that is, the turning of observation and hypothesis through experimentation, comparison, and analysis into new, pooled knowledge. Turns depend on the FAIR flow and availability of data, methods for automated processing, reproducible results and on a society of scientists coordinating and collaborating. We need to build a new form of Research Commons and I will present my steps towards this.
Presented at the symposium The Future of a Data-Driven Society, Maastricht University, 25 Jan 2018, which accompanied the 42nd Dies Natalis, where I was awarded an honorary doctorate.
Personal video:
https://www.youtube.com/watch?v=k5WN6KDDatU&index=4&list=PLzi-FBaZlOOagma5dCW7WSA5lv22tmNMD
Video of the symposium:
https://www.youtube.com/watch?v=JN9eMMtCHf8&t=19s&index=6&list=PLzi-FBaZlOOagma5dCW7WSA5lv22tmNMD
Building the FAIR Research Commons: A Data Driven Society of Scientists
1. Building the
FAIR Research Commons:
A Data Driven Society of Scientists
Professor Carole Goble CBE FREng FBCS
The University of Manchester, UK
carole.goble@manchester.ac.uk
FAIR
Research
Commons
Symposium: The Future of a Data-Driven Society, Maastricht University, 25 Jan 2018
2. Data-Driven Science
Simulations, data exploration, data processing, analytics, text mining,
visual analytics, automated inference….
e-Science:
enabling Data Driven Science
e-Infrastructure:
enabling e-Science
Distributed computing
Data management, Catalogues
Virtual Research Environments
Metadata & Semantic Web technologies
Software Engineering Products and Services
Collaboration, Sharing & Publishing Platforms
4. “The FAIR Guiding Principles for scientific data management and stewardship”,
Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18
Technical: Principles, Metadata, Identifiers, Access policies
Political, Social, Economic: A Flag, A Meme
5. The Future of a Data-Driven Society
A Society of Scientists
Do Data Driven Science
Data Driven Scholarship
Data contributors,
curators, consumers
Biodiversity Scientists +
Research Infrastructure Techies
Project Teams … of Individuals
Collaborating and Competing Simultaneously
6. Knowledge Turning
Increase Flow of Information
• Across scattered resources, platforms, people
• Coordination, collaboration
• Cumulative, Dynamic
[original figure: Josh Sommer]
Cumulative
Commons
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, IC3K, 2013, ISBN: 978-3-642-37186-8
7. • Distributed, Fragmented, Siloed
• No single entry point
• Living software, models, data, catalogues, tools …
What’s the Commons?
Resources
• collectively created
• owned or shared
• between or among a
community
Governance
https://scholarlycommons.org/
8. Macro, Micro*, pooled
• public resources
• data centres
• journals
• dedicated projects
• governance
• majority of
researchers
• labs & universities
• generators
• my resources
*Meso too, but too complicated for 20 minutes! See
http://www.knowledge-exchange.info/event/ke-approach-open-scholarship
9. Some Data-driven Predictive Science
in Ecological Niche Modelling
predatory fish
the grazer endemic alga
[Obst, Leidenberger]
11. Do Research
Research Infrastructure
Services
Assemble
Methods, Materials
Experiment
Observe
Simulate
Analyse
Results
Quality
Assessment
Track and Credit
Disseminate
Deposit &
Licence
Marketplace
Services
Publish
Share
Results
Any
research
product
Selected
products
Manage
Results
The Data-Driven Open Science
Public + Personal Commons
Science 2.0 Repositories: Time for a Change in Scholarly Communication, Assante, Candela, Castelli, Manghi, Pagano, D-Lib 2015
12. “The questions don’t change but the
answers do” Dan Reed, Microsoft
Salami Slicing, Scattering
13. 101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015,
http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
14. Research
Infrastructure
Services
Assemble
Methods, Materials
Experiment
Observe
Simulate
Analyse
Results
Quality
Assessment
Track and Credit
Disseminate
Deposit &
Licence
Marketplace
Services
Share
Results
Manage
Results
Building a FAIR Research Commons
Portable
Automated
Reproducible
Methods
Supporting
Collaborations
Science 2.0 Repositories: Time for a Change in Scholarly Communication
Assante, Candela, Castelli, Manghi, Pagano, DOI: 10.1045/january2015-assante
Mesirov, J. Accessible Reproducible Research. Science 327(5964), 415-416 (2010)
19. Methods
techniques, algorithms, spec.
of the steps, models, versions,
robustness, statistical power …
Materials
datasets, parameters, thresholds,
versions, algorithm seeds, reference
datasets…
Instruments
tools, codes, services, scripts,
underlying libraries, versions,
workflows…
Laboratory
computational environment,
High performance access,
Operating system…
Data Instruments -> Data Scopes
Method Objects, fragile, updating ….
Maintain for Running
Document for Reading
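The "Materials", "Instruments" and "Laboratory" items on this slide can be recorded alongside a result at the moment it is produced. The following is a minimal sketch in Python, not a method taken from the talk: the function name `describe_run`, the record layout, and the example seed and parameter values are all hypothetical and chosen only for illustration.

```python
import json
import platform
import sys
from importlib import metadata


def describe_run(seed, parameters, packages=()):
    """Return a dict describing the environment and inputs of one analysis run.

    The layout (laboratory / instruments / materials) mirrors the slide's
    headings; it is an illustrative assumption, not a standard format.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed: record that rather than guess
    return {
        "laboratory": {
            "python": sys.version.split()[0],
            "operating_system": platform.system(),
            "machine": platform.machine(),
        },
        "instruments": {"libraries": versions},
        "materials": {"seed": seed, "parameters": dict(parameters)},
    }


# Hypothetical example values.
record = describe_run(seed=42, parameters={"threshold": 0.05}, packages=["pip"])
print(json.dumps(record, indent=2, sort_keys=True))
```

Saving such a record next to each output covers the "Document for Reading" half of the slide. Keeping the method runnable over time still needs the environment itself to be maintained.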
20. Software is a first class member of
Data-driven Science
56% of UK researchers develop their own research software or scripts
73% of UK researchers have had no formal software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August and October 2014.
406 respondents covering representative range of funders, discipline and seniority.
Goble, Better Software, Better Research, IEEE Internet Computing, doi: 10.1109/MIC.2014.88
De Roure, Goble, Software Design for Empowering Scientists, IEEE Software, doi: 10.1109/MS.2009.22
Research Software Engineers
National Capability
24. Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Don’t Publish, Release
Analogous to software
products and practices
rather than data or
articles
Agile Data-driven
Science
Treat ALL Products and
ALL Research Like Software
“evolving
manuscript”
Sir Mark Walport, Times Higher Education Supplement, 14 May 2015
25. Context
Relationships
Credit
Research Goods FAIR Exchange
Governance
Stewardship
Credit
Tracking
Lifecycles
Fixity…
arXiv,
my Lab
myExperiment
GitHub,
Web Service myWebSite
bioModels.org,
openModeller
PubMed
Spreadsheet in
figshare
ArrayExpress,
BioSamples,
PRIDE, GBIF,
my Lab,
institutional
repository
Overlaying the
Research Commons
ecosystem
Unbounded
Composite
Living
Rots
26. Tracking, credit mining, comparison, auto-metadata, blockchain, boundary objects….
A FAIR Knowledge Web of Research Objects
Map across metadata
Threaded publications
Navigate, Pivot-Focus, Cite
Self-describing
27. Unit for Reproducibility / Productivity, Portability,
Preservation, Executable Publishing
researchobject.org
Bechhofer et al. (2013) Why linked data is not enough for scientists, https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al. (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
Linked Data / Semantic Web
FAIR machine processable metadata
Standards-based generic
metadata framework
Provenance
Dependencies
Versions
Checklists
Annotations
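To make the ingredients listed on this slide concrete (provenance, dependencies, versions, checklists, annotations), here is a small Python sketch of a self-describing bundle. It is an illustration under stated assumptions, not the Research Object model from researchobject.org: the manifest keys, the function names `build_manifest` and `run_checklist`, and the example file names are all hypothetical.

```python
import hashlib


def checksum(content: bytes) -> str:
    """SHA-256 digest, so later readers can verify the content is unchanged."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(identifier, version, resources, created_by, depends_on=()):
    """Describe a set of resources (path -> bytes) as one versioned bundle."""
    return {
        "id": identifier,
        "version": version,
        "provenance": {"createdBy": created_by},
        "dependencies": list(depends_on),
        "aggregates": [
            {"path": path, "sha256": checksum(content), "annotations": []}
            for path, content in sorted(resources.items())
        ],
    }


def run_checklist(manifest, required_paths):
    """Report, per required path, whether the bundle actually contains it."""
    present = {item["path"] for item in manifest["aggregates"]}
    return {path: path in present for path in required_paths}


# Hypothetical example content.
manifest = build_manifest(
    identifier="example-research-object",
    version="1.0.0",
    resources={
        "data/input.csv": b"x,y\n1,2\n",
        "workflow/analysis.py": b"print('analysis')\n",
    },
    created_by="A. Researcher",
    depends_on=["python>=3.8"],
)
report = run_checklist(manifest, ["data/input.csv", "README.md"])
print(report)
```

The checklist here only tests for presence. A real minimum-information checklist would also inspect the metadata attached to each item.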
28. The time is right …
Reproducible Document
Stack project
Social
Technology Process
Purpose
Publishers, Research
Infrastructures, Communities,
Library services, Agencies ….
Not Jo Public….
30. Systems Biology Projects
• SME multi-disciplinary groups
• Multi-site collaborations
• Competing
• Experimentalists, dry modellers
• Self-deposit, no stewardship skills
• Funder driven sharing
modellers
experimentalists
Build a Project Commons!!
• Foster stewardship
• Stimulate sharing
• Ensure retention
• Respect global community,
local project resources
http://fair-dom.org Wolstencroft et al., Nucleic Acids Research, 2016, doi: 10.1093/nar/gkw1032.
31. 3 Studies
Model analysis,
construction, validation
24 Assays/Analysis
Simulations,
characterisations
Structured organisation
Retain context in one place, Release FAIR products
Use and deposit in the fragmented resources [Penkler, Snoep]
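The "structured organisation" on this slide, where studies contain assays and analyses that in turn point at data, models and other assets, can be sketched as nested records. The Python below is a minimal illustration, not the FAIRDOM data model: the class names, fields, and every title and asset identifier in the example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    kind: str  # e.g. "simulation" or "characterisation"
    assets: List[str] = field(default_factory=list)  # ids of data, models, SOPs


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Project:
    title: str
    studies: List[Study] = field(default_factory=list)

    def summary(self):
        """Count studies, assays and asset links held in this one place."""
        assays = [a for s in self.studies for a in s.assays]
        return {
            "studies": len(self.studies),
            "assays": len(assays),
            "assets": sum(len(a.assets) for a in assays),
        }


# Hypothetical example content.
project = Project("Example project", [
    Study("Model construction", [
        Assay("Steady-state simulation", "simulation", ["model-v1", "params-v1"]),
    ]),
    Study("Model validation", [
        Assay("Enzyme kinetics", "characterisation", ["kinetics-data"]),
        Assay("Time-course simulation", "simulation", ["model-v1"]),
    ]),
])
print(project.summary())  # {'studies': 2, 'assays': 3, 'assets': 4}
```

Assets are held as identifiers rather than copies, which fits the slide's point that context is retained in one place while the products themselves live in the fragmented external resources.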
32. FAIRDOMHub Systems Biology Commons
http://fairdomhub.org
Distributed Commons, Integrated View
“During and within” publishing
Simulate
Compare
Validate
10th Anniversary
33. What methods are being used to
determine enzyme activity?
What SOP was used for this
sample?
Where is the validation data for this
model?
Is there any group
generating kinetic data?
Is this data available?
Track versions of my model
What's the relationship between the
data and model?
Which data belong to
which publications?
Self-controlled spaces
• enclaves -> public
Discover own assets
One entry point
• over external systems
35. The Tragedy of the Commons? FAIR Play?
Values
of assets
of reproducibility
of metadata
economics of infras.
priorities
Behaviours
enclave sharing
hoarding, flirting, voyeurism
consumer-producer asymmetry
playground rules
Sweatshop
collaborating but competing
burden - time, skills
short term, shortcuts
principal investigators
tools & templates
seamless join-up
automation, stewards
reprod. debt is hard
The last mile
41. By side effect – metadata for FAIR
Universal tagging of Life
Science datasets, tools,
protocols, training materials
Web scale knowledge graph
Embedded ontologies and
metadata templates
Metadata harvesting by
stealth
https://ncip.nci.nih.gov/blog/face-new-tragedy-commons-remedy-better-metadata/
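One common way to tag a dataset's web page so that harvesters can pick up its metadata is to embed schema.org markup as JSON-LD. The Python sketch below generates such a snippet. It uses only generic schema.org `Dataset` properties and does not claim conformance to any particular Bioschemas profile; the function name `dataset_markup` and all example values (including the example.org addresses) are hypothetical placeholders.

```python
import json


def dataset_markup(name, description, identifier, url, license_url, keywords):
    """Return an HTML <script> element carrying schema.org Dataset metadata."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "url": url,
        "license": license_url,
        "keywords": list(keywords),
    }
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(doc, indent=2)
        + "\n</script>"
    )


# Placeholder values for illustration only.
html = dataset_markup(
    name="Example kinetics dataset",
    description="Illustrative record; not a real dataset.",
    identifier="https://example.org/datasets/1",
    url="https://example.org/datasets/1",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["enzyme kinetics", "systems biology"],
)
print(html)
```

Because the markup can be generated from fields a repository already holds, researchers get the metadata as a side effect of depositing rather than as an extra chore.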
42. Ask what you and Data Science
can do for the FAIR Commons
43. Building the
FAIR Research Commons:
A Data Driven Society of Scientists
Release FAIR
Research Objects
Manage
Datascopes
FAIR play incentives
FAIR
Research
Commons
44. All the members of the Wf4Ever team
Colleagues in Manchester’s
Information Management Group,
ELIXIR-UK, Bioschemas
http://www.researchobject.org
http://www.myexperiment.org
http://wf4ever.org
http://www.fair-dom.org
http://www.fairdomhub.org
http://seek4science.org
http://rightfield.org.uk
http://www.bioschemas.org
http://www.commonwl.org
http://www.bioexcel.eu
http://www.openphacts.org
https://www.force11.org/
Mark Robinson
Alan Williams
Jo McEntyre
Norman Morrison
Stian Soiland-Reyes
Paul Groth
Tim Clark
Alejandra Gonzalez-Beltran
Philippe Rocca-Serra
Ian Cottam
Susanna Sansone
Kristian Garza
Daniel Garijo
Catarina Martins
Alasdair Gray
Rafael Jimenez
Iain Buchan
Caroline Jay
Michael Crusoe
Katy Wolstencroft
Barend Mons
Sean Bechhofer
Matthew Gamble
Raul Palma
Jun Zhao
Josh Sommer
Matthias Obst
Jacky Snoep
David Gavaghan
Stuart Owen
Finn Bacall
Paolo Missier
Phil Crouch
Oscar Corcho
Dan Katz
Arfon Smith
David De Roure
Marco Roos
Massimiliano Assante
Paolo Manghi