NSF Workshop Data and Software Citation, 6-7 June 2016, Boston USA, Software Panel
Findable, Accessible, Interoperable, Reusable Software and Data Citation: Europe, Research Objects, and BioSchemas.org
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks and Off the Shelf Infrastructure
1. FAIR Software (and Data) Citation:
Europe, Research Object Systems,
Networks and Off the Shelf Infrastructure
Professor Carole Goble
The University of Manchester, UK
Software Sustainability Institute UK
ELIXIR-UK, ELIXIR Interop Platform
carole.goble@manchester.ac.uk
Orcid 0000-0003-1219-2137
2. Acknowledgements
U Manchester
• Stian Soiland-Reyes
• Stuart Owen
• Caroline Jay
• Robert Haines
• Norman Morrison
U Newcastle
• Paolo Missier
U Illinois Urbana-Champaign
• Dan Katz
Murphy Mitchell Consulting Ltd
• Fiona Murphy
F1000
• Liz Allen
U Oxford
• Neil Jefferies
• Lucie Burgess
ISI, USC
• Yolanda Gil
• Daniel Garijo
Force11 DCIP / Harvard
• Tim Clark
ELIXIR / BioSchemas.org
• Rafael Jimenez (Hub)
• Niall Beard (ELIXIR UK)
• Aleks Nenadic (ELIXIR UK)
• Jo McEntyre (EBI, THOR)
NIH BD2K
• Susanna Sansone (bioCADDIE, ELIXIR)
• Ian Fore (NIH)
Software Sustainability Institute
• Shoaib Sufi
• Neil Chue Hong
• Mike Jackson
STFC
• Catherine Jones
10. Technical and Human
infrastructure for
Open Research
• interoperability and integration
between ORCID and DataCite
infrastructures
• PID e-infrastructure: promote uptake
and sustain
https://project-thor.eu/
Giving Researchers Credit for their Data
https://www.jisc.ac.uk/rd/projects/research-data-spring
• Carrots for authors, “pain-free” submission
• Helper app for submitting data papers and data for
papers (using DataCite and ORCID)
16. Reproducible
Research: Citing
your execution
environment using
Docker and a DOI
http://www.software.ac.uk/blog/2016-03-29-reproducible-research-citing-your-execution-environment-using-docker-and-doi
+ +
Caroline Jay, Robert Haines
http://idinteraction.cs.manchester.ac.uk
‘ABC: Using Object Tracking to Automate Behavioural Coding.’ CHI 2016.
=Fixity
Publishing
17. Service vs Science
Background vs Foreground Software
Software and Data* in
foreground most likely cited.
Same software and data viewed
as background are not cited, or not
explicitly cited, though equally
essential
* Wynholds, et al (2012) Data, data use, and scientific inquiry: two case studies of data practices 10.1145/2232817.2232822
The invisibility of software, esp:
• widely used
• infrastructural
• component/library
• cross-discipline
20. Overcoming Barriers to Software Citation
survey of experiences citing software in research publications
http://bit.ly/1WxWFY7
Caroline Jay, Robert Haines, University of Manchester, UK
Robin Wilson, University of Southampton, UK
23. Repository spanning catalogue,
reference (“cite”) distributed 3rd party content
Standards
Public data
archives
Project data
repositories
Literature archives
Public model
archives
Uploaded content
Plugin Model tools
FAIRDOM
Plugin Data tools
27. "Mapping present and
future predicted distribution
patterns for a meso-grazer
guild in the Baltic Sea" Sonja
Leidenberger et al
Credits
Attributions
In Multiple Packs
30. • Metadata Framework: Bundle and relate multi-hosted, scattered digital
resources of a scientific experiment or investigation using standard mechanisms
• Exchange, Publishing, Reproducibility, Portability, Repair
See Stephen Abrams Talk yesterday
31. Datasets, Data collections
Standard operating procedures
Software, algorithms
Configurations,
Tools and apps, services
Codes, code libraries
Workflows, scripts
System software
Infrastructure
Compilers, hardware
Input
Data
Workflow
Description
Provenance
trace
Version of
Codes /
Services
Output
32. Manifest
Construction
Manifest
Identification
to locate things
Aggregates
to link things together
Annotations
about things & their
relationships
Container
Metadata Objects
Citable Reproducible Packaging
Manifest
Description
Type Checklists
what should be there
Provenance
where it came from
Versioning
its evolution
Dependencies
what else is needed
Manifest
Packaging content & links:
Zip files, BagIt, Docker images
Catalogues & Commons Platforms:
FAIRDOM SEEK, STELAR eLab
33. RO Types: Manifest Content Profiles
minimal, maximal, extensible
PID
Citation
Checklist
Version
Provenance
Dependencies
JATS
DC DCAT
ISA EFO
SBML
MIAME CWL
Common
properties
among content
types
Minimum information
for one content type
34. Workflow RO Bundle
ZIP or BagIt folder structure
Belhajjame et al (2015) Using a suite of ontologies for preserving workflow-centric research objects,
J Web Semantics doi:10.1016/j.websem.2015.01.003
application/vnd.wf4ever.robundle+zip
JSON and YAML
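A rough sketch of the bundle layout named on this slide: a ZIP whose first entry is an uncompressed `mimetype` file, plus a `.ro/manifest.json` that aggregates the packaged resources and carries annotations about them. The helper name `build_ro_bundle` and the example file names are hypothetical, and the manifest keys follow our reading of the RO Bundle specification, so check them against the specification before relying on them.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def build_ro_bundle(files, annotations=()):
    """Return the bytes of a minimal RO-bundle-style ZIP (illustrative only).

    files: dict mapping path-in-bundle -> bytes content.
    annotations: iterable of dicts with 'about' and 'content' keys.
    """
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        # Aggregates: link the packaged things together.
        "aggregates": [{"uri": "/" + path} for path in sorted(files)],
        # Annotations: statements about things and their relationships.
        "annotations": list(annotations),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        # The mimetype entry goes first and is stored uncompressed.
        bundle.writestr(
            zipfile.ZipInfo("mimetype"), MIMETYPE, compress_type=zipfile.ZIP_STORED
        )
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for path, content in sorted(files.items()):
            bundle.writestr(path, content)
    return buffer.getvalue()
```

Calling `build_ro_bundle({"workflow.cwl": b"..."})` yields bytes that can be written to a `.zip` file; provenance, versioning and dependency information would be added as further manifest entries or annotation files.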
35. Persistent Identification of Software:
a building block to citation & curation
Catherine.jones@stfc.ac.uk, B. Matthews, I. Gent, J. Tedds & S. Lamerton
Project URL http://rrr.cs.st-andrews.ac.uk/
Guidelines for persistently identifying
software using DataCite https://epubs.stfc.ac.uk/work/24058274
36. • Most recent?
– Location indicator, crosslink
– Credit the contributors now, the version now
– Strong presumption it exists and is living
• Fixed Snapshot?
– Defend publication, Reuse
– Credit the contributors then, the version then
– Presumption it exists and is archived
• Line in the sand?
– Credit the contributors then, the version then
– Weak presumption it exists
• Warrant?
– Acknowledgement not contribution
– Don’t care if it exists
– Important “influence” citation for its contributors
What does the citation mean
for the author or reader?
Identifier Resolution, Citation Persistence, Content Decay?
38. Commons
• DOI proliferation
– Channelling for Counting and
Landing Pages
• Authenticity: Tamper-proof
Exchange and Provenance
– Hashing & Checksums
– Secure signature & probity
services
– Blockchain
• anti-tampering transaction
logging
• https://www.ethereum.org/
– Proll and Rauber, Scalable
data citation in dynamic, large
databases: Model and
reference implementation,
(2014)
10.1109/BigData.2013.6691588
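The "Hashing & Checksums" point can be illustrated with a small fixity check: record a digest per file when the package is made, then recompute later to detect tampering or decay. This is a minimal sketch using SHA-256 from the Python standard library; the function names and the in-memory `files` dictionary are assumptions for illustration, not part of any tool named on the slide.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def checksum_manifest(files: dict) -> dict:
    """Map each path to the digest of its content at packaging time."""
    return {path: sha256_hex(content) for path, content in files.items()}


def changed_paths(files: dict, manifest: dict) -> list:
    """Paths whose current content is missing or no longer matches the manifest."""
    return sorted(
        path
        for path, digest in manifest.items()
        if path not in files or sha256_hex(files[path]) != digest
    )
```

A checksum only shows that content changed; proving who changed it, or when, needs the signature or transaction-logging services listed above.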
39. • Uber Collection / Hierarchy
/ subsetting (cf. Dryad,
DataONE, DataVerse)*
• RO author/contributor
information in its manifest
• ROs manifest =>
constituent resources,
provenance for
contribution.
*Ball, A. & Duke, M. (2011). "How to Cite Datasets and Link to
Publications?". DCC How-to Guides. Edinburgh: Digital Curation Centre.
http://www.dcc.ac.uk/resources/how-guides/cite-datasets.
Granularity
Atomicity
Aggregation
40. Robust Transitivity & Propagation
Citation and Credit Aggregation and Granularity
• Backward Citation
– What was this based
on, who did it?
• Forward Citation
– What is using this,
who did that?
• “PageRank”
Credit Aggregation
Citation Granularity
Drift
D. S. Katz, "Transitive Credit as a Means to Address Social and Technological Concerns Stemming from Citation and
Attribution of Digital Products," Journal of Open Research Software, v.2(1): e20, pp. 1-4, 2014. DOI: 10.5334/jors.be
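Backward and forward citation can be read as two traversals of the same citation graph. The sketch below assumes a hypothetical list of `(citing, cited)` pairs and follows links transitively in either direction; it says nothing about weighting or "PageRank"-style scoring.

```python
from collections import defaultdict


def build_index(edges):
    """Index (citing, cited) pairs in both directions."""
    based_on = defaultdict(set)  # backward: what was this based on?
    used_by = defaultdict(set)   # forward: what is using this?
    for citing, cited in edges:
        based_on[citing].add(cited)
        used_by[cited].add(citing)
    return based_on, used_by


def reachable(start, neighbours):
    """Everything reachable from start by following neighbours transitively."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
```

With `based_on, used_by = build_index(edges)`, `reachable(x, based_on)` answers the backward question for `x` and `reachable(x, used_by)` the forward one.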
41. Who gets credit for what?
Using Provenance for Credit Mapping
Paolo Missier
Alice
Charlie
Bob
Paolo Missier, Data Trajectories: tracking reuse of published data for transitive credit attribution, IDCC 2016
W3C PROV
dependency graph
“Provlets”
42. • Tracking RO usage and
indirect contributions
• Awarding fractional credit to
contributors
1. “Contriponents”
• contributors + components
2. Weighted contribution
3. Networked Credit maps
• Travel with the contriponents
Transitive Credit contribution
Dan Katz and Arfon Smith
*Katz, D.S. & Smith, A.M., (2015).
Transitive Credit and JSON-LD.
Journal of Open Research
Software. 3(1), p.e7, DOI:
http://doi.org/10.5334/jors.by
D. S. Katz, "Transitive Credit as a
Means to Address Social and
Technological Concerns Stemming
from Citation and Attribution of
Digital Products," Journal of Open
Research Software, v.2(1): e20, pp.
1-4, 2014. DOI: 10.5334/jors.be
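One way to picture weighted, fractional credit travelling through "contriponents" is a recursive walk over credit maps. The sketch below is an assumption-laden illustration rather than the method of the cited papers: each product has a hypothetical credit map whose weights sum to 1, an entry is treated as a component if it has its own credit map and as a person otherwise, and the names and weights in the usage note are invented.

```python
def transitive_credit(product, credit_maps, _path=()):
    """Return {contributor: fraction} for product, following components.

    credit_maps: dict mapping product -> {contributor_or_component: weight}.
    """
    if product in _path:
        raise ValueError("cyclic credit map at " + repr(product))
    totals = {}
    for name, weight in credit_maps[product].items():
        if name in credit_maps:
            # A component: pass its weight on to its own contributors.
            inner = transitive_credit(name, credit_maps, _path + (product,))
            for person, share in inner.items():
                totals[person] = totals.get(person, 0.0) + weight * share
        else:
            # A person: credit the weight directly.
            totals[name] = totals.get(name, 0.0) + weight
    return totals
```

For example, if a paper credits Alice 0.5, Bob 0.25 and a library 0.25, and the library credits Charlie 0.8 and Alice 0.2, then Alice's total is 0.5 + 0.25 × 0.2 = 0.55 and Charlie's is 0.25 × 0.8 = 0.2.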
43. How do we
weight and
track ?
https://www.refme.com/uk/
http://depsy.org/
• Literature mining
– Duck et al Ambiguity and
variability of database and
software names in
bioinformatics (2015) DOI:
10.1186/s13326-015-0026-0
• Infrastructure
– Identifier and provenance
infrastructure, dependency
managers, metrics
services, repositories,
machine readable and
processable metadata,
reference managers
• CRediT
– contributor taxonomy
– http://casrai.org/CRediT
– Time for revision?
http://mdc.lagotto.io/
45. Find | Cite | Credit
Ramps “Riding the metadata COTS-tails”
• A third of web pages
• Opening out -> community groups and extensions
• Builds on a shared core and data structure
• Simple embedding in web pages and CMS
• Widespread tooling, harvesters and indexing
• Search engines and Integration tools
• It’s all about the metadata and knowledge graph
Google, Bing, Yahoo, Yandex
49. BioSchemas.org
minimal, maximal, extensible
Training
materials
Events Organizations
Data
Standards
Software
Minimum information
for one content type
Training
materials
Events Organizations
Data Software
Standards
Common properties
among content types
Identifier, Title, Description, Author, Topics, Audience, Publication Date, …
50. Schema.org
BioSchemas.org, W3C FHIR WG
Daniel Mietchen et al, Adapting JATS to support data citation, Journal Article Tag Suite Conference (JATS-Con)
Proceedings 2015, Bethesda (MD): National Center for Biotechnology Information 2015.
Journal Article Tag Suite
DATS
SoftwareSourceCode
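To make the Schema.org route concrete, here is a minimal sketch that builds a JSON-LD description of a piece of software using the `SoftwareSourceCode` type and wraps it for embedding in a web page. The function names and every value passed in are hypothetical, and the chosen properties are only a plausible subset; a BioSchemas profile may require others.

```python
import json


def software_jsonld(name, repository, version, identifier, authors):
    """Build a schema.org SoftwareSourceCode description as a dict.

    authors: iterable of (display name, ORCID URL) pairs.
    """
    return {
        "@context": "https://schema.org",
        "@type": "SoftwareSourceCode",
        "name": name,
        "codeRepository": repository,
        "version": version,
        "identifier": identifier,
        "author": [
            {"@type": "Person", "name": person, "@id": orcid}
            for person, orcid in authors
        ],
    }


def as_script_tag(doc):
    """Serialise the description for embedding in an HTML page."""
    # Escape "</" so a string value cannot close the script element early.
    payload = json.dumps(doc, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"
```

Harvesters and search engines that read embedded JSON-LD can then pick up the identifier, version and author fields without any bespoke API.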
51. • Stretch in all directions
– Granularity, Atomicity, Aggregation
– Only partially automatable
• Dynamic Citation
– “Citable Units”
– Buneman et al, https://tinyurl.com/bdf-cacm
• ROs & Contriponents
– Standardised metadata manifests
– Tracking fabrics
– Distributed => will break
• Keep it simple
– Incremental, Commodity based, Low Tech
– Guidelines & Conventions
– Ramps – like Bioschemas.org
– Capture metadata all along the way….
Open Questions?
Getting folks (authors, reviewers,
editors) to cite software and data
52. For Further Information
• http://www.researchobject.org
• http://www.wf4ever-project.org
• http://www.fair-dom.org
• http://seek4science.org
• http://www.software.ac.uk
• http://www.bioschemas.org
• http://codemeta.github.io/
• http://myexperiment.org
• http://www.commonwl.org/