The open repositories community has made great strides in recent years in addressing interoperability and policy, and in providing the arguments for open access and sharing. One aspect of open research that has come to prominence is the importance of software as a fundamental part of reproducible research, which in turn raises issues around the preservation of software.
In this short presentation, I will describe some of the work that the Software Sustainability Institute (SSI) has been doing to address the structural and policy issues which currently present a barrier to the deposit and use of software in open repositories.
Where does it go from here? The role of software in digital repositories
1. www.software.ac.uk
Where does it go from here?
The Place of Software in Digital Repositories
12 July 2012
OR2012, Edinburgh
Neil Chue Hong (@npch)
N.ChueHong@software.ac.uk
Software Sustainability Institute
2. Software is pervasive in research
3. The Software Sustainability Institute
A national facility for building better software
• Better software enables better research
• Software reaches boundaries in its
development cycle that prevent
improvement, growth and adoption
• Providing the expertise and services
needed to negotiate to the next stage
• Software reviews and refactoring, collaborations
to develop your project, guidance and best practice
on software development, project management,
community building, publicity and more…
Supported by EPSRC
Software Sustainability Institute Grant EP/H043160/1
4. Software Sustainability: preservation vs sustainability
Sustainability?
Image courtesy of London Permaculture under CC-by-nc-sa license
Image courtesy of Mortati under CC-by-nc-nd
Preservation?
5. Why are you considering software sustainability?
Purpose:
• Achieve legal compliance
• Create heritage value
• Enable continued access to data
• Encourage software reuse
JISC-funded, with Curtis+Cartwright
http://www.software.ac.uk/resources/preserving-software-resources
6. How are you going to choose the right approach?
Approach:
• Preservation (techno-centric)
• Emulation (data-centric)
• Migration (functionality-centric)
• Transition (process-centric)
• Hibernation (knowledge-centric)
• Deprecation
7. Software Carpentry
• Helping scientists be more productive by
teaching them basic computing skills
• How to use repositories properly is a key skill
• http://software-carpentry.org
8. Just the Nature of the problem?
Hacking is fun
Maintenance is not fun
Statistics courtesy of Greg Wilson, Software Carpentry, from Nature article
Published online 13 October 2010 | Nature 467, 775-777 (2010)
doi:10.1038/467775a
10. Slide from Carole Goble, JCDL 2012
[Diagram of reproducibility terms: Reuse, Review, Refresh, Rerun, Repeat, Reproduce, Replay, Repurpose, Recover, Reconstruct, Repair. Axis and panel labels: Data, Method, State (same / new); Good enough to verify; Reproduce with new data; Reproduce with new method; Publication (method only); Documentation; Provenance; Execution (link data and code).]
Drummond C, Replicability is not Reproducibility: Nor is it Good Science, online.
Peng RD, Reproducible Research in Computational Science, Science, 2 Dec 2011: 1226-1227.
11. The most important: Reward
• How do we reward people for important software
contributions?
• Traditionally: publish a research paper that happens to
mention software
Can we provide more direct, acceptable software citations?
• A Research Software Impact Manifesto
http://www.software.ac.uk/blog/2011-05-02-publish-or-be-damned-alternative-impact-manifesto-research-software
NB Authorship is hard
13. Boundary
What do we choose to keep:
- Workflow?
- Software that runs workflow?
- Software referenced by workflow?
- Software dependencies?
What’s the minimum citable part?
14. Granularity
Function
Library / Suite / Package
Algorithm
Program
…
15. Versioning
Why do we version?
- To indicate a change
- To allow sharing
- To confer special status
[Diagram: a public line of versions (v1, v2, v3) alongside personal versions (v1, v2, v2a, v3, v3a).]
17. Differing roles, different repositories
backup sharing archiving
Timescales Ingest
Policy Metadata
Licensing Assurance
18. Software Metapapers
• Create a complete scholarly record including “standard”
publication, method, dataset and models, and software
e.g. modelling and simulation, statistical analysis
Enable replay, reproduction and reuse
• Pragmatic approach is to create a metadata record for
the software, and link it to a copy of the software in
some storage infrastructure
This is a software metapaper
Peer-review the metadata, not the software
• Journal of Open Research Software:
http://openresearchsoftware.metajnl.com/
See: http://openresearchsoftware.metajnl.com/faq/
and the work by B. Matthews et al: The Significant Properties of Software: A Study
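The pragmatic approach described on this slide (a metadata record linked to a stored copy of the software) might look something like the sketch below. This is only an illustration: the class name, field names and example URL are hypothetical and are not taken from the Journal of Open Research Software or any other metadata standard.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SoftwareMetapaper:
    """Hypothetical metadata record describing one deposited copy of some software."""

    title: str
    version: str
    authors: List[str]
    license: str
    # Where the referenced copy of the software is stored (assumed field name).
    repository_url: str
    # Links to the rest of the scholarly record.
    related_publications: List[str] = field(default_factory=list)
    related_datasets: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the record so it can be reviewed or deposited as plain text."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative values only; none of these refer to a real deposit.
record = SoftwareMetapaper(
    title="Example simulation code",
    version="1.0.0",
    authors=["A. Researcher"],
    license="BSD-3-Clause",
    repository_url="https://repository.example.org/software/example-1.0.0.tar.gz",
)
print(record.to_json())
```

In this sketch it is the record, not the code it points to, that would be peer-reviewed.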
19. An acceptable repository
• Metapaper references an instance of software,
stored in a “suitable” repository
Clear access / deposit / preservation policy
Adherence to standards
Ability to easily “transfer”
Sustainability of hosting organisation
Ability to monitor, check integrity (obsolescence?)
• We may be storing
Binaries, source code (as text or archived), virtual
machines(!)
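One common way to check the integrity of a stored copy is to record a checksum at deposit time and recompute it later. The sketch below assumes SHA-256 and uses hypothetical function names; the slides do not say which mechanism a repository should use.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, recorded_checksum: str) -> bool:
    """Compare the current digest with the one recorded at deposit time."""
    return sha256_of(path) == recorded_checksum.lower()
```

A checksum only detects changed bits; it says nothing about obsolescence, which is why the slide flags that as a separate question.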
20. Potential for confusion
• ‘The right license for all parts of the scholarly record’
Victoria Stodden, Enabling Reproducible Research: Open
Licensing for Scientific Innovation
• Commonly used OSI approved licenses include:
Apache License, 2.0 (Apache-2.0)
BSD 3-Clause “New” or “Revised” license (BSD-3-Clause)
BSD 2-Clause “Simplified” or “FreeBSD” license (BSD-2-Clause)
GNU General Public License (GPL)
GNU Library or “Lesser” General Public License (LGPL)
MIT license (MIT)
Mozilla Public License 2.0 (MPL-2.0)
Common Development and Distribution License (CDDL-1.0)
Eclipse Public License (EPL-1.0)
• Does enabling the deposit of software just confuse
those already depositing publications/data?
21. 5 Stars of Software?
• Do we need 5 stars for software?
Existence – there is accurate
metadata that defines the software
Availability – you can access and run
the software
Openness – the software has an
open, permissive license
Assured – the software provides
ways of assuring its correctness
Linked – the related data, dependencies and papers are indicated
c.f. 5 Stars of Linked Data (Berners-Lee), 5 Stars of Online Journals (Shotton)
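If such a scheme were adopted, a deposit could be scored against the five criteria listed on this slide. The sketch below assumes one star per criterion met, counted independently rather than cumulatively; the slide poses the scheme as a question and does not define a scoring rule, and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SoftwareDeposit:
    """Hypothetical checklist for one deposited piece of software."""

    has_accurate_metadata: bool    # Existence
    can_be_accessed_and_run: bool  # Availability
    has_open_license: bool         # Openness
    provides_assurance: bool       # Assured
    links_related_items: bool      # Linked


def star_count(deposit: SoftwareDeposit) -> int:
    """Count the criteria met (assumption: one star each, order-independent)."""
    return sum(
        [
            deposit.has_accurate_metadata,
            deposit.can_be_accessed_and_run,
            deposit.has_open_license,
            deposit.provides_assurance,
            deposit.links_related_items,
        ]
    )


# A deposit that exists, runs and is openly licensed, but has no tests or links.
example = SoftwareDeposit(True, True, True, False, False)
print(star_count(example))
```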
22. Take home points
1) Researchers are developing more software
than ever, and trying to do it better
2) They want to be rewarded for creating a
complete scholarly record – this includes
software
3) We still don’t know the best way to shift
from one repository role to another when it
comes to software!
Backup -> sharing -> archiving
Editor's notes
Steven Gray here at CASA has produced a proof of concept showing the last hour’s snowfall in the UK as Tweets and the last 24 hours in postcode districts (the important part here is the data underneath, not the Tweets as such). Based on Ben Marsh’s work.
I ended up doing this because we needed to fix the basics:
• Reproducible research
• Software credit / career paths
• Software skills
• Drawing on a pool of specialists to drive the continued improvement and impact of research software developed by and for researchers
• Providing services for research software users and developers
• Developing research community interactions and capacity
• Promoting research software best practice and capability
Clarifying the Purposes and Benefits of Software Preservation: http://softwarepreservation.jiscinvolve.org/wp/about/
There is a spectrum of approaches
Statistics from Greg Wilson
• Are academics software developers?
• Can research consortia manage production?
• Are timing constraints different?
• What is the role of the PI in software development management?
• Are the skills for software and research the same?
c.f. work of James Howison
Based on study done for Cameron Neylon’s Beyond Impact workshop
Is it more important to sustain the software that this workflow references, or the workflow itself?
At what level do you reference, at what level do you deposit?
Made more difficult than data because of the fluidly changing collaborative nature of software development – not just adding to the contributor pool
Based on OR2012 workshop outputs
Want to move towards OSI licenses which are similar in spirit to CC-BY e.g. BSD, Apache
C.f. 5 Stars of Linked Data (Berners-Lee): available with open license, machine-readable, non-proprietary format, open standards, linked to provide context
5 Stars of Online Journals (Shotton): peer review, open access, enriched content, available datasets, machine-readable metadata
What about community?