Software Management Plans and Software as Data

Presentation given to the High Performance Computing Summer School as part of a hands-on workshop developing software management plans and looking at software as data within the context of research data management best practices.

  1. Library Services: Software Management Plans and ‘Software as Data’. HPC Summer School, Research Data Management Community Session, Sept. 30th, 2016. Sarah Stewart, Research Data Management Team, Central Library
  2. Missing Data (and Software): “In their parents' attic, in boxes in the garage, or stored on now-defunct floppy disks — these are just some of the inaccessible places in which scientists have admitted to keeping their old research data.” http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416
  3. Obsolescence!
  4. Software: What do I do with it? • There is a lot of emphasis on ‘data’ management, but research software is often neglected. • Software is sensitive to changes in its ‘environment’ (operating system, compilers, libraries and their versions). • Software varies widely (languages, versions, licensing, etc.).
  5. Software as ‘Data’ • ‘Software is used to create, interpret, present, manipulate and manage data’ (Software Sustainability Institute) • Data: ‘recorded factual material commonly retained by and accepted…as necessary to validate research findings’ (EPSRC) • Software = Data!
  6. Software should be preserved if: • it can’t be separated from the data or digital object; • it is classified as a research output; • it has intrinsic value.
  7. Digital Preservation Issues • Storage, retrieval, reconstruction and replay are all complicated by code libraries, dependencies and software engineering practices in general. • Planning is essential for later retrieval, reconstruction and replay. • Software is a digital object that is frequently a result of research, and it is often a vital prerequisite for preserving other digital objects. • Software preservation should be part of a broader preservation strategy: research data management.
  8. Strategies for Digital Preservation • Data integrity and file fixity checks (managing checksums), including for source code • Media and format migrations • Refreshing (copying to fresh storage media, reducing the risk of bit rot) • Replication (keeping duplicate copies to guard against corruption, loss or erasure) • Emulation • Encapsulation (linking content with all the information required to decipher and understand it)
  9. Software Management Plans: What? • Like data management plans, software management plans outline the uses, responsibilities, ownership, access and sharing, storage, maintenance and archiving of research software.
  10. Software Management Plans: Why? • No clear funder requirements yet, but… • They promote citability and credit for your research = increased research impact • Research outputs can be validated/checked by others • They support transparency of research and promote Open Research. • Good practice!
  11. DMPOnline for Software Management Plans • Imperial-specific software management plan templates are currently being developed in DMPOnline. • Earlier templates are available through the Software Sustainability Institute, with some sources available via GitHub.
  12. Software Management Plans: How? (at Imperial College London) • Specialised template in DMPOnline (via DCC) • Imperial-specific DMPOnline template (in development) • Use GitHub (Imperial has an enterprise account) • Use Zenodo (which integrates with GitHub) or a subject-specific repository to archive versions of research software • Log metadata about your software in Symplectic • Contact the RDM Team (Central Library) for assistance/support: rdm-enquiries@imperial.ac.uk
  13. Any Questions? Thank you! For more information and support: Webpage: www.imperial.ac.uk/research-data-management • E-mail: rdm-enquiries@imperial.ac.uk • And also: DMPOnline: https://dmponline.dcc.ac.uk/ • Software Sustainability Institute: https://www.software.ac.uk/
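The slides on preservation issues and software ‘environments’ stress that reconstruction and replay depend on knowing which libraries and versions the code ran with. One lightweight habit is to record that environment alongside the code. The following is a minimal sketch that assumes a Python project on Python 3.8 or later. It uses only the standard library (`platform`, `sys`, `importlib.metadata`). The function names and the `environment.json` file name are illustrative choices, not part of any specific tool.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot():
    """Collect basic facts about the running Python environment."""
    packages = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:  # defensive: skip installs without readable metadata
            continue
        name = meta.get("Name")
        if name:
            packages[name] = meta.get("Version")
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        # Sorted by package name so snapshots are easy to diff over time.
        "packages": dict(sorted(packages.items())),
    }


def write_snapshot(path="environment.json"):
    """Save the snapshot as JSON next to the code (file name is illustrative)."""
    snapshot = environment_snapshot()
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(snapshot, fh, indent=2)
    return snapshot
```

Calling `write_snapshot()` at the point a result is produced, and committing the file with the code, gives a later reader a record of the environment. Other languages and build systems have their own equivalents, such as lock files and container definitions.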
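The ‘Strategies for Digital Preservation’ slide lists data integrity and file fixity checks (managing checksums) for source code. Here is a minimal Python sketch of that idea, assuming the code sits in a single directory tree. It records a SHA-256 checksum for every file and stores the result as a text manifest. Later, it reports files that are missing, changed or new. The function names and the manifest layout (digest, two spaces, relative path, similar to `sha256sum` output) are assumptions made for this example, not a particular preservation tool's format.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map every file under root (as a relative POSIX path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def write_manifest(manifest, out_path):
    """Store the manifest as '<digest>  <relative path>' lines."""
    with open(out_path, "w", encoding="utf-8") as fh:
        for name, digest in manifest.items():
            fh.write(f"{digest}  {name}\n")


def read_manifest(in_path):
    """Load a manifest written by write_manifest."""
    manifest = {}
    with open(in_path, encoding="utf-8") as fh:
        for line in fh:
            # The digest never contains spaces, so the first "  " separates it.
            digest, name = line.rstrip("\n").split("  ", 1)
            manifest[name] = digest
    return manifest


def verify_manifest(root, manifest):
    """Compare the files under root with a previously stored manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(
            name for name in set(manifest) & set(current)
            if manifest[name] != current[name]
        ),
        "added": sorted(set(current) - set(manifest)),
    }
```

Keep the manifest outside the directory being checked, otherwise it will be reported as an ‘added’ file. Store copies of it with each replica, so that fixity can be re-checked after every refresh or migration.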
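The ‘How?’ slide suggests archiving versions of research software in Zenodo and logging software metadata in Symplectic. Before depositing, it can help to keep a small, structured record of the essentials and to check that nothing required is blank. The sketch below is hypothetical and assumes Python 3.9 or later. The `SoftwareRecord` class and its field names are illustrative only. They do not follow the Zenodo, DataCite or Symplectic schemas, which should be consulted directly.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SoftwareRecord:
    """Hypothetical metadata for one archived software version.

    Field names are illustrative and do not match any repository's schema.
    """
    title: str
    version: str
    authors: list[str]
    licence: str
    repository_url: str
    doi: str = ""  # filled in once the archive assigns one
    keywords: list[str] = field(default_factory=list)

    def missing_fields(self):
        """Return the names of required fields that are still empty."""
        required = ("title", "version", "authors", "licence", "repository_url")
        return [name for name in required if not getattr(self, name)]

    def to_json(self):
        """Serialise the record as pretty-printed JSON with sorted keys."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

A record such as `SoftwareRecord("demo-model", "1.0.0", ["A. Researcher"], "MIT", "")` would report `["repository_url"]` from `missing_fields()`. That prompts the author to add the GitHub link before the version is archived.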