
Research Data (and Software) Management at Imperial: (Everything you need to know to gain impact for your work!)

A presentation on research data management tools, workflows and best practices at Imperial College London with a focus on software management. Presented at the 2017 session of the HPC Summer School (Dept. of Computing).


  1. 1. Library Services Research Data (and Software) Management at Imperial* (*Everything you need to know to gain impact for your work!) HPC Summer School Research Data Management Community Session Sept. 20th, 2017 Sarah Stewart, Research Data Management Team, Central Library
  2. 2. Outline 1.) What is RDM and how does it apply to software? 2.) The RDM workflow: Tools and processes at Imperial 3.) Plan: Software Management 4.) Store and Archive: the importance of metadata 5.) Publishing and Discovery: Metadata strikes again! 6.) Conclusion/Questions
  3. 3. What is Research Data Management? Research Data Management is part of the research process, and aims to make the research process as efficient as possible, and meet expectations and requirements of the university, research funders, and legislation.
  4. 4. Funder requirements… “Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible…” RCUK Common Principles on Data Policy
  5. 5. The Strong Case for RDM • Intensive Data-Generating Research Hubs = ‘Big Data’ • UK Med Bio - Bioinformatics Data Science Group – research into causes and progression of human diseases. • NHS Trust Research Data (Medicine) • Research Computing Group and Research Software Engineering Community • But also many important ‘small data’ projects across College.
  6. 6. Why spend time on RDM? • It is not a distraction from ‘real work’. • You can work effectively and efficiently. • Save time and reduce frustration in the future. • Set systems that work for you.
  7. 7. Missing Data (and Software) “In their parents' attic, in boxes in the garage, or stored on now-defunct floppy disks – these are just some of the inaccessible places in which scientists have admitted to keeping their old research data.” http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416
  8. 8. Data Loss…
  9. 9. Software: What do I do with it? • Lots of emphasis on ‘data’ management, but software in research is often neglected. • Software is sensitive to changes in its ‘environment’ • There is a lot of variation inherent in software (languages, versions, licensing, etc.)
  10. 10. Obsolescence!
  11. 11. Software as ‘Data’ • ‘Software is used to create, interpret, present, manipulate and manage data’ (Software Sustainability Institute) • Data: ‘recorded factual material commonly retained by and accepted…as necessary to validate research findings’ (EPSRC) • Software = Data!
  12. 12. Treat software as valuable research output • PyRDM Green Shoots project • Zenodo integrates with GitHub • College survey on distributed version control • Software Sustainability Institute – I am a fellow
  13. 13. The RDM Workflow at Imperial
  14. 14. RDM Infrastructure Data Access Statement
  15. 15. Who are we? Helping the Imperial community to communicate and disseminate their research and academic work.
  16. 16. What is in a Data Management Plan?
  17. 17. PLAN - Data Management Plans A Data Management Plan is a document that is created in the early stages of a project that: • Helps you consider all aspects up front • Should be useful for you • Should be kept up to date An initial plan may be expanded later but should provide details about: • Plans and expectations for data • The nature of data and its creation or acquisition • Storage and security • Preservation and sharing
  18. 18. Data Management Plans: DMP Online
  19. 19. Software Management Plans What? • Like Data Management Plans, Software Management Plans provide an outline of uses, responsibilities, ownership, access and sharing, storage, maintenance and archiving of research software.
  20. 20. Software Management Plans Why? • No clear funder requirements yet, but… • Promotes citability and credit for your research = Increased Research Impact • Research Output can be validated/checked by others • Supports transparency of research and promotes Open Research. • Good practice!
  21. 21. Software Management Plans How? (at Imperial College London): • Specialised template in DMPOnline (via DCC) • Imperial-specific DMPOnline template (in development). • Use GitHub (Imperial has an enterprise account) • Use Zenodo or another subject-specific repository to archive versions of research software (GitHub integration) • Log metadata about your software into Symplectic. • Contact RDM Team (Central Library) for assistance/support: rdm-enquiries@imperial.ac.uk
  22. 22. Live Data Storage: Box (and Others) • Box for live data storage (non-sensitive) and data sharing • Sensitive data storage via ICT secure storage and encryption • Specialist data storage, eg. Omero in Bioinformatics Data Science Group for light microscopy images • Research Computing Repository • Imperial GitHub for Software and code
  23. 23. File naming, storing and retrieving
  24. 24. Backups… • The ‘3-2-1 principle’: always have at least 3 copies… …on at least 2 different media… …with at least 1 off-site • ‘LOCKSS’ – Lots Of Copies Keeps Stuff Safe • Never trust a backup you’ve never tested • Where possible, let ICT/department/faculty handle this
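
The 3-2-1 principle on the Backups slide can be expressed as a simple check. The following is a minimal sketch, not an Imperial tool: the `BackupCopy` class, its fields and the example locations are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of a dataset or code base (hypothetical record, for illustration)."""
    location: str   # free-text label, e.g. "laptop"
    medium: str     # storage medium, e.g. "ssd", "hdd", "cloud"
    offsite: bool   # True if the copy is held away from the main site


def satisfies_3_2_1(copies):
    """Return True if there are >= 3 copies, on >= 2 media, with >= 1 off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Example inventory (assumed values, not taken from the slides)
copies = [
    BackupCopy("laptop", "ssd", False),
    BackupCopy("departmental file server", "hdd", False),
    BackupCopy("cloud storage", "cloud", True),
]
print(satisfies_3_2_1(copies))
```

A check like this only confirms that copies are recorded as existing; it does not replace the advice on the slide to test that a backup can actually be restored.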
  25. 25. Archiving and preserving data Most research now carries a requirement to preserve data, in most cases for at least 10 years. This is to: • Enable future work • Support integrity of published findings Need to consider: • What should be kept • What format to keep it in • Where to keep it
  26. 26. Software should be preserved if: • Software can’t be separated from the data or digital object. • Software is classified as a research output • Software has intrinsic value
  27. 27. Digital Preservation Issues • Storage, Retrieval, Reconstruction and Replay are all complexities relating to code libraries, dependencies and software engineering overall. • Planning is essential for subsequent retrieval, reconstruction and replay. • Software is a digital object which is frequently the result of research and is often a vital prerequisite for the preservation of other digital objects. • Software preservation should be part of a broader preservation strategy: Research Data Management.
  28. 28. Strategies for Digital Preservation • Data Integrity and File Fixity checks (management of checksums) – for source code • Media and Format Migrations • Refreshing (reduces bit-rot) • Replication (create duplicate copies, avoids corruption, loss, erasure) • Emulation • Encapsulation (linking content with all information required for it to be deciphered and understood)
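
The fixity checks mentioned on the Strategies for Digital Preservation slide amount to recording a checksum for each file and comparing it again later. The sketch below assumes SHA-256 as the checksum algorithm (the slides do not name one) and uses hypothetical function names; it relies only on the Python standard library.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root, manifest):
    """List files added, removed or altered since the manifest was built."""
    current = build_manifest(root)
    names = manifest.keys() | current.keys()
    return sorted(n for n in names if manifest.get(n) != current.get(n))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "model.py"       # hypothetical source file
        source.write_text("print('v1')\n")
        manifest = build_manifest(tmp)
        print(changed_files(tmp, manifest))   # nothing has changed yet
        source.write_text("print('v2')\n")
        print(changed_files(tmp, manifest))   # model.py now differs
```

In practice the manifest would be stored alongside the archived source code and re-checked after each migration or refresh.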
  29. 29. Re3Data Repository Index
  30. 30. Zenodo https://zenodo.org Research. Shared. – all research outputs from across all fields of research are welcome! Citeable. Discoverable. – uploads get a Digital Object Identifier (DOI) to make them easily and uniquely citeable. Communities – create and curate your own community. Your own complete digital repository! Funding – identify grants, integrated in reporting lines for research Flexible licensing – because not everything is under Creative Commons. Safe – your research output is stored safely for the future in the same cloud infrastructure as CERN's own LHC research data.
  31. 31. Archiving Data ‘without a Repository?’ • Data is archived in Zenodo or in UK Data Service (sensitive data) post-project • Software and code archived in Zenodo via GitHub • Metadata from Data and Software are deposited into Spiral via Symplectic • Indexed by DataCite and CrossRef
  32. 32. The importance of Metadata • Ensure correct metadata is used in order to facilitate discovery – good metadata should be findable through both machine and human searches. • Ensure metadata is added following accepted standards (eg. following DCC Metadata Standards guide: http://www.dcc.ac.uk/resources/metadata-standards/list)
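
One way to make the advice on the metadata slide concrete is to check a record for completeness before depositing it. The sketch below is illustrative only: the field set is loosely modelled on DataCite's mandatory properties (identifier, creator, title, publisher, publication year, resource type), and the field names, record values and placeholder DOI are assumptions, not the schema used by Symplectic, Spiral or Zenodo.

```python
# Assumed minimal field set, loosely modelled on DataCite's mandatory properties.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical record for a piece of research software (all values are placeholders)
record = {
    "identifier": "10.5281/zenodo.0000000",   # placeholder DOI, not a real deposit
    "creators": ["Researcher, A."],
    "title": "Example simulation code",
    "publisher": "Zenodo",
    "publication_year": 2017,
    "resource_type": "Software",
}

print(missing_fields(record))                         # complete record
print(missing_fields({"title": "Untitled dataset"}))  # incomplete record
```

A discipline-specific standard from the DCC list will usually ask for more than this; the point is only that required fields can be checked mechanically before deposit.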
  33. 33. Publishing: The Data (and Software) Access Statement “Published results should always include information on how to access the supporting data.” — RCUK Common Principles on Data Policy Include a statement in all publications stating: - How/where the underlying data can be obtained - What restrictions/terms apply
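
The two elements the slide asks for in an access statement (where the data can be obtained, and what terms apply) can be captured in a small template. This is a hedged sketch: the function name, the wording of the statement and the example values are hypothetical, and funders or journals may prescribe their own wording.

```python
def data_access_statement(location, terms):
    """Build a one-paragraph access statement from a location and the terms of use."""
    if not location or not terms:
        raise ValueError("both a location and the applicable terms are required")
    return (
        "The data and software supporting this publication are available at "
        f"{location}. {terms}"
    )


# Placeholder values for illustration only
statement = data_access_statement(
    "https://zenodo.org (DOI to be added on deposit)",
    "They are released under the MIT licence.",
)
print(statement)
```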
  34. 34. Why share data and software? Build research profile Demonstrate validity of results Contribute to the community Because you must (sometimes)
  35. 35. Share information about your data and software • You can now share information about your data and software in the College publications repository ‘Spiral’ via a form on Symplectic.
  36. 36. Metadata Strikes Again! • Ensure that you have good quality metadata present in order to make your software and data findable and accessible (and also interoperable and reusable).
  37. 37. Can’t share your data and software? • Because it’s sensitive/confidential: - Share an anonymous version - Share summary statistics - Deposit in the UK Data Service Secure Lab - Require users to sign a Data Sharing Agreement • Because it’s not relevant to anyone else: - Actually, you’d be surprised… • Because it’s too much work to prepare: - Document and organise it as you go along - The up-front effort will make your future work easier too
  38. 38. Guidance for licensing ‘How to License Research Data’ A guide from the Digital Curation Centre http://www.dcc.ac.uk/resources/how-guides/license-research-data
  39. 39. Licensing for Software - Various open-source software licenses: eg. MIT, GNU, Apache, Mozilla Public, etc. - https://www.software.ac.uk/tags/licensing - https://opensource.org/licenses
  40. 40. ORCID – Open Researcher and Contributor ID • Emerging global standard for identifying authors of academic outputs • The College created ORCID iDs for academic staff in late 2014 (now 2,088 of 3,200 iDs claimed, ~1,500 linked in Elements) • Imperial hosted launch of Jisc ORCID consortium with 50 UK universities in September 2015 http://www.imperial.ac.uk/orcid
  41. 41. In Summary… • Planning: Use DMPOnline to draft a software management plan. • Storage: Use GitHub and/or Box to store active software • Archiving: Use Zenodo, GitHub or another subject-specific repository to preserve your software • Discovery: Make your software discoverable in Spiral via Symplectic • Publishing: Include a Data Access Statement in your published article stating where your software can be found and how it can be accessed and used
  42. 42. Any Questions? Thank you! For more information and support: Webpage: www.imperial.ac.uk/research-data-management E-mail: rdm-enquiries@imperial.ac.uk And also: DMPOnline: https://dmponline.dcc.ac.uk/ Software Sustainability Institute: https://www.software.ac.uk/
