Research Data Management at Imperial College London

A brief overview of the development and current workflows for Research Data Management at Imperial College London, presented to colleagues at the University of Copenhagen and Roskilde University in Denmark.

  1. Library Services: Research Data Management at Imperial College London. 17th May 2017, University of Copenhagen Library. Sarah Stewart, Research Data Support Assistant, Scholarly Communications Team / @Biostew. ORCID: http://orcid.org/0000-0002-9465-4042
  2. Imperial College London (some context) • ~15,000 students and ~8,000 staff, including ~3,000 researchers • International community, with students from 125 countries • Focus on four main disciplines: science, engineering, medicine and business • Times Higher Education World University Rankings 2016-2017: 3rd in Europe and 8th in the world • Greatest concentration of high-impact research of any major UK university
  3. The Strong Case for RDM • Intensive data-generating research hubs = ‘Big Data’ • UK Med Bio - Bioinformatics Data Science Group: research into causes and progression of human diseases • NHS Trust Research Data (Medicine) • Research Computing Group and Research Software Engineering Community • But also many important ‘small data’ projects across College
  4. Funder requirements… “Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible…” RCUK Common Principles on Data Policy
  5. Funder requirements…
  6. Data Science hub and KPMG Data Observatory launch (Nov 2015) "At a research intensive university like Imperial it is hard to do anything that doesn't involve data." James Stirling, Provost "Data is at the heart of the human condition." Joanna Shields, UK Minister for Internet Safety and Security
  7. The importance of RDM… “In their parents' attic, in boxes in the garage, or stored on now-defunct floppy disks – these are just some of the inaccessible places in which scientists have admitted to keeping their old research data.” http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416
  8. Data Loss…
  9. Process of policy development • 2014: Draft policy: “Statement of Strategic Aims” • Lack of reliable data, in particular on the scale of data storage needs • Concerns about cost of maintaining infrastructure • Concerns about uncertainties and changing market / policy landscape • Decision: re-think approach to be more cost-effective and based on better data • Approach: RDM Green Shoots and RDM Investigation • Funded by Vice-Provost (Research) • Green Shoots: 6 bottom-up, academic projects (2nd half of 2014) • RDM Investigation (Oct 2014-Jan 2015): online survey (academics; 390 responses), ~40 interviews (academics), workshops (academics & data managers)
  10. Online survey: where does active data live? [Bar chart: use of different types of storage in %. Categories: College computer, external/portable storage, cloud storage, personal computer, departmental/group storage, College H drive, ICT central storage.]
  11. Online survey: growth of data volume [Bar chart: research group data storage needs in %, now and in 2 years. Size bands: < 10 GB, 10 GB – 100 GB, 100 GB – 1 TB, 1 TB – 10 TB, 10 TB – 100 TB, 100 TB – 1 PB, > 1 PB.]
  12. Findings (best practice) • RDM principles are considered to be sound but not fully practised • Sharing publicly funded data is accepted in principle, but some question its value and cost • Concerns about the (metadata) effort needed to make shared data discoverable • Metadata schemas are not yet widely available across disciplines • Auto-generate metadata where possible • Consensus that RDM training for PhD students is vital (also to prevent data loss when they leave)
  13. Findings (data) • 60-100% of the grant would be required to re-generate data used in publications • % of data that needs retaining to support publications: ~60% • Data storage capacity will have to grow significantly • Concerns around back-up and archiving, esp. considering data volume • Popularity of cloud services (as opposed to College storage) • Researchers want a self-administered, secure, responsive solution for data sharing, storing and archiving; open APIs preferred • “Yes [storage] is really important. Basically, whenever we have been out to talk to researchers, that's the thing they have latched on to and want to talk about the most.” (doi:10.1371/journal.pone.0114734)
  14. Conclusions / policy implementation principles • Provide platform-independent, flexible data storage • Embed RDM training into PhD progression • Where available, use existing workflows: Symplectic Elements for metadata management; Spiral (DSpace) as public (metadata) catalogue • Additional infrastructure: use external resources; no long-term commitment; as flexible as possible; cost-effective
  15. Infrastructure summary • Flexible, can react to market / policy changes • Components can be exchanged, no additional in-house infrastructure • Make a start, collect data, learn, and change as required • Preservation infrastructure needs further work (discussions with Arkivum about ‘framework’ for costing into grants): how much do we need to retain beyond published data? • It isn’t perfect, but we can make a start
  16. Result: Imperial College RDM Policy “Imperial College London is committed to promoting the highest standards of academic research, including excellence in research data management. This includes a robust digital curation infrastructure that supports open data access and protects confidential data. The College acknowledges legal, ethical and commercial constraints on data sharing and the need to preserve the academic entitlement to publication.” “Principal Investigators have overall responsibility for the effective management of research data generated within or obtained for their research, including by their research groups. The Library and ICT will provide training, guidance and services to support PIs.” http://imperial.ac.uk/research-data-management
  17. Who are we? Helping the Imperial community to communicate and disseminate their research and academic work.
  18. The RDM Workflow at Imperial
  19. RDM Infrastructure Data Access Statement
  20. Data Management Plans: DMPonline
  21. Live Data Storage: Box (and Others) • Box for live data storage (non-sensitive) and data sharing • Sensitive data storage via ICT secure storage and encryption • Specialist data storage, e.g. OMERO in the Bioinformatics Data Science Group for light microscopy images • Research Computing Repository • Imperial GitHub for software and code
  22. Treat software as valuable research output • PyRDM Green Shoots project • Zenodo integrates with GitHub • College survey on distributed version control • Software Sustainability Institute: I am a fellow
  23. Archiving Data ‘without a Repository?’ • Data is archived in Zenodo or in the UK Data Service (sensitive data) post-project • Software and code are archived in Zenodo via GitHub • Metadata for data and software is deposited into Spiral via Symplectic • Indexed by DataCite and CrossRef
  24. ORCID: Open Researcher and Contributor ID • Emerging global standard for identifying authors of academic outputs • The College created ORCID iDs for academic staff in late 2014 (now 2,088 of 3,200 iDs claimed, ~1,500 linked in Elements) • Imperial hosted the launch of the Jisc ORCID consortium with 50 UK universities in September 2015 http://www.imperial.ac.uk/orcid
  25. Case for a national infrastructure? Currently, ~100 UK institutions spend effort to define and implement an RDM infrastructure (storage, workflows, interfaces, metadata, compliance, monitoring, business model etc.). Some aspects have to be local, but… imagine a national research data infrastructure (say for data publishing and preservation), run by RCUK: • Economies of scale • No issues with funding • Just one system to interface with • Increased visibility/discoverability • Solution would by default be compliant • No commercial “ownership” of public data
  26. Outreach: Love Your Data! • PhD training on RDM basics and DMPonline (including PhD-specific DMPonline template) • RDM ‘Drop-in Clinics’ • RDM ‘Byte-Size’ sessions: informal sessions on various topics • Imperial Data Circus • Open Access Road Show
  27. Liaisons [Diagram of RDM outreach liaisons and channels. Labels include: BDAU, FoE, FoLS, FoM, Business School, DoM, DoSC, Ped, Materials, Bioinf, Grad School, ESA, RDM Clinics, RDM Talks, 1:1s, RMs, Chem, Aero, HPC, RSE, CDT, CM Hub, PhD Webi, DMP, DOIs, DoM Event, New Starter email, Research Doughnut, RDM Outreach, OA Team, OA/RDM, OA Week, Data Circus, Civil, Bio eng, comp.]
  28. Imperial Data Circus • Originally for Open Access Week 2016 • Informal showcase for research conducted at Imperial with Open Data and Open Software • Provides a forum to discuss open research across disciplines
  29. Engaging Directly with Researchers • Embedded approach: meet with researchers in situ, in their labs and offices • One-on-one or group meetings • Departmental meetings to communicate policy changes and updates and to provide insight into best practice
  30. Stats are exciting! [Line chart: Time Series of RDM Enquiries (Totals); vertical axis: number of enquiries.]
  31. Data Catalogue [Charts: Dataset Deposits in Spiral, 2015-2016 (number of deposits and cumulative deposits); Software Deposits in Spiral, 2015-2016.]
  32. RDM: how are they asking? [Chart: How RDM Enquiries are Resolved. Categories: ASK, rdm-enquiries, one-to-one.]
  33. Nature of RDM Enquiries [Chart: rdm-enquiries: what are they asking? Categories: Box, Data Access Statement, Data Management Plan, Data Sharing/Publication, Data Archiving, DOIs/Metadata, Data Policy, Software, Zenodo, Outreach, Data Licenses.]
  34. On the Horizon… • Online MOOC, pre-recorded webinar and video presentations for researchers and students • Medicine-specific DMPonline template • Jisc Shared Services Pilot: UK-wide network of data management services (in planning)
  35. Questions? For more information: www.imperial.ac.uk/rdm • rdm-enquiries@imperial.ac.uk • Sarah Stewart: sarah.stewart@imperial.ac.uk, @Biostew • Ash Barnes: a.barnes@imperial.ac.uk, @ashbarnes71
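
The ORCID slide describes ORCID iDs as an emerging standard for identifying authors. One practical detail behind that: the last character of an iD is a check character computed with ISO 7064 MOD 11-2, so a workflow that ingests iDs can catch typos before linking records. The following is only a minimal sketch of that check in Python; the function names `orcid_check_digit` and `is_valid_orcid` are hypothetical and are not part of any Imperial, Symplectic or ORCID tooling. It is run against the iD shown on the title slide.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A computed value of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check an ORCID iD given either as a bare iD or as an orcid.org URL."""
    # Hypothetical helper: keep the part after the last slash, drop hyphens.
    bare = orcid.rsplit("/", 1)[-1].replace("-", "").upper()
    if len(bare) != 16:
        return False
    base, check = bare[:15], bare[15]
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == check


# The iD from the title slide of this presentation.
print(is_valid_orcid("http://orcid.org/0000-0002-9465-4042"))  # True
```

A check like this only confirms that an iD is well formed; it does not confirm that the iD has been registered or claimed by the researcher in question.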