A system to help a small marine lab manage its data and metadata. Presented at the Research Data Network workshop, St Andrews, 30 Nov 2016.


  1. SMRUDAS: A system to help a small marine lab manage its data and metadata. 30th Nov 2016, JISC Research Data Workshop, St Andrews. Clint Blight, SMRU, cjb22@st-andrews.ac.uk
  2. The Sea Mammal Research Unit (SMRU) • Formed in 1978 by merging NERC’s old Seals Research Division and Whale Research Unit. • Until 1996, a NERC research unit based in Cambridge. • Then moved to the University of St Andrews to become part of its School of Biology.
  3. SMRU today • Still at the University of St Andrews • Part of the Scottish Oceans Institute (SOI) • A NERC National Capability Delivery Partner • Member of the Marine Alliance for Science and Technology for Scotland (MASTS)
  4. What does this mean as far as data? • The Unit has grown (>50 staff + PhD students) • Lots of new and ongoing projects • New funded work is covered by data policies • Long history, so also many legacy datasets • Most of the senior NERC staff are due to retire over the next few years
  5. Types of SMRU projects that generate data • Air surveys: photos, video, counts • Tagging: telemetry data (locations, dives, CTDs, …) • Pool-based studies: diet records, weights, video, … • Photo ID: databases of images, catalogues of individuals • Passive acoustic surveys: lots of audio files, ship tracks, … • Laboratory analysis: contaminants, genetics, … • Many more…
  6. Where’s all this data stored? • Some in an Oracle database (e.g. telemetry). • Starting to use the University’s central SAN. • Aiming to move more to archives like BODC. That actually sounds quite good! But at the moment we still have things like…
  7. Several groups currently store audio, video, photo and other data on NAS boxes like these (tens of terabytes). Backups sit on lots of external drives; in this case acoustic surveys are stored on what once would have been fairly state-of-the-art drives (again, terabytes of data). There are plenty of CDs and DVDs, which not that long ago were one of our most common backup options. But what’s on them, and how long will the media last? There are random old backups, in this case a tape tucked away in a fire safe that might be a copy of a now-departed staff member’s files from an old server. There are also files, some of which might still be useful, stored on a variety of other older media, and those are going to become harder and harder to access. Then there’s still plenty of analogue data around: SMRU’s grey seal survey only switched to digital cameras in the last few years.
  8. So how do we know what older datasets SMRU has and where to look for them? • The main long-term datasets are still in active use. • For past data, it is often a matter of asking individual scientists within the Unit. • A couple of people are still around who know something about SMRU’s older IT systems.
  9. So data management requirements include: • Managing data & metadata during ongoing projects. • Making datasets publicly available, or at least more discoverable. • Trying to at least catalogue many of our older legacy datasets, especially those collected by staff who may be retiring soon.
  10. What approach to take for metadata? • Lots of standards, some more applicable than others to particular SMRU datasets. • May have to be able to provide INSPIRE, GEMINI, etc. compliant records. • SMRU’s designated NERC data archive centre is BODC. It seemed that, if they could be implemented, some MEDIN-style records might help.
  11. The MEDIN Discovery Metadata Editor: http://www.dassh.ac.uk/medin_metadata/ • Fairly friendly web interface for this type of thing. • A way we can at least make datasets discoverable.
  12. Found some RDBMS MEDIN schema info: http://www.oceannet.org/marine_data_standards/medin_approved_standards/documents/medin_usermanaged_tables_user.doc
  13. Then came across medin-rdbms-tool from the GeoData Institute in Southampton: https://github.com/geo-data/medin-rdbms-tool
  14. Proposed metadata pathway • Create an in-house metadata catalogue. • Set up an RDBMS including the MEDIN tables. • Use medin-rdbms-tool to generate XML files. • Import them into the MEDIN Discovery Metadata Editor. • Publish records to make datasets discoverable. This might still work, but things get out of sync when new versions of the standard are released.
  15. How to implement something for the Unit? Kitware’s Midas Platform? http://www.midasplatform.org/
  16. “The Multimedia Digital Archiving System” • Open source. • Quite mature: now on MIDAS version 3. • Web-based (PHP, Zend). • Choice of database backends. • Customizable/extensible via modules. • Option of future support from Kitware.
  17. Decided to develop SMRUDAS. Maybe for data and metadata, but maybe also docs and code? Not just for archiving but also for ongoing projects, and named with this in mind: SMRU + Digital/Data + Assembly/Archive + System/Storage.
  18. Two custom modules • Kept to a fairly standard MIDAS implementation, so there is the option of upgrading to any new versions. • One custom module + tables targeted at very SMRU-specific metadata. • Another to provide tables and forms for handling MEDIN-style records.
  19. SMRUDAS data + metadata flow • IN: data and metadata • SMRUDAS (WAMP server): data stored on hard disk, metadata in MySQL • OUT: data to an archive centre (e.g. BODC) and MEDIN metadata XML
  20. SMRUDAS “home page”
  21. The “Projects” page
  22. “SMRU” metadata tab for a project
  23. “MEDIN” metadata tab for a project
  24. Data management through a NERC project • Proposal: outline data management plan (DMP) • Funded: data management plan agreed • During: data + metadata collected and stored • Before end: dataset(s) + metadata archived • After: data discoverable & publicly available (Diagram: stages annotated with BODC and SMRUDAS)
  25. SMRUDAS: Envisioned workflow for a new project. Start of project: • Create a “Project” on SMRUDAS • At that point, start filling in the basic metadata • Add any initial files (e.g. proposal, data management plan, protocols)
  26. SMRUDAS: Envisioned workflow for a new project. During the project: • If datasets/docs aren’t too big, use SMRUDAS as an additional place to archive definitive versions • For larger datasets, specify where the data are actually stored • Continue to update the metadata record
  27. SMRUDAS: Envisioned workflow for a new project. Towards the end of the project: • If datasets/docs aren’t too big, upload candidate final versions of datasets, code and docs • Try to complete the metadata record • Start liaising with the data centre about archiving the data
  28. SMRUDAS: Envisioned workflow for a new project. By the end of the project: • Final datasets/docs uploaded (or links supplied) • “Compliant” metadata record completed • Long-term archiving agreed • EVERYONE’S HAPPY!
  29. SMRUDAS: So where are we now? • Took a long time to develop, but finally rolled out. • Just starting to populate it with files and metadata. • Might end up being used by just a few core users. • Could potentially add further modules to meet any new requirements.
  30. SMRUDAS: What’s going to be tricky? • Ensuring major ongoing projects are kept up to date. • Trying to fit in cataloguing the older datasets. • Keeping up with changes in software, standards, etc. • Training people in managing data + metadata, and showing the benefits to their research of using such a system.
  31. SMRUDAS: How it should aid data management in SMRU • Central system to keep track of SMRU data + metadata • Place to keep versions of small/medium-size datasets • A tool to simplify creating the required metadata records • Streamline passing data + metadata to centres such as BODC for long-term archiving • Should eventually provide SMRU with a fairly complete in-house catalogue of its datasets
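
The metadata pathway on slide 14 (keep the catalogue in an RDBMS, then export XML records) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not medin-rdbms-tool: it uses SQLite, a hypothetical and heavily simplified `medin_record` table, and invented element names, whereas the real MEDIN tables and XML encoding are much richer.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the MEDIN user-managed tables:
# one row per dataset with a handful of discovery fields.
SCHEMA = """
CREATE TABLE medin_record (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    abstract TEXT,
    start_date TEXT,
    end_date TEXT,
    contact_email TEXT
)
"""

FIELDS = ("title", "abstract", "start_date", "end_date", "contact_email")


def record_to_xml(row: sqlite3.Row) -> str:
    """Serialise one catalogue row to XML (illustrative element names, not the MEDIN schema)."""
    root = ET.Element("record", id=str(row["id"]))
    for field in FIELDS:
        # NULL columns become empty elements rather than being dropped.
        ET.SubElement(root, field).text = row[field] or ""
    return ET.tostring(root, encoding="unicode")


def export_all(conn: sqlite3.Connection) -> list[str]:
    """Generate one XML document per catalogue row, ordered by id."""
    conn.row_factory = sqlite3.Row  # allow access to columns by name
    rows = conn.execute("SELECT * FROM medin_record ORDER BY id")
    return [record_to_xml(row) for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO medin_record (title, abstract, start_date, end_date, contact_email) "
        "VALUES (?, ?, ?, ?, ?)",
        ("Example aerial survey", "Counts from aerial photographs.",
         "2015-08-01", "2015-08-31", "data@example.org"),
    )
    for doc in export_all(conn):
        print(doc)
```

Keeping the XML generation in a single function (`record_to_xml`) localises the part that would need revisiting when a new version of the standard changes the required structure, which is the sync problem noted on slide 14.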
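
The flow on slide 19 keeps data files on disk and metadata in a database. A minimal sketch of that split follows, assuming SQLite as a stand-in for MySQL, a local directory as the file store, and a hypothetical `item` table. The SHA-256 checksum is an added assumption (a common fixity check), not something the slides describe, and the sketch does not reproduce MIDAS’s own storage layout.

```python
import hashlib
import shutil
import sqlite3
import tempfile
from pathlib import Path

# Assumptions: SQLite stands in for the MySQL metadata store, and a local
# directory stands in for the WAMP server's hard-disk file store.


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical 'item' table holding per-file metadata."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS item (
               id INTEGER PRIMARY KEY,
               project TEXT NOT NULL,
               filename TEXT NOT NULL,
               stored_path TEXT NOT NULL,
               size_bytes INTEGER NOT NULL,
               sha256 TEXT NOT NULL
           )"""
    )


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large survey files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def ingest(conn: sqlite3.Connection, store_dir: Path, project: str, src: Path) -> int:
    """Copy src into the file store (data) and record it in the database (metadata)."""
    digest = sha256_of(src)
    # Name the stored copy by its hash so identically named files cannot collide.
    dest = store_dir / digest[:2] / digest
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    cur = conn.execute(
        "INSERT INTO item (project, filename, stored_path, size_bytes, sha256) "
        "VALUES (?, ?, ?, ?, ?)",
        (project, src.name, str(dest), dest.stat().st_size, digest),
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "dives.csv"
        src.write_text("time,depth\n0,10\n")
        conn = sqlite3.connect(":memory:")
        init_db(conn)
        item_id = ingest(conn, root / "store", "telemetry-example", src)
        print(conn.execute("SELECT * FROM item WHERE id = ?", (item_id,)).fetchone())
```

Separating the two stores this way means large files never pass through the database, while the metadata rows remain small and easy to query or export.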
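
Slides 25 to 28 describe the metadata record filling up as a project progresses, ending in a “compliant” record. One way to track this is a per-stage completeness check. The sketch below assumes three stages and a hypothetical list of fields; the real mandatory elements would come from the MEDIN standard, not from this list.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """Project stages, loosely following the envisioned workflow on slides 25 to 28."""
    START = 1   # project created on SMRUDAS, basic metadata begun
    DURING = 2  # data being collected; storage location recorded
    END = 3     # a "compliant" record is expected


# Hypothetical field names and the stage from which each becomes required.
REQUIRED_FROM = {
    "title": Stage.START,
    "abstract": Stage.START,
    "contact_email": Stage.START,
    "data_location": Stage.DURING,
    "start_date": Stage.DURING,
    "end_date": Stage.END,
    "archive_centre": Stage.END,
}


def missing_fields(record: dict[str, Optional[str]], stage: Stage) -> list[str]:
    """Return, sorted, the fields required by `stage` that are absent or blank."""
    return sorted(
        field
        for field, required_from in REQUIRED_FROM.items()
        if required_from <= stage and not (record.get(field) or "").strip()
    )


if __name__ == "__main__":
    record = {"title": "Example survey", "abstract": "Counts", "contact_email": "data@example.org"}
    print(missing_fields(record, Stage.START))
    print(missing_fields(record, Stage.END))
```

Making the requirements cumulative by stage mirrors the slides: a little metadata at the start, more during the project, and a complete record by the end, so gaps show up early rather than at archiving time.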