Introduction: This paper describes a topical case study conducted at the University of Helsinki. The current state of research data management (RDM) practices within the academic community was examined closely in summer 2016 as part of Project MILDRED, the Development Project of Research Data Infrastructure at the University of Helsinki (UH).
Project MILDRED: Charting Ground for Research Data Management Services at University of Helsinki
1. PROJECT MILDRED: Charting Ground for Research Data Management Services at the University of Helsinki
Mari Elisa Kuusniemi & Anna Salmi
EAHIL Friday 16 June 2017
Picture: Roche DG, Lanfear R, Binning SA, Haff TM, Schwanz LE, et al. (2014) [CC BY 4.0 ]
CC BY University of Helsinki
2. WHAT IS PROJECT MILDRED?
• MILDRED is a project to update the research data infrastructure
of the University of Helsinki and to provide tools and services
that support data management.
• The aim of MILDRED is to provide researchers with a
state-of-the-art infrastructure and to design data-related
services that help researchers.
• The infrastructure will be developed for and with researchers and
together with national and international parties. User groups are
pivotal in the development process.
3. MILDRED SUBPROJECTS
1. Digitalization of Research Data Services Delivery
2. Data Repository Service
3. Data Publishing and Metadata Service
4. Data Storage and Backup
5. Implementation of Data Management Planning Tool - DMPTuuli
4. ABOUT THE STUDY
• This was a pre-study for the data repository project of MILDRED 3
• The study was conducted by trainee Anna Salmi supervised by project
manager Mari Elisa Kuusniemi and librarian Mikko Ojanen
• The study was conducted in summer 2016
• The study consisted of three phases:
I. Data inventory
II. Survey
III. Metadata exploration
5. THE HYPOTHESIS
Identifying the data repositories used by university
researchers would lead to finding the datasets
related to the university.
We could then harvest the metadata of those datasets into the
university data repository.
6. PHASE I: DATA INVENTORY
• In June 2016, we conducted an inventory of 250 PLOS articles by
University of Helsinki authors, published in 2015-2016.
• PLOS requires public RDM statements from the authors.
→ Data citations were easy to spot.
7. PHASE II: SURVEY
• The questionnaire was emailed to the entire research staff of the
University of Helsinki in June 2016.
• 258 answers
• 62 % from life sciences
• 21 % from social sciences and humanities
• 17 % from natural sciences
• The questionnaire contained
• a multiple-choice question about different databases
• a multiple-choice question about alternative data storage places and devices
• a free-response field for the respondent’s reasoning, comments and questions.
An anonymized version of the MILDRED survey data is shared on Figshare: https://doi.org/10.6084/m9.figshare.3806394.v4
8. RESULTS
• 44 % of respondents used at least one repository
• 21 % of respondents used at least two repositories
• 10 % of respondents used at least three repositories
• 56 % did not use repositories
The most popular repositories:
• GenBank 17 %
• GitHub 14 %
• Sequence Read Archive (SRA) 7 %
• Gene Expression Omnibus (GEO) 5 %
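The "at least N repositories" shares above are cumulative, so they overlap. A minimal sketch of how they break down into non-overlapping groups follows; it uses only the percentages reported on this slide, and the function name `exact_shares` is a hypothetical helper introduced for illustration.

```python
# Cumulative shares reported on the slide (percent of respondents).
at_least = {1: 44, 2: 21, 3: 10}


def exact_shares(at_least):
    """Convert 'at least k' shares into non-overlapping 'exactly k' shares.

    The highest k stays open-ended ('k or more'), because the slide
    gives no figure beyond it.
    """
    ks = sorted(at_least)
    # Respondents using no repository are the complement of 'at least one'.
    exact = {0: 100 - at_least[ks[0]]}
    # 'Exactly k' is 'at least k' minus 'at least k + 1'.
    for k, nxt in zip(ks, ks[1:]):
        exact[k] = at_least[k] - at_least[nxt]
    exact[f"{ks[-1]}+"] = at_least[ks[-1]]
    return exact


shares = exact_shares(at_least)
print(shares)
```

Under this reading, the groups sum to 100 %, which matches the 56 % of respondents reported as not using repositories.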
9. WHY THE DATA WAS NOT IN A REPOSITORY
• 29 % lack of knowledge of suitable repositories
• 11 % sensitive data
10. REASONS
“I do not know or trust them [repositories] enough. I do not have such big
data that it would be a problem to store it other ways. I would need a
system that is reliable, easy to use and access and permanent solution.”
“The [research] results are fully covered by the published articles.”
“Unclear benefits with respect to effort.”
“It was sufficient until this moment to store the data within University
infrastructure, although convenient data sharing between collaborators is
still lacking.”
11. PHASE III: METADATA EXPLORATION
• Together, the inventory and the survey produced a list of 48
repositories.
• We harvested all information about these repositories (by repository ID)
from Re3data.
• We looked more closely at
• metadata (focusing on affiliation, contributors)
• data access type
• data licenses
• persistent identifier (PID)
• APIs
• We ran a test search in all 48 repositories by researcher name or by
the affiliation “Helsinki”.
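The harvesting step above could be sketched roughly as follows. This is an illustration, not the script used in the study: the URL pattern, the element names (`repositoryName`, `databaseAccessType`, `dataLicenseName`, `pidSystem`, `api` with an `apiType` attribute) and the repository ID are assumptions about the re3data API and schema, and the sample record is made up so that the sketch runs offline.

```python
import xml.etree.ElementTree as ET

# Assumption: re3data serves one XML record per repository ID at this pattern.
RE3DATA_RECORD_URL = "https://www.re3data.org/api/v1/repository/{repo_id}"

# Made-up record for illustration only; it is not real re3data output.
SAMPLE_RECORD = """
<r3d:re3data xmlns:r3d="http://www.re3data.org/schema/2-2">
  <r3d:repository>
    <r3d:repositoryName language="eng">Example Repository</r3d:repositoryName>
    <r3d:databaseAccess>
      <r3d:databaseAccessType>open</r3d:databaseAccessType>
    </r3d:databaseAccess>
    <r3d:dataLicense>
      <r3d:dataLicenseName>CC</r3d:dataLicenseName>
    </r3d:dataLicense>
    <r3d:pidSystem>DOI</r3d:pidSystem>
    <r3d:api apiType="OAI-PMH">https://example.org/oai</r3d:api>
  </r3d:repository>
</r3d:re3data>
"""


def record_url(repo_id):
    """Build the (assumed) record URL for one repository ID."""
    return RE3DATA_RECORD_URL.format(repo_id=repo_id)


def local(tag):
    """Strip the XML namespace so the sketch does not depend on a schema version."""
    return tag.rsplit("}", 1)[-1]


def summarize(xml_text):
    """Collect the fields the study looked at from one re3data-style record."""
    summary = {"name": None, "access": [], "licenses": [], "pids": [], "apis": []}
    for el in ET.fromstring(xml_text).iter():
        name = local(el.tag)
        text = (el.text or "").strip()
        if name == "repositoryName" and summary["name"] is None:
            summary["name"] = text
        elif name == "databaseAccessType":
            summary["access"].append(text)
        elif name == "dataLicenseName":
            summary["licenses"].append(text)
        elif name == "pidSystem":
            summary["pids"].append(text)
        elif name == "api":
            summary["apis"].append(el.get("apiType"))
    return summary


# "r3d100000000" is a placeholder ID, not a real repository.
print(record_url("r3d100000000"))
summary = summarize(SAMPLE_RECORD)
print(summary)
```

In practice the XML text would be fetched from `record_url(...)` for each of the 48 repository IDs and the summaries collected into one comparison table.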
12. RESEARCHER’S NAME OR ORGANISATION NAME
• Searching by researcher name returned results in 21 of 48 repositories.
• Searching by organisation name returned results in 8 of 48 repositories.
13.
Repository                      Openness of database       API                     PIDs
dbGaP                           open/restricted/embargoed  yes (FTP)               -
GitHub                          open/restricted            yes (other)             -
GBIF                            open                       yes (REST)              -
Inspire-HEP                     open                       yes (OAI-PMH)           ARK, DOI, ORCID
Language Bank of Finland        open/restricted            yes (OAI-PMH)           other
MG-RAST                         open                       yes (REST/FTP)          -
Finnish Soc. Sci. Data Archive  open/restricted            yes (via KUHA OAI-PMH)  other
Zenodo                          open/restricted            yes (OAI-PMH/REST)      DOI, ORCID
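For the repositories listed with an OAI-PMH interface, an affiliation-based harvest might look like the sketch below. It is a hedged illustration, not the study's method: the endpoint `https://example.org/oai` and the two sample records are invented, and only the standard OAI-PMH `ListRecords` verb and the `oai_dc` (simple Dublin Core) format are assumed. Note that `oai_dc` has no dedicated affiliation field, so the filter only matches when a repository happens to write the organisation name into the creator or contributor text.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard namespaces for OAI-PMH responses carrying simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented response for illustration; identifiers and names are not real.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Dataset A</dc:title>
          <dc:creator>Doe, Jane (University of Helsinki)</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Dataset B</dc:title>
          <dc:creator>Roe, Richard</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a given endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def records_mentioning(xml_text, needle):
    """Return (identifier, title) for records whose creator or contributor
    text contains `needle` (case-insensitive)."""
    hits = []
    root = ET.fromstring(xml_text)
    for rec in root.iterfind(".//oai:record", NS):
        people = [
            el.text or ""
            for tag in ("creator", "contributor")
            for el in rec.iterfind(f".//dc:{tag}", NS)
        ]
        if any(needle.lower() in person.lower() for person in people):
            ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
            title = rec.findtext(".//dc:title", default="", namespaces=NS)
            hits.append((ident, title))
    return hits


url = list_records_url("https://example.org/oai")
hits = records_mentioning(SAMPLE_RESPONSE, "Helsinki")
print(url)
print(hits)
```

A real harvest would also need to follow `resumptionToken` values across response pages, which this sketch leaves out.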
14. RESULTS FROM METADATA EXPLORATION
• There are many kinds of contributors: data creator, rights holder,
collector, curator, manager, analyst, submitter, contact, distributor,
and many more.
• An affiliation equivalent to the one used for journal articles or books
is difficult or impossible to define accurately.
15. CONCLUSIONS
• There is a clear need for training in, and promotion of, good data
management practices, university services and the data repositories
available.
• Our hypothesis proved to be wrong (in most cases).
• There is not, and will not be, an efficient way to collect all data produced
by an organisation. Therefore we can concentrate on quality, not volume.
• We should optimize the impact of datasets and the metadata we will collect
into the institutional data repositories.
16. THANK YOU!
mari.elisa.kuusniemi@helsinki.fi
An article about the study was published in the Journal of EAHIL 2017; vol. 13 (2): 13-15,
http://eahil.eu/wp-content/uploads/2017/06/journal-2-2017-web-1.pdf