Invited talk as part of Westminster Insight Research Data Management Forum, https://www.westminsterinsight.co.uk/event/3416/Research_Data_Management_Forum
Securing, storing and enabling safe access to data
1. SECURING, STORING AND ENABLING
SAFE ACCESS TO DATA
ROBIN RICE
Research Data Management Forum:
London, 10 Dec. 2019
Westminster Insight
2. EDINBURGH UNIVERSITY’S RESEARCH DATA SERVICE
• Support for researchers across the data lifecycle
• Help with data management planning, data protection
impact assessment (risk assessment & data flows)
• Advising on safeguards for storing sensitive data
• Providing secure, cost-effective data facilities
• Assistance with information governance – applications
to data holders such as NHS; data use agreements
• Infrastructure for secure data storage: Data Safe Haven
• Infrastructure and policies for long-term data retention:
DataVault
3. TWO ACRONYMS, TWO PARADIGMS: FAIR AND GDPR
• FINDABLE
• ACCESSIBLE
• INTEROPERABLE
• REUSABLE
• GENERAL
• DATA
• PROTECTION
• REGULATION
Image credit: SangyaPundir [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], from Wikimedia Commons
4. FAIR PARADIGM: OPEN BY DEFAULT
FINDABLE: “Metadata and data should be easy to find for both humans
and computers. Machine-readable metadata are essential for automatic
discovery of datasets and services.”
ACCESSIBLE: “Once the user finds the required data, she/he needs to
know how they can be accessed, possibly including authentication and
authorisation.”
INTEROPERABLE: “The data usually need to be integrated with other
data. In addition, the data need to interoperate with applications or
workflows for analysis, storage, and processing.”
REUSABLE: “The ultimate goal of FAIR is to optimise the reuse of data. To
achieve this, metadata and data should be well-described so that they can
be replicated and/or combined in different settings.”
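A minimal sketch of what "machine-readable metadata" could look like in practice: the Python snippet below builds a small dataset description using schema.org Dataset terms and prints it as JSON. The function name, the field selection and every value (including the placeholder DOI) are illustrative assumptions, not taken from the talk or from any University of Edinburgh service.

```python
import json


def build_dataset_metadata(title, identifier, description, licence_url, access_note):
    """Return a schema.org-style Dataset description as a plain dict.

    Hypothetical helper for illustration only.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": title,                      # Findable: human-readable title
        "identifier": identifier,           # Findable: persistent identifier (e.g. a DOI)
        "description": description,         # Reusable: what the data contain
        "license": licence_url,             # Reusable: terms of reuse
        "conditionsOfAccess": access_note,  # Accessible: how access is obtained
    }


# All values below are placeholders, not a real dataset.
record = build_dataset_metadata(
    title="Example survey dataset (hypothetical)",
    identifier="https://doi.org/10.0000/example",
    description="Anonymised responses from an illustrative survey.",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
    access_note="Available on request; authentication and authorisation required.",
)

# Serialising to JSON gives a form that software can harvest and index.
print(json.dumps(record, indent=2))
```

Note that the record can be openly published even when the data themselves are restricted: the access conditions are simply stated in the metadata.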
6. GDPR PARADIGM: PRIVACY BY DEFAULT
Six principles of the GDPR:
a) Lawfulness, fairness and transparency
b) Purpose limitation
c) Data minimisation
d) Accuracy
e) Storage limitation
f) Integrity and confidentiality (security)
8. GDPR PRINCIPLES AND RESEARCH
From: https://byglearning.co.uk/mrcrsc-lms/course/index.php?categoryid=1
9. DATA PROTECTION CHALLENGES FOR HUMAN SUBJECT
RESEARCHERS
• Understanding legal definitions (personal data, special categories,
data controllers and processors)
• Selecting secure data systems designed for privacy
• How to collect sufficient data for the research question, but no more
• Transparently communicating data processing actions to human
subjects (information sheets & consent forms)
• Understanding and documenting risks for a DPIA (data protection
impact assessment)
• How to anonymise/pseudonymise data; disclosure control
techniques
• Authorising access; creating legally binding data use agreements
• Dealing with breaches
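To illustrate the pseudonymisation and data minimisation points above, here is a minimal Python sketch that replaces a direct identifier with a keyed hash (HMAC-SHA256) and drops fields that are not needed for analysis. The function names, field names and sample record are hypothetical, and this is only one of several possible techniques. Pseudonymised data are still personal data under the GDPR, and the secret key must be stored separately from the data.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so records
    can still be linked, but the identifier cannot be recovered without
    the key.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_records(records, id_field, secret_key, drop_fields=()):
    """Replace id_field with a pseudonym and drop fields not needed for analysis."""
    output = []
    for rec in records:
        cleaned = {
            key: value
            for key, value in rec.items()
            if key != id_field and key not in drop_fields
        }
        cleaned["pseudonym"] = pseudonymise(str(rec[id_field]), secret_key)
        output.append(cleaned)
    return output


# Hypothetical example data; in practice the key would come from a secure store.
key = b"replace-with-a-randomly-generated-secret"
participants = [
    {"participant_id": "P001", "name": "Example Person", "age_band": "30-39", "score": 12},
]
safe = pseudonymise_records(participants, "participant_id", key, drop_fields=("name",))
print(safe)
```

A keyed hash is used here rather than a plain hash because identifiers drawn from a small, guessable set could otherwise be re-identified by hashing every candidate value.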
10. UOE RESEARCH DATA SERVICE = TOOLS AND SUPPORT FOR WORKING
ACROSS THE DATA LIFECYCLE
https://www.ed.ac.uk/is/research-data-service
11. ADDITIONAL SAFEGUARDS NEEDED? UNIVERSITY DATA SAFE HAVEN
FOR MANAGING DATA IN ACTIVE RESEARCH PROJECTS
• For projects requiring advanced security, the
Data Safe Haven (DSH) provides a controlled
and secured service environment for
undertaking research using sensitive data.
• The service provides robust controls and
safeguards to enable the secure transfer of
sensitive data into a highly secure
environment where it can be stored,
manipulated and analysed by approved
members of a research team.
12. UOE DSH ENVIRONMENT: AN ANALYTIC PLATFORM
A Virtual Desktop Environment
• Secure virtual environments for different projects
• A number of virtual desktops statically assigned & linked to each project
and its user group
• Restricted access
• Clear segregation of duties
• Gatekeepers
• 2-factor authentication
• End-to-end encryption
• Up to 5 TB of storage
• 1 CPU, 4 GB RAM
• Key data analysis tools & packages (SPSS, MATLAB, etc.)
13. LIFECYCLE OF A DSH RESEARCH PROJECT
DSH processes are governed by DSH Standard
Operating Procedures (SOPs).
14. ARCHIVING, SHARING & RETENTION OF RESEARCH DATA AFTER
THE PROJECT IS FINISHED: DATASHARE AND DATAVAULT
17. WHAT IS DATAVAULT FOR?
The DataVault allows data creators at the University of Edinburgh to:
• Store their data safely with the University for long-term retention;
• Link these data to projects and outputs in Pure without having to re-enter
any metadata;
• Receive a DOI for the data which allows easy citation in
publications and other outputs;
• Comply with funder and University requirements to preserve
research data for the long-term;
• Be confident that their data will persist without corruption or decay,
ready for reuse as and when required;
• Protect personal and confidential data through encryption.
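One common way to detect "corruption or decay" in stored files is a fixity check: record a checksum at deposit time and recompute it later. The Python sketch below shows the general idea with SHA-256. It is a generic illustration under assumed names (`sha256_of_file`, `verify_fixity`) and does not describe how DataVault itself is implemented.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_checksum):
    """Return True if the file still matches the checksum recorded at deposit."""
    return sha256_of_file(path) == expected_checksum


# Demonstration with a temporary file standing in for a deposited dataset.
with tempfile.TemporaryDirectory() as tmp:
    deposit = Path(tmp) / "dataset.csv"
    deposit.write_text("id,value\n1,42\n")

    recorded = sha256_of_file(deposit)       # stored alongside the deposit
    print(verify_fixity(deposit, recorded))  # file unchanged

    deposit.write_text("id,value\n1,43\n")   # simulate silent corruption
    print(verify_fixity(deposit, recorded))  # checksum no longer matches
```

Reading in chunks keeps memory use flat for large files, which matters for deposits measured in terabytes.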
18. WHAT IS DATAVAULT *NOT* FOR?
• Where it is intended that data will ultimately be made public, they
should instead be deposited either in a suitable disciplinary
repository or in DataShare, our open access data repository.
• DataShare deposits may be placed under embargo for up to 5
years, so that files remain temporarily inaccessible.
• Data needing to be retained only for a short period.
• Data in which a student owns the copyright.
19. WHAT IS INNOVATIVE ABOUT DATAVAULT?
• Fills a gap for a complete data lifecycle institutional service, helping to fulfil
the 2011 RDM policy
• Facilitates a collection of institutional data assets to be managed by the
University
• Incentivises open sharing by pairing with DataShare
• Keeps metadata records open even though the data are nominally ‘closed’
• Buys time for appraising data worthy of further curation
• Combines paradigms of data centres and digital preservation
“The principles refer to three types of entities: data (or any digital object), metadata (information about that digital object), and infrastructure. For instance, principle F4 defines that both metadata and data are registered or indexed in a searchable resource (the infrastructure component).” https://www.go-fair.org/fair-principles/
UK ICO website: ‘“(a) processed lawfully, fairly and in a transparent manner in relation to individuals (‘lawfulness, fairness and transparency’);
(b) collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes; further processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes shall not be considered to be incompatible with the initial purposes (‘purpose limitation’);
(c) adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed (‘data minimisation’);
(d) accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that personal data that are inaccurate, having regard to the purposes for which they are processed, are erased or rectified without delay (‘accuracy’);
(e) kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed; personal data may be stored for longer periods insofar as the personal data will be processed solely for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes subject to implementation of the appropriate technical and organisational measures required by the GDPR in order to safeguard the rights and freedoms of individuals (‘storage limitation’);
(f) processed in a manner that ensures appropriate security of the personal data, including protection against unauthorised or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organisational measures (‘integrity and confidentiality’).”’