ODiP: Reproducibility, open data and GDPR

Cylcia Bolibaugh spoke about reproducibility, open data and GDPR at the first Open Data in Practice event at the University of York on 15 November 2018.

Published in: Education

  1. Reproducibility, open data, & GDPR
     Cylcia Bolibaugh, Education, CReLLU
  2. Data sharing in Education
     EROS (Education Researchers for Open Science) (UYSEG, CRESJ, PERC, CReLLU)
     • qualitative
     • quantitative (experimental)
     • quantitative (individual differences)
     • Various goals for sharing data; today’s focus is on reproducibility
       – Verifiability of a publication’s findings: data and code
  3. GDPR & Data Protection Act complicate sharing of research data…
     – Co-regulatory approach: a shift in accountability from data protection authorities to data controllers and data processors (us!)
     – Adoption of open science practices is hindered by worries about compliance (funder, university requirements, legal, ethical)
  4. Personal data & identifiability
     “‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person” (Article 4(1) EU GDPR)
  5. The ‘motivated intruder’ test
     “To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments.” (Recital 26 EU GDPR)
  6. Differentiating between personal and anonymised data
     A balance between:
     (1) risk of disclosure / re-identification
     (2) consequences of disclosure (“perceived value of the information”)
  7. A toy dataset (Polish immigrants to the UK)
     • accuracy scores on language measure
     • reaction times on language measure
     • score on cognitive measure
     • score on cognitive measure
     • Age
     • Native language
     • Age of arrival to UK
     • Length of residence in UK
  8. Assessing risk of re-identification (Klein et al., 2018)
     • Small population and rare traits
     • Dyadic data
     • Hierarchical data (e.g., small subsamples of students, co-workers)
     • Motivated intruder test (e.g., jealous partner, nosy neighbor, envious co-worker, insurers, criminals)
  9. questions, questions…
     (1) Do the biographical variables constitute indirect identifiers?
     (1b) How can I systematically calculate the risk of re-identification (e.g. what is the risk of re-identification for a Polish immigrant to the UK, based on their age, length of residence in UK and age at time of immigration)?
     (2) If there is only a very slight possibility that an individual could be indirectly identified, is it still personal data?
     (3) What if the perceived value of the information that might be linked to that individual is actually quite low (e.g. how many milliseconds an individual took to identify an English word, or their rating of how acceptable a particular phrase or grammatical construction is)?
     (4) How would one go about documenting their consideration of these factors?
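  10. solutions?

      | Approach                                      | Reproducibility | Open Data | Usability |
      |-----------------------------------------------|-----------------|-----------|-----------|
      | Binning                                       | ✗               | ✓✓        | ✓✓✓       |
      | Permutation                                   | ✓✗              | ✓✓        | ✓✓✓       |
      | K-anonymity tools (e.g. R package sdcMicro)   | ✗               | ✓✓        | ✓✓        |
      | Synthesized dataset (e.g. R package synthpop) | ✓✓              | ✗         | ✓         |
      | Encrypted data with script (e.g. OSF)         | ✓✓✓             | ✗         | ✓         |
      | Restricted access depository                  | ✓✓✓             | ✓✓✓       | ✓✓        |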
  11. OSF-approved Protected Access repositories which are GDPR compliant
      • Research Data Center of the SOEP (DE)
      • Datorium (DE)
      • DataFirst (DE)
      • PsychData (ZPID, Leibniz)
      • University of Bristol Research Data Repository
      • The UK Data Service (ESRC)
  12. Anonymisation
      • Europe-wide standards for anonymisation are needed.
        – OpenAire: European Data Protection Board could issue guidelines concerning anonymisation.
      • Nationally, codes of conduct to differentiate between personal and anonymised data.
        – may only be binding for members
        – involvement of umbrella orgs (UKRN)
      • Institutionally, researcher-friendly guidance (decision trees, case studies, tools for documentation of risk assessment, etc.)
  13. Thanks! Questions?
  14. The Open Data badge is earned for making publicly available the digitally shareable data necessary to reproduce the reported results.
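
Question (1b) on the "questions, questions…" slide asks how re-identification risk could be calculated systematically, and the "solutions?" slide lists binning as one mitigation. The following is a minimal Python sketch of one possible starting point, not the speaker's method: it counts how many records share each combination of the three biographical variables (the k in k-anonymity) and shows how coarsening values into bins changes that count. The records, the function names, and the 10-year bin width are all hypothetical and chosen only for illustration. Note that this measures uniqueness within the sample only; uniqueness in the wider population would need external data.

```python
from collections import Counter

# Hypothetical toy records (invented for illustration, not real data):
# (age, age of arrival to UK, length of residence in UK), all in years.
records = [
    (34, 22, 12),
    (35, 23, 12),
    (29, 21, 8),
    (31, 24, 7),
    (52, 30, 22),
    (33, 21, 12),
    (28, 22, 6),
    (30, 23, 7),
]


def k_anonymity(rows):
    """Return k: the size of the smallest group of rows with identical values."""
    return min(Counter(rows).values())


def share_unique(rows):
    """Return the fraction of rows whose combination of values occurs only once."""
    counts = Counter(rows)
    return sum(1 for row in rows if counts[row] == 1) / len(rows)


def bin_rows(rows, width):
    """Replace every value with the lower bound of its bin of the given width."""
    return [tuple((value // width) * width for value in row) for row in rows]


# Risk indicators on the raw values.
raw_k = k_anonymity(records)
raw_unique = share_unique(records)

# The same indicators after binning into 10-year bands (assumed width).
binned = bin_rows(records, 10)
binned_k = k_anonymity(binned)
binned_unique = share_unique(binned)

print("raw:    k =", raw_k, " share unique =", raw_unique)
print("binned: k =", binned_k, " share unique =", binned_unique)
```

In this invented sample every raw record is unique. After binning, only one of the eight records remains unique, but k is still 1 because of that single outlier, which illustrates the "small population and rare traits" concern from the risk-assessment slide. It also suggests why binning scores poorly on reproducibility in the table: the coarsened values no longer reproduce analyses run on the exact ones.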
