Starting with education, inception of research questions, planning, acquisition, analysis and reporting, there are multiple points where Open Science should play a role. In my presentation at the CuttingEEG conference in Paris, I argue that we should not only be sharing primary outcomes as Open Access publications, but that openness involves the full research cycle. Specifically, I will be sharing my experience with Open Data, privacy challenges and possibilities under the GDPR, Open Source for sharing analysis methods, dealing with imperfections in science and versioning of data, code and results. Finally, I will introduce BIDS for EEG, a new effort to increase the impact of shared and well-documented EEG data.
CuttingEEG - Open Science, Open Data and BIDS for EEG
1. Open Science, Open Data and BIDS for EEG
Robert Oostenveld
Donders Institute, Radboud University, Nijmegen, NL
Karolinska Institutet, Stockholm, SE
r.oostenveld@donders.ru.nl
These slides will
be shared online
on slideshare
2. Outline of this session
Dorothy Bishop – simulate and pre-register for more reproducible EEG
Aina Puce – better and detailed reporting of results
Robert Oostenveld – sharing of data and analysis details
3. What is Open Science?
Open educational resources
Open access publications
Pre-registration
Open peer review
Open methodology
Open source
Open hardware
Open data
7. Open Science – infrastructure and tools
Git and GitHub, Gitlab, BitBucket
Work together on code for analyses
Open Science Framework (osf.io)
Work together on documenting
DataVerse, Zenodo, etc
Sharing of data
Code Ocean, Microsoft Azure, Anaconda Cloud
Cloud-based computational reproducibility platform
Past - Black-and-white version of article printed on dead trees
Present - PDF for download, sometimes online supplementary material
Future - Online notebooks that reproduce the results in detail
Lab notebook
Science is getting more exciting – but also harder in some ways
8. Open Science – planning ahead
Planning your analysis
Planning and publishing primary outcomes
Writing your scientific papers
Writing your PhD thesis
Public outreach
Planning and publishing secondary outcomes
Publishing details on the methods
Data management plan
Publishing your data
9. Sharing primary and secondary outcomes
Publication with the primary findings
To the wider audience
To your scientific peers
Methods
Protocol
Stimulus material
Analysis methods
Original data
Details on the results
https://en.wikipedia.org/wiki/IMRaD
10. Open Science – planning ahead
Planning your analysis
Planning and publishing primary outcomes
Writing your scientific papers
Writing your PhD thesis
Public outreach
Planning and publishing secondary outcomes
Publishing details on the methods
Data management plan
Publishing your data
12. Share/publish your methods
More details in your analysis than fits in your “Methods” section
Not possible to describe details in human-oriented text
Batch scripting
MATLAB, Python, R, SPSS, Bash, …
Analysis script corresponds to computer code
Version management tools for source code
Git, Subversion, Mercurial
GitHub, Gitlab, BitBucket
13. Version control - linear
V1
V2
V3
V4
2018-02-24
2018-03-16
2018-05-30
2018-06-05
14. Version control – branching and merging
V1
V2
V3-Yours V3-CoAuth1 V3-CoAuth2
V4-Merged
15. Version control – branching multiple analyses
V1
V2
V3-b V3-a V3-failed
V4-b V4-a
V5
16. Version control – collaborating
[Figure: three copies of the same version history, labeled "Your copy on your computer", "Your copy on github" and "Someone else’s copy on github". Each copy contains V1, V2, V3-a, V3-b, V3-failed, V4-a, V4-b and V5, followed by V5-a and V6-a; a final V7 appears once.]
17. Open Science – version control
Multiple versions/editions of your analysis scripts
Release when you think it is ready, i.e. upon publication
New revision when it has been improved
Versions of software … at a time scale of years/months/weeks
Editions of books … at a time scale of decades
Original scientific data stays constant, but its interpretation
may change over time.
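One way to make this concrete is to store, next to each result, a fingerprint of the data and of the analysis script that produced it. The following is a minimal Python sketch under stated assumptions, not a replacement for Git: the file names, the `provenance_record` helper and the version labels are hypothetical and only serve the illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(data_file, script_file, script_version):
    """Describe which data and which script version produced a result."""
    return {
        "data_file": Path(data_file).name,
        "data_sha256": sha256_of(data_file),
        "script_file": Path(script_file).name,
        "script_sha256": sha256_of(script_file),
        "script_version": script_version,  # assumed to be e.g. a Git tag
    }


# Demonstration with throw-away files (hypothetical names and content).
workdir = Path(tempfile.mkdtemp())
data = workdir / "recording.dat"
script = workdir / "analysis.py"
data.write_bytes(b"\x00\x01\x02\x03")
script.write_text("print('version 1 of the analysis')\n")

record_v1 = provenance_record(data, script, "V1")

# The script is revised; the original data is left untouched.
script.write_text("print('version 2 of the analysis')\n")
record_v2 = provenance_record(data, script, "V2")

print(json.dumps(record_v2, indent=2))
```

Comparing the two records shows an unchanged data checksum and a changed script checksum, which is the situation described on this slide: constant data, evolving analysis.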
19. Data Management Plan
Think about the data that you will collect
and how to document it
… since you want others to re-use your data
Document the details of your data, e.g. in a “codebook”
… since you want to use data collected by others
To learn new analysis skills
As pilot
For (re)analysis
… since you want to re-use your own data
Write documentation for your “future self”
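A codebook becomes more useful when it can be checked against the data automatically. The sketch below assumes a very small codebook stored as a Python dictionary; the column names, levels and the `check_against_codebook` helper are hypothetical and not part of any standard.

```python
# Hypothetical codebook: one entry per column of the data table.
codebook = {
    "participant_id": {"description": "Pseudonymized participant code"},
    "age": {"description": "Age at acquisition", "units": "years"},
    "handedness": {
        "description": "Self-reported handedness",
        "levels": {"L": "left", "R": "right"},
    },
}

# Hypothetical data table, as it might be read from a TSV file.
rows = [
    {"participant_id": "sub-01", "age": 34, "handedness": "R"},
    {"participant_id": "sub-02", "age": 27, "handedness": "L"},
    {"participant_id": "sub-03", "age": 41, "handedness": "A"},
]


def check_against_codebook(rows, codebook):
    """Return a list of readable problems; an empty list means all is documented."""
    problems = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            if column not in codebook:
                problems.append(f"row {i}: column '{column}' is not documented")
                continue
            levels = codebook[column].get("levels")
            if levels is not None and value not in levels:
                problems.append(
                    f"row {i}: value '{value}' of '{column}' is not a documented level"
                )
    return problems


problems = check_against_codebook(rows, codebook)
for problem in problems:
    print(problem)
```

Here the third row uses a handedness code that the codebook does not explain, which is the kind of gap your "future self" would otherwise have to guess at.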
20. Open Data
Findable
Data and supplementary materials have sufficiently rich metadata
and a unique and persistent identifier.
Accessible
Data is deposited in a trusted repository.
Authentication and authorization procedure where necessary.
Interoperable
(Meta)data uses a formal, shared, and broadly applicable language or format.
Reusable
Data is described with clear and understandable attributes.
There should be a clear and acceptable license for re-use.
https://www.force11.org/group/fairgroup/fairprinciples
21. Open Science - data
Shared data allows for
Improved reproducibility
Pooling, to detect small effects that require large group sizes
Data mining, discovery science and generating new hypotheses
Results in methodological opportunities
Improve algorithms
Estimate effect and group size
Make informed decisions on analysis pipeline
Prevent HARKing and p-hacking
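To illustrate why small effects require large group sizes, the following sketch estimates the number of participants per group for a two-sided comparison of two independent groups. It uses the standard normal approximation rather than an exact t-test based power calculation, so the numbers are slightly optimistic; the function name `n_per_group` and the default values for alpha and power are assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_(1-alpha/2) + z_power) / d) ** 2,
    where d is the standardized effect size (Cohen's d).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Because the required group size grows with the inverse square of the effect size, halving the expected effect roughly quadruples the number of participants, which is where pooling of shared data helps.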
22. Data from human participants
General Data Protection Regulation (GDPR)
Challenges:
Explicit and strict protection of personal data
Opportunities:
Less influence of national legislation differences
Learn from each other
Develop best practices
https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:OJ.L_.2016.119.01.0001.01.ENG
24. Personal data
name
address
date of birth
phone number
license plate number
IP address
...
Crime Scene Investigation
http://www.abc.net.au/news/2017-09-19/csi/8960590
25. (Biometric) data
facial details
dental record
fingerprint
genetics
cortical folding pattern
clinical data
cortical response to stimulation
responses to a questionnaire
26. Personal Data is needed
and should be managed
Required for administration
Contacting your participants
Paying your participants
Follow up incidental findings
Often not required to address the research question
Sometimes used as confound
Check whether the sample is representative
Possibly required to assess scientific integrity
27. Personal Data
Personal data
Name, address, date of birth
Special personal data (“bijzondere persoonsgegevens” in Dutch)
Race
Religion or beliefs
Health
Sexual activities
Political preference, membership of a union
Criminal record
Indirect personal data – identifies someone … when linked to another database
Fingerprint, DNA, facial details
Anatomical MRI
Specific pattern of data (e.g. answers on a questionnaire or interview)
https://autoriteitpersoonsgegevens.nl/nl/over-privacy/persoonsgegevens/wat-zijn-persoonsgegevens
28. Gradient between personal and research data
personal data: easy, keep private and don’t share
indirect personal data: hard, ?
a lot of research data: easy, share as it is with others
29. Limit possible identification
Anonymous
Nobody is able to identify the participant
Pseudonymization
Use a code instead of the participant’s name
De-identification
Remove (indirectly) identifying features
Blur the indirect personal data
Deface anatomical MRI
Age at the time of acquisition instead of date of birth
Use age bins instead of years
Questionnaire outcomes rather than individual item scores
…
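Some of the steps listed on this slide can be sketched in a few lines of Python: replacing names by codes, using age at acquisition instead of date of birth, and binning that age. This is a minimal illustration, not a complete de-identification procedure; the records, the 5-year bin width and the helper names are hypothetical.

```python
from datetime import date

# Hypothetical administrative records; these stay private.
participants = [
    {"name": "Alice Example", "date_of_birth": date(1990, 4, 12), "acquired": date(2018, 6, 5)},
    {"name": "Bob Example", "date_of_birth": date(1975, 11, 30), "acquired": date(2018, 5, 30)},
]


def age_in_years(born, on):
    """Whole years between date of birth and the acquisition date."""
    had_birthday = (on.month, on.day) >= (born.month, born.day)
    return on.year - born.year - (0 if had_birthday else 1)


def age_bin(age, width=5):
    """Replace an exact age by a bin such as '25-29'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


key_file = {}   # code -> name; keep separately and securely, never share
shareable = []  # rows that contain only the code and the blurred age

for index, person in enumerate(participants, start=1):
    code = f"sub-{index:02d}"
    key_file[code] = person["name"]
    age = age_in_years(person["date_of_birth"], person["acquired"])
    shareable.append({"participant_id": code, "age_bin": age_bin(age)})

print(shareable)
```

The key file that links codes to names is the part that must remain protected; as long as it exists, the data is pseudonymized rather than anonymous.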
31. Personal and research data
indirect personal
data
personal data
a lot of research data
32. Personal and research data
data minimization, pseudonymization
data minimization, de-identifying, blurring
a lot of research data
personal data
indirect personal data
Share responsibly, with legal constraints on reuse
Keep safe
and private
33. Legal constraints
Contract between the researcher
… and the funding agency
… and the ethics committee
… and the participants/patients
… and the publisher of the results
… and the recipient of the data upon sharing
34. Legal constraints – Data Use Agreement
CC0 - Public Domain
No copyright.
The person who associated a work with this deed
has dedicated the work to the public domain by
waiving all of his or her rights to the work
worldwide under copyright law, including all related
and neighboring rights, to the extent allowed by law.
You can copy, modify, distribute and perform the
work, even for commercial purposes, all without
asking permission.
Donders Institute - Data Use Agreement
for identifiable human data
I will comply with all relevant rules and regulations
imposed by my institution and my government ….
I will not attempt to establish the identity of or attempt
to contact any of the included human subjects. I will not
link this data to any other database in a way that could
provide identifying information ….
I will not redistribute or share the data with others,
including individuals in my research group, unless they
have independently applied and been granted access to
this data.
I will acknowledge the use of the data and data derived
from the data when publicly presenting …
Failure to abide by these guidelines will result in
termination of my privileges to access to these data.
https://creativecommons.org/publicdomain/zero/1.0/
https://data.donders.ru.nl/doc/dua/
participant → you → recipient
37. What is BIDS?
BIDS (Brain Imaging Data Structure) is a way to organize your existing raw data
To improve consistent and complete documentation
To facilitate re-use by your future self and others
BIDS is not
A new file format
A search engine
A data sharing tool
38. BIDS for MRI, MEG, EEG … in future also iEEG, PET, eye-tracker, etc.
data/README
data/CHANGES
data/dataset_description.json
data/participants.tsv
data/sub-01/anat/…
data/sub-01/meg/…
data/sub-01/eeg/sub-01_task-auditory_eeg.edf
data/sub-01/eeg/sub-01_task-auditory_eeg.json
data/sub-01/eeg/sub-01_task-auditory_channels.tsv
data/sub-01/eeg/sub-01_task-auditory_events.tsv
data/sub-01/eeg/sub-01_electrodes.tsv
data/sub-01/eeg/sub-01_coordinates.json
EDF
BrainVision
Neuroscan
Biosemi
EEGLAB .set
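The naming pattern in the listing on this slide is regular enough to generate with a small helper. The sketch below only reproduces the task-specific file names shown here; the function `bids_eeg_path` is hypothetical, and it does not cover sessions, runs or the other entities that the BIDS specification defines.

```python
from pathlib import PurePosixPath


def bids_eeg_path(root, subject, task, suffix, extension):
    """Build a path such as data/sub-01/eeg/sub-01_task-auditory_eeg.edf."""
    filename = f"sub-{subject}_task-{task}_{suffix}{extension}"
    return PurePosixPath(root) / f"sub-{subject}" / "eeg" / filename


files = [
    bids_eeg_path("data", "01", "auditory", "eeg", ".edf"),
    bids_eeg_path("data", "01", "auditory", "eeg", ".json"),
    bids_eeg_path("data", "01", "auditory", "channels", ".tsv"),
    bids_eeg_path("data", "01", "auditory", "events", ".tsv"),
]
for f in files:
    print(f)
```

The electrodes and coordinates files in the listing carry no task label, so they are left out of this helper on purpose.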
39. Metadata in ”sidecar” files
Participants
Demographics
Questionnaire outcomes
Equipment
Amplifier, cap, electrode type and placement
Filter settings, reference
Design, task and conditions
Instructions, stimuli material, responses
Trigger codes
Also some details from EEG data to make querying easier
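As an illustration of what such sidecar files could contain, the following sketch builds the text of a JSON sidecar and a tab-separated events table. All values are made up. The field and column names are believed to match the BIDS EEG specification, but that is an assumption that should be checked against the current version of the specification before use.

```python
import csv
import io
import json

# Made-up values; field names are assumed to follow the BIDS EEG specification.
eeg_sidecar = {
    "TaskName": "auditory",
    "SamplingFrequency": 1000,
    "EEGReference": "Cz",
    "PowerLineFrequency": 50,
    "SoftwareFilters": "n/a",
    "Manufacturer": "n/a",
}

# Made-up events: onset and duration in seconds, value is the trigger code.
events = [
    {"onset": 1.25, "duration": 0.1, "trial_type": "standard", "value": 1},
    {"onset": 2.25, "duration": 0.1, "trial_type": "deviant", "value": 2},
]

# Text for a file such as sub-01_task-auditory_eeg.json
sidecar_text = json.dumps(eeg_sidecar, indent=2)

# Text for a file such as sub-01_task-auditory_events.tsv
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["onset", "duration", "trial_type", "value"],
    delimiter="\t",
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(events)
events_text = buffer.getvalue()

print(sidecar_text)
print(events_text)
```

Both files are plain text, so they can be read without the acquisition software and put under version control together with the analysis scripts.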
40. Why use BIDS?
Developed with open community discussion
and involvement of experienced researchers
Neuroinformatics and analysis tools available for it
EEGLAB, FieldTrip, MNE-Python, Brainstorm
Increases the chance of your data being indexed and reused
(Future) applications for searching, automated analyses, …
But … it is more important that you share and what you share
than how you share it
42. Summary
New tools to be adopted for Open Science
Planning ahead for analysis and data
Version control and release of analysis details
Data management plan
Responsible sharing, considering your participants’ rights
Organizing EEG data according to BIDS
43. Suggested further reading
This presentation on
https://www.slideshare.net/robertoostenveld
https://opensciencemooc.eu
https://open-science-training-handbook.gitbooks.io/book
http://software-carpentry.org
http://bids.neuroimaging.io
http://data.donders.ru.nl
Editor's notes
Questions at the end
Request from a subject to delete their data -> informed consent procedure
Description of metadata -> link to external ontologies