The ROER4D Open Data initiative
Michelle Willmers and Thomas King
January 2018
CC BY
Introduction to ROER4D
• Research on Open Educational Resources for Development project
– 18 sub-projects, across 26 countries in the Global South from Chile to
Mongolia, with 100 researchers, supported by a Network Hub team based in
the University of Cape Town and Wawasan Open University.
– Datasets in multiple languages (English, Spanish, Mongolian)
– Mostly mixed-methods data (mix of quantitative and qualitative)
• ROER4D Open Data initiative: supporting interested sub-projects in
sharing their data openly
Research
On Open Educational Resources (OER)
for Development
• Imperative to establish empirical baseline research on OER in Global South
• 86 researchers in 26 countries across 3 continents
• Project ‘Open’ ethos manifests in Open Research strategy, bridging ‘Open’
silos
• Open content (typically used in a teaching and learning
context) that can be reused, revised, remixed,
redistributed and retained
• Made possible by open licensing, although increasing
focus on differentiating implicit vs. explicit open
content
• Focus on role OER can play in improving access to quality education
• Focus on role project can play in building Global South Open Education
research capacity
• Strong advocacy and activism component (NGO, CBO sectors – not only
career researchers)
Focus on empirical baseline manifests in focus on curatorial and publishing capacity within the
research project. The project acts as publisher, providing greater agency and control (but
presenting some challenges in terms of accreditation/reward).
Unpacking the “ROER4D” project title…
Curation & Dissemination strategy
• Provide a content management and publishing service to SP researchers and the
Network Hub team in order to advance research capacity development efforts and
increase visibility of outputs.
• Support Principal Investigators and SP researchers in editorial development of
ROER4D outputs.
• Address infrastructure deficits and provide content management solutions
(including content hosting) in a research community with uneven institutional
support and capacity challenges.
• Ensure that the ROER4D legacy is freely accessible for reuse in line with international
curatorial and publishing standards.
• Complement Network Hub Communications efforts in an integrated
communications/dissemination approach.
Data management principles
• Data sharing as component of generalised open content focus.
• Organising and profiling open content increases the potential for reuse and citation
(impact).
• Well-organised, strategic research management and content organisation promotes
rigour in the research process.
• Copyright vests with the author > data-sharing activity determined by their willingness
and capacity to engage.
• Format and platform/tool agnostic.
• Share openly by default on condition that it is valuable, legal and ethical.
Research Data Management
• Collect data: ethics clearance, methodology, instruments
• Organise data: formats, naming conventions
• Refine data: verification, validation
• Share data: de-identification, publishing, open data
• Document data: metadata, dataset description
• Store data: backup, archive, on-site storage, cloud storage
The two pillars of Open Data sharing
• Consensual: ethical, legal
• Comprehensible: coherent, valuable
Research Data Management &
Open Data sharing
ROER4D project data flow
• Researcher
• Network Hub (Google, Vula)
• ROER4D archive (internal): Google, Vula, UCT eResearch Centre
• Project archive (external): Zenodo
• Publisher: DataFirst
• Internal sharing and collaboration; external sharing and collaboration
Open Data terminology
• Open Data = Microdata
– Unit record data (survey data, census data)
– Interview and Focus Group transcripts
– i.e. the ‘raw material’ from which outputs, reports, publications etc. are
produced.
• Supportive documentation = Metadata
– Dataset descriptions
– Study descriptions (methods/methodology, data collection schedules)
– Data processing information (e.g. de-identification schema)
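As a rough illustration of the supportive documentation listed above, a dataset description could be captured as a small structured record that travels with the microdata. This is only a minimal sketch: the field names and example values are hypothetical and do not represent the actual ROER4D or DataFirst metadata schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetDescription:
    """Hypothetical metadata record accompanying a shared dataset."""
    title: str
    languages: list[str]
    methodology: str               # study description
    collection_schedule: str       # when and how the data were collected
    deidentification_schema: str   # data processing information
    instruments: list[str] = field(default_factory=list)


def to_json(record: DatasetDescription) -> str:
    """Serialise the record so it can be stored next to the microdata."""
    return json.dumps(asdict(record), indent=2, ensure_ascii=False)


# Illustrative values only; none of these come from a real ROER4D dataset.
example = DatasetDescription(
    title="Example sub-project survey (illustrative)",
    languages=["English", "Spanish"],
    methodology="Mixed methods: survey plus interviews",
    collection_schedule="Placeholder schedule",
    deidentification_schema="Direct identifiers replaced with [redacted]",
    instruments=["survey_questionnaire.pdf"],
)
print(to_json(example))
```

Keeping the description in a machine-readable form is one possible way to make the dataset easier to organise and profile; a plain narrative document would serve the same purpose.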
Terms and definitions
TERM | DEFINITION
Microdata (aka Unit Record Data) | The information that underlies a research project’s analysis (i.e. the ‘thing’)
Metadata | Data that describes a file or record on a database (for example, keywords, author fields, ISBNs, DOIs)
Research Data Management (RDM) | Overall term for how individuals/projects/institutions manage their data
Data Management Plan (DMP) | Outlines an individual or project’s strategy around all aspects of data management
Curation | Organising, storing/archiving and describing data to ensure & control its long-term accessibility and usability. May include collating/concatenating from other sources
De-identification | Removing, eliding or replacing pieces of information that reveal research participants’ (possibly also referents’) identity
Anonymity | Personal details (identifiers) are not gathered
Confidentiality | Personal details (identifiers) are not shared
Curation platform | An on-premises or cloud-based storage space that contains metadata capabilities, Search Engine Optimisation, and backup capabilities
Why should researchers share data?
• ROER4D motivations:
– Build the empirical base for future research
– Coherent with our generally ‘open’ approach – publishing open
access outputs, actively communicating with audiences and
stakeholders, etc.
• Good practice – many research funders now require some sort of data-
sharing activity or plan
• Improve rigour
– Sharing data openly demands that the dataset is well described
and organised
– Increased scrutiny of the dataset often leads to more refined
analysis
Five pillars of ROER4D data publication approach
Step 1: Evaluate contractual framework, articulate strategy
Step 2: Get researchers on board
Recruiting participants
• Emphasising social justice through sharing
– Sharing open data allows for latitudinal studies using data from multiple sites
• Emphasising personal reputation
– Sharing open data as a means of building one’s personal profile as a
researcher
• Emphasising rigour
– Sharing data openly enhances the quality of the research
• Check ethics approval and consent
• Ensure first-tier de-identification takes place prior to Network Hub transfer in
order to ensure research subject confidentiality
• ROER4D agnostic in its approach (in terms of scale, format and technical
sophistication)
• Challenges of varying researcher sophistication in terms of data collection and
presentation
• Challenges of varying researcher sophistication in terms of technology employed
to capture, present, and analyse data
Step 3: Source sub-project micro-data
• Archive in LMS and secure institutional archive
• Network Hub C&D team audits researchers’ submitted dataset
> What is the dataset comprised of?
> Are all the pieces there?
> What were the data collection processes, and do we have all the instruments to
share?
> What languages are represented?
> Does something else like it exist?
> Who might it be of use to?
• Address file naming and format issues
• Articulate sub-project-specific data management plan
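Two of the audit tasks above ("Are all the pieces there?" and addressing file naming issues) lend themselves to a simple automated check. The sketch below assumes a hypothetical naming convention (lowercase letters, digits, underscores and hyphens) and a hypothetical list of expected files; neither is taken from the ROER4D project itself.

```python
import re

# Hypothetical naming convention: lowercase letters, digits, underscores
# and hyphens, followed by a single file extension.
NAME_PATTERN = re.compile(r"^[a-z0-9_\-]+\.[a-z0-9]+$")


def audit_submission(submitted: list[str], expected: list[str]) -> dict[str, list[str]]:
    """Report expected files that are absent and names that break the convention."""
    submitted_set = set(submitted)
    missing = [name for name in expected if name not in submitted_set]
    badly_named = [name for name in submitted if not NAME_PATTERN.match(name)]
    return {"missing": missing, "badly_named": badly_named}


# Illustrative file names only.
report = audit_submission(
    submitted=["survey_data.csv", "Interview Transcript 1.docx"],
    expected=["survey_data.csv", "survey_instrument.pdf", "consent_form.pdf"],
)
print(report)
```

A check like this only covers the mechanical part of the audit; questions such as who the dataset might be of use to still need a human reader.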
Step 4: Network Hub curation and quality assurance
• Scope and conceptualise the dataset
> Which components of the project-generated micro-data are you ethically and
legally allowed to share?
> Which components of the project-generated micro-data will you invest
resources in curating and sharing?
> Which instruments will you include?
• Identify focus of data and points of sensitivity
• Define appropriate second-tier de-identification approach
Step 5: Preparing data for publication
ROER4D data interrogation process
1. Read data
2. Coherence: format & layout
3. Editing: fix typos & identify anomalous data
4. De-identifying: remove identifiers
5. Validation: identify and account for missing data
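For quantitative data, the validation task named above (identify and account for missing data) can be partly automated. The following is a minimal sketch, assuming the data arrive as CSV text and that a small, hypothetical set of markers stands for "no answer"; real sub-project datasets may use different formats and codes.

```python
import csv
import io

# Hypothetical markers that a sub-project might use for "no answer".
MISSING_MARKERS = {"", "NA", "N/A", "-"}


def count_missing(csv_text: str) -> dict[str, int]:
    """Count missing values per column in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name in counts:
            # Short rows yield None for absent fields; treat that as missing too.
            value = (row.get(name) or "").strip()
            if value in MISSING_MARKERS:
                counts[name] += 1
    return counts


# Illustrative data only.
sample = "respondent,uses_oer,institution_type\n1,yes,public\n2,,private\n3,NA,\n"
print(count_missing(sample))
```

The counts show where gaps are; accounting for them (for example, noting in the dataset description why a question went unanswered) remains a documentation task.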
The de-identification balancing act
First, do no harm
Remove as much as needed to ensure the
confidentiality or anonymity of the
research participants.
Ensure that all ethical and consent
processes have been adhered to.
Don’t go overboard
Remove as little as is ethical to ensure the
richness of the data.
Take the unit of analysis as the guide – de-
identify up to the Unit of Analysis.
E.g. if Study X compares two universities,
you can safely remove all identifiers lower
than the university affiliation.
HOWEVER
Your data may be useful to others. The
purpose of de-identification is to preserve
confidentiality – don’t de-identify for the
sake of it
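The "de-identify up to the Unit of Analysis" guideline can be sketched for tabular records as follows. The identifier hierarchy and field names here are hypothetical, chosen to mirror the two-universities example; an actual dataset would need its own hierarchy.

```python
# Hypothetical identifier hierarchy, from broadest to most specific.
IDENTIFIER_LEVELS = ["university", "faculty", "department", "name"]


def deidentify_to_unit(record: dict[str, str], unit: str) -> dict[str, str]:
    """Keep identifiers down to the unit of analysis; drop anything more specific."""
    keep_until = IDENTIFIER_LEVELS.index(unit)
    too_specific = set(IDENTIFIER_LEVELS[keep_until + 1:])
    return {key: value for key, value in record.items() if key not in too_specific}


# Illustrative record only.
row = {
    "university": "University A",
    "faculty": "Humanities",
    "department": "Anthropology",
    "name": "Participant name",
    "response": "I share my teaching materials online.",
}
print(deidentify_to_unit(row, unit="university"))
```

Non-identifier fields such as the response pass through untouched, which reflects the aim of removing only what confidentiality requires.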
ROER4D de-identification process
1. First-level de-identification by researcher
– Removal of direct identifiers (names of people/institutions/companies, ID
numbers, etc.)
– Important to ensure that raw data is not shared
2. Second-level de-identification by C&D team to catch remaining direct
identifiers
3. In-depth sweep of the text to identify indirect identifiers
– Meticulous, thorough, repeated reading of the text (which ties back to
general data enhancement)
Qualitative de-identification
• De-identification located in the same ecosystem as data cleaning and data
validation – no clear line between data improvement and de-identification
– Cleaning up typos
– Standardising presentation and layout
– Identifying unanswered questions (or additional questions), mislabelled
responses, etc.
• Many of these also apply to quantitative data
• Articulation of principles in RDM and description of these processes included in
metadata
Qualitative de-identification example
• Raw data
– Well my name is Susan Tsvangirai, and I’m the Head of the
Anthropology department at the University of Zimbabwe. I first
started getting involved in publishing my data – see I’m the only
person in the country who works on human ecologies, well it’s me
and Ishaan at Wits, but I’m the only one locally, and I started out
using the institutional repository but it didn’t really work. It kept
timing out when I tried to upload resources. So I switched to Zenodo
which was fine but it felt a little bit sterile…
• Cleaned/processed data
– Well my name is [redacted], and I’m the Head of [my] department at
the University of Zimbabwe. I first started getting involved in
publishing my data – see I’m the only person in the country who
works [in my area], well it’s me and [a colleague] at Wits, but I’m the
only one locally, and I started out using the institutional repository
but it didn’t really work. It kept timing out when I tried to upload
resources. So I switched to Zenodo which was fine but it felt a little
bit sterile…
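Once a reader has identified the direct identifiers in a transcript, the mechanical replacement step shown in the example above can be sketched in code. This is a minimal illustration that assumes a hand-prepared replacement map; it does not find identifiers by itself, and indirect identifiers still call for the repeated close reading described earlier.

```python
import re


def redact(text: str, replacements: dict[str, str]) -> str:
    """Replace each listed identifier with its bracketed placeholder."""
    # Longest identifiers first, so a longer phrase is handled before any
    # shorter identifier that might be contained in it.
    for identifier in sorted(replacements, key=len, reverse=True):
        pattern = re.compile(re.escape(identifier), flags=re.IGNORECASE)
        text = pattern.sub(replacements[identifier], text)
    return text


# Opening sentence of the raw example; the map is prepared by hand.
raw = ("Well my name is Susan Tsvangirai, and I'm the Head of the "
       "Anthropology department at the University of Zimbabwe.")
replacements = {
    "Susan Tsvangirai": "[redacted]",
    "the Anthropology department": "[my] department",
}
print(redact(raw, replacements))
```

The university name is left in place here, in line with the cleaned example, on the assumption that the institution is the unit of analysis.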
Step 6: Publish
• Generate metadata and dataset description (accompanying narrative)
• Submit content to publisher (in ROER4D instance, DataFirst)
• Link to published outputs
• Include description of process in research Methodology statements
• Profile in project communications activity
Challenges
• Data collected in multiple languages
– De-identification (particularly in qualitative data) far more difficult –
greater reliance on the researcher to identify disclosive information
• Post-hoc consent process
– Departments merge or close, participants retire or disappear
• Data collected by multiple researchers
– Different collection strategies, adherence to interview schedules, use/non-
use of clarifying questions, etc.
Ways forward: ‘Open by design’
• Help researchers write consent forms to facilitate ethical open
data sharing.
• ‘Red flag’ clauses abound in template consent forms,
including:
– “will be used for research purposes only”
– “data will be destroyed after use”
– “only researchers will have access to the data”
• More open consent forms allow for data sharing but do not
mandate it.
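A draft consent form could be screened for the red-flag clauses quoted above before it goes to participants. The sketch below uses a plain case-insensitive phrase match, which is an assumption of this example; it would miss paraphrased clauses and is no substitute for reading the form.

```python
# Red-flag phrases quoted above; matching is a simple case-insensitive
# substring check, which is an assumption of this sketch.
RED_FLAGS = [
    "will be used for research purposes only",
    "data will be destroyed after use",
    "only researchers will have access to the data",
]


def find_red_flags(consent_text: str) -> list[str]:
    """Return the red-flag phrases that occur in a consent form."""
    # Collapse whitespace so phrases split across lines are still found.
    normalised = " ".join(consent_text.lower().split())
    return [phrase for phrase in RED_FLAGS if phrase in normalised]


# Illustrative consent wording only.
form = """Your responses will be used for research
purposes only. Participation is voluntary."""
print(find_red_flags(form))
```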
Lessons learned
1. Openness increases rigour. Preparing data for publication promotes professional approach to
research process.
2. Preparing data for publication exposes weaknesses in instrument design and research
process.
3. Introducing C&D and data-sharing focus midway through a project poses many challenges,
particularly in terms of ethical and consent components.
4. Data sharing drives focus on reproducibility, transforming traditional approach to crafting
methodology statements.
5. The data preparation process takes time (approx. one week of researchers’ time in ROER4D
context).
6. Obtaining balance between utility and adequate protection in de-identification of qualitative
data is a challenge.
7. Openness is threatening to researchers in terms of exposing weakness in processes and
perceived threat of losing publication advantage.
8. C&D and data sharing activity require support, capacity development and resourcing.

Oppenheimer Film Discussion for Philosophy and Film
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptx
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 

ROER4D Open Data Initiative

  • 1. The ROER4D Open Data initiative Michelle Willmers and Thomas King January 2018 CC BY
  • 2. Introduction to ROER4D • Research on Open Educational Resources for Development project – 18 sub-projects, across 26 countries in the Global South from Chile to Mongolia, with 100 researchers, supported by a Network Hub team based in the University of Cape Town and Wawasan Open University. – Datasets in multiple languages (English, Spanish, Mongolian) – Mostly mixed-methods data (mix of quantitative and qualitative) • ROER4D Open Data initiative: supporting interested sub-projects in sharing their data openly
  • 3.
  • 4. Unpacking the “ROER4D” project title… Research On Open Educational Resources (OER) for Development • Imperative to establish empirical baseline research on OER in Global South • 86 researchers in 26 countries across 3 continents • Project ‘Open’ ethos manifests in Open Research strategy, bridging ‘Open’ silos • Open content (typically used in a teaching and learning context) that can be reused, revised, remixed, redistributed and retained • Made possible by open licensing, although increasing focus on differentiating implicit vs. explicit open content • Focus on role OER can play in improving access to quality education • Focus on role project can play in building Global South Open Education research capacity • Strong advocacy and activism component (NGO, CBO sectors – not only career researchers) Focus on empirical baseline manifests in focus on curatorial and publishing capacity within the research project. The project acts as publisher, providing greater agency and control (but presenting some challenges in terms of accreditation/reward).
  • 5. Curation & Dissemination strategy • Provide a content management and publishing service to SP researchers and the Network Hub team in order to advance research capacity development efforts and increase visibility of outputs. • Support Principal Investigators and SP researchers in editorial development of ROER4D outputs. • Address infrastructure deficits and provide content management solutions (including content hosting) in a research community with uneven institutional support and capacity challenges. • Ensure that the ROER4D legacy is freely accessible for reuse in line with international curatorial and publishing standards. • Complement Network Hub Communications efforts in an integrated communications/dissemination approach.
  • 6. Data management principles • Data sharing as component of generalised open content focus. • Organising and profiling open content increases the potential for reuse and citation (impact). • Well-organised, strategic research management and content organisation promotes rigour in the research process. • Copyright vests with the author > data-sharing activity determined by their willingness and capacity to engage. • Format and platform/tool agnostic. • Share openly by default on condition that it is valuable, legal and ethical.
  • 7. Research Data Management (diagram) • Collect data: ethics clearance, methodology, instruments • Organise data: formats, naming conventions • Refine data: verification, validation • Share data: de-identification, publishing, open data • Document data: metadata, dataset description • Store data: backup, archive, on-site storage, cloud storage
  • 8. Research Data Management & Open Data sharing: the two pillars of Open Data sharing • Consensual, ethical, legal • Comprehensible, coherent, valuable
  • 9. ROER4D project data flow (diagram) • Internal sharing and collaboration: Researcher; Network Hub (Google, Vula); ROER4D archive (internal): Google, Vula, UCT eResearch Centre • External sharing and collaboration: Publisher (DataFirst); Project archive (external): Zenodo
  • 10. Open Data terminology • Open Data = Microdata – Unit record data (survey data, census data) – Interview and Focus Group transcripts – i.e. the ‘raw material’ from which outputs, reports, publications etc. are produced. • Supportive documentation = Metadata – Dataset descriptions – Study descriptions (methods/methodology, data collection schedules) – Data processing information (e.g. de-identification schema)
  • 11. Terms and definitions (term: definition)
      – Microdata (aka Unit Record Data): The information that underlies a research project’s analysis (i.e. the ‘thing’)
      – Metadata: Data that describes a file or record on a database (for example, keywords, author fields, ISBNs, DOIs)
      – Research Data Management (RDM): Overall term for how individuals/projects/institutions manage their data
      – Data Management Plan (DMP): Outlines an individual or project’s strategy around all aspects of data management
      – Curation: Organising, storing/archiving and describing data to ensure & control its long-term accessibility and usability. May include collating/concatenating from other sources
      – De-identification: Removing, eliding or replacing pieces of information that reveal research participants’ (possibly also referents’) identity
      – Anonymity: Personal details (identifiers) are not gathered
      – Confidentiality: Personal details (identifiers) are not shared
      – Curation platform: An on-premises or cloud-based storage space that contains metadata capabilities, Search Engine Optimisation, and backup capabilities
  • 12. Why should researchers share data? • ROER4D motivations: – Build the empirical base for future research – Coherent with our generally ‘open’ approach – publishing open access outputs, actively communicating with audiences and stakeholders, etc. • Good practice – many research funders now require some sort of data-sharing activity or plan • Improve rigour – Sharing data openly demands that the dataset is well described and organised – Increased scrutiny of the dataset often leads to more refined analysis
  • 13. Five pillars of ROER4D data publication approach
  • 14. Step 1: Evaluate contractual framework, articulate strategy
  • 15. Step 2: Get researchers on board
  • 16. Recruiting participants • Emphasising social justice through sharing – Sharing open data allows for latitudinal studies using data from multiple sites • Emphasising personal reputation – Sharing open data as a means of building one’s personal profile as a researcher • Emphasising rigour – Sharing data openly enhances the quality of the research
  • 17. Step 3: Source sub-project micro-data • Check ethics approval and consent • Ensure first-tier de-identification takes place prior to Network Hub transfer in order to ensure research subject confidentiality • ROER4D agnostic in its approach (in terms of scale, format and technical sophistication) • Challenges of varying researcher sophistication in terms of data collection and presentation • Challenges of varying researcher sophistication in terms of technology employed to capture, present, and analyse data
  • 18. Step 4: Network Hub curation and quality assurance • Archive in LMS and secure institutional archive • Network Hub C&D team audits researchers’ submitted dataset > What is the dataset comprised of? > Are all the pieces there? > What were the data collection processes, and do we have all the instruments to share? > What languages are represented? > Does something else like it exist? > Who might it be of use to? • Address file naming and format issues • Articulate sub-project-specific data management plan
  • 19. Step 5: Preparing data for publication • Scope and conceptualise the dataset > Which components of the project-generated micro-data are you ethically and legally allowed to share? > Which components of the project-generated micro-data will you invest resources in curating and sharing? > Which instruments will you include? • Identify focus of data and points of sensitivity • Define appropriate second-tier de-identification approach
  • 20. ROER4D data interrogation process (diagram) • READ DATA • Coherence: format & layout • Editing: fix typos & identify anomalous data • Validation: identify and account for missing data • De-identifying: remove identifiers
  • 21. The de-identification balancing act • First, do no harm: Remove as much as needed to ensure the confidentiality or anonymity of the research participants. Ensure that all ethical and consent processes have been adhered to. • Don’t go overboard: Remove as little as is ethical to ensure the richness of the data. Take the unit of analysis as the guide – de-identify up to the Unit of Analysis. E.g. if Study X compares two universities, you can safely remove all identifiers lower than the university affiliation. • HOWEVER: Your data may be useful to others. The purpose of de-identification is to preserve confidentiality – don’t de-identify for the sake of it
  • 22. ROER4D de-identification process 1. First-level de-identification by researcher – Removal of direct identifiers (names of people/institutions/companies, ID numbers, etc.) – Important to ensure that raw data is not shared 2. Second-level de-identification by C&D team to catch remaining direct identifiers 3. In-depth sweep of the text to identify indirect identifiers – Meticulous, thorough, repeated reading of the text (which ties back to general data enhancement)
  • 23. Qualitative de-identification • De-identification located in the same ecosystem as data cleaning and data validation – no clear line between data improvement and de-identification – Cleaning up typos – Standardising presentation and layout – Identifying unanswered questions (or additional questions), mislabelled responses, etc. • Many of these also apply to quantitative data • Articulation of principles in RDM and description of these processes included in metadata
  • 24. Qualitative de-identification example • Raw data – Well my name is Susan Tsvangirai, and I’m the Head of the Anthropology department at the University of Zimbabwe. I first started getting involved in publishing my data – see I’m the only person in the country who works on human ecologies, well it’s me and Ishaan at Wits, but I’m the only one locally, and I started out using the institutional repository but it didn’t really work. It kept timing out when I tried to upload resources. So I switched the Zenodo which was fine but it felt a little bit sterile… • Cleaned/processed data – Well my name is [redacted], and I’m the Head of [my] department at the University of Zimbabwe. I first started getting involved in publishing my data – see I’m the only person in the country who works [in my area], well it’s me and [a colleague] at Wits, but I’m the only one locally, and I started out using the institutional repository but it didn’t really work. It kept timing out when I tried to upload resources. So I switched the Zenodo which was fine but it felt a little bit sterile…
  • 25. Step 6: Publish • Generate metadata and dataset description (accompanying narrative) • Submit content to publisher (in ROER4D instance, DataFirst) • Link to published outputs • Include description of process in research Methodology statements • Profile in project communications activity
  • 26. Challenges • Data collected in multiple languages – De-identification (particularly in qualitative data) far more difficult – greater reliance on the researcher to identify disclosive information • Post-hoc consent process – Departments merge or close, participants retire or disappear • Data collected by multiple researchers – Different collection strategies, adherence to interview schedules, use/non-use of clarifying questions, etc.
  • 27. Ways forward: ‘Open by design’ • Help researchers write consent forms to facilitate ethical open data sharing. • ‘Red flag’ clauses abound in template consent forms, including: – “will be used for research purposes only” – “data will be destroyed after use” – “only researchers will have access to the data” • More open consent forms allow for data sharing but do not mandate it.
  • 28. Lessons learned 1. Openness increases rigour. Preparing data for publication promotes professional approach to research process. 2. Preparing data for publication exposes weaknesses in instrument design and research process. 3. Introducing C&D and data-sharing focus midway through a project poses many challenges, particularly in terms of ethical and consent components. 4. Data sharing drives focus on reproducibility, transforming traditional approach to crafting methodology statements. 5. The data preparation process takes time (approx. one week of researchers’ time in ROER4D context). 6. Obtaining balance between utility and adequate protection in de-identification of qualitative data is a challenge. 7. Openness is threatening to researchers in terms of exposing weakness in processes and perceived threat of losing publication advantage. 8. C&D and data sharing activity require support, capacity development and resourcing.
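
The first-level de-identification described in slides 22 and 24 could be sketched in Python roughly as follows. This is a minimal illustration and not ROER4D's actual tooling: the function name `deidentify`, the replacement table and the bracketed placeholders are assumptions modelled on the fictional excerpt in slide 24. It only handles direct identifiers that a researcher has already listed; indirect identifiers still need the repeated close reading the slides describe.

```python
import re


def deidentify(text, replacements):
    """Replace each listed identifying phrase with its placeholder.

    `replacements` maps an identifying phrase to the text that should
    stand in for it. Longer phrases are applied first so that a full
    phrase is replaced before any shorter phrase contained in it.
    Matching is case-insensitive.
    """
    for phrase in sorted(replacements, key=len, reverse=True):
        placeholder = replacements[phrase]
        pattern = re.compile(re.escape(phrase), flags=re.IGNORECASE)
        # A function is used as the replacement so the placeholder is
        # inserted literally, with no special handling of backslashes.
        text = pattern.sub(lambda _match: placeholder, text)
    return text


# Hypothetical replacement table for the fictional excerpt in slide 24.
replacements = {
    "Susan Tsvangirai": "[redacted]",
    "the Anthropology department": "[my] department",
    "on human ecologies": "[in my area]",
    "Ishaan": "[a colleague]",
}

raw = (
    "Well my name is Susan Tsvangirai, and I'm the Head of the "
    "Anthropology department at the University of Zimbabwe. I'm the only "
    "person in the country who works on human ecologies, well it's me "
    "and Ishaan at Wits."
)

cleaned = deidentify(raw, replacements)
print(cleaned)
```

A table-driven approach like this keeps the list of replacements itself as a record of what was removed, which could be summarised in the dataset's metadata; the table should not be published alongside the data, since it contains the identifiers.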

Editor's notes

  1. The ROER4D project, conceived in 2012 and running from 2013 to the end of 2017, was explicitly scoped with an ambition to conduct Open Research inasmuch as that proved viable and valuable. An early ambition mentioned in the scoping document was the desire to share data openly, but this process was not begun until 2015 with the elevation of Curation and Dissemination as a core project objective and the subsequent launch of the Open Data Initiative.
  2. The graphic above shows where the ROER4D sub-projects were located and where they conducted their research activities. The research participants included high-school (secondary) and university (tertiary) students, teachers in secondary and tertiary education, government officials, and members of NGOs.
  3. In the networked model of the ROER4D project, the Curation and Dissemination team were not involved in the gathering and validation of data, but supported the sub-projects in processing and organising their data for long-term curation and storage, and in some cases for publication and sharing as Open Data. Due to contractual requirements, the Network Hub
  4. There are two competing influences on Open Data sharing, namely the ethical imperative – the requirement to actively inform research participants of the Open Data process and protect them from potential negative consequences – and ensuring the integrity and value of the shared dataset by not removing so much content that the final product is incomprehensible or so sparse as to lack value.
  5. The ROER4D Open Data initiative was scoped to serve internal curation purposes (professionalising data stewardship) as well as external, public sharing of micro-data (where ethically and legally possible). The internal curation component was crucial in terms of keeping track of and curating the large amount of data produced by the 17 sub-projects, particularly as relates to the project’s meta-synthesis activities.
  6. Microdata is the raw material that underpins the analysis of a research project. It can consist of quantitative (large-scale datasets, often represented in tabular form) or qualitative data (personal observations, field notes, interview and focus group transcripts). Metadata is the data that supports and describes microdata, and can consist of some or all of the following: dataset descriptions, study methods or methodologies, production dates, data collection and processing schema, etc.
  7. Difference between anonymity and confidentiality: an anonymous survey contains no questions about personal identifiers; a confidential survey does contain these questions, but will not share/publish them.
  8. While there are potential and real benefits to civil society and government from sharing Open Data, there is also a case to be made for the individual benefits accruing to researchers from sharing their data. Open data sharing is increasingly being mandated by funder institutions, particularly large national and regional funders in the Global North, and so familiarising oneself with open data principles and sharing data openly is good practice for those interested in applying to these funders. Finally, and significantly, the process of preparing one’s data for open sharing necessitates deep and thorough data sophistication, through improvement of the microdata and/or metadata.
  9. As the Open Data Initiative was a voluntary activity (not mandated in the original project scoping), participants had to be persuaded to participate. The three primary methods used to encourage participation were: 1) an appeal to the project’s overall Open Research agenda, emphasising the value of Open Data for future studies and potentially latitudinal research; 2) an appeal to the benefits accruing to contributors’ personal reputation, through the production of a citable research object (an open dataset); and 3) an emphasis on the rigour-enhancement inherent in preparing a dataset for open sharing. As the project was conceived with an explicit open agenda, much of the first strategy was implicit in the project’s general Open Research orientation. The second emphasis (personal reputation) relied on the standard practice of measuring citations as a means of measuring an academic’s public profile. Finally, the third strategy highlighted the Open Data sharing process as serving the core academic principle of ethical, rigorous research practice.
  10. The ROER4D Network Hub conceives of data publishing as a ‘data interrogation’ process that may result in published Open Data, but still provides value even if the decision is made not to publish. The data interrogation process relies on frequently returning to read the original data in between coherence checking, editorial work, validation and verification activity, and finally de-identification. This process helps surface issues, especially indirect identifiers, which are particularly relevant and prevalent in qualitative data.
  11. While ethical considerations and the protection of research participants must come first, part of the value of Open Data lies in the ability of other researchers to mine datasets according to different conceptual and analytical frameworks. In such instances, a de-identification approach that only retains such content as supports the original study’s analysis limits the reusability, and thus the value, of the dataset.
  12. In quantitative data, disclosive information is typically isolated to specific variables or data values that can be identified and removed automatically. In qualitative data, however, the interplay between otherwise nondisclosive information or insights may potentially be disclosive. Therefore, more attention must be paid to identifying and removing, eliding or obfuscating these indirect identifiers, which may only be recognisable after repeated passes of the data.
  13. The above is an excerpt from a fictional qualitative dataset with the disclosive information indicated in red. The bold text indicates an indirect identifier that, in combination with the directly disclosive information, becomes disclosive itself. The second paragraph serves as an example of one way of de-identifying this excerpt.
  14. As a networked project, ROER4D covered a vast area with different linguistic and cultural norms, and contained sub-projects with different research methodologies. This introduced complexity into the data cleaning process, made even more complicated by the fact that the Open Data Initiative had not formed part of the original scoping and therefore in some cases research participants had to be recontacted in order to gain consent for their data to be shared.
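
Note 12 observes that in quantitative data the disclosive information usually sits in specific variables that can be removed automatically, and slide 20 lists identifying missing data as part of validation. The sketch below illustrates both steps in Python under stated assumptions: the column names, the sample rows and the helper names `drop_identifiers` and `count_missing` are invented for illustration and do not come from any ROER4D dataset. The institution column is kept, following the slide 21 guidance to de-identify only up to the unit of analysis.

```python
import csv
import io


def drop_identifiers(rows, identifier_columns):
    """Return copies of the rows without the listed identifier columns."""
    return [
        {key: value for key, value in row.items() if key not in identifier_columns}
        for row in rows
    ]


def count_missing(rows):
    """Count empty or whitespace-only values per column."""
    missing = {}
    for row in rows:
        for key, value in row.items():
            if value is None or value.strip() == "":
                missing[key] = missing.get(key, 0) + 1
    return missing


# Hypothetical survey extract; column names and values are invented.
sample = io.StringIO(
    "respondent_name,id_number,institution,uses_oer\n"
    "A. Example,1001,University X,yes\n"
    "B. Example,1002,University Y,\n"
)
rows = list(csv.DictReader(sample))

# Direct identifiers are dropped; the unit of analysis (institution) stays.
shared = drop_identifiers(rows, {"respondent_name", "id_number"})
print(shared)
print(count_missing(shared))
```

Counting missing values after the identifier columns are dropped means the tally describes the dataset as it would be shared, which is the figure a dataset description would need to account for.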