Mechanisms for
Data Quality and Validation
in Citizen Science
A. Wiggins, G. Newman, R. Stevenson & K. Crowston
Presented by Nathan Prestopnik
Motivation

• Data quality and validation are a primary concern for most citizen science projects
  - More contributors = more opportunities for error

• There has been no review of appropriate data quality and validation mechanisms
  - Diverse projects face similar challenges

• Contributors’ skills and scale of participation are important considerations in ensuring quality
Methods

• Survey
  - Questionnaire with 70 items, all optional
  - 63 completed questionnaires representing 62 projects
  - Mostly small-to-medium sized projects in US, Canada, UK; most focus on monitoring and observation

• Inductive development of framework
  - Based on survey results and authors’ direct experience with citizen science projects
Survey: Resources

• FTEs: 0 – 50+
  - Average: 2.4; Median: 1
  - Often small fractions of several individuals’ time

• Annual budgets: $125 – $1,000,000
  - Average: $105,000; Median: $35,000; Mode: $20,000
  - Up to 5 different funding sources, usually grants, in-kind contributions (staff time), & private donations

• Age/duration: -1 to 100 years
  - Average age: 13 years; Median: 9 years; Mode: 2 years
Survey: Methods Used
Method                                                n    Percentage
Expert review                                         46      77%
Photo submissions                                     24      40%
Paper data sheets submitted along with online entry   20      33%
Replication/rating by multiple participants           14      23%
QA/QC training program                                13      22%
Automatic filtering of unusual reports                11      18%
Uniform equipment                                     9       15%
Validation planned but not yet implemented            5       8%
Replication/rating, by the same participant           2       3%
Rating of established control items                   2       3%
None                                                  2       3%
Not sure/don’t know                                   2       3%
Survey: Combining Methods
Methods                                      n    Percentage
Single method                                10      17%
Multiple methods, up to 5 (average 2.5)      45      75%
Expert review + Automatic filtering          11      18%
Expert review + Paper data sheets            10      17%
Expert review + Photos                       14      23%
Expert review + Photos + Paper data sheets   6       10%
Expert review + Replication, multiple        10      17%
Survey: Resources & Methods

• Number of validation methods and staff are positively correlated (r2 = 0.11)
  - More staffing = more supervisory capacity

• Number of validation methods and budget are negatively correlated (r2 = -0.15)
  - If larger budgets mean more contributors, this constrains scalability of multiple methods
  - Larger projects may use fewer but more sophisticated mechanisms
  - Suggests that human-supervised methods don’t scale
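
The slide does not say how these coefficients were computed. As a rough illustration only, the sketch below assumes a plain Pearson correlation between staffing and the number of validation methods per project. The function name `pearson_r` and the sample figures are hypothetical and are not survey data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical projects (made up for illustration, NOT the survey responses).
staff_fte = [0.5, 1.0, 1.0, 2.0, 4.0, 10.0]
n_methods = [1, 2, 3, 2, 4, 3]

r = pearson_r(staff_fte, n_methods)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```

With real project data, the same function could be run once against staff counts and once against budgets to compare the direction of the two relationships.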
Survey: Other Validation Options

• “Please describe any additional validation methods used in your project”
  - Several projects rely on personal knowledge of contributing individuals for data quality
    - Not scientifically robust, but understandably relevant
  - Most comments referred to details of expert review
    - Reinforces the perceived value of expertise
  - Reporting interface and associated error-checking are often overlooked, but provide important initial data verification
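
The last point, error-checking in the reporting interface, might look something like the sketch below. It is a minimal illustration, assuming a species-observation form. The field names (`species`, `latitude`, `longitude`, `count`, `observed_on`) and the function `check_report` are hypothetical and do not come from any surveyed project.

```python
from datetime import date


def check_report(report, known_species, today=None):
    """Return a list of problems found in one submitted report (empty if none)."""
    today = today or date.today()
    problems = []

    # Species must come from the project's checklist.
    if report.get("species") not in known_species:
        problems.append("unknown species")

    # Coordinates must be numeric and within valid geographic bounds.
    lat = report.get("latitude")
    lon = report.get("longitude")
    if not isinstance(lat, (int, float)) or not -90 <= lat <= 90:
        problems.append("latitude out of range")
    if not isinstance(lon, (int, float)) or not -180 <= lon <= 180:
        problems.append("longitude out of range")

    # Counts must be whole, non-negative numbers.
    count = report.get("count")
    if not isinstance(count, int) or count < 0:
        problems.append("count must be a non-negative integer")

    # Observations cannot be dated in the future.
    observed = report.get("observed_on")
    if not isinstance(observed, date) or observed > today:
        problems.append("observation date missing or in the future")

    return problems


# Hypothetical submission with a mistyped latitude.
checklist = {"American Robin", "House Sparrow"}
submission = {
    "species": "American Robin",
    "latitude": 430.5,
    "longitude": -76.1,
    "count": 2,
    "observed_on": date(2011, 5, 1),
}
print(check_report(submission, checklist, today=date(2011, 6, 1)))
```

Checks like these catch data entry errors at the point of submission, before any expert review is needed.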
Choosing Mechanisms

• Data characteristics to consider when choosing mechanisms to ensure quality
  - Accuracy and precision: taxonomic, spatial, temporal, etc.
  - Error prevention: malfeasance (gaming the system), inexperience, data entry errors, etc.

• Evaluate assumptions about error and accuracy
  - Where does error originate? How do mechanisms address this? At what step in the research process? How transparent are data review and outcomes? How much data will be reviewed? In how much detail?
Mechanisms: Protocols
Mechanism                 Process   Type/Detail
QA project plans          Before    SOP in some areas
Repeated samples/tasks    During    By multiple participants, single
                                    participant, or experts (calibration)
Tasks involving control   During    Contributions compared to known states
items
Uniform/calibrated        During    Used for measurements; cost/scale
equipment                           tradeoff; who pays?
Paper data sheets +       During    Extended details, verifying data entry
online entry*                       accuracy
Digital vouchers*         During    Photos, audio, specimens/archives
Data triangulation,       After     Corroboration from other data sources;
normalization, mining*              statistical & computer science methods
Data documentation*       After     Provide metadata about processes
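
Two of the protocol mechanisms in the table, repeated tasks by multiple participants and tasks involving control items, can be sketched as follows. This is an illustration under assumptions: the function names, the 60% agreement threshold, and the sample labels are hypothetical, and simple majority voting stands in for whatever aggregation a given project actually uses.

```python
from collections import Counter


def consensus(labels, min_agreement=0.6):
    """Majority label across participants, or None if agreement is too low.

    Returns a (label_or_None, share) pair, where share is the fraction of
    participants who chose the most common label.
    """
    if not labels:
        return None, 0.0
    label, votes = Counter(labels).most_common(1)[0]
    share = votes / len(labels)
    return (label if share >= min_agreement else None), share


def control_accuracy(answers, known_states):
    """Fraction of control items a participant answered correctly.

    answers and known_states map item id -> label; only items that have a
    known state are scored. Returns None if no control items were answered.
    """
    scored = [item for item in answers if item in known_states]
    if not scored:
        return None
    correct = sum(answers[item] == known_states[item] for item in scored)
    return correct / len(scored)


# Hypothetical classifications of one photo by three participants.
print(consensus(["robin", "robin", "sparrow"]))

# Hypothetical control items with known states, mixed in with real tasks.
known = {"img1": "robin", "img2": "sparrow"}
print(control_accuracy({"img1": "robin", "img2": "robin", "img9": "wren"}, known))
```

Items that fail to reach consensus could be routed to expert review, which keeps the human-supervised step limited to the ambiguous cases.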
Mechanisms: Participants

Mechanism                 Process   Types/Details
Participant training      Before,   Initial; Ongoing; Formal QA/QC
                          During
Participant testing       Before,   Following training; Pre/test-retest
                          During
Rating participant        During,   Unknown to participant; Known to
performance               After     participant
Filtering of unusual      During,   Automatically; Manually
reports                   After
Contacting participants   After     May alienate/educate contributors
about unusual reports
Automatic recognition     After     Techniques for image/text processing
Expert review             After     By professionals, experienced contributors,
                                    or multiple parties
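
Automatic filtering of unusual reports can take many forms. The sketch below shows one simple option, assuming numeric counts and a history of earlier reports for the same site or species. It uses a modified z-score based on the median absolute deviation. The function name `flag_unusual`, the threshold of 3.5, and the sample counts are hypothetical choices for illustration.

```python
from statistics import median


def flag_unusual(history, new_count, threshold=3.5):
    """Return True if new_count looks like an outlier relative to history.

    Uses the modified z-score 0.6745 * |x - median| / MAD, which is less
    sensitive to earlier extreme values than a mean-based z-score.
    """
    if len(history) < 3:
        return False  # too little history to judge
    med = median(history)
    mad = median(abs(c - med) for c in history)
    if mad == 0:
        # All earlier counts are (nearly) identical: flag any departure.
        return new_count != med
    score = 0.6745 * abs(new_count - med) / mad
    return score > threshold


# Hypothetical earlier counts for one site (not real data).
earlier = [3, 4, 5, 4, 6, 5, 4]
for count in (5, 40):
    status = "flag for review" if flag_unusual(earlier, count) else "accept"
    print(count, status)
```

A flagged report is not necessarily wrong. In line with the table above, it would be held for manual filtering, expert review, or a follow-up with the participant.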
Discussion

• Need to pay more attention to the way that data are created: not just protocols, but also qualities of data such as accuracy and precision

• Clear need for quality/validation mechanisms for analysis, not only for data collection/processing
  - Data mining techniques
  - Spatio-temporal modeling

• Scalability of validation may be limited
  - May need to plan different quality management techniques based on expected/actual project growth
Future Work

• Most projects worry more about contributor expertise than appropriate analysis methods
  - Resources are needed to support suitable analysis approaches and tools

• Comparative evaluation of the efficacy of the data quality and validation mechanisms identified
  - Develop a QA/QC planning and evaluation tool

• Develop examples of appropriate data documentation for citizen science projects
  - Necessary for peer review, data re-use
Thanks!

• Nate Prestopnik

• DataONE working group on Public Participation in Scientific Research

• US NSF grants 09-43049 & 11-11107

 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 

Mechanisms for Data Quality and Validation in Citizen Science

  • 1. Mechanisms for Data Quality and Validation in Citizen Science A. Wiggins, G. Newman, R. Stevenson & K. Crowston Presented by Nathan Prestopnik
  • 2. Motivation
      - Data quality and validation are a primary concern for most citizen science projects
          - More contributors = more opportunities for error
      - There has been no review of appropriate data quality and validation mechanisms
          - Diverse projects face similar challenges
      - Contributors’ skills and scale of participation are important considerations in ensuring quality
  • 3. Methods
      - Survey
          - Questionnaire with 70 items, all optional
          - 63 completed questionnaires representing 62 projects
          - Mostly small-to-medium sized projects in US, Canada, UK; most focus on monitoring and observation
      - Inductive development of framework
          - Based on survey results and authors’ direct experience with citizen science projects
  • 4. Survey: Resources
      - FTEs: 0 – 50+
          - Average: 2.4; Median: 1
          - Often small fractions of several individuals’ time
      - Annual budgets: $125 - $1,000,000
          - Average: $105,000; Median: $35,000; Mode: $20,000
          - Up to 5 different funding sources, usually grants, in-kind contributions (staff time), & private donations
      - Age/duration: -1 to 100 years
          - Average age: 13 years; Median: 9 years; Mode: 2 years
  • 5. Survey: Methods Used
      Method                                                n    Percentage
      Expert review                                         46   77%
      Photo submissions                                     24   40%
      Paper data sheets submitted along with online entry   20   33%
      Replication/rating by multiple participants           14   23%
      QA/QC training program                                13   22%
      Automatic filtering of unusual reports                11   18%
      Uniform equipment                                     9    15%
      Validation planned but not yet implemented            5    8%
      Replication/rating, by the same participant           2    3%
      Rating of established control items                   2    3%
      None                                                  2    3%
      Not sure/don’t know                                   2    3%
  • 6. Survey: Combining Methods
      Methods                                       n    Percentage
      Single method                                 10   17%
      Multiple methods, up to 5 (average 2.5)       45   75%
      Expert review + Automatic filtering           11   18%
      Expert review + Paper data sheets             10   17%
      Expert review + Photos                        14   23%
      Expert review + Photos + Paper data sheets    6    10%
      Expert review + Replication, multiple         10   17%
  • 7. Survey: Resources & Methods
      - Number of validation methods and staff are positively correlated (r² = 0.11)
          - More staffing = more supervisory capacity
      - Number of validation methods and budget are negatively correlated (r² = -0.15)
          - If larger budgets mean more contributors, this constrains scalability of multiple methods
          - Larger projects may use fewer but more sophisticated mechanisms
      - Suggests that human-supervised methods don’t scale
  • 8. Survey: Other Validation Options
      - “Please describe any additional validation methods used in your project”
      - Several projects rely on personal knowledge of contributing individuals for data quality
          - Not scientifically robust, but understandably relevant
      - Most comments referred to details of expert review
          - Reinforces the perceived value of expertise
      - Reporting interface and associated error-checking is often overlooked, but provides important initial data verification
  • 9. Choosing Mechanisms
      - Data characteristics to consider when choosing mechanisms to ensure quality
          - Accuracy and precision: taxonomic, spatial, temporal, etc.
          - Error prevention: malfeasance (gaming the system), inexperience, data entry errors, etc.
      - Evaluate assumptions about error and accuracy
          - Where does error originate? How do mechanisms address this? At what step in the research process? How transparent is data review and outcomes? How much data will be reviewed? In how much detail?
  • 10. Mechanisms: Protocols
      Mechanism                                    Process          Type/Detail
      QA project plans                             Before           SOP in some areas
      Repeated samples/tasks                       During           By multiple participants, single participant, or experts (calibration)
      Tasks involving control items                During           Contributions compared to known states
      Uniform/calibrated equipment                 During           Used for measurements; cost/scale tradeoff; who pays?
      Paper data sheets + online entry*            During           Extended details, verifying data entry accuracy
      Digital vouchers*                            During           Photos, audio, specimens/archives
      Data triangulation, normalization, mining*   After            Corroboration from other data sources; statistical & computer science methods
      Data documentation*                          After            Provide metadata about processes
  • 11. Mechanisms: Participants
      Mechanism                                       Process          Types/Details
      Participant training                            Before, During   Initial; Ongoing; Formal QA/QC
      Participant testing                             Before, During   Following training; Pre/test-retest
      Rating participant performance                  During, After    Unknown to participant; Known to participant
      Filtering of unusual reports                    During, After    Automatically; Manually
      Contacting participants about unusual reports   After            May alienate/educate contributors
      Automatic recognition                           After            Techniques for image/text processing
      Expert review                                   After            By professionals, experienced contributors, or multiple parties
  • 12. Discussion
      - Need to pay more attention to the way that data are created: not just protocols but also qualities of data such as accuracy and precision
      - Clear need for quality/validation mechanisms for analysis, not only for data collection/processing
          - Data mining techniques
          - Spatio-temporal modeling
      - Scalability of validation may be limited
          - May need to plan different quality management techniques based on expected/actual project growth
  • 13. Future Work
      - Most projects worry more about contributor expertise than appropriate analysis methods
          - Resources are needed to support suitable analysis approaches and tools
      - Comparative evaluation of the efficacy of the data quality and validation mechanisms identified
      - Develop a QA/QC planning and evaluation tool
      - Develop examples of appropriate data documentation for citizen science projects
          - Necessary for peer review, data re-use
  • 14. Thanks!
      - Nate Prestopnik
      - DataONE working group on Public Participation in Scientific Research
      - US NSF grants 09-43049 & 11-11107
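
The percentages on slide 5 are described as shares of responding projects, but the slides do not state how many projects answered that question. A minimal sketch of how the denominator can be recovered from the reported counts follows. The search range of 40 to 80 and all function and variable names are assumptions made for illustration; only the counts and percentages come from the slide.

```python
# Counts (n) and reported percentages copied from slide 5, "Survey: Methods Used".
METHODS_USED = {
    "Expert review": (46, 77),
    "Photo submissions": (24, 40),
    "Paper data sheets submitted along with online entry": (20, 33),
    "Replication/rating by multiple participants": (14, 23),
    "QA/QC training program": (13, 22),
    "Automatic filtering of unusual reports": (11, 18),
    "Uniform equipment": (9, 15),
    "Validation planned but not yet implemented": (5, 8),
    "Replication/rating, by the same participant": (2, 3),
    "Rating of established control items": (2, 3),
    "None": (2, 3),
    "Not sure/don't know": (2, 3),
}


def is_consistent(denominator: int) -> bool:
    """True if every reported percentage equals n / denominator, rounded."""
    return all(
        round(100 * n / denominator) == pct
        for n, pct in METHODS_USED.values()
    )


# Assumed search range: the Methods slide reports 63 questionnaires,
# so the number answering this question should be in this neighbourhood.
candidates = [d for d in range(40, 81) if is_consistent(d)]
print(candidates)
```

Under these assumptions the only denominator that reproduces every row is 60, which suggests that about 60 projects answered the validation-methods question.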

Editor's notes

  1. Rating = classification or judgment tasks. This is admittedly not the clearest wording, but no one corrected it in the text responses. Percentage = percentage of responding projects that use each method.
  2. Percentage = percentage of responding projects that use this combination of methods. A handful of projects used a few other combinations; these were the dominant ones. It was surprising to see so many projects using photos, as they are hard to use and store, and to see how frequently paper data sheets are used.
  3. Note that we did ask about numbers of contributions, but the units of contribution for each project (and even the way they count volunteers) were so different that they couldn’t be used for analysis.
  4. The framework of mechanisms is split in two for ease of viewing; these are methods that address the protocol as the presumed source of error. Starred items address errors arising from both protocols and participants.
  5. These methods all address expected errors from participants, focusing primarily on skill evaluation and filtering or review for unusual reports.
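
The slides name automatic filtering of unusual reports as a mechanism but do not describe how any project implements it. One possible approach is a rule-based check that flags reports for expert review rather than rejecting them. The following is a rough sketch under that assumption; the `Observation` record, the `EXPECTED` rules table, its species entry, and its thresholds are all hypothetical and not taken from the survey.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Observation:
    """Hypothetical record for one submitted report."""
    species: str
    count: int
    observed_on: date


# Hypothetical expectations per species: months in which sightings are
# plausible and the largest count considered unremarkable.
EXPECTED = {
    "monarch butterfly": {"months": range(5, 11), "max_count": 200},
}


def flag_reasons(obs: Observation) -> list[str]:
    """Return the reasons a report looks unusual; an empty list means none."""
    rule = EXPECTED.get(obs.species)
    if rule is None:
        return ["species not on checklist"]
    reasons = []
    if obs.observed_on.month not in rule["months"]:
        reasons.append("outside expected season")
    if obs.count > rule["max_count"]:
        reasons.append("count above expected maximum")
    return reasons


typical = Observation("monarch butterfly", 3, date(2012, 7, 4))
unusual = Observation("monarch butterfly", 500, date(2012, 1, 15))
print(flag_reasons(typical))
print(flag_reasons(unusual))
```

Returning reasons instead of a yes/no answer keeps the filter compatible with the follow-up mechanisms listed on slide 11: a flagged report can be routed to expert review, or the reasons can be shown when contacting the participant.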