SlideShare a Scribd company logo
1 of 16
Lesson 5: Data Quality Control and Assurance




                                                                                  CC image by Shane Melaugh on Flickr
                                                 CC image by harusday on Flickr


Data Quality Control and Assurance
• Definitions
        o Quality assurance and Quality control
        o Data contamination
        o Types of errors

     • QA/QC best practices




                                                  CC image by cobalt123 on Flickr
        o Before data collection
        o During data collection/entry
        o After data collection/entry




Data Quality Control and Assurance
• After completing this lesson, the participant will be able to:
        o Define data quality control and data quality assurance
        o Perform quality control and assurance on their data at all stages of
          the research cycle




                                                          CC image by 0xFCAF on Flickr

Data Quality Control and Assurance
Plan

                      Analyze                   Collect




               Integrate                              Assure




                      Discover                  Describe

                                     Preserve
Data Quality Control and Assurance
Data Contamination
     • Process or phenomenon, other than the one of interest,
       that affects the variable value
     • Erroneous values




                                                 CC image by Michael Coghlan on Flickr
Data Quality Control and Assurance
• Errors of Commission
        o Incorrect or inaccurate data entered
        o Examples: malfunctioning instrument, mistyped data
     • Errors of Omission
        o Data or metadata not recorded
        o Examples: inadequate documentation, human error, anomalies in the
          field




                                                      CC image by Nick J Webb on Flickr

Data Quality Control and Assurance
• Strategies for preventing errors from entering a dataset
     • Activities to ensure quality of data before collection
     • Activities that involve monitoring and maintaining the
       quality of data during the study




Data Quality Control and Assurance
• Define & enforce standards
        o Formats
        o Codes
        o Measurement units
        o Metadata
     • Assign responsibility for data quality
        o Be sure assigned person is educated in QA/QC




Data Quality Control and Assurance
• Double entry
        o Data keyed in by two independent people
        o Check for agreement with computer verification
     • Record a reading of the data and transcribe from the
       recording
     • Use text-to-speech program to read data back




                                                  CC image by weskriesel on Flickr




Data Quality Control and Assurance
• Design data storage well
        o Minimize number of times items that must be entered repeatedly
        o Use consistent terminology
        o Atomize data: one cell per piece of information
     • Document changes to data
        o Avoids duplicate error checking
        o Allows undo if necessary




Data Quality Control and Assurance
• Make sure data line up in proper columns
     • No missing, impossible, or anomalous values
     • Perform statistical summaries




Data Quality Control and Assurance                CC image by chesapeakeclimate on Flickr
• Look for outliers
        o Outliers are extreme values for a variable given the statistical model
          being used
        o The goal is not to eliminate outliers but to identify potential data
          contamination
                    60


                    50


                    40


                    30


                    20


                    10


                     0
                         0   5   10   15   20    25   30    35




Data Quality Control and Assurance
• Methods to look for outliers
        o Graphical
          • Normal probability plots
          • Regression
          • Scatter plots
        o Maps
        o Subtract values from mean




Data Quality Control and Assurance
• Data contamination is data that results from a factor not
       examined by the study that results in altered data values
     • Data error types: commission or omission
     • Quality assurance and quality control are strategies for
        o preventing errors from entering a dataset
        o ensuring data quality for entered data
        o monitoring, and maintaining data quality throughout the project
     • Identify and enforce quality assurance and quality control
        measures throughout the Data Life Cycle




Data Quality Control and Assurance
1.   D. Edwards, in Ecological Data: Design, Management and
          Processing, WK Michener and JW Brunt, Eds. (Blackwell, New
          York, 2000), pp. 70-91. Available at www.ecoinformatics.org/pubs
     2.   R. B. Cook, R. J. Olson, P. Kanciruk, L. A. Hook, Best practices for
          preparing ecological data sets to share and archive. Bull. Ecol. Soc.
          Amer. 82, 138-141 (2001).
     3.   A. D. Chapman, “Principles of Data Quality:. Report for the Global
          Biodiversity Information Facility” (Global Biodiversity Information
          Facility, Copenhagen, 2004). Available at
          http://www.gbif.org/communications/resources/print-and-online-
          resources/download-publications/bookelets/




Data Quality Control and Assurance
The full slide deck may be downloaded from:
    http://www.dataone.org/education-modules

    Suggested citation:
    DataONE Education Module: Data Quality Control and
    Assurance. DataONE. Retrieved Nov12, 2012. From
    http://www.dataone.org/sites/all/documents/L05_DataQuality
    ControlAssurance.pptx

    Copyright license information:
                 No rights reserved; you may enhance and reuse for
                 your own purposes. We do ask that you provide
                 appropriate citation and attribution to DataONE.


Data Quality Control and Assurance

More Related Content

Recently uploaded

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 

Recently uploaded (20)

How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 

Featured

How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 

Featured (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

DataONE Education Module 05: Data Quality, Control and Assurance

  • 1. Lesson 5: Data Quality Control and Assurance CC image by Shane Melaugh on Flickr CC image by harusday on Flickr Data Quality Control and Assurance
  • 2. • Definitions o Quality assurance and Quality control o Data contamination o Types of errors • QA/QC best practices CC image by cobalt123 on Flickr o Before data collection o During data collection/entry o After data collection/entry Data Quality Control and Assurance
  • 3. • After completing this lesson, the participant will be able to: o Define data quality control and data quality assurance o Perform quality control and assurance on their data at all stages of the research cycle CC image by 0xFCAF on Flickr Data Quality Control and Assurance
  • 4. Plan Analyze Collect Integrate Assure Discover Describe Preserve Data Quality Control and Assurance
  • 5. Data Contamination • Process or phenomenon, other than the one of interest, that affects the variable value • Erroneous values CC image by Michael Coghlan on Flickr Data Quality Control and Assurance
  • 6. • Errors of Commission o Incorrect or inaccurate data entered o Examples: malfunctioning instrument, mistyped data • Errors of Omission o Data or metadata not recorded o Examples: inadequate documentation, human error, anomalies in the field CC image by Nick J Webb on Flickr Data Quality Control and Assurance
  • 7. • Strategies for preventing errors from entering a dataset • Activities to ensure quality of data before collection • Activities that involve monitoring and maintaining the quality of data during the study Data Quality Control and Assurance
  • 8. • Define & enforce standards o Formats o Codes o Measurement units o Metadata • Assign responsibility for data quality o Be sure assigned person is educated in QA/QC Data Quality Control and Assurance
  • 9. • Double entry o Data keyed in by two independent people o Check for agreement with computer verification • Record a reading of the data and transcribe from the recording • Use text-to-speech program to read data back CC image by weskriesel on Flickr Data Quality Control and Assurance
  • 10. • Design data storage well o Minimize number of times items that must be entered repeatedly o Use consistent terminology o Atomize data: one cell per piece of information • Document changes to data o Avoids duplicate error checking o Allows undo if necessary Data Quality Control and Assurance
  • 11. • Make sure data line up in proper columns • No missing, impossible, or anomalous values • Perform statistical summaries Data Quality Control and Assurance CC image by chesapeakeclimate on Flickr
  • 12. • Look for outliers o Outliers are extreme values for a variable given the statistical model being used o The goal is not to eliminate outliers but to identify potential data contamination 60 50 40 30 20 10 0 0 5 10 15 20 25 30 35 Data Quality Control and Assurance
  • 13. • Methods to look for outliers o Graphical • Normal probability plots • Regression • Scatter plots o Maps o Subtract values from mean Data Quality Control and Assurance
  • 14. • Data contamination is data that results from a factor not examined by the study that results in altered data values • Data error types: commission or omission • Quality assurance and quality control are strategies for o preventing errors from entering a dataset o ensuring data quality for entered data o monitoring, and maintaining data quality throughout the project • Identify and enforce quality assurance and quality control measures throughout the Data Life Cycle Data Quality Control and Assurance
  • 15. 1. D. Edwards, in Ecological Data: Design, Management and Processing, WK Michener and JW Brunt, Eds. (Blackwell, New York, 2000), pp. 70-91. Available at www.ecoinformatics.org/pubs 2. R. B. Cook, R. J. Olson, P. Kanciruk, L. A. Hook, Best practices for preparing ecological data sets to share and archive. Bull. Ecol. Soc. Amer. 82, 138-141 (2001). 3. A. D. Chapman, “Principles of Data Quality:. Report for the Global Biodiversity Information Facility” (Global Biodiversity Information Facility, Copenhagen, 2004). Available at http://www.gbif.org/communications/resources/print-and-online- resources/download-publications/bookelets/ Data Quality Control and Assurance
  • 16. The full slide deck may be downloaded from: http://www.dataone.org/education-modules Suggested citation: DataONE Education Module: Data Quality Control and Assurance. DataONE. Retrieved Nov12, 2012. From http://www.dataone.org/sites/all/documents/L05_DataQuality ControlAssurance.pptx Copyright license information: No rights reserved; you may enhance and reuse for your own purposes. We do ask that you provide appropriate citation and attribution to DataONE. Data Quality Control and Assurance

Editor's Notes

  1. First we will definequality assurance and quality control, and review definitions for other related terms including data contamination, and review the common types of errors in data. Then we will review some best practices that can be implemented before data collection, during data collection, and after data collection to ensure high-quality data.
  2. After completing this module, you will be able to define quality control and quality assurance. You will also understand how to perform quality control and assurance on your data at all stages of the data life cycle.
  3. Quality control and quality assurance should be implemented throughout the data life cycle, but here we focus on measures that can be taken during collection. The “assure” step in the data life cycle closely ties in with QAQC measures.
  4. Before describing best practices useful for quality control and assurance, we will define some related terms. Data contamination results from a process or phenomenon, other than the one of interest, which can affect the variable values. Data contamination results in erroneous values in the data set. 
  5. In general, there are two types of errors that can occur in a data set. First, errors of commission are the result of incorrect or inaccurate data being included in the data set. This may happen because of a malfunctioning instrument that produces faulty results, data that are mistyped during entry, or other problems. Errors of omission are the second type of errors. These result from data or metadata being omitted. Situations that result in omission errors are when data are inadequately documented, when there are human errors during data collection or entry, or when there are anomalies in the field that affect the data. Omission errors examples:data are inadequately documented for effective use. when there are human errors during data collection or entry, for example a measurement is forgotten, or a spreadsheet line is skipped during data entry. anomalies in the field that affect the data. If anomalies that are known to affect data are not documented and reported in the metadata, then erroneous data may be recorded and used. GPS unit running out of battery power.
  6. Quality assurance and quality control are phrases used to describe activities that prevent errors from entering or staying in a data set. These activities ensure the quality of the data before it is collected, entered, or analyzed, and monitoring and maintaining the quality of data throughout the study.
  7. The remainder of the module will cover best practices for quality control and quality assurance for the different stages of a research project. First, before data collection, a researcher should think about defining and enforcing standards that will be used during the project. Consider formats that will be used for the data tables or data entry forms. Also, if abbreviations or codes are used, they should be defined up front. Measurement units should also be specified and relevant metadata should be identified before collection. Second, you should assign responsibility for data quality before collection begins. Ideally, the person responsible for data quality assurance and data quality control is the person collecting the data, and is educated in quality control and assurance methods.
  8. Consider using techniques that help eliminate mistakes during data entry. Examples are using Double data entry, where non-digital data are keyed in by two people independently. Differences in entries can then be detected via computer programs and examined further for mistakes. Another way to reduce data entry error is to record yourself reading off the data, and then transcribe it from the recording. You may also use a text-to-speech program reads the data to you while you type it into the computer.
  9. If you are using spreadsheets or databases, you should carefully consider their design before and during data entry. Use consistent terminology within the database, and atomize data. This means only one piece of information is in each cell of the spreadsheet -- multiple pieces of information embedded in a single data cell will be problematic during data analysis. If you are using a database, restrict what can be entered into the database; for example, set up a field to accept only text or only numerical values, choose a maximum number of characters or a range of values a field will accept, or set a field to accept only unique values. Finally, document any changes made to data. It saves time if good records of data editing are kept since multiple users are less likely to spend time on error-checking with old versions of data.If mistakes in data editing or cleaning are made, good data records will allow these mistakes to be undone. Documenting data changesmay be as simple as creating a text file to accompany the data set, or it may involve using a scripted program for correcting errors so that each step taken is clearly documented.
  10. Once data are entered, basic quality assurance measures can be taken. First, if data are in spreadsheets or databases, be sure they line up in their proper columns. Also check for any missing, impossible, or anomalous values. One way to check for these problems is to sort data fields and check for discrepancies. It is often also useful to perform basic statistical summaries, such as means, and standard errors. If data transformation was performed for analysis, compare the statistical summaries before and after transformation to ensure no mistakes were made during transformation.
  11. Another strategy for quality control after data are entered is to look for outliers. Outliers are extreme values for a variable. Extreme values are those that lie outside of the statistical model being used to describe the data. Keep in mind that the goal is not to eliminate outliers but to identify potential data contamination. In this graph, most of the data fall along the black line. The outlier identified with the red arrow should be flagged for further investigation.
  12. One common strategy for identifying outliers is using graphical methods, for instancenormal probability plots, regression (as in the previous slide), or scatter plots> If data are geographical, mapping the points can be to ensure latitude and longitude were correctly entered. This map shows an example of an error that is the result of mis-entering latitude data. Another method for identifying outliers is using statistics. By subtracting values from the mean of the data set, the presence of outliers or faulty data points can become apparent.
  13. During this tutorial we first defined several concepts important for understanding quality assurance and quality control. This included data contamination and the types of data errors that can result in poor quality data.We then covered best practices for quality assurance and quality control. These strategies prevent errors from entering a dataset, or identifying those errors if they are present in the data.It is important to define and enforce quality assurance and quality control standards before, during, and after the collection and entry of data.