2. Learning Objectives
To know the steps necessary for ensuring quality assurance
and control of data at various stages of a study
To understand the difference between pilot testing and pre-
testing
To understand the importance of designing data collection
instruments
To understand how data can be managed using an audit
trail and the various techniques that can be used to inspect
your dataset after it has been entered
3. Performance Objectives
Know the difference between quality assurance and quality
control and ways to ensure them
Know the objectives of a pilot test and a pre-test
Understand how data collection instruments should be
designed and coded
Be able to manage data using an audit trail
Be able to inspect datasets for errors and rectify them
4. Data Quality Control
Quality Assurance
– Activities to ensure
quality of data before
data collection
Quality Control
– Monitoring and
maintaining the quality
of data during the
conduct of the study
Data Management
– Handling and
processing of data
throughout the study
5. Steps in Quality Assurance
1. Specify the study hypothesis
2. Specify general design to test study hypothesis ⇒
Develop an overall study protocol
3. Choose or prepare specific instruments
4. Develop procedures for data collection and processing
⇒ Develop operation manuals
5. Train staff ⇒ Certify staff
6. Using certified staff, pretest and pilot-test data
collection and processing instruments and procedures
6. Quality Assurance: Standardization of procedures
Why is standardization important?
– To achieve the highest possible level of uniformity
and standardization of data collection procedures in the
entire study population
Preparation of written manual of operations
– Detailed descriptions of exactly how the procedures
specific to each data collection instrument are to be
carried out (BP example)
– Q by Q’s (question by question) instructions for
interviews
7. Quality Assurance: Training of Staff
Aim to make each staff person
thoroughly familiar with procedures
under his/her responsibility
Training certification of the staff
member to perform a specific procedure
8. Quality Assurance: Pretesting and Pilot testing
Pretesting
– Involves assessing
specific procedures
on a sample in
order to detect
major flaws
Pilot Testing
– Formal rehearsal of
study procedures
– Attempts to
reproduce the
whole flow of
operations in a
sample as similar as
possible to study
participants
9. Pretesting and Pilot testing results
Pretesting of questionnaire used to assess:
– flow of questions,
– presence of sensitive questions,
– appropriateness of categorization of variables,
– clarity of the q by q instructions to the
interviewer
Pilot testing
– In addition to the above, flow of process
10. Quality Assurance: Data Management
Designing data collection
– Layout, questions to ask, sequence of questions,
phrasing of questions, response categories, skip
patterns
– Collect and record “raw”, not processed
information (eg. Age)
– Codebook: link between the questionnaire and
the data entered in the computer
11. Code book example
Variable  QNo  Meaning            Codes              Format
Q1Id      Q1   Quest. No          1-750              C 3
Q2Sex     Q2   Respondent’s sex   1 male             N 1.0
                                  2 female
Q3Child   Q3   No of children     99 no response     N 2.0
Q4Wt      Q4   Weight in kg       999 not recorded   N 3.1
Q5roof    Q5   Roof type          1 RCC              N 2.0
                                  2 Cement sheet
                                  3 Tin sheet
                                  4 Thatched
                                  Other (specify)
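A codebook like this one can also be kept in machine-readable form, so that entered values are checked against it automatically. The following is a minimal Python sketch, not part of the original slides: the dictionary layout, the helper `validate_record` and the numeric ranges for number of children and weight are assumptions made for illustration.

```python
# Hypothetical machine-readable version of the example codebook.
# "codes" lists the defined codes, "range" the allowed numeric limits
# (assumed here), "missing" the code reserved for missing information.
CODEBOOK = {
    "Q1Id":    {"meaning": "Questionnaire number", "range": (1, 750)},
    "Q2Sex":   {"meaning": "Respondent's sex", "codes": {1: "male", 2: "female"}},
    "Q3Child": {"meaning": "Number of children", "range": (0, 98), "missing": 99},
    "Q4Wt":    {"meaning": "Weight in kg", "range": (0.0, 998.9), "missing": 999},
    "Q5roof":  {"meaning": "Roof type",
                "codes": {1: "RCC", 2: "Cement sheet", 3: "Tin sheet", 4: "Thatched"}},
}


def validate_record(record, codebook=CODEBOOK):
    """Return a list of (variable, value, problem) tuples for one record."""
    problems = []
    for name, spec in codebook.items():
        if name not in record:
            problems.append((name, None, "variable absent from record"))
            continue
        value = record[name]
        if "missing" in spec and value == spec["missing"]:
            continue  # the reserved missing-value code is always acceptable
        if "codes" in spec and value not in spec["codes"]:
            problems.append((name, value, "not a defined code"))
        if "range" in spec:
            low, high = spec["range"]
            if not (low <= value <= high):
                problems.append((name, value, "outside allowed range"))
    return problems
```

Calling `validate_record` on each entered questionnaire gives a list of values to look up in the original documents.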
12. Quality Assurance: Use of a Code book
Variable names
– Up to 8 characters a-z and 0-9, must start with a letter
– Combination of question number and description (eg.
q3age)
Meaning:
– short text description describing the meaning of the
variable
– SPSS software can incorporate this info as variable
labels and display it in the output
13. Quality Assurance: Use of a Code book
Codes
– Try and use numerical codes
Predecide codes for no response, missing values
– Question could not be asked or not applicable (eg.
pregnancy outcome)
– Question was asked but respondent did not reply (eg
salary)
– Respondent replied “don’t know”
14. Quality Control
Observation of procedures and performance of staff
members for identification of obvious protocol
deviations
Strategies include:
– Over-the-shoulder observation of staff
– Taping all interviews and reviewing a random sample
– Ongoing field supervision
– Field editing by interviewer as well as field supervisor
– Office editing which includes coding
– Log book maintenance
– Statistical assessment of trends over time in the
performance of each observer/interviewer/technician
15. Data Management: Audit trail
Researcher should be able to trace each piece of
information back to the original document:
– ID included in the original documents and in the dataset
– All corrections must be documented and explained
– All modifications to the dataset must be documented by
command files
– Each analysis must be documented by a command file
Purpose of audit is to
– protect yourself against mistakes, errors, waste of time
and loss of information
– enable external audit (revision)
16. Data Management: Handling of Data
Entering data
– Use professional data entry program like
EpiData
Preparations
– complete codebook
– examine questionnaires for obvious
inconsistencies, skip patterns
17. Data Management: Handling of Data
Error prevention:
– Set up a data entry form resembling your
questionnaire
– Define valid values before entering data
– Double data entry by two different operators
compare contents to get a list of discrepancies (EpiInfo)
correct errors in both files and run a new comparison
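The comparison step of double data entry can be sketched in a few lines of Python. This is an illustration of the idea only, not how EpiInfo or EpiData implement it; the function name `compare_entries` and the field names in the usage example are hypothetical.

```python
def compare_entries(first, second, id_field="id"):
    """List discrepancies between two independently entered datasets.

    `first` and `second` are lists of dicts, one dict per questionnaire.
    Returns a list of (id, field, value_in_first, value_in_second).
    """
    by_id_1 = {row[id_field]: row for row in first}
    by_id_2 = {row[id_field]: row for row in second}
    discrepancies = []
    for rec_id in sorted(set(by_id_1) | set(by_id_2)):
        row1, row2 = by_id_1.get(rec_id), by_id_2.get(rec_id)
        if row1 is None or row2 is None:
            # Questionnaire was entered by only one of the operators.
            discrepancies.append(
                (rec_id, "<record>", row1 is not None, row2 is not None))
            continue
        for field in sorted(set(row1) | set(row2)):
            if row1.get(field) != row2.get(field):
                discrepancies.append(
                    (rec_id, field, row1.get(field), row2.get(field)))
    return discrepancies
```

An empty result after correcting both files is the signal that the two entries agree.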
18. First Inspection of data. Error Finding
Add variable and value labels to your data using a syntax
command
Searching for errors
– make printouts of codebook from the data, overview of variables, simple
frequency tables of appropriate variables
– compare codebook created with original codebook and see if label
information is correct
– Inspect the generated summary/frequency tables for illegal or improbable
minimum and maximum values of variables and inconsistencies (eg. 250
years age, pregnant male; 23 yr woman with 19 yr son)
Calculate the error rate
– randomly select 10% (or at least 40) of your questionnaires and re-enter
them into a new file
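Drawing the questionnaires for re-entry can be done reproducibly with a seeded random sample. The sketch below reads the rule as "10% of the questionnaires, but never fewer than 40", which is one interpretation of the slide; the function name and parameters are hypothetical.

```python
import random


def sample_for_reentry(questionnaire_ids, percent=10, minimum=40, seed=None):
    """Pick questionnaire IDs to re-enter: `percent` of all, at least `minimum`."""
    ids = list(questionnaire_ids)
    share = (len(ids) * percent + 99) // 100   # percentage, rounded up
    size = min(max(minimum, share), len(ids))  # cannot exceed what exists
    rng = random.Random(seed)                  # fixed seed = documented draw
    return sorted(rng.sample(ids, size))
```

Recording the seed in the log book keeps the draw traceable in the audit trail.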
19. Correction of errors - Documentation
If errors are discovered
– Make corrections in a command file (SPSS
syntax file), this will provide full
documentation of changes made to the dataset
If errors are discovered when comparing
files after double data entry
– you can make corrections directly in the data
entered, provided you end this step with a
comparison of the two files entered and
corrected
20. Correction of errors - Documentation
Split the process into distinct and well-defined
steps, and make sure that your
documentation from one step to another
is consistent
Archive
– once you have a “clean” documented version of
your primary data, save one copy in a safe
place and do your work with another copy
21. Analysis
Make sure you use the right data set
– it is recommended to create command files for
analysis which start with the command reading
the dataset
Late discovery of errors and inconsistencies
22. Backing up vs Archiving
Backing up
– everyday activity
– purpose is to enable you to restore your data and documents
in case of destruction or loss of data
– not only datasets, but also command files modifying
your data, written documents such as the protocol, log
book and other documenting information
Archiving
– takes place once or a few times during the life of the
project
– purpose is to preserve your data and documents for a
more distant future, maybe to even allow other
researchers access to the information.
Editor's notes
If necessary, modify steps 2-4 and retrain staff on the basis of the results of step 6
Detailed descriptions are necessary in order to maximize the likelihood that tasks will be performed as uniformly as possible.
Eg. Description of procedure for blood pressure measurements should include the calibration of the blood pressure apparatus, the position of the participant, the amount of resting time before and between measurements, the size of the cuff, position of the cuff on the arm.
Extensive training of interviewers is crucial since they will be the primary source of your data collection. Their training should include interviewing skills, processing procedures, setting up appointments for interviews or visits, calibrating instruments, etc. Training should also involve lab technicians and those in charge of classifying data obtained from examinations
If necessary, periodic recertification should take place. A staff member should be retrained if during recertification their performance is inadequate
Pretesting and pilot testing are often used synonymously, but they are not the same.
Pretesting can be done in two stages. The first is on a convenience sample of your colleagues or friends; this is just to get an idea of the time it takes, the flow of the questions, etc.
The second, more formal phase of pretesting is where the procedure (usually the questionnaire) is administered to approximately 10% of your sample size, in a sample as similar as possible to the study participants BUT NOT IN THE SAME AREA
Pilot testing covers all the processes, including the questionnaire.
Pilot testing can also be used to evaluate alternative strategies for participant recruitment and data collection
Collect raw data wherever possible. For example age, instead of precoding age into categories like 18-24, 25-36, 36+, etc., record the actual age. Categories can be made with ease at the time of analysis using statistical software.
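To show how little work it is to derive categories from raw age later, here is a minimal Python sketch. The cut-points (18, 25, 36) and the labels are assumptions chosen for illustration, and the function name `age_group` is hypothetical.

```python
from bisect import bisect_right


def age_group(age, lower_bounds=(18, 25, 36), labels=("18-24", "25-35", "36+")):
    """Map a raw age in years to a category label (None if below the first bound)."""
    if age < lower_bounds[0]:
        return None
    # bisect_right counts how many lower bounds are <= age.
    return labels[bisect_right(lower_bounds, age) - 1]
```

Because the raw age is kept, the cut-points can be changed at any time without going back to the questionnaires.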
A codebook contains variable names, meaning, skip patterns if any and precoded values as well as codes for no response, missing values
In formatting
N Numeric
N1.0 means 1 space width
N2.1 means 2 spaces before decimal point and 1 space after decimal point
(Note please refer to the software and how it requires the data to be formatted)
C Character
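A format such as N 3.1 can be checked mechanically before data entry is accepted. The Python sketch below assumes the reading given in these notes (digits before and after the decimal point); real packages may define width differently, so this is an illustration only and `fits_numeric_format` is a hypothetical helper.

```python
def fits_numeric_format(value, before, after):
    """True if `value` has at most `before` integer digits and `after` decimals."""
    text = f"{abs(value):.{after}f}"      # e.g. 61.5 with after=1 -> "61.5"
    if float(text) != abs(value):         # more decimals than the format allows
        return False
    integer_part = text.split(".")[0]
    return len(integer_part) <= before
```

For example, a weight of 61.5 fits N 3.1, while 1234.5 or 61.25 does not.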
Reference category
If you know your reference category in advance, make sure it is either the first code or the last code.
Eg. if the reference category for type of roof is RCC, code it as 1 (RCC); all the other roof types can be in any order
Other(specify) will not be coded until you have your data
Codebook should include your decision on how to record missing data
SPSS can define certain values as missing. Remember to be consistent with handling of missing information and its coding
Predecide on the codes to use for missing data, etc and keep consistent with the format
Eg. 9, 99, 999 for Don’t know
8, 98, 998 for Refusal
7, 97, 997 for No response / Not applicable
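Reserved codes must be excluded before any statistics are computed, otherwise a value such as 999 is treated as a real measurement. A minimal Python sketch follows, assuming a three-digit variable that uses the 999 / 998 / 997 family; the mapping `MISSING_3_DIGIT` and the function `summarize` are hypothetical names.

```python
# Assumed missing-value codes for a three-digit numeric variable.
MISSING_3_DIGIT = {
    999: "don't know",
    998: "refusal",
    997: "no response / not applicable",
}


def summarize(values, missing=MISSING_3_DIGIT):
    """Return (mean of valid values, count of each kind of missing value)."""
    valid = [v for v in values if v not in missing]
    reasons = {}
    for v in values:
        if v in missing:
            reasons[missing[v]] = reasons.get(missing[v], 0) + 1
    mean = sum(valid) / len(valid) if valid else None
    return mean, reasons
```

Keeping the reason for missingness separate also lets you report how many respondents refused and how many did not know.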
The same principles of audit apply in research as in keeping financial accounts, i.e. documentation should be such that it is possible to go back from the balance sheet to the individual bills.
Meticulous documentation from the beginning to the end can be a tedious task, especially when deadlines are to be met. However, its utility will be fully appreciated when you need to make a few modifications and rerun your analysis. The audit trail is the researcher's road map of his/her quests.
EpiData is an easy to use tool for simple or programmed data entry and for data documentation. EpiData can be used to create data entry programs for EpiInfo. It is available for free at http://www.epidata.dk/
Use the COMPARE option in EpiInfo to see the fields that are not identical or similar with the final file.
Most statistical packages have the option of creating a syntax of commands. SPSS also uses syntax. The advantage of syntax is that you may rerun a series of commands at any stage thus consistently being able to duplicate your process.
Handling inconsistent data
go back to the source
data recoded to missing
examine other information available and judge which piece of information is likely to be correct (not recommended)
Decide which method and be consistent. Document in writing.
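Before one of these options can be applied, the inconsistent records have to be located. The Python sketch below checks two of the examples from the slides (an improbable age, a pregnant male). The variable names, the coding (sex 1 = male, pregnant 1 = yes) and the upper age limit of 120 are assumptions for illustration.

```python
def find_inconsistencies(records, max_age=120):
    """Return (id, variable, problem) for improbable or contradictory values."""
    findings = []
    for r in records:
        if not 0 <= r["age"] <= max_age:
            findings.append((r["id"], "age", "improbable value"))
        if r["sex"] == 1 and r["pregnant"] == 1:
            findings.append((r["id"], "pregnant", "male recorded as pregnant"))
    return findings
```

The returned IDs point back to the original questionnaires, which is the first of the options listed above.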
Error rate calculation
Numerator will be the number of mistakes in the final file and the denominator will be the (number of records * number of fields) in the new file. An error rate of less than 0.3% is considered acceptable
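The calculation can be written out as follows. This is a minimal Python sketch; the function names are hypothetical and the numbers in the usage note are made up to show the arithmetic.

```python
def error_rate(n_errors, n_records, n_fields):
    """Errors divided by the number of field values that were re-entered."""
    total_values = n_records * n_fields
    if total_values <= 0:
        raise ValueError("need at least one record and one field")
    return n_errors / total_values


def is_acceptable(n_errors, n_records, n_fields, threshold=0.003):
    """True if the error rate is below the 0.3% threshold."""
    return error_rate(n_errors, n_records, n_fields) < threshold
```

For instance, 3 mistakes found in 75 re-entered questionnaires of 20 fields each give 3 / 1500 = 0.2%, which is below the 0.3% threshold.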
If errors are discovered, do not be tempted to go directly into the data window and correct errors because 1) the risk of “correcting” the wrong variable or case is high; and 2) the change is undocumented and the audit trail is broken
Despite efforts to secure the data quality, you may still discover errors and inconsistencies during analysis. If you have been maintaining and documenting command files, you can go back and modify the correction command file and rerun this and subsequent command files. If you have not been documenting and maintaining command files, then the procedure may be time consuming - and risky
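The same idea of a correction command file can be expressed in any scripting language. The Python sketch below is an illustration rather than SPSS syntax: each correction states the value it expects to replace and the reason, so a correction aimed at the wrong case fails loudly instead of silently changing data. The record ID, values and names are hypothetical.

```python
# Each correction: (id, variable, old value, new value, reason).
CORRECTIONS = [
    (17, "age", 250, 25, "entry error, checked against questionnaire"),
]


def apply_corrections(records, corrections):
    """Return corrected copies of `records` plus a log of every change made."""
    by_id = {r["id"]: dict(r) for r in records}  # copies; raw data untouched
    log = []
    for rec_id, variable, old, new, reason in corrections:
        row = by_id[rec_id]
        if row[variable] != old:
            raise ValueError(
                f"id={rec_id} {variable}: expected {old!r}, found {row[variable]!r}")
        row[variable] = new
        log.append(f"id={rec_id} {variable}: {old} -> {new} ({reason})")
    return list(by_id.values()), log
```

Rerunning the script on the first version of the data reproduces the corrected version, and the log documents every change.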
Final archive to include at least the following:
study protocol
applications to and permissions from ethical committees, etc
data collection instruments (questionnaires, etc.)
coding instructions and other technical descriptions
log book and other written documentation on the processing of data
at least the first and final version of your data
all command files modifying data. The command files should make it possible to reconstruct the final version from the first version of your data