SQP is a program that predicts the reliability of a survey question from its design characteristics. This presentation, held at the European Social Survey's Quality Enhancement Meeting in Barcelona, discusses the workings behind the program.
Predicting the quality of a survey question from its design characteristics
1. Predicting the quality of a survey question
from its design characteristics
Daniel Oberski
Dept. of Methodology & Statistics,
Tilburg University, The Netherlands
U N I V E R S I T A T
P O M P E U F A B R A
Predicting the quality of a survey question from its design characteristics Daniel Oberski
2. Question #1: Where is Tilburg?
3. Summary
• We wanted to predict how “good” a survey question is
based on its “design characteristics”, i.e. choices such as the
number of categories, whether a “don’t know” option is present,
use of showcards, etc. (Willem’s talks 1 & 2)
→ We trained a random forest of regression trees to predict
the ESS multitrait-multimethod estimates of reliability
and method effect (cf. Mélanie’s talk) from these question
characteristics, as coded by teams of human coders.
• The prediction powers a webapp that lets users code a
question and get predictions ± uncertainty (Diana’s talk).
5. European Social Survey, 2002
Method A:

ENTER START TIME:

A1 TvTot
CARD 1 On an average weekday, how much time, in total, do you
spend watching television? Please use this card to answer.

No time at all                       00   GO TO A3
Less than ½ hour                     01
½ hour to 1 hour                     02
More than 1 hour, up to 1½ hours     03
More than 1½ hours, up to 2 hours    04   ASK A2
More than 2 hours, up to 2½ hours    05
More than 2½ hours, up to 3 hours    06
More than 3 hours                    07
(Don’t know)                         88

A2 TvPol
STILL CARD 1 And again on an average weekday, how much of
your time watching television is spent watching news or
programmes about politics and current affairs? Still use
this card. (Answered with the same codes, 00–07 and 88.)
Method B:

On an average weekday, how much time, in total, do you spend watching
television?

    ___ HOURS   AND ___ MINUTES

On an average weekday, how much time, in total, do you spend listening to
the radio?

    ___ HOURS   AND ___ MINUTES

On an average weekday, how much time, in total, do you spend reading the
newspapers?

    ___ HOURS   AND ___ MINUTES

Please indicate to what extent you agree or disagree with each of the
following statements.
6. Design characteristics for method A
7. Design characteristics for method B
8. Reliability and common method variance estimates
from an MTMM experiment
For the TV watching example (standard errors in parentheses):

                                             Reliability    Method variance
Method A: 8 categories                       0.796 (0.01)   0.098 (0.01)
Method B: write-in (after outlier deletion)  0.819 (0.02)   0.140 (0.01)
9. Goal of this research:
obtain a rough estimate of the reliability and method effects of
these kinds of questions without needing a
multitrait-multimethod experiment.
10. ESS multitrait-multimethod experiments
• In the European Social Survey (ESS), about six MTMM
experiments are run every round;
• Each experiment usually estimates the quality of 9
questions (method–trait combinations).
• The range of topics is reasonably diverse, though factual
questions are underrepresented.
• In total, about 1051 method–trait combinations are available,
or 3483 “questions” when the analysis is done separately per
country.
11. “Old” multitrait-multimethod experiments
• In addition to the ESS, an older series of experiments also
exists (Andrews; Költringer; Saris; Scherpenzeel, 1990s);
• These add another 1089 questions for which reliability and
common method coefficients are estimated.
• Combining the two datasets (ESS and the older experiments’
question qualities), we created a database of 3483
questions with their reliability and common method
estimates (1051 unique method–trait combinations).
12. Reliability and common method variance estimates of
3483 questions (1051 unique method-trait comb’s)
[Histograms of the estimates: reliability coefficients range from about
0.4 to 1.0; common method variance ranges from about 0.0 to 0.8.]
13. Design characteristics of questions
• Social desirability
• Centrality
• Reference period
• Question formulation
• WH word used
• Use of gradation
• Balance of the request
• Encouragement
• Showcards present
• Showcards have pictures
• Emphasis on subjective opinion in request
• Information about the opinion of other people
• Use of stimulus or statement in the question
• Absolute or comparative judgment
• Response scale: basic choice
• Number of categories
• Labels full, partial, or no
• Labels full sentences
• Knowledge provided
• Survey mode
• Order of the labels
• Correspondence between labels and numbers of the scale
• Theoretical range of the scale
• Neutral category
• Number of fixed reference points
• Don’t know option
• Interviewer instruction
• Respondent instruction
• Extra motivation, info or definition available?
• Agree–disagree scale
• . . .
(Saris & Gallhofer 2007)
15. Coding design characteristics of the 3483 questions
• For each of the 3483 questions in all countries, a team of
coders coded 40 design characteristics of the question;
• Some codes were generated automatically by natural
language processing software (syllables, words, etc.);
• Coders were students, assistants to the local coordinators
of the ESS, and two experts;
• For the English source version, the experts double-coded
questions independently, then created consensus codes;
• Non-expert codes were quality-controlled by detailed
comparison with the consensus codes for the English source;
• In a meeting between the experts and each coder,
discrepancies were discussed and either corrected or
left in as true differences.
16. Domain of question        # questions   # unique comb’s
    International politics             64                64
    Health                            244                88
    Living conditions                 433               234
    Other beliefs                     292               292
    Work                              509               119
    Family                            109                 7
    Personal relations                308                62
    Consumer behavior                  26                26
    Leisure activities                131                63
    National government               169                31
    Institutions                      412                35
    Political parties                  30                30
    Trade unions                       12                12
    Economy                           266                16
    Other                             370                89
17. Concept of question       # questions   # unique comb’s
    Evaluative belief                 703               136
    Feeling                          1044               161
    Importance                         96                96
    Expectation                        39                30
    Facts, behavior                   111                60
    Complex concept                    44                44
    Other simple                     1338               608
18. Meta-analysis dataset
• For each of the 3483 questions, we have in the database:
• The estimated quality (reliability and common method
coefficients)
• About 50 design characteristics (through hand- and
automatic coding)
• Predict reliability and method effect estimates from design
characteristics
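As a sketch of this prediction step, the code below fits a random forest to synthetic stand-in data; the feature names and numbers are invented for illustration, and scikit-learn's `RandomForestRegressor` stands in for the forest actually used in the meta-analysis:

```python
# Sketch: predict logit-transformed reliability from coded design
# characteristics. All data and features here are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Synthetic design characteristics (the real dataset has ~50 codes)
X = np.column_stack([
    rng.integers(2, 12, n),   # number of categories
    rng.integers(0, 2, n),    # don't-know option present?
    rng.integers(0, 2, n),    # showcard used?
])
# Synthetic reliability coefficients in (0, 1)
r = 1 / (1 + np.exp(-(0.5 + 0.1 * X[:, 0] + rng.normal(0, 0.5, n))))

# Model the logit of the reliability, so predictions stay in (0, 1)
y = np.log(r / (1 - r))

forest = RandomForestRegressor(n_estimators=200, oob_score=True,
                               random_state=0)
forest.fit(X, y)
print(f"out-of-bag R^2: {forest.oob_score_:.2f}")

# Predict reliability for a new question: 7 categories, DK offered, showcard
logit_pred = forest.predict([[7, 1, 1]])[0]
print(f"predicted reliability: {1 / (1 + np.exp(-logit_pred)):.2f}")
```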
19. Meta-analysis using random forests of regression trees
20. Example of one (1) regression tree (CART).
21. |
domain=3,4,7,11,13,14,112
domain=3
gradation>=0.5 position< 339.5
position>=410
concept=1,2 position< 404.5
concept=1,73,78
position< 322.5
ncategories>=4.5
domain=6,101,103,120
domain=4,7,11,13,14,112
gradation< 0.5 position>=339.5
position< 410
concept=73,75,76 position>=404.5
concept=2,76
position>=322.5
ncategories< 4.5
1.955
n=1988
1.724
n=1303
0.9636
n=108
0.4959
n=36
1.198
n=72
1.793
n=1195
1.642
n=722
2.023
n=473
1.544
n=108
1.28
n=76
2.17
n=32
2.165
n=365
1.97
n=217
2.45
n=148
2.394
n=685
1.489
n=138
2.622
n=547
2.384
n=233
2.799
n=314
2.681
n=260
3.364
n=54
Example regression tree for reliability coefficient
22. Obtaining a prediction from a random forest
23. Training the forest of regression trees
24. Random forest algorithm (Breiman 2001)
1 Randomly sample (with replacement) cases (questions)
→ some cases are “in-bag”;
→ the rest are “out-of-bag”.
2 Randomly sample features (characteristics);
3 Grow a regression tree on the in-bag cases using the
subset of selected features;
4 Calculate the mean squared prediction error on the out-of-bag
cases → built-in cross-validation;
5 Go back to (1) until 1500 trees have been grown.
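The five steps can be sketched as follows, on synthetic data; scikit-learn's regression tree stands in for the tree grower, and features are sampled once per tree (matching the step list on this slide) rather than once per split:

```python
# Sketch of Breiman's (2001) random-forest loop with out-of-bag error.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                 # 8 "design characteristics"
y = X[:, 0] - 0.5 * X[:, 3] + rng.normal(0, 0.3, 300)

n, p = X.shape
n_trees, mtry = 100, 3                        # 1500 trees in the real model
trees, oob_sse, oob_n = [], 0.0, 0

for _ in range(n_trees):
    # 1. Bootstrap cases: in-bag sample; the rest is out-of-bag
    in_bag = rng.integers(0, n, n)
    oob = np.setdiff1d(np.arange(n), in_bag)
    # 2. Sample a subset of the features
    feats = rng.choice(p, size=mtry, replace=False)
    # 3. Grow a regression tree on the in-bag cases / selected features
    tree = DecisionTreeRegressor().fit(X[in_bag][:, feats], y[in_bag])
    trees.append((tree, feats))
    # 4. Accumulate squared prediction error on the out-of-bag cases
    pred = tree.predict(X[oob][:, feats])
    oob_sse += np.sum((y[oob] - pred) ** 2)
    oob_n += len(oob)
# 5. (The loop repeats until all trees are grown.)

print(f"OOB mean squared error over {n_trees} trees: {oob_sse / oob_n:.3f}")

# A forest prediction is the average prediction over all trees
x_new = X[:1]
forest_pred = np.mean([t.predict(x_new[:, f])[0] for t, f in trees])
```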
25. joined.data
[Missing-data pattern plot for joined.data, variables ordered by number of
missing items: avgabs_intro, nabst_intro, nnouns_intro, showc_boxes,
showc_quest, showc_start, opinionother, showc_horiz, showc_letter,
showc_over, position, ncategories, range.correspondence, scal_neutral,
absolute, concept, domain, dont_know, form_basic, future, instr_interv,
instr_respon, motivation, past, scale_basic, stimulus, usedshowcard,
interviewer, visual, fixrefpoints, labels, scale_corres, scale_trange,
labels_gramm, computer.assisted, intr_request, avgabs_total, avgsy_total,
avgwrd_intro, avgwrd_total, balance, centrality, country, encourage,
gradation, intropresent, knowledge, labels_order, language, nabst_total,
nnouns_total, nsub_quest, nsyll_total, numsub_intro, nwords_intro,
nwords_total, questiontype, rel.est, rel.est.orig, repetition,
scale_urange, socdesir, study, subjectiveop, symmetry, used_WH_word,
val.est, val.est.orig]
26. Missing data
• Imputed using multiple imputation with chained equations
(MICE) (van Buuren & Groothuis-Oudshoorn 2011).
• A small random forest (bush?) is grown for each imputed
dataset
• These bushes are joined to form the final forest
• “raimforest” (RAndom IMputation forest)
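A minimal sketch of the idea, with scikit-learn's `IterativeImputer` standing in for MICE, synthetic data, and an illustrative function name (`raimforest_predict` is not the actual implementation):

```python
# Sketch of the "raimforest" idea: impute the dataset m times, grow a
# small forest ("bush") on each completed dataset, pool all trees.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + rng.normal(0, 0.2, 200)
X[rng.random(X.shape) < 0.1] = np.nan        # ~10% missing at random

m, bushes = 5, []
for i in range(m):
    # One stochastic, chained-equations-style imputation per iteration
    X_imp = IterativeImputer(sample_posterior=True,
                             random_state=i).fit_transform(X)
    bushes.append(RandomForestRegressor(n_estimators=50,
                                        random_state=i).fit(X_imp, y))

def raimforest_predict(x_new):
    # Join the bushes: average predictions over all m * 50 trees
    return np.mean([b.predict(x_new) for b in bushes], axis=0)

print(raimforest_predict(np.zeros((1, 5))))
```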
27. Out-of-bag cross-validation
• R2 for logit(reliability coef.) prediction: 0.64
• R2 for logit(validity coef.) prediction: 0.84
30. Conditional variable importance
Strobl et al., BMC Bioinformatics 2008, 9:307.
33. Back to the example
36. Evaluation and future work
37. Some of the challenges encountered
• (Coding process)
• (MTMM estimation process)
• Mapping the old coding system into the new...
• Missing data; non-applicable characteristics (→ raimforest)
• Multicollinearity, aliasing of interactions (→ RF)
• Propagating the uncertainty about the MTMM estimates
(not solved)
• Coming up with a reasonable prediction interval (→ RF)
• Dealing with main vs. supplementary questionnaire (estimate
an order effect and assume the coded question behaves like
one in the main questionnaire)
• ...
38. Things that still need doing in my opinion
• Publish journal articles to let the world know about this
meta-analysis!
• Model was cross-validated on ESS R1–3. Also validate
predictions against new ESS MTMM results.
• Evaluate practical effect on estimates of interest before &
after correction for reliability and method effect: how
different are the estimates in practice?
• Seriously reduce the number of codes
• Planned experiments to reduce multicollinearity, aliasing in
the meta-analysis
• Propagate uncertainty about the MTMM estimates
• Perhaps compare w/ linear regression/other meta-analysis
techniques
• ...
39. Thank you for your attention!
doberski@uvt.nl