A solutions-based approach, illustrated by case studies showing how inferences can be improved from surveys administered to biased, low-response-rate and non-probability samples.
It addresses how to improve the accuracy of the survey estimates we generate from poorer-quality and non-probability samples.
2. www.srcentre.com.au
Presenters
Social Research Centre Pty Ltd
Darren W. Pennay
Dina Neiger, PhD
Paul J. Lavrakas, PhD
Darren Pennay is an Adjunct Senior Research Fellow with the ANU Centre for Social Research and Methods and
an Adjunct Professor with the Institute for Social Science Research at the University of Queensland
Dina Neiger is a Centre Visitor with the ANU Centre for Social Research and Methods
Paul Lavrakas is a Senior Fellow, NORC at the University of Chicago; Senior Research Fellow, Office of Survey
Research, Michigan State University; and Senior Methodological Advisor for the ANU’s Social Research Centre
The Social Research Centre
➢ Based in Melbourne, Australia
o Owned by the Australian National University in Canberra
o Operates as a for-profit research services company; profits are returned to ANU
o Co-founder of the ANU Centre for Social Research and Methods
➢ Conduct social and public policy research for government, not-for-profit and academic
organisations
➢ Primary data collection involving survey research, qualitative research, statistical
methods, data processing and analytics, survey methodology
➢ 65 staff, plus 120-seat CATI call centre
What is Total Survey Quality?
➢ Many national statistical offices, including those of Australia, Canada, New Zealand and the
USA, operate within a Total Survey Quality (TSQ) framework to determine whether a
survey is “fit-for-purpose”.
➢ Accuracy is not enough.
Dimensions of the TSQ Framework
Accuracy: Total survey error is minimised
Credibility: Data are considered trustworthy by the survey user communities
Comparability: Consistent with past studies in terms of demographic, spatial and temporal comparisons
Usability / Interpretability: Documentation is clear and metadata are well organised
Relevance: Data satisfy user needs
Accessibility: Access to the data is user friendly
Timeliness / Punctuality: Data deliverables adhere to schedules
Completeness: Data are rich enough to satisfy the analysis objectives without undue burden on respondents
Coherence: Estimates from different sources can be reliably combined
Source: Biemer, P. (2010) Public Opinion Quarterly 74 (5): p.819.
Total Survey Error Framework
[Diagram: Errors of representation - coverage error (target population → sampling frame), sampling error (sampling frame → designated sample), nonresponse error (designated sample → final sample), and adjustment error. Errors of measurement - specification error, measurement error, and processing error (specification → measurement → response → final dataset). Both streams contribute to inferential error in the final results and conclusions.]
What is a probability sample?
Textbook Definition: A probability sample is one in which every unit in the
population of interest has a known, non-zero probability of being selected for
the sample.
Key elements that differentiate the process from a non-probability
sampling process:
o Selection into the sample is via a random process.
o Every unit in the population of interest has a chance of being sampled for the research.
o The probability of selection is known.
These design features enable us to:
o Calculate standard errors
o Calculate confidence intervals
o Generalise to the target population of interest
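These design features make standard inference straightforward; as a minimal sketch for a proportion estimated from a simple random sample (function names are illustrative):

```python
import math

def proportion_se(p: float, n: int) -> float:
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)

def confidence_interval(p: float, n: int, z: float = 1.96):
    """Approximate 95% confidence interval for a population proportion."""
    se = proportion_se(p, n)
    return (p - z * se, p + z * se)

# e.g. 52% agreement observed in a probability sample of n = 1,000
low, high = confidence_interval(0.52, 1000)  # ≈ (0.489, 0.551)
```

With a non-probability sample the selection probabilities are unknown, so these formulas have no defensible basis.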
The general decline in response rates is evident
across nearly all types of surveys, in the United
States and abroad. At the same time, greater effort
and expense are required to achieve even the
diminished response rates of today.
These challenges have led many to question
whether surveys are still providing accurate and
unbiased information [and whether probabilistic
surveys are better than the alternatives].
Pew Research Center, 2012
Trends in response rates (U.S. data)
There is no sign of an increase in
nonresponse bias since 2012.
On 13 demographic, lifestyle, and health
questions that were compared with
benchmarks from high response rate federal
surveys, estimates from phone polls are just
as accurate, on average, in 2016 as they
were in 2012. The average (absolute)
difference between the Center telephone
estimates and the benchmark survey
estimates was 2.7 percentage points in
2016, compared with 2.8 points in 2012.
Impact of low response rates on survey estimates
Pew conclusions 2012
➢ Despite the growing difficulties in obtaining a high level of participation in most surveys,
well-designed telephone polls that include landlines and cell phones reach a cross-
section of adults that mirrors the American public, both demographically and in many
social behaviours.
Pew conclusions 2016
➢ Telephone polls still provide accurate data on a wide range of social, demographic and
political variables, but some weaknesses persist.
What is the situation in Australia?
A literature review and industry consultation commissioned by RICA and undertaken by
Bednall et al. (2013) concluded the following:
➢ Telephone response rates:
o As far as the telephone is concerned, response rates have been in gradual decline over
the last decade.
o Among cold-calling general community surveys, telephone response rates are typically below
10%.
o Co-operation rates (the ratio of obtained interviews to refusals) are typically below 0.2 (that is,
below one interview for every five refusals).
o Telephone interviews from client lists have a higher response rate – typically above 20% with
co-operation rates above 1.0.
o It would appear that some topics, such as financial services, may induce a lower level of co-
operation.
o Government sponsored surveys have higher response rates, at times over 50%, but even here
a sharp decline in response rates over time for one long running monitor was observed. Co-
operation rates were also higher in government sponsored surveys.
What is the situation in Australia – SRC Landline surveys
[Chart: non-contacts as a percentage of usable sample, 2011-2015, for five SRC landline samples (three national, two Victorian).]
What is the situation in Australia – SRC Landline surveys
[Chart: refusals as a percentage of contacts, 2011-2015, for five SRC landline samples (three national, two Victorian).]
What is the situation in Australia – SRC Landline surveys
[Chart: interviews as a percentage of interviews plus refusals, 2011-2015, for five SRC landline samples (three national, two Victorian).]
What is the situation in Australia (cont.)?
! We are not saying that low response rates are inconsequential!
! Poorly designed and poorly executed surveys, especially those with very
limited call routines (e.g. polls/trackers enumerated over one or two
evenings only), those with non-coverage errors (e.g. landline telephone
surveys), and those that use non-probability sampling methods (e.g.
online panels) are much more likely to produce biased results.
! HOWEVER, well designed and well executed probability-based surveys
still produce estimates that can be relied on in many situations (and with
known standard errors and confidence intervals!)
A Word of Warning on Probability Sampling
Rivers again ….
[Probability surveys work] if the nonresponse rate is small. Cochran
(1977, p. 363), in his classic text, concluded that the upper limit is
approximately 10 percent nonresponse, which is difficult to achieve today in
even the best funded surveys.
Lohr (2010, p. 355) warns that “many examples exist of surveys with a 70%
response rate whose results are flawed.”
(Douglas Rivers, comment on the 2013 AAPOR Task Force Report on Non-probability
Sampling. J Surv Stat Methodol 2013; 1(2): 111-117. doi: 10.1093/jssam/smt009)
The take-up of non-probability online panels
The situation in Australia
➢ In 2014-15, 86% of Australian households had access to the internet at home (ABS).
➢ Online research is continuing to grow in popularity domestically and internationally.
[Chart: online and CATI shares of Australian research industry turnover (per cent), 2009-2015. Online research accounted for 28% of research industry turnover globally in 2014. Source: ESOMAR / Research Industry Council of Australia.]
➢ 50+ commercial ‘online research panels’.
➢ The first of these was established in the late 1990s.
➢ To date, all have used non-probability sampling methods.
The rise of non-probability panels - globally
[Chart: online research revenues (US$M), 1999-2016E, for the US, Europe and the rest of the world.]
Pros and cons of non-probability panels
Pros
➢ Reduced cost
➢ Improved timeliness
➢ Respondent convenience
➢ Reduced social desirability bias
➢ Can target ‘hard to reach’ populations
➢ Multimedia functionality
➢ Computerised questionnaire scripts
Cons
➢ Non-coverage
➢ Self-selection
➢ Reliance on computer-literate respondents
➢ Non-probability sampling, which undermines:
o Calculating standard errors
o Calculating confidence intervals
o Generalising to the target population of interest
How do they compare – the SRC Online Panels Benchmarking
Survey
➢ Three surveys based on probability samples of the Australian population aged 18 years
and over and five surveys of persons aged 18 years and over administered to members
of non-probability online panels.
➢ Survey questionnaire included a range of demographic questions and questions about
health, wellbeing and use of technology for which high-quality population benchmarks
were available.
➢ Same questions were used across the eight surveys.
➢ Average interview length of 9 minutes for the online and telephone versions.
o 12-page booklet for the hard-copy version.
➢ Fieldwork Oct – Dec 2015.
➢ Data and documentation available from the Australian Data Archive.
https://www.ada.edu.au/ada/01329
Results – substantive health characteristics
Distance from benchmark (percentage-point error)

Substantive variable (benchmark %)                ABS  ANU Poll   RDD |    P1     P2     P3     P4     P5
Life satisfaction: 8 out of 10 (32.6)            -2.0      -2.0   1.9 | -11.9  -11.6   -4.5   -9.2   -7.9
Psychological distress: Kessler 6 Low (82.2)    -10.6     -11.6  -8.1 | -25.9  -23.5  -22.2  -25.0  -23.2
General health status (SF1): Very good (36.2)     0.4      -2.0  -2.6 |  -4.1   -5.8   -5.3   -5.0    1.5
Private health insurance (57.1)                   3.4       1.9   3.3 |  -8.9  -12.5   -3.7   -0.6   -2.6
Daily smoker (13.5)                              -4.1       3.5   1.6 |   9.8    6.7    3.9    2.7    4.3
Consumed alcohol in last 12 months (81.9)         3.6      -2.8  -4.0 |   2.4    5.3    3.9    4.2    1.5

(ABS, ANU Poll and RDD are the probability samples; P1-P5 are the non-probability online panels.)
Results – substantive health characteristics
Average error across six substantive measures

                                              ABS  ANU Poll   RDD |    P1     P2     P3     P4     P5
Avge. absolute error                         4.02      3.98  3.58 |  10.5   10.9   7.24   7.78   6.83
Largest absolute error                      10.59     11.57  8.08 | 25.86  23.52  22.20  24.96  23.20
No. of significant differences
from benchmarks (out of 6)                      2         3     2 |     4      6      3      3      3

(ABS, ANU Poll and RDD are the probability surveys; P1-P5 are the non-probability panels.)
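The summary rows are simple to reproduce; a sketch that recomputes the average and largest absolute percentage-point error for panel P1, with the six errors hard-coded from the preceding results:

```python
# Percentage-point errors vs. benchmarks for panel P1 (six substantive measures)
p1_errors = [-11.9, -25.9, -4.1, -8.9, 9.8, 2.4]

avg_abs_error = sum(abs(e) for e in p1_errors) / len(p1_errors)  # 10.5
max_abs_error = max(abs(e) for e in p1_errors)                   # 25.9
```

(The summary slide reports the largest P1 error to two decimals, 25.86; the rounded inputs above give 25.9.)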
Two questions …
1. How do we reduce non-response bias in low response rate
probability surveys?
2. How do we reduce bias in survey estimates based on non-
probability online panels?
Instructors
Dina Neiger, PhD
➢ Dina is a professional statistician with over 20 years’ experience and a track record of
achievement in leadership and technical roles.
➢ Social Research Centre, Monash University, Australian Bureau of Statistics, Peter
MacCallum Cancer Centre.
➢ 1st Class Honours degree in Statistics and PhD in Business Systems from Monash
University with an emphasis in applied Operations Research and Process Engineering.
➢ Accredited Statistician (AStat) and member of the Statistical Society of Australia (SSA).
➢ Full member of the Australian Society for Operations Research (ASOR).
➢ Calibration and blending methods to improve accuracy of the non-probability samples,
establishment and maintenance of the first Australian Online Probability panel and
complex business survey design and weighting.
Instructors
Paul J. Lavrakas, PhD
➢ Research psychologist, research methodologist, and prolific author.
➢ Since 2007, Independent Consultant with clients in the USA, Australia, Belgium, Japan
and Canada; and Senior Fellow at NORC (U-Chicago) and Office of Survey Research
(Michigan State U.) .
➢ From 2000-2007, Chief Research Methodologist for Nielsen Media Research.
➢ 1978-2000 - Professor at Northwestern U. and Ohio State U. and founding faculty
director of a survey research center at each university.
➢ Australian roles - Senior Methodological Adviser, the Social Research Centre. Member
of the Scientific Advisory Board for the Centre for Social Research and Methods, ANU.
➢ President of the American Association for Public Opinion Research (2011-2014) and
continues to serve on a volunteer basis on many AAPOR task forces and committees.
Investigating Nonresponse Bias
➢ Methods have evolved in the past decade regarding how NR Bias can be investigated
(Groves and Brick, 2007; Olson and Montaquila, 2012)
➢ One approach is to conduct analyses comparing the difference between early
responders and middle responders vs. late responders on key measures in a
survey with the hypothesis/premise being that late responders are more like
nonrespondents than are early and middle responders
➢ However, if no differences are found between late responders vs. early and middle
responders, that is NOT taken as evidence that there are no differences between
respondents and nonrespondents.
Case Study 1: University Faculty Health Benefits Survey
Study Background
➢ The purpose of this survey was to provide reliable and valid information about the
opinions of faculty towards the prospect of the university creating a new health care
service for faculty working on the main campus.
➢ Approximately 800 of the 5,000 members of the faculty were randomly sampled, and a
telephone survey was conducted by the university’s survey research center, which
yielded a 47% completion rate.
➢ Nonresponse bias was investigated using multiple methods, with one being a Level of
Effort analysis.
Level of Effort Analyses
➢ Level of Effort was operationalized as the number of calls made to a given sampled
respondent who completed the interview
➢ Analyses were conducted to investigate whether the effort it took to complete an
interview was related to any of the substantive questions that were asked in the
questionnaire.
➢ From these analyses it was learned that none of the key questionnaire items had a
statistically significant (p < .05) association with this effort measure
➢ For one of the questionnaire items – asking whether the respondent had heard about
the possible new health service prior to being interviewed – there was a marginally
significant correlation (p < .07) with effort, but the size of this correlation was extremely
small (r = 0.09).
➢ Thus, it was judged that there was no reliable evidence that any nonignorable
nonresponse bias was present in the findings reported in the main body of the report.
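A Level of Effort analysis of this kind reduces to correlating calls-to-completion with item responses; a minimal sketch with made-up data (the `calls` and `aware` lists are illustrative, not the study's data):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# calls needed to complete each interview, and a 0/1 item
# (e.g. prior awareness of the proposed health service)
calls = [1, 1, 2, 3, 4, 6, 8, 9]
aware = [1, 1, 1, 0, 1, 0, 0, 0]
r = pearson_r(calls, aware)
# if |r| is small and non-significant, effort is unrelated to the item,
# which weakens the case for nonresponse bias on that measure
```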
Case Study 2: Voter Identification Survey
Study Background
➢ In 2012, there were several states in the USA whose state governments were controlled
by conservative party members who passed legislation changing the requirements for
the identification that a voter must show to prove that he or she was eligible to vote on
the day of the Presidential Election.
➢ The purpose of the survey, which was funded by large labor unions that were known to
support liberal candidates/policies, was to identify how many and which types of
residents in the state were most likely to be disenfranchised by the new Voter ID
legislation.
Level of Effort Analysis 1
➢ One approach was to compare key data provided by those who initially refused, but
later agreed to complete the questionnaire after being recontacted with a “refusal
conversion” protocol, with data provided by the cohort that never refused.
➢ In this analysis it was found that for neither of the two key statistics measured by
the survey – awareness of the new Voter ID legislation; whether someone had a valid
photo ID to vote at their local polling place on November 6, 2012 – were there
statistically significant differences (p < .05) between those who never refused and those
who initially refused but later were converted.
➢ Thus, there was no evidence found to suggest that refusing nonrespondents who were
not converted would have given materially different answers to these key questions than
did the respondents.
Level of Effort Analysis 2
➢ A second approach was to take into account the number of call attempts it took to
complete an interview, and investigate if answers to the same two key variables
correlated with the level of effort expended.
➢ In this analysis it was found that as more effort was expended to reach a respondent,
the likelihood the respondent had a valid photo ID for voting purposes increased at a
statistically significant level (p < .007).
o Thus, the findings suggested that the unweighted survey data were somewhat biased in the
direction of underestimating the proportion of people with a valid photo ID
Level of Effort Analysis 2 (cont.)
➢ However, since the level of effort to achieve a completion correlated with various
demographic characteristics of respondents (e.g., it took less effort to reach elderly
respondents and more effort to reach young adults) and given that key demographic
characteristics were taken into account in the weighting process, it was judged to be
unlikely that the findings reported about the proportion of registered voters without a
valid photo ID were subject to nonignorable nonresponse bias.
➢ Concerning awareness of the legislation, there was no statistically significant difference
associated with the level of effort expended to gain a completion.
o Thus, there was no evidence found to suggest that uncontacted nonrespondents would
have given materially different answers to this key question than did the respondents.
Case Study: IMLS Nonresponse Follow-Up (NRFU) Study
➢ Another preferred method to investigate Nonresponse Bias is to conduct a follow-up
survey of a sample of the original survey’s nonresponders.
➢ Little has been reported regarding how to use the data from such follow-up studies
➢ Please note that my slides today are extracted from a paper presentation that I made in
2015 at the 70th annual conference of the American Association for Public Opinion
Research (AAPOR), “Studying Nonresponse Bias with a Follow-up Survey of Initial
Nonresponders in a Dual Frame RDD Survey”
2013 Public Needs for Library and Museum Services (PNLMS)
Survey
➢ Sponsored by the Institute of Museum and Library Services (IMLS), a federal agency in
the USA
➢ Conducted by M. Davis and Company, Inc., Philadelphia PA
➢ National dual-frame RDD survey of the general population of the USA
o Data gathered September-November 2013
o 3,537 interviews completed; 2,506 from LL frame; 1,031 from CP frame
• Landline sample AAPOR RR3 was 25%
• Cell Phone sample AAPOR RR3 was 10%
2014 IMLS Nonresponse Follow-up (NRFU) Survey
➢ Created shortened questionnaire with key variables
➢ Used a noncontingent and a contingent incentive protocol
➢ Used only the best-calibre interviewers
➢ 201 interviews were completed in January-February 2014 with a random sample of
nonresponders to the main survey; 100 from LL frame, 101 from CP frame
o Landline follow-up sample AAPOR RR3 was 32%
o Cell Phone follow-up sample AAPOR RR3 was 16%
➢ The original sample and the follow-up samples had remarkably similar demographic
characteristics, except that the follow-up sample had proportionally more
young adults aged 18-24 years and fewer older adults aged 65+ years (p < .03) than the
original survey.
Analytic Approach: Phase 1
➢ Combining the two surveys via weighting, in two phases:
o Phase 1: Begin by assigning weights, separately, to each of four groups
• Landline RDD sample (original study)
• Landline NR follow-up sample
• Cell phone RDD sample (original study)
• Cell phone NR follow-up sample
Analytic Approach: Phase 1
➢ For each of the four groups, Phase 1 included the following seven steps, conducted for
each US Census region within each sample type:
1. Probability of Selection and Design Weight
2. Nonresponse Follow-up Adjustment
3. Unknown Eligibility Adjustment
4. Removal of Known Ineligible
5. Unit Nonresponse Adjustment
6. Adult Subsampling Adjustment
7. Multiple Phone Line Adjustment
➢ Steps 1 and 2 created the weights to start the weighting process
➢ Steps 3-7 were sequential adjustments to the starting weights
1. Probability of Selection and Design Weight
For each of the four samples:
➢ Using information from the sampling frame, the probability of selection was calculated
as the number of released telephone numbers in each Census region divided by
the total number of telephone numbers in the census region
➢ The design weight was the inverse of the probability of selection for the released
telephone numbers and zero for the non-released telephone numbers
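Under these definitions, step 1 is a one-line computation per Census region; a sketch with illustrative counts:

```python
def design_weight(released: int, total: int) -> float:
    """Inverse probability of selection for a released telephone number.

    Non-released numbers receive a design weight of zero.
    """
    prob_selection = released / total
    return 1.0 / prob_selection

# e.g. 5,000 numbers released out of 200,000 in a Census region
w = design_weight(5_000, 200_000)  # ≈ 40.0
```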
2. Nonresponse Follow-up Adjustment
For each of the two NRFU samples:
➢ For the telephone numbers that were eligible for the nonresponse follow-up study, a
further adjustment to the design weight was required to account for the subsampling of
telephone numbers that were called from among all the eligible nonresponse follow-up
numbers
➢ This adjustment to the design weight was calculated as the number of follow-up
eligible telephone numbers divided by the number of follow-up eligible called
telephone numbers
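The subsampling adjustment is a simple ratio applied on top of the design weight; a sketch with illustrative counts:

```python
def nrfu_subsampling_factor(eligible: int, called: int) -> float:
    """Adjustment for subsampling which follow-up-eligible numbers were called."""
    return eligible / called

# e.g. 2,000 follow-up-eligible numbers, of which 400 were subsampled and
# called, applied to a design weight of 40.0
adjusted = 40.0 * nrfu_subsampling_factor(2_000, 400)  # 200.0
```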
3. Unknown Eligibility Adjustment
As in any DFRDD survey, a considerable portion of the sampled numbers
ended the original field period with their eligibility unresolved
For each of the four samples:
➢ Using standard AAPOR final disposition codes, two groups of telephone numbers were
created in each Census region: (1) unknown eligibility telephone numbers and (2)
known eligibility telephone numbers
➢ Unknown eligibility adjustment factor was the sum of the follow-up adjusted design
weights for all telephone numbers divided by the sum of the follow-up adjusted
design weights for the known eligibility telephone numbers
➢ The unknown eligibility adjusted weight is the product of the unknown eligibility
adjustment factor and the follow-up adjusted design weight for the known eligibility
telephone numbers and zero (0.0) for the unknown eligibility telephone numbers
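The adjustment shifts weight from unresolved numbers onto the resolved ones while preserving the weight total; a sketch under the definitions above (the record layout is illustrative):

```python
def unknown_eligibility_adjust(numbers):
    """Apply the unknown-eligibility adjustment in place.

    Each record carries a follow-up adjusted design weight `w` and a flag
    `known` for resolved eligibility; the adjusted weight `w_adj` is zero
    for unresolved numbers.
    """
    total = sum(rec["w"] for rec in numbers)
    known = sum(rec["w"] for rec in numbers if rec["known"])
    factor = total / known
    for rec in numbers:
        rec["w_adj"] = rec["w"] * factor if rec["known"] else 0.0
    return numbers

sample = [{"w": 40.0, "known": True},
          {"w": 40.0, "known": True},
          {"w": 40.0, "known": False}]
unknown_eligibility_adjust(sample)
# factor = 120 / 80 = 1.5, so each resolved number now carries 60.0
# and the total weight of 120.0 is preserved
```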
4. Removal of Known Ineligible
As with all telephone surveys of the general public, many of the telephone
numbers that were released were found to be ineligible for various reasons,
including various non-working numbers (e.g., disconnected or temporarily
out of service, technical problems, etc.), and numbers that were not part of
the target population, e.g., fax, business, or other nonresidential numbers.
For each of the four samples:
➢ Using AAPOR final disposition codes, these known ineligible telephone numbers
were identified and removed from the weighting process
5. Unit Nonresponse Adjustment
As with all telephone surveys of the general public, for many of the
telephone numbers for which contact was made, it was determined that
there was at least one eligible adult but no data were collected
For each of the four samples:
➢ Using the final disposition codes for the eligible telephone numbers produced two groups
of eligible telephone numbers in each Census region, (1) eligible responding
telephone numbers and (2) eligible non-responding telephone numbers
➢ The unit nonresponse adjustment shifts the weights from the eligible nonrespondents to
the eligible respondents
➢ The nonresponse adjustment factor was the sum of the unknown eligibility adjusted
weights for all eligible telephone numbers divided by the sum of the unknown
eligibility adjusted weights for the responding telephone numbers
➢ The nonresponse adjusted weight was the product of the nonresponse adjustment factor
and the unknown eligibility adjusted weight for respondents, and zero (0.0) for eligible
nonrespondents
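This step has the same weight-shifting structure as the unknown-eligibility adjustment; a compact sketch (names are illustrative):

```python
def nonresponse_adjust(weights, responded):
    """Shift weight from eligible nonrespondents to eligible respondents.

    `weights` are unknown-eligibility adjusted weights for eligible numbers;
    `responded` is a parallel list of booleans.
    """
    total = sum(weights)
    responding = sum(w for w, r in zip(weights, responded) if r)
    factor = total / responding
    return [w * factor if r else 0.0 for w, r in zip(weights, responded)]

adj = nonresponse_adjust([60.0, 60.0, 60.0, 60.0], [True, False, True, False])
# factor = 240 / 120 = 2.0: respondents carry 120.0 each, total preserved
```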
6. Landline Adult Subsampling Adjustment
Within eligible landline households there may have been more than one
eligible adult in residence; this was determined in the questionnaire
For each of the four samples:
➢ When there were two or more eligible adults associated with a landline telephone
number, the adult subsampling adjustment factor was capped at two (2.0)
➢ The adult subsampling adjusted weight was the product of the adult subsampling
adjustment factor and the nonresponse adjusted weight.
7. Multiple Phone Line Adjustment
Most adults can be sampled on more than one phone line.
For each of the four samples:
➢ A multiplicity adjustment was implemented when there were two or more telephone
numbers, either landline or cell phone, at which the responding adult could have been
contacted; this multiplicity adjustment factor was capped at two (2.0), otherwise,
the multiplicity adjustment factor was one (1.0)
This completed Phase 1 of the separate weighting of each of the four
groups
Phase 2 Combining Samples: Composite Weights
For each of the four samples:
➢ The composite weighting adjusted the multiplicity adjusted weight from the landline
respondents and cell phone respondents to account for overlap in the two samples
➢ If the person could have been reached by both frames the composite weight was 0.5;
if the person could only be reached by one frame it was 1.0
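The compositing rule is a two-case factor applied to the multiplicity adjusted weight; a sketch (the flag names are illustrative):

```python
def composite_factor(has_landline: bool, has_cell: bool) -> float:
    """0.5 for dual-frame (overlap) respondents, 1.0 for single-frame."""
    return 0.5 if (has_landline and has_cell) else 1.0

# a dual-user with a multiplicity adjusted weight of 80.0 composites to 40.0
w_composite = 80.0 * composite_factor(True, True)  # 40.0
```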
Phase 2 Combining Samples: Calibration
For each of the four samples:
➢ The final adjustment forced the weight totals from the survey data using the composite
weight to match external population control totals; the external control totals were
based on the following characteristics:
o An extrapolation of Blumberg and Luke’s 2013 NHIS findings for telephone service
ownership so that 50% were both landline and cell phone, 41% cell phone only, and 9%
landline only
o Socio-demographic characteristics from the most recent American Community Survey
• Gender (Male, Female)
• Age group (18-44, 45-64, 65+)
• Marital status (Never Married, Married, Separated/Divorced/Widowed)
• Hispanicity/Race (Hispanic, Non-Hispanic African American, Non-Hispanic White, Non-Hispanic
Other)
• Education (Less than High School/High School, Some College, Associate or Bachelor Degree,
Advanced Degree)
• Presence or absence of children (No, Yes)
Phase 2 Combining Samples: Calibration (cont.)
➢ The calibration methodology used was Iterative Proportional Fitting, i.e.,
raking or sample balancing.
➢ This method forces all of the different characteristics to simultaneously match the
control totals
o Each of the socio-demographic characteristics was used as a separate dimension in the
raking process
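Raking alternates proportional adjustments over each dimension until every margin matches its control total; a minimal two-dimension sketch (the control totals are made up for illustration):

```python
def rake(weights, dims, targets, iters=50):
    """Iterative proportional fitting over categorical dimensions.

    weights: starting weight per respondent
    dims:    one category list per dimension, parallel to weights
    targets: one {category: control_total} dict per dimension
    """
    w = list(weights)
    for _ in range(iters):
        for cats, target in zip(dims, targets):
            # current weighted total per category in this dimension
            totals = {}
            for wi, c in zip(w, cats):
                totals[c] = totals.get(c, 0.0) + wi
            # scale every weight so this dimension's margins match
            w = [wi * target[c] / totals[c] for wi, c in zip(w, cats)]
    return w

sex = ["M", "M", "F", "F"]
age = ["18-44", "45+", "18-44", "45+"]
w = rake([1.0, 1.0, 1.0, 1.0],
         [sex, age],
         [{"M": 48.0, "F": 52.0}, {"18-44": 45.0, "45+": 55.0}])
# the weighted sex and age margins now match both sets of control totals
```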
Using the Final Combined Data from the Four Samples
➢ After all this was done we had a weighted dataset containing all the interviews from
the original study and from the follow-up survey of nonresponders (n= 3,738)
➢ This allowed us to generate two types of population estimates for key behavioral
variables in the study
o Estimates based on the original survey only
o Estimates based on the combined surveys
➢ We then compared the differences between the two estimates to determine whether or
not a statistic was materially different when the data from the nonresponse follow-up
survey was taken into account
An Example of a Material Difference in a Key Statistic
➢ Percent of parents/guardians reporting having a child who visited a Zoo/Aquarium in the
past 30 days
Original Survey
Yes, Did Visit: 19.2%
Est. # in USA: 14.45M
➢ 8.1 percentage point difference
➢ Estimated 6.1M difference
                 Original Survey   Combined Surveys
Yes, Did Visit        19.2%             27.3%
Est. # in USA        14.45M            20.55M
The Extent of Differences Identified by the NR Follow-up Study
➢ For 10 of the 28 behavioral measures, the percentage found in the original survey
differed by more than two percentage points (2 pp) from the percentage found in the
combined survey dataset
➢ Of these, six of the behavioral measures differed by more than five percentage points (5
pp)
➢ All of these measures were for estimates associated with the reported behavior of
children (where the adult respondent served as a proxy for her/his child) and none of
them were for estimates of the reported behavior of the adult respondent
Nonresponse Bias exists at the level of the individual measure, and here we
found evidence of several measures that likely were highly biased
Acknowledgement
Analysis by Andrew Ward, Principal Statistician, Social Research Centre
Thanks to Dr Siu-Ming Tam and Mr Paul Schubert, ABS, for their
collaboration on this work and their kind permission to use the survey data
for this presentation.
Slides based on presentation by Andrew Ward at the Australian Statistical
Conference in December 2016.
Community Trust in Statistics Survey
➢ Determine public awareness and trust of official statistics
➢ Dual-frame phone survey
➢ Also available for respondents and non-respondents: part-of-state (based on the landline prefix) or mobile
➢ Collected info from ~727 refusals: age, sex, awareness, trust
Challenge
➢ Incorporation of refusal information in non-response adjustment
o Probability of response derived from a propensity model
o Base weight = Design weight / Probability of response
o Limited auxiliary information available for respondents and non-respondents
Non-response adjustment
Awareness / Trust                          Non-respondents (%)   Respondents (%)
Have heard of and trust a great deal                12.4               20.5
Have heard of and tend to trust                     28.5               54.6
Have heard of and tend to distrust                   6.2               10.5
Have heard of and distrust a great deal              4.8                2.2
Have heard of but DK / Refused trust                26.3                2.9
Total awareness                                     78.2               90.7
Have not heard of                                   17.1                9.0
Don’t know / Refused                                 4.8                0.4
Non-response adjustment (cont'd)
➢ Logistic regression model predicts response probability from information available for both respondents and non-respondents
o Location, Age, Sex, Awareness of official statistics, Trust in official statistics
➢ Gives a boost to the respondents most like the non-respondents
70
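The adjustment described above can be sketched in code. This is an illustrative sketch only, not the study's actual implementation: a tiny gradient-descent logistic regression stands in for whatever fitting routine was used, and the covariates and data are invented.

```python
import math

def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Minimal gradient-descent logistic regression; returns [intercept, coefs...]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus outcome
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def response_probability(w, xi):
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data: covariates known for BOTH respondents and refusals
# (here: [female, aged_under_45, aware_of_official_stats]); y = 1 if responded.
X = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1],
     [1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 1, 0]

w = fit_logistic(X, y)

# Base weight for each respondent = design weight / estimated response probability.
design_weight = 1.0
base_weights = [design_weight / response_probability(w, xi)
                for xi, yi in zip(X, y) if yi == 1]
```

Respondents who resemble the non-respondents get low predicted response probabilities and therefore larger base weights, which is the "boost" the slide describes.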
67. www.srcentre.com.au
Non-response adjustment (cont’d)
Awareness / Trust                          Unweighted   Without propensity   With propensity    With % -
                                                  (%)           weight (%)        weight (%)   Without %
Have heard of and trust a great deal             20.5                 15.3              14.3        -1.0
Have heard of and tend to trust                  54.6                 52.6              47.5        -5.1
Have heard of and tend to distrust               10.5                 11.1               9.9        -1.2
Have heard of and distrust a great deal           2.2                  2.4               2.9         0.5
Have heard of but DK / Refused trust              2.9                  3.4               7.5         4.1
Total awareness                                  90.7                 84.8              82.1        -2.7
Have not heard of                                 9.0                 14.5              17.1         2.6
Don't know / Refused                              0.4                  0.7               0.9         0.2
71
69. www.srcentre.com.au
Non-response adjustment (cont'd)
➢ Has face-validity, though…
o For example, persons with university education are typically over-represented in RDD surveys
o Therefore awareness estimates may have been inflated
73
71. www.srcentre.com.au
Acknowledgments
Data: The Social Research Centre; Cancer Council Victoria
Slides and ideas: Darren Pennay, Andrew C. Ward, Paul J. Lavrakas, Tina Petroulias, Sebastian Misson, Charles DiSogra, and the Inferences from Non-Probability Samples Workshop participants, Paris 2017
75
72. www.srcentre.com.au
Key background references
➢ Yeager, D. S., Krosnick, J. A., Chang, L., Javitz, H. S., Levendusky, M. S., Simpser, A. (2011)
o Comparing the Accuracy of RDD Telephone Surveys and Internet Surveys Conducted with Probability and Non-Probability Samples. Public Opinion Quarterly
➢ DiSogra, C., Cobb, C., Chan, E., Dennis, J. M. (2011)
o Calibrating Non-Probability Internet Samples with Probability Samples Using Early Adopter Characteristics. Section on Survey Research Methods – JSM 2011
➢ Valliant, R., Dever, J. A. (2011)
o Estimating propensity adjustments for volunteer web surveys. Sociological Methods & Research, 40(1), pp. 105-137
➢ Terhanian, G., Bremer, J., Haney, C. (2014)
o A model-based approach for achieving a representative sample
➢ Fahimi, M., Barlas, F. M., Thomas, R. K., Buttermore, N. (2015)
o Scientific Surveys Based on Incomplete Sampling Frames and High Rates of Nonresponse. Survey Practice, 8(5)
76
73. www.srcentre.com.au
Usual starting point
➢ Design weight – inverse of probability of selection
o Probability sample: number of people in the household, number of landlines, etc.
o Non-probability sample: unknown, usually given design weight=1
➢ Calibration (post-stratification, raking, RIM)
o Uses external measures as “benchmarks” to adjust/weight (calibrate) the data to
improve accuracy and force estimates to be consistent with other data sources
o Applied to both probability and non-probability samples for key population
demographics e.g. gender, age, education to reflect population distribution
o If the non-probability sample uses quotas for the same demographics, then post-stratification will have minimal impact
77
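Calibration by raking (RIM weighting) iterates over the benchmark variables, scaling the weights so each variable's weighted distribution matches its population margin in turn. A minimal sketch; the variable names and margin totals below are invented for illustration:

```python
def rake(weights, records, margins, iters=100):
    """Iterative proportional fitting: scale weights until the weighted
    distribution of each variable matches its population margin (as totals)."""
    w = list(weights)
    for _ in range(iters):
        for var, targets in margins.items():
            # Current weighted total for each level of this variable.
            sums = {}
            for wi, rec in zip(w, records):
                sums[rec[var]] = sums.get(rec[var], 0.0) + wi
            # Scale every case's weight by target / current for its level.
            w = [wi * targets[rec[var]] / sums[rec[var]]
                 for wi, rec in zip(w, records)]
    return w

# Four respondents with design weight 1; hypothetical population margins
# expressed as weighted totals (both margins sum to the same population size).
records = [{"sex": "F", "age": "18-44"}, {"sex": "F", "age": "45+"},
           {"sex": "M", "age": "18-44"}, {"sex": "M", "age": "45+"}]
margins = {"sex": {"F": 2.0, "M": 2.0},
           "age": {"18-44": 1.6, "45+": 2.4}}
weights = rake([1.0] * 4, records, margins)
```

After raking, the weighted sex and age distributions both match their targets; with quotas already enforcing these margins, the adjustment factors would all be close to 1, which is the "minimal impact" point above.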
74. www.srcentre.com.au
Selecting benchmarks
Should be a known source of non-response bias and likely to have an effect on survey estimates.
Must have information for each sample member individually and the population as a whole
➢ Identical or as close as possible (e.g. the same question in the survey and the census)
Stratified sample:
➢ Unequal probability of selection within population subgroups (markets / strata)
➢ Population totals within each subgroup/stratum must be included as part of the weights
Common variables in general population surveys:
➢ Telephone status (for dual frame surveys)
➢ Gender
➢ Age by education
➢ Country of birth (Australia vs other English-speaking countries vs non-English-speaking countries)
➢ Geography (State; metro vs regional)
78
75. www.srcentre.com.au
Is your adjustment working?
Key considerations:
➢ Variance (probability samples) and bias
Variance:
➢ Yours is one sample of many possible samples: how different would your result be if you happened to select a different random sample?
➢ If you use too many benchmarks (or the data are severely biased), the weights will become extreme, increasing the variance of your estimate and decreasing your confidence in the accuracy of your result
Bias:
➢ The difference between your survey result and the true value
➢ If you ignore a known source of non-response bias, then survey results will be further from the truth
➢ You need to know the truth in order to measure bias
➢ Bias measurement dilemma: if you already know the truth, why bother with the survey?
What about weighting efficiency?
➢ A measure of how much work a weight is doing
➢ What is an acceptable minimum? No standard
➢ Can be misleading especially in non-probability or highly biased samples (e.g. weight=1)
79
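Weighting efficiency is commonly computed as Kish's ratio of effective to actual sample size; the slides do not give a formula, so this sketch assumes that definition:

```python
def weighting_efficiency(weights):
    """Kish weighting efficiency: effective sample size / actual sample size.
    1.0 means the weights do no work; values near 0 indicate extreme weights."""
    n = len(weights)
    total = sum(weights)
    total_sq = sum(wi * wi for wi in weights)
    return (total * total) / (n * total_sq)

print(weighting_efficiency([1, 1, 1, 1]))    # uniform weights: 1.0
print(weighting_efficiency([1, 1, 1, 10]))   # one extreme weight drags it down
```

This illustrates the slide's caution: a non-probability sample left at weight=1 scores a perfect 1.0 even though nothing about its bias has been addressed.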
76. www.srcentre.com.au
Case study 1: The Online Panels Benchmarking Study
➢ Three surveys based on probability samples of the Australian population aged 18 years
and over
➢ Five surveys of persons aged 18 years and over administered to members of non-
probability online panels
➢ Survey questionnaire included a range of demographic questions and questions about
health, wellbeing and use of technology
➢ Same questions were used across the eight surveys (a unified approach to questionnaire design to try to minimise mode effects)
➢ 9 minutes average interview length for online and telephone (12 page booklet)
➢ Fieldwork: Oct – Dec 2015
➢ Data and documentation available from the Australian Data Archive
https://www.ada.edu.au/ada/01329
80
77. www.srcentre.com.au
Results – substantive health characteristics (modal response)
81
Substantive variables (modal response;             Benchmark   Distance from benchmarks (percentage point error)
percentage point error shown)                      value (%)   Probability               Non-probability
                                                               ABS    ANU Poll  DF RDD  P1     P2     P3     P4     P5
Life satisfaction (8 out of 10)                         32.6   -2.0       -2.0     1.9  -11.9  -11.6   -4.5   -9.2   -7.9
Psychological distress - Kessler 6 (Low)                82.2  -10.6      -11.6    -8.1  -25.9  -23.5  -22.2  -25.0  -23.2
General Health Status (SF1) (Very good)                 36.2    0.4       -2.0    -2.6   -4.1   -5.8   -5.3   -5.0    1.5
Private Health Insurance                                57.1    3.4        1.9     3.3   -8.9  -12.5   -3.7   -0.6   -2.6
Daily smoker                                            13.5   -4.1        3.5     1.6    9.8    6.7    3.9    2.7    4.3
Consumed alcohol in the last 12 months                  81.9    3.6       -2.8    -4.0    2.4    5.3    3.9    4.2    1.5
78. www.srcentre.com.au
Impact on weighting
Impact of weighting on average absolute error: The average difference
(percentage points) across all benchmarks between the official statistics and
the survey estimates.
82
                           Probability surveys         Non-probability panels
                           ABS    ANU Poll    RDD      P1      P2      P3     P4     P5
Unweighted (avge. error)   4.28       3.68    4.63    9.34   10.35   7.28   7.20   6.41
Weighted (avge. error)     4.02       3.98    3.58   10.5    10.9    7.24   7.78   6.83
Impact                     ✓ ✓ -
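The "average absolute error" used throughout is the mean percentage-point gap between a survey's estimates and the official benchmarks. A minimal sketch (the input numbers here are illustrative, not the study's full benchmark set):

```python
def average_absolute_error(estimates, benchmarks):
    """Mean absolute percentage-point difference between survey estimates
    and the corresponding official benchmark values."""
    assert len(estimates) == len(benchmarks)
    return sum(abs(e - b) for e, b in zip(estimates, benchmarks)) / len(estimates)

# e.g. two benchmarks: 32.6% life satisfaction, 82.2% low psychological distress
err = average_absolute_error([30.6, 71.6], [32.6, 82.2])  # (2.0 + 10.6) / 2 = 6.3
```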
80. www.srcentre.com.au
Design weight: Non-probability sample
➢ Current approaches:
o Best case scenario:
• Model-based selection to address known biases (e.g. quotas)
o Worst case scenario:
• "Participation is a combination of idiosyncratic contacts, low response rates, and non-coverage. This is not the kind of design one would choose if there were an affordable alternative."
• (Douglas Rivers, comment on the 2013 AAPOR Task Force Report on Non-probability Sampling. J Surv Stat Methodol 2013; 1(2): 111-117. doi: 10.1093/jssam/smt009)
➢ Possible alternative:
o Adapt propensity response model to model likelihood of being part of the non-probability
sample
o Need probability reference sample
84
81. www.srcentre.com.au
Probability reference sample
➢ Gold standard that we’d like non-probability sample to resemble as closely as possible
o Known to produce accurate estimates for key outcomes
o Ideally includes data that can be compared to independent benchmarks
➢ Needs
o Common data items with the non-probability estimate
• As a minimum: demographics, attitudes, and the outcomes of interest
o Comparable data
• Mode (depending on the questionnaire)
• Question wording
• Reference timeframe
85
82. www.srcentre.com.au
Case study 2
LinA as a reference sample for OPBS
➢ Life in Australia
o http://www.srcentre.com.au/our-research#life-in-aus
o Australia’s first and only probability-based online panel
o Launched in December 2016
o 3,300 adults aged 18 years and over from across Australia (online and offline population), randomly recruited via a dedicated dual-frame telephone survey
o Replicated the online panel benchmarking study in February 2017
➢ OPBS non-probability panels reweighted
o Design weight=1
o Post-stratification weight matched to LinA weighting
o Add enrolment to vote to the substantive characteristics
86
83. www.srcentre.com.au
LinA as a reference sample
87
Substantive variables                      Benchmark   Distance from benchmark (percentage point difference)
                                           value (%)   LinA    Non-probability panels
                                                               P1     P2     P3     P4     P5
Life satisfaction (8 out of 10)                 32.6   -1.4  -11.1  -13.6   -5.9  -10.3   -8.0
Psychological distress - Kessler 6 (Low)        82.2  -21.3  -26.5  -25.5  -22.7  -25.1  -23.4
General Health Status - SF1 (Very good)         36.2   -3.8   -4.2   -4.2   -4.0   -5.3    0.1
Private Health Insurance                        57.1    2.6   -8.0   -9.8   -3.9    0.1   -2.0
Daily smoker                                    13.5   -1.0    9.0    6.0    3.5    2.2    3.5
Consumed alcohol in the last 12 months          81.9    2.7   -1.1   -6.4   -3.6   -4.9   -1.7
Enrolled to vote                                78.5    9.0    8.4    7.6   10.7    8.8   13.1
Average absolute difference                            5.23   8.54   9.14   6.79   7.09   6.48
84. www.srcentre.com.au
Back to model-based design weights
➢ Use reference sample to calculate
o Propensity scores – the conditional probability that a respondent is in the non-probability sample rather than the probability sample, given the observed characteristics of the respondent
• Logistic model (R PracTools package, Valliant et al. 2015)
• Non-probability cases=1, reference cases=0
• Design weight for the non-probability sample = inverse of the estimated probability of inclusion in the non-probability sample, given the weighted reference sample
• Beware: extreme weights, size and quality of the reference sample
➢ Result:
o Probability-based design weights for non-probability sample
o Now ready to look at calibration/weighting adjustment methods
88
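The slides fit a logistic model (R PracTools) for these propensity scores. As a simplified, language-agnostic sketch of the same idea, a cell-based propensity can stand in for the fitted probabilities: treat each covariate combination as a cell, estimate the propensity as the non-probability share of cases in that cell, and take its inverse as the design weight. The cells and counts below are invented:

```python
from collections import Counter

def propensity_design_weights(nonprob_cells, reference_cells):
    """Design weights for a non-probability sample via a cell-based propensity:
    within each covariate cell, propensity = share of stacked cases that are
    non-probability (non-prob=1, reference=0); design weight = 1 / propensity.
    This is a stand-in for the logistic-model version, where the propensity
    comes from fitted probabilities."""
    np_counts = Counter(nonprob_cells)
    ref_counts = Counter(reference_cells)
    weights = []
    for cell in nonprob_cells:
        p = np_counts[cell] / (np_counts[cell] + ref_counts.get(cell, 0))
        weights.append(1.0 / p)
    return weights

# Cells could be e.g. (age group, education); here just labels "A" and "B".
weights = propensity_design_weights(["A", "A", "B"], ["A", "B", "B", "B"])
# Cell A: 2 of 3 stacked cases are non-probability -> weight 1.5
# Cell B: 1 of 4 stacked cases is non-probability  -> weight 4.0
```

Cells that are over-represented in the non-probability sample relative to the reference sample get small design weights, and vice versa; sparse cells illustrate the "extreme weights" caution above.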
86. www.srcentre.com.au
No agreed approach – work in progress
➢ Starting point – known biases in the non-probability sample
➢ For example, for online panels
o Heavier internet users
o Heavier media users
o More interested in technology
o Early adopter skew
o More health care card holders
➢ Common theme
o Standard demographics are not enough
o High quality “official” benchmarks not available
➢ Probability reference sample to the rescue!
o As long as items that are known/suspected biases in the non-probability sample are
included
90
87. www.srcentre.com.au
Case study 2
➢ Significant differentiators
o Early adopter variables
o Internet usage variables
o Income
o Employment
o Remoteness
o Home ownership
o Media consumption variables were not included in the survey
➢ Incorporate one additional variable at a time and compare with benchmarks to evaluate the impact on bias
91
88. www.srcentre.com.au
Options for EA inclusion in calibration
➢ Each EA variable is included as a 0/1 variable where
o 1 means agreed or strongly agreed with that statement and
o 0 all else (disagreed or strongly disagreed, did not respond)
➢ Scale derived using Rasch model
o Bond, T. G. and C. M. Fox (2007). Applying the Rasch model: Fundamental measurement in
the human sciences. (2nd ed.) Mahwah, N.J.: Erlbaum.
➢ Categorical scale
o None – did not agree or strongly agree to any of 5 statements
o Some – agreed or strongly agreed with at least one and no more than 2 out of 5 statements
o High – agreed or strongly agreed with 3 or more out of 5 statements
➢ Agreement total calculated as the number of statements (min 0, max 5) that the
respondent agreed or strongly agreed with
92
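Two of the three EA codings above (the categorical scale and the agreement total) can be sketched directly; the Rasch-derived scale needs a fitted measurement model and is not shown. The answer strings are illustrative:

```python
AGREE = {"agree", "strongly agree"}

def ea_codings(responses):
    """responses: one respondent's answers to the 5 EA statements.
    Returns (0/1 indicators, agreement total, categorical scale level)."""
    # 1 = agreed or strongly agreed; 0 = all else, including nonresponse.
    indicators = [1 if r in AGREE else 0 for r in responses]
    total = sum(indicators)          # agreement total: min 0, max 5
    if total == 0:
        category = "None"            # did not agree with any statement
    elif total <= 2:
        category = "Some"            # agreed with 1 or 2 statements
    else:
        category = "High"            # agreed with 3 or more statements
    return indicators, total, category

ind, total, cat = ea_codings(["agree", "disagree", "strongly agree", "agree", ""])
# -> ([1, 0, 1, 1, 0], 3, "High")
```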
89. www.srcentre.com.au
Options for EA inclusion in calibration
Composite is better than individual variables
93
Impact of EA variables on bias (% change)
EA option                     Ave abs error vs   Ave abs error vs       Ave RMSE vs
                              unweighted (%)     std adjustment (%)     std adjustment (%)
EA1-EA4                                  6.2                  2.0                   7.8
EA1-EA5 Rasch Scale                      2.5                 -1.5                  -0.3
EA1-EA5 Categorical Scale               -0.4                 -4.4                  -1.5
EA1-EA5 Agreement total                 -1.6                 -5.5                  -2.4
90. www.srcentre.com.au
Internet usage measures
Not useful when calibrating online to online
94
Impact of Internet Usage measures on bias (% change)
Measure combination                                Ave abs error vs   Ave abs error vs     Ave RMSE vs
                                                   unweighted (%)     std adjustment (%)   std adjustment (%)
Look for information                                          2.6               -1.5                  0.1
Access at home, Look for information                          5.0                0.9                  2.0
Look for information, Post to blogs etc,
Financial transactions, Social Media                          4.5                0.3                  6.4
Access at home, Frequency of use                              5.5                1.3                  1.9
91. www.srcentre.com.au
Socio-economic variables in calibration
Income – single most influential variable to reduce bias and RMSE
Benefit from including both: income and employment
95
Impact of Income, Employment and Home Ownership variables on bias (% change)
Variable combination                               Ave abs error vs   Ave abs error vs     Ave RMSE vs
                                                   unweighted (%)     std adjustment (%)   std adjustment (%)
Income                                                       -7.3              -11.0                 -8.5
Income, Employment                                          -10.3              -13.9                -10.6
Income, Employment, Home Ownership                           -7.3              -10.9                 -5.8
Income, Employment, Home Ownership, Remoteness               -5.9               -9.6                 -4.6
93. www.srcentre.com.au
Impact of calibration
(LinA: Income, Employment, EA total agreement score)
97
Benefit of adjustment is sample dependent!
Standard adjustments do not help with bias reductions!!
             Average Absolute Error                               % Change in bias
             Unweighted   Std           Design weights and        From         From std
                          adjustments   key differentiators       unweighted   adjustment
Panel 1             9.2           9.8                   7.8            -15.1        -19.8
Panel 2            10.0          10.4                   9.3             -7.2        -10.9
Panel 3             7.7           7.7                   5.3            -31.0        -31.5
Panel 4             7.4           8.1                   7.1             -3.3        -11.9
Panel 5             7.4           7.4                   7.8              5.4          5.7
All Panels          8.3           8.7                   7.5            -10.4        -13.9
94. www.srcentre.com.au
Comparison with independent benchmarks vs reference sample benchmarks
98
Mostly change in bias is consistent but there can be large differences for some samples
% Change in bias from unweighted
           Independent Benchmarks   Reference Sample
Panel 1                    -15.1               -26.9
Panel 2                     -7.2                 5.7
Panel 3                    -31.0               -31.7
Panel 4                     -3.3                28.7
Panel 5                      5.4                 1.0

% Change in bias from std adjustment
           Independent Benchmarks   Reference Sample
Panel 1                    -19.8               -30.6
Panel 2                    -10.9                 0.6
Panel 3                    -31.5               -31.5
Panel 4                    -11.9               -12.4
Panel 5                      5.7                -8.3
95. www.srcentre.com.au
What about blending?
➢ Adding a probability sample to the non-probability sample will further decrease the bias
➢ Confirmed that
o Combining probability and non-probability data greatly reduces variability across panels and
has much larger impact than calibration alone
o Use the best available probability sample to combine with the non-probability sample, regardless of mode and response rate
➢ But…
➢ Costs and feasibility of running a parallel probability sample
o Push to web – costs (incentives, follow up), timeliness
o Listed vs RDD mobile – option for blending
99
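The slides do not specify a blending estimator. One simple hypothetical sketch composites the two samples' estimates in proportion to their Kish effective sample sizes; in practice the cases would more likely be pooled and recalibrated together, so treat this purely as an illustration of the idea:

```python
def effective_n(weights):
    """Kish effective sample size for a set of case weights."""
    total = sum(weights)
    return total * total / sum(wi * wi for wi in weights)

def blend(prob_est, prob_weights, nonprob_est, nonprob_weights):
    """Composite estimate: each sample contributes in proportion to its
    effective sample size (an assumed, illustrative allocation rule)."""
    n1 = effective_n(prob_weights)
    n2 = effective_n(nonprob_weights)
    lam = n1 / (n1 + n2)
    return lam * prob_est + (1 - lam) * nonprob_est

# Equal-size, equally weighted samples contribute 50/50:
est = blend(10.0, [1.0, 1.0], 20.0, [1.0, 1.0])  # -> 15.0
```

A larger or less variable-weighted probability sample pulls the composite toward its estimate, which is consistent with the observation that blending reduces variability across panels.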