The document discusses the problems with observational data collection. It notes that while observational data is more available due to increased processing power, it can lead to spurious correlations, underlying biases, missing the real driver of behaviors, issues with the counterfactual scenario, measurement error, difficulties explaining why behaviors occur, observer effects, and survival bias. The document recommends discussing observational study design with experts, making predictions, identifying the counterfactual, incorporating experiments, exploring multiple causal patterns, and adding explanatory qualitative research to understand underlying reasons for behaviors.
3. The Problems with Observational Data Collection
Ray Poynter, The Future Place
The Future of
Data Collection
Observational Data
• Most big data (e.g. bank records, social media)
• Census or sample (e.g. all phones, sample with an app)
• Objective or subjective (e.g. receipts, ethnography)
• Structured or unstructured (e.g. phone records, images uploaded to SM)
• Behaviour or motivational (e.g. loyalty cards, facial coding)
• Naturally occurring or experiment
• Observational only or with questions (e.g. ad test using biometrics)
Underlying Bias
When HRT was first assessed (Nurses' Health Study, a large observational study
in the USA), it seemed to protect the heart.
Doctors were recommended to prescribe it more widely.
The Women's Health Initiative (a gold-standard controlled experiment)
suggested it was slightly bad for the heart.
Why?
Women receiving HRT were systematically healthier and wealthier.
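A minimal sketch of how this kind of bias can flip a result. The counts below are hypothetical, invented purely for illustration, and are not taken from either study; the stratum names and the `event_rate` helper are likewise assumptions. They are arranged so that healthier and wealthier women are more likely to receive HRT.

```python
# Hypothetical counts, invented for illustration only (NOT study data).
# Key: (stratum, on_hrt) -> (number of women, number of heart events)
counts = {
    ("healthier_wealthier", True): (800, 16),
    ("healthier_wealthier", False): (200, 3),
    ("less_healthy_wealthy", True): (200, 16),
    ("less_healthy_wealthy", False): (800, 56),
}


def event_rate(on_hrt, stratum=None):
    """Heart-event rate for a treatment group, optionally within one stratum."""
    rows = [
        value
        for (s, h), value in counts.items()
        if h == on_hrt and (stratum is None or s == stratum)
    ]
    women = sum(w for w, _ in rows)
    events = sum(e for _, e in rows)
    return events / women


# Naive comparison, ignoring who tends to receive HRT.
print("All women:", event_rate(True), "with HRT vs", event_rate(False), "without")

# Comparison within each stratum, which holds health and wealth roughly fixed.
for stratum in ("healthier_wealthier", "less_healthy_wealthy"):
    print(stratum, event_rate(True, stratum), "with HRT vs",
          event_rate(False, stratum), "without")
```

With these made-up numbers the pooled comparison makes HRT look protective, while within each stratum the HRT group has the slightly higher event rate. A controlled experiment avoids the problem by assigning treatment independently of health and wealth.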
Measurement Error
Dana Gruschwitz & Dr. Robert Schönduwe,
ESRA, 2017, Lisbon, Portugal
Long-standing transport study in Germany.
People have been using PAPI and CAPI to capture
journeys – memory based.
Trial with mobiles, to automatically capture information.
Less 'heaping' of the distances and times :)
But 16% fewer journeys (11% less distance, 18% fewer minutes) were recorded :(
Why?
Phone app turned itself off when people's phone battery reached 20%
Observer Effects
Watching / measuring behaviour can
change behaviour.
UK RAC study of speed cameras, 2013,
found 27% reduc2on in fatal and
serious collisions.
Note, nobody was deliberately crashing,
it was the underlying behaviour that
changed.
https://www.racfoundation.org/media-centre/speed-camera-transparency-data
Utilising Experiments
Region A
– T1, sales = 100
– T2, TV, sales = 110
– T3, TV & Twitter, sales = 130
Region B
– T1, sales = 100
– T2, Twitter, sales = 110
– T3, TV & Twitter, sales = 130
Region C
– T1, sales = 100
– T2, sales = 105
– T3, sales = 110
The counterfactual = some growth would
have happened anyway.
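The arithmetic behind this slide can be sketched as a simple difference-in-differences calculation. This is an illustrative sketch, not the author's stated method: it assumes Region C received no campaign and that, without media, Regions A and B would have grown at the same rate as C. The function name `uplift_vs_control` is hypothetical.

```python
# Sales figures by region and period, as given on the slide.
sales = {
    "A": {"T1": 100, "T2": 110, "T3": 130},  # T2: TV; T3: TV & Twitter
    "B": {"T1": 100, "T2": 110, "T3": 130},  # T2: Twitter; T3: TV & Twitter
    "C": {"T1": 100, "T2": 105, "T3": 110},  # assumed control: no campaign
}


def uplift_vs_control(region, period, control="C", baseline="T1"):
    """Change in the region since baseline, minus the change in the control."""
    region_change = sales[region][period] - sales[region][baseline]
    control_change = sales[control][period] - sales[control][baseline]
    return region_change - control_change


tv_only = uplift_vs_control("A", "T2")       # (110 - 100) - (105 - 100)
twitter_only = uplift_vs_control("B", "T2")  # (110 - 100) - (105 - 100)
both = uplift_vs_control("A", "T3")          # (130 - 100) - (110 - 100)

print("TV only:", tv_only)
print("Twitter only:", twitter_only)
print("TV & Twitter:", both)
print("Extra from combining:", both - (tv_only + twitter_only))
```

Under these assumptions, half of the apparent 10-point gain from each single channel is growth that would have happened anyway, and the combined campaign adds more than the two single-channel uplifts summed.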
Recommendations
• Talk to experts about designing and interpreting observational data collection
– Weighting, matching, Bayesian statistics etc.
• Make predictions
• Identify the counterfactual
• Build experiments into your data collection
• Look for other causal patterns and explanations
• Add explanatory research – e.g. qual – understand the why