Talk given at the Machine Learning and Data Analytics Symposium (MLDAS 2019). https://qcai.qcri.org/index.php/events/mldas-2019/.
Contact me if you're interested in the topic of poverty mapping or data for development in general.
Using advertising data to model migration, poverty and digital gender gaps
1. Using Advertising Data to Model Migration, Poverty and Digital Gender Gaps
Ingmar Weber
April 1, 2019
MLDAS
@ingmarweber
2. Great Collaborators
• Mapping poverty in the Philippines
– with UNICEF and Thinking Machines
• Tracking digital gender gaps
– with Data2X and University of Oxford
• Monitoring the Venezuelan exodus
– with UNHCR, UNICEF and iMMAP
Joao Palotti
Masoomali Fatehkia
8. Why Map Poverty?
• Monitor sustainable development
• Plan better poverty reduction interventions
• Impact assessment of interventions
– Low latency a huge plus
9. Obtaining Training Data
• 2017 household survey implemented by the Philippine Statistics Authority (PSA)
• Representative sample of ~40 households in n=1214 “clusters”
• Asset ownership based wealth index (y=WI)
=> standard regression task
10. Sources of Ground Truth Noise
• Sampling noise
– Wealth index depends on particular households
– Expected R^2 = .95 (bootstrap estimate)
• Spatial perturbation
– True location is (x,y), but reported at (x’,y’)
– Protects privacy
– Expected R^2 = .89 (simulations)
• Combined
– Expected R^2 = .84
– “Expected upper bound”
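The sampling-noise ceiling above can be illustrated with a small bootstrap sketch. This is not the speaker's code: it assumes a cluster's wealth index is the mean of its households' values, treats R^2 as the coefficient of determination between observed and resampled cluster means, and runs on synthetic data. The function names (`r_squared`, `bootstrap_r2`, `make_synthetic_clusters`) are hypothetical.

```python
import random
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def bootstrap_r2(clusters, n_rounds=200, seed=0):
    """Average R^2 between observed cluster means and bootstrap-resampled means.

    Assumption: the cluster-level wealth index is the mean over its households.
    """
    rng = random.Random(seed)
    observed = [statistics.fmean(households) for households in clusters]
    scores = []
    for _ in range(n_rounds):
        # Resample households with replacement inside every cluster.
        resampled = [
            statistics.fmean(rng.choices(households, k=len(households)))
            for households in clusters
        ]
        scores.append(r_squared(observed, resampled))
    return statistics.fmean(scores)


def make_synthetic_clusters(n_clusters=200, households=40, seed=1):
    """Synthetic stand-in for survey data; NOT the PSA survey."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        center = rng.gauss(0.0, 1.0)  # cluster-level wealth
        clusters.append([center + rng.gauss(0.0, 1.0) for _ in range(households)])
    return clusters


clusters = make_synthetic_clusters()
ceiling = bootstrap_r2(clusters)
print(f"Bootstrap estimate of expected R^2: {ceiling:.3f}")
```

The value printed depends entirely on the synthetic variances chosen here, so it should not be compared with the .95 on the slide. The point is only that a perfect model still cannot exceed the agreement between two noisy draws of the same clusters.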
11. Features to Map Poverty
24 variables on connection type, device manufacturer, device type
12. Modeling the Wealth Index
• Model selection using LASSO:
Wealth Index / 1000 = - 96
+ 115 * (frac. FB users with 4G)
+ 216 * (frac. FB users with WiFi)
+ 48 * (frac. FB users with iOS)
- 89 * (frac. FB users with Cherry Mobile)
+ 11 * (frac. FB users with high end phones)
+ 30 * (FB penetration)
+ 3 * (log population density)
Tried regression trees, didn’t help
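To make the fitted formula concrete, here is a minimal sketch that evaluates it for one location. The coefficients are copied from the slide; the dictionary keys and the function name `predict_wealth_index` are hypothetical, and the slide does not state the base of the logarithm, so the caller is assumed to supply log population density already computed the same way as in the original model.

```python
# Coefficients as shown on the slide (wealth index is expressed in thousands).
INTERCEPT = -96.0
COEFFICIENTS = {
    "frac_4g": 115.0,             # fraction of FB users on 4G
    "frac_wifi": 216.0,           # fraction of FB users on WiFi
    "frac_ios": 48.0,             # fraction of FB users on iOS
    "frac_cherry_mobile": -89.0,  # fraction of FB users on Cherry Mobile devices
    "frac_high_end": 11.0,        # fraction of FB users with high-end phones
    "fb_penetration": 30.0,       # FB penetration
    "log_pop_density": 3.0,       # log population density (base not given on slide)
}


def predict_wealth_index(features):
    """Evaluate the linear model and undo the division by 1000."""
    scaled = INTERCEPT + sum(
        weight * features[name] for name, weight in COEFFICIENTS.items()
    )
    return scaled * 1000.0


# Made-up feature values for illustration only.
example = {
    "frac_4g": 0.5,
    "frac_wifi": 0.2,
    "frac_ios": 0.1,
    "frac_cherry_mobile": 0.1,
    "frac_high_end": 0.2,
    "fb_penetration": 0.5,
    "log_pop_density": 6.0,
}
print(predict_wealth_index(example))
```

Because the model is linear, each coefficient reads directly as the change in wealth index (in thousands) per unit change in that feature, which is part of what makes the LASSO result easy to inspect.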
13. Modeling the Wealth Index
2017
2019
• R^2 = 0.58 (10-fold CV)
• Offl. baseline: R^2 = .37
• Upper bound: R^2 = .84 (due to DHS noise)
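For readers unfamiliar with how a cross-validated R^2 is obtained, the following is a rough sketch of 10-fold cross-validation. It is not the pipeline used in the talk: a single-feature least-squares line stands in for the LASSO model, R^2 is pooled over all out-of-fold predictions (averaging per-fold scores is an equally common choice), and the data are synthetic. The names `fit_line` and `cross_validated_r2` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_r2(xs, ys, k=10, seed=0):
    """R^2 computed from predictions made on held-out folds only."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds
    predictions = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            # Each point is predicted by a model that never saw it.
            predictions[i] = slope * xs[i] + intercept
    mean_y = statistics.fmean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic data for illustration only.
rng = random.Random(42)
xs = [rng.random() for _ in range(500)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
cv_r2 = cross_validated_r2(xs, ys)
print(f"10-fold CV R^2 on synthetic data: {cv_r2:.2f}")
```

Scoring only held-out predictions is what makes a figure like this comparable with the noise-limited upper bound: in-sample R^2 would overstate how well the model generalizes to unsurveyed areas.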
15. Summary
- Challenging in low population areas (k-anonymity)
- Can catch temporal changes? Unclear.
+ Potentially more “causal” than satellite features
+ Supports demographic disaggregation
+ Does not break down at lowest decile
+ Promising to combine with other data sources
• Interested? Launching poverty mapping initiative
32. Advertising Audience Estimates
+ Global reach with over 2 billion users
+ FB, LinkedIn, Google, Snapchat, IG, ...
+ Real-time estimates
+ Uses anonymous and aggregate data
+ Gender, age, location, country of origin, ...
33. Advertising Audience Estimates
- Black box on how attributes are inferred
- Needs modeling for bias correction
- Usage patterns change over time
- Only includes people who are online
- Could create “use FB!” incentives
- Risk of misuse