Big Data in Economic Research:
Twitter, Phone calls and Political events
EUI Summer School in Sofia
Julian Hinz
European University Institute &
Kiel Centre for Globalization
In this lecture
• Call detail records data: Insecurity and industrial organization
• Political event data: Local conflict and mines
• Twitter data: Spatial distribution of languages and migration
Call detail records data
Offline services
• Despite “offline” service, often digital logs
• Phone calls, Taxi rides, Transportation networks
• mainly metadata
• Often “surprisingly” open data
Call detail records
• (anonymized) phone number of caller and callee
• date and time stamp
• type of interaction recorded (call, SMS, data)
• duration of calls, amount of data
• coordinates of the caller’s/callee’s cell tower
• sometimes further variables: indicator for business customers,
subscription details, billing address...
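A minimal sketch of how one such record could be represented and aggregated, assuming a flat layout. The class name, field names and sample values are illustrative assumptions, not the schema of any operator's data.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallDetailRecord:
    """One CDR row; all field names are hypothetical."""
    caller_id: str       # anonymized identifier of the caller
    callee_id: str       # anonymized identifier of the callee
    timestamp: datetime  # date and time stamp
    interaction: str     # "call", "sms" or "data"
    duration_s: int      # call duration in seconds (0 for SMS/data)
    tower_lat: float     # latitude of the caller's cell tower
    tower_lon: float     # longitude of the caller's cell tower


def calls_per_tower(records):
    """Count voice calls by the coordinates of the originating tower."""
    return Counter(
        (r.tower_lat, r.tower_lon) for r in records if r.interaction == "call"
    )


# Made-up records, for illustration only.
records = [
    CallDetailRecord("a1", "b2", datetime(2018, 1, 1, 9, 0), "call", 120, 34.5, 69.2),
    CallDetailRecord("a1", "c3", datetime(2018, 1, 1, 9, 5), "sms", 0, 34.5, 69.2),
    CallDetailRecord("d4", "a1", datetime(2018, 1, 2, 18, 30), "call", 45, 34.5, 69.2),
    CallDetailRecord("b2", "d4", datetime(2018, 1, 3, 7, 15), "call", 300, 31.6, 65.7),
]
tower_counts = calls_per_tower(records)
```

Counting by tower coordinates is the kind of aggregation behind the descriptive maps that follow; real studies would work on far larger, access-restricted data.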
Call detail records
Data used (mainly) descriptively for a variety of purposes
• Spread of diseases: Wesolowski et al. (2012) on Malaria, Buckee et al.
(2014) on Ebola, ...
• Optimizing transportation networks: Berlingerio et al. (2013) for
Abidjan, Ivory Coast
• Displacement of people: Bengtsson et al. (2010, 2011) on earthquake
and Cholera outbreak in Haiti, Wilson et al. (2016) on earthquake in
Nepal
• Geography of social networks: Phithakkitnukoon et al. (2012)
• Effect of geography on social networks: Büchel and Ehrlich (2017)
exploit exogenous travel time increase
Malaria sources and sinks
Figure: Source: Wesolowski et al. (2012)
CDRs for Portugal
Figure: Source: Phithakkitnukoon et al. (2012)
Blumenstock et al. (2018):
Insecurity and Industrial Organization
• Most of textbook economics does not take physical insecurity into
account
→ not much of the literature either: lack of data
• Blumenstock et al. (2018): Combine CDR data from Afghanistan with
geolocalized conflict data
• firms reduce presence in districts following major increases in violence
• effects persist for up to six months
• larger firms are more responsive to violence
Calls in Afghanistan
Figure: Source: Blumenstock et al. (2012)
Political event data
Political event data
• Political events, Conflict
• Human-coded vs. machine-coded from press coverage
• Probably best known: Correlates of War
→ interstate conflict, treaties, threats, alliances
• Uppsala Conflict Data Program
→ multiple datasets, some georeferenced
• Global Terrorism Database
→ geocoded, very detailed
Political event data
• “Integrated Crisis Early Warning System” (ICEWS)
→ DARPA program, released with 1 year lag
• Global Database of Events, Language, and Tone (GDELT)
→ daily data since 1979
• Phoenix from Open Event Data Alliance (OEDA)
→ near realtime event data
• GDELT and Phoenix provide API
Berman et al. (2017): Minerals and local
conflict
• “This Mine is Mine! How Minerals Fuel Conflicts in Africa”, Berman et
al. (2017)
• geolocalized data on conflict events in African countries between
1997 and 2010
• geolocalized data on mining extraction of 14 minerals (Raw Material
Data)
• mining activity increases the incidence of conflicts at the local level
• then spreads violence across territory and time
→ financial capacities of fighting groups increase
Figure: Source: Berman et al. (2017)
Berman et al. (2017): This mine is mine!
• Two simple specifications
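$$\mathrm{CONFLICT}_{kt} = \alpha_1 M_{kt} + \alpha_2 \ln p^W_{kt} + \alpha_3 \left( M_{kt} \times \ln p^W_{kt} \right) + FE_k + FE_{it} + \varepsilon_{kt}$$
$$\mathrm{CONFLICT}_{kt} = \alpha_3 \left( M_k \times \ln p^W_{kt} \right) + FE_k + FE_{it} + \varepsilon_{kt}$$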
Figure: Source: Berman et al. (2017)
Twitter data
Online services
• Twitter, LinkedIn, Facebook, Instagram, Tumblr, Airbnb,...
• Content, but also metadata
• Often provide some data access
→ currently in flux
Twitter data
• Twitter Streaming API: 1 % random sample of all tweets
→ filters: keyword, geolocation
→ between 40 and 60 per second
• 42 variables: text, username, user_lang, lang, followers, timezone,
latitude, longitude, place, source,...
• Relatively easy to get access to the data: http://dev.twitter.com
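A hedged sketch of the geolocation filter applied after collection, assuming each tweet has already been flattened into a dictionary keyed by the variable names listed above. The flat layout, the sample tweets and the bounding box values are assumptions for illustration; this is not the Twitter API itself.

```python
def in_bounding_box(tweet, south, west, north, east):
    """Return True if the tweet carries coordinates inside the box."""
    lat = tweet.get("latitude")
    lon = tweet.get("longitude")
    if lat is None or lon is None:
        return False  # most tweets carry no coordinates at all
    return south <= lat <= north and west <= lon <= east


# Hypothetical bounding box roughly covering Europe (degrees).
EUROPE_BOX = dict(south=35.0, west=-11.0, north=71.0, east=40.0)

# Made-up tweets, for illustration only.
tweets = [
    {"username": "u1", "lang": "de", "latitude": 52.5, "longitude": 13.4},
    {"username": "u2", "lang": "es", "latitude": 10.5, "longitude": -66.9},
    {"username": "u3", "lang": "en", "latitude": None, "longitude": None},
]
european_tweets = [t for t in tweets if in_bounding_box(t, **EUROPE_BOX)]
```

Tweets without coordinates are dropped rather than guessed, which is one reason the geolocated sample is much smaller than the raw stream.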
Twitter data in research
• Obvious: Text-mining
→ Brexit, Trump election, ...: Gorodnichenko et al. (2018), De Lyon et al.
(2018), Halberstam and Knight (2016)
• Not so obvious: Metadata
→ Language distribution
→ Migration
Hinz and Leromain (2018):
Languages and trade
• Spatial distribution of languages in Europe
• Geolocation from “coordinates”, language from “user_lang” or “lang”
→ large heterogeneity across and within countries
• Coordinates come either from the GPS of the user’s device or from
a self-assigned location
→ Barratt, Cheshire, and Manley (2013) use similar data for NY
boroughs
Bots and human users
• Bots are an issue: following Chu et al. (2012), we keep only tweets sent
from smartphones and the official app
• 6.6 million unique human Twitter users
• 481,720 unique human Twitter users in Europe
• 73 different languages
• 25 % tweet in more than 1 language, in Germany 31 %
• 958,071 unique language-user observations
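The counts above can be read as follows: a language-user observation is one (user, language) pair, and a user counts as multilingual when more than one language is observed. A minimal sketch under that reading, with hypothetical field names and made-up tweets:

```python
from collections import defaultdict


def language_user_stats(tweets):
    """Return (unique users, language-user observations, multilingual share)."""
    langs_by_user = defaultdict(set)
    for tweet in tweets:
        langs_by_user[tweet["username"]].add(tweet["lang"])
    n_users = len(langs_by_user)
    n_observations = sum(len(langs) for langs in langs_by_user.values())
    n_multilingual = sum(1 for langs in langs_by_user.values() if len(langs) > 1)
    share = n_multilingual / n_users if n_users else 0.0
    return n_users, n_observations, share


# Made-up tweets, for illustration only.
sample = [
    {"username": "a", "lang": "de"},
    {"username": "a", "lang": "en"},
    {"username": "b", "lang": "de"},
    {"username": "c", "lang": "fr"},
    {"username": "d", "lang": "en"},
    {"username": "d", "lang": "en"},
]
n_users, n_observations, multilingual_share = language_user_stats(sample)
```

Repeated tweets in the same language by the same user collapse into a single observation, so tweet frequency does not inflate the language counts.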
Twitter and UK Census Population
Twitter and UK Census Main Language
Figure: Language use on Twitter and UK census, correlation = 0.49.
Twitter and Eurobarometer
Figure: Share of language in Twitter data against share of spoken language in Eurobarometer (axis ticks at 1, 10 and 100), with a legend by language: Arabic, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Icelandic, Italian, Latvian, Lithuanian, Polish, Portuguese, Russian, Slovak, Slovenian, Spanish, Swedish, Turkish
Counterfactual simulations
• Gravity between locations
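$$X_{od} = G \times \frac{Y_o}{\Phi_o^{-\theta}} \times \frac{E_d}{P_d^{-\theta}} \times \tau_{od}^{-\theta}$$
with
$$L_{od} = \sum_l \frac{P_{dl}^{\theta}}{P_d^{\theta}} \, l_{dl} \times \frac{\Phi_{ol}^{-(\gamma-\theta)}}{\Phi_o^{\theta}} \, F_{ol}^{-\left(\frac{\gamma}{\theta}-1\right)} = \sum_l \frac{P_{dl}^{\theta}}{P_d^{\theta}} \, l_{dl} \times \frac{\Phi_{ol}^{\theta}}{\Phi_o^{\theta}} \, \frac{F_{ol}^{1-\frac{\gamma}{\theta}}}{\Phi_{ol}^{\gamma}}$$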
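As a numerical illustration of the gravity expression for X_od only (the language term L_od is left out), the sketch below evaluates the flow for given values. The function name and all parameter values are hypothetical, chosen only to show how the trade elasticity θ scales the effect of trade costs τ_od.

```python
def gravity_flow(G, Y_o, Phi_o, E_d, P_d, tau_od, theta):
    """X_od = G * (Y_o / Phi_o^-theta) * (E_d / P_d^-theta) * tau_od^-theta."""
    origin_term = Y_o / Phi_o ** (-theta)       # production over outward resistance term
    destination_term = E_d / P_d ** (-theta)    # expenditure over inward resistance term
    return G * origin_term * destination_term * tau_od ** (-theta)


# Illustrative values only; theta = 5 is an assumed trade elasticity.
frictionless = gravity_flow(G=1.0, Y_o=100.0, Phi_o=1.0, E_d=50.0, P_d=1.0,
                            tau_od=1.0, theta=5.0)
costly = gravity_flow(G=1.0, Y_o=100.0, Phi_o=1.0, E_d=50.0, P_d=1.0,
                      tau_od=2.0, theta=5.0)
ratio = costly / frictionless  # equals 2 ** -5 when only tau_od changes
```

Holding the resistance terms fixed, doubling trade costs cuts the flow by a factor of 2^θ; in a full counterfactual the resistance terms would adjust as well.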
Data and calibration
• Two types of “locations”: points in Europe and countries
• Aggregate to 30 arc minutes → 3,408 locations in Europe
→ average distance within location is about 15 kilometers
• Each country outside Europe as a single location
• calibrate production and expenditure in all locations to match external
country-to-country flows
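Aggregating to 30 arc minutes means snapping coordinates to cells of 0.5 degrees. A minimal sketch, assuming cells are anchored at whole and half degrees and identified by their south-west corner (the slides do not state the anchoring, so this is an assumption); the points are made up.

```python
import math
from collections import Counter

CELL_DEG = 0.5  # 30 arc minutes expressed in degrees


def grid_cell(lat, lon, cell_deg=CELL_DEG):
    """Return the south-west corner of the grid cell containing (lat, lon)."""
    return (
        math.floor(lat / cell_deg) * cell_deg,
        math.floor(lon / cell_deg) * cell_deg,
    )


# Made-up coordinates, for illustration only.
points = [(48.86, 2.35), (48.70, 2.10), (52.52, 13.40), (-0.1, -0.1)]
observations_per_cell = Counter(grid_cell(lat, lon) for lat, lon in points)
```

Using `math.floor` rather than truncation keeps cells consistent for negative latitudes and longitudes.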
Data and calibration
• We specify trade costs ϕod to be determined by: distance, RTA,
common currency
• Data on distances from Hinz (2016) or computed
• Data on the languages spoken in countries other than those in Europe
come from Melitz (2014)
• RTA and CU set to 1 within country or within location
• Coefficients from meta-analysis by Head et al. (2016)
Scenario 1: Common European language
→ Spoken by every inhabitant in addition to local languages
Scenario 2: Impact of within-country language
diversity
→ Welfare impact of eliminating within-country language diversity in
European countries, allowing only the domestic language
Scenario 3: Elimination of all foreign languages in
UK
→ Welfare impact of allowing only English to be spoken in the UK
Scenario 4: Migration of Arabic-speaking population
to Germany
→ Welfare impact of 10 percent of the population in Germany speaking Arabic
Hausmann, Hinz and Yıldırım (2018):
Venezuelan emigration
• Economic crisis in Venezuela: Large (?) number of refugees
→ lack of official numbers
• Dataset of geolocalized tweets by people who tweeted from
Venezuela between February 2017 and May 2018
→ 5.4 million tweets
→ 490,000 tweets from 30,000 human Twitter users
• Idea: What location(s) do they tweet from over time?
Migration and social media
• Hawelka et al. (2014): global mobility patterns, tourism flows
• Jurdak et al. (2015) city-to-city travel in Australia
• Morstatter et al. (2013): random sample creates an accurate picture of
the entire population of geolocated Tweets
• Question: How representative are geolocalized tweets?
Population and Tweets
Figure: “Gridded Population of the World” and number of Tweets by location
Population and Users
Figure: “Gridded Population of the World” and number of Twitter users by location
Representativeness of Twitter users in
Venezuela
• “Digital in 2017 Global Overview report”: 44% of Venezuelans use social
media, 35% from a mobile device
• “Tendencias Digitales”: 56% of internet users in Venezuela use Twitter
or comparable social media services
• Twitter: penetration in Venezuela 26 %
Tweets per user
Figure: Number of tweets per user in the dataset
Days per user
Figure: Number of days a user is observed in the dataset
How to make sure we don’t capture tourists?
• We narrow sample to users who
→ tweeted from Venezuela exclusively between Feb and May ’17
(Period 1)
→ tweeted from a single country exclusively between Feb and May ’18
(Period 2)
• Everyone who is not in Venezuela in period 2: migrant
• reduces sample to 818 (!)
→ Problem: Large heterogeneity in tweet frequency
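The sample restriction above can be sketched as a simple rule per user. This is an illustration under stated assumptions, not the authors' code: the exact period boundaries (first of February to end of May), the country code "VE" and the input layout are all hypothetical.

```python
from datetime import date

# Assumed period boundaries; the slides only say "Feb and May".
PERIOD_1 = (date(2017, 2, 1), date(2017, 5, 31))
PERIOD_2 = (date(2018, 2, 1), date(2018, 5, 31))


def countries_in(tweets, period):
    """Set of countries a user tweeted from within the period."""
    start, end = period
    return {country for day, country in tweets if start <= day <= end}


def classify_users(tweets_by_user, home="VE"):
    """Map each retained user to 'migrant' or 'stayer'; drop everyone else."""
    labels = {}
    for user, tweets in tweets_by_user.items():
        first = countries_in(tweets, PERIOD_1)
        second = countries_in(tweets, PERIOD_2)
        # Keep only users seen exclusively at home in period 1
        # and in exactly one country in period 2.
        if first != {home} or len(second) != 1:
            continue
        labels[user] = "stayer" if second == {home} else "migrant"
    return labels


# Made-up (date, country) observations, for illustration only.
tweets_by_user = {
    "a": [(date(2017, 3, 1), "VE"), (date(2018, 3, 1), "CO")],
    "b": [(date(2017, 3, 1), "VE"), (date(2018, 4, 1), "VE")],
    "c": [(date(2017, 3, 1), "VE"), (date(2017, 4, 1), "CO"), (date(2018, 3, 1), "CO")],
    "d": [(date(2017, 3, 1), "VE"), (date(2018, 3, 1), "CO"), (date(2018, 4, 1), "PE")],
}
labels = classify_users(tweets_by_user)
```

Users observed in more than one country within a period are dropped, which removes likely tourists but also explains why the sample shrinks so sharply.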
Distribution of countries
Figure: Distribution of countries of users between February and April ’18
SAY HI!
@julianhinz
mail@julianhinz.com
