Processing Social Media Messages in Mass Emergency: A Survey

Processing Social Media Messages in Mass Emergency: Survey Summary
Authors:
Muhammad Imran (mimran@hbku.edu.qa)
Carlos Castillo (chato@acm.org)
Fernando Diaz (diazf@acm.org)
Sarah Vieweg (sarahvieweg@gmail.com)
Date: 25th April 2018

Overarching Goal
“To extract time-critical information from social media
that is useful for emergency responders, affected
communities, and other concerned population in
disaster situations.”
[Figure: example messages labeled "urgent help need" and "urgent aid need"]

Survey Study Selection
Keyword search using three kinds of filters:
- Domain filters: humanitarian, disaster response, mass emergencies
- Topic filters: computing, artificial intelligence, machine learning
- Data filters: Twitter, Facebook, micro-blogging
Initial results: >700 articles
After duplicate, domain, topic, and data filtering: final selection = 180 published research papers

Topics Covered
Humanitarian + Social Media + AI
- Volume & Velocity (~18): data acquisition, storage, and retrieval
- Event Detection (~36): topic detection and tracking
- Classification & Clustering (~40): classification and clustering
- Information Summarization (~15): abstractive and extractive summarization
- Semantics and Crisis Ontologies (~10): semantic enrichment and crisis ontologies
- Information Veracity (~18): credibility and misinformation
- Information Visualization (~12): crisis maps, dashboards
Total: ~180 papers surveyed

Twitter Storms during Emergencies
- Volume: 27 million tweets in 3 days
- Velocity: 72k tweets/min
Source: https://www.wsj.com/articles/twitter-storms-can-help-gauge-damage-of-real-storms-and-disasters-study-says-1457722801
(Castillo C, Big Crisis Data, 2016, Cambridge University Press)
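A quick back-of-the-envelope check relates the two figures above, assuming (the slide does not state this explicitly) that both describe the same event: 27 million tweets over 3 days averages 6,250 tweets per minute, so a peak of 72k tweets/min is roughly 11.5 times the average rate. The helper name below is hypothetical.

```python
def average_rate_per_minute(total_messages: int, days: float) -> float:
    """Average message rate, in messages per minute, over `days` days."""
    minutes = days * 24 * 60
    return total_messages / minutes


# Figures from the slide (assumed to refer to the same event).
TOTAL_TWEETS = 27_000_000
DAYS = 3
PEAK_PER_MINUTE = 72_000

avg = average_rate_per_minute(TOTAL_TWEETS, DAYS)
peak_to_average = PEAK_PER_MINUTE / avg

print(f"average: {avg:.0f} tweets/min")               # average: 6250 tweets/min
print(f"peak/average ratio: {peak_to_average:.2f}")   # peak/average ratio: 11.52
```

Under this assumption, a pipeline sized only for the average rate would be underprovisioned by about an order of magnitude at the peak, which is why velocity matters alongside volume.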

Twitter Activity Across Locations during Disasters
[Figure panels: Activity; Retweeting. Red: locations closer to the disaster; blue: locations farther from the disaster]
Strong relationship between proximity to Hurricane Sandy's path and social media activity
(Yury Kryvasheyeu et al. Sci Adv 2016;2:e1500779)

Event Description
• Why detect events from social media?
– Human sensors report incidents very quickly
– Tweet waves travel faster than earthquake waves
• What is an event?
– Events can be defined as situations, actions or
occurrences that happen in a certain location at a
specific time (Dou et al. 2012)
• An event is generally characterized by: 5W1H
– Who? When? Where? What? Why? How?

Event Detection using Bursty Behavior
(Liang et al. Quantifying Information Flow During Emergencies, 2014, Nature.)
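Burst-based detectors generally flag a time window whose message count is far above recent history. The sketch below is one simple, assumed variant, not the method of any system cited in this survey: it flags a minute whose tweet count exceeds the mean of a trailing window by more than `k` standard deviations. The function name, window size, threshold, and counts are hypothetical illustrative choices.

```python
from statistics import mean, pstdev


def detect_bursts(counts: list[int], window: int = 5, k: float = 3.0) -> list[int]:
    """Return indices of time bins whose count is a burst relative to the
    preceding `window` bins (z-score above `k`)."""
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        # A perfectly flat history has sigma == 0: treat any increase as a burst.
        if sigma == 0:
            if counts[i] > mu:
                bursts.append(i)
        elif (counts[i] - mu) / sigma > k:
            bursts.append(i)
    return bursts


# Hypothetical per-minute counts of tweets containing a keyword like "earthquake".
per_minute = [12, 15, 11, 14, 13, 12, 240, 410, 380, 35, 20]
print(detect_bursts(per_minute))  # [6, 7]
```

Minute 8 (380 tweets) is not flagged because its trailing window already contains the burst; this sketch simply tolerates that, whereas practical detectors typically use more robust baselines.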

Event Detection Systems
System | Approach | Event types | Real-time | Query type | Spatio-temporal | Sub-events | Reference
TwitterMonitor | Burst detection | Open domain | Yes | Open | No | No | [Mathioudakis et al. 2010]
TwitInfo | Burst detection | Earthquakes | Yes | Keyword | Spatial | Yes | [Marcus et al. 2011]
Twevent | Burst detection | Open domain | Yes | Open | No | No | [Li et al. 2012b]
TEDAS | Supervised classification | Crime/disasters | No | Keyword | Yes | No | [Li et al. 2012a]
LeadLine | Burst detection | Open domain | No | Keyword | Yes | No | [Dou et al. 2012]
TwiCal | Supervised classification | Conflicts/politics | Yes | Open | Temporal | No | [Ritter et al. 2012]
Tweet4Act | Dictionaries | Disasters | Yes | Keyword | No | No | [Chowdhury et al. 2013]
ESA | Burst detection | Open domain | Yes | Keyword | Spatial | No | [Robinson et al. 2013a]

Challenges and Future Directions
• Inadequate spatial information
– Spatial and temporal information are two integral
components of an event
– Automatic text-based geo-tagging may help
• Mundane events
– Recurring hashtags such as #MusicMonday and #FollowFriday can be mistaken for events
• Describing the events
– Named entities, tracking, semantic enhancements
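As a minimal illustration of text-based geotagging, the sketch below matches place names in a message against a gazetteer, preferring longer matches. It assumes a tiny hand-made gazetteer whose coordinates are rounded approximations; the entries and function name are hypothetical. Real geotaggers rely on large gazetteers and toponym disambiguation, which this sketch omits.

```python
import re

# Hypothetical mini-gazetteer: lowercase place name -> approximate (lat, lon).
GAZETTEER = {
    "kathmandu": (27.7, 85.3),
    "kathmandu valley": (27.7, 85.3),
    "pokhara": (28.2, 84.0),
}
MAX_NGRAM = 3  # longest place name, in tokens, that we try to match


def geotag(text: str) -> list[tuple[str, tuple[float, float]]]:
    """Return gazetteer entries mentioned in `text`, preferring longer matches."""
    tokens = re.findall(r"[a-z]+", text.lower())
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            name = " ".join(tokens[i:i + n])
            if name in GAZETTEER:
                matches.append((name, GAZETTEER[name]))
                i += n
                break
        else:
            i += 1  # no place name starts here; move to the next token
    return matches


print(geotag("Roads blocked across the Kathmandu Valley; shelters open in Pokhara"))
# [('kathmandu valley', (27.7, 85.3)), ('pokhara', (28.2, 84.0))]
```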

Information Classification and Clustering

By Information Provided
• Caution and advice [Imran et al. 2013b]; warnings [Acar and Muraki 2011];
hazard preparation [Olteanu et al. 2014]; tips [Leavitt and Clark 2014]; advice
[Bruns 2014]; status, protocol [Hughes et al. 2014b]
• Affected or trapped people [Caragea et al. 2011]; casualties, people missing,
found, or seen [Imran et al. 2013b]; self-reports [Acar and Muraki 2011]; injured,
missing, killed [Vieweg et al. 2010]; looking for missing people [Qu et al. 2011]
• Infrastructure/utilities damage [Imran et al. 2013b]; collapsed structure
[Caragea et al. 2011]; built environment [Vieweg et al. 2010]; closure and services
[Hughes et al. 2014b]
• Needs and donations of money, goods, services [Imran et al. 2013b];
food/water shortage [Caragea et al. 2011]; donations or volunteering [Olteanu et
al. 2014]; help requests, relief coordination [Qu et al. 2011]; relief, donations,
resources [Hughes et al. 2014b]; help and fundraising [Bruns 2014]
• Other useful information: hospital/clinic service, water sanitation [Caragea et
al. 2011]; consequences [Olteanu et al. 2014]

- Supervised classification techniques
- Learning algorithms include SVMs, random forests, ensemble methods, and, more recently, deep learning (e.g., RNNs)
- Unsupervised techniques: clustering, and LDA for topic modeling
Formal response organizations prefer supervised classification because their information categories are usually defined in advance.
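The slide lists SVMs, random forests, ensembles, and RNNs. As a dependency-free stand-in, the sketch below trains a multinomial Naive Bayes classifier, a different and simpler supervised learner swapped in for illustration, over predefined humanitarian categories that loosely follow the information types listed above. The category labels, class name, and training tweets are hypothetical and far too few for real use.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTweetClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(label) + sum over tokens of log P(token | label)
            score = math.log(count / n_docs)
            total = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token in self.vocab:  # ignore tokens never seen in training
                    freq = self.word_counts[label][token]
                    score += math.log((freq + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled tweets for predefined categories.
train = [
    ("bridge collapsed and roads damaged", "infrastructure_damage"),
    ("power lines down, building collapsed", "infrastructure_damage"),
    ("people trapped under rubble need rescue", "affected_people"),
    ("several injured and missing after the quake", "affected_people"),
    ("please donate water and food to victims", "donations_needs"),
    ("volunteers needed, send food supplies", "donations_needs"),
]
clf = NaiveBayesTweetClassifier().fit([t for t, _ in train], [l for _, l in train])
print(clf.predict("family trapped, need rescue"))  # affected_people
print(clf.predict("we need food and water"))       # donations_needs
```

Because the categories are fixed up front, the same pattern applies to any supervised learner: only `fit` and `predict` would change.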

Systems for Crisis Data Processing
Twitris [Purohit and Sheth 2013]
Twitter; semantic enrichment, classify automatically, geotag
SensePlace2 [MacEachren et al. 2011]
Twitter; geotag, visualize heat-maps based on geotags
EAIMS: Emergency Analysis Identification and Management System [McCreadie et al. 2016]
Twitter; sentiment, alerts, credibility
ESA: Emergency Situation Awareness [Yin et al. 2012; Power et al. 2014]
Twitter; detect bursts, classify, cluster, geotag

Twitcident [Abel et al. 2012]
Twitter and TwitPic; semantic enrichment, classify
CrisisTracker [Rogstadius et al. 2013]
Twitter; cluster, annotate manually
Tweedr [Ashktorab et al. 2014]
Twitter; classify automatically, extract information, geotag
AIDR: Artificial Intelligence for Disaster Response [Imran et al. 2014a]
Twitter & Facebook; annotate manually,
classify automatically (text + image)

Challenges and Future Directions
• Missing actionable insights
– Who needs help, and where
– Automatic extraction of actionable/serviceable messages
• Labeled data scarcity
– Most systems require large amounts of labeled data
– More robust domain adaptation and transfer learning techniques are required
• Focus on other content types (images)
– Images contain critical information (e.g., damage)
– More multimodal research is required

Information Summarization
Example tweets:
– "Tribhuvan international airport closed after the quake"
– "Airport closed after 7.9 Earthquake in Kathmandu"
Summary: "Tribhuvan international airport closed after 7.9 earthquake in Kathmandu."
Summaries reduce information overload.

Key Objectives and Challenges
• Information coverage
– Capture most situational updates in the data; the summary should be rich in information coverage
• Less redundant information
– Many Twitter messages repeat the same information; summaries should keep important updates while minimizing redundancy
• Readability
– Twitter messages are often noisy, informal, and full of grammatical mistakes; the aim is to produce more readable summaries
• Real-time (online/updated summaries)
– The system must not be so computationally heavy that, by the time the summary is produced, the information has lost most of its value
(McCreadie et al. 2013; Aslam et al. 2013; Nenkova and McKeown 2011; Guo et al. 2013; Rudra et al. 2016)
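Coverage and redundancy pull in opposite directions. One common way to balance them in extractive summarization, sketched here as a simplified variant in the spirit of maximal marginal relevance and not the method of any paper cited above, is to greedily pick messages that add the most new content words while skipping near-duplicates of messages already chosen. The stopword list, Jaccard threshold, budget, and function names are hypothetical choices; the first three tweets come from the example above and the fourth is made up.

```python
import re

# Hypothetical minimal stopword list for illustration.
STOPWORDS = {"the", "a", "an", "in", "after", "of", "to", "and", "is", "at"}


def content_words(text: str) -> set[str]:
    tokens = re.findall(r"[a-z]+|\d+(?:\.\d+)?", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def summarize(messages: list[str], budget: int = 2, max_overlap: float = 0.4) -> list[str]:
    """Greedily select up to `budget` messages that add the most new content
    words (coverage), skipping any message too similar to one already chosen
    (redundancy)."""
    candidates = [(m, content_words(m)) for m in messages]
    covered: set[str] = set()
    summary = []
    while len(summary) < budget:
        best, best_gain = None, 0
        for msg, words in candidates:
            if any(jaccard(words, chosen) > max_overlap for _, chosen in summary):
                continue  # near-duplicate of an already selected message
            gain = len(words - covered)
            if gain > best_gain:
                best, best_gain = (msg, words), gain
        if best is None:
            break  # nothing left that adds new information
        summary.append(best)
        covered |= best[1]
        candidates.remove(best)
    return [msg for msg, _ in summary]


tweets = [
    "Tribhuvan international airport closed after the quake",
    "Airport closed after 7.9 Earthquake in Kathmandu",
    "Tribhuvan international airport closed after 7.9 earthquake in Kathmandu.",
    "Hospitals in Kathmandu need blood donors",  # hypothetical extra update
]
for line in summarize(tweets):
    print(line)
```

The most informative airport tweet is selected first, the two near-duplicates are skipped, and the remaining budget goes to the distinct hospital update.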

Crisis Datasets (Labeled + Unlabeled)
CrisisMMD: Multimodal Twitter Datasets from Natural Disasters
http://CrisisNLP.qcri.org/
http://CrisisLex.org/

Conclusion and Future Directions
• Applied Research at its Best
– Real-world problems and challenges
– Social Media for Social Good
– Decent work on information filtering and classification (last 6-8 years)
• Social media imagery content is another potential source of information
• Labeled data scarcity problem
– Few or no labeled instances are available (in the early hours)
– Organizations' information needs vary widely
– Information needs change over time
– Domain adaptation and transfer learning techniques are required
• From situational to actionable insights
– Identify requests and needs in real-time
– Triangulate missing information
– Rank them based on their urgency to help responders

Thank you!
Contact me at: mimran@hbku.edu.qa or @mimran15
For queries, questions, and datasets:
Recommended books:
Full survey paper: Processing Social Media Messages in Mass Emergency: A Survey. ACM Computing Surveys, 2015.