Millions of people use social media to share information during disasters and mass emergencies. Information available on social media, particularly in the early hours of an event when few other sources are available, can be extremely valuable for emergency responders and decision makers, helping them gain situational awareness and plan relief efforts. Processing social media content to obtain such information involves solving multiple challenges, including parsing brief and informal messages, handling information overload, and prioritizing different types of information. These challenges can be mapped to information processing operations such as filtering, classifying, ranking, aggregating, extracting, and summarizing. This work highlights these challenges and presents state-of-the-art computational techniques for dealing with social media messages, focusing on their application to crisis scenarios.
Processing Social Media Messages in Mass Emergency: A Survey
1. Processing Social Media Messages in Mass Emergency: Survey Summary
Authors
Muhammad Imran (mimran@hbku.edu.qa), Carlos Castillo (chato@acm.org)
Fernando Diaz (diazf@acm.org), Sarah Vieweg (sarahvieweg@gmail.com)
Date: 25th April 2018
2. Overarching Goal
“To extract time-critical information from social media
that is useful for emergency responders, affected
communities, and other concerned population in
disaster situations.”
[Figure: two example messages. Left: "Urgent help need". Right: "Urgent aid need".]
3. Survey Study Selection
Keywords = Domain + Topics + Data
Domain filters
- Humanitarian
- Disaster response
- Mass emergencies
Topic filters
- Computing
- Artificial intelligence
- Machine learning
Data filters
- Twitter
- Facebook
- Micro-blogging
>700 articles -> Duplicate filters -> Final selection = 180 published research papers
4. Topics Covered
Humanitarian + Social Media + AI
- Volume & Velocity (~18): Data acquisition, storage, and retrieval
- Event Detection (~36): Topic detection and tracking
- Classification & Clustering (~40): Classification and clustering
- Information Summarization (~15): Abstractive and extractive summarization
- Semantics and Crisis Ontologies (~10): Semantic enrichment & crisis ontologies
- Information Veracity (~18): Credibility and misinformation
- Information Visualization (~12): Crisis maps, dashboards
Total ~180 papers surveyed
6. Twitter Storms during Emergencies
Volume: 27 million in 3 days
Source: https://www.wsj.com/articles/twitter-storms-can-help-gauge-damage-of-real-storms-and-disasters-study-says-1457722801
Velocity: 72k tweets/min
(Castillo C, Big Crisis Data, 2016, Cambridge University Press)
7. Twitter Activity Across Locations during Disasters
(Yury Kryvasheyeu et al. Sci Adv 2016;2:e1500779)
[Figure panels: Activity, Retweeting]
Blue: represents a location farther from the disaster
Red: represents a location closer to the disaster
Strong relationship between proximity to Sandy’s path and social media activity
9. Event Description
• Why detect events from social media?
– Human sensors report incidents very quickly
– Tweet waves travel faster than earthquake waves
• What is an event?
– Events can be defined as situations, actions or
occurrences that happen in a certain location at a
specific time (Dou et al. 2012)
• An event is generally characterized by: 5W1H
– Who? When? Where? What? Why? How?
10. Event Detection using Bursty Behavior
(Liang et al. Quantifying Information Flow During Emergencies, 2014, Nature.)
11. Event Detection Systems
System          | Approach                  | Event types        | Real-time | Query type | Spatio-temporal | Sub-events | Reference
Twitter Monitor | Burst detection           | Open domain        | Yes       | Open       | No              | No         | [Mathioudakis et al. 2010]
TwitInfo        | Burst detection           | Earthquakes        | Yes       | Keyword    | Spatial         | Yes        | [Marcus et al. 2011]
Twevent         | Burst detection           | Open domain        | Yes       | Open       | No              | No         | [Li et al. 2012b]
TEDAS           | Supervised classification | Crime/disasters    | No        | Keyword    | Yes             | No         | [Li et al. 2012a]
LeadLine        | Burst detection           | Open domain        | No        | Keyword    | Yes             | No         | [Dou et al. 2012]
TwiCal          | Supervised classification | Conflicts/politics | Yes       | Open       | Temporal        | No         | [Ritter et al. 2012]
Tweet4Act       | Dictionaries              | Disasters          | Yes       | Keyword    | No              | No         | [Chowdhury et al. 2013]
ESA             | Burst detection           | Open domain        | Yes       | Keyword    | Spatial         | No         | [Robinson et al. 2013a]
12. Challenges and Future Directions
• Inadequate spatial information
– Spatial and temporal information are two integral
components of an event
– Automatic text-based geo-tagging may help
• Mundane events
– #MusicMonday #FollowFriday are misleading
• Describing the events
– Named-entities, tracking, semantic enhancements
14. By Information Provided
• Caution and advice [Imran et al. 2013b]; warnings [Acar and Muraki 2011]; hazard preparation [Olteanu et al. 2014]; tips [Leavitt and Clark 2014]; advice [Bruns 2014]; status, protocol [Hughes et al. 2014b]
• Affected or trapped people [Caragea et al. 2011]; casualties, people missing, found, or seen [Imran et al. 2013b]; self-reports [Acar and Muraki 2011]; injured, missing, killed [Vieweg et al. 2010]; looking for missing people [Qu et al. 2011]
• Infrastructure/utilities damage [Imran et al. 2013b]; collapsed structure [Caragea et al. 2011]; built environment [Vieweg et al. 2010]; closure and services [Hughes et al. 2014b]
• Needs and donations of money, goods, services [Imran et al. 2013b]; food/water shortage [Caragea et al. 2011]; donations or volunteering [Olteanu et al. 2014]; help requests, relief coordination [Qu et al. 2011]; relief, donations, resources [Hughes et al. 2014b]; help and fundraising [Bruns 2014]
• Other useful information: hospital/clinic service, water sanitation [Caragea et al. 2011]; consequences [Olteanu et al. 2014]
15. By Information Provided
- Supervised classification techniques
- Learning algorithms include SVMs, Random Forests, ensemble methods, and, more recently, deep learning (e.g., RNNs)
- Unsupervised: clustering, and LDA for topic modeling
Formal response organizations prefer supervised classification because their categories are usually predefined.
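The supervised setting described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the method of any surveyed system: it swaps in a multinomial Naive Bayes classifier (simpler than the SVMs, Random Forests, or RNNs named above) so that it runs with the Python standard library alone. The category names loosely follow the taxonomy on slide 14, and the training messages are invented placeholders.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a message and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (a stand-in for SVM/RF/RNN)."""

    def fit(self, messages, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled messages; real systems need far more labeled data.
TRAIN = [
    ("bridge collapsed and power lines are down", "infrastructure_damage"),
    ("roads flooded, building collapsed near the station", "infrastructure_damage"),
    ("please donate food and water for the shelter", "needs_and_donations"),
    ("we need blankets and baby food, donations welcome", "needs_and_donations"),
    ("stay indoors and avoid the coast, storm warning issued", "caution_and_advice"),
    ("warning: boil water before drinking, advice from officials", "caution_and_advice"),
]

model = NaiveBayes().fit([msg for msg, _ in TRAIN], [lab for _, lab in TRAIN])
print(model.predict("urgent need of baby food and water"))
```

With only six training messages the model is a toy; it mainly shows why predefined categories make the supervised route natural for response organizations.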
16. Systems for Crisis Data Processing
Twitris [Purohit and Sheth 2013]
Twitter; semantic enrichment, classify automatically, geotag
SensePlace2 [MacEachren et al. 2011]
Twitter; geotag, visualize heat-maps based on geotags
EAIMS: Emergency Analysis Identification and Management System [McCreadie et al. 2016]
Twitter; sentiment, alerts, credibility
ESA: Emergency Situation Awareness [Yin et al. 2012; Power et al. 2014]
Twitter; detect bursts, classify, cluster, geotag
17. Systems for Crisis Data Processing
Twitcident [Abel et al. 2012]
Twitter and TwitPic; semantic enrichment, classify
CrisisTracker [Rogstadius et al. 2013]
Twitter; cluster, annotate manually
Tweedr [Ashktorab et al. 2014]
Twitter; classify automatically, extract information, geotag
AIDR: Artificial Intelligence for Disaster Response [Imran et al. 2014a]
Twitter & Facebook; annotate manually, classify automatically (text + image)
18. Challenges and Future Directions
• Missing actionable insights
– Who needs help, and where
– Automatic extraction of actionable/serviceable messages
• Labeled data scarcity
– Most systems are hungry for labeled data
– More robust domain adaptation and transfer learning techniques are required
• Focus on other content types (images)
– Images contain critical information (e.g., damage)
– More focus on multimodal research is required
20. Information Summarization
Message 1: Tribhuvan international airport closed after the quake
Message 2: Airport closed after 7.9 Earthquake in Kathmandu
Summary: Tribhuvan international airport closed after 7.9 earthquake in Kathmandu.
Summaries reduce information overload
21. Key Objectives and Challenges
• Information coverage
– Capture most situational updates from data. The summary should be
rich in terms of information coverage
• Less redundant information
– Messages on Twitter contain duplicate information. Produce
summaries with less redundant but important updates
• Readability
– Twitter messages are often noisy, informal, and full of grammatical
mistakes. The aim here is to produce more readable summaries
• Real-time (online/updated summaries)
– The system should not be heavily overloaded with computations
such that by the time the summary is produced, the utility of that
information is marginal
(McCreadie et al. 2013; Aslam et al. 2013; Nenkova and McKeown 2011; Guo et al. 2013; Rudra et al. 2016)
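The coverage and redundancy objectives above can be illustrated with a small greedy extractive summarizer. This is a hedged sketch, not the method of any cited paper: it repeatedly picks the message that adds the most content words not yet covered, so near-duplicates are skipped. The stopword list, the function names, and the third sample message are assumptions made for illustration. Because it is extractive, it selects whole messages and does not merge them into one sentence as the example on slide 20 does.

```python
import re

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "in", "on", "at", "after", "of", "is", "and", "to"}


def content_words(text):
    """Return the set of lowercase non-stopword tokens; keeps numbers like 7.9."""
    tokens = re.findall(r"[a-z0-9]+(?:\.[0-9]+)?", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def summarize(messages, max_messages=2):
    """Greedily select messages that add the most uncovered content words."""
    covered = set()
    summary = []
    remaining = list(messages)
    while remaining and len(summary) < max_messages:
        best = max(remaining, key=lambda m: len(content_words(m) - covered))
        gain = content_words(best) - covered
        if not gain:
            break  # every remaining message is redundant
        summary.append(best)
        covered |= gain
        remaining.remove(best)
    return summary


messages = [
    "Tribhuvan international airport closed after the quake",
    "Airport closed after 7.9 Earthquake in Kathmandu",
    "airport closed after the quake",  # redundant with the first message
]
for line in summarize(messages, max_messages=3):
    print(line)
```

The loop stops early once no message contributes anything new, which keeps the cost low enough for frequently updated summaries on small batches.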
23. Conclusion and Future Directions
• Applied Research at its Best
– Real-world problems and challenges
– Social Media for Social Good
– Decent work on information filtering and classification (last 6-8 years)
• Social media imagery content is another potential source of information
• Labeled data scarcity problem
– No or few labeled data instances (in early hours)
– High diversity among organizations' needs
– Information needs change over time
– Domain adaptation and transfer learning techniques required
• From situational to actionable insights
– Identify requests and needs in real-time
– Triangulate missing information
– Rank them based on their urgency to help responders
24. Thank you!
For queries, questions, and datasets, contact me at: mimran@hbku.edu.qa OR @mimran15
Full survey paper:
Processing Social Media Messages in Mass Emergency: A Survey.
ACM Computing Surveys, 2015.
Recommended books:
Editor's Notes
Our goal in this paper was to survey systems, techniques, and computational models that help extract time-critical information from social media useful for emergency responders and affected communities.
For example, look at these two messages. The message on the left, collected during the recent Hurricane Harvey, asks for urgent help for an elderly person who was trapped.
The message on the right requests urgently needed baby food and medicines during a flood in Kashmir.
Before we started reading papers, we decided on three aspects that determine which papers to select. We formed several keyword searches by combining domain + topics + data sources, and ran them on several scholarly search engines.
After getting the results, two of the authors reviewed the papers and filtered out those that were not relevant. Our final set has around 184 papers.
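As a rough illustration of how such keyword searches could be assembled, the sketch below combines one term from each of the three filter groups on slide 3 and removes duplicate titles from the pooled results. The quoted-phrase query syntax, the function names, and the title normalization are assumptions; the notes do not say which search engines or query formats were actually used.

```python
from itertools import product

# Filter terms as listed on the "Survey Study Selection" slide.
DOMAIN = ["humanitarian", "disaster response", "mass emergencies"]
TOPIC = ["computing", "artificial intelligence", "machine learning"]
DATA = ["Twitter", "Facebook", "micro-blogging"]


def build_queries(domains, topics, sources):
    """Combine one term per group into a quoted-phrase query (assumed syntax)."""
    return [f'"{d}" "{t}" "{s}"' for d, t, s in product(domains, topics, sources)]


def deduplicate(titles):
    """Drop titles that differ only in letter case or whitespace, keeping order."""
    seen, unique = set(), []
    for title in titles:
        key = " ".join(title.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(title)
    return unique


queries = build_queries(DOMAIN, TOPIC, DATA)
print(len(queries), queries[0])
```

Relevance filtering was done manually by two of the authors, so that step is deliberately left out of the sketch.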
These are some numbers from a few major past disasters from 2010 to 2013 originally reported in the WSJ. There were 27 million tweets posted in 3 days after the Boston marathon bombing in 2013.
How fast do these messages arrive?
During the 2011 Japan earthquake, the highest recorded velocity, according to the Big Crisis Data book, was 72k tweets per minute.
It is not only that the velocity is high; social media also breaks stories faster than traditional channels. When a magnitude-5.8 earthquake hit Virginia in 2011, the first Twitter report from a bystander at the epicenter reached New York about 40 seconds ahead of the quake’s first shock waves (source: WSJ).
Now, with all this volume and velocity, the question is whether this Twitter activity indicates anything or is random.
According to this paper published in Science Advances, “Rapid assessment of disaster damage using social media activity,” there is a strong relationship between disaster proximity and social media activity.
In all charts, the primary plot shows results for messages with keyword “sandy” and the small chart for keyword “weather” to contrast behaviors between event-related and neutral words.
Blue represents a location farther from the disaster. Red represents a location closer to the disaster.
A: Chart A shows a sharp decline in activity as the distance between a location and the path of the hurricane increases.
B: Chart B shows the activity and the retweet fraction. The retweet rate appears to be inversely related to activity, with affected areas producing more original content.
None of the features discussed above are present for neutral words (see the insets in all panels).
-- Backup --
A: After the distance exceeds 1200 to 1500 km, its effect on the strength of response disappears. This trend may be caused by a combination of factors, with direct observation of disaster effects and perception of risk both increasing the tweet activity of the East Coast cities. Anxiety, anticipation, and risk perception evidently contribute to the magnitude of response because many of the communities falling into the decreasing trend were not directly hit or were affected only marginally, whereas New Orleans, for example, shows a significant tweeting level that reflects its historical experience with damaging hurricanes like Katrina.
C: Chart C shows content popularity. The popularity of the content created in the disaster area is also higher and therefore increases with activity as well.
Now, with all this activity on social media during disasters, can we use it to automatically detect disaster events?
We want to detect events from social media because 1) human sensors are generally fast, and 2) as we saw, tweet waves travel faster than earthquake waves.
A study published in Nature, “Quantifying Information Flow During Emergencies,” used mobile SMS and call records to predict suspicious events.
According to this study, people's actions and reactions to a disaster can be distinguished from their reactions to a non-disaster event.
G0 are users who are directly affected by the disaster
G1 are users who are contacted by G0 users
If you compare the bombing, jet scare, and plane crash with the concert event, you notice a consistent pattern across all the disaster events that is not visible in the non-disaster event.
G0 activity goes up when the disaster hits
G1 activity also goes up in an emergency, but not really in a non-emergency event
Several systems and techniques have been developed in the last few years.
Here I list a few important ones with their capabilities, e.g., event type, real-time operation, query type, spatio-temporal support, and whether they are able to identify sub-events.
Notice that most of these systems are based on burst detection, which can be misleading, especially on social media, because mundane events also generate bursts of messages.
Temporal = able to predict the time of a detected event
Spatial = able to predict the location of an event
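To make the idea of burst detection concrete, here is a minimal sketch that flags a time bin when its message count rises far above the recent average. It is an illustration only, not the algorithm of any system in the table; the window size, the z-score threshold, and the per-minute counts are assumptions.

```python
from statistics import mean, pstdev


def detect_bursts(counts, window=5, threshold=3.0):
    """Return indices whose count exceeds the trailing mean by `threshold` std devs."""
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history) or 1.0  # avoid division by zero on flat history
        if (counts[i] - mu) / sigma > threshold:
            bursts.append(i)
    return bursts


# Hypothetical messages-per-minute counts for one keyword.
counts = [10, 12, 11, 9, 10, 11, 95, 12, 10, 11]
print(detect_bursts(counts))
```

A detector like this reacts to any sudden spike, which is why recurring hashtags such as #MusicMonday or #FollowFriday can trigger it just as a real incident would.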
After an event is detected, the next step is to analyze the data. Two well-known techniques, classification and clustering, have been used for this purpose.
Here I list a number of works, with their detailed tasks.
Unfortunately, most of these systems were not developed based on stakeholders' needs. Future systems should be requirements-driven.
Information summarization is another very important step after classification.
There are two main types of summarization approaches: extractive, in which content taken verbatim from the source is used to generate summaries, and abstractive, in which new content is generated to summarize a set of documents.