5. 5 / 46
The Big Challenge
Sense making from planetary
scale geo-social data-streams
Situation recognition
6. 6 / 46
Concept recognition from multimedia data
[Figure: concept types arranged along two axes, SPACE (location unaware to location aware) and TIME (static to dynamic), for Single Media and Heterogeneous Media. Concept labels: Objects, Scenes, Events, Activities, Trajectories, Environments, Real world, Visual, Situations. Counts shown in the figure: 3.4K, 360K, 11.4K.]
7. 7 / 46
Contributions
1. Computationally define situations
2. Define a generic process for Situation
recognition
a) Situation Modeling
b) Situation Evaluation:
• E-mage + Situation Recognition Algebra
c) Personalized Alerts
3. EventShop: Web-based system for situation
evaluation
8. 8 / 46
Situations: Other definitions
• Endsley, 1988: “the perception of elements in the environment within a
volume of time and space, the comprehension of their meaning, and the
projection of their status in the near future”
• Merriam-Webster dictionary: “relative position or combination of
circumstances at a certain moment”
• McCarthy, 1969: “A situation is a finite sequence of actions.”
• Yau, 2006: “A situation is a set of contexts in the application over a period
of time that affects future system behavior”
• Dietrich, 2003: “…extensive information about the environment to be
collected from all sensors independent of their interface technology. Data is
transformed into abstract symbols. A combination of symbols leads to
representation of current situations…which can be detected”
9. 9 / 46
Situations: commonalities
• Goal Based
• Space-Time
• Future Actions
• Abstraction
• Computationally Grounded
Work | Goal Based | Space-Time | Future Actions | Abstraction | Computationally Grounded
McCarthy, 1968 X
Barwise, 1971 X X
Endsley, 1988 X X X X
Sarter, 1991 o X
Adam, 1993 X X
Dominguez, 1994 X X X X
Smith, 1995 X o X X
Steinberg, 1999 X X X o
Jeannot, 2003 X
Moray, 2004 o X
Dietrich, 2004 X X
Yau, 2006 X X X
Dostal, 2007 o X
Singh, 2009 X X X
Merriam-Webster (accessed 2012) o
This work (aim) X X X X X
o = Partial support
10. 10 / 46
Situation: Definition
• Situation: An actionable abstraction of
observed spatio-temporal characteristics.
• e.g. flu epidemic, severe asthma threat, road
congestion, wildfire, flash-mob
Properties covered: Goal Based, Space-Time, Future Actions, Abstraction, Computationally Grounded
11. 11 / 46
Overall Framework: Motivating example
[Figure: STT data, then Aggregation and Operations, then Situation Detection (1) Classification, 2) Control action), then User-Feedback]
STT data: Tweet: ‘Urrgh… sinus’; Loc: NYC; Date: 3rd Jun, 2011; Theme: Allergy
Situation Detection: Alert level = High; Date: 3rd Jun, 2011
User-Feedback: ‘Please visit nearest CDC center at 4th St immediately’
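The aggregation step in this example can be sketched in a few lines: geo-tagged reports are binned onto a grid to form an E-mage, and a cell count is mapped to an alert level. This is a minimal sketch, not EventShop's API; the `STT` record, the function names, the grid bounds, the 1-degree resolution, and the alert thresholds are all assumptions made for illustration.

```python
from collections import namedtuple

# One space-time-theme (STT) observation; field names are illustrative.
STT = namedtuple("STT", "lat lon time theme value")

def to_emage(points, theme, lat0, lon0, rows, cols, res):
    """Bin STT points of one theme onto a rows x cols grid (an E-mage).

    (lat0, lon0) is the south-west corner; res is the cell size in degrees.
    """
    grid = [[0.0] * cols for _ in range(rows)]
    for p in points:
        if p.theme != theme:
            continue
        r = int((p.lat - lat0) // res)
        c = int((p.lon - lon0) // res)
        if 0 <= r < rows and 0 <= c < cols:
            grid[r][c] += p.value
    return grid

def classify(count, low=1, high=3):
    """Map a cell count to an alert level (thresholds are assumed)."""
    if count >= high:
        return "High"
    if count >= low:
        return "Mid"
    return "Low"

reports = [
    STT(40.71, -73.99, "2011-06-03", "Allergy", 1),
    STT(40.72, -73.98, "2011-06-03", "Allergy", 1),
    STT(40.73, -73.97, "2011-06-03", "Allergy", 1),
    STT(34.05, -118.24, "2011-06-03", "Allergy", 1),
    STT(40.71, -73.99, "2011-06-03", "Flu", 1),  # other theme, ignored
]
emage = to_emage(reports, "Allergy", lat0=24.0, lon0=-125.0,
                 rows=26, cols=60, res=1.0)
nyc_level = classify(emage[16][51])  # cell covering the three NYC reports
```

Binning by theme first keeps one E-mage per theme, so later operators can combine themes cell by cell.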
12. 12 / 46
Eco-system: Situation based applications
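[Figure: eco-system for situation based applications. Components shown: Human Sensor/Wisdom source, Device Sensors, and Archives as inputs; an Event processing engine containing Spatio-Temporal aggregation and Situation detection operators; Macro situation and Personal situation; a Situation based controller with App logic; Control decisions; Alerts to a Human Sensor/Actuator; and an Analyst receiving Analysis & insights.]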
Singh, Jain: Situation based control. (Best Student Paper) IEEE Situation Management Workshop’09
Singh, Kankanhalli, Jain: Motivating contributors. (Best Paper) ACM Workshop on Social Media ’09
13. Overall framework 13 / 46
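A) Situation Modeling
B) Situation Recognition
C) Visualization, Personalization, and Alerts
i) Visualization
ii) Personalization
iii) Alerts
[Figure: a situation model (nodes v2, v3, v5, v6 with operators @, ∏, Δ) is evaluated over an STT Stream represented as E-mages to produce a Situation (classes C1, …); the Situation is combined with Personal context to give a Personalized situation, which together with Available resources drives Alerts.]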
14. 14 / 46
Design principles
• Humans as sensors
• Space + Time as fundamental axes
• Real time situation evaluation (E-mage Streams)
(a) Pollen levels (Source: Visual)
(b) Census data (Source: text file)
(c) Reports on ‘Hurricanes’ (Source: Twitter stream)
(d) Cloud cover (Source: Satellite imagery)
(e) Predicted hurricane path (Source: KML)
(f) Open shelters coverage (Source: KML)
15. 15 / 46
A) Situation Modeling
• Help domain experts externalize their internal
models of situations of interest e.g. epidemic.
• Building blocks:
• Operators
• Operands
• Wizard:
• A prescriptive approach for modeling situations using
the operators and operands
Singh, Gao, Jain: Situation recognition: An evolving problem for heterogeneous
dynamic big multimedia data, ACM Multimedia ‘12.
16. 16 / 46
Building Blocks: Operands
• Knowledge or data driven building blocks
Example operand:
• Feature: Growth rate (Flu reports)
• Data source: Twitter-Flu
• Representation level: Emage (#Reports)
• Meta-data: Thresholds (0, 50)
17. 17 / 46
Building Blocks: Operators
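Columns: Operator, Type, Supporting parameter(s). Each operator maps Data to Output.
1) Data into right representation
• Δ Transform
• ∏ Filter (supporting parameter: Spatio-temporal window or Mask)
• Aggregate
2) Analyze data to derive features
• Classification (supporting parameter: Classification method)
• @ Characterization (supporting parameter: Property required; example output: Growth Rate = 125%)
• Pattern Matching (supporting parameter: Pattern; example output: 72%)
3) Use features to evaluate situations
• Φ Learn (supporting parameter: Learning method; learns f from {Features} to {Situation})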
18. 18 / 46
Situation Modeling
Output: v ϵ {Low, Mid, High}, <USA, 5 mins, 0.01 x 0.01>
Get_components(v) {
  v = f(v1, …, vk)
  If (type = imprecise)
    • Identify learning data source, method
  ForEach (vi) {
    If (atomic)
      • Identify Data source: Type, URL, ST bounds
      • Identify highest Rep. level reqd.
      • Identify operations
    Else
      Get_components(vi)
  }
}
[Figure: example model tree with nodes v, v2 to v6, functions f1 and f2, operators @, ∏, Δ, E-mages, and data sources D1, D2, D3]
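The top-down decomposition on this slide can be sketched as a recursive walk over a model tree. This is only an illustration: the dictionary layout, the `imprecise` and `source` keys, and the particular tree (which variables feed which, and which data source backs each leaf) are assumptions, not the slide's actual model.

```python
def get_components(node, plan=None):
    """Walk a situation model top-down and list what must be specified."""
    if plan is None:
        plan = []
    # An imprecise function needs a learning data source and method.
    if node.get("imprecise"):
        plan.append(("learn", node["name"]))
    for child in node.get("inputs", []):
        if "source" in child:
            # Atomic feature: tied directly to a data source.
            plan.append(("source", child["name"], child["source"]))
        else:
            # Composite feature: keep decomposing.
            get_components(child, plan)
    return plan

# Hypothetical model: v depends on v2, v3, v4; v2 depends on v5, v6.
model = {
    "name": "v",
    "imprecise": True,
    "inputs": [
        {"name": "v2", "inputs": [
            {"name": "v5", "source": "D1"},
            {"name": "v6", "source": "D2"},
        ]},
        {"name": "v3", "source": "D2"},
        {"name": "v4", "source": "D3"},
    ],
}
plan = get_components(model)
```

The recursion stops at atomic features, which is where a domain expert would fill in the data source type, URL, space-time bounds, representation level, and operations.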
25. 25 / 46
Sample Queries
• Select E-mages of USA for theme ‘Obama’.
• ∏spatial(region=[24,-125],[24,-65]) (TEStheme=Obama)
• Identify three clusters for each E-mage above.
• kmeans(n=3) (∏spatial(region=[24,-125],[24,-65])(TEStheme=Obama))
• Show me the cluster with most interest in ‘Obama’.
• ∏value(v=1) (kmeans(n=3) (∏spatial(region=[24,-125],[24,-65]) (TEStheme=Obama)))
• Show me the speed for high interest cluster in ‘Katrina’ emages
• @speed(@epicenter(∏value(v=1) (kmeans(n=3) (∏spatial(region=[24,-125],[24,-65])
(TEStheme=Katrina)))))
• How similar is the pattern above to ‘exponential increase’?
• exp-increase(@speed(@epicenter(∏value(v=1) (kmeans(n=3) (∏spatial(region=[24,-125],[24,-65])
(TEStheme=Katrina))))))
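The nesting in these queries reads inside-out: filter each E-mage, characterize it, then characterize the resulting sequence over time. A rough sketch of that composition for the epicenter and speed steps follows. The function names, the use of grid indices instead of latitude/longitude, and the toy stream are assumptions; clustering and pattern matching are left out to keep the example short.

```python
import math

def filter_spatial(emage, r0, r1, c0, c1):
    """Keep cells inside the row/column window; zero out the rest."""
    return [[v if r0 <= r <= r1 and c0 <= c <= c1 else 0
             for c, v in enumerate(row)]
            for r, row in enumerate(emage)]

def epicenter(emage):
    """Value-weighted average (row, col) of an E-mage, or None if empty."""
    total = sum(map(sum, emage))
    if total == 0:
        return None
    r = sum(r * v for r, row in enumerate(emage) for v in row) / total
    c = sum(c * v for row in emage for c, v in enumerate(row)) / total
    return (r, c)

def speed(centers, dt=1.0):
    """Epicenter displacement between consecutive E-mages, per time step."""
    return [math.dist(a, b) / dt for a, b in zip(centers, centers[1:])]

# Toy stream of three 3x3 E-mages; the 9 in the last one lies outside the window.
stream = [
    [[0, 0, 0], [4, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 4, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 4], [9, 0, 0]],
]
window = [filter_spatial(e, 0, 1, 0, 2) for e in stream]
centers = [epicenter(e) for e in window]
speeds = speed(centers)
```

Each operator returns either an E-mage or a simple value, which is what lets the query language chain them.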
26. 26 / 46
C) Personalization and Alerts
Personalized situation: An actionable integration of a user's personal
context with surrounding spatiotemporal situation.
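[Figure: 1) Macro data-sources yield the Macro situation; 2) User data (Profile + Preferences) yield the Personal Context, which is combined with the Macro situation into the Personalized situation; 3) Resource data yield the Available resources, which are combined with the Personalized situation to produce Personalized alerts.]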
IF user Ui <is-in> (PSj) THEN <connect-to> Rk
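A rule of this form can be sketched as a small function: test whether the user falls in the personalized situation, and if so pick a resource to connect them to. Everything concrete here is assumed for illustration: the `sensitive` flag standing in for personal context, the "High" macro level as the trigger, the nearest-resource choice, and plain Euclidean distance on coordinates (a rough stand-in for real geographic distance).

```python
import math

def nearest(user_loc, resources):
    """Pick the resource closest to the user (Euclidean on lat/lon)."""
    return min(resources, key=lambda r: math.dist(user_loc, r["loc"]))

def personalized_alert(user, macro_level, resources):
    """IF user is in the personalized situation THEN connect to a resource."""
    in_situation = macro_level == "High" and user["sensitive"]
    if not in_situation:
        return None
    res = nearest(user["loc"], resources)
    return f"Connect {user['id']} to {res['name']}"

user = {"id": "U1", "loc": (40.71, -74.00), "sensitive": True}
resources = [
    {"name": "Clinic A", "loc": (40.73, -73.99)},
    {"name": "Clinic B", "loc": (40.80, -73.95)},
]
alert = personalized_alert(user, "High", resources)
no_alert = personalized_alert(user, "Low", resources)
```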
29. 29 / 46
EVENTSHOP:
Recognizing situations from web streams
30. 30 / 46
EventShop: System Implementation
• Front end:
• JavaScript (JSLinb library)
• Front-Back end Interaction
• Java servlets, Apache
• Back End
• Java
• C++ (OpenCV classes)
• Ingestion wrappers available for:
• Twitter streams, Flickr stream, CSV data, KML data, Geo-images,
MySQL data archives, Funf (mobile phone sensors)
Gao, Singh, Jain: EventShop: From Heterogeneous Web Streams
to Personalized Situation Detection and Control, ACM WebScience ‘12.
31. 31 / 46
Translation into media processing operators
Query language operator | Media processing operator | Details
1. Filter
- Spatial | Arithmetic | AND with the spatial mask
- Temporal | Arithmetic | AND with the temporal mask
- Thematic | Arithmetic | =
- Value | Arithmetic | AND, >, <, =
2. Aggregation
- Max, Min, +, -, %, * | Arithmetic | Max, Min, +, -, %, *
- NOT, OR, AND | Logical | NOT, OR, AND
- Convolution | Convolution | Convolution
3. Classification
- Predefined segments count | Segmentation | K-means
- Predefined segment boundaries | Segmentation | Thresholds
4. Characterization
i) Spatial
- Count, Min, Max, Sum, Average, Variation | Statistical | Count, Min, Max, Sum, Average
- Coverage | Arithmetic | Count
- Epicenter | Arithmetic | Weighted average
- Circularity | Convolution | Scale free convolution with known circular kernel
- Growth rate | Arithmetic | +, -, %
ii) Temporal
- Displacement, Distance, Velocity, Acceleration, Growth rate | Arithmetic | +, -, %, *
- Future estimation | Arithmetic | Multiplication with kernels based on the user's choice, e.g. linear progression, exponential growth
- Periodicity | Convolution | Autocorrelation, i.e. self convolution with a time-lagged variant
5. Pattern Matching
- Scaled Matching | Convolution | Convolution with user defined or pre-defined kernels
- Scale free Matching | Convolution, Statistical | Maxima from loops of convolution with different image sizes
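As one example of how a query operator reduces to a basic media operation, periodicity can be estimated by correlating a time series with lagged copies of itself. The sketch below assumes the series is a per-E-mage summary value (for instance a spatial sum); the function names and the toy signal are illustrative and not taken from EventShop.

```python
def autocorrelation(series, lag):
    """Correlate a mean-centred series with itself shifted by `lag`."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    denom = sum(x * x for x in centred)
    if denom == 0:
        return 0.0
    num = sum(centred[i] * centred[i + lag] for i in range(n - lag))
    return num / denom

def dominant_period(series, max_lag):
    """Return the lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1),
               key=lambda k: autocorrelation(series, k))

signal = [0, 1, 0, -1] * 6  # toy series that repeats every 4 steps
period = dominant_period(signal, 6)
```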
32. 32 / 46
Evaluations
1. Design principles
• Humans as sensors to detect real world events
2. Data representation and Situation recognition
algebra
• Expressive, computable and explicit
• Real world results
3. Framework for situation recognition
• modeling,
• situation evaluation,
• personalized alerts
33. 33 / 46
Humans as sensors
• Can social media be used to detect real world events?
S.No | Category | Event | Physical Date | Observed Temporal Peak | Physical Location | Observed Spatial Peak
1 | Politics | Health Care Bill passed | 2010-03-21 | 2010-03-21 | 38.89, -77.03 (Washington) | 41, -74
2 | Politics | California Prop 8, Trial Day 1 | 2010-01-11 | 2010-01-11 | 37.77, -122.41 (San Francisco) | 38, -122
3 | Society | Fort Hood Shootings | 2009-11-05 | 2009-11-05 | 31.13, -97.78 (Fort Hood, TX) | 33, -97
4 | Society | SeaWorld Whale Accident | 2010-02-12 | 2010-02-12 | 28.54, -81.38 (Orlando, FL) | 29, -81
5 | Sports | Winter Olympics Opening ceremony | 2010-02-12 | 2010-02-12 | 49.24, -123.11 (Vancouver) | 44, -79
6 | Sports | Baseball World Series final | 2009-11-04 | 2009-11-04 | 40.71, -74.00 (New York) | 41, -74
7 | Entertainment | Oscars | 2010-03-07 | 2010-03-07 | 34.05, -118.24 (Los Angeles) | 34, -118
8 | Entertainment | South by Southwest festival | 2010-03-12 to 2010-03-21 | 2010-03-15 | 30.26, -97.74 (Austin, TX) | 30, -98
9 | Tech. Conv. | CES 2010 | 2010-01-05 to 2010-01-07 | 2010-01-06 | 36.17, -115.13 (Las Vegas) | 34, -118
10 | Tech. Conv. | TED 2010 | 2010-02-10 to 2010-02-13 | 2010-01-10 | 33.76, -118.19 (Long Beach, CA) | 34, -118
34. 34 / 46
Data representation + Algebra
• Applications
• Business analytics
• Political event analytics
• Seasonal characteristics
• Data
• Twitter feeds archive
• Loops of location based queries for different terms
• Over 100 million tweets using ‘Spritzer’/ ‘Gardenhose’ APIs
• Flickr feeds
• API: Tags, RGB values from >800K images
• Implementation
• Matlab + Java + Python
35. 35 / 46
[Figure: business analytics example. Inputs: iPhone theme based e-mage (Jun 2 to Jun 15, 2009) and AT&T retail locations. Operators shown: Convolution with store catchment area, Aggregate (Add), Subtract, Convolution, @Spatial.Max. Intermediate results: AT&T total interest, catchment area, Under-served interest areas. Decision: Best Location is at Geocode [39, -122], just north of Bay Area, CA.]
<geoname>
<name>College City</name>
<lat>39.0057303</lat>
<lng>-122.0094129</lng>
<geonameId>5338600</geonameId>
<countryCode>US</countryCode>
<countryName>United States</countryName>
<fcl>P</fcl>
<fcode>PPL</fcode>
<fclName>city, village,...</fclName>
<fcodeName>populated place</fcodeName>
<population/>
<distance>1.0332</distance>
</geoname>
36. 36 / 46
Seasonal characteristics analysis
• Fall colors in New England
• Show me the difference between red and green colors for New
England region, as it varies throughout the year.
• subtract(@spatial(sum)(πspatial(R=[(40,-76), (44,-71)]) (TEStheme=Red)),
@spatial(sum)(πspatial(R=[(40,-76), (44,-71)])(TEStheme=Green)))
[Plot: difference between red and green, x-axis from Jan to Dec, with a zero reference line]
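The query above amounts to summing each colour's E-mage over the region and subtracting, once per time step. A minimal sketch follows; the function names are assumptions, and the small grids are made-up toy values rather than the Flickr data used in the study.

```python
def spatial_sum(emage):
    """@spatial(sum): add up all cell values of one E-mage."""
    return sum(sum(row) for row in emage)

def red_minus_green(red_stream, green_stream):
    """Per time step, subtract the green total from the red total."""
    return [spatial_sum(r) - spatial_sum(g)
            for r, g in zip(red_stream, green_stream)]

# Toy per-month E-mages, already filtered to the region of interest.
months = ["Jan", "Jul", "Oct"]
red = [[[1, 1], [1, 1]], [[1, 2], [1, 1]], [[6, 7], [5, 6]]]
green = [[[1, 1], [1, 0]], [[8, 9], [7, 8]], [[2, 3], [2, 2]]]
difference = dict(zip(months, red_minus_green(red, green)))
```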
37. 37 / 46
Building applications using the framework
S.No | Application | Data used | Application deployed? | Scale | Data modalities | Operators used
1 | Wildfire detection in California | Real | Yes | Macro | Satellite data, Google insights | F, A, Ch
2 | Hurricane monitoring | Simulated | No | Macro | n/a | F, A, Ch, P
3 | Flu epidemic surveillance | Real | No | Macro | Twitter, Census | F, A, C
4 | Allergy/Asthma recommendation | Real | In-progress | Macro, Personalized alerts | Twitter, Air Quality, Pollen Count | F, A, C
5 | Thailand flood mitigation | Real | Yes | Macro, Personalized alerts | KML | F, A, C
Legend:
F = Filter,
A = Aggregate,
C = Classification,
Ch = Characterization,
P = Pattern Matching
38. 38 / 46
Wildfire recognition model (Satellite data)
Output: Fire detector (Satellite driven) ϵ {fire, non-fire}, <California, 24hrs, 0.01 x 0.01>
[Figure: model tree. The output is the AND of three tests: Unclouded?, Hot enough?, and Significant band variation?. Significant band variation? is itself the AND of Absolute value variation and Spatial neighbor variation. Thresholds (∏) shown in the figure: 392, 310, 30, 5. E-mages shown: 12 µm band temp., Mid IR surface temp., 4 µm temperature, 11 µm temperature. Derived values: Difference value (Subtract), Neighborhood mean value (Convolve 7X7), Spatial neighborhood difference (Subtract). Data sources: Satellite Band 4 and Satellite Band 12 from LAADS.com, <California, 24hrs, 0.01 x 0.01>, each converted to an E-mage with Δ.]
39. 39 / 46
Wildfire recognition model (Social data)
Output: Fire detector (Social) ϵ {fire, non-fire}, <California, 24hrs, 0.01 x 0.01>
[Figure: model tree. The output is the AND of two thresholded (∏) tests, Spatially anomalous and Temporally anomalous; thresholds shown: 5 and 7.
Spatially anomalous: Difference with other areas today = Subtract(Emage (Google Insights-Fire), Spatial Avg. of Interest).
Temporally anomalous: Difference with Historical average = Subtract(Emage (Google Insights-Fire), Emage (Google Insights-Historical Avg)), where the historical E-mage is an Average.
Data source for every branch: Google Insights-Fire from Google.com/insights, <California, 24hrs, Metros>, converted to an E-mage with Δ.]
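The social detector combines two thresholded differences with a logical AND. A rough sketch is below. It assumes each E-mage is a small grid of search-interest values, that "anomalous" means strictly exceeding the threshold, and that 5 applies to the spatial test and 7 to the temporal one; the slide does not make that assignment unambiguous, so treat the defaults as placeholders.

```python
def spatially_anomalous(today, thresh):
    """Cells whose interest exceeds today's spatial average by more than thresh."""
    cells = [v for row in today for v in row]
    avg = sum(cells) / len(cells)
    return [[(v - avg) > thresh for v in row] for row in today]

def temporally_anomalous(today, historical_avg, thresh):
    """Cells whose interest exceeds their own historical average by more than thresh."""
    return [[(t - h) > thresh for t, h in zip(rt, rh)]
            for rt, rh in zip(today, historical_avg)]

def fire_detector_social(today, historical_avg, s_thresh=5, t_thresh=7):
    """AND the two anomaly masks and label each cell."""
    s = spatially_anomalous(today, s_thresh)
    t = temporally_anomalous(today, historical_avg, t_thresh)
    return [["fire" if a and b else "non-fire" for a, b in zip(rs, rt)]
            for rs, rt in zip(s, t)]

# Toy interest values for four metro cells.
today = [[10, 12], [11, 40]]
historical = [[9, 10], [10, 12]]
labels = fire_detector_social(today, historical)
```

Requiring both tests guards against cells that are always busy (spatially high but not unusual for that place) and against region-wide spikes (temporally high everywhere).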
40. 40 / 46
Wildfire recognition
Situation Modeling: Fire detector ϵ {fire, non-fire}, <California, 24hrs, 0.01 x 0.01> = Fire detector (Social) OR Fire detector (Satellite)
Situation Evaluation
Results: [Bar chart: number of wildfires detected in 2010, 2011, and Total; series: Social detector, Satellite detector, Combined, Ground truth; y-axis from 0 to 50]
43. 43 / 46
Social Life Networks
Connecting People and Resources
Situation aware routing
[Figure: Information Aggregation and Composition, Situation Detection, Alerts, Queries]
Jain, Singh, Gao: Social Life Networks for the Middle of the Pyramid, ACM Workshop on Social Media Engagement ‘11.
44. 44 / 46
Related Work: Snapshot
Area | Combine hetero data | Human sensors | Data analytics | Define situations | Location aware | Real-time streams | Toolkits
Situation X X o o X
Awareness X
Situation X
Calculus
X
Web data o X X o X
mining
X X
Social media o X X o X
mining X X X
Multimedia X o o o
Event detection X
Complex event X X o X
processing/ X X
Active DB
GIS XX o X
X XX o
Mashup toolkits X X o X X
(Y! pipes, ifttt) X
This work X X X X X X X
o = partial support
45. 45 / 46
Future work
• EventShop:
• Personalization
• Scalability
• Prediction
• Using such tools to nudge people into taking
desired actions
• Supporting Grids and Graphs for analysis
• Social Life Networks
46. 46 / 46
Summary
• Personalized Actionable Situations
• 1st Systematic approach
• Situation Modeling
• EventShop: Web based system for
Situation Evaluation
• Apps: Democratize data and action taking
• Eco-system for data-to-action
49. 49 / 46
Analyzing Big Data
Field/Approach | Databases | Networks | Spatio-temporal
Data structure | Tables | Graphs | Grids
Apps | Business records, Banking | Internet traffic, Social network, Roads | Healthcare, Disaster relief, Business, Security
Problems | Querying, Searching | Shortest path, influence, anomaly | Situation detection
Operators | Select, Project, Join | Diameter, influence detection, connected components | Select, Aggregate, ST characterization, ST pattern matching, Classification
Modeling | ER modeling, Query plan | Network diagrams, PetriNets | Situation models
Tools | SQL server, Oracle DBMS | NS2, NetworkX | EventShop
50. 50 / 46
Geo-Social Power Laws
• Studied 5.6 Million Tweets for a month
• There is a fixed relative ratio for the occurrence of events
of different magnitude across space or time.
[Plots: Log(Rank) versus Log(Magnitude). Across Space: Whole world, Only USA, Around New York city. Across Time: 1 month, 3 weeks, 2 weeks, 1 week, 1 day, 30 mins.]
Singh, Jain: Structural Analysis of Emerging Event-Web, (Short Paper)
World Wide Web Conference‘10.
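A fixed relative ratio between event magnitudes shows up as a straight line on a log-log rank plot, so one way to check for it is to fit the slope of log(rank) against log(magnitude). The sketch below is an assumption about how such a check could be done, not the analysis from the cited paper; the function name and the toy magnitudes are illustrative, and all magnitudes must be positive.

```python
import math

def rank_magnitude_slope(magnitudes):
    """Least-squares slope of log(rank) against log(magnitude)."""
    ordered = sorted(magnitudes, reverse=True)  # rank 1 = largest event
    xs = [math.log(m) for m in ordered]
    ys = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy magnitudes that follow an exact power law: magnitude = 100 / rank.
toy = [100.0 / r for r in range(1, 11)]
slope = rank_magnitude_slope(toy)
```

Comparing slopes fitted on different spatial extents or time windows would indicate whether the same ratio holds across scales.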
51. 51 / 46
Situation Modeling
• A conceptual step before physically
implementing situation detection filters
• Analogy: E/R modeling, UML
• Helps domain experts externalize
concepts e.g. ‘Epidemic’
53. 53 / 46
Queries
• Seasonal characteristics
• Show me the segments based on
average greenery, as they vary
over the year.
• kmeans(n=3)(∏temporal(t>1293840)(TEStheme=‘green’))
• Political event analytics
• Show me the difference of
interests in Personalities (p1, p2) in
places where H is an issue.
• mult(diff(TEStheme=p1,TEStheme=p2),
thresholds(30)(TEStheme=H))
p1=Obama, p2=Romney, H=Guns,
Aug 9, 2012, via EventShop
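The political analytics query combines three E-mages cell by cell: the difference in interest between two personalities, masked by where the issue H is prominent. The sketch below assumes `thresholds(30)` yields a binary mask of cells at or above 30, which is an interpretation rather than a documented definition; the function names mirror the query and the grids are made-up toy values.

```python
def diff(a, b):
    """Cell-wise a - b for two E-mages of the same size."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def thresholds(t, emage):
    """Binary mask: 1 where the cell value is at least t, else 0."""
    return [[1 if v >= t else 0 for v in row] for row in emage]

def mult(a, b):
    """Cell-wise product of two E-mages of the same size."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Toy E-mages: interest in p1, interest in p2, and interest in issue H.
p1 = [[50, 20], [10, 40]]
p2 = [[30, 25], [10, 10]]
h = [[35, 10], [45, 31]]
result = mult(diff(p1, p2), thresholds(30, h))
```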
This work combined efforts from a stream data processing perspective with situation recognition from a media processing perspective, so parts of it were done in collaboration with Mingyan. She approached the problem from the stream data processing perspective, while I focused on defining situations as a concept and on their recognition. Our joint work focused on 2b) Situation Evaluation and 3) the EventShop implementation.