1. Event Mining in Social Multimedia
Supervised Learning and Clustering Approaches
Symeon Papadopoulos
Information Technologies Institute (ITI)
Centre for Research & Technologies Hellas (CERTH)
Workshop on Event-based Media Integration and Processing
Barcelona, 21-22 October 2013
4. Pope Benedict
2007: iPhone release
2008: Android release
2010: iPad release
Pope Francis
http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/
ACM Multimedia > EBMIP 2013
5. news / personal / entertainment
• news: demonstration / riot / speech
• personal: wedding / birthday / drinks
• entertainment: concert / play / sports
6. event multimedia holds value
• archiving/story-telling (personal use)
• news & media (journalists, editors)
• promotional material (organizers, artists)
• marketing (sponsors, advertisers)
7. event multimedia lifecycle
• PRE: announcement
– promotional material shared online
• DURING: happening
– attendants capture the event (photos/videos)
• POST: indexing & replay
– attendants share & comment on event content
– annotation (tagging)
– search > replay / reuse
COMMODITIZATION OF MEDIA CAPTURING & SHARING > EXPLOSIVE GROWTH OF EVENT MEDIA
EVENT MEDIA INDEXING & REPLAY TECHNOLOGIES BARELY COPE!
8. event media indexing wish list
• automatic: ideally parameter-free or with intuitive
parameters
• fast: casual users are impatient, professional users
need quick results
• scalable: possible to apply in very large collections
• serendipitous: discover non-obvious (long tail)
event multimedia
10. multimedia event detection
event detection is the automatic organization of a multimedia
collection C into groups of items, each of which corresponds
to a distinct event.
COLLECTION > EVENT DETECTION > EVENT SET (E1, E2, ..., EN)
11. event detection variants
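two questions: are we interested in all events? do all input images depict events?
• interested in all events: YES (discovery mode)
– all input images depict events: partitioning
– not all input images depict events: filter media + clustering
• interested in all events: NO (detection mode)
– all input images depict events: clustering + filter events
– not all input images depict events: filter media + clustering + filter events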
12. variant 1
• all input media items depict events
• all possible output events are of interest
• scenario: personal/professional collection consisting
solely of events > need for automatic organization
• approach: produce a partitioning (non-overlapping
clusters that cover the full set of media items) of the
input collection into events
13. variant 2
• input media items may depict anything
• all possible output events are of interest
• scenario: media collected from the Web > discovery
of interesting event media content
• approach: (a) filter non-event media items > use
approach of variant 1, (b) cluster media items
(hoping that resulting clusters will be purely event or
non-event) and filter non-event clusters
14. variant 3
• all input media items depict events
• not all possible output events are of interest
• scenario: personal/professional collection of event
content > retrieval of target events
• approach: cluster media items into events and filter
based on desired event attributes (e.g. location,
type, etc.)
15. variant 4
• input media items may depict anything
• not all possible output events are of interest
• scenario: media collected from the Web > retrieval
of target events
• approach: (a) approach of variant 2a + filter events
by desired attributes, (b) approach similar to 2b, but
filter out not only non-event clusters but also
non-interesting event clusters
16. prevalent problems
• clustering
– group media items into events
• cluster classification
– does a particular cluster represent an event? if so, what
type of event does it represent?
• media item classification
– does a media item depict an event? what type?
17. how to tackle them?
we are going to explore two paradigms:
• unsupervised clustering + cluster classification >
variant 2 + variant 4
by Quack et al., CIVR2008
[extended by Papadopoulos et al., Multimedia 2011]
• supervised clustering > variant 1 + variant 3
by Reuter et al., ICMR2012
[extended by Petkos et al., ICMR2012/MMM2014]
22. limitations
• applicable only to geotagged images
– assumes quite accurate positioning ~100m
• dissimilarity matrix computation is expensive!
– hard to scale to sets much larger than 10,000
• homography mapping expensive (due to feature-feature matching)
• cluster classification sensitive to clustering results (if
a landmark cluster is split into two smaller ones, it
may be incorrectly classified as event)
23. extension
Papadopoulos et al. Multimedia 2011
• city-based image collection (does not require considerable
geotagging accuracy)
• construction of hybrid image similarity graph
– visual: SIFT + BoW + top-20 + median similarity filtering
– text: two options
• cheap: cooccurrence frequency (exclude frequent tags) + filtering
• costly: tag occurrence vectors > LSI > low-dimensional vectors > top-K
• graph clustering: SCAN (Xu et al., KDD2007)
• cluster classification
– two features + two tag-based features + SVM/kNN
• cluster naming
– frequent tag sequence mining (from titles)
24. graph clustering :: SCAN
[figure: example graph with a hub, a (μ,ε)-core, structural similarity and an outlier]
• resilient to spurious links (e.g. visual links that connect
unrelated images)
• very fast (scales linearly to the number of edges)
• leaves under- and over-connected items out of the clustering
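A minimal sketch of the structural similarity that SCAN (Xu et al., KDD2007) uses to judge links, assuming an unweighted graph stored as a dict of neighbour sets. The toy graph, the function names and the values eps=0.7 and mu=3 are illustrative assumptions, not settings from the talk.

```python
import math

def structural_similarity(adj, u, v):
    """Shared closed neighbourhood of u and v, normalised by the geometric mean of their sizes."""
    nu = adj[u] | {u}  # closed neighbourhood: neighbours plus the node itself
    nv = adj[v] | {v}
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))

def cores(adj, eps, mu):
    """Nodes whose eps-neighbourhood (including themselves) has at least mu members."""
    result = set()
    for u in adj:
        eps_neighbourhood = {
            v for v in adj[u] | {u}
            if structural_similarity(adj, u, v) >= eps
        }
        if len(eps_neighbourhood) >= mu:
            result.add(u)
    return result

# Toy graph (assumption): two triangles joined by one spurious link c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}

weak_link = structural_similarity(adj, "c", "d")    # few shared neighbours
strong_link = structural_similarity(adj, "a", "b")  # identical neighbourhoods
core_nodes = cores(adj, eps=0.7, mu=3)
```

In this toy graph the link c-d scores lower than the links inside each triangle, which is the property that makes the clustering resilient to spurious links.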
25. tag-based cluster features
• manually label clusters as “landmarks” or “events”
• aggregate tags of contained images and derive corresponding
tag profiles*
[figure: EVENT and LANDMARK tag profiles]
• for a new cluster compute number of contained tags in each
of the two profiles > two additional features
* could be city-specific or global
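The two tag-based features can be sketched as follows, assuming each cluster is a list of per-image tag sets and that a profile is simply the set of the most frequent tags of the labelled clusters. The `top_n` cut-off, the function names and the toy data are assumptions for illustration; the talk does not specify how profiles are truncated.

```python
from collections import Counter

def build_profile(labelled_clusters, top_n=100):
    """Aggregate the tags of all images in the labelled clusters; keep the top_n most frequent."""
    counts = Counter(
        tag
        for cluster in labelled_clusters
        for image_tags in cluster
        for tag in image_tags
    )
    return {tag for tag, _ in counts.most_common(top_n)}

def tag_profile_features(cluster, event_profile, landmark_profile):
    """Number of the cluster's distinct tags found in each of the two profiles."""
    cluster_tags = {tag for image_tags in cluster for tag in image_tags}
    return len(cluster_tags & event_profile), len(cluster_tags & landmark_profile)

# Toy labelled data (assumption): one "event" cluster and one "landmark" cluster.
event_profile = build_profile([[{"concert", "live"}, {"concert", "stage"}]])
landmark_profile = build_profile([[{"cathedral", "gothic"}, {"cathedral"}]])

new_cluster = [{"concert", "festival"}, {"stage", "cathedral"}]
features = tag_profile_features(new_cluster, event_profile, landmark_profile)
```

The resulting pair of counts would be appended to the other cluster features before classification.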
26. caveats
• graph construction may affect results
– k-nn versus ε-nn, parameter selection
– modality combination (in our case very simplistic)
• graph clustering
– does not take into account weights
– sometimes it leaves out of the clusters far too many items
• cluster classification
– sensitive to cluster granularity (e.g. fragmented clusters are very
challenging since first two features are misleading)
• cluster naming
– unreliable for small clusters, depends a lot on contained items (quality
of metadata, text language)
28. supervised event detection
• rationale: use a large number of “known” event
assignments to “learn” how to classify new content
into events
two main paradigms
• item-to-cluster: learn whether a new item belongs to
a given event cluster or not
• item-to-item: learn whether two items belong to the
same event cluster or not
30. supervised clustering
Reuter et al., ICMR2012
• blocking
– six database queries to retrieve 330 nearest events in terms of:
capture time (200), upload time (50), geo-location (20),
tag/title/description similarity (20/20/20)
• new image-candidate event pair described by nine features
– temporal similarity (upload+capture), proximity (Haversine formula),
tag/title/description similarity using cosine and BM25
• same event classification and clustering
– SVM used to rank candidate events (from blocking) based on
probability that new image belongs to them + second classifier (SVM)
to decide whether new image should start a new event (separate
features, incl. first SVM prediction scores + time difference)
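The geographic proximity feature relies on the Haversine formula. A minimal sketch, assuming coordinates in decimal degrees and a mean Earth radius of 6371 km (the function name and the example coordinates are illustrative):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption: spherical Earth)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Approximate city-centre coordinates (assumption, for illustration only).
barcelona = (41.3851, 2.1734)
rome = (41.9028, 12.4964)
distance_km = haversine_km(*barcelona, *rome)
```

A distance like this would typically be turned into a similarity (or used directly) as one of the nine features of an image-event pair.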
31. limitations
• simplistic treatment of missing metadata
– similarity is set to 0 when metadata (e.g. geo-location)
is missing > could be misleading when the two items
would actually be similar if that information were available
• for some features, representing an event by a proxy
(using centroids for aggregation) might not be rich
enough, e.g. in cases of geo-location
– this is a general characteristic of item-to-cluster methods
• does not make use of visual content
– makes approach faster at the expense of missing some
associations that might only surface in the form of visual
similarity (e.g. when metadata are of poor quality)
32. extension
Petkos et al., ICMR2012/MMM2014
• blocking
– similar to Reuter et al. 2012 (except that it retrieves most similar
images, not events) but also includes visual similarity (VLAD + Product
Quantization) [MMM2014]. Up to 350* similar images are retrieved.
• image-image pair described by 11 similarity values:
– uploader (0/1), image (GIST and SURF+VLAD), text (same as in Reuter
et al., 2012), quantized time difference, geodesic distance (in km)
– two separate classifiers are trained, one when both images have
location information, and one when either of the two does not
• clustering
– a same-event graph is constructed based on the predictions of the
classifiers
– graph clustering is carried out in two flavours: batch (by use of SCAN)
and online by use of QCA (Nguyen et al., 2011) [MMM2014]
* in practice much lower (~100-200) due to overlap between candidates from different similarities
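A minimal sketch of how a same-event graph could be built from pairwise predictions, assuming a hypothetical `same_event_prob(a, b)` callable standing in for the trained classifiers and a decision threshold of 0.5. For brevity the grouping step here uses plain connected components, which is a simpler stand-in and not the SCAN or QCA clustering described above.

```python
def same_event_graph(candidate_pairs, same_event_prob, threshold=0.5):
    """Link two images when the (hypothetical) classifier says they share an event."""
    adj = {}
    for a, b in candidate_pairs:
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if same_event_prob(a, b) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def connected_components(adj):
    """Group nodes reachable from each other (stand-in for graph clustering)."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adj[node] - seen)
        components.append(component)
    return components

# Toy classifier (assumption): images whose ids share a prefix belong together.
def toy_prob(a, b):
    return 1.0 if a[:2] == b[:2] else 0.0

pairs = [("e1_a", "e1_b"), ("e1_b", "e2_a"), ("e2_a", "e2_b")]  # from blocking
graph = same_event_graph(pairs, toy_prob)
events = connected_components(graph)
```

Only pairs returned by blocking are scored, which keeps the number of classifier calls far below the number of all possible image pairs.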
33. online clustering of same-event graph
QCA maintains community structure incrementally following
graph change operations (node & edge addition; removal
operations are not applicable in the same-event graph), based
on the concept of community attraction forces
[figure: communities Cu, Cz, Cw; nodes A, B, C, D, X; new node; new edge; force from Cu to Cz; force from Cz to Cu]
• Depending on a test (computed based on local
graph structure), community structure could
remain the same, X assigned to Cu or A to Cz.
• If A is assigned to Cu, all its neighbours will be
checked for potential reassignment.
34. caveats
• the method requires maintaining the same-event
graph in-memory
– starts becoming hard to apply in collections bigger than
some hundreds of thousands of images
– in general, item-to-item event detection methods are less
scalable compared to item-to-cluster > potential solution
by use of graph databases
• in batch mode, the use of SCAN leads to images
being excluded from clusters
– variants of the algorithm to make it partitional if necessary
(by assigning hubs & outliers to adjacent clusters)
36. how to evaluate?
• different approach depending on problem variant
• for variants 2 and 4, it is hard to create ground truth
(since we are interested in all possible events)
– implicit measures of cluster goodness
– user-based
• for variants 1 and 3, it is possible to collect or create
comprehensive ground truth
– MediaEval
37. case study:
landmark & event discovery in Barcelona
38. dataset
Geo-query to flickr API with centre in Barcelona (2010)
• 207,750 photos by 7,768 users
• tag pre-processing, removing:
– very short and very long tags
– tags consisting of alphanumeric characters (e.g. camera models)
– tags from a blacklist (e.g. “geotagged”)
• 33,959 tags > 173,825 photos with at least one of them
• remove tags used in more than 350 photos (e.g. “Barcelona”,
“Catalunya”) > 120,742 photos with at least one of them
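The tag pre-processing steps can be sketched as below, assuming each photo is represented by a set of tags. The length limits (`min_len`, `max_len`) and the reading of "alphanumeric" as "mixing letters and digits" are assumptions; only the 350-photo frequency cut-off comes from the slide. The toy call uses a smaller cut-off so the effect is visible.

```python
from collections import Counter

def clean_tags(photos, blacklist, min_len=3, max_len=20, max_photos=350):
    """Filter per-photo tag sets; min_len and max_len are illustrative assumptions."""
    def keep(tag):
        if not (min_len <= len(tag) <= max_len):
            return False  # very short or very long tag
        has_digit = any(c.isdigit() for c in tag)
        has_alpha = any(c.isalpha() for c in tag)
        if has_digit and has_alpha:
            return False  # mixed letters and digits, e.g. camera models
        return tag not in blacklist

    filtered = [{t for t in tags if keep(t)} for tags in photos]
    # Count in how many photos each surviving tag appears, then drop frequent ones.
    usage = Counter(t for tags in filtered for t in tags)
    return [{t for t in tags if usage[t] <= max_photos} for tags in filtered]

photos = [
    {"barcelona", "sagradafamilia", "d80", "geotagged"},
    {"barcelona", "concert"},
    {"barcelona", "x"},
]
cleaned = clean_tags(photos, blacklist={"geotagged"}, max_photos=2)
```

Photos left with an empty tag set after this step would drop out of the tag-based part of the similarity graph.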
39. implicit evaluation of clustering quality
• perform the clustering without making use of location
information, and then measure how coherent the resulting
clusters are > measure of quality (i.e. tight clusters > more
likely to not contain irrelevant images)
• we call the measure GCC, Geospatial Cluster Coherence
[table: mean and std of GCC for SCAN graph clustering and k-means data clustering]
40. user-based evaluation
• random selection of 33 visual and 40 tag-based clusters (from SCAN) and
corresponding k-means clusters (based on member sets overlap)
• each cluster was presented to two independent evaluators and they were
asked to mark (in a Web UI) the images that were not perceived as
relevant > P, R* (and F) + κ-statistic
• we call this SCQ, Subjective Cluster Quality
• in a second study, we compared visual, tag & hybrid clusters (all from
SCAN) > hybrid were found to have an F-score 28.5% higher than visual
and 19.8% higher than tag-based
* this is a pseudo-recall, computed by pooling “correct” images from all methods together
44. a bit of background...
• MediaEval
– well-known benchmarking activity since 2010 (started as
VideoCLEF in 2008)
– consists of several tasks dedicated to specific challenges
• social event detection (SED)
– first run in 2011 (7 participants)
– this year was the third edition of the task, with slightly
different challenge definitions and increased participation
(11 participants)
45. task definition & dataset
• 2011 collection: 73,645 flickr photos from five cities, May 2009
variant 4: find events related to two target categories
> soccer matches in Barcelona and Rome
> concerts in venues Paradiso and Parc del Forum
• 2012 collection: 167,332 flickr photos from five cities, 2009-2011
variant 4: find events related to three target categories
> technical events (e.g. exhibitions, fairs) in Germany
> soccer events in Hamburg and Madrid
> Indignados movement in Madrid
• 2013 collection 1: 437,370 flickr photos + 1,327 YouTube videos
collection 2: 57,165 Instagram photos
variant 1: cluster collection 1 into events (attach YouTube videos to them)
categorize collection 2 images into eight event types or non-event
46. sed2012: evaluation setup
• approach by Petkos et al., MMM2014
– method designed for event detection as in variant 4 > used
only 7,779 photos belonging to events in order to assess
clustering quality (=Normalized Mutual Information, NMI)
• ground truth: photos clustered around 149 events
(18 technical, 79 soccer, 52 Indignados)
• assess the following aspects:
– accuracy of same-event classification
– compare clustering quality between item-to-cluster and
the two versions of item-to-item (batch & incremental)
– measure contributions of different features
– study generalization abilities of same event model
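Clustering quality is reported as Normalized Mutual Information. A minimal sketch of one common definition, assuming normalisation by the arithmetic mean of the two entropies (the talk does not state which normalisation was used); the function name and toy labels are illustrative.

```python
import math
from collections import Counter

def nmi(labels_true, labels_pred):
    """Mutual information of two labelings divided by the mean of their entropies."""
    n = len(labels_true)
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    joint_counts = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    mutual_info = sum(
        c / n * math.log((c / n) / ((true_counts[t] / n) * (pred_counts[p] / n)))
        for (t, p), c in joint_counts.items()
    )
    denominator = (entropy(true_counts) + entropy(pred_counts)) / 2
    # Both labelings put everything in one cluster: treat as perfect agreement.
    return 1.0 if denominator == 0 else mutual_info / denominator

same_partition = nmi([0, 0, 1, 1], ["a", "a", "b", "b"])  # only the names differ
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])               # labelings are independent
```

The score depends only on how items are grouped, not on the cluster names, so detected events do not need to be matched to ground-truth events by id.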
47. sed2012: SE accuracy & clustering quality
• same event classification accuracy 98.58% (SVM)
– 10K pos/neg training, 10K pos/neg testing (random)
• clustering quality (NMI): 30/119 training/testing events [10 random splits]
– incremental same or better than batch
– item-to-item better than item-to-cluster (significant at 0.95 confidence)
        BATCH    INCREMENTAL   ITEM-TO-CLUSTER
AVG     0.924    0.934         0.898
STD     0.019    0.021         0.027
• when non-event photos enter the dataset, NMI degrades quickly*
NON-EVENT   BATCH     INCREMENTAL   ITEM-TO-CLUSTER
5%          0.4824    0.5164        0.3954
10%         0.3421    0.3683        0.2899
* In the second table, results were obtained using sed2011 for training and sed2012 for testing.
48. sed2012: contribution of features
• same experiments using limited sets of features
FEATURES        BATCH              INCREMENTAL
VISUAL          0.8020 ± 0.0193    0.8179 ± 0.0151
TEXTUAL         0.7925 ± 0.0255    0.7792 ± 0.0310
VISUAL+TIME     0.9244 ± 0.0195    0.9360 ± 0.0183
TEXTUAL+TIME    0.9016 ± 0.0173    0.9049 ± 0.0209
• repeating the same experiments without the use of
blocking led to significantly worse results
– e.g. 0.030 for visual, 0.7148 for textual
• time is an extremely important feature
49. sed2012: generalizing same event model
• train using one event type > test on a different one
• in most cases negative impact
• in few cases, performance is very high!
BATCH         soccer    technical   Indignados
soccer        -         0.8658      0.8494
technical     0.7967    -           0.8977
Indignados    0.9645    0.8456      -

INCREMENTAL   soccer    technical   Indignados
soccer        -         0.8892      0.8667
technical     0.7661    -           0.7735
Indignados    0.9845    0.8482      -
50. sed2013 (just a couple of days ago!)
• challenge 1 (full clustering into events)
– modified version of the method by Petkos et al. MMM2014:
a post-processing step assigns hubs & outliers (from SCAN) to
detected events (different variations used in different runs)
– median performance (compared to other teams)
ex. results: NMI = 0.9131, F = 0.7031, divergence = 0.6367
• challenge 2 (classification into event types)
– method based on combining VLAD/PCA + tags/pLSA and
Approximate Laplacian Eigenmaps (Mantziou et al., 2013)
– median performance (compared to other teams)
ex. results: F1 = 0.3344, F1 div. = 0.2261,
F1 (E/NE) = 0.7163, F1 div. (E/NE) = 0.2157
51. evaluation: main caveat
• creation strategy of benchmark dataset can
dramatically affect how hard (or easy) the problem is
– if events are very sparsely distributed over time, then a
simple time-based clustering could be sufficient
– if events correspond to users one-to-one, then a simple
user-based look-up could yield very high accuracy
– using the same source for training/testing makes it easy
• need to explore new challenging settings
– multiple sources of multimedia
– huge amounts of non-event content
– very dense coverage of feature space by test events
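To illustrate the first caveat, here is a minimal sketch of the kind of "simple time-based clustering" that could already do well on a benchmark where events are sparsely distributed over time. It assumes capture times as numeric timestamps in seconds and a hypothetical `max_gap` threshold; neither the function nor the one-hour value comes from the talk.

```python
def time_gap_clusters(timestamps, max_gap):
    """Sort items by time and start a new cluster whenever the gap exceeds max_gap."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    clusters = []
    for i in order:
        if clusters and timestamps[i] - timestamps[clusters[-1][-1]] <= max_gap:
            clusters[-1].append(i)  # close in time to the previous item
        else:
            clusters.append([i])    # large gap: assume a new event starts
    return clusters

# Toy capture times in seconds (assumption): two bursts far apart in time.
capture_times = [0, 10, 5000, 5020, 20]
baseline_events = time_gap_clusters(capture_times, max_gap=3600)
```

If such a baseline scores close to a full multimodal method on a dataset, the dataset is probably too easy to discriminate between approaches.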
53. the many faces of event detection
• event detection in multimedia can be formulated in
different ways
– we examined four variants
– essentially a combination of clustering & classification
• depending on the setting, unsupervised clustering or
supervised learning are valid options for tackling the
problem
• presented two frameworks (+extensions) for
different variants of the problem
• discussed different evaluation strategies & datasets
54. related research problems
• event crawling
– where to look for content that is likely related to events?
– what kind of queries to formulate?
• event search & recommendation
– assume a very large index of events
– what to retrieve?
• event summarization
– have found & indexed many photos for an event
– how/what to present?
55. holy grail for event detection
• query with event name
• obtain a summary of relevant media from different
sources (twitter, facebook, google+, flickr, ...)
• drill down into sub-events
• event analytics/statistics
• recreate considerable part of event experience from
indexed media content + data
56. Special Issue
• Social Multimedia and Storytelling: using social media for
capturing, mining and recreating experiences, events and
places
– place- and event-centric social multimedia discovery and collection;
– social event detection;
– real-world place and event mining and analytics;
– place and event summarization through social content;
– ...
• editors:
– Pablo Cesar, Ayman Shamma, Aisling Kelliher, Ramesh Jain, me
• expected submission date: July 1st 2014
• call for papers not yet online (coming soon)
59. references (i)
• Quack, T., Leibe, B., & Van Gool, L. (2008). World-scale mining of objects
and events from community photo collections. In Proceedings of the 2008
international conference on Content-based image and video retrieval (pp.
47-56). ACM.
• Papadopoulos, S., Zigkolis, C., Kompatsiaris, Y., & Vakali, A. (2011).
Cluster-based landmark and event detection on tagged photo
collections. IEEE Multimedia 18(1), (pp. 52-63)
• Reuter, T., & Cimiano, P. (2012, June). Event-based classification of social
media streams. In Proceedings of the 2nd ACM International Conference
on Multimedia Retrieval (p. 22). ACM.
• Petkos, G., Papadopoulos, S., & Kompatsiaris, Y. (2012). Social event
detection using multimodal clustering and integrating supervisory signals.
In Proceedings of the 2nd ACM International Conference on Multimedia
Retrieval (p. 23). ACM.
60. references (ii)
• Petkos, G., Papadopoulos, S., Schinas, M., & Kompatsiaris, Y. (2014).
Graph-based Multimodal Clustering for Social Event Detection in Large
Collections of Images. In Proceedings of the 20th international conference
on Multimedia Modeling, to appear.
• Xu, X., Yuruk, N., Feng, Z., & Schweiger, T. A. (2007). SCAN: a structural
clustering algorithm for networks. In Proceedings of the 13th ACM SIGKDD
international conference on Knowledge discovery and data mining (pp.
824-833). ACM.
• Nguyen, N. P., Dinh, T. N., Xuan, Y., & Thai, M. T. (2011). Adaptive
algorithms for detecting community structure in dynamic social networks.
In 2011 Proceedings of IEEE INFOCOM, (pp. 2282-2290). IEEE.
• Mantziou, E., Papadopoulos, S., & Kompatsiaris, Y. (2013). Large-scale
semi-supervised learning by Approximate Laplacian Eigenmaps, VLAD and
pyramids. In 14th International Workshop on Image Analysis for
Multimedia Interactive Services (WIAMIS), 2013 (pp. 1-4). IEEE.
Archiving: capture moments and then show them to friends/replay them/tell stories
News & media: coverage, convey the image of an important happening to the world
Promotional material: photos can be great attractors to future events
Marketing: Sponsors can blend their brand into event content (e.g. Fischer at TIFF), advertisers can gain better understanding of the audience/clients by analyzing photos of the event (e.g. demographics/gender of people)
Unscheduled or small-scale events typically do not have the PRE phase.