SlideShare une entreprise Scribd logo
1  sur  47
SocialSensor: Sensing User Generated Input for
Improved Media Discovery and Experience
Social Multimedia Crawling & Mining
EventSense: Capturing the Pulse of Large-scale Events by
Mining Social Media Streams
Dr. Yiannis Kompatsiaris, Project Coordinator
Samos 2013 Summit on Digital Innovation for Government, Business and Society
#2
Overview
• Motivation
• Objectives
• Architecture
• Use Cases and Requirements
• News
– Social Multimedia Crawling & Mining
• Infotainment
– EventSense: Capturing the Pulse of Large-scale Events by
Mining Social Media Streams
• Conclusions
#3
What is SocialSensor?
• 3-year FP7 European Integrated Project
– http://www.socialsensor.eu
• Members: CERTH, ATC (Greece), Deutsche Welle,
University Koblenz, Research Center for Artificial
Intelligence (Germany), The City University London,
Alcatel – Lucent Bell Labs, JCP Consult (France),
University of Klagenfurt (Austria), IBM Israel, Yahoo
Iberia + Robert Gordon University Aberdeen (UK)
• 1.5 years into the project (Development of user
requirements, use case scenarios, architecture and
implementation and first R&D components and
prototypes. Currently: Evaluation and 2nd
round)
Motivation: Social Networks as Sensors
• Social Networks is a data source with an
extremely dynamic nature that reflects
events and the evolution of community
focus (user’s interests)
• Transform individually rare but
collectively frequent media to meaningful
topics, events, points of interest,
emotional states and social connections
• Mine the data and their relations and
exploit them in the right context
• Scalable mining and indexing approaches
taking into account the content and social
context of social networks
Relevant Applications
Xin Jin, Andrew Gallagher, Liangliang Cao, Jiebo Luo, and
Jiawei Han. The wisdom of social multimedia:
using flickr for prediction and forecast,
International conference on Multimedia (MM '10). ACM.
Federal Emergency Management Agency plans to
engage the public more in disaster response by
sharing data and leveraging reports from mobile
phones and social media
5
“…if you're more than 100 km away from the epicenter
[of an earthquake] you can read about the quake on
twitter before it hits you…”
Objective
SocialSensor quickly surfaces trusted and relevant material
from social media – with context
DySCODySCO
behaviou
r
location
timecontent
usage
social context
Massive social media
and unstructured web
Social media mining
Aggregation & indexing
News - Infotainment
Personalised access
Ad-hoc P2P networks
#7
The SocialSensor Vision
SocialSensor quickly surfaces trusted and relevant
material from social media – with context.
•“quickly”: in real time
•“surfaces”: automatically discovers, clusters and searches
•“trusted”: automatic support in verification process
•“relevant”: to the users, personalized
•“material”: any material (text, image, audio, video =
multimedia), aggregated with other sources (e.g. web)
•“social media”: across many relevant social media platforms
•“with context”: location, time, sentiment, influence, trust
#10
Conceptual Architecture and Main components
SEMANTIC MIDDLEWARE
Public
Data
In-project
Data
SEARCH & RECOMMENDATION
USER MODELLING & PRESENTATION
INDEXINGMINING
STORAGE
DATA COLLECTION / CRAWLING
• Real time dynamic topic
and event clustering
• Trend, popularity
and sentiment analysis
• Calculate trust/influence
scores around people
• Personalized search,
access & presentation
based on social network
interactions
• Semantic enrichment
and discovery of services
DySCO concept
• Integrate social content mining, search and intelligent
presentation in a personalized, context and network-
aware way, through the new concept of Dynamic Social
COntainers (DySCOs)
• Composite objects containing a number of items (e.g. articles,
tweets, images, videos)
• Focused on a particular topic of interest (e.g. an event, a story)
• Contain all available information about the topic
• Metadata can be added dynamically
• Ability to search for DySCOS by matching “DySCO features” with
“search features”
• Ability to check and recommend similar DySCOs
recommendations
#11
#13
Use Cases: News
#14
“It has changed the way we do
news”(MSN)
“Social media is the key place for emerging stories –
internationally, nationally, locally” (BBC)
“Social media is transforming the way we do journalism”
(New York Times)
Source: picture alliance / dpa
#15
#16
Source: Getty Images
“It’s really hard to find the nuggets of useful stuff
in an ocean of content” (BBC)
“Things that aren’t relevant crowd out the content
you are looking for” (MSN)
“The filters aren’t configurable
enough” (CNN)
Verification was simpler in the past...
Source: Frank Grätz
#17
#18
An example: BBC Verification Procedure: Arab
Spring Coverage
• Referencing locations against maps and existing images
from, in particular, geo-located ones.
• Working with our colleagues in BBC Arabic and BBC
Monitoring to ascertain that accents and language are
correct for the location.
• Searching for the original source of the
upload/sequences as an indicator of date.
• Examining weather reports and shadows to confirm that
the conditions shown fit with the claimed date and time.
• Maintaining lists of previously verified material to act as
reference for colleagues covering the stories.
• Checking scenery, weaponry, vehicles and licence plates
against those known for the given country.
News application
• Real-time Search/browse news items crawled from different social media
• Automatically discovered trending topics
• Web analytics
• Sentiment scores for topics
#19
Alethiometer
• Measuring the degree of truth behind tweets
• Overall trust score for a tweet
• Various Contributor, Content, Context validity metrics
#20
#21
Social Multimedia Crawling & Mining
Case study: #OccupyGezi
E. Schinas, S. Papadopoulos, I. Tsampoulatidis, K. Iliakopoulou, Y. Kompatsiaris
#22
Multimedia crawling & mining
• Monitor/query multiple sources for shared media
content: Twitter, Facebook, Flickr, YouTube, etc.
• Multiple indexing schemes:
– text-based (Solr)
– visual content-based (SURF+VLAD for feature extraction,
ADC for similarity-based indexing)
• Clustering
– geo-spatial (BIRCH)
– visual (SCAN)
• Web-based presentation of results
#23
Crawling & mining system deployment
StreamManager
Twitter Facebook Flickr YouTube RSS Instagram
160.xx.xx.207
MongoDBWrapper
160.xx.xx.207
TextIndexer (Solr)
160.xx.xx.207
160.xx.xx.207
MediaFetcher, FeatureExtractor (HDFS)
160.xx.xx.58 160.xx.xx.107
Social Focused Crawler (HDFS)
160.xx.xx.187
Nutch
Nutch VLAD
FeatureIndexer (HDFS)
160.xx.xx.207
IVFADC
Data Mining
160.xx.xx.191
Visual Clust. Geo Clust. Statistics
Web server
160.xx.xx.116
API (3)API (4)
API (1) API (2)
#24
#OccupyGezi
• Monitors:
Keywords: gezipark, taksimgezipark, Taksim, Taksim Gezi Park
Location: Istanbul
• Current statistics:
#25
Geographical spread of event
• Seems like it is not a localized event (as several
official Turkish news sources claimed), but spreads
all over Turkey and even in major cities abroad
#26
Different granularities
#27
Trending media by use of clustering
#28
Visual Memes
#29
Statistics
#31
Use Cases: Infotainment
#32
Capturing & mining large-scale events
• Large-scale events  attended by thousands of
people  captured by mobile devices in the form of
status updates, photos, ratings, etc.
• Challenge:
– Organize information
around entities of
interest
– Extract meaningful
insights, obtain
informative summaries
• EventSense framework
#33
Infotainment
• Thessaloniki
International Film
Festival
– 80,000 viewers / 100,000
visitors in 10 days
– 150 films, 350 screenings
• Fete de la Musique
Berlin
– 100,000 visitors every
year
– 5,000 musicians
#35
ThessFest
ThessFest• Thessaloniki
International Film
Festival
• Support
twitter/comment
usage within the app
• Ratings and
comments per film
• Feedback
aggregation
– Votes
– Tweets
• Real-time feedback
to the organisation
and visitors
#36
ThessFest
• Gather “realistic” user requirements
• Early showcase and evaluation of SocialSensor
technologies in real-world event scale
• Engage users and create an informed user basis
• TDF14, 15 + TIFF53
– 1400+ users
– 40K+ user sessions
– positive response to social media
• Next version
– Updated features bases on SocialSensor prototype
Fête de la Musique Berlin app
• FETEberlin in App Store and Google Play
• More than 100K visitors
• About 5K musicians
• More than 5K app downloads
App features
•Browse and filter detailed program
•Interactive maps and routing
•Social Sharing
•Artists’ and Stages Details
•Social Monitoring
Main benefits for attendants
•Visitors can browse through maps and
don’t get lost as stages are numerous
•Event schedule is available always and per
stage
– Very useful when the server was down and
there was no access to the online schedule
#37
Fête de la Musique Berlin app
FETEberlin Facts & User Feedback
Unique Users Sessions Frequency of Use
App Store 2904 13751 2,5 sessions per day
Google Play 2210 12097 2,9 sessions per day
Total 5114 25848 Avg: 2,7 sessions
per day
Future Plans
•Enhanced Event and Visitor
Engagement
•Send Last minute updates
•Create buzz around the event and make
users the event ambassadors
•Gain insightful knowledge on the
impact that the event made via social
media analysis
– Organize better events
#39
#40
EventSense: Capturing the Pulse of Large-scale
Events by Mining Social Media Streams
Case study: Thessaloniki International Film Festival
E. Schinas, S. Papadopoulos, S. Diplaris, Y. Kompatsiaris,
Y. Mass, J. Herzig, L. Boudakidis
#41
Entity Detection
• Entities are defined as lists of properties:
– a film consists of a title, description, names of
director(s)/actors
• Matching status updates (tweets) to entities relies on
representing both as vectors, using cosine similarity,
and thresholding:
m: message (tweet), f: feature (term), M: set of all event messages
boost(f): boosting factor when f is a named entity
#42
Topic detection
• For each new message, find the Nearest Neighbour
(NN) using Locality Sensitive Hashing (LSH)
• If similarity exceeds an empirically selected
threshold, assign to the topic of NN, otherwise
create new topic
• Clusters of one messages are discarded as outliers
• Cluster-merging is conducted as a post-processing
step to compensate for topic over-segmentation
• Similar approach to (Petrovic et al., 2010)
S. Petrovic, M. Osborne, V. Lavrenko. Streaming first story detection with application to twitter. In
Human Language Technologies, HLT ’10, pages 181–189, Stroudsburg, PA, USA, 2010. ACL
#43
Sentiment detection (1/2)
• Build positive/negative sentiment classifiers using
emoticons
• Build neutral classifier using positive/negative
classifiers
• Feature extraction:
– Remove stop words, emoticons, terms occurring only
once, trim repeated letters
– Negation terms (“not”, “isn’t”) are attached to subsequent
terms to form new unigrams (e.g. “nothappy”)
– Treat user mentions, URLs, punctuation, repeated letters
and all-caps words as additional features
#44
Sentiment detection (2/2)
• Naïve Bayes classifier per sentiment
• P(f|c) estimate using ML and Laplace correction
• Probability estimate for special features (e.g. user mentions)
using Bernoulli model and Laplace correction
• Classification using maximum log-likelihood
• Neutral messages:
– Mutual Information (MI)
– MI of features
– Sentiment intensity of message
– Use thresholding for decision
#45
Evaluation
• Case study: 53rd
Thessaloniki International Film Festival
(TIFF53), Nov 2-11, 2012
• 168 films included in the program (titles, descriptions in
Greek and English)
• 3974 tweets using #tiff53
• Manual annotation regarding:
– film
– sentiment (pos/neg/neut)
• Additional data using ThessFest mobile app:
– #bookmarks per film (number of times a user added the film to their
schedule)
– #ratings + avg. rating per film
#46
Tweet-film matching
• film = <title, description, directors, actors>
• Multiple entity representations using Greek/English/both, uni-/bi-grams
• Similarity threshold sensitivity analysis
Pooling multiple representationsthreshold ∈ (0.1, 0.3)
#47
Topic analysis
• Top-10 topics
• Manual inspection
of clusters:
– 53.8% of topic titles
considered
informative
– 98.5% of clusters
were found to be
“clean”
• Topics in time
#48
Sentiment analysis
• Training (using emoticons and Twitter API)
– 800K positive & negative tweets for English
– 12K positive & negative tweets for Greek
• Tuning (for threshold)
– Manually annotated dataset from Thessaloniki Documentary Festival
(similar event)
– 325/73/553 in English and 781/216/781 in Greek
• Testing
– 324/33/724 in English and 901/315/1667 in Greek
– Best accuracy (English) ~ 0.75
– Performance in Greek much poorer
compared to English 
need for richer training corpus
pos neg neut
#49
Aggregation & summarization (1/2)
#T: number of tweets
Pol: polarity of film tweets
Subj: subjectivity of film tweets
R: average rating
#R: number of ratings
#F: number of times the film was bookmarked
• Films with positive polarity are rated higher.
• Films that are tweeted a lot are also more
likely to be rated.
• Films that are tweet a lot are also more likely
to be added to the users’ bookmarks.
#50
Aggregation & summarization (2/2)
Most active & influential Twitter
accounts (+sentiment per user)
Most shared photos
(+number of retweets)
Conclusions
• Great interest in both use cases
• In news social media have transformed both news generation and
consumption
• Social media data mining can provide interesting results in
many applications
• Not all data always available (e.g. User queries, fb)
– Infrastructure, Policy issues
• Technical challenges
– Fusion (multi-modality, context), real-time, noise, big data,
aggregation (web, Linked Open Data)
• Applications challenges
• User engagement, visualization, become part of existing workflows,
privacy, copyright, commercialization
Thank you!

Contenu connexe

Similaire à SocialSensor Project Captures Real-Time Insights from Social Media

Socialsensor project overview and topic discovery in tweeter streams
Socialsensor project overview and topic discovery in tweeter streams Socialsensor project overview and topic discovery in tweeter streams
Socialsensor project overview and topic discovery in tweeter streams Yiannis Kompatsiaris
 
Social Data and Multimedia Analytics for News and Events Applications
Social Data and Multimedia Analytics for News and Events ApplicationsSocial Data and Multimedia Analytics for News and Events Applications
Social Data and Multimedia Analytics for News and Events ApplicationsYiannis Kompatsiaris
 
Visual Information Analysis for Crisis and Natural Disasters Management and R...
Visual Information Analysis for Crisis and Natural Disasters Management and R...Visual Information Analysis for Crisis and Natural Disasters Management and R...
Visual Information Analysis for Crisis and Natural Disasters Management and R...Yiannis Kompatsiaris
 
Vision about Social Networks Content Exploitation (EC Concertation meeting)
Vision about Social Networks Content Exploitation (EC Concertation meeting)Vision about Social Networks Content Exploitation (EC Concertation meeting)
Vision about Social Networks Content Exploitation (EC Concertation meeting)Yiannis Kompatsiaris
 
From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?Yiannis Kompatsiaris
 
Citizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and ApplicationsCitizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and ApplicationsAmit Sheth
 
Crowdsourced planning nash_27mar2014.pptx
Crowdsourced planning nash_27mar2014.pptxCrowdsourced planning nash_27mar2014.pptx
Crowdsourced planning nash_27mar2014.pptxAndrew Nash
 
Towards long-term preservation of linked data - the PRELIDA project
Towards long-term preservation of linked data - the PRELIDA projectTowards long-term preservation of linked data - the PRELIDA project
Towards long-term preservation of linked data - the PRELIDA projectPRELIDA Project
 
Analytic Journalism: Digital Evolution in the Datasphere
Analytic Journalism: Digital Evolution in the DatasphereAnalytic Journalism: Digital Evolution in the Datasphere
Analytic Journalism: Digital Evolution in the DatasphereJ T "Tom" Johnson
 
Data Science: History repeated? – The heritage of the Free and Open Source GI...
Data Science: History repeated? – The heritage of the Free and Open Source GI...Data Science: History repeated? – The heritage of the Free and Open Source GI...
Data Science: History repeated? – The heritage of the Free and Open Source GI...Peter Löwe
 
AMSWMC MV NPD.pptx
AMSWMC MV NPD.pptxAMSWMC MV NPD.pptx
AMSWMC MV NPD.pptxAna Canhoto
 
histoGraph: a case study in Digital Humanities
histoGraph: a case study in Digital HumanitieshistoGraph: a case study in Digital Humanities
histoGraph: a case study in Digital HumanitiesCUbRIK Project
 
Geeks bearing gifts: Unwrapping New Technologies, Version April12
Geeks bearing gifts: Unwrapping New Technologies, Version April12Geeks bearing gifts: Unwrapping New Technologies, Version April12
Geeks bearing gifts: Unwrapping New Technologies, Version April12ayoungkin
 
Emerging Trends in Crisis Informatics
Emerging Trends in Crisis InformaticsEmerging Trends in Crisis Informatics
Emerging Trends in Crisis InformaticsAdam Papendieck
 
Social Media Crawling and Mining Seminar (Motivation Part)
Social Media Crawling and Mining Seminar (Motivation Part)Social Media Crawling and Mining Seminar (Motivation Part)
Social Media Crawling and Mining Seminar (Motivation Part)Yiannis Kompatsiaris
 
Augmented reality @ libraries
Augmented reality @ librariesAugmented reality @ libraries
Augmented reality @ librariesKai Li
 
Hypermedia-driven Socio-technical Networks for Goal-driven Discovery in the W...
Hypermedia-driven Socio-technical Networks for Goal-driven Discovery in the W...Hypermedia-driven Socio-technical Networks for Goal-driven Discovery in the W...
Hypermedia-driven Socio-technical Networks for Goal-driven Discovery in the W...Andrei Ciortea
 
Social Semantic (Sensor) Web
Social Semantic (Sensor) WebSocial Semantic (Sensor) Web
Social Semantic (Sensor) WebDavid Crowley
 

Similaire à SocialSensor Project Captures Real-Time Insights from Social Media (20)

Socialsensor project overview and topic discovery in tweeter streams
Socialsensor project overview and topic discovery in tweeter streams Socialsensor project overview and topic discovery in tweeter streams
Socialsensor project overview and topic discovery in tweeter streams
 
Social Data and Multimedia Analytics for News and Events Applications
Social Data and Multimedia Analytics for News and Events ApplicationsSocial Data and Multimedia Analytics for News and Events Applications
Social Data and Multimedia Analytics for News and Events Applications
 
Visual Information Analysis for Crisis and Natural Disasters Management and R...
Visual Information Analysis for Crisis and Natural Disasters Management and R...Visual Information Analysis for Crisis and Natural Disasters Management and R...
Visual Information Analysis for Crisis and Natural Disasters Management and R...
 
Vision about Social Networks Content Exploitation (EC Concertation meeting)
Vision about Social Networks Content Exploitation (EC Concertation meeting)Vision about Social Networks Content Exploitation (EC Concertation meeting)
Vision about Social Networks Content Exploitation (EC Concertation meeting)
 
From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?
 
Citizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and ApplicationsCitizen Sensor Data Mining, Social Media Analytics and Applications
Citizen Sensor Data Mining, Social Media Analytics and Applications
 
Crowdsourced planning nash_27mar2014.pptx
Crowdsourced planning nash_27mar2014.pptxCrowdsourced planning nash_27mar2014.pptx
Crowdsourced planning nash_27mar2014.pptx
 
Towards long-term preservation of linked data - the PRELIDA project
Towards long-term preservation of linked data - the PRELIDA projectTowards long-term preservation of linked data - the PRELIDA project
Towards long-term preservation of linked data - the PRELIDA project
 
Analytic Journalism: Digital Evolution in the Datasphere
Analytic Journalism: Digital Evolution in the DatasphereAnalytic Journalism: Digital Evolution in the Datasphere
Analytic Journalism: Digital Evolution in the Datasphere
 
Data Science: History repeated? – The heritage of the Free and Open Source GI...
Data Science: History repeated? – The heritage of the Free and Open Source GI...Data Science: History repeated? – The heritage of the Free and Open Source GI...
Data Science: History repeated? – The heritage of the Free and Open Source GI...
 
AMSWMC MV NPD.pptx
AMSWMC MV NPD.pptxAMSWMC MV NPD.pptx
AMSWMC MV NPD.pptx
 
histoGraph: a case study in Digital Humanities
histoGraph: a case study in Digital HumanitieshistoGraph: a case study in Digital Humanities
histoGraph: a case study in Digital Humanities
 
Geeks bearing gifts: Unwrapping New Technologies, Version April12
Geeks bearing gifts: Unwrapping New Technologies, Version April12Geeks bearing gifts: Unwrapping New Technologies, Version April12
Geeks bearing gifts: Unwrapping New Technologies, Version April12
 
Emerging Trends in Crisis Informatics
Emerging Trends in Crisis InformaticsEmerging Trends in Crisis Informatics
Emerging Trends in Crisis Informatics
 
Social Media Crawling and Mining Seminar (Motivation Part)
Social Media Crawling and Mining Seminar (Motivation Part)Social Media Crawling and Mining Seminar (Motivation Part)
Social Media Crawling and Mining Seminar (Motivation Part)
 
Lesson3 esa summer_school_brovelli
Lesson3 esa summer_school_brovelliLesson3 esa summer_school_brovelli
Lesson3 esa summer_school_brovelli
 
Augmented reality @ libraries
Augmented reality @ librariesAugmented reality @ libraries
Augmented reality @ libraries
 
Hypermedia-driven Socio-technical Networks for Goal-driven Discovery in the W...
Hypermedia-driven Socio-technical Networks for Goal-driven Discovery in the W...Hypermedia-driven Socio-technical Networks for Goal-driven Discovery in the W...
Hypermedia-driven Socio-technical Networks for Goal-driven Discovery in the W...
 
Social Semantic (Sensor) Web
Social Semantic (Sensor) WebSocial Semantic (Sensor) Web
Social Semantic (Sensor) Web
 
Social Media Dataset
Social Media DatasetSocial Media Dataset
Social Media Dataset
 

Plus de Yiannis Kompatsiaris

AI4Media - European Leadership in Human-Centred Trustworthy AI session
AI4Media - European Leadership in Human-Centred Trustworthy AI sessionAI4Media - European Leadership in Human-Centred Trustworthy AI session
AI4Media - European Leadership in Human-Centred Trustworthy AI sessionYiannis Kompatsiaris
 
Sensor Based Ambient Assisted Living
Sensor Based Ambient Assisted LivingSensor Based Ambient Assisted Living
Sensor Based Ambient Assisted LivingYiannis Kompatsiaris
 
Social Media Analytics for Graph-Based Event Detection
Social Media Analytics for Graph-Based Event DetectionSocial Media Analytics for Graph-Based Event Detection
Social Media Analytics for Graph-Based Event DetectionYiannis Kompatsiaris
 
Social Media Verification Challenges, Approaches and Applications
Social Media Verification  Challenges, Approaches and ApplicationsSocial Media Verification  Challenges, Approaches and Applications
Social Media Verification Challenges, Approaches and ApplicationsYiannis Kompatsiaris
 
The DemaWare Service-Oriented AAL Platform for People with Dementia
The DemaWare Service-Oriented AAL Platform for People with DementiaThe DemaWare Service-Oriented AAL Platform for People with Dementia
The DemaWare Service-Oriented AAL Platform for People with DementiaYiannis Kompatsiaris
 
"Μια πόλη από το μέλλον": Πως ο πολίτης μπορεί να γίνει συμμέτοχος μέσω της χ...
"Μια πόλη από το μέλλον": Πως ο πολίτης μπορεί να γίνει συμμέτοχος μέσω της χ..."Μια πόλη από το μέλλον": Πως ο πολίτης μπορεί να γίνει συμμέτοχος μέσω της χ...
"Μια πόλη από το μέλλον": Πως ο πολίτης μπορεί να γίνει συμμέτοχος μέσω της χ...Yiannis Kompatsiaris
 
Τεχνικές Αναγνώρισης Προτύπων και Μηχανικής Μάθησης για Εφαρμογές Ανάλυσης Πο...
Τεχνικές Αναγνώρισης Προτύπων και Μηχανικής Μάθησης για Εφαρμογές Ανάλυσης Πο...Τεχνικές Αναγνώρισης Προτύπων και Μηχανικής Μάθησης για Εφαρμογές Ανάλυσης Πο...
Τεχνικές Αναγνώρισης Προτύπων και Μηχανικής Μάθησης για Εφαρμογές Ανάλυσης Πο...Yiannis Kompatsiaris
 
Άνοια στο σπίτι: Τεχνολογίες για παρακολούθηση από απόσταση και ανεξάρτητη δ...
 Άνοια στο σπίτι: Τεχνολογίες για παρακολούθηση από απόσταση και ανεξάρτητη δ... Άνοια στο σπίτι: Τεχνολογίες για παρακολούθηση από απόσταση και ανεξάρτητη δ...
Άνοια στο σπίτι: Τεχνολογίες για παρακολούθηση από απόσταση και ανεξάρτητη δ...Yiannis Kompatsiaris
 
Improve My City: App for Citizens Reporting Issues in Municipalities – Regions
Improve My City: App for Citizens Reporting Issues in Municipalities – RegionsImprove My City: App for Citizens Reporting Issues in Municipalities – Regions
Improve My City: App for Citizens Reporting Issues in Municipalities – RegionsYiannis Kompatsiaris
 
Introduction for the Summer School on Social Media Modeling and Search 2012
Introduction for the Summer School on Social Media Modeling and Search 2012Introduction for the Summer School on Social Media Modeling and Search 2012
Introduction for the Summer School on Social Media Modeling and Search 2012Yiannis Kompatsiaris
 
Social media mining and multimedia analysis research and applications
Social media mining and multimedia analysis research and applicationsSocial media mining and multimedia analysis research and applications
Social media mining and multimedia analysis research and applicationsYiannis Kompatsiaris
 

Plus de Yiannis Kompatsiaris (13)

AI4Media - European Leadership in Human-Centred Trustworthy AI session
AI4Media - European Leadership in Human-Centred Trustworthy AI sessionAI4Media - European Leadership in Human-Centred Trustworthy AI session
AI4Media - European Leadership in Human-Centred Trustworthy AI session
 
Sensor Based Ambient Assisted Living
Sensor Based Ambient Assisted LivingSensor Based Ambient Assisted Living
Sensor Based Ambient Assisted Living
 
Social Media Analytics for Graph-Based Event Detection
Social Media Analytics for Graph-Based Event DetectionSocial Media Analytics for Graph-Based Event Detection
Social Media Analytics for Graph-Based Event Detection
 
Social Media Verification Challenges, Approaches and Applications
Social Media Verification  Challenges, Approaches and ApplicationsSocial Media Verification  Challenges, Approaches and Applications
Social Media Verification Challenges, Approaches and Applications
 
Processing Large Complex Data
Processing Large Complex DataProcessing Large Complex Data
Processing Large Complex Data
 
The DemaWare Service-Oriented AAL Platform for People with Dementia
The DemaWare Service-Oriented AAL Platform for People with DementiaThe DemaWare Service-Oriented AAL Platform for People with Dementia
The DemaWare Service-Oriented AAL Platform for People with Dementia
 
Dem@care Project Short Overview
Dem@care Project Short OverviewDem@care Project Short Overview
Dem@care Project Short Overview
 
"Μια πόλη από το μέλλον": Πως ο πολίτης μπορεί να γίνει συμμέτοχος μέσω της χ...
"Μια πόλη από το μέλλον": Πως ο πολίτης μπορεί να γίνει συμμέτοχος μέσω της χ..."Μια πόλη από το μέλλον": Πως ο πολίτης μπορεί να γίνει συμμέτοχος μέσω της χ...
"Μια πόλη από το μέλλον": Πως ο πολίτης μπορεί να γίνει συμμέτοχος μέσω της χ...
 
Τεχνικές Αναγνώρισης Προτύπων και Μηχανικής Μάθησης για Εφαρμογές Ανάλυσης Πο...
Τεχνικές Αναγνώρισης Προτύπων και Μηχανικής Μάθησης για Εφαρμογές Ανάλυσης Πο...Τεχνικές Αναγνώρισης Προτύπων και Μηχανικής Μάθησης για Εφαρμογές Ανάλυσης Πο...
Τεχνικές Αναγνώρισης Προτύπων και Μηχανικής Μάθησης για Εφαρμογές Ανάλυσης Πο...
 
Άνοια στο σπίτι: Τεχνολογίες για παρακολούθηση από απόσταση και ανεξάρτητη δ...
 Άνοια στο σπίτι: Τεχνολογίες για παρακολούθηση από απόσταση και ανεξάρτητη δ... Άνοια στο σπίτι: Τεχνολογίες για παρακολούθηση από απόσταση και ανεξάρτητη δ...
Άνοια στο σπίτι: Τεχνολογίες για παρακολούθηση από απόσταση και ανεξάρτητη δ...
 
Improve My City: App for Citizens Reporting Issues in Municipalities – Regions
Improve My City: App for Citizens Reporting Issues in Municipalities – RegionsImprove My City: App for Citizens Reporting Issues in Municipalities – Regions
Improve My City: App for Citizens Reporting Issues in Municipalities – Regions
 
Introduction for the Summer School on Social Media Modeling and Search 2012
Introduction for the Summer School on Social Media Modeling and Search 2012Introduction for the Summer School on Social Media Modeling and Search 2012
Introduction for the Summer School on Social Media Modeling and Search 2012
 
Social media mining and multimedia analysis research and applications
Social media mining and multimedia analysis research and applicationsSocial media mining and multimedia analysis research and applications
Social media mining and multimedia analysis research and applications
 

Dernier

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Dernier (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

SocialSensor Project Captures Real-Time Insights from Social Media

  • 1. SocialSensor: Sensing User Generated Input for Improved Media Discovery and Experience Social Multimedia Crawling & Mining EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media Streams Dr. Yiannis Kompatsiaris, Project Coordinator Samos 2013 Summit on Digital Innovation for Government, Business and Society
  • 2. #2 Overview • Motivation • Objectives • Architecture • Use Cases and Requirements • News – Social Multimedia Crawling & Mining • Infotainment – EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media Streams • Conclusions
  • 3. #3 What is SocialSensor? • 3-year FP7 European Integrated Project – http://www.socialsensor.eu • Members: CERTH, ATC (Greece), Deutsche Welle, University Koblenz, Research Center for Artificial Intelligence (Germany), The City University London, Alcatel – Lucent Bell Labs, JCP Consult (France), University of Klagenfurt (Austria), IBM Israel, Yahoo Iberia + Robert Gordon University Aberdeen (UK) • 1.5 years into the project (Development of user requirements, use case scenarios, architecture and implementation and first R&D components and prototypes. Currently: Evaluation and 2nd round)
  • 4. Motivation: Social Networks as Sensors • Social Networks is a data source with an extremely dynamic nature that reflects events and the evolution of community focus (user’s interests) • Transform individually rare but collectively frequent media to meaningful topics, events, points of interest, emotional states and social connections • Mine the data and their relations and exploit them in the right context • Scalable mining and indexing approaches taking into account the content and social context of social networks
  • 5. Relevant Applications Xin Jin, Andrew Gallagher, Liangliang Cao, Jiebo Luo, and Jiawei Han. The wisdom of social multimedia: using flickr for prediction and forecast, International conference on Multimedia (MM '10). ACM. Federal Emergency Management Agency plans to engage the public more in disaster response by sharing data and leveraging reports from mobile phones and social media 5 “…if you're more than 100 km away from the epicenter [of an earthquake] you can read about the quake on twitter before it hits you…”
  • 6. Objective SocialSensor quickly surfaces trusted and relevant material from social media – with context DySCODySCO behaviou r location timecontent usage social context Massive social media and unstructured web Social media mining Aggregation & indexing News - Infotainment Personalised access Ad-hoc P2P networks
  • 7. #7 The SocialSensor Vision SocialSensor quickly surfaces trusted and relevant material from social media – with context. •“quickly”: in real time •“surfaces”: automatically discovers, clusters and searches •“trusted”: automatic support in verification process •“relevant”: to the users, personalized •“material”: any material (text, image, audio, video = multimedia), aggregated with other sources (e.g. web) •“social media”: across many relevant social media platforms •“with context”: location, time, sentiment, influence, trust
  • 8. #10 Conceptual Architecture and Main components SEMANTIC MIDDLEWARE Public Data In-project Data SEARCH & RECOMMENDATION USER MODELLING & PRESENTATION INDEXINGMINING STORAGE DATA COLLECTION / CRAWLING • Real time dynamic topic and event clustering • Trend, popularity and sentiment analysis • Calculate trust/influence scores around people • Personalized search, access & presentation based on social network interactions • Semantic enrichment and discovery of services
  • 9. DySCO concept • Integrate social content mining, search and intelligent presentation in a personalized, context and network- aware way, through the new concept of Dynamic Social COntainers (DySCOs) • Composite objects containing a number of items (e.g. articles, tweets, images, videos) • Focused on a particular topic of interest (e.g. an event, a story) • Contain all available information about the topic • Metadata can be added dynamically • Ability to search for DySCOS by matching “DySCO features” with “search features” • Ability to check and recommend similar DySCOs recommendations #11
  • 11. #14 “It has changed the way we do news”(MSN) “Social media is the key place for emerging stories – internationally, nationally, locally” (BBC) “Social media is transforming the way we do journalism” (New York Times) Source: picture alliance / dpa
  • 12. #15
  • 13. #16 Source: Getty Images “It’s really hard to find the nuggets of useful stuff in an ocean of content” (BBC) “Things that aren’t relevant crowd out the content you are looking for” (MSN) “The filters aren’t configurable enough” (CNN)
  • 14. Verification was simpler in the past... Source: Frank Grätz #17
  • 15. #18 An example: BBC Verification Procedure: Arab Spring Coverage • Referencing locations against maps and existing images from, in particular, geo-located ones. • Working with our colleagues in BBC Arabic and BBC Monitoring to ascertain that accents and language are correct for the location. • Searching for the original source of the upload/sequences as an indicator of date. • Examining weather reports and shadows to confirm that the conditions shown fit with the claimed date and time. • Maintaining lists of previously verified material to act as reference for colleagues covering the stories. • Checking scenery, weaponry, vehicles and licence plates against those known for the given country.
  • 16. News application • Real-time Search/browse news items crawled from different social media • Automatically discovered trending topics • Web analytics • Sentiment scores for topics #19
  • 17. Alethiometer • Measuring the degree of truth behind tweets • Overall trust score for a tweet • Various Contributor, Content, Context validity metrics #20
  • 18. #21 Social Multimedia Crawling & Mining Case study: #OccupyGezi E. Schinas, S. Papadopoulos, I. Tsampoulatidis, K. Iliakopoulou, Y. Kompatsiaris
  • 19. #22 Multimedia crawling & mining • Monitor/query multiple sources for shared media content: Twitter, Facebook, Flickr, YouTube, etc. • Multiple indexing schemes: – text-based (Solr) – visual content-based (SURF+VLAD for feature extraction, ADC for similarity-based indexing) • Clustering – geo-spatial (BIRCH) – visual (SCAN) • Web-based presentation of results
  • 20. #23 Crawling & mining system deployment StreamManager Twitter Facebook Flickr YouTube RSS Instagram 160.xx.xx.207 MongoDBWrapper 160.xx.xx.207 TextIndexer (Solr) 160.xx.xx.207 160.xx.xx.207 MediaFetcher, FeatureExtractor (HDFS) 160.xx.xx.58 160.xx.xx.107 Social Focused Crawler (HDFS) 160.xx.xx.187 Nutch Nutch VLAD FeatureIndexer (HDFS) 160.xx.xx.207 IVFADC Data Mining 160.xx.xx.191 Visual Clust. Geo Clust. Statistics Web server 160.xx.xx.116 API (3)API (4) API (1) API (2)
  • 21. #24 #OccupyGezi • Monitors: Keywords: gezipark, taksimgezipark, Taksim, Taksim Gezi Park Location: Istanbul • Current statistics:
  • 22. #25 Geographical spread of event • Seems like it is not a localized event (as several official Turkish news sources claimed), but spreads all over Turkey and even in major cities abroad
  • 24. #27 Trending media by use of clustering
  • 28. #32 Capturing & mining large-scale events • Large-scale events  attended by thousands of people  captured by mobile devices in the form of status updates, photos, ratings, etc. • Challenge: – Organize information around entities of interest – Extract meaningful insights, obtain informative summaries • EventSense framework
  • 29. #33 Infotainment • Thessaloniki International Film Festival – 80,000 viewers / 100,000 visitors in 10 days – 150 films, 350 screenings • Fete de la Musique Berlin – 100,000 visitors every year – 5,000 musicians
  • 30. #35 ThessFest ThessFest• Thessaloniki International Film Festival • Support twitter/comment usage within the app • Ratings and comments per film • Feedback aggregation – Votes – Tweets • Real-time feedback to the organisation and visitors
  • 31. #36 ThessFest • Gather “realistic” user requirements • Early showcase and evaluation of SocialSensor technologies in real-world event scale • Engage users and create an informed user basis • TDF14, 15 + TIFF53 – 1400+ users – 40K+ user sessions – positive response to social media • Next version – Updated features bases on SocialSensor prototype
  • 32. Fête de la Musique Berlin app • FETEberlin in App Store and Google Play • More than 100K visitors • About 5K musicians • More than 5K app downloads App features •Browse and filter detailed program •Interactive maps and routing •Social Sharing •Artists’ and Stages Details •Social Monitoring Main benefits for attendants •Visitors can browse through maps and don’t get lost as stages are numerous •Event schedule is available always and per stage – Very useful when the server was down and there was no access to the online schedule #37
  • 33. Fête de la Musique Berlin app
  • 34. FETEberlin Facts & User Feedback Unique Users Sessions Frequency of Use App Store 2904 13751 2,5 sessions per day Google Play 2210 12097 2,9 sessions per day Total 5114 25848 Avg: 2,7 sessions per day Future Plans •Enhanced Event and Visitor Engagement •Send Last minute updates •Create buzz around the event and make users the event ambassadors •Gain insightful knowledge on the impact that the event made via social media analysis – Organize better events #39
  • 35. #40 EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media Streams Case study: Thessaloniki International Film Festival E. Schinas, S. Papadopoulos, S. Diplaris, Y. Kompatsiaris, Y. Mass, J. Herzig, L. Boudakidis
  • 36. #41 Entity Detection • Entities are defined as lists of properties: – a film consists of a title, description, names of director(s)/actors • Matching status updates (tweets) to entities relies on representing both as vectors, using cosine similarity, and thresholding: m: message (tweet), f: feature (term), M: set of all event messages boost(f): boosting factor when f is a named entity
  • 37. #42 Topic detection • For each new message, find the Nearest Neighbour (NN) using Locality Sensitive Hashing (LSH) • If similarity exceeds an empirically selected threshold, assign to the topic of NN, otherwise create new topic • Clusters of one messages are discarded as outliers • Cluster-merging is conducted as a post-processing step to compensate for topic over-segmentation • Similar approach to (Petrovic et al., 2010) S. Petrovic, M. Osborne, V. Lavrenko. Streaming first story detection with application to twitter. In Human Language Technologies, HLT ’10, pages 181–189, Stroudsburg, PA, USA, 2010. ACL
  • 38. #43 Sentiment detection (1/2) • Build positive/negative sentiment classifiers using emoticons • Build neutral classifier using positive/negative classifiers • Feature extraction: – Remove stop words, emoticons, terms occurring only once, trim repeated letters – Negation terms (“not”, “isn’t”) are attached to subsequent terms to form new unigrams (e.g. “nothappy”) – Treat user mentions, URLs, punctuation, repeated letters and all-caps words as additional features
  • 39. #44 Sentiment detection (2/2) • Naïve Bayes classifier per sentiment • P(f|c) estimate using ML and Laplace correction • Probability estimate for special features (e.g. user mentions) using Bernoulli model and Laplace correction • Classification using maximum log-likelihood • Neutral messages: – Mutual Information (MI) – MI of features – Sentiment intensity of message – Use thresholding for decision
  • 40. #45 Evaluation • Case study: 53rd Thessaloniki International Film Festival (TIFF53), Nov 2-11, 2012 • 168 films included in the program (titles, descriptions in Greek and English) • 3974 tweets using #tiff53 • Manual annotation regarding: – film – sentiment (pos/neg/neut) • Additional data using ThessFest mobile app: – #bookmarks per film (number of times a user added the film to their schedule) – #ratings + avg. rating per film
  • 41. #46 Tweet-film matching • film = <title, description, directors, actors> • Multiple entity representations using Greek/English/both, uni-/bi-grams • Similarity threshold sensitivity analysis Pooling multiple representationsthreshold ∈ (0.1, 0.3)
  • 42. #47 Topic analysis • Top-10 topics • Manual inspection of clusters: – 53.8% of topic titles considered informative – 98.5% of clusters were found to be “clean” • Topics in time
  • 43. #48 Sentiment analysis • Training (using emoticons and Twitter API) – 800K positive & negative tweets for English – 12K positive & negative tweets for Greek • Tuning (for threshold) – Manually annotated dataset from Thessaloniki Documentary Festival (similar event) – 325/73/553 in English and 781/216/781 in Greek • Testing – 324/33/724 in English and 901/315/1667 in Greek – Best accuracy (English) ~ 0.75 – Performance in Greek much poorer compared to English  need for richer training corpus pos neg neut
  • 44. #49 Aggregation & summarization (1/2) #T: number of tweets Pol: polarity of film tweets Subj: subjectivity of film tweets R: average rating #R: number of ratings #F: number of times the film was bookmarked • Films with positive polarity are rated higher. • Films that are tweeted a lot are also more likely to be rated. • Films that are tweet a lot are also more likely to be added to the users’ bookmarks.
  • 45. #50 Aggregation & summarization (2/2) Most active & influential Twitter accounts (+sentiment per user) Most shared photos (+number of retweets)
  • 46. Conclusions • Great interest in both use cases • In news social media have transformed both news generation and consumption • Social media data mining can provide interesting results in many applications • Not all data always available (e.g. User queries, fb) – Infrastructure, Policy issues • Technical challenges – Fusion (multi-modality, context), real-time, noise, big data, aggregation (web, Linked Open Data) • Applications challenges • User engagement, visualization, become part of existing workflows, privacy, copyright, commercialization

Notes de l'éditeur

  1. Benefits: (i) Intelligent extraction of objects and events from the social Web, (ii) multimodal indexing and organization, (iii) personalized access and presentation of content (incl. media delivery and caching), and (iv) concrete and real integration of the social dimension of the current Web.
  2. ----- Besprechungsnotizen (03.04.12 14:41) ----- In the course of the project we have interviewed a considerable number of journalists and executives from some of the worlds biggest media outlets like CNN, the BBC, The New York Times and others... Here are some of the quotes. 3x klicken (bis alle 3 Quotes sichtbar). But journalists are not only describing the positive side. There are also huge challenges. And you can see from the following slide what is most challenging...
  3. ----- Besprechungsnotizen (03.04.12 16:44) ----- Or we can turn it the other way round: We have a known source whom the reporter trusts...