Live topic generation from event streams
1. Live Topic Generation from Event Streams
Vuk Milicic, José Luis Redondo Garcia,
Giuseppe Rizzo, Raphaël Troncy, Thomas Steiner
raphael.troncy@eurecom.fr / @rtroncy
3. Media Finder (zooming on media items)
15/05/2013 22nd World Wide Web Conference (WWW) - Rio de Janeiro - 3
4. Media Finder (timeline view)
5. Media Finder (timeline view)
6. Media Server
Composition of media item extractors (12 social networks)
Relies on search APIs with a fixed 30 s timeout window to provide results
Falls back on screen scraping when necessary (Twitter ecosystem)
Implemented as a Node.js server
Serializes results in a common schema (JSON)
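The fan-out with a fixed timeout window can be sketched as follows. The real Media Server is a Node.js service. This Python version only mirrors the idea under stated assumptions: the extractor signature, the fake `fast_network`/`slow_network` extractors and the dict-based item format are hypothetical. Every extractor is queried concurrently, and networks that miss the window are simply left out of the JSON result.

```python
import asyncio
import json
from typing import Awaitable, Callable, Dict, List

# Hypothetical extractor signature: search term in, media items (dicts already in the
# common schema) out. The real Media Server is a Node.js service.
Extractor = Callable[[str], Awaitable[List[dict]]]


async def search_all(term: str, extractors: Dict[str, Extractor],
                     timeout_s: float = 30.0) -> Dict[str, List[dict]]:
    """Query all extractors concurrently; keep only those that answer within timeout_s."""
    if not extractors:
        return {}
    tasks = {name: asyncio.create_task(fn(term)) for name, fn in extractors.items()}
    done, pending = await asyncio.wait(tasks.values(), timeout=timeout_s)
    for task in pending:
        task.cancel()  # networks that miss the window contribute nothing
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations finish
    return {name: task.result() for name, task in tasks.items()
            if task in done and task.exception() is None}


# Two fake extractors standing in for real social network APIs (hypothetical).
async def fast_network(term: str) -> List[dict]:
    await asyncio.sleep(0.01)
    return [{"network": "fast", "text": f"photo about {term}"}]


async def slow_network(term: str) -> List[dict]:
    await asyncio.sleep(10)
    return [{"network": "slow", "text": term}]


if __name__ == "__main__":
    found = asyncio.run(search_all("#www2013", {"fast": fast_network, "slow": slow_network},
                                   timeout_s=0.5))
    print(json.dumps(found))  # {"fast": [{"network": "fast", "text": "photo about #www2013"}]}
```

A fixed window bounds the response time by the slowest acceptable network rather than the slowest network overall, at the cost of occasionally dropping late results.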
7.
Deep link
Permalink
Clean text for NLP
Aggregate view of all social interactions
12 social networks
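The slide only says that each item in the common schema carries a "clean text for NLP" field. The specific cleaning rules below are assumptions: decode HTML entities, drop tags and links, keep hashtag words without the `#`, and collapse whitespace. The sketch shows how raw posts could be turned into plain prose before entity extraction.

```python
import html
import re

# Hypothetical cleaning rules; the slide only names a "clean text for NLP" field.
_TAG = re.compile(r"<[^>]+>")
_URL = re.compile(r"https?://\S+")
_HASHTAG = re.compile(r"#(\w+)")
_SPACE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Turn a raw post into plain prose that an entity extractor can handle."""
    text = html.unescape(raw)          # decode entities such as &amp;
    text = _TAG.sub(" ", text)         # drop markup returned by some APIs
    text = _URL.sub(" ", text)         # links carry no entities worth extracting
    text = _HASHTAG.sub(r"\1", text)   # keep the hashtag word, drop the '#'
    return _SPACE.sub(" ", text).strip()


print(clean_text("Great talk by <b>@rtroncy</b> &amp; team at #WWW2013 http://t.co/abc"))
# Great talk by @rtroncy & team at WWW2013
```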
8. Media Finder Architecture
Media item harvesting using the Media Server
http://eventmedia.eurecom.fr/media-server/search/{combined}/{term}
https://github.com/vuknje/media-server (@tomayac fork)
Near-duplicate image removal
DCT signature on images and video frames
Hamming distance between signature pairs
Clustering and disambiguation
Named Entity Extraction using NERD
Topic Generation using LDA
Density-based clustering using OPTICS
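The near-duplicate step listed above (DCT signature plus Hamming distance) can be sketched in the style of a classic perceptual hash. The parameters here are assumptions, not taken from the slides:
- a 32x32 grayscale input (assumed already resized; a video frame would be treated the same way);
- the 8x8 lowest-frequency DCT-II coefficients;
- one bit per coefficient, set when it exceeds the median of the AC terms.

The `is_near_duplicate` helper and its 10-bit threshold are hypothetical.

```python
import math
import random
from typing import List

Image = List[List[float]]  # grayscale pixel rows, assumed already resized to SIZE x SIZE

SIZE = 32   # assumed working resolution (not stated in the slides)
KEEP = 8    # keep the 8x8 lowest-frequency coefficients -> 64-bit signature

# _COS[k][n] = cos(pi / SIZE * (n + 0.5) * k), the DCT-II basis for frequencies k < KEEP
_COS = [[math.cos(math.pi / SIZE * (n + 0.5) * k) for n in range(SIZE)] for k in range(KEEP)]


def low_freq_dct(img: Image) -> List[List[float]]:
    """Separable 2D DCT-II, computing only the KEEP x KEEP low-frequency block."""
    if len(img) != SIZE or any(len(row) != SIZE for row in img):
        raise ValueError(f"expected a {SIZE}x{SIZE} image")
    # 1D DCT along each row: rows[y][u]
    rows = [[sum(p * c for p, c in zip(row, _COS[u])) for u in range(KEEP)] for row in img]
    # 1D DCT down each column of that result: out[v][u]
    return [[sum(rows[y][u] * _COS[v][y] for y in range(SIZE)) for u in range(KEEP)]
            for v in range(KEEP)]


def dct_signature(img: Image) -> int:
    """64-bit signature: a bit is set when its coefficient exceeds the AC median."""
    coeffs = [c for row in low_freq_dct(img) for c in row]
    ac = sorted(coeffs[1:])             # ignore the DC term when picking the median
    median = ac[len(ac) // 2]
    sig = 0
    for c in coeffs:
        sig = (sig << 1) | (1 if c > median else 0)
    return sig


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, max_bits: int = 10) -> bool:
    """Hypothetical threshold: signatures differing in few bits count as duplicates."""
    return hamming(a, b) <= max_bits


if __name__ == "__main__":
    rng = random.Random(42)
    frame = [[rng.uniform(0, 255) for _ in range(SIZE)] for _ in range(SIZE)]
    brighter = [[p + 10 for p in row] for row in frame]     # same picture, lighter
    inverted = [[255 - p for p in row] for row in frame]    # visually different picture
    print(hamming(dct_signature(frame), dct_signature(brighter)))  # small distance
    print(hamming(dct_signature(frame), dct_signature(inverted)))  # large distance
```

Comparing against the median rather than a fixed value makes the signature insensitive to uniform brightness changes, which only affect the DC coefficient.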
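The density-based clustering step can be illustrated with a plain OPTICS implementation followed by a DBSCAN-style cut of the reachability ordering. The item representation is an assumption made for this sketch: each media item is reduced to the set of entities NERD found in it, and items are compared with Jaccard distance. The `eps`, `min_pts` and `eps_prime` values are illustrative only.

```python
import heapq
import math
from typing import Callable, FrozenSet, List, Sequence, Tuple

# Hypothetical representation: each media item is the set of entity labels found in it.
Item = FrozenSet[str]


def jaccard_distance(a: Item, b: Item) -> float:
    """Assumed item distance: 1 - |a & b| / |a | b| (0 = same entities, 1 = none shared)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def optics(items: Sequence[Item], eps: float, min_pts: int,
           dist: Callable[[Item, Item], float] = jaccard_distance
           ) -> Tuple[List[int], List[float], List[float]]:
    """Compute the OPTICS ordering plus reachability and core distances."""
    n = len(items)
    reach = [math.inf] * n
    core = [math.inf] * n
    processed = [False] * n
    order: List[int] = []

    def neighbours(i: int) -> List[Tuple[float, int]]:
        # eps-neighbourhood of item i, including i itself at distance 0
        pairs = ((dist(items[i], items[j]), j) for j in range(n))
        return [(d, j) for d, j in pairs if d <= eps]

    def visit(i: int, seeds: List[Tuple[float, int]]) -> None:
        processed[i] = True
        order.append(i)
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            return  # not a core item, so it does not expand the ordering
        core[i] = sorted(d for d, _ in nbrs)[min_pts - 1]
        for d, j in nbrs:
            new_reach = max(core[i], d)
            if not processed[j] and new_reach < reach[j]:
                reach[j] = new_reach
                heapq.heappush(seeds, (new_reach, j))

    for p in range(n):
        if not processed[p]:
            seeds: List[Tuple[float, int]] = []
            visit(p, seeds)
            while seeds:
                _, q = heapq.heappop(seeds)
                if not processed[q]:  # skip outdated heap entries
                    visit(q, seeds)
    return order, reach, core


def extract_clusters(order: List[int], reach: List[float], core: List[float],
                     eps_prime: float) -> List[int]:
    """DBSCAN-style cut of the OPTICS ordering at eps_prime; -1 marks noise."""
    labels = [-1] * len(order)
    cluster = -1
    for i in order:
        if reach[i] > eps_prime:
            if core[i] <= eps_prime:  # i is a core item: start a new cluster
                cluster += 1
                labels[i] = cluster
        else:  # density-reachable from the cluster currently being built
            labels[i] = cluster
    return labels


items = [frozenset(s) for s in (
    {"Obama", "White House"}, {"Obama", "White House", "Washington"}, {"Obama", "White House"},
    {"Neymar", "Brazil"}, {"Neymar", "Brazil", "Santos"}, {"Neymar", "Brazil"},
    {"Rio de Janeiro"})]
order, reach, core = optics(items, eps=0.5, min_pts=2)
print(extract_clusters(order, reach, core, eps_prime=0.5))  # [0, 0, 0, 1, 1, 1, -1]
```

OPTICS computes the ordering once, so the same ordering can be cut at several `eps_prime` values without reclustering. In practice, scikit-learn's `OPTICS` estimator (with `metric="precomputed"` for a custom distance matrix) would replace this hand-written version.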
9. Named Entities are Pivotal
http://nerd.eurecom.fr/
REST API
Ontology
Dashboard UI
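NERD unifies the entities returned by several extractors. This slide does not describe how NERD itself combines them, so the following is only a hypothetical illustration of aggregation over multiple extractors:
- input is assumed to be (surface form, type) pairs already mapped to a shared type vocabulary;
- the extractor names are placeholders;
- an entity is kept when at least `min_votes` extractors report it, and it takes its majority type.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical input: per extractor, (surface form, type) pairs for one text,
# with types assumed to be mapped to a shared vocabulary already.
Extraction = Tuple[str, str]


def aggregate_entities(outputs: Dict[str, List[Extraction]],
                       min_votes: int = 2) -> Dict[str, str]:
    """Keep entities reported by >= min_votes extractors, labelled with their majority type."""
    types: Dict[str, Counter] = defaultdict(Counter)
    support: Dict[str, Set[str]] = defaultdict(set)
    for extractor, extractions in outputs.items():
        for surface, etype in extractions:
            key = surface.lower()      # naive normalization of surface forms
            types[key][etype] += 1
            support[key].add(extractor)
    return {key: types[key].most_common(1)[0][0]
            for key in types if len(support[key]) >= min_votes}


outputs = {
    "extractor_a": [("Rio de Janeiro", "Location"), ("WWW", "Event")],
    "extractor_b": [("rio de janeiro", "Location"), ("Troncy", "Person")],
    "extractor_c": [("Rio de Janeiro", "Organization"), ("WWW", "Event")],
}
print(aggregate_entities(outputs))
```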
13. Media Finder (named entities clustering)
14. Media Finder (zooming into a cluster)
15. Summary
Pick an event identified by a hashtag
Use the Media Server to get media items aggregated over multiple social networks
Use NERD to get named entities aggregated over multiple extractors
Cluster the items and identify meaningful topics (i.e., entities), each with a descriptive label,
often disambiguated with a DBpedia URI that gives access to further encyclopedic knowledge
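The last summary step, labelling a topic and linking it to DBpedia, can be sketched under one assumed rule: a cluster is labelled with its most frequent disambiguated entity, falling back to the most frequent surface form when nothing was linked. The input format and the rule are hypothetical. The one detail relied on from DBpedia is that resource names use `_` for spaces and may be percent-encoded.

```python
from collections import Counter
from typing import List, Optional, Tuple
from urllib.parse import unquote

# Hypothetical input: (surface form, DBpedia URI or None if not disambiguated)
# for every entity found in the media items of one cluster.
Entity = Tuple[str, Optional[str]]

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def label_cluster(entities: List[Entity]) -> Tuple[str, Optional[str]]:
    """Return (label, uri) for a cluster; assumed rule: most frequent linked entity wins."""
    if not entities:
        raise ValueError("cannot label an empty cluster")
    linked = Counter(uri for _, uri in entities if uri)
    if linked:
        uri = linked.most_common(1)[0][0]
        if uri.startswith(DBPEDIA_RESOURCE):
            # DBpedia resource names may be percent-encoded and use '_' for spaces
            return unquote(uri[len(DBPEDIA_RESOURCE):]).replace("_", " "), uri
        return uri, uri
    surface = Counter(s for s, _ in entities).most_common(1)[0][0]
    return surface, None


cluster = [("Rio", "http://dbpedia.org/resource/Rio_de_Janeiro"),
           ("Rio de Janeiro", "http://dbpedia.org/resource/Rio_de_Janeiro"),
           ("WWW", None), ("WWW", None), ("WWW", None)]
print(label_cluster(cluster))
```

Preferring linked entities over raw frequency trades some coverage for labels that can be dereferenced to encyclopedic knowledge.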
16. Live Topic Generation from Event Streams
Meet us at WWW 2013 Demo Session, Booth 14
http://www.youtube.com/watch?v=8iRiwz7cDYY