SOCIAL MEDIA MINING AND
         MULTIMEDIA ANALYSIS
      RESEARCH AND APPLICATIONS


                          Yiannis Kompatsiaris
                 Informatics and Telematics Institute
              Centre for Research and Technology - Hellas

                                      http://mklab.iti.gr

University of Surrey, CVSSP Seminar                           Guildford, 31 July, 2012
Contents
•  Introduction
•  Emergent Semantics from Social Media
     •  Opportunities and Challenges
     •  Applications
•  Research Approaches
     •  Community detection in Social Media
     •  Social media “teacher” of the machine
     •  Concept detection
•  SocialSensor Applications
•  Conclusions - Issues
Social networks and media
 •  Users upload, tag, share,
    connect and search
      •    Over 800 million unique users visit
           YouTube each month
      •    Over 3 billion hours of video are
           watched each month on YouTube
      •    72 hours of video are uploaded to
           YouTube every minute

 •  Emphasis is on uploading,
    visualization of results and
    interfaces
 •  User engagement
 •  Single media item analysis
 •  Usage of the Collective
    nature of Social Networks

Web 2.0 Content
  •    Multi-modality: e.g. image + tags, image + video
  •    Rich (Social) Context: spatio-temporal, social
       connections, relations and social graph
  •    Huge volume: Massively produced and shared
  •    Dynamic: Fast updates, real-time, streaming feeds
  •    Multi-source: may be generated by different
       applications, user communities, e.g. delicious,
       StumbleUpon and reddit are all social bookmarking sites
       •  Also connected to other sources (e.g. LOD, web)
  •    Inconsistent quality: noise, spam, ambiguity




[Figure: a social media item and its context: comments, favs, tags, caption, time, user profile]
Social Web as a graph




     announcement of Mubarak’s resignation

     nodes = Twitter users
     edges = retweets on #jan25 hashtag

     http://gephi.org/2011/the-egyptian-revolution-on-twitter/


Blogosphere as a graph

       nodes = blogs
       edges = hyperlinks
       (labelled regions: technical - gadgets, society - politics)

       http://datamining.typepad.com/gallery/blog-map-gallery.html
Two main directions
 •  1. Improve access to social media
      §  Tag refinement, suggestion, propagation, concept
          detection
      §  Results apply to single media items

 •  2. Extract implicit information, capture
    emergent semantics
      §  Exploit explicit and implicit relations
           §  Not explicitly identifiable by users
      §  Data mining, Collective Intelligence

 Scalable approaches taking into account the
 content and social context of social networks
Tags everywhere
                           Sharing, describing and searching content




Very low precision




Very low recall




Can we improve things?




     By combining information from many
      photos - tags, it seems that we can
                Stable patterns
         in tagging systems over time


“Single” media item analysis
 •  Use features of large number of similar content
      §  E.g. visual and textual features and similarity
      §  Tag refinement, suggestion, propagation, concept
          detection




Social Networks and Collective Intelligence
 •    Social networks are a data source with an extremely
      dynamic nature that reflects events and the evolution
      of community focus (users’ interests)
 •    Web 2.0 data consists of individually rare but
      collectively frequent events and topics
 •    Potential for much more if we mine the data and their
      relations and exploit them in the right context
 •    Search and Discovery of meaningful topics, entities,
      points of interest, social connections and events
 •    Rather than search for isolated or directly connected
      social media items



Social Networks and Collective Intelligence
                                      •    “If a group has a means of
                                           aggregating different
                                           opinions, the group
                                           collective solution may well
                                           be smarter than even the
                                           smartest person’s solution”
                                      •    Conditions
                                           •  Diversity (large-scale)
                                           •  Independence
                                           •  Aggregation
                                           •  Motivation for best guess
                                              •  Gamification
Social Networks and Collective Intelligence
                                      •    “Social networks have emergent
                                           properties. Emergent properties
                                           are new attributes of a whole
                                           that arise from the interaction
                                           and interconnection of the parts”
                                      •    Emotions, Health, Sexual
                                           relationships do not depend just
                                           on our connections (e.g. number
                                           of them) but on our position -
                                           structure in the social graph
                                           •  Central – Hub
                                           •  Outlier
                                           •  Transitivity (connections
                                              between friends)



Extraction of implicit information




  trace Flickr users from a chronologically ordered set of
  geographically referenced photos
  Who are the Italians and who are the Americans?
  MIT SENSEABLE CITY LAB, “The World's eyes”


What else can we do?
 Contribute to our understanding of the world

 Tags that are “representative” for a geographical area

 •  1. Clustering of photos
      §  K-means, based on their
          location [Kennedy07]
 •  2. Rank each cluster’s tags
 •  3. Get tags above a certain
    threshold

 Figure: representative tags for San Francisco [Kennedy07]
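
The three steps listed on this slide (cluster photos by location, rank each cluster's tags, keep the tags above a threshold) can be sketched as follows. This is only an illustration under stated assumptions, not the method of [Kennedy07]: the toy photos, the fixed initial centres, the TF-IDF-style score and the threshold value are all made up for the example.

```python
import math
from collections import Counter, defaultdict

def kmeans(points, centres, iterations=10):
    """Plain k-means on (lat, lon) pairs; returns a cluster index per point."""
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centre (Euclidean distance on raw coordinates).
        assignment = [
            min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        # Update step: move each centre to the mean of its members.
        for c in range(len(centres)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centres[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment

def representative_tags(photos, centres, threshold):
    """photos: list of ((lat, lon), [tags]). Returns {cluster index: [tags]}."""
    assignment = kmeans([loc for loc, _ in photos], list(centres))
    per_cluster = defaultdict(Counter)
    for (_, tags), c in zip(photos, assignment):
        per_cluster[c].update(set(tags))
    # In how many clusters does each tag occur (IDF-like factor, an assumption).
    cluster_freq = Counter(t for counts in per_cluster.values() for t in counts)
    n = len(per_cluster)
    result = {}
    for c, counts in per_cluster.items():
        scores = {t: k * math.log(1 + n / cluster_freq[t]) for t, k in counts.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result[c] = [t for t in ranked if scores[t] >= threshold]
    return result

# Hypothetical geotagged photos around two San Francisco locations.
PHOTOS = [
    ((37.8199, -122.4783), ["goldengate", "bridge", "sanfrancisco"]),
    ((37.8205, -122.4790), ["goldengate", "bridge"]),
    ((37.8190, -122.4780), ["goldengate", "sanfrancisco"]),
    ((37.8024, -122.4058), ["coittower", "sanfrancisco"]),
    ((37.8030, -122.4060), ["coittower", "tower"]),
]
TAGS = representative_tags(PHOTOS, [(37.82, -122.48), (37.80, -122.40)], threshold=2.0)
```

With this scoring, a tag such as "sanfrancisco" that appears in every cluster is penalised, while tags concentrated in one cluster rise to the top.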



Sensors and automatically
 user generated content
 Uses the GPS in cellular phones
   to gather traffic information,
   process it, and distribute it
   back to the phones in real
   time

 •  online, real-time data
    processing
 •  privacy-preservation
 •  data efficiency, i.e. not
    requiring excessive cellular
    network

 Mobile Century Project: http://traffic.berkeley.edu/mobilecentury.html

Applications
 Xin Jin, Andrew Gallagher, Liangliang Cao,
 Jiebo Luo, and Jiawei Han. The wisdom of
 social multimedia: using flickr for
 prediction and forecast, International
 conference on Multimedia (MM '10). ACM.




                                                     Federal Emergency Management Agency
                                                     plans to engage the public more in
                                                     disaster response by sharing data and
                                                     leveraging reports from mobile phones
                                                     and social media


                                              Gogobot: Travel Discovery Goes Social And Visual
                                              “The service raised $4 million in funding (Google
                                              CEO Eric Schmidt is one of the investors)… This is a $100
                                              billion a year industry in the U.S. It’s something like $350
                                              billion worldwide.”




Social Media as real-time Sensors




 “…if you're more than 100 km away from the
   epicenter [of an earthquake] you can read about
   the quake on twitter before it hits you…”
Applications
•  Science
     •  Sociology, machine learning, computer vision (annotation)
•  Tourism – Leisure – Culture
     •  Off-the-beaten path POI extraction
•  Marketing
     •  Brand monitoring, personalised ads
•  E-Gov and e-participation
     •  Direct citizen feedback (fixmystreet app)
•  News
     •  Topics, trends, event detection
•  Others
     •  Environment, emergency response, energy saving, etc.

Research Fields and Issues
•  Statistical analysis, machine learning, data mining,
   pattern recognition, social network analysis
     •  Clustering
•  Image, text, video feature extraction and analysis
•  Representation, modeling, data reduction
     •  Graph theory
•    Fusion techniques
•    Stream processing and real-time architectures
•    Performance, scalability
•    Multi-disciplinarity (sociologists)
•    Security, privacy
Social Media Community Detection




Examples of Social Media networks

 •  Folksonomy (Delicious)
      Mika, P. (2005) Ontologies Are Us: A Unified Model of Social Networks
      and Semantics. Proceedings of the 4th International Semantic Web
      Conference (ISWC 2005), Springer Berlin / Heidelberg, pp. 522-536

 •  MetaGraph (Digg)
      Lin, Y., Sun, J., Castro, P., Konuru, R., Sundaram, H., and
      Kelliher, A. (2009) MetaFac: community discovery via relational
      hypergraph factorization. Proceedings of KDD '09, ACM, pp. 527-536
What is a community in a network?
 Group of vertices that are more densely connected to each
   other than to the rest of the network.
 Multiple definitions to quantify
   communities:
     Fortunato S. (2010) Community detection in graphs. Physics Reports 486:
     75-174

     S. Papadopoulos, Y. Kompatsiaris, A. Vakali, P. Spyridonos. “Community
     Detection in Social Media”. In Data Mining and Knowledge Discovery, DOI:
     10.1007/s10618-011-0224-z

 Figure labels: intra-community edge, inter-community edge
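
To make the intra-community / inter-community distinction concrete, the short sketch below counts both kinds of edges for a given assignment of vertices to communities. The toy graph and the community labels are hypothetical; the slide does not prescribe any particular quality measure, so only the raw counts are computed.

```python
def edge_counts(edges, community_of):
    """Count intra- and inter-community edges for a node-to-community map."""
    intra = sum(1 for u, v in edges if community_of[u] == community_of[v])
    return intra, len(edges) - intra

# Hypothetical toy graph: two triangles joined by a single bridge edge (3, 4).
EDGES = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
COMMUNITY_OF = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

INTRA, INTER = edge_counts(EDGES, COMMUNITY_OF)
```

A partition that matches the informal definition above should show many intra-community edges and few inter-community ones, as in this toy case.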

Subgraphs
   •  k-clique (figure: k=3 (triangle), k=4, k=5)
        Each node is connected to all k-1 other nodes
   •  N-clique (figure: N=2 (star))
        N is the length of the path allowed to all other members
   •  k-core (figure: 0-core, 1-core, 2-core, 3-core, 4-core)
        All vertices have degree at least k within the subgraph
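
Of the three subgraph notions, the k-core is the easiest to compute: repeatedly remove vertices whose remaining degree is below k. The sketch below follows that standard peeling idea on a hypothetical toy graph; it is an illustration, not code from the talk.

```python
from collections import defaultdict

def k_core(edges, k):
    """Return the vertex set of the k-core of an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            # Degree is counted only among vertices that are still present.
            if len(adj[v] & alive) < k:
                alive.discard(v)
                changed = True
    return alive

# Hypothetical toy graph: a 4-clique {1, 2, 3, 4} with a pendant path 4-5-6.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
CORE_3 = k_core(EDGES, 3)
```

Removing vertex 6 lowers the degree of vertex 5, which is why the peeling has to be repeated until nothing changes.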
Approach illustration (1/2)
  Three-step process:

  •  1st step:
     (µ, ε) – core detection
  •  2nd step:
     Local expansion
  •  3rd step:
     Characterization of
     remaining vertices as hubs
     or outliers
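
The first step can be sketched as below, assuming the structural similarity used by the SCAN algorithm (which later slides name): two adjacent vertices are similar when their closed neighbourhoods overlap strongly, and a vertex is a (µ, ε)-core when at least µ vertices of its closed neighbourhood have similarity of at least ε to it. The toy graph and the parameter values are hypothetical, and the local expansion and hub/outlier steps are not shown.

```python
import math
from collections import defaultdict

def closed_neighbourhoods(edges):
    """Map each vertex to the set containing itself and its adjacent vertices."""
    gamma = defaultdict(set)
    for u, v in edges:
        gamma[u].update((u, v))
        gamma[v].update((u, v))
    return gamma

def structural_similarity(gamma, u, v):
    """Shared closed neighbours, normalised by the geometric mean of the sizes."""
    return len(gamma[u] & gamma[v]) / math.sqrt(len(gamma[u]) * len(gamma[v]))

def core_vertices(edges, mu, eps):
    """Vertices with at least mu eps-similar members in their closed neighbourhood."""
    gamma = closed_neighbourhoods(edges)
    cores = set()
    for v in gamma:
        similar = [w for w in gamma[v] if structural_similarity(gamma, v, w) >= eps]
        if len(similar) >= mu:
            cores.add(v)
    return cores

# Hypothetical toy graph: a 4-clique {1, 2, 3, 4} with a pendant path 4-5-6.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
CORES = core_vertices(EDGES, mu=3, eps=0.7)
```

Vertices 5 and 6 sit on a sparse path, so they have too few similar neighbours to qualify as cores and would be candidates for the later hub or outlier characterization.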




Approach illustration (2/2)
 •  Structural similarity + Local expansion
         (highly efficient and scalable approach)
 •  Not necessary to know the number of clusters
 •  Noise resilient
         (not all nodes need to be part of a community)
 •  Generic approach adaptable to many applications
         (depending on node – edge representation)

   S. Papadopoulos, Y. Kompatsiaris, A. Vakali. “A Graph-based Clustering Scheme for Identifying Related Tags in Folksonomies”.
   In Proceedings of DaWaK'10, Springer-Verlag, 65-76
LYCOS iQ Tag Network

                                                       Computers:
                                                       A densely interconnected
                                                       community




                                      History:
                                      A star-shaped
                                      community




Hybrid photo Clustering




[Figures: example photo clusters for a landmark and for an event]
Photo clustering results
 Most clusters correspond to landmarks or events


                                                                 EVENTS
                                                   baptism


      LANDMARKS

                                                   conference




                                                    castles




Sample results:
   [Visual] vs. [Tag] vs. [Visual + Tag]

                  VISUAL
                                      HYBRID




                   TAG




Numerical results: Geospatial cluster coherence

   Table 1. Cluster quality comparison between SCAN and k-means approaches. The performance is
   evaluated separately on visual and tag-based features and for multiple values of k. We could not
   include k-means with K = 3M in the tag cluster comparison because the large number of K led to an
   estimated execution time of over a week.

                                              Geospatial cluster
                                                  coherence
                      Clustering method      (m stands for meters)           Subjective cluster quality
       Cluster type   (number of clusters)   md (m)      sd (m)        P          R          F          κ
       Visual         SCANVIS (560)           357.1      1185.7      1.000      0.110      0.199      1.000
                      KMVIS,1M (560)         2470.0      1734.4      0.806      0.324      0.462      0.226
                      KMVIS,2M (1,120)       2249.7      1893.7      0.899      0.294      0.443      0.544
                      KMVIS,3M (1,680)       2183.1      2027.4      0.929      0.271      0.420      0.719
       Tag            SCANTAG-C (1,774)       767.4      1712.0      0.898      0.253      0.394      0.642
                      SCANTAG-LSI (4,027)     456.3      1151.1      0.950      0.182      0.306      0.820
                      KMTAG,1M (4,027)        766.8      1762.7      0.848      0.307      0.451      0.564
                      KMTAG,2M (8,054)        563.2      1528.7      0.903      0.258      0.401      0.707

  For 29 landmark clusters, the automatically generated cluster center
  fell on average within 80 meters of the actual landmark position

  Excerpts from the paper page shown on the slide:
     … more precise than the ones produced by k-means clustering. In terms of the GCC measure, the
     SCAN-produced clusters are clearly superior to the k-means ones, which indicates better
     geographical focus and thus better correspondence to landmarks and events (which are usually
     highly localized). The difference in GCC is especially pronounced for visual clusters. The
     actual GCC performance of k-means clustering …

     … similarity graphs. We found that the best information-retrieval performance is achieved by
     use of the hybrid similarity graph. More specifically, the F-measure of the HYB image clusters
     was 28.5 percent higher than the one of VIS clusters and 19.8 percent higher than the one of
     TAG-C clusters. The interannotator agreement for these results was substantial, because in all
     cases the obtained κ-statistic values …

  S. Papadopoulos, C. Zigkolis, Y. Kompatsiaris, A. Vakali. “Cluster-based Landmark and
  Event Detection on Tagged Photo Collections”. In IEEE Multimedia Magazine 18(1),
  pp. 52-63, 2011
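
As a reading aid for the P, R and F columns, the sketch below recomputes F from P and R, assuming F is the balanced F-measure (harmonic mean of precision and recall). Under that assumption the values agree with the table to within rounding; the dictionary keys simply reuse the method names from the table.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (P, R, F) copied from three rows of the table.
ROWS = {
    "KMVIS,1M": (0.806, 0.324, 0.462),
    "KMVIS,3M": (0.929, 0.271, 0.420),
    "KMTAG,1M": (0.848, 0.307, 0.451),
}

# Absolute difference between the recomputed and the reported F value.
DIFFS = {name: abs(f_measure(p, r) - f) for name, (p, r, f) in ROWS.items()}
```

The table also shows the usual trade-off: the SCAN-based clusters reach very high precision at lower recall, which pulls their F value down.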
clusttour.gr application

   PHOTOS METADATA
       tags: sagrada familia, cathedral, barcelona
       taken: 12 May 2009
       lat: 41.4036, lon: 2.1743

   SPATIAL CLUSTERING + TEMPORAL ANALYSIS

   COMMUNITY DETECTION
       VISUAL
       TAG
       HYBRID

   CLASSIFICATION TO LANDMARKS/EVENTS
       axes: #users / #photos, duration
       [2 years, 50 users / 120 photos]
       [1 day, 2 users / 10 photos]

 S. Papadopoulos, C. Zigkolis, Y. Kompatsiaris, A. Vakali. “Cluster-based Landmark and Event Detection on Tagged Photo
 Collections”. In IEEE Multimedia Magazine 18(1), pp. 52-63, 2011
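
The spatial side of this pipeline works on photo coordinates such as the lat/lon pair shown in the metadata. The sketch below shows one way to measure how tightly a cluster of geotagged photos is packed, using the haversine great-circle distance. It assumes that a coherence measure can be taken as the mean and standard deviation of the photo distances from the cluster centroid; that reading, the Earth radius constant and the sample points are assumptions for illustration, not definitions taken from the slides.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def coherence(points):
    """Mean and standard deviation (metres) of distances to the cluster centroid."""
    lat_c = sum(lat for lat, _ in points) / len(points)
    lon_c = sum(lon for _, lon in points) / len(points)
    dists = [haversine_m(lat, lon, lat_c, lon_c) for lat, lon in points]
    mean = sum(dists) / len(dists)
    sd = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return mean, sd

# Hypothetical photo locations near the coordinates shown in the slide metadata.
CLUSTER = [(41.4036, 2.1743), (41.4040, 2.1750), (41.4030, 2.1738)]
MEAN_D, SD_D = coherence(CLUSTER)
```

Averaging latitudes and longitudes to get the centroid is only reasonable for small, localized clusters like this one.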
[Screenshots of the application. Labelled elements: diverse set of area photos, photo cluster summary, time slices, original photo metadata, photo clusters ranked by popularity, area tags]
Available on AppStore
                              http://clusttour.gr/itunes




Social Media “teacher” of the machine




Self training + Social Media

 •  Manually labelled data
      •  High quality annotations
      •  Expensive to generate
 •  Self-training loop
      •  Train classifier
      •  Apply on unlabelled data
      •  Enhance training set with unlabelled data based on the
         classifier’s decision
 •  Unlabelled data: crowdsourcing – social media (visual + textual)

Challenges
 •  Region instead of image annotation
      •  E.g. tags are global annotations, while local ones
         are needed
 •  Imperfect segmentation
      •  Adaptive size region selection
 •  Visual and textual similarity ambiguity
      •  Fusion of scores



Proposed approach
 Adapted self training for region selection
      − Initial models are trained using labelled regions
      − The models are applied on regions extracted from
        loosely tagged images (obtained at almost no cost)
           •  Dismiss regions that are relatively too small to be useful
      − Tags add an extra layer of confidence in the
        selection process
           •  Semantic relatedness between concepts and tags is
              calculated using either WordNet or a modified version of
              Google Similarity Distance
      − Select regions based on visual and textual
        information and use them to enhance the positive
        training set
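
A rough sketch of the selection step described above: small regions are dismissed, each remaining region gets a visual score from the current model and a textual score from the relatedness between the target concept and the image tags, and the two are combined. Several details are assumptions made for the example: the product as fusion rule, the relatedness lookup table standing in for WordNet or Google Similarity Distance, the field names and the thresholds.

```python
def select_regions(regions, relatedness, concept, min_area_ratio, top_n):
    """Pick the top_n regions for `concept` by visual score times textual score.

    regions: dicts with 'id', 'area_ratio', 'visual_score' and 'tags'
    relatedness: {(concept, tag): score in [0, 1]} (hypothetical lookup table)
    """
    scored = []
    for region in regions:
        if region["area_ratio"] < min_area_ratio:
            continue  # dismiss regions that are too small to be useful
        textual = max(
            (relatedness.get((concept, tag), 0.0) for tag in region["tags"]),
            default=0.0,
        )
        scored.append((region["visual_score"] * textual, region["id"]))
    scored.sort(reverse=True)
    return [region_id for score, region_id in scored[:top_n] if score > 0]

# Hypothetical regions from loosely tagged images.
REGIONS = [
    {"id": "a", "area_ratio": 0.30, "visual_score": 0.90, "tags": ["beach", "sea"]},
    {"id": "b", "area_ratio": 0.02, "visual_score": 0.95, "tags": ["sea"]},
    {"id": "c", "area_ratio": 0.25, "visual_score": 0.80, "tags": ["city"]},
    {"id": "d", "area_ratio": 0.40, "visual_score": 0.60, "tags": ["ocean"]},
]
RELATEDNESS = {
    ("sea", "sea"): 1.0,
    ("sea", "ocean"): 0.9,
    ("sea", "beach"): 0.6,
    ("sea", "city"): 0.1,
}
SELECTED = select_regions(REGIONS, RELATEDNESS, "sea", min_area_ratio=0.05, top_n=2)
```

Region "b" has the highest visual score but is dismissed for its size, and region "c" loses out because its tags are barely related to the concept, so the tags act as the extra layer of confidence the slide mentions.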

[Figure: pipeline of the proposed approach. Labelled stages: dismiss non-informative regions; combine visual and textual information; use the selected samples to enhance the positive training set]
Experimental Setup
 •      SAIAPR TC-12 dataset (imageCLEF) 20k manually
        labelled images split into 3 subsets
        •  train 14k images (used for testing the proposed approach
           directly) – 70%
        •  validation 2k images (used as the initial training set) –
           10%
        •  test 4k images (used for evaluation) – 20%
 •      MIRFlickr-1m
        •  1 million loosely tagged images (used for selecting
           regions to enhance the initial classifiers)


      E. Chatzilari, S. Nikolopoulos, Y. Kompatsiaris, J. Kittler. Multi-Modal Region
      Selection Approach for Training Object Detectors, ICMR 2012, Hong Kong -
      China, June 2012

Performance Comparison of Retrained models
  The configuration incorporating both visual and textual information exhibits the
  highest performance in 44 out of the 63 examined concepts, compared to 4 for the
  typical self training configuration and 15 for the configuration based on the
  initial classifiers.

                Validation    Visual    Visual*Textual
  Without ARD                   4.9           6
                   5.7
  With ARD                      5.1           7

  The proposed approach for adaptive region dismissal (ARD) greatly increases the
  performance of the resulting classifiers.
Current work
 •    Application to global (whole image) annotation
 •    Introduction of visual ambiguity for improved selection of
      training samples
 •    Learning of concepts which are visually similar and co-occur
      in images
      •  E.g. “sea” – “sky”
 •    Do not select such training samples




Semi-supervised machine learning for concept detection




Concept Detection

   •  Use of similarity graph structure for machine
      learning
   •  Exploit multi-modal information through different
      fusion techniques
Spectral Graph Clustering

                         Example: Values of second eigenvector
                            of normalized Laplacian matrix
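
The following sketch illustrates what the second eigenvector of a normalized Laplacian looks like on a small graph. It assumes the symmetric normalization L = I - D^{-1/2} A D^{-1/2} (the slide does not say which normalization is used) and approximates the eigenvector with plain power iteration plus deflation, so no numerical library is needed. The toy graph is hypothetical and must have no isolated vertices.

```python
import math

def second_eigenvector(adjacency, iterations=200):
    """Approximate the eigenvector of the second-smallest eigenvalue of
    L = I - D^{-1/2} A D^{-1/2} for a connected, undirected graph."""
    n = len(adjacency)
    deg = [sum(row) for row in adjacency]
    # Trivial eigenvector of L (eigenvalue 0): D^{1/2} 1, normalised to length 1.
    norm = math.sqrt(sum(deg))
    first = [math.sqrt(d) / norm for d in deg]
    x = [float(i + 1) for i in range(n)]  # arbitrary deterministic start vector
    for _ in range(iterations):
        # Deflation: remove the component along the trivial eigenvector.
        dot = sum(a * b for a, b in zip(x, first))
        x = [a - dot * b for a, b in zip(x, first)]
        # Multiply by 2I - L = I + D^{-1/2} A D^{-1/2}; its dominant remaining
        # eigenvector is the one we are looking for.
        y = [
            x[i] + sum(
                x[j] / math.sqrt(deg[i] * deg[j])
                for j in range(n) if adjacency[i][j]
            )
            for i in range(n)
        ]
        length = math.sqrt(sum(v * v for v in y))
        x = [v / length for v in y]
    return x

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
ADJ = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
V = second_eigenvector(ADJ)
```

On this graph the entries of V take one sign on the first triangle and the opposite sign on the second, so thresholding the vector at zero recovers the two clusters.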
Fusion (1)

Fusion (2)
MIR-Flickr Experimental Results

 25000 images + labels, 38 concepts
Proposed Approach Vs. Hare & Lewis, 2010

Proposed Approach Vs. Guillaumin et al., 2010
Other relevant approaches
 •  S. Nikolopoulos, E. Giannakidou, I. Kompatsiaris, I. Patras, and A.
    Vakali, “Combining multi-modal features for social media
    analysis”, in book Social Media Modeling and Computing, Springer
    2011
      •  pLSA-based aspect models to define a latent semantic space where
         heterogeneous types of information can be effectively combined
 •  Georgios Petkos, Symeon Papadopoulos, Yiannis Kompatsiaris,
    “Social Event Detection using Multimodal Clustering and
    Integrating Supervisory Signals”, ICMR 2012.
 •  E. Spyromitros-Xioufis, S. Papadopoulos, I. Kompatsiaris, G.
    Tsoumakas, I. Vlahavas. “An Empirical Study on the Combination of
    SURF Features with VLAD Vectors for Image Search”, WIAMIS 2012,
    Dublin, Ireland, May 2012
VLAD+SIFT vs. VLAD+SURF
 Accuracy vs. dimensionality
 VLAD+SURF improves on VLAD+SIFT and FV+SIFT across all dimensions in both
    Holidays and Oxford datasets

 Results in rows starting with * are taken from Jégou et al., 2011, hence the missing values for some entries.
 SIFT corresponds to PCA reduced SIFT, which yielded better results than standard SIFT in Jégou et al., 2011
SocialSensor
          Applications and Use Cases
                      http://www.socialsensor.eu




“Social media is transforming the way we do journalism” (New York Times)

“Social media is the key place for emerging stories – internationally, nationally, locally” (BBC)

“It has changed the way we do news” (MSN)


                                                      Source: picture alliance / dpa

“It’s really hard to find the nuggets of useful stuff in an ocean of content” (BBC)

“Things that aren’t relevant crowd out the content you are looking for” (MSN)

“The filters aren’t configurable enough” (CNN)
Source: Getty Images
Verification was simpler in the past...




Source: Frank Grätz
Infotainment
•  Events with large numbers of visitors
•  Thessaloniki International Film Festival
     •  80,000 viewers / 100,000 visitors in 10 days
     •  150 films, 350 screenings
•  Discovery and presentation of relevant aggregated social media (e.g. film ratings from tweets)

Conclusions and Issues
•  Social media data mining provides interesting
   results in many applications
•  Not all data are always available (e.g. user queries, Facebook)
     •  Infrastructure
     •  Policy issues
•  Scalability and Real-time approaches
•  Fusion of various modalities
     •  Content, social, temporal, location
•  Linking other sources (web, Linked Open Data)
•  Applications and commercialization
     •  Proven functionality for the organization
     •  User engagement


Colleagues
•  Dr. Symeon Papadopoulos
     •    Community detection
     •    Graph-based concept detection
     •    Visual Features
•  Dr. Georgios Petkos
     •    Multimodal event detection
•  Dr. Spiros Nikolopoulos
     •  pLSA fusion
•  Elisavet Chatzilari (PhD Student)
     •    Social media for learning
•  Lefteris Spyromitros (PhD Student)
     •  Visual Features
•  Juxhin Bakalli and Manos Schinas
     •  Applications development (Clusttour and ThessFest)
•  Prof. Athina Vakali (Informatics Dept, AUTh)
     •  Collaboration in Community Detection / Clusttour


Thank you!
                             http://mklab.iti.gr




Scalability Challenges

  •  Network, crawling, data collection
      •     Streaming data
  •  Users
      •     High numbers of users
  •  Processing (e.g. NLP, clustering, etc.)
      •     Links
           •    Web sites
           •    Retweets, mentions, etc.
      •     Multimedia content (e.g. images, YouTube videos)


Some Statistics




Datasift Architecture




Datasift processing
•    Processes the whole firehose: 250+ million tweets/day
•    40+ services run in the system
•    Handling the firehose
•    Low-latency natural language processing and entity extraction on tweets
•    Low-latency in-line augmentation of tweets
•    Low-latency handling of very large individual filters
•    Keeping a history of the firehose by persisting the 1 TB of data it sends each day
•    Allowing analytics to be run on the history of the firehose
•    Real-time billing
•    Streaming filter results to 1000s of clients
•    http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html

Datasift statistics
•  Current peak delivery of 120,000 tweets per second (260 Mbit bandwidth)
•  Performs 250+ million sentiment analyses with sub-100 ms latency
•  1 TB of augmented data (including gender, sentiment, etc.) transits the platform daily
•  Data filtering nodes can process up to 10,000 unique streams (with peaks of 8,000+ tweets running through them per second)
•  Can do data lookups on 10,000,000+ username lists in real time
•  Links augmentation performs 27 million link resolves and lookups plus 15+ million full web page aggregations per day
•  http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
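
The figures above can be put on a common scale with a little arithmetic. The sketch below is only a back-of-the-envelope reading. It assumes that "260 Mbit" means 260 Mbit/s, that 1 TB is 10^12 bytes, and that the "250+ million" daily figure is exactly 250 million tweets, so the results are rough estimates, not numbers reported by the source.

```python
# Back-of-the-envelope arithmetic on the quoted Datasift figures.
# Assumptions (not stated on the slide): "260 Mbit" is a rate in Mbit/s,
# 1 TB = 10**12 bytes, and "250+ million" is taken as exactly 250 million.
PEAK_TWEETS_PER_SECOND = 120_000
PEAK_BANDWIDTH_BITS_PER_SECOND = 260e6
DAILY_TWEETS = 250e6
DAILY_AUGMENTED_BYTES = 1e12
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_tweet_at_peak():
    """Average wire size of one tweet implied by the peak figures."""
    return PEAK_BANDWIDTH_BITS_PER_SECOND / 8 / PEAK_TWEETS_PER_SECOND


def average_tweets_per_second():
    """Mean arrival rate implied by the daily volume."""
    return DAILY_TWEETS / SECONDS_PER_DAY


def augmented_bytes_per_tweet():
    """Average size of one augmented tweet implied by the daily data volume."""
    return DAILY_AUGMENTED_BYTES / DAILY_TWEETS


print(f"bytes per tweet at peak:   {bytes_per_tweet_at_peak():.1f}")
print(f"average tweets per second: {average_tweets_per_second():.1f}")
print(f"augmented bytes per tweet: {augmented_bytes_per_tweet():.0f}")
```

Under these assumptions the peak rate is roughly 40 times the daily average, which is one reason the system has to be sized for bursts instead of mean load.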

Frameworks
•  MapReduce (Hadoop)
     •  Computation distribution
     •  Batch processing of huge datasets
     •  Parallel processing on large clusters of compute nodes
•  Cassandra, Tokyo Cabinet
     •    Key value stores
     •    Horizontal scaling for many users
     •    Huge Data indexing
     •    Fault tolerance
     •    No sophisticated query capabilities
•  MongoDB
     •  JSON native support
     •  Large-Scale data storage
•  Memcached
     •  Efficient caching
     •  Clustering
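
To make the MapReduce item above concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases, counting hashtags in tweets. It does not use the Hadoop API. The function names and the example tweets are made up for illustration, and a real job would distribute the map and reduce calls across cluster nodes.

```python
from collections import defaultdict
from itertools import chain


def map_hashtags(tweet):
    """Map phase: emit a (hashtag, 1) pair for every hashtag in one tweet."""
    return [(word.lower(), 1)
            for word in tweet.split()
            if word.startswith("#") and len(word) > 1]


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the values collected for one key."""
    return key, sum(values)


def run_job(tweets):
    """Run the three phases sequentially in one process (toy stand-in for a cluster)."""
    mapped = chain.from_iterable(map_hashtags(t) for t in tweets)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Hypothetical input, invented for this example.
tweets = [
    "Great screening tonight #TIFF #film",
    "Queue for tickets is huge #tiff",
    "New trailer out #film",
]
hashtag_counts = run_job(tweets)
print(hashtag_counts)
```

Because each map call looks at one tweet only and each reduce call at one key only, both phases can run in parallel on separate nodes, which is what makes the model suitable for batch processing of huge datasets.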

Scalability Processing Approaches

 •  Sampling
 •  Dimensionality reduction
      •    E.g. VLAD
 •  Local computations
 •  Iterative scanning/processing
      •    stream based
 •  Multi-level – Hierarchical
 •  Distributed
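
As one illustration of how sampling and stream-based processing from the list above can be combined, the sketch below uses reservoir sampling (Algorithm R) to keep a fixed-size uniform sample of a stream whose length is not known in advance. The slides do not say which sampling scheme was used, so this is an assumed example, and the stream of tweet identifiers is synthetic.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items drawn uniformly at random from an iterable of unknown length.

    Uses Algorithm R: memory stays O(k) and the stream is scanned only once.
    If the stream holds fewer than k items, all of them are returned.
    """
    rng = rng or random.Random()
    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (index + 1).
            slot = rng.randint(0, index)  # both bounds inclusive
            if slot < k:
                reservoir[slot] = item
    return reservoir


# Synthetic stream standing in for a tweet feed.
stream = (f"tweet-{i}" for i in range(100_000))
sample = reservoir_sample(stream, k=5, rng=random.Random(42))
print(sample)
```

The sample can then be fed to more expensive processing (e.g. clustering) at a cost that no longer depends on the size of the stream.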

Image Representation Approaches

 •  Bag-Of-Words (BOW)
      •    The most widely used
      •    Memory usage and search time are usually prohibitive for 10M images
 •  Vector of Locally Aggregated Descriptors (VLAD)
      •    More accurate than BOW for the same representation size
      •    Cheaper to compute
      •    Dimensionality can be further reduced with PCA without noticeable impact on accuracy
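
A minimal sketch of how a VLAD vector is built may help here. Each local descriptor is assigned to its nearest centroid, the residuals (descriptor minus centroid) are summed per centroid, and the concatenated sums are L2-normalised. The code is plain Python so that it is self-contained, and a practical implementation would use an array library. The centroids are assumed to have been learned beforehand (e.g. with k-means), the toy 2-dimensional data is invented, and refinements such as power normalisation are left out.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalised VLAD vector.

    The output has len(centroids) * len(centroids[0]) components.
    """
    k = len(centroids)
    d = len(centroids[0])
    residual_sums = [[0.0] * d for _ in range(k)]
    for descriptor in descriptors:
        # Hard-assign the descriptor to its nearest centroid.
        nearest = min(range(k),
                      key=lambda i: squared_distance(descriptor, centroids[i]))
        # Accumulate the residual for that centroid.
        for j in range(d):
            residual_sums[nearest][j] += descriptor[j] - centroids[nearest][j]
    vector = [value for row in residual_sums for value in row]
    norm = math.sqrt(sum(value * value for value in vector))
    return [value / norm for value in vector] if norm > 0 else vector


# Toy data: 2 centroids and 3 descriptors in 2 dimensions (illustrative only).
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
image_vector = vlad(descriptors, centroids)
print(image_vector)
```

The output dimension is the number of centroids times the descriptor dimension, so k = 64 centroids with 128-dimensional SIFT descriptors give 64 × 128 = 8192 components, which PCA can then reduce further.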

Experimental Results (Holidays dataset)
           method       k        descriptor     dimension      MAP
           BOW          1K       SIFT-pca       1K             40.1
           BOW          20K      SIFT-pca       20K            43.7
           BOW          200K     SIFT-pca       200K           54.0
           VLAD         64       SIFT-pca       4096           55.6
           VLAD         64       SIFT           8192           55.2
           VLAD         128      SIFT           16384          56.7
           VLAD         64       SURF           4096           63.2
           VLAD         128      SURF           8192           65.6
           Fisher       64       SIFT-pca       4096           59.7
           Fisher       256      SIFT-pca       16384          62.5


                             Eleftherios Spyromitros-Xioufis   1/8/12
Experimental Results (Holidays dataset)
     method       k        descriptor    D           D’ (PCA)    MAP
     BOW          20K      SIFT-pca      20K         512         44.9
                                                     128         45.2
                                                     64          44.4
     VLAD         64       SIFT-pca      4096        512         59.8
                                                     128         55.7
                                                     64          55.3
     VLAD         64       SURF          4096        512         63.4
                                                     128         58.6
                                                     64          55.6
     Fisher       64       SIFT-pca      4096        512         61.0
                                                     128         56.5
                                                     64          52.0
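
The MAP column in these tables is mean average precision, which appears to be reported as a percentage. The sketch below follows the textbook definition, since the exact evaluation protocol (for example, how the query image itself is treated) is not given on the slides. The ranked lists and relevance sets are invented for illustration.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list.

    Precision is measured at the rank of every relevant item that was
    retrieved, and the sum is divided by the total number of relevant items.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(queries):
    """Mean of the average precision over (ranked_ids, relevant_ids) pairs."""
    return sum(average_precision(ranked, relevant)
               for ranked, relevant in queries) / len(queries)


# Two hypothetical queries: a ranked result list and the ground-truth matches.
queries = [
    (["a", "x", "b"], {"a", "b"}),  # AP = (1/1 + 2/3) / 2
    (["x", "c"], {"c"}),            # AP = (1/2) / 1
]
map_score = mean_average_precision(queries)
print(f"MAP = {100 * map_score:.1f}%")
```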
University of Surrey, CVSSP Seminar                              Guildford, 31 July, 2012
                            Eleftherios Spyromitros-Xioufis   1/8/12

Contenu connexe

Similaire à Social Media Mining for Emergent Semantics

Learning, Living and researching in a Networked World
Learning, Living and researching in a Networked WorldLearning, Living and researching in a Networked World
Learning, Living and researching in a Networked WorldTerry Anderson
 
Social media for researchers workshop 071112
Social media for researchers workshop 071112Social media for researchers workshop 071112
Social media for researchers workshop 071112Nicole Beale
 
Learning Ecology Potential Of Google Earth
Learning Ecology Potential Of Google EarthLearning Ecology Potential Of Google Earth
Learning Ecology Potential Of Google EarthGerard Brady
 
Exploring the Digital University
Exploring the Digital University Exploring the Digital University
Exploring the Digital University Sheila MacNeill
 
Social networking as enabler of social responsibility and sustainability
Social networking as enabler of social responsibility and sustainabilitySocial networking as enabler of social responsibility and sustainability
Social networking as enabler of social responsibility and sustainabilityVedran Podobnik
 
Digital Connectedness: Taking Ownership of Your Professional Online Presence
Digital Connectedness: Taking Ownership of Your Professional Online Presence Digital Connectedness: Taking Ownership of Your Professional Online Presence
Digital Connectedness: Taking Ownership of Your Professional Online Presence Sue Beckingham
 
Netnography webinar
Netnography webinarNetnography webinar
Netnography webinarsuresh sood
 
Exploring social theory through enterprise social media (muller, ibm research)
Exploring social theory through enterprise social media (muller, ibm research)Exploring social theory through enterprise social media (muller, ibm research)
Exploring social theory through enterprise social media (muller, ibm research)Michael Muller
 
Online Communities in Citizen Science
Online Communities in Citizen ScienceOnline Communities in Citizen Science
Online Communities in Citizen ScienceAndrea Wiggins
 
Social media for researchers workshop 4th July 2012 University of Southampton
Social media for researchers workshop 4th July 2012 University of SouthamptonSocial media for researchers workshop 4th July 2012 University of Southampton
Social media for researchers workshop 4th July 2012 University of SouthamptonNicole Beale
 
Lida change-reference-abels
Lida change-reference-abelsLida change-reference-abels
Lida change-reference-abelsfpehar
 
Conole opal
Conole opalConole opal
Conole opalgrainne
 
Dundee symposium 31_may13
Dundee symposium 31_may13Dundee symposium 31_may13
Dundee symposium 31_may13Sheila MacNeill
 
2011.10.10 Multi-Disciplinary Research Themes and Training
2011.10.10 Multi-Disciplinary Research Themes and Training2011.10.10 Multi-Disciplinary Research Themes and Training
2011.10.10 Multi-Disciplinary Research Themes and TrainingNUI Galway
 
Design is the New Black - How to integrate thoughtful learning design in soci...
Design is the New Black - How to integrate thoughtful learning design in soci...Design is the New Black - How to integrate thoughtful learning design in soci...
Design is the New Black - How to integrate thoughtful learning design in soci...Stella Lee
 
Jacobson and Mackey: Metaliteracy Workshop
Jacobson and Mackey: Metaliteracy Workshop Jacobson and Mackey: Metaliteracy Workshop
Jacobson and Mackey: Metaliteracy Workshop ALATechSource
 
University of North Carolina talk 28 May 2012
University of North Carolina talk 28 May 2012University of North Carolina talk 28 May 2012
University of North Carolina talk 28 May 2012Chris Batt
 
Pres-ACMgroup2012intro-v2-isajahnke
Pres-ACMgroup2012intro-v2-isajahnkePres-ACMgroup2012intro-v2-isajahnke
Pres-ACMgroup2012intro-v2-isajahnkeIsa Jahnke
 

Similaire à Social Media Mining for Emergent Semantics (20)

Learning, Living and researching in a Networked World
Learning, Living and researching in a Networked WorldLearning, Living and researching in a Networked World
Learning, Living and researching in a Networked World
 
Social media for researchers workshop 071112
Social media for researchers workshop 071112Social media for researchers workshop 071112
Social media for researchers workshop 071112
 
Learning Ecology Potential Of Google Earth
Learning Ecology Potential Of Google EarthLearning Ecology Potential Of Google Earth
Learning Ecology Potential Of Google Earth
 
Exploring the Digital University
Exploring the Digital University Exploring the Digital University
Exploring the Digital University
 
Social networking as enabler of social responsibility and sustainability
Social networking as enabler of social responsibility and sustainabilitySocial networking as enabler of social responsibility and sustainability
Social networking as enabler of social responsibility and sustainability
 
We are digital!
We are digital!We are digital!
We are digital!
 
Digital Connectedness: Taking Ownership of Your Professional Online Presence
Digital Connectedness: Taking Ownership of Your Professional Online Presence Digital Connectedness: Taking Ownership of Your Professional Online Presence
Digital Connectedness: Taking Ownership of Your Professional Online Presence
 
Netnography webinar
Netnography webinarNetnography webinar
Netnography webinar
 
Exploring social theory through enterprise social media (muller, ibm research)
Exploring social theory through enterprise social media (muller, ibm research)Exploring social theory through enterprise social media (muller, ibm research)
Exploring social theory through enterprise social media (muller, ibm research)
 
Online Communities in Citizen Science
Online Communities in Citizen ScienceOnline Communities in Citizen Science
Online Communities in Citizen Science
 
Social media for researchers workshop 4th July 2012 University of Southampton
Social media for researchers workshop 4th July 2012 University of SouthamptonSocial media for researchers workshop 4th July 2012 University of Southampton
Social media for researchers workshop 4th July 2012 University of Southampton
 
Making energy efficiency research relevant
Making energy efficiency research relevant Making energy efficiency research relevant
Making energy efficiency research relevant
 
Lida change-reference-abels
Lida change-reference-abelsLida change-reference-abels
Lida change-reference-abels
 
Conole opal
Conole opalConole opal
Conole opal
 
Dundee symposium 31_may13
Dundee symposium 31_may13Dundee symposium 31_may13
Dundee symposium 31_may13
 
2011.10.10 Multi-Disciplinary Research Themes and Training
2011.10.10 Multi-Disciplinary Research Themes and Training2011.10.10 Multi-Disciplinary Research Themes and Training
2011.10.10 Multi-Disciplinary Research Themes and Training
 
Design is the New Black - How to integrate thoughtful learning design in soci...
Design is the New Black - How to integrate thoughtful learning design in soci...Design is the New Black - How to integrate thoughtful learning design in soci...
Design is the New Black - How to integrate thoughtful learning design in soci...
 
Jacobson and Mackey: Metaliteracy Workshop
Jacobson and Mackey: Metaliteracy Workshop Jacobson and Mackey: Metaliteracy Workshop
Jacobson and Mackey: Metaliteracy Workshop
 
University of North Carolina talk 28 May 2012
University of North Carolina talk 28 May 2012University of North Carolina talk 28 May 2012
University of North Carolina talk 28 May 2012
 
Pres-ACMgroup2012intro-v2-isajahnke
Pres-ACMgroup2012intro-v2-isajahnkePres-ACMgroup2012intro-v2-isajahnke
Pres-ACMgroup2012intro-v2-isajahnke
 

Plus de Yiannis Kompatsiaris

From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?Yiannis Kompatsiaris
 
AI4Media - European Leadership in Human-Centred Trustworthy AI session
AI4Media - European Leadership in Human-Centred Trustworthy AI sessionAI4Media - European Leadership in Human-Centred Trustworthy AI session
AI4Media - European Leadership in Human-Centred Trustworthy AI sessionYiannis Kompatsiaris
 
Social media mining for sensing and responding to real-world trends and events
Social media mining for sensing and responding to real-world trends and eventsSocial media mining for sensing and responding to real-world trends and events
Social media mining for sensing and responding to real-world trends and eventsYiannis Kompatsiaris
 
Visual Information Analysis for Crisis and Natural Disasters Management and R...
Visual Information Analysis for Crisis and Natural Disasters Management and R...Visual Information Analysis for Crisis and Natural Disasters Management and R...
Visual Information Analysis for Crisis and Natural Disasters Management and R...Yiannis Kompatsiaris
 
Sensor Based Ambient Assisted Living
Sensor Based Ambient Assisted LivingSensor Based Ambient Assisted Living
Sensor Based Ambient Assisted LivingYiannis Kompatsiaris
 
Social Media Analytics for Graph-Based Event Detection
Social Media Analytics for Graph-Based Event DetectionSocial Media Analytics for Graph-Based Event Detection
Social Media Analytics for Graph-Based Event DetectionYiannis Kompatsiaris
 
Social Media Verification Challenges, Approaches and Applications
Social Media Verification  Challenges, Approaches and ApplicationsSocial Media Verification  Challenges, Approaches and Applications
Social Media Verification Challenges, Approaches and ApplicationsYiannis Kompatsiaris
 
The DemaWare Service-Oriented AAL Platform for People with Dementia
The DemaWare Service-Oriented AAL Platform for People with DementiaThe DemaWare Service-Oriented AAL Platform for People with Dementia
The DemaWare Service-Oriented AAL Platform for People with DementiaYiannis Kompatsiaris
 
Vision about Social Networks Content Exploitation (EC Concertation meeting)
Vision about Social Networks Content Exploitation (EC Concertation meeting)Vision about Social Networks Content Exploitation (EC Concertation meeting)
Vision about Social Networks Content Exploitation (EC Concertation meeting)Yiannis Kompatsiaris
 
Social Media Crawling and Mining Seminar (Motivation Part)
Social Media Crawling and Mining Seminar (Motivation Part)Social Media Crawling and Mining Seminar (Motivation Part)
Social Media Crawling and Mining Seminar (Motivation Part)Yiannis Kompatsiaris
 
"Μια πόλη από το μέλλον": Πως ο πολίτης μπορεί να γίνει συμμέτοχος μέσω της χ...
"Μια πόλη από το μέλλον": Πως ο πολίτης μπορεί να γίνει συμμέτοχος μέσω της χ..."Μια πόλη από το μέλλον": Πως ο πολίτης μπορεί να γίνει συμμέτοχος μέσω της χ...
"Μια πόλη από το μέλλον": Πως ο πολίτης μπορεί να γίνει συμμέτοχος μέσω της χ...Yiannis Kompatsiaris
 
Social Data and Multimedia Analytics for News and Events Applications
Social Data and Multimedia Analytics for News and Events ApplicationsSocial Data and Multimedia Analytics for News and Events Applications
Social Data and Multimedia Analytics for News and Events ApplicationsYiannis Kompatsiaris
 
Τεχνικές Αναγνώρισης Προτύπων και Μηχανικής Μάθησης για Εφαρμογές Ανάλυσης Πο...
Τεχνικές Αναγνώρισης Προτύπων και Μηχανικής Μάθησης για Εφαρμογές Ανάλυσης Πο...Τεχνικές Αναγνώρισης Προτύπων και Μηχανικής Μάθησης για Εφαρμογές Ανάλυσης Πο...
Τεχνικές Αναγνώρισης Προτύπων και Μηχανικής Μάθησης για Εφαρμογές Ανάλυσης Πο...Yiannis Kompatsiaris
 
Άνοια στο σπίτι: Τεχνολογίες για παρακολούθηση από απόσταση και ανεξάρτητη δ...
 Άνοια στο σπίτι: Τεχνολογίες για παρακολούθηση από απόσταση και ανεξάρτητη δ... Άνοια στο σπίτι: Τεχνολογίες για παρακολούθηση από απόσταση και ανεξάρτητη δ...
Άνοια στο σπίτι: Τεχνολογίες για παρακολούθηση από απόσταση και ανεξάρτητη δ...Yiannis Kompatsiaris
 
SocialSensor Project: Sensing User Generated Input for Improved Media Discove...
SocialSensor Project: Sensing User Generated Input for Improved Media Discove...SocialSensor Project: Sensing User Generated Input for Improved Media Discove...
SocialSensor Project: Sensing User Generated Input for Improved Media Discove...Yiannis Kompatsiaris
 
Improve My City: App for Citizens Reporting Issues in Municipalities – Regions
Improve My City: App for Citizens Reporting Issues in Municipalities – RegionsImprove My City: App for Citizens Reporting Issues in Municipalities – Regions
Improve My City: App for Citizens Reporting Issues in Municipalities – RegionsYiannis Kompatsiaris
 
Socialsensor project overview and topic discovery in tweeter streams
Socialsensor project overview and topic discovery in tweeter streams Socialsensor project overview and topic discovery in tweeter streams
Socialsensor project overview and topic discovery in tweeter streams Yiannis Kompatsiaris
 
Introduction for the Summer School on Social Media Modeling and Search 2012
Introduction for the Summer School on Social Media Modeling and Search 2012Introduction for the Summer School on Social Media Modeling and Search 2012
Introduction for the Summer School on Social Media Modeling and Search 2012Yiannis Kompatsiaris
 

Plus de Yiannis Kompatsiaris (20)

From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?
 
AI4Media - European Leadership in Human-Centred Trustworthy AI session
AI4Media - European Leadership in Human-Centred Trustworthy AI sessionAI4Media - European Leadership in Human-Centred Trustworthy AI session
AI4Media - European Leadership in Human-Centred Trustworthy AI session
 
Social media mining for sensing and responding to real-world trends and events
Social media mining for sensing and responding to real-world trends and eventsSocial media mining for sensing and responding to real-world trends and events
Social media mining for sensing and responding to real-world trends and events
 
Visual Information Analysis for Crisis and Natural Disasters Management and R...
Visual Information Analysis for Crisis and Natural Disasters Management and R...Visual Information Analysis for Crisis and Natural Disasters Management and R...
Visual Information Analysis for Crisis and Natural Disasters Management and R...
 
Sensor Based Ambient Assisted Living
Sensor Based Ambient Assisted LivingSensor Based Ambient Assisted Living
Sensor Based Ambient Assisted Living
 
Social Media Analytics for Graph-Based Event Detection
Social Media Analytics for Graph-Based Event DetectionSocial Media Analytics for Graph-Based Event Detection
Social Media Analytics for Graph-Based Event Detection
 
Social Media Verification Challenges, Approaches and Applications
Social Media Verification  Challenges, Approaches and ApplicationsSocial Media Verification  Challenges, Approaches and Applications
Social Media Verification Challenges, Approaches and Applications
 
Processing Large Complex Data
Processing Large Complex DataProcessing Large Complex Data
Processing Large Complex Data
 
The DemaWare Service-Oriented AAL Platform for People with Dementia
The DemaWare Service-Oriented AAL Platform for People with DementiaThe DemaWare Service-Oriented AAL Platform for People with Dementia
The DemaWare Service-Oriented AAL Platform for People with Dementia
 
Vision about Social Networks Content Exploitation (EC Concertation meeting)
Vision about Social Networks Content Exploitation (EC Concertation meeting)Vision about Social Networks Content Exploitation (EC Concertation meeting)
Vision about Social Networks Content Exploitation (EC Concertation meeting)
 
Dem@care Project Short Overview
Dem@care Project Short OverviewDem@care Project Short Overview
Dem@care Project Short Overview
 
Social Media Crawling and Mining Seminar (Motivation Part)
Social Media Crawling and Mining Seminar (Motivation Part)Social Media Crawling and Mining Seminar (Motivation Part)
Social Media Crawling and Mining Seminar (Motivation Part)
 
"Μια πόλη από το μέλλον": Πως ο πολίτης μπορεί να γίνει συμμέτοχος μέσω της χ...
"Μια πόλη από το μέλλον": Πως ο πολίτης μπορεί να γίνει συμμέτοχος μέσω της χ..."Μια πόλη από το μέλλον": Πως ο πολίτης μπορεί να γίνει συμμέτοχος μέσω της χ...
"Μια πόλη από το μέλλον": Πως ο πολίτης μπορεί να γίνει συμμέτοχος μέσω της χ...
 
Social Data and Multimedia Analytics for News and Events Applications
Social Data and Multimedia Analytics for News and Events ApplicationsSocial Data and Multimedia Analytics for News and Events Applications
Social Data and Multimedia Analytics for News and Events Applications
 
Τεχνικές Αναγνώρισης Προτύπων και Μηχανικής Μάθησης για Εφαρμογές Ανάλυσης Πο...
Τεχνικές Αναγνώρισης Προτύπων και Μηχανικής Μάθησης για Εφαρμογές Ανάλυσης Πο...Τεχνικές Αναγνώρισης Προτύπων και Μηχανικής Μάθησης για Εφαρμογές Ανάλυσης Πο...
Τεχνικές Αναγνώρισης Προτύπων και Μηχανικής Μάθησης για Εφαρμογές Ανάλυσης Πο...
 
Άνοια στο σπίτι: Τεχνολογίες για παρακολούθηση από απόσταση και ανεξάρτητη δ...
 Άνοια στο σπίτι: Τεχνολογίες για παρακολούθηση από απόσταση και ανεξάρτητη δ... Άνοια στο σπίτι: Τεχνολογίες για παρακολούθηση από απόσταση και ανεξάρτητη δ...
Άνοια στο σπίτι: Τεχνολογίες για παρακολούθηση από απόσταση και ανεξάρτητη δ...
 
SocialSensor Project: Sensing User Generated Input for Improved Media Discove...
SocialSensor Project: Sensing User Generated Input for Improved Media Discove...SocialSensor Project: Sensing User Generated Input for Improved Media Discove...
SocialSensor Project: Sensing User Generated Input for Improved Media Discove...
 
Improve My City: App for Citizens Reporting Issues in Municipalities – Regions
Improve My City: App for Citizens Reporting Issues in Municipalities – RegionsImprove My City: App for Citizens Reporting Issues in Municipalities – Regions
Improve My City: App for Citizens Reporting Issues in Municipalities – Regions
 
Socialsensor project overview and topic discovery in tweeter streams
Socialsensor project overview and topic discovery in tweeter streams Socialsensor project overview and topic discovery in tweeter streams
Socialsensor project overview and topic discovery in tweeter streams
 
Introduction for the Summer School on Social Media Modeling and Search 2012
Introduction for the Summer School on Social Media Modeling and Search 2012Introduction for the Summer School on Social Media Modeling and Search 2012
Introduction for the Summer School on Social Media Modeling and Search 2012
 

Dernier

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 


Social Media Mining for Emergent Semantics

  • 1. SOCIAL MEDIA MINING AND MULTIMEDIA ANALYSIS: RESEARCH AND APPLICATIONS • Yiannis Kompatsiaris, Informatics and Telematics Institute, Centre for Research and Technology - Hellas • http://mklab.iti.gr • University of Surrey, CVSSP Seminar, Guildford, 31 July, 2012
  • 2. Contents • Introduction • Emergent Semantics from Social Media • Opportunities and Challenges • Applications • Research Approaches • Community detection in Social Media • Social media “teacher” of the machine • Concept detection • SocialSensor Applications • Conclusions - Issues
  • 3. Social networks and media • Users upload, tag, share, connect and search • Over 800 million unique users visit YouTube each month • Over 3 billion hours of video are watched each month on YouTube • 72 hours of video are uploaded to YouTube every minute • Emphasis is on uploading, visualization of results and interfaces • User engagement • Single media item analysis • Usage of the Collective nature of Social Networks
  • 4. Web 2.0 Content • Multi-modality: e.g. image + tags, image + video • Rich (Social) Context: spatio-temporal, social connections, relations and social graph • Huge volume: Massively produced and shared • Dynamic: Fast updates, real-time, streaming feeds • Multi-source: may be generated by different applications, user communities, e.g. delicious, StumbleUpon and reddit are all social bookmarking sites • Also connected to other sources (e.g. LOD, web) • Inconsistent quality: noise, spam, ambiguity
  • 5. [Figure labels around a shared media item: comments, favs, time, tags, caption, user profile]
  • 6. Social Web as a graph
  • 7. Social web as a graph • Announcement of Mubarak’s resignation • nodes = Twitter users • edges = retweets on the #jan25 hashtag • http://gephi.org/2011/the-egyptian-revolution-on-twitter/
  • 8. Blogosphere as a graph • technical – gadgets • society – politics • nodes = blogs • edges = hyperlinks • http://datamining.typepad.com/gallery/blog-map-gallery.html
  • 9. Two main directions • 1. Improve access to social media § Tag refinement, suggestion, propagation, concept detection § Results apply to single media items • 2. Extract implicit information, capture emergent semantics § Exploit explicit and implicit relations § Not explicitly identifiable by users § Data mining, Collective Intelligence • Scalable approaches taking into account the content and social context of social networks
  • 10. Tags everywhere • Sharing, describing content and search
  • 11. Very low precision
  • 12. Very low recall
  • 13. Can we improve things? • By combining information from many photos and tags, it seems that we can • Stable patterns in tagging systems over time
  • 14. “Single” media item analysis • Use features of a large number of similar content items § E.g. visual and textual features and similarity § Tag refinement, suggestion, propagation, concept detection
  • 15. Social Networks and Collective Intelligence • Social networks are a data source with an extremely dynamic nature that reflects events and the evolution of community focus (users’ interests) • Web 2.0 data consists of individually rare but collectively frequent events and topics • Potential for much more if we mine the data and their relations and exploit them in the right context • Search and Discovery of meaningful topics, entities, points of interest, social connections and events • Rather than search for isolated or directly connected social media items
  • 16. Social Networks and Collective Intelligence • “If a group has a means of aggregating different opinions, the group collective solution may well be smarter than even the smartest person’s solution” • Conditions • Diversity (large-scale) • Independence • Aggregation • Motivation for best guess • Gamification
  • 17. Social Networks and Collective Intelligence • “Social networks have emergent properties. Emergent properties are new attributes of a whole that arise from the interaction and interconnection of the parts” • Emotions, Health, Sexual relationships do not depend just on our connections (e.g. number of them) but on our position - structure in the social graph • Central – Hub • Outlier • Transitivity (connections between friends)
  • 18. Extraction of implicit information • Trace Flickr users from a chronologically ordered set of geographically referenced photos • Who are the Italians and who are the Americans? • MIT SENSEABLE CITY LAB, “The World's eyes”
  • 19. What else can we do? • Contribute to our understanding of the world • Tags that are “representative” for a geographical area • 1. Clustering of photos § K-means, based on their location [Kennedy07] • 2. Rank each cluster’s tags • 3. Get tags above a certain threshold • [Figure: Representative tags for San Francisco [Kennedy07]]
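The three steps on slide 19 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of [Kennedy07]: the function names, the deterministic seeding of k-means with the first k photos, and the TF-IDF-like tag score are all hypothetical choices made for the example.

```python
from collections import Counter
import math

def kmeans(points, k, iterations=20):
    """Plain k-means on (lat, lon) pairs; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Step 1a: assign each photo to its nearest centroid (squared distance).
        assignment = [
            min(range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Step 1b: move each centroid to the mean location of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return assignment

def representative_tags(photos, k, threshold):
    """photos: list of (lat, lon, tags). Returns {cluster_id: [tags scoring >= threshold]}."""
    assignment = kmeans([(lat, lon) for lat, lon, _ in photos], k)
    cluster_counts = [Counter() for _ in range(k)]
    for (_, _, tags), c in zip(photos, assignment):
        cluster_counts[c].update(set(tags))
    # In how many clusters does each tag occur? Widespread tags are less representative.
    spread = Counter(tag for counts in cluster_counts for tag in counts)
    result = {}
    for c, counts in enumerate(cluster_counts):
        # Step 2: rank tags with a TF-IDF-like score (an assumption of this sketch).
        scores = {t: n * math.log(1 + k / spread[t]) for t, n in counts.items()}
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        # Step 3: keep only the tags above the threshold.
        result[c] = [t for t, s in ranked if s >= threshold]
    return result
```

With a handful of photos around two spots in a city, a tag shared by both clusters (such as the city name) scores lower than a tag concentrated in one cluster, which is the intuition behind "representative" tags.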
  • 20. Sensors and automatically user generated content • Uses the GPS in cellular phones to gather traffic information, process it, and distribute it back to the phones in real time • online, real-time data processing • privacy-preservation • data efficiency, i.e. not requiring excessive cellular network • Mobile Century Project: http://traffic.berkeley.edu/mobilecentury.html
  • 21. Applications • Xin Jin, Andrew Gallagher, Liangliang Cao, Jiebo Luo, and Jiawei Han. The wisdom of social multimedia: using flickr for prediction and forecast, International conference on Multimedia (MM '10). ACM. • Federal Emergency Management Agency plans to engage the public more in disaster response by sharing data and leveraging reports from mobile phones and social media • Gogobot: Travel Discovery Goes Social And Visual • “The service raised $4 million in funding (Google CEO Eric Schmidt is one of the investors)…This is a $100 billion a year industry in the U.S. It’s something like $350 billion worldwide.”
  • 22. Social Media as real-time Sensors • “…if you're more than 100 km away from the epicenter [of an earthquake] you can read about the quake on twitter before it hits you…”
  • 23. Applications • Science • Sociology, machine learning, computer vision (annotation) • Tourism – Leisure – Culture • Off-the-beaten path POI extraction • Marketing • Brand monitoring, personalised ads • E-Gov and e-participation • Direct citizens feedback (fixmystreet app) • News • Topics, trends event detection • Others • Environment, emergency response, energy saving, etc
  • 24. Research Fields and Issues • Statistical analysis, machine learning, data mining, pattern recognition, social network analysis • Clustering • Image, text, video feature extraction and analysis • Representation, modeling, data reduction • Graph theory • Fusion techniques • Stream processing and real-time architectures • Performance, scalability • Multi-disciplinarity (sociologists) • Security, privacy
  • 25. Social Media Community Detection
  • 26. Examples of Social Media networks • Folksonomy (Delicious): Mika, P. (2005) Ontologies Are Us: A Unified Model of Social Networks and Semantics. Proceedings of the 4th International Semantic Web Conference (ISWC 2005), Springer Berlin / Heidelberg, pp. 522-536 • MetaGraph (Digg): Lin, Y., Sun, J., Castro, P., Konuru, R., Sundaram, H., and Kelliher, A. (2009) MetaFac: community discovery via relational hypergraph factorization. Proceedings of KDD '09, ACM, pp. 527-536
  • 27. What is a community in a network? • Group of vertices that are more densely connected to each other than to the rest of the network (figure labels: intra-community edge, inter-community edge) • Multiple definitions to quantify communities: • Fortunato S. (2010) Community detection in graphs. Physics Reports 486: 75-174 • S. Papadopoulos, Y. Kompatsiaris, A. Vakali, P. Spyridonos. “Community Detection in Social Media”. In Data Mining and Knowledge Discovery, DOI: 10.1007/s10618-011-0224-z
  • 28. Subgraphs • k-clique: each node is connected to all k-1 other nodes (figure: k=3 (triangle), k=4, k=5) • N-clique: N is the length of the path allowed to all other members (figure: N=2 (star)) • k-core: all vertices have degree at least k (figure: nested 0-core, 1-core, 2-core, 3-core, 4-core)
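Two of the subgraph definitions on slide 28 translate directly into code. The sketch below assumes an undirected graph stored as a dictionary mapping each vertex to the set of its neighbours; the function names are illustrative and do not come from the talk. The k-core is found by repeatedly pruning vertices whose remaining degree is below k.

```python
from itertools import combinations

def is_clique(adjacency, nodes):
    """True if every pair of the given nodes is directly connected."""
    return all(v in adjacency[u] for u, v in combinations(nodes, 2))

def k_core(adjacency, k):
    """Return the vertices of the k-core: the maximal subgraph with all degrees >= k."""
    alive = set(adjacency)
    degree = {v: len(adjacency[v]) for v in alive}
    queue = [v for v in alive if degree[v] < k]
    while queue:
        v = queue.pop()
        if v not in alive:
            continue  # already pruned through another path
        alive.remove(v)
        # Removing v lowers the degree of its surviving neighbours.
        for u in adjacency[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    queue.append(u)
    return alive
```

For a triangle with one pendant vertex attached, the 1-core is the whole graph, the 2-core is the triangle, and the 3-core is empty.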
  • 29. Approach illustration (1/2) • Two-step community detection, followed by a characterization step: • 1st step: (µ, ε)-core detection • 2nd step: Local expansion • 3rd step: Characterization of remaining vertices as hubs or outliers
  • 30. Approach illustration (2/2) • Structural similarity + Local expansion (highly efficient and scalable approach) • Not necessary to know the number of clusters • Noise resilient (not all nodes need to be part of a community) • Generic approach adaptable to many applications (depending on node – edge representation) • S. Papadopoulos, Y. Kompatsiaris, A. Vakali. “A Graph-based Clustering Scheme for Identifying Related Tags in Folksonomies”. In Proceedings of DaWaK'10, Springer-Verlag, 65-76
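The slides do not spell out how structural similarity or (µ, ε)-cores are computed. A minimal sketch, assuming the definitions of the SCAN algorithm (the baseline named in the results table later in the talk): the similarity of two vertices is the overlap of their closed neighbourhoods normalised by the geometric mean of the neighbourhood sizes, and a vertex is a core when at least µ members of its closed neighbourhood have similarity of at least ε to it. The function names are hypothetical, and the local expansion step is not covered here.

```python
import math

def closed_neighbourhood(adjacency, v):
    """Neighbours of v plus v itself."""
    return adjacency[v] | {v}

def structural_similarity(adjacency, u, v):
    """Shared closed neighbourhood, normalised by the geometric mean of the sizes."""
    gu = closed_neighbourhood(adjacency, u)
    gv = closed_neighbourhood(adjacency, v)
    return len(gu & gv) / math.sqrt(len(gu) * len(gv))

def cores(adjacency, mu, eps):
    """Vertices with at least mu eps-similar members in their closed neighbourhood."""
    result = set()
    for v in adjacency:
        similar = [u for u in closed_neighbourhood(adjacency, v)
                   if structural_similarity(adjacency, u, v) >= eps]
        if len(similar) >= mu:
            result.add(v)
    return result
```

On two triangles joined through a single bridging vertex, the six triangle vertices come out as cores for µ = 3 and ε = 0.7, while the bridging vertex does not. That vertex is the kind that the third step on slide 29 would characterize as a hub.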
  • 31. LYCOS iQ Tag Network • Computers: A densely interconnected community • History: A star-shaped community
  • 32. Hybrid photo Clustering
  • 33. [Figure: example photo clusters labelled “landmark” and “event”]
  • 34. Photo clustering results • Most clusters correspond to landmarks or events • Events: baptism, conference • Landmarks: castles
  • 35. Sample results: [Visual] vs. [Tag] vs. [Visual + Tag] (figure panels: VISUAL, HYBRID, TAG)
  • 36. Numerical results: Geospatial cluster coherence
    Table 1. Cluster quality comparison between SCAN and k-means approaches. The performance is evaluated separately on visual and tag-based features and for multiple values of k. We could not include k-means with K = 3M in the tag cluster comparison because the large number of K led to an estimated execution time of over a week.
    Column groups: geospatial cluster coherence (md, sd; m stands for meters) and subjective cluster quality (P, R, F). The header of the last column did not survive extraction.

    Cluster type  Clustering method (number of clusters)  md (m)   sd (m)   P      R      F      (unlabelled)
    Visual        SCANVIS (560)                            357.1    1185.7   1.000  0.110  0.199  1.000
    Visual        KMVIS,1M (560)                           2470.0   1734.4   0.806  0.324  0.462  0.226
    Visual        KMVIS,2M (1,120)                         2249.7   1893.7   0.899  0.294  0.443  0.544
    Visual        KMVIS,3M (1,680)                         2183.1   2027.4   0.929  0.271  0.420  0.719
    Tag           SCANTAG-C (1,774)                        767.4    1712.0   0.898  0.253  0.394  0.642
    Tag           SCANTAG-LSI (4,027)                      456.3    1151.1   0.950  0.182  0.306  0.820
    Tag           KMTAG,1M (4,027)                         766.8    1762.7   0.848  0.307  0.451  0.564
    Tag           KMTAG,2M (8,054)                         563.2    1528.7   0.903  0.258  0.401  0.707

    For 29 landmark clusters, the automatically generated cluster center fell on average within 80 meters of the actual landmark position.
    Paper excerpts visible on the slide (truncated at both ends): “... more precise than the ones produced by k-means clustering. In terms of the GCC measure, the SCAN-produced clusters are clearly superior to the k-means ones, which indicates better geographical focus and thus better correspondence to landmarks and events (which are usually highly localized). The difference in GCC is especially pronounced for visual clusters. The actual GCC performance of k-means clustering ...” and “... similarity graphs. We found that the best information-retrieval performance is achieved by use of the hybrid similarity graph. More specifically, the F-measure of the HYB image clusters was 28.5 percent higher than the one of VIS clusters and 19.8 percent higher than the one of TAG-C clusters. The interannotator agreement for these results was substantial, because in all cases the obtained κ-statistic values ...”
    S. Papadopoulos, C. Zigkolis, Y. Kompatsiaris, A. Vakali. “Cluster-based Landmark and Event Detection on Tagged Photo Collections”. In IEEE Multimedia Magazine 18(1), pp. 52-63, 2011
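The P, R and F columns of Table 1 appear consistent with F being the balanced F-measure, that is, the harmonic mean of precision and recall. The slide does not state the formula, so this is an assumption. A short check against two k-means rows of the table:

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# P and R values as printed in Table 1 for the visual k-means rows.
print(round(f_measure(0.806, 0.324), 3))  # 0.462
print(round(f_measure(0.899, 0.294), 3))  # 0.443
```

Small differences in the third decimal for other rows are to be expected, because the printed P and R values are themselves rounded.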
  • 37. clusttour.gr application • Figure labels: photos and metadata; spatial clustering + temporal analysis; community detection (visual, tag, hybrid); classification to landmarks/events (#users / #photos against duration) • Example photo metadata: tags: sagrada familia, cathedral, barcelona; taken: 12 May 2009; lat: 41.4036, lon: 2.1743 • Example clusters: [2 years, 50 users / 120 photos] and [1 day, 2 users / 10 photos] • S. Papadopoulos, C. Zigkolis, Y. Kompatsiaris, A. Vakali. “Cluster-based Landmark and Event Detection on Tagged Photo Collections”. In IEEE Multimedia Magazine 18(1), pp. 52-63, 2011
  • 38. [Figure labels: diverse set of area photos; photo cluster summary; time slices; original photo metadata; photo clusters ranked by popularity; area tags]
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44. Available on AppStore • http://clusttour.gr/itunes
  • 45. Social Media “teacher” of the machine
  • 46. Self training + Social Media • Manually labelled data: high quality annotations, expensive to generate • Train classifier • Apply on unlabelled data (crowdsourcing – social media; visual + textual) • Enhance training set with unlabelled data based on the classifier’s decision
  • 47. Challenges • Region instead of image annotation: e.g. tags are global annotations, while local ones are needed • Imperfect segmentation • Adaptive size region selection • Visual and textual similarity ambiguity • Fusion of scores
  • 48. Proposed approach • Adapted self training for region selection − Initial models are trained using labelled regions − The models are applied on regions extracted from loosely tagged images (obtained at almost no cost) ✓ Dismiss regions that are relatively too small to be useful − Tags add an extra layer of confidence in the selection process ✓ Semantic relatedness between concepts and tags is calculated using either WordNet or a modified version of Google Similarity Distance − Select regions based on visual and textual information and use them to enhance the positive training set
  • 49. Dismiss non-informative regions • Combine visual and textual information • Use the selected samples to enhance the positive training set
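One round of the selection logic on slides 48 and 49 could look roughly like the sketch below. It is a simplified illustration, not the published method: the dictionary keys, the fixed size threshold standing in for adaptive region dismissal, and the multiplication of the two scores (suggested by the “Visual*Textual” label on the results slide) are assumptions, and the visual and textual scores are taken as given rather than computed.

```python
def select_regions(candidates, min_relative_size, top_n):
    """Pick the most promising regions from loosely tagged images.

    Each candidate is a dict with hypothetical keys:
      relative_size: region area divided by image area
      visual:        confidence of the current visual classifier
      textual:       semantic relatedness between the image tags and the concept
    """
    # Dismiss regions that are too small to be informative
    # (a fixed threshold stands in for the adaptive rule).
    kept = [r for r in candidates if r["relative_size"] >= min_relative_size]
    # Fuse visual and textual evidence by multiplication.
    ranked = sorted(kept, key=lambda r: r["visual"] * r["textual"], reverse=True)
    return ranked[:top_n]

def self_training_round(positives, candidates, min_relative_size=0.05, top_n=2):
    """Return the positive training set enhanced with the selected region ids."""
    selected = select_regions(candidates, min_relative_size, top_n)
    return positives + [r["id"] for r in selected]
```

In a full self-training loop the classifier would be retrained on the enhanced set and the round repeated, so that the visual scores of the remaining candidates change from one round to the next.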
  • 50. Experimental Setup • SAIAPR TC-12 dataset (imageCLEF): 20k manually labelled images split into 3 subsets • train: 14k images (used for testing the proposed approach directly) – 70% • validation: 2k images (used as the initial training set) – 10% • test: 4k images (used for evaluation) – 20% • MIRFlickr-1m • 1 million loosely tagged images (used for selecting regions to enhance the initial classifiers) • E. Chatzilari, S. Nikolopoulos, Y. Kompatsiaris, J. Kittler. Multi-Modal Region Selection Approach for Training Object Detectors, ICMR 2012, Hong Kong - China, June 2012
  • 51. Performance Comparison of Retrained models • The configuration incorporating both visual and textual information exhibits the highest performance in 44 out of the 63 examined concepts, compared to 4 for the typical self training configuration and 15 for the configuration based on the initial classifiers. • The proposed approach for adaptive region dismissal (ARD) greatly increases the performance of the resulting classifiers. • Table values as extracted (columns: Validation, Visual, Visual*Textual): Without ARD: 4.9, 6; With ARD: 5.7, 5.1, 7
  • 52. Current work • Application to global (whole image) annotation • Introduction of visual ambiguity for improved selection of training samples • Learning of concepts which are visually similar and co-occur in images • E.g. “sea” – “sky” • Do not select such training samples
  • 53. Semi-supervised machine learning for concept detection
  • 54. Concept Detection • Use of similarity graph structure for machine learning • Exploit multi-modal information through different fusion techniques
  • 55. Spectral Graph Clustering • Example: Values of second eigenvector of normalized Laplacian matrix
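To make the “second eigenvector of the normalized Laplacian” on slide 55 concrete, here is a dependency-free sketch. It uses power iteration with deflation instead of a library eigensolver, which is a substitution made for this example and not something the talk describes. It assumes a connected, undirected graph given as a dictionary of neighbour sets, and the function name is hypothetical. The sign of each entry of the returned vector suggests a two-way split of the graph.

```python
import math

def fiedler_vector(adjacency, iterations=200):
    """Approximate the second eigenvector of L = I - D^-1/2 A D^-1/2.

    Works on B = 2I - L = I + D^-1/2 A D^-1/2, whose largest eigenvalue (2)
    belongs to the trivial vector sqrt(degree); deflating that vector makes
    power iteration converge to the eigenvector we want.
    """
    nodes = sorted(adjacency)
    index = {v: i for i, v in enumerate(nodes)}
    degree = [len(adjacency[v]) for v in nodes]
    total = math.sqrt(sum(degree))
    trivial = [math.sqrt(d) / total for d in degree]

    def deflate_and_normalise(x):
        # Remove the component along the trivial eigenvector, then rescale.
        dot = sum(a * b for a, b in zip(x, trivial))
        x = [a - dot * b for a, b in zip(x, trivial)]
        length = math.sqrt(sum(a * a for a in x))
        return [a / length for a in x]

    x = deflate_and_normalise([float(i + 1) for i in range(len(nodes))])
    for _ in range(iterations):
        y = list(x)  # identity part of B
        for v in nodes:
            i = index[v]
            for u in adjacency[v]:
                j = index[u]
                y[i] += x[j] / math.sqrt(degree[i] * degree[j])
        x = deflate_and_normalise(y)
    return dict(zip(nodes, x))
```

For two triangles joined by a single edge, the entries for one triangle share one sign and the entries for the other triangle share the opposite sign. In practice a sparse eigensolver would replace the hand-written iteration.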
  • 56. Fusion (1)
  • 57. Fusion (2)
  • 58. MIR-Flickr Experimental Results • 25000 images + labels, 38 concepts
  • 59. Proposed Approach vs. Hare & Lewis, 2010
  • 60. Proposed Approach vs. Guillaumin et al., 2010
  • 61. Other relevant approaches • S. Nikolopoulos, E. Giannakidou, I. Kompatsiaris, I. Patras, and A. Vakali, “Combining multi-modal features for social media analysis”, in book Social Media Modeling and Computing, Springer 2011 • pLSA-based aspect models to define a latent semantic space where heterogeneous types of information can be effectively combined • Georgios Petkos, Symeon Papadopoulos, Yiannis Kompatsiaris, “Social Event Detection using Multimodal Clustering and Integrating Supervisory Signals”, ICMR 2012. • E. Spyromitros-Xioufis, S. Papadopoulos, I. Kompatsiaris, G. Tsoumakas, I. Vlahavas. “An Empirical Study on the Combination of SURF Features with VLAD Vectors for Image Search”, WIAMIS 2012, Dublin, Ireland, May 2012
  • 62. VLAD+SIFT vs. VLAD+SURF • Accuracy vs. dimensionality • VLAD+SURF improves on VLAD+SIFT and FV+SIFT across all dimensions in both Holidays and Oxford datasets • Results in rows starting with * are taken from Jégou et al., 2011, hence the missing values for some entries. • SIFT corresponds to PCA reduced SIFT, which yielded better results than standard SIFT in Jégou et al., 2011
  • 63. SocialSensor Applications and Use Cases • http://www.socialsensor.eu
  • 64. “Social media is transforming the way we do journalism” (New York Times) • “Social media is the key place for emerging stories – internationally, nationally, locally” (BBC) • “It has changed the way we do news” (MSN) • Source: picture alliance / dpa
  • 65.
  • 66. “It’s really hard to find the nuggets of useful stuff in an ocean of content” (BBC) • “Things that aren’t relevant crowd out the content you are looking for” (MSN) • “The filters aren’t configurable enough” (CNN) • Source: Getty Images
  • 67. Verification was simpler in the past... • Source: Frank Grätz
  • 68. Infotainment • Events with large numbers of visitors • Thessaloniki International Film Festival • 80,000 viewers / 100,000 visitors in 10 days • 150 films, 350 screenings • Discovery and presentation of relevant aggregated social media (e.g. film ratings from tweets)
  • 69. Conclusions and Issues • Social media data mining provides interesting results in many applications • Not all data always available (e.g. User queries, fb) • Infrastructure • Policy issues • Scalability and Real-time approaches • Fusion of various modalities • Content, social, temporal, location • Linking other sources (web, Linked Open Data) • Applications and commercialization • Proven functionality for the organization • User engagement
  • 70. Colleagues • Dr. Symeon Papadopoulos • Community detection • Graph-based concept detection • Visual Features • Dr. Georgios Petkos • Multimodal event detection • Dr. Spiros Nikolopoulos • pLSA fusion • Elisavet Chatzilari (PhD Student) • Social media for learning • Lefteris Spyromitros (PhD Student) • Visual Features • Juxhin Bakalli and Manos Schinas • Applications development (Clusttour and ThessFest) • Prof. Athina Vakali (Informatics Dept, AUTh) • Collaboration in Community Detection / Clusttour
  • 71. Thank you! • http://mklab.iti.gr
  • 72. Scalability Challenges • Network, crawling, data collection • Streaming data • Users • High numbers of users • Processing (e.g. NLP, clustering, etc) • Links • Web sites • Retweets, mentions, etc • Multimedia content (e.g. images, YouTube videos)
  • 73. Some Statistics
  • 74. Datasift Architecture
  • 75. Datasift processing • Process the whole firehose: 250M+ tweets/day • 40+ services run in the system • handling the firehose • low latency natural language processing and entity extraction on tweets • low latency in-line augmentation of tweets • low latency handling of very large individual filters • keeping a history of the firehose by persisting the 1TB of data it sends each day • allowing analytics to be run on the history of the firehose • real-time billing • streaming filter results to 1000s of clients • http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
  • 76. Datasift statistics • Current peak delivery of 120,000 tweets per second (260Mbit bandwidth) • Performs 250+ million sentiment analyses with sub-100ms latency • 1TB of augmented (includes gender, sentiment, etc) data transits the platform daily • Data filtering nodes can process up to 10,000 unique streams (with peaks of 8000+ tweets running through them per second) • Can do data lookups on 10,000,000+ username lists in real time • Links augmentation performs 27 million link resolves + lookups plus 15+ million full web page aggregations per day. • http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
  • 77. Frameworks • MapReduce (Hadoop) • Computation distribution • Batch processing of huge datasets • Parallel processing on large clusters of compute nodes • Cassandra, Tokyo Cabinet • Key value stores • Horizontal scaling for many users • Huge Data indexing • Fault tolerance • No sophisticated query possibilities • MongoDB • JSON native support • Large-Scale data storage • Memcached • Efficient caching • Clustering
  • 78. Scalability Processing Approaches • Sampling • Dimensionality reduction • E.g. VLAD • Local computations • Iterative scanning/processing • stream based • Multi-level – Hierarchical • Distributed
  • 79. Image Representation Approaches • Bag-Of-Words (BOW) § The most widely used § Memory usage and search time are usually prohibitive for 10M images • Vector of Locally Aggregated Descriptors (VLAD) § More accurate than BOW for the same representation size § Cheaper to compute § Dimensionality can be further reduced with PCA without noticeable impact on accuracy.
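For readers unfamiliar with VLAD, a minimal sketch of the aggregation step follows. It assumes the visual vocabulary (k centroids) has already been learned, for example with k-means, and that local descriptors such as SIFT or SURF have already been extracted. The function name is illustrative, and refinements used in practice (such as power-law normalisation or PCA) are left out. Each descriptor is assigned to its nearest centroid, the residuals are summed per centroid, and the concatenated result is L2-normalised.

```python
import math

def vlad(descriptors, centroids):
    """Aggregate local descriptors into one vector of length k * d."""
    k, d = len(centroids), len(centroids[0])
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Nearest centroid by squared Euclidean distance.
        nearest = min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
        # Accumulate the residual (descriptor minus its centroid).
        for j in range(d):
            residuals[nearest][j] += x[j] - centroids[nearest][j]
    flat = [value for row in residuals for value in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat
```

The output length k × d matches the dimensions reported on the results slides: k = 64 with 128-dimensional SIFT gives 8192, while k = 64 with 64-dimensional SURF or PCA-reduced SIFT gives 4096.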
  • 80. Experimental Results (holidays dataset) (Eleftherios Spyromitros-Xioufis, 1/8/12)

    method  k     descriptor  dimension  MAP
    BOW     1K    SIFT-pca    1K         40.1
    BOW     20K   SIFT-pca    20K        43.7
    BOW     200K  SIFT-pca    200K       54.0
    VLAD    64    SIFT-pca    4096       55.6
    VLAD    64    SIFT        8192       55.2
    VLAD    128   SIFT        16384      56.7
    VLAD    64    SURF        4096       63.2
    VLAD    128   SURF        8192       65.6
    Fisher  64    SIFT-pca    4096       59.7
    Fisher  256   SIFT-pca    16384      62.5
  • 81. Experimental Results (holidays dataset)

    method  k    descriptor  D     D’ (pca)  MAP
    BOW     20K  SIFT-pca    20K   512       44.9
                                   128       45.2
                                   64        44.4
    VLAD    64   SIFT-pca    4096  512       59.8
                                   128       55.7
                                   64        55.3
    VLAD    64   SURF        4096  512       63.4
                                   128       58.6
                                   64        55.6
    Fisher  64   SIFT-pca    4096  512       61.0
                                   128       56.5
                                   64        52.0