High Tech Campus, Philips Research
                         Eindhoven, Netherlands




Random Indexing and Quantum
   Negation for TV-Shows
  Retrieval and Classification
             Cataldo Musto, Ph.D. Student
  cataldomusto@di.uniba.it - cataldo.musto@philips.com
           University of Bari “Aldo Moro” (Italy), SWAP Research Group
          Philips Research Center - Eindhoven (Netherlands) - HI&E Group
                                  14.07.11
outline
                •     part 1:    introduction
                     •      information overload, personalization, information filtering, recommender
                            systems

                •     part 2:    approaches
                     •      vector space model, random indexing, quantum negation

                •     part 3:    scenario
                     •      tv-show recommendation, description of the data, description of the tasks

                •     part 4:    experimental evaluation
                     •      results, discussion, future work



C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
part 1: introduction
                                       what are we talking about?




TV
text messages
phone calls
internet navigation
scenario
                •     Daily interaction with electronic
                      devices

                     •     eMail, Web navigation, Social
                           media, instant messaging


                •     Continuous flow of
                      information

                     •     In 2007, 500,000 terabytes of
                           information were produced on
                           the Web in one year

                     •     Including telephone, radio, TV
                           and so on, the total reaches 18
                           exabytes of data!



information overload
                •     Consequences:
                      cognitive overload

                     •     It is impossible to
                           effectively deal with
                           this surplus of
                           information

                     •     It is difficult to quickly
                           find the information
                           we really need

Solution:


personalization
information filtering
                     “An information filtering system is a
                   system that removes redundant or
                unwanted information from an information
                        stream using automated methods”
                                                                                                      Wikipedia


information filtering systems

                • How do they work?
                 • Usually, in three steps
                   • Training Step
                   • User Modeling
                   • Filtering
Step 1: Training
Step 2: User Modeling
Step 3: Filtering
recommender systems
                •     A specific type of Information Filtering system
                      that attempts to recommend information items
                      (films, television, video on demand, music,
                      books, etc.) that are likely to be of interest
                      to the user

                     •      Every day we interact with recommender
                            systems, even if we do not know it!

Amazon
YouTube
recommendation approaches
                •     Content-based filtering
                     •       No interactions between users. Each user is an atomic entity
                     •       Prerequisite: each item to be recommended has to be described through a set of
                             textual features
                     •       We store in a user profile the features that often
                             occur in the items she likes
                •     Assumption: if a user usually likes items whose descriptions often contain a specific feature, we
                      can assume that she will also like such items in the future

                •     e.g.
                     •       If User_A likes a news item with the features “Football” and “Internazionale FC”
                     •       We can recommend her other news items about Football or Internazionale
                             FC



part 2: approaches
  vector space model, random indexing, quantum negation




vector space model
                •     Introduced by Salton in
                      1975

                     •     Given a set of M documents
                           (items) d = (d1, ..., dM)

                     •     Given N features describing
                           the documents

                     •     Each document (item) is
                           represented as a vector in an
                           N-dimensional vector space

                     •     The whole corpus is
                           represented in an N × M matrix
                           called the term/document
                           matrix



vector space model
                     •      VSM in a recommendation scenario
                          •      Document: point in the vector space
                          •      User profile: point in the vector space
                               •      e.g. built as the sum of the vector space representations of the documents
                                      the user liked in the past
                          •      Goal: to find the documents that are most relevant to that user profile
                          •      Assumption
                               •      the documents most similar to the profile in the vector space are the most
                                      relevant ones
                               •      Cosine similarity is used to compare the user profile (acting as the query)
                                      with the documents
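
The recommendation scenario above can be sketched in a few lines. This is only an illustration under stated assumptions: the four-feature space, the document names and the helper names (`build_profile`, `recommend`) are hypothetical and do not come from the slides.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def build_profile(liked_vectors):
    """User profile = component-wise sum of the vectors of the liked documents."""
    return [sum(components) for components in zip(*liked_vectors)]

def recommend(profile, candidates, top_n=2):
    """Rank candidate documents by cosine similarity to the profile."""
    ranked = sorted(candidates,
                    key=lambda name: cosine(profile, candidates[name]),
                    reverse=True)
    return ranked[:top_n]

# Toy feature space (assumed): [football, inter, cooking, politics]
liked = [[2, 1, 0, 0], [1, 2, 0, 0]]
candidates = {
    "match_report": [3, 1, 0, 0],
    "recipe": [0, 0, 4, 0],
    "election": [0, 0, 0, 2],
}
profile = build_profile(liked)
print(recommend(profile, candidates, top_n=1))
```

Because cosine similarity ignores vector length, summing the liked documents (rather than averaging them) does not change the ranking.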




vsm analysis (2)
                     •      Weak Points
                          •      Not incremental
                               •      The whole vector space has to be regenerated from scratch
                                      whenever a new item is added to the repository
                          •      High dimensionality
                               •      Has to be reduced through NLP operations (stopword elimination, stemming and so on)
                          •      Does not manage negative evidence
                               •      The vector space representation depends only on the features that occur in
                                      the document; no assumptions are made about the features that do not occur
                          •      Does not manage the latent semantics of documents
                               •      Any permutation of the terms in a document has the same
                                      VSM representation!


idea
                    • To introduce tools and techniques
                          able to overcome these drawbacks
                         • Random Indexing
                          • Dimensionality reduction technique
                                     Sahlgren, 2005


                         • Quantum Negation
                          • Based on Quantum Logic
                                     Widdows, 2007



random indexing
                     •      Random Indexing (RI) is an incremental and effective
                            technique for dimensionality reduction
                     •      Distributional Models
                          •      Assumption: we can infer information about terms
                                 by analyzing how they are used in a large corpus of data


                     •      Based on the so-called “Distributional Hypothesis”
                          •      “Words that occur in the same context tend to have
                                 similar meanings”
                          •      “Meaning is its use” (Wittgenstein)


how does it work?



                                         Random Indexing reduces the original
                                      high-dimensional term/doc matrix to a new
                                              lower-dimensional matrix




how does it work?
          •     How?
               •      By multiplying the original
                      matrix by a random
                      one, built in an incremental
                      way
                    •      formally: $A_{n,m} \cdot R_{m,k} = B_{n,k}$
                    •      $k \ll m$
               •      After projection, the
                      distances between points in
                      the vector space are approximately preserved
                    •      Johnson-Lindenstrauss
                           Lemma
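
The projection step can be illustrated as follows. This is a minimal sketch, not the implementation used in the experiments: the sparse ternary matrix, the sizes `m` and `k`, the number of non-zero entries per row and the function names are all assumptions chosen for illustration. It shows only the mechanics of multiplying by a random matrix and does not verify the Johnson-Lindenstrauss bound.

```python
import random

def random_ternary_matrix(m, k, nonzeros=2, seed=0):
    """Build an m x k matrix whose rows hold a few random +1/-1 entries and zeros elsewhere."""
    rng = random.Random(seed)
    matrix = []
    for _ in range(m):
        row = [0] * k
        for position in rng.sample(range(k), nonzeros):
            row[position] = rng.choice((-1, 1))
        matrix.append(row)
    return matrix

def project(vector, matrix):
    """Multiply a length-m row vector by the m x k matrix, giving a length-k vector."""
    k = len(matrix[0])
    return [sum(vector[i] * matrix[i][j] for i in range(len(vector)))
            for j in range(k)]

m, k = 1000, 50  # original and reduced dimensionality, with k << m
R = random_ternary_matrix(m, k)
a = [1 if i % 3 == 0 else 0 for i in range(m)]
b = [1 if i % 5 == 0 else 0 for i in range(m)]
print(len(project(a, R)), len(project(b, R)))
```

Since each row of the random matrix can be generated independently, a new feature only requires one new row, which is what makes the construction incremental.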
random matrix
                      •      How is the random matrix built?
                      •      The whole process is based on the concept of “context”
                           •      Given a term, its “context” could be the whole document, a
                                  paragraph, a sentence, a sliding window of words and so on.

                           •      The definition of the context influences the structure of the
                                  matrix


                      •      The matrix is built in an iterative and incremental way

                           •      The vector representing each document depends on the terms
                                  that occur in it

                           •      The vector representing each term depends on its context




item representation
                •     A context vector is assigned to each context (for simplicity, we
                      take the whole document as the context)
                     •      This vector has a fixed dimension (k) and can contain only values in
                            {-1, 0, 1}. Values are distributed randomly, but the number of
                            non-zero elements is much smaller than k.

                •     The Vector Space representation of a term is obtained by summing the
                      vectors of all its contexts (the documents it occurs in).

                •     The Vector Space representation of a document (item) is
                      obtained by summing the context vectors of the terms that occur in it


                •     Output: lower-dimensional vector space representation
                      based on random context vectors
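
The construction above could be sketched like this, taking the whole document as the context. The dimension `k`, the number of non-zero entries, the toy documents and the function names are hypothetical; this is one plausible reading of the slide, not the code used in the experiments.

```python
import random

def index_vector(k, nonzeros, rng):
    """Sparse random vector of dimension k with values in {-1, 0, 1}."""
    vec = [0] * k
    for position in rng.sample(range(k), nonzeros):
        vec[position] = rng.choice((-1, 1))
    return vec

def add_into(target, source):
    """In-place component-wise addition: target += source."""
    for i, value in enumerate(source):
        target[i] += value

def random_indexing(docs, k=20, nonzeros=4, seed=42):
    """docs maps a document id to its list of terms; returns (term_vectors, doc_vectors)."""
    rng = random.Random(seed)
    # One random vector per context (here: per document).
    context = {doc_id: index_vector(k, nonzeros, rng) for doc_id in docs}
    # Term vector = sum of the random vectors of the contexts the term occurs in.
    term_vectors = {}
    for doc_id, terms in docs.items():
        for term in set(terms):
            add_into(term_vectors.setdefault(term, [0] * k), context[doc_id])
    # Document vector = sum of the vectors of the terms occurring in it.
    doc_vectors = {}
    for doc_id, terms in docs.items():
        vec = [0] * k
        for term in terms:
            add_into(vec, term_vectors[term])
        doc_vectors[doc_id] = vec
    return term_vectors, doc_vectors

docs = {
    "d1": ["football", "match", "goal"],
    "d2": ["football", "goal", "league"],
    "d3": ["recipe", "pasta"],
}
term_vectors, doc_vectors = random_indexing(docs)
# "football" and "goal" occur in exactly the same contexts (d1 and d2).
print(term_vectors["football"] == term_vectors["goal"])
```

Terms that share all their contexts end up with identical vectors, which is the distributional hypothesis in its simplest form.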


quantum negation
                •     Random Indexing is still not capable of managing negative evidence
                •     RI can be coupled with the Quantum Negation (QN) operator
                     •      Definition inherited from Quantum Logic
                     •      Negation as a form of orthogonality between
                            vectors
                     •      Given two vectors A and B, we can define the vector A NOT B
                          •      It represents the projection of the vector A onto the subspace
                                 orthogonal to the one generated by vector B
                          •      In a recommendation scenario, this operator could be used to
                                 combine two vectors, the first one modeling positive
                                 evidence and the second one modeling negative evidence
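
For a single negated vector, the projection onto the orthogonal subspace is commonly written as A NOT B = A - ((A · B) / (B · B)) B. A minimal sketch under that assumption (the function names and the toy vectors are illustrative only):

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def negate(a, b):
    """Return 'a NOT b': a minus its projection onto b, so the result is orthogonal to b."""
    bb = dot(b, b)
    if bb == 0:
        return list(a)  # nothing to remove when b is the zero vector
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]

positive = [1.0, 1.0, 0.0]   # features the user likes
negative = [0.0, 2.0, 0.0]   # features the user dislikes
result = negate(positive, negative)
print(result, dot(result, negative))
```

The resulting vector keeps the part of the positive evidence that is unrelated to the negative evidence, so its dot product with the negative vector is zero.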


...summing up
              •      VSM is an effective model for document retrieval

              •      It can be exploited in recommendation scenarios
              •      It suffers from some well-known drawbacks
              •      Solutions
                    •     Random Indexing is an incremental and effective approach
                          that can tackle the high-dimensionality problem
                    •     Quantum Negation can effectively model negative evidence

                    •     The combined use of RI and QN is a good
                          alternative to VSM, especially for real-life scenarios


part 3: scenario
                                       tv-shows recommendation




Scenario:
                              EPG (Electronic Program Guides)
                                      personalization
scenario

                •     Given a set of TV shows,
                      we want to provide the
                      user with a set of
                      suggestions about the
                      shows that she should
                      watch, according to her
                      preferences




approach

            Currently the recommendation
            model is implemented through
            the Vector Space Model (VSM)


data
              •     TV shows gathered from a set of
                    47 German-language broadcast
                    channels

              •     Each TV show is described
                    through a set of textual
                    features (title, synopsis,
                    description, etc.) gathered from an
                    XML feed

              •     Each TV-Show is mapped to a fixed
                    program type (Movie, Sport,
                    Documentary, Magazine, etc.)



problems
              •      How to represent the data?
                    •     We compared two approaches
                         •     Bag of Words (BOW)
                         •     Tag.me

              •      What are the typical use cases?
                    •     We identified two tasks
                         •     Classification Task
                         •     Retrieval Task


data representation
                    • Bag of Words
                     • Each item i is described through the
                               words that appear in the text

                         • Weighting of the words
                          • Counting of the occurrences,
                                     normalization, TF-IDF weighting, etc.



BOW representation
                    • To improve BOW representation
                       • Usually textual descriptions are very noisy
                       • Full of uninformative words
                       • Further processing can improve
                                     the classical BOW representation
                                   •      Stopword removal: filtering of all the
                                          uninformative words (articles, adverbs,
                                          adjectives and so on)



data representation
                     • Tag.me
                      • Online tool developed by the University
                                 of Pisa (Italy)
                          •      Goal: to identify Wikipedia concepts that
                                 occur in the text
                          •      Idea: to process original text through Tag.me
                                 in order to avoid noise and provide a novel
                                 representation based on high-level
                                 Wikipedia concepts


tag.me web interface




final output
   BOW
   Tag.me



description of the tasks
                •     task 1: classification

                     •      Given a flow of TV shows, we want to classify
                            them against the set of program types

                •     task 2: retrieval

                     •      Given a set of program types and a repository
                            of TV shows, we want to retrieve the shows
                            that belong to a specific program type

VSM for TV shows classification

                • Steps
                 • 1) Build a vector space for the tv shows
                 • 2) Build a vector for each program type
                 • 3) Use cosine similarity to compare tv shows
                            and program types
                 • 4) Assign the TV show to the program type that got
                            the highest cosine similarity
VSM for TV shows classification

                • Step 1: build a vector space
                      representation of the TV-shows
                     •      For each TV show we collected a set of words from
                            the synopsis and the title of the show
                     •      We filtered the words against a fixed list of
                            996 German stopwords
                     •      We calculated the TF-IDF score of each word in
                            each document
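
Step 1 could look roughly like the sketch below. The six-word stopword set stands in for the 996-word German list, and the TF-IDF variant (term frequency normalised by document length, natural-log inverse document frequency) is an assumption, since the slides do not say which weighting formula was used. The sample texts are invented.

```python
import math
from collections import Counter

# Hypothetical mini stopword list; the experiments used a fixed list of 996 German stopwords.
STOPWORDS = {"der", "die", "das", "und", "ein", "eine"}

def tokenize(text):
    """Lowercase, split on whitespace and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def tf_idf(corpus):
    """corpus maps show id -> text; returns show id -> {word: tf-idf weight}."""
    tokens = {show: tokenize(text) for show, text in corpus.items()}
    n_docs = len(tokens)
    # Document frequency: in how many shows each word appears.
    doc_freq = Counter(word for words in tokens.values() for word in set(words))
    weights = {}
    for show, words in tokens.items():
        counts = Counter(words)
        weights[show] = {
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return weights

corpus = {
    "s1": "Der Krimi und der Kommissar",
    "s2": "Die Doku und der Krimi",
    "s3": "Das Fussball Spiel",
}
weights = tf_idf(corpus)
print(sorted(weights["s1"]))
```

Words that appear in many shows (here "krimi") receive a lower weight than words that appear in only one.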

VSM for TV shows classification

                • Step 2: build a vector for each
                      program type
                     • Given the vector space representation of
                            each document
                     • The vector space representation of each
                            program type is the sum of the
                            vector space representations of each tv-
                            show that belongs to that program type

VSM for TV shows classification

                •     Given a set of TV-shows
                     •      T = (s1, ..., sn)
                •     Given a set of program types
                     •      P = (t1, ..., tm)
                •     We define a function pt: T → P
                     •      It returns the program type of a tv show
                •     We can build the set S(t_i) as the set of the tv-shows that belong to t_i
                     •      $S(t_i) = \{ s \in T : pt(s) = t_i \}$

VSM for TV shows classification

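                • Given the set S(t_i) with a cardinality of k,
                      the vector space representation of the program
                      type t_i is simply given by the sum of the vectors
                      of its tv shows:
                      $\vec{t_i} = \sum_{j=1}^{k} \vec{s_j}, \quad s_j \in S(t_i)$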

VSM for TV shows classification

                • Step 3 and Step 4
                • Given the vector space representation of both
                      program types and tv shows
                     •      We used cosine similarity to compare each TV
                            show against the set of program types
                     •      We assigned the TV show to the program type
                            that got the highest cosine similarity
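
Steps 2 to 4 amount to a nearest-prototype classifier. The sketch below assumes the show vectors have already been built (by VSM or RI); the three-dimensional toy vectors, the labels and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def program_type_vectors(training):
    """training: list of (show_vector, program_type). Sum the vectors of each type."""
    sums = {}
    for vector, label in training:
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
    return sums

def classify(show_vector, type_vectors):
    """Return the program type whose vector has the highest cosine similarity."""
    return max(type_vectors,
               key=lambda label: cosine(show_vector, type_vectors[label]))

training = [
    ([1.0, 0.0, 0.2], "Sport"),
    ([0.9, 0.1, 0.0], "Sport"),
    ([0.0, 1.0, 0.3], "Movie"),
]
type_vectors = program_type_vectors(training)
print(classify([0.8, 0.1, 0.1], type_vectors))
```

The same two functions apply unchanged to the RI variant, since only the dimensionality of the input vectors differs.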

RI for TV shows classification

                •     Steps
                     •      1) Build a vector space for the tv shows
                     •      2) Reduce the vector space through the
                            Random Indexing algorithm
                     •      3) Build a vector for each program type on the (reduced)
                            vector space
                     •      4) Use cosine similarity to compare tv shows and
                            program types
                     •      5) Assign the TV show to the program type that got the
                            highest cosine similarity


RI for TV shows retrieval

                •     Steps
                     •      1) Build a vector space for the tv shows
                     •      2) Reduce the vector space through the Random
                            Indexing algorithm
                     •      3) Build a positive vector for each program type on the
                            (reduced) vector space
                     •      4) Use cosine similarity to compare tv shows and
                            program types
                     •      5) Rank the tv shows and assign the first N to
                            the program type


RI+QN for TV shows retrieval

                •     Steps
                     •      1) Build a vector space for the tv shows
                     •      2) Reduce the vector space through the Random Indexing
                            algorithm
                     •      3) Build a positive vector for each program type on the
                            (reduced) vector space
                     •      4) Build a negative vector for each program type
                            on the (reduced) vector space
                     •      5) Use cosine similarity to compare tv shows with
                            both positive and negative program type vectors
                     •      6) Rank the tv shows and assign the first N to the program type


RI+QN for TV shows retrieval

                •     Given a set of TV-shows
                     •      T = (s1, ..., sn)
                •     Given a set of program types
                     •      P = (t1, ..., tm)
                •     We define a function pt: T → P
                     •      It returns the program type of a tv show
                •     We can build the set S(t_i) as the set of the tv-shows that belong to t_i
                     •      $S(t_i) = \{ s \in T : pt(s) = t_i \}$

RI+QN for TV shows retrieval
                •     Given the set S(t_i) and its complement, with
                      cardinalities k and z respectively, each program type
                      is represented by a positive vector (the sum of the
                      vectors of the k tv shows in S(t_i)) and a negative
                      vector (the sum of the vectors of the z tv shows in
                      its complement)
                •     The positive and negative vectors are combined in
                      order to emphasize the features that occur in the
                      positive vector and avoid the ones that occur in the
                      negative one
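
One way to combine the two vectors is to apply the negation operator and use the result as the query. The sketch below assumes exactly that; the slides do not spell out the combination, so the `build_query` and `retrieve` helpers, the toy vectors and the labels are all illustrative.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity (0.0 if either vector is all zeros)."""
    norms = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norms if norms else 0.0

def negate(a, b):
    """'a NOT b': remove from a its projection onto b."""
    bb = dot(b, b)
    if bb == 0:
        return list(a)
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]

def build_query(training, program_type):
    """training: list of (vector, label). Positive = sum over the type, negative = sum over the rest."""
    dim = len(training[0][0])
    positive, negative = [0.0] * dim, [0.0] * dim
    for vector, label in training:
        target = positive if label == program_type else negative
        for i, value in enumerate(vector):
            target[i] += value
    return negate(positive, negative), negative

def retrieve(query, candidates, top_n):
    """Rank candidate shows by cosine similarity to the query and keep the first top_n."""
    ranked = sorted(candidates, key=lambda s: cosine(candidates[s], query), reverse=True)
    return ranked[:top_n]

training = [([1.0, 0.0, 1.0], "Sport"), ([1.0, 0.0, 0.0], "Sport"), ([0.0, 1.0, 1.0], "Movie")]
candidates = {"x": [1.0, 0.0, 0.0], "y": [0.0, 1.0, 0.0], "z": [0.0, 0.0, 1.0]}
query, negative = build_query(training, "Sport")
print(retrieve(query, candidates, top_n=2))
```

In this toy example the candidate aligned with the negative evidence (`y`) receives a negative similarity and drops to the bottom of the ranking.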


...summing up
                •     Classification task
                     •      Comparison of VSM and RI
                     •      We built a vector space
                     •      We applied RI to reduce the vector space
                     •      We classified TV shows in both the complete vector space and the reduced
                            one, comparing the accuracy
                •     Retrieval task
                     •      Comparison of RI and RI+QN
                     •      We built a vector space
                     •      We applied RI to reduce the vector space
                     •      We built both positive and negative program type vectors and applied QN
                     •      We retrieved TV shows and compared RI without negation against
                            RI with negation



part 4: experimental evaluation
                                  results, discussion, future work




dataset
                 tv shows                  133,579
                 program types             17
                 features (BOW)            306,006
                 features (Tag.me)         74,599
                 avg features (BOW)        42.11
                 avg features (Tag.me)     9.21



experimental design
                • 10-fold cross validation
                     • Dataset split into 10 partitions
                     • 9 partitions used to train the models, the remaining one to test them
                     • Results averaged over all the partitions
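
The protocol above can be sketched as follows. This is a minimal illustration, not the evaluation code used in the talk: `k_fold_splits`, `cross_validate` and `dummy_evaluate` are hypothetical names, and the dummy scoring function stands in for training and testing a real model.

```python
import random

def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) pairs; each partition is the test set exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

def cross_validate(items, evaluate, k=10):
    """Average the score returned by evaluate(train, test) over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_splits(items, k)]
    return sum(scores) / len(scores)

def dummy_evaluate(train, test):
    """Hypothetical stand-in: a real version would train on `train`
    and return the precision measured on `test`."""
    return len(test) / (len(train) + len(test))

mean_score = cross_validate(range(100), dummy_evaluate)
print(mean_score)
```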

metrics

                • classification task
                     • precision
                • retrieval task
                     • precision@n
                     • precision@k%
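
The formulas on this slide did not survive as text, so the sketch below uses the standard definitions as an assumption: precision as the fraction of correct predictions, precision@n as the fraction of relevant items among the top n results, and precision@k% as the same ratio over the top k percent of the ranked list. The function names and the toy data are hypothetical.

```python
import math

def precision(predicted, actual):
    """Fraction of predicted labels that match the true labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(predicted) if predicted else 0.0

def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n ranked items that are relevant."""
    top = ranked[:n]
    return sum(1 for item in top if item in relevant) / n

def precision_at_k_percent(ranked, relevant, k):
    """Precision over the top k percent of the ranked list (at least one item)."""
    n = max(1, math.ceil(len(ranked) * k / 100))
    return precision_at_n(ranked, relevant, n)

# Hypothetical ranked result list and relevance judgements.
ranked = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
relevant = {"a", "c", "j"}
print(precision_at_n(ranked, relevant, 2))
print(precision_at_k_percent(ranked, relevant, 30))
```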
tuning of parameters
                •     Random Indexing algorithm
                     •      Dimension of the vectors
                          •      Classification task: 500, 700
                          •      Retrieval task: 500, 1000, 1500, 2000
                     •      Minimum number of occurrences
                          •      Classification task: 2
                          •      Retrieval task: 1, 3
                     •      Training Cycles
                          •      Classification task: 1, 2
                          •      Retrieval task: 1
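
To show where the three parameters enter, here is a minimal Random Indexing sketch in plain Python. It is not the implementation used in the experiments: the way extra training cycles reuse the document vectors of the previous cycle is an assumption, and `nonzero`, `seed` and all function names are hypothetical.

```python
import random
from collections import Counter

def random_index_vector(dim, nonzero, rng):
    """Sparse ternary vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec

def add_scaled(target, source, weight):
    """In-place target += weight * source."""
    for i, value in enumerate(source):
        target[i] += weight * value

def random_indexing(docs, dim=500, min_occurrences=2, cycles=1, nonzero=10, seed=42):
    """docs: list of token lists. Returns (term_vectors, doc_vectors)."""
    rng = random.Random(seed)
    counts = Counter(tok for doc in docs for tok in doc)
    # Terms below the minimum number of occurrences are ignored.
    vocab = {t for t, c in counts.items() if c >= min_occurrences}
    # The first cycle starts from random vectors assigned to the documents.
    doc_vectors = [random_index_vector(dim, nonzero, rng) for _ in docs]
    term_vectors = {}
    for _ in range(cycles):
        # Term vector = weighted sum of the vectors of the documents it occurs in.
        term_vectors = {t: [0.0] * dim for t in vocab}
        for doc, dvec in zip(docs, doc_vectors):
            for tok, freq in Counter(doc).items():
                if tok in vocab:
                    add_scaled(term_vectors[tok], dvec, freq)
        # Document vector = sum of the vectors of its terms.
        doc_vectors = [[0.0] * dim for _ in docs]
        for doc, dvec in zip(docs, doc_vectors):
            for tok in doc:
                if tok in vocab:
                    add_scaled(dvec, term_vectors[tok], 1.0)
    return term_vectors, doc_vectors

# Hypothetical toy corpus; the talk uses 500 to 2000 dimensions.
docs = [["football", "match", "live"], ["football", "cup", "live"], ["cooking", "recipe"]]
terms, vectors = random_indexing(docs, dim=50, min_occurrences=2, cycles=1)
print(sorted(terms), len(vectors[0]))
```

Every vector has `dim` components regardless of how many documents or terms the corpus contains, which is what keeps the reduced space small.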


classification task - results
                   vector size    min. occurrences    training cycles    Tag.me precision (%)    BOW precision (%)
                   500            2                   1                  37.38                   42.91
                   700            2                   1                  40.28                   47.76
                   500            2                   1                  44.61                   54.32
                   700            2                   1                  45.33                   54.33




classification task: comparison
                [bar chart; values shown, left to right: 42.9, 47.7, 54.3, 54.3, 68.7]




classification - results per program type




classification task - outcomes
                •     BOW better than Tag.me
                     •      Tag.me representation too poor
                     •      Difficult to learn a solid and effective model for text classification
                •     The dimension of the vector space and the second training cycle affect the
                      predictive accuracy
                •     RI does not outperform the baseline
                     •      Vector space reduced by over 99% (from 133,579 to 500 or 700 dimensions)
                     •      Too much information is lost
                     •      but
                          •       Broken down by program type, Random Indexing got better results in
                                  10 out of 17 program types
                          •       The reasons for this need to be investigated
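
The size of the reduction quoted above can be checked with a few lines of arithmetic. The function name is hypothetical; the numbers are the ones given on the slide.

```python
def dimensionality_reduction(original_dims, reduced_dims):
    """Fraction of dimensions removed when going from original to reduced."""
    return 1 - reduced_dims / original_dims

# 133,579 original dimensions reduced to 500 or 700.
reductions = {k: dimensionality_reduction(133_579, k) for k in (500, 700)}
for k, r in reductions.items():
    print(f"133579 -> {k}: {r:.2%} reduction")
```

Both settings remove more than 99% of the original dimensions.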



retrieval task - bow - p@n
                [chart; value pairs shown, left to right: 82.6% / 66.3%, 65.9% / 45.2%, 58.1% / 36.5%]

retrieval task - bow - p@k%
                [chart; value pairs shown, left to right: 86.0% / 58.1%, 55.4% / 35.4%]

retrieval task - tagme - p@n
                [chart; value pairs shown, left to right: 61.9% / 47.9%, 53.7% / 40.9%, 51.6% / 39.0%]

retrieval task - tagme - p@k%
                [chart; value pairs shown, left to right: 76.6% / 57.9%, 49.6% / 35.4%]


retrieval task - overview
                [chart; value pairs shown, left to right: 82.6% / 61.9%, 65.0% / 53.0%, 58.3% / 53.2%]


retrieval task - outcomes

                • BOW always better than Tag.me
                     • Difference between 5% and 20%
                • Parameters do not affect the accuracy
                • QN operator improves the retrieval accuracy by almost 20%


conclusions & future work
                •     In scenarios where the recommender system has to deal with a continuous flow of
                      information, the VSM is not suitable

                     •      RI can effectively overcome typical VSM drawbacks
                               •      Classification task

                                     •     Even if its accuracy is lower, these preliminary results need further
                                           investigation, for example by testing the algorithm with different
                                           parameter values

                                     •     Is a loss in precision acceptable for an algorithm that provides a big
                                           improvement in scalability and efficiency?
                               •      Retrieval task

                                     •     QN improves the predictive accuracy of the model in the
                                           retrieval task

                                     •     QN is a novel operator, so this is an important outcome with good
                                           scientific impact

                                Thanks for your attention.




                               Cataldo Musto, Ph.D. Student
                    cataldomusto@di.uniba.it - cataldo.musto@philips.com
                                       University of Bari “Aldo Moro” (Italy), SWAP Research Group
                                      Philips Research Center - Eindhoven (Netherlands) - HI&E Group
                                                                   14.07.11

08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Random Indexing and Quantum Negation for TV-Shows Retrieval and Classification

  • 1. High Tech Campus, Philips Research Eindhoven, Netherlands Random Indexing and Quantum Negation for TV-Shows Retrieval and Classification Cataldo Musto, Ph.D. Student cataldomusto@di.uniba.it - cataldo.musto@philips.com University of Bari “Aldo Moro” (Italy), SWAP Research Group Philips Research Center - Eindhoven (Netherlands) - HI&E Group 14.07.11
  • 2. outline • part 1: introduction • information overload, personalization, information filtering, recommender systems • part 2: approaches • vector space model, random indexing, quantum negation • part 3: scenario • tv-show recommendation, description of the data, description of the tasks • part 4: experimental evaluation • results, discussion, future work C.Musto: Random Indexing and Quantum Negation for TV shows Retrieval and Classification - Philips Research , Eindhoven (The Netherlands) - 14.07.11
  • 3. part 1: introduction what are we talking about?
  • 4. TV
  • 5. text messages
  • 6. phone calls
  • 7. internet navigation
  • 8. scenario • Daily interaction with electronic devices • Email, Web navigation, social media, instant messaging • Continuous flow of information • In 2007, 500,000 terabytes of information were produced on the Web in one year • Including telephone, radio, TV and so on, we reach 18 exabytes of data!
  • 9. information overload • Consequences: cognitive overload • It is impossible to effectively deal with this surplus of information • It is difficult to quickly find the information we really need
  • 11. information filtering “An information filtering system is a system that removes redundant or unwanted information from an information stream using automated methods” (Wikipedia)
  • 12. information filtering systems • How do they work? • Usually, in three steps • Training Step • User Modeling • Filtering
  • 13. Step 1: Training
  • 14. Step 2: User Modeling
  • 15. Step 3: Filtering
  • 16. recommender systems • A specific type of Information Filtering system that attempts to recommend information items (films, television, video on demand, music, books, etc.) that are likely to be of interest to the user • Every day we interact with recommender systems, even if we do not know it!
  • 17. Amazon
  • 18. YouTube
  • 19. recommendation approaches • Content-based filtering • No interactions between users: each user is an atomic entity • Prerequisite: each item to be recommended has to be described through a set of textual features • We store in a user profile the features that often occur in the items she likes • Assumption: if a user usually likes items whose descriptions often contain a specific feature, we can assume that she will also like items with that feature in the future • e.g. • If User_A likes a news item with the features “Football” and “Internazionale FC” • We can recommend her other news about Football or Internazionale FC
  • 20. part 2: approaches vector space model, random indexing, quantum negation
  • 21. vector space model • Introduced by Salton in 1975 • Given a set of M documents (items) d = (d1, ..., dM) • Given N features describing the documents • Each document (item) is represented in an N-dimensional vector space • The whole corpus is represented in an N*M matrix called the term/document matrix
  • 22. vector space model • VSM in a recommendation scenario • Document: point in the vector space • User profile: point in the vector space • e.g. built as the sum of the vector space representations of the documents liked in the past by the user • Goal: to find the documents that are the most relevant ones for that user profile • Assumption • the most similar documents in the vector space are the most relevant ones • Cosine similarity is used to compute the similarity between the query (here, the user profile) and the documents
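The profile-and-ranking idea on this slide can be sketched in a few lines. This is a minimal illustration, not the implementation used in the talk: the three toy features, the document names and their weights are invented for the example, and the profile is built as the plain sum of the liked documents.

```python
import math

def add(u, v):
    """Element-wise sum of two vectors of equal length."""
    return [a + b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus over three features: [football, cooking, politics].
docs = {
    "match_report": [3.0, 0.0, 0.0],
    "recipe": [0.0, 2.0, 0.0],
    "election": [0.0, 0.0, 4.0],
    "stadium_food": [1.0, 1.0, 0.0],
}

# User profile = sum of the vectors of the documents the user liked.
liked = ["match_report", "stadium_food"]
profile = [0.0, 0.0, 0.0]
for name in liked:
    profile = add(profile, docs[name])

# Rank every document by its cosine similarity to the profile.
ranking = sorted(docs, key=lambda name: cosine(profile, docs[name]), reverse=True)
print(ranking)
```

Because cosine similarity ignores vector length, summing the liked documents or averaging them gives the same ranking.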
  • 23. vsm analysis (2) • Weak Points • Not incremental • The whole Vector Space has to be generated from scratch whenever a new item is added to the repository • High Dimensionality • NLP operations (stopword elimination, stemming and so on) • Does not manage negative evidence • The vector space representation only depends on the features that occur in the document; there are no assumptions about the features that don’t occur • Does not manage the latent semantics of documents • Any permutation of the terms in a document has the same VSM representation!
  • 24. idea • To introduce tools and techniques able to overcome these drawbacks • Random Indexing • Dimensionality reduction technique (Sahlgren, 2005) • Quantum Negation • Based on Quantum Logic (Widdows, 2007)
  • 25. random indexing • Random Indexing (RI) is an incremental and effective technique for dimensionality reduction • Distributional Models • Assumption: we can infer information about terms by analyzing how they are used in a large corpus of data • Based on the so-called “Distributional Hypothesis” • “Words that occur in the same context tend to have similar meanings” • “Meaning is its use” (Wittgenstein)
  • 26. how does it work? Random Indexing reduces the original high-dimensional term/doc matrix to a new lower-dimensional matrix
  • 27. how does it work? • How? • By multiplying the original matrix with a random one, built in an incremental way • formally: A_{n,m} * R_{m,k} = B_{n,k} • k << m • After projection, the distances between points in the vector space are approximately preserved • Johnson-Lindenstrauss Lemma
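The projection A * R = B can be tried out on toy data. The sketch below is only an illustration under stated assumptions: it draws R with Gaussian entries scaled by 1/sqrt(k), which is one common choice for Johnson-Lindenstrauss style projections and not necessarily the matrix used in the talk, and the sizes n, m, k are arbitrary.

```python
import math
import random

def random_matrix(m, k, rng):
    """m x k matrix with N(0, 1) entries scaled by 1/sqrt(k) (an assumed choice)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(m)]

def project(A, R):
    """Compute B = A * R row by row, skipping zero entries of A."""
    k = len(R[0])
    B = []
    for row in A:
        out = [0.0] * k
        for j, a in enumerate(row):
            if a != 0.0:
                r = R[j]
                for c in range(k):
                    out[c] += a * r[c]
        B.append(out)
    return B

def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

rng = random.Random(0)
n, m, k = 4, 1000, 200          # k << m
A = [[rng.random() for _ in range(m)] for _ in range(n)]
R = random_matrix(m, k, rng)
B = project(A, R)

# Ratio of the distance after projection to the distance before it;
# the lemma suggests this should stay close to 1 for large enough k.
ratio = distance(B[0], B[1]) / distance(A[0], A[1])
print(round(ratio, 3))
```

Increasing k tightens the ratio around 1 at the cost of larger vectors, which is the trade-off tuned later in the experiments.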
  • 28. random matrix • How is the random matrix built? • The whole process is based on the concept of “context” • Given a term, its “context” could be the whole document, a paragraph, a sentence, a sliding window of words and so on • The definition of the context influences the structure of the matrix • The matrix is built in an iterative and incremental way • The vector representing each document depends on the terms that occur in it • The vector representing each term depends on its context
  • 29. item representation • A context vector is assigned to each context (for simplicity, we assume the whole document as context) • This vector has a fixed dimension (k) and can contain only values in {-1, 0, 1}. Values are distributed randomly, but the number of non-zero elements is much smaller than the number of zeros • The Vector Space representation of a term is obtained by summing the context vectors of all its contexts (the documents it occurs in) • The Vector Space representation of a document (item) is obtained by summing the vectors of the terms that occur in it • Output: lower-dimensional vector space representation based on random context vectors
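The construction described on this slide can be sketched as follows, taking the whole document as the context. The function names, the dimension k=20, the number of non-zero entries and the toy documents are all assumptions made for the illustration; the talk does not give these details.

```python
import random
from collections import defaultdict

def context_vector(k, nonzero, rng):
    """Sparse ternary vector: `nonzero` random positions set to -1 or +1."""
    vec = [0] * k
    for pos in rng.sample(range(k), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec

def random_indexing(docs, k=20, nonzero=4, seed=0):
    """docs maps a document id to its list of terms (context = whole document)."""
    rng = random.Random(seed)
    # One random context vector per document.
    contexts = {doc_id: context_vector(k, nonzero, rng) for doc_id in docs}
    # Term vector = sum of the context vectors of the documents it occurs in
    # (added once per occurrence).
    term_vecs = defaultdict(lambda: [0] * k)
    for doc_id, terms in docs.items():
        for term in terms:
            tv = term_vecs[term]
            for i, x in enumerate(contexts[doc_id]):
                tv[i] += x
    # Document vector = sum of the vectors of the terms it contains.
    doc_vecs = {}
    for doc_id, terms in docs.items():
        dv = [0] * k
        for term in terms:
            for i, x in enumerate(term_vecs[term]):
                dv[i] += x
        doc_vecs[doc_id] = dv
    return contexts, dict(term_vecs), doc_vecs

# Hypothetical toy corpus.
toy_docs = {
    "d1": ["football", "match"],
    "d2": ["football", "goal"],
    "d3": ["recipe"],
}
contexts, term_vecs, doc_vecs = random_indexing(toy_docs)
print(term_vecs["football"])
```

Adding a new document only needs one new context vector and a few additions to the affected term vectors, which is what makes the approach incremental.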
  • 30. quantum negation • Random Indexing is still not capable of managing negative evidence • RI can be coupled with a Quantum Negation (QN) operator • Definition inherited from Quantum Logic • Negation as a form of orthogonality between vectors • Given two vectors A and B, we can define the vector A NOT B • It represents the projection of the vector A onto the subspace orthogonal to the one generated by vector B • In a recommendation scenario, this operator can combine two vectors, the first one modeling positive evidence and the second one modeling negative evidence
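For a single negated vector, the projection onto the subspace orthogonal to B can be written as A NOT B = A - ((A·B) / (B·B)) B. The sketch below implements just that formula; the function names and example vectors are illustrative and not taken from the talk.

```python
def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def negate(a, b):
    """Return `a NOT b`: the component of a that is orthogonal to b."""
    bb = dot(b, b)
    if bb == 0:
        return list(a)          # nothing to remove when b is the zero vector
    coeff = dot(a, b) / bb
    return [x - coeff * y for x, y in zip(a, b)]

positive = [3.0, 4.0, 5.0]      # hypothetical positive evidence
negative = [1.0, 2.0, 2.0]      # hypothetical negative evidence
result = negate(positive, negative)

# The result has no component left along the negative vector.
print(abs(dot(result, negative)) < 1e-9)
```

A document vector that is very similar to the negative vector therefore gets a cosine similarity close to zero with the negated profile.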
  • 31. ...summing up • VSM is an effective model for document retrieval • It can be exploited in recommendation scenarios • It suffers from some well-known drawbacks • Solutions • Random Indexing is an incremental and effective approach that can tackle the high-dimensionality problem • Quantum Negation can effectively model negative evidence • The combined use of RI and QN is a good alternative to VSM, especially for real-life scenarios
  • 32. part 3: scenario tv-shows recommendation
  • 33. Scenario: EPG (Electronic Program Guides) personalization
  • 34. scenario • Given a set of TV shows, we want to provide the user with a set of suggestions about the shows that she should watch, according to her preferences
  • 35. approach Currently the recommendation model is implemented through the Vector Space Model (VSM)
  • 36. data • TV shows gathered from a set of 47 German-language broadcast channels • Each TV show is described through a set of textual features (title, synopsis, description, etc.) gathered from an XML feed • Each TV show is mapped to a fixed program type (Movie, Sport, Documentary, Magazine, etc.)
  • 37. problems • How to represent the data? • We compared two approaches • Bag of Words (BOW) • Tag.me • What are the typical use cases? • We identified two tasks • Classification Task • Retrieval Task
  • 38. data representation • Bag of Words • Each item i is described through the words that appear in the text • Weighting of the words • Counting of the occurrences, normalization, TF-IDF weighting, etc.
  • 39. BOW representation • To improve BOW representation • Textual descriptions are usually very noisy • Full of uninformative words • Further processing can improve the classical BOW representation • Stopword removal: filtering of all the uninformative words (articles, adverbs, adjectives and so on)
  • 40. data representation • Tag.me • Online tool developed by the University of Pisa (Italy) • Goal: to identify Wikipedia concepts that occur in the text • Idea: to process original text through Tag.me in order to avoid noise and provide a novel representation based on high-level Wikipedia concepts
  • 41. tag.me web interface
  • 42. final output Bow Tag.me
  • 43. description of the tasks • task 1: classification • Given a flow of TV shows, we want to classify them against the set of program types • task 2: retrieval • Given a set of program types and a repository of TV shows, we want to retrieve the shows that belong to a specific program type
  • 44. VSM for TV shows classification • Steps • 1) Build a vector space for the tv shows • 2) Build a vector for each program type • 3) Use cosine similarity to compare tv shows and program types • 4) Assign the TV show to the program type that got the highest cosine similarity
  • 45. VSM for TV shows classification • Step 1: build a vector space representation of the TV shows • For each TV show we collected a set of words by using the synopsis and the title of the show • We filtered the words against a fixed list of 996 German stopwords • We calculated the TF-IDF scores for each document
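Step 1 can be sketched as below. Everything specific here is an assumption: the tiny stopword set stands in for the 996-word German list, the whitespace tokenizer is a simplification, and the weighting uses raw term frequency times log(N / document frequency), one of several common TF-IDF variants (the slides do not say which one was used).

```python
import math
from collections import Counter

# Hypothetical stand-in for the 996-word German stopword list.
STOPWORDS = {"der", "die", "das", "und", "ein", "eine", "in", "im"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def tf_idf(texts):
    """Return one {term: weight} dict per text, with weight = tf * log(N / df)."""
    docs = [Counter(tokenize(t)) for t in texts]
    n_docs = len(docs)
    df = Counter()
    for counts in docs:
        df.update(counts.keys())        # each document counts once per term
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in counts.items()}
        for counts in docs
    ]

# Invented titles used only for the example.
shows = [
    "Der Tatort und der Kommissar",
    "Die Sportschau Fussball",
    "Ein Kommissar im Krimi",
]
weights = tf_idf(shows)
print(sorted(weights[0]))
```

Terms that appear in many shows, such as "kommissar" here, receive a lower weight than terms specific to one show.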
  • 46. VSM for TV shows classification • Step 2: build a vector for each program type • Given the vector space representation of each document • The vector space representation of each program type is the sum of the vector space representations of each tv-show that belongs to that program type
  • 47. VSM for TV shows classification • Given a set of TV-shows • T = (s1, ..., sn) • Given a set of program types • P = (t1, ..., tm) • We define a function pt: T → P • It returns the program type of a tv show • We can build the set S(t_i) as the set of the tv-shows that belong to t_i • S(t_i) = { s ∈ T : pt(s) = t_i }
  • 48. VSM for TV shows classification • Given the set S(t_i) with a cardinality of k, the vector space representation of the program type is simply given by the sum of the vectors of the k tv-shows in S(t_i)
  • 49. VSM for TV shows classification • Step 3 and Step 4 • Given the vector space representation of both program types and tv shows • Use of cosine similarity to compare each TV show against the set of the program types • We assigned the TV show to the program type that got the highest cosine similarity
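Steps 2 to 4 amount to a nearest-prototype classifier. The sketch below assumes the show vectors already exist (three toy dimensions and two program types, all invented for the example) and builds each program type vector as the sum of its shows, as described in step 2.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def build_prototypes(vectors, labels):
    """Program type vector = sum of the vectors of the shows with that type."""
    prototypes = {}
    for vec, label in zip(vectors, labels):
        current = prototypes.get(label, [0.0] * len(vec))
        prototypes[label] = [p + x for p, x in zip(current, vec)]
    return prototypes

def classify(vec, prototypes):
    """Assign the program type whose vector has the highest cosine similarity."""
    return max(prototypes, key=lambda label: cosine(vec, prototypes[label]))

# Hypothetical training data.
train_vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.8, 0.2]]
train_labels = ["Sport", "Sport", "Movie", "Movie"]
prototypes = build_prototypes(train_vectors, train_labels)

print(classify([0.7, 0.2, 0.0], prototypes))
```

The same code applies unchanged to the Random Indexing variant, since only the vectors passed in differ.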
  • 50. RI for TV shows classification • Steps • 1) Build a vector space for the tv shows • 2) Reduce the vector space through the Random Indexing algorithm • 3) Build a vector for each program type on the (reduced) vector space • 4) Use cosine similarity to compare tv shows and program types • 5) Assign the TV show to the program type that got the highest cosine similarity
  • 51. RI for TV shows retrieval • Steps • 1) Build a vector space for the tv shows • 2) Reduce the vector space through the Random Indexing algorithm • 3) Build a positive vector for each program type on the (reduced) vector space • 4) Use cosine similarity to compare tv shows and program types • 5) Rank the tv shows and assign the first N to the program type
  • 52. RI+QN for TV shows retrieval • Steps • 1) Build a vector space for the tv shows • 2) Reduce the vector space through the Random Indexing algorithm • 3) Build a positive vector for each program type on the (reduced) vector space • 4) Build a negative vector for each program type on the (reduced) vector space • 5) Use cosine similarity to compare tv shows with both positive and negative program types vectors • 6) Rank the tv shows and assign the first N to the program type
  • 53. RI+QN for TV shows retrieval • Given a set of TV-shows • T = (s1, ..., sn) • Given a set of program types • P = (t1, ..., tm) • We define a function pt: T → P • It returns the program type of a tv show • We can build the set S(t_i) as the set of the tv-shows that belong to t_i • S(t_i) = { s ∈ T : pt(s) = t_i }
  • 54. RI+QN for TV shows retrieval • Given the set S(t_i) and its complement, with cardinalities k and z respectively, the vector space representation of the program type is given by a positive vector, built from the k tv-shows in S(t_i), and a negative vector, built from the z tv-shows in its complement • The positive and negative vectors are combined in order to emphasize the features that occur in the positive vector and avoid the ones that occur in the negative one
  • 55. ...summing up • Classification task • Comparison of VSM and RI • We built a vector space • Applied RI to reduce the vector space • We tried to classify TV shows in the complete vector space and in the reduced one, comparing the accuracy • Retrieval task • Comparison of RI and RI+QN • We built a vector space • Applied RI to reduce the vector space • Built both positive and negative program type vectors and applied QN • We tried to retrieve TV shows and compared RI without negation against RI with negation
  • 56. part 4: experimental evaluation results, discussion, future work
  • 57. dataset • tv shows: 133,579 • program types: 17 • features (BOW): 306,006 • features (Tag.me): 74,599 • avg features (BOW): 42.11 • avg features (Tag.me): 9.21
  • 58. experimental design • 10-fold cross validation • Dataset split into 10 partitions • 9 partitions for training the models, the remaining one for testing • Results averaged over all the partitions
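The 10-fold protocol can be sketched as an index generator plus an averaging loop. The helper names, the shuffling seed and the `evaluate` callback are assumptions for illustration; the talk does not describe how the partitions were drawn.

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists: each fold is the test set exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_validate(n_items, evaluate, k=10):
    """Average the score returned by evaluate(train, test) over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_items, k)]
    return sum(scores) / len(scores)

splits = list(k_fold_indices(25, k=10))
print(len(splits), [len(test) for _, test in splits])
```

In a real run, `evaluate` would train the program type vectors on the `train` indices and return the precision measured on the `test` indices.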
  • 59. metrics • classification task • precision • retrieval task • precision@n • precision@k%
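The formulas for these metrics are not preserved in the transcript, so the sketch below uses common textbook definitions as an assumption: classification precision as the share of correctly labeled shows, precision@n as the share of relevant shows among the first n retrieved, and precision@k% as precision over the top k percent of the ranked list. The talk may have defined them differently, in particular the k% cutoff.

```python
def classification_precision(predicted, actual):
    """Fraction of shows whose predicted program type matches the actual one."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

def precision_at_n(ranked, relevant, n):
    """Fraction of the first n ranked items that are relevant."""
    top = ranked[:n]
    return sum(1 for item in top if item in relevant) / n

def precision_at_k_percent(ranked, relevant, k_percent):
    """Precision over the top k percent of the ranked list (assumed reading)."""
    n = max(1, int(len(ranked) * k_percent / 100))
    return precision_at_n(ranked, relevant, n)

# Hypothetical ranking for one program type.
ranked = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
relevant = {"a", "c", "d", "j"}

print(precision_at_n(ranked, relevant, 5))
print(precision_at_k_percent(ranked, relevant, 20))
```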
  • 60. tuning of parameters • Random Indexing algorithm • Dimension of the vectors • Classification task: 500, 700 • Retrieval task: 500, 1000, 1500, 2000 • Minimum number of occurrences • Classification task: 2 • Retrieval task: 1, 3 • Training Cycles • Classification task: 1, 2 • Retrieval task: 1
  • 61. classification task - results
      size   occur.   cycles   tag.me   bow
      500    2        1        37.38    42.91
      700    2        1        40.28    47.76
      500    2        1        44.61    54.32
      700    2        1        45.33    54.33
  • 62. classification task: comparison (chart; values shown: 68.7, 54.3, 54.3, 47.7, 42.9)
  • 63. classification - results per program type
  • 64. classification task - outcomes • BOW better than Tag.me • Representation too poor • Difficult to learn a solid and effective model for text classification • The dimension of the vector space and the second training cycle affect the predictive accuracy • RI does not outperform the baseline • Vector space reduced by over 99% (from 133,579 to 500 or 700) • Too much loss of information • but • Splitting the results by program type, Random Indexing got better results in 10 out of 17 program types • Need to investigate the reasons for that
  • 65. retrieval task - bow - p@n 82.6% 66.3%
  • 66. retrieval task - bow - p@n 65.9% 45.2%
  • 67. retrieval task - bow - p@n 58.1% 36.5%
  • 68. retrieval task - bow - p@k% 86.0% 58.1%
  • 69. retrieval task - bow - p@k% 55.4% 35.4%
  • 70. retrieval task - tagme - p@n 61.9% 47.9%
  • 71. retrieval task - tagme - p@n 53.7% 40.9%
  • 72. retrieval task - tagme - p@n 51.6% 39.0%
  • 73. retrieval task - tagme - p@k% 76.6% 57.9%
  • 74. retrieval task - tagme - p@k% 49.6% 35.4%
  • 75. retrieval task - overview 82.6% 61.9%
  • 76. retrieval task - overview 65.0% 53.0%
  • 77. retrieval task - overview 58.3% 53.2%
  • 78. retrieval task - outcomes • BOW always better than Tag.me • Between 5 and 20% difference • Parameters do not affect the accuracy • QN operator improves the retrieval accuracy by almost 20%
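As a reminder of what the QN operator does, the sketch below assumes it is the orthogonal negation commonly used in vector-space retrieval: `a NOT b` is the part of `a` that is orthogonal to `b`, so the negated query has zero similarity with the unwanted vector. The function names and the toy vectors are illustrative assumptions, not the code used in the experiments.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def negate(a, b):
    """Return `a NOT b`: a minus its projection onto b (assumed QN definition)."""
    bb = dot(b, b)
    if bb == 0:
        return list(a)
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot(u, v) / (nu * nv)


if __name__ == "__main__":
    liked = [1.0, 1.0, 0.0]      # toy vector for what the user wants
    disliked = [0.0, 1.0, 0.0]   # toy vector for what the user wants to exclude
    query = negate(liked, disliked)
    print(query, cosine(query, disliked))
```

Ranking items by `cosine(query, item_vector)` then pushes down items that resemble the excluded vector.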
  • 79. conclusions & future work • In scenarios where the recommender system has to deal with a continuous flow of information, the VSM is not suitable • RI is able to effectively address typical VSM drawbacks • Classification task • Even if its accuracy is lower, these preliminary results need to be further investigated, for example by testing the algorithm with different parameter values • Is a loss in precision acceptable for an algorithm that provides a big improvement in scalability and efficiency? • Retrieval task • QN improves the predictive accuracy of the model in the retrieval task • Novel operator: this is an important outcome with good scientific impact
  • 80. Thanks for your attention. Cataldo Musto, Ph.D. Student cataldomusto@di.uniba.it - cataldo.musto@philips.com University of Bari “Aldo Moro” (Italy), SWAP Research Group Philips Research Center - Eindhoven (Netherlands) - HI&E Group 14.07.11
