IIR 2010 - First Italian Information Retrieval Workshop
Padova, 28 Jan 2010

An IR-based approach to tag recommendation
C. Musto, F. Narducci, P. Lops, M. de Gemmis, G. Semeraro
outline
    • Background

         • Web 2.0 and User-Generated Content

         • Collaborative Tagging Systems

         • Tag Recommendation

    • STaR: Social Tag Recommender System

         • Basic assumptions

         • Architecture

    • Experimental Evaluation

    • Conclusions and future work


C Musto, F Narducci, M de Gemmis, P Lops, G Semeraro - An IR-based approach to tag recommendation - IIR 2010 - Padova   2
background
    •What is a tag?

    •Where do we use tags?

    •Why do we use tags?

    •Why do we need a tag recommender?

    •How does a tag recommender work?



web 2.0
       • Nowadays web sites tend to be more and more social
       • Web 2.0 platforms let users publish self-produced content
            • users can post photos, videos
            • users can express opinions (e.g. reviews)
            • users can annotate resources
social tagging
    •Users annotate resources of interest with free keywords, called tags

       • The act of collaboratively annotating resources with tags
             produces a lexical structure called a folksonomy

folksonomies
        •    The act of collaboratively annotating resources with tags produces a lexical
             structure called a folksonomy
            •     A folksonomy is a set of tags
            •     Usually represented with a Tag Cloud




            •     The more a tag is used by the community to describe a resource, the
                  more likely it is to faithfully describe the information
                  conveyed by the resource

social tagging systems
    • Advantages

         • Information organized in a way that closely follows the user's
           mental model

         • Effective retrieval, serendipitous browsing

    • Disadvantages

         • Tag space usually very noisy

         • Polysemy, synonymy, level variation
social tagging systems
     • These problems make it hard to fully exploit the expressive
           power of folksonomies
        • e.g. searching for resources annotated with the tag
              “Macbook” will exclude resources annotated with the
              tag “MacBookPro”
     • Folksonomies can’t be exploited effectively for retrieving
           and filtering resources
        • Tag recommenders are increasingly needed
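
The “Macbook” example can be reproduced with a toy exact-match tag search. This is only an illustrative sketch: the resource ids, the extra tags, and the helper name search_by_tag are hypothetical and do not come from the slides.

```python
# Hypothetical toy folksonomy: resource id -> set of tags.
annotations = {
    "r1": {"Macbook", "apple"},
    "r2": {"MacBookPro", "laptop"},
    "r3": {"macbook", "review"},
}


def search_by_tag(tag, annotations):
    """Return the ids of resources annotated with exactly this tag."""
    return sorted(rid for rid, tags in annotations.items() if tag in tags)


# Exact matching finds r1 only: r2 ("MacBookPro") and r3 ("macbook")
# are missed even though they describe closely related resources.
exact_hits = search_by_tag("Macbook", annotations)
print(exact_hits)
```

Under these assumptions, two of the three related resources are invisible to the query, which is the kind of noise that consistent tag suggestions are meant to reduce.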
tag recommenders: how do they work?
    •A user posts a new resource on a platform

         •e.g. a new bookmark on bibsonomy.org

    •The resource is analyzed

    •A set of (hopefully) relevant tags is produced and filtered

    •The user freely chooses the most appropriate tags to annotate
     the resource


STaR: Social Tag Recommender System
    •Basic assumptions

         • Resources with similar content should be annotated with
           similar tags

             •Improved retrieval techniques

         • The user's previous tagging activity should be taken into
           account

             •Increasing the weight of tags already used to annotate
              similar resources
STaR Architecture
STaR: indexing strategy
    •Based on Apache Lucene engine

    •A Personal Index for each user

         •Information on her previously tagged resources

    •A Social Index for the whole community

         •Information about all the resources previously tagged by the
          community
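
The two-index layout can be sketched with a small in-memory stand-in. This is not the Lucene API and not the STaR code: SimpleIndex, index_post, the whitespace tokenizer, and the sample posts are all assumptions made for illustration.

```python
from collections import defaultdict


class SimpleIndex:
    """Toy in-memory stand-in for a Lucene index (not the Lucene API)."""

    def __init__(self):
        self.docs = {}                    # resource id -> (tokens, tags)
        self.postings = defaultdict(set)  # term -> ids of resources containing it

    def add(self, resource_id, text, tags):
        tokens = text.lower().split()
        self.docs[resource_id] = (tokens, set(tags))
        for term in tokens:
            self.postings[term].add(resource_id)


social_index = SimpleIndex()                 # one index for the whole community
personal_indexes = defaultdict(SimpleIndex)  # one index per user id


def index_post(user_id, resource_id, text, tags):
    """Store a tagged resource in the user's Personal Index and in the Social Index."""
    personal_indexes[user_id].add(resource_id, text, tags)
    social_index.add(resource_id, text, tags)


index_post("alice", "r1", "Introduction to information retrieval", ["ir", "book"])
index_post("bob", "r2", "Tag recommendation in folksonomies", ["tagging"])
```

Every post is written twice, so a later query can be run against either the community's history or a single user's history.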


STaR Architecture
STaR: retrieval of similar resources
    •Given a resource to be tagged

         •Both the Personal Index and the Social Index queried

         •Lucene Scoring function replaced with the Okapi BM25
          implementation

             •State-of-the-art retrieval model

         •Resources with similarity exceeding a certain threshold
          retrieved
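
The retrieval step can be illustrated with a compact Okapi BM25 scorer followed by a threshold filter. This is a sketch under stated assumptions, not the Lucene API or the STaR implementation: the function names, the toy documents, the parameters k1 = 1.2 and b = 0.75, the non-negative IDF variant, and the threshold value are all chosen here for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs, k1=1.2, b=0.75):
    """Score every document in `docs` (id -> token list) against the query."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            n_t = doc_freq[term]
            # Non-negative IDF variant (assumption of this sketch).
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def retrieve_similar(query_tokens, docs, threshold):
    """Keep only the resources whose score exceeds the threshold."""
    scores = bm25_scores(query_tokens, docs)
    return {doc_id: s for doc_id, s in scores.items() if s > threshold}


docs = {
    "r1": "tag recommendation for social bookmarking".split(),
    "r2": "okapi bm25 ranking function".split(),
    "r3": "social tag recommender system".split(),
}
similar = retrieve_similar("social tag recommendation".split(), docs, threshold=0.5)
```

In this toy collection r2 shares no term with the query and is dropped, while r1 matches all three query terms and scores higher than r3.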

STaR: Retrieval of Similar Resources [figure]
STaR Architecture
STaR: extraction of candidate tags
    • Extraction of tags from the most similar resources retrieved in the
      previous step

    • Building a set of candidate tags

    • Each tag is assigned a score by weighting the normalized occurrence
      of the tag with the similarity score returned by Lucene




         • Resources retrieved from the Personal Index and from the
           Social Index can be given different weights
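
One possible reading of the scoring step is sketched below. The exact formula on the slide is not recoverable from this transcript, so the combination used here (normalized occurrence multiplied by the mean similarity of the resources carrying the tag, then a weighted sum over the two indexes) is an assumption, as are the function names and the sample data.

```python
from collections import defaultdict


def candidate_scores(retrieved):
    """Score the tags found in one index.

    retrieved: list of (similarity, tags) pairs, one per similar resource.
    """
    if not retrieved:
        return {}
    sims_by_tag = defaultdict(list)
    for similarity, tags in retrieved:
        for tag in set(tags):
            sims_by_tag[tag].append(similarity)
    total = len(retrieved)
    scores = {}
    for tag, sims in sims_by_tag.items():
        normalized_occurrence = len(sims) / total  # share of resources with the tag
        mean_similarity = sum(sims) / len(sims)    # how similar those resources are
        scores[tag] = normalized_occurrence * mean_similarity
    return scores


def merge_scores(social, personal, social_weight, personal_weight):
    """Weighted sum of the scores coming from the two indexes."""
    tags = set(social) | set(personal)
    return {
        tag: social_weight * social.get(tag, 0.0)
        + personal_weight * personal.get(tag, 0.0)
        for tag in tags
    }


social_hits = [(0.9, ["ir", "search"]), (0.6, ["ir", "lucene"])]
personal_hits = [(0.8, ["ir", "toread"])]
merged = merge_scores(
    candidate_scores(social_hits), candidate_scores(personal_hits), 0.3, 0.7
)
ranked = sorted(merged, key=merged.get, reverse=True)
print(ranked)
```

With a personal weight higher than the social weight, a tag the user has already applied to a similar resource ("toread") outranks tags that only the community used.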


STaR: Tag Extraction Process [figure]
experimental evaluation
    • Goal

         • To evaluate the accuracy of STaR using different Lucene scoring functions
           (Experiment 1)

             • Original vs. BM25

         • To evaluate the best combination of weights for resources retrieved from
           Personal Index and Social Index (Experiment 2)

    • Dataset

         • Gathered from Bibsonomy

         • 263,004 bookmark posts, 158,924 BibTeX entries, 3,617 different users

results of experiment 1
      scoring     resource    precision    recall    f1
      original    bookmark    25.26        29.67     27.29
      bm25        bookmark    25.62        36.62     30.15
      original    BibTeX      14.06        21.45     16.99
      bm25        BibTeX      13.72        22.91     17.16
      original    overall     16.43        23.58     19.37
      bm25        overall     16.45        26.46     20.29
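
The f1 column is consistent with the usual definition of F1 as the harmonic mean of precision and recall. The short check below recomputes it from the precision and recall values reported for Experiment 1; the function name and the dictionary layout are illustrative choices, not part of the original slides.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (scoring, resource) -> (precision, recall, reported f1), copied from the table.
rows = {
    ("original", "bookmark"): (25.26, 29.67, 27.29),
    ("bm25", "bookmark"): (25.62, 36.62, 30.15),
    ("original", "BibTeX"): (14.06, 21.45, 16.99),
    ("bm25", "BibTeX"): (13.72, 22.91, 17.16),
    ("original", "overall"): (16.43, 23.58, 19.37),
    ("bm25", "overall"): (16.45, 26.46, 20.29),
}

recomputed = {key: round(f1(p, r), 2) for key, (p, r, _) in rows.items()}
for key, value in recomputed.items():
    print(key, value, rows[key][2])
```

Because recall gains more than precision when switching to BM25, the harmonic mean rises even for BibTeX entries, where precision drops slightly.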

results of experiment 2
    approach           social tag weight    personal tag weight    precision    recall    f1
    community-based    1.0                  0.0                    34.44        35.89     35.15
    user-based         0.0                  1.0                    44.73        40.53     42.53
    hybrid_1           0.7                  0.3                    32.31        38.57     35.16
    hybrid_2           0.5                  0.5                    32.36        37.55     34.76
    hybrid_3           0.3                  0.7                    35.47        39.68     37.46
    baseline           -                    -                      42.03        13.23     20.13

ECML/PKDD Discovery Challenge 2009



   •STaR participated in the ECML/PKDD 2009 Discovery Challenge

   •The only Italian team

   •Sixth place in the task of content-based tag recommendation
    (more than 20 participants)
conclusions
    • Users tend to reuse their own tags to annotate similar resources

    • The integration of a more effective scoring function (BM25) improves the recommender
      accuracy

    • Robust recommendation model

         • Participation in the Discovery Challenge @ECML-PKDD 09

    • Future Work

         • Tag extraction from textual content of resources

             • Work in progress: 3% improvement in F1-measure on the ECML/PKDD 09 dataset

         • Word Sense Disambiguation algorithms for tackling tag synonymy and polysemy


http://www.di.uniba.it/~swap/

                   Thanks for your attention


     Cataldo Musto
      Ph.D. Student
University of Bari - “Aldo Moro”
              Italy
cataldomusto@di.uniba.it
