Comparing social tags to microblogs


Victoria Lai, Christopher Rajashekar, William Rand
Modeling Social Media 2011
October 9, 2011
Social Tags and Social Media
 Brand manager: what are people saying about a product online?
 Goal: see if tags about an album reflect Twitter conversations
 Amazon tags
   Where purchases take place
   Easier to collect than tweets
Similarity framework S(fa(ta), fw(tw)) > θ
 Inputs
   ta: album tags (all tags, or top ten tags)
   tw: keywords from album tweets
 Importance measures
   fa (tags): tag weights
   fw (tweet keywords): frequency, or tf-idf
 Each side yields a list of phrases with an importance value
   phrase 1  #, phrase 2  #, phrase 3  #, …
 Similarity S between the two lists, tested as S > θ?
   Spearman
   Kendall tau
   Precision
   Recall
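The test S(fa(ta), fw(tw)) > θ can be illustrated with a minimal sketch that uses Spearman rank correlation for S. The slides do not say how phrases from the two lists are aligned, so restricting the comparison to phrases present in both lists is an assumption here, as are the function names and the example weights in the usage note.

```python
from math import sqrt

def average_ranks(values):
    """Rank values so the largest gets rank 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def is_similar(tag_weights, tweet_weights, theta):
    """Return (S > theta, S) over phrases shared by both lists (an assumption)."""
    shared = sorted(set(tag_weights) & set(tweet_weights))
    if len(shared) < 2:
        return False, 0.0
    s = spearman([tag_weights[p] for p in shared],
                 [tweet_weights[p] for p in shared])
    return s > theta, s
```

For example, with hypothetical weights `{"rock": 10, "guitar": 7, "live": 3, "indie": 1}` for tags and `{"rock": 50, "guitar": 20, "live": 30, "pop": 5}` for tweet keywords, the three shared phrases give S = 0.5.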
Baselines (θ)
 General control
   I, the, and, a, of
   Used in tf-idf
 Music control
   music
   Used as threshold
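The slide says only that the general control is "used in tf-idf". One plausible reading, taken here as an assumption, is that tweets collected for the common words serve as the background corpus for the inverse document frequency. The sketch below scores album tweet words that way; the function name, whitespace tokenization, and the smoothed idf formula are illustrative choices, not the authors' stated method.

```python
from collections import Counter
from math import log

def tf_idf(album_tweets, control_tweets):
    """Score each word in album_tweets by term frequency times a smoothed idf
    estimated from control_tweets (assumed background corpus)."""
    # Term frequency: raw word counts over all album tweets.
    tf = Counter(w for t in album_tweets for w in t.lower().split())
    # Document frequency: number of control tweets containing each word.
    df = Counter()
    for t in control_tweets:
        df.update(set(t.lower().split()))
    n_docs = len(control_tweets)
    # Smoothed idf; a word found in every control tweet scores 0.
    return {w: c * log((1 + n_docs) / (1 + df[w])) for w, c in tf.items()}
```

Words that are common in the control tweets (such as "the") are pushed toward zero, while words specific to the album tweets keep a high score.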
Relevant Work
 Heymann, Ramage, and Garcia-Molina (2008)
  IR measures
 Eck, Lamere, Bertin-Mahieux, and Green (2007)
  correlation measures
 Wagner and Strohmaier (2010)
  tweet stream properties
 Inouye and Kalita (2011)
  automatic tweet summarization
 Wu, Zhang, and Ostendorf (2010)
  tf-idf on user tweets
Correlations
 C1, threshold (music control): ta = all tags, fw = freq, tw = music
 C2, base case:                 ta = all tags, fw = freq
 C3, best case:                 ta = top tags, fw = tf-idf

 Album   C1 Spearman  C1 Kendall   C2 Spearman  C2 Kendall   C3 Spearman  C3 Kendall
 D1         0.44         0.38         0.29         0.25         0.69         0.43
 D2         0.29         0.24         0.38         0.37         0.78         0.70
 D3         0.24         0.20         0.38         0.33         0.33         0.31
 D4         0.30         0.26         0.40         0.35         0.60         0.51
 J1         0.64         0.55         0.31         0.28         0.31         0.28
 J5         0.20         0.18         0.23         0.18         0.63         0.44
 J6         0.47         0.37         0.28         0.19         0.63         0.45
 F2         0.24         0.20         0.43         0.36         0.30         0.28

 Shaded: strongest correlation listed
 C3 bolded: better than base case
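The Kendall columns report a rank correlation based on concordant and discordant pairs. The slides do not state which variant was used, so the sketch below assumes Kendall's tau-b, which corrects for ties; the function name is illustrative.

```python
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length score lists (tie-corrected)."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue                 # tied in both lists: ignored
            if dx == 0:
                ties_x_only += 1         # tied only in x
            elif dy == 0:
                ties_y_only += 1         # tied only in y
            elif dx * dy > 0:
                concordant += 1          # same ordering in both lists
            else:
                discordant += 1          # opposite ordering
    untied_in_x = concordant + discordant + ties_y_only
    untied_in_y = concordant + discordant + ties_x_only
    denom = sqrt(untied_in_x * untied_in_y)
    return (concordant - discordant) / denom if denom else 0.0
```

With four items and a single swapped pair, for instance `[1, 2, 3, 4]` against `[1, 3, 2, 4]`, five pairs are concordant and one is discordant, giving tau = 4/6.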
Information Retrieval
 Album        Precision (P1)   Precision threshold (P2)   Recall
 D1               0.48                 0.43               0.002
 D2               0.24                 0.62               0.008
 D3               0.29                 0.36               0.001
 D4               0.36                 0.36               0.0004
 J1               0.20                 0.50               0.0003
 J3               0.00                 0.75               0.00
 J5               0.57                 0.40               0.0002
 J6               0.75                 0.38               0.0004
 F1               0.00                 0.50               0.00
 F2               0.67                 0.59               0.00009
 Average          0.35                 0.49               0.001
 HV average       0.51                 0.45               0.0003
 LV average       0.20                 0.53               0.002
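The slides do not define precision and recall for this setting. A minimal sketch, assuming precision is the share of album tags that also appear among the tweet keywords and recall is the share of tweet keywords covered by the tags, would look like this. Under that reading, a small tag set compared against a large keyword set gives very low recall, as in the table. The function name and example lists are hypothetical.

```python
def precision_recall(tags, tweet_keywords):
    """Set-overlap precision and recall of album tags against tweet keywords.

    Assumed definitions (not stated in the slides):
      precision = |tags ∩ keywords| / |tags|
      recall    = |tags ∩ keywords| / |keywords|
    """
    tag_set, keyword_set = set(tags), set(tweet_keywords)
    hits = tag_set & keyword_set
    precision = len(hits) / len(tag_set) if tag_set else 0.0
    recall = len(hits) / len(keyword_set) if keyword_set else 0.0
    return precision, recall
```

For example, four tags of which two appear among eight tweet keywords give a precision of 0.5 and a recall of 0.25.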
Conclusions
 Amazon tags are a good proxy for top Twitter content when there is sufficient Twitter activity
 More relevant tags rank higher in tweet keyword rankings
 tf-idf weighting of tweet keywords is effective


Next Steps
 Larger dataset
 Analysis over time
 Other sources like Last.fm
 Linguistic analysis (clustering, stemming)
 Other user-generated data (e.g. user reviews)
Questions?
