Leveraging Usage Data for Linked Data Movie Entity Summarization
Andreas Thalhammer, Ioan Toma, Antonio J. Roa-Valverde, Dieter Fensel

                          USEWOD Workshop, April 17, 2012, Lyon - France




© Copyright 2012 STI INNSBRUCK www.sti-innsbruck.at
Overview


1. Problem statement

2. Proposed approach

3. Dataset

4. Preliminary results

5. Conclusion


Problem statement


   • Problem:

   Linked Data entities comprise too much information for a
   user to grasp them quickly.


   • Entity summarization:

   “... we aim at solving this novel problem that we call entity
   summarization to produce a version of the original
   description that is more concise, yet containing sufficient
   information for users to quickly identify the underlying
   entity.” [Cheng et al., 2011]

Proposed approach (1)


   • Techniques:
        Item-based collaborative filtering. [Sarwar et al., 2001]
        k-nearest neighbor (kNN).

   • Usage data:
                             Bob   Alice   Marc   Elena   John   Mary

                 Toy Story     1      0      1      0       1      1

                 Heat          0      0      1      1       0      0

                 Jumanji       1      0      1      0       1      0

                 Top Gun       1      0      0      1       1      0

                 The Juror     1      1      0      1       0      0
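
A minimal sketch of how the usage matrix above could yield nearest neighbors. It assumes cosine similarity between item (row) vectors; the slides name item-based collaborative filtering and kNN but do not state the similarity measure. The names `USAGE`, `cosine`, and `nearest_neighbors`, and the choice k = 2, are illustrative only.

```python
from math import sqrt

# Columns: Bob, Alice, Marc, Elena, John, Mary (1 = the user consumed the movie).
USAGE = {
    "Toy Story": [1, 0, 1, 0, 1, 1],
    "Heat":      [0, 0, 1, 1, 0, 0],
    "Jumanji":   [1, 0, 1, 0, 1, 0],
    "Top Gun":   [1, 0, 0, 1, 1, 0],
    "The Juror": [1, 1, 0, 1, 0, 0],
}


def cosine(a, b):
    """Cosine similarity of two equally long usage vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbors(movie, k):
    """Return the k movies whose usage vectors are most similar to `movie`."""
    scores = [
        (other, cosine(USAGE[movie], vector))
        for other, vector in USAGE.items()
        if other != movie
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    for name, score in nearest_neighbors("Toy Story", k=2):
        print(f"{name}: {score:.3f}")
```

With this toy matrix, Jumanji shares three users with Toy Story and Top Gun shares two, so they come out as its two nearest neighbors under the cosine assumption.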

Proposed approach (2)


 Feature ranking for a specific entity e:

 • First idea:
    – Count shared features in the nearest neighbor set
    – Rank features according to their number of
      occurrences
    – Problem: many features occur very often, e.g.
           (cc:attributionName, “Source: Freebase - The World's database”)
 • Improvement:
    – Introduce TF-IDF to weight the features
        w(e,f) = |neighbor(e,f)| × log(|all()| / |all(f)|)
    – Rank features according to their weight
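
The weighting above can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes neighbor(e,f) is the set of e's nearest neighbors that carry feature f, all() is the set of all entities, and all(f) is the set of entities that carry f. The feature data, the neighbor list, and the name `rank_features` are hypothetical toy values.

```python
from math import log

# Hypothetical toy data: each entity maps to a set of (property, value) features.
FEATURES = {
    "pulp_fiction":   {("director", "Quentin Tarantino"), ("genre", "Crime"),
                       ("cc:attributionName", "Source: Freebase")},
    "reservoir_dogs": {("director", "Quentin Tarantino"), ("genre", "Crime"),
                       ("cc:attributionName", "Source: Freebase")},
    "jackie_brown":   {("director", "Quentin Tarantino"),
                       ("cc:attributionName", "Source: Freebase")},
    "toy_story":      {("genre", "Animation"),
                       ("cc:attributionName", "Source: Freebase")},
    "heat":           {("genre", "Crime"),
                       ("cc:attributionName", "Source: Freebase")},
}


def rank_features(entity, neighbors, features):
    """Rank the features of `entity` by w(e,f) = |neighbor(e,f)| * log(|all()| / |all(f)|)."""
    total = len(features)  # |all()|
    weighted = []
    for feature in features[entity]:
        # |neighbor(e,f)|: neighbors of the entity that share this feature.
        shared = sum(1 for n in neighbors if feature in features[n])
        # |all(f)|: entities in the whole dataset that carry this feature.
        carriers = sum(1 for feats in features.values() if feature in feats)
        weighted.append((feature, shared * log(total / carriers)))
    # Highest weight first; ties are broken by feature for a stable order.
    weighted.sort(key=lambda pair: (-pair[1], pair[0]))
    return weighted


if __name__ == "__main__":
    neighbors = ["reservoir_dogs", "jackie_brown"]  # assumed kNN output
    for feature, weight in rank_features("pulp_fiction", neighbors, FEATURES):
        print(f"{weight:.3f}  {feature}")
```

In this toy setup the attribution feature is carried by every entity, so its logarithm term is zero and it drops to the bottom of the ranking, which is the effect the TF-IDF improvement is meant to achieve.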

Dataset


 • Initial datasets:
    – HetRec 2011 (2113 users, 10197 movies, 855598
       ratings)
    – Freebase

 • Identified more than 10000 movies of HetRec 2011 in
   Freebase

 • kNN:
    (fb:en.pulp_fiction, knn:20, fb:en.reservoir_dogs)




Preliminary Results (1)




Preliminary Results (2)




Conclusions


 • Preliminary results look promising.

 • Interesting challenges:
    – accounting for numeric values
    – features as a result of property chains

 • Original idea of entity summarization:
   “... not just represent the main themes of the original
      data, but rather, can best identify the underlying
      entity” [Cheng et al., 2011]

 • Restriction to a single domain.
Thank you




 andreas.thalhammer@sti2.at
References


 [Cheng et al., 2011] Gong Cheng, Thanh Tran, and Yuzhong Qu. “RELIN:
    relatedness and informativeness-based centrality for entity
    summarization”. In: Proc. of the 10th intl. conf. on the semantic web -
    Volume Part I. ISWC’11. Bonn, Germany: Springer-Verlag, 2011, pp.
    114–129.

 [Sarwar et al., 2001] Badrul Sarwar, George Karypis, Joseph Konstan, and
    John Riedl. “Item-based collaborative filtering recommendation
    algorithms”. In: Proc. of the 10th intl. conf. on World Wide Web.
    WWW ’01. New York, NY, USA: ACM, 2001, pp. 285–295.





Editor's notes

  1. Each entity is associated with an average of 192 triples.