Leveraging Usage Data for Linked Data Movie Entity Summarization


Recent research in the field of Linked Data focuses on the problem of entity summarization: ranking an entity's features according to their importance for identifying that entity. Besides enabling a more human-friendly presentation, such summaries can play a central role in semantic search engines and semantic recommender systems. Current approaches derive entity summaries from patterns inherent in the data itself. The approach proposed in this paper focuses on the movie domain and uses usage data to measure the similarity between movie entities. With this similarity, the k-nearest neighbors of an entity can be determined. The underlying idea is that features an entity shares with its nearest neighbors can be considered significant or important for that entity. Additionally, we introduce a downgrading factor (similar to TF-IDF) to offset the large number of commonly occurring features. We exemplify the approach on a movie-ratings dataset that has been linked to Freebase entities. http://arxiv.org/pdf/1204.2718v1


Leveraging Usage Data for Linked
Data Movie Entity Summarization
Andreas Thalhammer, Ioan Toma, Antonio J. Roa-Valverde, Dieter Fensel

USEWOD Workshop, April 17, 2012, Lyon - France

Copyright 2012 STI INNSBRUCK, www.sti-innsbruck.at

Overview

1. Problem statement

2. Proposed approach

3. Dataset

4. Preliminary results

5. Conclusion


Problem statement

• Problem:

Linked Data entities comprise too much information for a
user to grasp them quickly.

• Entity summarization:

“... we aim at solving this novel problem that we call entity
summarization to produce a version of the original
description that is more concise, yet containing sufficient
information for users to quickly identify the underlying
entity.” [Cheng et al., 2011]


Proposed approach (1)

• Techniques:
Item-based collaborative filtering. [Sarwar et al., 2001]
k-nearest neighbor (kNN).

• Usage data:
           Bob  Alice  Marc  Elena  John  Mary
Toy Story    1      0     1      0     1     1
Heat         0      0     1      1     0     0
Jumanji      1      0     1      0     1     0
Top Gun      1      0     0      1     1     0
The Juror    1      1     0      1     0     0
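The usage matrix above can be turned into a nearest-neighbor list with a few lines of Python. This is only a minimal sketch: the slides do not say which similarity measure was used, so cosine similarity between item (row) vectors is an assumption here, and the names `USAGE`, `cosine`, and `nearest_neighbors` are hypothetical.

```python
import math

# Rows of the toy usage matrix from the slide.
# Column order: Bob, Alice, Marc, Elena, John, Mary.
USAGE = {
    "Toy Story": [1, 0, 1, 0, 1, 1],
    "Heat":      [0, 0, 1, 1, 0, 0],
    "Jumanji":   [1, 0, 1, 0, 1, 0],
    "Top Gun":   [1, 0, 0, 1, 1, 0],
    "The Juror": [1, 1, 0, 1, 0, 0],
}


def cosine(a, b):
    """Cosine similarity of two equally long vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbors(movie, k):
    """Return the k movies whose usage vectors are most similar to `movie`."""
    scores = [
        (cosine(USAGE[movie], vec), other)
        for other, vec in USAGE.items()
        if other != movie
    ]
    # Highest similarity first; ties are broken alphabetically by title.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [other for _, other in scores[:k]]


print(nearest_neighbors("Toy Story", 2))
```

On this toy data, Jumanji shares three users with Toy Story and Top Gun shares two, so those two titles come out as the closest neighbors for k = 2.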


Proposed approach (2)

Feature ranking for a specific entity e:

• First idea:
– Count the features shared with the nearest-neighbor set
– Rank features by their number of occurrences
– Problem: many features occur very often, e.g.
(cc:attributionName, “Source: Freebase - The World's database”)
• Improvement:
– Introduce a TF-IDF-like weight for the features:
w(e,f) = |neighbor(e,f)| × log(|all()| / |all(f)|)
– Rank features according to their weight
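The weighting above can be sketched in Python as follows. The reading of the symbols is an assumption, since the slide does not define them: `neighbor(e,f)` is taken to be the set of nearest neighbors of e that also carry feature f, `all()` the set of all entities, and `all(f)` the set of entities carrying f. The function name `rank_features` and the feature pairs in the example are made up for illustration and are not taken from Freebase.

```python
import math
from collections import Counter


def rank_features(entity, features, neighbors):
    """Rank the features of `entity` by w(e,f) = |neighbor(e,f)| * log(|all()| / |all(f)|).

    features:  dict mapping each entity to its set of (property, value) pairs
    neighbors: nearest neighbors of `entity` (e.g. from a kNN step)
    """
    n_all = len(features)  # |all()|
    # |all(f)|: number of entities that carry feature f.
    doc_freq = Counter(f for fs in features.values() for f in fs)
    # |neighbor(e,f)|: number of neighbors sharing feature f with the entity.
    shared = Counter()
    for n in neighbors:
        for f in features[entity] & features[n]:
            shared[f] += 1
    weights = {f: c * math.log(n_all / doc_freq[f]) for f, c in shared.items()}
    # Highest weight first; ties are broken by the feature pair itself.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative, made-up feature sets (not real Freebase data).
FEATURES = {
    "Toy Story": {("genre", "family"), ("attribution", "Freebase"), ("rating", "G")},
    "Jumanji":   {("genre", "family"), ("attribution", "Freebase")},
    "Top Gun":   {("genre", "action"), ("attribution", "Freebase")},
    "Heat":      {("genre", "crime"), ("attribution", "Freebase")},
}

ranked = rank_features("Toy Story", FEATURES, ["Jumanji", "Top Gun"])
for feature, weight in ranked:
    print(feature, round(weight, 3))
```

In this example the attribution feature is shared with both neighbors but occurs on every entity, so its logarithmic factor is log(1) = 0 and it drops to the bottom of the ranking, which is the downgrading effect the slide describes.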


Dataset

• Initial datasets:
– HetRec 2011 (2113 users, 10197 movies, 855598
ratings)
– Freebase

• Identified more than 10000 movies of HetRec 2011 in
Freebase

• kNN:
(fb:en.pulp_fiction, knn:20, fb:en.reservoir_dogs)


Preliminary Results (1)


Preliminary Results (2)


Conclusions

• Preliminary results look promising.

• Interesting challenges:
– accounting for numeric values
– features as a result of property chains

• Original idea of entity summarization:
“... not just represent the main themes of the original
data, but rather, can best identify the underlying
entity” [Cheng et al., 2011]

• Restriction to a single domain.

Thank you

andreas.thalhammer@sti2.at

References

[Cheng et al., 2011] Gong Cheng, Thanh Tran, and Yuzhong Qu. “RELIN:
relatedness and informativeness-based centrality for entity
summarization”. In: Proc. of the 10th intl. conf. on the semantic web -
Volume Part I. ISWC’11. Bonn, Germany: Springer-Verlag, 2011, pp.
114–129.

[Sarwar et al., 2001] Badrul Sarwar, George Karypis, Joseph Konstan, and
John Riedl. “Item-based collaborative filtering recommendation
algorithms”. In: Proc. of the 10th intl. conf. on World Wide Web.
WWW ’01. New York, NY, USA: ACM, 2001, pp. 285–295.




Editor's notes

Each entity is associated with an average of 192 triples.
