Penguins in Sweaters, or Serendipitous Entity Search on User-generated Content
chenwq
2014/04/16
Mounia Lalmas et al.
(Yahoo! Labs, CIKM 2013 Best Paper)
Mounia Lalmas
@mounialalmas
mounia-lalmas
mounialalmas
Principal Research Scientist at Yahoo! Labs
Professor of Information Retrieval
at the Department of Computer Science at Queen Mary,
University of London
Her research focuses on three main areas: user engagement, social media, and search.
Contents
1. What/why serendipitous search
2. How to build a serendipitous search system
3. Experiment settings and analysis
Why/when do penguins wear sweaters?
Entity Search: building an entity-driven serendipitous search system based on enriched entity networks extracted from Wikipedia and Yahoo! Answers
Serendipity: finding something good or useful while not specifically looking for it
Serendipitous search systems provide relevant and interesting results
What is entity search
How people become entities
What is entity search
• Entity extraction
• Proximity measure between two entities
• Entity ranking according to proximity to a query entity
What is Serendipity
“making fortunate discoveries by accident”
Serendipity = unexpectedness + relevance
• “Expected” result baselines from web search
Serendipity = interestingness + relevance
• Result interestingness given the query
• Personal interest in result
M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating recommender systems by coverage and serendipity. RecSys 2010.
P. André, J. Teevan, and S. T. Dumais. From x-rays to silly putty via Uranus: Serendipity and its role in web search. SIGCHI 2009.
What is Serendipity
Intuition from recsys:
• unexpectedness
• usefulness u(RS_i)
Why/when do penguins wear sweaters?
WHAT: What connections between entities do web community knowledge portals offer?
WHY: How do they contribute to an interesting, serendipitous browsing experience?
Why/when do penguins wear sweaters?
Yahoo! Answers: community-driven question & answer portal
• 67M questions & 262M answers
• 2 years [2010/2011]
• English-language
• minimally curated
• opinions, gossip, personal info
• variety of points of view
Wikipedia: community-driven encyclopedia
• 3,795,865 articles
• from end of December 2011
• English Wikipedia
• curated
• high-quality knowledge
• variety of niche topics
Entity & Relationship Extraction
Entity: any concept that has a Wikipedia page
1. Identify surface forms [http]
2. Resolve them to Wikipedia entities [Zhou]
3. Rank entities using an aboutness score [Paranjpe]
[http] https://www.otexts.org/node/832
[Zhou] Zhou Y, Nie L, Rouhani-Kalleh O, et al. Resolving surface forms to wikipedia topics[C]//Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 2010: 1335-1343.
[Paranjpe] D. Paranjpe. Learning document aboutness from implicit user feedback and document structure. CIKM 2009.
Relationship: cosine similarity of tf-idf vectors (each vector is built from the concatenation of the documents in which the entity appears)
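The relationship score above can be illustrated with a minimal sketch. It assumes whitespace tokenization, raw term counts, and idf = log(N / df); the slides do not specify these choices, and the function names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(entity_docs):
    """Build one sparse tf-idf vector per entity.

    entity_docs maps an entity name to the concatenated text of all
    documents in which that entity appears (assumed to be prepared upstream).
    """
    tokenized = {e: text.lower().split() for e, text in entity_docs.items()}
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))  # document frequency: count each term once per entity
    vectors = {}
    for entity, tokens in tokenized.items():
        tf = Counter(tokens)
        # Assumed weighting: raw term frequency times log(N / df).
        vectors[entity] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy data, invented for illustration only.
docs = {
    "penguin": "penguin sweater oil spill rescue",
    "sweater": "sweater knitting wool penguin rescue",
    "macintosh": "computer apple desktop",
}
vecs = tfidf_vectors(docs)
edge_weight = cosine(vecs["penguin"], vecs["sweater"])
```

In this toy example the two entities that share vocabulary get a positive edge weight, while entities with no shared terms get zero and would not be connected.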
Entity & Relationship Extraction
[Figures: aboutness; relationship]
Entity Networks
Dataset          # Nodes     # Edges       # Isolated
Yahoo! Answers   896,799     112,595,138   69,856
Wikipedia        1,754,069   237,058,218   82,381
[Figures: Wikipedia and Yahoo! Answers entity networks]
Retrieval
Algorithm: lazy random walk with restart [Chung]
[Chung] Chung F R K. Spectral graph theory[M]. American Mathematical Soc., 1997.
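A lazy random walk with restart can be sketched as a power iteration over a weighted entity graph. This is an illustration under stated assumptions, not the authors' implementation: the `restart` and `laziness` values, the function name `lazy_rwr`, and the toy graph are all hypothetical.

```python
def lazy_rwr(graph, query, restart=0.15, laziness=0.5, iters=200, tol=1e-12):
    """Score every node by its proximity to `query`.

    graph: dict mapping node -> {neighbor: edge weight}; every neighbor
    must also be a key of `graph`.
    At each step the walker jumps back to `query` with probability `restart`;
    otherwise it stays put with probability `laziness` or follows an edge
    chosen in proportion to its weight.
    """
    scores = {n: 0.0 for n in graph}
    scores[query] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in graph}
        for node, mass in scores.items():
            total = sum(graph[node].values())
            if total == 0:
                # Isolated node: the non-restart mass can only stay in place.
                nxt[node] += (1 - restart) * mass
                continue
            nxt[node] += (1 - restart) * laziness * mass
            for nbr, w in graph[node].items():
                nxt[nbr] += (1 - restart) * (1 - laziness) * mass * w / total
        nxt[query] += restart  # total mass is 1, so the restart share is `restart`
        change = sum(abs(nxt[n] - scores[n]) for n in graph)
        scores = nxt
        if change < tol:
            break
    return scores


def rank_entities(graph, query, **kwargs):
    """Return all nodes except the query, best first."""
    scores = lazy_rwr(graph, query, **kwargs)
    return sorted((n for n in scores if n != query), key=lambda n: -scores[n])


# Toy undirected graph, invented for illustration.
toy = {
    "penguin": {"sweater": 0.8, "antarctica": 0.2},
    "sweater": {"penguin": 0.8, "oil spill": 0.5},
    "antarctica": {"penguin": 0.2},
    "oil spill": {"sweater": 0.5},
}
ranking = rank_entities(toy, "penguin")
```

The restart term keeps the scores anchored to the query entity, so the result is a query-specific ranking rather than a global centrality measure.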
Rank Aggregation
For a given query, combine the results from different search engines
Simple median-rank aggregation [Sculley]
A B C D E
C D E A B
C A D B E
[Sculley] Sculley D. Rank Aggregation for Similar Items[C]//SDM. 2007.
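Median-rank aggregation can be sketched in a few lines. The sketch reads the first two rows of letters on the slide as input rankings and the third as the aggregated output; that reading is an assumption, as is the alphabetical tie-break. The function name `median_rank_aggregate` is hypothetical.

```python
from statistics import median


def median_rank_aggregate(rankings):
    """Order items by the median of their 1-based positions across rankings.

    Assumes every ranking contains the same set of items. Ties in the
    median are broken alphabetically (an arbitrary choice for this sketch).
    """
    items = rankings[0]
    positions = {
        item: [ranking.index(item) + 1 for ranking in rankings]
        for item in items
    }
    return sorted(items, key=lambda item: (median(positions[item]), item))


combined = median_rank_aggregate([list("ABCDE"), list("CDEAB")])
# combined == ['C', 'A', 'D', 'B', 'E']
```

With two input lists, `statistics.median` averages the two positions, which gives C (2.0), A (2.5), D (3.0), B (3.5), E (4.0) and reproduces the third row.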
Retrieval
                Wikipedia   Yahoo! Answers   Combined
Precision @ 5   0.668       0.724            0.744
MAP             0.716       0.762            0.782
• 3 labels per query-result pair
• Annotator agreement (overlap): 85%
• Average overlap in top 5 results: 12%
Example: top 5 results for the query entity Steve Jobs
Yahoo! Answers: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York
Wikipedia: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp.
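The metrics on this slide can be sketched as follows. The sketch assumes binary relevance labels and normalizes average precision by the number of relevant results retrieved; the slides do not state how labels were binarized or how MAP was normalized, and all function names are hypothetical.

```python
def precision_at_k(relevance, k=5):
    """Fraction of the top-k results labelled relevant (labels are 0 or 1)."""
    return sum(relevance[:k]) / k


def average_precision(relevance):
    """Mean of precision values taken at each relevant position."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """MAP: average of average_precision over all queries."""
    return sum(average_precision(r) for r in runs) / len(runs)


def top_k_overlap(results_a, results_b, k=5):
    """Share of top-k results that two systems have in common."""
    return len(set(results_a[:k]) & set(results_b[:k])) / k


# The two example lists from the slide share no entity in their top 5.
ya = ["Jon Rubinstein", "Timothy Cook", "Kane Kramer", "Steve Wozniak", "Jerry York"]
wp = ["System 7", "PowerPC G4", "SuperDrive", "Power Macintosh", "Power Computing Corp."]
overlap = top_k_overlap(ya, wp)  # 0.0
```

A low overlap between the two sources is what makes combining them worthwhile: each network contributes results the other does not retrieve.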
Entity Networks with Implicit Metadata
• Sentiment
– using SentiStrength, compute positive & negative scores
– compute attitude (polarity) and sentimentality (strength)
– entity-level scores
• Quality
– Flesch Reading Ease score (readability)
• Topical Category
– Yahoo Content Taxonomy
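The sentiment and readability features can be sketched as below. The slides do not give the formulas, so the definitions of attitude and sentimentality here are an assumption: one convention from the sentiment-analysis literature for SentiStrength-style scores (positive in 1..5, negative in -5..-1). The Flesch Reading Ease formula is the standard one; word, sentence, and syllable counts are taken as inputs because syllable counting needs its own heuristic.

```python
def attitude(positive, negative):
    """Polarity: sign and size of the overall sentiment.

    Assumed definition: positive + negative, with positive in 1..5
    and negative in -5..-1.
    """
    return positive + negative


def sentimentality(positive, negative):
    """Strength: how much sentiment is present regardless of sign.

    Assumed definition: positive - negative - 2, so that a neutral
    text scored (1, -1) has sentimentality 0.
    """
    return positive - negative - 2


def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease score; higher means easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Hypothetical scores for one answer mentioning an entity.
polarity = attitude(4, -2)        # 2
strength = sentimentality(4, -2)  # 4
readability = flesch_reading_ease(words=100, sentences=5, syllables=150)
```

Entity-level scores could then be obtained by averaging these values over all documents that mention the entity, which is again an assumption about the aggregation step.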
Entity Networks with Metadata
Table 5: Serendipity across different runs
|relevant & unexpected| / |unexpected|: number of serendipitous results out of all of the unexpected results retrieved
|relevant & unexpected| / |retrieved|: number of serendipitous results out of all results retrieved
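The two ratios can be computed directly from sets. In this sketch a result counts as unexpected when it is absent from a baseline of "expected" results (for example, what a web search returns for the same query); the function name and the toy data are hypothetical.

```python
def serendipity_ratios(retrieved, relevant, expected):
    """Return (|rel & unexp| / |unexp|, |rel & unexp| / |retrieved|).

    retrieved: ranked list of results from the system under test
    relevant:  set of results judged relevant
    expected:  set of results returned by the baseline
    """
    unexpected = [r for r in retrieved if r not in expected]
    serendipitous = [r for r in unexpected if r in relevant]
    per_unexpected = len(serendipitous) / len(unexpected) if unexpected else 0.0
    per_retrieved = len(serendipitous) / len(retrieved) if retrieved else 0.0
    return per_unexpected, per_retrieved


# Toy example: c, d and e are unexpected; c and d are also relevant.
ratios = serendipity_ratios(
    retrieved=["a", "b", "c", "d", "e"],
    relevant={"a", "c", "d"},
    expected={"a", "b"},
)
# ratios == (2/3, 2/5)
```

The first ratio rewards precision among surprising results, while the second also penalizes a system that returns few surprising results at all.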
User-perceived Quality
1. Which result is more relevant to the query?
2. If someone is interested in the query, would they also be interested in these results?
3. Even if you are not interested in the query, are these results interesting to you personally?
4. Would you learn anything new about the query?
Entity Networks with Metadata
Table 6: Similarity (Kendall’s tau-b [Fagin]) between result sets and reference ranking (WP = Wikipedia, YA = Yahoo! Answers, Comb = combined)
Data   General   +Topic
Which result is more relevant to the query?
WP     0.162     0.194
YA     0.336     0.374
Comb   0.201     0.222
If someone is interested in the query, would they also be interested in the result?
WP     0.162     0.176
YA     0.312     0.343
Comb   0.184     0.222
Even if you are not interested in the query, is the result interesting to you personally?
WP     0.139     0.144
YA     0.324     0.359
Comb   0.168     0.198
Would you learn anything new about the query from this result?
WP     0.167     0.164
YA     0.307     0.346
Comb   0.184     0.203
• The topical category constraint promotes results of the same topic as the query entity
• Sentiment and readability constraints hurt performance
[Fagin] Fagin R, Kumar R, Mahdian M, et al. Comparing and aggregating rankings with ties[C]//Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. ACM, 2004: 47-58.
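Kendall's tau-b compares two rankings while correcting for ties. The following is a minimal O(n²) sketch of the standard formula, not the evaluation code behind Table 6; the function name is hypothetical, and returning 0.0 when one ranking is constant is a choice made for this sketch.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equally long lists of ranks or scores.

    tau_b = (C - D) / sqrt(P_x * P_y), where C and D count concordant and
    discordant pairs, and P_x (P_y) counts pairs that are not tied in x (y).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = untied_x = untied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx != 0:
            untied_x += 1
        if dy != 0:
            untied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    denom = sqrt(untied_x * untied_y)
    return (concordant - discordant) / denom if denom else 0.0


# Two rankings of four results, each containing one tie.
tau = kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 3])  # 0.8
```

Values near 1 mean the system's ordering agrees with the reference ranking, values near 0 mean no association, and negative values mean the orderings tend to disagree.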
  • 1. Penguins in Sweaters, or Serendipitous Entity Search on User-generated-Content chenwq 2014/04/16 Mounia Lalmas et al. (Yahoo! Labs, CIKM 2013 Best Paper )
  • 2. Mounia Lalmas (@mounialalmas, mounia-lalmas, mounialalmas). Principal Research Scientist at Yahoo! Labs; Professor of Information Retrieval at the Department of Computer Science at Queen Mary, University of London. Her research focuses on three main areas: user engagement, social media, and search.
  • 3. Contents: 1. What/why serendipitous search; 2. How to build a serendipitous search system; 3. Experiment settings and analysis
  • 4. Why/when do penguins wear sweaters? Entity Search: building an entity-driven serendipitous search system based on enriched entity networks extracted from Wikipedia and Yahoo! Answers. Serendipity: finding something good or useful while not specifically looking for it. Serendipitous search systems provide relevant and interesting results.
  • 5. What is entity search: how people become entities
  • 6. What is entity search: entity extraction; a proximity measure between two entities; entity ranking according to proximity to a query entity
  • 7. What is Serendipity: “making fortunate discoveries by accident”. M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating recommender systems by coverage and serendipity. RecSys 2010. Serendipity = unexpectedness + relevance: “expected” result baselines from web search. Serendipity = interestingness + relevance: result interestingness given the query; personal interest in the result. P. Andre, J. Teevan, and S. T. Dumais. From x-rays to silly putty via uranus: Serendipity and its role in web search. SIGCHI 2009.
  • 8. What is Serendipity. Intuition from recsys: unexpectedness; usefulness u(RS_i)
  • 9. Why/when do penguins wear sweaters? WHAT: What connections between entities do web community knowledge portals offer? WHY: How do they contribute to an interesting, serendipitous browsing experience?
  • 10. Why/when do penguins wear sweaters? Data sources.
        Yahoo! Answers: community-driven question & answer portal; 67M questions & 262M answers; 2 years [2010/2011]; English-language; minimally curated; opinions, gossip, personal info; variety of points of view.
        Wikipedia: community-driven encyclopedia; 3,795,865 articles; from end of December 2011; English Wikipedia; curated; high-quality knowledge; variety of niche topics.
  • 11. Contents: 1. What/why serendipitous search; 2. How to build a serendipitous search system; 3. Experiment settings and analysis
  • 12. Entity & Relationship Extraction. Entity: defined as any concept having a Wikipedia page. Steps: 1. identify surface forms [http]; 2. resolve to Wikipedia entities [Zhou]; 3. rank entities using aboutness score [Paranjpe]. Relationship: cosine similarity of tf/idf vectors (concatenation of documents where the entity appears). References: [http] https://www.otexts.org/node/832; [Zhou] Zhou Y, Nie L, Rouhani-Kalleh O, et al. Resolving surface forms to Wikipedia topics. Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 2010: 1335-1343; [Paranjpe] D. Paranjpe. Learning document aboutness from implicit user feedback and document structure. CIKM 2009.
  • 13. Entity & Relationship Extraction: aboutness; relationship
  • 14. Entity Networks
        Dataset          # Nodes     # Edges       # Isolated
        Yahoo! Answers     896,799   112,595,138   69,856
        Wikipedia        1,754,069   237,058,218   82,381
        [Figures: Wikipedia network; Yahoo! Answers network]
  • 15. Retrieval Algorithm: lazy random walk with restart [Chung]. Reference: [Chung] Chung F. R. K. Spectral Graph Theory. American Mathematical Society, 1997.
  • 16. Rank Aggregation. For a given query, combine the results from different search engines using simple median-rank aggregation [Sculley]. Example rankings: A B C D E; C D E A B; C A D B E. Reference: [Sculley] Sculley D. Rank Aggregation for Similar Items. SDM 2007.
  • 17. Contents: 1. What/why serendipitous search; 2. How to build a serendipitous search system; 3. Experiment settings and analysis
  • 18. Retrieval
        Metric          Wikipedia   Yahoo! Answers   Combined
        Precision @ 5   0.668       0.724            0.744
        MAP             0.716       0.762            0.782
        3 labels per query-result pair. Annotator agreement (overlap): 85%. Average overlap in top 5 results: 12%.
        Example query: Steve Jobs.
        Yahoo! Answers results: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York.
        Wikipedia results: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp.
  • 19. Why/when do penguins wear sweaters? WHAT: What connections between entities do web community knowledge portals offer? WHY: How do they contribute to an interesting, serendipitous browsing experience?
  • 20. Entity Networks with Implicit Metadata. Sentiment: compute positive & negative scores using SentiStrength; compute attitude (polarity) and sentimentality (strength); entity-level scores. Quality: readability, via the Flesch Reading Ease score. Topical category: Yahoo Content Taxonomy.
  • 21. Entity Networks with Metadata. Table 5: Serendipity across different runs. |relevant & unexpected| / |unexpected|: number of serendipitous results out of all of the unexpected results retrieved. |relevant & unexpected| / |retrieved|: serendipitous results out of all retrieved.
  • 22. User-perceived Quality. 1. Which result is more relevant to the query? 2. If someone is interested in the query, would they also be interested in these results? 3. Even if you are not interested in the query, are these results interesting to you personally? 4. Would you learn anything new about the query?
  • 23. Entity Networks with Metadata
        Table 6: Similarity (Kendall’s tau-b [Fagin]) between result sets and reference ranking
        Question / Data                                                                         General   +Topic
        Which result is more relevant to the query?
          WP                                                                                    0.162     0.194
          YA                                                                                    0.336     0.374
          Comb                                                                                  0.201     0.222
        If someone is interested in the query, would they also be interested in the result?
          WP                                                                                    0.162     0.176
          YA                                                                                    0.312     0.343
          Comb                                                                                  0.184     0.222
        Even if you are not interested in the query, is the result interesting to you personally?
          WP                                                                                    0.139     0.144
          YA                                                                                    0.324     0.359
          Comb                                                                                  0.168     0.198
        Would you learn anything new about the query from this result?
          WP                                                                                    0.167     0.164
          YA                                                                                    0.307     0.346
          Comb                                                                                  0.184     0.203
        The topical category constraint promotes results of the same topic as the query entity. Sentiment and readability constraints hurt performance.
        Reference: [Fagin] Fagin R, Kumar R, Mahdian M, et al. Comparing and aggregating rankings with ties. Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. ACM, 2004: 47-58.
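
The retrieval step on slide 15 (lazy random walk with restart) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the toy graph, and the default `restart` and `lazy` probabilities are all assumptions, and the slides do not state how their own parameters map onto these two. The tolerance and the iteration cap follow the stopping criteria in the editor's notes, reading "10E-6" as 1e-6, which is also an assumption.

```python
import math

def lazy_rwr(adj, query, restart=0.15, lazy=0.5, tol=1e-6, max_iter=30):
    """Score every node by a lazy random walk with restart from `query`.

    adj maps each node to {neighbour: non-negative weight}; every neighbour
    must also appear as a key of adj. Parameter defaults are illustrative.
    """
    scores = {node: 0.0 for node in adj}
    scores[query] = 1.0  # the walker starts on the query entity
    for _ in range(max_iter):
        nxt = {node: 0.0 for node in adj}
        for u, mass in scores.items():
            if mass == 0.0:
                continue
            moving = (1.0 - restart) * mass  # mass that does not restart
            total = sum(adj[u].values())
            if total == 0.0:
                nxt[u] += moving  # isolated node: nowhere to go
                continue
            nxt[u] += lazy * moving  # lazy step: stay on the current node
            for v, w in adj[u].items():
                nxt[v] += (1.0 - lazy) * moving * w / total
        # Total mass is 1, so exactly `restart` of it jumps back to the query.
        nxt[query] += restart
        diff = math.sqrt(sum((nxt[n] - scores[n]) ** 2 for n in adj))
        scores = nxt
        if diff < tol:  # successive iterations barely differ: stop early
            break
    return scores

def rank_entities(adj, query, k=5, **kwargs):
    """Return the k entities closest to `query`, excluding the query itself."""
    scores = lazy_rwr(adj, query, **kwargs)
    candidates = [n for n in scores if n != query]
    return sorted(candidates, key=lambda n: scores[n], reverse=True)[:k]

# Hypothetical toy network; edge weights stand in for tf/idf cosine similarities.
toy = {
    "a": {"b": 1.0, "c": 0.2},
    "b": {"a": 1.0},
    "c": {"a": 0.2, "d": 1.0},
    "d": {"c": 1.0},
}
print(rank_entities(toy, "a"))
```

In this sketch the edge weights play the role of the relationship strengths from slide 12, so entities strongly related to the query, directly or through short paths, receive the highest scores.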

Editor's notes

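  1. The story behind the picture: an internet user searching for "oil spill" unexpectedly found, among the search results, an item saying that penguins need to wear sweaters, and found it interesting. Such a search result is called serendipitous (see the definition of serendipity). The paper uses entity search techniques to retrieve such results from Yahoo! Answers and Wikipedia and present them to users, improving the user experience.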
  2. The two datasets are user-generated content. The content of each data source is represented as an entity network. The challenges include: extracting entities from different datasets; building a meaningful similarity measure.
  3. Step 2 (resolving surface forms to Wikipedia entities): this approach employs a resolution model based on a rich set of both content-sensitive and content-independent features, derived from Wikipedia and various other data sources including web behavioral data.
  4. The tf/idf vector of an entity e is built from the (order-insensitive) concatenation of all the documents in C where e appears. The lexicon is extracted by tokenizing every document, removing stop words, and applying Porter’s stemming algorithm to the resulting tokens.
  5. The two graphs are almost fully connected. The largest connected component spans 92.5% of the nodes in YA and 95.78% in WP. This is due to the presence of popular entities that appear ubiquitously in the two datasets. These entities represent very common concepts that are not particular to the subject of a document. They are removed from the entity networks, as they are not likely to be relevant to the input entity. The candidate entity space is further reduced by restricting it to the pairs of entities that co-occur in at least one document.
  6. With eta = 0.9 and alpha = 0.15 the results get worse, so random walks with no jump are used. Stopping criteria: the F-norm of the difference between two successive iterations is < 10E-6, or the maximum of 30 iterations is reached.
  7. The lazy random walk with restart achieves a precision@5 of 67% on WP and 72% on YA. The combination of WP and YA achieves the highest precision@5, 74%, with a Mean Average Precision of 78.2%.
  8. http://www.amazon.com/readability-yardstick-Rudolf-Franz-Flesch/dp/B0007JR5LQ
  9. Baselines. Top: for each query, retrieve the 5 entities that occur most frequently in the top 5 search results provided by two major commercial search engines. Top Nwq: similar to the previous case, but excluding the Wikipedia page of the input entity (if present) from the set of results returned by the search engines; the performance improved, which implies that entities from WP’s entity network contributed to the serendipity of the search results. Rel: return the top 5 entities in the related-query suggestions. Rel + Top: return the union of the sets of entity recommendations provided by Top and Rel. The value in parentheses is always almost as high as the corresponding serendipity value, which confirms that the methods proposed in this paper indeed retrieve a considerable fraction of results that are both unexpected and relevant.
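
The two serendipity ratios from Table 5 (slide 21) can be illustrated with a small sketch. It assumes that a result counts as "unexpected" when it is absent from the chosen baseline (Top, Top Nwq, Rel, or Rel + Top), which is how the notes above read; the function name and the toy entity identifiers are hypothetical and do not come from the paper's data.

```python
def serendipity(retrieved, relevant, baseline):
    """Return (|rel & unexp| / |unexp|, |rel & unexp| / |retrieved|).

    retrieved: ranked list of entities returned by the system
    relevant:  set of entities judged relevant to the query
    baseline:  set of "expected" entities, e.g. from a web-search baseline
    """
    retrieved = list(dict.fromkeys(retrieved))  # drop duplicates, keep order
    unexpected = [e for e in retrieved if e not in baseline]
    hits = [e for e in unexpected if e in relevant]  # relevant and unexpected
    of_unexpected = len(hits) / len(unexpected) if unexpected else 0.0
    of_retrieved = len(hits) / len(retrieved) if retrieved else 0.0
    return of_unexpected, of_retrieved

# Hypothetical toy judgments, for illustration only.
retrieved = ["e1", "e2", "e3", "e4", "e5"]
relevant = {"e1", "e2", "e4"}
baseline = {"e1", "e9"}
print(serendipity(retrieved, relevant, baseline))  # (0.5, 0.4)
```

Here e1 is relevant but already in the baseline, so it does not count as serendipitous; only e2 and e4 are both relevant and unexpected, giving 2 of 4 unexpected results and 2 of 5 retrieved results.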