1. Penguins in Sweaters, or Serendipitous Entity Search on User-generated Content
chenwq
2014/04/16
Mounia Lalmas et al. (Yahoo! Labs, CIKM 2013 Best Paper)
4. Why/when do penguins wear sweaters?
Entity Search
Building an entity-driven serendipitous search system based on enriched entity networks extracted from Wikipedia and Yahoo! Answers
Serendipity
Finding something good or useful while not specifically looking for it
Serendipitous search systems provide relevant and interesting results
5. What is entity search?
How people become entities
6. What is entity search?
Entity extraction
Proximity measure between two entities
Entity ranking according to their proximity to a query entity
7. What is serendipity?
“making fortunate discoveries by accident”
M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating recommender systems by coverage and serendipity. RecSys 2010.
Serendipity = unexpectedness + relevance
“Expected” result baselines come from web search
Serendipity = interestingness + relevance
Interestingness of the result given the query
Personal interest in the result
P. Andre, J. Teevan, and S. T. Dumais. From x-rays to silly putty via uranus: Serendipity and its role in web search. SIGCHI 2009.
9. WHAT: What connections between entities do web community knowledge portals offer?
WHY: How do they contribute to an interesting, serendipitous browsing experience?
Why/when do penguins wear sweaters?
10. Why/when do penguins wear sweaters?
Yahoo! Answers: a community-driven question & answer portal
• 67M questions & 262M answers
• 2 years [2010/2011]
• English-language
• minimally curated: opinions, gossip, personal info; variety of points of view
Wikipedia: a community-driven encyclopedia
• 3,795,865 articles
• snapshot from end of December 2011
• English Wikipedia
• curated: high-quality knowledge; variety of niche topics
12. Entity & Relationship Extraction
Entity: any concept having a Wikipedia page
1. Identify surface forms [http],
2. resolve them to Wikipedia entities [Zhou],
3. rank the entities using an aboutness score [Paranjpe].
[http] https://www.otexts.org/node/832
[Zhou] Y. Zhou, L. Nie, O. Rouhani-Kalleh, et al. Resolving surface forms to Wikipedia topics. COLING 2010: 1335-1343.
[Paranjpe] D. Paranjpe. Learning document aboutness from implicit user feedback and document structure. CIKM 2009.
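Read as code, the three steps above form a small pipeline. The sketch below is hypothetical throughout: spot, resolve, and aboutness stand in for the cited components (surface-form spotting, Wikipedia resolution per Zhou et al., aboutness scoring per Paranjpe), and the 0.5 threshold is invented.

```python
# Hypothetical pipeline skeleton; spot/resolve/aboutness are stand-ins
# for the cited components, not real library calls.
def extract_entities(document_text, spot, resolve, aboutness, threshold=0.5):
    surface_forms = spot(document_text)            # 1. spot surface forms
    entities = set()
    for sf in surface_forms:                       # 2. resolve to Wikipedia
        entity = resolve(sf)
        if entity is not None:                     # unresolved forms are dropped
            entities.add(entity)
    # 3. score and rank by aboutness, keeping entities the document is "about"
    scored = [(e, aboutness(e, document_text)) for e in entities]
    kept = [(e, s) for e, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```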
Relationship: cosine similarity of tf-idf vectors, where each entity is represented by the concatenation of the documents in which it appears
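A minimal sketch of this edge weight, using scikit-learn rather than the authors' own implementation; the entity pseudo-documents below are toy data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# entity -> concatenation of all documents mentioning it (toy pseudo-documents)
pseudo_docs = {
    "Steve Jobs":    "apple founder iphone keynote apple computer",
    "Steve Wozniak": "apple founder engineer apple computer homebrew",
    "Penguin":       "antarctic bird sweater colony krill",
}

entities = list(pseudo_docs)
tfidf = TfidfVectorizer(stop_words="english").fit_transform(pseudo_docs.values())
sim = cosine_similarity(tfidf)  # sim[i, j] = edge weight between entities i and j

for i in range(len(entities)):
    for j in range(i + 1, len(entities)):
        print(f"{entities[i]} -- {entities[j]}: {sim[i, j]:.3f}")
```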
15. Retrieval
Algorithm: lazy random walk with restart [Chung]
[Chung] F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
16. Rank Aggregation
For a given query, combine the results from different search engines
Simple median-rank aggregation [Sculley]
Example input rankings:
A B C D E
C D E A B
C A D B E
[Sculley] D. Sculley. Rank aggregation for similar items. SDM 2007.
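A minimal sketch of median-rank aggregation over the three toy rankings on this slide: each item is scored by the median of its positions across the inputs, then re-sorted. It assumes every ranking covers the same items.

```python
from statistics import median

rankings = [
    ["A", "B", "C", "D", "E"],
    ["C", "D", "E", "A", "B"],
    ["C", "A", "D", "B", "E"],
]

items = set().union(*rankings)
# median of an item's 1-based positions across all input rankings
median_rank = {item: median(r.index(item) + 1 for r in rankings) for item in items}
aggregated = sorted(items, key=lambda item: median_rank[item])
print(aggregated)  # ['C', 'A', 'D', 'B', 'E']
```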
18. Retrieval

              Wikipedia   Yahoo! Answers   Combined
Precision@5     0.668         0.724          0.744
MAP             0.716         0.762          0.782

3 labels per query-result pair; annotator agreement (overlap): 85%
Average overlap in top-5 results between the two sources: 12%, e.g. for the same query entity:
Yahoo! Answers: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York
Wikipedia: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp., Steve Jobs
19. WHAT: What connections between entities do web community knowledge portals offer?
WHY: How do they contribute to an interesting, serendipitous browsing experience?
Why/when do penguins wear sweaters?
21. Entity Networks with Metadata
Table 5: Serendipity across different runs
|relevant ∩ unexpected| / |unexpected|: the number of serendipitous results out of all unexpected results retrieved
|relevant ∩ unexpected| / |retrieved|: the number of serendipitous results out of all retrieved results
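Spelled out as code, with invented toy sets of result ids, the two ratios are:

```python
def serendipity_ratios(retrieved, relevant, unexpected):
    # serendipitous results are those that are both relevant and unexpected
    hits = relevant & unexpected
    of_unexpected = len(hits) / len(unexpected) if unexpected else 0.0
    of_retrieved = len(hits) / len(retrieved) if retrieved else 0.0
    return of_unexpected, of_retrieved

# toy example: 5 retrieved results, 3 of them unexpected, 2 of those relevant
retrieved  = {"r1", "r2", "r3", "r4", "r5"}
unexpected = {"r2", "r3", "r5"}
relevant   = {"r1", "r2", "r3"}
print(serendipity_ratios(retrieved, relevant, unexpected))  # (0.666..., 0.4)
```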
22. User-perceived Quality
1. Which result is more relevant to the query?
2. If someone is interested in the query, would they also be interested in these results?
3. Even if you are not interested in the query, are these results interesting to you personally?
4. Would you learn anything new about the query?
23. Entity Networks with Metadata
Table 6: Similarity (Kendall’s tau-b [Fagin]) between result sets and reference ranking

Question                                            Data   General   +Topic
Which result is more relevant to the query?         WP      0.162    0.194
                                                    YA      0.336    0.374
                                                    Comb    0.201    0.222
If someone is interested in the query, would        WP      0.162    0.176
they also be interested in the result?              YA      0.312    0.343
                                                    Comb    0.184    0.222
Even if you are not interested in the query, is     WP      0.139    0.144
the result interesting to you personally?           YA      0.324    0.359
                                                    Comb    0.168    0.198
Would you learn anything new about the query        WP      0.167    0.164
from this result?                                   YA      0.307    0.346
                                                    Comb    0.184    0.203

Topical category constraint: promotes results of the same topic as the query entity
Sentiment and readability constraints: hurt performance
[Fagin] R. Fagin, R. Kumar, M. Mahdian, et al. Comparing and aggregating rankings with ties. PODS 2004: 47-58.
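For reference, SciPy's kendalltau computes the tie-aware tau-b variant by default, so the table's measure can be sketched like this (the rank vectors below are invented):

```python
from scipy.stats import kendalltau

# ranks given to the same five results by a system and by the reference;
# equal ranks (ties) are allowed, which is why the tau-b variant is used
system_ranks    = [1, 2, 3, 4, 5]
reference_ranks = [2, 1, 3, 3, 5]

tau, p_value = kendalltau(system_ranks, reference_ranks)  # tau-b by default
print(f"tau-b = {tau:.3f}")
```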
Both datasets are user-generated content.
The content of each data source is represented as an entity network.
The challenges include:
extracting entities from the different datasets
building a meaningful similarity measure
Step 2: the resolution model is based on a rich set of both content-sensitive and content-independent features, derived from Wikipedia and various other data sources, including web behavioral data.
Each entity e is represented by the (order-insensitive) concatenation of all the documents in the corpus C where e appears.
The lexicon is extracted by tokenizing every document, removing stop words, and applying Porter’s stemming algorithm to the resulting tokens, as sketched below.
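A minimal sketch of this lexicon-building step with NLTK (assuming its punkt and stopwords resources are installed); the example sentence is invented:

```python
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

stop = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(document_text):
    # tokenize, drop stop words and non-alphabetic tokens, then Porter-stem
    tokens = word_tokenize(document_text.lower())
    return [stemmer.stem(t) for t in tokens if t.isalpha() and t not in stop]

print(preprocess("Why do penguins wear sweaters after oil spills?"))
# -> ['penguin', 'wear', 'sweater', 'oil', 'spill']
```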
The two graphs are almost fully connected: the largest connected component spans 92.5% of the nodes in YA and 95.78% in WP.
This is due to popular entities that appear ubiquitously in the two datasets.
Such entities represent very common concepts that are not particular to the subject of a document; they are removed from the entity networks, as they are unlikely to be relevant to the input entity.
The candidate entity space is then reduced by restricting to pairs of entities that co-occur in at least one document, as in the sketch below.
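A minimal sketch of that pruning step, with invented per-document entity sets:

```python
from itertools import combinations

# document id -> entities extracted from that document (toy data)
doc_entities = {
    "d1": {"Steve Jobs", "Steve Wozniak", "Apple Inc."},
    "d2": {"Steve Jobs", "Apple Inc."},
    "d3": {"Penguin", "Sweater"},
}

# keep only pairs of entities that co-occur in at least one document
candidate_pairs = set()
for entities in doc_entities.values():
    candidate_pairs.update(combinations(sorted(entities), 2))

print(sorted(candidate_pairs))
# ('Penguin', 'Steve Jobs') never appears: the two entities share no document
```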
Laziness parameter: eta = 0.9.
A restart probability of alpha = 0.15 gave worsening results, so the random walk is run with no jump.
Stop criterion, whichever comes first:
the F-norm (Frobenius norm) of the difference between two successive iterations falls below 10^-6, or
the maximum of 30 iterations is reached.
A sketch of the walk under these parameters follows below.
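A minimal sketch of the walk under these parameters. How eta enters the update is my reading of "lazy" (stay put with probability eta, otherwise follow an edge), and the toy graph is invented; only eta = 0.9, the 10^-6 tolerance, and the 30-iteration cap come from the notes above.

```python
import numpy as np

def lazy_random_walk(W, seed, eta=0.9, alpha=0.0, tol=1e-6, max_iter=30):
    """W: row-stochastic adjacency matrix; seed: index of the query entity;
    alpha: restart probability (0 here, since restarts worsened results)."""
    n = W.shape[0]
    restart = np.zeros(n)
    restart[seed] = 1.0
    p = restart.copy()
    for _ in range(max_iter):
        step = eta * p + (1.0 - eta) * (W.T @ p)   # lazy transition
        p_next = alpha * restart + (1.0 - alpha) * step
        if np.linalg.norm(p_next - p) < tol:       # change below 10^-6: stop
            return p_next
        p = p_next
    return p                                       # hit the 30-iteration cap

# toy 3-entity graph; the scores rank entities by proximity to the seed
W = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
print(lazy_random_walk(W, seed=0))
```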
The lazy random walk achieves a Precision@5 of 0.668 on WP and 0.724 on YA.
The combination of WP and YA does better still, with a Precision@5 of 0.744 and a MAP of 0.782.
Top: for each query, retrieve the 5 entities that occur most frequently in the top 5 search results provided by two major commercial search engines.
Top Nwq: same as the previous case, but excluding the Wikipedia page of the input entity (if present) from the set of results returned by the search engines. Performance improved, implying that entities from WP's entity network contribute to the serendipity of the search results.
Rel: return the top 5 entities in the related-query suggestions provided by the search engines.
Rel + Top: return the union of the sets of entity recommendations provided by Top and Rel.
The value in parentheses is almost always as high as the corresponding serendipity value, confirming that the methods proposed in the paper indeed retrieve a considerable fraction of results that are both unexpected and relevant.