A Survey of Entity Ranking over
RDF Graphs
Nikita Zhiltsov
Kazan Federal University
Russia
November 29, 2013

1 / 60
Outline
1 Introduction
2 Task Statement and Evaluation Methodology
3 Approaches
4 Conclusion

2 / 60
Motivation
An increasing amount of valuable semi-structured data has become available online, e.g.:
RDF graphs: the Linking Open Data (LOD) cloud
Web pages enhanced with microformats, RDFa, etc.: CommonCrawl, Web Data Commons
Google: Freebase Annotations of the ClueWeb Corpora
More than half of the queries in real query logs have an entity-centric user intent
Examples from industry: Google Knowledge Graph, Facebook Graph Search, Yandex Islands ⇒
3 / 60
Google Knowledge Graph

4 / 60
Facebook Graph Search

5 / 60
Yandex Islands

6 / 60
Overview of Semantic Search Approaches
T. Tran, P. Mika. Semantic Search - Systems, Concepts, Methods and Communities behind It

7 / 60
Outline
1 Introduction
2 Task Statement and Evaluation Methodology
3 Approaches
4 Conclusion

8 / 60
In this talk, we focus on entity ranking over RDF
graphs given a keyword search query

9 / 60
Key Issues in Entity Ranking

Ambiguity in names
Related entities from heterogeneous
data sources
Complex queries with clarifying terms

10 / 60
Key Issues in Entity Ranking
Ambiguity in names

Given the query university of michigan, candidate entities include:
University of Michigan, Ann Arbor
Central Michigan University
Michigan Technological University
Michigan State University

11 / 60
Key Issues in Entity Ranking
Related entities from heterogeneous data sources

Given a query harry potter movie,

Semantic link information can effectively enhance term
context
12 / 60
Key Issues in Entity Ranking
Complex queries with clarifying terms

Given a query shobana
masala, the user intent is
likely about Shobana
Chandrakumar, an Indian
actress starring in movies of
the Masala genre

13 / 60
Ad-hoc Object Retrieval in the Web of Data
Jeffrey Pound, Peter Mika, Hugo Zaragoza
WWW 2010

14 / 60
Query Categories
Entity query (∼40%*), e.g. 1978 cj5 jeep
Type query† (∼12%), e.g. doctors in barcelona
Attribute query (∼5%), e.g. zip code atlanta
Other query (∼36%); however, ∼14% of them contain a context entity or type
* estimated on real query logs from Yahoo!
† a.k.a. list search query
15 / 60
Repeatable and Reliable
Search System Evaluation
using Crowdsourcing
Roi Blanco, Harry Halpin, Daniel M. Herzig,
Peter Mika, Jeffrey Pound, Henry S. Thompson,
Thanh D. Tran
SIGIR 2011

16 / 60
Data Collection

Billion Triples Challenge 2009 RDF data set
247GB of uncompressed data: 1.4B triples describing 114 million objects
Assembled by combining crawls of multiple RDF search engines

17 / 60
Data Collection
Classes

18 / 60
Data Collection
Properties

19 / 60
Data Collection
Sources

20 / 60
Query Set Preparation
1. Emulate top queries
Given a Microsoft Live Search log containing queries repeated by at least 10 different users, sample 50 queries prefiltered with a NER and a gazetteer
2. Emulate long-tailed queries
Given the Yahoo! Search Query Log Tiny Sample v1.0 (4,500 queries), sample and manually filter out ambiguous queries ⇒ 42 queries
3. Combine both sets (50 + 42) ⇒ a list of 92 queries
21 / 60
Crowdsourcing Judgements
A purpose-built rendering tool presents the search results
The evaluation (MT1) was conducted and then repeated (MT2) 6 months later
Using Amazon Mechanical Turk HITs
Each HIT consists of 12 query-result pairs: 10 real ones and 2 from a "gold standard" annotated by experts
64 workers for MT1 and 69 workers for MT2
22 / 60
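The HIT layout above (10 real query-result pairs plus 2 expert-labelled gold pairs per HIT) can be sketched as follows. This is a minimal illustration, not the authors' tooling. The helper names, the fixed random seed and the idea of using `gold_accuracy` to screen workers are assumptions made for the example.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (query, result URI)


def build_hits(real: List[Pair], gold: List[Pair], real_per_hit: int = 10,
               gold_per_hit: int = 2, seed: int = 0) -> List[List[Pair]]:
    """Chunk real pairs into HITs and mix a few gold pairs into each one."""
    rng = random.Random(seed)
    hits = []
    for start in range(0, len(real), real_per_hit):
        hit = real[start:start + real_per_hit] + rng.sample(gold, gold_per_hit)
        rng.shuffle(hit)  # workers should not be able to spot the gold pairs
        hits.append(hit)
    return hits


def gold_accuracy(answers: Dict[Pair, int], gold_labels: Dict[Pair, int]) -> float:
    """Share of a worker's gold-pair answers that agree with the expert label."""
    judged = [p for p in answers if p in gold_labels]
    if not judged:
        return 0.0
    return sum(answers[p] == gold_labels[p] for p in judged) / len(judged)


real = [(f"q{i}", f"ex:e{i}") for i in range(20)]  # hypothetical query-result pairs
gold = [("gq1", "ex:g1"), ("gq2", "ex:g2"), ("gq3", "ex:g3")]
hits = build_hits(real, gold)
print(len(hits), [len(h) for h in hits])  # 2 [12, 12]
```

Shuffling inside each HIT keeps the gold pairs indistinguishable from real ones. That property is what makes them usable as a hidden quality check.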
Rendering Tool

23 / 60
Analysis of Results
Repeatability

The level of agreement is the same for the two pools
The rank order of the systems is unchanged
24 / 60
Targeting Evaluation Measures I
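All the measures are usually computed on the top-10 search results (k = 10)
1. P@k (precision at k):
$P@k(\pi, l) = \frac{\sum_{t \le k} I\{l_{\pi(t)} = 1\}}{k}$
2. MAP (mean average precision):
$AP(\pi, l) = \frac{\sum_{k=1}^{m} P@k \cdot I\{l_{\pi(k)} = 1\}}{m_1}$
where $\pi$ is the ranking, $l$ the relevance labels, $m$ the number of ranked results and $m_1$ the number of relevant ones
MAP = mean of AP over all queries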
25 / 60
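A minimal sketch of P@k, AP and MAP as defined on this slide follows. It assumes binary labels (1 = relevant) given in ranked order. It also takes $m_1$ to be the number of relevant results within the judged list. The function names are illustrative only.

```python
from statistics import mean
from typing import List


def precision_at_k(labels: List[int], k: int) -> float:
    """P@k: fraction of the top-k ranked results that are relevant (label == 1)."""
    return sum(1 for l in labels[:k] if l == 1) / k


def average_precision(labels: List[int]) -> float:
    """AP: sum of P@k at each relevant rank k, divided by the number of relevant results."""
    m1 = sum(1 for l in labels if l == 1)
    if m1 == 0:
        return 0.0
    return sum(precision_at_k(labels, k)
               for k in range(1, len(labels) + 1) if labels[k - 1] == 1) / m1


def mean_average_precision(runs: List[List[int]]) -> float:
    """MAP: mean of AP over all queries (one label list per query)."""
    return mean(average_precision(r) for r in runs)


ranked = [1, 0, 1, 0, 0]
print(precision_at_k(ranked, 5))   # 0.4
print(average_precision(ranked))   # (1/1 + 2/3) / 2, about 0.8333
```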
Targeting Evaluation Measures II
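3. NDCG (normalized discounted cumulative gain):
$DCG@k(\pi, l) = \sum_{j=1}^{k} G(l_{\pi(j)}) \cdot \eta(j),$
where $G(\cdot)$, the gain of a document's rating, is usually $G(z) = 2^z - 1$, the discount is $\eta(j) = \frac{1}{\log(j+1)}$, and $l_{\pi(j)} \in \{0, 1, 2\}$
$NDCG@k(\pi, l) = \frac{1}{Z_k} DCG@k(\pi, l),$
where $Z_k$ is the DCG@k of the ideal ranking, so that NDCG@k lies in [0, 1]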

26 / 60
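The NDCG definition can be made concrete with a short sketch. The slide leaves the base of the logarithm unspecified, so log base 2 is an assumption here. $Z_k$ is computed from the same labels sorted in decreasing order, which is also an assumption: it ignores relevant documents that were never retrieved.

```python
import math
from typing import List


def dcg_at_k(labels: List[int], k: int) -> float:
    """DCG@k with gain G(z) = 2^z - 1 and discount eta(j) = 1 / log2(j + 1)."""
    return sum((2 ** z - 1) / math.log2(j + 1)
               for j, z in enumerate(labels[:k], start=1))


def ndcg_at_k(labels: List[int], k: int) -> float:
    """NDCG@k = DCG@k / Z_k, where Z_k is the DCG@k of the ideal (sorted) ranking."""
    z_k = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / z_k if z_k > 0 else 0.0


graded = [2, 0, 1]  # graded labels in {0, 1, 2}, in ranked order
print(round(ndcg_at_k(graded, 3), 4))  # 0.9639
```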
Analysis of Results
Reliability

Metric | Difference
MAP | 1.8%
NDCG | 3.5%
P@10 | 12.8%

In this setting, experts rate more results as negative than workers do
P@10 is more fragile than MAP and NDCG
27 / 60
Yahoo! SemSearch Challenge (YSC) 2010 & 2011
http://semsearch.yahoo.com

28 / 60
Outline
1 Introduction
2 Task Statement and Evaluation Methodology
3 Approaches
4 Conclusion

29 / 60
Entity Search Track Submission by
Yahoo! Research Barcelona
Roi Blanco, Peter Mika, Hugo Zaragoza
SSW at WWW 2010

30 / 60
YSC 2010 Winner Approach
Only RDF S-P-O triples with literal objects are considered
Triples are filtered by predicates from a predefined list of 300 predicates
Triples about the same subject are grouped into a pseudo document with multiple fields
The BM25F ranking formula is applied (the field weights $w_c$ are handcrafted):
$BM25F = \sum_{t \in q \cap d} \frac{tf(t, d)}{k_1 + tf(t, d)} \cdot idf(t),$
$tf(t, d) = \sum_{c \in d} w_c \cdot tf_c(t, d)$

31 / 60
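The pipeline on this slide can be sketched in a few steps: keep literal triples whose predicate is on a whitelist, group them by subject into a fielded pseudo document, then score with a weighted term frequency and the common saturation $tf/(k_1 + tf)$. This is a simplified sketch, not the submitted system. The whitelist, the field weights, $k_1 = 1.2$, whitespace tokenization and the idf variant $\log(1 + (N - df + 0.5)/(df + 0.5))$ are all assumptions. Per-field length normalization is omitted.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, literal object)
Doc = Dict[str, Counter]       # field (predicate) -> term counts


def build_pseudo_docs(triples: List[Triple], allowed: set) -> Dict[str, Doc]:
    """Group whitelisted literal triples by subject; each predicate becomes a field."""
    docs: Dict[str, Doc] = defaultdict(lambda: defaultdict(Counter))
    for s, p, o in triples:
        if p in allowed:
            docs[s][p].update(o.lower().split())
    return docs


def bm25f(query: str, doc: Doc, docs: Dict[str, Doc],
          weights: Dict[str, float], k1: float = 1.2) -> float:
    n = len(docs)
    score = 0.0
    for t in set(query.lower().split()):
        # weighted term frequency: sum over fields c of w_c * tf_c(t, d)
        tf = sum(weights.get(c, 1.0) * counts[t] for c, counts in doc.items())
        if tf == 0:
            continue
        df = sum(1 for d in docs.values() if any(t in c for c in d.values()))
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))  # assumed idf variant
        score += tf / (k1 + tf) * idf
    return score


triples = [  # hypothetical data
    ("ex:Paris", "rdfs:label", "Paris"),
    ("ex:Paris", "ex:abstract", "capital of France"),
    ("ex:Lyon", "rdfs:label", "Lyon"),
    ("ex:Lyon", "ex:abstract", "city in France"),
]
docs = build_pseudo_docs(triples, {"rdfs:label", "ex:abstract"})
w = {"rdfs:label": 3.0, "ex:abstract": 1.0}  # hypothetical handcrafted weights
ranking = sorted(docs, key=lambda s: bm25f("paris france", docs[s], docs, w), reverse=True)
print(ranking)  # ['ex:Paris', 'ex:Lyon']
```

Giving the label field a higher weight is one way to encode the handcrafted preference for name-like predicates that the slide mentions.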
Sindice BM25MF at SemSearch 2011
Stephane Campinas, Renaud Delbru, Nur A. Rakhmawati,
Diego Ceccarelli, Giovanni Tummarello
SSW at WWW 2011

32 / 60
YSC 2011 Winner Approach I
URI resolution for triple objects
Extended BM25F approach with additional normalization of term frequencies per predicate type:
The weighting scheme is handcrafted
The proportion of query terms in entity literals
33 / 60
YSC 2011 Winner Approach II
RDF graph example:

34 / 60
YSC 2011 Winner Approach III
Star-shaped query matching the entity:

35 / 60
YSC 2011 Winner Approach IV
Empirical weights:

36 / 60
On the Modeling of Entities
for Ad-Hoc Entity Search in the Web of Data
Robert Neumayer, Krisztian Balog, Kjetil Nørvåg
ECIR 2012

37 / 60
Approach to entity representation I
RDF graph example:

38 / 60
Approach to entity representation II
a) Unstructured Entity Model; b) Structured Entity Model:

39 / 60
Main Findings
Two generative language models (LMs) for
the task:
Unstructured Entity Model
Structured Entity Model

The evaluation on the YSC data shows that
the representation of relations as a mixture
of predicate type LMs can contribute
significantly to overall performance
40 / 60
LM Retrieval Framework
$P(e|q) = \frac{P(q|e)P(e)}{P(q)} \stackrel{rank}{=} P(q|e)P(e),$
where $P(e|q)$ is the probability of entity e being relevant given query q

Further Assumptions
(i) P(e) is uniform; (ii) query terms are i.i.d.
Let $\theta_e$ be the entity model that predicts how likely the entity is to produce a given term t; then the query likelihood is
$P(q|\theta_e) = \prod_{t \in q} P(t|\theta_e)^{tf(t, q)}$

41 / 60
Unstructured Entity Model

Idea
Collapse all text values of properties associated
with the entity into a single document and apply
standard IR techniques
The entity model is a Dirichlet-smoothed
multinomial distribution:
$P(t|\theta_e) = \frac{tf(t, e) + \mu P(t|\theta_c)}{|e| + \mu}$
42 / 60
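Under the uniform-prior and i.i.d. assumptions, ranking by $P(q|\theta_e)$ with the Dirichlet-smoothed unstructured model reduces to a sum of log-probabilities. The sketch below works in log space to avoid underflow. The whitespace tokenizer and the choice to skip query terms unseen in the whole collection are assumptions. The value of $\mu$ used in the example is also a hypothetical choice.

```python
import math
from collections import Counter
from typing import List


def entity_document(literals: List[str]) -> Counter:
    """Collapse all literal values of an entity into a single bag of words."""
    bag = Counter()
    for text in literals:
        bag.update(text.lower().split())
    return bag


def log_query_likelihood(query: str, entity: Counter, collection: Counter,
                         mu: float = 2000.0) -> float:
    """log P(q|theta_e) = sum_t tf(t,q) * log P(t|theta_e), Dirichlet-smoothed."""
    coll_len = sum(collection.values())
    e_len = sum(entity.values())
    score = 0.0
    for t, qtf in Counter(query.lower().split()).items():
        p_c = collection[t] / coll_len  # background model P(t|theta_c)
        if p_c == 0:
            continue  # unseen in the collection: skipped in this sketch
        score += qtf * math.log((entity[t] + mu * p_c) / (e_len + mu))
    return score


e1 = entity_document(["Harry Potter", "a film series"])  # hypothetical entities
e2 = entity_document(["Harry Truman", "US president"])
coll = e1 + e2
print(log_query_likelihood("harry potter", e1, coll, mu=10)
      > log_query_likelihood("harry potter", e2, coll, mu=10))  # True
```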
Structured Entity Model
Folding Predicates

Group RDF triples by the following predicate types $p_t$:
Name, e.g. literal values of foaf:name, rdfs:label
Attributes, i.e. remaining datatype properties
OutRelations: resolving "object" (O) URIs in S-P-O triples to get their names
InRelations: resolving "subject" (S) URIs in S-P-O triples to get their names

43 / 60
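The folding step can be sketched as a single pass over the triples that mention an entity. Each triple carries an explicit literal/URI flag so the sketch stays self-contained. The set of name predicates, the `names` lookup for resolving URIs, the fallback to the raw URI when no name is known, and the example data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str, bool]  # (subject, predicate, object, object_is_literal)
NAME_PREDICATES = {"foaf:name", "rdfs:label"}  # hypothetical choice


def fold_predicates(entity: str, triples: List[Triple],
                    names: Dict[str, str]) -> Dict[str, List[str]]:
    """Fold an entity's triples into the four predicate types of the structured model."""
    fields: Dict[str, List[str]] = defaultdict(list)
    for s, p, o, is_literal in triples:
        if s == entity and is_literal:
            fields["Name" if p in NAME_PREDICATES else "Attributes"].append(o)
        elif s == entity:
            fields["OutRelations"].append(names.get(o, o))  # resolve object URI to a name
        elif o == entity and not is_literal:
            fields["InRelations"].append(names.get(s, s))   # resolve subject URI to a name
    return dict(fields)


triples = [
    ("ex:alice", "foaf:name", "Alice", True),
    ("ex:alice", "ex:age", "30", True),
    ("ex:alice", "ex:worksFor", "ex:acme", False),
    ("ex:bob", "ex:knows", "ex:alice", False),
]
names = {"ex:acme": "Acme Corp", "ex:bob": "Bob"}
print(fold_predicates("ex:alice", triples, names))
```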
Structured Entity Model
Mixture of Language Models
Each group has its own LM $P(t|\theta_e^{p_t})$:
$P(t|\theta_e^{p_t}) = \frac{tf(t, p_t, e) + \mu_{p_t} P(t|\theta_c^{p_t})}{|p_t, e| + \mu_{p_t}}$

Then, the entity model is a linear mixture of the predicate type LMs:
$P(t|\theta_e) = \sum_{p_t} P(t|\theta_e^{p_t}) P(p_t)$

44 / 60
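The mixture on this slide can be computed directly once each predicate type has its own term counts for the entity and for the collection. In the sketch below, the predicate-type weights $P(p_t)$, the per-type $\mu_{p_t}$ values and the toy counts are hypothetical. Skipping query terms whose mixture probability is zero is also an assumption made so the logarithm stays defined.

```python
import math
from collections import Counter
from typing import Dict

Fields = Dict[str, Counter]  # predicate type -> term counts


def field_prob(t: str, entity_field: Counter, coll_field: Counter, mu: float) -> float:
    """Dirichlet-smoothed P(t | theta_e^{p_t}) for a single predicate type (mu > 0)."""
    coll_len = sum(coll_field.values())
    p_c = coll_field[t] / coll_len if coll_len else 0.0
    return (entity_field[t] + mu * p_c) / (sum(entity_field.values()) + mu)


def mixture_prob(t: str, entity: Fields, coll: Fields,
                 p_type: Dict[str, float], mu: Dict[str, float]) -> float:
    """P(t | theta_e) = sum over p_t of P(t | theta_e^{p_t}) * P(p_t)."""
    return sum(field_prob(t, entity.get(pt, Counter()), coll.get(pt, Counter()), mu[pt]) * w
               for pt, w in p_type.items())


def log_likelihood(query: str, entity: Fields, coll: Fields,
                   p_type: Dict[str, float], mu: Dict[str, float]) -> float:
    score = 0.0
    for t, qtf in Counter(query.lower().split()).items():
        p = mixture_prob(t, entity, coll, p_type, mu)
        if p > 0:  # terms with zero probability everywhere are skipped
            score += qtf * math.log(p)
    return score


entity = {"Name": Counter({"alice": 1}), "Attributes": Counter({"30": 1})}
coll = {"Name": Counter({"alice": 1, "bob": 1}), "Attributes": Counter({"30": 1, "40": 1})}
p_type = {"Name": 0.5, "Attributes": 0.5}
mu = {"Name": 1.0, "Attributes": 1.0}
print(mixture_prob("alice", entity, coll, p_type, mu))  # 0.375
```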
Comparative Evaluation
Data set | Model | MAP | P@10 | NDCG
YSC 2010 | UEM | 0.207 | 0.314 | 0.383
YSC 2010 | SEM | 0.282 (+36.2%) | 0.400 (+27.4%) | 0.494 (+29.0%)
YSC 2011 | UEM | 0.207 | 0.188 | 0.295
YSC 2011 | SEM | 0.261 (+26.1%) | 0.242 (+28.7%) | 0.400 (+35.6%)

The multi-fielded document approach (SEM) improves the targeted measures by 26-36%

45 / 60
Combining N-gram Retrieval with Weights
Propagation on Massive RDF Graphs
He Hu, Xiaoyang Du
FSKD 2012

46 / 60
Approach I
Considering 2- to 5-grams while indexing entity URIs as well as literals
Thinking of URIs as hierarchical names
Computing the entity-query similarity scores:
$sim_{URI}(Q) = \frac{ngram\_hit\_count}{\big(\big||Q| - |URI.path|\big| + 1\big) \cdot (URI.depth + 1)}$
$sim_{LITERAL}(Q) = \frac{ngram\_hit\_count}{\big||Q| - |LITERAL.length|\big| + 1}$

47 / 60
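The two similarity formulas can be sketched under one possible reading of the slide. The slide does not pin down every quantity, so the following are all assumptions: character n-grams of the lowercased strings, a hit count equal to the number of distinct query n-grams that occur in the target, |Q| and the path and literal lengths measured in characters, and URI depth defined as the number of non-empty path segments.

```python
from urllib.parse import urlparse


def ngrams(text: str, n_min: int = 2, n_max: int = 5) -> set:
    """Distinct character n-grams (n = 2..5) of a lowercased string."""
    text = text.lower()
    return {text[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)}


def ngram_hit_count(query: str, target: str) -> int:
    """Number of distinct query n-grams that also occur in the target."""
    return len(ngrams(query) & ngrams(target))


def sim_uri(query: str, uri: str) -> float:
    path = urlparse(uri).path
    depth = len([seg for seg in path.split("/") if seg])
    return ngram_hit_count(query, path) / ((abs(len(query) - len(path)) + 1) * (depth + 1))


def sim_literal(query: str, literal: str) -> float:
    return ngram_hit_count(query, literal) / (abs(len(query) - len(literal)) + 1)


print(sim_literal("paris", "paris"))  # 10.0
print(sim_uri("obama", "http://dbpedia.org/page/Barack_Obama"))  # 10/42
```

Both denominators penalise length mismatch between the query and the candidate. The URI version additionally damps deep paths, which matches the slide's view of URIs as hierarchical names.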
Approach II
Ranking score:
$Score_{URI}(Q) = 1 - e^{-sim(Q)}$

Taking advantage of iterative PageRank-like weight propagation:
$W_{URI\_hit}(i+1) = \alpha \cdot W_{URI\_hit}(i)$
$W_{URI\_unhit}(i+1) = (1 - \alpha) \cdot \frac{\sum W_{URI\_hit\_neighbors}(i)}{N_{URI\_hit\_neighbors}}$

Improvement of up to 80% w.r.t. the plain n-gram ranker
48 / 60
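The score mapping and the propagation rule can be sketched as follows. The slide does not say how the set of "hit" URIs evolves between iterations. This sketch assumes that any node with positive weight counts as hit at the next step, which lets weight spread more than one hop. That reading, the default $\alpha = 0.85$ and the toy graph are assumptions, not details from the paper.

```python
import math
from typing import Dict, List


def score_uri(sim: float) -> float:
    """Map a similarity in [0, inf) to a score in [0, 1): 1 - e^{-sim}."""
    return 1.0 - math.exp(-sim)


def propagate(weights: Dict[str, float], neighbors: Dict[str, List[str]],
              alpha: float = 0.85, iterations: int = 3) -> Dict[str, float]:
    w = dict(weights)
    for _ in range(iterations):
        hit = {n for n, v in w.items() if v > 0}  # assumption: positive weight = hit
        new_w = {}
        for node in w:
            if node in hit:
                new_w[node] = alpha * w[node]  # hit nodes keep a fraction alpha
            else:
                hit_nbrs = [m for m in neighbors.get(node, []) if m in hit]
                # unhit nodes receive (1 - alpha) times the mean weight of hit neighbours
                new_w[node] = ((1 - alpha) * sum(w[m] for m in hit_nbrs) / len(hit_nbrs)
                               if hit_nbrs else 0.0)
        w = new_w
    return w


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}  # hypothetical URI graph
print(propagate({"a": 1.0, "b": 0.0, "c": 0.0}, graph, alpha=0.5, iterations=2))
# {'a': 0.25, 'b': 0.25, 'c': 0.25}
```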
Combining Inverted Indices
and Structured Search
for Ad-hoc Object Retrieval
Alberto Tonon, Gianluca Demartini,
Philippe Cudré-Mauroux
SIGIR 2012

49 / 60
Hybrid Search System

50 / 60
Structured Inverted Index
Consider the following property values as fields:
URI: tokens from the entity URI, e.g. http://dbpedia.org/page/Barack_Obama ⇒ 'barack', 'obama' etc.
Labels: values of a list of manually selected datatype properties
Attributes: other properties
BM25F is used as the ranking function
51 / 60
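Building the three index fields can be sketched as shown below. URI tokenization here lowercases the host and path and splits on any non-alphanumeric run, which yields 'barack' and 'obama' for the slide's example. The tokenizer (ASCII-only), the hypothetical set of label predicates, and the `(predicate, value)` input format are assumptions.

```python
import re
from typing import Dict, List, Tuple
from urllib.parse import urlparse

LABEL_PREDICATES = {"rdfs:label", "foaf:name"}  # hypothetical manual selection


def uri_tokens(uri: str) -> List[str]:
    """Lowercase the host and path of a URI and split on non-alphanumeric runs."""
    parsed = urlparse(uri)
    text = (parsed.netloc + " " + parsed.path).lower()
    return [tok for tok in re.split(r"[^0-9a-z]+", text) if tok]


def index_fields(uri: str, literals: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Build the URI / Labels / Attributes fields for one entity."""
    fields: Dict[str, List[str]] = {"URI": uri_tokens(uri), "Labels": [], "Attributes": []}
    for predicate, value in literals:
        key = "Labels" if predicate in LABEL_PREDICATES else "Attributes"
        fields[key].extend(value.lower().split())
    return fields


print(uri_tokens("http://dbpedia.org/page/Barack_Obama"))
# ['dbpedia', 'org', 'page', 'barack', 'obama']
```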
Graph-based Entity Search
1. Given a query q, obtain a list of entities Retr = {e_1, e_2, ..., e_n} ranked by their BM25F scores
2. Use the top-N elements as seeds for graph traversal
3. To get StructRetr = {e'_1, ..., e'_m}, exploit promising LOD properties‡ as well as Jaro-Winkler string similarity scores JW(q, e') > τ
4. Combine the two rankings:
$finalScore(q, e') = \lambda \times BM25(q, e) + (1 - \lambda) \times JW(q, e')$

‡ owl:sameAs, dbpedia:disambiguates, dbpedia:redirect
52 / 60
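The four steps can be sketched end to end. The sketch uses a standard Jaro-Winkler similarity (prefix scale 0.1, prefix capped at 4 characters) and walks from the top-N BM25F seeds along the promising properties. It keeps the neighbours whose label passes the threshold τ and combines the scores linearly. The data layout (`edges`, `labels`), the default N, τ and λ, and the rule of keeping the best score per neighbour are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

PROMISING = {"owl:sameAs", "dbpedia:disambiguates", "dbpedia:redirect"}


def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    m1, m2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):  # find matching characters within the window
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    mismatched, k = 0, 0
    for i in range(len(s1)):  # count matched characters that are out of order
        if m1[i]:
            while not m2[k]:
                k += 1
            mismatched += s1[i] != s2[k]
            k += 1
    t = mismatched / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def struct_retr(query: str, retr: List[Tuple[str, float]],
                edges: Dict[str, List[Tuple[str, str]]], labels: Dict[str, str],
                top_n: int = 10, tau: float = 0.8, lam: float = 0.5) -> List[Tuple[str, float]]:
    scores: Dict[str, float] = {}
    for e, bm25 in retr[:top_n]:  # top-N BM25F results act as seeds
        for prop, e2 in edges.get(e, []):
            if prop not in PROMISING:
                continue
            jw = jaro_winkler(query.lower(), labels.get(e2, "").lower())
            if jw > tau:
                final = lam * bm25 + (1 - lam) * jw
                scores[e2] = max(scores.get(e2, float("-inf")), final)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

The usage line reproduces the classic MARTHA/MARHTA example. Only edges labelled with one of the three promising properties are followed, mirroring the footnote on this slide.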
Evaluation

The graph-based approach (S1_1) outperforms BM25 scoring, with a 25% improvement in MAP on the 2010 data set
No significant improvement over the baseline on the 2011 data set
This may be explained by the scarcity of the predicates used (owl:sameAs volume < 0.7%)
53 / 60
Improving Entity Search over Linked Data
by Modeling Latent Semantics
Nikita Zhiltsov, Eugene Agichtein
CIKM 2013

54 / 60
Key Contributions
A tensor factorization based approach to incorporate
semantic link information into ranking model
Outperforms the state-of-the-art baseline in NDCG/MAP/P@10
A thorough evaluation of the proposed techniques, acquiring thousands of manual labels to augment the YSC benchmark data set
⇒ more details in the next talk
55 / 60
Negative results
The ideas that do not work out

56 / 60
Negative Results
Ideas from standard IR that do not work out:
WordNet-based query expansion [Tonon et al.,
SIGIR 2012]
Pseudo-relevance feedback [Tonon et al., SIGIR
2012]
Query suggestions of a commercial search engine
[Tonon et al., SIGIR 2012]
Direct application of centrality measures, such as
PageRank and HITS [Campinas et al., SSW WWW
2010; Dali et al., 2012]
57 / 60
Outline
1 Introduction
2 Task Statement and Evaluation Methodology
3 Approaches
4 Conclusion

58 / 60
Wrap up
Entity search over RDF graphs, a.k.a. ad-hoc object retrieval, has emerged as a new task in IR
There is a robust and consistent evaluation methodology for it
State-of-the-art approaches revolve around
applications of well-known IR methods along
There is a lack of approaches that leverage semantic links
Lots of data: scalability really matters
59 / 60
Thanks for your attention!

60 / 60

 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

A Survey of Entity Ranking over RDF Graphs

  • 1. A Survey of Entity Ranking over RDF Graphs. Nikita Zhiltsov, Kazan Federal University, Russia. November 29, 2013.
  • 2. Outline: 1 Introduction; 2 Task Statement and Evaluation Methodology; 3 Approaches; 4 Conclusion.
  • 3. Motivation. An increasing amount of valuable semi-structured data has become available online, e.g.: RDF graphs (the Linking Open Data (LOD) cloud); web pages enhanced with microformats, RDFa, etc. (CommonCrawl, Web Data Commons); Google's Freebase Annotations of the ClueWeb Corpora. More than half of the queries in real query logs have an entity-centric user intent. Examples from industry: Google Knowledge Graph, Facebook Graph Search, Yandex Islands.
  • 7. Overview of Semantic Search Approaches. T. Tran, P. Mika. Semantic Search - Systems, Concepts, Methods and Communities behind It.
  • 9. In this talk, we focus on entity ranking over RDF graphs given a keyword search query.
  • 10. Key Issues in Entity Ranking: ambiguity in names; related entities from heterogeneous data sources; complex queries with clarifying terms.
  • 11. Key Issues in Entity Ranking: Ambiguity in Names. Given the query university of michigan, the intended entity University of Michigan, Ann Arbor has to be distinguished from similarly named entities such as Central Michigan University, Michigan Technological University, and Michigan State University.
  • 12. Key Issues in Entity Ranking: Related Entities from Heterogeneous Data Sources. Example query: harry potter movie. Semantic link information can effectively enhance the term context.
  • 13. Key Issues in Entity Ranking: Complex Queries with Clarifying Terms. Given the query shobana masala, the user intent is likely about Shobana Chandrakumar, an Indian actress starring in movies of the Masala genre.
  • 14. Ad-hoc Object Retrieval in the Web of Data. Jeffrey Pound, Peter Mika, Hugo Zaragoza. WWW 2010.
  • 15. Query Categories: entity query (∼40%*), e.g. 1978 cj5 jeep; type query† (∼12%), e.g. doctors in barcelona; attribute query (∼5%), e.g. zip code atlanta; other query (∼36%), although ∼14% of these contain a context entity or type. * Estimated on real query logs from Yahoo!. † Also known as a list search query.
  • 16. Repeatable and Reliable Search System Evaluation using Crowdsourcing. Roi Blanco, Harry Halpin, Daniel M. Herzig, Peter Mika, Jeffrey Pound, Henry S. Thompson, Thanh D. Tran. SIGIR 2011.
  • 17. Data Collection: the Billion Triples Challenge 2009 RDF data set. The uncompressed data is 247 GB: 1.4B triples describing 114 million objects. It was built by combining the crawls of multiple RDF search engines.
  • 21. Query Set Preparation:
    1. Emulate top queries: from a Microsoft Live Search log of queries repeated by at least 10 different users, sample 50 queries prefiltered with a named-entity recognizer (NER) and a gazetteer.
    2. Emulate long-tail queries: from the Yahoo! Search Query Log Tiny Sample v1.0 (4,500 queries), sample queries and manually filter out ambiguous ones, leaving 42 queries.
    3. Result: a list of 92 queries.
  • 22. Crowdsourcing Judgements. A purpose-built rendering tool presents the search results. The evaluation (MT1) was repeated (MT2) 6 months later, both times using Amazon Mechanical Turk HITs. Each HIT consists of 12 query-result pairs: 10 real ones and 2 from a "gold standard" annotated by experts. 64 workers took part in MT1 and 69 workers in MT2.
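The HIT layout above (10 real pairs plus 2 gold pairs) can be sketched as a small packing routine. This is only an illustration under assumptions: the function name `build_hits`, the shuffling, and the policy of dropping a trailing batch with fewer than 10 real pairs are hypothetical, not the procedure used in the paper.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (query, result URI)


def build_hits(real: List[Pair], gold: List[Pair], real_per_hit: int = 10,
               gold_per_hit: int = 2, seed: int = 0) -> List[List[Pair]]:
    """Pack query-result pairs into HITs of `real_per_hit` real pairs plus
    `gold_per_hit` expert-annotated gold pairs, in shuffled order.

    Hypothetical policy: a trailing batch smaller than `real_per_hit`
    is not packed into a HIT.
    """
    rng = random.Random(seed)
    hits: List[List[Pair]] = []
    for start in range(0, len(real) - real_per_hit + 1, real_per_hit):
        hit = real[start:start + real_per_hit] + rng.sample(gold, gold_per_hit)
        rng.shuffle(hit)  # gold pairs should not be recognizable by position
        hits.append(hit)
    return hits
```

Mixing the gold pairs into every HIT lets each worker's answers be checked against expert labels without workers knowing which pairs are the checks.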
  • 24. Analysis of Results: Repeatability. The level of agreement is the same for the two pools, and the rank order of the systems is unchanged.
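One way to check the claim that the rank order of the systems is unchanged between MT1 and MT2 is a rank correlation between the two sets of system scores. The slide does not name a statistic; Kendall's tau (tau-a, ties counted as neither concordant nor discordant) is an assumed, common choice, and the system names in any usage are hypothetical.

```python
from itertools import combinations
from typing import Dict


def kendall_tau(run_a: Dict[str, float], run_b: Dict[str, float]) -> float:
    """Kendall's tau-a between two scorings of the same systems
    (1.0 = identical rank order, -1.0 = fully reversed order).
    Requires at least two systems."""
    systems = sorted(run_a)
    concordant = discordant = 0
    for x, y in combinations(systems, 2):
        # positive product: both evaluations order the pair the same way
        sign = (run_a[x] - run_a[y]) * (run_b[x] - run_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(systems) * (len(systems) - 1) / 2
    return (concordant - discordant) / pairs
```

A value of 1.0 corresponds to "the rank order of the systems is unchanged", even if the absolute scores moved between the two runs.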
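  • 25. Targeting Evaluation Measures I. All measures are usually computed on the top-10 search results (k = 10). Here $\pi$ is the ranking and $l_{\pi(j)}$ is the relevance label of the result at rank j.
    1. P@k (precision at k): $P@k(\pi, l) = \frac{1}{k} \sum_{t \le k} \mathbb{I}\{l_{\pi(t)} = 1\}$
    2. MAP (mean average precision): $AP(\pi, l) = \frac{1}{m_1} \sum_{k=1}^{m} P@k(\pi, l) \cdot \mathbb{I}\{l_{\pi(k)} = 1\}$, where m is the number of ranked results and $m_1$ the number of relevant ones; MAP is the mean of AP over all queries.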
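The P@k and AP definitions above translate directly into code. A minimal sketch, assuming binary labels given in ranked order (`labels[j]` is the label of the result at rank j+1) and that the ranked list contains all judged results, so $m_1$ is counted from the list itself:

```python
from typing import Sequence


def precision_at_k(labels: Sequence[int], k: int = 10) -> float:
    """P@k: fraction of the top-k results labelled relevant (label 1)."""
    return sum(1 for l in labels[:k] if l == 1) / k


def average_precision(labels: Sequence[int]) -> float:
    """AP = (1/m_1) * sum_k P@k * I{l_k = 1}, m_1 = number of relevant results."""
    m1 = sum(1 for l in labels if l == 1)
    if m1 == 0:
        return 0.0
    return sum(precision_at_k(labels, k)
               for k in range(1, len(labels) + 1)
               if labels[k - 1] == 1) / m1


def mean_average_precision(runs: Sequence[Sequence[int]]) -> float:
    """MAP: mean of AP over the ranked label lists of all queries."""
    return sum(average_precision(r) for r in runs) / len(runs)
```

For example, the ranking [1, 0, 1, 0] has AP = (1/1 + 2/3) / 2 = 5/6.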
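  • 26. Targeting Evaluation Measures II.
    3. NDCG (normalized discounted cumulative gain): $DCG@k(\pi, l) = \sum_{j=1}^{k} G(l_{\pi(j)}) \cdot \eta(j)$, where $G(\cdot)$, the gain of a rating, is usually $G(z) = 2^z - 1$, the position discount is $\eta(j) = \frac{1}{\log(j+1)}$, and $l_{\pi(j)} \in \{0, 1, 2\}$; $NDCG@k(\pi, l) = \frac{1}{Z_k} DCG@k(\pi, l)$, where $Z_k$ is the DCG@k of the ideal ranking, so that $NDCG@k \le 1$.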
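A minimal NDCG@k sketch following the definitions above, assuming a base-2 logarithm in $\eta(j)$ (the slide leaves the base unspecified) and computing $Z_k$ by re-sorting the same graded labels into the ideal order:

```python
import math
from typing import Sequence


def dcg_at_k(labels: Sequence[int], k: int = 10) -> float:
    # rank j is 1-based; gain G(z) = 2^z - 1, discount eta(j) = 1 / log2(j + 1)
    return sum((2 ** z - 1) / math.log2(j + 1)
               for j, z in enumerate(labels[:k], start=1))


def ndcg_at_k(labels: Sequence[int], k: int = 10) -> float:
    # Z_k: DCG@k of the ideal ranking (labels sorted from best to worst)
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0
```

With graded labels in {0, 1, 2}, the ranking [0, 1, 2] scores about 0.587, while the ideal ranking [2, 1, 0] scores 1.0.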
  • 27. Analysis of Results: Reliability. Difference between the scores obtained from expert and crowd judgments, per metric:
    Metric   Difference
    MAP      1.8%
    NDCG     3.5%
    P@10     12.8%
    In this setting, experts rate more results as negative than workers do. P@10 is more fragile than MAP and NDCG.
  • 28. Yahoo! SemSearch Challenge (YSC) 2010 & 2011: http://semsearch.yahoo.com
  • 30. Entity Search Track Submission by Yahoo! Research Barcelona. Roi Blanco, Peter Mika, Hugo Zaragoza. SSW at WWW 2010.
  • 31. YSC 2010 Winner Approach. Only RDF S-P-O triples with literal objects are considered. Triples are filtered by predicate, keeping a predefined list of 300 predicates. Triples about the same subject are grouped into a pseudo-document with multiple fields. The BM25F ranking formula is applied, with handcrafted field weights $w_c$:
    $BM25F(q, d) = \sum_{t \in q \cap d} \frac{tf(t, d)}{k_1 + tf(t, d)} \cdot idf(t), \qquad tf(t, d) = \sum_{c \in d} w_c \cdot tf_c(t, d)$
    In full BM25F, each field frequency $tf_c(t, d)$ is additionally normalized by the field length, controlled by the parameter b.
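The pipeline above (filter literal triples by predicate, group them by subject into a multi-field pseudo-document, score with BM25F) can be sketched as follows. Assumptions not taken from the slide: each allowed predicate becomes its own field, literals are tokenized by lowercasing and whitespace splitting, field lengths are normalized with a single b, idf uses the common $\log\left(\frac{N - df + 0.5}{df + 0.5} + 1\right)$ variant, and the weights and example predicates are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, literal object)
Doc = Dict[str, List[str]]     # field (predicate) -> tokens


def build_docs(triples: List[Triple], allowed: Set[str]) -> Dict[str, Doc]:
    """Group literal triples by subject into multi-field pseudo-documents,
    keeping only predicates from the allowed list."""
    docs: Dict[str, Doc] = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        tokens = o.lower().split()
        if p in allowed and tokens:
            docs[s][p].extend(tokens)
    return docs


def bm25f(query: List[str], doc: Doc, docs: Dict[str, Doc],
          weights: Dict[str, float], k1: float = 1.2, b: float = 0.75) -> float:
    """BM25F score of `doc` (an element of `docs`) for the query terms."""
    n = len(docs)
    lengths: Dict[str, List[int]] = defaultdict(list)
    for d in docs.values():
        for field, toks in d.items():
            lengths[field].append(len(toks))
    avg_len = {f: sum(v) / len(v) for f, v in lengths.items()}
    score = 0.0
    for t in set(query):
        # tf(t, d): weighted sum of per-field, length-normalized frequencies
        tf = sum(weights.get(f, 1.0) * toks.count(t)
                 / ((1 - b) + b * len(toks) / avg_len[f])
                 for f, toks in doc.items())
        if tf == 0:
            continue  # t is not in q ∩ d
        df = sum(1 for d in docs.values() if any(t in toks for toks in d.values()))
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
        score += tf / (k1 + tf) * idf  # saturating term-frequency component
    return score
```

Setting b = 0 drops the length normalization and leaves exactly the weighted sum $tf(t, d) = \sum_c w_c \cdot tf_c(t, d)$ shown on the slide.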
  • 32. Sindice BM25MF at SemSearch 2011. Stephane Campinas, Renaud Delbru, Nur A. Rakhmawati, Diego Ceccarelli, Giovanni Tummarello. SSW at WWW 2011.
  • 33. YSC 2011 Winner Approach I. URIs in triple objects are resolved. The approach extends BM25F with an additional normalization of term frequencies per predicate type: the weighting scheme is handcrafted, and the proportion of query terms found in the entity's literals is taken into account.
  • 34. YSC 2011 Winner Approach II: an example RDF graph (figure).
  • 35. YSC 2011 Winner Approach III: a star-shaped query matching the entity (figure).
  • 36. YSC 2011 Winner Approach IV: empirical weights (figure).
  • 37. On the Modeling of Entities for Ad-Hoc Entity Search in the Web of Data. Robert Neumayer, Krisztian Balog, Kjetil Nørvåg. ECIR 2012.
  • 38. Approach to Entity Representation I: an example RDF graph (figure).
  • 39. Approach to Entity Representation II: (a) the Unstructured Entity Model; (b) the Structured Entity Model (figure).
  • 40. Main Findings. Two generative language models (LMs) for the task: the Unstructured Entity Model and the Structured Entity Model. The evaluation on the YSC data shows that representing relations as a mixture of predicate-type LMs can contribute significantly to overall performance.
  • 41. LM Retrieval Framework. Entities are ranked by $P(e \mid q)$, the probability of being relevant given the query q:
    $P(e \mid q) = \frac{P(q \mid e)\, P(e)}{P(q)} \stackrel{rank}{=} P(q \mid e)\, P(e)$
    Further assumptions: (i) $P(e)$ is uniform; (ii) query terms are i.i.d. Let $\theta_e$ be the entity model that predicts how likely the entity would produce a given term t; then the query likelihood is
    $P(q \mid \theta_e) = \prod_{t \in q} P(t \mid \theta_e)^{tf(t, q)}$
  • 42. Unstructured Entity Model. Idea: collapse all text values of the properties associated with the entity into a single document and apply standard IR techniques. The entity model is a Dirichlet-smoothed multinomial distribution:
    $P(t \mid \theta_e) = \frac{tf(t, e) + \mu P(t \mid \theta_c)}{|e| + \mu}$
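Combining the query likelihood $P(q \mid \theta_e)$ with the Dirichlet-smoothed entity model gives a complete Unstructured Entity Model scorer. A sketch, assuming the entity's collapsed text is already tokenized, the collection model $P(t \mid \theta_c)$ is estimated from a term Counter over all entities, and $\mu = 2000$ (a common default, not a value from the slides). Scores are computed in log space, which preserves the ranking and avoids underflow.

```python
import math
from collections import Counter
from typing import List


def log_query_likelihood(query: List[str], entity_terms: List[str],
                         collection: Counter, mu: float = 2000.0) -> float:
    """log P(q | theta_e) = sum_t tf(t, q) * log P(t | theta_e), where
    P(t | theta_e) = (tf(t, e) + mu * P(t | theta_c)) / (|e| + mu)."""
    tf_e = Counter(entity_terms)
    total = sum(collection.values())
    score = 0.0
    for t, tf_q in Counter(query).items():
        p_c = collection[t] / total
        if p_c == 0:
            # assumption: a term unseen in the whole collection is skipped;
            # this changes every entity's score equally, so ranking is unaffected
            continue
        p = (tf_e[t] + mu * p_c) / (len(entity_terms) + mu)
        score += tf_q * math.log(p)
    return score
```

Because $P(e)$ is assumed uniform, sorting entities by this value is the same as sorting by $P(e \mid q)$.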
  • 43. Structured Entity Model: Folding Predicates. RDF triples are grouped into the following predicate types pt:
    Name: literal values of name predicates, e.g. foaf:name, rdfs:label;
    Attributes: the remaining datatype properties;
    OutRelations: object (O) URIs of S-P-O triples about the entity, resolved to their names;
    InRelations: subject (S) URIs of S-P-O triples pointing to the entity, resolved to their names.
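The predicate folding can be sketched as a single pass over the triples. Assumptions, all hypothetical: triples carry an explicit "object is a literal" flag, the name predicates are exactly foaf:name and rdfs:label, and URIs are resolved through a lookup table built from those same name triples (related URIs without a known name are skipped).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (subject, predicate, object, object_is_literal)
Triple = Tuple[str, str, str, bool]

NAME_PREDICATES = {"foaf:name", "rdfs:label"}


def build_names(triples: List[Triple]) -> Dict[str, str]:
    """Hypothetical URI -> name lookup table taken from the Name predicates."""
    return {s: o for s, p, o, is_lit in triples if is_lit and p in NAME_PREDICATES}


def fold_predicates(entity: str, triples: List[Triple],
                    names: Dict[str, str]) -> Dict[str, List[str]]:
    """Fold the triples around `entity` into four predicate-type fields."""
    fields: Dict[str, List[str]] = defaultdict(list)
    for s, p, o, is_lit in triples:
        if s == entity and is_lit:
            fields["Name" if p in NAME_PREDICATES else "Attributes"].append(o)
        elif s == entity and o in names:
            fields["OutRelations"].append(names[o])  # resolve object URI
        elif o == entity and not is_lit and s in names:
            fields["InRelations"].append(names[s])   # resolve subject URI
    return dict(fields)
```

Each resulting field is then treated as its own small document with its own language model.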
  • 44. Structured Entity Model: Mixture of Language Models. Each predicate type has its own LM:
    $P(t \mid \theta_e^{pt}) = \frac{tf(t, pt, e) + \mu_{pt} P(t \mid \theta_c^{pt})}{|pt, e| + \mu_{pt}}$
    The entity model is then a linear mixture of the predicate-type LMs:
    $P(t \mid \theta_e) = \sum_{pt} P(t \mid \theta_e^{pt})\, P(pt)$
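The mixture above can be sketched as follows. Assumptions: an entity is given as a mapping from predicate type to tokens; per-type collection statistics are term Counters; the mixture weights $P(pt)$ and the per-type smoothing parameters $\mu_{pt}$ are hypothetical inputs (with a fallback of 100 for any type missing from `mu`).

```python
from collections import Counter
from typing import Dict, List


def sem_term_prob(t: str, entity: Dict[str, List[str]],
                  collection: Dict[str, Counter], p_type: Dict[str, float],
                  mu: Dict[str, float]) -> float:
    """P(t | theta_e) = sum_pt P(t | theta_e^pt) * P(pt), where each
    predicate-type LM is Dirichlet-smoothed with its own mu_pt."""
    prob = 0.0
    for pt, weight in p_type.items():
        toks = entity.get(pt, [])
        coll = collection.get(pt, Counter())
        total = sum(coll.values())
        p_coll = coll[t] / total if total else 0.0   # P(t | theta_c^pt)
        mu_pt = mu.get(pt, 100.0)                    # hypothetical fallback
        p_pt = (toks.count(t) + mu_pt * p_coll) / (len(toks) + mu_pt)
        prob += p_pt * weight
    return prob
```

Plugging this $P(t \mid \theta_e)$ into the query likelihood $P(q \mid \theta_e)$ yields the Structured Entity Model scorer; choosing $P(pt)$ per predicate type is how relations get their own share of the probability mass.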
  • 45. Comparative Evaluation (UEM = Unstructured Entity Model, SEM = Structured Entity Model; relative improvement of SEM over UEM in parentheses):
    Data set   Model  MAP              P@10             NDCG
    YSC 2010   UEM    0.207            0.314            0.383
    YSC 2010   SEM    0.282 (+36.2%)   0.400 (+27.4%)   0.494 (+29.0%)
    YSC 2011   UEM    0.207            0.188            0.295
    YSC 2011   SEM    0.261 (+26.1%)   0.242 (+28.7%)   0.400 (+35.6%)
    The multi-fielded document approach (SEM) improves the targeted measures by 26% to 36%.
  • 46. Combining N-gram Retrieval with Weights Propagation on Massive RDF Graphs. He Hu, Xiaoyang Du. FSKD 2012.
  • 47. Approach I. Both entity URIs and literals are indexed with 2- to 5-grams; URIs are treated as hierarchical names. Entity-query similarity scores:
    $sim_{URI}(Q) = \frac{e^{ngram\_hit\_count}}{\left( \big| |Q| - |URI.path| \big| + 1 \right) \cdot (URI.depth + 1)}$
    $sim_{LITERAL}(Q) = \frac{e^{ngram\_hit\_count}}{\big| |Q| - |LITERAL.length| \big| + 1}$
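The two similarity formulas can be sketched as below. Several details are assumptions, since the slide does not define them: the n-grams are lowercase character 2- to 5-grams; ngram_hit_count is the number of distinct n-grams shared by the query and the target; lengths are counted in characters; URI.path is the URI path without leading and trailing slashes; and URI.depth is its number of path segments.

```python
import math
from typing import Set
from urllib.parse import urlparse


def char_ngrams(text: str, n_min: int = 2, n_max: int = 5) -> Set[str]:
    """Distinct lowercase character n-grams, n_min <= n <= n_max."""
    text = text.lower()
    return {text[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)}


def ngram_hit_count(query: str, target: str) -> int:
    return len(char_ngrams(query) & char_ngrams(target))


def sim_uri(query: str, uri: str) -> float:
    path = urlparse(uri).path.strip("/")
    depth = path.count("/") + 1 if path else 0  # number of path segments
    # note: math.exp overflows for hit counts above ~709 (very long strings)
    return math.exp(ngram_hit_count(query, path)) / (
        (abs(len(query) - len(path)) + 1) * (depth + 1))


def sim_literal(query: str, literal: str) -> float:
    return math.exp(ngram_hit_count(query, literal)) / (
        abs(len(query) - len(literal)) + 1)
```

The denominators penalize targets whose length differs from the query's and, for URIs, deeper paths in the URI hierarchy.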
  • 48. Approach II. Ranking score: $Score_{URI}(Q) = 1 - e^{-sim(Q)}$. Iterative PageRank-like weight propagation:
    $W_{URI\_hit}(i+1) = \alpha \cdot W_{URI\_hit}(i)$
    $W_{URI\_unhit}(i+1) = (1 - \alpha) \cdot \frac{W_{URI\_hit\_neighbors}(i)}{N_{URI\_hit\_neighbors}}$
    Improvement of up to 80% w.r.t. the plain n-gram ranker.
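One propagation step can be sketched as follows, assuming that $W_{URI\_hit\_neighbors}(i) / N_{URI\_hit\_neighbors}$ is the mean weight of a node's neighbors that are n-gram hits. It also assumes, hypothetically, that a non-hit node with no hit neighbors keeps its current weight. The names `propagate_step`, `hits`, and `neighbors` are illustrative only.

```python
import math
from typing import Dict, List, Set


def score_from_sim(sim: float) -> float:
    """Score_URI(Q) = 1 - e^{-sim(Q)}, mapping a similarity into [0, 1)."""
    return 1.0 - math.exp(-sim)


def propagate_step(w: Dict[str, float], hits: Set[str],
                   neighbors: Dict[str, List[str]],
                   alpha: float) -> Dict[str, float]:
    """One iteration: hits keep an alpha share of their weight; a non-hit node
    receives (1 - alpha) times the mean weight of its hit neighbors."""
    new: Dict[str, float] = {}
    for node, weight in w.items():
        if node in hits:
            new[node] = alpha * weight
        else:
            hit_nb = [n for n in neighbors.get(node, []) if n in hits]
            if hit_nb:
                new[node] = (1 - alpha) * sum(w.get(n, 0.0) for n in hit_nb) / len(hit_nb)
            else:
                new[node] = weight  # assumption: no hit neighbors, unchanged
    return new
```

Repeated calls give the iterative version; with a single hit and a single neighbor, one step splits the hit's score between the two nodes without changing the total.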
  • 49. Combining Inverted Indices and Structured Search for Ad-hoc Object Retrieval. Alberto Tonon, Gianluca Demartini, Philippe Cudré-Mauroux. SIGIR 2012.
  • 51. Structured Inverted Index. The following property values are indexed as fields:
    URI: tokens from the entity URI, e.g. http://dbpedia.org/page/Barack_Obama ⇒ 'barack', 'obama', etc.;
    Labels: values of a manually selected list of datatype properties;
    Attributes: all other properties.
    BM25F is used as the ranking function.
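The URI field needs a tokenizer that turns an entity URI into index terms. The exact tokenizer is not given; this sketch assumes lowercase alphanumeric tokens, splitting at punctuation and at camelCase boundaries, and a small hypothetical stop list for URI schemes.

```python
import re
from typing import FrozenSet, List


def uri_tokens(uri: str,
               stop: FrozenSet[str] = frozenset({"http", "https", "www"})) -> List[str]:
    """Split a URI into lowercase alphanumeric tokens for the URI field."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", uri)  # split camelCase
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", spaced)
            if t and t.lower() not in stop]
```

For http://dbpedia.org/page/Barack_Obama this yields ['dbpedia', 'org', 'page', 'barack', 'obama']. The host and path tokens are kept because they can still carry useful matching signal.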
  • 52. Graph-based Entity Search:
    1. Given a query q, obtain a list of entities Retr = {e_1, e_2, ..., e_n} ranked by their BM25F scores.
    2. Use the top-N elements as seeds for graph traversal.
    3. To get StructRetr = {e'_1, ..., e'_m}, follow promising LOD properties‡ and keep entities whose Jaro-Winkler string similarity to the query exceeds a threshold: JW(q, e') > τ.
    4. Combine the two rankings: finalScore(q, e') = λ × BM25(q, e) + (1 − λ) × JW(q, e'), where e is the seed from which e' was reached.
    ‡ owl:sameAs, dbpedia:disambiguates, dbpedia:redirect
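Steps 3 and 4 can be sketched with a standard Jaro-Winkler implementation (prefix scale 0.1, prefix capped at 4 characters) and the linear combination above. Assumptions: the `neighbours` map is already restricted to the listed LOD properties; BM25 seed scores are on a scale comparable to JW (in practice they may need normalization); the values of τ and λ are hypothetical; and when a candidate is reached from several seeds, the best combined score is kept.

```python
from typing import Dict, List, Tuple


def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    m1, m2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):  # match characters within the window
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    half_transpositions, j = 0, 0
    for i in range(len(s1)):  # compare matched characters in order
        if m1[i]:
            while not m2[j]:
                j += 1
            half_transpositions += s1[i] != s2[j]
            j += 1
    t = half_transpositions // 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):  # common prefix, at most 4 chars
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def structured_candidates(query: str, seeds: List[Tuple[str, float]],
                          neighbours: Dict[str, List[str]], names: Dict[str, str],
                          tau: float = 0.8, lam: float = 0.5) -> Dict[str, float]:
    """finalScore(q, e') = lam * BM25(q, e) + (1 - lam) * JW(q, e') for JW > tau."""
    scores: Dict[str, float] = {}
    for seed, bm25 in seeds:
        for cand in neighbours.get(seed, []):
            jw = jaro_winkler(query.lower(), names.get(cand, "").lower())
            if jw > tau:
                s = lam * bm25 + (1 - lam) * jw
                scores[cand] = max(s, scores.get(cand, float("-inf")))
    return scores
```

As a sanity check, jaro_winkler("MARTHA", "MARHTA") is about 0.961, the usual textbook value.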
  • 53. Evaluation. The graph-based approach (S1_1) outperforms BM25 scoring, with a 25% improvement in MAP on the 2010 data set. There is no significant improvement over the baseline on the 2011 data set; this may be explained by the scarcity of the predicates used (owl:sameAs makes up less than 0.7% of the data).
  • 54. Improving Entity Search over Linked Data by Modeling Latent Semantics. Nikita Zhiltsov, Eugene Agichtein. CIKM 2013.
  • 55. Key Contributions: a tensor-factorization-based approach to incorporating semantic link information into the ranking model; it outperforms the state-of-the-art baseline in NDCG, MAP, and P@10; a thorough evaluation of the proposed techniques, acquiring thousands of manual labels to augment the YSC benchmark data set. More details in the next talk.
  • 56. Negative Results: the ideas that do not work out.
  • 57. Negative Results. Ideas from standard IR that do not work out: WordNet-based query expansion [Tonon et al., SIGIR 2012]; pseudo-relevance feedback [Tonon et al., SIGIR 2012]; query suggestions from a commercial search engine [Tonon et al., SIGIR 2012]; direct application of centrality measures, such as PageRank and HITS [Campinas et al., SSW WWW 2010; Dali et al., 2012].
  • 59. Wrap-up. Entity search over RDF graphs, a.k.a. ad-hoc object retrieval, has emerged as a new task in IR. There is a robust and consistent evaluation methodology for it. State-of-the-art approaches revolve around applications of well-known IR methods. Approaches for leveraging semantic links are lacking. There is a lot of data, so scalability really matters.
  • 60. Thanks for your attention!