SlideShare une entreprise Scribd logo
1  sur  62
Part II
Entity Retrieval
Krisztian Balog
University of Stavanger
Half-day tutorial at the WWW’13 conference | Rio de Janeiro, Brazil, 2013
What is an entity?
- Uniquely identifiable “thing” or “object”
- Properties:
- ID
- Name(s)
- Type(s)
- Attributes
- Relationships to other entities
Entity retrieval tasks
- Ad-hoc entity retrieval
- List completion
- Question answering
- Factual questions
- List questions
- Related entity finding
- Type-restricted variations
- People, blogs, products, movies, etc.
What’s so special about it?
- Entities are not always directly represented
- Recognise and disambiguate entities in text
- Collect and aggregate information about a given
entity from multiple documents and even multiple
data collections
- More structure
- Types (from some taxonomy)
- Attributes (from some ontology)
- Relationships to other entities (“typed links”)
In this Part
- Focus on the ad-hoc entity retieval task
- Mainly probabilistic models
- Specifically, Language Models
Outline for Part II
- Crash course into probability theory
- Ranking with ready-made entity descriptions
- Ranking without explicit entity representations
- Evaluation initiatives
- Future directions
Ad-hoc entity retrieval
- Input: unconstrained natural language query
- “telegraphic” queries (neither well-formed nor
grammatically correct sentences or questions)
- Output: ranked list of entities
- Collection: unstructured and/or semi-
structured documents
Ranking with ready-made
entity descriptions
This is not unrealistic...
Document-based entity
representations
- Each entity is described by a document
- Ranking entities much like ranking documents
- Unstructured
- Semi-structured
Standard Language Modeling
approach
- Rank documents d according to their likelihood
of being relevant given a query q: P(d|q)
P(d|q) =
P(q|d)P(d)
P(q)
/ P(q|d)P(d)
Document prior
Probability of the document
being relevant to any query
Query likelihood
Probability that query q
was “produced” by document d
P(q|d) =
Y
t2q
P(t|✓d)n(t,q)
Standard Language Modeling
approach (2)
Number of times t appears in q
Empirical
document model
Collection
model
Smoothing parameter
Maximum
likelihood
estimates
P(q|d) =
Y
t2q
P(t|✓d)n(t,q)
Document language model
Multinomial probability distribution
over the vocabulary of terms
P(t|✓d) = (1 )P(t|d) + P(t|C)
n(t, d)
|d|
P
d n(t, d)
P
d |d|
Here, documents=entities, so
P(e|q) / P(e)P(q|✓e) = P(e)
Y
t2q
P(t|✓e)n(t,q)
Entity prior
Probability of the entity
being relevant to any query
Entity language model
Multinomial probability distribution
over the vocabulary of terms
Semi-structured entity
representation
- Entity description documents are rarely
unstructured
- Representing entities as
- Fielded documents -- the IR approach
- Graphs -- the DB/SW approach
dbpedia:Audi_A4
foaf:name Audi A4
rdfs:label Audi A4
rdfs:comment The Audi A4 is a compact executive car
produced since late 1994 by the German car
manufacturer Audi, a subsidiary of the
Volkswagen Group. The A4 has been built [...]
dbpprop:production 1994
2001
2005
2008
rdf:type dbpedia-owl:MeanOfTransportation
dbpedia-owl:Automobile
dbpedia-owl:manufacturer dbpedia:Audi
dbpedia-owl:class dbpedia:Compact_executive_car
owl:sameAs freebase:Audi A4
is dbpedia-owl:predecessor of dbpedia:Audi_A5
is dbpprop:similar of dbpedia:Cadillac_BLS
Mixture of Language Models
[Ogilvie & Callan, 2003]
- Build a separate language model for each field
- Take a linear combination of them
mX
j=1
µj = 1
Field language model
Smoothed with a collection model built
from all document representations of the
same type in the collectionField weights
P(t|✓d) =
mX
j=1
µjP(t|✓dj )
P. Ogilvie and J. Callan. Combining document representations for known item search. SIGIR'03.
Setting field weights
- Heuristically
- Proportional to the length of text content in that field,
to the field’s individual performance, etc.
- Empirically (using training queries)
- Problems
- Number of possible fields is huge
- It is not possible to optimise their weights directly
- Entities are sparse w.r.t. different fields
- Most entities have only a handful of predicates
Predicate folding
- Idea: reduce the number of fields by grouping
them together
- Grouping based on
- Type [Pérez-Agüera et al. 2010]
- Manually determined importance [Blanco et al. 2011]
R. Blanco, P. Mika, and S. Vigna. Effective and efficient entity search in RDF data. ISWC'11.
J.R. Pérez-Agüera, J. Arroyo, J. Greenberg, J.P. Iglesias, and V. Fresno. Using BM25F for
semantic search. SemSearch'10.
Hierarchical Entity Model
[Neumayer et al. 2012]
- Organise fields into a 2-level hierarchy
- Field types (4) on the top level
- Individual fields of that type on the bottom level
- Estimate field weights
- Using training data for field types
- Using heuristics for bottom-level types
R. Neumayer, K. Balog and K. Nørvåg. On the modeling of entities for ad-hoc entity search in
the web of data. ECIR'12.
Two-level hierarchy
foaf:name Audi A4
rdfs:label Audi A4
rdfs:comment The Audi A4 is a compact executive car
produced since late 1994 by the German car
manufacturer Audi, a subsidiary of the
Volkswagen Group. The A4 has been built [...]
dbpprop:production 1994
2001
2005
2008
rdf:type dbpedia-owl:MeanOfTransportation
dbpedia-owl:Automobile
dbpedia-owl:manufacturer dbpedia:Audi
dbpedia-owl:class dbpedia:Compact_executive_car
owl:sameAs freebase:Audi A4
is dbpedia-owl:predecessor of dbpedia:Audi_A5
is dbpprop:similar of dbpedia:Cadillac_BLS
Name
Attributes
Out-relations
In-relations
Formally
P(t|✓d) =
X
F
P(t|F, d)P(F|d)
P(t|F, d) =
X
df 2F
P(t|df , F)P(df |F, d)
Field type importance
Taken to be the same for all entities
P(F|d) = P(F)
Term generation
Importance of a term is jointly determined by
the field it occurs as well as all fields of that
type (smoothed with a coll. level model)
Field generation
Uniform or estimated
heuristically (based on
length, popularity, etc)
P(t|df , F) = (1 )P(t|df ) + P(t|✓dF
)
Term importance
Comparison of models
d
dfF
...
t
dfF t
... ...d
tdf
...
tdf
...d
t
...
t
Unstructured
document model
Fielded
document model
Hierarchical
document model
Probabilistic Retrieval Model
for Semistructured data
[Kim et al. 2009]
- Extension to the Mixture of Language Models
- Find which document field each query term
may be associated with
Mapping probability
Estimated for each query term
P(t|✓d) =
mX
j=1
µjP(t|✓dj )
P(t|✓d) =
mX
j=1
P(dj|t)P(t|✓dj )
J. Kim, X. Xue, and W.B. Croft. A probabilistic retrieval model for semistructured data. ECIR'09.
Estimating the mapping
probability
Term likelihood
Probability of a query term
occurring in a given field type
Prior field probability
Probability of mapping the query term
to this field before observing collection
statistics
P(dj|t) =
P(t|dj)P(dj)
P(t)
X
dk
P(t|dk)P(dk)
P(t|Cj) =
P
d n(t, dj)
P
d |dj|
Example
Query: meg ryan war
cast 0.407
team 0.382
title 0.187
genre 0.927
title 0.070
location 0.002
cast 0.601
team 0.381
title 0.017
dj dj djP(t|dj) P(t|dj) P(t|dj)
The usual suspects from
document retrieval...
- Priors
- HITS, PageRank
- Document link indegree [Kamps & Koolen 2008]
- Pseudo relevance feedback
- Document-centric vs. entity-centric [Macdonald &
Ounis 2007; Serdyukov et al. 2007]
- sampling expansion terms from top ranked documents
and/or (profiles of) top ranked candidates
- Field-based [Kim & Croft 2011]
J. Kamps and M. Koolen. The importance of link evidence in Wikipedia. ECIR'08.
C. Macdonald and I. Ounis. Expertise drift and query expansion in expert search. CIKM'07.
P. Serdyukov, S. Chernov, and W. Nejdl. Enhancing expert search through query modeling. ECIR'07.
J.Y. Kim and W.B. Croft. A Field Relevance Model for Structured Document Retrieval. ECIR'12.
So far...
- Ranking (fielded) documents...
- What is special about entities?
- Type(s)
- Relationships with other entities
Entity types
rdf:type dbpedia-owl:MeanOfTransportation
dbpedia-owl:Automobile
Using target types
Assuming they have been identified...
- Constraining results
- Soft/hard filtering
- Different ways to measure type similarity (between
target types and the types associated with the entity)
- Set-based
- Content-based
- Lexical similarity of type labels
- Query expansion
- Adding terms from type names to the query
- Entity expansion
- Categories as a separate metadata field
Modeling terms and categories
[Balog et al. 2011]
K. Balog, M. Bron, and M. de Rijke. Query modeling for entity search based on terms,
categories and examples. TOIS'11.
Term-based representation
Query model
p(t|✓T
e )p(t|✓T
q ) p(c|✓C
q ) p(c|✓C
e )
Entity model Query model Entity model
Category-based representation
KL(✓T
q ||✓T
e ) KL(✓C
q ||✓C
e )
P(e|q) / P(q|e)P(e)
P(q|e) = (1 )P(✓T
q |✓T
e ) + P(✓C
q |✓C
e )
Identifying target types
- Types of top ranked entities [Vallet & Zaragoza
2008]
- Direct term-based vs. indirect entity-based
representations [Balog & Neumayer 2012]
- Hierarchical case is difficult...
D. Vallet and H. Zaragoza. Inferring the most important types of a query: a semantic approach. SIGIR'08.
K. Balog and R. Neumayer. Hierarchical target type identification for entity-oriented queries. CIKM'12.
U. Sawant and S. Chakrabarti. Learning joint query interpretation and response ranking. WWW'13.
Expanding target types
- Pseudo relevance feedback
- Based on hierarchical structure
- Using lexical similarity of type labels
Ranking without explicit
entity representations
Scenario
- Entity descriptions are not readily available
- Entity occurrences are annotated
The basic idea
Use documents to get from queries to entities
e
xxxx x xxx xx xxxxxx xx
x xxx xx x xxxx xx xxx x
xxxxxx xx x xxx xx xxxx
xx xxx xx x xxxxx xxx xx
x xxxx x xxx xx xxxxxx
xx x xxx xx x xxxx xx
xxx x xxxxxx xx x xxx xx
xxxx xx xxx xx x xxxxx
xxx xx x
q
xxxx x xxx xx xxxxxx xx
x xxx xx x xxxx xx xxx x
xxxxxx xx x xxx xx xxxx
xx xxx xx x xxxxx xxx xx
x xxxx x xxx xx
xxxx x xxx xx xxxxxx xx
x xxx xx x xxxx xx xxx x
xxxxxx xxxxxx xx x xxx
xx x xxxx xx xxx x xxxxx
xx x xxx xx xxxx xx xxx
xx x xxxxx xxx
Query-document
association
the document’s relevance
Document-entity
association
how well the document
characterises the entity
Two principal approaches
- Profile-based methods
- Create a textual profile for entities, then rank them
(by adapting document retrieval techniques)
- Document-based methods
- Indirect representation based on mentions identified
in documents
- First ranking documents (or snippets) and then
aggregating evidence for associated entities
Profile-based methods
q
xxxx x xxx xx xxxxxx xx
x xxx xx x xxxx xx xxx x
xxxxxx xx x xxx xx xxxx
xx xxx xx x xxxxx xxx xx
x xxxx x xxx xx xxxxxx
xx x xxx xx x xxxx xx
xxx x xxxxxx xx x xxx xx
xxxx xx xxx xx x xxxxx
xxx xx x
xxxx x xxx xx xxxxxx xx
x xxx xx x xxxx xx xxx x
xxxxxx xx x xxx xx xxxx
xx xxx xx x xxxxx xxx xx
x xxxx x xxx xx
xxxx x xxx xx xxxxxx xx
x xxx xx x xxxx xx xxx x
xxxxxx xxxxxx xx x xxx
xx x xxxx xx xxx x xxxxx
xx x xxx xx xxxx xx xxx
xx x xxxxx xxx
xxxx x xxx xx xxxxxx xx
x xxx xx x xxxx xx xxx x
xxxxxx xx x xxx xx xxxx
xx xxx xx x xxxxx xxx xx
x xxxx x xxx xx
e
xxxx x xxx xx xxxxxx xx
x xxx xx x xxxx xx xxx x
xxxxxx xx x xxx xx xxxx
xx xxx xx x xxxxx xxx xx
x xxxx x xxx xx
xxxx x xxx xx xxxxxx xx
x xxx xx x xxxx xx xxx x
xxxxxx xx x xxx xx xxxx
xx xxx xx x xxxxx xxx xx
x xxxx x xxx xx
e
e
Document-based methods
q
xxxx x xxx xx xxxxxx xx
x xxx xx x xxxx xx xxx x
xxxxxx xx x xxx xx xxxx
xx xxx xx x xxxxx xxx xx
x xxxx x xxx xx xxxxxx
xx x xxx xx x xxxx xx
xxx x xxxxxx xx x xxx xx
xxxx xx xxx xx x xxxxx
xxx xx x
xxxx x xxx xx xxxxxx xx
x xxx xx x xxxx xx xxx x
xxxxxx xx x xxx xx xxxx
xx xxx xx x xxxxx xxx xx
x xxxx x xxx xx
xxxx x xxx xx xxxxxx xx
x xxx xx x xxxx xx xxx x
xxxxxx xxxxxx xx x xxx
xx x xxxx xx xxx x xxxxx
xx x xxx xx xxxx xx xxx
xx x xxxxx xxx
X
e
X
X
e
e
Many possibilities in terms of
modeling
- Generative probabilistic models
- Discriminative probabilistic models
- Voting models
- Graph-based models
Generative probabilistic
models
- Candidate generation models (P(e|q))
- Two-stage language model
- Topic generation models (P(q|e))
- Candidate model, a.k.a. Model 1
- Document model, a.k.a. Model 2
- Proximity-based variations
- Both families of models can be derived from the
Probability Ranking Principle [Fang & Zhai 2007]
H. Fang and C. Zhai. Probabilistic models for expert finding. ECIR'07.
Candidate models (“Model 1”)
[Balog et al. 2006]
P(q|✓e) =
Y
t2q
P(t|✓e)n(t,q)
Smoothing
With collection-wide background model
(1 )P(t|e) + P(t)
X
d
P(t|d, e)P(d|e)
K. Balog, L. Azzopardi, and M. de Rijke. Formal Models for Expert Finding in Enterprise Corpora. SIGIR'06.
Document-entity
association
Term-candidate
co-occurrence
In a particular document.
In the simplest case:P(t|d)
Document models (“Model 2”)
[Balog et al. 2006]
P(q|e) =
X
d
P(q|d, e)P(d|e)
Document-entity
association
Document relevance
How well document d
supports the claim that e
is relevant to q
Y
t2q
P(t|d, e)n(t,q)
Simplifying assumption
(t and e are conditionally
independent given d)
P(t|✓d)
K. Balog, L. Azzopardi, and M. de Rijke. Formal Models for Expert Finding in Enterprise Corpora. SIGIR'06.
Document-entity associations
- Boolean (or set-based) approach
- Weighted by the confidence in entity linking
- Consider other entities mentioned in the
document
Proximity-based variations
- So far, conditional independence assumption
between candidates and terms when
computing the probability P(t|d,e)
- Relationship between terms and entities that in
the same document is ignored
- Entity is equally strongly associated with everything
discussed in that document
- Let’s capture the dependence between entities
and terms
- Use their distance in the document
Using proximity kernels
[Petkova & Croft 2007]
D. Petkova and W.B. Croft. Proximity-based document representation for named entity retrieval. CIKM'07.
P(t|d, e) =
1
Z
NX
i=1
d(i, t)k(t, e)
Indicator function
1 if the term at position i is t,
0 otherwise
Normalising
contant
Proximity-based kernel
- constant function
- triangle kernel
- Gaussian kernel
- step function
Figure taken from D. Petkova and W.B. Croft. Proximity-based document representation for named entity
retrieval. CIKM'07.
Many possibilities in terms of
modeling
- Generative probabilistic models
- Discriminative probabilistic models
- Voting models
- Graph-based models
Discriminative models
- Vs. generative models:
- Fewer assumptions (e.g., term independence)
- “Let the data speak”
- Sufficient amounts of training data required
- Incorporating more document features, multiple
signals for document-entity associations
- Estimating P(r=1|e,q) directly (instead of P(e,q|r=1))
- Optimisation can get trapped in a local optimum
Arithmetic Mean
Discriminative (AMD) model
[Yang et al. 2010]
Y. Fang, L. Si, and A. P. Mathur. Discriminative models of integrating document evidence and
document-candidate associations for expert search. SIGIR'10.
P✓(r = 1|e, q) =
X
d
P(r1 = 1|q, d)P(r2 = 1|e, d)P(d)
Document
prior
Query-document
relevance
Document-entity
relevance
logistic function
over a linear
combination of features
⇣ Ng
X
j=1
jgj(e, dt)
⌘⇣ Nf
X
i=1
↵ifi(q, dt)
⌘
standard logistic
function
weight
parameters
(learned)
features
Learning to rank
- Pointwise
- AMD, GMD [Yang et al. 2010]
- Multilayer perceptrons, logistic regression [Sorg &
Cimiano 2011]
- Additive Groves [Moreira et al. 2011]
- Pairwise
- Ranking SVM [Yang et al. 2009]
- RankBoost, RankNet [Moreira et al. 2011]
- Listwise
- AdaRank, Coordinate Ascent [Moreira et al. 2011]
P. Sorg and P. Cimiano. Finding the right expert: Discriminative models for expert retrieval. KDIR’11.
C. Moreira, P. Calado, and B. Martins. Learning to rank for expert search in digital libraries of academic
publications. PAI'11.
Z. Yang, J. Tang, B. Wang, J. Guo, J. Li, and S. Chen. Expert2bole: From expert finding to bole search.
KDD'09.
Voting models
[Macdonald & Ounis 2006]
- Inspired by techniques from data fusion
- Combining evidence from different sources
- Documents ranked w.r.t. the query are seen as
“votes” for the entity
C. Macdonald and I. Ounis. Voting for candidates: Adapting data fusion techniques for an expert
search task. CIKM'06.
Voting models
Many different variants, including...
- Votes
- Number of documents mentioning the entity
- Reciprocal Rank
- Sum of inverse ranks of documents
- CombSUM
- Sum of scores of documents
Score(e, q) = |{M(e)  R(q)}|
X
{M(e)R(q)}
s(d, q)
Score(e, q) =
X
{M(e)R(q)}
1
rank(d, q)
Score(e, q) = |M(e)  R(q)|
Graph-based models
[Serdyukov et al. 2008]
- One particular way of constructing graphs
- Vertices are documents and entities
- Only document-entity edges
- Search can be approached as a random walk
on this graph
- Pick a random document or entity
- Follow links to entities or other documents
- Repeat it a number of times
P. Serdyukov, H. Rode, and D. Hiemstra. Modeling multi-step relevance prop- agation for expert
finding. CIKM'08.
Infinite random walk model
[Serdyukov et al. 2008]
P. Serdyukov, H. Rode, and D. Hiemstra. Modeling multi-step relevance propagation for expert finding.
CIKM'08.
Pi(d) = PJ (d) + (1 )
X
e!d
P(d|e)Pi 1(e),
Pi(e) =
X
d!e
P(e|d)Pi 1(d),
PJ (d) = P(d|q),
ee e
d d
e
d d
Further reading
K. Balog, Y. Fang, M. de Rijke, P. Serdyukov, and L. Si.
Expertise Retrieval. FnTIR'12.
Evaluation initiatives
Test collections
Campaign Task Collection
Entity
repr.
#Topics
TREC Enterprise
(2005-08)
Expert finding
Enterprise intranets
(W3C, CSIRO)
Indirect
99 (W3C)
127 (CSIRO)
TREC Entity
(2009-11)
Rel. entity finding Web crawl
(ClueWeb09)
Indirect
120TREC Entity
(2009-11) List completion
Web crawl
(ClueWeb09)
Indirect
70
INEX Entity Ranking
(2007-09)
Entity search
Wikipedia Direct 55
INEX Entity Ranking
(2007-09) List completion
Wikipedia Direct 55
SemSearch Chall.
(2010-11)
Entity search Semantic Web crawl
(BTC2009)
Direct
142SemSearch Chall.
(2010-11) List search
Semantic Web crawl
(BTC2009)
Direct
50
INEX Linked Data
(2012-13)
Ad-hoc search
Wikipedia + RDF
(Wikipedia-LOD)
Direct
100 (’12)
144 (’13)
Test collections (2)
- Entity search as Question Answering
- TREC QA track
- QALD-2 challenge
- INEX-LD Jeopardy task
Entity search in DBpedia
[Balog & Neumayer 2013]
- Synthesising queries and relevance
assessments from previous eval. campaigns
- From short keyword queries to natural
language questions
- 485 queries in total
- Results are mapped to DBpedia
K. Balog and R. Neumayer. A test collection for entity search in DBpedia. SIGIR’13
Open challenges
- Combining text and structure
- Knowledge bases and unstructured Web documents
- Query understanding and modeling
- See [Sawant & Chakrabarti 2013] at the main
conference
- Result presentation
- How to interact with entities
U. Sawant and S. Chakrabarti. Learning joint query interpretation and response ranking. WWW'13.
Resources
- Complete tutorial material
http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
- Referred papers
http://www.mendeley.com/groups/3339761/entity-linking-and-retrieval-
tutorial-at-www-2013-and-sigir-2013/papers/

Contenu connexe

Tendances

Entity Linking in Queries: Efficiency vs. Effectiveness
Entity Linking in Queries: Efficiency vs. EffectivenessEntity Linking in Queries: Efficiency vs. Effectiveness
Entity Linking in Queries: Efficiency vs. EffectivenessFaegheh Hasibi
 
Everything you wanted to know about Dublin Core metadata
Everything you wanted to know about Dublin Core metadataEverything you wanted to know about Dublin Core metadata
Everything you wanted to know about Dublin Core metadataEduserv Foundation
 
On Entities and Evaluation
On Entities and EvaluationOn Entities and Evaluation
On Entities and Evaluationkrisztianbalog
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsNeo4j
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDFNarni Rajesh
 
Representing financial reports on the semantic web a faithful translation f...
Representing financial reports on the semantic web   a faithful translation f...Representing financial reports on the semantic web   a faithful translation f...
Representing financial reports on the semantic web a faithful translation f...Jie Bao
 
Introduction to Ontology Concepts and Terminology
Introduction to Ontology Concepts and TerminologyIntroduction to Ontology Concepts and Terminology
Introduction to Ontology Concepts and TerminologySteven Miller
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationVladimir Alexiev, PhD, PMP
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)Dan Brickley
 
Dublin Core Basic Syntax Tutorial
Dublin Core Basic Syntax TutorialDublin Core Basic Syntax Tutorial
Dublin Core Basic Syntax TutorialEduserv Foundation
 
Expressive Query Answering For Semantic Wikis (20min)
Expressive Query Answering For  Semantic Wikis (20min)Expressive Query Answering For  Semantic Wikis (20min)
Expressive Query Answering For Semantic Wikis (20min)Jie Bao
 
Information Extraction from the Web - Algorithms and Tools
Information Extraction from the Web - Algorithms and ToolsInformation Extraction from the Web - Algorithms and Tools
Information Extraction from the Web - Algorithms and ToolsBenjamin Habegger
 

Tendances (18)

Entity Linking in Queries: Efficiency vs. Effectiveness
Entity Linking in Queries: Efficiency vs. EffectivenessEntity Linking in Queries: Efficiency vs. Effectiveness
Entity Linking in Queries: Efficiency vs. Effectiveness
 
Everything you wanted to know about Dublin Core metadata
Everything you wanted to know about Dublin Core metadataEverything you wanted to know about Dublin Core metadata
Everything you wanted to know about Dublin Core metadata
 
On Entities and Evaluation
On Entities and EvaluationOn Entities and Evaluation
On Entities and Evaluation
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 
RDF and OWL
RDF and OWLRDF and OWL
RDF and OWL
 
Representing financial reports on the semantic web a faithful translation f...
Representing financial reports on the semantic web   a faithful translation f...Representing financial reports on the semantic web   a faithful translation f...
Representing financial reports on the semantic web a faithful translation f...
 
Introduction to Ontology Concepts and Terminology
Introduction to Ontology Concepts and TerminologyIntroduction to Ontology Concepts and Terminology
Introduction to Ontology Concepts and Terminology
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
 
Jpl presentation
Jpl presentationJpl presentation
Jpl presentation
 
Understanding RDF: the Resource Description Framework in Context (1999)
Understanding RDF: the Resource Description Framework in Context  (1999)Understanding RDF: the Resource Description Framework in Context  (1999)
Understanding RDF: the Resource Description Framework in Context (1999)
 
Dublin Core Basic Syntax Tutorial
Dublin Core Basic Syntax TutorialDublin Core Basic Syntax Tutorial
Dublin Core Basic Syntax Tutorial
 
Expressive Query Answering For Semantic Wikis (20min)
Expressive Query Answering For  Semantic Wikis (20min)Expressive Query Answering For  Semantic Wikis (20min)
Expressive Query Answering For Semantic Wikis (20min)
 
SWT Lecture Session 8 - Rules
SWT Lecture Session 8 - RulesSWT Lecture Session 8 - Rules
SWT Lecture Session 8 - Rules
 
RDF briefing
RDF briefingRDF briefing
RDF briefing
 
5 rdfs
5 rdfs5 rdfs
5 rdfs
 
Ontology engineering
Ontology engineering Ontology engineering
Ontology engineering
 
Information Extraction from the Web - Algorithms and Tools
Information Extraction from the Web - Algorithms and ToolsInformation Extraction from the Web - Algorithms and Tools
Information Extraction from the Web - Algorithms and Tools
 

Similaire à Entity Retrieval (WWW 2013 tutorial)

Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Sean Golliher
 
Collective entity linking with WSRM DocEng'19
Collective entity linking with WSRM DocEng'19Collective entity linking with WSRM DocEng'19
Collective entity linking with WSRM DocEng'19ngamou
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligencevini89
 
Make your data great again - Ver 2
Make your data great again - Ver 2Make your data great again - Ver 2
Make your data great again - Ver 2Daniel JACOB
 
Slides
SlidesSlides
Slidesbutest
 
Summary of SIGIR 2011 Papers
Summary of SIGIR 2011 PapersSummary of SIGIR 2011 Papers
Summary of SIGIR 2011 Paperschetanagavankar
 
Interactive Knowledge Discovery over Web of Data.
Interactive Knowledge Discovery over Web of Data.Interactive Knowledge Discovery over Web of Data.
Interactive Knowledge Discovery over Web of Data.Mehwish Alam
 
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of...
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of...Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of...
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of...FedorNikolaev
 
Intelligent Methods in Models of Text Information Retrieval: Implications for...
Intelligent Methods in Models of Text Information Retrieval: Implications for...Intelligent Methods in Models of Text Information Retrieval: Implications for...
Intelligent Methods in Models of Text Information Retrieval: Implications for...inscit2006
 
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Sean Golliher
 
Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Kira
 

Similaire à Entity Retrieval (WWW 2013 tutorial) (20)

Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
 
Collective entity linking with WSRM DocEng'19
Collective entity linking with WSRM DocEng'19Collective entity linking with WSRM DocEng'19
Collective entity linking with WSRM DocEng'19
 
Sigir 2011 proceedings
Sigir 2011 proceedingsSigir 2011 proceedings
Sigir 2011 proceedings
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
Make your data great again - Ver 2
Make your data great again - Ver 2Make your data great again - Ver 2
Make your data great again - Ver 2
 
Slides
SlidesSlides
Slides
 
Summary of SIGIR 2011 Papers
Summary of SIGIR 2011 PapersSummary of SIGIR 2011 Papers
Summary of SIGIR 2011 Papers
 
A Survey of Entity Ranking over RDF Graphs
A Survey of Entity Ranking over RDF GraphsA Survey of Entity Ranking over RDF Graphs
A Survey of Entity Ranking over RDF Graphs
 
Interactive Knowledge Discovery over Web of Data.
Interactive Knowledge Discovery over Web of Data.Interactive Knowledge Discovery over Web of Data.
Interactive Knowledge Discovery over Web of Data.
 
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of...
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of...Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of...
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of...
 
Object oriented database
Object oriented databaseObject oriented database
Object oriented database
 
OODB
OODBOODB
OODB
 
Entities and attributes
Entities and attributesEntities and attributes
Entities and attributes
 
Oodb
OodbOodb
Oodb
 
Oodb
OodbOodb
Oodb
 
Intelligent Methods in Models of Text Information Retrieval: Implications for...
Intelligent Methods in Models of Text Information Retrieval: Implications for...Intelligent Methods in Models of Text Information Retrieval: Implications for...
Intelligent Methods in Models of Text Information Retrieval: Implications for...
 
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
 
215 oodb
215 oodb215 oodb
215 oodb
 
Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)
 
DB and IR Integration
DB and IR IntegrationDB and IR Integration
DB and IR Integration
 

Plus de krisztianbalog

Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...krisztianbalog
 
Conversational AI from an Information Retrieval Perspective: Remaining Challe...
Conversational AI from an Information Retrieval Perspective: Remaining Challe...Conversational AI from an Information Retrieval Perspective: Remaining Challe...
Conversational AI from an Information Retrieval Perspective: Remaining Challe...krisztianbalog
 
What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?krisztianbalog
 
Personal Knowledge Graphs
Personal Knowledge GraphsPersonal Knowledge Graphs
Personal Knowledge Graphskrisztianbalog
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligencekrisztianbalog
 
Overview of the TREC 2016 Open Search track: Academic Search Edition
Overview of the TREC 2016 Open Search track: Academic Search EditionOverview of the TREC 2016 Open Search track: Academic Search Edition
Overview of the TREC 2016 Open Search track: Academic Search Editionkrisztianbalog
 
Overview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab
Overview of the Living Labs for IR Evaluation (LL4IR) CLEF LabOverview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab
Overview of the Living Labs for IR Evaluation (LL4IR) CLEF Labkrisztianbalog
 
Time-aware Evaluation of Cumulative Citation Recommendation Systems
Time-aware Evaluation of Cumulative Citation Recommendation SystemsTime-aware Evaluation of Cumulative Citation Recommendation Systems
Time-aware Evaluation of Cumulative Citation Recommendation Systemskrisztianbalog
 
Multi-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation RecommendationMulti-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation Recommendationkrisztianbalog
 
Semistructured Data Seach
Semistructured Data SeachSemistructured Data Seach
Semistructured Data Seachkrisztianbalog
 
Collection Ranking and Selection for Federated Entity Search
Collection Ranking and Selection for Federated Entity SearchCollection Ranking and Selection for Federated Entity Search
Collection Ranking and Selection for Federated Entity Searchkrisztianbalog
 

Plus de krisztianbalog (11)

Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
 
Conversational AI from an Information Retrieval Perspective: Remaining Challe...
Conversational AI from an Information Retrieval Perspective: Remaining Challe...Conversational AI from an Information Retrieval Perspective: Remaining Challe...
Conversational AI from an Information Retrieval Perspective: Remaining Challe...
 
What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?
 
Personal Knowledge Graphs
Personal Knowledge GraphsPersonal Knowledge Graphs
Personal Knowledge Graphs
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligence
 
Overview of the TREC 2016 Open Search track: Academic Search Edition
Overview of the TREC 2016 Open Search track: Academic Search EditionOverview of the TREC 2016 Open Search track: Academic Search Edition
Overview of the TREC 2016 Open Search track: Academic Search Edition
 
Overview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab
Overview of the Living Labs for IR Evaluation (LL4IR) CLEF LabOverview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab
Overview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab
 
Time-aware Evaluation of Cumulative Citation Recommendation Systems
Time-aware Evaluation of Cumulative Citation Recommendation SystemsTime-aware Evaluation of Cumulative Citation Recommendation Systems
Time-aware Evaluation of Cumulative Citation Recommendation Systems
 
Multi-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation RecommendationMulti-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation Recommendation
 
Semistructured Data Seach
Semistructured Data SeachSemistructured Data Seach
Semistructured Data Seach
 
Collection Ranking and Selection for Federated Entity Search
Collection Ranking and Selection for Federated Entity SearchCollection Ranking and Selection for Federated Entity Search
Collection Ranking and Selection for Federated Entity Search
 

Dernier

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 

Dernier (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Entity Retrieval (WWW 2013 tutorial)

  • 1. Part II Entity Retrieval Krisztian Balog University of Stavanger Half-day tutorial at the WWW’13 conference | Rio de Janeiro, Brazil, 2013
  • 2.
  • 3. What is an entity? - Uniquely identifiable “thing” or “object” - Properties: - ID - Name(s) - Type(s) - Attributes - Relationships to other entities
  • 4. Entity retrieval tasks - Ad-hoc entity retrieval - List completion - Question answering - Factual questions - List questions - Related entity finding - Type-restricted variations - People, blogs, products, movies, etc.
  • 5. What’s so special about it? - Entities are not always directly represented - Recognise and disambiguate entities in text - Collect and aggregate information about a given entity from multiple documents and even multiple data collections - More structure - Types (from some taxonomy) - Attributes (from some ontology) - Relationships to other entities (“typed links”)
  • 6. In this Part - Focus on the ad-hoc entity retieval task - Mainly probabilistic models - Specifically, Language Models
  • 7. Outline for Part II - Crash course into probability theory - Ranking with ready-made entity descriptions - Ranking without explicit entity representations - Evaluation initiatives - Future directions
  • 8. Ad-hoc entity retrieval - Input: unconstrained natural language query - “telegraphic” queries (neither well-formed nor grammatically correct sentences or questions) - Output: ranked list of entities - Collection: unstructured and/or semi- structured documents
  • 10. This is not unrealistic...
  • 11. Document-based entity representations - Each entity is described by a document - Ranking entities much like ranking documents - Unstructured - Semi-structured
  • 12. Standard Language Modeling approach - Rank documents d according to their likelihood of being relevant given a query q: P(d|q) P(d|q) = P(q|d)P(d) P(q) / P(q|d)P(d) Document prior Probability of the document being relevant to any query Query likelihood Probability that query q was “produced” by document d P(q|d) = Y t2q P(t|✓d)n(t,q)
  • 13. Standard Language Modeling approach (2) Number of times t appears in q Empirical document model Collection model Smoothing parameter Maximum likelihood estimates P(q|d) = Y t2q P(t|✓d)n(t,q) Document language model Multinomial probability distribution over the vocabulary of terms P(t|✓d) = (1 )P(t|d) + P(t|C) n(t, d) |d| P d n(t, d) P d |d|
  • 14. Here, documents=entities, so P(e|q) / P(e)P(q|✓e) = P(e) Y t2q P(t|✓e)n(t,q) Entity prior Probability of the entity being relevant to any query Entity language model Multinomial probability distribution over the vocabulary of terms
  • 15. Semi-structured entity representation - Entity description documents are rarely unstructured - Representing entities as - Fielded documents -- the IR approach - Graphs -- the DB/SW approach
  • 16. dbpedia:Audi_A4 foaf:name Audi A4 rdfs:label Audi A4 rdfs:comment The Audi A4 is a compact executive car produced since late 1994 by the German car manufacturer Audi, a subsidiary of the Volkswagen Group. The A4 has been built [...] dbpprop:production 1994 2001 2005 2008 rdf:type dbpedia-owl:MeanOfTransportation dbpedia-owl:Automobile dbpedia-owl:manufacturer dbpedia:Audi dbpedia-owl:class dbpedia:Compact_executive_car owl:sameAs freebase:Audi A4 is dbpedia-owl:predecessor of dbpedia:Audi_A5 is dbpprop:similar of dbpedia:Cadillac_BLS
  • 17. Mixture of Language Models [Ogilvie & Callan, 2003] - Build a separate language model for each field - Take a linear combination of them mX j=1 µj = 1 Field language model Smoothed with a collection model built from all document representations of the same type in the collectionField weights P(t|✓d) = mX j=1 µjP(t|✓dj ) P. Ogilvie and J. Callan. Combining document representations for known item search. SIGIR'03.
  • 18. Setting field weights - Heuristically - Proportional to the length of text content in that field, to the field’s individual performance, etc. - Empirically (using training queries) - Problems - Number of possible fields is huge - It is not possible to optimise their weights directly - Entities are sparse w.r.t. different fields - Most entities have only a handful of predicates
  • 19. Predicate folding - Idea: reduce the number of fields by grouping them together - Grouping based on - Type [Pérez-Agüera et al. 2010] - Manually determined importance [Blanco et al. 2011] R. Blanco, P. Mika, and S. Vigna. Effective and efficient entity search in RDF data. ISWC'11. J.R. Pérez-Agüera, J. Arroyo, J. Greenberg, J.P. Iglesias, and V. Fresno. Using BM25F for semantic search. SemSearch'10.
  • 20. Hierarchical Entity Model [Neumayer et al. 2012] - Organise fields into a 2-level hierarchy - Field types (4) on the top level - Individual fields of that type on the bottom level - Estimate field weights - Using training data for field types - Using heuristics for bottom-level types R. Neumayer, K. Balog and K. Nørvåg. On the modeling of entities for ad-hoc entity search in the web of data. ECIR'12.
  • 21. Two-level hierarchy foaf:name Audi A4 rdfs:label Audi A4 rdfs:comment The Audi A4 is a compact executive car produced since late 1994 by the German car manufacturer Audi, a subsidiary of the Volkswagen Group. The A4 has been built [...] dbpprop:production 1994 2001 2005 2008 rdf:type dbpedia-owl:MeanOfTransportation dbpedia-owl:Automobile dbpedia-owl:manufacturer dbpedia:Audi dbpedia-owl:class dbpedia:Compact_executive_car owl:sameAs freebase:Audi A4 is dbpedia-owl:predecessor of dbpedia:Audi_A5 is dbpprop:similar of dbpedia:Cadillac_BLS Name Attributes Out-relations In-relations
  • 22. Formally P(t|✓d) = X F P(t|F, d)P(F|d) P(t|F, d) = X df 2F P(t|df , F)P(df |F, d) Field type importance Taken to be the same for all entities P(F|d) = P(F) Term generation Importance of a term is jointly determined by the field it occurs as well as all fields of that type (smoothed with a coll. level model) Field generation Uniform or estimated heuristically (based on length, popularity, etc) P(t|df , F) = (1 )P(t|df ) + P(t|✓dF ) Term importance
  • 23. Comparison of models d dfF ... t dfF t ... ...d tdf ... tdf ...d t ... t Unstructured document model Fielded document model Hierarchical document model
  • 24. Probabilistic Retrieval Model for Semistructured data [Kim et al. 2009] - Extension to the Mixture of Language Models - Find which document field each query term may be associated with Mapping probability Estimated for each query term P(t|✓d) = mX j=1 µjP(t|✓dj ) P(t|✓d) = mX j=1 P(dj|t)P(t|✓dj ) J. Kim, X. Xue, and W.B. Croft. A probabilistic retrieval model for semistructured data. ECIR'09.
  • 25. Estimating the mapping probability Term likelihood Probability of a query term occurring in a given field type Prior field probability Probability of mapping the query term to this field before observing collection statistics P(dj|t) = P(t|dj)P(dj) P(t) X dk P(t|dk)P(dk) P(t|Cj) = P d n(t, dj) P d |dj|
  • 26. Example Query: meg ryan war cast 0.407 team 0.382 title 0.187 genre 0.927 title 0.070 location 0.002 cast 0.601 team 0.381 title 0.017 dj dj djP(t|dj) P(t|dj) P(t|dj)
  • 27. The usual suspects from document retrieval... - Priors - HITS, PageRank - Document link indegree [Kamps & Koolen 2008] - Pseudo relevance feedback - Document-centric vs. entity-centric [Macdonald & Ounis 2007; Serdyukov et al. 2007] - sampling expansion terms from top ranked documents and/or (profiles of) top ranked candidates - Field-based [Kim & Croft 2011] J. Kamps and M. Koolen. The importance of link evidence in Wikipedia. ECIR'08. C. Macdonald and I. Ounis. Expertise drift and query expansion in expert search. CIKM'07. P. Serdyukov, S. Chernov, and W. Nejdl. Enhancing expert search through query modeling. ECIR'07. J.Y. Kim and W.B. Croft. A Field Relevance Model for Structured Document Retrieval. ECIR'12.
  • 28. So far... - Ranking (fielded) documents... - What is special about entities? - Type(s) - Relationships with other entities
  • 30. Using target types Assuming they have been identified... - Constraining results - Soft/hard filtering - Different ways to measure type similarity (between target types and the types associated with the entity) - Set-based - Content-based - Lexical similarity of type labels - Query expansion - Adding terms from type names to the query - Entity expansion - Categories as a separate metadata field
  • 31. Modeling terms and categories [Balog et al. 2011] K. Balog, M. Bron, and M. de Rijke. Query modeling for entity search based on terms, categories and examples. TOIS'11. Term-based representation Query model p(t|✓T e )p(t|✓T q ) p(c|✓C q ) p(c|✓C e ) Entity model Query model Entity model Category-based representation KL(✓T q ||✓T e ) KL(✓C q ||✓C e ) P(e|q) / P(q|e)P(e) P(q|e) = (1 )P(✓T q |✓T e ) + P(✓C q |✓C e )
  • 32. Identifying target types - Types of top ranked entities [Vallet & Zaragoza 2008] - Direct term-based vs. indirect entity-based representations [Balog & Neumayer 2012] - Hierarchical case is difficult... D. Vallet and H. Zaragoza. Inferring the most important types of a query: a semantic approach. SIGIR'08. K. Balog and R. Neumayer. Hierarchical target type identification for entity-oriented queries. CIKM'12. U. Sawant and S. Chakrabarti. Learning joint query interpretation and response ranking. WWW'13.
  • 33. Expanding target types - Pseudo relevance feedback - Based on hierarchical structure - Using lexical similarity of type labels
  • 35. Scenario - Entity descriptions are not readily available - Entity occurrences are annotated
  • 36. The basic idea Use documents to get from queries to entities e xxxx x xxx xx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxxx xx x xxx xx xxxx xx xxx xx x xxxxx xxx xx x xxxx x xxx xx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxxx xx x xxx xx xxxx xx xxx xx x xxxxx xxx xx x q xxxx x xxx xx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxxx xx x xxx xx xxxx xx xxx xx x xxxxx xxx xx x xxxx x xxx xx xxxx x xxx xx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxxx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxx xx x xxx xx xxxx xx xxx xx x xxxxx xxx Query-document association the document’s relevance Document-entity association how well the document characterises the entity
  • 37. Two principal approaches - Profile-based methods - Create a textual profile for entities, then rank them (by adapting document retrieval techniques) - Document-based methods - Indirect representation based on mentions identified in documents - First ranking documents (or snippets) and then aggregating evidence for associated entities
  • 38. Profile-based methods q xxxx x xxx xx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxxx xx x xxx xx xxxx xx xxx xx x xxxxx xxx xx x xxxx x xxx xx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxxx xx x xxx xx xxxx xx xxx xx x xxxxx xxx xx x xxxx x xxx xx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxxx xx x xxx xx xxxx xx xxx xx x xxxxx xxx xx x xxxx x xxx xx xxxx x xxx xx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxxx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxx xx x xxx xx xxxx xx xxx xx x xxxxx xxx xxxx x xxx xx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxxx xx x xxx xx xxxx xx xxx xx x xxxxx xxx xx x xxxx x xxx xx e xxxx x xxx xx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxxx xx x xxx xx xxxx xx xxx xx x xxxxx xxx xx x xxxx x xxx xx xxxx x xxx xx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxxx xx x xxx xx xxxx xx xxx xx x xxxxx xxx xx x xxxx x xxx xx e e
  • 39. Document-based methods q xxxx x xxx xx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxxx xx x xxx xx xxxx xx xxx xx x xxxxx xxx xx x xxxx x xxx xx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxxx xx x xxx xx xxxx xx xxx xx x xxxxx xxx xx x xxxx x xxx xx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxxx xx x xxx xx xxxx xx xxx xx x xxxxx xxx xx x xxxx x xxx xx xxxx x xxx xx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxxx xxxxxx xx x xxx xx x xxxx xx xxx x xxxxx xx x xxx xx xxxx xx xxx xx x xxxxx xxx X e X X e e
  • 40. Many possibilities in terms of modeling - Generative probabilistic models - Discriminative probabilistic models - Voting models - Graph-based models
  • 41. Generative probabilistic models - Candidate generation models (P(e|q)) - Two-stage language model - Topic generation models (P(q|e)) - Candidate model, a.k.a. Model 1 - Document model, a.k.a. Model 2 - Proximity-based variations - Both families of models can be derived from the Probability Ranking Principle [Fang & Zhai 2007] H. Fang and C. Zhai. Probabilistic models for expert finding. ECIR'07.
  • 42. Candidate models (“Model 1”) [Balog et al. 2006] P(q|✓e) = Y t2q P(t|✓e)n(t,q) Smoothing With collection-wide background model (1 )P(t|e) + P(t) X d P(t|d, e)P(d|e) K. Balog, L. Azzopardi, and M. de Rijke. Formal Models for Expert Finding in Enterprise Corpora. SIGIR'06. Document-entity association Term-candidate co-occurrence In a particular document. In the simplest case:P(t|d)
  • 43. Document models (“Model 2”) [Balog et al. 2006] P(q|e) = X d P(q|d, e)P(d|e) Document-entity association Document relevance How well document d supports the claim that e is relevant to q Y t2q P(t|d, e)n(t,q) Simplifying assumption (t and e are conditionally independent given d) P(t|✓d) K. Balog, L. Azzopardi, and M. de Rijke. Formal Models for Expert Finding in Enterprise Corpora. SIGIR'06.
  • 44. Document-entity associations - Boolean (or set-based) approach - Weighted by the confidence in entity linking - Consider other entities mentioned in the document
  • 45. Proximity-based variations - So far, conditional independence assumption between candidates and terms when computing the probability P(t|d,e) - Relationship between terms and entities that in the same document is ignored - Entity is equally strongly associated with everything discussed in that document - Let’s capture the dependence between entities and terms - Use their distance in the document
  • 46. Using proximity kernels [Petkova & Croft 2007] D. Petkova and W.B. Croft. Proximity-based document representation for named entity retrieval. CIKM'07. P(t|d, e) = 1 Z NX i=1 d(i, t)k(t, e) Indicator function 1 if the term at position i is t, 0 otherwise Normalising contant Proximity-based kernel - constant function - triangle kernel - Gaussian kernel - step function
  • 47. Figure taken from D. Petkova and W.B. Croft. Proximity-based document representation for named entity retrieval. CIKM'07.
  • 48. Many possibilities in terms of modeling - Generative probabilistic models - Discriminative probabilistic models - Voting models - Graph-based models
  • 49. Discriminative models - Vs. generative models: - Fewer assumptions (e.g., term independence) - “Let the data speak” - Sufficient amounts of training data required - Incorporating more document features, multiple signals for document-entity associations - Estimating P(r=1|e,q) directly (instead of P(e,q|r=1)) - Optimisation can get trapped in a local optimum
  • 50. Arithmetic Mean Discriminative (AMD) model [Yang et al. 2010] Y. Fang, L. Si, and A. P. Mathur. Discriminative models of integrating document evidence and document-candidate associations for expert search. SIGIR'10. P✓(r = 1|e, q) = X d P(r1 = 1|q, d)P(r2 = 1|e, d)P(d) Document prior Query-document relevance Document-entity relevance logistic function over a linear combination of features ⇣ Ng X j=1 jgj(e, dt) ⌘⇣ Nf X i=1 ↵ifi(q, dt) ⌘ standard logistic function weight parameters (learned) features
  • 51. Learning to rank - Pointwise - AMD, GMD [Yang et al. 2010] - Multilayer perceptrons, logistic regression [Sorg & Cimiano 2011] - Additive Groves [Moreira et al. 2011] - Pairwise - Ranking SVM [Yang et al. 2009] - RankBoost, RankNet [Moreira et al. 2011] - Listwise - AdaRank, Coordinate Ascent [Moreira et al. 2011] P. Sorg and P. Cimiano. Finding the right expert: Discriminative models for expert retrieval. KDIR’11. C. Moreira, P. Calado, and B. Martins. Learning to rank for expert search in digital libraries of academic publications. PAI'11. Z. Yang, J. Tang, B. Wang, J. Guo, J. Li, and S. Chen. Expert2bole: From expert finding to bole search. KDD'09.
  • 52. Voting models [Macdonald & Ounis 2006] - Inspired by techniques from data fusion - Combining evidence from different sources - Documents ranked w.r.t. the query are seen as “votes” for the entity C. Macdonald and I. Ounis. Voting for candidates: Adapting data fusion techniques for an expert search task. CIKM'06.
  • 53. Voting models Many different variants, including... - Votes - Number of documents mentioning the entity - Reciprocal Rank - Sum of inverse ranks of documents - CombSUM - Sum of scores of documents Score(e, q) = |{M(e) R(q)}| X {M(e)R(q)} s(d, q) Score(e, q) = X {M(e)R(q)} 1 rank(d, q) Score(e, q) = |M(e) R(q)|
  • 54. Graph-based models [Serdyukov et al. 2008] - One particular way of constructing graphs - Vertices are documents and entities - Only document-entity edges - Search can be approached as a random walk on this graph - Pick a random document or entity - Follow links to entities or other documents - Repeat it a number of times P. Serdyukov, H. Rode, and D. Hiemstra. Modeling multi-step relevance prop- agation for expert finding. CIKM'08.
  • 55. Infinite random walk model [Serdyukov et al. 2008] P. Serdyukov, H. Rode, and D. Hiemstra. Modeling multi-step relevance propagation for expert finding. CIKM'08. Pi(d) = PJ (d) + (1 ) X e!d P(d|e)Pi 1(e), Pi(e) = X d!e P(e|d)Pi 1(d), PJ (d) = P(d|q), ee e d d e d d
  • 56. Further reading K. Balog, Y. Fang, M. de Rijke, P. Serdyukov, and L. Si. Expertise Retrieval. FnTIR'12.
  • 58. Test collections Campaign Task Collection Entity repr. #Topics TREC Enterprise (2005-08) Expert finding Enterprise intranets (W3C, CSIRO) Indirect 99 (W3C) 127 (CSIRO) TREC Entity (2009-11) Rel. entity finding Web crawl (ClueWeb09) Indirect 120TREC Entity (2009-11) List completion Web crawl (ClueWeb09) Indirect 70 INEX Entity Ranking (2007-09) Entity search Wikipedia Direct 55 INEX Entity Ranking (2007-09) List completion Wikipedia Direct 55 SemSearch Chall. (2010-11) Entity search Semantic Web crawl (BTC2009) Direct 142SemSearch Chall. (2010-11) List search Semantic Web crawl (BTC2009) Direct 50 INEX Linked Data (2012-13) Ad-hoc search Wikipedia + RDF (Wikipedia-LOD) Direct 100 (’12) 144 (’13)
  • 59. Test collections (2) - Entity search as Question Answering - TREC QA track - QALD-2 challenge - INEX-LD Jeopardy task
  • 60. Entity search in DBpedia [Balog & Neumayer 2013] - Synthesising queries and relevance assessments from previous eval. campaigns - From short keyword queries to natural language questions - 485 queries in total - Results are mapped to DBpedia K. Balog and R. Neumayer. A test collection for entity search in DBpedia. SIGIR’13
  • 61. Open challenges - Combining text and structure - Knowledge bases and unstructured Web documents - Query understanding and modeling - See [Sawant & Chakrabarti 2013] at the main conference - Result presentation - How to interact with entities U. Sawant and S. Chakrabarti. Learning joint query interpretation and response ranking. WWW'13.
  • 62. Resources - Complete tutorial material http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/ - Referred papers http://www.mendeley.com/groups/3339761/entity-linking-and-retrieval- tutorial-at-www-2013-and-sigir-2013/papers/