The document summarizes previous work on entity linking and knowledge base population (KBP). It covers the two TAC-KBP tasks: entity linking, which grounds entity mentions in documents to entries in a knowledge base, and slot filling, which learns attributes about target entities. It reports results from the TAC-KBP 2010 evaluation, breaking entity linking accuracy down by entity type and genre: GPE entities proved particularly difficult, and name-similarity features and the handling of NIL queries had a clear impact on performance.
2. Overview of previous work
TAC-KBP 2010 - Combining Similarities and Regression Classifiers for Entity Linking
1. Task definition: KBP and EL
2. System description
3. Results
4. Conclusions
10. Knowledge acquisition
List candidates for the Greek elections in June.
What party does Tsipras represent?
How old is he?
What does Syriza mean?
How old is Samaras?
13. TAC-KBP 2010 - Combining Similarities and Regression Classifiers for Entity Linking
Knowledge Base Population
César de Pablo, Juan Perea, Paloma Martínez
15. Knowledge Base Population
Knowledge Base (KB), built from a Wikipedia dump (2008):
● title, name, type, id
● wiki text
● several facts as [name, value] pairs
Source document collection:
● 1.3 million English newswire documents, published between 1994 and 2008
● 488,240 web pages
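To make the KB structure concrete, here is a minimal sketch of one entry as a Python dataclass. The fields follow the slide above; the exact names and types are illustrative assumptions, not the official TAC-KBP schema.

from dataclasses import dataclass, field

@dataclass
class KBEntry:
    """One KB entry derived from a Wikipedia article (illustrative layout)."""
    entity_id: str                              # e.g. "E0700143"
    title: str                                  # Wikipedia article title
    name: str                                   # canonical entity name
    entity_type: str                            # PER, ORG, GPE, ...
    wiki_text: str                              # plain text of the article
    facts: list = field(default_factory=list)   # (name, value) pairs

entry = KBEntry(
    entity_id="E0700143",
    title="Reserve Bank of India",
    name="Reserve Bank of India",
    entity_type="ORG",
    wiki_text="The Reserve Bank of India is the central bank of India...",
    facts=[("headquarters", "Mumbai"), ("founded", "1935")],
)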
17. IE = KBP? QA = KBP?
Like classic IE / QA:
● Accurate extraction of facts – not annotation
● Learn facts from corpus – repetition is not important but helps confidence
● Asserting wrong information is bad
● Scalability
● Provenance
What KBP adds:
● Slots are fixed but targets change
● Leverage knowledge from the KB
● Global resolution – ground information to the KB
● Avoid contradiction
● Detect novel info
19. Tasks at TAC-KBP
● Task 1: Slot Filling – learning attributes about target entities
● Task 2: Entity Linking – grounding entity mentions in documents to KB entries
The rest of this overview focuses on Task 2, Entity Linking.
22. Entity Linking: Example
For a name string and a document, determine which entity in a KB, if any, is being referred to by the name string.

<query id="EL006455">
  <name>Reserve Bank</name>
  <docid>eng-NG-31-100316-11150589</docid>
  <entity>E0700143</entity>
</query>
<query id="EL06472">
  <name>Reserve Bank</name>
  <docid>eng-NG-31-142262-10040510</docid>
  <entity>E0421510</entity>
</query>

The same name string resolves to different entries depending on the document: E0700143 is the Reserve Bank of India, E0421510 is the Reserve Bank of Australia. A query whose mention has no matching KB entry is answered NIL.
24. Entity Linking: Challenges
Focus on confusable entities:
● Ambiguous names: Reserve Bank, Alan Jackson, Fonda
● Multiple name variants: Saddam Hussain, Saddam Hussein
● Acronym expansion: CDC, AZ
● Variety of cases: Centre for Disease Control, European Centre for Disease Control, AZ, Arizona, Astra Zeneca
● Pilot task – entity linking without text support
● Identify missing entities – then cluster (2011)
29. Entity Linking: Evaluation
Name mention – document pairs
● Accuracy (micro) = num correct / num queries
● Accuracy (macro) = grouped by entities (2009)

queries  NIL   set         genre       % NIL
3904     2229  eval 2009   news        0.571
1500     426   train 2010  web         0.284
2250     1230  eval 2010   news + web  0.547
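The two accuracy variants are easy to state in code. A minimal sketch, assuming gold and predicted answers are dicts from query id to KB entity id or "NIL" (the query ids below are illustrative):

from collections import defaultdict

def micro_accuracy(gold, predicted):
    """num correct / num queries: every query counts equally."""
    return sum(predicted.get(q) == e for q, e in gold.items()) / len(gold)

def macro_accuracy(gold, predicted):
    """Average of per-entity accuracies: frequent entities don't dominate."""
    by_entity = defaultdict(list)
    for q, e in gold.items():
        by_entity[e].append(predicted.get(q) == e)
    return sum(sum(v) / len(v) for v in by_entity.values()) / len(by_entity)

gold = {"EL006455": "E0700143", "EL06472": "E0421510", "EL09999": "NIL"}
predicted = {"EL006455": "E0700143", "EL06472": "NIL", "EL09999": "NIL"}
print(micro_accuracy(gold, predicted))  # 2/3, two of three queries correct
print(macro_accuracy(gold, predicted))  # (1 + 0 + 1) / 3, averaged per entity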
30. uc3m EL system
● Supervised architecture
● Use similarities between objects – EL queries and KB entries, or parts of them – to avoid a wide feature vector
● Three stages:
1) Candidate Entity Retrieval
2) Candidate Filtering
3) Validation (NIL classification)
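A minimal end-to-end sketch of the three stages, with toy stand-ins for each component (the real system used Lucene retrieval, model trees, and logistic regression; the scoring functions and NIL threshold below are assumptions for illustration):

from difflib import SequenceMatcher

def retrieve_candidates(name, kb):
    """1) Candidate retrieval: toy stand-in for the Lucene index lookups,
    keeping KB entries that share at least one token with the query name."""
    tokens = set(name.lower().split())
    return [e for e in kb if tokens & set(e["name"].lower().split())]

def match_score(name, document, entry):
    """2) Candidate filtering: toy stand-in for the regression model,
    combining a name similarity and a document/wiki-text similarity."""
    name_sim = SequenceMatcher(None, name.lower(), entry["name"].lower()).ratio()
    doc = set(document.lower().split())
    wiki = set(entry["wiki_text"].lower().split())
    context_sim = len(doc & wiki) / max(len(doc | wiki), 1)
    return name_sim + context_sim

def link_entity(name, document, kb, nil_threshold=0.8):
    candidates = retrieve_candidates(name, kb)
    if not candidates:
        return "NIL"
    best = max(candidates, key=lambda e: match_score(name, document, e))
    # 3) Validation: answer NIL when even the best candidate scores low
    return best["id"] if match_score(name, document, best) >= nil_threshold else "NIL"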
32. 1) Candidate Retrieval
● Each KB article is indexed with Lucene, using several indexes and fields:
  ● ALIAS – names plus aliases extracted from wiki slots: alias, abbreviation, website, etc.
  ● NER – named entities extracted from text: <id, ne, text>
  ● KB – entity slots: <id, [(slot_name, slot_value)]>
  ● WIKIPEDIA – anchorList, category, redirect, outlinks, inlinks
● Each EL query is transformed into several Lucene queries; the result is a [KB name, score] list
33. 1) Candidate Retrieval
● EL query: [Michael Jordan, eng-NG-31-100316-11150589]
● Lucene queries:
  ● name=Michael AND name=Jordan
  ● alias=Michael AND alias=Jordan
  ● abbr=Michael AND abbr=Jordan
● For each query, a ranked candidate list:
  ● [EL0989789, Michael Jordan, 25.00]
  ● [EL6565356, Michael B. Jordan, 25.00]
  ● [EL6565356, Michael I. Jordan, 25.00]
  ● [EL6565356, Michael-Hakim Jordan, 25.00]
  ● [EL6565356, Jordan, 20.00]
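The fan-out from one EL query to several fielded queries is mechanical. A small sketch that reproduces the queries above (the field list comes from the slides; the helper name and the Lucene-style query-string syntax are assumptions):

def expand_el_query(name, fields=("name", "alias", "abbr")):
    """Turn one EL query into one AND query per indexed field."""
    tokens = name.split()
    return [" AND ".join(f"{field}={tok}" for tok in tokens) for field in fields]

print(expand_el_query("Michael Jordan"))
# ['name=Michael AND name=Jordan',
#  'alias=Michael AND alias=Jordan',
#  'abbr=Michael AND abbr=Jordan']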
34. 2) Candidate Filtering
● Classification problem: decide whether (EL query + document text, KB name + wiki text) is a good match
● In practice, rank candidates by prediction confidence
● Use similarity scores as features – normalized and unnormalized
● Use a cost-sensitive classifier
● Best results: model trees with linear-regression leaves
35. Features
● Index-based scores: sim(EL query, KB entry), taken directly from the initial retrieval
● Context-similarity scores: sim(document, wiki text) or sim(document, slots)
● Name-similarity scores: sim(query name, KB entry name) – more expensive: equal, QcontainsE, EcontainsQ, Jaro, Jaro-Winkler, SLIM (based on SecondString)
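A minimal sketch of the name-similarity feature group. The boolean features follow the slide; the "ratio" value is a difflib stand-in for the string-distance scores (Jaro, Jaro-Winkler, SLIM), which the system computed with the SecondString package:

from difflib import SequenceMatcher

def name_features(query_name, entry_name):
    """Name-similarity features between a query string and a KB entry name."""
    q, e = query_name.lower(), entry_name.lower()
    return {
        "equal": float(q == e),
        "QcontainsE": float(e in q),  # query string contains the entry name
        "EcontainsQ": float(q in e),  # entry name contains the query string
        # stand-in for Jaro / Jaro-Winkler / SLIM
        "ratio": SequenceMatcher(None, q, e).ratio(),
    }

print(name_features("Reserve Bank", "Reserve Bank of India"))
# EcontainsQ fires and ratio is high (about 0.73), even though equal is 0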
36. 3) Validation
● Classification: is the selected candidate good enough, or is the answer NIL?
● Positive examples – queries with a correct candidate
● Negative examples – top-ranked entities for queries that have no link in the KB
● Balanced dataset
● Best classifier: Logistic Regression
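A sketch of this validation step with scikit-learn's LogisticRegression (the slide names the classifier; the three-feature layout and the toy training data are assumptions for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per (query, top-ranked candidate); columns could be e.g.
# retrieval score, context similarity, name similarity (toy numbers).
X_train = np.array([
    [0.9, 0.8, 1.0],  # strong match -> keep the link
    [0.8, 0.7, 0.9],
    [0.3, 0.1, 0.4],  # weak match  -> NIL
    [0.2, 0.2, 0.3],
])
y_train = np.array([1, 1, 0, 0])  # 1 = correct link, 0 = NIL

validator = LogisticRegression().fit(X_train, y_train)

def validate(features, threshold=0.5):
    """Keep the top candidate only if the link probability clears the threshold."""
    p_link = validator.predict_proba(np.array([features]))[0, 1]
    return "LINK" if p_link >= threshold else "NIL"

print(validate([0.85, 0.75, 0.95]))  # likely LINK
print(validate([0.10, 0.15, 0.20]))  # likely NIL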
37. EL results – main

Accuracy by entity type and genre:

N     type   news  web   news+web  Highest  Median
750   ORG    0.69  0.67  0.67      0.85     0.68
749   GPE    0.52  0.53  0.51      0.80     0.60
751   PER    0.76  0.82  0.85      0.96     0.85
2250  ALL    0.67  0.65  0.68      0.87     0.69

Accuracy by NIL status:

N     type   news  web   news+web
1020  noNIL  0.51  0.59  0.49
1230  NIL    0.81  0.70  0.82

● Influence of domain?
● GPE entities are particularly difficult
42. EL results – pilot w/o text

N     type   news (main)  news  +n-sim NIL  +n-sim all
2250  ALL    0.67         0.58  0.66        0.70
1020  noNIL  0.51         0.35  0.40        0.47
1230  NIL    0.81         0.77  0.88        0.88

● Including name similarity scores helped
44. EL systems comparison
● Prior on link probability / popularity (Stanford-UBC 2009, LCC 2010, Microsoft 2011)
● Learning-to-rank algorithms: ListNet (CUNY 2011)
● Expand queries: acronym expansion / coreference (NUS 2011)
● Unsupervised system – entity co-occurrence + PageRank (WebTLab 2010)
● Inductive EL – first cluster, then link (LCC 2011)
● Collective entity linking (Microsoft 2011)
45. Conclusions
● Supervised EL system
● Influence of training size – beware of the training data distribution
● Consider name similarities, even for reranking
● Improve initial candidate retrieval
● Perform collective entity linking
● Efficiency?
46. Related tasks
● Cluster Documents Mentioning Entities
● Entity coreference – document and cross-document
● Add missing links between Wikipedia pages
● Link entities to matching Wikipedia articles