Alberto Parravicini
Rhicheek Patra
Davide Bartolini
Marco Santambrogio
2019-05-{17-31}, NGCX
Fast Entity Linking via Graph Embeddings
Alberto Parravicini 23/05/2019
Entity Linking
Entity Linking (EL): connecting words of interest to unique identities (e.g. a Wikipedia page)

Use Cases
EL is a component of applications that require high-level representations of text:
1. Search engines, for semantic search
2. Recommender systems, to retrieve documents similar to each other
3. Chatbots, to understand intents and entities

The EL Pipeline (1/2)
An EL system requires 2 steps:
1. Named Entity Recognition (NER): spot mentions (a.k.a. Named Entities)
● High accuracy in the state of the art [1]
[1] Huang, Zhiheng, Wei Xu, and Kai Yu. "Bidirectional LSTM-CRF models for sequence tagging."

The EL Pipeline (2/2)
An EL system requires 2 steps:
2. Entity Linking: connect mentions to entities

Our contributions
● Novel unsupervised framework for EL
● No dependency on NLP
● First EL algorithm to use graph embeddings
● Accuracy similar to supervised state-of-the-art (SoA) techniques
● Highly scalable, with real-time execution
● < 1 sec to process text with 30+ mentions

Our EL Pipeline
[Diagram: Input text with NER, Output; numbered stages: 1. Graph Building, 2. Embeddings, 3. Candidate Finder, 4. Disambiguation; phase labels: Graph Learning, Execution]

Step 1/4: Graph Creation
We obtain a large graph from DBpedia
● All the information of Wikipedia, stored as triples
● 12M entities, 170M links
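
As a rough illustration of what "stored as triples" means for graph building, the sketch below turns (subject, predicate, object) triples into an adjacency structure. The function name build_graph and the three sample triples are illustrative assumptions, not taken from the talk, and a graph with 12M entities would need a more compact representation than Python dictionaries.

```python
from collections import defaultdict

def build_graph(triples):
    """Build a directed, edge-labelled adjacency map from (subject, predicate, object) triples."""
    adjacency = defaultdict(set)
    for subject, predicate, obj in triples:
        adjacency[subject].add((predicate, obj))
        adjacency[obj]  # touch the key so that sink vertices also appear in the graph
    return adjacency

# Illustrative triples written in a DBpedia-like prefixed style (assumed, not from the slides).
sample_triples = [
    ("dbr:Milan", "dbo:country", "dbr:Italy"),
    ("dbr:Rome", "dbo:country", "dbr:Italy"),
    ("dbr:Italy", "dbo:capital", "dbr:Rome"),
]
graph = build_graph(sample_triples)
```

Keeping the predicate on each edge makes it possible to later separate, for example, redirect edges from ordinary property edges.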

Step 1/4: Graph Creation
From DBpedia, we build two graphs: a Redirects Graph and a Property Graph

Step 2/4: Embeddings Creation (1/2)
● Graph embeddings encode vertices as vectors
● “Similar” vertices have “similar” embeddings
● Idea: entities with the same context should have low embedding distance

Step 2/4: Embeddings Creation (2/2)
● In our work, we use DeepWalk [1]
● Like word2vec [2], it learns embeddings from sequences: random walks (i.e. vertex sequences) play the role of sentences
● Embedding size 170, walk length 8
● DeepWalk uses only the graph topology
● It is a simple baseline: better algorithms could be used, and graph features leveraged
[1] Bryan Perozzi et al. 2014. DeepWalk
[2] Tomas Mikolov et al. 2013. Distributed representations of words and phrases and their compositionality
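
The walk-generation half of DeepWalk can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name random_walks, the walks_per_vertex parameter, and the toy graph are assumptions, and "walk length 8" is interpreted here as at most 8 vertices per walk. The resulting vertex sequences would then be passed to a skip-gram trainer (not shown) to obtain the 170-dimensional embeddings.

```python
import random

def random_walks(adjacency, walks_per_vertex=2, walk_length=8, seed=0):
    """Generate uniform random walks; adjacency maps each vertex to a list of neighbours."""
    rng = random.Random(seed)
    vertices = sorted(adjacency)
    walks = []
    for _ in range(walks_per_vertex):
        rng.shuffle(vertices)  # visit start vertices in a fresh random order on each pass
        for start in vertices:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break  # dead end: keep the shorter walk
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

# Toy graph (assumed for illustration); vertex "d" has no outgoing edges.
toy_graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": []}
walks = random_walks(toy_graph)
```

Because only neighbour lists are consulted, the sketch reflects the point made on the slide: the procedure sees the graph topology and nothing else.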

Step 3/4: Candidates Finder
Idea: for each mention, select a few candidate vertices with index-based string similarity
● Resolve ambiguity by following redirect and disambiguation edges

Step 4/4: Disambiguation (1/3)
● We want to pick the “best” candidate for each mention
● In a “good” solution, candidates are related to each other (e.g. Donald Trump, Hillary Clinton)
● Observation: a good tuple of candidates has embeddings close to each other
● Evaluating all combinations is infeasible
● 10 mentions with 100 candidates each give 100^10 combinations
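
To make the size of that search space concrete, the arithmetic can be worked out as below. The evaluation rate of 10^9 tuples per second is an illustrative assumption, not a figure from the talk.

```latex
\[
100^{10} = (10^{2})^{10} = 10^{20}\ \text{candidate tuples},
\qquad
\frac{10^{20}\ \text{tuples}}{10^{9}\ \text{tuples/s}} = 10^{11}\ \text{s} \approx 3 \times 10^{3}\ \text{years}.
\]
```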

Step 4/4: Disambiguation (2/3)
● We use a heuristic state-space search algorithm to maximize an objective combining:
○ the sum of string similarities
○ the sum of embedding cosine similarities w.r.t. the tuple mean
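
One way to read the two terms of this objective is sketched below. The slide does not say how the terms are weighted, so the plain unweighted sum is an assumption, as are the function names and the representation of a candidate as a (string_similarity, embedding) pair.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def tuple_mean(embeddings):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]

def tuple_score(chosen):
    """Score one candidate per mention; chosen is a list of (string_similarity, embedding)."""
    mean = tuple_mean([embedding for _, embedding in chosen])
    string_term = sum(similarity for similarity, _ in chosen)
    embedding_term = sum(cosine(embedding, mean) for _, embedding in chosen)
    return string_term + embedding_term  # assumption: unweighted sum of the two terms

# Same string similarities, different coherence in embedding space (toy values).
coherent = [(0.9, [1.0, 0.0]), (0.8, [0.9, 0.1])]
scattered = [(0.9, [1.0, 0.0]), (0.8, [0.0, 1.0])]
```

With equal string similarities, the tuple whose embeddings point in similar directions scores higher, which matches the observation that a good tuple of candidates has embeddings close to each other.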

Step 4/4: Disambiguation (3/3)
● Iterative state-space heuristic
● Greedy iterative procedure
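
The slide gives no further detail on the procedure, so the following is only one plausible greedy scheme, written as a coordinate-wise local search: start from the best string match for each mention, then repeatedly swap a single mention's candidate whenever that improves the tuple score. The starting point, the update order, the max_iterations parameter, and the unweighted objective are all assumptions.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def tuple_score(chosen):
    """Assumed objective: string similarities plus cosine similarities to the tuple mean."""
    embeddings = [embedding for _, embedding in chosen]
    mean = [sum(column) / len(embeddings) for column in zip(*embeddings)]
    return sum(s for s, _ in chosen) + sum(cosine(e, mean) for e in embeddings)

def greedy_disambiguate(candidates, max_iterations=10):
    """candidates[i] lists (string_similarity, embedding) pairs for mention i.

    Returns the index of the selected candidate for each mention.
    """
    # Start from the candidate with the highest string similarity for each mention.
    chosen = [max(range(len(options)), key=lambda j: options[j][0]) for options in candidates]

    def score_of(selection):
        return tuple_score([candidates[i][j] for i, j in enumerate(selection)])

    for _ in range(max_iterations):
        changed = False
        for i, options in enumerate(candidates):
            best_j, best_score = chosen[i], score_of(chosen)
            for j in range(len(options)):
                trial_score = score_of(chosen[:i] + [j] + chosen[i + 1:])
                if trial_score > best_score:
                    best_j, best_score = j, trial_score
            if best_j != chosen[i]:
                chosen[i] = best_j
                changed = True
        if not changed:
            break  # early stop: a full pass made no improvement
    return chosen

# Toy input: the first mention's best string match disagrees with the other mentions.
toy_candidates = [
    [(0.90, [1.0, 0.0]), (0.85, [0.0, 1.0])],
    [(0.90, [0.0, 1.0])],
    [(0.90, [0.0, 1.0])],
]
selection = greedy_disambiguate(toy_candidates)
```

Each pass evaluates one candidate swap at a time, so its cost grows with the total number of candidates rather than with the number of candidate tuples.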

Results: accuracy
● We compared against 6 SoA EL algorithms, on 5 datasets
● Our micro-averaged F1 score is comparable with SoA supervised algorithms
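
For readers unfamiliar with the metric, micro-averaging pools the counts over all documents before computing F1, so documents with many mentions weigh more. The sketch below uses the standard definition; the function name, the input layout, and the sample counts are illustrative, and the talk does not specify how true and false positives were counted.

```python
def micro_f1(per_document_counts):
    """Micro-averaged F1 from (true_positives, false_positives, false_negatives) per document."""
    tp = sum(counts[0] for counts in per_document_counts)
    fp = sum(counts[1] for counts in per_document_counts)
    fn = sum(counts[2] for counts in per_document_counts)
    if tp == 0:
        return 0.0
    # Equivalent to the harmonic mean of pooled precision tp/(tp+fp) and recall tp/(tp+fn).
    return 2 * tp / (2 * tp + fp + fn)

# Made-up counts for two documents, purely for illustration.
example_counts = [(8, 2, 2), (1, 1, 3)]
example_f1 = micro_f1(example_counts)
```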

Results: execution time
● Different settings enable real-time EL, with minimal loss in accuracy
● E.g. number of iterations, early stop

Results: execution time of single steps
● Execution time is well balanced between Candidate Finder and Disambiguation
Thank you!
Fast Entity Linking via Graph Embeddings
● Novel unsupervised framework for EL
● First EL algorithm to use graph embeddings
○ Accuracy similar to supervised SoA techniques
● Real-time execution
Alberto Parravicini, alberto.parravicini@polimi.it

Step 2/4: Embeddings
● Turn topology and properties of each vertex into a vector
● “Similar” vertices have “similar” embeddings
[Figure labels: topology, properties, embedding]
Hamilton, William L., Rex Ying, and Jure Leskovec. "Representation learning on graphs: Methods and applications." arXiv preprint arXiv:1709.05584 (2017).

Step 1/4: Graph Creation
… and join them together

Step 3/4: Candidates Finder
Idea: for each mention, select a small number of candidate vertices with string similarity
● We use a simple index-based string search
● Fuzzy matching with 2-grams and 3-grams
● This provides a simple baseline (60-70% accuracy)
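
A minimal sketch of index-based fuzzy matching with 2-grams and 3-grams is shown below. It is not the authors' implementation: the inverted index, the Jaccard score, the function names, and the three sample labels are assumptions chosen to illustrate the idea.

```python
from collections import defaultdict

def ngrams(text, sizes=(2, 3)):
    """Set of character n-grams of the lower-cased text, for each size in sizes."""
    text = text.lower()
    return {text[i:i + n] for n in sizes for i in range(len(text) - n + 1)}

def build_index(labels):
    """Inverted index from each n-gram to the vertex labels that contain it."""
    index = defaultdict(set)
    for label in labels:
        for gram in ngrams(label):
            index[gram].add(label)
    return index

def find_candidates(mention, index, top_k=3):
    """Return up to top_k (label, Jaccard similarity) pairs, best match first."""
    mention_grams = ngrams(mention)
    shared = defaultdict(int)
    for gram in mention_grams:
        for label in index.get(gram, ()):
            shared[label] += 1  # only labels sharing at least one n-gram are ever scored
    scored = []
    for label, overlap in shared.items():
        union = len(mention_grams) + len(ngrams(label)) - overlap
        scored.append((label, overlap / union))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]

# Sample vertex labels and a misspelled mention (assumed for illustration).
label_index = build_index(["Donald Trump", "Donald Duck", "Hillary Clinton"])
matches = find_candidates("Donld Trump", label_index, top_k=2)
```

The index lookup is what keeps the search cheap: a mention is compared only against labels that share at least one n-gram with it, never against every vertex in the graph.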
