Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.

Fast and Accurate Entity Linking via Graph Embedding

Entity Linking, the task of mapping ambiguous Named Entities to unique identifiers in a knowledge base, is a cornerstone of multiple Information Retrieval and Text Analysis pipelines. So far, no single entity linking algorithm has been able to offer the accuracy and scalability required to deal with the ever-increasing amount of data in the web and become a de-facto standard. In this work, we propose a framework for entity linking that
leverages graph embeddings to perform collective disambiguation. This framework support pluggable algorithms for generating embeddings and ranking the candidates. Moreover, we use our framework to evaluate a reference pipeline that uses DBpedia as knowledge base and leverages specific algorithms for fast candidate search and high-performance state-space search optimization. Compared to existing solutions, our pipeline offers state-of-the-art accuracy on a variety of datasets without any supervised training and provides real-time execution even when processing documents with dozens of Named Entities. Additionally, the flexibility of our framework allows adapting adapt to a multitude of scenarios by balancing accuracy and execution time.

  • Identifiez-vous pour voir les commentaires

  • Soyez le premier à aimer ceci

Fast and Accurate Entity Linking via Graph Embedding

  1. 1. Alberto Parravicini Rhicheek Patra Davide Bartolini Marco Santambrogio 2019-05-{17-31}, NGCX Fast Entity Linking via Graph Embeddings
  2. 2. Alberto Parravicini 23/05/2019 Entity Linking (EL): connecting words of interest to unique identities (e.g. Wikipedia Page) Entity Linking 2
  3. 3. Alberto Parravicini 23/05/2019 Component of applications that require high-level representations of text: 1. Search Engines, for semantic search 2. Recommender Systems, to retrieve documents similar to each other 3. Chat bots, to understand intents and entities Use Cases 3
  4. 4. Alberto Parravicini 23/05/2019 An EL system requires 2 steps: 1. Named Entity Recognition (NER): spot mentions (a.k.a. Named Entities) ● High-accuracy in the state-of-the-art[1] The EL Pipeline (1/2) [1] Huang, Zhiheng, Wei Xu, and Kai Yu. "Bidirectional LSTM-CRF models for sequence tagging." 4
  5. 5. Alberto Parravicini 23/05/2019 An EL system requires 2 steps: 2. Entity Linking: connect mentions to entities The EL Pipeline (2/2) 5
  6. 6. Alberto Parravicini 23/05/2019 An EL system requires 2 steps: 2. Entity Linking: connect mentions to entities The EL Pipeline (2/2) 6
  7. 7. Alberto Parravicini 23/05/2019 An EL system requires 2 steps: 2. Entity Linking: connect mentions to entities The EL Pipeline (2/2) 7
  8. 8. Alberto Parravicini 23/05/2019 ● Novel unsupervised framework for EL ● No dependency on NLP ● First EL algorithm to use graph embeddings ● Accuracy similar to supervised SoA techniques ● Highly scalable and real-time execution time ● < 1 sec to process text with 30+ mentions Our contributions 8
  9. 9. Alberto Parravicini 23/05/2019 Our EL Pipeline Input text with NER Graph Building Embeddings Candidate Finder Disambiguation Graph Learning Execution Output 1 2 3 4 9
  10. 10. Alberto Parravicini 23/05/2019 We obtain a large graph from DBpedia ● All the information of Wikipedia, stored as triples ● 12M entities, 170M links Step 1/4 Graph Creation 10
  11. 11. Alberto Parravicini 23/05/2019 From DBPedia, we build two graphs Redirects GraphProperty Graph Step 1/4 Graph Creation 11
  12. 12. Alberto Parravicini 23/05/2019 ● Graph embeddings encode vertices as vectors ● “Similar” vertices have “similar” embeddings ● Idea: entities with the same context should have low embedding distance Embeddings Creation (1/2) Step 2/4 12
  13. 13. Alberto Parravicini 23/05/2019 ● In our work, we use DeepWalk[1] ● Like word2vec[2] , it leverages random walks (i.e. vertex sequences) to create embeddings ● Embedding size 170, walk length 8 ● DeepWalk uses only the graph topology ● Simple baseline, we can use better algorithms and leverage graph features Embeddings Creation (2/2) [1] Bryan Perozzi et al.2014. Deepwalk [2] Tomas Mikolov et al. 2013. Distributed representations of words and phrases and their compositionality Step 2/4 13
  14. 14. Alberto Parravicini 23/05/2019 Candidates Finder Idea: for each mention, select a few candidate vertices with index- based string similarity ● Solve ambiguity following redirect and disambiguation edges Step 3/4 14
  15. 15. Alberto Parravicini 23/05/2019 Candidates Finder Step 3/4 Idea: for each mention, select a few candidate vertices with index- based string similarity ● Solve ambiguity following redirect and disambiguation edges 15
  16. 16. Alberto Parravicini 23/05/2019 Disambiguation (1/3) ●We want to pick the “best” candidate for each mention ● In a “good” solution, candidates are related to each other (e.g. Donald Trump, Hillary Clinton) ●Observation: a good tuple of candidates has embeddings close to each other ●Evaluating all combinations is infeasible ● 10 mentions with 100 candidates 10010 Step 4/4 16
  17. 17. Alberto Parravicini 23/05/2019 17 ●We use an heuristic state-space search algorithm to maximize: Sum of string similarities Step 4/4 Sum of embedding cosine similarities w.r.t. tuple mean Disambiguation (2/3)
  18. 18. Alberto Parravicini 23/05/2019 ● Iterative state-space heuristic Greedy iterative procedure Step 4/4 18 Disambiguation (3/3)
  19. 19. Alberto Parravicini 23/05/2019 ● We compared against 6 SoA EL algorithms, on 5 datasets ● Our Micro-averaged F1 score is comparable with SoA supervised algorithms Results: accuracy 19
  20. 20. Alberto Parravicini 23/05/2019 ● Different settings enable real-time EL, with minimal loss in accuracy ● E.g. number of iterations, early stop Results: exec. time 20
  21. 21. Alberto Parravicini 23/05/2019 ● Execution time is well divided between Candidate Finder and Disambiguation Results: exec. time of single steps 21
  22. 22. Thank you! Fast Entity Linking via Graph Embeddings ● Novel unsupervised framework for EL ● First EL algorithm to use graph embeddings ○ Accuracy similar to supervised SoA techniques ● Real-time execution time Alberto Parravicini, alberto.parravicini@polimi.it Rhicheek Patra Davide Bartolini Marco Santambrogio 2019-05-{17-31}, NGCX
  23. 23. Alberto Parravicini 23/05/2019 ●Turn topology and properties of each vertex into a vector ●“Similar” vertices have “similar” embeddings Embeddings Step 2/4 embedding properties topology Hamilton, William L., Rex Ying, and Jure Leskovec. "Representation learning on graphs: Methods and applications." arXiv preprint arXiv:1709.05584 (2017).
  24. 24. Alberto Parravicini 23/05/2019 … and join them together Step 1/4 Graph Creation
  25. 25. Alberto Parravicini 23/05/2019 Candidates Finder Idea: for each mention, select a small number of candidate vertices with string similarity ● We use a simple index-based string search ● Fuzzy matching with 2-grams and 3-grams ● This provide a simple baseline (60-70% accuracy) Step 3/4

×