http://www.lattice.cnrs.fr | Demonstrations at NAACL HLT 2015, Conference of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies, Denver, Colorado (US), May 31-June 5
Expression extractions should be improved and implemented on open source software. The careful use of natural language processing
algorithms could provide better filtering metrics and support in expression merging.
The manual filtering is crucial because it allows entities to be reduced to a set size appropriate for analysis, but also recovering
important entities that could have been excluded by the automatic filtering.
Expressed in [1] by social scientists from médialab (Paris Institute of Political Studies, SciencesPo)
LATTICE Lab
CNRS – Ecole Normale Supérieure
U Paris 3 Sorbonne Nouvelle
ELCO3: Entity Linking with Corpus Coherence Combining Open Source Annotators
Pablo Ruiz, Thierry Poibeau and Frédérique Mélanie
pablo.ruiz.fabo@ens.fr
The Problem: our users’ needs in Entity Linking (EL)
o Target users: social science researchers
o Performance of EL systems varies widely depending on corpus characteristics and types of entities required
o Difficult for users to choose the optimal EL system for their corpora
o Our target users wish to filter EL results, making informed choices about entities to keep and discard
Our Approach
o Public open source tools
o Combine outputs of several tools to get complementary results
o Provide metrics for users to evaluate the quality of an annotation
o Simultaneous access to metrics and text to validate annotations
o Besides manual selection, automatic selection is also possible via weighted voting of annotations
Demo features
Traffic-light matrix format
o Annotation confidence scores provided by EL services
o Measures of coherence between an entity and the most representative entities in the corpus
› Wikipedia link-based measure: relatedness between two entities as a function of the Wikipedia pages that link to both and the pages that link to only one of them
Milne-Witten [3] coherence between entities e1 and e2 (as in Hoffart et al. [4])
› Other possible measures
• Distance between entities’ categories in a Wikipedia category graph
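The link-based coherence measure named above can be sketched as follows. This is a minimal illustration of the Milne-Witten relatedness formula in the form commonly given for Hoffart et al. [4], not the demo's own code. The function name, the representation of inlinks as sets of page ids, and the `total_pages` parameter are assumptions made for the example.

```python
import math


def milne_witten_coherence(inlinks_e1, inlinks_e2, total_pages):
    """Link-based relatedness between two entities, clipped to [0, 1].

    inlinks_e1, inlinks_e2: ids of Wikipedia pages linking to each entity
    (hypothetical input format). total_pages: number of pages in Wikipedia.
    """
    a, b = set(inlinks_e1), set(inlinks_e2)
    shared = a & b
    if not shared:
        # No page links to both entities: treat them as unrelated.
        return 0.0
    larger = max(len(a), len(b))
    smaller = min(len(a), len(b))
    denominator = math.log(total_pages) - math.log(smaller)
    if denominator <= 0:
        # Degenerate input (inlink set as large as the whole collection).
        return 0.0
    distance = (math.log(larger) - math.log(len(shared))) / denominator
    return max(0.0, 1.0 - distance)


if __name__ == "__main__":
    # Toy inlink sets; real values would come from a Wikipedia link dump.
    print(milne_witten_coherence({1, 2, 3, 4}, {3, 4}, total_pages=100))
```

Two entities with identical inlink sets score 1.0 and entities with no shared inlinks score 0.0. A corpus-level coherence score could then be obtained by averaging this value over a set of representative corpus entities, although the exact aggregation used in the demo is not stated here.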
Corpus: subset of PoliInformatics [2], about the 2008 US financial crisis
(1) Query via Search Text displays:
• Document Panel: Documents matching the query
• Entity Panel: Entities extracted from the documents matching the query and displayed on the Document Panel, plus:
(2) Confidence Scores for each annotator, normalized to a 0-1 range (T=Tagme, S=Spotlight, W=Wikipedia Miner).
(3) Coherence score between the entity and a representative subset of the corpus entities.
(4) Entities not coherent with the corpus are flagged in red.
(5) Query via Search Entities displays:
• Entity Panel: Entities matching the query.
• Document Panel: Documents containing one of the entities displayed on the Entity Panel.
(6) Refine Search: Entities can be selected with a list of types (like ORG) or selected individually with checkboxes.
(7) The Auto-Selection tab shows the output of an automatic filtering via weighted voting of annotations.
(8) Charts: examples of co-occurrence networks, created offline using workflow information (sentence number, confidence, …)
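Features (2) to (4) can be illustrated with a small sketch: scaling raw annotator confidence scores to a 0-1 range and mapping a score to a traffic-light color. Min-max scaling and the two thresholds are assumptions chosen for the example. The poster does not state how the demo normalizes scores or where it places the color boundaries.

```python
def normalize_scores(scores):
    """Min-max scale a list of raw confidence scores to the 0-1 range."""
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        # All scores equal: scaling is undefined, so return 0.0 for each.
        return [0.0 for _ in scores]
    return [(s - lowest) / (highest - lowest) for s in scores]


def traffic_light(score, low=0.33, high=0.66):
    """Map a 0-1 score to a color; both thresholds are hypothetical."""
    if score < low:
        return "red"
    if score < high:
        return "yellow"
    return "green"


if __name__ == "__main__":
    raw = [2.0, 4.0, 6.0]  # toy raw scores from one annotator
    for value in normalize_scores(raw):
        print(value, traffic_light(value))
```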
[Screenshot of the demo interface: Document Panel and Entity Panel, with callouts (1) to (8) and a 0.0 to 1.0 color scale]
System workflows
o The user always has access to the full results, but the workflow can select a subset of the annotations automatically.
o The workflow combines, via weighted voting, the outputs of: TagMe2, DBpedia Spotlight, Wikipedia Miner, AIDA, Babelfy
o Votes are weighted according to each annotator’s precision on two reference corpora (IITB and AIDA/CONLL B), depending on whether the user requires annotations for common-noun entity mentions or not.
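The voting step can be sketched as below. This is a simplified illustration of weighted voting, not the exact scheme described in [5]: each annotator's vote for an entity on a given mention counts with that annotator's weight, and the entity with the highest total is kept. The function name, the tuple layout, the `min_score` cutoff and the weight values are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(annotations, weights, min_score=0.0):
    """Pick one entity per mention span by summing annotator weights.

    annotations: iterable of (annotator, span, entity) tuples, where span
    is a (start, end) offset pair. weights: annotator name -> weight, for
    instance the annotator's precision on a reference corpus.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for annotator, span, entity in annotations:
        totals[span][entity] += weights.get(annotator, 0.0)

    selected = {}
    for span, candidates in totals.items():
        entity, score = max(candidates.items(), key=lambda item: item[1])
        if score >= min_score:
            selected[span] = (entity, score)
    return selected


if __name__ == "__main__":
    # Made-up weights; real ones would be measured on IITB or AIDA/CONLL B.
    weights = {"tagme": 0.6, "spotlight": 0.5, "wikiminer": 0.4}
    annotations = [
        ("tagme", (0, 6), "Lehman_Brothers"),
        ("spotlight", (0, 6), "Lehman_Brothers"),
        ("wikiminer", (0, 6), "Lehman_(surname)"),
    ]
    print(weighted_vote(annotations, weights))
```

Switching between weights measured on the two reference corpora would then amount to passing a different `weights` dictionary.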
[Figure legend: on demo / not shown on demo]
Evaluation
o Automatic EL system combination improved results over each
individual system’s results ([5], our *SEM poster).
o Assessed with strong annotation match and entity match [6] on
four different corpora: AIDA/CONLL B, IITB, MSNBC, AQUAINT.
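The difference between the two measures can be shown with a short sketch. It assumes that strong annotation match requires the same mention offsets and the same entity, while entity match compares only the set of entities found per document. The tuple layout and function names are made up for the example, and the benchmarking framework in [6] handles further details (such as resolving redirects) that are omitted here.

```python
def precision_recall_f1(system, gold):
    """Precision, recall and F1 between two sets of items."""
    true_positives = len(system & gold)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def strong_annotation_match(system, gold):
    """Annotations count as equal only if offsets and entity both agree."""
    return precision_recall_f1(set(system), set(gold))


def entity_match(system, gold):
    """Compare the entities found in each document, ignoring offsets."""
    system_entities = {(doc, entity) for doc, _, _, entity in system}
    gold_entities = {(doc, entity) for doc, _, _, entity in gold}
    return precision_recall_f1(system_entities, gold_entities)


if __name__ == "__main__":
    # Each annotation is (doc_id, start, end, entity); toy data.
    gold = [("d1", 0, 6, "A"), ("d1", 10, 15, "B")]
    system = [("d1", 0, 6, "A"), ("d1", 11, 15, "B")]
    print(strong_annotation_match(system, gold))
    print(entity_match(system, gold))
```

In the toy data the second system annotation has the right entity but a shifted start offset, so it counts under entity match and is rejected under strong annotation match.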
[1] T. Venturini & D. Guido. 2012. Once upon a text. An ANT [Actor-Network Theory] Tale in Text Analytics. Sociologica, 3:1-17. Il Mulino, Bologna.
[2] N. Smith et al. 2014. Overview of the 2014 NLP Unshared Task in PoliInformatics. In Proc. ACL LACSS Workshop.
[3] D. Milne & I. Witten. 2008. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In Proc. AAAI WS on Wikipedia and AI.
[4] J. Hoffart et al. 2011. Robust disambiguation of named entities in text. In Proc. EMNLP.
[5] P. Ruiz & T. Poibeau. 2015. Combining open source annotators for entity linking through weighted voting. In Proc. *SEM.
[6] M. Cornolti, P. Ferragina & M. Ciaramita. 2013. A framework for benchmarking entity-annotation systems. In Proc. WWW, 249-260.
[Poster panel titles: Metrics to assist in manual filtering; Annotation voting for automatic filtering]
DEMO LINK: http://129.199.228.10/nav/gui/
