Datasets and GATE Evaluation Framework for 
Benchmarking Wikipedia Based NER Systems 
Milan Dojchinovski (1,2), Tomáš Kliegr (1)
(1) Faculty of Informatics and Statistics, University of Economics, Prague
(2) Faculty of Information Technology, Czech Technical University in Prague
“NLP & DBpedia” ISWC 2013 workshop 
October 22nd, 2013, Sydney, Australia 
Milan Dojchinovski 
milan.dojchinovski@vse.cz - @m1ci - http://dojchinovski.mk 
Except where otherwise noted, the content of this presentation is licensed under 
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported 
Outline 
‣ Introduction 
‣ Prerequisites and challenges 
‣ GATE framework for benchmarking NER 
‣ Conclusion and future directions 
GATE Evaluation Framework for Benchmarking Wikipedia-Based NER Systems 2
What is a Named Entity Recognition task? 
3 
‣ Main sub-tasks
- spotting of entities: tagging a text fragment as an entity
- disambiguation of entities: uniquely identifying each entity with a URI
- classification of entities: assigning a type to an entity
The Charles Bridge is a famous historic bridge that crosses the Vltava river in Prague, 
Czech Republic. Its construction started in 1357 under the auspices of King Charles IV, 
and finished in the beginning of the 15th century. The bridge replaced the old Judith 
Bridge built 1158–1172 that had been badly damaged by a flood in 1342. 
Entity | Entity URI | Type
Charles Bridge | http://dbpedia.org/resource/Charles_Bridge | http://dbpedia.org/ontology/Bridge
Vltava | http://dbpedia.org/resource/Vltava | http://dbpedia.org/ontology/River
Prague | http://dbpedia.org/resource/Prague | http://dbpedia.org/ontology/City
Czech Republic | http://dbpedia.org/resource/Czech_Republic | http://dbpedia.org/ontology/Country
King Charles IV | http://dbpedia.org/resource/Charles_IV,_Holy_Roman_Emperor | http://dbpedia.org/ontology/Person
Judith Bridge | http://dbpedia.org/resource/Judith_Bridge | http://dbpedia.org/ontology/Bridge
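Each of the three sub-tasks contributes one piece of information per entity: spotting gives the character span, disambiguation the resource URI, classification the type URI. The Python sketch below models that as a record and annotates part of the example text; the class and helper names are hypothetical illustrations, not part of GATE (where annotations carry offsets plus a feature map).

```python
from dataclasses import dataclass

DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"


@dataclass(frozen=True)
class EntityAnnotation:
    start: int     # spotting: offset of the first character
    end: int       # spotting: offset one past the last character
    uri: str       # disambiguation: DBpedia resource URI
    type_uri: str  # classification: DBpedia Ontology class URI


TEXT = ("The Charles Bridge is a famous historic bridge that crosses "
        "the Vltava river in Prague, Czech Republic.")


def annotate(text: str, surface: str, resource: str, dbo_class: str) -> EntityAnnotation:
    """Annotate the first occurrence of `surface`; raises ValueError if absent."""
    start = text.index(surface)
    return EntityAnnotation(start, start + len(surface), DBR + resource, DBO + dbo_class)


gold = [
    annotate(TEXT, "Charles Bridge", "Charles_Bridge", "Bridge"),
    annotate(TEXT, "Vltava", "Vltava", "River"),
    annotate(TEXT, "Prague", "Prague", "City"),
    annotate(TEXT, "Czech Republic", "Czech_Republic", "Country"),
]
for a in gold:
    print(a.start, a.end, TEXT[a.start:a.end], a.uri, a.type_uri)
```

A frozen dataclass keeps annotations hashable, so gold and system outputs can later be compared as sets.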
Existing NER tools 
4 
‣ DBpedia Spotlight, NERD, THD (Entityclassifier.eu), AlchemyAPI, OpenCalais, Evri, Lupedia, Wikimeta, Yahoo!, Zemanta, and others.
‣ Differences
- types come from different taxonomies
- types differ in granularity (Person, SoccerPlayer, Manager)
- some tools return types as plain-text literals
‣ Similarities
- several tools disambiguate entities to DBpedia or Wikipedia resources: DBpedia Spotlight, Entityclassifier.eu, AlchemyAPI, Wikimeta
Challenges and Prerequisites 
6 
‣ Entity spotting 
- a spotted text fragment may only partially overlap the ground-truth fragment and still be correct
- the start and end offsets of an entity may differ
The Charles Bridge is a famous historic bridge ... - ground-truth annotations 
The Charles Bridge is a famous historic bridge ... - annotations from a NER system 
‣ Entity disambiguation 
- entities are identified by unique DBpedia/Wikipedia resource URIs, or by URIs from YAGO or Freebase
The Charles Bridge is a famous historic bridge ... 
“Charles Bridge” - http://dbpedia.org/resource/Charles_Bridge 
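The spotting problem above maps onto the distinction GATE's evaluation tools draw between correct (coextensive) and partially correct (overlapping) annotations, reported as strict and lenient figures. Below is a minimal Python sketch of that bookkeeping; the greedy one-to-one pairing and all function names are our own assumptions, and GATE's implementation may pair annotations differently.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def match_spans(gold: List[Span], response: List[Span]) -> Tuple[int, int, int, int]:
    """Return (correct, partial, missing, spurious) using greedy one-to-one
    matching: exact matches are paired first, then overlapping spans."""
    unmatched_gold = list(gold)
    leftover = []
    correct = 0
    for span in response:
        if span in unmatched_gold:
            unmatched_gold.remove(span)
            correct += 1
        else:
            leftover.append(span)
    partial = spurious = 0
    for span in leftover:
        hit = next((g for g in unmatched_gold if overlaps(g, span)), None)
        if hit is None:
            spurious += 1
        else:
            unmatched_gold.remove(hit)
            partial += 1
    return correct, partial, len(unmatched_gold), spurious


def strict_lenient(correct: int, partial: int, missing: int, spurious: int):
    """Precision and recall, strict (partial counts as wrong) and lenient
    (partial counts as right)."""
    n_response = correct + partial + spurious
    n_gold = correct + partial + missing
    p_strict = correct / n_response if n_response else 0.0
    p_lenient = (correct + partial) / n_response if n_response else 0.0
    r_strict = correct / n_gold if n_gold else 0.0
    r_lenient = (correct + partial) / n_gold if n_gold else 0.0
    return p_strict, p_lenient, r_strict, r_lenient


# "The Charles Bridge is ...": gold marks "Charles Bridge" (4, 18),
# the NER system marks "The Charles Bridge" (0, 18); other spans are made up.
gold = [(4, 18), (60, 66)]
response = [(0, 18), (60, 66), (100, 105)]
counts = match_spans(gold, response)
print(counts, strict_lenient(*counts))
```

Here the "Charles Bridge" span counts as partial, so it lowers strict precision and recall but not the lenient ones.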
Challenges and Prerequisites 
7 
‣ Entity classification 
- different NER systems may return different types for the same entity
- types may differ in granularity and still be correct
- e.g., Person and SoccerManager are not the same type, but both can be correct for the same entity
Architecture overview 
9 
‣ Unified evaluation framework
- any NER tool can be integrated and evaluated, provided it has a GATE client plugin
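In the framework itself, tools are plugged in as GATE client plugins (Java processing resources). The Python sketch below only mirrors the idea of a unified framework: the evaluation side depends on one small contract, and any tool wrapped to satisfy it can be run on the same corpus. `NerClient`, `DictionaryNer` and `run_all` are hypothetical names, not GATE APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple


@dataclass(frozen=True)
class Entity:
    start: int
    end: int
    uri: str = ""       # disambiguation result, empty if the tool gives none
    type_uri: str = ""  # classification result, empty if the tool gives none


class NerClient(Protocol):
    """Hypothetical contract that every wrapped NER tool satisfies."""
    def annotate(self, text: str) -> List[Entity]: ...


class DictionaryNer:
    """Toy client standing in for a real tool such as a web-service wrapper."""
    def __init__(self, lexicon: Dict[str, Tuple[str, str]]):
        self.lexicon = lexicon  # surface form -> (uri, type_uri)

    def annotate(self, text: str) -> List[Entity]:
        found = []
        for surface, (uri, type_uri) in self.lexicon.items():
            start = text.find(surface)
            while start != -1:
                found.append(Entity(start, start + len(surface), uri, type_uri))
                start = text.find(surface, start + 1)
        return sorted(found, key=lambda e: e.start)


def run_all(clients: Dict[str, NerClient], text: str) -> Dict[str, List[Entity]]:
    """Run every registered client on the same text."""
    return {name: client.annotate(text) for name, client in clients.items()}


lexicon = {"Prague": ("http://dbpedia.org/resource/Prague",
                      "http://dbpedia.org/ontology/City")}
results = run_all({"toy": DictionaryNer(lexicon)},
                  "Prague is the capital of the Czech Republic.")
print(results["toy"])
```

Adding another tool then means writing one more class with an `annotate` method; nothing on the evaluation side changes.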
Realization 
10 
‣ GATE text engineering framework
- open source, with strong community support
- easy to extend (plugins and new processing resources)
- several existing NER clients for GATE: THD, OpenCalais, ANNIE, etc.
- reusable evaluation tools: Corpus Quality Assurance and Annotation Diff
‣ Developed tools
- plugins for importing the News and Tweets datasets
- a plugin for type alignment
- a reference implementation of a NER client as a GATE plugin
Evaluation Workflow 
11 
• Steps to evaluate a NER system
1. Import the ground-truth dataset
- use the provided plugins for the News and Tweets datasets
2. Run the NER system on the ground-truth corpus
- use a GATE client plugin for the NER system
- if none exists, one has to be implemented
3. Align the entity classes with the ground-truth classes
- use the provided OntologyAwareDiffPR plugin
4. Evaluate the performance of the NER system
- use the Corpus Quality Assurance tool
- evaluate NE spotting, disambiguation and classification
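Step 4 reports separate figures for spotting, disambiguation and classification. Corpus Quality Assurance computes these within GATE; the Python sketch below only illustrates one plausible reading of what "correct" means per sub-task under exact matching: the span alone, the span plus URI, or the span plus type. The tuple layout and the `dbr:`/`dbo:` shorthand strings are assumptions made for brevity.

```python
from typing import Dict, Iterable, Set, Tuple

# An annotation is (start, end, uri, type_uri); this layout is an assumption.
Ann = Tuple[int, int, str, str]
Scores = Tuple[float, float, float]


def prf(gold: Set[tuple], response: Set[tuple]) -> Scores:
    """Precision, recall and F1 under exact matching of the given keys."""
    tp = len(gold & response)
    p = tp / len(response) if response else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# What must agree for an annotation to count as correct in each sub-task.
TASK_KEYS = {
    "spotting": lambda a: (a[0], a[1]),
    "disambiguation": lambda a: (a[0], a[1], a[2]),
    "classification": lambda a: (a[0], a[1], a[3]),
}


def evaluate(gold: Iterable[Ann], response: Iterable[Ann]) -> Dict[str, Scores]:
    gold, response = list(gold), list(response)
    return {task: prf({key(a) for a in gold}, {key(a) for a in response})
            for task, key in TASK_KEYS.items()}


gold = [(4, 18, "dbr:Charles_Bridge", "dbo:Bridge"),
        (50, 56, "dbr:Prague", "dbo:City")]
response = [(4, 18, "dbr:Charles_Bridge", "dbo:Place"),  # wrong type
            (50, 56, "dbr:Prague_Castle", "dbo:City")]   # wrong URI
for task, scores in evaluate(gold, response).items():
    print(task, scores)
```

Both spans are found, so spotting is perfect, while each of the other two tasks loses one annotation. Type mismatches caused only by granularity are what step 3 is meant to remove before this comparison.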
News and Tweets datasets 
12 
• Tweets dataset - CC BY-NC-SA 3.0 
- dataset from the Making Sense of Microposts (MSM) 2013 workshop challenge 
- 1044 tweets, 1523 entities 
• News dataset - CC BY-SA 3.0 
- derived from the datasets presented at the WEKEX 2011 workshop
- 10 standard-length news articles, 588 entities
Fig. 1. Example of an entity annotation in GATE 
Type Alignment 
13 
• Implemented as a GATE plugin
- OntologyAwareDiffPR
• Prerequisites
- DBpedia Ontology
- a typeURI feature on both the ground-truth and the NER annotations
• Example
- before alignment
  NER annotation: typeURI: http://dbpedia.org/ontology/Person
  ground-truth annotation: typeURI: http://dbpedia.org/ontology/SoccerManager
- after alignment
  NER annotation: typeURI: http://dbpedia.org/ontology/Person, aligned: http://dbpedia.org/ontology/SoccerManager
  ground-truth annotation: typeURI: http://dbpedia.org/ontology/SoccerManager
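The example shows only the input and output of OntologyAwareDiffPR, not its rule. One plausible reading: when the ground-truth type is the NER type or a subclass of it in the DBpedia Ontology, the NER annotation receives an `aligned` feature holding the ground-truth type. The Python sketch below implements that reading over a small hand-written hierarchy fragment (assumed for illustration, not loaded from the ontology); the function names are hypothetical.

```python
from typing import Dict, Optional

DBO = "http://dbpedia.org/ontology/"

# Small hand-written fragment of the DBpedia Ontology class hierarchy
# (child -> parent); assumed here for illustration only.
SUBCLASS_OF: Dict[str, str] = {
    DBO + "SoccerManager": DBO + "SportsManager",
    DBO + "SportsManager": DBO + "Person",
    DBO + "Person": DBO + "Agent",
}


def is_subclass(child: str, ancestor: str) -> bool:
    """True if `child` equals `ancestor` or inherits from it transitively."""
    current: Optional[str] = child
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def align(ner_type: str, gold_type: str) -> Optional[str]:
    """Value for the `aligned` feature: the ground-truth type when the NER
    type is the same class or a more general one, otherwise None."""
    return gold_type if is_subclass(gold_type, ner_type) else None


print(align(DBO + "Person", DBO + "SoccerManager"))  # general NER type: aligned
print(align(DBO + "SoccerManager", DBO + "Person"))  # more specific NER type: None
```

Comparing the `aligned` feature instead of `typeURI` then lets a coarser but correct type count as a match.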
Conclusion and Future Directions 
15 
• GATE Evaluation Framework 
- two ground-truth datasets 
- two plugins for importing the News and Tweets datasets
- a plugin for basic type alignment
- a reference implementation of a GATE client for the Entityclassifier.eu NER
- plugins published under GPLv3
• Future Work
- integration of additional NER/NIF systems
- development of more advanced type alignment
- improvement of the existing datasets and creation of new ones
- additional ground-truth datasets
- additional NER evaluation statistics
- using the NERD ontology to integrate the tag sets of common wikifiers
16 
Thank you! 
Questions, comments, ideas? 
Milan Dojchinovski @m1ci 
milan.dojchinovski@fit.cvut.cz http://dojchinovski.mk 
Feedback 
‣ Framework resources: 
- general info: http://entityclassifier.eu/datasets/evaluation/ 
- datasets: http://entityclassifier.eu/datasets/evaluation/benchmark-datasets/ 
- tools: http://entityclassifier.eu/datasets/evaluation/tools/ 
Datasets annotation process 
17 
‣ Inter-annotator agreement 
- two annotators 
- one or two additional annotators for spurious cases
Dataset | Wikipedia URL | Coarse-grained type | Fine-grained type | Most frequent sense
News | 0.61 | 0.65 | 0.7 | 0.77
Tweets | 0.79 | n/a | 0.64 | 0.86
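The slide does not name the agreement measure behind these figures. For two annotators, Cohen's kappa is one common choice, so the Python sketch below shows it purely as an assumed example; it is not necessarily what was computed for the News and Tweets datasets.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: probability that both annotators pick the same label
    # independently, given each annotator's own label frequencies.
    expected = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if expected == 1.0:
        return 1.0  # both used one identical label for every item
    return (observed - expected) / (1 - expected)


ann1 = ["Person", "Person", "Place", "Place"]
ann2 = ["Person", "Place", "Place", "Place"]
print(cohens_kappa(ann1, ann2))
```

Here the raw agreement is 3/4, but half of that is expected by chance from the label frequencies, so kappa is 0.5.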
• Fields 
- URL of the English Wikipedia article
- Fine-grained type
- Coarse-grained type
- MFS (most frequent sense) flag
- Common entity flag 
- Full name 
- Partial flag 
- Incorrect capitalization flag (Tweets only) 
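The datasets store a link to the English Wikipedia article, while several NER tools return DBpedia resource URIs. Because English DBpedia resource names are derived from English Wikipedia article titles, the two can be compared after a simple rewrite. The Python sketch below is a simplification under stated assumptions: it ignores percent-encoding differences and Wikipedia redirects, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def wikipedia_to_dbpedia(url: str) -> Optional[str]:
    """Map an English Wikipedia article URL to the corresponding DBpedia
    resource URI, or return None for any other kind of URL."""
    parsed = urlparse(url)
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith("/wiki/"):
        return None
    title = parsed.path[len("/wiki/"):].replace(" ", "_")
    return "http://dbpedia.org/resource/" + title if title else None


print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Charles_Bridge"))
```

Normalizing both sides to one URI form before comparison keeps the disambiguation score from penalizing a tool for its URI style.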
News and Tweets datasets 
18 
• Tweets dataset 
- dataset from the Making Sense of Microposts (MSM) 2013 Workshop challenge 
- 1044 tweets, 1523 entities 
- Creative Commons BY-NC-SA 3.0 
• News dataset 
- derived from the datasets presented at the WEKEX 2011 workshop
- 10 standard-length news articles, 588 entities
- Creative Commons BY-SA 3.0 
Dataset | Documents | Entities (all) | Entities with CoNLL type | Entities with ontology type | Entities with Wikipedia URL
News | 10 | 588 | 580 | 367 | 440
Tweets | 1044 | 1523 | 1523 | 1379 | 1354
Fig. 1. Size metrics for the Tweets and News datasets
Entityclassifier.eu preliminary results 
19 
• http://entityclassifier.eu 
Task | Precision (strict/lenient) | Recall (strict/lenient) | F1.0 score (strict/lenient)
Entity spotting | 0.45/0.56 | 0.67/0.84 | 0.54/0.67
Entity disambiguation | 0.24/0.26 | 0.36/0.39 | 0.29/0.31
Entity classification | 0.12/0.13 | 0.17/0.19 | 0.14/0.15
Fig. 1. Results for the Entityclassifier.eu NER on the Tweets dataset
Task | Precision (strict/lenient) | Recall (strict/lenient) | F1.0 score (strict/lenient)
Entity spotting | 0.69/0.78 | 0.33/0.38 | 0.45/0.51
Entity disambiguation | 0.37/0.41 | 0.18/0.20 | 0.24/0.27
Entity classification | 0.69/0.78 | 0.33/0.38 | 0.45/0.51
Fig. 2. Results for the Entityclassifier.eu NER on the News dataset
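The F1.0 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick consistency check, the Python sketch below recomputes it from the rounded strict Tweets figures in Fig. 1 and prints it next to the reported value.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F-measure with beta = 1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the strict Tweets results in Fig. 1
tweets_strict = {
    "spotting": (0.45, 0.67, 0.54),
    "disambiguation": (0.24, 0.36, 0.29),
    "classification": (0.12, 0.17, 0.14),
}
for task, (p, r, reported) in tweets_strict.items():
    print(f"{task}: recomputed {f1(p, r):.2f}, reported {reported:.2f}")
```

For these three rows the recomputed value matches the reported one to two decimals.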
GATE Evaluation Framework for Benchmarking Wikipedia-Based NER Systems

Contenu connexe

Similaire à Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER Systems

A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesDataWorks Summit
 
Hypermedia for Machine APIs
Hypermedia for Machine APIsHypermedia for Machine APIs
Hypermedia for Machine APIsMichael Koster
 
The WorldCat Search API
The WorldCat Search APIThe WorldCat Search API
The WorldCat Search APIOCLC Research
 
Next-Generation Completeness and Consistency Management in the Digital Threa...
Next-Generation Completeness and Consistency Management in the Digital Threa...Next-Generation Completeness and Consistency Management in the Digital Threa...
Next-Generation Completeness and Consistency Management in the Digital Threa...Ákos Horváth
 
IncQuery_presentation_Incose_EMEA_WSEC.pptx
IncQuery_presentation_Incose_EMEA_WSEC.pptxIncQuery_presentation_Incose_EMEA_WSEC.pptx
IncQuery_presentation_Incose_EMEA_WSEC.pptxIncQuery Labs
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution vty
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsEnrico Daga
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshIanFurlong4
 
Innovate2014 Better Integrations Through Open Interfaces
Innovate2014 Better Integrations Through Open InterfacesInnovate2014 Better Integrations Through Open Interfaces
Innovate2014 Better Integrations Through Open InterfacesSteve Speicher
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...DataWorks Summit
 
Open for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin CoreOpen for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin CoreAndy Powell
 
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...Ricard de la Vega
 
Opentracing jaeger
Opentracing jaegerOpentracing jaeger
Opentracing jaegerOracle Korea
 
Distributed Tracing with Jaeger
Distributed Tracing with JaegerDistributed Tracing with Jaeger
Distributed Tracing with JaegerInho Kang
 
Benchmarking Cloud-based Tagging Services
Benchmarking Cloud-based Tagging ServicesBenchmarking Cloud-based Tagging Services
Benchmarking Cloud-based Tagging ServicesTanu Malik
 
OWASP Dependency-Track Introduction
OWASP Dependency-Track IntroductionOWASP Dependency-Track Introduction
OWASP Dependency-Track IntroductionSergey Sotnikov
 

Similaire à Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER Systems (20)

A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companies
 
Hypermedia for Machine APIs
Hypermedia for Machine APIsHypermedia for Machine APIs
Hypermedia for Machine APIs
 
The WorldCat Search API
The WorldCat Search APIThe WorldCat Search API
The WorldCat Search API
 
Next-Generation Completeness and Consistency Management in the Digital Threa...
Next-Generation Completeness and Consistency Management in the Digital Threa...Next-Generation Completeness and Consistency Management in the Digital Threa...
Next-Generation Completeness and Consistency Management in the Digital Threa...
 
IncQuery_presentation_Incose_EMEA_WSEC.pptx
IncQuery_presentation_Incose_EMEA_WSEC.pptxIncQuery_presentation_Incose_EMEA_WSEC.pptx
IncQuery_presentation_Incose_EMEA_WSEC.pptx
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
 
Innovate2014 Better Integrations Through Open Interfaces
Innovate2014 Better Integrations Through Open InterfacesInnovate2014 Better Integrations Through Open Interfaces
Innovate2014 Better Integrations Through Open Interfaces
 
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
 
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
 
Sword Crig 2007 12 06
Sword Crig 2007 12 06Sword Crig 2007 12 06
Sword Crig 2007 12 06
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...
 
NextGenML
NextGenML NextGenML
NextGenML
 
Open for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin CoreOpen for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin Core
 
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
 
Opentracing jaeger
Opentracing jaegerOpentracing jaeger
Opentracing jaeger
 
Distributed Tracing with Jaeger
Distributed Tracing with JaegerDistributed Tracing with Jaeger
Distributed Tracing with Jaeger
 
Benchmarking Cloud-based Tagging Services
Benchmarking Cloud-based Tagging ServicesBenchmarking Cloud-based Tagging Services
Benchmarking Cloud-based Tagging Services
 
OWASP Dependency-Track Introduction
OWASP Dependency-Track IntroductionOWASP Dependency-Track Introduction
OWASP Dependency-Track Introduction
 

Dernier

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 

Dernier (20)

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER Systems

  • 1. Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER Systems Milan Dojchinovski1,2, Tomáš Kliegr1 2 Faculty of Information Technology Czech Technical University in Prague 1 Faculty of Informatics and Statistics University of Economics, Prague “NLP & DBpedia” ISWC 2013 workshop October 22nd, 2013, Sydney, Australia Milan Dojchinovski milan.dojchinovski@vse.cz - @m1ci - http://dojchinovski.mk Except where otherwise noted, the content of this presentation is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported Czech Technical University in Prague University of Economics Prague
  • 2. Outline ‣ Introduction ‣ Prerequisites and challenges ‣ GATE framework for benchmarking NER ‣ Conclusion and future directions GATE Evaluation Framework for Benchmarking Wikipedia-Based NER Systems 2
  • 3. What is a Named Entity Recognition task? 3 ‣ Main sub-tasks - spotting of entities: tagging text fragment as an entity - disambiguation of entities: unique identification of entities using URIs - classification of entities: assignment of type to an entity The Charles Bridge is a famous historic bridge that crosses the Vltava river in Prague, Czech Republic. Its construction started in 1357 under the auspices of King Charles IV, and finished in the beginning of the 15th century. The bridge replaced the old Judith Bridge built 1158–1172 that had been badly damaged by a flood in 1342. Entity Entity URI Type Charles Bridge http://dbpedia.org/resource/Charles_Bridge http://dbpedia.org/ontology/Bridge Vltava http://dbpedia.org/resource/Vltava http://dbpedia.org/ontology/River Prague http://dbpedia.org/resource/Prague http://dbpedia.org/ontology/City Czech Republic http://dbpedia.org/resource/Czech_Republic http://dbpedia.org/ontology/Country King Charles IV http://dbpedia.org/resource/Charles_IV,_Holy_Roman_Emperor http://dbpedia.org/ontology/Person Judith Bridge http://dbpedia.org/resource/Judith_Bridge http://dbpedia.org/ontology/Bridge GATE Evaluation Framework for Benchmarking Wikipedia-Based NER Systems
  • 4. Existing NER tools 4 ‣ DBpedia Spotlight, NERD, THD (EntityClassifier.eu), AlchemyAPI, Open Calais, Evri, Lupedia, Wikimeta, Yahoo!, Zemata, and others. ‣ Differences - types come from different taxonomies - types with different granularity (Person, SoccerPlayer, Manager) - types are plain text literals ‣ Similarities - disambiguation with DBpedia or Wikipedia resources - DBpedia Spotlight, Entityclassifier.eu, AlchemyAPI, Wikimeta GATE Evaluation Framework for Benchmarking Wikipedia-Based NER Systems
  • 5. Outline 5 ‣ Introduction ‣ Challenges and prerequisites ‣ GATE framework for benchmarking NER ‣ Conclusion and future directions GATE Evaluation Framework for Benchmarking Wikipedia-Based NER Systems
  • 6. Challenges and Prerequisites 6 ‣ Entity spotting - spotted entity text fragments might not be exactly overlapping, but still correct - entity start and end offset might be different The Charles Bridge is a famous historic bridge ... - ground-truth annotations The Charles Bridge is a famous historic bridge ... - annotations from a NER system ‣ Entity disambiguation - using unique DBpedia/Wikipedia resource URIs or URIs from YAGO or Freebase. The Charles Bridge is a famous historic bridge ... “Charles Bridge” - http://dbpedia.org/resource/Charles_Bridge GATE Evaluation Framework for Benchmarking Wikipedia-Based NER Systems
  • 7. Challenges and Prerequisites 7 ‣ Entity classification - different NER might return different types for same entity - although different (in granularity), but still correct - Person and SoccerManager are not same types, but they are correct GATE Evaluation Framework for Benchmarking Wikipedia-Based NER Systems
  • 8. Outline 8 ‣ Introduction ‣ Prerequisites and challenges ‣ GATE framework for benchmarking NER ‣ Conclusion and future directions GATE Evaluation Framework for Benchmarking Wikipedia-Based NER Systems
  • 9. Architecture overview 9 ‣ Unified evaluation framework - any NER tool can be easily integrated and evaluated GATE Evaluation Framework for Benchmarking Wikipedia-Based NER Systems
  • 10. Realization 10 ‣ GATE text engineering framework - open-source, strong community support - easy to extend (plugins and new processing resources) - several existing NER clients for GATE: THD, OpenCalais, ANNIE, etc. - evaluation tools that can be reused - Corpus Quality Assurance - Annotation diff ‣ Developed tools - plugins for import of News and Tweets datasets - plugin for type alignment - reference implementation of a NER client as a GATE plugin GATE Evaluation Framework for Benchmarking Wikipedia-Based NER Systems
  • 11. Evaluation Workflow
    • Steps to evaluate a NER system
      1. Import the ground-truth dataset
         - use the provided plugins for the News and Tweets datasets
      2. Run the NER system on the ground-truth corpus
         - use a GATE client plugin for the NER system
         - if none exists, one must be implemented
      3. Align the entity classes with the ground-truth classes
         - use the provided OntologyAwareDiffPR plugin
      4. Evaluate the performance of the NER tool
         - use the Corpus Quality Assurance tool
         - evaluate NE spotting, disambiguation, and classification
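Step 4 produces strict and lenient precision, recall and F1 scores. The Python sketch below mimics that idea; it is not GATE's code. It assumes a simplified greedy one-to-one matching that sorts annotations into correct (identical offsets), partially correct (overlapping), spurious and missing, and treats partial matches as errors in strict mode and as hits in lenient mode. `match_counts`, `scores` and the example spans are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Ann:
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def overlaps(a: Ann, b: Ann) -> bool:
    return a.start < b.end and b.start < a.end


def match_counts(gold: List[Ann], system: List[Ann]) -> Tuple[int, int, int, int]:
    """Greedy one-to-one matching -> (correct, partial, spurious, missing)."""
    remaining = list(gold)
    leftovers = []
    correct = partial = spurious = 0
    # Pass 1: annotations with identical offsets are correct.
    for s in system:
        if s in remaining:
            remaining.remove(s)
            correct += 1
        else:
            leftovers.append(s)
    # Pass 2: overlapping but not identical annotations are partially correct.
    for s in leftovers:
        hit = next((g for g in remaining if overlaps(g, s)), None)
        if hit is None:
            spurious += 1
        else:
            remaining.remove(hit)
            partial += 1
    return correct, partial, spurious, len(remaining)


def scores(correct: int, partial: int, spurious: int, missing: int, lenient: bool = False):
    """Precision, recall and F1; lenient mode also credits partial matches."""
    hits = correct + partial if lenient else correct
    responses = correct + partial + spurious  # everything the NER system returned
    keys = correct + partial + missing        # everything in the ground truth
    p = hits / responses if responses else 0.0
    r = hits / keys if keys else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


text = "The Charles Bridge crosses the Vltava river in Prague."


def span(s: str) -> Ann:
    i = text.index(s)
    return Ann(i, i + len(s))


gold = [span("Charles Bridge"), span("Vltava"), span("Prague")]
system = [span("The Charles Bridge"), span("Vltava"), span("river")]

counts = match_counts(gold, system)
print(counts)                         # (1, 1, 1, 1)
print(scores(*counts))                # strict: precision, recall and F1 are all 1/3
print(scores(*counts, lenient=True))  # lenient: all three are 2/3
```

For disambiguation or classification, the same counting applies with an extra condition on a match (equal resource URI, or equal or aligned type).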
  • 12. News and Tweets datasets
    • Tweets dataset
      - CC BY-NC-SA 3.0
      - dataset from the Making Sense of Microposts (MSM) 2013 workshop challenge
      - 1044 tweets, 1523 entities
    • News dataset
      - CC BY-SA 3.0
      - derived from the datasets presented at the WEKEX 2011 workshop
      - standard-length news articles: 10 articles, 588 entities
    Fig. 1. Example of an entity annotation in GATE
  • 13. Type Alignment
    • Implemented as a GATE plugin
      - OntologyAwareDiffPR
    • Prerequisites
      - DBpedia Ontology
      - typeURI feature in both the ground-truth and the NER annotations
    Before alignment:
      - NER annotation: typeURI: http://dbpedia.org/ontology/Person
      - ground-truth annotation: typeURI: http://dbpedia.org/ontology/SoccerManager
    After alignment:
      - NER annotation: typeURI: http://dbpedia.org/ontology/Person, aligned: http://dbpedia.org/ontology/SoccerManager
      - ground-truth annotation: typeURI: http://dbpedia.org/ontology/SoccerManager
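The alignment pictured above can be sketched as: when the NER `typeURI` equals, or is a superclass of, the ground-truth `typeURI`, copy the ground-truth type into an `aligned` feature, which the evaluation then compares instead. This is a minimal Python sketch of that reading, not the OntologyAwareDiffPR code; the `PARENT` table is an assumed fragment of the DBpedia Ontology, `align` is a hypothetical helper, and only the direction shown on the slide (general NER type, specific ground-truth type) is handled.

```python
from typing import Dict, Optional, Set

DBO = "http://dbpedia.org/ontology/"

# Assumed fragment of the DBpedia Ontology (child -> parent), for illustration only.
PARENT: Dict[str, str] = {
    DBO + "SoccerManager": DBO + "SportsManager",
    DBO + "SportsManager": DBO + "Person",
    DBO + "Person": DBO + "Agent",
}


def ancestors_or_self(t: Optional[str]) -> Set[str]:
    seen: Set[str] = set()
    while t is not None and t not in seen:
        seen.add(t)
        t = PARENT.get(t)
    return seen


def align(ner_features: Dict[str, str], gold_features: Dict[str, str]) -> Dict[str, str]:
    """Copy of the NER features, with 'aligned' added when the NER typeURI
    equals or subsumes the ground-truth typeURI."""
    out = dict(ner_features)
    ner_type = ner_features.get("typeURI")
    gold_type = gold_features.get("typeURI")
    if ner_type and gold_type and ner_type in ancestors_or_self(gold_type):
        out["aligned"] = gold_type
    return out


ner = {"typeURI": DBO + "Person"}
gold = {"typeURI": DBO + "SoccerManager"}
print(align(ner, gold))
```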
  • 14. Outline
    ‣ Introduction
    ‣ Prerequisites and challenges
    ‣ GATE framework for benchmarking NER
    ‣ Conclusion and future directions
  • 15. Conclusion and Future Directions
    • GATE Evaluation Framework
      - two ground-truth datasets
      - two plugins for importing the News and Tweets datasets
      - a plugin that performs basic type alignment
      - a reference implementation of the Entityclassifier.eu NER as a GATE client
      - plugins published under GPLv3.0
    • Future Work
      - integration of additional NER/NIF systems
      - development of advanced type alignment
      - improvement of existing datasets and creation of new ones
      - additional ground-truth datasets
      - additional NER evaluation statistics
      - using the NERD ontology to integrate the tag sets of common wikifiers
  • 16. Thank you!
    Questions, comments, ideas?
    Milan Dojchinovski
    @m1ci
    milan.dojchinovski@fit.cvut.cz
    http://dojchinovski.mk
    ‣ Framework resources:
      - general info: http://entityclassifier.eu/datasets/evaluation/
      - datasets: http://entityclassifier.eu/datasets/evaluation/benchmark-datasets/
      - tools: http://entityclassifier.eu/datasets/evaluation/tools/
  • 17. Dataset annotation process
    ‣ Inter-annotator agreement
      - two annotators
      - one or two additional annotators for disputed cases
      Dataset   Wikipedia URL   Coarse-grained type   Fine-grained type   Most frequent sense
      News      0.61            0.65                  0.70                0.77
      Tweets    0.79            n/a                   0.64                0.86
    • Fields
      - URL to English Wikipedia
      - Fine-grained type
      - Coarse-grained type
      - MFS (most frequent sense) flag
      - Common entity flag
      - Full name
      - Partial flag
      - Incorrect capitalization flag (Tweets only)
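The ground truth stores a URL to English Wikipedia, while disambiguation results are DBpedia resource URIs. For English, DBpedia names each resource after the Wikipedia article title, so the two can be compared after a simple rewrite. A minimal Python sketch under that assumption; `wikipedia_to_dbpedia` is a hypothetical helper, and redirects and percent-encoding differences are deliberately not handled.

```python
from typing import Optional
from urllib.parse import urlsplit

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(url: str) -> Optional[str]:
    """Map an English Wikipedia article URL to a DBpedia resource URI.

    Returns None for anything that is not an en.wikipedia.org article URL.
    Redirects and percent-encoding normalization are out of scope here.
    """
    parts = urlsplit(url)
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith("/wiki/"):
        return None
    return DBPEDIA_RESOURCE + parts.path[len("/wiki/"):]


for url in [
    "http://en.wikipedia.org/wiki/Charles_Bridge",
    "http://en.wikipedia.org/wiki/Charles_IV,_Holy_Roman_Emperor",
]:
    print(wikipedia_to_dbpedia(url))
```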
  • 18. News and Tweets datasets
    • Tweets dataset
      - dataset from the Making Sense of Microposts (MSM) 2013 workshop challenge
      - 1044 tweets, 1523 entities
      - Creative Commons BY-NC-SA 3.0
    • News dataset
      - derived from the datasets presented at the WEKEX 2011 workshop
      - standard-length news articles: 10 articles, 588 entities
      - Creative Commons BY-SA 3.0
      Dataset   Documents   Entities (all)   With CoNLL type   With ontology type   With Wikipedia URL
      News      10          588              580               367                  440
      Tweets    1044        1523             1523              1379                 1354
    Fig. 1. Size metrics for the Tweets and News datasets
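The size table is easier to compare as ratios: entities per document and the share of entities carrying each kind of annotation. A short Python sketch that derives these figures from the counts above; `SIZES` and `coverage` are illustrative names, and the numbers are only the table's.

```python
# dataset -> (documents, all entities, with CoNLL type, with ontology type, with Wikipedia URL)
SIZES = {
    "News": (10, 588, 580, 367, 440),
    "Tweets": (1044, 1523, 1523, 1379, 1354),
}


def coverage(name: str) -> dict:
    """Entities per document, and the percentage of entities with each annotation."""
    docs, total, conll, onto, wiki = SIZES[name]
    return {
        "entities_per_doc": total / docs,
        "conll_pct": 100 * conll / total,
        "ontology_pct": 100 * onto / total,
        "wikipedia_pct": 100 * wiki / total,
    }


for name in SIZES:
    print(name, {k: round(v, 1) for k, v in coverage(name).items()})
```

Rounded, News has about 59 entities per article, with 62.4% of entities typed in the ontology and 74.8% linked to Wikipedia; Tweets average about 1.5 entities per tweet, with 90.5% ontology-typed and 88.9% linked.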
  • 19. Entityclassifier.eu preliminary results
    • http://entityclassifier.eu
      Task                    Precision (strict/lenient)   Recall (strict/lenient)   F1.0 score (strict/lenient)
      Entity spotting         0.45/0.56                    0.67/0.84                 0.54/0.67
      Entity disambiguation   0.24/0.26                    0.36/0.39                 0.29/0.31
      Entity classification   0.12/0.13                    0.17/0.19                 0.14/0.15
    Fig. 1. Results for the Entityclassifier.eu NER on the Tweets dataset
      Task                    Precision (strict/lenient)   Recall (strict/lenient)   F1.0 score (strict/lenient)
      Entity spotting         0.69/0.78                    0.33/0.38                 0.45/0.51
      Entity disambiguation   0.37/0.41                    0.18/0.20                 0.24/0.27
      Entity classification   0.69/0.78                    0.33/0.38                 0.45/0.51
    Fig. 2. Results for the Entityclassifier.eu NER on the News dataset
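The F1.0 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A small Python check that recomputes it from the reported precision and recall and flags any row that disagrees by more than the two-decimal rounding allows; the tolerance of 0.01 is an assumption, since the inputs are themselves rounded.

```python
def f1(p: float, r: float) -> float:
    """Balanced F-measure (beta = 1): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


ROWS = [
    # (dataset, task, precision strict/lenient, recall strict/lenient, reported F1 strict/lenient)
    ("Tweets", "spotting",       (0.45, 0.56), (0.67, 0.84), (0.54, 0.67)),
    ("Tweets", "disambiguation", (0.24, 0.26), (0.36, 0.39), (0.29, 0.31)),
    ("Tweets", "classification", (0.12, 0.13), (0.17, 0.19), (0.14, 0.15)),
    ("News",   "spotting",       (0.69, 0.78), (0.33, 0.38), (0.45, 0.51)),
    ("News",   "disambiguation", (0.37, 0.41), (0.18, 0.20), (0.24, 0.27)),
    ("News",   "classification", (0.69, 0.78), (0.33, 0.38), (0.45, 0.51)),
]


def check(rows, tol=0.01):
    """Return (dataset, task, mode) for every reported F1 that 2PR/(P+R) does not reproduce."""
    mismatches = []
    for dataset, task, ps, rs, fs in rows:
        for mode, p, r, f in zip(("strict", "lenient"), ps, rs, fs):
            if abs(f1(p, r) - f) > tol:
                mismatches.append((dataset, task, mode))
    return mismatches


print(check(ROWS))  # []
```

Every reported F1.0 value agrees with the reported precision and recall to within rounding.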