Development of Named Entities 
Recognition for French Newspapers 
Information Day 
Europeana Newspapers 
27/11/2014 
BnF / Paris, France 
Clemens Neudecker, State Library Berlin 
@cneudecker
What is „Named Entity Recognition“? 
• Named Entity Recognition (NER) is a sub-task of 
Information Extraction and is typically understood as being 
part of the area of Computational Linguistics / Natural 
Language Processing. 
• The main aim of NER is the automatic extraction and 
classification of knowledge or information from 
semantically unstructured text. 
• NER is still subject to academic research (cf. Google & 
MSR Competition) – practical use in the cultural heritage 
digitisation sector remains a rare case. 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp
Asked differently: What is a „Named Entity“? 
• PERSON: 
• Names of persons and families, but also names of fictional 
persons („Albert Einstein“, „President of the USA“, „Mickey Mouse“) 
• ORGANISATION: 
• Names of companies, governmental or non-governmental 
organisations („IBM“, „The Beatles“, „Labour Party“) 
• PLACE: 
• Cities, provinces, counties, geographical areas, etc. 
(„Paris“, „Hautes-Pyrénées“, „Alpes“) 
NER (I) 
1. Detection and classification of person names, places and 
organisations in running text (includes part-of-speech tagging)
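The detection step can be illustrated with a minimal sketch. It is a plain gazetteer lookup, not the statistical model used in the project, and it skips part-of-speech tagging entirely. The `GAZETTEER` contents and the `tag_entities` helper are hypothetical names chosen for this example.

```python
import re

# Hypothetical toy gazetteer; a real system would use far larger lists
# or a trained statistical model instead of exact lookups.
GAZETTEER = {
    "PERSON": {"Albert Einstein"},
    "ORGANISATION": {"IBM", "Labour Party"},
    "PLACE": {"Paris", "Alpes"},
}


def tag_entities(text):
    """Return (start, end, surface form, type) for each gazetteer match, sorted by position."""
    found = []
    for entity_type, names in GAZETTEER.items():
        for name in names:
            # \b keeps "Paris" from matching inside a longer word.
            for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                found.append((match.start(), match.end(), name, entity_type))
    return sorted(found)


sentence = "Albert Einstein visited Paris before IBM opened an office there."
entities = tag_entities(sentence)
for start, end, name, entity_type in entities:
    print(f"{start:3d}-{end:3d}  {entity_type:12s}  {name}")
```

A lookup like this only finds names it already knows, which is one reason trained models are preferred for large and noisy collections.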
NER (II) 
2. Disambiguation of terms (Example “Jordan”) 
through contextual information
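As a rough illustration of disambiguation through context, the sketch below scores each candidate sense of "Jordan" by how many of its cue words appear in the sentence. The candidate senses, the cue words and the `disambiguate` function are all assumptions made for this example. Real systems typically rely on much richer context models.

```python
# Hypothetical candidate senses for the ambiguous name "Jordan", each with
# a handful of context words that hint at that sense.
CANDIDATES = {
    "Jordan (country)": {"amman", "kingdom", "border", "king"},
    "Jordan (river)": {"river", "bank", "water", "valley"},
    "Michael Jordan (person)": {"basketball", "bulls", "player", "nba"},
}


def disambiguate(sentence):
    """Pick the sense whose cue words overlap most with the sentence."""
    words = {w.strip(".,;:!?").lower() for w in sentence.split()}
    scores = {sense: len(words & cues) for sense, cues in CANDIDATES.items()}
    best = max(scores, key=scores.get)
    # No overlap at all means the context gives no evidence either way.
    return best if scores[best] > 0 else None


print(disambiguate("Jordan scored 40 points for the Bulls as a basketball player."))
print(disambiguate("The Jordan flows through the valley into the Dead Sea."))
```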
NER (III) 
3. Linking to authority files and online databases 
(Linked Data)
Supported languages in ENP 
3 Languages: 
• German 
• Dutch 
• French 
Approaches 
• Machine learning vs. rule-based 
• Advantages of machine-learning systems: 
• No need for specific linguistic expertise 
• Processing of large amounts of material 
• Advantages of rule-based systems: 
• Can be tuned to very high accuracy for particular texts 
• Adaptation to local grammar and specific text style 
Software 
• Open Source ML software developed by Stanford 
University, adapted and extended for Europeana 
Newspapers by the KB National Library of the Netherlands 
• Software is available as open source from GitHub for 
download and testing: 
https://github.com/KBNLresearch/europeananp-ner 
Training 
• Training the NER systems with the help of manually 
annotated corpora („gold corpus“) and gazetteers 
• Publication of annotated data from ENP as open data 
Encoding 
• Results of NER are stored in a library-specific format: 
ALTO (Analyzed Layout and Text Object) 
• ALTO versions 2.1 and later specifically allow the use of NER „Tags“ 
<String STYLEREFS="ID7" HEIGHT="132.0" WIDTH="570.0" HPOS="5937.0" 
VPOS="3279.0" CONTENT="Reynolds" WC="0.95238096" TAGREFS="Tag5"></String> 
<String STYLEREFS="ID7" HEIGHT="102.0" WIDTH="540.0" HPOS="18438.0" 
VPOS="22008.0" CONTENT="Baltimore" WC="0.82539684" TAGREFS="Tag10"></String> 
… 
<Tags> 
<NamedEntityTag ID="Tag5" TYPE="Person" LABEL="Reynolds"/> 
<NamedEntityTag ID="Tag6" TYPE="Place" LABEL="Baltimore"/> 
</Tags>
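To show how the TAGREFS attribute ties a word to its entity tag, here is a minimal sketch that reads a simplified fragment with Python's standard `xml.etree.ElementTree`. The fragment is an assumption made for illustration: it has no XML namespace, a flat layout, and tag IDs chosen so that every reference resolves. Real ALTO files declare a namespace and nest strings much deeper. The `tagged_strings` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment modelled on the slide; the IDs and
# the wrapper elements are illustrative assumptions, not project output.
ALTO_FRAGMENT = """
<alto>
  <Tags>
    <NamedEntityTag ID="Tag5" TYPE="Person" LABEL="Reynolds"/>
    <NamedEntityTag ID="Tag6" TYPE="Place" LABEL="Baltimore"/>
  </Tags>
  <TextLine>
    <String CONTENT="Reynolds" WC="0.95" TAGREFS="Tag5"/>
    <String CONTENT="visited" WC="0.90"/>
    <String CONTENT="Baltimore" WC="0.83" TAGREFS="Tag6"/>
  </TextLine>
</alto>
"""


def tagged_strings(xml_text):
    """Return (word, entity type, label) for every String that references a tag."""
    root = ET.fromstring(xml_text)
    # Index the entity tags by ID so each TAGREFS value can be looked up.
    tags = {tag.get("ID"): tag for tag in root.iter("NamedEntityTag")}
    results = []
    for string in root.iter("String"):
        # TAGREFS may hold several space-separated IDs; untagged words have none.
        for ref in string.get("TAGREFS", "").split():
            tag = tags.get(ref)
            if tag is not None:
                results.append((string.get("CONTENT"), tag.get("TYPE"), tag.get("LABEL")))
    return results


for word, entity_type, label in tagged_strings(ALTO_FRAGMENT):
    print(f"{word}: {entity_type} ({label})")
```

Keeping the entity information in a separate Tags section means the OCR text and coordinates stay untouched while the annotations are layered on top.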
Problems and challenges 
• OCR errors reduce the accuracy of the classification and 
increase the overall processing time for recognition due to 
high noise. 
• Historical spelling variation, for person names and place 
names in particular. 
• In many cases the historical spelling variants cannot be 
found in online knowledge bases. 
→ Specific adaptation of the software via external modules
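One simple way to approach the spelling variation problem is approximate string matching against a list of known names. The sketch below uses `difflib.get_close_matches` from the Python standard library. It is an illustration only and not the external module used in the project. The `KNOWN_PLACES` list, the `match_variant` helper and the 0.8 cutoff are assumptions.

```python
import difflib

# Hypothetical list of modern place names standing in for a knowledge base.
KNOWN_PLACES = ["Amsterdam", "Rotterdam", "Berlin", "Strasbourg", "Marseille"]


def match_variant(token, known=KNOWN_PLACES, cutoff=0.8):
    """Return the closest known name for a variant spelling, or None if nothing is close enough."""
    matches = difflib.get_close_matches(token, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_variant("Strasburg"))
print(match_variant("Marseilles"))
print(match_variant("Xyzzy"))
```

The cutoff trades recall against false matches: a lower value catches more variants and OCR errors but also links more unrelated names.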
Initial results: Dutch 
            Persons   Places   Organisations 
Precision   0.940     0.950    0.942 
Recall      0.588     0.760    0.559 
F-measure   0.689     0.838    0.671
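For readers unfamiliar with the metric, the F-measure is commonly defined as the harmonic mean of precision and recall. The sketch below implements that textbook definition. The slides do not say how the project computed its figures, and applying this formula to the precision and recall listed above gives slightly different values (about 0.844 for places, against the reported 0.838), so the reported numbers were presumably aggregated in another way. That is an assumption, not something stated in the source.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 gives the balanced F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall for places, taken from the table above.
print(round(f_measure(0.950, 0.760), 3))
```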
Why Named Entity Recognition? 
• Example: Analysis of log files from the newspaper collection of 
the National Library of Wales shows that 9 out of 10 queries 
are for a person or place name! 
(Source: Paul Gooding, Exploring Usage of Digital Newspaper Archives through Web Log 
Analysis: A Case Study of Welsh Newspapers Online, presented at DH2014, Lausanne) 
Thank you for your attention! 
@eurnews 
http://www.europeana-newspapers.eu 
http://www.theeuropeanlibrary.org/tel4/newspapers 
http://www.europeana.eu/

Contenu connexe

Tendances

Europeana_Newspapers_ONB_infoday_HJLieder
Europeana_Newspapers_ONB_infoday_HJLiederEuropeana_Newspapers_ONB_infoday_HJLieder
Europeana_Newspapers_ONB_infoday_HJLiederEuropeana Newspapers
 
ENP Belgrade Workshop Project Overview
ENP Belgrade Workshop Project OverviewENP Belgrade Workshop Project Overview
ENP Belgrade Workshop Project OverviewEuropeana Newspapers
 
Europeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday MuehlbergerEuropeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday MuehlbergerEuropeana Newspapers
 
Europeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday GenereuxEuropeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday GenereuxEuropeana Newspapers
 
Europeana Newspapers wp2 liber2013
Europeana Newspapers wp2 liber2013Europeana Newspapers wp2 liber2013
Europeana Newspapers wp2 liber2013Europeana Newspapers
 
Europeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers OnlineEuropeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers Onlinecneudecker
 
Europeana Newspapers Amsterdam workshop introduction
Europeana Newspapers Amsterdam workshop introductionEuropeana Newspapers Amsterdam workshop introduction
Europeana Newspapers Amsterdam workshop introductionEuropeana Newspapers
 
The challenges of making Europe's newspapers available online
The challenges of making Europe's newspapers available onlineThe challenges of making Europe's newspapers available online
The challenges of making Europe's newspapers available onlineLIBER Europe
 
Europeana Newspapers LIBER2013 Workshop intro
Europeana Newspapers LIBER2013 Workshop introEuropeana Newspapers LIBER2013 Workshop intro
Europeana Newspapers LIBER2013 Workshop introEuropeana Newspapers
 
Europeana Newspaper metadata LIBER2013
Europeana Newspaper metadata LIBER2013Europeana Newspaper metadata LIBER2013
Europeana Newspaper metadata LIBER2013Europeana Newspapers
 
Europeana Newspapers Polish Information Day
Europeana Newspapers Polish Information DayEuropeana Newspapers Polish Information Day
Europeana Newspapers Polish Information DayEuropeana Newspapers
 
Europeana Newspapers Aggregation Plan
Europeana Newspapers Aggregation PlanEuropeana Newspapers Aggregation Plan
Europeana Newspapers Aggregation PlanEuropeana Newspapers
 
IFLA 2014 Europeana Newspapers Rossitza Atanassova
IFLA 2014 Europeana Newspapers Rossitza AtanassovaIFLA 2014 Europeana Newspapers Rossitza Atanassova
IFLA 2014 Europeana Newspapers Rossitza AtanassovaEuropeana Newspapers
 
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)Data Driven Innovation
 
The EPO and Tecnology Transfer: a brief overview 4T-Tech Transfer Think Tank
The EPO and Tecnology Transfer: a brief overview 4T-Tech Transfer Think TankThe EPO and Tecnology Transfer: a brief overview 4T-Tech Transfer Think Tank
The EPO and Tecnology Transfer: a brief overview 4T-Tech Transfer Think TankViola Zazzera
 
Overview of the Europeana Newspapers Project
Overview of the Europeana Newspapers ProjectOverview of the Europeana Newspapers Project
Overview of the Europeana Newspapers ProjectEuropeana Newspapers
 

Tendances (20)

Europeana_Newspapers_ONB_infoday_HJLieder
Europeana_Newspapers_ONB_infoday_HJLiederEuropeana_Newspapers_ONB_infoday_HJLieder
Europeana_Newspapers_ONB_infoday_HJLieder
 
ENP Belgrade Workshop Project Overview
ENP Belgrade Workshop Project OverviewENP Belgrade Workshop Project Overview
ENP Belgrade Workshop Project Overview
 
ENP Belgrade WS Introduction
ENP Belgrade WS IntroductionENP Belgrade WS Introduction
ENP Belgrade WS Introduction
 
ENP Belgrade WS Metadata
ENP Belgrade WS MetadataENP Belgrade WS Metadata
ENP Belgrade WS Metadata
 
Europeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday MuehlbergerEuropeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday Muehlberger
 
Europeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday GenereuxEuropeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday Genereux
 
Europeana Newspapers wp2 liber2013
Europeana Newspapers wp2 liber2013Europeana Newspapers wp2 liber2013
Europeana Newspapers wp2 liber2013
 
Europeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers OnlineEuropeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers Online
 
Europeana Newspapers Amsterdam workshop introduction
Europeana Newspapers Amsterdam workshop introductionEuropeana Newspapers Amsterdam workshop introduction
Europeana Newspapers Amsterdam workshop introduction
 
The challenges of making Europe's newspapers available online
The challenges of making Europe's newspapers available onlineThe challenges of making Europe's newspapers available online
The challenges of making Europe's newspapers available online
 
Europeana Newspapers LIBER2013 Workshop intro
Europeana Newspapers LIBER2013 Workshop introEuropeana Newspapers LIBER2013 Workshop intro
Europeana Newspapers LIBER2013 Workshop intro
 
Europeana Newspaper metadata LIBER2013
Europeana Newspaper metadata LIBER2013Europeana Newspaper metadata LIBER2013
Europeana Newspaper metadata LIBER2013
 
ENP_Dutch_Infoday_LWilms
ENP_Dutch_Infoday_LWilmsENP_Dutch_Infoday_LWilms
ENP_Dutch_Infoday_LWilms
 
Europeana Newspapers Polish Information Day
Europeana Newspapers Polish Information DayEuropeana Newspapers Polish Information Day
Europeana Newspapers Polish Information Day
 
Europeana Newspapers Aggregation Plan
Europeana Newspapers Aggregation PlanEuropeana Newspapers Aggregation Plan
Europeana Newspapers Aggregation Plan
 
IFLA 2014 Europeana Newspapers Rossitza Atanassova
IFLA 2014 Europeana Newspapers Rossitza AtanassovaIFLA 2014 Europeana Newspapers Rossitza Atanassova
IFLA 2014 Europeana Newspapers Rossitza Atanassova
 
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
 
EurnewsLDN_Clemens_Neudecker
EurnewsLDN_Clemens_NeudeckerEurnewsLDN_Clemens_Neudecker
EurnewsLDN_Clemens_Neudecker
 
The EPO and Tecnology Transfer: a brief overview 4T-Tech Transfer Think Tank
The EPO and Tecnology Transfer: a brief overview 4T-Tech Transfer Think TankThe EPO and Tecnology Transfer: a brief overview 4T-Tech Transfer Think Tank
The EPO and Tecnology Transfer: a brief overview 4T-Tech Transfer Think Tank
 
Overview of the Europeana Newspapers Project
Overview of the Europeana Newspapers ProjectOverview of the Europeana Newspapers Project
Overview of the Europeana Newspapers Project
 

En vedette

Presentation of Claus Gravenhorst, BnF Information Day
Presentation of Claus Gravenhorst, BnF Information DayPresentation of Claus Gravenhorst, BnF Information Day
Presentation of Claus Gravenhorst, BnF Information DayEuropeana Newspapers
 
Presentation of Ioannis Anagnostopoulos at BnF Information Day
Presentation of Ioannis Anagnostopoulos at BnF Information DayPresentation of Ioannis Anagnostopoulos at BnF Information Day
Presentation of Ioannis Anagnostopoulos at BnF Information DayEuropeana Newspapers
 
Presentation of Alaa Abi Haidar at the BnF Information Day
Presentation of Alaa Abi Haidar at the BnF Information DayPresentation of Alaa Abi Haidar at the BnF Information Day
Presentation of Alaa Abi Haidar at the BnF Information DayEuropeana Newspapers
 
Presentation of Philippe Mezzasalma at the BnF Information Day in Paris
Presentation of Philippe Mezzasalma at the BnF Information Day in ParisPresentation of Philippe Mezzasalma at the BnF Information Day in Paris
Presentation of Philippe Mezzasalma at the BnF Information Day in ParisEuropeana Newspapers
 
Présentation Günter Mühlberger, BnF Information Day
Présentation Günter Mühlberger, BnF Information DayPrésentation Günter Mühlberger, BnF Information Day
Présentation Günter Mühlberger, BnF Information DayEuropeana Newspapers
 

En vedette (7)

Presentation of Claus Gravenhorst, BnF Information Day
Presentation of Claus Gravenhorst, BnF Information DayPresentation of Claus Gravenhorst, BnF Information Day
Presentation of Claus Gravenhorst, BnF Information Day
 
Presentation of Ioannis Anagnostopoulos at BnF Information Day
Presentation of Ioannis Anagnostopoulos at BnF Information DayPresentation of Ioannis Anagnostopoulos at BnF Information Day
Presentation of Ioannis Anagnostopoulos at BnF Information Day
 
DocWorks Demo
DocWorks DemoDocWorks Demo
DocWorks Demo
 
Presentation of Alaa Abi Haidar at the BnF Information Day
Presentation of Alaa Abi Haidar at the BnF Information DayPresentation of Alaa Abi Haidar at the BnF Information Day
Presentation of Alaa Abi Haidar at the BnF Information Day
 
What is a named entity
What is a named entityWhat is a named entity
What is a named entity
 
Presentation of Philippe Mezzasalma at the BnF Information Day in Paris
Presentation of Philippe Mezzasalma at the BnF Information Day in ParisPresentation of Philippe Mezzasalma at the BnF Information Day in Paris
Presentation of Philippe Mezzasalma at the BnF Information Day in Paris
 
Présentation Günter Mühlberger, BnF Information Day
Présentation Günter Mühlberger, BnF Information DayPrésentation Günter Mühlberger, BnF Information Day
Présentation Günter Mühlberger, BnF Information Day
 

Similaire à Presentation of Clemens Neudecker, BnF Information Day

Europeana Newspapers in a nutshell
Europeana Newspapers in a nutshellEuropeana Newspapers in a nutshell
Europeana Newspapers in a nutshellcneudecker
 
Intelligent tools-mitja-jermol-2013-bali-7 may2013
Intelligent tools-mitja-jermol-2013-bali-7 may2013Intelligent tools-mitja-jermol-2013-bali-7 may2013
Intelligent tools-mitja-jermol-2013-bali-7 may2013MediaMixerCommunity
 
Centre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens NeudeckerCentre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens NeudeckerBiblioteca Nacional de España
 
BeOpen_Martino Maggio.pptx
BeOpen_Martino Maggio.pptxBeOpen_Martino Maggio.pptx
BeOpen_Martino Maggio.pptxFIWARE
 
An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...cneudecker
 
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...Carole Goble
 
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...Apulian ICT Living Labs
 
04 europeana newspapers
04 europeana newspapers04 europeana newspapers
04 europeana newspapersEuropeana
 
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Project
 
Introducing parthenos powerpoint presentation december 2015 updated
Introducing parthenos powerpoint presentation december 2015 updatedIntroducing parthenos powerpoint presentation december 2015 updated
Introducing parthenos powerpoint presentation december 2015 updatedParthenos
 
EDF2012 Stefano Bertolo - Future European activities and funding perspectiv...
EDF2012   Stefano Bertolo - Future European activities and funding perspectiv...EDF2012   Stefano Bertolo - Future European activities and funding perspectiv...
EDF2012 Stefano Bertolo - Future European activities and funding perspectiv...European Data Forum
 
OpenAIRE Infrastructure & Services: we need your input!
OpenAIRE Infrastructure & Services: we need your input!OpenAIRE Infrastructure & Services: we need your input!
OpenAIRE Infrastructure & Services: we need your input!OpenAIRE
 
NordForsk Open Access Reykjavik 14-15/8-2014: H2020
NordForsk Open Access Reykjavik 14-15/8-2014: H2020NordForsk Open Access Reykjavik 14-15/8-2014: H2020
NordForsk Open Access Reykjavik 14-15/8-2014: H2020NordForsk
 
The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?Jisc
 
Horizon 2020 e-infrastructures - Draft Horizon 2020 WorkProgramme 2014-2015
Horizon 2020 e-infrastructures - Draft Horizon 2020 WorkProgramme 2014-2015Horizon 2020 e-infrastructures - Draft Horizon 2020 WorkProgramme 2014-2015
Horizon 2020 e-infrastructures - Draft Horizon 2020 WorkProgramme 2014-2015Invest Northern Ireland
 
European Open Science Cloud: Concept, status and opportunities
European Open Science Cloud: Concept, status and opportunitiesEuropean Open Science Cloud: Concept, status and opportunities
European Open Science Cloud: Concept, status and opportunitiesEOSC-hub project
 
The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?Carole Goble
 
WEBINAR: "How to manage your data to make them open and fair"
WEBINAR:  "How to manage your data to make them open and fair"  WEBINAR:  "How to manage your data to make them open and fair"
WEBINAR: "How to manage your data to make them open and fair" OpenAIRE
 

Similaire à Presentation of Clemens Neudecker, BnF Information Day (20)

Europeana Newspapers in a nutshell
Europeana Newspapers in a nutshellEuropeana Newspapers in a nutshell
Europeana Newspapers in a nutshell
 
FP7-ICT Programme
FP7-ICT ProgrammeFP7-ICT Programme
FP7-ICT Programme
 
Intelligent tools-mitja-jermol-2013-bali-7 may2013
Intelligent tools-mitja-jermol-2013-bali-7 may2013Intelligent tools-mitja-jermol-2013-bali-7 may2013
Intelligent tools-mitja-jermol-2013-bali-7 may2013
 
Centre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens NeudeckerCentre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens Neudecker
 
BeOpen_Martino Maggio.pptx
BeOpen_Martino Maggio.pptxBeOpen_Martino Maggio.pptx
BeOpen_Martino Maggio.pptx
 
An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...
 
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
 
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...
 
04 europeana newspapers
04 europeana newspapers04 europeana newspapers
04 europeana newspapers
 
Enoll hannover-2013-anna
Enoll hannover-2013-annaEnoll hannover-2013-anna
Enoll hannover-2013-anna
 
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
 
Introducing parthenos powerpoint presentation december 2015 updated
Introducing parthenos powerpoint presentation december 2015 updatedIntroducing parthenos powerpoint presentation december 2015 updated
Introducing parthenos powerpoint presentation december 2015 updated
 
EDF2012 Stefano Bertolo - Future European activities and funding perspectiv...
EDF2012   Stefano Bertolo - Future European activities and funding perspectiv...EDF2012   Stefano Bertolo - Future European activities and funding perspectiv...
EDF2012 Stefano Bertolo - Future European activities and funding perspectiv...
 
OpenAIRE Infrastructure & Services: we need your input!
OpenAIRE Infrastructure & Services: we need your input!OpenAIRE Infrastructure & Services: we need your input!
OpenAIRE Infrastructure & Services: we need your input!
 
NordForsk Open Access Reykjavik 14-15/8-2014: H2020
NordForsk Open Access Reykjavik 14-15/8-2014: H2020NordForsk Open Access Reykjavik 14-15/8-2014: H2020
NordForsk Open Access Reykjavik 14-15/8-2014: H2020
 
The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?
 
Horizon 2020 e-infrastructures - Draft Horizon 2020 WorkProgramme 2014-2015
Horizon 2020 e-infrastructures - Draft Horizon 2020 WorkProgramme 2014-2015Horizon 2020 e-infrastructures - Draft Horizon 2020 WorkProgramme 2014-2015
Horizon 2020 e-infrastructures - Draft Horizon 2020 WorkProgramme 2014-2015
 
European Open Science Cloud: Concept, status and opportunities
European Open Science Cloud: Concept, status and opportunitiesEuropean Open Science Cloud: Concept, status and opportunities
European Open Science Cloud: Concept, status and opportunities
 
The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?
 
WEBINAR: "How to manage your data to make them open and fair"
WEBINAR:  "How to manage your data to make them open and fair"  WEBINAR:  "How to manage your data to make them open and fair"
WEBINAR: "How to manage your data to make them open and fair"
 

Plus de Europeana Newspapers

Europeana Newspapers Estonian Infoday Ragne Kouts
Europeana Newspapers Estonian Infoday Ragne KoutsEuropeana Newspapers Estonian Infoday Ragne Kouts
Europeana Newspapers Estonian Infoday Ragne KoutsEuropeana Newspapers
 
Europeana Newspapers Estonian Infoday Kristel Veimann
Europeana Newspapers Estonian Infoday Kristel VeimannEuropeana Newspapers Estonian Infoday Kristel Veimann
Europeana Newspapers Estonian Infoday Kristel VeimannEuropeana Newspapers
 
Europeana Newspapers Estonian Infoday Krista Kiisa
Europeana Newspapers Estonian Infoday Krista KiisaEuropeana Newspapers Estonian Infoday Krista Kiisa
Europeana Newspapers Estonian Infoday Krista KiisaEuropeana Newspapers
 
Europeana Newspapers Estonian Infoday Krista Aru
Europeana Newspapers Estonian Infoday Krista AruEuropeana Newspapers Estonian Infoday Krista Aru
Europeana Newspapers Estonian Infoday Krista AruEuropeana Newspapers
 
Europeana Newspapers Estonian Infoday Fred Puss
Europeana Newspapers Estonian Infoday Fred PussEuropeana Newspapers Estonian Infoday Fred Puss
Europeana Newspapers Estonian Infoday Fred PussEuropeana Newspapers
 
Europeana Newpapers LFT Infoday Neudecker
Europeana Newpapers LFT Infoday NeudeckerEuropeana Newpapers LFT Infoday Neudecker
Europeana Newpapers LFT Infoday NeudeckerEuropeana Newspapers
 
Europeana Newspapers LFT Infoday Thompson
Europeana Newspapers LFT Infoday ThompsonEuropeana Newspapers LFT Infoday Thompson
Europeana Newspapers LFT Infoday ThompsonEuropeana Newspapers
 
Europeana Newspapers LFT Infoday Rossi
Europeana Newspapers LFT Infoday RossiEuropeana Newspapers LFT Infoday Rossi
Europeana Newspapers LFT Infoday RossiEuropeana Newspapers
 
Europeana Newspapers LFT Infoday Messina
Europeana Newspapers LFT Infoday MessinaEuropeana Newspapers LFT Infoday Messina
Europeana Newspapers LFT Infoday MessinaEuropeana Newspapers
 
Europeana Newspapers Infoday Marchetti
Europeana Newspapers Infoday MarchettiEuropeana Newspapers Infoday Marchetti
Europeana Newspapers Infoday MarchettiEuropeana Newspapers
 
Europeana Newspapers LFT Infoday Kempf
Europeana Newspapers LFT Infoday KempfEuropeana Newspapers LFT Infoday Kempf
Europeana Newspapers LFT Infoday KempfEuropeana Newspapers
 
Europeana Newspapers LFT Infoday Bolioli
Europeana Newspapers LFT Infoday BolioliEuropeana Newspapers LFT Infoday Bolioli
Europeana Newspapers LFT Infoday BolioliEuropeana Newspapers
 

Plus de Europeana Newspapers (20)

Europeana Newspapers Estonian Infoday Ragne Kouts
Europeana Newspapers Estonian Infoday Ragne KoutsEuropeana Newspapers Estonian Infoday Ragne Kouts
Europeana Newspapers Estonian Infoday Ragne Kouts
 
Europeana Newspapers Estonian Infoday Kristel Veimann
Europeana Newspapers Estonian Infoday Kristel VeimannEuropeana Newspapers Estonian Infoday Kristel Veimann
Europeana Newspapers Estonian Infoday Kristel Veimann
 
Europeana Newspapers Estonian Infoday Krista Kiisa
Europeana Newspapers Estonian Infoday Krista KiisaEuropeana Newspapers Estonian Infoday Krista Kiisa
Europeana Newspapers Estonian Infoday Krista Kiisa
 
Europeana Newspapers Estonian Infoday Krista Aru
Europeana Newspapers Estonian Infoday Krista AruEuropeana Newspapers Estonian Infoday Krista Aru
Europeana Newspapers Estonian Infoday Krista Aru
 
Europeana Newspapers Estonian Infoday Fred Puss
Europeana Newspapers Estonian Infoday Fred PussEuropeana Newspapers Estonian Infoday Fred Puss
Europeana Newspapers Estonian Infoday Fred Puss
 
Europeana Newpapers LFT Infoday Neudecker
Europeana Newpapers LFT Infoday NeudeckerEuropeana Newpapers LFT Infoday Neudecker
Europeana Newpapers LFT Infoday Neudecker
 
Europeana Newspapers LFT Infoday Thompson
Europeana Newspapers LFT Infoday ThompsonEuropeana Newspapers LFT Infoday Thompson
Europeana Newspapers LFT Infoday Thompson
 
Europeana Newspapers LFT Infoday Rossi
Europeana Newspapers LFT Infoday RossiEuropeana Newspapers LFT Infoday Rossi
Europeana Newspapers LFT Infoday Rossi
 
Enp lft infoday_neudecker
Enp lft infoday_neudeckerEnp lft infoday_neudecker
Enp lft infoday_neudecker
 
Europeana Newspapers LFT Infoday Messina
Europeana Newspapers LFT Infoday MessinaEuropeana Newspapers LFT Infoday Messina
Europeana Newspapers LFT Infoday Messina
 
Europeana Newspapers Infoday Marchetti
Europeana Newspapers Infoday MarchettiEuropeana Newspapers Infoday Marchetti
Europeana Newspapers Infoday Marchetti
 
Europeana Newspapers LFT Infoday Kempf
Europeana Newspapers LFT Infoday KempfEuropeana Newspapers LFT Infoday Kempf
Europeana Newspapers LFT Infoday Kempf
 
Europeana Newspapers LFT Infoday Bolioli
Europeana Newspapers LFT Infoday BolioliEuropeana Newspapers LFT Infoday Bolioli
Europeana Newspapers LFT Infoday Bolioli
 
ENP_Dutch_Infoday_MWillems
ENP_Dutch_Infoday_MWillemsENP_Dutch_Infoday_MWillems
ENP_Dutch_Infoday_MWillems
 
ENP_Dutch_Infoday_PHuijnen
ENP_Dutch_Infoday_PHuijnen ENP_Dutch_Infoday_PHuijnen
ENP_Dutch_Infoday_PHuijnen
 
ENP_Dutch_Infoday_SKruizinga
ENP_Dutch_Infoday_SKruizingaENP_Dutch_Infoday_SKruizinga
ENP_Dutch_Infoday_SKruizinga
 
ENP_Dutch_infoday_HCrijns
ENP_Dutch_infoday_HCrijnsENP_Dutch_infoday_HCrijns
ENP_Dutch_infoday_HCrijns
 
ENP_Dutch_infoday_EVanEijck
ENP_Dutch_infoday_EVanEijckENP_Dutch_infoday_EVanEijck
ENP_Dutch_infoday_EVanEijck
 
ENP_ONB_infoday_Schaller
ENP_ONB_infoday_SchallerENP_ONB_infoday_Schaller
ENP_ONB_infoday_Schaller
 
ENP_ONB_infoday_Neudecker
ENP_ONB_infoday_NeudeckerENP_ONB_infoday_Neudecker
ENP_ONB_infoday_Neudecker
 

Presentation of Clemens Neudecker, BnF Information Day

  • 1. Development of Named Entities Recognition for French Newspapers. Europeana Newspapers Information Day (Journée d’information), 27/11/2014, BnF, Paris, France. Clemens Neudecker, State Library Berlin, @cneudecker
  • 2. What is “Named Entity Recognition”?
      • Named Entity Recognition (NER) is a sub-task of Information Extraction and is usually regarded as part of Computational Linguistics / Natural Language Processing.
      • The main aim of NER is the automatic extraction and classification of knowledge or information from semantically unstructured text.
      • NER is still a subject of academic research (cf. Google & MSR Competition); practical use in the cultural heritage digitisation sector remains rare.
    This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community, http://ec.europa.eu/ict_psp
  • 3. Asked differently: What is a “Named Entity”?
      • PERSON: names of persons and families, including fictional persons (“Albert Einstein”, “President of the USA”, “Mickey Mouse”)
      • ORGANISATION: names of companies and governmental or non-governmental organisations (“IBM”, “The Beatles”, “Labour Party”)
      • PLACE: cities, provinces, counties, geographical areas, and so forth (“Paris”, “Hautes-Pyrénées”, “Alpes”)
  • 4. NER (I): 1. Detection/classification of person names, places and organisations in running text (includes part-of-speech (POS) tagging)
  • 5. NER (II): 2. Disambiguation of terms (example: “Jordan”) through contextual information
  • 6. NER (III): 3. Linking to authority files and online databases (Linked Data)
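Steps 2 and 3 can be made concrete with a deliberately naive sketch. It is not how the ENP software works; the candidate list, the context words and the example.org URIs below are all hypothetical, chosen only to show how surrounding words can select one reading of “Jordan” and how the selected reading then carries a link to an authority record.

```python
# Hypothetical candidate readings for an ambiguous mention.
# Context words and URIs are invented for illustration only.
CANDIDATES = {
    "Jordan": [
        {
            "type": "PLACE",
            "uri": "http://example.org/place/jordan",
            "context": {"amman", "kingdom", "river", "border"},
        },
        {
            "type": "PERSON",
            "uri": "http://example.org/person/michael-jordan",
            "context": {"basketball", "bulls", "player", "scored"},
        },
    ]
}


def disambiguate(mention, sentence):
    """Pick the candidate whose context words overlap most with the sentence.

    Returns the chosen candidate dict, or None when the mention is unknown
    or no candidate shares any word with the sentence.
    """
    words = {w.strip(".,;:!?").lower() for w in sentence.split()}
    candidates = CANDIDATES.get(mention, [])
    if not candidates:
        return None
    best = max(candidates, key=lambda c: len(c["context"] & words))
    if not best["context"] & words:
        return None
    return best


sports = disambiguate("Jordan", "Jordan scored 30 points for the Bulls.")
travel = disambiguate("Jordan", "The king returned to Amman, the capital of Jordan.")
unknown = disambiguate("Jordan", "Jordan was mentioned once.")
```

A production system would replace the hand-written context sets with statistics learned from a knowledge base, but the shape of the decision (candidates, context, link) stays the same.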
  • 7. Supported languages in ENP (3 languages): German, Dutch, French
  • 8. Approaches: machine learning vs. rule-based
      • Advantages of machine-learning systems:
          • No need for specific linguistic expertise
          • Processing of large amounts of material
      • Advantages of rule-based systems:
          • Can be tuned to very high accuracy for particular texts
          • Adaptation to local grammar and specific text style
  • 9. Software
      • Open-source machine-learning software developed by Stanford University, adapted and extended for Europeana Newspapers by the KB, National Library of the Netherlands
      • The software is available as open source on GitHub for download and testing: https://github.com/KBNLresearch/europeananp-ner
  • 10. Training
      • Training the NER systems with the help of manually annotated corpora (“gold corpus”) and gazetteers
      • Publication of annotated data from ENP as open data
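Stanford NER, on which the ENP software is based, is trained from plain column files with one token per line and its label in a tab-separated second column, with O marking tokens outside any entity. A minimal sketch of producing such a file from annotated sentences follows. The example sentence, the label names (PER, LOC) and the blank line between sentences are assumptions for illustration, not the ENP annotation scheme.

```python
def to_column_format(sentences):
    """Serialise sentences of (token, label) pairs as tab-separated columns.

    One token per line; an empty line closes each sentence (an assumption
    borrowed from CoNLL-style files).
    """
    lines = []
    for sentence in sentences:
        for token, label in sentence:
            lines.append(f"{token}\t{label}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical hand-annotated ("gold") sentence.
gold = [
    [
        ("Victor", "PER"),
        ("Hugo", "PER"),
        ("est", "O"),
        ("né", "O"),
        ("à", "O"),
        ("Besançon", "LOC"),
        (".", "O"),
    ]
]

training_text = to_column_format(gold)
```

Writing `training_text` to a UTF-8 file would give a trainer one line per token; a gazetteer, by contrast, is just a list of known names that the trainer can use as an extra feature.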
  • 11. Encoding
      • Results of NER are stored in a library-specific format: ALTO (Analyzed Layout and Text Object)
      • ALTO from version 2.1 onwards specifically allows the use of NER “tags”:
        <String STYLEREFS="ID7" HEIGHT="132.0" WIDTH="570.0" HPOS="5937.0" VPOS="3279.0" CONTENT="Reynolds" WC="0.95238096" TAGREFS="Tag5"></String>
        <String STYLEREFS="ID7" HEIGHT="102.0" WIDTH="540.0" HPOS="18438.0" VPOS="22008.0" CONTENT="Baltimore" WC="0.82539684" TAGREFS="Tag10"></String>
        …
        <Tags>
          <NamedEntityTag ID="Tag5" TYPE="Person" LABEL="Reynolds"/>
          <NamedEntityTag ID="Tag6" TYPE="Place" LABEL="Baltimore"/>
        </Tags>
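To read such annotations back, the TAGREFS value on each String has to be resolved against the NamedEntityTag entries in the Tags block. The following is a minimal sketch using Python's standard library. The sample document is a simplified, namespace-free fragment written for this example (real ALTO files declare an XML namespace and contain far more layout structure), and it assumes TAGREFS may hold several whitespace-separated IDs.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical ALTO-like fragment modelled on the slide.
ALTO_SAMPLE = """
<alto>
  <Tags>
    <NamedEntityTag ID="Tag5" TYPE="Person" LABEL="Reynolds"/>
    <NamedEntityTag ID="Tag6" TYPE="Place" LABEL="Baltimore"/>
  </Tags>
  <Layout>
    <String CONTENT="Reynolds" WC="0.95238096" TAGREFS="Tag5"/>
    <String CONTENT="visited" WC="0.91"/>
    <String CONTENT="Baltimore" WC="0.82539684" TAGREFS="Tag6"/>
  </Layout>
</alto>
"""


def tagged_strings(alto_xml):
    """Return (content, entity_type, label) for every tagged String."""
    root = ET.fromstring(alto_xml.strip())
    # Index the tag definitions by their ID.
    tags = {
        tag.get("ID"): (tag.get("TYPE"), tag.get("LABEL"))
        for tag in root.iter("NamedEntityTag")
    }
    result = []
    for string in root.iter("String"):
        for ref in string.get("TAGREFS", "").split():
            if ref in tags:  # skip references without a definition
                entity_type, label = tags[ref]
                result.append((string.get("CONTENT"), entity_type, label))
    return result


entities = tagged_strings(ALTO_SAMPLE)
```

Keeping the entity definitions in a separate Tags block means the OCR text and its coordinates stay untouched; the NER layer is only attached by reference.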
  • 12. Problems and challenges
      • OCR errors reduce the accuracy of the classification, and the resulting noise slows down recognition.
      • Historical spelling variation, for person names and place names in particular.
      • In many cases the historical spelling variants cannot be found in online knowledge bases.
      → Specific adaptation of the software via external modules
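The slides do not describe what these external modules do. One conceivable way to soften the effect of OCR noise and historical spelling is approximate string matching against a list of known names, sketched here with Python's standard `difflib`. The gazetteer entries, the variant spellings and the 0.8 similarity cutoff are assumptions made for this example.

```python
from difflib import get_close_matches

# Hypothetical gazetteer of modern place names.
GAZETTEER = ["Strasbourg", "Marseille", "Besançon"]


def normalise_place(name, gazetteer=GAZETTEER, cutoff=0.8):
    """Map a noisy or historical spelling to the closest gazetteer entry.

    Returns None when nothing in the gazetteer is similar enough.
    """
    matches = get_close_matches(name, gazetteer, n=1, cutoff=cutoff)
    return matches[0] if matches else None


ocr_variant = normalise_place("Strasbovrg")   # typical u/v OCR confusion
old_spelling = normalise_place("Marseilles")  # variant spelling
no_match = normalise_place("Xanadu")
```

The cutoff trades recall for precision: lowering it recovers more variants but also starts to merge distinct names, so it would need tuning on real newspaper text.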
  • 13. Initial results: Dutch

                   Persons   Places   Organisations
      Precision    0.940     0.950    0.942
      Recall       0.588     0.760    0.559
      F-measure    0.689     0.838    0.671
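The slide does not say which F-measure is reported. The most common choice is the balanced F1 score, the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The sketch below computes it from the rounded precision and recall values on the slide. The results come out somewhat higher than the F-measures printed there, which would be consistent with the published figures being averaged over several evaluation runs, though that is only a guess.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall) pairs as printed on the slide.
reported = {
    "Persons": (0.940, 0.588),
    "Places": (0.950, 0.760),
    "Organisations": (0.942, 0.559),
}

f1 = {name: f_measure(p, r) for name, (p, r) in reported.items()}

for name, value in f1.items():
    print(f"{name}: {value:.3f}")
```

The pattern in the table is the typical one for a conservatively trained tagger: what it marks is almost always right (precision around 0.94), but it misses a large share of the entities (recall between 0.56 and 0.76).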
  • 14. Why Named Entity Recognition?
      • Example: analysis of log files from the newspaper collection of the National Library of Wales shows that 9 out of 10 queries are for a person or place name. (Source: Paul Gooding, “Exploring Usage of Digital Newspaper Archives through Web Log Analysis: A Case Study of Welsh Newspapers Online”, presented at DH2014, Lausanne)
  • 15. Thank you for your attention!
      @eurnews
      http://www.europeana-newspapers.eu
      http://www.theeuropeanlibrary.org/tel4/newspapers
      http://www.europeana.eu/