"NERD: an open source platform for extracting and disambiguating named entities in very diverse documents" - Keynote Talk given at the NLP&DBpedia International Workshop (NLP&DBpedia), 22 October 2013
1. NERD: an open source
platform for extracting and
disambiguating named entities
in very diverse documents
Raphaël Troncy <raphael.troncy@eurecom.fr>
Giuseppe Rizzo <giuseppe.rizzo@eurecom.fr>
2. What is a Named Entity recognition task?
A task that aims to locate and classify, in a textual document, the names of persons, organizations, locations, brands and products, as well as numeric expressions such as times, dates, monetary amounts and percentages
22/10/2013 -
NLP&DBpedia International Workshop, Sydney, October 2013
-2
3. Example
“I want to book a room in a hotel located in the heart of Paris, just a stone’s throw from the Eiffel Tower”
Eric Charton, “Named Entity Detection and Entity Linking in the Context of Semantic Web: Exploring the ambiguity question”
5. What is Paris? Type Ambiguity
dbpedia-owl:Asteroid
schema:City
schema:Movie
dbpedia-owl:Film
Giuseppe Rizzo, “Learning with the Web: Structuring data to
ease machine understanding”
6. Named Entity Recognition (NER)
Token   POS   NER
I       PRP   O
want    VBP   O
to      TO    O
book    VB    O
a       DT    O
room    NN    O
in      IN    O
…       …     …
Paris   NNP   LOC
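The token, POS and NER columns above can be held as simple triples. A minimal Python sketch, assuming a flat tagging scheme in which "O" marks tokens outside any entity; the helper name `extract_entities` is hypothetical and the elided tokens of the example are omitted:

```python
# (token, POS tag, NER label) triples from the slide; elided tokens are omitted.
TAGGED = [
    ("I", "PRP", "O"), ("want", "VBP", "O"), ("to", "TO", "O"),
    ("book", "VB", "O"), ("a", "DT", "O"), ("room", "NN", "O"),
    ("in", "IN", "O"), ("Paris", "NNP", "LOC"),
]

def extract_entities(tagged):
    """Group consecutive tokens sharing the same non-'O' label into mentions."""
    entities = []
    current_tokens, current_label = [], None
    for token, _pos, label in tagged:
        if label != "O" and label == current_label:
            # Same label as the previous token: extend the current mention.
            current_tokens.append(token)
            continue
        if current_tokens:
            # The label changed: close the mention collected so far.
            entities.append((" ".join(current_tokens), current_label))
        if label != "O":
            current_tokens, current_label = [token], label
        else:
            current_tokens, current_label = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_label))
    return entities

print(extract_entities(TAGGED))
```

With a flat scheme two adjacent entities of the same type would be merged; a BIO scheme avoids this, but the slide does not say which scheme is used.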
Giuseppe Rizzo, “Learning with the Web: Structuring data to
ease machine understanding”
7. What is Paris? Name Ambiguity
Paris, Kentucky
Paris, France
Paris, Maine
Paris, Idaho
Paris, Tennessee
Paris, Ontario
Giuseppe Rizzo, “Learning with the Web: Structuring data to
ease machine understanding”
8. Named Entity Linking (NEL)
Token   POS   NER   Link
I       PRP   O     O
want    VBP   O     O
to      TO    O     O
book    VB    O     O
a       DT    O     O
room    NN    O     O
in      IN    O     O
…       …     …     …
Paris   NNP   LOC   http://dbpedia.org/resource/Paris
Giuseppe Rizzo, “Learning with the Web: Structuring data to
ease machine understanding”
9. NER Tools and Web APIs
Standalone software: GATE, Stanford CoreNLP, Temis
Web APIs: http://nerd.eurecom.fr/
10. NERD: Named Entity Recognition and Disambiguation
Compare the performance of NER and NEL tools
Understand strengths and weaknesses of different Web APIs
Adapt NER processing to different contexts
(Learn how to) Combine NER (/ NEL) tools
Participate in various benchmarks
11. What is NERD?
Ontology: http://nerd.eurecom.fr/ontology
REST API: http://nerd.eurecom.fr/api/application.wadl
UI: http://nerd.eurecom.fr
12. Factual comparison of 10 Web NER tools
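Language
  AlchemyAPI: EN, FR, GR, IT, PT, RU, SP, SW
  DBpedia Spotlight: EN, GR*, PT*, SP*
  Evri: EN, IT
  Extractiv: EN
  Lupedia: EN, FR, IT
  OpenCalais: EN, FR, SP
  Saplo: EN, SW
  Wikimeta: EN, FR, SP
  Yahoo!: EN
  Zemanta: EN
Granularity
  AlchemyAPI: OEN
  DBpedia Spotlight: OEN
  Evri: OED
  Extractiv: OEN
  Lupedia: OEN
  OpenCalais: OEN
  Saplo: OED
  Wikimeta: OEN
  Yahoo!: OEN
  Zemanta: OED
Entity position
  AlchemyAPI: N/A
  DBpedia Spotlight: char offset
  Evri: N/A
  Extractiv: word offset
  Lupedia: range of chars
  OpenCalais: char offset
  Saplo: N/A
  Wikimeta: POS offset
  Yahoo!: range of chars
  Zemanta: N/A
Classification schema
  AlchemyAPI: Alchemy
  DBpedia Spotlight: DBpedia, FreeBase, Schema.org
  Evri: Evri
  Extractiv: DBpedia
  Lupedia: DBpedia, LinkedMDB
  OpenCalais: OpenCalais
  Saplo: N/A
  Wikimeta: ESTER
  Yahoo!: Yahoo
  Zemanta: FreeBase
Number of classes
  AlchemyAPI: 324
  DBpedia Spotlight: 320
  Evri: 5
  Extractiv: 34
  Lupedia: 319
  OpenCalais: 95
  Saplo: 5
  Wikimeta: 7
  Yahoo!: 13
  Zemanta: 81
Response format
  AlchemyAPI: JSON, MicroFormat, XML, RDF
  DBpedia Spotlight: HTML, JSON, RDF, XML
  Evri: HTML, JSON, RDF
  Extractiv: HTML, JSON, RDF, XML
  Lupedia: HTML, JSON, RDFa, XML
  OpenCalais: JSON, MicroFormat
  Saplo: JSON
  Wikimeta: JSON, XML
  Yahoo!: JSON, XML
  Zemanta: XML, JSON, RDF
Quota (calls/day)
  AlchemyAPI: 30000
  DBpedia Spotlight: unlimited
  Evri: 3000
  Extractiv: 3000
  Lupedia: unlimited
  OpenCalais: 50000
  Saplo: 1333
  Wikimeta: unlimited
  Yahoo!: 5000
  Zemanta: 10000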
13. NERD Ontology
Aligned the taxonomies used by the extractors
14. Building the NERD Ontology
NERD type       Occurrence
Person          10
Organization    10
Country         6
Company         6
Location        6
Continent       5
City            5
RadioStation    5
Album           5
Product         5
...             ...
15. NERD REST API
Resources: /document, /user, /annotation/{extractor}, /extraction, /evaluation, ...
Methods: GET, POST, PUT, DELETE
Response formats: RDF, JSON
"entities" : [{
  "entity": "Tim Berners-Lee",
  "type": "Person",
  "uri": "http://dbpedia.org/resource/Tim_berners_lee",
  "nerdType": "http://nerd.eurecom.fr/ontology#Person",
  "startChar": 30,
  "endChar": 45,
  "confidence": 1,
  "relevance": 0.5
}]
Rizzo G., Troncy R. (2012), NERD: A Framework for Unifying Named Entity Recognition and Disambiguation Web Extraction
Tools. In: European chapter of the Association for Computational Linguistics (EACL'12), Avignon, France.
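A client consuming the JSON serialization could filter entities by their NERD class. A minimal Python sketch, assuming the response is a JSON object with a top-level "entities" array (the slide shows only that fragment, wrapped here in braces); the function name `entities_of_type` is hypothetical and no call to the actual API is made:

```python
import json

# Fragment from the slide, wrapped in braces so that it is a complete JSON object.
RESPONSE = """
{"entities": [{
  "entity": "Tim Berners-Lee",
  "type": "Person",
  "uri": "http://dbpedia.org/resource/Tim_berners_lee",
  "nerdType": "http://nerd.eurecom.fr/ontology#Person",
  "startChar": 30,
  "endChar": 45,
  "confidence": 1,
  "relevance": 0.5
}]}
"""

def entities_of_type(response_text, nerd_class):
    """Return (surface form, URI, start, end) for entities of one NERD class."""
    suffix = "#" + nerd_class
    return [
        (e["entity"], e["uri"], e["startChar"], e["endChar"])
        for e in json.loads(response_text)["entities"]
        if e["nerdType"].endswith(suffix)
    ]

print(entities_of_type(RESPONSE, "Person"))
```

Filtering on "nerdType" rather than "type" keeps the client independent of each extractor's own taxonomy, which is the point of the aligned ontology.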
16. NERD meets NIF
Model documents through a set of strings dereferenceable on the Web
:offset_23107_23110 a str:String ;
    str:referenceContext :offset_0_26546 .
Map string to entity
:offset_23107_23110 sso:oen dbpedia:W3C .
Classification
dbpedia:W3C rdf:type nerd:Organization .
Rizzo G, Troncy R., Hellmann S. and Bruemmer M. (2012), NERD meets NIF: Lifting NLP Extraction Results to the Linked
Data Cloud. In: (LDOW'12) Linked Data on the Web (WWW'12), Lyon, France.
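The three statements above follow a regular pattern, so they can be generated from an annotation's offsets. A minimal Python sketch using plain string formatting (no RDF library); the function names are hypothetical, prefix declarations are left out, and the `str:` and `sso:` prefixes are simply reused as they appear on the slide:

```python
def offset_uri(start, end):
    """Local name for an offset-based string, following the slide's pattern."""
    return f":offset_{start}_{end}"

def nif_triples(start, end, doc_length, entity, nerd_type):
    """Build the statements shown on the slide as Turtle lines."""
    mention = offset_uri(start, end)
    context = offset_uri(0, doc_length)  # the whole document as reference context
    return [
        f"{mention} a str:String ;",
        f"    str:referenceContext {context} .",
        f"{mention} sso:oen {entity} .",
        f"{entity} rdf:type {nerd_type} .",
    ]

for line in nif_triples(23107, 23110, 26546, "dbpedia:W3C", "nerd:Organization"):
    print(line)
```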
19. History of NER benchmarks
CoNLL 2003 and CoNLL 2005
schema (4 types): person, organization, location and miscellaneous
ACE 2004, ACE 2005 and ACE 2007
schema (7 types): person, organization, location, facility, weapon,
vehicle and geo-political entity
entity recognition, co-ref, find relationships among entities extracted
TAC 2009 (Knowledge Base Track)
schema (3 types): person, organization and location
create a knowledge base from the named entities extracted
ETAPE 2012 (Named Entity Task)
schema: Quaero (7 main types, 32 sub-types)
MSM 2013: tweet corpus !
schema (4 types): person, organization, location, miscellaneous
20. ETAPE 2012 challenge
genre            train     dev      test     sources
TV news          7h 40m    1h 40m   1h 40m   BFM Story, Top Questions (LCP)
TV debates       10h 30m   5h 10m   5h 10m   Pile et Face, Ca vous regarde, Entre les lignes (LCP)
TV amusements    -         1h 05m   1h 05m   La place du village (TV8)

                       Train     Dev       Eval
Item length            26h       10h 55m   10h 55m
Nb files               44        15        15
Nb words               290517    91656     115511
Nb Named Entities      46763     14398     13055
Nb unique categories   33        33        33
21. NERD @ ETAPE (naïve combined strategy)
extraction: (eA1,tA1,URIA1,siA1,eiA1), (eA2,tA2,URIA2,siA2,eiA2), (eA3,tA3,URIA3,siA3,eiA3), ...
cleaning
fusion: (eN1,tN1,URIN1,siN1,eiN1), (eN2,tN2,URIN2,siN2,eiN2)
When at least 2 extractors classify the same entity with a different type, we apply a preferred selection order (empirically defined): Wikimeta, AlchemyAPI, OpenCalais, Lupedia
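The priority-based conflict resolution can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used at ETAPE: each tuple is read as (entity, type, URI, start index, end index), two annotations are taken to denote the same entity when their (start, end) spans are equal, and the function name `fuse`, the offsets and the type labels in the example are made up.

```python
# Preferred selection order from the slide (earlier means higher priority).
PRIORITY = ["Wikimeta", "AlchemyAPI", "OpenCalais", "Lupedia"]

def fuse(extractions):
    """Merge per-extractor annotations, keeping one result per (start, end) span.

    `extractions` maps an extractor name to a list of
    (entity, type, uri, start, end) tuples. When several extractors annotate
    the same span, the tuple from the extractor earliest in PRIORITY wins.
    """
    rank = {name: i for i, name in enumerate(PRIORITY)}
    best = {}  # (start, end) -> (rank of the winning extractor, its tuple)
    for extractor, annotations in extractions.items():
        r = rank.get(extractor, len(PRIORITY))  # unknown extractors go last
        for ann in annotations:
            span = (ann[3], ann[4])
            if span not in best or r < best[span][0]:
                best[span] = (r, ann)
    # Return the winners ordered by their position in the text.
    return [ann for _span, (_r, ann) in sorted(best.items())]

example = {
    "Lupedia": [("Paris", "City", "http://dbpedia.org/resource/Paris", 65, 70)],
    "Wikimeta": [("Paris", "LOC", "http://dbpedia.org/resource/Paris", 65, 70)],
}
print(fuse(example))
```

The sketch applies the order to every shared span, which gives the same type as the slide's rule when the extractors already agree.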
22. Participation at ETAPE (combined+ strategy)
ETAPE Train & Dev
...
Learned model
POS tagger
Created static rules
Apply rules
extraction: (eA1,tA1,URIA1,siA1,eiA1), (eA2,tA2,URIA2,siA2,eiA2), (e1,t1,URI1,si1,ei1)
fusion: (eN1,tN1,URIN1,siN1,eiN1), (eN2,tN2,URIN2,siN2,eiN2)
Conflicts handled by priority selection: own, Wikimeta, AlchemyAPI, OpenCalais, Lupedia
31. Linking pieces of knowledge
32. Linking pieces of knowledge
33. Named Entities for Video Classification
34. Workflow
Steps:
1: Video URL
2: Metadata
3: metadata RDFizator
4: NERDify
5: Timed Text
6: NEs with time alignment (json)
7: RDFize (ttl)
8: Generate Category
9: SPARQL query
Components: Media Fragment Enricher Services, Metadata & timed-text, NERD Client, Triple Store, Categorization
Media Fragment Enricher UI: Video and metadata preview, Video replay with subtitles and aligned NEs
35. Channel signature based on NE distribution
38. ... and enrichment for hypervideos
CONCEPT IN PLAYER
Cubism
Expressionism
Fauvism
FACETS / PROPERTIES OF CONCEPT
CONTENT ENRICHMENT
39. Media Fragments and Annotations
http://data.linkedtv.eu/media/e2899e7f#t=840,900
nerd:Location: Casablanca
nerd:Location: Cafe Rick
nerd:Person: H. Bogart
nerd:Person: I. Bergman
Media Fragment URI 1.0: Chapters, Scenes, Shots, etc.
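The temporal part of the URI above (`#t=840,900`) can be read with the Python standard library. A minimal sketch, assuming only the plain-seconds form with both a start and an end; Media Fragment URI 1.0 allows other forms that are not handled here, and the function name `temporal_fragment` is hypothetical:

```python
from urllib.parse import urldefrag, parse_qs

def temporal_fragment(uri):
    """Return (media URL, start seconds, end seconds) for a '#t=start,end' URI.

    Only the plain-seconds form with both bounds is handled.
    """
    url, fragment = urldefrag(uri)
    value = parse_qs(fragment)["t"][0]  # the fragment uses name=value pairs
    start, end = (float(x) for x in value.split(","))
    return url, start, end

print(temporal_fragment("http://data.linkedtv.eu/media/e2899e7f#t=840,900"))
```

Annotations such as nerd:Person or nerd:Location can then be attached to the interval from second 840 to second 900 rather than to the whole video.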
41. Media Fragment + Open Annotation + NERD
Locator
MediaResource
OffsetBasedString
Annotation
MediaFragment
Entity
Type
URL (hyperlink)
42. Towards a Linked Media Layer
Enriching media with media from a closed collection (e.g. BBC archive)
The MediaEval scenario (~ 1697 hours of archived BBC video)
http://www.multimediaeval.org/mediaeval2013/hyper2013/
Enriching media with content from the open web
LinkedTV scenarios: whitelisted web sites for each program
Media Collector for Social Media
43. Seed video enriched with web content
rbbaktuell_20120809
nerd:Location
Brandenburg
oa
45. Media Finder (named entities clustering)
46. Media Finder (zooming in a cluster)
47. Media Finder: http://mediafinder.eurecom.fr/
Live Topic Generation from Event Streams
WWW 2013 Demo Session
http://www.youtube.com/watch?v=8iRiwz7cDYY
48. Credits
Giuseppe Rizzo, Vuk Milicic,
José Luis Redondo Garcia (EURECOM)
Thomas Steiner (Google Inc.)
Marieke van Erp (Free University of Amsterdam)
Yunjia Li (University of Southampton)
… and many other students