1. Making Sense of Microposts
(#Microposts2015) @ WWW2015
Named Entity rEcognition
and Linking Challenge
http://www.scc.lancs.ac.uk/microposts2015/challenge/
2. NEEL challenge overview
➢ Making sense of Microposts is challenging
○ they are very short text messages
○ they contain abbreviations and typos
○ they are “grammar free”
➢ The NEEL challenge aims to foster research
into novel, more accurate entity recognition
and linking approaches tailored to Microposts
3. ➢ 2013: Information Extraction (IE)
○ named entity recognition (4 types)
➢ 2014: Named Entity Extraction and Linking (NEEL)
○ named entity extraction and linking to DBpedia 3.9 entries
➢ 2015: Named Entity rEcognition and Linking (NEEL)
○ named entity recognition (7 types) and linking to DBpedia 2014 entries
4. Highlights of the submitted approaches
over the 3-year challenge
➢ normalization
○ linguistic pre-processing and expansion of tweets
➢ entity recognition and linking
○ sequential and semi-joint tasks
○ large Knowledge Bases (such as DBpedia and
Yago) as lexical dictionaries and as sources of
already existing relations among entities
○ supervised learning approaches to predict both the
entity type, given linguistic and contextual
similarity, and the link, given semantic similarity
○ unsupervised learning approaches for grouping
similar lexical entities, affecting the entity resolution
5. Sponsorship
➢ Successfully obtained sponsorship each year
○ highlights the importance of this practical research
○ an importance that extends beyond academia
➢ The sponsor has early access to results as a
senior PC member
○ an opportunity to liaise with participants to extend the work
➢ Workshop and participants obtain greater
exposure
6. ➢ An Italian company operating in the business of
knowledge extraction and representation
➢ successfully participated in the 2014 NEEL
challenge, ranking 3rd overall
8. In the end, 21 teams got involved and
signed the agreement to access the
NEEL challenge corpus
9. NEEL corpus
Split        no. of tweets   %
Training     3498            58.06
Development  500             8.30
Test         2027            33.64
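The split percentages on this slide can be cross-checked in a few lines; the dict name and layout below are mine, the tweet counts come from the table above:

```python
# Corpus split sizes from the NEEL 2015 corpus slide
splits = {"Training": 3498, "Development": 500, "Test": 2027}

total = sum(splits.values())  # 6025 tweets in total
shares = {name: round(100 * n / total, 2) for name, n in splits.items()}
```

This reproduces the 58.06 / 8.30 / 33.64 breakdown and confirms the total of 6025 tweets stated on the next slide.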
10. NEEL Corpus details
➢ 6025 tweets
○ events from 2011 and 2013, such as the London Riots
and the Oslo bombing (cf. the event-annotated tweets
provided by the Redites project)
○ events in 2014, such as the UCI Cyclo-cross World Cup
➢ Corpus available after signing the
NEEL Agreement Form
(it remains available by contacting
msm.orgcom@gmail.com)
11. Manual creation of the Gold
Standard
3-step annotation
1. unsupervised annotation, with the intent to
extract candidate links that were used as
input to the second stage; NERD-ML was
used as an off-the-shelf system
2. three human annotators analyzed and
complemented the annotations, using GATE
as the workbench
3. one domain expert reviewed and resolved
problematic cases
12. Evaluation protocol
Participants were asked to wrap their
prototypes as publicly accessible
web services following a REST-based
protocol
This widens dissemination and ensures
the reproducibility, reuse, and
correctness of the results
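The slides do not spell out the actual NEEL request/response format, so the sketch below is purely illustrative: the endpoint path, the JSON fields, and the toy capitalization heuristic for spotting entity mentions are all assumptions, not the challenge protocol. It only shows the shape of what "wrapping a prototype as a REST web service" entails:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical annotation service: NOT the official NEEL API.
class AnnotatorHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        tweet = json.loads(self.rfile.read(length))["text"]
        # Toy recognizer: mark capitalized tokens as candidate
        # entity mentions, with no link resolved (link=None).
        entities = [
            {"mention": tok, "link": None}
            for tok in tweet.split()
            if tok[:1].isupper()
        ]
        body = json.dumps({"entities": entities}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

def serve(port=0):
    """Start the service on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), AnnotatorHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A real contending entry would replace the toy heuristic with the team's recognizer and linker, so the organizers could POST test tweets to it during the evaluation periods described on the next slide.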
13. Evaluation periods
D-Time: testing the contending entries
(REST APIs) submitted by the
participants
T-Time: the final evaluation and
metric computations
14. Submissions and Runs
➢ Paper submission
○ describing the approach taken
○ identifying and detailing any limitations or
dependencies of the approach
➢ Up to 10 contending entries
○ the best of 3 used for the final ranking
19. Drop of 14 participants due to
i) the complexity of the challenge protocol,
which required broad expertise across
domains such as Information Extraction,
Data Semantics, and the Web
ii) generally low results
28. Acknowledgements
The research leading to this
work was partially supported by
the European Union’s 7th
Framework Programme via the
projects LinkedTV