1. Making Sense of Microposts
(#Microposts2015) @ WWW2015
Named Entity rEcognition
and Linking Challenge
http://www.scc.lancs.ac.uk/microposts2015/challenge/
2. NEEL challenge overview
➢ Making sense of Microposts is challenging
○ they are very short text messages
○ they contain abbreviations and typos
○ they are “grammar free”
➢ The NEEL challenge aims to foster research
into novel, more accurate entity recognition
and linking approaches tailored to Microposts
3. ➢ 2013: Information Extraction (IE)
○ named entity recognition (4 types)
➢ 2014: Named Entity Extraction and Linking (NEEL)
○ named entity extraction and linking to DBpedia 3.9 entries
➢ 2015: Named Entity rEcognition and Linking (NEEL)
○ named entity recognition (7 types) and linking to DBpedia 2014 entries
4. Highlights of the submitted approaches
over the 3-year challenge
➢ normalization
○ linguistic pre-processing and expansion of tweets
➢ entity recognition and linking
○ sequential and semi-joint tasks
○ large Knowledge Bases (such as DBpedia and
Yago) as lexical dictionaries and as sources of
already existing relations among entities
○ supervised learning approaches to predict both the
entity type, given linguistic and contextual
similarity, and the link, given semantic similarity
○ unsupervised learning approaches for grouping
similar lexical entities, affecting the entity resolution
5. Sponsorship
➢ Successfully obtained sponsorship each year
○ highlights the importance of this practical research
○ an importance that extends beyond academia
➢ The sponsor has early access to results as a
senior PC member
○ an opportunity to liaise with participants to extend the work
➢ Workshop and participants obtain greater
exposure
6. ➢ An Italian company operating in the business of
knowledge extraction and representation
➢ successfully participated in the 2014 NEEL
challenge, ranking 3rd overall
8. In the end, 21 teams got involved and
signed the agreement to access the
NEEL challenge corpus
9. NEEL corpus
Split        no. of tweets   %
Training     3498            58.06
Development  500             8.30
Test         2027            33.64
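The split percentages on this slide can be cross-checked in a few lines; the dict name and layout below are mine, the tweet counts come from the table above:

```python
# Corpus split sizes from the NEEL 2015 corpus slide
splits = {"Training": 3498, "Development": 500, "Test": 2027}

total = sum(splits.values())  # 6025 tweets in total
shares = {name: round(100 * n / total, 2) for name, n in splits.items()}
```

This reproduces the 58.06 / 8.30 / 33.64 breakdown and confirms the total of 6025 tweets stated on the next slide.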
10. NEEL Corpus details
➢ 6025 tweets
○ events from 2011 and 2013, such as the London Riots
and the Oslo bombing (cf. the event-annotated tweets
provided by the Redites project)
○ events in 2014, such as the UCI Cyclo-cross World Cup
➢ Corpus available after signing the
NEEL Agreement Form
(it remains available by contacting
msm.orgcom@gmail.com)
11. Manual creation of the Gold
Standard
3-step annotation
1. unsupervised annotation, with the intent to
extract candidate links that were used as
input to the second stage; NERD-ML was
used as an off-the-shelf system
2. three human annotators analyzed and
complemented the annotations, using GATE
as the workbench
3. one domain expert reviewed and resolved
problematic cases
12. Evaluation protocol
Participants were asked to wrap their
prototypes as publicly accessible
web services following a REST-based
protocol
This widens dissemination and ensures
the reproducibility, reuse, and
correctness of the results
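The slides do not spell out the actual NEEL request/response format, so the sketch below is purely illustrative: the endpoint path, the JSON fields, and the toy capitalization heuristic for spotting entity mentions are all assumptions, not the challenge protocol. It only shows the shape of what "wrapping a prototype as a REST web service" entails:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical annotation service: NOT the official NEEL API.
class AnnotatorHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        tweet = json.loads(self.rfile.read(length))["text"]
        # Toy recognizer: mark capitalized tokens as candidate
        # entity mentions, with no link resolved (link=None).
        entities = [
            {"mention": tok, "link": None}
            for tok in tweet.split()
            if tok[:1].isupper()
        ]
        body = json.dumps({"entities": entities}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

def serve(port=0):
    """Start the service on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), AnnotatorHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A real contending entry would replace the toy heuristic with the team's recognizer and linker, so the organizers could POST test tweets to it during the evaluation periods described on the next slide.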
13. Evaluation periods
D-Time: testing the contending entries
(REST APIs) submitted by the
participants
T-Time: the final evaluation and
metric computations
14. Submissions and Runs
➢ Paper submission
○ describing the approach taken
○ identifying and detailing any limitations or
dependencies of the approach
➢ Up to 10 contending entries
○ the best of 3 used for the final ranking
19. Drop of 14 participants due to
i) the complexity of the challenge protocol,
which required broad expertise across
domains such as Information Extraction,
Data Semantics, and the Web
ii) generally low results
28. Acknowledgements
The research leading to this
work was partially supported by
the European Union’s 7th
Framework Programme via the
projects LinkedTV