Presentation given at NLP&DBpedia workshop on 18 October 2016. The presentation accompanies the work described in: https://nlpdbpedia2016.files.wordpress.com/2016/09/nlpdbpedia2016_paper_9.pdf
2. Conclusions
• Fine-grained entity typing is necessary for semantic queries over text
• The search space for Word2Vec is large; topics help
• Combining distributional semantics with DBpedia can help overcome NIL and dark entities
https://github.com/MvanErp/entity-typing/
3. Dark entities: entities for which little or no information is available in the knowledge base (KB)
5. Distributional Semantics
• Similar concepts (denoted by words) occur in similar contexts
• Word2Vec (Mikolov et al., 2013) is a popular implementation that exploits this notion
[Figure: word clusters in embedding space — Japanese food: Sushi, Teriyaki, Udon, Okonomiyaki, Soba, Sashimi; Japanese clothing: Kimono, Yukata, Nemaki; Western food: KFC, Steak, Hamburger, McDonald’s; Western clothing: Jeans, T-shirt, Skirt]
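The clustering idea can be sketched with cosine similarity over toy vectors. The 3-dimensional vectors below are invented for illustration (real Word2Vec embeddings have hundreds of dimensions), but they show how words from the same cluster score higher than words from different clusters:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 3-d "embeddings" (invented; real models use 100-300 dims).
vectors = {
    "sushi":  np.array([0.9, 0.1, 0.0]),
    "udon":   np.array([0.8, 0.2, 0.1]),
    "kimono": np.array([0.1, 0.9, 0.2]),
    "jeans":  np.array([0.0, 0.8, 0.7]),
}

print(cosine(vectors["sushi"], vectors["udon"]))    # high: same cluster
print(cosine(vectors["sushi"], vectors["kimono"]))  # low: different cluster
```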
6. Research Question:
• Can we predict the type of the concept ‘Sushi’ by modelling it in a distributional semantics space and comparing its vector to the vectors of concepts for which we do know the type?
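The question above amounts to nearest-neighbour type prediction: give the untyped concept the type of its most similar known-typed concept. A minimal sketch, with invented vectors and DBpedia-style type labels:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings with known types (all values invented for illustration).
typed_vectors = {
    "udon":     (np.array([0.8, 0.2, 0.1]), "dbo:Food"),
    "teriyaki": (np.array([0.9, 0.2, 0.0]), "dbo:Food"),
    "kimono":   (np.array([0.1, 0.9, 0.2]), "dbo:Clothing"),
    "jeans":    (np.array([0.0, 0.8, 0.7]), "dbo:Clothing"),
}

def predict_type(query_vec, typed_vectors):
    """Return the type of the most similar known-typed concept."""
    best = max(typed_vectors.values(), key=lambda vt: cosine(query_vec, vt[0]))
    return best[1]

sushi = np.array([0.9, 0.1, 0.0])
print(predict_type(sushi, typed_vectors))  # → dbo:Food
```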
7. Setup
• 7 Named Entity Linking benchmark datasets (AIDA-YAGO, 2014 NEEL, 2015 NEEL, OKE2015, RSS500, WES2015, Wikinews)
• 3 Word2Vec models: GoogleNews, English Wikipedia, Reuters RCV1*
• Compare all entities within the datasets to each other and return the highest-ranking type (as taken from DBpedia)
* AIDA-YAGO is part of Reuters RCV1
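The comparison step can be sketched as a leave-one-out evaluation: each entity is ranked against all others and receives the type of its nearest neighbour. The entities, vectors, and gold types below are invented toy data, not the benchmark datasets:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Invented toy "dataset": entity -> (vector, gold DBpedia-style type).
dataset = {
    "sushi":    (np.array([0.9, 0.1, 0.0]), "dbo:Food"),
    "udon":     (np.array([0.8, 0.2, 0.1]), "dbo:Food"),
    "teriyaki": (np.array([0.9, 0.2, 0.0]), "dbo:Food"),
    "kimono":   (np.array([0.1, 0.9, 0.2]), "dbo:Clothing"),
    "jeans":    (np.array([0.0, 0.8, 0.7]), "dbo:Clothing"),
    "t-shirt":  (np.array([0.1, 0.7, 0.6]), "dbo:Clothing"),
}

def evaluate(dataset):
    """Leave-one-out: predict each entity's type from its nearest neighbour."""
    correct = 0
    for name, (vec, gold) in dataset.items():
        others = [(cosine(vec, v), t) for n, (v, t) in dataset.items() if n != name]
        _, predicted = max(others)  # highest-ranking neighbour's type
        correct += (predicted == gold)
    return correct / len(dataset)

print(evaluate(dataset))  # accuracy on the toy data
```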
11. Conclusions and Future Work
• A difficult task, but topics help
• The ranking needs to be improved
• Multi-class classification (KFC: food & organisation; Arnold Schwarzenegger: actor & politician)
• What else can we discover beyond type?