NUIG Research Showcase 2014
1. Entity Linking with Multiple Knowledge Bases
What is the text talking about?
Motivation
Written communication has long been a common way of sharing knowledge between humans.
Machines, however, process natural language text as a sequence of characters without any
meaning.
When asked about a term (a sequence of characters), a computer can spot that sequence but
cannot explain its meaning.
Bianca Pereira
This project has been funded by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289.
Proposed Solution
Even big cross-domain Knowledge Bases do not cover all knowledge in the world.
Our solution therefore aims to use multiple Knowledge Bases to perform Entity Linking. In
other words, we want to enable the use of different sources of concepts.
Our approach is based on three main steps: selection of textual features, selection of
Knowledge Base features, and use of a Collective Inference algorithm.
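A minimal sketch of how candidate concepts could be gathered from more than one Knowledge Base, assuming each Knowledge Base can be exposed as a mapping from labels to concept identifiers. The dictionaries, identifiers, and function names below are hypothetical and are not taken from the project.

```python
from collections import defaultdict

# Toy Knowledge Bases: each maps a label to the concepts it may denote.
# All identifiers are hypothetical placeholders.
KB_GENERAL = {
    "michael jackson": ["general:Michael_Jackson_(singer)"],
    "black or white": ["general:Black_or_White_(song)"],
}
KB_MUSIC = {
    "michael jackson": ["music:artist/singer", "music:artist/composer"],
    "blame it on the boogie": ["music:work/blame_it_on_the_boogie"],
}


def build_index(knowledge_bases):
    """Merge the label -> concepts maps of several Knowledge Bases."""
    index = defaultdict(list)
    for kb in knowledge_bases:
        for label, concepts in kb.items():
            index[label].extend(concepts)
    return index


def candidates(index, term):
    """Return every concept, from any Knowledge Base, that the term may cite."""
    return list(index.get(term.lower(), []))


index = build_index([KB_GENERAL, KB_MUSIC])
print(candidates(index, "Michael Jackson"))
print(candidates(index, "Blame it on the Boogie"))
```

In this toy data, "Michael Jackson" is ambiguous across both sources, while "Blame it on the Boogie" is only covered by the domain-specific one, which is the situation that motivates using several Knowledge Bases.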
When a human reader wants to understand the content of a text, she uses the words around a
given term (context words) to determine its meaning. Noun phrases and verbs are the main
source of information. Likewise, words appearing near the term are more relevant
than those appearing far from it in the text. In a computer-based environment, those features
are extracted and used to measure how probable it is that a given concept in the Knowledge
Base has been cited by that term.
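The idea of distance-weighted context words can be sketched as follows. This is an illustration under simplifying assumptions, not the project's feature extractor: tokens are split on whitespace, the weight is the inverse of the distance to the term, no part-of-speech filtering for noun phrases and verbs is done, and each candidate concept is described by a hypothetical set of keywords.

```python
def context_weights(tokens, term_index, window=10):
    """Weight each context word by its distance to the term (closer = heavier)."""
    weights = {}
    for i, token in enumerate(tokens):
        distance = abs(i - term_index)
        if distance == 0 or distance > window:
            continue  # skip the term itself and words outside the window
        word = token.lower()
        weights[word] = max(weights.get(word, 0.0), 1.0 / distance)
    return weights


def candidate_probabilities(tokens, term_index, candidate_keywords):
    """Turn weighted keyword overlap into a probability for each candidate."""
    weights = context_weights(tokens, term_index)
    raw = {
        concept: sum(w for word, w in weights.items() if word in keywords)
        for concept, keywords in candidate_keywords.items()
    }
    total = sum(raw.values())
    if total == 0:
        # No evidence in the context: fall back to a uniform distribution.
        return {concept: 1.0 / len(raw) for concept in raw}
    return {concept: s / total for concept, s in raw.items()}


tokens = "Jackson the composer of Blame it on the Boogie".split()
keywords = {
    "artist:singer": {"singer", "pop"},          # hypothetical descriptions
    "artist:composer": {"composer", "songwriter"},
}
probs = candidate_probabilities(tokens, 0, keywords)
print(probs)
```

Here the word "composer" sits two tokens away from the term, so it contributes a weight of 0.5 to the composer candidate and nothing to the singer candidate.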
When analyzing those context words, a human also maps the words in the text to her
previous knowledge. This is used to modify the probability that the
term is citing a given concept instead of another one. In a computer-based environment, the
relationships between concepts in a Knowledge Base can be used to modify the probability of
linking with a given entry.
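One possible way to let Knowledge Base relationships modify a linking probability is sketched below. It assumes the Knowledge Base is available as (subject, relation, object) triples and that a candidate related to a concept mentioned nearby gets its probability multiplied by a fixed boost. The triples, identifiers, and the boost value are hypothetical choices made for illustration.

```python
# Hypothetical triples in the style of the singer_of / composer_of relations.
TRIPLES = {
    ("artist:composer", "composer_of", "work:blame_it_on_the_boogie"),
    ("artist:singer", "singer_of", "work:black_or_white"),
}


def related(a, b, triples):
    """True if the two concepts are directly connected, in either direction."""
    return any({s, o} == {a, b} for s, _, o in triples)


def adjust(probabilities, neighbour_concepts, triples, boost=2.0):
    """Boost candidates related to a neighbouring concept, then renormalise."""
    adjusted = {}
    for concept, p in probabilities.items():
        linked = any(related(concept, n, triples) for n in neighbour_concepts)
        adjusted[concept] = p * (boost if linked else 1.0)
    total = sum(adjusted.values())
    if total == 0:
        return adjusted
    return {concept: p / total for concept, p in adjusted.items()}


prior = {"artist:singer": 0.5, "artist:composer": 0.5}
posterior = adjust(prior, ["work:blame_it_on_the_boogie"], TRIPLES)
print(posterior)
```

With equal priors, the candidate connected to the neighbouring work through composer_of ends up with two thirds of the probability mass.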
In the last step, a human uses the coherence of a text to understand all of its terms.
The basic assumption is that terms appearing in a coherent text
are somehow related in the previous knowledge of the reader (unless they are concepts
introduced by the text). In a computer-based environment, this step aggregates all features
and, using the computed probabilities, finds the best link between each term in the
text and its respective concept in the Knowledge Base. This is done through a process
called Collective Inference.
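The joint decision can be illustrated with an exhaustive search over all combinations of candidates. This is a toy stand-in chosen for clarity, not the Collective Inference algorithm used in the project: it assumes per-term probabilities are already computed, scores coherence as the number of directly related concept pairs, and would not scale beyond a handful of terms. All identifiers are hypothetical.

```python
from itertools import product


def coherence(concepts, triples):
    """Count pairs of chosen concepts that are directly related in the KB."""
    pairs = 0
    for i in range(len(concepts)):
        for j in range(i + 1, len(concepts)):
            if any({s, o} == {concepts[i], concepts[j]} for s, _, o in triples):
                pairs += 1
    return pairs


def collective_inference(local_probs, triples, coherence_weight=1.0):
    """Pick one concept per term, maximising local evidence times coherence."""
    terms = list(local_probs)
    best, best_score = None, float("-inf")
    for combo in product(*(local_probs[t].items() for t in terms)):
        local = 1.0
        for _, p in combo:
            local *= p
        chosen = [concept for concept, _ in combo]
        score = local * (1.0 + coherence_weight * coherence(chosen, triples))
        if score > best_score:
            best, best_score = dict(zip(terms, chosen)), score
    return best


triples = {("artist:composer", "composer_of", "work:blame_it_on_the_boogie")}
local_probs = {
    "Michael Jackson": {"artist:singer": 0.6, "artist:composer": 0.4},
    "Blame it on the Boogie": {"work:blame_it_on_the_boogie": 1.0},
}
links = collective_inference(local_probs, triples)
print(links)
```

Taken alone, the local evidence favours the singer (0.6 against 0.4), but the composer is related to the other concept in the text, so the joint score (0.8 against 0.6) selects the composer.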
Problem Statement
Natural language texts are hard to understand due to two linguistic features: polysemy and
synonymy.
Related Work
Humans process the content of a text first by matching the terms with their previous
knowledge. In a computer-based environment this previous knowledge is given by a
Knowledge Base.
In Computer Science, the task that mimics this linking process is called Entity Linking: it
links terms in a text with the Knowledge Base entries that represent the same real-world
concept.
Previous work [1][2] has been successful in linking text with cross-domain Knowledge Bases
(e.g. Wikipedia, DBpedia and YAGO).
Challenges
The disambiguation of terms is our key challenge: in other words, identifying the right
concept for each term cited in the text.
Since our goal is to use multiple Knowledge Bases, there are two further challenges
to address: the processing of Big Data and the heterogeneity in the semantic description
of Knowledge Bases.
Figure: This text is not meaningful for machines.
SOURCE: http://google.com, http://bing.com, http://yahoo.com
Polysemy happens when a single term may be related to more than one concept.
Synonymy happens when there are many terms that refer to the same concept.
Figure labels: Jackson; NUIG; National University of Ireland, Galway
Figure: example sentences and link targets
Michael Jackson, the singer of Black or White, died in 2009.
http://en.wikipedia.org/wiki/Michael_Jackson
http://en.wikipedia.org/wiki/Black_or_White
I started my night watching Copacabana and ended in a party dancing Havana D’Primera. (marked "X X" in the figure)
Michael Jackson, the composer of Blame it on the Boogie, has the same name as the member of Jackson 5. (marked "? ?" in the figure)
Figure labels: context words; singer_of; composer_of
http://musicbrainz.org/work/8ffc75e5-3ddb-4a6a-a2d5-8ec5ecee1c78
http://musicbrainz.org/artist/f27ec8db-af05-4f36-916e-3d57f91ecf5e
http://musicbrainz.org/artist/059e57d8-af63-4d90-8078-ebed36985fff
Main Findings
Not all Knowledge Bases contain textual descriptions for all concepts, as most previous
work assumes.
Is it possible to perform Entity Linking with Knowledge Bases other than the previous
cross-domain ones [3]?
How does the method perform when applied to cross-domain ones [4]?
To be continued... (a.k.a. Future Work)
References
[1] Hachey, B., Radford, W., Nothman, J., Honnibal, M., & Curran, J. R. (2013). Evaluating Entity Linking with Wikipedia. Artificial Intelligence, 194, 130-150.
[2] Hoffart, J., Yosef, M. A., Bordino, I., Fürstenau, H., Pinkal, M., Spaniol, M., … & Weikum, G. (2011, July). Robust disambiguation of named entities in text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 782-792). Association for Computational Linguistics.
[3] Pereira, B., Aggarwal, N., & Buitelaar, P. (2013, May). AELA: an adaptive entity linking approach. In Proceedings of the 22nd International Conference on World Wide Web Companion (pp. 87-88). International World Wide Web Conferences Steering Committee.
[4] EuroSentiment Project. Work Package 4. http://eurosentiment.eu
Pictures from http://pixabay.com