Entity Linking with Multiple Knowledge Bases
What is the text talking about?
Motivation
Written communication has long been a common way of sharing knowledge between humans.
Machines, however, process natural language text as a sequence of characters without any
meaning.
When asked about a term (a sequence of characters), a computer can spot that sequence but
cannot explain its meaning.
Bianca Pereira
This project has been funded by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289.
Proposed Solution
Even large cross-domain Knowledge Bases do not cover all knowledge in the world.
Our solution therefore uses multiple Knowledge Bases to perform Entity Linking. In
other words, we want to enable the use of different sources of concepts.
Our approach is based on three main steps: selection of textual features, selection of
Knowledge Base features, and use of a Collective Inference algorithm.
When a human reader wants to understand the content of a text, she uses the words around a
given term (context words) to determine its meaning. Noun phrases and verbs are the main
source of information. Likewise, words appearing near the term are more relevant
than those appearing far from it. In a computer-based environment, those features are
extracted and used to measure how probable it is that the term refers to a given concept
in the Knowledge Base.
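A minimal sketch of this step, under stated assumptions: each candidate concept is assumed to
carry a small set of descriptive words, and context words are weighted by 1/distance to the
term. The function names, the candidate labels, the word sets and the weighting scheme are all
illustrative and are not taken from this work.

```python
def context_score(tokens, term_index, concept_words):
    """Sum of 1/distance weights for context words that also describe the concept."""
    score = 0.0
    for i, token in enumerate(tokens):
        if i == term_index:
            continue  # the term itself is not a context word
        if token.lower() in concept_words:
            score += 1.0 / abs(i - term_index)  # nearer words weigh more
    return score


def link_probabilities(tokens, term_index, candidates):
    """Normalise context scores over the candidates (uniform if every score is zero)."""
    scores = {c: context_score(tokens, term_index, words) for c, words in candidates.items()}
    total = sum(scores.values())
    if total == 0:
        return {c: 1.0 / len(scores) for c in scores}
    return {c: s / total for c, s in scores.items()}


# Hypothetical candidates for the term "Jackson"; the word sets are invented.
tokens = "Jackson the singer of Black or White died in 2009".split()
candidates = {
    "Michael Jackson (singer)": {"singer", "black", "white"},
    "Michael Jackson (composer)": {"composer", "boogie", "blame"},
}
probabilities = link_probabilities(tokens, 0, candidates)
```

Here only the first candidate shares words with the context, so it receives all of the
probability mass. A Knowledge Base without textual descriptions would need a different source
of concept words.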
When analyzing those context words, a human also maps the words in the text to her
previous knowledge. This mapping changes the probability that the term refers to one
concept rather than another. In a computer-based environment, the relationships between
concepts in a Knowledge Base can be used to adjust the probability of linking to a given
entry.
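One way to picture this adjustment, as a sketch only: assume the Knowledge Base is a set of
(subject, relation, object) triples and that another term in the text has already been linked
to a concept. Candidates connected to that concept get their probability multiplied by a
boost factor before renormalising. The identifiers, the triples and the boost value are
hypothetical.

```python
def rerank_with_relations(prior, related_concept, triples, boost=2.0):
    """Boost every candidate connected to `related_concept`, then renormalise."""
    connected = {s for s, _, o in triples if o == related_concept}
    connected |= {o for s, _, o in triples if s == related_concept}
    weighted = {c: p * (boost if c in connected else 1.0) for c, p in prior.items()}
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}


# Hypothetical identifiers and triples, invented for illustration.
triples = {
    ("artist:composer", "composer_of", "work:blame_it_on_the_boogie"),
    ("artist:singer", "singer_of", "work:black_or_white"),
}
prior = {"artist:singer": 0.5, "artist:composer": 0.5}
posterior = rerank_with_relations(prior, "work:blame_it_on_the_boogie", triples)
```

With equal priors, the candidate that has a relation to the linked work ends up with two
thirds of the probability.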
In the last step, a human uses the coherence of a text to understand all of its terms
together. The basic assumption is that terms appearing in a coherent text
are somehow related in the previous knowledge of the reader (unless they are concepts
introduced by the text). In a computer-based environment, this step aggregates all features
and, using the computed probabilities, finds the best link between each term in the
text and its concept in the Knowledge Base. This is done through a process
called Collective Inference.
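The idea of choosing all links jointly can be sketched with an exhaustive search. This is an
illustrative stand-in, not the Collective Inference algorithm used in this work: it scores
every joint assignment by the sum of local log-probabilities plus a fixed reward for each
pair of chosen concepts that is related in the Knowledge Base. All identifiers and numbers
are hypothetical, and every local probability is assumed to be greater than zero.

```python
from itertools import product
from math import log


def collective_inference(local_probs, related, coherence_weight=1.0):
    """Pick one concept per term, maximising local log-probabilities plus coherence."""
    terms = list(local_probs)
    best_assignment, best_score = None, float("-inf")
    for choice in product(*(local_probs[t] for t in terms)):
        score = sum(log(local_probs[t][c]) for t, c in zip(terms, choice))
        # Reward every pair of chosen concepts that is related in the Knowledge Base.
        for i in range(len(choice)):
            for j in range(i + 1, len(choice)):
                if frozenset((choice[i], choice[j])) in related:
                    score += coherence_weight
        if score > best_score:
            best_assignment, best_score = dict(zip(terms, choice)), score
    return best_assignment


# Hypothetical local probabilities and relatedness, invented for illustration.
local_probs = {
    "Michael Jackson": {"artist:singer": 0.6, "artist:composer": 0.4},
    "Blame it on the Boogie": {"work:blame_it_on_the_boogie": 1.0},
}
related = {frozenset(("artist:composer", "work:blame_it_on_the_boogie"))}
links = collective_inference(local_probs, related)
```

Taken alone, the first term would be linked to the singer. The coherence reward tips the
joint decision towards the composer. Exhaustive search grows exponentially with the number
of terms, so it only suits very short texts.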
Problem Statement
Natural language texts are hard to understand due to two linguistic features: polysemy and
synonymy.
Related Work
Humans process the content of a text first by matching the terms with their previous
knowledge. In a computer-based environment this previous knowledge is given by a
Knowledge Base.
In Computer Science, the task that mimics this matching is called Entity Linking: linking
terms in a text with the Knowledge Base entries that represent the same real-world
concept.
Previous work [1][2] has been successful in linking text with cross-domain Knowledge Bases
(e.g. Wikipedia, DBpedia and YAGO).
Challenges
The disambiguation of terms is our key challenge: determining the right
concept for each term cited in the text.
Since our goal is to use multiple Knowledge Bases, there are two further challenges
to address: the processing of Big Data and the heterogeneity in the semantic description
of Knowledge Bases.
This text is not meaningful for machines.
SOURCE: http://google.com SOURCE: http://bing.com SOURCE: http://yahoo.com
Polysemy happens when a single term may be related to more than one concept.
Synonymy happens when there are many terms that refer to the same concept.
Example of polysemy: Jackson
Example of synonymy: NUIG, National University of Ireland, Galway
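Both phenomena can be pictured with a simple lookup from terms to the concepts they may
denote. The sketch below uses a hypothetical index with invented concept identifiers, only to
make the two definitions concrete.

```python
# Hypothetical surface-form index: each term maps to the concepts it may denote.
surface_forms = {
    "Jackson": {"kb:michael_jackson_singer", "kb:michael_jackson_composer"},
    "NUIG": {"kb:nuig"},
    "National University of Ireland, Galway": {"kb:nuig"},
}


def is_polysemous(term):
    """A term is polysemous when it may denote more than one concept."""
    return len(surface_forms.get(term, ())) > 1


def synonyms(concept):
    """All terms that may denote the given concept, in sorted order."""
    return sorted(t for t, concepts in surface_forms.items() if concept in concepts)
```

For instance, `is_polysemous("Jackson")` is true, and `synonyms("kb:nuig")` returns both
the abbreviation and the full name of the university.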
Michael Jackson, the singer of Black or White, died in 2009.
http://en.wikipedia.org/wiki/Michael_Jackson
http://en.wikipedia.org/wiki/Black_or_White
I started my night watching Copacabana and ended at a party dancing to
Havana D’Primera.
Michael Jackson, the composer of Blame it on the Boogie, has the same
name as the member of Jackson 5.
context words
http://musicbrainz.org/work/8ffc75e5-3ddb-4a6a-a2d5-8ec5ecee1c78
singer_of composer_of
http://musicbrainz.org/artist/f27ec8db-af05-4f36-916e-3d57f91ecf5e
http://musicbrainz.org/artist/059e57d8-af63-4d90-8078-ebed36985fff
Main Findings
Not all Knowledge Bases contain textual descriptions for all concepts, as most previous work
assumes.
Is it possible to perform Entity Linking with Knowledge Bases other than the previously used
cross-domain ones [3]?
How does the method perform when applied to cross-domain ones [4]?
To be continued... (a.k.a. Future Work)
References
[1] Hachey, B., Radford, W., Nothman, J., Honnibal, M., & Curran, J. R. (2013). Evaluating Entity Linking with Wikipedia. Artificial Intelligence, 194, 130-150.
[2] Hoffart, J., Yosef, M. A., Bordino, I., Fürstenau, H., Pinkal, M., Spaniol, M., … & Weikum, G. (2011, July). Robust Disambiguation of Named Entities in Text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 782-792). Association for Computational Linguistics.
[3] Pereira, B., Aggarwal, N., & Buitelaar, P. (2013, May). AELA: An Adaptive Entity Linking Approach. In Proceedings of the 22nd International Conference on World Wide Web Companion (pp. 87-88). International World Wide Web Conferences Steering Committee.
[4] EuroSentiment Project. Work Package 4. http://eurosentiment.eu
Pictures from http://pixabay.com

NUIG Research Showcase 2014
