Usage of Word Sense Disambiguation in Concept Identification in Ontology Construction
Guest Talk at University of Moratuwa, Department of Computer Science and Engineering
5th November, 2016
Discussed by: Kiruparan Balachandran
Background Information - Ontology
Ontology provides a potential method to describe domain knowledge
[Figure: example ontology graph with the nodes algorithm, sorting algorithm, problem, and complexity, connected by edges labeled "is a", "solve", and "has"]
Background Information - Ontology learning layer-cake approach
• Terms: {Randomized algorithm, sorting algorithm, system software, application software}
• Synonyms: {Randomized algorithm, sorting algorithm}, {system software, application software}
• Concepts: Algorithm (I, E, L), i.e. intension, extension, and linguistic realizations
• Concept Hierarchy: isA(sorting algorithm, algorithm), known as a taxonomic relationship
• Relations: solve(algorithm, problem), known as a non-taxonomic relationship
• Rules: isA(sorting algorithm, algorithm) -> solve(sorting algorithm, problem)
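The upper layers of the layer cake can be pictured with a small data structure. The sketch below is only an illustration built on the slide's own examples; the class name `ToyOntology`, its fields, and the reading of the rule as "a child inherits the relations of its parent" are assumptions, not part of the presented system.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOntology:
    """Hypothetical container for a few layers of the layer cake."""
    terms: set = field(default_factory=set)
    is_a: set = field(default_factory=set)       # taxonomic pairs: (child, parent)
    relations: set = field(default_factory=set)  # non-taxonomic triples: (name, subject, object)

    def apply_inheritance_rule(self):
        """One pass of: isA(child, parent) and rel(parent, obj) -> rel(child, obj)."""
        inferred = {
            (name, child, obj)
            for (child, parent) in self.is_a
            for (name, subj, obj) in self.relations
            if subj == parent
        }
        # Report only facts that were not already stated.
        return inferred - self.relations


onto = ToyOntology(
    terms={"algorithm", "sorting algorithm", "problem"},
    is_a={("sorting algorithm", "algorithm")},
    relations={("solve", "algorithm", "problem")},
)
print(onto.apply_inheritance_rule())
# {('solve', 'sorting algorithm', 'problem')}
```

The rule is applied once and is not a transitive closure; a deeper hierarchy would need repeated passes.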
The implemented approach follows the criteria of Buitelaar et al. for forming concepts from terms
• An intensional definition of the concept
  • Formal definition: a term can be considered a concept if it is linked by a valid relation to another term.
  • Informal definition: a term should have a textual description.
• A set of concept instances, i.e. its extension: a term can be considered a concept if it has instances.
• A set of linguistic realizations.
Need for WSD in forming concepts from terms
• Iterate over each sentence (ts) in the corpus
• Identify the subject phrase and the object phrase in each sentence
• Check whether all or part of the subject phrase (ts) and the object phrase (to) exists in the list of domain-specific terms
• Feed ts and to separately, each referred to as t, together with the sentence ts
• Retrieve the list of senses that exist in WordNet for t
• Identify the sense tsense related to the domain from the list of senses (sense disambiguation)
• If a tsense exists for both, the tsense of ts and the tsense of to are candidates for domain-specific concepts
For example, ts = “we propose a hardware design, call the virtual line scheme, that allows the utilization of large virtual cache line when fetch datum from memory for better exploitation of spatial locality”
• WordNet senses for t = cache: cache#n#1, cache#n#2, and cache#n#3
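The control flow of this slide can be sketched as follows. This is a hedged illustration, not the presented implementation: the function names, the substring test for "all or part of a phrase", the toy sense keys (`cache#toy`, `memory#toy`), and the stand-in `overlap_pick` disambiguator are all assumptions, and a small dictionary replaces WordNet.

```python
def overlap_pick(sentence, glosses):
    """Hypothetical stand-in for the WSD step: the sense whose gloss shares
    the most words with the sentence, or None when nothing is shared."""
    words = set(sentence.lower().split())

    def shared(sense):
        return len(words & set(glosses[sense].lower().split()))

    best = max(glosses, key=shared)
    return best if shared(best) > 0 else None


def candidate_concepts(sentences, domain_terms, sense_inventory, pick_sense):
    """sentences: list of (sentence, subject_phrase, object_phrase) tuples."""
    candidates = []
    for sentence, subj, obj in sentences:
        senses = []
        for phrase in (subj, obj):
            # Keep the first domain-specific term found inside the phrase.
            term = next((t for t in domain_terms if t in phrase), None)
            glosses = sense_inventory.get(term, {}) if term else {}
            senses.append(pick_sense(sentence, glosses) if glosses else None)
        # Both the subject and the object need a domain-related sense.
        if all(senses):
            candidates.append(tuple(senses))
    return candidates


# Toy inputs, invented for illustration only.
sentences = [("the cache stores data from memory", "the cache", "data from memory")]
domain_terms = ["cache", "memory"]
sense_inventory = {
    "cache": {"cache#toy": "buffer storage for data"},
    "memory": {"memory#toy": "an electronic device that stores data"},
}
print(candidate_concepts(sentences, domain_terms, sense_inventory, overlap_pick))
```

Passing the disambiguator in as `pick_sense` keeps the loop independent of which WSD algorithm is eventually chosen.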
Which algorithm is best suited?
• LESK
  • Original LESK
    • Uses the definition of a word meaning as the only source of contextual information for a given sense
    • Suffers from combinatorial explosion
    • Use of simulated annealing
  • Simplified LESK
    • Addresses the combinatorial explosion
    • Runs a separate disambiguation process for each ambiguous word in the input text
  • Adapted LESK
    • Enlarged context: considers hypernyms, hyponyms, holonyms, meronyms, troponyms, attribute relations, and their associated definitions
Lower accuracy
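Which algorithm is best suited?
• Other well-known algorithms with good performance use
  • Path
  • Depth of the least common subsumer (LCS), i.e. the deepest common ancestor, referred to as WUP
  • Path length and path direction, referred to as HSO
  • Link strength of a parent-child link using corpus statistical information
WUP:
$$\mathrm{ConSim}(C_1, C_2) = \frac{2 N_3}{N_1 + N_2 + 2 N_3}$$
where C3 is the least common ancestor of C1 and C2, N1 and N2 are the path lengths from C1 and C2 to C3, and N3 is the path length from C3 to the root.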
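A minimal sketch of the ConSim computation on a toy taxonomy follows. The taxonomy, the root name `entity`, and the helper names are invented for illustration; path lengths are counted in edges, so two concepts whose only common ancestor is the root score 0.

```python
def con_sim(n1, n2, n3):
    """ConSim from path lengths: n1, n2 from C1, C2 to C3; n3 from C3 to the root."""
    return (2 * n3) / (n1 + n2 + 2 * n3)


def ancestors(node, parent):
    """Chain from node up to the root, node included."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def wup(c1, c2, parent):
    a1, a2 = ancestors(c1, parent), ancestors(c2, parent)
    c3 = next(n for n in a1 if n in a2)   # deepest common ancestor
    n1, n2 = a1.index(c3), a2.index(c3)   # edges from C1 and C2 up to C3
    n3 = len(ancestors(c3, parent)) - 1   # edges from C3 up to the root
    return con_sim(n1, n2, n3)


# Hypothetical child -> parent taxonomy.
parent = {
    "algorithm": "entity",
    "problem": "entity",
    "sorting algorithm": "algorithm",
    "randomized algorithm": "algorithm",
}
print(wup("sorting algorithm", "randomized algorithm", parent))  # 0.5
print(wup("sorting algorithm", "problem", parent))               # 0.0
```

Comparing the root with itself would divide by zero in this edge-counting form; implementations that count nodes instead of edges avoid that case.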
HSO:
$$\mathrm{Weight} = C - \text{path length} - k \times (\text{number of changes of direction})$$
where C and k are constants.
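The weight above is simple to compute once a path between two senses is known. In the sketch below the path is given as a list of link directions; the default values C = 8 and k = 1 and the direction labels are assumptions chosen for illustration.

```python
def direction_changes(directions):
    """Count how often consecutive links on the path switch direction."""
    return sum(1 for a, b in zip(directions, directions[1:]) if a != b)


def hso_weight(directions, C=8, k=1):
    """Weight = C - path length - k * number of changes of direction."""
    return C - len(directions) - k * direction_changes(directions)


# Hypothetical path: two upward links followed by one downward link.
path = ["up", "up", "down"]
print(hso_weight(path))  # 8 - 3 - 1 * 1 = 4
```

Longer paths and paths that zigzag through the hierarchy both lower the weight, which is the behaviour the formula is meant to capture.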
Link strength of a parent-child link using corpus statistical information
• Information content + distance
• Information content: obtained by estimating the probability of occurrence of a class in a large text corpus
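As a rough illustration, information content can be computed as the negative log probability of a class, and a distance can then be derived from it. Reading "information content + distance" as the Jiang and Conrath measure is an assumption based on the reference list; the counts below are invented, and each count is assumed to include the occurrences of the class's descendants.

```python
import math


def information_content(concept, counts, total):
    """IC(c) = -log p(c), with p(c) estimated from corpus counts."""
    return -math.log(counts[concept] / total)


def jcn_distance(c1, c2, lcs, counts, total):
    """Jiang-Conrath style distance: IC(c1) + IC(c2) - 2 * IC(lcs)."""
    def ic(c):
        return information_content(c, counts, total)
    return ic(c1) + ic(c2) - 2 * ic(lcs)


# Hypothetical class counts from a corpus of 1000 class occurrences.
counts = {"algorithm": 100, "sorting algorithm": 25, "randomized algorithm": 10}
total = 1000

for concept in counts:
    print(concept, round(information_content(concept, counts, total), 3))
print(jcn_distance("sorting algorithm", "randomized algorithm", "algorithm", counts, total))
```

Rare classes carry more information, so two rare classes that share only a frequent ancestor end up far apart.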
Disambiguating Concepts (LESK ?)
For each sense
• Extract the informal definition of the sense (WNsn) from WordNet
• Calculate the similarity between ts and WNsn by computing a similarity matrix between ts and WNsn using a LESK algorithm. The value is normalized by the number of entries in the matrix.
Return the synset with the highest similarity value
For example, for cache#n#1, cache#n#2, and cache#n#3
• WNs1 e.g. “a hidden storage space for money or provisions or weapons”
• WNs2 e.g. “a secret store of valuables or money”
• WNs3 e.g. “RAM memory that is set aside as a specialized buffer storage, which is continually updated; used to optimize data transfers between system elements with different characteristics”
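The procedure can be tried on the cache example with a few lines of code. This is a minimal sketch under stated assumptions: the similarity matrix is taken to hold 1 for each identical word pair and 0 otherwise, the stop-word list is ad hoc, and the three definitions are copied from the slide instead of being read from WordNet.

```python
import re

# Ad hoc stop-word list, chosen only for this illustration.
STOPWORDS = {"a", "the", "of", "for", "or", "that", "is", "as", "to",
             "with", "which", "we", "when", "from", "between"}


def tokens(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def normalized_overlap(sentence, gloss):
    """Sum of a 0/1 word-match matrix divided by its number of entries."""
    s, g = tokens(sentence), tokens(gloss)
    if not s or not g:
        return 0.0
    matches = sum(1 for a in s for b in g if a == b)
    return matches / (len(s) * len(g))


def disambiguate(sentence, glosses):
    """Return the sense whose definition is most similar to the sentence."""
    return max(glosses, key=lambda sense: normalized_overlap(sentence, glosses[sense]))


ts = ("we propose a hardware design, call the virtual line scheme, that allows "
      "the utilization of large virtual cache line when fetch datum from memory "
      "for better exploitation of spatial locality")
glosses = {
    "cache#n#1": "a hidden storage space for money or provisions or weapons",
    "cache#n#2": "a secret store of valuables or money",
    "cache#n#3": ("RAM memory that is set aside as a specialized buffer storage, "
                  "which is continually updated; used to optimize data transfers "
                  "between system elements with different characteristics"),
}
print(disambiguate(ts, glosses))  # cache#n#3
```

Only the third definition shares a content word ("memory") with the sentence, so the computing sense is returned; with such a thin overlap, the enlarged context of Adapted LESK would give a more robust signal.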
Evaluation – domain-specific concept extraction
Computer science (ComSci), precision for concepts:
  Annotator 1    Annotator 2    Annotator 3
  75%            56%            78%
Biomedical, recall:
  Our approach    MaxMatcher (Zhou et al.)    BioAnnotator (Subramaniam et al.)
  58.70%          57.73%                      20.27%
• Identified 253 computer science domain-specific concepts, validated by three domain experts
• Measured the inter-annotator agreement using Fleiss' kappa
  • 0.36712, a fair agreement (3 annotators, 253 concepts, 2 categories)
• Identified 47 domain-specific concepts for the GENIA corpus
  • Compared with two different approaches discussed by Zhou et al. and Subramaniam et al.
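For readers unfamiliar with Fleiss' kappa, the sketch below shows how the statistic is computed from a table of rating counts. The annotation data behind the reported 0.36712 is not part of the slides, so the tables here are invented purely to exercise the function.

```python
def fleiss_kappa(table):
    """table[i][j] = number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters."""
    n_items = len(table)
    n_raters = sum(table[0])
    n_categories = len(table[0])

    # Share of all assignments that went to each category.
    p_j = [sum(row[j] for row in table) / (n_items * n_raters)
           for j in range(n_categories)]
    # Agreement among the raters on each single item.
    P_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in table]

    P_bar = sum(P_i) / n_items        # mean observed agreement
    P_e = sum(p * p for p in p_j)     # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)


# Hypothetical ratings: 3 annotators, 2 categories (concept / not a concept).
full_agreement = [[3, 0], [0, 3], [3, 0], [0, 3]]
split_votes = [[2, 1], [1, 2], [2, 1], [1, 2]]
print(fleiss_kappa(full_agreement))
print(fleiss_kappa(split_votes))
```

A value of 1 means perfect agreement, values near 0 mean agreement no better than chance, and negative values mean agreement below chance.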
Why LESK ?
Conclusion
Choose the best WSD algorithm based on
• The nature of your problem
• The available factors
• Performance with respect to accuracy and time
References
K. Balachandran and S. Ranathunga, "Domain-Specific Term Extraction for Concept Identification in Ontology Construction," in IEEE/WIC/ACM International Conference on Web Intelligence, Omaha, Nebraska, USA, 2016, pp. 34-41.
P. Buitelaar, P. Cimiano, and B. Magnini, Ontology Learning from Text: Methods, Evaluation and Applications, vol. 123. IOS Press, 2005.
X. Zhou, X. Zhang, and X. Hu, "MaxMatcher: Biological concept extraction using approximate dictionary lookup," in PRICAI 2006: Trends in Artificial Intelligence. Springer, 2006, pp. 1145-1149.
L. V. Subramaniam, S. Mukherjea, P. Kankar, B. Srivastava, V. S. Batra, P. V. Kamesam, et al., "Information extraction from biomedical literature: methodology, evaluation and an application," in Proceedings of the Twelfth International Conference on Information and Knowledge Management, 2003, pp. 410-417.
G. Hirst and D. St-Onge, "Lexical chains as representations of context for the detection and correction of malapropisms," WordNet: An Electronic Lexical Database, vol. 305, pp. 305-332, 1998.
S. Banerjee and T. Pedersen, "An adapted Lesk algorithm for word sense disambiguation using WordNet," in Computational Linguistics and Intelligent Text Processing. Springer, 2002, pp. 136-145.
Z. Wu and M. Palmer, "Verbs semantics and lexical selection," in Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, 1994, pp. 133-138.
M. Lesk, "Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone," in Proceedings of the 5th Annual International Conference on Systems Documentation, 1986, pp. 24-26.
C. Leacock and M. Chodorow, "Combining Local Context and WordNet Similarity for Word Sense Disambiguation," WordNet: An Electronic Lexical Database, vol. 49, pp. 265-283, MIT Press, 1998.
J. J. Jiang and D. W. Conrath, "Semantic similarity based on corpus statistics and lexical taxonomy," in Proc. Int. Conf. Research in Computational Linguistics, 1998, pp. 19-33.
Questions ?
Thank You…
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...Lokesh Kothari
 

Dernier (20)

Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 

Usage of word sense disambiguation in concept identification in ontology construction

  • 1. Usage of Word Sense Disambiguation in Concept Identification in Ontology Construction. Guest talk at the University of Moratuwa, Department of Computer Science and Engineering, 5th November 2016. Discussed by: Kiruparan Balachandran
  • 2. Background Information - Ontology
      • An ontology provides a potential method for describing domain knowledge.
      • Diagram: the nodes "algorithm", "sorting algorithm", "problem", and "complexity", linked by the relations "solve", "has", and "is a".
  • 3. Background Information - Ontology learning layer-cake approach
      • Terms: {Randomized algorithm, sorting algorithm, system software, application software}
      • Synonyms: {Randomized algorithm, sorting algorithm}, {system software, application software}
      • Concepts: Algorithm (I, E, L)
      • Concept Hierarchy: isA(sorting algorithm, algorithm), known as a taxonomy relationship
      • Relations: solve(algorithm, problem), known as a non-taxonomy relationship
      • Rules: isA(sorting algorithm, algorithm) -> solve(sorting algorithm, problem)
  • 4. The implemented approach follows the Buitelaar et al. criteria for forming concepts from terms
      • An intensional definition of the concept
          • Formal definition: a term can be considered a concept if it is linked by a valid relation to another term.
          • Informal definition: a term should have a textual description.
      • A set of concept instances, i.e. its extensions: a term can be considered a concept if it has instances.
      • A set of linguistic realizations.
  • 5. Need for WSD in forming concepts from terms
      • Iterate over each sentence (ts) in the corpus.
      • Identify the subject phrase and the object phrase in each sentence.
      • Check whether all or part of the subject phrase (ts) and the object phrase (to) exists in the list of domain-specific [terms].
      • Feed ts and to separately (each referred to as t), together with the sentence ts.
      • Retrieve the list of senses that exist in WordNet for t.
      • Identify the sense tsense related to the domain from the list of senses (disambiguating the sense).
      • If tsense exists for both, the tsense of ts and of to are candidates for domain-specific concepts.
      • For example, ts = “we propose a hardware design, call the virtual line scheme, that allows the utilization of large virtual cache line when fetch datum from memory for better exploitation of spatial locality”
  • 6. Need for WSD in forming concepts from terms (continued): for the term "cache", the list of senses that exist in WordNet is cache#n#1, cache#n#2, and cache#n#3.
  • 7. Which algorithm is best suited?
      • LESK
          • Original LESK
              • Uses the definition of a word's meaning as the only source of contextual information for a given sense
              • Suffers from combinatorial explosion
              • Use of simulated annealing
  • 8. Which algorithm is best suited? (continued)
      • LESK
          • Simplified LESK
              • Addresses the combinatorial explosion
              • Runs a separate disambiguation process for each ambiguous word in the input text
          • Adapted LESK
              • Enlarged context: considers hypernyms, hyponyms, holonyms, meronyms, troponyms, attribute relations, and their associated definitions
      • Slide annotation: "Less accuracy"
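The Simplified LESK idea above (score each sense of one ambiguous word by how many words its definition shares with the surrounding context) can be sketched as follows. This is a minimal illustration, not the implementation from the talk: the sense labels, the two glosses for "cone", and the stop-word list are hypothetical stand-ins for a real dictionary such as WordNet.

```python
import re

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "for", "or", "as", "from", "such"}

# Hypothetical glosses for two senses of "cone" (not taken from WordNet).
GLOSSES = {
    "cone_fruit": "fruit of evergreen trees such as pine or fir",
    "cone_wafer": "crisp wafer for holding ice cream",
}


def content_words(text):
    """Lowercase the text and return its non-stop-word tokens as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, glosses):
    """Return the sense whose gloss shares the most words with the context."""
    context_words = content_words(context)
    return max(glosses, key=lambda sense: len(context_words & content_words(glosses[sense])))


sense_in_forest = simplified_lesk("the pine cone fell from the tree", GLOSSES)
sense_in_shop = simplified_lesk("an ice cream cone", GLOSSES)
```

Each ambiguous word is handled on its own, so the cost grows linearly with the number of senses instead of with the product of sense counts across all words in the sentence.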
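  • 9. Which algorithm is best suited?
      • Other well-known algorithms with good performance use:
          • Path
          • Depth of the least common subsumer (LCS), referred to as WUP
          • Path length and path direction, referred to as HSO
          • Link strength of a parent-child link using corpus statistical information
      • WUP: $\mathrm{ConSim}(C_1, C_2) = \frac{2 N_3}{N_1 + N_2 + 2 N_3}$, where, in the slide's tree diagram, C3 is the least common subsumer of C1 and C2, N1 and N2 are the path lengths from C1 and C2 to C3, and N3 is the path length from C3 to the root.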
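As a worked instance of the WUP formula above, the following sketch computes ConSim on a tiny hand-built taxonomy. The taxonomy and the "entity" root are hypothetical (only the algorithm and software terms come from the slides), and the counting convention is an assumption: N1 and N2 are counted in edges, and N3 in nodes from the root down to C3, so that a root-level subsumer still gives a non-zero score.

```python
# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "algorithm": "entity",
    "software": "entity",
    "sorting algorithm": "algorithm",
    "randomized algorithm": "algorithm",
    "system software": "software",
}


def path_to_root(concept):
    """Return the list of nodes from the concept up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def wup_similarity(c1, c2):
    """ConSim(C1, C2) = 2*N3 / (N1 + N2 + 2*N3) on the toy taxonomy."""
    path1 = path_to_root(c1)
    path2 = path_to_root(c2)
    ancestors2 = set(path2)
    # C3: the first node on C1's upward path that also subsumes C2.
    c3 = next(node for node in path1 if node in ancestors2)
    n1 = path1.index(c3)          # edges from C1 up to C3
    n2 = path2.index(c3)          # edges from C2 up to C3
    n3 = len(path_to_root(c3))    # nodes from C3 up to the root (assumed convention)
    return 2 * n3 / (n1 + n2 + 2 * n3)


siblings = wup_similarity("sorting algorithm", "randomized algorithm")
distant = wup_similarity("sorting algorithm", "system software")
```

The two sibling algorithms share the deeper subsumer "algorithm" and therefore score higher than a pair whose only common subsumer is the root.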
  • 10. Which algorithm is best suited? (continued)
      • HSO: Weight = C - path length - k * number of changes of direction
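The HSO weight above can be evaluated directly once a path is described as a sequence of link directions. The sketch below assumes that representation; the direction labels and the default constants C = 8 and k = 1 are illustrative assumptions, not values given on the slide.

```python
def count_direction_changes(directions):
    """Count how often consecutive links on the path point in different directions."""
    return sum(1 for prev, cur in zip(directions, directions[1:]) if prev != cur)


def hso_weight(directions, c=8, k=1):
    """Weight = C - path length - k * number of changes of direction.

    `directions` lists one label per link on the path, e.g. "up" or "down".
    The defaults for c and k are assumed for illustration only.
    """
    return c - len(directions) - k * count_direction_changes(directions)


straight = hso_weight(["up", "up", "up"])
with_turn = hso_weight(["up", "up", "down"])
```

Two paths of the same length get different weights when one of them changes direction, which is the behaviour the formula is meant to capture.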
  • 11. Which algorithm is best suited? (continued)
      • Link strength of a parent-child link using corpus statistical information: information content + distance
      • Information content: obtained by estimating the probability of occurrence of a class in a large text corpus
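One common way to combine information content with distance is the Jiang and Conrath measure cited in the references: IC(c) = -log p(c), and the distance between two concepts is IC(c1) + IC(c2) - 2 * IC(lcs). The sketch below assumes that formulation; the corpus counts are hypothetical, and the least common subsumer is passed in explicitly to keep the example short.

```python
import math

# Hypothetical cumulative corpus counts: each count includes all descendants,
# so the root "entity" carries the corpus total.
COUNTS = {
    "entity": 100,
    "algorithm": 20,
    "sorting algorithm": 5,
    "randomized algorithm": 4,
}
TOTAL = COUNTS["entity"]


def information_content(concept):
    """IC(c) = -log p(c), with p(c) estimated from the corpus counts."""
    return -math.log(COUNTS[concept] / TOTAL)


def jcn_distance(c1, c2, lcs):
    """Jiang-Conrath distance: IC(c1) + IC(c2) - 2 * IC(lcs)."""
    return information_content(c1) + information_content(c2) - 2 * information_content(lcs)


distance = jcn_distance("sorting algorithm", "randomized algorithm", "algorithm")
```

Rare classes have high information content, so two rare concepts under a frequent (uninformative) subsumer end up far apart.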
  • 12. Disambiguating Concepts (LESK?)
      • Candidate senses: cache#n#1, cache#n#2, and cache#n#3
      • For each sense:
          • Extract the informal definition of the sense from WordNet.
          • Calculate the similarity between ts and WNsn by building a similarity matrix between ts and WNsn using a LESK algorithm.
          • Normalize the value by the number of entries in the matrix.
      • Return the synset with the highest similarity value.
  • 13. Disambiguating Concepts (LESK?), example definitions:
      • WNs1, e.g. “a hidden storage space for money or provisions or weapons”
      • WNs2, e.g. “a secret store of valuables or money”
      • WNs3, e.g. “RAM memory that is set aside as a specialized buffer storage, which is continually updated; used to optimize data transfers between system elements with different characteristics”
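The disambiguation steps can be sketched with the example sentence from slide 5 and the three definitions listed here. This is an approximation under stated assumptions: the slides do not specify how each matrix entry is computed, so `word_similarity` is a placeholder that scores exact word matches only, and the stop-word list is hypothetical.

```python
import re

# Hypothetical stop-word list used to drop function words before matching.
STOPWORDS = {"a", "the", "of", "for", "or", "that", "is", "as", "which",
             "when", "from", "we", "to", "with", "between"}

SENTENCE = ("we propose a hardware design, call the virtual line scheme, that allows "
            "the utilization of large virtual cache line when fetch datum from memory "
            "for better exploitation of spatial locality")

GLOSSES = {
    "WNs1": "a hidden storage space for money or provisions or weapons",
    "WNs2": "a secret store of valuables or money",
    "WNs3": ("RAM memory that is set aside as a specialized buffer storage, which is "
             "continually updated; used to optimize data transfers between system "
             "elements with different characteristics"),
}


def content_words(text):
    """Lowercase the text and return its non-stop-word tokens in order."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def word_similarity(a, b):
    """Placeholder for the per-entry score: 1.0 on an exact match, else 0.0."""
    return 1.0 if a == b else 0.0


def normalized_matrix_score(sentence, gloss):
    """Sum the sentence-by-gloss similarity matrix and divide by its number of entries."""
    rows, cols = content_words(sentence), content_words(gloss)
    if not rows or not cols:
        return 0.0
    matrix = [[word_similarity(r, c) for c in cols] for r in rows]
    return sum(map(sum, matrix)) / (len(rows) * len(cols))


def disambiguate(sentence, glosses):
    """Return the sense whose definition scores highest against the sentence."""
    return max(glosses, key=lambda sense: normalized_matrix_score(sentence, glosses[sense]))


best_sense = disambiguate(SENTENCE, GLOSSES)
```

With exact matching, only the third definition shares a content word ("memory") with the sentence, so it is the one selected; a richer per-entry similarity would change the scores but not the overall procedure.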
  • 16. Evaluation – domain-specific concept extraction
      • Computer science (ComSci) corpus, precision for concepts:
          • Annotator 1: 75%
          • Annotator 2: 56%
          • Annotator 3: 78%
      • Biomedical corpus, recall:
          • Our approach: 58.70%
          • MaxMatcher, discussed by Zhou et al.: 57.73%
          • BioAnnotator, Subramaniam et al.: 20.27%
      • Identified 253 computer science domain-specific concepts, validated by three domain experts
      • Measured the inter-annotator agreement using Fleiss' kappa: 0.36712, a fair agreement (3 annotators, 253 concepts, 2 categories)
      • Identified 47 domain-specific concepts for the GENIA corpus, compared with the two approaches discussed by Zhou et al. and Subramaniam et al.
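For readers unfamiliar with Fleiss' kappa, the following sketch shows how the statistic is computed from a table of rating counts. The four-item table is made up for illustration (3 annotators, 2 categories, as in the evaluation); it is not the annotation data behind the 0.36712 figure.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table where ratings[i][j] is the number of
    annotators who assigned item i to category j (same annotator count per item)."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])

    # Proportion of all assignments that fall into each category.
    p_cat = [sum(row[j] for row in ratings) / (n_items * n_raters)
             for j in range(n_categories)]

    # Observed agreement per item, then averaged over items.
    p_items = [(sum(count * count for count in row) - n_raters)
               / (n_raters * (n_raters - 1)) for row in ratings]
    p_bar = sum(p_items) / n_items

    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_cat)
    return (p_bar - p_expected) / (1 - p_expected)


# Made-up example: columns are [concept, not a concept], 3 annotators per term.
toy_table = [[3, 0], [0, 3], [2, 1], [1, 2]]
toy_kappa = fleiss_kappa(toy_table)
```

A value of 1 means the annotators always agree, and values near 0 mean agreement is no better than chance.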
  • 17. Why LESK? Conclusion
      • Choose the best WSD algorithm based on:
          • Nature of your problem
          • Available factors
          • Performance with respect to accuracy and time
  • 18. References
      • K. Balachandran and S. Ranathunga, "Domain-Specific Term Extraction for Concept Identification in Ontology Construction," in IEEE/WIC/ACM International Conference on Web Intelligence, Omaha, Nebraska, USA, 2016, pp. 34-41.
      • P. Buitelaar, P. Cimiano, and B. Magnini, Ontology learning from text: methods, evaluation and applications, vol. 123. IOS Press, 2005.
      • X. Zhou, X. Zhang, and X. Hu, "MaxMatcher: Biological concept extraction using approximate dictionary lookup," in PRICAI 2006: Trends in Artificial Intelligence. Springer, 2006, pp. 1145-1149.
      • L. V. Subramaniam, S. Mukherjea, P. Kankar, B. Srivastava, V. S. Batra, P. V. Kamesam, et al., "Information extraction from biomedical literature: methodology, evaluation and an application," in Proceedings of the twelfth international conference on Information and knowledge management, 2003, pp. 410-417.
      • G. Hirst and D. St-Onge, "Lexical chains as representations of context for the detection and correction of malapropisms," WordNet: An electronic lexical database, vol. 305, pp. 305-332, 1998.
      • S. Banerjee and T. Pedersen, "An adapted Lesk algorithm for word sense disambiguation using WordNet," in Computational linguistics and intelligent text processing. Springer, 2002, pp. 136-145.
      • Z. Wu and M. Palmer, "Verbs semantics and lexical selection," in Proceedings of the 32nd annual meeting on Association for Computational Linguistics, 1994, pp. 133-138.
      • M. Lesk, "Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone," in Proceedings of the 5th annual international conference on Systems documentation, 1986, pp. 24-26.
      • C. Leacock and M. Chodorow, "Combining Local Context and WordNet Similarity for Word Sense Disambiguation," WordNet: An Electronic Lexical Database, vol. 49, pp. 265-283, MIT Press, 1998.
      • J. J. Jiang and D. W. Conrath, "Semantic similarity based on corpus statistics and lexical taxonomy," in Proc. Int. Conf. Research in Computational Linguistics, 1998, pp. 19-33.