SlideShare a Scribd company logo
1 of 26
Can we use Linked Data
Semantic Annotators for
the Extraction of Domain-
Relevant Expressions?
Prof. Michel Gagnon, Ecole Polytechnique de Montréal, Canada
Prof. Amal Zouaq, Royal Military College of Canada, Canada
Dr. Ludovic Jean-louis, Ecole Polytechnique de
Montréal, Canada
Introduction
What is Semantic Annotation on the LOD?
Candidate Selection
Disambiguation
Traditionally focused on named entity annotation
Semantic Annotators
LOD-based Semantic Annotators
Most used annotators
REST API / Web Services
Default configuration
Alchemy
(entities +
keywords)
YahooZemanta
Wikimeta
DBPedia
Spotlight
OpenCalais
Lupedia
Annotation Evaluation
Few comparative studies have been published
on the performance of linked data Semantic
Annotators
The available studies generally focus on
“traditional” named entity annotation evaluation
(ORG, Person, etc.)
NEs versus domain topics
Semantic Annotators start to go beyond these
traditional NEs
Predefined classes such as PERSON, ORG
Expansion of possible classes (NERD Taxonomy :
sport events, operating systems, political events, …)
 RDF (NE? Domain topic ?)
 Semantic Web (NE? Domain topic ?)
Domain Topics: important “concepts” for the
domain of interest
NE versus Topics Annotation
Research Question 1
Previous studies did not draw any conclusion about
the relevance of the annotated entities
But relevance might prove to be a crucial point for
tasks such as information seeking or query
answering
RQ1: Can we use linked data-based semantic
annotators to accurately identify Domain-relevant
expressions?
Research Methodology
We relied on a corpus of 8 texts taken from
Wikipedia, all related to the artificial intelligence
domain. Together, these texts represent 10570
words.
We obtained 2023 expression occurrences among
which 1151 were distinct.
Building the Gold Standard
We asked a human evaluator to analyze each
expression and make the following decisions:
Does the annotated expression represent an
understandable named entity or topic?
Is the expression a relevant keyword according
to the document?
Is the expression a relevant named entity
according to the document?
Distribution of Expressions
Note that only 639 out of 1151 detected expressions
are understandable (56%). Thus a substantial
number of detected expressions represents noise.
Frequency of detected Expressions
Most of the expressions were detected by only
one (76%) or two (16%) annotators.
Semantic annotators seemed complementary.
(326 relevant)
(117 relevant)
(40 relevant)
(15 relevant)
(4 relevant)
(4 relevant)
(1 relevant)
Evaluation
Standard precision, recall and F-score
“Partial” recall, since our Gold Standard
considers only the expressions detected by
at least one annotator, instead of considering
all possible expressions in the corpus.
All Expressions
mend
Named Entities
Domain Topics
Few Highlights
F-score is unsatisfactory for almost all annotators
Best F-score around 62-63 (All expressions, topics)
Alchemy seems to stand out
It obtains a high value for recall when all expressions
or only topics are considered
But the precision of Alchemy (59%) is not very
high, compared to the results of Zemanta
(81%), OpenCalais (75%) and Yahoo (74%)
None of the annotators was good at detecting relevant
named entities.
DBpedia Spotlight annotated many expressions that
are not considered understandable
Research Question 2
Given the complementarity of Semantic
Annotators, we decided to experiment with
this second research question:
Can we improve the overall performance
of semantic annotators using voting
methods and machine learning?
Voting and machine learning
methods
We experimented with the following
methods:
Simple votes
Weighted votes where the weight is the
precision of individual annotators on a
training corpus
K-nearest-neighbours classifier
Naïve Bayes classifier
Decision tree (C4.5)
Rule induction (PART algorithm)
Experiment Setup
We divided the set of annotated
expressions into 5 balanced partitions
One partition for testing
Remaining partitions for training
We ran 5 experiments for each of the
following situations: all entities, only
named entities and only topics
Baseline: Simple Votes
All Expressions:
Best prec : 77%
(Zemanta)
Best rec/F : 69% -
62% (Alchemy)
Best P: 81% (R/F
12%-21%)
(Zemanta)
Best R/F: 69%-63%
(Alchemy) P=59%
Threshold: Number of annotators
Weighted Votes
Best P: 81%
(Zemanta) (R/F
12%-21%)
Best R/F: 69%-63%
(Alchemy) P=59%
Machine Learning
Best P: 81% (Zemanta)
(R/F 12%-21%)
Best R/F: 69%-63%
(Alchemy) P=59%
Highlights
On the average, weighted vote displays the
best precision (0.66) among combination
methods if only topics are considered
Alchemy (0.59)
Alchemy finds many relevant expressions
that are not discovered by other annotators
Machine learning methods achieve better
recall and F-score on the average, especially
if decision tree or rule induction (PART) is
used
Conclusion
Can we use Linked Data Semantic
Annotators for the Extraction of Domain-
Relevant Expressions?
Best overall performance (Recall/F-Score:
Alchemy) (59%- 69%-63%)
Best precision: Zemanta (81%)
Need of filtering!
Combination/machine learning methods
Increased recall with weighted voting
scheme
Future Work
Limitations
The size of the gold standard
A limited set of semantic annotators
Increase the size of the GS
Explore various combinations for annotators/
features
Compare Semantic Annotators with keyword
extractors
QUESTIONS?
Thank you for your attention!
amal.zouaq@rmc.ca
http://azouaq.athabascau.ca

More Related Content

What's hot

Ontology Engineering for the Semantic Web and beyond
Ontology Engineering for the Semantic Web and beyondOntology Engineering for the Semantic Web and beyond
Ontology Engineering for the Semantic Web and beyondPeter Geil
 
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...European Data Forum
 
Can Deep Learning solve the Sentiment Analysis Problem
Can Deep Learning solve the Sentiment Analysis ProblemCan Deep Learning solve the Sentiment Analysis Problem
Can Deep Learning solve the Sentiment Analysis ProblemMark Cieliebak
 
Text Analytics for Semantic Computing
Text Analytics for Semantic ComputingText Analytics for Semantic Computing
Text Analytics for Semantic ComputingMeena Nagarajan
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesMax Irwin
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Lucidworks
 
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...Khirulnizam Abd Rahman
 
OOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria PovedaOOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria Povedasemanticsconference
 
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - IntroductionOntology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - IntroductionAldo Gangemi
 
Best Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining ProcessingBest Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining ProcessingOntotext
 
Using and learning phrases
Using and learning phrasesUsing and learning phrases
Using and learning phrasesCassandra Jacobs
 
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...Andres Garcia-Silva
 
Teldap4 getty multilingual vocab workshop2010
Teldap4 getty multilingual vocab workshop2010Teldap4 getty multilingual vocab workshop2010
Teldap4 getty multilingual vocab workshop2010AAT Taiwan
 
Valentina Presutti - Ontology Design Patterns: an introduction
Valentina Presutti - Ontology Design Patterns: an introductionValentina Presutti - Ontology Design Patterns: an introduction
Valentina Presutti - Ontology Design Patterns: an introductionsssw2012
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingSimon Hughes
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineLeveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineTrey Grainger
 

What's hot (19)

Ontology Engineering for the Semantic Web and beyond
Ontology Engineering for the Semantic Web and beyondOntology Engineering for the Semantic Web and beyond
Ontology Engineering for the Semantic Web and beyond
 
NAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITIONNAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITION
 
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
 
Can Deep Learning solve the Sentiment Analysis Problem
Can Deep Learning solve the Sentiment Analysis ProblemCan Deep Learning solve the Sentiment Analysis Problem
Can Deep Learning solve the Sentiment Analysis Problem
 
Abstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF DatasetsAbstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF Datasets
 
Text Analytics for Semantic Computing
Text Analytics for Semantic ComputingText Analytics for Semantic Computing
Text Analytics for Semantic Computing
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
 
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
 
OOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria PovedaOOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria Poveda
 
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - IntroductionOntology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
 
Ontology engineering
Ontology engineering Ontology engineering
Ontology engineering
 
Best Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining ProcessingBest Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining Processing
 
Using and learning phrases
Using and learning phrasesUsing and learning phrases
Using and learning phrases
 
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
 
Teldap4 getty multilingual vocab workshop2010
Teldap4 getty multilingual vocab workshop2010Teldap4 getty multilingual vocab workshop2010
Teldap4 getty multilingual vocab workshop2010
 
Valentina Presutti - Ontology Design Patterns: an introduction
Valentina Presutti - Ontology Design Patterns: an introductionValentina Presutti - Ontology Design Patterns: an introduction
Valentina Presutti - Ontology Design Patterns: an introduction
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineLeveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
 

Similar to Zouaq wole2013

SemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment AnalysisSemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment AnalysisAditya Joshi
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Chunyang Chen
 
VOC real world enterprise needs
VOC real world enterprise needsVOC real world enterprise needs
VOC real world enterprise needsIvan Berlocher
 
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged CorpusRANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged CorpusRubén Izquierdo Beviá
 
Information extraction for Free Text
Information extraction for Free TextInformation extraction for Free Text
Information extraction for Free Textbutest
 
RANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpusRANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpusRubén Izquierdo Beviá
 
Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveJames Hendler
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Quinsulon Israel
 
Day2-Slides.ppt pppppppppppppppppppppppppp
Day2-Slides.ppt ppppppppppppppppppppppppppDay2-Slides.ppt pppppppppppppppppppppppppp
Day2-Slides.ppt ppppppppppppppppppppppppppratnapatil14
 
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...Enrico Santus Aversano
 
Themes identification techniques in qualitative research
Themes identification techniques in qualitative researchThemes identification techniques in qualitative research
Themes identification techniques in qualitative researchGhulam Qambar
 
Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...Marcia Zeng
 
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...CITE
 
Deep Machine Reading for Customer Analytics
Deep Machine Reading for Customer AnalyticsDeep Machine Reading for Customer Analytics
Deep Machine Reading for Customer AnalyticsNaveen Ashish
 
Search explained T3DD15
Search explained T3DD15Search explained T3DD15
Search explained T3DD15Hans Höchtl
 
Do you Mean what you say? Recognizing Emotions.
Do you Mean what you say? Recognizing Emotions.Do you Mean what you say? Recognizing Emotions.
Do you Mean what you say? Recognizing Emotions.Sunil Kumar Kopparapu
 

Similar to Zouaq wole2013 (20)

SemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment AnalysisSemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment Analysis
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
 
VOC real world enterprise needs
VOC real world enterprise needsVOC real world enterprise needs
VOC real world enterprise needs
 
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged CorpusRANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
 
Information extraction for Free Text
Information extraction for Free TextInformation extraction for Free Text
Information extraction for Free Text
 
RANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpusRANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpus
 
Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspective
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
 
Day2-Slides.ppt pppppppppppppppppppppppppp
Day2-Slides.ppt ppppppppppppppppppppppppppDay2-Slides.ppt pppppppppppppppppppppppppp
Day2-Slides.ppt pppppppppppppppppppppppppp
 
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
 
Arabic question answering ‫‬
Arabic question answering ‫‬Arabic question answering ‫‬
Arabic question answering ‫‬
 
Themes identification techniques in qualitative research
Themes identification techniques in qualitative researchThemes identification techniques in qualitative research
Themes identification techniques in qualitative research
 
Ontology
OntologyOntology
Ontology
 
Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...
 
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
 
Deep Machine Reading for Customer Analytics
Deep Machine Reading for Customer AnalyticsDeep Machine Reading for Customer Analytics
Deep Machine Reading for Customer Analytics
 
Knowledge Extraction
Knowledge ExtractionKnowledge Extraction
Knowledge Extraction
 
Search explained T3DD15
Search explained T3DD15Search explained T3DD15
Search explained T3DD15
 
Eacl 2006 Pedersen
Eacl 2006 PedersenEacl 2006 Pedersen
Eacl 2006 Pedersen
 
Do you Mean what you say? Recognizing Emotions.
Do you Mean what you say? Recognizing Emotions.Do you Mean what you say? Recognizing Emotions.
Do you Mean what you say? Recognizing Emotions.
 

Recently uploaded

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 

Recently uploaded (20)

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 

Zouaq wole2013

  • 1. Can we use Linked Data Semantic Annotators for the Extraction of Domain- Relevant Expressions? Prof. Michel Gagnon, Ecole Polytechnique de Montréal, Canada Prof. Amal Zouaq, Royal Military College of Canada, Canada Dr. Ludovic Jean-louis, Ecole Polytechnique de Montréal, Canada
  • 2. Introduction What is Semantic Annotation on the LOD? Candidate Selection Disambiguation Traditionally focused on named entity annotation
  • 3. Semantic Annotators LOD-based Semantic Annotators Most used annotators REST API / Web Services Default configuration Alchemy (entities + keywords) YahooZemanta Wikimeta DBPedia Spotlight OpenCalais Lupedia
  • 4. Annotation Evaluation Few comparative studies have been published on the performance of linked data Semantic Annotators The available studies generally focus on “traditional” named entity annotation evaluation (ORG, Person, etc.)
  • 5. NEs versus domain topics Semantic Annotators start to go beyond these traditional NEs Predefined classes such as PERSON, ORG Expansion of possible classes (NERD Taxonomy : sport events, operating systems, political events, …)  RDF (NE? Domain topic ?)  Semantic Web (NE? Domain topic ?) Domain Topics: important “concepts” for the domain of interest
  • 6. NE versus Topics Annotation
  • 7. Research Question 1 Previous studies did not draw any conclusion about the relevance of the annotated entities But relevance might prove to be a crucial point for tasks such as information seeking or query answering RQ1: Can we use linked data-based semantic annotators to accurately identify Domain-relevant expressions?
  • 8. Research Methodology We relied on a corpus of 8 texts taken from Wikipedia, all related to the artificial intelligence domain. Together, these texts represent 10570 words. We obtained 2023 expression occurrences among which 1151 were distinct.
  • 9. Building the Gold Standard We asked a human evaluator to analyze each expression and make the following decisions: Does the annotated expression represent an understandable named entity or topic? Is the expression a relevant keyword according to the document? Is the expression a relevant named entity according to the document?
  • 10. Distribution of Expressions Note that only 639 out of 1151 detected expressions are understandable (56%). Thus a substantial number of detected expressions represents noise.
  • 11. Frequency of detected Expressions Most of the expressions were detected by only one (76%) or two (16%) annotators. Semantic annotators seemed complementary. (326 relevant) (117 relevant) (40 relevant) (15 relevant) (4 relevant) (4 relevant) (1 relevant)
  • 12. Evaluation Standard precision, recall and F-score “Partial” recall, since our Gold Standard considers only the expressions detected by at least one annotator, instead of considering all possible expressions in the corpus.
  • 16. Few Highlights F-score is unsatisfactory for almost all annotators Best F-score around 62-63 (All expressions, topics) Alchemy seems to stand out It obtains a high value for recall when all expressions or only topics are considered But the precision of Alchemy (59%) is not very high, compared to the results of Zemanta (81%), OpenCalais (75%) and Yahoo (74%) None of the annotators was good at detecting relevant named entities. DBpedia Spotlight annotated many expressions that are not considered understandable
  • 17. Research Question 2 Given the complementarity of Semantic Annotators, we decided to experiment with this second research question: Can we improve the overall performance of semantic annotators using voting methods and machine learning?
  • 18. Voting and machine learning methods We experimented with the following methods: Simple votes Weighted votes where the weight is the precision of individual annotators on a training corpus K-nearest-neighbours classifier Naïve Bayes classifier Decision tree (C4.5) Rule induction (PART algorithm)
  • 19. Experiment Setup We divided the set of annotated expressions into 5 balanced partitions One partition for testing Remaining partitions for training We ran 5 experiments for each of the following situations: all entities, only named entities and only topics
  • 20. Baseline: Simple Votes All Expressions: Best prec : 77% (Zemanta) Best rec/F : 69% - 62% (Alchemy) Best P: 81% (R/F 12%-21%) (Zemanta) Best R/F: 69%-63% (Alchemy) P=59% Threshold: Number of annotators
  • 21. Weighted Votes Best P: 81% (Zemanta) (R/F 12%-21%) Best R/F: 69%-63% (Alchemy) P=59%
  • 22. Machine Learning Best P: 81% (Zemanta) (R/F 12%-21%) Best R/F: 69%-63% (Alchemy) P=59%
  • 23. Highlights On the average, weighted vote displays the best precision (0.66) among combination methods if only topics are considered Alchemy (0.59) Alchemy finds many relevant expressions that are not discovered by other annotators Machine learning methods achieve better recall and F-score on the average, especially if decision tree or rule induction (PART) is used
  • 24. Conclusion Can we use Linked Data Semantic Annotators for the Extraction of Domain- Relevant Expressions? Best overall performance (Recall/F-Score: Alchemy) (59%- 69%-63%) Best precision: Zemanta (81%) Need of filtering! Combination/machine learning methods Increased recall with weighted voting scheme
  • 25. Future Work Limitations The size of the gold standard A limited set of semantic annotators Increase the size of the GS Explore various combinations for annotators/ features Compare Semantic Annotators with keyword extractors
  • 26. QUESTIONS? Thank you for your attention! amal.zouaq@rmc.ca http://azouaq.athabascau.ca