The Problem Pattern-Based Measure Comparison Hybrid Measures Applications
Similarity Measures for
Semantic Relation Extraction
Montclair State University, Brown Bag Seminar (USA)
Alexander Panchenko
Université catholique de Louvain &
Digital Society Laboratory LLC
alexander.panchenko@uclouvain.be
May 2, 2014
Plan
1 The Context and the Problem
2 Pattern-Based Semantic Similarity Measure
3 Comparison of Similarity Measures
4 Hybrid Semantic Similarity Measures
5 Applications of Semantic Similarity Measures
Lexico-Semantic Search Engine “Serelex”
Filename Categorization System “iCOP”
Computational Lexical Semantics
* Picture is adapted from Computational Linguistics LINGI2263 course
http://www.uclouvain.be/en-cours-2013-LINGI2263.html
Introduction
Motivation
1 Synonyms, hypernyms and co-hyponyms are useful for:
text similarity (Šarić et al., 2012);
query expansion (Hsu et al., 2006);
question answering (Sun et al., 2005);
2 Manual resource construction is prohibitively expensive.
3 Extractors do not match the quality of handcrafted resources.
Focus
Similarity-based semantic relation extraction.
Research Question
How to improve precision and coverage of such measures?
Semantic Resources
Definition
A semantic resource is an undirected graph (C, R):
nodes C represent terms;
edges R represent untyped semantic relations.
Semantic Relation Extractors
We study extractors based on two components:
1 semantic similarity measures;
2 nearest neighbors procedures.
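[Figure: semantic relation extractor. Text-based data passes through a feature extractor (features F), a similarity measure (similarity scores S) and a normalizer (normalized S); together these form the semantic similarity measure. A kNN procedure then turns the similarities between the terms C into the semantic relations R.]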
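As a minimal sketch of the second component, assume the similarity scores are already available as a dictionary keyed by term pairs. The function name `knn_relations` and the toy scores are hypothetical and not taken from the slides. A nearest-neighbour procedure can then keep the k most similar neighbours of each term as untyped, undirected relations:

```python
from collections import defaultdict


def knn_relations(sim, k):
    """Return a set of untyped relations from pairwise similarity scores.

    sim maps an unordered term pair (ci, cj) to a similarity score s_ij.
    Each term keeps its k most similar neighbours; a relation is stored as a
    frozenset because the relations are undirected.
    """
    neighbours = defaultdict(list)
    for (ci, cj), score in sim.items():
        neighbours[ci].append((score, cj))
        neighbours[cj].append((score, ci))

    relations = set()
    for ci, scored in neighbours.items():
        # Sort by descending score; ties are broken alphabetically.
        top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
        for score, cj in top:
            if score > 0:  # a zero score is treated as "not related"
                relations.add(frozenset((ci, cj)))
    return relations


# Made-up scores, for illustration only.
toy_sim = {
    ("doctor", "physician"): 0.9,
    ("doctor", "nurse"): 0.6,
    ("doctor", "statute"): 0.0,
    ("nurse", "physician"): 0.5,
}
relations = knn_relations(toy_sim, k=1)
```

With k = 1 the toy input keeps the pairs doctor/physician and doctor/nurse; the zero-scored pair is dropped.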
Semantic Similarity Measures
Definition
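A semantic similarity measure quantifies the semantic relatedness of input
terms $c_i, c_j$ with the similarity score $s_{ij} = sim(c_i, c_j)$:
\[
s_{ij} =
\begin{cases}
\text{high} & \text{if } \langle c_i, c_j \rangle \text{ is a pair of synonyms, hypernyms or co-hyponyms} \\
0 & \text{otherwise}
\end{cases}
\]
Properties
Nonnegativity: $0 \le s_{ij} \le 1$;
Reflexivity: $s_{ij} = 1 \Leftrightarrow c_i = c_j$;
Symmetry: $s_{ij} = s_{ji}$;
Triangle inequality: $s_{ij} \le s_{ik} + s_{kj}$.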
Semantic Similarity Measures
Many dissimilar pairs, few similar pairs: $s_{ij} \sim \mathrm{Exp}(\lambda)$.
[Figure: similarity distribution of the term “doctor”.]
Evaluation of Semantic Similarity Measures
1 Correlations with human judgments:
Criterion: Pearson correlation (ρ) and Spearman correlation (r).
Datasets: MC, RG, WordSim.
2 Semantic relation ranking:
Criterion: Precision, Recall, F-measure.
Dataset: BLESS, SN.
3 Semantic relation extraction:
Criterion: Precision@k.
Data: annotation and/or dictionaries.
4 Application-based evaluation:
short text classification system (iCOP);
lexico-semantic search engine (Serelex).
Panchenko A. Similarity Measures for Semantic Relation
Extraction. PhD thesis. Université catholique de Louvain. 197
pages, 2013 (Chapter 1).
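The first criterion, correlation with human judgments, can be sketched as follows. This is a plain-Python illustration that assumes the human judgments and the measure's scores are given as two equally long lists; the helper names and the numbers are made up for the example and are not from the MC, RG or WordSim datasets.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


human = [3.9, 3.1, 1.2, 0.4]        # made-up human judgments
measure = [0.80, 0.30, 0.35, 0.01]  # made-up similarity scores
rank_correlation = spearman(human, measure)
```

The Spearman variant only depends on the ordering of the pairs, so it is insensitive to the scale of the similarity scores.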
Correlations with human judgments
Semantic Relation Ranking
Precision at k = 50%: P = 6/7 ≈ 0.86 (six of the top seven pairs are correct).
word c_i     word c_j      relation type   s_ij
aficionado   enthusiast    syn             0.07197
aficionado   fan           syn             0.05195
aficionado   admirer       syn             0.01964
aficionado   addict        syn             0.01326
aficionado   devotee       syn             0.01163
aficionado   foundling     random          0.00777
aficionado   fanatic       syn             0.00414
aficionado   adherent      syn             0.00353
aficionado   capital       random          0.00232
aficionado   statute       random          0.00029
aficionado   blot          random          0.00025
aficionado   meddler       random          0.00005
aficionado   enlargement   random          0.00003
aficionado   bawdyhouse    random          0.00000
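The precision figure for the ranking above can be reproduced with a short sketch. It assumes that k is a percentage of the ranked list and that any relation type other than "random" counts as correct; the function name `precision_at_percent` is hypothetical.

```python
def precision_at_percent(ranked, percent, negative="random"):
    """Precision among the top `percent` % of a list sorted by descending score.

    ranked is a list of (term, relation_type) tuples.
    """
    cutoff = len(ranked) * percent // 100
    top = ranked[:cutoff]
    if not top:
        return 0.0
    correct = sum(1 for _, relation in top if relation != negative)
    return correct / len(top)


# Candidates for "aficionado", in the order of the ranking above.
ranked = [
    ("enthusiast", "syn"), ("fan", "syn"), ("admirer", "syn"),
    ("addict", "syn"), ("devotee", "syn"), ("foundling", "random"),
    ("fanatic", "syn"), ("adherent", "syn"), ("capital", "random"),
    ("statute", "random"), ("blot", "random"), ("meddler", "random"),
    ("enlargement", "random"), ("bawdyhouse", "random"),
]
p50 = precision_at_percent(ranked, 50)  # top 7 of 14 pairs, about 0.86
```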
Related publications
This work stems from Hearst, M. A. Automatic acquisition of
hyponyms from large text corpora. In ACL, pages 539–545,
1992.
Selected publications:
Panchenko A., Morozova O., Naets H. A Semantic
Similarity Measure Based on Lexico-Syntactic Patterns.
In Proceedings of KONVENS 2012, pp.174–178, Vienna
(Austria), 2012
Panchenko A., Romanov P., Morozova O., Naets H.,
Philippovich A., Fairon C. Serelex: Search and
Visualization of Semantically Related Words. In
Proceedings of the 35th European Conference on Information
Retrieval (ECIR 2013), Moscow (Russia), 2013.
A live demo
http://serelex.cental.be/
Lexico-syntactic patterns
18 patterns that extract hypernyms, co-hyponyms and
synonyms
Patterns are encoded as FSTs
Finite State Transducers (FSTs)
Open source corpus processing tool Unitex:
http://igm.univ-mlv.fr/~unitex/
A pattern encoded as an FST
Take into account linguistic variation
Unlike string-based patterns (Bollegala et al., 2007)
Patterns extract concordances
such diverse {[occupations]} as {[doctors]},
{[engineers]} and {[scientists]}[PATTERN=1]
such {non-alcoholic [sodas]} as {[root beer]} and
{[cream soda]}[PATTERN=1]
{traditional[food]}, such as
{[sandwich]},{[burger]}, and {[fry]}[PATTERN=2]
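A concordance in this markup can be turned into terms plus a pattern identifier with two regular expressions. The sketch below is an illustration only: it assumes that curly braces delimit an extracted phrase, that square brackets inside the braces mark the term to keep, and that the trailing `[PATTERN=n]` tag names the pattern that fired. The function name `parse_concordance` is hypothetical.

```python
import re

# A bracketed term inside a {...} group, e.g. "{non-alcoholic [sodas]}".
TERM = re.compile(r"\{[^{}\[\]]*\[([^\[\]]+)\][^{}]*\}")
# The trailing pattern tag, e.g. "[PATTERN=1]".
PATTERN_ID = re.compile(r"\[PATTERN=(\d+)\]")


def parse_concordance(line):
    """Return (terms, pattern_id) for one annotated concordance line."""
    terms = TERM.findall(line)
    match = PATTERN_ID.search(line)
    pattern_id = int(match.group(1)) if match else None
    return terms, pattern_id


example = ("such {non-alcoholic [sodas]} as {[root beer]} and "
           "{[cream soda]}[PATTERN=1]")
terms, pattern_id = parse_concordance(example)
```

Counting how often each pair of terms co-occurs in such parsed concordances gives the pair frequencies that a reranking step can work with.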
Corpus
Corpus Wikipedia+ukWaC: $2.9 \cdot 10^{12}$ tokens
Extracted concordances
Wikipedia: 1,196,468
ukWaC: 2,227,025
WaCypedia+ukWaC: 3,423,493
Reranking formula Efreq-Rnum-Cfreq-Pnum
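\[
s_{ij} = \sqrt{p_{ij}} \cdot \frac{2 \cdot \mu_b}{b_{i*} + b_{*j}} \cdot \frac{P(c_i, c_j)}{P(c_i) P(c_j)}
\]
$P(c_i, c_j) = \frac{e_{ij}}{\sum_{ij} e_{ij}}$ is the extraction probability of the pair $\langle c_i, c_j \rangle$; $e_{ij}$ is the frequency of co-occurrence of $c_i$ and $c_j$ in the concordances $K$.
$P(c_i) = \frac{f_i}{\sum_i f_i}$ is the probability of the term $c_i$; $f_i$ is the frequency of $c_i$.
$b_{i*} = \sum_{j : e_{ij} \ge \beta} 1$ is the number of extractions for term $c_i$ with frequency $\ge \beta$; $\mu_b = \frac{1}{|C|} \sum_{i=1}^{|C|} b_{i*}$ is the average number of extractions per term.
$p_{ij} \in [1; 18]$ is the number of distinct patterns that extracted the relation $\langle c_i, c_j \rangle$.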
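The reranking formula can be sketched in a few lines. The following is an illustration under stated assumptions rather than the PatternSim implementation: each unordered pair is stored once, the per-term extraction count is used for both terms of a pair, the vocabulary C is taken to be the keys of `term_freq`, and pairs whose terms have no extraction above the threshold get a score of 0. The function and argument names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def rerank(pair_freq, term_freq, pair_patterns, beta=1):
    """Score each extracted pair with the Efreq-Rnum-Cfreq-Pnum formula.

    pair_freq:     {(ci, cj): e_ij}, co-occurrence frequency in concordances
    term_freq:     {ci: f_i}, term frequency
    pair_patterns: {(ci, cj): p_ij}, number of distinct patterns for the pair
    beta:          minimum pair frequency for an extraction to be counted
    """
    total_e = sum(pair_freq.values())
    total_f = sum(term_freq.values())

    # b[c]: number of extractions involving term c with frequency >= beta.
    b = defaultdict(int)
    for (ci, cj), e in pair_freq.items():
        if e >= beta:
            b[ci] += 1
            b[cj] += 1
    mu_b = sum(b.values()) / len(term_freq)  # average extractions per term

    scores = {}
    for (ci, cj), e in pair_freq.items():
        denom = b[ci] + b[cj]
        if denom == 0:  # assumption: no counted extraction, score 0
            scores[(ci, cj)] = 0.0
            continue
        p_pair = e / total_e
        p_i = term_freq[ci] / total_f
        p_j = term_freq[cj] / total_f
        scores[(ci, cj)] = (sqrt(pair_patterns[(ci, cj)])
                            * (2 * mu_b / denom)
                            * p_pair / (p_i * p_j))
    return scores


# Made-up counts, for illustration only.
scores = rerank(
    pair_freq={("a", "b"): 2, ("a", "c"): 1},
    term_freq={"a": 4, "b": 2, "c": 2},
    pair_patterns={("a", "b"): 4, ("a", "c"): 1},
)
```

In the toy input the pair found more often and by more patterns receives the higher score.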
Semantic Relation Ranking
Precision is comparable or better w.r.t. the baselines;
Recall is lower w.r.t. the baselines.
Figure : Precision-Recall graphs (the BLESS dataset).
Semantic Relation Extraction
Precision@1 ≈ 0.80;
“Good” coverage:
Related publications
Panchenko A. A Study of Heterogeneous Similarity
Measures for Semantic Relation Extraction. In
JEP-TALN-RECITAL 2012, Grenoble (France), 2012.
Panchenko A. Similarity Measures for Semantic Relation
Extraction. PhD thesis. Université catholique de Louvain.
197 pages, 2013: Chapters 2.1, 3.1.
Compared Semantic Similarity Measures
37 distinct measures;
Q1: Are the measures complementary?
Q2: If yes, in which respects?
The Best Single Measures (MC, RG, WordSim, BLESS, SN)
Each one extracts many co-hyponyms, e.g.:
⟨Canon, Nikon⟩,
⟨Lamborghini, Ferrari⟩,
⟨Obama, Romney⟩.
Further Results
Most dissimilar measures
Figure: 21 measures grouped according to their relation distributions.
Measures are complementary w.r.t.:
lexical coverage;
performance;
types of semantic relations they extract.
Implementation of the baseline measures
Semantic Vectors:
https://code.google.com/p/semanticvectors/
S-Space Package:
https://code.google.com/p/airhead-research/
WordNet::Similarity:
http://wn-similarity.sourceforge.net
NLTK: http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
WikiRelate!
PatternSim / Serelex: http://serelex.cental.be
Web-based metrics:
http://cwl-projects.cogsci.rpi.edu/msr
LSA: http://lsa.colorado.edu
Related publications
Panchenko A., Morozova O. A Study of Hybrid Similarity
Measures for Semantic Relation Extraction. Innovative
Hybrid Approaches to the Processing of Textual Data
Workshop, EACL 2012, Avignon (France), 2012, pp. 10–18.
Panchenko A. Similarity Measures for Semantic Relation
Extraction. PhD thesis. Université catholique de Louvain.
197 pages, 2013 (Chapter 4).
Panchenko A. A Study of Heterogeneous Similarity
Measures for Semantic Relation Extraction. In
JEP-TALN-RECITAL 2012, Grenoble (France), 2012, pp. 29–42.
Hybrid vs Single Measures
[Figure: semantic relation extractor based on (a) a single similarity measure and (b) a hybrid similarity measure. In (a), a measure sim_i maps the terms C to similarity scores S_i, which are normalized (norm) and passed to the kNN procedure (knn) to produce the relations R. In (b), the measures sim_1, ..., sim_N produce S_1, ..., S_N from the features; each is normalized, a combination method merges them into S_cmb, and the kNN procedure produces the relations R.]
16 Features = 16 Single Similarity Measures
5 network-based measures :
1 WuPalmer;
2 Leacock and Chodorow;
3 Resnik;
4 Jiang and Conrath;
5 Lin.
3 web-based measures (NGD-Yahoo/Bing/Google);
5 corpus-based measures:
2 distributional (BDA, SDA)
1 lexico-syntactic patterns (PatternSim)
2 other co-occurrence based (LSA, NGD-Factiva)
3 definition-based measures
1 ExtendedLesk;
2 GlossVectors;
3 DefVectors-WktWiki.
Unsupervised Combination Methods
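1 Mean: $s^{cmb}_{ij} = \frac{1}{K} \sum_{k=1}^{K} s^{k}_{ij}$;
2 Mean-Nnz: $s^{cmb}_{ij} = \frac{1}{|\{k : s^{k}_{ij} > 0,\ k = 1, \dots, K\}|} \sum_{k=1}^{K} s^{k}_{ij}$;
3 Mean-Zscore: $S^{cmb} = \frac{1}{K} \sum_{k=1}^{K} \frac{S_k - \mu_k}{\sigma_k}$;
4 Median: $s^{cmb}_{ij} = \mathrm{median}(s^{1}_{ij}, \dots, s^{K}_{ij})$;
5 Max: $s^{cmb}_{ij} = \max(s^{1}_{ij}, \dots, s^{K}_{ij})$;
6 RankFusion: $s^{cmb}_{ij} = \frac{1}{K} \sum_{k=1}^{K} r^{k}_{ij}$;
7 RelationFusion (Panchenko and Morozova, 2012).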
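Four of these combination methods operate on the K scores of a single pair and can be sketched directly. The code below is an illustration that assumes the scores of the K single measures for one pair are given as a list of non-negative numbers; the function name `combine` and the method labels are hypothetical. Mean-Zscore, RankFusion and RelationFusion need statistics over whole similarity matrices and are left out.

```python
from statistics import median


def combine(scores, method="mean"):
    """Combine the scores s^1_ij, ..., s^K_ij of one term pair."""
    if method == "mean":
        return sum(scores) / len(scores)
    if method == "mean-nnz":
        # Average over the measures that returned a non-zero score.
        nonzero = [s for s in scores if s > 0]
        return sum(nonzero) / len(nonzero) if nonzero else 0.0
    if method == "median":
        return median(scores)
    if method == "max":
        return max(scores)
    raise ValueError(f"unknown combination method: {method}")


# Made-up scores of K = 4 measures for one pair.
pair_scores = [0.0, 0.2, 0.4, 0.0]
combined = {m: combine(pair_scores, m)
            for m in ("mean", "mean-nnz", "median", "max")}
```

Mean-Nnz differs from Mean only when some measures return zero for the pair, for example because one of the terms is outside a measure's vocabulary.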
Supervised Combination Methods
8 Logit, Logit-L1, Logit-L2.
A binary logistic regression;
Positive examples – synonyms, hyponyms, co-hyponyms from BLESS/SN;
Negative examples – random relations from BLESS/SN;
A relation $\langle c_i, t, c_j \rangle \in R$ is represented with a vector of pairwise similarities: $x = (s^{1}_{ij}, \dots, s^{N}_{ij})$, $N = 2, \dots, 16$;
Category $y_{ij}$:
\[
y_{ij} =
\begin{cases}
0 & \text{if } \langle c_i, t, c_j \rangle \text{ is a random relation} \\
1 & \text{otherwise}
\end{cases}
\]
Using the model $(w_1, \dots, w_K)$ for combination:
\[
s^{cmb}_{ij} = \frac{1}{1 + e^{-z}}, \quad z = \sum_{k=1}^{K} w_k s^{k}_{ij} + w_0.
\]
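Applying a trained logistic model to one pair amounts to a weighted sum followed by the logistic function. The sketch below only covers this scoring step; the weights are made up for the example, whereas in the setting described above they would be learned from the BLESS/SN training pairs. The function name `logit_combine` is hypothetical.

```python
from math import exp


def logit_combine(scores, weights, bias):
    """Hybrid score: logistic function of z = sum_k w_k * s^k_ij + w_0."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + exp(-z))


# Made-up weights for K = 3 single measures (not learned from data).
weights = [2.0, 1.0, 0.5]
bias = -1.0
related = logit_combine([0.9, 0.8, 0.7], weights, bias)
unrelated = logit_combine([0.0, 0.1, 0.0], weights, bias)
```

Because the logistic function is monotonic, ranking pairs by the hybrid score is the same as ranking them by z; the squashing only maps the result into the interval (0, 1).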
9 SVM.
The weights $w$ and the support vectors $SV$:
\[
w = \sum_{x_i \in SV} \alpha_i y_i x_i.
\]
Using the model:
\[
s^{cmb}_{ij} = w^{T} x + b = \sum_{k=1}^{K} w_k s^{k}_{ij} + b.
\]
Hybrid Similarity Measures
Precision-Recall graphs calculated on the BLESS dataset:
(a) 16 single measures and the best hybrid measure Logit-E15;
(b) 8 hybrid measures.
Hybrid Similarity Measure Logit-E15
Figure : Similarity scores between 74 words related to the word “acacia”.
Supervised Hybrid Similarity Measures
Figure : Meta-parameter optimization with the grid search of the
C-SVM-radial-E15 measure.
Lexico-Semantic Search Engine “Serelex”
Related publications
Panchenko A., Romanov P., Morozova O., Naets H.,
Philippovich A., Fairon C. Serelex: Search and
Visualization of Semantically Related Words. In
Proceedings of the 35th European Conference on Information
Retrieval (ECIR 2013), Moscow (Russia), 2013.
Panchenko A., Naets H., Brouwers L., Romanov P., Fairon C.
Recherche et visualisation de mots sémantiquement liés.
Actes de la 20e conférence sur le Traitement Automatique des
Langues Naturelles (TALN’2013). Les Sables d’Olonne,
France. pp. 747–754, 2013.
Search for Related Words: the List and the Graph
http://serelex.cental.be/
Search for Related Words: the Images
Evaluation of Serelex
Figure : Users’ satisfaction with the top 20 results.
Filename Categorization System “iCOP”
Related publications
Panchenko A., Naets H., Beaufort R., Fairon C. Towards
Detection of Child Sexual Abuse Media: Classification of
the Associated Filenames. In Proceedings of the 35th
European Conference on Information Retrieval (ECIR 2013).
LNCS 7814, pp. 776-779. Springer-Verlag Berlin Heidelberg
2013.
Panchenko A., Beaufort R., Fairon C. Detection of Child
Sexual Abuse Media on P2P Networks: Normalization
and Classification of Associated Filenames. In
Proceedings of Workshop on Language Resources for Public
Security Applications of the 8th International Conference on
Language Resources and Evaluation (LREC), 2012
Short text classification with Vocabulary Projection
Evaluation of the Vocabulary Projection
Training Dataset            Test Dataset                Accuracy   Accuracy (voc. projection)
Gallery (train)             Gallery                     96.41      96.83 (+0.42)
PirateBay Title+Desc+Tags   PirateBay Title+Desc+Tags   98.92      98.86 (-0.06)
PirateBay Title+Tags        PirateBay Title+Tags        97.73      97.63 (-0.10)
Gallery                     PirateBay Title+Desc+Tags   90.57      91.48 (+0.91)
Gallery                     PirateBay Title+Tags        84.23      88.89 (+4.66)
PirateBay Title+Desc+Tags   Gallery                     88.83      89.04 (+0.21)
PirateBay Title+Tags        Gallery                     91.16      91.30 (+0.14)
Table: Performance of a linear C-SVM classifier (10-fold cross-validation).
Thank you! Questions?
Comparison Intelligent Electronic Assessment with Traditional Assessment for ...CSEIJJournal
 
COMPARISON INTELLIGENT ELECTRONIC ASSESSMENT WITH TRADITIONAL ASSESSMENT FOR ...
COMPARISON INTELLIGENT ELECTRONIC ASSESSMENT WITH TRADITIONAL ASSESSMENT FOR ...COMPARISON INTELLIGENT ELECTRONIC ASSESSMENT WITH TRADITIONAL ASSESSMENT FOR ...
COMPARISON INTELLIGENT ELECTRONIC ASSESSMENT WITH TRADITIONAL ASSESSMENT FOR ...cseij
 
COMPARISON INTELLIGENT ELECTRONIC ASSESSMENT WITH TRADITIONAL ASSESSMENT FOR ...
COMPARISON INTELLIGENT ELECTRONIC ASSESSMENT WITH TRADITIONAL ASSESSMENT FOR ...COMPARISON INTELLIGENT ELECTRONIC ASSESSMENT WITH TRADITIONAL ASSESSMENT FOR ...
COMPARISON INTELLIGENT ELECTRONIC ASSESSMENT WITH TRADITIONAL ASSESSMENT FOR ...cseij
 

Similaire à Similarity Measures for Semantic Relation Extraction (20)

Community detection using citation relations and textual similarities in a la...
Community detection using citation relations and textual similarities in a la...Community detection using citation relations and textual similarities in a la...
Community detection using citation relations and textual similarities in a la...
 
Correlation research design presentation 2015
Correlation research design presentation 2015Correlation research design presentation 2015
Correlation research design presentation 2015
 
Recommender system
Recommender systemRecommender system
Recommender system
 
32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..
 
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...
 
ppt research method 1.ppt
ppt research method 1.pptppt research method 1.ppt
ppt research method 1.ppt
 
Chapter 3
Chapter 3Chapter 3
Chapter 3
 
Correlation research
Correlation researchCorrelation research
Correlation research
 
Mixed methods-research -design-and-procedures
Mixed methods-research -design-and-proceduresMixed methods-research -design-and-procedures
Mixed methods-research -design-and-procedures
 
Measure Term Similarity Using a Semantic Network Approach
Measure Term Similarity Using a Semantic Network ApproachMeasure Term Similarity Using a Semantic Network Approach
Measure Term Similarity Using a Semantic Network Approach
 
Systematic Reviews, Tech Mining, and Other Knowledge Synthesis Beasts of Burden
Systematic Reviews, Tech Mining, and Other Knowledge Synthesis Beasts of BurdenSystematic Reviews, Tech Mining, and Other Knowledge Synthesis Beasts of Burden
Systematic Reviews, Tech Mining, and Other Knowledge Synthesis Beasts of Burden
 
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURESNAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
 
SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSS
 
A Survey on Unsupervised Graph-based Word Sense Disambiguation
A Survey on Unsupervised Graph-based Word Sense DisambiguationA Survey on Unsupervised Graph-based Word Sense Disambiguation
A Survey on Unsupervised Graph-based Word Sense Disambiguation
 
Probit analysis in toxicological studies
Probit analysis in toxicological studies Probit analysis in toxicological studies
Probit analysis in toxicological studies
 
Correlational research
Correlational researchCorrelational research
Correlational research
 
Context Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsContext Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word Pairs
 
Comparison Intelligent Electronic Assessment with Traditional Assessment for ...
Comparison Intelligent Electronic Assessment with Traditional Assessment for ...Comparison Intelligent Electronic Assessment with Traditional Assessment for ...
Comparison Intelligent Electronic Assessment with Traditional Assessment for ...
 
COMPARISON INTELLIGENT ELECTRONIC ASSESSMENT WITH TRADITIONAL ASSESSMENT FOR ...
COMPARISON INTELLIGENT ELECTRONIC ASSESSMENT WITH TRADITIONAL ASSESSMENT FOR ...COMPARISON INTELLIGENT ELECTRONIC ASSESSMENT WITH TRADITIONAL ASSESSMENT FOR ...
COMPARISON INTELLIGENT ELECTRONIC ASSESSMENT WITH TRADITIONAL ASSESSMENT FOR ...
 
COMPARISON INTELLIGENT ELECTRONIC ASSESSMENT WITH TRADITIONAL ASSESSMENT FOR ...
COMPARISON INTELLIGENT ELECTRONIC ASSESSMENT WITH TRADITIONAL ASSESSMENT FOR ...COMPARISON INTELLIGENT ELECTRONIC ASSESSMENT WITH TRADITIONAL ASSESSMENT FOR ...
COMPARISON INTELLIGENT ELECTRONIC ASSESSMENT WITH TRADITIONAL ASSESSMENT FOR ...
 

Plus de Alexander Panchenko

Graph's not dead: from unsupervised induction of linguistic structures from t...
Graph's not dead: from unsupervised induction of linguistic structures from t...Graph's not dead: from unsupervised induction of linguistic structures from t...
Graph's not dead: from unsupervised induction of linguistic structures from t...Alexander Panchenko
 
Building a Web-Scale Dependency-Parsed Corpus from Common Crawl
Building a Web-Scale Dependency-Parsed Corpus from Common CrawlBuilding a Web-Scale Dependency-Parsed Corpus from Common Crawl
Building a Web-Scale Dependency-Parsed Corpus from Common CrawlAlexander Panchenko
 
Improving Hypernymy Extraction with Distributional Semantic Classes
Improving Hypernymy Extraction with Distributional Semantic ClassesImproving Hypernymy Extraction with Distributional Semantic Classes
Improving Hypernymy Extraction with Distributional Semantic ClassesAlexander Panchenko
 
Inducing Interpretable Word Senses for WSD and Enrichment of Lexical Resources
Inducing Interpretable Word Senses for WSD and Enrichment of Lexical ResourcesInducing Interpretable Word Senses for WSD and Enrichment of Lexical Resources
Inducing Interpretable Word Senses for WSD and Enrichment of Lexical ResourcesAlexander Panchenko
 
IIT-UHH at SemEval-2017 Task 3: Exploring Multiple Features for Community Que...
IIT-UHH at SemEval-2017 Task 3: Exploring Multiple Features for Community Que...IIT-UHH at SemEval-2017 Task 3: Exploring Multiple Features for Community Que...
IIT-UHH at SemEval-2017 Task 3: Exploring Multiple Features for Community Que...Alexander Panchenko
 
Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Indu...
Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Indu...Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Indu...
Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Indu...Alexander Panchenko
 
The 6th Conference on Analysis of Images, Social Networks, and Texts (AIST 2...
The 6th Conference on Analysis of Images, Social Networks, and Texts  (AIST 2...The 6th Conference on Analysis of Images, Social Networks, and Texts  (AIST 2...
The 6th Conference on Analysis of Images, Social Networks, and Texts (AIST 2...Alexander Panchenko
 
Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation
Using Linked Disambiguated Distributional Networks for Word Sense DisambiguationUsing Linked Disambiguated Distributional Networks for Word Sense Disambiguation
Using Linked Disambiguated Distributional Networks for Word Sense DisambiguationAlexander Panchenko
 
Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction...
Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction...Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction...
Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction...Alexander Panchenko
 
Noun Sense Induction and Disambiguation using Graph-Based Distributional Sema...
Noun Sense Induction and Disambiguation using Graph-Based Distributional Sema...Noun Sense Induction and Disambiguation using Graph-Based Distributional Sema...
Noun Sense Induction and Disambiguation using Graph-Based Distributional Sema...Alexander Panchenko
 
Getting started in Apache Spark and Flink (with Scala) - Part II
Getting started in Apache Spark and Flink (with Scala) - Part IIGetting started in Apache Spark and Flink (with Scala) - Part II
Getting started in Apache Spark and Flink (with Scala) - Part IIAlexander Panchenko
 
IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain ...
IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain ...IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain ...
IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain ...Alexander Panchenko
 

Plus de Alexander Panchenko (12)

Graph's not dead: from unsupervised induction of linguistic structures from t...
Graph's not dead: from unsupervised induction of linguistic structures from t...Graph's not dead: from unsupervised induction of linguistic structures from t...
Graph's not dead: from unsupervised induction of linguistic structures from t...
 
Building a Web-Scale Dependency-Parsed Corpus from Common Crawl
Building a Web-Scale Dependency-Parsed Corpus from Common CrawlBuilding a Web-Scale Dependency-Parsed Corpus from Common Crawl
Building a Web-Scale Dependency-Parsed Corpus from Common Crawl
 
Improving Hypernymy Extraction with Distributional Semantic Classes
Improving Hypernymy Extraction with Distributional Semantic ClassesImproving Hypernymy Extraction with Distributional Semantic Classes
Improving Hypernymy Extraction with Distributional Semantic Classes
 
Inducing Interpretable Word Senses for WSD and Enrichment of Lexical Resources
Inducing Interpretable Word Senses for WSD and Enrichment of Lexical ResourcesInducing Interpretable Word Senses for WSD and Enrichment of Lexical Resources
Inducing Interpretable Word Senses for WSD and Enrichment of Lexical Resources
 
IIT-UHH at SemEval-2017 Task 3: Exploring Multiple Features for Community Que...
IIT-UHH at SemEval-2017 Task 3: Exploring Multiple Features for Community Que...IIT-UHH at SemEval-2017 Task 3: Exploring Multiple Features for Community Que...
IIT-UHH at SemEval-2017 Task 3: Exploring Multiple Features for Community Que...
 
Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Indu...
Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Indu...Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Indu...
Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Indu...
 
The 6th Conference on Analysis of Images, Social Networks, and Texts (AIST 2...
The 6th Conference on Analysis of Images, Social Networks, and Texts  (AIST 2...The 6th Conference on Analysis of Images, Social Networks, and Texts  (AIST 2...
The 6th Conference on Analysis of Images, Social Networks, and Texts (AIST 2...
 
Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation
Using Linked Disambiguated Distributional Networks for Word Sense DisambiguationUsing Linked Disambiguated Distributional Networks for Word Sense Disambiguation
Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation
 
Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction...
Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction...Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction...
Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction...
 
Noun Sense Induction and Disambiguation using Graph-Based Distributional Sema...
Noun Sense Induction and Disambiguation using Graph-Based Distributional Sema...Noun Sense Induction and Disambiguation using Graph-Based Distributional Sema...
Noun Sense Induction and Disambiguation using Graph-Based Distributional Sema...
 
Getting started in Apache Spark and Flink (with Scala) - Part II
Getting started in Apache Spark and Flink (with Scala) - Part IIGetting started in Apache Spark and Flink (with Scala) - Part II
Getting started in Apache Spark and Flink (with Scala) - Part II
 
IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain ...
IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain ...IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain ...
IIT-TUDA at SemEval-2016 Task 5: Beyond Sentiment Lexicon: Combining Domain ...
 

Dernier

Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 

Dernier (20)

Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 

Similarity Measures for Semantic Relation Extraction

  • 1. Similarity Measures for Semantic Relation Extraction. Montclair State University, Brown Bag Seminar (USA). Alexander Panchenko, Université catholique de Louvain & Digital Society Laboratory LLC, alexander.panchenko@uclouvain.be. May 2, 2014. Sections: The Problem; Pattern-Based Measure; Comparison; Hybrid Measures; Applications.
  • 2. Plan: 1 The Context and the Problem; 2 Pattern-Based Semantic Similarity Measure; 3 Comparison of Similarity Measures; 4 Hybrid Semantic Similarity Measures; 5 Applications of Semantic Similarity Measures.
  • 3. Plan: 1 The Context and the Problem; 2 Pattern-Based Semantic Similarity Measure; 3 Comparison of Similarity Measures; 4 Hybrid Semantic Similarity Measures; 5 Applications of Semantic Similarity Measures (Lexico-Semantic Search Engine “Serelex”; Filename Categorization System “iCOP”).
  • 4. Computational Lexical Semantics. [Figure] * Picture is adapted from the Computational Linguistics LINGI2263 course: http://www.uclouvain.be/en-cours-2013-LINGI2263.html
  • 5. Introduction. Motivation: 1 Synonyms, hypernyms and co-hyponyms are useful for text similarity (Šarić et al., 2012), query expansion (Hsu et al., 2006) and question answering (Sun et al., 2005). 2 Manual resource construction is prohibitively expensive. 3 Automatic extractors do not match the quality of handcrafted resources. Focus: similarity-based semantic relation extraction. Research question: how can the precision and coverage of such measures be improved?
  • 6. Semantic Resources. Definition: a semantic resource is an undirected graph (C, R), where nodes C represent terms and edges R represent untyped semantic relations.
  • 7. Semantic Relation Extractors. We study extractors based on two components: 1 semantic similarity measures; 2 nearest neighbors procedures. [Figure: block diagram of a semantic relation extractor. Labeled components: terms C, text-based data, feature extractor (F), similarity measure, normalizer (S), kNN procedure, semantic relations R.]
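The second component, a nearest neighbors procedure, can be illustrated with a minimal sketch. It assumes the similarity scores are already available as a nested dictionary; the function name `knn_relations`, the parameters `k` and `threshold`, and the toy scores are hypothetical and do not come from the talk.

```python
def knn_relations(sim, k=1, threshold=0.0):
    """Return a set of untyped, undirected relations.

    sim: dict mapping each term to a dict {other_term: similarity score}.
    For every term, keep its k most similar neighbors whose score
    exceeds `threshold` (both parameters are illustrative assumptions).
    """
    relations = set()
    for term, scores in sim.items():
        ranked = sorted(
            ((score, other) for other, score in scores.items()
             if other != term and score > threshold),
            reverse=True,
        )
        for _, other in ranked[:k]:
            # frozenset makes the edge undirected: {a, b} == {b, a}
            relations.add(frozenset((term, other)))
    return relations


# Toy similarity scores (invented for illustration only).
toy_sim = {
    "doctor":  {"nurse": 0.9, "surgeon": 0.8, "statute": 0.01},
    "nurse":   {"doctor": 0.9, "surgeon": 0.5, "statute": 0.0},
    "surgeon": {"doctor": 0.8, "nurse": 0.5, "statute": 0.0},
    "statute": {"doctor": 0.01, "nurse": 0.0, "surgeon": 0.0},
}
edges = knn_relations(toy_sim, k=1, threshold=0.05)
```

The resulting edge set plays the role of R in the graph (C, R); the threshold keeps weakly related terms such as "statute" from being linked to anything.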
  • 8. Semantic Similarity Measures. Definition: a semantic similarity measure quantifies the semantic relatedness of input terms $c_i, c_j$ with the similarity score $s_{ij} = \mathrm{sim}(c_i, c_j)$:
      $$s_{ij} = \begin{cases} \text{high} & \text{if } \langle c_i, c_j \rangle \text{ is a pair of synonyms, hypernyms or co-hyponyms} \\ 0 & \text{otherwise} \end{cases}$$
      Properties: nonnegativity, $0 \le s_{ij} \le 1$; reflexivity, $s_{ij} = 1 \Leftrightarrow c_i = c_j$; symmetry, $s_{ij} = s_{ji}$; triangle inequality, $s_{ij} \le s_{ik} + s_{kj}$.
  • 9. Semantic Similarity Measures. Many dissimilar pairs, few similar pairs: $s_{ij} \sim \exp(\lambda)$. [Figure: similarity distribution of the term “doctor”.]
  • 10. Evaluation of Semantic Similarity Measures. 1 Correlations with human judgments. Criterion: Pearson correlation (ρ) and Spearman correlation (r). Datasets: MC, RG, WordSim. 2 Semantic relation ranking. Criterion: Precision, Recall, F-measure. Datasets: BLESS, SN. 3 Semantic relation extraction. Criterion: Precision@k. Data: annotation and/or dictionaries. 4 Application-based evaluation: short text classification system (iCOP); lexico-semantic search engine (Serelex). Source: Panchenko A., Similarity Measures for Semantic Relation Extraction. PhD thesis, Université catholique de Louvain, 197 pages, 2013 (Chapter 1).
  • 11. Correlations with human judgments. [Figure]
  • 12. Semantic Relation Ranking. Precision: $P(k = 50) = \frac{6}{7} \approx 0.86$ (6 of the top 7 pairs, i.e. the top 50% of the 14 ranked pairs, are correct).
      word c_i     word c_j      relation type   s_ij
      aficionado   enthusiast    syn             0.07197
      aficionado   fan           syn             0.05195
      aficionado   admirer       syn             0.01964
      aficionado   addict        syn             0.01326
      aficionado   devotee       syn             0.01163
      aficionado   foundling     random          0.00777
      aficionado   fanatic       syn             0.00414
      aficionado   adherent      syn             0.00353
      aficionado   capital       random          0.00232
      aficionado   statute       random          0.00029
      aficionado   blot          random          0.00025
      aficionado   meddler       random          0.00005
      aficionado   enlargement   random          0.00003
      aficionado   bawdyhouse    random          0.00000
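The precision figure on this slide can be recomputed from the ranked list. The sketch below assumes that k is a percentage of the ranked list and that every relation type other than "random" counts as correct; the function name `precision_at_percent` is hypothetical.

```python
# (candidate term, relation type, similarity score) for the target word "aficionado",
# copied from the table on the slide.
ranked = [
    ("enthusiast", "syn", 0.07197), ("fan", "syn", 0.05195),
    ("admirer", "syn", 0.01964), ("addict", "syn", 0.01326),
    ("devotee", "syn", 0.01163), ("foundling", "random", 0.00777),
    ("fanatic", "syn", 0.00414), ("adherent", "syn", 0.00353),
    ("capital", "random", 0.00232), ("statute", "random", 0.00029),
    ("blot", "random", 0.00025), ("meddler", "random", 0.00005),
    ("enlargement", "random", 0.00003), ("bawdyhouse", "random", 0.00000),
]


def precision_at_percent(pairs, k_percent):
    """Precision among the top k_percent of pairs, ranked by score."""
    ordered = sorted(pairs, key=lambda row: row[2], reverse=True)
    cutoff = round(len(ordered) * k_percent / 100)
    top = ordered[:cutoff]
    if not top:
        return 0.0
    correct = sum(1 for _, relation, _ in top if relation != "random")
    return correct / len(top)


p50 = precision_at_percent(ranked, 50)  # top 7 of 14 pairs, 6 of them "syn"
```

Under these assumptions the top half of the list contains one random pair ("foundling"), which gives 6/7.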
  • 14. Related publications. This work stems from: Hearst, M. A. Automatic acquisition of hyponyms from large text corpora. In ACL, pages 539–545, 1992. Selected publications: Panchenko A., Morozova O., Naets H. A Semantic Similarity Measure Based on Lexico-Syntactic Patterns. In Proceedings of KONVENS 2012, pp. 174–178, Vienna (Austria), 2012. Panchenko A., Romanov P., Morozova O., Naets H., Philippovich A., Fairon C. Serelex: Search and Visualization of Semantically Related Words. In Proceedings of the 35th European Conference on Information Retrieval (ECIR 2013), Moscow (Russia), 2013.
  • 15. A live demo: http://serelex.cental.be/
  • 16. Lexico-syntactic patterns: 18 patterns that extract hypernyms, co-hyponyms and synonyms.
  • 17. Patterns are encoded as Finite State Transducers (FSTs), using the open source corpus processing tool Unitex: http://igm.univ-mlv.fr/~unitex/
  • 18. A pattern encoded as an FST. [Figure] FST patterns take linguistic variation into account, unlike string-based patterns (Bollegala et al., 2007).
  • 19. Patterns extract concordances:
      such diverse {[occupations]} as {[doctors]}, {[engineers]} and {[scientists]}[PATTERN=1]
      such {non-alcoholic [sodas]} as {[root beer]} and {[cream soda]}[PATTERN=1]
      {traditional[food]}, such as {[sandwich]},{[burger]}, and {[fry]}[PATTERN=2]
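A small sketch of how such concordances could be turned into candidate term pairs. It assumes, based only on the three examples, that curly braces delimit a noun phrase, square brackets inside them mark the term to keep, and a trailing `[PATTERN=n]` tag gives the pattern identifier. The function names and regular expressions are hypothetical, not the Unitex or PatternSim implementation.

```python
import re
from itertools import combinations

# A bracketed term inside a {...} noun phrase, e.g. {non-alcoholic [sodas]}.
TERM = re.compile(r"\{[^{}\[\]]*\[([^\[\]{}]+)\][^{}\[\]]*\}")
# The pattern identifier tag at the end of a concordance.
PATTERN_ID = re.compile(r"\[PATTERN=(\d+)\]")


def parse_concordance(line):
    """Return (list of marked terms, pattern id or None)."""
    terms = [t.strip() for t in TERM.findall(line)]
    match = PATTERN_ID.search(line)
    return terms, int(match.group(1)) if match else None


def candidate_pairs(line):
    """All unordered pairs of marked terms, tagged with the pattern id."""
    terms, pattern = parse_concordance(line)
    return [(a, b, pattern) for a, b in combinations(terms, 2)]


example = "such {non-alcoholic [sodas]} as {[root beer]} and {[cream soda]}[PATTERN=1]"
terms, pattern = parse_concordance(example)
pairs = candidate_pairs(example)
```

Counting how often each pair appears across concordances, and by how many distinct patterns, would give the kind of frequency statistics that a reranking step can consume.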
  • 20. Corpus. Wikipedia+ukWaC: $2.9 \cdot 10^{12}$ tokens. Extracted concordances: Wikipedia: 1,196,468; ukWaC: 2,227,025; WaCypedia+ukWaC: 3,423,493.
  • 21. Reranking formula (Efreq-Rnum-Cfreq-Pnum):
      $$s_{ij} = \sqrt{p_{ij}} \cdot \frac{2 \cdot \mu_b}{b_{i*} + b_{*j}} \cdot \frac{P(c_i, c_j)}{P(c_i) P(c_j)}$$
      where:
      $P(c_i, c_j) = \frac{e_{ij}}{\sum_{ij} e_{ij}}$ is the extraction probability of the pair $\langle c_i, c_j \rangle$, and $e_{ij}$ is the frequency of co-occurrence of $c_i$ and $c_j$ in the concordances $K$;
      $P(c_i) = \frac{f_i}{\sum_i f_i}$ is the probability of the term $c_i$, and $f_i$ is the frequency of $c_i$;
      $b_{i*} = \sum_{j: e_{ij} \ge \beta} 1$ is the number of extractions for the term $c_i$ with frequency $\ge \beta$;
      $\mu_b = \frac{1}{|C|} \sum_{i=1}^{|C|} b_{i*}$ is the average number of extractions per term;
      $p_{ij} \in [1; 18]$ is the number of distinct patterns that extracted the relation $\langle c_i, c_j \rangle$.
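The reranking formula can be written out as a short sketch. It assumes that pairs are unordered and stored once, so that $b_{i*}$ and $b_{*j}$ are both read from the same per-term count; the function name `rerank`, the dictionary-based inputs and the toy counts are hypothetical and only serve to make the arithmetic concrete.

```python
import math


def rerank(e, f, p, beta=1):
    """Score each extracted pair with the Efreq-Rnum-Cfreq-Pnum formula.

    e: {(ci, cj): co-occurrence frequency in concordances}, one entry per unordered pair
    f: {term: corpus frequency}
    p: {(ci, cj): number of distinct patterns that extracted the pair}
    beta: minimum frequency for an extraction to be counted in b
    """
    total_e = sum(e.values())
    total_f = sum(f.values())

    # b[t]: number of extractions involving term t with frequency >= beta
    b = {term: 0 for term in f}
    for (ci, cj), freq in e.items():
        if freq >= beta:
            b[ci] += 1
            b[cj] += 1
    mu_b = sum(b.values()) / len(f)  # average number of extractions per term

    scores = {}
    for (ci, cj), freq in e.items():
        if b[ci] + b[cj] == 0:
            continue  # no extraction passed the beta filter for either term
        joint = freq / total_e                          # P(ci, cj)
        independent = (f[ci] / total_f) * (f[cj] / total_f)  # P(ci) P(cj)
        scores[(ci, cj)] = (
            math.sqrt(p[(ci, cj)])
            * (2 * mu_b / (b[ci] + b[cj]))
            * (joint / independent)
        )
    return scores


# Toy counts, invented for illustration only.
f = {"doctor": 100, "nurse": 50, "engineer": 50}
e = {("doctor", "nurse"): 8, ("doctor", "engineer"): 2}
p = {("doctor", "nurse"): 4, ("doctor", "engineer"): 1}
scores = rerank(e, f, p, beta=1)
```

With these toy counts the pair extracted more often and by more patterns ("doctor", "nurse") receives the higher score, while the middle factor down-weights terms that take part in many extractions.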
  • 22. Semantic Relation Ranking. Precision is comparable to or better than the baselines; recall is lower than the baselines. [Figure: precision-recall graphs (the BLESS dataset).]
  • 23. Semantic Relation Extraction. Precision@1 ≈ 0.80; “good” coverage. [Figure]
  • 25. Related publications:
      Panchenko A. A Study of Heterogeneous Similarity Measures for Semantic Relation Extraction. In JEP-TALN-RECITAL 2012, Grenoble (France), 2012.
      Panchenko A. Similarity Measures for Semantic Relation Extraction. PhD thesis, Université catholique de Louvain, 197 pages, 2013: Chapters 2.1, 3.1.
  • 26. Compared Semantic Similarity Measures: 37 distinct measures. Q1: Are the measures complementary? Q2: If yes, in which respects?
  • 27. The Best Single Measures (MC, RG, WordSim, BLESS, SN). Each one extracts many co-hyponyms, e.g.: ⟨Canon, Nikon⟩, ⟨Lamborghini, Ferrari⟩, ⟨Obama, Romney⟩.
  • 28. Further Results. Most dissimilar measures. Figure: 21 measures grouped according to their relation distributions. Measures are complementary w.r.t.: lexical coverage; performance; types of semantic relations they extract.
  • 29. Implementation of the baseline measures:
      Semantic Vectors: https://code.google.com/p/semanticvectors/
      S-Space Package: https://code.google.com/p/airhead-research/
      WordNet::Similarity: http://wn-similarity.sourceforge.net
      NLTK: http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
      WikiRelate!
      PatternSim / Serelex: http://serelex.cental.be
      Web-based metrics: http://cwl-projects.cogsci.rpi.edu/msr
      LSA: http://lsa.colorado.edu
  • 31. Related publications:
      Panchenko A., Morozova O. A Study of Hybrid Similarity Measures for Semantic Relation Extraction. In Innovative Hybrid Approaches to the Processing of Textual Data Workshop, EACL 2012, Avignon (France), 2012, pp. 10–18.
      Panchenko A. Similarity Measures for Semantic Relation Extraction. PhD thesis, Université catholique de Louvain, 197 pages, 2013: Chapter 4.
      Panchenko A. A Study of Heterogeneous Similarity Measures for Semantic Relation Extraction. In JEP-TALN-RECITAL 2012, Grenoble (France), 2012, pp. 29–42.
  • 32. Hybrid vs Single Measures. Figure: Semantic relation extractor based on: (a) a single similarity measure; (b) a hybrid similarity measure. Components shown in the figure: terms C, features, similarity measures sim_1 … sim_N, similarity matrices S_1 … S_N, normalization (norm), combination method, combined matrix S_cmb, kNN, relations R.
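The single-measure extractor in panel (a) chains a similarity matrix, a normalization step, and a kNN step that outputs relations. The sketch below illustrates that chain under assumptions the slide does not state: normalization is taken to be min-max scaling to [0, 1], and kNN keeps the k highest-scoring other terms of each term whose score is above zero. The function names are hypothetical.

```python
def normalize(S):
    """Min-max scale a square similarity matrix to [0, 1] (assumed 'norm' step)."""
    flat = [v for row in S for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in S]
    return [[(v - lo) / (hi - lo) for v in row] for row in S]


def knn_relations(terms, S, k):
    """Return (term, neighbour, score) for the k nearest neighbours of each term."""
    relations = []
    for i, term in enumerate(terms):
        others = [j for j in range(len(terms)) if j != i]
        ranked = sorted(others, key=lambda j: S[i][j], reverse=True)
        for j in ranked[:k]:
            if S[i][j] > 0:  # skip unrelated terms
                relations.append((term, terms[j], S[i][j]))
    return relations


# Toy similarity matrix, invented for illustration only.
terms = ["car", "truck", "banana"]
S = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
for relation in knn_relations(terms, normalize(S), k=1):
    print(relation)
```

A hybrid extractor in the sense of panel (b) would apply the same kNN step to a combined matrix instead of a single one.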
  • 33. 16 Features = 16 Single Similarity Measures:
      5 network-based measures: (1) WuPalmer; (2) Leacock and Chodorow; (3) Resnik; (4) Jiang and Conrath; (5) Lin.
      3 web-based measures (NGD-Yahoo/Bing/Google).
      5 corpus-based measures: 2 distributional (BDA, SDA); 1 lexico-syntactic patterns (PatternSim); 2 other co-occurrence-based (LSA, NGD-Factiva).
      3 definition-based measures: (1) ExtendedLesk; (2) GlossVectors; (3) DefVectors-WktWiki.
  • 34. Unsupervised Combination Methods:
      1 Mean: $s^{cmb}_{ij} = \frac{1}{K} \sum_{k=1}^{K} s^{k}_{ij}$;
      2 Mean-Nnz: $s^{cmb}_{ij} = \frac{1}{|\{k : s^{k}_{ij} > 0,\ k = 1, \dots, K\}|} \sum_{k=1}^{K} s^{k}_{ij}$;
      3 Mean-Zscore: $S_{cmb} = \frac{1}{K} \sum_{k=1}^{K} \frac{S_k - \mu_k}{\sigma_k}$;
      4 Median: $s^{cmb}_{ij} = \mathrm{median}(s^{1}_{ij}, \dots, s^{K}_{ij})$;
      5 Max: $s^{cmb}_{ij} = \max(s^{1}_{ij}, \dots, s^{K}_{ij})$;
      6 RankFusion: $s^{cmb}_{ij} = \frac{1}{K} \sum_{k=1}^{K} r^{k}_{ij}$;
      7 RelationFusion (Panchenko and Morozova, 2012).
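The first six combination methods are simple enough to sketch directly. The code below is an illustration, not the authors' implementation. It assumes each of the K measures is given as a flat list of scores over the same term pairs in the same order, that μ_k and σ_k are the mean and population standard deviation of measure k over all pairs, and that the rank r^k_ij is the position of the pair when the scores of measure k are sorted in descending order (so a lower fused rank means more similar). RelationFusion is not covered.

```python
import statistics


def mean(scores):
    return sum(scores) / len(scores)


def mean_nnz(scores):
    """Mean over the measures that returned a non-zero score for the pair."""
    nnz = sum(1 for v in scores if v > 0)
    return sum(scores) / nnz if nnz else 0.0


def combine_pairwise(measures, method):
    """Apply `method` to the K scores of every pair; measures is K lists of M scores."""
    return [method(list(column)) for column in zip(*measures)]


def mean_zscore(measures):
    """Transform each measure to Z-scores, then average across measures."""
    z_measures = []
    for scores in measures:
        mu = statistics.mean(scores)
        sigma = statistics.pstdev(scores)
        z_measures.append([(v - mu) / sigma if sigma else 0.0 for v in scores])
    return combine_pairwise(z_measures, mean)


def rank_fusion(measures):
    """Convert scores to ranks (1 = highest score, an assumption), then average."""
    ranked = []
    for scores in measures:
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        ranks = [0] * len(scores)
        for rank, index in enumerate(order, start=1):
            ranks[index] = rank
        ranked.append(ranks)
    return combine_pairwise(ranked, mean)


# Two toy measures over three term pairs, invented for illustration only.
measures = [[0.9, 0.0, 0.3],
            [0.5, 0.2, 0.0]]
print("Mean:      ", combine_pairwise(measures, mean))
print("Mean-Nnz:  ", combine_pairwise(measures, mean_nnz))
print("Median:    ", combine_pairwise(measures, statistics.median))
print("Max:       ", combine_pairwise(measures, max))
print("Mean-Z:    ", mean_zscore(measures))
print("RankFusion:", rank_fusion(measures))
```

Mean-Nnz differs from Mean only for pairs that some measure does not cover, which is why it matters when the measures have different lexical coverage.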
  • 35. Supervised Combination Methods
      8 Logit, Logit-L1, Logit-L2. A binary logistic regression:
      Positive examples: synonyms, hyponyms, co-hyponyms from BLESS/SN;
      Negative examples: random relations from BLESS/SN;
      A relation $\langle c_i, t, c_j \rangle \in R$ is represented with a vector of pairwise similarities: $x = (s^{1}_{ij}, \dots, s^{N}_{ij})$, $N = 2, 16$;
      Category $y_{ij}$: $y_{ij} = 0$ if $\langle c_i, t, c_j \rangle$ is a random relation, $y_{ij} = 1$ otherwise;
      Using the model $(w_1, \dots, w_K)$ for combination: $s^{cmb}_{ij} = \frac{1}{1 + e^{-z}}$, $z = \sum_{k=1}^{K} w_k s^{k}_{ij} + w_0$.
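The logistic combination can be sketched in a few lines. This is a minimal illustration, assuming unregularized logistic regression fitted with plain batch gradient descent; the slides do not say which solver was used, and the L1/L2 variants are not covered. The training vectors below are invented toy data, not BLESS/SN examples, and the function names are hypothetical.

```python
import math


def logit_combine(s, w, w0):
    """Combined score: sigmoid of the weighted sum of the K single-measure scores."""
    z = sum(wk * sk for wk, sk in zip(w, s)) + w0
    return 1.0 / (1.0 + math.exp(-z))


def train_logit(X, y, learning_rate=0.5, epochs=2000):
    """Fit weights by batch gradient descent on the log-loss (illustrative solver)."""
    n_features = len(X[0])
    w, w0 = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_w0 = [0.0] * n_features, 0.0
        for x, target in zip(X, y):
            error = logit_combine(x, w, w0) - target
            for k in range(n_features):
                grad_w[k] += error * x[k]
            grad_w0 += error
        w = [wk - learning_rate * g / len(X) for wk, g in zip(w, grad_w)]
        w0 -= learning_rate * grad_w0 / len(X)
    return w, w0


# Toy data: y = 1 for a meaningful relation, y = 0 for a random relation.
X = [[0.9, 0.8], [0.7, 0.9], [0.1, 0.2], [0.2, 0.0]]
y = [1, 1, 0, 0]
w, w0 = train_logit(X, y)
print(logit_combine([0.8, 0.8], w, w0), logit_combine([0.1, 0.1], w, w0))
```

Because the output of the sigmoid lies in (0, 1), the combined score can be used directly as a similarity value for ranking candidate relations.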
  • 36. Supervised Combination Methods
      9 SVM. The weights $w$ and the support vectors $SV$: $w = \sum_{x_i \in SV} \alpha_i y_i x_i$.
      Using the model: $s^{cmb}_{ij} = w^{T} x + b = \sum_{k=1}^{K} w_k s^{k}_{ij} + b$.
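The two SVM formulas translate into a short computation. The sketch below assumes a linear kernel and the usual SVM label convention y_i ∈ {−1, +1}, which the slide does not state explicitly. The support vectors, multipliers α_i, and bias b are made-up values standing in for the output of a trained model; no SVM training is performed here.

```python
def svm_weights(support_vectors, alphas, labels):
    """w = sum over support vectors of alpha_i * y_i * x_i (linear kernel)."""
    n_features = len(support_vectors[0])
    w = [0.0] * n_features
    for x, alpha, label in zip(support_vectors, alphas, labels):
        for k in range(n_features):
            w[k] += alpha * label * x[k]
    return w


def svm_combine(s, w, b):
    """Combined score: signed distance-like value w^T x + b."""
    return sum(wk * sk for wk, sk in zip(w, s)) + b


# Hypothetical trained model: two support vectors with labels +1 and -1.
support_vectors = [[1.0, 1.0], [0.0, 0.0]]
alphas = [2.0, 2.0]
labels = [1, -1]
b = -2.0

w = svm_weights(support_vectors, alphas, labels)
print(w, svm_combine([0.9, 0.7], w, b))
```

Unlike the logistic score, this value is unbounded, so it serves as a ranking score rather than a probability.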
  • 37. Hybrid Similarity Measures. Precision-Recall graphs calculated on the BLESS dataset: (a) 16 single measures and the best hybrid measure Logit-E15; (b) 8 hybrid measures.
  • 38. Hybrid Similarity Measure Logit-E15. Figure: Similarity scores between 74 words related to the word “acacia”.
  • 39. Supervised Hybrid Similarity Measures
  • 40. Supervised Hybrid Similarity Measures (cont.). Figure: Meta-parameter optimization of the C-SVM-radial-E15 measure with grid search.
  • 42. Lexico-Semantic Search Engine “Serelex”
  • 43. Related publications:
      Panchenko A., Romanov P., Morozova O., Naets H., Philippovich A., Fairon C. Serelex: Search and Visualization of Semantically Related Words. In Proceedings of the 35th European Conference on Information Retrieval (ECIR 2013), Moscow (Russia), 2013.
      Panchenko A., Naets H., Brouwers L., Romanov P., Fairon C. Recherche et visualisation de mots sémantiquement liés. In Actes de la 20e conférence sur le Traitement Automatique des Langues Naturelles (TALN’2013), Les Sables d’Olonne (France), 2013, pp. 747–754.
  • 44. Search for Related Words: the List and the Graph. http://serelex.cental.be/
  • 45. Search for Related Words: the List and the Graph
  • 46. Search for Related Words: the Images
  • 47. Evaluation of Serelex. Figure: Users’ satisfaction with the top 20 results.
  • 48. Filename Categorization System “iCOP”
  • 49. Related publications:
      Panchenko A., Naets H., Beaufort R., Fairon C. Towards Detection of Child Sexual Abuse Media: Classification of the Associated Filenames. In Proceedings of the 35th European Conference on Information Retrieval (ECIR 2013). LNCS 7814, pp. 776–779. Springer-Verlag Berlin Heidelberg, 2013.
      Panchenko A., Beaufort R., Fairon C. Detection of Child Sexual Abuse Media on P2P Networks: Normalization and Classification of Associated Filenames. In Proceedings of the Workshop on Language Resources for Public Security Applications of the 8th International Conference on Language Resources and Evaluation (LREC), 2012.
  • 50. Short text classification with Vocabulary Projection
  • 51. Evaluation of the Vocabulary Projection
      | Training Dataset | Test Dataset | Accuracy | Accuracy (voc. projection) |
      |---|---|---|---|
      | Gallery (train) | Gallery | 96.41 | 96.83 (+0.42) |
      | PirateBay Title+Desc+Tags | PirateBay Title+Desc+Tags | 98.92 | 98.86 (-0.06) |
      | PirateBay Title+Tags | PirateBay Title+Tags | 97.73 | 97.63 (-0.10) |
      | Gallery | PirateBay Title+Desc+Tags | 90.57 | 91.48 (+0.91) |
      | Gallery | PirateBay Title+Tags | 84.23 | 88.89 (+4.66) |
      | PirateBay Title+Desc+Tags | Gallery | 88.83 | 89.04 (+0.21) |
      | PirateBay Title+Tags | Gallery | 91.16 | 91.30 (+0.14) |
      Table: Performance of a C-SVM linear classifier (10-fold cross-validation).
  • 52. Thank you! Questions?