Human and Machine Judgements
about Russian Semantic Relatedness
• A. Panchenko, D. Ustalov, D. Paperno, C. Meyer, N.
Konstantinova, N. Loukachevitch, Ch. Biemann
Motivation
• A semantic similarity measure is a specific kind of
similarity measure for nouns or multiword expressions.
• … high values for synonyms, hyponyms, free associations, etc.
• … low values for unrelated pairs
• Applications:
• information retrieval, document clustering, topic detection, question
answering, word sense disambiguation, text summarization…
• Most datasets and approaches were proposed for English
• 2015: RUSSE
• The First International Workshop on Russian Semantic Similarity
Evaluation (RUSSE)
• 19 participants, 105 runs, special session at the Dialog-2015 conference.
Russian Datasets for Measuring Word
Semantic Similarity
• Human Judgement dataset (HJ dataset)
– Word pairs with human judgements
• Russian Thesaurus dataset (RT dataset)
– synonyms and hypernyms from RuThes thesaurus
• Associative Thesaurus dataset (AE dataset)
– cognitive associations between words
• Machine Judgements
– combination of submissions from a shared task on Russian
semantic similarity
• Russian Distributional Thesaurus
Human judgements about semantic
similarity (HJ)
• This is the standard way to assess a semantic similarity
measure.
• The HJ dataset contains word pairs translated from the
widely used benchmarks for English:
• Miller-Charles set – 30 word pairs
• Rubenstein-Goodenough set – 65 word pairs
• WordSim-353 – 353 word pairs
• Additionally subdivided into a similarity set and a relatedness set
• Evaluation: Correlations with human judgments in terms of Spearman’s
rank correlation
• Agreement in ordering
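The evaluation step above can be sketched as follows. This is a minimal illustration in plain Python, assuming tied scores receive the mean of their ranks; the function names and the four score values are hypothetical and are not taken from the HJ dataset.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical scores for four word pairs (invented for illustration).
human = [0.9, 0.7, 0.2, 0.1]
model = [0.8, 0.6, 0.3, 0.05]
rho = spearman(human, model)
```

Because only the ordering matters, a model whose scores are on a different scale than the human judgements can still reach a high correlation, as long as it ranks the pairs the same way.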
Human judgements: Crowdsourcing
Example of human judgements about
semantic similarity (HJ)
RuThes Linguistic Ontology
http://www.labinform.ru/pub/ruthes/index.htm
• 96 thousand unique words and expressions
– Synonyms
– Conceptual relations: class-subclass, part-whole, conceptual
dependence
• The dataset contains 114 066 relations for 6 832 nouns.
• Half of these relations are synonyms and hypernyms from the
RuThes-lite thesaurus.
• The other half are unrelated words.
Thesaurus Sociation.org
• Non-commercial Internet project
• Contains 325,863 associations for 37,463 words
Structure of the semantic relation
classification (RT, AE) benchmarks
RUSSE: Best models according to the
HJ benchmark
MJ: Machine Judgements of Word Pairs
from the RUSSE Shared Task
• This dataset contains 12 886 word pairs coming
from HJ, RT, and AE datasets
• The pairs have continuous relatedness scores
• To estimate these scores we averaged 105
submissions of the shared task on Russian
semantic similarity, RUSSE.
• Each run consisted of 12 886 word pairs along
with their similarity scores.
Gathering Machine Judgements
• Select one best submission for each of 19
participating teams for HJ, RT and AE datasets
• Rank the 19 best submissions. The best one
has rank r1 = 19; the worst has rank r19 = 1
• Combine scores of these 19 best submissions
– The score of a pair is equal to the sum of the run scores,
each multiplied by its run weight
– Run weight: rank, exponent of rank, or square root of
rank
• The combined approach is better than a single
submission
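The combination step above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `combine`, the `normalize` option, and all scores are hypothetical. Dividing by the total weight is an assumption suggested only by the `wmean` column name in the example data; the slide itself describes a plain weighted sum.

```python
import math

# The three weighting schemes named on the slide, as functions of the rank.
WEIGHTS = {
    "rank": lambda r: float(r),
    "exp": lambda r: math.exp(r),
    "sqrt": lambda r: math.sqrt(r),
}


def combine(runs, scheme="rank", normalize=True):
    """Combine per-pair scores of runs ordered from best to worst.

    runs: list of dicts mapping (word1, word2) -> score, best run first.
    The best run gets rank len(runs); the worst run gets rank 1.
    normalize=True divides by the weight total (an assumption, not stated
    on the slide), turning the weighted sum into a weighted mean.
    """
    n = len(runs)
    weights = [WEIGHTS[scheme](n - i) for i in range(n)]
    total = sum(weights) if normalize else 1.0
    return {
        pair: sum(w * run[pair] for w, run in zip(weights, runs)) / total
        for pair in runs[0]
    }


# Three hypothetical runs with invented scores, ordered best to worst.
runs = [
    {("латы", "доспехи"): 0.9, ("инспекция", "гол"): 0.1},
    {("латы", "доспехи"): 0.6, ("инспекция", "гол"): 0.2},
    {("латы", "доспехи"): 0.3, ("инспекция", "гол"): 0.6},
]
combined = combine(runs, scheme="rank")
```

With three runs the rank weights are 3, 2 and 1, so the best run contributes half of the total weight. Switching `scheme` to `"exp"` concentrates the weight on the top runs even more, while `"sqrt"` flattens it.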
Machine Judgements: Example
• word1,word2,sim,wmean
• препарат,вещество,1.0,0.484418
• препарат,лекарство,1.0,0.634770
• препарат,перестройка,0.0,0.157699
• препарат,барселона,0.0,0.105411
• инспекция,проверка,1.0,0.532748
• инспекция,гол,0.0,0.107823
• латы,меч,1.0,0.428076
• латы,щит,1.0,0.441120
• латы,рыцарь,1.0,0.453718
• латы,броня,1.0,0.414047
• латы,доспехи,1.0,0.543852
DT: Open Russian Distributional Thesaurus
• skip-gram model (Mikolov et al., 2013)
• trained on a 12.9 billion word collection of books
in Russian
– Minimum word frequency: 5
– Number of dimensions in a word vector: 500
– Context window size: 10 words
– For each of the 932,000 most frequent words, the 250 nearest
neighbours are computed using the cosine similarity between word
vectors.
– These related words were lemmatized using
PyMorphy2.
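The nearest-neighbour step above can be sketched as follows. This is a minimal illustration in plain Python that covers only the cosine-similarity lookup, not skip-gram training or lemmatization. The function names and the 3-dimensional toy vectors are invented for illustration; the model described above uses 500 dimensions and keeps 250 neighbours per word.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, vectors, k):
    """Return the k words most similar to `word`, most similar first."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy vectors with invented values; real vectors would come from the
# trained model.
vectors = {
    "латы": [0.9, 0.1, 0.0],
    "доспехи": [0.8, 0.2, 0.1],
    "щит": [0.7, 0.3, 0.2],
    "гол": [0.0, 0.1, 0.9],
}
neighbours = nearest_neighbours("латы", vectors, k=2)
```

A brute-force scan like this is quadratic in the vocabulary size, so for hundreds of thousands of words a matrix-based or approximate nearest-neighbour search would likely be needed in practice.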
Conclusion
• We presented new Russian resources for evaluating
semantic relatedness measures
• Russian HJ datasets: Miller-Charles, Rubenstein-Goodenough,
WordSim-353
• RuThes dataset and Human associations dataset
• Machine Judgements Dataset and Distributional Thesaurus
• The resources can be obtained from
• http://panchenko.me/rsr/
• Semantic similarity and relatedness are useful in
many NLP and information retrieval applications

 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 

Dernier (20)

Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 

Alexander Panchenko - Human and Machine Judgements about Russian Semantic Relatedness

  • 1. Human and Machine Judgements about Russian Semantic Relatedness
    • A. Panchenko, D. Ustalov, D. Paperno, C. Meyer, N. Konstantinova, N. Loukachevitch, Ch. Biemann
  • 2. Motivation
    • A semantic similarity measure is a specific kind of similarity measure for nouns or multiword expressions. It yields:
      – high values for synonyms, hyponyms, free associations, etc.
      – low values for unrelated pairs
    • Applications: information retrieval, document clustering, topic detection, question answering, word sense disambiguation, text summarization, …
    • Most datasets and approaches were proposed for English
    • RUSSE 2015: the First International Workshop on Russian Semantic Similarity Evaluation
      – 19 participants, 105 runs, special session at the Dialog-2015 conference
  • 3. Russian Datasets for Measuring Word Semantic Similarity
    • Human Judgement dataset (HJ dataset)
      – word pairs with human judgements
    • Russian Thesaurus dataset (RT dataset)
      – synonyms and hypernyms from the RuThes thesaurus
    • Associative Thesaurus dataset (AE dataset)
      – cognitive associations between words
    • Machine Judgements (MJ dataset)
      – combination of submissions from a shared task on Russian semantic similarity
    • Russian Distributional Thesaurus (DT)
  • 4. Human judgements about semantic similarity (HJ)
    • This is the standard way to assess a semantic similarity measure.
    • The HJ dataset contains word pairs translated from the widely used benchmarks for English:
      – Miller-Charles set – 30 word pairs
      – Rubenstein-Goodenough set – 65 word pairs
      – WordSim-353 – 353 word pairs, additionally subdivided into a similarity set and a relatedness set
    • Evaluation: correlation with human judgements in terms of Spearman's rank correlation, i.e. agreement in ordering
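A minimal sketch of the evaluation described on slide 4, assuming a model assigns one score to every HJ word pair. The scores below are invented for illustration, and the helper names (`average_ranks`, `pearson`, `spearman`) are hypothetical; in practice a library routine such as `scipy.stats.spearmanr` would normally be used instead.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example scores for four word pairs (not taken from the HJ dataset).
human_scores = [0.9, 0.7, 0.1, 0.3]
model_scores = [0.8, 0.6, 0.2, 0.1]
print(round(spearman(human_scores, model_scores), 3))  # 0.8
```

Only the ordering of the scores matters here: rescaling the model scores by any increasing function leaves the result unchanged, which is why the slide describes the measure as agreement in ordering.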
  • 6. Example of human judgements about semantic similarity (HJ)
  • 7. RuThes Linguistic Ontology
    • http://www.labinform.ru/pub/ruthes/index.htm
    • 96 thousand unique words and expressions
      – Synonyms
      – Conceptual relations: class-subclass, part-whole, conceptual dependence
    • The dataset contains 114 066 relations for 6 832 nouns.
    • Half of these relations are synonyms and hypernyms from the RuThes-lite thesaurus; the other half are unrelated words.
  • 9. Structure of the semantic relation classification (RT, AE) benchmarks
  • 10. RUSSE: Best models according to the HJ benchmark
  • 11. MJ: Machine Judgements of Word Pairs from the RUSSE Shared Task
    • This dataset contains 12 886 word pairs coming from the HJ, RT, and AE datasets
    • The pairs have continuous relatedness scores
    • To estimate these scores we averaged 105 submissions of the shared task on Russian semantic similarity, RUSSE.
    • Each run consisted of 12 886 word pairs along with their similarity scores.
  • 12. Gathering Machine Judgements
    • Select the one best submission of each of the 19 participating teams for the HJ, RT and AE datasets
    • Rank the 19 best submissions. The best one has rank r1 = 19; the worst has rank r19 = 1
    • Combine the scores of these 19 best submissions
      – The score of a pair is equal to the sum of the run scores multiplied by the run weights
      – Run weight: rank, exponent of rank, or square root of rank
    • The combined approach is better than a single submission
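The combination on slide 12 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `combine_runs`, the input layout (one dictionary of pair scores per run, plus one benchmark quality value per run), and the toy data are all hypothetical. Dividing by the total weight is also an assumption, suggested only by the `wmean` column name in the example file; pass `normalize=False` for the plain weighted sum stated on the slide.

```python
import math

# The three run-weight functions named on the slide, applied to the rank r.
WEIGHTINGS = {
    "rank": lambda r: float(r),
    "exp": lambda r: math.exp(r),
    "sqrt": lambda r: math.sqrt(r),
}


def combine_runs(runs, quality, weighting="rank", normalize=True):
    """Combine per-run pair scores into one score per word pair.

    runs:    {run_name: {(word1, word2): score}}
    quality: {run_name: benchmark score, higher is better}
    """
    # Worst run first, so the worst gets rank 1 and the best gets rank N.
    ordered = sorted(runs, key=lambda name: quality[name])
    weights = {
        name: WEIGHTINGS[weighting](rank)
        for rank, name in enumerate(ordered, start=1)
    }
    total_weight = sum(weights.values())
    # Only pairs scored by every run are combined.
    shared_pairs = set.intersection(*(set(scores) for scores in runs.values()))
    combined = {}
    for pair in shared_pairs:
        weighted_sum = sum(weights[name] * runs[name][pair] for name in runs)
        combined[pair] = weighted_sum / total_weight if normalize else weighted_sum
    return combined


# Toy data: two hypothetical runs scoring one pair.
toy_runs = {
    "run_a": {("латы", "доспехи"): 1.0},
    "run_b": {("латы", "доспехи"): 0.0},
}
toy_quality = {"run_a": 0.8, "run_b": 0.5}
print(combine_runs(toy_runs, toy_quality))
```

With rank weights, `run_a` (the better run) gets weight 2 and `run_b` gets weight 1, so the better run contributes twice as much to the combined score.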
  • 13. Machine Judgements: Example
    word1,word2,sim,wmean
    препарат,вещество,1.0,0.484418
    препарат,лекарство,1.0,0.634770
    препарат,перестройка,0.0,0.157699
    препарат,барселона,0.0,0.105411
    инспекция,проверка,1.0,0.532748
    инспекция,гол,0.0,0.107823
    латы,меч,1.0,0.428076
    латы,щит,1.0,0.441120
    латы,рыцарь,1.0,0.453718
    латы,броня,1.0,0.414047
    латы,доспехи,1.0,0.543852
  • 14. DT: Open Russian Distributional Thesaurus
    • Skip-gram model (Mikolov et al., 2013)
    • Trained on a 12.9 billion word collection of books in Russian
      – minimal word frequency: 5
      – number of dimensions in a word vector: 500
      – context window size: 10 words
    • For the 932,000 most frequent words, the 250 nearest neighbours by cosine similarity between word vectors are calculated.
    • These related words were lemmatized using PyMorphy2.
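The nearest-neighbour step on slide 14 can be illustrated with a small sketch. It assumes the word vectors are already trained and available as a plain dictionary; the two-dimensional vectors and the function names (`cosine`, `nearest_neighbours`) are invented for the example, and training the skip-gram model itself is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equally long vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbours(word, vectors, k=250):
    """Return up to k (neighbour, similarity) pairs, most similar first."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Invented toy vectors; the real thesaurus uses 500-dimensional vectors.
toy_vectors = {
    "латы": [1.0, 0.0],
    "доспехи": [0.9, 0.1],
    "гол": [0.0, 1.0],
}
for neighbour, similarity in nearest_neighbours("латы", toy_vectors, k=2):
    print(neighbour, round(similarity, 3))
```

This brute-force loop compares the target with every other word, which is fine for a toy vocabulary; at the scale of 932,000 words a vectorised or approximate search would be the more realistic choice.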
  • 15.
  • 16. Conclusion
    • We presented new Russian resources for evaluating semantic relatedness measures:
      – Russian HJ datasets: Miller-Charles, Rubenstein-Goodenough, WordSim-353
      – RuThes dataset and Human associations dataset
      – Machine Judgements dataset and Distributional Thesaurus
    • The resources can be obtained from http://panchenko.me/rsr/
    • Semantic similarity and relatedness are useful in many NLP and information retrieval applications