Wikification of Concept Mentions within Spoken Dialogues
Using Domain Constraints from Wikipedia
Seokhwan Kim, Rafael E. Banchs, Haizhou Li
Human Language Technology Department, Institute for Infocomm Research (I2R), Singapore
Wikification on Spoken Dialogues
Linking mentions to the relevant concepts in Wikipedia
Differences between spoken dialogues and written texts
Number of speakers
Dependence on background knowledge
Degree of informal and noisy expressions
Examples of Wikification on Singapore tour guide dialogues
Guide: How can I help you?
Tourist: Can you recommend some good places to visit in Singapore?
Guide: Well if you like to visit an icon of Singapore, Merlion park will be a nice place to visit.
Tourist: That is a symbol for your country, right?
Guide: Yes, we use that to symbolise Singapore.
Tourist: Okay.
Guide: The lion head symbolised the founding of the island and the fish body just symbolised the humble fishing village.
Tourist: How can I get there from Orchard Road?
Guide: You can take the red line train from Orchard and stop at Raffles Place.
Tourist: Is this walking distance from the station to the destination?
Guide: Yes, it’ll take only ten minutes on foot.
Tourist: Alright.
Guide: Well, you can also enjoy some seafoods at the riverside near the place.
Tourist: What food do you have any recommendations to try there?
Guide: If you like spicy foods, you must try chilli crab which is one of our favourite dishes here.
Tourist: Great! I’ll try that.
Singapore, Merlion Park, Orchard Road, North South MRT Line, Raffles Place MRT Station, Singapore River, Chilli crab
Three-step Approach for Wikification on Dialogues
[Figure: three-step pipeline. Input mention m_i; Step 1: linking validity analysis, in-dialogue reference analysis, domain relevance analysis, speaker relatedness analysis; Step 2: candidate generation (Wikipedia concepts); Step 3: candidate ranking; history <m_j, f(m_j)>, j = 0..(i-1); output concept f(m_i).]
Step 1: Mention Analysis
Analyzing four binary properties of a given mention
Linking validity, In-dialogue reference, Domain relevance, Speaker relatedness
Guide: In the morning I suggest to you to go to Botanical Garden.
  LV  ID  DR  SRG  SRT
  -   -   -   -    -
  +   -   +   +    -
Tourist: Oh, we also have Botanical Garden.
  LV  ID  DR  SRG  SRT
  +   -   -   -    +
Tourist: That is actually one of my favourite places here.
  LV  ID  DR  SRG  SRT
  +   +   -   -    +
  +   -   -   -    +
Guide: If so, you might like this place also.
  LV  ID  DR  SRG  SRT
  +   +   +   +    -
Step 2: Candidate Generation
Candidate retrieval from a Lucene index on the Wikipedia collection
With filtering constraints based on the analyzed properties in step 1
Combination of multiple constraints: Intersection or Union
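The combination of constraints in step 2 can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the Lucene retrieval is replaced by an in-memory ranked list, each constraint is assumed to be expressible as a set of admissible concept titles, and the names `combine_constraints`, `filter_candidates`, and the example sets are hypothetical.

```python
from typing import Iterable, List, Set


def combine_constraints(constraints: Iterable[Set[str]], mode: str) -> Set[str]:
    """Merge per-property sets of admissible concepts by intersection or union."""
    sets = list(constraints)
    if not sets:
        raise ValueError("at least one constraint set is required")
    if mode == "intersection":
        return set.intersection(*sets)
    if mode == "union":
        return set.union(*sets)
    raise ValueError(f"unknown mode: {mode}")


def filter_candidates(retrieved: List[str], allowed: Set[str], top_k: int = 100) -> List[str]:
    """Keep retrieved candidates (already in rank order) that satisfy the constraints."""
    return [c for c in retrieved if c in allowed][:top_k]


# Hypothetical example: concept sets admitted by two different constraints.
domain_relevant = {"Merlion Park", "Singapore River", "Chilli crab"}
mentioned_before = {"Merlion Park", "Orchard Road"}
retrieved = ["Merlion", "Merlion Park", "Orchard Road", "Singapore River"]

strict = filter_candidates(
    retrieved, combine_constraints([domain_relevant, mentioned_before], "intersection")
)
loose = filter_candidates(
    retrieved, combine_constraints([domain_relevant, mentioned_before], "union")
)
```

In this toy setup the intersection keeps only candidates admitted by every constraint, while the union keeps candidates admitted by at least one, so the union list is never shorter than the intersection list.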
Step 3: Candidate Ranking
Ranking SVM: Supervised learning to rank algorithm
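\[
s(m, c) =
\begin{cases}
4 & \text{if } c \text{ is exactly the same as } g(m), \\
3 & \text{if } c \text{ is the parent article of } g(m), \\
2 & \text{if } c \text{ belongs to the same article as } g(m) \text{ but a different section}, \\
1 & \text{otherwise.}
\end{cases}
\]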
m: a mention
c: a candidate concept
g(m): the manual annotation for the most relevant concept of m
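The graded relevance s(m, c) used as the ranking target could be computed along these lines. This is a sketch under assumptions: a concept is modelled as an (article, section) pair with `section=None` for the whole article, and the `Concept` type, the function name `relevance_score`, and the section names in the usage example are hypothetical. The case where g(m) is a whole article and c is one of its sections is not spelled out in the definition and is scored 2 here by assumption.

```python
from typing import NamedTuple, Optional


class Concept(NamedTuple):
    """A Wikipedia concept: a whole article, or one section of an article."""
    article: str
    section: Optional[str] = None  # None stands for the article as a whole


def relevance_score(c: Concept, gold: Concept) -> int:
    """Graded relevance s(m, c) of candidate c given the annotation g(m) = gold."""
    if c == gold:
        return 4  # exactly the annotated concept
    if c.article == gold.article:
        if c.section is None:
            return 3  # c is the parent article of the annotated section
        # Same article, different section. Assumption: this also covers the
        # case where gold is the whole article and c is one of its sections.
        return 2
    return 1  # unrelated article


# Hypothetical annotation: a section of the "Merlion Park" article.
gold = Concept("Merlion Park", "History")
scores = [
    relevance_score(Concept("Merlion Park", "History"), gold),
    relevance_score(Concept("Merlion Park"), gold),
    relevance_score(Concept("Merlion Park", "Location"), gold),
    relevance_score(Concept("Merlion"), gold),
]
```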
Datasets
Singapore tour guide dialogues
Human-human mixed initiative dialogues
35 sessions, 21 hours, 31,034 utterances
Manually annotated with relevant Wikipedia concepts
Preprocessed by Stanford CoreNLP toolkit
Wikipedia collection
4,797,927 articles and 25,577,464 sections in total
Collected from Wikipedia database dump as of January 2015
Indexed into a Lucene index
Evaluation: Mention Analysis
SVMlight was used for training four mention analyzers
With four sets of features: mention (M), utterance (U), dialogue (D), and Wikipedia-based (W) features
Five-fold cross validation with F-measure
Features LV ID SRG SRT
M 86.29 69.15 71.10 72.94
M+U 86.90 70.43 70.43 68.85
M+D 86.17 71.09 70.56 71.52
M+W 86.21 68.96 70.66 71.86
M+U+D 86.82 72.37 70.12 68.30
M+U+W 86.84 70.13 70.19 68.78
M+U+D+W 86.77 72.20 69.94 68.10
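The five-fold protocol can be illustrated with a small splitter. This is a sketch only: the poster does not say how the folds were formed, so splitting at the session level (35 sessions) is an assumption, and `k_fold_splits` is a hypothetical helper rather than part of SVMlight.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int = 5) -> List[Tuple[List[T], List[T]]]:
    """Return k (train, test) pairs; fold i holds every k-th item starting at index i."""
    if not 1 < k <= len(items):
        raise ValueError("k must be greater than 1 and at most len(items)")
    folds = [list(items[i::k]) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        # Training data is everything outside the held-out fold.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, test))
    return splits


# Assumption: folds are formed over the 35 dialogue sessions.
sessions = list(range(35))
splits = k_fold_splits(sessions, k=5)
```

Each session appears in exactly one test fold, so averaging the per-fold F-measures uses every session once for evaluation.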
Evaluation: Candidate Generation
Four sets of candidates were prepared for each mention
Baseline: Retrieved with no filtering
Intersection: Filtered with intersection of analyzed properties
Union: Filtered with union of analyzed properties
Oracle (manual filtering): Filtered with manually annotated properties
Top 100 candidates were retrieved from a Lucene index for each set
Evaluation: Candidate Ranking
SVMrank was used for training ranking functions
The top-ranked item in the list is taken as the result of Wikification
Five-fold cross validation with Precision/Recall/F-measure
Method P R F
Baseline 26.85 22.52 21.24
Intersection 44.37 27.35 33.84
Union 38.04 31.97 34.74
Manual Filtering 39.90 34.72 37.13
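The relation between the P, R, and F columns can be sketched as below, assuming F is the balanced F-measure (harmonic mean of precision and recall), which is consistent with the Intersection, Union, and Manual Filtering rows. The function names and the count-based definition of precision and recall are assumptions for illustration, not taken from the poster.

```python
from typing import Tuple


def precision_recall(num_correct: int, num_predicted: int, num_gold: int) -> Tuple[float, float]:
    """Precision and recall (in percent) from link counts.

    Assumption: precision is measured over mentions the system linked,
    recall over mentions that have a gold Wikipedia concept.
    """
    precision = 100.0 * num_correct / num_predicted if num_predicted else 0.0
    recall = 100.0 * num_correct / num_gold if num_gold else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reproducing the F column from the reported P and R values.
f_intersection = round(f_measure(44.37, 27.35), 2)  # 33.84
f_union = round(f_measure(38.04, 31.97), 2)         # 34.74
f_manual = round(f_measure(39.90, 34.72), 2)        # 37.13
```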
1 Fusionopolis Way, #21-01 Connexis (South Tower), Singapore 138632 Email: kims@i2r.a-star.edu.sg

Wikification of Concept Mentions within Spoken Dialogues Using Domain Constraints from Wikipedia

  • 1. Wikification of Concept Mentions within Spoken Dialogues Using Domain Constraints from Wikipedia Seokhwan Kim, Rafael E. Banchs, Haizhou Li Human Language Technology Department, Institute for Infocomm Research (I2 R), Singapore Wikification on Spoken Dialogues Linking mentions to the relevant concepts in Wikipedia Differences between spoken dialogues and written texts Number of speakers Dependencies to background knowledge Degree of informal and noisy expressions Examples of Wikification on Singapore tour guide dialogues Guide How can I help you? Tourist Can you recommend some good places to visit in Singapore? Guide Well if you like to visit an icon of Singapore, Merlion park will be a nice place to visit. Tourist That is a symbol for your country, right? Guide Yes, we use that to symbolise Singapore. Tourist Okay. Guide The lion head symbolised the founding of the island and the fish body just symbolised the humble fishing village. Tourist How can I get there from Orchard Road? Guide You can take the red line train from Orchard and stop at Raffles Place. Tourist Is this walking distance from the station to the destination? Guide Yes, it’ll take only ten minutes on foot. Tourist Alright. Guide Well, you can also enjoy some seafoods at the riverside near the place. Tourist What food do you have any recommendations to try there? Guide If you like spicy foods, you must try chilli crab which is one of our favourite dishes here. Tourist Great! I’ll try that. 
Wikipedia concepts linked in the dialogue: Singapore, Merlion Park, Orchard Road, North South MRT Line, Raffles Place MRT Station, Singapore River, Chilli crab

Three-step Approach for Wikification on Dialogues
[Figure: pipeline from input mention m_i to output concept f(m_i)]
  Step 1: Linking Validity Analysis, In-dialogue Reference Analysis, Domain Relevance Analysis, Speaker Relatedness Analysis
  Step 2: Candidate Generation (from Wikipedia concepts)
  Step 3: Candidate Ranking
  History: <m_j, f(m_j)>, j = 0..(i-1)

Step 1: Mention Analysis
Analyzing four binary properties of a given mention
Linking validity (LV), In-dialogue reference (ID), Domain relevance (DR), Speaker relatedness to the guide (SRG) and to the tourist (SRT)

Guide: In the morning I suggest to you to go to Botanical Garden.
  LV  ID  DR  SRG  SRT
  -   -   -   -    -
  +   -   +   +    -
Tourist: Oh, we also have Botanical Garden.
  LV  ID  DR  SRG  SRT
  +   -   -   -    +
Tourist: That is actually one of my favourite places here.
  LV  ID  DR  SRG  SRT
  +   +   -   -    +
  +   -   -   -    +
Guide: If so, you might like this place also.
  LV  ID  DR  SRG  SRT
  +   +   +   +    -

Step 2: Candidate Generation
Candidate retrieval from a Lucene index on the Wikipedia collection
With filtering constraints based on the properties analyzed in Step 1
Combination of multiple constraints: Intersection or Union

Step 3: Candidate Ranking
Ranking SVM: a supervised learning-to-rank algorithm
$$
s(m, c) =
\begin{cases}
4 & \text{if } c \text{ is exactly the same as } g(m), \\
3 & \text{if } c \text{ is the parent article of } g(m), \\
2 & \text{if } c \text{ belongs to the same article as } g(m) \text{ but a different section}, \\
1 & \text{otherwise.}
\end{cases}
$$
m: a mention
c: a candidate concept
g(m): the manual annotation for the most relevant concept of m

Datasets
Singapore tour guide dialogues
  Human-human mixed initiative dialogues
  35 sessions, 21 hours, 31,034 utterances
  Manually annotated with relevant Wikipedia concepts
  Preprocessed by Stanford CoreNLP toolkit
Wikipedia collection
  4,797,927 articles and 25,577,464 sections in total
  Collected from Wikipedia database dump as of January 2015
  Indexed into a Lucene index

Evaluation: Mention Analysis
SVMlight was used for training four mention analyzers
With four sets of features: mention (M), utterance (U), dialogue (D), and Wikipedia-based (W) features
Five-fold cross validation with F-measure

  Features   LV     ID     SRG    SRT
  M          86.29  69.15  71.10  72.94
  M+U        86.90  70.43  70.43  68.85
  M+D        86.17  71.09  70.56  71.52
  M+W        86.21  68.96  70.66  71.86
  M+U+D      86.82  72.37  70.12  68.30
  M+U+W      86.84  70.13  70.19  68.78
  M+U+D+W    86.77  72.20  69.94  68.10

Evaluation: Candidate Generation
Four sets of candidates were prepared for each mention
  Baseline: Retrieved with no filtering
  Intersection: Filtered with intersection of analyzed properties
  Union: Filtered with union of analyzed properties
  Oracle: Filtered with manually annotated properties
Top 100 candidates were retrieved from a Lucene index for each set

Evaluation: Candidate Ranking
SVMrank was used for training ranking functions
The top-ranked item in the list is considered as the result of Wikification
Five-fold cross validation with Precision/Recall/F-measure

  Method            P      R      F
  Baseline          26.85  22.52  21.24
  Intersection      44.37  27.35  33.84
  Union             38.04  31.97  34.74
  Manual Filtering  39.90  34.72  37.13

1 Fusionopolis Way, #21-01 Connexis (South Tower), Singapore 138632
Email: kims@i2r.a-star.edu.sg
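Step 2 combines the constraints derived from the mention properties by intersection or by union. The poster does not say how each constraint is represented, so the following is only a minimal sketch under an assumption: each active property (for example DR or SRG) is modelled as a set of allowed concept titles. The function name `filter_candidates`, the set representation, and the toy titles are all hypothetical and are not taken from the authors' system.

```python
from typing import Dict, List, Set


def filter_candidates(
    candidates: List[str],
    constraints: Dict[str, Set[str]],
    mode: str = "intersection",
) -> List[str]:
    """Keep the retrieved candidates that the active constraints allow.

    candidates:  concept titles in retrieval order (assumed representation).
    constraints: property name -> set of allowed titles (assumed representation).
    mode:        "intersection" keeps candidates allowed by every constraint,
                 "union" keeps candidates allowed by at least one.
    """
    if not constraints:
        # No active property: behave like the unfiltered baseline.
        return list(candidates)

    allowed_sets = list(constraints.values())
    if mode == "intersection":
        allowed = set.intersection(*allowed_sets)
    elif mode == "union":
        allowed = set.union(*allowed_sets)
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    # Preserve the original retrieval order of the surviving candidates.
    return [c for c in candidates if c in allowed]


# Toy data, invented for illustration only.
retrieved = [
    "Singapore Botanic Gardens",
    "Botanical garden",
    "Royal Botanic Gardens, Kew",
]
constraints = {
    "DR": {"Singapore Botanic Gardens", "Merlion Park"},
    "SRG": {"Singapore Botanic Gardens", "Botanical garden"},
}

print(filter_candidates(retrieved, constraints, "intersection"))
print(filter_candidates(retrieved, constraints, "union"))
```

In this toy case the intersection keeps a single candidate while the union keeps two. That is the usual trade-off between the two modes: intersection gives a smaller candidate set, union a larger one.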
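The graded relevance s(m, c) used to train the ranker in Step 3 can be sketched as a small function. This is an illustration under assumptions, not the authors' code: a concept is modelled as an (article, section) pair, with `section=None` standing for the article as a whole, and grade 2 is given only when c and g(m) are different sections of the same article. The `Concept` type, the function name, and the section names in the example are hypothetical.

```python
from typing import NamedTuple, Optional


class Concept(NamedTuple):
    """A Wikipedia concept: an article, or one section of an article."""
    article: str
    section: Optional[str] = None  # None means the article as a whole


def relevance_grade(c: Concept, gold: Concept) -> int:
    """Return s(m, c) for candidate c, given the manual annotation gold = g(m)."""
    if c == gold:
        return 4  # exactly the annotated concept
    if gold.section is not None and c.article == gold.article:
        if c.section is None:
            return 3  # c is the parent article of the annotated section
        return 2      # same article, different section
    return 1          # anything else


# Hypothetical annotation: a section of the "Singapore" article.
gold = Concept("Singapore", "Cuisine")
for cand in [
    Concept("Singapore", "Cuisine"),
    Concept("Singapore"),
    Concept("Singapore", "History"),
    Concept("Merlion Park"),
]:
    print(cand, relevance_grade(cand, gold))
```

Grades of this kind would serve as the training targets for a learning-to-rank model. At prediction time only the top-ranked candidate is taken as the Wikification result, as stated in the evaluation setup.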