The Fifth Dialog State Tracking Challenge (DSTC5)
Seokhwan Kim (1), Luis Fernando D’Haro (1), Rafael E. Banchs (1), Jason D. Williams (2), Matthew Henderson (3), Koichiro Yoshino (4)
(1) Institute for Infocomm Research, Singapore. (2) Microsoft Research, USA. (3) Google, USA. (4) Nara Institute of Science and Technology, Japan.
Problems
Goal
Human-human dialogs on tourist information in English and Chinese
Focusing on the problem of adaptation to a new language
Main Task
Dialog State Tracking (DST)
Pilot Tasks
Spoken Language Understanding (SLU)
Speech Act Prediction (SAP)
Spoken Language Generation (SLG)
End-to-end System (EES)
Datasets
Dialogs
Set Task Language # dialogs # utterances
Train ALL English 35 31,304 ← DSTC4 datasets
Dev ALL Chinese 2 3,130
Test MAIN Chinese 10 14,878
Test SLU Chinese 8 12,655
Test SAP Chinese 8 11,456
Test SLG Chinese 8 12,346
Translations
5-best translations, with word alignments, were provided for each utterance; they were generated by English-to-Chinese and Chinese-to-English MT systems
The DSTC4 ontology was provided together with its automatic translation into Chinese
Main Task: Dialog State Tracking
Task Definition
Dialog state tracking at the sub-dialog level
Input
Transcribed utterances from the beginning of the session to each timestep
Manually segmented by sub-dialogs and annotated with topic categories
Output
Frame structures defined with slot-value pairs
For 5 major topic categories: Accommodation, Attraction, Food, Shopping, Transportation
Example
Each sub-dialog is listed with its dialog state, followed by its utterances (Speaker: Utterance).

Dialog State: TOPIC: Attraction; TYPE OF PLACE: Ethnic enclave; NEIGHBORHOOD: Kampong Glam
Guide: 我介绍你这个甘榜格南。 (I recommend you this Kampong Glam.)
Tourist: 对。 (Right.)
Guide: 你看,它是个-它是马来村嘛 (You see, it is a- it’s a Malay Village)
Tourist: 对,甘榜- (Right, Kampong-)

Dialog State: TOPIC: Food; CUISINE: Malay cuisine; NEIGHBORHOOD: Kampong Glam
Guide: 它就卖了很多马来食物。 (It sells a lot of Malay food.)
Tourist: 比较有特色的食物, (It’s quite a unique food,)
Guide: 对,哦。 (Right.)
Guide: 马来食物,基本上,它是香。 (Malay food, basically, it smells very nice.)

Dialog State: TOPIC: Accommodation; INFO: Pricerange; NAME: V Hotel
Tourist: 那我们住宿呢? (Then, where do we stay?)
Guide: 我介绍一间呵,叫V Hotel的。 (Let me recommend to you, the V Hotel.)
Guide: 这个酒店,价格这个不贵。 (This hotel, the price is not expensive.)
Tourist: 好的。 (Okay.)

Dialog State: TOPIC: Transportation; INFO: Duration; TYPE: Walking; FROM: V Hotel; TO: Kampong Glam
Guide: 如果要去,我建议的这个马来文化村, (If you want to go, I suggest this Malay cultural village,)
Tourist: 马来村? (Malay village?)
Guide: 步行大概我看十五分钟吧。 (I think it takes fifteen minutes on foot.)
Tourist: 好。 (That’s good.)
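
The frame structure in the example can be written down as a plain mapping from slot names to values. The following is a minimal sketch in Python; the dictionary layout and the helper `slot_value_pairs` are hypothetical and are not taken from the official DSTC5 data format.

```python
# Hypothetical in-memory form of one sub-dialog's state
# (illustrative only; not the official DSTC5 schema).
attraction_state = {
    "topic": "Attraction",
    "frame": {
        "TYPE OF PLACE": ["Ethnic enclave"],
        "NEIGHBORHOOD": ["Kampong Glam"],
    },
}


def slot_value_pairs(state):
    """Flatten a frame into a set of (slot, value) pairs."""
    return {
        (slot, value)
        for slot, values in state["frame"].items()
        for value in values
    }


pairs = slot_value_pairs(attraction_state)
print(sorted(pairs))
```

Keeping a list per slot allows a slot to carry several values, and the flattened set of pairs is a convenient form for slot-level comparison.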
Baselines
Fuzzy string matching between ontology entries and utterances (DSTC4)
Baseline 1: English translations of the utterances, matched against the original English ontology
Baseline 2: Original Chinese utterances, matched against the automatically translated Chinese ontology
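
To make the idea of fuzzy matching between ontology entries and utterances concrete, here is a minimal sketch in Python. It is not the challenge's baseline code: the toy ontology, the word-window strategy, the 0.8 threshold, and the use of the standard library's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Toy ontology fragment (hypothetical); the real ontology is much larger.
ONTOLOGY = {
    "NEIGHBORHOOD": ["Kampong Glam", "Chinatown"],
    "CUISINE": ["Malay cuisine", "Thai cuisine"],
}


def best_window_ratio(value, utterance):
    """Highest similarity between `value` and any word window of the
    utterance that has the same number of words as `value`."""
    v_words = value.lower().split()
    u_words = utterance.lower().split()
    n = len(v_words)
    if len(u_words) < n:
        # Utterance shorter than the value: compare the whole strings.
        return SequenceMatcher(None, value.lower(), utterance.lower()).ratio()
    target = " ".join(v_words)
    return max(
        SequenceMatcher(None, target, " ".join(u_words[i:i + n])).ratio()
        for i in range(len(u_words) - n + 1)
    )


def track(utterance, ontology=ONTOLOGY, threshold=0.8):
    """Return {slot: [values]} for ontology values that fuzzily match."""
    frame = {}
    for slot, values in ontology.items():
        matched = [v for v in values
                   if best_window_ratio(v, utterance) >= threshold]
        if matched:
            frame[slot] = matched
    return frame


frame = track("I recommend you this Kampong Glam.")
print(frame)
```

A word-window approach like this only suits whitespace-delimited text such as the English translations; matching against Chinese utterances would presumably need character-level windows instead.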
Evaluation
Schedules: (1) every turn; (2) only at the end of each sub-dialog
Metrics: (1) Frame-level Accuracy; (2) Slot-level Precision/Recall/F-measure
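
The two metrics can be sketched as follows, with each frame represented as a set of (slot, value) pairs. This is an illustrative Python sketch rather than the official scorer; in particular, pooling the counts over all evaluation points (micro-averaging) is an assumption.

```python
def frame_accuracy(predicted, reference):
    """Fraction of evaluation points where the whole predicted frame
    is identical to the reference frame."""
    correct = sum(1 for p, r in zip(predicted, reference) if p == r)
    return correct / len(reference)


def slot_prf(predicted, reference):
    """Precision, recall and F-measure over (slot, value) pairs,
    with counts pooled across all evaluation points."""
    tp = fp = fn = 0
    for p, r in zip(predicted, reference):
        tp += len(p & r)   # pairs predicted and in the reference
        fp += len(p - r)   # pairs predicted but not in the reference
        fn += len(r - p)   # reference pairs that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Two evaluation points; the first prediction misses one slot.
reference = [
    {("NEIGHBORHOOD", "Kampong Glam"), ("TYPE OF PLACE", "Ethnic enclave")},
    {("CUISINE", "Malay cuisine"), ("NEIGHBORHOOD", "Kampong Glam")},
]
predicted = [
    {("NEIGHBORHOOD", "Kampong Glam")},
    {("CUISINE", "Malay cuisine"), ("NEIGHBORHOOD", "Kampong Glam")},
]
acc = frame_accuracy(predicted, reference)
p, r, f = slot_prf(predicted, reference)
print(acc, p, r, f)
```

Frame-level accuracy is all-or-nothing: a single missing or extra slot value makes the whole frame count as wrong, while the slot-level scores still give partial credit. That may be one reason accuracy figures sit far below F-measure figures for the same system.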
Results (32 entries from 9 teams)
Team Entry Accuracy(Schedule-1) F-measure(Schedule-1) Accuracy(Schedule-2) F-measure(Schedule-2)
0 0 0.0250 0.1124 0.0321 0.1462 ← Baseline 1
0 1 0.0161 0.1475 0.0222 0.1871 ← Baseline 2
1 0 0.0397 0.3115 0.0551 0.3565
1 1 0.0386 0.3032 0.0597 0.3540
1 2 0.0393 0.3071 0.0551 0.3563
1 3 0.0387 0.3052 0.0597 0.3580
1 4 0.0417 0.3166 0.0612 0.3675
2 0 0.0736 0.3966 0.0964 0.4430
2 1 0.0567 0.3764 0.0712 0.4267
2 2 0.0529 0.3756 0.0681 0.4259
2 3 0.0788 0.4047 0.0956 0.4519
2 4 0.0699 0.4024 0.0872 0.4499
3 0 0.0351 0.2060 0.0505 0.2539
3 1 0.0303 0.2424 0.0367 0.2830
3 2 0.0289 0.2074 0.0406 0.2573
3 3 0.0341 0.2442 0.0451 0.2895
4 0 0.0583 0.3280 0.0765 0.3658
4 1 0.0407 0.3405 0.0413 0.3572
4 2 0.0515 0.3708 0.0635 0.3945
4 3 0.0552 0.3649 0.0681 0.3913
4 4 0.0454 0.3572 0.0559 0.3758
5 0 0.0330 0.2749 0.0520 0.3314
5 1 0.0187 0.1804 0.0230 0.1967
5 2 0.0183 0.1520 0.0168 0.1371
5 3 0.0313 0.1574 0.0413 0.1880
5 4 0.0093 0.0945 0.0115 0.0977
6 0 0.0389 0.2849 0.0482 0.3230
6 1 0.0340 0.3070 0.0383 0.3532
6 2 0.0491 0.2988 0.0643 0.3381
7 0 0.0092 0.0783 0.0107 0.0794
7 1 0.0085 0.0767 0.0115 0.0809
8 0 0.0192 0.1570 0.0214 0.1554
8 1 0.0068 0.0554 0.0069 0.0577
9 0 0.0231 0.1114 0.0314 0.1449
Pilot Task: Spoken Language Understanding
Task Definition
Input: Transcribed utterance at each timestep
Output
Speech Act: 4 main categories with 21 attributes
Semantic Tags: 8 main categories with subcategories, relative modifiers and from-to modifiers
Example
Input: 我介绍你这个甘榜格南。 (I recommend you this Kampong Glam.)
Speech Act: INI (RECOMMEND)
Semantic Tags: 我介绍你这个<LOC CAT="CULTURAL">甘榜格南</LOC>。
(I recommend you this <LOC CAT="CULTURAL">Kampong Glam</LOC>.)
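
The inline tag notation in the example can be read back into structured form with a small parser. The sketch below is an illustration in Python, assuming the tags follow the notation shown in the example with straight double quotes around attribute values and no nesting; the function name `extract_tags` is hypothetical.

```python
import re

# Matches one inline tag such as <LOC CAT="CULTURAL">Kampong Glam</LOC>.
TAG_RE = re.compile(r'<(?P<tag>\w+)(?P<attrs>[^>]*)>(?P<text>.*?)</(?P=tag)>')
# Matches one attribute such as CAT="CULTURAL".
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def extract_tags(annotated):
    """Return (tag, attributes, surface text) for each tagged span."""
    return [
        (m.group("tag"),
         dict(ATTR_RE.findall(m.group("attrs"))),
         m.group("text"))
        for m in TAG_RE.finditer(annotated)
    ]


spans = extract_tags(
    'I recommend you this <LOC CAT="CULTURAL">Kampong Glam</LOC>.'
)
print(spans)
```

Nested tags would need a real parser rather than a regular expression; this sketch only handles flat spans.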
Baselines: SVM for Speech Acts and CRF for Semantic Tags
Evaluation Metrics: Precision/Recall/F-measure
Results on Speech Acts (12 entries from 4 teams)
Team Entry Guide-P Guide-R Guide-F Tourist-P Tourist-R Tourist-F
0 0 0.4588 0.2480 0.3219 0.3694 0.1828 0.2446 ← SVM baseline
2 0 0.5450 0.3911 0.4554 0.5001 0.5501 0.5239
2 1 0.5305 0.3969 0.4540 0.5331 0.5263 0.5297
2 2 0.5533 0.3829 0.4526 0.5107 0.5425 0.5261
2 3 0.5127 0.4251 0.4648 0.5605 0.4999 0.5285
3 0 0.4279 0.3583 0.3900 0.4591 0.4241 0.4409
3 1 0.4340 0.3635 0.3956 0.4498 0.4119 0.4300
5 0 0.4085 0.3364 0.3690 0.5026 0.4484 0.4739
5 1 0.3905 0.3216 0.3527 0.4519 0.4031 0.4261
5 2 0.4639 0.3820 0.4190 0.4916 0.4385 0.4635
5 3 0.4540 0.3739 0.4101 0.4871 0.4346 0.4594
5 4 0.4459 0.3672 0.4028 0.4984 0.4446 0.4700
7 0 0.5007 0.2976 0.3733 0.5079 0.4156 0.4571
Results on Semantic Tags (8 entries from 3 teams)
Team Entry Guide-P Guide-R Guide-F Tourist-P Tourist-R Tourist-F
0 0 0.4666 0.3187 0.3787 0.5259 0.2659 0.3532 ← CRF baseline
3 0 0.4650 0.3182 0.3779 0.5331 0.2620 0.3513
3 1 0.4650 0.3182 0.3779 0.5331 0.2620 0.3513
5 0 0.5006 0.2923 0.3691 0.5083 0.3110 0.3859
5 1 0.5469 0.1893 0.2813 0.5121 0.3081 0.3847
5 2 0.3577 0.2476 0.2926 0.3031 0.2237 0.2574
5 3 0.3486 0.2541 0.2939 0.2932 0.2149 0.2480
5 4 0.3395 0.2111 0.2603 0.2947 0.2072 0.2433
7 0 0.4400 0.3207 0.3710 0.4408 0.2926 0.3517
Pilot Task: Spoken Language Generation
Task Definition
Input: Speech act and semantic tags at each time step
Output: Generated utterance
Example
Input: INI (RECOMMEND), <LOC CAT="CULTURAL">Kampong Glam</LOC>
Output: 我介绍你这个甘榜格南。 (I recommend you this Kampong Glam.)
Baseline
Example-based language generation
Using k-nearest neighbors algorithm on speech acts and semantic tags
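
An example-based generator of this kind can be sketched as a nearest-neighbor lookup: find the stored examples whose speech acts and semantic tags are most similar to the input and reuse their utterances. The Python sketch below is illustrative only; the toy example base, the label strings, and the choice of Jaccard overlap as the similarity measure are assumptions, not details of the actual baseline.

```python
# Toy example base (hypothetical):
# (speech acts, semantic tags, stored utterance).
EXAMPLES = [
    ({"INI:RECOMMEND"}, {"LOC:CULTURAL"},
     "I recommend you this Kampong Glam."),
    ({"FOL:ACK"}, set(), "Right."),
    ({"QST:WHERE"}, {"LOC:HOTEL"}, "Then, where do we stay?"),
]


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def generate(acts, tags, k=1):
    """Return the utterances of the k most similar stored examples."""
    ranked = sorted(
        EXAMPLES,
        key=lambda ex: jaccard(acts, ex[0]) + jaccard(tags, ex[1]),
        reverse=True,
    )
    return [utterance for _, _, utterance in ranked[:k]]


out = generate({"INI:RECOMMEND"}, {"LOC:CULTURAL"})
print(out)
```

A fuller system would also substitute the input's slot values into the retrieved utterance; this sketch returns the stored text unchanged.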
Evaluation Metrics
BLEU: Geometric mean of the n-gram precisions of system outputs against the references
AM-FM: Linear interpolation of a cosine similarity score and a normalized n-gram probability
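
Both metrics can be illustrated with a short Python sketch. The BLEU function below is a simplified sentence-level variant with a single reference, a brevity penalty, and no smoothing; the AM-FM function shows only the final interpolation step, taking the two component scores as given. The tokenization, the `weight` parameter, and its default of 0.5 are assumptions; the official evaluation scripts may differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Brevity penalty times the geometric mean of the clipped
    n-gram precisions for n = 1..max_n (no smoothing)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        clipped = sum(min(count, ref[g]) for g, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # one zero precision zeroes the geometric mean
        precisions.append(clipped / total)
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c > r else math.exp(1 - r / c)
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)


def am_fm(am, fm, weight=0.5):
    """Linear interpolation of an adequacy score (AM, cosine similarity)
    and a fluency score (FM, normalized n-gram probability)."""
    return weight * am + (1 - weight) * fm


ref = "I recommend you this Kampong Glam".split()
cand = "I recommend this Kampong Glam".split()
score_same = bleu(ref, ref)
score_bigram = bleu(cand, ref, max_n=2)
score_4gram = bleu(cand, ref)
print(score_same, score_bigram, score_4gram, am_fm(0.2, 0.4))
```

Without smoothing, a candidate that shares no 4-gram with the reference scores zero even when most of its words are correct, which is why sentence-level BLEU is usually smoothed or computed over a whole corpus.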
Results (4 entries from 1 team)
Team Entry Guide-AM-FM Guide-BLEU Tourist-AM-FM Tourist-BLEU
0 0 0.1981 0.3854 0.2602 0.5921 ← Baseline
5 0 0.2818 0.3264 0.3221 0.4850
5 1 0.3180 0.3371 0.3635 0.5249
5 2 0.2737 0.2852 0.3100 0.4741
5 3 0.2405 0.2758 0.4258 0.5302
* More details can be found in our paper in the SLT proceedings, on the official DSTC5 website (http://workshop.colips.org/dstc5/) and in the DSTC5 GitHub repository (https://github.com/seokhwankim/dstc5).

Contenu connexe

Similaire à The Fifth Dialog State Tracking Challenge (DSTC5)

SophiaConf 2018 - J. Rahajarison (My Little Adventure)
SophiaConf 2018 - J. Rahajarison (My Little Adventure)SophiaConf 2018 - J. Rahajarison (My Little Adventure)
SophiaConf 2018 - J. Rahajarison (My Little Adventure)TelecomValley
 
How to use a Kalman Filter in Brand Tracking?
How to use a Kalman Filter in Brand Tracking?How to use a Kalman Filter in Brand Tracking?
How to use a Kalman Filter in Brand Tracking?Ray Poynter
 
Ground Vibration Control Using Signature Hole Method - Thesis BE Mining, Univ...
Ground Vibration Control Using Signature Hole Method - Thesis BE Mining, Univ...Ground Vibration Control Using Signature Hole Method - Thesis BE Mining, Univ...
Ground Vibration Control Using Signature Hole Method - Thesis BE Mining, Univ...Muhamad Rizky
 
AP Statistics - Confidence Intervals with Means - One Sample
AP Statistics - Confidence Intervals with Means - One SampleAP Statistics - Confidence Intervals with Means - One Sample
AP Statistics - Confidence Intervals with Means - One SampleFrances Coronel
 
Umedia2011 - uP: A lightweight protocol for services in smart spaces
Umedia2011 -  uP: A lightweight protocol for services in smart spacesUmedia2011 -  uP: A lightweight protocol for services in smart spaces
Umedia2011 - uP: A lightweight protocol for services in smart spacesFabricio Nogueira Buzeto
 
Remote detection of weak aftershocks of the DPRK underground explosions using...
Remote detection of weak aftershocks of the DPRK underground explosions using...Remote detection of weak aftershocks of the DPRK underground explosions using...
Remote detection of weak aftershocks of the DPRK underground explosions using...Ivan Kitov
 
sCorrecting for country skew: How APNIC adjusts for sample bias in the counts
sCorrecting for country skew: How APNIC adjusts for sample bias in the countssCorrecting for country skew: How APNIC adjusts for sample bias in the counts
sCorrecting for country skew: How APNIC adjusts for sample bias in the countsAPNIC
 
2011 SAIR It's not about pie when it comes to the facts
2011 SAIR It's not about pie when it comes to the facts2011 SAIR It's not about pie when it comes to the facts
2011 SAIR It's not about pie when it comes to the factsDavid Onder
 
93 crit valuetables_4th
93 crit valuetables_4th93 crit valuetables_4th
93 crit valuetables_4thasfawm
 
Group assigment statistic group3
Group assigment statistic group3Group assigment statistic group3
Group assigment statistic group3Narith Por
 
Machine Learning 101
Machine Learning 101Machine Learning 101
Machine Learning 101Talha Obaid
 

Similaire à The Fifth Dialog State Tracking Challenge (DSTC5) (13)

SophiaConf 2018 - J. Rahajarison (My Little Adventure)
SophiaConf 2018 - J. Rahajarison (My Little Adventure)SophiaConf 2018 - J. Rahajarison (My Little Adventure)
SophiaConf 2018 - J. Rahajarison (My Little Adventure)
 
How to use a Kalman Filter in Brand Tracking?
How to use a Kalman Filter in Brand Tracking?How to use a Kalman Filter in Brand Tracking?
How to use a Kalman Filter in Brand Tracking?
 
Ground Vibration Control Using Signature Hole Method - Thesis BE Mining, Univ...
Ground Vibration Control Using Signature Hole Method - Thesis BE Mining, Univ...Ground Vibration Control Using Signature Hole Method - Thesis BE Mining, Univ...
Ground Vibration Control Using Signature Hole Method - Thesis BE Mining, Univ...
 
1. talleres lectoescritura
1. talleres lectoescritura1. talleres lectoescritura
1. talleres lectoescritura
 
AP Statistics - Confidence Intervals with Means - One Sample
AP Statistics - Confidence Intervals with Means - One SampleAP Statistics - Confidence Intervals with Means - One Sample
AP Statistics - Confidence Intervals with Means - One Sample
 
Umedia2011 - uP: A lightweight protocol for services in smart spaces
Umedia2011 -  uP: A lightweight protocol for services in smart spacesUmedia2011 -  uP: A lightweight protocol for services in smart spaces
Umedia2011 - uP: A lightweight protocol for services in smart spaces
 
Remote detection of weak aftershocks of the DPRK underground explosions using...
Remote detection of weak aftershocks of the DPRK underground explosions using...Remote detection of weak aftershocks of the DPRK underground explosions using...
Remote detection of weak aftershocks of the DPRK underground explosions using...
 
sCorrecting for country skew: How APNIC adjusts for sample bias in the counts
sCorrecting for country skew: How APNIC adjusts for sample bias in the countssCorrecting for country skew: How APNIC adjusts for sample bias in the counts
sCorrecting for country skew: How APNIC adjusts for sample bias in the counts
 
Trigonotabel
TrigonotabelTrigonotabel
Trigonotabel
 
2011 SAIR It's not about pie when it comes to the facts
2011 SAIR It's not about pie when it comes to the facts2011 SAIR It's not about pie when it comes to the facts
2011 SAIR It's not about pie when it comes to the facts
 
93 crit valuetables_4th
93 crit valuetables_4th93 crit valuetables_4th
93 crit valuetables_4th
 
Group assigment statistic group3
Group assigment statistic group3Group assigment statistic group3
Group assigment statistic group3
 
Machine Learning 101
Machine Learning 101Machine Learning 101
Machine Learning 101
 

Plus de Seokhwan Kim

The Eighth Dialog System Technology Challenge (DSTC8)
The Eighth Dialog System Technology Challenge (DSTC8)The Eighth Dialog System Technology Challenge (DSTC8)
The Eighth Dialog System Technology Challenge (DSTC8)Seokhwan Kim
 
Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punc...
Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punc...Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punc...
Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punc...Seokhwan Kim
 
Dynamic Memory Networks for Dialogue Topic Tracking
Dynamic Memory Networks for Dialogue Topic TrackingDynamic Memory Networks for Dialogue Topic Tracking
Dynamic Memory Networks for Dialogue Topic TrackingSeokhwan Kim
 
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...Seokhwan Kim
 
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...Seokhwan Kim
 
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...Seokhwan Kim
 
Sequential Labeling for Tracking Dynamic Dialog States
Sequential Labeling for Tracking Dynamic Dialog StatesSequential Labeling for Tracking Dynamic Dialog States
Sequential Labeling for Tracking Dynamic Dialog StatesSeokhwan Kim
 
Wikipedia-based Kernels for Dialogue Topic Tracking
Wikipedia-based Kernels for Dialogue Topic TrackingWikipedia-based Kernels for Dialogue Topic Tracking
Wikipedia-based Kernels for Dialogue Topic TrackingSeokhwan Kim
 
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...Seokhwan Kim
 
MMR-based active machine learning for Bio named entity recognition
MMR-based active machine learning for Bio named entity recognitionMMR-based active machine learning for Bio named entity recognition
MMR-based active machine learning for Bio named entity recognitionSeokhwan Kim
 
A semi-supervised method for efficient construction of statistical spoken lan...
A semi-supervised method for efficient construction of statistical spoken lan...A semi-supervised method for efficient construction of statistical spoken lan...
A semi-supervised method for efficient construction of statistical spoken lan...Seokhwan Kim
 
A spoken dialog system for electronic program guide information access
A spoken dialog system for electronic program guide information accessA spoken dialog system for electronic program guide information access
A spoken dialog system for electronic program guide information accessSeokhwan Kim
 
An alignment-based approach to semi-supervised relation extraction including ...
An alignment-based approach to semi-supervised relation extraction including ...An alignment-based approach to semi-supervised relation extraction including ...
An alignment-based approach to semi-supervised relation extraction including ...Seokhwan Kim
 
An Alignment-based Pattern Representation Model for Information Extraction
An Alignment-based Pattern Representation Model for Information ExtractionAn Alignment-based Pattern Representation Model for Information Extraction
An Alignment-based Pattern Representation Model for Information ExtractionSeokhwan Kim
 
A Cross-Lingual Annotation Projection Approach for Relation Detection
A Cross-Lingual Annotation Projection Approach for Relation DetectionA Cross-Lingual Annotation Projection Approach for Relation Detection
A Cross-Lingual Annotation Projection Approach for Relation DetectionSeokhwan Kim
 
A Cross-lingual Annotation Projection-based Self-supervision Approach for Ope...
A Cross-lingual Annotation Projection-based Self-supervision Approach for Ope...A Cross-lingual Annotation Projection-based Self-supervision Approach for Ope...
A Cross-lingual Annotation Projection-based Self-supervision Approach for Ope...Seokhwan Kim
 

Plus de Seokhwan Kim (16)

The Eighth Dialog System Technology Challenge (DSTC8)
The Eighth Dialog System Technology Challenge (DSTC8)The Eighth Dialog System Technology Challenge (DSTC8)
The Eighth Dialog System Technology Challenge (DSTC8)
 
Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punc...
Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punc...Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punc...
Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punc...
 
Dynamic Memory Networks for Dialogue Topic Tracking
Dynamic Memory Networks for Dialogue Topic TrackingDynamic Memory Networks for Dialogue Topic Tracking
Dynamic Memory Networks for Dialogue Topic Tracking
 
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constra...
 
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
Towards Improving Dialogue Topic Tracking Performances with Wikification of C...
 
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain ...
 
Sequential Labeling for Tracking Dynamic Dialog States
Sequential Labeling for Tracking Dynamic Dialog StatesSequential Labeling for Tracking Dynamic Dialog States
Sequential Labeling for Tracking Dynamic Dialog States
 
Wikipedia-based Kernels for Dialogue Topic Tracking
Wikipedia-based Kernels for Dialogue Topic TrackingWikipedia-based Kernels for Dialogue Topic Tracking
Wikipedia-based Kernels for Dialogue Topic Tracking
 
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relatio...
 
MMR-based active machine learning for Bio named entity recognition
MMR-based active machine learning for Bio named entity recognitionMMR-based active machine learning for Bio named entity recognition
MMR-based active machine learning for Bio named entity recognition
 
A semi-supervised method for efficient construction of statistical spoken lan...
A semi-supervised method for efficient construction of statistical spoken lan...A semi-supervised method for efficient construction of statistical spoken lan...
A semi-supervised method for efficient construction of statistical spoken lan...
 
A spoken dialog system for electronic program guide information access
A spoken dialog system for electronic program guide information accessA spoken dialog system for electronic program guide information access
A spoken dialog system for electronic program guide information access
 
An alignment-based approach to semi-supervised relation extraction including ...
An alignment-based approach to semi-supervised relation extraction including ...An alignment-based approach to semi-supervised relation extraction including ...
An alignment-based approach to semi-supervised relation extraction including ...
 
An Alignment-based Pattern Representation Model for Information Extraction
An Alignment-based Pattern Representation Model for Information ExtractionAn Alignment-based Pattern Representation Model for Information Extraction
An Alignment-based Pattern Representation Model for Information Extraction
 
A Cross-Lingual Annotation Projection Approach for Relation Detection
A Cross-Lingual Annotation Projection Approach for Relation DetectionA Cross-Lingual Annotation Projection Approach for Relation Detection
A Cross-Lingual Annotation Projection Approach for Relation Detection
 
A Cross-lingual Annotation Projection-based Self-supervision Approach for Ope...
A Cross-lingual Annotation Projection-based Self-supervision Approach for Ope...A Cross-lingual Annotation Projection-based Self-supervision Approach for Ope...
A Cross-lingual Annotation Projection-based Self-supervision Approach for Ope...
 

Dernier

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Dernier (20)

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

The Fifth Dialog State Tracking Challenge (DSTC5)

  • 1. The Fifth Dialog State Tracking Challenge (DSTC5) Seokhwan Kim1 , Luis Fernando D’Haro1 , Rafael E. Banchs1 , Jason D. Williams2 , Matthew Henderson3 , Koichiro Yoshino4 1 Institute for Infocomm Research, Singapore. 2 Microsoft Research, USA. 3 Google, USA. 4 Nara Institute of Science and Technology, Japan. Problems Goal Human-human dialogs on tourist information in English and Chinese Focusing on the problem of adaptation to a new language Main Task Dialog State Tracking (DST) Pilot Tasks Spoken Language Understanding (SLU) Speech Act Prediction (SAP) Spoken Language Generation (SLG) End-to-end System (EES) Datasets Dialogs Set Task Language # dialogs # utterances Train ALL English 35 31,304 ← DSTC4 datasets Dev ALL Chinese 2 3,130 Test MAIN Chinese 10 14,878 Test SLU Chinese 8 12,655 Test SAP Chinese 8 11,456 Test SLG Chinese 8 12,346 Translations 5-best translations were provided for each utterance with word alignments generated by English-to-Chinese and Chinese-to-English MT systems The ontology for DSTC4 was given with its automatic translation to Chinese Main Task: Dialog State Tracking Task Definition Dialog state tracking for each sub-dialog level Input Transcribed utterances from the beginning of the session to each timestep Manually segmented by sub-dialogs and annotated with topic categories Output Frame structures defined with slot-value pairs For 5 major topic categories: Accommodation, Attraction, Food, Shopping, Transportation Example Speaker Utterance Dialog State Guide 我介绍你这个甘榜格南。 (I recommend you this Kampong Glam.) TOPIC: Attraction TYPE OF PLACE: Ethnic enclave NEIGHBORHOOD: Kampong Glam Tourist 对。(Right.) Guide 你看,它是个-它是马来村嘛 (You see, it is a- it’s a Malay Village) Tourist 对,甘榜- (Right, Kampong-) Guide 它就卖了很多马来食物。 (It sells a lot of Malay food.) TOPIC: Food CUISINE: Malay cuisine NEIGHBORHOOD: Kampong Glam Tourist 比较有特色的食物, (It’s quite a unique food,) Guide 对,哦。(Right.) Guide 马来食物,基本上,它是香。 (Malay food, basically, it smells very nice.) 

Dialog State: TOPIC: Accommodation; INFO: Pricerange; NAME: V Hotel
  Tourist: 那我们住宿呢? (Then, where do we stay?)
  Guide: 我介绍一间呵,叫V Hotel的。 (Let me recommend to you, the V Hotel.)
  Guide: 这个酒店,价格这个不贵。 (This hotel, the price is not expensive.)
  Tourist: 好的。 (Okay.)

Dialog State: TOPIC: Transportation; INFO: Duration; TYPE: Walking; FROM: V Hotel; TO: Kampong Glam
  Guide: 如果要去,我建议的这个马来文化村, (If you want to go, I suggest this Malay cultural village,)
  Tourist: 马来村? (Malay village?)
  Guide: 步行大概我看十五分钟吧。 (I think it takes fifteen minutes on foot.)
  Tourist: 好。 (That’s good.)

Baselines
  Fuzzy string matching between ontology entries and utterances (DSTC4)
  Baseline 1: Translations in English with the original ontology in English
  Baseline 2: Original utterances in Chinese with the translated ontology in Chinese

Evaluation
  Schedules: (1) every turn; (2) only at the end of each sub-dialog
  Metrics: (1) Frame-level Accuracy; (2) Slot-level Precision/Recall/F-measure

Results (32 entries from 9 teams)

              Schedule 1            Schedule 2
Team  Entry   Accuracy  F-measure   Accuracy  F-measure
0     0       0.0250    0.1124      0.0321    0.1462     ← Baseline 1
0     1       0.0161    0.1475      0.0222    0.1871     ← Baseline 2
1     0       0.0397    0.3115      0.0551    0.3565
1     1       0.0386    0.3032      0.0597    0.3540
1     2       0.0393    0.3071      0.0551    0.3563
1     3       0.0387    0.3052      0.0597    0.3580
1     4       0.0417    0.3166      0.0612    0.3675
2     0       0.0736    0.3966      0.0964    0.4430
2     1       0.0567    0.3764      0.0712    0.4267
2     2       0.0529    0.3756      0.0681    0.4259
2     3       0.0788    0.4047      0.0956    0.4519
2     4       0.0699    0.4024      0.0872    0.4499
3     0       0.0351    0.2060      0.0505    0.2539
3     1       0.0303    0.2424      0.0367    0.2830
3     2       0.0289    0.2074      0.0406    0.2573
3     3       0.0341    0.2442      0.0451    0.2895
4     0       0.0583    0.3280      0.0765    0.3658
4     1       0.0407    0.3405      0.0413    0.3572
4     2       0.0515    0.3708      0.0635    0.3945
4     3       0.0552    0.3649      0.0681    0.3913
4     4       0.0454    0.3572      0.0559    0.3758
5     0       0.0330    0.2749      0.0520    0.3314
5     1       0.0187    0.1804      0.0230    0.1967
5     2       0.0183    0.1520      0.0168    0.1371
5     3       0.0313    0.1574      0.0413    0.1880
5     4       0.0093    0.0945      0.0115    0.0977
6     0       0.0389    0.2849      0.0482    0.3230
6     1       0.0340    0.3070      0.0383    0.3532
6     2       0.0491    0.2988      0.0643    0.3381
7     0       0.0092    0.0783      0.0107    0.0794
7     1       0.0085    0.0767      0.0115    0.0809
8     0       0.0192    0.1570      0.0214    0.1554
8     1       0.0068    0.0554      0.0069    0.0577
9     0       0.0231    0.1114      0.0314    0.1449

Pilot Task: Spoken Language Understanding

Task Definition
  Input: Transcribed utterance at each timestep
  Output
    Speech Act: 4 main categories with 21 attributes
    Semantic Tags: 8 main categories with subcategories, relative modifiers and from-to modifiers

Example
  Input: 我介绍你这个甘榜格南。 (I recommend you this Kampong Glam.)
  Speech Act: INI (RECOMMEND)
  Semantic Tags: 我介绍你这<LOC CAT="CULTURAL">个甘榜格南</LOC>。 (I recommend you this <LOC CAT="CULTURAL">Kampong Glam</LOC>.)

Baselines
  SVM for Speech Acts and CRF for Semantic Tags

Evaluation Metrics
  Precision/Recall/F-measure

Results on Speech Acts (12 entries from 4 teams)

              Guide                     Tourist
Team  Entry   P       R       F        P       R       F
0     0       0.4588  0.2480  0.3219   0.3694  0.1828  0.2446   ← SVM baseline
2     0       0.5450  0.3911  0.4554   0.5001  0.5501  0.5239
2     1       0.5305  0.3969  0.4540   0.5331  0.5263  0.5297
2     2       0.5533  0.3829  0.4526   0.5107  0.5425  0.5261
2     3       0.5127  0.4251  0.4648   0.5605  0.4999  0.5285
3     0       0.4279  0.3583  0.3900   0.4591  0.4241  0.4409
3     1       0.4340  0.3635  0.3956   0.4498  0.4119  0.4300
5     0       0.4085  0.3364  0.3690   0.5026  0.4484  0.4739
5     1       0.3905  0.3216  0.3527   0.4519  0.4031  0.4261
5     2       0.4639  0.3820  0.4190   0.4916  0.4385  0.4635
5     3       0.4540  0.3739  0.4101   0.4871  0.4346  0.4594
5     4       0.4459  0.3672  0.4028   0.4984  0.4446  0.4700
7     0       0.5007  0.2976  0.3733   0.5079  0.4156  0.4571

Results on Semantic Tags (8 entries from 3 teams)

              Guide                     Tourist
Team  Entry   P       R       F        P       R       F
0     0       0.4666  0.3187  0.3787   0.5259  0.2659  0.3532   ← CRF baseline
3     0       0.4650  0.3182  0.3779   0.5331  0.2620  0.3513
3     1       0.4650  0.3182  0.3779   0.5331  0.2620  0.3513
5     0       0.5006  0.2923  0.3691   0.5083  0.3110  0.3859
5     1       0.5469  0.1893  0.2813   0.5121  0.3081  0.3847
5     2       0.3577  0.2476  0.2926   0.3031  0.2237  0.2574
5     3       0.3486  0.2541  0.2939   0.2932  0.2149  0.2480
5     4       0.3395  0.2111  0.2603   0.2947  0.2072  0.2433
7     0       0.4400  0.3207  0.3710   0.4408  0.2926  0.3517

Pilot Task: Spoken Language Generation

Task Definition
  Input: Speech act and semantic tags at each time step
  Output: Generated utterance

Example
  Input: INI (RECOMMEND), <LOC CAT="CULTURAL">Kampong Glam</LOC>
  Output: 我介绍你这个甘榜格南。 (I recommend you this Kampong Glam.)

Baseline
  Example-based language generation
  Using k-nearest neighbors algorithm on speech acts and semantic tags

Evaluation Metrics
  BLEU: Geometric average of n-gram precision of system outputs to references
  AM-FM: Linear interpolation of cosine similarity and normalized n-gram probability

Results (4 entries from 1 team)

              Guide             Tourist
Team  Entry   AM-FM   BLEU      AM-FM   BLEU
0     0       0.1981  0.3854    0.2602  0.5921   ← Baseline
5     0       0.2818  0.3264    0.3221  0.4850
5     1       0.3180  0.3371    0.3635  0.5249
5     2       0.2737  0.2852    0.3100  0.4741
5     3       0.2405  0.2758    0.4258  0.5302

* More details can be found in our paper in the SLT proceedings, on the DSTC5 official website (http://workshop.colips.org/dstc5/), and in the DSTC5 GitHub repository (https://github.com/seokhwankim/dstc5).
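
The two main-task metrics named under Evaluation (frame-level accuracy and slot-level precision/recall/F-measure) can be illustrated with a short sketch. This is not the official DSTC5 scorer: the `Frame` type, the function names, and the choice to pool slot-value pairs over all evaluated frames are assumptions made here for illustration, and the official scripts may aggregate differently.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: one frame maps each slot name to a set of values.
Frame = Dict[str, Set[str]]


def frame_accuracy(refs: List[Frame], hyps: List[Frame]) -> float:
    """Fraction of frames whose hypothesis matches the reference exactly."""
    if not refs:
        return 0.0
    correct = sum(1 for ref, hyp in zip(refs, hyps) if ref == hyp)
    return correct / len(refs)


def slot_prf(refs: List[Frame], hyps: List[Frame]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure over slot-value pairs, pooled over frames."""
    true_pos = n_ref = n_hyp = 0
    for ref, hyp in zip(refs, hyps):
        ref_pairs = {(slot, v) for slot, values in ref.items() for v in values}
        hyp_pairs = {(slot, v) for slot, values in hyp.items() for v in values}
        true_pos += len(ref_pairs & hyp_pairs)
        n_ref += len(ref_pairs)
        n_hyp += len(hyp_pairs)
    precision = true_pos / n_hyp if n_hyp else 0.0
    recall = true_pos / n_ref if n_ref else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Reference frames taken from the first two sub-dialogs of the poster example;
# the hypothesis frames are made up to show a partial match.
refs = [
    {"TYPE OF PLACE": {"Ethnic enclave"}, "NEIGHBORHOOD": {"Kampong Glam"}},
    {"CUISINE": {"Malay cuisine"}, "NEIGHBORHOOD": {"Kampong Glam"}},
]
hyps = [
    {"NEIGHBORHOOD": {"Kampong Glam"}},
    {"CUISINE": {"Malay cuisine"}, "NEIGHBORHOOD": {"Kampong Glam"}},
]
print(frame_accuracy(refs, hyps))
print(slot_prf(refs, hyps))
```

Under Schedule 1 the lists would hold one frame per turn, and under Schedule 2 one frame per sub-dialog. Frame-level accuracy gives no credit for a partially correct frame, which is consistent with the accuracy columns in the results being much lower than the F-measure columns.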
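
The main-task baselines are described only as fuzzy string matching between ontology entries and utterances. The sketch below shows one way such a tracker could look in the Baseline 1 setting (English translations with the English ontology). It uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; the function names, the word-window strategy, the toy ontology, and the 0.8 threshold are all assumptions for illustration and are not taken from the DSTC4/DSTC5 baseline code.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Set


def best_window_ratio(value: str, utterance: str) -> float:
    """Best similarity between `value` and any word window of the same length."""
    v_tokens = re.findall(r"\w+", value.lower())
    u_tokens = re.findall(r"\w+", utterance.lower())
    if not v_tokens:
        return 0.0
    n = len(v_tokens)
    target = " ".join(v_tokens)
    best = 0.0
    # Slide a window of n words over the utterance (at least one window).
    for i in range(max(1, len(u_tokens) - n + 1)):
        window = " ".join(u_tokens[i:i + n])
        best = max(best, SequenceMatcher(None, window, target).ratio())
    return best


def track_by_fuzzy_match(
    utterances: List[str],
    ontology: Dict[str, List[str]],
    threshold: float = 0.8,  # hypothetical cut-off, not from the source
) -> Dict[str, Set[str]]:
    """Accumulate every ontology value that fuzzily matches some utterance."""
    state: Dict[str, Set[str]] = {}
    for utterance in utterances:
        for slot, values in ontology.items():
            for value in values:
                if best_window_ratio(value, utterance) >= threshold:
                    state.setdefault(slot, set()).add(value)
    return state


# Toy ontology and the English glosses of two utterances from the poster example.
ontology = {
    "NEIGHBORHOOD": ["Kampong Glam", "Chinatown"],
    "CUISINE": ["Malay cuisine", "Thai cuisine"],
}
utterances = [
    "I recommend you this Kampong Glam.",
    "It sells a lot of Malay food.",
]
print(track_by_fuzzy_match(utterances, ontology))
```

In this toy run the tracker recovers NEIGHBORHOOD: Kampong Glam but misses CUISINE: Malay cuisine, because the utterance says "Malay food" and the surface similarity stays below the threshold. Paraphrases of this kind are one plausible reason a purely string-based baseline would score low.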