Towards End-to-End Reinforcement Learning
of Dialogue Agents for Information Access
Bhuwan Dhingra Carnegie Mellon University
Lihong Li Microsoft Research
Xiujun Li Microsoft Research
Jianfeng Gao Microsoft Research
Yun-Nung (Vivian) Chen National Taiwan University
Faisal Ahmed Microsoft Research
Li Deng Citadel
KB-InfoBot: An interactive search engine
• Setting
– User is looking for a piece of information from one or more tables/KBs
– System must iteratively ask for user constraints (“slots”) to retrieve the answer
• Interactive search is more natural
– Users are used to issuing queries of length less than 5 words (Spink et al, 2001)
– Users may not know the structure of the database being queried
Example dialogue (user goal: Movie=?; Actor=Bill Murray; Release Year=1993)
User: Find me the Bill Murray’s movie.
KB-InfoBot: When was it released?
User: I think it came out in 1993.
KB-InfoBot: Groundhog Day is a Bill Murray movie which came out in 1993.
Entity-Centric Knowledge Base (X = missing value)
Movie                Actor            Release Year
Groundhog Day        Bill Murray      1993
Australia            Nicole Kidman    X
Mad Max: Fury Road   X                2015
Goal-Oriented Dialogue System (Young et al., 2013)
[Diagram: User → user utterance → Natural Language Understanding (NLU) → acts/entities → State Tracker / Belief Tracker → dialogue state → Dialogue Policy → dialogue act → Natural Language Generator (NLG) → system response → User. The agent exchanges a query and its results with the Database / KB.]
Query example: SELECT Movie WHERE Actor == Bill Murray AND Genre == Comedy
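The symbolic query in this pipeline can be sketched as an exact-match filter over the table. This is a minimal illustration, not the authors' code: the `KB` rows are copied from the example knowledge base, `None` stands in for a missing value, and the function name `hard_lookup` is hypothetical. The constraints use Actor and Release Year because the example table has no Genre column.

```python
# Example KB from the slides; None marks a missing value (assumed representation).
KB = [
    {"Movie": "Groundhog Day", "Actor": "Bill Murray", "Release Year": 1993},
    {"Movie": "Australia", "Actor": "Nicole Kidman", "Release Year": None},
    {"Movie": "Mad Max: Fury Road", "Actor": None, "Release Year": 2015},
]


def hard_lookup(kb, constraints):
    """Return the rows whose values equal every constraint exactly."""
    return [
        row
        for row in kb
        if all(row.get(slot) == value for slot, value in constraints.items())
    ]


matches = hard_lookup(KB, {"Actor": "Bill Murray", "Release Year": 1993})
print([row["Movie"] for row in matches])
```

A row either matches or it does not, so any uncertainty the upstream modules had about the slot values is gone by the time the results come back.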
KB-InfoBot
• A simple rule-based approach:
– Use heuristics to maintain a belief state over slots
– Ask for the slot with maximum uncertainty until some “inform” criterion is met
• Limitations:
– Has no notion of what the user is likely to be looking for or likely to know
– Symbolic queries discard the uncertainty estimated by upstream modules
– Cannot improve online with user feedback
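The "ask for the slot with maximum uncertainty" rule can be sketched as follows, assuming Shannon entropy as the uncertainty measure. The belief values and the names `entropy` and `pick_slot_to_request` are hypothetical, chosen only for illustration.

```python
from math import log2


def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)


def pick_slot_to_request(slot_beliefs):
    """Return the slot whose belief distribution has the highest entropy."""
    return max(slot_beliefs, key=lambda slot: entropy(slot_beliefs[slot].values()))


# Hypothetical beliefs: the agent is fairly sure of the actor, unsure of the year.
beliefs = {
    "Actor": {"Bill Murray": 0.9, "Nicole Kidman": 0.1},
    "Release Year": {1993: 0.5, 2015: 0.5},
}
print(pick_slot_to_request(beliefs))
```

With these beliefs the agent would ask about the release year, since a 50/50 split carries more uncertainty than a 90/10 split.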
KB-InfoBot
• Supervised / Reinforcement Learning-based approach
– Use neural networks to model LU, Belief Tracker and Policy
• Trade-offs:
– Can learn user behaviors (e.g. slots likely to be known)
– Symbolic queries still discard the uncertainty estimated by upstream modules
– End-to-end and online learning are possible, but gradients cannot be backpropagated through the symbolic query
Network-Based Dialogue System (Wen et al., 2017)
[Diagram: User → user utterance → acts/entities → dialogue state → dialogue act → system response → User, with the agent trained by supervised learning / reinforcement learning. The loss / reward is backpropagated through the agent, but the symbolic query to the Database / KB (e.g. SELECT Movie WHERE Actor == Bill Murray AND Genre == Comedy) is not differentiable.]
Truly “end-to-end” learning is not possible
Piecewise Training (Wen et al., 2017)
[Diagram: User → user utterance → acts/entities → dialogue state → dialogue act → system response → User, with a query to and results from the Database / KB. Two separate training signals are shown: a loss / reward backpropagated by supervised learning / reinforcement learning, and a supervised loss computed from labeled data.]
– Labeling is expensive
– Cannot learn online
Our Approach: Soft-KB Lookup via Attention
• Replace symbolic query with an attention distribution
– Compose slot-wise belief states into one posterior distribution over the entire database
– The KB structure is encoded in the computation of attention
– Uncertainty over database entries is propagated to the policy network (rule-based + RL)
– Differentiable operations allow backpropagation of gradients (RL)
– Computationally expensive for large databases
[Diagram: User → user utterance → acts/entities → dialogue state → dialogue act → system response → User, with the agent trained by supervised learning / reinforcement learning. Soft attention over the Database / KB gives the agent a full distribution over the DB in place of a symbolic query.]
Uncertainty propagated forward
Gradients propagated backward
Soft-KB lookup
• Entity-Centric KB (? = missing value)
Entity   Slot1   Slot2
A        x1      y1
B        x2      ?
C        ?       y2
• Agent Beliefs: distribution over slots (or fields) in the KB
• KB Posterior: posterior distribution over entities in the KB
State Tracker
For each slot j:
1. A multinomial over slot values
2. A binomial probability of whether the user knows the value of the slot
Example for one slot:
Slot Values     x1    x2
Probabilities   0.3   0.7
Probability that the user knows the value: 0.8
KB Posterior
Assumption: Slot values are independently distributed
Example (one slot; probability that the user knows the value: 0.8):
Entity   Slot1
A        x1
B        x2
C        ?
Slot Values     x1    x2
Probabilities   0.3   0.7
KB-Posterior
• Distribution over all entities in the database
• Posterior reflects uncertainty in LU + State Tracking
• All operations are differentiable
– Gradients can pass through during backward pass
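A minimal sketch of how slot-wise beliefs might be composed into a posterior over entities, using the one-slot example (entities A, B, C; beliefs 0.3 / 0.7; 0.8 probability that the user knows the value). The posterior formula did not survive in the slides, so the per-slot rule below is an assumption made for illustration: if the user knows the slot, mass for a value is split evenly among the rows holding it and rows with missing values each get 1/N; if the user does not know the slot, every row is equally likely. Slots are multiplied together under the independence assumption and the result is normalized. All function and variable names are hypothetical.

```python
MISSING = None  # assumed marker for a missing KB value


def slot_posterior(column, p, q):
    """Per-slot probability that each row is the user's goal.

    column: slot value of every entity (MISSING where unknown)
    p: dict mapping slot value -> belief that the user wants it
    q: probability that the user knows this slot's value
    """
    n = len(column)
    n_missing = sum(1 for v in column if v is MISSING)
    counts = {}
    for v in column:
        if v is not MISSING:
            counts[v] = counts.get(v, 0) + 1
    result = []
    for v in column:
        if v is MISSING:
            known = 1.0 / n
        else:
            known = p.get(v, 0.0) / counts[v] * (1.0 - n_missing / n)
        # Mix the "user knows the slot" case with the uniform "does not know" case.
        result.append(q * known + (1.0 - q) / n)
    return result


def kb_posterior(columns, beliefs):
    """Multiply per-slot posteriors (independence assumption) and normalize."""
    n = len(next(iter(columns.values())))
    scores = [1.0] * n
    for slot, column in columns.items():
        p, q = beliefs[slot]
        for i, pr in enumerate(slot_posterior(column, p, q)):
            scores[i] *= pr
    total = sum(scores)
    return [s / total for s in scores]


columns = {"Slot1": ["x1", "x2", MISSING]}          # entities A, B, C
beliefs = {"Slot1": ({"x1": 0.3, "x2": 0.7}, 0.8)}  # (multinomial, knows-prob)
posterior = kb_posterior(columns, beliefs)
print(posterior)
```

Under these assumptions the posterior over A, B, C comes out at roughly 0.23, 0.44 and 0.33. Every step is arithmetic on probabilities, so the same computation written in an autodiff framework would let gradients flow back into the beliefs.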
Evaluation – Three Questions
Does Soft-KB lookup lead to better dialog policies?
Does Reinforcement Learning improve over Rule-based approach?
Does End-to-end learning lead to higher rewards?
KB-InfoBot Versions
Belief Trackers:
A. Hand-Crafted (Bayesian updates)
B. Neural (GRU)
Policy Network:
C. Hand-Crafted (Entropy Minimization)
D. Neural (GRU)
KB-lookup:
1. No KB lookup (Policy unaware of KB)
2. Hard-KB lookup (SQL type lookup)
3. Soft-KB lookup (KB Posterior)
Rule-Based Agents: A + C + (1, 2, 3)
RL-Based Agents: A + D + (1, 2, 3)
E2E Agent: B + D + (3)
Training
• All agents are trained against a publicly available user simulator (Li et al, 2017)*
• Optimize future discounted rewards: [equation not recovered]
• RL agent: [equation not recovered; labeled term: Policy]
• E2E agent: [equation not recovered; labeled terms: KB Posterior, Policy]
• Credit assignment:
– E2E agent always fails with random initialization
– Imitation learning at beginning to mimic rule-based policy
* https://github.com/MiuLab/TC-Bot
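The "future discounted rewards" objective can be illustrated with a short sketch that computes the return at every turn of one dialogue, i.e. the sum of later rewards weighted by powers of a discount factor gamma. The reward sequence and the value of gamma are hypothetical; the slides do not state them.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every turn t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical three-turn dialogue that only pays off at the final turn.
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]
```

A policy-gradient learner would weight each action's log-probability by a return of this kind, which is why a sparse end-of-dialogue reward makes credit assignment hard for a randomly initialized agent.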
Simulation Results
• Evaluated on Movie-Centric KBs – small, medium, large, X-large
• Metrics:
– # of Dialogue Turns (T)
– Success Rate (correct movie returned) (S)
– Average Reward (R)
• All agents tuned to maximize average reward
Soft-KB > Hard-KB > No-KB
RL > Rule-based
E2E performs best
Human Evaluation
• Setting
– Typed interactions
– Given 1) a goal entity 2) subset of slot values
– Multiple values per slot → noise modeling
– Users are free to frame their inputs
Soft-KB lookup > Hard-KB lookup (Success Rate)
RL agent > Rule-based agent (#Turns)
However, the full E2E agent performed worse than the RL-Soft and Rule-Soft agents
Discussion
• Soft-KB lookup
– Better dialogue policies
• E2E agent
– Strong performance in simulations
– Does not transfer to real interactions
– Overfits to the limited natural language from the simulator
• Future research: personalized dialogue assistants?
– Deploy using RL-Soft agent
– Collect interactions to train E2E agent
– Gradually switch to the E2E agent
Thanks for Your
Attention!
Code Available: https://github.com/MiuLab/KB-InfoBot

Contenu connexe

Tendances

An Intelligent Assistant for High-Level Task Understanding
An Intelligent Assistant for High-Level Task UnderstandingAn Intelligent Assistant for High-Level Task Understanding
An Intelligent Assistant for High-Level Task UnderstandingYun-Nung (Vivian) Chen
 
2017 Tutorial - Deep Learning for Dialogue Systems
2017 Tutorial - Deep Learning for Dialogue Systems2017 Tutorial - Deep Learning for Dialogue Systems
2017 Tutorial - Deep Learning for Dialogue SystemsMLReview
 
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...Yun-Nung (Vivian) Chen
 
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...Yun-Nung (Vivian) Chen
 
One Day for Bot 一天搞懂聊天機器人
One Day for Bot 一天搞懂聊天機器人One Day for Bot 一天搞懂聊天機器人
One Day for Bot 一天搞懂聊天機器人Yun-Nung (Vivian) Chen
 
Deep Dialog System Review
Deep Dialog System ReviewDeep Dialog System Review
Deep Dialog System ReviewNguyen Quang
 
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...Yun-Nung (Vivian) Chen
 
Dilek Hakkani-Tur at AI Frontiers: Conversational machines: Deep Learning for...
Dilek Hakkani-Tur at AI Frontiers: Conversational machines: Deep Learning for...Dilek Hakkani-Tur at AI Frontiers: Conversational machines: Deep Learning for...
Dilek Hakkani-Tur at AI Frontiers: Conversational machines: Deep Learning for...AI Frontiers
 
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli..."Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...Yun-Nung (Vivian) Chen
 
Statistical Learning from Dialogues for Intelligent Assistants
Statistical Learning from Dialogues for Intelligent AssistantsStatistical Learning from Dialogues for Intelligent Assistants
Statistical Learning from Dialogues for Intelligent AssistantsYun-Nung (Vivian) Chen
 
Lukasz Kaiser at AI Frontiers: How Deep Learning Quietly Revolutionized NLP
Lukasz Kaiser at AI Frontiers: How Deep Learning Quietly Revolutionized NLPLukasz Kaiser at AI Frontiers: How Deep Learning Quietly Revolutionized NLP
Lukasz Kaiser at AI Frontiers: How Deep Learning Quietly Revolutionized NLPAI Frontiers
 
Omar Tawakol at AI Frontiers: The Rise Of Voice-Activated Assistants In The W...
Omar Tawakol at AI Frontiers: The Rise Of Voice-Activated Assistants In The W...Omar Tawakol at AI Frontiers: The Rise Of Voice-Activated Assistants In The W...
Omar Tawakol at AI Frontiers: The Rise Of Voice-Activated Assistants In The W...AI Frontiers
 
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)AI Frontiers
 
Multiskill Conversational AI
Multiskill Conversational AIMultiskill Conversational AI
Multiskill Conversational AIDaniel Kornev
 
God Mode for designing scenario-driven skills for DeepPavlov Dream
God Mode for designing scenario-driven skills for DeepPavlov DreamGod Mode for designing scenario-driven skills for DeepPavlov Dream
God Mode for designing scenario-driven skills for DeepPavlov DreamDaniel Kornev
 
Managing Dialog Strategy in Multiskill AI Assistant with Discourse Management
Managing Dialog Strategy in Multiskill AI Assistant with Discourse ManagementManaging Dialog Strategy in Multiskill AI Assistant with Discourse Management
Managing Dialog Strategy in Multiskill AI Assistant with Discourse ManagementDaniel Kornev
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingJonathan Mugan
 

Tendances (18)

An Intelligent Assistant for High-Level Task Understanding
An Intelligent Assistant for High-Level Task UnderstandingAn Intelligent Assistant for High-Level Task Understanding
An Intelligent Assistant for High-Level Task Understanding
 
2017 Tutorial - Deep Learning for Dialogue Systems
2017 Tutorial - Deep Learning for Dialogue Systems2017 Tutorial - Deep Learning for Dialogue Systems
2017 Tutorial - Deep Learning for Dialogue Systems
 
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
 
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
 
One Day for Bot 一天搞懂聊天機器人
One Day for Bot 一天搞懂聊天機器人One Day for Bot 一天搞懂聊天機器人
One Day for Bot 一天搞懂聊天機器人
 
Deep Dialog System Review
Deep Dialog System ReviewDeep Dialog System Review
Deep Dialog System Review
 
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
 
Dilek Hakkani-Tur at AI Frontiers: Conversational machines: Deep Learning for...
Dilek Hakkani-Tur at AI Frontiers: Conversational machines: Deep Learning for...Dilek Hakkani-Tur at AI Frontiers: Conversational machines: Deep Learning for...
Dilek Hakkani-Tur at AI Frontiers: Conversational machines: Deep Learning for...
 
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli..."Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
 
Statistical Learning from Dialogues for Intelligent Assistants
Statistical Learning from Dialogues for Intelligent AssistantsStatistical Learning from Dialogues for Intelligent Assistants
Statistical Learning from Dialogues for Intelligent Assistants
 
Lukasz Kaiser at AI Frontiers: How Deep Learning Quietly Revolutionized NLP
Lukasz Kaiser at AI Frontiers: How Deep Learning Quietly Revolutionized NLPLukasz Kaiser at AI Frontiers: How Deep Learning Quietly Revolutionized NLP
Lukasz Kaiser at AI Frontiers: How Deep Learning Quietly Revolutionized NLP
 
Omar Tawakol at AI Frontiers: The Rise Of Voice-Activated Assistants In The W...
Omar Tawakol at AI Frontiers: The Rise Of Voice-Activated Assistants In The W...Omar Tawakol at AI Frontiers: The Rise Of Voice-Activated Assistants In The W...
Omar Tawakol at AI Frontiers: The Rise Of Voice-Activated Assistants In The W...
 
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
 
Multiskill Conversational AI
Multiskill Conversational AIMultiskill Conversational AI
Multiskill Conversational AI
 
God Mode for designing scenario-driven skills for DeepPavlov Dream
God Mode for designing scenario-driven skills for DeepPavlov DreamGod Mode for designing scenario-driven skills for DeepPavlov Dream
God Mode for designing scenario-driven skills for DeepPavlov Dream
 
Managing Dialog Strategy in Multiskill AI Assistant with Discourse Management
Managing Dialog Strategy in Multiskill AI Assistant with Discourse ManagementManaging Dialog Strategy in Multiskill AI Assistant with Discourse Management
Managing Dialog Strategy in Multiskill AI Assistant with Discourse Management
 
Blenderbot
BlenderbotBlenderbot
Blenderbot
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 

Similaire à Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access

Dialogue system②
Dialogue system②Dialogue system②
Dialogue system②Kent T
 
A Brief Note On Image Based Qa On The Video And Audio...
A Brief Note On Image Based Qa On The Video And Audio...A Brief Note On Image Based Qa On The Video And Audio...
A Brief Note On Image Based Qa On The Video And Audio...Lupita Vickrey
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleXavier Amatriain
 
EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013EarthCube
 
acmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptxacmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptxdongchangim30
 
Getting Insight from Big Data
Getting Insight from Big DataGetting Insight from Big Data
Getting Insight from Big DataUjang Fahmi
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Sudeep Das, Ph.D.
 
Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Roelof Pieters
 
Movie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial IntelligenceMovie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial IntelligenceHarivamshi D
 
Kevin-Liao_Prototyping-a-Recommender-System-Step-by-Step_Part-1.pdf
Kevin-Liao_Prototyping-a-Recommender-System-Step-by-Step_Part-1.pdfKevin-Liao_Prototyping-a-Recommender-System-Step-by-Step_Part-1.pdf
Kevin-Liao_Prototyping-a-Recommender-System-Step-by-Step_Part-1.pdfWFYeung
 
Deep learning for e-commerce: current status and future prospects
Deep learning for e-commerce: current status and future prospectsDeep learning for e-commerce: current status and future prospects
Deep learning for e-commerce: current status and future prospectsRakuten Group, Inc.
 
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.com
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.comHABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.com
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.comHABIB FIGA GUYE
 
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...Egyptian Engineers Association
 
Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...
Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...
Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...IRJET Journal
 
mini project2.ppt.pptx
mini project2.ppt.pptxmini project2.ppt.pptx
mini project2.ppt.pptxnaniinanii3
 
The Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxThe Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxPierre Schaus
 
Deep Recommender Systems - PAPIs.io LATAM 2018
Deep Recommender Systems - PAPIs.io LATAM 2018Deep Recommender Systems - PAPIs.io LATAM 2018
Deep Recommender Systems - PAPIs.io LATAM 2018Gabriel Moreira
 

Similaire à Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access (20)

Dialogue system②
Dialogue system②Dialogue system②
Dialogue system②
 
A Brief Note On Image Based Qa On The Video And Audio...
A Brief Note On Image Based Qa On The Video And Audio...A Brief Note On Image Based Qa On The Video And Audio...
A Brief Note On Image Based Qa On The Video And Audio...
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
 
EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013
 
acmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptxacmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptx
 
Getting Insight from Big Data
Getting Insight from Big DataGetting Insight from Big Data
Getting Insight from Big Data
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
 
Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!
 
Movie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial IntelligenceMovie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial Intelligence
 
Kevin-Liao_Prototyping-a-Recommender-System-Step-by-Step_Part-1.pdf
Kevin-Liao_Prototyping-a-Recommender-System-Step-by-Step_Part-1.pdfKevin-Liao_Prototyping-a-Recommender-System-Step-by-Step_Part-1.pdf
Kevin-Liao_Prototyping-a-Recommender-System-Step-by-Step_Part-1.pdf
 
Deep learning for e-commerce: current status and future prospects
Deep learning for e-commerce: current status and future prospectsDeep learning for e-commerce: current status and future prospects
Deep learning for e-commerce: current status and future prospects
 
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.com
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.comHABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.com
HABIB FIGA GUYE {BULE HORA UNIVERSITY}(habibifiga@gmail.com
 
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
 
Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...
Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...
Question Retrieval in Community Question Answering via NON-Negative Matrix Fa...
 
mini project2.ppt.pptx
mini project2.ppt.pptxmini project2.ppt.pptx
mini project2.ppt.pptx
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?
 
Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?
 
The Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxThe Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- Redux
 
Deep Recommender Systems - PAPIs.io LATAM 2018
Deep Recommender Systems - PAPIs.io LATAM 2018Deep Recommender Systems - PAPIs.io LATAM 2018
Deep Recommender Systems - PAPIs.io LATAM 2018
 

Dernier

Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 

Dernier (20)

Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration

Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access

  • 1. Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access. Bhuwan Dhingra (Carnegie Mellon University), Lihong Li (Microsoft Research), Xiujun Li (Microsoft Research), Jianfeng Gao (Microsoft Research), Yun-Nung (Vivian) Chen (National Taiwan University), Faisal Ahmed (Microsoft Research), Li Deng (Citadel)
  • 2. KB-InfoBot: An interactive search engine • Setting – User is looking for a piece of information from one or more tables/KBs – System must iteratively ask for user constraints (“slots”) to retrieve the answer • Interactive search is more natural – Users are used to issuing queries of length less than 5 words (Spink et al., 2001) – Users may not know the structure of the database being queried • Example dialogue – User: “Find me the Bill Murray’s movie.” – KB-InfoBot: “When was it released?” – User: “I think it came out in 1993.” – KB-InfoBot: “Groundhog Day is a Bill Murray movie which came out in 1993.” • Resulting query: Movie=? Actor=Bill Murray; Release Year=1993 • Entity-Centric Knowledge Base: Movie | Actor | Release Year; Groundhog Day | Bill Murray | 1993; Australia | Nicole Kidman | X; Mad Max: Fury Road | X | 2015
  • 3. Goal-Oriented Dialogue System (Young et al., 2013) [Diagram: User Utterance → Natural Language Understanding (NLU) → Acts/Entities → State Tracker/Belief Tracker → Dialogue State → Dialogue Policy → Dialogue Act → Natural Language Generator (NLG) → System Response; the agent sends a Query to the Database / KB and receives Query Results] Query Example: SELECT Movie WHERE Actor==Bill Murray AND Genre==Comedy
  • 4. KB-InfoBot • A simple rule-based approach: – Use heuristics to maintain belief state over slots – Ask for slot with maximum uncertainty, until some “inform” criterion is met • Limitations: – Has no notion of what the user is likely to be looking for or likely to know – Symbolic queries lose notion of uncertainty in upstream modules – Cannot improve online with user feedback
  • 5. KB-InfoBot • Supervised / Reinforcement Learning-based approach – Use neural networks to model LU, Belief Tracker and Policy • Learn user behaviors (e.g. slots likely to be known) • Symbolic queries lose notion of uncertainty in upstream modules • End-to-end and online learning possible, but cannot backprop gradients through symbolic query
  • 6. Network-Based Dialogue System (Wen et al., 2017) [Diagram: User Utterance → Acts/Entities → Dialogue State → Dialogue Act → System Response, trained with Supervised Learning / Reinforcement Learning by backpropagating the Loss / Reward; the Query to the Database / KB (Query Example: SELECT Movie WHERE Actor==Bill Murray AND Genre==Comedy) and its Query Results are marked “Not Differentiable!”] Truly “End-to-end” learning not possible
  • 7. Piecewise Training (Wen et al., 2017) [Diagram: same pipeline with the Database / KB query; part of the system is trained with Supervised Learning / Reinforcement Learning by backpropagating the Loss / Reward, and part is trained separately with Supervised Learning by backpropagating a Loss on Labeled Data] – Labeling expensive – Cannot learn online
  • 8. Our Approach: Soft-KB Lookup via Attention • Replace symbolic query with an attention distribution – Compose slot-wise belief states into one posterior distribution over entire database – The KB structure is encoded in the computation of attention • Uncertainty over database entries propagated to policy network (rule-based + RL) • Differentiable operations allow backpropagation of gradients (RL) • Computationally expensive for large databases
  • 9. Our Approach: Soft-KB Lookup via Attention [Diagram: User Utterance → Acts/Entities → Dialogue State → Dialogue Act → System Response, trained with Supervised Learning / Reinforcement Learning; Soft Attention over the Database / KB yields a Full Distribution over DB] Uncertainty propagated forward; gradients propagated backward (Backprop)
  • 10. Soft-KB lookup • Entity-Centric KB (“?” marks Missing Values): Entity | Slot1 | Slot2; A | x1 | y1; B | x2 | ?; C | ? | y2 • Agent Beliefs: Distribution over slots (or fields) in the KB • KB Posterior: Posterior distribution over entities in the KB
  • 11. State Tracker. For each slot j: 1. A multinomial over slot values (example: Slot Values x1, x2 with Probabilities 0.3, 0.7) 2. A binomial probability of whether the user knows the value of the slot (example: 0.8)
  • 12. KB Posterior • KB: Entity | Slot1 | Slot2; A | x1 | y1; B | x2 | ?; C | ? | y2 • Assumption: Slot values are independently distributed
  • 14. KB Posterior • Examples: KB: Entity | Slot1; A | x1; B | x2; C | ? • Belief over Slot1 values: x1 = 0.3, x2 = 0.7
  • 15. KB-Posterior • Distribution over all entities in the database • Posterior reflects uncertainty in LU + State Tracking • All operations are differentiable – Gradients can pass through during backward pass
  • 16. Evaluation – Three Questions Does Soft-KB lookup lead to better dialog policies? Does Reinforcement Learning improve over Rule-based approach? Does End-to-end learning lead to higher rewards?
  • 17. KB-InfoBot Versions Belief Trackers: A. Hand-Crafted (Bayesian updates) B. Neural (GRU) Policy Network: C. Hand-Crafted (Entropy Minimization) D. Neural (GRU) KB-lookup: 1. No KB lookup (Policy unaware of KB) 2. Hard-KB lookup (SQL type lookup) 3. Soft-KB lookup (KB Posterior) Rule-Based Agents: A + C + (1, 2, 3) RL-Based Agents: A + D + (1, 2, 3) E2E Agent: B + D + (3)
  • 18. Training • All agents are trained against a publicly available user simulator (Li et al., 2017)* • Optimize future discounted rewards: [equation not preserved] • RL agent: [gradient equation not preserved; labeled “Policy”] • E2E agent: [gradient equation not preserved; labeled “KB Posterior” and “Policy”] • Credit assignment: – E2E agent always fails with random initialization – Imitation learning at beginning to mimic rule-based policy * https://github.com/MiuLab/TC-Bot
  • 19. Simulation Results • Evaluated on Movie-Centric KBs – small, medium, large, X-large • Metrics: – # of Dialogue Turns (T) – Success Rate (correct movie returned) (S) – Average Reward (R) • All agents tuned to maximize average reward • Findings: – Soft-KB > Hard-KB > No-KB – RL > Rule-based – E2E performs best
  • 20. Human Evaluation • Setting – Typed interactions – Given 1) a goal entity 2) subset of slot values – multiple values per slot → noise modeling – Users are free to frame their inputs • Findings: – Soft-KB lookup > Hard-KB lookup (Success Rate) – RL agent > Rule-based agent (#Turns) – However, full E2E agent performed worse than RL-Soft and Rule-Soft agents
  • 21. Discussion • Soft-KB lookup – Better dialogue policies • E2E agent – Strong performance in simulations – Does not transfer to real interactions – Overfits to the limited natural language from the simulator • Future research: personalized dialogue assistants? – Deploy using RL-Soft agent – Collect interactions to train E2E agent – Gradually switch to the E2E agent
  • 22. Thanks for Your Attention! Code Available: https://github.com/MiuLab/KB-InfoBot
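
As an illustration of the Soft-KB lookup in slides 10 to 14, here is a minimal Python sketch that turns per-slot beliefs into a posterior over KB entities. The slide equations did not survive in this transcript, so the weighting below is an assumption rather than the authors' exact formula: a slot's multinomial mass is split evenly among entities sharing a value, entities with a missing value receive uniform mass, and the binomial "user knows this slot" probability mixes that with a uniform prior. The names `KB`, `slot_likelihood`, and `kb_posterior` are hypothetical and are not taken from the KB-InfoBot code.

```python
from math import prod

# Toy entity-centric KB from slide 10; None marks a missing value ("?").
KB = {
    "A": {"slot1": "x1", "slot2": "y1"},
    "B": {"slot1": "x2", "slot2": None},
    "C": {"slot1": None, "slot2": "y2"},
}


def slot_likelihood(kb, slot, p_values, q_known):
    """Per-entity weight contributed by one slot (assumed weighting scheme).

    p_values: multinomial over slot values, e.g. {"x1": 0.3, "x2": 0.7}
    q_known:  probability that the user knows the value of this slot
    """
    n = len(kb)
    missing = sum(1 for row in kb.values() if row[slot] is None)

    # Count how many entities share each non-missing value.
    counts = {}
    for row in kb.values():
        value = row[slot]
        if value is not None:
            counts[value] = counts.get(value, 0) + 1

    weights = {}
    for entity, row in kb.items():
        value = row[slot]
        if value is None:
            # Assumption: a missing value gives the entity uniform mass.
            if_known = 1.0 / n
        else:
            # Split the value's probability among entities that share it,
            # scaled by the fraction of entities whose value is present.
            if_known = p_values.get(value, 0.0) / counts[value] * (1.0 - missing / n)
        # Mix with a uniform prior for the case where the user does not know.
        weights[entity] = q_known * if_known + (1.0 - q_known) / n
    return weights


def kb_posterior(kb, beliefs):
    """Combine slots under the independence assumption (slide 12), then normalize."""
    per_slot = [slot_likelihood(kb, slot, p, q) for slot, (p, q) in beliefs.items()]
    unnormalized = {entity: prod(w[entity] for w in per_slot) for entity in kb}
    total = sum(unnormalized.values())
    return {entity: weight / total for entity, weight in unnormalized.items()}


# Belief from slides 11 and 14: x1 = 0.3, x2 = 0.7, user knows slot1 with 0.8.
beliefs = {"slot1": ({"x1": 0.3, "x2": 0.7}, 0.8)}
posterior = kb_posterior(KB, beliefs)
```

Every step is a product, sum, or division, so the same computation written in an autodiff framework would let gradients flow from the policy back to the belief tracker, which is the property slide 15 highlights. Under this sketch, entity B (matching the more probable value x2) ranks highest, and entity C still keeps some mass because its Slot1 value is unknown.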

Editor's notes

  1. There has been interest in semantic parsing of complicated queries using neural models (Neural GenQA), but evidence suggests an interactive setting may be more appropriate.
  2. What is a goal-oriented dialog system? Description of each module: - NLU – extract entities and intents - Tracker – maintain distribution over user goals and information