Rasa Developer Summit - Bing Liu - Interactive Learning of Task-Oriented Dialog Systems

Task-oriented spoken dialog systems are a prominent component of today’s virtual personal assistants (e.g. Alexa, Siri), enabling people to perform everyday tasks by interacting with devices via voice interfaces. Recent advances in deep learning have enabled new research directions for end-to-end dialog modeling. Such data-driven end-to-end learning systems address many limitations of conventional dialog systems. This talk will review research on deep learning and reinforcement learning for neural dialog systems. We will further discuss hybrid dialog learning frameworks that combine offline training and online interactive learning with a human in the loop. The talk will conclude with the challenges and directions in further advancing data-driven conversational AI systems.

Bing Liu is a research scientist at Facebook working on conversational AI. His research interests focus on machine learning for spoken language processing, natural language understanding, and dialog systems. He develops conversational AI systems that learn from both offline annotated samples and online interactions. Bing received his Ph.D. from Carnegie Mellon University in 2018, where he worked on deep learning and reinforcement learning for task-oriented dialog systems. Before joining Facebook, he interned at Google Research, working on end-to-end learning of neural dialog systems.

Published in: Technology

  1. Interactive Learning of Task-Oriented Dialog Systems
     Bing Liu, Research Scientist, Facebook Conversational AI
     Rasa Developer Summit 2019
  2. Interactive Learning of Task-Oriented Dialog Systems
     Bing Liu, Research Scientist, Facebook; PhD, Carnegie Mellon University
  3. Task-Oriented Dialog System
     ❖ Dialog systems
       ➢ Chit-chat bot, QA bot, task-oriented dialog system, ...
     ❖ Get stuff done: assist users in completing specific tasks
       ➢ Personal assistants (e.g. Siri, Alexa, Google Assistant, Hey Portal)
       ➢ Voice command in vehicle and smart home
       ➢ Customer service; sales and marketing
  4. Modular Dialog System Architecture
  5. Task-Oriented Dialog System
     ❖ Highly handcrafted
     ❖ Process interdependent
     ❖ Data driven end-to-end (E2E) systems
       ➢ [Wen et al. 2016]: E2E supervised training neural dialog model
       ➢ [Bordes and Weston, 2017]: E2E model with memory network
       ➢ [Madotto et al., 2018]: Mem2Seq for incorporating knowledge into an E2E system
     ❖ Interactive learning for E2E system with less human supervision
  6. Why Learn through Interactions?
     ❖ Task-oriented dialog as a sequential decision making process over multiple steps
     ❖ State space grows exponentially with number of dialog turns
     ❖ Extremely hard to
       ➢ Design all possible dialog paths
       ➢ Collect a dialog corpus that is large enough to cover all dialog scenarios
     → Continuously learn through the interaction with users and improve over time
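
The exponential growth claimed on slide 6 can be made concrete with a small calculation. This is only an illustrative sketch: it assumes every turn offers the same number of distinct choices, and the function name `count_dialog_paths` is hypothetical, not something from the talk.

```python
def count_dialog_paths(choices_per_turn, num_turns):
    """Number of distinct dialog paths if every turn offers the same number of choices."""
    return choices_per_turn ** num_turns


# With an assumed 10 choices per turn, the path count grows by 10x per extra turn.
for turns in (2, 5, 10):
    print(turns, count_dialog_paths(10, turns))
```

Even under this simplified assumption, enumerating or collecting every path quickly becomes impractical, which is the motivation the slide gives for learning from live interaction.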
  7. How can we learn an end-to-end task-oriented dialog system effectively through interaction with users?
  8. End-to-End Task-Oriented Dialog Modeling
     ❖ Dialog context modeling with hierarchical RNN
     B Liu, et al, "Dialogue Learning with Human Teaching and Feedback in End-To-End Trainable Task-Oriented Dialogue Systems", NAACL 2018.
  9. End-to-End Task-Oriented Dialog Modeling
     End-to-End Modeling of SLU, DST, and Dialog Policy
  10. Supervised Pre-training
      ❖ Supervised model pre-training on dialog corpus with MLE
        ➢ Objective function: linear interpolation of cross-entropy losses for
          ■ Dialog state tracking, i.e. user goal estimation, and
          ■ Dialog policy, i.e. system action prediction
        ➢ Optimization: Stochastic gradient descent, Adam
      [Equation not captured in the transcript; its two terms are labeled "Loss for user goal estimation" and "Loss for system action prediction"]
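
The slide 10 objective, a linear interpolation of cross-entropy losses for state tracking and action prediction, might look roughly like the sketch below for a single dialog turn. The equation itself is missing from the transcript, so everything here is an assumption: the function names, the per-slot dictionary layout, and the weights `slot_weight` and `action_weight` are all hypothetical.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under a predicted distribution."""
    return -math.log(probs[target_index])


def pretraining_loss(slot_probs, slot_targets, action_probs, action_target,
                     slot_weight=1.0, action_weight=1.0):
    """Weighted sum of state-tracking and policy cross-entropy losses for one turn.

    slot_probs and slot_targets are dicts keyed by slot name (assumed layout).
    slot_weight and action_weight are the interpolation weights (assumed values).
    """
    tracking_loss = sum(
        cross_entropy(slot_probs[slot], slot_targets[slot]) for slot in slot_targets
    )
    policy_loss = cross_entropy(action_probs, action_target)
    return slot_weight * tracking_loss + action_weight * policy_loss


# Example turn: one tracked slot with two candidate values, and two system actions.
loss = pretraining_loss(
    slot_probs={"date": [0.5, 0.5]}, slot_targets={"date": 0},
    action_probs=[0.25, 0.75], action_target=0,
)
print(loss)
```

In training, this per-turn loss would be summed over turns and dialogs and minimized with SGD or Adam, as the slide states.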
  11. Learn Interactively from User Feedback
      ❖ Interactive dialog learning with user feedback
      [Diagram labels: Human-Human Dialog Corpora; Supervised Pre-training; Provide feedback for policy optimization]
  12. Learn Interactively from User Feedback
      ❖ Use user feedback as dialog reward
      ❖ Introduce step penalty to encourage shorter dialog for task completion
      ❖ Optimize dialog model end-to-end with policy gradient RL: [equation not captured in the transcript]
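
The reward design on slide 12 can be sketched as follows. The slide's policy gradient equation is missing from the transcript, so this shows a generic REINFORCE-style update and should not be read as the exact formulation from the talk. It assumes the user feedback arrives once, at the end of the dialog; the penalty and discount values and all function names are hypothetical.

```python
import math


def discounted_returns(num_turns, final_feedback, step_penalty=-0.05, discount=0.95):
    """Per-turn returns when each turn costs step_penalty and feedback arrives at the end."""
    if num_turns < 1:
        raise ValueError("a dialog needs at least one turn")
    rewards = [step_penalty] * num_turns
    rewards[-1] += final_feedback
    returns = [0.0] * num_turns
    running = 0.0
    for t in range(num_turns - 1, -1, -1):
        running = rewards[t] + discount * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_logit_gradient(logits, action, turn_return):
    """Gradient of turn_return * log pi(action) with respect to the action logits."""
    probs = softmax(logits)
    return [
        turn_return * ((1.0 if i == action else 0.0) - p)
        for i, p in enumerate(probs)
    ]


print(discounted_returns(num_turns=4, final_feedback=1.0))
```

Because every extra turn adds a negative reward, a shorter successful dialog earns a higher return than a longer one with the same final feedback.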
  13. Learn Interactively from User Feedback
      ❖ Policy optimization with RL can be slow due to sparse reward
      ❖ Dialog state distribution mismatch between offline training and interactive learning leads to compounding errors
      ❖ Agent may learn to recover from a bad state with RL, but the search process can be very inefficient
      → Ask the user for a correction or demonstration when the agent fails at a task, and learn to act from it
  14. Learn Interactively from User Teaching
      ❖ Interactive dialog learning with user teaching
      [Diagram labels: Human-Human Dialog Corpora; Supervised Pre-training; New Dialog (driven by the agent’s own policy); Correct mistakes & demo desired dialog agent behavior; Add to existing corpora]
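
The teaching loop on slide 14 resembles data aggregation in imitation learning: the agent acts under its own policy, the user corrects mistakes, and the corrections join the training corpus. The sketch below is a simplification under stated assumptions. It treats the visited dialog states as given, and the names `collect_teaching_samples`, `agent_policy`, and `user_teacher` are hypothetical.

```python
def collect_teaching_samples(visited_states, agent_policy, user_teacher):
    """Return (state, corrected_action) pairs for states where the user overrode the agent."""
    samples = []
    for state in visited_states:
        agent_action = agent_policy(state)
        taught_action = user_teacher(state, agent_action)
        if taught_action != agent_action:
            samples.append((state, taught_action))
    return samples


def aggregate(corpus, new_samples):
    """Add the taught samples to the existing corpus; a supervised update would follow."""
    return corpus + new_samples


# Toy example: the agent always asks for the date; the user corrects it in one state.
policy = lambda state: "request_date"
teacher = lambda state, action: "confirm_booking" if state == "all_slots_filled" else action
corpus = aggregate([], collect_teaching_samples(["greeting", "all_slots_filled"], policy, teacher))
print(corpus)
```

Because the states come from the agent's own behavior, the added samples cover exactly the situations where the pre-trained model tends to go wrong.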
  15. Evaluation
      ❖ Movie booking domain simulation (M2M)
      Slots: theatre name, movie, date, time, num of people
      SL: Supervised pre-training model
      IL: Imitation learning with user teaching
      RL: Reinforcement learning with user feedback
      Table: Human evaluation results. Mean and standard deviation of crowd worker scores (1-5) [table values not captured in the transcript]
      B Liu, et al, "Dialogue Learning with Human Teaching and Feedback in End-To-End Trainable Task-Oriented Dialogue Systems", NAACL 2018.
  16. What if a user does not provide any feedback? Can we still learn anything from the interaction?
  17. Can we learn a dialog reward function?
      ❖ User feedback serves as reward to RL optimization
      ❖ Task completion based reward requires prior knowledge of user’s goal
        → NOT usually accessible in real world user interactions
      ❖ In practice, user feedback can be inconsistent and is NOT always available
  18. Adversarial Dialog Learning
      ❖ Reward a machine-agent for conducting task-oriented dialog in a way that is indistinguishable from the way human-agents do it.
      [Diagram label: Reward]
      Bing Liu and Ian Lane, "Adversarial Learning of Task-Oriented Neural Dialog Models", in SIGDIAL 2018.
  19. Discriminative Reward Model
      ❖ Input:
        ➢ Sequence of dialog turns
      ❖ Representation:
        ➢ BiLSTM with max-pooling
      ❖ Output:
        ➢ Prob. of a dialog being successfully completed by a human agent
      [Diagram labels: User’s Turn; Agent’s Turn; External Entity Info]
      Bing Liu and Ian Lane, "Adversarial Learning of Task-Oriented Neural Dialog Models", in SIGDIAL 2018.
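
The pooling and output stages of the slide 19 reward model can be illustrated without a deep learning library. This sketch does not implement the BiLSTM: it assumes the per-turn vectors have already been produced by some encoder, and it replaces the output layer with a plain logistic unit. The names `max_pool` and `success_probability` and the weights are hypothetical.

```python
import math


def max_pool(turn_vectors):
    """Element-wise maximum over per-turn vectors, giving one fixed-size dialog vector."""
    return [max(dim_values) for dim_values in zip(*turn_vectors)]


def success_probability(turn_vectors, weights, bias=0.0):
    """Logistic score on the pooled dialog vector (stand-in for the model's output layer)."""
    pooled = max_pool(turn_vectors)
    score = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Two turns, each encoded as a 2-dimensional vector (made-up numbers).
print(max_pool([[1.0, -2.0], [0.0, 3.0]]))
print(success_probability([[1.0, -2.0], [0.0, 3.0]], weights=[0.5, 0.5]))
```

Max-pooling lets dialogs with different numbers of turns map to a vector of the same size, so a single output layer can score any dialog.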
  20. Model Training
      ❖ Supervised pre-training with an initial set of pos & neg samples
        ➢ Pre-train dialog agent G on positive dialog samples with MLE
        ➢ Pre-train discriminative reward function D on pos & neg samples
      ❖ Interactive learning cycle
        ➢ Collect new dialog sample(s) between agent G and users
        ➢ Update dialog agent G with RL using the reward produced by D
        ➢ Update reward function D using the newly collected sample(s)
        ➢ Continue for next learning cycle
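
The interactive learning cycle on slide 20 can be written as a short loop. This is a structural sketch only: the method names `score`, `update_with_rl`, and `update`, and the two stub classes, are hypothetical stand-ins for a real agent G and reward model D, and pre-training is assumed to have happened already.

```python
def run_learning_cycles(agent, reward_model, collect_dialogs, num_cycles):
    """Alternate updates of the agent (G) and the reward model (D)."""
    for _ in range(num_cycles):
        dialogs = collect_dialogs(agent)                    # G converses with users
        rewards = [reward_model.score(d) for d in dialogs]  # D scores each dialog
        agent.update_with_rl(dialogs, rewards)              # RL update of G using D's reward
        reward_model.update(dialogs)                        # refresh D with the new samples


class RecordingAgent:
    """Stub agent that only records the updates it receives."""

    def __init__(self):
        self.updates = []

    def update_with_rl(self, dialogs, rewards):
        self.updates.append((list(dialogs), list(rewards)))


class ConstantRewardModel:
    """Stub reward model that returns a fixed score and remembers what it saw."""

    def __init__(self, value=0.5):
        self.value = value
        self.seen = []

    def score(self, dialog):
        return self.value

    def update(self, dialogs):
        self.seen.extend(dialogs)


agent = RecordingAgent()
reward_model = ConstantRewardModel()
run_learning_cycles(agent, reward_model, lambda a: ["sample dialog"], num_cycles=3)
print(len(agent.updates), len(reward_model.seen))
```

The ordering inside the loop follows the slide: D's current reward is used to update G before D itself is updated on the newly collected dialogs.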
  21. Evaluation
      ❖ Comparing different reward functions
      Bing Liu and Ian Lane, "Adversarial Learning of Task-Oriented Neural Dialog Models", in SIGDIAL 2018.
  22. Summary
      ❖ The multi-turn nature of task-oriented dialogs makes it especially important for a system to learn through interaction with users
      ❖ Learning task-oriented dialog model end-to-end with user teaching and feedback
      ❖ Adversarial dialog learning to address the challenges with missing or inconsistent user feedback with less human supervision
  23. Thanks! Q & A
