Rasa Developer Summit - Josh Converse, Dynamic Offset - Three Part Harmony: How Rasa and Open Source Can Make Your Product Sing

It has been said of the Beatles that the whole (the band) is greater than the sum of the parts (the band members). The same can hold true for open source software. This talk explores combining disparate open source technologies, backed by a Rasa "brain", to yield amazing results, explored through building a phone-based voice receptionist.

WHAT YOU'LL LEARN
The open source ecosystem is a suite of great standalone technologies. Combining them in a product can yield even more amazing results.
Rasa provides the much-needed flexibility for your system to react and adapt to the real world.
Leveraging open source (Rasa included) allows you to spend more time on the most interesting parts of your product.

Josh Converse is the founder of Dynamic Offset, a boutique consulting firm specializing in mobile, web, and conversational experiences. Prior to consulting he held tech lead roles at both Google and Apple.

Published in: Technology

  1. Three Part Harmony: How Rasa and Open Source Can Make Your Product Sing. Josh Converse, Founder, Dynamic Offset. Rasa Developer Summit - 2019
  2. Three Part Harmony: How Rasa and Open Source Can Make Your Product Sing. Josh Converse • Dynamic Offset • hello@dynamicoffset.io
  3. Demo: Wanted to provide a digital phone receptionist that could perform routine tasks on behalf of the business. It needed to:
     ● Have conversations just like a human would
     ● React to “curveballs”
     ● Take action on behalf of the user
     ● Act autonomously
  4. At a conceptual level, what’s going on?
  5. The telephony carrier receives a call from the regular phone network and starts a VoIP call to our system
  6. The VoIP call is answered by our system, and the call audio is streamed to a speech-to-text system for transcription
  7. The speech-to-text system converts the audio stream into text, forwarding that text to the agent for handling
  8. The agent will interpret the text in the context of the overall conversation, ultimately taking action
  9. The spoken text is classified into known, structured data called an intent
  10. The classified intent is evaluated relative to the conversation as a whole (e.g. previous responses, conversational norms, etc.)
  11. The agent decides how to react to the structured intent (e.g. spoken response, database access, etc.)
  12. Having decided what to say and do, the agent provides a textual response
  13. Text from the agent is synthesized into an audio stream using a text-to-speech system, and sent to the phone call
  14. The system takes the audio of the agent’s response and feeds it into the ongoing VoIP call.
  15. The VoIP Provider takes care of bridging the audio between the VoIP call and the regular phone network
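The conceptual flow in the preceding slides (audio in, transcription, intent classification, a decision made in the context of the conversation, speech out) can be sketched as a small pipeline. This is only an illustrative sketch, not the talk's implementation: every function below is a hypothetical stand-in, the "audio" is plain UTF-8 text, and the keyword matching takes the place of trained speech and language models.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Conversation:
    """Minimal conversation state: the intents seen so far on this call."""
    history: List[str] = field(default_factory=list)


def speech_to_text(audio: bytes) -> str:
    # Stand-in for a speech-to-text service: the "audio" here is UTF-8 text,
    # so decoding it plays the role of transcription.
    return audio.decode("utf-8")


def classify_intent(text: str) -> str:
    # Toy keyword classifier standing in for a trained NLU model.
    lowered = text.lower()
    if "appointment" in lowered:
        return "request_time_slot"
    if "my name is" in lowered:
        return "inform_name"
    return "unknown"


def decide_response(intent: str, conversation: Conversation) -> str:
    # The reply depends on the intent and on what came before in the call.
    conversation.history.append(intent)
    if intent == "inform_name":
        return "Thanks. How can I help you today?"
    if intent == "request_time_slot":
        if "inform_name" in conversation.history:
            return "Let me check the calendar for you."
        return "Sure. May I have your name first?"
    return "Sorry, could you repeat that?"


def text_to_speech(text: str) -> bytes:
    # Stand-in for a text-to-speech service.
    return text.encode("utf-8")


def handle_utterance(audio: bytes, conversation: Conversation) -> bytes:
    """One pass through the pipeline: audio in, synthesized reply out."""
    text = speech_to_text(audio)
    intent = classify_intent(text)
    reply = decide_response(intent, conversation)
    return text_to_speech(reply)
```

Calling `handle_utterance` repeatedly with the same `Conversation` object mimics the per-call loop: the same request for an appointment gets a different reply once the caller has given a name, because the decision step looks at the history and not only at the latest intent.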
  16. More Detail!
  17. Twilio receives a call from the regular phone network and starts a VoIP call to our Kamailio server
  18. Kamailio routes the call to an Asterisk server which auto-answers the call. Asterisk taps into the incoming audio stream and sends it off for transcription
  19. Google Cloud Speech-To-Text transcribes the audio and the results are sent to the Rasa agent
  20. The Rasa agent will ultimately handle interpreting the text and taking action based on the current state of the conversation
  21. First, Rasa NLU classifies the raw text into structured intents (e.g. inform_name, request_time_slot)
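To make "structured intent" concrete, here is a toy classifier that turns raw text into a dictionary with an intent name and extracted entities. It is a sketch under stated assumptions: the regular expressions replace a trained model, the `out_of_scope` intent and the `name` entity are invented for illustration, and the result only loosely follows the general shape of an NLU parse result (text, intent, entities).

```python
import re
from typing import Any, Dict, List


def parse(text: str) -> Dict[str, Any]:
    """Classify raw text into an intent plus entities (toy rules, no ML)."""
    entities: List[Dict[str, Any]] = []
    name_match = re.search(r"my name is (\w+)", text, flags=re.IGNORECASE)
    if name_match:
        intent = "inform_name"
        # Record what was extracted and where it sits in the original text.
        entities.append({
            "entity": "name",
            "value": name_match.group(1),
            "start": name_match.start(1),
            "end": name_match.end(1),
        })
    elif re.search(r"\b(appointment|book|schedule)\b", text, flags=re.IGNORECASE):
        intent = "request_time_slot"
    else:
        # Hypothetical fallback intent for anything the rules do not cover.
        intent = "out_of_scope"
    return {"text": text, "intent": {"name": intent}, "entities": entities}
```

Downstream code then only has to branch on `result["intent"]["name"]` and read values from `result["entities"]`, rather than re-inspecting the caller's exact words.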
  22. Then, Rasa Core will evaluate that intent in the context of the entire conversation
  23. Rasa Core will emit one or more actions that need to be performed in response to the conversation. In this example, it’s a query to MongoDB followed by a spoken response
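An action of the kind described in slide 23, a data lookup followed by a spoken response, might look roughly like the sketch below. The assumptions are significant: a plain dictionary stands in for the MongoDB collection, and the function name, slot names, and appointment data are all hypothetical. A real Rasa action would be written against Rasa's own action interface, which is not shown here.

```python
from typing import Dict, List, Tuple

# Hypothetical appointment data; a plain dict stands in for the MongoDB collection.
OPEN_SLOTS: Dict[str, List[str]] = {
    "monday": ["09:00", "14:30"],
    "tuesday": [],
}


def action_find_time_slot(slots: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Look up an open time and return (text to speak, slot updates)."""
    day = slots.get("requested_day", "").lower()
    available = OPEN_SLOTS.get(day, [])
    if not available:
        # Nothing to offer: speak an apology and leave the slots unchanged.
        return f"Sorry, nothing is open on {day or 'that day'}.", {}
    proposed = available[0]
    # Remember the proposal so later turns can confirm or reject it.
    return f"I can offer {proposed} on {day}. Does that work?", {"proposed_time": proposed}
```

Returning the slot updates alongside the reply text reflects the idea that an action can both say something and change the state of the conversation.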
  24. The agent’s textual response is sent to Amazon Polly text-to-speech for synthesis.
  25. Amazon Polly synthesizes the audio and forwards it to Asterisk to be played on the ongoing call
  26. Asterisk injects the audio stream from Amazon Polly into the VoIP call
  27. Twilio bridges the audio from Asterisk back to the regular phone network
  28. This conversation between the customer and the agent continues in a loop for the duration of the call.
  29. Even More Detail!
  30. Golden Age Of Open Source Software
  31. How can you run all this distributed software reliably?
  32. Kubernetes: “Kubernetes provides a container-centric management environment. It orchestrates computing, networking, and storage infrastructure on behalf of user workloads.” https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/
  33. X
  34. Single-process systems can’t do the job and hand-run clusters can be painful.
  35. Kubernetes Manages The Fleet So You Don’t Have To
  36. Distributed Environments: Using Rasa Core As An Orchestrator
  37. Rasa Core: “Rather than a bunch of if/else statements, [your bot] uses a machine learning model trained on example conversations to decide what to do next.” https://rasa.com/docs/core/
  38. Rasa Core - Training: Rasa Core training examples are a “historical record” of a past interaction – a blow-by-blow recounting of a known-good encounter. Three parts:
      ● Stimuli from the user (Responses, Button Clicks, etc.)
      ● Actions taken by the agent
      ● Context (Slots, History, etc.)
  39. Rasa Core - Training: With training, the agent learns which actions to take based on stimuli & context. When presented with something wholly unseen, the agent will “improvise” using the tools (actions) it has available.
  40. Rasa Core - What are actions? Actions are the “abilities” available to your agent.
      ● You write these yourself
      ● Reference them in training data
      ● Can influence the state of the conversation
      The agent may, based on its training, choose to run one or more actions in response to stimuli.
  41. Rasa - Training Sample
        * request_menu{"restaurant": "foo"}
          - action.restaurant_search
          - slot{"found_restaurants": 2}
          - action.request_disambiguation
        * inform_location{"location": "blah"}
          - action.restaurant_search
          - slot{"restaurant_id": "12345"}
          - action.menu_lookup
          - slot{"menu_id": "98765"}
          - action.prompt_menu_send
        * affirm
          - action.send_menu_text
      Read as a conversation:
        * 👩 Asked for menu of restaurant
          - 🤖 Search db for restaurants
          - (Found 2 restaurants)
          - 🤖 Ask user to choose
        * 👩 Responded with their location
          - 🤖 Search db for restaurants
          - (Found a restaurant)
          - 🤖 Look up their menu
          - (Menu lookup success)
          - 🤖 Ask if ok to send menu
        * 👩 Yes it’s ok
          - 🤖 Send menu (SMS/Email)
      So the agent sends out the menu.
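The three parts of a training example (user stimuli, agent actions, context) can be seen by splitting the sample from slide 41 into turns. The parser below is a toy written for this illustration, not Rasa's own story loader. It assumes straight quotes in place of the slide's typographic quotes, that lines starting with `*` are user intents, and that lines starting with `-` are either agent actions or `slot{...}` events.

```python
import json
from typing import Any, Dict, List, Tuple

STORY = """\
* request_menu{"restaurant": "foo"}
  - action.restaurant_search
  - slot{"found_restaurants": 2}
  - action.request_disambiguation
* inform_location{"location": "blah"}
  - action.restaurant_search
  - slot{"restaurant_id": "12345"}
  - action.menu_lookup
  - slot{"menu_id": "98765"}
  - action.prompt_menu_send
* affirm
  - action.send_menu_text
"""


def split_payload(token: str) -> Tuple[str, Dict[str, Any]]:
    """Split 'name{json}' into (name, parsed json); the payload may be absent."""
    name, brace, rest = token.partition("{")
    payload = json.loads(brace + rest) if brace else {}
    return name, payload


def parse_story(story: str) -> List[Dict[str, Any]]:
    """Group a story into turns: one user intent plus the agent's reaction."""
    turns: List[Dict[str, Any]] = []
    for raw in story.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("* "):
            # Stimulus from the user: starts a new turn.
            intent, entities = split_payload(line[2:])
            turns.append({"intent": intent, "entities": entities,
                          "actions": [], "slots": {}})
        elif line.startswith("- "):
            if not turns:
                raise ValueError("agent step appears before any user turn")
            name, payload = split_payload(line[2:])
            if name == "slot":
                # Context: a slot value set as a result of an action.
                turns[-1]["slots"].update(payload)
            else:
                # Action taken by the agent.
                turns[-1]["actions"].append(name)
    return turns
```

`parse_story(STORY)` yields three turns, one per user message, each listing the actions the agent took and the slot values that became part of the context.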
  42. Rasa = Flexibility: With training, you can drive your whole system’s behavior if you have an expressive vocabulary of actions (as opposed to writing imperative code). Rasa can form the “brain” of the system – giving instructions (actions) that the other parts of the system carry out. This is the magic.
  43. Rasa Core = System Flexibility
  44. General:
      ● Building your own Duplex AI agent using Rasa and Twilio
      ● Twine on GitHub (coming soon)
      Kubernetes Resources:
      ● Kubernetes Tutorials
      ● What is Kubernetes?
      ● An old (but good) overview
      ● Google Kubernetes Engine
  45. Appendix
  46. Distributed Rasa Actions
  47. Rasa + Distributed: Actions don’t have to reside on the same host as the Rasa agent. Kubernetes makes this easy to do.
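One way to picture actions living on other hosts is a routing table from action name to a network endpoint. The sketch below is a loose illustration, not Rasa's actual action-server protocol: the service names, port, path, and request body are all hypothetical. It relies only on the fact that a Kubernetes Service is reachable inside the cluster by its DNS name, so the agent needs nothing more than a URL per action.

```python
from typing import Any, Dict

# Hypothetical service names; each would be a Kubernetes Service fronting
# the pods that implement the action.
ACTION_SERVICES: Dict[str, str] = {
    "action.restaurant_search": "http://restaurant-search:8080/run",
    "action.menu_lookup": "http://menu-service:8080/run",
}
# Hypothetical catch-all service for actions without a dedicated host.
DEFAULT_SERVICE = "http://general-actions:8080/run"


def endpoint_for(action_name: str) -> str:
    """Pick the service that should execute the given action."""
    return ACTION_SERVICES.get(action_name, DEFAULT_SERVICE)


def build_request(action_name: str, sender_id: str,
                  slots: Dict[str, Any]) -> Dict[str, Any]:
    """Describe the HTTP call the agent would make (no network I/O here)."""
    return {
        "url": endpoint_for(action_name),
        "body": {
            "action": action_name,
            "sender_id": sender_id,
            "slots": dict(slots),  # copy so the caller's state is not shared
        },
    }
```

Because the agent only deals with action names and URLs, an action can be moved to a different host or scaled out without retraining the agent or changing its training data.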
  48. Attributions: gpu by Phonlaphat Thongsriphong from the Noun Project
