Google Duplex is the technology that gives Google Assistant the ability to make phones calls and sound 'human'. It allows certain users to make a restaurant reservation by phone, but instead of the user speaking directly to the restaurant employee, Google Duplex, with the help of Google Assistant, speaks for the user with an AI-based, but human sounding, voice. Using Google’s Duplex technology, Assistant will place a phone call to your chosen restaurant, have a voice conversation with the employee at the other end, and send you a confirmation that the reservation was successful and is set.
2. Contents
● Introduction
● Abstract
● Context about Google Duplex
● Architecture
● DNNs and RNNs
● Closed domains and Vanishing gradient problem
● Process Flow
3. Introduction
A long-standing goal of human-computer interaction has
been to enable people to have a natural conversation with
computers, as they would with each other. In recent years,
we have witnessed a revolution in the ability of computers to
understand and to generate natural speech, especially with
the application of deep neural networks (e.g., Google voice
search, WaveNet).
4. Abstract
Google Duplex, It is a new technology for conducting
natural conversations to carry out “real world” tasks over the
phone. The technology is directed towards completing
specific tasks, such as scheduling certain types of
appointments. For such tasks, the system makes the
conversational experience as natural as possible, allowing
people to speak normally, like they would to another person,
without having to adapt to a machine.
5. Defining a natural conversation
A natural conversation can be described with the following
characteristics:
● Speaker is exhibiting goal-directed, cooperative, rational
behavior.
● Speaker is using the appropriate tone.
● Speaker can understand and control the conversational
flow and use the right timing.
6. What is Google Duplex?
● Google Duplex is an artificial intelligence (AI) chat agent
that can carry out specific verbal tasks, such as making a
reservation or appointment, over the phone.
● It works to conduct natural conversations to
accomplish certain types of tasks.
7. Closed domain operation
Google Duplex is not able to carry out random casual
conversation. Rather, it was trained to autonomously handle
three specific types of tasks:
● Scheduling a hair salon appointment,
● Making a restaurant reservation, and
● Asking about the business hours of a store.
8. How does Google Duplex model natural
conversations?
● Duplex uses a deep neural network (DNN); in more
complex cases, it makes use of a recurrent neural
network (RNN) which is more expensive, but better at
modeling language.
● At the core of Duplex is a recurrent neural network (RNN)
designed to cope with these challenges, built using
TensorFlow Extended (TFX).
9. Architecture
Incoming sound is processed through an Automatic Speech Recognition (ASR) system.
This produces text that is analyzed with context data and other inputs to produce a
response text that is read aloud through the Text-to-Speech (TTS) system.
10. Deep Neural Networks (DNNs)
● DNNs involve an input layer, a hidden layer (the matrix
of weights which is trained against data), and an
output layer capable of producing what can be
interpreted as a prediction or a classification.
11. Recurrent Neural Networks (RNNs)
RNNs not only ingest the current
input, they also ingest their past
hidden state as well. This allows
for them to learn sequential
patterns.
“Rolled up” RNN
“Unrolled” RNN
12. DNNs versus RNNs
● DNNs are good at one-shot prediction—if a single
observation is all it takes to produce suitable output.
● However, oftentimes, data comes in sequences, esp. for
a language it arrives in a specific sequence. It’s for this
reason that RNNs are used.
● Since it is very important to remember the context when
conducting a longer human-like conversation, RNNs
became one of the obvious, go-to choices to do the job.
13. Why closed domain operation is important?
● Closed domains are loosely defined as any setting that
has a limited number of conceivable interactions.
● Any closed domain has a sort of closed (and well-worn)
number of conversational paths and options.
● When a domain is closed, conversations are
pigeonholed—the same sorts of conversations occur over
and over, building up a stronger dataset for harder-to-
reach features such as natural timing, knowing
industry/trade slang, and so on.
14. Advantages of closed domain operation
● It has a number of advantages, but a major one is that it
helps Duplex avoid the “vanishing gradient problem,”
which is an issue for many DNNs and RNNs alike.
● It increases the sample size for particular conversational
paths in Duplex’s training data.
15. Vanishing Gradient Problem
● When many hidden layers are stacked such as in a multi-
layer DNN or between time steps in an RNN, the network
begins to “forget” the past.
● As the network goes through multiple layers of words, the
original context gets lost, so it fails to capture the
relationship between the words that stand far apart in a
conversation.
● This happens due to the underlying mechanics of
backpropagation.
16. Illustration of vanishing gradients
● Given a closed domain, the
number of times one has
to look into the past is
constrained.
● Vanishing gradients aren’t
as much of an issue if you
don’t need to remember
much.
17. Understanding Nuances
● When many hidden layers are stacked such as in a
multi-layer DNN or between time steps in an RNN, the
network begins to “forget” the past.
● In the above example, we can see how the meaning of
“OK for 4” changes in different contexts.
19. Conclusion
Allowing people to interact with technology as naturally as
they interact with each other has been a long standing
promise. Google Duplex takes a step in this direction,
making interaction with technology via natural
conversation a reality in specific scenarios.