CHATBOT: A Generative-Based Approach
- MANISH MISHRA
WHAT IS A CHATBOT?
A chatbot is a program that communicates with us.
A chatbot is a service, powered by rules and sometimes artificial intelligence, that we interact with via a chat interface.
Some chatbots use sophisticated natural language processing systems, but many simpler systems scan for keywords within the input and pull a reply with the most matching keywords, or the most similar wording pattern, from a database.
Today, chatbots are part of virtual assistants such as Google Assistant, and are accessed via many organizations' apps and websites, and on instant messaging platforms such as Facebook Messenger.
WHY DO WE NEED CHATBOTS?
Trends show that users are spending more time on messaging apps.
Chatbots can handle numerous conversations at once without requiring a person on the other end to answer messages by hand.
WHY DO WE NEED CHATBOTS? (CONTINUED)
Apps consume much of a device's memory, so users do not want to install separate apps for separate purposes.
Trends show that over 90% of apps are uninstalled after their first use.
Developing a chatbot takes significantly less time, and a chatbot is also easier to maintain and less expensive than an app.
TAXONOMY OF MODELS
TAXONOMY OF MODELS (CONTINUED)
Retrieval-based models (easier) use a repository of predefined responses and some kind of heuristic to pick an appropriate response based on the input and context.
I. They respond using rule-based expression matching and don't generate any new text.
II. The selection heuristic can be as complex as an ensemble of machine learning classifiers.
III. They just pick a response from a fixed set.
IV. They don't make grammatical mistakes.
V. In an open domain, it is impossible to build a repository of handcrafted responses.
TAXONOMY OF MODELS (CONTINUED)
Generative models (harder) don't rely on predefined responses; they generate new responses from scratch. Generative models are typically based on machine translation techniques, but instead of translating from one language to another, we "translate" from an input to an output (response).
I. A huge amount of data is needed to train the model.
II. On long text, these models make grammatical mistakes.
III. Even in a closed domain, generative models are harder to train than retrieval-based models.
TAXONOMY OF MODELS (CONTINUED)
The encoder data will be the text from one side of the conversation; the decoder data will be the responses.
Tokenize each sentence by chopping it into words and giving every word a token ID, so that lookup is faster; then train the model.
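As a small illustration of the token-ID step just described (a sketch, not the talk's actual preprocessing code; the example sentences are made up):

```python
# Build a word -> token-ID vocabulary from one side of the conversation,
# then encode sentences as lists of integer IDs.
encoder_texts = ["how are you ?", "what is your name ?"]   # hypothetical inputs
decoder_texts = ["i am fine .", "i am a bot ."]            # hypothetical responses

vocab = {}

def encode(sentence):
    ids = []
    for word in sentence.split():      # chop the sentence into words
        if word not in vocab:
            vocab[word] = len(vocab)   # assign the next free token ID
        ids.append(vocab[word])
    return ids

encoder_ids = [encode(s) for s in encoder_texts]
decoder_ids = [encode(s) for s in decoder_texts]
print(vocab)        # e.g. {'how': 0, 'are': 1, 'you': 2, '?': 3, ...}
print(encoder_ids)  # [[0, 1, 2, 3], [4, 5, 6, 7, 3]]
```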
RECURRENT NEURAL NETWORKS - PROMISING IN NLP TASKS
Applications:
1. An RNN language model allows us to score arbitrary sentences based on how likely they are to occur in the real world. This gives us a measure of grammatical and semantic correctness (useful for machine translation).
2. It also allows us to generate new text (language modelling, i.e. the chatbot itself).
IDEA BEHIND RNN
I. To make use of sequential information.
II. In a traditional NN, we assume that all inputs (and outputs) are independent of each other.
III. For NLP tasks this is a bad idea: if you want to predict the next word in a sentence, you had better know which words came before it.
IV. RNNs have a "memory" which captures information about what has been calculated so far.
WORKING PRINCIPLE OF RNN
• x_t is the input at time step t. For example, x_1 could be a one-hot vector corresponding to the second word of a sentence (with zero-based indexing).
• s_t is the hidden state at time step t: s_t = f(U x_t + W s_{t-1}), where f is a nonlinearity such as tanh or ReLU.
• o_t = softmax(V s_t) is the output at step t.
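A minimal numpy sketch of this forward recurrence (the toy dimensions, random parameters, and input token IDs are assumptions for illustration, not part of the original slides):

```python
import numpy as np

np.random.seed(0)
vocab_size, hidden_size = 8, 4          # toy sizes for illustration

U = np.random.randn(hidden_size, vocab_size) * 0.1   # input -> hidden
W = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden -> hidden
V = np.random.randn(vocab_size, hidden_size) * 0.1   # hidden -> output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(x_t, s_prev):
    """One time step: s_t = tanh(U x_t + W s_{t-1}),  o_t = softmax(V s_t)."""
    s_t = np.tanh(U @ x_t + W @ s_prev)
    o_t = softmax(V @ s_t)
    return s_t, o_t

# Run over a toy sequence of one-hot word vectors.
s = np.zeros(hidden_size)
for word_id in [3, 1, 5]:               # hypothetical token IDs
    x = np.zeros(vocab_size)
    x[word_id] = 1.0
    s, o = rnn_step(x, s)
print(o)                                # distribution over the next word
```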
IMPORTANT POINTS ON RNN
I. Unlike a traditional deep neural network, an RNN shares the same parameters (U, V, W above) across all steps. This greatly reduces the total number of parameters we need to learn.
II. In theory RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps.
III. Certain types of RNNs (such as the LSTM, and the GRU, a simplified version of the LSTM) were specifically designed to overcome the vanishing gradient problem (difficulty learning long-term dependencies).
LONG SHORT-TERM MEMORY NETWORKS (LSTM)
LSTMs are explicitly designed to avoid the long-term dependency
problem. Remembering information for long periods of time is
practically their default behavior, not something they struggle to
learn!
THE CORE IDEA BEHIND LSTMS
I. The cell state is kind of like a conveyor belt. It runs straight down the entire
chain, with only some minor linear interactions. It’s very easy for information
to just flow along it unchanged.
II. The LSTM does have the ability to remove or add information to the cell state,
carefully regulated by structures called gates.
III. The sigmoid layer outputs numbers between zero and one, describing how
much of each component should be let through.
STEP-BY-STEP LSTM WALK-THROUGH
• The first step is to decide what information we're going to throw away from the cell state.
This decision is made by a sigmoid layer called the "forget gate layer."
It looks at h(t−1) and x(t), and outputs a number between 0 and 1 for each number in the cell state C(t−1).
STEP-BY-STEP LSTM WALK-THROUGH (CONTINUED)
• The next step is to decide what new information we're going to store in the cell state.
I. A sigmoid layer called the "input gate layer" decides which values we'll update.
II. A tanh layer creates a vector of new candidate values, ~C(t), that could be added to the state. In the next step, we'll combine these two to create an update to the state.
STEP-BY-STEP LSTM WALK-THROUGH (CONTINUED)
• Next we update the old cell state C(t−1) into the new cell state C(t).
I. Multiply the old state by f(t), forgetting the things we decided to forget earlier.
II. Then we add i(t) ∗ ~C(t). These are the new candidate values, scaled by how much we decided to update each state value.
STEP-BY-STEP LSTM WALK-THROUGH (CONTINUED)
• Finally, we need to decide what we're going to output.
1. First, we run a sigmoid layer which decides what parts of the cell state we're going to output.
2. We put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
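Putting the four steps above together, here is a minimal numpy sketch of a single LSTM cell step (toy dimensions, random parameters, and variable names are illustrative assumptions):

```python
import numpy as np

np.random.seed(1)
input_size, hidden_size = 5, 4                 # toy sizes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate; each gate sees [h_{t-1}, x_t] concatenated.
Wf, Wi, Wc, Wo = (np.random.randn(hidden_size, hidden_size + input_size) * 0.1
                  for _ in range(4))

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(Wf @ z)                        # forget gate
    i = sigmoid(Wi @ z)                        # input gate
    C_tilde = np.tanh(Wc @ z)                  # candidate values ~C(t)
    C = f * C_prev + i * C_tilde               # new cell state
    o = sigmoid(Wo @ z)                        # output gate
    h = o * np.tanh(C)                         # new hidden state
    return h, C

h, C = np.zeros(hidden_size), np.zeros(hidden_size)
for x in np.eye(input_size)[:3]:               # three toy one-hot inputs
    h, C = lstm_step(x, h, C)
print(h)
```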
GATED RECURRENT UNITS (GRU)
• A GRU has two gates; an LSTM has three gates.
• GRUs don't possess an internal memory (C(t)) that is separate from the exposed hidden state, and they don't have the output gate that is present in LSTMs.
• The input and forget gates are coupled into an update gate z, and the reset gate r is applied directly to the previous hidden state. Thus, the responsibility of the reset gate in an LSTM is really split up into both r and z.
• We don't apply a second nonlinearity when computing the output.
ADDING A SECOND GRU LAYER
1. Adding a second layer to our network allows our model to capture higher-level interactions.
2. We are likely to see diminishing returns after 2-3 layers, and unless we have a huge amount of data (which we don't), more layers are unlikely to make a big difference and may lead to overfitting.
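A sketch of stacking a second GRU layer, written against the Keras API as an assumed framework (the layer sizes and vocabulary size are arbitrary, not the talk's configuration):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=8000, output_dim=128),   # token IDs -> vectors
    tf.keras.layers.GRU(256, return_sequences=True),  # first GRU layer passes the full sequence up
    tf.keras.layers.GRU(256),                          # second GRU layer captures higher-level interactions
    tf.keras.layers.Dense(8000, activation="softmax"), # distribution over the next token
])

dummy = np.array([[3, 1, 5, 0]])         # a batch of one padded token-ID sequence
print(model(dummy).shape)                # (1, 8000): next-token distribution
```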
GRU VS LSTM
• In many tasks both architectures yield comparable performance, and tuning hyperparameters like layer size is probably more important than picking the ideal architecture.
• GRUs have fewer parameters (U and W are smaller) and thus may train a bit faster or need less data to generalize.
• On the other hand, if you have enough data, the greater expressive power of LSTMs may lead to better results.
PRE-PROCESSING THE DATA
• TOKENIZE TEXT
We want to make predictions on a per-word basis. This means we must tokenize our comments into sentences, and sentences into words.
The sentence "He left!" should become 3 tokens: "He", "left", "!".
• REMOVE INFREQUENT WORDS
Most words in our text will only appear once or twice. It's a good idea to remove these infrequent words, as having a huge vocabulary will make our model slow to train.
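A small sketch of both preprocessing steps; the regex tokenizer and the frequency cutoff are illustrative choices, not necessarily what was used here:

```python
import re
from collections import Counter

corpus = ["He left!", "He left early.", "She stayed."]   # hypothetical comments

# Tokenize: split each sentence into word and punctuation tokens.
tokenized = [re.findall(r"\w+|[^\w\s]", sentence) for sentence in corpus]
# e.g. "He left!" -> ['He', 'left', '!']

# Remove infrequent words: keep only tokens seen at least `min_count` times,
# map everything else to the special UNK symbol.
min_count = 2
counts = Counter(token for sentence in tokenized for token in sentence)
vocab = {token for token, c in counts.items() if c >= min_count}
cleaned = [[tok if tok in vocab else "UNK" for tok in sentence]
           for sentence in tokenized]
print(cleaned)   # [['He', 'left', 'UNK'], ['He', 'left', 'UNK', '.'], ['UNK', 'UNK', '.']]
```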
PRE-PROCESSING THE DATA (CONTINUED)
• PADDING
Before training, we convert the variable-length sequences in the dataset into fixed-length sequences by padding. We use a few special symbols to fill in the sequence:
1. EOS : end of sentence
2. PAD : filler
3. GO : start decoding
4. UNK : unknown; word not in vocabulary
Consider the following query-response pair:
Q : How are you?
A : I am fine.
Assuming that we would like our sentences (queries and responses) to be of fixed length 10, this pair will be converted to (note that the query tokens are reversed before padding):
Q : [ PAD, PAD, PAD, PAD, PAD, PAD, "?", "you", "are", "How" ]
A : [ GO, "I", "am", "fine", ".", EOS, PAD, PAD, PAD, PAD ]
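A sketch of this padding step, assuming the conventions above; the helper names are made up:

```python
PAD, GO, EOS, UNK = "PAD", "GO", "EOS", "UNK"

def pad_query(tokens, length):
    """Reverse the query and left-pad it with PAD up to `length`."""
    tokens = list(reversed(tokens))
    return [PAD] * (length - len(tokens)) + tokens

def pad_response(tokens, length):
    """Wrap the response in GO ... EOS and right-pad it with PAD."""
    tokens = [GO] + tokens + [EOS]
    return tokens + [PAD] * (length - len(tokens))

q = ["How", "are", "you", "?"]
a = ["I", "am", "fine", "."]
print(pad_query(q, 10))     # ['PAD']*6 + ['?', 'you', 'are', 'How']
print(pad_response(a, 10))  # ['GO', 'I', 'am', 'fine', '.', 'EOS'] + ['PAD']*4
```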
PRE-PROCESSING THE DATA (CONTINUED)
• BUCKETING
• If the largest sentence in our dataset is of length 100, we would need to encode all our sentences to length 100 in order not to lose any words. Now, what happens to "How are you?"? There would be 97 PAD symbols in the encoded version of the sentence, which would overshadow the actual information in the sentence.
• Bucketing largely solves this problem by putting sentences into buckets of different sizes. Consider this list of buckets: [ (5,10), (10,15), (20,25), (40,50) ].
• If the length of a query is 4 and the length of its response is 4 (as in our previous example), we put this pair in the bucket (5,10): the query will be padded to length 5 and the response will be padded to length 10.
• Using the bucket (5,10), our sentences will be encoded as:
Q : [ PAD, "?", "you", "are", "How" ]
A : [ GO, "I", "am", "fine", ".", EOS, PAD, PAD, PAD, PAD ]
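A sketch of how a query-response pair could be assigned to the smallest bucket that fits it (illustrative only):

```python
buckets = [(5, 10), (10, 15), (20, 25), (40, 50)]

def choose_bucket(query_len, response_len):
    """Pick the smallest bucket that fits the query and the response (+ GO/EOS)."""
    for q_size, a_size in buckets:
        if query_len <= q_size and response_len + 2 <= a_size:   # +2 for GO and EOS
            return (q_size, a_size)
    return buckets[-1]    # fall back to the largest bucket (longer pairs would be truncated)

print(choose_bucket(4, 4))    # (5, 10)  -> "How are you ?" / "I am fine ."
print(choose_bucket(8, 12))   # (10, 15)
```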
WORD EMBEDDING
• CO-OCCURRENCE MATRIX
Since deep learning loves math, we're going to represent each word as a d-dimensional vector.
Here there are 6 distinct words, so each word will be a 6-dimensional vector.
WORD EMBEDDING (CONTINUED)
Extracting the rows from this matrix gives us a simple initialization of our word vectors.
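A sketch of building such a co-occurrence matrix with window size 1; the example sentence is an assumption chosen to match the discussion on the next slide:

```python
import numpy as np

sentence = "I love NLP and I like dogs".split()   # assumed example, 6 distinct words
words = sorted(set(sentence))
index = {w: i for i, w in enumerate(words)}

# Count co-occurrences of adjacent words (window size 1) in a 6 x 6 matrix.
cooc = np.zeros((len(words), len(words)), dtype=int)
for left, right in zip(sentence, sentence[1:]):
    cooc[index[left], index[right]] += 1
    cooc[index[right], index[left]] += 1

print(words)
print(cooc)
# Row index[w] is a simple 6-dimensional vector initialization for word w.
```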
INFERENCE FROM THE ABOVE EXAMPLE
I. Notice that the words 'love' and 'like' both contain 1's for their counts with nouns (NLP and dogs).
II. They also have 1's for their counts with "I", indicating that these words must be some sort of verb.
III. With a dataset larger than just one sentence, it can be imagined that this similarity will become clearer as 'like', 'love', and other synonyms begin to have similar word vectors, because they are used in similar contexts.
LIMITATION
I. The dimensionality of each word vector increases linearly with the size of the corpus vocabulary.
II. If we had a million words (not really a lot by NLP standards), we'd have a million-by-million matrix which would be extremely sparse (lots of 0's). That is definitely not the best in terms of storage efficiency. The Word2Vec approach described next is the alternative.
WORD2VEC APPROACH
• Word2Vec operates on the idea that we want to predict the surrounding words of every word.
• We're going to look at the first 3 words of this sentence; window size m = 3.
• The goal is to take the center word, 'love', and predict the words that come before and after it, by maximizing the log probability of every context word given the current center word.
• The objective maximizes a sum of log probabilities over the window, as written below.
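A sketch of the standard skip-gram objective and its softmax, in standard notation (corpus length T, window size m, center-word vector v_c, and outer-word vectors u_o and u_w are assumed notation):

```latex
J(\theta) \;=\; \frac{1}{T}\sum_{t=1}^{T}\;\sum_{\substack{-m \,\le\, j \,\le\, m \\ j \neq 0}} \log p\!\left(w_{t+j} \mid w_t\right),
\qquad
p(o \mid c) \;=\; \frac{\exp\!\left(u_o^{\top} v_c\right)}{\sum_{w=1}^{V} \exp\!\left(u_w^{\top} v_c\right)}
```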
The above cost function is basically saying that we’re going to add the log
probabilities of ‘I’ and ‘love’ as well as ‘NLP’ and ‘love’ (where ‘love’ is the center
word in both cases).
WORD2VEC APPROACH (CONTINUED)
v_c is the word vector of the center word. Every word has two vector representations, one for when it is used as the center word (v) and one for when it is used as an outer/context word (u). The vectors are trained with stochastic gradient descent.
Word2Vec seeks to find vector representations of different words by maximizing the log probability of context words given a center word, modifying the vectors through SGD.
The most interesting contribution of Word2Vec was the appearance of linear relationships between different word vectors.
After training, the word vectors seemed to capture different grammatical and semantic concepts.
It's pretty incredible that these linear relationships can emerge from such a simple training objective and optimization technique.
ALGORITHM OF WORD2VEC
• Two algorithms:
1. Skip-gram (SG): predict context words given the target word (position independent).
2. Continuous Bag of Words (CBOW): predict the target word from a bag-of-words context.
• Two (moderately efficient) training methods:
1. Hierarchical softmax
2. Negative sampling
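As a sketch of these choices in practice, using the gensim library as an assumed tool (not something the slides specify; parameter names follow recent gensim releases):

```python
from gensim.models import Word2Vec

# A tiny toy corpus of tokenized sentences (hypothetical).
corpus = [
    ["i", "love", "nlp", "and", "i", "like", "dogs"],
    ["chatbots", "love", "talking", "to", "people"],
]

# sg=1 selects skip-gram (sg=0 would be CBOW);
# negative=5 selects negative sampling (hs=1 with negative=0 would select hierarchical softmax).
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1,
                 sg=1, negative=5, epochs=50)

print(model.wv["love"][:5])            # first few dimensions of the vector for "love"
print(model.wv.most_similar("love"))   # nearest neighbours in the learned space
```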
SKIP-GRAM PREDICTION
TO TRAIN THE MODEL: COMPUTE ALL VECTOR GRADIENTS!
• We often define the set of all parameters in a model in terms of one long vector, theta.
• Then we optimize these parameters using gradient descent.
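As a sketch, the gradient-descent update referred to here is the standard one (α is the learning rate; notation assumed):

```latex
\theta^{\text{new}} \;=\; \theta^{\text{old}} \;-\; \alpha \,\nabla_{\theta}\, J(\theta)
```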
SEQUENCE TO SEQUENCE MODEL FOR CHATBOT
• The sequence-to-sequence model has become the go-to model for dialogue systems and machine translation.
• It consists of two RNNs (recurrent neural networks, either LSTM or GRU):
I. An encoder
II. A decoder
Encoder
• The encoder takes a sequence (sentence) as input and processes one symbol (word) at each time step.
• Its objective is to convert the sequence of symbols into a fixed-size feature vector that encodes only the important information in the sequence while losing the unnecessary information.
• You can visualize the data flow in the encoder along the time axis as the flow of local information from one end of the sequence to the other.
SEQUENCE TO SEQUENCE MODEL FOR CHATBOT (CONTINUED)
• Each hidden state influences the next hidden state, and the final hidden state can be seen as a summary of the sequence. This state is called the context or thought vector, as it represents the intention of the sequence.
• From the context, the decoder generates another sequence, one symbol (word) at a time. At each time step, the decoder is influenced by the context and the previously generated symbols.
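A minimal numpy sketch of this encoder/decoder flow, using plain RNN cells and greedy decoding; all sizes, parameters, and token IDs are illustrative assumptions, not the talk's actual model:

```python
import numpy as np

np.random.seed(3)
V, H = 10, 8                      # toy vocab size and hidden size
GO, EOS = 0, 1                    # assumed special token IDs

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

E  = np.random.randn(V, H) * 0.1  # shared word embeddings
We = np.random.randn(H, H) * 0.1  # encoder recurrence
Wd = np.random.randn(H, H) * 0.1  # decoder recurrence
Wo = np.random.randn(V, H) * 0.1  # decoder hidden -> vocab logits

def encode(token_ids):
    """Run the encoder over the input; the final hidden state is the thought vector."""
    h = np.zeros(H)
    for t in token_ids:
        h = np.tanh(E[t] + We @ h)
    return h

def decode(context, max_len=10):
    """Greedily generate tokens from the context until EOS (or max_len)."""
    h, token, output = context, GO, []
    for _ in range(max_len):
        h = np.tanh(E[token] + Wd @ h)         # influenced by the context (via h) and the previous token
        token = int(np.argmax(softmax(Wo @ h)))
        if token == EOS:
            break
        output.append(token)
    return output

thought_vector = encode([4, 7, 2, 9])          # hypothetical query token IDs
print(decode(thought_vector))                  # hypothetical generated response token IDs
```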
DUAL ENCODER LSTM ALGORITHM FOR SEQ2SEQ
1. Both the context and the response text are split into words, and each word is embedded into a vector. The word embeddings are initialized with Word2Vec skip-gram vectors and are fine-tuned during training.
2. Both the embedded context and the embedded response are fed into the same recurrent neural network word by word. The RNN generates a vector representation that, loosely speaking, captures the "meaning" of the context and of the response (call them c and r). We can choose how large these vectors should be; let's say we pick 256 dimensions.
3. We multiply c with a matrix M to "predict" a response r'. If c is a 256-dimensional vector, then M is a 256×256 matrix, and the result is another 256-dimensional vector, which we can interpret as a generated response. The matrix M is learned during training.
DUAL ENCODER LSTM ALGORITHM FOR SEQ2SEQ (CONTINUED)
• We measure the similarity of the predicted response r' and the actual response r by taking the dot product of these two vectors.
• A large dot product means the vectors are similar and the response should receive a high score.
• We then apply a sigmoid function to convert that score into a probability.
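A sketch of that scoring step in numpy, with random vectors standing in for the RNN outputs c and r (in reality M and the vectors are learned during training):

```python
import numpy as np

np.random.seed(4)
dim = 256
c = np.random.randn(dim)               # context vector from the RNN (toy stand-in)
r = np.random.randn(dim)               # candidate response vector from the same RNN
M = np.random.randn(dim, dim) * 0.01   # prediction matrix (random here, learned in training)

r_predicted = M @ c                          # "predicted" response r'
score = r_predicted @ r                      # dot-product similarity between r' and r
probability = 1.0 / (1.0 + np.exp(-score))   # sigmoid -> probability the pair matches
print(probability)
```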
REFERENCES
• https://github.com/Marsan-Ma/chat_corpus (source of data for the trial chatbot)
• http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
• http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/
• http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/
• http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-
• http://karpathy.github.io/2015/05/21/rnn-effectiveness/
REFERENCES (CONTINUED)
• http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• http://cs231n.github.io/optimization-1/
• http://colah.github.io/posts/2015-08-Backprop/
• http://cs231n.github.io/optimization-2/
• http://neuralnetworksanddeeplearning.com/chap2.html
• http://web.stanford.edu/class/cs224n/lectures/cs224n-2017-lecture2.pdf
• http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
• http://sebastianruder.com/word-embeddings-1/
• http://suriyadeepan.github.io/2016-06-28-easy-seq2seq/
• https://www.tensorflow.org/tutorials/seq2seq