Recent advances
in applied chatbot technology
to be presented at AI Ukraine 2016
Jordi Carrera Ventura

NLP scientist

Telefónica Research & Development
Outline
Overview
Industrial state-of-the-art
Current challenges
Recent advances
Conclusions
Overview
Chatbots are a form of conversational artificial intelligence (AI) in which a user
interacts with a virtual agent through natural language messaging in
• a messaging interface like Slack or Facebook Messenger
• or a voice interface like Amazon Echo or Google Assistant.
Chatbot (sense 1)
A bot that lives in a chat (an automation routine inside a UI).
Conversational agent
Virtual assistant¹
A bot that takes natural language as input and returns it as output.
Chatbot (sense 2)
A virtual assistant that lives in a chat.
User> what are you?
¹ Generally assumed to have broader coverage and more advanced AI than chatbots (sense 2).
Usually intended to
• get quick answers to specific questions over some pre-defined
repository of knowledge
• perform transactions in a faster and more natural way.
Can be used to
• surface contextually relevant information
• help a user complete an online transaction
• serve as a helpdesk agent to resolve a customer’s issue without
ever involving a human.
Use cases
Virtual assistants for
customer support, e-commerce, expense management, booking
flights, booking meetings, data science...
Advantages of chatbots
From: https://medium.com/@csengerszab/messenger-chatbots-is-it-a-new-revolution-6f550110c940
Actually,
As an interface, chatbots are not the best.

Nothing beats a spreadsheet for most things. For data
already tabulated, a chatbot may actually slow things down.

A well-designed UI can allow a user to do anything in a few
taps, as opposed to a few whole sentences in natural
language.

But chatbots can generally be regarded as universal: every
channel supports text. Chatbots as a UX are effective as a
protocol, like email.
What's really important...
Success conditions

• Talk like a person.

• The "back-end" MUST NOT BE a person.
• Provide a single interface to access all other services.

• Useful in industrial processes

Response automation given specific, highly-repetitive
linguistic inputs. Not conversations, but ~auto-replies
linked to more refined email filters.

• Knowledge representation.
What's really important...
Conversational technology should really be about

• dynamically finding the best possible way to browse a large
repository of information/actions.

• find the shortest path to any relevant action or piece of
information (to avoid the plane dashboard effect).

• surfacing implicit data in unstructured content ("bocadillo de
calamares in Madrid"). Rather than going open-domain, taking
the closed-domain and going deep into it.
... and the hype
That was IBM Watson. 🙄
... and the hype
Getting started with bots is as simple as using any of a handful of
new bot platforms that aim to make bot creation easy...
... sophisticated bots require an understanding of natural language
processing (NLP) and other areas of artificial intelligence.
The optimists
... and the hype
Artificial intelligence has advanced remarkably in the last three
years, and is reaching the point where computers can have
moderately complex conversations with users in natural language
<perform simple actions in response to primitive linguistic
signals>.
The optimists
... and the hype
Most of the value of deep learning today is in narrow domains where you
can get a lot of data. Here’s one example of something it cannot do: have
a meaningful conversation. There are demos, and if you cherry-pick the
conversation, it looks like it’s having a meaningful conversation, but if you
actually try it yourself, it quickly goes off the rails.
Andrew Ng (Chief Scientist, AI, Baidu)
... and the hype
Many companies start off by outsourcing their conversations to human
workers and promise that they can “automate” it once they’ve collected
enough data.
That’s likely to happen only if they are operating in a pretty narrow
domain – like a chat interface to call an Uber for example.
Anything that’s a bit more open domain (like sales emails) is beyond what
we can currently do.
However, we can also use these systems to assist human workers by
proposing and correcting responses. That’s much more feasible.
Andrew Ng (Chief Scientist, AI, Baidu)
Industrial state-of-the-art
Lack of suitable metrics
Industrial state-of-the-art
Liu, Chia-Wei, et al. "How NOT to evaluate your dialogue system: An empirical study of
unsupervised evaluation metrics for dialogue response generation." arXiv preprint
arXiv:1603.08023 (2016).
By Intento (https://inten.to)
Dataset
SNIPS.ai 2017 NLU Benchmark
https://github.com/snipsco/nlu-benchmark
Example intents
SearchCreativeWork (e.g. Find me the I, Robot television show)
GetWeather (e.g. Is it windy in Boston, MA right now?)
BookRestaurant (e.g. I want to book a highly rated restaurant for me and my boyfriend
tomorrow night)
PlayMusic (e.g. Play the last track from Beyoncé off Spotify)
AddToPlaylist (e.g. Add Diamonds to my roadtrip playlist)
RateBook (e.g. Give 6 stars to Of Mice and Men)
SearchScreeningEvent (e.g. Check the showtimes for Wonder Woman in Paris)
Industrial state-of-the-art
Benchmark
Methodology
English language.
Removed duplicates that differ only in whitespace, quotes,
lettercase, etc.
Resulting dataset parameters
7 intents, 15.6K utterances (~2K per intent)
3-fold 80/20 cross-validation.
Most providers do not offer programmatic interfaces for adding training
data.
Industrial state-of-the-art
Benchmark
Framework                 Since   F1     False positives   Response time (s)   Price
IBM Watson Conversation   2015    99.7   100%              0.35                2.5
API.ai (Google 2016)      2010    99.6   40%               0.28                Free
Microsoft LUIS            2015    99.2   100%              0.21                0.75
Amazon Lex                2016    96.5   82%               0.43                0.75
Recast.ai                 2016    97.0   75%               2.06                N/A
wit.ai (Facebook)         2013    97.4   72%               0.96                Free
SNIPS                     2017    97.5   26%               0.36                Per device
Industrial state-of-the-art
Logarithmic scale!
Current challenges
So, let's be real...
Welcome to Macy's blah-blah... !
hi im a guy looking for leda jackets
We've got what you're looking for!
<image product_id=1 item_id=1>
that's a nice one but, i want to see
different jackets .... how can i do that
So, let's be real...
We've got what you're looking for!
<image product_id=1 item_id=2>
where can i find a pair of shoes that
match this yak etc.
We've got what you're looking for!
<image product_id=1 item_id=3>
pair of shoes
(, you talking dishwasher)
So, let's be real...
Welcome to Macy's blah-blah... !
hi im a guy looking for leda jackets
lexical variants:
synonyms, paraphrases
So, let's be real...
lexical variants:
synonyms, paraphrases
Intent
So, let's be real...
lexical variants:
synonyms, paraphrases
Intent
Entity
So, let's be real...
lexical variants:
synonyms, paraphrases
Intent
This can be normalized,
e.g. regular expressions
Entity
So, let's be real...
lexical variants:
synonyms, paraphrases
Intent
This can also be normalized,
e.g. regular expressions
Entity
So, let's be real...
lexical variants:
synonyms, paraphrases
Intent
This can also be normalized,
e.g. regular expressions
Entity
1-week project on a regular-expression-based filter.
From 89% to 95% F1 with only a 5% positive rate.
7-label domain classification task using a
RandomForest classifier over TFIDF-weighted vectors.
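A minimal sketch of that setup, assuming hypothetical normalization rules and label names: a regular-expression filter normalizing utterances in front of a 7-label domain classifier over TFIDF-weighted vectors with a RandomForest.

import re
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical normalization rules; a real filter would encode the
# domain's recurring typos and paraphrases.
NORMALIZATIONS = [
    (re.compile(r"\binter?net\b"), "internet"),   # e.g. "intenet" -> "internet"
    (re.compile(r"\bcell ?phone\b"), "smartphone"),
]

def normalize(utterance):
    utterance = utterance.lower()
    for pattern, replacement in NORMALIZATIONS:
        utterance = pattern.sub(replacement, utterance)
    return utterance

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=normalize, ngram_range=(1, 2))),
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
# pipeline.fit(train_utterances, train_domains)   # 7 domain labels
# pipeline.predict(["my intenet connection does not work"])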
So, let's be real...
lexical variants:
synonyms, paraphrases
Intent My internet connection does not work.
My smartphone does not work.
My landline does not work.
My router does not work.
My SIM does not work.
My payment method does not work.
My saved movies does not work.
So, let's be real...
lexical variants:
synonyms, paraphrases
Intent My internet connection does not work.
My smartphone does not work.
My landline does not work.
My router does not work.
My SIM does not work.
My payment method does not work.
My saved movies does not work.
ConnectionDebugWizard(*)
TechnicalSupport(Mobile)
TechnicalSupport(Landline)
RouterDebugWizard()
VerifyDeviceSettings
AccountVerificationProcess
TVDebugWizard
So, let's be real...
lexical variants:
synonyms, paraphrases
Intent My internet connection does not work.
My smartphone does not work.
My landline does not work.
My router does not work.
My SIM does not work.
My payment method does not work.
My saved movies does not work.
ConnectionDebugWizard(*)
TechnicalSupport(Mobile)
TechnicalSupport(Landline)
RouterDebugWizard()
VerifyDeviceSettings
AccountVerificationProcess
TVDebugWizard
Since we associate each intent with an action,

the ability to discriminate between actions presupposes
training as many separate intents,

with just as many ambiguities.
So, let's be real...
lexical variants:
synonyms, paraphrases
Intent My internet connection does not work.
My smartphone does not work.
My landline does not work.
My router does not work.
My SIM does not work.
My payment method does not work.
My saved movies does not work.
ConnectionDebugWizard(*)
TechnicalSupport(Mobile)
TechnicalSupport(Landline)
RouterDebugWizard()
VerifyDeviceSettings
AccountVerificationProcess
TVDebugWizard
It also presupposes exponentially increasing
amounts of training data as we add

higher-precision entities to our model

(every entity * every intent that applies)
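For illustration (numbers assumed, reusing the per-intent figure from the SNIPS benchmark above): with 7 intents and 10 entity values that each apply to every intent, the intent-entity paradigm needs 7 × 10 = 70 separately trainable intent-entity combinations; at ~2K utterances per combination, that is ~140K labelled examples, and each new entity value adds another 7 × 2K = 14K.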
So, let's be real...
lexical variants:
synonyms, paraphrases
Intent My internet connection does not work.
My smartphone does not work.
My landline does not work.
My router does not work.
My SIM does not work.
My payment method does not work.
My saved movies does not work.
ConnectionDebugWizard(*)
TechnicalSupport(Mobile)
TechnicalSupport(Landline)
RouterDebugWizard()
VerifyDeviceSettings
AccountVerificationProcess
TVDebugWizard
The entity resolution step (which would have
helped us make the right decision) is nested
downstream in our pipeline.

Any errors from the previous stage

will propagate down.
So, let's be real...
Welcome to Macy's blah-blah... !
hi im a guy looking for leda jackets
We've got what you're looking for!
<image product_id=1 item_id=1>
that's a nice one but, I want to see
different jackets .... how can i do that
lexical variants:
synonyms, paraphrases formal variants
(typos, ASR errors)
multiword
chunking/parsing
morphological
variants
So, let's be real...
Welcome to Macy's blah-blah... !
hi im a guy looking for leda jackets
We've got what you're looking for!
<image product_id=1 item_id=1>
lexical variants:
synonyms, paraphrases formal variants
(typos, ASR errors)
multiword
chunking/parsing
morphological
variants
entity recognition
hey i need to buy a men's red leather
jacket with straps for my friend
that's a nice one but, I want to see
different jackets .... how can i do that
So, let's be real...
where can i find a <jacket with straps>
looking for a <black mens jacket>
do you have anything in <leather>
<present> for a friend
}Our training

dataset
buy <red jacket>
So, let's be real...
where can i buy <a men's red leather
jacket with straps> for my friend
where can i find a <jacket with straps>
looking for a <black mens jacket>
do you have anything in <leather>
<present> for a friend
hi i need <a men's red leather jacket with
straps> that my friend can wear
}Our training

dataset
}Predictions

at runtime
buy <red jacket>
So, let's be real...
where can i buy <a men's red leather
jacket with straps> for my friend
where can i find a <jacket with straps>
looking for a <black mens jacket>
do you have anything in <leather>
<present> for a friend
hi i need <a men's red leather jacket with
straps> that my friend can wear
}Our training

dataset
}Predictions

at runtime
Incomplete training data

will cause

entity segmentation issues

buy <red jacket>
So, let's be real...
buy <red jacket>
where can i find a <jacket with straps>
looking for a <black mens jacket>
do you have anything in <leather>
<present> for a friend
}Our training

dataset
But more importantly, in the intent-entity

paradigm we are usually unable to train two

entity types with the same distribution:

where can i buy a <red leather jacket>   →   { jacket: 0.5, wallet: 0.5 }
where can i buy a <red leather wallet>   →   { jacket: 0.5, wallet: 0.5 }
So, let's be real...
buy <red jacket>
where can i find a <jacket with straps>
looking for a <black mens jacket>
do you have anything in <leather>
<present> for a friend
}Our training

dataset
... and have no way of detecting entities

used in isolation:

pair of shoes
MakePurchase
ReturnPurchase
['START', 3-gram, 'END']
GetWeather
Greeting
Goodbye
that's a nice one but, I want to see
different jackets .... how can i do that
So, let's be real...
Welcome to Macy's blah-blah... !
hi im a guy looking for leda jackets
We've got what you're looking for!
<image product_id=1 item_id=1>
lexical variants:
synonyms, paraphrases formal variants
(typos, ASR errors)
multiword
chunking/parsing
morphological
variants
entity recognition
hey i need to buy a men's red leather
jacket with straps for my friend
searchItemByText("leather jacket")
searchItemByText(
"to buy a men's ... pfff TLDR... my friend"
)
[
{ "name": "leather jacket",
"product_category_id": 12,
"gender_id": 0,
"materials_id": [68, 34, ...],
"features_id": [],
},
...
{ "name": "leather jacket with straps",
"product_category_id": 12,
"gender_id": 1,
"materials_id": [68, 34, ...],
"features_id": [8754],
},
]
that's a nice one but, I want to see
different jackets .... how can i do that
So, let's be real...
Welcome to Macy's blah-blah... !
hi im a guy looking for leda jackets
We've got what you're looking for!
<image product_id=1 item_id=1>
lexical variants:
synonyms, paraphrases formal variants
(typos, ASR errors)
multiword
chunking/parsing
morphological
variants
entity recognition
hey i need to buy a men's red leather
jacket with straps for my friend
searchItemByText("leather jacket")
searchItemByText(
"red leather jacket with straps"
)
[
{ "name": "leather jacket",
"product_category_id": 12,
"gender_id": 0,
"materials_id": [68, 34, ...],
"features_id": [],
},
...
{ "name": "leather jacket with straps",
"product_category_id": 12,
"gender_id": 1,
"materials_id": [68, 34, ...],
"features_id": [8754],
},
]
NO IDs,
NO reasonable expectation of a
cognitively realistic answer and
NO "conversational technology".
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
Straps
Jackets
Leather items
Since the conditions express
an AND logical operator and
we will search for an item
fulfilling all of them, we could
actually search for the terms
in the conjunction in any
order.
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
Given

• any known word, denoted by w,

• the vocabulary V of all known words, such that w_i ∈ V for any i,

• a search space D, where d denotes every document in a collection D, such that d ∈ D and d = {w_i, w_{i+1}, ..., w_n}, with w_x ∈ V for all i ≤ x ≤ n,
• a function filter(D, w_x) that returns a subset of D, D_{w_x}, where w_x ∈ d is true for every d: d ∈ D_{w_x}, and
• query q = {"men's", 'red', 'leather', 'jacket', 'straps'},
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
do filter(filter(filter(filter(

filter(D, 'leather'), 'straps'), "men's"), 'jacket'),'red'
) =

do filter(filter(filter(filter(

filter(D, 'straps'), 'red'), 'jacket'), 'leather'),"men's"
) =

do filter(filter(filter(filter(

filter(D, "men's"), 'leather'), 'red'), 'straps'),'jacket'
) =
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
Given query q = {"men's", 'red', 'leather', 'jacket', 'straps'},

find every x: x ∈ D that satisfies:

containsSubstring(x, 'leather') ∧

containsSubstring(x, 'straps') ∧

...

containsSubstring(x, "men's")
As a result, it will lack the notion of what counts as a partial match.
We cannot use more complex predicates because, in this scenario,
the system has no understanding of the internal structure of the
phrase, its semantic dependencies, or its syntactic dependencies.
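A minimal sketch of this order-insensitive conjunctive matching, with a toy catalogue standing in for D: every query term is an equal, unordered condition, so all four items below tie at 3/5 and the filter cannot rank them.

q = ["men's", "red", "leather", "jacket", "straps"]
CATALOGUE = [
    "leather straps for men's watches",
    "men's vintage red leather shoes",
    "red wallet with leather straps",
    "red leather jackets with hood",
]

def bag_of_words_score(query, item):
    # Count satisfied containsSubstring(item, term) conditions.
    return sum(term in item for term in query)

for item in CATALOGUE:
    print(bag_of_words_score(q, item), "/", len(q), "-", item)
# Every item scores 3 / 5: the partial matches are indistinguishable.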
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
The following partial matches will be

virtually indistinguishable:

• leather straps for men's watches (3/5)

• men's vintage red leather shoes (3/5)

• red wallet with leather straps (3/5)

• red leather jackets with hood (3/5)
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
do filter(filter(filter(filter(

filter(D, "men's"), 'red'), 'leather'), 'jacket'),'straps'
)

Given

• any known word, denoted by w,

• the vocabulary V of all known words, such that w_i ∈ V for any i,

• a search space D, where d denotes every document in a collection D, such that d ∈ D and d = {w_i, w_{i+1}, ..., w_n}, with w_x ∈ V for all i ≤ x ≤ n,
• a function filter(D, w_x) that returns a subset of D, D_{w_x}, where w_x ∈ d is true for every d: d ∈ D_{w_x}, and
• query q = {"men's", 'red', 'leather', 'jacket', 'straps'},
This is our problem:
we're using a function
that treats every w_x equally.
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
do filter(filter(filter(filter(

filter(D, "men's"), 'red'), 'leather'), 'jacket'),'straps'
)

Given

• any known word, denoted by w,

• the vocabulary V of all known words, such that w_i ∈ V for any i,

• a search space D, where d denotes every document in a collection D, such that d ∈ D and d = {w_i, w_{i+1}, ..., w_n}, with w_x ∈ V for all i ≤ x ≤ n,
• a function filter(D, w_x) that returns a subset of D, D_{w_x}, where w_x ∈ d is true for every d: d ∈ D_{w_x}, and
• query q = {"men's", 'red', 'leather', 'jacket', 'straps'},
We need a function with at least one
additional parameter, r_x:
filter(D, w_x, r_x)
where
r_x denotes the depth at which the
dependency tree headed by w_x
attaches to the node being
currently evaluated.
So, let's be real...
Your query
i need to buy a men's red leather jacket with straps for my friend
Tagging
i/FW need/VBP to/TO buy/VB a/DT men/NNS 's/POS red/JJ leather/NN jacket/NN with/IN
straps/NNS for/IN my/PRP$ friend/NN
Parse
(ROOT
(S
(NP (FW i))
(VP (VBP need)
(S
(VP (TO to)
(VP (VB buy)
(NP
(NP (DT a) (NNS men) (POS 's))
(JJ red) (NN leather) (NN jacket))
(PP (IN with)
(NP
(NP (NNS straps))
(PP (IN for)
(NP (PRP$ my) (NN friend)))))))))))
http://nlp.stanford.edu:8080/parser/index.jsp (as of September 19th 2017)
So, let's be real...
Your query
i need to buy a men's red leather jacket with straps for my friend
Tagging
i/FW need/VBP to/TO buy/VB a/DT men/NNS 's/POS red/JJ leather/NN jacket/NN with/IN
straps/NNS for/IN my/PRP$ friend/NN
Parse
(ROOT
(S
(NP (FW i))
(VP (VBP need)
(S
(VP (TO to)
(VP (VB buy)
(NP
(NP (DT a) ((NNS men) (POS 's))
((JJ red) (NN leather)) (NN jacket)))
(PP (IN with)
(NP
(NP (NNS straps))
(PP (IN for)
(NP (PRP$ my) (NN friend)))))))))))
http://nlp.stanford.edu:8080/parser/index.jsp (as of September 19th 2017)
So, let's be real...
Your query
i need to buy a men's red leather jacket with straps for my friend
Tagging
i/FW need/VBP to/TO buy/VB a/DT men/NNS 's/POS red/JJ leather/NN jacket/NN with/IN
straps/NNS for/IN my/PRP$ friend/NN
Parse
(ROOT
(S
(NP (FW i))
(VP (VBP need)
(S
(VP (TO to)
(VP (VB buy)
(NP
(NP (DT a) ((NNS men) (POS 's))
((JJ red) (NN leather)) (NN jacket)))
(PP (IN with)
(NP
(NP (NNS straps))
(PP (IN for)
(NP (PRP$ my) (NN friend)))))))))))
http://nlp.stanford.edu:8080/parser/index.jsp (as of September 19th 2017)
(red, leather)

(leather, jacket)

(jacket, buy)
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
do filter(filter(filter(filter(

filter(D, "men's"), 'red'), 'leather'), 'jacket'),'straps'
)

Given

• any known word, denoted by w,

• the vocabulary V of all known words, such that w_i ∈ V for any i,

• a search space D, where d denotes every document in a collection D, such that d ∈ D and d = {w_i, w_{i+1}, ..., w_n}, with w_x ∈ V for all i ≤ x ≤ n,
• a function filter(D, w_x) that returns a subset of D, D_{w_x}, where w_x ∈ d is true for every d: d ∈ D_{w_x}, and
• query q = {"men's", 'red', 'leather', 'jacket', 'straps'},
The output of this function
will no longer be a subset of
search results but a ranked list.
This list can be naively ranked by
∑_{x=1}^{|t|} (1 / r_x)
although much more advanced
scoring functions can be used.
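A minimal sketch of the depth-weighted variant, with hand-assigned dependency depths standing in for a parser: each query term found in a candidate's tree contributes 1 / r_x, which reproduces the 1.5 vs. 0.83 ranking shown in the next slides.

# Hypothetical head-first dependency depths for two candidate items.
CANDIDATE_DEPTHS = {
    "black leather jacket": {"jacket": 1, "leather": 2, "black": 3},
    "red leather wallet":   {"wallet": 1, "leather": 2, "red": 3},
}

def depth_weighted_score(query_terms, item):
    depths = CANDIDATE_DEPTHS[item]
    # Terms absent from the tree (depth None) contribute nothing.
    return sum(1.0 / depths[t] for t in query_terms if t in depths)

q = ["red", "leather", "jacket"]
for item in CANDIDATE_DEPTHS:
    print(item, round(depth_weighted_score(q, item), 2))
# black leather jacket 1.5   (jacket: 1/1 + leather: 1/2)
# red leather wallet   0.83  (leather: 1/2 + red: 1/3)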
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
Given:
q: "red leather jacket"
a1: "black leather jacket"
a2: "red leather wallet"
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
>>> t1 = dependencyTree(a1)
>>> t1.depth("black")
3
>>> t1.depth("jacket")
1
Given:
q: "red leather jacket"
a1: "black leather jacket"
a2: "red leather wallet"
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
>>> results = filter(D, "jacket", t1.depth("jacket"))
>>> results
[ ( Jacket1, 1.0 ),
( Jacket2, 1.0 ),
...,
( Jacketn, 1.0 )
]
Given:
q: "red leather jacket"
a1: "black leather jacket"
a2: "red leather wallet"
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
>>> results = filter(results, "leather", t1.depth("leather"))
>>> results
[ ( Jacket1, 1.5 ),
( Jacket2, 1.5 ),
...,
( Jacketn - x, 1.5 )
]
Given:
q: "red leather jacket"
a1: "black leather jacket"
a2: "red leather wallet"
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
Given:
q: "red leather jacket"
a1: "black leather jacket"
a2: "red leather wallet"
>>> results = filter(results, "red", t1.depth("red"))
>>> results
[ ( Jacket1, 1.5 ),
( Jacket2, 1.5 ),
...,
( Jacketn - x, 1.5 )
]
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
>>> t2 = dependencyTree(a2)
>>> t2.depth("red")
3
>>> t2.depth("jacket")
None
Given:
q: "red leather jacket"
a1: "black leather jacket"
a2: "red leather wallet"
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
>>> results = filter(D, "jacket", t2.depth("jacket"))
>>> results
[]
Given:
q: "red leather jacket"
a1: "black leather jacket"
a2: "red leather wallet"
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
>>> results = filter(D, "leather", t2.depth("leather"))
>>> results
[ ( Wallet1, 0.5 ),
( Wallet2, 0.5 ),
...,
( Strapn, 0.5 )
]
Given:
q: "red leather jacket"
a1: "black leather jacket"
a2: "red leather wallet"
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
Given:
q: "red leather jacket"
a1: "black leather jacket"
a2: "red leather wallet"
>>> results = filter(results, "red", t2.depth("red"))
>>> results
[ ( Wallet1, 0.83 ),
( Wallet2, 0.83 ),
...,
( Strapn - x, 0.83 )
]
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
Results for a2:
"red leather wallet"
[ ( Wallet1, 0.83 ),
( Wallet2, 0.83 ),
...,
( Strapn - x, 0.83 )
]
Results for a1:
"black leather jacket"
[ ( Jacket1, 1.5 ),
( Jacket2, 1.5 ),
...,
( Jacketn - x, 1.5 )
]
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
However, the dependency tree is not always available, and it
often contains parsing issues (cf. Stanford parser output).
An evaluation of parser robustness over noisy input reports
performances down to an average of 80% across datasets
and parsers.
Hashemi, Homa B., and Rebecca Hwa. "An Evaluation of Parser Robustness for
Ungrammatical Sentences." EMNLP. 2016.
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
In order to make this decision
[ [red leather] jacket ]👍
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
[ [red leather] jacket ]
[ red [leather jacket] ]👍
In order to make this decision
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
[ [red leather] jacket ]
[ red [leather jacket] ]
[ men's [leather jacket] ]👍
In order to make this decision
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
[ [red leather] jacket ]
[ red [leather jacket] ]
[ men's [leather jacket] ]
😱
In order to make this decision
[ [men's leather] jacket ]
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
[ [red leather] jacket ]
[ red [leather jacket] ]
[ men's [leather jacket] ]
[ [men's leather] jacket ]😱
In order to make this decision
... the parser already needs
as much information as we
can provide regarding head-
modifier attachments.
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
[ [red leather] jacket ]     2 2 1
[ red [leather jacket] ]     2 1 1
[ men's [leather jacket] ]   2 1 1
[ [men's leather] jacket ]   2 2 1
In order to make this decision
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
In order to make this decision
Same pattern, different score. How to assign a probability such
that less plausible results are buried at the bottom of the
hypotheses space? (not only for display purposes, but for
dynamic programming reasons)
[ [red leather] jacket ]     2 2 1
[ red [leather jacket] ]     2 1 1
[ men's [leather jacket] ]   2 1 1
[ [men's leather] jacket ]   2 2 1
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
In order to make this decision
p(red | jacket )  ~  p(red | leather )
p(men's | jacket )  >  p(men's | leather )
[ [red leather] jacket ]     2 2 1
[ red [leather jacket] ]     2 1 1
[ men's [leather jacket] ]   2 1 1
[ [men's leather] jacket ]   2 2 1
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
That's a lot of data to model. Where will we get it from?
[ [red leather] jacket ]     2 2 1
[ red [leather jacket] ]     2 1 1
[ men's [leather jacket] ]   2 1 1
[ [men's leather] jacket ]   2 2 1
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
That's a lot of data to model. Where will we get it from?
p(Color | ClothingItem )  ~  p(Color | Material )
p(Person | ClothingItem )  >  p(Person | Material )
[ [red leather] jacket ]     2 2 1
[ red [leather jacket] ]     2 1 1
[ men's [leather jacket] ]   2 1 1
[ [men's leather] jacket ]   2 2 1
So, let's be real...
hey i need to buy a men's red leather
jacket with straps for my friend
If our taxonomy is well built, semantic associations
acquired for a significant number of members of each
class will propagate to new, unseen members of that class.
With a sufficiently rich NER pipeline and a sufficiently large
semantic relation database, we can rely on simple inference
for the vast majority of disambiguations.
More importantly, we can weight candidate hypotheses
dynamically in order to make semantic parsing tractable
over many possible syntactic trees.
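A minimal sketch of this class-level inference, with an assumed supersense lexicon and toy attachment priors: because the priors are stored between classes rather than words, an unseen ClothingItem like "wallet" immediately inherits the same attachment preferences as "jacket".

SUPERSENSE = {"red": "Color", "men's": "Person", "leather": "Material",
              "jacket": "ClothingItem", "wallet": "ClothingItem"}

# Toy class-to-class attachment plausibilities (assumed values).
ATTACH_PRIOR = {("Color", "ClothingItem"): 0.4,  ("Color", "Material"): 0.4,
                ("Person", "ClothingItem"): 0.5, ("Person", "Material"): 0.05}

def attachment_score(modifier, head):
    return ATTACH_PRIOR.get((SUPERSENSE[modifier], SUPERSENSE[head]), 0.0)

# [ men's [leather jacket] ]  vs  [ [men's leather] jacket ]
print(attachment_score("men's", "jacket"))   # 0.5  -> kept
print(attachment_score("men's", "leather"))  # 0.05 -> buried in the hypothesis space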
So, let's be real...
Welcome to Macy's blah-blah... !
hi im a guy looking for leda jackets
We've got what you're looking for!
<image product_id=1 item_id=1>
that's a nice one but, I want to see
different jackets .... how can i do that
lexical variants:
synonyms, paraphrases formal variants
(typos, ASR errors)
multiword
chunking/parsing
morphological
variants
entity recognition
entity linking
hierarchical structure
So, let's be real...
Welcome to Macy's blah-blah... !
hi im a guy looking for leda jackets
We've got what you're looking for!
<image product_id=1 item_id=1>
that's a nice one but, i want to see
different jackets .... how can i do that
So, let's be real...
Welcome to Macy's blah-blah... !
hi im a guy looking for leda jackets
We've got what you're looking for!
<image product_id=1 item_id=1>
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i
LastCommand
So, let's be real...
that      →  LastResult
one       →  jacket
do that   →  LastCommand
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i
LastCommand
So, let's be real...
that  →  LastResult
IF:

p_PoS-tagging(that_DT | * be_VBZ, START *) > p_PoS-tagging(that_IN | * be_VBZ, START *)
that_DT 's_VBZ great_JJ !_PUNCT
that_DT 's_VBZ the_DT last_JJ...
he said that_IN i_PRP...
he said that_IN trump_NNP...
THEN IF:

p_isCoreferent(LastValueReturned | that_DT) > u   # may also need to condition on the dependency: (x, nice)
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i
LastCommand
So, let's be real...
one  →  jacket
Find the LastResult in the context that maximizes:

p_isCoreferent(
    LastResult.type=x |
    one_CD,   # trigger for the function, trivial condition
    QualitativeEvaluation(User, x),
    ContextDistance(LastResult.node)
)
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i
LastCommand
So, let's be real...
one  →  jacket
Particularly important
for decision-tree re-entry
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i
LastCommand
Find the LastResult in the context that maximizes:

p_isCoreferent(
    LastResult.type=x |
    one_CD,   # trigger for the function, trivial condition
    QualitativeEvaluation(User, x),
    ContextDistance(LastResult.node)
)
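A minimal sketch of this maximization (the type priors and the distance decay are assumed): score each entity in the recent context by a coreference prior for its type, decayed by distance, and resolve "one" to the argmax.

CONTEXT = [  # oldest first, most recent last
    {"type": "Command", "name": "BrowseCatalogue"},
    {"type": "ClothingItem", "name": "jacket_1"},
]
TYPE_PRIOR = {"ClothingItem": 0.8, "Command": 0.1}  # p(coreferent | type), toy values

def coref_score(candidate, distance):
    # Decay the type prior with distance in the dialogue context.
    return TYPE_PRIOR.get(candidate["type"], 0.0) / (1 + distance)

def resolve_one(context):
    scored = [(coref_score(c, len(context) - 1 - i), c)
              for i, c in enumerate(context)]
    return max(scored, key=lambda pair: pair[0])[1]

print(resolve_one(CONTEXT)["name"])  # jacket_1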
So, let's be real...
no! forget the last thing i said
go back!
third option in the menu
i want to modify the address I gave you
So, let's be real...
QualitativeEvaluation(User_u, ClothingItem_i)
BrowseCatalogue(User_u, filter=[ClothingItem_{n,j}, j ≤ |J|])
HowTo(User_u, Command_c, c ≤ |Context|)
} A single utterance
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i
LastCommand
So, let's be real...
}A single utterance
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i
LastCommand
QualitativeEvaluation(User_u, ClothingItem_i)
BrowseCatalogue(User_u, filter=[ClothingItem_{n,j}, j ≤ |J|])
HowTo(User_u, Command_c, c ≤ |Context|)
So, let's be real...
I = [ i_1, i_2, i_3 ]

L = [ | i_1 |, | i_2 |, | i_3 | ]

P = [ p( i_1 ), p( i_2 ), p( i_3 ) ]

Candidate selection strategies:
}  random.choice(I)
}  x if x ∈ I ∧ length(x) = max(L)
}  x if x ∈ I ∧ p(x) = max(P)
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i
LastCommand
QualitativeEvaluation(User_u, ClothingItem_i)
BrowseCatalogue(User_u, filter=[ClothingItem_{n,j}, j ≤ |J|])
HowTo(User_u, Command_c, c ≤ |Context|)
So, let's be real...
I = [ i_1, i_2, i_3 ]

L = [ | i_1 |, | i_2 |, | i_3 | ]

P = [ p( i_1 | q ), p( i_2 | q ), p( i_3 | q ) ]

Candidate selection strategies:
}  random.choice(I)
}  x if x ∈ I ∧ length(x) = max(L)
}  x if x ∈ I ∧ p(x | q) = max(P)
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i
LastCommand
However, this approach will still suffer from
segmentation issues due to passing many
more arguments than expected for resolving
any specific intent.
So, let's be real...
GetInfo(User_u, argmax(Command_c, c ≤ |Context|))
SetPreference(User_u, SetRating(ClothingItem_{i,k}, 1))
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i
LastCommand
∀(ClothingItem_{n,m}, n ≤ |N|, m ≤ |M| | n = i, k = *)
}
A standard API

will normally
provide these
definitions as the
declaration of its
methods.
So, let's be real...
GetInfo(User_u, argmax(Command_c, c ≤ |Context|))
SetPreference(User_u, SetRating(ClothingItem_{i,k}, 1))
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i
LastCommand
∀(ClothingItem_{n,m}, n ≤ |N|, m ≤ |M| | n = i, k = *)
}
We can map the
argument signatures
to the nodes of the
taxonomy we are
using for linguistic
inference.
So, let's be real...
GetInfo(User_u, argmax(Command_c, c ≤ |Context|))
SetPreference(User_u, SetRating(ClothingItem_{i,k}, 1))
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i
LastCommand
∀(ClothingItem_{n,m}, n ≤ |N|, m ≤ |M| | n = i, k = *)
}The system will
then be able to link
API calls to entity
parses dynamically.
So, let's be real...
GetInfo(User_u, argmax(Command_c, c ≤ |Context|))
SetPreference(User_u, SetRating(ClothingItem_{i,k}, 1))
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i
LastCommand
∀(ClothingItem_{n,m}, n ≤ |N|, m ≤ |M| | n = i, k = *)
}
Over time, new
methods added to
the API will be
automatically
supported by the
pre-existing
linguistic engine.
So, let's be real...
GetInfo(User_u, argmax(Command_c, c ≤ |Context|))
SetPreference(User_u, SetRating(ClothingItem_{i,k}, 1))
that's a nice one but, i want to see
different jackets .... how can i do that
LastResult is a nice jacket but, i want
to see different jackets .... how can i
LastCommand
∀(ClothingItem_{n,m}, n ≤ |N|, m ≤ |M| | n = i, k = *)
}
Over time, new
methods added to
the API will be
automatically
supported by the
pre-existing
linguistic engine.
😁
So, let's be real...
If the system can reasonably solve nested semantic
dependencies hierarchically,

it can also be extended to handle multi-intent utterances in a
natural way.
So, let's be real...
We need:

• semantic association measures between the leaves of the tree in
order to attach them to the right nodes and be able to stop growing
a tree at the right level,

• a syntactic tree to build the dependencies, either using a parser or a
PCFG/LFG grammar (both may take input from the previous step,
which will provide the notion of verbal valence: Person Browse
Product),

• for tied analyses, an interface with the user to request clarification.
So, let's be real...
• Convert to abstract entity types (see the sketch after this list):
  i need to pay my bill and i cannot log into my account
  i need to PAYMENT my INVOICE and i ISSUE ACCESS ACCOUNT

• Activate applicable dependencies/grammar rules
  (fewer and less ambiguous thanks to the previous step)

• Resolve attachments and rank candidates by semantic association,
  keep top-n hypotheses:
  i need to [ PAYMENT my INVOICE ] and i [ ISSUE [ AccessAccount() ] ]
  i need to [ Payment(x) ] and i [ Issue(y) ]
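A minimal sketch of the first step, with an assumed surface-to-type lexicon; longer matches are replaced first so that multiword entities survive intact.

# Hypothetical entity lexicon.
ENTITY_TYPES = {"pay": "PAYMENT", "bill": "INVOICE",
                "cannot log into": "ISSUE ACCESS", "account": "ACCOUNT"}

def abstract(utterance):
    # Replace longer surface forms first to protect multiword entities.
    for surface, etype in sorted(ENTITY_TYPES.items(), key=lambda kv: -len(kv[0])):
        utterance = utterance.replace(surface, etype)
    return utterance

print(abstract("i need to pay my bill and i cannot log into my account"))
# i need to PAYMENT my INVOICE and i ISSUE ACCESS my ACCOUNT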
So, let's be real...
AND usually joins

ontologically coordinate elements,

e.g. books and magazines versus
*humans and Donald Trump
i need to [ Payment(x) ] and i [ Issue(y) ]
So, let's be real...
Even if

p_0 = p( Payment | Issue ) > u,
we will still split the intents because here we have

p_y = p( Payment | Issue, y ),

and normally p_y < p_0 for any irrelevant case (i.e., the
posterior probability of the candidate attachment will not beat
the baseline of its own prior probability)
i need to [ Payment(x) ] and i [ Issue(y) ]
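For instance, with toy numbers: if u = 0.2 and the prior p_0 = p( Payment | Issue ) = 0.3 > u, the attachment is a priori plausible; but once conditioned on the actual argument y (a login problem), p_y = p( Payment | Issue, y ) = 0.05 < p_0, so Payment does not attach under Issue and the utterance splits into two intents.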
So, let's be real...
I.e., Payment may have been attached to Issue if no other element
had been attached to it before.

Resolution order, as given by attachment probabilities, matters.
i need to [ Payment(x) ] and i [ Issue(y) ]
So, let's be real...
i need to [ Payment(x) ] i [ Issue(y) ]
i need to [ Payment(x) ] and i [ Issue(y) ]
Multi-intents solved 😎
Summary of desiderata
1. Normalization
1. Ability to handle input indeterminacy by activating several
candidate hypotheses

2. Ability to handle synonyms

3. Ability to handle paraphrases

2. NER
1. Transversal (cross-intent)

2. No nesting under classification (feedback loop instead)

3. Contextually unbounded

4. Entity-linked

3. Semantic parsing
1. Taxonomy-driven supersense inference

2. Semantically-informed syntactic trees

3. Ability to handle decision-tree re-entries
Word embeddings, domain
thesaurus, fuzzy matching
Inverted indexes, character-
based neural networks for NER
implicitly solved by Supersenses

+ Neural enquirer
implicitly solved by Neural
enquirer (layered/modular)
Recent advances
3.1 Supersense inference
GFT (Google Fine-Grained) taxonomy
Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods
for Fine Grained Entity Type Classification." ACL (2). 2015.
3.1 Supersense inference
FIGER taxonomy
Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods
for Fine Grained Entity Type Classification." ACL (2). 2015.
3.1 Supersense inference
FB = Freebase
T = OurTaxonomy¹
owem = OurWordEmbeddingsModel

for every document d in D²:
    for every sent s in d:
        sentence_vector = None
        for every (position, term, tag) tuple in TermExtractor³(s):
            if not sentence_vector:
                sentence_vector = vectorizeSentence(s, position)
            owem[ T[tag] ].update( sentence_vector )

¹ FB labels are manually mapped to fine-grained labels.
² D = 133,000 news documents.
³ TermExtractor = parser + chunker + entity resolver that assigns Freebase types to entities.

Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods for Fine Grained Entity Type Classification." ACL (2). 2015.
3.1 Supersense inference
Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods
for Fine Grained Entity Type Classification." ACL (2). 2015.
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
q = "Which city hosted the longest game before the game in Beijing?"
KB = knowledge base consisting of tables that store facts

encoder = bi-directional Recurrent Neural Network

v = encoder(q) # vector of the encoded query

kb_encoder = KB table encoder
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
q = "Which city hosted the longest game before the game in Beijing?"
KB = knowledge base consisting of tables that store facts

encoder = bi-directional Recurrent Neural Network

v = encoder(q) # vector of the encoded query

kb_encoder = KB table encoder
f_n = embedding of the name of the n-th column

w_{mn} = embedding of the value in the n-th column of the m-th row

[x ; y] = vector concatenation

W = weights (what we will e.g. SGD out of the data)

b = bias
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
q = "Which city hosted the longest game before the game in Beijing?"
KB = knowledge base consisting of tables that store facts

encoder = bi-directional Recurrent Neural Network

v = encoder(q) # vector of the encoded query

kb_encoder = KB table encoder
f_n = embedding of the name of the n-th column

w_{mn} = embedding of the value in the n-th column of the m-th row

[x ; y] = vector concatenation

W = weights (what we will e.g. SGD out of the data)

b = bias
normalizeBetween0and1(

penaltiesOrRewards * # the weights that minimize the loss for this column

[ {sydney... australia... kangaroo...} + {city... town... place...} ] +

somePriorExpectations 

)
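A minimal sketch, with assumed shapes and random stand-ins for trained parameters, of the column-attention reading paraphrased above: a column scores higher the more compatible its name embedding is with the encoded query, and a softmax plays the role of normalizeBetween0and1.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
q_vec = rng.normal(size=16)                 # encoder(q)
f = {"city": rng.normal(size=16),           # f_n: column-name embeddings
     "year": rng.normal(size=16),
     "duration": rng.normal(size=16)}
W = rng.normal(size=(16, 16))               # learned weights (random here)

scores = np.array([f_n @ W @ q_vec for f_n in f.values()])
attention = softmax(scores)                 # the normalizeBetween0and1(...) step
print(dict(zip(f, attention.round(2))))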
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
"olympic games near kangaroos" 👍
q = "Which city hosted the longest game before the game in Beijing?"
KB = knowledge base consisting of tables that store facts

encoder = bi-directional Recurrent Neural Network

v = encoder(q) # vector of the encoded query

kb_encoder = KB table encoder
f_n = embedding of the name of the n-th column

w_{mn} = embedding of the value in the n-th column of the m-th row

[x ; y] = vector concatenation

W = weights (what we will e.g. SGD out of the data)

b = bias
normalizeBetween0and1(

penaltiesOrRewards * 

[ {sydney... australia... kangaroo...} + {city... town... place...} ] +

somePriorExpectations 

)
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
"olympic games near kangaroos"
Risk of false positives?
Already as probabilities!!
👍
q = "Which city hosted the longest game before the game in Beijing?"
KB = knowledge base consisting of tables that store facts

encoder = bi-directional Recurrent Neural Network

v = encoder(q) # vector of the encoded query

kb_encoder = KB table encoder
f_n = embedding of the name of the n-th column

w_{mn} = embedding of the value in the n-th column of the m-th row

[x ; y] = vector concatenation

W = weights (what we will e.g. SGD out of the data)

b = bias
normalizeBetween0and1(

penaltiesOrRewards * 

[ {sydney... australia... kangaroo...} + {city... town... place...} ] +

somePriorExpectations 

)
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
Olympic Games
City Year Duration
Barcelona 1992 15
Atlanta 1996 18
Sydney 2000 17
Athens 2004 14
Beijing 2008 16
London 2012 16
Rio de Janeiro 2016 18
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
Which city hosted the 2012 Olympic Games?
Olympic Games
City Year Duration
Barcelona 1992 15
Atlanta 1996 18
Sydney 2000 17
Athens 2004 14
Beijing 2008 16
London 2012 16
Rio de Janeiro 2016 18
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
Which city hosted the 2012 Olympic Games?
Olympic Games
City Year Duration
Barcelona 1992 15
Atlanta 1996 18
Sydney 2000 17
Athens 2004 14
Beijing 2008 16
London 2012 16
Rio de Janeiro 2016 18
Q(i) = the question above
T(i) = the table above
y(i) = the answer (London)

RNN-encoded:
p(vector("column:city") | [ vector("which"), vector("city") ... ]) >
p(vector("column:city") | [ vector("city"), vector("which") ... ])
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
Q(i) = Which city hosted the 2012 Olympic Games?
RNN-encoded:

Executor layer L:
p(vector("column:city") | [ vec("which"), vec("city") ... ]) >
p(vector("column:city") | [ vec("city"), vec("which") ... ])

Executor layer L - 1:
p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ]) >
p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ])

Executor layer L - 2:
p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ]) >
p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ])

As many executor layers as SQL operations.
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
Q(i) = Which city hosted the 2012 Olympic Games?
RNN-encoded:

Executor layer L:
p(vector("column:city") | [ vec("which"), vec("city") ... ]) >
p(vector("column:city") | [ vec("city"), vec("which") ... ])

Executor layer L - 1:
p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ]) >
p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ])

Executor layer L - 2:
p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ]) >
p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ])
Each executor
applies the same
set of weights to all
rows (since it
models a single
operation)
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
Q(i) = Which city hosted the 2012 Olympic Games?
RNN-encoded:

Executor layer L:
p(vector("column:city") | [ vec("which"), vec("city") ... ]) >
p(vector("column:city") | [ vec("city"), vec("which") ... ])

Executor layer L - 1:
p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ]) >
p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ])

Executor layer L - 2:
p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ]) >
p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ])
The logic of max, min
functions is built-in as
column-wise
operations. The system
learns over which
ranges they apply
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
Q(i) = Which city hosted the 2012 Olympic Games?
RNN-encoded:

Executor layer L:
p(vector("column:city") | [ vec("which"), vec("city") ... ]) >
p(vector("column:city") | [ vec("city"), vec("which") ... ])

Executor layer L - 1:
p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ]) >
p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ])

Executor layer L - 2:
p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ]) >
p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ])
The logic of max, min
functions is built-in as
column-wise
operations. The system
learns over which
ranges they apply
(before, after are just
temporal versions of
argmax and argmin)
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
Q(i) = Which city hosted the 2012 Olympic Games?
RNN-encoded:

Executor layer L:
p(vector("column:city") | [ vec("which"), vec("city") ... ]) >
p(vector("column:city") | [ vec("city"), vec("which") ... ])

Executor layer L - 1:
p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ]) >
p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ])

Executor layer L - 2:
p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ]) >
p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ])
Candidate vector at layer L for
row m of our database
Memory layer with the last output
vector generated by executor L - 1
Evaluate a
candidate:
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
Q(i) = Which city hosted the 2012 Olympic Games?
RNN-encoded:

Executor layer L:
p(vector("column:city") | [ vec("which"), vec("city") ... ]) >
p(vector("column:city") | [ vec("city"), vec("which") ... ])

Executor layer L - 1:
p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ]) >
p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ])

Executor layer L - 2:
p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ]) >
p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ])
Row m, R = our table
Set of columns
As an attention mechanism,

cumulatively weight the vector by the output
weights from executor L - 1
Evaluate a
candidate:
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
Q(i) = Which city hosted the 2012 Olympic Games?
RNN-encoded:

Executor layer L:
p(vector("column:city") | [ vec("which"), vec("city") ... ]) >
p(vector("column:city") | [ vec("city"), vec("which") ... ])

Executor layer L - 1:
p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ]) >
p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ])

Executor layer L - 2:
p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ]) >
p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ])
Row m, R = our table
Set of columns
This allows the system to restrict
the reference at each step.
Evaluate a
candidate:
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
Q(i) = Which city hosted the 2012 Olympic Games?
RNN-encoded:

Executor layer L:
p(vector("column:city") | [ vec("which"), vec("city") ... ]) >
p(vector("column:city") | [ vec("city"), vec("which") ... ])

Executor layer L - 1:
p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ]) >
p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ])

Executor layer L - 2:
p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ]) >
p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ])
Row m, R = our table
Set of columns
As we weight the vector in the direction of
each successive operation, it gets closer to
any applicable answer in our DB.
Evaluate a
candidate:
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
SbS (step-by-step training) is N2N (end-to-end training) with bias.
SEMPRE is a toolkit for training semantic parsers, which map
natural language utterances to denotations (answers) via
intermediate logical forms. Here's an example for querying
databases. https://nlp.stanford.edu/software/sempre/
3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with
natural language." arXiv preprint arXiv:1512.00965 (2015).
SbS (step-by-step training) is N2N (end-to-end training) with bias.
SEMPRE is a toolkit for training semantic parsers, which map
natural language utterances to denotations (answers) via
intermediate logical forms. Here's an example for querying
databases. https://nlp.stanford.edu/software/sempre/
😳
3.2 Semantic trees
3.3 Re-entry
Task: given a set of facts or a story, answer questions on those
facts in a logically consistent way.

Problem: theoretically, this could be achieved by a language
model such as a recurrent neural network (RNN).

However, their memory (encoded by hidden states and weights)
is typically too small, and is not compartmentalized enough to
accurately remember facts from the past (knowledge is
compressed into dense vectors).

RNNs are known to have difficulty in memorization tasks, i.e.,
outputting the same input sequence they have just read.
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
Task: given a set of facts or a story, answer questions on those
facts in a logically consistent way.

Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
Jason went to the kitchen.
Jason picked up the milk.
Jason travelled to the office.
Jason left the milk there.
Jason went to the bathroom.
jason go kitchen
jason get milk
jason go office
jason drop milk
jason go bathroom
Where is the milk?  →  milk?
3.3 Re-entry
F = set of facts (e.g. recent context of conversation; sentences)

M = memory

for fact f in F: # any linguistic pre-processing may apply

store f in the next available memory slot of M
Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
At the reading stage...
3.3 Re-entry
F = set of facts (e.g. recent context of conversation; sentences)

M = memory

O = inference module (oracle)

x = input

q = query about our known facts

k = number of candidate facts to be retrieved

m = a supporting fact (typically, the one that maximizes support)
Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
At the inference stage...
3.3 Re-entry
Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
At the inference stage...
jason go kitchen
jason get milk
jason go office
jason drop milk
jason go bathroom
F = set of facts (e.g. recent context of conversation; sentences)

M = memory

O = inference module (oracle)

x = input

q = query about our known facts

k = number of candidate facts to be retrieved

m = a supporting fact (typically, the one that maximizes support)
3.3 Re-entry
Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
At the inference stage, t1...
milk
jason go kitchen
jason get milk
jason go office
jason drop milk
jason go bathroom
Candidate space for k = 2
jason go kitchen
jason get milk
jason go office
jason drop milk
jason go bathroom
argmax over that space
milk, [ ]
3.3 Re-entry
Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
At the inference stage, t2...
milk
jason go kitchen
jason get milk
jason go office
jason drop milk
jason go bathroom
Candidate space for k = 2
jason go kitchen
jason get milk
jason go office
jason drop milk
jason go bathroom
argmax over that space
milk, jason, drop
3.3 Re-entry
Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
At the inference stage, t2...
jason go kitchen
jason get milk
jason go office
jason drop milk
jason go bathroom
jason go kitchen
jason get milk
jason go office
jason drop milk
jason go bathroom
Why not?
jason, get, milk = does not contribute as much
new information as jason go office
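A minimal sketch of this k = 2 inference loop, using plain word overlap as a stand-in for the trained scoring function s (which is why the hops below come out in a slightly different order than in the walkthrough above):

M = ["jason go kitchen", "jason get milk", "jason go office",
     "jason drop milk", "jason go bathroom"]

def s(query_words, fact):
    # Toy support score: word overlap (a learned embedding score in practice).
    return len(set(query_words) & set(fact.split()))

def infer(query, k=2):
    supports, words = [], query.split()
    for _ in range(k):
        best = max((f for f in M if f not in supports),
                   key=lambda f: s(words, f))
        supports.append(best)
        words = words + best.split()  # condition the next hop on what was retrieved
    return supports

print(infer("milk"))  # ['jason get milk', 'jason drop milk']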
3.3 Re-entry
Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
Training is performed with a margin ranking loss and stochastic
gradient descent (SGD). 

Simple model.

Easy to integrate with any pipeline already working with
structured facts (e.g. the Neural Enquirer). Adds a temporal dimension
to it.

Answering the question about the location of the milk requires
comprehension of the actions picked up and left.
3.3 Re-entry
Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory
networks." arXiv preprint arXiv:1410.3916 (2014).
                                                              QACNN   CBT-NE   CBT-CN   Date
Memory networks (baseline)                                    69.4    66.6     63.0
Kadlec et al., "Text understanding with the attention
sum reader network." arXiv:1603.01547 (2016).                 75.4    71.0     68.9     4 Mar 16
Sordoni et al., "Iterative alternating neural attention
for machine reading." arXiv:1606.02245 (2016).                76.1    72.0     71.0     7 Jun 16
Trischler et al., "Natural language comprehension with
the epireader." arXiv:1606.02270 (2016).                      74.0    71.8     70.6     7 Jun 16
Dhingra et al., "Gated-attention readers for text
comprehension." arXiv:1606.01549 (2016).                      77.4    71.9     69.0     5 Jun 16
3.3 Re-entry
Results on MovieQA

[Bar chart] Response accuracy (%) by knowledge source:

                             KB     IE     Wikipedia
Memory Networks              78.5   63.4   69.9
Key-Value Memory Networks    93.9   68.3   76.2

Other reference points shown: No Knowledge (embeddings) 54.4%; Standard QA System on KB 93.5%; 17%.

Source: Antoine Bordes, Facebook AI Research, LXMLS Lisbon July 28, 2016
Conclusions
Summary of desiderata
1. Taxonomy-driven supersense inference
Extend the neuralized architecture's domain by integrating support
for supersenses. Enable inference at training and test time.

2. Semantically-informed syntactic trees (semantic parsing)
Enable semantic parsing using a neuralized architecture with respect
to some knowledge base serving as ground truth.

3. Ability to handle decision-tree re-entries
Use memory networks to effectively keep track of the entities
throughout the conversation and ensure logical consistency (a minimal
sketch follows below). Extend memory networks by integrating support
for supersenses; enable inference at training and test time.
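
As an illustration of the third desideratum (a minimal sketch, not from the talk): if every user turn is stored as a fact in memory, the dialogue manager can re-enter an earlier node of the decision tree and recall values the user already gave instead of asking for them again, with the most recent value winning to preserve logical consistency.

class DialogueMemory:
    def __init__(self):
        self.facts = []                  # (turn, slot, value), oldest first

    def store(self, turn, slot, value):
        self.facts.append((turn, slot, value))

    def recall(self, slot):
        # Most recent mention wins, keeping the dialogue state consistent.
        for turn, s, value in reversed(self.facts):
            if s == slot:
                return value
        return None

mem = DialogueMemory()
mem.store(1, "item", "leather jacket")
mem.store(2, "color", "red")
mem.store(3, "color", "black")       # the user changes their mind
print(mem.recall("color"))           # -> 'black' on re-entering the color node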
Conversational technology should really be about

• dynamically finding the best possible way to browse a large
repository of information/actions.

• finding the shortest path to any relevant action or piece of
information (to avoid the plane dashboard effect).

• surfacing implicit data in unstructured content ("bocadillo de
calamares in Madrid"). Rather than going open-domain, taking
the closed-domain and going deep into it.
Remember?
thanks!

Contenu connexe

Tendances

A brief primer on OpenAI's GPT-3
A brief primer on OpenAI's GPT-3A brief primer on OpenAI's GPT-3
A brief primer on OpenAI's GPT-3Ishan Jain
 
Implications of GPT-3
Implications of GPT-3Implications of GPT-3
Implications of GPT-3Raven Jiang
 
NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk Vijay Ganti
 
From ELIZA to Alexa and Beyond
From ELIZA to Alexa and BeyondFrom ELIZA to Alexa and Beyond
From ELIZA to Alexa and BeyondCharmi Chokshi
 
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...ijnlc
 
An Intelligent Career Counselling Bot A System for Counselling
An Intelligent Career Counselling Bot A System for CounsellingAn Intelligent Career Counselling Bot A System for Counselling
An Intelligent Career Counselling Bot A System for CounsellingIRJET Journal
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language ProcessingMichel Bruley
 
NLP_A Chat-Bot_answering_queries_of_UT-Dallas_Students
NLP_A Chat-Bot_answering_queries_of_UT-Dallas_StudentsNLP_A Chat-Bot_answering_queries_of_UT-Dallas_Students
NLP_A Chat-Bot_answering_queries_of_UT-Dallas_StudentsHimanshu kandwal
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)Yuriy Guts
 
Machine Learning in NLP
Machine Learning in NLPMachine Learning in NLP
Machine Learning in NLPVijay Ganti
 
Gadgets pwn us? A pattern language for CALL
Gadgets pwn us? A pattern language for CALLGadgets pwn us? A pattern language for CALL
Gadgets pwn us? A pattern language for CALLLawrie Hunter
 
Lessons learned from building a commercial bot development platform
Lessons learned from building a commercial bot development platformLessons learned from building a commercial bot development platform
Lessons learned from building a commercial bot development platformJordi Cabot
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
Open ai’s gpt 3 language explained under 5 mins
Open ai’s gpt 3 language explained under 5 minsOpen ai’s gpt 3 language explained under 5 mins
Open ai’s gpt 3 language explained under 5 minsAnshul Nema
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processingMinh Pham
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introductionananth
 
Deep Dialog System Review
Deep Dialog System ReviewDeep Dialog System Review
Deep Dialog System ReviewNguyen Quang
 
UCU NLP Summer Workshops 2017 - Part 2
UCU NLP Summer Workshops 2017 - Part 2UCU NLP Summer Workshops 2017 - Part 2
UCU NLP Summer Workshops 2017 - Part 2Yuriy Guts
 

Tendances (20)

A brief primer on OpenAI's GPT-3
A brief primer on OpenAI's GPT-3A brief primer on OpenAI's GPT-3
A brief primer on OpenAI's GPT-3
 
Implications of GPT-3
Implications of GPT-3Implications of GPT-3
Implications of GPT-3
 
NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk
 
From ELIZA to Alexa and Beyond
From ELIZA to Alexa and BeyondFrom ELIZA to Alexa and Beyond
From ELIZA to Alexa and Beyond
 
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
 
An Intelligent Career Counselling Bot A System for Counselling
An Intelligent Career Counselling Bot A System for CounsellingAn Intelligent Career Counselling Bot A System for Counselling
An Intelligent Career Counselling Bot A System for Counselling
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language Processing
 
NLP_A Chat-Bot_answering_queries_of_UT-Dallas_Students
NLP_A Chat-Bot_answering_queries_of_UT-Dallas_StudentsNLP_A Chat-Bot_answering_queries_of_UT-Dallas_Students
NLP_A Chat-Bot_answering_queries_of_UT-Dallas_Students
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Machine Learning in NLP
Machine Learning in NLPMachine Learning in NLP
Machine Learning in NLP
 
Chatbots and AI
Chatbots and AIChatbots and AI
Chatbots and AI
 
Gadgets pwn us? A pattern language for CALL
Gadgets pwn us? A pattern language for CALLGadgets pwn us? A pattern language for CALL
Gadgets pwn us? A pattern language for CALL
 
Lessons learned from building a commercial bot development platform
Lessons learned from building a commercial bot development platformLessons learned from building a commercial bot development platform
Lessons learned from building a commercial bot development platform
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
NLP
NLPNLP
NLP
 
Open ai’s gpt 3 language explained under 5 mins
Open ai’s gpt 3 language explained under 5 minsOpen ai’s gpt 3 language explained under 5 mins
Open ai’s gpt 3 language explained under 5 mins
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
 
Deep Dialog System Review
Deep Dialog System ReviewDeep Dialog System Review
Deep Dialog System Review
 
UCU NLP Summer Workshops 2017 - Part 2
UCU NLP Summer Workshops 2017 - Part 2UCU NLP Summer Workshops 2017 - Part 2
UCU NLP Summer Workshops 2017 - Part 2
 

Similaire à Grammarly AI-NLP Club #2 - Recent advances in applied chatbot technology - Jordi Carrera Ventura

ChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchrohitcse52
 
AI and Python: Developing a Conversational Interface using Python
AI and Python: Developing a Conversational Interface using PythonAI and Python: Developing a Conversational Interface using Python
AI and Python: Developing a Conversational Interface using Pythonamyiris
 
Meetup 6/3/2017 - Artificiële Intelligentie: over chatbots & robots
Meetup 6/3/2017 - Artificiële Intelligentie: over chatbots & robotsMeetup 6/3/2017 - Artificiële Intelligentie: over chatbots & robots
Meetup 6/3/2017 - Artificiële Intelligentie: over chatbots & robotsDigipolis Antwerpen
 
Fabrice Lacroix - Connecting a Chatbot to Your Technical Content: Myth and Re...
Fabrice Lacroix - Connecting a Chatbot to Your Technical Content: Myth and Re...Fabrice Lacroix - Connecting a Chatbot to Your Technical Content: Myth and Re...
Fabrice Lacroix - Connecting a Chatbot to Your Technical Content: Myth and Re...LavaConConference
 
Getting ready for voice
Getting ready for voiceGetting ready for voice
Getting ready for voiceMaarten Dings
 
Python enterprise vento di liberta
Python enterprise vento di libertaPython enterprise vento di liberta
Python enterprise vento di libertaSimone Federici
 
The Age of Conversational Agents
The Age of Conversational AgentsThe Age of Conversational Agents
The Age of Conversational AgentsFaction XYZ
 
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...DataScienceConferenc1
 
IRJET- Artificial Intelligence Based Chat-Bot
IRJET-  	  Artificial Intelligence Based Chat-BotIRJET-  	  Artificial Intelligence Based Chat-Bot
IRJET- Artificial Intelligence Based Chat-BotIRJET Journal
 
Realizzare un Virtual Assistant con Bot Framework Azure e Unity
Realizzare un Virtual Assistant con Bot Framework Azure e UnityRealizzare un Virtual Assistant con Bot Framework Azure e Unity
Realizzare un Virtual Assistant con Bot Framework Azure e UnityMarco Parenzan
 
Cloud AI GenAI Overview.pptx
Cloud AI GenAI Overview.pptxCloud AI GenAI Overview.pptx
Cloud AI GenAI Overview.pptxSahithiGurlinka
 
A.I. in the Enterprise: Computer Speech
A.I. in the Enterprise: Computer SpeechA.I. in the Enterprise: Computer Speech
A.I. in the Enterprise: Computer SpeechChristopher Mohritz
 
Accessibility for design system 19
Accessibility for design system 19Accessibility for design system 19
Accessibility for design system 19Paya Do
 
Conversational experience by Systango
Conversational experience by SystangoConversational experience by Systango
Conversational experience by SystangoSystango
 
Whats Next for Machine Learning
Whats Next for Machine LearningWhats Next for Machine Learning
Whats Next for Machine LearningOgilvy Consulting
 
Filip Maertens - AI, Machine Learning and Chatbots: Think AI-first
Filip Maertens - AI, Machine Learning and Chatbots: Think AI-first Filip Maertens - AI, Machine Learning and Chatbots: Think AI-first
Filip Maertens - AI, Machine Learning and Chatbots: Think AI-first Patrick Van Renterghem
 
Demystifying AI-chatbots Just add CUI to your business apps
Demystifying AI-chatbots Just add CUI to your business appsDemystifying AI-chatbots Just add CUI to your business apps
Demystifying AI-chatbots Just add CUI to your business appsGrid Dynamics
 

Similaire à Grammarly AI-NLP Club #2 - Recent advances in applied chatbot technology - Jordi Carrera Ventura (20)

ChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
 
AI and Python: Developing a Conversational Interface using Python
AI and Python: Developing a Conversational Interface using PythonAI and Python: Developing a Conversational Interface using Python
AI and Python: Developing a Conversational Interface using Python
 
AI 2023.pdf
AI 2023.pdfAI 2023.pdf
AI 2023.pdf
 
ms_3.pdf
ms_3.pdfms_3.pdf
ms_3.pdf
 
iadaatpa gala boston
iadaatpa gala bostoniadaatpa gala boston
iadaatpa gala boston
 
Meetup 6/3/2017 - Artificiële Intelligentie: over chatbots & robots
Meetup 6/3/2017 - Artificiële Intelligentie: over chatbots & robotsMeetup 6/3/2017 - Artificiële Intelligentie: over chatbots & robots
Meetup 6/3/2017 - Artificiële Intelligentie: over chatbots & robots
 
Fabrice Lacroix - Connecting a Chatbot to Your Technical Content: Myth and Re...
Fabrice Lacroix - Connecting a Chatbot to Your Technical Content: Myth and Re...Fabrice Lacroix - Connecting a Chatbot to Your Technical Content: Myth and Re...
Fabrice Lacroix - Connecting a Chatbot to Your Technical Content: Myth and Re...
 
Getting ready for voice
Getting ready for voiceGetting ready for voice
Getting ready for voice
 
Python enterprise vento di liberta
Python enterprise vento di libertaPython enterprise vento di liberta
Python enterprise vento di liberta
 
The Age of Conversational Agents
The Age of Conversational AgentsThe Age of Conversational Agents
The Age of Conversational Agents
 
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...
 
IRJET- Artificial Intelligence Based Chat-Bot
IRJET-  	  Artificial Intelligence Based Chat-BotIRJET-  	  Artificial Intelligence Based Chat-Bot
IRJET- Artificial Intelligence Based Chat-Bot
 
Realizzare un Virtual Assistant con Bot Framework Azure e Unity
Realizzare un Virtual Assistant con Bot Framework Azure e UnityRealizzare un Virtual Assistant con Bot Framework Azure e Unity
Realizzare un Virtual Assistant con Bot Framework Azure e Unity
 
Cloud AI GenAI Overview.pptx
Cloud AI GenAI Overview.pptxCloud AI GenAI Overview.pptx
Cloud AI GenAI Overview.pptx
 
A.I. in the Enterprise: Computer Speech
A.I. in the Enterprise: Computer SpeechA.I. in the Enterprise: Computer Speech
A.I. in the Enterprise: Computer Speech
 
Accessibility for design system 19
Accessibility for design system 19Accessibility for design system 19
Accessibility for design system 19
 
Conversational experience by Systango
Conversational experience by SystangoConversational experience by Systango
Conversational experience by Systango
 
Whats Next for Machine Learning
Whats Next for Machine LearningWhats Next for Machine Learning
Whats Next for Machine Learning
 
Filip Maertens - AI, Machine Learning and Chatbots: Think AI-first
Filip Maertens - AI, Machine Learning and Chatbots: Think AI-first Filip Maertens - AI, Machine Learning and Chatbots: Think AI-first
Filip Maertens - AI, Machine Learning and Chatbots: Think AI-first
 
Demystifying AI-chatbots Just add CUI to your business apps
Demystifying AI-chatbots Just add CUI to your business appsDemystifying AI-chatbots Just add CUI to your business apps
Demystifying AI-chatbots Just add CUI to your business apps
 

Plus de Grammarly

Grammarly AI-NLP Club #10 - Information-Theoretic Probing with Minimum Descri...
Grammarly AI-NLP Club #10 - Information-Theoretic Probing with Minimum Descri...Grammarly AI-NLP Club #10 - Information-Theoretic Probing with Minimum Descri...
Grammarly AI-NLP Club #10 - Information-Theoretic Probing with Minimum Descri...Grammarly
 
Grammarly AI-NLP Club #9 - Dumpster diving for parallel corpora with efficien...
Grammarly AI-NLP Club #9 - Dumpster diving for parallel corpora with efficien...Grammarly AI-NLP Club #9 - Dumpster diving for parallel corpora with efficien...
Grammarly AI-NLP Club #9 - Dumpster diving for parallel corpora with efficien...Grammarly
 
Grammarly AI-NLP Club #8 - Arabic Natural Language Processing: Challenges and...
Grammarly AI-NLP Club #8 - Arabic Natural Language Processing: Challenges and...Grammarly AI-NLP Club #8 - Arabic Natural Language Processing: Challenges and...
Grammarly AI-NLP Club #8 - Arabic Natural Language Processing: Challenges and...Grammarly
 
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...Grammarly
 
Grammarly AI-NLP Club #5 - Automatic text simplification in the biomedical do...
Grammarly AI-NLP Club #5 - Automatic text simplification in the biomedical do...Grammarly AI-NLP Club #5 - Automatic text simplification in the biomedical do...
Grammarly AI-NLP Club #5 - Automatic text simplification in the biomedical do...Grammarly
 
Grammarly AI-NLP Club #3 - Learning to Read for Automated Fact Checking - Isa...
Grammarly AI-NLP Club #3 - Learning to Read for Automated Fact Checking - Isa...Grammarly AI-NLP Club #3 - Learning to Read for Automated Fact Checking - Isa...
Grammarly AI-NLP Club #3 - Learning to Read for Automated Fact Checking - Isa...Grammarly
 
Grammarly AI-NLP Club #4 - Understanding and assessing language with neural n...
Grammarly AI-NLP Club #4 - Understanding and assessing language with neural n...Grammarly AI-NLP Club #4 - Understanding and assessing language with neural n...
Grammarly AI-NLP Club #4 - Understanding and assessing language with neural n...Grammarly
 
Grammarly Meetup: DevOps at Grammarly: Scaling 100x
Grammarly Meetup: DevOps at Grammarly: Scaling 100xGrammarly Meetup: DevOps at Grammarly: Scaling 100x
Grammarly Meetup: DevOps at Grammarly: Scaling 100xGrammarly
 
Grammarly Meetup: Memory Networks for Question Answering on Tabular Data - Sv...
Grammarly Meetup: Memory Networks for Question Answering on Tabular Data - Sv...Grammarly Meetup: Memory Networks for Question Answering on Tabular Data - Sv...
Grammarly Meetup: Memory Networks for Question Answering on Tabular Data - Sv...Grammarly
 
Grammarly AI-NLP Club #1 - Domain and Social Bias in NLP: Case Study in Langu...
Grammarly AI-NLP Club #1 - Domain and Social Bias in NLP: Case Study in Langu...Grammarly AI-NLP Club #1 - Domain and Social Bias in NLP: Case Study in Langu...
Grammarly AI-NLP Club #1 - Domain and Social Bias in NLP: Case Study in Langu...Grammarly
 
Grammarly Meetup: Paraphrase Detection in NLP (PART 2) - Andriy Gryshchuk
Grammarly Meetup: Paraphrase Detection in NLP (PART 2) - Andriy GryshchukGrammarly Meetup: Paraphrase Detection in NLP (PART 2) - Andriy Gryshchuk
Grammarly Meetup: Paraphrase Detection in NLP (PART 2) - Andriy GryshchukGrammarly
 
Grammarly Meetup: Paraphrase Detection in NLP (PART 1) - Yuriy Guts
Grammarly Meetup: Paraphrase Detection in NLP (PART 1) - Yuriy GutsGrammarly Meetup: Paraphrase Detection in NLP (PART 1) - Yuriy Guts
Grammarly Meetup: Paraphrase Detection in NLP (PART 1) - Yuriy GutsGrammarly
 
Natural Language Processing for biomedical text mining - Thierry Hamon
Natural Language Processing for biomedical text mining - Thierry HamonNatural Language Processing for biomedical text mining - Thierry Hamon
Natural Language Processing for biomedical text mining - Thierry HamonGrammarly
 

Plus de Grammarly (13)

Grammarly AI-NLP Club #10 - Information-Theoretic Probing with Minimum Descri...
Grammarly AI-NLP Club #10 - Information-Theoretic Probing with Minimum Descri...Grammarly AI-NLP Club #10 - Information-Theoretic Probing with Minimum Descri...
Grammarly AI-NLP Club #10 - Information-Theoretic Probing with Minimum Descri...
 
Grammarly AI-NLP Club #9 - Dumpster diving for parallel corpora with efficien...
Grammarly AI-NLP Club #9 - Dumpster diving for parallel corpora with efficien...Grammarly AI-NLP Club #9 - Dumpster diving for parallel corpora with efficien...
Grammarly AI-NLP Club #9 - Dumpster diving for parallel corpora with efficien...
 
Grammarly AI-NLP Club #8 - Arabic Natural Language Processing: Challenges and...
Grammarly AI-NLP Club #8 - Arabic Natural Language Processing: Challenges and...Grammarly AI-NLP Club #8 - Arabic Natural Language Processing: Challenges and...
Grammarly AI-NLP Club #8 - Arabic Natural Language Processing: Challenges and...
 
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
 
Grammarly AI-NLP Club #5 - Automatic text simplification in the biomedical do...
Grammarly AI-NLP Club #5 - Automatic text simplification in the biomedical do...Grammarly AI-NLP Club #5 - Automatic text simplification in the biomedical do...
Grammarly AI-NLP Club #5 - Automatic text simplification in the biomedical do...
 
Grammarly AI-NLP Club #3 - Learning to Read for Automated Fact Checking - Isa...
Grammarly AI-NLP Club #3 - Learning to Read for Automated Fact Checking - Isa...Grammarly AI-NLP Club #3 - Learning to Read for Automated Fact Checking - Isa...
Grammarly AI-NLP Club #3 - Learning to Read for Automated Fact Checking - Isa...
 
Grammarly AI-NLP Club #4 - Understanding and assessing language with neural n...
Grammarly AI-NLP Club #4 - Understanding and assessing language with neural n...Grammarly AI-NLP Club #4 - Understanding and assessing language with neural n...
Grammarly AI-NLP Club #4 - Understanding and assessing language with neural n...
 
Grammarly Meetup: DevOps at Grammarly: Scaling 100x
Grammarly Meetup: DevOps at Grammarly: Scaling 100xGrammarly Meetup: DevOps at Grammarly: Scaling 100x
Grammarly Meetup: DevOps at Grammarly: Scaling 100x
 
Grammarly Meetup: Memory Networks for Question Answering on Tabular Data - Sv...
Grammarly Meetup: Memory Networks for Question Answering on Tabular Data - Sv...Grammarly Meetup: Memory Networks for Question Answering on Tabular Data - Sv...
Grammarly Meetup: Memory Networks for Question Answering on Tabular Data - Sv...
 
Grammarly AI-NLP Club #1 - Domain and Social Bias in NLP: Case Study in Langu...
Grammarly AI-NLP Club #1 - Domain and Social Bias in NLP: Case Study in Langu...Grammarly AI-NLP Club #1 - Domain and Social Bias in NLP: Case Study in Langu...
Grammarly AI-NLP Club #1 - Domain and Social Bias in NLP: Case Study in Langu...
 
Grammarly Meetup: Paraphrase Detection in NLP (PART 2) - Andriy Gryshchuk
Grammarly Meetup: Paraphrase Detection in NLP (PART 2) - Andriy GryshchukGrammarly Meetup: Paraphrase Detection in NLP (PART 2) - Andriy Gryshchuk
Grammarly Meetup: Paraphrase Detection in NLP (PART 2) - Andriy Gryshchuk
 
Grammarly Meetup: Paraphrase Detection in NLP (PART 1) - Yuriy Guts
Grammarly Meetup: Paraphrase Detection in NLP (PART 1) - Yuriy GutsGrammarly Meetup: Paraphrase Detection in NLP (PART 1) - Yuriy Guts
Grammarly Meetup: Paraphrase Detection in NLP (PART 1) - Yuriy Guts
 
Natural Language Processing for biomedical text mining - Thierry Hamon
Natural Language Processing for biomedical text mining - Thierry HamonNatural Language Processing for biomedical text mining - Thierry Hamon
Natural Language Processing for biomedical text mining - Thierry Hamon
 

Dernier

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 

Dernier (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 

Grammarly AI-NLP Club #2 - Recent advances in applied chatbot technology - Jordi Carrera Ventura

  • 1. Recent advances in applied chatbot technology to be presented at AI Ukraine 2016 Jordi Carrera Ventura NLP scientist Telefónica Research & Development
  • 4. Chatbots are a form of conversational artificial intelligence (AI) in which a user interacts with a virtual agent through natural language messaging in • a messaging interface like Slack or Facebook Messenger • or a voice interface like Amazon Echo or Google Assistant. Chatbot1 A bot that lives in a chat (an automation routine inside a UI). Conversational agent Virtual Assistant1 A bot that takes natural language as input and returns it as output. Chatbot2 A virtual assistant that lives in a chat. User> what are you? 1 Generally assumed to have broader coverage and more advanced AI than chatbots2.
  • 5. Usually intended to • get quick answers to a specific questions over some pre-defined repository of knowledge • perform transactions in a faster and more natural way. Can be used to • surface contextually relevant information • help a user complete an online transaction • serve as a helpdesk agent to resolve a customer’s issue without ever involving a human. Virtual assistants for Customer support, e-commerce, expense management, booking flights, booking meetings, data science... Use cases
  • 6. Advantages of chatbots From: https://medium.com/@csengerszab/messenger-chatbots-is-it-a-new- revolution-6f550110c940
  • 7. Actually, As an interface, chatbots are not the best. Nothing beats a spreadsheet for most things. For data already tabulated, a chatbot may actually slow things down. A well-designed UI can allow a user to do anything in a few taps, as opposed to a few whole sentences in natural language. But chatbots can generally be regarded as universal -every channel supports text. Chatbots as a UX are effective as a protocol, like email.
  • 8. What's really important... Success conditions • Talk like a person. • The "back-end" MUST NOT BE a person. • Provide a single interface to access all other services. • Useful in industrial processes Response automation given specific, highly-repetitive linguistic inputs. Not conversations, but ~auto-replies linked to more refined email filters. • Knowledge representation.
  • 9. What's really important... Conversational technology should really be about • dynamically finding the best possible way to browse a large repository of information/actions. • find the shortest path to any relevant action or piece of information (to avoid the plane dashboard effect). • surfacing implicit data in unstructured content ("bocadillo de calamares in Madrid"). Rather than going open-domain, taking the closed-domain and going deep into it.
  • 10. What's really important... Conversational technology should really be about • dynamically finding the best possible way to browse a large repository of information/actions. • find the shortest path to any relevant action or piece of information (to avoid the plane dashboard effect). • surfacing implicit data in unstructured content ("bocadillo de calamares in Madrid"). Rather than going open-domain, taking the closed-domain and going deep into it.
  • 11. ... and the hype
  • 12. ... and the hype
  • 13. ... and the hype
  • 14. ... and the hype
  • 15. ... and the hype
  • 16. ... and the hype
  • 17. ... and the hype
  • 18. ... and the hype
  • 19. ... and the hype
  • 20. ... and the hype That was IBM Watson. 🙄
  • 21. ... and the hype Getting started with bots is as simple as using any of a handful of new bot platforms that aim to make bot creation easy... ... sophisticated bots require an understanding of natural language processing (NLP) and other areas of artificial intelligence. The optimists
  • 22. ... and the hype Artificial intelligence has advanced remarkably in the last three years, and is reaching the point where computers can have moderately complex conversations with users in natural language <perform simple actions as response to primitive linguistic signals>. The optimists
  • 23. ... and the hype Most of the value of deep learning today is in narrow domains where you can get a lot of data. Here’s one example of something it cannot do: have a meaningful conversation. There are demos, and if you cherry-pick the conversation, it looks like it’s having a meaningful conversation, but if you actually try it yourself, it quickly goes off the rails. Andrew Ng (Chief Scientist, AI, Baidu)
  • 24. ... and the hype Many companies start off by outsourcing their conversations to human workers and promise that they can “automate” it once they’ve collected enough data. That’s likely to happen only if they are operating in a pretty narrow domain – like a chat interface to call an Uber for example. Anything that’s a bit more open domain (like sales emails) is beyond what we can currently do. However, we can also use these systems to assist human workers by proposing and correcting responses. That’s much more feasible. Andrew Ng (Chief Scientist, AI, Baidu)
  • 26. Lack of suitable metrics Industrial state-of-the-art Liu, Chia-Wei, et al. "How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation." arXiv preprint arXiv: 1603.08023 (2016).
  • 27. By Intento (https://inten.to) Dataset SNIPS.ai 2017 NLU Benchmark https://github.com/snipsco/nlu-benchmark Example intents SearchCreativeWork (e.g. Find me the I, Robot television show) GetWeather (e.g. Is it windy in Boston, MA right now?) BookRestaurant (e.g. I want to book a highly rated restaurant for me and my boyfriend tomorrow night) PlayMusic (e.g. Play the last track from Beyoncé off Spotify) AddToPlaylist (e.g. Add Diamonds to my roadtrip playlist) RateBook (e.g. Give 6 stars to Of Mice and Men) SearchScreeningEvent (e.g. Check the showtimes for Wonder Woman in Paris) Industrial state-of-the-art Benchmark
  • 28. Methodology English language. Removed duplicates that differ by number of whitespaces, quotes, lettercase, etc. Resulting dataset parameters 7 intents, 15.6K utterances (~2K per intent) 3-fold 80/20 cross-validation. Most providers do not offer programmatic interfaces for adding training data. Industrial state-of-the-art Benchmark
  • 29. Framework Since F1 False positives Response time Price IBM Watson Conversation 2015 99.7 100% 0.35 2.5 API.ai (Google 2016) 2010 99.6 40% 0.28 Free Microsoft LUIS 2015 99.2 100% 0.21 0.75 Amazon Lex 2016 96.5 82% 0.43 0.75 Recast.ai 2016 97 75% 2.06 N/A wit.ai (Facebook) 2013 97.4 72% 0.96 Free SNIPS 2017 97.5 26% 0.36 Per device Industrial state-of-the-art
  • 32. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> that's a nice one but, i want to see different jackets .... how can i do that
  • 33. So, let's be real... We've got what you're looking for! <image product_id=1 item_id=2> where can i find a pair of shoes that match this yak etc. We've got what you're looking for! <image product_id=1 item_id=3> pair of shoes (, you talking dishwasher)
  • 34. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets lexical variants: synonyms, paraphrases
  • 35. So, let's be real... lexical variants: synonyms, paraphrases Intent
  • 36. So, let's be real... lexical variants: synonyms, paraphrases Intent Entity
  • 37. So, let's be real... lexical variants: synonyms, paraphrases Intent This can be normalized, e.g. regular expressions Entity
  • 38. So, let's be real... lexical variants: synonyms, paraphrases Intent This can also be normalized, e.g. regular expressions Entity
  • 39. So, let's be real... lexical variants: synonyms, paraphrases Intent This can also be normalized, e.g. regular expressions Entity 1-week project on a regular-expression-based filter. From 89% to 95% F1 with only a 5% positive rate. 7-label domain classification task using a RandomForest classifier over TFIDF-weighted vectors.
  • 40. So, let's be real... lexical variants: synonyms, paraphrases Intent My internet connection does not work. My smartphone does not work. My landline does not work. My router does not work. My SIM does not work. My payment method does not work. My saved movies does not work.
  • 41. So, let's be real... lexical variants: synonyms, paraphrases Intent My internet connection does not work. My smartphone does not work. My landline does not work. My router does not work. My SIM does not work. My payment method does not work. My saved movies does not work. ConnectionDebugWizard(*) TechnicalSupport(Mobile) TechnicalSupport(Landline) RouterDebugWizard() VerifyDeviceSettings AccountVerificationProcess TVDebugWizard
  • 42. So, let's be real... lexical variants: synonyms, paraphrases Intent My internet connection does not work. My smartphone does not work. My landline does not work. My router does not work. My SIM does not work. My payment method does not work. My saved movies does not work. ConnectionDebugWizard(*) TechnicalSupport(Mobile) TechnicalSupport(Landline) RouterDebugWizard() VerifyDeviceSettings AccountVerificationProcess TVDebugWizard Since we associate each intent to an action, the ability to discriminate actions presupposes training as many separate intents, with just as many ambiguities.
  • 43. So, let's be real... lexical variants: synonyms, paraphrases Intent My internet connection does not work. My smartphone does not work. My landline does not work. My router does not work. My SIM does not work. My payment method does not work. My saved movies does not work. ConnectionDebugWizard(*) TechnicalSupport(Mobile) TechnicalSupport(Landline) RouterDebugWizard() VerifyDeviceSettings AccountVerificationProcess TVDebugWizard It also presupposes exponentially increasing amounts of training data as we add higher-precision entities to our model (every entity * every intent that applies)
  • 44. So, let's be real... lexical variants: synonyms, paraphrases Intent My internet connection does not work. My smartphone does not work. My landline does not work. My router does not work. My SIM does not work. My payment method does not work. My saved movies does not work. ConnectionDebugWizard(*) TechnicalSupport(Mobile) TechnicalSupport(Landline) RouterDebugWizard() VerifyDeviceSettings AccountVerificationProcess TVDebugWizard The entity resolution step (which would have helped us make the right decision) is nested downstream in our pipeline. Any errors from the previous stage will propagate down.
  • 45. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> that's a nice one but, I want to see different jackets .... how can i do that lexical variants: synonyms, paraphrases formal variants (typos, ASR errors) multiword chunking/parsing morphological variants
  • 46. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> lexical variants: synonyms, paraphrases formal variants (typos, ASR errors) multiword chunking/parsing morphological variants entity recognition hey i need to buy a men's red leather jacket with straps for my friend that's a nice one but, I want to see different jackets .... how can i do that
  • 47. So, let's be real... where can i find a <jacket with straps> looking for a <black mens jacket> do you have anything in <leather> <present> for a friend }Our training dataset buy <red jacket>
  • 48. So, let's be real... where can i buy <a men's red leather jacket with straps> for my friend where can i find a <jacket with straps> looking for a <black mens jacket> do you have anything in <leather> <present> for a friend hi i need <a men's red leather jacket with straps> that my friend can wear }Our training dataset }Predictions at runtime buy <red jacket>
  • 49. So, let's be real... where can i buy <a men's red leather jacket with straps> for my friend where can i find a <jacket with straps> looking for a <black mens jacket> do you have anything in <leather> <present> for a friend hi i need <a men's red leather jacket with straps> that my friend can wear }Our training dataset }Predictions at runtime Incomplete training data will cause entity segmentation issues buy <red jacket>
  • 50. So, let's be real... buy <red jacket> where can i find a <jacket with straps> looking for a <black mens jacket> do you have anything in <leather> <present> for a friend }Our training dataset But more importantly, in the intent-entity paradigm we are usually unable to train two entity types with the same distribution: where can i buy a <red leather jacket> where can i buy a <red leather wallet> jacket, 0.5 wallet, 0.5 jacket, 0.5 wallet, 0.5
  • 51. So, let's be real... buy <red jacket> where can i find a <jacket with straps> looking for a <black mens jacket> do you have anything in <leather> <present> for a friend }Our training dataset ... and have no way of detecting entities used in isolation: pair of shoes MakePurchase ReturnPurchase ['START', 3-gram, 'END'] GetWeather Greeting Goodbye
  • 52. that's a nice one but, I want to see different jackets .... how can i do that So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> lexical variants: synonyms, paraphrases formal variants (typos, ASR errors) multiword chunking/parsing morphological variants entity recognition hey i need to buy a men's red leather jacket with straps for my friend searchItemByText("leather jacket") searchItemByText( "to buy a men's ... pfff TLDR... my friend" ) [ { "name": "leather jacket", "product_category_id": 12, "gender_id": 0, "materials_id": [68, 34, ...], "features_id": [], }, ... { "name": "leather jacket with straps", "product_category_id": 12, "gender_id": 1, "materials_id": [68, 34, ...], "features_id": [8754], }, ]
  • 53. that's a nice one but, I want to see different jackets .... how can i do that So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> lexical variants: synonyms, paraphrases formal variants (typos, ASR errors) multiword chunking/parsing morphological variants entity recognition hey i need to buy a men's red leather jacket with straps for my friend searchItemByText("leather jacket") searchItemByText( "red leather jacket with straps" ) [ { "name": "leather jacket", "product_category_id": 12, "gender_id": 0, "materials_id": [68, 34, ...], "features_id": [], }, ... { "name": "leather jacket with straps", "product_category_id": 12, "gender_id": 1, "materials_id": [68, 34, ...], "features_id": [8754], }, ] NO IDs, NO reasonable expectation of a cognitively realistic answer and NO "conversational technology".
  • 54. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Straps Jackets Leather items Since the conditions express an AND logical operator and we will search for an item fulfilling all of them, we could actually search for the terms in the conjunction in any order.
  • 55. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Given • any known word, denoted by w, • the vocabulary V of all known words, such that wi ∈ V for any i, • search space D, where d denotes every document in a collection D such that d ∈ D and d = {wi, wi + 1, ..., wn}, such that wi ≤ x ≤ n ∈ V, • a function filter(D, wx) that returns a subset of D, Dwx where wx ∈ d is true for every d: d ∈ Dwx, and • query q = {"men's", 'red', 'leather', 'jacket', 'straps'},
  • 56. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend do filter(filter(filter(filter( filter(D, 'leather'), 'straps'), "men's"), 'jacket'),'red' ) = do filter(filter(filter(filter( filter(D, 'straps'), 'red'), 'jacket'), 'leather'),"men's" ) = do filter(filter(filter(filter( filter(D, "men's"), 'leather'), 'red'), 'straps'),'jacket' ) =
  • 57. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Given query q = {"men's", 'red', 'leather', 'jacket', 'straps'}, find the x: x ∈ q that satisfies: containsSubstring(x, 'leather') ^ containsSubstring(x, 'straps') ^ ... containsSubstring(x, "men's") As a result, it will lack the notion of what counts as a partial match. We cannot use more complex predicates because, in this scenario, the system has no understanding of the internal structure of the phrase, its semantic dependencies, or its syntactic dependencies.
  • 58. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend The following partial matches will be virtually indistinguishable: • leather straps for men's watches (3/5) • men's vintage red leather shoes (3/5) • red wallet with leather straps (3/5) • red leather jackets with hood (3/5)
  • 59. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend do filter(filter(filter(filter( filter(D, "men's"), 'red'), 'leather'), 'jacket'),'straps' ) Given • any known word, denoted by w, • the vocabulary V of all known words, such that wi ∈ V for any i, • search space D, where d denotes every document in a collection D such that d ∈ D and d = {wi, wi + 1, ..., wn}, such that wi ≤ x ≤ n ∈ V, • a function filter(D, wx) that returns a subset of D, Dwx where wx ∈ d is true for every d: d ∈ Dwx, and • query q = {"men's", 'red', 'leather', 'jacket', 'straps'}, This is our problem: we're using a function that treats equally every wx.
  • 60. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend do filter(filter(filter(filter( filter(D, "men's"), 'red'), 'leather'), 'jacket'),'straps' ) Given • any known word, denoted by w, • the vocabulary V of all known words, such that wi ∈ V for any i, • search space D, where d denotes every document in a collection D such that d ∈ D and d = {wi, wi + 1, ..., wn}, such that wi ≤ x ≤ n ∈ V, • a function filter(D, wx) that returns a subset of D, Dwx where wx ∈ d is true for every d: d ∈ Dwx, and • query q = {"men's", 'red', 'leather', 'jacket', 'straps'}, We need a function with at least one additional parameter, rx filter(D, wx, rx) where rx denotes the depth at which the dependency tree headed by wx attaches to the node being currently evaluated.
  • 61. So, let's be real... Your query i need to buy a men's red leather jacket with straps for my friend Tagging i/FW need/VBP to/TO buy/VB a/DT men/NNS 's/POS red/JJ leather/NN jacket/NN with/IN straps/NNS for/IN my/PRP$ friend/NN Parse (ROOT (S (NP (FW i)) (VP (VBP need) (S (VP (TO to) (VP (VB buy) (NP (NP (DT a) (NNS men) (POS 's)) (JJ red) (NN leather) (NN jacket)) (PP (IN with) (NP (NP (NNS straps)) (PP (IN for) (NP (PRP$ my) (NN friend))))))))))) http://nlp.stanford.edu:8080/parser/index.jsp (as of September 19th 2017)
  • 62. So, let's be real... Your query i need to buy a men's red leather jacket with straps for my friend Tagging i/FW need/VBP to/TO buy/VB a/DT men/NNS 's/POS red/JJ leather/NN jacket/NN with/IN straps/NNS for/IN my/PRP$ friend/NN Parse (ROOT (S (NP (FW i)) (VP (VBP need) (S (VP (TO to) (VP (VB buy) (NP (NP (DT a) ((NNS men) (POS 's)) ((JJ red) (NN leather)) (NN jacket))) (PP (IN with) (NP (NP (NNS straps)) (PP (IN for) (NP (PRP$ my) (NN friend))))))))))) http://nlp.stanford.edu:8080/parser/index.jsp (as of September 19th 2017)
  • 63. So, let's be real... Your query i need to buy a men's red leather jacket with straps for my friend Tagging i/FW need/VBP to/TO buy/VB a/DT men/NNS 's/POS red/JJ leather/NN jacket/NN with/IN straps/NNS for/IN my/PRP$ friend/NN Parse (ROOT (S (NP (FW i)) (VP (VBP need) (S (VP (TO to) (VP (VB buy) (NP (NP (DT a) ((NNS men) (POS 's)) ((JJ red) (NN leather)) (NN jacket))) (PP (IN with) (NP (NP (NNS straps)) (PP (IN for) (NP (PRP$ my) (NN friend))))))))))) http://nlp.stanford.edu:8080/parser/index.jsp (as of September 19th 2017) (red, leather) (leather, jacket) (jacket, buy)
  • 64. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend do filter(filter(filter(filter( filter(D, "men's"), 'red'), 'leather'), 'jacket'),'straps' ) Given • any known word, denoted by w, • the vocabulary V of all known words, such that wi ∈ V for any i, • search space D, where d denotes every document in a collection D such that d ∈ D and d = {wi, wi + 1, ..., wn}, such that wi ≤ x ≤ n ∈ V, • a function filter(D, wx) that returns a subset of D, Dwx where wx ∈ d is true for every d: d ∈ Dwx, and • query q = {"men's", 'red', 'leather', 'jacket', 'straps'}, The output of this function will no longer be a subset of search results but a ranked list. This list can be naively ranked by ∑x |t| (1 / rx) although much more advanced scoring functions can be used.
  • 65. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  • 66. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> t1 = dependencyTree(a1) >>> t1.depth("black") 3 >>> t1.depth("jacket") 1 Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  • 67. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> results = filter(D, "jacket", t1.depth("jacket")) >>> results [ ( Jacket1, 1.0 ), ( Jacket2, 1.0 ), ..., ( Jacketn, 1.0 ) ] Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  • 68. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> results = filter(results, "leather", t1.depth("leather")) >>> results [ ( Jacket1, 1.5 ), ( Jacket2, 1.5 ), ..., ( Jacketn - x, 1.5 ) ] Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  • 69. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet" >>> results = filter(results, "red", t1.depth("red")) >>> results [ ( Jacket1, 1.5 ), ( Jacket2, 1.5 ), ..., ( Jacketn - x, 1.5 ) ]
  • 70. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> t2 = dependencyTree(a2) >>> t2.depth("red") 3 >>> t2.depth("jacket") None Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  • 71. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> results = filter(D, "jacket", t2.depth("jacket")) >>> results [] Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  • 72. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend >>> results = filter(results, "leather", t2.depth("leather")) >>> results [ ( Wallet1, 0.5 ), ( Wallet2, 0.5 ), ..., ( Strapn, 0.5 ) ] Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet"
  • 73. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Given: q: "red leather jacket" a1: "black leather jacket" a2: "red leather wallet" >>> results = filter(results, "red", t2.depth("red")) >>> results [ ( Wallet1, 0.83 ), ( Wallet2, 0.83 ), ..., ( Strapn - x, 0.83 ) ]
  • 74. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend Results for q2: "red leather wallet" [ ( Wallet1, 0.83 ), ( Wallet2, 0.83 ), ..., ( Strapn - x, 0.83 ) ] Results for q1: "black leather jacket" [ ( Jacket1, 1.5 ), ( Jacket2, 1.5 ), ..., ( Jacketn - x, 1.5 ) ]
  • 75. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend However, the dependency tree is not always available, and it often contains parsing issues (cf. Stanford parser output). An evaluation of parser robustness over noisy input reports performances down to an average of 80% across datasets and parsers. Hashemi, Homa B., and Rebecca Hwa. "An Evaluation of Parser Robustness for Ungrammatical Sentences." EMNLP. 2016.
  • 76. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend In order to make this decision [ [red leather] jacket ]👍
  • 77. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend [ [red leather] jacket ] [ red [leather jacket] ]👍 In order to make this decision
  • 78. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ]👍 In order to make this decision
• 79. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend In order to make this decision [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ] [ [men's leather] jacket ] 😱
• 80. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend In order to make this decision [ [red leather] jacket ] [ red [leather jacket] ] [ men's [leather jacket] ] [ [men's leather] jacket ] 😱 ... the parser already needs as much information as we can provide regarding head-modifier attachments.
• 81. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend
In order to make this decision
[ [red leather] jacket ]
[ red [leather jacket] ]
[ men's [leather jacket] ]
[ [men's leather] jacket ]
(per-word depths in each analysis: 2 2 1 / 2 1 1 / 2 1 1 / 2 2 1)
• 82. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend
In order to make this decision
Same pattern, different score. How to assign a probability such that less plausible results are buried at the bottom of the hypotheses space? (not only for display purposes, but for dynamic programming reasons)
[ [red leather] jacket ]
[ red [leather jacket] ]
[ men's [leather jacket] ]
[ [men's leather] jacket ]
• 83. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend
In order to make this decision
p(red | jacket) ~ p(red | leather)
p(men's | jacket) > p(men's | leather)
[ [red leather] jacket ]
[ red [leather jacket] ]
[ men's [leather jacket] ]
[ [men's leather] jacket ]
• 84. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend
That's a lot of data to model. Where will we get it from?
[ [red leather] jacket ]
[ red [leather jacket] ]
[ men's [leather jacket] ]
[ [men's leather] jacket ]
• 85. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend
That's a lot of data to model. Where will we get it from?
p(Color | ClothingItem) ~ p(Color | Material)
p(Person | ClothingItem) > p(Person | Material)
[ [red leather] jacket ]
[ red [leather jacket] ]
[ men's [leather jacket] ]
[ [men's leather] jacket ]
• 86. So, let's be real... hey i need to buy a men's red leather jacket with straps for my friend If our taxonomy is well built, semantic associations acquired for a significant number of members of each class will propagate to new, unseen members of that class. With a sufficiently rich NER pipeline and a large enough semantic relation database, we can rely on simple inference for the vast majority of disambiguations. More importantly, we can weight candidate hypotheses dynamically in order to make semantic parsing tractable over many possible syntactic trees.
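A sketch of that class-level propagation, assuming a toy taxonomy and made-up attachment counts; the point is that an unseen word inherits the statistics of its class:

taxonomy = {
    "red": "Color", "black": "Color",
    "leather": "Material", "suede": "Material",
    "jacket": "ClothingItem", "wallet": "ClothingItem",
    "men's": "Person",
}

# Made-up counts of observed (modifier_class, head_class) attachments,
# e.g. harvested from parsed text.
class_counts = {
    ("Color", "ClothingItem"): 900, ("Color", "Material"): 850,
    ("Material", "ClothingItem"): 800,
    ("Person", "ClothingItem"): 700, ("Person", "Material"): 20,
}

def p_attach(modifier, head):
    # p(modifier_class | head_class), normalized over the head class.
    mc, hc = taxonomy[modifier], taxonomy[head]
    total = sum(c for (m, h), c in class_counts.items() if h == hc)
    return class_counts.get((mc, hc), 0) / total if total else 0.0

print(p_attach("men's", "jacket") > p_attach("men's", "leather"))  # True
print(p_attach("suede", "jacket") > 0)   # unseen "suede" still gets a score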
• 87. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> that's a nice one but, I want to see different jackets .... how can i do that
• lexical variants: synonyms, paraphrases
• formal variants (typos, ASR errors)
• multiword chunking/parsing
• morphological variants
• entity recognition
• entity linking
• hierarchical structure
  • 88. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> that's a nice one but, i want to see different jackets .... how can i do that
  • 89. So, let's be real... Welcome to Macy's blah-blah... ! hi im a guy looking for leda jackets We've got what you're looking for! <image product_id=1 item_id=1> that's a nice one but, i want to see different jackets .... how can i do that LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
• 90. So, let's be real...
that → LastResult
one → jacket
do that → LastCommand
that's a nice one but, i want to see different jackets .... how can i do that
LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
• 91. So, let's be real... that → LastResult
IF: p_PoS-tagging(that_DT | * be_VBZ, START *) > p_PoS-tagging(that_IN | * be_VBZ, START *)
    that_DT 's_VBZ great_JJ !_PUNCT
    that_DT 's_VBZ the_DT last_JJ...
    he said that_IN i_PRP...
    he said that_IN trump_NNP...
THEN IF: p_isCoreferent(LastValueReturned | that_DT) > u
    # may also need to condition on the dependency: (x, nice)
that's a nice one but, i want to see different jackets .... how can i do that
LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
• 92. So, let's be real... one → jacket
Find the LastResult in the context that maximizes:
p_isCoreferent(
    LastResult.type=x | one_CD,  # trigger for the function, trivial condition
    QualitativeEvaluation(User, x),
    ContextDistance(LastResult.node)
)
that's a nice one but, i want to see different jackets .... how can i do that
LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
• 93. So, let's be real... one → jacket
Particularly important for decision-tree re-entry
Find the LastResult in the context that maximizes:
p_isCoreferent(
    LastResult.type=x | one_CD,  # trigger for the function, trivial condition
    QualitativeEvaluation(User, x),
    ContextDistance(LastResult.node)
)
that's a nice one but, i want to see different jackets .... how can i do that
LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
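A sketch of that maximization, with a made-up context structure and hand-picked features (type compatibility gating the candidate, recency decaying it); a real system would learn these weights:

context = [
    {"type": "ClothingItem", "value": "Jacket1", "turns_ago": 1},
    {"type": "Command", "value": "BrowseCatalogue", "turns_ago": 1},
    {"type": "ClothingItem", "value": "Wallet3", "turns_ago": 4},
]

def p_is_coreferent(candidate, expected_type):
    # A type mismatch zeroes the candidate out; otherwise older mentions
    # are discounted by their distance in the conversation.
    type_match = 1.0 if candidate["type"] == expected_type else 0.0
    recency = 1.0 / candidate["turns_ago"]
    return type_match * recency

# "that's a nice one": "one" expects a ClothingItem antecedent.
best = max(context, key=lambda c: p_is_coreferent(c, "ClothingItem"))
print(best["value"])   # Jacket1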
• 94. So, let's be real...
"no! forget the last thing i said"
"go back!"
"third option in the menu"
"i want to modify the address I gave you"
• 95. So, let's be real...
QualitativeEvaluation(User_u, ClothingItem_i)
BrowseCatalogue(User_u, filter=[ClothingItem_n,j ∀j ∈ J])
HowTo(User_u, Command_c ∈ Context)
} A single utterance
that's a nice one but, i want to see different jackets .... how can i do that
LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
• 96. So, let's be real...
} A single utterance
that's a nice one but, i want to see different jackets .... how can i do that
LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
QualitativeEvaluation(User_u, ClothingItem_i)
BrowseCatalogue(User_u, filter=[ClothingItem_n,j ∀j ∈ J])
HowTo(User_u, Command_c ∈ Context)
• 97. So, let's be real...
I = [ i1, i2, i3 ]
L = [ | i1 |, | i2 |, | i3 | ]
P = [ p( i1 ), p( i2 ), p( i3 ) ]
} random.choice(I)
} x if x ∈ I ^ length(x) = argmax(L)
} x if x ∈ I ^ p(x) = argmax(P)
that's a nice one but, i want to see different jackets .... how can i do that
LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
QualitativeEvaluation(User_u, ClothingItem_i)
BrowseCatalogue(User_u, filter=[ClothingItem_n,j ∀j ∈ J])
HowTo(User_u, Command_c ∈ Context)
• 98. So, let's be real...
I = [ i1, i2, i3 ]
L = [ | i1 |, | i2 |, | i3 | ]
P = [ p( i1 | q ), p( i2 | q ), p( i3 | q ) ]
} random.choice(I)
} x if x ∈ I ^ length(x) = argmax(L)
} x if x ∈ I ^ p(x | q) = argmax(P)
However, this approach will still suffer from segmentation issues due to passing many more arguments than expected for resolving any specific intent.
that's a nice one but, i want to see different jackets .... how can i do that
LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
• 99. So, let's be real...
GetInfo(User_u, argmax(Command_c ∈ Context))
SetPreference(User_u, SetRating(ClothingItem_i, k, 1)) ∀(ClothingItem_n,m ∈ N×M | n = i, k = *)
that's a nice one but, i want to see different jackets .... how can i do that
LastResult is a nice jacket but, i want to see different jackets .... how can i LastCommand
} A standard API will normally provide these definitions as the declaration of its methods.
• 100. So, let's be real...
(same mapping as above)
} We can map the argument signatures to the nodes of the taxonomy we are using for linguistic inference.
• 101. So, let's be real...
(same mapping as above)
} The system will then be able to link API calls to entity parses dynamically.
• 102. So, let's be real...
(same mapping as above)
} Over time, new methods added to the API will be automatically supported by the pre-existing linguistic engine.
• 103. So, let's be real...
(same mapping as above)
} Over time, new methods added to the API will be automatically supported by the pre-existing linguistic engine. 😁
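A sketch of that signature-to-taxonomy linking: methods declare the entity types they take, and an utterance's parsed types select the callable methods. The registry and the parses are made-up examples:

API = {
    "SetPreference": ("User", "ClothingItem", "Rating"),
    "BrowseCatalogue": ("User", "ClothingItem"),
    "GetInfo": ("User", "Command"),
}

def candidate_calls(entity_types):
    # A method qualifies if every type in its signature was found
    # among the entities parsed out of the utterance.
    found = set(entity_types)
    return [m for m, sig in API.items() if set(sig) <= found]

print(candidate_calls(["User", "ClothingItem", "Rating"]))

# A method added later, using already-known types, needs no new NLU work:
API["CompareItems"] = ("User", "ClothingItem")
print(candidate_calls(["User", "ClothingItem"]))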
  • 104. So, let's be real... If the system can reasonably solve nested semantic dependencies hierarchically, it can also be extended to handle multi-intent utterances in a natural way.
• 105. So, let's be real... We need:
• semantic association measures between the leaves of the tree, in order to attach them to the right nodes and be able to stop growing a tree at the right level,
• a syntactic tree to build the dependencies, either using a parser or a PCFG/LFG grammar (both may take input from the previous step, which will provide the notion of verbal valence: Person Browse Product),
• for tied analyses, an interface with the user to request clarification.
• 106. So, let's be real...
Convert to abstract entity types:
i need to pay my bill and i cannot log into my account
i need to PAYMENT my INVOICE and i ISSUE ACCESS ACCOUNT
Activate applicable dependencies/grammar rules (fewer and less ambiguous thanks to the previous step).
Resolve attachments and rank candidates by semantic association, keep top-n hypotheses:
i need to [ PAYMENT my INVOICE ] and i [ ISSUE [ AccessAccount() ] ]
i need to [ Payment(x) ] and i [ Issue(y) ]
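The abstraction step can be sketched with nothing more than a type lexicon; the lexicon below is a toy assumption standing in for the NER and entity-linking pipeline:

lexicon = {
    "pay": "PAYMENT", "bill": "INVOICE",
    "cannot": "ISSUE", "log": "ACCESS", "account": "ACCOUNT",
}

def abstract(utterance):
    # Rewrite surface words as abstract entity types before parsing.
    return " ".join(lexicon.get(w, w) for w in utterance.lower().split())

print(abstract("i need to pay my bill and i cannot log into my account"))
# i need to PAYMENT my INVOICE and i ISSUE ACCESS into my ACCOUNT

With the vocabulary collapsed into a handful of types, far fewer grammar rules remain applicable, which is what makes the next step cheaper.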
  • 107. So, let's be real... AND usually joins ontologically coordinate elements, e.g. books and magazines versus *humans and Donald Trump i need to [ Payment(x) ] and i [ Issue(y) ]
• 108. So, let's be real... Even if p_0 = p(Payment | Issue) > u, we will still split the intents, because here we have p_y = p(Payment | Issue, y), and normally p_y < p_0 for any irrelevant case (i.e., the posterior probability of the candidate attachment will not beat the baseline of its own prior probability). i need to [ Payment(x) ] and i [ Issue(y) ]
  • 109. So, let's be real... I.e., Payment may have been attached to Issue if no other element had been attached to it before. Resolution order, as given by attachment probabilities, matters. i need to [ Payment(x) ] and i [ Issue(y) ]
  • 110. So, let's be real... i need to [ Payment(x) ] i [ Issue(y) ] i need to [ Payment(x) ] and i [ Issue(y) ] Multi-intents solved 😎
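The split decision itself reduces to one comparison per coordinated attachment; the probabilities below are toy numbers standing in for a trained attachment model:

# p0: prior p(Payment | Issue); py: posterior p(Payment | Issue, y).
p0 = 0.30
py = 0.05   # Issue already has the attachment y, so Payment adds little

def should_split(p_prior, p_posterior):
    # Keep a single intent only if the attachment beats its own prior.
    return p_posterior < p_prior

if should_split(p0, py):
    intents = ["Payment(x)", "Issue(y)"]
print(intents)   # ['Payment(x)', 'Issue(y)']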
• 111. Summary of desiderata
1. Normalization
 1. Ability to handle input indeterminacy by activating several candidate hypotheses
 2. Ability to handle synonyms
 3. Ability to handle paraphrases
2. NER
 1. Transversal (cross-intent)
 2. No nesting under classification (feedback loop instead)
 3. Contextually unbounded
 4. Entity-linked
3. Semantic parsing
 1. Taxonomy-driven supersense inference
 2. Semantically-informed syntactic trees
 3. Ability to handle decision-tree re-entries
• 112. Summary of desiderata
1. Normalization → Word embeddings, domain thesaurus, fuzzy matching
 1. Ability to handle input indeterminacy by activating several candidate hypotheses
 2. Ability to handle synonyms
 3. Ability to handle paraphrases
2. NER → Inverted indexes, character-based neural networks for NER
 1. Transversal (cross-intent)
 2. No nesting under classification (feedback loop instead)
 3. Contextually unbounded
 4. Entity-linked
3. Semantic parsing
 1. Taxonomy-driven supersense inference
 2. Semantically-informed syntactic trees
 3. Ability to handle decision-tree re-entries
• 113. Summary of desiderata
1. Normalization → Word embeddings, domain thesaurus, fuzzy matching
 1. Ability to handle input indeterminacy by activating several candidate hypotheses
 2. Ability to handle synonyms
 3. Ability to handle paraphrases
2. NER → Inverted indexes, character-based neural networks for NER
 1. Transversal (cross-intent)
 2. No nesting under classification (feedback loop instead)
 3. Contextually unbounded
 4. Entity-linked
3. Semantic parsing
 1. Taxonomy-driven supersense inference
 2. Semantically-informed syntactic trees
 3. Ability to handle decision-tree re-entries
→ implicitly solved by Supersenses + Neural enquirer
→ implicitly solved by Neural enquirer (layered/modular)
  • 115. 3.1 Supersense inference GFT (Google Fine-Grained) taxonomy Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods for Fine Grained Entity Type Classification." ACL (2). 2015.
  • 116. 3.1 Supersense inference FIGER taxonomy Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods for Fine Grained Entity Type Classification." ACL (2). 2015.
• 117. 3.1 Supersense inference
FB = Freebase
owem = OurWordEmbeddingsModel
T = OurTaxonomy¹
for every document d in D²:
    for every sent s in d:
        sentence_vector = None
        for every (position, term, tag) tuple in TermExtractor³(s):
            if not sentence_vector:
                sentence_vector = vectorizeSentence(s, position)
            owem[ T[tag] ].update( sentence_vector )
¹ FB labels are manually mapped to fine-grained labels.
² D = 133,000 news documents.
³ TermExtractor = parser + chunker + entity resolver that assigns Freebase types to entities.
Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods for Fine Grained Entity Type Classification." ACL (2). 2015.
• 118. 3.1 Supersense inference
(same pseudocode and footnotes as above)
Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods for Fine Grained Entity Type Classification." ACL (2). 2015.
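A runnable rendering of that loop, with a toy corpus and a bag-of-words stand-in for vectorizeSentence; the vocabulary, window size and running-mean update are all illustrative assumptions:

import numpy as np

VOCAB = {"the": 0, "game": 1, "in": 2, "beijing": 3, "london": 4, "won": 5}

def vectorize_sentence(tokens, position, window=2):
    # Bag-of-words vector over a window centred on the entity mention.
    v = np.zeros(len(VOCAB))
    for w in tokens[max(0, position - window): position + window + 1]:
        if w in VOCAB:
            v[VOCAB[w]] += 1
    return v

class RunningMean:
    def __init__(self, dim):
        self.n, self.v = 0, np.zeros(dim)
    def update(self, x):
        self.n += 1
        self.v += (x - self.v) / self.n

owem = {}   # taxonomy tag -> running mean of context vectors
corpus = [(["the", "game", "in", "beijing"], 3, "/location/city"),
          (["london", "won", "the", "game"], 0, "/location/city")]

for tokens, position, tag in corpus:
    owem.setdefault(tag, RunningMean(len(VOCAB)))
    owem[tag].update(vectorize_sentence(tokens, position))
print(owem["/location/city"].v)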
  • 119. 3.1 Supersense inference Yogatama, Dani, Daniel Gillick, and Nevena Lazic. "Embedding Methods for Fine Grained Entity Type Classification." ACL (2). 2015.
• 120. 3.2 Semantic trees
q = "Which city hosted the longest game before the game in Beijing?"
KB = knowledge base consisting of tables that store facts
encoder = bi-directional Recurrent Neural Network
v = encoder(q)  # vector of the encoded query
kb_encoder = KB table encoder
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015).
• 121. 3.2 Semantic trees
(definitions as above, plus:)
f_n = embedding of the name of the n-th column
w_mn = embedding of the value in the n-th column of the m-th row
[x ; y] = vector concatenation
W = weights (what we will e.g. SGD out of the data)
b = bias
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015).
• 122. 3.2 Semantic trees
(definitions as above)
normalizeBetween0and1(
    penaltiesOrRewards *  # the weights that minimize the loss for this column
    [ {sydney... australia... kangaroo...} + {city... town... place...} ] +
    somePriorExpectations
)
• 123. 3.2 Semantic trees
(definitions and scoring as above)
"olympic games near kangaroos" 👍
• 124. 3.2 Semantic trees
(definitions and scoring as above)
"olympic games near kangaroos" 👍
Risk of false positives? Already as probabilities!!
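The column-scoring intuition can be sketched as a softmax over query/column-name similarities; the three-dimensional embeddings below are made up, not trained:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

columns = {"city": np.array([1.0, 0.1, 0.0]),
           "year": np.array([0.0, 1.0, 0.1]),
           "duration": np.array([0.1, 0.0, 1.0])}

# Encoded query for "Which city ...?", assumed to point toward "city".
q = np.array([0.9, 0.2, 0.1])

names = list(columns)
weights = softmax(np.array([q @ columns[n] for n in names]))
print(dict(zip(names, weights.round(2))))   # "city" gets the largest weight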
• 125. 3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015).
Olympic Games
City            Year  Duration
Barcelona       1992  15
Atlanta         1996  18
Sydney          2000  17
Athens          2004  14
Beijing         2008  16
London          2012  16
Rio de Janeiro  2016  18
• 126. 3.2 Semantic trees
Which city hosted the 2012 Olympic Games?
(same table as above)
• 127. 3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015).
Q(i) = Which city hosted the 2012 Olympic Games? (the query)
T(i) = the Olympic Games table above
y(i) = the answer
RNN-encoded: p(vector("column:city") | [ vector("which"), vector("city")... ]) > p(vector("column:city") | [ vector("city"), vector("which")... ])
• 128. 3.2 Semantic trees
Q(i) = Which city hosted the 2012 Olympic Games? (RNN-encoded)
Executor layer L:     p(vector("column:city") | [ vec("which"), vec("city")... ]) > p(vector("column:city") | [ vec("city"), vec("which")... ])
Executor layer L - 1: p(vector("column:year") | [ ... vec("2012"), vec("olympic") ... ]) > p(vector("column:year") | [ ... vec("olympic"), vec("2012") ... ])
Executor layer L - 2: p(vector("FROM") | [ ... vec("olympic"), vec("games") ... ]) > p(vector("FROM") | [ ... vec("games"), vec("olympic") ... ])
As many as SQL operations
• 129. 3.2 Semantic trees
(same executor layers as above)
Each executor applies the same set of weights to all rows (since it models a single operation)
• 130. 3.2 Semantic trees
(same executor layers as above)
The logic of max, min functions is built-in as column-wise operations. The system learns over which ranges they apply
• 131. 3.2 Semantic trees
(same executor layers as above)
The logic of max, min functions is built-in as column-wise operations. The system learns over which ranges they apply (before, after are just temporal versions of argmax and argmin)
• 132. 3.2 Semantic trees
(same executor layers as above)
Evaluate a candidate:
Candidate vector at layer L for row m of our database
Memory layer with the last output vector generated by executor L - 1
• 133. 3.2 Semantic trees
(same executor layers as above)
Evaluate a candidate:
Row m, R = our table
Set of columns
As an attention mechanism, cumulatively weight the vector by the output weights from executor L - 1
• 134. 3.2 Semantic trees
(same executor layers as above)
Evaluate a candidate:
Row m, R = our table
Set of columns
This allows the system to restrict the reference at each step.
• 135. 3.2 Semantic trees
(same executor layers as above)
Evaluate a candidate:
Row m, R = our table
Set of columns
As we weight the vector in the direction of each successive operation, it gets closer to any applicable answer in our DB.
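A sketch of the cumulative row-weighting those slides describe: each executor softmax-scores the rows and multiplies in the previous layer's attention, so the probability mass narrows step by step. Row encodings and the scoring function are toy values:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

R = np.array([[1992, 15], [2008, 16], [2012, 16]])   # (year, duration) rows
p_prev = np.ones(len(R)) / len(R)                    # start uniform

def executor(R, p_prev, score_fn):
    p = softmax(np.array([score_fn(r) for r in R]))
    p = p * p_prev                                   # cumulative re-weighting
    return p / p.sum()

# An executor modelling "year == 2012" as a soft match:
p = executor(R, p_prev, lambda r: -abs(r[0] - 2012))
print(p.round(3))   # attention mass concentrates on the 2012 row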
• 136. 3.2 Semantic trees
Yin, Pengcheng, et al. "Neural enquirer: Learning to query tables with natural language." arXiv preprint arXiv:1512.00965 (2015).
SbS (step-by-step training) is N2N (end-to-end training) with bias.
SEMPRE is a toolkit for training semantic parsers, which map natural language utterances to denotations (answers) via intermediate logical forms. Here's an example for querying databases. https://nlp.stanford.edu/software/sempre/
• 137. 3.2 Semantic trees
(same as above) 😳
  • 138. 3.3 Re-entry Task: given a set of facts or a story, answer questions on those facts in a logically consistent way. Problem: Theoretically, it could be achieved by a language modeler such as a recurrent neural network (RNN). However, their memory (encoded by hidden states and weights) is typically too small, and is not compartmentalized enough to accurately remember facts from the past (knowledge is compressed into dense vectors). RNNs are known to have difficulty in memorization tasks, i.e., outputting the same input sequence they have just read. Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014).
• 139. Task: given a set of facts or a story, answer questions on those facts in a logically consistent way.
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014).
Jason went to the kitchen.     → jason go kitchen
Jason picked up the milk.      → jason get milk
Jason travelled to the office. → jason go office
Jason left the milk there.     → jason drop milk
Jason went to the bathroom.    → jason go bathroom
Where is the milk?             → milk?
3.3 Re-entry
• 140. Memory networks
At the reading stage...
F = set of facts (e.g. recent context of conversation; sentences)
M = memory
for fact f in F:
    # any linguistic pre-processing may apply
    store f in the next available memory slot of M
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014).
3.3 Re-entry
• 141. Memory networks
At the inference stage...
F = set of facts (e.g. recent context of conversation; sentences)
M = memory
O = inference module (oracle)
x = input
q = query about our known facts
k = number of candidate facts to be retrieved
m = a supporting fact (typically, the one that maximizes support)
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014).
3.3 Re-entry
• 142. Memory networks
At the inference stage...
(same definitions as above)
jason go kitchen
jason get milk
jason go office
jason drop milk
jason go bathroom
3.3 Re-entry
• 143. Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014).
At the inference stage, t1...
query: milk
Candidate space for k = 2:
jason go kitchen
jason get milk
jason go office
jason drop milk
jason go bathroom
argmax over that space: (milk, [ ])
3.3 Re-entry
• 144. Memory networks
At the inference stage, t2...
query: milk
Candidate space for k = 2:
jason go kitchen
jason get milk
jason go office
jason drop milk
jason go bathroom
argmax over that space: (milk, jason, drop)
3.3 Re-entry
• 145. Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014).
At the inference stage, t2...
jason go kitchen
jason get milk
jason go office
jason drop milk
jason go bathroom
Why not (jason, get, milk)? Because it does not contribute as much new information as (jason, go, office).
3.3 Re-entry
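The two hops can be mocked end to end; the learned scoring function s_O is replaced here by word overlap plus ad-hoc heuristics, so this only illustrates the control flow, not the actual model:

facts = ["jason go kitchen", "jason get milk", "jason go office",
         "jason drop milk", "jason go bathroom"]

def overlap(words, fact):
    return len(set(words) & set(fact.split()))

# Hop 1: best-supporting fact for the query; ties broken by recency.
q = ["milk"]
m1 = max(facts, key=lambda f: (overlap(q, f), facts.index(f)))

# Hop 2: toy stand-in for s_O(x, [q, m1], f): the most recent fact before
# m1 that shares the actor and adds new (location) information.
i = facts.index(m1)
m2 = next(f for f in reversed(facts[:i]) if "go" in f.split())

print(m1)   # jason drop milk
print(m2)   # jason go office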
• 146. Memory networks
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014).
Training is performed with a margin ranking loss and stochastic gradient descent (SGD).
Simple model. Easy to integrate with any pipeline already working with structured facts (e.g. the Neural Enquirer). Adds a temporal dimension to it.
Answering the question about the location of the milk requires comprehension of the actions picked up and left.
3.3 Re-entry
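For reference, a margin ranking loss in one line: push the true supporting fact's score above any other fact's by at least a margin gamma. The score function and gamma below are toy assumptions:

def margin_ranking_loss(score, query, f_true, f_negatives, gamma=0.1):
    # Zero loss once every negative is beaten by at least gamma;
    # SGD on this loss is what shapes the embeddings behind score().
    return sum(max(0.0, gamma - score(query, f_true) + score(query, f_neg))
               for f_neg in f_negatives)

toy_score = lambda q, f: len(set(q.split()) & set(f.split()))
print(margin_ranking_loss(toy_score, "where is the milk",
                          "jason drop milk", ["jason go bathroom"]))   # 0.0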
• 147. Memory networks
3.3 Re-entry
Results (accuracy): QACNN / CBT-NE / CBT-CN
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014): 69.4 / 66.6 / 63.0
Kadlec, Rudolf, et al. "Text understanding with the attention sum reader network." arXiv preprint arXiv:1603.01547 (2016): 75.4 / 71.0 / 68.9 (4 Mar 16)
Sordoni, Alessandro, et al. "Iterative alternating neural attention for machine reading." arXiv preprint arXiv:1606.02245 (2016): 76.1 / 72.0 / 71.0 (7 Jun 16)
Trischler, Adam, et al. "Natural language comprehension with the epireader." arXiv preprint arXiv:1606.02270 (2016): 74.0 / 71.8 / 70.6 (7 Jun 16)
Dhingra, Bhuwan, et al. "Gated-attention readers for text comprehension." arXiv preprint arXiv:1606.01549 (2016): 77.4 / 71.9 / 69.0 (5 Jun 16)
• 148. Results on MovieQA (response accuracy, %), by knowledge source:
Memory Networks:           KB 78.5, IE 63.4, Wikipedia 69.9
Key-Value Memory Networks: KB 93.9, IE 68.3, Wikipedia 76.2
Other reference points shown: No Knowledge (embeddings) 54.4%; Standard QA System on KB 93.5%; 17%
Source: Antoine Bordes, Facebook AI Research, LXMLS Lisbon July 28, 2016
• 150. Summary of desiderata
3. Semantic parsing
 1. Taxonomy-driven supersense inference → Extend the neuralized architecture's domain by integrating support for supersenses. Enable inference at training and test time.
 2. Semantically-informed syntactic trees → Enable semantic parsing using a neuralized architecture with respect to some knowledge base serving as ground truth.
 3. Ability to handle decision-tree re-entries → Use memory networks to effectively keep track of the entities throughout the conversation and ensure logical consistency. Extend memory networks by integrating support for supersenses. Enable inference at training and test time.
• 151. Summary of desiderata
Remember?
Conversational technology should really be about
• dynamically finding the best possible way to browse a large repository of information/actions.
• finding the shortest path to any relevant action or piece of information (to avoid the plane dashboard effect).
• surfacing implicit data in unstructured content ("bocadillo de calamares in Madrid"). Rather than going open-domain, taking the closed-domain and going deep into it.