Byron Galbraith, Chief Data Scientist, Talla, at MLconf SEA 2017

Byron Galbraith, PhD
Co-founder / Chief Data Scientist, Talla
MLConf Seattle
2017.05.19
Neural Information Retrieval & Conversational Question Answering

Intelligent Conversational Service Desk
Conversational Knowledge Base
Conversational Ticketing System
Intelligent Workflows
Human in the Loop
Talla gets smarter, faster. Stay in control.

Context and ambiguity are significant challenges
Time flies like an arrow; fruit flies like a banana.

AI to the Rescue?
https://xkcd.com/1831/

With apologies to George Box
All chatbots are dumb, but some are useful.

Question Answering is the most compelling use case for chatbots
[Diagram: Q&A at the intersection of NLP and IR]

Neural Information Retrieval
[Figure 1 from Mitra and Craswell (2017): percentage of ACM SIGIR conference papers related to neural IR, by manual inspection of paper titles. 2014: 1%; 2015: 4%; 2016: 8%; 2017: 21%. The trend shows the growing popularity of neural IR.]

(Neural) Information Retrieval System
Query → Generate Representation → q
Docs → Generate Representation → D
q, D → Estimate Relevance
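The pipeline above can be illustrated with a minimal, non-neural Python sketch. The representation (a lowercase bag-of-words count vector), the relevance function (cosine similarity), and the example documents are all assumptions chosen for illustration; a neural IR model would replace one or both stages with learned components.

```python
import math
from collections import Counter


def generate_representation(text: str) -> Counter:
    # Hypothetical representation: lowercase bag-of-words term counts.
    return Counter(text.lower().split())


def estimate_relevance(q: Counter, d: Counter) -> float:
    # Cosine similarity between the query vector q and the document vector D.
    dot = sum(count * d[term] for term, count in q.items())
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    norm_d = math.sqrt(sum(c * c for c in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


def rank(query: str, docs: list[str]) -> list[tuple[float, str]]:
    # Represent the query once, represent each doc, then sort by estimated relevance.
    q = generate_representation(query)
    scored = [(estimate_relevance(q, generate_representation(doc)), doc) for doc in docs]
    return sorted(scored, reverse=True)


docs = ["reset your vpn password", "order a new laptop", "vpn setup guide"]
for score, doc in rank("how do I reset my vpn password", docs):
    print(f"{score:.3f}  {doc}")
```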

Word Embeddings
Query → Generate Representation → q
Docs → Generate Representation → D
q, D → Estimate Relevance

Word Embeddings
[Figure 2 from Mitra et al. (2016): the architecture of a word2vec (CBOW) model]
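A toy sketch of the CBOW forward pass, assuming a made-up eight-word vocabulary, 4-dimensional vectors, and random untrained weights: the IN embeddings of the context words are averaged into a hidden vector, which is scored against every word's OUT embedding and passed through a softmax. Training (omitted) would adjust `W_in` and `W_out` to raise the probability of the observed center word; practical word2vec implementations use negative sampling or hierarchical softmax instead of the full softmax shown here.

```python
import math
import random

random.seed(0)
vocab = ["time", "flies", "like", "an", "arrow", "fruit", "a", "banana"]
index = {word: i for i, word in enumerate(vocab)}
dim = 4

# Two weight matrices: IN embeddings (context side) and OUT embeddings (output side).
W_in = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]


def cbow_forward(context: list[str]) -> list[float]:
    """Return P(center word | context words) for every word in the vocabulary."""
    # 1. Hidden vector h = average of the context words' IN embeddings.
    h = [0.0] * dim
    for word in context:
        for k, x in enumerate(W_in[index[word]]):
            h[k] += x / len(context)
    # 2. Score each candidate center word by dot(OUT embedding, h).
    scores = [sum(o * hk for o, hk in zip(W_out[j], h)) for j in range(len(vocab))]
    # 3. Softmax, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = cbow_forward(["time", "like", "an"])
best = max(range(len(vocab)), key=lambda j: probs[j])
print("predicted center word (untrained):", vocab[best])
```

The two matrices correspond to the IN and OUT embedding spaces that Mitra et al. (2016) combine in their DESM model.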

Learning to Rank
Query → Generate Representation → q
Docs → Generate Representation → D
q, D → Estimate Relevance

Learning to Rank
Huang et al. (2013)
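Huang et al. (2013) is the Deep Structured Semantic Model (DSSM) paper. Its training objective treats ranking as classification: the clicked document's cosine similarity to the query, scaled by a smoothing factor gamma, is pushed up relative to a few sampled unclicked documents through a softmax. The sketch below shows only that loss; the word-hashing layer and the deep network that produce the semantic vectors are omitted, the toy vectors are made up, and `gamma=10.0` is an assumed value.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def dssm_loss(y_q, y_pos, y_negs, gamma=10.0):
    """-log P(D+ | Q): softmax over gamma-scaled cosine similarities between the
    query vector and the clicked (y_pos) plus sampled unclicked (y_negs) doc vectors."""
    scores = [gamma * cosine(y_q, y_pos)] + [gamma * cosine(y_q, y) for y in y_negs]
    top = max(scores)
    log_partition = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_partition - scores[0]


# Toy semantic vectors standing in for the network's outputs.
query = [1.0, 0.2]
clicked = [0.9, 0.3]
unclicked = [[-0.5, 1.0], [0.1, -1.0], [-1.0, 0.0], [0.0, 1.0]]
print(dssm_loss(query, clicked, unclicked))
```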

End-to-End Models
Query → Generate Representation → q
Docs → Generate Representation → D
q, D → Estimate Relevance

End-to-End Models
Severyn and Moschitti (2015)
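Severyn and Moschitti (2015) learn the query and document representations jointly with the relevance function: each text is encoded by a convolutional layer with max pooling, the two encodings are compared through a bilinear similarity x_q^T M x_d with a learned matrix M, and a join layer concatenates the encodings, the similarity, and optional extra features before a hidden layer and softmax. The sketch covers only the forward pieces up to the join layer, with hand-set toy weights; the ReLU activation, function names, and data are assumptions, and the hidden layer, softmax, and training are omitted.

```python
def conv_max_pool(sentence, filters, width):
    """Convolve each filter over windows of `width` consecutive word vectors,
    apply ReLU, and keep the maximum response per filter (max pooling)."""
    windows = [sum(sentence[i:i + width], []) for i in range(len(sentence) - width + 1)]
    return [
        max(max(0.0, sum(w * x for w, x in zip(filt, win))) for win in windows)
        for filt in filters
    ]


def bilinear_similarity(x_q, M, x_d):
    """sim(x_q, x_d) = x_q^T M x_d, where M is a learned matrix."""
    return sum(x_q[i] * M[i][j] * x_d[j] for i in range(len(x_q)) for j in range(len(x_d)))


def join_layer(x_q, M, x_d, extra_features=()):
    """Concatenate [x_q; sim; x_d; extra features] as input to the hidden layer."""
    return list(x_q) + [bilinear_similarity(x_q, M, x_d)] + list(x_d) + list(extra_features)


# Toy 2-d word vectors and two filters of width 2 (so each filter has 2 * 2 weights).
query_words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
doc_words = [[0.5, 0.5], [1.0, 0.0]]
filters = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
M = [[1.0, 0.0], [0.0, 1.0]]  # identity: the similarity reduces to a dot product

x_q = conv_max_pool(query_words, filters, width=2)
x_d = conv_max_pool(doc_words, filters, width=2)
print(join_layer(x_q, M, x_d))  # [2.0, 2.0, 4.0, 0.5, 1.5]
```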

Neural IR Resources
Mitra and Craswell (2017). Neural Models for Information Retrieval. https://arxiv.org/abs/1705.01509
Mitra and Craswell (2017). Neural Text Embeddings for IR. WSDM 2017 Tutorial. https://www.slideshare.net/BhaskarMitra3/neural-text-embeddings-for-information-retrieval-wsdm-2017
Zhang et al. (2016). Neural Information Retrieval: A Literature Review. https://arxiv.org/abs/1611.06792
Neu-IR Workshop at SIGIR. http://neu-ir.weebly.com/

Neural IR for Q&A
Conversational Knowledge Base

Conversational Knowledge Base
Goal: Automatically and efficiently answer employees’ requests
Method:
1. Respond with a high-confidence answer from the KB
2. Suggest up to four similar questions from the KB
3. Provide an easy path to service desk representatives, and enable reps to train the system
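The three-step method maps naturally onto a confidence cascade. In this sketch the thresholds, the `Match` record, and the response format are hypothetical; scores are assumed to come from some retrieval model normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Match:
    question: str
    answer: str
    score: float  # relevance score in [0, 1] from whichever retrieval model is used


# Hypothetical thresholds; in practice they would be tuned on real traffic.
ANSWER_THRESHOLD = 0.8
SUGGEST_THRESHOLD = 0.3
MAX_SUGGESTIONS = 4


def respond(matches: list[Match]) -> dict:
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    # 1. Answer directly only when the best match is high confidence.
    if ranked and ranked[0].score >= ANSWER_THRESHOLD:
        return {"action": "answer", "text": ranked[0].answer}
    # 2. Otherwise suggest up to four similar KB questions, keeping a rep one click away.
    similar = [m.question for m in ranked if m.score >= SUGGEST_THRESHOLD]
    if similar:
        return {"action": "suggest", "questions": similar[:MAX_SUGGESTIONS], "offer_rep": True}
    # 3. Nothing useful in the KB: hand off to a service desk representative.
    return {"action": "escalate_to_rep"}


print(respond([Match("How do I reset my VPN password?", "Use the self-service portal.", 0.92)]))
```

Here the rep hand-off is a fallback plus an `offer_rep` flag on suggestions; the slide only says the path to a rep should be easy, so other designs (for example, always showing it) fit just as well.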

Word embeddings are susceptible to out-of-vocabulary terms
Problem: Out-of-Vocabulary (OOV) Terms
Unseen at training time
Skipped for being too rare
Can be highly discriminative
What does cromulent mean?
What does bigly mean?
Once each OOV term is mapped to a shared unknown token, the two questions become indistinguishable:
What does UNK mean?
What does UNK mean?
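The collapse is easy to reproduce. This sketch assumes a tiny hypothetical vocabulary and whitespace tokenization; any model that only sees the resulting token lists cannot tell the two questions apart.

```python
UNK = "UNK"


def to_tokens(question: str, vocab: set[str]) -> list[str]:
    # Lowercase, drop the trailing "?", and map every out-of-vocabulary word to UNK.
    return [w if w in vocab else UNK for w in question.lower().rstrip("?").split()]


# Hypothetical training vocabulary in which both rare words are missing.
vocab = {"what", "does", "mean"}
a = to_tokens("What does cromulent mean?", vocab)
b = to_tokens("What does bigly mean?", vocab)
print(a)       # ['what', 'does', 'UNK', 'mean']
print(a == b)  # True: the discriminative word is gone
```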

OOV can be overcome through ensembling
Solution: Out-of-Vocabulary Terms
Infer embedding from local context
Ensemble with term-frequency methods
Mitra et al. (2016)

Table 4: Results of NDCG evaluations under the non-telescoping settings. Both the DESM and the LSA models perform poorly in the presence of random irrelevant documents in the candidate set. The mixture of DESM (IN-OUT) with BM25 achieves the best NDCG. Asterisks (*) mark statistically significant (p < 0.05) differences from the BM25 baseline.

                                            Explicitly judged test set   Implicit feedback test set
Model                                       NDCG@1   NDCG@3   NDCG@10    NDCG@1   NDCG@3   NDCG@10
BM25                                        21.44    26.09    37.53      11.68    22.14    33.19
LSA                                         04.61*   04.63*   04.83*     01.97*   03.24*   04.54*
DESM (IN-IN, trained on body text)          06.69*   06.80*   07.39*     03.39*   05.09*   07.13*
DESM (IN-IN, trained on queries)            05.56*   05.59*   06.03*     02.62*   04.06*   05.92*
DESM (IN-OUT, trained on body text)         01.01*   01.16*   01.58*     00.78*   01.12*   02.07*
DESM (IN-OUT, trained on queries)           00.62*   00.58*   00.81*     00.29*   00.39*   01.36*
BM25 + DESM (IN-IN, trained on body text)   21.53    26.16    37.48      11.96    22.58*   33.70*
BM25 + DESM (IN-IN, trained on queries)     21.58    26.20    37.62      11.91    22.47*   33.72*
BM25 + DESM (IN-OUT, trained on body text)  21.47    26.18    37.55      11.83    22.42*   33.60*
BM25 + DESM (IN-OUT, trained on queries)    21.54    26.42*   37.86*     12.22*   22.96*   34.11*
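A sketch of both remedies, loosely following Mitra et al. (2016): the DESM score averages, over query words, the cosine between each word's IN vector and the centroid of the document words' normalized OUT vectors, and the mixture combines it with BM25 as alpha * DESM + (1 - alpha) * BM25. The OOV fallback (average the IN vectors of the other query words), the BM25 parameters k1 = 1.2 and b = 0.75 with a Lucene-style idf, `alpha=0.1`, and the toy data are assumptions; alpha would normally be tuned on training data.

```python
import math
from collections import Counter


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def mean_vector(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def infer_oov_embedding(context, emb):
    """Hypothetical OOV fallback: average the embeddings of the known context words."""
    known = [emb[w] for w in context if w in emb]
    return mean_vector(known) if known else None


def desm(query, doc, in_emb, out_emb):
    """DESM (IN-OUT): mean cosine between each query word's IN vector and the
    centroid of the document words' normalized OUT vectors."""
    doc_vecs = [out_emb[w] for w in doc if w in out_emb]  # doc words without a vector are skipped
    if not doc_vecs:
        return 0.0
    centroid = mean_vector([[x / norm(v) for x in v] for v in doc_vecs])
    if norm(centroid) == 0:
        return 0.0
    sims = []
    for i, w in enumerate(query):
        # Known word: use its IN vector; OOV word: infer one from the rest of the query.
        q = in_emb[w] if w in in_emb else infer_oov_embedding(query[:i] + query[i + 1:], in_emb)
        if q is not None and norm(q) > 0:
            sims.append(sum(a * b for a, b in zip(q, centroid)) / (norm(q) * norm(centroid)))
    return sum(sims) / len(sims) if sims else 0.0


def bm25(query, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25; exact term matching still scores rare OOV terms."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in corpus if term in d)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score


def mixture(query, doc, corpus, in_emb, out_emb, alpha=0.1):
    """alpha * DESM + (1 - alpha) * BM25; alpha is a hypothetical value to be tuned."""
    return alpha * desm(query, doc, in_emb, out_emb) + (1 - alpha) * bm25(query, doc, corpus)


# Toy data (hypothetical); the same vectors serve as IN and OUT embeddings here.
emb = {"vpn": [1.0, 0.0], "reset": [0.0, 1.0], "password": [0.5, 0.5]}
corpus = [["reset", "vpn", "password"], ["order", "laptop"], ["cromulent", "glossary"]]
query = ["reset", "cromulent", "vpn"]
for doc in corpus:
    print(doc, round(mixture(query, doc, corpus, emb, emb), 3))
```

The third toy document shows the point of ensembling: none of its words has an embedding, so DESM scores it 0, yet BM25 still rewards the exact match on `cromulent`.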

Deep Learning methods have both training and operational challenges
Problem: Operationalizing Deep Learning
Requires a lot of labeled data
UX requires online, one-shot learning
Poor interpretability; hard to debug
Performance gain vs. model complexity
Model persistence with auto-scaling infrastructure

In this case, Deep Learning is better suited for offline scenarios
Solution: Operationalizing Deep Learning
Use linear models instead for online / nearline tasks
Use deep learning for offline and global tasks, e.g., generating new word embeddings
https://xkcd.com/1838/
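As an example of a linear model suited to online use, here is a sparse logistic regression updated by one stochastic gradient step per labeled example, so a single piece of feedback changes predictions immediately. The feature names (such as `pair:17`), the learning rate, and the feedback scenario are hypothetical.

```python
import math
from collections import defaultdict


class OnlineLogisticRegression:
    """Sparse logistic regression trained one example at a time with plain SGD."""

    def __init__(self, learning_rate: float = 0.5):
        self.lr = learning_rate
        self.weights = defaultdict(float)  # feature name -> weight (easy to inspect)
        self.bias = 0.0

    def predict_proba(self, features: dict[str, float]) -> float:
        z = self.bias + sum(self.weights.get(f, 0.0) * v for f, v in features.items())
        # Numerically stable sigmoid.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, features: dict[str, float], label: int) -> None:
        # For log loss, d(loss)/dz = p - y, so one example yields one gradient step.
        error = self.predict_proba(features) - label
        self.bias -= self.lr * error
        for f, v in features.items():
            self.weights[f] -= self.lr * error * v


# Hypothetical features for "does KB pair #17 answer this question?"
x = {"term:vpn": 1.0, "term:password": 1.0, "pair:17": 1.0}
model = OnlineLogisticRegression()
print(round(model.predict_proba(x), 3))  # 0.5 before any feedback
model.update(x, label=1)                 # a rep confirms the match once
print(round(model.predict_proba(x), 3))  # 0.731 after one update
```

Because the weights live in a plain dictionary keyed by readable feature names, they are also easy to inspect and debug.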

The user controls the question-answer pairs in the knowledge base
Problem: User-Trained Agent
End users can update the knowledge base ad hoc
New Q&A pairs should be accessible immediately
Real-time, one-shot learning expected

IR-based methods give us the interpretability and speed needed for a reliable UX
Solution: User-Trained Agent
Fully inspectable, editable KB via web interface
Cascade of fast online and nearline models
Linear models and term-frequency features are easier to debug and modify
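A term-frequency index illustrates why IR-based methods suit a user-trained KB: there is no training step, so a newly added or edited Q&A pair is retrievable on the very next query. The class, the regex tokenizer, and the log(1 + N/df) idf weighting are assumptions for this sketch; in a cascade this could serve as the fast online first stage, with slower nearline models reranking its candidates.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


class QAIndex:
    """Hypothetical in-memory inverted index over KB questions.
    There is no training step: an added or edited pair is searchable at once."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {pair_id: term frequency}
        self.pairs = {}                    # pair_id -> (question, answer)

    def remove(self, pair_id):
        question, _ = self.pairs.pop(pair_id)
        for term in set(tokenize(question)):
            self.postings[term].pop(pair_id, None)

    def add(self, pair_id, question, answer):
        if pair_id in self.pairs:  # editing = drop the old postings, then re-add
            self.remove(pair_id)
        self.pairs[pair_id] = (question, answer)
        for term, tf in Counter(tokenize(question)).items():
            self.postings[term][pair_id] = tf

    def search(self, query, k=5):
        """Score pairs by summed tf * idf over matching query terms."""
        scores = Counter()
        for term in set(tokenize(query)):
            matches = self.postings.get(term, {})
            if not matches:
                continue
            idf = math.log(1 + len(self.pairs) / len(matches))
            for pair_id, tf in matches.items():
                scores[pair_id] += tf * idf
        return [(pid, self.pairs[pid][1], score) for pid, score in scores.most_common(k)]


kb = QAIndex()
kb.add("q1", "How do I reset my VPN password?", "Use the self-service portal.")
kb.add("q2", "How do I order a new laptop?", "File a hardware request.")
print(kb.search("vpn password reset"))
kb.add("q3", "What does cromulent mean?", "It means acceptable.")  # live immediately
print(kb.search("cromulent"))
```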

Conversational interfaces have their own user behavioral quirks
Problem: Users Don’t Read
Skim, Assume, Respond

Give the user every opportunity to succeed
Solution: Users Don’t Read
Constrain interaction expectations
Hybrid interfaces

Summary
Productizing conversational Q&A is not just about algorithms
Neural IR is an exciting and fast-growing field
Chatbots can actually be useful
