Detecting Actionable Items in Meetings with Convolutional Deep Structured Semantic Models

Detecting Actionable Items in Meetings by
Convolutional Deep Structured Semantic Models
YUN-NUNG (VIVIAN) CHEN
DILEK HAKKANI-TÜR
XIAODONG HE
AUG 7TH, 2015
REDMOND, WA
DETECTING ACTIONABLE ITEMS IN MEETINGS BY CONVOLUTIONAL DEEP STRUCTURED SEMANTIC MODELS 1

My Background
Yun-Nung (Vivian) Chen http://vivianchen.idv.tw
PhD candidate working with Prof. Alexander I Rudnicky
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
Research focus
◦ spoken dialogue system
◦ unsupervised/weakly-supervised spoken language understanding
Dissertation
◦ Title: Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogue Systems
◦ Committee: Prof. Alexander I. Rudnicky, Prof. Anatole Gershman, Prof. Alan W Black, Dr. Dilek Hakkani-Tür

Outline
Introduction
◦ Motivation
◦ Task Definition
Proposed Approach
◦ Convolutional Deep Structured Semantic Model (CDSSM)
◦ Adaptation
◦ Actionable Item Detection
Experiments
Conclusions
Ongoing & Future Work

Outline
Introduction
◦ Motivation
◦ Task Definition
Proposed Approach
◦ Adaptation
Experiments
Conclusions

Motivation
Meetings as a rich resource contain important information for participants.
◦ Meeting summarization (Xie et al., 2009; Riedhammer et al., 2010; Chen and Metze, 2013)
◦ Decision detection (Fernandez et al., 2008)
◦ Noteworthy utterance detection (Banerjee and Rudnicky, 2009)
◦ Action item identification (Yang et al., 2008)

Motivation
Computing devices have been easily accessible and information search has been a common part
of regular conversations.
Meetings include discussions for identifying participants’ next actions.

Motivation - Example
Will Vivian come here for the meeting? Find Calendar Entry

The ACL Best Student Paper is great! That is about .. Web Search

Let’s discuss the rebuttal, maybe Monday afternoon. Create Calendar Entry

Motivation
Some actionable items
◦ should be processed in real-time during the meeting, e.g. search, find_calendar_entry
◦ can be done after the meeting, e.g. create_reminder, create_calendar_entry
The advantages include
◦ seamless
◦ users do not need to directly address actions
◦ make meeting continue as planned, not interrupted by system errors

Motivation
3944.620 3945.990 me018 Have they ever responded to you?
3945.703 3946.325 me011 Nope.
find_email: check emails of me011, search for any emails from them
2049.280 2057.010 mn015 Yeah it's - or - or just - Yeah. It's also all on my - my home page at E_M_L. It's called "An
Anatomy of afind Spatial Description". But I'll send that link.
send_email: email all participants about "link to An Anatomy of Spatial Description"
2867.655 2888.613 mn015 I suggest w- to - for - to proceed with this in - in the sense that maybe, throughout this week,
the three of us will - will talk some more about maybe segmenting off different regions, and
we make up some - some toy a- observable "nodes" - is that what th-
create_calendar_entry: open calendars of participants, marking times free for the three participants and schedule an event
 Action Detection/Identification
stime etime spk_id trans

Motivation
3944.620 3945.990 me018 Have <from_contact_name>they</from_contact_name> ever responded to
<contact_name>you</contact_name>?
3945.703 3946.325 me011 Nope.
find_email: check emails of me011, search for any emails from them
2049.280 2057.010 mn015 Yeah it's - or - or just - Yeah. It's also all on my - my home page at E_M_L. It's called "An
Anatomy of afind Spatial Description". But I'll send <email_content>that link</email_content>.
send_email: email all participants about "link to An Anatomy of Spatial Description"
2867.655 2888.613 mn015 I suggest w- to - for - to proceed with this in - in the sense that maybe, <date>throughout this
week</date>, the <contact_name>three of us</contact_name> will - will talk some more
about maybe segmenting off different regions, and we make up some - some toy a- observable
"nodes" - is that what th-
create_calendar_entry: open calendars of participants, marking times free for the three participants and schedule an event
 Coreference Resolution
stime etime spk_id trans
 Action Detection/Identification  Slot Filling

Pipeline
Automatic Speech
Recognition (ASR)
Speaker
Diarization
Coreference
Resolution
Sentence
Segmentation
Slot Filling
Action Detection
3944.620 3945.990Have they ever responded to you? me018 find_email
from_contact_name contact_name

Outline
Introduction
◦ Motivation
◦ Task Definition
Proposed Approach
◦ Adaptation
Experiments
Conclusions

Task Definition
Actionable Item Detection
◦ Goal: provide the easy access to information and perform actions a personal assistant can handle
without interrupting the meetings
◦ Assumption: some actions and associated arguments can be shared across genres
Human-Machine Genre create_calendar_entry
schedule a meeting with <contact_name>John</contact_name> <start_time>this afternoon</start_time>
Human-Human Genre create_calendar_entry
how about the <contact_name>three of us</contact_name> discuss this later <start_time>this afternoon</start_time>?
 more casual, include conversational terms
Task: multi-class utterance classification
• train on the available human-machine genre
• test on the human-human genre
Adaptation
• model adapation
• embedding vector adaptation

Outline
Introduction
◦ Motivation
◦ Task Definition
Proposed Approach
◦ Adaptation
Experiments
Conclusions

Proposed Approach
Idea: Cortana data (human-machine) may help detect actionable items in meetings (human-human)
12 30 on Monday the 2nd of July schedule meeting to discuss taxes with Bill,
Judy and Rick.
12 PM lunch with Jessie
12:00 meeting with cleaning staff to discuss progress
2 hour business presentation with Rick Sandy on March 8 at 7am
create_calendar_entry
A read receipt should be sent in Karen's email.
Activate meeting delay by 15 minutes. Inform the participants.
Add more message saying "Thanks".
send_email
Query Doc
Convolutional deep structured semantic models for IR may be useful for this task.

Outline
Introduction
◦ Motivation
◦ Task Definition
Proposed Approach
◦ Adaptation
Experiments
Conclusions

Convolutional Deep Structured Semantic Model (CDSSM)
20K 20K 20K
1000
w1 w2 w3
1000 1000
20K
wd
1000
300
Word Sequence: x
Word Hashing Matrix: Wh
Word Hashing Layer: lh
Convolution Matrix: Wc
Convolutional Layer: lc
Max Pooling Operation
Max Pooling Layer: lm
Semantic Projection Matrix: Ws
Semantic Layer: y
max max max
300 300 300 300
U A1 A2 An
CosSim(U, Ai)
P(A1 | U) P(A2 | U) P(An | U)
…
Semantic Similarity
Posterior Probability
Utterance
Action
𝑃 𝐴 𝑈 =
exp(𝐶𝑜𝑠𝑆𝑖𝑚(𝑈, 𝐴))
𝐴′ exp(𝐶𝑜𝑠𝑆𝑖𝑚(𝑈, 𝐴′))how about the we discuss this later
Shen et al., “A latent semantic model with convolutional-pooling structure for information retrieval,” in CIKM, 2014.
Huang et al., “Learning deep structured semantic models for web search using click through data,” in CIKM, 2013.

The model objective is to maximize the likelihood of associated actions given training utterances:
Similarly, reversing actions and utterances induces
Convolutional Deep Structured Semantic Model (CDSSM)
300 300 300 300
A U1 U2 Un
P(U1 | A) P(U2 | A) P(Un | A)
…
Action
Utterance
 Predictive Model
 Generative Model
300 300 300 300
U A1 A2 An
P(A1 | U) P(A2 | U) P(An | U)
…
Utterance
Action

Outline
Introduction
◦ Motivation
◦ Task Definition
Proposed Approach
◦ Adaptation
Experiments
Conclusions

Adaptation
Issue: mismatch between a source genre and a target genre
Solution:
◦ Adapting CDSSM
◦ Continually train the CDSSM using the data from the target genre (usually stop early before fully
converged)
◦ More robust because of data from different genres and specific to the target genre
◦ Adapting Action Embeddings
◦ Moving learned action embeddings close to the observed corresponding utterance embeddings

Adapting Action Embeddings
The mismatch may result in inaccurate action embeddings
Action1
Action2
Action3
utt utt
utt
utt
utt
utt
utt
Action embedding: trained on the source genre
Utterance embedding: generated on the target genre
Idea: moving action embeddings close to the observed
utterance embeddings from the target genre

Learning adapted action embeddings by minimizing an objective:
Action1
Action2
Action3
utt utt
utt
utt
utt
utt
utt
The distance between original and new action embeddings
The distance between new action embeddings and corresponding utterance embeddings
Action3
Action1
Action2

adapted action embeddings
The objective considering action and utterance embeddings together:

Outline
Introduction
◦ Motivation
◦ Task Definition
Proposed Approach
◦ Adaptation
Experiments
Conclusions

Goal: predict possible actions given each utterance
Before adaptation, the estimated semantic score of the k-th action
After adaptation, the estimated semantic score of the k-th action
Two ways of usage:
1. As final prediction scores
2. As features of a classifier: the trained classifier outputs the final prediction scores

Unidirectional: prediction score from one of
◦ Predictive model: SP(U, A)
◦ Generative model: SG(U, A)
Bidirectional: prediction score merging both scores
300 300 300 300
A U1 U2 Un
SG(U, A)
…
Action
Utterance
300 300 300 300
U A1 A2 An
SP(U, A)
…
Utterance
Action

Outline
Introduction
◦ Motivation
◦ Task Definition
Proposed Approach
◦ Adaptation
Experiments
Conclusions

Experiments
Dataset: 22 meetings from the ICSI meeting corpus
Meeting type: Bed, Bmr, Bro
Identified action: find_calendar_entry, create_calendar_entry, open_agenda, add_agenda_item,
create_single_reminder, send_email, find_email, make_call, search, open_setting
Annotating agreement: Cohen’s Kappa = 0.64~0.67
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Bed
Bmr
Bro
create_single_reminder find_calendar_entry search add_agenda_item create_calendar_entry
open_agenda send_email find_email make_call open_setting

Experiments
Evaluation metrics: the average area under the precision-recall curve (AUC) for 10 actions+others
CDSSM training:
◦ Mismatch-CDSSM: trained on Cortana data (300 iters)
◦ Adapt-CDSSM: trained on Cortana data and then continually trained on meeting data (each 150 iters)
◦ Match-CDSSM: trained on meeting data (300 iters)
Approach #dim
Mismatch-CDSSM Adapt-CDSSM Match-CDSSM
P(A|U) P(U|A) Bidir P(A|U) P(U|A) Bidir P(A|U) P(U|A) Bidir
w/o
SVM
Sim 47.45 48.17 49.10 48.67 50.09 50.36 56.33 43.39 50.57
AdaptSim 54.00 53.89 55.82 59.46 56.96 60.08 64.19 60.36 62.34

Experiments
CDSSM training:
Approach #dim
w/o
SVM
Sim 47.45 48.17 49.10 48.67 50.09 50.36 56.33 43.39 50.57
AdaptSim 54.00 53.89 55.82 59.46 56.96 60.08 64.19 60.36 62.34
w/
SVM
Embeddings 300 53.07 48.07 55.71 60.06 59.03 63.95 64.33 65.58 69.27

Experiments
CDSSM training:
Approach #dim
w/o
SVM
Sim 47.45 48.17 49.10 48.67 50.09 50.36 56.33 43.39 50.57
AdaptSim 54.00 53.89 55.82 59.46 56.96 60.08 64.19 60.36 62.34
w/
SVM
Embeddings 300 53.07 48.07 55.71 60.06 59.03 63.95 64.33 65.58 69.27
+ Sim 311 52.80 54.95 59.09 60.78 60.29 65.08 64.52 64.81 68.86

Experiments
CDSSM training:
Approach #dim
w/o
SVM
Sim 47.45 48.17 49.10 48.67 50.09 50.36 56.33 43.39 50.57
AdaptSim 54.00 53.89 55.82 59.46 56.96 60.08 64.19 60.36 62.34
w/
SVM
Embeddings 300 53.07 48.07 55.71 60.06 59.03 63.95 64.33 65.58 69.27
+ Sim 311 52.80 54.95 59.09 60.78 60.29 65.08 64.52 64.81 68.86
+ AdaptSim 311 52.75 55.22 59.23 61.60 61.13 65.71 64.72 65.39 69.08

Comparing different CDSSM training data
The embeddings pretrained on the mismatched data are successfully adjusted to the target genre
Matched data produces the best performance
◦ Not robust enough, even worse than the mismatch model
Better robustness from the bidirectional model, the embedding adaptation, and an additional classifier
Approach #dim
w/o
SVM
Sim 47.45 48.17 49.10 48.67 50.09 50.36 56.33 43.39 50.57
AdaptSim 54.00 53.89 55.82 59.46 56.96 60.08 64.19 60.36 62.34
w/
SVM
Embeddings 300 53.07 48.07 55.71 60.06 59.03 63.95 64.33 65.58 69.27
+ Sim 311 52.80 54.95 59.09 60.78 60.29 65.08 64.52 64.81 68.86
+ AdaptSim 311 52.75 55.22 59.23 61.60 61.13 65.71 64.72 65.39 69.08

Effectiveness of Bidirectional Estimation
Almost all results from the bidirectional estimation significantly outperform unidirectional ones.
Between predictive and generative models, no certain direction is better.
Predictive and generative models can compensate each other and provide more robust estimations.
Approach #dim
w/o
SVM
Sim 47.45 48.17 49.10 48.67 50.09 50.36 56.33 43.39 50.57
AdaptSim 54.00 53.89 55.82 59.46 56.96 60.08 64.19 60.36 62.34
w/
SVM
Embeddings 300 53.07 48.07 55.71 60.06 59.03 63.95 64.33 65.58 69.27
+ Sim 311 52.80 54.95 59.09 60.78 60.29 65.08 64.52 64.81 68.86
+ AdaptSim 311 52.75 55.22 59.23 61.60 61.13 65.71 64.72 65.39 69.08

Effectiveness of Adaptation
When the trained model mismatches to the task,
◦ the model adaptation
◦ the action embedding adaptation
are useful to overcome the genre mismatch.
Approach #dim
Mismatch-CDSSM Adapt-CDSSM
P(A|U) P(U|A) Bidir P(A|U) P(U|A) Bidir
w/o
SVM
Sim 47.45 48.17 49.10 48.67 50.09 50.36
AdaptSim 54.00 53.89 55.82 59.46 56.96 60.08
w/
SVM
Embeddings 300 53.07 48.07 55.71 60.06 59.03 63.95
+ Sim 311 52.80 54.95 59.09 60.78 60.29 65.08
+ AdaptSim 311 52.75 55.22 59.23 61.60 61.13 65.71
model adaptation
action embedding adaptation

Effectiveness of Adaptation
When the model matches with the target genre, action embedding adaptation still works.
We force action embeddings to be more accurate by decreasing the accuracy of others embeddings
◦ Less training samples for important action embeddings
◦ More training samples for others
Approach
Match-CDSSM
P(A|U) P(U|A) Bidir
w/o
SVM
Sim 56.33 43.39 50.57
AdaptSim 64.19 60.36 62.34 0.0
0.2
0.4
0.6
0.8
1.0
Sim
AdpSim
AUC

Effectiveness of CDSSM
Baselines
◦ Lexical: ngram
◦ Semantic: doc2vec (paragraph vector)
Classifier: SVM with RBF kernal
Approach AUC
Baseline
ngram 52.84
doc2vec 59.79
Proposed
CDSSM: P(A|U) 64.33
CDSSM: P(U|A) 65.58
CDSSM: Bidirectional 69.27
Semantic embeddings provide useful features for detecting actions.
Bidirectional estimation performs better than unidirectional.
All proposed approaches outperform baselines.
Le and Mikolov, “Distributed representations of sentences and documents,” in ICML, 2014.

Outline
Introduction
◦ Motivation
◦ Task Definition
Proposed Approach
◦ Adaptation
Experiments
Conclusions

Conclusions
The latent semantic features generated by CDSSM show the effectiveness of detecting actions in
meetings, outperform lexical features and semantic paragraph vectors.
The adaptation techniques adjust embeddings to fit the target genre when the target genre does not
match well with the source genre.
The experiments highlight a future research direction for language understanding.
◦ Release annotated dataset
◦ Release trained embeddings

Outline
Introduction
◦ Motivation
◦ Task Definition
Proposed Approach
◦ Adaptation
Experiments
Conclusions

Active learning: boosting the performance via few annotated actions
◦ Rank by score without a classifier
◦ Acquire annotations for top-ranked utterances
Robustness test: evaluate unseen action embeddings generated by CDSSM
◦ Unseen: send_email
◦ Observed: send_message, find_email

Q & A 
THANKS FOR YOUR ATTENTIONS!

Detecting Actionable Items in Meetings with Convolutional Deep Structured Semantic Models

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

En vedette

En vedette (8)

Similaire à Detecting Actionable Items in Meetings with Convolutional Deep Structured Semantic Models

Similaire à Detecting Actionable Items in Meetings with Convolutional Deep Structured Semantic Models (20)

Dernier

Dernier (20)

Detecting Actionable Items in Meetings with Convolutional Deep Structured Semantic Models