DEEP LEARNING JP
[DL Papers]
“Towards an Automatic Turing Test:
Learning to Evaluate Dialogue Responses (ACL 2017)”
Hiromi Nakagawa, Matsuo Lab
http://deeplearning.jp/
Agenda
1. Paper Information
2. Introduction
3. Related Works
4. Proposed Model
5. Experiments
6. Results
7. Discussion
1. Paper Information
• Authors
– Ryan Lowe1, Michael Noseworthy1, Iulian V. Serban1, Nicolas A.-Gontier1,
– Yoshua Bengio2,3, Joelle Pineau1,3
1. Reasoning and Learning Lab, School of Computer Science, McGill University
2. Montreal Institute for Learning Algorithms, Université de Montréal
3. CIFAR Senior Fellow
• ACL 2017
– https://arxiv.org/abs/1708.07149
• Summary
– Word-overlap metrics such as BLEU show almost no correlation with human judgments when used to evaluate dialogue generation
– Proposes ADEM, a method that evaluates generated responses with a model trained on human scores
– Verifies that the model can automatically output scores that correlate highly with human scores
• The implementation and a trained model are publicly available (https://github.com/mike-n-7/ADEM)
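2. Introduction
• History of dialogue system development
– Building (non-task-oriented) systems that can converse with humans in a "human-like" way has been one of the major goals throughout the history of AI research [Turing, 1950; Weizenbaum, 1966]
– Recently, neural networks have spurred active research on large-scale non-task-oriented dialogue systems [Sordoni et al., 2015b; Shang et al., 2015; Vinyals and Le, 2015; Serban et al., 2016a; Li et al., 2015]
– There are also cases where models trained end-to-end for a specific purpose have succeeded
• Google’s Smart Reply system [Kannan et al., 2016], Microsoft’s Xiaoice chatbot [Markoff and Mozur, 2015]
• On the other hand, performance evaluation has always been a challenge in dialogue system development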
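• Evaluating dialogue system performance
– Turing test: a human judges whether the counterpart is a system or a human
• Reasonable, but it has many constraints and scales poorly because it requires human evaluation
• Prone to bias unless the evaluation setup is designed very carefully
– Alternatively, humans subjectively rate dialogue quality without going as far as "telling apart"
• Either way, the time, cost, and scalability problems remain unsolved
• Especially in specific conversation domains, it is hard to secure experts who can perform the evaluation [Lowe et al., 2015]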
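• Despite the progress of neural network-based models, evaluation metrics remain a problem for non-task-oriented tasks
– Word-overlap metrics, including BLEU, show almost no correlation with human judgments [Liu et al., 2016]
– The problem is that they cannot account for semantic similarity between responses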
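• Nevertheless, BLEU is currently used for dialogue system evaluation in most cases
– Human evaluation is too costly
– Put bluntly: would you run a human evaluation for every hyperparameter setting?
• A model that can automatically evaluate the quality of a dialogue system should have a large impact on dialogue system development
– rapid prototyping & testing
– “Automatic Turing Test”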
• What is a ‘good’ chatbot?
– one whose responses are scored highly on appropriateness by human evaluators.
– This should be a sufficient metric for improving current dialogue systems (which still produce broken responses)
• Collect human evaluation scores for a diverse set of dialogues and train an automatic dialogue evaluation model (ADEM) on them
– Learns human scores in a semi-supervised manner with a hierarchical RNN
– ADEM scores showed high correlation with human evaluation at both the utterance level and the system level
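3. Related Works
• word-overlap metrics
– BLEU [Papineni et al., 2002]
• Used for machine translation
– ROUGE [Lin, 2004]
• Used for summarization
– Cannot measure semantic similarity or context dependence
• They only look at the degree of word overlap
• Not much of a problem in machine translation (the set of reasonable translations is fairly limited)
• A critical problem in dialogue generation, where response diversity is very high [Artstein et al., 2009]
– It has been pointed out that they show almost no correlation with human judgments in dialogue generation [Liu et al., 2016]
• Reference: BLEU
– Computes n-gram precision and applies a penalty for short outputs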
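The BLEU note above can be illustrated with a minimal sketch. This is a simplified sentence-level variant (single reference, uniform weights up to bigrams, no smoothing), not the official BLEU implementation; the function names and the toy sentences are assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU: clipped n-gram precision times a brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or overlap == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "i am going to the movies tonight".split()
exact_copy = "i am going to the movies tonight".split()
paraphrase = "heading out to see a film later".split()  # sensible reply, different words

print(simple_bleu(exact_copy, reference))
print(simple_bleu(paraphrase, reference))
```

The paraphrase shares no bigram with the reference, so this sketch scores it as zero even though it would be an acceptable reply, which is the weakness the slide points out for dialogue.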
• Research on estimating response quality in chat-oriented dialogue systems
– automatic dialogue policy evaluation metric [DeVault et al., 2011]
– semi-automatic evaluation metric for dialogue coherence (similar to BLEU and ROUGE) [Gandhe and Traum, 2016]
– a framework to predict utterance-level problematic situations using intent and sentiment factors [Xiang et al., 2014]
– train a classifier to distinguish user utterances from system-generated utterances using various dialogue features [Higashinaka et al., 2014]
• Reinforcement learning with hand-crafted reward features
– ease of answering and information flow [Li et al., 2016b]
– turn-level appropriateness and conversational depth [Yu et al., 2016]
• These are hand-crafted features and capture only one aspect of a dialogue
– sub-optimal performance
– It is unclear whether this is better than optimizing retrieval-based cross-entropy or word-level maximum log-likelihood
• Because they evaluate at the conversation level, they often cannot evaluate a single dialogue response
– Metrics that can evaluate at the response level can be incorporated into the proposed metric
• Evaluation methods are further developed for task-oriented dialogue systems
– e.g., finding a restaurant
– Metrics that take a task completion signal into account (PARADISE [Walker et al., 1997], MeMo [Möller et al., 2006])
– Usable only in domains where task completion or task complexity can be measured
4. Proposed Method
• An Automatic Dialogue Evaluation Model (ADEM)
– captures semantic similarity beyond word overlap statistics
– exploits both the context and the reference response to calculate its score
1. Encode the context, model response, and reference response with an RNN encoder
2. Compute the score
• Hierarchical RNN encoder [El Hihi and Bengio, 1995; Sordoni et al., 2015a]
– utterance-level encoder
• input: word
• output: a vector at the end of each utterance
– context-level encoder
• input: utterance
• output: a vector representation of the context
– Why hierarchical? -> incorporate information from early utterances
– The parameters of the RNN part are pre-trained (described later)
• not learned from human scores
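The two-level encoder described above can be sketched with toy code. This is only an illustration under several assumptions: it uses a plain tanh RNN with random, untrained weights and arbitrary toy sizes, whereas the slides only state that the encoder is a pre-trained hierarchical RNN; all names here are hypothetical.

```python
import math
import random


def rnn_encode(inputs, W_in, W_h):
    """Run a plain tanh RNN over a sequence of vectors; return the last hidden state."""
    h = [0.0] * len(W_h)
    for x in inputs:
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_in[j], x))
                + sum(w * hi for w, hi in zip(W_h[j], h))
            )
            for j in range(len(W_h))
        ]
    return h


def random_matrix(rows, cols, rng):
    """Random weights stand in for pre-trained parameters in this sketch."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
EMB, UTT, CTX = 4, 3, 2   # toy sizes, chosen arbitrarily
vocab = {}                # word -> random embedding, created on demand


def embed(word):
    if word not in vocab:
        vocab[word] = [rng.uniform(-1, 1) for _ in range(EMB)]
    return vocab[word]


W_in_u, W_h_u = random_matrix(UTT, EMB, rng), random_matrix(UTT, UTT, rng)
W_in_c, W_h_c = random_matrix(CTX, UTT, rng), random_matrix(CTX, CTX, rng)


def encode_context(utterances):
    """Utterance-level RNN over words, then context-level RNN over utterance vectors."""
    utterance_vectors = [
        rnn_encode([embed(w) for w in u.split()], W_in_u, W_h_u) for u in utterances
    ]
    return rnn_encode(utterance_vectors, W_in_c, W_h_c)


context = ["hi how are you", "fine thanks and you", "pretty good"]
c = encode_context(context)
print(len(c))
```

Each utterance is first compressed to one vector, and only those vectors are fed to the context-level RNN, so early utterances reach the final state through far fewer recurrent steps than in a flat word-level RNN.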
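• $\mathrm{score}(c, r, \hat{r}) = (\mathbf{c}^{\top} M \hat{\mathbf{r}} + \mathbf{r}^{\top} N \hat{\mathbf{r}} - \alpha) / \beta$
– Parameters: M, N
• linear projection
• map $\hat{\mathbf{r}}$ into the space of $\mathbf{c}$ and $\mathbf{r}$
– Constants: α, β
• scale the model output so that it falls in the range 1 to 5
– Outputs a high score for response vectors that are similar to the context and the reference response
– Trained to minimize the squared error between the score and the human score (with L2 regularization)
• simple -> accurate prediction & fast evaluation (cf. supp. material in original paper)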
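The scoring step above, two bilinear terms shifted by α and scaled by β and fitted with squared error plus L2 regularization, can be sketched in a few lines. The vectors below are toy values rather than real encoder outputs, the identity matrices are just a convenient choice for the example, and the squared-norm form of the penalty with weight `gamma` is an assumption.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vector):
    """Matrix-vector product with the matrix given as a list of rows."""
    return [dot(row, vector) for row in matrix]


def adem_score(c, r, r_hat, M, N, alpha, beta):
    """score = (c^T M r_hat + r^T N r_hat - alpha) / beta"""
    return (dot(c, matvec(M, r_hat)) + dot(r, matvec(N, r_hat)) - alpha) / beta


def adem_loss(batch, M, N, alpha, beta, gamma):
    """Sum of squared errors against human scores plus an L2 penalty on M and N."""
    squared_error = sum(
        (adem_score(c, r, r_hat, M, N, alpha, beta) - human) ** 2
        for c, r, r_hat, human in batch
    )
    l2_penalty = sum(w * w for row in M + N for w in row)  # squared norm (assumption)
    return squared_error + gamma * l2_penalty


identity = [[1.0, 0.0], [0.0, 1.0]]
c, r, r_hat = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]  # toy vectors

print(adem_score(c, r, r_hat, identity, identity, alpha=0.0, beta=1.0))  # 2.0
```

With identity matrices the score reduces to a sum of two dot products, so a model response vector close to both the context and the reference vectors scores high; training M and N lets the model reweight which directions matter.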
• Pre-training with VHRED
– Train the encoder as part of a neural dialogue model
• Add a third (decoder) RNN that takes the encoder output and predicts the next utterance
– VHRED (latent variable hierarchical recurrent encoder-decoder [Serban et al., 2016b])
• stochastic latent variable
• Can generate more diverse and coherent responses than HRED
• Pre-training with VHRED
1. The context is encoded into a vector using the hierarchical encoder
2. VHRED then samples a Gaussian variable that is used to condition the decoder
3. use the last hidden state of the context-level encoder ($c, r, \hat{r} \to \mathbf{c}, \mathbf{r}, \hat{\mathbf{r}}$)
5. Experiments
• Settings
– BPE (Byte Pair Encoding) [Gage, 1994; Sennrich et al., 2015]
• reduce the effective vocabulary size
– layer normalization [Ba et al., 2016] for hierarchical encoder
• better than batch normalization [Ioffe and Szegedy, 2015; Cooijmans et al., 2016]
– used several techniques to train the VHRED [Serban et al., 2016b; Bowman et al., 2016]
• drop words in the decoder at a rate of 25%
• anneal the KL term linearly from 0 to 1 over the first 60,000 batches
– Adam [Kingma and Ba, 2014] optimizer
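The two VHRED training tricks listed above (word dropping in the decoder and linear KL annealing) are simple enough to sketch. The function names, the `<unk>` placeholder token, and the way the weight would be multiplied into the loss are assumptions for illustration; only the 25% rate and the 60,000-batch schedule come from the slide.

```python
import random


def kl_weight(batch_index, warmup_batches=60_000):
    """Linear KL annealing: the weight rises from 0 to 1 over the first warmup batches."""
    return min(1.0, batch_index / warmup_batches)


def drop_words(tokens, rate=0.25, unk="<unk>", rng=random):
    """Replace each decoder input token with a placeholder with probability `rate`."""
    return [unk if rng.random() < rate else token for token in tokens]


# Hypothetical use inside a training loop:
#   loss = reconstruction_loss + kl_weight(step) * kl_divergence
for step in (0, 15_000, 30_000, 60_000, 90_000):
    print(step, kl_weight(step))

print(drop_words("see you at the game tonight".split(), rng=random.Random(0)))
```

Both tricks push the decoder to rely on the latent variable: dropped words weaken the decoder's own language model, and a small early KL weight keeps the latent code from being ignored at the start of training.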
• Settings
– training ADEM
• employ a subsampling procedure based on the model response length
• ensure that ADEM does not use response length to predict the score
– humans have a tendency to give a higher rating to shorter responses
– training VHRED
• embedding size = 2,000
– after training VHRED, use PCA to reduce the dimensionality (n = 50)
– Early stopping
• Data Collection
– Generate responses for the Twitter Corpus [Ritter et al., 2011] and have humans score them via crowdsourcing (Amazon Mechanical Turk)
• relevant / irrelevant responses
• coherent / incoherent responses
– Prepare four types of candidate responses to increase response variety
• a response selected by a TF-IDF retrieval-based model
• a response selected by the Dual Encoder (DE) [Lowe et al., 2015]
• a response generated by the hierarchical recurrent encoder-decoder (HRED)
• human-generated responses
– novel human response, different from a fixed corpus
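Of the four candidate sources above, the TF-IDF retrieval model is the easiest to sketch. The version below matches the query against stored contexts and returns the paired response; whether the original setup matched contexts or responses is not stated on the slide, so that choice, the smoothed IDF formula, and the toy corpus are all assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return sparse TF-IDF vectors (dicts) for tokenized documents, plus the IDF table."""
    n = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in documents]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    num = sum(w * b.get(t, 0.0) for t, w in a.items())
    den = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return num / den if den else 0.0


def retrieve_response(query, corpus):
    """corpus: list of (context, response) pairs; return the response of the closest context."""
    contexts = [context.split() for context, _ in corpus]
    vectors, idf = tfidf_vectors(contexts)
    # Terms unseen in the corpus get weight 0 and do not affect the match.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.split()).items()}
    best = max(range(len(corpus)), key=lambda i: cosine(q, vectors[i]))
    return corpus[best][1]


corpus = [
    ("what is your favorite movie", "i love star wars"),
    ("how is the weather today", "it is raining here"),
    ("did you watch the game", "yes it was great"),
]
print(retrieve_response("what movie should i watch", corpus))
```

A retrieval model like this always returns a fluent, human-written reply, but one that may be irrelevant to the query, which makes it a useful source of varied candidates for human scoring.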
6. Results
• Utterance-level correlations
– How strongly each metric correlates with human judgments at the utterance level
– ADEM achieves far higher correlation coefficients than the word-overlap metrics
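The utterance-level comparison above amounts to correlating one metric score per response with the corresponding human score. A minimal sketch follows, using Pearson and Spearman coefficients as two common choices; which coefficient a given result table reports should be checked against the paper, and the numbers below are made-up toy values, not results.

```python
import math


def pearson(x, y):
    """Pearson correlation; assumes both inputs are non-constant and of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


human = [1, 2, 3, 4, 5]               # toy human scores
metric = [1.2, 1.9, 3.5, 3.9, 4.8]    # toy metric scores for the same responses

print(pearson(human, metric), spearman(human, metric))
```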
• Utterance-level correlations
– C-ADEM, R-ADEM: trained with only the context / only the reference information
– ADEM (T2V): uses a pre-trained tweet2vec model instead of the pre-trained VHRED
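• System-level correlations
– Average score over the responses from each dialogue model (TF-IDF, DE, HRED, human)
– The horizontal axis is the human score; the vertical axis is the score from each metric (BLEU, etc.)
– ADEM is the metric that rates poor models as poor and the ideal model (human) as good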
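The system-level view above is obtained by averaging utterance-level scores per dialogue model. A minimal sketch of that aggregation follows; the record format and the scores are made-up assumptions, not data from the paper.

```python
from collections import defaultdict


def system_level_scores(records):
    """records: iterable of (model_name, score) pairs; return the mean score per model."""
    totals = defaultdict(lambda: [0.0, 0])
    for model, score in records:
        totals[model][0] += score
        totals[model][1] += 1
    return {model: total / count for model, (total, count) in totals.items()}


# Toy utterance-level scores (hypothetical values).
human_records = [("TF-IDF", 2), ("TF-IDF", 4), ("HRED", 3), ("human", 5), ("human", 4)]

print(system_level_scores(human_records))
```

Running the same aggregation on the human scores and on a metric's scores gives one point per model, and a good metric places those points close to a straight line.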
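• Generalization to previously unseen models
– For practical use, the metric must correctly evaluate responses from new models that were not seen during training
– Remove one of the {TF-IDF, DE, HRED, human} models from the training data, train, and test on the responses of the removed model (leave-one-out evaluation)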
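The leave-one-out protocol above can be sketched as a split generator. This assumes each scored example records which dialogue model produced the response and that testing uses only the held-out model's responses; the dictionary layout and field names are hypothetical.

```python
def leave_one_out_splits(examples, key=lambda example: example["model"]):
    """Yield (held_out_model, train, test), holding out each response source in turn."""
    models = sorted({key(example) for example in examples})
    for held_out in models:
        train = [ex for ex in examples if key(ex) != held_out]
        test = [ex for ex in examples if key(ex) == held_out]
        yield held_out, train, test


# Toy examples (hypothetical scores).
examples = [
    {"model": "TF-IDF", "human_score": 2},
    {"model": "DE", "human_score": 3},
    {"model": "HRED", "human_score": 3},
    {"model": "human", "human_score": 5},
    {"model": "HRED", "human_score": 1},
]

for held_out, train, test in leave_one_out_splits(examples):
    print(held_out, len(train), len(test))
```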
• Generalization to previously unseen models
– Works well except when the Dual Encoder is held out
– The result when HRED is held out is surprising
• Trained only on human-written responses (from retrieval models or human-generated), it still generalizes to responses generated by a neural network
• Qualitative Analysis
– Poor responses are correctly given low scores
– Good responses also receive high scores
– For the 4th response to the 2nd context, the human raters arguably could have given a higher score
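• Qualitative Analysis
– In some cases ADEM gives a low score even when humans rate the response highly
– Because it is trained with squared error, it tends to output scores near the average (and rarely outputs extreme values)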
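7. Discussion
• The proposed model can be applied to datasets built for a variety of purposes
– Once a pre-trained model is released, it can be used for that purpose
• domain transfer ability is future work
• A dialogue model that outputs responses humans rate highly is not the desired end-goal of a chatbot
– The generic response problem (humans prefer safe, broadly applicable responses [Shang et al., 2016])
– Extending ADEM so that it is not subject to this bias is future work
• Do not tolerate responses that carry little information for their length
• adversarial evaluation model [Kannan and Vinyals, 2017; Li et al., 2017]
– Discriminates whether a response is human-written or not. Generic responses are easy to distinguish, so they receive low scores
• Models that can evaluate whether a dialogue system is having engaging and meaningful interactions with humans are important
– This is hard, but the proposed method should be one step toward achieving it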
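Impressions
• Analyzing the trained model could enable things like a qualitative assessment of human-likeness
– Differences across languages and datasets would also be interesting
• With careful annotation checks and loss function design, it seems possible to output scores closer to human intuition
• How far does it generalize?
– "Human-likeness" beyond a single dataset
• Could it also be useful beyond chatbots, in other areas where human subjective evaluation matters?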

【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...Deep Learning JP
 
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place RecognitionDeep Learning JP
 
【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?Deep Learning JP
 
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究についてDeep Learning JP
 
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )Deep Learning JP
 
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...Deep Learning JP
 
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"Deep Learning JP
 
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "Deep Learning JP
 
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat ModelsDeep Learning JP
 
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"Deep Learning JP
 
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...Deep Learning JP
 
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...Deep Learning JP
 
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...Deep Learning JP
 
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...Deep Learning JP
 
【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...
【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...
【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...Deep Learning JP
 

Plus de Deep Learning JP (20)

【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて【DL輪読会】事前学習用データセットについて
【DL輪読会】事前学習用データセットについて
 
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
【DL輪読会】 "Learning to render novel views from wide-baseline stereo pairs." CVP...
 
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
【DL輪読会】Zero-Shot Dual-Lens Super-Resolution
 
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
【DL輪読会】BloombergGPT: A Large Language Model for Finance arxiv
 
【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM【DL輪読会】マルチモーダル LLM
【DL輪読会】マルチモーダル LLM
 
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo... 【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
【 DL輪読会】ToolLLM: Facilitating Large Language Models to Master 16000+ Real-wo...
 
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
【DL輪読会】AnyLoc: Towards Universal Visual Place Recognition
 
【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?【DL輪読会】Can Neural Network Memorization Be Localized?
【DL輪読会】Can Neural Network Memorization Be Localized?
 
【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について【DL輪読会】Hopfield network 関連研究について
【DL輪読会】Hopfield network 関連研究について
 
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
【DL輪読会】SimPer: Simple self-supervised learning of periodic targets( ICLR 2023 )
 
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
【DL輪読会】RLCD: Reinforcement Learning from Contrast Distillation for Language M...
 
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
【DL輪読会】"Secrets of RLHF in Large Language Models Part I: PPO"
 
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "【DL輪読会】"Language Instructed Reinforcement Learning  for Human-AI Coordination "
【DL輪読会】"Language Instructed Reinforcement Learning for Human-AI Coordination "
 
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
 
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
【DL輪読会】"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware"
 
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
【DL輪読会】Parameter is Not All You Need:Starting from Non-Parametric Networks fo...
 
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
【DL輪読会】Drag Your GAN: Interactive Point-based Manipulation on the Generative ...
 
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
 
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
 
【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...
【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...
【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...
 

[DL Reading Group] Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses

  • 1. DEEP LEARNING JP [DL Papers] “Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses (ACL 2017)” Hiromi Nakagawa, Matsuo Lab http://deeplearning.jp/
  • 2. Agenda: 1. Paper Information, 2. Introduction, 3. Related Works, 4. Proposed Model, 5. Experiments, 6. Results, 7. Discussion
  • 4. Paper Information
      • Authors
        – Ryan Lowe¹, Michael Noseworthy¹, Iulian V. Serban¹, Nicolas A.-Gontier¹, Yoshua Bengio²,³, Joelle Pineau¹,³
        – 1. Reasoning and Learning Lab, School of Computer Science, McGill University
        – 2. Montreal Institute for Learning Algorithms, Université de Montréal
        – 3. CIFAR Senior Fellow
      • ACL 2017
        – https://arxiv.org/abs/1708.07149
      • Summary
        – Evaluating dialogue generation with word-overlap metrics such as BLEU shows almost no correlation with human judgments
        – The paper proposes ADEM, which evaluates generated responses using a model trained on human scores
        – It verifies that the model can automatically output scores that correlate highly with human scores
      • The implementation and trained models are publicly available (https://github.com/mike-n-7/ADEM)
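  • 6. Introduction
      • History of dialogue system development
        – Building (non-task-oriented) systems that can converse with humans in a human-like way has been one of the major goals throughout the history of AI research [Turing, 1950; Weizenbaum, 1966]
        – In recent years, neural networks have driven active research on large-scale non-task-oriented dialogue systems [Sordoni et al., 2015b; Shang et al., 2015; Vinyals and Le, 2015; Serban et al., 2016a; Li et al., 2015]
        – Some models trained end-to-end for a specific purpose have been successful
          • Google’s Smart Reply system [Kannan et al., 2016], Microsoft’s Xiaoice chatbot [Markoff and Mozur, 2015]
      • Meanwhile, performance evaluation has always been a challenge in dialogue system development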
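  • 7. Introduction
      • Performance evaluation of dialogue systems
        – Turing test: human judges decide whether they are talking to a system or to a human
          • Reasonable, but it has many constraints and scales poorly because it requires human evaluation
          • Prone to bias unless the evaluation setup is designed very carefully
        – Alternatively, humans rate the quality of the dialogue subjectively without having to tell system from human
          • Either way, the problems of time, cost, and poor scalability remain
          • In specific conversation domains in particular, it is hard to find experts who can do the evaluation [Lowe et al., 2015]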
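  • 8. Introduction
      • Despite the progress of neural network-based models, evaluation metrics remain a problem for non-task-oriented tasks
        – Word-overlap metrics, including BLEU, show almost no correlation with human judgments [Liu et al., 2016]
        – The problem is that they cannot take the semantic similarity between responses into account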
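  • 9. Introduction
      • Even so, BLEU is what is used in most dialogue system evaluations today
        – Human evaluation is too costly
        – Taken to the extreme: would you run a human evaluation for every hyperparameter setting?
      • A model that can automatically evaluate dialogue system quality should have a major impact on dialogue system development
        – rapid prototyping & testing
        – “Automatic Turing Test”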
  • 10. Introduction
      • What is a ‘good’ chatbot?
        – one whose responses are scored highly on appropriateness by human evaluators.
        – This should be a sufficient criterion for improving current dialogue systems (which still produce broken responses)
      • Collect human evaluation scores for diverse dialogues and train an automatic dialogue evaluation model (ADEM)
        – A hierarchical RNN learns the human scores in a semi-supervised manner
        – ADEM's scores correlated highly with human evaluation at both the utterance level and the system level
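  • 12. Related Works
      • word-overlap metrics
        – BLEU [Papineni et al., 2002]
          • used for machine translation
        – ROUGE [Lin, 2004]
          • used for summarization
        – They cannot measure semantic similarity or context dependence
          • They only look at how many words are shared
          • This is less of a problem in machine translation (the set of reasonable translations is fairly limited)
          • It is a critical problem in dialogue generation, where response diversity is very high [Artstein et al., 2009]
        – It has been pointed out that they show almost no correlation with human judgments in dialogue generation [Liu et al., 2016]
      • Reference: BLEU computes n-gram precision and applies a penalty for short sentences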
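
To make the limitation of word-overlap metrics concrete, here is a minimal sketch of a simplified, single-reference, sentence-level BLEU (clipped n-gram precision times a brevity penalty). The function names, the choice of `max_n=2`, and the example sentences are illustrative assumptions, not taken from the slides or the paper; real BLEU implementations add smoothing and multi-reference support.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU (illustrative, no smoothing)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        if total == 0:
            return 0.0
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        precisions.append(overlap / total)
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty: candidates shorter than the reference are penalized.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c >= r else math.exp(1.0 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


reference = "i am going to the gym tonight".split()
exact_copy = "i am going to the gym tonight".split()
paraphrase = "heading out for a workout this evening".split()

print(sentence_bleu(exact_copy, reference))
print(sentence_bleu(paraphrase, reference))
```

The paraphrase shares no words with the reference, so it gets the lowest possible score even though it could be a perfectly acceptable reply, which is the semantic-similarity gap the slide describes.
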
  • 13. Related Works
      • Work on estimating response quality in chat-oriented dialogue systems
        – automatic dialogue policy evaluation metric [DeVault et al., 2011]
        – semi-automatic evaluation metric for dialogue coherence (similar to BLEU and ROUGE) [Gandhe and Traum, 2016]
        – a framework to predict utterance-level problematic situations using intent and sentiment factors [Xiang et al., 2014]
        – train a classifier to distinguish user utterances from system-generated utterances using various dialogue features [Higashinaka et al., 2014]
  • 14. Related Works
      • Reinforcement learning with hand-crafted reward features
        – ease of answering and information flow [Li et al., 2016b]
        – turn-level appropriateness and conversational depth [Yu et al., 2016]
      • These are hand-crafted features and capture only one aspect of a dialogue
        – sub-optimal performance
        – It is unclear whether this is better than optimizing retrieval-based cross-entropy or word-level maximum log-likelihood
      • Because they evaluate at the conversation level, they often cannot evaluate a single dialogue response
        – Metrics that can evaluate at the response level can be incorporated into the proposed metric
  • 15. Related Works
      • Evaluation methods are further developed for task-oriented dialogue systems
        – e.g., finding a restaurant
        – Metrics that take task completion signals into account (PARADISE [Walker et al., 1997], MeMo [Möller et al., 2006])
        – They can only be used in domains where task completion or task complexity can be measured
  • 17. Proposed Method
      • An Automatic Dialogue Evaluation Model (ADEM)
        – captures semantic similarity beyond word overlap statistics
        – exploits both the context and the reference response to calculate its score
  • 18. Proposed Method
      1. Encode the context, the model response, and the reference response with an RNN encoder
      2. Compute the score
  • 19. Proposed Method
      • Hierarchical RNN encoder [El Hihi and Bengio, 1995; Sordoni et al., 2015a]
        – utterance-level encoder
          • input: word
          • output: a vector at the end of each utterance
        – context-level encoder
          • input: utterance
          • output: a vector representation of the context
        – Why hierarchical? -> incorporate information from early utterances
        – The parameters of the RNN part are pre-trained (described later)
          • not learned from human scores
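  • 20. Proposed Method
      • $\mathrm{score}(c, r, \hat{r}) = (c^{\top} M \hat{r} + r^{\top} N \hat{r} - \alpha) / \beta$
        – Parameters: M, N
          • linear projection
          • map $\hat{r}$ into the space of $c$ and $r$
        – Constants: α, β
          • scale the model output so that it falls in the range 1 to 5
        – Outputs a high score for response vectors that are similar to the context and the reference response
        – Trained to minimize the squared error between the score and the human score (with L2 regularization)
      • simple -> accurate prediction & fast evaluation (cf. supp. material in original paper)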
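
The scoring step on this slide can be sketched in a few lines. The sketch assumes the form given in the original paper, score(c, r, r̂) = (cᵀ M r̂ + rᵀ N r̂ − α) / β, with c, r, and r̂ the encoder vectors for the context, reference response, and model response. The helper names, the toy 2-dimensional vectors, the identity matrices, and the use of a squared L2 penalty weighted by a hypothetical `gamma` are all illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in m]


def adem_score(c, r, r_hat, M, N, alpha, beta):
    """score = (c^T M r_hat + r^T N r_hat - alpha) / beta"""
    context_term = dot(c, matvec(M, r_hat))    # similarity to the context
    reference_term = dot(r, matvec(N, r_hat))  # similarity to the reference
    return (context_term + reference_term - alpha) / beta


def adem_loss(examples, M, N, alpha, beta, gamma):
    """Squared error against human scores plus an L2 penalty on M and N.

    The squared-norm form of the penalty is an assumption of this sketch.
    """
    squared_error = sum(
        (adem_score(c, r, r_hat, M, N, alpha, beta) - human) ** 2
        for c, r, r_hat, human in examples
    )
    l2 = sum(w * w for mat in (M, N) for row in mat for w in row)
    return squared_error + gamma * l2


identity = [[1.0, 0.0], [0.0, 1.0]]
c, r, r_hat = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
score = adem_score(c, r, r_hat, identity, identity, alpha=0.0, beta=0.5)
loss = adem_loss([(c, r, r_hat, 3.0)], identity, identity, 0.0, 0.5, gamma=0.1)
print(score, loss)
```

Only M and N are learned from human scores here; the encoder that produces c, r, and r̂ is pre-trained separately, which keeps the number of parameters fitted to the small human-labeled set low.
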
  • 21. Proposed Method
      • Pre-training with VHRED
        – Train the encoder as part of a neural dialogue model
          • Add a third decoder RNN that takes the encoder output and predicts the next utterance
        – VHRED (latent variable hierarchical recurrent encoder-decoder [Serban et al., 2016b])
          • stochastic latent variable
          • Can generate more diverse and coherent responses than HRED
  • 22. Proposed Method
      • Pre-training with VHRED
        1. The context is encoded into a vector using the hierarchical encoder
        2. VHRED then samples a Gaussian variable that is used to condition the decoder
        3. use the last hidden state of the context-level encoder to map $c$, $r$, $\hat{r}$ to their vector representations
  • 24. Experiments
      • Settings
        – BPE (Byte Pair Encoding) [Gage, 1994; Sennrich et al., 2015]
          • reduce the effective vocabulary size
        – layer normalization [Ba et al., 2016] for hierarchical encoder
          • better than batch normalization [Ioffe and Szegedy, 2015; Cooijmans et al., 2016]
        – used several techniques to train the VHRED [Serban et al., 2016b; Bowman et al., 2016]
          • drop words in the decoder at a rate of 25%
          • anneal the KL term linearly from 0 to 1 over the first 60,000 batches
        – Adam [Kingma and Ba, 2014] optimizer
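
The two VHRED training tricks listed on this slide are easy to sketch. The rate (25%) and the schedule (0 to 1 over the first 60,000 batches) come from the slide; the function names, the `<unk>` placeholder used for dropped words, and the way the weight enters the loss are assumptions made for illustration.

```python
import random


def kl_weight(batch_index, anneal_batches=60_000):
    """Linear KL annealing: 0 at the first batch, 1 from anneal_batches onward."""
    return min(1.0, batch_index / anneal_batches)


def drop_words(tokens, drop_rate=0.25, unk="<unk>", rng=random):
    """Replace each decoder input token with a placeholder with probability drop_rate.

    Using "<unk>" as the replacement token is an assumption of this sketch.
    """
    return [unk if rng.random() < drop_rate else tok for tok in tokens]


# Hypothetical use inside a training loop:
#   loss = reconstruction_loss + kl_weight(batch_index) * kl_divergence
for batch_index in (0, 30_000, 60_000, 90_000):
    print(batch_index, kl_weight(batch_index))

rng = random.Random(0)
print(drop_words("how are you doing today".split(), rng=rng))
```

Both tricks aim to keep the decoder from ignoring the latent variable early in training: a small KL weight makes the latent code cheap to use, and dropped input words force the decoder to rely on it.
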
  • 25. Experiments
      • Settings
        – training ADEM
          • employ a subsampling procedure based on the model response length
          • ensure that ADEM does not use response length to predict the score
            – humans have a tendency to give a higher rating to shorter responses
        – training VHRED
          • embedding size = 2,000
            – after training VHRED, use PCA to reduce the dimensionality (n = 50)
        – Early stopping
  • 26. Experiments
      • Data Collection
        – Responses are generated for the Twitter Corpus [Ritter et al., 2011] and scored by humans through crowdsourcing (Amazon Mechanical Turk)
          • relevant / irrelevant responses
          • coherent / incoherent responses
        – Four kinds of candidate responses are prepared to increase response variety
          • a response selected by a TF-IDF retrieval-based model
          • a response selected by the Dual Encoder (DE) [Lowe et al., 2015]
          • a response generated by the hierarchical recurrent encoder-decoder (HRED)
          • human-generated responses
            – novel human response, different from a fixed corpus
  • 28. Results
      • Utterance-level correlations
        – How strongly each metric correlates with human judgments at the utterance level
        – ADEM achieves far higher correlation coefficients than the word-overlap metrics
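
An utterance-level correlation pairs each response's metric score with its human score and computes a correlation coefficient over all responses. The sketch below uses Pearson's r as one common choice; the slide does not say which coefficient is meant, and all scores in the example are made-up numbers for illustration, not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up scores for five responses (1 to 5 scale), one entry per response.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
tracking_metric = [1.2, 1.9, 3.1, 3.8, 5.0]   # follows the human scores closely
unrelated_metric = [3.0, 1.0, 3.0, 1.0, 3.0]  # unrelated to the human scores

print(pearson(human, tracking_metric))
print(pearson(human, unrelated_metric))
```
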
  • 29. Results
      • Utterance-level correlations
        – C-ADEM, R-ADEM: ADEM trained with only the context / only the reference
        – ADEM (T2V): a pre-trained tweet2vec model is used instead of the pre-trained VHRED
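  • 31. Results
      • System-level correlations
        – The average score over the responses of each dialogue model (TF-IDF, DE, HRED, human)
        – The horizontal axis is the human score and the vertical axis is the score from each metric (BLEU, etc.)
        – ADEM rates the poor models as poor and the ideal model (human) as good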
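
System-level scores are per-system averages of the utterance-level scores. A minimal sketch of that aggregation follows; the record layout, the function name, and the scores are hypothetical and only illustrate the averaging step.

```python
from collections import defaultdict


def system_means(records):
    """Average the scores of each system.

    records: iterable of (system_name, score) pairs, one per rated response.
    Returns a dict mapping system_name to its mean score.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for system, score in records:
        totals[system][0] += score
        totals[system][1] += 1
    return {system: total / count for system, (total, count) in totals.items()}


# Made-up human ratings, for illustration only.
human_ratings = [
    ("TF-IDF", 2.0), ("TF-IDF", 3.0),
    ("HRED", 3.0), ("HRED", 3.5),
    ("human", 4.0), ("human", 5.0),
]
print(system_means(human_ratings))
```

Running the same aggregation on a metric's scores and comparing the two sets of means gives one point per system, which is what a system-level scatter plot shows.
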
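  • 32. Results
      • Generalization to previously unseen models
        – For practical use, the metric has to correctly evaluate responses from new models that were not in the training data
        – Train with the responses of one of the {TF-IDF, DE, HRED, human} models removed, then test on the held-out model (leave-one-out evaluation)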
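
The leave-one-out protocol can be sketched as a split generator: for each system, train on the responses of all other systems and test on the held-out one. The dictionary layout with a `"system"` key and the tiny example data are assumptions for illustration.

```python
def leave_one_out_splits(examples, systems):
    """Yield (held_out, train, test) for every system in turn.

    examples: list of dicts, each with a "system" key naming the model
    that produced the response (hypothetical layout).
    """
    for held_out in systems:
        train = [ex for ex in examples if ex["system"] != held_out]
        test = [ex for ex in examples if ex["system"] == held_out]
        yield held_out, train, test


systems = ["TF-IDF", "DE", "HRED", "human"]
examples = [
    {"system": name, "response": f"response {i} from {name}"}
    for name in systems
    for i in range(2)
]

for held_out, train, test in leave_one_out_splits(examples, systems):
    print(held_out, len(train), len(test))
```
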
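  • 33. Results
      • Generalization to previously unseen models
        – It works well except when the Dual Encoder is held out
        – The result with HRED held out is surprising
          • Trained only on human-written responses (from the retrieval models or human-generated), ADEM still generalizes to responses generated by a neural network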
  • 34. Results
      • Qualitative Analysis
        – Poor responses are correctly given low scores
        – Good responses also receive high scores
        – For the 4th response to the 2nd context, the human raters arguably could have given a higher score
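  • 35. Results
      • Qualitative Analysis
        – In some cases ADEM gave a low score to a response that humans rated highly
        – Because it is trained with squared error, it tends to output average scores (and rarely outputs extreme values)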
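  • 37. Discussion
      • The proposed model can be applied to datasets built for a variety of purposes
        – Once a pre-trained model is released, it can be used for that purpose
          • domain transfer ability is future work
      • A dialogue model that outputs the responses humans rate highly is not the desired end-goal of a chatbot
        – The generic response problem (humans prefer safe, widely applicable responses [Shang et al., 2016])
        – Extending ADEM so that it is not affected by this bias is future work
          • Do not tolerate responses that carry little information for their length
          • adversarial evaluation model [Kannan and Vinyals, 2017; Li et al., 2017]
            – It distinguishes human responses from non-human ones. Generic responses are easy to distinguish, so they get low scores
      • What matters is a model that can evaluate whether a dialogue system is having engaging and meaningful interactions with humans
        – This is hard, but the proposed method should be one step toward it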
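  • 38. Impressions
      • Analyzing the trained model could enable things like a qualitative evaluation of human-likeness
        – Differences across languages and datasets would also be interesting
      • With careful annotation checks and loss function design, the model could probably output scores closer to human intuition
      • How far does it actually generalize?
        – “Human-likeness” that holds across datasets
      • Could it also be useful outside chatbots, in other areas where human subjective evaluation matters?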