Machine Learning Seminar: Area Attention

Area attention
@UMU____
20181028
arXiv:1810.10126

Significance
• Achieves SOTA by changing the focus of attention from a "single-item" scheme to one that attends to multiple adjacent items at once

Background: Attention
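• Build an NN that behaves like a dictionary, and look it up with a query
Figure: attention as a dictionary lookup
h^q (query side) → f_q(⋅) → q
h^d (dictionary side) → f_{k,v}(⋅) → (k_i, v_i)
a_i: something like the similarity between q and k_i; the a_i sum to 1 (the attention)
Attention output: the v_i weighted by the a_i and summed
Intuition: a differentiable version of a dictionary that returns v_i when k_i = q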
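The dictionary picture can be sketched in a few lines of plain Python. This is a minimal illustration, not the presenter's code: the function names `softmax` and `attend`, the default dot-product score, and the toy keys and values are all assumptions made for the example.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(q, keys, values, f_att=dot):
    """Soft dictionary lookup: a_i = softmax_i(f_att(q, k_i)), output = sum_i a_i * v_i."""
    weights = softmax([f_att(q, k) for k in keys])
    dim = len(values[0])
    output = [sum(a * v[d] for a, v in zip(weights, values)) for d in range(dim)]
    return output, weights

# Toy dictionary with made-up numbers: the query is close to the first key,
# so the output is close to the first value.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0], [20.0]]
out, a = attend([5.0, 0.0], keys, values)
```

Because the weights come from a softmax rather than an exact key match, every value contributes a little, which is what makes the lookup differentiable.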

Background: Attention
• On the scoring function f_att(q, k)
[Luong et al., 2015]
f_att(q, k) = q ⋅ k
[Bahdanau et al., 2014]
f_att(q, k) = W1 q + W2 k + b (W1, W2, b: trainable)
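The two scoring functions can be written out as a small sketch. The additive form below follows the simplified expression on the slide, and it assumes W1 and W2 are weight vectors so that the score is a scalar; Bahdanau et al. additionally apply a tanh and a learned projection, which is omitted here. The names `dot_score` and `additive_score` and all numbers are illustrative.

```python
def dot_score(q, k):
    """Dot-product score: f_att(q, k) = q . k."""
    return sum(x * y for x, y in zip(q, k))

def additive_score(q, k, w1, w2, b):
    """Additive score in the slide's simplified form: W1 q + W2 k + b.

    Assumption: w1 and w2 are weight vectors (one row each), so the
    result is a single scalar score. In training they would be learned.
    """
    return dot_score(w1, q) + dot_score(w2, k) + b

# Made-up inputs and parameters, for illustration only.
q = [1.0, 2.0]
k = [3.0, 4.0]
s_dot = dot_score(q, k)
s_add = additive_score(q, k, w1=[1.0, 0.0], w2=[0.0, 1.0], b=0.5)
```

The dot-product score needs no parameters but requires q and k to have the same dimension; the additive score lifts that restriction at the cost of extra trainable weights.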

Background: Problem of Attention
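• Problem with ordinary attention: single-item focus
• It cannot attend to several items at once, which limits expressiveness
Figure: h^q (query side) → f_q(⋅) → q, looked up against h^d → f_{k,v}(⋅)
There is only one query q: single-item focus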

→ Make it multi-item focus
[Vaswani et al., 2017] Multi-head attention
Figure: the lookup h^q (query side) → f_q(⋅) → q against h^d → f_{k,v}(⋅), drawn three times (multiple heads)
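As a rough sketch of the multi-head idea, each head can be given its own projections for the query, keys, and values, and the head outputs concatenated. The code below is a simplification under stated assumptions: the projection matrices are passed in as plain lists (hypothetical toy parameters), the final output projection used by Vaswani et al. is omitted, and the names `matvec`, `single_head`, and `multi_head` are invented for this example.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def single_head(q, keys, values):
    """Scaled dot-product attention for one head."""
    scale = math.sqrt(len(q))
    weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
    return [sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))]

def multi_head(h_q, h_d, heads):
    """heads: list of (Wq, Wk, Wv) triples, one per head (toy parameters)."""
    outputs = []
    for Wq, Wk, Wv in heads:
        q = matvec(Wq, h_q)
        keys = [matvec(Wk, h) for h in h_d]
        values = [matvec(Wv, h) for h in h_d]
        outputs.extend(single_head(q, keys, values))  # concatenate head outputs
    return outputs

# Two identical heads with identity projections, purely for illustration.
I = [[1.0, 0.0], [0.0, 1.0]]
out = multi_head([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [(I, I, I), (I, I, I)])
```

With learned, distinct projections each head can place its weight on a different item, which is how several items end up being attended to at once.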

→ Make it multi-item focus
[Pedersoli et al., 2016] Areas of attention
Attention that focuses on local regions of an image

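Method (this paper)
Introduces multi-item focus
• Single-item focus: each individual element is one dictionary entry
• Multi-item focus: each individual element + groups of two + groups of three + …
Figure (single-item focus): h^d (dictionary side) = h1, h2, h3, …, hN
Figure (multi-item focus): h^d (dictionary side) = h1, h2, h3, …, hN, plus "h1 and h2", "h2 and h3", …, "h1, h2 and h3", …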
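The set of dictionary entries under multi-item focus can be enumerated directly. The sketch below covers the 1-D case only and assumes areas are contiguous runs of adjacent items up to a maximum size; `enumerate_areas` and its parameter names are hypothetical.

```python
def enumerate_areas(n_items, max_size):
    """Return (start, end) index pairs, end exclusive, for every
    contiguous area of 1..max_size adjacent items."""
    areas = []
    for size in range(1, max_size + 1):
        for start in range(0, n_items - size + 1):
            areas.append((start, start + size))
    return areas

# For three items h1, h2, h3: the singles, then the pairs, then the triple.
areas = enumerate_areas(3, 3)
# → [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (0, 3)]
```

With `max_size=1` this reduces to the single-item dictionary, so ordinary attention is the special case of areas of size one.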

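Method: details
• N items at a time? In practice:
Key computation: simply the mean of the keys of the items in the area
Value computation: simply aggregate the values of the items in the area (the slide describes this as an average; the paper, arXiv:1810.10126, sums the values)
• Alternatively, the key can incorporate not only the mean but also the variance and the size of the area.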
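Putting the pieces together, area attention over a 1-D sequence can be sketched as ordinary attention run over an enlarged dictionary of areas. This is an illustrative sketch, not the authors' implementation: it assumes contiguous 1-D areas, a dot-product score, the mean of the item keys as the area key, and the sum of the item values as the area value (as in arXiv:1810.10126; swap in a mean if you want to follow the slide's wording). The names `area_memory` and `area_attention` are hypothetical.

```python
import math

def area_memory(keys, values, max_size):
    """Build one (key, value) pair per contiguous area of 1..max_size items."""
    n, dk, dv = len(keys), len(keys[0]), len(values[0])
    area_keys, area_values = [], []
    for size in range(1, max_size + 1):
        for start in range(n - size + 1):
            ks = keys[start:start + size]
            vs = values[start:start + size]
            # Area key: mean of the item keys in the area.
            area_keys.append([sum(k[d] for k in ks) / size for d in range(dk)])
            # Area value: sum of the item values in the area (assumption, see lead-in).
            area_values.append([sum(v[d] for v in vs) for d in range(dv)])
    return area_keys, area_values

def area_attention(q, keys, values, max_size):
    """Ordinary softmax attention, but over areas instead of single items."""
    area_keys, area_values = area_memory(keys, values, max_size)
    scores = [sum(a * b for a, b in zip(q, k)) for k in area_keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dv = len(area_values[0])
    out = [sum(w * v[d] for w, v in zip(weights, area_values)) for d in range(dv)]
    return out, weights

# Toy example with made-up numbers: two items give three areas.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0], [2.0]]
ak, av = area_memory(keys, values, max_size=2)
out, w = area_attention([0.0, 0.0], keys, values, max_size=2)
```

Note that no new parameters are introduced here: the area keys and values are computed from the existing item keys and values, so the only extra cost is the larger number of dictionary entries.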

Experiments: Neural Machine Translation (vs. Transformer)
BLEU (character level)
BLEU (token level)

Experiments: Neural Machine Translation (vs. LSTM)
Negative log-likelihood (character level)

Experiments: Image captioning
Test accuracy
