2016 word embedding (supplement)

Computing semantics, syntax, and metaphor with word embedding models (tentative title)
Shin Asakawa (浅川伸一) <asakawa@ieee.org>

Acknowledgments
I thank Professor Takashi Kusumi of Kyoto University for giving me the opportunity to present.

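About the speaker
Shin Asakawa works at the Information Processing Center, Tokyo Woman's Christian University. As a student at Waseda University he was devoted to Piaget's genetic epistemology. After graduating he studied under Jeff Elman, the inventor of the Elman network. Since then he has aimed to simulate higher human cognitive functions, and hopes to understand what intelligent information processing is by building machines that perform it.
Books: "Deep Learning in Practice with Python" (2016), Corona Publishing; "Deep Learning, Big Data, Machine Learning, or Their Psychology" (2015), Shin-yo-sha.
Book chapters: "Mathematical Foundations of Neural Networks" and "Brain Damage and Neural Network Models: Applications to Neuropsychology", both in Kazuo Mori et al. (eds.), "Connectionist Models and Psychology" (2001), Kitaoji Shobo.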

[Photo: with my mentor Jeff Elman on the UCSD campus, around 2002]

← 2016
2015 →

Notation and basic graph
y: output-layer neurons
h: hidden-layer neurons
x: input-layer neurons

Recurrent connections
[Figure: network with layers x, h, y and a recurrent connection]

Wy: weight matrix (hidden to output)
Wh: weight matrix (recurrent connection)
Wx: weight matrix (input to hidden)

by: bias (hidden to output)
bh: bias (recurrent connection)
bx: bias (input to hidden)
Bias terms are omitted from here on.

Subscripts indicate time, t = 0, 1, ...
Some authors write the time index in parentheses instead, e.g. x(t).
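The notation above can be read as a small program. The following is a minimal sketch, not the author's code: it assumes a tanh hidden activation and a linear output (the slides do not state either), uses toy weight values chosen only for illustration, and omits biases as the slides do.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_forward(xs, Wx, Wh, Wy, h0):
    """Run a simple recurrent network over the input sequence xs.

    Assumed update (biases omitted):
        h_t = tanh(Wx x_t + Wh h_{t-1}),   y_t = Wy h_t
    """
    h = h0
    hs, ys = [], []
    for x in xs:
        pre = [a + b for a, b in zip(matvec(Wx, x), matvec(Wh, h))]
        h = [math.tanh(p) for p in pre]
        hs.append(h)
        ys.append(matvec(Wy, h))
    return hs, ys

# Toy sizes and values (assumptions): 2 inputs, 3 hidden units, 1 output.
Wx = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
Wh = [[0.1, 0.0, 0.2], [0.0, 0.1, -0.1], [0.3, -0.2, 0.1]]
Wy = [[1.0, -1.0, 0.5]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # x_0, x_1, x_2
hs, ys = rnn_forward(xs, Wx, Wh, Wy, h0=[0.0, 0.0, 0.0])
```

The same three matrices are reused at every time step; only the hidden state h changes from step to step.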

[Figure: the network unrolled in time, with x_t, h_t, y_t shown for t = 0 to 5]

[Figure, shown as a sequence of animation frames: the unrolled network for t = 0 to 5, with the labels teacher, error, and Loss(t, y), and the recurrent weights Wh between consecutive time steps]
Full BPTT
[Figure: the unrolled network for t = 0 to 5, with teacher, error, Loss(t, y), and the recurrent weights Wh between all consecutive time steps]

Truncated BPTT (window width = 5)
[Figure: the unrolled network for steps t to t+5, with teacher, error, Loss(t, y), and the recurrent weights Wh between consecutive time steps]
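To make the difference between full and truncated BPTT concrete, here is a minimal sketch for a scalar network. It is an illustration under assumptions, not the author's implementation: the hidden unit is tanh, the loss is a squared error on the final output only, and the names `forward`, `loss`, and `grad_wh` are hypothetical.

```python
import math

def forward(xs, wx, wh, h0=0.0):
    """Return [h_0, h_1, ..., h_T] for h_t = tanh(wx * x_t + wh * h_{t-1})."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(wx * x + wh * hs[-1]))
    return hs

def loss(xs, teacher, wx, wh, wy):
    """Squared error between the final output wy * h_T and the teacher signal."""
    y = wy * forward(xs, wx, wh)[-1]
    return 0.5 * (y - teacher) ** 2

def grad_wh(xs, teacher, wx, wh, wy, window=None):
    """dLoss/dwh by BPTT. `window` limits how many steps the error travels back;
    window=None means full BPTT."""
    hs = forward(xs, wx, wh)
    T = len(xs)
    steps = T if window is None else min(window, T)
    delta = (wy * hs[-1] - teacher) * wy       # dLoss/dh_T
    g = 0.0
    for t in range(T, T - steps, -1):
        dpre = delta * (1.0 - hs[t] ** 2)      # back through tanh at step t
        g += dpre * hs[t - 1]                  # contribution of wh used at step t
        delta = dpre * wh                      # pass the error on to h_{t-1}
    return g

# Toy values (assumptions).
xs = [0.5, -0.3, 0.8, 0.1, -0.6, 0.4, 0.9, -0.2]
params = dict(wx=0.7, wh=0.9, wy=1.2)
full = grad_wh(xs, 0.5, **params)                  # full BPTT
truncated = grad_wh(xs, 0.5, window=5, **params)   # window width = 5
```

With a window of 5, the contributions of steps further back than five are simply dropped, which bounds the cost per update at the price of an approximate gradient.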

Can we improve on this?

Introducing gates to control the hidden state
[Figure: the unit at steps t-1 and t, with a gate on the hidden state]
But why gates?

Introducing the forget gate
[Figure: the unit at steps t-1 and t, with a gate on the hidden state]
Who controls the gate, and how?
"Who can tell me how I can control myself?"

Who can control the gate? Three candidates:
[Figure: the unit at steps t and t+1, with a gate on the hidden state]
1. h_t ("It's me")
2. y_t ("Me, too")
3. x_{t+1} ("I can, too")

Gate control by h_t, y_t, and x_{t+1}
$h_{t+1} = h_t \, s(x)$
● $s(x) = (1 + e^{-x})^{-1}$
● $x = W_f (y_t + h_t + x_{t+1})$
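The gate formula above can be tried out directly. The sketch below is a scalar illustration only: the function name `gated_update` and all numeric values are assumptions, and W_f is reduced to a single number `w_f`.

```python
import math

def sigmoid(x):
    """Logistic function s(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_update(h_t, y_t, x_next, w_f):
    """Scalar form of h_{t+1} = h_t * s(w_f * (y_t + h_t + x_{t+1}))."""
    gate = sigmoid(w_f * (y_t + h_t + x_next))
    return h_t * gate, gate

# Gate driven far open: the state is held almost unchanged for 20 steps.
h = 1.0
for _ in range(20):
    h, _gate = gated_update(h, y_t=0.0, x_next=20.0, w_f=1.0)
h_open = h

# Gate fixed at s(0) = 0.5 (w_f = 0): the state halves at every step.
h = 1.0
for _ in range(20):
    h, _gate = gated_update(h, y_t=0.0, x_next=20.0, w_f=0.0)
h_half = h
```

When the gate value stays close to 1 the hidden state survives many time steps, whereas any gate value well below 1 makes it shrink geometrically.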

Gates make it possible to resolve the long-term dependency (LTD) problem.

Can we improve further?

Introducing the input gate
[Figure: the unit at steps t and t+1, with two gates]
$h_{t+1} = h_t \, s(w(h_t + x_{t+1}))$
● $s(x) = (1 + e^{-x})^{-1}$
● $x = y_t + h_t + x_{t+1}$

Need even more?

Introducing the output gate
[Figure: the unit at steps t and t+1, with three gates]
$h_{t+1} = h_t \, s(w(h_t + x_{t+1} + y_{t+1}))$
● $s(x) = (1 + e^{-x})^{-1}$
● $x = y_t + h_t + x_{t+1}$

How does LSTM work?
1. LSTM replaces logistic or tanh hidden units with "memory cells" that can store an analog value.
2. Each memory cell has its own input and output gates that control when inputs are allowed to add to the stored analog value and when this value is allowed to influence the output.
3. There is a forget gate that controls the rate at which the analog value stored in the memory cell decays.
4. For periods when the input and output gates are off and the forget gate is not causing decay, a memory cell simply holds its value over time.
Le, Jaitly, & Hinton (2015)
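Point 4 of the list above can be checked with a small scalar LSTM step. This is a sketch of the commonly used LSTM formulation, not code from the slides: the parameter names are hypothetical, and gate biases are included here (although the slides omit biases) purely so that the gates can be forced off or on for the demonstration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step; c is the analog value stored in the memory cell."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])   # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])   # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])   # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev)           # candidate input
    c = f * c_prev + i * g       # forget gate scales the old value, input gate adds
    h = o * math.tanh(c)         # output gate decides what leaves the cell
    return h, c

# Toy weights (assumptions). Biases force: input gate off, output gate off,
# forget gate close to 1 (no decay).
params = {k: 0.5 for k in ("wi", "ui", "wf", "uf", "wo", "uo", "wg", "ug")}
params.update(bi=-20.0, bf=20.0, bo=-20.0)

h, c = 0.0, 0.7
for x in [1.0, -1.0, 0.5, 0.0, -0.5, 1.0, 0.3, -0.8, 0.2, 0.9]:
    h, c = lstm_step(x, h, c, params)
```

After ten varied inputs the cell still holds a value very close to its initial 0.7, and almost nothing reaches the output h.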

Another model: GRU, an alternative to the LSTM
[Figure: GRU unit with input x, output y, states h and ~h, r: reset gate, u: update gate]
$u_t = \sigma(W_u x_t + U_u h_{t-1})$
$h_t = \phi(W x_t + U_h (u_t \odot h_{t-1}))$
$r_t = \sigma(W_r x_t + U_r h_{t-1})$
$\tilde{h}_t = (1 - r_t) h_t + r_t \tilde{h}_{t-1}$
$y_t = W_y \tilde{h}_t$
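The GRU equations on this slide can be followed term by term in a scalar sketch. Several things here are assumptions: φ is taken to be tanh, all weights are single numbers with illustrative values, and the slide's notation is followed literally, so the gates read the previous h while ~h is carried separately. The function name `gru_step` is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, h_tilde_prev, p):
    """One step of the slide's equations, scalar version."""
    u = sigmoid(p["Wu"] * x + p["Uu"] * h_prev)              # u_t
    h = math.tanh(p["W"] * x + p["Uh"] * (u * h_prev))       # h_t, phi = tanh (assumed)
    r = sigmoid(p["Wr"] * x + p["Ur"] * h_prev)              # r_t
    h_tilde = (1.0 - r) * h + r * h_tilde_prev               # ~h_t
    y = p["Wy"] * h_tilde                                    # y_t
    return h, h_tilde, y

# Toy parameters (assumptions).
p = {"Wu": 0.4, "Uu": -0.3, "W": 0.8, "Uh": 0.5,
     "Wr": -0.2, "Ur": 0.6, "Wy": 1.5}

h, h_tilde = 0.0, 0.0
ys = []
for x in [1.0, 0.0, -1.0, 0.5]:
    h, h_tilde, y = gru_step(x, h, h_tilde, p)
    ys.append(y)
```

The last two equations show the key design choice: ~h_t is an interpolation between the new h_t and the previous ~h_{t-1}, so a gate value r_t near 1 keeps the old state.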

Bidirectional RNN
[Figure: forward states and backward states between the inputs x_{t-1}, x_t, x_{t+1} and the outputs y_{t-1}, y_t, y_{t+1}]
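A bidirectional RNN can be sketched as two ordinary recurrent passes, one over the sequence and one over its reverse. The example below is a scalar illustration under assumptions: tanh hidden units, an output that is a weighted sum of the forward and backward states, and hypothetical names `run_states` and `birnn`.

```python
import math

def run_states(xs, wx, wh):
    """Hidden states of a scalar RNN, h_t = tanh(wx * x_t + wh * h_{t-1}), h_{-1} = 0."""
    h, out = 0.0, []
    for x in xs:
        h = math.tanh(wx * x + wh * h)
        out.append(h)
    return out

def birnn(xs, fwd, bwd, wy_f, wy_b):
    """y_t combines a forward state (past context) and a backward state (future context)."""
    hf = run_states(xs, *fwd)                 # reads x_0 ... x_T
    hb = run_states(xs[::-1], *bwd)[::-1]     # reads x_T ... x_0, then re-aligned to t
    return [wy_f * a + wy_b * b for a, b in zip(hf, hb)]

# The only non-zero input is at the last step.
xs = [0.0, 0.0, 1.0]
forward_only = birnn(xs, (1.0, 1.0), (1.0, 1.0), wy_f=1.0, wy_b=0.0)
backward_only = birnn(xs, (1.0, 1.0), (1.0, 1.0), wy_f=0.0, wy_b=1.0)
```

In this toy run the forward states know nothing about the final input at t = 0, while the backward states already carry it, which is the reason for combining both directions.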

Generative LSTM of Graves (2013)
[Figure: input, hidden layers, output]

Deep LSTM (Depth Gated LSTM)
[Figure 4.31: LSTM variants relating h_{t-1}, h_t, z_t, and x_t. Panels: (a) previous step, (b) generation, (c) recurrence, (d) inference, (e) full involvement]

From Pascanu (2014)
[Figure 4.27: five panels (a) to (e) relating x(t), h(t-1), h(t), y(t), and z(t). Adapted from Figure 2 of Pascanu et al. (reference 108)]

From Pascanu (2014)
[Figure: a standard LSTM block and a 2-dimensional grid LSTM block, with memory m and hidden state h on each dimension]

From Pascanu (2014)
[Figure 4.33: clockwork LSTM, with input layer, hidden layer (T1, T2, ..., Tg), and output layer]

The actor is Schmidhuber, who proposed LSTM.
https://www.youtube.com/watch?v=-OodHtJ1saY
