Dynamic Boltzmann Machine and Pommerman

IBM Research - Tokyo
Takayuki Osogami
© 2019 IBM Corporation

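Takayuki Osogami (@TOsogami)
1998: Joined IBM Japan, Ltd.; assigned to IBM Research - Tokyo
2005: Ph.D., Computer Science Department, Carnegie Mellon University (USA)
2013-19: Main co-investigator, JST CREST project
2015: Member, IBM Academy of Technology
2019: IBM Senior Technical Staff Member
Currently:
• Steering committee member, Advanced Innovation powered by Mathematics Platform (AIMaP)
• Committee member for joint use and joint research, Joint Research Center for Advanced and Fundamental Mathematics-for-Industry
• Active in academic societies related to artificial intelligence and machine learning, among others
Interests: probabilistic models, sequential decision making, reinforcement learning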

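From basic research to business innovation: activities of the Mathematical Sciences group, IBM Research - Tokyo
Basic research
Awards
• JSAI Annual Conference Excellence Award (2004, 2006, 2015, 2017)
• IBIS Workshop Best Presentation Award (2015)
• Queueing Research Group Paper Award (2015)
Academic books
Business innovation
• ORSJ Practice Award (2003)
• ICDM Data Mining Contest winner (2007)
• PDOS: manufacturing process optimization (image courtesy of worradmu at FreeDigitalPhotos.net)
• ORSJ Literature Award, Encouragement Prize (2010)
• ANACONDA: anomaly detection from sensor data
• Finance trend predictor: financial market forecasting
• NeurIPS Pommerman Competition winner (2018)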

Dynamic Boltzmann machine (DyBM): from scientific contributions to business innovations
Publication in a Nature journal (2015); business innovation (2018)

How can we make effective use of spike-timing dependent plasticity (STDP) in artificial neural networks?
Hebb's rule ('49): "Cells that fire together, wire together"
STDP ('90s; Bi & Poo 1998; Dan & Poo 2006): the amount of change depends on the timing of spikes
Today's artificial neural networks: ? [Nessler et al. 2013; Bengio et al. 2016; Scellier & Bengio 2016]

DyBM provides theoretical underpinnings for STDP, just as the Boltzmann machine does for Hebb's rule
Boltzmann machine --(MLE)--> Hebb's rule ("Cells that fire together, wire together")
Dynamic Boltzmann machine --(MLE)--> Spike-timing dependent plasticity (Bi & Poo 1998; Dan & Poo 2006)
[Figure: maximum likelihood estimation (MLE) derives each learning rule from its model; the DyBM refines the Boltzmann machine as STDP refines Hebb's rule]

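Learning rule of the Boltzmann machine, maximizing log-likelihood [Hinton et al. '83]
Neurons $i$ and $j$ are connected by a synapse with weight $w_{i,j}$; a pattern is $\mathbf{x} \in \{0,1\}^N$, with $P_\theta(\mathbf{x}) \propto \exp(-E_\theta(\mathbf{x}))$ and $E_\theta(\mathbf{x}) = -\sum_i b_i x_i - \sum_{i<j} w_{i,j} x_i x_j$
Log-likelihood of training data $\mathcal{D}$: $L(\theta) = \sum_{\mathbf{x} \in \mathcal{D}} \log P_\theta(\mathbf{x})$
Stochastic gradient, for $\mathbf{x} \in \mathcal{D}$: $w_{i,j} \leftarrow w_{i,j} + \eta \left( x_i x_j - \mathbb{E}_\theta[X_i X_j] \right)$
Expected value: $\mathbb{E}_\theta[X_i X_j] = \sum_{\tilde{\mathbf{x}}} \tilde{x}_i \tilde{x}_j \, P_\theta(\tilde{\mathbf{x}})$
cf. Hebb's rule: the weight increases when neurons $i$ and $j$ fire together ($x_i x_j = 1$)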
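As a minimal sketch, the stochastic gradient rule can be checked on a tiny fully visible Boltzmann machine. The code computes $\mathbb{E}_\theta[X_i X_j]$ exactly by enumerating all binary states, which is only feasible for a handful of neurons; practical implementations approximate this expectation, for example by sampling. The function names (`energy`, `expected_products`, `sgd_step`) are hypothetical and are not taken from the DyBM library.

```python
import itertools
import math


def energy(x, b, W):
    """E(x) = -sum_i b_i x_i - sum_{i<j} W[i][j] x_i x_j for a binary vector x."""
    n = len(x)
    e = -sum(b[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e -= W[i][j] * x[i] * x[j]
    return e


def expected_products(b, W):
    """Exact E[X_i X_j] under P(x) proportional to exp(-E(x)), enumerating all 2^n states."""
    n = len(b)
    states = list(itertools.product([0, 1], repeat=n))
    weights = [math.exp(-energy(s, b, W)) for s in states]
    z = sum(weights)
    ex = [[0.0] * n for _ in range(n)]
    for s, w in zip(states, weights):
        for i in range(n):
            for j in range(n):
                ex[i][j] += s[i] * s[j] * w / z
    return ex


def sgd_step(x, b, W, eta=0.1):
    """One stochastic gradient ascent step on log P(x) for a single training pattern x."""
    n = len(b)
    ex = expected_products(b, W)
    for i in range(n):
        b[i] += eta * (x[i] - ex[i][i])  # E[X_i X_i] = E[X_i] for binary units
        for j in range(i + 1, n):
            delta = eta * (x[i] * x[j] - ex[i][j])  # data term minus model term
            W[i][j] += delta
            W[j][i] += delta


# Toy data: neurons 0 and 1 always fire together; neuron 2 fires alone.
data = [(1, 1, 0), (0, 0, 1)]
b = [0.0, 0.0, 0.0]
W = [[0.0] * 3 for _ in range(3)]
for step in range(200):
    sgd_step(data[step % 2], b, W)
print([round(w, 2) for w in W[0]])
```

In this toy data neurons 0 and 1 always co-fire, so their weight grows, in the spirit of Hebb's rule. Neuron 2 never fires with either of them, so its weights become negative.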

[Figure: a pre-synaptic neuron connected to a post-synaptic neuron. Image courtesy of dream designs at FreeDigitalPhotos.net]

Spike-timing dependent plasticity (STDP): the amount of change depends on the timing of spikes (Bi & Poo 1998; Dan & Poo 2006)
• Pre-synaptic neuron fires shortly before the post-synaptic neuron: synapse strengthened (long-term potentiation, LTP)
• Pre-synaptic neuron fires shortly after the post-synaptic neuron: synapse weakened (long-term depression, LTD)
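Computational neuroscience often models this timing dependence with an exponential window. The functional form and parameter values below are assumptions and do not come from the slides or from DyBM. A pre-before-post pair potentiates the synapse, a post-before-pre pair depresses it, and the effect fades as the two spikes move further apart.

```python
import math


def stdp_delta_w(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair, with dt = t_post - t_pre (ms).

    Hypothetical exponential STDP window; parameter values are illustrative only.
    dt > 0 (pre before post): potentiation (LTP)
    dt < 0 (post before pre): depression (LTD)
    """
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


for dt in (-40, -10, 10, 40):
    print(dt, round(stdp_delta_w(dt), 5))
```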

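Dynamic Boltzmann machine as a limit of a sequence of Boltzmann machines
A Boltzmann machine for a $T$-th order Markov model has a weight $w_{i,j}^{[\delta]}$ from neuron $i$ at time $t-\delta$ to neuron $j$ at time $t$, for $\delta = 1, \ldots, T$
We learn the distribution of the next value $\mathbf{x}^{[t]}$ given the historical values $\mathbf{x}^{[t-T, t-1]}$
The dynamic Boltzmann machine is the limit $T \to \infty$
[Figure: time axis showing historical values and the next value]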
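The sketch below illustrates why the limit is tractable. It assumes that the weight from neuron $i$ at lag $\delta$ to neuron $j$ decays geometrically after the conduction delay $d$: $w^{[\delta]} = \sum_k u_k \lambda_k^{\delta - d}$ for $\delta \ge d$, and 0 otherwise. This follows the DyBM papers, but treat the exact form as an assumption here. Under it, the input from a $T$-th order Boltzmann machine approaches, as $T$ grows, a quantity that one running trace per decay rate can compute without storing the whole history. The function names are hypothetical.

```python
def ltp_weight(delta, d, u, lam):
    """Assumed weight at lag delta: sum_k u_k * lam_k**(delta - d) if delta >= d, else 0."""
    if delta < d:
        return 0.0
    return sum(uk * lk ** (delta - d) for uk, lk in zip(u, lam))


def markov_input(history, T, d, u, lam):
    """Input to neuron j at time t from a T-th order model; history[-1] is x^[t-1]."""
    return sum(ltp_weight(delta, d, u, lam) * history[-delta]
               for delta in range(1, min(T, len(history)) + 1))


def trace_input(history, d, u, lam):
    """Same input via running traces alpha_k = sum_{s <= t-d} lam_k**(t-d-s) * x^[s]."""
    t = len(history)
    total = 0.0
    for uk, lk in zip(u, lam):
        alpha = 0.0
        for s in range(t - d + 1):  # only spikes that have travelled the delay d
            alpha = lk * alpha + history[s]
        total += uk * alpha
    return total


history = [1 if s % 3 == 0 else 0 for s in range(300)]  # toy spike train of neuron i
d, u, lam = 2, [0.5, 0.3], [0.5, 0.9]
full = trace_input(history, d, u, lam)
for T in (5, 20, 100, 300):
    # Truncation error of the T-th order model shrinks as T grows.
    print(T, full - markov_input(history, T, d, u, lam))
```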

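Inference with Dynamic Boltzmann machine (LTP only)
Conduction delay $d_{i,j} \ge 1$ from neuron $i$ to neuron $j$; decay rates $\lambda_k \in [0, 1)$
Synaptic eligibility trace: $\alpha_{i,j,k}^{[t-1]} = \sum_{s=-\infty}^{t-d_{i,j}} \lambda_k^{\,t-d_{i,j}-s}\, x_i^{[s]}$
Probability for neuron $j$ to fire at time $t$: $P(x_j^{[t]} = 1 \mid \mathbf{x}^{[:t-1]}) = \sigma\big(b_j + \sum_i \sum_k u_{i,j,k}\, \alpha_{i,j,k}^{[t-1]}\big)$
where $\sigma(z) = 1/(1+e^{-z})$ and $u_{i,j,k}$ is the LTP weight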
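The following is a minimal sketch of this inference step. It assumes binary spikes, a single post-synaptic neuron $j$, and the trace written as $\alpha^{[t-1]} = \sum_{s \le t-d} \lambda^{t-d-s} x^{[s]}$; that indexing convention is itself an assumption. The function names are hypothetical and are not the API of github.com/ibm-research-tokyo/dybm.

```python
import math


def eligibility_trace(spikes, d, lam):
    """alpha^[t-1] = sum_{s <= t-d} lam**(t-d-s) * x^[s], where spikes[s] = x^[s], s = 0..t-1."""
    t = len(spikes)
    return sum(lam ** (t - d - s) * spikes[s] for s in range(t - d + 1))


def fire_probability(bias, u, alpha):
    """P(x_j^[t] = 1 | past) = sigmoid(b_j + sum_i sum_k u[i][k] * alpha[i][k]), LTP terms only."""
    z = bias + sum(u_ik * a_ik
                   for u_i, a_i in zip(u, alpha)
                   for u_ik, a_ik in zip(u_i, a_i))
    return 1.0 / (1.0 + math.exp(-z))


# Two pre-synaptic neurons with delay d = 1 and two decay rates per synapse.
lams = [0.5, 0.9]
recent = [0, 0, 0, 1]  # neuron 0 spiked one step ago
old = [1, 0, 0, 0]     # neuron 1 spiked four steps ago
alpha = [[eligibility_trace(recent, 1, l) for l in lams],
         [eligibility_trace(old, 1, l) for l in lams]]
u = [[1.0, 0.5], [1.0, 0.5]]
print(alpha, fire_probability(-1.0, u, alpha))
```

A recent spike contributes more to the trace than an old one, so, with positive LTP weights, it raises the firing probability more.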

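Learning with DyBM, maximizing log-likelihood
Conduction delay $d_{i,j}$; synaptic eligibility trace: $\alpha_{i,j,k}^{[t-1]} = \sum_{s=-\infty}^{t-d_{i,j}} \lambda_k^{\,t-d_{i,j}-s}\, x_i^{[s]}$
Expected value: $\langle X_j^{[t]} \rangle = P(x_j^{[t]} = 1 \mid \mathbf{x}^{[:t-1]})$
Stochastic gradient update for LTP weight: $u_{i,j,k} \leftarrow u_{i,j,k} + \eta\, \alpha_{i,j,k}^{[t-1]} \big( x_j^{[t]} - \langle X_j^{[t]} \rangle \big)$
The update is spike-timing dependent: $\alpha_{i,j,k}^{[t-1]}$ reflects how recently and how often spikes have reached neuron $j$ from neuron $i$
cf. Boltzmann machine: $w_{i,j} \leftarrow w_{i,j} + \eta \left( x_i x_j - \mathbb{E}_\theta[X_i X_j] \right)$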
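The update above can be sketched as one stochastic gradient step on $\log P(x_j^{[t]} \mid \text{past})$ for a sigmoid firing probability. The gradient with respect to $u_{i,j,k}$ is $\alpha_{i,j,k}^{[t-1]} (x_j^{[t]} - p)$, where $p$ is the firing probability. The hypothetical helper below takes the traces as given and covers only the LTP weights and the bias. It is an illustration, not the reference implementation.

```python
import math


def ltp_sgd_step(bias, u, alpha, x_j, eta=0.1):
    """Return updated (bias, u) after one gradient ascent step on log P(x_j | past).

    u[i][k], alpha[i][k]: LTP weight and eligibility trace for pre-synaptic neuron i, decay k.
    x_j: observed spike (0 or 1) of the post-synaptic neuron at time t.
    """
    z = bias + sum(u_ik * a_ik for u_i, a_i in zip(u, alpha) for u_ik, a_ik in zip(u_i, a_i))
    p = 1.0 / (1.0 + math.exp(-z))  # model's expected value <X_j^[t]>
    err = x_j - p                   # observed minus expected, cf. x_i x_j - E[X_i X_j]
    new_u = [[u_ik + eta * a_ik * err for u_ik, a_ik in zip(u_i, a_i)]
             for u_i, a_i in zip(u, alpha)]
    return bias + eta * err, new_u


alpha = [[1.0, 1.0], [0.0, 0.0]]  # neuron 0 spiked recently; neuron 1 has not spiked
bias, u = 0.0, [[0.0, 0.0], [0.0, 0.0]]
bias, u = ltp_sgd_step(bias, u, alpha, x_j=1)
print(bias, u)
```

Only synapses with a non-zero trace change, that is, synapses whose pre-synaptic neuron has fired recently. This is where the spike-timing dependence of the rule comes from.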

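No back propagation through time in DyBM's learning
Per step, each neuron $j$ only updates its traces, $\alpha_{i,j,k}^{[t]} = \lambda_k\, \alpha_{i,j,k}^{[t-1]} + x_i^{[t+1-d_{i,j}]}$, and sums over its inputs, $\sum_i^{*} \sum_k u_{i,j,k}\, \alpha_{i,j,k}^{[t]}$
*The summation is over the pre-synaptic neurons connected to $j$
Per-step learning time is therefore independent of the length of the time series (local in time and space)
cf. Back propagation through time is needed for recurrent neural networks (including LSTM)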
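One way to see the locality is to maintain each trace with two pieces of state: a first-in-first-out queue of length $d_{i,j}$, which models the conduction delay, and one running sum per decay rate. Each step then costs the same regardless of the length of the series. The sketch below checks the recursive update against the explicit sum over the whole history. The class name and the indexing $\alpha^{[t]} = \lambda \alpha^{[t-1]} + x_i^{[t+1-d]}$ are assumptions.

```python
from collections import deque


class EligibilityTrace:
    """Trace of one synapse i -> j for one decay rate lam, updated online.

    Memory is O(d) (the FIFO queue) and time per step is O(1),
    independent of how many steps have been processed.
    """

    def __init__(self, d, lam):
        self.lam = lam
        self.fifo = deque([0] * d, maxlen=d)  # holds x_i^[t-d+1], ..., x_i^[t]
        self.alpha = 0.0

    def step(self, x_t):
        """Push spike x_i^[t]; return alpha^[t] = lam * alpha^[t-1] + x_i^[t+1-d]."""
        self.fifo.append(x_t)  # the oldest entry is dropped automatically
        self.alpha = self.lam * self.alpha + self.fifo[0]
        return self.alpha


def explicit_trace(spikes, d, lam):
    """Reference: sum_{s <= t+1-d} lam**(t+1-d-s) * x^[s], using the whole history."""
    t1 = len(spikes)  # t + 1
    return sum(lam ** (t1 - d - s) * spikes[s] for s in range(t1 - d + 1))


spikes = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
trace = EligibilityTrace(d=3, lam=0.8)
online_values = [trace.step(x) for x in spikes]
offline_values = [explicit_trace(spikes[: t + 1], 3, 0.8) for t in range(len(spikes))]
print([round(v, 4) for v in online_values])
```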

Online learning can also improve predictive accuracy for non-stationary data
Predictive accuracy*   Training   Test
Batch                  0.932      0.863
Online                 0.980      0.958
Batch: train DyBM optimally, then test with fixed parameters
Online: train DyBM optimally, then continue learning online and test while learning online
*Predictive accuracy is the correlation coefficient between predicted and realized values in sensor data from a power generator; the figure, however, shows IBM stock price data from Yahoo! Finance
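The predictive-accuracy metric in the table is a correlation coefficient between predicted and realized values. The slide does not say which coefficient is used; assuming the usual Pearson correlation, a dependency-free sketch is below. The function name is hypothetical.

```python
import math


def predictive_accuracy(predicted, realized):
    """Pearson correlation coefficient between predictions and realized values."""
    n = len(predicted)
    mp = sum(predicted) / n
    mr = sum(realized) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(predicted, realized))
    vp = sum((p - mp) ** 2 for p in predicted)
    vr = sum((r - mr) ** 2 for r in realized)
    return cov / math.sqrt(vp * vr)


print(predictive_accuracy([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]))
```

A value of 1 means the predictions move perfectly in step with the realized values. Because the coefficient is scale-free, it rewards tracking the shape of the series rather than its exact level.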

DyBM provides theoretical underpinnings for STDP
Hebb's rule ('49) motivated artificial neural networks: Perceptron ('58), followed by failure
Theoretical underpinnings: Hopfield network ('82), Boltzmann machine ('83)
Success: deep learning (2000s-2010s)
STDP ('90s) → theoretical underpinnings: Dynamic Boltzmann machine → successful applications

Extensions of DyBM
• To structured time-series: T. Osogami, R. Raymond, A. Goel, T. Shirai, and T. Maehara, "Dynamic determinantal point processes," AAAI-18
• To real-valued time-series: S. Dasgupta and T. Osogami, "Nonlinear dynamic Boltzmann machines for time series prediction," AAAI-17
• To models with hidden units: T. Osogami, H. Kajino, and T. Sekiyama, "Bidirectional learning for time-series models with hidden units," ICML 2017
• To continuous space: H. Kajino, "A functional dynamic Boltzmann machine," IJCAI-17

References
• T. Osogami, Boltzmann Machines (ボルツマンマシン, in Japanese), Corona Publishing, 2019
• T. Osogami and M. Otsuka, "Seven neurons memorizing sequences of alphabetical images via spike-timing dependent plasticity," Scientific Reports 5, 14149 (2015). www.nature.com/articles/srep14149
• T. Osogami and S. Dasgupta, "Energy-based machine learning," IJCAI-17 tutorial. researcher.watson.ibm.com/researcher/view_group.php?id=7834
• github.com/ibm-research-tokyo/dybm

We won the NeurIPS 2018 Pommerman Competition

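Pommerman is beyond the reach of today's AI technology
What makes Pommerman hard:
• Real-time decision making
• Coordination among multiple agents
• Partial observability
• Long-term planning
AI conferences turn hard problems like this into competitions to drive technical progress
[Figure: IBM agents (red) vs. default agents (blue)]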

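Actions are chosen one after another so that the goal is eventually achieved
[Figure: break walls → collect items → corner the enemy → win]
What should the agent do to win? This is sequential decision making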

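Approaches to sequential decision-making problems
Environment known → planning; environment unknown → reinforcement learning
• The environment can be simulated
• Other agents' moves are unknown
• Parts of the environment cannot be observed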

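When the environment can be simulated, tree search is effective
[Figure: search tree whose branches are joint actions of the four agents, e.g., (bomb, right, bomb, up), (left, right, right, up), ...]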
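When a simulator is available, the simplest form of tree search enumerates action sequences, simulates each one, and keeps the first action of the best sequence. The toy below shows the idea in a one-dimensional world. Its `simulate` and `evaluate` functions are hypothetical and are not Pommerman's API. In Pommerman, each branch would be a joint action of all four agents, as in the figure.

```python
def tree_search(state, simulate, evaluate, actions, depth):
    """Exhaustive depth-limited search. Returns (best value, best first action)."""
    if depth == 0:
        return evaluate(state), None
    best_value, best_action = float("-inf"), None
    for a in actions:
        value, _ = tree_search(simulate(state, a), simulate, evaluate, actions, depth - 1)
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action


# Toy world: the state is a position on a line; the goal is position 3.
GOAL = 3
ACTIONS = (-1, 0, +1)


def simulate(pos, action):
    return pos + action


def evaluate(pos):
    return -abs(pos - GOAL)


value, action = tree_search(0, simulate, evaluate, ACTIONS, depth=3)
print(value, action)
```

The number of simulated leaves is `len(actions) ** depth`, so this direct approach quickly becomes infeasible as the branching factor and horizon grow.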

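Pommerman requires real-time decisions over a huge search tree
• Branching factor: ~ … possibilities per step
• Must look at least 10 steps ahead (the lifetime of a bomb): … possibilities
• Must decide within 0.1 seconds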
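Here is a back-of-the-envelope count of the tree. Its assumptions do not come from the slide: each of four agents chooses among six actions (stop, up, down, left, right, place a bomb) at every step, and the search looks 10 steps ahead.

```python
ACTIONS_PER_AGENT = 6  # assumption: stop, up, down, left, right, bomb
NUM_AGENTS = 4         # assumption: four agents act simultaneously
HORIZON = 10           # look at least 10 steps ahead (bomb lifetime)

branching = ACTIONS_PER_AGENT ** NUM_AGENTS  # joint actions per step
leaves = branching ** HORIZON                # action sequences of length HORIZON
print(f"branching factor: {branching}, leaves: {leaves:.2e}")
```

Even at a billion simulated steps per second, visiting every leaf within a 0.1-second budget is out of the question under these assumptions.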

New technique: real-time tree search with pessimistic scenarios
T. Osogami & T. Takahashi, Real-time tree search with pessimistic scenarios, arXiv:1902.10870
• Tree search over stochastic scenarios
• Evaluation by deterministic, pessimistic scenarios

Pessimistic scenarios in Pommerman can be created by letting opponents take multiple actions simultaneously
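A minimal grid sketch of this idea follows, under simplifying assumptions: there are no bombs, flames, or items, `1` marks a wall, and the function name is hypothetical. Letting an opponent take every move at once turns its uncertain position into a deterministic set of cells it might occupy, which a planner could then treat as dangerous.

```python
def pessimistic_positions(grid, start, steps):
    """Cells an opponent may occupy if, at each step, it takes every move at once
    (stay, up, down, left, right). grid[r][c] == 1 marks an impassable wall."""
    rows, cols = len(grid), len(grid[0])
    occupied = {start}
    for _ in range(steps):
        nxt = set()
        for r, c in occupied:
            for dr, dc in ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                    nxt.add((nr, nc))
        occupied = nxt
    return occupied


empty = [[0] * 3 for _ in range(3)]
walled = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
print(len(pessimistic_positions(empty, (1, 1), 1)), len(pessimistic_positions(walled, (1, 1), 1)))
```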

The optimal level of pessimism was learned through self-play
[Figure: pessimism levels 0, 1, 2, and 3]

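The number of positions an agent can move to measures its "survivability"
[Figure: positions the agent can move to]
Good actions:
- Increase the survivability of oneself and one's teammates
- Decrease the survivability of enemies
- Collect items while keeping survivability above a threshold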
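Counting the positions an agent can reach is a breadth-first search. The sketch below rests on several assumptions. Cells marked `1` are impassable; these might be walls or, for instance, cells threatened in a pessimistic scenario. The count is limited to a hypothetical horizon `max_steps`. Finally, `score` is a hypothetical way to combine survivabilities along the lines of the good actions listed above.

```python
from collections import deque


def survivability(grid, start, max_steps):
    """Number of free cells reachable from start within max_steps moves (BFS)."""
    rows, cols = len(grid), len(grid[0])
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        (r, c), dist = queue.popleft()
        if dist == max_steps:
            continue
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return len(seen)


def score(grid, allies, enemies, max_steps=10):
    """Hypothetical evaluation: higher when allies can move freely and enemies cannot."""
    return (sum(survivability(grid, a, max_steps) for a in allies)
            - sum(survivability(grid, e, max_steps) for e in enemies))


open_grid = [[0] * 3 for _ in range(3)]
boxed = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
print(survivability(open_grid, (1, 1), 10), survivability(boxed, (1, 1), 10))
```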

Potential applications of tree search with pessimistic scenarios
Games
• Debugging
• In-game characters
Video and simulation
Autonomous flight and driving

To try running Pommerman:
$ git clone https://github.com/MultiAgentLearning/playground.git
$ cd playground
$ pip install -r requirements.txt
$ python examples/simple_ffa_run.py
For details, see
https://github.com/MultiAgentLearning/playground/tree/master/docs

A Pommerman competition will also be held at NeurIPS 2019
Same rules as last year, plus a new rule:
• Communication between agents is allowed
For details, see
https://www.pommerman.com/competitions

By competing while cooperating, we built a winning agent
Shared information:
• Ideas and methods
• What worked
Each member then builds their own agent aiming to win

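Pommerman summary
Tree search with pessimistic scenarios is effective for real-time sequential decision making in situations that demand a high level of safety
Potential applications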

Thank you
