DEEP LEARNING JP [DL Papers]
http://deeplearning.jp/
Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation (NIPS 2018)
Kazuki Fujikawa, DeNA
Summary
• Bibliographic information
– Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation
• NIPS 2018 (to appear)
• Jiaxuan You, Bowen Liu, Rex Ying, Vijay Pande, Jure Leskovec
• Overview
– Proposes the Graph Convolutional Policy Network (GCPN)
• Generates molecular graphs that optimize desired properties via reinforcement learning
• Learns a policy that optimizes domain-specific rewards together with an adversarial loss
– Outperforms existing methods in experiments on molecular property optimization, property targeting, and more
• Because valency checks are applied during generation, no molecule violating valence rules is ever produced
Outline
• Background
• Related Work
• Proposed Method
• Experiments & Results
Background
• Generating graph structures that optimize an objective function is important in drug discovery and materials chemistry
– Typical molecular design seeks ideal values for properties such as drug-likeness and synthesizability while obeying physical laws such as valence rules
– Optimizing against such complex, non-differentiable rules remains difficult
• Directly generating variable-sized graphs is not straightforward
– Compared to linear sequences such as natural language, graphs are harder to generate because of branching, multiple bond types, and the lack of a clear starting point

Figure reproduced from Gómez-Bombarelli+, 2018
Related Work
• Existing approaches can be broadly divided into two groups by the data format they operate on
– Text-based
• SMILES
– A notation that expresses a molecule's chemical structure as a string
• SMILES CFG (context-free grammar)
– A sequence of production rules of a context-free grammar that generates SMILES
– Graph-based
• Generate the adjacency matrix directly
• Generate nodes and bonds autoregressively (one row of the adjacency matrix at a time)

Example — molecule: benzene; structural formula (graph): a six-membered aromatic ring; SMILES: c1ccccc1

Adjacency matrix:
0 1 0 0 0 1
1 0 1 0 0 0
0 1 0 1 0 0
0 0 1 0 1 0
0 0 0 1 0 1
1 0 0 0 1 0

SMILES CFG production rules:
smiles → chain
chain → chain, branched atom
chain → branched atom
branched atom → atom, ringbond
branched atom → atom
atom → aromatic organic
atom → aliphatic organic
ringbond → digit
aromatic organic → 'c'
aliphatic organic → 'C'
aliphatic organic → 'N'
digit → '1'
digit → '2'
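To make the correspondence in this table concrete, the following sketch (assuming RDKit is available) parses benzene's SMILES and recovers the adjacency matrix shown above:

```python
# Minimal sketch (assumes RDKit is installed): parse benzene's SMILES and
# recover the 6x6 adjacency matrix from the table above.
from rdkit import Chem

mol = Chem.MolFromSmiles("c1ccccc1")  # benzene
adj = Chem.GetAdjacencyMatrix(mol)    # 0/1 matrix; entry (j, k) = 1 iff atoms j, k are bonded

for row in adj:
    print(" ".join(str(v) for v in row))
```

Each atom in the aromatic ring is bonded to exactly its two ring neighbors, which is why every row of the matrix contains exactly two ones.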
Related Work (SMILES-based)
• Approaches that generate SMILES with text-based generative models
– Automatic chemical design using a data-driven continuous representation of molecules [Gómez-Bombarelli+, 2018]
• Learns a latent space by training an autoencoder / VAE to reconstruct input SMILES
• Optimizes the target property with Bayesian optimization
– ORGAN [Guimaraes+, 2017]
• Optimizes SMILES string generation by an RNN decoder with GAN + RL
• As in SeqGAN [Yu+, 2017], trains with the discriminator's average score as the reward
• Simultaneously maximizes scores from arbitrary heuristics (e.g., diversity)
Gómez-Bombarelli+, 2018

Excerpt from Guimaraes+, 2017 (ORGAN), Section 3 "Background", on the GAN + RL setting adopted from SeqGAN [Yu et al., 2017]:

"$G_\theta$ is a generator parametrized by $\theta$, trained to produce high-quality sequences $Y_{1:T} = (y_1, \ldots, y_T)$ of length $T$, and a discriminator model $D_\phi$ parametrized by $\phi$ is trained to classify real and generated sequences. $G_\theta$ is trained to deceive $D_\phi$, and $D_\phi$ to classify correctly. Both models are trained in alternation, following a minimax game.

To evaluate a partial sequence before it is completed, an $N$-time Monte Carlo search is performed with the canonical rollout policy $G_\theta$, represented as

$$\mathrm{MC}^{G_\theta}(Y_{1:t}; N) = \{Y^1_{1:T}, \ldots, Y^N_{1:T}\} \tag{3}$$

where $Y^n_{1:t} = Y_{1:t}$ and $Y^n_{t+1:T}$ is stochastically sampled via the policy $G_\theta$. Now $Q(s, a)$ becomes

$$Q(Y_{1:t-1}, y_t) = \begin{cases} \dfrac{1}{N} \sum_{n=1}^{N} R(Y^n_{1:T}),\ \ Y^n_{1:T} \in \mathrm{MC}^{G_\theta}(Y_{1:t}; N), & \text{if } t < T \\ R(Y_{1:T}), & \text{if } t = T \end{cases} \tag{4}$$

An unbiased estimation of the gradient of $J(\theta)$ can be derived as

$$\nabla_\theta J(\theta) \simeq \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}_{y_t \sim G_\theta(y_t \mid Y_{1:t-1})}\big[\nabla_\theta \log G_\theta(y_t \mid Y_{1:t-1}) \cdot Q(Y_{1:t-1}, y_t)\big] \tag{5}$$

Finally, in SeqGAN the reward function is provided by $D_\phi$."

Figure 1 (Guimaraes+, 2017): Schema for ORGAN. Left: D is trained as a classifier receiving as input a mix of real data and data generated by G.
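To make eq. (4) concrete, here is a minimal sketch of the Monte Carlo Q-estimate; the `policy` and `reward_fn` callables are illustrative assumptions, not ORGAN's actual interfaces:

```python
def mc_q_estimate(prefix, policy, reward_fn, T, N=16):
    """Estimate Q(Y_{1:t-1}, y_t) as in eq. (4): average the final reward
    over N rollouts that complete the prefix under the current policy.
    `policy(seq)` samples the next token; `reward_fn(seq)` scores a
    finished length-T sequence (e.g., the discriminator's output)."""
    if len(prefix) == T:            # t = T: sequence already complete
        return reward_fn(prefix)
    total = 0.0
    for _ in range(N):              # t < T: N-time Monte Carlo search
        rollout = list(prefix)
        while len(rollout) < T:
            rollout.append(policy(rollout))
        total += reward_fn(rollout)
    return total / N
```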
Related Work (Graph-based)
• Approaches that generate molecular graphs directly with graph-based generative models
– Learning deep generative models of graphs [Li+, 2018]
• Generates nodes and bonds one at a time, autoregressively
• Extracts features from the partially generated graph with graph convolutions and uses them to decide the next node / bond to generate
– Junction Tree Variational Autoencoder for Molecular Graph Generation [Jin+, 2018]
• Converts the graph into a tree by grouping substructures such as rings into single nodes (tree decomposition)
• Trains a VAE to reconstruct the tree structure
• Maps the tree back to a graph, also using features extracted with graph convolutions
Excerpts from Jin+, 2018 (Junction Tree Variational Autoencoder for Molecular Graph Generation):

Figure 2. Comparison of two graph generation schemes: the structure-by-structure approach is preferred as it avoids invalid intermediate states (marked in red) encountered in the node-by-node approach.

"...the subgraphs (nodes in the tree) are assembled together into a coherent molecular graph. We evaluate our model on multiple tasks ranging from molecular generation to optimization of a given molecule according to desired properties. As baselines, we utilize state-of-the-art SMILES-based generation approaches (Kusner et al., 2017; Dai et al., 2018). We demonstrate that our model produces 100% valid molecules when sampled from a prior distribution, outperforming the top-performing baseline by a significant margin. In addition, we show that our model excels in discovering molecules with desired properties, yielding a 30% relative gain over the baselines."

Figure 3. Overview of the method: a molecular graph G is first decomposed into its junction tree T_G, where each colored node in the tree represents a substructure in the molecule. Both the tree and the graph are then encoded into their latent embeddings z_T and z_G.

Li+, 2018
Jin+, 2018
Graph Generation as MDP
• The iterative graph generation process is formalized as an MDP
– States: $S = \{s_t\}$
• The intermediate graph the agent observes at time $t$
– Actions: $A = \{a_t\}$
• The set of actions that modify the current graph at each step (adding a node or a bond, etc.)
– Transition dynamics: $P = p(s_{t+1} \mid s_t, \ldots, s_0, a_t)$
• The probability of transitioning to the next state after taking action $a_t$ given the history $s_t, \ldots, s_0$
– Rewards: $R(s_t)$
• The reward received upon reaching state $s_t$
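Putting the four components together, a minimal sketch of the resulting generation loop (the `env` and `policy` interfaces here are illustrative assumptions, not the authors' code):

```python
def generate(env, policy, max_steps=100):
    """Roll out the graph-generation MDP once (illustrative sketch)."""
    s = env.reset()                # s_0: initial graph (e.g., a single atom)
    total_reward = 0.0
    for t in range(max_steps):
        a = policy.sample(s)       # a_t: modification of the current graph
        s, r, done = env.step(a)   # transition p(s_{t+1} | s_t, ..., s_0, a_t)
        total_reward += r          # reward R(s_t)
        if done:                   # stop action chosen or size limit reached
            break
    return s, total_reward
```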
Excerpt from You+, 2018 (Sections 3 and 3.1 "Problem Definition"):

"In this section we formulate the problem of graph generation as learning an RL agent that iteratively adds substructures and edges to the molecular graph in a chemistry-aware environment. We describe the problem definition, the environment design, and the Graph Convolutional Policy Network that predicts a distribution of actions which are used to update the graph being generated.

We represent a graph $G$ as $(A, E, F)$, where $A \in \{0,1\}^{n \times n}$ is the adjacency matrix and $F \in \mathbb{R}^{n \times d}$ is the node feature matrix, assuming each node has $d$ features. We define $E \in \{0,1\}^{b \times n \times n}$ to be the (discrete) edge-conditioned adjacency tensor, assuming there are $b$ possible edge types: $E_{i,j,k} = 1$ if there exists an edge of type $i$ between nodes $j$ and $k$, and $A = \sum_{i=1}^{b} E_i$. Our primary objective is to generate graphs that maximize a given property function $S(G) \in \mathbb{R}$, i.e., maximize $\mathbb{E}_{G'}[S(G')]$, where $G'$ is the generated graph and $S$ could be one or multiple domain-specific statistics of interest.

It is also of practical importance to constrain our model with two main sources of prior knowledge. (1) Generated graphs need to satisfy a set of hard constraints. (2) We provide the model with a set of example graphs $G \sim p_{data}(G)$, and would like to incorporate such prior knowledge by regularizing the property optimization objective with $\mathbb{E}_{G,G'}[J(G, G')]$ under distance metric $J(\cdot,\cdot)$. In the case of molecule generation, the set of hard constraints is described by chemical valency, while the distance metric is an adversarially trained discriminator."
Graph Convolutional Policy Network (GCPN)
• Feature extraction over the partially generated graph $G_t$ and candidate structures $C$ via graph convolution
– Candidate structures (scaffolds)
• Candidate subgraphs to be newly attached to the partially generated graph
• These could in principle be sets of several atoms, but this work assumes single atoms only
– Features are extracted from the extended graph $G_t \cup C$ with a model that extends a GCN [Kipf+, 2017]
• Kipf+'s method is extended to take bond types into account:

$$H^{(l+1)} = \mathrm{AGG}\Big(\mathrm{ReLU}\big(\{\tilde{D}_i^{-1/2}\,\tilde{E}_i\,\tilde{D}_i^{-1/2}\,H^{(l)}\,W_i^{(l)}\},\ \forall i \in (1, \ldots, b)\big)\Big)$$

– The layer-$l$ node embeddings $H^{(l)}$ are convolved with weights $W_i^{(l)}$ defined per bond type
– After the nonlinearity, the AGG operation merges the per-bond-type results into $H^{(l+1)}$
– $E_i$: the $i$-th slice of the adjacency tensor $E$ extended with a bond-type dimension, with $\tilde{E}_i = E_i + I$ and $\tilde{D}_i$ the degree matrix given by $\tilde{D}_{i,jj} = \sum_k \tilde{E}_{i,j,k}$
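A NumPy sketch of this edge-type-conditioned convolution follows; the SUM aggregator matches the experimental setting reported later, while everything else (shapes, interfaces) is illustrative:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gcpn_conv_layer(E, H, W):
    """One edge-type-conditioned graph convolution (sketch).
    E: (b, n, n) edge-conditioned adjacency tensor
    H: (n, k) node embeddings H^(l)
    W: list of b per-bond-type weight matrices W_i^(l), each (k, k')
    Returns H^(l+1), aggregating over bond types with SUM."""
    b, n, _ = E.shape
    out = 0.0
    for i in range(b):
        E_t = E[i] + np.eye(n)                         # E~_i = E_i + I
        d_inv_sqrt = np.diag(E_t.sum(axis=1) ** -0.5)  # D~_i^{-1/2}
        out = out + relu(d_inv_sqrt @ E_t @ d_inv_sqrt @ H @ W[i])
    return out
```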
Graph Convolutional Policy Network (GCPN)
• Action prediction
– In the manner of link prediction on graphs, estimate $a_{t+1} = \mathrm{concat}(a_{\mathrm{first}}, a_{\mathrm{second}}, a_{\mathrm{edge}}, a_{\mathrm{stop}})$
• Use the node embeddings computed above to decide which node to select first:
– $f_{\mathrm{first}}(s_t) = \mathrm{softmax}(m_f(X)),\ a_{\mathrm{first}} \sim f_{\mathrm{first}}(s_t) \in \{0,1\}^n$  ($m_f$: an MLP mapping $\mathbb{R}^{n \times k} \to \mathbb{R}^n$)
• Use information about the first selected node to decide which node to select second:
– $f_{\mathrm{second}}(s_t) = \mathrm{softmax}(m_s(X_{a_{\mathrm{first}}}, X)),\ a_{\mathrm{second}} \sim f_{\mathrm{second}}(s_t) \in \{0,1\}^{n+c}$
• Use the two selected nodes to decide the bond type:
– $f_{\mathrm{edge}}(s_t) = \mathrm{softmax}(m_e(X_{a_{\mathrm{first}}}, X_{a_{\mathrm{second}}})),\ a_{\mathrm{edge}} \sim f_{\mathrm{edge}}(s_t) \in \{0,1\}^b$
• Use information about the whole current graph to decide whether to terminate generation:
– $f_{\mathrm{stop}}(s_t) = \mathrm{softmax}(m_t(\mathrm{AGG}(X))),\ a_{\mathrm{stop}} \sim f_{\mathrm{stop}}(s_t) \in \{0,1\}$
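The four heads can be sketched as follows; the `mlps` dict stands in for the paper's MLPs $m_f, m_s, m_e, m_t$, whose exact architectures are assumptions here:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    return np.exp(z) / np.exp(z).sum()

def sample_action(X, mlps, rng=np.random.default_rng()):
    """Sample a_{t+1} = (a_first, a_second, a_edge, a_stop) from node
    embeddings X of shape (n + c, k) for the extended graph (existing
    nodes plus scaffold atoms). Illustrative sketch only."""
    p_first = softmax(mlps["m_f"](X))  # in the paper, restricted to nodes already in G_t
    a_first = rng.choice(len(p_first), p=p_first)
    p_second = softmax(mlps["m_s"](X[a_first], X))          # over n + c nodes
    a_second = rng.choice(len(p_second), p=p_second)
    p_edge = softmax(mlps["m_e"](X[a_first], X[a_second]))  # over b bond types
    a_edge = rng.choice(len(p_edge), p=p_edge)
    p_stop = softmax(mlps["m_t"](X.sum(axis=0)))            # AGG(X) -> {continue, stop}
    a_stop = rng.choice(2, p=p_stop)
    return a_first, a_second, a_edge, a_stop
```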
State Transitions / Rewards
• State transitions
– A valency check is run on the molecule after the node/edge proposed by the generator is added; if the molecule is invalid at that point, the state is not updated and a new action is sampled
• Rewards
– Step reward
• Whether valence rules were violated + adversarial reward $V(\pi_\theta, D_\phi)$
• The discriminator that supplies the adversarial reward is trained following the standard GAN framework:

$$\min_\theta \max_\phi\, V(\pi_\theta, D_\phi) = \mathbb{E}_{x \sim p_{data}}[\log D_\phi(x)] + \mathbb{E}_{x \sim \pi_\theta}[\log(1 - D_\phi(x))]$$

– Final reward
• Domain-specific reward (a combination of logP, QED, molecular weight, etc.) + adversarial reward $V(\pi_\theta, D_\phi)$
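The valency-gated transition can be sketched with RDKit's sanitization as the validity test; the `apply_action` plumbing is an illustrative assumption:

```python
from rdkit import Chem

def passes_valency_check(mol):
    """Return True if RDKit can sanitize the (possibly partial) molecule;
    sanitization raises when valence rules are violated."""
    try:
        Chem.SanitizeMol(Chem.RWMol(mol))  # sanitize an editable copy
        return True
    except Exception:                      # e.g., a valence exception
        return False

def step(state, action, apply_action):
    """If the proposed node/edge violates valency, keep the old state so
    the policy can sample another action, as described above."""
    candidate = apply_action(state, action)
    if not passes_valency_check(candidate):
        return state, False                # state not updated; resample
    return candidate, True
```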
Policy Learning with a Policy-Gradient Method
• The policy is learned with Proximal Policy Optimization (PPO) [Schulman+, 2017]
– Vanilla policy gradient:
• $L^{PG}(\theta) = \hat{\mathbb{E}}_t\big[\log \pi_\theta(a_t \mid s_t)\,\hat{A}_t\big]$
– Conservative Policy Iteration (CPI): looks at the ratio to the previous policy
• $L^{CPI}(\theta) = \hat{\mathbb{E}}_t\Big[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}\,\hat{A}_t\Big] = \hat{\mathbb{E}}_t\big[r_t(\theta)\,\hat{A}_t\big]$
– Proximal Policy Optimization (PPO): stabilizes learning by limiting how far each update can move the policy
• $L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\, 1-\varepsilon,\, 1+\varepsilon)\,\hat{A}_t\big)\big]$
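The clipped surrogate is simple to write down; a minimal NumPy sketch (advantage estimation and the optimizer loop are out of scope here):

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, adv, eps=0.2):
    """L^CLIP: clip the probability ratio r_t(theta) to [1-eps, 1+eps]
    and average the pessimistic (min) term over timesteps.
    logp_new / logp_old: log pi(a_t|s_t) under the new / old policy;
    adv: advantage estimates A_t."""
    ratio = np.exp(logp_new - logp_old)            # r_t(theta)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.mean(np.minimum(ratio * adv, clipped * adv))
```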
Experimental Setup
• Dataset
– 250k molecules sampled from ZINC
– Maximum number of atoms: 38; atom types: 9; bond types: 3
• GCPN settings
– 3 hidden layers of 64 units, with batch normalization applied to each layer's output
– SUM used as the aggregation function
– Learning rate 0.001 for RL, 0.00025 for expert pretraining
– Adam optimizer, batch size 32
• Baselines
– JT-VAE and ORGAN
Experiment 1: Property Optimization
• The goal was to maximize two property scores
– Penalized logP: a logP (hydrophobicity) score that also accounts for ring size and a synthetic-accessibility score
– QED: a measure of drug-likeness
• Consistently outperformed existing methods
– Penalized logP: roughly 61% improvement over JT-VAE and roughly 186% over ORGAN
– Thanks to the step-wise valency check, no invalid molecules were generated at all
• In rare cases, unrealistic molecules with extremely high scores were generated
– As with the molecule at the bottom right of Figure 2(a), some samples exploited flaws in the score function: excellent penalized logP, but unrealistic structures
Figure 2 (You+, 2018): Samples of generated molecules in the property optimization and constrained property optimization tasks. In (c), the two columns correspond to molecules before and after modification.
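For reference, a hedged sketch of scoring molecules on these two objectives with RDKit: QED ships with RDKit, while `sascorer` is RDKit's contrib synthetic-accessibility scorer, so its availability, and the exact ring/SA penalty below, are assumptions rather than the paper's precise definition:

```python
from rdkit import Chem
from rdkit.Chem import Crippen, QED
import sascorer  # RDKit contrib SA_Score module; assumed importable

def qed_score(smiles):
    return QED.qed(Chem.MolFromSmiles(smiles))

def penalized_logp(smiles):
    """Common simplification: logP minus SA score minus a penalty for
    rings larger than six atoms (may differ in detail from the paper)."""
    mol = Chem.MolFromSmiles(smiles)
    log_p = Crippen.MolLogP(mol)
    sa = sascorer.calculateScore(mol)
    largest_ring = max((len(r) for r in mol.GetRingInfo().AtomRings()),
                       default=0)
    return log_p - sa - max(largest_ring - 6, 0)
```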
Experiment 2: Property Targeting
• The goal was to keep logP and molecular weight within specified ranges
– Evaluation considered not only whether scores fall within the target range but also the diversity of the generated molecules
– Diversity is measured as the mean Tanimoto distance between Morgan fingerprints over all pairs of generated molecules (see the sketch below)
• Higher values mean higher diversity
• Consistently outperformed existing methods on range control
– Diversity falls slightly behind other methods in some cases, but never critically so; the model achieves both range control and diversity
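This diversity metric can be computed directly with RDKit, as sketched below; the fingerprint radius and bit size are illustrative defaults, not necessarily the paper's settings:

```python
from itertools import combinations
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def diversity(smiles_list, radius=2, n_bits=2048):
    """Mean pairwise Tanimoto distance (1 - similarity) over Morgan
    fingerprints of the generated molecules; higher = more diverse."""
    fps = [AllChem.GetMorganFingerprintAsBitVect(
               Chem.MolFromSmiles(s), radius, nBits=n_bits)
           for s in smiles_list]
    dists = [1.0 - DataStructs.TanimotoSimilarity(a, b)
             for a, b in combinations(fps, 2)]
    return sum(dists) / len(dists)
```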
Experiment 3: Constrained Property Optimization
• The goal was to optimize penalized logP while remaining similar to a given molecule
– For each of 800 molecules picked from ZINC, similarity is optimized first, then penalized logP
– Since JT-VAE cannot be steered by this objective, its outputs were filtered by a similarity threshold δ
• Consistently outperformed existing methods
– Achieved an average improvement of 148% in penalized logP
– Succeeded, at a consistent level of quality, in generating new molecules that optimize the objective while preserving substructures of the original molecule
Conclusion
• Proposed the Graph Convolutional Policy Network (GCPN) and applied it to molecular design
– Outperformed existing methods on tasks such as molecular property optimization and property targeting
– Because valency checks are applied during generation, no molecule violating valence rules is ever produced
• GCPN is a general framework, applicable beyond molecule generation
– It should carry over to domains such as electronic circuits and social networks by swapping in domain-specific objective functions
References
• Text-based generative models
– Gómez-Bombarelli, Rafael, et al. "Automatic chemical design using a data-driven continuous representation
of molecules." ACS central science 4.2 (2018): 268-276.
– Guimaraes, Gabriel Lima, et al. "Objective-reinforced generative adversarial networks (ORGAN) for
sequence generation models." arXiv preprint arXiv:1705.10843 (2017).
– Yu, Lantao, et al. "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient." AAAI. 2017.
• Graph-based generative models
– You, Jiaxuan, et al. "Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation."
NIPS (2018 to appear).
– Li, Yujia, et al. "Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).
– Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Junction Tree Variational Autoencoder for Molecular
Graph Generation." ICML (2018).
– Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks."
ICLR (2017).
• Others
– Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).