8. Related Work (SMILES-based)
• Approaches that generate SMILES with text-based generative models
– Automatic chemical design using a data-driven continuous representation of molecules [Gómez-Bombarelli+, 2018]
• Learns a latent space by training an Auto-Encoder / VAE to reconstruct the input SMILES
• Optimizes the target property with Bayesian optimization
– ORGAN [Guimaraes+, 2017]
• Optimizes character-level SMILES generation by an RNN decoder with GAN + RL
• As in SeqGAN [Yu+, 2017], trains with the average of the scores assigned by the Discriminator as the reward
• Simultaneously maximizes scores obtained from arbitrary heuristics (diversity, etc.)
Gomez-Bombarelli+, 2018
to generate drug-like molecules. [Gómez-Bombarelli et al., 2016b] employed a variational autoencoder to build a latent, continuous space where property optimization can be made through surrogate optimization. Finally, [Kadurin et al., 2017] presented a GAN model for drug generation. Additionally, the approach presented in this paper has recently been applied to molecular design [Sanchez-Lengeling et al., 2017].

In the field of music generation, [Lee et al., 2017] built a SeqGAN model employing an efficient representation of multi-channel MIDI to generate polyphonic music. [Chen et al., 2017] presented Fusion GAN, a dual-learning GAN model that can fuse two data distributions. [Jaques et al., 2017] employ deep Q-learning with a cross-entropy reward to optimize the quality of melodies generated from an RNN.

In adversarial training, [Pfau and Vinyals, 2016] recontextualizes GANs in the actor-critic setting. This connection is also explored with the Wasserstein-1 distance in WGANs [Arjovsky et al., 2017]. Minibatch discrimination and feature mapping were used to promote diversity in GANs [Salimans et al., 2016]. Another approach to avoid mode collapse was shown with Unrolled GANs [Metz et al., 2016]. Issues and convergence of GANs have been studied in [Mescheder et al., 2017].
3 Background
In this section, we elaborate on the GAN and RL setting based on SeqGAN [Yu et al., 2017].

$G_\theta$ is a generator parametrized by $\theta$, trained to produce high-quality sequences $Y_{1:T} = (y_1, \ldots, y_T)$ of length $T$, and $D_\phi$ is a discriminator model parametrized by $\phi$, trained to classify real and generated sequences. $G_\theta$ is trained to deceive $D_\phi$, and $D_\phi$ to classify correctly. Both models are trained in alternation, following a minimax game:

$$\min_\theta \max_\phi \; \mathbb{E}_{Y \sim p_{data}}[\log D_\phi(Y)] + \mathbb{E}_{Y \sim G_\theta}[\log(1 - D_\phi(Y))]$$
Since $D_\phi$ only assigns a reward to a complete sequence, the reward at an intermediate step is estimated once the sequence is completed. In order to do so, we perform $N$-time Monte Carlo search with the canonical rollout policy $G_\theta$, represented as

$$MC^{G_\theta}(Y_{1:t}; N) = \{Y^1_{1:T}, \ldots, Y^N_{1:T}\} \quad (3)$$

where $Y^n_{1:t} = Y_{1:t}$ and $Y^n_{t+1:T}$ is stochastically sampled via the policy $G_\theta$. Now $Q(s, a)$ becomes

$$Q(Y_{1:t-1}, y_t) = \begin{cases} \frac{1}{N} \sum_{n=1}^{N} R(Y^n_{1:T}), \; Y^n_{1:T} \in MC^{G_\theta}(Y_{1:t}; N), & \text{if } t < T \\ R(Y_{1:T}), & \text{if } t = T \end{cases} \quad (4)$$

An unbiased estimation of the gradient of $J(\theta)$ can be derived as

$$\nabla_\theta J(\theta) \simeq \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}_{y_t \sim G_\theta(y_t \mid Y_{1:t-1})} \left[ \nabla_\theta \log G_\theta(y_t \mid Y_{1:t-1}) \cdot Q(Y_{1:t-1}, y_t) \right] \quad (5)$$

Finally, in SeqGAN the reward function is provided by $D_\phi$.
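The rollout estimate of Eqs. (3)-(4) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `sample_completion` and `reward` are hypothetical stand-ins for the rollout policy $G_\theta$ and the discriminator-based reward from $D_\phi$.

```python
def estimate_q(prefix, sample_completion, reward, t, T, n_rollouts=16):
    """Monte Carlo estimate of Q(Y_{1:t-1}, y_t) as in SeqGAN/ORGAN.

    prefix            -- the partial sequence Y_{1:t} (a list of tokens)
    sample_completion -- rollout policy: completes a prefix to length T
    reward            -- reward for a full sequence (e.g. from D_phi)
    """
    if t == T:                       # full sequence: reward is defined directly
        return reward(prefix)
    # otherwise, average the reward over N stochastic completions (Eq. 4)
    total = 0.0
    for _ in range(n_rollouts):
        full = sample_completion(prefix, T)
        total += reward(full)
    return total / n_rollouts
```

Averaging over `n_rollouts` completions reduces the variance of the reward signal that reaches the policy gradient in Eq. (5).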
4 ORGAN
Figure 1: Schema for ORGAN. Left: D is trained as a classifier receiving as input a mix of real data and generated data by G. Right:
Guimaraes+, 2017
9. Related Work (Graph-based)
• Approaches that directly generate molecular graphs with graph-based generative models
– Learning deep generative models of graphs [Li+, 2018]
• Generates nodes and bonds one by one, autoregressively
• Extracts features from the partially generated graph with graph convolutions, and uses the result to decide the next node/bond to generate
– Junction Tree Variational Autoencoder for Molecular Graph Generation [Jin+, 2018]
• Converts the graph structure into a tree by grouping substructures such as rings into single clusters (tree decomposition)
• Trains in the VAE framework to reconstruct the tree structure
• Decodes the tree back into a graph, also using features extracted with graph convolutions
Junction Tree Variational Autoencoder for Molecular Graph Generation
Figure 2. Comparison of two graph generation schemes: the structure-by-structure approach is preferred as it avoids invalid intermediate states (marked in red) encountered in the node-by-node approach.
In the second phase, the subgraphs (nodes in the tree) are assembled together into a coherent molecular graph.
We evaluate our model on multiple tasks ranging from molecular generation to optimization of a given molecule according to desired properties. As baselines, we utilize state-of-the-art SMILES-based generation approaches (Kusner et al., 2017; Dai et al., 2018). We demonstrate that our model produces 100% valid molecules when sampled from a prior distribution, outperforming the top performing baseline by a significant margin. In addition, we show that our model excels in discovering molecules with desired properties, yielding a 30% relative gain over the baselines.
2. Junction Tree Variational Autoencoder
Our approach extends the variational autoencoder (Kingma
Figure 3. Overview of our method: A molecular graph G is first decomposed into its junction tree T_G, where each colored node in the tree represents a substructure in the molecule. We then encode both the tree and graph into their latent embeddings z_T and z_G.
Li+, 2018
Jin+, 2018
11. Graph Generation as MDP
• Formulates the iterative graph-generation process as an MDP
– States: $S = \{s_t\}$
• the intermediate graph observed by the agent at time $t$
– Actions: $A = \{a_t\}$
• the set of actions describing modifications to the current graph at each step (adding nodes/bonds, etc.)
– State transitions: $P = p(s_{t+1} \mid s_t, \ldots, s_0, a_t)$
• the state-transition probability when action $a_t$ is taken at states $s_t, \ldots, s_0$
– Rewards: $R(s_t)$
• the reward function for reaching state $s_t$
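The MDP above amounts to the following generation loop; a minimal sketch, where `env` and `policy` are hypothetical stand-ins for the chemistry-aware environment and the learned policy.

```python
def generate(env, policy, max_steps=50):
    """Roll out one episode of iterative graph generation.

    env.reset() returns the initial graph s_0; env.step(a) applies a
    modification (e.g. add a node/bond) and returns (s_{t+1}, reward, done).
    """
    s = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        a = policy(s)                 # a_t sampled from the policy given s_t
        s, r, done = env.step(a)      # transition p(s_{t+1} | s_t, a_t)
        total_reward += r             # reward R(s_t)
        if done:                      # the agent chose to stop generation
            break
    return s, total_reward
```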
In this section we formulate the problem of graph generation as learning an RL agent that iteratively adds substructures and edges to the molecular graph in a chemistry-aware environment. We describe the problem definition, the environment design, and the Graph Convolutional Policy Network that predicts a distribution of actions which are used to update the graph being generated.

3.1 Problem Definition
We represent a graph $G$ as $(A, E, F)$, where $A \in \{0, 1\}^{n \times n}$ is the adjacency matrix, and $F \in \mathbb{R}^{n \times d}$ is the node feature matrix assuming each node has $d$ features. We define $E \in \{0, 1\}^{b \times n \times n}$ to be the (discrete) edge-conditioned adjacency tensor, assuming there are $b$ possible edge types. $E_{i,j,k} = 1$ if there exists an edge of type $i$ between nodes $j$ and $k$, and $A = \sum_{i=1}^{b} E_i$. Our primary objective is to generate graphs that maximize a given property function $S(G) \in \mathbb{R}$, i.e., maximize $\mathbb{E}_{G'}[S(G')]$, where $G'$ is the generated graph, and $S$ could be one or multiple domain-specific statistics of interest.

It is also of practical importance to constrain our model with two main sources of prior knowledge. (1) Generated graphs need to satisfy a set of hard constraints. (2) We provide the model with a set of example graphs $G \sim p_{data}(G)$, and would like to incorporate such prior knowledge by regularizing the property optimization objective with $\mathbb{E}_{G,G'}[J(G, G')]$ under distance metric $J(\cdot, \cdot)$. In the case of molecule generation, the set of hard constraints is described by chemical valency while the distance metric is an adversarially trained discriminator.
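The identity $A = \sum_{i=1}^{b} E_i$ relating the edge-conditioned tensor to the plain adjacency matrix can be checked on a toy example (a hypothetical 3-node graph with b = 2 edge types):

```python
import numpy as np

# b = 2 edge types, n = 3 nodes: E[i, j, k] = 1 iff an edge of type i joins j and k
E = np.zeros((2, 3, 3), dtype=int)
E[0, 0, 1] = E[0, 1, 0] = 1   # a type-0 (e.g. single) bond between nodes 0 and 1
E[1, 1, 2] = E[1, 2, 1] = 1   # a type-1 (e.g. double) bond between nodes 1 and 2

A = E.sum(axis=0)             # plain adjacency matrix: A = sum_i E_i
```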
12. Graph Convolutional Policy Network (GCPN)
• Feature extraction via graph convolution over the generated graph $G_t$ and the candidate structures $C$
– Candidate structures (scaffold)
• candidate subgraphs to be newly added to the graph generated so far
• sets of several atoms are also conceivable, but this work assumes single atoms only
– Features are extracted from the extended graph $G_t \cup C$ with a model that extends a variant of GCN [Kipf+, 2017]
• extends the method of Kipf+ so that bond types can be taken into account
– $H^{(l+1)} = \mathrm{AGG}\left(\mathrm{ReLU}\left(\{\tilde{D}_i^{-1/2} \tilde{E}_i \tilde{D}_i^{-1/2} H^{(l)} W_i^{(l)}\}, \forall i \in (1, \ldots, b)\right)\right)$
– the layer-$l$ node embeddings $H^{(l)}$ are convolved with weights $W_i^{(l)}$ defined per bond type
– after a nonlinear transformation, an AGG operation aggregates the results over bond types to give $H^{(l+1)}$
– $E_i$: the $i$-th slice of the adjacency tensor $E$ extended with a bond-type dimension, with $\tilde{E}_i = E_i + I$ and $\tilde{D}_i = \sum_k \tilde{E}_{ijk}$
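A minimal numpy sketch of the layer described above: per-bond-type normalized propagation, a ReLU, then AGG over bond types (mean is used here as one illustrative aggregation choice). This is an illustration of the formula, not the authors' implementation.

```python
import numpy as np

def gcpn_layer(E, H, W):
    """One edge-conditioned graph convolution layer.

    E -- (b, n, n) edge-conditioned adjacency tensor
    H -- (n, k)    node embeddings H^(l)
    W -- (b, k, k') per-bond-type weight matrices W_i^(l)
    """
    b, n, _ = E.shape
    outs = []
    for i in range(b):
        E_tilde = E[i] + np.eye(n)                  # E~_i = E_i + I (self-loops)
        d = E_tilde.sum(axis=1)                     # degree D~_i = sum_k E~_ijk
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        msg = D_inv_sqrt @ E_tilde @ D_inv_sqrt @ H @ W[i]
        outs.append(np.maximum(msg, 0.0))           # ReLU
    return np.mean(outs, axis=0)                    # AGG over bond types (mean)
```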
13. Graph Convolutional Policy Network (GCPN)
• Predicting actions
– In the manner of link prediction on graphs, estimate $a_{t+1} = \mathrm{concat}(a_{first}, a_{second}, a_{edge}, a_{stop})$
• decide which node to select first, using the node embeddings computed on the previous slide
– $f_{first}(s_t) = \mathrm{softmax}(m_f(X)), \; a_{first} \sim f_{first}(s_t) \in \{0, 1\}^n$ ($m_f$: an MLP mapping $\mathbb{R}^{n \times k} \to \mathbb{R}^n$)
• decide which node to select second, also using information about the first selected node
– $f_{second}(s_t) = \mathrm{softmax}(m_s(X_{a_{first}}, X)), \; a_{second} \sim f_{second}(s_t) \in \{0, 1\}^{n+c}$
• decide the bond type using the information of the two selected nodes
– $f_{edge}(s_t) = \mathrm{softmax}(m_e(X_{a_{first}}, X_{a_{second}})), \; a_{edge} \sim f_{edge}(s_t) \in \{0, 1\}^b$
• decide whether to terminate the generation process using information about the whole current graph
– $f_{stop}(s_t) = \mathrm{softmax}(m_t(\mathrm{AGG}(X))), \; a_{stop} \sim f_{stop}(s_t) \in \{0, 1\}$
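The four prediction heads can be sketched as follows; a minimal illustration in which `m_f`, `m_s`, `m_e`, `m_t` are hypothetical stand-ins for the MLPs in the text, and AGG is taken to be the mean over node embeddings.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict_action(X, m_f, m_s, m_e, m_t, rng):
    """Sample (a_first, a_second, a_edge, a_stop) link-prediction style.

    X is the (n, k) node-embedding matrix; the m_* callables return
    unnormalized scores (logits) for each head.
    """
    p_first = softmax(m_f(X))                        # over the n nodes
    a_first = rng.choice(len(p_first), p=p_first)
    p_second = softmax(m_s(X[a_first], X))           # conditioned on first node
    a_second = rng.choice(len(p_second), p=p_second)
    p_edge = softmax(m_e(X[a_first], X[a_second]))   # over the b bond types
    a_edge = rng.choice(len(p_edge), p=p_edge)
    p_stop = softmax(m_t(X.mean(axis=0)))            # AGG(X) = mean here
    a_stop = rng.choice(2, p=p_stop)
    return a_first, a_second, a_edge, a_stop
```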
14. State Transitions / Rewards
• State transitions
– A valency check is run on the molecule after adding the node/edge proposed by the generator; if the molecule is invalid at that point, the state is not updated and the action is sampled again
• Rewards
– Step reward
• whether the valency rules were violated + adversarial reward $V(\pi_\theta, D_\phi)$
• the Discriminator that produces the adversarial reward is trained following the standard GAN framework:
– $V(\pi_\theta, D_\phi) = \mathbb{E}_{x \sim p_{data}}[\log D_\phi(x)] + \mathbb{E}_{x \sim \pi_\theta}[\log(1 - D_\phi(x))]$
– Final reward
• domain-specific rewards (a combination of logP, QED, molecular weight, etc.) + adversarial reward $V(\pi_\theta, D_\phi)$
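The valency-checked transition can be sketched as below. This toy sketch uses a deliberately simplified valence table; a real implementation would delegate the check to a cheminformatics toolkit such as RDKit.

```python
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}   # simplified valence table

def valid(atoms, bonds):
    """Simplified valency rule: the total bond order at each atom
    must not exceed that atom's maximum valence."""
    order = [0] * len(atoms)
    for (j, k, bond_order) in bonds:
        order[j] += bond_order
        order[k] += bond_order
    return all(order[i] <= MAX_VALENCE[a] for i, a in enumerate(atoms))

def step(state, action):
    """Apply the proposed bond only if the result passes the valency
    check; otherwise keep the state so the caller resamples the action."""
    atoms, bonds = state
    new_state = (atoms, bonds + [action])
    if valid(*new_state):
        return new_state, True    # accepted: the state advances
    return state, False           # rejected: re-sample the action
```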
15. Learning the Policy with a Policy-Gradient Method
• The policy is learned with Proximal Policy Optimization (PPO) [Schulman+, 2017]
– Vanilla policy gradient:
• $L^{PG}(\theta) = \hat{\mathbb{E}}_t\left[\log \pi_\theta(a_t \mid s_t) \hat{A}_t\right]$
– Conservative Policy Iteration (CPI): focuses on the change from the previous policy
• $L^{CPI}(\theta) = \hat{\mathbb{E}}_t\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)} \hat{A}_t\right] = \hat{\mathbb{E}}_t\left[r_t(\theta) \hat{A}_t\right]$
– Proximal Policy Optimization (PPO): stabilizes learning by constraining the size of the policy update
• $L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta) \hat{A}_t, \; \mathrm{clip}(r_t(\theta), 1 - \varepsilon, 1 + \varepsilon) \hat{A}_t\right)\right]$
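The clipped surrogate $L^{CLIP}$ can be written directly from the formula; a minimal numpy sketch (full PPO implementations typically add value-function and entropy terms on top of this).

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Clipped PPO surrogate L^CLIP (to be maximized; negate for SGD).

    logp_new/logp_old -- log pi_theta(a_t|s_t) under the current / old policy
    adv               -- advantage estimates A_t
    """
    r = np.exp(logp_new - logp_old)                  # probability ratio r_t
    unclipped = r * adv
    clipped = np.clip(r, 1.0 - eps, 1.0 + eps) * adv
    return np.mean(np.minimum(unclipped, clipped))   # E_t[min(...)]
```

Clipping removes the incentive to move the ratio $r_t(\theta)$ outside $[1-\varepsilon, 1+\varepsilon]$, which is what keeps each policy update small.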
18. Experiment 1: Property Optimization
• Experiments aimed at maximizing the following two property scores
– Penalized logP: a logP (hydrophobicity) score that also incorporates ring size and a synthetic-accessibility score
– QED: a measure of drug-likeness
• Consistently outperformed existing methods
– logP: roughly 61% improvement over JT-VAE and roughly 186% over ORGAN
– Thanks to the step-wise valency check, no invalid molecules were generated at all
• In rare cases, unrealistic molecules with extremely high scores were generated
– As with the molecule at the bottom right of Figure 2(a) below, some generated examples exploited flaws in the score function: excellent penalized logP, yet unrealistic structures
Figure 2: Samples of generated molecules in the property optimization and constrained property optimization tasks. In (c), the two columns correspond to molecules before and after modification.