8. Related Work (SMILES-based)
• Approaches that generate SMILES with text-based generative models
– Automatic chemical design using a data-driven continuous representation of molecules [Gómez-Bombarelli+, 2018]
• Learns a latent space by training an Auto-Encoder / VAE to reconstruct the input SMILES
• Optimizes the target property with Bayesian optimization
– ORGAN [Guimaraes+, 2017]
• Optimizes character-level SMILES generation by an RNN decoder with GAN + RL
• As in SeqGAN [Yu+, 2017], trains with the average of the scores assigned by the Discriminator as the reward
• Simultaneously maximizes scores obtained from arbitrary heuristics (diversity, etc.)
Gomez-Bombarelli+, 2018
to generate drug-like molecules. [Gómez-Bombarelli et al., 2016b] employed a variational autoencoder to build a latent, continuous space where property optimization can be made through surrogate optimization. Finally, [Kadurin et al., 2017] presented a GAN model for drug generation. Additionally, the approach presented in this paper has recently been applied to molecular design [Sanchez-Lengeling et al., 2017].

In the field of music generation, [Lee et al., 2017] built a SeqGAN model employing an efficient representation of multi-channel MIDI to generate polyphonic music. [Chen et al., 2017] presented Fusion GAN, a dual-learning GAN model that can fuse two data distributions. [Jaques et al., 2017] employ deep Q-learning with a cross-entropy reward to optimize the quality of melodies generated from an RNN.

In adversarial training, [Pfau and Vinyals, 2016] recontextualizes GANs in the actor-critic setting. This connection is also explored with the Wasserstein-1 distance in WGANs [Arjovsky et al., 2017]. Minibatch discrimination and feature mapping were used to promote diversity in GANs [Salimans et al., 2016]. Another approach to avoid mode collapse was shown with Unrolled GANs [Metz et al., 2016]. Issues and convergence of GANs have been studied in [Mescheder et al., 2017].
3 Background
In this section, we elaborate on the GAN and RL setting based on SeqGAN [Yu et al., 2017].

$G_\theta$ is a generator parametrized by $\theta$, trained to produce high-quality sequences $Y_{1:T} = (y_1, \ldots, y_T)$ of length $T$, and $D_\phi$ is a discriminator model parametrized by $\phi$, trained to classify real and generated sequences. $G_\theta$ is trained to deceive $D_\phi$, and $D_\phi$ to classify correctly. Both models are trained in alternation, following a minimax game:

$$\min_\theta \max_\phi \; \mathbb{E}_{Y \sim p_{data}}[\log D_\phi(Y)] + \mathbb{E}_{Y \sim G_\theta}[\log(1 - D_\phi(Y))]$$
Since $D_\phi$ only assigns a reward to a complete sequence, the reward at an intermediate step is estimated once the sequence is completed. In order to do so, we perform $N$-time Monte Carlo search with the canonical rollout policy $G_\theta$, represented as

$$MC^{G_\theta}(Y_{1:t}; N) = \{Y^1_{1:T}, \ldots, Y^N_{1:T}\} \quad (3)$$

where $Y^n_{1:t} = Y_{1:t}$ and $Y^n_{t+1:T}$ is stochastically sampled via the policy $G_\theta$. Now $Q(s, a)$ becomes

$$Q(Y_{1:t-1}, y_t) = \begin{cases} \frac{1}{N} \sum_{n=1}^{N} R(Y^n_{1:T}), \; Y^n_{1:T} \in MC^{G_\theta}(Y_{1:t}; N), & \text{if } t < T \\ R(Y_{1:T}), & \text{if } t = T \end{cases} \quad (4)$$

An unbiased estimation of the gradient of $J(\theta)$ can be derived as

$$\nabla_\theta J(\theta) \simeq \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}_{y_t \sim G_\theta(y_t \mid Y_{1:t-1})} \left[ \nabla_\theta \log G_\theta(y_t \mid Y_{1:t-1}) \cdot Q(Y_{1:t-1}, y_t) \right] \quad (5)$$

Finally, in SeqGAN the reward function is provided by $D_\phi$.
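The rollout estimate of Eqs. (3)-(4) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `sample_completion` and `reward` are hypothetical stand-ins for the rollout policy $G_\theta$ and the discriminator-based reward from $D_\phi$.

```python
def estimate_q(prefix, sample_completion, reward, t, T, n_rollouts=16):
    """Monte Carlo estimate of Q(Y_{1:t-1}, y_t) as in SeqGAN/ORGAN.

    prefix            -- the partial sequence Y_{1:t} (a list of tokens)
    sample_completion -- rollout policy: completes a prefix to length T
    reward            -- reward for a full sequence (e.g. from D_phi)
    """
    if t == T:                       # full sequence: reward is defined directly
        return reward(prefix)
    # otherwise, average the reward over N stochastic completions (Eq. 4)
    total = 0.0
    for _ in range(n_rollouts):
        full = sample_completion(prefix, T)
        total += reward(full)
    return total / n_rollouts
```

Averaging over `n_rollouts` completions reduces the variance of the reward signal that reaches the policy gradient in Eq. (5).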
4 ORGAN
Figure 1: Schema for ORGAN. Left: D is trained as a classifier receiving as input a mix of real data and generated data by G. Right:
Guimaraes+, 2017
9. Related Work (Graph-based)
• Approaches that directly generate molecular graphs with graph-based generative models
– Learning deep generative models of graphs [Li+, 2018]
• Generates nodes and bonds one by one, autoregressively
• Extracts features from the partially generated graph with graph convolutions, and uses the result to decide the next node/bond to generate
– Junction Tree Variational Autoencoder for Molecular Graph Generation [Jin+, 2018]
• Converts the graph structure into a tree by grouping substructures such as rings into single clusters (tree decomposition)
• Trains in the VAE framework to reconstruct the tree structure
• Decodes the tree back into a graph, also using features extracted with graph convolutions
Junction Tree Variational Autoencoder for Molecular Graph Generation
Figure 2. Comparison of two graph generation schemes: the structure-by-structure approach is preferred as it avoids invalid intermediate states (marked in red) encountered in the node-by-node approach.
In the second phase, the subgraphs (nodes in the tree) are assembled together into a coherent molecular graph.
We evaluate our model on multiple tasks ranging from molecular generation to optimization of a given molecule according to desired properties. As baselines, we utilize state-of-the-art SMILES-based generation approaches (Kusner et al., 2017; Dai et al., 2018). We demonstrate that our model produces 100% valid molecules when sampled from a prior distribution, outperforming the top performing baseline by a significant margin. In addition, we show that our model excels in discovering molecules with desired properties, yielding a 30% relative gain over the baselines.
2. Junction Tree Variational Autoencoder
Our approach extends the variational autoencoder (Kingma
Figure 3. Overview of our method: A molecular graph G is first decomposed into its junction tree T_G, where each colored node in the tree represents a substructure in the molecule. We then encode both the tree and graph into their latent embeddings z_T and z_G.
Li+, 2018
Jin+, 2018
11. Graph Generation as MDP
• Formulates the iterative graph-generation process as an MDP
– States: $S = \{s_t\}$
• the intermediate graph observed by the agent at time $t$
– Actions: $A = \{a_t\}$
• the set of actions describing modifications to the current graph at each step (adding nodes/bonds, etc.)
– State transitions: $P = p(s_{t+1} \mid s_t, \ldots, s_0, a_t)$
• the state-transition probability when action $a_t$ is taken at states $s_t, \ldots, s_0$
– Rewards: $R(s_t)$
• the reward function for reaching state $s_t$
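The MDP above amounts to the following generation loop; a minimal sketch, where `env` and `policy` are hypothetical stand-ins for the chemistry-aware environment and the learned policy.

```python
def generate(env, policy, max_steps=50):
    """Roll out one episode of iterative graph generation.

    env.reset() returns the initial graph s_0; env.step(a) applies a
    modification (e.g. add a node/bond) and returns (s_{t+1}, reward, done).
    """
    s = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        a = policy(s)                 # a_t sampled from the policy given s_t
        s, r, done = env.step(a)      # transition p(s_{t+1} | s_t, a_t)
        total_reward += r             # reward R(s_t)
        if done:                      # the agent chose to stop generation
            break
    return s, total_reward
```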
In this section we formulate the problem of graph generation as learning an RL agent that iteratively adds substructures and edges to the molecular graph in a chemistry-aware environment. We describe the problem definition, the environment design, and the Graph Convolutional Policy Network that predicts a distribution of actions which are used to update the graph being generated.

3.1 Problem Definition
We represent a graph $G$ as $(A, E, F)$, where $A \in \{0, 1\}^{n \times n}$ is the adjacency matrix, and $F \in \mathbb{R}^{n \times d}$ is the node feature matrix assuming each node has $d$ features. We define $E \in \{0, 1\}^{b \times n \times n}$ to be the (discrete) edge-conditioned adjacency tensor, assuming there are $b$ possible edge types. $E_{i,j,k} = 1$ if there exists an edge of type $i$ between nodes $j$ and $k$, and $A = \sum_{i=1}^{b} E_i$. Our primary objective is to generate graphs that maximize a given property function $S(G) \in \mathbb{R}$, i.e., maximize $\mathbb{E}_{G'}[S(G')]$, where $G'$ is the generated graph, and $S$ could be one or multiple domain-specific statistics of interest.

It is also of practical importance to constrain our model with two main sources of prior knowledge. (1) Generated graphs need to satisfy a set of hard constraints. (2) We provide the model with a set of example graphs $G \sim p_{data}(G)$, and would like to incorporate such prior knowledge by regularizing the property optimization objective with $\mathbb{E}_{G,G'}[J(G, G')]$ under distance metric $J(\cdot, \cdot)$. In the case of molecule generation, the set of hard constraints is described by chemical valency while the distance metric is an adversarially trained discriminator.
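The identity $A = \sum_{i=1}^{b} E_i$ relating the edge-conditioned tensor to the plain adjacency matrix can be checked on a toy example (a hypothetical 3-node graph with b = 2 edge types):

```python
import numpy as np

# b = 2 edge types, n = 3 nodes: E[i, j, k] = 1 iff an edge of type i joins j and k
E = np.zeros((2, 3, 3), dtype=int)
E[0, 0, 1] = E[0, 1, 0] = 1   # a type-0 (e.g. single) bond between nodes 0 and 1
E[1, 1, 2] = E[1, 2, 1] = 1   # a type-1 (e.g. double) bond between nodes 1 and 2

A = E.sum(axis=0)             # plain adjacency matrix: A = sum_i E_i
```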
12. Graph Convolutional Policy Network (GCPN)
• Feature extraction via graph convolution over the generated graph $G_t$ and the candidate structures $C$
– Candidate structures (scaffold)
• candidate subgraphs to be newly added to the graph generated so far
• sets of several atoms are also conceivable, but this work assumes single atoms only
– Features are extracted from the extended graph $G_t \cup C$ with a model that extends a variant of GCN [Kipf+, 2017]
• extends the method of Kipf+ so that bond types can be taken into account
– $H^{(l+1)} = \mathrm{AGG}\left(\mathrm{ReLU}\left(\{\tilde{D}_i^{-1/2} \tilde{E}_i \tilde{D}_i^{-1/2} H^{(l)} W_i^{(l)}\}, \forall i \in (1, \ldots, b)\right)\right)$
– the layer-$l$ node embeddings $H^{(l)}$ are convolved with weights $W_i^{(l)}$ defined per bond type
– after a nonlinear transformation, an AGG operation aggregates the results over bond types to give $H^{(l+1)}$
– $E_i$: the $i$-th slice of the adjacency tensor $E$ extended with a bond-type dimension, with $\tilde{E}_i = E_i + I$ and $\tilde{D}_i = \sum_k \tilde{E}_{ijk}$
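A minimal numpy sketch of the layer described above: per-bond-type normalized propagation, a ReLU, then AGG over bond types (mean is used here as one illustrative aggregation choice). This is an illustration of the formula, not the authors' implementation.

```python
import numpy as np

def gcpn_layer(E, H, W):
    """One edge-conditioned graph convolution layer.

    E -- (b, n, n) edge-conditioned adjacency tensor
    H -- (n, k)    node embeddings H^(l)
    W -- (b, k, k') per-bond-type weight matrices W_i^(l)
    """
    b, n, _ = E.shape
    outs = []
    for i in range(b):
        E_tilde = E[i] + np.eye(n)                  # E~_i = E_i + I (self-loops)
        d = E_tilde.sum(axis=1)                     # degree D~_i = sum_k E~_ijk
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        msg = D_inv_sqrt @ E_tilde @ D_inv_sqrt @ H @ W[i]
        outs.append(np.maximum(msg, 0.0))           # ReLU
    return np.mean(outs, axis=0)                    # AGG over bond types (mean)
```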
13. Graph Convolutional Policy Network (GCPN)
• Predicting actions
– In the manner of link prediction on graphs, estimate $a_{t+1} = \mathrm{concat}(a_{first}, a_{second}, a_{edge}, a_{stop})$
• decide which node to select first, using the node embeddings computed on the previous slide
– $f_{first}(s_t) = \mathrm{softmax}(m_f(X)), \; a_{first} \sim f_{first}(s_t) \in \{0, 1\}^n$ ($m_f$: an MLP mapping $\mathbb{R}^{n \times k} \to \mathbb{R}^n$)
• decide which node to select second, also using information about the first selected node
– $f_{second}(s_t) = \mathrm{softmax}(m_s(X_{a_{first}}, X)), \; a_{second} \sim f_{second}(s_t) \in \{0, 1\}^{n+c}$
• decide the bond type using the information of the two selected nodes
– $f_{edge}(s_t) = \mathrm{softmax}(m_e(X_{a_{first}}, X_{a_{second}})), \; a_{edge} \sim f_{edge}(s_t) \in \{0, 1\}^b$
• decide whether to terminate the generation process using information about the whole current graph
– $f_{stop}(s_t) = \mathrm{softmax}(m_t(\mathrm{AGG}(X))), \; a_{stop} \sim f_{stop}(s_t) \in \{0, 1\}$
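The four prediction heads can be sketched as follows; a minimal illustration in which `m_f`, `m_s`, `m_e`, `m_t` are hypothetical stand-ins for the MLPs in the text, and AGG is taken to be the mean over node embeddings.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict_action(X, m_f, m_s, m_e, m_t, rng):
    """Sample (a_first, a_second, a_edge, a_stop) link-prediction style.

    X is the (n, k) node-embedding matrix; the m_* callables return
    unnormalized scores (logits) for each head.
    """
    p_first = softmax(m_f(X))                        # over the n nodes
    a_first = rng.choice(len(p_first), p=p_first)
    p_second = softmax(m_s(X[a_first], X))           # conditioned on first node
    a_second = rng.choice(len(p_second), p=p_second)
    p_edge = softmax(m_e(X[a_first], X[a_second]))   # over the b bond types
    a_edge = rng.choice(len(p_edge), p=p_edge)
    p_stop = softmax(m_t(X.mean(axis=0)))            # AGG(X) = mean here
    a_stop = rng.choice(2, p=p_stop)
    return a_first, a_second, a_edge, a_stop
```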
14. State Transitions / Rewards
• State transitions
– A valency check is run on the molecule after adding the node/edge proposed by the generator; if the molecule is invalid at that point, the state is not updated and the action is sampled again
• Rewards
– Step reward
• whether the valency rules were violated + adversarial reward $V(\pi_\theta, D_\phi)$
• the Discriminator that produces the adversarial reward is trained following the standard GAN framework:
– $V(\pi_\theta, D_\phi) = \mathbb{E}_{x \sim p_{data}}[\log D_\phi(x)] + \mathbb{E}_{x \sim \pi_\theta}[\log(1 - D_\phi(x))]$
– Final reward
• domain-specific rewards (a combination of logP, QED, molecular weight, etc.) + adversarial reward $V(\pi_\theta, D_\phi)$
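The valency-checked transition can be sketched as below. This toy sketch uses a deliberately simplified valence table; a real implementation would delegate the check to a cheminformatics toolkit such as RDKit.

```python
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}   # simplified valence table

def valid(atoms, bonds):
    """Simplified valency rule: the total bond order at each atom
    must not exceed that atom's maximum valence."""
    order = [0] * len(atoms)
    for (j, k, bond_order) in bonds:
        order[j] += bond_order
        order[k] += bond_order
    return all(order[i] <= MAX_VALENCE[a] for i, a in enumerate(atoms))

def step(state, action):
    """Apply the proposed bond only if the result passes the valency
    check; otherwise keep the state so the caller resamples the action."""
    atoms, bonds = state
    new_state = (atoms, bonds + [action])
    if valid(*new_state):
        return new_state, True    # accepted: the state advances
    return state, False           # rejected: re-sample the action
```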
15. Learning the Policy with a Policy-Gradient Method
• The policy is learned with Proximal Policy Optimization (PPO) [Schulman+, 2017]
– Vanilla policy gradient:
• $L^{PG}(\theta) = \hat{\mathbb{E}}_t\left[\log \pi_\theta(a_t \mid s_t) \hat{A}_t\right]$
– Conservative Policy Iteration (CPI): focuses on the change from the previous policy
• $L^{CPI}(\theta) = \hat{\mathbb{E}}_t\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)} \hat{A}_t\right] = \hat{\mathbb{E}}_t\left[r_t(\theta) \hat{A}_t\right]$
– Proximal Policy Optimization (PPO): stabilizes learning by constraining the size of the policy update
• $L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta) \hat{A}_t, \; \mathrm{clip}(r_t(\theta), 1 - \varepsilon, 1 + \varepsilon) \hat{A}_t\right)\right]$
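The clipped surrogate $L^{CLIP}$ can be written directly from the formula; a minimal numpy sketch (full PPO implementations typically add value-function and entropy terms on top of this).

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Clipped PPO surrogate L^CLIP (to be maximized; negate for SGD).

    logp_new/logp_old -- log pi_theta(a_t|s_t) under the current / old policy
    adv               -- advantage estimates A_t
    """
    r = np.exp(logp_new - logp_old)                  # probability ratio r_t
    unclipped = r * adv
    clipped = np.clip(r, 1.0 - eps, 1.0 + eps) * adv
    return np.mean(np.minimum(unclipped, clipped))   # E_t[min(...)]
```

Clipping removes the incentive to move the ratio $r_t(\theta)$ outside $[1-\varepsilon, 1+\varepsilon]$, which is what keeps each policy update small.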
18. Experiment 1: Property Optimization
• Experiments aimed at maximizing the following two property scores
– Penalized logP: a logP (hydrophobicity) score that also incorporates ring size and a synthetic-accessibility score
– QED: a measure of drug-likeness
• Consistently outperformed existing methods
– logP: roughly 61% improvement over JT-VAE and roughly 186% over ORGAN
– Thanks to the step-wise valency check, no invalid molecules were generated at all
• In rare cases, unrealistic molecules with extremely high scores were generated
– As with the molecule at the bottom right of Figure 2(a) below, some generated examples exploited flaws in the score function: excellent penalized logP, yet unrealistic structures
Figure 2: Samples of generated molecules in the property optimization and constrained property optimization tasks. In (c), the two columns correspond to molecules before and after modification.