Report: "MolGAN: An implicit generative model for small molecular graphs"
1. MolGAN: An implicit generative
model for small molecular graphs
N. De Cao and T. Kipf
(Informatics Institute, University of Amsterdam)
ICML Deep Generative Models Workshop (2018)
arXiv:1805.11973
Gpat Journal Club 2018.10.12, Ryohei Suzuki
2. Research Summary
• Automatic generation of drug-like small molecules
• Generative Adversarial Net + Graph Neural Network
+ Reinforcement Learning
• Optimization of biochemical properties (e.g., solubility)
→ first step toward in-silico screening by ML
※ Note: the work is not aimed at designing drugs for a specific purpose
3. About the authors
T. Kipf (Ph.D cand.)
• https://tkipf.github.io/
• Supervisor: Max Welling (ML)
N. De Cao (Ph.D cand.)
• https://nicola-decao.github.io/
• Supervisor: Ivan Titov (NLP)
Max Welling: supervisor of D. Kingma (author of Adam, VAE, etc.)
and himself a pupil of G. 't Hooft (quantum gravity, string theory; 1999 Nobel Prize for electroweak theory)
4. Drug design / drug discovery (DD)
Properties required for drugs
• Useful bioactivity
• Controllable side effects
• Synthesizability
• Retains its effect after metabolism (cf. drug delivery)
Vast time and monetary cost of animal/human experiments
→ in-silico screening using computers
5. Screening by simulation
Case of target drug:
1. Structure determination of
target protein
2. Decision of target site
3. Static affinity prediction
4. Dynamic binding simulation (MD)
Days to weeks of computation time per molecule
Example: Gefitinib binding mutated EGFR (non-small cell lung cancer)
6. Why is drug design difficult?
1. Very large and high-dimensional search space
- over 60,000 permutations for only 10 C/N/O atoms
- only a very limited set of atomic arrangements gives valid structures
2. Discrete optimization of molecular structure
- continuous/gradual optimization is not possible
3. Slight change in structure results in large effects
- COH and COOH are completely different (aldehyde vs. carboxylic acid)
7. Why is drug design difficult?
4. No appropriate data structure for molecular structure
5. Predicting biochemical properties is essentially difficult
- Even QM/MM has limitations; wet experiments are still necessary
Figure: the same molecule shown as a 2D image, as the SMILES string CN1CCC[C@H]1c2cccnc2, and as a 3D structure (3D is important for proteins)
8. Will ML solve the problems?
1. Very large and high-dimensional search space
→ Generative models (e.g. GAN) can
effectively represent complex/high-dimensional data
2. Discrete optimization of molecular structure
→ Goal of this study is just rough screening
(not fine-tuning of specific drugs)
3. Slight change in structure results in large effects
→ Pinpoint affinity prediction can be difficult for ML;
ML is better suited to predicting general properties like solubility
9. Will ML solve the problems?
4. No appropriate data structure for molecular structure
→ Graph representation
+ Graph convolutional neural network
5. Predicting biochemical properties is essentially difficult
→ ML wouldn’t solve this fundamental problem.
Improved simulation methods are also needed
10. Problem definition
Generating molecular structures without a specific target application
• Generated molecules are evaluated by:
1. Druglikeness (QED: Bickerton et al., 2012)
2. Synthesizability (Synthetic Accessibility: Ertl & Schuffenhauer, 2009)
3. Solubility (logP: Comer & Tam, 2001)
• Methods are evaluated by:
1. Validity = valid structures / output structures
2. Novelty = ratio of valid structures not included in training dataset
3. Uniqueness = unique valid molecules / total valid molecules
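The three method-level metrics above can be sketched in a few lines. This is an illustrative computation, assuming `generated` is a list of SMILES strings from a model; `is_valid` is a hypothetical stand-in for a real validity check (in practice RDKit's `Chem.MolFromSmiles` would play this role).

```python
def is_valid(smiles: str) -> bool:
    # Hypothetical validity check for illustration only:
    # treat non-empty strings without '?' as valid.
    return bool(smiles) and "?" not in smiles

def evaluate(generated, training_set):
    """Return (validity, novelty, uniqueness) as defined on this slide."""
    valid = [s for s in generated if is_valid(s)]
    validity = len(valid) / len(generated) if generated else 0.0
    # novelty: fraction of valid structures absent from the training set
    novel = [s for s in valid if s not in set(training_set)]
    novelty = len(novel) / len(valid) if valid else 0.0
    # uniqueness: distinct valid molecules over all valid molecules
    uniqueness = len(set(valid)) / len(valid) if valid else 0.0
    return validity, novelty, uniqueness

print(evaluate(["CCO", "C?", "CCO", "CCN"], {"CCO"}))
# → (0.75, 0.3333333333333333, 0.6666666666666666)
```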
11. Overview
Generator:
Transforms noise
into a structure
Generated
structure Discriminator:
Judges structure
is valid or not
Reward Network:
Predict the properties
of molecular structures
Goal: obtaining a generator that can output
valid molecular structures with good properties
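The generator's output can be pictured as follows: a noise vector is mapped to a node-feature matrix X (atom-type logits) and an adjacency tensor A (bond-type logits). A minimal numpy sketch, where the sizes (N = 9 atoms, T = 5 atom types, Y = 4 bond types) and the one-hidden-layer MLP are illustrative assumptions, not the paper's exact hyperparameters:

```python
import numpy as np

N, T, Y, Z = 9, 5, 4, 32          # atoms, atom types, bond types, noise dim
rng = np.random.default_rng(0)

W1 = rng.normal(size=(Z, 128))                 # toy MLP weights
W2 = rng.normal(size=(128, N * T + N * N * Y))

def generate(z):
    h = np.tanh(z @ W1)
    out = h @ W2
    X = out[: N * T].reshape(N, T)             # atom-type logits per node
    A = out[N * T :].reshape(N, N, Y)          # bond-type logits per pair
    A = (A + A.transpose(1, 0, 2)) / 2         # symmetrize: bonds are undirected
    return X, A

X, A = generate(rng.normal(size=Z))
print(X.shape, A.shape)  # → (9, 5) (9, 9, 4)
```

The discriminator and reward network would then consume (X, A) through graph convolutions.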
13. Generative models
• classification: judge whether an image is a cat or a dog
• regression: predict f(0.5) from f(0) and f(1)
• generation: generate data distributed like the training data
https://blog.openai.com/generative-models/
14. Generative models
• discriminative model: takes an image and judges its category (dog or cat)
• regression model: predicts f(0.5) when f(0) and f(1) are known
• generative model: generates data following the same distribution as the dataset
https://blog.openai.com/generative-models/
Challenge:
how to calculate the “loss” value needed to train a model
to generate “a distribution like the given dataset”?
15. Generative Adversarial Net (GAN)
“A cat-and-mouse game between a counterfeiter and the police”
• generator: generates data resembling dataset samples as closely as possible
• discriminator: distinguishes real from fake data as precisely as possible
→ the two modules are trained alternately;
the actual data distribution is never computed explicitly
→ danger of mode collapse
https://towardsdatascience.com/generative-adversarial-networks-explained-34472718707a
16. Power of GANs
e.g., BigGAN (Brock et al., 2018)
Figure: generated images; continuous morphing of the input noise
A continuous change of the noise
gives a semantically continuous
change of the image
= the model has learned a useful representation
20. Graph convolution (Kipf&Welling ICLR2017)
Convolution can also be defined for graphs!
http://tkipf.github.io/misc/SlidesCambridge.pdf
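A single layer of the Kipf & Welling (ICLR 2017) graph convolution computes H' = ReLU(D̂^{-1/2}(A + I)D̂^{-1/2} H W). A minimal numpy sketch; the 4-node path graph and the weight shapes are made up for illustration:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: symmetric-normalized propagation + ReLU."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)                       # degrees incl. self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    H_next = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W
    return np.maximum(H_next, 0.0)              # ReLU

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)       # toy path graph
H = np.eye(4)                                   # one-hot node features
W = np.ones((4, 2))                             # toy weight matrix
print(gcn_layer(A, H, W).shape)  # → (4, 2)
```

Each node's new feature is a degree-normalized average over its neighborhood (including itself), followed by a learned linear map.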
21. Reinforcement Learning
A learning framework best known from robot/game control
An action taken in an environment yields
a reward reflecting its goodness
e.g., moving toward a hole results in the death of Mario
The policy is optimized to maximize the reward
e.g., jump when a hole is located in front of Mario
https://en.wikipedia.org/wiki/Reinforcement_learning
22. RL for Molecular Design
Action: generation of a molecule
Environment/Reward: biochemical evaluation of the molecule by external software
Policy: the generative model
Feedback example: druglikeness 0.9, synthesizability 0.1, solubility 0.3, …
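The reward signal above can be sketched as a weighted combination of per-property scores returned by an external evaluator (RDKit-based in practice). The property names and weights here are hypothetical placeholders:

```python
def reward(scores, weights=None):
    """Combine property scores (dict of name -> value in [0, 1]) into a scalar.

    If no weights are given, average the properties uniformly.
    """
    if weights is None:
        weights = {k: 1.0 / len(scores) for k in scores}
    return sum(weights[k] * v for k, v in scores.items())

r = reward({"druglikeness": 0.9, "synthesizability": 0.1, "solubility": 0.3})
print(round(r, 4))  # → 0.4333
```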
23. Design of MolGAN (1) GAN
• The generator directly outputs a graph
in adjacency-matrix form
• The generator is an MLP
• The discriminator judges the validity
of a molecule
• The discriminator is a graph convolutional network
• WGAN-GP* loss
*Please refer to the material of Fukuta-san’s lecture
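For reference, the WGAN-GP critic objective (Gulrajani et al., 2017) has the following form, where P_r is the data distribution, P_g the generator distribution, and x̂ interpolates between real and generated samples:

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\!\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\!\big[D(x)\big]
    + \alpha \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\!\big[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2\big]
```

The third term is the gradient penalty that keeps the critic approximately 1-Lipschitz.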
24. Design of MolGAN (2) RL
Deep deterministic policy gradient (DDPG)
• The reward network mimics the external
program that evaluates molecules
• The reward network has the same structure
as the discriminator
• Reward loss = output of the reward network
• The GAN loss and the reward loss are blended
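The blending on this slide can be sketched as a convex combination of the WGAN loss and the (negated) reward-network output, controlled by a mixing weight lambda. The function below is an illustrative sketch, not the paper's exact formulation; lambda = 0.5 is just a placeholder default:

```python
def generator_loss(wgan_loss, reward_value, lam=0.5):
    """Blend GAN and RL objectives for the generator.

    lam = 1.0 -> pure GAN objective; lam = 0.0 -> pure RL objective
    (maximizing the reward = minimizing its negation).
    """
    return lam * wgan_loss + (1.0 - lam) * (-reward_value)

print(generator_loss(wgan_loss=0.8, reward_value=0.6, lam=0.0))  # → -0.6
```

Exp. 1 on the next slide varies exactly this balance between the two losses.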
26. Exp. 1: balance of GAN loss vs. reward loss
Generated molecules are evaluated while the loss balance is varied
Result: the reward loss alone turns out to be sufficient
27. Exp.2: comparison with other methods
• Validity:
Others: 85-95%
MolGAN: 98-100%
• Uniqueness:
Others: 10-70%
MolGAN: 2%
• Time consumption:
1/10 to 1/2 of the other methods
28. Exp.2: comparison with other methods
• druglikeness
• synthesizability
• solubility
Higher score than other methods
for all the properties
29. Discussion
Pros
• Very high (~100%) ratio of valid output structures
• GraphNN + RL is effective for biochemical property optimization
• Light computational cost, fast training
Cons / Future work
• mode collapse: the same structure is generated repeatedly
→ normalization techniques (e.g., spectral normalization) may help
• Fixed atom count