Semi-supervised learning of hierarchical representations of molecules
Background & Purpose
Hai Nguyen (Kyoto Univ.), Kenta Oono (Preferred Networks), Shin-ichi Maeda (Preferred Networks)
Semi-supervised learning of hierarchical representations of molecules with neural message passing
Proposed method
- Semi-supervised learning
- Hierarchical feature extraction from graphs
Existing methods
Experimental Results
Conclusion
There is growing attention to predicting molecular properties from large databases of molecules using machine learning techniques, for drug discovery and materials informatics.
Graph convolution
Paragraph vector [Le+14]
However, not all molecules are labeled, because of difficulties in experiments or sample purification.
In this study, we aim to improve prediction accuracy by taking a semi-supervised approach that extracts appropriate hierarchical substructure features from the unlabeled dataset.
Problem setting (Notation)
Graph G = (V, E): set of vertices V, set of edges E ⊆ V × V
Graph representing a molecule:
m ∈ M: molecule in the given database
v ∈ V_m: atom making up molecule m (oxygen, carbon, hydrogen, chlorine, ...)
e(v, w) ∈ E_m: bond connecting atoms in molecule m (single, double, triple, aromatic, ...)
y(m) = NN( ∑_{l=1}^{L} ∑_{v∈V_m} f(h_v^l) )
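As a reading of this readout formula, here is a minimal NumPy sketch; the level-wise vectors h_v^l are assumed precomputed, and f and NN are stand-ins (a linear map and a one-layer network), not the exact choices used in the work.

```python
import numpy as np

def predict_property(h, W_f, W_out, b_out):
    """Sketch of y(m) = NN( sum_{l=1..L} sum_{v in V_m} f(h_v^l) ).

    h     : (L, n_atoms, d) array, h[l, v] = substructure vector of atom v at level l+1
    W_f   : (d, e) linear map standing in for f
    W_out, b_out : parameters of a stand-in single-layer NN
    """
    pooled = np.einsum('lvd,de->e', h, W_f)   # sum over levels and atoms of f(h_v^l)
    hidden = np.tanh(pooled)                  # placeholder nonlinearity for NN
    return W_out @ hidden + b_out             # property prediction
```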
Prediction accuracy (%, mean ± std):

Dataset | node2vec    | sub2vec     | graph2vec  | WL kernel  | Deep WL kernel | Proposed
MUTAG   | 72.63±10.2  | 61.05±15.80 | 83.15±9.25 | 80.63±3.07 | 82.95±1.96     | 86.46±5.97
PTC     | 58.85±8.00  | 59.99±6.38  | 60.17±6.86 | 59.61±2.79 | 59.04±1.09     | 62.86±5.71
Loss function to extract substructure features
The substructure vector h_v^l should be close to the molecule vector u_m when h_v^l is computed from molecule m.
h_v^l: d-dimensional substructure vector at level l around atom v
u_m: molecule vector
Regularization term summed over all molecules: ∑_{m∈M} Reg(θ, V_m)
with p(C = 1 | h_v^l, u_m) = σ(u_m^T h_v^l)  (σ is the sigmoid function)
P_v^l: negative sampler of substructures, computed from a molecule sampled at random
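A tiny sketch of this probability, assuming plain NumPy vectors for h_v^l and u_m (symbol names are ours):

```python
import numpy as np

def p_same_molecule(h_vl, u_m):
    """p(C = 1 | h_v^l, u_m) = sigma(u_m^T h_v^l): probability that the
    substructure vector h_v^l was computed from molecule m."""
    return 1.0 / (1.0 + np.exp(-u_m @ h_vl))
```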
[Figure: hierarchical substructure vectors h_v^l (l = 1, ..., 4) for an example molecule with atoms 1-5.
Level 1: atom's representation. Levels 2-3: substructure's representation. Level 4: covers the whole graph.]
Substructure features of a graph
h_v^{l+1} = σ( h_v^l + ∑_{w∈N(v)} H_{e(v,w)} h_w^l )
N(v): neighboring atoms of atom v
H_{e(v,w)}: d × d matrix depending only on the bond e (applicable to graphs of any size)
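A minimal sketch of one level of this update (NumPy, our own data layout): bond-type-indexed matrices H shared across molecules, and an adjacency list for N(v).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def message_passing_level(h, neighbors, bond_of, H):
    """h_v^{l+1} = sigma( h_v^l + sum_{w in N(v)} H_{e(v,w)} h_w^l ).

    h         : (n_atoms, d) array of current vectors h_v^l
    neighbors : dict  v -> list of neighboring atom indices N(v)
    bond_of   : dict  (v, w) -> bond type of edge e(v, w)
    H         : dict  bond type -> (d, d) matrix (independent of graph size)
    """
    h_next = np.empty_like(h)
    for v in range(len(h)):
        msg = sum(H[bond_of[(v, w)]] @ h[w] for w in neighbors[v])
        h_next[v] = sigmoid(h[v] + msg)
    return h_next
```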
Prediction of molecule property using substructure feature
Loss function for semi-supervised learning
∑_{i=1}^{|M_L|} Loss(θ, y(m_i), t_i) + λ ∑_{i=1}^{|M_L|+|M_U|} Reg(θ, V_{m_i})
Reg(θ, V_m) = ∑_{l=1}^{L} ∑_{v∈V_m} [ log p(C = 1 | h_v^l, u_m) + k E_{h_{v'}^l ∼ P_v^l}[ log p(C = 0 | h_{v'}^l, u_m) ] ]
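A sketch of Reg(θ, V_m) for a single molecule, with a Monte-Carlo estimate of the expectation over the negative sampler P_v^l; the sampler itself and the number of negatives k are placeholders (signs follow the slide).

```python
import numpy as np

def log_sigmoid(x):
    return -np.log1p(np.exp(-x))

def reg_term(h, u_m, sample_negative, k=5):
    """Reg(theta, V_m) = sum_l sum_v [ log p(C=1 | h_v^l, u_m)
                          + k * E_{h' ~ P_v^l}[ log p(C=0 | h', u_m) ] ].

    h               : (L, n_atoms, d) substructure vectors of molecule m
    u_m             : (d,) molecule vector
    sample_negative : callable, sample_negative(l) -> (d,) level-l substructure
                      vector taken from a randomly chosen molecule (P_v^l)
    """
    total = 0.0
    L, n_atoms, _ = h.shape
    for l in range(L):
        for v in range(n_atoms):
            total += log_sigmoid(u_m @ h[l, v])               # log p(C=1 | h_v^l, u_m)
            total += k * np.mean([log_sigmoid(-u_m @ sample_negative(l))
                                  for _ in range(k)])         # k * E[log p(C=0 | ...)]
    return total
```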
[Figure: feature extraction pipeline for an example 5-atom molecule. Information of atoms and bonds → h_v^1 (Level 1) → h_v^2 (Level 2) → h_v^3 (Level 3) → h_v^4 (Level 4) → prediction y(m).]
|M_L|: number of labeled samples
|M_U|: number of unlabeled samples
t_i: molecule property corresponding to molecule m_i
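Combining the two terms, a sketch of the overall semi-supervised objective under the assumption that a supervised loss, a predictor, and the regularizer above are available as callables; labeled molecules contribute both terms, unlabeled molecules only the regularizer.

```python
def semi_supervised_objective(labeled, unlabeled, predict, loss_fn,
                              regularizer, lam=1.0):
    """sum_{i=1..|M_L|} Loss(theta, y(m_i), t_i)
       + lambda * sum_{i=1..|M_L|+|M_U|} Reg(theta, V_{m_i}).

    labeled     : list of (molecule, t_i) pairs (the |M_L| labeled samples)
    unlabeled   : list of molecules (the |M_U| unlabeled samples)
    predict     : molecule -> predicted property y(m)
    loss_fn     : (t_i, y) -> supervised loss
    regularizer : molecule -> Reg(theta, V_m), e.g. the sketch above
    """
    supervised = sum(loss_fn(t, predict(m)) for m, t in labeled)
    reg = (sum(regularizer(m) for m, _ in labeled)
           + sum(regularizer(m) for m in unlabeled))
    return supervised + lam * reg
```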
- Effectiveness of our substructure features
- Effectiveness of the semi-supervised approach
Methods
Test prediction accuracy using the same SVM classifier for all methods; only the input features differ. Input features are trained on 90% of the samples and tested on the remaining 10%.
Datasets
MUTAG: dataset of 188 chemical compounds labeled according to whether or not they have a mutagenic effect on a specific bacterium
PTC: dataset of 344 chemical compounds labeled according to their carcinogenicity in rats
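A hedged scikit-learn sketch of this evaluation protocol; X is whatever feature matrix a method produces (ours or a baseline's), y the compound labels, and the SVM hyperparameters here are just defaults, not the settings used in the experiments.

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def evaluate_features(X, y, seed=0):
    """Train/test the same SVM classifier; only the input features X differ
    between methods. 90% of samples for training, 10% for testing."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.1, random_state=seed)
    clf = SVC().fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))
```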
Our hierarchical substructure extraction successfully provides informative features for prediction
Semi-supervised learning significantly improves
the prediction accuracy
(To the best of our knowledge, this is the first study that brings a
semi-supervised approach to molecular property prediction)
[Plots: prediction error vs. ratio of labeled data, semi-supervised vs. supervised learning, for solubility (log mol/L) and drug efficacy (EC50 in nM) prediction.]
Solubility dataset: 1,144 molecules. Drug efficacy dataset: 10,000 molecules.
graph2vec [Narayanan+17]
- Builds on Paragraph vector: unsupervised representation learning of documents, where models are trained to predict the representations of words from that of the document containing them.
- An extension of Paragraph vector to arbitrary graphs
- Correspondence: document <=> molecule, words <=> rooted substructures
- Uses the Weisfeiler-Lehman relabeling algorithm [Weisfeiler+68] to enumerate rooted substructures
- Does not use information about the hierarchical structure of the set of rooted subgraphs
- An extension of convolutional neural networks to arbitrary graphs
- Neural message passing [Gilmer+17] provides a unified formulation of several graph convolution algorithms, including NFP [Duvenaud+15] and GGNN [Li+15]
Compared methods: node2vec [Grover & Leskovec, 16], sub2vec [Bijaya+, 17], graph2vec [Narayanan+, 17], WL kernel [Shervashidze+, 11], Deep WL kernel [Yanardag & Vishwanathan, 15]
Negative sampling
− ∑_{i=1}^{|D|} ∑_{j=1}^{|D_i|} [ log σ(v_{w_j}^T v_{D_i}) + k E_{w'∼P_n}[ log σ(−v_{w'}^T v_{D_i}) ] ]
v_w: word vector of word w
v_{D_i}: document vector of document D_i
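For reference, a small NumPy sketch of this Paragraph-vector objective for one document; the noise distribution P_n and the number of negatives k are placeholders.

```python
import numpy as np

def log_sigmoid(x):
    return -np.log1p(np.exp(-x))

def doc_neg_sampling_loss(word_ids, v_w, v_D, sample_noise_id, k=5):
    """- sum_j [ log sigma(v_{w_j}^T v_D) + k * E_{w'~P_n}[ log sigma(-v_{w'}^T v_D) ] ]
    for one document with word indices word_ids.

    v_w : (vocab, d) word-vector table, v_D : (d,) document vector,
    sample_noise_id() : draws a word index from the noise distribution P_n.
    """
    total = 0.0
    for j in word_ids:
        total += log_sigmoid(v_w[j] @ v_D)
        total += k * np.mean([log_sigmoid(-v_w[sample_noise_id()] @ v_D)
                              for _ in range(k)])
    return -total
```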