Semi-supervised learning of hierarchical representations of molecules
Background & Purpose
Hai Nguyen (Kyoto Univ.), Kenta Oono (Preferred Networks), Shin-ichi Maeda (Preferred Networks)
Semi-supervised learning of hierarchical representations of molecules with neural message passing
Proposed method
- Semi-supervised learning
- Hierarchical feature extraction from graphs
Existing methods
Experimental Results
Conclusion
There is growing attention to predicting molecular properties from large databases of molecules using machine learning techniques, for drug discovery and materials informatics.
Graph convolution
Paragraph vector [Le+14]
However, not all molecules are labeled, because of difficulties in experiments or sample purification.
In this study, we aim to improve prediction accuracy by taking a semi-supervised approach that extracts appropriate hierarchical substructure features from the unlabeled dataset.
Problem setting (Notation)
Graph G = (V, E): set of vertices V, set of edges E ⊆ V × V
Graph representing a molecule:
m ∈ M: molecule in the given database
v ∈ V_m: atom making up molecule m (oxygen, carbon, hydrogen, chlorine, ...)
e(v, w) ∈ E_m: bond connecting atoms in molecule m (single, double, triple, aromatic, ...)
y(m) = NN( ∑_{l=1}^{L} ∑_{v∈V_m} f(h_v^l) )
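As a reading of this readout formula, here is a minimal NumPy sketch; the level-wise vectors h_v^l are assumed precomputed, and f and NN are stand-ins (a linear map and a one-layer network), not the exact choices used in the work.

```python
import numpy as np

def predict_property(h, W_f, W_out, b_out):
    """Sketch of y(m) = NN( sum_{l=1..L} sum_{v in V_m} f(h_v^l) ).

    h     : (L, n_atoms, d) array, h[l, v] = substructure vector of atom v at level l+1
    W_f   : (d, e) linear map standing in for f
    W_out, b_out : parameters of a stand-in single-layer NN
    """
    pooled = np.einsum('lvd,de->e', h, W_f)   # sum over levels and atoms of f(h_v^l)
    hidden = np.tanh(pooled)                  # placeholder nonlinearity for NN
    return W_out @ hidden + b_out             # property prediction
```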
Prediction accuracy (%, mean ± std):

Dataset | node2vec    | sub2vec     | graph2vec  | WL kernel  | Deep WL kernel | Proposed
MUTAG   | 72.63±10.2  | 61.05±15.80 | 83.15±9.25 | 80.63±3.07 | 82.95±1.96     | 86.46±5.97
PTC     | 58.85±8.00  | 59.99±6.38  | 60.17±6.86 | 59.61±2.79 | 59.04±1.09     | 62.86±5.71
Loss function to extract substructure features
The substructure vector h_v^l should be close to the molecule vector u_m when h_v^l is computed from molecule m.
h_v^l: d-dimensional substructure vector at level l around atom v
u_m: molecule vector
Regularization term summed over all molecules: ∑_{m∈M} Reg(θ, V_m)
with p(C = 1 | h_v^l, u_m) = σ(u_m^T h_v^l)  (σ is the sigmoid function)
P_v^l: negative sampler of substructures, computed from a molecule sampled at random
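A tiny sketch of this probability, assuming plain NumPy vectors for h_v^l and u_m (symbol names are ours):

```python
import numpy as np

def p_same_molecule(h_vl, u_m):
    """p(C = 1 | h_v^l, u_m) = sigma(u_m^T h_v^l): probability that the
    substructure vector h_v^l was computed from molecule m."""
    return 1.0 / (1.0 + np.exp(-u_m @ h_vl))
```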
[Figure: hierarchical substructure vectors h_v^l (l = 1, ..., 4) for an example molecule with atoms 1-5.
Level 1: atom's representation. Levels 2-3: substructure's representation. Level 4: covers the whole graph.]
Substructure features of a graph
h_v^{l+1} = σ( h_v^l + ∑_{w∈N(v)} H_{e(v,w)} h_w^l )
N(v): neighboring atoms of atom v
H_{e(v,w)}: d × d matrix depending only on the bond e (applicable to graphs of any size)
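A minimal sketch of one level of this update (NumPy, our own data layout): bond-type-indexed matrices H shared across molecules, and an adjacency list for N(v).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def message_passing_level(h, neighbors, bond_of, H):
    """h_v^{l+1} = sigma( h_v^l + sum_{w in N(v)} H_{e(v,w)} h_w^l ).

    h         : (n_atoms, d) array of current vectors h_v^l
    neighbors : dict  v -> list of neighboring atom indices N(v)
    bond_of   : dict  (v, w) -> bond type of edge e(v, w)
    H         : dict  bond type -> (d, d) matrix (independent of graph size)
    """
    h_next = np.empty_like(h)
    for v in range(len(h)):
        msg = sum(H[bond_of[(v, w)]] @ h[w] for w in neighbors[v])
        h_next[v] = sigmoid(h[v] + msg)
    return h_next
```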
Prediction of molecule property using substructure feature
Loss function for semi-supervised learning
∑_{i=1}^{|M_L|} Loss(θ, y(m_i), t_i) + λ ∑_{i=1}^{|M_L|+|M_U|} Reg(θ, V_{m_i})
Reg(θ, V_m) = ∑_{l=1}^{L} ∑_{v∈V_m} [ log p(C = 1 | h_v^l, u_m) + k E_{h_{v'}^l ∼ P_v^l}[ log p(C = 0 | h_{v'}^l, u_m) ] ]
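A sketch of Reg(θ, V_m) for a single molecule, with a Monte-Carlo estimate of the expectation over the negative sampler P_v^l; the sampler itself and the number of negatives k are placeholders (signs follow the slide).

```python
import numpy as np

def log_sigmoid(x):
    return -np.log1p(np.exp(-x))

def reg_term(h, u_m, sample_negative, k=5):
    """Reg(theta, V_m) = sum_l sum_v [ log p(C=1 | h_v^l, u_m)
                          + k * E_{h' ~ P_v^l}[ log p(C=0 | h', u_m) ] ].

    h               : (L, n_atoms, d) substructure vectors of molecule m
    u_m             : (d,) molecule vector
    sample_negative : callable, sample_negative(l) -> (d,) level-l substructure
                      vector taken from a randomly chosen molecule (P_v^l)
    """
    total = 0.0
    L, n_atoms, _ = h.shape
    for l in range(L):
        for v in range(n_atoms):
            total += log_sigmoid(u_m @ h[l, v])               # log p(C=1 | h_v^l, u_m)
            total += k * np.mean([log_sigmoid(-u_m @ sample_negative(l))
                                  for _ in range(k)])         # k * E[log p(C=0 | ...)]
    return total
```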
[Figure: feature extraction pipeline for an example 5-atom molecule. Information of atoms and bonds → h_v^1 (Level 1) → h_v^2 (Level 2) → h_v^3 (Level 3) → h_v^4 (Level 4) → prediction y(m).]
|M_L|: number of labeled samples
|M_U|: number of unlabeled samples
t_i: molecule property corresponding to molecule m_i
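Combining the two terms, a sketch of the overall semi-supervised objective under the assumption that a supervised loss, a predictor, and the regularizer above are available as callables; labeled molecules contribute both terms, unlabeled molecules only the regularizer.

```python
def semi_supervised_objective(labeled, unlabeled, predict, loss_fn,
                              regularizer, lam=1.0):
    """sum_{i=1..|M_L|} Loss(theta, y(m_i), t_i)
       + lambda * sum_{i=1..|M_L|+|M_U|} Reg(theta, V_{m_i}).

    labeled     : list of (molecule, t_i) pairs (the |M_L| labeled samples)
    unlabeled   : list of molecules (the |M_U| unlabeled samples)
    predict     : molecule -> predicted property y(m)
    loss_fn     : (t_i, y) -> supervised loss
    regularizer : molecule -> Reg(theta, V_m), e.g. the sketch above
    """
    supervised = sum(loss_fn(t, predict(m)) for m, t in labeled)
    reg = (sum(regularizer(m) for m, _ in labeled)
           + sum(regularizer(m) for m in unlabeled))
    return supervised + lam * reg
```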
- Effectiveness of our substructure features
- Effectiveness of the semi-supervised approach
Methods
Test prediction accuracy using the same SVM classifier for all methods; only the input features differ. Input features are trained on 90% of the samples and tested on the remaining 10%.
Datasets
MUTAG: dataset of 188 chemical compounds labeled according to whether or not they have a mutagenic effect on a specific bacterium
PTC: dataset of 344 chemical compounds labeled according to their carcinogenicity in rats
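A hedged scikit-learn sketch of this evaluation protocol; X is whatever feature matrix a method produces (ours or a baseline's), y the compound labels, and the SVM hyperparameters here are just defaults, not the settings used in the experiments.

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def evaluate_features(X, y, seed=0):
    """Train/test the same SVM classifier; only the input features X differ
    between methods. 90% of samples for training, 10% for testing."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.1, random_state=seed)
    clf = SVC().fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))
```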
Our hierarchical substructure extraction successfully provides informative features for prediction
Semi-supervised learning significantly improves
the prediction accuracy
(To the best of our knowledge, this is the first study that brings a
semi-supervised approach to molecular property prediction)
[Plots: prediction error vs. ratio of labeled data, semi-supervised vs. supervised learning, for solubility (log mol/L) and drug efficacy (EC50 in nM) prediction.]
Solubility dataset: 1,144 molecules. Drug efficacy dataset: 10,000 molecules.
graph2vec [Narayanan+17]
- Builds on Paragraph vector: unsupervised representation learning of documents, where models are trained to predict the representations of words from that of the document containing them.
- An extension of Paragraph vector to arbitrary graphs
- Correspondence: document <=> molecule, words <=> rooted substructures
- Uses the Weisfeiler-Lehman relabeling algorithm [Weisfeiler+68] to enumerate rooted substructures
- Does not use information about the hierarchical structure of the set of rooted subgraphs
- An extension of convolutional neural networks to arbitrary graphs
- Neural message passing [Gilmer+17] provides a unified formulation of several graph convolution algorithms, including NFP [Duvenaud+15] and GGNN [Li+15]
Compared methods: node2vec [Grover & Leskovec, 16], sub2vec [Bijaya+, 17], graph2vec [Narayanan+, 17], WL kernel [Shervashidze+, 11], Deep WL kernel [Yanardag & Vishwanathan, 15]
Negative sampling
− ∑_{i=1}^{|D|} ∑_{j=1}^{|D_i|} [ log σ(v_{w_j}^T v_{D_i}) + k E_{w'∼P_n}[ log σ(−v_{w'}^T v_{D_i}) ] ]
v_w: word vector of word w
v_{D_i}: document vector of document D_i
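For reference, a small NumPy sketch of this Paragraph-vector objective for one document; the noise distribution P_n and the number of negatives k are placeholders.

```python
import numpy as np

def log_sigmoid(x):
    return -np.log1p(np.exp(-x))

def doc_neg_sampling_loss(word_ids, v_w, v_D, sample_noise_id, k=5):
    """- sum_j [ log sigma(v_{w_j}^T v_D) + k * E_{w'~P_n}[ log sigma(-v_{w'}^T v_D) ] ]
    for one document with word indices word_ids.

    v_w : (vocab, d) word-vector table, v_D : (d,) document vector,
    sample_noise_id() : draws a word index from the noise distribution P_n.
    """
    total = 0.0
    for j in word_ids:
        total += log_sigmoid(v_w[j] @ v_D)
        total += k * np.mean([log_sigmoid(-v_w[sample_noise_id()] @ v_D)
                              for _ in range(k)])
    return -total
```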