Semi-supervised learning of hierarchical representations of molecules with neural message passing
Hai Nguyen (Kyoto Univ.), Kenta Oono (Preferred Networks), Shin-ichi Maeda (Preferred Networks)

Poster panels: Background & Purpose; Existing methods; Proposed method (semi-supervised learning, hierarchical feature extraction from graph); Notation; Experimental Results; Conclusion

Background & Purpose
Predicting molecular properties from large molecular databases with machine learning is attracting growing attention in drug discovery and materials informatics. However, not all molecules are labeled, because of difficulties in experiments or in sample purification. In this study, we aim to improve prediction accuracy with a semi-supervised approach that extracts appropriate hierarchical substructure features from the unlabeled data as well.
Problem setting
Graph representing a molecule: $G = (V, E)$, with vertex set $V$ and edge set $E \subseteq V \times V$
$m \in M$: a molecule in the given database
$v \in V_m$: an atom of molecule $m$ (oxygen, carbon, hydrogen, chlorine, ...)
$e(v, w) \in E_m$: the bond connecting atoms $v$ and $w$ in molecule $m$ (single, double, triple, aromatic, ...)
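The notation above maps directly onto a small graph data structure. The following is a minimal sketch, assuming an undirected molecule graph with hydrogens omitted and bond types stored as strings; the class `MoleculeGraph` and its methods are hypothetical illustrations, not code from the poster.

```python
from dataclasses import dataclass, field


@dataclass
class MoleculeGraph:
    """Molecule m as a graph G = (V, E): atoms are vertices, bonds are edges."""

    atoms: list[str]  # atom type of each vertex v, e.g. "C", "O"
    bonds: dict[tuple[int, int], str] = field(default_factory=dict)  # e(v, w) -> bond type

    def add_bond(self, v: int, w: int, bond_type: str) -> None:
        # Bonds are undirected, so both (v, w) and (w, v) are stored in E, a subset of V x V.
        self.bonds[(v, w)] = bond_type
        self.bonds[(w, v)] = bond_type

    def neighbors(self, v: int) -> list[int]:
        """N(v): indices of the atoms bonded to atom v."""
        return sorted(w for (u, w) in self.bonds if u == v)
```

For example, the heavy-atom skeleton of ethanol is `MoleculeGraph(["C", "C", "O"])` with single bonds 0-1 and 1-2, and `neighbors(1)` then returns `[0, 2]`.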
$$y(\mathbf{h}_m) = \mathrm{NN}\left( \sum_{l=1}^{L} \sum_{v \in V_m} f(h_v^l) \right)$$
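The prediction formula sums the per-level, per-atom features $f(h_v^l)$ before applying a network NN. The sketch below assumes $f$ and NN are arbitrary caller-supplied functions, since the poster does not specify their form; the function name `readout` is hypothetical.

```python
from typing import Callable, Sequence

Vector = list[float]


def readout(
    levels: Sequence[Sequence[Vector]],
    f: Callable[[Vector], Vector],
    nn: Callable[[Vector], float],
) -> float:
    """y(h_m) = NN( sum_{l=1..L} sum_{v in V_m} f(h_v^l) ).

    levels[l][v] holds h_v^l, the feature of atom v at level l (0-based here).
    """
    pooled = None
    for level in levels:          # sum over levels l = 1..L
        for h_v in level:         # sum over atoms v in V_m
            out = f(h_v)
            if pooled is None:
                pooled = list(out)
            else:
                pooled = [p + x for p, x in zip(pooled, out)]
    return nn(pooled)
```

Because the inner sums run over all atoms and all levels, the pooled vector has a fixed size however many atoms the molecule has, and it does not depend on the order in which atoms are listed.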
Prediction accuracy (mean ± std.) with the same SVM classifier
Dataset  node2vec     sub2vec      graph2vec    WL kernel    Deep WL kernel  Proposed
MUTAG    72.63±10.2   61.05±15.80  83.15±9.25   80.63±3.07   82.95±1.96      86.46±5.97
PTC      58.85±8.00   59.99±6.38   60.17±6.86   59.61±2.79   59.04±1.09      62.86±5.71
Loss function to extract substructure features
The substructure vector $h_v^l$ should be close to the molecule vector $u_m$ when $h_v^l$ is computed from molecule $m$:
$$\sum_{m \in M} \mathrm{Reg}(m, \mathbf{h}_m)$$
$h_v^l$: $d$-dimensional substructure vector at level $l$ around atom $v$
$u_m$: molecule vector
where $p(C = 1 \mid h_v^l, u_m) = \sigma(u_m^\top h_v^l)$ ($\sigma$ is the sigmoid function)
$p_v^l$: negative sampler of substructures computed from a randomly sampled molecule
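To make the substructure loss concrete, here is a hedged sketch of a Monte Carlo estimate of $\mathrm{Reg}(m, \mathbf{h}_m)$ for a single molecule, following the Reg definition given in this poster. It assumes that $k$ is the number of negative samples and that the negative sampler $p_v^l$ can be approximated by uniform draws from a pool of level-$l$ substructure vectors taken from other molecules; `reg`, `negative_pool`, and `log_sigmoid` are hypothetical names. It uses $\log p(C = 0 \mid h, u_m) = \log(1 - \sigma(u_m^\top h)) = \log \sigma(-u_m^\top h)$.

```python
import math
import random


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigma(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reg(levels, u_m, negative_pool, k=5, rng=None):
    """Monte Carlo estimate of Reg(m, h_m) (a log-likelihood, larger is better).

    levels[l][v]     : h_v^l for molecule m
    u_m              : molecule vector of m
    negative_pool[l] : level-l substructure vectors from randomly sampled other
                       molecules; uniform draws from it stand in for p_v^l
    """
    rng = rng if rng is not None else random.Random(0)
    total = 0.0
    for l, level in enumerate(levels):
        for h_v in level:
            # log p(C = 1 | h_v^l, u_m) = log sigma(u_m^T h_v^l)
            total += log_sigmoid(dot(u_m, h_v))
            # k * E[log p(C = 0 | h_v'^l, u_m)] estimated by summing over k sampled
            # negatives, using log(1 - sigma(x)) = log sigma(-x)
            for _ in range(k):
                h_neg = rng.choice(negative_pool[l])
                total += log_sigmoid(-dot(u_m, h_neg))
    return total
```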
[Figure: hierarchical feature extraction on an example molecule with atoms 1 to 5. Level 1: each atom's representation $h_v^1$. Levels 2 and 3: representations $h_v^2$, $h_v^3$ of substructures around each atom. Level 4: $h_v^4$, whose substructures cover the whole graph.]
Substructure features of the graph
$$h_v^{l+1} = \sigma\left( h_v^l + \sum_{w \in N(v)} H_{e(v,w)}\, h_w^l \right)$$
$H_{e(v,w)}$: $d \times d$ matrix that depends only on the bond $e(v, w)$, so it applies to graphs of any size
$N(v)$: neighboring atoms of atom $v$
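The update rule can be read as one round of neural message passing. Below is a minimal sketch with plain Python lists, assuming $\sigma$ is applied elementwise, $H$ is indexed by bond type, and the same matrices are shared across levels (the poster writes $H_{e(v,w)}$ without a level index); `message_passing_step` and `hierarchical_features` are hypothetical names.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(M, x):
    """Multiply a d x d matrix (list of rows) by a d-dim vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def message_passing_step(h, neighbors, bond_type, H):
    """One level: h_v^{l+1} = sigma(h_v^l + sum_{w in N(v)} H_{e(v,w)} h_w^l).

    h          : h[v] = h_v^l, a d-dim vector
    neighbors  : neighbors[v] = N(v)
    bond_type  : bond_type[(v, w)] = type of bond e(v, w)
    H          : H[t] = d x d matrix shared by all bonds of type t
    """
    new_h = []
    for v, h_v in enumerate(h):
        acc = list(h_v)  # self term h_v^l
        for w in neighbors[v]:
            msg = matvec(H[bond_type[(v, w)]], h[w])  # message H_{e(v,w)} h_w^l
            acc = [a + m for a, m in zip(acc, msg)]
        new_h.append([sigmoid(a) for a in acc])  # elementwise sigma
    return new_h


def hierarchical_features(h1, neighbors, bond_type, H, L):
    """Return [h^1, ..., h^L]; level l+1 sees atoms up to l bonds away."""
    levels = [h1]
    for _ in range(L - 1):
        levels.append(message_passing_step(levels[-1], neighbors, bond_type, H))
    return levels
```

Stacking levels this way produces the hierarchy in the figure: $h_v^1$ describes the atom alone, and each further level widens the neighborhood around $v$ by one bond.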
Prediction of molecular properties using substructure features
Loss function for semi-supervised learning
$$\sum_{i=1}^{|M_L|} \mathrm{Loss}\left(m_i, y(\mathbf{h}_{m_i}), o_i\right) + \lambda \sum_{j=1}^{|M_L| + |M_U|} \mathrm{Reg}\left(m_j, \mathbf{h}_{m_j}\right)$$
$$\mathrm{Reg}(m, \mathbf{h}_m) = \sum_{l=1}^{L} \sum_{v \in V_m} \left( \log p\left(C = 1 \mid h_v^l, u_m\right) + k\, \mathbb{E}_{h_{v'}^l \sim p_v^l}\left[ \log p\left(C = 0 \mid h_{v'}^l, u_m\right) \right] \right)$$
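[Figure: model overview on the five-atom example. Information on atoms and bonds gives $h_v^1$ (Level 1); the update rule gives $h_v^2$, $h_v^3$, $h_v^4$ (Levels 2 to 4); the property prediction $y(\mathbf{h}_m)$ is computed from these features.]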
$|M_L|$: number of labeled samples
$|M_U|$: number of unlabeled samples
$o_i$: property of molecule $m_i$
● Effectiveness of our substructure features
● Effectiveness of the semi-supervised approach
We test the prediction accuracy using the same SVM classifier for all methods, changing only the input features. The input features are trained on 90% of the samples and tested on the remaining 10%.
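The evaluation protocol amounts to one train/test split shared by every feature type. A minimal sketch, assuming a random shuffle with a fixed seed and rounding the 90% share down; `split_90_10` is a hypothetical helper, and the SVM itself is not shown.

```python
import random


def split_90_10(n_samples: int, seed: int = 0) -> tuple[list[int], list[int]]:
    """Shuffle sample indices and hold out the last 10% for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = n_samples * 9 // 10  # integer arithmetic; rounds the training share down (assumed)
    return indices[:n_train], indices[n_train:]
```

Under this rounding, the 188 MUTAG compounds give 169 training and 19 test samples, and the 344 PTC compounds give 309 and 35. Reusing the same indices for every feature type keeps the comparison limited to the input features.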
Datasets
MUTAG: 188 chemical compounds labeled according to whether or not they have a mutagenic effect on a specific bacterium
PTC: 344 chemical compounds labeled according to their carcinogenicity in rats
Our hierarchical substructure extraction yields informative features for prediction.
Semi-supervised learning significantly improves the prediction accuracy.
(To the best of our knowledge, this is the first study to apply a semi-supervised approach to molecular property prediction.)
[Figure: prediction error versus ratio of labeled data, comparing semi-supervised learning with supervised learning. Two panels: solubility (log mol/L) on the solubility dataset (1144 molecules), and drug efficacy (EC50 in nM) on the drug efficacy dataset (10000 molecules).]
Existing methods
Paragraph vector [Le+14]
- Unsupervised representation learning of documents
- The model is trained to predict the representations of words from the representation of the document that contains them.
graph2vec [Narayanan+17]
- An extension of paragraph vector to arbitrary graphs
- Correspondence: document <=> molecule, words <=> rooted substructures
- Uses the Weisfeiler-Lehman relabeling algorithm [Weisfeiler+68] to enumerate rooted substructures
- Does not use the hierarchical structure of the set of rooted subgraphs
Graph convolution
- An extension of convolutional neural networks to arbitrary graphs
- Neural message passing [Gilmer+17] provides a unified formulation of several graph convolution algorithms, including NFP [Duvenaud+15] and GGNN [Li+15]
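The Weisfeiler-Lehman relabeling that graph2vec uses to enumerate rooted substructures can be sketched as follows. This is an illustrative version, not graph2vec's reference implementation: each node's new label compresses its previous label together with the sorted labels of its neighbors, and an optional `table` shared across molecules keeps the compressed labels (hypothetically named `wl0`, `wl1`, ...) consistent across a dataset.

```python
def wl_relabel(labels, neighbors, iterations, table=None):
    """Weisfeiler-Lehman relabeling.

    labels     : initial label of each node (e.g. atom type)
    neighbors  : neighbors[v] = list of nodes adjacent to v
    table      : signature -> compressed label; pass the same dict for every
                 molecule so that equal rooted subtrees get equal labels
    Returns history, where history[h][v] identifies the rooted subtree of
    depth h around node v.
    """
    table = {} if table is None else table
    history = [list(labels)]
    for _ in range(iterations):
        prev = history[-1]
        new = []
        for v in range(len(prev)):
            # Signature = own label + sorted multiset of neighbor labels.
            signature = (prev[v], tuple(sorted(prev[w] for w in neighbors[v])))
            if signature not in table:
                table[signature] = f"wl{len(table)}"
            new.append(table[signature])
        history.append(new)
    return history
```

The labels from all iterations, pooled together, play the role of the "words" of the molecule "document". Pooling them into a flat bag is where the hierarchy across iterations is discarded, which is the limitation noted above.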
Baseline references: node2vec [Grover & Leskovec, 16]; sub2vec [... Bijaya, 17]; graph2vec [Narayanan+, 17]; WL kernel [Shervashidze+, 11]; Deep WL kernel [Yanardag & Vishwanathan, 15]
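Paragraph vector objective with negative sampling:
$$\sum_{i=1}^{|D|} \sum_{n=1}^{|D_i|} \left( \log \sigma\left(v_{w_n^i}^\top v_{D_i}\right) + k\, \mathbb{E}_{w' \sim P_n}\left[ \log \sigma\left(-v_{w'}^\top v_{D_i}\right) \right] \right)$$
$v_w$: word vector of word $w$
$v_{D_i}$: document vector of document $D_i$
$w_n^i$: $n$-th word of document $D_i$; $P_n$: noise distribution for negative sampling; $k$: number of negative samples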
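For comparison with the substructure loss of the proposed method, here is a hedged sketch of the paragraph-vector objective for one document $D_i$ (the full objective sums this over all documents). It assumes the noise distribution $P_n$ is represented by a list of words sampled uniformly, where repeated entries can encode a non-uniform $P_n$; `document_objective` and its arguments are hypothetical names.

```python
import math
import random


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigma(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def document_objective(doc_words, v_doc, word_vecs, noise_words, k, rng):
    """Negative-sampling objective for one document D_i (to be maximized).

    doc_words   : the words w_1^i, ..., w_|D_i|^i of the document
    v_doc       : document vector v_{D_i}
    word_vecs   : word -> word vector v_w
    noise_words : list drawn from uniformly, standing in for P_n
    """
    total = 0.0
    for w in doc_words:                      # n = 1..|D_i|
        total += log_sigmoid(dot(word_vecs[w], v_doc))
        for _ in range(k):                   # k samples w' ~ P_n
            w_neg = rng.choice(noise_words)
            total += log_sigmoid(-dot(word_vecs[w_neg], v_doc))
    return total
```

Replacing documents with molecules, words with substructure vectors $h_v^l$, and the document vector with $u_m$ gives the form of the Reg term used by the proposed method.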

More Related Content

What's hot

Pattern recognition system based on support vector machines
Pattern recognition system based on support vector machinesPattern recognition system based on support vector machines
Pattern recognition system based on support vector machinesAlexander Decker
Β 
Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio Armando Vieira
Β 
Radial basis function network ppt bySheetal,Samreen and Dhanashri
Radial basis function network ppt bySheetal,Samreen and DhanashriRadial basis function network ppt bySheetal,Samreen and Dhanashri
Radial basis function network ppt bySheetal,Samreen and Dhanashrisheetal katkar
Β 
A fast clustering based feature subset selection algorithm for high-dimension...
A fast clustering based feature subset selection algorithm for high-dimension...A fast clustering based feature subset selection algorithm for high-dimension...
A fast clustering based feature subset selection algorithm for high-dimension...IEEEFINALYEARPROJECTS
Β 
Abstract takahashi
Abstract takahashiAbstract takahashi
Abstract takahashiharmonylab
Β 
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...CSCJournals
Β 
Noorbehbahani rea sc
Noorbehbahani rea scNoorbehbahani rea sc
Noorbehbahani rea scnoorbehbahani
Β 
Neural networks for the prediction and forecasting of water resources variables
Neural networks for the prediction and forecasting of water resources variablesNeural networks for the prediction and forecasting of water resources variables
Neural networks for the prediction and forecasting of water resources variablesJonathan D'Cruz
Β 
Technical University of Crete_giakoumisDiplomaThesis
Technical University of Crete_giakoumisDiplomaThesisTechnical University of Crete_giakoumisDiplomaThesis
Technical University of Crete_giakoumisDiplomaThesisGeorgios M. GIAKOUMIS
Β 
Robust inference via generative classifiers for handling noisy labels
Robust inference via generative classifiers for handling noisy labelsRobust inference via generative classifiers for handling noisy labels
Robust inference via generative classifiers for handling noisy labelsKimin Lee
Β 
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-LearningMeta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-LearningMLAI2
Β 
Report: "MolGAN: An implicit generative model for small molecular graphs"
Report: "MolGAN: An implicit generative model for small molecular graphs"Report: "MolGAN: An implicit generative model for small molecular graphs"
Report: "MolGAN: An implicit generative model for small molecular graphs"Ryohei Suzuki
Β 
Comparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisComparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisIOSR Journals
Β 

What's hot (17)

EiB Seminar from Esteban Vegas, Ph.D.
EiB Seminar from Esteban Vegas, Ph.D. EiB Seminar from Esteban Vegas, Ph.D.
EiB Seminar from Esteban Vegas, Ph.D.
Β 
tsopze2011
tsopze2011tsopze2011
tsopze2011
Β 
Pattern recognition system based on support vector machines
Pattern recognition system based on support vector machinesPattern recognition system based on support vector machines
Pattern recognition system based on support vector machines
Β 
Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio
Β 
Radial basis function network ppt bySheetal,Samreen and Dhanashri
Radial basis function network ppt bySheetal,Samreen and DhanashriRadial basis function network ppt bySheetal,Samreen and Dhanashri
Radial basis function network ppt bySheetal,Samreen and Dhanashri
Β 
A fast clustering based feature subset selection algorithm for high-dimension...
A fast clustering based feature subset selection algorithm for high-dimension...A fast clustering based feature subset selection algorithm for high-dimension...
A fast clustering based feature subset selection algorithm for high-dimension...
Β 
Abstract takahashi
Abstract takahashiAbstract takahashi
Abstract takahashi
Β 
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
Β 
Noorbehbahani rea sc
Noorbehbahani rea scNoorbehbahani rea sc
Noorbehbahani rea sc
Β 
Neural networks for the prediction and forecasting of water resources variables
Neural networks for the prediction and forecasting of water resources variablesNeural networks for the prediction and forecasting of water resources variables
Neural networks for the prediction and forecasting of water resources variables
Β 
Long Zhou - 2017 - Neural System Combination for Machine Transaltion
Long Zhou - 2017 -  Neural System Combination for Machine TransaltionLong Zhou - 2017 -  Neural System Combination for Machine Transaltion
Long Zhou - 2017 - Neural System Combination for Machine Transaltion
Β 
Complex system
Complex systemComplex system
Complex system
Β 
Technical University of Crete_giakoumisDiplomaThesis
Technical University of Crete_giakoumisDiplomaThesisTechnical University of Crete_giakoumisDiplomaThesis
Technical University of Crete_giakoumisDiplomaThesis
Β 
Robust inference via generative classifiers for handling noisy labels
Robust inference via generative classifiers for handling noisy labelsRobust inference via generative classifiers for handling noisy labels
Robust inference via generative classifiers for handling noisy labels
Β 
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-LearningMeta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
Β 
Report: "MolGAN: An implicit generative model for small molecular graphs"
Report: "MolGAN: An implicit generative model for small molecular graphs"Report: "MolGAN: An implicit generative model for small molecular graphs"
Report: "MolGAN: An implicit generative model for small molecular graphs"
Β 
Comparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisComparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data Analysis
Β 

Similar to Semi-supervised learning of hierarchical representations of molecules

Adaptive Training of Radial Basis Function Networks Based on Cooperative
Adaptive Training of Radial Basis Function Networks Based on CooperativeAdaptive Training of Radial Basis Function Networks Based on Cooperative
Adaptive Training of Radial Basis Function Networks Based on CooperativeESCOM
Β 
Application of support vector machines for prediction of anti hiv activity of...
Application of support vector machines for prediction of anti hiv activity of...Application of support vector machines for prediction of anti hiv activity of...
Application of support vector machines for prediction of anti hiv activity of...Alexander Decker
Β 
Evaluation of a hybrid method for constructing multiple SVM kernels
Evaluation of a hybrid method for constructing multiple SVM kernelsEvaluation of a hybrid method for constructing multiple SVM kernels
Evaluation of a hybrid method for constructing multiple SVM kernelsinfopapers
Β 
The chaotic structure of
The chaotic structure ofThe chaotic structure of
The chaotic structure ofcsandit
Β 
The Chaotic Structure of Bacterial Virulence Protein Sequences
The Chaotic Structure of Bacterial Virulence Protein SequencesThe Chaotic Structure of Bacterial Virulence Protein Sequences
The Chaotic Structure of Bacterial Virulence Protein Sequencescsandit
Β 
Hierarchical algorithms of quasi linear ARX Neural Networks for Identificatio...
Hierarchical algorithms of quasi linear ARX Neural Networks for Identificatio...Hierarchical algorithms of quasi linear ARX Neural Networks for Identificatio...
Hierarchical algorithms of quasi linear ARX Neural Networks for Identificatio...Yuyun Wabula
Β 
Mncs 16-10-1μ£Ό-λ³€μŠΉκ·œ-introduction to the machine learning #2
Mncs 16-10-1μ£Ό-λ³€μŠΉκ·œ-introduction to the machine learning #2Mncs 16-10-1μ£Ό-λ³€μŠΉκ·œ-introduction to the machine learning #2
Mncs 16-10-1μ£Ό-λ³€μŠΉκ·œ-introduction to the machine learning #2Seung-gyu Byeon
Β 
EE660_Report_YaxinLiu_8448347171
EE660_Report_YaxinLiu_8448347171EE660_Report_YaxinLiu_8448347171
EE660_Report_YaxinLiu_8448347171Yaxin Liu
Β 
24csit38.pdf
24csit38.pdf24csit38.pdf
24csit38.pdfamine597585
Β 
A general frame for building optimal multiple SVM kernels
A general frame for building optimal multiple SVM kernelsA general frame for building optimal multiple SVM kernels
A general frame for building optimal multiple SVM kernelsinfopapers
Β 
Black-box modeling of nonlinear system using evolutionary neural NARX model
Black-box modeling of nonlinear system using evolutionary neural NARX modelBlack-box modeling of nonlinear system using evolutionary neural NARX model
Black-box modeling of nonlinear system using evolutionary neural NARX modelIJECEIAES
Β 
Powerpoint
PowerpointPowerpoint
Powerpointbutest
Β 
Comparison of Neural Network Training Functions for Hematoma Classification i...
Comparison of Neural Network Training Functions for Hematoma Classification i...Comparison of Neural Network Training Functions for Hematoma Classification i...
Comparison of Neural Network Training Functions for Hematoma Classification i...IOSR Journals
Β 
An Automatic Clustering Technique for Optimal Clusters
An Automatic Clustering Technique for Optimal ClustersAn Automatic Clustering Technique for Optimal Clusters
An Automatic Clustering Technique for Optimal ClustersIJCSEA Journal
Β 
Web spam classification using supervised artificial neural network algorithms
Web spam classification using supervised artificial neural network algorithmsWeb spam classification using supervised artificial neural network algorithms
Web spam classification using supervised artificial neural network algorithmsaciijournal
Β 
Devin Petersohn Poster
Devin Petersohn PosterDevin Petersohn Poster
Devin Petersohn PosterDevin Petersohn
Β 
Application of three graph Laplacian based semisupervised learning methods to...
Application of three graph Laplacian based semisupervised learning methods to...Application of three graph Laplacian based semisupervised learning methods to...
Application of three graph Laplacian based semisupervised learning methods to...ijbbjournal
Β 
Web Spam Classification Using Supervised Artificial Neural Network Algorithms
Web Spam Classification Using Supervised Artificial Neural Network AlgorithmsWeb Spam Classification Using Supervised Artificial Neural Network Algorithms
Web Spam Classification Using Supervised Artificial Neural Network Algorithmsaciijournal
Β 
chapter10
chapter10chapter10
chapter10butest
Β 

Similar to Semi-supervised learning of hierarchical representations of molecules (20)

Adaptive Training of Radial Basis Function Networks Based on Cooperative
Adaptive Training of Radial Basis Function Networks Based on CooperativeAdaptive Training of Radial Basis Function Networks Based on Cooperative
Adaptive Training of Radial Basis Function Networks Based on Cooperative
Β 
Application of support vector machines for prediction of anti hiv activity of...
Application of support vector machines for prediction of anti hiv activity of...Application of support vector machines for prediction of anti hiv activity of...
Application of support vector machines for prediction of anti hiv activity of...
Β 
Evaluation of a hybrid method for constructing multiple SVM kernels
Evaluation of a hybrid method for constructing multiple SVM kernelsEvaluation of a hybrid method for constructing multiple SVM kernels
Evaluation of a hybrid method for constructing multiple SVM kernels
Β 
The chaotic structure of
The chaotic structure ofThe chaotic structure of
The chaotic structure of
Β 
The Chaotic Structure of Bacterial Virulence Protein Sequences
The Chaotic Structure of Bacterial Virulence Protein SequencesThe Chaotic Structure of Bacterial Virulence Protein Sequences
The Chaotic Structure of Bacterial Virulence Protein Sequences
Β 
Hierarchical algorithms of quasi linear ARX Neural Networks for Identificatio...
Hierarchical algorithms of quasi linear ARX Neural Networks for Identificatio...Hierarchical algorithms of quasi linear ARX Neural Networks for Identificatio...
Hierarchical algorithms of quasi linear ARX Neural Networks for Identificatio...
Β 
Mncs 16-10-1μ£Ό-λ³€μŠΉκ·œ-introduction to the machine learning #2
Mncs 16-10-1μ£Ό-λ³€μŠΉκ·œ-introduction to the machine learning #2Mncs 16-10-1μ£Ό-λ³€μŠΉκ·œ-introduction to the machine learning #2
Mncs 16-10-1μ£Ό-λ³€μŠΉκ·œ-introduction to the machine learning #2
Β 
EE660_Report_YaxinLiu_8448347171
EE660_Report_YaxinLiu_8448347171EE660_Report_YaxinLiu_8448347171
EE660_Report_YaxinLiu_8448347171
Β 
24csit38.pdf
24csit38.pdf24csit38.pdf
24csit38.pdf
Β 
A general frame for building optimal multiple SVM kernels
A general frame for building optimal multiple SVM kernelsA general frame for building optimal multiple SVM kernels
A general frame for building optimal multiple SVM kernels
Β 
Black-box modeling of nonlinear system using evolutionary neural NARX model
Black-box modeling of nonlinear system using evolutionary neural NARX modelBlack-box modeling of nonlinear system using evolutionary neural NARX model
Black-box modeling of nonlinear system using evolutionary neural NARX model
Β 
Powerpoint
PowerpointPowerpoint
Powerpoint
Β 
Comparison of Neural Network Training Functions for Hematoma Classification i...
Comparison of Neural Network Training Functions for Hematoma Classification i...Comparison of Neural Network Training Functions for Hematoma Classification i...
Comparison of Neural Network Training Functions for Hematoma Classification i...
Β 
SoftComputing6
SoftComputing6SoftComputing6
SoftComputing6
Β 
An Automatic Clustering Technique for Optimal Clusters
An Automatic Clustering Technique for Optimal ClustersAn Automatic Clustering Technique for Optimal Clusters
An Automatic Clustering Technique for Optimal Clusters
Β 
Web spam classification using supervised artificial neural network algorithms
Web spam classification using supervised artificial neural network algorithmsWeb spam classification using supervised artificial neural network algorithms
Web spam classification using supervised artificial neural network algorithms
Β 
Devin Petersohn Poster
Devin Petersohn PosterDevin Petersohn Poster
Devin Petersohn Poster
Β 
Application of three graph Laplacian based semisupervised learning methods to...
Application of three graph Laplacian based semisupervised learning methods to...Application of three graph Laplacian based semisupervised learning methods to...
Application of three graph Laplacian based semisupervised learning methods to...
Β 
Web Spam Classification Using Supervised Artificial Neural Network Algorithms
Web Spam Classification Using Supervised Artificial Neural Network AlgorithmsWeb Spam Classification Using Supervised Artificial Neural Network Algorithms
Web Spam Classification Using Supervised Artificial Neural Network Algorithms
Β 
chapter10
chapter10chapter10
chapter10
Β 

More from Dai-Hai Nguyen

Advanced machine learning for metabolite identification
Advanced machine learning for metabolite identificationAdvanced machine learning for metabolite identification
Advanced machine learning for metabolite identificationDai-Hai Nguyen
Β 
Metrics for generativemodels
Metrics for generativemodelsMetrics for generativemodels
Metrics for generativemodelsDai-Hai Nguyen
Β 
Brief introduction on GAN
Brief introduction on GANBrief introduction on GAN
Brief introduction on GANDai-Hai Nguyen
Β 
Hierarchical selection
Hierarchical selectionHierarchical selection
Hierarchical selectionDai-Hai Nguyen
Β 
DL for molecules
DL for moleculesDL for molecules
DL for moleculesDai-Hai Nguyen
Β 
Collaborative DL
Collaborative DLCollaborative DL
Collaborative DLDai-Hai Nguyen
Β 

More from Dai-Hai Nguyen (8)

Advanced machine learning for metabolite identification
Advanced machine learning for metabolite identificationAdvanced machine learning for metabolite identification
Advanced machine learning for metabolite identification
Β 
Metrics for generativemodels
Metrics for generativemodelsMetrics for generativemodels
Metrics for generativemodels
Β 
IBSB tutorial
IBSB tutorialIBSB tutorial
IBSB tutorial
Β 
Brief introduction on GAN
Brief introduction on GANBrief introduction on GAN
Brief introduction on GAN
Β 
Hierarchical selection
Hierarchical selectionHierarchical selection
Hierarchical selection
Β 
DL for molecules
DL for moleculesDL for molecules
DL for molecules
Β 
Seminar
SeminarSeminar
Seminar
Β 
Collaborative DL
Collaborative DLCollaborative DL
Collaborative DL
Β 

Recently uploaded

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
Β 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
Β 
FULL ENJOY πŸ” 8264348440 πŸ” Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY πŸ” 8264348440 πŸ” Call Girls in Diplomatic Enclave | DelhiFULL ENJOY πŸ” 8264348440 πŸ” Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY πŸ” 8264348440 πŸ” Call Girls in Diplomatic Enclave | Delhisoniya singh
Β 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
Β 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
Β 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
Β 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
Β 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
Β 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
Β 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
Β 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
Β 
Integration and Automation in Practice: CI/CD in MuleΒ Integration and Automat...
Integration and Automation in Practice: CI/CD in MuleΒ Integration and Automat...Integration and Automation in Practice: CI/CD in MuleΒ Integration and Automat...
Integration and Automation in Practice: CI/CD in MuleΒ Integration and Automat...Patryk Bandurski
Β 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
Β 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
Β 
WhatsApp 9892124323 βœ“Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 βœ“Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 βœ“Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 βœ“Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
Β 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
Β 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
Β 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
Β 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
Β 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
Β 

Recently uploaded (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
Β 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Β 
FULL ENJOY πŸ” 8264348440 πŸ” Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY πŸ” 8264348440 πŸ” Call Girls in Diplomatic Enclave | DelhiFULL ENJOY πŸ” 8264348440 πŸ” Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY πŸ” 8264348440 πŸ” Call Girls in Diplomatic Enclave | Delhi
Β 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
Β 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
Β 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
Β 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
Β 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
Β 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Semi-supervised learning of hierarchical representations of molecules

Hai Nguyen (Kyoto Univ.), Kenta Oono (Preferred Networks), Shin-ichi Maeda (Preferred Networks)
Semi-supervised learning of hierarchical representations of molecules with neural message passing

Key ideas of the proposed method:
● Semi-supervised learning
● Hierarchical feature extraction from graphs

Background & Purpose
There is growing interest in predicting molecular properties from large molecular databases with machine learning, for applications such as drug discovery and materials informatics. However, not all molecules are labeled, because of difficulties in the experiments or in sample purification. In this study, we aim to improve prediction accuracy with a semi-supervised approach that also extracts appropriate hierarchical substructure features from the unlabeled data.

Problem setting
Graph $G = (V, E)$: set of vertices $V$ and set of edges $E \subseteq V \times V$
Graph representing a molecule:
$m \in M$: molecule in the given database
$v \in V_m$: atom that makes up molecule $m$ (oxygen, carbon, hydrogen, chlorine, ...)
$e(v, w) \in E_m$: bond connecting atoms $v$ and $w$ in molecule $m$ (single, double, triple, aromatic, ...)
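The graph notation above maps directly onto a small data structure. The following is a minimal sketch, assuming hydrogens are left implicit and bond types are plain strings; the `Molecule` class and its methods are hypothetical names chosen for illustration, not part of the authors' code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Molecule m as an undirected graph G = (V_m, E_m) (hypothetical class).

    atoms[v] is the element of atom v, so V_m is range(len(atoms));
    bonds maps the unordered pair {v, w} to the bond type e(v, w).
    """

    atoms: list[str]
    bonds: dict[frozenset[int], str] = field(default_factory=dict)

    def add_bond(self, v: int, w: int, bond_type: str) -> None:
        # Bonds are undirected, so e(v, w) and e(w, v) are stored once.
        if v == w:
            raise ValueError("an atom cannot bond to itself")
        self.bonds[frozenset((v, w))] = bond_type

    def bond(self, v: int, w: int) -> str:
        # Look up e(v, w) regardless of argument order.
        return self.bonds[frozenset((v, w))]

    def neighbors(self, v: int) -> list[int]:
        # N(v): atoms sharing a bond with atom v, in ascending order.
        return sorted(w for edge in self.bonds if v in edge for w in edge if w != v)


# Heavy atoms of acetaldehyde, CH3-CH=O; hydrogens left implicit (assumption).
mol = Molecule(["C", "C", "O"])
mol.add_bond(0, 1, "single")
mol.add_bond(1, 2, "double")
print(mol.neighbors(1), mol.bond(2, 1))  # [0, 2] double
```

Storing each bond under a `frozenset` key keeps the graph undirected without duplicating edges, which matches $E_m$ being a set of atom pairs.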
Existing methods

Paragraph vector [Le+14]
- Unsupervised representation learning of documents.
- Models are trained so that they can predict the representations of words from the representation of a document containing them.
- Objective with negative sampling:
$$\sum_{i=1}^{|D|} \sum_{n=1}^{|D_i|} \left[ \log \sigma\left(v_{w_n^i}^\top v_{D_i}\right) + k\, \mathbb{E}_{w' \sim P_n}\left[ \log \sigma\left(-v_{w'}^\top v_{D_i}\right) \right] \right]$$
  $v_w$: word vector of word $w$; $v_{D_i}$: document vector of document $D_i$; $w_n^i$: $n$-th word of $D_i$; $P_n$: noise distribution for negative sampling; $k$: number of negative samples.

graph2vec [Narayanan+17]
- An extension of Paragraph vector to arbitrary graphs.
- Correspondence: document <=> molecule, words <=> rooted substructures.
- Uses the Weisfeiler-Lehman relabeling algorithm [Weisfeiler+68] to enumerate rooted substructures.
- Does not use information about the hierarchical structure of the set of rooted subgraphs.

Graph convolution
- An extension of convolutional neural networks to arbitrary graphs.
- Neural message passing [Gilmer+17] provides a unified formulation of several graph convolution algorithms, including NFP [Duvenaud+15] and GGNN [Li+15].

Proposed method

Substructure feature of a graph
[Figure: hierarchical substructures of a 5-atom molecule. Level 1: atom representations $h_1^1, \dots, h_5^1$ (information of atoms and bonds); Level 2 and Level 3: substructure representations $h_v^2$, $h_v^3$; Level 4: covers the whole graph ($h_v^4$).]
$$h_v^{l+1} = \sigma\left( h_v^l + \sum_{w \in N(v)} H_{e(v,w)}\, h_w^l \right)$$
- $h_v^l$: $d$-dimensional substructure vector at level $l$ around atom $v$
- $N(v)$: neighboring atoms of atom $v$
- $H_{e(v,w)}$: $d \times d$ matrix that depends only on the bond $e(v,w)$, so it applies to graphs of any size

Prediction of molecular properties using substructure features
[Figure: the vectors $h_v^1, \dots, h_v^4$ from all levels feed the prediction $y(\mathbf{h}_m)$.]
$$y(\mathbf{h}_m) = \mathrm{NN}\left( \sum_{l=1}^{L} \sum_{v \in V_m} f\left(h_v^l\right) \right)$$
where $\mathbf{h}_m$ collects the substructure vectors $h_v^l$ of molecule $m$.

Loss function for semi-supervised learning
$$\sum_{i=1}^{|M_L|} \mathrm{Loss}\left(m_i, y(\mathbf{h}_{m_i}), o_i\right) + \lambda \sum_{j=1}^{|M_L| + |M_U|} \mathrm{Reg}\left(m_j, \mathbf{h}_{m_j}\right)$$
- $|M_L|$: number of labeled samples
- $|M_U|$: number of unlabeled samples
- $o_i$: molecular property of molecule $m_i$
- $\lambda$: weight of the regularization term

Loss function to extract substructure features
A substructure vector $h_v^l$ should be close to the molecule vector $u_m$ when $h_v^l$ is computed from molecule $m$. The regularizer summed over the database is $\sum_{m \in M} \mathrm{Reg}(m, \mathbf{h}_m)$, with
$$\mathrm{Reg}(m, \mathbf{h}_m) = \sum_{l=1}^{L} \sum_{v \in V_m} \left[ \log p\left(C = 1 \mid h_v^l, u_m\right) + k\, \mathbb{E}_{h_{v'}^l \sim p_v^l}\left[ \log p\left(C = 0 \mid h_{v'}^l, u_m\right) \right] \right]$$
where $p(C = 1 \mid h_v^l, u_m) = \sigma(u_m^\top h_v^l)$ ($\sigma$ is the sigmoid function).
- $u_m$: molecule vector
- $p_v^l$: negative sampler of substructures computed from a randomly sampled molecule
- $k$: number of negative samples

Experimental Results

● Effectiveness of our substructure feature
We compare prediction accuracy using the same SVM classifier for every method, changing only the input features. The input features are trained on 90% of the samples and tested on the remaining 10%.
Datasets:
- MUTAG: 188 chemical compounds labeled according to whether or not they have a mutagenic effect on a specific bacterium
- PTC: 344 chemical compounds labeled according to their carcinogenicity in rats

Prediction accuracy (%):
| Method | MUTAG | PTC |
|---|---|---|
| node2vec [Grover & Leskovec, 16] | 72.63 ± 10.2 | 58.85 ± 8.00 |
| sub2vec [Bijaya, 17] | 61.05 ± 15.80 | 59.99 ± 6.38 |
| graph2vec [Narayanan+, 17] | 83.15 ± 9.25 | 60.17 ± 6.86 |
| WL kernel [Shervashidze+, 11] | 80.63 ± 3.07 | 59.61 ± 2.79 |
| Deep WL kernel [Yanardag & Vishwanathan, 15] | 82.95 ± 1.96 | 59.04 ± 1.09 |
| Proposed | 86.46 ± 5.97 | 62.86 ± 5.71 |

● Effectiveness of the semi-supervised approach
[Figure: prediction error vs. ratio of labeled data, semi-supervised learning vs. supervised learning. Left: solubility (log mol/L) on the solubility dataset, 1144 molecules. Right: drug efficacy (EC50 in nM) on the drug efficacy dataset, 10000 molecules.]

Conclusion
● Our hierarchical substructure extraction yields informative features for prediction.
● Semi-supervised learning significantly improves prediction accuracy. (To the best of our knowledge, this is the first study to bring a semi-supervised approach to molecular property prediction.)
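The level-wise update $h_v^{l+1} = \sigma(h_v^l + \sum_{w \in N(v)} H_{e(v,w)} h_w^l)$, the pooled readout, and the negative-sampling regularizer described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: $f$ is taken as the identity, NN and the supervised Loss are omitted, $p_v^l$ is assumed to draw uniformly from the level-$l$ vectors of one other molecule, and all features and bond matrices are hypothetical toy values. Since Reg is a sum of log-probabilities, training would maximize it (that is, minimize $-\mathrm{Reg}$); that sign convention is our assumption.

```python
from __future__ import annotations

import math
import random


def sigmoid(x: float) -> float:
    # Numerically stable logistic function sigma(x).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def log_sigmoid(x: float) -> float:
    # log(sigma(x)) without overflow or log(0).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(H: list[list[float]], h: list[float]) -> list[float]:
    return [dot(row, h) for row in H]


def message_passing(h1, neighbors, bond_matrix, num_levels):
    """Return [h^1, ..., h^L]; levels[l][v] is the d-dim vector of atom v.

    bond_matrix(v, w) returns the d x d matrix H_{e(v,w)} for that bond type.
    """
    levels = [h1]
    for _ in range(num_levels - 1):
        h = levels[-1]
        nxt = []
        for v in range(len(h)):
            acc = list(h[v])  # self term h_v^l
            for w in neighbors[v]:
                msg = matvec(bond_matrix(v, w), h[w])  # H_{e(v,w)} h_w^l
                acc = [a + b for a, b in zip(acc, msg)]
            nxt.append([sigmoid(a) for a in acc])
        levels.append(nxt)
    return levels


def readout(levels):
    # sum_l sum_v f(h_v^l); assumption: f is the identity, NN is omitted.
    pooled = [0.0] * len(levels[0][0])
    for h in levels:
        for hv in h:
            pooled = [p + x for p, x in zip(pooled, hv)]
    return pooled


def reg(levels, u, negative_levels, k, rng):
    # Reg(m, h_m); assumption: p_v^l samples uniformly from another molecule's
    # level-l vectors, and k * E[...] is estimated by summing k samples.
    total = 0.0
    for l, h in enumerate(levels):
        for hv in h:
            total += log_sigmoid(dot(u, hv))  # log p(C=1 | h_v^l, u_m)
            for _ in range(k):
                neg = rng.choice(negative_levels[l])
                total += log_sigmoid(-dot(u, neg))  # log p(C=0 | h_v'^l, u_m)
    return total


# Hypothetical toy molecule C-C=O with 2-dim atom features.
features = {"C": [1.0, 0.0], "O": [0.0, 1.0]}
bond_types = {(0, 1): "single", (1, 2): "double"}
H = {"single": [[0.5, 0.0], [0.0, 0.5]], "double": [[0.0, 1.0], [1.0, 0.0]]}


def bond_matrix(v, w):
    return H[bond_types[(min(v, w), max(v, w))]]


levels = message_passing([features[a] for a in "CCO"], [[1], [0, 2], [1]], bond_matrix, 4)
pooled = readout(levels)
# A second toy molecule (O-C) supplies the negative samples.
other = message_passing([features["O"], features["C"]], [[1], [0]], lambda v, w: H["single"], 4)
r = reg(levels, [0.3, -0.2], other, k=2, rng=random.Random(0))
```

Because every $H_{e(v,w)}$ is indexed only by bond type, the same parameters apply to molecules of any size, which is the property the poster highlights.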