
Deep learning for molecules, introduction to chainer chemistry



  1. 1. Kenta Oono (oono@preferred.jp, github: delta2323) Kosuke Nakago (nakago@preferred.jp, github: corochann) Deep learning for molecules Introduction to Chainer Chemistry
  2. 2. Table of contents 1. What is machine learning? a. Data driven approach b. Primer of deep learning (MLP/ CNN) 2. Prediction of chemical characteristics a. Rule-based approach vs. Learning-based approach b. Neural Message passing (NFP / GGNN etc.) 3. Chainer Chemistry a. Primer of Chainer b. Coding examples 4. Other topics a. Generation of chemical compounds b. Automatic chemical synthesis
  3. 3. Why machine learning? Example: Prediction of age from pictures Challenges ● What criteria can we use? ○ height, hair, clothes, physique etc.? ○ Not all criteria are perfect. ● Even if we have good criteria, how could we extract them? ○ People in pictures can have different positions, scales, and postures. ○ How can we detect each part (face, hair etc.) within a body? => It is very difficult to enumerate rules manually. Picture: Irasutoya (https://www.irasutoya.com)
  4. 4. Approach by machine learning Provide machines with a vast amount of images with age information and have them discover trends characteristic of each generation. Humans do not explicitly tell machines where in the images to look. Photo: Flickr
  5. 5. Application of machine learning (Task / Input / Output): Chemical prediction / Molecule / Chemical characteristics (HOMO etc.); Mail classification / E-mail (sentences, header) / Spam, normal, or important; Data center electricity optimization / Packets of each server / Estimated electricity demand; Web marketing / Access history, ad contents / Click or not; Surveillance camera / Video / Suspicious behavior or not
  6. 6. Categorization of machine learning algorithms ● By dataset types ● Supervised learning (with ground truth labels) ● Unsupervised learning (without ground truth labels) ● Semi-supervised learning (only part of the samples has ground truth labels) ● Reinforcement learning (reward instead of labels) ● By methods ● Classification, regression, clustering, nearest neighbors ● Others ● discriminative model vs. generative model / Bayesian vs. frequentist etc.
  7. 7. Deep Learning A general term for the subcategory of machine learning that uses models consisting of (typically many) simple and differentiable transformations. http://www.wsdm-conference.org/2016/slides/WSDM2016-Jeff-Dean.pdf
  8. 8. Multi Layer Perceptron (MLP) [Diagram: input x → f1 → hidden h → f2 → hidden k → f3 → output y, compared against the ground truth t; forward and backward passes.] Each transform consists of a fully-connected layer and an activation function. Learnable parameters: parameter matrices W1, W2 and bias vectors b1, b2. Forward propagation: h = f1(x) = Sigmoid(W1 x + b1); k = f2(h) = Sigmoid(W2 h + b2); y = f3(k) = SoftMax(k), i.e. yi = exp(ki) / Σj exp(kj). Training dataset: feature vectors x1, x2, …, xN and ground truth labels t1, t2, …, tN. The output is compared with the ground truth to evaluate the difference.
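To make the forward propagation above concrete, here is a minimal NumPy sketch of the same three transformations; the layer sizes and random weights are arbitrary illustrative choices, not values from the slides.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):
        e = np.exp(z - z.max())          # subtract max for numerical stability
        return e / e.sum()

    # Arbitrary sizes: input dim N = 4, hidden dim H = 8, output classes M = 3
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
    W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

    def forward(x):
        h = sigmoid(W1 @ x + b1)   # h = f1(x): fully-connected layer + sigmoid
        k = sigmoid(W2 @ h + b2)   # k = f2(h): fully-connected layer + sigmoid
        return softmax(k)          # y = f3(k): class probabilities

    x = rng.normal(size=4)
    print(forward(x))              # probabilities that sum to 1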
  9. 9. Fully connected layer: y = Wx + b (affine transformation) ● Input: vector x of size N ● Output: vector y = Wx + b of size M ● Learnable parameters: W (weight matrix of size M x N) and b (bias vector of size M)
  10. 10. Activation function ● A function (usually) without learnable parameters, used to introduce non-linearity ● Input: vector (or tensor) x = (x1, …, xn) ● Output: vector (or tensor) y = (y1, …, yn) with yi = σ(xi) (i = 1, …, n) Examples of σ ● Sigmoid(x) = 1 / (1 + exp(-x)) ● tanh(x) ● ReLU(x) = max(0, x) ● LeakyReLU(x) = x (x > 0), ax (x ≤ 0), where a is a small fixed positive constant
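A few of these activation functions written out in NumPy (a hedged sketch; the LeakyReLU slope value is just a common illustrative choice):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def relu(x):
        return np.maximum(0.0, x)

    def leaky_relu(x, a=0.01):            # a: small fixed positive slope for x < 0
        return np.where(x > 0, x, a * x)

    x = np.array([-2.0, -0.5, 0.0, 1.5])
    print(sigmoid(x), np.tanh(x), relu(x), leaky_relu(x))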
  11. 11. Convolutional Neural Network (CNN) [LeCun+98] • A neural network consisting of convolutional layers and pooling layers • Many variants: AlexNet, VGG, Inception, GoogLeNet, ResNet etc. • Widely used in image recognition and recently applied to biology and chemistry LeNet-5 [LeCun+98] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
  12. 12. Convolution operation (stride = 1 case) [Figure: a binary input image is convolved with the 3x3 filter [[1 0 1], [0 1 0], [1 0 1]]; sliding the filter one pixel at a time produces a 4x4 output (values 4 3 4 1 / 2 4 3 3 / 2 3 4 1 / 2 2 1 1).]
  13. 13. Convolution operation (stride = 3 case) [Figure: the same input and filter, but the filter is moved 3 pixels at a time; the output shrinks to 2x2 (values 4 1 / 2 1).] A NumPy sketch of both cases follows this slide.
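A minimal NumPy sketch of the sliding-window operation shown on the two slides above. Only the 3x3 filter is taken from the figure; the random binary image is an illustrative stand-in for the slide's input.

    import numpy as np

    def conv2d(image, kernel, stride=1):
        # Valid cross-correlation (the "convolution" used in CNNs), single channel
        kh, kw = kernel.shape
        oh = (image.shape[0] - kh) // stride + 1
        ow = (image.shape[1] - kw) // stride + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
                out[i, j] = (patch * kernel).sum()
        return out

    kernel = np.array([[1, 0, 1],
                       [0, 1, 0],
                       [1, 0, 1]])
    image = (np.random.default_rng(0).random((6, 6)) > 0.5).astype(float)
    print(conv2d(image, kernel, stride=1).shape)  # (4, 4), as in the stride-1 slide
    print(conv2d(image, kernel, stride=3).shape)  # (2, 2), as in the stride-3 slide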
  14. 14. Feature extraction by filters
  15. 15. Convolutional layer Stack several filters whose parameters are learnable
  16. 16. Stacking convolutional layers Convolution layer with stride k generates the output whose height & width are approximately k times smaller.
  17. 17. Pooling layers http://cs231n.github.io/convolutional-networks/
  18. 18. How can we generalize convolution operations to arbitrary graphs? Images : grid graph Molecules : arbitrary graph
  19. 19. Table of contents 1. What is machine learning? a. Data driven approach b. Primer of deep learning (MLP / CNN) 2. Prediction of chemical characteristics a. Rule-based approach vs. Learning-based approach b. Neural Message passing (NFP / GGNN etc.) 3. Chainer Chemistry a. Primer of Chainer b. Coding examples 4. Other topics a. Generation of chemical compounds b. Automatic chemical synthesis
  20. 20. Chemical prediction - Two approaches Quantum simulation: theory-based approach, e.g. DFT (Density Functional Theory). → Pros: precision is guaranteed. Cons: high calculation cost. Machine learning: data-based approach; learn from known compounds' properties and predict new compounds' properties. → Pros: low cost, high-speed calculation. Cons: no precision guarantee. "Neural message passing for quantum chemistry", Gilmer et al.
  21. 21. Extended Connectivity Fingerprint (ECFP) Converts a molecule into a fixed-length bit representation (a small RDKit sketch follows this slide). Pros - Calculation is fast - Shows the presence of particular substructures Cons - Bit collision: two (or more) different substructural features can be represented by the same bit position https://chembioinfo.com/2011/10/30/revisiting-molecular-hashed-fingerprints/ https://docs.chemaxon.com/display/docs/Extended+Connectivity+Fingerprint+ECFP
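A small sketch of computing such a fingerprint with RDKit (assumed to be installed); RDKit's Morgan fingerprint is its ECFP-style implementation, and radius 2 roughly corresponds to ECFP4. The aspirin SMILES is just an illustrative input.

    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')   # aspirin, as an example
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)
    print(fp.GetNumOnBits())          # number of set bits in the fixed-length vector
    print(list(fp.GetOnBits())[:5])   # positions of the first few substructure bits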
  22. 22. Problems of conventional methods 1. The input representation is not unique; the result depends on the representation of the input. e.g. the SMILES strings CC#C and C#CC describe the same molecule. 2. Order invariance is not guaranteed – the representation is not guaranteed to be invariant to relabeling (i.e. permutation of atom indices) of molecules.
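The first problem can be seen directly with RDKit (a hedged sketch assuming RDKit is installed): the two SMILES strings from the slide parse to the same molecule and canonicalize to the same string, but a model consuming raw SMILES would see two different inputs.

    from rdkit import Chem

    for smi in ('CC#C', 'C#CC'):                  # two spellings of propyne
        canonical = Chem.MolToSmiles(Chem.MolFromSmiles(smi))
        print(smi, '->', canonical)               # both map to the same canonical SMILES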
  23. 23. How graph convolution works [Figure: a CNN maps an image to a class label; analogously, graph convolution maps a molecular graph to a chemical property.]
  24. 24. Atom feature embedding: 1 – Man-made features [Table: each atom (C, N, O) is represented by a hand-crafted feature vector, e.g. a one-hot atom-type encoding plus values such as atomic number, charge, and chirality.] Molecular Graph Convolutions: Moving Beyond Fingerprints. Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, Patrick Riley. arXiv:1603.00856
  25. 25. Atom feature embedding: 2 – Embed in vector space [Figure: each atom type is assigned (initially at random) to a position in a vector space; the embedding matrix W is a learnable parameter.]
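A minimal NumPy sketch of this embedding step; the table size, dimension, and example atoms are arbitrary illustrative choices.

    import numpy as np

    n_atom_types, dim = 100, 16
    W = np.random.default_rng(0).normal(size=(n_atom_types, dim))  # learnable embedding matrix

    atomic_numbers = np.array([6, 7, 8, 6])   # C, N, O, C of some molecule
    h0 = W[atomic_numbers]                    # (n_atoms, dim) initial atom features
    print(h0.shape)                           # (4, 16)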
  26. 26. Graph Convolution: update each node's (atom's) feature The feature of each node is updated (several times) by the graph convolution operation. Han Altae-Tran, Bharath Ramsundar, Aneesh S. Pappu, & Vijay Pande (2017). Low Data Drug Discovery with One-Shot Learning. ACS Cent. Sci., 3 (4)
  27. 27. Graph Gather: extract the whole-graph (molecule) feature The updated features of all nodes are finally combined into the graph's (molecule's) feature by the Graph Gather operation. Han Altae-Tran, Bharath Ramsundar, Aneesh S. Pappu, & Vijay Pande (2017). Low Data Drug Discovery with One-Shot Learning. ACS Cent. Sci., 3 (4)
  28. 28. Unified view of graph convolution Many message-passing algorithms (NFP, GGNN, Weave etc.) are formulated as the iterative application of Update and Readout functions [Gilmer et al. 17]. Update Readout Aggregates neighborhood information and updates node representations. Aggregates all node representations and updates the final output. Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., & Dahl, G. E. (2017). Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212.
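A schematic NumPy sketch of this Update/Readout view; the linear update and sum readout are toy stand-ins for the model-specific choices (NFP, GGNN, Weave, ...), not any particular published variant.

    import numpy as np

    def message_passing(h, adj, update, readout, n_steps=3):
        # h: (n_atoms, dim) node features, adj: (n_atoms, n_atoms) adjacency matrix
        for _ in range(n_steps):
            messages = adj @ h           # aggregate neighbour features (sum over neighbours)
            h = update(h, messages)      # Update: new representation for every node
        return readout(h)                # Readout: aggregate all nodes into one output

    dim = 8
    rng = np.random.default_rng(0)
    W = rng.normal(size=(dim, dim))
    update = lambda h, m: np.tanh(m @ W.T)        # toy update
    readout = lambda h: h.sum(axis=0)             # toy readout

    h0 = rng.normal(size=(5, dim))                # a 5-atom molecule
    adj = np.eye(5, k=1) + np.eye(5, k=-1)        # chain-shaped molecular graph
    print(message_passing(h0, adj, update, readout).shape)   # (8,)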
  29. 29. Graph convolution neural network variants - NFP: Neural Fingerprint - GGNN: Gated Graph Neural Network - WeaveNet: Molecular Graph Convolutions - SchNet: A continuous-filter convolutional NN “Convolutional Networks on Graphs for Learning Molecular Fingerprints” https://arxiv.org/abs/1509.09292
  30. 30. NFP: Neural Fingerprint Message passing - update feature r Readout - extract output f from r Convolution David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alan Aspuru-Guzik, and Ryan P Adams. Convolutional networks on graphs for learning molecular fingerprints.
  31. 31. NFP: Neural Fingerprint [Figure: molecule with per-atom features h1, …, h10.] Update: h'7 = σ(W3 (h7 + h6 + h8 + h9)), h'3 = σ(W2 (h3 + h2 + h4)) The graph convolution operation depends on the degree of each atom → bonding-type information is not utilized.
  32. 32. NFP: Neural Fingerprint [Figure: molecule with per-atom features h1, …, h10.] Readout: R = ∑i softmax(W hi) The readout operation is basically a simple sum over the atoms → no selective operation / attention mechanism is adopted. (A NumPy sketch of the update and readout follows this slide.)
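A compact NumPy sketch of these two NFP operations (degree-dependent update and softmax-sum readout); the weight shapes are illustrative, and the real implementation in Chainer Chemistry differs in detail.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def nfp_update(h, adj, W_by_degree):
        # Each atom sums itself and its neighbours, then applies the weight matrix
        # selected by its degree, followed by a nonlinearity.
        deg = adj.sum(axis=1).astype(int)
        agg = h + adj @ h
        return np.tanh(np.stack([W_by_degree[d] @ agg[i] for i, d in enumerate(deg)]))

    def nfp_readout(h, W_out):
        # Softmax over fingerprint channels for each atom, then a plain sum over atoms.
        return softmax(h @ W_out.T).sum(axis=0)

    rng = np.random.default_rng(0)
    dim, fp_dim, n_atoms = 8, 16, 5
    W_by_degree = {d: rng.normal(size=(dim, dim)) for d in range(5)}
    W_out = rng.normal(size=(fp_dim, dim))
    h = rng.normal(size=(n_atoms, dim))
    adj = np.eye(n_atoms, k=1) + np.eye(n_atoms, k=-1)
    print(nfp_readout(nfp_update(h, adj, W_by_degree), W_out).shape)   # (16,)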
  33. 33. GGNN: Gated Graph Neural Network [Figure: molecule with per-atom features h1, …, h10.] Update: h'7 = GRU(h7, W1 h6 + W2 h8 + W1 h9), h'3 = GRU(h3, W1 h2 + W2 h4) The graph convolution operation depends on the bonding type of each atom pair. GRU: Gated Recurrent Unit. Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493, 2015.
  34. 34. GGNN: Gated Graph Neural Network [Figure: molecule with per-atom features h1, …, h10.] Readout: R = ∑v σ(i(hv, hv0)) ⦿ j(hv); simplified version: R = ∑v σ(Wi hv) ⦿ Wj hv. Here i and j represent functions (neural networks) and σ is the sigmoid non-linearity. The readout operation contains a selective operation (gating).
  35. 35. Weave: Molecular Graph Convolutions ● The Weave module updates each atom feature using the features of the atom pairs involving that atom. A: atom feature, P: feature of an atom pair ● P → A operation: g() is a function chosen for order invariance; sum() is used in the paper. Molecular Graph Convolutions: Moving Beyond Fingerprints. Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, Patrick Riley. arXiv:1603.00856
  36. 36. SchNet: A continuous-filter convolutional neural network 1. All atom-pair distances ||ri - rj|| are used as input. 2. An energy-conservation condition can additionally be used to constrain the model for the energy prediction task. Kristof Schütt, Pieter-Jan Kindermans, Huziel Enoc Sauceda Felix, Stefan Chmiela, Alexandre Tkatchenko, and Klaus-Robert Müller. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions.
  37. 37. Comparison between graph convolution networks (NFP / GGNN / Weave / SchNet) ● Atom feature extraction: man-made features or embeddings, in all four models. ● Graph convolution strategy: adjacent atoms only (NFP, GGNN) vs. all atom-atom pairs (Weave, SchNet). ● How connection information is represented: degree (NFP), bond type (GGNN), man-made pair features such as bond type and distance (Weave), distance (SchNet).
  38. 38. Example: IT Drug Discovery Contest Task • Find new seed compounds for a target protein (Sirtuin 1) from 2.5 million compounds by IT technologies Rules • Each team needs to prepare its own data, such as training datasets. • Each team can submit up to 400 candidate compounds. • The judges check all submitted compounds with a 2-stage biological experiment: – Thermal Shift Assay (TSA) – Inhibitory assay → IC50 measurement Contest website (Japanese): http://www.ipab.org/eventschedule/contest/contest4
  39. 39. Our result (ours vs. all 18 participating teams) ● 1st screening (TSA): 23 / 200 (11.5%) for us vs. 69 / 3559 (1.9%) overall ● 2nd screening (IC50): 1 hit for us vs. 5 hits overall We found one hit compound and won one of the grand prizes (the IPAB prize).
  40. 40. Extension to semi-supervised learning Compute representations of subgraphs inductively with neural message passing (→) Optimize the representation in an unsupervised manner, in the same way as Paragraph Vector (↓) Nguyen, H., Maeda, S. I., & Oono, K. (2017). Semi-supervised learning of hierarchical representations of molecules using neural message passing. arXiv preprint arXiv:1711.10168.
  41. 41. Table of contents 1. What is machine learning? a. Data driven approach b. Primer of deep learning (MLP/ CNN / Graph convolution network) 2. Prediction of chemical characteristics a. Rule-based approach vs. Learning-based approach b. Neural Message passing (NFP / GGNN etc.) 3. Chainer Chemistry a. Primer of Chainer b. Coding examples 4. Other topics a. Generation of chemical compounds b. Automatic chemical synthesis
  42. 42. How can we incorporate ML into Chemistry and Biology? Problems • Optimized graph convolution algorithms are hard to implement from scratch. • ML and Chemistry/Biology researchers sometimes speak different “languages”. Solution: create tools so that … • Chemistry/Biology researchers do not need to bother with the details of DL algorithms and can concentrate on their research. • ML and Chemistry researchers can work in collaboration. → We are developing Chainer Chemistry Picture: Irasutoya (https://www.irasutoya.com)
  43. 43. Chainer: a Python framework that lets researchers quickly implement, train, and evaluate deep learning models. [Figure: workflow from dataset to network design to training and evaluation.]
  44. 44. Speed up research and development of deep learning and its applications. (https://chainer.org) Features • Build DL models as a Python program → can write complex networks (loops, branches etc.) easily • Define-by-Run: dynamic model construction → can make full use of Python stack traces in debugging → can support data-dependent neural networks natively • CuPy: NumPy-like GPU array library → can write CPU/GPU-agnostic code Basic information • First release: June 2015 • Version: v3.3.0 (stable), v4.0.0b3 (develop) • License: MIT • Language: Python
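A tiny illustration of the CPU/GPU-agnostic style mentioned above, using chainer.cuda.get_array_module to pick NumPy or CuPy depending on where the array lives (a hedged sketch; on CPU it simply falls back to NumPy).

    import numpy as np
    import chainer

    def normalize(x):
        xp = chainer.cuda.get_array_module(x)   # numpy for CPU arrays, cupy for GPU arrays
        return (x - xp.mean(x)) / (xp.std(x) + 1e-8)

    print(normalize(np.arange(5, dtype=np.float32)))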
  45. 45. Example: Build and train a convolutional network

      import chainer
      import chainer.links as L
      import chainer.functions as F

      class LeNet5(chainer.Chain):
          def __init__(self):
              super(LeNet5, self).__init__()
              with self.init_scope():
                  self.conv1 = L.Convolution2D(1, 6, 5, 1)
                  self.conv2 = L.Convolution2D(6, 16, 5, 1)
                  self.conv3 = L.Convolution2D(16, 120, 4, 1)
                  self.fc4 = L.Linear(None, 84)
                  self.fc5 = L.Linear(84, 10)

          def __call__(self, x):
              h = F.sigmoid(self.conv1(x))
              h = F.max_pooling_2d(h, 2, 2)
              h = F.sigmoid(self.conv2(h))
              h = F.max_pooling_2d(h, 2, 2)
              h = F.sigmoid(self.conv3(h))
              h = F.sigmoid(self.fc4(h))
              return self.fc5(h)
  46. 46. Example: Build and train a convolutional network

      from chainer import iterators, optimizers, training

      model = LeNet5()
      model = L.Classifier(model)

      # Dataset is a list! ([] to access, having __len__)
      dataset = [(x1, t1), (x2, t2), ...]

      # Iterator that returns a mini-batch retrieved from the dataset
      it = iterators.SerialIterator(dataset, batch_size=32)

      # Optimization method (you can easily try various methods by changing SGD to
      # MomentumSGD, Adam, RMSprop, AdaGrad, etc.)
      opt = optimizers.SGD(lr=0.01)
      opt.setup(model)

      updater = training.StandardUpdater(it, opt, device=0)  # device=-1 if you use CPU
      trainer = training.Trainer(updater, stop_trigger=(100, 'epoch'))
      trainer.run()
  47. 47. Add-on packages for Chainer
  48. 48. ChainerUI
  49. 49. Chainer Chemistry Chainer extension library for Biology and Chemistry (http://chainer-chemistry.readthedocs.io/)
  50. 50. Technological stack [Diagram, top to bottom:] ● Example: training and prediction with the QM9 / Tox21 datasets ● Model: graph convolution NNs (NFP, GGNN, SchNet); pretrained models (TBD) ● Layer/Function: GraphLinear etc. ● Preprocessor (feature extractor): preprocessing for each model ● Dataset: file parsers (SDF file, CSV file); QM9 and Tox21 datasets
  51. 51. Chainer Chemistry Chainer extension library for Biology and Chemistry Basic information release:12/14/2017, version: v0.1.0, license: MIT, language: Python Features • State-of-the-art deep learning neural network models (especially graph convolutions) for chemical molecules (NFP, GGNN, Weave, SchNet etc.) • Preprocessors of molecules tailored for these models • Parsers for several standard file formats (CSV, SDF etc.) • Loaders for several well-known datasets (QM9, Tox21 etc.) (http://chainer-chemistry.readthedocs.io/)
  52. 52. Dataset introduction - Tox21 # of samples: train 11757, validation 295, test 645 Labels - the following 12 types of toxicity are included: 'NR-AR', 'NR-AR-LBD', 'NR-AhR', 'NR-Aromatase', 'NR-ER', 'NR-ER-LBD', 'NR-PPAR-gamma', 'SR-ARE', 'SR-ATAD5', 'SR-HSE', 'SR-MMP', 'SR-p53' Examples: SMILES: C(=O)C1(O)Cc2c(O)c3c(c(O)c2C(OC2CC(N)C(O)C(C)O2)C1)C(=O)c1c(O)cccc1C3=O LABEL: [ 0 1 -1 1 -1 1 -1 -1 1 -1 1 1] SMILES: CCCOc1ccc(C(=O)CCN2CCCCC2)cc1.Cl LABEL: [ 0 0 0 -1 1 0 0 -1 -1 -1 0 0] SMILES: CCOP(=S)(OCC)SC(CCl)N1C(=O)c2ccccc2C1=O LABEL: [ 0 0 1 0 1 1 0 1 0 0 -1 -1] SMILES: O=c1c(O)c(-c2ccc(O)cc2)oc2cc(O)cc(O)c12 LABEL: [ 0 0 1 -1 1 1 -1 0 0 0 1 0]
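A hedged sketch of loading this dataset with Chainer Chemistry; the function names mirror the QM9 example two slides below, but the exact module paths and signatures may differ between versions, so treat them as assumptions.

    from chainer_chemistry import datasets as D
    from chainer_chemistry.dataset.preprocessors import preprocess_method_dict

    preprocessor = preprocess_method_dict['ggnn']()      # any supported preprocessor
    train, val, test = D.get_tox21(preprocessor)         # assumed to return the three splits
    print(len(train), len(val), len(test))               # ~11757 / 295 / 645 as listed above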
  53. 53. Dataset introduction - QM9 # of samples: 133,885 Labels - the following properties are included: 'A', 'B', 'C', 'mu', 'alpha', 'homo', 'lumo', 'gap', 'r2', 'zpve', 'U0', 'U', 'H', 'G', 'Cv' Examples: SMILES: NC1=NCCC(=O)N1 LABEL: [ 3.51 1.93 1.29 2.54 64.1 -0.236 -2.79e-03 2.34e-01 900.7 0.12 -396.0 -396.0 -396.0 -396.0 26.9] SMILES: CN1CCC(=O)C1=N LABEL: [3.285 2.062 1.3 4.218 68.69 -0.224 -0.056 0.168 914.65 0.131 -379.959 -379.951 -379.95 -379.992 27.934] SMILES: N=C1OC2CC1C(=O)O2 LABEL: [2.729 1.853 1.474 4.274 61.94 -0.282 -0.026 0.256 887.402 0.104 -473.876 -473.87 -473.869 -473.907 24.823] SMILES: C1N2C3C4C5OC13C2C5 LABEL: [ 3.64 2.218 1.938 0.863 69.48 -0.232 0.074 0.306 756.356 0.128 -400.633 -400.628 -400.627 -400.662 23.434]
  54. 54. Example: HOMO prediction by NFP with the QM9 dataset - dataset preprocessing (for the NFP network)

      # Imports (module paths may differ slightly between Chainer Chemistry versions)
      from chainer.datasets import split_dataset_random
      from chainer_chemistry import datasets as D
      from chainer_chemistry.dataset.preprocessors import preprocess_method_dict
      from chainer_chemistry.datasets import NumpyTupleDataset

      preprocessor = preprocess_method_dict['nfp']()
      dataset = D.get_qm9(preprocessor, labels='homo')

      # Cache dataset for second use
      NumpyTupleDataset.save('input/nfp_homo/data.npz', dataset)

      train_data_ratio = 0.7  # e.g. 70% of the samples for training
      train_data_size = int(len(dataset) * train_data_ratio)
      train, val = split_dataset_random(dataset, train_data_size)
  55. 55. Example: HOMO prediction by NFP with the QM9 dataset - model definition

      class GraphConvPredictor(chainer.Chain):
          def __init__(self, graph_conv, mlp):
              super(GraphConvPredictor, self).__init__()
              with self.init_scope():
                  self.graph_conv = graph_conv
                  self.mlp = mlp

          def __call__(self, atoms, adjs):
              x = self.graph_conv(atoms, adjs)
              x = self.mlp(x)
              return x

      model = GraphConvPredictor(NFP(16, 16, 4), MLP(16, 1))

  Once a graph neural network is built, training is the same as for ordinary Chainer models.
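The slide above ends by noting that training is the same as for ordinary Chainer models. As a hedged sketch, the regressor wrapper below (a hypothetical helper, not part of Chainer Chemistry's documented API) plugs the GraphConvPredictor into the standard training loop from slide 46 with a mean-squared-error loss.

    import chainer
    import chainer.functions as F
    from chainer import iterators, optimizers, training

    class Regressor(chainer.Chain):
        # Hypothetical wrapper: computes the MSE loss between prediction and target.
        def __init__(self, predictor):
            super(Regressor, self).__init__()
            with self.init_scope():
                self.predictor = predictor

        def __call__(self, atoms, adjs, t):
            return F.mean_squared_error(self.predictor(atoms, adjs), t)

    train_iter = iterators.SerialIterator(train, batch_size=32)
    opt = optimizers.Adam()
    opt.setup(Regressor(model))
    updater = training.StandardUpdater(train_iter, opt, device=-1)   # -1: CPU
    training.Trainer(updater, stop_trigger=(5, 'epoch')).run()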
  56. 56. Future work • Primitive operations – GraphConv, GraphPool, GraphGather • Graph convolution models – follow state-of-the-art graph convolutional neural networks • Pretrained models – we do not plan to guarantee reproducibility of the original papers, though • Off-the-shelf models – neural message passing, 3D convolution, generative models etc. • Datasets – MUTAG, MoleculeNet etc.
  57. 57. Table of contents 1. What is machine learning? a. Data driven approach b. Primer of deep learning (MLP/ CNN / Graph convolution network) 2. Prediction of chemical characteristics a. Rule-based approach vs. Learning-based approach b. Neural Message passing (NFP / GGNN etc.) 3. Chainer Chemistry a. Primer of Chainer b. Coding examples 4. Other topics (5 min.) a. Generation of chemical compounds b. Automatic chemical synthesis
  58. 58. From prediction to generation of molecules Prediction: find molecules with desired properties from given compound libraries. Generation: produce molecules that are not in the libraries but have the desired properties.
  59. 59. Molecule generation with VAE [Gómez-Bombarelli+16] ● Encode and decode molecules represented as SMILES with a VAE, in a seq2seq manner. ● The latent representation can be used for semi-supervised learning. ● We can use the learned model to find molecules with a desired property by optimizing the representation in latent space and decoding it. Generated molecules are not guaranteed to be syntactically valid :( Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., ... & Aspuru-Guzik, A. (2016). Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science.
  60. 60. Grammar VAE [Kusner+17] Encode: convert a molecule to a parse tree to get a sequence of production rules and feed the sequence to an RNN-VAE. Decode: generate a sequence of production rules of the SMILES grammar (a context-free grammar). Generated molecules are guaranteed to be syntactically valid! Kusner, M. J., Paige, B., & Hernández-Lobato, J. M. (2017). Grammar Variational Autoencoder. arXiv preprint arXiv:1703.01925.
  61. 61. Conclusion • Data-based approach for chemical property prediction is getting more attention. • New material/drug discovery research may be accelerated by deep learning technology.
