
Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach



Thirteenth International Conference on Intelligent Computing (ICIC2017)
R13: Protein and Gene Bioinformatics: Analysis, Algorithms and Applications, Aug 9, 2017.

Masahito Ohue, Takuro Yamazaki, Tomohiro Ban, Yutaka Akiyama.
In Proceedings of the Thirteenth International Conference on Intelligent Computing (ICIC2017) (Lecture Notes in Computer Science), 10362, 549-558, Liverpool, UK, August 7-10, 2017
https://link.springer.com/chapter/10.1007/978-3-319-63312-1_48



  1. Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach
     Masahito Ohue, Takuro Yamazaki, Tomohiro Ban, Yutaka Akiyama
     Department of Computer Science, School of Computing, Tokyo Institute of Technology, Japan
     Thirteenth International Conference on Intelligent Computing (ICIC2017), R13: Protein and Gene Bioinformatics: Analysis, Algorithms and Applications, Aug 9, 2017
  2. Background
     • Drug discovery and development
       – Takes >10 years and >2 billion US dollars (Paul SM, et al. Nat Rev Drug Discov. 2010, 9(3):203)
       – Computational approaches may reduce costs: molecular docking, QSAR/QSPR modeling, toxicity prediction, compound-protein interaction prediction
  3. Compound-Protein Interaction Prediction
     • Compound-Protein Interaction (CPI) prediction (2008-)
       – a.k.a. drug-target interaction prediction
       – Predicts whether a query compound and protein interact (1) or not (0), using machine learning (ML) on compound, protein, and interaction information
     • Called the “chemogenomics approach”
       – It can help find new targets, side effects, toxicity, etc.
     [Figure: bipartite network of compounds c1-c6 and proteins p1-p5; an unknown interaction is treated as a negative label (0).]
  4. CPI Prediction Problem
     Notation: interaction matrix; feature vectors of compounds (e.g. MACCS keys, PubChem fingerprint); feature vectors of proteins (e.g. PFAM fingerprint, amino acid k-mers)
     Example interaction matrix:
           p1  p2  p3  p4  p5
       c1   1   1   0   0   0
       c2   0   0   1   0   0
       c3   0   1   1   0   0
       c4   0   0   1   0   0
       c5   0   0   0   0   0
       c6   0   0   0   0   0
  5. Major Approaches for CPI Prediction
     • Basic concept: “Similar compounds/proteins have similar interactions”
       a) Kernel-based machine learning: kernel functions on compounds and proteins
       b) Matrix factorization: decomposition of the interaction matrix into feature vectors of compounds and proteins
     → This work uses kernel-based machine learning.
  6. Pairwise Kernel Method (PKM) (Jacob & Vert. Bioinformatics 2008)
     Training data: compound-protein interaction network (interaction matrix), compound kernel (similarity matrix), protein kernel (similarity matrix)
     → Pairwise kernel (kernel trick) → Learning (SVM, etc.) → Prediction model
  7. Gaussian Interaction Profile (GIP) (van Laarhoven, et al. Bioinformatics 2011)
     Training data: compound-protein interaction network (interaction matrix), compound kernel (similarity matrix), protein kernel (similarity matrix)
     → Integration with GIP kernels (multiple kernel scheme) → Learning (SVM, etc.) → Prediction model
     Idea: compounds/proteins with similar interaction patterns have similar interactions.
  8. Gaussian Interaction Profile (GIP)
     [Figure: the interaction profiles (rows and columns of the interaction matrix) define a compound GIP kernel and a protein GIP kernel, both Gaussian kernels.]
     • The GIP method is more accurate than using compound/protein similarities alone.
     Problem: ‘0’ and ‘1’ are treated almost the same.
     • Two all-‘0’ vectors obtain the maximum kernel value, the same as two all-‘1’ vectors.
     • A ‘1’ is experimentally determined, whereas a ‘0’ is an unknown interaction.
     • ‘1’-information should therefore be considered more reliable than ‘0’.
  9. Idea from Link Mining
     • Idea: ‘1’-information is more important
       → Network theory, graph mining, link mining: data mining on the World Wide Web, social networks, biological networks, etc.
     • Link indicators from the field of link mining were applied to the bipartite CPI network.
     [Figure: a general graph (network) of nodes and links (edges), and a bipartite graph (network) with nodes in group A and group B.]
  10. Link Indicators
      [Figure: link indicators are calculated on the bipartite CPI network from the interaction profiles in the interaction matrix.]
      • Three link indicators were used in this study (Jaccard index, cosine similarity, and LHN) because they yield positive definite kernels.*
  11. Proposed Method: Link Indicator Kernel (LIK)
      [Figure: the interaction profiles define a compound link indicator kernel and a protein link indicator kernel (LIKs).]
      • Two all-‘0’ vectors obtain the minimum value.
      • ‘1’-information is considered more important than ‘0’.*
  12. CPI Prediction Method Summary
      Training data: compound-protein interaction network (interaction matrix and interaction profiles)
      Kernels: compound kernel (similarity matrix) and protein kernel (similarity matrix), plus compound kernel (GIP/LIK) and protein kernel (GIP/LIK) computed from the interaction profiles
      → Learning (SVM, etc.) → Prediction model
  13. Benchmarking
      • Dataset
        – General benchmark dataset by Yamanishi et al. (Yamanishi, et al. Bioinformatics, 2008; http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/)
        – Contains 4 CPI networks
        – Similarity matrices (similarity kernels) are precomputed
      • Prediction methods
        – PKM with similarity kernels + GIP (conventional)
        – PKM with similarity kernels + LIK (Jaccard/cosine/LHN) (proposed)
                          Nuclear Receptor   GPCR   Ion Channel   Enzyme
        #compounds                      54    223           210      445
        #proteins                       26     95           204      664
        #interactions                   90    635          1476     2926
        Density                      6.41%  3.00%         3.45%    0.99%
  14. Evaluation (According to Ding’s benchmarking) (Ding, et al. Brief Bioinform 2014)
      • 3 types of cross-validation (CV): compound-wise CV, protein-wise CV, pairwise CV
        [Figure: example interaction matrices (c1-c6 × p1-p6) for each CV type, with ‘?’ marking held-out entries: whole rows in compound-wise CV, whole columns in protein-wise CV, scattered entries in pairwise CV.]
      • AUROC and AUPR
        [Figure: ROC curve (TP rate vs. FP rate, area = AUROC) and precision-recall curve (precision vs. recall, area = AUPR).]
        – Perfect prediction → AUROC = 1; random prediction → AUROC = 0.5 (diagonal line)
        – Perfect prediction → AUPR = 1; random prediction → AUPR = density (avg. AUPR ≈ 0.035)
        – CPIs have far fewer positives than negatives, so FPs should be weighted more heavily. AUPR penalizes FPs more than AUROC does.
        → AUPR is more important than AUROC.
      * 10-fold CV was repeated 5 times with random splits, and the accuracies (AUROC, AUPR) were averaged.
  15. Prediction Accuracy (Cross-Validations)
      [Figure: two panels, AUPR and AUROC, each averaged over the 4 datasets, for compound-wise, protein-wise, and pairwise CV; methods: Jaccard index, cosine similarity, and LHN (LIKs), GIP, and random; higher is better.]
      → LIKs are better than GIP, especially in compound-wise and protein-wise CVs.
  16. Computational Time
      • Computational complexity
        – #compounds = , #proteins =
        – PKM learning:
        – Calculation of LIKs:
        – Total:
      • Experimental computational time
                                     Nuclear Receptor    GPCR   Ion Channel   Enzyme
        Conventional (PKM) [sec]               0.0680    4.86          24.1      232
        Proposed (LIK) [sec]                   0.0850    5.17          24.8      239
        Increase rate (%)                         25%    6.4%          2.9%     3.3%
      → Almost the same calculation time as PKM (datasets ordered from small to large).
  17. Conclusion
      • We proposed the Link Indicator Kernel (LIK) for CPI prediction.
      • Compared with GIP, LIK needed almost the same calculation time and improved accuracy.
        – In particular, predictions for novel compounds and novel proteins were good.
        – Overall, LIK with cosine similarity was the most accurate.
        – LIK’s different handling of ‘0’ and ‘1’ may explain this success.
      • LIK can also be applied to GIP-derived methods such as WNNGIP, KronRLS-MKL, etc.
      • Future work
        – Hyperparameter search becomes a bottleneck in the CPI problem, but it can be accelerated with Bayesian optimization*.
        – It may be better to treat unknown interactions as unlabeled. Exploring the applicability of positive-unlabeled learning** is of interest.
      *Ban, Ohue, Akiyama. (submitted)
      **Lan, et al. Predicting drug-target interaction using positive-unlabeled learning. Neurocomputing 206, 2016.
  18. Acknowledgements
      This work was partially supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI (grant nos. 24240044 and 15K16081), and Core Research for Evolutional Science and Technology (CREST) “Extreme Big Data” (grant no. JPMJCR1303) from the Japan Science and Technology Agency (JST).
      Akiyama Lab members, Tokyo Tech. Takuro Yamazaki: former student of our lab, currently a graduate student at the University of Tokyo, Japan.
  19. Supplements
  20. LIKs are Positive Definite Kernels
      • Proof: cosine similarity & LHN
        – Use a property of positive definite kernels: let k be a positive definite kernel and f an arbitrary real-valued function. Then the kernel f(x) k(x, y) f(y) is also positive definite.
        – The inner product is positive definite.
      • Proof: Jaccard index
        – Previously proved to be positive definite by Bouchard et al. (Bouchard, et al. A proof for the positive definiteness of the Jaccard index matrix. Int. J. Approx. Reason. 54: 615-626, 2013.)
  21. Pairwise Kernel
      • Normally, an ML scheme requires a feature vector for each compound-protein pair.
      • In the PKM, the feature vector of a pair, Φ(c, p), is defined as the tensor product φ(c) ⊗ ψ(p) of the compound map φ(c) and the protein map ψ(p).
      • The pairwise kernel between two compound-protein pairs (c, p) and (c', p') is then K((c, p), (c', p')) = K_c(c, c') K_p(p, p').
      → Only the similarity matrices (kernels) are used; the feature vectors of the pairs are never computed explicitly.
  22. Observed Distribution of Link Indicator Frequency
      [Figure panels: moderate distribution; immoderate distribution.]
  23. Overall Prediction Accuracy
      Overall prediction accuracy for each CPI prediction method in 10-fold CV tests. The AUPR and AUROC values are averaged over 3 CV types and 4 datasets (an average of 12 AUPR/AUROC values). wk: multiple kernel weight.
      • Cosine similarity with wk = 0.5 showed the best performance.
      • LIK was more accurate than GIP overall.
  24. Settings for Learning
      • SVM
        – Cost parameter C = {0.1, 1, 10, 100}
        – Multiple kernel weight wk = {0.1, 0.3, 0.5, 1}
      • Cross-validation (CV)
        – 3 types: compound-wise, protein-wise, pairwise
        – 10-fold CV
        – Dataset division was repeated randomly 5 times
        – Metrics: AUROC and AUPR
  25. History of the CPI Prediction Methods
      [Timeline figure, 2008 to 2017, grouping methods as kernel-based or matrix factorization-based:]
      • KRM (Yamanishi et al., 2008)
      • PKM (Jacob & Vert, 2008)
      • BLM (Bleakley et al., 2008)
      • LapRLS (Xia et al., 2010)
      • GIP (van Laarhoven et al., 2011)
      • KBMF2K (Gonen et al., 2012)
      • WNNGIP (van Laarhoven et al., 2013)
      • BLMNII (Mei et al., 2013)
      • MSCMF (Zheng et al., 2013)
      • REMAP (Lim et al., 2016)
      • KronRLS-MKL (Nascimento et al., 2016)
      • NRLMF (Liu et al., 2016)
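The pairwise kernel on slides 6 and 21 can be illustrated with a minimal Python sketch. It assumes NumPy and small, made-up similarity matrices; the values of `K_c` and `K_p` are hypothetical. With the tensor-product feature map, the kernel between pairs (c, p) and (c', p') is the product K_c(c, c') K_p(p, p'). The full pairwise kernel matrix over all compound-protein pairs is therefore the Kronecker product of the two kernel matrices.

```python
import numpy as np


def pairwise_kernel(K_c: np.ndarray, K_p: np.ndarray) -> np.ndarray:
    """Tensor-product pairwise kernel over all (compound, protein) pairs.

    Pair (c_i, p_k) is indexed as i * n_p + k, and
    result[i * n_p + k, j * n_p + l] == K_c[i, j] * K_p[k, l].
    """
    return np.kron(K_c, K_p)


# Hypothetical similarity matrices: 2 compounds and 3 proteins
K_c = np.array([[1.0, 0.5],
                [0.5, 1.0]])
K_p = np.array([[1.0, 0.2, 0.0],
                [0.2, 1.0, 0.3],
                [0.0, 0.3, 1.0]])

K_pair = pairwise_kernel(K_c, K_p)
n_p = K_p.shape[0]
# Kernel between pairs (c_0, p_1) and (c_1, p_2) is K_c[0, 1] * K_p[1, 2]
print(K_pair[0 * n_p + 1, 1 * n_p + 2])
```

Materializing this matrix is only feasible at toy sizes. With n_c compounds and n_p proteins it has (n_c · n_p)² entries, so for the Enzyme set (445 × 664 pairs) a practical implementation would evaluate the kernel only on the pairs it needs.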
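The problem described on slide 8 can be reproduced with a small GIP sketch, assuming NumPy. Two choices here are assumptions: the bandwidth normalization (γ̃ divided by the mean squared profile norm, following the usual description of van Laarhoven et al.) and the default γ̃ = 1. `Y` is the example interaction matrix from slide 4.

```python
import numpy as np


def gip_kernel(Y: np.ndarray, gamma_tilde: float = 1.0) -> np.ndarray:
    """Gaussian interaction profile kernel between the rows of a binary matrix Y."""
    Y = Y.astype(float)
    sq_norms = (Y ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        raise ValueError("Y has no known interactions")
    gamma = gamma_tilde / mean_sq  # assumed bandwidth normalization
    # Squared Euclidean distance between every pair of interaction profiles
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


# Example interaction matrix from slide 4 (rows: c1..c6, columns: p1..p5)
Y = np.array([[1, 1, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 1, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0]])

K_compound = gip_kernel(Y)    # compound GIP kernel: profiles are rows
K_protein = gip_kernel(Y.T)   # protein GIP kernel: profiles are columns
# c5 and c6 have no known interactions, yet their kernel value is maximal
print(K_compound[4, 5], K_compound[0, 0])
```

The two all-'0' profiles c5 and c6 get the same kernel value as a profile compared with itself. This is the behaviour the slide identifies as the weakness of GIP.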
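The three link indicators on slides 10 and 11 can be computed from the interaction matrix with a few matrix operations. This sketch assumes NumPy and the usual link-mining definitions, where Γ(x) is the set of interaction partners of x:

- Jaccard: |Γ(x) ∩ Γ(y)| / |Γ(x) ∪ Γ(y)|
- Cosine: |Γ(x) ∩ Γ(y)| / √(|Γ(x)| · |Γ(y)|)
- LHN (Leicht-Holme-Newman): |Γ(x) ∩ Γ(y)| / (|Γ(x)| · |Γ(y)|)

Returning 0 whenever a denominator is 0 is an assumption. It is chosen to match slide 11's statement that all-'0' vectors obtain the minimum value.

```python
import numpy as np


def link_indicator_kernel(Y: np.ndarray, indicator: str = "cosine") -> np.ndarray:
    """Link indicator kernel between the rows of a binary interaction matrix Y."""
    Y = Y.astype(float)
    common = Y @ Y.T          # |Γ(x) ∩ Γ(y)|: number of shared partners
    degree = Y.sum(axis=1)    # |Γ(x)|: number of known interactions
    if indicator == "jaccard":
        denom = degree[:, None] + degree[None, :] - common  # |Γ(x) ∪ Γ(y)|
    elif indicator == "cosine":
        denom = np.sqrt(np.outer(degree, degree))
    elif indicator == "lhn":
        denom = np.outer(degree, degree)
    else:
        raise ValueError(f"unknown indicator: {indicator!r}")
    # Empty profiles get 0 (the minimum) instead of an undefined 0/0
    return np.divide(common, denom, out=np.zeros_like(common), where=denom > 0)


# Example interaction matrix from slide 4 (rows: c1..c6, columns: p1..p5)
Y = np.array([[1, 1, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 1, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0]])

for name in ("jaccard", "cosine", "lhn"):
    K_c = link_indicator_kernel(Y, name)  # compound LIK: profiles are rows
    # c1 = {p1, p2} and c3 = {p2, p3} share one partner; c5 and c6 have none
    print(name, round(K_c[0, 2], 3), K_c[4, 5])

K_p = link_indicator_kernel(Y.T, "cosine")  # protein LIK: profiles are columns
```

Unlike the GIP kernel, c5 and c6 now score 0 rather than the maximum. Similarity is driven by shared known interactions, not by shared unknowns.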
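Slide 12 combines the precomputed similarity kernel with an interaction-profile kernel (GIP or LIK), and slide 24 tunes a multiple kernel weight wk. This transcript does not give the exact combination rule. The sketch below assumes a convex combination, (1 - w) K_sim + w K_profile, which is a common choice. The identity matrix `S_c` is only a placeholder for a precomputed compound similarity matrix, not benchmark data.

```python
import numpy as np


def combine_kernels(K_sim: np.ndarray, K_profile: np.ndarray, w: float) -> np.ndarray:
    """Assumed convex combination; w weights the interaction-profile kernel."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return (1.0 - w) * K_sim + w * K_profile


def cosine_lik(Y: np.ndarray) -> np.ndarray:
    """Cosine link indicator kernel between the rows of Y (0 for empty rows)."""
    Y = Y.astype(float)
    norms = np.linalg.norm(Y, axis=1)
    inv = np.divide(1.0, norms, out=np.zeros_like(norms), where=norms > 0)
    # f(x) <y_x, y_x'> f(x') with f(x) = 1 / ||y_x||
    return inv[:, None] * (Y @ Y.T) * inv[None, :]


# Example interaction matrix from slide 4 (rows: c1..c6, columns: p1..p5)
Y = np.array([[1, 1, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 1, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0]])
S_c = np.eye(Y.shape[0])  # placeholder for a precomputed compound similarity matrix

for w in (0.1, 0.3, 0.5, 1.0):  # wk values from slide 24
    K_c = combine_kernels(S_c, cosine_lik(Y), w)
    # Non-negative combinations of positive semidefinite kernels stay PSD
    print(w, np.linalg.eigvalsh(K_c).min() >= -1e-10)
```

Because both inputs are positive semidefinite and the weights are non-negative, every combined kernel remains a valid kernel for the SVM. The combined compound and protein kernels can then be fed into the pairwise kernel.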
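The Density row of the benchmarking table on slide 13 is #interactions / (#compounds × #proteins). Its mean over the four datasets gives the random-prediction AUPR baseline (about 0.035) quoted on slide 14. The short check below uses only the numbers from that table.

```python
# Dataset sizes from the benchmarking table on slide 13:
# (#compounds, #proteins, #interactions)
datasets = {
    "Nuclear Receptor": (54, 26, 90),
    "GPCR": (223, 95, 635),
    "Ion Channel": (210, 204, 1476),
    "Enzyme": (445, 664, 2926),
}

# Density = fraction of compound-protein pairs with a known interaction
densities = {name: n_int / (n_c * n_p) for name, (n_c, n_p, n_int) in datasets.items()}
for name, d in densities.items():
    print(f"{name}: {d:.2%}")

# A random predictor's expected AUPR equals the density, so the averaged
# random baseline is the mean density over the four datasets
mean_density = sum(densities.values()) / len(densities)
print(f"mean density: {mean_density:.3f}")
```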
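The three CV types on slide 14 differ only in what is held out:

- Compound-wise CV hides whole rows (unseen compounds).
- Protein-wise CV hides whole columns (unseen proteins).
- Pairwise CV hides scattered entries.

The sketch below generates one boolean test mask per fold, assuming NumPy. The function name `cv_test_masks` and the fold-assignment scheme (a random permutation taken modulo the number of folds) are illustrative choices, not taken from the paper.

```python
import numpy as np


def cv_test_masks(n_c, n_p, scheme, n_folds=10, seed=0):
    """Yield one boolean (n_c, n_p) mask per fold; True marks held-out entries."""
    rng = np.random.default_rng(seed)
    if scheme == "compound":
        fold = rng.permutation(n_c) % n_folds            # fold id per compound
        for f in range(n_folds):
            mask = np.zeros((n_c, n_p), dtype=bool)
            mask[fold == f, :] = True                      # hide whole rows
            yield mask
    elif scheme == "protein":
        fold = rng.permutation(n_p) % n_folds            # fold id per protein
        for f in range(n_folds):
            mask = np.zeros((n_c, n_p), dtype=bool)
            mask[:, fold == f] = True                      # hide whole columns
            yield mask
    elif scheme == "pairwise":
        fold = (rng.permutation(n_c * n_p) % n_folds).reshape(n_c, n_p)
        for f in range(n_folds):
            yield fold == f                                # hide scattered entries
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")


# 5 random repetitions of 10-fold CV (slide 24) on a Nuclear Receptor-sized matrix
n_runs = 0
for seed in range(5):
    for mask in cv_test_masks(54, 26, "compound", n_folds=10, seed=seed):
        # Hypothetical training step: fit on entries where mask is False and
        # evaluate on the held-out entries where mask is True.
        n_runs += 1
print(n_runs)
```

When GIP or LIK kernels are computed inside a fold, the held-out entries of the interaction matrix should be treated as unknown (set to 0). Otherwise the test labels leak into the kernels.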
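The AUROC and AUPR metrics on slide 14 can be computed directly from ranked scores. This sketch assumes NumPy. It uses the rank-statistic form of AUROC and uses average precision as the AUPR estimate. Average precision is one common estimator; this transcript does not say which AUPR variant the benchmark uses. The random scores are hypothetical data that illustrate the baselines quoted on the slide: AUROC near 0.5 and AUPR near the positive density.

```python
import numpy as np


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def aupr(y_true, scores):
    """Average precision: mean precision at the rank of each positive."""
    y = np.asarray(y_true, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = y[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits].mean()


# Random scores on labels with about 3.5% positives (hypothetical data)
rng = np.random.default_rng(0)
y = rng.random(20_000) < 0.035
s = rng.random(20_000)
print(f"AUROC={auroc(y, s):.3f}  AUPR={aupr(y, s):.3f}")
```

The all-pairs comparison in `auroc` keeps the definition transparent, but it uses memory proportional to #positives × #negatives. A sort-based rank formula scales better for large matrices.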
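The argument on slide 20 can be written out as a short derivation. Here y_x denotes the binary interaction profile of x. For binary profiles, the number of shared partners is the inner product of the profiles, and |Γ(x)| is the squared norm. Setting f(x) = 0 for an empty profile is an assumption; it matches the zero kernel value for all-'0' profiles.

```latex
\begin{aligned}
&\text{Let } k \text{ be positive definite and } f:\mathcal{X}\to\mathbb{R} \text{ arbitrary. For any } x_1,\dots,x_n \text{ and } a_1,\dots,a_n\in\mathbb{R}:\\
&\quad \sum_{i,j} a_i a_j\, f(x_i)\, k(x_i,x_j)\, f(x_j) = \sum_{i,j} b_i b_j\, k(x_i,x_j) \ge 0, \qquad b_i = a_i f(x_i).\\[6pt]
&\text{With } |\Gamma(x)\cap\Gamma(x')| = \langle \mathbf{y}_x, \mathbf{y}_{x'}\rangle \text{ and } |\Gamma(x)| = \|\mathbf{y}_x\|^2:\\
&\quad K_{\cos}(x,x') = \frac{\langle \mathbf{y}_x, \mathbf{y}_{x'}\rangle}{\|\mathbf{y}_x\|\,\|\mathbf{y}_{x'}\|} = f(x)\,\langle \mathbf{y}_x, \mathbf{y}_{x'}\rangle\, f(x'), \qquad f(x) = \|\mathbf{y}_x\|^{-1},\\
&\quad K_{\mathrm{LHN}}(x,x') = \frac{\langle \mathbf{y}_x, \mathbf{y}_{x'}\rangle}{\|\mathbf{y}_x\|^2\,\|\mathbf{y}_{x'}\|^2} = f(x)\,\langle \mathbf{y}_x, \mathbf{y}_{x'}\rangle\, f(x'), \qquad f(x) = \|\mathbf{y}_x\|^{-2}.
\end{aligned}
```

Since the inner product is positive definite, both kernels are positive definite by the first line. The Jaccard index cannot be written in this f(x) k f(x') form, which is why the slide cites the separate proof by Bouchard et al.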
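A quick count of the settings on slide 24 suggests why slide 17 calls hyperparameter search a bottleneck. The count assumes a full grid search in which every (C, wk) combination is evaluated in every fold of every repetition. Under that assumption, each dataset needs 2400 SVM fits for each kernel variant.

```python
from itertools import product

C_values = [0.1, 1, 10, 100]     # SVM cost parameter (slide 24)
wk_values = [0.1, 0.3, 0.5, 1]   # multiple kernel weight (slide 24)
cv_types = ["compound-wise", "protein-wise", "pairwise"]
n_folds, n_repeats = 10, 5       # 10-fold CV, repeated 5 times

# Assumed full grid search: every combination in every fold of every repetition
grid = list(product(C_values, wk_values))
fits_per_dataset = len(grid) * len(cv_types) * n_folds * n_repeats
print(len(grid), fits_per_dataset)
```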
