This document summarizes a webinar on using machine learning and data mining techniques to predict drug repurposing opportunities for chronic pancreatitis. Specifically:
1. An ensemble of kernel-based models was used to analyze interaction data between drugs and disease-associated targets, drawn from multiple sources, to identify potential drug candidates for repurposing.
2. The top 5 repurposing candidates identified through this process were being evaluated further by the partner organization Mission-Cure, with the goal of beginning patient trials by January 2020.
3. Other techniques discussed included using compressed sensing to analyze drug-disease networks and predict side effects, to help evaluate the candidate drugs identified for repurposing.
Pistoia Alliance-Elsevier Datathon
1. 21 June, 2019
Big Data Mining and AI for Drug Repurposing
Pistoia Alliance Centre of Excellence for AI in Life Sciences and Elsevier
Datathon Report
Panelists:
Aleksandar Poleksic, Professor, University of Northern Iowa
Bruce Aronow, Co-director of the Computational Medicine Center at Cincinnati Children’s Hospital Medical Center
Finlay Maclean, Elsevier, London, UK
Jabe Wilson, Elsevier
Moderator: Vladimir Makarov
5. 21.06.2019
• Collaboration across Pharma, Academic and Non-Profit
• Data from both Elsevier and 3rd Party sources
• Machine learning and other analytics methods used to predict drugs that could be repurposed for disease treatment
• Results validated by leading experts in the disease (chronic pancreatitis)
• Our partner Mission-Cure is planning to take drugs to patient trials by January 2020
• “The datathon exceeded our expectations, producing 5 repurposing candidates to address multiple chronic pancreatitis targets” (Megan Golden, CEO, Mission Cure)
Predictive Analytics for Drug Repurposing
6.
1. March - July 2019: Finish identifying the most promising candidates; identify which ones need additional preclinical work
2. July - December 2019: Fund and conduct preclinical work; plan pilots/trials for the safest, most promising candidates
3. July 24-26, 2019: PancreasFest meeting in Pittsburgh: coordinate preclinical work and plan clinical trials with PIs
4. January - June 2020: Conduct small open-label pilots with the safest, most promising candidates and informed patient volunteers
5. July 2020 - June 2022: Conduct repurposing clinical trials using efficient trial designs (e.g. aggregated n-of-1 trials); develop a master trial protocol
6. July 2022 - June 2024: Implement the master trial to test multiple promising therapeutic candidates, alone and in combination
7. July 2024 - June 2027: Continue the master trial until therapies are identified
10. ToppCell database: using single-cell gene expression data to understand the gene networks responsible for organ health and disease
[Figure: ToppCell workflow (Eric Bardes). Inputs: single-cell dataset(s); learned or user-defined cell annotation; user-defined, biological pathway-based, or cell type-specific gene lists. Processing: normalization, clustering, differential analysis, …; machine learning-based analysis. Output: interactive heatmaps for searching, clustering, grouping, enrichment, and re-analysis.]
11. ToppCell: Leveraging the Human Cell Atlas
Data mining by organ/cell type: Search/Cluster/Enrich/Net
Derive models for:
• Differentiation
• Organogenesis
• Pathways / Networks
• Cell-cell interactions
• Physiology
• Pathology
[Figure: pancreas tissue, individual single cells]
13. Portal Views For Data Mining/Systems Biology-Driven Analyses
(1) find/select cell clusters/gene modules and anatomical contexts
(2) carry out enrichment analyses and machine learning prioritization of genes, pathways, interactions
(3) assemble/save/share/export integrated systems biological network models
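The slides do not say which statistic ToppCell uses for the enrichment analyses in step (2). The following is a minimal sketch that assumes a one-sided hypergeometric test, which is a common choice for gene-set over-representation. It computes the exact probability that a gene module overlaps an annotated gene set at least as much as observed. The function name and the example numbers are hypothetical. The 16,400-gene background only echoes the gene count on the LungMAP slide.

```python
from math import comb


def enrichment_p_value(overlap: int, module_size: int, set_size: int, universe: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    universe:    number of genes measured (the background)
    set_size:    genes annotated to the pathway/term being tested
    module_size: genes in the cell-type gene module
    overlap:     genes present in both the module and the annotated set
    """
    upper = min(module_size, set_size)
    # Count the ways to draw at least `overlap` annotated genes into the module.
    hits = sum(
        comb(set_size, i) * comb(universe - set_size, module_size - i)
        for i in range(overlap, upper + 1)
    )
    return hits / comb(universe, module_size)  # exact integer ratio, then float


# Hypothetical example: a 100-gene module sharing 10 genes with a 200-gene pathway.
p = enrichment_p_value(10, 100, 200, 16_400)
print(f"{p:.2e}")
```

When many pathways or terms are tested against the same module, the resulting p-values would normally be corrected for multiple testing (for example Bonferroni or false discovery rate).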
Tissue/Sample-Associated Cell Population Gene Modules: cell type-centered signatures allow for the analysis of cell class and subclass similarities and differences.
Use case: compare/combine alveolar epithelial cell subtype gene signatures per region, per stage, and per cell type/subtype
[Figure panels: single cells (1,004 shown); systems biology via the LungMAP Portal]
Note the profound differences in functional associations between the AT1 and AT2 subtype signatures. However, it is precisely through the combination of their specialized biological functions that alveolar structure and physiological function achieve highly efficient air-blood gas exchange. This illustrates the value of providing users with subtype- and stage-specific gene modules for multi-module and multimodal/technology-based biological network analyses.
[Figure: single-cell atlas(es) per protocol, source, cell types, subtypes, and developmental stages (example: mouse Fluidigm LungMAP, all distal, all stages, by cell type). Rows: anatomic regions, cell types, subtypes, developmental stages; columns: 16,400 genes (redundancy OK). The user selects cell types/gene modules for biological network analyses. AT1 modules: cell junctions, cell projections, cytoskeleton, angiogenesis, vascular morphogenesis. AT2 modules: surfactant biology, lipid biosynthesis, vesicles, lamellar body, secretion.]
21. The Problem
- The search space is huge:
  - Chemogenomic
  - Pharmacologic
- Known information is sparse and heavily biased:
  - Only positive measurements
- Possible data sources are huge
- Multi-domain, multi-level information
Yella J, Yaddanapudi S, Wang Y, Jegga A. Changing trends in computational drug repositioning. Pharmaceuticals. 2018 Jun;11(2):57.
22. The Data
- 765 disease-associated targets
- 119,401 positive interactions
- 203 targets with known bioactivities
- 44,161 unique substances
- 2,766 possibly repurposable drugs
- 15 main genetic drivers
[Figures: cumulative bioactivities for disease-associated targets (y-axis: number of targets, cumulative); binding affinity for disease-implicated substances]
23. Kernels and Similarity Metrics
Substances
- Morgan Fingerprint radius 3 to encode substructures
- Tanimoto Distance to determine substructure similarity
Targets
- Smith-Waterman local alignment
Harish Kandan, Understanding the kernel trick. https://towardsdatascience.com/understanding-the-kernel-trick-e0bc6112ef78
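The two similarity measures can be sketched in plain Python, assuming a few things. Fingerprints are represented as sets of on-bit indices, such as those produced by a cheminformatics toolkit like RDKit for radius-3 Morgan fingerprints. The kernel uses Tanimoto similarity, which is 1 minus the Tanimoto distance. Smith-Waterman scores use a toy match/mismatch/linear-gap scheme rather than a substitution matrix such as BLOSUM62, and they are normalized so that every sequence has self-similarity 1. All function names and inputs are hypothetical. Normalized alignment scores are not guaranteed to form a positive semidefinite matrix, so a real pipeline may need an extra correction step.

```python
import math


def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 1.0  # convention: two empty fingerprints count as identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score with a linear gap penalty (score only, no traceback)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def sw_similarity(a: str, b: str) -> float:
    """SW(a, b) / sqrt(SW(a, a) * SW(b, b)), so self-similarity is 1 (non-empty inputs)."""
    return smith_waterman(a, b) / math.sqrt(smith_waterman(a, a) * smith_waterman(b, b))


def kernel_matrix(items, similarity):
    """Square similarity matrix over a list of items."""
    return [[similarity(x, y) for y in items] for x in items]


# Hypothetical toy inputs standing in for substance fingerprints and target sequences.
drug_fps = [{1, 5, 9, 12}, {1, 5, 9, 40}, {7, 8, 33}]
target_seqs = ["MKTAYIAKQR", "MKTAHIAKQR", "GSWWPLLAVE"]
K_drug = kernel_matrix(drug_fps, tanimoto)
K_target = kernel_matrix(target_seqs, sw_similarity)
```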
24. Kernel Explosion
- Apply the Kronecker product to the drug and target kernels
- Train a support vector machine on the Kronecker kernel
- Training kernel:
  203 targets with known bioactivities
  44,161 bioactive substances
  41,209 target-kernel entries (203²) × 1,950,193,921 substance-kernel entries (44,161²) ≈ 80 trillion entries!
  ~500 TB!
- I wish I had a cluster that big...
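The slide's arithmetic and the structure of the Kronecker kernel can be checked with a short NumPy sketch. It assumes 8-byte floats with no compression; the slide does not state the storage precision. Under that assumption the full pairwise kernel comes to roughly 640 TB, the same order of magnitude as the slide's ~500 TB estimate. The toy kernels below are hypothetical random matrices. They show that each entry of `np.kron(K_drug, K_target)` is the product of one drug-kernel value and one target-kernel value.

```python
import numpy as np

# Counts from the slide (targets and substances with known bioactivities).
N_TARGETS = 203
N_SUBSTANCES = 44_161

target_kernel_entries = N_TARGETS ** 2        # 41,209
substance_kernel_entries = N_SUBSTANCES ** 2  # 1,950,193,921
pairwise_entries = target_kernel_entries * substance_kernel_entries  # ~8.0e13
bytes_float64 = pairwise_entries * 8  # assumption: 8 bytes per entry, no compression
print(f"{pairwise_entries:.3e} entries, ~{bytes_float64 / 1e12:.0f} TB as float64")

# Toy illustration of the Kronecker (pairwise) kernel on 3 drugs x 2 targets.
rng = np.random.default_rng(0)
X_drug = rng.normal(size=(3, 4))
X_target = rng.normal(size=(2, 5))
K_drug = X_drug @ X_drug.T          # 3 x 3 positive semidefinite drug kernel
K_target = X_target @ X_target.T    # 2 x 2 positive semidefinite target kernel
K_pair = np.kron(K_drug, K_target)  # 6 x 6 kernel over (drug, target) pairs

# Pair (d, t) maps to row d * n_targets + t; each entry is a product of two kernel values.
n_targets = K_target.shape[0]
d, t, d2, t2 = 2, 1, 0, 1
assert np.isclose(K_pair[d * n_targets + t, d2 * n_targets + t2],
                  K_drug[d, d2] * K_target[t, t2])
```

A small matrix like `K_pair` could be passed to scikit-learn's `SVC(kernel="precomputed")`. At the full problem size, though, this is exactly the matrix that cannot be materialized, which is what motivates the ensemble and Kronecker RLS approaches on the next slides.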
25. Ensemble Learning
- Train multiple models:
  1. Each takes a subset of the data
  2. Each self-evaluates
  3. Evaluate a meta-learner
  4. Feed in the genetic drivers of chronic pancreatitis (CP)
  5. Predict on repurposable drugs
  6. Take a weighted average of the results
- Optimization reaches its limit at around 0.94 AUROC (for kernels of 30 substances and 30 targets).
- The largest kernels are still only around 1000.
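The combination step can be sketched in plain Python under several assumptions. "Self-evaluates" is taken to mean that each model's AUROC is scored on held-out data. That AUROC is then used as the model's weight in the final average, although the slide does not specify the weighting scheme. Each model's subset is drawn without replacement. All names, labels, and scores are hypothetical, and training the base models themselves is out of scope here.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def draw_subsets(n_items, n_models, subset_size, seed=0):
    """One random subset of item indices per model (sampling without replacement)."""
    rng = random.Random(seed)
    return [rng.sample(range(n_items), subset_size) for _ in range(n_models)]


def weighted_average(predictions, weights):
    """Combine per-model score lists into one score per candidate."""
    total = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, predictions)) / total
            for i in range(len(predictions[0]))]


# E.g. 30 models, each on 500 of the 44,161 substances (sizes from the slides).
subsets = draw_subsets(44_161, 30, 500)

# Hypothetical held-out labels and scores from two already-trained models.
held_out_labels = [1, 0, 1, 0]
held_out_scores = {"model_a": [0.9, 0.2, 0.8, 0.3], "model_b": [0.6, 0.7, 0.4, 0.1]}
weights = [auroc(held_out_labels, s) for s in held_out_scores.values()]

# Hypothetical scores each model assigns to two repurposable-drug candidates.
candidate_scores = [[0.1, 0.9], [0.4, 0.3]]
ensemble_scores = weighted_average(candidate_scores, weights)
```

Replacing `weighted_average` with a trained meta-learner is the change listed under Improvements ("Employ a meta-learning not voting classifier").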
26. Kronecker-RLS
Pahikkala T, Airola A. RLScore: regularized least-squares learners. The Journal of Machine Learning Research. 2016 Jan 1;17(1):7803-7.
Nascimento AC, Prudêncio RB, Costa IG. A multiple kernel learning algorithm for drug-target interaction prediction. BMC Bioinformatics. 2016 Dec;17(1):46.
Kashima H, Oyama S, Yamanishi Y, Tsuda K. On pairwise kernels: an efficient alternative and generalization analysis. Adv Data Min Knowl Disc. 2009;5476:1030–7.
- Take advantage of inherent symmetry
- Eigendecompose similarity kernels
- Take advantage of the kernel trick
- Employ regularised least squares
- Feed into the ensemble!
- A homogeneous bagging ensemble performed best
Final ensemble:
30 models, each:
- Trained and optimized on 500 substances and the 200 most bioactive targets
- Evaluated (model-level)
- Evaluated (ensemble-level)
- Predict!
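The eigendecomposition idea above can be written as the usual closed form for Kronecker regularized least squares. It assumes symmetric kernels and a complete label matrix `Y`, where unmeasured pairs are simply treated as 0. Writing $K_d = U_d \Lambda_d U_d^\top$ and $K_t = U_t \Lambda_t U_t^\top$, the system $K_d A K_t + \sigma A = Y$ is solved elementwise in the eigenbases. The cost is then $O(n_d^3 + n_t^3)$ instead of $O((n_d n_t)^3)$ for the explicit Kronecker system. This NumPy sketch is not the datathon code and does not reproduce RLScore's API; the function names and toy data are hypothetical.

```python
import numpy as np


def kron_rls_fit(K_drug, K_target, Y, reg=1.0):
    """Dual coefficients A solving K_drug @ A @ K_target + reg * A = Y (symmetric kernels)."""
    lam_d, U_d = np.linalg.eigh(K_drug)
    lam_t, U_t = np.linalg.eigh(K_target)
    Y_rot = U_d.T @ Y @ U_t                         # labels expressed in the eigenbases
    A_rot = Y_rot / (np.outer(lam_d, lam_t) + reg)  # eigenvalues of the Kronecker kernel, shifted
    return U_d @ A_rot @ U_t.T


def kron_rls_predict(K_drug_new, K_target_new, A):
    """Scores for new pairs; K_drug_new is (new drugs x training drugs), likewise for targets."""
    return K_drug_new @ A @ K_target_new.T


# Hypothetical toy data: 5 drugs, 4 targets, sparse binary interaction labels.
rng = np.random.default_rng(1)
X_d = rng.normal(size=(5, 3))
X_t = rng.normal(size=(4, 3))
K_drug, K_target = X_d @ X_d.T, X_t @ X_t.T
Y = (rng.random((5, 4)) > 0.7).astype(float)
A = kron_rls_fit(K_drug, K_target, Y, reg=0.5)
scores = kron_rls_predict(K_drug, K_target, A)  # in-sample scores for every pair
```

With row-major flattening, the same `A` solves the explicit system `(np.kron(K_drug, K_target) + reg * I) @ A.ravel() = Y.ravel()`. That explicit form is feasible only for toy sizes, which is why the eigendecomposition route is used.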
27. Improvements
Sparse data
CGKronRLS (semi-supervised learning)
Other pairwise relationships can be used
KronRLS-MKL (multiple kernel learning)
Use of Gaussian Interaction Profiles
Sequential model execution and storage
Boosting instead of bagging (sample-level optimization)
Making numpy/BLAS work on distributed GPUs
Employ a meta-learner rather than a voting classifier
Pahikkala T. Fast gradient computation for learning with tensor product kernels and sparse training labels. In: Structural, Syntactic, and Statistical Pattern Recognition (S+SSPR). Lecture Notes in Computer Science. 2014;8621:123–32.
Nascimento AC, Prudêncio RB, Costa IG. A multiple kernel learning algorithm for drug-target interaction prediction. BMC Bioinformatics. 2016 Dec;17(1):46.
Pahikkala T, Airola A. RLScore: regularized least-squares learners. The Journal of Machine Learning Research. 2016 Jan 1;17(1):7803-7.
Kashima H, Oyama S, Yamanishi Y, Tsuda K. On pairwise kernels: an efficient alternative and generalization analysis. Adv Data Min Knowl Disc. 2009;5476:1030–7.
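Gaussian Interaction Profiles, one of the improvements listed above, build a kernel from the known interactions themselves rather than from chemical structure or sequence. This NumPy sketch uses the common GIP definition $K(i, j) = \exp(-\gamma \lVert y_i - y_j \rVert^2)$, where $y_i$ is a drug's (or target's) row of the binary interaction matrix. The bandwidth is $\gamma = \gamma' / \text{mean}_i \lVert y_i \rVert^2$. The function name, the default $\gamma' = 1$, and the toy matrix are assumptions.

```python
import numpy as np


def gip_kernel(Y, gamma_prime=1.0):
    """Gaussian Interaction Profile kernel between the rows of a binary interaction matrix."""
    Y = np.asarray(Y, dtype=float)
    sq_norms = (Y ** 2).sum(axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("interaction matrix has no positives")
    gamma = gamma_prime / mean_sq_norm  # bandwidth scaled by the average profile size
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negative round-off


# Hypothetical drug x target interaction matrix (1 = known interaction).
Y = np.array([[1, 0, 1],
              [1, 0, 1],
              [0, 1, 0]])
K_drug_gip = gip_kernel(Y)      # drugs compared by their target profiles
K_target_gip = gip_kernel(Y.T)  # targets compared by their drug profiles
```

A crude stand-in for multiple kernel learning would average a GIP kernel with a structural kernel, e.g. `0.5 * (K_tanimoto + K_drug_gip)`. KronRLS-MKL instead learns the kernel weights. Because the data are sparse, GIP profiles of drugs with no known interactions are all-zero and therefore look identical to each other.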
28. Using “compressed sensing” to support drug repurposing for chronic pancreatitis
Prof. Aleksandar Poleksić
Department of Computer Science
University of Northern Iowa
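29. Compressed sensing for adverse drug reaction (ADR) prediction
• Idea: factor $R \in \mathbb{R}^{m \times n}$ into the product of two lower-dimensional matrices, $R = FG'$, where $F$ is $m \times k$, $G$ is $n \times k$, and $k < \min(m, n)$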
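The slides give only the idea of factoring $R$, so the following is a hedged NumPy sketch. It assumes the rows of $R$ are drugs, the columns are adverse reactions, and only some entries are observed. It fits $R \approx FG'$ on the observed entries by alternating ridge regressions (alternating least squares). Each row update exactly minimizes the regularized squared error given the other factor, so that objective never increases. This is a generic low-rank matrix-completion method, not necessarily the algorithm of Poleksic and Xie. `factorize`, its parameters, and the toy data are hypothetical.

```python
import numpy as np


def factorize(R, mask, rank=2, reg=0.1, n_iter=50, seed=0):
    """Approximate R ~ F @ G.T using only entries where mask is True (alternating ridge fits)."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    F = rng.normal(scale=0.1, size=(m, rank))
    G = rng.normal(scale=0.1, size=(n, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iter):
        for i in range(m):  # drug i: fit its factor to its observed reactions only
            G_obs = G[mask[i]]
            F[i] = np.linalg.solve(G_obs.T @ G_obs + ridge, G_obs.T @ R[i, mask[i]])
        for j in range(n):  # reaction j: fit its factor to its observed drugs only
            F_obs = F[mask[:, j]]
            G[j] = np.linalg.solve(F_obs.T @ F_obs + ridge, F_obs.T @ R[mask[:, j], j])
    return F, G


# Hypothetical 6 drugs x 5 reactions with a rank-2 structure; about 25% of entries hidden.
rng = np.random.default_rng(3)
R = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
mask = rng.random(R.shape) > 0.25
F, G = factorize(R, mask)
R_hat = F @ G.T  # scores for every drug-reaction pair, including the hidden ones
```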
38. Collaborators:
Prof. Lei Xie, CUNY Graduate Center
References:
1. Poleksic, A., & Xie, L. (2018). Predicting serious rare adverse reactions of novel
chemicals. Bioinformatics, 34(16), 2835-2842.
2. Lim, H., Gray, P., Xie, L., & Poleksic, A. (2016). Improved genome-scale multi-target
virtual screening via a novel collaborative filtering approach to cold-start problem. Scientific
reports, 6, 38860.
3. Poleksic, A., & Xie, L. (2019). Database of Adverse Events Associated with Drugs and
Drug Combinations, in review.
39. Poll Question:
In what other medical area should we run
the next pre-competitive research
exercise?
A. Oncology
B. Heart Disease
C. Diabetes
D. Obesity
E. Some other unmet need (send
- There is no absolute line between disease modelling and target identification; the next method illustrates this.
- This tool was developed by Dr Bruce Aronow and his research group.
- The Human Cell Atlas is an incredible project; this tool builds on it to gain a greater understanding of disease mechanisms.
- TODO: Labels bigger – Y AXIS cell type gene modules