Bowes J., et al. Reducing safety-related drug attrition: the use of in vitro pharmacological profiling.
Nature Reviews Drug Discovery 2012;11:909–22.
Number in training set: 4585, 3106, 2457
Median difference with/without feature (ΔpIC50): 0.35, -0.1, 0
Cohen's d: 0.4, -0.26, 0.02
Explainable AI
MedChemica
Virtual Toxicity Panel Screens to aid the Medicinal Chemist
A. G. Dossetter•, E. Griffen•, A. Leach•+, A. Lin‡, J. Stacey†, L. Reid§, S. Montague•.
•MedChemica Ltd, Macclesfield, UK, +Pharmacy and Biomolecular Sciences, Liverpool John Moores University, †Information School, University of Sheffield,
‡Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg,
§Bioinformatics Institute (A*STAR), 30 Biopolis Street, Matrix, Singapore 138671
Problem
Unforeseen toxicity via secondary pharmacology is a significant risk. When encountered late in a
discovery project's life, it creates major issues and may even terminate the project.
Chemists need to be alerted to potential risks but to be influenced they must be able to audit the reasons
and evidence for the alerts.
Solution
Build transparent models of critical toxicity targets
and communicate results in chemical structures
rather than just numbers. This is an example of
‘Explainable AI’ for chemists
contact@medchemica.com
Learning
• Models must be transparent and show structures to influence chemists
• Random Forest models with the correct descriptors can be used to show important features as pharmacophores and the evidence supporting them
• Error models can give a measure of confidence in individual predictions, beyond a single RMSE for the whole model.
Chemists won’t make decisions without understanding
Language of medicinal chemists = structures / clear pharmacophores
Machine Learning method: Description
MMPA transformations: Example pairs
kNN + Morgan fp: Structures of Nearest Neighbours
Random Forest + pharmacophore fp: Compound highlighted with important features
Graph analytics: Connections between compound families
Graph Convolutional Neural Network (GCNN): Graph node feature importance (a work in progress)
Aspects of Models
Pay attention to Feature Engineering
Clear definitions enable identification of key features
[Figure: aspects of models, spanning the Modeler's domain and the Chemist's domain: Transparency, Scientific Sense, Consistency, Parsimony, Applicability, Performance]
Interpretable
Failure cost high
Immature science
Highly skilled, critical users
Business-2-Business
Transparent and auditable
Black Box
Failure cost is low
Real time response critical
Interactive = self correcting
Business-2-consumer
User agnostic of process
Trying to explain black box models, rather than creating models that are
interpretable in the first place, is likely to perpetuate bad practice and can
potentially cause great harm to society. The way forward is to design
models that are inherently interpretable.
- Cynthia Rudin, Nature Machine Intelligence 2019, 1, 206–215.
Approach Application
Advanced Pharmacophore Features
Feature: Definition
Basic Group: Atom or group most likely protonated at pH 7.4
Acidic Group: Atom or group most likely deprotonated at pH 7.4, includes N and C acids
Acceptor: Definitions derived from Taylor & Cosgrove
Donor: Definitions derived from Taylor & Cosgrove
Hydrophobic: C4 or greater cyclic or acyclic alkyl group
Aromatic Attachment: Connection of any group to an aromatic atom, excluding connections within rings
Aliphatic Attachment: Connection of any atom to an aliphatic group not in a ring
Halo: F, Cl, Br, I
Gobbi, A.; Poppinger, D. Biotechnology and Bioengineering 1998, 61 (1), 47–54.
Reutlinger, M.; Koch, C. P.; Reker, D.; Todoroff, N.; Schneider, P.; Rodrigues, T.; Schneider, G. Mol. Inf. 2013, 32 (2), 133–138.
Taylor, R.; Cole, J. C.; Cosgrove, D. A.; Gardiner, E. J.; Gillet, V. J.; Korb, O. J Comput Aided Mol Des 2012, 26 (4), 451–472.
Acid and base definitions are SMARTS patterns (MedChemica definitions). Acids include C, N and heteroaromatic acids; bases include amidines and guanidines and exclude weak aniline bases.
[Figure: simple versus precise feature definitions (H bond acceptor, base, acid), illustrated on Diclofenac (1973) and Sulfadiazine (1941)]
Pharmacophore Pairs
• Feature 1 - topological distance - Feature 2
• Engineered for chemical relevance: pairs can be superimposed or directly linked, which e.g. enables a group to be both a hydrogen bond acceptor and a base
• Used as unfolded 280-bit fingerprints
• A bit identifies a pharmacophore pair, e.g. Aromatic - 3 bonds - Base
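The pair encoding described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the poster does not give the bit layout, so the choice of 28 unordered pairs of distinct feature types at distances of 0 to 9 bonds (which happens to give 280 bits) is a guess, and the toy molecule, `bit_index` and `pair_fingerprint` are hypothetical names, not MedChemica code.

```python
from collections import deque
from itertools import combinations

# Feature names follow the poster's feature table; their order here is arbitrary.
FEATURES = ["Base", "Acid", "Acceptor", "Donor",
            "Hydrophobic", "Aromatic", "Aliphatic", "Halo"]
MAX_DIST = 9  # assumption: distances of 0..9 bonds are encoded
PAIRS = list(combinations(range(len(FEATURES)), 2))  # 28 pairs of distinct types
N_BITS = len(PAIRS) * (MAX_DIST + 1)                 # 28 * 10 = 280 (assumed layout)


def bit_index(feature_1, distance, feature_2):
    """Map 'feature_1 - distance bonds - feature_2' to one unfolded bit."""
    i, j = sorted((FEATURES.index(feature_1), FEATURES.index(feature_2)))
    return PAIRS.index((i, j)) * (MAX_DIST + 1) + distance


def bond_distances(adjacency, start):
    """Topological (bond-count) distances from one atom, by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def pair_fingerprint(adjacency, atom_features):
    """Return the set of bits for all pairs of distinct feature types."""
    bits = set()
    atoms = sorted(atom_features)
    for a in atoms:
        dist = bond_distances(adjacency, a)
        for b in atoms:
            if b < a or b not in dist or dist[b] > MAX_DIST:
                continue
            # a == b gives distance 0, i.e. superimposed features on one atom.
            for f1 in atom_features[a]:
                for f2 in atom_features[b]:
                    if f1 != f2:
                        bits.add(bit_index(f1, dist[b], f2))
    return bits


# Toy molecule: a four-atom chain 0-1-2-3. Atom 0 is an aromatic attachment,
# atom 3 is both a base and a hydrogen bond acceptor.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
atom_features = {0: {"Aromatic"}, 3: {"Base", "Acceptor"}}
fingerprint = pair_fingerprint(adjacency, atom_features)
print(sorted(fingerprint), "of", N_BITS, "bits")
```

Because the fingerprint is unfolded, each set bit maps back to exactly one named pair, which is what makes the feature readable by a chemist.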
• Random Forest feature importance and Cohen's d for effect size allow identification of critical features in models
• Highlight atoms by Σ Feature Importance, coloured by direction of Cohen's d
• Show statistics on the effect and variance of each feature
• Drill back to precise features and original compounds with data supporting that feature – complete transparency
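A minimal sketch of the statistics behind these bullets, in plain Python. The assumptions are loud ones: the pooled-standard-deviation form of Cohen's d is one common variant and the poster does not say which is used; the importance values stand in for numbers that would come from a fitted Random Forest; the signed per-atom sum is one reading of "Σ Feature Importance coloured by direction"; all data and names are hypothetical.

```python
from math import sqrt
from statistics import mean, median, variance


def cohens_d(with_feature, without_feature):
    """Cohen's d using the pooled standard deviation (an assumed variant)."""
    n1, n2 = len(with_feature), len(without_feature)
    pooled_var = ((n1 - 1) * variance(with_feature)
                  + (n2 - 1) * variance(without_feature)) / (n1 + n2 - 2)
    return (mean(with_feature) - mean(without_feature)) / sqrt(pooled_var)


def feature_stats(with_feature, without_feature):
    """Statistics reported per feature: count, median difference, effect size."""
    return {
        "n_with": len(with_feature),
        "median_diff": median(with_feature) - median(without_feature),
        "cohens_d": cohens_d(with_feature, without_feature),
    }


def atom_scores(bit_atoms, importance, stats):
    """Sum feature importance per atom, signed by the direction of Cohen's d."""
    scores = {}
    for bit, atoms in bit_atoms.items():
        sign = 1.0 if stats[bit]["cohens_d"] >= 0 else -1.0
        for atom in atoms:
            scores[atom] = scores.get(atom, 0.0) + sign * importance[bit]
    return scores


# Hypothetical pIC50 values for compounds with / without each pharmacophore pair.
pic50 = {
    "Aromatic-3-Base": ([6.0, 6.5, 7.0], [5.0, 5.5, 6.0]),
    "Acid-2-Donor": ([5.0, 5.5, 6.0], [6.0, 6.5, 7.0]),
}
# Hypothetical importances (would come from the fitted model) and atom mappings.
importance = {"Aromatic-3-Base": 0.30, "Acid-2-Donor": 0.10}
bit_atoms = {"Aromatic-3-Base": [0, 3], "Acid-2-Donor": [3, 5]}

stats = {bit: feature_stats(w, wo) for bit, (w, wo) in pic50.items()}
scores = atom_scores(bit_atoms, importance, stats)
for atom, score in sorted(scores.items()):
    print(atom, "raises pIC50" if score > 0 else "lowers pIC50", round(abs(score), 2))
```

Positive scores would be drawn in one colour and negative scores in another; the `stats` dictionary is what a chemist would drill into to audit the highlight.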
Cardiac toxicity and Seizure are key toxicological risks
Cardiac
hERG ion channel inhibitor
NaV 1.5 channel inhibitor
Ca L type channel inhibitor
Ca T-type channel inhibitor
PDE 3A inhibitor
Seizure
Dopamine D1 receptor ant/ag
Dopamine D2 receptor ant/ag
Cannabinoid CB1 receptor ant/ag
Acetylcholine α1β2 receptor agonist / antagonist
µ opioid agonist / antagonist
κ opioid agonist
δ opioid agonist / antagonist
Muscarinic M1 receptor ant/ag
Muscarinic M2 receptor ant/ag
5HT 1A receptor antagonists
5HT 1B receptor antagonists
5HT 2A receptor antagonists
GABA α1 antagonist
NMDA-NR1 agonist
5HT Transporter inhibitor
Dopamine Transporter inhibitor
Noradrenaline Transporter inhibitor
Acetylcholine esterase inhibitor
Monoamine oxidase inhibitor
PDE 4D inhibitor
Model ‘quality’, Error models and Domain of applicability
• Build models with 10-fold CV and report CV Pearson's R² and CV RMSE
• Build a Random Forest error model to generate a predicted error for each compound
• The error model can be used to flag compounds outside the Domain of Applicability
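The cross-validation step can be sketched as follows. This is a plain-Python illustration, not the authors' pipeline: a one-descriptor least-squares line stands in for the Random Forest, the data are synthetic, and `k_fold_indices`, `cross_validate`, `fit_line` and `predict_line` are hypothetical helper names.

```python
import random
from math import sqrt


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def rmse(predicted, observed):
    """Root mean squared error."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def cross_validate(X, y, fit, predict, k=10):
    """Return one out-of-fold prediction per compound."""
    out_of_fold = [None] * len(y)
    for fold in k_fold_indices(len(y), k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            out_of_fold[i] = predict(model, X[i])
    return out_of_fold


def fit_line(x, y):
    """Least-squares line; a stand-in model, NOT a Random Forest."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Synthetic data: one descriptor and a noisy pIC50-like response.
rng = random.Random(1)
X = [float(i) for i in range(30)]
y = [5.0 + 0.1 * xi + rng.gauss(0.0, 0.3) for xi in X]

cv_pred = cross_validate(X, y, fit_line, predict_line, k=10)
cv_r2 = pearson_r2(cv_pred, y)
cv_rmse = rmse(cv_pred, y)
print("CV Pearson's R2:", round(cv_r2, 2), "CV RMSE:", round(cv_rmse, 2))
```

Both metrics are computed on out-of-fold predictions only, so every compound is scored by a model that never saw it during fitting.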
hERG: n = 5968, RMSE = 0.16, CV Pearson's R² = 0.27
CHEMBL12713 sertindole: predicted pIC50 7.8 [7.1 – 8.4], actual 8.2
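One way to picture the error model is sketched below. It is an assumption-heavy toy: a single regression stump on one descriptor stands in for the Random Forest error model, the descriptor values and absolute cross-validation errors are invented, and the 1.0 log unit cut-off for "out of domain" is an arbitrary illustration, not a value from the poster.

```python
def fit_stump(descriptor, abs_error):
    """Fit a one-split regression stump to absolute CV errors.

    Returns (threshold, mean_error_below, mean_error_at_or_above).
    A single stump is a stand-in for the Random Forest error model.
    """
    best = None
    for t in sorted(set(descriptor))[1:]:
        left = [e for x, e in zip(descriptor, abs_error) if x < t]
        right = [e for x, e in zip(descriptor, abs_error) if x >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((e - ml) ** 2 for e in left) + sum((e - mr) ** 2 for e in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:
        raise ValueError("need at least two distinct descriptor values")
    return best[1:]


def predicted_error(stump, x):
    """Predicted absolute error for a compound with descriptor value x."""
    threshold, below, at_or_above = stump
    return below if x < threshold else at_or_above


def report(prediction, error, max_error=1.0):
    """Prediction with an error band and a domain-of-applicability flag."""
    return {
        "prediction": prediction,
        "low": prediction - error,
        "high": prediction + error,
        "in_domain": error <= max_error,  # max_error is an assumed cut-off
    }


# Hypothetical training compounds: one descriptor value and the absolute
# error each compound showed in cross-validation.
descriptor = [0.90, 0.80, 0.85, 0.30, 0.20, 0.25]
abs_error = [0.2, 0.3, 0.4, 1.5, 1.2, 1.8]
stump = fit_stump(descriptor, abs_error)

familiar = report(7.8, predicted_error(stump, 0.88))
unfamiliar = report(6.1, predicted_error(stump, 0.22))
print(familiar)
print(unfamiliar)
```

The point of the design is that each compound gets its own predicted error, so a prediction can be shown with a band such as the one quoted for sertindole rather than with a single global RMSE.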
Predictions and Transparency
Medicinal Chemistry
Seizure Models – RF and kNN
Target                       Training set size   CV-R²   RMSE
Dopamine Transporter         1712                0.28    0.13
Norepinephrine Transporter   1757                0.23    0.18
5HT1a receptor               400                 0.37    0.21
GABA-A receptor              1526                0.24    0.29
δ Opioid receptor            1070                0.28    0.18
MAO-A inhibitor              1684                0.21    0.29
AChE inhibitor               3283                0.32    0.16
Best Random Forest based models for seizure endpoints.
All the seizure data sets delivered kNN models based on Morgan fingerprints.
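A kNN prediction is transparent because the neighbours themselves are the evidence. The sketch below assumes fingerprints are already available as sets of "on" bits (in practice Morgan fingerprints from a cheminformatics toolkit); the compound identifiers, bit sets, pIC50 values and the choice of k = 3 with an unweighted mean are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_predict(query_fp, training, k=3):
    """Predict pIC50 as the mean of the k most similar training compounds.

    `training` is a list of (compound_id, fingerprint_set, pIC50) tuples.
    Returns the prediction plus the neighbours that support it.
    """
    ranked = sorted(training, key=lambda rec: tanimoto(query_fp, rec[1]),
                    reverse=True)[:k]
    prediction = sum(rec[2] for rec in ranked) / len(ranked)
    evidence = [(rec[0], round(tanimoto(query_fp, rec[1]), 2), rec[2])
                for rec in ranked]
    return prediction, evidence


# Hypothetical training set: identifiers, bit sets and measured pIC50 values.
training = [
    ("cpd-A", {1, 2, 3, 4}, 7.0),
    ("cpd-B", {1, 2, 3, 5}, 6.6),
    ("cpd-C", {1, 2, 6, 7}, 6.2),
    ("cpd-D", {8, 9, 10}, 4.9),
]

prediction, evidence = knn_predict({1, 2, 3, 4}, training, k=3)
print("predicted pIC50:", round(prediction, 2))
for compound_id, similarity, measured in evidence:
    print(compound_id, "similarity", similarity, "measured pIC50", measured)
```

Showing the structures of the returned neighbours next to the prediction lets the chemist judge whether the analogy is chemically sensible.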
hERG Example

Griffen MedChemica Virtual Tox Panel
