
MedChemica Active Learning - Combining MMPA and ML

Describes MedChemica research on combining Matched Molecular Pair Analysis (MMPA) and Machine Learning (ML) into a closed loop to find and optimize new hits for drug discovery. The talk describes the MMPA and Regression Forest models, how they were combined, and some early conclusions. Of these, permutative MMPA is the clear winner (Free Wilson ++).

  1. Exploiting medicinal chemistry knowledge to accelerate projects, October 2020. Not for Circulation. Accelerating lead optimisation with Active Learning: joining MMPA ADMET knowledge with Regression Forest machine learning models. Dr Alexander G. Dossetter, Managing Director, MedChemica Ltd. Available on SlideShare (search for Dossetter). Twitter @MedChemica, @covid_moonshot, #BucketListPapers. https://www.medchemica.com/bucket-list/
  2. Agenda • Problem statement • What is Active Learning? – How can it be applied to LI and LO? • Generating new ideas with MMPA – Enumeration with MMPA (RuleDesign™) • “hit-to-lead” / “AllRules” / 3pairtrans • Protein class Rule sets – Permutative MMPA (Free Wilson ++) • Getting the best ideas from small data sets • Regression Forest models for ‘potency’ prediction – QSAR revisited with transparent descriptors – Analysis of error • Learnings so far – The system can ‘get stuck’ at the start… • “It’s like the first 8 moves in chess”
  3. Problem Statement …8 years of working with pharma companies. “Our median number of compounds per LO project is 3000 - this is unsustainable… [it should be] 300” – Director of Chemistry (large pharma). “Can we define the text book of medicinal chemistry?” – Director of Comp Chem (large pharma). “We are aiming at 300 compounds per project. Currently we are about 400, we will get better” – Exscientia scientist at SCI “What can Big Data do for chemistry”. “Can you find us hits [leads] and predict potency on this [brand] new protein?” – Many, many people… MedChemica: using knowledge extraction techniques to build Artificial Intelligence (AI) systems to reduce the time and cost to critical compounds and candidate drugs.
  4. Problem Statement. “Can you find us hits [leads] and predict potency on this [brand] new protein?” Can we automate lead compound design? The algorithm will: design compounds and explore SAR; ‘actively’ select compounds to improve properties; AND improve the machine learning models. [Diagram labels: Small amount of data; Matched Molecular Pair Analysis; Explainable QSAR; Awesome leads: pIC50 > 7, good in-vitro PK, SAR, Novelty]
  5. Augmenting the Medicinal Chemist. [Diagram labels: Prioritizes options; Sets goals; Makes decisions; Data is organized and summarized]
  6. [Diagram labels: Augmented Chemists’ proposals; RuleDesign™; Permutative MMPA; Missing features; Explainable QSAR models; Alerts; ideas; Score and store; Make & test; SpotDesign™; SLIDE 27]
  7. Augmenting the Chemist: Lessons so far… Develop AI constructively • Use methods that can be directly connected to chemical structures and data – SpotDesign™, RuleDesign™, Permutative MMPA, Explainable QSAR • Ensure that all methods are auditable – See the transformations and underlying data, see the pharmacophore pairs on molecules • Automate updates and track metrics – All systems are automated from the start, logging is built in • Integrate automated systems and chemists’ ideas. Principles for Positive Engagement • Define common goals • Evaluate with directly observable data • Expose conflicting views • Continuous learning and improvement • Place in context. Chemists: AI Is Here; Unite To Get the Benefits, Griffen, E. J.; Dossetter, A. G.; Leach, A. G. J. Med. Chem. 2020, 63, 16, 8695–8704. https://doi.org/10.1021/acs.jmedchem.0c00163
  8. Explainable AI for Medicinal Chemistry Design. [Diagram labels: Data Warehouse; Automated loader; Clean Structures & Data; MMPA; rule finder; Exploitable Knowledge; Molecule problem solving; Explainable QSAR; Property Prediction; Idea ranking; Instant SAR analysis; REST API & GUI]
  9. Fully Automated Matched Molecular Pair Analysis (MMPA): what is this form of Artificial Intelligence? • Matched Molecular Pairs – Molecules that differ only by a particular, well-defined structural transformation • Capture the change and environment – MMPs can be recorded as transformations from A → B • Statistical analysis to define “medicinal chemistry rules”: defined transformations with high probability of improving properties of molecules • Store in a high performance database and provide an intuitive user interface. Level 4 and higher very important to P-MMPA. [Figure: matched molecular pair A → B with Δ data, atoms labelled by environment level 1 to 4] Griffen, E. et al. J. Med. Chem. 2011, 54(22), 7739–7750. Leach et al. J. Chem. Inf. Model. 2017, 57, 2424–2436.
  10. From SAR to MMPA… Three matched pairs (A → B):
      • pSol A -4.3 (48 μM), pSol B -3.2 (700 μM), ∆pSol 1.1
      • pSol A -6.0 (1.0 μM), pSol B -3.7 (178 μM), ∆pSol 2.3
      • pSol A -5.7 (2.0 μM), pSol B -4.1 (82 μM), ∆pSol 1.6
      3 pairs, +ve Sol, median ∆pSol 1.6. Compounds shown: CHEMBL1949790, CHEMBL1949786, CHEMBL3356658, CHEMBL218767, CHEMBL456322, CHEMBL456802. MCPairs Rule finder required 6 matched pairs for 95% confidence.
  11. The Matched Pairs leading to Rule… Actual Rule from MCPairs. Endpoint: Aqueous Solubility at pH 7.4 [CHEMBL2362975]; n-qual 69; n-qual-up 47; n-qual-down 21; median ∆pSol 0.26; std dev +/- 0.636. Explainable • Drill back to real world examples and measured data. Actionable • Clear decision to make the compound.
  12. Rule selection (flowchart):
      • Identify and group matching SMIRKS.
      • Calculate statistical parameters for each unique SMIRKS (n, median, sd, se, n_up/n_down).
      • Is n ≥ 6? If no: not enough data, ignore the transformation.
      • Is |median| ≤ 0.05 and the intercentile range (10–90%) ≤ 0.3? If yes: the transformation is classified as ‘neutral’.
      • Otherwise, perform a two-tailed binomial test on the transformation to determine the significance of the up/down frequency. Pass: the transformation is classified as ‘increase’ or ‘decrease’, depending on which direction the property is changing. Fail: the transformation is classified as ‘NED’ (No Effect Determined).
      [Figure: median data difference axis from -ve through 0 to +ve, with regions Decrease, Neutral, Increase and NED]
      • No assumption of normal distribution • Manages ‘censored’ = qualified / out-of-range data
      Leach et al. J. Chem. Inf. Model. 2017, 57, 2424–2436
  13. Molecule Problem Solving - RuleDesign™. RuleDesign™ (formerly “Compounds From Rules”) • Exploitable Knowledge is a Rule database derived from MMPA • User puts in a problem molecule with a property they wish to improve, e.g. solubility, metabolism, hERG… • System generates potential improved molecules based on data. [Diagram labels: Problem molecule + property to improve; Exploitable Knowledge; Enumerator System; Solution molecules] Watch RuleDesign™ on YouTube: https://www.youtube.com/watch?v=nQxXddJDTfc “..it’s like asking 150 of your peers for ideas in just a few seconds” - Principal Scientist (large pharma)
  14. Looking at the results • Results sorted in increasing RMM (Mol Weight) • Yellow highlight is the overlap with the input compound • One column per assay – colour and direction - LogD decrease, Sol increase • Hyperlink to “Drill back” to the original data
  15. “Multi-Step” transformations. [Figure: Shibuya Crossing, Tokyo, with points labelled A to F] Would you go in steps via A → B → C? How would you know to go E → F? Or go straight there via D, if the data said it was good? A Turing test for molecular generators, Darren Green et al., J. Med. Chem. 2020, https://doi.org/10.1021/acs.jmedchem.0c01148
  16. How many pairs? – deeper Goal setting. Specific Goal settings: non-rules transformations from pair counts. • ‘All Rules’ – all of the Increase and Decrease Rules for all datasets (warning: output can be large, not suitable for an Excel spreadsheet) • ‘Hit to Lead’ – most frequent transformations chemists perform • ‘Min 3 pair Trans’ – all transformations with 3 OR MORE matched pairs • ‘Min 6 pair Trans’ – all transformations with 6 OR MORE matched pairs; actually Increase, Decrease, Neutral and NED
  17. Broad Rule Sets • “Rules” for increasing “potency” are gathered by MMPA • Individual assay Rules (numbers in brackets) are grouped as a “Broad” Goal • Example: Dopamine Rules number 3548 (screen shot) • Therefore new hits for a new Dopamine target can have these Rules applied [What worked in the past?]
  18. Permutative MMPA • Take all compounds in a data set • Find all matched pairs & extract ΔpIC50 and the transforms between them • Aggregate transformations with median ΔpIC50 and count of pairs • Apply all transformations back to the initial data set (at the most specific environment level). NO R GROUP MAPPING REQUIRED! • Predicted pIC50 = substrate pIC50 + median ΔpIC50 • Remove existing compounds • Prioritize new compounds by pIC50 estimate. [Diagram labels: Internal structures & data (M1 to M5); Extract transforms (t1); Apply transforms; New structures & estimated data (M*); Remove existing compounds; Filter and prioritize]
  19. Exploit Own or Patent Data. [Diagram labels: External patents & data; Extract transforms; Apply transforms; Filter and prioritize; Internal structures & data; Extract transforms; Apply transforms; New structures & estimated data; Remove existing compounds; Filter and prioritize]
  20. Client Oncology PPI project example • 386 patent compounds analyzed • 6024 pair relationships found (39% - good number of MMPs) • Permutative MMPA process: apply to own series, then filter: remove undesirable substructures, estimated potency >= 6.5, clogP <= 2.5 • 52 suggestions. Measurement = p(TR-FRET nucleotide exchange assay pIC50) or estimated pIC50 from seed value + ΔpIC50. Explainable • Visible, original real world compounds and measurement. Actionable • Prioritises ‘realistic’ next step compounds. [Plot: PPI pIC50 vs cLogP, molecule suggestions marked yes / no]
  21. Regression Forest Models • Features are acid, base, hydrogen bond donor, acceptor, hydrophobe, aromatic attachment, aliphatic attachment and halogen. Definitions are highly engineered [SMARTS] • Feature 1 – topological distance – Feature 2 • Engineered for chemical relevance – features can be superimposed or directly linked, e.g. enables a group to be both a hydrogen bond acceptor and a base • A bit identifies a pharmacophore pair, e.g. Aromatic - 3 bonds - Base • Used as unfolded 360-bit fingerprints • Regression Forest as ML method • Build models with 10-fold CV – report CV Pearson’s R² and CV RMSE • Build RF error model to generate predicted error for each compound using the same descriptors
  22. MedChemica Advanced Pharmacophore Pairs: feature definitions
      • Basic Group: atom or group most likely protonated at pH 7.4
      • Acidic Group: atom or group most likely deprotonated at pH 7.4, includes N and C acids
      • Acceptor: definitions derived from Taylor, Cosgrove et al.
      • Donor: definitions derived from Taylor, Cosgrove et al.
      • Hydrophobic: C4 or greater cyclic or acyclic alkyl group
      • Aromatic Attachment: connection of any group to an aromatic atom, excluding connections within rings
      • Aliphatic Attachment: connection of any atom to an aliphatic group not in a ring
      • Halo: F, Cl, Br, I
      Reference for donor and acceptor feature definitions: Taylor, R.; Cole, J. C.; Cosgrove, D. A.; Gardiner, E. J.; Gillet, V. J.; Korb, O. J. Comput. Aided Mol. Des. 2012, 26 (4), 451–472. Acid & base definitions are SMARTS including C, N, heteroaromatic acids; bases exclude weak aniline bases and include amidines and guanidines (MedChemica definitions). Gobbi, A.; Poppinger, D. Biotechnology and Bioengineering 1998, 61 (1), 47–54. Reutlinger, M.; Koch, C. P.; Reker, D.; Todoroff, N.; Schneider, P.; Rodrigues, T.; Schneider, G. Mol. Inf. 2013, 32 (2), 133–138.
  23. Regression Forest & Pharmacophore understanding • hERG – auditable models • Identify important chemical features driving potency • Predict hERG potency from RF model [10-fold CV]. Model summary: pharmacophore fp length 280; 10-fold CV; compounds in training 6196; RMSE 0.37; CV R² 0.51
  24. Examples of exact Pharmacophore Pairs: HBA-same_group-Base; HBA-1_atom-HBD; Base-2_atom-Ar. Topological distances are precisely specified and can be exactly visualized on the molecules – no ambiguity over which features are correlated with activity. Critically – enables interrogation and validation of SAR understanding. Record as an unfolded fingerprint of 360 bits, 1 or 0 for presence or absence of a feature-distance-feature pair
  25. Regression Forest & Pharmacophore understanding • hERG – auditable models • Predict hERG potency from RF model [10-fold CV] • Example: CHEMBL12713 sertindole, RF prediction pIC50 7.8 • Colour structure by feature-importance-weighted sum of pharmacophore pair fingerprints – show the chemists where the hotspots are • Drill deeper to show the most important positive and negative features.
      • Positive feature: median_with 5.1, median_without 4.7, median_diff 0.4, n_examples_with 4585, n_examples_without 1383
      • Negative feature: median_with 5.1, median_without 5.3, median_diff -0.2, n_examples_with 3106, n_examples_without 2862
  26. Explainable – chemists can see the parts of the molecule that count. Explainable • Highlighted features show the chemist the contribution to the prediction. Actionable • Which parts should be optimized to achieve the Goal. Explainable • Nearest Neighbours show original data on which the model is built. Actionable • What weight do I put on this result? How likely is it? Do we test?
  27. RF and kNN are good but… • The models are good but could be great or even superb • Analysis of error identifies the exact “functional groups” that are less accurately predicted • A feedback loop could design compounds to improve models → testing • “Either not enough or the wrong sort of data – the downfall of AI in Life Science?” – Dossetter, A. G. https://www.linkedin.com/pulse/either-enough-wrong-sort-data-downfall-ai-life-al-dossetter/ Using the model RMSE to estimate error: 78% of measured values are in the range prediction +/- RMSE
  28. Overview. Generate virtual compounds from MCPairs MMPA • Hit-to-Lead transformations – the most used medicinal chemistry • ADMET transformations for metabolism and solubility • Target class transformations learning from target analogues, e.g. Dopamine Rules. Regression forest models • Accurate pharmacophore features with topological distance • Unfolded fingerprints connect feature importance to pharmacophores • Error models give accuracy of prediction for each compound. Active Learning • Explore strategy - predicted high potency, high error • Exploit strategy - predicted high potency, low error
  29. Active Learning (system operational). Flowchart steps: Hits; Build model with error estimates; Enumerate; Select for Explore and Exploit; Synthesise & Test; Compounds with data; Compounds meet criteria? (Yes / No). STRATEGIES • Explore: prioritize high error • Exploit: prioritize high potency & low error • Ratio of explore to exploit varies with stage • Select enumeration strategy by stage: Hit-to-lead, target class, solubility, metabolism • For in silico simulation, match to known and measured compounds
  30. Active Learning – V1. Challenges: • How to get started when you only have a few compounds to build a model from • Limited synthesis resource. D2 case study, Round 1… • Start with 30 literature compounds: 5 <= pIC50 <= 6, -1 < AlogP < 3.5, selected by LLE sort (literature contains 5200 compounds) • Build RF model: CV-R² -0.26, small data set • Enumerate from all compounds • What is the best enumeration strategy? – How to pick the (few) compounds to make from the enumerated set? – Enumeration is a success if we match literature compounds (very stringent test) – Have we learnt all that the initial set of compounds can teach us?
      Strategy (MMPA) | Compounds generated | Matches to D2 known set | Maximum pIC50 (actual) | Maximum pIC50 (predicted [error])
      Hit-to-Lead | 682 | 10 | 7.8 | 5.5 [0.21]
      Dopamine class | 469 | 8 | 7.9 | 5.5 [0.23]
      Solubility | 10148 | 10 | 7.8 | 5.5 [0.21]
      Metabolism | 12729 | 19 | 7.9 | 5.5 [0.21]
      Permutative MMPA (env = 4) | 5 | 3 | 7.9 | 6.1 [?]
      [Plot: D2 pIC50 vs cLogP]
  31. D2 worked example – the p-MMPA. Predicted pIC50 6.1, actual pIC50 7.9. Finding all the MMP SAR that is present and applying it exhaustively, including behind the Pareto frontier. [Plot: D2 pIC50 vs cLogP]
  32. Active Learning v2 (system under development). Flowchart steps: Hits; Compounds with data; P-MMPA (under dev); Compounds with data; Build model with error estimates; Enumerate; Select for Explore and Exploit; Synthesise & Test; Compounds with data; Compounds meet criteria? (Yes / No). • Explore: prioritize high error • Exploit: prioritize high potency & low error • Ratio of explore to exploit varies with stage • Enumerate by: target class, solubility, metabolism • Need initial “induction phase” before cyclic automated active learning can be applied
  33. Like the opening in a chess game • “The first moves of a chess game are termed the "opening" or "opening moves". A good opening will provide better protection of the King, control over an area of the board (particularly the centre), greater mobility for pieces, and possibly opportunities to capture opposing pawns and pieces.” A Beginner's Garden of Chess Openings - David A. Wheeler • Success or failure of an automated active learning system could be like the first few moves of a chess game – they shape the game… • Will it always need a human intervention (or ten…)? …set up for either Queen’s Gambit, King’s Indian Defense, Nimzo-Indian, Bogo-Indian, Queen’s Indian Defense, and Dutch Defense.
  34. Learning from First Experiments… • MMPA and RF work together to suggest and rank compound designs • Strategies explored – Explore: prioritize high error – Exploit: prioritize high potency & low error • Ratio of explore / exploit varies with stage • The initial phase from a small number of hits is a challenge – Hit-to-Lead / ADMET Rules did not match compounds in literature – Victims of what is published – Requires full datasets – Process can get “stuck” • Human intervention may always be required • Both MMPA and RF can select compounds to make to improve models – analysis of error • Permutative MMPA works very well (of course) • Where AI could help is as a compound selector depending on strategy
  35. Thank you • Dr Alexander G. Dossetter • Managing Director, MedChemica Ltd • al.dossetter@medchemic.com • MedChemica: Lauren Reid, Jessica Stacey, Phil De Sousa, Shane Montague, Edward J. Griffen, Andrew G. Leach • Available on SlideShare - search for Dossetter • Twitter @MedChemica • Twitter #BucketListPapers • https://www.medchemica.com/bucket-list/
  36. Not for Circulation. About MedChemica: >10 years’ experience in building A.I. systems for drug discovery
  38. MedChemica • Founded in 2012 by AZ AP Medicinal / Computational chemists to accelerate drug hunting by exploiting data driven knowledge • Domain leaders in SAR knowledge extraction and knowledge based design • > 11 years’ experience of building AI systems that suggest actions to chemists (7 years as MedChemica) • Creators of largest ever documented database of medicinal chemistry ADMET knowledge. Publications
  39. AI Software Platforms. Complete in-house platform: – Analysis of own data and automated updating – Design tool access to all chemists – Custom fitting. (Software-as-a-Service) One-stop GUI design tool: – Secure web-based AI design platform – CHEMBL, patent data analysed – Merged into one knowledgebase. [Audience labels: Biotech, Universities and Foundations; Medium to large pharma, agrochemical and materials research]
  40. Science As A Service (SaaS). Bespoke advanced analytics and computational chemistry services throughout the research phase. Stages: Target ID; Hit Screening; Lead Identification; Lead Optimisation; Pre-Clinical. Services: AI H2L design sets; Compound design to solve ADMET and potency issues; Third party compound assessment; Directed virtual screening for hit matter; Library design for novel protein targets; AI Toxophore assessment; Patent analysis; Pharmacophore profiling; Generating IP for clients [Scaffold hops]; Collection evaluation and enhancement
  41. Not for Circulation. Panel Discussion: What should the Medicinal Chemistry Discipline be like in 10 years? SlideShare - search for Dossetter. Twitter @MedChemica, @covid_moonshot, #BucketListPapers. https://www.medchemica.com/bucket-list/
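
The rule-selection flowchart on slide 12 can be sketched in a few lines of Python. This is a minimal illustration, not the MCPairs implementation: the function names, the significance level `alpha = 0.05`, the choice to run the binomial test on non-zero deltas only, and taking the direction from the majority sign are all assumptions made for the example. It also ignores the handling of censored (qualified) data that the slide mentions.

```python
from math import comb
from statistics import median, quantiles


def two_tailed_binomial_p(k, n):
    """Two-tailed p-value for k 'up' outcomes in n trials under p = 0.5."""
    if n == 0:
        return 1.0
    extreme = max(k, n - k)
    # Probability of a result at least this one-sided, doubled for two tails.
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def classify_transformation(deltas, alpha=0.05):
    """Classify one transformation from its list of property differences.

    alpha is an assumed significance level; the slides only say that
    6 matched pairs are needed for 95% confidence.
    """
    n = len(deltas)
    if n < 6:
        return "ignored"  # not enough data

    med = median(deltas)
    cuts = quantiles(deltas, n=10, method="inclusive")
    intercentile_range = cuts[-1] - cuts[0]  # 10th to 90th percentile
    if abs(med) <= 0.05 and intercentile_range <= 0.3:
        return "neutral"

    # Assumption: ties (delta == 0) are left out of the binomial test.
    n_up = sum(d > 0 for d in deltas)
    n_down = sum(d < 0 for d in deltas)
    if two_tailed_binomial_p(n_up, n_up + n_down) >= alpha:
        return "NED"  # No Effect Determined
    return "increase" if n_up > n_down else "decrease"
```

With six pairs that all move in the same direction, the two-tailed p-value is 2/64 ≈ 0.031, which is the smallest sample that can pass a 0.05 threshold; this matches the n ≥ 6 cut-off on the slide.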
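
The explore / exploit selection on slides 28 and 29 could look something like the sketch below. The slides give only the priorities (explore: high error; exploit: high potency and low error) and say the ratio varies with stage; the `Candidate` class, the `explore_fraction` parameter and the exploit score (predicted pIC50 minus predicted error) are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str               # hypothetical identifier for an enumerated compound
    predicted_pic50: float  # potency from the regression forest model
    predicted_error: float  # estimate from the separate error model


def select_batch(candidates, batch_size, explore_fraction):
    """Split a synthesis batch between explore and exploit picks."""
    n_explore = int(batch_size * explore_fraction + 0.5)
    n_explore = max(0, min(batch_size, n_explore))

    # Explore: highest predicted error first, potency as tie-breaker.
    by_error = sorted(
        candidates,
        key=lambda c: (c.predicted_error, c.predicted_pic50),
        reverse=True,
    )
    explore = by_error[:n_explore]

    # Exploit: high potency and low error, scored here (an assumption)
    # as predicted pIC50 minus predicted error.
    chosen = set(explore)
    remaining = [c for c in candidates if c not in chosen]
    by_score = sorted(
        remaining,
        key=lambda c: c.predicted_pic50 - c.predicted_error,
        reverse=True,
    )
    exploit = by_score[: batch_size - len(explore)]
    return explore, exploit


# Made-up candidates, for illustration only.
pool = [
    Candidate("A", 7.0, 0.2),
    Candidate("B", 6.5, 0.9),
    Candidate("C", 6.8, 0.3),
    Candidate("D", 5.0, 0.1),
]
explore_picks, exploit_picks = select_batch(pool, batch_size=2, explore_fraction=0.5)
```

Raising `explore_fraction` early in a project and lowering it later is one way to express "ratio of explore to exploit varies with stage".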

Editor's notes

  • Visualisations are anonymised data from an active client project.
  • Feature definitions are pairs from Taylor and Cosgrove, with the addition of a halogen class. Distances are topological distances, and fingerprints are binary, not scalar counts of the number of matches.
    Feature importance is permutation importance, not impurity importance.
  • Everyone wants to be able to spot the weak points in a model so it can be improved.

    Because we can identify where the under-explored regions of pharmacophore space are, we can choose to bias our ‘explore’ synthesis and testing towards improving the model in a transparent and verifiable way.

    Because we are using the precise pharmacophore definitions and Random Forest modelling, understanding where to focus attention is straightforward.
  • We can generate good compounds from enumeration. The problem is how to rank them: if we generate a lot of compounds, the initially generated model is not sufficiently discriminating. Generating lots of compounds is not the solution initially! Enumerating from HtL transformations or class transformations is better, but the best approach is to first make sure you’ve got the most out of the data you already have: permutative MMPA.

    In the D2 example, the m-OMe → o-OH transformation, if applied to the propyl compound, gives a 1.6 log increase in potency (a known, measured compound not in the training set).

    Note that env = 4 uses only env 4 transformations from MCPairs, so we only transfer exact SAR and do not generically pepper the compounds with all the substituents, e.g. just m-Cl, not all the chloros.
  • ‘Under dev’ covers MMS and extensions. It’s where Andy Bell at Exscientia comes in, I think.
  • You might want to put more of the team on the Thank you slide:

    E. Griffen, A. Leach, A. Lin, J. Stacey, L. Reid, S. Montague, P De Sousa.
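
The permutative MMPA arithmetic from slide 18 (predicted pIC50 = substrate pIC50 + median ΔpIC50) can be illustrated with a toy sketch. Real matched-pair finding fragments molecules and records the chemical environment of each change; here, purely as an assumption for illustration, each compound is a hypothetical `(core, substituent)` tuple and two compounds form a matched pair when they share a core. The data values are invented, and keeping the highest estimate when several transformations reach the same new compound is also an assumption.

```python
from collections import defaultdict
from itertools import permutations
from statistics import median


def permutative_mmpa(measured):
    """Suggest unmade compounds from the SAR already in the data set.

    measured maps (core, substituent) -> measured pIC50.
    Returns [((core, substituent), (estimated pIC50, pair count)), ...]
    sorted by estimate, highest first.
    """
    # 1. Find all matched pairs and collect the delta per transformation.
    deltas = defaultdict(list)
    for (a, pic50_a), (b, pic50_b) in permutations(measured.items(), 2):
        if a[0] == b[0] and a[1] != b[1]:
            deltas[(a[1], b[1])].append(pic50_b - pic50_a)

    # 2. Aggregate: median delta and count of pairs per transformation.
    rules = {t: (median(d), len(d)) for t, d in deltas.items()}

    # 3. Apply every transformation back to every compound it fits,
    #    dropping compounds that already exist in the data set.
    suggestions = {}
    for (core, sub), pic50 in measured.items():
        for (src, dst), (med, n_pairs) in rules.items():
            new = (core, dst)
            if src != sub or new in measured:
                continue
            estimate = pic50 + med  # substrate pIC50 + median delta
            best_so_far = suggestions.get(new, (float("-inf"), 0))[0]
            if estimate > best_so_far:
                suggestions[new] = (estimate, n_pairs)

    # 4. Prioritize new compounds by estimated pIC50.
    return sorted(suggestions.items(), key=lambda kv: kv[1][0], reverse=True)


# Invented example data: the OH analogue of coreB has not been made.
measured = {
    ("coreA", "H"): 5.0,
    ("coreA", "OMe"): 5.5,
    ("coreA", "OH"): 6.5,
    ("coreB", "H"): 5.2,
    ("coreB", "OMe"): 5.6,
}
ranked = permutative_mmpa(measured)
```

On this data the only unmade compound the transformations can reach is the coreB OH analogue, estimated from the H → OH change seen on coreA.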




