Using Matched Molecular Pairs To Cluster Compounds
1. Using Matched Molecular Pairs to cluster compounds
Willem van Hoorn
Senior Solutions Consultant
Professional Services
Accelrys, UK
2. For an introduction to matched molecular pairs (MMP) and related background, see the previous post:
https://community.accelrys.com/message/14428
3. MMP output
Output columns: SMILES of R-groups, SMILES of core, IDs / activities of compounds in MMP
4. Chemical series identification
• Series identification ≈ clustering compounds
• There is no universal best clustering method
– Personal taste
– May want few loose clusters or many tight clusters
– Etc
• Aim: identify series with interpretable SAR
5. Test set: EGFR from ChEMBL
- ChEMBL version 11
- 4609 IC50 values
- 3581 compounds
- 2869 unique compounds with IC50
https://www.ebi.ac.uk/chembl
Ed Griffen et al., J Med Chem. 2011, 54, 7739-50
6. Cluster by common core
• 2869 compounds yield 2595 unique cores
– Too many clusters
– However, many cores are substructures of others [figure: one core shown as a subset of another]
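Grouping by exact common core can be sketched as follows. This is a minimal illustration, assuming the MMP output has been reduced to (compound ID, core SMILES) records and that the core SMILES are canonical, so identical cores give identical strings. The record layout, the function name cluster_by_core and the sample SMILES are hypothetical and are not taken from the EGFR set.

```python
from collections import defaultdict


def cluster_by_core(records):
    """Group compound IDs by identical core SMILES string."""
    clusters = defaultdict(set)
    for compound_id, core_smiles in records:
        clusters[core_smiles].add(compound_id)
    return dict(clusters)


# Hypothetical (compound ID, core SMILES) pairs; [*] marks the attachment point.
records = [
    ("cpd1", "[*]c1ccccc1"),
    ("cpd2", "[*]c1ccccc1"),
    ("cpd2", "[*]c1ccc(F)cc1"),
    ("cpd3", "[*]c1ccc(F)cc1"),
]

clusters = cluster_by_core(records)
for core, members in clusters.items():
    print(core, sorted(members))
```

A compound that takes part in several MMPs appears under several cores, which is one reason the number of cores comes close to the number of compounds.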
7. Identify unique common cores
• Convert all cores to substructure queries
• Perform all vs. all substructure search
• 430 cores are not a substructure of any other core
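The all vs. all filter above can be sketched in a few lines. This is an illustration under stated assumptions, not the workflow used for the slides: the function unique_cores and the predicate is_substructure(query, target) are hypothetical names, and the demo replaces real substructure matching with a toy stand-in in which each core is a set of fragment labels and "substructure" means "subset". In a real run the predicate would wrap a cheminformatics toolkit's substructure search (with RDKit, for example, target.HasSubstructMatch(query)).

```python
def unique_cores(cores, is_substructure):
    """Return the cores that are not a substructure of any other core.

    Assumes the input cores are already de-duplicated.
    """
    kept = []
    for i, query in enumerate(cores):
        contained = any(
            i != j and is_substructure(query, target)
            for j, target in enumerate(cores)
        )
        if not contained:
            kept.append(query)
    return kept


# Toy stand-in for chemistry: a core is a set of fragment labels,
# and "is a substructure of" is modelled as "is a subset of".
toy_cores = [
    frozenset({"A"}),
    frozenset({"A", "B"}),
    frozenset({"A", "B", "C"}),
    frozenset({"D"}),
]

kept = unique_cores(toy_cores, lambda query, target: query <= target)
print([sorted(core) for core in kept])
```

The loop is quadratic in the number of cores, which is acceptable for a few thousand cores if the substructure test itself is fast.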
8. Map compounds to unique cores
• 430 series, 51 with ≥10 compounds
[Figure: number of series of a given size (y-axis, ticks 1 to 1000) against series size (x-axis)]
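The mapping and the series size counts can be sketched as below. The slides do not say how compounds were assigned to cores, so this assumes a compound belongs to every unique core it contains as a substructure. The names map_to_series, size_distribution and is_substructure are hypothetical, and the data is a toy stand-in (sets of fragment labels, with subset standing in for substructure), not the EGFR set.

```python
from collections import Counter


def map_to_series(compounds, cores, is_substructure):
    """Assign each compound to every core it contains (assumed mapping rule)."""
    series = {core: [] for core in cores}
    for compound_id, structure in compounds.items():
        for core in cores:
            if is_substructure(core, structure):
                series[core].append(compound_id)
    return series


def size_distribution(series):
    """Count how many series have each size."""
    return Counter(len(members) for members in series.values())


# Toy stand-in: structures are sets of fragment labels.
toy_compounds = {
    "cpd1": frozenset({"A", "B", "C", "x"}),
    "cpd2": frozenset({"A", "B", "C", "y"}),
    "cpd3": frozenset({"D", "x"}),
}
toy_unique_cores = [frozenset({"A", "B", "C"}), frozenset({"D"})]

series = map_to_series(toy_compounds, toy_unique_cores, lambda q, t: q <= t)
distribution = size_distribution(series)

min_size = 2  # the slide uses a threshold of 10 compounds
large_series = sum(1 for members in series.values() if len(members) >= min_size)
print(dict(distribution), large_series)
```

Under this rule a compound can fall into more than one series, so series sizes need not sum to the number of compounds.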