Poster wellcome-trust-2013-herhaus-fuzzy-mcs

Improving Cluster Representations by "Fuzzifying"
Maximum Common Substructures
Christian Herhaus
Merck KGaA, Computational Chemistry, Frankfurter Str. 250, 64293 Darmstadt, christian.herhaus@merckgroup.com

Introduction

Results

Arranging similar structures in clusters is one of the typical tasks of modern
Chemoinformatics with high impact in HTS follow-up, generation of structure activity
relationships (SAR) and selection of starting points for compound optimization.

The approach presented here allows the generation of MCS also for similarity-based
clusters with a given inherent structural diversity. It does so by utilizing query features like
atoms lists and query bonds to add "fuzziness" in atom and bond information to the MCS.

Methods for cluster generation are as diverse as the structures which they are applied to
[1], may they be e.g. similarity- or substructure-based. Typically, medicinal chemists tend
to orientate themselves in structure subsets like clusters with the help of substructures,
so-called "scaffolds", which intuitively characterize the structural relationships between
the molecules of the subset.

As a result, the generated "fuzzy" MCS, although still being fully database-searchable, is
more meaningful for the characterisation of clusters as it can cover larger parts of the full
structures than a conventional MCS could do. The approach was implemented in Pipeline
Pilot™ for proof of concept but is general enough to be transferred to other technical
platforms as well.

In the case of substructure-based clustering, well established methods are existing for
generation of Maximum Common Substructures (MCS) which are present in all members
of the structure population or a defined proportion thereof [2]. But in the case of similaritybased clusters, such MCS may either not be existing for the required dataset proportion
or the common substructure may be so small that it is no longer representative and
therefore lacking information.

Conventional
MCS

Aim of this approach is to provide descriptive scaffold structures also for clusters whose
intrinsic structural diversity is too high to be characterized by conventional Maximum
Common Substructure approaches.
"Fuzzy"
MCS

Methods

any
s/d
s/d

s/d

Content of the Pipeline Pilot™ component "Generate Fuzzy MCS" (slightly simplified)

1. Reduced graphs

2. Generalized MCS
any

any

any

any

any

any any

any

any
any any

any
any any

any

any any
any any

any
any

any

any
any

any
any
any any

any

any

6. Query bonds coded as
stereo bonds

7. Final "fuzzy" MCS

any
s/d
s/d

3. Generalized MCS mapped onto structures

4. Structures reduced & mappings normalized

6

1

1

1
17

11

11

11
9
7

1

13

11

16

5. Collected atom/bond information transfered into queries
Atom No
1
7
11

Elements
O,O,S
N,N,O
C,C,C

Atom No
1
7
11

Element
[O,S]
[N,O]
C

Bond No
3
9
14

Types
1,1,1
1,2,1
2,1,4

Bond No
3
9
14

Type
single
single/double
aromatic

7

7
20

s/d

7

Where to get / How to contribute
The current version of the component is published on the Accelrys® Pipeline
Pilot™ user forum and can be downloaded from
https://community.accelrys.com/message/16528.
Contributors are appreciated for improving the approach and for transfering
the concept to other technology platforms like RDKit, CDK or KNIME.

Current limitations
• Aromaticity may cause unexpected effects (e.g. single/double vs. aromatic bonds)
• Symmetric substructures may blur atom/bond information (e.g. carboxyl or nitro groups)
• Currently, just one presence of an atom/bond feature leads to inclusion into a query
feature. For larger clusters, a percentual threshold may be required
• Different ring sizes and chain lengths can so far not be described by query patterns.

[1] Downs GM, Barnard JM: Clustering Methods and Their Uses in Computational Chemistry. In Reviews in Computational Chemistry (18). Chichester: Wiley and Sons; 2003, 1-40.
[2] Ehrlich HC, Rarey M: Maximum common subgraph isomorphism algorithms and their applications in molecular science: a review. WIREs Comput Mol Sci 2011, 1:68-79.

Poster wellcome-trust-2013-herhaus-fuzzy-mcs

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

Similaire à Poster wellcome-trust-2013-herhaus-fuzzy-mcs

Similaire à Poster wellcome-trust-2013-herhaus-fuzzy-mcs (20)

Poster wellcome-trust-2013-herhaus-fuzzy-mcs