Pipeline Pilot Chemistry 9.0 is inheriting many new chemical representations from the Accelrys Direct data model. These include the support of the Self Contained Sequence Representation (SCSR) biologics, enhanced Markush structure representations, Markush homology groups, and Non Specific Structures (NONS). Also significantly enhanced is the support for Sgroups, in particular for polymers, mixtures, and formulations. Further, Pipeline Pilot depiction has been upgraded to support these enhancements and the stereochemical perception and ring perception capabilities were improved based on Direct.
The major benefit of these changes is that Direct and Pipeline Pilot now use the same data model. Searches carried out in Direct or in Pipeline Pilot will return identical results and both products will deliver identical structural perceptions. This session will give guidance on how these changes will impact your calculators and models and how you can plan for a smooth upgrade.
(ATS6-PLAT01) Chemistry Harmonization: Bringing together the Direct 9 and Pipeline Pilot Chemistry Data Models
1. (ATS6-PLAT01) Chemistry Harmonization
Bringing together the Direct 9 and Pipeline Pilot Chemistry
Data Models
Ton van Daelen, Ph.D.
Product Director, Platform
Product Management
ton.vandaelen@accelrys.com
Keith Taylor, Ph.D.
Product Manager, Chemistry
Product Management
keith.taylor@accelrys.com
2. The information on the roadmap and future software development efforts is
intended to outline general product direction and should not be relied on in making
a purchasing decision.
3. Content
• We are harmonizing the chemical representations in
Pipeline Pilot 9.0 and Direct 9.0
• Pipeline Pilot, Direct and Draw to adopt best-of-breed
features
• What will you learn?
– What your scientists need to be aware of
– How to manage this change as an administrator
4. Direct 9.0 – Changes
• Note: Direct 9 will return different search results in some
cases, consistent with Pipeline Pilot
– Aromaticity perception now based on Hückel rule (4n+2)
– Tautomer perception based on Sayle et al. paper*
• Consistency between Pipeline Pilot, Accelrys Direct, and
Accelrys Draw
– Same chemistry, same results everywhere
*Canonicalization and Enumeration of Tautomers, Sayle and Delany, EuroMUG99, 28-29 October 1999, Cambridge, UK
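As a rough illustration of the 4n+2 criterion named on this slide, the sketch below checks only the pi-electron count of a ring. The function name is hypothetical, and real aromaticity perception also depends on ring planarity and conjugation, which are not modeled here.

```python
def satisfies_huckel(pi_electrons: int) -> bool:
    """Return True when the pi-electron count equals 4n + 2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


# Benzene has 6 pi electrons, cyclobutadiene 4, cyclooctatetraene 8.
for name, count in [("benzene", 6), ("cyclobutadiene", 4), ("cyclooctatetraene", 8)]:
    print(name, satisfies_huckel(count))
```

Only benzene passes the electron-count test in this example; the other two rings have 4n pi electrons.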
5. Pipeline Pilot 9.0 – New Capabilities
Consistency between Pipeline Pilot Chemistry, Direct, and Draw
• Enhanced representation – ‘What you see is what you have’
• Depiction engine from Direct and Draw
• Mappers supporting new representations
• Calculators upgraded to interpret new representations
• Enhanced perceptions of stereochemistry, aromaticity, and rings
Note
• Changes to perception mean that models and calculators must be
relearned and re-baselined
– Significant effect from new ring perception option
– Stereochemistry and aromaticity have smaller, but important effect
6. Pipeline Pilot 9.0 – Improved Chemical Representations
• Single/double/triple bonds supported
in NONS
• Coordination/Dative bond
• Haptic bonds
• Markush Homology Groups
• Hydrogen bonds
7. • Rendering between Accelrys Draw and Pipeline Pilot
9.0 now consistent
• Pipeline Pilot now supports:
– PNG
– JPEG
– GIF
– SVG
– EMF – Linux and Windows!
• SVG and EMF generation fast
– ~ 10,000 structures per second
Pipeline Pilot 9.0 – Depiction
[Figure: depictions labeled Draw and Pipeline Pilot]
8. • Abbreviated groups are frequently used to simplify structures
• Attachment points are now correct
– The Pipeline Pilot 8.x depictions are incorrect on the left of the phenyl
group
– The labels depicted imply different chemical entities
• Visual corruption
• Nitrile (CN) and isonitrile (NC) are chemically different
• NCS and SCN are also different entities
• Rich text markup renders correctly
• Whitespace around labels is consistent
– Affects perceived bond length
Pipeline Pilot 9.0 – Depiction
[Figure: depictions labeled Draw and Pipeline Pilot]
9. • Markush/Rgroup depiction is complete in Pipeline Pilot rendering
• Now renders
– Rgroups definitions (e.g. R1 …)
– Rgroup logic (R1 = 1; R2 >= 0)
– Directionality indicated for fragments with multiple attachment points (e.g.
“ on R2)
Pipeline Pilot 9.0 – Depiction
[Figure: depictions labeled Draw and Pipeline Pilot]
10. • Nonspecific (NONS) representations are equivalent to those in
Direct 8.0 and Draw 4.1
– Pipeline Pilot version does not lose information
• Examples from mass spectrometry and industrial chemicals
Pipeline Pilot 9.0 – Depiction
[Figure: two pairs of depictions labeled Draw and Pipeline Pilot]
11. • Increased focus on biological therapeutics
• Representation exposed in Pipeline Pilot 8.5
• Completed in 9.0
– Much more functional and sophisticated
Biologics
[Figure: depictions labeled Pipeline Pilot and Draw]
15. What does this mean to my scientists? (1)
• Higher quality reports
– Supports perception of quality research
• Enhanced depiction of biologics and Markush generics
– These look different, and minor adjustments to depiction protocols may be needed
• New chemical representations
– No change to existing protocols
– New opportunities opened up
• Expect marginal differences in hit sets between Direct 8 and 9 due to different
aromaticity and tautomer perceptions
16. • Enhanced mapping – New in 9.0 e.g. Imipramine Metabolites
Mapping: Non Specific Structures - New
17. • Screen MDDR data set
– 129,237 structures screened in ~30s
– No pre-processing
Mapping: Homology group screening
Hits = 470
Hits = 108
Hits = 45
Hits = 16
Hits = 10
18. • Changes to stereochemical and aromaticity perception drive changes in the behavior
of:
– Learned models
– Calculators
– Structure Matchers
• Need to relearn and re-baseline calculators and models
• Change is discontinuous (!)
• There will be no legacy mode
– Because this will cause incompatibilities and drive confusion
Data Model Changes from PP 8.x to PP 9
19. Compatibility: Pipeline Pilot and Accelrys Direct
• PP 9.0 and Direct 9.0 (2013)
– 100% compatible
• PP 9.0 and Direct 8.0
– Only difference is aromaticity
perception edge-cases
– Direct 8.0 uses its current
aromaticity perception
• Template based
– Differs from that in Pipeline Pilot 9.0
• Hückel (4n+2) rule based
– Minor differences will be observed
20. Dataset | Number of Structures | Canonical SMILES | AlogP | Number of Rings | Number of Aromatic Rings | Number of Stereo Atoms | ECFP4
ACD | 239,996 | 251 | 105 | 2,455 | 65 | 0 | 214
Asinex | 137,799 | 26 | 24 | 1,070 | 22 | 0 | 43
Maybridge | 51,058 | 2 | 0 | 438 | 0 | 0 | 1
MDDR 2010 | 201,748 | 62 | 24 | 3,271 | 29 | 4 | 46
WDI | 53,517 | 37 | 14 | 612 | 10 | 0 | 42
Observed Differences in Calculated Values
Table shows the number of structures in the datasets that had different values in 9.0 compared with 8.5
Difference generally very small
Ring perception leads to more prominent differences especially in drug-like datasets
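To put the counts in proportion, the following sketch converts them into percentages of each dataset. The numbers are copied from the table on this slide; the dictionary layout and function name are illustrative choices only.

```python
# Counts copied from the slide: dataset -> (structures, {descriptor: structures with changed value}).
DIFFERENCES = {
    "ACD": (239_996, {"Canonical SMILES": 251, "AlogP": 105, "Number of Rings": 2_455,
                      "Number of Aromatic Rings": 65, "Number of Stereo Atoms": 0, "ECFP4": 214}),
    "Asinex": (137_799, {"Canonical SMILES": 26, "AlogP": 24, "Number of Rings": 1_070,
                         "Number of Aromatic Rings": 22, "Number of Stereo Atoms": 0, "ECFP4": 43}),
    "Maybridge": (51_058, {"Canonical SMILES": 2, "AlogP": 0, "Number of Rings": 438,
                           "Number of Aromatic Rings": 0, "Number of Stereo Atoms": 0, "ECFP4": 1}),
    "MDDR 2010": (201_748, {"Canonical SMILES": 62, "AlogP": 24, "Number of Rings": 3_271,
                            "Number of Aromatic Rings": 29, "Number of Stereo Atoms": 4, "ECFP4": 46}),
    "WDI": (53_517, {"Canonical SMILES": 37, "AlogP": 14, "Number of Rings": 612,
                     "Number of Aromatic Rings": 10, "Number of Stereo Atoms": 0, "ECFP4": 42}),
}


def percent_changed(dataset: str, descriptor: str) -> float:
    """Percentage of structures in a dataset whose descriptor value differs between 8.5 and 9.0."""
    total, changed = DIFFERENCES[dataset]
    return 100.0 * changed[descriptor] / total


for dataset in DIFFERENCES:
    print(f"{dataset}: {percent_changed(dataset, 'Number of Rings'):.2f}% of ring counts changed")
```

In every dataset the ring count is the descriptor with the most changed structures, which matches the remark about ring perception.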
21. • Descriptors such as EC Fingerprints, Canonical Smiles, Ring Counts,
AlogP could be different from Pipeline Pilot 8.5
• Results from learned models that use such descriptors could be a
little different
– Retraining the models is recommended
• Canonical SMILES and feature keys could be different
– Recalculating database indices is recommended
• Similarity and substructure searching could also produce different
results
Effect of Perception Changes
22. Effect of Perception Changes
Comparison of DrugLike models
learned in Pipeline Pilot 8.5 and
retrained in Pipeline Pilot 9.0
applied to molecules in the Asinex
data set
The results are very similar for most
molecules, with larger deviations
for a few
23. What do I need to do as an admin?
• When to upgrade?
– Use Direct and AEP/PP independently:
• Upgrade to get new capabilities
– Use Direct and PP in a mixed environment:
• As soon as possible in order to benefit from harmonized chemistry
• If you are using ChemReg
– Wait until AEP 9.1 is released and do one AEP upgrade
– AEP 9.1 contains chemistry updates for Direct 9 capabilities
• What instructions do I give my users?
– Rebuild learned models and calculators under PP 9.0
• What testing do I need to do?
– Run your standard test set and determine that differences from baseline are
expected due to the changes in chemical perception
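One way to approach the baseline comparison is sketched below. It assumes the calculated properties from the old and new installations have already been exported into dictionaries keyed by a structure identifier; the identifiers, property names, and values shown are hypothetical.

```python
def compare_to_baseline(baseline, current, tolerance=1e-6):
    """Return {structure_id: {property: (old, new)}} for every value that differs."""
    differences = {}
    for structure_id, old_props in baseline.items():
        new_props = current.get(structure_id, {})
        changed = {}
        for name, old in old_props.items():
            new = new_props.get(name)
            if isinstance(old, float) and isinstance(new, float):
                same = abs(old - new) <= tolerance  # numeric values: compare within tolerance
            else:
                same = old == new  # counts and strings: compare exactly
            if not same:
                changed[name] = (old, new)
        if changed:
            differences[structure_id] = changed
    return differences


# Hypothetical exported values from an 8.5 baseline run and a 9.0 run.
baseline = {"mol-1": {"Num_Rings": 5, "ALogP": 2.31}, "mol-2": {"Num_Rings": 2, "ALogP": 1.07}}
current = {"mol-1": {"Num_Rings": 6, "ALogP": 2.31}, "mol-2": {"Num_Rings": 2, "ALogP": 1.07}}
print(compare_to_baseline(baseline, current))
```

Each reported difference can then be reviewed to decide whether it is explained by the ring, aromaticity, or stereochemistry perception changes.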
24. Implications for Other Products
• Direct 9 retains historic APIs and search types
– Maintenance and interfacing are unchanged
• All supported versions of Draw are compatible with Direct 9
• ChemReg 3.2 will be supported on Direct 8 and 9
• AELN will support Direct 9 in a future release
• Should I be running Direct 8 and 9 simultaneously for a
while?
– This is possible but not recommended: different search results will
confuse users
– Recommendation: verify your enterprise systems with Direct 9
and then move Direct 9 to production
25. Summary
• Chemistry harmonization project:
– PP 9.0 inherits many new chemical representations
– Existing representations enhanced
– Aromaticity, stereochemistry and ring perceptions enhanced
– Significant improvement to depiction aesthetics
• Accelrys Enterprise Platform, Pipeline Pilot 9 and Direct 9
deliver the same results
26. Where do I go for more information?
• Resources
– Admin guides
• AEP/PP 9
• Direct 9
– Chemical representation changes documents
• AEP/PP 9
• Direct 9
• Community / download
– Log into Accelrys community forums
• E.g.: https://community.accelrys.com/community/accelrys_direct__draw__and_jdraw
• Accelrys is there to help
– Customer support – upgrade strategies
– Professional services – upgrade service
28. • Single chemistry foundation with single data model implemented in
a single code stream
– Adopted by Tools and Platform
• Direct, Pipeline Pilot and Accelrys Enterprise Platform
– Application Stack inherits all of the chemistry capabilities
• Simplifies development and application environment
• Enhances our ability to deliver new functionality more quickly across the
products
Harmonization delivers
29. Other New Features in PP 9.0
• Component for reaction-based tautomer enumeration
• Based on a set of twenty one SMIRKS described in "Tautomerism in Large
Databases", Sitzmann, M.; Ihlenfeldt, W.D. & Nicklaus, M. C., J. Comput. Aided
Mol. Des., 2010, 24, 521-551
• Components to do Data Fusion and to Rank Similarities
• Based on “Combination of Similarity Rankings Using Data Fusion”, Peter Willett, J.
Chem. Inf. Model., 2013, 53, 1−10
• Bad Isotope Filter now flags radioactive isotopes
• Components to check structures for querying or registration
• Customizable external elements table (PTable)
• Alternative method to calculate atom-atom mappings in
reactions
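The data fusion bullet can be illustrated with a rank-sum rule, one of the simpler ways to combine similarity rankings. This is a sketch only: the slide does not say which fusion rules the components implement, and the function names and scores below are hypothetical.

```python
def ranks(scores):
    """Map each id to its 1-based rank, highest similarity first (ties broken by id)."""
    ordered = sorted(scores, key=lambda mol_id: (-scores[mol_id], mol_id))
    return {mol_id: position for position, mol_id in enumerate(ordered, start=1)}


def fuse_by_rank_sum(score_lists):
    """Combine several similarity score dictionaries by summing per-list ranks."""
    shared = set(score_lists[0])
    for scores in score_lists[1:]:
        shared &= set(scores)  # keep only ids scored in every list
    totals = {mol_id: 0 for mol_id in shared}
    for scores in score_lists:
        list_ranks = ranks({mol_id: scores[mol_id] for mol_id in shared})
        for mol_id in shared:
            totals[mol_id] += list_ranks[mol_id]
    # Lowest summed rank comes first in the fused ranking.
    return sorted(shared, key=lambda mol_id: (totals[mol_id], mol_id))


# Hypothetical similarities of three database compounds to three reference structures.
searches = [
    {"a": 0.9, "b": 0.5, "c": 0.7},
    {"a": 0.8, "b": 0.4, "c": 0.9},
    {"a": 0.3, "b": 0.2, "c": 0.6},
]
print(fuse_by_rank_sum(searches))
```

Summing ranks rather than raw scores keeps one search with unusually high similarity values from dominating the fused list.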
30. • Ported CHRP mapper (FSMapper) to Pipeline Pilot source base
• New mapping components decide automatically (user doesn’t know or care)
which mapper to use (PP SGMapper or new FSMapper), depending on the
molecular features present in queries and targets
• FSMapper is used for
• Reactions
• Rgroups with two attachments
• Polymers and link nodes
• Variable-attachment bonds (Markush bonds)
Harmonization of Mapping Functionality
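The automatic choice between the two mappers might be pictured as follows. This is a sketch under assumptions: the feature labels and function name are hypothetical, and the actual selection logic inside the components is not described beyond the list on this slide.

```python
# Features that, per the slide, route a mapping to FSMapper (labels are hypothetical).
FSMAPPER_FEATURES = frozenset({
    "reaction",
    "rgroup_two_attachments",
    "polymer",
    "link_node",
    "variable_attachment_bond",
})


def choose_mapper(query_features, target_features):
    """Pick a mapper name from the features present in either the query or the target."""
    present = set(query_features) | set(target_features)
    return "FSMapper" if present & FSMAPPER_FEATURES else "SGMapper"


print(choose_mapper({"aromatic_ring"}, {"stereo_center"}))
print(choose_mapper({"link_node"}, set()))
```

The first call has no FSMapper-only feature and falls back to SGMapper; the second contains a link node and is routed to FSMapper.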
31. • New mapping components
• Work with queries from Tag and from File
• Old mapping components are in a deprecated folder
• Use only PP SGMapper (don’t handle all the new features)
• Can be used to reproduce previous mapping behavior if needed
Harmonization of Mapping Functionality
32. • Charged non-metals are now treated as their “isoelectronic” equivalent:
– B- ~ C ~ N+ ~ O+2 ~ F+3
– Si- ~ P ~ S+ ~ Cl+2
• The bad valence filter is improved and now catches more bad anions.
• Metal anions no longer have implicit hydrogens
– Aluminum anions are an exception (for support of aluminum hydride anion)
• Nitrogen (V) is still allowed as a drawing alternative for nitro- and diazo- groups, amine
oxides, and related substructures. However, the application is now less likely to perceive
uncharged quaternary nitrogens as having implicit hydrogens.
• Atoms with illegal valence are now better distinguished from atoms with maximum
valence in ECFP fingerprint bits. For example, the Oxygen in N=O and N#O is now typed
differently. This can affect the Canonical SMILES atom order for structures containing
atoms with illegal valence.
• The changes in valence result in changes to ECFP fingerprint bits and Canonical SMILES.
Valence and Implicit Hydrogens
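The two equivalence series above can be read as "same period, same valence-electron count after accounting for charge". The sketch below encodes that reading for the elements named on the slide; it is an interpretation for illustration, not the product's actual rule, and the names are hypothetical.

```python
# (period, valence electrons of the neutral atom) for the elements named on the slide.
ELEMENTS = {
    "B": (2, 3), "C": (2, 4), "N": (2, 5), "O": (2, 6), "F": (2, 7),
    "Si": (3, 4), "P": (3, 5), "S": (3, 6), "Cl": (3, 7),
}


def isoelectronic_key(symbol, charge=0):
    """Period plus valence-electron count after removing (or adding) electrons for the charge."""
    period, valence = ELEMENTS[symbol]
    return (period, valence - charge)


def are_isoelectronic(first, second):
    """Compare two (symbol, charge) pairs."""
    return isoelectronic_key(*first) == isoelectronic_key(*second)


print(are_isoelectronic(("B", -1), ("N", +1)))  # both match neutral carbon
print(are_isoelectronic(("S", +1), ("P", 0)))
print(are_isoelectronic(("C", 0), ("P", 0)))    # different period and electron count
```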
33. Ring perception is improved. Previously, the SSSR ring perception algorithm was used. An SSSR is not
unique: in complex non-planar assemblies the rings it reports depend on atom order and bond order, so some
rings are often missed. The unique “K-rings” perception algorithm is now used, which is the union of all
possible SSSR sets. These changes result in changes to Canonical SMILES and improved aromaticity perception.
Examples
• Now perceived as 3 rings:
• Now perceived as 4 rings:
• Now perceived as 6 rings:
Rings
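The difference between one SSSR and the union of all SSSRs can be seen on a cube-shaped carbon skeleton such as cubane. The example molecule is chosen here for illustration and is not taken from the slide's figures. The sketch computes the size of any single SSSR (bonds minus atoms plus connected components) and enumerates all four-membered rings; identifying a ring by its atom set is a simplification that is adequate for this graph.

```python
from itertools import combinations


def cube_graph():
    """Cube skeleton: 8 atoms, bonded when their 3-bit labels differ in exactly one bit."""
    adjacency = {atom: set() for atom in range(8)}
    for a, b in combinations(range(8), 2):
        if bin(a ^ b).count("1") == 1:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def count_components(adjacency):
    """Count connected components with an iterative depth-first search."""
    seen, components = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            atom = stack.pop()
            if atom not in seen:
                seen.add(atom)
                stack.extend(adjacency[atom] - seen)
    return components


def sssr_size(adjacency):
    """Number of rings in any one SSSR: bonds - atoms + components."""
    bonds = sum(len(neighbors) for neighbors in adjacency.values()) // 2
    return bonds - len(adjacency) + count_components(adjacency)


def rings_of_size(adjacency, size):
    """All simple cycles with the given number of atoms, each stored as a set of atoms."""
    found = set()

    def walk(path):
        if len(path) == size:
            if path[0] in adjacency[path[-1]]:
                found.add(frozenset(path))
            return
        for neighbor in adjacency[path[-1]]:
            if neighbor not in path:
                walk(path + [neighbor])

    for start in adjacency:
        walk([start])
    return found


cubane = cube_graph()
print(sssr_size(cubane), len(rings_of_size(cubane, 4)))
```

With 12 bonds and 8 atoms, a single SSSR holds 5 rings, yet the cube has 6 equivalent four-membered faces. Any 5 of them form a valid SSSR, so which face is left out depends on traversal order, while the union of all such sets contains all 6.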
34. • The isoelectronic equivalence enhancement described in Valence and Implicit Hydrogens improves the
perception of ring systems containing charged non-metals. Improved detection of bad valence for
anions also contributes to improved perception of aromaticity.
• The set of atoms that can contribute a lone pair to an aromatic ring is extended (from N,O,P,S) to
include As, Se, and Te.
• These changes result in changes to ECFP fingerprint bits and Canonical SMILES.
Examples
• Now perceived as aromatic:
• No longer perceived as aromatic:
Aromaticity Perception
35. • The isoelectronic equivalence enhancement described in Valence and Implicit Hydrogens improves the
perception of stereogenic centers that include charged non-metals.
• The symmetric equivalence of O-/OH/=O groups attached to P and S atoms has been extended to
include As, Se, and Te centers.
• Stereo validation logic of reader code is synchronized with perception code. This allows for more
consistent application of rules prohibiting S(IV) centers, P(V) centers, symmetric equivalence of O-
/OH/=O, etc.
• “Double-symmetric” ring atom perception is improved. Several symmetric spiro cases are now
correctly not marked as pseudo-stereo.
Examples
• Now perceived as stereo:
• More consistently perceived as not stereo:
Stereochemical Perception
38. OpenEye Molecule To Name Component
2,3,4,5-tetrahydro-1λ6,4-benzothiazepine 1,1-dioxide
2,3,4,5-tetrahydro-1λ<sup>6</sup>,4-benzothiazepine 1,1-dioxide
Options to use HTML tags and special characters
39. OpenEye Molecule From Name Component
2-[4-[(3,5-dichloro-4-pyridyl) oxy]phenyl] acetonitrile
leucine
tylenol
40. New Science
• Scaffold Tree
• Based on "The Scaffold Tree, Visualization of the Scaffold Universe by
Hierarchical Scaffold Classification", Schuffenhauer, A., Ertl, P., Roggo, S.,
Wetzel, S., Koch, M. A., Waldmann, H., J. Chem. Inf. Model. 2007, 47, 47-58
• Quantitative Estimate of Drug-Likeness (QED)
• Based on “Quantifying the Chemical Beauty of Drugs”, G. Richard Bickerton,
Gaia V. Paolini, Jérémy Besnard, Sorel Muresan, Andrew L. Hopkins, Nature
Chemistry 4, 90–98 (2012)
• Synthetic Accessibility (SAscore)
• Based on “Estimation of Synthetic Accessibility Score of Drug-like
Molecules Based on Molecular Complexity and Fragment Contributions”,
Peter Ertl and Ansgar Schuffenhauer, Journal of Cheminformatics, 2009, 1:8
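For orientation, QED as defined in the cited paper combines per-property desirability values (each between 0 and 1) through a geometric mean, optionally weighted. The sketch below shows only that final combination step; the desirability functions themselves are not reproduced, the input values are hypothetical, and nothing here describes how the Pipeline Pilot component is implemented.

```python
import math


def qed_from_desirabilities(desirabilities, weights=None):
    """Weighted geometric mean: exp(sum(w_i * ln d_i) / sum(w_i)); every d_i must be > 0."""
    if weights is None:
        weights = [1.0] * len(desirabilities)
    total_weight = sum(weights)
    log_sum = sum(w * math.log(d) for w, d in zip(weights, desirabilities))
    return math.exp(log_sum / total_weight)


# Hypothetical desirability values for three properties of one molecule.
print(qed_from_desirabilities([0.9, 0.6, 0.8]))
```

Because the mean is geometric, a single property with very low desirability pulls the overall score down sharply.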