245th ACS National Meeting, New Orleans, 9th April 2013




  Roundtripping between
    small-molecule and
biopolymer representations

     Noel O’Boyle and Roger Sayle
           NextMove Software
             Evan Bolton
               PubChem
Top 15 drugs ranked by sales, Q4 2012

Rank  Trade Name      Name                                Type of biologic
1     Abilify         aripiprazole                        -
2     Nexium          esomeprazole                        -
3     Crestor         rosuvastatin                        -
4     Cymbalta        duloxetine                          -
5     Humira          adalimumab                          Monoclonal antibody
6     Advair Diskus   fluticasone/salmeterol              -
7     Enbrel          etanercept                          Protein attached to monoclonal antibody
8     Remicade        infliximab                          Monoclonal antibody
9     Copaxone        glatiramer acetate                  Peptide
10    Neulasta        pegfilgrastim                       PEG attached to protein
11    Rituxan         rituximab                           Monoclonal antibody
12    Spiriva         tiotropium bromide                  -
13    Atripla         emtricitabine/tenofovir/efavirenz   -
14    OxyContin       oxycodone                           -
15    Januvia         sitagliptin                         -


Source: Drugs.com Statistics, Q4 2012
(http://www.drugs.com/stats/top100/2012/q4/sales)
A PICTURE IS WORTH A THOUSAND
          WORDS…?
…but it depends on the
      picture




  FDA            IUPAC
Is it a file format problem?
 • HELM (Hierarchical editing language for macromolecules)
       – Developed at Pfizer, and promoted by Pistoia Alliance
       – Hierarchical description of monomers and oligomers and connections
         between them




       RNA1{P.R(A)P.R(G)P.R(C)P.R(U)P.R(T)P.R(T)P.R(T)}|CHEM1{SS3}
       $RNA1,CHEM1,1:R1-1:R1$$$
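
To make the hierarchy in the HELM example concrete, here is a minimal Python sketch that splits the string into its simple polymers and the connections between them. It assumes the two lines shown on the slide form one string, that sections are separated by `$`, and that polymers are separated by `|`. The function name `parse_helm` is hypothetical, and the parser is deliberately naive: it is not a full HELM reader and does not look up monomer definitions.

```python
import re

def parse_helm(helm: str):
    """Naively split a HELM string into simple polymers and connections."""
    sections = helm.split("$")

    # Section 1: simple polymers such as RNA1{...}|CHEM1{...}
    polymers = {}
    for chunk in sections[0].split("|"):
        match = re.fullmatch(r"(\w+)\{(.*)\}", chunk)
        if match is None:
            raise ValueError(f"unrecognized polymer: {chunk!r}")
        polymers[match.group(1)] = match.group(2)

    # Section 2: connections such as RNA1,CHEM1,1:R1-1:R1
    connections = []
    if len(sections) > 1 and sections[1]:
        for item in sections[1].split("|"):
            source, target, link = item.split(",")
            start, end = link.split("-")
            connections.append((source, target, start, end))
    return polymers, connections

helm = ("RNA1{P.R(A)P.R(G)P.R(C)P.R(U)P.R(T)P.R(T)P.R(T)}|CHEM1{SS3}"
        "$RNA1,CHEM1,1:R1-1:R1$$$")
polymers, connections = parse_helm(helm)
print(polymers["CHEM1"])   # SS3
print(connections)         # [('RNA1', 'CHEM1', '1:R1', '1:R1')]
```

Note that the monomer names (P, R, A, SS3, ...) are only meaningful together with the all-atom monomer definitions, which is the dependency discussed later in the talk.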




Zhang et al. J. Chem. Inf. Model. 2012, 52, 2796.
[Diagram: the Pfizer macromolecular editor and HELM, linked to SMILES, INCHI and DEPICTION. A second build of the slide adds a "Database of Monomer Definitions".]

Example SMILES shown on the slide:
C[C@H](O1)[C@@H](O)[C@@H](O)[C@H](O)[C@@H]1(O2).C(O)[C@@H](O1)[C@H](O)[C@H](O)…
Is it a file format problem?
• PLN (Protein line notation)
   – Developed by Biochemfusion
   – Also software to edit and depict
   – Adheres closely to IUPAC recommendations and extended to handle
     cycles and arbitrary modified residues


              H-ASDF-OH.H-CGTY-OH id=P0001 **

• SCSR (Self-contained sequence representation)
   – Developed as part of Accelrys Draw
   – Extension of Mol V3000
   – Can convert between all-atom representation and condensed forms
…OR is it a perception problem?

• Existing all-atom representations are fully capable of storing
  biopolymers
   – PDB files have been used to store biopolymers for some time
   – SMILES and MOL files can store macromolecules


• On-the-fly perception from connection table
   – Perception of biopolymer units and the nature of the connections
     between those


• Convert perceived structure to any of several all-atom or
  biopolymer representations
[Diagram: a perception engine at the center, surrounded by the representations it connects: PDB, MOL, SMILES, HELM, PLN, IUPAC NAME, IUPAC Condensed and DEPICTION.]
Sugar & Splice example: cellobiose

SMILES
  [C@@H]1([C@H](O)[C@@H](O)[C@H](O)[C@H](O1)CO)O[C@H]1[C@@H]([C@H](C(O)O[C@@H]1CO)O)O

IUPAC Condensed
  Glc(b1-4)Glc

common NAME
  cellobiose

IUPAC NAME
  beta-D-gluco-hexopyranosyl-(1->4)-D-gluco-hexopyranose

LINUCS
  [][D-Glcp]{
     [(4+1)][b-D-Glcp]{}
  }

PDB
  Mark both residues as BGC
  Atoms as C1-C6, O1-O6

DEPICTION
  Coming soon…
Sugar & Splice example: [5-L-aspartic acid]oxytocin

SMILES
  [C@H]1(CCCN1C(=O)[C@@H]1CSSC[C@@H](C(=O)N[C@@H](Cc2ccc(cc2)O)C(=O)N[C@@H]([C@H](CC)C)C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CC(=O)O)C(=O)N1)N)C(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N

PLN
  H-C(1)YIQDC(1)PLG-[NH2]

common NAME
  [5-L-aspartic acid]oxytocin

IUPAC Condensed
  L-Cys(1)-L-Tyr-L-Ile-L-Gln-L-Asp-L-Cys(1)-L-Pro-L-Leu-Gly-NH2

IUPAC NAME
  L-cysteinyl-L-tyrosyl-L-isoleucyl-L-glutaminyl-L-alpha-aspartyl-L-cysteinyl-L-prolyl-L-leucyl-glycinamide (1->6)-disulfide

Also shown: DEPICTIONS, PDB
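
The PLN and IUPAC condensed strings on this slide carry the same information, so one can be derived from the other. The following Python sketch illustrates that for this example only. It assumes a single chain with an N-terminal `H-`, a C-terminal `-OH` or `-[NH2]`, one-letter L-amino acid codes, and optional `(n)` cycle labels. The function name and lookup table are hypothetical and cover just the residues that occur here; it is not how Sugar & Splice or the Biochemfusion tools are implemented.

```python
import re

# One-letter to three-letter codes, limited to the residues in this example.
THREE_LETTER = {"C": "Cys", "Y": "Tyr", "I": "Ile", "Q": "Gln", "D": "Asp",
                "P": "Pro", "L": "Leu", "G": "Gly"}

def pln_to_condensed(pln: str) -> str:
    """Convert a simple single-chain PLN string to IUPAC condensed form."""
    match = re.fullmatch(r"H-((?:[A-Z](?:\(\d+\))?)+)-(OH|\[NH2\])", pln)
    if match is None:
        raise ValueError(f"unsupported PLN: {pln!r}")
    body, c_term = match.groups()

    parts = []
    for letter, label in re.findall(r"([A-Z])(\(\d+\))?", body):
        name = THREE_LETTER[letter]
        prefix = "" if name == "Gly" else "L-"   # glycine is achiral
        parts.append(f"{prefix}{name}{label}")   # label is "" when absent

    # Assumption: a free acid terminus is left implicit, like the N-terminal H.
    if c_term == "[NH2]":
        parts.append("NH2")
    return "-".join(parts)

print(pln_to_condensed("H-C(1)YIQDC(1)PLG-[NH2]"))
# L-Cys(1)-L-Tyr-L-Ile-L-Gln-L-Asp-L-Cys(1)-L-Pro-L-Leu-Gly-NH2
```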
Sugar & Splice example: tRNA(Phe)

SMILES
Cc1cn(c(=O)[nH]c1=O)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O-])O[C@@H]3[C@H](
O[C@H]([C@@H]3O)n4cnc5c4nc([nH]c5=O)N)CO[P@@](=O)([O-])O[C@@H]6[C@H](O[C@H]([C@@
H]6O)n7ccc(=O)[nH]c7=O)CO[P@@](=O)([O-])O[C@@H]8[C@H](O[C@H]([C@@H]8O)n9cnc1c9nc
([nH]c1=O)N)CO[P@@](=O)([O-])O[C@@H]1[C@H]… (4984 characters)

IUPAC NAME
…-5'-cytidylyl)-5-methyl-5'-cytidylyl)-5'-uridylyl)-
5'-guanylyl)-5'-uridylyl)-5'-guanylyl)-5-methyl-5'-
uridylyl)-1-uracil-5-yl-1-deoxy-beta-D-ribofuranos-
O5-yl)(hydroxy)phosphoryl)-5'-cytidylyl)-5'-
guanylyl)-1-(6-imino-1-methyl-purin-9-yl)-1-
deoxy-beta-D-ribofuranos-O5-
yl)(hydroxy)phosphoryl)-5'-uridylyl)-5'-cytidylyl)-
5'-cytidylyl…

IUPAC Condensed
P-rGuo-P-rCyd-P-rGuo-P-rGuo-P-rAdo-P-rUrd-P-rUrd-P-rUrd-P-rAdo-P-m2Gua-Rib-P-rCyd-P-rUrd-P-
rCyd-P-rAdo-P-rGuo-P-rUrd-P-rUrd-P-rGuo-P-rGuo-P-rGuo-P-rAdo-P-rGuo-P-rAdo-P-rGuo-P-rCyd-P-
m22Gua-Rib-P-rCyd-P-rCyd-P-rAdo-P-rGuo-P-rAdo-P-Cyt-Rib2Me-P-rUrd-P-Gua-Rib2Me-P-rAdo-P-
rAdo-P-rWyb-P-rAdo-P-rPrd-P-m5Cyt-Rib-P-rUrd-P-rGuo-P-rGuo-P-rAdo-P-rGuo-P-m7Gua-Rib-P-
rUrd-P-rCyd-P-m5Cyt-Rib-P-rUrd-P-rGuo-P-rUrd-P-rGuo-P-m5Ura-Rib-P-rPrd-P-rCyd-P-rGuo-P-
m1Ade-Rib-P-rUrd-P-rCyd-P-rCyd-P-rAdo-P-rCyd-P-rAdo-P-rGuo-P-rAdo-P-rAdo-P-rUrd-P-rUrd-P-
rCyd-P-rGuo-P-rCyd-P-rAdo-P-rCyd-P-rCyd-P-rAdo

Also shown: PDB
In favor of perception
• Cost
   – A new file format means a new compound registry system
• Ability to roundtrip
• Flexibility
   – New and unusual monomer? Will be faithfully stored in the MOL file
• Interchange of data
   – SMILES and MOL files are already a de-facto standard
   – HELM requires the (all-atom) monomer definitions
• Analysis
   – Can combine tools for small-molecule analysis with those that take as
     input biopolymer formats
• File-format lock-in
   – Difficult to migrate if you base your registry system on a particular file
     format
chains Perception Algorithm

• The input is a connection table
• Typical sources include SMILES, MOL and PDB files
• As an option, graph connectivity only can be used, allowing
  molecules without bond orders to be handled (e.g. from PDB
  files)
chains Perception Algorithm

• Identify nucleic acid backbone
• Recognize sidechains
• Identify peptide backbone
• Recognize sidechains
• Identify oligosaccharide backbone
• Recognize monosaccharides
• Label unidentified connected components as “Unknown”
  monomers
• Add connections between monomers
Biologically-important monosaccharide classes

Class                         Example
Hexopyranoses                 D-Glucopyranose
Pentopyranoses                D-Xylopyranose
Hex-2-ulopyranoses            D-Fructopyranose
Hexofuranoses                 D-Glucofuranose
Pentofuranoses                D-Xylofuranose
Hex-2-ulofuranoses            D-Fructofuranose
Hexopyranuronic acids         D-Glucopyranuronic acid
Non-2-ulopyranosonic acids    Neuraminic acid
Polymer Matching

• Matching of linear, cyclic and dendrimeric polymers and
  copolymers can be efficiently implemented by graph
  relaxation algorithms.

• Each atom records a set of possible template equivalences
  represented as a bit vector.

• These bit vectors are iteratively refined using the bit
  vectors of neighboring atoms.
Oligosaccharide backbone detection
• Assign initial constraints based upon atomic number, ring
  membership and heavy atom degree (and optionally
  valence and charge)

Bit C1:       C      In ring        3 neighbors
Bit OX:       O      In ring        2 neighbors

[Figure: sugar ring with the atoms OX and C1 labeled]
Oligosaccharide backbone detection
• Perform iterative graph relaxation at each atom using its
  neighbors' bit masks
   – Unset any bit whose neighbor template is not matched

Neighbor templates:
Bit C1:        OX       C2      OH
Bit OX:        C5       C1

[Figure: sugar ring with the atoms OX and C1 labeled]

   – End result: the bitmasks go to zero if the structure is not a
     supported monosaccharide; otherwise each atom has a bitmask with
     a single bit set
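
The two steps above (initial constraints, then iterative refinement) can be sketched in a few dozen lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the template is a single unsubstituted hexopyranose, ring membership is supplied by the caller rather than perceived, and a bit survives if each label in its neighbor template is still possible on some neighboring atom (the neighbors are not required to be distinct). All names (`LABELS`, `INITIAL`, `NEIGHBORS`, `relax`) are hypothetical.

```python
from typing import List, Tuple

LABELS = ["OX", "C1", "C2", "C3", "C4", "C5", "C6", "OH"]
BIT = {name: 1 << i for i, name in enumerate(LABELS)}

# Initial constraints: label -> (element, in ring?, heavy-atom degree)
INITIAL = {"OX": ("O", True, 2), "OH": ("O", False, 1), "C6": ("C", False, 2)}
INITIAL.update({f"C{n}": ("C", True, 3) for n in range(1, 6)})

# Neighbor templates: labels that must be possible on adjacent atoms
NEIGHBORS = {
    "OX": ["C1", "C5"], "C1": ["OX", "C2", "OH"], "C2": ["C1", "C3", "OH"],
    "C3": ["C2", "C4", "OH"], "C4": ["C3", "C5", "OH"],
    "C5": ["C4", "OX", "C6"], "C6": ["C5", "OH"], "OH": [],
}

def relax(atoms: List[Tuple[str, bool]], bonds: List[Tuple[int, int]]) -> List[int]:
    """Return one bitmask of surviving template labels per atom."""
    adjacency = [[] for _ in atoms]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    # Step 1: initial constraints from element, ring membership and degree.
    masks = []
    for (element, in_ring), nbrs in zip(atoms, adjacency):
        mask = 0
        for label, constraint in INITIAL.items():
            if constraint == (element, in_ring, len(nbrs)):
                mask |= BIT[label]
        masks.append(mask)

    # Step 2: unset bits until nothing changes.
    changed = True
    while changed:
        changed = False
        for i, nbrs in enumerate(adjacency):
            available = 0
            for j in nbrs:
                available |= masks[j]
            for label in LABELS:
                if masks[i] & BIT[label] and any(
                        not (available & BIT[req]) for req in NEIGHBORS[label]):
                    masks[i] &= ~BIT[label]
                    changed = True
    return masks

# Glucopyranose heavy atoms: ring O, C1-C5, O1-O4, C6, O6
glucose_atoms = ([("O", True)] + [("C", True)] * 5 + [("O", False)] * 4
                 + [("C", False), ("O", False)])
glucose_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
                 (1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (10, 11)]
glucose_masks = relax(glucose_atoms, glucose_bonds)

# 2-methyloxane: same ring, but only a methyl substituent
oxane_atoms = [("O", True)] + [("C", True)] * 5 + [("C", False)]
oxane_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (1, 6)]
oxane_masks = relax(oxane_atoms, oxane_bonds)
```

For the glucopyranose graph every atom ends with exactly one bit set, with the five ring carbons resolved to C1 through C5 even though they start with identical candidate sets. For 2-methyloxane all bitmasks go to zero.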
Recognize monosaccharides

• Find all anomeric centers attached to a peptide
  (glycoprotein) or with a free OH
   – These are the starting points of oligosaccharide chains

[Figure: monosaccharide ring with C1 and C6 labeled]

• Travel around each monosaccharide ring from C1->C6
   – Note the location of any substituted OHs or deoxys
   – Note the stereo configuration of the OHs (or substituted OHs)
   – If an OH links to another monosaccharide, recursively traverse
     that, building up the oligosaccharide structure


• At C6, name the monosaccharide based on the stereo
  configuration and record the substitution pattern
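
Once the recursive traversal has produced a tree of named monosaccharides, writing the IUPAC condensed form is itself a short recursion. The Python sketch below shows only that serialization step, not the perception. The `Residue` class is hypothetical, and it assumes the first child of each residue is the main chain and any further children are branches written in square brackets. The branched tree is one reading of the condensed string shown for PubChem CID 16663750 elsewhere in this talk.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Residue:
    name: str
    # (linkage, child) pairs, e.g. ("b1-4", Residue("Glc")).
    # Assumption: the first pair is the main chain, the rest are branches.
    children: List[Tuple[str, "Residue"]] = field(default_factory=list)

def condensed(residue: Residue) -> str:
    """Serialize a residue tree with the reducing end on the right."""
    text = ""
    for index, (linkage, child) in enumerate(residue.children):
        piece = f"{condensed(child)}({linkage})"
        text += piece if index == 0 else f"[{piece}]"
    return text + residue.name

cellobiose = Residue("Glc", [("b1-4", Residue("Glc"))])
print(condensed(cellobiose))   # Glc(b1-4)Glc

# Branched example: a xylan fragment with a GlcA4Me side chain
chain = Residue("Xyl", [("a1-4", Residue("4-deoxy-L-thrPen"))])
chain = Residue("Xyl", [("b1-4", chain)])
chain = Residue("Xyl", [("b1-4", chain)])
branch_point = Residue("Xyl", [("b1-4", chain), ("a1-2", Residue("GlcA4Me"))])
xylan = Residue("b-Xyl", [("b1-4", branch_point)])
print(condensed(xylan))
```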
Saccharide or not?

Two extremes:
  D-Glucopyranose (D-gluco-hexopyranose)
  (2S)-2-methyloxane ((2S)-2-methyl-tetrahydropyran)

In between:
  Trivial name        Deoxy name                      Systematic name
  D-Quinovopyranose   6-deoxy-Glucopyranose           6-deoxy-D-gluco-hexopyranose
  D-Paratopyranose    3,6-dideoxy-Glucopyranose       3,6-dideoxy-D-ribo-hexopyranose
  D-Amicetopyranose   2,3,6-trideoxy-Glucopyranose    2,3,6-trideoxy-D-erythro-hexopyranose
Supported Saccharides
• Initial work has focussed on the most common saccharides
   – 5- and 6-membered ring forms of Hexoses, Hex-2-uloses, and
     Pentoses, 6-membered ring forms of Hexuronic acid and Non-2-
     ulosonic acid
   – Currently supported substitutions are those supported by IUPAC
     condensed notation (N, NAc, OAc, OMe)
• Perceived as monosaccharide if:
   – Matches one of the structures above
   – The hydroxyl group is present at the anomeric position (or substituted
     by N or O)
   – Another ring hydroxyl group (or substituted hydroxyl group) is present
• Covers 70.2% of the 693 monosaccharides in
  MonosaccharideDB
48 hexopyranoses
264 deoxy-hexopyranoses
9540 substituted hexopyranoses (4 most common substituents)
How many saccharides are in PubChem?
                                                                                      Count
Total occurrence (of supported saccharides)                                            4050
…of which are monosaccharides                                                          1628




                             CID 449522
                             3-deoxy-ribHexNAc(b1-4)GlcA(b1-3)GlcNAc(b1-4)GlcA(b1-3)GlcNAc(b1-4)b-GlcA




CID 16663750
4-deoxy-L-thrPen(a1-4)Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA4Me(a1-2)]Xyl(b1-4)b-Xyl
How many MONOsaccharides are in
                        PubChem?
                                                                                    Count
  Total occurrence (of supported monosaccharides)*                                  121680
  …of which have defined stereochemistry                                            86996
  ……of which unique                                                                 1907
  Number of possible substituted hexopyranoses†                                     9540
  …of which observed                                                                 936




* May be present as part of a larger molecule, not necessarily an oligosaccharide
† Supported substitutions are amino, acetyl, acetylamino and methoxy
Breaking news

• Import of GlycosuiteDB into PubChem
   – Database of glycans taken from the literature
   – Covers multiple species
   – Mixture of partial and fully defined structures


• 1487 oligosaccharide structures imported
Challenges AHEAD
The perception of Sugar
        depictions
You drew   You meant   You got
Example
• PubChem CID 29980572
   – SMILES corresponds to 6-deoxy-L-Gul(a1-2)a-L-Alt
• ZINC 22059715
   – SMILES corresponds to 6-deoxy-L-Gul(a1-2)a-L-Alt
• Toronto Research Chemicals F823500
   – Diagram and name correspond to 6-deoxy-L-Gal(a1-2)Gal
   – But the 2D SDF file depicts chair forms and uses wedges for perspective


• Given a Mol file depiction of a sugar, the challenge is to:
   – Perceive that it is a sugar
   – Interpret the stereo correctly
   – Generate a Mol file that will be read correctly by cheminformatics
     toolkits that do not have advanced sugar support
Arbitrary chemical modifications
• Roundtripping is straightforward for recognized super-atoms
   – Perceived from all-atom or superatom representation when reading
   – Translated to all-atom or superatom representation when writing


• For “unknown” monomers or chemical modifications, the all-
  atom representation must explicitly be stored along with the
  connection point(s)
   – E.g. as SMILES, “*CCC” for 1-propyl
   – A lookup table could be used for common modifications


• Note: arbitrary chemical modifications are not representable
  in many output formats
   – Be careful of relying on a particular polymer file format
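
As a rough illustration of the lookup-table idea, the Python sketch below names a modification when its SMILES is in a small table and otherwise keeps the all-atom SMILES, attachment point included, under the name "Unknown". Only the 1-propyl entry comes from the slide; the other entries, the `Monomer` type, and the function name are illustrative assumptions. A real implementation would need to canonicalize the SMILES before the lookup, which this sketch does not do.

```python
from typing import Dict, NamedTuple

class Monomer(NamedTuple):
    name: str
    smiles: str   # all-atom form, "*" marks the connection point

# Keyed by SMILES exactly as written (no canonicalization in this sketch).
COMMON_MODIFICATIONS: Dict[str, str] = {
    "*CCC": "1-propyl",     # example from the slide
    "*C": "methyl",         # illustrative entry
    "*C(C)=O": "acetyl",    # illustrative entry
}

def describe_modification(smiles: str) -> Monomer:
    """Name a known modification, or keep the SMILES for an unknown one."""
    name = COMMON_MODIFICATIONS.get(smiles, "Unknown")
    return Monomer(name, smiles)

print(describe_modification("*CCC"))    # Monomer(name='1-propyl', smiles='*CCC')
print(describe_modification("*CCCl"))   # Monomer(name='Unknown', smiles='*CCCl')
```

Either way the SMILES is retained, so nothing is lost when writing back to an all-atom format.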
Variable attachment points

Reported oligosaccharide structures often contain linkages
whose exact attachment point is unknown


Glc(β1?)Glc

4 possibilities




Challenge: Is it possible to store this information
in a Mol file or SMILES string?
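
One way to see what a variable attachment point stands for is to enumerate the fully defined structures it covers. The Python sketch below expands a single `?` in a condensed string into explicit positions. It assumes the four possibilities are attachment at positions 2, 3, 4 and 6 of the second glucose, which the slide does not spell out, and it uses the ASCII `b` for β as elsewhere in this talk. The function name is hypothetical.

```python
from typing import List, Sequence

def expand_unknown_linkage(condensed: str,
                           positions: Sequence[int] = (2, 3, 4, 6)) -> List[str]:
    """Replace a single '?' with each candidate attachment position."""
    if condensed.count("?") != 1:
        raise ValueError("expected exactly one unknown attachment point")
    return [condensed.replace("?", f"-{position}") for position in positions]

candidates = expand_unknown_linkage("Glc(b1?)Glc")
print(candidates)
# ['Glc(b1-2)Glc', 'Glc(b1-3)Glc', 'Glc(b1-4)Glc', 'Glc(b1-6)Glc']
```

Enumeration sidesteps the storage question by producing several records, but it loses the statement that these are one compound of uncertain structure.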
Variable attachment points
• SMILES does not support positional variation
   – However ChemAxon have used dangling bonds to implement an
     extension
       • C*.C1CCCCC1 |c:1,3,5,m:1:2.7.3|
• V3000 Mol file
   – Both MarvinSketch and Accelrys Draw store variable attachment
     points in the same way using the bond block
       • “M V30 7 1 8 7 ENDPTS=(3 3 2 1) ATTACH=ANY”
   – ChemSketch uses its own extension, while ChemDraw cannot store the
     information
• V2000 Mol file
   – Accelrys Draw will not save as V2000, ChemDraw cannot store the
     information
   – MarvinSketch uses its own extension, as does ChemSketch
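
For readers unfamiliar with the V3000 bond block, the Python sketch below pulls the fields out of the bond line quoted above. It assumes the layout `M  V30 index type atom1 atom2` followed by optional `KEY=value` properties, with the first number inside `ENDPTS=(...)` giving the count of endpoint atoms. The function name and the returned dictionary keys are hypothetical, and other bond properties are ignored.

```python
import re

def parse_v3000_bond(line: str) -> dict:
    """Extract index, type, atoms, ENDPTS and ATTACH from a V3000 bond line."""
    tokens = line.split()
    if tokens[:2] != ["M", "V30"]:
        raise ValueError(f"not a V3000 line: {line!r}")
    index, bond_type, atom1, atom2 = (int(token) for token in tokens[2:6])
    bond = {"index": index, "type": bond_type, "atoms": (atom1, atom2)}

    endpts = re.search(r"ENDPTS=\(([\d\s]+)\)", line)
    if endpts:
        count, *atoms = (int(number) for number in endpts.group(1).split())
        if count != len(atoms):
            raise ValueError("ENDPTS count does not match the atom list")
        bond["endpts"] = atoms

    attach = re.search(r"ATTACH=(\w+)", line)
    if attach:
        bond["attach"] = attach.group(1)
    return bond

bond = parse_v3000_bond("M  V30 7 1 8 7 ENDPTS=(3 3 2 1) ATTACH=ANY")
print(bond)
# {'index': 7, 'type': 1, 'atoms': (8, 7), 'endpts': [3, 2, 1], 'attach': 'ANY'}
```

In this reading the bond's endpoint may be any one of atoms 3, 2 or 1.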
Acknowledgements

• Evan Bolton PubChem
• Daniel Lowe NextMove Software




                            http://nextmovesoftware.com
                       http://nextmovesoftware.com/blog
                             noel@nextmovesoftware.com
Left: The Biology Project, Immunology
http://www.biology.arizona.edu/immunology/tutorials/antibody/structure.html
Center: David Goodsell, Antibodies: September 2001 Molecule of the Month, DOI:
10.2210/rcsb_pdb/mom_2001_9
Would a sugar by any other
           name taste as sweet?
• Carbohydrates
     – Originally compounds of formula (CH2O)n, but now used as generic
       term for monosaccharides, oligosaccharides, derivatives thereof
• Saccharides
     – Considered by some to be synonymous with carbohydrates, otherwise
       mono-, oligo- and polysaccharides
• Monosaccharides
     – Aldoses, ketoses, uronic acids, and many more
• Sugars
     – Loose term applied to monosaccharides and lower oligosaccharides
• Glycans
     – “Biochemical” term for oligo- and polysaccharides especially as
       components of glycoproteins and proteoglycans
http://www.chem.qmul.ac.uk/iupac/class/carbo.html
Taken from Table 1, Ranzinger et al., Nucleic Acids Research, 2011, 39, D373.
How many MONOsaccharides are in PubChem? (continued)
                                                                                    Count
  Number of possible hexopyranose substitution patterns†                            5^4 = 625
  …of which observed                                                                 75

† Supported substitutions are amino, acetyl, acetylamino and methoxy
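
The figure of 625 substitution patterns is consistent with five states at each of four positions. The Python sketch below enumerates them under that assumption: each position is either an unsubstituted hydroxyl or carries one of the four supported substituents, and the four positions are taken to be C2, C3, C4 and C6. The slide gives only the total, so the choice of positions is an assumption.

```python
from itertools import product

# Assumption: unsubstituted OH plus the four supported substituents.
STATES = ("OH", "amino", "acetyl", "acetylamino", "methoxy")
# Assumption: the four non-anomeric positions of a hexopyranose.
POSITIONS = ("C2", "C3", "C4", "C6")

patterns = list(product(STATES, repeat=len(POSITIONS)))
print(len(patterns))                      # 625, i.e. 5 ** 4
print(dict(zip(POSITIONS, patterns[1])))
# {'C2': 'OH', 'C3': 'OH', 'C4': 'OH', 'C6': 'amino'}
```

With only 75 of these patterns observed, most of the combinatorial space is empty in PubChem.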

Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...NextMove Software
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)NextMove Software
 
Automatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsAutomatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsNextMove Software
 
Chemical structure representation in PubChem
Chemical structure representation in PubChemChemical structure representation in PubChem
Chemical structure representation in PubChemNextMove Software
 

Roundtripping between small-molecule and biopolymer representations

  • 1. 245th ACS National Meeting, New Orleans, 9th April 2013 Roundtripping between small-molecule and biopolymer representations Noel O’Boyle and Roger Sayle NextMove Software Evan Bolton PubChem
  • 2. Drugs ranked by sales, Q4 2012 (rank, trade name, name, type of biologic where applicable):
      1. Abilify, aripiprazole
      2. Nexium, esomeprazole
      3. Crestor, rosuvastatin
      4. Cymbalta, duloxetine
      5. Humira, adalimumab (monoclonal antibody)
      6. Advair Diskus, fluticasone/salmeterol
      7. Enbrel, etanercept (protein attached to monoclonal antibody)
      8. Remicade, infliximab (monoclonal antibody)
      9. Copaxone, glatiramer acetate (peptide)
      10. Neulasta, pegfilgrastim (PEG attached to protein)
      11. Rituxan, rituximab (monoclonal antibody)
      12. Spiriva, tiotropium bromide
      13. Atripla, emtricitabine/tenofovir/efavirenz
      14. OxyContin, oxycodone
      15. Januvia, sitagliptin
    Source: Drugs.com Statistics, Q4 2012 (http://www.drugs.com/stats/top100/2012/q4/sales)
  • 3. A PICTURE IS WORTH A THOUSAND WORDS…?
  • 4. …but it depends on the picture FDA IUPAC
  • 5. Is it a file format problem? • HELM (Hierarchical editing language for macromolecules) – Developed at Pfizer, and promoted by Pistoia Alliance – Hierarchical description of monomers and oligomers and connections between them RNA1{P.R(A)P.R(G)P.R(C)P.R(U)P.R(T)P.R(T)P.R(T)}|CHEM1{SS3} $RNA1,CHEM1,1:R1-1:R1$$$ Zhang et al. J. Chem. Inf. Model. 2012, 52, 2796.
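The HELM string on this slide can be taken apart with a few lines of Python. This is only a minimal sketch based on the layout visible in the example: `$` separates sections, `|` separates simple polymers, `.` separates monomer units, and a connection reads `source,target,monomer:attachment-monomer:attachment`. The function name `parse_helm` is hypothetical, and the sketch assumes no inline SMILES or bracketed monomers containing those separator characters.

```python
import re


def parse_helm(helm):
    """Split a HELM-style string into simple polymers and connections (sketch)."""
    sections = helm.split("$")
    polymers = {}
    for chunk in sections[0].split("|"):
        match = re.fullmatch(r"\s*(\w+)\{(.*)\}\s*", chunk)
        if match is None:
            raise ValueError(f"not a simple polymer: {chunk!r}")
        # Monomer units are separated by '.'; branches stay attached, e.g. 'R(A)P'.
        polymers[match.group(1)] = match.group(2).split(".")
    connections = []
    if len(sections) > 1 and sections[1].strip():
        for chunk in sections[1].split("|"):
            source, target, bond = chunk.strip().split(",")
            left, right = bond.split("-")
            connections.append((source, target, left, right))
    return polymers, connections


# The example from the slide, written without the line-wrap space.
example = ("RNA1{P.R(A)P.R(G)P.R(C)P.R(U)P.R(T)P.R(T)P.R(T)}|CHEM1{SS3}"
           "$RNA1,CHEM1,1:R1-1:R1$$$")
polymers, connections = parse_helm(example)
```

Note that the result still only names the monomers (`SS3`, `R`, `P`, ...); turning them into atoms needs the monomer definitions, which is the point made on the following slides.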
  • 6. Pfizer macromolecular editor; HELM; InChI; SMILES: C[C@H](O1)[C@@H](O)[C@@H](O)[C@H](O)[C@@H]1(O2).C(O)[C@@H](O1)[C@H](O)[C@H](O)…; DEPICTION
  • 7. Database of Monomer Definitions; Pfizer macromolecular editor; HELM; InChI; SMILES: C[C@H](O1)[C@@H](O)[C@@H](O)[C@H](O)[C@@H]1(O2).C(O)[C@@H](O1)[C@H](O)[C@H](O)…; DEPICTION
  • 8. Is it a File format problem? • PLN (Protein line notation) – Developed by Biochemfusion – Also software to edit and depict – Adheres closely to IUPAC recommendations and extended to handle cycles and arbitrary modified residues H-ASDF-OH.H-CGTY-OH id=P0001 ** • SCSR (Self-contained sequence representation) – Developed as part of Accelrys Draw – Extension of Mol V3000 – Can convert between all-atom representation and condensed forms
  • 9. …OR is it a perception problem? • Existing all-atom representations are fully capable of storing biopolymers – PDB files have been used to store biopolymers for some time – SMILES and MOL files can store macromolecules • On-the-fly perception from connection table – Perception of biopolymer units and the nature of the connections between those • Convert perceived structure to any of several all-atom or biopolymer representations
  • 10. Perception engine HELM PDB MOL PLN IUPAC NAME SMILES DEPICTION IUPAC Condensed
  • 11. Sugar & Splice. SMILES: [C@@H]1([C@H](O)[C@@H](O)[C@H](O)[C@H](O1)CO)O[C@H]1[C@@H]([C@H](C(O)O[C@@H]1CO)O)O; common name: cellobiose; IUPAC condensed: Glc(b1-4)Glc; IUPAC name: beta-D-gluco-hexopyranosyl-(1->4)-D-gluco-hexopyranose; LINUCS: [][D-Glcp]{[(4+1)][b-D-Glcp]{}}; DEPICTION; Coming soon…; PDB: mark both as BGC, atoms as C1-C6, O1-O6
  • 12. Sugar & Splice. SMILES: [C@H]1(CCCN1C(=O)[C@@H]1CSSC[C@@H](C(=O)N[C@@H](Cc2ccc(cc2)O)C(=O)N[C@@H]([C@H](CC)C)C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CC(=O)O)C(=O)N1)N)C(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N; PLN: H-C(1)YIQDC(1)PLG-[NH2]; common name: [5-L-aspartic acid]oxytocin; IUPAC condensed: L-Cys(1)-L-Tyr-L-Ile-L-Gln-L-Asp-L-Cys(1)-L-Pro-L-Leu-Gly-NH2; IUPAC name: L-cysteinyl-L-tyrosyl-L-isoleucyl-L-glutaminyl-L-alpha-aspartyl-L-cysteinyl-L-prolyl-L-leucyl-glycinamide (1->6)-disulfide; depictions; PDB
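The PLN and IUPAC condensed strings on this slide differ only in spelling, so the step between them can be sketched as a string rewrite. The following is a sketch under stated assumptions, not the Sugar & Splice implementation: it handles only the subset of PLN visible in the slides (free `H-` N-terminus, one-letter L-amino acids, ring-closure labels such as `(1)`, and an `-OH` or `-[NH2]` C-terminus), and the function name `pln_to_condensed` is hypothetical. Dropping the terminal `-OH` in the output is also an assumption.

```python
import re

THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def pln_to_condensed(pln):
    """Rewrite a simple PLN peptide as an IUPAC-condensed-style string (sketch)."""
    n_term, sequence, c_term = pln.split("-")
    if n_term != "H":
        raise ValueError("only a free N-terminus (H-) is handled")
    residues = []
    # Each residue is one letter, optionally followed by a label such as (1).
    for letter, label in re.findall(r"([A-Z])(\(\d+\))?", sequence):
        prefix = "" if letter == "G" else "L-"  # glycine is achiral
        residues.append(prefix + THREE_LETTER[letter] + label)
    c_term = c_term.strip("[]")
    if c_term != "OH":  # assumption: a free acid adds no suffix
        residues.append(c_term)
    return "-".join(residues)


condensed = pln_to_condensed("H-C(1)YIQDC(1)PLG-[NH2]")
```

The reverse direction is the same table read backwards, which is why roundtripping between these two notations is cheap once the residues have been perceived.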
  • 13. tRNA(Phe), Sugar & Splice. SMILES: Cc1cn(c(=O)[nH]c1=O)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O-])O[C@@H]3[C@H](O[C@H]([C@@H]3O)n4cnc5c4nc([nH]c5=O)N)CO[P@@](=O)([O-])O[C@@H]6[C@H](O[C@H]([C@@H]6O)n7ccc(=O)[nH]c7=O)CO[P@@](=O)([O-])O[C@@H]8[C@H](O[C@H]([C@@H]8O)n9cnc1c9nc([nH]c1=O)N)CO[P@@](=O)([O-])O[C@@H]1[C@H]… (4984 characters); PDB; IUPAC name: …-5'-cytidylyl)-5-methyl-5'-cytidylyl)-5'-uridylyl)-5'-guanylyl)-5'-uridylyl)-5'-guanylyl)-5-methyl-5'-uridylyl)-1-uracil-5-yl-1-deoxy-beta-D-ribofuranos-O5-yl)(hydroxy)phosphoryl)-5'-cytidylyl)-5'-guanylyl)-1-(6-imino-1-methyl-purin-9-yl)-1-deoxy-beta-D-ribofuranos-O5-yl)(hydroxy)phosphoryl)-5'-uridylyl)-5'-cytidylyl)-5'-cytidylyl…; IUPAC condensed: P-rGuo-P-rCyd-P-rGuo-P-rGuo-P-rAdo-P-rUrd-P-rUrd-P-rUrd-P-rAdo-P-m2Gua-Rib-P-rCyd-P-rUrd-P-rCyd-P-rAdo-P-rGuo-P-rUrd-P-rUrd-P-rGuo-P-rGuo-P-rGuo-P-rAdo-P-rGuo-P-rAdo-P-rGuo-P-rCyd-P-m22Gua-Rib-P-rCyd-P-rCyd-P-rAdo-P-rGuo-P-rAdo-P-Cyt-Rib2Me-P-rUrd-P-Gua-Rib2Me-P-rAdo-P-rAdo-P-rWyb-P-rAdo-P-rPrd-P-m5Cyt-Rib-P-rUrd-P-rGuo-P-rGuo-P-rAdo-P-rGuo-P-m7Gua-Rib-P-rUrd-P-rCyd-P-m5Cyt-Rib-P-rUrd-P-rGuo-P-rUrd-P-rGuo-P-m5Ura-Rib-P-rPrd-P-rCyd-P-rGuo-P-m1Ade-Rib-P-rUrd-P-rCyd-P-rCyd-P-rAdo-P-rCyd-P-rAdo-P-rGuo-P-rAdo-P-rAdo-P-rUrd-P-rUrd-P-rCyd-P-rGuo-P-rCyd-P-rAdo-P-rCyd-P-rCyd-P-rAdo
  • 14. In favor of perception • Cost – A new file format means a new compound registry system • Ability to roundtrip • Flexibility – New and unusual monomer? Will be faithfully stored in the MOL file • Interchange of data – SMILES and MOL files are already a de facto standard – HELM requires the (all-atom) monomer definitions • Analysis – Can combine tools for small-molecule analysis with those that take as input biopolymer formats • File-format lock-in – Difficult to migrate if you base your registry system on a particular file format
  • 15. chains Perception Algorithm • The input is a connection table • Typical sources include SMILES, MOL and PDB files • As an option, graph connectivity only can be used, allowing molecules without bond orders to be handled (e.g. from PDB files)
  • 16. chains Perception Algorithm • Identify nucleic acid backbone • Recognize sidechains • Identify peptide backbone • Recognize sidechains • Identify oligosaccharide backbone • Recognize monosaccharides • Label unidentified connected components as “Unknown” monomers • Add connections between monomers
  • 18. Biologically important monosaccharide classes: Hexopyranoses (e.g. D-Glucopyranose); Pentopyranoses (e.g. D-Xylopyranose); Hex-2-ulopyranoses (e.g. D-Fructopyranose); Hexofuranoses (e.g. D-Glucofuranose); Pentofuranoses (e.g. D-Xylofuranose); Hex-2-ulofuranoses (e.g. D-Fructofuranose); Hexopyranuronic acids (e.g. D-Glucopyranuronic acid); Non-2-ulopyranosonic acids (e.g. Neuraminic acid)
  • 20. Polymer Matching • Matching of linear, cyclic and dendrimeric polymers and copolymers can be efficiently implemented by graph relaxation algorithms. • Each atom records a set of possible template equivalences represented as a bit vector. • These bit vectors are iteratively refined using the bit vectors of neighboring atoms.
  • 21. Oligosaccharide backbone detection • Assign initial constraints based upon atomic number, ring membership and heavy atom degree (and optionally valence and charge) – Bit C1: C, in ring, 3 neighbors – Bit OX: O, in ring, 2 neighbors
  • 22. Oligosaccharide backbone detection • Perform iterative graph relaxation at each atom using its neighbors' bit masks – Unset bits that do not match the neighbor templates (bit C1: neighbors OX, C2, OH; bit OX: neighbors C5, C1) – End result: the bitmasks go to zero if the structure is not a supported monosaccharide; otherwise each atom ends up with a bitmask with a single bit set
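The relaxation described on slides 20 to 22 can be illustrated with a toy template. The sketch below is not the authors' code: it assumes a single hexopyranose template with hypothetical atom labels (`C1` to `C6`, `OX` for the ring oxygen, `O1` to `O6` for hydroxyl oxygens), a molecule given as plain lists of elements, ring flags and bonds, and a simplified neighbor test that does not check that distinct template neighbors map to distinct atoms.

```python
# Template atom -> (element, in ring, heavy-atom degree, template neighbors).
TEMPLATE = {
    "C1": ("C", True, 3, ("OX", "C2", "O1")),
    "C2": ("C", True, 3, ("C1", "C3", "O2")),
    "C3": ("C", True, 3, ("C2", "C4", "O3")),
    "C4": ("C", True, 3, ("C3", "C5", "O4")),
    "C5": ("C", True, 3, ("C4", "OX", "C6")),
    "OX": ("O", True, 2, ("C1", "C5")),
    "C6": ("C", False, 2, ("C5", "O6")),
    "O1": ("O", False, 1, ("C1",)),
    "O2": ("O", False, 1, ("C2",)),
    "O3": ("O", False, 1, ("C3",)),
    "O4": ("O", False, 1, ("C4",)),
    "O6": ("O", False, 1, ("C6",)),
}
BIT = {name: 1 << i for i, name in enumerate(TEMPLATE)}


def relax(elements, in_ring, bonds):
    """Return one bitmask of possible template labels per atom."""
    n = len(elements)
    nbrs = [[] for _ in range(n)]
    for a, b in bonds:
        nbrs[a].append(b)
        nbrs[b].append(a)
    # Initial constraints: element, ring membership and heavy-atom degree.
    masks = [0] * n
    for atom in range(n):
        key = (elements[atom], in_ring[atom], len(nbrs[atom]))
        for name, (elem, ring, degree, _) in TEMPLATE.items():
            if key == (elem, ring, degree):
                masks[atom] |= BIT[name]
    # Iteratively unset bits that the neighbors' bitmasks cannot support.
    changed = True
    while changed:
        changed = False
        for atom in range(n):
            for name, (_, _, _, wanted) in TEMPLATE.items():
                if not masks[atom] & BIT[name]:
                    continue
                wanted_bits = 0
                for w in wanted:
                    wanted_bits |= BIT[w]
                # Every template neighbor must be possible on some real neighbor,
                # and every real neighbor must fit some template neighbor.
                ok = all(any(masks[x] & BIT[w] for x in nbrs[atom]) for w in wanted)
                ok = ok and all(masks[x] & wanted_bits for x in nbrs[atom])
                if not ok:
                    masks[atom] &= ~BIT[name]
                    changed = True
    return masks


# Glucopyranose heavy atoms: 0-4 = C1-C5, 5 = ring O, 6 = C6, 7-11 = O1, O2, O3, O4, O6.
glucose = relax(
    list("CCCCCOCOOOOO"),
    [True] * 6 + [False] * 6,
    [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
     (4, 6), (6, 11), (0, 7), (1, 8), (2, 9), (3, 10)],
)
```

For this input every atom converges to a single bit, while a ring without a ring oxygen (for example cyclohexanol) relaxes to all-zero masks, matching the end result stated on slide 22. A production version would carry several templates at once, which is where the bit-vector representation pays off.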
  • 23. Recognize monosaccharides • Find all anomeric centers attached to a peptide (glycoprotein) or with a free OH – These are the starting points of oligosaccharide chains • Travel around each monosaccharide ring from C1 to C6 – Note the location of any substituted OHs or deoxys – Note the stereo configuration of the OHs (or substituted OHs) – If an OH links to another monosaccharide, recursively traverse that, building up the oligosaccharide structure • At C6, name the monosaccharide based on the stereo configuration and record the substitution pattern
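The final naming step, going from a stereo configuration to a monosaccharide name, amounts to a table lookup once the configuration is known. The sketch below covers only the unsubstituted aldohexoses and assumes a hypothetical input encoding: a four-letter string giving the side (`R` for right, `L` for left, not CIP descriptors) of the OH at C2, C3, C4 and C5 in the Fischer projection. Deriving that string from a connection table is the hard part and is not shown.

```python
# Fischer-projection OH side at C2, C3, C4 for the D-aldohexoses.
ALDOHEXOSES = {
    "RRR": "All", "LRR": "Alt", "RLR": "Glc", "LLR": "Man",
    "RRL": "Gul", "LRL": "Ido", "RLL": "Gal", "LLL": "Tal",
}
MIRROR = str.maketrans("RL", "LR")


def name_aldohexose(fischer):
    """Name an aldohexose from the OH sides at C2-C5 ('R' = right, 'L' = left)."""
    if len(fischer) != 4 or set(fischer) - {"R", "L"}:
        raise ValueError("expected four characters, each 'R' or 'L'")
    if fischer[3] == "R":  # OH on the right at C5: D series
        return "D-" + ALDOHEXOSES[fischer[:3]]
    # L series: the mirror image of the corresponding D sugar.
    return "L-" + ALDOHEXOSES[fischer[:3].translate(MIRROR)]


glucose_name = name_aldohexose("RLRR")
```

Deoxy positions and substituents would be recorded alongside the name, as the slide describes, rather than changing the lookup key.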
  • 24. Saccharide or not? D-Glucopyranose (D-gluco-hexopyranose) vs. (2S)-2-methyloxane ((2S)-2-methyl-tetrahydropyran)
  • 25. Saccharide or not? D-Glucopyranose (D-gluco-hexopyranose) vs. (2S)-2-methyloxane ((2S)-2-methyl-tetrahydropyran). In between: D-Quinovopyranose = 6-deoxy-Glucopyranose (6-deoxy-D-gluco-hexopyranose); D-Paratopyranose = 3,6-dideoxy-Glucopyranose (3,6-dideoxy-D-ribo-hexopyranose); D-Amicetopyranose = 2,3,6-trideoxy-Glucopyranose (2,3,6-trideoxy-D-erythro-hexopyranose)
  • 26. Supported Saccharides • Initial work has focussed on the most common saccharides – 5- and 6-membered ring forms of Hexoses, Hex-2-uloses, and Pentoses, 6-membered ring forms of Hexuronic acid and Non-2-ulosonic acid – Currently supported substitutions are those supported by IUPAC condensed notation (N, NAc, OAc, OMe) • Perceived as monosaccharide if: – Matches one of the structures above – The hydroxyl group is present at the anomeric position (or substituted by N or O) – Another ring hydroxyl group (or substituted hydroxyl group) is present • Covers 70.2% of the 693 monosaccharides in MonosaccharideDB
  • 29. 9540 substituted hexopyranoses (4 most common substituents)
  • 30. How many saccharides are in PubChem? Total occurrence (of supported saccharides): 4050; …of which are monosaccharides: 1628. CID 449522: 3-deoxy-ribHexNAc(b1-4)GlcA(b1-3)GlcNAc(b1-4)GlcA(b1-3)GlcNAc(b1-4)b-GlcA. CID 16663750: 4-deoxy-L-thrPen(a1-4)Xyl(b1-4)Xyl(b1-4)Xyl(b1-4)[GlcA4Me(a1-2)]Xyl(b1-4)b-Xyl
  • 31. How many monosaccharides are in PubChem? Total occurrence (of supported monosaccharides)*: 121680; …of which have defined stereochemistry: 86996; ……of which unique: 1907. Number of possible substituted hexopyranoses†: 9540; …of which observed: 936. * May be present as part of a larger molecule, not necessarily an oligosaccharide † Supported substitutions are amino, acetyl, acetylamino and methoxy
  • 32. Breaking news • Import of GlycosuiteDB into PubChem – Database of glycans taken from the literature – Covers multiple species – Mixture of partial and fully defined structures • 1487 oligosaccharide structures imported
  • 34. The perception of Sugar depictions You drew You meant You got
  • 36. Example • PubChem CID 29980572 – SMILES corresponds to 6-deoxy-L-Gul(a1-2)a-L-Alt • ZINC 22059715 – SMILES corresponds to 6-deoxy-L-Gul(a1-2)a-L-Alt • Toronto Research Chemicals F823500 – Diagram and name correspond to 6-deoxy-L-Gal(a1-2)Gal – But the 2D SDF file depicts chair forms and uses wedges for perspective • Given a Mol file depiction of a sugar, the challenge is to: – Perceive that it is a sugar – Interpret the stereo correctly – Generate a Mol file that will be read correctly by cheminformatics toolkits that do not have advanced sugar support
  • 37. Arbitrary chemical modifications • Roundtripping is straightforward for recognized super-atoms – Perceived from all-atom or superatom representation when reading – Translated to all-atom or superatom representation when writing • For “unknown” monomers or chemical modifications, the all- atom representation must explicitly be stored along with the connection point(s) – E.g. as SMILES, “*CCC” for 1-propyl – A lookup table could be used for common modifications • Note: arbitrary chemical modifications are not representable in many output formats – Be careful of relying on a particular polymer file format
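The lookup-table idea for modifications can be sketched as a pair of dictionaries: a known modification is written as a short name, and anything else falls back to its all-atom SMILES with the attachment point (`*`) kept. Everything here is illustrative: the table entries other than `*CCC`, the square-bracket fallback syntax and the function names are assumptions, and a real table would have to key on canonical SMILES rather than raw strings.

```python
# Illustrative entries only; "*" marks the connection point.
MODIFICATION_NAMES = {
    "*C": "Me",
    "*C(C)=O": "Ac",
    "*CCC": "Pr",
}
SMILES_BY_NAME = {name: smiles for smiles, name in MODIFICATION_NAMES.items()}


def write_modification(smiles):
    """Return a short name if known, else the SMILES itself in brackets."""
    if "*" not in smiles:
        raise ValueError("a modification needs at least one connection point")
    return MODIFICATION_NAMES.get(smiles, "[" + smiles + "]")


def read_modification(token):
    """Inverse of write_modification: recover the all-atom SMILES."""
    if token.startswith("[") and token.endswith("]"):
        return token[1:-1]
    return SMILES_BY_NAME[token]


known = write_modification("*CCC")
unknown = write_modification("*CC(F)(F)F")
```

Because the fallback stores the atoms themselves, an unrecognized modification survives a write/read cycle unchanged, which is the roundtripping property the slide asks for.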
  • 38. Variable attachment points Reported oligosaccharide structures often contain linkages whose exact attachment point is unknown Glc(β1?)Glc 4 possibilities Challenge: Is it possible to store this information in a Mol file or SMILES string?
  • 39. Variable attachment points • SMILES does not support positional variation – However ChemAxon have used dangling bonds to implement an extension • C*.C1CCCCC1 |c:1,3,5,m:1:2.7.3| • V3000 Mol file – Both MarvinSketch and Accelrys Draw store variable attachment points in the same way using the bond block • “M V30 7 1 8 7 ENDPTS=(3 3 2 1) ATTACH=ANY” – ChemSketch uses its own extension, while ChemDraw cannot store the information • V2000 Mol file – Accelrys Draw will not save as V2000, ChemDraw cannot store the information – MarvinSketch uses its own extension, as does ChemSketch
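The V3000 bond line quoted on this slide carries the variable attachment in two properties: `ENDPTS=(n a1 a2 ...)` lists the candidate atoms, preceded by their count, and `ATTACH` says whether any or all of them are bonded. A minimal reader for such a line might look as follows; the function name and the returned dictionary layout are hypothetical, and other bond properties are ignored.

```python
import re


def parse_v3000_bond(line):
    """Parse one V3000 bond-block line, including ENDPTS/ATTACH if present."""
    match = re.match(r"\s*M\s+V30\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)(.*)", line)
    if match is None:
        raise ValueError(f"not a V3000 bond line: {line!r}")
    index, bond_type, atom1, atom2 = (int(g) for g in match.groups()[:4])
    bond = {"index": index, "type": bond_type, "atoms": (atom1, atom2)}
    rest = match.group(5)
    endpts = re.search(r"ENDPTS=\(([\d\s]+)\)", rest)
    if endpts:
        # The first number is the count of endpoint atoms that follow.
        count, *atoms = (int(x) for x in endpts.group(1).split())
        if count != len(atoms):
            raise ValueError("ENDPTS count does not match the atom list")
        bond["endpoints"] = atoms
    attach = re.search(r"ATTACH=(\w+)", rest)
    if attach:
        bond["attach"] = attach.group(1)
    return bond


bond = parse_v3000_bond("M  V30 7 1 8 7 ENDPTS=(3 3 2 1) ATTACH=ANY")
```

For the slide's example this yields a single bond from atom 7 whose other end may be any one of atoms 3, 2 or 1.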
  • 40. Acknowledgements • Evan Bolton PubChem • Daniel Lowe NextMove Software http://nextmovesoftware.com http://nextmovesoftware.com/blog noel@nextmovesoftware.com
  • 42. Left: The Biology Project, Immunology http://www.biology.arizona.edu/immunology/tutorials/antibody/structure.html Center: David Goodsell, Antibodies: September 2001 Molecule of the Month, DOI: 10.2210/rcsb_pdb/mom_2001_9
  • 43. Would a sugar by any other name taste as sweet? • Carbohydrates – Originally compounds of formula (CH2O)n, but now used as generic term for monosaccharides, oligosaccharides, derivatives thereof • Saccharides – Considered by some to be synonymous with carbohydrates, otherwise mono-, oligo- and polysaccharides • Monosaccharides – Aldoses, ketoses, uronic acids, and many more • Sugars – Loose term applied to monosaccharides and lower oligosaccharides • Glycans – “Biochemical” term for oligo- and polysaccharides especially as components of glycoproteins and proteoglycans http://www.chem.qmul.ac.uk/iupac/class/carbo.html
  • 44. Taken from Table 1, Ranzinger et al., Nucleic Acids Research, 2011, 39, D373.
  • 45. How many monosaccharides are in PubChem? Total occurrence (of supported monosaccharides)*: 121680; …of which have defined stereochemistry: 86996; ……of which unique: 1907. Number of possible substituted hexopyranoses†: 9540; …of which observed: 936. Number of possible hexopyranose substitution patterns†: 5^4 = 625; …of which observed: 75. * May be present as part of a larger molecule, not necessarily an oligosaccharide † Supported substitutions are amino, acetyl, acetylamino and methoxy
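The 625 substitution patterns follow from four substitutable positions, each either unsubstituted or carrying one of the four supported substituents (5 options, so 5^4). The enumeration below makes that count explicit. The choice of C2, C3, C4 and C6 as the four positions is an assumption, since the slides do not name them, and the option labels reuse the IUPAC condensed abbreviations from slide 26.

```python
from itertools import product

# Unsubstituted OH plus the four supported substituents.
OPTIONS = ("OH", "N", "NAc", "OAc", "OMe")
# Assumed substitutable positions of a hexopyranose (not stated in the slides).
POSITIONS = ("C2", "C3", "C4", "C6")

patterns = [dict(zip(POSITIONS, choice))
            for choice in product(OPTIONS, repeat=len(POSITIONS))]
n_patterns = len(patterns)

# Fractions implied by the counts on the slide.
observed_pattern_fraction = 75 / n_patterns
observed_hexopyranose_fraction = 936 / 9540
```

With the slide's counts, 75 of 625 patterns is 12%, and 936 of 9540 possible substituted hexopyranoses is a little under 10%.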