Crawling Across the Web of Chemistry Using ChemSpider
ACS San Francisco 2010 CINF Talk
1. NCI/CADD: Open-access chemical structure web platform
   Markus Sitzmann [1], Wolf-Dietrich Ihlenfeldt [2], and Marc C. Nicklaus [1]
   [1] Computer-Aided Drug Design Group, Chemical Biology Laboratory, NCI-Frederick, NIH, DHHS
   [2] Xemistry GmbH, Auf den Stieden 8, D-35094 Lahntal, Germany
3. NCI/CADD Public Web Services
   OSRA (http://cactus.nci.nih.gov/osra/): converts graphical representations of chemical structures in journal articles, patent documents, textbooks, trade magazines, etc. into SMILES
   Online SMILES Translator (http://cactus.nci.nih.gov/translate/)
   GIF Creator for Chemical Structures (http://cactus.nci.nih.gov/gifcreator/)
   PROSIT: Online Pseudorotation Tool, Version 2 (http://cactus.nci.nih.gov/prosit/)
6. Chemical Structure Representations
   A chemical structure can be represented by: NCI/CADD Identifiers, InChI/InChIKey, ChemSpider ID, PubChem SID/CID, chemical names, CAS Registry Number, NSC number, FDA UNII, ChemNavigator SID, SMILES, SD File, Chemical Formula, ChEBI ID, PDB Ligand ID, MRV, CML, SYBYL Line Notation, GIF image
7. Chemical Identifier Resolver (NCI/CADD Web Resources)
   http://cactus.nci.nih.gov/chemical/structure
   Works as a resolver for different chemical structure identifiers. Allows one to convert a given structure identifier into another representation or structure identifier.
8. Chemical Identifier Resolver
   http://cactus.nci.nih.gov/chemical/structure
   First beta release: July 2009
   Second beta release: Nov. 2009
   Third beta release: April/May 2010 (beta versions will continue through 2010)
   3.0 million requests since July 1, 2009 (~11,000/day)
10. Chemical Identifier Resolver: request handling
   An HTTP request supplies an identifier (e.g. CAS number, chemical name) and the requested representation (e.g. InChI, GIF image). The resolver first detects the identifier type.
   If the identifier is a full structure representation (e.g. SMILES, InChI), the requested structure representation is calculated from it.
   If the identifier is a hashed structure representation (e.g. InChIKey), a chemical name, etc., the structure is obtained by database lookup.
   The HTTP response returns the requested representation with the corresponding MIME type.
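The two routes on this slide (calculate from a full structure, or look up in a database) can be sketched as a small classifier. This is only an illustration: the regular expressions and the names `classify` and `_PATTERNS` are assumptions, not the resolver's actual detection rules, and SMILES detection is left out because it needs a real parser.

```python
import re

# Hypothetical detection patterns; the talk does not describe the real rules.
# Each entry: (identifier type, pattern, route taken by the resolver).
_PATTERNS = [
    ("inchi", re.compile(r"^InChI=1S?/"), "calculate"),
    ("inchikey", re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$"), "lookup"),
    ("ncicadd_identifier",
     re.compile(r"^[0-9A-F]{16}-(FICTS|FICuS|uuuuu)$"), "lookup"),
    ("cas_number", re.compile(r"^\d{2,7}-\d{2}-\d$"), "lookup"),
]


def classify(identifier):
    """Return (identifier_type, route) for an input string.

    route is "calculate" for full structure representations and
    "lookup" for hashed representations, registry numbers and names.
    Anything unrecognized is treated as a chemical name (an assumption;
    note that a SMILES string would also land here in this sketch).
    """
    text = identifier.strip()
    for kind, pattern, route in _PATTERNS:
        if pattern.match(text):
            return kind, route
    return "chemical_name", "lookup"


if __name__ == "__main__":
    for example in ("InChI=1S/CH4/h1H4", "50-00-0",
                    "9850FD9F9E2B4E25-FICuS", "guanine"):
        print(example, "->", classify(example))
```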
11. "Chemical Structure Web Engine"
   [Diagram: the Chemical Structure Web Engine. Components shown: Chemical Identifier Resolver, NCI/CADD web services, external web services, other software packages, the NCI/CADD Chemical Structure Database (CSDB), and CACTVS; connections are labeled http.]
15. [Figure: a set of closely related structure drawings, grouped as charged form, tautomers, isotope, stereoisomers, salt, and "errors". Each drawing is labeled with a hashcode: A3DAE0788050DDE4, 3ECEF579D7DF025A, E92E4BA2869F3611, 8A7AD1EB498CC76A, 6C16DE2351F9FF50, 9850FD9F9E2B4E25, 8F7A1DE5A733F0E0, 60525E1AF41497B6, B2FDA68AEDA06DB9.]
18. NCI/CADD Structure Identifiers: Structure Normalization
   An identifier can be sensitive or un-sensitive to each of five structure features: Fragments, Isotopes, Charges, Tautomers, Stereochemistry.
   [Figure: example structure pairs for each feature, including a Na+ salt and a deuterated compound.]
19. NCI/CADD Structure Identifiers: FICTS
   FICTS identifier: representation of the exact drawing.
   Sensitive to Fragments (F), Isotopes (I), Charges (C), Tautomers (T), and Stereochemistry (S).
   [Figure: example structure pairs under Structure Normalization, marked as equal (=) or different (≠).]
20. NCI/CADD Structure Identifiers: FICuS
   FICuS identifier: comes closest to how a chemist perceives a compound.
   Sensitive to Fragments (F), Isotopes (I), Charges (C), and Stereochemistry (S); un-sensitive (u) to Tautomers.
   [Figure: example structure pairs under Structure Normalization, marked as equal (=) or different (≠).]
21. NCI/CADD Structure Identifiers: uuuuu
   uuuuu identifier: closely related forms of the same compound.
   Un-sensitive (u) to all five features: Fragments, Isotopes, Charges, Tautomers, and Stereochemistry.
   [Figure: example structure pairs under Structure Normalization, all marked as equal (=).]
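The naming scheme of the three identifiers can be read as a five-position code: an uppercase letter means the identifier is sensitive to that feature, and a lowercase "u" means it is un-sensitive. The sketch below encodes that reading; the function names and the `FEATURES` tuple are illustrative assumptions, not part of the NCI/CADD software.

```python
# Feature order follows the letters F, I, C, T, S on the slides.
FEATURES = ("fragments", "isotopes", "charges", "tautomers", "stereochemistry")


def sensitivity(identifier_type):
    """Map each feature to True (sensitive) or False (un-sensitive).

    identifier_type is a five-letter code such as "FICTS", "FICuS" or
    "uuuuu"; a lowercase "u" marks an un-sensitive position.
    """
    if len(identifier_type) != len(FEATURES):
        raise ValueError("expected a five-letter identifier type")
    return {feature: letter != "u"
            for feature, letter in zip(FEATURES, identifier_type)}


def same_identifier_expected(identifier_type, differing_features):
    """True if two structures that differ only in the given features
    are expected to receive the same identifier of this type."""
    flags = sensitivity(identifier_type)
    return not any(flags[feature] for feature in differing_features)


if __name__ == "__main__":
    for id_type in ("FICTS", "FICuS", "uuuuu"):
        print(id_type, same_identifier_expected(id_type, ["tautomers"]))
```

Under this reading, two tautomers of one compound differ under FICTS but coincide under FICuS and uuuuu.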
22. FICTS
   [Figure: the related structure drawings (charged form, tautomers, isotope, salt, stereoisomers, "errors"), each labeled with its FICTS identifier. Nine labels, seven distinct values:
   A3DAE0788050DDE4-FICTS, E5F83F10C5DB080A-FICTS (twice), B2FDA68AEDA06DB9-FICTS, 9850FD9F9E2B4E25-FICTS (twice), E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS, 6C16DE2351F9FF50-FICTS.]
23. FICuS
   [Figure: the related structure drawings (charged form, tautomers, isotope, salt, stereoisomers, "errors"), each labeled with its FICuS identifier. Nine labels, six distinct values:
   A3DAE0788050DDE4-FICuS, E5F83F10C5DB080A-FICuS (twice), B2FDA68AEDA06DB9-FICuS, 9850FD9F9E2B4E25-FICuS (three times), E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS.]
24. uuuuu
   [Figure: the related structure drawings (charged form, tautomers, isotope, stereoisomers, salt, "errors"), each labeled with its uuuuu identifier. All nine drawings carry the same hashcode: 9850FD9F9E2B4E25-uuuuu.]
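The effect of the three identifier types on this example set can be summarized by counting distinct hashcodes per type. The hashcodes below are transcribed from the labels on slides 22 to 24; their order is the order in which they appear in the transcript and does not say which drawing carries which label. The names `LABELS` and `distinct_counts` are illustrative.

```python
# Hashcode labels as transcribed from slides 22 (FICTS), 23 (FICuS), 24 (uuuuu).
LABELS = {
    "FICTS": [
        "A3DAE0788050DDE4", "E5F83F10C5DB080A", "B2FDA68AEDA06DB9",
        "9850FD9F9E2B4E25", "E5F83F10C5DB080A", "E92E4BA2869F3611",
        "8A7AD1EB498CC76A", "6C16DE2351F9FF50", "9850FD9F9E2B4E25",
    ],
    "FICuS": [
        "A3DAE0788050DDE4", "E5F83F10C5DB080A", "B2FDA68AEDA06DB9",
        "9850FD9F9E2B4E25", "E5F83F10C5DB080A", "E92E4BA2869F3611",
        "8A7AD1EB498CC76A", "9850FD9F9E2B4E25", "9850FD9F9E2B4E25",
    ],
    "uuuuu": ["9850FD9F9E2B4E25"] * 9,
}


def distinct_counts(labels):
    """Count how many different hashcodes each identifier type assigns."""
    return {id_type: len(set(codes)) for id_type, codes in labels.items()}


if __name__ == "__main__":
    print(distinct_counts(LABELS))
    # {'FICTS': 7, 'FICuS': 6, 'uuuuu': 1}
```

The less sensitive the identifier, the more of the nine drawings collapse onto one hashcode.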
26. Chemical Identifier Resolver: identifiers and representations
   http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"
   "identifier" (input to the resolver): chemical names, CAS numbers, SMILES strings, IUPAC InChI/InChIKeys, NCI/CADD Identifiers, CACTVS HASHISY, NSC number, PubChem SID/CID, FDA UNII, ChemSpider ID, ChemNavigator SID, Chemical Formula
   "representation" (output):
      /smiles
      /names, /iupac_name
      /cas
      /inchi, /stdinchi
      /inchikey, /stdinchikey
      /ficts, /ficus, /uuuuu
      /image
      /file, /sdf
      /mw, /monoisotopic_mass
      /formula
      /twirl, /3d
      /urls
      /unii
      /chemspider_id
      /pubchem_sid
      /chemnavigator_sid
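A request to the resolver is just a URL made of the base address, an identifier, and a representation. The following is a minimal sketch of building such a URL, assuming the representation names listed on this slide; the helper names `resolver_url` and `resolve`, the percent-encoding of the identifier, and the timeout are assumptions, and `resolve` needs network access to a running service.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"

# Representation names as listed on the slide.
REPRESENTATIONS = {
    "smiles", "names", "iupac_name", "cas", "inchi", "stdinchi",
    "inchikey", "stdinchikey", "ficts", "ficus", "uuuuu", "image",
    "file", "sdf", "mw", "monoisotopic_mass", "formula", "twirl", "3d",
    "urls", "unii", "chemspider_id", "pubchem_sid", "chemnavigator_sid",
}


def resolver_url(identifier, representation):
    """Build BASE_URL/<identifier>/<representation>.

    Percent-encoding the identifier is an assumption of this sketch.
    """
    if representation not in REPRESENTATIONS:
        raise ValueError(f"unknown representation: {representation!r}")
    return f"{BASE_URL}/{quote(identifier, safe='')}/{representation}"


def resolve(identifier, representation, timeout=10.0):
    """Fetch the raw response body as bytes (requires network access)."""
    with urlopen(resolver_url(identifier, representation),
                 timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    print(resolver_url("guanine", "smiles"))
    # http://cactus.nci.nih.gov/chemical/structure/guanine/smiles
```

Returning bytes keeps the helper usable for both text representations (such as /smiles) and binary ones (such as /image).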
30. Chemical Identifier Resolver: TwirlyMol
   TwirlyMol, implemented by Noel O'Boyle (University College Cork, Ireland), is a simple JavaScript that allows you to render a rotatable/zoomable 3D representation of a molecule in your web browser.
   No plugin is needed, only a modern browser.
   Browsers listed on the slide: Chrome, Safari, FF3.5/3.6, FF3.0, FF2.0, IE8, IE7, IE6.
41. Tautomers: "Chemical Operator"
   http://cactus.nci.nih.gov/chemical/structure/tautomers:guanine/"representation"
   [Figure: tautomeric forms of guanine.]
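The operator syntax on this slide prefixes the identifier with an operator name and a colon. A minimal sketch of building such a URL follows; the helper name `operator_url` and the percent-encoding of the identifier are assumptions, and only the `tautomers` operator is taken from the slide.

```python
from urllib.parse import quote

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def operator_url(operator, identifier, representation):
    """Build BASE_URL/<operator>:<identifier>/<representation>."""
    return (f"{BASE_URL}/{operator}:"
            f"{quote(identifier, safe='')}/{representation}")


if __name__ == "__main__":
    print(operator_url("tautomers", "guanine", "smiles"))
    # http://cactus.nci.nih.gov/chemical/structure/tautomers:guanine/smiles
```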
45. Acknowledgments
   ChemNavigator
   Scott Hutton
   Tad Hurst
   CADD Group, CBL, NCI
   Igor Filippov
   Noel O'Boyle
   Hans-Juergen Himmler (Akos)
   Thanks to all database providers!
   Our web site: http://cactus.nci.nih.gov