NCI/CADD: Open-access chemical structure web platform
Markus Sitzmann [1], Wolf-Dietrich Ihlenfeldt [2], and Marc C. Nicklaus [1]
[1] Computer-Aided Drug Design Group, Chemical Biology Laboratory, NCI-Frederick, NIH, DHHS
[2] Xemistry GmbH, Auf den Stieden 8, D-35094 Lahntal, Germany
NCI/CADD Public Web Services
Enhanced NCI Database Browser: http://cactus.nci.nih.gov/ncidb2 (web service for NCI/DTP's Open NCI Database)
Chemical Structure Lookup Service: http://cactus.nci.nih.gov/lookup (structure lookup in over 100 databases)
NCI/CADD Public Web Services
OSRA: http://cactus.nci.nih.gov/osra/ (converts graphical representations of chemical structures in journal articles, patent documents, textbooks, trade magazines, etc. into SMILES)
Online SMILES Translator: http://cactus.nci.nih.gov/translate/
GIF Creator for Chemical Structures: http://cactus.nci.nih.gov/gifcreator/
PROSIT: Online Pseudorotation Tool Version 2: http://cactus.nci.nih.gov/prosit/
http://cactus.nci.nih.gov
New Web Services
Chemical Structure Representations chemical structure NCI/CADD Identifiers InChI/InChIKey ChemSpider ID PubChem SID/CID chemical   names CAS Registry Number NSC number FDA UNII ChemNavigator SID SMILES SD File Chemical Formula ChEBI ID PDB Ligand ID MRV   CML SYBYL Line Notation   GIF image
http://cactus.nci.nih.gov/chemical/structure Works as a resolver for different  chemical structure identifiers.  Allows one to convert a given structure identifier into another representation or structure identifier. Chemical Identifier Resolver NCI/CADD Web Resources
Chemical Identifier Resolver (NCI/CADD Web Resources): http://cactus.nci.nih.gov/chemical/structure
first beta release: July 2009
second beta release: Nov. 2009
third beta release: April/May 2010 (beta versions will continue through 2010)
3.0 million requests since July 1, 2009 (~11,000/day)
Chemical Identifier Resolver (NCI/CADD Web Resources)
URL scheme: http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation" (MIME type: text/plain)
example: http://cactus.nci.nih.gov/chemical/structure/Tamiflu/cas returns 204255-11-8
XML format: http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"/xml
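The URL scheme on this slide can be assembled programmatically. The following is a minimal Python sketch, not part of the original talk: the helper names `resolver_url` and `fetch` are hypothetical, and percent-encoding the identifier is an assumption about what the service accepts for identifiers that contain special characters.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def resolver_url(identifier, representation, xml=False):
    """Build BASE_URL/identifier/representation, optionally with /xml appended."""
    # Percent-encode the identifier so that spaces, '#', '/' and similar
    # characters in names or SMILES strings do not break the URL path.
    parts = [BASE_URL, quote(identifier, safe=""), representation]
    if xml:
        parts.append("xml")
    return "/".join(parts)


def fetch(identifier, representation):
    """Request the plain-text answer (hypothetical helper; needs network access)."""
    with urlopen(resolver_url(identifier, representation)) as response:
        return response.read().decode("utf-8").strip()


# Building the URL needs no network access.
print(resolver_url("Tamiflu", "cas"))
print(resolver_url("CCO", "iupac_name", xml=True))
```

Calling `fetch("Tamiflu", "cas")` would issue the request from the slide's example; whether the service is reachable and what it returns today is not something this sketch can guarantee.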
Chemical Identifier Resolver (NCI/CADD Web Resources): request handling
http request: "identifier" (e.g. CAS number, chemical name) and "representation" (e.g. InChI, GIF image)
detection of the identifier type
identifier is a full structure representation (e.g. SMILES, InChI): calculation of the requested structure representation
identifier is a hashed structure representation (e.g. InChIKey), chemical name etc.: database lookup of the structure
http response with the matching MIME type
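The dispatch described on this slide (detect the identifier type, then either calculate or look up) might be sketched as follows. The regular expressions and the `classify` function are illustrative assumptions, not the resolver's actual detection rules; SMILES detection is deliberately left out because it needs a chemistry toolkit.

```python
import re

# Illustrative patterns only (assumptions, not the service's real rules).
INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")  # full InChIKey
CAS_NUMBER = re.compile(r"^\d{2,7}-\d{2}-\d$")         # CAS Registry Number shape


def classify(identifier):
    """Return (identifier type, strategy) for an input string."""
    if identifier.startswith("InChI="):
        # A full structure representation: the answer can be calculated.
        return ("InChI", "calculate")
    if INCHIKEY.match(identifier):
        # A hashed representation cannot be reversed: database lookup.
        return ("InChIKey", "lookup")
    if CAS_NUMBER.match(identifier):
        return ("CAS number", "lookup")
    # Anything else may be a SMILES string or a chemical name; a string such
    # as "CCO" can be both, so several resolvers may have to be tried.
    return ("SMILES or chemical name", "try each resolver")


for example in ("LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "204255-11-8", "CCO"):
    print(example, classify(example))
```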
“Chemical Structure Web Engine” Chemical Structure Web Engine  NCI/CADD web service NCI/CADD web service NCI/CADD Chemical Structure Database (CSDB) CACTVS external web services http Chemical Identifier Resolver other software packages
Chemical Structure Database (NCI/CADD Web Resources): union set of unique structures: ~83.6 million
Chemical Structure Database (NCI/CADD Web Resources), as of March 2010:
140 chemical structure databases
103.9 million structure records
~70.6 million unique structures by FICuS
Sources: ChemNavigator iResearch Library ~56%, PubChem ~38%, others ~6%
NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures
Example hashcode of one structure: 9850FD9F9E2B4E25 [structure drawing omitted]
Variants of the same compound (charged form, tautomers, isotope, stereoisomers, salt, drawing "errors") yield different hashcodes: A3DAE0788050DDE4, 3ECEF579D7DF025A, E92E4BA2869F3611, 8A7AD1EB498CC76A, 6C16DE2351F9FF50, 9850FD9F9E2B4E25, 8F7A1DE5A733F0E0, 60525E1AF41497B6, B2FDA68AEDA06DB9 [structure drawings omitted]
NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures
input structure (MDL Molfile, MDL SDF, SMILES, ChemDraw cdx, PDB) → structure normalization → parent structure (MDL SDF, SMILES, database) → hashcode calculation (E_HASHISY) → NCI/CADD Identifier
NCI/CADD Structure Identifiers: Structure Normalization
Each of five structural features is handled either as sensitive or as un-sensitive (normalized):
Fragments: sensitive, or un-sensitive (keep only largest organic fragment)
Isotopes: sensitive, or un-sensitive (ignore isotope labels)
Charges: sensitive, or un-sensitive (uncharge)
Tautomers: sensitive, or un-sensitive (find canonical tautomer)
Stereochemistry: sensitive, or un-sensitive (discard stereo information)
[structure drawings omitted]
NCI/CADD Structure Identifiers: Structure Normalization
FICTS identifier: representation of the exact drawing. Sensitive to Fragments (F), Isotopes (I), Charges (C), Tautomers (T), and Stereochemistry (S).
FICuS identifier: comes closest to how a chemist perceives a compound. Sensitive to Fragments, Isotopes, Charges, and Stereochemistry; un-sensitive (u) to Tautomers.
uuuuu identifier: closely related forms of the same compound. Un-sensitive to all five features.
[structure drawings omitted]
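The naming convention of the three identifiers encodes which features are sensitive: an uppercase letter (F, I, C, T, S) marks a sensitive feature and a lowercase u an un-sensitive one. The Python sketch below decodes such a name; the function `sensitivity` is a hypothetical helper for illustration and does not compute any hashcode.

```python
FEATURES = ("Fragments", "Isotopes", "Charges", "Tautomers", "Stereochemistry")
LETTERS = "FICTS"  # one letter per feature, in the order used on the slides


def sensitivity(identifier_type):
    """Map an identifier name such as 'FICuS' to {feature: is_sensitive}."""
    if len(identifier_type) != len(LETTERS):
        raise ValueError(f"expected 5 characters, got {identifier_type!r}")
    flags = {}
    for feature, letter, char in zip(FEATURES, LETTERS, identifier_type):
        if char == letter:
            flags[feature] = True    # feature distinguishes structures
        elif char == "u":
            flags[feature] = False   # feature is normalized away
        else:
            raise ValueError(f"unexpected character {char!r} for {feature}")
    return flags


for name in ("FICTS", "FICuS", "uuuuu"):
    ignored = [f for f, sensitive in sensitivity(name).items() if not sensitive]
    print(name, "ignores:", ", ".join(ignored) or "nothing")
```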
FICTS identifiers of the example variants (charged form, tautomers, isotope, salt, stereoisomers, drawing "errors"): A3DAE0788050DDE4-FICTS, E5F83F10C5DB080A-FICTS, B2FDA68AEDA06DB9-FICTS, 9850FD9F9E2B4E25-FICTS, E5F83F10C5DB080A-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS, 6C16DE2351F9FF50-FICTS, 9850FD9F9E2B4E25-FICTS [structure drawings omitted]
FICuS identifiers of the same variants: A3DAE0788050DDE4-FICuS, E5F83F10C5DB080A-FICuS, B2FDA68AEDA06DB9-FICuS, 9850FD9F9E2B4E25-FICuS, E5F83F10C5DB080A-FICuS, E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-FICuS [structure drawings omitted]
uuuuu identifiers of the same variants: 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu [structure drawings omitted]
NCI/CADD Chemical Structure Database
NCI/CADD:RID (structure records): 103.5 million
NCI/CADD:CID (compounds, structures unique by CACTVS HASHISY): 83.6 million
FICTS associations: ~72.0 million
FICuS associations: ~70.6 million
uuuuu associations: ~65.3 million
~130 million linkouts to original database records
Chemical Identifier Resolver (NCI/CADD Public Web Resources): http://cactus.nci.nih.gov/chemical/structure
"identifier" (input to the resolver): chemical names, CAS numbers, SMILES strings, IUPAC InChI/InChIKeys, NCI/CADD Identifiers, CACTVS HASHISY, NSC number, PubChem SID/CID, FDA UNII, ChemSpider ID, ChemNavigator SID, Chemical Formula
"representation" (output): /smiles; /names, /iupac_name; /cas; /inchi, /stdinchi; /inchikey, /stdinchikey; /ficts, /ficus, /uuuuu; /image; /file, /sdf; /mw, /monoisotopic_mass; /formula; /twirl, /3d; /urls; /unii; /chemspider_id; /pubchem_sid; /chemnavigator_sid
Chemical Identifier Resolver: InChIKey lookup
Standard InChIKey: http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles returns CCO
http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA/smiles returns CCO, CC[OH2+]
http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ/smiles returns C(C(O)([2H])[2H])[2H], CC(O)([2H])[2H], C(CO)([2H])([2H])[2H], CC[17OH], C(CO)[2H], [14CH3]CO, CCO
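The three lookups on this slide differ only in how much of the InChIKey is supplied: the full key, the key without its final protonation character, and the first 14-character connectivity block alone. A small Python sketch of deriving these progressively less specific keys follows; `inchikey_prefixes` is a hypothetical helper name.

```python
def inchikey_prefixes(inchikey):
    """Return [full key, key without last block, first block] for an InChIKey."""
    blocks = inchikey.split("-")
    if len(blocks) != 3:
        raise ValueError(f"expected three hyphen-separated blocks: {inchikey!r}")
    first, second, _protonation = blocks
    # Each shorter prefix matches a wider set of related structures.
    return [inchikey, f"{first}-{second}", first]


for key in inchikey_prefixes("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"):
    print(key)
```

In the slide's ethanol example, dropping the last character additionally matches the protonated form, and dropping the second block additionally matches isotope-labeled forms.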
File Representation (Chemical Identifier Resolver)
http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA-N/file?format=sdf
Available formats:
alc: Alchemy format
cdxml: CambridgeSoft ChemDraw XML format
cerius: MSI Cerius II format
charmm: Chemistry at HARvard Macromolecular Mechanics file format
cif: Crystallographic Information File
cml: Chemical Markup Language
ctx: Gasteiger Clear Text format
gjf: Gaussian input data file
gromacs: GROMACS file format
hyperchem: HyperChem file format
jme: Java Molecule Editor format
maestro: Schroedinger MacroModel structure file format
mol: Symyx molecule file
sybyl2/mol2: Tripos Sybyl MOL2 format
mrv: ChemAxon MRV format
pdb: Protein Data Bank
sdf: Symyx Structure Data Format
sdf3000: Symyx Structure Data Format 3000
sln: SYBYL Line Notation
smiles: SMILES
xyz: xyz file format
Structure Image Generation (Chemical Identifier Resolver)
http://cactus.nci.nih.gov/chemical/structure/buckyball/image?height=300&width=300&bgcolor=black&bondcolor=white
http://cactus.nci.nih.gov/chemical/structure/aspirin/image?height=200&width=200&symbolfontsize=7&footer="Aspirin"
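Image options are passed as ordinary query parameters, so they can be generated with the standard library. This is a minimal Python sketch; `image_url` is a hypothetical helper, and only the parameter names shown on the slide (height, width, bgcolor, bondcolor, symbolfontsize, footer) are taken from the source.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def image_url(identifier, **options):
    """Build an /image URL, appending any rendering options as a query string."""
    url = f"{BASE_URL}/{quote(identifier, safe='')}/image"
    if options:
        # urlencode escapes values and joins them as key=value&key=value.
        url += "?" + urlencode(options)
    return url


print(image_url("buckyball", height=300, width=300,
                bgcolor="black", bondcolor="white"))
```

The same pattern would apply to the file representation, for example a `format=sdf` parameter on the `/file` URL.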
TwirlyMol (Chemical Identifier Resolver), implemented by Noel O'Boyle (University College Cork, Ireland)
Simple JavaScript that allows you to render a rotatable/zoomable 3D representation of a molecule in your web browser.
No plugin is needed, only a modern browser: Chrome, Safari, FF3.5/3.6, FF3.0, FF2.0, IE8, IE7, IE6
TwirlyMol (Chemical Identifier Resolver)
http://cactus.nci.nih.gov/chemical/structure/restasis/twirl
<div id="canvas" height="400" width="400"></div>
<script src="http://cactus.nci.nih.gov/chemical/structure/restasis/twirl_cached/canvas"></script>
restasis
http://www.coronene.com/blog/ http://chemical-quantum-images.blogspot.com http://baoilleach.blogspot.com/  TwirlyMol Chemical Identifier Resolver
Ambiguous Identifiers (Chemical Identifier Resolver)
Name a specific resolver module:
http://cactus.nci.nih.gov/chemical/structure/CCO/iupac_name?resolver=smiles returns ethanol
http://cactus.nci.nih.gov/chemical/structure/CCO/iupac_name?resolver=name returns 2-[[3-(3-chlorophenyl)-1,2,4-oxadiazol-5-yl]sulfanyl]acetic acid
Ambiguous Identifiers (Chemical Identifier Resolver), XML format:
http://cactus.nci.nih.gov/chemical/structure/CCO/iupac_name/xml
<?xml version="1.0" encoding="UTF-8"?>
<request string="CCO" representation="iupac_name">
  <data id="1" resolver="smiles" string_class="SMILES String">
    <item id="1">ethanol</item>
  </data>
  <data id="2" resolver="name" string_class="Chemical Name">
    <item id="1">2-[[3-(3-chlorophenyl)-1,2,4-oxadiazol-5-yl]sulfanyl]acetic acid</item>
  </data>
</request>
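The XML answer groups results by resolver module, which makes ambiguous identifiers such as CCO easy to handle on the client side. The Python sketch below parses the response shown on this slide with the standard library; the function name `results_by_resolver` is hypothetical, and the embedded sample is the slide's XML rather than a live response.

```python
import xml.etree.ElementTree as ET

# Sample response copied from the slide (not fetched from the service).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<request string="CCO" representation="iupac_name">
  <data id="1" resolver="smiles" string_class="SMILES String">
    <item id="1">ethanol</item>
  </data>
  <data id="2" resolver="name" string_class="Chemical Name">
    <item id="1">2-[[3-(3-chlorophenyl)-1,2,4-oxadiazol-5-yl]sulfanyl]acetic acid</item>
  </data>
</request>"""


def results_by_resolver(xml_bytes):
    """Return {resolver name: [item texts]} from a resolver XML response."""
    root = ET.fromstring(xml_bytes)
    results = {}
    for data in root.findall("data"):
        items = [(item.text or "").strip() for item in data.findall("item")]
        results[data.get("resolver")] = items
    return results


for resolver, items in results_by_resolver(SAMPLE).items():
    print(resolver, "->", items)
```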
Database URL Lookup (Chemical Identifier Resolver)
http://cactus.nci.nih.gov/chemical/structure/restasis/urls/xml
<?xml version="1.0" encoding="UTF-8"?>
<request string="restasis" representation="urls">
  <data id="1" resolver="name" string_class="Chemical Name">
    <item id="1" classification="exact" database="ChemSpider" publisher="ChemSpider">http://chemspider.com/structure.4939506</item>
    <item id="2" classification="exact" database="ChemSpider" publisher="PubChem">http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=43028058</item>
    <item id="3" classification="exact" database="NLM ChemIDplus" publisher="NLM">http://chem.sis.nlm.nih.gov/chemidplus/direct.jsp?result=advanced&regno=059865133
    […]
  </data>
</request>
Name Lookup (Chemical Identifier Resolver)
http://cactus.nci.nih.gov/chemical/structure/CC(=O)Oc1ccccc1C(O)=O/names/xml
<?xml version="1.0" encoding="UTF-8"?>
<request string="CC(=O)Oc1ccccc1C(O)=O" representation="names">
  <data id="1" resolver="smiles" string_class="SMILES String" description="CC(=O)Oc1ccccc1C(O)=O">
    <item id="1" classification="PUBCHEM_IUPAC_NAME">2-acetyloxybenzoic acid</item>
    <item id="2" classification="PUBCHEM_IUPAC_OPENEYE_NAME">2-Acetoxybenzoic acid</item>
    <item id="3" classification="PUBCHEM_GENERIC_REGISTRY_NAME">50-78-2</item>
    <item id="4" classification="PUBCHEM_GENERIC_REGISTRY_NAME">11126-35-5</item>
    <item id="5" classification="PUBCHEM_GENERIC_REGISTRY_NAME">11126-37-7</item>
    <item id="6" classification="PUBCHEM_GENERIC_REGISTRY_NAME">2349-94-2</item>
    <item id="7" classification="PUBCHEM_GENERIC_REGISTRY_NAME">26914-13-6</item>
    <item id="8" classification="PUBCHEM_SUBSTANCE_SYNONYM">NCGC00090977-04</item>
    <item id="9" classification="PUBCHEM_SUBSTANCE_SYNONYM">KBioSS_002272</item>
    <item id="10" classification="PUBCHEM_SUBSTANCE_SYNONYM">SBB015069</item>
    <item id="11" classification="PUBCHEM_SUBSTANCE_SYNONYM">Aspirin</item>
    <item id="12" classification="PUBCHEM_SUBSTANCE_SYNONYM">D00109</item>
    […]
http://cactus.nci.nih.gov/blog /chemical/structure Blog
In Development: http://cactus.nci.nih.gov/TEST_chemical/structure
"Chemical Operators" (Chemical Identifier Resolver)
http://cactus.nci.nih.gov/chemical/structure/operator:identifier/representation
operators: tautomers, canonical_tautomer, addh, removeh, nostereo, rings, …
Tautomers "Chemical Operator"
http://cactus.nci.nih.gov/chemical/structure/tautomers:guanine/"representation"
[Figure: tautomers of guanine; structure drawings omitted]
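An operator is prefixed to the identifier with a colon, as in the tautomers:guanine example. The Python sketch below builds such URLs; `operator_url` is a hypothetical helper, the operator names are the ones listed in the talk, and since the feature is described as in development the exact URL prefix of a deployed service may differ.

```python
from urllib.parse import quote

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"

# Operator names as listed in the talk; the list there ends with an ellipsis,
# so this set is not complete.
KNOWN_OPERATORS = {"tautomers", "canonical_tautomer", "addh",
                   "removeh", "nostereo", "rings"}


def operator_url(operator, identifier, representation):
    """Build BASE_URL/operator:identifier/representation."""
    if operator not in KNOWN_OPERATORS:
        raise ValueError(f"operator not listed in the talk: {operator!r}")
    return f"{BASE_URL}/{operator}:{quote(identifier, safe='')}/{representation}"


print(operator_url("tautomers", "guanine", "smiles"))
```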
IUPAC InChI/InChIKey Resolver
IUPAC InChI/InChIKey Resolver  IUPAC Root Resolver Resolver 1 Resolver 2 Resolver 3 Resolver 3.1 Resolver 3.2 Resolver 3.3 Clients Chemical Identifier Resolver
http://cactus.nci.nih.gov/chemical/structure Chemical Identifier Resolver NCI/CADD Web Resources http://cactus.nci.nih.gov/blog
Acknowledgments ChemNavigator Scott Hutton Tad Hurst CADD Group, CBL, NCI Igor Filippov  Noel O'Boyle Hans-Juergen Himmler (Akos) Thanks to all database providers! http://cactus.nci.nih.gov Our web site:
Users webel.py - A Cinfony module IUPHAR DATABASE http://www.iuphar-db.org http://baoilleach.blogspot.com/2009/11/introducing-webel-cheminformatics.html   http://www.akosgmbh.eu/globalsearch/index.htm  avogadro.openmolecules.net/   CACTVS http://www.xemistry.com in silico  toxicology http://www.in-silico.ch/   Symyx Draw Resolver http://www.symyx.com/

ACS San Francisco 2010 CINF Talk
