InChI/InChIKey vs. NCI/CADD Structure Identifiers: A Comparison
  Markus Sitzmann, Computer-Aided Drug Design Group (NCI/CADD), Laboratory of Medicinal Chemistry, NCI-Frederick, NIH, DHHS
The Adaption and Use of the IUPAC InChI/InChIKey
  Chemical Structure Lookup Service: 74 million structure records, 46 million unique structures
  NCI/CADD Identifiers: FICTS, FICuS, uuuuu
  InChI/InChIKey: Std. InChI/InChIKey
NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures
  [Figure: example structure with hashcode 9850FD9F9E2B4E25]
[Figure: the example structure (9850FD9F9E2B4E25) shown with related forms: charged form, tautomers, isotope, stereoisomers, salt, and “errors”. Hashcodes shown for the related forms: A3DAE0788050DDE4, 3ECEF579D7DF025A, E92E4BA2869F3611, 8A7AD1EB498CC76A, 6C16DE2351F9FF50, 8F7A1DE5A733F0E0, 60525E1AF41497B6, B2FDA68AEDA06DB9]
NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures
  1. Input structure (MDL Molfile, MDL SDF, SMILES, ChemDraw cdx, PDB)
  2. Structure normalization
  3. Parent structure (MDL SDF, SMILES, database)
  4. Hashcode calculation (E_HASHISY)
  5. NCI/CADD Identifier
NCI/CADD Structure Identifiers: Structure Normalization
  Fragments: sensitive, or un-sensitive (keep only largest organic fragment)
  Isotopes: sensitive, or un-sensitive (ignore isotope labels)
  Charges: sensitive, or un-sensitive (uncharge)
  Tautomers: sensitive, or un-sensitive (find canonical tautomer)
  Stereochemistry: sensitive, or un-sensitive (discard stereo information)
  [Figure: example structures illustrating each feature]
NCI/CADD Structure Identifiers: Structure Normalization
  Fragments, isotopes, charges, tautomers, stereochemistry: each can be treated as sensitive or un-sensitive
  [Figure: example structures illustrating each feature]
NCI/CADD Structure Identifiers: Structure Normalization
  FICTS identifier: representation of the exact drawing
  F (fragments), I (isotopes), C (charges), T (tautomers), S (stereochemistry): all sensitive
  [Figure: example structures compared pairwise, marked = or ≠]
NCI/CADD Structure Identifiers: Structure Normalization
  FICuS identifier: comes closest to how a chemist perceives a compound
  F (fragments), I (isotopes), C (charges), S (stereochemistry): sensitive; u (tautomers): un-sensitive
  [Figure: example structures compared pairwise, marked = or ≠]
NCI/CADD Structure Identifier: Structure Normalization
  uuuuu identifier: closely related forms of the same compound
  Fragments, isotopes, charges, tautomers, stereochemistry: all un-sensitive (u u u u u)
  [Figure: example structures compared pairwise, all marked =]
NCI/CADD Structure Identifier: Structure Normalization
  Input structure
  Correct structure: add hydrogen atoms, correct functional groups, correct metal atom bonds
  Get largest fragment & uncharge: delete complex center, get largest organic fragment, delete radical center, uncharge structure
  Define canonical resonance form/protonation state
  Define canonical tautomer
  Normalize (n) or discard (d) stereo information
  Discard isotope labels
  Parent structures: uuuuu, uuuuS, uuuTu, uuuTS, FICuu, FICuS, FICTS, FICTu
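The five-letter tags can be read position by position. The following is a minimal sketch, assuming (as the preceding slides suggest) that the positions stand for fragments, isotopes, charges, tautomers, and stereochemistry, and that a lowercase "u" marks a feature as un-sensitive. The function name `decode_tag` is hypothetical and not part of CACTVS.

```python
# Tag positions as used on the slides:
# F = fragments, I = isotopes, C = charges, T = tautomers, S = stereochemistry.
# A lowercase "u" in a position marks that feature as un-sensitive.
FEATURES = ("fragments", "isotopes", "charges", "tautomers", "stereochemistry")
LETTERS = "FICTS"


def decode_tag(tag: str) -> dict:
    """Map a five-letter tag such as 'FICuS' to {feature: is_sensitive}.

    Hypothetical helper for illustration; it accepts any mix of letters
    and 'u', which is broader than the eight tags listed on the slide.
    """
    if len(tag) != len(LETTERS):
        raise ValueError(f"expected a {len(LETTERS)}-letter tag, got {tag!r}")
    result = {}
    for feature, letter, char in zip(FEATURES, LETTERS, tag):
        if char == letter:
            result[feature] = True       # feature is kept (sensitive)
        elif char == "u":
            result[feature] = False      # feature is normalized away
        else:
            raise ValueError(f"unexpected character {char!r} in tag {tag!r}")
    return result


if __name__ == "__main__":
    for tag in ("FICTS", "FICuS", "uuuuu"):
        print(tag, decode_tag(tag))
```

Read this way, FICuS differs from FICTS only in the tautomer position, which matches the slide's description of FICuS as tautomer-insensitive.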
NCI/CADD Structure Identifier
  Format: <CACTVS hashcode (E_HASHISY)>-<tag>-<version>-<checksum>
  Examples: 9850FD9F9E2B4E25-FICTS-01-57, 9850FD9F9E2B4E25-FICuS-01-78, 9850FD9F9E2B4E25-uuuuu-01-27
  [Figure: example structure]
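The identifier layout above lends itself to a small parser. This sketch assumes, based only on the three examples on the slide, that the hashcode is 16 uppercase hexadecimal digits and that version and checksum are two digits each. The slides do not say how the checksum is computed, so it is extracted but not verified. `parse_identifier` and `StructureIdentifier` are hypothetical names.

```python
import re
from typing import NamedTuple


class StructureIdentifier(NamedTuple):
    hashcode: str   # CACTVS hashcode (E_HASHISY)
    tag: str        # e.g. FICTS, FICuS, uuuuu
    version: str
    checksum: str   # not verified here: the algorithm is not given in the slides


# Field widths are assumptions inferred from the examples on the slide.
_PATTERN = re.compile(r"^([0-9A-F]{16})-([FICTSu]{5})-(\d{2})-(\d{2})$")


def parse_identifier(text: str) -> StructureIdentifier:
    """Split '<hashcode>-<tag>-<version>-<checksum>' into its four fields."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognized NCI/CADD identifier: {text!r}")
    return StructureIdentifier(*match.groups())


if __name__ == "__main__":
    print(parse_identifier("9850FD9F9E2B4E25-FICuS-01-78"))
```

Two identifiers that share a hashcode but differ in tag describe the same parent structure at different normalization levels, so comparisons are only meaningful between identifiers with the same tag.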
FICTS
  [Figure: the example structure and its related forms (charged form, tautomers, isotope, salt, stereoisomers, “errors”), each labeled with its FICTS identifier]
  Identifiers shown: A3DAE0788050DDE4-FICTS, E5F83F10C5DB080A-FICTS, B2FDA68AEDA06DB9-FICTS, 9850FD9F9E2B4E25-FICTS, E5F83F10C5DB080A-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS, 6C16DE2351F9FF50-FICTS, 9850FD9F9E2B4E25-FICTS, 9850FD9F9E2B4E25-FICTS
FICuS
  [Figure: the example structure and its related forms (charged form, tautomers, isotope, salt, stereoisomers, “errors”), each labeled with its FICuS identifier]
  Identifiers shown: A3DAE0788050DDE4-FICuS, E5F83F10C5DB080A-FICuS, B2FDA68AEDA06DB9-FICuS, 9850FD9F9E2B4E25-FICuS, E5F83F10C5DB080A-FICuS, E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-FICuS
uuuuu
  [Figure: the example structure and its related forms (charged form, tautomers, isotope, stereoisomers, salt, “errors”), each labeled with its identifier]
  Identifiers shown: 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu
Std. InChIKey
  [Figure: the example structure and its related forms (charged form, tautomers, isotope, stereoisomers, salt, “errors”), each labeled with its Standard InChIKey]
  Keys shown: HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-CDYZYAPPSA-N, HNDVDQJCIGZPNO-RXMQYKEDSA-N, HNDVDQJCIGZPNO-YFKPBYRVSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, UHPNKBYGGMJTIM-UHFFFAOYSA-M, UHPNKBYGGMJTIM-UHFFFAOYSA-M
Structure Normalization: Tautomers
  Canonical tautomer?
  [Figure: tautomeric forms of an example structure]
Structure Normalization: Tautomers (guanine)
  FICTS identifiers of the drawn tautomers: A6199E68A788F2F5-FICTS, 959B273B619C709F-FICTS, 61248C4A7D045A47-FICTS, 675R4FCC50F45026-FICTS, 0B345B47F6625113-FICTS, 181CA9BCE3EF47F4-FICTS, 1AD375920BE60DAD-FICTS, 67196F0B20B1D934-FICTS, BCCDA7D0CDACF120-FICTS, CE8F480C11DBFC4F-FICTS, D46A1E6500B06AB6-FICTS, D979CF9770AC0BA5-FICTS, 56FFE8B5619FB01-FICTS, F802E527EC5C61BF-FICTS, EF060DA9D97091DE-FICTS
  FICuS identifier: BCCDA7D0CDACF120-FICuS
  Std. InChIKey: UYTPUPDQBNUYGX-UHFFFAOYSA-N
  [Figure: guanine tautomers]
Structure Normalization: Tautomerism & Stereochemistry
  [Figure: Z and E isomers of methyl propenyl ketone and a tautomer]
InChI/InChIKey - NCI/CADD Identifier comparison: Tautomerism & Stereochemistry
  methyl propenyl ketone: 76D03F08ACDF6C0C-FICuS
  FICuS disregards stereochemistry on double bonds if the double bond is not located during tautomer generation.
  [Figure: Z and E isomers of methyl propenyl ketone and their tautomers]
InChI/InChIKey - NCI/CADD Identifier comparison: Tautomerism & Stereochemistry
  methyl propenyl ketone: 76D03F08ACDF6C0C-FICuS
  FICuS disregards stereochemistry on double bonds if the double bond is not located during tautomer generation.
  Std. InChI/InChIKey pairs shown:
  InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3/b4-3+ : LABTWGUMFABVFG-ONEGZZNKSA-N
  InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3/b4-3- : LABTWGUMFABVFG-ARJAWSKDSA-N
  InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4,6H,1H2,2H3/b5-4- : LYGWZVOQSCPYDG-PLNGDYQASA-N (tautomer)
  InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3 : LABTWGUMFABVFG-UHFFFAOYSA-N
  [Figure: Z and E isomers of methyl propenyl ketone and their tautomers]
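The InChIKeys on this slide can be grouped by their first block, which is the comparison the later slides refer to as "connectivity layer/first block". The sketch below only does string handling on the keys quoted above; the helper names are hypothetical.

```python
from collections import defaultdict


def first_block(inchikey: str) -> str:
    """Return the part of an InChIKey before the first hyphen."""
    return inchikey.strip().split("-")[0]


def group_by_first_block(inchikeys):
    """Group full InChIKeys under their shared first block."""
    groups = defaultdict(list)
    for key in inchikeys:
        groups[first_block(key)].append(key)
    return dict(groups)


if __name__ == "__main__":
    # Keys quoted on the methyl propenyl ketone slide.
    keys = [
        "LABTWGUMFABVFG-ONEGZZNKSA-N",
        "LABTWGUMFABVFG-ARJAWSKDSA-N",
        "LYGWZVOQSCPYDG-PLNGDYQASA-N",
        "LABTWGUMFABVFG-UHFFFAOYSA-N",
    ]
    for block, members in group_by_first_block(keys).items():
        print(block, len(members))
```

For these four keys the grouping gives two first blocks: the three keto-form keys share one, and the tautomer drawn with a hydroxyl group has another, whereas the slide shows a single FICuS hashcode for the compound.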
InChI/InChIKey - NCI/CADD Identifier comparison: Tautomerism & Stereochemistry
  methyl propenyl ketone: FICTS “sees” four different structures
  Identifiers shown: 821D8C17ACE5040E-FICTS, 6EB4AA2BAA11965F-FICTS, 1677645190718885-FICTS, 76D03F08ACDF6C0C-FICTS
  [Figure: Z and E isomers of methyl propenyl ketone and their tautomers]
Structure Normalization: Charges in Resonance Systems
  Canonical resonance structure?
  Problem: uncharge ≠ uncharge (different protonation states)
  Hashcodes shown: F3A27F03AE77A722, F3A27F03AE77A722, 62FADCB01F197FC9, 2E011EE4519F7920
  [Figure: charged resonance structures and their uncharged forms]
Structure Normalization: Charges in Resonance Systems
  Shifting of charges: 5 rules
  Recombination of charges: 5 rules
  Separation of charges: 4 rules
  [Figure: example resonance structures]
Structure Normalization: Charges in Resonance Systems
  Münchnones (no plausible unpolarized resonance structure can be drawn)
  Rule applications shown: 1.2 shift, 1.2 recombination, 1.3 shift, 1.3 recombination, separation (pentavalent N atom)
  Identifiers shown: IUYUGWCTOLFFCL-UHFFFAOYSA-N, F68AC07DE0D3379F-FICuS
  [Figure: münchnone resonance structures connected by the rule applications]
InChI/InChIKey - NCI/CADD Identifier comparison
  »Chemical Structure Lookup Service« Database: 74 million structure records (~46 million unique structures)
  Sources: PubChem ~47%, ChemNav. iResearch Lib. ~43%, Others ~10%
InChI/InChIKey - NCI/CADD Identifier comparison: Unique Structure Counts
  Successful calculation of:
  Standard InChI/InChIKey: 73.8 million records
  NCI/CADD Structure Identifiers: 73.7 million records
  Unique counts:
  Standard InChI/InChIKey: 48,027,940
  FICTS Identifier: 48,023,835
  FICuS Identifier: 46,715,521
  Standard InChIKey (first block): 43,055,589
  uuuuu Identifier: 41,671,010
  Standard InChI/InChIKeys were calculated by stdinchi-1 (Linux i-386 executable) from the original SD file records.
InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison
  Original structure record set (74.2 million)
  FICuS compound set (46.7 million unique)
  Standard InChI/InChIKey set calculated by stdinchi-1 (73.8 million, 48.0 million unique)
  Comparison 1: conflicts between the FICuS compound set and the Standard InChI/InChIKey set?
  Comparison 2: same InChI/InChIKey when the Standard InChI/InChIKey is calculated by CACTVS from the FICuS compound structure?
InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison (1)
  No conflicts between Std. InChI/InChIKey and FICuS (million structure records)
  All structure records: 73.7
  FICuS linked to a single InChI/InChIKey: 62.3 (84.5%)
  Sub-cases: both linked to a single structure record; both linked to multiple structure records
  Values shown for the sub-cases: 34.4 (46.9%), 27.9 (38.0%)
InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison (1)
  Conflicts between Std. InChI/InChIKey and FICuS (million structure records)
  All structure records: 73.7
  Categories: FICuS is linked to multiple InChI/InChIKeys or vice versa; one FICuS is linked to multiple InChI/InChIKeys; one InChI/InChIKey is linked to multiple FICuS
  Values shown: 10.4, 3.6, 6.8; percentages shown: 4.6%, 9.3%, 84.5%
  Number of InChIKeys first block: 0.9 (1.2%) and 2.3 (3.1%)
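The conflict categories named on this slide can be derived from one (FICuS, InChIKey) pair per structure record. The sketch below is an illustration under assumptions, not the procedure used for the talk: the function name, the category labels, and the separate "both" bucket for records that fall into both conflict cases are all hypothetical, and the demo identifiers are toy values.

```python
from collections import defaultdict


def classify_links(records):
    """Count structure records by how their FICuS and InChIKey are linked.

    records: iterable of (ficus, inchikey) pairs, one per structure record.
    """
    records = list(records)  # the data is traversed twice
    keys_per_ficus = defaultdict(set)
    ficus_per_key = defaultdict(set)
    for ficus, key in records:
        keys_per_ficus[ficus].add(key)
        ficus_per_key[key].add(ficus)

    counts = {
        "no_conflict": 0,              # one FICuS <-> one InChIKey
        "ficus_to_many_inchikeys": 0,  # one FICuS linked to several InChIKeys
        "inchikey_to_many_ficus": 0,   # one InChIKey linked to several FICuS
        "both": 0,                     # assumption: counted separately here
    }
    for ficus, key in records:
        many_keys = len(keys_per_ficus[ficus]) > 1
        many_ficus = len(ficus_per_key[key]) > 1
        if many_keys and many_ficus:
            counts["both"] += 1
        elif many_keys:
            counts["ficus_to_many_inchikeys"] += 1
        elif many_ficus:
            counts["inchikey_to_many_ficus"] += 1
        else:
            counts["no_conflict"] += 1
    return counts


if __name__ == "__main__":
    # Toy identifiers, not real hashcodes or keys.
    demo = [("A", "k1"), ("A", "k1"), ("B", "k2"),
            ("B", "k3"), ("C", "k4"), ("D", "k4")]
    print(classify_links(demo))
```

Counting per structure record rather than per identifier follows the slide, which reports its numbers in million structure records.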
InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison (2)
  Same InChI/InChIKey? Cases where the InChI changes, by identifier
  Compounds (unique structures, million):
  FICuS: 46.7 in total, InChI changes for 6.4 (13.7%)
  FICTS: 48.0 in total, InChI changes for 3.8 (7.9%)
  uuuuu: 41.6 in total, InChI changes for 11.9 (28.6%)
  Structure records (million, all records: 73.7):
  FICuS: 9.3 (12.7%)
  FICTS: 4.6 (6.2%)
  uuuuu: 21.9 (29.7%)
  vs. InChIKey first block: 3.2 (7.6%) and 6.3 (8.4%)
InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison
  Classification, with occurrence in the FICuS set and occurrence in the FICuS subset (InChI changes)
  Classes: (formal) tautomer count > 1; (formal) tautomer count > 3; (formal) tautomer count > 10; full stereo; contains metal atoms; metal complexes; salt; has resonance charges; inorganic compound
  Percentages shown: 14.5%, 18.5%, 28.9%, 16.9%, 34.5%, 52.1%, 18.6%, 52.1%, 33.9%, 56.4%, 25.4%, 5.5%, 25.7%, 0.8%, 0.2%, 1.0%, 0.2%, 0.1%
InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215
  FICuS: 12 different structure records are linked to this structure.
  Std. InChI/InChIKey (stdinchi-1): calculates 3 different strings/keys for these 12 structure records (all have the same connectivity layer/first block).
  All 3 of these StdInChI/InChIKeys differ from the StdInChI/InChIKey calculated after FICuS normalization (including the connectivity layer/first block).
  [Figure: structure of ChemBlock A3422/0145215]
InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215
  [Figure: Z and E forms of the structure, a tautomer, and S and R forms, with the question "tautomeric interconversion?" between them]
InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215
  How many structures?
  Records shown: ZINC04685909, ChemBlock A3422/0145215, ChemNavigator 47748165, NIST MS-Lib 1967005690, ChemNavigator 34903393, ChemNavigator 65635274
  [Figure: Z and E forms, a tautomer, and S and R forms, with the question "tautomeric interconversion?" between them]
InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215
  How many structures?
  InChIKey A, InChIKey B, InChIKey C: same connectivity layer/block
  FICuS parent structure
  [Figure: Z and E forms, a tautomer, and S and R forms, with the question "tautomeric interconversion?" between them]
InChI/InChIKey - NCI/CADD Identifier comparison: Dithiazinine
  [Figure: the original structure, the best representation, and the structures as seen by InChI and by FICuS, with double bonds labeled Z or E]
The Adaption and Use of the IUPAC InChI/InChIKey
  Chemical Structure Lookup Service: http://cactus.nci.nih.gov/lookup
  74 million structure records, 46 million unique structures
  NCI/CADD Identifiers: FICTS, FICuS, uuuuu
  InChI/InChIKey: Std. InChI/InChIKey
Web Service: Chemical Structure REST Service (beta)
  URL scheme: http://cactus.nci.nih.gov/chemical/structure/{identifier}/{method}
  Examples:
  http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles
  http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/names
  http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/ficus
  http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/stdinchi
  http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/image
  http://cactus.nci.nih.gov/chemical/structure/ethanol/stdinchikey
  http://cactus.nci.nih.gov/chemical/structure/64-17-5/stdinchikey
  Returns plain text or a GIF image. If the structure identifier is not resolvable, the service returns an HTTP 404 status code.
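A request against this URL scheme might be sketched as follows. The base URL and the 404 behaviour are taken from the slide; the function names, the percent-encoding of the identifier, and the timeout are assumptions, and the service was labeled beta at the time, so its current address and behaviour may differ.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

# Base URL as printed on the slide.
BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def build_url(identifier: str, method: str) -> str:
    """Fill the {identifier}/{method} URL scheme.

    Percent-encoding is an assumption of this sketch; '=' is left as-is
    so that 'InChIKey=...' identifiers look like the slide's examples.
    """
    return f"{BASE_URL}/{quote(identifier, safe='=')}/{quote(method, safe='')}"


def resolve(identifier: str, method: str, timeout: float = 10.0):
    """Return the plain-text response, or None if the service answers 404."""
    try:
        with urlopen(build_url(identifier, method), timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except HTTPError as error:
        if error.code == 404:  # identifier not resolvable, per the slide
            return None
        raise


if __name__ == "__main__":
    # Only builds the URLs; call resolve(...) to contact the service.
    print(build_url("ethanol", "stdinchikey"))
    print(build_url("64-17-5", "stdinchikey"))
```

The `image` method returns binary GIF data according to the slide, so `resolve` as written is only suitable for the plain-text methods.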
Acknowledgments
  ChemNavigator: Scott Hutton, Tad Hurst
  CADD Group, LMC, NCI: Marc Nicklaus, Igor V. Filippov
  CACTVS, Xemistry GmbH: Wolf-Dietrich Ihlenfeldt
  Thanks to all database providers
  Our web site: http://cactus.nci.nih.gov

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

  • 1. InChI/InChIKey vs. NCI/CADD Structure Identifiers: A Comparison. Markus Sitzmann, Computer-Aided Drug Design Group (NCI/CADD), Laboratory of Medicinal Chemistry, NCI-Frederick, NIH, DHHS
  • 2. The Adaption and Use of the IUPAC InChI/InChIKey. Chemical Structure Lookup Service: NCI/CADD Identifiers (FICTS, FICuS, uuuuu) and Std. InChI/InChIKey; 74 million structure records, 46 million unique structures
  • 3.
  • 4. Hashcodes of the structure variants (charged form, tautomers, isotope, stereoisomers, salt, "errors"): A3DAE0788050DDE4, 3ECEF579D7DF025A, E92E4BA2869F3611, 8A7AD1EB498CC76A, 6C16DE2351F9FF50, 9850FD9F9E2B4E25, 8F7A1DE5A733F0E0, 60525E1AF41497B6, B2FDA68AEDA06DB9 [structure drawings omitted]
  • 5. NCI/CADD Structure Identifiers, Unique Representation of Chemical Structures: input structure (MDL Molfile, MDL SDF, SMILES, ChemDraw cdx, PDB) → structure normalization → parent structure (MDL SDF, SMILES, database) → hashcode calculation (E_HASHISY) → NCI/CADD Identifier
  • 6.
  • 7. NCI/CADD Structure Identifiers, Structure Normalization: each of five features (Fragments, Isotopes, Charges, Tautomers, Stereochemistry) is treated as either sensitive or un-sensitive [example structure drawings omitted]
  • 8. NCI/CADD Structure Identifiers, Structure Normalization. FICTS identifier: representation of the exact drawing. Sensitive to Fragments (F), Isotopes (I), Charges (C), Tautomers (T), and Stereochemistry (S) [example structure drawings and =/≠ comparison marks omitted]
  • 9. NCI/CADD Structure Identifiers, Structure Normalization. FICuS identifier: comes closest to how a chemist perceives a compound. Sensitive to Fragments (F), Isotopes (I), Charges (C), and Stereochemistry (S); un-sensitive (u) to Tautomers [example structure drawings and =/≠ comparison marks omitted]
  • 10. NCI/CADD Structure Identifier, Structure Normalization. uuuuu identifier: closely related forms of the same compound. Un-sensitive (u) to Fragments, Isotopes, Charges, Tautomers, and Stereochemistry [example structure drawings and = comparison marks omitted]
  • 11. NCI/CADD Structure Identifier, Structure Normalization. Operations shown in the flowchart, starting from the input structure: correct structure (add hydrogen atoms, correct functional groups, correct metal atom bonds); get largest fragment & uncharge (delete complex center, get largest organic fragment, delete radical center, uncharge structure); define canonical resonance form/protonation state; define canonical tautomer; discard isotope labels; normalize (n) or discard (d) stereo information. Resulting parent structures: uuuuu, uuuuS, uuuTu, uuuTS, FICuu, FICuS, FICTS, FICTu
  • 12. NCI/CADD Structure Identifier format: <CACTVS hashcode (E_HASHISY)>-<tag>-<version>-<checksum>. Examples: 9850FD9F9E2B4E25-FICTS-01-57, 9850FD9F9E2B4E25-FICuS-01-78, 9850FD9F9E2B4E25-uuuuu-01-27 [structure drawing omitted]
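The identifier layout on slide 12 can be illustrated with a small parser. This is a minimal sketch, not the CACTVS implementation: the regular expression, the `NciCaddIdentifier` type, and the helper names are assumptions made for illustration, inferred only from the three example strings (16 hexadecimal characters, a five-letter tag, two-digit version and checksum). The checksum is parsed but not verified, because the slides do not describe how it is computed.

```python
import re
from typing import Dict, NamedTuple


class NciCaddIdentifier(NamedTuple):
    """Hypothetical container for the four parts named on slide 12."""
    hashcode: str   # CACTVS hashcode (E_HASHISY), 16 hex characters in the examples
    tag: str        # e.g. FICTS, FICuS, uuuuu
    version: str    # "01" in the examples
    checksum: str   # two digits in the examples; not verified here


# Assumed pattern: each tag position is either its feature letter or "u".
_PATTERN = re.compile(r"^([0-9A-F]{16})-([Fu][Iu][Cu][Tu][Su])-(\d{2})-(\d{2})$")

_FEATURES = ("fragments", "isotopes", "charges", "tautomers", "stereochemistry")


def parse_identifier(text: str) -> NciCaddIdentifier:
    """Split an identifier string into hashcode, tag, version and checksum."""
    # The slide transcript contains stray spaces before hyphens; drop them.
    match = _PATTERN.match(text.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a recognized NCI/CADD identifier: {text!r}")
    return NciCaddIdentifier(*match.groups())


def sensitivity(tag: str) -> Dict[str, bool]:
    """Map each feature to True (sensitive) or False (un-sensitive, 'u')."""
    return {name: letter != "u" for name, letter in zip(_FEATURES, tag)}
```

For example, `sensitivity(parse_identifier("9850FD9F9E2B4E25-FICuS-01-78").tag)` marks every feature except tautomers as sensitive, matching the FICuS description on slide 9.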
  • 13. FICTS identifiers of the structure variants (charged form, tautomers, isotope, salt, stereoisomers, "errors"): A3DAE0788050DDE4-FICTS, E5F83F10C5DB080A-FICTS, B2FDA68AEDA06DB9-FICTS, 9850FD9F9E2B4E25-FICTS, E5F83F10C5DB080A-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS, 6C16DE2351F9FF50-FICTS, 9850FD9F9E2B4E25-FICTS [structure drawings omitted]
  • 14. FICuS identifiers of the structure variants (charged form, tautomers, isotope, salt, stereoisomers, "errors"): A3DAE0788050DDE4-FICuS, E5F83F10C5DB080A-FICuS, B2FDA68AEDA06DB9-FICuS, 9850FD9F9E2B4E25-FICuS, E5F83F10C5DB080A-FICuS, E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-FICuS [structure drawings omitted]
  • 15. uuuuu identifiers of the structure variants (charged form, tautomers, isotope, stereoisomers, salt, "errors"): 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu [structure drawings omitted]
  • 16. Std. InChIKeys of the structure variants (charged form, tautomers, isotope, stereoisomers, salt, "errors"): HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-CDYZYAPPSA-N, HNDVDQJCIGZPNO-RXMQYKEDSA-N, HNDVDQJCIGZPNO-YFKPBYRVSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, UHPNKBYGGMJTIM-UHFFFAOYSA-M, UHPNKBYGGMJTIM-UHFFFAOYSA-M [structure drawings omitted]
  • 17. Structure Normalization, Tautomers: canonical tautomer? [structure drawings omitted]
  • 18.
  • 19.
  • 20. Structure Normalization, Tautomers: guanine (UYTPUPDQBNUYGX-UHFFFAOYSA-N). FICTS identifiers of the drawn tautomers: A6199E68A788F2F5-FICTS, 959B273B619C709F-FICTS, 61248C4A7D045A47-FICTS, 675R4FCC50F45026-FICTS, 0B345B47F6625113-FICTS, 181CA9BCE3EF47F4-FICTS, 1AD375920BE60DAD-FICTS, 67196F0B20B1D934-FICTS, BCCDA7D0CDACF120-FICTS, CE8F480C11DBFC4F-FICTS, D46A1E6500B06AB6-FICTS, D979CF9770AC0BA5-FICTS, 56FFE8B5619FB01-FICTS, F802E527EC5C61BF-FICTS, EF060DA9D97091DE-FICTS. FICuS identifier: BCCDA7D0CDACF120-FICuS [structure drawings omitted]
  • 21. Structure Normalization, Tautomerism & Stereochemistry: methyl propenyl ketone, Z and E isomers [structure drawings omitted]
  • 22. Structure Normalization, Tautomerism & Stereochemistry: methyl propenyl ketone, Z and E isomers, and their tautomers [structure drawings omitted]
  • 23. InChI/InChIKey - NCI/CADD Identifier comparison, Tautomerism & Stereochemistry: methyl propenyl ketone, Z and E isomers, and their tautomers. FICuS identifier: 76D03F08ACDF6C0C-FICuS. FICuS disregards stereochemistry on double bonds if the double bond is not located during tautomer generation [structure drawings omitted]
  • 24. InChI/InChIKey - NCI/CADD Identifier comparison, Tautomerism & Stereochemistry: methyl propenyl ketone and its tautomers. FICuS identifier: 76D03F08ACDF6C0C-FICuS. FICuS disregards stereochemistry on double bonds if the double bond is not located during tautomer generation. Std. InChI and InChIKey pairs shown: InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3/b4-3+ (LABTWGUMFABVFG-ONEGZZNKSA-N); InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4,6H,1H2,2H3/b5-4- (LYGWZVOQSCPYDG-PLNGDYQASA-N); InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3/b4-3- (LABTWGUMFABVFG-ARJAWSKDSA-N); InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3 (LABTWGUMFABVFG-UHFFFAOYSA-N) [structure drawings omitted]
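The four InChIKeys on slide 24 show how the key's blocks behave: the three ketone forms share the first block and differ only in the second, while the tautomer has a different first block. A minimal sketch of that comparison follows; the function names and the informal labels are assumptions for illustration, and only the keys themselves come from the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# InChIKeys copied from slide 24; the labels are informal descriptions
# built from the stereo layer of the corresponding InChI string.
SLIDE_24_KEYS = {
    "ketone, /b4-3+": "LABTWGUMFABVFG-ONEGZZNKSA-N",
    "tautomer, /b5-4-": "LYGWZVOQSCPYDG-PLNGDYQASA-N",
    "ketone, /b4-3-": "LABTWGUMFABVFG-ARJAWSKDSA-N",
    "ketone, no stereo layer": "LABTWGUMFABVFG-UHFFFAOYSA-N",
}


def split_inchikey(key: str) -> Tuple[str, str, str]:
    """Split an InChIKey into first block, second block, and final character."""
    parts = key.split("-")
    if len(parts) != 3 or [len(p) for p in parts] != [14, 10, 1]:
        raise ValueError(f"unexpected InChIKey layout: {key!r}")
    return parts[0], parts[1], parts[2]


def group_by_first_block(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Collect full keys under their shared first (connectivity) block."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        first_block, _, _ = split_inchikey(key)
        groups[first_block].append(key)
    return dict(groups)
```

Calling `group_by_first_block(SLIDE_24_KEYS.values())` yields two groups: one with the three `LABTWGUMFABVFG` keys and one with the single `LYGWZVOQSCPYDG` key.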
  • 25. InChI/InChIKey - NCI/CADD Identifier comparison, Tautomerism & Stereochemistry: methyl propenyl ketone and its tautomers. FICTS "sees" four different structures: 821D8C17ACE5040E-FICTS, 6EB4AA2BAA11965F-FICTS, 1677645190718885-FICTS, 76D03F08ACDF6C0C-FICTS [structure drawings omitted]
  • 26. Structure Normalization, Charges in Resonance Systems: canonical resonance structure? Different protonation states; uncharge ≠ uncharge: problem! Hashcodes shown: F3A27F03AE77A722, F3A27F03AE77A722, 62FADCB01F197FC9, 2E011EE4519F7920 [structure drawings omitted]
  • 27.
  • 28. Structure Normalization, Charges in Resonance Systems: münchnones (no plausible unpolarized resonance structure can be drawn). Transformations shown: 1.2 shift, 1.2 recombination, separation (pentavalent N atom), 1.3 shift, 1.3 recombination. Identifiers: IUYUGWCTOLFFCL-UHFFFAOYSA-N, F68AC07DE0D3379F-FICuS [structure drawings omitted]
  • 29.
  • 30.
  • 31. InChI/InChIKey - NCI/CADD Identifier comparison, Detailed Comparison: original structure record set (74.2 million); FICuS compound set (46.7 million unique); Standard InChI/InChIKey set calculated by stdinchi-1 (73.8 million, 48.0 million unique)
  • 32. InChI/InChIKey - NCI/CADD Identifier comparison, Detailed Comparison: original structure record set (74.2 million); FICuS compound set (46.7 million unique); Standard InChI/InChIKey set calculated by stdinchi-1 (73.8 million, 48.0 million unique). Question 1: conflicts?
  • 33. InChI/InChIKey - NCI/CADD Identifier comparison, Detailed Comparison: original structure record set (74.2 million); FICuS compound set (46.7 million unique); Standard InChI/InChIKey set calculated by stdinchi-1 (73.8 million, 48.0 million unique); Standard InChI/InChIKey calculated by CACTVS from FICuS compound structure. Question 1: conflicts? Question 2: same InChI/InChIKey?
  • 34. InChI/InChIKey - NCI/CADD Identifier comparison, Detailed Comparison (1): no conflicts between Std. InChI/InChIKey and FICuS. Structure records (million records): all structure records 73.7; FICuS linked to a single InChI/InChIKey 62.3 (84.5%), of which both linked to a single structure record 34.4 (46.9%) and both linked to multiple structure records 27.9 (38.0%)
  • 35. InChI/InChIKey - NCI/CADD Identifier comparison, Detailed Comparison (1): conflicts between Std. InChI/InChIKey and FICuS. Structure records (million records): all structure records 73.7; FICuS is linked to multiple InChI/InChIKeys or vice versa 10.4, split into 3.6 and 6.8 across the two cases (one FICuS is linked to multiple InChI/InChIKeys; one InChI/InChIKey is linked to multiple FICuS). Percentages shown: (4.6%), (9.3%), (84.5%)
  • 36. InChI/InChIKey - NCI/CADD Identifier comparison, Detailed Comparison (1): conflicts between Std. InChI/InChIKey and FICuS. Structure records (million records): all structure records 73.7; FICuS is linked to multiple InChI/InChIKeys or vice versa 10.4, split into 3.6 and 6.8 across the two cases (one FICuS is linked to multiple InChI/InChIKeys; one InChI/InChIKey is linked to multiple FICuS). Percentages shown: (4.6%), (9.3%), (84.5%). Counted by number of InChIKeys first block: 0.9 (1.2%) and 2.3 (3.1%)
  • 37. InChI/InChIKey - NCI/CADD Identifier comparison, Detailed Comparison (2): same InChI/InChIKey? Compounds (unique structures, million records): FICuS 46.7, FICTS 48.0, uuuuu 41.6; InChI changes: 6.4 (13.7%), 3.8 (7.9%), 11.9 (28.6%). Structure records (million records): all records 73.7; InChI changes: 9.3 (12.7%), 4.6 (6.2%), 21.9 (29.7%)
  • 38. InChI/InChIKey - NCI/CADD Identifier comparison, Detailed Comparison (2): same InChI/InChIKey? Compounds (unique structures, million records): FICuS 46.7, FICTS 48.0, uuuuu 41.6; InChI changes: 6.4 (13.7%), 3.8 (7.9%), 11.9 (28.6%). Structure records (million records): all records 73.7; InChI changes: 9.3 (12.7%), 4.6 (6.2%), 21.9 (29.7%). Vs. InChIKey first block, values shown: 3.2 and 6.3; (7.6%) and (8.4%)
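The compound percentages on slides 37 and 38 can be re-derived from the counts. The sketch below assumes that each "InChI changes" count belongs to the identifier set listed in the same position (FICuS, FICTS, uuuuu); that pairing is an inference from the slide layout, and the names `COMPOUNDS`, `share_percent`, and `changed_shares` are illustrative.

```python
from typing import Dict, Tuple

# (total unique compounds, compounds whose InChI changes), in millions,
# as read from slide 37. The pairing of totals and changes is assumed.
COMPOUNDS: Dict[str, Tuple[float, float]] = {
    "FICuS": (46.7, 6.4),
    "FICTS": (48.0, 3.8),
    "uuuuu": (41.6, 11.9),
}


def share_percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def changed_shares(table: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Percentage of compounds in each set whose InChI changes."""
    return {
        name: share_percent(changed, total)
        for name, (total, changed) in table.items()
    }
```

Under that pairing, `changed_shares(COMPOUNDS)` reproduces the 13.7%, 7.9%, and 28.6% shown on the slide, which supports reading the three percentages as shares of the respective compound sets.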
  • 39. InChI/InChIKey - NCI/CADD Identifier comparison, Detailed Comparison: compound classification, occurrence in FICuS set vs. occurrence in FICuS subset (InChI changes). Classes: (formal) tautomer count > 1; (formal) tautomer count > 3; (formal) tautomer count > 10; full stereo; contains metal atoms; metal complexes; salt; has resonance charges; inorganic. Percentages as listed on the slide: 14.5%, 18.5%, 28.9%, 16.9%, 34.5%, 52.1%, 18.6%, 52.1%, 33.9%, 56.4%, 25.4%, 5.5%, 25.7%, 0.8%, 0.2%, 1.0%, 0.2%, 0.1%
  • 40. InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215. FICuS: 12 different structure records are linked to this structure. Std. InChI/InChIKey (stdinchi-1): calculates 3 different strings/keys for these 12 structure records (all have the same connectivity layer/first block). All 3 of these StdInChI/InChIKeys differ from the StdInChI/InChIKey calculated after FICuS normalization (including connectivity layer/first block) [structure drawing omitted]
  • 41. InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215, Z and E forms [structure drawings omitted]
  • 42. InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215, Z and E forms; tautomer [structure drawings omitted]
  • 43. InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215, Z and E forms; tautomer; tautomeric interconversion? [structure drawings omitted]
  • 44. InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215, Z and E forms; tautomer; tautomeric interconversion? (asked twice); S and R forms [structure drawings omitted]
  • 45. InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215, Z and E forms; tautomer; tautomeric interconversion? (asked twice); S and R forms [structure drawings omitted]
  • 46. InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215, Z and E forms; tautomer; tautomeric interconversion? (asked twice); S and R forms. How many structures? Records shown: ZINC04685909, ChemBlock A3422/0145215, ChemNavigator 47748165, NIST MS-Lib 1967005690, ChemNavigator 34903393, ChemNavigator 65635274 [structure drawings omitted]
  • 47. InChI/InChIKey - NCI/CADD Identifier comparison: ChemBlock A3422/0145215, Z and E forms; tautomer; tautomeric interconversion? (asked twice); S and R forms. How many structures? InChIKey A, InChIKey B, InChIKey C (same connectivity layer/block); FICuS parent structure [structure drawings omitted]
  • 48. InChI/InChIKey - NCI/CADD Identifier comparison: Dithiazinine, original structure [structure drawing omitted]
  • 49. InChI/InChIKey - NCI/CADD Identifier comparison: Dithiazinine, original structure and best representation [structure drawings omitted]
  • 50. InChI/InChIKey - NCI/CADD Identifier comparison: Dithiazinine, original structure, best representation, and the InChI and FICuS representations with Z/E double-bond labels [structure drawings omitted]
  • 51. The Adaption and Use of the IUPAC InChI/InChIKey. Chemical Structure Lookup Service (http://cactus.nci.nih.gov/lookup): NCI/CADD Identifiers (FICTS, FICuS, uuuuu) and Std. InChI/InChIKey; 74 million structure records, 46 million unique structures
  • 52. Web Service: Chemical Structure REST Service (beta). URL scheme: http://cactus.nci.nih.gov/chemical/structure/{identifier}/{method}. Examples: http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles; http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/names; http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/ficus; http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/stdinchi; http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/image; http://cactus.nci.nih.gov/chemical/structure/ethanol/stdinchikey; http://cactus.nci.nih.gov/chemical/structure/64-17-5/stdinchikey. The service returns plain text or a GIF image; if the structure identifier is not resolvable, it returns an HTTP 404 status code
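The URL scheme on slide 52 can be exercised with a few lines of client code. The sketch below uses only the Python standard library; `build_url` and `resolve` are hypothetical helper names, the percent-encoding choice is an assumption, and whether the beta service still answers at this address is not something the slides guarantee.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

# Base address taken from slide 52.
BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def build_url(identifier: str, method: str) -> str:
    """Assemble {BASE_URL}/{identifier}/{method} as described on the slide."""
    # Assumption: keep "=" literal (as in "InChIKey=..."), encode the rest.
    encoded = urllib.parse.quote(identifier, safe="=")
    return f"{BASE_URL}/{encoded}/{method}"


def resolve(identifier: str, method: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch a plain-text representation, or None if the service answers 404.

    Intended for text methods such as "smiles" or "stdinchikey"; the
    "image" method returns GIF bytes and would need different handling.
    """
    try:
        with urllib.request.urlopen(build_url(identifier, method), timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:  # identifier not resolvable, per the slide
            return None
        raise
```

A call such as `resolve("ethanol", "stdinchikey")` would request the last-but-one example URL from the slide; treating 404 as "no result" rather than an error mirrors the behavior the slide describes.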
  • 53. Acknowledgments. ChemNavigator: Scott Hutton, Tad Hurst. CADD Group, LMC, NCI: Marc Nicklaus, Igor V. Filippov. CACTVS, Xemistry GmbH: Wolf-Dietrich Ihlenfeldt. Thanks to all database providers. Our web site: http://cactus.nci.nih.gov