How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals?
V. Tkachenko, A.J. Williams, Y. Borodina, F. Switzer, T. Peryea, L. Callahan

                                    ACS Philly August 2012
What is InChI?
InChI Examples


ethanol (CH3CH2OH)
   InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3

L-ascorbic acid
   InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1
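The layered structure of these strings can be made concrete with a short sketch. It assumes only the convention visible in the examples: the string starts with `InChI=`, the first two `/`-separated fields are the version and the formula, and every later field is a layer whose first letter names it (`c` connectivity, `h` hydrogens, `t`, `m` and `s` tetrahedral stereo information). The helper name `split_inchi_layers` is hypothetical, and the simple split ignores complications such as the fixed-H (`/f`) and reconnected (`/r`) sublayers, which can repeat layer letters.

```python
from typing import Dict


def split_inchi_layers(inchi: str) -> Dict[str, str]:
    """Split an InChI string into version, formula and prefixed layers.

    Simplified sketch: assumes a formula layer is present and that each
    layer letter occurs only once (no /f fixed-H or /r reconnected parts).
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError(f"not an InChI string: {inchi!r}")
    fields = inchi[len(prefix):].split("/")
    if len(fields) < 2:
        raise ValueError("missing formula layer")
    layers = {"version": fields[0], "formula": fields[1]}
    for field in fields[2:]:
        if not field:
            continue  # tolerate an accidental empty field ("//")
        # The first character names the layer, e.g. 'c' for connectivity.
        layers[field[0]] = field[1:]
    return layers


for name, value in split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3").items():
    print(f"{name:>8}: {value}")
```

Keeping the layers separate is what lets tools compare two structures at a chosen level of detail, for example connectivity only, ignoring stereo.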
InChI Structure
InChIKey
   The condensed, 27-character (including hyphens) standard InChIKey is a hashed version of the full standard InChI (using the SHA-256 algorithm)
   Designed to allow for easy web searches of chemical compounds
   InChIKeys consist of
       14 characters resulting from a hash of the connectivity information of the InChI
       followed by a hyphen and 8 characters resulting from a hash of the remaining layers of the InChI
       followed by a single character indicating a standard (S) or non-standard (N) key
       followed by a single character indicating the InChI version used (A = version 1)
       followed by a hyphen and a single protonation character (N = no protons added or removed)

   Example (morphine):
   InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1
   BQJCRHHNABKAKU-KBQPJGBKSA-N
   Unlike InChI, an InChIKey cannot be converted back into a structure (connection table); it can be resolved only by lookup
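The block layout of a standard InChIKey can be checked mechanically. The sketch below splits a key into its parts and illustrates why resolution needs a lookup: the hash cannot be decoded, so a key maps back to an InChI only through an index of known structures. The function names and the in-memory `index` dict (a stand-in for a database such as ChemSpider) are hypothetical, and the regular expression covers only the standard 27-character layout.

```python
import re
from typing import Dict, NamedTuple, Optional

# Standard layout: 14 letters, hyphen, 8 letters + flag + version, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


class InChIKeyParts(NamedTuple):
    skeleton_hash: str  # first block: hash of the connectivity (main) layer
    layers_hash: str    # hash of the remaining layers (stereo, isotopes, ...)
    standard_flag: str  # 'S' = standard InChIKey, 'N' = non-standard
    version: str        # 'A' = InChI version 1
    protonation: str    # 'N' = no protons added or removed


def parse_inchikey(key: str) -> InChIKeyParts:
    match = INCHIKEY_RE.match(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return InChIKeyParts(*match.groups())


def same_skeleton(key_a: str, key_b: str) -> bool:
    """True if both keys share the first block (same connectivity hash),
    e.g. stereoisomers; hash collisions are possible but rare."""
    return parse_inchikey(key_a).skeleton_hash == parse_inchikey(key_b).skeleton_hash


def resolve(key: str, index: Dict[str, str]) -> Optional[str]:
    """Hashing is one-way: a key turns back into an InChI only by lookup."""
    return index.get(key)


MORPHINE_INCHI = (
    "InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9"
    "(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3"
    "/t10-,11+,13-,16-,17-/m0/s1"
)
index = {"BQJCRHHNABKAKU-KBQPJGBKSA-N": MORPHINE_INCHI}  # stand-in for a database

print(parse_inchikey("BQJCRHHNABKAKU-KBQPJGBKSA-N"))
print(resolve("BQJCRHHNABKAKU-KBQPJGBKSA-N", index) == MORPHINE_INCHI)
```

Comparing only the first block is a cheap way to find candidate stereoisomers, which is one reason the connectivity hash is kept separate from the rest.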
Proliferation of InChI
Search by InChI
ChemSpider Google Search
http://www.chemspider.com/google/
What’s the catch?

 InChI has limitations
 InChI is ideal for
    Simple
    Static
    Well-defined graphs
 Real chemical substances can only be approximated by such graphs
Limitations
 Non-trivial stereo (e.g. axial, planar)
 Non-trivial tautomers (e.g. ring-chain)
 Mixtures – full stereo is rarely known
 Polymers
 Markush structures
 Organometallics
 Inorganics
 Materials
 Reactions
 Etc.
Chemical data complexity
Work in progress
   InChI Extensions: Under the guidance of IUPAC, several sub-teams are now
    working on expanding InChI to new areas of chemical representation:

      Reaction InChI (RInChI): the reaction working group has completed its
       recommendations, and work is ready to begin.

      Polymers/Mixtures: The polymers/mixtures working group has also
       submitted its recommendations, and work to incorporate the new
       representations should begin once version 1.04 is released.

      Markush: This project is the most complex undertaken to date. The initial
       recommendations have been submitted, but financing of the work still
       needs to be sorted out.

   But what do we do NOW???
Deposition Process
   Data → Validation → Standardization → Filtering → Componentization → Deduplication → Mapping → Non-redundant data
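The stages of this process can be sketched as a single loop over deposited records. This is an illustration under stated assumptions, not ChemSpider's implementation: each record is assumed to arrive already split into components, each with an InChIKey computed upstream by a cheminformatics toolkit; "standardization" here is only a placeholder text normalization; and `Deposition`, `deposit` and the `MAX_COMPONENTS` filter are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

KEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")  # standard InChIKey shape
MAX_COMPONENTS = 8  # hypothetical filtering threshold


@dataclass
class Deposition:
    source_id: str
    component_keys: List[str]  # assumption: one InChIKey per component, computed upstream


def deposit(records: List[Deposition]) -> Tuple[List[Tuple[str, ...]], Dict[str, int], List[str]]:
    unique: Dict[Tuple[str, ...], int] = {}  # canonical form -> compound id
    mapping: Dict[str, int] = {}             # source id -> compound id
    rejected: List[str] = []
    for rec in records:
        # Validation: at least one component, each shaped like an InChIKey.
        if not rec.component_keys or not all(
            KEY_RE.match(k.strip().upper()) for k in rec.component_keys
        ):
            rejected.append(rec.source_id)
            continue
        # Standardization (placeholder): normalize whitespace and case.
        keys = [k.strip().upper() for k in rec.component_keys]
        # Filtering: drop records outside a hypothetical size limit.
        if len(keys) > MAX_COMPONENTS:
            rejected.append(rec.source_id)
            continue
        # Componentization: components arrive pre-split; sort so order is irrelevant.
        canonical = tuple(sorted(keys))
        # Deduplication: identical canonical forms share one compound id.
        compound_id = unique.setdefault(canonical, len(unique))
        # Mapping: remember which compound each deposited record became.
        mapping[rec.source_id] = compound_id
    return list(unique), mapping, rejected
```

The mapping table is what keeps depositor identifiers linked to the non-redundant record after duplicates are merged.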
ChemSpider Data Model
Organometallics
Mixtures or unknown stereo
Accelrys Enhanced Stereo
MOL V3000
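To make concrete what InChI cannot carry: in a V3000 molfile, enhanced stereo is recorded as atom collections in the collection block, `STEABS` (absolute), `STERACn` (an AND group, racemic) and `STERELn` (an OR group, relative). The sketch below extracts these groups from a molfile fragment. It is a minimal illustration, not a full CTfile parser: it assumes each collection sits on one `M  V30` line (no `-` line continuations), and the names `StereoGroup` and `parse_enhanced_stereo` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One collection per line, e.g. "M  V30 MDLV30/STERAC1 ATOMS=(2 2 3)".
GROUP_RE = re.compile(r"MDLV30/STE(ABS|RAC|REL)(\d*)\s+ATOMS=\(([\d\s]+)\)")

MEANING = {
    "ABS": "absolute: configuration as drawn",
    "RAC": "AND group: racemic mixture of drawn and inverted forms",
    "REL": "OR group: either drawn or inverted form, unknown which",
}


@dataclass(frozen=True)
class StereoGroup:
    kind: str               # 'ABS', 'RAC' or 'REL'
    number: Optional[int]   # group number; None for the absolute group
    atoms: Tuple[int, ...]  # 1-based atom indices from the atom block


def parse_enhanced_stereo(molblock: str) -> List[StereoGroup]:
    groups = []
    for match in GROUP_RE.finditer(molblock):
        kind, number, atom_field = match.groups()
        values = [int(v) for v in atom_field.split()]
        if not values:
            raise ValueError(f"empty atom list in {match.group(0)!r}")
        count, atoms = values[0], tuple(values[1:])  # first value is the atom count
        if count != len(atoms):
            raise ValueError(f"atom count mismatch in {match.group(0)!r}")
        groups.append(StereoGroup(kind, int(number) if number else None, atoms))
    return groups


fragment = """\
M  V30 BEGIN COLLECTION
M  V30 MDLV30/STEABS ATOMS=(1 4)
M  V30 MDLV30/STERAC1 ATOMS=(2 2 3)
M  V30 END COLLECTION
"""
for group in parse_enhanced_stereo(fragment):
    print(group.kind, group.number, group.atoms, "-", MEANING[group.kind])
```

A standard InChI computed from such a record keeps the drawn stereocentres but has no place for the AND/OR grouping, which is the information loss this slide refers to.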
Enhanced stereo and InChI…
 Unfortunately not supported
 Is it important?
 Now real-world examples…
FDA Substance Registration System
Stoichiometric and non-stoichiometric mixtures
   [Figure: substance shown as Moiety 1 + Moiety 2]
   [Figure: substance shown as Moiety 1 + Moiety 2 + Moiety 3 + Moiety 4]
   [Figure: substance shown as Moiety 1 + Moiety 2 (undefined)]
   [Figure: substance shown as Moiety 1 (A) + Moiety 2 (B)]
D-glucose
SRS standardization approach
   Substance description
   Standardization module
   Moieties generator
   Normalization
   InChI[Key] generator


 Hash function f(InChIKeys, moieties)


 Unique ID
 Standard description
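The final step of this pipeline, a hash function over moiety InChIKeys, can be sketched as follows. The slide does not give the function the SRS actually uses, so everything here is an assumption for illustration: moieties are given as (InChIKey, count) pairs, `None` marks a non-stoichiometric (undefined) amount, normalization merges repeated moieties and divides defined counts by their greatest common divisor, and the sorted canonical string is hashed with SHA-256 and truncated. `normalize_moieties` and `substance_id` are hypothetical names.

```python
import hashlib
import math
from functools import reduce
from typing import Dict, Iterable, Optional, Tuple

Moiety = Tuple[str, Optional[int]]  # (InChIKey, count, or None if undefined)


def normalize_moieties(moieties: Iterable[Moiety]) -> Dict[str, Optional[int]]:
    """Merge repeated moieties and reduce defined counts by their GCD."""
    merged: Dict[str, Optional[int]] = {}
    for key, count in moieties:
        if key in merged:
            prev = merged[key]
            merged[key] = None if prev is None or count is None else prev + count
        else:
            merged[key] = count
    divisor = reduce(math.gcd, (c for c in merged.values() if c is not None), 0)
    if divisor > 1:
        merged = {k: (c // divisor if c is not None else None) for k, c in merged.items()}
    return merged


def substance_id(moieties: Iterable[Moiety], length: int = 16) -> str:
    """Order-independent ID: SHA-256 of the sorted 'InChIKey*count' terms."""
    normalized = normalize_moieties(moieties)
    canonical = "|".join(
        f"{key}*{'?' if count is None else count}"
        for key, count in sorted(normalized.items())
    )
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()[:length].upper()


MORPHINE = "BQJCRHHNABKAKU-KBQPJGBKSA-N"
WATER = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"
# A 1:1 hydrate described in two ways yields one ID.
print(substance_id([(MORPHINE, 1), (WATER, 1)]) == substance_id([(WATER, 2), (MORPHINE, 2)]))
```

Sorting the terms makes the ID independent of the order in which moieties were drawn, and the `?` marker keeps a mixture of undefined ratio distinct from any stoichiometric one.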
SRS TBD
 Markush

 Polymers

 Proteins

 Inorganics

 Materials
OpenPHACTS
 Open PHACTS is a 3-year Innovative Medicines Initiative (IMI) project

 To reduce the barriers to drug discovery in industry, academia and small businesses

 To build an open platform integrating chemistry and biology data from public domain resources

 Semantic web platform

 Open Standards, Open Data and Open Source
OpenPHACTS specifics
 Active/inactive ingredient

 Parent/child

 Sample/substance

 Misreferences (!!!)
ChemSpider Reactions
ChemSpider Reaction Challenges
 Deduplication

 Identification

 Deposition
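Reaction deduplication can be sketched by analogy with the compound case: build an order-independent key from the InChIKeys of reactants and products and group records that share it. This is a hypothetical illustration, not RInChI and not ChemSpider's method; it assumes every participant already has an InChIKey, ignores agents, conditions and stoichiometry, and treats reaction direction as significant. The key strings in the usage example are placeholders.

```python
from typing import Dict, Iterable, List, Tuple

ReactionKey = Tuple[Tuple[str, ...], Tuple[str, ...]]


def reaction_key(reactants: Iterable[str], products: Iterable[str]) -> ReactionKey:
    """Order-independent key; repeats within a side collapse (stoichiometry ignored)."""
    return tuple(sorted(set(reactants))), tuple(sorted(set(products)))


def deduplicate(reactions: Dict[str, Tuple[List[str], List[str]]]) -> Dict[ReactionKey, List[str]]:
    """Group reaction record ids that share the same reaction key."""
    groups: Dict[ReactionKey, List[str]] = {}
    for record_id, (reactants, products) in reactions.items():
        groups.setdefault(reaction_key(reactants, products), []).append(record_id)
    return groups


records = {
    "r1": (["KEY_A", "KEY_B"], ["KEY_C"]),
    "r2": (["KEY_B", "KEY_A"], ["KEY_C"]),  # same reaction, different order
    "r3": (["KEY_C"], ["KEY_A", "KEY_B"]),  # reverse direction: kept separate
}
for key, ids in deduplicate(records).items():
    print(key, ids)
```

Deciding whether the reverse reaction, agents or conditions should participate in the key is exactly the identification question listed on this slide.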
Conclusions
 InChI is The Identifier

 InChI has its limitations

 InChI is a work in progress

 InChI deficiencies can be hot-fixed
Acknowledgements
 RSC Cheminformatics group

 FDA SRS group

 OpenPHACTS consortium

 Software: InChI, GGA Software
Thank you

Email: tkachenkov@rsc.org
Blog: www.chemspider.com/blog
SLIDES:
http://www.slideshare.net/valerytkachenko16