How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals?

V. Tkachenko, A.J. Williams, Y. Borodina, F. Switzer, T. Peryea, L. Callahan

ACS Philly August 2012
What is InChI
InChI Examples


 ethanol (CH3CH2OH)
     InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3

 L-ascorbic acid
     InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1
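The layered structure of these strings can be illustrated with a short sketch. It assumes only that an InChI starts with `InChI=`, that layers are separated by `/`, and that every layer after the formula begins with a one-letter prefix (`c`, `h`, `t`, `m`, `s`, ...). The function name `split_inchi_layers` is hypothetical and is not part of the InChI software.

```python
def split_inchi_layers(inchi):
    """Split an InChI string into (layer name, value) pairs.

    Illustrative only: returns a list rather than a dict because
    some prefixes can occur more than once in a full InChI.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("expected a string starting with 'InChI='")
    parts = inchi[len(prefix):].split("/")
    # First part is the version (e.g. "1S"), second is the formula.
    layers = [("version", parts[0])]
    if len(parts) > 1:
        layers.append(("formula", parts[1]))
    # Remaining layers are identified by their one-letter prefix.
    for part in parts[2:]:
        if part:
            layers.append((part[0], part[1:]))
    return layers


ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
for name, value in split_inchi_layers(ethanol):
    print(name, value)
```

For ethanol this separates the version, the formula, the connectivity (`c`) layer and the hydrogen (`h`) layer; for L-ascorbic acid the stereo layers (`t`, `m`, `s`) appear as additional pairs.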
InChI Structure
InChIKey
   The condensed, 27-character standard InChIKey is a hashed version of the full standard InChI (using the SHA-256 algorithm)
   Designed to allow for easy web searches of chemical compounds
   InChIKeys consist of
       14 characters resulting from a hash of the connectivity information of the InChI
       followed by a hyphen and 8 characters resulting from a hash of the remaining layers of the InChI
       followed by a single character indicating the kind of InChIKey (S for standard, N for non-standard)
       followed by a single character indicating the version of InChI used (A for version 1)
       followed by a hyphen and a single character indicating protonation (N when no protons were added or removed)

   InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1
   BQJCRHHNABKAKU-KBQPJGBKSA-N
   Unlike an InChI, an InChIKey can be converted back to a connection table (CT) only by lookup
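As a rough illustration of how the key above breaks down, the sketch below splits an InChIKey into its blocks. It assumes the layout described in the InChI technical documentation for version 1 keys: 14 letters, a hyphen, 8 letters plus a standard/non-standard flag and a version letter, a hyphen, and one protonation letter. The names `INCHIKEY_RE` and `parse_inchikey` are hypothetical.

```python
import re

# Assumed layout: 14 letters - 8 letters + flag (S/N) + version letter - 1 letter
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def parse_inchikey(key):
    """Split an InChIKey into its blocks; raise ValueError if malformed."""
    match = INCHIKEY_RE.match(key)
    if match is None:
        raise ValueError("not a well-formed 27-character InChIKey: %r" % key)
    connectivity, other_layers, kind, version, protonation = match.groups()
    return {
        "connectivity_hash": connectivity,  # hash of the connectivity information
        "other_layers_hash": other_layers,  # hash of the remaining layers
        "standard": kind == "S",            # S = standard, N = non-standard
        "version": version,                 # "A" for InChI version 1
        "protonation": protonation,         # "N" = no protons added or removed
    }


print(parse_inchikey("BQJCRHHNABKAKU-KBQPJGBKSA-N"))
```

Because the blocks are hashes, this only checks the shape of the key; recovering the structure still requires a lookup service such as a database search.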
Proliferation of InChI
Search by InChI
ChemSpider Google Search
http://www.chemspider.com/google/
What’s the catch?

 InChI has limitations
 InChI is ideal for
    Simple
    Static
    Well-defined graphs
 Real chemical substances can only be
  approximated by such graphs
Limitations
 Non-trivial stereo (e.g. axial, planar)
 Non-trivial tautomers (e.g. ring-chain)
 Mixtures – full stereo is rarely known
 Polymers
 Markush structures
 Organometallics
 Inorganics
 Materials
 Reactions
 Etc
Chemical data complexity
Work in progress
   InChI Extensions: Under the guidance of IUPAC, several sub-teams are now
    working on expanding InChI to new areas of chemical representation:

      Reaction InChI (RInChI): the reaction working group has completed its
       recommendations, and work is ready to begin.

      Polymers/Mixtures: The polymers/mixtures working group also has
       submitted its recommendations, and work to incorporate the new
       representations should begin once version 1.04 is released.

      Markush: This project is the most complex undertaken to date. The initial
       recommendations have been submitted, but financing of the work still
       needs to be sorted out.

   But what do we do NOW???
Deposition Process
 Data
 Validation
 Standardization
 Filtering
 Componentization
 Deduplication
 Mapping
 Non-redundant data
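The deduplication and mapping stages can be sketched as follows. This is only an illustration under stated assumptions, not the ChemSpider implementation: each deposited record is reduced to a hypothetical `(source_id, inchikey)` pair, and records sharing an InChIKey are treated as the same compound.

```python
from collections import defaultdict


def deduplicate(records):
    """Group deposited records by InChIKey.

    records: iterable of (source_id, inchikey) pairs (hypothetical shape).
    Returns a dict mapping each distinct InChIKey to the list of
    source IDs that refer to it, in deposition order.
    """
    mapping = defaultdict(list)
    for source_id, inchikey in records:
        mapping[inchikey].append(source_id)
    return dict(mapping)


deposited = [
    ("vendor-1", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),  # ethanol
    ("vendor-2", "XLYOFNOQVPJJNP-UHFFFAOYSA-N"),  # water
    ("vendor-3", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),  # ethanol again
]
unique = deduplicate(deposited)
print(len(unique), "distinct compounds from", len(deposited), "records")
```

The quality of such a merge depends entirely on the earlier validation and standardization steps, since two drawings of the same substance only collapse to one key after they have been normalized.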
ChemSpider Data Model
Organometallics
Mixtures or unknown stereo
Accelrys Enhanced Stereo
MOL V3000
Enhanced stereo and InChI…
 Unfortunately not supported
 Is it important?
 Now real-world examples…
FDA Substance Registration System
Stoichiometric and non-stoichiometric mixtures



[Figures: example substances drawn as sets of moieties: a substance with Moiety 1 and Moiety 2; a substance with Moieties 1 to 4; a substance with Moiety 1 and an undefined Moiety 2; a substance with Moiety 1 (A) and Moiety 2 (B)]
D-glucose
SRS standardization approach
   Substance description
   Standardization module
   Moieties generator
   Normalization
   InChI[Key] generator


 Hash function f(InChIKeys, moieties)


 Unique ID
 Standard description
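One way to picture the hash step is the sketch below. It is a minimal illustration, not the SRS function: it assumes a substance is described only by the InChIKeys of its moieties, sorts them so the result does not depend on input order, and hashes the joined string with SHA-256. The name `substance_id` and the `|` separator are assumptions made for this example.

```python
import hashlib


def substance_id(moiety_inchikeys):
    """Derive an order-independent identifier from moiety InChIKeys.

    Repeated keys are kept, so two copies of a moiety give a
    different identifier than one copy.
    """
    canonical = "|".join(sorted(moiety_inchikeys))
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()


ethanol = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
water = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"

# The same mixture listed in a different order yields the same ID.
print(substance_id([ethanol, water]) == substance_id([water, ethanol]))
```

A real scheme would also need to encode whatever the moieties carry beyond structure (for example amounts or undefined components), which this sketch leaves out.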
SRS TBD
 Markush

 Polymers

 Proteins

 Inorganics

 Materials
OpenPHACTS
 Open PHACTS is a 3-year Innovative Medicines Initiative (IMI) project

 To reduce the barriers to drug discovery in industry,
  academia and for small businesses

 To build an open platform, integrating chemistry and
  biology data from public domain resources

 Semantic web platform

 Open Standards, Open Data and Open Source
OpenPHACTS specifics
 Active/inactive ingredient

 Parent/child

 Sample/substance

 Misreferences (!!!)
ChemSpider Reactions
ChemSpider Reaction Challenges
 Deduplication

 Identification

 Deposition
Conclusions
 InChI is The Identifier

 InChI has its limitations

 InChI is a work in progress

 InChI deficiencies can be hot-fixed
Acknowledgements
 RSC Cheminformatics group

 FDA SRS group

 OpenPHACTS consortium

 Software: InChI, GGA Software
Thank you

Email: tkachenkov@rsc.org
Blog: www.chemspider.com/blog
SLIDES:
http://www.slideshare.net/valerytkachenko16

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

How can the international chemical identifier (InChI) be extended to non …

  • 1. How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals? V. Tkachenko, A.J. Williams, Y. Borodina, F. Switzer, T. Peryea, L. Callahan. ACS Philly, August 2012
  • 3. InChI Examples
      - ethanol (CH3CH2OH): InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3
      - L-ascorbic acid: InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1
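The layered structure of the two examples above can be made visible by splitting each string on its `/` separators. The following is only a minimal sketch: the function name `split_inchi_layers` is hypothetical, it assumes the string has a formula layer, and it assumes each one-letter layer prefix occurs at most once (true for the standard InChIs shown here, not for every non-standard InChI).

```python
def split_inchi_layers(inchi: str) -> dict:
    """Split an InChI string into version, formula, and prefixed layers.

    Simplification: assumes a formula layer is present and that each
    one-letter layer prefix (c, h, t, m, s, ...) appears only once.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    if len(parts) < 2:
        raise ValueError("InChI has no formula layer")
    layers = {}
    for segment in parts[2:]:
        if segment:
            # The first character names the layer; the rest is its content.
            layers[segment[0]] = segment[1:]
    return {"version": parts[0], "formula": parts[1], "layers": layers}


ethanol = split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol["version"], ethanol["formula"], ethanol["layers"])
```

For ethanol this yields the version `1S`, the formula `C2H6O`, a connectivity layer `c` and a hydrogen layer `h`; the L-ascorbic acid string additionally carries the stereo layers `t`, `m` and `s`.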
  • 5. InChIKey
      - The condensed, 27-character standard InChIKey is a hashed version of the full standard InChI (using the SHA-256 algorithm)
      - Designed to allow for easy web searches of chemical compounds
      - InChIKeys consist of
          - 14 characters resulting from a hash of the connectivity information of the InChI
          - followed by a hyphen and 8 characters resulting from a hash of the remaining layers of the InChI
          - followed by a flag character (S for standard InChI) and a single character indicating the version of InChI used
          - followed by a hyphen and a single character indicating the protonation state (N for neutral)
      - Example (InChI): InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1
      - Example (InChIKey): BQJCRHHNABKAKU-KBQPJGBKSA-N
      - Unlike an InChI, an InChIKey can be converted back to a connection table (CT) only by lookup
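The block layout of a standard InChIKey can be checked mechanically. The sketch below assumes the 27-character layout of 14 hash characters, a hyphen, 8 hash characters, a standard/non-standard flag, a version character, a hyphen, and a protonation character; the function names `split_inchikey` and `same_skeleton` are hypothetical, and the key used is the one from the slide.

```python
import re

# 14 letters, hyphen, 8 letters + flag + version, hyphen, protonation letter.
_KEY_PATTERN = re.compile(r"^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])$")


def split_inchikey(key: str) -> dict:
    """Split a 27-character InChIKey into its blocks (no hash is reversed)."""
    match = _KEY_PATTERN.match(key)
    if match is None:
        raise ValueError("not a 27-character InChIKey")
    connectivity, remaining, flag, version, protonation = match.groups()
    return {
        "connectivity_hash": connectivity,  # hash of the connectivity layers
        "remaining_hash": remaining,        # hash of the other layers
        "flag": flag,                       # 'S' marks a standard InChI
        "version": version,
        "protonation": protonation,         # 'N' marks a neutral species
    }


def same_skeleton(key_a: str, key_b: str) -> bool:
    """True when two keys share the first (connectivity) block."""
    return (split_inchikey(key_a)["connectivity_hash"]
            == split_inchikey(key_b)["connectivity_hash"])


print(split_inchikey("BQJCRHHNABKAKU-KBQPJGBKSA-N"))
```

Comparing only the first block is a common way to find records that share a skeleton but differ in stereo or isotopic layers. Because the blocks are hashes, none of this recovers the structure itself, which is why a lookup is still needed.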
  • 9. What’s the catch?
      - InChI has limitations
      - InChI is ideal for simple, static, well-defined graphs
      - Real chemical substances can only be approximated by such graphs
  • 10. Limitations
      - Non-trivial stereo (e.g. axial, planar)
      - Non-trivial tautomers (e.g. ring-chain)
      - Mixtures: full stereo is rarely known
      - Polymers
      - Markush structures
      - Organometallics
      - Inorganics
      - Materials
      - Reactions
      - Etc.
  • 12. Work in progress
      - InChI Extensions: under the guidance of IUPAC, several sub-teams are now working on expanding InChI to new areas of chemical representation:
          - Reaction InChI (RInChI): the reaction working group has completed its recommendations, and work is ready to begin.
          - Polymers/Mixtures: the polymers/mixtures working group has also submitted its recommendations, and work to incorporate the new representations should begin once version 1.04 is released.
          - Markush: this project is the most complex undertaken to date. The initial recommendations have been submitted, but financing of the work still needs to be sorted out.
      - But what do we do NOW?
  • 13. Deposition process (diagram labels): Data; Validation; Standardization; Filtering; Componentization; Deduplication; Mapping; Non-redundant data
  • 19. Enhanced stereo and InChI…
      - Unfortunately not supported
      - Is it important?
      - Now real-world examples…
  • 21. Stoichiometric and non-stoichiometric mixtures [structure diagrams: Substance; Moiety 1; Moiety 2]
  • 22. [structure diagrams: Substance; Moiety 1; Moiety 2; Moiety 3; Moiety 4]
  • 23. [structure diagrams: Substance; Moiety 1; Moiety 2 (undefined)]
  • 24. [structure diagrams: Substance; Moiety 1 (A); Moiety 2 (B)]
  • 26. SRS standardization approach
      - Substance description
      - Standardization module
          - Moieties generator
          - Normalization
          - InChI[Key] generator
          - Hash function f(InChIKeys, moieties)
      - Unique ID
      - Standard description
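The last step of the pipeline above, a hash function f(InChIKeys, moieties) that yields a unique ID, can be sketched as follows. The slide does not say how f is defined, so everything here is an assumption made for illustration: the function name `substance_id`, the `key:amount` record format, the use of SHA-256, and the truncation to 16 hex characters. It is not the SRS implementation.

```python
import hashlib


def substance_id(moieties) -> str:
    """Derive an order-independent ID from (InChIKey, amount) pairs.

    `amount` is a string such as "1" or "2", or "" when the
    stoichiometry of a moiety is undefined (hypothetical convention).
    """
    # Sorting makes the ID independent of the order in which moieties are listed.
    records = sorted(f"{key}:{amount}" for key, amount in moieties)
    digest = hashlib.sha256("|".join(records).encode("utf-8")).hexdigest()
    return digest[:16]  # truncation length is an arbitrary choice for this sketch


# Hypothetical substance: ethanol and water in a 1:1 ratio.
ethanol = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
water = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"
print(substance_id([(ethanol, "1"), (water, "1")]))
```

Including the amount in each record lets a 1:1 and a 1:2 mixture of the same moieties receive different IDs, and an empty amount keeps an undefined stoichiometry distinct from a defined one.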
  • 27. SRS TBD
      - Markush
      - Polymers
      - Proteins
      - Inorganics
      - Materials
  • 28. OpenPHACTS
      - Open PHACTS is a 3-year Innovative Medicines Initiative (IMI) project
      - To reduce the barriers to drug discovery in industry, academia and for small businesses
      - To build an open platform, integrating chemistry and biology data from public domain resources
      - Semantic web platform
      - Open Standards, Open Data and Open Source
  • 31. OpenPHACTS specifics
      - Active/inactive ingredient
      - Parent/child
      - Sample/substance
      - Misreferences (!!!)
  • 34. ChemSpider Reaction Challenges
      - Deduplication
      - Identification
      - Deposition
  • 35. Conclusions
      - InChI is The Identifier
      - InChI has its limitations
      - InChI is a work in progress
      - InChI deficiencies can be hot-fixed
  • 36. Acknowledgements
      - RSC Cheminformatics group
      - FDA SRS group
      - OpenPHACTS consortium
      - Software: InChI, GGA Software
  • 37. Thank you
      - Email: tkachenkov@rsc.org
      - Blog: www.chemspider.com/blog
      - Slides: http://www.slideshare.net/valerytkachenko16