Capturing Chemistry in XML/CML. J. A. Townsend*, S. E. Adams*, J. M. Goodman*, P. Murray-Rust*, C. A. Waudby*. ACS March 2004. * Unilever Centre for Molecular Informatics, University of Cambridge
The Agony Of Publication - Loss: The World; Sad; The Scientist; The Lab; Journals; Web Pages
The Vision-1: Human-readable: mp 65-66 °C. Machine-readable: <scalar dictRef="ccml:mp" units="units:c" minValue="65" maxValue="66" />
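As a minimal sketch of the machine-readable form on this slide, the snippet below builds the same `<scalar>` element with Python's standard library. The attribute names and values are taken from the slide; the function name `melting_range_to_scalar` is hypothetical and is not part of any CML toolkit.

```python
import xml.etree.ElementTree as ET


def melting_range_to_scalar(min_value, max_value):
    """Build a <scalar> element shaped like the one shown on the slide.

    Hypothetical helper for illustration only; dictRef and units are
    copied from the slide and are not validated against any schema.
    """
    return ET.Element(
        "scalar",
        {
            "dictRef": "ccml:mp",
            "units": "units:c",
            "minValue": str(min_value),
            "maxValue": str(max_value),
        },
    )


element = melting_range_to_scalar(65, 66)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

Keeping the range as two attributes rather than as the text "65-66" means a program can compare or convert the values without re-parsing the string.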
The Vision-2: But also
Our Approach
Machine Parsing of Chemistry: Structured (CompChem); Semi-Structured (Articles); Unstructured (Discussion). MACHINE PARSING? Structured documents and data in XML
How? Article (semi-structured): Abstract, Discussion, Experimental. Add Structure. Parse with Regular Expressions. Legacy to CML converters
Regular Expressions: Melting point: two possible syntaxes: "m.p. > 23.5 °C" and "mp 23.5 – 25 °C". Pattern annotations: Capital or lowercase 'm'; Maybe '.'; Lowercase 'p'; Maybe whitespace; Any punctuation; 0 or more digits; Maybe degrees sign; Capital 'C'
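The two melting-point syntaxes on this slide can be matched with a pattern along the following lines. This is a sketch under stated assumptions, not the expression used in OSCAR: the actual pattern is not preserved in the transcript, and the names `MP_PATTERN` and `parse_melting_point` are hypothetical.

```python
import re

# Assumed pattern, reconstructed from the slide's annotations only.
MP_PATTERN = re.compile(
    r"""
    \b[Mm]\.?p\.?                 # 'm' or 'M', maybe '.', lowercase 'p', maybe '.'
    \s*                           # maybe whitespace
    (?P<qualifier>[<>])?          # optional '>' or '<', as in "m.p. > 23.5 °C"
    \s*
    (?P<low>\d+(?:\.\d+)?)        # first (or only) temperature
    (?:\s*[-–]\s*(?P<high>\d+(?:\.\d+)?))?   # optional upper bound of a range
    \s*°?\s*C                     # maybe degrees sign, then capital 'C'
    """,
    re.VERBOSE,
)


def parse_melting_point(text):
    """Return qualifier, low and high values, or None if nothing matches."""
    match = MP_PATTERN.search(text)
    if match is None:
        return None
    high = match.group("high")
    return {
        "qualifier": match.group("qualifier"),
        "low": float(match.group("low")),
        "high": float(high) if high is not None else None,
    }


print(parse_melting_point("m.p. > 23.5 °C"))
print(parse_melting_point("mp 23.5 – 25 °C"))
```

The leading `\b` keeps the pattern from firing on words that merely contain "mp", such as "temp". A production pattern would need to cover more variants (for example "Mp", "dec.", or Fahrenheit) than this sketch does.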
CML - XML For Chemistry. J. Chem. Inf. Comp. Sci., 2003, 43, 757
The CML Family: Controlled XML Namespaces: CMLCore – compounds and properties; CMLReact – reactions; CMLSpect – spectra; *CMLComp – compChem; CMLCryst – crystallography and condensed matter. Interoperates with HTML, MathML, SVG, *AniML+, *ThermoML$, etc. + spectra: ANSI/JCAMP. $ thermochemistry: NIST. J. Chem. Inf. Comp. Sci., 2003, 43, 757
Case Studies: Parsing output from 750,000 MOPAC jobs; High-throughput parsing of journals
CompChem Logs: Coordinates; Molecular Formula; Calculation Type; Point Group; Dipole; Total Energy
Loss From CompChem: Coordinates; Molecular Formula; Calculation Type; Dipole; Total Energy; Ionisation Potential
Parsing Data. CompChem Output: Coordinates, Energy Levels, Vibrations. General Parsers. CML File: Input/jobControl, Coordinates, Energy Level, Vibration (CMLCore, CMLCore, CMLComp, CMLSpect)
Display Process 1: CompChem Log; Xindice; CML; XSLT
Display Process 2. CML File: Input/jobControl, Coordinates, Energy Levels, Vibrations (CMLCore, CMLCore, CMLComp, CMLSpect). XSLT Display of compChem Output: 3D structure, electronic properties; Normal modes; 2D structure, thermodynamic properties
Parsing Data. Dictionary Entry: The pointgroup of a molecule ... The Schoenflies convention is normally used, but Hermann-Mauguin is also allowed. Units entry: D [debye]; ParentSI: c.m; Multiplier: 3.335641E-30; CGS units for electric dipole
Dictionaries: <scalar dictRef="ccml:mp" units="units:c" minValue="65" maxValue="66" /> Linked to CML schema. Accesses CCML namespace. Units dictionary: id="celsius" name="Celsius" parentSI="k" multiplierToSI="1" constantToSI="273.15" abbreviation="C" unitType="temp". Dictionary entry: id="meltrange" term="Melting range" definition="Minimum and maximum values of melting range in degrees Celsius"
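The units dictionary entries can drive conversion to SI. The sketch below assumes the convention SI value = value × multiplierToSI + constantToSI, which the slides do not state explicitly; for the Celsius entry the multiplier is 1, so the order of operations does not affect the result. The `UnitEntry` class is hypothetical, and the debye values are those from the Parsing Data slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitEntry:
    """Hypothetical in-memory form of one units-dictionary entry."""

    id: str
    parent_si: str
    multiplier_to_si: float
    constant_to_si: float = 0.0

    def to_si(self, value):
        # Assumed convention: scale first, then add the constant offset.
        return value * self.multiplier_to_si + self.constant_to_si


# Values copied from the slides.
CELSIUS = UnitEntry("celsius", "k", 1.0, 273.15)
DEBYE = UnitEntry("debye", "c.m", 3.335641e-30)


def melting_range_in_kelvin(min_value, max_value):
    """Convert a Celsius melting range to kelvin using the dictionary entry."""
    return CELSIUS.to_si(min_value), CELSIUS.to_si(max_value)


low_k, high_k = melting_range_in_kelvin(65, 66)
print(low_k, high_k)
print(DEBYE.to_si(1.0))
```

Because the conversion factors live in the dictionary rather than in the parser, a new unit only needs a new entry, not new code.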
OSCAR: Open Source Chemistry Analysis Routines. Sponsored by the Royal Society of Chemistry (Cambridge). Mounted on http://www.rsc.org/
Article Structure. Article: Front Matter; Abstract; Introduction; Discussion; Experimental; References; Results
Article Structure. Article: Front Matter; Abstract; Introduction; Discussion; Experimental; References; Results. Experimental: Synthesis; Set up; Analysis; Compound Name
Information Checked / Extracted
OSCAR Parsing Data: H NMR; Nature; HRMS
OSCAR Parsing Data
OSCAR Data Found: Results from one paper
OSCAR Error Checking: Serious Error; Warning Type 1; Warning Type 2
OSCAR Error Checking: ~30 errors / warnings searched for. This article has: 4 errors; 2 warnings (type 1); 30 warnings (type 2). Elemental analysis, incorrect – calculations are for a different molecular formula
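A check of the kind flagged on this slide (elemental analysis calculated for the wrong molecular formula) could be sketched as follows. This is an illustration only: OSCAR's own checks are not shown in the slides, the function names are hypothetical, the atomic masses are rounded standard values for four elements, and the 0.4 percentage-point tolerance is an assumed threshold.

```python
import re

# Rounded standard atomic masses; only a few elements, for illustration.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no brackets)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def percent_composition(formula):
    """Return the calculated mass percentage of each element."""
    counts = parse_formula(formula)
    total = sum(ATOMIC_MASS[el] * n for el, n in counts.items())
    return {el: 100.0 * ATOMIC_MASS[el] * n / total for el, n in counts.items()}


def check_elemental_analysis(formula, reported, tolerance=0.4):
    """Flag, per element, whether a reported percentage fits the formula."""
    calculated = percent_composition(formula)
    return {
        el: abs(calculated[el] - pct) <= tolerance
        for el, pct in reported.items()
    }


reported = {"C": 40.00, "H": 6.71}
print(check_elemental_analysis("C6H12O6", reported))
print(check_elemental_analysis("C6H14O6", reported))
```

An element missing from `ATOMIC_MASS` raises a `KeyError` here; a real checker would need a full periodic table and support for bracketed or hydrated formulas.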
OSCAR Data Presentation
OSCAR Speed: A typical paper contains ca. 20 compounds. JOC (Feb 2004) contains ~600 compounds; OSCAR could extract and tabulate in under 5 minutes. OBC (Feb 2004) contains ~300 compounds; OSCAR could extract and tabulate in under 3 minutes. High throughput, high precision
OSCAR Accuracy: 92 % of Data Correctly Identified; 3 % incorrect author entry; 5 % missed. False-positives: 3 %. 437 items, ~10,000 data fields in test set, working with current Regular Expressions
XML-CML Databases: CML; Journals, Theses, CompChem; Xindice. XMLDb can support > 250,000 molecules. Millisecond retrieval on INChI, properties
Capturing Molecules: Encourage chemists to
NLP & Parsing Names. KEY: Locant; Characteristic Group; Monovalent parent hydride; Multiplier; Heterocyclic parent hydride
Thank You: Unilever; RSC; Jonathan Goodman; Sam Adams; Fraser Norton; Chris Waudby; Yong Zhang

More Related Content

What's hot

Blending ammonia in nitrogen: A facile synthesis strategy of nitrogen-doped c...
Blending ammonia in nitrogen: A facile synthesis strategy of nitrogen-doped c...Blending ammonia in nitrogen: A facile synthesis strategy of nitrogen-doped c...
Blending ammonia in nitrogen: A facile synthesis strategy of nitrogen-doped c...Tianyu Liu
 
Separacion de H2S
Separacion de H2SSeparacion de H2S
Separacion de H2Sluhurocu
 
Ag (5 8) bonacickoutecky2001
Ag (5 8) bonacickoutecky2001Ag (5 8) bonacickoutecky2001
Ag (5 8) bonacickoutecky2001hong-nguyen
 
Ccprice aps2020 interfacial_electromechanics_pvsk
Ccprice aps2020 interfacial_electromechanics_pvskCcprice aps2020 interfacial_electromechanics_pvsk
Ccprice aps2020 interfacial_electromechanics_pvskChris Price
 
Simultaneousnonlinear two dimensional modeling of tubular reactor of hydrogen...
Simultaneousnonlinear two dimensional modeling of tubular reactor of hydrogen...Simultaneousnonlinear two dimensional modeling of tubular reactor of hydrogen...
Simultaneousnonlinear two dimensional modeling of tubular reactor of hydrogen...Arash Nasiri
 
3D-Printing of Three-Dimensional Graphene Aerogels with Periodic Macropores f...
3D-Printing of Three-Dimensional Graphene Aerogels with Periodic Macropores f...3D-Printing of Three-Dimensional Graphene Aerogels with Periodic Macropores f...
3D-Printing of Three-Dimensional Graphene Aerogels with Periodic Macropores f...Tianyu Liu
 
General Concepts in QSAR for Using the QSAR Application Toolbox Part 2
General Concepts in QSAR for Using the QSAR Application Toolbox Part 2General Concepts in QSAR for Using the QSAR Application Toolbox Part 2
General Concepts in QSAR for Using the QSAR Application Toolbox Part 2International QSAR Foundation
 
DavidWooChemEResearchPosterv2
DavidWooChemEResearchPosterv2DavidWooChemEResearchPosterv2
DavidWooChemEResearchPosterv2David Woo
 
Quantitative Structure Activity Relationship (QSAR)
Quantitative Structure Activity Relationship (QSAR)Quantitative Structure Activity Relationship (QSAR)
Quantitative Structure Activity Relationship (QSAR)Theabhi.in
 
CHE 611 Presentation
CHE 611 PresentationCHE 611 Presentation
CHE 611 PresentationDhruv Jain
 
Bradley Open Notebook Science Georgia Tech OA week
Bradley Open Notebook Science Georgia Tech OA weekBradley Open Notebook Science Georgia Tech OA week
Bradley Open Notebook Science Georgia Tech OA weekJean-Claude Bradley
 
ASGC Presentation
ASGC PresentationASGC Presentation
ASGC PresentationKellenH
 
General Concepts in QSAR for Using the QSAR Application Toolbox Part 1
General Concepts in QSAR for Using the QSAR Application Toolbox Part 1General Concepts in QSAR for Using the QSAR Application Toolbox Part 1
General Concepts in QSAR for Using the QSAR Application Toolbox Part 1International QSAR Foundation
 
Introduction to OECD QSAR Toolbox
Introduction to OECD QSAR ToolboxIntroduction to OECD QSAR Toolbox
Introduction to OECD QSAR Toolboxguestcfca1eb1
 
The Value of Openness in Research and Teaching
The Value of Openness in Research and TeachingThe Value of Openness in Research and Teaching
The Value of Openness in Research and TeachingJean-Claude Bradley
 

What's hot (17)

Blending ammonia in nitrogen: A facile synthesis strategy of nitrogen-doped c...
Blending ammonia in nitrogen: A facile synthesis strategy of nitrogen-doped c...Blending ammonia in nitrogen: A facile synthesis strategy of nitrogen-doped c...
Blending ammonia in nitrogen: A facile synthesis strategy of nitrogen-doped c...
 
Separacion de H2S
Separacion de H2SSeparacion de H2S
Separacion de H2S
 
Ag (5 8) bonacickoutecky2001
Ag (5 8) bonacickoutecky2001Ag (5 8) bonacickoutecky2001
Ag (5 8) bonacickoutecky2001
 
Ccprice aps2020 interfacial_electromechanics_pvsk
Ccprice aps2020 interfacial_electromechanics_pvskCcprice aps2020 interfacial_electromechanics_pvsk
Ccprice aps2020 interfacial_electromechanics_pvsk
 
Simultaneousnonlinear two dimensional modeling of tubular reactor of hydrogen...
Simultaneousnonlinear two dimensional modeling of tubular reactor of hydrogen...Simultaneousnonlinear two dimensional modeling of tubular reactor of hydrogen...
Simultaneousnonlinear two dimensional modeling of tubular reactor of hydrogen...
 
3D-Printing of Three-Dimensional Graphene Aerogels with Periodic Macropores f...
3D-Printing of Three-Dimensional Graphene Aerogels with Periodic Macropores f...3D-Printing of Three-Dimensional Graphene Aerogels with Periodic Macropores f...
3D-Printing of Three-Dimensional Graphene Aerogels with Periodic Macropores f...
 
General Concepts in QSAR for Using the QSAR Application Toolbox Part 2
General Concepts in QSAR for Using the QSAR Application Toolbox Part 2General Concepts in QSAR for Using the QSAR Application Toolbox Part 2
General Concepts in QSAR for Using the QSAR Application Toolbox Part 2
 
DavidWooChemEResearchPosterv2
DavidWooChemEResearchPosterv2DavidWooChemEResearchPosterv2
DavidWooChemEResearchPosterv2
 
Quantitative Structure Activity Relationship (QSAR)
Quantitative Structure Activity Relationship (QSAR)Quantitative Structure Activity Relationship (QSAR)
Quantitative Structure Activity Relationship (QSAR)
 
Pb3426462649
Pb3426462649Pb3426462649
Pb3426462649
 
CHE 611 Presentation
CHE 611 PresentationCHE 611 Presentation
CHE 611 Presentation
 
Bradley Open Notebook Science Georgia Tech OA week
Bradley Open Notebook Science Georgia Tech OA weekBradley Open Notebook Science Georgia Tech OA week
Bradley Open Notebook Science Georgia Tech OA week
 
ASGC Presentation
ASGC PresentationASGC Presentation
ASGC Presentation
 
General Concepts in QSAR for Using the QSAR Application Toolbox Part 1
General Concepts in QSAR for Using the QSAR Application Toolbox Part 1General Concepts in QSAR for Using the QSAR Application Toolbox Part 1
General Concepts in QSAR for Using the QSAR Application Toolbox Part 1
 
Computational chemistry
Computational chemistryComputational chemistry
Computational chemistry
 
Introduction to OECD QSAR Toolbox
Introduction to OECD QSAR ToolboxIntroduction to OECD QSAR Toolbox
Introduction to OECD QSAR Toolbox
 
The Value of Openness in Research and Teaching
The Value of Openness in Research and TeachingThe Value of Openness in Research and Teaching
The Value of Openness in Research and Teaching
 

Viewers also liked

NP nomenament Rosa Mª Lleal a la delegació del Vallès Oriental
NP nomenament Rosa Mª Lleal a la delegació del Vallès OrientalNP nomenament Rosa Mª Lleal a la delegació del Vallès Oriental
NP nomenament Rosa Mª Lleal a la delegació del Vallès OrientalCambra de Comerç de Barcelona
 
ALEKS CAN KAPTAÇ CV
ALEKS CAN KAPTAÇ CVALEKS CAN KAPTAÇ CV
ALEKS CAN KAPTAÇ CVAlex Ck
 
80 bai tap khao sat ham so trong de thi dai hoc và cao dang
80 bai tap khao sat ham so trong de thi dai hoc và cao dang80 bai tap khao sat ham so trong de thi dai hoc và cao dang
80 bai tap khao sat ham so trong de thi dai hoc và cao dangHoàng Thái Việt
 
Tổng hợp lý thuyết và bài tập cơ bản nâng cao hóa học 11
Tổng hợp lý thuyết và bài tập cơ bản nâng cao hóa học 11Tổng hợp lý thuyết và bài tập cơ bản nâng cao hóa học 11
Tổng hợp lý thuyết và bài tập cơ bản nâng cao hóa học 11Hoàng Thái Việt
 
Kütüphanecilik ve arşivcilik
Kütüphanecilik ve arşivcilikKütüphanecilik ve arşivcilik
Kütüphanecilik ve arşivcilikbozokkutuphane
 
Açık erişim kaynakları
Açık erişim kaynaklarıAçık erişim kaynakları
Açık erişim kaynaklarıbozokkutuphane
 
Marketing & Storytelling with Testimonials
Marketing & Storytelling with TestimonialsMarketing & Storytelling with Testimonials
Marketing & Storytelling with TestimonialsLaura Monroe
 
6 Months of WebRTC in 10 minutes
6 Months of WebRTC in 10 minutes6 Months of WebRTC in 10 minutes
6 Months of WebRTC in 10 minutesChad Hart
 
SUSHIL_KUMAR_PANDEY
SUSHIL_KUMAR_PANDEYSUSHIL_KUMAR_PANDEY
SUSHIL_KUMAR_PANDEYrishu sushil
 
SE_Lec 00_ Software Engineering 1
SE_Lec 00_ Software Engineering 1SE_Lec 00_ Software Engineering 1
SE_Lec 00_ Software Engineering 1Amr E. Mohamed
 
Business communication in banking sector
Business communication in banking sectorBusiness communication in banking sector
Business communication in banking sectorLahore
 

Viewers also liked (15)

NP nomenament Rosa Mª Lleal a la delegació del Vallès Oriental
NP nomenament Rosa Mª Lleal a la delegació del Vallès OrientalNP nomenament Rosa Mª Lleal a la delegació del Vallès Oriental
NP nomenament Rosa Mª Lleal a la delegació del Vallès Oriental
 
CV de Diana Álvarez
CV de Diana ÁlvarezCV de Diana Álvarez
CV de Diana Álvarez
 
ALEKS CAN KAPTAÇ CV
ALEKS CAN KAPTAÇ CVALEKS CAN KAPTAÇ CV
ALEKS CAN KAPTAÇ CV
 
Quijotadas Iii
Quijotadas IiiQuijotadas Iii
Quijotadas Iii
 
80 bai tap khao sat ham so trong de thi dai hoc và cao dang
80 bai tap khao sat ham so trong de thi dai hoc và cao dang80 bai tap khao sat ham so trong de thi dai hoc và cao dang
80 bai tap khao sat ham so trong de thi dai hoc và cao dang
 
Tổng hợp lý thuyết và bài tập cơ bản nâng cao hóa học 11
Tổng hợp lý thuyết và bài tập cơ bản nâng cao hóa học 11Tổng hợp lý thuyết và bài tập cơ bản nâng cao hóa học 11
Tổng hợp lý thuyết và bài tập cơ bản nâng cao hóa học 11
 
Kütüphanecilik ve arşivcilik
Kütüphanecilik ve arşivcilikKütüphanecilik ve arşivcilik
Kütüphanecilik ve arşivcilik
 
Açık erişim kaynakları
Açık erişim kaynaklarıAçık erişim kaynakları
Açık erişim kaynakları
 
Marketing & Storytelling with Testimonials
Marketing & Storytelling with TestimonialsMarketing & Storytelling with Testimonials
Marketing & Storytelling with Testimonials
 
6 Months of WebRTC in 10 minutes
6 Months of WebRTC in 10 minutes6 Months of WebRTC in 10 minutes
6 Months of WebRTC in 10 minutes
 
SUSHIL_KUMAR_PANDEY
SUSHIL_KUMAR_PANDEYSUSHIL_KUMAR_PANDEY
SUSHIL_KUMAR_PANDEY
 
Merchandising Accounting
Merchandising AccountingMerchandising Accounting
Merchandising Accounting
 
SE_Lec 00_ Software Engineering 1
SE_Lec 00_ Software Engineering 1SE_Lec 00_ Software Engineering 1
SE_Lec 00_ Software Engineering 1
 
Business communication in banking sector
Business communication in banking sectorBusiness communication in banking sector
Business communication in banking sector
 
5 behavior theory 15092558 (1)
5 behavior theory 15092558 (1)5 behavior theory 15092558 (1)
5 behavior theory 15092558 (1)
 

Similar to Capturing Chemistry In XML

Quantum pharmacology. Basics
Quantum pharmacology. BasicsQuantum pharmacology. Basics
Quantum pharmacology. BasicsMobiliuz
 
Cheminformatics II
Cheminformatics IICheminformatics II
Cheminformatics IIbaoilleach
 
Computational Organic Chemistry
Computational Organic ChemistryComputational Organic Chemistry
Computational Organic ChemistryIsamu Katsuyama
 
AWMA Presentation Application of Two State-of-the-art Dispersion Models
AWMA Presentation Application of Two State-of-the-art Dispersion ModelsAWMA Presentation Application of Two State-of-the-art Dispersion Models
AWMA Presentation Application of Two State-of-the-art Dispersion Modelsmtingle
 
Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Kamel Mansouri
 
How to use data to design and optimize reaction? A quick introduction to work...
How to use data to design and optimize reaction? A quick introduction to work...How to use data to design and optimize reaction? A quick introduction to work...
How to use data to design and optimize reaction? A quick introduction to work...Ichigaku Takigawa
 
Molecular Simulation to build models for enzyme induced fit
Molecular Simulation to build models for enzyme induced fit Molecular Simulation to build models for enzyme induced fit
Molecular Simulation to build models for enzyme induced fit MinSung Kim
 
Energy Minimization Using Gromacs
Energy Minimization Using GromacsEnergy Minimization Using Gromacs
Energy Minimization Using GromacsRajendra K Labala
 
Cheminformatics, concept by kk sahu sir
Cheminformatics, concept by kk sahu sirCheminformatics, concept by kk sahu sir
Cheminformatics, concept by kk sahu sirKAUSHAL SAHU
 
Lecture_No._2_Computational_Chemistry_Tools___Application_of_computational_me...
Lecture_No._2_Computational_Chemistry_Tools___Application_of_computational_me...Lecture_No._2_Computational_Chemistry_Tools___Application_of_computational_me...
Lecture_No._2_Computational_Chemistry_Tools___Application_of_computational_me...ManavBhugun3
 
Parameterization of force field
Parameterization of force fieldParameterization of force field
Parameterization of force fieldJose Luis
 
Machine Learning in Chemistry: Part I
Machine Learning in Chemistry: Part IMachine Learning in Chemistry: Part I
Machine Learning in Chemistry: Part IJon Paul Janet
 
molecular mechanics and quantum mechnics
molecular mechanics and quantum mechnicsmolecular mechanics and quantum mechnics
molecular mechanics and quantum mechnicsRAKESH JAGTAP
 
Canonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformaticsCanonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformaticsJeremy Yang
 
Hydrogen fuel cells for the automotive system
Hydrogen fuel cells for the automotive systemHydrogen fuel cells for the automotive system
Hydrogen fuel cells for the automotive systemOmar Qasim
 

Similar to Capturing Chemistry In XML (20)

Poster_Jun 2014
Poster_Jun 2014Poster_Jun 2014
Poster_Jun 2014
 
Quantum pharmacology. Basics
Quantum pharmacology. BasicsQuantum pharmacology. Basics
Quantum pharmacology. Basics
 
Cheminformatics II
Cheminformatics IICheminformatics II
Cheminformatics II
 
Computational Organic Chemistry
Computational Organic ChemistryComputational Organic Chemistry
Computational Organic Chemistry
 
AWMA Presentation Application of Two State-of-the-art Dispersion Models
AWMA Presentation Application of Two State-of-the-art Dispersion ModelsAWMA Presentation Application of Two State-of-the-art Dispersion Models
AWMA Presentation Application of Two State-of-the-art Dispersion Models
 
Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider
Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpiderIdentification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider
Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider
 
Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...Free online access to experimental and predicted chemical properties through ...
Free online access to experimental and predicted chemical properties through ...
 
How to use data to design and optimize reaction? A quick introduction to work...
How to use data to design and optimize reaction? A quick introduction to work...How to use data to design and optimize reaction? A quick introduction to work...
How to use data to design and optimize reaction? A quick introduction to work...
 
Molecular Simulation to build models for enzyme induced fit
Molecular Simulation to build models for enzyme induced fit Molecular Simulation to build models for enzyme induced fit
Molecular Simulation to build models for enzyme induced fit
 
Energy Minimization Using Gromacs
Energy Minimization Using GromacsEnergy Minimization Using Gromacs
Energy Minimization Using Gromacs
 
Cheminformatics, concept by kk sahu sir
Cheminformatics, concept by kk sahu sirCheminformatics, concept by kk sahu sir
Cheminformatics, concept by kk sahu sir
 
Lecture_No._2_Computational_Chemistry_Tools___Application_of_computational_me...
Lecture_No._2_Computational_Chemistry_Tools___Application_of_computational_me...Lecture_No._2_Computational_Chemistry_Tools___Application_of_computational_me...
Lecture_No._2_Computational_Chemistry_Tools___Application_of_computational_me...
 
A01 9-1
A01 9-1A01 9-1
A01 9-1
 
Parameterization of force field
Parameterization of force fieldParameterization of force field
Parameterization of force field
 
Machine Learning in Chemistry: Part I
Machine Learning in Chemistry: Part IMachine Learning in Chemistry: Part I
Machine Learning in Chemistry: Part I
 
molecular mechanics and quantum mechnics
molecular mechanics and quantum mechnicsmolecular mechanics and quantum mechnics
molecular mechanics and quantum mechnics
 
The importance of standards for data exchange and interchange on the Royal So...
The importance of standards for data exchange and interchange on the Royal So...The importance of standards for data exchange and interchange on the Royal So...
The importance of standards for data exchange and interchange on the Royal So...
 
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
 
Canonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformaticsCanonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformatics
 
Hydrogen fuel cells for the automotive system
Hydrogen fuel cells for the automotive systemHydrogen fuel cells for the automotive system
Hydrogen fuel cells for the automotive system
 

Recently uploaded

Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 


Capturing Chemistry In XML

  • 1. Capturing Chemistry in XML/CML. J. A. Townsend*, S. E. Adams*, J. M. Goodman*, P. Murray-Rust*, C. A. Waudby*. ACS, March 2004. *Unilever Centre for Molecular Informatics, University of Cambridge
  • 2. The Agony Of Publication - Loss: The World
  • 3. The Agony Of Publication - Loss: The World; Sad; The Scientist; The Lab; Journals; Web Pages
  • 4. The Vision-1. Human-readable: mp 65-66 °C. Machine-readable: <scalar dictRef="ccml:mp" units="units:c" minValue="65" maxValue="66" />
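As a minimal sketch of the machine-readable side of this slide, the melting range can be serialised into the same `scalar` element with Python's standard library. The function name `melting_range_to_scalar` is hypothetical, and the sketch assumes no XML namespace is declared, which a real CML document would normally need.

```python
import xml.etree.ElementTree as ET


def melting_range_to_scalar(min_c: float, max_c: float) -> str:
    """Serialise a melting range (degrees Celsius) as a CML-style <scalar>.

    Attribute names and values follow the slide; the CML namespace
    declaration is deliberately omitted in this sketch.
    """
    scalar = ET.Element(
        "scalar",
        {
            "dictRef": "ccml:mp",    # dictionary entry for melting point
            "units": "units:c",      # units dictionary entry for Celsius
            "minValue": f"{min_c:g}",
            "maxValue": f"{max_c:g}",
        },
    )
    return ET.tostring(scalar, encoding="unicode")


xml_text = melting_range_to_scalar(65, 66)
print(xml_text)
```

The human-readable string "mp 65-66 °C" and this element carry the same information; only the element can be validated and queried by software.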
  • 5. The Vision-2
  • 6. Our Approach
  • 7. Machine Parsing of Chemistry: Structured (CompChem); Semi-Structured (Articles); Unstructured (Discussion). Machine parsing? Structured documents and data in XML
  • 8. How? Article (semi-structured): Abstract, Discussion, Experimental. Add Structure. Parse with Regular Expressions. Legacy to CML converters
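  • 9. Regular Expressions. Melting point: two possible syntaxes, "m.p. > 23.5 °C" and "mp 23.5 – 25 °C". Pattern annotations: capital or lowercase 'm'; maybe '.'; lowercase 'p'; any punctuation; maybe whitespace; 0 or more digits; maybe degrees sign; capital 'C'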
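The regular-expression parsing step can be sketched as follows for the two melting-point syntaxes in the deck, "m.p. > 23.5 °C" and "mp 23.5 – 25 °C". The pattern and the function `parse_melting_point` are illustrative assumptions written for this example; they are not the expressions used by the authors.

```python
import re

# Illustrative pattern only: tolerant of 'mp' / 'm.p.', an optional
# qualifier such as '>', a single value or a range, and an optional
# degree sign before the capital 'C'.
MP_PATTERN = re.compile(
    r"""
    \b[Mm]\.?\s*p\.?                          # 'mp', 'm.p.', 'Mp', ...
    \s*(?P<qualifier>[<>~])?\s*               # optional qualifier
    (?P<low>\d+(?:\.\d+)?)                    # first (or only) temperature
    (?:\s*[-–]\s*(?P<high>\d+(?:\.\d+)?))?    # optional upper bound
    \s*°?\s*C\b                               # optional degree sign, then 'C'
    """,
    re.VERBOSE,
)


def parse_melting_point(text):
    """Return (qualifier, low, high) or None if no melting point is found.

    `high` is None when only one temperature is reported.
    """
    match = MP_PATTERN.search(text)
    if match is None:
        return None
    high = match.group("high")
    return (
        match.group("qualifier"),
        float(match.group("low")),
        float(high) if high is not None else None,
    )


for sample in ("m.p. > 23.5 °C", "mp 23.5 – 25 °C"):
    print(sample, "->", parse_melting_point(sample))
```

Keeping the qualifier separate from the numbers avoids silently turning a lower bound such as "> 23.5" into an exact value.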
  • 10. CML - XML For Chemistry. J. Chem. Inf. Comp. Sci., 2003, 43, 757
  • 11. The CML Family. Controlled XML Namespaces: CMLCore – compounds and properties; CMLReact – reactions; CMLSpect – spectra; *CMLComp – compChem; CMLCryst – crystallography and condensed matter. Interoperates with HTML, MathML, SVG, *AniML+, *ThermoML$, etc. (+ spectra: ANSI/JCAMP; $ thermochemistry: NIST). J. Chem. Inf. Comp. Sci., 2003, 43, 757
  • 12. Case Studies: Parsing output from 750,000 MOPAC jobs; High-throughput parsing of journals
  • 13. CompChem Logs: Coordinates, Molecular Formula, Calculation Type, Point Group, Dipole, Total Energy
  • 14. Loss From CompChem: Coordinates, Molecular Formula, Calculation Type, Dipole, Total Energy, Ionisation Potential
  • 15. Loss From CompChem: Coordinates, Molecular Formula, Calculation Type, Dipole, Total Energy, Ionisation Potential
  • 16. Parsing Data. CompChem Output: Coordinates, Energy Levels, Vibrations, Input/jobControl. General Parsers. CML File: CMLCore, CMLCore, CMLComp, CMLSpect
  • 17. Display Process 1: CompChem Log, Xindice, CML, XSLT
  • 18. Display Process 2. CompChem Output: Coordinates, Energy Levels, Vibrations, Input/jobControl. CML File: CMLCore, CMLCore, CMLComp, CMLSpect. XSLT Display: 3D structure, electronic properties; Normal modes; 2D structure, thermodynamic properties
  • 19. Parsing Data. Dictionary Entry: The point group of a molecule ... The Schoenflies convention is normally used, but Hermann-Mauguin is also allowed. D [debye]: ParentSI: c.m; Multiplier: 3.335641E-30; CGS units for electric dipole
  • 20. Dictionaries. <scalar dictRef="ccml:mp" units="units:c" minValue="65" maxValue="66" />. Linked to CML schema. Accesses CCML namespace. Units dictionary: id="celsius" name="Celsius" parentSI="k" multiplierToSI="1" constantToSI="273.15" abbreviation="C" unitType="temp". id="meltrange" term="Melting range" definition="Minimum and maximum values of melting range in degrees Celsius"
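A short sketch of how such units-dictionary entries might drive conversion to SI. It assumes the conversion rule is `value * multiplierToSI + constantToSI`, which the slides do not state explicitly; the `UnitEntry` class and `UNITS` table are hypothetical names, and the numbers are the ones shown for Celsius and debye.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitEntry:
    """One units-dictionary entry (field names adapted from the slide)."""

    id: str
    parent_si: str
    multiplier_to_si: float
    constant_to_si: float = 0.0

    def to_si(self, value: float) -> float:
        # Assumed rule: scale first, then add the offset.
        return value * self.multiplier_to_si + self.constant_to_si


UNITS = {
    "celsius": UnitEntry("celsius", "k", 1.0, 273.15),
    "debye": UnitEntry("debye", "c.m", 3.335641e-30),
}

# Convert the melting range 65-66 degrees Celsius to kelvin.
low_k = UNITS["celsius"].to_si(65)
high_k = UNITS["celsius"].to_si(66)
print(f"{low_k:.2f} K to {high_k:.2f} K")
```

Storing the multiplier and constant in the dictionary, rather than in each parser, means a value tagged with `units="units:c"` can be converted without any unit-specific code.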
  • 21. OSCAR: Open Source Chemistry Analysis Routines. Sponsored by the Royal Society of Chemistry (Cambridge). Mounted on http://www.rsc.org/
  • 22. Article Structure. Article: Front Matter, Abstract, Introduction, Discussion, Experimental, References, Results
  • 23. Article Structure. Article: Front Matter, Abstract, Introduction, Discussion, Experimental, References, Results
  • 24. Article Structure. Article: Front Matter, Abstract, Introduction, Discussion, Experimental, References, Results
  • 25. Article Structure. Article: Front Matter, Abstract, Introduction, Discussion, Experimental, References, Results
  • 26. Article Structure. Article: Front Matter, Abstract, Introduction, Discussion, Experimental, References, Results
  • 27. Article Structure. Article: Front Matter, Abstract, Introduction, Discussion, Experimental, References, Results. Experimental: Synthesis, Set up, Analysis, Compound Name
  • 28.
  • 29. OSCAR Parsing Data: H NMR, Nature, HRMS
  • 30. OSCAR Parsing Data
  • 31. OSCAR Data Found: Results from one paper
  • 32. OSCAR Error Checking: Serious Error, Warning Type 1, Warning Type 2
  • 33. OSCAR Error Checking. ~30 errors / warnings searched for. This article has: 4 errors, 2 warnings (type 1), 30 warnings (type 2). Elemental analysis incorrect: calculations are for a different molecular formula
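The elemental-analysis error mentioned on this slide suggests a consistency check along the following lines: recompute the percentage composition from the stated molecular formula and compare it with the "calculated" values the authors report. This is a sketch under assumptions, not OSCAR's implementation; the function names, the 0.1 percentage-point tolerance, and the small atomic-mass table are all choices made for illustration.

```python
import re

# Standard atomic masses for a few common elements (illustrative subset;
# any other element raises KeyError in this sketch).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Parse a simple formula such as 'C7H6O2' into element counts."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts


def calculated_percentages(formula):
    """Mass percentage of each element in the formula."""
    counts = parse_formula(formula)
    total = sum(ATOMIC_MASS[el] * n for el, n in counts.items())
    return {el: 100.0 * ATOMIC_MASS[el] * n / total for el, n in counts.items()}


def check_elemental_analysis(formula, reported, tolerance=0.1):
    """Return elements whose reported 'calculated' value disagrees with
    the formula by more than `tolerance` percentage points."""
    calc = calculated_percentages(formula)
    return {
        el: round(calc[el], 2)
        for el, value in reported.items()
        if abs(calc[el] - value) > tolerance
    }


# Consistent entry for C7H6O2, then values that belong to C8H8O2.
print(check_elemental_analysis("C7H6O2", {"C": 68.85, "H": 4.95}))
print(check_elemental_analysis("C7H6O2", {"C": 70.57, "H": 5.92}))
```

The first call returns an empty result, while the second flags both elements, which is the pattern expected when the calculation was done for a different molecular formula.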
  • 34. OSCAR Data Presentation
  • 35. OSCAR Speed. A typical paper contains ca. 20 compounds. JOC (Feb 2004) contains ~600 compounds; OSCAR could extract and tabulate in under 5 minutes. OBC (Feb 2004) contains ~300 compounds; OSCAR could extract and tabulate in under 3 minutes. High throughput, high precision
  • 36. OSCAR Accuracy. 92% of data correctly identified; 3% incorrect author entry; 5% missed. False positives: 3%. Test set: 437 items, ~10,000 data fields, working with current Regular Expressions
  • 37. XML-CML Databases: CML; Journals; Theses; CompChem; Xindice. XMLDb can support > 250,000 molecules. Millisecond retrieval on INChI, properties
  • 38.
  • 39. NLP & Parsing Names. KEY: Locant; Characteristic Group; Monovalent parent hydride; Multiplier; Heterocyclic parent hydride
  • 40. Thank You: Unilever, RSC, Jonathan Goodman, Sam Adams, Fraser Norton, Chris Waudby, Yong Zhang