Christos Kannas
University of Cyprus
Department of Computer Science
Outline
• Introduction
• Related Work
• ESOL
• RDKit based Implementation
• Results
• Correlation Table & Chart
• Conclusion

3rd October, 2013

2nd RDKit UGM

Introduction
• Need to estimate the solubility of molecules in:
  • DMSO (SMILES: CS(=O)C), and
  • Water.
• Predictive models for DMSO and water solubility.

Related Work
• J. S. Delaney, “ESOL: Estimating Aqueous Solubility Directly from Molecular Structure,” Journal of Chemical Information and Modeling, vol. 44, no. 3, pp. 1000–1005, May 2004.

Related Work: ESOL
• ESOL – Estimated SOLubility
• Linear Regression Model
• 8 Molecular Properties (Initially)
• Preeminent method: the General Solubility Equation (GSE), which is based on logP and melting point (Tm)
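
For context, the GSE is commonly quoted in the form log Sw = 0.5 − 0.01 (Tm − 25) − logP, with Tm in °C. That form is not stated on the slides and is assumed here. The sketch below is a minimal illustration of why the method needs a measured melting point; the function name is hypothetical.

```python
def gse_log_sw(log_p: float, melting_point_c: float) -> float:
    """General Solubility Equation: log10 of the molar aqueous solubility.

    Assumes the commonly quoted form 0.5 - 0.01 * (Tm - 25) - logP, Tm in Celsius.
    """
    # For compounds that melt at or below 25 C the melting-point term is taken as zero.
    mp_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    return 0.5 - mp_term - log_p


# Hypothetical compound: logP = 2.0, melting point = 125 C
print(gse_log_sw(log_p=2.0, melting_point_c=125.0))
```

ESOL, by contrast, uses only properties that can be computed from the structure.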

ESOL: Molecular Properties (Initial) 1/3
• clogP – Daylight CLOGP v4.72
• MolWeight
• RotBonds – number of rotatable bonds, as defined by Daylight SMARTS patterns

ESOL: Molecular Properties (Initial) 2/3
• Aromatic Proportion (AromProp) – the proportion of heavy atoms in the molecule that are in an aromatic ring. Aromatic atoms are matched with the Daylight SMARTS [a].
• Non-Carbon Proportion – the proportion of heavy atoms in the molecule that are not carbon, matched with the Daylight SMARTS [!#6].
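
The two proportions above can be sketched with RDKit's SMARTS matching. This is a minimal illustration, not the presenter's code: the helper name `smarts_proportion` is hypothetical, and the molecule is assumed to be parsed from SMILES without explicit hydrogens, so every matched atom is a heavy atom.

```python
from rdkit import Chem

# Single-atom SMARTS patterns quoted on the slide
AROMATIC_ATOM = Chem.MolFromSmarts("[a]")
NON_CARBON_ATOM = Chem.MolFromSmarts("[!#6]")


def smarts_proportion(mol, pattern):
    """Fraction of heavy atoms in `mol` matched by a single-atom SMARTS pattern."""
    heavy = mol.GetNumHeavyAtoms()
    if heavy == 0:
        return 0.0
    return len(mol.GetSubstructMatches(pattern)) / heavy


mol = Chem.MolFromSmiles("Cc1ccccc1O")  # o-cresol: 7 carbons + 1 oxygen
arom_prop = smarts_proportion(mol, AROMATIC_ATOM)          # 6 aromatic atoms of 8
non_carbon_prop = smarts_proportion(mol, NON_CARBON_ATOM)  # 1 oxygen of 8
print(arom_prop, non_carbon_prop)
```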

ESOL: Molecular Properties (Initial) 3/3
• H-bond Donors
• H-bond Acceptors
• Polar Surface Area – Peter Ertl’s topological polar surface area (TPSA)
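
As a rough RDKit counterpart to these three properties, the sketch below uses the descriptor functions in `rdMolDescriptors`. RDKit's donor and acceptor definitions are assumed to be close to, but not necessarily identical with, the ones used in the ESOL paper.

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

mol = Chem.MolFromSmiles("CCO")  # ethanol

hbd = rdMolDescriptors.CalcNumHBD(mol)   # hydrogen-bond donors
hba = rdMolDescriptors.CalcNumHBA(mol)   # hydrogen-bond acceptors
tpsa = rdMolDescriptors.CalcTPSA(mol)    # Ertl-style topological polar surface area

print(hbd, hba, round(tpsa, 2))
```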

ESOL: Methodology
• Multiple Linear Regression
• Significance of each parameter is judged by its absolute t-statistic.
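
The idea of ranking parameters by absolute t-statistic can be sketched with ordinary least squares in NumPy. The data here is synthetic and the function name is hypothetical; it only illustrates the statistic, not the ESOL training run.

```python
import numpy as np


def ols_t_statistics(X, y):
    """Return (coefficients, t-statistics) for y ~ intercept + X via least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])    # prepend the intercept column
    n, p = A.shape
    xtx_inv = np.linalg.inv(A.T @ A)
    beta = xtx_inv @ A.T @ y
    residuals = y - A @ beta
    sigma2 = residuals @ residuals / (n - p)     # unbiased residual variance
    std_err = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta, beta / std_err


# Synthetic example: y depends on column 0 only; column 1 is unrelated to y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 - 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

beta, t = ols_t_statistics(X, y)
for name, b, tv in zip(["intercept", "x0", "x1"], beta, t):
    print(f"{name}: coef={b:+.3f}  |t|={abs(tv):.1f}")
```

A parameter whose |t| stays small contributes little beyond noise and is a candidate for removal from the model.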

ESOL: Training Dataset
• Training set: 2874 molecules
  • Small – low-MolWeight organic compounds
  • Medium – pesticide products, MolWeight 200–300
  • Large – Syngenta compounds, MolWeight 300–400

ESOL: Results
• 4 parameters with absolute t-statistic > 2:
  • clogP
  • MolWeight
  • RotBonds
  • AromProp

Log(Sw) = 0.16
- 0.63 x clogP
- 0.0062 x MolWeight
+ 0.066 x RotBonds
- 0.74 x AromProp

J. S. Delaney, “ESOL: Estimating Aqueous Solubility Directly from Molecular Structure,” Journal of Chemical Information and Modeling, vol. 44, no. 3, pp. 1000–1005, May 2004.
RDKit Based Implementation 1/2
• Use Regression Equation:
Log(Sw) = 0.16
- 0.63 x clogP
- 0.0062 x MolWeight
+ 0.066 x RotBonds
- 0.74 x AromProp
• Calculate properties using RDKit.
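
A minimal sketch of how the equation might be evaluated with RDKit follows. It is not the code shown in the talk. The function name `esol_log_sw` is hypothetical, and Crippen's `MolLogP` is assumed as a stand-in for Daylight CLOGP, so values will differ somewhat from the published ESOL predictions.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, Lipinski


def esol_log_sw(smiles: str) -> float:
    """Estimate log(Sw) from a SMILES string using the ESOL regression equation."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    clogp = Crippen.MolLogP(mol)                 # assumption: Crippen logP, not Daylight CLOGP
    mol_weight = Descriptors.MolWt(mol)
    rot_bonds = Lipinski.NumRotatableBonds(mol)

    # Aromatic proportion: aromatic heavy atoms / all heavy atoms
    heavy = mol.GetNumHeavyAtoms()
    aromatic = sum(1 for atom in mol.GetAtoms() if atom.GetIsAromatic())
    arom_prop = aromatic / heavy if heavy else 0.0

    return (0.16
            - 0.63 * clogp
            - 0.0062 * mol_weight
            + 0.066 * rot_bonds
            - 0.74 * arom_prop)


for smi in ["CCO", "c1ccccc1", "CCCCCCCCCC"]:
    print(smi, round(esol_log_sw(smi), 2))
```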

RDKit Based Implementation 2/2
[Slides 15–17: slide content was not captured in the text extraction.]

Testing…
• Supplementary dataset:
  • 1143 molecules with:
    • Measured water solubility (logSw)
    • ESOL predicted solubility
• Correlation charts:
  • Measured vs ESOL
  • Measured vs RDKit_clogSw
  • ESOL vs RDKit_clogSw
  • Measured vs ESOL vs RDKit_clogSw
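
Each pairwise comparison above reduces to a correlation coefficient between two columns. The sketch below assumes the Pearson coefficient is meant, since the slides do not say whether r or r² is reported. The sample values are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical logSw values, not taken from the dataset
measured = [-1.0, -2.5, -4.0, -0.5]
predicted = [-1.2, -2.0, -3.8, -0.9]
print(round(pearson_r(measured, predicted), 3))
```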
Correlation Table & Chart
Correlation table (columns in the same order as the rows):

                                                 measured      ESOL          clogSw
IMPORTED_measured log(solubility:mol/L)          1
IMPORTED_ESOL predicted log(solubility:mol/L)    0.90794375    1
clogSw                                           0.864718601   0.964683313   1

[Chart: Predicted vs Measured. x-axis: Measured log(solubility:mol/L); y-axis: Predicted log(solubility:mol/L); series: IMPORTED_ESOL predicted log(solubility:mol/L) and clogSw, each with a linear trend line.]

Conclusion
• Comparable results between the original ESOL predictions and the RDKit-based clogSw.
• Easy, fast and relatively accurate.
• Open question: what is the importance of adding hydrogens prior to the Aromatic Proportion calculation?
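
One possible reading of the open question is that explicit hydrogens change the denominator if the proportion is taken over all atoms in the molecule graph rather than over heavy atoms. The sketch below only illustrates that effect; the helper name is hypothetical and it is not claimed to be how the talk's implementation behaves.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("c1ccccc1")   # benzene, hydrogens implicit
mol_h = Chem.AddHs(mol)                # same molecule, hydrogens explicit


def aromatic_fraction_of_all_atoms(m):
    """Aromatic atoms divided by every atom present in the molecule graph."""
    aromatic = sum(1 for atom in m.GetAtoms() if atom.GetIsAromatic())
    return aromatic / m.GetNumAtoms()


frac_implicit = aromatic_fraction_of_all_atoms(mol)     # 6 aromatic of 6 atoms
frac_explicit = aromatic_fraction_of_all_atoms(mol_h)   # 6 aromatic of 12 atoms
heavy_after_add_hs = mol_h.GetNumHeavyAtoms()           # heavy-atom count is unchanged

print(frac_implicit, frac_explicit, heavy_after_add_hs)
```

Dividing by `GetNumHeavyAtoms()` instead keeps the proportion the same whether or not hydrogens have been added.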


Estimate Water Solubility

Editor's notes

  1. Preeminent == Best, Leading
  2. Rhombi