minutes
Python for Chemistry in 21 days



                Dr. Noel O'Boyle

   Dr. John Mitchell and Prof. Peter Murray-Rust

                 UCC Talk, Sept 2005
 Available at http://www-mitchell.ch.cam.ac.uk/noel/
Introduction
●   This talk will cover
     –   what Python is
     –   why you should be interested in it
     –   how you can use Python in chemistry
●   This talk will not cover
     –   how to program in Python
●   See references at end of talk



                       Word of warning!!
“My mental eye could now distinguish larger structures, of
manifold conformation; long rows, sometimes more closely
fitted together; all twining and twisting in snakelike motion.
But look! What was that? One of the snakes had seized
hold of its own tail, and the form whirled mockingly before
my eyes. As if by the flash of lightning I awoke...Let us learn
to dream, gentlemen”
                          Friedrich August Kekulé (1829-1896)
What is Python?
●   For a computer scientist...
    –   a high-level programming language
    –   interpreted (byte-compiled)
    –   dynamic typing
    –   object-oriented
●   For everyone else...
    –   a scripting language (like Perl or Ruby) released by
        Guido van Rossum in 1991
    –   easy to learn
    –   easy to read (!)
    –   named after Cambridge comedians
The Great Debate

Sir Lancelot:
We were in the nick of time. You were in great Perl.
Sir Galahad:
I don't think I was.
Sir Lancelot:
You were, Sir Galahad. You were in terrible Perl.
Sir Galahad:
Look, let me go back in there and face the Perl.
Sir Lancelot:
No, it's too perilous.

                      (adapted from Monty Python and the Holy Grail)
Why you should be interested (1)

●   Python has been adopted by the cheminformatics community
●   For example, AstraZeneca has moved some of its codebase
    from 'the other scripting language' to Python


Job Description – Research Software Developer/Informatician
[section deleted]
Required Skills:

At least one object oriented programming language, e.g., Python, C++, Java.
Web-based application development (design/construction/maintenance)
UNIX, UNIX scripting & Linux OS

                            Position in AstraZeneca R&D, 02-09-05
Why you should be interested (2)

●   Scientific computing: scipy/pylab (like Matlab)
●   Molecular dynamics: MMTK
●   Statistics: scipy, rpy (R), pychem
●   3D-visualisation: VTK (mayavi)
●   2D-visualisation: scipy, pylab, rpy
●   coming soon, a wrapper around OpenBabel
●   cheminformatics: OEChem, frowns, PyDaylight, pychem
●   bioinformatics: BioPython
●   structural biology: PyMOL
●   computational chemistry: GaussSum
●   you can still use Java libraries...like the CDK
scipy/pylab                        R
>>> from scipy import *
>>> from pylab import *
>>> a = arange(0,4)                > a <- seq(0,3)
>>> a                              > a
[0,1,2,3]                          [1] 0 1 2 3
>>> mean(a)                        > mean(a)
1.5                                [1] 1.5
>>> a**2                           > a**2
[0,1,4,9]                          [1] 0 1 4 9
>>> plot(a,a**2)                   > plot(a,a**2)
>>> show()                         > lines(a,a**2)
Scipy
–   cluster: information theory functions (currently, vq and kmeans)
–   weave: compilation of numeric expressions to C++ for fast execution
–   cow: parallel programming via a Cluster Of Workstations
–   fftpack: fast Fourier transform module based on fftpack and fftw when
    available
–   ga: genetic algorithms
–   io: reading and writing numeric arrays, MATLAB .mat, and Matrix
    Market .mtx files
–   integrate: numeric integration for bounded and unbounded ranges. ODE
    solvers.
–   interpolate: interpolation of values from a sample data set.
–   optimize: constrained and unconstrained optimization methods and
    root-finding algorithms
–   signal: signal processing (1-D and 2-D filtering, filter design, LTI
    systems, etc.)
–   special: special function types (bessel, gamma, airy, etc.)
–   stats: statistical functions (stdev, var, mean, etc.)
–   linalg: linear algebra and BLAS routines based on the ATLAS
    implementation of LAPACK
–   sparse: Some sparse matrix support. LU factorization and solving
    sparse linear systems
Scipy statistical functions

●   descriptive statistics: variance, standard deviation, standard error, mean,
    mode, median
●   correlation: Pearson r, Spearman r, Kendall tau
●   statistical tests: chi-squared, t-tests, binomial, Wilcoxon, Kruskal,
    Kolmogorov-Smirnov, Anderson, etc.

●   linear regression

●   analysis of variance (ANOVA)

●   (and more)
pylab
pychem: Using scipy for chemoinformatics

●   Many multivariate analysis techniques are based on matrix
    algebra
●   scipy has wrappers around well-known C and Fortran
    numerical libraries (ATLAS, LAPACK)
●   pychem can do:
     –   principal component analysis, partial least squares
         regression, Fisher's discriminant analysis
     –   clustering: k-means, hierarchical (used Open Source
         clustering library)
     –   feature selection: genetic algorithms with PLS
     –   additional methods can be added
Python and R
  ●   Advantages of R:
       –   a large number of statistical libraries are available
  ●   Disadvantages of R:
       –   difficult to write algorithms
       –   slow (most R libraries are written in C)
       –   chokes on large datasets (use scan instead of read.table)

Reading in data
Method            300K    600K    1.6M
Python             6.8    13.9      41
R (read.table)      42     105
R (scan)             9      20      56

Principal component analysis
Method            300K    600K    1.6M
Python             2.2     3.6      42
R (read.table)       5      10
R (scan)             3       5      29

http://www.redbrick.dcu.ie/~noel/RversusPython.html
Python and R
●   rpy module allows Python programs to interface with R
●   have the best of both worlds
     –   access to the statistical functions of R
     –   access to the numerous modules available for Python
     –   can program in Python, instead of in R!!

             Python                                    R
>>> from rpy import r
>>> x = [5.05, 6.75, 3.21, 2.66]         > x <- c(5.05, 6.75, 3.21, 2.66)
>>> y = [1.65, 26.5, -5.93, 7.96]        > y <- c(1.65, 26.5, -5.93, 7.96)
>>> print r.lsfit(x,y)['coefficients']   > lsfit(x, y)$coefficients
{'X': 5.3935773611970212,                 Intercept        X
 'Intercept': -16.281127993087839}       -16.281128 5.393577
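
The fit that lsfit performs in this example is an ordinary least-squares straight line. As a rough illustration of what is being computed, here is a minimal sketch in plain Python 3 using only the standard library. The function name fit_line is a hypothetical choice for this sketch and is not part of rpy or R.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Same keys as the dictionary returned through rpy on the slide
    return {"Intercept": intercept, "X": slope}


# Data taken from the slide
x = [5.05, 6.75, 3.21, 2.66]
y = [1.65, 26.5, -5.93, 7.96]
coefficients = fit_line(x, y)
print(coefficients)
```

The slope and intercept should agree with the values printed by R on the slide to within floating-point rounding.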
Python and R
Problem: Analyse a hierarchical clustering

Solution: Use R to cluster, and Python to analyse the merge object of the cluster

> hc$merge
      [,1] [,2]
 [1,]  -32  -33
 [2,]  -39  -71
 [3,]  -43  -47
 [4,]  -10  -55
 [5,]  -19  -36
 [6,]   -5  -24
 [7,]  -62  -63
 [8,]  -74  -75
 [9,]  -35  -76
[10,]   -1  -84
[11,]  -41  -42
[12,]  -83  -96
[13,]   -2  -29
[14,]   -7  -21
[15,]  -61    4
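
In the merge matrix returned by R's hclust, a negative entry -j stands for original observation j, and a positive entry k stands for the cluster formed at row k. The talk does not show the Python analysis itself, so the following is only a minimal Python 3 sketch of one possible analysis: listing the members of the cluster formed at each step. The function name clusters_from_merge is hypothetical, and the rows are typed in by hand from the slide rather than fetched through rpy.

```python
def clusters_from_merge(merge):
    """Return the members of the cluster formed at each merge step.

    `merge` is a list of (a, b) pairs following R's hclust convention:
    a negative entry -j is original observation j; a positive entry k
    is the cluster created at step k (1-based).
    """
    clusters = []
    for a, b in merge:
        members = []
        for entry in (a, b):
            if entry < 0:
                members.append(-entry)               # a single observation
            else:
                members.extend(clusters[entry - 1])  # an earlier cluster
        clusters.append(sorted(members))
    return clusters


# The first 15 rows of hc$merge shown on the slide
merge = [(-32, -33), (-39, -71), (-43, -47), (-10, -55), (-19, -36),
         (-5, -24), (-62, -63), (-74, -75), (-35, -76), (-1, -84),
         (-41, -42), (-83, -96), (-2, -29), (-7, -21), (-61, 4)]

clusters = clusters_from_merge(merge)
for step, members in enumerate(clusters, start=1):
    print(step, members)
```

Row 15 is the first row with a positive entry: observation 61 joins the cluster formed at step 4, giving the three-member cluster 10, 55, 61.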
Problem 1
Graphically show the distribution of molecular weights of
molecules in an SD file. The molecular weight is stored in
a field of the SD file.

1,2-Diaminoethane
  MOE2004           3D

  6 3 0 0 0 0 0 0 0 0999 V2000
   -0.6900   -0.6620  0.0000 C 0   0   0   0   0   0   0   0   0   0   0   0
    0.5850   -1.9590  0.8240 H 0   0   0   0   0   0   0   0   0   0   0   0
    0.5350    1.5040  0.0000 N 0   0   0   0   0   0   0   0   0   0   0   0
    0.6060    2.0500  0.8460 H 0   0   0   0   0   0   0   0   0   0   0   0
    1.3040    0.8520 -0.0460 H 0   0   0   0   0   0   0   0   0   0   0   0
  1 2 1 0 0 0 0
  1 3 1 0 0 0 0
  1 4 1 0 0 0 0
M END
> <chem.name>
1,2-Diaminoethane

> <molecular.weight>
60.0995

$$$$
Object-oriented approach

 SD file (object)
    Molecule 1, Molecule 2, ..., Molecule n (objects)
       name, atoms, fields (attributes)

  The code for creating these objects is stored in sdparser.py,
  which can be imported and used by any scripts that need to
  parse SD files.
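
The contents of sdparser.py are not shown in the talk. To give a feel for what such a module might contain, here is a minimal sketch in Python 3. The names SDFile, Molecule, name and fields mirror those used in the slides, but the implementation is an assumption: it reads from a file-like object rather than a filename, keeps the atom and bond lines as unparsed text, and expects every field header to contain a name in angle brackets.

```python
import io


class Molecule:
    """One SD file record: name line, raw molfile lines and data fields."""

    def __init__(self, name, molblock, fields):
        self.name = name
        self.molblock = molblock   # lines of the connection table, unparsed
        self.fields = fields       # e.g. {'molecular.weight': '60.0995'}


class SDFile:
    """Iterate over the records of an SD file read from a file-like object."""

    def __init__(self, handle):
        self.handle = handle

    def __iter__(self):
        record = []
        for line in self.handle:
            line = line.rstrip("\n")
            if line.strip() == "$$$$":      # end-of-record marker
                if record:
                    yield self._parse(record)
                record = []
            else:
                record.append(line)

    @staticmethod
    def _parse(record):
        name = record[0].strip()            # the first line holds the name
        fields = {}
        molblock = []
        i = 1
        while i < len(record):
            line = record[i]
            if line.startswith(">"):
                # Field header such as "> <molecular.weight>"
                key = line[line.index("<") + 1:line.rindex(">")]
                value_lines = []
                i += 1
                while i < len(record) and record[i].strip():
                    value_lines.append(record[i])
                    i += 1
                fields[key] = "\n".join(value_lines)
            else:
                if not fields:              # still before the first field
                    molblock.append(line)
                i += 1
        return Molecule(name, molblock, fields)


# Abbreviated record based on the slide (atom and bond lines omitted)
sample = """1,2-Diaminoethane
  MOE2004           3D

M END
> <chem.name>
1,2-Diaminoethane

> <molecular.weight>
60.0995

$$$$
"""

molecules = list(SDFile(io.StringIO(sample)))
print(molecules[0].name, molecules[0].fields["molecular.weight"])
```

Field values are kept as strings, which is why a script using them has to convert a molecular weight with float() before doing arithmetic.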
Solution

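from sdparser import SDFile
from rpy import *

inputfile = "mddr_complete.sd"

# Collect the molecular weight of every molecule in the SD file
allmolweights = []

for molecule in SDFile(inputfile):
   molweight = molecule.fields['molecular.weight']
   allmolweights.append(float(molweight))   # convert the field text to a number

# Use R (through rpy) to draw the histogram and save it as a PNG file
r.png(file="molwt_r.png")
r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red")
r.dev_off()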
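
A histogram is a count of how many values fall into each bin. As a rough illustration of that step without R, here is a minimal sketch in Python 3 using only the standard library. The function name bin_counts, the bin width and the list of weights are all assumptions made for this example. The bins are closed on the left, which is a simplification and not necessarily how R's hist chooses its breaks.

```python
from collections import Counter


def bin_counts(values, width):
    """Count how many values fall in each bin [k*width, (k+1)*width)."""
    counts = Counter(int(v // width) for v in values)
    return {k * width: counts[k] for k in sorted(counts)}


# Illustrative molecular weights only (not taken from the MDDR file)
allmolweights = [60.0995, 180.16, 151.16, 194.19, 206.28, 46.07, 342.30]

histogram = bin_counts(allmolweights, width=100)
for lower, count in histogram.items():
    # One asterisk per molecule in the bin
    print("%4d-%4d %s" % (lower, lower + 100, "*" * count))
```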
Problem 2
Every molecule in an SD file is missing the name. To be
compatible with proprietary program X, we need to set the
name equal to the value of the field “chem.name”.

(MISSING NAME!)
  MOE2004          3D

  6 3 0 0 0 0 0 0 0 0999 V2000
   -0.6900   -0.6620  0.0000 C 0   0   0   0   0   0   0   0   0   0   0   0
    0.5850   -1.9590  0.8240 H 0   0   0   0   0   0   0   0   0   0   0   0
    0.5350    1.5040  0.0000 N 0   0   0   0   0   0   0   0   0   0   0   0
    0.6060    2.0500  0.8460 H 0   0   0   0   0   0   0   0   0   0   0   0
    1.3040    0.8520 -0.0460 H 0   0   0   0   0   0   0   0   0   0   0   0
  1 2 1 0 0 0 0
  1 3 1 0 0 0 0
  1 4 1 0 0 0 0
M END
> <chem.name>
1,2-Diaminoethane

> <molecular.weight>
60.0995

$$$$
Solution

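from sdparser import SDFile

# Assumption: SDFile objects can be opened for writing with a mode
# argument (as with the built-in open) and closed when finished
inputfile = SDFile("mddr_complete.sd")
outputfile = SDFile("mddr_withnames.sd", "w")

for molecule in inputfile:
   # Copy the value of the "chem.name" field into the molecule's name
   molecule.name = molecule.fields['chem.name']
   outputfile.write(molecule)

inputfile.close()
outputfile.close()

print("We are the knights who say....SD!!!")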
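
The name is the first line of each record, and the value of a field sits on the line after its header. The same fix can therefore be sketched with plain text handling, without sdparser. The following is a minimal Python 3 sketch under those assumptions; the function name add_names is hypothetical, and each record is assumed to end with a `$$$$` line.

```python
def add_names(sd_text, field="chem.name"):
    """Copy the value of `field` into the first (name) line of every record."""
    output = []
    record = []
    header = "<%s>" % field
    for line in sd_text.splitlines():
        record.append(line)
        if line.strip() == "$$$$":                  # end of one record
            for i, rec_line in enumerate(record[:-1]):
                if rec_line.startswith(">") and header in rec_line:
                    record[0] = record[i + 1]       # value is on the next line
                    break
            output.extend(record)
            record = []
    output.extend(record)                           # keep any trailing lines
    return "\n".join(output) + "\n"


# Abbreviated record with an empty name line (atom and bond lines omitted)
sample = """
  MOE2004          3D

M END
> <chem.name>
1,2-Diaminoethane

> <molecular.weight>
60.0995

$$$$
"""

fixed = add_names(sample)
print(fixed.splitlines()[0])
```

Records that lack the field are passed through unchanged, so the function can be run over a mixed file without losing data.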
Python and Java
 ●  It's easy to use Java libraries from Python
      – using either Jython or JPype
      – see http://www.redbrick.dcu.ie/~noel/CDKJython.html

Example: using the CDK to calculate the number of rings in a molecule
(given a string variable containing CML)
from jpype import *

# Start the Java virtual machine (the path to libjvm.so depends on the installation)
startJVM("jdk1.5.0_03/jre/lib/i386/server/libjvm.so")
cdk = JPackage("org").openscience.cdk
SSSRFinder = cdk.ringsearch.SSSRFinder
CMLReader = cdk.io.CMLReader

def getNumRings(molXmlValue):
  # molXmlValue is a string containing CML
  # Convert to a CDK molecule
  reader = CMLReader(java.io.StringReader(molXmlValue))
  chemFile = reader.read(cdk.ChemFile())
  cdkMol = chemFile.getChemSequence(0).getChemModel(0).getSetOfMolecules().getMolecule(0)
  # Calculate the number of rings (size of the smallest set of smallest rings)
  sssrFinder = SSSRFinder(cdkMol)
  sssr = sssrFinder.findSSSR().size()
  return sssr
3D visualisation
●   VTK (Visualisation Toolkit) from Kitware
    –   open source, freely available
    –   scalar, tensor, vector and volumetric methods
    –   advanced modeling techniques such as implicit modelling,
        polygon reduction, mesh smoothing, cutting, contouring, and
        Delaunay triangulation
●   MayaVi
    –   easy to use GUI interface to VTK, written in Python
    –   can create input files and visualise them using Python scripts

                                 Demo
Python Resources
●   http://www.python.org
●   Guido's Tutorial
     –   http://www.python.org/doc/current/tut/tut.html
●   O'Reilly's “Learning Python” or Visual Quickstart Guide
    to Python
     –   Make sure it's Python 2.3 or 2.4 though
●   For Windows, consider the Enthought edition
     –   http://www.enthought.com/
Python for Chemistry

Contenu connexe

Tendances

Tendances (20)

Quantum Chemistry
Quantum ChemistryQuantum Chemistry
Quantum Chemistry
 
IR Spectroscopy
IR SpectroscopyIR Spectroscopy
IR Spectroscopy
 
Mossbauer Spectroscopy
Mossbauer SpectroscopyMossbauer Spectroscopy
Mossbauer Spectroscopy
 
Biomedical Instrumentation
Biomedical InstrumentationBiomedical Instrumentation
Biomedical Instrumentation
 
Superconductor
SuperconductorSuperconductor
Superconductor
 
Optical properties of semiconductors ppt
Optical properties of semiconductors pptOptical properties of semiconductors ppt
Optical properties of semiconductors ppt
 
NUCLEAR QUADRUPOLE RESONANCE SPECTROSCOPY
NUCLEAR QUADRUPOLE RESONANCE SPECTROSCOPY NUCLEAR QUADRUPOLE RESONANCE SPECTROSCOPY
NUCLEAR QUADRUPOLE RESONANCE SPECTROSCOPY
 
Raman spectroscopy
Raman spectroscopy Raman spectroscopy
Raman spectroscopy
 
Hyperfine splitting
Hyperfine splittingHyperfine splitting
Hyperfine splitting
 
Signals and noise
Signals and noiseSignals and noise
Signals and noise
 
Nmr soni
Nmr soniNmr soni
Nmr soni
 
Dye sensitized solar cells
Dye sensitized solar cellsDye sensitized solar cells
Dye sensitized solar cells
 
Lecture 8
Lecture 8Lecture 8
Lecture 8
 
Postulates of quantum mechanics & operators
Postulates of quantum mechanics & operatorsPostulates of quantum mechanics & operators
Postulates of quantum mechanics & operators
 
Debye huckle theory
Debye huckle theoryDebye huckle theory
Debye huckle theory
 
Photolithography
PhotolithographyPhotolithography
Photolithography
 
Laser applications in biomedical field
Laser applications in biomedical fieldLaser applications in biomedical field
Laser applications in biomedical field
 
Superconductors
Superconductors Superconductors
Superconductors
 
Apps of thin films
Apps of thin filmsApps of thin films
Apps of thin films
 
Uv-vis Spectroscopy
Uv-vis SpectroscopyUv-vis Spectroscopy
Uv-vis Spectroscopy
 

Similaire à Python for Chemistry

Seven waystouseturtle pycon2009
Seven waystouseturtle pycon2009Seven waystouseturtle pycon2009
Seven waystouseturtle pycon2009
A Jorge Garcia
 
Migrating from matlab to python
Migrating from matlab to pythonMigrating from matlab to python
Migrating from matlab to python
ActiveState
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimization
g3_nittala
 
2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer
tirlukachaitanya
 
DRUG - RDSTK Talk
DRUG - RDSTK TalkDRUG - RDSTK Talk
DRUG - RDSTK Talk
rtelmore
 

Similaire à Python for Chemistry (20)

Seven waystouseturtle pycon2009
Seven waystouseturtle pycon2009Seven waystouseturtle pycon2009
Seven waystouseturtle pycon2009
 
Python Orientation
Python OrientationPython Orientation
Python Orientation
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.ppt
 
Migrating from matlab to python
Migrating from matlab to pythonMigrating from matlab to python
Migrating from matlab to python
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
 
A Workshop on R
A Workshop on RA Workshop on R
A Workshop on R
 
Data Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsData Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science Competitions
 
Using R in remote computer clusters
Using R in remote computer clustersUsing R in remote computer clusters
Using R in remote computer clusters
 
RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)
 
Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKit
 
Reproducible Computational Research in R
Reproducible Computational Research in RReproducible Computational Research in R
Reproducible Computational Research in R
 
DS LAB MANUAL.pdf
DS LAB MANUAL.pdfDS LAB MANUAL.pdf
DS LAB MANUAL.pdf
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimization
 
2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer
 
Python For Scientists
Python For ScientistsPython For Scientists
Python For Scientists
 
Regex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadRegex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language Instead
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging Environments
 
DRUG - RDSTK Talk
DRUG - RDSTK TalkDRUG - RDSTK Talk
DRUG - RDSTK Talk
 
R for hadoopers
R for hadoopersR for hadoopers
R for hadoopers
 
R and RMarkdown crash course
R and RMarkdown crash courseR and RMarkdown crash course
R and RMarkdown crash course
 

Plus de baoilleach

Universal Smiles: Finally a canonical SMILES string
Universal Smiles: Finally a canonical SMILES stringUniversal Smiles: Finally a canonical SMILES string
Universal Smiles: Finally a canonical SMILES string
baoilleach
 
What's New and Cooking in Open Babel 2.3.2
What's New and Cooking in Open Babel 2.3.2What's New and Cooking in Open Babel 2.3.2
What's New and Cooking in Open Babel 2.3.2
baoilleach
 
Large-scale computational design and selection of polymers for solar cells
Large-scale computational design and selection of polymers for solar cellsLarge-scale computational design and selection of polymers for solar cells
Large-scale computational design and selection of polymers for solar cells
baoilleach
 
Improving the quality of chemical databases with community-developed tools (a...
Improving the quality of chemical databases with community-developed tools (a...Improving the quality of chemical databases with community-developed tools (a...
Improving the quality of chemical databases with community-developed tools (a...
baoilleach
 
Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...
baoilleach
 

Plus de baoilleach (20)

We need to talk about Kekulization, Aromaticity and SMILES
We need to talk about Kekulization, Aromaticity and SMILESWe need to talk about Kekulization, Aromaticity and SMILES
We need to talk about Kekulization, Aromaticity and SMILES
 
Open Babel project overview
Open Babel project overviewOpen Babel project overview
Open Babel project overview
 
So I have an SD File... What do I do next?
So I have an SD File... What do I do next?So I have an SD File... What do I do next?
So I have an SD File... What do I do next?
 
Chemistrify the Web
Chemistrify the WebChemistrify the Web
Chemistrify the Web
 
Universal Smiles: Finally a canonical SMILES string
Universal Smiles: Finally a canonical SMILES stringUniversal Smiles: Finally a canonical SMILES string
Universal Smiles: Finally a canonical SMILES string
 
What's New and Cooking in Open Babel 2.3.2
What's New and Cooking in Open Babel 2.3.2What's New and Cooking in Open Babel 2.3.2
What's New and Cooking in Open Babel 2.3.2
 
Intro to Open Babel
Intro to Open BabelIntro to Open Babel
Intro to Open Babel
 
Protein-ligand docking
Protein-ligand dockingProtein-ligand docking
Protein-ligand docking
 
Cheminformatics
CheminformaticsCheminformatics
Cheminformatics
 
Making the most of a QM calculation
Making the most of a QM calculationMaking the most of a QM calculation
Making the most of a QM calculation
 
Data Analysis in QSAR
Data Analysis in QSARData Analysis in QSAR
Data Analysis in QSAR
 
Large-scale computational design and selection of polymers for solar cells
Large-scale computational design and selection of polymers for solar cellsLarge-scale computational design and selection of polymers for solar cells
Large-scale computational design and selection of polymers for solar cells
 
My Open Access papers
My Open Access papersMy Open Access papers
My Open Access papers
 
Improving the quality of chemical databases with community-developed tools (a...
Improving the quality of chemical databases with community-developed tools (a...Improving the quality of chemical databases with community-developed tools (a...
Improving the quality of chemical databases with community-developed tools (a...
 
De novo design of molecular wires with optimal properties for solar energy co...
De novo design of molecular wires with optimal properties for solar energy co...De novo design of molecular wires with optimal properties for solar energy co...
De novo design of molecular wires with optimal properties for solar energy co...
 
Cinfony - Bring cheminformatics toolkits into tune
Cinfony - Bring cheminformatics toolkits into tuneCinfony - Bring cheminformatics toolkits into tune
Cinfony - Bring cheminformatics toolkits into tune
 
Density functional theory calculations on Ruthenium polypyridyl complexes inc...
Density functional theory calculations on Ruthenium polypyridyl complexes inc...Density functional theory calculations on Ruthenium polypyridyl complexes inc...
Density functional theory calculations on Ruthenium polypyridyl complexes inc...
 
Application of Density Functional Theory to Scanning Tunneling Microscopy
Application of Density Functional Theory to Scanning Tunneling MicroscopyApplication of Density Functional Theory to Scanning Tunneling Microscopy
Application of Density Functional Theory to Scanning Tunneling Microscopy
 
Towards Practical Molecular Devices
Towards Practical Molecular DevicesTowards Practical Molecular Devices
Towards Practical Molecular Devices
 
Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...
 

Dernier

Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Dernier (20)

Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 

Python for Chemistry

  • 1. minutes Python for Chemistry in 21 days Dr. Noel O'Boyle Dr. John Mitchell and Prof. Peter Murray-Rust UCC Talk, Sept 2005 Available at http://www-mitchell.ch.cam.ac.uk/noel/
  • 2. Introduction ● This talk will cover – what Python is – why you should be interested in it – how you can use Python in chemistry
  • 3. Introduction ● This talk will cover – what Python is – why you should be interested in it – how you can use Python in chemistry ● This talk will not cover – how to program in Python ● See references at end of talk Word of warning!!
  • 4. “My mental eye could now distinguish larger structures, of manifold conformation; long rows, sometimes more closely fitted together; all twining and twisting in snakelike motion. But look! What was that? One of the snakes had seized hold of its own tail, and the form whirled mockingly before my eyes. As if by the flash of lightning I awoke...Let us learn to dream, gentlemen” Friedrich August Kekulé (1829-1896)
  • 5. What is Python? ● For a computer scientist... – a high-level programming language – interpreted (byte-compiled) – dynamic typing – object-oriented
  • 6. What is Python? ● For a computer scientist... – a high-level programming language – interpreted (byte-compiled) – dynamic typing – object-oriented ● For everyone else... – a scripting language (like Perl or Ruby) released by Guido von Rossum in 1991 – easy to learn – easy to read (!)
  • 7. What is Python? ● For a computer scientist... – a high-level programming language – interpreted (byte-compiled) – dynamic typing – object-oriented ● For everyone else... – a scripting language (like Perl or Ruby) released by Guido von Rossum in 1991 – easy to learn – easy to read (!) – named after Cambridge comedians
  • 8. The Great Debate Sir Lancelot: We were in the nick of time. You were in great Perl. Sir Galahad: I don't think I was. Sir Lancelot: You were, Sir Galahad. You were in terrible Perl. Sir Galahad: Look, let me go back in there and face the Perl. Sir Lancelot: No, it's too perilous. (adapted from Monty Python and the Holy Grail)
  • 9. Why you should be interested (1) ● Python has been adopted by the cheminformatics community ● For example, AstraZeneca has moved some of its codebase from 'the other scripting language' to Python Job Description – Research Software Developer/Informatician [section deleted] Required Skills: At least one object oriented programming language, e.g., Python, C++, Java. Web-based application development (design/construction/maintenance) UNIX, UNIX scripting & Linux OS Position in AstraZeneca R&D, 02-09-05
  • 10. Why you should be interested (2) ● Scientific computing: scipy/pylab (like Matlab) ● Molecular dynamics: MMTK ● Statistics: scipy, rpy (R), pychem ● 3D-visualisation: VTK (mayavi) ● 2D-visualisation: scipy, pylab, rpy ● coming soon, a wrapper around OpenBabel ● cheminformatics: OEChem, frowns, PyDaylight, pychem ● bioinformatics: BioPython ● structural biology: PyMOL ● computational chemistry: GaussSum ● you can still use Java libraries...like the CDK
  • 11. scipy/pylab compared with R
      scipy/pylab:
        >>> from scipy import *
        >>> from pylab import *
        >>> a = arange(0,4)
        >>> a
        [0,1,2,3]
        >>> mean(a)
        1.5
        >>> a**2
        [0,1,4,9]
        >>> plot(a,a**2)
        >>> show()
      R:
        > a <- seq(0,3)
        > a
        [1] 0 1 2 3
        > mean(a)
        [1] 1.5
        > a**2
        [1] 0 1 4 9
        > plot(a,a**2)
        > lines(a,a**2)
  • 12. Scipy – cluster: information theory functions (currently, vq and kmeans) – weave: compilation of numeric expressions to C++ for fast execution – cow: parallel programming via a Cluster Of Workstations – fftpack: fast Fourier transform module based on fftpack and fftw when available – ga: genetic algorithms – io: reading and writing numeric arrays, MATLAB .mat, and Matrix Market .mtx files – integrate: numeric integration for bounded and unbounded ranges. ODE solvers. – interpolate: interpolation of values from a sample data set. – optimize: constrained and unconstrained optimization methods and root-finding algorithms – signal: signal processing (1-D and 2-D filtering, filter design, LTI systems, etc.) – special: special function types (bessel, gamma, airy, etc.) – stats: statistical functions (stdev, var, mean, etc.) – linalg: linear algebra and BLAS routines based on the ATLAS implementation of LAPACK – sparse: Some sparse matrix support. LU factorization and solving sparse linear systems
  • 13. Scipy statistical functions ● descriptive statistics: variance, standard deviation, standard error, mean, mode, median ● correlation: Pearson r, Spearman r, Kendall tau ● statistical tests: chi-squared, t-tests, binomial, Wilcoxon, Kruskal, Kolmogorov-Smirnov, Anderson, etc. ● linear regression ● analysis of variance (ANOVA) ● (and more)
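A few of the descriptive statistics and the Pearson correlation listed above can be illustrated without scipy. The following is a minimal sketch that uses only the standard `statistics` module of Python 3 rather than scipy's own functions; the helper name `pearson_r` and the sample data are made up for illustration.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Sums of cross-products and of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


data = [0, 1, 2, 3]                 # illustrative values only
squares = [v ** 2 for v in data]

print(statistics.mean(data))        # arithmetic mean
print(statistics.median(data))      # median
print(statistics.stdev(data))       # sample standard deviation
print(pearson_r(data, squares))     # correlation between data and its squares
```

For real work, the scipy routines are the better choice because they also return p-values and handle arrays efficiently.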
  • 14. pylab
  • 15. pychem: Using scipy for chemoinformatics ● Many multivariate analysis techniques are based on matrix algebra ● scipy has wrappers around well-known C and Fortran numerical libraries (ATLAS, LAPACK) ● pychem can do: – principal component analysis, partial least squares regression, Fisher's discriminant analysis – clustering: k-means, hierarchical (used Open Source clustering library) – feature selection: genetic algorithms with PLS – additional methods can be added
  • 16. Python and R ● Advantages of R: – a large number of statistical libraries are available ● Disadvantages of R: – difficult to write algorithms – slow (most R libraries are written in C) – chokes on large datasets (use scan instead of read.table)
      Reading in data
        Method            300K   600K   1.6M
        Python             6.8   13.9     41
        R (read.table)      42    105      -
        R (scan)             9     20     56
      Principal component analysis
        Method            300K   600K   1.6M
        Python             2.2    3.6     42
        R (read.table)       5     10      -
        R (scan)             3      5     29
      http://www.redbrick.dcu.ie/~noel/RversusPython.html
  • 17. Python and R ● rpy module allows Python programs to interface with R ● have the best of both worlds – access to the statistical functions of R – access to the numerous modules available for Python – can program in Python, instead of in R!!
      Python:
        >>> from rpy import r
        >>> x = [5.05, 6.75, 3.21, 2.66]
        >>> y = [1.65, 26.5, -5.93, 7.96]
        >>> print r.lsfit(x,y)['coefficients']
        {'X': 5.3935773611970212, 'Intercept': -16.281127993087839}
      R:
        > x <- c(5.05, 6.75, 3.21, 2.66)
        > y <- c(1.65, 26.5, -5.93, 7.96)
        > lsfit(x, y)$coefficients
         Intercept          X
        -16.281128   5.393577
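To make clear what `lsfit` computes in the example above, here is a minimal pure-Python sketch of an ordinary least-squares straight-line fit applied to the same `x` and `y` values. It does not use rpy or R; the function name `least_squares_line` is an assumption introduced for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and sum of cross-products of deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Same data as in the rpy example on the slide
x = [5.05, 6.75, 3.21, 2.66]
y = [1.65, 26.5, -5.93, 7.96]

intercept, slope = least_squares_line(x, y)
print({"Intercept": intercept, "X": slope})
```

Up to floating-point rounding, the result should agree with the coefficients reported by R on the slide (intercept about -16.281, slope about 5.394).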
  • 18. Python and R
      Problem: Analyse a hierarchical clustering
      Solution: Use R to cluster, and Python to analyse the merge object of the cluster
        > hc$merge
              [,1] [,2]
         [1,]  -32  -33
         [2,]  -39  -71
         [3,]  -43  -47
         [4,]  -10  -55
         [5,]  -19  -36
         [6,]   -5  -24
         [7,]  -62  -63
         [8,]  -74  -75
         [9,]  -35  -76
        [10,]   -1  -84
        [11,]  -41  -42
        [12,]  -83  -96
        [13,]   -2  -29
        [14,]   -7  -21
        [15,]  -61    4
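The merge object follows the convention documented for R's `hclust`: in each row, a negative entry is a single observation and a positive entry refers to the cluster formed at that earlier step. A minimal sketch of how Python could expand such a table into cluster memberships follows; the function name `cluster_members` is hypothetical, and the table is the fragment shown on the slide, typed in by hand rather than fetched through rpy.

```python
def cluster_members(merge):
    """Expand an hclust-style merge table into the members of each cluster.

    merge is a list of (left, right) pairs, one per merge step.
    A negative entry -j means observation j; a positive entry k means
    the cluster that was formed at step k (1-based, as in R).
    """
    members = []
    for left, right in merge:
        cluster = []
        for entry in (left, right):
            if entry < 0:
                cluster.append(-entry)              # a single observation
            else:
                cluster.extend(members[entry - 1])  # an earlier cluster
        members.append(sorted(cluster))
    return members


# The first 15 rows of hc$merge from the slide
merge = [(-32, -33), (-39, -71), (-43, -47), (-10, -55), (-19, -36),
         (-5, -24), (-62, -63), (-74, -75), (-35, -76), (-1, -84),
         (-41, -42), (-83, -96), (-2, -29), (-7, -21), (-61, 4)]

clusters = cluster_members(merge)
print(clusters[14])   # step 15 joins observation 61 to the cluster from step 4
```

Step 15 therefore contains observations 10, 55 and 61, which is the kind of information that is awkward to extract in R but takes a few lines in Python.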
  • 19. Problem 1 Graphically show the distribution of molecular weights of molecules in an SD file. The molecular weight is stored in a field of the SD file.
        1,2-Diaminoethane
        MOE2004 3D
        6 3 0 0 0 0 0 0 0 0999 V2000
        -0.6900 -0.6620  0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
         0.5850 -1.9590  0.8240 H 0 0 0 0 0 0 0 0 0 0 0 0
         0.5350  1.5040  0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
         0.6060  2.0500  0.8460 H 0 0 0 0 0 0 0 0 0 0 0 0
         1.3040  0.8520 -0.0460 H 0 0 0 0 0 0 0 0 0 0 0 0
        1 2 1 0 0 0 0
        1 3 1 0 0 0 0
        1 4 1 0 0 0 0
        M END
        > <chem.name>
        1,2-Diaminoethane
        > <molecular.weight>
        60.0995
        $$$$
  • 20. Object-oriented approach [Diagram: an SD file object contains Molecule objects (Molecule 1, Molecule 2, ... Molecule n); each Molecule object has the attributes fields, name and atoms.] The code for creating these objects is stored in sdparser.py, which can be imported and used by any scripts that need to parse SD files.
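The talk does not show sdparser.py itself, so the following is only a hedged sketch of how such a module might look in Python 3. The class names `SDFile` and `Molecule` and the attributes `name`, `atoms` and `fields` come from the slide; everything else (the constructor taking an iterable of lines, the tuple layout of `atoms`, the abbreviated sample record) is an assumption made for illustration.

```python
import io


class Molecule:
    """One record of an SD file (hypothetical stand-in for sdparser's class)."""

    def __init__(self, lines):
        self.lines = list(lines)        # record lines without the $$$$ line
        self.name = self.lines[0].strip() if self.lines else ""
        self.atoms = self._parse_atoms()
        self.fields = self._parse_fields()

    def _parse_atoms(self):
        """Return a list of (symbol, x, y, z) tuples from a V2000 atom block."""
        atoms = []
        if len(self.lines) < 4:
            return atoms
        try:
            natoms = int(self.lines[3][:3])   # counts line: atom count first
        except ValueError:
            return atoms
        for line in self.lines[4:4 + natoms]:
            parts = line.split()
            if len(parts) >= 4:
                x, y, z = (float(v) for v in parts[:3])
                atoms.append((parts[3], x, y, z))
        return atoms

    def _parse_fields(self):
        """Return a dict mapping data field names to their (string) values."""
        fields = {}
        current = None
        in_data = False
        for line in self.lines:
            if not in_data:
                # Data fields only start after the end of the connection table
                in_data = line.startswith("M  END")
                continue
            if line.startswith(">"):
                start, end = line.find("<"), line.rfind(">")
                current = line[start + 1:end] if 0 <= start < end else None
                if current is not None:
                    fields[current] = []
            elif current is not None:
                if line.strip():
                    fields[current].append(line)
                else:
                    current = None      # a blank line ends the field value
        return {key: "\n".join(value) for key, value in fields.items()}


class SDFile:
    """Iterate over the molecules in any iterable of SD file lines."""

    def __init__(self, source):
        self.source = source

    def __iter__(self):
        record = []
        for line in self.source:
            line = line.rstrip("\n")
            if line.startswith("$$$$"):
                yield Molecule(record)
                record = []
            else:
                record.append(line)


# Abbreviated, illustrative record (two atoms only), not real MDDR data
sample = """1,2-Diaminoethane
  MOE2004           3D

  2  1  0  0  0  0  0  0  0  0999 V2000
   -0.6900   -0.6620    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.5350    1.5040    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  END
> <chem.name>
1,2-Diaminoethane

> <molecular.weight>
60.0995

$$$$
"""

molecules = list(SDFile(io.StringIO(sample)))
for molecule in molecules:
    print(molecule.name, molecule.fields["molecular.weight"], len(molecule.atoms))
```

Passing an open file instead of the `io.StringIO` object would work the same way, since both are iterables of lines.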
  • 21. Solution
        from sdparser import SDFile
        from rpy import *
        inputfile = "mddr_complete.sd"
        # Collect the molecular weight of every molecule in the SD file
        allmolweights = []
        for molecule in SDFile(inputfile):
            molweight = molecule.fields['molecular.weight']
            allmolweights.append(float(molweight))  # field values are strings
        # Use R (through rpy) to draw the histogram into a PNG file
        r.png(file="molwt_r.png")
        r.hist(allmolweights, xlab="Mol. weights", main="MDDR", col="red")
        r.dev_off()
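The solution delegates the binning and plotting to R's `hist`. As a rough illustration of the binning step alone, here is a minimal pure-Python sketch that prints a text histogram. It is not what rpy or R does internally; the function name `histogram`, the bin width and the sample weights are all made up for illustration.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count the values falling into each bin [lower, lower + bin_width)."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    # Floor division maps each value to the lower edge of its bin
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Made-up molecular weights, for illustration only
weights = [60.0995, 78.11, 180.16, 194.19, 151.16, 46.07]

bins = histogram(weights, 50)
for lower, count in bins.items():
    print("%5d to %-5d %s" % (lower, lower + 50, "#" * count))
```

In a real script the list of weights would come from the loop over `SDFile(inputfile)` shown in the solution.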
  • 26. Problem 2 Every molecule in an SD file is missing the name. To be compatible with proprietary program X, we need to set the name equal to the value of the field “chem.name”.
        (MISSING NAME!)
        MOE2004 3D
        6 3 0 0 0 0 0 0 0 0999 V2000
        -0.6900 -0.6620  0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
         0.5850 -1.9590  0.8240 H 0 0 0 0 0 0 0 0 0 0 0 0
         0.5350  1.5040  0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
         0.6060  2.0500  0.8460 H 0 0 0 0 0 0 0 0 0 0 0 0
         1.3040  0.8520 -0.0460 H 0 0 0 0 0 0 0 0 0 0 0 0
        1 2 1 0 0 0 0
        1 3 1 0 0 0 0
        1 4 1 0 0 0 0
        M END
        > <chem.name>
        1,2-Diaminoethane
        > <molecular.weight>
        60.0995
        $$$$
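  • 27. Solution
        from sdparser import SDFile
        # Assumption: SDFile accepts a mode argument and provides write() and
        # close(), as the calls below imply; sdparser.py is not shown in this talk
        inputfile = SDFile("mddr_complete.sd")
        outputfile = SDFile("mddr_withnames.sd", "w")
        for molecule in inputfile:
            # The field is called "chem.name" in the SD file of Problem 2
            molecule.name = molecule.fields['chem.name']
            outputfile.write(molecule)
        inputfile.close()
        outputfile.close()
        print "We are the knights who say....SD!!!"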
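Because sdparser.py is not shown, here is a minimal self-contained sketch of the same renaming task in Python 3 that works directly on the lines of an SD file. The function names `add_names` and `_with_name` and the abbreviated sample record are hypothetical; the only assumptions about the format are that the name is the first line of each record, that records end with `$$$$`, and that a field value follows its `> <field>` header on the next line.

```python
def _with_name(record, field):
    """Return the record with its first line replaced by the value of `field`."""
    tag = "<%s>" % field
    # Stop one line early so that record[i + 1] always exists
    for i, line in enumerate(record[:-1]):
        if line.startswith(">") and tag in line:
            return [record[i + 1]] + record[1:]
    return record   # field not found: leave the record unchanged


def add_names(lines, field="chem.name"):
    """Yield the lines of an SD file with each record's name set from `field`."""
    record = []
    for line in lines:
        line = line.rstrip("\n")
        record.append(line)
        if line.startswith("$$$$"):
            yield from _with_name(record, field)
            record = []
    # Pass through any trailing lines that are not terminated by $$$$
    yield from record


# Abbreviated, illustrative record with an empty name line
sample = [
    "\n",
    "  MOE2004           3D\n",
    "\n",
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n",
    "M  END\n",
    "> <chem.name>\n",
    "1,2-Diaminoethane\n",
    "\n",
    "$$$$\n",
]

fixed = list(add_names(sample))
print(fixed[0])
```

The yielded lines carry no newline characters, so writing them to a file would need something like `"\n".join(fixed) + "\n"`.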
  • 29. Python and Java ● It's easy to use Java libraries from Python – using either Jython or JPype – see http://www.redbrick.dcu.ie/~noel/CDKJython.html
      Example: using the CDK to calculate the number of rings in a molecule (given a string variable containing CML)
        from jpype import *
        startJVM("jdk1.5.0_03/jre/lib/i386/server/libjvm.so")
        cdk = JPackage("org").openscience.cdk
        SSSRFinder = cdk.ringsearch.SSSRFinder
        CMLReader = cdk.io.CMLReader
        def getNumRings(molXmlValue):
            # molXmlValue is a string containing the molecule as CML
            # Convert to a CDK molecule
            reader = CMLReader(java.io.StringReader(molXmlValue))
            chemFile = reader.read(cdk.ChemFile())
            cdkMol = chemFile.getChemSequence(0).getChemModel(0).getSetOfMolecules().getMolecule(0)
            # Calculate the number of rings (size of the smallest set of smallest rings)
            sssrFinder = SSSRFinder(cdkMol)
            sssr = sssrFinder.findSSSR().size()
            return sssr
  • 30. 3D visualisation ● VTK (Visualisation Toolkit) from Kitware – open source, freely available – scalar, tensor, vector and volumetric methods – advanced modeling techniques such as implicit modelling, polygon reduction, mesh smoothing, cutting, contouring, and Delaunay triangulation ● MayaVi – easy to use GUI interface to VTK, written in Python – can create input files and visualise them using Python scripts Demo
  • 31. Python Resources ● http://www.python.org ● Guido's Tutorial – http://www.python.org/doc/current/tut/tut.html ● O'Reilly's “Learning Python” or Visual Quickstart Guide to Python – Make sure it's Python 2.3 or 2.4 though ● For Windows, consider the Enthought edition – http://www.enthought.com/