SlideShare une entreprise Scribd logo
1  sur  32
Télécharger pour lire hors ligne
minutes
Python for Chemistry in 21 days



                Dr. Noel O'Boyle

   Dr. John Mitchell and Prof. Peter Murray-Rust

                 UCC Talk, Sept 2005
 Available at http://www-mitchell.ch.cam.ac.uk/noel/
Introduction
●   This talk will cover
     –   what Python is
     –   why you should be interested in it
     –   how you can use Python in chemistry
Introduction
●   This talk will cover
     –   what Python is
     –   why you should be interested in it
     –   how you can use Python in chemistry
●   This talk will not cover
     –   how to program in Python
●   See references at end of talk



                       Word of warning!!
“My mental eye could now distinguish larger structures, of
manifold conformation; long rows, sometimes more closely
fitted together; all twining and twisting in snakelike motion.
But look! What was that? One of the snakes had seized
hold of its own tail, and the form whirled mockingly before
my eyes. As if by the flash of lightning I awoke...Let us learn
to dream, gentlemen”
                          Friedrich August Kekulé (1829-1896)
What is Python?
●   For a computer scientist...
    –   a high-level programming language
    –   interpreted (byte-compiled)
    –   dynamic typing
    –   object-oriented
What is Python?
●   For a computer scientist...
    –   a high-level programming language
    –   interpreted (byte-compiled)
    –   dynamic typing
    –   object-oriented
●   For everyone else...
    –   a scripting language (like Perl or Ruby) released by
        Guido von Rossum in 1991
    –   easy to learn
    –   easy to read (!)
What is Python?
●   For a computer scientist...
    –   a high-level programming language
    –   interpreted (byte-compiled)
    –   dynamic typing
    –   object-oriented
●   For everyone else...
    –   a scripting language (like Perl or Ruby) released by
        Guido von Rossum in 1991
    –   easy to learn
    –   easy to read (!)
    –   named after Cambridge comedians
The Great Debate

Sir Lancelot:
We were in the nick of time. You were in great Perl.
Sir Galahad:
I don't think I was.
Sir Lancelot:
You were, Sir Galahad. You were in terrible Perl.
Sir Galahad:
Look, let me go back in there and face the Perl.
Sir Lancelot:
No, it's too perilous.

                      (adapted from Monty Python and the Holy Grail)
Why you should be interested (1)

●   Python has been adopted by the cheminformatics community
●   For example, AstraZeneca has moved some of its codebase
    from 'the other scripting language' to Python


Job Description – Research Software Developer/Informatician
[section deleted]
Required Skills:

At least one object oriented programming language, e.g., Python, C++, Java.
Web-based application development (design/construction/maintenance)
UNIX, UNIX scripting & Linux OS

                            Position in AstraZeneca R&D, 02-09-05
Why you should be interested (2)

●   Scientific computing: scipy/pylab (like Matlab)
●   Molecular dynamics: MMTK
●   Statistics: scipy, rpy (R), pychem
●   3D-visualisation: VTK (mayavi)
●   2D-visualisation: scipy, pylab, rpy
●   coming soon, a wrapper around OpenBabel
●   cheminformatics: OEChem, frowns, PyDaylight, pychem
●   bioinformatics: BioPython
●   structural biology: PyMOL
●   computational chemistry: GaussSum
●   you can still use Java libraries...like the CDK
>>> from scipy import *
         >>> from pylab import *
         >>> a = arange(0,4)       > a <- seq(0,3)
         >>> a                     >a
scipy/                                               R
         [0,1,2,3]                 [1] 0 1 2 3
pylab    >>> mean(a)               > mean(a)
         1.5                       [1] 1.5
         >>> a**2                  > a**2
         [0,1,4,9]                 [1] 0 1 4 9
         >>> plot(a,a**2)          > plot(a,a**2)
         >>> show()                > lines(a,a**2)
Scipy
–   cluster: information theory functions (currently, vq and kmeans)
–   weave: compilation of numeric expressions to C++ for fast execution
–   cow: parallel programming via a Cluster Of Workstations
–   fftpack: fast Fourier transform module based on fftpack and fftw when
    available
–   ga: genetic algorithms
–   io: reading and writing numeric arrays, MATLAB .mat, and Matrix
    Market .mtx files
–   integrate: numeric integration for bounded and unbounded ranges. ODE
    solvers.
–   interpolate: interpolation of values from a sample data set.
–   optimize: constrained and unconstrained optimization methods and
    root-finding algorithms
–   signal: signal processing (1-D and 2-D filtering, filter design, LTI
    systems, etc.)
–   special: special function types (bessel, gamma, airy, etc.)
–   stats: statistical functions (stdev, var, mean, etc.)
–   linalg: linear algebra and BLAS routines based on the ATLAS
    implementation of LAPACK
–   sparse: Some sparse matrix support. LU factorization and solving
    sparse linear systems
Scipy statistical functions

●   descriptive statistics: variance, standard deviation, standard error, mean,
    mode, median
●   correlation: Pearson r, Spearman r, Kendall tau
●   statistical tests: chi-squared, t-tests, binomial, Wilcoxon, Kruskal,
    Kolmogorov-Smirnov, Anderson, etc.

●   linear regression

●   analysis of variance (ANOVA)

●   (and more)
pylab
pychem: Using scipy for chemoinformatics

●   Many multivariate analysis techniques are based on matrix
    algebra
●   scipy has wrappers around well-known C and Fortran
    numerical libraries (ATLAS, LAPACK)
●   pychem can do:
     –   principal component analysis, partial least squares
         regression, Fisher's discriminant analysis
     –   clustering: k-means, hierarchical (used Open Source
         clustering library)
     –   feature selection: genetic algorithms with PLS
     –   additional methods can be added
Python and R
  ●   Advantages of R:
       –   a large number of statistical libraries are available
  ●   Disadvantages of R:
       –   difficult to write algorithms
       –   slow (most R libraries are written in C)
       –   chokes on large datasets (use scan instead of read.table)

           Reading in data                   Principal component analysis
Method         300K    600K    1.6M        Method         300K    600K     1.6M
Python             6.8    13.9      41     Python             2.2      3.6      42
R (read.table)      42     105             R (read.table)       5       10
R (scan)             9      20      56     R (scan)             3        5      29




                                http://www.redbrick.dcu.ie/~noel/RversusPython.html
Python and R
●   rpy module allows Python programs to interface with R
●   have the best of both worlds
     –   access to the statistical functions of R
     –   access to the numerous modules available for Python
     –   can program in Python, instead of in R!!

             Python                                    R
>>> from rpy import r
>>> x = [5.05, 6.75, 3.21, 2.66]         > x <- c(5.05, 6.75, 3.21, 2.66)
>>> y = [1.65, 26.5, -5.93, 7.96]        > y <- c(1.65, 26.5, -5.93, 7.96)
>>> print r.lsfit(x,y)['coefficients']   > lsfit(x, y)$coefficients
{'X': 5.3935773611970212,                 Intercept        X
 'Intercept': -16.281127993087839}       -16.281128 5.393577
Python and R
                                             > hc$merge
Problem: Analyse a hierarchical clustering         [,1] [,2]
                                               [1,] -32 -33
Solution: Use R to cluster, and Python to      [2,] -39 -71
analyse the merge object of the cluster        [3,] -43 -47
                                               [4,] -10 -55
                                               [5,] -19 -36
                                               [6,] -5 -24
                                               [7,] -62 -63
                                               [8,] -74 -75
                                               [9,] -35 -76
                                              [10,] -1 -84
                                              [11,] -41 -42
                                              [12,] -83 -96
                                              [13,] -2 -29
                                              [14,] -7 -21
                                              [15,] -61 4
Problem 1
Graphically show the distribution of molecular weights of
molecules in an SD file. The molecular weight is stored in
a field of the SD file.

1,2-Diaminoethane
  MOE2004           3D

  6 3 0 0 0 0 0 0 0 0999 V2000
   -0.6900   -0.6620  0.0000 C 0   0   0   0   0   0   0   0   0   0   0   0
    0.5850   -1.9590  0.8240 H 0   0   0   0   0   0   0   0   0   0   0   0
    0.5350    1.5040  0.0000 N 0   0   0   0   0   0   0   0   0   0   0   0
    0.6060    2.0500  0.8460 H 0   0   0   0   0   0   0   0   0   0   0   0
    1.3040    0.8520 -0.0460 H 0   0   0   0   0   0   0   0   0   0   0   0
  1 2 1 0 0 0 0
  1 3 1 0 0 0 0
  1 4 1 0 0 0 0
M END
> <chem.name>
1,2-Diaminoethane

> <molecular.weight>
60.0995

$$$$
Object-oriented approach

 object                        SD file



 object           Molecule 1      Molecule 2      Molecule n


attributes                               fields
               name        atoms

  The code for creating these objects is stored in sdparser.py,
  which can be imported and used by any scripts that need to
  parse SD files.
Solution

from sdparser import SDFile
from rpy import *

inputfile = "mddr_complete.sd"

allmolweights = []

for molecule in SDFile(inputfile):
   molweight = molecule.fields['molecular.weight']
   allmolweights.append(float(molweight))

r.png(file="molwt_r.png")
r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red")
r.dev_off()
Solution

from sdparser import SDFile
from rpy import *

inputfile = "mddr_complete.sd"

allmolweights = []

for molecule in SDFile(inputfile):
   molweight = molecule.fields['molecular.weight']
   allmolweights.append(float(molweight))

r.png(file="molwt_r.png")
r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red")
r.dev_off()
Solution

from sdparser import SDFile
from rpy import *

inputfile = "mddr_complete.sd"

allmolweights = []

for molecule in SDFile(inputfile):
   molweight = molecule.fields['molecular.weight']
   allmolweights.append(float(molweight))

r.png(file="molwt_r.png")
r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red")
r.dev_off()
Solution

from sdparser import SDFile
from rpy import *

inputfile = "mddr_complete.sd"

allmolweights = []

for molecule in SDFile(inputfile):
   molweight = molecule.fields['molecular.weight']
   allmolweights.append(float(molweight))

r.png(file="molwt_r.png")
r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red")
r.dev_off()
Problem 2
Every molecule in an SD file is missing the name. To be
compatible with proprietary program X, we need to set the
name equal to the value of the field “chem.name”.

(MISSING NAME!)
  MOE2004          3D

  6 3 0 0 0 0 0 0 0 0999 V2000
   -0.6900   -0.6620  0.0000 C 0   0   0   0   0   0   0   0   0   0   0   0
    0.5850   -1.9590  0.8240 H 0   0   0   0   0   0   0   0   0   0   0   0
    0.5350    1.5040  0.0000 N 0   0   0   0   0   0   0   0   0   0   0   0
    0.6060    2.0500  0.8460 H 0   0   0   0   0   0   0   0   0   0   0   0
    1.3040    0.8520 -0.0460 H 0   0   0   0   0   0   0   0   0   0   0   0
  1 2 1 0 0 0 0
  1 3 1 0 0 0 0
  1 4 1 0 0 0 0
M END
> <chem.name>
1,2-Diaminoethane

> <molecular.weight>
60.0995

$$$$
Solution

from sdparser import SDFile

inputfile = "mddr_complete.sd"
outputfile = "mddr_withnames.sd"

for molecule in SDFile(inputfile):
   molecule.name = molecule.fields['chemical.name']
   outputfile.write(molecule)

inputfile.close()
outputfile.close()

print "We are the knights who say....SD!!!"
Solution

from sdparser import SDFile

inputfile = "mddr_complete.sd"
outputfile = "mddr_withnames.sd"

for molecule in SDFile(inputfile):
   molecule.name = molecule.fields['chemical.name']
   outputfile.write(molecule)

inputfile.close()
outputfile.close()

print "We are the knights who say....SD!!!"
Python and Java
 ●  It's easy to use Java libraries from Python
      – using either Jython or JPype
      – see http://www.redbrick.dcu.ie/~noel/CDKJython.html

Example: using the CDK to calculate the number of rings in a molecule
(given a string variable containing CML)
from jpype import *
startJVM("jdk1.5.0_03/jre/lib/i386/server/libjvm.so")
cdk = JPackage("org").openscience.cdk
SSSRFinder = cdk.ringsearch.SSSRFinder
CMLReader = cdk.io.CMLReader

def getNumRings(molecule):
  # Convert to a CDK molecule
  reader = CMLReader(java.io.StringReader(molXmlValue))
  chemFile = reader.read(cdk.ChemFile())
  cdkMol = chemFile.getChemSequence(0).getChemModel(0).getSetOfMolecules().getMolecule(0)
  # Calculate the number of rings
  sssrFinder = SSSRFinder(cdkMol)
  sssr = sssrFinder.findSSSR().size()
  return sssr
3D visualisation
●   VTK (Visualisation Toolkit) from Kitware
    –   open source, freely available
    –   scalar, tensor, vector and volumetric methods
    –   advanced modeling techniques such as implicit modelling,
        polygon reduction, mesh smoothing, cutting, contouring, and
        Delaunay triangulation
●   MayaVi
    –   easy to use GUI interface to VTK, written in Python
    –   can create input files and visualise them using Python scripts

                                 Demo
Python Resources
●   http://www.python.org
●   Guido's Tutorial
     –   http://www.python.org/doc/current/tut/tut.html
●   O'Reilly's “Learning Python” or Visual Quickstart Guide
    to Python
     –   Make sure it's Python 2.3 or 2.4 though
●   For Windows, consider the Enthought edition
     –   http://www.enthought.com/
Python for Chemistry

Contenu connexe

Tendances

Nanotechnology in Renewable Energy Sources
Nanotechnology in Renewable Energy SourcesNanotechnology in Renewable Energy Sources
Nanotechnology in Renewable Energy SourcesSteevan Sequeira
 
Synthesis and properties of Polyaniline
Synthesis and properties of PolyanilineSynthesis and properties of Polyaniline
Synthesis and properties of PolyanilineAwad Albalwi
 
Conducting Polymer By Imran Aziz
Conducting Polymer By Imran AzizConducting Polymer By Imran Aziz
Conducting Polymer By Imran AzizDr.imran aziz
 
Graphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials ScienceGraphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials Scienceaimsnist
 
Carbon containing Nanomaterials: Fullerenes & Carbon nanotubes
Carbon containing Nanomaterials: Fullerenes & Carbon nanotubesCarbon containing Nanomaterials: Fullerenes & Carbon nanotubes
Carbon containing Nanomaterials: Fullerenes & Carbon nanotubesMayur D. Chauhan
 
6 nanomaterials
6 nanomaterials6 nanomaterials
6 nanomaterialsEkeeda
 
Chemical and electrochem method of synthesis of polyaniline and polythiophene...
Chemical and electrochem method of synthesis of polyaniline and polythiophene...Chemical and electrochem method of synthesis of polyaniline and polythiophene...
Chemical and electrochem method of synthesis of polyaniline and polythiophene...Mugilan Narayanasamy
 
Schrödinger wave equation
Schrödinger wave equationSchrödinger wave equation
Schrödinger wave equationHARSHWALIA9
 
Master Thesis
Master ThesisMaster Thesis
Master ThesisAli Fathi
 
Fabrication and Characterization of 2D Titanium Carbide MXene Nanosheets
Fabrication and Characterization of 2D Titanium Carbide MXene NanosheetsFabrication and Characterization of 2D Titanium Carbide MXene Nanosheets
Fabrication and Characterization of 2D Titanium Carbide MXene NanosheetsBecker Budwan
 
Surface modification of nanomaterials
Surface modification of nanomaterialsSurface modification of nanomaterials
Surface modification of nanomaterialszenziyan
 
Density Functional Theory
Density Functional TheoryDensity Functional Theory
Density Functional Theorykrishslide
 
Polymers Used in Everyday Life
Polymers Used in Everyday LifePolymers Used in Everyday Life
Polymers Used in Everyday Lifeijtsrd
 
Synthetic polymers - a content written by Dr.Lali Thomas Kotturan about man ...
Synthetic polymers -  a content written by Dr.Lali Thomas Kotturan about man ...Synthetic polymers -  a content written by Dr.Lali Thomas Kotturan about man ...
Synthetic polymers - a content written by Dr.Lali Thomas Kotturan about man ...lalikotturan
 

Tendances (20)

Nanotechnology in Renewable Energy Sources
Nanotechnology in Renewable Energy SourcesNanotechnology in Renewable Energy Sources
Nanotechnology in Renewable Energy Sources
 
Synthesis and properties of Polyaniline
Synthesis and properties of PolyanilineSynthesis and properties of Polyaniline
Synthesis and properties of Polyaniline
 
Conducting Polymer By Imran Aziz
Conducting Polymer By Imran AzizConducting Polymer By Imran Aziz
Conducting Polymer By Imran Aziz
 
Graphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials ScienceGraphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials Science
 
Carbon containing Nanomaterials: Fullerenes & Carbon nanotubes
Carbon containing Nanomaterials: Fullerenes & Carbon nanotubesCarbon containing Nanomaterials: Fullerenes & Carbon nanotubes
Carbon containing Nanomaterials: Fullerenes & Carbon nanotubes
 
6 nanomaterials
6 nanomaterials6 nanomaterials
6 nanomaterials
 
Chemical and electrochem method of synthesis of polyaniline and polythiophene...
Chemical and electrochem method of synthesis of polyaniline and polythiophene...Chemical and electrochem method of synthesis of polyaniline and polythiophene...
Chemical and electrochem method of synthesis of polyaniline and polythiophene...
 
Schrödinger wave equation
Schrödinger wave equationSchrödinger wave equation
Schrödinger wave equation
 
Dft calculation by vasp
Dft calculation by vaspDft calculation by vasp
Dft calculation by vasp
 
Master Thesis
Master ThesisMaster Thesis
Master Thesis
 
s3-Ellipsometry.ppt
s3-Ellipsometry.ppts3-Ellipsometry.ppt
s3-Ellipsometry.ppt
 
Introduction to DFT Part 2
Introduction to DFT Part 2Introduction to DFT Part 2
Introduction to DFT Part 2
 
Quantum
QuantumQuantum
Quantum
 
Polymer industries
Polymer industriesPolymer industries
Polymer industries
 
Fabrication and Characterization of 2D Titanium Carbide MXene Nanosheets
Fabrication and Characterization of 2D Titanium Carbide MXene NanosheetsFabrication and Characterization of 2D Titanium Carbide MXene Nanosheets
Fabrication and Characterization of 2D Titanium Carbide MXene Nanosheets
 
Surface modification of nanomaterials
Surface modification of nanomaterialsSurface modification of nanomaterials
Surface modification of nanomaterials
 
Density Functional Theory
Density Functional TheoryDensity Functional Theory
Density Functional Theory
 
Graphene
GrapheneGraphene
Graphene
 
Polymers Used in Everyday Life
Polymers Used in Everyday LifePolymers Used in Everyday Life
Polymers Used in Everyday Life
 
Synthetic polymers - a content written by Dr.Lali Thomas Kotturan about man ...
Synthetic polymers -  a content written by Dr.Lali Thomas Kotturan about man ...Synthetic polymers -  a content written by Dr.Lali Thomas Kotturan about man ...
Synthetic polymers - a content written by Dr.Lali Thomas Kotturan about man ...
 

Similaire à Python for Chemistry

Seven waystouseturtle pycon2009
Seven waystouseturtle pycon2009Seven waystouseturtle pycon2009
Seven waystouseturtle pycon2009A Jorge Garcia
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptSanket Shikhar
 
Migrating from matlab to python
Migrating from matlab to pythonMigrating from matlab to python
Migrating from matlab to pythonActiveState
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」Preferred Networks
 
A Workshop on R
A Workshop on RA Workshop on R
A Workshop on RAjay Ohri
 
Data Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsData Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsKrishna Sankar
 
Using R in remote computer clusters
Using R in remote computer clustersUsing R in remote computer clusters
Using R in remote computer clustersBurak Himmetoglu
 
RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)Daniel Nüst
 
Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitGreg Landrum
 
Reproducible Computational Research in R
Reproducible Computational Research in RReproducible Computational Research in R
Reproducible Computational Research in RSamuel Bosch
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimizationg3_nittala
 
2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumertirlukachaitanya
 
Python For Scientists
Python For ScientistsPython For Scientists
Python For Scientistsaeberspaecher
 
Regex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadRegex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadAll Things Open
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging EnvironmentsPaul Groth
 
DRUG - RDSTK Talk
DRUG - RDSTK TalkDRUG - RDSTK Talk
DRUG - RDSTK Talkrtelmore
 
R and RMarkdown crash course
R and RMarkdown crash courseR and RMarkdown crash course
R and RMarkdown crash courseOlga Scrivner
 

Similaire à Python for Chemistry (20)

Seven waystouseturtle pycon2009
Seven waystouseturtle pycon2009Seven waystouseturtle pycon2009
Seven waystouseturtle pycon2009
 
Python Orientation
Python OrientationPython Orientation
Python Orientation
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.ppt
 
Migrating from matlab to python
Migrating from matlab to pythonMigrating from matlab to python
Migrating from matlab to python
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
 
A Workshop on R
A Workshop on RA Workshop on R
A Workshop on R
 
Data Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsData Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science Competitions
 
Using R in remote computer clusters
Using R in remote computer clustersUsing R in remote computer clusters
Using R in remote computer clusters
 
RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)
 
Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKit
 
Reproducible Computational Research in R
Reproducible Computational Research in RReproducible Computational Research in R
Reproducible Computational Research in R
 
DS LAB MANUAL.pdf
DS LAB MANUAL.pdfDS LAB MANUAL.pdf
DS LAB MANUAL.pdf
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimization
 
2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer
 
Python For Scientists
Python For ScientistsPython For Scientists
Python For Scientists
 
Regex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadRegex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language Instead
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging Environments
 
DRUG - RDSTK Talk
DRUG - RDSTK TalkDRUG - RDSTK Talk
DRUG - RDSTK Talk
 
R for hadoopers
R for hadoopersR for hadoopers
R for hadoopers
 
R and RMarkdown crash course
R and RMarkdown crash courseR and RMarkdown crash course
R and RMarkdown crash course
 

Plus de baoilleach

We need to talk about Kekulization, Aromaticity and SMILES
We need to talk about Kekulization, Aromaticity and SMILESWe need to talk about Kekulization, Aromaticity and SMILES
We need to talk about Kekulization, Aromaticity and SMILESbaoilleach
 
Open Babel project overview
Open Babel project overviewOpen Babel project overview
Open Babel project overviewbaoilleach
 
So I have an SD File... What do I do next?
So I have an SD File... What do I do next?So I have an SD File... What do I do next?
So I have an SD File... What do I do next?baoilleach
 
Chemistrify the Web
Chemistrify the WebChemistrify the Web
Chemistrify the Webbaoilleach
 
Universal Smiles: Finally a canonical SMILES string
Universal Smiles: Finally a canonical SMILES stringUniversal Smiles: Finally a canonical SMILES string
Universal Smiles: Finally a canonical SMILES stringbaoilleach
 
What's New and Cooking in Open Babel 2.3.2
What's New and Cooking in Open Babel 2.3.2What's New and Cooking in Open Babel 2.3.2
What's New and Cooking in Open Babel 2.3.2baoilleach
 
Intro to Open Babel
Intro to Open BabelIntro to Open Babel
Intro to Open Babelbaoilleach
 
Protein-ligand docking
Protein-ligand dockingProtein-ligand docking
Protein-ligand dockingbaoilleach
 
Cheminformatics
CheminformaticsCheminformatics
Cheminformaticsbaoilleach
 
Making the most of a QM calculation
Making the most of a QM calculationMaking the most of a QM calculation
Making the most of a QM calculationbaoilleach
 
Data Analysis in QSAR
Data Analysis in QSARData Analysis in QSAR
Data Analysis in QSARbaoilleach
 
Large-scale computational design and selection of polymers for solar cells
Large-scale computational design and selection of polymers for solar cellsLarge-scale computational design and selection of polymers for solar cells
Large-scale computational design and selection of polymers for solar cellsbaoilleach
 
My Open Access papers
My Open Access papersMy Open Access papers
My Open Access papersbaoilleach
 
Improving the quality of chemical databases with community-developed tools (a...
Improving the quality of chemical databases with community-developed tools (a...Improving the quality of chemical databases with community-developed tools (a...
Improving the quality of chemical databases with community-developed tools (a...baoilleach
 
De novo design of molecular wires with optimal properties for solar energy co...
De novo design of molecular wires with optimal properties for solar energy co...De novo design of molecular wires with optimal properties for solar energy co...
De novo design of molecular wires with optimal properties for solar energy co...baoilleach
 
Cinfony - Bring cheminformatics toolkits into tune
Cinfony - Bring cheminformatics toolkits into tuneCinfony - Bring cheminformatics toolkits into tune
Cinfony - Bring cheminformatics toolkits into tunebaoilleach
 
Density functional theory calculations on Ruthenium polypyridyl complexes inc...
Density functional theory calculations on Ruthenium polypyridyl complexes inc...Density functional theory calculations on Ruthenium polypyridyl complexes inc...
Density functional theory calculations on Ruthenium polypyridyl complexes inc...baoilleach
 
Application of Density Functional Theory to Scanning Tunneling Microscopy
Application of Density Functional Theory to Scanning Tunneling MicroscopyApplication of Density Functional Theory to Scanning Tunneling Microscopy
Application of Density Functional Theory to Scanning Tunneling Microscopybaoilleach
 
Towards Practical Molecular Devices
Towards Practical Molecular DevicesTowards Practical Molecular Devices
Towards Practical Molecular Devicesbaoilleach
 
Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...baoilleach
 

Plus de baoilleach (20)

We need to talk about Kekulization, Aromaticity and SMILES
We need to talk about Kekulization, Aromaticity and SMILESWe need to talk about Kekulization, Aromaticity and SMILES
We need to talk about Kekulization, Aromaticity and SMILES
 
Open Babel project overview
Open Babel project overviewOpen Babel project overview
Open Babel project overview
 
So I have an SD File... What do I do next?
So I have an SD File... What do I do next?So I have an SD File... What do I do next?
So I have an SD File... What do I do next?
 
Chemistrify the Web
Chemistrify the WebChemistrify the Web
Chemistrify the Web
 
Universal Smiles: Finally a canonical SMILES string
Universal Smiles: Finally a canonical SMILES stringUniversal Smiles: Finally a canonical SMILES string
Universal Smiles: Finally a canonical SMILES string
 
What's New and Cooking in Open Babel 2.3.2
What's New and Cooking in Open Babel 2.3.2What's New and Cooking in Open Babel 2.3.2
What's New and Cooking in Open Babel 2.3.2
 
Intro to Open Babel
Intro to Open BabelIntro to Open Babel
Intro to Open Babel
 
Protein-ligand docking
Protein-ligand dockingProtein-ligand docking
Protein-ligand docking
 
Cheminformatics
CheminformaticsCheminformatics
Cheminformatics
 
Making the most of a QM calculation
Making the most of a QM calculationMaking the most of a QM calculation
Making the most of a QM calculation
 
Data Analysis in QSAR
Data Analysis in QSARData Analysis in QSAR
Data Analysis in QSAR
 
Large-scale computational design and selection of polymers for solar cells
Large-scale computational design and selection of polymers for solar cellsLarge-scale computational design and selection of polymers for solar cells
Large-scale computational design and selection of polymers for solar cells
 
My Open Access papers
My Open Access papersMy Open Access papers
My Open Access papers
 
Improving the quality of chemical databases with community-developed tools (a...
Improving the quality of chemical databases with community-developed tools (a...Improving the quality of chemical databases with community-developed tools (a...
Improving the quality of chemical databases with community-developed tools (a...
 
De novo design of molecular wires with optimal properties for solar energy co...
De novo design of molecular wires with optimal properties for solar energy co...De novo design of molecular wires with optimal properties for solar energy co...
De novo design of molecular wires with optimal properties for solar energy co...
 
Cinfony - Bring cheminformatics toolkits into tune
Cinfony - Bring cheminformatics toolkits into tuneCinfony - Bring cheminformatics toolkits into tune
Cinfony - Bring cheminformatics toolkits into tune
 
Density functional theory calculations on Ruthenium polypyridyl complexes inc...
Density functional theory calculations on Ruthenium polypyridyl complexes inc...Density functional theory calculations on Ruthenium polypyridyl complexes inc...
Density functional theory calculations on Ruthenium polypyridyl complexes inc...
 
Application of Density Functional Theory to Scanning Tunneling Microscopy
Application of Density Functional Theory to Scanning Tunneling MicroscopyApplication of Density Functional Theory to Scanning Tunneling Microscopy
Application of Density Functional Theory to Scanning Tunneling Microscopy
 
Towards Practical Molecular Devices
Towards Practical Molecular DevicesTowards Practical Molecular Devices
Towards Practical Molecular Devices
 
Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...
 

Dernier

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 

Dernier (20)

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 

Python for Chemistry

  • 1. minutes Python for Chemistry in 21 days Dr. Noel O'Boyle Dr. John Mitchell and Prof. Peter Murray-Rust UCC Talk, Sept 2005 Available at http://www-mitchell.ch.cam.ac.uk/noel/
  • 2. Introduction ● This talk will cover – what Python is – why you should be interested in it – how you can use Python in chemistry
  • 3. Introduction ● This talk will cover – what Python is – why you should be interested in it – how you can use Python in chemistry ● This talk will not cover – how to program in Python ● See references at end of talk Word of warning!!
  • 4. “My mental eye could now distinguish larger structures, of manifold conformation; long rows, sometimes more closely fitted together; all twining and twisting in snakelike motion. But look! What was that? One of the snakes had seized hold of its own tail, and the form whirled mockingly before my eyes. As if by the flash of lightning I awoke...Let us learn to dream, gentlemen” Friedrich August Kekulé (1829-1896)
  • 5. What is Python? ● For a computer scientist... – a high-level programming language – interpreted (byte-compiled) – dynamic typing – object-oriented
  • 6. What is Python? ● For a computer scientist... – a high-level programming language – interpreted (byte-compiled) – dynamic typing – object-oriented ● For everyone else... – a scripting language (like Perl or Ruby) released by Guido von Rossum in 1991 – easy to learn – easy to read (!)
  • 7. What is Python? ● For a computer scientist... – a high-level programming language – interpreted (byte-compiled) – dynamic typing – object-oriented ● For everyone else... – a scripting language (like Perl or Ruby) released by Guido von Rossum in 1991 – easy to learn – easy to read (!) – named after Cambridge comedians
  • 8. The Great Debate Sir Lancelot: We were in the nick of time. You were in great Perl. Sir Galahad: I don't think I was. Sir Lancelot: You were, Sir Galahad. You were in terrible Perl. Sir Galahad: Look, let me go back in there and face the Perl. Sir Lancelot: No, it's too perilous. (adapted from Monty Python and the Holy Grail)
  • 9. Why you should be interested (1) ● Python has been adopted by the cheminformatics community ● For example, AstraZeneca has moved some of its codebase from 'the other scripting language' to Python Job Description – Research Software Developer/Informatician [section deleted] Required Skills: At least one object oriented programming language, e.g., Python, C++, Java. Web-based application development (design/construction/maintenance) UNIX, UNIX scripting & Linux OS Position in AstraZeneca R&D, 02-09-05
  • 10. Why you should be interested (2) ● Scientific computing: scipy/pylab (like Matlab) ● Molecular dynamics: MMTK ● Statistics: scipy, rpy (R), pychem ● 3D-visualisation: VTK (mayavi) ● 2D-visualisation: scipy, pylab, rpy ● coming soon, a wrapper around OpenBabel ● cheminformatics: OEChem, frowns, PyDaylight, pychem ● bioinformatics: BioPython ● structural biology: PyMOL ● computational chemistry: GaussSum ● you can still use Java libraries...like the CDK
  • 11. >>> from scipy import * >>> from pylab import * >>> a = arange(0,4) > a <- seq(0,3) >>> a >a scipy/ R [0,1,2,3] [1] 0 1 2 3 pylab >>> mean(a) > mean(a) 1.5 [1] 1.5 >>> a**2 > a**2 [0,1,4,9] [1] 0 1 4 9 >>> plot(a,a**2) > plot(a,a**2) >>> show() > lines(a,a**2)
  • 12. Scipy – cluster: information theory functions (currently, vq and kmeans) – weave: compilation of numeric expressions to C++ for fast execution – cow: parallel programming via a Cluster Of Workstations – fftpack: fast Fourier transform module based on fftpack and fftw when available – ga: genetic algorithms – io: reading and writing numeric arrays, MATLAB .mat, and Matrix Market .mtx files – integrate: numeric integration for bounded and unbounded ranges. ODE solvers. – interpolate: interpolation of values from a sample data set. – optimize: constrained and unconstrained optimization methods and root-finding algorithms – signal: signal processing (1-D and 2-D filtering, filter design, LTI systems, etc.) – special: special function types (bessel, gamma, airy, etc.) – stats: statistical functions (stdev, var, mean, etc.) – linalg: linear algebra and BLAS routines based on the ATLAS implementation of LAPACK – sparse: Some sparse matrix support. LU factorization and solving sparse linear systems
  • 13. Scipy statistical functions ● descriptive statistics: variance, standard deviation, standard error, mean, mode, median ● correlation: Pearson r, Spearman r, Kendall tau ● statistical tests: chi-squared, t-tests, binomial, Wilcoxon, Kruskal, Kolmogorov-Smirnov, Anderson, etc. ● linear regression ● analysis of variance (ANOVA) ● (and more)
  • 14. pylab
  • 15. pychem: Using scipy for chemoinformatics ● Many multivariate analysis techniques are based on matrix algebra ● scipy has wrappers around well-known C and Fortran numerical libraries (ATLAS, LAPACK) ● pychem can do: – principal component analysis, partial least squares regression, Fisher's discriminant analysis – clustering: k-means, hierarchical (used Open Source clustering library) – feature selection: genetic algorithms with PLS – additional methods can be added
  • 16. Python and R ● Advantages of R: – a large number of statistical libraries are available ● Disadvantages of R: – difficult to write algorithms – slow (most R libraries are written in C) – chokes on large datasets (use scan instead of read.table) Reading in data Principal component analysis Method 300K 600K 1.6M Method 300K 600K 1.6M Python 6.8 13.9 41 Python 2.2 3.6 42 R (read.table) 42 105 R (read.table) 5 10 R (scan) 9 20 56 R (scan) 3 5 29 http://www.redbrick.dcu.ie/~noel/RversusPython.html
  • 17. Python and R ● rpy module allows Python programs to interface with R ● have the best of both worlds – access to the statistical functions of R – access to the numerous modules available for Python – can program in Python, instead of in R!! Python R >>> from rpy import r >>> x = [5.05, 6.75, 3.21, 2.66] > x <- c(5.05, 6.75, 3.21, 2.66) >>> y = [1.65, 26.5, -5.93, 7.96] > y <- c(1.65, 26.5, -5.93, 7.96) >>> print r.lsfit(x,y)['coefficients'] > lsfit(x, y)$coefficients {'X': 5.3935773611970212, Intercept X 'Intercept': -16.281127993087839} -16.281128 5.393577
  • 18. Python and R > hc$merge Problem: Analyse a hierarchical clustering [,1] [,2] [1,] -32 -33 Solution: Use R to cluster, and Python to [2,] -39 -71 analyse the merge object of the cluster [3,] -43 -47 [4,] -10 -55 [5,] -19 -36 [6,] -5 -24 [7,] -62 -63 [8,] -74 -75 [9,] -35 -76 [10,] -1 -84 [11,] -41 -42 [12,] -83 -96 [13,] -2 -29 [14,] -7 -21 [15,] -61 4
  • 19. Problem 1 Graphically show the distribution of molecular weights of molecules in an SD file. The molecular weight is stored in a field of the SD file. 1,2-Diaminoethane MOE2004 3D 6 3 0 0 0 0 0 0 0 0999 V2000 -0.6900 -0.6620 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 0.5850 -1.9590 0.8240 H 0 0 0 0 0 0 0 0 0 0 0 0 0.5350 1.5040 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 0.6060 2.0500 0.8460 H 0 0 0 0 0 0 0 0 0 0 0 0 1.3040 0.8520 -0.0460 H 0 0 0 0 0 0 0 0 0 0 0 0 1 2 1 0 0 0 0 1 3 1 0 0 0 0 1 4 1 0 0 0 0 M END > <chem.name> 1,2-Diaminoethane > <molecular.weight> 60.0995 $$$$
  • 20. Object-oriented approach object SD file object Molecule 1 Molecule 2 Molecule n attributes fields name atoms The code for creating these objects is stored in sdparser.py, which can be imported and used by any scripts that need to parse SD files.
  • 21. Solution from sdparser import SDFile from rpy import * inputfile = "mddr_complete.sd" allmolweights = [] for molecule in SDFile(inputfile): molweight = molecule.fields['molecular.weight'] allmolweights.append(float(molweight)) r.png(file="molwt_r.png") r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red") r.dev_off()
  • 22. Solution from sdparser import SDFile from rpy import * inputfile = "mddr_complete.sd" allmolweights = [] for molecule in SDFile(inputfile): molweight = molecule.fields['molecular.weight'] allmolweights.append(float(molweight)) r.png(file="molwt_r.png") r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red") r.dev_off()
  • 23. Solution from sdparser import SDFile from rpy import * inputfile = "mddr_complete.sd" allmolweights = [] for molecule in SDFile(inputfile): molweight = molecule.fields['molecular.weight'] allmolweights.append(float(molweight)) r.png(file="molwt_r.png") r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red") r.dev_off()
  • 24. Solution from sdparser import SDFile from rpy import * inputfile = "mddr_complete.sd" allmolweights = [] for molecule in SDFile(inputfile): molweight = molecule.fields['molecular.weight'] allmolweights.append(float(molweight)) r.png(file="molwt_r.png") r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red") r.dev_off()
  • 25.
  • 26. Problem 2 Every molecule in an SD file is missing the name. To be compatible with proprietary program X, we need to set the name equal to the value of the field “chem.name”. (MISSING NAME!) MOE2004 3D 6 3 0 0 0 0 0 0 0 0999 V2000 -0.6900 -0.6620 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 0.5850 -1.9590 0.8240 H 0 0 0 0 0 0 0 0 0 0 0 0 0.5350 1.5040 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 0.6060 2.0500 0.8460 H 0 0 0 0 0 0 0 0 0 0 0 0 1.3040 0.8520 -0.0460 H 0 0 0 0 0 0 0 0 0 0 0 0 1 2 1 0 0 0 0 1 3 1 0 0 0 0 1 4 1 0 0 0 0 M END > <chem.name> 1,2-Diaminoethane > <molecular.weight> 60.0995 $$$$
  • 27. Solution from sdparser import SDFile inputfile = "mddr_complete.sd" outputfile = "mddr_withnames.sd" for molecule in SDFile(inputfile): molecule.name = molecule.fields['chemical.name'] outputfile.write(molecule) inputfile.close() outputfile.close() print "We are the knights who say....SD!!!"
  • 28. Solution from sdparser import SDFile inputfile = "mddr_complete.sd" outputfile = "mddr_withnames.sd" for molecule in SDFile(inputfile): molecule.name = molecule.fields['chemical.name'] outputfile.write(molecule) inputfile.close() outputfile.close() print "We are the knights who say....SD!!!"
  • 29. Python and Java ● It's easy to use Java libraries from Python – using either Jython or JPype – see http://www.redbrick.dcu.ie/~noel/CDKJython.html Example: using the CDK to calculate the number of rings in a molecule (given a string variable containing CML) from jpype import * startJVM("jdk1.5.0_03/jre/lib/i386/server/libjvm.so") cdk = JPackage("org").openscience.cdk SSSRFinder = cdk.ringsearch.SSSRFinder CMLReader = cdk.io.CMLReader def getNumRings(molecule): # Convert to a CDK molecule reader = CMLReader(java.io.StringReader(molXmlValue)) chemFile = reader.read(cdk.ChemFile()) cdkMol = chemFile.getChemSequence(0).getChemModel(0).getSetOfMolecules().getMolecule(0) # Calculate the number of rings sssrFinder = SSSRFinder(cdkMol) sssr = sssrFinder.findSSSR().size() return sssr
  • 30. 3D visualisation ● VTK (Visualisation Toolkit) from Kitware – open source, freely available – scalar, tensor, vector and volumetric methods – advanced modeling techniques such as implicit modelling, polygon reduction, mesh smoothing, cutting, contouring, and Delaunay triangulation ● MayaVi – easy to use GUI interface to VTK, written in Python – can create input files and visualise them using Python scripts Demo
  • 31. Python Resources ● http://www.python.org ● Guido's Tutorial – http://www.python.org/doc/current/tut/tut.html ● O'Reilly's “Learning Python” or Visual Quickstart Guide to Python – Make sure it's Python 2.3 or 2.4 though ● For Windows, consider the Enthought edition – http://www.enthought.com/