Open-source from/in the enterprise: the RDKit
Gregory Landrum
NIBR Informatics
Novartis Institutes for BioMedical Research, Basel, Switzerland
Outline
§  What is the RDKit?
§  RDKit integration with other open-source projects
•  Knime
•  PostgreSQL
•  IPython
•  Pandas
•  Lucene
§  RDKit in NIBR, some case studies
RDKit: What is it?
§  Open-source C++ toolkit for cheminformatics
§  Wrappers for Python (2.x), Java, C#
§  Functionality:
•  2D and 3D molecular operations
•  Descriptor generation for machine learning
•  PostgreSQL database cartridge for substructure and similarity searching
•  Knime nodes
•  IPython integration
•  Lucene integration (experimental)
•  Supports Mac/Windows/Linux
§  Releases every 6 months
§  Business-friendly BSD license
§  Code: https://github.com/rdkit
§  http://www.rdkit.org
The community
§  Mailing lists hosted at sourceforge: https://sourceforge.net/p/rdkit/mailman/
§  Active participants from academia, small and large pharma, software
companies, and service providers
§  30+ attendees at each of the two user group meetings
Some features
§  Input/Output: SMILES/SMARTS, SDF, TDT, PDB,
SLN [1], Corina mol2 [1]
§  “Cheminformatics”:
•  Substructure searching
•  Canonical SMILES
•  Chirality support (i.e. R/S or E/Z labeling)
•  Chemical transformations (e.g. remove matching
substructures)
•  Chemical reactions
§  2D depiction, including constrained depiction
§  2D->3D conversion/conformational analysis via
distance geometry
§  UFF and MMFF94 implementation for cleaning up
structures
§  Fingerprinting: Daylight-like, atom pairs, topological
torsions, Morgan algorithm, “MACCS keys”, etc.
§  Similarity/diversity picking
§  2D pharmacophores [1]
§  Gasteiger-Marsili charges
§  Hierarchical subgraph/fragment analysis
§  Bemis and Murcko scaffold determination
§  RECAP and BRICS implementations
§  Multi-molecule maximum common substructure
§  Feature maps
§  Shape-based similarity
§  Fraggle similarity (from GSK)
§  Molecule-molecule alignment
§  Open3DAlign implementation
§  Integration with PyMOL for 3D visualization
§  Functional group filtering
§  Salt stripping
§  Molecular descriptor library:
Topological (κ3, Balaban J, etc.), Compositional (Number
of Rings, Number of Aromatic Heterocycles, etc.),
EState, SlogP/SMR (Wildman and Crippen approach),
“MOE like” VSA descriptors, Feature-map vectors
§  Machine Learning:
•  Clustering (hierarchical)
•  Information theory (Shannon entropy, information
gain, etc.)
§  Tight integration with the IPython notebook and
pandas
§  Integration with the InChI library
[1] These implementations are functional but are not necessarily
the best, fastest, or most complete.
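A few of the features listed above (canonical SMILES, substructure searching, Morgan fingerprints, similarity) can be sketched from Python as follows. This is a minimal illustration that assumes the RDKit Python wrappers are installed; the example molecules and the SMARTS pattern are arbitrary choices and do not come from the slides.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Two different SMILES strings for ethanol, plus acetic acid for comparison.
ethanol_a = Chem.MolFromSmiles("OCC")
ethanol_b = Chem.MolFromSmiles("CCO")
acetic_acid = Chem.MolFromSmiles("CC(=O)O")

# Canonical SMILES: both ethanol inputs should give the same output string.
same_canonical = Chem.MolToSmiles(ethanol_a) == Chem.MolToSmiles(ethanol_b)

# Substructure search using a SMARTS pattern for a carboxylic acid group.
acid_pattern = Chem.MolFromSmarts("C(=O)[OH]")
acid_hits = [m.HasSubstructMatch(acid_pattern) for m in (ethanol_a, acetic_acid)]

# Morgan (ECFP-like) bit-vector fingerprints with radius 2, then Tanimoto similarity.
generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fp_ethanol = generator.GetFingerprint(ethanol_a)
fp_acid = generator.GetFingerprint(acetic_acid)
self_similarity = DataStructs.TanimotoSimilarity(fp_ethanol, fp_ethanol)
cross_similarity = DataStructs.TanimotoSimilarity(fp_ethanol, fp_acid)
```

The same calls are what the other endpoints (Knime nodes, PostgreSQL cartridge) wrap, which is the point of the shared C++ core.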
The contrib dir
§  LEF (Anna Vulpetti, NIBR): Local Environment of Fluorine
§  PBF (Nicholas Firth, ICR): Plane of best fit descriptor
§  SA_Score (Peter Ertl, NIBR): synthetic-accessibility score
§  fraggle (Jameed Hussain, GSK): fragment-based similarity
§  mmpa (Jameed Hussain, GSK): molecular matched pairs
§  pzc (Paul Czodrowski, Merck KGaA): tools for building and validating
classifiers
§  ConformerParser (Sereina Riniker, NIBR): parser for Amber trajectory
files
What is this all about?
[Diagram: a C++ core (core data structures and algorithms) exposed to Python via Boost.Python, to Java and C# via SWIG, and to PostgreSQL; the endpoints shown are scripts, interactive use, Knime, and apps]
§  Exact same algorithms/implementations accessible from many different endpoints
Knime integration
§  Open-source RDKit-based nodes for Knime providing cheminformatics
functionality
§  Trusted nodes distributed from
knime community site
§  Work in progress: more nodes being
added (new wizard makes it easy)
What’s there?
RDKit Interactive Table
§  KNIME interactive table with molecules as column headers
Functionality for working with 3D molecules
§  Example: flexible molecule-molecule alignment
PostgreSQL integration
§  PostgreSQL (http://www.postgresql.org): a robust, flexible, and
extensible relational open-source database. Rich collection of
extensions available
§  RDKit “cartridge”:
•  Fast substructure and similarity search
•  Fingerprints (count-based and bit-vector):
Morgan (ECFP-like), FeatMorgan (FCFP-like), RDKit (Daylight like), atom pair,
topological torsion, MACCS
•  Standard molecule properties and descriptors
§  Basis for myChEMBL (http://chembl.blogspot.co.uk/2013/10/chembl-virtual-machine-aka-mychembl.html)
Ochoa, R., Davies, M., Papadatos, G., Atkinson, F., & Overington, J. P. (2014). myChEMBL: a virtual machine implementation of open data and cheminformatics tools. Bioinformatics, 30(2), 298–300.
PostgreSQL integration
Substructure search
chembl_17=# select molregno,m from rdk.mols where m@>'c1ccc2c(c1)C(=NN(C2=O)Cc3nc4cc(ccc4s3)C)CC(=O)O';
 molregno | m
----------+---------------------------------------------------------------
     7502 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12
    23364 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(C(F)(F)F)c3s2)c(=O)c2ccccc12
    23439 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(Cl)c3s2)c(=O)c2ccccc12
    23462 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(F)c3s2)c(=O)c2ccccc12
    24192 | Cc1cc2nc(Cn3nc(CC(=O)O)c4ccccc4c3=O)sc2c(C)c1
    24190 | COc1cc2sc(Cn3nc(CC(=O)O)c4ccccc4c3=O)nc2cc1C(F)(F)F
    24194 | Cc1ccc2sc(Cn3nc(CC(=O)O)c4ccccc4c3=O)nc2c1
    24237 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)c(O)cc3s2)c(=O)c2ccccc12
    24331 | CC(c1nc2cc(C(F)(F)F)ccc2s1)n1nc(CC(=O)O)c2ccccc2c1=O
(9 rows)

Time: 112.325 ms
PostgreSQL integration
Similarity search
chembl_17=# select * from get_mfp2_neighbors('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12') limit 5;
 molregno | m                                                    | similarity
----------+------------------------------------------------------+-------------------
     7502 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12  | 1
    24184 | O=C(O)Cc1nn(Cc2nc3ccc(C(F)(F)F)cc3s2)c(=O)c2ccccc12  | 0.859649122807018
    24153 | O=C(O)Cc1nn(CCc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12 | 0.830508474576271
    24152 | O=C(O)Cc1nn(Cc2nc3ccccc3s2)c(=O)c2cc(C(F)(F)F)ccc12  | 0.813559322033898
    24150 | O=C(O)Cc1nn(Cc2nc3ccccc3s2)c(=O)c2ccc(C(F)(F)F)cc12  | 0.813559322033898
(5 rows)

Time: 1222.426 ms

Notice that results come back in sorted order
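The ranking idea behind a neighbor search like the one above can be sketched in pure Python. This is only an illustration: fingerprints are represented as sets of on-bit indices, the toy bit sets and ids are made up, and the `nearest_neighbors` helper is hypothetical; it does not reflect how the cartridge implements `get_mfp2_neighbors`.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    # Two empty fingerprints are scored 0.0 here; this is a convention of the sketch.
    return shared / union if union else 0.0


def nearest_neighbors(query_fp, library, limit=5):
    """Return (id, similarity) pairs sorted from most to least similar."""
    scored = [(mol_id, tanimoto(query_fp, fp)) for mol_id, fp in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hypothetical toy fingerprints keyed by made-up ids.
library = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 9},
    "mol_c": {7, 8},
}
hits = nearest_neighbors({1, 2, 3, 4}, library, limit=2)
```

Sorting before applying the limit is what guarantees that the rows returned are the most similar ones, in descending order.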
PostgreSQL integration
Other functionality
chembl_17=# select mol_formula('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');
  mol_formula
---------------
 C19H12F3N3O3S
(1 row)

chembl_17=# select mol_logp('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');
 mol_logp
----------
   3.7004
(1 row)

chembl_17=# select mol_inchi('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');
 mol_inchi
-----------------------------------------------------------------------------------------------------------------------------------------
 InChI=1S/C19H12F3N3O3S/c20-19(21,22)10-5-6-15-14(7-10)23-16(29-15)9-25-18(28)12-4-2-1-3-11(12)13(24-25)8-17(26)27/h1-7H,8-9H2,(H,26,27)
(1 row)
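For comparison, the same kinds of properties can be computed from the Python wrappers. This is a minimal sketch that assumes the RDKit is installed; whether the cartridge functions call exactly these routines internally is an assumption, so the numbers are not guaranteed to agree digit for digit with the SQL output above.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, rdMolDescriptors

smiles = "O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12"
mol = Chem.MolFromSmiles(smiles)

# Molecular formula (compare with mol_formula in the cartridge).
formula = rdMolDescriptors.CalcMolFormula(mol)

# Wildman-Crippen logP estimate (compare with mol_logp in the cartridge).
logp = Crippen.MolLogP(mol)

# Mol block (CTAB) for ethane (compare with mol_to_ctab in the cartridge).
molblock = Chem.MolToMolBlock(Chem.MolFromSmiles("CC"))
```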
PostgreSQL integration
Other functionality
chembl_17=# select mol_to_ctab('CC'::mol);
                              mol_to_ctab
-----------------------------------------------------------------------
                                                                      +
      RDKit          2D                                               +
                                                                      +
   2  1  0  0  0  0  0  0  0  0999 V2000                              +
     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0+
     1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0+
   1  2  1  0                                                         +
 M  END                                                               +

(1 row)
IPython notebook integration
§  IPython: a very powerful interactive shell for python
http://www.ipython.org
§  IPython notebook: IPython in the browser, with graphics
•  combines code and output in one place
•  great tool for reproducible research
•  Example notebook with graphics.
§  RDKit integration:
•  Display molecules, substructure matches, reactions, graphics from PyMOL
IPython notebook integration:
Molecule tables
http://rdkit.blogspot.ch/2014/02/more-on-datasets-ii.html
IPython notebook integration:
Similarity Maps
Riniker, S. & Landrum, G. A. J Cheminf (2013). http://www.jcheminf.com/content/5/1/43
IPython notebook integration:
PyMOL
http://rdkit.blogspot.ch/2013/12/using-allchemconstrainedembed.html
Pandas integration
§  Pandas: library for working with data tables in Python. Integrates well
with matplotlib and ipython
http://pandas.pydata.org/
§  RDKit integration:
•  Load smiles tables or SD files into Pandas data tables
•  Adds molecule columns to existing tables with smiles/SD columns
•  Enables substructure filters on tables
•  Integration with IPython notebook to render molecules
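The table operations listed above can be sketched roughly as follows. This assumes that both pandas and the RDKit (with its `PandasTools` module) are installed; the table contents, the column names, and the SMARTS pattern are made up for illustration.

```python
import pandas as pd
from rdkit import Chem
from rdkit.Chem import PandasTools

# A small table with a SMILES column (illustrative data only).
df = pd.DataFrame({
    "name": ["ethanol", "acetic acid", "benzoic acid"],
    "smiles": ["CCO", "CC(=O)O", "OC(=O)c1ccccc1"],
})

# Add a molecule column built from the SMILES column.
PandasTools.AddMoleculeColumnToFrame(df, smilesCol="smiles", molCol="mol")

# Substructure filter: keep rows whose molecule contains a carboxylic acid.
acid = Chem.MolFromSmarts("C(=O)[OH]")
acids = df[df["mol"].apply(lambda m: m.HasSubstructMatch(acid))]
```

In a notebook, displaying `acids` would show the molecule column as rendered structures rather than as text.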
Pandas integration
http://nbviewer.ipython.org/github/rdkit/UGM_2013/blob/master/Tutorials/pandastools/Pandas_RDKit_UGM.ipynb
Substructure filters
Molecules in tables
Lucene integration
§  Still in the experimental stage
§  Adds substructure search functionality with fingerprint screenout to
Lucene
§  Includes demo app for testing
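The idea of a fingerprint screenout can be shown with a pure-Python sketch. It deliberately uses plain strings and substring matching as a stand-in for molecules and substructure matching (a substring of a SMILES string is not a chemical substructure), and the bigram fingerprint and helper names are hypothetical. The point is only the control flow: a cheap bit test discards candidates that cannot match, so the expensive exact test runs on fewer entries.

```python
def bigram_fingerprint(text, n_bits=64):
    """Fold the character bigrams of `text` into an n_bits-wide bitmask."""
    fp = 0
    for i in range(len(text) - 1):
        fp |= 1 << (sum(map(ord, text[i:i + 2])) % n_bits)
    return fp


def screened_search(query, indexed_docs):
    """Return (matches, exact_checks); the exact test runs only on screen survivors."""
    query_fp = bigram_fingerprint(query)
    matches, exact_checks = [], 0
    for doc, doc_fp in indexed_docs:
        # If any bit of the query is missing from the document, it cannot match.
        if (query_fp & doc_fp) != query_fp:
            continue
        exact_checks += 1
        if query in doc:  # stand-in for the expensive substructure match
            matches.append(doc)
    return matches, exact_checks


docs = ["c1ccccc1O", "CCO", "CC(=O)O", "NCCN"]
indexed_docs = [(d, bigram_fingerprint(d)) for d in docs]  # fingerprints computed once
matches, exact_checks = screened_search("CCO", indexed_docs)
```

Because every bigram of a substring also occurs in the containing string, the screen can produce false positives but never false negatives, which is the property a screenout needs.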
RDKit in NIBR
§  Extensive use by CADD, informaticians, and IT
§  Lots of convenience code/wrappers for accessing internal data sources
and tools
§  Combined with the Avalon toolkit (another NIBR-supported open-
source project), provides the underpinning for many of our global
chemistry-based applications
The Avalon toolkit
§  C/Java cheminformatics toolkit
§  Primary author: Bernd Rohde (NIBRIT Basel)
§  http://sourceforge.net/projects/avalontoolkit/
§  Functionality:
•  Canonical SMILES
•  Avalon fingerprint (highly optimized substructure fingerprint)
•  Molecular standardization (STRUCHK)
•  2D Coordinate generation
•  Tomcat webapp for 2D rendering
§  The RDKit has (optional) Python bindings for much of the functionality
RDKit in NIBR
Case study 1: CIx Framework
§  “Service bus” for cheminformatics/CADD services
§  Handles format conversions for input/output automatically
i.e. callers can provide SMILES input to a service/model that wants CTABs with 3D
coordinates
§  Supports versioning of models/services
§  Tight integration with scientific tools (e.g. Tibco Spotfire, Knime, Instant
JChem, etc.)
§  Enables trivial addition of “chemical intelligence” to web apps
§  Makes it easy to globally deploy models: once a new model/service (or
new version of a model/service) is registered with the Framework, it is
instantly globally accessible
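The combination of automatic format conversion and versioned services can be sketched as a small registry. Everything here is hypothetical: the `ServiceBus` class, the format names, and the placeholder converter are invented for illustration and say nothing about how the CIx Framework is actually implemented.

```python
class ServiceBus:
    """Toy registry: services declare the input format they want; the bus converts."""

    def __init__(self):
        self._converters = {}  # (from_format, to_format) -> conversion function
        self._services = {}    # (name, version) -> (wanted_format, function)

    def add_converter(self, src, dst, func):
        self._converters[(src, dst)] = func

    def register(self, name, version, wanted_format, func):
        self._services[(name, version)] = (wanted_format, func)

    def latest_version(self, name):
        return max(v for (n, v) in self._services if n == name)

    def call(self, name, data, data_format, version=None):
        # Without an explicit version, the most recently numbered one is used.
        if version is None:
            version = self.latest_version(name)
        wanted_format, func = self._services[(name, version)]
        # Convert only when the caller's format differs from what the service wants.
        if data_format != wanted_format:
            data = self._converters[(data_format, wanted_format)](data)
        return func(data)


bus = ServiceBus()
# Placeholder converter: tags the input instead of generating real 3D coordinates.
bus.add_converter("smiles", "ctab3d", lambda s: {"format": "ctab3d", "source": s})
bus.register("demo_model", 1, "ctab3d", lambda rec: "v1:" + rec["source"])
bus.register("demo_model", 2, "ctab3d", lambda rec: "v2:" + rec["source"])

result_latest = bus.call("demo_model", "CCO", "smiles")
result_pinned = bus.call("demo_model", "CCO", "smiles", version=1)
```

Callers never see the conversion step, and registering version 2 makes it the default immediately while version 1 stays addressable.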
CIx Framework architecture
[Architecture diagram; recoverable elements:]
§  Clients: web app, KNIME, Spotfire, IJC, Python
§  CIx Tools Framework: CIx Tools Web Service (SOAP, REST)
§  Model registration and request service, with a web model registration portal front end
§  Database to store model information; the model info is fetched from the database
§  CIx Tools Engine
§  Translation service: molecule format conversion, name lookup
§  XML file exchange between the engine and the models, and between the engine and the translation service
§  Models, each with a model script
§  Data in one of the following formats: TSV/CSV file, SMILES/CPD_NO, SD file, DART query
§  Java/Tomcat and Python/Django components; most models are Python/Django
§  Geographically diverse servers
RDKit in NIBR
Case study 2: Small-Molecule Registration
§  Internally developed web application for compound registration
§  C#-based web services writing to Oracle
§  RDKit + Avalon toolkit for structure standardization
§  RDKit + InChI used for structure-key calculation
§  Calls out to CIx Framework for standard computed properties
§  Independent (but validated) Python implementation of standardization
and structure-key calculation for standalone use
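A structure key based on InChI can be sketched as below. This assumes an RDKit build with InChI support and uses the standard InChIKey as the key; the actual key definition used in the registration system is not described in the slides, and the standardization step that would normally come first is omitted. The `structure_key` helper is hypothetical.

```python
from rdkit import Chem


def structure_key(smiles):
    """Return the standard InChIKey for a SMILES string (assumed key definition)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    return Chem.InchiToInchiKey(Chem.MolToInchi(mol))


# Two different SMILES strings for ethanol should map to the same key.
key_a = structure_key("OCC")
key_b = structure_key("CCO")
key_other = structure_key("CC(=O)O")
```

A key like this lets a registration system detect duplicates regardless of how the structure was drawn or written.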
RDKit in NIBR
Case study 3: QSAR Toolkit
§  Descriptor calculator providing access to all available internal
descriptors
§  Tools for pulling assay data from our data warehouse
§  Standardized model-building
§  Standardized reporting for evaluation and peer review
§  Packaging for deployment via CIx Framework
§  Model Watchdog:
Pulls most recent data, generates predictions, creates report showing evolution
of model accuracy over time
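The kind of report the Model Watchdog produces can be approximated with a short pure-Python sketch. The record layout, the period labels, and the `accuracy_by_period` helper are hypothetical; the real tool pulls data from an internal warehouse and is not shown in the slides.

```python
from collections import defaultdict


def accuracy_by_period(records):
    """records: iterable of (period, predicted, observed); returns {period: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for period, predicted, observed in records:
        total[period] += 1
        correct[period] += int(predicted == observed)
    # Periods are reported in sorted order so the trend over time is easy to read.
    return {period: correct[period] / total[period] for period in sorted(total)}


# Made-up predictions versus measured outcomes, grouped by month.
records = [
    ("2014-01", 1, 1), ("2014-01", 0, 0), ("2014-01", 1, 0), ("2014-01", 0, 0),
    ("2014-02", 1, 1), ("2014-02", 1, 0),
]
report = accuracy_by_period(records)
```

A falling value from one period to the next is the signal that a model may need retraining.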
RDKit in NIBR
Case study 4: Similarity Server
§  Central PostgreSQL database with easily available compounds
•  in-house available
•  available from reliable vendors
§  Kept up-to-date
§  Substructure search
§  Similarity search with various fingerprints:
•  Avalon
•  Morgan2, Morgan3, FeatMorgan2
•  Atom Pairs, Topological Torsions
§  Web services interface
§  Available to chemists via one of their standard desktop tools
NIBR Open Source
Something new
Acknowledgements
§  General:
•  Remy Evard (NIBR/Informatics)
•  Richard Lewis (NIBR/GDC)
•  Tom Digby (NIBR/Legal)
•  Peter Gedeck (NIBR/GDC)
•  Nik Stiefl (NIBR/GDC)
§  RDKit Community
•  Roger Sayle (NextMove): PDB Parser
•  Andrew Dalke (Dalke Scientific): FMCS
•  Paolo Tosco (University of Turin):
MMFF94, Open3DAlign
•  Jameed Hussain (GSK): Fraggle,
mmpa
§  Pandas, scikit-learn:
•  Sereina Riniker (NIBR/Informatics)
•  Nikolas Fechner (NIBR/Informatics)
http://www.rdkit.org
§  Knime:
•  Manuel Schwarze (NIBR/Informatics)
•  Thorsten Meinl (knime.com)
•  Bernd Wiswedel (knime.com)
§  SMR
•  Thomas Mueller (NIBR/Informatics)
•  Thomas Veith (NIBR/Informatics)
•  Dave Cotter (NIBR/Informatics)
§  QSAR Toolkit:
•  Peter Gedeck (NIBR/GDC)
•  Nikolas Fechner (NIBR/Informatics)
§  CIx Framework
•  Sandra Mueller (NIBR/Informatics)
•  Joerg Muehlbacher (NIBR/CPC)
•  Riccardo Vianello (NIBR/Informatics)
§  NIBR Open Source
•  Ken Robbins (NIBR/Informatics)
•  Dennis Jen (NIBR/Informatics)
•  Mark Schreiber (NIBR/Informatics)
Advertising
3rd RDKit User Group Meeting
22-24 October 2014
Merck KGaA, Darmstadt, Germany
Talks, “talktorials”, lightning talks, social activities, and a hackathon on
the 24th.
Registration: http://goo.gl/z6QzwD
Full announcement: http://goo.gl/ZUm2wm
We’re looking for speakers. Please contact greg.landrum@gmail.com

 
Some "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data frontSome "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data frontGreg Landrum
 
Large scale classification of chemical reactions from patent data
Large scale classification of chemical reactions from patent dataLarge scale classification of chemical reactions from patent data
Large scale classification of chemical reactions from patent dataGreg Landrum
 
Machine learning in the life sciences with knime
Machine learning in the life sciences with knimeMachine learning in the life sciences with knime
Machine learning in the life sciences with knimeGreg Landrum
 
Open-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesOpen-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesGreg Landrum
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Greg Landrum
 

Plus de Greg Landrum (18)

Chemical registration
Chemical registrationChemical registration
Chemical registration
 
Mike Lynch Award Lecture, ICCS 2022
Mike Lynch Award Lecture, ICCS 2022Mike Lynch Award Lecture, ICCS 2022
Mike Lynch Award Lecture, ICCS 2022
 
Google BigQuery for analysis of scientific datasets: Interactive exploration ...
Google BigQuery for analysis of scientific datasets: Interactive exploration ...Google BigQuery for analysis of scientific datasets: Interactive exploration ...
Google BigQuery for analysis of scientific datasets: Interactive exploration ...
 
ACS San Diego - The RDKit: Open-source cheminformatics
ACS San Diego - The RDKit: Open-source cheminformaticsACS San Diego - The RDKit: Open-source cheminformatics
ACS San Diego - The RDKit: Open-source cheminformatics
 
Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)
 
Moving from Artisanal to Industrial Machine Learning
Moving from Artisanal to Industrial Machine LearningMoving from Artisanal to Industrial Machine Learning
Moving from Artisanal to Industrial Machine Learning
 
Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)
 
Let’s talk about reproducible data analysis
Let’s talk about reproducible data analysisLet’s talk about reproducible data analysis
Let’s talk about reproducible data analysis
 
How Do You Build and Validate 1500 Models and What Can You Learn from Them?
How Do You Build and Validate 1500 Models and What Can You Learn from Them? How Do You Build and Validate 1500 Models and What Can You Learn from Them?
How Do You Build and Validate 1500 Models and What Can You Learn from Them?
 
Interactive and reproducible data analysis with the open-source KNIME Analyti...
Interactive and reproducible data analysis with the open-source KNIME Analyti...Interactive and reproducible data analysis with the open-source KNIME Analyti...
Interactive and reproducible data analysis with the open-source KNIME Analyti...
 
Processing malaria HTS results using KNIME: a tutorial
Processing malaria HTS results using KNIME: a tutorialProcessing malaria HTS results using KNIME: a tutorial
Processing malaria HTS results using KNIME: a tutorial
 
Big (chemical) data? No Problem!
Big (chemical) data? No Problem!Big (chemical) data? No Problem!
Big (chemical) data? No Problem!
 
Is one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchIs one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical research
 
Some "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data frontSome "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data front
 
Large scale classification of chemical reactions from patent data
Large scale classification of chemical reactions from patent dataLarge scale classification of chemical reactions from patent data
Large scale classification of chemical reactions from patent data
 
Machine learning in the life sciences with knime
Machine learning in the life sciences with knimeMachine learning in the life sciences with knime
Machine learning in the life sciences with knime
 
Open-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesOpen-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databases
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...
 

Dernier

The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...Nitya salvi
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfayushiqss
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfproinshot.com
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfryanfarris8
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 

Dernier (20)

The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 

Open-source from/in the enterprise: the RDKit

  • 1. Open-source from/in the enterprise: the RDKit Gregory Landrum NIBR Informatics Novartis Institutes for BioMedical Research, Basel, Switzerland
  • 2. Outline §  What is the RDKit? §  RDKit integration with other open-source projects •  Knime •  PostgreSQL •  IPython •  Pandas •  Lucene §  RDKit in NIBR, some case studies
  • 3. RDKit: What is it? §  Open-source C++ toolkit for cheminformatics §  Wrappers for Python (2.x), Java, C# §  Functionality: •  2D and 3D molecular operations •  Descriptor generation for machine learning •  PostgreSQL database cartridge for substructure and similarity searching •  Knime nodes •  IPython integration •  Lucene integration (experimental) •  Supports Mac/Windows/Linux §  Releases every 6 months §  business-friendly BSD license §  Code: https://github.com/rdkit §  http://www.rdkit.org
  • 4. The community §  Mailing lists hosted at sourceforge: https://sourceforge.net/p/rdkit/mailman/ §  Active participants from academia, small and large pharma, software companies, and service providers §  30+ attendees at each of the two user group meetings
  • 5. Some features
      §  Input/Output: SMILES/SMARTS, SDF, TDT, PDB, SLN [1], Corina mol2 [1]
      §  “Cheminformatics”:
          •  Substructure searching
          •  Canonical SMILES
          •  Chirality support (i.e. R/S or E/Z labeling)
          •  Chemical transformations (e.g. remove matching substructures)
          •  Chemical reactions
      §  2D depiction, including constrained depiction
      §  2D->3D conversion/conformational analysis via distance geometry
      §  UFF and MMFF94 implementation for cleaning up structures
      §  Fingerprinting: Daylight-like, atom pairs, topological torsions, Morgan algorithm, “MACCS keys”, etc.
      §  Similarity/diversity picking
      §  2D pharmacophores [1]
      §  Gasteiger-Marsili charges
      §  Hierarchical subgraph/fragment analysis
      §  Bemis and Murcko scaffold determination
      §  RECAP and BRICS implementations
      §  Multi-molecule maximum common substructure
      §  Feature maps
      §  Shape-based similarity
      §  Fraggle similarity (from GSK)
      §  Molecule-molecule alignment
      §  Open3DAlign implementation
      §  Integration with PyMOL for 3D visualization
      §  Functional group filtering
      §  Salt stripping
      §  Molecular descriptor library: Topological (κ3, Balaban J, etc.), Compositional (Number of Rings, Number of Aromatic Heterocycles, etc.), EState, SlogP/SMR (Wildman and Crippen approach), “MOE like” VSA descriptors, Feature-map vectors
      §  Machine Learning:
          •  Clustering (hierarchical)
          •  Information theory (Shannon entropy, information gain, etc.)
      §  Tight integration with the IPython notebook and pandas
      §  Integration with the InChI library
      [1] These implementations are functional but are not necessarily the best, fastest, or most complete.
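The information-theory entry in the feature list (Shannon entropy, information gain) can be illustrated with a small standard-library sketch. This is not the RDKit implementation; the function names and the toy activity labels are assumptions made for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(labels, feature_on):
    """Entropy reduction obtained by splitting labels on a boolean feature."""
    n = len(labels)
    if n == 0:
        return 0.0
    on = [y for y, f in zip(labels, feature_on) if f]
    off = [y for y, f in zip(labels, feature_on) if not f]
    remainder = (len(on) / n) * shannon_entropy(on) \
        + (len(off) / n) * shannon_entropy(off)
    return shannon_entropy(labels) - remainder


# Hypothetical toy data: four compounds and two fingerprint bits.
activity = ["active", "active", "inactive", "inactive"]
bit_perfect = [True, True, False, False]   # separates the classes completely
bit_useless = [True, False, True, False]   # carries no class information

gain_perfect = information_gain(activity, bit_perfect)
gain_useless = information_gain(activity, bit_useless)
```

A bit that splits the classes cleanly recovers the full entropy of the labels, while a bit that is evenly spread over both classes gains nothing. That difference is what ranks descriptors or fingerprint bits by usefulness.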
  • 6. The contrib dir §  LEF (Anna Vulpetti, NIBR): Local Environment of Fluorine §  PBF (Nicholas Firth, ICR): Plane of best fit descriptor §  SA_Score (Peter Ertl, NIBR): synthetic-accessibility score §  fraggle (Jameed Hussain, GSK): fragment-based similarity §  mmpa (Jameed Hussain, GSK): molecular matched pairs §  pzc (Paul Czodrowski, Merck KGaA): tools for building and validating classifiers §  ConformerParser (Sereina Riniker, NIBR): parser for Amber trajectory files
  • 7. What is this all about? Diagram: a C++ core (data structures and algorithms) exposed through Boost.Python (Python: scripts, interactive use), SWIG (Java, C#), PostgreSQL, Knime, and apps. Exact same algorithms/implementations accessible from many different endpoints
  • 8. Knime integration §  Open-source RDKit-based nodes for Knime providing cheminformatics functionality §  Trusted nodes distributed from knime community site §  Work in progress: more nodes being added (new wizard makes it easy)
  • 10. RDKit Interactive Table §  KNIME interactive table with molecules as column headers
  • 11. Functionality for working with 3D molecules §  Example: flexible molecule-molecule alignment
  • 12. PostgreSQL integration §  PostgreSQL (http://www.postgresql.org): a robust, flexible, and extensible relational open-source database. Rich collection of extensions available §  RDKit “cartridge”: •  Fast substructure and similarity search •  Fingerprints (count-based and bit-vector): Morgan (ECFP-like), FeatMorgan (FCFP-like), RDKit (Daylight like), atom pair, topological torsion, MACCS •  Standard molecule properties and descriptors §  Basis for myChEMBL (http://chembl.blogspot.co.uk/2013/10/chembl-virtual-machine-aka-mychembl.html). Reference: Ochoa, R., Davies, M., Papadatos, G., Atkinson, F., & Overington, J. P. (2014). myChEMBL: a virtual machine implementation of open data and cheminformatics tools. Bioinformatics, 30(2), 298–300.
  • 13. PostgreSQL integration: Substructure search
      chembl_17=# select molregno,m from rdk.mols where m@>'c1ccc2c(c1)C(=NN(C2=O)Cc3nc4cc(ccc4s3)C)CC(=O)O';
       molregno |                               m
      ----------+---------------------------------------------------------------
           7502 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12
          23364 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(C(F)(F)F)c3s2)c(=O)c2ccccc12
          23439 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(Cl)c3s2)c(=O)c2ccccc12
          23462 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(F)c3s2)c(=O)c2ccccc12
          24192 | Cc1cc2nc(Cn3nc(CC(=O)O)c4ccccc4c3=O)sc2c(C)c1
          24190 | COc1cc2sc(Cn3nc(CC(=O)O)c4ccccc4c3=O)nc2cc1C(F)(F)F
          24194 | Cc1ccc2sc(Cn3nc(CC(=O)O)c4ccccc4c3=O)nc2c1
          24237 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)c(O)cc3s2)c(=O)c2ccccc12
          24331 | CC(c1nc2cc(C(F)(F)F)ccc2s1)n1nc(CC(=O)O)c2ccccc2c1=O
      (9 rows)

      Time: 112.325 ms
  • 14. PostgreSQL integration: Similarity search
      chembl_17=# select * from get_mfp2_neighbors('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12') limit 5;
       molregno |                          m                           |    similarity
      ----------+------------------------------------------------------+-------------------
           7502 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12  |                 1
          24184 | O=C(O)Cc1nn(Cc2nc3ccc(C(F)(F)F)cc3s2)c(=O)c2ccccc12  | 0.859649122807018
          24153 | O=C(O)Cc1nn(CCc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12 | 0.830508474576271
          24152 | O=C(O)Cc1nn(Cc2nc3ccccc3s2)c(=O)c2cc(C(F)(F)F)ccc12  | 0.813559322033898
          24150 | O=C(O)Cc1nn(Cc2nc3ccccc3s2)c(=O)c2ccc(C(F)(F)F)cc12  | 0.813559322033898
      (5 rows)

      Time: 1222.426 ms

      Notice that results come back in sorted order
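The sorted neighbor list above can be mimicked in a few lines of plain Python. This is a sketch only: it assumes Tanimoto similarity on fingerprint bit sets (the slide does not state the metric behind `get_mfp2_neighbors`), and the molecule ids and feature sets are made up for illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two feature sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # convention chosen for this sketch
    return len(a & b) / len(a | b)


def nearest_neighbors(query_fp, library, limit=5):
    """Return up to `limit` (mol_id, similarity) pairs, most similar first."""
    scored = [(tanimoto(query_fp, fp), mol_id) for mol_id, fp in library.items()]
    # Sort by descending similarity; break ties by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(mol_id, sim) for sim, mol_id in scored[:limit]]


# Hypothetical fingerprints: each set holds the indices of the "on" bits.
library = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({1, 2, 3, 5}),
    "mol_c": frozenset({1, 6}),
    "mol_d": frozenset({7, 8}),
}

hits = nearest_neighbors(library["mol_a"], library, limit=3)
```

As in the SQL output, the query molecule itself comes back first with similarity 1, followed by progressively less similar neighbors.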
  • 15. PostgreSQL integration: Other functionality
      chembl_17=# select mol_formula('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');
        mol_formula
      ---------------
       C19H12F3N3O3S
      (1 row)

      chembl_17=# select mol_logp('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');
       mol_logp
      ----------
         3.7004
      (1 row)

      chembl_17=# select mol_inchi('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');
       mol_inchi
      -----------
       InChI=1S/C19H12F3N3O3S/c20-19(21,22)10-5-6-15-14(7-10)23-16(29-15)9-25-18(28)12-4-2-1-3-11(12)13(24-25)8-17(26)27/h1-7H,8-9H2,(H,26,27)
      (1 row)
  • 16. PostgreSQL integration: Other functionality
      chembl_17=# select mol_to_ctab('CC'::mol);
                                     mol_to_ctab
      -----------------------------------------------------------------------
                                                                            +
            RDKit          2D                                               +
                                                                            +
         2  1  0  0  0  0  0  0  0  0999 V2000                              +
           0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0+
           1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0+
         1  2  1  0                                                         +
       M  END                                                               +

      (1 row)
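To make the CTAB output above easier to read, here is a minimal sketch of how the fixed-width V2000 layout can be decoded with the standard library alone. The function name `parse_v2000` is an assumption for this example; it reads only the atom and bond counts, coordinates, element symbols, and bond orders, and ignores everything else in the format.

```python
def parse_v2000(molblock):
    """Extract atoms and bonds from a V2000 molblock (minimal, no validation)."""
    lines = molblock.split("\n")
    counts = lines[3]                 # three header lines precede the counts line
    n_atoms = int(counts[0:3])        # columns 1-3: number of atoms
    n_bonds = int(counts[3:6])        # columns 4-6: number of bonds

    atoms = []
    for line in lines[4:4 + n_atoms]:
        x = float(line[0:10])
        y = float(line[10:20])
        z = float(line[20:30])
        symbol = line[31:34].strip()  # element symbol follows one blank column
        atoms.append((symbol, x, y, z))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # begin atom index, end atom index (both 1-based), bond order
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


# The ethane molblock from the slide, without the psql "+" continuation marks.
ETHANE = "\n".join([
    "",
    "     RDKit          2D",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
])

atoms, bonds = parse_v2000(ETHANE)
```

The counts line says two atoms and one bond, and the single bond line connects atom 1 to atom 2 with order 1, which is ethane (`CC`) with 2D coordinates.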
  • 17. IPython notebook integration §  IPython: a very powerful interactive shell for Python http://www.ipython.org §  IPython notebook: IPython in the browser, with graphics •  combines code and output in one place •  great tool for reproducible research •  Example notebook with graphics. §  RDKit integration: •  Display molecules, substructure matches, reactions, graphics from PyMOL
  • 18. IPython notebook integration: Molecule tables http://rdkit.blogspot.ch/2014/02/more-on-datasets-ii.html
  • 19. IPython notebook integration: Similarity Maps. Riniker, S. & Landrum, G. A. J Cheminf (2013). http://www.jcheminf.com/content/5/1/43
  • 21. Pandas integration §  Pandas: library for working with data tables in Python. Integrates well with matplotlib and IPython http://pandas.pydata.org/ §  RDKit integration: •  Load SMILES tables or SD files into Pandas data tables •  Adds molecule columns to existing tables with SMILES/SD columns •  Enables substructure filters on tables •  Integration with IPython notebook to render molecules
  • 23. Lucene integration §  Still in the experimental stage §  Adds substructure search functionality with fingerprint screenout to Lucene §  Includes demo app for testing
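"Fingerprint screenout" means discarding candidates cheaply before running the expensive substructure match: a candidate can only contain the query if every bit set in the query fingerprint is also set in the candidate fingerprint. The sketch below shows the idea with toy stand-ins that are assumptions, not chemistry: strings play the role of molecules, a substring test plays the role of the exact matcher, and `toy_fingerprint` sets one bit per distinct character.

```python
def toy_fingerprint(text):
    """Fold the distinct characters of `text` into a 64-bit mask (toy stand-in)."""
    fp = 0
    for ch in set(text):
        fp |= 1 << (ord(ch) % 64)
    return fp


def passes_screen(query_fp, candidate_fp):
    """True if every bit set in the query is also set in the candidate."""
    return (query_fp & candidate_fp) == query_fp


def screened_search(query, library, exact_match):
    """Return (hits, number of exact matches attempted after the screen)."""
    query_fp = toy_fingerprint(query)
    hits, n_exact = [], 0
    for mol_id, mol in library.items():
        if not passes_screen(query_fp, toy_fingerprint(mol)):
            continue            # screened out: cannot possibly match
        n_exact += 1            # only survivors pay for the exact test
        if exact_match(query, mol):
            hits.append(mol_id)
    return hits, n_exact


# Hypothetical library; substring containment stands in for substructure match.
library = {"m1": "CCO", "m2": "CCN", "m3": "OCC", "m4": "c1ccccc1"}
hits, n_exact = screened_search("CO", library, lambda q, m: q in m)
```

The screen never rejects a true match, but it can let false positives through (here `m3`), which is why the exact test still runs on the survivors. In a real system the fingerprints would be precomputed and indexed rather than recalculated per query.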
  • 24. RDKit in NIBR §  Extensive use by CADD, informaticians, and IT §  Lots of convenience code/wrappers for accessing internal data sources and tools §  Combined with the Avalon toolkit (another NIBR-supported open-source project), provides the underpinning for many of our global chemistry-based applications
  • 25. The Avalon toolkit §  C/Java cheminformatics toolkit §  Primary author: Bernd Rohde (NIBRIT Basel) §  http://sourceforge.net/projects/avalontoolkit/ §  Functionality: •  Canonical SMILES •  Avalon fingerprint (highly optimized substructure fingerprint) •  Molecular standardization (STRUCHK) •  2D Coordinate generation •  Tomcat webapp for 2D rendering §  The RDKit has (optional) Python bindings for much of the functionality
  • 26. RDKit in NIBR Case study 1: CIx Framework §  “Service bus” for cheminformatics/CADD services §  Handles format conversions for input/output automatically, i.e. callers can provide SMILES input to a service/model that wants CTABs with 3D coordinates §  Supports versioning of models/services §  Tight integration with scientific tools (e.g. Tibco Spotfire, Knime, Instant JChem, etc.) §  Enables trivial addition of “chemical intelligence” to web apps §  Makes it easy to globally deploy models: once a new model/service (or new version of a model/service) is registered with the Framework, it is instantly globally accessible
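The two ideas on this slide, automatic input conversion and versioned registration, can be sketched as a small in-process registry. The CIx Framework itself is internal to NIBR and nothing below reflects its actual API; the class, method names, and the placeholder formats `"text"` and `"tokens"` are all hypothetical.

```python
class ServiceRegistry:
    """Toy service bus: versioned services plus automatic input conversion."""

    def __init__(self):
        self._converters = {}  # (source_format, target_format) -> function
        self._services = {}    # name -> {version: (input_format, function)}

    def add_converter(self, source_format, target_format, func):
        self._converters[(source_format, target_format)] = func

    def register(self, name, version, input_format, func):
        # A newly registered version is callable immediately.
        self._services.setdefault(name, {})[version] = (input_format, func)

    def call(self, name, data, data_format, version=None):
        versions = self._services[name]
        if version is None:
            version = max(versions)  # default to the newest version
        wanted_format, func = versions[version]
        if data_format != wanted_format:
            # The caller never converts; the registry does it on their behalf.
            data = self._converters[(data_format, wanted_format)](data)
        return func(data)


registry = ServiceRegistry()
registry.add_converter("text", "tokens", lambda s: s.split(","))
registry.register("count_items", 1, "tokens", len)
registry.register("count_items", 2, "tokens", lambda tokens: len(set(tokens)))

latest = registry.call("count_items", "C,C,O", data_format="text")
pinned = registry.call("count_items", "C,C,O", data_format="text", version=1)
```

Callers pass whatever format they hold, and the registry looks up a converter to the format the service declared. Pinning a version keeps old results reproducible while new versions become the default.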
  • 27. CIx Framework architecture (diagram)
      Clients: web app, KNIME, Spotfire, IJC, Python
      Front end: Web Model Registration Portal
      CIx Tools Framework:
          CIx Tools Web Service (SOAP, REST)
          Model registration and request service
          CIx Tools Engine, with Model Scripts and Models
          Translation service: molecule format conversion, name lookup
          Database to store model information
      Input data, in one of the following formats: TSV/CSV file, SMILES/CPD_NO, SD-File, DART query
      Data flow: XML file exchange between the engine and the models; XML file exchange between the engine and the translation service; model info is fetched from the database
      Deployment: Java/Tomcat and Python/Django on geographically diverse servers; most models are Python/Django
  • 28. RDKit in NIBR Case study 2: Small-Molecule Registration §  Internally developed web application for compound registration §  C#-based web services writing to Oracle §  RDKit + Avalon toolkit for structure standardization §  RDKit + InChI used for structure-key calculation §  Calls out to CIx Framework for standard computed properties §  Independent (but validated) Python implementation of standardization and structure-key calculation for standalone use
  • 29. RDKit in NIBR Case study 3: QSAR Toolkit §  Descriptor calculator providing access to all available internal descriptors §  Tools for pulling assay data from our data warehouse §  Standardized model-building §  Standardized reporting for evaluation and peer review §  Packaging for deployment via CIx Framework §  Model Watchdog: Pulls most recent data, generates predictions, creates report showing evolution of model accuracy over time
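The Model Watchdog idea, tracking how a deployed model's accuracy evolves as new measurements arrive, can be sketched with the standard library. The record layout `(period, predicted, observed)` and the toy values are assumptions for illustration; the slide does not describe the real tool's data model or metrics.

```python
from collections import defaultdict


def accuracy_by_period(records):
    """Return [(period, accuracy)] sorted by period.

    `records` is an iterable of (period, predicted, observed) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # period -> [correct, total]
    for period, predicted, observed in records:
        totals[period][1] += 1
        if predicted == observed:
            totals[period][0] += 1
    return [(period, correct / total)
            for period, (correct, total) in sorted(totals.items())]


# Hypothetical predictions paired with measurements that arrived later.
records = [
    ("2014-01", 1, 1), ("2014-01", 0, 0), ("2014-01", 1, 0), ("2014-01", 0, 0),
    ("2014-02", 1, 0), ("2014-02", 0, 0),
]

report = accuracy_by_period(records)
```

A falling accuracy from one period to the next, as in this toy data, is the kind of signal such a report would surface to prompt retraining or review.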
  • 30. RDKit in NIBR Case study 4: Similarity Server §  Central PostgreSQL database with easily available compounds •  in-house available •  available from reliable vendors §  Kept up-to-date §  Substructure search §  Similarity search with various fingerprints: •  Avalon •  Morgan2, Morgan3, FeatMorgan2 •  Atom Pairs, Topological Torsions §  Web services interface §  Available to chemists via one of their standard desktop tools
  • 32. Acknowledgements
      §  General:
          •  Remy Evard (NIBR/Informatics)
          •  Richard Lewis (NIBR/GDC)
          •  Tom Digby (NIBR/Legal)
          •  Peter Gedeck (NIBR/GDC)
          •  Nik Stiefl (NIBR/GDC)
      §  RDKit Community
          •  Roger Sayle (NextMove): PDB Parser
          •  Andrew Dalke (Dalke Scientific): FMCS
          •  Paolo Tosco (University of Turin): MMFF94, Open3DAlign
          •  Jameed Hussain (GSK): Fraggle, mmpa
      §  Pandas, scikit-learn:
          •  Sereina Riniker (NIBR/Informatics)
          •  Nikolas Fechner (NIBR/Informatics)
      §  Knime:
          •  Manuel Schwarze (NIBR/Informatics)
          •  Thorsten Meinl (knime.com)
          •  Bernd Wiswedel (knime.com)
      §  SMR
          •  Thomas Mueller (NIBR/Informatics)
          •  Thomas Veith (NIBR/Informatics)
          •  Dave Cotter (NIBR/Informatics)
      §  QSAR Toolkit:
          •  Peter Gedeck (NIBR/GDC)
          •  Nikolas Fechner (NIBR/Informatics)
      §  CIx Framework
          •  Sandra Mueller (NIBR/Informatics)
          •  Joerg Muehlbacher (NIBR/CPC)
          •  Riccardo Vianello (NIBR/Informatics)
      §  NIBR Open Source
          •  Ken Robbins (NIBR/Informatics)
          •  Dennis Jen (NIBR/Informatics)
          •  Mark Schreiber (NIBR/Informatics)
      http://www.rdkit.org
  • 33. Advertising: 3rd RDKit User Group Meeting, 22-24 October 2014, Merck KGaA, Darmstadt, Germany. Talks, “talktorials”, lightning talks, social activities, and a hackathon on the 24th. Registration: http://goo.gl/z6QzwD Full announcement: http://goo.gl/ZUm2wm We’re looking for speakers. Please contact greg.landrum@gmail.com