ACS San Diego - The RDKit: Open-source cheminformatics

  1. The RDKit: Open-source cheminformatics
     Greg Landrum, ACS San Diego, August 2019
     T5 Informatics GmbH, greg.landrum@t5informatics.com, @dr_greg_landrum
  2. An open source toolkit for cheminformatics
     ● Business-friendly BSD license
     ● Core data structures and algorithms in C++
     ● Python 3.x wrapper generated using Boost.Python
     ● Java and C# wrappers generated with SWIG
     ● 2D and 3D molecular operations
     ● Descriptor generation for machine learning
     ● Molecular database cartridge for PostgreSQL
     ● Cheminformatics nodes for KNIME (distributed from the KNIME community site: http://www.knime.org/rdkit)
  3. Ecosystem
     The exact same implementation, regardless of where you are using it from
  4. Details
     ● http://www.rdkit.org
     ● Supports Mac/Windows/Linux
     ● Releases every 6 months
     ● GitHub (https://github.com/rdkit): downloads, bug tracker, git repository
     ● Mailing lists at https://sourceforge.net/p/rdkit/mailman/, with searchable archives for rdkit-discuss and rdkit-devel
     ● Blog (https://rdkit.blogspot.com): tips, tricks, random stuff
     ● KNIME integration (https://github.com/rdkit/knime-rdkit): RDKit nodes for KNIME (also available from the community download site inside KNIME)
     ● Twitter: @RDKit_org
     ● LinkedIn: https://www.linkedin.com/groups/8192558
  5. Functionality¹
     ● Fingerprints
     ● Descriptors
     ● Reactions
     ● MCS
     ● Enhanced stereochemistry
     ● Molecular standardization
     ● Depiction
     ● Diversity picking
     ● Tight integration with Jupyter and pandas
     ● Conformation generation
     ● 3D descriptors
     ● UFF and MMFF94/MMFF94S
     ● Open3D Align
     ● Feature map vectors
     ● Pharmacophore embedding
     ¹ A not-quite-random selection
  6. Documentation
  7. Documentation
  8. Documentation
  9. Being opinionated
     ● The RDKit is not designed to be a toolkit for file format conversion, so round-tripping molecules isn't always possible
     ● The default settings will reject molecules that are chemically unreasonable. You can turn this off¹
     ● The toolkit generally does not try to guess and "fix" input structures
     ¹ But you generally shouldn't!
  10. Support
      ● Web searches
      ● Mailing list
      ● GitHub
      ● Commercial support
  11. Community
      ● Mailing lists: >850 messages to rdkit-discuss from 2018.08.15 to 2019.08.15
      ● Google Scholar: >550 hits for "rdkit" in 2018
      ● Searching GitHub for "from rdkit import Chem" returns >8500 code results across >450 unique repository names (i.e. not RDKit forks)
      ● Each of the last five UGMs (User Group Meetings) was at capacity, with 40-100+ attendees
  12. Code contributions in the last year
  13. Contributions to the GitHub issue tracker in the last year:
      ricrogz, bp-kelley, ombanck, UnixJunkie, shayakhmetov, LivC182, kovasap, kienerj, coleb, yurivict, tdudgeon, tawe141, sroughley, soerendip, sbhakat, rmrmg, paconius, gedeck, ericmjl, e-kwsm, b-mahjour, aparente-nurix, SiPa13, CamAnNguyen, yshen22, yphillip, yamasakih, xiaohongniua, wtriddle, uditgupta0912, timholy, thegodone, tduigou, stefdoerr, smoe, simonmb, sihagmnis36, shashany, sdvillal, sahertariq07, rvianello, pstjohn, pschwllr, proteneer, poppy7675, poganyp, pavlovnicola, oivulf, mwojcikowski, msteijaert, mjw99, mayankBIL, malteseunderdog, lorton, lilleswing, likhangy, lewisacidic, kexul, kennethriva, kemaeleon, karolbadowski, jwarmitage, jones-gareth, jasad1, icamps, hsiaoyi0504, hjuinj, grinnnnnn, goraj, gncs, ghiandonigianmarco, gauravmoghe, felixekn, eugene-bright, ericmjonas, eloyfelix, dvidmon, dpwildboar, darkcircle, danpol, cwhidden, cowsandmilk, complext, clarezhu, cing, chazanov, btcooooper, bembel, balducci, baerbock, azedine-healx, andt88, andrewtarzia, agdecm, adalke, ZacharyKaplan, ValeryPolyakov, Szirenke, SamuelFigueroa, SRaent, Plancalkuele, NadineSchneider, Mickdub, Mario-Liu, JLVarjo, Dekken, ChinzoD, ChiCheng45, CKannas, Bjoux2, BillLawrence111, AustinApple, Andy-Wilkinson, 7FeiW
      That's 115 different people.
  14. Usage in other open-source projects
      ● stk (docs, paper): a Python library for building, manipulating, analyzing, and automatically designing molecules
      ● OpenFF: an open-source approach to better force fields
      ● gpusimilarity: GPU implementation of fingerprint similarity searching
      ● Samson Connect: software for adaptive modeling and simulation of nanosystems
      ● mol_frame: chemical structure handling for Dask and pandas DataFrames
      ● mmpdb 2.0: matched molecular pair database generation and analysis
      ● CheTo: chemical topic modeling
      ● OCEAN: web tool for target prediction of chemical structures, using ChEMBL as its data source
      ● Coot: software for macromolecular model building, model completion and validation
      ● DeepChem: deep learning toolkit for drug discovery
      ● sdf_viewer.py: an interactive SDF viewer
      ● sdf2ppt: reads an SDFile and displays the molecules as an image grid in a PowerPoint/OpenOffice presentation
      ● chemfp
      ● PYPL: a simple cartridge that lets you call Python scripts from Oracle PL/SQL
      ● WONKA: tool for analysis and interrogation of protein-ligand crystal structures
      ● OOMMPPAA: tool for directed synthesis and data analysis based on protein-ligand crystal structures
      ● RRDKit: RDKit integration for R
      ● chemicalite: SQLite integration for the RDKit
      ● django-rdkit: Django integration for the RDKit
      ● … more …
  15. Usage in commercial tools
      ● Cresset Software
      ● Dalke Scientific Software
      ● NextMove Software
      ● Schrödinger
      ● SCM
      ● Wolfram Research
      Disclaimer: this info is from public statements made by people from those companies. I have almost certainly forgotten someone.
  16. Usage in online tools
      ● ChEMBL
      ● ZINC
      ● Google Patents
      ● PDBe
      ● Enamine
      ● TeachOpenCADD (teaching material)
      Disclaimer: this info is from public statements made by people associated with those projects. I have almost certainly forgotten someone.
  17. Sustainability: the bus problem
      (Image: https://commons.wikimedia.org/wiki/File:Postauto_susten.jpg)
  18. Roadmap
  19. Some upcoming things¹
      ● Further improvements to the conformation generator
      ● JavaScript integration: run the RDKit in a web page
      ● Prototype Neo4j integration (GSoC 2019 project)
      ● Substructure search performance improvements
      ● Interactive molecules in Jupyter
      ● Improved S-group support
      ¹ This slide contains forward-looking statements...
  20. A couple of other topics
  21. Open source and chemistry
      Still a somewhat divisive topic. We've come a long way in the last decade or so.
  22. A bit of ranting: an impassioned plea
      When you use a piece of open-source software:
      - Let the developers know what you're doing
      - Acknowledge that you're using it (citations, on posters, in presentations, etc.)
      - Don't forget that you can contribute too!
        - Contributing is not just for developers
        - Good bug reports are really valuable things
        - Fixing/extending documentation is easy
        - Answering questions on mailing lists/forums
      - If you're in a position where you can provide funding: think about doing so
  23. Acknowledgements
      ● Everyone who has contributed code, questions, answers, bug reports, etc.
      ● People who have funded RDKit development (directly or indirectly)
      ● The others in our community who've been pushing the idea and adoption of open source
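
Slide 5 lists fingerprints and diversity picking among the toolkit's features. The following is a minimal plain-Python sketch of both ideas under simplifying assumptions: a fingerprint is modelled as a set of "on" bit indices rather than an RDKit bit vector, similarity is the Tanimoto coefficient, and picking is a naive greedy MaxMin loop. The names `tanimoto` and `maxmin_pick` are hypothetical illustrations, not RDKit API; in the RDKit itself you would reach for `DataStructs.TanimotoSimilarity` and the `MaxMinPicker` class in `rdkit.SimDivFilters.rdSimDivPickers`.

```python
from typing import FrozenSet, List

# Assumption: a fingerprint is modelled as the set of bit positions that are "on".
Fingerprint = FrozenSet[int]


def tanimoto(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    """Tanimoto similarity |A & B| / |A | B|; two empty fingerprints count as identical."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def maxmin_pick(fps: List[Fingerprint], n_pick: int, seed_index: int = 0) -> List[int]:
    """Greedy MaxMin diversity picking using Tanimoto distance (1 - similarity).

    Starts from fps[seed_index], then repeatedly adds the unpicked item whose
    distance to its nearest already-picked item is largest.
    """
    n_pick = min(n_pick, len(fps))
    if n_pick <= 0:
        return []
    picked = [seed_index]
    # min_dist[i]: distance from item i to the closest item picked so far
    min_dist = [1.0 - tanimoto(fp, fps[seed_index]) for fp in fps]
    while len(picked) < n_pick:
        candidates = [i for i in range(len(fps)) if i not in picked]
        best = max(candidates, key=lambda i: min_dist[i])  # lowest index wins ties
        picked.append(best)
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fp, fps[best]))
    return picked


fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({10, 11, 12}),
    frozenset({1, 2, 10, 11}),
]
print(tanimoto(fps[0], fps[1]))  # 3 shared bits / 5 distinct bits -> 0.6
print(maxmin_pick(fps, 3))       # -> [0, 2, 3]
```

The loop recomputes distances against each newly picked item on every pass, which is fine for a handful of toy fingerprints; the RDKit's C++ `MaxMinPicker` is the tool meant for real data sets.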

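Slide 9 notes that the default settings reject chemically unreasonable molecules and that this can be turned off. In the RDKit's Python API, `Chem.MolFromSmiles` returns `None` when sanitization fails (for example for `"CC(C)(C)(C)C"`, where one carbon has five neighbours), and passing `sanitize=False` skips those checks. The plain-Python toy below mimics only that "reject by default, opt out explicitly" behaviour for a single rule, maximum valence, on a heavy-atom graph with implicit hydrogens. The `MAX_VALENCE` table, `check_valences`, and `load_molecule` are hypothetical illustrations, not the RDKit's sanitization code, which performs many more steps.

```python
from typing import List, Optional, Tuple

Atoms = List[str]                   # element symbol per heavy atom
Bonds = List[Tuple[int, int, int]]  # (atom index, atom index, bond order)

# Hypothetical, deliberately tiny table of maximum allowed valences.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}


class ValenceError(ValueError):
    """Raised when an atom's summed bond orders exceed its allowed maximum."""


def check_valences(atoms: Atoms, bonds: Bonds) -> None:
    """Raise ValenceError for the first atom whose explicit valence is too high."""
    valence = [0] * len(atoms)
    for a, b, order in bonds:
        valence[a] += order
        valence[b] += order
    for idx, (element, v) in enumerate(zip(atoms, valence)):
        if v > MAX_VALENCE[element]:
            raise ValenceError(
                f"atom {idx} ({element}) has valence {v} > {MAX_VALENCE[element]}"
            )


def load_molecule(
    atoms: Atoms, bonds: Bonds, sanitize: bool = True
) -> Optional[Tuple[Atoms, Bonds]]:
    """Return the molecule, or None if sanitize is on and the valence check fails."""
    if sanitize:
        try:
            check_valences(atoms, bonds)
        except ValenceError:
            return None
    return atoms, bonds


# Atom 0 is a carbon bonded to five other carbons (the situation in "CC(C)(C)(C)C").
five_valent = (["C"] * 6, [(0, i, 1) for i in range(1, 6)])
ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])

print(load_molecule(*five_valent))                              # None: rejected by default
print(load_molecule(*five_valent, sanitize=False) is not None)  # True: check skipped
print(load_molecule(*ethanol) is not None)                      # True
```

As the slide's footnote suggests, the opt-out exists, but code that skips sanitization then has to cope with structures that violate basic valence rules.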