Developing an open source
  community for cloud
      bioinformatics
        Brad Chapman
  http://bcbio.wordpress.com/


         8 June 2010
Overview

   1   Building open source bioinformatics
       communities is hard.
   2   Developer resources are a productive
       target.
   3   Framework: collaborative software
       images and data snapshots.
Motivation

    Open source
        OpenBio, Biopython
        Graduate school: developed a distributed
        algorithm that was never reused.
    Work
        Startup: Automated biological pipelines.
        Research hospital: Democratization of
        analysis.
Filters in biological computing

          Working in same biological area

          Interest in developing open source code

          Technical abilities

          Your software is good enough
Successful bioinformatics

  Sean Eddy, HMMER
  ...the best software in the field is often an
  unplanned labor of love from a single
  investigator.
  http://selab.janelia.org/people/eddys/blog/?p=313
Recognizing contributions
Successful community projects

      OpenBio: BioPerl, Biopython, BioJava
      Bioconductor
 Common theme
 Aimed at developers.
 Biologists benefit indirectly.
Lowering activation energy
Establishing common platform

    [image] = The solution to all our problems


    Remove install and distribution barriers
    Building block for scaling
Existing cloud bioinformatics work

      JCVI Cloud BioLinux
      bioperl-max
      MachetEC2
      Debian Med
  Overlapping set of useful functionality.
Integrated community solution

      Inclusive but configurable
      Easy to contribute
      Automated
 Bootstrap bare machine to fully ready
 distributed AMI.
 http://github.com/chapmanb/bcbb/tree/master/ec2/biolinux/
Inclusive but configurable
  # Top level YAML configuration file specifying
  # groups of programs to be installed.
  packages:
    - python
    - r
    - erlang
    - databases
    - viz
    - bio_search
    - bio_alignment
    - bio_nextgen
    - bio_sequencing
    - bio_visualization
    - phylogeny
  libraries:
    - r-libs
    - python-libs
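A minimal sketch of how a top-level file like this might be consumed, assuming the YAML has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`). The function name `read_main_config` and its `skip` parameter are hypothetical, chosen to show how a user could trim the inclusive default list; they are not taken from the bcbb code.

```python
from typing import Iterable


def read_main_config(config: dict, skip: Iterable[str] = ()) -> tuple[list[str], list[str]]:
    """Split a parsed top-level config into package groups and library groups.

    Hypothetical helper: `skip` lets a user drop groups they do not want,
    so the shared default file stays inclusive while remaining configurable.
    """
    skipped = set(skip)
    packages = [g for g in config.get("packages", []) if g not in skipped]
    libraries = [g for g in config.get("libraries", []) if g not in skipped]
    return packages, libraries


# Dict equivalent of the YAML file above.
config = {
    "packages": ["python", "r", "erlang", "databases", "viz",
                 "bio_search", "bio_alignment", "bio_nextgen",
                 "bio_sequencing", "bio_visualization", "phylogeny"],
    "libraries": ["r-libs", "python-libs"],
}
pkgs, libs = read_main_config(config, skip=["erlang", "python-libs"])
print(libs)  # ['r-libs']
```

Keeping group names rather than individual packages in the top-level file means a contributor only edits one group file to add a tool, while users opt in or out at the group level.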
Easy to contribute
 # Configuration file defining R specific libraries that
 # are installed via CRAN and Bioconductor.
 cranrepo: http://software.rc.fas.harvard.edu/mirrors/R/
 cran:
  - ggplot2
  - rjson
  - sqldf
  - NMF
  - ape
 biocrepo: http://bioconductor.org/biocLite.R
 bioc:
  - ShortRead
  - BSgenome
  - edgeR
  - GOstats
  - biomaRt
  - Rsamtools
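The R configuration can be turned into an install script mechanically. A sketch, assuming the parsed YAML is a dict with the keys shown; it emits 2010-era R calls (`install.packages` with an explicit `repos` mirror, and `source()` of `biocLite.R` followed by `biocLite()`). Current Bioconductor releases use `BiocManager::install()` instead, so treat the generated commands as illustrative. The function name `r_install_script` is hypothetical.

```python
def r_install_script(config: dict) -> str:
    """Build an R script that installs CRAN and Bioconductor packages.

    Hypothetical helper mirroring the cranrepo/cran/biocrepo/bioc keys of
    the YAML file; the real bcbb installer may work differently.
    """
    lines = []
    repo = config["cranrepo"]
    for pkg in config.get("cran", []):
        # Install each CRAN package from the configured mirror.
        lines.append(f'install.packages("{pkg}", repos="{repo}")')
    bioc = config.get("bioc", [])
    if bioc:
        # biocLite.R defines the biocLite() installer (pre-BiocManager era).
        lines.append(f'source("{config["biocrepo"]}")')
        for pkg in bioc:
            lines.append(f'biocLite("{pkg}")')
    return "\n".join(lines) + "\n"


r_config = {
    "cranrepo": "http://software.rc.fas.harvard.edu/mirrors/R/",
    "cran": ["ggplot2", "rjson"],
    "biocrepo": "http://bioconductor.org/biocLite.R",
    "bioc": ["ShortRead", "edgeR"],
}
script = r_install_script(r_config)
print(script)
```

The resulting text could then be written to a file and run with `Rscript` on the target machine.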
Automated

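 # Fabric 1.x API
 from fabric.api import sudo

 def install_biolinux():
     """Bootstrap a bare Ubuntu machine into a configured BioLinux image."""
     # Helper functions below are defined elsewhere in the fabfile (not shown).
     ec2_ubuntu_environment()
     pkg_install, lib_install = _read_main_config()
     _apt_packages(pkg_install)
     _do_library_installs(lib_install)

 def _ruby_library_installer(config):
     """Install every Ruby gem listed under the 'gems' key of a config."""
     for gem in config['gems']:
         sudo("gem install %s" % gem)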


 Fabric: http://docs.fabfile.org/
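One way the library step could dispatch to per-language installers such as `_ruby_library_installer`, sketched under assumptions: the group names (`ruby-libs`, `python-libs`), the `pypi` config key, the `INSTALLERS` registry, and the `run` callback are all hypothetical. `run` stands in for Fabric's `sudo` so the sketch can be exercised as a dry run without a remote host.

```python
from typing import Callable

Runner = Callable[[str], None]


def ruby_installer(config: dict, run: Runner) -> None:
    # Same logic as _ruby_library_installer, with the command runner injected.
    for gem in config.get("gems", []):
        run("gem install %s" % gem)


def python_installer(config: dict, run: Runner) -> None:
    # Hypothetical: install each listed Python library with pip.
    for lib in config.get("pypi", []):
        run("pip install %s" % lib)


# Hypothetical mapping from library group name to installer function.
INSTALLERS = {
    "ruby-libs": ruby_installer,
    "python-libs": python_installer,
}


def do_library_installs(groups: list[str], configs: dict, run: Runner) -> None:
    """Run the installer registered for each requested library group."""
    for group in groups:
        installer = INSTALLERS.get(group)
        if installer is None:
            raise ValueError("no installer registered for %s" % group)
        installer(configs.get(group, {}), run)


commands = []
do_library_installs(
    ["python-libs", "ruby-libs"],
    {"python-libs": {"pypi": ["biopython"]}, "ruby-libs": {"gems": ["bio"]}},
    commands.append,  # dry run: collect commands instead of executing them
)
print(commands)  # ['pip install biopython', 'gem install bio']
```

A registry keeps contributions local: supporting a new language means adding one installer function and one dictionary entry, without touching the top-level task.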
Ready to use biological data

 % ls /referenceGenomes/
 Athaliana  Celegans  Dmelanogaster  Ecoli  Hsapiens  Mmusculus  Msmegmatis
 Mtuberculosis_H37Rv  Paeruginosa_UCBPP-PA14  phiX174  Rnorvegicus
 Scerevisiae  Xtropicalis

 % ls Hsapiens/hg18
 arachne  bowtie  bwa  eland  maq  seq  snps  ucsc
  http://github.com/chapmanb/bcbb/blob/master/galaxy/galaxy_fabfile.py
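Because every genome follows the same `<root>/<organism>/<build>/<index>` layout shown in the listing, tools can locate a prepared index by convention rather than by per-machine configuration. A minimal sketch assuming that layout; the helpers `index_dir` and `available_indexes` are hypothetical, and the usage builds a throwaway directory tree instead of reading `/referenceGenomes/`.

```python
import tempfile
from pathlib import Path


def index_dir(root: str, organism: str, build: str, aligner: str) -> Path:
    """Return the directory holding a prebuilt index, failing loudly if absent."""
    path = Path(root) / organism / build / aligner
    if not path.is_dir():
        raise FileNotFoundError(
            "no %s index for %s/%s under %s" % (aligner, organism, build, root))
    return path


def available_indexes(root: str, organism: str, build: str) -> list[str]:
    """List the index and data subdirectories prepared for one genome build."""
    build_dir = Path(root) / organism / build
    return sorted(p.name for p in build_dir.iterdir() if p.is_dir())


# Usage against a temporary tree that mimics part of the hg18 listing.
with tempfile.TemporaryDirectory() as root:
    for sub in ["bwa", "bowtie", "seq"]:
        (Path(root) / "Hsapiens" / "hg18" / sub).mkdir(parents=True)
    print(available_indexes(root, "Hsapiens", "hg18"))  # ['bowtie', 'bwa', 'seq']
    print(index_dir(root, "Hsapiens", "hg18", "bwa").name)  # bwa
```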
Organization: Codefest 2010




 www.open-bio.org/wiki/Codefest_2010
