SlideShare une entreprise Scribd logo
1  sur  16
Télécharger pour lire hors ligne
Developing an open source
  community for cloud
      bioinformatics
        Brad Chapman
  http://bcbio.wordpress.com/


         8 June 2010
Overview

   1   Building open source bioinformatics
       communities is hard.
   2   Developer resources are a productive
       target.
   3   Framework: collaborative software
       images and data snapshots.
Motivation

    Open source
        OpenBio, Biopython
        Graduate school – developed distributed
        algorithm. Never reused.
    Work
        Startup: Automated biological pipelines.
        Research hospital: Democratization of
        analysis.
Filters in biological computing

          Working in same biological area

          Interest in developing open source code

          Technical abilities

          Your software is good enough
Successful bioinformatics

  Sean Eddy, HMMER
  ...the best software in the field is often an
  unplanned labor of love from a single
  investigator.
  http://selab.janelia.org/people/eddys/blog/?p=313
Recognizing contributions
Successful community projects

      OpenBio: BioPerl, Biopython, BioJava
      Bioconductor
 Common theme
 Aimed at developers.
 Biologists benefit indirectly.
Lowering activation energy
Establishing common platform

                           The solution
                    =      to all our
                           problems


    Remove install and distribution barriers
    Building block for scaling
Existing cloud bioinformatics work

      JCVI Cloud BioLinux
      bioperl-max
      MachetEC2
      Debian Med
  Overlapping set of useful functionality.
Integrated community solution

      Inclusive but configurable
      Easy to contribute
      Automated
 Bootstrap bare machine to fully ready
 distributed AMI.
 http://github.com/chapmanb/bcbb/tree/master/ec2/
 biolinux/
Inclusive but configurable
  # Top level YAML configuration file specifying
  # groups of programs to be installed.
  packages:
    - python
    - r
    - erlang
    - databases
    - viz
    - bio_search
    - bio_alignment
    - bio_nextgen
    - bio_sequencing
    - bio_visualization
    - phylogeny
  libraries:
    - r-libs
    - python-libs
Easy to contribute
 # Configuration file defining R specific libraries that
 # are installed via CRAN and Bioconductor.
 cranrepo: http://software.rc.fas.harvard.edu/mirrors/R/
 cran:
  - ggplot2
  - rjson
  - sqldf
  - NMF
  - ape
 biocrepo: http://bioconductor.org/biocLite.R
 bioc:
  - ShortRead
  - BSgenome
  - edgeR
  - GOstats
  - biomaRt
  - Rsamtools
Automated

 def install_biolinux():
     ec2_ubuntu_environment()
     pkg_install, lib_install = _read_main_config()
     _apt_packages(pkg_install)
     _do_library_installs(lib_install)

 def _ruby_library_installer(config):
     for gem in config[’gems’]:
         sudo("gem install %s" % gem)


 Fabric: http://docs.fabfile.org/
Ready to use biological data

 % ls /referenceGenomes/            % ls Hsapiens/hg18
 Athaliana                          arachne
 Celegans                           bowtie
 Dmelanogaster                      bwa
 Ecoli                              eland
 Hsapiens                           maq
 Mmusculus                          seq
 Msmegmatis                         snps
 Mtuberculosis_H37Rv                ucsc
 Paeruginosa_UCBPP-PA14
 phiX174
 Rnorvegicus
 Scerevisiae
 Xtropicalis
  http://github.com/chapmanb/bcbb/blob/master/galaxy/galaxy_fabfile.py
Organization: Codefest 2010




 www.open-bio.org/wiki/Codefest_2010

Contenu connexe

Similaire à Developing an open source community for cloud bioinformatics

Chapman_publishingweb_BOSC2009
Chapman_publishingweb_BOSC2009Chapman_publishingweb_BOSC2009
Chapman_publishingweb_BOSC2009bosc
 
Reproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and AndurilReproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and AndurilChristian Frech
 
BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)Mark Jensen
 
BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)Mark Jensen
 
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformaticsStephen Turner
 
Making Use of NGS Data: From Reads to Trees and Annotations
Making Use of NGS Data: From Reads to Trees and AnnotationsMaking Use of NGS Data: From Reads to Trees and Annotations
Making Use of NGS Data: From Reads to Trees and AnnotationsJoão André Carriço
 
BioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge
BioThings API: Building a FAIR API Ecosystem for Biomedical KnowledgeBioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge
BioThings API: Building a FAIR API Ecosystem for Biomedical KnowledgeChunlei Wu
 
B Chapman - Codefest BOSC2012
B Chapman - Codefest BOSC2012B Chapman - Codefest BOSC2012
B Chapman - Codefest BOSC2012Jan Aerts
 
Deploying Straight to Production
Deploying Straight to ProductionDeploying Straight to Production
Deploying Straight to ProductionMark Baker
 
Paralyzing Bioinformatics Applications Using Conducive Hadoop Cluster
Paralyzing Bioinformatics Applications Using Conducive Hadoop ClusterParalyzing Bioinformatics Applications Using Conducive Hadoop Cluster
Paralyzing Bioinformatics Applications Using Conducive Hadoop ClusterIOSR Journals
 
SFScon 2020 - Paolo Boldi - Software Ecosystems as Networks Advances on the F...
SFScon 2020 - Paolo Boldi - Software Ecosystems as Networks Advances on the F...SFScon 2020 - Paolo Boldi - Software Ecosystems as Networks Advances on the F...
SFScon 2020 - Paolo Boldi - Software Ecosystems as Networks Advances on the F...South Tyrol Free Software Conference
 
20120907 microbiome-intro
20120907 microbiome-intro20120907 microbiome-intro
20120907 microbiome-introLeo Lahti
 
Devoxx 2014 [incomplete] summary
Devoxx 2014 [incomplete] summaryDevoxx 2014 [incomplete] summary
Devoxx 2014 [incomplete] summaryArtem Oboturov
 
Software Pipelines: The Good, The Bad and The Ugly
Software Pipelines: The Good, The Bad and The UglySoftware Pipelines: The Good, The Bad and The Ugly
Software Pipelines: The Good, The Bad and The UglyJoão André Carriço
 
Collaborations in the Extreme: 
The rise of open code development in the scie...
Collaborations in the Extreme: 
The rise of open code development in the scie...Collaborations in the Extreme: 
The rise of open code development in the scie...
Collaborations in the Extreme: 
The rise of open code development in the scie...Kelle Cruz
 
Advanced computationalsyntbio
Advanced computationalsyntbioAdvanced computationalsyntbio
Advanced computationalsyntbioNatalio Krasnogor
 

Similaire à Developing an open source community for cloud bioinformatics (20)

Chapman_publishingweb_BOSC2009
Chapman_publishingweb_BOSC2009Chapman_publishingweb_BOSC2009
Chapman_publishingweb_BOSC2009
 
Reproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and AndurilReproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and Anduril
 
BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)
 
BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)
 
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
 
Bioclipse
BioclipseBioclipse
Bioclipse
 
Making Use of NGS Data: From Reads to Trees and Annotations
Making Use of NGS Data: From Reads to Trees and AnnotationsMaking Use of NGS Data: From Reads to Trees and Annotations
Making Use of NGS Data: From Reads to Trees and Annotations
 
BioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge
BioThings API: Building a FAIR API Ecosystem for Biomedical KnowledgeBioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge
BioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge
 
B Chapman - Codefest BOSC2012
B Chapman - Codefest BOSC2012B Chapman - Codefest BOSC2012
B Chapman - Codefest BOSC2012
 
Deploying Straight to Production
Deploying Straight to ProductionDeploying Straight to Production
Deploying Straight to Production
 
Paralyzing Bioinformatics Applications Using Conducive Hadoop Cluster
Paralyzing Bioinformatics Applications Using Conducive Hadoop ClusterParalyzing Bioinformatics Applications Using Conducive Hadoop Cluster
Paralyzing Bioinformatics Applications Using Conducive Hadoop Cluster
 
SFScon 2020 - Paolo Boldi - Software Ecosystems as Networks Advances on the F...
SFScon 2020 - Paolo Boldi - Software Ecosystems as Networks Advances on the F...SFScon 2020 - Paolo Boldi - Software Ecosystems as Networks Advances on the F...
SFScon 2020 - Paolo Boldi - Software Ecosystems as Networks Advances on the F...
 
20120907 microbiome-intro
20120907 microbiome-intro20120907 microbiome-intro
20120907 microbiome-intro
 
Devoxx 2014 [incomplete] summary
Devoxx 2014 [incomplete] summaryDevoxx 2014 [incomplete] summary
Devoxx 2014 [incomplete] summary
 
Overview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data AnalysisOverview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data Analysis
 
HPC For Bioinformatics
HPC For BioinformaticsHPC For Bioinformatics
HPC For Bioinformatics
 
Software Pipelines: The Good, The Bad and The Ugly
Software Pipelines: The Good, The Bad and The UglySoftware Pipelines: The Good, The Bad and The Ugly
Software Pipelines: The Good, The Bad and The Ugly
 
Collaborations in the Extreme: 
The rise of open code development in the scie...
Collaborations in the Extreme: 
The rise of open code development in the scie...Collaborations in the Extreme: 
The rise of open code development in the scie...
Collaborations in the Extreme: 
The rise of open code development in the scie...
 
Bosc2011 ntino-krampis-full
Bosc2011 ntino-krampis-fullBosc2011 ntino-krampis-full
Bosc2011 ntino-krampis-full
 
Advanced computationalsyntbio
Advanced computationalsyntbioAdvanced computationalsyntbio
Advanced computationalsyntbio
 

Plus de Brad Chapman

Amazon resource for bioinformatics
Amazon resource for bioinformaticsAmazon resource for bioinformatics
Amazon resource for bioinformaticsBrad Chapman
 
Developing distributed analysis pipelines with shared community resources usi...
Developing distributed analysis pipelines with shared community resources usi...Developing distributed analysis pipelines with shared community resources usi...
Developing distributed analysis pipelines with shared community resources usi...Brad Chapman
 
Biopython at BOSC 2010
Biopython at BOSC 2010Biopython at BOSC 2010
Biopython at BOSC 2010Brad Chapman
 
GATK recalibration plot
GATK recalibration plotGATK recalibration plot
GATK recalibration plotBrad Chapman
 
Next-generation sequencing request management system in Galaxy
Next-generation sequencing request management system in GalaxyNext-generation sequencing request management system in Galaxy
Next-generation sequencing request management system in GalaxyBrad Chapman
 
BioHackathon 2010 Intro
BioHackathon 2010 IntroBioHackathon 2010 Intro
BioHackathon 2010 IntroBrad Chapman
 

Plus de Brad Chapman (6)

Amazon resource for bioinformatics
Amazon resource for bioinformaticsAmazon resource for bioinformatics
Amazon resource for bioinformatics
 
Developing distributed analysis pipelines with shared community resources usi...
Developing distributed analysis pipelines with shared community resources usi...Developing distributed analysis pipelines with shared community resources usi...
Developing distributed analysis pipelines with shared community resources usi...
 
Biopython at BOSC 2010
Biopython at BOSC 2010Biopython at BOSC 2010
Biopython at BOSC 2010
 
GATK recalibration plot
GATK recalibration plotGATK recalibration plot
GATK recalibration plot
 
Next-generation sequencing request management system in Galaxy
Next-generation sequencing request management system in GalaxyNext-generation sequencing request management system in Galaxy
Next-generation sequencing request management system in Galaxy
 
BioHackathon 2010 Intro
BioHackathon 2010 IntroBioHackathon 2010 Intro
BioHackathon 2010 Intro
 

Dernier

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 

Dernier (20)

Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Developing an open source community for cloud bioinformatics

  • 1. Developing an open source community for cloud bioinformatics Brad Chapman http://bcbio.wordpress.com/ 8 June 2010
  • 2. Overview 1 Building open source bioinformatics communities is hard. 2 Developer resources are a productive target. 3 Framework: collaborative software images and data snapshots.
  • 3. Motivation Open source OpenBio, Biopython Graduate school – developed distributed algorithm. Never reused. Work Startup: Automated biological pipelines. Research hospital: Democratization of analysis.
  • 4. Filters in biological computing Working in same biological area Interest in developing open source code Technical abilities Your software is good enough
  • 5. Successful bioinformatics Sean Eddy, HMMER ...the best software in the field is often an unplanned labor of love from a single investigator. http://selab.janelia.org/people/eddys/blog/?p=313
  • 7. Successful community projects OpenBio: BioPerl, Biopython, BioJava Bioconductor Common theme Aimed at developers. Biologists benefit indirectly.
  • 9. Establishing common platform The solution = to all our problems Remove install and distribution barriers Building block for scaling
  • 10. Existing cloud bioinformatics work JCVI Cloud BioLinux bioperl-max MachetEC2 Debian Med Overlapping set of useful functionality.
  • 11. Integrated community solution Inclusive but configurable Easy to contribute Automated Bootstrap bare machine to fully ready distributed AMI. http://github.com/chapmanb/bcbb/tree/master/ec2/ biolinux/
  • 12. Inclusive but configurable # Top level YAML configuration file specifying # groups of programs to be installed. packages: - python - r - erlang - databases - viz - bio_search - bio_alignment - bio_nextgen - bio_sequencing - bio_visualization - phylogeny libraries: - r-libs - python-libs
  • 13. Easy to contribute # Configuration file defining R specific libraries that # are installed via CRAN and Bioconductor. cranrepo: http://software.rc.fas.harvard.edu/mirrors/R/ cran: - ggplot2 - rjson - sqldf - NMF - ape biocrepo: http://bioconductor.org/biocLite.R bioc: - ShortRead - BSgenome - edgeR - GOstats - biomaRt - Rsamtools
  • 14. Automated def install_biolinux(): ec2_ubuntu_environment() pkg_install, lib_install = _read_main_config() _apt_packages(pkg_install) _do_library_installs(lib_install) def _ruby_library_installer(config): for gem in config[’gems’]: sudo("gem install %s" % gem) Fabric: http://docs.fabfile.org/
  • 15. Ready to use biological data % ls /referenceGenomes/ % ls Hsapiens/hg18 Athaliana arachne Celegans bowtie Dmelanogaster bwa Ecoli eland Hsapiens maq Mmusculus seq Msmegmatis snps Mtuberculosis_H37Rv ucsc Paeruginosa_UCBPP-PA14 phiX174 Rnorvegicus Scerevisiae Xtropicalis http://github.com/chapmanb/bcbb/blob/master/galaxy/galaxy_fabfile.py
  • 16. Organization: Codefest 2010 www.open-bio.org/wiki/Codefest_2010