“Cloud BioLinux: Standardized, Pre-Configured and On-Demand Computing for Genomics and Beyond”. Genomics Standards Consortium Conference 2010, European Bioinformatics Institute, Hinxton, UK
Ntino Krampis GSC 2011
1. Cloud BioLinux: Standardized, Pre-Configured and On-Demand
Computing for Genomics and Beyond
Ntino Krampis, PhD
GSC 2011
Hinxton, UK
2. Expensive sequencing and large organizations
● large sequencing centers, multi-million, broad-impact sequencing projects
● dedicated bioinformatics department, coordination with other centers
Commodity sequencing and small labs
● small-form-factor, bench-top sequencer available: GS Junior by 454
● sequencing as a standard technique in basic biology and genetics research
● RNA-seq and ChIP-seq, and each biologist will be tackling a metagenome
3. “Bioinformatics nation is a land of city-states” Lincoln Stein
● smaller labs building small-scale bioinformatics infrastructures
● duplication of effort in compiling and installing software tools
● some labs have no hardware, expertise, or time to install and run software
● early pioneer in this area was NEBC BioLinux ( tinyurl.com/BioLinux-NEBC )
● desktop Linux with 100+ pre-configured bioinformatics tools
● example: glimmer, hmmer, phylip, rasmol, genespring, clustalw, EMBOSS
how about large-scale sequence datasets?
4. Cloud BioLinux
standardized, pre-configured and on-demand bioinformatics computing on the cloud
● JCVI's cloud computing expertise
● NEBC's bioinformatics software repository
● community effort – ISMB / BOSC 2010
● standardized, pre-configured Virtual Machine (VM, image)
● VM: emulates a computer server, encapsulates operating system, software libraries and bioinformatics tools
● Amazon EC2 computational capacity as a utility, on-demand
● rich interface through a remote desktop client
tinyurl.com/CloudBioLinux-JCVI
http://cloudbiolinux.com
5. Cloud BioLinux and Genomic Standards
framework to distribute bioinformatics tools, data and analysis results
create cloud VM / images with standardized software configurations
● customize Cloud BioLinux VMs, based on community requirements
● share customized VMs with collaborators, avoiding effort duplication
● mix and match software from NEBC or other sources (Debian Med, Scientific Linux, etc.)
whole system snapshot exchange (Dudley and Butte 2010)
● capture the state of the computing system and data
● software execution parameters and “massaged” input datasets
● save into cloud VM / image and share along with analysis results
democratize access to computing resources
● large-scale computing independently of institutional or geographic boundaries
● only need a desktop computer with internet access
6. Cloud BioLinux and Genomic Standards
create cloud VM / images with standard software configurations
● framework to describe software components in cloud VM / image
● based on python-fabric automated deployment tool
● software components listed in simple text files
● edit the files to mix and match software according to your community needs
● community members use files to share descriptions of customized systems
● start with a bare-bones VM, fabric downloads and installs specified software
● labs with sensitive data and capacity for private clouds: works identically on Amazon EC2 or Eucalyptus open-source cloud
tinyurl.com/python-fabric open.eucalyptus.com
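The idea of listing software components in simple text files can be sketched as below. This is a minimal illustration only, not the Cloud BioLinux fabric scripts: the file layout (one package name per line, `#` for comments), the function names and the example list are all assumptions made for this sketch, and the install command is only built as a string rather than executed on a VM.

```python
def parse_package_list(text):
    """Return package names from a plain-text list, one name per line.

    Assumed (hypothetical) format: blank lines are ignored and anything
    after '#' is treated as a comment.
    """
    packages = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if name:
            packages.append(name)
    return packages


def build_install_command(packages):
    """Build a single apt-get command line for the listed packages."""
    if not packages:
        raise ValueError("no packages listed")
    return "apt-get install -y " + " ".join(packages)


# Hypothetical package list; editing this text is the "mix and match" step.
EXAMPLE_LIST = """\
# tools for a small sequence-analysis VM
hmmer
clustalw
# phylip   (commented out, so not installed)
emboss
"""

packages = parse_package_list(EXAMPLE_LIST)
command = build_install_command(packages)
print(command)
```

In an automated deployment, a tool such as fabric would run a command like this on the bare-bones VM; sharing the text file is then enough to describe the customized system.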
7. software domains in bioinformatics: next-gen sequencing, de novo assembly, annotation, phylogeny, molecular structures, gene expression analysis
high-level configuration describing software groups
for each group, individual bioinformatics tools
tinyurl.com/CloudBioLinux-github
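The two-level layout described on this slide (software groups at the top, individual tools under each group) might look like the following sketch. The group names, the assignment of tools to groups and the `resolve_tools` helper are hypothetical and chosen only for illustration; the actual configuration files live in the project repository linked above.

```python
# Hypothetical mapping from software group to individual tools.
# The grouping below is illustrative, not the project's real configuration.
SOFTWARE_GROUPS = {
    "phylogeny": ["phylip"],
    "annotation": ["glimmer", "hmmer", "emboss"],
    "molecular_structures": ["rasmol"],
}


def resolve_tools(selected_groups, groups=SOFTWARE_GROUPS):
    """Expand the selected groups into an ordered, duplicate-free tool list."""
    tools = []
    for group in selected_groups:
        if group not in groups:
            raise KeyError(f"unknown software group: {group}")
        for tool in groups[group]:
            if tool not in tools:  # keep first occurrence only
                tools.append(tool)
    return tools


selected = ["phylogeny", "annotation"]
print(resolve_tools(selected))
```

Selecting groups rather than individual tools keeps the high-level file short, while the per-group lists remain the place where a community adds or removes specific software.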
8. Cloud BioLinux and Genomic Standards
whole system snapshot exchange
simply sign up at aws.amazon.com, then go to aws.amazon.com/console and see http://tinyurl.com/cloud-biolinux-tutorial
9. Cloud BioLinux and Genomic Standards
whole system snapshot exchange
● find Cloud BioLinux using ID
● enter desired password for remote desktop login
● leave all other settings at default
http://tinyurl.com/cloud-biolinux-tutorial
10. free remote desktop client:
nomachine.com/download.php
simply enter VM IP address and your password
11. What if I want to share my alignments with a collaborator?
save your data as a new VM
$0.10 / GB / month
at 15 GB, it costs $1.50 / month
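The storage cost quoted on this slide is a simple product of size and unit price. A small sketch of that arithmetic, with a hypothetical helper name and the slide's rate as the default (current pricing may differ):

```python
def monthly_storage_cost(size_gb, price_per_gb_month=0.10):
    """Monthly cost in dollars of storing a saved VM of the given size."""
    return round(size_gb * price_per_gb_month, 2)


cost_15gb = monthly_storage_cost(15)
print(f"${cost_15gb:.2f} / month")  # $1.50 / month
```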
12. Cloud BioLinux and Genomic Standards
whole system snapshot exchange
share your analysis results: publicly or only with your collaborators
authorized users can access the cloud VM / image with all the software, data, analysis results
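The choice between public sharing and sharing with named collaborators could be expressed as below. This sketch assumes sharing is done through EC2 image launch permissions and builds a dictionary modeled on the `LaunchPermission` parameter of the EC2 ModifyImageAttribute request; it makes no AWS call, and the helper name and the account ID are placeholders, not values from the slides.

```python
def launch_permission(share_publicly=False, account_ids=()):
    """Build a launch-permission change for a saved VM / image.

    share_publicly=True grants access to everyone; otherwise access is
    granted only to the listed AWS account IDs (the collaborators).
    """
    if share_publicly:
        return {"Add": [{"Group": "all"}]}
    if not account_ids:
        raise ValueError("list at least one collaborator account ID")
    return {"Add": [{"UserId": str(account)} for account in account_ids]}


# Placeholder account ID, for illustration only.
collaborators_only = launch_permission(account_ids=["123456789012"])
public = launch_permission(share_publicly=True)
print(collaborators_only)
print(public)
```

Keeping the two cases in one helper makes the default explicit: nothing is shared publicly unless that is requested.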
13. Cloud BioLinux and Genomic Standards
whole system snapshot exchange
[diagram: researcher A: start VM / image, perform analysis, snapshot, share; researcher B: start VM / image, perform analysis, snapshot, share]
14. Cloud BioLinux
The future
● expand community, receive feedback, add more software to the VM
● analysis pipelines that are used by large sequencing centers
● actively seeking funding to put major effort into development
● 2011 ISMB/BOSC in Vienna, Austria, http://metalab.at/
● tinyurl.com/cloudbiolinux-lists or community@cloudbiolinux.com
15. Acknowledgments & Credits
Brad Chapman - development of the fabric scripts and community organizer
Tim Booth, Bela Tiwari, Dawn Field – BioLinux 6.0 development and EC2 documentation
Deepak Singh and AWS - education grant supporting ISMB / BOSC workshop
Justin Johnson – community and sponsorship of cloudbiolinux.com
J. Craig Venter Inst. - time allowed to work on an open-source project
D. Gomez, E. Navarro, J. Shao, I. Singh – JCVI technology innovation
Members of the Cloud BioLinux community:
Enis Afgan
Michael Heuer
Richard Holland
Mark Jensen
Dave Messina
Steffen Möller
Roman Valls
Thank you!