“Cloud BioLinux: Standardized, Pre-Configured and On-Demand Computing for Genomics and Beyond”. Genomics Standards Consortium Conference 2010, European Bioinformatics Institute, Hinxton, UK
Ntino Krampis GSC 2011
1. Cloud BioLinux: Standardized, Pre-Configured and On-Demand
Computing for Genomics and Beyond
Ntino Krampis, PhD
GSC 2011
Hinxton, UK
2. Expensive sequencing and large organizations
Commodity sequencing and small labs
● large sequencing center, multi-million, broad-impact sequencing projects
● dedicated bioinformatics department, coordination with other centers
● small-factor, bench-top sequencer available: GS Junior by 454
● sequencing as a standard technique in basic biology and genetics research
● RNA-seq and ChIP-seq, and each biologist will be tackling a metagenome
3. “Bioinformatics nation is a land of city-states” Lincoln Stein
● smaller labs building small-scale bioinformatics infrastructures
● duplication of effort in compiling and installing software tools
● some labs have no hardware, expertise, or time to install and run software
● early pioneer in this area was NEBC BioLinux ( tinyurl.com/BioLinux-NEBC )
● desktop Linux with 100+ pre-configured bioinformatics tools
● example: glimmer, hmmer, phylip, rasmol, genespring, clustalw, EMBOSS
how about large-scale sequence datasets?
4. Cloud BioLinux
standardized, pre-configured and on-demand bioinformatics computing on the cloud
● JCVI's cloud computing expertise
● NEBC's bioinformatics software repository
● community effort – ISMB / BOSC 2010
● standardized, pre-configured Virtual Machine (VM, image)
● VM: emulates a computer server, encapsulates operating system, software libraries and bioinformatics tools
● Amazon EC2 computational capacity as a utility, on-demand
● rich interface through a remote desktop client
tinyurl.com/CloudBioLinux-JCVI
http://cloudbiolinux.com
5. Cloud BioLinux and Genomic Standards
framework to distribute bioinformatics tools, data and analysis results
create cloud VM / images with standardized software configurations
● customize Cloud BioLinux VMs, based on community requirements
● share customized VMs with collaborators, avoiding effort duplication
● mix and match software from NEBC or other (DebianMed, Scientific Linux etc.)
whole system snapshot exchange (Dudley and Butte 2010)
● capture the state of the computing system and data
● software execution parameters and “massaged” input datasets
● save into cloud VM / image and share along with analysis results
democratize access to computing resources
● large-scale computing independently of institutional or geographic boundaries
● only need a desktop computer with internet access
6. Cloud BioLinux and Genomic Standards
create cloud VM / images with standard software configurations
● framework to describe software components in cloud VM / image
● based on python-fabric automated deployment tool
● software components listed in simple text files
● edit the files to mix and match software according to your community needs
● community members use files to share descriptions of customized systems
● start with a bare-bones VM, fabric downloads and installs specified software
● labs with sensitive data and capacity for private clouds: works identically on Amazon EC2 or the Eucalyptus open-source cloud
tinyurl.com/python-fabric open.eucalyptus.com
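The text-file approach on this slide can be sketched in a few lines of Python. This is only an illustration: it assumes a hypothetical plain-text format (one package name per line, `#` for comments) and builds an `apt-get` command string. The actual Cloud BioLinux file layout and fabric scripts may differ, and the package names are examples only.

```python
def parse_package_list(text):
    """Return package names from a plain-text list, skipping blanks and comments."""
    packages = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()  # drop trailing comments
        if name and name not in packages:     # ignore blanks and duplicates
            packages.append(name)
    return packages


def build_install_command(packages):
    """Build the shell command a deployment tool could run on a bare-bones VM."""
    if not packages:
        return ""
    return "apt-get install -y " + " ".join(packages)


# Hypothetical package list; edit it to mix and match software.
EXAMPLE_LIST = """
# sequence analysis
hmmer
clustalw
emboss   # EMBOSS suite
hmmer
"""

if __name__ == "__main__":
    print(build_install_command(parse_package_list(EXAMPLE_LIST)))
```

Keeping the description in a plain text file means a customized system can be shared by exchanging the file rather than the whole VM.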
7. software domains in bioinformatics: nextgen sequencing, de novo assembly, annotation, phylogeny, molecular structures, gene expression analysis
high-level configuration describing software groups
for each group, individual bioinformatics tools
tinyurl.com/CloudBioLinux-github
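The two-level layout (software groups, then individual tools per group) might look like the sketch below. The group names, the assignment of tools to groups, and the `select_tools` helper are all assumptions made for illustration; the real configuration lives in the repository linked above.

```python
# Hypothetical two-level configuration: software groups, then tools per group.
SOFTWARE_GROUPS = {
    "phylogeny": ["phylip", "clustalw"],
    "annotation": ["glimmer", "hmmer", "EMBOSS"],
    "molecular_structures": ["rasmol"],
}


def select_tools(groups, enabled):
    """Return the tools for the enabled groups, in order and without duplicates."""
    tools = []
    for group in enabled:
        if group not in groups:
            raise KeyError(f"unknown software group: {group}")
        for tool in groups[group]:
            if tool not in tools:
                tools.append(tool)
    return tools


if __name__ == "__main__":
    # Enable only the groups a given community needs.
    print(select_tools(SOFTWARE_GROUPS, ["phylogeny", "molecular_structures"]))
```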
8. Cloud BioLinux and Genomic Standards
whole system snapshot exchange
simply sign up at aws.amazon.com, then aws.amazon.com/console and http://tinyurl.com/cloud-biolinux-tutorial
9. Cloud BioLinux and Genomic Standards
whole system snapshot exchange
find Cloud BioLinux using its ID
enter desired password for remote desktop login
leave all other settings at their defaults
http://tinyurl.com/cloud-biolinux-tutorial
10. free remote desktop client: nomachine.com/download.php
simply enter VM IP address and your password
11. What if I want to share my alignments with a collaborator?
save your data as a new VM
$0.10 / GB / month
at 15 GB, it costs $1.50 / month
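The storage arithmetic can be checked with a short sketch. The default rate is the figure quoted on this slide; current pricing may differ, and the function name is made up for illustration.

```python
def monthly_storage_cost(size_gb, price_per_gb_month=0.10):
    """Monthly cost in dollars of storing a saved VM of the given size."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    return round(size_gb * price_per_gb_month, 2)


if __name__ == "__main__":
    print(monthly_storage_cost(15))  # the 15 GB case from the slide
```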
12. Cloud BioLinux and Genomic Standards
whole system snapshot exchange
share your analysis results: publicly or only with your collaborators
authorized users can access the cloud VM/image with all the software, data, analysis results
13. Cloud BioLinux and Genomic Standards
whole system snapshot exchange
researcher A: start VM / image → perform analysis → snapshot → share
researcher B: start VM / image → perform analysis → snapshot → share
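The exchange cycle can be modelled as a toy sketch in plain Python. It makes no cloud API calls; the `Snapshot` class and both helper functions are hypothetical names introduced only to illustrate the start, analyse, snapshot, share loop between two researchers.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Snapshot:
    """Captured state of a VM: software plus accumulated analysis results."""
    owner: str
    software: tuple
    results: tuple = ()
    shared_with: frozenset = frozenset()
    public: bool = False

    def can_start(self, user):
        """Only the owner, listed collaborators, or anyone if public."""
        return self.public or user == self.owner or user in self.shared_with


def share(snapshot, *users):
    """Return a copy of the snapshot that the given users may also start."""
    return replace(snapshot, shared_with=snapshot.shared_with | set(users))


def analyse_and_snapshot(snapshot, user, new_result):
    """Start from a snapshot, add a result, and save a new snapshot for `user`."""
    if not snapshot.can_start(user):
        raise PermissionError(f"{user} is not authorized to start this image")
    return Snapshot(owner=user, software=snapshot.software,
                    results=snapshot.results + (new_result,))


if __name__ == "__main__":
    a = Snapshot(owner="researcher A", software=("hmmer",), results=("alignments",))
    b = analyse_and_snapshot(share(a, "researcher B"), "researcher B", "phylogeny")
    print(b.results)
```

Each new snapshot carries the software and all earlier results forward, which is the point of exchanging whole-system snapshots rather than result files alone.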
14. Cloud BioLinux
The future
● expand community, receive feedback, add more software to the VM
● analysis pipelines that are used by large sequencing centers
● actively seeking funding to put major effort in development
● 2011 ISMB/BOSC in Vienna, Austria, http://metalab.at/
● tinyurl.com/cloudbiolinux-lists or community@cloudbiolinux.com
15. Acknowledgments & Credits
Brad Chapman - development of the fabric scripts and community organizer
Tim Booth, Bela Tiwari, Dawn Field – BioLinux 6.0 development and EC2 documentation
Deepak Singh and AWS - education grant supporting ISMB / BOSC workshop
Justin Johnson – community and sponsorship of cloudbiolinux.com
J. Craig Venter Inst. - time allowed to work on an open-source project
D. Gomez, E. Navarro, J. Shao, I. Singh – JCVI technology innovation
Members of the Cloud BioLinux community:
Enis Afgan
Michael Heuer
Richard Holland
Mark Jensen
Dave Messina
Steffen Möller
Roman Valls
Thank you!