Cloud ntino-krampis

Cloud BioLinux: pre-configured and on-demand
computing for genomics, independent of institutional,
geographic or economic boundaries

Ntino Krampis, PhD
JCVI-NIAID workshop 2011
S. Africa

Expensive sequencing and large organizations

● large sequencing centers running multi-million-dollar, broad-impact sequencing projects
● dedicated bioinformatics departments, coordinating with other centers

Commodity sequencing and small labs

● small-footprint, bench-top sequencers are available, e.g. the GS Junior by 454
● sequencing is becoming a standard technique in basic biology and genetics research
● RNA-seq and ChIP-seq are routine, and eventually every biologist will be tackling a metagenome

Acquiring the sequence data is only the first step

● downstream bioinformatics analysis is needed for scientific discovery

● many commonly used bioinformatics tools are difficult to install

● they are usually available only as source code, so installing them requires technical expertise

● large-scale sequence data analysis requires high-performance, expensive computing hardware

Alternative: computational capacity on the cloud

● Cloud computing: large-scale, high-performance computers accessible through the Internet

● Example: when using Gmail, Google Docs, Yahoo! Mail, Facebook etc., you store and access data on a remote computer

● Cloud computing services such as Amazon EC2 (http://aws.amazon.com/ec2) rent out large computational and data storage capacity on remote computers

How does cloud computing work?

(Diagram: local desktop computers connect through the Internet to several VMs running on the remote Amazon EC2 cloud computing service.)

● the operating system, bioinformatics software and data are installed in a Virtual Machine (VM)

● the VM is uploaded to and executed on a cloud computing service

● a practically unlimited number of VMs can be run for large-scale sequence data analysis

● the VMs are accessed from a local desktop computer through the Internet

Cloud BioLinux

● Cloud BioLinux leverages VM technology and the cloud to offer pre-configured bioinformatics computing

● it allows researchers to set up a high-performance data analysis environment without any technical expertise

● researchers can perform large-scale data analysis simply by using a desktop computer with Internet access

● it is accessible regardless of institutional, economic or national boundaries

Launching Cloud BioLinux

1. sign up for an Amazon EC2 cloud account:
http://aws.amazon.com/ec2

Alternatively, you can link an existing account from the main Amazon.com website, which is then billed
for the cloud usage charges. For this workshop, we have an account ready for you:

Username: aws_nhgri@jcvi.org
Password: Nhg4|CL0ud!

2. using the account credentials, sign in to the EC2 cloud console
(select EC2 in the drop-down menu below the sign-in button):
http://aws.amazon.com/console

3. launch Cloud BioLinux through the cloud console wizard

In the cloud console (http://aws.amazon.com/console), click the “Launch Instance” button.

Launch instance wizard: steps 1 & 2

1. under the “Community AMIs” tab, specify the Cloud BioLinux identifier

2. choose the computational capacity: memory, processor, CPU cores

Launch instance wizard: steps 3 & 4

3. in the “User Data” box, specify a login password for the Cloud BioLinux desktop

4. leave all remaining steps at their defaults, clicking the “Continue” button until the wizard finishes and you are back in the console
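The wizard steps above can also be scripted rather than clicked through. The sketch below is not part of the workshop material: it assumes the AWS SDK for Python (boto3), and the AMI ID, instance type, region and the plain-text user-data format are hypothetical placeholders, since the exact format Cloud BioLinux expects in the “User Data” box is not given here.

```python
"""Sketch: launch a Cloud BioLinux instance with boto3 instead of the console wizard.

Hypothetical placeholders: AMI ID, instance type, region and the user-data
format (the password is passed as plain text, which is an assumption).
"""


def build_launch_request(ami_id: str, instance_type: str, password: str) -> dict:
    """Return keyword arguments for EC2 RunInstances, mirroring wizard steps 1-3."""
    return {
        "ImageId": ami_id,              # step 1: the Cloud BioLinux identifier (AMI)
        "InstanceType": instance_type,  # step 2: memory / processor / CPU cores
        "MinCount": 1,                  # launch exactly one VM
        "MaxCount": 1,
        # step 3: desktop password as user data (format is an assumption);
        # boto3 base64-encodes UserData for RunInstances automatically.
        "UserData": password,
    }


def launch(request: dict, region: str = "us-east-1") -> str:
    """Start the instance and return its ID (requires AWS credentials; billed)."""
    import boto3  # imported lazily so build_launch_request works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**request)
    return response["Instances"][0]["InstanceId"]
```

For example, `launch(build_launch_request("ami-xxxxxxxx", "m1.large", "workshop"))` would start one instance; building the request separately keeps the parameters easy to inspect before anything is billed.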

Back in the console after completing the wizard, pick a running instance, select it with your mouse and copy its “Public DNS” address (the address of your Cloud BioLinux server on the cloud).
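Instead of copying the address from the console, it can be read from the EC2 DescribeInstances response. A minimal sketch, again assuming boto3 (not used in the workshop); the region is a hypothetical default, and the parsing function works on a plain response dictionary so it can be checked without an AWS account.

```python
"""Sketch: read the "Public DNS" of running instances from DescribeInstances."""


def running_public_dns(response: dict) -> list:
    """Return the public DNS names of all running instances in a response dict."""
    names = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance.get("State", {}).get("Name") == "running":
                dns = instance.get("PublicDnsName", "")
                if dns:  # empty until the instance has a public address
                    names.append(dns)
    return names


def fetch_public_dns(instance_id: str, region: str = "us-east-1") -> list:
    """Wait until the instance is running, then return its public DNS (needs AWS credentials)."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return running_public_dns(ec2.describe_instances(InstanceIds=[instance_id]))
```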

While waiting for Cloud BioLinux to boot up...

● examples of NCBI public datasets on EC2
● bringing the data to the compute

Final step: connecting remotely to Cloud BioLinux
Click the NX client icon on your computer's desktop, then:

A. paste the Public DNS address into the “Host” box
B. select “Unix” and “Gnome”, and choose the remote desktop size
C. log in with “ubuntu”, the default user, and “workshop”, the password we set
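Because the instance can take a few minutes to boot, it may help to check that it accepts connections before opening the NX client. A minimal sketch using only the Python standard library; it assumes the NX session is carried over SSH on port 22 (as with NoMachine NX 3.x), so the port is a parameter to adjust if your setup differs.

```python
import socket
import time


def wait_for_port(host: str, port: int = 22, timeout: float = 300.0,
                  interval: float = 5.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Port 22 (SSH) is an assumption about how the NX client connects.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening: ready for the NX client
        except OSError:
            if time.monotonic() >= deadline:
                return False  # still booting, or the port is blocked
            time.sleep(interval)
```

For example, `wait_for_port("<Public DNS address>")` returns `True` once the server is reachable.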

What if I want to share my alignments with a collaborator?

Save your data as a new VM. Storage costs $0.10 per GB per month, so at 15 GB it costs $1.50 per month.
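The storage cost is a flat per-GB monthly rate, so it scales linearly with the size of the saved VM. A small sketch of that arithmetic; the $0.10/GB/month default is the figure quoted on this slide, and current AWS prices may differ by region and storage type.

```python
def monthly_storage_cost(size_gb: float, price_per_gb_month: float = 0.10) -> float:
    """Monthly cost in USD of keeping a saved VM image at a flat per-GB rate."""
    return size_gb * price_per_gb_month


print(f"${monthly_storage_cost(15):.2f} / month")  # $1.50 / month
```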

Cloud BioLinux: whole-system snapshot exchange

● share your analysis results publicly or only with your collaborators

● authorized users can access the cloud VM/image with all the software, data and analysis results
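Saving and sharing can also be done programmatically. A minimal sketch, assuming boto3 (not used in the workshop): it creates a new image (AMI) from a running instance, then grants launch permission either to specific AWS account IDs or to everyone. The instance ID, image name and account IDs are hypothetical.

```python
"""Sketch: save an instance as a new image (AMI) and share it with boto3."""
from typing import Optional, Sequence


def share_permission(account_ids: Optional[Sequence[str]] = None,
                     public: bool = False) -> dict:
    """Build the LaunchPermission argument for ModifyImageAttribute."""
    if public:
        return {"Add": [{"Group": "all"}]}  # anyone can launch the image
    if not account_ids:
        raise ValueError("give collaborator account IDs or set public=True")
    return {"Add": [{"UserId": acct} for acct in account_ids]}


def snapshot_and_share(instance_id: str, name: str,
                       account_ids: Optional[Sequence[str]] = None,
                       public: bool = False, region: str = "us-east-1") -> str:
    """Create an AMI from an instance and share it (needs AWS credentials; billed)."""
    permission = share_permission(account_ids, public)  # validate before creating anything
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    image_id = ec2.create_image(InstanceId=instance_id, Name=name)["ImageId"]
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])
    ec2.modify_image_attribute(ImageId=image_id, LaunchPermission=permission)
    return image_id
```

Sharing with account IDs keeps the image private to your collaborators; `public=True` corresponds to publishing the results openly.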

Cloud BioLinux and Genomic Standards: whole-system snapshot exchange

(Diagram: researcher A starts a VM from an image, performs an analysis, takes a snapshot and shares it; researcher B starts a VM from that shared image, performs an analysis, takes a snapshot and shares it in turn.)

Acknowledgments & Credits
Brad Chapman – development of the Fabric scripts and community organizer
Tim Booth, Bela Tiwari, Dawn Field – BioLinux 6.0 development and EC2 documentation
Deepak Singh and AWS – education grant supporting the ISMB / BOSC workshop
Justin Johnson – community and sponsorship of cloudbiolinux.com
J. Craig Venter Institute – time allowed to work on an open-source project
D. Gomez, E. Navarro, J. Shao, I. Singh – JCVI technology innovation

Members of the Cloud BioLinux community:
Enis Afgan
Michael Heuer
Richard Holland
Mark Jensen
Dave Messina
Steffen Möller
Roman Valls

Thank you!
