Why The Cloud Is A Computational Biologist's Best Friend
1. Amazon Cloud: A Religious Experience
Yannick Pouliot
2/23/2012
2. Amazon Cloud services in a nutshell:
Highly flexible storage
and compute power sold
on a pay-per-use basis
3. Why the Cloud?
• Complete flexibility of computing power and
storage
• Grow or diminish as needed
• Arbitrary number of machines
• Ridiculously powerful machines made
affordable on a short-lease basis to address a
particular task (e.g., 15B ANOVAs)
• Unusual architectures (e.g., GPUs)
4. There Are Many Cloud Providers…
… but Amazon is clear leader, IMO
5. Q: What does working with a Cloud
machine feel like?
A: It’s not materially different from
accessing a machine on our cluster,
except you can do anything you want
6. Main Services Provided by Amazon Cloud
• Storage
▫ Traditional disk volumes
▫ S3 buckets (“Simple Storage Service”)
• Computing (EC2 – “Elastic Compute Cloud”)
▫ Single machine instances
▫ Clusters of various types
• Machine types
▫ Compute servers
▫ Database servers
▫ Clusters
▫ Specialized architectures
▫ Variety of operating systems (Linux flavors, Windows)
7. Types of Instances
• Based on the definition of the virtual machine
▫ I/O bus
▫ Number of CPUs
▫ Memory
▫ Type of CPU, cluster
• Deployment: Spot market vs. “Reserved”
8. Costs
• You pay for (almost) everything you do
▫ Data transfers (out)
▫ Storage
▫ CPU cycles (depends on instance type; one
instance is free)
• Can purchase cycles at below average market
price
▫ Can provide access to vast amounts of computing
power at a price you can afford
• Research grants from Amazon
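A rough way to reason about the three billed items above (data transfers out, storage, CPU time) is a small estimator like the sketch below. The rates in it are hypothetical placeholders chosen for illustration, not Amazon's actual prices, and the names `Rates` and `estimate_cost` are made up for this example. Check the current price list before relying on any figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical placeholder prices in USD (NOT Amazon's real rates)."""
    per_instance_hour: float = 0.5       # depends on instance type
    per_gb_month_storage: float = 0.125  # disk volumes or S3
    per_gb_transfer_out: float = 0.125   # uploads are not billed


def estimate_cost(instance_hours, storage_gb_months, transfer_out_gb,
                  rates=Rates()):
    """Return a per-item cost breakdown and the total, in USD."""
    compute = instance_hours * rates.per_instance_hour
    storage = storage_gb_months * rates.per_gb_month_storage
    transfer = transfer_out_gb * rates.per_gb_transfer_out
    return {
        "compute": compute,
        "storage": storage,
        "transfer_out": transfer,
        "total": compute + storage + transfer,
    }


if __name__ == "__main__":
    # Example: 100 instance-hours, 400 GB stored for one month,
    # 40 GB downloaded, all at the placeholder rates above.
    for item, usd in estimate_cost(100, 400, 40).items():
        print(f"{item}: ${usd:.2f}")
```

Swapping in a different `Rates` object (for instance, a lower hourly price to stand in for the spot market) makes it easy to compare deployment options for the same workload.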
9. Controlling Your Services
• Web-based console
• Command-line tools
▫ EC2 API tools
• Third party systems: RightScale
10. Using & Distributing Instances
• You can always make images of your instances for
later use/backup
• Images can be made public
• You can launch other people’s images (i.e., public
images), e.g.,
▫ CloudBioLinux: pre-made biocomputational instances
▫ Galaxy Cloud: pre-made Cluster-based Galaxy
instance (Web-based, no less)
▫ PathSeq: pre-made comprehensive bowtie engine that
uses Hadoop
11. Issues
• Security
▫ Lots of it
• Data transfers
▫ Free for upload; $ for download
▫ No big deal, so far
▫ Can send drives…
• Latency
▫ No big deal
• Small “ephemeral” storage
▫ Gotcha if you don’t know
• Max 1 terabyte per disk
▫ Hum…
• “Max” 20 disks per instance
▫ Can be circumvented
• No sharing of disks between instances, usually
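The two disk limits above (1 terabyte per disk, nominally 20 disks per instance) make it easy to check up front whether a dataset fits on a single instance. The sketch below does only that arithmetic; the function names are hypothetical, the limits are the ones quoted on this slide, and ephemeral storage is ignored.

```python
import math

MAX_TB_PER_DISK = 1           # per-disk limit quoted on the slide
MAX_DISKS_PER_INSTANCE = 20   # nominal limit; the slide notes it can be circumvented


def disks_needed(dataset_tb):
    """Number of maximum-size disks needed to hold dataset_tb terabytes."""
    if dataset_tb < 0:
        raise ValueError("dataset size must be non-negative")
    return math.ceil(dataset_tb / MAX_TB_PER_DISK)


def fits_on_one_instance(dataset_tb):
    """True if the dataset fits within the nominal per-instance disk limit."""
    return disks_needed(dataset_tb) <= MAX_DISKS_PER_INSTANCE


if __name__ == "__main__":
    for size_tb in (3.5, 20, 25):
        print(size_tb, "TB ->", disks_needed(size_tb), "disks;",
              "fits" if fits_on_one_instance(size_tb) else "does not fit")
```

Because disks usually cannot be shared between instances, a dataset that fails this check has to be split across instances or handled some other way.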
12. Support
• Unless you purchase support, you’re on your own
• Hasn’t been an issue for me, though finding a solution can
consume time…
Support options: