There is no magic, there is only awesome
Scientific computing with Amazon Web Services

Deepak Singh
Business Development Manager - Amazon Compute Services

Discovery 2015 Workshop, July 23 2010
Via Reavel under a CC-BY-NC-ND license
life science industry
Credit: Bosco Ho
By ~Prescott under a CC-BY-NC license
<1>
the cloud
has_many :definitions
infrastructure as a service
The “Living and Evolving” Cloud
AWS services and basic terminology
Most Applications Need:
1. Compute
2. Storage
3. Messaging
4. Payment
5. Distribution
6. Scale
7. Analytics
[Diagram: AWS services stacked beneath "Your Application"]
• Amazon RDS, Amazon Elastic MapReduce JobFlows
• Payment: Amazon FPS/DevPay
• Amazon SimpleDB Domains
• Amazon SQS Queues, Amazon SNS Topics
• Auto-Scaling, Elastic LB, CloudWatch
• Amazon CloudFront
• Amazon S3 Objects and Buckets
• Amazon EC2 Instances (On-Demand, Reserved, Spot)
• EBS Volumes, Snapshots
• Amazon Virtual Private Cloud
• Amazon Worldwide Physical Infrastructure (Geographical Regions, Availability Zones, Edge Locations)
Scalable: increase or decrease capacity in minutes
Cost Effective: low rate, pay-as-you-go
Reliable: mission-critical infrastructure
Secure: multilayer security facilities
Automation
compute
elastic compute cloud
elastic
3,000 CPUs for one firm’s risk management application
[Chart: number of EC2 instances per day, Wednesday 4/22/2009 to Tuesday 4/28/2009; y-axis marks at 300 and 3000; annotation: 300 CPUs on weekends]
programmable
// Run an instance
$EC2 = new AmazonEC2();

$Options = array('KeyName' => "Jeff's Keys",
                 'InstanceType' => "m1.small");

$Res = $EC2->run_instances("ami-db7b9db2", 1, 1, $Options);
more later
cost effective
3,000 CPUs for one firm’s risk management application
[Chart: number of EC2 instances per day, Wednesday 4/22/2009 to Tuesday 4/28/2009; y-axis marks at 300 and 3000; annotation: 300 CPUs on weekends]
[Chart: % utilization over time, comparing ideal effective utilization with real utilization]
on-demand instances
 reserved instances
   spot instances
Amazon EC2 On-Demand price for the same instance is $0.50
[Chart: % utilization over time. Ideal effective utilization built up in layers: reserved utilization, then on-demand utilization, then spot utilization]
secure
Customer A, Customer B, … Customer Z
• Guest operating system doesn’t have elevated privilege level.
• Instances are completely isolated.
• Intrinsic network firewall.
• No access to raw devices.
• Virtualized disks, logically isolated, wiped clean after use.
[Diagram layers: Hypervisor, Firewall, Physical Interface]
{
  "Version": "2008-10-17",
  "Id": "Queue1_Policy_UUID",
  "Statement": {
    "Sid": "Queue1_AnonymousAccess_ReceiveMessage_TimeLimit",
    "Effect": "Allow",
    "Principal": { "AWS": "*" },
    "Action": "SQS:ReceiveMessage",
    "Resource": "/987654321098/queue1",
    "Condition": {
      "DateGreaterThan": { "AWS:CurrentTime": "2009-01-31T12:00Z" },
      "DateLessThan": { "AWS:CurrentTime": "2009-01-31T15:00Z" }
    }
  }
}
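The policy above grants anonymous ReceiveMessage access only between two timestamps. As a rough illustration of what that condition means, here is a minimal Ruby sketch that checks whether a given moment falls inside the window. The helper names `parse_utc` and `within_window?` are hypothetical, and this is not how AWS itself evaluates policies; it only mirrors the date comparison.

```ruby
require 'json'

# Condition block copied from the policy on the slide.
POLICY_JSON = <<~JSON
  {
    "Condition": {
      "DateGreaterThan": { "AWS:CurrentTime": "2009-01-31T12:00Z" },
      "DateLessThan":    { "AWS:CurrentTime": "2009-01-31T15:00Z" }
    }
  }
JSON

# Hypothetical helper: parse the "YYYY-MM-DDTHH:MMZ" timestamps used above as UTC.
def parse_utc(stamp)
  match = /\A(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})Z\z/.match(stamp)
  raise ArgumentError, "unexpected timestamp: #{stamp}" unless match
  Time.utc(*match.captures.map(&:to_i))
end

# Hypothetical helper: true when `now` lies strictly inside the policy's window.
def within_window?(policy, now)
  condition = policy.fetch("Condition")
  after  = parse_utc(condition.fetch("DateGreaterThan").fetch("AWS:CurrentTime"))
  before = parse_utc(condition.fetch("DateLessThan").fetch("AWS:CurrentTime"))
  now > after && now < before
end

policy = JSON.parse(POLICY_JSON)
puts within_window?(policy, Time.utc(2009, 1, 31, 13, 30))  # inside the window
puts within_window?(policy, Time.utc(2009, 1, 31, 16, 0))   # after the window
```

Keeping the window in the policy document rather than in application code means access can be opened and closed without redeploying anything.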
Amazon Virtual Private Cloud (VPC)
[Diagram: the customer’s network connects over a secure VPN connection across the Internet to a VPN gateway and router in the Amazon Web Services cloud, which lead to subnets holding the customer’s isolated AWS resources]
storage
Amazon S3
highly scalable
highly available
highly durable
[Diagram: a region containing several datacenters, each holding nodes 1 … n]
Note: Conceptual drawing only. Actual number of nodes & datacenters may vary
elastic block store
block device
resizable
boot device
one size does not fit all
Amazon S3
• Cost-effective blob or large object storage
• Minimal relationships between objects

Amazon EC2 + EBS
• Multiple flavors of database engine
• Complete control

Amazon SimpleDB
• Zero administrative overhead (automatic handling of geo-redundant replication, index creation, database tuning)
• Automatic and elastic scaling of resources to meet request load
• High availability (multiple copies of data for reliability and failover)
• Flexibility (schema-less data store)

Amazon RDS
• Native access to database engine
• Easy migration path (existing code, tools, application are compatible)
• Key features of a relational database, such as joins or complex transactions
• Managed experience (offload common DBA tasks, lower total cost of ownership)
an ecosystem prospers
<2>
infrastructure as code
Source: Chris Dagdigian
• Images:
    – RegisterImage
    – DescribeImages
    – DeregisterImage
    – ModifyImageAttribute
    – DescribeImageAttribute
    – ResetImageAttribute
• Instances:
    – RunInstances
    – DescribeInstances
    – TerminateInstances
    – StopInstances
    – GetConsoleOutput
    – RebootInstances
    – CreatePlacementGroup
    – DescribePlacementGroup
• IP Addresses:
    – AllocateAddress
    – ReleaseAddress
    – AssociateAddress
    – DisassociateAddress
    – DescribeAddresses
• Keypairs:
    – CreateKeyPair
    – DescribeKeyPairs
    – DeleteKeyPair
• Security Groups:
    – CreateSecurityGroup
    – DescribeSecurityGroups
    – DeleteSecurityGroup
    – AuthorizeSecurityGroupIngress
    – RevokeSecurityGroupIngress
• Block Storage Volumes:
    – CreateVolume
    – DeleteVolume
    – DescribeVolumes
    – AttachVolume
    – DetachVolume
    – CreateSnapshot
    – DescribeSnapshots
    – DeleteSnapshot
• VPC:
    – CreateCustomerGateway
    – DeleteCustomerGateway
    – DescribeCustomerGateways
    – AssociateDhcpOptions
    – CreateDhcpOptions
    – DeleteDhcpOptions
    – DescribeDhcpOptions
    – CreateSubnet
    – DeleteSubnet
    – DescribeSubnets
    – CreateVpc
    – DeleteVpc
    – DescribeVpcs
    – CreateVpnConnection
    – DeleteVpnConnection
    – DescribeVpnConnections
    – AttachVpnGateway
    – CreateVpnGateway
    – DeleteVpnGateway
    – DescribeVpnGateways
    – DetachVpnGateway
using libraries
Access credentials

def access_key
  options.services['access-key']
end

def secret_key
  options.services['secret-key']
end
class EC2

attr_accessor :ec2, :instance_index, :image_index, :elastic_ip_index,
:volume_index

      def initialize(access_key, secret_key)
        @ec2 = RightAws::Ec2.new(access_key, secret_key)
        @instance_index = {}
        @image_index = {}
        @elastic_ip_index = {}
        @volume_index = {}
      end

end
class Instance
    attr_accessor :aws_hash, :elastic_ip

      def initialize(hash, elastic_ip = nil)
        @aws_hash = hash
        @elastic_ip = elastic_ip
      end

      def public_dns
        @aws_hash[:dns_name] || ""
      end

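      # Not on the slide: friendly_name needs a status reader.
      # :aws_state is assumed to be the state key in the RightAws instance hash.
      def status
        @aws_hash[:aws_state]
      end

      def friendly_name
        public_dns.empty? ? status.capitalize : public_dns.split(".")[0]
      end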

      def id
        @aws_hash[:aws_instance_id]
      end
end
Custom index

class EC2
  attr_accessor :ec2, :instance_index, :image_index, :elastic_ip_index,
                :volume_index

  def initialize(access_key, secret_key)
    @ec2 = RightAws::Ec2.new(access_key, secret_key)
    @instance_index = {}
    @image_index = {}
    @elastic_ip_index = {}
    @volume_index = {}
  end

  # Lazily build a hash of Instance objects keyed by instance id
  def instance_index
    if @instance_index.empty?
      @ec2.describe_instances.each do |i|
        # create an Instance object & add it to the index
        # (get_elastic_ip_for_instance_id is not shown on the slide)
        @instance_index[i[:aws_instance_id]] =
          Instance.new(i, get_elastic_ip_for_instance_id(i[:aws_instance_id]))
      end
    end
    @instance_index
  end
end
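Helper

class Instance
  attr_accessor :aws_hash, :elastic_ip

  def initialize(hash, elastic_ip = nil)
    @aws_hash = hash
    @elastic_ip = elastic_ip
  end

  def public_dns
    @aws_hash[:dns_name] || ""
  end

  # Not on the slide: friendly_name and running? need a status reader.
  # :aws_state is assumed to be the state key in the RightAws instance hash.
  def status
    @aws_hash[:aws_state]
  end

  def friendly_name
    public_dns.empty? ? status.capitalize : public_dns.split(".")[0]
  end

  def id
    @aws_hash[:aws_instance_id]
  end

  # Helper: true while the instance is in the "running" state
  def running?
    status == "running"
  end
end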
configuration management
cfengine


puppet


 chef
chef
dsl
include_recipe "packages"
include_recipe "ruby"
include_recipe "apache2"

if platform?("centos","redhat")
  if dist_only?
     # just the gem, we'll install the apache module within apache2
     package "rubygem-passenger"
     return
  else
     package "httpd-devel"
  end
else
  %w{ apache2-prefork-dev libapr1-dev }.each do |pkg|
     package pkg do
       action :upgrade
     end
  end
end

gem_package "passenger" do
  version node[:passenger][:version]
end

execute "passenger_module" do
  command 'echo -en "\n\n\n\n" | passenger-install-apache2-module'
  creates node[:passenger][:module_path]
end
The same recipe, with the slide callouts:
Modular: include_recipe "packages", include_recipe "ruby", include_recipe "apache2"
OS aware: if platform?("centos","redhat")
Ruby syntax: %w{ apache2-prefork-dev libapr1-dev }.each do |pkg|
Package aware: gem_package "passenger"
Execute: execute "passenger_module"
recipes
template "#{node[:apache][:dir]}/mods-available/passenger.conf" do
  cookbook "passenger_apache2"
  source "passenger.conf.erb"
  owner "root"
  group "root"
  mode 0755
end
The same resource, with the slide callouts:
Template: template "#{node[:apache][:dir]}/mods-available/passenger.conf"
Cookbook re-use: cookbook "passenger_apache2"
<3>
architectural lessons
design for failure
“Everything fails, all the time”
                   -- Werner Vogels
“Things will crash. Deal with it”
                        -- Jeff Dean
2-4% of servers will die annually



Source: Jeff Dean, LADIS 2009
1-5% of disk drives will die every year



Source: Jeff Dean, LADIS 2009
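To make these rates concrete, here is a small Ruby sketch that turns an annual failure rate into expected failures for a fleet. The fleet size of 1,000 is a hypothetical figure, and the weekly probability assumes failures are independent and spread evenly over the year, which is a simplification.

```ruby
# Expected number of failures per year for a fleet, given an annual failure rate (AFR).
def expected_annual_failures(fleet_size, afr)
  fleet_size * afr
end

# Probability that at least one unit fails within `days`, assuming independent
# failures spread evenly over the year (a simplifying assumption).
def prob_any_failure(fleet_size, afr, days)
  per_unit_survival = (1.0 - afr)**(days / 365.0)
  1.0 - per_unit_survival**fleet_size
end

FLEET = 1_000 # hypothetical fleet size
[0.02, 0.04].each do |afr|
  failures = expected_annual_failures(FLEET, afr)
  weekly = prob_any_failure(FLEET, afr, 7)
  puts format("AFR %.0f%%: ~%.0f failures/year, P(at least one in a week) = %.2f",
              afr * 100, failures, weekly)
end
```

Even at the low end of the quoted range, a fleet of this size should expect to lose machines regularly, which is the case for designing around failure rather than trying to prevent it.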
2.3% AFR in population of 13,250
3.3% AFR in population of 22,400
4.2% AFR in population of 246,000




Source: James Hamilton (http://perspectives.mvdirona.com)
human errors
~20% admin issues have unintended consequences




Source: James Hamilton (http://perspectives.mvdirona.com)
assume sw/hw failure
avoid single points of failure
system as a whole is resilient
loose coupling sets you free
using message queues
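[Diagram: Tight Coupling, where Controller A, Controller B and Controller C are wired directly to each other; Loose Coupling using Queues, where Controller A, Controller B and Controller C each hand work over through a queue (Q)]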
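The loose-coupling idea can be sketched in a few lines of Ruby using the standard library's thread-safe Queue as a stand-in for a message queue such as SQS. The stage functions and the `STOP` sentinel are hypothetical; the point is only that each controller knows its queues, not its neighbours.

```ruby
require 'thread'

STOP = :stop # sentinel telling a stage to shut down

# Run one stage in its own thread: read from `inbox`, apply the block,
# and write the result to `outbox`.
def start_stage(inbox, outbox, &work)
  Thread.new do
    loop do
      item = inbox.pop
      if item == STOP
        outbox << STOP # pass the shutdown signal downstream
        break
      end
      outbox << work.call(item)
    end
  end
end

def run_pipeline(inputs)
  q_ab = Queue.new
  q_bc = Queue.new
  q_out = Queue.new
  stages = [
    start_stage(q_ab, q_bc)  { |n| n * 2 }, # stands in for Controller B
    start_stage(q_bc, q_out) { |n| n + 1 }  # stands in for Controller C
  ]
  inputs.each { |n| q_ab << n }             # Controller A only enqueues work
  q_ab << STOP
  stages.each(&:join)

  results = []
  while (item = q_out.pop) != STOP
    results << item
  end
  results
end

p run_pipeline([1, 2, 3])
```

Because each stage depends only on its inbox and outbox, a stage can be restarted, replaced, or run in several copies without the others noticing.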
implement elasticity
no assumptions
resilience to reboot
bootstrap
dynamic
multi-layered security
“Web” Security Group:
TCP 80    0.0.0.0/0
TCP 443   0.0.0.0/0
TCP 22    “App”

“App” Security Group:
TCP 8080  “Web”
TCP 22    172.154.0.0/16
TCP 22    “App”

“DB” Security Group:
TCP 3306  “App”
TCP 3306  163.128.25.32/32
TCP 22    “App”
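The layered rules above can be modelled as data and checked with a short Ruby sketch. This is only an illustration of how the three tiers restrict each other; the `allowed?` helper is hypothetical and is not the EC2 API, and the example source addresses are made up.

```ruby
require 'ipaddr'

# Rules from the slide: each group maps to [port, source] pairs, where a source
# is either a CIDR block or the name of another security group.
SECURITY_GROUPS = {
  "Web" => [[80, "0.0.0.0/0"], [443, "0.0.0.0/0"], [22, "App"]],
  "App" => [[8080, "Web"], [22, "172.154.0.0/16"], [22, "App"]],
  "DB"  => [[3306, "App"], [3306, "163.128.25.32/32"], [22, "App"]]
}.freeze

# Hypothetical check: may traffic reach `target` on `port`?
# `source_ip` is the caller's address; `source_group` is its security group, if any.
def allowed?(target, port, source_ip: nil, source_group: nil)
  SECURITY_GROUPS.fetch(target).any? do |rule_port, source|
    next false unless rule_port == port
    if SECURITY_GROUPS.key?(source)
      source == source_group
    else
      !source_ip.nil? && IPAddr.new(source).include?(IPAddr.new(source_ip))
    end
  end
end

puts allowed?("Web", 80, source_ip: "203.0.113.7")   # public HTTP reaches the web tier
puts allowed?("DB", 3306, source_ip: "203.0.113.7")  # the internet cannot reach the database
puts allowed?("DB", 3306, source_group: "App")       # the app tier can
```

Referencing groups by name rather than by address is what lets instances come and go without the rules being rewritten.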
embrace constraints
distributed memory
sharded DBs
hardware failed?

simply throw it away and
 switch to new hardware
  with no additional cost
cache
think parallel
different architectures
multi-threaded, concurrent
          requests
mapreduce
elastic load-balancing
decompose jobs into
   simplest form
leverage many storage
        options
<4>
computing in the cloud
3 modalities
batch processing
“grids”
queues
[Diagram: queue-driven pipeline. Stages: Fetch & Store Page, Parse Page, Fetch Images, Render Images & Pages. Queues: URL Queue, Parse Queue, Image Queue, Render Queue. Pages and images are stored in S3.]
Source: Jeff Barr
sudo gem install cloud-crowd

http://wiki.github.com/documentcloud/cloud-crowd
http://www.rightscale.com
data-intensive computing
Amazon Elastic MapReduce


[Diagram: Elastic MapReduce job flow. Input data is uploaded to an input S3 bucket; the application is deployed through the web console or command line tools; Elastic MapReduce runs Hadoop on Amazon EC2 instances, reading the input dataset and writing output results to an output S3 bucket; at the end the user is notified and gets the results from Amazon S3]
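For readers new to the programming model behind this service, here is a minimal in-memory Ruby sketch of the map, shuffle and reduce steps using a word count. It does not use Hadoop or the Elastic MapReduce API; the function names are hypothetical and everything runs in one process.

```ruby
# Map phase: emit a [word, 1] pair for every word in one input record.
def map_record(line)
  line.downcase.scan(/[a-z]+/).map { |word| [word, 1] }
end

# Reduce phase: combine all counts emitted for one key.
def reduce_counts(_word, counts)
  counts.sum
end

# Tiny in-process driver standing in for the cluster: map, shuffle by key, reduce.
def word_count(lines)
  pairs = lines.flat_map { |line| map_record(line) }
  grouped = pairs.group_by(&:first)
  grouped.map { |word, kvs| [word, reduce_counts(word, kvs.map(&:last))] }.to_h
end

p word_count(["the cloud", "the grid and the cloud"])
```

On a real cluster the map and reduce functions stay this small; the framework takes care of splitting the input, moving pairs between machines, and retrying failed tasks.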
PREANNOUNCE – EXPAND/SHRINK CLUSTERS

Use Case: Increase speed of running job flows
Speed up job flow execution in response to changing requirements
Dynamically balance cost versus performance without restarting a job

Job Flow: Allocate 4 instances (time remaining: 14 hours)
Job Flow: Expand to 9 instances (time remaining: 7 hours)
Job Flow: Expand to 25 instances (time remaining: 3 hours)
Use Case: Agile Data Warehouse Cluster
Customize cluster size to support varying resource needs
Leverage flexibility to reduce costs and increase cluster utilization

Data Warehouse (Steady State): Allocate 9 instances
Data Warehouse (Batch Processing): Expand to 25 instances
Data Warehouse (Steady State): Shrink to 9 instances
PREANNOUNCE – Integration with Spot Instances

Job Flow: Allocate 4 instances (time remaining: 14 hours)
Job Flow: Expand to 9 instances (time remaining: 7 hours)

Cost without Spot:
4 instances * 14 hrs * $0.50 = $28
Cost with Spot:
4 instances * 7 hrs * $0.50 = $14 +
5 instances * 7 hrs * $0.25 = $8.75
Total = $22.75
Savings: ~19%
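The comparison can be recomputed from the instance counts, hours and hourly rates given on the slide. The Ruby sketch below does only that arithmetic; the `total_cost` helper is hypothetical, and it assumes the Spot price stays at $0.25 for the whole run.

```ruby
ON_DEMAND_RATE = 0.50 # $/instance-hour, from the slide
SPOT_RATE      = 0.25 # $/instance-hour, from the slide

# Cost of a list of [instances, hours, hourly_rate] segments.
def total_cost(segments)
  segments.sum { |instances, hours, rate| instances * hours * rate }
end

# 4 On-Demand instances running for the full 14 hours.
WITHOUT_SPOT = total_cost([[4, 14, ON_DEMAND_RATE]])
# 4 On-Demand plus 5 Spot instances finishing in 7 hours.
WITH_SPOT    = total_cost([[4, 7, ON_DEMAND_RATE], [5, 7, SPOT_RATE]])
SAVINGS_PCT  = (WITHOUT_SPOT - WITH_SPOT) / WITHOUT_SPOT * 100

puts format("without Spot: $%.2f, with Spot: $%.2f, savings: %.1f%%",
            WITHOUT_SPOT, WITH_SPOT, SAVINGS_PCT)
```

The job also finishes in half the time, so the cost saving comes on top of the faster turnaround.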
high performance computing
Low latency
high bandwidth
cluster compute instances
full bisection bandwidth
10 Gbps
2 * Xeon 5570 (Intel “Nehalem”)
23 GB RAM
10 Gbps Ethernet
1690 GB local disk
HVM-based virtualization
$1.60 / hr
managing compute cycles
http://cyclecomputing.com
http://web.mit.edu/stardev/cluster/
SQS
<5>
AWS + science = win
3.7 million classifications in just over three days
~15 million in less than a month
>2.6 million clicks in 100 hours
Biomarker Warehouse
pre-clinical, clinical, 3rd party data and publications




            Estimated cost: 10 TB warehouse over 3 years
Protein interactions @ U. Washington




           Simple Python scripts automate the
           management of 1000s of simultaneous
           experiments using the EC2 API




                                                 http://faculty.washington.edu/danielt/
Source: Ed Lazowska
200 instances
                         60000 structures
                             4 hours
http://bioteam.net/aws
HEAVY-ION COLLISIONS

Problem: Quark matter physics conference
imminent but no compute resources handy

Solution: NIMBUS context broker allowed
researchers to provision 300 nodes and get the
simulations done
Image: Wikipedia
lots and lots and lots and lots
 and lots and lots of data and
  lots and lots of lots of data
Image via image editor under a CC-BY License
Image: NOAA
scale
 availability
 utilization
   sharing
collaboration
we are data geeks not data center geeks
BLAT @ U. Penn
Map 100 million, 100 base paired end reads
Quad core with 5 GB of RAM would take 16 days




30 high-memory instances; 32 hours; $195
Source: Angel Pizarro/John Hogenesch
BELLE MONTE CARLO




Credit: Tom Fifield
MapReduce for Genomics

                                                            Ben Langmead

   http://bowtie-bio.sourceforge.net/crossbow/index.shtml
              http://contrail-bio.sourceforge.net
     http://bowtie-bio.sourceforge.net/myrna/index.shtml
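To make the MapReduce pattern concrete, here is a toy, single-process sketch that counts aligned reads per chromosome. It illustrates the map, shuffle and reduce steps only and is not how Crossbow, Contrail or Myrna are implemented. The record format and function names are assumptions.

```ruby
# Map step: emit a (chromosome, 1) pair for each alignment record.
# Records are assumed to be tab-separated "chromosome<TAB>position" strings.
def map_alignment(record)
  chromosome, _position = record.split("\t")
  [[chromosome, 1]]
end

# Shuffle step: group emitted values by key.
def shuffle(pairs)
  pairs.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |(key, value), groups|
    groups[key] << value
  end
end

# Reduce step: sum the counts collected for one key.
def reduce_counts(key, values)
  [key, values.sum]
end

records = ["chr1\t100", "chr2\t250", "chr1\t900", "chrX\t42", "chr1\t77"]
pairs   = records.flat_map { |record| map_alignment(record) }
counts  = shuffle(pairs).map { |key, values| reduce_counts(key, values) }.to_h
puts counts.inspect
```

In a Hadoop job the map and reduce functions run on different nodes and the framework performs the shuffle, which is what lets the same logic scale to billions of reads.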
platform for science
http://www.cloudbiolinux.com/
http://usegalaxy.org/cloud
http://dnanexus.com
http://www.elasticr.net




            Elastic-R Collaborative Research Environment
http://aws.amazon.com/publicdatasets/
s3://1000genomes
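An `s3://` address names a bucket and, optionally, an object key inside it. The sketch below splits such an address into its two parts. `parse_s3_uri` is a hypothetical helper, and the object key in the second call is made up for illustration.

```ruby
# Split an s3:// URI into its bucket name and object key.
# The key is an empty string when the URI names only a bucket.
def parse_s3_uri(uri)
  match = %r{\As3://([^/]+)/?(.*)\z}.match(uri)
  raise ArgumentError, "not an S3 URI: #{uri}" unless match

  { bucket: match[1], key: match[2] }
end

puts parse_s3_uri("s3://1000genomes").inspect
# The path below is a placeholder, not a real object in the bucket.
puts parse_s3_uri("s3://1000genomes/example/path.txt").inspect
```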
deesingh@amazon.com
Twitter: @mndoci
slides at http://slideshare.net/mndoci




Inspiration and material from Matt Wood, James Hamilton & Larry Lessig


By Oberazzi under a CC-BY-NC-SA license

Discovery 2015 Workshop

  • 1. There  is  no  magic,  there  is  only  awesome Scien&fic  compu&ng  with  Amazon  Web  Services Deepak  Singh Business  Development  Manager  -­‐  Amazon  Compute  Services Discovery  2015  Workshop,  July  23  2010
  • 2.
  • 3.
  • 4. Via Reavel under a CC-BY-NC-ND license
  • 5.
  • 6.
  • 9.
  • 10. By ~Prescott under a CC-BY-NC license
  • 11. <1>
  • 15. The   “ Living   a nd   Evolving”   C loud AWS  services  and  basic  terminology Most  Applica9ons  Need: 1. Compute Your   A pplication 2. Storage Amazon   Amazon   E lastic   3. Messaging RDS MapReduce   J obFlows Payment   :   A mazon   F PS/   D evPay Amazon   S impleDB   D omains 4. Payment Amazon   Cloud Amazon   S QS   Q ueues Auto-­‐ Elastic   Cloud 5. Distribu9on Amazon   S NS   Topics Scaling LB Watch Front Amazon   S 3   6. Scale Objects   a nd   Amazon   EC2   I nstances Buckets 7. Analy9cs (On-­‐Demand,   Reserved,   S pot) EBS Snapshots Volumes Amazon   Virtual   P rivate   C loud Amazon   Worldwide   P hysical   I nfrastructure   (Geographical   Regions,   Availability   Zones,   Edge   L ocations)  
  • 16.
  • 17. Scalable Increase  or  decrease   capacity  in  minutes AutomaIon
  • 18. Scalable Increase  or  decrease   Cost  Effec9ve capacity  in  minutes Low  rate,  pay-­‐as-­‐you-­‐go AutomaIon
  • 19. Scalable Increase  or  decrease   Cost  Effec9ve capacity  in  minutes Low  rate,  pay-­‐as-­‐you-­‐go AutomaIon Reliable Mission  CriIcal   Infrastructure
  • 20. Scalable Increase  or  decrease   Cost  Effec9ve capacity  in  minutes Low  rate,  pay-­‐as-­‐you-­‐go AutomaIon Reliable Secure Mission  CriIcal   MulIlayer  security  faciliIes Infrastructure
  • 24. 3000 CPU’s for one firm’s risk management application 3444JJ' !"#$%&'()'*+,'-./01.2%/' 344'+567/'(.' 8%%9%.:/' 344'JJ' I%:.%/:1=' ;<"&/:1=' A&B:1=' C10"&:1=' C".:1=' E(.:1=' ;"%/:1=' >?,,?,44@' >?,3?,44@' >?,>?,44@' >?,H?,44@' >?,D?,44@' >?,F?,44@' >?,G?,44@'
  • 26.
  • 27.
  • 28. // Run an instance $EC2 = new AmazonEC2(); $Options = array('KeyName' => "Jeff's Keys", 'InstanceType' => "m1.small"); $Res = $EC2->run_instances("ami-db7b9db2", 1, 1, $Options);
  • 31. 3000 CPU’s for one firm’s risk management application 3444JJ' !"#$%&'()'*+,'-./01.2%/' 344'+567/'(.' 8%%9%.:/' 344'JJ' I%:.%/:1=' ;<"&/:1=' A&B:1=' C10"&:1=' C".:1=' E(.:1=' ;"%/:1=' >?,,?,44@' >?,3?,44@' >?,>?,44@' >?,H?,44@' >?,D?,44@' >?,F?,44@' >?,G?,44@'
  • 32.
  • 33.
  • 35. Ideal Effective Utilization % Utilization time
  • 36. Ideal Effective Utilization % Utilization Real Utilization time
  • 37. Ideal Effective Utilization % Utilization Real Utilization time
  • 38. on-demand instances reserved instances spot instances
  • 39.
  • 40. Amazon EC2 On-Demand price for the same instance is $0.50
  • 41. Ideal Effective Utilization % Utilization time
  • 42. Ideal Effective Utilization % Utilization Reserved Utilization time
  • 43. Ideal Effective Utilization % Utilization Reserved Utilization time
  • 44. Ideal Effective Utilization % Utilization On Demand Utilization Reserved Utilization time
  • 45. Ideal Effective Utilization Spot Utilization % Utilization On Demand Utilization Reserved Utilization time
  • 47. Customer  A Customer  B Customer  Z • Guest  operaIng  system  doesn’t   have  elevated  privilege  level. • Instances  are  completely   … isolated. • Intrinsic  network  firewall. • No  access  to  raw  devices. • Virtualized  disks,  logically   isolated,  wiped  clean  aRer  use.                            Hypervisor Firewall Physical                                  Interface
  • 48. { "Version": "2008-10-17", "Id": "Queue1_Policy_UUID", "Statement": { "Sid":"Queue1_AnonymousAccess_ReceiveMessage_TimeLimit" , "Effect": "Allow", "Principal": { "AWS": "*" }, "Action": "SQS:ReceiveMessage", "Resource": "/987654321098/queue1", "Condition" : { "DateGreaterThan" : { "AWS:CurrentTime":"2009-01-31T12:00Z" }, "DateLessThan" : { "AWS:CurrentTime":"2009-01-31T15:00Z" } } } }
  • 49. Amazon  Virtual  Private  Cloud  (VPC) Customer’s isolated AWS resources Subnets Router VPN Gateway Amazon Web Services Cloud Secure VPN Connection over the Internet Customer’s Network
  • 53.
  • 56. Note: Conceptual drawing only. Actual number of nodes & datacenters may vary
  • 57. T Node  1 Node  n ... Note: Conceptual drawing only. Actual number of nodes & datacenters may vary
  • 58. Region Datacent Datacent er er Datacent er Node  1 Node  n ... Note: Conceptual drawing only. Actual number of nodes & datacenters may vary
  • 63. one size does not fit all
  • 64. Amazon S3 Amazon EC2 + EBS • Cost-­‐effecIve  blob  or    large  object  storage • Mul9ple  flavors  of  database  engine • Minimal  rela9onships  between  objects • Complete  control Amazon SimpleDB Amazon RDS • Zero  administra9ve  overhead  (automaIc   • Na9ve  access  to  database  engine handling  of  geo-­‐redundant  replicaIon,  index   • Easy  migra9on  path  (exisIng  code,  tools,   creaIon,  database  tuning) applicaIon  are  compaIble) • AutomaIc  and  elasIc  scaling  of  resources  to   • Key  features  of  a  relaIonal  database,  such  as   meet  request  load joins  or  complex  transac9ons • High  availability  (mulIple  copies  of  data  for   • Managed  experience  (offload  common  DBA   reliability  and  failover) tasks,  lower  total  cost  of  ownership) • Flexibility  (schema-­‐less  data  store)
  • 66.
  • 67.
  • 68.
  • 69.
  • 70.
  • 71.
  • 72. <2>
  • 75. • Images: • Keypairs: • VPC: – RegisterImage – CreateKeyPair – CreateCustomerGateway – DescribeImages – DescribeKeyPairs – DeleteCustomerGateway – DeregisterImage – DeleteKeyPair – DescribeCustomerGateways – ModifyImageAcribute – AssociateDhcpOpIons – DescribeImageAcribute • Security  Groups: – CreateDhcpOpIons – ResetImageAcribute – DeleteDhcpOpIons – CreateSecurityGroup – DescribeDhcpOpIons – DescribeSecurityGroups – CreateSubnet • Instances: – DeleteSecurityGroup – DeleteSubnet – RunInstances – DescribeSubnets – AuthorizeSecurityGroupIngress – DescribeInstances – CreateVpc – TerminateInstances – RevokeSecurityGroupIngress – DeleteVpc – StopInstances – DescribeVpcs – GetConsoleOutput • Block  Storage  Volumes: – CreateVpnConnecIon – RebootInstances – CreateVolume – DeleteVpnConnecIon – CreatePlacementGroup – DescribeVpnConnecIons – DeleteVolume – DescribePlacementGroup – AcachVpnGateway – DescribeVolumes – CreateVpnGateway • IP  Addresses: – AhachVolume – DeleteVpnGateway – AllocateAddress – DetachVolume – DescribeVpnGateways – ReleaseAddress – CreateSnapshot – DetachVpnGateway – AssociateAddress – DescribeSnapshots – DisassociateAddress – DeleteSnapshot – DescribeAddresses  
  • 77. def access_key options.services['access-key'] Access end credentials def secret_key options.services['secret-key'] end
  • 78. class EC2 attr_accessor :ec2, :instance_index, :image_index, :elastic_ip_index, :volume_index def initialize(access_key, secret_key) @ec2 = RightAws::Ec2.new(access_key, secret_key) @instance_index = {} @image_index = {} @elastic_ip_index = {} @volume_index = {} end end
  • 79. class Instance attr_accessor :aws_hash, :elastic_ip def initialize(hash, elastic_ip = nil) @aws_hash = hash @elastic_ip = elastic_ip end def public_dns @aws_hash[:dns_name] || "" end def friendly_name public_dns.empty? ? status.capitalize : public_dns.split(".")[0] end def id @aws_hash[:aws_instance_id] end end
  • 80. class EC2 attr_accessor :ec2, :instance_index, :image_index, :elastic_ip_index, :volume_index def initialize(access_key, secret_key) @ec2 = RightAws::Ec2.new(access_key, secret_key) @instance_index = {} @image_index = {} @elastic_ip_index = {} @volume_index = {} end def instance_index if @instance_index.empty? @ec2.describe_instances.each do |i| # create an Instance object & add to the array Custom @instance_index[i[:aws_instance_id]] = Instance.new(i, get_elastic_ip_for_instance_id(i[:aws_instance_id])) index end end return @instance_index end end
  • 81. class Instance attr_accessor :aws_hash, :elastic_ip def initialize(hash, elastic_ip = nil) @aws_hash = hash @elastic_ip = elastic_ip end def public_dns @aws_hash[:dns_name] || "" end def friendly_name public_dns.empty? ? status.capitalize : public_dns.split(".")[0] end def id @aws_hash[:aws_instance_id] end def running? Helper status == "running" end end
  • 84. chef
  • 85. dsl
  • 86. include_recipe "packages" include_recipe "ruby" include_recipe "apache2" if platform?("centos","redhat") if dist_only? # just the gem, we'll install the apache module within apache2 package "rubygem-passenger" return else package "httpd-devel" end else %w{ apache2-prefork-dev libapr1-dev }.each do |pkg| package pkg do action :upgrade end end end gem_package "passenger" do version node[:passenger][:version] end execute "passenger_module" do command 'echo -en "nnnn" | passenger-install-apache2-module' creates node[:passenger][:module_path] end
  • 87. include_recipe "packages" Modular include_recipe "ruby" include_recipe "apache2" if platform?("centos","redhat") if dist_only? # just the gem, we'll install the apache module within apache2 package "rubygem-passenger" return else package "httpd-devel" end else %w{ apache2-prefork-dev libapr1-dev }.each do |pkg| package pkg do action :upgrade end end end gem_package "passenger" do version node[:passenger][:version] end execute "passenger_module" do command 'echo -en "nnnn" | passenger-install-apache2-module' creates node[:passenger][:module_path] end
  • 88. include_recipe "packages" include_recipe "ruby" include_recipe "apache2" OS aware if platform?("centos","redhat") if dist_only? # just the gem, we'll install the apache module within apache2 package "rubygem-passenger" return else package "httpd-devel" end else %w{ apache2-prefork-dev libapr1-dev }.each do |pkg| package pkg do action :upgrade end end end gem_package "passenger" do version node[:passenger][:version] end execute "passenger_module" do command 'echo -en "nnnn" | passenger-install-apache2-module' creates node[:passenger][:module_path] end
  • 89. include_recipe "packages" include_recipe "ruby" include_recipe "apache2" if platform?("centos","redhat") if dist_only? # just the gem, we'll install the apache module within apache2 package "rubygem-passenger" return else package "httpd-devel" end else %w{ apache2-prefork-dev libapr1-dev }.each do |pkg| Ruby package pkg do action :upgrade syntax end end end gem_package "passenger" do version node[:passenger][:version] end execute "passenger_module" do command 'echo -en "nnnn" | passenger-install-apache2-module' creates node[:passenger][:module_path] end
  • 90. include_recipe "packages" include_recipe "ruby" include_recipe "apache2" if platform?("centos","redhat") if dist_only? # just the gem, we'll install the apache module within apache2 package "rubygem-passenger" return else package "httpd-devel" end else %w{ apache2-prefork-dev libapr1-dev }.each do |pkg| package pkg do action :upgrade end end end Package gem_package "passenger" do version node[:passenger][:version] aware end execute "passenger_module" do command 'echo -en "nnnn" | passenger-install-apache2-module' creates node[:passenger][:module_path] end
  • 91. include_recipe "packages" include_recipe "ruby" include_recipe "apache2" if platform?("centos","redhat") if dist_only? # just the gem, we'll install the apache module within apache2 package "rubygem-passenger" return else package "httpd-devel" end else %w{ apache2-prefork-dev libapr1-dev }.each do |pkg| package pkg do action :upgrade end end end gem_package "passenger" do version node[:passenger][:version] end execute "passenger_module" do Execute command 'echo -en "nnnn" | passenger-install-apache2-module' creates node[:passenger][:module_path] end
  • 93. template "#{node[:apache][:dir]}/mods-available/passenger.conf" do cookbook "passenger_apache2" source "passenger.conf.erb" owner "root" group "root" mode 0755 end
  • 94. Template template "#{node[:apache][:dir]}/mods-available/passenger.conf" do cookbook "passenger_apache2" source "passenger.conf.erb" owner "root" group "root" mode 0755 end
  • 95. template "#{node[:apache][:dir]}/mods-available/passenger.conf" do cookbook "passenger_apache2" Cookbook source "passenger.conf.erb" owner "root" re-use group "root" mode 0755 end
  • 96. <3>
  • 98.
  • 100.
  • 101. “Everything fails, all the time” -- Werner Vogels
  • 102.
  • 103. “Things will crash. Deal with it” -- Jeff Dean
  • 104. 2-4% of servers will die annually Source: Jeff Dean, LADIS 2009
  • 105. 1-5% of disk drives will die every year Source: Jeff Dean, LADIS 2009
  • 106. 2.3% AFR in population of 13,250 3.3% AFR in population of 22,400 4.2% AFR in population of 246,000
  • 107. 2.3% AFR in population of 13,250 3.3% AFR in population of 22,400 4.2% AFR in population of 246,000 Source: James Hamilton (http://perspectives.mvdirona.com)
  • 109. human errors ~20% admin issues have unintended consequences Source: James Hamilton (http://perspectives.mvdirona.com)
  • 111. avoid single points of failure
  • 112. system as a whole is reslient
  • 113. loose coupling sets you free
  • 114. loose coupling sets you free
  • 116. Tight  Coupling Controller  A Controller  B Controller  C Q Q Q Loose  Coupling  using   Controller  A Controller  B Controller  C Queues
  • 123. “Web”  Security  Group: TCP    80   0.0.0.0/0 TCP    443   0.0.0.0/0 TCP    22   “App” “App”  Security  Group: TCP    8080   “Web” TCP    22   172.154.0.0/16 TCP    22   “App” “DB”  Security  Group: TCP    3306   “App” TCP    3306   163.128.25.32/32 TCP    22   “App”
  • 127. hardware failed? simply throw it away and switch to new hardware with no additional cost
  • 128. cache
  • 134. decompose jobs into simplest form
  • 136. <4>
  • 141. queues
  • 142. URL  Queue Fetch  Images S3 Fetch  &  Store   Render   Page Queue Parse   Render   S3 Images  &   S3 Queue Pages Parse  Page Image   Queue Source: Jeff Barr
  • 143. sudo gem install cloud-crowd http://wiki.github.com/documentcloud/cloud-crowd
  • 146.
  • 147. Amazon Elastic MapReduce Amazon EC2 Instances End Deploy Application Hadoop Hadoop Hadoop Elastic Elastic MapReduce MapReduce Hadoop Hadoop Hadoop Notify Web Console, Command line tools Input output dataset results Input  S3   Output  S3   Get Results Input Data bucket bucket Amazon S3
  • 148. PREANNOUNCE  –  EXPAND/SHRINK  CLUSTERS Use  Case:  Increase  speed  of  running  job  flows Speed  up  job  flow  execuIon  in  response  to  changing  requirements Dynamically  balance  cost  versus  performance  without  restarIng  a  job Job Flow Job Flow Job Flow Allocate Expand to Expand to 4 instances 9 instances 25 instances Time remaining: Time remaining: 14 Hours 7 Hours Time remaining: 3 Hours
  • 149. Use  Case:  Agile  Data  Warehouse  Cluster Customize  cluster  size  to  support  varying  resource  needs Leverage  flexibility  to  reduce  costs  and  increase  cluster  uIlizaIon Data Warehouse (Batch Processing) Data Warehouse Data Warehouse (Steady State) (Steady State) Allocate Expand to Shrink to 9 instances 25 instances 9 instances
  • 150. PREANNOUNCE  –  IntegraIon  with  Spot  Instances Cost without Spot: 4 instances *14 hrs * $0.50 = $28 Job Flow Job Flow Cost with Spot: Allocate Expand to 4 instances *7 hrs * $0.50 = $13 + 4 instances 9 instances 5 instances * 7 hrs * $0.25 = $8.75 Total = $21.75 Time remaining: Savings: ~22% 14 Hours 7 Hours Time remaining:
  • 154.
  • 156. 10gbps
  • 157. 2 × Xeon 5570 (Intel “Nehalem”); 23 GB RAM; 10 Gbps Ethernet; 1690 GB local disk; HVM-based virtualization; $1.60/hr
  • 162. SQS
  • 163. <5>
  • 164. AWS + science = win
  • 167. 3.7 million classifications in just over three days; ~15 million in less than a month; >2.6 million clicks in 100 hours
  • 168. Biomarker Warehouse pre-clinical, clinical, 3rd party data and publications Estimated cost: 10 TB warehouse over 3 years
  • 169. Protein interactions @ U. Washington Simple Python scripts automate the management of 1000s of simultaneous experiments using the EC2 API http://faculty.washington.edu/danielt/ Source: Ed Lazowska
  • 170. 200 instances; 60,000 structures; 4 hours. http://bioteam.net/aws
  • 171. HEAVY-ION COLLISIONS Problem: Quark matter physics conference imminent but no compute resources handy Solution: NIMBUS context broker allowed researchers to provision 300 nodes and get the simulations done
  • 174. lots and lots and lots and lots and lots and lots of data and lots and lots of lots of data
  • 180. Image via image editor under a CC-BY license
  • 182. scale availability utilization sharing collaboration
  • 183. we are data geeks not data center geeks
  • 184. BLAT @ U. Penn: map 100 million 100-base paired-end reads. A quad-core machine with 5 GB of RAM would take 16 days; 30 high-memory instances took 32 hours and cost $195. Source: Angel Pizarro/John Hogenesch
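The figures on this slide imply a speedup and an average price per instance-hour, which can be worked out as follows. This is a rough sketch that assumes all 30 instances ran for the full 32 hours; the variable names are introduced here for illustration.

```python
HOURS_PER_DAY = 24

# Figures from the slide.
single_machine_hours = 16 * HOURS_PER_DAY  # quad-core estimate of 16 days
cluster_hours = 32                         # wall-clock time on EC2
instances = 30
total_cost = 195.0                         # US dollars

# Wall-clock speedup of the cluster over the single machine.
speedup = single_machine_hours / cluster_hours

# Total instance-hours consumed and the implied average hourly price.
instance_hours = instances * cluster_hours
cost_per_instance_hour = total_cost / instance_hours

if __name__ == "__main__":
    print(f"speedup: {speedup:.0f}x")
    print(f"instance-hours: {instance_hours}")
    print(f"average cost per instance-hour: ${cost_per_instance_hour:.3f}")
```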
  • 185. BELLE MONTE CARLO Credit: Tom Fifield
  • 186. MapReduce for Genomics Ben Langmead http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
  • 192. http://www.elasticr.net Elastic-R Collaborative Research Environment
  • 196. deesingh@amazon.com; Twitter: @mndoci; slides at http://slideshare.net/mndoci. Inspiration and material from Matt Wood, James Hamilton & Larry Lessig. By Oberazzi under a CC-BY-NC-SA license