The AWS Cloud
          Leveraging the State of the Art

          Sid Anand (@r39132)
          SAP Cloud Inside Track 2012





Thursday, February 16, 2012
What is the AWS Cloud?
          A Real World Scenario




A Real World Scenario
        Question

        If you were to build your own website today, what would you need?

        Answer

        You need a machine!

        For simplicity, we will assume that your web server and application server code run
        on the same box!

        AWS offers EC2 instances (i.e. virtual instances) to host your code

             - Various sizes (e.g. IOps, # of Spindles, CPUs, Memory, Network bandwidth)

             - Various configurations (e.g. Virtual Private Cloud, High Performance Cluster )

             - Various pricing schemes (e.g. on-demand, reserved, spot, etc.)




A Real World Scenario
        Question

        Is one machine enough to handle traffic
        from all of your users?

        What if that machine were to fall over or
        need maintenance (i.e. a restart)?



        Answer

        Add many machines!




A Real World Scenario
        Question

        This handles more traffic, but what if your
        servers were to fall over or need maintenance?

        Answer

        AWS offers Auto Scaling groups (a.k.a. ASGs)!

        You can deploy your servers under the protection of
        an ASG with a min and max pool size set.

        The ASG ensures that machines are replaced when
        they die to guarantee your “min” pool size

        ASGs monitor the health of your machines by polling
        an HTTP port on each machine
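
The replace-on-death behaviour described above can be sketched as a small reconcile loop. This is a toy model under stated assumptions, not the AWS API: the class name, the instance ID format, and the `is_healthy` callback (standing in for the health poll) are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Iterator, Set


@dataclass
class ToyAutoScalingGroup:
    """Toy model of an ASG (hypothetical, not the AWS API)."""
    min_size: int
    max_size: int
    instances: Set[str] = field(default_factory=set)
    _ids: Iterator[int] = field(default_factory=lambda: count(1), repr=False)

    def launch(self) -> str:
        # Refuse to grow past the configured "max" pool size.
        if len(self.instances) >= self.max_size:
            raise RuntimeError("pool is already at max size")
        instance_id = f"i-{next(self._ids):04d}"
        self.instances.add(instance_id)
        return instance_id

    def reconcile(self, is_healthy: Callable[[str], bool]) -> None:
        # Drop instances that fail the health check, then launch
        # replacements until the "min" pool size is restored.
        self.instances = {i for i in self.instances if is_healthy(i)}
        while len(self.instances) < self.min_size:
            self.launch()


asg = ToyAutoScalingGroup(min_size=3, max_size=6)
asg.reconcile(lambda _id: True)             # initial fill up to min_size
dead = {"i-0002"}
asg.reconcile(lambda i: i not in dead)      # the dead instance is replaced
```

After the second `reconcile` call the pool is back at three instances, with a fresh instance standing in for the one that died.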



A Real World Scenario
        Question

        How do you distribute traffic to all of your
        machines evenly?

        Answer

        Deploy your favorite software load balancer!

        And write some custom code to register/deregister
        your machine instances with the load balancer
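
A minimal sketch of what that custom code has to manage: a round-robin picker with register and deregister hooks. The class and the backend addresses are illustrative assumptions; a real software load balancer exposes its own configuration interface.

```python
class RoundRobinBalancer:
    """Toy round-robin balancer (hypothetical, for illustration only)."""

    def __init__(self):
        self._backends = []
        self._next = 0

    def register(self, backend):
        # Called when a new machine instance comes up.
        if backend not in self._backends:
            self._backends.append(backend)

    def deregister(self, backend):
        # Called when a machine instance dies or is taken out for maintenance.
        if backend in self._backends:
            self._backends.remove(backend)

    def pick(self):
        # Rotate through the registered backends so traffic is spread evenly.
        if not self._backends:
            raise RuntimeError("no backends registered")
        backend = self._backends[self._next % len(self._backends)]
        self._next += 1
        return backend


lb = RoundRobinBalancer()
for address in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
    lb.register(address)
picks = [lb.pick() for _ in range(6)]       # each backend is picked twice
lb.deregister("10.0.0.2")                   # this machine went away
picks_after = [lb.pick() for _ in range(4)]
```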




A Real World Scenario
      Question

      What if the load balancer were to fall over or to need maintenance or
      to become a traffic choke point?

      Answer

      Add multiple servers and deploy them under an ASG!

      This is not ideal for a few reasons

           - Need to register/deregister your Load Balancer instances with DNS

           - Need to sync with the ASG's view of what is alive and dead, being
           added or removed, etc.




A Real World Scenario
        Answer

        AWS offers Elastic Load Balancers (a.k.a. ELBs)

             - Conceptually similar to having many LBs in an ASG, with some
             additional features:

                  - Provides a DNS hostname (e.g. mysite-11111111.us-east-1.elb.amazonaws.com)

                  - Maps all of the load balancer instances to this hostname

                  - Takes care of maintenance of the load balancer machines and
                  the requisite DNS registrations/deregistrations

                  - Syncs with the ASG -- if the ASG replaces one of your
                  instances, the ELB will also remove that instance

             - Letʼs see how it works in action!




A Real World Scenario
        Question

        What about a DB to persist my data?

        Answer

        Multiple AWS hosted/managed options!

             - DynamoDB (the new SimpleDB replacement) offers key-value
             semantics

                  Netflix replaced Oracle with SimpleDB and ran on it 2010-2011

                  - 4.5 billion user-facing requests a day

             - S3 offers key-value semantics for very large files (e.g. 5TB).
             Typically for Map-Reduce files, media files, or Oracle BLOBs/CLOBs

             - RDS - hosted Oracle or MySQL if you need relations and complex
             queries

A Real World Scenario
        Question

        What if I have high-volume writes, but don't
        care when they are written (e.g. event
        streams)?

        Answer

        Simple Queue Service

             - Think Enterprise Message Bus

             - Highly available, infinitely scalable

             - Handles application/system monitoring
             event traffic and social graph events at
             Netflix
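
The idea of decoupling writers from a slow store through a queue can be sketched in-process. Here `queue.Queue` stands in for SQS and a list stands in for the datastore; both are assumptions for illustration. Real SQS delivery semantics differ (for example, delivery is at-least-once), so a real consumer should tolerate duplicates.

```python
import queue
import threading

events = queue.Queue()   # stand-in for an SQS queue (assumption)
stored = []              # stand-in for the slow datastore (assumption)


def consumer():
    # Drains the queue at its own pace; producers never wait on the store.
    while True:
        event = events.get()
        if event is None:            # sentinel: no more events
            break
        stored.append(event)         # the slow write happens off the request path


worker = threading.Thread(target=consumer)
worker.start()

# The producer only enqueues, which is fast, and moves on.
for seq in range(100):
    events.put({"type": "play", "seq": seq})
events.put(None)
worker.join()
```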




A Real World Scenario

        Question

        What if the whole Data Center goes
        down? How do I keep my service
        available?

        Answer

        Amazon Data Center = Availability Zone




A Real World Scenario
        Answer

        Always deploy your code in
        multiple Availability Zones!

             - Netflix deploys in 3 AZs in
             Virginia

             - Best Practice : Always deploy
             enough capacity in each AZ to
             handle losing one AZ during
             peak

             - Netflix follows this best
             practice!
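
The best practice above implies some simple arithmetic: if one of N AZs can disappear at peak, the remaining N - 1 must carry the whole peak load. A small sketch, where the function name and the example figures are hypothetical:

```python
import math


def instances_per_az(peak_instances: int, az_count: int) -> int:
    """Instances to run in each AZ so that losing one AZ still covers peak."""
    if az_count < 2:
        raise ValueError("need at least 2 AZs to survive the loss of one")
    # The surviving (az_count - 1) zones must hold the full peak capacity.
    return math.ceil(peak_instances / (az_count - 1))


# Example with made-up numbers: a peak load that needs 300 instances, 3 AZs.
per_az = instances_per_az(300, 3)   # 150 per AZ
total = per_az * 3                  # 450 in total, i.e. 50% above peak need
```

With three AZs this costs 50% extra capacity; with more AZs the overhead shrinks, since each zone carries a smaller share of the peak.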




A Real World Scenario
        Question

        What if your Asian and European customers complain of slow response times?

        Recall : Higher Response times, lower scalability

        Answer

        AWS has 8 global regions! Each region has between 3 and 4 AZs

             - Netflix's launch in the UK and Ireland was out of the AWS EU-West Region




A Real World Scenario
        Other AWS Services:

        - Elastic MapReduce : Map-Reduce as a Service for analytics. Supports Pig and Hive

        - ElastiCache : A hosted cache service (think Memcached as a Service)




        Whatʼs Missing (or coming soon)?:

        - Discovery & Load Balancing for N-tier applications!

             - In effect, weʼd like ELB for internal traffic

        - Crypto as a Service

        - Currently, none of the services are cross-region! Itʼs left to the user to transfer data or proxy requests between
        regions



Who Uses AWS?
          Netflix’s Cloud Architecture




Netflix’s Cloud Architecture
      Components

      Many (~100) applications, organized in clusters (a.k.a. ASGs)

      Clusters can be at different levels in the call stack

      Clusters can call each other

      [Diagram: layered stack, top to bottom: ELB, NES, NMTS (with Discovery alongside), NBES, IAAS]

Netflix’s Cloud Architecture
      Levels

      NES : Netflix Edge Services

      NMTS : Netflix Mid-tier Services

      NBES : Netflix Back-end Services

      IAAS : AWS IAAS Services

      Discovery : Helps services discover NMTS and NBES services

Netflix’s Cloud Architecture
        Components (NES)

        Overview

              Any service that browsers and streaming devices connect to over the internet

              They sit behind AWS Elastic Load Balancers (a.k.a. ELB)

              They call clusters at lower levels

Netflix’s Cloud Architecture
        Components (NES)

        Examples

              API Servers

                    Support the video browsing experience

                    Also allow users to modify their queue

                    Serve 1.4 billion calls/day

              Streaming Control Servers

                    Support streaming video playback

                    Authenticate your Wii, PS3, etc.

                    Download DRM to the Wii, PS3, etc.

                    Return a list of CDN URLs to the Wii, PS3, etc.

Netflix’s Cloud Architecture
        Components (NMTS)

        Overview

              Can call services at the same or lower levels

                    Other NMTS

                    NBES, IAAS

                    Not NES

              Exposed through our Discovery service

Netflix’s Cloud Architecture
        Components (NMTS)

        Examples

              Netflix Queue Servers

                    Modify items in the users' movie queue

              Viewing History Servers

                    Record and track all streaming movie watching

              SIMS Servers

                    Compute and serve user-to-user and movie-to-movie similarities

Netflix’s Cloud Architecture
        Components (NBES)

        Overview

              A back-end, usually 3rd party, open-source service

              Leaf in the call tree. Cannot call anything else

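
The call rules stated on the NES, NMTS, and NBES overview slides can be captured in a small lookup table. This is only a sketch of those rules. Treating IAAS as having no outgoing calls is an assumption, since the slides do not say, and the names `ALLOWED_CALLS` and `may_call` are hypothetical.

```python
# Which level may call which, as described on the overview slides.
ALLOWED_CALLS = {
    "NES": {"NMTS", "NBES", "IAAS"},    # edge services call lower levels
    "NMTS": {"NMTS", "NBES", "IAAS"},   # same or lower levels, never NES
    "NBES": set(),                      # leaf in the call tree
    "IAAS": set(),                      # assumption: modelled as a leaf too
}


def may_call(caller: str, callee: str) -> bool:
    """Return True if a cluster at level `caller` may call level `callee`."""
    return callee in ALLOWED_CALLS[caller]
```

For example, `may_call("NMTS", "NMTS")` is allowed, while `may_call("NMTS", "NES")` and `may_call("NBES", "IAAS")` are not.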
Netflix’s Cloud Architecture
        Components (NBES)

        Examples

              Cassandra Clusters

                    Cassandra is our new cloud database and stores all sorts of data to support application needs

              ZooKeeper Clusters

                    Our distributed lock service and sequence generator

              Memcached Clusters

                    Typically cache things that we store in S3 but need to access quickly or often

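
Putting a cache in front of a slower store, as the Memcached clusters do for S3 data, is commonly done with the cache-aside pattern. The sketch below uses plain dictionaries as stand-ins for both Memcached and S3. The class is hypothetical and ignores expiry and invalidation.

```python
class CacheAside:
    """Toy cache-aside reader: check the cache first, fall back to the store."""

    def __init__(self, backing_store):
        self._store = backing_store   # stand-in for S3 (assumption)
        self._cache = {}              # stand-in for Memcached (assumption)
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]   # fast path: served from the cache
        self.misses += 1
        value = self._store[key]      # slow path: read from the backing store
        self._cache[key] = value      # populate the cache for later reads
        return value


reader = CacheAside({"box-art/123": b"jpeg-bytes"})
first = reader.get("box-art/123")     # miss: goes to the backing store
second = reader.get("box-art/123")    # hit: served from the cache
```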
Netflix’s Cloud Architecture
        Components (IAAS)

        Examples

              AWS S3

                    Large-sized data (e.g. video encodes, application logs, etc.) is stored here, not in Cassandra

              AWS SQS

                    Amazon's message queue to send events (e.g. Facebook network updates are processed asynchronously over SQS)

Netflix’s Cloud Architecture
      Architecture Pros

      Horizontally scalable at every level

            Should give us maximum availability



      Architecture Cons

      A user-issued call will pass through multiple levels (a.k.a. hops) during normal operation

            Latency can be a concern

      EC2 instances in AWS can die at any time!

      A lot of moving parts



Dealing with the Cons!


       We have a little help




Simian Army
          Prevention (& Early Detection) is the best
          medicine




Simian Army
  • Chaos Monkey
        • Simulates hard failures in AWS by killing a few instances per ASG (i.e. Auto Scaling group)
              • Similar to how EC2 instances can be killed by AWS with little warning
        • Tests Netflix's ability to gracefully deal with broken connections, interrupted calls, etc.
        • Verifies that all services are running within the protection of AWS Auto Scaling groups, which
              reincarnate killed instances

              • If not, the Chaos Monkey will win!
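
The selection step of such a tool can be sketched as follows. This is not Netflix's implementation: the function, the group names, and the instance IDs are hypothetical, and the sketch only picks victims without terminating anything.

```python
import random


def pick_victims(asgs, per_group=1, rng=None):
    """Choose up to `per_group` random instances from every ASG."""
    if rng is None:
        rng = random.Random()
    victims = {}
    for name, instances in asgs.items():
        k = min(per_group, len(instances))
        # sorted() gives a stable order to sample from and accepts sets too.
        victims[name] = rng.sample(sorted(instances), k)
    return victims


asgs = {"api": ["i-1", "i-2", "i-3"], "queue": ["i-4"]}
victims = pick_victims(asgs, per_group=2, rng=random.Random(0))
```

Passing a seeded `random.Random` makes a run repeatable, which helps when a failure needs to be reproduced.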




Simian Army
  • Latency Monkey
        • Simulates soft failures -- i.e. a service gets slower
        • Injects random delays in servers!
        • Tests the ability of applications to detect and recover (i.e. Graceful Degradation) from the harder
              problem of delays

              • Delays cause Thundering Herds (outside of the scope of this talk!)
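
A minimal sketch of both sides of this test: a wrapper that injects a random delay into a call, and a caller that degrades gracefully by falling back to a default when the call exceeds its timeout. All names, delays, and the fallback value are assumptions for illustration.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def with_random_delay(func, min_s, max_s):
    """Wrap `func` so every call first sleeps for a random delay."""
    def slowed(*args, **kwargs):
        time.sleep(random.uniform(min_s, max_s))
        return func(*args, **kwargs)
    return slowed


def call_with_fallback(func, timeout_s, fallback):
    """Run `func`; if it takes longer than `timeout_s`, return `fallback`."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func)
    try:
        return future.result(timeout=timeout_s)
    except TimeoutError:
        return fallback               # graceful degradation
    finally:
        pool.shutdown(wait=False)     # do not block on the slow call


def recommendations():
    return ["personalized", "titles"]


fast = call_with_fallback(recommendations, 0.5, ["popular"])
slow = call_with_fallback(
    with_random_delay(recommendations, 0.2, 0.3), 0.05, ["popular"]
)
```

The healthy call returns its real result, while the delayed call times out and the caller serves the fallback list instead.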




Simian Army


                              Does this solve all of our issues?




Simian Army

       The infinite cloud is infinite when your needs are
       moderate!


       To ensure fairness among tenants, AWS meters or limits every resource

       Hence, we hit limits quite often. Our “velocity” is limited by how long it takes for AWS to
       turn around and raise the limit -- a few hours!




Simian Army
  • Limits Monkey
        • Checks once an hour whether we are approaching one of our limits and triggers alerts for us to
              proactively reach out to AWS!




  • Conformity & Janitor Monkeys
        • Find and clean up orphaned resources (e.g. EC2 instances that are not in an ASG,
              unreferenced security groups, ELBs, ASGs, etc.) to increase head-room

              • Buys us more time before we run out of resources and also saves us $$$$
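
The two checks on this slide can be sketched as plain functions over inventory data. The resource names, the numbers, and the 80% alert threshold are all hypothetical. A real tool would read usage and limits from the provider.

```python
def approaching_limits(usage, limits, threshold=0.8):
    """Return (resource, used, limit) for resources at or above the threshold."""
    alerts = []
    for resource, limit in limits.items():
        used = usage.get(resource, 0)
        if limit > 0 and used / limit >= threshold:
            alerts.append((resource, used, limit))
    return alerts


def find_orphans(all_instances, asg_membership):
    """Return the instances that do not belong to any ASG."""
    managed = set().union(*asg_membership.values())
    return sorted(set(all_instances) - managed)


# Limits check with made-up numbers: EC2 usage is at 95% of its limit.
alerts = approaching_limits(
    usage={"ec2_instances": 950, "elbs": 3},
    limits={"ec2_instances": 1000, "elbs": 20},
)

# Orphan check: i-3 is running but is not a member of any ASG.
orphans = find_orphans(
    ["i-1", "i-2", "i-3", "i-4"],
    {"api": {"i-1", "i-2"}, "queue": {"i-4"}},
)
```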




Questions?
          Sid Anand


                  @r39132


          http://www.linkedin.com/in/siddharthanand


                                                      35

Thursday, February 16, 2012

Contenu connexe

Plus de Sid Anand

Building & Operating High-Fidelity Data Streams - QCon Plus 2021
Building & Operating High-Fidelity Data Streams - QCon Plus 2021Building & Operating High-Fidelity Data Streams - QCon Plus 2021
Building & Operating High-Fidelity Data Streams - QCon Plus 2021Sid Anand
 
Low Latency Fraud Detection & Prevention
Low Latency Fraud Detection & PreventionLow Latency Fraud Detection & Prevention
Low Latency Fraud Detection & PreventionSid Anand
 
YOW! Data Keynote (2021)
YOW! Data Keynote (2021)YOW! Data Keynote (2021)
YOW! Data Keynote (2021)Sid Anand
 
Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)Sid Anand
 
Building Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowBuilding Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowSid Anand
 
Cloud Native Predictive Data Pipelines (micro talk)
Cloud Native Predictive Data Pipelines (micro talk)Cloud Native Predictive Data Pipelines (micro talk)
Cloud Native Predictive Data Pipelines (micro talk)Sid Anand
 
Cloud Native Data Pipelines (GoTo Chicago 2017)
Cloud Native Data Pipelines (GoTo Chicago 2017)Cloud Native Data Pipelines (GoTo Chicago 2017)
Cloud Native Data Pipelines (GoTo Chicago 2017)Sid Anand
 
Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Sid Anand
 
Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016Sid Anand
 
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)Sid Anand
 
Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)Sid Anand
 
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)Sid Anand
 
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)Sid Anand
 
Building a Modern Website for Scale (QCon NY 2013)
Building a Modern Website for Scale (QCon NY 2013)Building a Modern Website for Scale (QCon NY 2013)
Building a Modern Website for Scale (QCon NY 2013)Sid Anand
 
Hands On with Maven
Hands On with MavenHands On with Maven
Hands On with MavenSid Anand
 
Learning git
Learning gitLearning git
Learning gitSid Anand
 
LinkedIn Data Infrastructure Slides (Version 2)
LinkedIn Data Infrastructure Slides (Version 2)LinkedIn Data Infrastructure Slides (Version 2)
LinkedIn Data Infrastructure Slides (Version 2)Sid Anand
 
LinkedIn Data Infrastructure (QCon London 2012)
LinkedIn Data Infrastructure (QCon London 2012)LinkedIn Data Infrastructure (QCon London 2012)
LinkedIn Data Infrastructure (QCon London 2012)Sid Anand
 
Keeping Movies Running Amid Thunderstorms!
Keeping Movies Running Amid Thunderstorms!Keeping Movies Running Amid Thunderstorms!
Keeping Movies Running Amid Thunderstorms!Sid Anand
 
OSCON Data 2011 -- NoSQL @ Netflix, Part 2
OSCON Data 2011 -- NoSQL @ Netflix, Part 2OSCON Data 2011 -- NoSQL @ Netflix, Part 2
OSCON Data 2011 -- NoSQL @ Netflix, Part 2Sid Anand
 

Plus de Sid Anand (20)

Building & Operating High-Fidelity Data Streams - QCon Plus 2021
Building & Operating High-Fidelity Data Streams - QCon Plus 2021Building & Operating High-Fidelity Data Streams - QCon Plus 2021
Building & Operating High-Fidelity Data Streams - QCon Plus 2021
 
Low Latency Fraud Detection & Prevention
Low Latency Fraud Detection & PreventionLow Latency Fraud Detection & Prevention
Low Latency Fraud Detection & Prevention
 
YOW! Data Keynote (2021)
YOW! Data Keynote (2021)YOW! Data Keynote (2021)
YOW! Data Keynote (2021)
 
Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)Big Data, Fast Data @ PayPal (YOW 2018)
Big Data, Fast Data @ PayPal (YOW 2018)
 
Building Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowBuilding Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache Airflow
 
Cloud Native Predictive Data Pipelines (micro talk)
Cloud Native Predictive Data Pipelines (micro talk)Cloud Native Predictive Data Pipelines (micro talk)
Cloud Native Predictive Data Pipelines (micro talk)
 
Cloud Native Data Pipelines (GoTo Chicago 2017)
Cloud Native Data Pipelines (GoTo Chicago 2017)Cloud Native Data Pipelines (GoTo Chicago 2017)
Cloud Native Data Pipelines (GoTo Chicago 2017)
 
Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)
 
Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016Introduction to Apache Airflow - Data Day Seattle 2016
Introduction to Apache Airflow - Data Day Seattle 2016
 
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)Resilient Predictive Data Pipelines (GOTO Chicago 2016)
Resilient Predictive Data Pipelines (GOTO Chicago 2016)
 
Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)Resilient Predictive Data Pipelines (QCon London 2016)
Resilient Predictive Data Pipelines (QCon London 2016)
 
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
 
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
LinkedIn's Segmentation & Targeting Platform (Hadoop Summit 2013)
 
Building a Modern Website for Scale (QCon NY 2013)
Building a Modern Website for Scale (QCon NY 2013)Building a Modern Website for Scale (QCon NY 2013)
Building a Modern Website for Scale (QCon NY 2013)
 
Hands On with Maven
Hands On with MavenHands On with Maven
Hands On with Maven
 
Learning git
Learning gitLearning git
Learning git
 
LinkedIn Data Infrastructure Slides (Version 2)
LinkedIn Data Infrastructure Slides (Version 2)LinkedIn Data Infrastructure Slides (Version 2)
LinkedIn Data Infrastructure Slides (Version 2)
 
LinkedIn Data Infrastructure (QCon London 2012)
LinkedIn Data Infrastructure (QCon London 2012)LinkedIn Data Infrastructure (QCon London 2012)
LinkedIn Data Infrastructure (QCon London 2012)
 
Keeping Movies Running Amid Thunderstorms!
Keeping Movies Running Amid Thunderstorms!Keeping Movies Running Amid Thunderstorms!
Keeping Movies Running Amid Thunderstorms!
 
OSCON Data 2011 -- NoSQL @ Netflix, Part 2
OSCON Data 2011 -- NoSQL @ Netflix, Part 2OSCON Data 2011 -- NoSQL @ Netflix, Part 2
OSCON Data 2011 -- NoSQL @ Netflix, Part 2
 


The AWS Cloud: Leveraging the State of the Art

  • 1. The AWS Cloud: Leveraging the State of the Art
      Sid Anand (@r39132), SAP Cloud Inside Track 2012
      Thursday, February 16, 2012
  • 2. What is the AWS Cloud? A Real World Scenario
  • 3. A Real World Scenario
      Question: If you were to build your own website today, what would you need?
      Answer: You need a machine! For simplicity, we will assume that your web server and application server code run on the same box. AWS offers EC2 instances (i.e. virtual machines) to host your code:
      - Various sizes (e.g. IOPS, number of spindles, CPUs, memory, network bandwidth)
      - Various configurations (e.g. Virtual Private Cloud, High Performance Cluster)
      - Various pricing schemes (e.g. on-demand, reserved, spot, etc.)
  • 4. A Real World Scenario
      Question: Is one machine enough to handle traffic from all of your users? What if that machine were to fall over or need maintenance (i.e. a restart)?
      Answer: Add many machines!
  • 5. A Real World Scenario
      Question: This handles more traffic, but what if your servers were to fall over or need maintenance?
      Answer: AWS offers AutoScaleGroups (a.k.a. ASGs)!
      - You can deploy your servers under the protection of an ASG with a min and max pool size set.
      - The ASG replaces machines when they die, to guarantee your "min" pool size.
      - ASGs monitor the health of your machines by polling an HTTP port on each machine.
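The ASG behaviour on this slide (poll health, drop dead machines, refill to the "min" pool size) can be sketched as a toy model. This is only an illustration in plain Python, not the AWS API: the `AutoScaleGroup` class, the `reconcile` method, and the `health_check` callable (standing in for the HTTP port poll) are all hypothetical names.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class AutoScaleGroup:
    """Toy model of an ASG that keeps at least `min_size` instances alive."""
    min_size: int
    max_size: int
    instances: list = field(default_factory=list)
    _ids: count = field(default_factory=count, repr=False)

    def launch(self):
        # Refuse to grow past the configured maximum pool size.
        if len(self.instances) >= self.max_size:
            raise RuntimeError("max pool size reached")
        instance_id = f"i-{next(self._ids):04d}"
        self.instances.append(instance_id)
        return instance_id

    def reconcile(self, health_check):
        """Poll each instance, drop unhealthy ones, then refill to min_size."""
        # `health_check` is a stand-in for polling an HTTP port (assumption).
        self.instances = [i for i in self.instances if health_check(i)]
        while len(self.instances) < self.min_size:
            self.launch()
```

Calling `reconcile` on a fresh group launches `min_size` instances; calling it again with a health check that fails one instance removes that instance and launches a replacement.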
  • 6. A Real World Scenario
      Question: How do you distribute traffic to all of your machines evenly?
      Answer: Deploy your favorite software load balancer! And write some custom code to register/deregister your machine instances with the load balancer.
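As a rough illustration of the "custom code to register/deregister" idea, here is a minimal round-robin balancer in plain Python. It is a sketch under assumptions: the `RoundRobinBalancer` class and its method names are hypothetical, and round-robin is just one simple way to spread traffic evenly.

```python
class RoundRobinBalancer:
    """Hands out registered backends in rotation."""

    def __init__(self):
        self._backends = []
        self._next = 0

    def register(self, backend):
        # Called when a machine instance comes up.
        if backend not in self._backends:
            self._backends.append(backend)

    def deregister(self, backend):
        # Called when a machine instance dies or is taken out for maintenance.
        if backend in self._backends:
            self._backends.remove(backend)
            self._next = 0  # restart the rotation on the smaller pool

    def pick(self):
        if not self._backends:
            raise LookupError("no registered backends")
        backend = self._backends[self._next % len(self._backends)]
        self._next = (self._next + 1) % len(self._backends)
        return backend
```

The bookkeeping is easy for one balancer; the next slides explain why keeping it in sync across several balancers and an ASG is the hard part.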
  • 7. A Real World Scenario
      Question: What if the load balancer were to fall over, need maintenance, or become a traffic choke point?
      Answer: Add multiple load balancer servers and deploy them under an ASG! This is not ideal for a few reasons:
      - Need to register/deregister your load balancer instances with DNS
      - Need to sync with the ASG's view of what is alive and dead, being added or removed, etc.
  • 8. A Real World Scenario
      Answer: AWS offers Elastic Load Balancers (i.e. ELB)
      - Conceptually similar to having many LBs in an ASG, with some additional features:
      - Provides a DNS hostname (e.g. mysite-11111111.us-east-1.elb.amazonaws.com)
      - Maps all of the load balancer instances to this hostname
      - Takes care of maintenance of the load balancer machines and the requisite DNS registrations/deregistrations
      - Syncs with the ASG: if the ASG replaces one of your instances, the ELB will also remove that instance
      - Let's see how it works in action!
  • 9. [Figure]
  • 10. A Real World Scenario
      Question: What about a DB to persist my data?
      Answer: Multiple AWS hosted/managed options!
      - DynamoDB (the new SimpleDB replacement) offers key-value semantics. Netflix replaced Oracle with SimpleDB and ran on it in 2010-2011, serving 4.5 billion user-facing requests a day.
      - S3 offers key-value semantics for very large files (e.g. 5 TB). Typically used for Map-Reduce files, media files, or Oracle BLOBs/CLOBs.
      - RDS offers hosted Oracle or MySQL if you need relations and complex queries.
  • 11. A Real World Scenario
      Question: What if I have high-volume writes, but don't care when they are written (e.g. event streams)?
      Answer: Simple Queue Service
      - Think Enterprise Message Bus
      - Highly available, infinitely scalable
      - Handles application/system monitoring event traffic and social graph events at Netflix
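The pattern behind this slide (the writer hands events to a queue and moves on, while a separate worker persists them later) can be sketched with Python's standard library. This is an in-process stand-in, not SQS itself: `queue.Queue` plays the role of the hosted queue, and `run_pipeline` is a hypothetical helper name.

```python
import queue
import threading


def run_pipeline(events):
    """Enqueue events without waiting on the write; a worker stores them later."""
    buffer = queue.Queue()  # in-process stand-in for a hosted queue (assumption)
    stored = []

    def consumer():
        while True:
            event = buffer.get()
            if event is None:  # sentinel: the producer is finished
                break
            stored.append(event)  # the slow, deferred write would happen here

    worker = threading.Thread(target=consumer)
    worker.start()
    for event in events:
        buffer.put(event)  # returns immediately; the producer never blocks on the write
    buffer.put(None)
    worker.join()
    return stored
```

This toy preserves order because it has a single consumer on a FIFO queue; that property should not be assumed of a real hosted queue.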
  • 12. A Real World Scenario
      Question: What if the whole Data Center goes down? How do I keep my service available?
      Answer: Amazon Data Center = Availability Zone
  • 13. A Real World Scenario
      Answer: Always deploy your code in multiple Availability Zones!
      - Netflix deploys in 3 AZs in Virginia
      - Best practice: always deploy enough capacity in each AZ to handle losing one AZ during peak
      - Netflix follows this best practice!
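The best practice above implies some simple arithmetic: if peak traffic needs P instances and you run in N AZs, the N - 1 surviving AZs must carry P between them. A minimal sketch follows; the function name and the example figure of 300 instances are hypothetical.

```python
import math


def instances_per_az(peak_instances, az_count):
    """Instances to run in each AZ so that losing one AZ still covers peak."""
    if az_count < 2:
        raise ValueError("need at least two AZs to survive losing one")
    # The remaining (az_count - 1) zones must absorb the whole peak load.
    return math.ceil(peak_instances / (az_count - 1))
```

For example, with a peak of 300 instances across 3 AZs, each AZ runs 150 instances (450 in total), which is 50% more than peak demand alone would require.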
  • 14. A Real World Scenario
      Question: What if your Asian and European customers complain of slow response times? Recall: higher response times, lower scalability.
      Answer: AWS has 8 global regions! Each region has between 3 and 4 AZs.
      - Netflix's launch in the UK and Ireland was out of the AWS EU-West Region
  • 15. A Real World Scenario
  • 16. A Real World Scenario
      Other AWS Services:
      - Elastic Map Reduce: Map-Reduce as a Service for analytics. Supports Pig and Hive
      - ElastiCache: a hosted cache service (think Memcached as a Service)
      What's missing (or coming soon)?
      - Discovery & Load Balancing for N-tier applications! In effect, we'd like ELB for internal traffic
      - Crypto as a Service
      - Currently, none of the services are cross-region! It's left to the user to transfer data or proxy requests between regions
  • 17. Who Uses AWS? Netflix's Cloud Architecture
  • 18. Netflix's Cloud Architecture
      [Diagram: ELBs in front of NES clusters, which sit above the NMTS, NBES, and IAAS layers, with a Discovery service alongside]
      Components
      - Many (~100) applications, organized in clusters (a.k.a. ASGs)
      - Clusters can be at different levels in the call stack
      - Clusters can call each other
  • 19. Netflix's Cloud Architecture
      Levels
      - NES: Netflix Edge Services
      - NMTS: Netflix Mid-tier Services
      - NBES: Netflix Back-end Services
      - IAAS: AWS IAAS Services
      - Discovery: helps services discover NMTS and NBES services
  • 20. Netflix's Cloud Architecture
      Components (NES): Overview
      - Any service that browsers and streaming devices connect to over the internet
      - They sit behind AWS Elastic Load Balancers (a.k.a. ELB)
      - They call clusters at lower levels
  • 21. Netflix's Cloud Architecture
      Components (NES): Examples
      - API Servers: support the video browsing experience; also allow users to modify their queue; serve 1.4 billion calls/day
      - Streaming Control Servers: support streaming video playback; authenticate your Wii, PS3, etc.; download DRM to the Wii, PS3, etc.; return a list of CDN URLs to the Wii, PS3, etc.
  • 22. Netflix's Cloud Architecture
      Components (NMTS): Overview
      - Can call services at the same or lower levels: other NMTS, NBES, IAAS; not NES
      - Exposed through our Discovery service
  • 23. Netflix's Cloud Architecture
      Components (NMTS): Examples
      - Netflix Queue Servers: modify items in the users' movie queue
      - Viewing History Servers: record and track all streaming movie watching
      - SIMS Servers: compute and serve user-to-user and movie-to-movie similarities
  • 24. Netflix's Cloud Architecture
      Components (NBES): Overview
      - A back-end, usually 3rd party, open-source service
      - Leaf in the call tree. Cannot call anything else
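The calling rules described so far (NES calls lower levels, NMTS calls the same or lower levels but never NES, NBES is a leaf) can be written down as a small lookup table. This is only a sketch: the names `ALLOWED_CALLS`, `can_call`, and `validate_path` are hypothetical, and treating IAAS as making no calls is an assumption, since the slides do not say.

```python
# Which level may call which, as read from the slides.
ALLOWED_CALLS = {
    "NES": {"NMTS", "NBES", "IAAS"},   # edge services call lower levels
    "NMTS": {"NMTS", "NBES", "IAAS"},  # same or lower levels, never NES
    "NBES": set(),                     # leaf in the call tree
    "IAAS": set(),                     # assumption: not described as a caller
}


def can_call(caller, callee):
    """True if a service at level `caller` may call one at level `callee`."""
    return callee in ALLOWED_CALLS.get(caller, set())


def validate_path(path):
    """True if every hop in a call path respects the level rules."""
    return all(can_call(a, b) for a, b in zip(path, path[1:]))
```

A path such as NES, NMTS, NMTS, NBES is valid, while any path that leaves an NBES node or re-enters NES is rejected.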
  • 25. Netflix's Cloud Architecture
      Components (NBES): Examples
      - Cassandra Clusters: our new cloud database is Cassandra, and it stores all sorts of data to support application needs
      - Zookeeper Clusters: our distributed lock service and sequence generator
      - Memcached Clusters: typically cache things that we store in S3 but need to access quickly or often
  • 26. Netflix's Cloud Architecture
      Components (IAAS): Examples
      - AWS S3: large-sized data (e.g. video encodes, application logs, etc.) is stored here, not in Cassandra
      - AWS SQS: Amazon's message queue to send events (e.g. Facebook network updates are processed asynchronously over SQS)
  • 27. Netflix's Cloud Architecture
      Architecture Pros
      - Horizontally scalable at every level
      - Should give us maximum availability
      Architecture Cons
      - A user-issued call will pass through multiple levels (a.k.a. hops) during normal operation
      - Latency can be a concern
      - EC2 instances in AWS can die at any time!
      - A lot of moving parts
  • 28. Dealing with the Cons! We have a little help
  • 29. Simian Army: Prevention (& Early Detection) is the best medicine
  • 30. Simian Army
      Chaos Monkey
      - Simulates hard failures in AWS by killing a few instances per ASG (i.e. Auto Scale Group)
      - Similar to how EC2 instances can be killed by AWS with little warning
      - Tests Netflix's ability to gracefully deal with broken connections, interrupted calls, etc.
      - Verifies that all services are running within the protection of AWS Auto Scale Groups, which reincarnate killed instances
      - If not, the Chaos Monkey will win!
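The Chaos Monkey idea can be simulated in a few lines: kill one random instance per group, then let only the groups that are under ASG protection heal. This is a toy simulation, not Netflix's tool; `chaos_round`, the `groups` and `min_sizes` dictionaries, and the replacement naming scheme are all hypothetical.

```python
import random


def chaos_round(groups, min_sizes, rng):
    """Kill one random instance per group, then heal groups that have a min size.

    `groups` maps a service name to a set of instance ids. `min_sizes` lists
    only the services that run under ASG-like protection (assumption).
    """
    killed = {}
    for name, instances in groups.items():
        if instances:
            victim = rng.choice(sorted(instances))
            instances.remove(victim)
            killed[name] = victim
    for name, instances in groups.items():
        replacement = 0
        # Unprotected services have no entry in min_sizes and never heal.
        while len(instances) < min_sizes.get(name, 0):
            instances.add(f"{name}-replacement-{replacement}")
            replacement += 1
    return killed
```

A service missing from `min_sizes` simply shrinks after each round, which is the "Chaos Monkey will win" case from the slide.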
  • 31. Simian Army
      Latency Monkey
      - Simulates soft failures, i.e. a service gets slower
      - Injects random delays in servers!
      - Tests the ability of applications to detect and recover (i.e. Graceful Degradation) from the harder problem of delays
      - Delays cause Thundering Herds (outside of the scope of this talk!)
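One way to picture both halves of this slide is a wrapper that injects a random delay into a call, plus a caller that gives up after a timeout and degrades to a fallback value. This is a sketch using only the Python standard library; `with_injected_delay`, `call_with_fallback`, and the idea of returning a cached value as the fallback are assumptions, not a description of Netflix's implementation.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def with_injected_delay(func, min_delay_s, max_delay_s, rng):
    """Wrap `func` so every call first sleeps for a random delay."""
    def slowed(*args, **kwargs):
        time.sleep(rng.uniform(min_delay_s, max_delay_s))
        return func(*args, **kwargs)
    return slowed


def call_with_fallback(func, timeout_s, fallback):
    """Return func() if it answers within `timeout_s`, else `fallback`."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func)
    try:
        return future.result(timeout=timeout_s)
    except TimeoutError:
        return fallback  # graceful degradation instead of waiting
    finally:
        pool.shutdown(wait=False)  # do not block on the slow call
```

A caller without such a timeout would simply hang for as long as the injected delay lasts, which is the behaviour this kind of test is meant to expose.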
  • 32. Simian Army: Does this solve all of our issues?
  • 33. Simian Army
      - The infinite cloud is infinite when your needs are moderate!
      - To ensure fairness among tenants, AWS meters or limits every resource
      - Hence, we hit limits quite often. Our "velocity" is limited by how long it takes for AWS to turn around and raise the limit: a few hours!
  • 34. Simian Army
      Limits Monkey
      - Checks once an hour whether we are approaching one of our limits and triggers alerts for us to proactively reach out to AWS!
      Conformity & Janitor Monkeys
      - Find and clean up orphaned resources (e.g. EC2 instances that are not in an ASG, unreferenced security groups, ELBs, ASGs, etc.) to increase head-room
      - Buy us more time before we run out of resources and also save us $$$$
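Both checks on this slide reduce to simple comparisons over inventory data. The sketch below assumes the usage numbers, limits, and instance lists have already been fetched from somewhere; the function names and the 80% alert threshold are hypothetical choices, not values from the talk.

```python
def approaching_limits(usage, limits, threshold=0.8):
    """Return resources whose usage is at or above `threshold` of their limit."""
    return sorted(
        name for name, used in usage.items()
        if name in limits and used >= threshold * limits[name]
    )


def orphaned_instances(all_instances, asg_members):
    """Return instances that do not belong to any ASG (cleanup candidates)."""
    return sorted(set(all_instances) - set(asg_members))
```

Run hourly, the first function would feed the alerts; the second gives a candidate list for cleanup, which frees head-room under the same limits.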
  • 35. Questions?
      Sid Anand (@r39132), http://www.linkedin.com/in/siddharthanand