Stacking up with OpenStack:
Building for High Availability
Utpal Thakrar, Sr. Product Manager
April 17, 2013
2#



My relationship with HA   1975




      Cloud Management    #rightscale
3#



My relationship with HA      1991




4#



My relationship with HA                         2001



                         How many 9-s can
                          your product do?




5#



So what did they mean by 5-9s?

    Availability          Allowed Down Time each Year
    99%                   3.65 days
    99.9%                 8.76 hours
    99.99%                52.56 minutes
    99.999%               5.26 minutes
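
The figures in the table follow from simple arithmetic. A minimal sketch, assuming a 365-day year (the function name is illustrative and not from the slides):

```python
# Allowed downtime per year for a given availability target.
# Assumes a 365-day year, which matches the figures in the table above.
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the hours per year a service may be down at this availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    hours = allowed_downtime_hours(target)
    print(f"{target}% -> {hours:.2f} hours ({hours * 60:.2f} minutes)")
```

Each extra 9 cuts the downtime budget by a factor of ten, which is why the step from four to five 9s leaves only about five minutes a year.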




6#



Stuff happens, are you prepared?




7#



Whodunnit?…




8#



And you see these …




9#



Is 100% Outage-proofing possible?




10#



Old School Fault-Tolerance: Build Two




11#

Golden Age of Cloud Computing
• No Up-Front Capital Expense
• Low Cost
• Pay Only for What You Use
• Self-Service Infrastructure
• Easily Scale Up and Down
• Improve Agility & Time-to-Market

     Deploy




12#

Golden Age for Fault-Tolerance
• No Up-Front HA Capital Expense
• Low Cost Backups
• Pay for DR Only When You Use it
• Self-Service DR Infrastructure
• Easily Deliver Fault-Tolerant Applications
• Improve Agility & Time-to-Recovery

      Deploy




13#



Yeah, but …
What about my private cloud?

Applications deployed in private clouds have to worry about:

• Private Cloud Infrastructure being HA
• Application architecture HA / DR

• With Public Clouds – Well, you get what your provider gives
  you



14#



Private Cloud Infrastructure HA
Several single points of failure in OpenStack deployment
• OpenStack API services
• MySQL
• RabbitMQ

Solved in various ways
• Pacemaker cluster management
• Keepalived (e.g., RAX Private Cloud)
• MySQL (Galera), RabbitMQ (active-active mirrored queues)

               Eliminate SPoFs as best you can.
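
Why removing single points of failure pays off can be sketched with basic availability arithmetic. The example below assumes component failures are independent, which real clusters only approximate, and the 99% figure is purely illustrative (it does not come from the slides):

```python
from math import prod


def parallel(availability: float, copies: int) -> float:
    """Availability of redundant copies: the group is up if at least one copy is up.

    Assumes copies fail independently of each other.
    """
    return 1 - (1 - availability) ** copies


def serial(*availabilities: float) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


# Illustrative figure only: each service is 99% available on its own.
single = 0.99

# One instance each of the API services, MySQL, and RabbitMQ.
spof_stack = serial(single, single, single)

# Two redundant instances of each of the three services.
ha_stack = serial(*(parallel(single, 2) for _ in range(3)))

print(f"single instances: {spof_stack:.4%}")
print(f"redundant pairs:  {ha_stack:.4%}")
```

With one instance of each service the chain is less available than any single component; with redundant pairs it is more available than any single component.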

15#



What about my app?
Design for failure:
• If your application relies on Cloud infrastructure
  SLA for its HA needs, you are STUCK with that
  vendor / infrastructure

• Need to balance cost and complexity against risk
  tolerance

• Design the application to survive failure at every level:
      Build for server failure
      Build for zone failure
      Build for cloud failure
      Keep management layer separate from infrastructure
16#



Build for Server Failure
• Set up auto-scaling

• Set up database mirroring,
  master/slave configuration

• Use static public IPs

• Use Dynamic DNS for
  private IPs




17#



Build for Zone Failure

[Diagram: DNS resolves to static public IPs (172.168.7.31 and 172.168.8.62). Zone 1 and Zone 2 each have load balancers in front of autoscaling app servers. The master DB replicates to the slave DB. Block volumes are snapshotted to the object store.]

• Where possible, use NoSQL DB like Cassandra or MongoDB
• Snapshot data volume for backups so the database can be readily recovered within the region.
• Place Slave databases in one or more zones for failover.
• A creative deployment model would be to make your private cloud an “AZ” by placing it in close physical proximity to a public cloud provider
18#



Build for Cloud Failure (Cold DR)
Staged Server Configuration and generally no staged data
Relative cost: $
• Not recommended if rapid recovery is required
• Slow to replicate data to other cloud and bring database online

[Diagram: DNS (172.168.7.31) points to the Private cloud, which has load balancers, app servers, and a master DB replicating to a slave DB, with block-volume snapshots stored in Cloud Files. The DALLAS cloud has load balancers, app servers, and a slave DB, with no replication link shown.]
19#



Build for Cloud Failure (Warm DR)
Staged Server Configuration, pre-staged data and running Slave Database Server
Relative cost: $$
• Generally recommended DR solution
• Minimal additional cost and allows fairly rapid recovery

[Diagram: DNS (172.168.7.31) points to the Private cloud, which has load balancers, app servers, and a master DB replicating to a slave DB, with block-volume snapshots stored in Cloud Files. The DALLAS cloud has load balancers, app servers, and a slave DB that also receives replicated data and takes its own snapshots.]
20#



Build for Cloud Failure (Hot DR)
Parallel Deployment with all servers running but all traffic going to primary
Relative cost: $$$
• Not recommended
• Very high additional cost to allow rapid recovery

[Diagram: DNS (172.168.7.31) points to the Private cloud, which has load balancers, app servers, and a master DB replicating to a slave DB, with block-volume snapshots stored in Cloud Files. The DALLAS cloud has load balancers, app servers, and a slave DB that also receives replicated data and takes its own snapshots.]
21#



Availability vs. Cost - Dial

[Figure: dial relating Cost to Availability, each running from Min to Max]




22#



Make sure workload is portable across clouds




23#



Automate and test everything

• Automate backups of your data
• Set up monitoring and alerts
• Run fire-drills! Plan and Practice your recovery procedures!
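
One common alerting rule is to fire only after several consecutive failed health checks, so that a single blip does not page anyone. A minimal sketch of that rule, with hypothetical names and probe results passed in as plain booleans rather than read from a real monitoring system:

```python
from typing import Callable, Iterable


def watch(probes: Iterable[bool], threshold: int, alert: Callable[[int], None]) -> None:
    """Call alert(index) once per failure streak, when it reaches `threshold` probes."""
    failures = 0
    for index, healthy in enumerate(probes):
        # A healthy probe resets the streak; an unhealthy one extends it.
        failures = 0 if healthy else failures + 1
        if failures == threshold:
            alert(index)


# Simulated probe results: a two-probe blip, then a sustained failure.
fired = []
watch([True, False, False, True, False, False, False], threshold=3, alert=fired.append)
print(fired)
```

The same function can be fed simulated failures during a fire drill to confirm that the alert path itself works.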




24#


Separate Management layer from Infrastructure

• Keep the keys to the car outside the car




25#



Automating HA and DR
• Use dynamic DNS for your database servers
   • Allow app servers to use a single FQDN.
   • Use a low TTL to allow rapid failover in the case of a change in master
     database
• Automatic connection of app servers to load balancing servers
   • App servers can connect to all load balancers automatically at launch
   • No manual intervention
   • No DNS modifications
• Automated promotion of slave to master
   • Process is automated
   • Decision to run process is manual
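
The database failover pattern above can be sketched as follows. This is an in-memory illustration only: the class names, FQDN, and addresses are hypothetical, and a real setup would call a dynamic DNS provider and the database's own promotion tooling instead of flipping fields.

```python
from dataclasses import dataclass


@dataclass
class DnsRecord:
    """Stand-in for a dynamic DNS entry (hypothetical, not a real DNS API)."""
    fqdn: str
    address: str
    ttl_seconds: int = 60  # low TTL so clients pick up a new master quickly


@dataclass
class Database:
    address: str
    role: str  # "master", "slave", or "demoted"


def promote_slave(record: DnsRecord, master: Database, slave: Database,
                  operator_confirmed: bool) -> bool:
    """Run the automated promotion steps, but only after a human decides to."""
    if not operator_confirmed:
        return False
    master.role = "demoted"
    slave.role = "master"
    # App servers keep using the same FQDN; only the address behind it changes.
    record.address = slave.address
    return True


# Hypothetical names and addresses, for illustration only.
record = DnsRecord("db-master.example.internal", "10.0.1.10")
old_master = Database("10.0.1.10", "master")
standby = Database("10.0.2.10", "slave")

refused = promote_slave(record, old_master, standby, operator_confirmed=False)
promoted = promote_slave(record, old_master, standby, operator_confirmed=True)
print(record.fqdn, "->", record.address)
```

Keeping the decision manual while the steps are scripted avoids an automatic promotion triggered by a false alarm, at the cost of waiting for an operator.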




Samsung SDS
                                                              Mr. Kirk Kim




Copyright © 2013 Samsung SDS Co., Ltd. All rights reserved
Hybrid Cloud Network Architecture


[Diagram. SPCS side: CF Router (Public ASN: XXXX), Firewall, IPS, VPN Gateway, and VMs on a private network (Private: 10.x.x.x/24, Public: *.*.*.0/24). Public Cloud side: Compute VMs (EIP: e.x.y.a, EIP: e.x.y.b), a VPC (10.x.x.x/24) with VMs, a Virtual GW, an Internet GW, and Object Storage.]

Traffic paths shown:
• Between SPCS and Public Cloud using public IP
• Between SPCS and Public Cloud using private IP
• Internet traffic to SPCS and Public Cloud using public IP
28#



How RightScale makes it possible

RightScale ServerTemplates™
• Reproducible: Predictable
  deployment
• Dynamic: Configuration from
  scripts at boot time
• Multi-cloud: Cloud agnostic
  and portable
• Modular: Role and behavior
  abstracted from cloud
  infrastructure

29#



How RightScale makes it possible
MultiCloud Images
• MultiCloud Images can be launched across regions and clouds
  without modification
  1. ServerTemplate contains a list of MultiCloud Images (MCIs)
  2. When the Server is created, a specific MCI is chosen.
  3. The appropriate RightImage is used at launch.

[Diagram: a list of MultiCloud Images (Cloud A, B, Image 1; Cloud A, C, Image 2; Cloud B, Image 1), the chosen MCI (Cloud A, B, Image 1), and the RightImage used at launch (Cloud B, Image 1). Caption: Stability across clouds]
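
The three-step selection can be pictured as a pair of lookups. The sketch below is a hypothetical data model for illustration only; it is not the RightScale API, and the image identifiers are made up.

```python
from dataclasses import dataclass


@dataclass
class MultiCloudImage:
    """Hypothetical model of an MCI: one name mapped to per-cloud image ids."""
    name: str
    images: dict  # cloud name -> cloud-specific image id


def pick_image(template_mcis: list, mci_name: str, cloud: str) -> str:
    """Choose the named MCI from the template, then the image for the target cloud."""
    mci = next((m for m in template_mcis if m.name == mci_name), None)
    if mci is None:
        raise LookupError(f"template has no MCI named {mci_name}")
    if cloud not in mci.images:
        raise LookupError(f"{mci_name} has no image for {cloud}")
    return mci.images[cloud]


# Step 1: the template carries a list of MCIs (made-up ids).
template = [
    MultiCloudImage("MCI 1", {"Cloud A": "image-1-a", "Cloud B": "image-1-b"}),
    MultiCloudImage("MCI 2", {"Cloud A": "image-2-a", "Cloud C": "image-2-c"}),
]

# Steps 2 and 3: a specific MCI is chosen, then the image matching the target cloud.
print(pick_image(template, "MCI 1", "Cloud B"))
```

Because the server definition refers only to the MCI name, the same definition can launch on any cloud that MCI covers.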
30#



Outage-Proofing Best Practices


• Place in >1 zone:
   • Load balancers
   • App servers
   • Databases
• Maintain capacity to absorb zone or region failures
• Replicate data across zones
• Back up across regions & clouds
• Monitor, alert, and automate operations to speed up failover
• Design stateless apps for resilience to reboot / relaunch
31#



Thank you!
Sign-up for a free account at: www.rightscale.com

Check out job postings at: www.rightscale.com/jobs


                      We are hiring!





 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 

Stacking up with OpenStack: Building for High Availability

  • 1. Stacking up with OpenStack: Building for High Availability Utpal Thakrar, Sr. Product Manager April 17, 2013
  • 2. My relationship with HA: 1975 Cloud Management #rightscale
  • 3. My relationship with HA: 1991
  • 4. My relationship with HA: 2001. "How many 9-s can your product do?"
  • 5. So what did they mean by 5-9s? Availability and allowed downtime each year: 99% = 3.65 days; 99.9% = 8.76 hours; 99.99% = 52.56 minutes; 99.999% = 5.26 minutes
  • 6. Stuff happens, are you prepared?
  • 7. Who dunnit?…
  • 8. And you see these …
  • 9. Is 100% Outage-proofing possible?
  • 10. Old School Fault-Tolerance: Build Two
  • 11. Golden Age of Cloud Computing • No Up-Front Capital Expense • Low Cost • Pay Only for What You Use • Self-Service Infrastructure (Deploy) • Easily Scale Up and Down • Improve Agility & Time-to-Market
  • 12. Golden Age for Fault-Tolerance • No Up-Front HA Capital Expense • Low Cost Backups • Pay for DR Only When You Use it • Self-Service DR Infrastructure (Deploy) • Easily Deliver Fault-Tolerant Applications • Improve Agility & Time-to-Recovery
  • 13. Yeah, but … What about my private cloud? Applications deployed in private clouds have to worry about: • Private Cloud Infrastructure being HA • Application architecture HA / DR • With Public Clouds: well, you get what your provider gives you
  • 14. Private Cloud Infrastructure HA. Several single points of failure in an OpenStack deployment: • OpenStack API services • MySQL • RabbitMQ. Solved in various ways: • Pacemaker cluster management • Keepalived (e.g., RAX Private Cloud) • MySQL (Galera), RabbitMQ (active-active mirrored queues). Eliminate SPoFs as best as you can.
  • 15. What about my app? Design for failure: • If your application relies on Cloud infrastructure SLA for its HA needs, you are STUCK with that vendor / infrastructure • Need to balance cost and complexity against risk tolerance • Design the application so that you: build for server failure; build for zone failure; build for cloud failure; keep the management layer separate from infrastructure
  • 16. Build for Server Failure • Set up auto-scaling • Set up database mirroring, master/slave configuration • Use static public IPs • Use Dynamic DNS for private IPs
  • 17. Build for Zone Failure [Diagram labels: DNS with static public IPs 172.168.7.31 and 172.168.8.62; Zone 1 and Zone 2; load balancers; autoscaling app servers; master DB replicating to slave DB; block snapshots to object store] • Where possible, use a NoSQL DB like Cassandra or MongoDB • Snapshot the data volume for backups so the database can be readily recovered within the region • Place slave databases in one or more zones for failover • A creative deployment model would be to make your private cloud an "AZ" by placing it in close physical proximity to a public cloud provider
  • 18. Build for Cloud Failure (Cold DR): staged server configuration and generally no staged data. Cost: $ • Not recommended if rapid recovery is required • Slow to replicate data to other cloud and bring database online [Diagram labels: DNS 172.168.7.31; Private and DALLAS sites; load balancers; app servers; master DB; slave DBs; replicate; block snapshots; Cloud Files]
  • 19. Build for Cloud Failure (Warm DR): staged server configuration, pre-staged data and running slave database server. Cost: $$ • Generally recommended DR solution • Minimal additional cost and allows fairly rapid recovery [Diagram labels: DNS 172.168.7.31; Private and DALLAS sites; load balancers; app servers; master DB; slave DBs; replicate; block snapshots; Cloud Files]
  • 20. Build for Cloud Failure (Hot DR): parallel deployment with all servers running but all traffic going to primary. Cost: $$$ • Not recommended • Very high additional cost to allow rapid recovery [Diagram labels: DNS 172.168.7.31; Private and DALLAS sites; load balancers; app servers; master DB; slave DBs; replicate; block snapshots; Cloud Files]
  • 21. Availability vs. Cost - Dial [Diagram: dials labeled Cost and Availability, each ranging from Min to Max]
  • 22. Make sure workload is portable across clouds
  • 23. Automate and test everything • Automate backups of your data • Set up monitoring and alerts • Run fire-drills! Plan and practice your recovery procedures!
  • 24. Separate Management layer from Infrastructure • Keep the keys to the car outside the car
  • 25. Automating HA and DR • Use dynamic DNS for your database servers: allow app servers to use a single FQDN; use a low TTL to allow rapid failover in the case of a change in master database • Automatic connection of app servers to load balancing servers: app servers can connect to all load balancers automatically at launch; no manual intervention; no DNS modifications • Automated promotion of slave to master: the process is automated; the decision to run the process is manual
  • 26. Samsung SDS, Mr. Kirk Kim. Copyright © 2013 Samsung SDS Co., Ltd. All rights reserved
  • 27. Hybrid Cloud Network Architecture [Diagram labels: Internet traffic; CF Router; Public ASN: XXXX; Firewall; IPS; VPN Gateway; Compute; EIP: e.x.y.a and e.x.y.b; VMs; Private Network; VPC; Virtual GW; Private: 10.x.x.x/24; Public: *.*.*.0/24; Internet GW; Object Storage; SPCS; Public Cloud] Legend: • Between SPCS and Public Cloud using public IP • Between SPCS and Public Cloud using private IP • Internet traffic to SPCS and Public Cloud using public IP
  • 28. How RightScale makes it possible: RightScale ServerTemplates™ • Reproducible: Predictable deployment • Dynamic: Configuration from scripts at boot time • Multi-cloud: Cloud agnostic and portable • Modular: Role and behavior abstracted from cloud infrastructure
  • 29. How RightScale makes it possible: MultiCloud Images • MultiCloud Images can be launched across regions and clouds without modification • Step 1: ServerTemplate contains a list of MultiCloud Images (MCIs) • Step 2: When the Server is created, a specific MCI is chosen • Step 3: The appropriate RightImage is used at launch [Diagram: MultiCloud Images mapped to images on Cloud A, Cloud B, and Cloud C, resolved to a RightImage] Stability across clouds
  • 30. Outage-Proofing Best Practices • Place in >1 zone: load balancers; app servers; databases. Maintain capacity to absorb zone or region failures • Replicate data across zones: backup across regions & clouds; monitoring, alert, and automate operations to speed up failover • Replicate data across zones: design stateless apps for resilience to reboot / relaunch
  • 31. Thank you! Sign up for a free account at: www.rightscale.com Check out job postings at: www.rightscale.com/jobs We are hiring!
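
The downtime figures on slide 5 follow directly from the availability percentage. A minimal sketch of that arithmetic, assuming a 365-day year (which is what reproduces the slide's numbers); the function and constant names are illustrative and not part of the deck:

```python
# Allowed yearly downtime for a given availability target.
# Assumption: a 365-day (non-leap) year, which matches the slide's figures.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Return the seconds of downtime per year permitted at this availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


def describe(seconds: float) -> str:
    """Render a duration in the largest convenient unit (days, hours, minutes)."""
    if seconds >= 86400:
        return f"{seconds / 86400:.2f} days"
    if seconds >= 3600:
        return f"{seconds / 3600:.2f} hours"
    return f"{seconds / 60:.2f} minutes"


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {describe(allowed_downtime_seconds(target))}")
```

Each extra 9 divides the downtime budget by ten, which is why the jump from four to five 9s leaves only about five minutes a year for every failure and maintenance window combined.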

Editor's notes

  1. Good afternoon, folks. I hope you are here for the high availability discussion. In case of an emergency, we have specially arranged a highly available pair of exits to your left and behind you. So, let me tell you a bit about myself and what HA means to me. I am a product manager at RightScale.
  2. My relationship with HA goes back all the way to my kindergarten years, growing up in India. Going to my first big kindergarten exam, I recall worrying about having more than one sharpened pencil in my pencil box, ready to go. And yes, kindergarteners have exams in India, but that's an entirely different discussion.
  3. Fast forward to my college days, taking my first big flight on a 747 to California. Yes, you guessed it: I worried about the plane having enough engines so that if one of them failed, I wouldn't become fish food in the Pacific Ocean.
  4. Fast forward a few more years to my telecommunications days, visiting KDDI and NTT DoCoMo in Japan for discussions on our messaging product. They pretty much immediately got to the topic of "How many 9s does your product do?" Anything less than 5-9s would not have been an acceptable answer in the heavily regulated Japanese telecommunications market.
  5. Quick definition of how the "9s" of availability translate to allowed downtime each year.
  6. Leap forward to 2012: the cloud era is in full swing. Behemoth cloud providers are stamping out VMs like Oreo cookies, while preaching the mantra "everything fails all the time". And rightfully so: in 2012, we saw 27 sizable outages in public, private, hosting and SaaS providers. Infographic, not restricted to cloud computing only: 7 major cloud outages in 2012. The average company has 1 major and 3 minor DC outages per year. $5k per minute of downtime (average cost). They are starting to become more and more public as more people are getting on the cloud. - May of 2010. - First big one that happened was in - April 2011: lot of people, got a lot of press.
  7. Among the top five causes of outages were power loss, natural disasters, software bugs that cascaded, and operator errors. Even though large-scale outages are rare, they do happen and will continue to happen in the future.
  8. In the aftermath of outages, you see these. Outages are expensive: there is nothing more frustrating to a modern-day consumer than going to a website and seeing that it's down. Every minute of downtime affects your revenue and your brand reputation. Computer Associates did a study last year finding that the cost of outages is about $26 billion a year. Cost of
  9. We are in the golden age of cloud computing.
  10. At the end of the day, you are responsible for the HA of your application. Cloud infrastructure provides tools. Relying on cloud infrastructure for HA is a recipe for trouble, as this locks you into that cloud infrastructure. You need portability, so that when you move your application to another cloud, it stands on its own merit. Weigh the complexity of HA against the risk, as you would with auto and home insurance. The cost of HA goes up exponentially as you reduce your tolerance for downtime (recovery time objective) as well as your tolerance for data loss (recovery point objective).
  11. This is what we generally recommend when someone comes to us and says "I want HA": a three-tiered application; RR DNS; load balancers; an array of application servers; master-slave databases; at least one of each component in each AZ. Place the slave database in a different zone, so that if one of the zones were to go down, you will not have an outage. Granted, there will be some performance degradation.
  12. During emergencies, time is precious, so make sure it works.
  13. If both go down, you have nowhere to go. If the disaster hits management, you still have the app; if the disaster hits the app, you can execute on DR scenarios.
  14. Which parts you should automate and which parts you shouldn't. We always recommend using dynamic DNS for your DB servers. This allows app servers to use a single FQDN (e.g., mymaster.mydomain.com) that can be resolved by the dynamic DNS. So in case of a failover, dynamic DNS gets automatically updated and the servers will discover the new DB once the TTL expires. Use a low TTL. We recommend automating the process of connecting app servers to LBs, so when a new app server fires up, it automatically registers itself with the load balancer without manual intervention. The process is automated; the decision to run the process is manual. Once you have pushed that button, there is no going back, so make sure you are certain before you fail over. The promotion happened in case where the master wasn't really down but it resulted
  15. I am representing RightScale today, so a little bit on how RightScale can help. ServerTemplates allow you to pre-configure servers by starting from a base image and adding scripts that run during the boot, operational and shutdown phases of a server instance. The key benefit of a ServerTemplate is that it helps you create an easily reproducible server setup, and this can be done across multiple clouds. Through the server configuration mechanism that is built into ServerTemplates, the servers have the ability to automatically join load balancer pools, autoscale across zones, etc.
  16. I am representing RightScale today, so a little bit on how RightScale can help. A ServerTemplate contains a list of MultiCloud Images. When a server is created, Quickly, efficiently and repeatably
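
The failover flow in note 14 (single FQDN, low TTL, automated promotion behind a manual decision) can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: `DnsRecord`, `CachingResolver`, `promote_slave`, the IP addresses, and the 60-second TTL are all hypothetical and do not represent any RightScale or DNS-provider API; only the FQDN `mymaster.mydomain.com` comes from the notes.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DnsRecord:
    """A hypothetical dynamic DNS entry for the master database."""
    fqdn: str
    address: str
    ttl_seconds: int


@dataclass
class CachingResolver:
    """Stands in for an app server's resolver, which caches answers until the TTL expires."""
    zone: Dict[str, DnsRecord]
    _cache: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def resolve(self, fqdn: str, now: float) -> str:
        cached = self._cache.get(fqdn)
        if cached is not None and now < cached[1]:
            return cached[0]  # answer is still within its TTL
        record = self.zone[fqdn]
        self._cache[fqdn] = (record.address, now + record.ttl_seconds)
        return record.address


def promote_slave(zone: Dict[str, DnsRecord], fqdn: str,
                  slave_address: str, operator_confirmed: bool) -> None:
    """Automated promotion step; the decision to run it stays with a human."""
    if not operator_confirmed:
        raise PermissionError("promotion requires explicit operator confirmation")
    current = zone[fqdn]
    # Repoint the single FQDN at the promoted slave, keeping the low TTL.
    zone[fqdn] = DnsRecord(fqdn, slave_address, current.ttl_seconds)


if __name__ == "__main__":
    name = "mymaster.mydomain.com"
    zone = {name: DnsRecord(name, "10.0.0.1", ttl_seconds=60)}
    app_server = CachingResolver(zone)
    print(app_server.resolve(name, now=0))    # original master
    promote_slave(zone, name, "10.0.0.2", operator_confirmed=True)
    print(app_server.resolve(name, now=30))   # cached answer, TTL not yet expired
    print(app_server.resolve(name, now=60))   # TTL expired, new master discovered
```

The TTL bounds how long app servers keep talking to the old master after a promotion, which is why the notes recommend keeping it low. The `operator_confirmed` flag mirrors the point that the process is automated but the decision to run it is manual.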