SlideShare une entreprise Scribd logo
1  sur  11
Best Practices for
                Surviving Outages
                Designing and implementing a High Availability
                and Disaster Recovery strategy



Sal Cardello,           Matt Dolian,           Avroham Katz,
Director of             System Engineer        System Engineer
Pro Services
Disaster Recovery




       Photo credit: naturaldisasterss.com/wp-content/uploads/2011/12/Natural-Disaster-Images.jpg   2
Tiers of Disaster Recovery

            0 - No off-site data
            1 - Data backup with no hot site
            2 - Data backup with hot site
            3 - Electronic vaulting
            4 - Point-in-time copies
            5 - Transaction integrity
            6 - Zero or near-Zero data loss
            7 - Highly automated, business
                integrated solution
        Citation: http://en.wikipedia.org/wiki/Seven_tiers_of_disaster_recovery   3
Definition: High Availability



    “Design approach & associated service
    implementation that ensures a pre-
    arranged level of operational
    performance will be met during a
    contractual measurement period”



             Citation: ttp://en.wikipedia.org/wiki/High_availability   4
High Availability Architecture




                                 5
Why implement HA?




                    6
Best Practices for High Availability

        Environment                                    Validate
         Analysis                                   Synchronization


        Geographic
                                                    Escalation Plan
         Mirroring


        Database
        Replication                                      Test


        Store Assets
                                                        Launch
        Replication

               Photo Credit: http://bit.ly/z9OEwG                     7
Application Considerations

• Environment Specific Configurations

• Asset Hosting

• Page Caching

• Other Data Stores

• Background Processing

• Cron Jobs
              Photo credit: http://www.flickr.com/photos/dseneste/5912382808/   8
Failover Process at Engine Yard

Manual, customer owned decision
1. Client contacted per
   terms of SLA
2. Engine Yard syncs
   database and performs
   manual failover
3. Redundant database
   promoted to master
4. DNS is updated
5. Replication to former
   master is re-established


                                  9
Questions?


             10
Get in touch

Contact us:
Sal Cardello, Director of Pro Services
proservices@engineyard.com

Learn more:
http://www.engineyard.com/services

                                     11

Contenu connexe

Plus de Engine Yard

Introduction to Ruby
Introduction to RubyIntroduction to Ruby
Introduction to RubyEngine Yard
 
JRuby: Enhancing Java Developers Lives
JRuby: Enhancing Java Developers LivesJRuby: Enhancing Java Developers Lives
JRuby: Enhancing Java Developers LivesEngine Yard
 
High Performance Ruby: Evented vs. Threaded
High Performance Ruby: Evented vs. ThreadedHigh Performance Ruby: Evented vs. Threaded
High Performance Ruby: Evented vs. ThreadedEngine Yard
 
Release Early & Release Often: Reducing Deployment Friction
Release Early & Release Often: Reducing Deployment FrictionRelease Early & Release Often: Reducing Deployment Friction
Release Early & Release Often: Reducing Deployment FrictionEngine Yard
 
JRuby Jam Session
JRuby Jam Session JRuby Jam Session
JRuby Jam Session Engine Yard
 
Rubinius and Ruby | A Love Story
Rubinius and Ruby | A Love Story Rubinius and Ruby | A Love Story
Rubinius and Ruby | A Love Story Engine Yard
 
Rails Antipatterns | Open Session with Chad Pytel
Rails Antipatterns | Open Session with Chad Pytel Rails Antipatterns | Open Session with Chad Pytel
Rails Antipatterns | Open Session with Chad Pytel Engine Yard
 
JRuby: Apples and Oranges
JRuby: Apples and OrangesJRuby: Apples and Oranges
JRuby: Apples and OrangesEngine Yard
 
Developing a Language
Developing a LanguageDeveloping a Language
Developing a LanguageEngine Yard
 
Debugging Ruby Systems
Debugging Ruby SystemsDebugging Ruby Systems
Debugging Ruby SystemsEngine Yard
 
Everything Rubinius
Everything RubiniusEverything Rubinius
Everything RubiniusEngine Yard
 
Rails Hosting and the Woes
Rails Hosting and the WoesRails Hosting and the Woes
Rails Hosting and the WoesEngine Yard
 

Plus de Engine Yard (13)

Introduction to Ruby
Introduction to RubyIntroduction to Ruby
Introduction to Ruby
 
JRuby: Enhancing Java Developers Lives
JRuby: Enhancing Java Developers LivesJRuby: Enhancing Java Developers Lives
JRuby: Enhancing Java Developers Lives
 
High Performance Ruby: Evented vs. Threaded
High Performance Ruby: Evented vs. ThreadedHigh Performance Ruby: Evented vs. Threaded
High Performance Ruby: Evented vs. Threaded
 
Release Early & Release Often: Reducing Deployment Friction
Release Early & Release Often: Reducing Deployment FrictionRelease Early & Release Often: Reducing Deployment Friction
Release Early & Release Often: Reducing Deployment Friction
 
JRuby Jam Session
JRuby Jam Session JRuby Jam Session
JRuby Jam Session
 
Rubinius and Ruby | A Love Story
Rubinius and Ruby | A Love Story Rubinius and Ruby | A Love Story
Rubinius and Ruby | A Love Story
 
Rails Antipatterns | Open Session with Chad Pytel
Rails Antipatterns | Open Session with Chad Pytel Rails Antipatterns | Open Session with Chad Pytel
Rails Antipatterns | Open Session with Chad Pytel
 
JRuby: Apples and Oranges
JRuby: Apples and OrangesJRuby: Apples and Oranges
JRuby: Apples and Oranges
 
Developing a Language
Developing a LanguageDeveloping a Language
Developing a Language
 
Debugging Ruby Systems
Debugging Ruby SystemsDebugging Ruby Systems
Debugging Ruby Systems
 
Geemus
GeemusGeemus
Geemus
 
Everything Rubinius
Everything RubiniusEverything Rubinius
Everything Rubinius
 
Rails Hosting and the Woes
Rails Hosting and the WoesRails Hosting and the Woes
Rails Hosting and the Woes
 

Dernier

HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 

Dernier (20)

HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

Best Practices for Surviving Outages

  • 1. Best Practices for Surviving Outages Designing and implementing a High Availability and Disaster Recovery strategy Sal Cardello, Matt Dolian, Avroham Katz, Director of System Engineer System Engineer Pro Services
  • 2. Disaster Recovery Photo credit: naturaldisasterss.com/wp-content/uploads/2011/12/Natural-Disaster-Images.jpg 2
  • 3. Tiers of Disaster Recovery 0 - No off-site data 1 - Data backup with no hot site 2 - Data backup with hot site 3 - Electronic vaulting 4 - Point-in-time copies 5 - Transaction integrity 6 - Zero or near-Zero data loss 7 - Highly automated, business integrated solution Citation: http://en.wikipedia.org/wiki/Seven_tiers_of_disaster_recovery 3
  • 4. Definition: High Availability “Design approach & associated service implementation that ensures a pre- arranged level of operational performance will be met during a contractual measurement period” Citation: ttp://en.wikipedia.org/wiki/High_availability 4
  • 7. Best Practices for High Availability Environment Validate Analysis Synchronization Geographic Escalation Plan Mirroring Database Replication Test Store Assets Launch Replication Photo Credit: http://bit.ly/z9OEwG 7
  • 8. Application Considerations • Environment Specific Configurations • Asset Hosting • Page Caching • Other Data Stores • Background Processing • Cron Jobs Photo credit: http://www.flickr.com/photos/dseneste/5912382808/ 8
  • 9. Failover Process at Engine Yard Manual, customer owned decision 1. Client contacted per terms of SLA 2. Engine Yard syncs database and performs manual failover 3. Redundant database promoted to master 4. DNS is updated 5. Replication to former master is re-established 9
  • 11. Get in touch Contact us: Sal Cardello, Director of Pro Services proservices@engineyard.com Learn more: http://www.engineyard.com/services 11

Notes de l'éditeur

  1. Introduction roles and titlesMelissaSalAvrohomMatt
  2. What is Disaster Recovery?The process, policies and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organization after a natural or human-induceddisaster.
  3. Seven Tiers to Disaster Recovery0: No off-site data – Possibly no recovery 1: Data backup with no hot site 2: Data backup with a hot site 3: Electronic vaulting 4: Point-in-time copies 5: Transaction integrity 6: Zero or near-Zero data loss 7: Highly automated, business integrated solution
  4. High Availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.Sal to explain, Matt to cover diagram.
  5. Avrohom to talk about complexityWhy should we implement a H/A environment.Revenue lossMore consistent up timeHigher client satisfactionBetter level of protection for critical systemsInsuranceThings to know up front about implementing a H/A environmentCostAdditional Complexity
  6. AvrohomImplementation for High Availability systemNeeds Assessment H/A is implemented using geo-redundant systemsDatabases are kept in sync using replicationAssets are ideally stored on a storage system such as Amazon S3 but can be kept in sync using rsyncFile system synchronized between locationsCode is deployed to both systemsStack changes applied to both systemsCreate escalation flow chartBring up Secondary Site.One week test cycleFailover testLive
  7. EnvConfigs: Stored as template in Chef Stored on filesystem and symlinked on deploy with CapistranoAsset hosting: Assets must be synced if stored locally Adds complexity and strain on resourcesPage caching: Sync page cache to prevent higher response time as cache warmsOther data stores: Dump and sync data at select intervals and during failoverBackground: Wait for jobs to finish when failing over consider where jobs are storedCron jobs:Use a gem such as whenever to automate cron jobs
  8. Decision to failover is mutualNo automatic failoverDBA is brought in to perform manual failoverClient uptime needs are designated in client flow chartDBA promotes redundant database to masterDNS is updatedRe-establish replication to former master once back onlineDBA is brought in to check the state of the database and perform manual failoverClient uptime needs are designated in client flow chartAfter the decision to failover is made, a DBA promotes the redundant database to masterAfter a quick test of the redundant system, DNS is updatedLow TTL should be setDNS load balancing such as DynECT Managed DNS can be used to minimize downtime during IP switchRe-establish: When the former master environment is back online, configure the former master database as a read only slave