Site disruptions happen, often when you least expect them. When your business depends on application uptime or access to critical data, a strategy for high availability (HA) and disaster recovery (DR) is essential. Carefully architecting and implementing an HA and DR strategy helps you minimize risk, strengthen fault tolerance, and rapidly redeploy your application and data after a disruption.
This presentation walks through an overview of HA and DR, and offers some best practices from the Engine Yard team.
The full on-demand webcast can be viewed here: http://pages.engineyard.com/BestPracticesforSurvivingOutagesWebcast.html
Best Practices for Surviving Outages
1. Best Practices for Surviving Outages
Designing and implementing a High Availability and Disaster Recovery strategy
Sal Cardello, Director of Pro Services
Matt Dolian, System Engineer
Avroham Katz, System Engineer
3. Tiers of Disaster Recovery
0 - No off-site data
1 - Data backup with no hot site
2 - Data backup with hot site
3 - Electronic vaulting
4 - Point-in-time copies
5 - Transaction integrity
6 - Zero or near-Zero data loss
7 - Highly automated, business integrated solution
Citation: http://en.wikipedia.org/wiki/Seven_tiers_of_disaster_recovery
4. Definition: High Availability
“Design approach & associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period”
Citation: http://en.wikipedia.org/wiki/High_availability
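To make the "prearranged level of operational performance" concrete, the arithmetic below turns an availability target into a downtime budget for a measurement period. It is a minimal sketch: the function name, the 30-day period, and the three targets are illustrative assumptions, not figures from the presentation.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float) -> float:
    """Return the downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_pct) / 100


# Hypothetical targets over a 30-day measurement period.
for target in (99.0, 99.9, 99.99):
    budget = allowed_downtime_minutes(target, period_days=30)
    print(f"{target}% over 30 days allows {budget:.1f} minutes of downtime")
```

Each additional "nine" cuts the budget by a factor of ten, which is one reason the cost and complexity of an HA environment rise with the target.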
7. Best Practices for High Availability
Environment Analysis
Geographic Mirroring
Database Replication
Store Assets Replication
Validate Synchronization
Escalation Plan
Test
Launch
Photo Credit: http://bit.ly/z9OEwG
8. Application Considerations
• Environment Specific Configurations
• Asset Hosting
• Page Caching
• Other Data Stores
• Background Processing
• Cron Jobs
Photo credit: http://www.flickr.com/photos/dseneste/5912382808/
9. Failover Process at Engine Yard
Manual, customer-owned decision
1. Client contacted per terms of SLA
2. Engine Yard syncs database and performs manual failover
3. Redundant database promoted to master
4. DNS is updated
5. Replication to former master is re-established
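The five steps above can be modeled as an ordered runbook that refuses to skip ahead. This is only an illustrative sketch: the `FailoverRunbook` class is hypothetical, the step labels paraphrase the slide, and the rule that everything after the initial contact requires customer approval is an assumption drawn from "customer-owned decision", not a description of Engine Yard tooling.

```python
from dataclasses import dataclass, field
from typing import List

# Step labels paraphrase the slide; the class is a hypothetical illustration.
FAILOVER_STEPS = (
    "contact client per SLA",
    "sync database and begin manual failover",
    "promote redundant database to master",
    "update DNS",
    "re-establish replication to former master",
)


@dataclass
class FailoverRunbook:
    customer_approved: bool = False
    completed: List[str] = field(default_factory=list)

    def approve(self) -> None:
        """Record the customer's decision to fail over."""
        self.customer_approved = True

    @property
    def done(self) -> bool:
        return len(self.completed) == len(FAILOVER_STEPS)

    def complete(self, step: str) -> None:
        """Mark a step finished, enforcing order and the approval gate."""
        if self.done:
            raise RuntimeError("all failover steps are already complete")
        expected = FAILOVER_STEPS[len(self.completed)]
        if step != expected:
            raise RuntimeError(f"out of order: expected {expected!r}, got {step!r}")
        # Assumption: every step after the initial contact needs approval.
        if self.completed and not self.customer_approved:
            raise RuntimeError("customer has not approved the failover")
        self.completed.append(step)


runbook = FailoverRunbook()
runbook.complete(FAILOVER_STEPS[0])
runbook.approve()
for step in FAILOVER_STEPS[1:]:
    runbook.complete(step)
print(runbook.done)
```

Encoding the order explicitly keeps a manual process from, for example, updating DNS before the redundant database has been promoted.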
11. Get in touch
Contact us:
Sal Cardello, Director of Pro Services
proservices@engineyard.com
Learn more:
http://www.engineyard.com/services
Editor's notes
Introduction, roles and titles: Melissa, Sal, Avrohom, Matt
What is Disaster Recovery?
The process, policies and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster.
Seven Tiers to Disaster Recovery
0: No off-site data – Possibly no recovery
1: Data backup with no hot site
2: Data backup with a hot site
3: Electronic vaulting
4: Point-in-time copies
5: Transaction integrity
6: Zero or near-Zero data loss
7: Highly automated, business integrated solution
High Availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.
Sal to explain, Matt to cover diagram.
Avrohom to talk about complexity
Why should we implement an H/A environment?
• Avoid revenue loss
• More consistent uptime
• Higher client satisfaction
• Better level of protection for critical systems
• Insurance
Things to know up front about implementing an H/A environment
• Cost
• Additional complexity
Avrohom: Implementation for a High Availability system
• Needs assessment
• H/A is implemented using geo-redundant systems
• Databases are kept in sync using replication
• Assets are ideally stored on a storage system such as Amazon S3, but can be kept in sync using rsync
• File system synchronized between locations
• Code is deployed to both systems
• Stack changes applied to both systems
• Create escalation flow chart
• Bring up secondary site
• One-week test cycle
• Failover test
• Live
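One way to validate that assets synchronized between locations really match is to compare per-file checksums. The sketch below assumes both asset trees are reachable as local paths (for example, a mounted copy of the secondary site); `manifest` and `out_of_sync` are hypothetical helper names, and this is not a description of how Engine Yard validates synchronization.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def out_of_sync(primary: Path, secondary: Path) -> List[str]:
    """List files that are missing on either side or whose contents differ."""
    a, b = manifest(primary), manifest(secondary)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list from `out_of_sync(Path("/data/assets"), Path("/mnt/secondary/assets"))` (both paths hypothetical) would indicate the two trees hold identical files. Reading whole files into memory keeps the sketch short; large assets would call for chunked hashing.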
Environment configs:
• Stored as a template in Chef
• Stored on the filesystem and symlinked on deploy with Capistrano
Asset hosting:
• Assets must be synced if stored locally
• Adds complexity and strain on resources
Page caching: Sync the page cache to prevent higher response times as the cache warms
Other data stores: Dump and sync data at select intervals and during failover
Background processing: Wait for jobs to finish when failing over; consider where jobs are stored
Cron jobs: Use a gem such as whenever to automate cron jobs
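The background-processing note ("wait for jobs to finish when failing over") can be sketched as a polling loop with a timeout. Everything here is an assumption made for illustration: `in_flight` stands in for whatever call reports running jobs in your queue system, and the default timeout and poll interval are arbitrary.

```python
import time
from typing import Callable


def wait_for_jobs(
    in_flight: Callable[[], int],
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no background jobs are running.

    Returns True once the queue has drained, or False if the timeout
    expires first, leaving the operator to decide how to proceed.
    """
    deadline = clock() + timeout_s
    while True:
        if in_flight() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

The clock and sleep functions are parameters so the loop can be exercised without real waiting. A False result does not mean the failover must stop; it only flags that jobs may be interrupted, which ties back to knowing where jobs are stored.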
• Decision to fail over is mutual
• No automatic failover
• DBA is brought in to check the state of the database and perform manual failover
• Client uptime needs are designated in client flow chart
• After the decision to fail over is made, a DBA promotes the redundant database to master
• After a quick test of the redundant system, DNS is updated
• Low TTL should be set
• DNS load balancing such as DynECT Managed DNS can be used to minimize downtime during IP switch
• Re-establish: When the former master environment is back online, configure the former master database as a read-only slave
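The reason for a low TTL can be shown with simple arithmetic: a resolver that cached the old address just before the DNS update may keep serving it for up to one full TTL. The sketch below adds that window to the promotion and test time. All durations are made-up inputs, the function name is hypothetical, and the model assumes resolvers honor the TTL.

```python
def worst_case_cutover_seconds(promote_s: float, smoke_test_s: float, dns_ttl_s: float) -> float:
    """Upper bound on time until all clients reach the new site.

    Assumes the DNS record is updated right after the smoke test and that
    resolvers honor the TTL, so a cached old address lives at most dns_ttl_s.
    """
    return promote_s + smoke_test_s + dns_ttl_s


# Hypothetical timings: 2 minutes to promote, 3 minutes to test.
for ttl in (3600, 60):
    total = worst_case_cutover_seconds(promote_s=120, smoke_test_s=180, dns_ttl_s=ttl)
    print(f"TTL {ttl:>4}s -> worst case {total / 60:.1f} minutes")
```

With these inputs the one-hour TTL dominates the total (65.0 minutes against 6.0 minutes for a 60-second TTL), which is why the TTL should be lowered before a failover is ever needed.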