Disaster Recovery and Failover using SRM

Russ Pedneault, Technology Services Manager
Anil C. Sedha, Midrange Services Supervisor
Kevin Seniuk, Senior Technical Specialist

VMware Forum Winnipeg
May 15, 2012

Company Overview
- Largest publisher by circulation of paid English-language daily newspapers in
  Canada, representing some of the country's oldest and best-known media brands.

- Reaches millions of Canadians every week.

- Engages readers and offers advertisers and marketers integrated solutions to
  effectively reach target audiences through a variety of print, online, digital, and
  mobile platforms.

- Postmedia Network is a mobile web leader: 120 daily news media mobile sites,
  80+ vertical mobile web sites, 1M monthly visitors, 9M monthly page views.




IT Overview
- Virtualization platform: VMware vSphere 4.1 and 5.0, SRM v4.1

- 500+ virtual servers, 250 physical servers, 3 Virtual Center servers, 4000+
  desktops, 3 datacenters and 13 smaller sites

- Server hardware: HP, Cisco, Sun, and Alpha/VMS servers

- Storage: EMC CLARiiON and VNX arrays, HP EVA arrays, Sun Storage, Data Domain VTL

- Operating systems: VMware ESXi, Windows 2003/2008, HP-UX, VMS, Red Hat
  Enterprise Linux, Solaris, SUSE Linux, Apple

- Messaging: Exchange 2007, MS Office Communicator, Cisco Unified Messaging

- Databases: Oracle, MS SQL, Sybase, MySQL




Virtualization/SRM Story
Background
IT could not recover data quickly enough: Postmedia's recovery plans were time-consuming and
involved special recovery procedures requiring expert knowledge.

Challenges
The IT environment ran mostly on old physical servers, with clustering/mirroring in place.

The Inevitable Happens
- An entire datacenter went down due to a power outage, despite power protection.
- After power was restored, another outage had to be taken to perform repairs.
- Enhanced recovery procedures were not in place at that time.

Resolution
- Deploy a virtualization-first strategy
- Implement SRM with the existing storage replication technology
- Upgrade SRM to run with newer storage replication technology

Turnaround
SRM failover brought relief and a new confidence within the organization that data can be recovered
in a very short time, with rollback capabilities.




Background
Key issues:

- Recovery timeline was unacceptable for some revenue-generating applications

- Multiple resources from Application and Infrastructure teams had to be involved

- Operational sequence for recovery was manual, so mistakes could easily happen

- Changes in application environments had to be tracked manually

- Managing failover/recovery of remote sites




Challenges

- Physical server infrastructure did not offer the flexibility for easy failover to a
  secondary site.

- Reliance on aging hardware: unsure whether a server would come up after a restart.

- Many manual steps were needed to make the remote site operational.

- Specialists were required to bring up the storage environment at the remote site before
  the server environment could be brought up.

- Clustered environments presented additional challenges: Microsoft Cluster,
  HP-UX Cluster, Sun Cluster.

- Pushback from application teams: "don't touch the server running our applications".




Challenges
- A large number of application servers were running on physical
  hardware.

- A great deal of effort was needed by both Application and
  Infrastructure teams.

- Outages of critical applications lasting longer than the expected
  timeframe would mean revenue loss.

- IT had never done a datacenter recovery or failover in the past.




Reality Bites (Power Outage)

- There was an unexpected power outage at one of our datacenters and all
  servers went offline for approximately an hour.

- Server recovery after the power outage took further effort and quite a few
  hours.

- The initial event left Postmedia IT wondering what to do, since a recovery
  would have taken many hours.

- Once power was restored, the service provider needed a planned failover so it
  could repair the power infrastructure, an outage of around 8 hours.

- After negotiation, Postmedia was given 5 days (the outage had earlier been
  scheduled for the next day) to perform the planned failover before the outage.


What SRM did for us
- Created a complete recovery process in a simple, centralized recovery plan,
  and automated recovery steps.

- SRM allowed failover of the Exchange 2007 environment in minutes.

- Other application servers failed over in minutes as well.

- Half of the datacenter move was accomplished quickly and within the
  expected timeframe.

- The success of SRM and virtualization gave the impetus to create further cost
  savings by virtualizing and retiring older servers.




What SRM did for us (Contd)
- Postmedia IT chose the approach of showcasing the benefits of
  virtualization instead of forcing virtualization on the business.

- Highlighted the capabilities of SRM failover of the Exchange 2007
  environment in minutes.

- Recovery is greatly simplified: even a non-IT individual within the
  organization, with authorization and awareness of the documented
  login procedures, can press the recovery button in case of a disaster.




Lessons Learned

- SRM recovery plans should be created based on which
  application consistency groups need to be failed over together.

- Review your common outage windows based on applications.

- Ensure you have efficient storage replication mechanisms in place
  that integrate with SRM.

- Verify your recovery plans in advance by running a test (this does
  not perform an actual failover).
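
The first lesson, one recovery plan per application consistency group, can be sketched in a few lines. This is only an illustration under stated assumptions: the server names, group names, and the `build_recovery_plans` helper are hypothetical and do not come from the presentation or from any SRM API.

```python
from collections import defaultdict

# Hypothetical inventory of (server name, consistency group) pairs.
# None of these names appear in the presentation; they are placeholders.
SERVERS = [
    ("mail-01", "exchange"),
    ("mail-02", "exchange"),
    ("web-01", "web"),
    ("web-db-01", "web"),
    ("ctx-01", "citrix"),
]


def build_recovery_plans(servers):
    """Group servers so that each consistency group becomes one recovery plan."""
    plans = defaultdict(list)
    for name, group in servers:
        plans[group].append(name)
    # Sort groups and members so the plan listing is stable and easy to review.
    return {group: sorted(names) for group, names in sorted(plans.items())}


plans = build_recovery_plans(SERVERS)
for group, names in plans.items():
    print(f"plan '{group}': {', '.join(names)}")
```

Keeping every server of a group in the same plan means the servers that share replicated storage fail over together rather than piecemeal.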




Planned Failover - Now
- With newer replication mechanisms available in the industry, it is
  easier and quicker to perform failover using SRM.

- Postmedia moved away from traditional software-based replication to
  hardware appliance-based replication.

- We now have PVR-like capabilities to roll back data to any point in time,
  right down to the second.

- Our recent array upgrade required planned failovers, and we were able to
  fail over Exchange and other critical applications in 7-13 minutes per
  recovery group.

- Tested before we failed over to ensure success.

- Ran 3 recovery plans simultaneously for faster failover.
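
The benefit of running recovery plans simultaneously can be shown with simple arithmetic. The sketch below is an assumption-laden illustration: the three per-plan durations are hypothetical values picked from inside the 7-13 minute range quoted above, and it assumes concurrent plans do not slow each other down.

```python
# Hypothetical per-plan durations in minutes, chosen inside the 7-13 minute
# range quoted in the slides; the real per-plan figures are not given.
plan_minutes = {"plan_a": 7, "plan_b": 10, "plan_c": 13}


def sequential_minutes(durations):
    """Total time if the plans run one after another."""
    return sum(durations.values())


def concurrent_minutes(durations):
    """Total time if all plans start together and do not contend for resources."""
    return max(durations.values())


print("sequential:", sequential_minutes(plan_minutes), "min")
print("concurrent:", concurrent_minutes(plan_minutes), "min")
```

Under these assumptions the overall window is set by the slowest plan rather than by the sum of all plans.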

Where we are today
- 450+ virtual servers, 50+ ESXi hosts

- SRM 4.1 fully implemented for all virtualized production servers

- Replication mechanism fully integrated and automated with SRM, across a wide variety of
  storage-related replication products

- Recovery of critical applications like Exchange, Citrix, and CMS takes 7-13 minutes to
  bring servers up at the secondary site

- Settled on RecoverPoint appliances for replication, since they offer PVR-like
  data rollback capabilities

- The organization has adopted a "Virtualize First" strategy

- Significant ability to meet business timelines for application recovery

- Can recover an entire datacenter quickly and successfully

Thank You!


Editor's Notes

  1. Recovery was difficult since disparate systems were in use and each one required its own recovery procedure.
  2. Recovery timeline: Some critical applications like Canada.com have multiple consistency groups and tens of servers, so it was difficult to let them stay down for long. This environment powers all of our major websites.
     Planned failover timeline: When we performed a failover, server recovery depended essentially on how quickly the SAN team could fail over the mirror volumes.
     Multiple resources involved: Since only infrastructure was under our control, the choreography of which servers come up first and which interface servers had to be brought up next required a lot of intervention.
     Operational sequence: Even when we planned for servers to come up in a specific sequence, there was always a chance that mistakes could happen.
  3. We were able to showcase the value of using SRM to justify further virtualization, since recovery was simplified.