Exchange Server 2013: High Availability Mechanisms and Site Redundancy / Resilience

The new version of Exchange Server 2013 includes a host of new features that make it the most secure and reliable messaging server on the market today. The experience Microsoft's teams gained from running cloud messaging services has been built directly into this release, which lets you deploy a highly resilient messaging system. Scott Schnoll, Principal Technical Writer on the Exchange team at Microsoft Corp., explains in a clear, step-by-step way all of the high availability mechanisms and the cross-site resilience solutions, down to the smallest details. Come and learn directly from the expert who worked on these topics at Microsoft! Note: this is a very technical session, presented in English.

Published in: Technology

  1. Exchange Server 2013: High Availability | Site Resilience
     Scott Schnoll, Principal Technical Writer, Microsoft Corporation
     Servers / Enterprise / Networks / IT
     Twitter: Schnoll
     Blog: http://aka.ms/Schnoll
  2. Agenda
     • Storage
     • High Availability
     • Site Resilience
  3. STORAGE
  4. Storage Challenges
     • Capacity is increasing, but IOPS are not
     • Database sizes must be manageable
     • Reseeds must be fast and reliable
     • Passive copy IOPS are inefficient
     • Lagged copies have asymmetric storage requirements
     • Low agility from low disk space recovery
  5. Storage Enhancements
     • Multiple Databases Per Volume
     • Automatic Reseed
     • Automatic Recovery from Storage Failures
     • Lagged Copy Enhancements
  6. MULTIPLE DATABASES PER VOLUME
  7. Multiple databases per volume
     [Diagram labels: DB4, DB3, DB2; Passive, Active, Lagged]
     • 4-member DAG
     • 4 databases
     • 4 copies of each database
     • 4 databases per volume
     • Symmetrical design with balanced activation preference
     • Number of copies per database = number of databases per volume
  8. Multiple databases per volume
     [Diagram labels: DB1, DB1, DB1; Passive, Active, Lagged; 20 MB/s]
     • Single database copy/disk:
       – Reseed 2TB Database = ~23 hrs
       – Reseed 8TB Database = ~93 hrs
  9. Multiple databases per volume
     [Diagram labels: DB4, DB3, DB2; Passive, Active, Lagged; 12 MB/s, 12 MB/s, 20 MB/s, 20 MB/s]
     • Single database copy/disk:
       – Reseed 2TB Database = ~23 hrs
       – Reseed 8TB Database = ~93 hrs
     • 4 database copies/disk:
       – Reseed 2TB Disk = ~9.7 hrs
       – Reseed 8TB Disk = ~39 hrs
  10. Multiple databases per volume
     • Requirements
       – Single logical disk/partition per physical disk
     • Recommendations
       – Databases per volume should equal the number of copies per database
       – Same neighbors on all servers
       – Balance activation preferences
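The reseed estimates on slides 8 and 9 rest on simple throughput arithmetic: with several database copies on one disk, the disk is refilled from several sources in parallel, so the rates add up. The sketch below is a rough illustration, assuming decimal units (1 TB = 1,000,000 MB) and assuming the four rates on slide 9 are per-source seeding rates that sum. It does not reproduce the slide's exact figures, which presumably reflect different unit or throughput assumptions; the function name `reseed_hours` is made up for this example.

```python
def reseed_hours(size_tb: float, rates_mb_per_s: list[float]) -> float:
    """Estimate hours to reseed `size_tb` terabytes when the data is pulled
    in parallel from sources running at the given rates (MB/s)."""
    total_rate = sum(rates_mb_per_s)          # aggregate throughput in MB/s
    if total_rate <= 0:
        raise ValueError("at least one source with a positive rate is required")
    size_mb = size_tb * 1_000_000             # assumption: 1 TB = 1,000,000 MB
    return size_mb / total_rate / 3600        # seconds -> hours


single = reseed_hours(2, [20])                # one copy per disk, one source
parallel = reseed_hours(2, [12, 12, 20, 20])  # four copies per disk, four sources
print(f"single source: {single:.1f} h, four sources: {parallel:.1f} h")
```

Whatever the exact rates, the point of the slide holds in this model: the time to refill a disk falls roughly in proportion to the aggregate rate of the sources feeding it.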
  11. AUTORESEED
  12. Seeding Challenges
     • Disk failure on active copy = database failover
     • Failed disk and database corruption issues need to be addressed quickly
     • Fast recovery to restore redundancy is needed
  13. Seeding Enhancements
     • Autoreseed - automatically restore redundancy after disk failure
     [Diagram labels: In-Use Storage; X]
  14. Autoreseed (workflow)
     1. Periodically scan for failed and suspended copies
     2. Check prerequisites: single copy, spare availability
     3. Allocate and remap a spare
     4. Start the seed
     5. Verify that the new copy is healthy
     6. Admin replaces failed disk
  15. Autoreseed (configuration)
     1. Configure storage subsystem with spare disks
     2. Create DAG, add servers with configured storage
     3. Create directory and mount points
     4. Configure DAG, including 3 new properties
     5. Create mailbox databases and database copies
     [Diagram labels: MDB1, MDB2; MDB1.DB, MDB1.log; AutoDagDatabasesRootFolderPath; AutoDagVolumesRootFolderPath; AutoDagDatabaseCopiesPerVolume = 1]
  16. Autoreseed
     • Requirements
       – Single logical disk/partition per physical disk
       – Specific database and log folder structure must be used
     • Recommendations
       – Same neighbors on all servers
       – Databases per volume should equal the number of copies per database
       – Balance activation preferences
     • Configuration instructions
       – http://aka.ms/autoreseed
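The Autoreseed workflow on slide 14 (scan, check prerequisites, allocate and remap a spare, seed, verify) can be sketched as a small simulation. This is a toy model rather than Exchange code: the `Server` class, its fields, and `autoreseed_pass` are hypothetical names. It models only the spare-availability prerequisite, plus an assumed requirement that a healthy source copy exists elsewhere; the slide's "single copy" prerequisite is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical model of one DAG member's storage, for illustration only."""
    spares: list[str] = field(default_factory=list)            # unused spare volumes
    copy_volume: dict[str, str] = field(default_factory=dict)  # database -> volume
    failed: set[str] = field(default_factory=set)              # failed/suspended copies


def autoreseed_pass(server: Server, healthy_copies_elsewhere: dict[str, int]) -> list[str]:
    """Run one periodic scan and return the databases that were reseeded."""
    reseeded = []
    for db in sorted(server.failed):
        # Prerequisites (simplified): a source copy exists and a spare is free.
        if healthy_copies_elsewhere.get(db, 0) < 1 or not server.spares:
            continue
        spare = server.spares.pop(0)      # allocate a spare
        server.copy_volume[db] = spare    # remap the database to the spare volume
        reseeded.append(db)               # stands in for "start the seed"
    # Stands in for "verify that the new copy is healthy".
    server.failed -= set(reseeded)
    return reseeded


srv = Server(spares=["Vol5"], copy_volume={"MDB1": "Vol1", "MDB2": "Vol2"}, failed={"MDB1"})
print(autoreseed_pass(srv, {"MDB1": 3}), srv.copy_volume)
```

The last workflow step, replacing the failed disk, stays manual and would correspond to an administrator adding a volume back to `spares`.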
  17. AUTOMATIC RECOVERY FROM STORAGE FAILURES
  18. Recovery Challenges
     • Storage controllers are basically mini-PCs
       – As such, they can crash, hang, etc., requiring administrative intervention
     • Other operator-recoverable conditions can occur
       – Loss of vital system elements
       – Hung or highly latent IO
  19. Recovery Enhancements
     • Innovations added in Exchange 2010 carried forward
     • New recovery behaviors added to Exchange 2013
       – Even more added to Exchange 2013 CU1
     Exchange Server 2010:
       – ESE Database Hung IO (240s)
       – Failure Item Channel Heartbeat (30s)
       – SystemDisk Heartbeat (120s)
     Exchange Server 2013:
       – System Bad State (302s)
       – Long I/O times (41s)
       – MSExchangeRepl.exe memory threshold (4GB)
     Exchange Server 2013 CU1:
       – Bus reset (event 129)
       – Replication service endpoints not responding
  20. LAGGED COPY ENHANCEMENTS
  21. Lagged Copy Challenges
     • Activation is difficult
     • Lagged copies require manual care
     • Lagged copies cannot be page patched
  22. Lagged Copy Enhancements
     • Automatic log file replay in a variety of situations
       – Low disk space (enable in registry)
       – Page patching (enabled by default)
       – Less than 3 other healthy copies (enable in AD; configure in registry)
     • Integration with Safety Net
       – No need for log surgery or hunting for the point of corruption
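The three automatic replay triggers on slide 22 amount to a simple decision rule. The sketch below encodes only what the slide states: page patching triggers replay by default, while the low-disk-space and fewer-than-three-healthy-copies triggers must be switched on first. The function and parameter names are hypothetical, and the two `*_enabled` flags stand in for the registry and AD settings, which the slide does not name.

```python
def should_play_down(low_disk_space: bool,
                     needs_page_patch: bool,
                     other_healthy_copies: int,
                     low_space_enabled: bool = False,
                     few_copies_enabled: bool = False) -> bool:
    """Decide whether a lagged copy should replay its log files automatically."""
    if needs_page_patch:                          # enabled by default
        return True
    if low_space_enabled and low_disk_space:      # opt-in (registry)
        return True
    if few_copies_enabled and other_healthy_copies < 3:   # opt-in (AD + registry)
        return True
    return False


# With default settings, only page patching triggers replay.
print(should_play_down(low_disk_space=True, needs_page_patch=False, other_healthy_copies=1))
print(should_play_down(low_disk_space=False, needs_page_patch=True, other_healthy_copies=3))
```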
  23. HIGH AVAILABILITY
  24. High Availability Challenges
     • High availability focuses on database health
     • Best copy selection insufficient for new architecture
     • Management challenges around maintenance and DAG network configuration
  25. High Availability Enhancements
     • Managed Availability
     • Best Copy and Server Selection
     • DAG Network Autoconfig
  26. MANAGED AVAILABILITY
  27. Managed Availability
     • Key tenet for Exchange 2013:
       – All access to a mailbox is provided by the protocol stack on the Mailbox server that hosts the active copy of the user’s mailbox
     • If a protocol is down on a Mailbox server, all active databases lose access via that protocol
     • Managed Availability was introduced to detect these kinds of failures and automatically correct them
       – For most protocols, quick recovery is achieved via a restart action
       – If the restart action fails, a failover can be triggered
     • Each protocol team designed their own recovery sequence, which is based on their experiences running Office 365; service experience accrues to the on-premises admin!
  28. Managed Availability
     • An internal framework used by component teams
     • Sequencing mechanism to control when recovery actions are taken versus alerting and escalation
     • Enhances the best copy selection algorithm by taking into account server health
     • Includes a mechanism for taking servers in/out of service (maintenance mode)
  29. Managed Availability
     • MA failovers are a recovery action from failure
       – Detected via a synthetic operation or live data
       – Throttled in time and across the DAG
     • MA failovers come in two forms
       – Server: Protocol failure can trigger server failover
       – Database: Store-detected database failure can trigger database failover
     • MA includes Single Copy Alert
       – Alert is per-server to reduce flow
       – Still triggered across all machines with copies
       – Monitoring triggered through a notification
       – Logs 4138 (red) and 4139 (green) events
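Slide 27 describes a recovery sequence that starts with a cheap action (restart the protocol) and escalates to a failover only if the restart does not help. The following is a minimal sketch of that escalation pattern, not the Managed Availability framework itself. The three callables are hypothetical hooks supplied by the caller, and the throttling mentioned on slide 29 is left out.

```python
from typing import Callable


def recover_protocol(is_healthy: Callable[[], bool],
                     restart: Callable[[], None],
                     failover: Callable[[], None]) -> str:
    """Escalate from the least disruptive recovery action to the most disruptive."""
    if is_healthy():
        return "healthy"
    restart()                      # first recovery action: restart the protocol
    if is_healthy():
        return "recovered-by-restart"
    failover()                     # restart did not help: move active databases away
    return "failed-over"


# Simulated probe results: unhealthy before the restart, still unhealthy after it.
probe_results = iter([False, False])
actions: list[str] = []
outcome = recover_protocol(lambda: next(probe_results),
                           lambda: actions.append("restart"),
                           lambda: actions.append("failover"))
print(outcome, actions)
```

Keeping the probe and the actions as injected callables mirrors the idea that each protocol team defines its own recovery sequence.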
  30. BEST COPY AND SERVER SELECTION
  31. Best Copy Selection Challenges
     • Exchange 2010 used several criteria
       – Copy queue length
       – Replay queue length
       – Database copy status, including activation blocked
       – Content index status
     • Using just these criteria is not good enough for Exchange 2013, because protocol health is not considered
  32. Best Copy and Server Selection
     • Still an Active Manager algorithm performed at *over (failover or switchover) time, based on extracted health of the system
       – Replication health still determined by same criteria and phases
       – Criteria now include health of the entire protocol stack
     • Considers a prioritized protocol health set in the selection
       – Four priorities: critical, high, medium, low (all health sets have a priority)
       – Failover responders trigger added checks to select a “protocol not worse” target
  33. Best Copy and Server Selection
     • All Healthy: checks for a server hosting a copy that has all health sets in a healthy state
     • Up to Normal Healthy: checks for a server hosting a copy that has all health sets Medium and above in a healthy state
     • All Better than Source: checks for a server hosting a copy that has health sets in a state that is better than the current server hosting the affected copy
     • Same as Source: checks for a server hosting a copy of the affected database that has health sets in a state that is the same as the current server hosting the affected copy
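The four checks on slide 33 read as an ordered fallback: try the strictest check first and relax it only if no candidate passes. The sketch below illustrates that ordering for the protocol-health part only; the replication criteria (queue lengths, copy status, content index) are left out. How Exchange actually compares health states is not given in the slides, so this example assumes that "better than source" means the candidate's unhealthy health sets are a proper subset of the source's, and that "same as source" means the two sets are equal. The health set priorities in the usage example are invented.

```python
PRIORITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}

# A server's health is modeled as {health_set_name: (priority, is_healthy)}.
Health = dict[str, tuple[str, bool]]


def unhealthy(health: Health, lowest_priority: str = "low") -> set[str]:
    """Names of unhealthy health sets at or above the given priority."""
    limit = PRIORITY_RANK[lowest_priority]
    return {name for name, (prio, ok) in health.items()
            if not ok and PRIORITY_RANK[prio] <= limit}


def select_target(source: Health, candidates: dict[str, Health]):
    """Return (server, check_name) for the first check that any candidate passes."""
    src_bad = unhealthy(source)
    checks = [
        ("All Healthy", lambda h: not unhealthy(h)),
        ("Up to Normal Healthy", lambda h: not unhealthy(h, "medium")),
        ("All Better than Source", lambda h: unhealthy(h) < src_bad),  # assumption
        ("Same as Source", lambda h: unhealthy(h) == src_bad),         # assumption
    ]
    for name, passes in checks:
        for server, health in candidates.items():
            if passes(health):
                return server, name
    return None, None


source = {"OWA": ("critical", False), "Search": ("low", True)}
candidates = {
    "mbx2": {"OWA": ("critical", True), "Search": ("low", False)},
    "mbx3": {"OWA": ("critical", False), "Search": ("low", True)},
}
print(select_target(source, candidates))
```

In this example no candidate is fully healthy, so the second check selects mbx2, whose only unhealthy health set has low priority.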
  34. DAG NETWORK AUTOCONFIG
  35. DAG Network Challenges
     • DAG networks must be manually collapsed in a multi-subnet deployment
     • Continuing to reduce administrative burden for deployment and initial configuration
  36. DAG Network Enhancements
     • DAGs now default to automatic configuration
       – Still requires specific configuration settings on NICs
       – Manual edits and EAC controls blocked when automatic networking is enabled
       – Set DAG to manual network setup to edit or change DAG networks
     • Multi-subnet DAG networks automatically collapsed
  37. DAG Network Enhancements
  38. SITE RESILIENCE
  39. Site Resilience Challenges
     • Operationally complex
     • Mailbox and Client Access recovery connected
     • Namespace is a SPOF
  40. Site Resilience Enhancements
     • Operationally simplified
     • Mailbox and Client Access recovery independent
     • Namespace provides redundancy
  41. Site Resilience: Operationally Simplified
     • Previously loss of CAS, CAS array, VIP, LB, some portion of the DAG required admin to perform a datacenter switchover
     • In Exchange Server 2013, recovery happens automatically
       – The admin focuses on fixing the issue, instead of restoring service
  42. Site Resilience: Recovery Independent
     • Previously, CAS and Mailbox server recovery were tied together in site recoveries
     • In Exchange Server 2013, recovery is independent, and may come automatically in the form of failover
  43. Site Resilience: Namespace Redundancy
     • DNS resolves to multiple IP addresses
     • Almost all protocol access in Exchange 2013 is HTTP
     • HTTP clients have built-in IP failover capabilities
     • Clients skip past IPs that produce hard TCP failures
     • Admins can switchover by removing VIP from DNS
     • Namespace no longer a SPOF
     • No dealing with DNS latency
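The client behavior behind slide 43 (the namespace resolves to several IP addresses, and the client skips any address that produces a hard TCP failure) can be sketched as a simple loop. This is a generic illustration of the pattern, not how any particular Exchange client implements it. The function name is hypothetical, the `connect` parameter exists only so the example can run without a network, and the addresses come from the reserved documentation ranges.

```python
import socket
from typing import Callable, Iterable, Optional


def connect_first_reachable(addresses: Iterable[str], port: int,
                            connect: Optional[Callable[[str, int], object]] = None,
                            timeout: float = 3.0):
    """Try each IP in order and return (ip, connection) for the first success."""
    if connect is None:
        # Default: a real TCP connection attempt with a short timeout.
        connect = lambda ip, p: socket.create_connection((ip, p), timeout=timeout)
    last_error = None
    for ip in addresses:
        try:
            return ip, connect(ip, port)
        except OSError as exc:      # hard TCP failure: skip to the next IP
            last_error = exc
    raise ConnectionError(f"no address reachable: {last_error}")


def fake_connect(ip: str, port: int) -> str:
    """Stand-in for a TCP connect: the first VIP is treated as down."""
    if ip == "192.0.2.10":
        raise ConnectionRefusedError("VIP down")
    return f"connected to {ip}:{port}"


print(connect_first_reachable(["192.0.2.10", "198.51.100.10"], 443, connect=fake_connect))
```

Because the client moves on by itself, failing over the namespace does not depend on DNS changes propagating, which is the point of the last two bullets on the slide.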
  44. Site Resilience: Three Locations
     • With the namespace simplification, consolidation of server roles, separation of CAS array and DAG recovery, and load balancing changes, three locations can simplify mailbox recovery and provide datacenter failovers
     • You must have at least three locations
       – Two locations with Exchange; one with witness server
       – Exchange sites must be well-connected
       – Witness server site must be isolated from network failures affecting Exchange sites
  45. Site Resilience
     [Diagram labels: Redmond, Portland; cas1, cas2, cas3, cas4]
  46. Site Resilience
     [Diagram labels: Redmond, Portland; dag1; mbx1, mbx2, mbx3, mbx4; witness]
     • Assuming MBX3 and MBX4 are operating and one of them can lock the witness.log file, automatic failover should occur
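The scenario on slide 46 comes down to vote counting. The sketch below assumes the usual quorum model for a DAG with an even number of members, in which the witness contributes one extra vote and the cluster needs a strict majority of all votes. The function name is hypothetical, and the example presumes mbx1 and mbx2 sit in the failed datacenter while the witness is in a third location.

```python
def has_quorum(reachable_members: int, total_members: int,
               witness_reachable: bool) -> bool:
    """Majority vote check; the witness votes only when the member count is even."""
    uses_witness = total_members % 2 == 0
    total_votes = total_members + (1 if uses_witness else 0)
    votes = reachable_members + (1 if uses_witness and witness_reachable else 0)
    return votes > total_votes // 2     # strict majority of all votes


# Four-member DAG, two members lost, witness in a third site still reachable:
print(has_quorum(reachable_members=2, total_members=4, witness_reachable=True))
# Same failure, but the witness is unreachable as well:
print(has_quorum(reachable_members=2, total_members=4, witness_reachable=False))
```

With the witness reachable, the surviving site holds 3 of 5 votes and can fail over automatically; without it, 2 of 5 votes is not a majority, which is why the witness site must be isolated from failures that affect the Exchange sites.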
  47. Site Resilience
     [Diagram labels: Redmond, Portland; dag1; mbx1, mbx2, mbx3, mbx4]
  48. Site Resilience
     [Diagram labels: Redmond, Portland; dag1; mbx1, mbx2, mbx3, mbx4]
     1. Mark the failed servers/site as down: Stop-DatabaseAvailabilityGroup DAG1 -ActiveDirectorySite:Redmond
     2. Stop the Cluster service on the remaining DAG members: Stop-Service clussvc
     3. Activate DAG members in the second datacenter: Restore-DatabaseAvailabilityGroup DAG1 -ActiveDirectorySite:Portland
  49. SUMMARY
  50. Summary
     • Many storage enhancements targeted towards JBOD environments
     • Numerous high availability improvements
     • Site resilience operationally simplified
  51. Questions?
     Scott Schnoll, Principal Technical Writer
     scott.schnoll@microsoft.com
     http://aka.ms/schnoll
     schnoll
  52. Microsoft resources for IT pros and developers
     • Train online; find our events; get free guidance; try our IT solutions for free; meet our Microsoft experts
     • www.microsoftvirtualacademy.com
     • http://aka.ms/generation-app
     • http://aka.ms/evenements-developpeurs
     • http://aka.ms/itcamps-france
     • Accelerators for Windows Azure, Windows Phone, Windows 8: http://aka.ms/telechargements
     • The Dev’Team on MSDN: http://aka.ms/devteam
     • The IT Team on TechNet: http://aka.ms/itteam
