Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance / résilience de site
 

Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance / résilience de site

on

  • 878 vues

La nouvelle version d'Exchange Server 2013 intègre une foule de nouveautés lui permettant d'être aujourd'hui le serveur de messagerie le plus sécurisé et le plus fiable sur le marché. ...

La nouvelle version d'Exchange Server 2013 intègre une foule de nouveautés lui permettant d'être aujourd'hui le serveur de messagerie le plus sécurisé et le plus fiable sur le marché. L'expérience acquise par la gestion des solutions de messagerie Cloud par les équipes Microsoft a été directement intégrée dans cette nouvelle version du produit ce qui va vous permettre la mise en place d'un système de messagerie ultra résilient. Scott Schnoll, Principal Technical Writer dans l'équipe Exchange à Microsoft Corp va vous expliquer de manière didactique l'ensemble des mécanismes de haute disponibilité et les solutions de resilience inter sites dans les plus petits détails. Venez apprendre directement par l'expert qui a travaillé sur ces sujets chez Microsoft ! Attention, session très technique, en anglais.

Statistiques

Vues

Total des vues
878
Vues sur SlideShare
878
Vues externes
0

Actions

J'aime
0
Téléchargements
24
Commentaires
0

0 Ajouts 0

No embeds

Accessibilité

Catégories

Détails de l'import

Uploaded via as Microsoft PowerPoint

Droits d'utilisation

© Tous droits réservés

Report content

Signalé comme inapproprié Signaler comme inapproprié
Signaler comme inapproprié

Indiquez la raison pour laquelle vous avez signalé cette présentation comme n'étant pas appropriée.

Annuler
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Votre message apparaîtra ici
    Processing...
Poster un commentaire
Modifier votre commentaire
  • Intro Serveurs / Entreprise / Reseaux / IT
  • See bug 2527095 for the event 129 logic.Example event 129:Log Name: SystemSource: HpCISSs2Date: 1/28/2012 6:55:34 PMEvent ID: 129Task Category: NoneLevel: WarningKeywords: ClassicUser: N/AComputer: DB3PRD0104MB115.eurprd01.prod.exchangelabs.comDescription:Reset to device, \\Device\\RaidPort0, was issued.
  • Single copy alert (aka database redundancy check) is now integrated with the Replication service.

Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance / résilience de site Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance / résilience de site Presentation Transcript

  • Exchange Server 2013High Availability | Site ResilienceScott SchnollPrincipal Technical WriterMicrosoft CorporationServeurs / Entreprise / Réseaux / ITTwitter: SchnollBlog: http://aka.ms/Schnoll
  • • Storage• High Availability• Site ResilienceAgenda
  • STORAGE
  • • Capacity is increasing, but IOPS are not• Database sizes must be manageable• Reseeds must be fast and reliable• Passive copy IOPS are inefficient• Lagged copies have asymmetric storagerequirements• Low agility from low disk space recoveryStorage Challenges
  • • Multiple Databases Per Volume• Automatic Reseed• Automatic Recovery from Storage Failures• Lagged Copy EnhancementsStorage Enhancements
  • MULTIPLE DATABASE PER VOLUME
  • Multiple databases per volumeDB4DB3DB2PassiveActive Lagged4-member DAG4 databases4 copies of each database4 databases per volumeSymmetrical design with balancedactivation preferenceNumber of copies per database =number of databasesper volume
  • Multiple databases per volumeDB1DB1DB1PassiveActive LaggedSingle database copy/disk:Reseed 2TB Database = ~23 hrsReseed 8TB Database = ~93 hrs20 MB/s
  • Multiple databases per volumeDB4DB3DB2PassiveActive LaggedSingle database copy/disk:Reseed 2TB Database = ~23 hrsReseed 8TB Database = ~93 hrs4 database copies/disk:Reseed 2TB Disk = ~9.7 hrsReseed 8TB Disk = ~39 hrs12 MB/s12 MB/s20 MB/s 20 MB/s
  • • Requirements– Single logical disk/partition per physical disk• Recommendations– Databases per volume should equal the number ofcopies per database– Same neighbors on all servers– Balance activation preferencesMultiple databases per volume
  • AUTORESEED
  • • Disk failure on active copy = databasefailover• Failed disk and database corruption issuesneed to be addressed quickly• Fast recovery to restore redundancy isneededSeeding Challenges
  • • Autoreseed - automatically restoreredundancy after disk failureSeeding EnhancementsIn-UseStorageX
  • AutoreseedPeriodicallyscan for failedandsuspendedcopiesCheckprerequisites:singlecopy, spareavailabilityAllocate andremap a spareStart the seedVerify that thenew copy ishealthyAdminreplacesfailed disk
  • AutoreseedConfigure storagesubsystem with spare disksCreate DAG, add serverswith configured storageCreate directory andmount pointsConfigure DAG, including 3new propertiesCreate mailbox databasesand database copiesMDB1 MDB2MDB1 MDB2MDB1.DB MDB1.logMDB1.DB MDB1.logAutoDagDatabasesRootFolderPathAutoDagVolumesRootFolderPathAutoDagDatabaseCopiesPerVolume = 1
  • • Requirements– Single logical disk/partition per physical disk– Specific database and log folder structure must be used• Recommendations– Same neighbors on all servers– Databases per volume should equal the number of copiesper database– Balance activation preferences• Configuration instructions– http://aka.ms/autoreseedAutoreseed
  • AUTOMATIC RECOVERY FROM STORAGEFAILURES
  • • Storage controllers are basically mini-PCs– As such, they can crash, hang, etc., requiringadministrative intervention• Other operator-recoverable conditions canoccur– Loss of vital system elements– Hung or highly latent IORecovery Challenges
  • • Innovations added in Exchange 2010 carried forward• New recovery behaviors added to Exchange 2013– Even more added to Exchange 2013 CU1Recovery EnhancementsExchange Server 2010 Exchange Server 2013ESE Database Hung IO (240s) System Bad State (302s)Failure Item Channel Heartbeat (30s) Long I/O times (41s)SystemDisk Heartbeat (120s) MSExchangeRepl.exe memory threshold (4GB)Exchange Server 2013 CU1Bus reset (event 129)Replication service endpoints not responding
  • LAGGED COPY ENHANCEMENTS
  • • Activation is difficult• Lagged copies require manual care• Lagged copies cannot be page patchedLagged Copy Challenged
  • • Automatic log file replay in a variety ofsituations– Low disk space (enable in registry)– Page patching (enabled by default)– Less than 3 other healthy copies (enable in AD;configure in registry)• Integration with Safety Net– No need for log surgery or hunting for the point ofcorruptionLagged Copy Enhancements
  • HIGH AVAILABILITY
  • • High availability focuses on database health• Best copy selection insufficient for newarchitecture• Management challenges aroundmaintenance and DAG networkconfigurationHigh Availability Challenges
  • • Managed Availability• Best Copy and Server Selection• DAG Network AutoconfigHigh Availability Enhancements
  • MANAGED AVAILABILITY
  • • Key tenet for Exchange 2013:– All access to a mailbox is provided by the protocol stack on theMailbox server that hosts the active copy of the user’s mailbox• If a protocol is down on a Mailbox server, all activedatabases lose access via that protocol• Managed Availability was introduced to detect thesekinds of failures and automatically correct them– For most protocols, quick recovery is achieved via a restart action– If the restart action fails, a failover can be triggered• Each protocol team designed their own recovery sequence, whichis based on their experiences running Office 365 – serviceexperience accrues to the on-premises admin!Managed Availability
  • • An internal framework used by componentteams• Sequencing mechanism to control whenrecovery actions are taken versus alerting andescalation• Enhances the best copy selection algorithm bytaking into account server health• Includes a mechanism for taking servers in/outof service (maintenance mode)Managed Availability
  • • MA failovers are recovery action from failure– Detected via a synthetic operation or live data– Throttled in time and across the DAG• MA failovers come in two forms– Server: Protocol failure can trigger server failover– Database: Store-detected database failure can trigger databasefailover• MA includes Single Copy Alert– Alert is per-server to reduce flow– Still triggered across all machines with copies– Monitoring triggered through a notification– Logs 4138 (red) and 4139 (green) eventsManaged Availability
  • BEST COPY AND SERVER SELECTION
  • • Exchange 2010 used several criteria– Copy queue length– Replay queue length– Database copy status – including activation blocked– Content index status• Using just this criteria is not good enoughfor Exchange 2013, because protocol healthis not consideredBest Copy Selection Challenges
  • • Still an Active Manager algorithm performed at*over time based on extracted health of the system– Replication health still determined by same criteria and phases– Criteria now includes health of the entire protocol stack• Considers a prioritized protocol health set in theselection– Four priorities – critical, high, medium, low (all health sets have apriority)– Failover responders trigger added checks to select a “protocol notworse” targetBest Copy and Server Selection
  • Best Copy and Server SelectionAll HealthyChecks for a server hosting a copy that has all health sets in a healthy stateUp to Normal HealthyChecks for a server hosting a copy that has all health sets Medium and above in a healthy stateAll Better than SourceChecks for a server hosting a copy that has health sets in a state that is better than the currentserver hosting the affected copySame as SourceChecks for a server hosting a copy of the affected database that has health sets in a state that is the same as thecurrent server hosting the affected copy
  • DAG NETWORK AUTOCONFIG
  • • DAG networks must be manually collapsedin a multi-subnet deployment• Continuing to reduce administrative burdenfor deployment and initial configurationDAG Network Challenges
  • • DAGs now default to automaticconfiguration– Still requires specific configuration settings on NICs– Manual edits and EAC controls blocked whenautomatic networking is enabled– Set DAG to manual network setup to edit or changeDAG networks• Multi-subnet DAG networks automaticallycollapsedDAG Network Enhancements
  • DAG Network Enhancements
  • SITE RESILIENCE
  • • Operationally complex• Mailbox and Client Access recoveryconnected• Namespace is a SPOFSite Resilience Challenges
  • • Operationally simplified• Mailbox and Client Access recoveryindependent• Namespace provides redundancySite Resilience Enhancements
  • • Previously loss of CAS, CAS array, VIP, LB,some portion of the DAG required admin toperform a datacenter switchover• In Exchange Server 2013, recovery happensautomatically– The admin focuses on fixing the issue, instead ofrestoring serviceSite Resilience – Operationally Simplified
  • • Previously, CAS and Mailbox serverrecovery were tied together in siterecoveries• In Exchange Server 2013, recovery isindependent, and may come automaticallyin the form of failoverSite Resilience – Recovery Independent
  • • DNS resolves to multiple IP addresses• Almost all protocol access in Exchange 2013 isHTTP• HTTP clients have built-in IP failover capabilities• Clients skip past IPs that produce hard TCP failures• Admins can switchover by removing VIP from DNS• Namespace no longer a SPOF• No dealing with DNS latencySite Resilience – Namespace Redundancy
  • • With the namespace simplification, consolidation ofserver roles, separation of CAS array and DAGrecovery, and load balancing changes, threelocations can simplify mailbox recovery and providedatacenter failovers• You must have at least three locations– Two locations with Exchange; one with witness server– Exchange sites must be well-connected– Witness server site must be isolated from network failuresaffecting Exchange sitesSite Resilience – Three Locations
  • PortlandRedmondSite Resiliencecas3 cas4cas1 cas2
  • PortlandRedmondSite Resiliencedag1mbx1 mbx2 mbx3 mbx4Assuming MBX3 and MBX4 are operating and one of them can lock the witness.logfile, automatic failover should occurwitness
  • PortlandRedmondSite Resiliencedag1mbx1 mbx2 mbx3 mbx4
  • PortlandRedmonddag1Site Resiliencembx1 mbx2 mbx3 mbx41. Mark the failed servers/site as down: Stop-DatabaseAvailabilityGroup DAG1 –ActiveDirectorySite:Redmond2. Stop the Cluster Service on Remaining DAG members: Stop-Clussvc3. Activate DAG members in 2nd datacenter: Restore-DatabaseAvailabilityGroup DAG1 –ActiveDirectorySite:Portland
  • SUMMARY
  • • Many storage enhancements targetedtowards JBOD environments• Numerous high availability improvements• Site resilience operationally simplifiedSummary
  • Scott SchnollPrincipal Technical Writerscott.schnoll@microsoft.comhttp://aka.ms/schnollschnollQuestions?
  • Formez-vous en ligneRetrouvez nos évènementsFaites-vous accompagnergratuitementEssayer gratuitement nossolutions ITRetrouver nos expertsMicrosoftPros de l’ITDéveloppeurswww.microsoftvirtualacademy.comhttp://aka.ms/generation-apphttp://aka.ms/evenements-developpeurshttp://aka.ms/itcamps-franceLes accélérateursWindows Azure, Windows Phone,Windows 8http://aka.ms/telechargementsLa Dev’Team sur MSDNhttp://aka.ms/devteamL’IT Team sur TechNethttp://aka.ms/itteam