Welcome to the future! The future of Exchange high availability, that is. In this session we reveal the changes and improvements to the built-in high availability platform in Exchange Server 2010. Exchange 2010 includes a unified solution for high availability and disaster recovery that is quick to deploy and easy to manage. Learn about all of the new features in Exchange 2010 that make it the most resilient, highly available version of Exchange ever.
UNC307 - Microsoft Exchange Server 2010 High Availability
2. High Availability. Scott Schnoll, Principal Technical Writer, Microsoft Corporation. Session Code: UNC307
3. Agenda Exchange 2010 High Availability Vision/Goals Exchange 2010 High Availability Features Exchange 2010 High Availability Deep Dive Deploying Exchange 2010 High Availability Features Transitioning to Exchange 2010 High Availability High Availability Design Examples
5. Exchange 2010 High Availability Vision and Goals Vision: Deliver a fast, easy-to-deploy and operate, economical solution that can provide messaging service continuity for all customers Goals Deliver a native solution for high availability/site resilience Enable less expensive and less complex storage Simplify administration and reduce support costs Increase end-to-end availability Support Exchange Server 2010 Online Support large mailboxes at low cost
6. Exchange Server 2003: Complex site resilience and recovery. Clustering knowledge required; the Clustered Mailbox Server had to be created manually; third-party data replication was needed for site resilience; failover occurred at the Mailbox server level. (Diagram: a San Jose active/passive cluster (NodeA/NodeB) hosting DB1–DB6 behind a Front End Server serving Outlook, OWA, ActiveSync, and Outlook Anywhere clients, with a standby cluster in Dallas.)
7. Exchange Server 2007: Complex activation for a remote server/datacenter. Clustering knowledge still required; a Clustered Mailbox Server could not co-exist with other roles; there was no GUI to manage SCR; failover occurred at the Mailbox server level. (Diagram: a San Jose CCR cluster (NodeA/NodeB) hosting DB1–DB6 behind a Client Access Server serving Outlook, OWA, ActiveSync, and Outlook Anywhere clients, with SCR copies on a standby cluster in Dallas.)
8. Exchange Server 2010: All clients connect via Client Access servers; failover is managed by/with Exchange at the database level; easy to extend across sites. (Diagram: Mailbox Servers 1–5 in San Jose and Mailbox Server 6 in Dallas hosting interleaved copies of DB1–DB5, with clients connecting through a Client Access Server.)
10. Exchange 2010 High Availability Terminology High Availability – Solution must provide data availability, service availability, and automatic recovery from failures Disaster Recovery – Process used to manually recover from a failure Site Resilience – Disaster recovery solution used for recovery from site failure *over – Short for switchover/failover; a switchover is a manual activation of one or more databases; a failover is an automatic activation of one or more databases after a failure
11. Exchange 2010 High Availability Feature Names Mailbox Resiliency – Name of the unified high availability and site resilience solution Database Mobility – The ability of a single mailbox database to be replicated to and mounted on other mailbox servers Incremental Deployment – The ability to deploy high availability/site resilience after Exchange is installed Exchange Third Party Replication API – An Exchange-provided API that enables use of third-party replication for a DAG in lieu of continuous replication
12. Exchange 2010 High Availability Feature Names Database Availability Group – A group of up to 16 Mailbox servers that host a set of replicated databases Mailbox Database Copy – A mailbox database (.edb file and logs) that is either active or passive RPC Client Access service – A Client Access server feature that provides a MAPI endpoint for Outlook clients Shadow Redundancy – A transport feature that provides redundancy for messages for the entire time they are in transit
13. Exchange 2010 *overs. Within a datacenter: database or server *overs; at the datacenter level: switchover. Between datacenters: database or server *overs. Assumptions: each datacenter is a separate Active Directory site; each datacenter has live, active messaging services; the standby datacenter must be active to support a single database *over.
14. Exchange 2007 Concepts Brought Forward Extensible Storage Engine (ESE) Databases and log files Continuous Replication Log shipping and replay Database seeding Store service/Replication service Database health and status monitoring Divergence Automatic database mount behavior Concepts of quorum and witness Concepts of *overs
15. Exchange 2010 Cut Concepts Storage Groups Databases identified by the server on which they live Server names as part of database names Clustered Mailbox Servers Pre-installing a Windows Failover Cluster Running Setup in Clustered Mode Moving a CMS network identity between servers Shared Storage Two HA Copy Limits Requirement of Two Networks Concepts of public, private and mixed networks
18. Exchange 2010 HA Fundamentals: Database Availability Group, Server, Database, Database Copy, Active Manager, RPC Client Access. (Diagram: a DAG of servers, each running Active Manager and hosting active and passive database copies, with clients connecting through RPC Client Access on the CAS.)
19. Database Availability Group (DAG) Base component of high availability and site resilience A group of up to 16 servers that host a set of replicated databases “Wraps” a Windows Failover Cluster Manages membership (DAG member = node) Provides heartbeat of DAG member servers Active Manager stores data in cluster database Defines a boundary for: Mailbox database replication Database and server *overs Active Manager
20. DAG Requirements Windows Server 2008 SP2 Enterprise Edition or Windows Server 2008 R2 Enterprise Edition Exchange Server 2010 Standard Edition or Exchange Server 2010 Enterprise Edition Standard supports up to 5 databases per server Enterprise supports up to 100 databases per server At least one network card per DAG member
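The per-edition database limits above can be captured in a trivial check. This is a sketch for illustration only; the dictionary and function names are my own, not Exchange's:

```python
# Database-per-server limits by Exchange 2010 edition, as listed above.
DB_LIMITS = {"Standard": 5, "Enterprise": 100}

def within_database_limit(edition, database_count):
    """True if a server of the given edition may host this many databases."""
    return database_count <= DB_LIMITS[edition]
```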
21. Active Manager Exchange component that manages *overs Runs on every server in the DAG Selects best available copy on failovers Is the definitive source of information on where a database is active Stores this information in cluster database Provides this information to other Exchange components (e.g., RPC Client Access and Hub Transport) Two Active Manager roles: PAM and SAM Active Manager client runs on CAS and Hub
22. Active Manager Primary Active Manager (PAM) Runs on the node that owns the cluster group Gets topology change notifications Reacts to server failures Selects the best database copy on *overs Standby Active Manager (SAM) Runs on every other node in the DAG Responds to queries about which server hosts the active copy of the mailbox database Both roles are necessary for automatic recovery If Replication service is stopped, automatic recovery will not happen
23. Active Manager: Selection of Active Database Copy. Active Manager selects the “best” copy to become active when the existing active copy fails. It ignores servers that are unreachable or whose activation is temporarily or permanently blocked, sorts copies by currency to minimize data loss, breaks ties during the sort based on Activation Preference, and selects from the sorted list based on the copy status of each copy.
24. Active Manager: Selection of Active Database Copy. Active Manager selects the “best” copy to become active when the existing active copy fails. In every round, a candidate copy must have a copy status of Healthy, DisconnectedAndHealthy, DisconnectedAndResynchronizing, or SeedingSource. Active Manager then evaluates ten criteria sets, relaxing the catalog (content index), copy queue, and replay queue requirements in turn, from strictest to most lenient:
1. Catalog Healthy; CopyQueueLength < 10; ReplayQueueLength < 50
2. Catalog Crawling; CopyQueueLength < 10; ReplayQueueLength < 50
3. Catalog Healthy; CopyQueueLength < 10
4. Catalog Crawling; CopyQueueLength < 10
5. Catalog Healthy; ReplayQueueLength < 50
6. Catalog Crawling; ReplayQueueLength < 50
7. Catalog Healthy
8. Catalog Crawling
9. ReplayQueueLength < 50
10. Copy status only
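The selection logic can be sketched as below. This is a simplified illustration, not Exchange's actual code: copies are modeled as plain dicts, currency is approximated by copy queue length, and all names (including the exact ordering of the relaxation rounds) are my own reconstruction:

```python
# Copy statuses that are eligible for activation in every round.
HEALTHY_STATUSES = {"Healthy", "DisconnectedAndHealthy",
                    "DisconnectedAndResynchronizing", "SeedingSource"}

# Criteria rounds from strictest to most lenient:
# (allowed catalog states or None, max copy queue or None, max replay queue or None)
ROUNDS = [
    ({"Healthy"}, 10, 50),
    ({"Crawling"}, 10, 50),
    ({"Healthy"}, 10, None),
    ({"Crawling"}, 10, None),
    ({"Healthy"}, None, 50),
    ({"Crawling"}, None, 50),
    ({"Healthy"}, None, None),
    ({"Crawling"}, None, None),
    (None, None, 50),
    (None, None, None),
]

def select_best_copy(copies):
    """copies: dicts with keys status, catalog, copy_queue, replay_queue,
    activation_preference, blocked. Returns the copy to activate, or None."""
    # Ignore unreachable or activation-blocked copies
    candidates = [c for c in copies if not c["blocked"]]
    # Sort by currency (shortest copy queue first); break ties on
    # Activation Preference (lower value preferred)
    candidates.sort(key=lambda c: (c["copy_queue"], c["activation_preference"]))
    for catalog, max_cq, max_rq in ROUNDS:
        for c in candidates:
            if c["status"] not in HEALTHY_STATUSES:
                continue
            if catalog is not None and c["catalog"] not in catalog:
                continue
            if max_cq is not None and c["copy_queue"] >= max_cq:
                continue
            if max_rq is not None and c["replay_queue"] >= max_rq:
                continue
            return c
    return None  # no activatable copy found
```

Note how a copy with a Healthy catalog wins round 1 even if a more current copy only has a Crawling catalog; the more current copy is only considered in a later, more lenient round.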
25. Automatic Recovery Process When a failure occurs that affects a database: Active Manager determines the best copy to activate The Replication service on the target server attempts to copy missing log files from the source (ACLL) If successful, then the database will mount with zero data loss If unsuccessful (lossy failure), then the database will mount based on the AutoDatabaseMountDial setting The mounted database will generate new log files (using the same log generation sequence) Transport Dumpster requests will be initiated for the mounted database to recover lost messages When original server or database recovers, it will run through divergence detection and either perform an incremental resync or require a full reseed
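The lossy-mount decision in step 4 is governed by AutoDatabaseMountDial, whose documented Exchange 2010 settings are Lossless (0 lost logs), Good Availability (6), and Best Availability (12). A minimal sketch of that decision, with hypothetical function and parameter names:

```python
# Log-loss thresholds for the AutoDatabaseMountDial settings.
MOUNT_DIAL = {"Lossless": 0, "GoodAvailability": 6, "BestAvailability": 12}

def can_auto_mount(last_log_copied, last_log_generated, dial="GoodAvailability"):
    """After ACLL, decide whether the new active copy may mount automatically.

    last_log_copied: highest log generation present on the target copy
    last_log_generated: highest generation the failed active copy produced
    """
    logs_lost = last_log_generated - last_log_copied
    return logs_lost <= MOUNT_DIAL[dial]
```

If ACLL copied every log, the mount is lossless under any dial setting; with, say, 8 logs missing, the database mounts automatically only under Best Availability.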
26. Example: Database Failover. A database failure occurs; a failure item is raised; Active Manager moves the active database; the failed database copy is then restored. A similar flow applies within and across datacenters. (Diagram: a five-server DAG with copies of DB1–DB5 distributed across the servers.)
27. Example: Server Failover. A server failure occurs; the cluster raises a node-down notification; Active Manager moves the active databases. When the server is restored, the cluster raises a node-up notification and the server's database copies resynchronize with the active databases. A similar flow applies within and across datacenters. (Diagram: a five-server DAG with copies of DB1–DB5 distributed across the servers.)
28. Example: RPC Client Access service and Active Manager. Outlook clients connect through a load-balanced CAS array; each Client Access server runs an Active Manager client that asks Active Manager where the database is mounted. When a disk fails, Outlook's reconnect triggers a new Active Manager request; if the failover is still in progress, Active Manager returns the old server and the connection fails, so Outlook tries again; once the database failover is complete, Active Manager returns the new server. If a CAS fails, the load balancer directs the reconnect to a surviving CAS. (Diagram: Outlook1–Outlook3 → load balancer → CAS1–CAS3, each with an Active Manager client, in front of a four-server DAG where each Mailbox server runs MAPI RPC, Active Manager, and the Store.)
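The reconnect flow above can be modeled as a simple retry loop. This is a toy simulation, not Exchange code; all class, method, and server names are mine, and completing the failover between attempts stands in for the DAG finishing the database failover:

```python
class ActiveManagerStub:
    """Stand-in for the Active Manager lookup a CAS performs."""
    def __init__(self, active_server="MBX1", failover_in_progress=False):
        self.active_server = active_server
        self.failover_in_progress = failover_in_progress

    def locate(self, database):
        # May return a stale answer while a failover is in progress
        return self.active_server

    def complete_failover(self, new_server):
        self.active_server = new_server
        self.failover_in_progress = False

def reconnect(am, reachable_servers, attempts=3):
    """Outlook-style loop: ask AM, try to connect, retry on failure."""
    for _ in range(attempts):
        server = am.locate("DB1")
        if server in reachable_servers:
            return server  # connection succeeded
        if am.failover_in_progress:
            # Simulate the failover finishing before the next retry
            am.complete_failover("MBX2")
    return None  # gave up after all attempts
```

The first attempt gets the stale answer and fails; the retry, issued after the failover completes, is directed to the new active server.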
29. DAG Lifecycle. A DAG is created initially as an empty object in Active Directory, given a name and one or more IP addresses (or configured to use DHCP), and uses either continuous replication or third-party replication (Third Party Replication mode). When the first Mailbox server is added to a DAG: a Windows failover cluster is formed with a Node Majority quorum using the name of the DAG; the server is added to the DAG object in Active Directory; a cluster network object (CNO) for the DAG is created in the built-in Computers container; the name and IP address of the DAG are registered in DNS; and the cluster database for the DAG is updated with information on configured databases, including whether they are locally active (which they should be).
30. DAG Lifecycle. When the second and subsequent Mailbox servers are added to a DAG: the server is joined to the cluster for the DAG; the quorum model is automatically adjusted (Node Majority for DAGs with an odd number of members, Node and File Share Majority for DAGs with an even number of members); the file share witness cluster resource, directory, and share are automatically created by Exchange when needed; the server is added to the DAG object in Active Directory; and the cluster database for the DAG is updated with information on configured databases, including whether they are locally active (which they should be).
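The automatic quorum-model adjustment above reduces to a parity check on the member count. A minimal sketch (the function name is mine, not Exchange's):

```python
def quorum_model(member_count):
    """Quorum model a DAG uses for a given number of member servers."""
    if member_count % 2 == 1:
        return "Node Majority"
    # Even-membered DAGs add a file share witness vote
    return "Node and File Share Majority"
```

The witness exists precisely to break ties: with an even number of nodes, its extra vote keeps the total vote count odd so a majority is always well-defined.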
31. DAG Lifecycle After servers have been added to a DAG Configure the DAG Network Encryption Network Compression Configure DAG networks Network subnets Enable/disable MAPI traffic/replication Create mailbox database copies Seeding is performed automatically Monitor health and status of database copies Perform switchovers as needed
32. DAG Lifecycle Before you can remove a server from a DAG, you must first remove all replicated databases from the server When a server is removed from a DAG: The server is evicted from the cluster The cluster quorum is adjusted as needed The server is removed from the DAG object in Active Directory Before you can remove a DAG, you must first remove all servers from the DAG
37. Transition Steps Verify that you meet requirements for Exchange 2010 Deploy Exchange 2010 Use Exchange 2010 mailbox move features to migrate Unsupported Transitions In-place upgrade to Exchange 2010 from any previous version of Exchange Using database portability between Exchange 2010 and non-Exchange 2010 databases Backup and restore of earlier versions of Exchange databases on Exchange 2010 Using continuous replication between Exchange 2010 and Exchange 2007
39. High Availability Design Example: Branch/Small Office. Hardware load balancer; 8 processor cores recommended with a maximum of 64 GB RAM; DAG member servers can host other server roles (Client Access, Hub Transport, Mailbox); the UM role is not recommended for co-location; 2-server DAGs should use RAID. (Diagram: two multi-role Client Access/Hub Transport/Mailbox servers, each hosting copies of DB1–DB3.)
40. High Availability Design Example: Double Resilience (Maintenance + DB Failure). In a 3-server DAG, with 2 servers out quorum is lost, so the remaining server requires manual activation; DAGs with more servers sustain more failures and offer greater resiliency. Single site (AD: Dublin), 3 nodes, 3 HA copies, CAS NLB farm; JBOD → 3 physical copies. (Diagram: Mailbox Servers 1–3 each hosting copies of DB1–DB6, with two servers marked as failed.)
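The quorum arithmetic behind this scenario is a majority test over the DAG members. A sketch under the Node Majority model (odd-membered DAG, as on the slide; the function name is mine):

```python
def has_quorum(members, failed):
    """True if the surviving DAG members still form a majority
    of the original membership (Node Majority model)."""
    return (members - failed) > members // 2
```

For the slide's 3-server DAG, `has_quorum(3, 2)` is False, which is why the third server must be activated manually; a 5-server DAG, by contrast, survives two failures.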
48. Example: two active copies die. (Diagram: a four-server DAG behind a CAS NLB farm hosting copies of DB1–DB8; two servers fail and the affected databases activate on the surviving servers.)
49. High Availability on JBOD: 6 Servers, 3 Racks, 3-Copy DAG. 24,000 mailboxes; heavy profile: 100 messages/day, 0.1 IOPS/mailbox; 2 GB mailbox size. Each server: 8 cores, 48 GB RAM, 4,000 active mailboxes; separate MAPI and replication networks. 6 servers with 3 copies = double server failure resiliency: after a 1st failure ~5,000 active mailboxes per server, after a 2nd failure 6,000; soft active limit: 24 databases per server. Storage: 1 TB 7.2k SATA disks, JBOD with 48 disks per node (including 3 online spares), 288 disks total, 30 TB of database space, battery-backed caching array controller. (Diagram: active and passive copies of DB1–DB90 distributed across the six servers, plus spare disks.)
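The load figures on this slide are straightforward to verify with back-of-envelope arithmetic. A sketch (helper names are mine) assuming active databases redistribute evenly across the surviving servers:

```python
def active_mailboxes_per_server(total_mailboxes, servers, failed=0):
    """Active mailboxes each surviving server carries after failures."""
    return total_mailboxes // (servers - failed)

def required_iops(mailboxes, iops_per_mailbox=0.1):
    """Aggregate IOPS at the slide's heavy profile of 0.1 IOPS/mailbox."""
    return mailboxes * iops_per_mailbox
```

`active_mailboxes_per_server(24000, 6)` gives the slide's 4,000 per server; one failure raises it to 4,800 (the slide's "~5,000"), a second to 6,000; and 24,000 mailboxes demand about 2,400 IOPS in aggregate.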
50. Key Takeaways Greater end-to-end availability with Mailbox Resiliency Unified framework for high availability and site resilience Faster and easier to deploy with Incremental Deployment Reduced TCO with core ESE architecture changes and more storage options Supports large mailboxes for less money
52. Resources. Session recordings are available at TechEd Online. www.microsoft.com/teched (Sessions On-Demand & Community); www.microsoft.com/learning (Microsoft Certification & Training Resources); http://microsoft.com/technet (Resources for IT Professionals); http://microsoft.com/msdn (Resources for Developers)