Become unbeatable on Exchange high availability, too! A technical session, in English, delivered by the guru of Exchange high availability technologies: Scott Schnoll. Scott speaks at Microsoft's TechReady and TechEd events, has written numerous reference books, and will be here exclusively to lead this session. Topics include: How do I separate my log replication traffic from my client traffic? When a DAG (Database Availability Group) goes down, how does the system choose the right database copy to replicate? Go beyond the basic features of high availability and learn what really happens in the inner workings of an Exchange DAG. This session covers the internals of DAGs: we will discuss DAG networks, Active Manager, how the system selects the best database copies, and Datacenter Activation Coordination Mode.
2. Exchange Server 2010 High Availability Deep Dive
8 February 2012
Scott Schnoll
Principal Technical Writer
Microsoft Corporation
MSG306
3. Agenda
Exchange Server 2010 High Availability Deep Dive
Quorum
Witness, Witness Server, and Alternate Witness Server
Database Availability Group Networks
Active Manager
Best Copy Selection
Datacenter Activation Coordination Mode
5. Quorum
Used to ensure that only one subset of members is functioning at one
time
Requires a majority of members to be active and have communications
with each other
Represents a shared view of members (voters and some resources)
Dual Usage
Data shared between the voters representing configuration, etc.
Number of voters required for the solution to stay running (majority); quorum
is a consensus of voters
When a majority of voters can communicate with each other, the cluster has
quorum
When a majority of voters cannot communicate with each other, the cluster does
not have quorum
6. Quorum
Quorum is necessary for cluster functions and for DAG functions
The DAG must have quorum in order to mount and activate databases
Exchange 2010 uses only two of the four cluster quorum models
Node Majority (DAGs with an odd number of members)
Node and File Share Majority (DAGs with an even number of members)
Quorum = (V/2) + 1 (whole numbers only)
6 members: (6/2) + 1 = 4 votes for quorum (can lose 3 voters)
9 members: (9/2) + 1 = 5 votes for quorum (can lose 4 voters)
13 members: (13/2) + 1 = 7 votes for quorum (can lose 6 voters)
15 members: (15/2) + 1 = 8 votes for quorum (can lose 7 voters)
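The arithmetic on this slide can be checked with a minimal Python sketch. The function names are hypothetical, and the sketch assumes that a DAG with an even number of members gains one extra vote through the witness (Node and File Share Majority), which is how the "can lose" figures above work out.

```python
def quorum_votes(members: int) -> int:
    """Votes needed for quorum: (V/2) + 1, using whole-number division."""
    return members // 2 + 1


def total_voters(members: int) -> int:
    """Assumption: an even-sized DAG gets one extra vote via the witness."""
    return members + 1 if members % 2 == 0 else members


def tolerated_failures(members: int) -> int:
    """Voters that can be lost while the remaining voters still hold quorum."""
    return total_voters(members) - quorum_votes(members)


if __name__ == "__main__":
    for n in (6, 9, 13, 15):
        print(f"{n} members: {quorum_votes(n)} votes for quorum, "
              f"can lose {tolerated_failures(n)} voters")
```

For 6 members this gives 4 votes for quorum out of 7 voters (6 members plus the witness vote), so 3 voters can be lost.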
9. Witness and Witness Server
A witness is a share on a server that is external to the
DAG that participates in quorum by providing a weighted
vote for the DAG member that has a lock on the
witness.log file
Configured for all DAGs
Used only by DAGs that have an even number of
members
Witness server does not maintain a copy of quorum data,
does not vote, and is not a member of the DAG or cluster
12. Alternate Witness Server
Witness server used by a DAG after a datacenter
switchover
DAG is configured to use alternate witness server when
you run Restore-DatabaseAvailabilityGroup or
ahead of time by using Set-DatabaseAvailabilityGroup
DAGs do not dynamically switch witness servers
Alternate witness server does not provide redundancy
for witness server or FSW resource
14. DAG Networks
A DAG network is a collection of one or more subnets
There are two types of DAG networks
MAPI Network - connects DAG members to network resources
(Active Directory, other Exchange servers, DNS, etc.)
Registered in DNS / DNS configured
Uses default gateway
Client for Microsoft Networks/File and Print Sharing enabled
Replication Network - used for/by continuous replication (log shipping
and seeding)
Not registered in DNS / DNS not configured
Typically no default gateway
Client for Microsoft Networks/File and Print Sharing disabled
15. DAG Networks
Round-trip network latency between all DAG
members must be 500 ms or less
Regardless of the latency of the solution, customers
should validate that the network between all DAG
members is capable of satisfying the data protection
and availability goals of the deployment
May need to investigate increasing the number of
databases or decreasing the number of mailboxes per
database to achieve desired goals
16. DAG Networks
All DAGs must have:
Exactly one MAPI network
Zero or more Replication networks
Separate network(s) on separate subnet(s)
With multiple replication networks, a least recently used (LRU)
algorithm determines which replication network is used
DAG networks automatically created when server is added to
DAG
Based on cluster’s enumeration of networks
Cluster enumeration based on subnet
One cluster network is created for each subnet
22. Active Manager
Exchange component that manages high availability
platform
Runs inside the Microsoft Exchange Replication
service on every Mailbox server
Is the definitive source of information on where a
database is active
Stores this information in cluster database
Provides this information to Active Manager client
running on other server roles (Client Access and
Hub Transport)
23. Active Manager Roles
Standalone Active Manager
Primary Active Manager (PAM)
Standby Active Manager (SAM)
Active Manager Client
Runs in RPC Client Access service on CAS
and Transport service on Hub
24. Active Manager
Primary Active Manager (PAM)
Runs on the node that owns the cluster core
resources (cluster group)
Gets topology change notifications
Reacts to server failures
Selects the best database copy on failovers and
targetless switchovers
Detects failures of local Information Store and local
databases
25. Active Manager
Standby Active Manager (SAM)
Runs on every other node in the DAG
Detects failures of local Information Store and local
databases
Reacts to failures by asking PAM to initiate a failover
Responds to queries from CAS/Hub about which server
hosts the active copy
Both roles are necessary for automatic recovery
If the Microsoft Exchange Replication service is stopped,
automatic recovery will not happen
26. Active Manager Functionality
Mount and Dismount Databases
Provide Database Availability Information
Provide Interface for Administrative Tasks
Maintains Database and Server State
Information
Monitor for Failures and Initiate Recovery
28. Best Copy Selection
Process of finding the best copy of an individual
database to activate, given a list of potential copies for
activation and their status
Active Manager selects the “best” copy to become
the new active copy when the existing active copy
fails or when an administrator performs a targetless
switchover
29. Best Copy Selection – RTM
Sorts copies by copy queue length to minimize data
loss, using activation preference as a secondary sorting
key if necessary
Selects from the sorted list based on which set of criteria
is met by each copy
Attempt Copy Last Logs (ACLL) runs and attempts to copy
missing log files from previous active copy
30. Best Copy Selection – SP1 and later
Sorts copies by activation preference when auto database
mount dial is set to Lossless
Otherwise, sorts copies based on copy queue
length, with activation preference used as a secondary
sorting key if necessary
Selects from the sorted list based on which set of criteria
is met by each copy
Attempt Copy Last Logs (ACLL) runs and attempts to copy
missing log files from previous active copy
31. Best Copy Selection
Is database mountable?
Is copy queue length <= AutoDatabaseMountDial?
If Yes, database is marked as current active and
mount request is issued
If not, next best database tried (if one is available)
During best copy selection, any servers that are
unreachable or “activation blocked” are ignored
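The checks on this slide can be sketched as a loop over an already sorted candidate list. This is a minimal illustration, not the Active Manager implementation: the Candidate type and pick_mountable function are hypothetical, and the dial thresholds (Lossless = 0, GoodAvailability = 6, BestAvailability = 12 log files) come from the product documentation rather than from the slides.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumption: AutoDatabaseMountDial thresholds in log files,
# taken from product documentation, not from this deck.
MOUNT_DIAL = {"Lossless": 0, "GoodAvailability": 6, "BestAvailability": 12}


@dataclass
class Candidate:
    """Hypothetical view of one passive copy during best copy selection."""
    name: str
    copy_queue_length: int
    reachable: bool = True
    activation_blocked: bool = False


def pick_mountable(sorted_candidates: Iterable[Candidate],
                   dial: str) -> Optional[Candidate]:
    """Return the first copy that passes the checks, or None if none does."""
    limit = MOUNT_DIAL[dial]
    for copy in sorted_candidates:
        # Unreachable or activation-blocked servers are ignored.
        if not copy.reachable or copy.activation_blocked:
            continue
        # Is copy queue length <= AutoDatabaseMountDial?
        if copy.copy_queue_length <= limit:
            return copy  # would be marked current active and mounted
        # Otherwise fall through and try the next best copy.
    return None
```

For example, with candidates whose copy queue lengths are 8, 3 (activation blocked), and 5, a GoodAvailability dial selects the third copy, while a Lossless dial selects none.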
32. Best Copy Selection
Criteria  Copy Queue Length  Replay Queue Length  Content Index Status
1         < 10 logs          < 50 logs            Healthy
2         < 10 logs          < 50 logs            Crawling
3         N/A                < 50 logs            Healthy
4         N/A                < 50 logs            Crawling
5         N/A                < 50 logs            N/A
6         < 10 logs          N/A                  Healthy
7         < 10 logs          N/A                  Crawling
8         N/A                N/A                  Healthy
9         N/A                N/A                  Crawling
10        Any database copy with a status of Healthy, DisconnectedAndHealthy,
          DisconnectedAndResynchronizing, or SeedingSource
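The criteria table above can be expressed as data. The sketch below finds the first criteria set that a copy meets; the names CopyState and first_criteria_met are hypothetical, and the sketch checks only the columns shown in the table, so any additional checks the product performs are out of scope.

```python
from typing import NamedTuple, Optional


class CopyState(NamedTuple):
    """Hypothetical snapshot of one database copy."""
    copy_queue_length: int
    replay_queue_length: int
    ci_state: str   # e.g. "Healthy" or "Crawling"
    status: str     # e.g. "Healthy", "DisconnectedAndHealthy"


# Criteria sets 1-9: (max copy queue, max replay queue, required CI state).
# None means N/A (not checked). Limits are exclusive, as in the table.
CRITERIA = [
    (10, 50, "Healthy"),
    (10, 50, "Crawling"),
    (None, 50, "Healthy"),
    (None, 50, "Crawling"),
    (None, 50, None),
    (10, None, "Healthy"),
    (10, None, "Crawling"),
    (None, None, "Healthy"),
    (None, None, "Crawling"),
]

# Criteria set 10: any copy with one of these statuses.
LAST_RESORT_STATUSES = {
    "Healthy",
    "DisconnectedAndHealthy",
    "DisconnectedAndResynchronizing",
    "SeedingSource",
}


def first_criteria_met(copy: CopyState) -> Optional[int]:
    """Return the lowest-numbered criteria set the copy meets, else None."""
    for number, (max_cql, max_rql, ci) in enumerate(CRITERIA, start=1):
        if max_cql is not None and copy.copy_queue_length >= max_cql:
            continue
        if max_rql is not None and copy.replay_queue_length >= max_rql:
            continue
        if ci is not None and copy.ci_state != ci:
            continue
        return number
    if copy.status in LAST_RESORT_STATUSES:
        return 10
    return None
```

A copy with a copy queue length of 10 and a content index that is Crawling fails sets 1 through 3 and first matches set 4.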
33. Best Copy Selection – RTM
Four copies of DB1
DB1 currently active on Server1
[Diagram: Server1, Server2, Server3, and Server4, each holding a copy of DB1; the copy on Server1 is marked with an X]
Database Copy  Activation Preference  Copy Queue Length  Replay Queue Length  CI State  Database State
Server2DB1     2                      4                  0                    Healthy   Healthy
Server3DB1     3                      2                  2                    Healthy   DiscAndHealthy
Server4DB1     4                      10                 0                    Crawling  Healthy
34. Best Copy Selection – RTM
Sort the list of available copies by Copy Queue Length
(using AP as a secondary sort key if necessary):
Server3DB1
Server2DB1
Server4DB1
Database Copy  Activation Preference  Copy Queue Length  Replay Queue Length  CI State  Database State
Server2DB1     2                      4                  0                    Healthy   Healthy
Server3DB1     3                      2                  2                    Healthy   DiscAndHealthy
Server4DB1     4                      10                 0                    Crawling  Healthy
35. Best Copy Selection – RTM
Only two copies meet the first set of criteria for activation
(CQL < 10; RQL < 50; CI = Healthy):
Server3DB1 (lowest copy queue length, tried first)
Server2DB1
Server4DB1 (does not meet the first set: CQL = 10, CI = Crawling)
Database Copy  Activation Preference  Copy Queue Length  Replay Queue Length  CI State  Database State
Server2DB1     2                      4                  0                    Healthy   Healthy
Server3DB1     3                      2                  2                    Healthy   DiscAndHealthy
Server4DB1     4                      10                 0                    Crawling  Healthy
36. Best Copy Selection – SP1 and later
Four copies of DB1
DB1 currently active on Server1
Auto database mount dial set to Lossless
[Diagram: Server1, Server2, Server3, and Server4, each holding a copy of DB1; the copy on Server1 is marked with an X]
Database Copy  Activation Preference  Copy Queue Length  Replay Queue Length  CI State  Database State
Server2DB1     2                      4                  0                    Healthy   Healthy
Server3DB1     3                      2                  2                    Healthy   DiscAndHealthy
Server4DB1     4                      10                 0                    Crawling  Healthy
37. Best Copy Selection – SP1 and later
Sort the list of available copies by Activation Preference:
Server2DB1
Server3DB1
Server4DB1
Database Copy  Activation Preference  Copy Queue Length  Replay Queue Length  CI State  Database State
Server2DB1     2                      4                  0                    Healthy   Healthy
Server3DB1     3                      2                  2                    Healthy   DiscAndHealthy
Server4DB1     4                      10                 0                    Crawling  Healthy
38. Best Copy Selection – SP1 and later
Sort the list of available copies by Activation Preference:
Server2DB1 (lowest preference value, tried first)
Server3DB1
Server4DB1
Database Copy  Activation Preference  Copy Queue Length  Replay Queue Length  CI State  Database State
Server2DB1     2                      4                  0                    Healthy   Healthy
Server3DB1     3                      2                  2                    Healthy   DiscAndHealthy
Server4DB1     4                      10                 0                    Crawling  Healthy
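The two sort orders from the RTM and SP1 examples can be reproduced with a short Python sketch over the slide data. The Copy type and the function names are hypothetical; only the sort keys are taken from the slides.

```python
from typing import List, NamedTuple


class Copy(NamedTuple):
    """One passive copy of DB1, with values from the example table."""
    name: str
    activation_preference: int
    copy_queue_length: int


COPIES = [
    Copy("Server2DB1", 2, 4),
    Copy("Server3DB1", 3, 2),
    Copy("Server4DB1", 4, 10),
]


def sort_rtm(copies: List[Copy]) -> List[Copy]:
    """RTM: copy queue length first, activation preference as tie-breaker."""
    return sorted(copies,
                  key=lambda c: (c.copy_queue_length, c.activation_preference))


def sort_sp1(copies: List[Copy], mount_dial: str) -> List[Copy]:
    """SP1 and later: activation preference when the dial is Lossless,
    otherwise the same ordering as RTM."""
    if mount_dial == "Lossless":
        return sorted(copies, key=lambda c: c.activation_preference)
    return sort_rtm(copies)


if __name__ == "__main__":
    print([c.name for c in sort_rtm(COPIES)])
    print([c.name for c in sort_sp1(COPIES, "Lossless")])
```

With the example data, the RTM order starts with Server3DB1 (copy queue length 2), and the SP1 order with a Lossless dial starts with Server2DB1 (activation preference 2).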
39. Best Copy Selection
After Active Manager determines the best copy to activate
The Replication service on the target server attempts
to copy missing log files from the source (ACLL)
If successful, then the database will mount with
zero data loss
If unsuccessful (lossy failure), then the database
will mount based on the AutoDatabaseMountDial
setting
If data loss is outside of dial setting, next copy will
be tried
40. Best Copy Selection
After Active Manager determines the best copy to activate
The mounted database will generate new log files
(using the same log generation sequence)
Transport Dumpster requests will be initiated for the
mounted database to recover lost messages
When original server or database recovers, it will run
through divergence detection and either perform an
incremental resync or require a full reseed
42. DAC Mode
Datacenter Activation Coordination (DAC) mode is a property
setting of a DAG
Acts as an application-level form of quorum
Designed to prevent multiple copies of same database
mounting on different members due to loss of network
Also enables use of Site Resilience cmdlets
Stop-DatabaseAvailabilityGroup
Restore-DatabaseAvailabilityGroup
Start-DatabaseAvailabilityGroup
43. DAC Mode
Exchange 2010 RTM
DAC Mode is only for DAGs with three or more
members that are extended to two Active Directory
sites
Exchange 2010 SP1 and later
DAC Mode can (and should) be enabled for all DAGs
44. DAC Mode
Uses Datacenter Activation Coordination Protocol (DACP),
which is a bit in memory set to either:
0 = can’t mount
1 = can mount
45. DAC Mode
Active Manager startup sequence
DACP is set to 0
DAG member communicates with other DAG members it
can reach to determine the current value for their DACP bits
If the starting DAG member can communicate with all
other members, DACP bit switches to 1
If other DACP bits are set to 0, starting DAG member
DACP bit remains at 0
If another DACP bit is set to 1, starting DAG member
DACP bit switches to 1
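The startup sequence above can be sketched as a small decision function. This is an illustration of the rules as stated on the slide, not the actual protocol implementation; the function name and its parameters are hypothetical.

```python
from typing import Sequence


def dacp_after_startup(reachable_bits: Sequence[int],
                       other_member_count: int) -> int:
    """Return the starting member's DACP bit after Active Manager startup.

    reachable_bits: DACP bits reported by the other members it can reach.
    other_member_count: number of other members in the DAG.
    """
    dacp = 0  # DACP is set to 0 at startup
    if len(reachable_bits) == other_member_count:
        dacp = 1  # can communicate with all other members
    elif any(bit == 1 for bit in reachable_bits):
        dacp = 1  # another member already holds a 1
    # Otherwise every reachable bit is 0 (or nobody is reachable): stay at 0.
    return dacp
```

For a four-member DAG, a restarted member that can reach only one other member whose bit is 0 stays at 0 and cannot mount databases; as soon as it reaches a member whose bit is 1, it switches to 1.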
46. DAC Mode
[Diagram: DAG1 shown with a Primary Datacenter and a Secondary Datacenter. Labels: Outlook clients, HT2010, CAS-Pri, CAS-Sec, FSW, and Mailbox servers MBX-A, MBX-B, MBX-C, MBX-D; two database copies marked Active]
47. DAC Mode
[Diagram: DAG1 shown with a Primary Datacenter and a Secondary Datacenter. Labels: Outlook clients, HT2010, CAS-Pri, CAS-Sec, FSW, AWS, and Mailbox servers MBX-A, MBX-B, MBX-C, MBX-D; two database copies marked Active]
48. DAC Mode
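[Diagram: DAG1 shown with a Primary Datacenter and a Secondary Datacenter. Labels: Outlook clients, HT2010, CAS-Pri, CAS-Sec, FSW, AWS, and Mailbox servers MBX-A, MBX-B, MBX-C, MBX-D; two database copies marked Active; DACP bit values 0, 0, 1, 1 shown above MBX-A, MBX-B, MBX-C, MBX-D]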
49. Questions?
Thank you for attending!
Contact me at any time with questions:
scott.schnoll@microsoft.com
Twitter: @schnoll
Blog: http://blogs.technet.com/scottschnoll
Replication networks typically do not have default gateways, and if the MAPI network has a default gateway, then no other networks should have default gateways. Routing of network traffic on a Replication network can be configured by using persistent, static routes to the corresponding network on other DAG members using gateway addresses that have the ability to route between the Replication networks. All other traffic not matching this route will be handled by the default gateway that's configured on the adapter for the MAPI network.