4. DAG Replication Service
• Introduced in Exchange 2007 RTM
• Microsoft Exchange Replication service | MSExchangeRepl
• MSExchangeRepl.exe
• Runs on all Mailbox servers (not just DAG members)
• Communicates with Active Directory and other DAG members
• Includes 16 components
  • Active Directory lookup
  • Replay RPC server wrapper
  • TPR API manager
  • Copy status lookup
  • Remote data provider wrapper
  • Support API manager
  • Replay core manager
  • VssWriter
  • Server locator manager
  • Seed manager
  • Active manager
  • Health state tracker
  • Autoreseed manager
  • Active manager RPC server wrapper
  • Disk reclaimer manager
  • Failure item manager
5. DAG Management Service
• Introduced in RTM CU2
• Microsoft Exchange DAG Management service | MSExchangeDagMgmt
• MSExchangeDagMgmt.exe
• Runs on all Mailbox servers (not just DAG members)
• Communicates with Active Directory and other DAG members
• Includes 4 components
  • Active Directory lookup
  • Copy status lookup
  • Monitoring
  • Tracer instance
6. DAG Management Service
• Created for two primary reasons:
  • so the Replication service can have more focused functionality
  • so Managed Availability actions can kill lower-priority activities
• Microsoft Exchange DAG Management service | MSExchangeDagMgmt
• MSExchangeDagMgmt.exe
• Runs on all Mailbox servers (not just DAG members)
• Communicates with Active Directory and other DAG members
• Writes events to same place as Replication service
• Other functions will move to this service
  • AutoReseed, Disk reclaimer, Dynamic replay lag playdown
  • Future AutoDAG copy layout and mobility features
7. Cluster service
• Introduced in Windows NT Server 4.0, Enterprise Edition (1997)
• Cluster Service | ClusSvc
• Clussvc.exe
• Exchange DAGs use several cluster components
  • Quorum
  • Membership and node management
  • Networks and heartbeating
  • Cluster registry
8. Cluster service
• Quorum is required in order to mount databases
• Quorum is based on votes, not membership
• Voting can be rigged
• Votes can be taken away manually or dynamically
• Exchange manages quorum model, not quorum
• Exchange management of quorum model based on nodes, not
votes
• Removing votes requires manual configuration of quorum model
• Exchange will make incorrect quorum model management
decisions if votes are manually removed at the cluster level
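
The vote arithmetic behind these bullets can be sketched in Python. This is a simplified model, not the cluster service's or Exchange's implementation: the `Node` class and all helper names are hypothetical, and it assumes the quorum model follows the node count (witness vote for an even number of DAG members, no witness vote for an odd number).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    weight: int = 1      # 1 = has a quorum vote, 0 = vote removed manually
    online: bool = True


def witness_in_use(nodes):
    # Assumption: the quorum model follows the node count, not the vote count.
    # Even member count -> witness gets a vote; odd member count -> no witness vote.
    return len(nodes) % 2 == 0


def total_votes(nodes):
    return sum(n.weight for n in nodes) + (1 if witness_in_use(nodes) else 0)


def has_quorum(nodes, witness_online=True):
    witness_vote = 1 if witness_in_use(nodes) and witness_online else 0
    reachable = sum(n.weight for n in nodes if n.online) + witness_vote
    return reachable * 2 > total_votes(nodes)   # strict majority of votes


dag = [Node("EX1"), Node("EX2"), Node("EX3"), Node("EX4")]
print(total_votes(dag))          # 4 node votes + 1 witness vote = 5 (odd)

dag[3].weight = 0                # vote removed at the cluster level
print(total_votes(dag))          # 3 node votes + 1 witness vote = 4 (even)

dag[0].online = False
dag[1].online = False
print(has_quorum(dag))           # 1 node vote + witness = 2 of 4: no majority
```

With all four votes intact, two members plus the witness hold 3 of 5 votes and keep quorum. With one vote removed by hand, the same failure leaves 2 of 4 votes and quorum is lost, because the model still sees four nodes and keeps the witness vote.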
9. Cluster registry
• Active Manager stores database / server information in the
cluster registry for DAG members
• Registry changes are replicated immediately to all DAG
members
• Stored information is used as part of Best Copy and Server Selection (BCSS)
12. Crimson channel
• Applications and Services logs
  • Area of Windows Server event log used by applications for logging and internal communication
  • These logs store events from a single application or component rather than events that might have system-wide impact
  • This is referred to as an application's crimson channel
• Exchange 2013 has multiple channels
  • ActiveMonitoring
  • HighAvailability
  • MailboxDatabaseFailureItems
  • ManagedAvailability
  • PushNotifications
  • Troubleshooters
15. Witness Server
• A server that participates in a failover cluster with an even
number of members
  • Is not a member of the cluster
  • Does not contain a full copy of quorum data
• Represented by File Share Witness resource
  • Uses IsAlive check for availability
  • If the server or share is not available, the cluster resource is failed and moved to another node
  • If another node does not bring the resource online, the resource remains in a Failed state, with restart attempts every 60 minutes
  • If needed for quorum, but cannot be brought online, quorum will be lost
16. Witness Server
• A lock is not actively maintained on the witness
• When it becomes necessary to obtain an additional
vote to maintain quorum:
  • An SMB file lock is placed on the witness.log file by one
  node
  • Node paxos information is incremented by the locking node
  and the updated paxos tag written to the witness.log file
• Lock is released when witness server is no longer
needed to maintain quorum
17. Windows Failover Clustering
• Node that locks witness.log gets the witness vote
• If enough nodes are in contact with the locking node to
constitute a majority, they will maintain quorum and
continue providing service
• Nodes not in contact with the locking node are in the
minority and lose quorum
• Nodes not owning cluster core resources wait 6
seconds prior to attempting to lock the FSW
(arbitrationDelay)
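
The arbitration described on the last two slides can be modeled in a few lines of Python. This is a rough sketch, not the cluster service's algorithm: the `arbitrate` function and its parameters are hypothetical, every node is assumed to hold one vote, and ties between partitions with the same delay are broken arbitrarily by list order.

```python
ARBITRATION_DELAY_SECONDS = 6  # delay named on the slide (arbitrationDelay)


def arbitrate(partitions, core_owner, can_reach_witness):
    """Return (locking_partition, partitions_with_quorum) for a split cluster.

    partitions: list of sets of node names that can still talk to each other
    core_owner: node that owns the cluster core resources
    can_reach_witness: set of node names that can reach the witness share
    """
    total_votes = sum(len(p) for p in partitions) + 1  # every node + witness

    # Each partition that can reach the witness attempts the lock; the one
    # holding the core resources tries at t=0, the others after the delay.
    attempts = []
    for index, partition in enumerate(partitions):
        if partition & can_reach_witness:
            delay = 0 if core_owner in partition else ARBITRATION_DELAY_SECONDS
            attempts.append((delay, index))

    locker = min(attempts)[1] if attempts else None

    # The locking partition adds the witness vote to its node votes.
    survivors = []
    for index, partition in enumerate(partitions):
        votes = len(partition) + (1 if index == locker else 0)
        if votes * 2 > total_votes:
            survivors.append(index)
    return locker, survivors


halves = [{"EX1", "EX2"}, {"EX3", "EX4"}]
everyone = {"EX1", "EX2", "EX3", "EX4"}
print(arbitrate(halves, core_owner="EX1", can_reach_witness=everyone))
# (0, [0])
print(arbitrate(halves, core_owner="EX1", can_reach_witness={"EX3", "EX4"}))
# (1, [1])
```

In the first call the half that owns the core resources locks the witness first and keeps 3 of 5 votes. In the second call that half cannot reach the witness, so the other half obtains the lock after the delay and becomes the majority.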
20. Witness server placement
• Basic guidance for Exchange 2010
• “We recommend that you use a Hub Transport server
running on Microsoft Exchange Server 2010 in the Active
Directory site containing the DAG. This allows the witness
server and directory to remain under the control of an
Exchange administrator.”
• “If your DAG is extended to multiple datacenters, we
recommend deploying the witness server in the datacenter
that is considered to be the primary datacenter.”
21. Witness server placement
• Exchange 2013 guidance more complicated due
to new options introduced by architectural
changes
• Exchange 2013 includes support for new DAG
configuration options that were not recommended
or possible in previous versions of Exchange
  • A third location, such as a third physical datacenter or
  branch office
22. Witness server placement
• Ultimately, the placement of a DAG’s witness server
depends on business requirements and the options
available to the organization
23. Witness server placement
Deployment scenario | Recommendations
Single DAG deployed in a single datacenter | Locate witness server in the same datacenter as DAG members
Single DAG deployed across two datacenters; no additional locations available | Locate witness server in primary datacenter
Multiple DAGs deployed in a single datacenter | Locate witness server in the same datacenter as DAG members. Additional options include: (1) using the same witness server for multiple DAGs; (2) using a DAG member to act as a witness server for a different DAG
Multiple DAGs deployed across two datacenters | Locate witness server in the same datacenter as DAG members. Additional options include: (1) using the same witness server for multiple DAGs; (2) using a DAG member to act as a witness server for a different DAG
Single or multiple DAGs deployed across more than two datacenters | Locate the witness server in the datacenter where you want the majority of quorum votes to exist
24. Witness server placement
• If the organization has a third location, a DAG’s
witness server can be deployed there for
automatic failover between sites
• The witness server location must have network
infrastructure and connectivity that is isolated from
network failures that affect the two datacenters with
Exchange
• For all DAGs, the availability of the witness server
should be on the Exchange administrator’s radar
25. Witness server placement
• Azure is not supported for use as a Witness
Server for Exchange DAGs
• Investigation into using Azure to host witness
server ran into dead end
• Azure does not yet support the required underlying
network configuration to enable an Azure file server VM
to act as a witness server
• More info at http://aka.ms/DAGAzure
27. Dynamic Quorum
• In Windows Server 2008 R2, quorum majority is
fixed, based on the initial cluster configuration
• In Windows Server 2012 (and later), cluster
quorum majority is determined by the set of
nodes that are active members of the cluster at a
given time
• This new feature is called Dynamic Quorum, and
it is enabled for all clusters by default
28. Dynamic Quorum
• Cluster dynamically manages the vote assignment to
nodes, based on the state of each node
• When a node shuts down or crashes, the node loses its quorum
vote
• When a node successfully rejoins the cluster, it regains its quorum
vote
• By dynamically adjusting the assignment of quorum
votes, the cluster can increase or decrease the number of
quorum votes that are required to keep running
• This enables the cluster to maintain availability during sequential
node failures or shutdowns
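
A small Python sketch makes the difference between a fixed and a dynamically adjusted majority concrete. It is a simplified model with hypothetical function names: it ignores the witness and any tie-breaking the cluster applies when an even number of votes remain, so it understates what a real cluster can survive.

```python
def votes_needed(total_votes):
    """Strict majority of the current vote total."""
    return total_votes // 2 + 1


def sequential_failures(node_count, dynamic):
    """Count how many one-at-a-time node failures a cluster survives.

    Simplified model: no witness, every node starts with one vote, and the
    cluster has time to recalculate votes between failures.
    """
    total = node_count      # votes currently assigned
    alive = node_count
    survived = 0
    while alive > 1:
        alive -= 1
        if alive < votes_needed(total):
            break                     # survivors are not a majority
        survived += 1
        if dynamic:
            total = alive             # the failed node loses its vote
    return survived


print(sequential_failures(5, dynamic=False))  # 2
print(sequential_failures(5, dynamic=True))   # 3
```

With a fixed majority, a five-node cluster in this model loses quorum at the third failure; with votes removed as nodes fail, the majority requirement shrinks and the third failure is survivable.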
29. Dynamic Quorum
• With dynamic quorum management, it is also
possible for a cluster to run on the last
surviving cluster node
• By dynamically adjusting the quorum majority
requirement, the cluster can sustain sequential
node shutdowns to a single node
• This is referred to as “Last Man Standing” scenario
30. Dynamic Quorum
• Does not allow a cluster to sustain a
simultaneous failure of a majority of voting
members
• To continue running, the cluster must always have a
quorum majority at the time of a node shutdown or
failure
• If you remove a node’s vote, the cluster does not
dynamically add the vote back
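
The three points on this slide can be illustrated with a toy Python model. The `Cluster` and `Node` classes are hypothetical and only borrow the NodeWeight/DynamicWeight idea; the model assumes no witness and treats every `fail` call as one simultaneous event.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_weight: int = 1      # assigned by the administrator
    dynamic_weight: int = 1   # adjusted by the cluster
    up: bool = True


class Cluster:
    def __init__(self, names):
        self.nodes = {name: Node() for name in names}
        self.has_quorum = True

    def votes(self):
        return sum(n.dynamic_weight for n in self.nodes.values())

    def fail(self, *names):
        """Fail all named nodes at the same moment."""
        before = self.votes()
        for name in names:
            self.nodes[name].up = False
            self.nodes[name].dynamic_weight = 0
        # Survivors need a strict majority of the votes held before the failure.
        if self.votes() * 2 <= before:
            self.has_quorum = False

    def remove_vote(self, name):
        """Manual vote removal by an administrator."""
        self.nodes[name].node_weight = 0
        self.nodes[name].dynamic_weight = 0

    def rejoin(self, name):
        """A returning node regains a vote only if it still has a node weight."""
        node = self.nodes[name]
        node.up = True
        node.dynamic_weight = node.node_weight


names = ["EX1", "EX2", "EX3", "EX4", "EX5"]

one_at_a_time = Cluster(names)
for name in ("EX5", "EX4", "EX3"):
    one_at_a_time.fail(name)
print(one_at_a_time.has_quorum)     # True

all_at_once = Cluster(names)
all_at_once.fail("EX5", "EX4", "EX3")
print(all_at_once.has_quorum)       # False

manual = Cluster(names)
manual.remove_vote("EX5")
manual.fail("EX5")
manual.rejoin("EX5")
print(manual.votes())               # 4
```

Losing three of five nodes one at a time keeps quorum in this model, losing the same three at once does not, and a node whose vote was removed manually comes back without a vote.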
38. Dynamic Quorum
Use Get-ClusterNode to verify the DynamicWeight common
property of a node
    0 = does not have quorum vote
    1 = has quorum vote

    Get-ClusterNode <Name> | ft name, *weight, state

    Name DynamicWeight NodeWeight State
    ---- ------------- ---------- -----
    EX1              1          1 Up
39. Dynamic Quorum and DAGs
• Does not change quorum requirements for DAGs
• Does work with DAGs
• All internal DAG testing done with dynamic quorum
enabled
• Enabled in Office 365 for servers on Windows
Server 2012
• Exchange is not dynamic quorum-aware
40. Dynamic quorum and DAGs
• Cluster team guidance on dynamic quorum:
  “Selecting this option generally increases the availability of the cluster. By default the
  option is enabled, and it is strongly recommended to not disable this option. This option
  allows the cluster to continue running in failure scenarios that are not possible when this
  option is disabled.”
• Exchange team guidance on dynamic quorum:
  • Leave it enabled for majority of DAG members
  • Don’t factor it into availability plans
• The advantage is that, in some cases where 2008 R2 would have lost quorum, 2012 can
maintain quorum; this only applies to a few cases, and should not be relied upon when
planning a DAG
42. DAG member maintenance
• Basic guidance for DAG member maintenance in
Exchange 2010
• Run StartDagServerMaintenance.ps1 to put DAG member
in maintenance mode
• Perform the maintenance (e.g., install the update rollup)
• Run StopDagServerMaintenance.ps1 to take DAG member
out of maintenance mode and put it back into production
• Optionally rebalance the DAG by using
RedistributeActiveDatabases.ps1
46. Summary
• DAG architecture continues to evolve
• More witness server placement options
available
• Dynamic quorum works with DAGs
• DAG member maintenance mode process is
new