Exchange Server 2010 - Database Availability Group
Let’s Begin DAG
DAG is one of the major enhancements in Exchange 2010. LCR, CCR and SCR from Exchange 2007
have been dropped in Exchange 2010, and the DAG is introduced as a single high availability solution.
Exchange 2010 (codename Exchange "14") uses the same continuous replication technology found in
Exchange Server 2007, but unites on-site (CCR) and off-site (SCR) data replication into one framework.
Exchange manages all aspects of failover, and no Windows clustering knowledge is required because the
DAG configures clustering by itself. A DAG can have as many as 16 nodes, and each database can have up
to 16 copies (one active and up to 15 passive), compared to the two-node CCR cluster. The DAG also makes
failover more granular: it is database-level rather than server-level, so the failure of one database in a DAG
does not force a failover of the entire server, which would affect users of the other databases on that server.
A server that is part of a DAG can still hold other server roles. This reduces the minimum number of
servers required to build a redundant Exchange environment to two. A DAG can easily be stretched across
sites to provide resilience against a site disaster. In CCR the passive server sits idle, whereas in a DAG the
active databases can be distributed among the nodes.
Some Clustering Basics
Before we begin with the DAG, it is worth reviewing some clustering technologies; this background will
help you understand the DAG much more quickly.
The concept of a cluster involves taking two or more computers and organizing them to work together to
provide higher availability, reliability and scalability than can be obtained by using a single system. When
a failure occurs in a cluster, resources can be redirected and the workload can be redistributed. A Server
cluster provides high availability by making application software and data available on several servers
linked together in a cluster configuration. If one server stops functioning, a process called failover
automatically shifts the workload of the failed server to another server in the cluster. The failover process
is designed to ensure continuous availability of critical applications and data.
There are mainly three types of clustering in Windows Server.
Network Load Balancing provides failover support for IP-based applications and services that require
high scalability and availability. With Network Load Balancing (NLB), organizations can build groups of
clustered computers to support load balancing of Transmission Control Protocol (TCP), User Datagram
Protocol (UDP) and Generic Routing Encapsulation (GRE) traffic requests. Web-tier and front-end
services are ideal candidates for NLB.
Component Load Balancing, which is a feature of Microsoft Application Center 2000, provides dynamic
load balancing of middle-tier application components that use COM+. With Component Load Balancing
(CLB), COM+ components can be load balanced over multiple nodes to dramatically enhance the
availability and scalability of software applications.
Server cluster provides failover support for applications and services that require high availability,
scalability and reliability. With clustering, organizations can make applications and data available on
multiple servers linked together in a cluster configuration. Back-end applications and services, such as
those provided by database servers, are ideal candidates for Server cluster. Some of the components of
Server clusters are discussed below.
Quorum
A quorum is the cluster’s configuration database; it tells the cluster which node should be active.
Standard quorum: It is a configuration database for the cluster and is stored on a shared hard
disk, accessible to all of the cluster’s nodes.
The other thing that the quorum does is to intervene when communications fail between nodes.
Normally, each node within a cluster can communicate with every other node in the cluster over a
dedicated network connection. If this network connection were to fail though, the cluster would be split
into two pieces, each containing one or more functional nodes that cannot communicate with the nodes
that exist on the other side of the communications failure.
When this type of communications failure occurs, the cluster is said to have been partitioned. The
problem is that both partitions have the same goal: to keep the application running. The application can’t
be run on multiple servers simultaneously, though, so there must be a way of determining which partition
gets to run the application. This is where the quorum comes in. The partition that “owns” the quorum is
allowed to continue running the application; the other partition is removed from the cluster.
Majority Node Set (MNS) Quorum: The main difference between a standard quorum and an
MNS quorum is that in MNS each node has its own, locally stored copy of the quorum database. The
other way that an MNS quorum depends on majorities is in starting the nodes. A majority of the nodes
((number of nodes /2) +1) must be online before the cluster will start the virtual server. If fewer than the
majority of nodes are online, then the cluster is said to “not have quorum”. In such a case, the necessary
services will keep restarting until a sufficient number of nodes are present.
One of the most important things about MNS is that you must have at least three nodes in the cluster.
Remember that a majority of nodes must be running at all times. If a cluster has only two nodes, the
majority is calculated to be 2 ((2 nodes / 2) + 1 = 2). Therefore, if one node were to fail, the entire cluster
would go down because it would not have quorum.
File share witness
The file share witness feature is an improvement to the Majority Node Set (MNS) quorum model. This feature
lets you use a file share that is external to the cluster as an additional "vote" to determine the status of the
cluster in a two-node MNS quorum cluster deployment.
Consider a two-node MNS quorum cluster. Because an MNS quorum cluster can only run when the
majority of the cluster nodes are available, a two-node MNS quorum cluster is unable to sustain the failure
of any cluster node. This is because the majority of a two-node cluster is two. To sustain the failure of any
one node in an MNS quorum cluster, you must have at least three devices that can be considered as
available. The file share witness feature enables you to use an external file share as a witness. This witness
acts as the third available device in a two-node MNS quorum cluster. Therefore, with this feature enabled,
a two-node MNS quorum cluster can sustain the failure of a single cluster node. Additionally, the file
share witness feature provides the following two functionalities: It helps protect the cluster against a
problem that is known as a split brain (a condition that occurs when all networks fail). It helps protect the
cluster against a problem that is known as a partition in time.
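The vote arithmetic above can be sketched as follows. This is an illustrative model, not Exchange or Windows clustering code; the function names are assumptions.

```python
# Illustrative model of MNS quorum voting. A file share witness, when
# configured, contributes one extra vote to the cluster.

def majority(total_votes):
    """Majority formula from the MNS discussion: (number of votes / 2) + 1."""
    return total_votes // 2 + 1

def has_quorum(nodes_total, nodes_online, witness_configured=False, witness_online=False):
    """True if the surviving members (plus the witness vote, if any) form a majority."""
    total_votes = nodes_total + (1 if witness_configured else 0)
    online_votes = nodes_online + (1 if witness_configured and witness_online else 0)
    return online_votes >= majority(total_votes)
```

A two-node MNS cluster loses quorum when one node fails (`has_quorum(2, 1)` is False), but with a file share witness acting as the third vote, the same failure is survivable (`has_quorum(2, 1, witness_configured=True, witness_online=True)` is True).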
Fundamentals of DAG
DAG
A database availability group (DAG) is the base component of the high availability and site
resilience framework built into Microsoft Exchange Server 2010. A DAG is a group of up to 16 Mailbox
servers that host a set of databases and provide automatic database-level recovery from failures that affect
individual servers or databases.
A DAG is a boundary for mailbox database replication, for database and server switchovers and failovers,
and for an internal component called Active Manager. Active Manager is an Exchange 2010 component
that runs on every server in a DAG and manages switchovers and failovers.
What DAG changes
1. No more Exchange Virtual Servers/Clustered Mailbox Servers.
2. A database is no longer associated with a server but is an organization-level resource.
3. There is no longer a requirement to choose clustered or non-clustered at installation; an Exchange
2010 server can move in and out of a DAG as needed.
4. The limitation that a clustered Exchange server can host only the Mailbox role is gone.
5. Storage Groups have been removed from Exchange.
Server
A server is the unit of membership for a DAG. A server hosts active and passive copies of multiple
mailbox databases and executes various services against them, such as the Information Store and the
Mailbox Assistants. A server is also responsible for running the replication service for its passive
mailbox database copies, and it provides the connection point between the Information Store and RPC
Client Access. It defines only a few server-level properties relevant to high availability (HA), such as the
server’s DAG and its activation policy.
Mailbox Database
A database is the unit of failover in a DAG. A database has only one active copy at a time, which can
be mounted or dismounted. A mailbox database can have as many as 15 passive copies, depending on the
number of Mailbox servers available. Ideally, a database failover takes only about 30 seconds. A server
failover/switchover involves moving all active databases to one or more other servers. Database names are
unique across a forest. A mailbox database defines properties such as its GUID, the EDB file path, and the
names of the servers hosting copies.
Mailbox Availability Terms
Active Mailbox: Provides mail services to the clients.
Passive Mailbox: Available to provide mail services to the clients if the active copy fails.
Source Mailbox: Provides data for copying to a separate location.
Target Mailbox: Receives data from the source.
Mailbox Database Copy
It defines the scope of database replication. A database copy is either the source or the target of replication
at any given time, and is either active or passive at any given time. Only one copy of each database in a
DAG is active at a time, and a server can host at most one copy of any given database.
Active Manager
For Exchange, Active Directory is the primary source for configuration information, whereas Active
Manager is the primary source for changeable state information, such as which copy is active and mounted.
Active Manager is an Exchange-aware resource manager, often described as the brain of high availability.
Active Manager runs on every server in the DAG and manages which copies should be active and which
should be passive. It is also the definitive source of information on where a database is active or mounted,
and it provides this information to other Exchange components (e.g., RPC Client Access and Hub
Transport). Active Manager information is stored in the cluster database.
In Exchange Server 2010, the Microsoft Exchange Replication service periodically monitors the health of
all mounted databases. In addition, it also monitors Extensible Storage Engine (ESE) for any I/O errors or
failures. When the service detects a failure, it notifies Active Manager. Active Manager then determines
which database copy should be mounted and what is required to mount that database. In addition, it tracks
the active copy of a mailbox database (based on the last mounted copy of the database) and provides the
tracking information to the RPC Client Access component on the Client Access server to which the
client is connected.
When an administrator makes a database copy the active mailbox database, this process is known as a
switchover. When a failure affecting a database occurs and a new database becomes the active copy, this
process is known as a failover. This process also refers to a server failure in which one or more servers
bring online the databases previously online on the failed server. When either a switchover or failover
occurs, other Exchange Server 2010 server roles become aware of the switchover almost immediately and
will redirect client and messaging traffic to the new active database.
For example, if an active database in a DAG fails because of an underlying storage failure, Active Manager
will automatically recover by failing over to a database copy on another Mailbox server in the DAG. In the
event the database is outside the automatic mount criteria and cannot be automatically mounted, an
administrator can manually perform a database failover.
Primary Active Manager (PAM): The PAM is the Active Manager in the DAG that decides which
copies will be active and passive. It moves to another server if the server hosting it is no longer able to run
it; you need to move the PAM manually if you take a server offline for maintenance or an upgrade. The PAM is responsible for
getting topology change notifications and reacting to server failures. PAM is a role of an Active Manager.
If the server hosting the PAM fails, another instance of Active Manager adopts the role (the one that takes
ownership of the cluster group). The PAM controls all movement of the active designations between a
database’s copies (only one copy can be active at any given time, and that copy may be mounted or
dismounted). The PAM also performs the functions of the SAM role on the local system (detecting local
database and local Information Store failures).
Standby Active Manager (SAM): SAM provides information on which server hosts the active
copy of a mailbox database to other components of Exchange, e.g., RPC Client Access Service or Hub
Transport. SAM detects failures of local databases and the local Information Store. It reacts to failures by
asking the PAM to initiate a failover (if the database is replicated). A SAM does not determine the target
of failover, nor does it update a database’s location state in the PAM. It will access the active database
copy location state to answer queries for the active copy of the database that it receives from CAS, Hub,
etc.
Active Manager Best Copy Selection
When a failure occurs that affects a replicated mailbox database, the PAM initiates failover logic and
selects the best available database copy for activation, using up to ten separate sets of criteria. Active
Manager first attempts to locate mailbox database copies with a status of Healthy, DisconnectedAndHealthy,
DisconnectedAndResynchronizing, or SeedingSource; then, depending on the status of content
indexing, the replay queue length, and the copy queue length, it determines the best copy to activate.
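The selection logic can be sketched roughly as follows. This is a simplified illustration with hypothetical field names, not the real Active Manager algorithm, which weighs up to ten criteria sets:

```python
# Simplified sketch of best copy selection. The dict keys ('status',
# 'ci_healthy', 'copy_queue', 'replay_queue') are illustrative.
ACTIVATION_STATES = {"Healthy", "DisconnectedAndHealthy",
                     "DisconnectedAndResynchronizing", "SeedingSource"}

def pick_best_copy(copies):
    """Filter copies to activatable statuses, then prefer a healthy content
    index and the shortest copy and replay queues."""
    candidates = [c for c in copies if c["status"] in ACTIVATION_STATES]
    if not candidates:
        return None
    candidates.sort(key=lambda c: (not c["ci_healthy"],
                                   c["copy_queue"],
                                   c["replay_queue"]))
    return candidates[0]
```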
Continuous Replication
Continuous replication combines asynchronous log shipping with log replay technology. It includes the
following steps:
Database copy seeding of target
Log copying from source to target
Log inspection at target
Log replay into database copy
Database Seeding
Seeding is the process of making available a baseline copy of a database on the passive nodes. Depending
on the situation, seeding can be an automatic process or a manual process in which you initiate the
seeding.
Automatic seeding: An automatic seed produces a copy of a database in the target location. Automatic
seeding requires that all log files, including the very first log file created by the database (it contains the
database creation log record), be available on the source. Automatic seeding only occurs during the
creation of a new server or creation of a new database (or if the first log still exists, i.e. log truncation
hasn’t occurred).
Seeding using the Update-MailboxDatabaseCopy cmdlet: You can use the Update-MailboxDatabaseCopy
cmdlet in the Exchange Management Shell to seed a database copy. This option utilizes the streaming copy
backup API to copy the database from the active location to the target location.
Manually copying the offline database: This process dismounts the database and copies the database file to
the same location on the passive node. If you use this method, there will be an interruption in service
because the procedure requires you to dismount the database.
Seeding is required under the following conditions:
When a new passive node is introduced into a DAG environment and the first log file of the
production Database is not available.
After a failover occurs in which data is lost as a result of the now-passive copy having become
diverged and unrecoverable.
When the system has detected a corrupted log file that cannot be replayed into the passive copy.
After an offline defragmentation of the database occurs.
After a page scrubbing of the active copy of a database occurs, and you want to propagate the
changes to the passive copy.
After the log generation sequence for the database has been reset back to 1.
Log Shipping
Log shipping allows you to automatically send transaction log backups from a primary database on a
primary server instance to one or more secondary databases on separate secondary server instances. Log
shipping in Exchange Server 2010 leverages TCP sockets and supports encryption and compression.
Administrators can set the TCP port to be used for replication.
The Replication service on the target notifies the active instance of the next log file it expects, based on
the last log file it inspected. The Replication service on the source responds by sending the required log
file(s). Copied log files are placed in the target’s Inspector directory.
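The pull model above can be sketched like this; log files are represented as a dict keyed by generation number, which is an assumption for illustration:

```python
# Illustrative sketch of the log shipping pull model: the target asks for
# everything after the last log it inspected, and the source returns the
# missing generations.

def missing_generations(last_inspected_gen, source_logs):
    """Generations the target still needs, in ascending order."""
    return [gen for gen in sorted(source_logs) if gen > last_inspected_gen]
```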
Log Inspection
The Log Inspector is responsible for verifying that the log files are valid. The following actions are
performed by the Log Inspector:
Physical integrity inspection This validation utilizes ESEUTIL /K against the log file and validates that
the checksum recorded in the log file matches the checksum generated in memory.
Header inspection The Replication service validates the following aspects of the log file’s header:
The generation is not higher than the highest generation recorded for the database in question.
The generation recorded in the log header matches the generation recorded in the log filename.
The log file signature recorded in the log header matches that of the log file.
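The header checks above can be sketched as follows; the header layout here is a hypothetical dict, not the real ESE log format:

```python
# Hedged sketch of the three header validations described above.

def inspect_log_header(header, filename_generation, highest_generation, expected_signature):
    """Return a list of validation failures; an empty list means the header passed."""
    failures = []
    if header["generation"] > highest_generation:
        failures.append("generation exceeds highest recorded for this database")
    if header["generation"] != filename_generation:
        failures.append("header generation does not match filename generation")
    if header["signature"] != expected_signature:
        failures.append("log signature mismatch")
    return failures
```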
Removal of Exx.log Before the inspected log file can be moved into the log folder, the Replication service
needs to remove any Exx.log files. These log files are placed into another sub-directory of the log
directory, the ExxOutofDate directory. An Exx.log file would only exist on the target if it was previously
running as a source. The Exx.log file needs to be removed before log replay occurs because it will contain
old data which has been superseded by a full log file with the same generation. If the closed log file is not
a superset of the existing Exx.log file, an incremental or full reseed is required.
Log Replay
After the log files have been inspected, they are placed within the log directory so that they can be
replayed in the database copy. Before the Replication service replays the log files, it performs a series of
validation tests. Once these validation checks have been completed, the Replication service replays the
logs into the database copy.
Lossy Failure Process
In the event of failure, the following steps will occur for the failed database:
1. Active Manager will determine the best copy to activate
2. The Replication service on the target server will attempt to copy missing log files from the source
– ACLL (Attempt To Copy Last Log)
3. If successful (for example, because the server is online and the shares and necessary data are
accessible), then the database will mount with zero data loss.
4. If unsuccessful (lossy failure), then the database will mount based on the AutoDatabaseMountDial
setting
5. The mounted database will generate new log files (using the same log generation sequence)
6. Transport Dumpster requests will be initiated for the mounted database to recover lost messages
7. When original server or database recovers, it will run through divergence detection and perform
an incremental reseed or require a full reseed
AutoDatabaseMountDial
There are three possible values for the server setting AutoDatabaseMountDial.
Lossless Lossless is zero logs lost. When the attribute is set to Lossless, under most circumstances
the system waits for the failed node to come back online before databases are mounted. Even then
the failed system must return with all logs accessible and not corrupted. After the failure, the
passive node is made active, and the Microsoft Exchange Information Store service is brought
online. It checks to determine whether the databases can be mounted without any data loss. If
possible, the databases are mounted. If they cannot be automatically mounted, the system
periodically attempts to copy the logs. If the server returns with its logs intact, this attempt will
eventually succeed, and the databases will mount. If the server returns without its logs intact, the
remaining logs will not be available, and the affected databases will not mount automatically. In
this event, administrative action is required to force the database to mount when logs are lost.
Good availability Good availability is three logs lost. Good availability provides fully automatic
recovery when replication is operating normally and replicating logs at the rate they are being
generated.
Best availability Best availability is six logs lost, which is the default setting. Best availability
operates similarly to Good availability, but it allows automatic recovery when the replication
experiences slightly more latency. Thus, the new active node might be slightly farther behind the
state of the old active node after the failover, thereby increasing the likelihood that database
divergence occurs, which requires a full reseed to correct.
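The three dial settings above reduce to a simple threshold on the number of lost logs. A minimal sketch (the function name is an assumption; this is not Exchange code):

```python
# The AutoDatabaseMountDial values expressed as mount thresholds (logs lost).
DIAL_THRESHOLDS = {
    "Lossless": 0,          # zero logs lost
    "GoodAvailability": 3,  # up to three logs lost
    "BestAvailability": 6,  # up to six logs lost (the default)
}

def should_auto_mount(lost_logs, dial="BestAvailability"):
    """True if the copy may mount automatically after a lossy failover."""
    return lost_logs <= DIAL_THRESHOLDS[dial]
```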
Incremental Resync
In Exchange Server 2007, LLR (Lost Log Resilience) delayed writes to the active database to minimize
divergence between an old failed active copy and the new active copy, and thereby minimize the need
for full reseeds. Changes were written to the passive database before they were written to the active
database. When the old failed active copy came back, it was unlikely to contain data that had never made
it to the passive copy before the failure. Only when it did contain such data would it have to receive a full
reseed when it came back online.
In Exchange Server 2010, we now have two incremental resync solutions. Incremental resync v1 is based
on LLR depth and is only used when the waypoint = 1 (i.e. we’ve only lost one log). Incremental resync
v2 is used when more than a single log is lost and has the following process:
1. Active DB1 on server1 fails and is a lossy failure.
2. Passive DB1 copy on Server3 takes over service.
3. Sometime later, failed DB1 on Server1 comes back as passive, but contains inconsistent data.
4. The Replication service on Server1 will compare the transaction logs on Server1 with those on
Server3, starting with the newest generation and working backwards to locate the divergence point.
5. Once the divergence point is located, the log records of the diverged logs on Server1 will be
scanned and a list of page records will be built.
6. The replication service will then copy over the corresponding page records and logs from Server3.
In addition, the database header min/max required logs will also be obtained from the active db
copy on Server3.
7. Replication Service on Server1 will then revert the changes of diverged logs by inserting the
correct pages from Server3.
8. Server1’s copy’s db header will be updated with the appropriate min/max log generations.
9. Log recovery is then run to get the db copy current.
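The divergence search in step 4 can be sketched as follows; log contents are modeled as a dict mapping generation number to a checksum, which is an assumption for illustration:

```python
# Illustrative divergence search: walk backwards from the newest local
# generation until the two copies' logs match.

def find_divergence_point(local_logs, active_logs):
    """Return the oldest diverged generation, or None if the copies agree."""
    divergence = None
    for gen in sorted(local_logs, reverse=True):
        if active_logs.get(gen) == local_logs[gen]:
            break  # logs match from this generation backwards
        divergence = gen
    return divergence
```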
Datacenter Activation Coordination
DAC mode is used to control the activation behavior of a DAG when a catastrophic (disastrous or
extremely harmful) failure occurs that affects the DAG (for example, a complete failure of one of the
datacenters). When DAC mode isn't enabled, and a failure affecting multiple servers in the DAG occurs,
when a majority of servers are restored after the failure, the DAG will restart and attempt to mount
databases. In a multi-datacenter configuration, this behavior could cause split brain syndrome, a
condition that occurs when all networks fail, and DAG members can't receive heartbeat signals from each
other. Split brain syndrome also occurs when network connectivity is severed between the datacenters.
Split brain syndrome is prevented by always requiring a majority of the DAG members (and in the case of
DAGs with an even number of members, the DAG's witness server) to be available and interacting for the
DAG to be operational. When a majority of the members are communicating, the DAG is said to have a
quorum.
DAC is designed to prevent this by implementing a “mommy may I” protocol, the Datacenter Activation
Coordination Protocol (DACP). In the event of a catastrophic loss, when the DAG recovers it cannot
mount databases just because quorum is present in the DAG; instead, it must coordinate with the other
Active Managers in the DAG to determine state.
Consider the two-datacenter scenario. Suppose there is a complete power failure in the primary
datacenter. In this event, all of the servers and the WAN are down, so the organization makes the decision
to activate the standby datacenter. In almost all such recovery scenarios, when power is restored to the
primary datacenter, WAN connectivity is typically not immediately restored. This means that the DAG
members in the primary datacenter will power up, but they won’t be able to communicate with the DAG
members in the activated standby datacenter. The primary datacenter should always contain the majority
of the DAG quorum voters, which means that when power is restored, even in the absence of WAN
connectivity to the DAG members in the standby datacenter, the DAG members in the primary datacenter
have a majority and therefore have quorum. This is a problem because with quorum, these servers may be
able to mount their databases, which in turn would cause divergence from the actual active databases that
are now mounted in the activated standby datacenter.
DACP was created to address this issue. Active Manager stores a bit in memory (either a 0 or a 1) that tells
the DAG whether it's allowed to mount local databases that are assigned as active on the server. When a
DAG is running in DAC mode (which would be any DAG with three or more members), each time Active
Manager starts up the bit is set to 0, meaning it isn't allowed to mount databases. Because it's in DAC
mode, the server must try to communicate with all other members of the DAG that it knows to get
another DAG member to give it an answer as to whether it can mount local databases that are assigned as
active to it. The answer comes in the form of the bit setting for other Active Managers in the DAG. If
another server responds that its bit is set to 1, it means servers are allowed to mount databases, so the
server starting up sets its bit to 1 and mounts its databases.
But when you recover from a primary datacenter power outage where the servers are recovered but WAN
connectivity has not been restored, all of the DAG members in the primary datacenter will have a DACP
bit value of 0; and therefore none of the servers starting back up in the recovered primary datacenter will
mount databases, because none of them can communicate with a DAG member that has a DACP bit value
of 1.
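The DACP handshake described above can be sketched as a small simulation; the class and method names are assumptions, not Exchange internals:

```python
# Sketch of the DACP startup handshake. Each Active Manager boots with its bit
# at 0 and may mount local active databases only after a reachable peer
# reports a bit of 1.

class ActiveManagerSim:
    def __init__(self, name):
        self.name = name
        self.dacp_bit = 0  # always starts at 0 in DAC mode

    def try_mount(self, reachable_peers):
        """Ask peers for their DACP bit; mount only if any peer's bit is 1."""
        if any(peer.dacp_bit == 1 for peer in reachable_peers):
            self.dacp_bit = 1
            return True
        return False
```

Two freshly restarted members that can only see each other both keep their bit at 0 and refuse to mount, which is exactly the behavior that protects the recovered primary datacenter in the scenario above.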
Transport Dumpster
You may be aware that the transport dumpster is part of the Hub Transport server. Although it lives on
the Hub Transport server, it works in conjunction with the DAG, so it is worth discussing here.
The transport dumpster is a feature designed to minimize data loss by redelivering recently submitted
messages back to the Mailbox server after a lossy failure.
Improvements in Transport Dumpster
In Exchange 2007, messages were retained in the transport dumpster until the administrator-defined time
limit or size limit was reached. In Exchange 2010, the transport dumpster now receives feedback from the
replication pipeline to determine which messages have been delivered and replicated. As a message goes
through Hub Transport servers on its way to a replicated mailbox database in a DAG, a copy is kept in the
transport queue (mail.que) until the replication pipeline has notified the Hub Transport server that the
transaction logs representing the message have been successfully replicated to and inspected by all copies
of the mailbox database. After the logs have been replicated to and inspected by all database copies, they
are truncated from the transport dumpster. This keeps the transport dumpster queue smaller by
maintaining only copies of messages whose transaction logs haven't yet been replicated.
The transport dumpster has also been enhanced to account for the changes to the Mailbox server role that
enable a single mailbox database to move between Active Directory sites. DAGs can be extended to
multiple Active Directory sites, and as a result, a single mailbox database in one Active Directory site can
fail over to another Active Directory site. When this occurs, any transport dumpster redelivery requests
will be sent to both Active Directory sites: the original site and the new site.
Whenever a Hub Transport server receives a message, it undergoes categorization. Part of the
categorization process involves querying Active Directory to determine whether the destination database
containing the recipient’s mailbox belongs to a DAG. Once the message has been delivered to all
recipients, it is committed to the mail.que file on the Hub Transport server and stored in the transport
dumpster inside the mail.que file. The transport dumpster is maintained for each DAG-replicated database
within each Active Directory site. There are two settings that define the life of a message within
the transport dumpster. They are:
MaxDumpsterSizePerDatabase: The MaxDumpsterSizePerDatabase parameter specifies the
maximum size of the transport dumpster on a Hub Transport server for each database. The default
value is 18 MB. The valid input range for this parameter is from 0 through 2147483647 KB. The
recommendation is that this be set to 1.5 times the maximum message size limit within your
environment. If you do not have a maximum message size limit set, then you should evaluate the
messages that are delivered within your environment and set the value to 1.5 times the average
message size in your organization.
When you enter a value, qualify the value with one of the following units:
KB (kilobytes)
MB (megabytes)
GB (gigabytes)
TB (terabytes)
Unqualified values are treated as kilobytes.
MaxDumpsterTime: Defines the length of time that a message remains within the transport
dumpster if the dumpster size limit is not reached. The default is seven days.
If either the time or size limit is reached, messages are removed from the transport dumpster by order of
first in, first out.
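The sizing guidance above is simple arithmetic: 1.5 times the maximum (or average) message size, in the kilobytes the parameter treats as its default unit. A minimal sketch (the function name is an assumption):

```python
# MaxDumpsterSizePerDatabase sizing rule from the text: 1.5 x message size.

def recommended_dumpster_size_kb(message_size_kb):
    return int(message_size_kb * 1.5)
```

For example, a 12 MB (12288 KB) maximum message size yields 18432 KB, which lines up with the 18 MB default.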
When a failover (unscheduled outage) occurs, the Replication service will attempt to copy the missing log
files. If the copy attempt fails, then this is known as a lossy failover and the following steps are taken.
1. If the databases are within the AutoDatabaseMountDial value, they will automatically mount.
2. The Replication service will record that the Database requires Transport Dumpster redelivery in
the cluster database by setting the DumpsterRedeliveryRequired key to true.
3. The Replication service will record, in the cluster database, the Hub Transport servers that exist
within the Mailbox server’s Active Directory site.
4. The Replication service will calculate the loss window. This is done using the LastLogInspected
marker as the start time and the current time as the end time. Since the transport dumpster is
based on message delivery times, the loss window is generously padded by expanding it 12 hours
back and 4 hours forward. The start time is recorded in DumpsterRedeliveryStartTime and the end
time is recorded in DumpsterRedeliveryEndTime.
5. The Replication service makes an RPC call to the Hub Transport servers listed in
DumpsterRedeliveryServers requesting dumpster redelivery for the given loss time window.
6. The Hub Transport server will acknowledge the first redelivery request with a Retry response.
7. The Hub Transport server will redeliver the messages it has within its transport dumpster for the
allotted time window. Once the message is resubmitted for delivery, the message is removed from
the transport dumpster.
8. The Replication service makes an RPC call to the Hub Transport servers listed in
DumpsterRedeliveryServers requesting dumpster redelivery for the given loss time window.
9. The Hub Transport servers that have successfully redelivered the dumpster messages will
acknowledge the redelivery request with a Success response. At this point the Replication service
will remove those Hub Transport servers from the DumpsterRedeliveryServers key.
10. This process will continue until either all Hub Transport servers have redelivered the mail, or the
MaxDumpsterTime has been reached.
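The loss-window padding in step 4 can be sketched as follows (an illustrative helper, not Exchange code):

```python
# Loss window from step 4: pad 12 hours back from the last inspected log and
# 4 hours forward from the failure time.
from datetime import datetime, timedelta

def loss_window(last_log_inspected, failure_time):
    start = last_log_inspected - timedelta(hours=12)
    end = failure_time + timedelta(hours=4)
    return start, end
```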
Note: If there are no message size limits in your organization, a single 18 MB message will purge all other
messages for given Database on a given Hub Transport server.