HCL Infosystem


Database Availability Group
Fundamentals of DAG




Sarath Kumar M
3/4/2010
Contents


Exchange Server 2010 - Database Availability Group
Let's Begin DAG
Some Clustering Basics
   Network Load Balancing
   Component Load Balancing
   Server cluster
       Quorum
       Standard quorum
       Majority Node Set (MNS) Quorum
       File share witness
Fundamentals of DAG
   DAG
   Mailbox Database
   Mailbox Database Copy
   Active Manager
       Primary Active Manager (PAM)
       Standby Active Manager (SAM)
       Active Manager Best Copy Selection
   Continuous Replication
   Database Seeding
   Log Shipping
   Log Inspection
   Log Replay
   Lossy Failure Process
       AutoDatabaseMountDial
   Incremental Resync
   Datacenter Activation Coordination
Transport Dumpster
Reference


Exchange Server 2010 - Database Availability Group

Let’s Begin DAG
        The DAG is one of the major enhancements in Exchange 2010. LCR, CCR and SCR from Exchange 2007
are dropped in Exchange 2010, and the DAG is introduced as a single high availability (HA) solution. Exchange
2010 uses the same continuous replication technology found in Exchange Server 2007, but unites on-site (CCR)
and off-site (SCR) data replication into one framework. Exchange manages all aspects of failover, and no
Windows clustering knowledge is required because the DAG configures clustering by itself. A DAG can span up
to 16 nodes, so a database can have as many as 15 additional copies, compared to the two-node CCR cluster.
The DAG also makes failover more granular: it happens at the database level rather than the server level, so the
failure of one database in a DAG does not trigger an entire server failover that would affect users of the other
databases on that server. A server that is part of a DAG can still hold other server roles, which reduces the
minimum number of servers required to build a redundant Exchange environment to two. A DAG can easily be
stretched across sites to provide site resilience in a disaster. In CCR the passive server sits idle, whereas in a
DAG active databases can be distributed among the nodes.
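
As a rough illustration of how little clustering work is exposed to the administrator, the following Exchange
Management Shell sketch creates a DAG and adds Mailbox servers to it. The names DAG1, HUB1, MBX1 and
MBX2, and the witness path, are placeholders for this example:

    # Create the DAG; Exchange configures the underlying failover cluster on its own.
    New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer HUB1 -WitnessDirectory C:\DAG1Witness

    # Add Mailbox servers as DAG members (up to 16).
    Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer MBX1
    Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer MBX2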


Some Clustering Basics
Before we begin with DAG, it is worth looking at some clustering technologies; this background will help
you understand DAG much more quickly.


The concept of a cluster involves taking two or more computers and organizing them to work together to
provide higher availability, reliability and scalability than can be obtained by using a single system. When
failure occurs in a cluster, resources can be redirected and the workload can be redistributed. A Server
cluster provides high availability by making application software and data available on several servers
linked together in a cluster configuration. If one server stops functioning, a process called failover
automatically shifts the workload of the failed server to another server in the cluster. The failover process
is designed to ensure continuous availability of critical applications and data.
There are mainly three types of clustering in Windows Server.


Network Load Balancing provides failover support for IP-based applications and services that require
high scalability and availability. With Network Load Balancing (NLB), organizations can build groups of
clustered computers to support load balancing of Transmission Control Protocol (TCP), User Datagram
Protocol (UDP) and Generic Routing Encapsulation (GRE) traffic requests. Web-tier and front-end
services are ideal candidates for NLB.


Component Load Balancing, which is a feature of Microsoft Application Center 2000, provides dynamic
load balancing of middle-tier application components that use COM+. With Component Load Balancing
(CLB), COM+ components can be load balanced over multiple nodes to dramatically enhance the
availability and scalability of software applications.


Server cluster provides failover support for applications and services that require high availability,
scalability and reliability. With clustering, organizations can make applications and data available on
multiple servers linked together in a cluster configuration. Back-end applications and services, such as
those provided by database servers, are ideal candidates for Server cluster. Some of the components of
Server clusters are discussed below.


Quorum
A quorum is the cluster's configuration database; among other things, it tells the cluster which node should be active.


        Standard quorum: It is a configuration database for the cluster and is stored on a shared hard
disk, accessible to all of the cluster’s nodes.


The other thing that the quorum does is to intervene when communications fail between nodes.
Normally, each node within a cluster can communicate with every other node in the cluster over a
dedicated network connection. If this network connection were to fail though, the cluster would be split
into two pieces, each containing one or more functional nodes that cannot communicate with the nodes
that exist on the other side of the communications failure.


When this type of communications failure occurs, the cluster is said to have been partitioned. The
problem is that both partitions have the same goal: to keep the application running. The application can't
be run on multiple servers simultaneously though, so there must be a way of determining which partition
gets to run the application. This is where the quorum comes in. The partition that “owns” the quorum is
allowed to continue running the application. The other partition is removed from the cluster.


          Majority Node Set (MNS) Quorum: The main difference between a standard quorum and an
MNS quorum is that in MNS each node has its own, locally stored copy of the quorum database. The
other way that an MNS quorum depends on majorities is in starting the nodes: a majority of the nodes
((number of nodes / 2) + 1) must be online before the cluster will start the virtual server. If fewer than the
majority of nodes are online, the cluster is said to "not have quorum". In such a case, the necessary
services keep restarting until a sufficient number of nodes are present.


One of the most important things about MNS is that you must have at least three nodes in the cluster.
Remember that a majority of nodes must be running at all times. If a cluster has only two nodes, the
majority is calculated as (2 / 2) + 1 = 2. Therefore, if one node were to fail, the entire cluster would go
down because it would no longer have quorum.
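
As a quick illustration of the majority rule described above, here is a minimal PowerShell sketch (the node
counts are arbitrary examples) that computes how many voters must be online for a cluster of a given size:

    # An MNS-style majority is floor(n / 2) + 1 voters.
    function Get-QuorumMajority([int]$NodeCount) {
        [math]::Floor($NodeCount / 2) + 1
    }
    Get-QuorumMajority 2    # 2 - a two-node MNS cluster cannot survive the loss of any node
    Get-QuorumMajority 3    # 2 - a three-node cluster can survive the loss of one node
    Get-QuorumMajority 16   # 9 - a 16-node cluster can lose up to 7 voters and keep quorum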


File share witness
The file share witness feature is an improvement to the Majority Node Set (MNS) quorum model. This feature
lets you use a file share that is external to the cluster as an additional "vote" to determine the status of the
cluster in a two-node MNS quorum cluster deployment.


Consider a two-node MNS quorum cluster. Because an MNS quorum cluster can only run when the
majority of the cluster nodes are available, a two-node MNS quorum cluster is unable to sustain the failure
of any cluster node. This is because the majority of a two-node cluster is two. To sustain the failure of any
one node in an MNS quorum cluster, you must have at least three devices that can be considered as
available. The file share witness feature enables you to use an external file share as a witness. This witness
acts as the third available device in a two-node MNS quorum cluster. Therefore, with this feature enabled,
a two-node MNS quorum cluster can sustain the failure of a single cluster node. Additionally, the file
share witness feature provides two additional functions: it helps protect the cluster against a problem
known as split brain (a condition that occurs when all networks fail), and it helps protect the cluster
against a problem known as a partition in time.
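
Exchange 2010 DAGs with an even number of members use a file share witness in exactly this way. The
following sketch shows how the witness could be assigned from the Exchange Management Shell; DAG1,
HUB1 and the directory path are placeholder names for this example:

    # Point the DAG at a file share witness hosted on a Hub Transport server.
    Set-DatabaseAvailabilityGroup -Identity DAG1 -WitnessServer HUB1 -WitnessDirectory C:\DAG1Witness

    # Verify the witness configuration.
    Get-DatabaseAvailabilityGroup DAG1 | Format-List Name, WitnessServer, WitnessDirectory
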
Fundamentals of DAG




DAG

        A database availability group (DAG) is the base component of the high availability and site
resilience framework built into Microsoft Exchange Server 2010. A DAG is a group of up to 16 Mailbox
servers that host a set of databases and provide automatic database-level recovery from failures that affect
individual servers or databases.


A DAG is a boundary for mailbox database replication, database and server switchovers, and failovers, and
for an internal component called Active Manager. Active Manager is an Exchange 2010 component that
runs on every server in a DAG and manages switchovers and failovers.


        What DAG changes

    1. No more Exchange Virtual Servers/Clustered Mailbox Servers.
    2. A database is no longer associated with a server; it is an organization-level resource.
    3. There is no longer a requirement to choose clustered or non-clustered at installation; an Exchange
       2010 server can move in and out of a DAG as needed.
    4. The limitation of being able to host only the Mailbox role on a clustered Exchange server has been
       removed.
   5. Storage Groups have been removed from Exchange.


Server: A server is the unit of membership for a DAG. A server hosts active and passive copies of multiple
mailbox databases and runs the services that operate on them, such as the Information Store and the Mailbox
Assistants. A server is also responsible for running the replication service against its passive mailbox database
copies, and it provides the connection point between the Information Store and RPC Client Access. Only a few
server-level properties are relevant to High Availability (HA), such as the server's DAG membership and its
activation policy.
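
A short sketch of viewing and adjusting these server-level HA properties, assuming a placeholder Mailbox
server named MBX1 (DatabaseCopyAutoActivationPolicy is the activation policy referred to above):

    # Show the server's DAG membership and its activation policy.
    Get-MailboxServer MBX1 | Format-List Name, DatabaseAvailabilityGroup, DatabaseCopyAutoActivationPolicy

    # Block automatic activation of copies on this server, e.g. during maintenance.
    Set-MailboxServer -Identity MBX1 -DatabaseCopyAutoActivationPolicy Blocked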


Mailbox Database

        A database is the unit of failover in a DAG. A database has only one active copy at a time, which can be
mounted or dismounted. A mailbox database can have as many as 15 passive copies, depending on the number
of Mailbox servers available. Ideally, a database failover takes only about 30 seconds. A server
failover/switchover involves moving all active databases to one or more other servers. Database names are
unique across the forest. A mailbox database defines properties such as its GUID, the EDB file path, and the
names of the servers hosting copies.
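
A minimal sketch of inspecting those database-level properties, assuming a placeholder database named DB01:

    # GUID, EDB file path, and the servers that hold a copy of this database.
    Get-MailboxDatabase DB01 | Format-List Name, Guid, EdbFilePath, Servers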


       Mailbox Availability Terms


          Active Mailbox: Provides mail services to the clients.


         Passive Mailbox: Available to provide mail services to the clients if active copy fails.


         Source Mailbox: Provides data for copying to a separate location


         Target Mailbox: Receives data from the source
Mailbox Database Copy

A mailbox database copy defines the scope of database replication. A database copy is either the source or the
target of replication at any given time, and it is either active or passive at any given time. Only one copy of each
database in a DAG is active at a time, and a given server can host no more than one copy of any particular
database.
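
A sketch of adding a passive copy of a database on another DAG member, assuming placeholder names DB01
and MBX2:

    # Create a passive copy of DB01 on MBX2; automatic seeding begins immediately.
    Add-MailboxDatabaseCopy -Identity DB01 -MailboxServer MBX2 -ActivationPreference 2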


Active Manager

For Exchange Server, Active Directory is the primary source of configuration information, whereas Active
Manager is the primary source of changeable state information, such as which database copy is active and
whether it is mounted.


Active Manager is an Exchange-aware resource manager, often described as the brain of high availability.
Active Manager runs on every server in the DAG and manages which copies should be active and which
should be passive. It is also the definitive source of information on where a database is active or mounted,
and it provides this information to other Exchange components (e.g., RPC Client Access and Hub
Transport). Active Manager information is stored in the cluster database.

In Exchange Server 2010, the Microsoft Exchange Replication service periodically monitors the health of
all mounted databases. In addition, it also monitors Extensible Storage Engine (ESE) for any I/O errors or
failures. When the service detects a failure, it notifies Active Manager. Active Manager then determines
which database copy should be mounted and what is required to mount that database. In addition, it
tracks the active copy of a mailbox database (based on the last mounted copy of the database) and provides
that tracking information to the RPC Client Access component on the Client Access server to which the
client is connected.

When an administrator makes a database copy the active mailbox database, this process is known as a
switchover. When a failure affecting a database occurs and a new database becomes the active copy, this
process is known as a failover. This process also refers to a server failure in which one or more servers
bring online the databases previously online on the failed server. When either a switchover or failover
occurs, other Exchange Server 2010 server roles become aware of the switchover almost immediately and
will redirect client and messaging traffic to the new active database.

For example, if an active database in a DAG fails because of an underlying storage failure, Active Manager
will automatically recover by failing over to a database copy on another Mailbox server in the DAG. In the
event the database is outside the automatic mount criteria and cannot be automatically mounted, an
administrator can manually perform a database failover.
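
A sketch of the administrator-driven switchover and of manually overriding the mount criteria, assuming
placeholder names DB01 and MBX2:

    # Switchover: move the active copy of DB01 to MBX2.
    Move-ActiveMailboxDatabase DB01 -ActivateOnServer MBX2

    # If the best copy falls outside the automatic mount criteria, the dial can be overridden for this move.
    Move-ActiveMailboxDatabase DB01 -ActivateOnServer MBX2 -MountDialOverride BestEffort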

        Primary Active Manager (PAM): PAM is the Active Manager in the DAG which decides which
copies will be active and passive. It moves to other servers if the server hosting it is no longer able to. You
need to move the PAM if you take a server offline for maintenance or upgrade. PAM is responsible for
getting topology change notifications and reacting to server failures. PAM is a role of an Active Manager.
If the server hosting the PAM fails, another instance of Active Manager adopts the role (the one that takes
ownership of the cluster group). The PAM controls all movement of the active designations between a
database’s copies (only one copy can be active at any given time, and that copy may be mounted or
dismounted). The PAM also performs the functions of the SAM role on the local system (detecting local
database and local Information Store failures).

        Standby Active Manager (SAM): SAM provides information on which server hosts the active
copy of a mailbox database to other components of Exchange, e.g., RPC Client Access Service or Hub
Transport. SAM detects failures of local databases and the local Information Store. It reacts to failures by
asking the PAM to initiate a failover (if the database is replicated). A SAM does not determine the target
of failover, nor does it update a database’s location state in the PAM. It will access the active database
copy location state to answer queries for the active copy of the database that it receives from CAS, Hub,
etc.

Active Manager Best Copy Selection

When a failure occurs that affects a replicated mailbox database, the PAM initiates failover logic and
selects the best available database copy for activation. PAM uses up to ten separate sets of criteria when
locating the best copy to activate. When a failure affecting the active database occurs, Active Manager uses
several sets of selection criteria to determine which database copy should be activated. Active Manager
attempts to locate a mailbox database copy that has a status of Healthy, DisconnectedAndHealthy,
DisconnectedAndResynchronizing, or SeedingSource; then, depending on the status of the content
index, the replay queue length, and the copy queue length, it determines the best copy to activate.
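
The same health signals are exposed to administrators through Get-MailboxDatabaseCopyStatus. A sketch,
assuming a placeholder database named DB01:

    # Status, queue lengths and content index state for every copy of DB01.
    Get-MailboxDatabaseCopyStatus -Identity DB01 |
        Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState
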
Continuous Replication

Continuous replication combines asynchronous log shipping and log replay technology. It includes the
following steps:
          Database copy seeding of target
          Log copying from source to target
          Log inspection at target
          Log replay into database copy


Database Seeding

Seeding is the process of making available a baseline copy of a database on the passive nodes. Depending
on the situation, seeding can be an automatic process or a manual process in which you initiate the
seeding.


Automatic seeding: An automatic seed produces a copy of a database in the target location. Automatic
seeding requires that all log files, including the very first log file created by the database (it contains the
database creation log record), be available on the source. Automatic seeding only occurs during the
creation of a new server or creation of a new database (or if the first log still exists, i.e. log truncation
hasn’t                                                                                              occurred).
Seeding using the Update-MailboxDatabaseCopy cmdlet: You can use the Update-MailboxDatabaseCopy
cmdlet in the Exchange Management Shell to seed a database copy. This option utilizes the streaming copy
backup API to copy the database from the active location to the target location.
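
A sketch of reseeding an existing copy with this cmdlet, assuming a placeholder copy DB01\MBX2 (an
existing copy must be suspended before its files are overwritten):

    # Suspend the copy, reseed it from the active copy, then resume replication.
    Suspend-MailboxDatabaseCopy -Identity DB01\MBX2 -Confirm:$false
    Update-MailboxDatabaseCopy  -Identity DB01\MBX2 -DeleteExistingFiles
    Resume-MailboxDatabaseCopy  -Identity DB01\MBX2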


Manually copying the offline database: This process dismounts the database and copies the database file to
the same location on the passive node. If you use this method, there will be an interruption in service
because the procedure requires you to dismount the database.


Seeding is required under the following conditions:


          When a new passive node is introduced into a DAG environment and the first log file of the
          production Database is not available.
After a failover occurs in which data is lost as a result of the now passive copy having become
        diverged and unrecoverable.
        When the system has detected a corrupted log file that cannot be replayed into the passive copy.
        After an offline defragmentation of the database occurs.
        After a page scrubbing of the active copy of a database occurs, and you want to propagate the
        changes to the passive copy.
        After the log generation sequence for the database has been reset back to 1.

Log Shipping

Log shipping allows you to automatically send transaction log backups from a primary database on a
primary server instance to one or more secondary databases on separate secondary server instances. Log
shipping in Exchange Server 2010 leverages TCP sockets and supports encryption and compression.
The administrator can set the TCP port to be used for replication.


The Replication service on the target notifies the active instance of the next log file it expects, based on the
last log file it inspected. The Replication service on the source responds by sending the required log file(s),
and the copied log files are placed in the target's Inspector directory.
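
A sketch of the DAG-level replication settings mentioned above (TCP port, compression and encryption),
assuming a placeholder DAG named DAG1:

    # Change the TCP port used for log shipping and seeding (64327 is the default).
    Set-DatabaseAvailabilityGroup -Identity DAG1 -ReplicationPort 64327

    # Compression and encryption default to InterSubnetOnly; they can be enabled for all replication traffic.
    Set-DatabaseAvailabilityGroup -Identity DAG1 -NetworkCompression Enabled -NetworkEncryption Enabled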


Log Inspection

The log inspector is responsible for verifying that the log files are valid. The following actions are performed
by the LogInspector:


Physical integrity inspection This validation utilizes ESEUTIL /K against the log file and validates that
the checksum recorded in the log file matches the checksum generated in memory.
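
The same checksum verification can be run by hand with ESEUTIL; the log file name below is only an example:

    # Validate the checksums of a closed transaction log file.
    eseutil /k E00000000A1.log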


Header inspection The Replication service validates the following aspects of the log file’s header:


        The generation is not higher than the highest generation recorded for the database in question.
        The generation recorded in the log header matches the generation recorded in the log filename.
        The log file signature recorded in the log header matches that of the log file.


Removal of Exx.log Before the inspected log file can be moved into the log folder, the Replication service
needs to remove any Exx.log files. These log files are placed into another sub-directory of the log
directory, the ExxOutofDate directory. An Exx.log file would only exist on the target if it was previously
running as a source. The Exx.log file needs to be removed before log replay occurs because it will contain
old data which has been superseded by a full log file with the same generation. If the closed log file is not a
superset of the existing Exx.log file, an incremental resync or a full reseed will have to be performed.


Log Replay

After the log files have been inspected, they are placed within the log directory so that they can be
replayed in the database copy. Before the Replication service replays the log files, it performs a series of
validation tests. Once these validation checks have been completed, the Replication service will replay the
log files.


Lossy Failure Process

In the event of failure, the following steps will occur for the failed database:


    1. Active Manager will determine the best copy to activate
    2. The Replication service on the target server will attempt to copy missing log files from the source
        – ACLL (Attempt To Copy Last Log)
    3. If successful (for example, because the server is online and the shares and necessary data are
        accessible), then the database will mount with zero data loss.
    4. If unsuccessful (lossy failure), then the database will mount based on the AutoDatabaseMountDial
        setting
    5. The mounted database will generate new log files (using the same log generation sequence)
    6. Transport Dumpster requests will be initiated for the mounted database to recover lost messages
    7. When original server or database recovers, it will run through divergence detection and perform
        an incremental reseed or require a full reseed

AutoDatabaseMountDial
There are three possible values for the server setting AutoDatabaseMountDial.
        Lossless Lossless is zero logs lost. When the attribute is set to Lossless, under most circumstances
        the system waits for the failed node to come back online before databases are mounted. Even then
        the failed system must return with all logs accessible and not corrupted. After the failure, the
passive node is made active, and the Microsoft Exchange Information Store service is brought
       online. It checks to determine whether the databases can be mounted without any data loss. If
       possible, the databases are mounted. If they cannot be automatically mounted, the system
       periodically attempts to copy the logs. If the server returns with its logs intact, this attempt will
       eventually succeed, and the databases will mount. If the server returns without its logs intact, the
       remaining logs will not be available, and the affected databases will not mount automatically. In
       this event, administrative action is required to force the database to mount when logs are lost.
       Good availability Good availability is three logs lost. Good availability provides fully automatic
       recovery when replication is operating normally and replicating logs at the rate they are being
       generated.
       Best availability Best availability is six logs lost, which is the default setting. Best availability
       operates similarly to Good availability, but it allows automatic recovery when the replication
       experiences slightly more latency. Thus, the new active node might be slightly farther behind the
       state of the old active node after the failover, thereby increasing the likelihood that database
       divergence occurs, which requires a full reseed to correct.
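
A sketch of setting this value on a Mailbox server, assuming a placeholder server named MBX1:

    # Valid values are Lossless, GoodAvailability and BestAvailability (the default).
    Set-MailboxServer -Identity MBX1 -AutoDatabaseMountDial GoodAvailability
    Get-MailboxServer MBX1 | Format-List Name, AutoDatabaseMountDial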

Incremental Resync

In Exchange Server 2007, LLR (Lost Log Resilience) delayed writes to the active database to minimize
divergence between an old failed active copy and the new active copy, and thereby minimize the need for
full reseeds. Changes were written to the passive database before they were written to the active database.
When the old failed active came back, it was unlikely that it contained data that had never made it to the
passive before it failed. Only when it contained data that had never made it to the passive, did it have to
receive a full reseed when it came back online.


In Exchange Server 2010, we now have two incremental resync solutions. Incremental resync v1 is based
on LLR depth and is only used when the waypoint = 1 (i.e. we’ve only lost one log). Incremental resync
v2 is used when more than a single log is lost and has the following process:


   1. Active DB1 on server1 fails and is a lossy failure.
   2. Passive DB1 copy on Server3 takes over service.
   3. Sometime later, failed DB1 on Server1 comes back as passive, but contains inconsistent data.
4. Replication service on Server1 will compare the transaction logs on Server1 with Server3 starting
       with the newest generation and working backwards to locate the divergence point.
   5. Once the divergence point is located, the log records of the diverged logs on Server1 will be
       scanned and a list of page records will be built.
   6. The replication service will then copy over the corresponding page records and logs from Server3.
       In addition, the database header min/max required logs will also be obtained from the active db
       copy on Server3.
   7. Replication Service on Server1 will then revert the changes of diverged logs by inserting the
       correct pages from Server3.
   8. Server1’s copy’s db header will be updated with the appropriate min/max log generations.
   9. Log recovery is then run to get the db copy current.

Datacenter Activation Coordination

DAC mode is used to control the activation behavior of a DAG when a catastrophic (disastrous or
extremely harmful) failure occurs that affects the DAG (for example, a complete failure of one of the
datacenters). When DAC mode isn't enabled, and a failure affecting multiple servers in the DAG occurs,
when a majority of servers are restored after the failure, the DAG will restart and attempt to mount
databases. In a multi-datacenter configuration, this behavior could cause split brain syndrome, a
condition that occurs when all networks fail, and DAG members can't receive heartbeat signals from each
other. Split brain syndrome also occurs when network connectivity is severed between the datacenters.
Split brain syndrome is prevented by always requiring a majority of the DAG members (and in the case of
DAGs with an even number of members, the DAG's witness server) to be available and interacting for the
DAG to be operational. When a majority of the members are communicating, the DAG is said to have a
quorum.


DAC mode is designed to prevent this by implementing a "mommy, may I?" protocol, called the Datacenter
Activation Coordination Protocol (DACP). After a catastrophic loss, when the DAG recovers it cannot mount
databases just because quorum is present in the DAG; instead, it must coordinate with the other Active
Managers in the DAG to determine state.
Consider the two-datacenter scenario. Suppose there is a complete power failure in the primary
datacenter. In this event, all of the servers and the WAN are down, so the organization makes the decision
to activate the standby datacenter. In almost all such recovery scenarios, when power is restored to the
primary datacenter, WAN connectivity is typically not immediately restored. This means that the DAG
members in the primary datacenter will power up, but they won’t be able to communicate with the DAG
members in the activated standby datacenter. The primary datacenter should always contain the majority
of the DAG quorum voters, which means that when power is restored, even in the absence of WAN
connectivity to the DAG members in the standby datacenter, the DAG members in the primary datacenter
have a majority and therefore have quorum. This is a problem because with quorum, these servers may be
able to mount their databases, which in turn would cause divergence from the actual active databases that
are now mounted in the activated standby datacenter.


DACP was created to address this issue. Active Manager stores a bit in memory (either a 0 or a 1) that tells
the DAG whether it's allowed to mount local databases that are assigned as active on the server. When a
DAG is running in DAC mode (which would be any DAG with three or more members), each time Active
Manager starts up the bit is set to 0, meaning it isn't allowed to mount databases. Because it's in DAC
mode, the server must try to communicate with all other members of the DAG that it knows to get
another DAG member to give it an answer as to whether it can mount local databases that are assigned as
active to it. The answer comes in the form of the bit setting for other Active Managers in the DAG. If
another server responds that its bit is set to 1, it means servers are allowed to mount databases, so the
server starting up sets its bit to 1 and mounts its databases.


But when you recover from a primary datacenter power outage where the servers are recovered but WAN
connectivity has not been restored, all of the DAG members in the primary datacenter will have a DACP
bit value of 0; and therefore none of the servers starting back up in the recovered primary datacenter will
mount databases, because none of them can communicate with a DAG member that has a DACP bit value
of 1.
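
A sketch of enabling DAC mode on a DAG, assuming a placeholder DAG named DAG1. Once DAC mode is on,
the datacenter switchover cmdlets (Stop-DatabaseAvailabilityGroup, Restore-DatabaseAvailabilityGroup and
Start-DatabaseAvailabilityGroup) are used to activate the standby datacenter:

    # Enable Datacenter Activation Coordination mode for the DAG.
    Set-DatabaseAvailabilityGroup -Identity DAG1 -DatacenterActivationMode DagOnly

    # Confirm the setting.
    Get-DatabaseAvailabilityGroup DAG1 | Format-List Name, DatacenterActivationMode
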
Transport Dumpster
You may be aware that the transport dumpster is part of the Hub Transport server. Although it lives on the
Hub Transport server, it works in conjunction with the DAG, so it is worth covering its functionality while
discussing DAG. Let's see how it works.


The transport dumpster is a feature designed to minimize data loss by redelivering recently submitted
messages back to the mailbox server after a lossy failure.


Improvements in Transport Dumpster
In Exchange 2007, messages were retained in the transport dumpster until an administrator-defined time
limit or size limit was reached. In Exchange 2010, the transport dumpster now receives feedback from the
replication pipeline to determine which messages have been delivered and replicated. As a message goes
through Hub Transport servers on its way to a replicated mailbox database in a DAG, a copy is kept in the
transport queue (mail.que) until the replication pipeline has notified the Hub Transport server that the
transaction logs representing the message have been successfully replicated to and inspected by all copies
of the mailbox database. After the logs have been replicated to and inspected by all database copies, the
messages are truncated from the transport dumpster. This keeps the transport dumpster queue smaller by
maintaining only copies of messages whose transactions logs haven't yet been replicated.


The transport dumpster has also been enhanced to account for the changes to the Mailbox server role that
enable a single mailbox database to move between Active Directory sites. DAGs can be extended to
multiple Active Directory sites, and as a result, a single mailbox database in one Active Directory site can
fail over to another Active Directory site. When this occurs, any transport dumpster redelivery requests
will be sent to both Active Directory sites: the original site and the new site.


Whenever a Hub Transport server receives a message, it undergoes categorization. Part of the
categorization process involves querying Active Directory to determine whether the destination database
containing the recipient's mailbox is part of a DAG. Once the message has been delivered to all recipients,
the message is committed to the mail.que file on the Hub Transport server and stored in the transport
dumpster inside the mail.que file. The transport dumpster is maintained for each database, in each Active
Directory site, that belongs to a DAG. There are two settings that define the life of a message within
the transport dumpster. They are:


         MaxDumpsterSizePerDatabase: The         MaxDumpsterSizePerDatabase         parameter    specifies    the
         maximum size of the transport dumpster on a Hub Transport server for each database. The default
         value is 18 MB. The valid input range for this parameter is from 0 through 2147483647 KB. The
         recommendation is that this be set to 1.5 times the maximum message size limit within your
         environment. If you do not have a maximum message size limit set, then you should evaluate the
         messages that are delivered within your environment and set the value to 1.5 times the average
         message size in your organization.


   When you enter a value, qualify the value with one of the following units:
         KB (kilobytes)
         MB (megabytes)
         GB (gigabytes)
         TB (terabytes)
Unqualified values are treated as kilobytes.


         MaxDumpsterTime Defines the length of time that a message remains within the transport
         dumpster if the dumpster size limit is not reached. The default is seven days.


If either the time or size limit is reached, messages are removed from the transport dumpster by order of
first in, first out.
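
A sketch of setting these two limits organization-wide with Set-TransportConfig; the 30 MB figure is only an
example following the 1.5-times-maximum-message-size guidance above:

    # Dumpster limits are configured on the organization's transport configuration.
    Set-TransportConfig -MaxDumpsterSizePerDatabase 30MB -MaxDumpsterTime 7.00:00:00
    Get-TransportConfig | Format-List MaxDumpsterSizePerDatabase, MaxDumpsterTime
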
When a failover (unscheduled outage) occurs, the Replication service will attempt to copy the missing log
files. If the copy attempt fails, then this is known as a lossy failover and the following steps are taken.


    1. If the databases are within the AutoDatabaseMountDial value, they will automatically mount.
    2. The Replication service will record that the Database requires Transport Dumpster redelivery in
         the cluster database by setting the DumpsterRedeliveryRequired key to true.
    3. The Replication service will record the Hub Transport servers that exist within the clustered
         mailbox server’s Active Directory site in the cluster database
4. The Replication service will calculate the loss window. This is done using the LastLogInspected
       marker as the start time and the current time as the end time. Since the transport dumpster is
       based on message delivery times, we generously pad the loss window by expanding it 12 hours
       back and 4 hours forward. The start time is recorded in DumpsterRedeliveryStartTime and the end
       time is recorded in DumpsterRedeliveryEndTime.
   5. The Replication service makes an RPC call to the Hub Transport servers listed in
       DumpsterRedeliveryServers requesting dumpster redelivery for the given loss time window.
   6. The Hub Transport server will acknowledge the first redelivery request with a Retry response.
   7. The Hub Transport server will redeliver the messages it has within its transport dumpster for the
       allotted time window. Once the message is resubmitted for delivery, the message is removed from
       the transport dumpster.
   8. The Replication service makes an RPC call to the Hub Transport servers listed in
       DumpsterRedeliveryServers requesting dumpster redelivery for the given loss time window.
   9. The Hub Transport servers that have successfully redelivered the dumpster messages will
       acknowledge the redelivery request with a Success response. At this point the Replication service
       will remove those Hub Transport servers from the DumpsterRedeliveryServers key.
   10. This process will continue until either all Hub Transport servers have redelivered the mail, or the
       MaxDumpsterTime has been reached.


Note: If there are no message size limits in your organization, a single 18 MB message will purge all other
messages for a given database on a given Hub Transport server.

Reference
Understanding Database Availability Groups


Understanding Active Manager


Understanding Mailbox Database Copies


Understanding the Exchange Information Store


White Paper: Continuous Replication Deep Dive


Understanding How Cluster Quorums Work

 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

   Log Replay ..................................................................................................................................... 13

   Lossy Failure Process ..................................................................................................................... 13

       AutoDatabaseMountDial ......................................................................................................... 13

   Incremental Resync ....................................................................................................................... 14

   Database Activation Coordination .............................................................................................. 15

Transport Dumpster .......................................................................................................................... 17

Reference ........................................................................................................................................... 19
Exchange Server 2010 - Database Availability Group

Let's Begin DAG

DAG is one of the major enhancements in Exchange 2010. LCR, CCR and SCR from Exchange 2007 have been dropped in Exchange 2010, and the DAG is introduced as a single high availability solution. Exchange "14" uses the same continuous replication technology found in Exchange Server 2007, but unites on-site (CCR) and off-site (SCR) data replication into a single framework. Exchange manages all aspects of failover, and no Windows clustering knowledge is required because the DAG configures clustering by itself.

A DAG can have up to 16 nodes, and each database can have as many as 15 additional copies, compared to the two-node CCR cluster. The DAG also makes failover more granular: it is database-level rather than server-level, so the failure of one database in a DAG does not force an entire server failover that would affect users of the other databases on that server. A server that is part of a DAG can still hold other server roles, which reduces the minimum number of servers required to build a redundant Exchange environment to two. A DAG can easily be stretched across sites to provide site resilience in a disaster. And whereas in CCR the passive server sits idle, in a DAG active databases can be distributed among the nodes.

Some Clustering Basics

Before we begin with DAG it is helpful to look at some clustering technologies, which will help you understand DAG quickly. The concept of a cluster involves taking two or more computers and organizing them to work together to provide higher availability, reliability and scalability than can be obtained by using a single system. When a failure occurs in a cluster, resources can be redirected and the workload redistributed. A server cluster provides high availability by making application software and data available on several servers linked together in a cluster configuration. If one server stops functioning, a process called failover automatically shifts the workload of the failed server to another server in the cluster. The failover process is designed to ensure continuous availability of critical applications and data.
There are mainly three types of clustering in Windows Server.

Network Load Balancing

Network Load Balancing provides failover support for IP-based applications and services that require high scalability and availability. With Network Load Balancing (NLB), organizations can build groups of clustered computers to support load balancing of Transmission Control Protocol (TCP), User Datagram Protocol (UDP) and Generic Routing Encapsulation (GRE) traffic requests. Web-tier and front-end services are ideal candidates for NLB.

Component Load Balancing

Component Load Balancing, a feature of Microsoft Application Center 2000, provides dynamic load balancing of middle-tier application components that use COM+. With Component Load Balancing (CLB), COM+ components can be load balanced over multiple nodes to dramatically enhance the availability and scalability of software applications.

Server cluster

A server cluster provides failover support for applications and services that require high availability, scalability and reliability. With clustering, organizations can make applications and data available on multiple servers linked together in a cluster configuration. Back-end applications and services, such as those provided by database servers, are ideal candidates for a server cluster. Some of the components of server clusters are discussed below.

Quorum

A quorum is the cluster's configuration database; it tells the cluster which node should be active.

Standard quorum: A standard quorum is a configuration database for the cluster, stored on a shared hard disk that is accessible to all of the cluster's nodes. The quorum also intervenes when communications fail between nodes. Normally, each node within a cluster can communicate with every other node over a dedicated network connection. If this network connection fails, the cluster is split into two pieces, each containing one or more functional nodes that cannot communicate with the nodes on the other side of the communications failure. When this type of communications failure occurs, the cluster is said to have been partitioned. The problem is that both partitions have the same goal: to keep the application running. The application can't
be run on multiple servers simultaneously, though, so there must be a way of determining which partition gets to run the application. This is where the quorum comes in: the partition that "owns" the quorum is allowed to continue running the application, and the other partition is removed from the cluster.

Majority Node Set (MNS) Quorum: The main difference between a standard quorum and an MNS quorum is that in an MNS quorum each node has its own, locally stored copy of the quorum database. The other way that an MNS quorum depends on majorities is in starting the nodes. A majority of the nodes ((number of nodes / 2) + 1) must be online before the cluster will start the virtual server. If fewer than the majority of nodes are online, the cluster is said to "not have quorum". In such a case, the necessary services keep restarting until a sufficient number of nodes are present. One of the most important things about MNS is that you must have at least three nodes in the cluster. Remember that a majority of nodes must be running at all times. If a cluster has only two nodes, the majority is calculated to be two ((2 / 2) + 1 = 2). Therefore, if one node were to fail, the entire cluster would go down because it would not have quorum.

File share witness

The file share witness feature is an improvement to the Majority Node Set (MNS) quorum model. This feature lets you use a file share that is external to the cluster as an additional "vote" to determine the status of the cluster in a two-node MNS quorum cluster deployment. Consider a two-node MNS quorum cluster. Because an MNS quorum cluster can only run when the majority of the cluster nodes are available, a two-node MNS quorum cluster cannot sustain the failure of any cluster node; the majority of a two-node cluster is two. To sustain the failure of any one node in an MNS quorum cluster, you must have at least three devices that can be considered available. The file share witness feature enables you to use an external file share as a witness, and this witness acts as the third available device in a two-node MNS quorum cluster. Therefore, with this feature enabled, a two-node MNS quorum cluster can sustain the failure of a single cluster node. Additionally, the file share witness feature helps protect the cluster against two problems: split brain (a condition that occurs when all networks fail) and a partition in time.
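Exchange 2010 uses this file share witness concept for DAGs: the witness is specified as a property of the DAG rather than configured directly in the cluster. A minimal Exchange Management Shell sketch, assuming a DAG named DAG1 and a witness server named FS-HUB1 (both names are hypothetical):

    # Point the DAG at a file share witness; Exchange creates the directory and share.
    Set-DatabaseAvailabilityGroup -Identity DAG1 -WitnessServer FS-HUB1 -WitnessDirectory C:\DAG1Witness

    # Confirm which witness the DAG is configured to use.
    Get-DatabaseAvailabilityGroup -Identity DAG1 | Format-List Name,WitnessServer,WitnessDirectory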
Fundamental of DAG

DAG

A database availability group (DAG) is the base component of the high availability and site resilience framework built into Microsoft Exchange Server 2010. A DAG is a group of up to 16 Mailbox servers that host a set of databases and provide automatic database-level recovery from failures that affect individual servers or databases. A DAG is a boundary for mailbox database replication, for database and server switchovers and failovers, and for an internal component called Active Manager. Active Manager is an Exchange 2010 component that manages switchovers and failovers and runs on every server in a DAG.

What DAG changes:

1. No more Exchange Virtual Servers/Clustered Mailbox Servers.
2. A database is no longer associated with a server but is an organization-level resource.
3. There is no longer a requirement to choose clustered or non-clustered at installation; an Exchange 2010 server can move in and out of a DAG as needed.
4. The limitation of hosting only the Mailbox role on a clustered Exchange server is removed.
5. Storage Groups have been removed from Exchange.

Server

A server is the unit of membership for a DAG. A server hosts active and passive copies of multiple mailbox databases and runs various services against them, such as the Information Store and the Mailbox Assistants. A server is also responsible for running the replication service for its passive mailbox database copies, and it provides the connection point between the Information Store and RPC Client Access. It defines very few server-level properties relevant to high availability (HA), such as the server's DAG and its activation policy.

Mailbox Database

A database is the unit of failover in a DAG. A database has only one active copy, which can be mounted or dismounted. A mailbox database can have as many as 15 passive copies, depending on the number of Mailbox servers available. Ideally a database failover takes only about 30 seconds; a server failover or switchover involves moving all active databases to one or more other servers. Database names are unique across a forest. A mailbox database defines properties such as its GUID, its EDB file path, and the names of the servers hosting copies.

Mailbox Availability Terms

Active mailbox database: provides mail services to the clients.
Passive mailbox database: available to provide mail services to the clients if the active copy fails.
Source mailbox database: provides data for copying to a separate location.
Target mailbox database: receives data from the source.
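To ground these concepts, the following is a minimal Exchange Management Shell sketch of building out a DAG: creating it, adding member servers, and adding a passive copy of a database. The names DAG1, MBX1, MBX2 and DB1 are hypothetical.

    # Create the DAG object (the witness settings shown earlier can also be supplied here).
    New-DatabaseAvailabilityGroup -Name DAG1

    # Add Mailbox servers as DAG members; Exchange installs and configures
    # the Windows failover clustering components automatically.
    Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer MBX1
    Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer MBX2

    # Add a passive copy of DB1 (currently active on MBX1) to MBX2.
    Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer MBX2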
Mailbox Database Copy

A mailbox database copy defines the scope of database replication. A database copy is either the source or the target of replication at any given time, and it is either active or passive at any given time. Only one copy of each database in a DAG is active at a time, and a server may host no more than one copy of any given database.

Active Manager

For Exchange, Active Directory is the primary source of configuration information, whereas Active Manager is the primary source of changeable state information, such as which copy is active and whether it is mounted. Active Manager is an Exchange-aware resource manager, often described as the brain of high availability. Active Manager runs on every server in the DAG and manages which copies should be active and which should be passive. It is also the definitive source of information on where a database is active or mounted, and it provides this information to other Exchange components (for example, RPC Client Access and Hub Transport). Active Manager information is stored in the cluster database.

In Exchange Server 2010, the Microsoft Exchange Replication service periodically monitors the health of all mounted databases. It also monitors the Extensible Storage Engine (ESE) for any I/O errors or failures. When the service detects a failure, it notifies Active Manager. Active Manager then determines which database copy should be mounted and what is required to mount that database. In addition, Active Manager tracks the active copy of a mailbox database (based on the last mounted copy of the database) and provides the tracking results to the RPC Client Access component on the Client Access server to which the client is connected.

When an administrator makes a database copy the active mailbox database, the process is known as a switchover. When a failure affecting a database occurs and a new database copy becomes the active copy, the process is known as a failover. The term also covers a server failure, in which one or more servers bring online the databases previously online on the failed server. When either a switchover or failover occurs, the other Exchange Server 2010 server roles become aware of it almost immediately and redirect client and messaging traffic to the new active database. For example, if an active database in a DAG fails because of an underlying storage failure, Active Manager automatically recovers by failing over to a database copy on another Mailbox server in the DAG. In the
event that the database is outside the automatic mount criteria and cannot be automatically mounted, an administrator can manually perform a database failover.

Primary Active Manager (PAM):

The PAM is the Active Manager in the DAG that decides which copies will be active and which passive. It moves to another server if the server hosting it is no longer able to hold it, and you need to move the PAM if you take a server offline for maintenance or upgrade. The PAM is responsible for getting topology change notifications and reacting to server failures. PAM is a role of an Active Manager: if the server hosting the PAM fails, another instance of Active Manager adopts the role (the one that takes ownership of the cluster group). The PAM controls all movement of the active designation between a database's copies (only one copy can be active at any given time, and that copy may be mounted or dismounted). The PAM also performs the functions of the SAM role on the local system, detecting local database and local Information Store failures.

Standby Active Manager (SAM):

The SAM provides information on which server hosts the active copy of a mailbox database to other components of Exchange, for example the RPC Client Access service or Hub Transport. The SAM detects failures of local databases and the local Information Store, and it reacts by asking the PAM to initiate a failover (if the database is replicated). A SAM does not determine the target of the failover, nor does it update a database's location state in the PAM. It accesses the active database copy location state to answer queries for the active copy of the database that it receives from the Client Access and Hub Transport servers, among others.

Active Manager Best Copy Selection

When a failure occurs that affects a replicated mailbox database, the PAM initiates failover logic and selects the best available database copy for activation, using up to ten separate sets of criteria to locate the best copy to activate. Active Manager attempts to locate a mailbox database copy that has a status of Healthy, DisconnectedAndHealthy, DisconnectedAndResynchronizing, or SeedingSource; then, depending on the status of content indexing and the replay queue and copy queue lengths, it determines the best copy to activate.
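The copy status values and queue lengths that Active Manager weighs can be inspected from the Exchange Management Shell, and an administrator-initiated switchover is a single cmdlet. A hedged sketch; DB1 and MBX2 are hypothetical names:

    # Show status, copy queue length and replay queue length for every copy of DB1.
    Get-MailboxDatabaseCopyStatus -Identity DB1 |
        Format-Table Name,Status,CopyQueueLength,ReplayQueueLength,ContentIndexState

    # Administrator-initiated switchover: activate the copy of DB1 hosted on MBX2.
    Move-ActiveMailboxDatabase -Identity DB1 -ActivateOnServer MBX2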
Continuous Replication

Continuous replication combines asynchronous log shipping and replay technology. It includes the following steps:

- Database copy seeding of the target
- Log copying from source to target
- Log inspection at the target
- Log replay into the database copy

Database Seeding

Seeding is the process of making a baseline copy of a database available on the passive nodes. Depending on the situation, seeding can be an automatic process or a manual process that you initiate.

Automatic seeding: An automatic seed produces a copy of a database in the target location. Automatic seeding requires that all log files, including the very first log file created by the database (which contains the database creation log record), be available on the source. Automatic seeding therefore only occurs during the creation of a new server or a new database (or if the first log still exists, that is, log truncation hasn't occurred).

Seeding using the Update-MailboxDatabaseCopy cmdlet: You can use the Update-MailboxDatabaseCopy cmdlet in the Exchange Management Shell to seed a database copy. This option uses the streaming copy backup API to copy the database from the active location to the target location.

Manually copying the offline database: This process dismounts the database and copies the database file to the same location on the passive node. If you use this method, there will be an interruption in service because the procedure requires you to dismount the database.
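A minimal Exchange Management Shell sketch of the cmdlet-based reseed just described; the copy identity "DB1\MBX2" is hypothetical, and a copy is normally suspended before it is reseeded:

    # Suspend replication for the copy that needs to be reseeded.
    Suspend-MailboxDatabaseCopy -Identity "DB1\MBX2"

    # Reseed the copy over the network, discarding any existing files for that copy.
    Update-MailboxDatabaseCopy -Identity "DB1\MBX2" -DeleteExistingFiles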
Seeding is required under the following conditions:

- When a new passive copy is introduced into a DAG environment and the first log file of the production database is not available.
- After a failover in which data is lost, leaving the now-passive copy diverged and unrecoverable.
- When the system has detected a corrupted log file that cannot be replayed into the passive copy.
- After an offline defragmentation of the database.
- After a page scrubbing of the active copy of a database, when you want to propagate the changes to the passive copy.
- After the log generation sequence for the database has been reset back to 1.

Log Shipping

Log shipping automatically sends transaction logs from the primary (active) database to one or more secondary (passive) database copies on other servers. Log shipping in Exchange Server 2010 leverages TCP sockets and supports encryption and compression, and the administrator can set the TCP port to be used for replication. The Replication service on the target notifies the active instance of the next log file it expects, based on the last log file it inspected; the Replication service on the source responds by sending the required log file(s). Copied log files are placed in the target's Inspector directory.

Log Inspection

The log inspector is responsible for verifying that the log files are valid. The following actions are performed by the log inspector:

Physical integrity inspection: This validation runs ESEUTIL /K against the log file and verifies that the checksum recorded in the log file matches the checksum generated in memory.

Header inspection: The Replication service validates the following aspects of the log file's header: the generation is not higher than the highest generation recorded for the database in question; the generation recorded in the log header matches the generation recorded in the log filename; and the log file signature recorded in the log header matches that of the log file.

Removal of Exx.log: Before the inspected log file can be moved into the log folder, the Replication service needs to remove any Exx.log files. These log files are placed into another sub-directory of the log
directory, the ExxOutofDate directory. An Exx.log file would only exist on the target if the target was previously running as a source. The Exx.log file needs to be removed before log replay occurs because it contains old data that has been superseded by a full log file with the same generation. If the closed log file is not a superset of the existing Exx.log file, an incremental or full reseed must be performed.

Log Replay

After the log files have been inspected, they are placed within the log directory so that they can be replayed into the database copy. Before the Replication service replays the log files, it performs a series of validation tests. Once these validation checks have been completed, the Replication service replays the log files.

Lossy Failure Process

In the event of a failure, the following steps occur for the failed database:

1. Active Manager determines the best copy to activate.
2. The Replication service on the target server attempts to copy missing log files from the source; this is known as ACLL (Attempt To Copy Last Log).
3. If ACLL is successful (for example, because the server is online and the shares and necessary data are accessible), the database mounts with zero data loss.
4. If ACLL is unsuccessful (a lossy failure), the database mounts based on the AutoDatabaseMountDial setting.
5. The mounted database generates new log files (using the same log generation sequence).
6. Transport dumpster requests are initiated for the mounted database to recover lost messages.
7. When the original server or database recovers, it runs through divergence detection and performs an incremental reseed, or requires a full reseed.

AutoDatabaseMountDial

There are three possible values for the server setting AutoDatabaseMountDial.

Lossless: Lossless means zero logs lost. When the attribute is set to Lossless, under most circumstances the system waits for the failed node to come back online before databases are mounted, and even then the failed system must return with all logs accessible and not corrupted. After the failure, the
passive node is made active and the Microsoft Exchange Information Store service is brought online. It checks whether the databases can be mounted without any data loss; if possible, the databases are mounted. If they cannot be automatically mounted, the system periodically attempts to copy the logs. If the server returns with its logs intact, this attempt will eventually succeed and the databases will mount. If the server returns without its logs intact, the remaining logs will not be available and the affected databases will not mount automatically. In that event, administrative action is required to force the databases to mount even though logs are lost.

Good availability: Good availability means three logs lost. It provides fully automatic recovery when replication is operating normally and replicating logs at the rate they are being generated.

Best availability: Best availability means six logs lost and is the default setting. It operates similarly to Good availability, but it allows automatic recovery when replication experiences slightly more latency. The new active node might therefore be slightly farther behind the state of the old active node after the failover, increasing the likelihood of database divergence, which requires a full reseed to correct.
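The dial is set per Mailbox server. A minimal Exchange Management Shell sketch, with MBX2 as a hypothetical server name:

    # Allow automatic mount after a lossy failover only if at most three logs were lost.
    Set-MailboxServer -Identity MBX2 -AutoDatabaseMountDial GoodAvailability

    # Check the current setting.
    Get-MailboxServer -Identity MBX2 | Format-List Name,AutoDatabaseMountDial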
Incremental Resync

In Exchange Server 2007, LLR (Lost Log Resilience) delayed writes to the active database to minimize divergence between an old failed active copy and the new active copy, and thereby minimize the need for full reseeds. Changes were written to the passive database before they were written to the active database, so when the old failed active came back it was unlikely to contain data that had never made it to the passive before the failure. Only when it did contain such data did it have to receive a full reseed when it came back online.

In Exchange Server 2010 there are two incremental resync solutions. Incremental resync v1 is based on LLR depth and is only used when the waypoint = 1 (that is, only one log was lost). Incremental resync v2 is used when more than a single log is lost and follows this process:

1. Active DB1 on Server1 fails, and the failure is lossy.
2. The passive DB1 copy on Server3 takes over service.
3. Some time later, the failed DB1 on Server1 comes back as a passive copy but contains inconsistent data.
4. The Replication service on Server1 compares the transaction logs on Server1 with those on Server3, starting with the newest generation and working backwards, to locate the divergence point.
5. Once the divergence point is located, the log records of the diverged logs on Server1 are scanned and a list of page records is built.
6. The Replication service then copies the corresponding page records and logs from Server3. In addition, the database header minimum/maximum required logs are obtained from the active database copy on Server3.
7. The Replication service on Server1 then reverts the changes of the diverged logs by inserting the correct pages from Server3.
8. The database header of Server1's copy is updated with the appropriate minimum/maximum log generations.
9. Log recovery is then run to bring the database copy up to date.

Database Activation Coordination

DAC mode is used to control the activation behavior of a DAG when a catastrophic failure occurs that affects the DAG, for example a complete failure of one of the datacenters. When DAC mode isn't enabled and a failure affecting multiple servers in the DAG occurs, the DAG restarts and attempts to mount databases as soon as a majority of the servers are restored. In a multi-datacenter configuration, this behavior could cause split brain syndrome, a condition that occurs when all networks fail and DAG members can't receive heartbeat signals from each other. Split brain syndrome also occurs when network connectivity is severed between the datacenters. Split brain syndrome is prevented by always requiring a majority of the DAG members (and, for DAGs with an even number of members, the DAG's witness server) to be available and interacting for the DAG to be operational. When a majority of the members are communicating, the DAG is said to have quorum.

DAC is designed to prevent split brain by implementing a "mommy, may I?" protocol, the Datacenter Activation Coordination Protocol (DACP). After a catastrophic loss, when the DAG recovers it cannot mount databases just because quorum is present in the DAG. Instead, it must coordinate with the other Active Managers in the DAG to determine state.
Consider the two-datacenter scenario. Suppose there is a complete power failure in the primary datacenter. In this event, all of the servers and the WAN are down, so the organization makes the decision to activate the standby datacenter. In almost all such recovery scenarios, when power is restored to the primary datacenter, WAN connectivity is typically not immediately restored. This means that the DAG members in the primary datacenter will power up, but they won't be able to communicate with the DAG members in the activated standby datacenter. The primary datacenter should always contain the majority of the DAG quorum voters, which means that when power is restored, even in the absence of WAN connectivity to the DAG members in the standby datacenter, the DAG members in the primary datacenter have a majority and therefore have quorum. This is a problem, because with quorum these servers may be able to mount their databases, which in turn would cause divergence from the actual active databases that are now mounted in the activated standby datacenter.

DACP was created to address this issue. Active Manager stores a bit in memory (either a 0 or a 1) that tells the DAG whether it is allowed to mount local databases that are assigned as active on the server. When a DAG is running in DAC mode (which would be any DAG with three or more members), each time Active Manager starts up the bit is set to 0, meaning it isn't allowed to mount databases. Because it's in DAC mode, the server must try to communicate with the other members of the DAG that it knows about, in order to get another DAG member to tell it whether it can mount the local databases that are assigned as active to it. The answer comes in the form of the bit setting of the other Active Managers in the DAG. If another server responds that its bit is set to 1, servers are allowed to mount databases, so the server starting up sets its bit to 1 and mounts its databases.

But when you recover from a primary datacenter power outage where the servers are recovered but WAN connectivity has not been restored, all of the DAG members in the primary datacenter will have a DACP bit value of 0, and therefore none of the servers starting back up in the recovered primary datacenter will mount databases, because none of them can communicate with a DAG member that has a DACP bit value of 1.
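DAC mode is a property of the DAG itself and is off by default. A minimal Exchange Management Shell sketch for enabling it on a hypothetical DAG named DAG1:

    # Enable Datacenter Activation Coordination mode for the DAG.
    Set-DatabaseAvailabilityGroup -Identity DAG1 -DatacenterActivationMode DagOnly

    # Verify the setting.
    Get-DatabaseAvailabilityGroup -Identity DAG1 | Format-List Name,DatacenterActivationMode

When DAC mode is enabled, datacenter switchovers are performed with the Stop-DatabaseAvailabilityGroup, Restore-DatabaseAvailabilityGroup and Start-DatabaseAvailabilityGroup cmdlets rather than by manipulating the underlying cluster directly.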
Transport Dumpster

The transport dumpster is part of the Hub Transport server, but because it works in conjunction with the DAG it is best discussed alongside it. The transport dumpster is a feature designed to minimize data loss by redelivering recently submitted messages back to the Mailbox server after a lossy failure.

Improvements in Transport Dumpster

In Exchange 2007, messages were retained in the transport dumpster until an administrator-defined time limit or size limit was reached. In Exchange 2010, the transport dumpster receives feedback from the replication pipeline to determine which messages have been delivered and replicated. As a message goes through Hub Transport servers on its way to a replicated mailbox database in a DAG, a copy is kept in the transport queue (mail.que) until the replication pipeline has notified the Hub Transport server that the transaction logs representing the message have been successfully replicated to and inspected by all copies of the mailbox database. Once that has happened, the message is removed from the transport dumpster. This keeps the transport dumpster queue smaller by maintaining only copies of messages whose transaction logs haven't yet been replicated.

The transport dumpster has also been enhanced to account for the changes to the Mailbox server role that enable a single mailbox database to move between Active Directory sites. DAGs can be extended to multiple Active Directory sites, and as a result a single mailbox database can fail over from one Active Directory site to another. When this occurs, any transport dumpster redelivery requests are sent to both Active Directory sites: the original site and the new site.

Whenever a Hub Transport server receives a message, the message undergoes categorization. Part of the categorization process involves querying Active Directory to determine whether the destination database containing the recipient's mailbox is part of a DAG. Once the message has been delivered to all recipients, it is committed to the mail.que file on the Hub Transport server and stored in the transport dumpster inside the mail.que file. The transport dumpster is available for each database within each
Active Directory site that has a DAG enabled. There are two settings that define the life of a message within the transport dumpster:

MaxDumpsterSizePerDatabase: The MaxDumpsterSizePerDatabase parameter specifies the maximum size of the transport dumpster on a Hub Transport server for each database. The default value is 18 MB, and the valid input range is from 0 through 2147483647 KB. The recommendation is to set this to 1.5 times the maximum message size limit in your environment. If you do not have a maximum message size limit set, evaluate the messages delivered within your environment and set the value to 1.5 times the average message size in your organization. When you enter a value, qualify it with one of the following units: KB (kilobytes), MB (megabytes), GB (gigabytes) or TB (terabytes). Unqualified values are treated as kilobytes.

MaxDumpsterTime: Defines the length of time that a message remains within the transport dumpster if the dumpster size limit is not reached. The default is seven days. If either the time or the size limit is reached, messages are removed from the transport dumpster in first-in, first-out order.
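Both settings are organization-wide transport settings. A minimal Exchange Management Shell sketch, assuming (hypothetically) a 10 MB maximum message size limit in the organization:

    # 1.5 x the 10 MB maximum message size limit assumed above.
    Set-TransportConfig -MaxDumpsterSizePerDatabase 15MB -MaxDumpsterTime 7.00:00:00

    # Review the current dumpster settings.
    Get-TransportConfig | Format-List MaxDumpsterSizePerDatabase,MaxDumpsterTime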
When a failover (an unscheduled outage) occurs, the Replication service attempts to copy the missing log files. If the copy attempt fails, this is known as a lossy failover and the following steps are taken:

1. If the databases are within the AutoDatabaseMountDial value, they automatically mount.
2. The Replication service records that the database requires transport dumpster redelivery by setting the DumpsterRedeliveryRequired key to true in the cluster database.
3. The Replication service records, in the cluster database, the Hub Transport servers that exist within the Mailbox server's Active Directory site.
4. The Replication service calculates the loss window, using the LastLogInspected marker as the start time and the current time as the end time. Because the transport dumpster is based on message delivery times, the loss window is generously padded by expanding it 12 hours back and 4 hours forward. The start time is recorded in DumpsterRedeliveryStartTime and the end time in DumpsterRedeliveryEndTime.
5. The Replication service makes an RPC call to the Hub Transport servers listed in DumpsterRedeliveryServers, requesting dumpster redelivery for the given loss window.
6. The Hub Transport server acknowledges the first redelivery request with a Retry response.
7. The Hub Transport server redelivers the messages it has within its transport dumpster for the allotted time window. Once a message is resubmitted for delivery, it is removed from the transport dumpster.
8. The Replication service again makes an RPC call to the Hub Transport servers listed in DumpsterRedeliveryServers, requesting dumpster redelivery for the given loss window.
9. The Hub Transport servers that have successfully redelivered the dumpster messages acknowledge the redelivery request with a Success response. At this point the Replication service removes those Hub Transport servers from the DumpsterRedeliveryServers key.
10. This process continues until either all Hub Transport servers have redelivered the mail or the MaxDumpsterTime has been reached.

Note: If there are no message size limits in your organization, a single 18 MB message can purge all other messages for a given database on a given Hub Transport server.

Reference

Understanding Database Availability Groups
Understanding Active Manager
Understanding Mailbox Database Copies
Understanding the Exchange Information Store
White Paper: Continuous Replication Deep Dive
Understanding How Cluster Quorums Work