As an Exchange consultant, I get to see how a lot of customers use and misuse Exchange. With Exchange 2013 and Exchange 2010, one of the most misunderstood concepts is how Database Availability Groups (DAGs) leverage File Share Witnesses. The average user does not understand DAGs, File Share Witnesses, or quorum, but they do understand email being down.
The Background
For those of you who do not live and breathe Exchange Server, Microsoft introduced the concept of the Database Availability Group in Exchange 2010. A DAG contains mailbox servers that become members of the DAG. Once a mailbox server is a member of a DAG, the Failover Clustering role is installed on the server and all required clustering resources such as the cluster heartbeat, cluster networks, and the quorum are created. The term quorum comes from parliamentary procedure, where a majority of voting members must be present to make a decision. In Windows Failover Clustering, this means a majority of cluster members must be online for the cluster to function. This is important because if the cluster loses quorum, all DAG operations cease and all databases hosted in the DAG dismount—i.e., email is down. In this case, human intervention is required to correct the quorum problem and restore Exchange availability.
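To make the majority rule concrete, here is a minimal Python sketch. The function name has_quorum and its parameters are illustrative assumptions, not part of Exchange or Windows Failover Clustering; the sketch only models the idea that more than half of the voters must be online.

```python
def has_quorum(online_voters: int, total_voters: int) -> bool:
    """Return True when a strict majority of voters is online.

    Illustrative model only; this is not an Exchange or clustering API.
    """
    if total_voters <= 0 or not 0 <= online_voters <= total_voters:
        raise ValueError("expected 0 <= online_voters <= total_voters and total_voters > 0")
    # Strict majority: more than half of all voters must be reachable.
    return 2 * online_voters > total_voters


if __name__ == "__main__":
    print(has_quorum(2, 3))  # True: two of three voters online
    print(has_quorum(1, 3))  # False: one of three voters online
```

With three voters, losing one still leaves a majority; losing two does not, which is the point at which databases would dismount.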
The Problem
As mailbox servers are added to the DAG, they are joined to the cluster and added to the list of voting members. However, an odd number of quorum voters must be maintained for majority decisions to be made. In the event there is an even number of mailbox servers, the DAG employs an external File Share Witness server to act as a tiebreaker. When the witness server is needed for quorum, any member of the DAG that can communicate with the witness server can place a lock on its witness.log file. The DAG member that locks the witness file (referred to as the locking node) retains an additional vote for quorum purposes.
The part many administrators miss is that once a witness server has been configured for the DAG, the cluster’s quorum mode is automatically adjusted by Exchange as mailbox servers are added or removed. So an administrator thinks they have built extra resilience against quorum loss into their environment by adding a File Share Witness, but in reality that File Share Witness may be completely ignored by Exchange.
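The automatic adjustment described above can be sketched in a few lines of Python. This is a simplified model of the behavior as this post describes it, not Exchange's actual logic: the function name quorum_mode is hypothetical, and it assumes a witness server has already been configured for the DAG.

```python
def quorum_mode(dag_members: int) -> str:
    """Model which quorum mode the cluster ends up in for a given DAG size.

    Assumes a File Share Witness is configured on the DAG (hypothetical
    helper; it mirrors the rule described in the text, nothing more).
    """
    if dag_members < 1:
        raise ValueError("a DAG needs at least one member")
    # Even member count: the witness is used so the voter total stays odd.
    if dag_members % 2 == 0:
        return "Node and File Share Majority"
    # Odd member count: the members already form an odd voter total,
    # so the witness is not counted.
    return "Node Majority"


if __name__ == "__main__":
    for members in (2, 3, 4):
        print(members, "members:", quorum_mode(members))
```

The takeaway from the sketch is that the witness setting stays on the DAG, while whether it is used depends only on whether the member count is even or odd.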
The Example
Let’s walk through a common scenario and see how this plays out. An organization sets up a single mailbox server and a single Hub Transport/Client Access server in their primary site. In order to facilitate site resiliency, another mailbox server is staged at a disaster recovery site and a DAG is created. However, a two-member DAG will lose quorum when one of the mailbox servers goes offline, so a File Share Witness is deployed using the Set-DatabaseAvailabilityGroup cmdlet in the Exchange Management Shell. This gives administrators breathing room to do maintenance on the Exchange servers one at a time.
Figure 1. DAG in Node and File Share Majority mode
At this point, if you look at the quorum configuration in the Failover Cluster Manager, it will report that the cluster is operating in Node and File Share Majority mode.
Figure 2. Quorum Configuration in Failover Cluster Manager
After some time passes, the organization decides they would prefer a more convenient maintenance option than failing over to another site. In order to facilitate local database failover, they stage another mailbox server/DAG member in the primary site.
Figure 3. DAG in Node Majority mode
However, they incorrectly assume that the File Share Witness is still being counted by Exchange and that the DAG will maintain quorum even with two servers offline. What administrators don’t realize is that upon adding the third mailbox server, Exchange took it upon itself to change the cluster quorum mode from Node and File Share Majority to Node Majority. Consequently, the File Share Witness is now being ignored by the DAG. A Get-DatabaseAvailabilityGroup query will still show the settings for the File Share Witness, but the Failover Cluster Manager or the Get-Cluster PowerShell cmdlet will report that the File Share Witness is no longer in use.
Figure 4. Get-Cluster cmdlet output
The reason Exchange makes this change is related to how quorum is calculated: Q = floor(N/2) + 1, where N is the number of voting members. If there were four voting members (three mailbox servers plus the witness), three votes would be required to maintain quorum, so the DAG could still survive only a single failure. With three voting members, only two votes are required for quorum, and the DAG survives the same single failure without depending on the witness. If a fourth mailbox server were added to the DAG, Exchange would again automatically change the quorum mode back to Node and File Share Majority to maintain an odd number of voting members, and the File Share Witness would again be relied upon.
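The arithmetic is easier to see when worked out for the DAG sizes in this example. The Python sketch below is illustrative only: quorum_summary is a hypothetical helper, and it assumes the witness contributes a vote only when the member count is even, as described above.

```python
def quorum_summary(dag_members: int) -> tuple[int, int, int]:
    """Return (voters, votes_needed, tolerated_failures) for a DAG size.

    Hypothetical helper; assumes the witness votes only when the
    number of DAG members is even.
    """
    if dag_members < 1:
        raise ValueError("a DAG needs at least one member")
    witness_vote = 1 if dag_members % 2 == 0 else 0
    voters = dag_members + witness_vote
    votes_needed = voters // 2 + 1  # Q = floor(N/2) + 1
    return voters, votes_needed, voters - votes_needed


if __name__ == "__main__":
    for members in (2, 3, 4):
        voters, needed, tolerated = quorum_summary(members)
        print(f"{members} members: {voters} voters, "
              f"{needed} needed, {tolerated} failure(s) tolerated")
```

Under these assumptions, a three-member DAG tolerates the loss of one voter, exactly as the two-member DAG with its witness did. Adding the third server did not buy the ability to survive two servers offline.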
The Conclusion
Organizations implement Database Availability Groups to add redundancy and resilience to their Exchange environments. This is the right call; I prefer to have at least three replicas of Exchange data available at all times. However, customers are often surprised when they suffer unplanned outages because they were not prepared to manage the additional layers of complexity that come with DAGs. Do you want to add resiliency to Microsoft Exchange and need additional information or real-world expertise? Credera has extensive experience in designing, planning, and implementing messaging solutions. If you have questions about this blog post, points of view, or IT infrastructure, please leave a comment below, tweet us @CrederaIT, or contact us online.