High Availability for Exchange 2010


How much do you rely on your email system? What would happen to productivity within your organization if your email system failed even for just a short period of time? These concerns bring about the topic of high availability.

A highly available system is typically defined as one with only a small amount of acceptable downtime in a given timeframe. For example, a system with 99.999% availability can be unavailable for only about 5 minutes in a given year. With email being a major communication component of most organizations today, the availability of these systems and their services has become a true focus. Exchange Server 2010 introduces new concepts and many needed improvements for mailbox availability.
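To see where the "about 5 minutes" figure comes from, here is a minimal Python sketch of the arithmetic. The function name and the choice of a 365.25-day average year are assumptions made for illustration, not part of any Exchange tooling.

```python
# Minutes in an average year (365.25 days, to account for leap years).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Compare a few common availability targets.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.1f} minutes of downtime per year")
```

For the "five nines" target this works out to a little over 5 minutes per year, while 99.9% already allows well over 8 hours.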

Let’s first take a quick look at how mailbox availability was addressed in previous Exchange Server versions.

Exchange 2003 and 2007 both utilized the services of Windows Failover Clustering. While 2003 offered what we may call traditional active/active or active/passive Windows clustering, 2007 introduced four options: SCC (single copy cluster), an active/passive-only multi-node failover cluster; LCR (local continuous replication), a single-server solution offering database availability but no server failover; CCR (cluster continuous replication), a two-node active/passive failover cluster; and SCR (standby continuous replication), offered with SP1, which provided an offline copy of the mailbox databases that could be kept in an offsite datacenter.

In these implementations, the clustering solution needed to be configured at the beginning of your deployment (prior to installing and configuring the Mailbox server role), making it somewhat difficult to add servers later. Also, failover occurred at the server level, causing potentially unnecessary service disruption for users whose mailboxes were on still-functioning databases. Another potential drawback was that in Exchange 2007, a CCR environment supported only the Mailbox server role. The Hub Transport and Client Access server (CAS) roles needed to be deployed separately from the Mailbox server. So even though the Mailbox server was highly available, the Hub and CAS roles were still single points of failure, and making them available required additional hardware and configuration. In all, you may have been looking at a minimum four-server solution for availability, with still no true disaster recovery solution in place. The addition of SCR provided the necessary option for offsite disaster recovery, but the process required some effort and technical expertise.

Exchange Server 2010 takes the best of all of these previous options and brings about what we now know as Database Availability Groups (DAGs). With a DAG, we now have the option of performing incremental deployments: the DAG can be created at any time after installing the Mailbox server role, and members can be added to it later with minimal effort. We are also no longer required to separate the Hub/CAS roles from the Mailbox role, as this deployment now supports combining these roles on a single server. Not only does this simplify your high availability environment, but you could also see a reduction in licensing and hardware costs!

A DAG allows up to 16 servers as its members for replication and up to 16 copies of any single database across these servers (with only one copy allowed per server). A new feature of Exchange 2010 called the “Active Manager” runs on all Mailbox servers that are DAG members. This component replaces the failover management features of earlier Exchange versions. The Active Manager determines which copies of a database will be active or passive, handles notifications of changes to the replication topology, and detects failures of the local database and local Information Store.
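The limits above are easy to check on paper before a deployment. The following Python sketch is purely illustrative: the function, the server and database names, and the data layout are all hypothetical and do not correspond to any Exchange API. It only encodes the three rules stated in this article (at most 16 members, at most 16 copies per database, one copy per server).

```python
MAX_DAG_MEMBERS = 16           # limit stated in the article
MAX_COPIES_PER_DATABASE = 16   # limit stated in the article


def validate_dag_layout(members, copies):
    """Check a planned DAG layout against the limits described above.

    members: list of Mailbox server names in the DAG (hypothetical names).
    copies:  dict mapping a database name to the list of servers that
             would host a copy of it.
    Returns a list of human-readable problems; an empty list means the
    plan respects the limits.
    """
    problems = []
    if len(members) > MAX_DAG_MEMBERS:
        problems.append(f"DAG has {len(members)} members, limit is {MAX_DAG_MEMBERS}")

    member_set = set(members)
    for database, servers in copies.items():
        unique_servers = set(servers)
        # Only one copy of a given database is allowed per server.
        if len(servers) != len(unique_servers):
            problems.append(f"{database}: more than one copy on the same server")
        # Every copy must live on a server that is a DAG member.
        outside = sorted(unique_servers - member_set)
        if outside:
            problems.append(f"{database}: copies on non-members {outside}")
        if len(unique_servers) > MAX_COPIES_PER_DATABASE:
            problems.append(f"{database}: more than {MAX_COPIES_PER_DATABASE} copies")
    return problems


if __name__ == "__main__":
    plan = {"DB1": ["MBX1", "MBX2"], "DB2": ["MBX2", "MBX2"]}
    for problem in validate_dag_layout(["MBX1", "MBX2"], plan):
        print(problem)
```

In this example plan, DB1 is fine, while DB2 is flagged because both of its copies sit on the same server.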

DAGs also provide an average database failover time of around 30 seconds (compared to 2 minutes with CCR). Notice that was “database” failover. A big benefit of a DAG is that failovers no longer occur at the server level, but at the individual database level. So, we can now fail over a single failed database without causing unnecessary disruption to users of other databases.
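The difference between server-level and database-level failover can be pictured with a toy model. The Python sketch below is an assumption-laden illustration, not how the Active Manager actually selects a copy: the `Database` class, the `fail_copy` function, and the simple "first healthy copy in list order" rule are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Database:
    name: str
    copies: list[str]   # servers hosting a copy, in a hypothetical preference order
    active: str         # server currently hosting the active copy
    failed: set[str] = field(default_factory=set)  # servers whose copy has failed


def fail_copy(db: Database, server: str) -> str:
    """Mark one database's copy on `server` as failed.

    If that copy was the active one, activate the next healthy copy.
    Only this database is touched; other databases on the same server
    keep running. Returns the server now hosting the active copy.
    """
    db.failed.add(server)
    if db.active != server:
        return db.active  # a passive copy failed, nothing to switch
    for candidate in db.copies:
        if candidate not in db.failed:
            db.active = candidate
            return db.active
    raise RuntimeError(f"{db.name}: no healthy copy left to activate")


if __name__ == "__main__":
    db1 = Database("DB1", ["MBX1", "MBX2"], active="MBX1")
    db2 = Database("DB2", ["MBX1", "MBX2"], active="MBX1")
    fail_copy(db1, "MBX1")          # only DB1's copy on MBX1 has failed
    print(db1.name, "is now active on", db1.active)
    print(db2.name, "is still active on", db2.active)
```

Here DB1 moves to MBX2 while DB2 stays active on MBX1, which is the point of database-level failover: users of DB2 never notice.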

DAGs do still utilize the functionality of failover clustering as previous versions did, but administrators no longer have to go through the painstaking motions of installing and configuring Windows clusters for the DAG to function. All management of the DAG is done right within the Exchange environment, which makes life much simpler. Oh, and for those of you previously left out of the cluster environment because you weren’t running the Enterprise edition: welcome back! Exchange 2010 Standard now supports DAGs, with up to 5 databases per server (compared to the 100 databases allowed in Enterprise).

All in all, it is now much simpler to deploy an Exchange environment that is both highly available and fault tolerant. It may also be worth noting that, according to Microsoft, Exchange 2010 has a 90% reduction in IOPS from Exchange 2003, so cheaper storage options may also be in your future. To make matters even better, if you deploy three or more database copies, Microsoft even recommends moving away from your RAID 5 or RAID 10 disk configurations and instead using a single SATA disk for each database and its transaction log files.

While the DAG is one of the most noted new features of Exchange 2010, it is certainly not the only one making administrators take a closer look at this product. More information on these features can be found in Microsoft Course 10135: Configuring, Managing and Troubleshooting Exchange Server 2010.