Tuesday, February 16, 2010

Exchange 2010 Site Disaster Recovery on a Dime! Part 1: Building the Solution



By Lasse Pettersson, Exchange MVP

Since Microsoft has made significant improvements to how Exchange handles disaster recovery of databases, many organizations have started to wonder how they can effectively protect themselves against site, datacenter and other such disasters. But not every company has the budget to implement a new infrastructure, so how can such companies still take advantage of these new technologies? The answer is in this article: I will explain how this can be accomplished with only two Exchange 2010 servers. In Part 1 we will discuss how to build the solution; in Part 2 we will move on to activating the disaster recovery site.

Please note that this solution does not give you High Availability, but it will provide you with a solution for site and server disaster.

This solution builds and depends upon the Exchange 2010 feature called Database Availability Group (DAG). DAG is the new High Availability feature of Exchange 2010 and is the evolution of the Exchange 2007 CCR, LCR and SCR replication technologies. A DAG can be built with as few as two Exchange servers holding the mailbox role, and with as many as 16, making this a very flexible solution. The beauty of the Exchange 2010 DAG feature is that its members can also host other Exchange server roles such as CAS and HUB, which is an attractive option for smaller organizations. To demonstrate how small a DAG can start, I will use only two servers in my example – one in the production site and one in the Disaster Recovery site. This represents the smallest installation that can be done for DAG, but remember that this is a flexible solution: if at any point you need to scale out with more DAG members, the steps you would perform are nearly identical.

Building the solution.


In both the production site and the Disaster Recovery site we need a server running the Enterprise edition of Windows Server, since DAG relies on Microsoft Failover Clustering, which is only available in the Enterprise edition. (Remember that Exchange itself also comes in either Standard or Enterprise edition. The Standard edition can be used with up to five databases, but if you need more than five then it is necessary to use the Enterprise edition of Exchange.) Both sites also need Domain Controllers and Global Catalog Servers. The DR (Disaster Recovery) site is most likely a separate site in Active Directory, which prevents users from accessing it.

Installing Exchange.

To install Exchange, you simply perform a standard Exchange installation in both sites. When you are finished you will have one Exchange server in the production site and one Exchange server in the DR site. Both servers can have all standard roles (i.e. Mailbox, HUB and CAS), but you can also install them on separate servers and have multiple roles on multiple servers.

To test that everything is functioning properly, I recommend creating a mailbox on each database that is mounted on each server, and then sending a test email from one mailbox to the other. Our configuration thus far is very basic since no clusters or DAGs have been built yet. At this point, our example consists of two Exchange servers located in different Active Directory sites.

Since DAG is one of the hottest new features in Exchange 2010, many articles have been written on the subject. Hence, I will walk you through the steps of creating a DAG fairly quickly.

Creating a DAG.
In the Exchange Management Console, go to Organization Configuration, then Mailbox, and select the ‘Database Availability Groups’ tab. Right-click and select ‘New Database Availability Group.’

The Create a DAG wizard starts.

Next, enter a name for your DAG. If you have a server with a HUB role but no mailbox role, then the wizard will select the HUB server and create the witness directory for you. If you don’t have an available HUB server, then you must manually specify the ‘Witness Server’ and a ‘Witness Directory.’

At this stage I need to caution you that a permission issue might occur when creating the File Share Witness directory. This is because the directory is not created in the security context of the logged-on user, but rather in that of the Exchange server computer account. The solution is to add the ‘Exchange Trusted Subsystem’ group to the local Administrators group on the witness server. Permissions matter in Active Directory as well, because creating a DAG also creates a computer account there. Thus, you might need to delegate the ‘Exchange Trusted Subsystem’ group the right to create and manage that computer account, or at least give it rights on a pre-staged, disabled computer account.
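The group change described above can be made from an elevated prompt on the witness server. This is only a minimal sketch: CONTOSO is a placeholder I have assumed for your own domain name, and it is not part of the original setup.

```powershell
# Run on the witness server from an elevated prompt.
# CONTOSO is a placeholder (an assumption) for your domain's NetBIOS name.
net localgroup Administrators "CONTOSO\Exchange Trusted Subsystem" /add

# List the members of the local Administrators group to confirm the change.
net localgroup Administrators
```

If the witness server is also a Domain Controller there is no local Administrators group to modify, so this sketch only applies to member servers.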

Exchange Management Shell or Wizard?

If you prefer Exchange Management Shell over the Wizard, below is the command you need to create a DAG:


New-DatabaseAvailabilityGroup -Name DAG1 -WitnessDirectory C:\DAG1 -WitnessServer FQDNofaServerinPrimarySite -DatabaseAvailabilityGroupIpAddresses 192.168.15.233,192.168.25.233 -Verbose

The Exchange Management Shell is a better approach than the Wizard when you consider the following: with the Wizard you cannot set a fixed IP on your DAG. Instead, it will use DHCP to assign an IP. This is important to consider since it is recommended that you have an IP in every subnet that contains DAG members. The reasoning behind this is that when DAG moves to a different IP subnet, it needs to have a valid IP address on that IP subnet.

Adding the Verbose parameter makes the command print more information as it runs, which gives you clues in case something goes wrong.

Why is having fixed IP for your DAG preferable to using DHCP?

Remember that a DAG is actually a failover cluster, and in order for the cluster to function its IP address must be up and running. Since not every company uses DHCP on the server subnets (some only use it on client subnets), it is often more convenient to have a fixed IP.

The next step is to add your Exchange mailbox servers to your DAG.

Click ‘Manage Database Availability Group Membership’ and then add the mailbox server to it.
If everything works out accordingly, then the Failover Cluster role will be installed on the servers you added to your DAG. You can start the Failover Cluster Management tool and see that there is a cluster called DAG1 that contains your two mailbox servers. The computer account should also be enabled, and the witness directory should be shared and also populated with a couple of files.

Below is the Exchange Management Shell command, which you must run once for each mailbox server that you add:


Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer FQDNofMailboxServer -Verbose

Remember to allow AD replication between each step, otherwise you may not be able to join servers to your DAG.
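If you do not want to wait for the normal replication schedule, something along these lines can push the changes out and then confirm that both servers have joined. Treat it as a sketch: it assumes the DAG is named DAG1 as in the commands above and that repadmin is available where you run it (typically on a Domain Controller).

```powershell
# Trigger AD replication for all partitions, across sites, pushing changes outward.
repadmin /syncall /AdeP

# Confirm the DAG membership and witness settings from the Exchange Management Shell.
# -Status asks for live cluster information such as OperationalServers.
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status |
    Format-List Name, Servers, WitnessServer, WitnessDirectory, OperationalServers
```

Both mailbox servers should be listed under Servers before you move on to adding database copies.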

You should also see that a DAG network has been created, and if you have multiple networks on your mailbox servers then there should be multiple DAG networks. Even though you can run a DAG on a single network, it is often better to have multiple NICs and networks in your servers because it gives you the ability to separate MAPI, cluster and replication traffic into different networks.
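To see which networks were discovered, and to keep replication off the MAPI network when a second network exists, a sketch like the following may help. The network name DAG1\DAGNetwork01 is an assumption on my part; use whatever names the first command returns in your environment.

```powershell
# List the DAG networks and what each one is currently used for.
Get-DatabaseAvailabilityGroupNetwork -Identity DAG1 |
    Format-List Name, Subnets, ReplicationEnabled, MapiAccessEnabled

# Optional, and only sensible when a dedicated replication network exists:
# stop using the MAPI network for log shipping.
# 'DAG1\DAGNetwork01' is an assumed name, not one taken from this article.
Set-DatabaseAvailabilityGroupNetwork -Identity 'DAG1\DAGNetwork01' -ReplicationEnabled:$false
```

With only one network per server, as in the two-server example here, leave replication enabled on it.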

The next step is to add databases to your DAG members in order to enable replication. Up to this point, each server had only one database mounted but now we would like to add more to it.

Click ‘Add Mailbox Database Copy.’

Next, select which servers you want to hold a copy of the mailbox database and the ActivationPreference.

Below is the Exchange Management Shell command:


Add-MailboxDatabaseCopy -Identity 'Mailbox Database 2036433681' -MailboxServer FQDNofServerInDRSite -ActivationPreference 2

This step can potentially take a long time since the database is seeded to the DR (Disaster Recovery) site; how long it takes depends on the database size and available bandwidth.

Now we must set some parameters on the mailbox database so that it is not automatically activated.

From Exchange Management Shell (EMS) run the following command:


Suspend-MailboxDatabaseCopy -Identity 'Mailbox Database 2036433681\FQDNofServerInDRSite' -ActivationOnly -Verbose

This ensures that replication still happens automatically, while activation of the copy does not.

Next, repeat this for every mailbox database so that each one has a copy on both of your servers, with the ActivationPreference set to 1 on the server in the production site; then suspend activation for each database copy on the server in the Disaster Recovery site.
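Once every database has been handled, a quick check such as the one below can confirm the result. It is a sketch that reuses the FQDNofServerInDRSite placeholder from the commands above.

```powershell
# Show every database copy hosted on the DR server.
# ActivationSuspended should read True for each copy that must not activate automatically.
Get-MailboxDatabaseCopyStatus -Server FQDNofServerInDRSite |
    Format-Table Name, Status, ActivationSuspended -AutoSize
```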

Configuring Replay Lag Time

Configuring Replay Lag time is something that you should seriously consider doing. Lag time is how long the passive copy will wait until the transaction log is replayed into the database. Replication is still happening as fast as possible.

Below is the EMS command:
Set-MailboxDatabaseCopy -Identity 'mailbox database 1976375852\FQDNofServerInDRSite' -ReplayLagTime 0.1:0:0 -Verbose

(Please note: 0.1:0:0 means 1 hour. In real life you should most likely set this to a higher value.)

There is also another parameter that you might want to use: the Truncation Lag Time.

Below is the EMS command:
Set-MailboxDatabaseCopy -Identity 'mailbox database 1976375852\FQDNofServerInDRSite' -TruncationLagTime 0.2:0:0

(Please note: 0.2:0:0 means 2 hours. In real life you should probably set this to another value.)

How long you set the ReplayLagTime and TruncationLagTime for depends on two things: 1) How long it takes you to notice a corruption on the production site, and 2) How long it takes to replay all transaction log files if you activate the DR site. For instance, if you know you can detect a corruption in the active datacenter within 10 hours, then you should probably set the ReplayLagTime to 12 hours or so to allow for recovery of all non-corrupted data. Also consider the amount of disk space you have when setting the ReplayLagTime, since the logs waiting to be replayed have to be stored on the DR server.
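As a worked example of the 12-hour case, the value is written in the same days.hours:minutes:seconds form used earlier. This sketch reuses the database and server placeholders from the commands above.

```powershell
# 0.12:0:0 is read as 0 days, 12 hours, 0 minutes, 0 seconds.
Set-MailboxDatabaseCopy -Identity 'mailbox database 1976375852\FQDNofServerInDRSite' -ReplayLagTime 0.12:0:0 -Verbose

# Read the configured lag values back from the database object to confirm them.
Get-MailboxDatabase -Identity 'mailbox database 1976375852' |
    Format-List Name, ReplayLagTimes, TruncationLagTimes
```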

More information about Managing Mailbox Database Copies can be found on Technet: http://technet.microsoft.com/en-us/library/dd335158.aspx

For more information on creating a DAG, click here: http://msexchangeteam.com/archive/2009/06/14/451609.aspx

Creating the CASArray.
Now your DAG and databases should be all ready to go! Remember to monitor the replication with Get-MailboxDatabaseCopyStatus -Server FQDNofServer

CopyQueueLength and ReplayQueueLength should show small numbers if possible, preferably zero or one, but in real life you will see higher values depending on your bandwidth, server load, etc.
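A slightly fuller monitoring sketch, using the same FQDNofServer placeholder, might look like this:

```powershell
# Show only the columns that matter for replication health.
Get-MailboxDatabaseCopyStatus -Server FQDNofServer |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength -AutoSize

# Run the built-in replication checks against the same server.
Test-ReplicationHealth -Identity FQDNofServer
```

Keep in mind that on a copy with a ReplayLagTime configured, a large ReplayQueueLength is expected, because the logs are deliberately held back before being replayed.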

Why do you need a ClientAccessArray?

Technically, this is not required, but it is highly recommended because a system that has one is easier to manage. Since the array is only a name that you can point to another IP, it also lets you move your client connection point.


Move client connection point?!

Yes, the Outlook MAPI connection is moved from the Information Store on the mailbox server to the CAS (and the CASArray name if you have one defined.)

New-ClientAccessArray -Name CASArray-HQ -Fqdn FQDNofYourDesiredEndpoint -Site ADsiteInPrimaryDatacenter


For more information on the New-ClientAccessArray, click here:
http://technet.microsoft.com/en-us/library/dd351149.aspx

Now configure all your databases to have the CASArray-HQ object as the RpcClientAccessServer. This will ensure that Outlook connects to the CASArray FQDN instead of the actual server name.

Get-MailboxDatabase | Set-MailboxDatabase -RpcClientAccessServer CASArray-HQ

You must also create a record in DNS for FQDNofYourDesiredEndpoint that points to the IP of your Exchange server in the primary datacenter. Set the TTL to a low value, such as 5 minutes, to make the switchover to the Disaster Recovery site go faster.

When Outlook connects, it will now connect to the ‘FQDNofYourDesiredEndpoint’ name. Also, if you look at the MAPI settings, Outlook thinks that the FQDNofYourDesiredEndpoint is the Exchange mailbox server.
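Before testing with Outlook, it may be worth confirming the three pieces just configured. This is a sketch that reuses the FQDNofYourDesiredEndpoint placeholder from above.

```powershell
# The array object and the AD site it belongs to.
Get-ClientAccessArray | Format-List Name, Fqdn, Site

# Every database should now list the array as its RPC endpoint.
Get-MailboxDatabase | Format-Table Name, RpcClientAccessServer -AutoSize

# The DNS name should resolve to the Exchange server in the primary datacenter.
[System.Net.Dns]::GetHostAddresses('FQDNofYourDesiredEndpoint')
```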

Configuring Autodiscover

For Outlook to connect properly you must make sure to configure Autodiscover correctly.
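One way to review the Autodiscover settings is sketched below. The server name NameOfCasServer and the URL are hypothetical values I have assumed for illustration; substitute the names that match your certificate and DNS.

```powershell
# Review the Autodiscover URI that each CAS publishes and the AD sites it serves.
Get-ClientAccessServer |
    Format-List Name, AutoDiscoverServiceInternalUri, AutoDiscoverSiteScope

# Point a CAS at the URL that clients should use.
# Both the server name and the URL below are placeholders (assumptions).
Set-ClientAccessServer -Identity NameOfCasServer -AutoDiscoverServiceInternalUri 'https://mail.example.com/Autodiscover/Autodiscover.xml'
```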

At this point you should have two servers with the Mailbox, HUB, and CAS roles on each one, a DAG with the two servers (one in each AD site), and a CASArray located on the server in the primary AD site.

Failovers will not occur automatically because of the configurations we did on the mailbox databases. Thus, if you reboot the primary server then clients will lose connection to their mail.

I hope you have enjoyed this tutorial on Exchange Server 2010 site disaster recovery, and that you were able to follow my instructions and begin preparing your organization for the worst-case scenario: site or server disaster. Now that you know how to build the solution, in Part 2 of this piece we will move on to activating the disaster recovery site, at which point I will explain how to back up, test and perform a switchover should your Exchange server fail.


Tuesday, February 2, 2010

Dude, Where's my Backup?

By Mahmoud Magdy

When I started writing this post, I couldn’t get the movie “Dude, Where’s My Car?” and its events out of my head for two reasons. The first reason is that the conundrum the film’s characters find themselves in reminds me of a similar event in which one of my customers experienced an Exchange disaster and I was brought in to assist. I realized straightaway the client needed to perform a backup, and when I informed the Exchange Administrator of this he frantically turned to his Backup Administrator and asked him, ‘Dude, where’s my backup?!’

The second reason this movie reminded me of that incident is because the same clueless looks that the film’s starring actors had on their faces when they awoke after a crazy night to find their car missing were identical to the looks on the faces of the Exchange and Backup Admin when I asked them to perform a backup. In fact, I see that look on many of my customers’ faces when I ask them to restore an Exchange backup set for me! Those days of panic-stricken looks and long hours spent worrying over data loss, log deletion, and mailbox restoration are now over. Join me as we explore one of the undocumented features of Exchange 2010: the backup-less deployment.

Backups in Exchange have always been a point of concern for me due to my experiences while working as an Infrastructure Manager. In one instance I thought I had done everything I should have: I had everything in place, our Exchange was up and running and I had assigned a team to back up Exchange, AD, SQL and most of our critical systems. We tested the restore steps and everything ran smoothly, but when we had a disaster you can already predict what happened – we experienced another backup set failure which cost us two hours of downtime.

The secret to successfully restoring Exchange has always been a mystery. A successful restore even for an Exchange guru is a tedious task! We are fortunate that today we have assistance in the form of database portability, PowerShell, and wizards for backups and restores; but even with this help, the task of restoring Exchange remains tedious.

Other issues that arose with the introduction of Exchange 2007 were the single item and single mailbox restoration. It is now possible to restore a single mailbox, or better yet a single item, but clever software is needed to perform the task. You must also properly train and prepare your IT staff, and remember that the software and hardware requirements for either type of restoration are expensive. You must carefully compare your options when purchasing decent backup software since their prices can be high.

You say you want a revolution…

Well you know, when Microsoft introduced Exchange 2010 that’s exactly what they brought. For the first time, Microsoft is recommending that administrators perform backup-less deployments. When I heard that I laughed out loud, as I am sure many of you are doing now. For years, as consultants and as customers, we have been told to back up everything, most importantly our Exchange data. So exactly how will this revolution of backup-less Exchange deployments change the world?

So Microsoft has a real solution…

Would you like to hear the plan? If you are shocked by this new recommendation then let me set your mind at ease: ‘it’s gonna be alright.’ If you feel like it will take awhile for you to trust Microsoft’s recommendation then you are not alone. It took me, a technically savvy (and extremely humble) guy nearly 3 months to accept this fact, but what it really took was for me to design a backup-less configuration for the first time. After designing this configuration I have learned the benefits of going backup-less, so please join me as I explain them to you.

Backups’ Background:

Historically, I have always considered Exchange backups to be more important than Exchange itself. ‘Why?’ you might ask. My answer is multi-fold: a backup guarantees that I can get back online if the system goes down. Plus, it enables me to recover items for users that have been hard deleted, and more importantly it is the only way to flush and delete the logs of the mailbox database (previously this was tied to the Storage Group.) The other not-so-popular method of deleting such logs is known as circular logging.

Backup in Exchange 2003 was straightforward, but with Exchange 2007 Microsoft introduced the concept of database copies, which provided a new way to back up your Exchange data.

Now you can perform a backup from the passive copy, which relieves the online copy of part of the load it is already suffering from (i.e. IOPS, user access, AV scans.) When you back up the passive copy and that backup completes, the database is marked as backed up and the logs are deleted from both the passive and the active copy.

As mentioned previously, doing Exchange backups historically required costly backup software as well as hardware, including storage, backup tapes, tape libraries, and all the backup hassle that goes with them.

Microsoft made a bold decision to change the Exchange world by introducing backup-less configuration, which I will now discuss in more detail.

Less is more, don’t you agree?

What does backup-less really mean? It simply means that you do not have to back up your Exchange data, or at the very least it gives you the ability for the first time to not have to back it up.

I can completely understand many of you doubting that this is in fact a possibility, to never have to back up your Exchange data, but before you make your decision let us explore backup-less architectures and learn how they really work.

Backup-less Architecture:
As stated above, backup in Exchange 2007 (E12) could be done from the passive copy, but this is only true for CCR or LCR. At the time, this was a viable option: you back up the passive node, and once the backup is done the passive copy updates the database header and notifies the active node, which then deletes the logs.

Issues to consider before designing or deploying a Backup-less configuration:
- Data protection, Database health, Database recovery.
- What to do when you lose data.
- How to delete your logs.
- How to restore items and mailboxes like before.

In order to address these issues, you must understand how Backup-less Configurations work:

When you want to configure your Exchange as backup-less, you should have at least two copies of the data (Active/Passive.) Microsoft recommends going backup-less with three or more copies (Active/Passive/Passive.) In order to follow that recommendation, you must maintain three copies of the data and configure circular logging on the mailbox database.

I can hear you saying, ‘Circular logging?! No way!’ And I understand your reaction, but keep in mind we never do circular logging unless we have strong reason to, so let us see how circular logging works with the backup-less.

Real World Example:
To illustrate how circular logging works with backup-less, let us consider the following example:

You have a mailbox store called MB1 that has 3 copies of it on Servers 1, 2 and 3. MB1 is active on Server 1 and has two copies on Servers 2 and 3. Now you want to configure it in Backup-less. All you have to do is configure the mailbox database to do circular logging, and once you do so Exchange will change its architecture slightly and perform circular logging in another way.
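In the Exchange Management Shell, enabling circular logging for the MB1 database from this example could look like the sketch below. It assumes MB1 already has its copies in place on Servers 2 and 3.

```powershell
# Enable circular logging on the example database MB1.
Set-MailboxDatabase -Identity MB1 -CircularLoggingEnabled $true

# Confirm that the setting is now recorded on the database.
Get-MailboxDatabase -Identity MB1 | Format-List Name, CircularLoggingEnabled
```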

With classic circular logging, the logs are written to the hard disk and, once the data is committed to the database, the logs are flushed. In a backup-less configuration (DAG environment only) Exchange behaves differently: logs are written but are not flushed until they have been replicated to the other database copies and marked as inspected there.

To understand this, let us go back to our example: MB1 has log E01 that is waiting to be written. E01 is written to the DB and now it is held on Server 1, when before it would have been flushed.

Server 1 replicates E01 to Server 2. Server 2 copies the log (which still remains on Server 1), checks it, marks it as healthy/inspected and notifies Server 1. Server 1 does the same with Server 3, and once Server 3 verifies its copy of E01 and reports to Server 1 that it is healthy/inspected, Server 1 deletes and flushes the log.
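The rule in this walkthrough can be pictured with a small toy model. None of the names below are Exchange cmdlets or properties, and the generation numbers are made up; the sketch only illustrates the idea that Server 1 keeps a log until every copy has inspected it.

```powershell
# Toy model only: highest log generation each passive copy has inspected (made-up numbers).
$inspectedUpTo = @{ 'Server2' = 105; 'Server3' = 98 }

# Server 1 may only discard generations that every copy has already inspected,
# so the slowest copy sets the limit.
$safeGeneration = ($inspectedUpTo.Values | Measure-Object -Minimum).Minimum

"Server 1 may delete logs up to generation $safeGeneration"
```

With these sample numbers Server 3 is the slowest copy, so it decides how far Server 1 can truncate.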

There are 2 questions that might arise at this point:
- Why didn’t Exchange wait until the log is replayed at Server 2 and Server 3?
- Does Server 1 wait until it replicates the data to all of its adjacent servers? (In our example server 2 and server 3)

The answer to the first question is Exchange will not wait for the log replay because you might have a lagged replay configured on your DB copy. This means that you might replay the logs 48 hours later which translates into huge numbers of logs for Exchange.

I do not have a confirmed answer to the second question yet, but if you attended an Exchange 2010 Advanced storage session you would know that an Exchange server can recover and resend the logs, and even better, the specific bits in case of database corruption. But if Server 1 deletes its logs and the same for Server 2, then where does Server 3 get its logs from?

Hopefully by now the answer to that question is a little bit clearer. Exchange now has a self-based mechanism to flush its logs, but Backup-less configuration is not a specific setting that you assign to Exchange. By that I mean you don’t go to the options page and check the box stating this is a Backup-less organization; rather, this is a group of configurations that you apply to Exchange so you can deploy a Backup-less configuration. It is important to remember that this behavior is the same if you have 2 copies and do circular logging, even if you do backup.

There are several pertinent questions that we should answer one at a time:
- What about the health of my Database, Database availability, and uptime?
Exchange 2010 has a self-healing mechanism. What that means is that if page No. 485950 gets written to a bad block, or gets corrupted logically or physically, then Exchange 2010 can replicate this page from another server by copying only the required page with the next replication cycle. This keeps the Exchange database healthy and minimizes the replication requirements.

If Exchange cannot make the active database healthy then we have DAGs that pick the best available copy and make it an active copy. Typically if a physical server failed, a Hard disk failed, or a database failed physically or logically, you would not need your backup since you already have two copies. This means you don’t need your backup! (Are you becoming a backup-less fan yet?)

Now the other dimension is minimizing the storage cost. Since you have three copies of the database, and since Exchange 2010 needs 70% fewer IOPS, you no longer need expensive SCSI disks, or even a SAN. I recommend using a JBOD configuration, which is much more cost effective than any other storage option. Thus, in a backup-less configuration, you can have three copies of your data and reduce both the backup software and hardware cost. (Considering jumping on the backup-less bandwagon now?)

- What should I do if I want to replace a single item or a mailbox?

Before answering that, first ask yourself how many times as an Exchange admin you had to do that (restore an item or mailbox for a user). In my career, I only had to do it at most three to five times. It might be different in your organization, but in general most Exchange administrators do not need to do that on regular basis.

Since we have cheaper storage we can increase the retention of the mailbox database dumpster. It is set at 14 days by default, but now you can increase it and ask the users to recover their deleted items themselves. You can also use the new RBAC (role-based access control) model and give helpdesk personnel the permission to search the Exchange dumpster and perform discovery within it using PowerShell in order to recover items for users, meaning you as the Exchange Admin do not have to!
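Both changes can be sketched in the Exchange Management Shell as follows. The 30-day value and the HelpdeskUser account are assumptions chosen for illustration; MB1 is the example database used earlier.

```powershell
# Raise deleted item retention from the default 14 days to 30 days (days.hours:minutes:seconds).
Set-MailboxDatabase -Identity MB1 -DeletedItemRetention 30.00:00:00

# Let a helpdesk account run discovery searches by adding it to the built-in role group.
# 'HelpdeskUser' is a hypothetical account name.
Add-RoleGroupMember -Identity 'Discovery Management' -Member HelpdeskUser
```

A longer retention period makes the databases grow, so size your disks with that in mind.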

- Don’t I need a backup at all?

I will not say that you don’t need to back up the Exchange system at all, but you might want to consider backing it up as a second layer of protection. If you do deploy a backup-less configuration, then your first line of defense is not the backup sets any more; it is your Exchange 2010 backup-less configuration. In other words, it is done automatically.

I know that after being told for years to back up everything, most especially Exchange data, it will be difficult to change your thinking radically with a single article. You probably have regulations that make you comply with a 3 years’ restore SLA. But if you are one of the Exchange admins that do not have to abide by such regulations, then you should consider a backup-less configuration.

Hopefully you now understand the architecture change of circular logging, DAGs, and how to do backup-less configurations. Backup-less configuration is still an undocumented feature of Exchange 2010 and you will not find much information about it. My recommendation is that you open your mind to the idea and take care in calculating the total cost required for backup gear as compared to the backup-less cost, without forgetting their technical and operational requirements as well. I cannot say that backup-less is for everyone, but it is a great option that can save you money, and one you should give decent thought to.

I look forward to bringing you another thought-provoking article within a month, and until that time I wish you the best uptimes and the fastest Exchange servers!
