Tuesday, March 2, 2010

Understanding Exchange 2010 Storage Architecture: Part 3

By Mahmoud Magdy

In Part 1 of this series, we reviewed Microsoft’s ESE (Extensible Storage Engine) and discussed the new storage enhancements that were introduced in Exchange 2010.

In Part 2, we continued our journey through the Exchange 2010 storage enhancements by exploring the concepts of logical and physical changes to the Microsoft ESE database.

In the final part of this series, we are going to explore some very clever changes Microsoft has made that significantly improve system performance. In this article we are going to take a look at the following areas:

- Read/write Coalescing and page compression
- Cache compression
- Online maintenance

Read/write coalescing and page compression:
In Part 1 of this series, we noted that the database page size has been increased from 8 KB in Exchange 2007 to 32 KB in Exchange 2010. How does this change improve performance? Let’s compare how Exchange 2007 and 2010 would handle a 20 KB email item. Exchange 2007 would require 3 separate IOs to read this single email item, compared to one read operation in Exchange 2010. Please see the following diagram to better understand this concept.
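
The arithmetic behind that comparison can be sketched in a few lines of PowerShell. This is only a back-of-the-envelope model: it assumes one IO per database page and ignores page headers and any other overhead, and the variable names are purely illustrative.

```powershell
# Back-of-the-envelope model: one IO per page, no page overhead (assumption).
$itemSizeKB = 20

foreach ($pageSizeKB in 8, 32) {
    # Number of pages (and therefore reads) needed to hold the whole item.
    $reads = [math]::Ceiling($itemSizeKB / $pageSizeKB)
    "{0} KB pages: {1} read(s) for a {2} KB item" -f $pageSizeKB, $reads, $itemSizeKB
}
```

Under these assumptions the 8 KB case comes out at 3 reads and the 32 KB case at 1, which matches the figures quoted above.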






To better understand why this change is significant, it is helpful to look back at how data was managed in legacy versions of Exchange. Exchange 2003 was much like an infant in that it basically did whatever it needed, whenever it wanted, with regard to the database. Most babies eat, sleep, relieve themselves and play on their own schedule, much to their parents’ dismay. That pretty much sums up how Exchange 2003 used to write, read or delete items in the database. When Exchange 2007 was introduced, it was evident that the technology had grown up. This progression has naturally continued in Exchange 2010.

One big area of improvement is that Exchange 2010 has a much larger page size and manages read/write operations more efficiently. For example, when a read or write operation is received, the size of the item is compared to the page size to determine whether the operation should be committed immediately or should wait to gather more changes so that a single IO can be used. The following diagrams show how Exchange 2010 is much more efficient than Exchange 2007.
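
To make the idea of gathering changes into a single IO more concrete, here is a toy PowerShell model. It is not how ESE is implemented; it simply assumes that pending writes to adjacent page numbers can be issued together as one IO, and the page numbers are made up for illustration.

```powershell
# Toy model only: page numbers with pending changes (made-up values).
$dirtyPages = 10, 11, 12, 20, 21, 35 | Sort-Object

# Group adjacent page numbers into runs; each run stands for one coalesced IO.
$runs  = @()
$start = $dirtyPages[0]
$prev  = $start
foreach ($page in ($dirtyPages | Select-Object -Skip 1)) {
    if ($page -ne $prev + 1) {
        # Gap found: close the current run and start a new one.
        $runs += "$start-$prev"
        $start = $page
    }
    $prev = $page
}
$runs += "$start-$prev"

"{0} pages written in {1} IOs: {2}" -f $dirtyPages.Count, $runs.Count, ($runs -join ', ')
```

With the sample values, six changed pages collapse into three IOs instead of six.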



Notice how the same Read operation takes 3 transactions in Exchange 2007 as opposed to a single operation in Exchange 2010. The diagram below highlights how a Write process is handled more efficiently in Exchange 2010.



Now that Exchange 2010 has a larger page size, what happens to partially filled pages in the cache? Since the cache is in memory and not on the hard disk, you might assume that the whole 32 KB page would be held in memory and that this would waste memory. The engineers at Microsoft thought about this problem when designing Exchange 2010 and created a nice solution. When a page is not fully utilized, Exchange uses cache compression. For example, a page with 7 KB of data will be compressed in memory so that the extra space is not wasted. The diagram below graphically represents this concept.



Online Defragmentation:
In previous versions of Exchange, database maintenance was handled nightly by the Exchange server. This included page purges and the handling of mailboxes that were deleted.
Due to the many storage-related enhancements that were introduced in Exchange 2010, Microsoft wanted to ensure that faults, whether logical or physical, were detected and handled as soon as possible. For this reason, the way Exchange maintains its databases was changed. Instead of waiting until the evening to perform these important tasks, Exchange now performs maintenance constantly.



You can enable a special option on the database to turn on 24/7 database maintenance. This new feature also recovers white space on the fly, which largely eliminates the need to perform offline defragmentation of the database. It also allows the database to recover from logical or physical errors at the database level in real time.
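
As a sketch of what this might look like in the Exchange Management Shell, the commands below switch on background maintenance for one database and then report its size and reusable white space. The database name 'DB01' is a placeholder, and I am assuming that the option referred to here is the BackgroundDatabaseMaintenance setting on the mailbox database, so verify it in a test environment first.

```powershell
# 'DB01' is a placeholder database name.
# Assumption: the 24/7 maintenance option is the BackgroundDatabaseMaintenance flag.
Set-MailboxDatabase -Identity 'DB01' -BackgroundDatabaseMaintenance $true

# Report the database size and the white space available for reuse.
Get-MailboxDatabase -Identity 'DB01' -Status |
    Format-List Name, BackgroundDatabaseMaintenance, DatabaseSize, AvailableNewMailboxSpace
```

The -Status switch is needed for the size and white space values to be populated.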



The diagrams above show the new features and architectural changes that were made to online defragmentation (OLD), which thus became OLD2. OLD2 runs as a throttled background process that maintains the contiguity of “Sequential Tables” by rebuilding the leaf level of their B+ Trees, which gives Exchange 2010 the ability to keep a defragmented database like the one below:

The diagrams above show a live view comparing the fragmentation level of an Exchange 2010 database with that of an Exchange 2007 database; it is clear that Exchange 2010’s continuous maintenance has the edge over the older technology.


Talking about storage in Exchange 2010 never ends. I like reading and writing about it, but at some point it has to stop. I hope that I was able to give you a solid view of the Exchange 2010 storage architecture and its new features. In my next article, I will talk about certificate consolidation and considerations for Exchange/OCS deployments. I hope that you liked this series and that you will keep visiting our blog for cool blog entries from fellow ESE writers.


Tuesday, February 16, 2010

Exchange 2010 Site Disaster Recovery on a Dime! Part 1: Building the Solution



By Lasse Pettersson, Exchange MVP

Since Microsoft has made significant improvements to how Exchange handles disaster recovery of databases, many organizations have started to wonder how they can effectively protect themselves against site, datacenter and other such disasters. But not every company has the budget to implement a new infrastructure, so how can such companies still take advantage of these new technologies? The answer is in this article -- I will explain how this can be accomplished with only two Exchange 2010 servers. In Part 1 we will discuss how to build the solution; then in Part 2 we will move on to discover how to activate the disaster recovery site.

Please note that this solution does not give you High Availability, but it will provide you with a solution for site and server disaster.

This solution builds and depends upon the Exchange 2010 feature called Database Availability Group (DAG). DAG is the new High Availability feature of Exchange 2010 and is the evolution of the Exchange 2007 CCR, LCR and SCR replication technologies. A DAG can be built with as few as 2 Exchange Mailbox servers and as many as 16, making this a very flexible solution. The beauty of the Exchange 2010 DAG feature is that DAG members can also host other Exchange server roles such as CAS and HUB, which is an attractive option for smaller organizations. To demonstrate the scalability of the DAG feature, I will use only two servers in my example – one in the production site and one in the Disaster Recovery site. This represents the smallest installation that can be done for a DAG, but remember this is a flexible solution, so if at any point you need to scale out with more DAG members, the steps you would perform are nearly identical.

Building the solution.


In both the production site and the Disaster Recovery site we need a server running Windows Server Enterprise edition, since DAG relies on Microsoft Failover Clustering, which is only available in the Enterprise edition. (Remember that Exchange comes in either Standard or Enterprise edition. The Standard edition can be used with up to five databases, but if you need more than five then it is necessary to use the Enterprise edition of Exchange.) Both sites also need Domain Controllers and Global Catalog Servers. The DR (Disaster Recovery) site is most likely a different site in Active Directory to prevent users from accessing it.

Installing Exchange.

To install Exchange, you simply perform a standard Exchange installation in both sites. When you are finished you will have one Exchange server in the production site and one Exchange server in the DR site. Both servers can have all standard roles (i.e. Mailbox, HUB and CAS), but you can also install them on separate servers and have multiple roles on multiple servers.

To test that everything is functioning properly, I recommend creating a mailbox on each database that is mounted on each server, and then sending a test email from one mailbox to the other. Our configuration thus far is very basic since no clusters or DAGs have been built yet. At this point, our example consists of two Exchange servers located in different Active Directory sites.

Since DAG is one of the hottest new features in Exchange 2010, many articles have been written on the subject. Hence, I will walk you through the steps of creating a DAG fairly quickly.

Creating a DAG.
In the Exchange Management Console, go to Organization Configuration, then Mailbox, select the ‘Database Availability Groups’ tab, right click and select ‘New Database Availability Group.’

The Create a DAG wizard starts.

Next, enter a name for your DAG. If you have a server with a HUB role but no mailbox role, then the wizard will select the HUB server and create the witness directory for you. If you don’t have an available HUB server, then you must manually specify the ‘Witness Server’ and a ‘Witness Directory.’

At this stage I need to caution you that a permission issue might occur when creating the File Share Witness directory. This is because it is not the logged-on user’s security context that is used when creating the File Share Witness directory, but rather that of the Exchange server computer account. The solution is to add the ‘Exchange Trusted Subsystem’ group to the local Administrators group on the witness server. Permissions also matter because creating a DAG creates a computer account in Active Directory. Thus, you might need to delegate the ‘Exchange Trusted Subsystem’ group the right to create and manage that computer account in Active Directory, or at least pre-stage a disabled computer account and let the group manage it.

Exchange Management Shell or Wizard?

If you prefer Exchange Management Shell over the Wizard, below is the command you need to create a DAG:


New-DatabaseAvailabilityGroup -Name DAG1 -WitnessDirectory C:\DAG1 -WitnessServer FQDNofaServerinPrimarySite -DatabaseAvailabilityGroupIpAddresses 192.168.15.233,192.168.25.233 -Verbose

The Exchange Management Shell is a better approach than the Wizard when you consider the following: with the Wizard you cannot set a fixed IP on your DAG. Instead, it will use DHCP to assign an IP. This is important to consider since it is recommended that you have an IP in every subnet that contains DAG members. The reasoning behind this is that when DAG moves to a different IP subnet, it needs to have a valid IP address on that IP subnet.

Adding the Verbose parameter makes the command print more information as it runs, which gives you clues in case something goes wrong.

Why is having fixed IP for your DAG preferable to using DHCP?

Remember that a DAG is actually a failover cluster, and in order for the cluster to function its IP address must be up and running. Since not every company uses DHCP on the server subnets (some only use it on client subnets), it is often more convenient to have a fixed IP.

The next step is to add your Exchange mailbox servers to your DAG.

Click ‘Manage Database Availability Group Membership’ and then add the mailbox server to it.
If everything works out accordingly, then the Failover Cluster role will be installed on the servers you added to your DAG. You can start the Failover Cluster Management tool and see that there is a cluster called DAG1 that contains your two mailbox servers. The computer account should also be enabled, and the witness directory should be shared and also populated with a couple of files.

Below is the Exchange Management Shell command, which you must run once for each mailbox server that you add:


Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer FQDNofMailboxServer -Verbose

Remember to allow AD replication between each step, otherwise you may not be able to join servers to your DAG.
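
Once both servers have been added, a quick check from the Exchange Management Shell can confirm what the Failover Cluster Management tool shows. This is a minimal sketch that uses the DAG1 name from the example above; the property list is simply the subset I find useful.

```powershell
# Show the DAG members and witness settings; -Status adds live cluster details.
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status |
    Format-List Name, Servers, OperationalServers, WitnessServer, WitnessDirectory
```

Both mailbox servers should appear under Servers, and under OperationalServers once they are up.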

You should also see that a DAGNetwork has been created, and if you have multiple networks on your mailbox servers then there should be multiple DAGNetworks. Even though you can run a DAG on a single network, it is often better to have multiple NICs and networks in your servers because this gives you the ability to separate MAPI, cluster and replication traffic onto different networks.
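
To see which DAG networks were created and what each one is used for, something like the following can be run from the Exchange Management Shell. It is only a sketch for inspection and changes nothing.

```powershell
# List every DAG network with its subnets and whether it carries
# MAPI and/or replication traffic.
Get-DatabaseAvailabilityGroupNetwork |
    Format-List Identity, Subnets, MapiAccessEnabled, ReplicationEnabled
```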

The next step is to add databases to your DAG members in order to enable replication. Up to this point, each server had only one database mounted but now we would like to add more to it.

Click ‘Add Mailbox Database Copy.’

Next, select which servers you want to hold a copy of the mailbox database and the ActivationPreference.

Below is the Exchange Management Shell command:


Add-MailboxDatabaseCopy -Identity 'Mailbox Database 2036433681' -MailboxServer FQDNofServerInDRSite -ActivationPreference 2

This step can potentially take a long time since the database is seeded to the DR (Disaster Recovery) site; how long it takes depends on the database size and available bandwidth.

Now we must set some parameters on the mailbox database so that it is not automatically activated.

From Exchange Management Shell (EMS) run the following command:


Suspend-MailboxDatabaseCopy -Identity 'Mailbox Database 2036433681\FQDNofServerInDRSite' -ActivationOnly -Verbose

This will ensure that replication is still happening automatically while ensuring activation will not.
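
To confirm that the copy in the DR site is still replicating but is blocked from activation, a status check along these lines can help. It is a sketch that reuses the FQDNofServerInDRSite placeholder from the commands above.

```powershell
# Show replication health and whether activation is suspended for each copy
# hosted on the DR server (FQDNofServerInDRSite is a placeholder).
Get-MailboxDatabaseCopyStatus -Server FQDNofServerInDRSite |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ActivationSuspended -AutoSize
```

A healthy passive copy that has been suspended for activation should still report a Status of Healthy, with ActivationSuspended set to True.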

Next, add a copy of every mailbox database to both of your servers, with the ActivationPreference set to 1 on the server in the production site; then, set the database copy on the server in the Disaster Recovery site to ‘suspended’ for activation.

Configuring Replay Lag Time

Configuring Replay Lag time is something that you should seriously consider doing. Lag time is how long the passive copy will wait until the transaction log is replayed into the database. Replication is still happening as fast as possible.

Below is the EMS command:
Set-MailboxDatabaseCopy -Identity 'mailbox database 1976375852\FQDNofServerInDRSite' -ReplayLagTime 0.1:0:0 -Verbose

(Please note: 0.1:0:0 means 1 hour. In real life you should most likely set this to a higher value.)

There is also another parameter that you might want to use--the Truncation Lag Time.

Below is the EMS command:
Set-MailboxDatabaseCopy -Identity 'mailbox database 1976375852\FQDNofServerInDRSite' -TruncationLagTime 0.2:0:0

(Please note: 0.2:0:0 means 2 hours. In real life you should probably set this to another value.)

How long you set the ReplayLagTime and TruncationLagTime for depends on two things: 1) How long it takes you to notice a corruption on the production site, and 2) How long it takes to replay all transaction log files if you activate the DR site. For instance, if you know you can detect a corruption in the active datacenter within 10 hours, then you should probably set the ReplayLagTime to 12 hours or so to allow for recovery of all non-corrupted data. Also consider the amount of disk space you have when setting the ReplayLagTime.
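
The lag values and the disk space they imply can be sanity-checked in plain PowerShell before touching Exchange. The sketch below parses the d.hh:mm:ss values and estimates how much log data piles up on the lagged copy. The figure of 500 logs per hour is purely hypothetical, so substitute a rate measured in your own environment.

```powershell
# The lag parameters use the d.hh:mm:ss TimeSpan format.
$replayLag     = [TimeSpan]::Parse('0.12:0:0')   # 12 hours, as in the example above
$truncationLag = [TimeSpan]::Parse('0.2:0:0')    # 2 hours

# Hypothetical workload figure: measure your own log generation rate.
$logsPerHour = 500
$logSizeMB   = 1      # Exchange 2010 transaction log files are 1 MB each

# Logs are held until they have been replayed and the truncation lag has passed.
$hoursRetained = $replayLag.TotalHours + $truncationLag.TotalHours
$diskMB = $logsPerHour * $logSizeMB * $hoursRetained

"Roughly {0:N0} MB of logs for {1} hours of lag" -f $diskMB, $hoursRetained
```

This deliberately ignores backups and any other factor that can delay truncation, so treat the result as a lower bound.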

More information about Managing Mailbox Database Copies can be found on Technet: http://technet.microsoft.com/en-us/library/dd335158.aspx

For more information on creating a DAG, click here: http://msexchangeteam.com/archive/2009/06/14/451609.aspx

Now your DAG and databases should be all ready to go! Remember to monitor the replication with Get-MailboxDatabaseCopyStatus -Server FQDNofServer

CopyQueueLength and ReplayQueueLength should ideally show small numbers, preferably zero or one, but in real life you will see higher values depending on your bandwidth, server load, etc.

Creating the CASArray.

Why do you need a ClientAccessArray?

Technically, this is not needed but rather highly recommended because it’s easier to manage a system that has one, and since it’s only a name that you can move to another IP, you can also move your client connection point.


Move client connection point?!

Yes, the Outlook MAPI connection is moved from the Information Store on the mailbox server to the CAS (and the CASArray name if you have one defined.)

New-ClientAccessArray -Name CASArray-HQ -Fqdn FQDNofYourDesiredEndpoint -Site ADsiteInPrimaryDatacenter


For more information on the New-ClientAccessArray, click here:
http://technet.microsoft.com/en-us/library/dd351149.aspx

Now configure all your databases to have the CASArray-HQ object as the RPCClientAccessServer. This will ensure that Outlook connects to the CASArray FQDN instead of the actual server name.

Get-MailboxDatabase | Set-MailboxDatabase -RpcClientAccessServer CASArray-HQ

You must also create a record in DNS for FQDNofYourDesiredEndpoint that points to the IP of your Exchange server in the primary datacenter. Set the TTL to a low value, such as 5 minutes, to make the switchover to the Disaster Recovery site go faster.

When Outlook connects, it will now connect to the ‘FQDNofYourDesiredEndpoint’ name. Also, if you look at the MAPI settings, Outlook thinks that the FQDNofYourDesiredEndpoint is the Exchange mailbox server.
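
Before testing with Outlook, it can be worth confirming from the Exchange Management Shell that the array exists and that every database points at it. The following is a small read-only sketch:

```powershell
# List the CAS arrays defined in the organization.
Get-ClientAccessArray | Format-List Name, Fqdn, Site

# Confirm that each database points at the array rather than at a server name.
Get-MailboxDatabase | Format-Table Name, RpcClientAccessServer -AutoSize
```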

Configuring Autodiscover

For Outlook to connect properly you must make sure to configure Autodiscover correctly.
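
The article does not go into the Autodiscover settings themselves, so the following is only a starting point for reviewing them. It lists the internal Autodiscover URI that each Client Access server publishes and the AD sites it is scoped to, without changing anything.

```powershell
# Review the internal Autodiscover URI and site scope of each CAS.
Get-ClientAccessServer |
    Format-List Name, AutoDiscoverServiceInternalUri, AutoDiscoverSiteScope
```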

At this point you should have two servers with the Mailbox, HUB, and CAS roles on each one, a DAG with the two servers (one in each AD site), and a CASArray located on the server in the primary AD site.

Failovers will not occur automatically because of the configurations we did on the mailbox databases. Thus, if you reboot the primary server then clients will lose connection to their mail.

I hope you have enjoyed this tutorial on Exchange Server 2010 Disaster Site, and that you were able to follow my instructions and begin preparing your organization for the worst-case scenario: site or server disaster. Now that you know how to build the solution, in Part 2 of this piece we will move on to discussing how to activate the disaster recovery site, at which point I will explain how to back up, test and perform a switchover should your Exchange server fail.


Tuesday, February 2, 2010

Dude, Where's my Backup?

By Mahmoud Magdy

When I started writing this post, I couldn’t get the movie “Dude, Where’s My Car?” and its events out of my head for two reasons. The first reason is that the conundrum the film’s characters find themselves in reminds me of a similar event in which one of my customers experienced an Exchange disaster and I was brought in to assist. I realized straightaway the client needed to perform a backup, and when I informed the Exchange Administrator of this he frantically turned to his Backup Administrator and asked him, ‘Dude, where’s my backup?!’

The second reason this movie reminded me of that incident is because the same clueless looks that the film’s starring actors had on their faces when they awoke after a crazy night to find their car missing were identical to the looks on the faces of the Exchange and Backup Admin when I asked them to perform a backup. In fact, I see that look on many of my customers’ faces when I ask them to restore an Exchange backup set for me! Those days of panic-stricken looks and long hours spent worrying over data loss, log deletion, and mailbox restoration are now over. Join me as we explore one of the undocumented features of Exchange 2010: the backup-less deployment.

Backups in Exchange have always been a point of concern for me due to my experiences while working as an Infrastructure Manager. In one instance I thought I had done everything I should have: I had everything in place, our Exchange was up and running and I had assigned a team to back up Exchange, AD, SQL and most of our critical systems. We tested the restore steps and everything ran smoothly, but when we had a disaster you can already predict what happened – we experienced another backup set failure which cost us two hours of downtime.

The secret to successfully restoring Exchange has always been a mystery. A successful restore is a tedious task, even for an Exchange guru! We are fortunate that today we have assistance in the form of DB portability, PowerShell, and wizards for backups and restores; but even with this help, the task of restoring Exchange remains tedious.

Other issues that arose with the introduction of Exchange 2007 were the single item and single mailbox restoration. It is now possible to restore a single mailbox, or better yet a single item, but clever software is needed to perform the task. You must also properly train and prepare your IT staff, and remember that the software and hardware requirements for either type of restoration are expensive. You must carefully compare your options when purchasing decent backup software since their prices can be high.

You say you want a revolution…

Well you know, when Microsoft introduced Exchange 2010 that’s exactly what they brought. For the first time, Microsoft is recommending that administrators perform backup-less deployments. When I heard that I laughed out loud, as I am sure many of you are, since for years as consultants and as customers we have always been told to backup everything, most importantly our Exchange data, so just exactly how will this revolution of backup-less Exchange deployments change the world?

So Microsoft has a real solution…

Would you like to hear the plan? If you are shocked by this new recommendation then let me set your mind at ease: ‘it’s gonna be alright.’ If you feel like it will take awhile for you to trust Microsoft’s recommendation then you are not alone. It took me, a technically savvy (and extremely humble) guy nearly 3 months to accept this fact, but what it really took was for me to design a backup-less configuration for the first time. After designing this configuration I have learned the benefits of going backup-less, so please join me as I explain them to you.

Backups’ Background:

Historically, I have always considered Exchange backups to be more important than Exchange itself. ‘Why?’ you might ask. My answer is multi-fold: a backup guarantees that I will be back online in the blink of an eye if the system goes down. Plus, it enables me to recover hard-deleted items for users, and more importantly, it is the only way to flush and delete the logs of the mailbox database (previously this was tied to the Storage Group.) The other not-so-popular method of deleting such logs is known as circular logging.

Backup in Exchange 2003 was straightforward, but with Exchange 2007 Microsoft introduced the concept of database copies, which provided a new way to back up your Exchange data.

Now you can perform a backup from the passive copy, which provides enough data to help you discern what the online copy is suffering from (i.e. IOPS, users’ access, AV scan.) When the backup of the passive copy is complete, the database is marked as backed up and the logs are deleted from both the passive and the active copy.

As mentioned previously, doing Exchange backups historically required costly backup software as well as hardware, including storage, backup tapes and tape libraries, not to mention the backup hassle.

Microsoft made a bold decision to change the Exchange world by introducing backup-less configuration, which I will now discuss in more detail.

Less is more, don’t you agree?

What does backup-less really mean? It simply means that you do not have to backup your Exchange data, or at the very least it gives you the ability for the first time to not have to back it up.

I can completely understand many of you doubting that this is in fact a possibility, to never have to backup your Exchange data, but before you make your decision let us explore backup-less architectures and learn how they really work.

Backup-less Architecture:
As stated above, backup in E12 (Exchange 2007) could be taken from the passive copy, but this is only true for CCR or LCR. At the time, this was a viable option: back up the passive node, and once the backup is done the passive copy updates the database header and notifies the active node, and the active node deletes the logs.

Issues to consider before designing or deploying a Backup-less configuration:
- Data protection, Database health, Database recovery.
- What to do when you lose data.
- How to delete your logs.
- How to restore items and mailboxes like before.

In order to address these issues, you must understand how Backup-less Configurations work:

When you want to configure your Exchange in backup-less mode, you should have at least two copies of the data (Active/Passive.) Microsoft recommends going backup-less with three or more copies (Active/Passive/Passive.) In order to configure your infrastructure to be backup-less, you must have three copies of the data and configure circular logging on the mailbox database.

I can hear you saying, ‘Circular logging?! No way!’ And I understand your reaction, but keep in mind we never do circular logging unless we have a strong reason to, so let us see how circular logging works with backup-less.

Real World Example:
To illustrate how circular logging works with backup-less, let us consider the following example:

You have a mailbox store called MB1 that has 3 copies of it on Servers 1, 2 and 3. MB1 is active on Server 1 and has two copies on Servers 2 and 3. Now you want to configure it in Backup-less. All you have to do is configure the mailbox database to do circular logging, and once you do so Exchange will change its architecture slightly and perform circular logging in another way.
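
In Exchange Management Shell terms, that single configuration step might look like the sketch below. It uses the MB1 database from the example; everything else about the environment is assumed to be in place already.

```powershell
# Enable circular logging on the example database MB1.
Set-MailboxDatabase -Identity 'MB1' -CircularLoggingEnabled $true

# Confirm the setting.
Get-MailboxDatabase -Identity 'MB1' | Format-List Name, CircularLoggingEnabled
```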

When circular logging is enabled on a database, the logs are written to the hard disk and, once the data is committed to the database, the logs are flushed. In Backup-less (DAG environment only) this behavior changes: logs are written but are not flushed until they have been replicated to the other database copies and marked as checked there.

To understand this, let us go back to our example: MB1 has log E01 that is waiting to be written. E01 is written to the DB, and now it is held on Server 1, whereas before it would have been flushed.

Server 1 replicates E01 to Server 2. The log remains on Server 1 while Server 2 copies it, checks it, marks it as healthy/inspected and notifies Server 1. Server 1 does the same with Server 3, and once Server 3 verifies its copy and reports to Server 1 that its copy of E01 is healthy/inspected, Server 1 deletes and flushes the log.
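
The rule described above can be expressed as a toy PowerShell model. This is not Exchange code; it is only a simplified illustration of "delete a log on the active copy once every other copy has inspected it", with made-up generation numbers, and it ignores the requirement that the log must first be committed to the database.

```powershell
# Toy model only: highest log generation each passive copy has inspected.
$inspected = @{ 'Server2' = 105; 'Server3' = 102 }

# Logs currently held on the active copy (Server 1), by generation number.
$activeLogs = 100..106

# A log can go only once every copy has inspected it, i.e. its generation
# is at or below the lowest inspected generation across all copies.
$lowestInspected = ($inspected.Values | Measure-Object -Minimum).Minimum
$deletable = $activeLogs | Where-Object { $_ -le $lowestInspected }
$retained  = $activeLogs | Where-Object { $_ -gt $lowestInspected }

"Deletable: $($deletable -join ', ')"
"Retained:  $($retained -join ', ')"
```

With these sample values, Server 3 is the slowest copy, so only generations 100 through 102 can be deleted and the rest stay on Server 1.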

There are 2 questions that might arise at this point:
- Why didn’t Exchange wait until the log is replayed at Server 2 and Server 3?
- Does Server 1 wait until it replicates the data to all of its adjacent servers? (In our example server 2 and server 3)

The answer to the first question is Exchange will not wait for the log replay because you might have a lagged replay configured on your DB copy. This means that you might replay the logs 48 hours later which translates into huge numbers of logs for Exchange.

I do not have a confirmed answer to the second question yet, but if you attended an Exchange 2010 Advanced storage session you would know that an Exchange server can recover and resend the logs, and even better, the specific bits in case of database corruption. But if Server 1 deletes its logs and the same for Server 2, then where does Server 3 get its logs from?

Hopefully by now the answer to that question is a little bit clearer. Exchange now has a self-based mechanism to flush its logs, but Backup-less configuration is not a specific setting that you assign to Exchange. By that I mean you don’t go to the options page and check the box stating this is a Backup-less organization; rather, this is a group of configurations that you apply to Exchange so you can deploy a Backup-less configuration. It is important to remember that this behavior is the same if you have 2 copies and do circular logging, even if you do backup.

There are several pertinent questions that we should answer one at a time:
- What about the health of my Database, Database availability, and uptime?
Exchange 2010 has a self-healing mechanism. What that means is that if page No. 485950 gets written to a bad block, or gets corrupted logically or physically, then Exchange 2010 can replicate this page from another server by copying only the required page with the next replication cycle. This keeps the Exchange database healthy and minimizes the replication requirements.

If Exchange cannot make the active database healthy then we have DAGs that pick the best available copy and make it the active copy. Typically if a physical server failed, a hard disk failed, or a database failed physically or logically, you would not need your backup since you already have two other copies. This means you don’t need your backup! (Are you becoming a backup-less fan yet?)

Now the other dimension is minimizing the storage cost. Since you have three copies of the database, and since Exchange 2010 requires 70% fewer IOPS, you no longer need expensive SCSI disks, or even a SAN. I recommend using a JBOD configuration, which is much more cost effective than any other storage option. Thus, in a backup-less configuration, you can have three copies of your data and reduce both the backup software and hardware cost. (Considering jumping on the backup-less bandwagon now?)

- What should I do if I want to replace a single item or a mailbox?

Before answering that, first ask yourself how many times as an Exchange admin you had to do that (restore an item or mailbox for a user). In my career, I only had to do it at most three to five times. It might be different in your organization, but in general most Exchange administrators do not need to do that on regular basis.

Since we have cheaper storage we can increase the mailbox store dumpster retention. It is set at 14 days by default, but now you can increase it and ask users to recover their own items. You can also use the new RBAC (role-based access control) model and give helpdesk personnel the permission to search the Exchange dumpster and perform discovery within it using PowerShell in order to recover items for users, meaning you as the Exchange Admin do not have to!
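
Both ideas can be sketched in the Exchange Management Shell. In the example below, the MB1 database name, the 30-day value and the HelpdeskUser account are all placeholders chosen for illustration, and membership of the built-in Discovery Management role group is just one possible way of delegating search rights.

```powershell
# Keep deleted items for 30 days instead of the default 14
# ('MB1' and the 30-day value are illustrative).
Set-MailboxDatabase -Identity 'MB1' -DeletedItemRetention 30.00:00:00

# Let a helpdesk account run discovery searches; 'HelpdeskUser' is a
# hypothetical account name.
Add-RoleGroupMember -Identity 'Discovery Management' -Member 'HelpdeskUser'
```

Longer retention increases database size, so weigh it against the storage savings discussed above.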

- Don’t I need a backup at all?

I will not say that you don’t need to back up the Exchange system at all, but you might want to consider backing it up as a second layer of protection. If you do deploy a backup-less configuration, then your first line of defense is not the backup sets any more; it is your Exchange 2010 Backup-less configuration. In other words, it is done automatically.

I know that after being told for years to back up everything, most especially Exchange data, it will be difficult to change your thinking radically with a single article. You probably have regulations that require you to comply with a 3-year restore SLA. But if you are one of the Exchange admins who do not have to abide by such regulations, then you should consider Backup-less Configuration.

Hopefully you now understand the architecture change of the circular logging, DAGs, and how to do backup-less configurations. Backup-less configuration is still an undocumented feature of Exchange 2010 and you will not find much information about it. My recommendation is that you open your mind to the idea and take care in calculating the total cost required for backup gear as compared to the backup-less cost, without forgetting their technical and operational requirements as well. I cannot say that backup-less is for everyone, but it is a great option that can save you money, and one you should give decent thought to.

I look forward to bringing you another thought-provoking article within a month, and until that time I wish you the best uptimes and the fastest Exchange servers!


Tuesday, January 19, 2010

Understanding Exchange 2010 Storage Architecture: Part 2

By Mahmoud Magdy

In Part 1 of our series on the Exchange 2010 storage architecture, we went back to the basics by reviewing Microsoft’s ESE (Extensible Storage Engine), then moved on to discuss the new enhancements that further reduce IOPS (Input/Output operations per Second.)

In Part 2, we will continue our journey through the Exchange 2010 storage enhancements by exploring the concepts of logical and physical changes to the Microsoft ESE database. But first I would like to revisit a few important topics that deserve elaboration--namely, the SIS (Single Instance Storage) removal and the Lazy View Updates.


SIS (Single Instance Storage) Removal:

SIS, or single instance storage, was introduced to the Exchange Server product line in version 4.0 and remained there through Exchange 2007 (version 12). The role of SIS was to store a single copy of an email or attachment in a mailbox database, so that all recipients within that database who received the message accessed it via a single instance. The greatest asset of SIS was its ability to prevent attachments from being duplicated, yielding huge space savings on the disks.

SIS in Action:

Consider the following example:

When User A sends a message with a 1 MB attachment to a DL (Distribution List) or a group of 100 users, SIS steps in and delivers only 1 copy of the attachment to the mailbox store on which this particular group of users is located. Thus, instead of User A forcing that database to store all 100 MB, or 100 copies of the attachment, he or she saves approximately 99 MB of space on the Mailbox store.
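The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the numbers above; the function name `sis_storage_mb` is made up for this sketch and has nothing to do with how Exchange implements SIS internally.

```python
def sis_storage_mb(attachment_mb: float, recipients: int, single_instance: bool) -> float:
    """Space one attachment occupies in a single mailbox database (hypothetical helper)."""
    # With SIS, one shared copy is stored; without it, every recipient gets a copy.
    copies = 1 if single_instance else recipients
    return attachment_mb * copies


with_sis = sis_storage_mb(1, 100, single_instance=True)
without_sis = sis_storage_mb(1, 100, single_instance=False)
print(f"with SIS: {with_sis} MB, without SIS: {without_sis} MB, "
      f"saved: {without_sis - with_sis} MB")
```

Keep in mind that the saving only applies when all 100 recipients live in the same database; recipients spread across several databases get one copy per database.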

Many people were concerned when they heard SIS was being removed from Exchange Server 2010, but one must trust that Microsoft has their reasons. In 1996, when Exchange 4.0 was released, disks were slower, held far less data, and were more expensive than current storage. Since SIS is only effective within a single database, it was the perfect solution for reducing the size of mailbox stores at a time when many companies had only one database. The trend in storage architecture shifted as disks became larger in capacity, faster, and cheaper, meaning that most companies now have multiple databases storing more users on fewer disks.

As disk storage became less expensive and the database engine itself evolved from the mid 1990s through the turn of the century, Microsoft admitted that the benefits of SIS were no longer as significant as they used to be. In fact, studies have indicated that the 20% database size savings were never fully realized, and that the more accurate figure was closer to 10% and in some cases as low as 5%. If you recall from Part 1 of our series, Microsoft decided to make a dramatic change to the ESE, but in order to do so they had to make a choice: keep SIS or provide better performance? Providing better performance meant Microsoft had to increase the IO size to 32 KB, forcing the ESE to make larger IOs and reducing the frequency of reads and writes. Incorporating these changes for the sake of better performance required bidding SIS farewell.

After implementing these changes, however, Microsoft found that space hints and the new B+ tree architecture added approximately 20% to the size of the Exchange 2010 database, so Microsoft introduced a new feature called Database Compression, or LV (long value) Compression.

Before we dive into Long Value Compression, let’s first answer the question: what is a long value (LV)? As many of you know, in Exchange 2010 the page size was increased to 32 KB, and to understand why, you must first understand the basics of how data is stored in Exchange databases. In Exchange, all data stored in databases is held in B+ trees, which are further divided into pages. The page is the unit used for caching, and it is the minimum amount of data that can be read from or written to the database. Since performing operations in memory is much faster than reading directly from the disk, increasing the page size to 32 KB allowed the ESE to reduce IOPS. The result is improved performance, since each larger page is cached in memory.

Now back to the explanation of Long Values. Since the page size in Exchange 2010 is 32 KB, emails larger than this value end up consuming extra pages and space within the database. LV Compression is the solution to this problem: it defines another table to be used by those emails, and their content is then compressed to provide better space savings.
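To make the page arithmetic concrete, here is a minimal Python sketch. It assumes that an item occupies whole pages and that each page costs one IO, and it ignores page headers, caching and compression, so treat the numbers as a rough illustration rather than what the ESE actually does. The item sizes are made-up examples.

```python
import math


def pages_needed(item_kb: float, page_kb: int) -> int:
    """Number of whole pages an item of item_kb occupies (simplified model)."""
    return math.ceil(item_kb / page_kb)


# Compare the Exchange 2007 page size (8 KB) with the Exchange 2010 page size (32 KB).
for item_kb in (20, 100):
    for page_kb in (8, 32):
        print(f"{item_kb} KB item, {page_kb} KB pages: "
              f"{pages_needed(item_kb, page_kb)} page(s)")
```

In this model a 20 KB item fits in a single 32 KB page but needs three 8 KB pages, while a 100 KB item still spills across several 32 KB pages, which is the case the long value table is meant to handle.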







The above figure illustrates the database file analysis and comparison between E12 and E14. E12 wins in the analysis for RTF files; however, as you all know, most emails are text or HTML-based, so the LV compression technique yields better space savings overall. Even with the removal of SIS, the Exchange 2010 database file is about 12% smaller than the E12 database.



Lazy View Update:
Another dramatic change to the ESE brought about by Exchange 2010 is the Lazy View Update. To examine this in further detail, let’s consider the following example:


In E12, if a user (who is using OWA, or Outlook Web Access) has 5 views in his inbox, then the next time the user gets an email Exchange instantly updates all 5 views. While this improved the end-user experience, it forced Exchange to do 2 things:
1. Perform unnecessary IOs. (For example, the user might be out of the office, or the email might have been received in the middle of the night, forcing Exchange to pay for IOs that are not necessary.)
2. Create excessive small IOs, since the views are updated for every single email.



Microsoft has solved this problem with the introduction of Lazy View updates. Going back to our example, if the above User is using OWA or Outlook Online, the view will not be updated until that User opens the view. Although this might be slower on the backend than in previous versions, the larger and now sequential IOs that are performed prevent the User from noticing any performance impacts during viewing or opening the views.
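The difference between the two behaviors can be sketched with a toy model in Python. The `Folder` class below is purely hypothetical and only counts view-update operations; it does not reflect the actual Exchange store code or its IO sizes.

```python
class Folder:
    """Toy model of a folder with several views (hypothetical, for illustration)."""

    def __init__(self, view_count: int, lazy: bool):
        self.lazy = lazy
        self.pending = [0] * view_count  # deliveries not yet applied, per view
        self.update_ops = 0              # number of view-update operations

    def deliver(self) -> None:
        if self.lazy:
            # Lazy: just remember that every view is now out of date.
            self.pending = [n + 1 for n in self.pending]
        else:
            # Eager (E12-style): update every view for every message.
            self.update_ops += len(self.pending)

    def open_view(self, index: int) -> None:
        # Lazy: apply all pending changes to this one view in a single batch.
        if self.pending[index]:
            self.update_ops += 1
            self.pending[index] = 0


eager = Folder(view_count=5, lazy=False)
lazy = Folder(view_count=5, lazy=True)
for _ in range(10):          # ten emails arrive overnight
    eager.deliver()
    lazy.deliver()
eager.open_view(0)           # the user opens one view in the morning
lazy.open_view(0)
print(eager.update_ops, lazy.update_ops)
```

In this model the eager folder performs 50 small updates for ten emails, while the lazy folder performs one larger, batched update when the user actually opens a view.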





ESE Logical Contiguity:


Microsoft has made dramatic changes to the ESE storage in order to allow better IO utilization through sequential IO; a single hard disk cannot exceed roughly 200 random IOs per second, while a regular SATA disk can easily do 300+ sequential IOs per second.

Now, to better reflect on the changes in the ESE architecture, try to envision the following scenario in your head. (I recommend this approach as it has greatly helped me during my own Exchange sessions.)

Imagine that you are looking at the ESE database through two transparent films: one is a logical film and one is a physical film.

The logical film is how data is structured in the ESE database, and includes tables, indexes, LV (Long Value) tables, etc. Once data is located on the logical film, you must find its reflection, the physical location, within the ESE database. (Remember that the physical film is where the pages, which are written directly to the hard disk, are stored inside the ESE database file.)




In Part 1 of this series, we introduced the concept of logical contiguity. Let us complete our exploration of this topic by looking at the following diagram:




Microsoft has changed the table architecture in the mailbox store from a table per database to a table per mailbox. This allows fewer yet larger size sequential IOs to be committed against the ESE database, and thus optimizes the IO operations at the logical layer.

SIS removal, table architecture change, LV Compressions and Lazy View Updates are all fundamental components of the logical architecture changes to the ESE engine.

ESE Physical Contiguity:


Now that we have explored logical contiguity, let us take a look at the physical structure inside the ESE Database. Recall from Part 1 that the ESE data is stored based on the B+ tree model, which consists of properties which are stored in records which are in turn placed in a node that is stored in a page.

In the previous versions of Exchange (E12 and below), data was stored inside the database in a random manner, which was the reasoning behind having to place logs on separate disks or spindles from the database files. This was done because the logs were written with sequential IO, while the database was written with random IO.

This behavior negatively impacted the Exchange storage design and performance, and over time the database became fragmented and offline defragmentation of the database was necessary. In order to improve this behavior, Microsoft has changed the ESE writing behavior so that it stores the ESE pages in a contiguous manner.

To understand it better, one must visualize the design. Take a look at the following diagram:


The above diagram compares the B+ tree in the previous version of Exchange to the current Exchange 2010 version. As you can see, in Exchange 2007 pages are committed to the database in a random manner, causing the database to become fragmented over time and forcing Exchange to commit small, random IOs.

In Exchange 2010, the B+ tree design has been modified: pages are now stored in a contiguous manner where they are written and read in a sequential manner, thus improving the physical contiguity of the ESE file.
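Why contiguous pages matter can be sketched in Python. The function below assumes, as a simplification, that any run of consecutive page numbers can be fetched with one sequential IO; the page numbers are invented, and a real engine would also cap the size of a single IO.

```python
def count_ios(page_numbers) -> int:
    """Count IOs needed if each run of consecutive pages is read in one IO."""
    ios = 0
    previous = None
    for page in sorted(set(page_numbers)):
        # A gap in the page numbers means the disk head has to seek: new IO.
        if previous is None or page != previous + 1:
            ios += 1
        previous = page
    return ios


fragmented = [3, 17, 42, 8, 96, 55]      # pages scattered across the file
contiguous = [10, 11, 12, 13, 14, 15]    # pages stored next to each other
print(count_ios(fragmented), count_ios(contiguous))
```

Under this assumption, the six scattered pages cost six separate IOs, while the six contiguous pages can be served by a single one.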

There remain some missing pieces to the puzzle. For instance, what happens if a read/write IO has to be committed and it cannot be done sequentially? This mystery, along with others, will be discussed in Part 3 of this series.








Tuesday, January 5, 2010

Understanding Exchange 2010 Storage Architecture: Part 1

By Mahmoud Magdy

In this article, we will take a close look at the Exchange 2010 storage architecture, but first let us go back to the basics by reviewing the ESE storage engine, and then delve into the new enhancements that were introduced with Exchange 2010. First, a brief review of the ESE basics: Microsoft’s Extensible Storage Engine (ESE) is an ISAM (Indexed Sequential Access Method) data storage technology. The purpose of the ESE is to allow applications to store and retrieve data via indexed and sequential access. The ESE is suitable for server applications since it handles highly concurrent transactions, but at the same time it is lightweight enough to work well for auxiliary applications. Worried about losing stored data in the event of a system crash? The ESE provides transacted data update and retrieval, meaning that if your system crashes, data consistency is maintained by the ESE’s crash recovery mechanism.


As you all know, ESE relies on the B+ tree in order to store data. The following diagram features a simple tree that illustrates how information is stored in the data tree:

Since sorting and searching through mounds of data is time-consuming, ESE stores data in trees in order to optimize their sorting and searching behavior. In addition, the regular tree model has been updated using the B+ tree to allow for faster, more efficient sorting of data.

There are 2 types of data sorting: internal and external. Internal data sorting means that the system can store and sort the data in memory. However, since not every system can hold all of its data in memory, the system is forced to store data on the disk, and this is where the B+ tree comes in.


Data in the ESE is stored based on the following hierarchy:

  • A property is created and placed in a table record. Keep in mind that MAPI uses properties in order to define data and their structure at the lowest level.


  • Multiple properties are placed in a record.


  • The record is stored on a node, and a corresponding key is used both to index and to quickly access the record. One thing to remember is that the leaf nodes (the end nodes) are logically linked together to allow horizontal crawling and movement of data within the B+ tree.


  • A record is placed into lines which are then stored on a page, with the page being the smallest unit of storage in the database. Page sizes across versions of Exchange: in Exchange 2003 the page size was 4 KB. That number doubled to 8 KB in Exchange 2007, and then quadrupled to 32 KB in Exchange 2010.
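The linked leaf nodes mentioned in the list above can be sketched in Python. This is a deliberately simplified model: the `Leaf` class and `range_scan` function are hypothetical names, the inner (index) nodes of the tree are left out, and real ESE pages hold far more than a list of keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    """A leaf node holding sorted keys and a link to the next leaf."""
    keys: list
    next: Optional["Leaf"] = None


def range_scan(start_leaf: Optional[Leaf], low: int, high: int) -> list:
    """Walk the linked leaves left to right, collecting keys in [low, high]."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result      # keys are sorted, so we can stop early
            if key >= low:
                result.append(key)
        leaf = leaf.next           # horizontal move, no trip back up the tree
    return result


third = Leaf([7, 8, 9])
second = Leaf([4, 5, 6], next=third)
first = Leaf([1, 2, 3], next=second)
print(range_scan(first, 3, 7))  # [3, 4, 5, 6, 7]
```

Because the leaves are chained, a range of records can be read by moving sideways from leaf to leaf instead of descending from the root for every record.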

How did Microsoft improve the storage engine in Exchange 2010?

Exchange 2007 introduced significant enhancements to storage usage and optimization; however, Microsoft wanted to improve on these further with the release of Exchange 2010. While doing preliminary research to determine which areas of storage use and optimization most needed attention, Microsoft found that enterprises suffer from several challenges with the current storage technologies, including but not limited to:

  • Random IO and disk limits: The current technologies provide limited random IO throughput; however, most current systems can perform several hundred sequential IO requests.


  • Storage Design flexibility: As email communication increases, enterprises are continually demanding improved and flexible options for storing users’ growing amounts of data.


  • Using SATA disks and JBOD technologies: Enterprises were constrained by the capacity limits of SAS/SCSI disks. 2 TB SATA disks are now available, but to take advantage of them Exchange must be able to work within the limited throughput of a SATA disk.

Task 1: change the ESE storage scheme:





In previous versions of Exchange, as illustrated in the first diagram, there were multiple tables per database that contained the users’ data. In figure 2 (and in Exchange 2007) there were multiple tables (for example: mailbox table, folders table, messages table, etc) per mailbox database. Thus, in order to open a user’s mailbox, Exchange required multiple small IOs to be performed.

In Exchange 2010, Microsoft moved to a table per mailbox, making it faster and easier to open a user’s mailbox. With Exchange 2010, opening a mailbox and reading specific email messages stored inside requires fewer and larger IOs. This is because the underlying architecture of the storage design was modified in Exchange 2010 in order to reduce IOPS (input/output operations per second). Microsoft dramatically reduced IOPS with Exchange 2010: a 70% reduction over Exchange 2007 and a 90% reduction over Exchange 2003.
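To see what these percentages mean for sizing, here is a small Python sketch. The baseline of 1,000 IOPS is an invented figure for a hypothetical server, not a measurement, and real workloads will vary.

```python
def reduced_iops(baseline_iops: float, reduction_percent: int) -> float:
    """IOPS left after applying a percentage reduction to a baseline."""
    return baseline_iops * (100 - reduction_percent) / 100


# Hypothetical server that needed 1,000 IOPS on the older version.
print(reduced_iops(1000, 70))  # same load after a 70% reduction (vs. Exchange 2007)
print(reduced_iops(1000, 90))  # same load after a 90% reduction (vs. Exchange 2003)
```

In other words, a workload that required 1,000 IOPS on Exchange 2007 would need roughly 300 IOPS on Exchange 2010 if the full 70% reduction is realized.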

In addition to the aforementioned features introduced in Exchange 2010, other enhancements have also been made to further reduce IOPS, including the Lazy View update and the usage of the ‘pay to play’ method. Remember that in previous versions of Exchange, custom views were updated as soon as the store received an email. Although this technique provided the end users with a better experience, it had a negative impact on Exchange, forcing the Exchange system to continuously update the views and create small random IOs in order to keep every view in the store up to date. With the Lazy View update, a view is only updated when requested by the end user.

Exchange 2010 utilizes Lazy View technology in which the views are updated when the user attempts to access them. Although this increased the time it takes to open the view, it dramatically enhanced the Exchange IO performance by using the notion that it is faster for the disk to read data stored in larger, sequential pieces versus the disk head having to gather smaller chunks of data spread out across the disk.

In order for Microsoft to create a table per mailbox, they had to remove SIS (Single Instance Storage). Some of you may complain about this initially, but never fear: Microsoft provided a workaround known as Database Compression. This technology is used to compress the content of the database (especially text and HTML content), and compensates for the space lost through the removal of SIS.

Now take another look at the Exchange 2010 ESE and compare it to Exchange 2007’s ESE. In Exchange 2007, in order to open a message in Joe’s mailbox, Exchange had to open the mailbox table, read the message header, open the message and read the attachment (examples of small random IOs.)

In Exchange 2010, the Exchange system can open the mailbox table, read the message header, and open the message directly. It is important to note that since these tables are now logically connected it is more convenient for Exchange to access them, and thanks to the new page size in Exchange 2010, E14 can read the entire message body in a single IO. If additional IOs are needed they can be done, but in order to streamline the data gathering process, these commands are now grouped in larger, sequential IOs.

Let us pause at this point. We will resume our discussion of Microsoft’s enhancements to the ESE in Exchange 2010 in Part 2, at which time we will delve deeper into the topics of physical and logical contiguity.
