Tuesday, January 19, 2010
Understanding Exchange 2010 Storage Architecture: Part 2
By Mahmoud Magdy
In Part 1 of our series on the Exchange 2010 storage architecture, we went back to basics by reviewing Microsoft’s ESE (Extensible Storage Engine), then moved on to the new enhancements that further reduce IOPS (input/output operations per second).
In Part 2, we continue our journey through the Exchange 2010 storage enhancements by exploring the logical and physical changes to the Microsoft ESE database. But first I would like to revisit a few important topics that deserve elaboration: the removal of SIS (Single Instance Storage) and Lazy View Updates.
SIS (Single Instance Storage) Removal:
SIS, or single instance storage, was introduced to the Exchange Server product line in Version 4.0 and remained through Exchange 2007 (Version 12). The role of SIS was to store a single copy of an email or attachment in a mailbox database, so that every recipient within that database who received the message accessed the same single instance. The greatest asset of SIS was its ability to prevent attachments from being duplicated, yielding large space savings on disk.
SIS in Action:
Consider the following example:
When User A sends a message with a 1 MB attachment to a DL (Distribution List) or a group of 100 users, SIS steps in and delivers only 1 copy of the attachment to the mailbox store on which this particular group of users is located. Thus, instead of User A forcing that database to store all 100 MB, or 100 copies of the attachment, he or she saves approximately 99 MB of space on the Mailbox store.
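The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name and the list-of-databases input are made up for the sketch, and it assumes, as described above, that SIS keeps one copy per mailbox database that holds at least one recipient.

```python
def sis_storage_mb(attachment_mb, recipients_per_db):
    """Return (without_sis, with_sis, saved) in MB, summed over all databases.

    recipients_per_db is a hypothetical input: one recipient count per
    mailbox database. SIS deduplicates only inside a single database, so
    each database with at least one recipient stores exactly one copy.
    """
    without_sis = attachment_mb * sum(recipients_per_db)
    with_sis = attachment_mb * sum(1 for count in recipients_per_db if count > 0)
    return without_sis, with_sis, without_sis - with_sis


# The example from the text: 100 recipients, all in one database.
print(sis_storage_mb(1, [100]))             # (100, 1, 99)

# The same 100 recipients spread evenly over four databases.
print(sis_storage_mb(1, [25, 25, 25, 25]))  # (100, 4, 96)
```

The second call shows why the saving shrinks as recipients are spread across more databases: every additional database needs its own copy.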
Many people were concerned when they heard SIS was being removed from Exchange Server 2010, but one must trust that Microsoft has its reasons. In 1996, when Exchange 4.0 was released, disks were slower, held far less data, and cost much more than they do today. Since SIS is only effective within a single database, it was the perfect way to reduce the size of mailbox stores at a time when many companies had only one database. The trend in storage architecture shifted as disks became larger in capacity, faster, and cheaper, and most companies now run multiple databases, storing more users on fewer disks.
As disk storage became less expensive and the database engine itself evolved from the mid-1990s onward, Microsoft acknowledged that SIS no longer delivered the benefits it once did. In fact, studies have indicated that the 20% reduction in database size attributed to SIS was never fully realized, and that the more accurate figure was closer to 10% and in some cases as low as 5%. If you recall from Part 1 of our series, Microsoft decided to make a dramatic change to the ESE, but in order to do so they had to make a choice: keep SIS or provide better performance? Providing better performance meant increasing the page size to 32 KB, so that the ESE issues fewer, larger IOs instead of frequent small reads and writes. Incorporating these changes for the sake of better performance required bidding SIS farewell.
After implementing these changes, however, Microsoft found that space hints and the new B+ tree architecture added approximately 20% to the size of the Exchange 2010 database, so Microsoft introduced a new feature called database compression, or LV (long value) compression.
Before we dive into long value compression, let’s first answer the question: what is a long value (LV)? As many of you know, Exchange 2010 increased the database page size to 32 KB. To understand why, you must first understand the basics of how data is stored in Exchange databases. All data in an Exchange database is held in B+ trees, which are in turn divided into pages. The page is the unit of caching and the smallest unit that is read from or written to the database. A larger page brings more data into memory with each IO, and since working from memory is much faster than reading from disk, raising the page size to 32 KB lets the ESE satisfy the same requests with fewer IOs. The result of the reduction in IOPS is improved performance.
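To make the page-size argument concrete, here is a rough Python sketch that counts how many pages a single item occupies at two page sizes. It assumes the 8 KB page size of Exchange 2007 for comparison, uses a made-up 30 KB item, and ignores page headers and other per-page overhead, so the numbers are illustrative only.

```python
import math

KB = 1024


def pages_needed(item_bytes, page_bytes):
    """Pages touched to read one item, ignoring headers and other overhead."""
    return max(1, math.ceil(item_bytes / page_bytes))


item = 30 * KB  # hypothetical item size, chosen only for illustration

for page_kb in (8, 32):  # 8 KB: Exchange 2007 page size; 32 KB: Exchange 2010
    print(f"{page_kb} KB pages: {pages_needed(item, page_kb * KB)} page(s)")
```

With 8 KB pages the item spans four pages; with 32 KB pages it fits in one, which is the sense in which larger pages mean fewer, larger IOs.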
Now back to long values. Since the page size in Exchange 2010 is 32 KB, emails larger than this cannot fit in a single page and end up consuming extra pages and space within the database. LV compression is the solution to this problem: such emails are stored in a separate table, the long value table, where they are compressed to save space.
The above figure compares database file sizes between E12 (Exchange 2007) and E14 (Exchange 2010). E12 wins for RTF content; however, as you all know, most emails are text or HTML-based, so LV compression yields better overall space savings. Even with the removal of SIS, the Exchange 2010 database file is about 12% smaller than the E12 database.
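Why text and HTML bodies benefit so much from compression can be seen with a small Python experiment. Note the assumptions: zlib is used purely as a stand-in and is not the algorithm the ESE applies to long values, and the HTML body below is an artificial, highly repetitive sample, so real messages will compress less well.

```python
import zlib

# Artificial HTML body; real emails are far less repetitive than this.
paragraph = "<p>Quarterly status update for the whole team.</p>"
html_body = ("<html><body>" + paragraph * 200 + "</body></html>").encode("utf-8")

# zlib is only a stand-in for whatever compression the store applies.
compressed = zlib.compress(html_body)

print("original bytes:  ", len(html_body))
print("compressed bytes:", len(compressed))
print("round trip ok:   ", zlib.decompress(compressed) == html_body)
```

Markup-heavy text repeats the same tags and words over and over, which is exactly the kind of redundancy a general-purpose compressor removes.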
Lazy View Update:
Another dramatic change to the ESE brought about by Exchange 2010 is the Lazy View Update. To examine this in further detail, let’s consider the following example:
In E12, if a user (connecting through OWA, or Outlook Web Access) has 5 views on his inbox, then whenever that user receives an email, Exchange immediately updates all 5 views. While this improved the end-user experience, it came at a cost:
1. Exchange performed unnecessary IOs. The user might be out of the office, or the email might arrive in the middle of the night, so Exchange paid for IOs that nobody benefited from.
2. Because the update was done per email, Exchange generated an excessive number of small IOs to keep the views current.
Microsoft has solved this problem with the introduction of Lazy View Updates. Going back to our example, if the above user is using OWA or Outlook in Online Mode, a view is not updated until the user opens it. Although opening a view may cost more on the back end than in previous versions, the update is performed with larger, sequential IOs, so the user should not notice any performance impact when opening the view.
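The difference between the two behaviors can be modeled with a toy Python class. Everything here is hypothetical: the `Folder` class, the view names, and the `view_updates` counter (a crude stand-in for IO operations) are invented for the sketch, and real views are sorted, filtered indexes rather than plain lists.

```python
class Folder:
    """Toy folder with several views; view_updates is a stand-in for IO count."""

    def __init__(self, view_names, lazy):
        self.lazy = lazy
        self.messages = []
        self.views = {name: [] for name in view_names}
        self.view_updates = 0

    def deliver(self, message):
        self.messages.append(message)
        if self.lazy:
            return  # lazy: leave every view stale until someone opens it
        for view in self.views.values():
            view.append(message)  # eager: one small update per view per email
            self.view_updates += 1

    def open_view(self, name):
        view = self.views[name]
        missing = self.messages[len(view):]
        if missing:
            view.extend(missing)  # one larger catch-up update on open
            self.view_updates += 1
        return view


def simulate(lazy):
    names = ["by date", "by sender", "by subject", "unread", "flagged"]
    folder = Folder(names, lazy)
    for n in range(10):          # ten emails arrive while the user is away
        folder.deliver(f"message {n}")
    folder.open_view("by date")  # the user comes back and opens one view
    return folder.view_updates


print("eager updates:", simulate(lazy=False))  # 50
print("lazy updates: ", simulate(lazy=True))   # 1
```

In the eager model every delivery touches all five views; in the lazy model only the view that is actually opened is brought up to date, in a single larger step.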

ESE Logical Contiguity:
Microsoft has made dramatic changes to ESE storage in order to make better use of the disks through sequential IO: a single hard disk cannot exceed roughly 200 random IOs per second, while a regular SATA disk can easily do 300+ sequential IOs per second. To better picture the changes in the ESE architecture, try to envision the following scenario in your head. (I recommend this approach as it has greatly helped me during my own Exchange sessions.)
Imagine that you are looking at the ESE database through two transparent films: one is a logical film and one is a physical film.
The logical film is how data is structured in the ESE database, and includes tables, indexes, LV (Long Value) tables, etc. Once data is located logically, you must find its corresponding physical location within the ESE database. (Remember, the physical film is where the pages live: they are stored inside the ESE database file, which sits directly on the hard disk.)
In Part 1 of this series, we introduced the concept of logical contiguity. Let us complete our exploration of this topic by looking at the following diagram:

Microsoft has changed the table architecture in the mailbox store from a table per database to a table per mailbox. This allows fewer, larger sequential IOs to be issued against the ESE database, and thus optimizes the IO operations at the logical layer.
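A toy model in Python shows why a table per mailbox tends to keep one user's data together. This is a deliberate simplification: it treats a table as a flat list of rows in arrival order, whereas real ESE tables are B+ trees, and the mailbox names and helper function are invented for the sketch.

```python
def sequential_runs(row_positions):
    """Count runs of adjacent rows; each run could be fetched in one sequential read."""
    runs = 0
    previous = None
    for position in sorted(row_positions):
        if previous is None or position != previous + 1:
            runs += 1
        previous = position
    return runs


# Twelve deliveries arriving interleaved for three hypothetical mailboxes.
arrivals = ["alice", "bob", "carol"] * 4

# Table per database: every mailbox shares one table, rows land in arrival order.
shared_table_rows = [i for i, owner in enumerate(arrivals) if owner == "alice"]

# Table per mailbox: alice's rows are appended to her own table.
own_table_rows = list(range(arrivals.count("alice")))

print("shared table:", shared_table_rows, "->", sequential_runs(shared_table_rows), "runs")
print("own table:   ", own_table_rows, "->", sequential_runs(own_table_rows), "run")
```

In the shared table, alice's four rows sit at positions 0, 3, 6 and 9 and need four separate reads; in her own table they are adjacent and can be read in one pass.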
SIS removal, the table architecture change, LV compression, and Lazy View Updates are all fundamental components of the logical architecture changes to the ESE.
ESE Physical Contiguity:
Now that we have explored logical contiguity, let us take a look at the physical structure inside the ESE database. Recall from Part 1 that ESE data is stored based on the B+ tree model: properties are stored in records, records are placed in nodes, and nodes are in turn stored in pages.
In previous versions of Exchange (E12 and below), data was stored inside the database in a random manner, which was the reasoning behind placing logs on separate disks or spindles from the database files. This was done because log IO is sequential while database IO was random.
This behavior negatively impacted Exchange storage design and performance: over time the database became fragmented, and offline defragmentation was necessary. In order to improve this behavior, Microsoft has changed how the ESE writes so that it stores pages in a contiguous manner.
To understand it better, one must visualize the design. Take a look at the following diagram:
The above diagram compares the B+ tree in the previous version of Exchange to the current Exchange 2010 version. As you can see, in Exchange 2007 pages are committed to the database in a random manner, causing the database to become fragmented over time and forcing Exchange to issue many small, random IOs.
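The effect of page placement on the number of IOs can be sketched in Python by merging page numbers into extents, where each extent is a stretch of adjacent pages that could be read in one sequential IO. The page numbers below are made up for illustration, and the `coalesce` helper is a hypothetical model, not how the ESE schedules its IO.

```python
def coalesce(page_numbers):
    """Merge page numbers into (first_page, page_count) extents of adjacent pages."""
    extents = []
    for page in sorted(set(page_numbers)):
        if extents and page == extents[-1][0] + extents[-1][1]:
            first, count = extents[-1]
            extents[-1] = (first, count + 1)  # page extends the current extent
        else:
            extents.append((page, 1))         # gap: start a new extent
    return extents


# Hypothetical page numbers holding one mailbox's data.
fragmented = [7, 52, 19, 88, 3, 61]     # pages scattered across the file
contiguous = [40, 41, 42, 43, 44, 45]   # pages laid out next to each other

print(len(coalesce(fragmented)), "IOs:", coalesce(fragmented))
print(len(coalesce(contiguous)), "IO: ", coalesce(contiguous))
```

The scattered layout needs six single-page reads, while the contiguous layout collapses into one six-page read starting at page 40.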
In Exchange 2010, the B+ tree design has been modified: pages are now stored contiguously so that they can be written and read sequentially, thus improving the physical contiguity of the ESE file. There remain some missing pieces to the puzzle. For instance, what happens if a read/write IO has to be committed and it cannot be done sequentially? This mystery, along with others, will be discussed in Part 3 of this series.
Labels: Exchange 2010, Exchange Information Stores