Microsoft has made dramatic changes to the ESE storage in order to allow better IO utilization using sequential IO; a single hard disk cannot exceed roughly 200 random IOs per second, while a regular SATA disk can easily do 300+ sequential IOs per second.
Now, to better reflect the changes in the ESE architecture, try to envision the following scenario in your head. (I recommend this approach as it has greatly helped me during my own Exchange sessions.)
Imagine that you are looking at the ESE database through two transparent films: one is a logical film and one is a physical film.
The logical film is how data is structured in the ESE database, and includes tables, indexes, LV (Long Value) tables, etc. Once data is located logically, you must find its physical location within the ESE database. (Remember, this is the physical film: the pages, which are written directly to the hard disk, are stored inside the ESE database file.)
In Part 1 of this series, we introduced the concept of logical contiguity. Let us complete our exploration of this topic by looking at the following diagram:

Microsoft has changed the table architecture in the mailbox store from a table per database to a table per mailbox. This allows fewer yet larger sequential IOs to be committed against the ESE database, and thus optimizes the IO operations at the logical layer.
SIS removal, the table architecture change, LV compression and Lazy View Updates are all fundamental components of the logical architecture changes to the ESE engine.
ESE Physical Contiguity:
Now that we have explored logical contiguity, let us take a look at the physical structure inside the ESE database. Recall from Part 1 that the ESE data is stored based on the B+ tree model, in which properties are stored in records, records are placed in nodes, and nodes are stored in pages.
In previous versions of Exchange (Exchange 2007 and earlier), data was stored inside the database in a random manner, which was the reason for placing logs on disks or spindles separate from the database files: log writes were sequential IO, while database access was random IO.
This behavior negatively impacted the Exchange storage design and performance, and over time the database became fragmented and offline defragmentation of the database was necessary. In order to improve this behavior, Microsoft has changed the ESE writing behavior so that it stores the ESE pages in a contiguous manner.
To understand it better, one must visualize the design. Take a look at the following diagram:
The above diagram compares the B+ tree in the previous version of Exchange to the current Exchange 2010 version. As you can see, in Exchange 2007 pages are committed to the database in a random manner, causing the database to become fragmented over time and forcing Exchange to commit small, random IOs.
In Exchange 2010, the B+ tree design has been modified: pages are now stored in a contiguous manner where they are written and read in a sequential manner, thus improving the physical contiguity of the ESE file.
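To make the idea of physical contiguity concrete, here is a minimal sketch in Python. It is only an illustration, not ESE code: the page numbers are made up, and the function simply counts how many separate sequential reads would be needed to fetch a set of pages in logical order.

```python
def count_sequential_runs(physical_pages):
    """Count the separate sequential reads needed to fetch the pages in
    the given logical order. Pages with adjacent numbers share one run."""
    if not physical_pages:
        return 0
    runs = 1
    for prev, cur in zip(physical_pages, physical_pages[1:]):
        if cur != prev + 1:
            # The next page is not physically adjacent: a new IO starts here.
            runs += 1
    return runs


# Hypothetical layouts: the same eight logical pages, stored two ways.
fragmented = [17, 3, 42, 8, 25, 4, 60, 11]
contiguous = [100, 101, 102, 103, 104, 105, 106, 107]

print(count_sequential_runs(fragmented))
print(count_sequential_runs(contiguous))
```

In this toy model the scattered layout needs one IO per page, while the contiguous layout can be read in a single pass.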
There remain some missing pieces to the puzzle. For instance, what happens if a read/write IO has to be committed and it cannot be done sequentially? This mystery, along with others, will be discussed in Part 3 of this series.
Labels: Exchange 2010, Exchange Information Stores
Tuesday, January 5, 2010
Understanding Exchange 2010 Storage Architecture: Part 1
By Mahmoud Magdy
In this article, we will take a close look at the Exchange 2010 storage architecture, but first let us go back to the basics by reviewing the ESE engine storage, and then delve into the new enhancements that were introduced with Exchange 2010.
First, a brief review of the ESE basics: Microsoft’s Extensible Storage Engine (ESE) is an ISAM (Indexed Sequential Access Method) data storage technology. The purpose of the ESE is to allow applications to store and retrieve data via indexed and sequential access. The ESE is suitable for server applications since its transactions are highly concurrent, but at the same time it is lightweight enough that it also works well for auxiliary applications. Worried about losing stored data in the event of a system crash? The ESE provides transacted data update and retrieval, meaning that its crash recovery mechanism maintains data consistency should your system crash.
As you all know, ESE relies on the B+ tree in order to store data. The following diagram features a simple tree that illustrates how information is stored in the data tree:
Since sorting and searching through mounds of data is time-consuming, ESE stores data in trees in order to optimize their sorting and searching behavior. In addition, the regular tree model has been updated using the B+ tree to allow for faster, more efficient sorting of data.
There are two types of data sorting: internal and external. Internal sorting means that the system can hold and sort the data entirely in memory. However, since not every data set fits in memory, the system is forced to store data on disk, and this is where the B+ tree is used.
Data in the ESE is stored based on the following hierarchy:
- A property is created and placed in a table record. Keep in mind that MAPI uses properties in order to define data and their structure at the lowest level.
- Multiple properties are placed in a record.
- The record is stored on a node, and a corresponding key is used to both index and quickly access the record. One thing to remember is that the leaf nodes (the end nodes) are logically linked together to allow the horizontal crawling and movement of data within the B+ Tree.
- A record is placed into lines which are then stored on a page, with the page being the smallest unit written to the hard disk. Page sizes in previous versions of Exchange: in Exchange 2003 the page size was 4 KB. That number doubled to 8 KB in Exchange 2007, and then quadrupled to 32 KB in Exchange 2010.
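The hierarchy above can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the ESE on-disk format: the class names `Record` and `LeafPage` and the sample property names are hypothetical, and only the leaf level of the B+ tree is modelled in order to show the horizontal links between leaf nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Page sizes quoted in the text, in KB, keyed by Exchange version.
PAGE_SIZE_KB = {"2003": 4, "2007": 8, "2010": 32}


@dataclass
class Record:
    key: int                    # key used to index and locate the record
    properties: Dict[str, str]  # multiple properties live in one record


@dataclass
class LeafPage:
    records: List[Record] = field(default_factory=list)
    next_leaf: Optional["LeafPage"] = None  # horizontal link to the next leaf


def scan_from(leaf):
    """Walk the leaf chain left to right, yielding record keys in order."""
    while leaf is not None:
        for record in leaf.records:
            yield record.key
        leaf = leaf.next_leaf


# Two hypothetical leaf pages, linked together.
second = LeafPage([Record(30, {"subject": "c"}), Record(40, {"subject": "d"})])
first = LeafPage(
    [Record(10, {"subject": "a"}), Record(20, {"subject": "b"})],
    next_leaf=second,
)

print(list(scan_from(first)))
```

Because the leaves are linked, a range scan can move sideways from page to page without climbing back up through the tree.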
How did Microsoft improve the storage engine in Exchange 2010?
Exchange 2007 introduced significant enhancements for storage usage and optimization; however, Microsoft wanted to further improve these enhancements with the release of Exchange 2010. While doing preliminary research to determine the most pertinent areas in storage use and optimization that need attention, Microsoft found that enterprises suffer from several challenges with the current storage technologies, including but not limited to:
- Random IO and disk limits: The current technologies provide limited random IO throughput; however, most of the current systems can perform several hundred sequential IO requests.
- Storage Design flexibility: As email communication increases, enterprises are continually demanding improved and flexible options for storing users’ growing amounts of data.
- Using SATA Disks and JBOD technologies: Enterprises were limited by the capacity of SAS/SCSI disks; however, 2 TB SATA disks are now available, provided that Exchange can work within the limited throughput of a SATA disk.
Task 1: change the ESE storage scheme:

In previous versions of Exchange, as illustrated in the first diagram, there were multiple tables per database that contained the users’ data. In figure 2 (and in Exchange 2007) there were multiple tables (for example: mailbox table, folders table, messages table, etc) per mailbox database. Thus, in order to open a user’s mailbox, Exchange required multiple small IOs to be performed.
In Exchange 2010, Microsoft moved to a table per mailbox, making it faster and easier to open a user’s mailbox. With Exchange 2010, opening a mailbox and reading specific email messages stored inside requires fewer and larger IOs. This is because the underlying architecture of the storage design was modified in Exchange 2010 in order to reduce IOPS (input/output operations per second). Microsoft dramatically reduced IOPS with Exchange 2010: a 70% reduction over Exchange 2007 and a 90% reduction over Exchange 2003.
In addition to the aforementioned features introduced in Exchange 2010, other enhancements have also been made to further reduce IOPS, including the Lazy View update and the usage of the ‘pay to play’ method. Remember that in previous versions of Exchange, custom views were updated as soon as the store received an email. Although this technique provided the end users with a better experience, it had a negative impact on Exchange, forcing the Exchange system to continuously update the view and create random small IOs in order to keep the store with the most updated view. With the Lazy View update, a view is only updated when requested by the end user.
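As a quick worked example of what those two percentages imply relative to each other, the following Python sketch starts from a hypothetical Exchange 2003 baseline (the figure of 1.0 IOPS per mailbox is an assumption chosen for easy arithmetic, not a measured value) and derives the other two versions from the 90% and 70% reductions quoted above.

```python
def implied_iops(baseline_2003):
    """Derive per-mailbox IOPS for each version from the quoted reductions.

    baseline_2003 is a hypothetical Exchange 2003 figure supplied by the caller.
    """
    iops_2010 = baseline_2003 * (1 - 0.90)  # 90% lower than Exchange 2003
    iops_2007 = iops_2010 / (1 - 0.70)      # 2010 is 70% lower than 2007
    return {"2003": baseline_2003, "2007": iops_2007, "2010": iops_2010}


for version, iops in implied_iops(1.0).items():
    print(version, round(iops, 3))
```

Taken together, the two figures imply that Exchange 2007 itself sat at roughly a third of the Exchange 2003 IOPS level.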
Exchange 2010 utilizes Lazy View technology in which the views are updated when the user attempts to access them. Although this increased the time it takes to open the view, it dramatically enhanced the Exchange IO performance by using the notion that it is faster for the disk to read data stored in larger, sequential pieces versus the disk head having to gather smaller chunks of data spread out across the disk.
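The difference between the two approaches can be sketched as follows. This is a toy model in Python, not the store's actual implementation: the class names `EagerView` and `LazyView` are hypothetical, a "view" is just a sorted list of subjects, and `view_updates` counts how often the view had to be rebuilt.

```python
class EagerView:
    """Pre-2010 style: the view is rebuilt on every delivery."""

    def __init__(self):
        self.items = []
        self.view_updates = 0

    def deliver(self, message):
        self.items.append(message)
        self.items.sort()          # view refreshed immediately
        self.view_updates += 1

    def open(self):
        return list(self.items)


class LazyView:
    """Lazy style: deliveries are queued, the view is rebuilt only on open."""

    def __init__(self):
        self.sorted_items = []
        self.pending = []
        self.view_updates = 0

    def deliver(self, message):
        self.pending.append(message)   # no view work at delivery time

    def open(self):
        if self.pending:
            # Fold all queued deliveries into the view in one batch.
            self.sorted_items.extend(self.pending)
            self.sorted_items.sort()
            self.pending.clear()
            self.view_updates += 1
        return list(self.sorted_items)


eager, lazy = EagerView(), LazyView()
for subject in ["c", "a", "b"]:
    eager.deliver(subject)
    lazy.deliver(subject)
```

Both views return the same contents when opened; the lazy one simply pays for the update once, at open time, instead of once per delivered message.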
In order for Microsoft to create a table per mailbox, they had to remove SIS (Single Instance storage). Some of you may complain about this initially, but never fear: Microsoft provided a work-around known as Database compression. This technology is used to compress the content of the database (especially text and html files), and provides an alternative to the SIS removal issue.
Now take another look at the Exchange 2010 ESE and compare it to Exchange 2007’s ESE. In Exchange 2007, in order to open a message in Joe’s mailbox, Exchange had to open the mailbox table, read the message header, open the message and read the attachment (examples of small random IOs.)
In Exchange 2010, the Exchange system can open the mailbox table, read the message header, and open the message directly. It is important to note that since these tables are now logically connected it is more convenient for Exchange to access them, and thanks to the new page size in Exchange 2010, E14 can read the entire message body in a single IO. If additional IOs are needed they can be done, but in order to streamline the data gathering process, these commands are now grouped in larger, sequential IOs.
Let us pause at this point and revisit our discussion of Microsoft’s enhancements to the ESE in Exchange 2010 in Part 2, at which time we will delve deeper into the topics of physical and logical contiguity.
Labels: Exchange 2010, Exchange Information Stores
Tuesday, July 8, 2008
To Offline Defrag or not to Offline Defrag that is the question...
Reproduced with permission from TelnetPort25.
Many Exchange Admins will come across this conundrum at some point during their careers - essentially when is it good to defrag your Exchange databases. There are of course many different views expressed by many different Exchange administrators on this particular subject - therefore in this article I would like to share some of my own personal thoughts on the subject.
As many of you know, ESEUTIL is the tool supplied with Exchange that allows for an offline defrag to happen. Before I begin, I would like to give a brief overview of ESEUTIL.
What is ESEUTIL?
ESEUTIL (located in the <Exchange Installation\Bin Folder>) might be considered by many to be the dark over-lord of database utilities which, in the blink of an eye, can reduce your information store to a quivering mass of non-functional dog do-do, and accelerate the demise of your career as an Exchange Admin.
However, is ESEUTIL really that bad? I suppose that the answer to this is yes and no, as using the tool incorrectly, or when it is not needed, can produce undesirable situations. The following are some quick bullet points about the pros and cons of using ESEUTIL:
Pros:
- Using ESEUTIL correctly and when required (more on this later) can physically reduce the size of your information store databases
- When you have no other options left and you have a dead database ESEUTIL can get you some data back (by using the dreaded /P switch)
- ESEUTIL can be used in a recovery scenario to roll forward to a specific point post a disaster as long as you have the Transaction Logs (but then again so can most decent backup products for Exchange)
- ESEUTIL can be used to check the structural health of your database
- ESEUTIL can be used to clone your database
Cons:
- It's command line based; messing up a command could leave you with a dead database
- Any Database that you intend to run ESEUTIL against must be off-line – therefore users cannot access the system resulting in lengthy down-time
- It's slow: depending on your hardware ESEUTIL will run at around 3 to 6 GB per hour (under a repair) and can be indeterminate during defrags
- It's not intelligent, and this is dangerous. For example, a defrag process creates a new database, copies useful data from the old database to the new one, then deletes the old PRODUCTION database and renames the TEMP database to the same name as the old. What if power is cut to the server during the production delete, and the rest of the process does not finish? Ouch!
Generally speaking you should only use ESEUTIL under the following Circumstances (there are generally no exceptions):
- When you have no usable backup of your Exchange Databases – Repair Scenarios
- When you have had a lot of transient behaviour in the database – Defrag Scenarios – for example;
- A large number of users have either left the company, or moved to another store within the environment
- You have installed an archiving solution into your environment and it has been running for at least 5 months
- You have hit a limit on the Database (in the standard Edition of Exchange only) – this scenario should not happen when using SP2 of Exchange 2003 or Exchange 2007
- When you have good reason (good means Application Event Log errors) to suspect corruption in the Database – Integrity Scenarios
- When you wish to replay log files into the Database
- When it is recommended by Microsoft Product Support Services, or when you are confident about using the command syntax and you are sure that it is going to be of benefit to you
OK, But I am still interested in ESEUTIL – can you give us some further information?
ESEUtil is designed to check and fix individual database tables based around the JET BLUE engine, however products like Exchange are comprised of many structured and complex pages (which can be either 4 or 8 kilobytes in size) which in turn are linked via indices which are accessed sequentially (this is called ISAM).
As a result ESEUTIL is not necessarily aware of the data contained within the database pages, nor the relationships between database pages. The result is that when ESEUTIL is used in, for example, its “Hard Repair” (/P) mode and it finds a damaged page or index, it deletes it; nothing else, it just deletes. Given the previous scenario, you may have had a database that will not mount; /P will potentially get you into the position where it will mount, but you will normally find missing data within Exchange.
An example of this: many years ago when consulting I encountered an Exchange 5.5 system where the Information Store would not start. There was no backup, therefore I had to use ESEUTIL /P in order to get the store to start. ESEUTIL fixed the database and the store service then started; however, every user in the database lost access to all their attachments (the icon would show in Outlook indicating that the message had an attachment, but the attachments could not be accessed).
Additionally ESEUTIL can be used to de-fragment, check the Integrity of, recover (Hard and Soft), copy, checksum, and dump various informational aspects of your databases.
ESEUTIL – De-fragmentation Mode [/D];
OK, now that we have had a brief look at ESEUTIL and established that it is indeed a tool that needs to be respected, I would like to go over the ESEUTIL command switch that most people ask questions about: DEFRAG, or the /D switch.
The de-fragmentation mode of ESEUTIL is designed to reduce the physical size of your Exchange databases, as online de-fragmentation does not physically reduce the size of the DB; it essentially performs internal maintenance within the Exchange database.
It does this by creating a temporary database file, reading through the live database page by page and copying all relevant data into the temp database (note that it skips over white space identified by the online maintenance (event ID 1221) in the live database). This process is generally known as re-organisation.
When all of the data from the live database is copied over into the temp database, the live instance is deleted and the temp database is renamed to that of the previous live instances (although this is a very simplified overview of the command).
All indexes in the database are also recreated as part of this process.
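As a very rough picture of the re-organisation described above, the following Python sketch treats a database as a list of pages and marks white space as `None`. It is a toy model only: real ESE pages, index rebuilds and the final delete-and-rename step are not represented, and the function name `offline_defrag` is my own.

```python
def offline_defrag(live_pages):
    """Copy every page that still holds data into a new 'temp' database,
    skipping white-space pages (represented here as None)."""
    temp_db = [page for page in live_pages if page is not None]
    return temp_db


# Hypothetical live database: three data pages and three pages of white space.
live = ["A", None, "B", None, None, "C"]
compacted = offline_defrag(live)

print(len(live), "pages before,", len(compacted), "pages after")
```

Note that the live database is left untouched while the copy is built, which is why the process needs enough free disk space for a second, smaller copy.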
You should be aware that you will need at least 110% of the size of your production database free on the drive in order to have a successful de-fragmentation. Should this not be possible, you have the option of either redirecting the temp file to another disk on the server, or, by following the steps in this article http://articles.techrepublic.com.com/5100-22_11-5285289.html, copying all of the required files to another server with enough space to handle the defrag. Bear in mind that you will then have to copy the database back from the additional server, which adds to the overall down-time of your mail system and also introduces the (small) chance of corruption during the copy back over the network.
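The 110% rule of thumb is easy to check before you start. The sketch below is one way to do it in Python; the function names are my own, and `check_paths` simply feeds the size of a real file and the free space of a real directory into the same comparison using the standard `os.path.getsize` and `shutil.disk_usage` calls.

```python
import os
import shutil


def required_free_bytes(db_size_bytes, percent=110):
    """Free space needed for the temp database, rounded up to a whole byte."""
    return -(-db_size_bytes * percent // 100)


def has_room_for_defrag(db_size_bytes, free_bytes, percent=110):
    """True when the free space is at least `percent` of the database size."""
    return free_bytes >= required_free_bytes(db_size_bytes, percent)


def check_paths(db_path, temp_dir):
    """Apply the same check to a real database file and a target directory."""
    db_size = os.path.getsize(db_path)
    free = shutil.disk_usage(temp_dir).free
    return has_room_for_defrag(db_size, free)


# A 50 GB database needs 55 GB free under the 110% rule.
print(required_free_bytes(50 * 1024**3) / 1024**3)
```

If the check fails for the database drive, the same function can be pointed at whichever drive you intend to redirect the temp file to.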
The basic command syntax for the De-fragmentation command is as follows:
ESEUTIL /D <path to database file> – for example ESEUTIL /D x:\EXCHSRVR\SG1\DB\Priv1.edb
There are a number of partner command line switches which accompany the de-fragmentation mode which are as follows:
/S – Specify the location of the Streaming File (this option is not implemented in 5.5 or 2007)
/T – Specify the location where the Temp Database file is to be created (useful if the disk that the database is on does not have enough free space to complete the De-fragmentation)
/F – Specify the location and the name of the temp streaming file (this option is not implemented in 5.5 or 2007)
/I – Do not de-fragment the streaming file
/P – Do not delete the temporary database files at the end of the process
/B – Make a backup copy of the database
Given the above commands and options – if I wished to defrag my Priv1.edb which is located on Drive X:, but place the temp file on L: I would use the following command:
ESEUTIL /D x:\EXCHSRVR\SG1\DB\Priv1.edb /T L:\<tempFile.tmp>
From the above you can derive that the syntax for a successful command is: ESEUTIL /D <Path to DB> <Options – e.g. /T>
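If you script your maintenance, the same syntax can be assembled programmatically. The Python sketch below only builds the argument list and does not run anything; it mirrors the form shown in this article (`/D` followed by the database path, then `/T` followed by the temp file), and the helper name and the temp file name in the example are hypothetical. Check `ESEUTIL /?` on your own version for the exact spacing it expects between a switch and its value.

```python
def build_defrag_command(db_path, temp_path=None):
    """Build the ESEUTIL defrag command as an argument list.

    Assumption: switches are passed in the form used in this article,
    i.e. "/T" and its value as separate arguments.
    """
    command = ["ESEUTIL", "/D", db_path]
    if temp_path is not None:
        command += ["/T", temp_path]
    return command


# Hypothetical example: database on X:, temp file redirected to L:.
print(build_defrag_command(r"x:\EXCHSRVR\SG1\DB\Priv1.edb", r"L:\tempFile.tmp"))
```

Keeping the command as a list rather than a single string avoids quoting problems if a path contains spaces.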
One of the questions that many people ask is how much space they can generally expect to claim back by performing an off-line defrag on their information store. The answer is pretty difficult to give, and should be mainly based around minimum expectation.
For example: the normal and widely accepted way to gain an idea is to check the Event Log for Event ID 1221 – see below:
Essentially the part of the Event Description which states “has ‘n’ megabytes of free space” is the bit that you are interested in.
The value of ‘n’ in the Event is generally described as the least amount of space that you can claim back (to within one megabyte).
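If you export the event descriptions to text, the value of ‘n’ can be pulled out with a small regular expression. This is a sketch in Python under one assumption: that the description contains the phrase “has n megabytes of free space” as quoted above. The sample description string is made up for illustration.

```python
import re

# Matches the phrase quoted from the Event ID 1221 description.
FREE_SPACE_PATTERN = re.compile(r"has (\d+) megabytes of free space")


def free_megabytes(event_description):
    """Return the free-space figure from an event description, or None."""
    match = FREE_SPACE_PATTERN.search(event_description)
    return int(match.group(1)) if match else None


# Hypothetical event text, for illustration only.
sample = 'The database "SG1\\Priv1" has 512 megabytes of free space.'
print(free_megabytes(sample))
```

Returning `None` for text that does not match makes it easy to skip unrelated events when looping over a log export.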
There is another way in which you can calculate the amount of space that you might gain back – however it does require you to take the database off-line to perform the process.
Although this is a pain – I have found this method to be pretty accurate when determining space reclamation metrics:
- In the Exchange System Manager Dismount the Database that you wish to process
- Open a Windows Command Prompt ([ Start -> Run -> Type CMD then press <Enter> ]) and type in the following command:
ESEUTIL /MS <path to edb file> >c:\Analysis.txt then press enter – see below
This will produce a text file (located in C:\) called Analysis.txt – when you open this file you will see that it is split up into two sections:
SLV Space Dump – this relates to the STM File (not in Exchange 2007) – see below:
At the bottom of the SLV dump you will find a section entitled “TOTALS”; here there is an entry called “FREE”. The value of this, when multiplied by 4096 (the length of a database page in Exchange 2003), will give you the free space in the STM file in bytes, so from my results above:
78 * 4096 = 319488 bytes
319488 bytes = 312KB – space that could be reclaimed
The other section of the report which is called the SPACE DUMP (which is much longer than the SLV dump as it relates to the EDB file) – looks like the following (please note that the following example has been cropped):
At the bottom of the SPACE DUMP on the far right hand side (under the “AVAILABLE” column) you will have a value.
In my case this value is 524; again this is the free space in pages rather than bytes, so in order to determine how much space I would get back I would use the following calculation:
524 * 4096 = 2146304 bytes
2146304 bytes = 2096 KB – space that could be reclaimed
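Both calculations above follow the same pattern, so they can be wrapped in a small helper. The Python sketch below assumes the 4096-byte Exchange 2003 page size used in this article; the function name is my own.

```python
def reclaimable_kb(free_pages, page_size_bytes=4096):
    """Convert a free-page count from the ESEUTIL /MS output into KB."""
    return free_pages * page_size_bytes // 1024


print(reclaimable_kb(78))   # SLV dump "FREE" value from the example
print(reclaimable_kb(524))  # SPACE DUMP "AVAILABLE" value from the example
```

For a database with a different page size, pass the page size in bytes as the second argument.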
Checking the Event 1221 events is easy and does not cause any disruption to normal operations, whereas using ESEUTIL /MS does require the store to be off-line. Personally I feel that the ESEUTIL /MS command gives you a more accurate representation of the space that could be recovered, but you need to be aware that it does cause disruption. However, if you are considering defragging your Exchange databases you could build the space analysis into the down-time required.
I would personally only use the ESEUTIL /MS method to check for potential space under the following circumstances:
- When a large number of people (much greater than 500 users who were heavy users) have left the organisation and their mailboxes have been deleted from the store (Purged)
- When a company instigates a program such as Mailbox Archiving where mail items going back many years are removed from the store
- You know for a fact that there has been no defrag performed on the store for a number of years (at least 3 years).
Now that we have an idea about how much space MIGHT be reclaimed, the question that needs to be answered is – “Do I actually need to defrag the database?”
Do you Need to Defrag Your Database?
OK, let's consider the basic reasons why an Exchange Admin would consider de-fragmentation of one of their databases, and then go over some explanation as to why a DEFRAG might not be your first option even though it might seem so:
- Performance issues
- Running out of space on the Database Disk
- General Space reclamation
- When asked to by Microsoft PSS
Performance Issues:
One of the first things that I would like to address is that having a large database does not always mean that you will have poor performance.
In Exchange 2003 Enterprise Edition the theoretical maximum size of a single Database can be 16TB (or as often described “unlimited”) whereas in Exchange 2003 SP2 STANDARD Edition the maximum size of the Database is 75GB; however, in practice one would assume that there must be a point where Size = Performance.
I have seen Exchange Database instances which have reached sizes between 190 and 220 GB (and I also know of larger sizes) which perform very well; however, the underlying hardware has been specified to cope with the IO operations per second (IOPS) demands that such a size would require. It should also be noted that an Exchange Database should be cared for: it should be monitored, have sensible online maintenance windows which complete, and have backup regimes that are successful and serviced sensibly.
Conversely, I have also seen Database sizes of 56 GB which perform very badly. This can be linked to the hardware, to online maintenance that does not run correctly, and to no form of checks being made upon them.
So in terms of the [ Size = Performance ] theory (considering the statements above), the outcome means that the formula (as stated) might change to:
Size (of DB) + Administrative Specification + Administrative Habits / User Habits = Desired Deserved Performance
Essentially, if you specify your hardware according to accurate load and ensure that required routines run against the databases (and do not overlap with backup schedules), then you can expect the overall physical file sizes to reach significant proportions without manual intervention. However, if you do not follow initial sizing guidelines, allow your Exchange server to proceed un-monitored and do not regulate the actions of your user population, then you are asking for trouble.
In terms of my statement “regulating the actions of your user population”, there is another school of thought on overall performance (right from the development team of Exchange), where it is stated that the number of items in “Critical Path Folders” (e.g. Inbox, Calendar, and Sent Items) can also have an effect on the performance of a user / database. Have a look at the following article: http://msexchangeteam.com/archive/2005/03/14/395229.aspx (and read the comments). Essentially, if you allow your users to use Exchange as a “Filing System” you might (or perhaps will) experience performance issues.
So in summary if you are experiencing performance issues with your Exchange Databases, before you consider using ESEUTIL to defrag, have a look at other root causes. As mentioned above it is possible to have really large EDB files and acceptable performance, so in the first instance use tools such as PERFMON which will give you useful information about what your Exchange Server is doing.
The following is a link to Microsoft’s Exchange Performance and Scalability Guide (http://technet.microsoft.com/en-us/library/aa996078.aspx), where you will find an overview of which counters within PERFMON are relevant to Exchange. I also recommend that you download the PerfMON Wizard, which automates the configuration of a number of counters that can provide data regarding the performance of your Information Store.
Also, if you are experiencing performance problems it is a good idea to check what is going on inside your Exchange databases. This can be accomplished by opening up the Exchange System Manager and navigating to the following: [ Administrative Groups -> Servers -> Your Server -> Storage Group -> Database Name -> Mailboxes ]. Have a look under the “Total Items” column; the readings here will give you an idea of whether any (or many) of your users fall into the criteria which the article on the MS Exchange Team blog describes.
My final comments on performance are that you should ensure that the setup and configuration of your Exchange disk subsystem are matched to the load and size of the database. If you are experiencing performance problems, before even considering a Defrag have a look at the following:
- Are you using the correct RAID levels for your Databases and Transaction Logs (RAID 5 (or 10) for Databases RAID 1 for Transaction Logs)
- Have you separated out your Transaction Logs from your Databases
- Does each database exist on its own LUN
- Move the TEMP/TMP to a high performance drive
- Are you using 10K or 15K drives?
If you have gone through all of the above and feel that everything is as it should be, then it might be worth considering defragging the Database.
Running out of space on the Database Disk:
From what I have seen in the Forums this is one of the most common reasons for administrators wishing to run ESEUTIL /D.
Wherever you can, the best option is to add further disk and then move your databases over to the new storage rather than defrag. I say this because generally you are never looking at huge amounts of space being reclaimed from your databases when you use ESEUTIL in defrag mode, so in the end you are only putting off the inevitable (running out of space). The best option is to bite the bullet, so to speak, and add further storage.
To give you an idea of storage reclamation, I recently ran a defrag against my corporate databases and the following are the space saving results (bear in mind that it has been 3 years since I last defragged the stores, and for two of the 3 years we have been using Enterprise Vault):
As you can see for the time periods involved, the amount of users and the presence of an Archiving solution the space savings are not huge.
However if you are not in a position to increase the storage within your server then you would have little choice but to use ESEUTIL /D however I would recommend the following prior to running the defrag:
- Examine the Application Event Log for ID 1221, or perform an ESEUTIL /MS against the databases (then use the method of working out the potential space reclamation from above). This will help you work out how much space you will get back. You might be faced with a situation where the amount of space that you reclaim is only enough for another month, in which case you will need to present a business case for upgrading the server.
- Ensure that you backup your Databases prior to running the Defrag
- Prepare your business for down time – depending on the hardware that you have and the size of your Databases ESEUTIL /D can take quite a while to run (for example if you look at my table above SRV2–GeneralStorageDB.edb took 17 hours to complete – and that was without any other databases mounted or being defragged on the same server).
General Space reclamation:
General space reclamation suggests that an administrator is using ESEUTIL /D as part of a scheduled and regular maintenance task. Please do not do this. From the examples that I have given above, even with high user turnover, an archiving solution and several years between defrags, I only claimed back slightly over 21 GB from 12 databases. If you examine the tables from a per-database perspective, the actual space reclaimed represents a very small percentage of the overall size of the DB.
Scheduling regular Defrags (for example every 6 months) only guarantees that your database will be off-line for several hours every 6 months.
If you have the inclination to reclaim space periodically and you are Using the Enterprise Edition of Exchange server – then perhaps a better way of doing this is to create a new database and then move your users over to the new database. This eliminates down-time and also serves the same purpose as defragging.
When asked to by Microsoft PSS:
Those of you who have support agreements with Microsoft may be asked to defrag a database as part of a support call.
Normally PSS will be trying to get an index rebuild rather than being interested in shrinking the size of the database. They know what they are doing, but ensure that you follow their instructions to the letter.
Summary:
Ultimately your Exchange database belongs to you, therefore as an Admin you are best placed to make a choice on the course of action that you wish to take. The above is general advice from experience – however it may not fit all scenarios, so just to finish if you are going to perform this task please consider the following pointers:
- Always ensure that you have a backup of ANY Exchange Database that you are going to Defrag – ensure that you have tried to restore it prior to starting.
- Understand that Defragging is a lengthy process – your database will be out of action for a significant period of time.
- Ensure that your server has a working UPS – nothing worse than a power outage right in the middle of using this tool.
- If you have the Enterprise Edition of Exchange – consider creating a new store and moving the users over rather than an off-line defrag.
- If you have performance issues consider the performance area and the options given there before Defragging – there is nothing worse than taking your database down for 10 hours and then getting no perceivable benefit.
- Do your homework on the amount of space you might get back – similar to above, nothing worse than 10 hours down-time and only getting back 1 GB
- Don’t use ESEUTIL /D as part of a regular schedule – it's not worth it.
This article was provided by: Andy Grogan
Labels: Exchange 2003, Exchange 2007, Exchange Information Stores, Exchange Support, Exchange Tips