NetApp Tech OnTap

Getting the Most from Exchange 2007 Replication

Best Practices for SP1

About a year ago, I wrote an article that appeared here in Tech OnTap discussing, among other things, the new replication capabilities in Microsoft® Exchange Server 2007. The recent release of Exchange 2007 Service Pack 1 (SP1) not only provides bug fixes and enhancements to the capabilities I described, it also includes substantial new functionality, and it affects the way you should design the architecture of your NetApp® storage for Exchange. This article is an overview of the changes that have the greatest impact on storage and a reexamination of best practices for designing Exchange storage. For details on these topics, see my recent technical report, “Microsoft Exchange 2007 SP1 Continuous Replication Best Practices Guide.”

Important Enhancements and New Features

Among the many changes in Exchange 2007 SP1, there are five that you should definitely know about.

SP1 is the only version of Exchange 2007 that supports Windows® Server 2008. Windows Server 2003 is still supported under SP1 as well.

SP1 offers a new flavor of replication, Standby Continuous Replication—SCR. SCR lets you separate high availability and site resilience functions and lets you create a lot of new redundancy scenarios. It can be configured for use with any type of Exchange server as long as it’s not also configured for Local Continuous Replication. (You can’t be using LCR on your target SCR servers either.) You can have multiple SCR targets for each storage group. For instance, you might have both a local and a remote replica. You can also configure a single SCR server as a target for multiple source servers.
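The SCR pairing rules just described can be written down as a small planning check. The Python sketch below is illustrative only: the `Server` class, its `lcr_enabled` flag, and `validate_scr_plan` are hypothetical names, not part of Exchange or any NetApp tool, and the check covers only the constraints stated in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical planning record for an Exchange server."""
    name: str
    lcr_enabled: bool = False


def validate_scr_plan(plan):
    """Return a list of problems found in an SCR plan.

    `plan` maps (source server, storage group name) to a list of
    target servers. Multiple targets per storage group and a shared
    target for several sources are both allowed.
    """
    problems = []
    for (source, storage_group), targets in plan.items():
        # An SCR source must not also be configured for LCR.
        if source.lcr_enabled:
            problems.append(f"{source.name}/{storage_group}: source uses LCR")
        # An SCR target must not be using LCR either.
        for target in targets:
            if target.lcr_enabled:
                problems.append(
                    f"{source.name}/{storage_group}: target {target.name} uses LCR"
                )
    return problems


mbx1 = Server("MBX1")
mbx2 = Server("MBX2")
local_standby = Server("SCR-LOCAL")
remote_standby = Server("SCR-REMOTE")
lcr_box = Server("MBX-LCR", lcr_enabled=True)

plan = {
    (mbx1, "SG1"): [local_standby, remote_standby],  # local and remote replica
    (mbx2, "SG1"): [remote_standby],                 # one target, two sources
    (mbx2, "SG2"): [lcr_box],                        # rejected: target uses LCR
}
print(validate_scr_plan(plan))
```

In this example only the last entry is flagged, because its target is an LCR-enabled server.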

SP1 completely redesigns how storage is used on replication targets to make it much more efficient. The I/O on all replication targets (LCR, CCR, and SCR) is dramatically reduced from what it was in the initial release of Exchange 2007. Formerly, database I/O could be as much as 200% to 300% more on replication targets than it was on the source. In SP1, database I/O on the target is now 78% less than on the source. Target log I/O has also been reduced in SP1.
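To put those percentages in concrete terms, the short Python sketch below applies them to a source database workload. The function name and the 1,000 IOPS figure are made up for illustration, and "200% to 300% more" is interpreted here as three to four times the source I/O.

```python
def target_database_iops(source_iops):
    """Estimate replica database IOPS from the percentages quoted above."""
    return {
        # Initial release: 200% to 300% more I/O than the source.
        "initial_release_low": source_iops * (100 + 200) / 100,
        "initial_release_high": source_iops * (100 + 300) / 100,
        # SP1: 78% less I/O than the source.
        "sp1": source_iops * (100 - 78) / 100,
    }


estimate = target_database_iops(1000)  # hypothetical source workload
for label, iops in estimate.items():
    print(f"{label}: {iops:.0f} IOPS")
```

For a source running 1,000 database IOPS, the target drops from somewhere between 3,000 and 4,000 IOPS in the initial release to about 220 IOPS with SP1.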

The aggressiveness of online defragmentation has been reduced and monitoring is provided. You can now see how long a defragmentation pass takes to complete and how often the database has been fully defragmented. You can use this information to adjust the online maintenance window to reduce disk churn, which affects Snapshot™ space consumption.

Database scanning can now be performed during online maintenance to check for corruption. Prior to SP1, the only way to ensure that a database was not corrupt without taking it offline was to perform an online streaming backup, forcing each page in the database to be read and verified. This was also the only way to force the zeroing of deleted pages, as mandated in some security scenarios. If you’ve been performing VSS backups on a replica rather than on the active database, this probably means that your production database has never been fully verified.

In practice, both NetApp and Microsoft recommend using Volume Shadow Copy Service (VSS) for backup. The backup application verifies the backup data, but once again a check is never actually run on the live data. That creates a small but real risk of database corruption creeping in.

When enabled in the registry of an Exchange server with SP1, database scanning and page zeroing run every day during the online maintenance window to verify the active database and zero deleted pages.

Best Practices

At this point you’re probably saying, “That’s all great, but how should I take advantage of these new features in my NetApp environment?”

Windows Server 2008. If you want your Exchange servers to run the latest OS, you need to install SP1 on a new server running Windows Server 2008; upgrading the OS of a running server is not supported. You don’t need to install Exchange 2007 before installing SP1; the service pack includes everything.

Regardless of OS, I recommend upgrading to SP1 for the bug fixes and enhancements, which have a major impact on storage.

LUN configuration for replication. The Microsoft recommended best practice is that you provide the same LUN configuration for both the source and target sides of each replica—and naturally you’ll want the source and target LUNs to be on separate storage systems. Because of the heavy I/O load that the initial release of Exchange 2007 put on target servers, we used to recommend that you configure a separate log and database aggregate for each cluster node. If a storage system was supporting multiple replication servers (sources and/or targets), it needed a lot of independent aggregates.

With the reduction in I/O load in SP1, that requirement goes away. We continue to recommend that you keep logs and databases in separate aggregates (and you should still keep source and target storage separate), but you can now group multiple databases from separate clusters into a single aggregate, and the same goes for logs. This eliminates “islands” of storage, simplifies storage configuration, and can increase utilization as well.


Figure 1) LUN configuration for replicas with Exchange 2007 SP1. Both source and target storage systems use 1 aggregate for databases and 1 for logs for a total of 4 aggregates (2 on each system) versus the 12 aggregates that were previously needed.
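The caption's numbers can be reproduced under one assumption that the article does not state: that the figure depicts three two-node CCR clusters sharing two storage systems. The Python sketch below is a worked example of that arithmetic only; the function names are hypothetical.

```python
def aggregates_before_sp1(cluster_nodes):
    """Old guidance: one log aggregate and one database aggregate per cluster node."""
    return cluster_nodes * 2


def aggregates_with_sp1(storage_systems):
    """SP1 guidance: one log aggregate and one database aggregate per storage system."""
    return storage_systems * 2


clusters = 3               # assumption: three CCR clusters in the figure
nodes = clusters * 2       # each CCR cluster has a source node and a target node
storage_systems = 2        # one for source LUNs, one for target LUNs

print(aggregates_before_sp1(nodes))          # 12
print(aggregates_with_sp1(storage_systems))  # 4
```

The SP1 count depends only on the number of storage systems, so adding clusters no longer adds aggregates.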

Online defragmentation. Defragmentation can have a big impact on the number of blocks that must be retained by NetApp Snapshot copies. The defrag process changes blocks, and NetApp Snapshot copies maintain a stable, point-in-time image by retaining the previous contents of changed blocks. Therefore the more blocks you change through defragmentation, the more space you need to store Snapshot copies.

This is where the new metrics come in. Microsoft recommends that defragmentation be completed in 14 days, so if you check the metrics and find that defragmentation is completing much sooner, you can shorten the length of the daily online maintenance window to reduce the amount of daily disk churn, which in turn reduces Snapshot space consumption.

For more precision, two performance counters can be logged to determine your online defragmentation trend:

  • MSExchange Database ==> Instances\Online Defrag Pages Freed/Sec
  • MSExchange Database ==> Instances\Online Defrag Pages Read/sec

If the ratio of reads to freed pages is greater than 100 to 1, reduce the online maintenance window. If the read-to-freed ratio is less than 50 to 1, increase the online maintenance window.
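The rule of thumb above is easy to apply to logged counter samples. The Python sketch below is a minimal illustration, assuming the two counters have already been exported as lists of per-interval values; the function name and the sample numbers are hypothetical, and the code does not read performance counters itself.

```python
def maintenance_window_advice(pages_read_samples, pages_freed_samples):
    """Apply the 100:1 and 50:1 read-to-freed thresholds to logged samples.

    Returns "reduce", "increase", "keep", or "no data".
    """
    total_read = sum(pages_read_samples)
    total_freed = sum(pages_freed_samples)

    if total_read == 0 and total_freed == 0:
        return "no data"   # defragmentation did not run in this log
    if total_freed == 0:
        return "reduce"    # pages were read but none freed: ratio exceeds 100:1

    ratio = total_read / total_freed
    if ratio > 100:
        return "reduce"    # mostly reading pages that need no work
    if ratio < 50:
        return "increase"  # defragmentation is still freeing many pages
    return "keep"          # between 50:1 and 100:1


print(maintenance_window_advice([1000, 2000], [5, 10]))  # ratio 200:1
print(maintenance_window_advice([400], [10]))            # ratio 40:1
print(maintenance_window_advice([750], [10]))            # ratio 75:1
```

Summing the samples before dividing gives the ratio over the whole logging period rather than an average of per-interval ratios, which avoids dividing by zero in intervals where no pages were freed.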

Database scanning and page zeroing. Database scanning has no schedule of its own: it is either on or off. If it’s on, it runs every night during your online maintenance window. That’s okay, but you have to ask yourself whether you really need to verify your database that frequently, since essentially it reads the entire database block by block. Naturally, that creates a lot of extra load on storage systems.

If page zeroing isn’t critical to you and you have NetApp SnapManager® for Exchange, an alternative is to configure a weekly copy backup that doesn’t truncate your logs. This will check for corruption and verify your active database. You may be able to schedule it on a weekend when the workload is lighter.

If you absolutely have to have page zeroing, running database scanning every night is the best way to ensure that it happens. Be aware that if a database didn’t have page zeroing enabled from the beginning, the first time it runs it will create an extreme I/O load. To reduce the impact of the first pass of page zeroing, you can enable throttling.

 

Getting the Most from Exchange 2007 SP1

Exchange 2007 SP1 offers so many enhancements and new functionality that it’s almost like a new release of Exchange. By paying attention to a few best practices, you can get the most from your Exchange installation and your NetApp storage. Here’s a summary of current NetApp best practice recommendations for Exchange 2007 SP1.

LUN Configuration and Replication

  • Isolate active and target server storage on separate storage systems.
  • Provision the active and target LUNs identically with regard to capacity and performance.
  • Separate logs and databases in their own aggregates.
  • Create a separate NetApp FlexVol® volume for each storage group.
  • Use NetApp RAID-DP® for superior performance and protection.
  • Run SnapManager for Exchange backups on the target node and adjust the online maintenance window on the active node.
  • Consider NetApp ReplicatorX™ and/or SnapMirror® to achieve an RPO of less than 5 minutes when replicating Exchange databases, logs, and hub transport data.

Defragmentation and DB Scanning

  • Reduce the online maintenance window if defragmentation is completing a full pass within 2 weeks and the read-to-freed ratio is greater than 100 to 1.
  • Consider using a copy backup (which does not truncate logs) and checksum integrity to validate database health instead of online database scanning.
Robert Quimbey
Microsoft Alliance Engineer
NetApp

Robert joined NetApp in 2007 after 8 years as a member of the Microsoft Exchange product team, where he was responsible for storage and high availability. Since joining NetApp, his activities have continued to center around Exchange 2007, including designing best practices and an in-depth analysis of Clustered Continuous Replication (CCR) I/O. His recent work will lead to the creation of reference architectures for Exchange.


Interested in learning more about deploying Exchange on NetApp? Visit NetApp at Microsoft TechEd in Orlando, Florida, June 10-13, 2008. Stop by booth 201, and attend our technical session.

Wednesday, June 11
2:45-4:30 p.m.

How resilient is your Microsoft® applications infrastructure? Corporate IT organizations struggle with business continuity and disaster recovery challenges every day. Often there is no simple solution, and the current strategies are usually untested. This session discusses how you can address business continuity and disaster recovery today with Microsoft products and technologies.

At this session, you will learn how to create high-availability and disaster recovery solutions for Microsoft applications such as Exchange Server, SQL Server™, Office SharePoint® Server, and Microsoft Hyper-V™ to ensure that your next disaster recovery exercise will be a positive and defining moment in your career.

 