SCR and CCR Clustering Technologies


Exchange CCR common issues

Component

Errors

Possible Reasons

Possible Solutions

PassiveNodeUp 

Failure 

Active node cannot communicate with the passive node

Cluster service has stopped or failed on the passive node

Public network or heartbeat network cannot reach the passive node

Verify in the Services MMC snap-in that the Cluster service is running.

If the Cluster service starts and stops repeatedly, review Event Viewer for error messages.

Verify that the network interface cards are connected and available to the cluster, and that the two nodes can communicate with each other.

For more information see the following link: http://msexchangeteam.com/archive/2008/04/03/448615.aspx

ClusterNetwork 

Failure 

Network connectivity outage

Network switch down

NIC hardware problem

NIC disabled in the OS

Check network connectivity between the active and passive nodes.

Verify that each node can reach its default gateway through the network switch.

Verify that there are no hardware error messages in Event Viewer.

Verify that all network cards used by the cluster are enabled and configured with static IP addresses.
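To find NICs that are not configured with a static address, one option is to capture the output of ipconfig /all on each node and look for adapters that report DHCP Enabled: Yes. The Python sketch below parses that text. It assumes English-language output in the usual layout (unindented adapter headers ending in a colon, indented dotted labels), which may vary between Windows versions and locales; the function name is illustrative.

```python
import re


def adapters_using_dhcp(ipconfig_text: str) -> list[str]:
    """Return the names of adapters whose 'DHCP Enabled' value is 'Yes'.

    Assumes English-language `ipconfig /all` output: each adapter section
    starts with an unindented header ending in ':' (for example
    "Ethernet adapter Heartbeat:"), and settings are indented lines of the
    form "Label . . . . : value".
    """
    dhcp_adapters = []
    current_adapter = None
    for line in ipconfig_text.splitlines():
        stripped = line.rstrip()
        if stripped and not line[0].isspace() and stripped.endswith(":"):
            # New adapter section header; drop the trailing colon.
            current_adapter = stripped[:-1]
            continue
        # Indented setting line such as "   DHCP Enabled. . . . : Yes"
        match = re.match(r"\s+DHCP Enabled[ .]*:\s*(\S+)", line)
        if match and current_adapter and match.group(1).lower() == "yes":
            dhcp_adapters.append(current_adapter)
    return dhcp_adapters
```

Any adapter this returns for a cluster node is a candidate for reconfiguration with a static IP address.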

QuorumGroup 

Failure 

Cluster IP address is in a failed state

IP Address conflict

File Share Witness is in a failed state (see below)

If the cluster IP address fails, the cluster name resource cannot be brought online, because it depends on the cluster IP address (see also the DnsRegistrationStatus item). If the file share witness is in a failed state, review the FileShareQuorum solutions and workarounds.

FileShareQuorum 

Failure 

Cluster cannot reach the server that hosts the file share witness.

The file share was created with the wrong shared permissions.

The file share was created with the wrong NTFS permissions. 

Check that the server hosting the file share witness is available and that the shared folder still exists.

Verify that the cluster's computer account has Full Control in both the share permissions and the NTFS permissions of the folder.

The Everyone group must be present and have Read permission on the shared folder.

For more information on how to create the file share witness, review the articles below:

http://technet.microsoft.com/en-us/library/bb676490(EXCHG.80).aspx

http://msexchangeteam.com/archive/2008/04/03/448615.aspx 
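A quick way to confirm that the server hosting the file share witness is reachable from a cluster node is to test a TCP connection to the SMB port (445), which is used to access the share. The Python sketch below is a generic reachability check under that assumption; it does not verify share or NTFS permissions, and is_port_reachable is an illustrative name.

```python
import socket

SMB_PORT = 445  # SMB over TCP, used to access the file share witness share


def is_port_reachable(host: str, port: int = SMB_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, run is_port_reachable("FSW01") from each cluster node, where FSW01 is a hypothetical name for the witness server. A False result points to a name-resolution, firewall, or network problem rather than a permissions issue.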

CmsGroup 

Failure
 

If any resource in the Exchange clustered mailbox server (CMS) group is stopped, this item will be in a critical state. Examples include the Microsoft Exchange Information Store service, the Microsoft Exchange System Attendant service, and any of the storage groups and mailbox databases.

Check which service is stopped and look in Event Viewer for error messages. If both cluster nodes were shut down unexpectedly, the CMS group may be in a failed state when they are brought back online; in that case, try stopping one of the servers and starting the services on the active node. During a failover, this item may briefly be in a critical state and then return to normal once the Exchange CMS group is back online.
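To see which CMS service is stopped from a script rather than from the Services MMC, you could parse the output of sc query for each service. The Python sketch below assumes the service short names MSExchangeIS and MSExchangeSA and English-language sc query output containing a line such as STATE : 4 RUNNING; adjust both if your environment differs. The helper names are illustrative.

```python
import re
import subprocess
from typing import Optional

# Assumed service short names for the CMS services named above (Exchange 2007).
CMS_SERVICES = {
    "MSExchangeIS": "Microsoft Exchange Information Store",
    "MSExchangeSA": "Microsoft Exchange System Attendant",
}


def query_service(name: str) -> str:
    """Run `sc query <name>` and return its text output (Windows only)."""
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return result.stdout


def service_state(sc_output: str) -> Optional[str]:
    """Extract the state keyword (RUNNING, STOPPED, ...) from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def stopped_services(outputs: dict[str, str]) -> list[str]:
    """Given {service_name: sc_output}, return display names not in the RUNNING state."""
    return [
        CMS_SERVICES.get(name, name)
        for name, text in outputs.items()
        if service_state(text) != "RUNNING"
    ]
```

On a cluster node, stopped_services({name: query_service(name) for name in CMS_SERVICES}) returns the display names of any of these services that are not running. Storage group and database resources are not covered by this check.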

NodePaused 

Failure
 

The node was paused by an administrator.

In Cluster Administrator or the Failover Cluster Management tool, resume the paused node.

DnsRegistrationStatus 

Failure
 

The Cluster service cannot register the cluster's DNS record in the Active Directory-integrated DNS zone.

Verify that the DNS service is up and running.

Verify that the DNS servers configured on the public network adapter can be reached from every node of the cluster.

Run ipconfig /flushdns && ipconfig /registerdns, and check Event Viewer for any DNS-related errors.

Try to ping the cluster name from another server on the network.

Verify that your Active Directory-integrated zone allows secure dynamic updates and that there are no replication issues in your AD infrastructure.

NOTE: The Cluster service uses Kerberos authentication to register the record in the AD-integrated DNS zone.
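The name-resolution part of these checks can also be scripted from another server. The Python sketch below uses the standard socket.getaddrinfo call to list the IPv4 addresses a name resolves to; comparing them with the cluster IP address shows whether the registered record is present and current. It only tests resolution from the machine it runs on and does not exercise dynamic DNS registration itself; the function names are illustrative.

```python
import socket


def resolve_ipv4(name: str) -> list[str]:
    """Return the sorted IPv4 addresses `name` resolves to, or [] if lookup fails."""
    try:
        results = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (address, port).
    return sorted({info[4][0] for info in results})


def name_matches_ip(name: str, expected_ip: str) -> bool:
    """True if `name` currently resolves to `expected_ip`."""
    return expected_ip in resolve_ipv4(name)
```

For example, name_matches_ip("EXCMS01", "10.0.0.20"), substituting your own cluster name and IP address (both hypothetical here).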

ReplayService 

Failure
 

Microsoft Exchange Replication service is not running. 

Try to start the Microsoft Exchange Replication service manually through the Services MMC snap-in. If the service stops repeatedly, review Event Viewer for more error messages. Once the service starts, it resumes replication of your storage group copies from the active node to the passive node.

DBMountedFailover 

Failure  

The database is not mounted. 

If the Cluster service is running and the Microsoft Exchange Replication service is started, use the Exchange Management Shell (Mount-Database cmdlet) or the Cluster Administrator tool to try to mount the database.

If the mount fails, review the Application and System logs for errors. Your database may be in a dirty shutdown state or corrupted. You may want to contact Microsoft PSS or call us for assistance.
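To check whether a database is in a dirty shutdown state, you can dump its header with eseutil /mh <database file> and inspect the State line. The Python sketch below extracts that value from captured output. The line format (State: Dirty Shutdown or State: Clean Shutdown) is an assumption based on typical eseutil output, and the function names are illustrative.

```python
import re
from typing import Optional


def database_state(eseutil_mh_output: str) -> Optional[str]:
    """Return the value of the 'State:' line in `eseutil /mh` output, if present."""
    match = re.search(r"^\s*State:\s*(.+?)\s*$", eseutil_mh_output, re.MULTILINE)
    return match.group(1) if match else None


def is_dirty_shutdown(eseutil_mh_output: str) -> bool:
    """True if the header dump reports a dirty shutdown."""
    return database_state(eseutil_mh_output) == "Dirty Shutdown"
```

A dirty shutdown usually means transaction logs still need to be replayed before the database can be mounted, so this is a point at which to review the logs or contact support rather than force a mount.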

SGCopySuspended 

Checks whether any storage group copy is in the 'Suspended' state.

The storage group copy was suspended, either manually or automatically.

Suspending replication stops all propagation of changes from the active storage group to the copy for the duration of the suspension. If a failover happens during this time, the storage group copy will not have the latest changes. Depending on the volume of changes that have occurred on the active node, the missing updates are likely to prevent the copy on the passive node from being mounted. In that case, you can either use the version of the storage group that is available on the passive node or wait until the original server recovers.

Keep the time that replication is suspended as short as possible to minimize this exposure. For more detail on how to manage storage group copies, see: http://technet.microsoft.com/en-us/library/aa997676(EXCHG.80).aspx

SGCopyFailed 

The passive node was shut down, and when it was brought back online the storage group copy failed to initialize.

The Microsoft Exchange Replication service has stopped.

In this case, reseed the whole storage group from the active node to the passive node. Use the Update-StorageGroupCopy -Identity <StorageGroupName> -DeleteExistingFiles cmdlet, with the appropriate parameters, to initiate a full reseed of your storage group.

Please note that this command should be executed from the passive node of your cluster.

For more information, please see the following article: http://technet.microsoft.com/en-us/library/aa998853(EXCHG.80).aspx 

SGInitializing 

The passive node was shut down, and when it was brought back online the storage group copy failed to initialize.

A node of the cluster was shut down and brought back online.

The Microsoft Exchange Information Store service was stopped and brought back online.

The Microsoft Exchange Replication service was shut down and brought back online.

Checks whether any storage groups are in the Initializing state.

Check whether another administrator created a new storage group.

Check whether another administrator failed over the cluster with the Loss Less option selected.

If the initialization process does not progress, suspend the storage group copy with the Suspend-StorageGroupCopy -Identity <StorageGroupName> cmdlet, and then, on the passive node, perform a full reseed of the storage group copy with Update-StorageGroupCopy -Identity <StorageGroupName> -DeleteExistingFiles.

For more information review the following article: http://technet.microsoft.com/en-us/library/aa998182(EXCHG.80).aspx 
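The suspend-then-reseed sequence above can be assembled as Exchange Management Shell commands for a given storage group. The Python sketch below only builds the command strings (it does not run them), quoting the identity as a PowerShell single-quoted string. The helper names are hypothetical; confirm the parameters against the linked article before running the commands on the passive node.

```python
def ps_quote(value: str) -> str:
    """Quote a value as a PowerShell single-quoted string ('' escapes a quote)."""
    return "'" + value.replace("'", "''") + "'"


def reseed_commands(storage_group: str) -> list[str]:
    """Build the commands for a full reseed: suspend the copy, then reseed it.

    -DeleteExistingFiles removes the existing copy files on the passive node
    before seeding, as described above.
    """
    identity = ps_quote(storage_group)
    return [
        f"Suspend-StorageGroupCopy -Identity {identity}",
        f"Update-StorageGroupCopy -Identity {identity} -DeleteExistingFiles",
    ]
```

For example, reseed_commands(r"CMS01\First Storage Group") returns the two commands with the identity quoted as 'CMS01\First Storage Group', where CMS01 is a hypothetical clustered mailbox server name.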

SGCopyQueueLength 

Failure 

The copy queue length is above the warning or failure thresholds. There are many items in the queue waiting to be delivered to the passive node. 

This item monitors how many log files are still waiting to be copied from the active node to the passive node.

If the storage group copy queue contains many items, consider the following possible causes:

A very high volume of sent and received e-mail messages is generating a large number of transaction logs.

A malware infection inside the network has spread to the e-mail system.

Inappropriate use of the e-mail system to send large volumes of messages (e-mail marketing campaigns, internal applications, etc.).

SMTP relay attacks that get past security controls and gain permission to send large amounts of e-mail.
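Both queues can be thought of as gaps between transaction log generation numbers: the copy queue is the logs generated on the active node but not yet copied to the passive node, and the replay queue is the logs already copied but not yet replayed into the passive copy. The Python sketch below computes both from three generation numbers. In Exchange 2007, comparable values are reported by Get-StorageGroupCopyStatus (for example LastLogGenerated, LastLogCopied and LastLogReplayed), but treat that mapping as an assumption and this calculation as an approximation of how the product derives its queue lengths.

```python
from dataclasses import dataclass


@dataclass
class LogGenerations:
    last_generated: int  # newest log generation created on the active node
    last_copied: int     # newest log generation copied to the passive node
    last_replayed: int   # newest log generation replayed into the passive copy

    def copy_queue_length(self) -> int:
        # Logs generated on the active node that have not reached the passive node.
        return max(0, self.last_generated - self.last_copied)

    def replay_queue_length(self) -> int:
        # Logs already on the passive node that have not yet been replayed.
        return max(0, self.last_copied - self.last_replayed)
```

A growing copy queue with a small replay queue points at log shipping (network, Replication service, or log volume), while the reverse points at replay on the passive node.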

SGReplayQueueLength 

Failure 

The replay queue length is above the warning or failure thresholds. There are many items in the replay queue waiting to be applied on passive node. 

Checks to see if any storage group has a replication replay queue length greater than best practice thresholds. Currently, these thresholds are:

Warning   Queue length is 30–59 log files.

Failure   Queue length is 60 or more log files.
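The thresholds above translate directly into a status check. The Python sketch below maps a replay queue length to Warning or Failure using those values; the "OK" label for lengths below 30 is an assumption, since the table does not name the healthy state.

```python
WARNING_THRESHOLD = 30  # 30-59 log files -> Warning
FAILURE_THRESHOLD = 60  # 60 or more log files -> Failure


def classify_replay_queue(length: int) -> str:
    """Map a replay queue length (in log files) to the status levels above."""
    if length >= FAILURE_THRESHOLD:
        return "Failure"
    if length >= WARNING_THRESHOLD:
        return "Warning"
    return "OK"  # assumed label for the healthy state
```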

This can also indicate that logs are queuing on the active node waiting to be copied to the passive node; review the SGCopyQueueLength item for more details.

Restart the Microsoft Exchange Replication service on the cluster nodes and check whether the queue returns to normal.

Review Event Viewer for any related events on the cluster nodes.

If the queue does not go down, suspend replication and perform a manual reseed of the database so that replication resumes in a healthy state.

 

 

 
