Hints and tips on using MSCS

 


Verifying that MSCS is working

The task descriptions starting with Creating a queue manager for use with MSCS assume that you have a running MSCS cluster within which you can create, migrate, and destroy resources. If you want to make sure that you have such a cluster:

  1. Using the MSCS Cluster Administrator, create a group.

  2. Within that group, create an instance of a generic application resource, specifying the system clock program (path name C:\winnt\system32\clock.exe and working directory C:\).

  3. Make sure that you can bring the resource online, that you can move the group that contains it to the other node, and that you can take the resource offline.
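On clusters that include the cluster.exe command-line tool, the same verification can be scripted instead of using the Cluster Administrator GUI. This is a sketch only; the group name "MQ Verify", resource name "Clock Test", and node name NODEB are example values for your own cluster.

```shell
REM Create a test group and a Generic Application resource that runs the
REM system clock (group, resource, and node names here are examples).
cluster group "MQ Verify" /create
cluster resource "Clock Test" /create /group:"MQ Verify" /type:"Generic Application"
cluster resource "Clock Test" /priv CommandLine="c:\winnt\system32\clock.exe" CurrentDirectory="c:\"

REM Bring the resource online, move its group to the other node,
REM then take the resource offline again.
cluster resource "Clock Test" /online
cluster group "MQ Verify" /moveto:NODEB
cluster resource "Clock Test" /offline
```

If all three operations succeed, the cluster can create, migrate, and destroy resources, which is what the task descriptions assume.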

 

Using the IBM MQSeries Service

Use the IBM MQSeries Service to monitor and control queue managers, running it on both machines in the cluster. The service has its own administration interface. See MSCS security for information about user account options.

Once your queue manager is online in MSCS, stopping the IBM MQSeries Service for any reason automatically stops the queue manager as well, and MSCS interprets this as a failure condition.

 

Custom services

WebSphere MQ allows you to define custom services for a queue manager that the IBM MQSeries Service starts and stops when it starts and stops that queue manager. If you place such a queue manager under MSCS control, those custom services can be started and stopped automatically during a failover. However, the registry keys that define such custom services are not stored in the same part of the registry as those for the queue manager, so you need to add them using the Advanced properties button on the Parameters property page of the MSCS resource for the queue manager. This lets you specify which of the service keys to checkpoint as part of running the queue manager under MSCS control.

Do not include keys that might have an adverse effect on other services that are running in the cluster, especially on the other node. This is particularly important when running in Active/Active mode.

WebSphere MQ also allows you to define custom services that are not attached to any queue manager. These cannot be handled by the WebSphere MQ MSCS resource type, because the unit of failover is a queue manager. Create your own resource type in MSCS to handle any such custom services.

 

Manual startup

For a queue manager managed by MSCS, you must set the startup attribute to manual. This ensures that the WebSphere MQ MSCS support can restart the IBM MQSeries Service without immediately starting the queue manager.

The WebSphere MQ MSCS support needs to be able to restart the service so that it can perform monitoring and control, but must itself remain in control of which queue managers are running, and on which machines. See Moving a queue manager to MSCS storage for more information.
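The startup attribute can be switched from the WebSphere MQ Services user interface or from the command line with amqmdain. A sketch, where the queue manager name QM1 is an example:

```shell
REM Set the queue manager's startup attribute to manual, so that the
REM IBM MQSeries Service can be restarted without immediately starting
REM the queue manager (required for a queue manager managed by MSCS).
amqmdain manual QM1

REM For comparison: revert a queue manager that is NOT under MSCS
REM control to automatic startup.
amqmdain auto QM1
```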

 

MSCS and queue managers

This section describes some things to consider about your queue managers when using MSCS.

 

Creating a matching queue manager on the other node

For clustering to work with WebSphere MQ, you need an identical queue manager on node B for each one on node A. However, you do not need to create the second one explicitly. You can create or prepare a queue manager on one node and move it to the other node as described in Moving a queue manager to MSCS storage; it is then fully duplicated on that node.
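The move itself is performed with the hamvmqm utility, which relocates the queue manager's data and log files to the shared disk. A sketch, in which the queue manager name QM1, the shared drive letter E:, and the directory names are examples:

```shell
REM Move queue manager QM1's queue files (/dd) and log files (/ld) to
REM shared MSCS storage. Run this with the queue manager stopped, on the
REM node that currently owns the shared disk; paths are examples.
hamvmqm /m QM1 /dd "e:\WebSphere MQ" /ld "e:\WebSphere MQ\log"
```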

 

Default queue managers

Do not use a default queue manager under MSCS control. A queue manager does not have a property that makes it the default; WebSphere MQ keeps its own separate record. If you move a queue manager set to be the default to the other computer on failover, it does not become the default there. Make all your applications refer to specific queue managers by name.

 

Deleting a queue manager

Once a queue manager has been moved between nodes, its details exist in the registry on both computers. When you want to delete it, do so as normal on one computer, and then run the utility described in WebSphere MQ MSCS support utility programs to clean up the registry on the other computer.
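As a sketch of that sequence, assuming a queue manager named QM1 whose MSCS resource has already been taken offline and destroyed:

```shell
REM On the node that currently owns the queue manager: delete it as normal.
dltmqm QM1

REM On the other node: remove the leftover registry entries with the
REM MSCS support utility.
hadltmqm /m QM1
```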

 

Support for existing queue managers

You can put an existing queue manager under MSCS control, provided that you can put your queue manager log files and queue files on a disk that is on the shared SCSI bus between the two machines (see Figure 19). You need to take the queue manager offline briefly while the MSCS Resource is created.

If you want to create a new queue manager, create it independently of MSCS, test it, and then put it under MSCS control.

 

Telling MSCS which queue managers to manage

You choose which queue managers are placed under MSCS control by using the MSCS Cluster Administrator to create a resource instance for each such queue manager. This process presents you with a list of resources from which to select the queue manager that you want that instance to manage.

 

Queue manager log files

When you move a queue manager to MSCS storage, you move its log and data files to a shared disk (for an example see Moving a queue manager to MSCS storage).

Always make backups on separate media before you do this. A queue manager's log contains references to fully qualified paths. When you migrate the queue manager from a local drive to a shared disk, the paths used by the queue manager change and subsequent log entries refer to the new paths. The older (pre-migration) log entries refer to paths that no longer exist, so you cannot replay the log from a point before the migration.

Before you migrate, shut the queue manager cleanly and take a full backup of the queue files and log files. Should the queue manager resources be damaged by media loss in the future, you can restore the files from the backup. Because you shut the queue manager down cleanly, you do not need to replay log records with sequence numbers earlier than the migration. All post-migration log records have valid paths.
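A minimal sketch of that pre-migration backup, assuming a queue manager named QM1, a default installation path, and a backup drive F: (all example values):

```shell
REM Stop the queue manager cleanly; -w waits for a quiesced shutdown.
endmqm -w QM1

REM Copy the queue files and log files to separate media. The source
REM installation path and the F:\backup target are examples.
xcopy "c:\Program Files\IBM\WebSphere MQ\qmgrs\QM1" "f:\backup\QM1\qmgrs" /E /I
xcopy "c:\Program Files\IBM\WebSphere MQ\log\QM1" "f:\backup\QM1\log" /E /I
```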

 

Multiple queue managers

WebSphere MQ MSCS support allows you to run multiple queue managers on each machine and to place individual queue managers under MSCS control.

 

Always use MSCS to manage clusters

Do not try to perform start and stop operations directly on any clustered queue manager using either the WebSphere MQ Explorer or WebSphere MQ Services user interfaces. Instead, use the MSCS Cluster Administrator to request that MSCS bring the queue manager online or take it offline. This is partly to prevent possible confusion caused by MSCS reporting that the queue manager is offline, when in fact you have started it outside the control of MSCS. More seriously, stopping a queue manager without using MSCS is detected by MSCS as a failure, initiating failover to the other node.

 

Working in Active/Active mode

Both computers in the MSCS cluster can run queue managers in Active/Active mode. You do not need to have a completely idle machine acting as standby (but you can, if you want, in Active/Passive Mode). If you plan to use both machines to run workload, provide each with sufficient capacity (processor, memory, secondary storage) to run the entire cluster workload at a satisfactory level of performance.

Note:
If you are using MSCS together with Microsoft Transaction Server (MTS), you cannot use Active/Active mode. This is because, to use WebSphere MQ with MSCS and MTS:

  • Application components that use WebSphere MQ's MTS support must run on the same computer as the Distributed Transaction Coordinator (DTC), a part of MTS.

  • The queue manager must also run on the same computer.

  • The DTC must be configured as an MSCS resource, and can therefore run on only one of the computers in the cluster at any time.

 

PostOnlineCommand and PreOfflineCommand

Specify these commands in the Parameters to a resource of type IBM WebSphere MQ MSCS. You can use them to integrate WebSphere MQ MSCS support with other systems or procedures. For example, you could specify the name of a program that sends a mail message, activates a pager, or generates some other form of alert to be captured by another monitoring system.

PostOnlineCommand is invoked when the resource changes from offline to online; PreOfflineCommand is invoked for a change from online to offline. Both commands run under the user account used to run the MSCS Cluster Service, and both are invoked asynchronously: WebSphere MQ MSCS support does not wait for them to complete before continuing. This eliminates any risk that they might block or delay further cluster operations.

You can also use these commands to issue WebSphere MQ commands, for example to restart Requester channels. However, because they are edge-triggered, the commands are not suited to performing long-running functions, and cannot make assumptions about the current state of the queue manager. For example, the commands cannot assume that the queue manager is online; it is quite possible that, immediately after the queue manager was brought online, an administrator issued an offline command.
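As an illustration of a suitable edge-triggered command, a hypothetical batch file (online_alert.cmd, an invented name) could append a timestamped record of the transition; you would then specify its full path as the PostOnlineCommand parameter. A real script might instead send a mail message or activate a pager.

```shell
REM online_alert.cmd (hypothetical) - specified as PostOnlineCommand.
REM Records that the queue manager resource came online on this node;
REM the log file path is an example. Note: it must not assume the queue
REM manager is still online by the time it runs.
echo %DATE% %TIME% queue manager resource online on %COMPUTERNAME% >> c:\mqha\failover.log
```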

If you want to run programs that depend on the state of the queue manager, consider creating instances of the MSCS Generic Application resource type, placing them in the same MSCS group as the queue manager resource, and making them dependent on the queue manager resource.

 

Using preferred nodes

It can be useful when using Active/Active mode to configure a preferred node for each queue manager. However, in general it is better not to set a preferred node but to rely on manual failback. Unlike some other relatively stateless resources, a queue manager can take a while to fail over (or back) from one node to the other. To avoid unnecessary outages, test the recovered node before failing a queue manager back to it. This precludes use of the immediate failback setting. You can, however, configure failback to occur between certain times of day.

Probably the safest route is to move the queue manager back manually to the desired node, when you are certain that the node is fully recovered. This precludes use of the preferred node option.

 

Performance benchmarking

How long does it take to fail a queue manager over from one machine to the other? This depends heavily on the amount of workload on the queue manager and on the mix of traffic: how much of it is persistent, how much is within syncpoint, and how much was committed before the failure. In our tests, failover and failback each took about a minute on a very lightly loaded queue manager; actual times vary considerably with load.

 

WebSphere is a trademark of the IBM Corporation in the United States, other countries, or both.

 

IBM is a trademark of the IBM Corporation in the United States, other countries, or both.