The high availability deployment manager

Operating Systems: AIX, HP-UX, Linux, Solaris, Windows

The high availability (HA) deployment manager function is configured using a shared file system. When this configuration option is chosen, multiple deployment managers are configured. The benefit of the HA deployment manager function is that it eliminates the deployment manager as a single point of failure for cell administration. This is important in environments that have significant reliance on automated operations, including application deployment and server monitoring.
The deployment managers exist as peers. One is considered active, also known as primary, and hosts the administrative function of the cell, while the others are backups in standby mode. If the active deployment manager fails, a standby takes over and is designated the new active deployment manager. WebSphere® Extended Deployment provides a command-line utility that clones the original cell deployment manager into additional deployment managers. Each deployment manager is installed and configured to run on a different physical or logical computer. The deployment managers need not be hosted on homogeneous operating platforms, although like platforms are recommended. All deployment managers share the same instance of the master configuration repository and workspace area, both of which must be located on a shared file system.
The file system must support fast lock recovery. The IBM® General Parallel File System™ (GPFS™) is recommended, and Network File System (NFS) Version 4 is also an option.
Normal operation includes starting at least two deployment managers. A highly available deployment manager component runs in each deployment manager and controls which deployment manager is elected as the active one. Every other deployment manager in the configuration is in standby mode. The WebSphere Extended Deployment on demand router (ODR) is configured with the communication endpoints for the administrative console, the wsadmin tool, and scripting. The ODR recognizes which deployment manager instance is active and routes all administrative communication to that instance. The HA deployment manager function supports only the JMX SOAP connector. The JMX RMI connector is not supported in this configuration.
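Because administrative clients reach the active deployment manager through the ODR and only the SOAP connector is supported, a wsadmin invocation in this configuration would point at the ODR endpoint with the SOAP connection type. The following Python sketch only assembles such a command line. The -conntype, -host, and -port options are standard wsadmin options. The helper name build_wsadmin_command, the host name odr.example.com, and the port number are hypothetical placeholders, not values taken from this topic.

```python
def build_wsadmin_command(odr_host, odr_port, conntype="SOAP", script="wsadmin.sh"):
    """Return an argument list that points wsadmin at the ODR endpoint.

    Hypothetical helper: host and port must match the administrative
    endpoint that is actually configured on the ODR.
    """
    # The HA deployment manager function supports only the JMX SOAP connector.
    if conntype.upper() != "SOAP":
        raise ValueError(
            "HA deployment manager supports only the SOAP connector, got %r" % conntype
        )
    port = int(odr_port)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %r" % odr_port)
    return [script, "-conntype", "SOAP", "-host", odr_host, "-port", str(port)]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(" ".join(build_wsadmin_command("odr.example.com", 8879)))
```

Rejecting any connector other than SOAP at the point where the command is built mirrors the restriction stated above and surfaces a misconfiguration before a connection attempt is made.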
The deployment managers are initially configured into the same core group. Configuring the deployment managers in the same core group is important so that the routing information that is exposed to the ODR is consistent across all the deployment managers. If the deployment managers are placed into separate core groups, the core groups must be connected with a core group bridge.
A typical HA deployment manager configuration consists of two deployment managers that are located on separate workstations. The deployment managers share a master repository that is located on a SAN FS. All administrative operations are performed through the elected active deployment manager. The standby deployment manager is fully initialized and ready to do work but cannot be used for administration, because the administrative function does not currently support multiple concurrent server processes writing to the same configuration. The standby therefore rejects any login and JMX requests.
If the active deployment manager is stopped or fails, the highly available deployment manager component recognizes the loss and dynamically switches the standby into active mode so that it can take over for the lost deployment manager. Because the active and standby deployment managers share workspaces, no work is lost when a takeover occurs. The ODR automatically recognizes the election of the new active deployment manager and reroutes administrative requests to it. This configuration is depicted in the diagram for this topic.
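The active/standby behavior described above can be illustrated with a small Python model. It is a sketch under stated assumptions, not the product's implementation: the class names DeploymentManager and Cell are hypothetical, the Cell object stands in for both the HA deployment manager component and the ODR, and the election rule (promote the first running deployment manager in the list) is an assumption, because this topic does not describe how the election is decided.

```python
class DeploymentManager:
    """Toy deployment manager that is either active or in standby mode."""

    def __init__(self, name):
        self.name = name
        self.running = True
        self.active = False

    def handle(self, request):
        if not self.running:
            raise ConnectionError(f"{self.name} is not running")
        # A standby rejects login and JMX requests.
        if not self.active:
            raise PermissionError(f"{self.name} is in standby mode")
        return f"{self.name} handled {request}"


class Cell:
    """Hypothetical stand-in for the HA component plus the ODR."""

    def __init__(self, managers):
        self.managers = list(managers)
        self.elect()

    def active_manager(self):
        for manager in self.managers:
            if manager.running and manager.active:
                return manager
        return None

    def elect(self):
        # Keep the current active if there is one; otherwise promote the
        # first running standby (assumed rule, for illustration only).
        current = self.active_manager()
        if current is not None:
            return current
        for manager in self.managers:
            if manager.running:
                manager.active = True
                return manager
        return None

    def fail(self, manager):
        # Simulate a stop or failure, then let a standby take over.
        manager.running = False
        manager.active = False
        self.elect()

    def route(self, request):
        # Administrative requests always go to the elected active instance.
        active = self.active_manager()
        if active is None:
            raise RuntimeError("no deployment manager is available")
        return active.handle(request)


if __name__ == "__main__":
    dmgr1, dmgr2 = DeploymentManager("dmgr1"), DeploymentManager("dmgr2")
    cell = Cell([dmgr1, dmgr2])
    print(cell.route("login"))
    cell.fail(dmgr1)
    print(cell.route("login"))
```

In this model, a client that always calls route never needs to know which deployment manager is active, which is the role the ODR plays for administrative traffic.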
Although the HA deployment manager component can detect deployment manager failure and initiate takeover, there are edge conditions in which each deployment manager could temporarily believe that it is the active deployment manager. To prevent this situation, the active deployment manager holds a file lock in the shared file system. As a result, takeover by the standby takes a brief period of time, approximately equal to the time that the shared file system needs to detect the loss of the active deployment manager and release the lock. SAN FS and NFS both use a lock lease model and have configurable lock release times for failed lock holders. For SAN FS, this time can be configured as low as 10 seconds.
Note: The alternative for deployment manager HA on z/OS® is based on starting the deployment manager on a different logical partition (LPAR). This is documented in IBM Techdoc WP100415: Starting Deployment Manager on another MVS™ image.
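The effect of a lock lease on takeover time can be illustrated with a toy model. The following Python sketch is not how SAN FS or NFS implements locking. The LeaseLock class is hypothetical, and time is passed in explicitly so that the behavior is deterministic. It only shows why a standby cannot take the lock until the lease of a failed holder has lapsed.

```python
class LeaseLock:
    """Toy lease-based lock (hypothetical, for illustration only)."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, owner, now):
        """Acquire or renew the lock at time `now`; return True on success."""
        lease_lapsed = now >= self.expires_at
        if self.holder is None or self.holder == owner or lease_lapsed:
            # The lock is free, is being renewed by its holder, or the
            # previous holder stopped renewing and its lease ran out.
            self.holder = owner
            self.expires_at = now + self.lease_seconds
            return True
        # Another owner still holds a valid lease.
        return False


if __name__ == "__main__":
    lock = LeaseLock(lease_seconds=10)
    lock.try_acquire("dmgr1", now=0)         # active takes the lock
    lock.try_acquire("dmgr1", now=5)         # last renewal before it fails
    print(lock.try_acquire("dmgr2", now=8))  # standby is refused
    print(lock.try_acquire("dmgr2", now=15)) # lease lapsed, standby takes over
```

With a 10-second lease and a last renewal at time 5, the standby cannot acquire the lock before time 15, so the takeover delay is bounded by the configured lease time.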

Related tasks

Configure a high availability deployment manager environment
Configure communication between core groups that are in the same cell
Configure WebSphere Virtual Enterprise for cross-cell communication

Related information

IBM General Parallel File System (GPFS) Information Center
IBM Techdoc WP100415: Starting Deployment Manager on another MVS image