Peer recovery of transactions

 

 

Peer recovery enables any server in a cluster to recover the transactional work for any other server in the same cluster.

As a vital part of providing recovery for transactions, the transaction service logs information about active transactional work, such that the information is preserved across a server crash. This means that any transactional work in progress at the time of a server failure can be resolved when the server is restarted.

The standard recovery process performed when an application server restarts is for the server to retrieve and process the logged transaction information, to recover transactional work and complete in-doubt transactions. Completion of the transactional work (and hence the release of any database locks held by the transactions) is delayed until the server has successfully restarted and processed its transaction logs. If the server is slow to recover or requires manual intervention, the transactional work cannot be completed and access to associated databases is disrupted.
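The restart-time processing described above can be pictured with a small model. The following is a minimal sketch only: the record layout, the State names, and the replay and in_doubt functions are hypothetical and do not reflect the actual transaction log format of the product. It assumes that each log record holds a transaction identifier and its most recent state, and that a transaction is in doubt when it was prepared but no outcome was logged.

```python
from enum import Enum


class State(Enum):
    """Hypothetical transaction states written to the log."""
    ACTIVE = "active"
    PREPARED = "prepared"        # in doubt until an outcome is logged
    COMMITTED = "committed"
    ROLLED_BACK = "rolled_back"


def replay(log_records):
    """Return the last logged state of each transaction, in log order."""
    last_state = {}
    for tx_id, state in log_records:
        last_state[tx_id] = state
    return last_state


def in_doubt(log_records):
    """Return the transactions that were prepared but have no logged outcome."""
    return sorted(
        tx_id
        for tx_id, state in replay(log_records).items()
        if state is State.PREPARED
    )


# Log contents as they might look after a crash (made-up data).
records = [
    ("tx1", State.PREPARED),
    ("tx1", State.COMMITTED),
    ("tx2", State.PREPARED),
    ("tx3", State.ACTIVE),
]
print(in_doubt(records))  # prints ['tx2']
```

In this model, tx2 is the work that holds resources until recovery runs, which is why a slow restart delays the release of database locks.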

To minimize such disruption to transactional work and the associated databases, WebSphere Application Server provides a high availability strategy known as transaction peer recovery.

Peer recovery is provided within a server cluster. Each server in the cluster has a recovery process that can run alongside normal server activity, and enables a server in the cluster to recover the transactional work for another server in the same cluster. There is no need to start a new application server specifically to recover the failed server.

The peer recovery process is the logical equivalent to restarting the failed server, but does not constitute a complete restart of the failed server within the peer server. It merely provides an opportunity for outstanding work to be completed. It is not possible for the peer recovery process to start new work beyond recovery processing. In other words, no "forward processing" is possible for the failed server. Both transactions and the compensation service fail over together to the same peer server.

Peer recovery moves the high availability requirements away from individual servers and onto the server cluster. If a server fails, all that is required is to tidy up work that was active on the failed server and redirect requests to an alternate server. The workload management (WLM) system of the cluster dispatches new work onto the remaining servers; from the user's perspective, the only difference is a potential drop in overall system throughput.

 

Common configuration for peer recovery

The transaction service requires a common configuration in order to be able to perform peer recovery between servers. This means that peer recovery processing can only take place between members of the same server cluster. Although a cluster can contain both v5 and v6 servers, peer recovery can only be performed between servers in the cluster that are at v6 or later.

Control over which server is nominated to perform recovery processing for a failed peer is handled by the selected Clustered TM Policy of the cluster's core group. The default "1 of N with preferred server" policy nominates a running member of the cluster to perform peer recovery processing and passes recovery control back to the failed server when it restarts.
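The behavior of the default policy can be sketched as a simple ownership rule. This is an illustration only, not the core group implementation: the recovery_owner function and the server names are hypothetical, and the choice of the first running peer in sorted order is an assumption made to keep the example deterministic.

```python
def recovery_owner(log_owner, running_members):
    """Decide which cluster member processes log_owner's recovery logs.

    The preferred server is the log owner itself. While it is down,
    one running peer is nominated; when it restarts, control returns to it.
    """
    if log_owner in running_members:
        return log_owner                 # preferred server is available
    if not running_members:
        return None                      # nobody can perform recovery yet
    return sorted(running_members)[0]    # assumption: any running peer will do


# server1 fails: a running peer is nominated.
print(recovery_owner("server1", {"server2", "server3"}))  # prints server2

# server1 restarts: recovery control passes back to it.
print(recovery_owner("server1", {"server1", "server2", "server3"}))  # prints server1
```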

By default, peer recovery is disabled until the Enable high availability for persistent services check box in the cluster configuration is selected. When this option has been selected, cluster members must be restarted before they engage in peer recovery processing for other cluster members. Similarly, if this option is disabled, cluster members must be restarted to prevent them from performing peer recovery.

 

Location of recovery log files

The storage mechanism used to host recovery log files, and the access path to that mechanism (for example, a LAN), must support the file-based force operation that the recovery log service uses to force data to disk. For example, IBM NAS and shared SCSI drives are suitable, but a simple network share is not. After the force operation is complete, the information must be persistently stored on physical disk media. For information about IBM NAS, see http://www.ibm.com/servers/storage/nas/index.html.
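To make the force requirement concrete, the following minimal sketch shows what a force operation looks like from an application's point of view, using the standard Python os.fsync call. The append_forced function, the file name, and the record text are hypothetical; the recovery log service uses its own implementation. Whether the data actually reaches physical media after the call returns depends on the storage and the access path, which is the point of the requirement above.

```python
import os
import tempfile


def append_forced(path, record):
    """Append a record and return only after a force to disk was requested."""
    with open(path, "ab") as log:
        log.write(record)
        log.flush()               # push Python's buffer to the operating system
        os.fsync(log.fileno())    # ask the OS to push its cache to the device


# Demonstration against a temporary directory (hypothetical log name).
log_path = os.path.join(tempfile.mkdtemp(), "tranlog.demo")
append_forced(log_path, b"tx2 PREPARED\n")

with open(log_path, "rb") as log:
    print(log.read())  # prints b'tx2 PREPARED\n'
```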

Interactions between the HA framework and the recovery log service must prevent concurrent access to a single physical recovery log.
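The exclusive-access requirement can be illustrated with a generic lock-file pattern. This sketch is not how the HA framework and the recovery log service coordinate; it only shows the property they must provide, namely that a second claimant is refused while a first one holds the log. The try_claim and release functions and the lock file name are hypothetical, and atomic exclusive creation is assumed to be reliable on the file system in use.

```python
import os
import tempfile


def try_claim(lock_path):
    """Return a file descriptor if this caller claimed the log, else None."""
    try:
        # O_CREAT | O_EXCL fails if the lock file already exists.
        return os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None


def release(lock_path, fd):
    """Give up the claim so that another server can take over."""
    os.close(fd)
    os.remove(lock_path)


lock_path = os.path.join(tempfile.mkdtemp(), "tranlog.lock")
first = try_claim(lock_path)     # the original owner claims the log
second = try_claim(lock_path)    # a peer is refused while the claim is held
print(first is not None, second is None)  # prints True True
release(lock_path, first)
```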

 

Recovery log directory administration and scripting

You can configure the location of the transaction log directory using either the WebSphere administrative console or commands. For peer recovery, the configuration is stored as part of the recovery log configuration in the serverindex.xml node-level configuration file.

To ease migration of the transaction log configuration from previous versions of WebSphere Application Server, special logic has been added to the administrative console. This is to help migration of the transaction log directory configuration from the original server.xml server-level configuration file to the serverindex.xml node-level configuration file.

  • Changes to recovery log directory settings are always stored within the new serverindex.xml file.

  • Scripted modifications that configure the original recovery log settings, or migration of v5 application servers to v6, cause the original transaction log directory configuration to be updated. The administrative console detects this condition and prompts the user to save the configuration when they view the transaction service panel. This save operation saves the changed configuration to the serverindex.xml file, and resets the older fields to null.

  • New scripting should target the serverindex.xml configuration directly. Existing scripting that targets the server.xml configuration should be changed to target the serverindex.xml at the earliest opportunity.


 

See Also


Core groups
High availability groups

 

Related Tasks


Configure transaction properties for an application server