Directory Server, Version 6.1


Appendix Q. High Availability Scenarios

The IBM Tivoli Directory Server (TDS) is widely deployed in high availability (HA) configurations. In a typical HA configuration, a load balancer is configured in front of several peer masters. The load balancing function, also called virtual IP support or layer 4 routing, is usually implemented in network switches. Many switches from Cisco, F5, Nortel, and other vendors have this capability.

For HA configurations, a load balancer is configured only for failover. If the primary master goes down, all traffic to that master is redirected to one of the peer masters. Failback to the original peer is usually not automatic. This is generally appropriate, because failback is desired only after the replication queue to the newly restarted peer has emptied. The load balancer periodically sends health check messages to the LDAP servers. For most load balancers, the default health check is very basic, such as a TCP SYN packet: if the target server acknowledges the SYN, it is regarded as up. However, a SYN check is not a very accurate measure of availability, because the TCP stack acknowledges the connection even when the server is in a hung state.
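A more meaningful health check performs an actual LDAP operation instead of relying on the TCP handshake. The following Java sketch, written against the standard JNDI LDAP provider, binds anonymously and reads the root DSE with short timeouts. The host, port, and timeout values are illustrative assumptions; a load balancer would invoke an equivalent probe through whatever external health check mechanism it provides.

    // HealthCheck.java: a minimal LDAP-level health probe (sketch).
    import javax.naming.Context;
    import javax.naming.directory.DirContext;
    import javax.naming.directory.InitialDirContext;
    import javax.naming.directory.SearchControls;
    import java.util.Hashtable;

    public class HealthCheck {
        public static boolean isAvailable(String host, int port) {
            Hashtable<String, String> env = new Hashtable<>();
            env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
            env.put(Context.PROVIDER_URL, "ldap://" + host + ":" + port);
            // Fail fast instead of hanging on an unresponsive server (values are assumptions).
            env.put("com.sun.jndi.ldap.connect.timeout", "2000");
            env.put("com.sun.jndi.ldap.read.timeout", "2000");
            try {
                DirContext ctx = new InitialDirContext(env);   // anonymous bind
                SearchControls sc = new SearchControls();
                sc.setSearchScope(SearchControls.OBJECT_SCOPE);
                ctx.search("", "(objectclass=*)", sc);          // read the root DSE
                ctx.close();
                return true;                                    // server answered an LDAP operation
            } catch (Exception e) {
                return false;                                   // connect failure, timeout, or LDAP error
            }
        }
    }

Unlike a SYN check, this probe fails when the server process is hung, because it requires an actual LDAP response rather than a TCP acknowledgment.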

In larger configurations, both load balancing and failover may be desired. Load balancing of write traffic is typically unwise, because it raises the possibility of update conflicts. One common approach is therefore to have applications that update the directory use a virtual IP address that is configured for failover only, and to have read-only applications use a different virtual IP address that is configured for load balancing. For write access, the load balancer fails over between peer masters. For read access, failover and load balancing may occur between read-only replicas or between a combination of peer masters and read-only replicas.
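On the application side, this split amounts to little more than using two different provider URLs. The following Java sketch assumes two hypothetical virtual hostnames, ldap-write.example.com (the failover VIP) and ldap-read.example.com (the load-balanced VIP); the names and port are placeholders, not values defined by TDS.

    // DirectoryConnections.java: sketch of routing writes and reads to different VIPs.
    import javax.naming.Context;
    import javax.naming.directory.DirContext;
    import javax.naming.directory.InitialDirContext;
    import java.util.Hashtable;

    public class DirectoryConnections {
        private static DirContext connect(String url) throws Exception {
            Hashtable<String, String> env = new Hashtable<>();
            env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
            env.put(Context.PROVIDER_URL, url);
            return new InitialDirContext(env);
        }

        // Writes go through the VIP that is configured for failover only.
        public static DirContext writeContext() throws Exception {
            return connect("ldap://ldap-write.example.com:389");
        }

        // Reads go through the VIP that load balances across replicas.
        public static DirContext readContext() throws Exception {
            return connect("ldap://ldap-read.example.com:389");
        }
    }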

The licensed version of TDS also includes the proxy server. The proxy server can distinguish between LDAP reads and writes, so it can fail over writes and load balance reads. However, it is advisable to deploy several proxies so that the proxy itself is not a single point of failure. The proxies are typically fronted by one or more load balancers.
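The essential idea of read/write splitting can be illustrated with a small routing sketch. The Java fragment below is not the TDS proxy implementation; it only shows the kind of classification a front end might apply, sending update operations to the peer-master (failover) pool and everything else to the load-balanced read pool.

    // Illustrative routing logic only; not the TDS proxy implementation.
    enum LdapOperation { BIND, SEARCH, COMPARE, ADD, MODIFY, MODIFY_DN, DELETE }

    final class OperationRouter {
        // Update operations must go to the peer-master (failover) pool;
        // everything else can be load balanced across read-only replicas.
        static boolean isWrite(LdapOperation op) {
            switch (op) {
                case ADD:
                case MODIFY:
                case MODIFY_DN:
                case DELETE:
                    return true;
                default:
                    return false;
            }
        }
    }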

Many LDAP applications use persistent sessions. If persistent sessions are used, failover may not be fast: while new sessions are redirected to the backup server, existing sessions may take several minutes to time out, resulting in a loss of service for that period. This problem can be resolved by using the TDS 6.1 proxy, which fails over existing sessions without disruption. Some load balancers, such as the software load balancer included in IBM WebSphere Application Server Network Deployment, can be configured to send a reset (RST) packet on persistent sessions, so that clients quickly re-establish those sessions on the failover server.
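If neither the proxy nor an RST-capable load balancer is available, clients can at least bound the time they spend waiting on a dead persistent session. The sketch below uses the connect and read timeout properties of the JNDI LDAP provider; the timeout values shown are assumptions and would need tuning for a real deployment.

    // ResilientConnection.java: sketch of client-side timeouts so a hung
    // persistent session is detected and re-established quickly after a failover.
    import javax.naming.Context;
    import javax.naming.directory.DirContext;
    import javax.naming.directory.InitialDirContext;
    import java.util.Hashtable;

    public class ResilientConnection {
        public static DirContext open(String url) throws Exception {
            Hashtable<String, String> env = new Hashtable<>();
            env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
            env.put(Context.PROVIDER_URL, url);
            // Bound the time spent establishing the TCP connection ...
            env.put("com.sun.jndi.ldap.connect.timeout", "5000");
            // ... and the time spent waiting for a reply on an existing session,
            // so the client does not wait out a long TCP timeout on a dead peer.
            env.put("com.sun.jndi.ldap.read.timeout", "10000");
            return new InitialDirContext(env);
        }
    }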

There are several other characteristics of an HA configuration. If one system goes down, the remaining systems must be able to carry the load. It is also a good idea to build redundancy into the network configuration, so that if a LAN segment or switch goes down, traffic can still flow from LDAP clients to LDAP servers. It is advisable to store LDAP data on RAID arrays, so that a physical disk failure does not cause a server outage, and to use system monitoring tools to poll the availability of the servers, so that recovery procedures can be initiated if any of them go down. Some scenarios extend HA support to multiple redundant sites, so that if an entire site is lost, another site takes over.
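Such a monitoring poller can be as simple as a scheduled loop that probes each server and raises an alert when one stops responding. The sketch below assumes a probe such as the HealthCheck.isAvailable method shown earlier; the host names, port, and polling interval are illustrative, and in practice the alert hook would feed an existing monitoring or recovery system.

    // AvailabilityMonitor.java: sketch of a periodic availability poller.
    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class AvailabilityMonitor {
        public static void main(String[] args) {
            // Hypothetical server list; replace with the actual peer masters and replicas.
            List<String> servers = List.of("ldap1.example.com", "ldap2.example.com");
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            // Probe every server once a minute.
            scheduler.scheduleAtFixedRate(() -> {
                for (String host : servers) {
                    // HealthCheck.isAvailable is the LDAP-level probe sketched earlier.
                    if (!HealthCheck.isAvailable(host, 389)) {
                        // Hook for recovery procedures: alerting, restart scripts, and so on.
                        System.err.println("ALERT: " + host + " is not responding");
                    }
                }
            }, 0, 60, TimeUnit.SECONDS);
        }
    }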

Another important characteristic of an HA configuration is the ability to perform maintenance without system downtime. IBM Tivoli Directory Server supports incremental upgrade of a server topology, so that service can be applied to one server at a time without an outage of the directory service. Updates for the server that is down are queued, so that it returns to full synchronization when it is restarted. TDS also supports online backup of an existing server, using either DB2 or RAID facilities, which allows new servers to be added, or existing servers to be replaced, in the topology without downtime.


