Intelligent Management: troubleshooting health management

Check for the following problems when health management is not working, or is not working the way that you expect.


Finding the correct logs

The health controller is a distributed resource managed by the high availability (HA) manager. It exists within all node agent and deployment manager processes and is active within one of these processes. If a process fails, the controller becomes active on another node agent or deployment manager process.

To determine where the health controller is running, click...

The location and stability status of the health controller are displayed.


The performance advisor is enabled with the predefined memory leak health policy

The predefined memory leak health policy uses the performance advisor function, so the performance advisor is enabled when this policy has assigned members. To disable the performance advisor, remove this health policy or narrow the membership of the health policy. To preserve the health policy for future use, keep the memory leak policy, but remove all members. To change the members, click...

You can edit the health policy memberships by adding and removing specific members.


Health controller settings

The following list contains issues that are encountered as a result of the health controller settings:


Health policy settings

The following issues are encountered as a result of the health policy settings:


Application placement controller interactions

The following list contains issues triggered by health management and application placement controller interactions:


Sensor problems

The following list contains issues related to health management and node group membership settings:


Task management status

A Restart action task sometimes ends in the Failed or Unknown state. This happens when the server does not stop within the time period that is allocated by default, or when the task times out. Use the following cell-level property to adjust the timeout for your environment: HMM.StopServerTimeout. The value is expressed in milliseconds, and the default value is 10000 (10 seconds). Increasing the value allows health management to wait longer for the server stop notifications that it receives from the on demand configuration.

To increase the timeout for the environment, go to...

The default for this setting is 5 minutes, which is separate from the 10000-millisecond default of HMM.StopServerTimeout. The restart task starts after twice the amount specified, allowing the server to stop and start.
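
The two timeouts above use different units (milliseconds for HMM.StopServerTimeout, minutes for the restart timeout), which is easy to get wrong when you adjust them. The following is a minimal sketch of the arithmetic only. The helper names are hypothetical and the code does not call any product API; it assumes nothing beyond the defaults stated in this topic.

```python
# Hypothetical helpers for illustration only; they do not call any product API.

DEFAULT_STOP_SERVER_TIMEOUT_MS = 10000  # default stated for HMM.StopServerTimeout
DEFAULT_RESTART_TIMEOUT_MIN = 5         # default stated for the restart timeout


def seconds_to_property_value(seconds):
    """Return the value to enter for HMM.StopServerTimeout, in milliseconds."""
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return str(int(seconds * 1000))


def restart_task_window_minutes(restart_timeout_min=DEFAULT_RESTART_TIMEOUT_MIN):
    """Return twice the restart timeout, the window for the server to stop and start."""
    return 2 * restart_timeout_min


if __name__ == "__main__":
    # A 60-second stop timeout expressed as the property value.
    print(seconds_to_property_value(60))
    # The window that results from the default restart timeout.
    print(restart_task_window_minutes())
```

For example, a server that needs about a minute to stop would call for a property value of 60000 rather than 60, and the default restart timeout of 5 minutes corresponds to a 10-minute window.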