Selecting the metrics

 

For each type of monitor, Management Central offers several measurements, known as metrics, to help you pinpoint different aspects of system activity. A metric is a measurement of a particular characteristic of a system resource or the performance of a program or a system.

For a system monitor, you can select from a wide range of available metrics, such as CPU utilization, interactive response time, transaction rate, disk arm utilization, disk storage, disk IOP utilization, and more.

For a message monitor, you can specify one or more message IDs, message types, or severity levels. You can also select from a list of predefined sets of messages that are associated with a specific type of problem, such as a communications link problem, a cabling or hardware problem, or a modem problem.

For a file monitor, you can monitor files across multiple endpoint systems for a specified text string or a specified size, or you can trigger an event whenever a specified file is modified. You can select one or more files to monitor, or you can select the History log option to monitor the i5/OS™ history log (QHST).

For a job monitor, available metrics include job count, job status, job log messages, CPU utilization, logical I/O rate, disk I/O rate, communications I/O rate, transaction rate, and more.

The Metrics page in the New Monitor window allows you to view and change the metrics that you want to monitor. To access this page, click Monitors, right-click the type of monitor you want to create (for example, Job), and then click New Monitor. Fill in the required fields, and then click the Metrics tab.

Use the online help to assist you in selecting your metrics. Remember to specify threshold values so that you are notified, and to specify the actions to be taken, when a metric reaches a certain value (called the trigger value).
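The relationship between a metric and its trigger value can be sketched in a few lines of Python. This is only an illustration: the `Threshold` class, the `triggered` function, and the sample values are hypothetical and are not part of Management Central, and the sketch assumes that a threshold fires when the sampled value is greater than or equal to the trigger value.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical model of one threshold on one metric."""
    metric: str           # metric name as shown on the Metrics page
    trigger_value: float  # value at which the threshold fires


def triggered(threshold: Threshold, samples: dict[str, float]) -> bool:
    """Return True when the sampled metric has reached the trigger value."""
    value = samples.get(threshold.metric)
    # Assumption: "reached" means greater than or equal to the trigger value.
    return value is not None and value >= threshold.trigger_value


cpu = Threshold(metric="CPU Utilization (Average)", trigger_value=90.0)
print(triggered(cpu, {"CPU Utilization (Average)": 93.5}))  # True
print(triggered(cpu, {"CPU Utilization (Average)": 42.0}))  # False
```

A metric that is absent from the sample never fires the threshold in this sketch, which keeps a missing collection point from being mistaken for an event.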

 

System monitor metrics

Metrics that you can use in a system monitor include the following:

Table 1. System monitor metric definitions
Name Description
CPU Utilization (Average) The percentage of available processing unit time that is being consumed by all jobs, threads of a job, and Licensed Internal Code tasks on the system. Click any collection point on the graph to see a Details chart that shows the 20 jobs or tasks with the highest CPU utilization.
CPU Utilization (Interactive Jobs) The percentage of available processing unit time that is being consumed on the system for all jobs which include the following:

  • A 5250 workstation that includes a Twinax attached remote line and local area network (LAN) line

  • Systems Network Architecture (SNA) attached line that includes SNA display station pass-through

  • All Telnet sessions, for example, LAN, IBM Personal Communications, iSeries Access PC5250, and other SNA or Telnet emulators
Click any collection point on the graph to see a Details chart that shows the 20 interactive jobs (5250 jobs) with the highest CPU utilization.
CPU Utilization (Interactive Feature) The percentage of available interactive capability that is being used. The model number of your server (and for some models, the optional interactive feature card) determines the interactive capability of your system. It is possible to operate at greater than 100% of your available interactive capability. However, optimal system performance is achieved by maintaining an interactive workload that does not exceed the 100% level for extended periods. The recommended range is approximately 70% or less. Click any collection point on the graph to see a Details chart that shows the 20 jobs with the highest CPU utilization contributing to this workload.
CPU Utilization Basic (Average) The percentage of available processing unit time that is being consumed by all jobs on the system. This metric includes the same work as CPU Utilization (Average) but does not include active job details. No additional data is available for this metric. You save system resources by not tracking the more detailed information.
CPU Utilization (Secondary Workloads) The percentage of available processing unit time that is being consumed by secondary workloads running on your dedicated server. For example, if your system is a dedicated server for Domino, Domino work is considered the primary workload. CPU Utilization (Secondary Workloads) shows the available processing unit time that is being consumed by any work other than Domino work on your server and can include WebSphere Java and general Java servlets that run as Domino applications. No additional data is available for this metric.
CPU Utilization (Database Capability) The percentage of available database capability that is being consumed by i5/OS database functions on your system, which include file I/O, SQL, and general query functions. The model number and features of your system determine the amount of CPU available for database processing on your system. The recommended value is approximately equal to or less than CPU Utilization (Average). Click any collection point on the graph to see a Details chart that shows the 20 jobs with the highest database CPU utilization.
Interactive Response Time (Average) The average response time, in seconds, being experienced by 5250 interactive jobs on the system. Click any collection point on the graph to see a Details chart that shows the 20 jobs with the highest response time.
Interactive Response Time (Maximum) The maximum response time, in seconds, that has been experienced by any 5250 interactive job on the system during the collection interval. Click any collection point on the graph to see a Details chart that shows the 20 jobs with the highest response time.
Transaction Rate (Average) The number of transactions that are being completed per second by all active jobs on the system. Click any collection point on the graph to see a Details chart that shows the 20 jobs with the highest transaction rate.
Transaction Rate (Interactive) The number of transactions that are being completed per second on the system by active 5250 jobs, which include the following:

  • A 5250 workstation that includes a Twinax attached remote line and local area network (LAN) line

  • Systems Network Architecture (SNA) attached line that includes SNA display station pass-through

  • All Telnet sessions, for example, LAN, IBM Personal Communications, iSeries Access PC5250, and other SNA or Telnet emulators
Click any collection point on the graph to see a Details chart that shows the 20 jobs with the highest transaction rate.
Batch Logical Database I/O The average number of logical database input/output (I/O) operations being performed per second by all non-5250 batch jobs on the system. A logical I/O operation occurs when data is transferred between the system and application I/O buffers. This metric indicates how much work your batch jobs are performing during any given interval. Click any collection point on the graph to see a Details chart that shows the 20 batch jobs with the highest number of logical database I/O operations per second.
Disk Arm Utilization (Average) The average percentage of all disk arm capacity that was utilized on the system during the collection interval. This metric shows how busy the disk arms on the system are during the current interval. Click any collection point on the graph to see a Details chart that shows the utilization of each disk arm.
Disk Arm Utilization (Maximum) The maximum percentage of capacity that was utilized by any disk arm on the system during the collection interval. This metric shows how busy the disk arms on the system are during the current interval. Click any collection point on the graph to see a Details chart that shows the utilization of each disk arm.
Disk Storage (Average) The average percentage of storage that was full on all disk arms during the collection interval. This metric shows how full the disk arms on the system are during the current interval. Click any collection point on the graph to see a Details chart that shows the percentage of storage that was full on each disk arm.
Disk Storage (Maximum) The maximum percentage of storage that was full on any disk arm on the system during the collection interval. This metric shows how full the disk arms on the system are during the current interval. Click any collection point on the graph to see a Details chart that shows the percentage of storage that was full on each disk arm.
Disk IOP Utilization (Average) The average utilization of all the disk input/output processors (IOPs) during the collection interval. This metric shows how busy the disk IOPs on the system are during the current interval. Multifunction IOPs can perform both Disk and Communication I/O work and can therefore be reported under either or both categories. If they performed work in both areas, the division of utilization is unknown and is reported fully under each category. Click any collection point on the graph to see a Details chart that shows the utilization of each input/output processor (IOP).
Disk IOP Utilization (Maximum) The maximum utilization of any disk input/output processor (IOP) during the collection interval. This metric shows how busy the disk IOPs on the system are during the current interval. Multifunction IOPs can perform both Disk and Communication I/O work and can therefore be reported under either or both categories. If they performed work in both areas, the division of utilization is unknown and is reported fully under each category. Click any collection point on the graph to see a Details chart that shows the utilization of each input/output processor (IOP).
Communications IOP Utilization (Average) The average utilization of all the communications input/output processors (IOPs) during the collection interval. This metric shows how busy the communications IOPs on the system are during the current interval. Multifunction IOPs can perform both Disk and Communication I/O work and can therefore be reported under either or both categories. If they performed work in both areas, the division of utilization is unknown and is reported fully under each category. Click any collection point on the graph to see a Details chart that shows the utilization of each input/output processor (IOP).
Communications IOP Utilization (Maximum) The maximum utilization of any communications input/output processor (IOP) during the collection interval. This metric shows how busy the communications IOPs on the system are during the current interval. Multifunction IOPs can perform both Disk and Communication I/O work and can therefore be reported under either or both categories. If they performed work in both areas, the division of utilization is unknown and is reported fully under each category. Click any collection point on the graph to see a Details chart that shows the utilization of each input/output processor (IOP).
Communications Line Utilization (Average) The average amount of data that was actually sent and received for all non-LAN lines that are active during the time you collect data. Line utilization is an approximation of the actual amount of data transmitted compared with the theoretical limit of the lines based on the line speed settings in the line descriptions. Each communication line included in this monitor is of one of the following line types: Bisync, Async, IDLC, X.25, LAPD, SDLC, or PPP. This metric shows how actively the system is using its communication lines. If you have communications lines, such as fax lines, that are very busy much of the time, you may want to exclude these heavily utilized lines from the system monitor graph. Click any collection point on the graph to see a Details chart that shows the utilization of each line on the system.
Communications Line Utilization (Maximum) The maximum amount of data that was actually sent and received for all non-LAN lines that are active during the time you collect data. Line utilization is an approximation of the actual amount of data transmitted compared with the theoretical limit of the line based on its line speed setting in the line description. Each communication line included in this monitor is of one of the following line types: Bisync, Async, IDLC, X.25, LAPD, SDLC, or PPP. This metric shows how actively the system is using its communication lines. If you have communications lines, such as fax lines, that are very busy much of the time, you may want to exclude these heavily utilized lines from the system monitor graph. Click any collection point on the graph to see a Details chart that shows the utilization of each line on the system.
LAN Utilization (Average) The average amount of data that was actually sent and received on all local area network (LAN) lines in the system, compared with the theoretical limit of the lines based on the line speed settings in the line descriptions. Each LAN line included in this monitor is of one of the following line types: token-ring or Ethernet. This metric shows how actively the system is using its LAN lines. Click any collection point on the graph to see a Details chart that shows the utilization of each line on the system.
LAN Utilization (Maximum) The maximum amount of data that was actually sent and received on any local area network (LAN) line in the system, compared with the theoretical limit of the line based on its line speed setting in the line description. Each LAN line included in this monitor is of one of the following line types: token-ring or Ethernet. This metric shows how actively the system is using its LAN lines. Click any collection point on the graph to see a Details chart that shows the utilization of each line on the system.
Machine Pool Faults The average number of faults per second that occur in the machine pool of the system during the time you collect the data. Only Licensed Internal Code runs in the machine pool. This metric shows the level of faulting activity in the system’s machine pool. Click any collection point on the graph to see a Details chart that shows the number of faults per second in the system’s machine pool.
User Pool Faults (Average) The average number of faults per second occurring in all of the user pools on the system during the time you collect the data. This metric shows how much faulting activity is occurring in the system’s user pools. Click any collection point on the graph to see a Details chart that shows the number of faults per second in each auxiliary storage pool.
User Pool Faults (Maximum) The maximum number of faults per second occurring in all of the user pools on the system during the time you collect the data. This metric shows how much faulting activity is occurring in the system’s user pools. Click any collection point on the graph to see a Details chart that shows the number of faults per second in each auxiliary storage pool.
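
The Communications Line Utilization and LAN Utilization entries in Table 1 describe utilization as the data actually transferred compared with the theoretical limit implied by the line speed setting. The Python sketch below illustrates that comparison and the difference between the Average and Maximum variants. The function name, the line names, and the byte counts are hypothetical, and averaging the per-line percentages is an assumption; the calculation that Management Central performs internally may differ.

```python
def line_utilization(bytes_transferred: int, line_speed_bps: int, interval_s: float) -> float:
    """Approximate utilization of one line, as a percentage of its theoretical limit."""
    # Theoretical limit: the line speed setting multiplied by the interval length.
    capacity_bits = line_speed_bps * interval_s
    return 100.0 * (bytes_transferred * 8) / capacity_bits


# Hypothetical data sent and received on two lines during a 60-second interval.
per_line = {
    "LINE_A": line_utilization(150_000, line_speed_bps=64_000, interval_s=60),
    "LINE_B": line_utilization(36_000, line_speed_bps=9_600, interval_s=60),
}

average = sum(per_line.values()) / len(per_line)  # assumed form of the (Average) metric
maximum = max(per_line.values())                  # busiest single line, as in (Maximum)

print(per_line)  # {'LINE_A': 31.25, 'LINE_B': 50.0}
print(average)   # 40.625
print(maximum)   # 50.0
```

A slow line that carries little data can still show the highest utilization, which is why the Maximum metric can point to a different line than raw byte counts would suggest.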

 

Job monitor metrics

You can include any single metric, a group of metrics, or all the metrics from the list in your monitor. Metrics you can use in a job monitor include the following:

Table 2. Job monitor metric definitions
Name Description
Job Count Monitor for a specific number of jobs matching the job selection.
Job Status Monitor for jobs in any selected status, such as Completed, Disconnected, Ending, Held while running, or Initial thread held. Remember: Metrics for job status can affect performance. Limit the number of jobs that you are monitoring to 40.
Job Log Messages Monitor for messages based on any combination of Message ID, Type, and Minimum severity.
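
The Job Log Messages metric selects messages by any combination of Message ID, Type, and Minimum severity. As a rough illustration, the Python sketch below treats each criterion as optional and requires every criterion that is set to match. The class names, the field names, and the sample message are hypothetical, and combining the criteria with a logical AND is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobLogMessage:
    """Hypothetical shape of one job log message."""
    message_id: str
    msg_type: str
    severity: int


@dataclass
class MessageFilter:
    """Selection criteria; a criterion left as None is not checked."""
    message_id: Optional[str] = None
    msg_type: Optional[str] = None
    min_severity: Optional[int] = None

    def matches(self, msg: JobLogMessage) -> bool:
        if self.message_id is not None and msg.message_id != self.message_id:
            return False
        if self.msg_type is not None and msg.msg_type != self.msg_type:
            return False
        if self.min_severity is not None and msg.severity < self.min_severity:
            return False
        return True


# Made-up sample message for illustration only.
sample = JobLogMessage(message_id="CPF1234", msg_type="Diagnostic", severity=40)

print(MessageFilter(min_severity=30).matches(sample))                         # True
print(MessageFilter(msg_type="Inquiry", min_severity=30).matches(sample))     # False
print(MessageFilter(message_id="CPF1234", min_severity=50).matches(sample))   # False
```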

 

Job numeric values

Table 3. Job numeric values definition
Name Description
CPU Utilization The percentage of available processing unit time used by all jobs that are included by this monitor on this system.
Logical I/O Rate The number of logical I/O actions, per second, by each job that is being monitored on this system.
Disk I/O Rate The average number of I/O operations, per second, performed by each job that is being monitored on this system. The value in this column is the sum of the asynchronous and synchronous disk I/O operations.
Communications I/O Rate The number of communications I/O actions, per second, by each job that is being monitored on this system.
Transaction Rate The number of transactions per second by each job that is being monitored on this system.
Transaction Time The total transaction time for each job that is being monitored on this system.
Thread Count The number of active threads in each job that is being monitored on this system.
Page Fault Rate The average number of times, per second, that an active program in each job that is being monitored on this system refers to an address that is not in main storage.

 

Summary numeric values

Table 4. Summary numeric values definition
Name Description
CPU Utilization The percentage of available processing unit time used by all jobs monitored on this system. For multiple-processor systems, this is the average percent busy for all processors.
Logical I/O Rate The number of logical I/O actions, per second, by all jobs monitored on this system.
Disk I/O Rate The average number of I/O operations, per second, performed by all jobs monitored on this system. The value in this column is the sum of the asynchronous and synchronous disk I/O operations.
Communications I/O Rate The number of communications I/O actions, per second, by all jobs monitored on this system.
Transaction Rate The number of transactions per second by all jobs monitored on this system.
Transaction Time The total transaction time for all jobs monitored on this system.
Thread Count The number of active threads for all jobs monitored on this system.
Page Fault Rate The average number of times, per second, that active programs in all jobs monitored on this system refer to an address that is not in main storage.
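
Tables 3 and 4 define the same metrics at two levels: once for each monitored job and once for all monitored jobs together. The Python sketch below shows that relationship for Disk I/O Rate, which both tables describe as the sum of asynchronous and synchronous disk I/O operations, and for Thread Count. The `JobSample` class, the job names, and the counter values are hypothetical, and treating the summary values as simple sums of the per-job values is an assumption.

```python
from dataclasses import dataclass


@dataclass
class JobSample:
    """Hypothetical counters for one monitored job over one collection interval."""
    name: str
    sync_disk_io: int    # synchronous disk I/O operations in the interval
    async_disk_io: int   # asynchronous disk I/O operations in the interval
    thread_count: int    # active threads at collection time


def disk_io_rate(job: JobSample, interval_s: float) -> float:
    """Per-job Disk I/O Rate: synchronous plus asynchronous operations per second."""
    return (job.sync_disk_io + job.async_disk_io) / interval_s


def summarize(jobs: list[JobSample], interval_s: float) -> dict[str, float]:
    """Summary values across all monitored jobs (assumed to be simple sums)."""
    return {
        "disk_io_rate": sum(disk_io_rate(j, interval_s) for j in jobs),
        "thread_count": sum(j.thread_count for j in jobs),
    }


jobs = [
    JobSample("JOB_A", sync_disk_io=1200, async_disk_io=600, thread_count=3),
    JobSample("JOB_B", sync_disk_io=300, async_disk_io=300, thread_count=1),
]

for job in jobs:
    print(job.name, disk_io_rate(job, interval_s=60))  # JOB_A 30.0, then JOB_B 10.0
print(summarize(jobs, interval_s=60))  # {'disk_io_rate': 40.0, 'thread_count': 4}
```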

 
