HP OpenVMS Availability Manager User's Guide

Appendix A
CPU Process States

The CPU process states shown in Table A-1 are displayed in the OpenVMS CPU Process States page (Figure 3-8) and in the OpenVMS Process Information page (Figure 3-23).

Table A-1 CPU Process States
Process State Description

CEF Common Event Flag, waiting for a common event flag

COLPG Collided Page Wait, involuntary wait state; likely to indicate a memory shortage, waiting for hard page faults

COM Computable; ready to execute

COMO Computable Outswapped, COM, but swapped out

CUR Current, currently executing in a CPU

FPG Free Page Wait, involuntary wait state; most likely indicates a memory shortage

LEF Local Event Flag, waiting for a Local Event Flag

LEFO Local Event Flag Outswapped; LEF, but outswapped

HIB Hibernate, voluntary wait state requested by the process; it is inactive

HIBO Hibernate Outswapped, hibernating but swapped out

MWAIT Miscellaneous Resource Wait, involuntary wait state, possibly caused by a shortage of a systemwide resource, such as no page or swap file capacity or no synchronizations for single-threaded code.
Types of MWAIT states are shown in the following table:

MWAIT State Definition

BWAIT Process waiting for buffered I/O byte count quota.

JWAIT Process in either BWAIT or TWAIT state.

TWAIT Process waiting for timer queue entry quota.

EXH Kernel thread in exit handler (not currently used).

IMODE Kernel thread waiting to acquire inner-mode semaphore.

PSXFR Process waiting during a POSIX fork operation.

RWAST Process waiting for system or special kernel mode AST.

RWMBX Process waiting because mailbox is full.

RWNBX Process waiting for nonpaged dynamic memory.

RWPFF Process waiting because page file is full.

RWPAG Process waiting for paged dynamic memory.

RWMPE Process waiting because modified page list is empty.

RWMPB Process waiting because modified page writer is busy.

RWSCS Process waiting for distributed lock manager.

RWCLU Process waiting because OpenVMS Cluster is in transition.

RWCAP Process waiting for CPU that has its capability set.

RWCSV Kernel thread waiting for request completion by OpenVMS Cluster server process.

PFW Page Fault Wait, involuntary wait state; possibly indicates a memory shortage, waiting for hard page faults.

RWAST Resource Wait State, waiting for delivery of an asynchronous system trap (AST) that signals a resource availability; usually an I/O is outstanding or a process quota is exhausted.

RWBRK Resource Wait for BROADCAST to finish

RWCAP Resource Wait for CPU Capability

RWCLU Resource Wait for Cluster Transition

RWCSV Resource Wait for Cluster Server Process

RWIMG Resource Wait for Image Activation Lock

RWLCK Resource Wait for Lock ID data base

RWMBX Resource Wait on MailBox, either waiting for data in mailbox (to read) or waiting to place data (write) into a full mailbox (some other process has not read from it; mailbox is full so this process cannot write).

RWMPB Resource Wait for Modified Page writer Busy

RWMPE Resource Wait for Modified Page list Empty

RWNPG Resource Wait for Non Paged Pool

RWPAG Resource Wait for Paged Pool

RWPFF Resource Wait for Page File Full

RWQUO Resource Wait for Pooled Quota

RWSCS Resource Wait for System Communications Services

RWSWP Resource Wait for Swap File space

SUSP Suspended, wait state process placed into suspension; it can be resumed at the request of an external process

SUSPO Suspended Outswapped, suspended but swapped out

Appendix B
Tables of Events

This appendix contains the following tables of events:

OpenVMS Events Table B-1
Windows Events Table B-2

Each table provides the following information:

Alphabetical list of the events that the Availability Manager signals in the Event pane of the System Overview window (Figure 1-1)
Abbreviation and brief description of each event (also displayed in the Event pane)
Explanation of the event and a suggestion for remedial action, if applicable

Table B-1 OpenVMS Events
Event Description Explanation Recommended Action

CFGDON Configuration done The server application has made a connection to the node and will start collecting the data according to the Customize Data Collection options. This informational event indicates that the node is recognized. No further investigation is required.

DPGERR Error executing driver program The Data Collector has detected a program error while executing the data collection program. This event can occur if you have a bad driver program library, or there is a bug in the driver program. Make sure you have the program library that shipped with the kit; if it is correct, contact your customer support representative with the full text of the event.

DSKERR High disk error count The error count for the disk device exceeds the threshold. Check error log entries for device errors. A disk device with a high error count could indicate a problem with the disk or with the connection between the disk and the system.

DSKINV Disk is invalid The valid bit in the disk device status field is not set. The disk device is not considered valid by the operating system. Make sure that the disk device is valid and is known to the operating system.

DSKMNV Disk in mount verify state The disk device is performing a mount verification. The system is performing a mount verification for the disk device. This could be caused by:

A removable disk on a local or remote node was removed.
A disk on a local or remote node has gone offline due to errors.
The node that serves the disk is down.
The connection to a remote disk is down.

DSKOFF Disk device is off line The disk device has been placed in the off line state. Check whether the disk device should be off line. This event is also signalled when the same device name is used for two different physical disks. The volume name in the event is the second node to use the same device name.

DSKQLN High disk queue length The average number of pending I/Os to the disk device exceeds the threshold. More I/O requests are being queued to the disk device than the device can service. Reasons include a slow disk or too much work being done on the disk.

DSKRWT High disk RWAIT count The RWAIT count on the disk device exceeds the threshold. RWAIT is an indicator that an I/O operation has stalled, usually during normal connection failure recovery or volume processing of host-based shadowing. A node has probably failed and shadowing is recovering data.

DSKUNA Disk device is unavailable The disk device has been placed in the Unavailable state. The disk device state has been set to /NOAVAILABLE. See DCL help for the SET DEVICE/AVAILABLE command.

DSKWRV Wrong volume mounted The disk device has been mounted with the wrong volume label. Set the correct volume name by entering the DCL command SET VOLUME/LABEL on the node.

ELIBCR Bad CRC for exportable program library The CRC calculation for the exportable program library does not match the CRC value in the library. The exportable program library may be corrupt. Restore the exportable program library from its original source.

ELIBNP No privilege to access exportable program library Unable to access the exportable program library. Check to make sure that the Availability Manager has the proper security access to the exportable program library file.

ELIBUR Unable to read exportable program library Unable to read the exportable program library for the combination of hardware architecture and OpenVMS version. The exportable program library may be corrupt. Restore the exportable program library from its original source.

FXCPKT Received a corrupt fix response packet from node The Availability Manager tried to perform a fix, but the fix acknowledgment from the node was corrupt. This event could occur if there is network congestion or some problem with the node. Confirm the connection to the node, and reapply the fix if necessary.

FXCRSH Crash node fix The Availability Manager has successfully performed a Crash Node fix on the node. This informational message indicates a successful fix. Expect to see a Path Lost event for the node.

FXDCPR Decrement process priority fix The Availability Manager has successfully performed a Decrement Process Priority fix on the process. This informational message indicates a successful fix. Setting a process priority too low takes CPU time away from the process.

FXDCWS Decrement process working set size fix The Availability Manager has successfully decreased the working set size of the process on the node by performing an Adjust Working Set fix. This informational message indicates a successful fix. This fix disables the automatic working set adjustment for the process.

FXDLPR Delete process fix The Availability Manager has successfully performed a Delete Process fix on the process. This informational message indicates a successful fix. If the process is in RWAST state, this fix does not work. This fix also does not work on processes created with the no delete option.

FXEXIT Exit image fix The Availability Manager has successfully performed an Exit Image fix on the process. This informational message indicates a successful fix. Forcing a system process to exit its current image can corrupt the kernel.

FXINPR Increment process priority fix The Availability Manager has successfully performed an Increment Process Priority fix on the process. This informational message indicates a successful fix. Setting a process priority too high takes CPU time away from other processes. Set the priority above 15 only for "real-time" processing.

FXINQU Increment process quota limits fix The Availability Manager has successfully increased the quota limit of the process on the node by placing a new limit value in the limit field of the quota. This informational message indicates a successful fix. This fix is only for the life of the process. If the problem continues, change the limit for the account in the UAF file.

FXINWS Increment process working set size fix The Availability Manager has successfully increased the working set size of the process on the node by performing an Adjust Working Set fix. This informational message indicates a successful fix. This fix disables the automatic working set adjustment for the process. The adjusted working set value cannot exceed WSQUOTA for the process or WSMAX for the system.

FXNOPR No-change process priority fix The Availability Manager has successfully performed a Process Priority fix on the process that resulted in no change to the process priority. This informational message indicates a successful fix. The Fix Value slider was set to the current priority of the process.

FXNOQU No-change process quota limits fix The Availability Manager has successfully performed a quota limit fix for the process that resulted in no change to the quota limit. This informational message indicates a successful fix. The Fix Value slider was set to the current quota of the process.

FXNOWS No-change process working set size fix The Availability Manager has successfully performed Adjust Working Set fix on the process. This informational message indicates a successful fix. The Fix Value slider was set to the current working set size of the process.

FXPGWS Purge working set fix The Availability Manager has successfully performed a Purge Working Set fix on the process. This informational message indicates a successful fix. The purged process might page fault to retrieve memory it needs for current processing.

FXPRIV No privilege to attempt fix The Availability Manager cannot perform a fix on the node due either to no CMKRNL privilege or to unmatched security triplets. See Chapter 6 for details about setting up security.

FXQUOR Adjust quorum fix The Availability Manager has successfully performed an Adjust Quorum fix on the node. This informational message indicates a successful fix. Use this fix when you find many processes in RWCAP state on a cluster node.

FXRESM Resume process fix The Availability Manager has successfully performed a Resume Process fix on the process. This informational message indicates a successful fix. If the process goes back into suspend state, check the AUDIT_SERVER process for problems.

FXSUSP Suspend process fix The Availability Manager has successfully performed a Suspend Process fix on the process. This informational message indicates a successful fix. Do not suspend system processes.

FXTIMO Fix timeout The Availability Manager tried to perform a fix, but no acknowledgment for the fix was received from the node within the timeout period. This event can occur if there is network congestion, if some problem is causing the node not to respond, or if the fix request failed to reach the node. Confirm the connection to the node, and reapply the fix if necessary.

FXUERR Unknown error code for fix The Availability Manager tried to perform a fix, but the fix failed for an unexpected reason. Please contact your HP customer support representative with the text of this event. The event text is also recorded in the event log.

HIBIOR High buffered I/O rate The node's average buffered I/O rate exceeds the threshold. A high buffered I/O rate can cause high system overhead. If this is affecting overall system performance, use the I/O Summary to determine the high buffered I/O processes, and adjust their priorities or suspend them as needed.

HICOMQ Many processes waiting in COM or COMO The average number of processes on the node in the COM or COMO queues exceeds the threshold. Use the CPU Mode Summary to determine which processes are competing for CPU resources. Possible adjustments include changing process priorities and suspending processes.

HIDIOR High direct I/O rate The average direct I/O rate on the node exceeds the threshold. A high direct I/O rate can cause high system overhead. If this is affecting overall system performance, use the I/O Summary to determine the high direct I/O processes, and adjust their priorities or suspend them as needed.

HIHRDP High hard page fault rate The average hard page fault rate on the node exceeds the threshold. A high hard page fault indicates that the free or modified page list is too small. Check Chapter 6 for possible actions.

HIMWTQ Many processes waiting in MWAIT The average number of processes on the node in the Miscellaneous Resource Wait (MWAIT) queues exceeds the threshold. Use the CPU and Single Process pages to determine which resource is awaited. See Chapter 6 for more information about wait states.

HINTER High interrupt mode time The average percentage of time the node spends in interrupt mode exceeds the threshold. Consistently high interrupt time prohibits processes from obtaining CPU time. Determine which device or devices are overusing this mode.

HIPINT High interrupt mode time on Primary CPU The average percentage of time the node spends in interrupt mode exceeds the threshold. Consistently high interrupt time on the Primary CPU can slow down IO and servicing various systems in OpenVMS. Enabling Fast Path helps distribute the servicing of interrupts from IO among the CPUs on the node. Also, determine which device or devices are overusing this mode.

HIPRCT High process count The proportion of actual processes to maximum processes is too high. If the number of processes reaches the maximum (MAXPROCESSCNT), no more processes can be created and the system might hang as a result. Decrease the number of actual processes. Increase SYSGEN parameter MAXPROCESSCNT.

HIPWIO High paging write I/O rate The average paging write I/O rate on the node exceeds the threshold. Use the Process I/O and Memory Summary pages to determine which processes are writing to the page file excessively, and decide whether their working sets need adjustment.

HIPWTQ Many processes waiting in COLPG, PFW, or FPG The average number of processes on the node that are waiting for page file space exceeds the threshold. Use the CPU Process States and Memory Summary to determine which processes are in the COLPG, PFW, or FPG state. COLPG and PFW processes might be constrained by too little physical memory, too restrictive working set quotas, or lack of available page file space. FPG processes indicate too little physical memory is available.

HISYSP High system page fault rate The node's average page fault rate for pageable system areas exceeds the threshold. These are page faults from pageable sections in loadable executive images, page pool, and the global page table. The system parameter SYSMWCNT might be set too low. Use AUTOGEN to adjust this parameter.

HITTLP High total page fault rate The average total page fault rate on the node exceeds the threshold. Use the Memory Summary to find the page faulting processes, and make sure that their working sets are set properly.

HMPSYN High multiprocessor (MP) synchronization mode time The average percentage of time the node handles multiprocessor (MP) synchronization exceeds the threshold. High synchronization time prevents other devices and processes from obtaining CPU time. Determine which device is overusing this mode.

HPMPSN High MP synchronization mode time on Primary CPU The average percentage of time the node handles multiprocessor (MP) synchronization exceeds the threshold. High synchronization time prevents other devices and processes from obtaining CPU time. This is especially critical for the Primary CPU, which is the only CPU that performs certain tasks on OpenVMS. Determine which spinlocks are overusing this mode. Executing SYS$EXAMPLES:SPL.COM shows which spinlocks are being used.

KTHIMD Kernel thread waiting for inner-mode semaphore The average percentage of time that the kernel thread waits for the inner-mode semaphore exceeds the threshold. Use SDA to determine which kernel thread of the process has the semaphore.

LCKBLK Lock blocking The process holds the highest priority lock in the resource's granted lock queue. This lock is blocking all other locks from gaining access to the resource. Use the Single Process Windows to determine what the process is doing. If the process is in an RW xxx state, try exiting the image or deleting the process. If this fails, crashing the blocking node might be the only other fix option.

LCKCNT Lock contention The resource has a contention situation, with multiple locks competing for the same resource. The competing locks are the currently granted lock and those that are waiting in the conversion queue or in the waiting queue. Use Lock Contention to investigate a potential lock contention situation. Locks for the same resource might have the NODLCKWT wait flag enabled and be on every member of the cluster. Usually this is not a lock contention situation, and these locks can be filtered out.

LCKWAT Lock waiting The process that has access to the resource is blocking the process that is waiting for it. Once the blocking process releases its access, the next highest lock request acquires the blocking lock. If the blocking process holds the resource too long, check to see whether the process is working correctly; if not, one of the fixes might solve the problem.

LOASTQ Process has used most of ASTLM quota Either the remaining number of asynchronous system traps (ASTs) the process can request is below the threshold, or the percentage of ASTs used compared to the allowed quota is above the threshold. If the amount used reaches the quota, the process enters RWAST state. If the process requires a higher quota, you can increase the ASTLM quota for the process in the UAF file. ASTLM is only a count; system resources are not compromised by increasing this count.

LOBIOQ Process has used most of BIOLM quota Either the remaining number of Buffered I/Os (BIO) the process can request is below the threshold, or the percentage of BIOs used is above the threshold. If the amount used reaches the quota, the process enters RWAST state. If the process requires a higher quota, you can increasing the BIOLM quota for the process in the UAF file. BIOLM is only a count; system resources are not compromised by increasing this count.

LOBYTQ Process has used most of BYTLM quota Either the remaining number of bytes for the buffered I/O byte count (BYTCNT) that the process can request is below the threshold, or the percentage of bytes used is above the threshold. If the amount used reaches the quota, the process enters RWAST state. If the process requires a higher quota, you can raise the BYTLM quota for the process in the UAF file. BYTLM is the number of bytes in nonpaged pool used for buffered I/O.

LODIOQ Process has used most of DIOLM quota Either the remaining number of Direct I/Os (DIOs) the process can request is below the threshold, or the percentage of DIOs used is above the threshold. If the amount used reaches the quota, the process enters RWAST state. If the process requires a higher quota, you can increase the DIOLM quota for the process in the UAF file. DIOLM is only a count; system resources are not compromised by increasing this count.

LOENQU Process has used most of ENQLM quota Either the remaining number of lock enqueues (ENQ) the process can request is below the threshold, or the percentage of ENQs used is above the threshold. If the limit reaches the quota, the process is not able to make further lock queue requests. If the process requires a higher quota, you can increase the ENQLM quota for the process in the UAF file.

LOFILQ Process has used most of FILLM quota Either the remaining number of files the process can open is below the threshold, or the percentage of files open is above the threshold. If the amount used reaches the quota, the process must first close some files before being allowed to open new ones. If the process requires a higher quota, you can increase the FILLM quota for the process in the UAF file.

LOMEMY Free memory is low For the node, the percentage of free memory compared to total memory is below the threshold. Use the automatic Purge Working Set fix, or use the Memory and CPU Summary to select processes that that are either not currently executing or not page faulting, and purge their working sets.

LOPGFQ Process has used most of PGFLQUOTA quota Either the remaining number of pages the process can allocate from the system page file is below the threshold, or the percentage of pages allocated is above the threshold. If the process requires a higher quota, you can raise the PGFLQUOTA quota for the process in the UAF file. This value limits the number of pages in the system page file that the account's processes can use.

LOPGSP Low page file space Either the remaining number of pages in the system page file is below the threshold, or the percentage of page file space remaining is below the threshold. Either extend the size of this page file or create a new page file to allow new processes to use the new page file.

LOPRCQ Process has used most of PRCLM quota Either the remaining number of subprocesses the current process is allowed to create is below the threshold, or the percentage of created subprocesses is above the threshold. If the amount used reaches the quota, the process is not allowed to create more subprocesses. If the process requires a higher quota, you can increase the PRCLM quota for the process in the UAF file.

LOSTVC Lost virtual circuit to node The virtual circuit between the listed nodes has been lost. Check to see whether the second node listed has failed or whether the connection between the nodes is broken. The VC name listed in parentheses is the communication link between the nodes.

LOSWSP Low swap file space Either the remaining number of pages in the system page file is below the threshold, or the percentage of page file space remaining is below the threshold. Either increase the size of this page file, or create a new page file to allow new processes to use the new page file.

LOTQEQ Process has used most of TQELM quota Either the remaining number of Timer Queue Entries (TQEs) the process can request is below the threshold, or the percentage of TQEs used to the allowed quota is above the threshold. If the amount used reaches the quota, the process enters RWAST state. If the process requires a higher quota, you can raise the TQELM quota for the process in the UAF file. TQELM is only a count; system resources are not compromised by raising it.

LOVLSP Low disk volume free space Either the remaining number of blocks on the volume is below the threshold, or the percentage of free blocks remaining on the volume is below the threshold. You must free up some disk volume space. If part of the purpose of the volume is to be filled, such as a page/swap device, then you can filter the volume from the display.

LOVOTE Low cluster votes The difference between the number of VOTES and the QUORUM in the cluster is below the threshold. Check to see whether voting members have failed. To avoid the hang that results if VOTES goes below QUORUM, use the Adjust Quorum fix.

LOWEXT Low process working set extent The process page fault rate exceeds the threshold, and the percentage of working set size compared to working set extent exceeds the threshold. This event indicates that the WSEXTENT value in the UAF file might be too low. The process needs more physical memory but cannot obtain it; therefore, the process page faults excessively.

LOWSQU Low process working set quota The process page fault rate exceeds the threshold, and the percentage of working set size exceeds the threshold. This event indicates the process needs more memory but might not be able to obtain it because one of the following is true:

The WSQUOTA value in the UAF file is set too low for the size of memory allocation requests or
The system is memory constrained.

LRGHSH Remote lock hash table too large to collect data on The Availability Manager cannot investigate the node's resource hash table (RESHASHTBL). It is either too sparse or too dense to investigate efficiently. This event indicates that the Availability Manager will take too many collection iterations to analyze lock contention situations efficiently. Make sure that the SYSGEN parameter RESHASHTBL is set properly for the node.

NOPGFL No page file The Availability Manager cannot find a page file on the node. Use SYSGEN to create and connect a page file on the node.

NOPLIB No program library The program library for the combination of hardware architecture and OpenVMS version was not found. Check to see that all the program library files exist in the program library directory.

NOPRIV Not allowed to monitor node The Availability Manager cannot monitor the node due to unmatched security triplets. See Chapter 6 for details on setting up security.

NOPROC Specific process not found The Availability Manager cannot find the process name selected in the Process Name Search dialog box on the Node Summary page. This event can occur because the listed process no longer exists, or the process name is listed incorrectly in the dialog box.

NOSWFL No swap file The Availability Manager cannot find a swap file on the node. If you do not use swap files, you can ignore this event. Otherwise, use SYSGEN to create and connect a swap file for the node.

OPCERR Event not sent to OPCOM Either the Availability Manager was unable to send the event to OPCOM because of a setup problem or an error was returned by OPCOM. A text message in the status field indicates that the Availability Manager was not configured properly, including missing shareable images or incorrectly defined logical names.
A hexadecimal condition value in the status field indicates the reason that OPCOM was not able to post the event. The $SNDOPR system service returns this value. For a list of condition values and additional information, see the HP OpenVMS System Services Reference Manual.

OVOERR Event not sent to OpenView The Availability Manager was unable to send the event to OpenView. The reason is stated in the event description in the Event pane. Problems can include the following:

The Availability Manager was not configured properly, including missing shareable images or incorrectly defined logical names.
An HP OpenView policy or template might not have been deployed properly.
A problem occurred communicating to or within OpenView.
The user does not have sufficient privileges or quotas, or both.
Too many events are waiting to be escalated by OpenView.

PKTFER Packet format error The data packet sent to the remote node was not in the correct format for the remote node to process. Please contact your HP customer support representative with the full text of the event, the version of the Availability Manager, the configuration of the node running the Availability Manager, and the configuration of the nodes being monitored.

PLIBNP No privilege to access program library Unable to access the program library. Check to see that the Availability Manager has the proper security access to the program library file.

PLIBUR Unable to read program library Unable to read the program library for the combination of hardware architecture and OpenVMS version. The program library is either corrupt or from a different version of the Availability Manager. Restore the program library from the last installation.

PRBIOR High process buffered I/O rate The average buffered I/O rate of the process exceeds the threshold. If the buffered I/O rate is affecting overall system performance, lowering the process priority or suspending the process would allow other processes to obtain access to the CPU.

PRBIOW Process waiting for buffered I/O The average percentage of time the process is waiting for a buffered I/O to complete exceeds the threshold. Use SDA on the node to ensure that the device to which the process is performing buffered I/Os is still available and is not being overused.

PRCCOM Process waiting in COM or COMO The average number of processes on the node in the COM or COMO queues exceeds the threshold. Use the CPU Summary to determine which processes should be given more CPU time, and adjust process priorities and states accordingly.

PRCCUR Process has a high CPU rate The average percentage of time the process is currently executing in the CPU exceeds the threshold. Make sure that the listed process is not looping or preventing other processes from gaining access to the CPU. Adjust process priority or state as needed.

PRCFND Process has recently been found The Availability Manager has discovered the process name selected on the Watch Process page (see Figure 7-24). No action required.

PRCMUT Process waiting for a mutex The average percentage of time the process is waiting for a particular system mutex exceeds the threshold. Use SDA to help determine which mutex the process is waiting for and to help determine the owner of the mutex.

PRCMWT Process waiting in MWAIT The average percentage of time the process is in a Miscellaneous Resource Wait (MWAIT) state exceeds the threshold. Various resource wait states are part of the collective wait state called MWAIT. See Appendix A for a list of these states. The CPU Process page and the Single Process page display which state the process is in. Check the Single Process page to determine which resource the process is waiting for and whether the resource is still available for the process.

PRCPSX Process waiting in PSXFR The average percentage of time the process waits during a POSIX fork operation exceeds the threshold.

PRCPUL Most of CPULIM process quota used The remaining CPU time available for the process is below the threshold. Make sure the CPU time allowed for the process is sufficient for its processing needs. If not, increase the CPU quota in the UAF file of the node.

PRCPWT Process waiting in COLPG, PFW or FPG The average percentage of time the process is waiting to access the system page file database exceeds the threshold. Check to make sure the system page file is large enough for all the resource requests being made.

PRCQUO Process waiting for a quota The average percentage of time the process is waiting for a particular quota exceeds the threshold. Use the Single Process pages to determine which quota is too low. Then adjust the quotas of the account in the UAF file.

PRCRWA Process waiting in RWAST The average percentage of time the process is waiting in the RWAST state exceeds the threshold. RWAST indicates the process is waiting for an asynchronous system trap to complete. Use the Single Process pages to determine if RWAST is due to the process quota being set too low. If not, use SDA to determine if RWAST is due to a problem between the process and a physical device.

PRCRWC Process waiting in RWCAP The average percentage of time the process is waiting in the RWCAP state exceeds the threshold. RWCAP indicates that the process is waiting for CPU capability. When many processes are in this state, the system might be hung because not enough nodes are running in the cluster to maintain the cluster quorum. Use the Adjust Quorum fix to correct the problem.

PRCRWM Process waiting in RWMBX The average percentage of time the process is waiting in the RWMBX state exceeds the threshold. RWMBX indicates the process is waiting for a full mailbox to be empty. Use SDA to help determine which mailbox the process is waiting for.

PRCRWP Process waiting in RWPAG, RWNPG, RWMPE, or RWMPB The average percentage of time the process is waiting in the RWPAG, RWNPG, RWMPE, or RWMPB state exceeds the threshold. RWPAG and RWNPG are for paged or nonpaged pool; RWMPE and RWMPB are for the modified page list. Processes in the RWPAG or RWNPG state can indicate you need to increase the size of paged or nonpaged pool, respectively. Processes in the RWMPB state indicate that the modified page writer cannot handle all the modified pages being generated. See Chapter 6 for suggestions.

PRCRWS Process waiting in RWSCS, RWCLU, or RWCSV The average percentage of time the process is waiting in the RWSCS, RWCLU, or RWCSV state exceeds the threshold. RWCSV is for the cluster server; RWCLU is for the cluster transition; RWSCS is for cluster communications. The process is waiting for a cluster event to complete. Use the Show Cluster utility to help investigate.

PRCUNK Process waiting for a system resource The average percentage of time the process is waiting for an undetermined system resource exceeds the threshold. The state in which the process is waiting is unknown to the Availability Manager.

PRDIOR High process direct I/O rate The average direct I/O rate of the process exceeds the threshold. If the I/O rate is affecting overall system performance, lowering the process priority might allow other processes to obtain access to the CPU.

PRDIOW Process waiting for direct I/O The average percentage of time the process is waiting for a direct I/O to complete exceeds the threshold. Use SDA on the node to ensure that the device to which the process is performing direct I/Os is still available and is not being overused.

PRLCKW Process waiting for a lock The average percentage of time the process is waiting in the control wait state exceeds the threshold. The control wait state indicates that a process is waiting for a lock. Although no locks might appear in Lock Contention, the awaited lock might be filtered out of the display.

PRPGFL High process page fault rate The average page fault rate of the process exceeds the threshold. The process is memory constrained; it needs an increased number of pages to perform well. Make sure that the working set quotas and extents are set correctly. To increase the working set quota temporarily, use the Adjust Working Set fix.

PRPIOR High process paging I/O rate The average page read I/O rate of the process exceeds the threshold. The process needs an increased number of pages to perform well. Make sure that the working set quotas and extents are set correctly. To increase the working set quota temporarily, use the Adjust Working Set fix.

PTHLST Path lost The connection between the server and collection node has been lost. Check to see whether the node failed or whether the LAN segment to the node is having problems. This event occurs when the server no longer receives data from the node on which data is being collected.

RESDNS Resource hash table dense The percentage of occupied entries in the hash table exceeds the threshold. A densely populated table can result in a performance degradation. Use the system parameter RESHASHTBL to adjust the total number of entries.

RESPRS Resource hash table sparse The percentage of occupied entries in the hash table is less than the threshold. A sparsely populated table wastes memory resources. Use the system parameter RESHASHTBL to adjust the total number of entries.

UEXPLB Using OpenVMS program export library The program library for the combination of hardware architecture and OpenVMS version was not found. Check to see that all the program library files exist in the program library directory.

UNSUPP Unsupported node The Availability Manager does not support this combination of hardware architecture and OpenVMS version. Check the product SPD for supported system configurations.

VLSZCH Volume size changed Informational message to indicate that the volume has been resized. No further investigation is required.

WINTRN High window turn rate This indicates that current open files are fragmented. Reading from fragmented files or extending a file size, or both, can cause a high window turn rate. Defragment heavily used volumes using BACKUP or a disk fragmentation program. For processes that extend the size of a file, make sure that the file extent value is large. (See the $SET RMS/EXTEND_QUANTITY command documentation for more information.)

**Table B-1 OpenVMS Events**
Event	Description	Explanation	Recommended Action
CFGDON	Configuration done	The server application has made a connection to the node and will start collecting the data according to the Customize Data Collection options.	This informational event indicates that the node is recognized. No further investigation is required.
DPGERR	Error executing driver program	The Data Collector has detected a program error while executing the data collection program.	This event can occur if you have a bad driver program library, or there is a bug in the driver program. Make sure you have the program library that shipped with the kit; if it is correct, contact your customer support representative with the full text of the event.
DSKERR	High disk error count	The error count for the disk device exceeds the threshold.	Check error log entries for device errors. A disk device with a high error count could indicate a problem with the disk or with the connection between the disk and the system.
DSKINV	Disk is invalid	The valid bit in the disk device status field is not set. The disk device is not considered valid by the operating system.	Make sure that the disk device is valid and is known to the operating system.
DSKMNV	Disk in mount verify state	The disk device is performing a mount verification.	The system is performing a mount verification for the disk device. This could be caused by: A removable disk on a local or remote node was removed. A disk on a local or remote node has gone offline due to errors. The node that serves the disk is down. The connection to a remote disk is down.
DSKOFF	Disk device is off line	The disk device has been placed in the off line state.	Check whether the disk device should be off line. This event is also signalled when the same device name is used for two different physical disks. The volume name in the event is the second node to use the same device name.
DSKQLN	High disk queue length	The average number of pending I/Os to the disk device exceeds the threshold.	More I/O requests are being queued to the disk device than the device can service. Reasons include a slow disk or too much work being done on the disk.
DSKRWT	High disk RWAIT count	The RWAIT count on the disk device exceeds the threshold.	RWAIT is an indicator that an I/O operation has stalled, usually during normal connection failure recovery or volume processing of host-based shadowing. A node has probably failed and shadowing is recovering data.
DSKUNA	Disk device is unavailable	The disk device has been placed in the Unavailable state.	The disk device state has been set to /NOAVAILABLE. See DCL help for the SET DEVICE/AVAILABLE command.
DSKWRV	Wrong volume mounted	The disk device has been mounted with the wrong volume label.	Set the correct volume name by entering the DCL command SET VOLUME/LABEL on the node.
ELIBCR	Bad CRC for exportable program library	The CRC calculation for the exportable program library does not match the CRC value in the library.	The exportable program library may be corrupt. Restore the exportable program library from its original source.
ELIBNP	No privilege to access exportable program library	Unable to access the exportable program library.	Check to make sure that the Availability Manager has the proper security access to the exportable program library file.
ELIBUR	Unable to read exportable program library	Unable to read the exportable program library for the combination of hardware architecture and OpenVMS version.	The exportable program library may be corrupt. Restore the exportable program library from its original source.
FXCPKT	Received a corrupt fix response packet from node	The Availability Manager tried to perform a fix, but the fix acknowledgment from the node was corrupt.	This event could occur if there is network congestion or some problem with the node. Confirm the connection to the node, and reapply the fix if necessary.
FXCRSH	Crash node fix	The Availability Manager has successfully performed a Crash Node fix on the node.	This informational message indicates a successful fix. Expect to see a Path Lost event for the node.
FXDCPR	Decrement process priority fix	The Availability Manager has successfully performed a Decrement Process Priority fix on the process.	This informational message indicates a successful fix. Setting a process priority too low takes CPU time away from the process.
FXDCWS	Decrement process working set size fix	The Availability Manager has successfully decreased the working set size of the process on the node by performing an Adjust Working Set fix.	This informational message indicates a successful fix. This fix disables the automatic working set adjustment for the process.
FXDLPR	Delete process fix	The Availability Manager has successfully performed a Delete Process fix on the process.	This informational message indicates a successful fix. If the process is in RWAST state, this fix does not work. This fix also does not work on processes created with the no delete option.
FXEXIT	Exit image fix	The Availability Manager has successfully performed an Exit Image fix on the process.	This informational message indicates a successful fix. Forcing a system process to exit its current image can corrupt the kernel.
FXINPR	Increment process priority fix	The Availability Manager has successfully performed an Increment Process Priority fix on the process.	This informational message indicates a successful fix. Setting a process priority too high takes CPU time away from other processes. Set the priority above 15 only for "real-time" processing.
FXINQU	Increment process quota limits fix	The Availability Manager has successfully increased the quota limit of the process on the node by placing a new limit value in the limit field of the quota.	This informational message indicates a successful fix. This fix is only for the life of the process. If the problem continues, change the limit for the account in the UAF file.
FXINWS	Increment process working set size fix	The Availability Manager has successfully increased the working set size of the process on the node by performing an Adjust Working Set fix.	This informational message indicates a successful fix. This fix disables the automatic working set adjustment for the process. The adjusted working set value cannot exceed WSQUOTA for the process or WSMAX for the system.
FXNOPR	No-change process priority fix	The Availability Manager has successfully performed a Process Priority fix on the process that resulted in no change to the process priority.	This informational message indicates a successful fix. The Fix Value slider was set to the current priority of the process.
FXNOQU	No-change process quota limits fix	The Availability Manager has successfully performed a quota limit fix for the process that resulted in no change to the quota limit.	This informational message indicates a successful fix. The Fix Value slider was set to the current quota of the process.
FXNOWS	No-change process working set size fix	The Availability Manager has successfully performed Adjust Working Set fix on the process.	This informational message indicates a successful fix. The Fix Value slider was set to the current working set size of the process.
FXPGWS	Purge working set fix	The Availability Manager has successfully performed a Purge Working Set fix on the process.	This informational message indicates a successful fix. The purged process might page fault to retrieve memory it needs for current processing.
FXPRIV	No privilege to attempt fix	The Availability Manager cannot perform a fix on the node due either to no CMKRNL privilege or to unmatched security triplets.	See Chapter 6 for details about setting up security.
FXQUOR	Adjust quorum fix	The Availability Manager has successfully performed an Adjust Quorum fix on the node.	This informational message indicates a successful fix. Use this fix when you find many processes in RWCAP state on a cluster node.
FXRESM	Resume process fix	The Availability Manager has successfully performed a Resume Process fix on the process.	This informational message indicates a successful fix. If the process goes back into suspend state, check the AUDIT_SERVER process for problems.
FXSUSP	Suspend process fix	The Availability Manager has successfully performed a Suspend Process fix on the process.	This informational message indicates a successful fix. Do not suspend system processes.
FXTIMO	Fix timeout	The Availability Manager tried to perform a fix, but no acknowledgment for the fix was received from the node within the timeout period.	This event can occur if there is network congestion, if some problem is causing the node not to respond, or if the fix request failed to reach the node. Confirm the connection to the node, and reapply the fix if necessary.
FXUERR	Unknown error code for fix	The Availability Manager tried to perform a fix, but the fix failed for an unexpected reason.	Please contact your HP customer support representative with the text of this event. The event text is also recorded in the event log.
HIBIOR	High buffered I/O rate	The node's average buffered I/O rate exceeds the threshold.	A high buffered I/O rate can cause high system overhead. If this is affecting overall system performance, use the I/O Summary to determine the high buffered I/O processes, and adjust their priorities or suspend them as needed.
HICOMQ	Many processes waiting in COM or COMO	The average number of processes on the node in the COM or COMO queues exceeds the threshold.	Use the CPU Mode Summary to determine which processes are competing for CPU resources. Possible adjustments include changing process priorities and suspending processes.
HIDIOR	High direct I/O rate	The average direct I/O rate on the node exceeds the threshold.	A high direct I/O rate can cause high system overhead. If this is affecting overall system performance, use the I/O Summary to determine the high direct I/O processes, and adjust their priorities or suspend them as needed.
HIHRDP	High hard page fault rate	The average hard page fault rate on the node exceeds the threshold.	A high hard page fault indicates that the free or modified page list is too small. Check Chapter 6 for possible actions.
HIMWTQ	Many processes waiting in MWAIT	The average number of processes on the node in the Miscellaneous Resource Wait (MWAIT) queues exceeds the threshold.	Use the CPU and Single Process pages to determine which resource is awaited. See Chapter 6 for more information about wait states.
HINTER	High interrupt mode time	The average percentage of time the node spends in interrupt mode exceeds the threshold.	Consistently high interrupt time prohibits processes from obtaining CPU time. Determine which device or devices are overusing this mode.
HIPINT	High interrupt mode time on Primary CPU	The average percentage of time the node spends in interrupt mode exceeds the threshold.	Consistently high interrupt time on the Primary CPU can slow down IO and servicing various systems in OpenVMS. Enabling Fast Path helps distribute the servicing of interrupts from IO among the CPUs on the node. Also, determine which device or devices are overusing this mode.
HIPRCT	High process count	The proportion of actual processes to maximum processes is too high. If the number of processes reaches the maximum (MAXPROCESSCNT), no more processes can be created and the system might hang as a result.	Decrease the number of actual processes. Increase SYSGEN parameter MAXPROCESSCNT.
HIPWIO	High paging write I/O rate	The average paging write I/O rate on the node exceeds the threshold.	Use the Process I/O and Memory Summary pages to determine which processes are writing to the page file excessively, and decide whether their working sets need adjustment.
HIPWTQ	Many processes waiting in COLPG, PFW, or FPG	The average number of processes on the node that are waiting for page file space exceeds the threshold.	Use the CPU Process States and Memory Summary to determine which processes are in the COLPG, PFW, or FPG state. COLPG and PFW processes might be constrained by too little physical memory, too restrictive working set quotas, or lack of available page file space. FPG processes indicate too little physical memory is available.
HISYSP	High system page fault rate	The node's average page fault rate for pageable system areas exceeds the threshold.	These are page faults from pageable sections in loadable executive images, page pool, and the global page table. The system parameter SYSMWCNT might be set too low. Use AUTOGEN to adjust this parameter.
HITTLP	High total page fault rate	The average total page fault rate on the node exceeds the threshold.	Use the Memory Summary to find the page faulting processes, and make sure that their working sets are set properly.
HMPSYN	High multiprocessor (MP) synchronization mode time	The average percentage of time the node handles multiprocessor (MP) synchronization exceeds the threshold.	High synchronization time prevents other devices and processes from obtaining CPU time. Determine which device is overusing this mode.
HPMPSN	High MP synchronization mode time on Primary CPU	The average percentage of time the node handles multiprocessor (MP) synchronization exceeds the threshold.	High synchronization time prevents other devices and processes from obtaining CPU time. This is especially critical for the Primary CPU, which is the only CPU that performs certain tasks on OpenVMS. Determine which spinlocks are overusing this mode. Executing SYS$EXAMPLES:SPL.COM shows which spinlocks are being used.
KTHIMD	Kernel thread waiting for inner-mode semaphore	The average percentage of time that the kernel thread waits for the inner-mode semaphore exceeds the threshold.	Use SDA to determine which kernel thread of the process has the semaphore.
LCKBLK	Lock blocking	The process holds the highest priority lock in the resource's granted lock queue. This lock is blocking all other locks from gaining access to the resource.	Use the Single Process Windows to determine what the process is doing. If the process is in an RW xxx state, try exiting the image or deleting the process. If this fails, crashing the blocking node might be the only other fix option.
LCKCNT	Lock contention	The resource has a contention situation, with multiple locks competing for the same resource. The competing locks are the currently granted lock and those that are waiting in the conversion queue or in the waiting queue.	Use Lock Contention to investigate a potential lock contention situation. Locks for the same resource might have the NODLCKWT wait flag enabled and be on every member of the cluster. Usually this is not a lock contention situation, and these locks can be filtered out.
LCKWAT	Lock waiting	The process that has access to the resource is blocking the process that is waiting for it. Once the blocking process releases its access, the next highest lock request acquires the blocking lock.	If the blocking process holds the resource too long, check to see whether the process is working correctly; if not, one of the fixes might solve the problem.
LOASTQ	Process has used most of ASTLM quota	Either the remaining number of asynchronous system traps (ASTs) the process can request is below the threshold, or the percentage of ASTs used compared to the allowed quota is above the threshold.	If the amount used reaches the quota, the process enters RWAST state. If the process requires a higher quota, you can increase the ASTLM quota for the process in the UAF file. ASTLM is only a count; system resources are not compromised by increasing this count.
LOBIOQ	Process has used most of BIOLM quota	Either the remaining number of Buffered I/Os (BIO) the process can request is below the threshold, or the percentage of BIOs used is above the threshold.	If the amount used reaches the quota, the process enters RWAST state. If the process requires a higher quota, you can increasing the BIOLM quota for the process in the UAF file. BIOLM is only a count; system resources are not compromised by increasing this count.
LOBYTQ	Process has used most of BYTLM quota	Either the remaining number of bytes for the buffered I/O byte count (BYTCNT) that the process can request is below the threshold, or the percentage of bytes used is above the threshold.	If the amount used reaches the quota, the process enters RWAST state. If the process requires a higher quota, you can raise the BYTLM quota for the process in the UAF file. BYTLM is the number of bytes in nonpaged pool used for buffered I/O.
LODIOQ	Process has used most of DIOLM quota	Either the remaining number of Direct I/Os (DIOs) the process can request is below the threshold, or the percentage of DIOs used is above the threshold.	If the amount used reaches the quota, the process enters RWAST state. If the process requires a higher quota, you can increase the DIOLM quota for the process in the UAF file. DIOLM is only a count; system resources are not compromised by increasing this count.
LOENQU	Process has used most of ENQLM quota	Either the remaining number of lock enqueues (ENQ) the process can request is below the threshold, or the percentage of ENQs used is above the threshold.	If the limit reaches the quota, the process is not able to make further lock queue requests. If the process requires a higher quota, you can increase the ENQLM quota for the process in the UAF file.
LOFILQ	Process has used most of FILLM quota	Either the remaining number of files the process can open is below the threshold, or the percentage of files open is above the threshold.	If the amount used reaches the quota, the process must first close some files before being allowed to open new ones. If the process requires a higher quota, you can increase the FILLM quota for the process in the UAF file.
LOMEMY	Free memory is low	For the node, the percentage of free memory compared to total memory is below the threshold.	Use the automatic Purge Working Set fix, or use the Memory and CPU Summary to select processes that that are either not currently executing or not page faulting, and purge their working sets.
LOPGFQ	Process has used most of PGFLQUOTA quota	Either the remaining number of pages the process can allocate from the system page file is below the threshold, or the percentage of pages allocated is above the threshold.	If the process requires a higher quota, you can raise the PGFLQUOTA quota for the process in the UAF file. This value limits the number of pages in the system page file that the account's processes can use.
LOPGSP	Low page file space	Either the remaining number of pages in the system page file is below the threshold, or the percentage of page file space remaining is below the threshold.	Either extend the size of this page file or create a new page file to allow new processes to use the new page file.
LOPRCQ	Process has used most of PRCLM quota	Either the remaining number of subprocesses the current process is allowed to create is below the threshold, or the percentage of created subprocesses is above the threshold.	If the amount used reaches the quota, the process is not allowed to create more subprocesses. If the process requires a higher quota, you can increase the PRCLM quota for the process in the UAF file.
LOSTVC	Lost virtual circuit to node	The virtual circuit between the listed nodes has been lost.	Check to see whether the second node listed has failed or whether the connection between the nodes is broken. The VC name listed in parentheses is the communication link between the nodes.
LOSWSP	Low swap file space	Either the remaining number of pages in the system page file is below the threshold, or the percentage of page file space remaining is below the threshold.	Either increase the size of this page file, or create a new page file to allow new processes to use the new page file.
LOTQEQ	Process has used most of TQELM quota	Either the remaining number of Timer Queue Entries (TQEs) the process can request is below the threshold, or the percentage of TQEs used to the allowed quota is above the threshold.	If the amount used reaches the quota, the process enters RWAST state. If the process requires a higher quota, you can raise the TQELM quota for the process in the UAF file. TQELM is only a count; system resources are not compromised by raising it.
LOVLSP	Low disk volume free space	Either the remaining number of blocks on the volume is below the threshold, or the percentage of free blocks remaining on the volume is below the threshold.	You must free up some disk volume space. If part of the purpose of the volume is to be filled, such as a page/swap device, then you can filter the volume from the display.
LOVOTE	Low cluster votes	The difference between the number of VOTES and the QUORUM in the cluster is below the threshold.	Check to see whether voting members have failed. To avoid the hang that results if VOTES goes below QUORUM, use the Adjust Quorum fix.
LOWEXT	Low process working set extent	The process page fault rate exceeds the threshold, and the percentage of working set size compared to working set extent exceeds the threshold.	This event indicates that the WSEXTENT value in the UAF file might be too low. The process needs more physical memory but cannot obtain it; therefore, the process page faults excessively.
LOWSQU	Low process working set quota	The process page fault rate exceeds the threshold, and the percentage of working set size exceeds the threshold.	This event indicates the process needs more memory but might not be able to obtain it because one of the following is true: The WSQUOTA value in the UAF file is set too low for the size of memory allocation requests or The system is memory constrained.
LRGHSH	Remote lock hash table too large to collect data on	The Availability Manager cannot investigate the node's resource hash table (RESHASHTBL). It is either too sparse or too dense to investigate efficiently.	This event indicates that the Availability Manager will take too many collection iterations to analyze lock contention situations efficiently. Make sure that the SYSGEN parameter RESHASHTBL is set properly for the node.
NOPGFL	No page file	The Availability Manager cannot find a page file on the node.	Use SYSGEN to create and connect a page file on the node.
NOPLIB	No program library	The program library for the combination of hardware architecture and OpenVMS version was not found.	Check to see that all the program library files exist in the program library directory.
NOPRIV	Not allowed to monitor node	The Availability Manager cannot monitor the node due to unmatched security triplets.	See Chapter 6 for details on setting up security.
NOPROC	Specific process not found	The Availability Manager cannot find the process name selected in the Process Name Search dialog box on the Node Summary page.	This event can occur because the listed process no longer exists, or the process name is listed incorrectly in the dialog box.
NOSWFL	No swap file	The Availability Manager cannot find a swap file on the node.	If you do not use swap files, you can ignore this event. Otherwise, use SYSGEN to create and connect a swap file for the node.
OPCERR	Event not sent to OPCOM	Either the Availability Manager was unable to send the event to OPCOM because of a setup problem or an error was returned by OPCOM.	A text message in the status field indicates that the Availability Manager was not configured properly, including missing shareable images or incorrectly defined logical names. A hexadecimal condition value in the status field indicates the reason that OPCOM was not able to post the event. The $SNDOPR system service returns this value. For a list of condition values and additional information, see the HP OpenVMS System Services Reference Manual.
OVOERR	Event not sent to OpenView	The Availability Manager was unable to send the event to OpenView.	The reason is stated in the event description in the Event pane. Problems can include the following: The Availability Manager was not configured properly, including missing shareable images or incorrectly defined logical names. An HP OpenView policy or template might not have been deployed properly. A problem occurred communicating to or within OpenView. The user does not have sufficient privileges or quotas, or both. Too many events are waiting to be escalated by OpenView.
PKTFER	Packet format error	The data packet sent to the remote node was not in the correct format for the remote node to process.	Please contact your HP customer support representative with the full text of the event, the version of the Availability Manager, the configuration of the node running the Availability Manager, and the configuration of the nodes being monitored.
PLIBNP	No privilege to access program library	Unable to access the program library.	Check to see that the Availability Manager has the proper security access to the program library file.
PLIBUR	Unable to read program library	Unable to read the program library for the combination of hardware architecture and OpenVMS version.	The program library is either corrupt or from a different version of the Availability Manager. Restore the program library from the last installation.
PRBIOR	High process buffered I/O rate	The average buffered I/O rate of the process exceeds the threshold.	If the buffered I/O rate is affecting overall system performance, lowering the process priority or suspending the process would allow other processes to obtain access to the CPU.
PRBIOW	Process waiting for buffered I/O	The average percentage of time the process is waiting for a buffered I/O to complete exceeds the threshold.	Use SDA on the node to ensure that the device to which the process is performing buffered I/Os is still available and is not being overused.
PRCCOM	Process waiting in COM or COMO	The average number of processes on the node in the COM or COMO queues exceeds the threshold.	Use the CPU Summary to determine which processes should be given more CPU time, and adjust process priorities and states accordingly.
PRCCUR	Process has a high CPU rate	The average percentage of time the process is currently executing in the CPU exceeds the threshold.	Make sure that the listed process is not looping or preventing other processes from gaining access to the CPU. Adjust process priority or state as needed.
PRCFND	Process has recently been found	The Availability Manager has discovered the process name selected on the Watch Process page (see Figure 7-24).	No action required.
PRCMUT	Process waiting for a mutex	The average percentage of time the process is waiting for a particular system mutex exceeds the threshold.	Use SDA to help determine which mutex the process is waiting for and to help determine the owner of the mutex.
PRCMWT	Process waiting in MWAIT	The average percentage of time the process is in a Miscellaneous Resource Wait (MWAIT) state exceeds the threshold.	Various resource wait states are part of the collective wait state called MWAIT. See Appendix A for a list of these states. The CPU Process page and the Single Process page display which state the process is in. Check the Single Process page to determine which resource the process is waiting for and whether the resource is still available for the process.
PRCPSX	Process waiting in PSXFR	The average percentage of time the process waits during a POSIX fork operation exceeds the threshold.
PRCPUL	Most of CPULIM process quota used	The remaining CPU time available for the process is below the threshold.	Make sure the CPU time allowed for the process is sufficient for its processing needs. If not, increase the CPU quota in the UAF file of the node.
PRCPWT	Process waiting in COLPG, PFW or FPG	The average percentage of time the process is waiting to access the system page file database exceeds the threshold.	Check to make sure the system page file is large enough for all the resource requests being made.
PRCQUO	Process waiting for a quota	The average percentage of time the process is waiting for a particular quota exceeds the threshold.	Use the Single Process pages to determine which quota is too low. Then adjust the quotas of the account in the UAF file.
PRCRWA	Process waiting in RWAST	The average percentage of time the process is waiting in the RWAST state exceeds the threshold. RWAST indicates the process is waiting for an asynchronous system trap to complete.	Use the Single Process pages to determine if RWAST is due to the process quota being set too low. If not, use SDA to determine if RWAST is due to a problem between the process and a physical device.
PRCRWC	Process waiting in RWCAP	The average percentage of time the process is waiting in the RWCAP state exceeds the threshold. RWCAP indicates that the process is waiting for CPU capability.	When many processes are in this state, the system might be hung because not enough nodes are running in the cluster to maintain the cluster quorum. Use the Adjust Quorum fix to correct the problem.
PRCRWM	Process waiting in RWMBX	The average percentage of time the process is waiting in the RWMBX state exceeds the threshold. RWMBX indicates the process is waiting for a full mailbox to be empty.	Use SDA to help determine which mailbox the process is waiting for.
PRCRWP	Process waiting in RWPAG, RWNPG, RWMPE, or RWMPB	The average percentage of time the process is waiting in the RWPAG, RWNPG, RWMPE, or RWMPB state exceeds the threshold. RWPAG and RWNPG are for paged or nonpaged pool; RWMPE and RWMPB are for the modified page list.	Processes in the RWPAG or RWNPG state can indicate you need to increase the size of paged or nonpaged pool, respectively. Processes in the RWMPB state indicate that the modified page writer cannot handle all the modified pages being generated. See Chapter 6 for suggestions.
PRCRWS	Process waiting in RWSCS, RWCLU, or RWCSV	The average percentage of time the process is waiting in the RWSCS, RWCLU, or RWCSV state exceeds the threshold. RWCSV is for the cluster server; RWCLU is for the cluster transition; RWSCS is for cluster communications. The process is waiting for a cluster event to complete.	Use the Show Cluster utility to help investigate.
PRCUNK	Process waiting for a system resource	The average percentage of time the process is waiting for an undetermined system resource exceeds the threshold.	The state in which the process is waiting is unknown to the Availability Manager.
PRDIOR	High process direct I/O rate	The average direct I/O rate of the process exceeds the threshold.	If the I/O rate is affecting overall system performance, lowering the process priority might allow other processes to obtain access to the CPU.
PRDIOW	Process waiting for direct I/O	The average percentage of time the process is waiting for a direct I/O to complete exceeds the threshold.	Use SDA on the node to ensure that the device to which the process is performing direct I/Os is still available and is not being overused.
PRLCKW	Process waiting for a lock	The average percentage of time the process is waiting in the control wait state exceeds the threshold.	The control wait state indicates that a process is waiting for a lock. Although no locks might appear in Lock Contention, the awaited lock might be filtered out of the display.
PRPGFL	High process page fault rate	The average page fault rate of the process exceeds the threshold.	The process is memory constrained; it needs an increased number of pages to perform well. Make sure that the working set quotas and extents are set correctly. To increase the working set quota temporarily, use the Adjust Working Set fix.
PRPIOR	High process paging I/O rate	The average page read I/O rate of the process exceeds the threshold.	The process needs an increased number of pages to perform well. Make sure that the working set quotas and extents are set correctly. To increase the working set quota temporarily, use the Adjust Working Set fix.
PTHLST	Path lost	The connection between the server and collection node has been lost.	Check to see whether the node failed or whether the LAN segment to the node is having problems. This event occurs when the server no longer receives data from the node on which data is being collected.
RESDNS	Resource hash table dense	The percentage of occupied entries in the hash table exceeds the threshold.	A densely populated table can result in a performance degradation. Use the system parameter RESHASHTBL to adjust the total number of entries.
RESPRS	Resource hash table sparse	The percentage of occupied entries in the hash table is less than the threshold.	A sparsely populated table wastes memory resources. Use the system parameter RESHASHTBL to adjust the total number of entries.
UEXPLB	Using OpenVMS program export library	The program library for the combination of hardware architecture and OpenVMS version was not found.	Check to see that all the program library files exist in the program library directory.
UNSUPP	Unsupported node	The Availability Manager does not support this combination of hardware architecture and OpenVMS version.	Check the product SPD for supported system configurations.
VLSZCH	Volume size changed	Informational message to indicate that the volume has been resized.	No further investigation is required.
WINTRN	High window turn rate	This indicates that current open files are fragmented. Reading from fragmented files or extending a file size, or both, can cause a high window turn rate.	Defragment heavily used volumes using BACKUP or a disk fragmentation program. For processes that extend the size of a file, make sure that the file extent value is large. (See the $SET RMS/EXTEND_QUANTITY command documentation for more information.)

Table B-2 Windows Events
Event Description Explanation Recommended Action

CFGDON Configuration done The server application has made a connection to the node and will start collecting the data according to the Customize Data Collection options. An informational event to indicate that the node is recognized. No further investigation is required.

NODATA Unable to collect performance data The Availability Manager is unable to collect performance data from the node. The performance data is collected by the PerfServ service on the remote node. Check to see that the service is up and running properly.

NOPRIV Not allowed to monitor node The Availability Manager cannot monitor the node due to a password mismatch between the Data Collector and the Data Analyzer. See Chapter 6 for details on setting up security.

PTHLST Path lost The connection between the Data Analyzer and the Data Collector has been lost. Check if the node crashed or if the LAN segment to the node is having problems. This event occurs when the server no longer receives data from the node on which data is being collected.

PVRMIS Packet version mismatch This version of the Availability Manager is unable to collect performance data from the node because of a data packet version mismatch. The version of the Availability Manager Data Collector is more recent than the Data Analyzer. To process data from the node, upgrade the Data Analyzer to correspond to the Data Collector.

**Table B-2 Windows Events**
Event	Description	Explanation	Recommended Action
CFGDON	Configuration done	The server application has made a connection to the node and will start collecting the data according to the Customize Data Collection options.	An informational event to indicate that the node is recognized. No further investigation is required.
NODATA	Unable to collect performance data	The Availability Manager is unable to collect performance data from the node.	The performance data is collected by the PerfServ service on the remote node. Check to see that the service is up and running properly.
NOPRIV	Not allowed to monitor node	The Availability Manager cannot monitor the node due to a password mismatch between the Data Collector and the Data Analyzer.	See Chapter 6 for details on setting up security.
PTHLST	Path lost	The connection between the Data Analyzer and the Data Collector has been lost.	Check if the node crashed or if the LAN segment to the node is having problems. This event occurs when the server no longer receives data from the node on which data is being collected.
PVRMIS	Packet version mismatch	This version of the Availability Manager is unable to collect performance data from the node because of a data packet version mismatch.	The version of the Availability Manager Data Collector is more recent than the Data Analyzer. To process data from the node, upgrade the Data Analyzer to correspond to the Data Collector.

Contents

Index

HP OpenVMS Availability Manager User's Guide

Appendix ACPU Process States

Appendix BTables of Events

Appendix A
CPU Process States

Appendix B
Tables of Events