Document revision date: 30 March 2001
The CPU is the central resource in your system and it is the most costly to augment. Good CPU performance is vital to the performance of the system as a whole, because the CPU performs the two most basic system functions: it allocates and initiates the demand for all the other resources, and it provides instruction execution service to user processes.
This chapter discusses the following topics:
Only one process can execute on a CPU at a time, so the CPU resource must be shared sequentially. Because several processes can be ready to use the CPU at any given time, the system maintains a queue of processes waiting for the CPU.
These processes are in the compute (COM) or compute outswapped (COMO)
scheduling states.
9.1.1 Quantum
The system allocates the CPU resource for a period of time known as a quantum to each process that is not waiting for other resources.
During its quantum, a process can execute until any of the following events occur:
A good measure of the CPU response is the average number of processes in the COM and COMO states over time---that is, the average length of the compute queue.
If the number of processes in the compute queue is close to zero, unblocked processes will rarely need to wait for the CPU.
Several factors affect how long any given process must wait to be granted its quantum of CPU time:
The worst-case scenario involves a large compute queue of compute-bound processes. Each compute-bound process can retain the CPU for the entire quantum period.
Assuming no interrupt time and a default quantum of 200 milliseconds, each process in a group of five compute-bound processes of the same priority (one in the CUR state and the others in the COM state) acquires the CPU once every second.
As the number of such processes increases, there is a proportional increase in the waiting time.
If the processes are not compute bound, they can relinquish the CPU before having consumed their quantum period, thus reducing waiting time for the CPU.
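Expressed generally, each of n such compute-bound processes runs once every n quantum periods and waits (n - 1) quantum periods between turns; with ten such processes and the default 200-millisecond quantum, each process waits about 1.8 seconds for every 200 milliseconds of service. As a quick check, the following sketch displays the current quantum setting (it assumes you have the privileges needed to run SYSGEN); QUANTUM is expressed in 10-millisecond units, so the default value of 20 corresponds to 200 milliseconds.
$ ! Display the current QUANTUM value (10-millisecond units; 20 = 200 ms)
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> SHOW QUANTUM
SYSGEN> EXIT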
Because of MONITOR's sampling nature, the utility rarely detects
processes that remain only briefly in the COM state. Thus, if MONITOR
shows COM processes, you can assume they are the compute-bound type.
9.1.3 Determining Optimal Queue Length
The best way to determine a reasonable length for the compute queue at your site is to note its length during periods when all the system resources are performing adequately and when users perceive response time to be satisfactory.
Then, watch for deviations from this value and try to develop a sense
for acceptable ranges.
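One way to capture such a baseline, sketched below with illustrative interval and file names, is to watch the running averages interactively or to record the STATES class during a known-good period and summarize it later:
$ ! Watch the compute queue (COM + COMO) as a running average
$ MONITOR STATES/AVERAGE
$ ! Alternatively, record a representative period and summarize it afterward
$ MONITOR STATES /INTERVAL=10 /NODISPLAY /RECORD=SYS$MONITOR:GOOD_PERIOD.DAT
$ MONITOR STATES /INPUT=SYS$MONITOR:GOOD_PERIOD.DAT /SUMMARY=GOOD_PERIOD.SUM /NODISPLAY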
9.1.4 Estimating Available CPU Capacity
To estimate available CPU capacity, observe the average amount of idle time and the average number of processes in the various scheduling wait states.
While idle time is a measure of the percentage of unused CPU time, the wait states indicate the reasons that the CPU was idle and might point to utilization problems with other resources.
Before using idle time to estimate growth potential or as an aid to balancing the CPU resource among processes in an OpenVMS Cluster, ensure that the other resources are not overcommitted, thereby causing the CPU to be underutilized.
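For example, the following sketch displays processor time by mode (including idle time) together with the scheduling states, as running averages:
$ ! Show time spent in each processor mode and the scheduling wait states
$ MONITOR MODES,STATES /AVERAGE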
Whenever a process enters a scheduling wait state---a state other than CUR (process currently using the CPU) and COM---it is said to be blocked from using the CPU.
In most cases, a process enters a wait state as part of the normal synchronization that takes place between the CPU and the other resources.
But certain wait states can indicate problems with those other resources that could block viable processes from using the CPU.
MONITOR data on the scheduling wait states provides clues about
potential problems with the memory and disk I/O resources.
9.1.5 Types of Scheduling Wait States
There are two types of scheduling wait
states---voluntary and involuntary.
Processes enter voluntary wait states directly; they are placed in
involuntary wait states by the system.
9.1.5.1 Voluntary Wait States
Processes in the local event flag wait (LEF) state are said to be voluntarily blocked from using the CPU; that is, they are temporarily requesting to wait before continuing with CPU service. Since the LEF state can indicate conditions ranging from normal waiting for terminal command input to waiting for I/O completion or locks, you can obtain no useful information about potentially harmful blockage simply by observing the number of processes in that state. You can usually assume, though, that most of them are waiting for terminal command input (at the DCL prompt).
Some processes might enter the LEF state because they are awaiting I/O completion on a disk or other peripheral device. If the I/O subsystem is not overloaded, this type of waiting is temporary and inconsequential. If, on the other hand, the I/O resource, particularly disk I/O, is approaching capacity, it could be causing the CPU to be seriously underutilized.
Long disk response times are the clue that certain processes are in the LEF state because they are experiencing long delays in acquiring disk service. If your system exhibits unusually long disk response times, refer to Section 7.2.1 and try to correct that problem before attempting to improve CPU responsiveness.
Other processes in the LEF state might be waiting for a lock to be granted. This situation can arise in environments where extensive file sharing is the norm, particularly in OpenVMS Clusters. Check the ENQs Forced to Wait Rate. (This is the rate of $ENQ lock requests forced to wait before the lock was granted.) Because the statistic gives no indication of how long each lock wait lasts, it does not directly measure the impact of lock waiting on users. A value significantly higher than your system's normal value, however, can indicate that users will start to notice delays.
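The sketch below shows one way to observe the rate; compare it against your own baseline rather than against an absolute threshold:
$ ! Lock management statistics; watch the ENQs Forced to Wait Rate
$ MONITOR LOCK/AVERAGE
$ ! In an OpenVMS Cluster, the distributed lock traffic is also of interest
$ MONITOR DLOCK/AVERAGE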
On large SMP systems, it might improve performance to dedicate one CPU to all lock manager work. If you have a high CPU count and a large amount of time is spent synchronizing multiple CPUs, consider implementing a dedicated lock manager as described in Section 13.2 (see also the sketch following the table below).
If you suspect... | Then... |
---|---|
The lock waiting is caused by file sharing | Attempt to reduce the level of sharing. |
The lock waiting results from user or third-party application locks | Attempt to influence the redesign of such applications. |
A high amount of locking activity in an SMP environment | Assign a CPU to perform dedicated lock management. |
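Section 13.2 gives the supported procedure for the dedicated lock manager. As a rough sketch only, and assuming the feature is controlled by the LCKMGR_MODE and LCKMGR_CPUID system parameters, you can examine the current settings with SYSGEN before deciding whether to enable it:
$ ! Examine the dedicated lock manager parameters (parameter names assumed; see Section 13.2)
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> SHOW LCKMGR_MODE
SYSGEN> SHOW LCKMGR_CPUID
SYSGEN> EXIT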
Processes can also enter the LEF state or the other voluntary wait
states (common event flag wait [CEF], hibernate [HIB], and suspended
[SUSP]) when system services are used to synchronize applications. Such
processes have temporarily abdicated use of the CPU; they do not
indicate problems with other resources.
9.1.5.2 Involuntary Wait States
Involuntary wait states are not requested by processes but are invoked by the system to achieve process synchronization in certain circumstances:
The presence of processes in the MWAIT state indicates that there might be a shortage of a systemwide resource (usually page or swapping file capacity) and that the shortage is blocking these processes from the CPU.
If you see processes in this state, do the following:
$ MONITOR /INPUT=SYS$MONITOR:file-spec /VIEWING_TIME=1 PROCESSES
The most common types of resource waits are those signifying depletion of the page and swapping files as shown in the following table:
State | Description |
---|---|
RWSWP | Indicates a swapping file of deficient size. |
RWMPB, RWMPE, RWPFF | Indicates a paging file that is too small. |
RWAST | Indicates that the process is waiting for a resource whose availability will be signaled by delivery of an asynchronous system trap (AST). In most instances, either an I/O operation is outstanding (incomplete) or a process quota has been exhausted. |
You can determine paging and swapping file sizes and the amount of available space they contain by entering the SHOW MEMORY/FILES/FULL command.
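For example, the following sketch first lists any processes currently in the MWAIT state and then displays the page and swapping files (the /STATE qualifier of SHOW SYSTEM is assumed to be available on your OpenVMS version):
$ ! List processes currently in a miscellaneous resource wait
$ SHOW SYSTEM/STATE=MWAIT
$ ! Display page and swapping files, their sizes, and the space remaining in each
$ SHOW MEMORY/FILES/FULL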
The AUTOGEN feedback report provides detailed information about paging
and swapping file use. AUTOGEN uses the data in the feedback report to
resize or to recommend resizing the paging and swapping files.
9.2 Detecting CPU Limitations
The surest way to determine whether a CPU limitation could be degrading performance is to check for a state queue with the MONITOR STATES command. See Figure A-16. If any processes appear to be in the COM or COMO state, a CPU limitation may be at work. However, if no processes are in the COM or COMO state, you need not investigate the CPU limitation any further.
If processes are in the COM or COMO state, they are being denied access to the CPU. One or more of the following conditions is occurring:
If you suspect the system is performing suboptimally because processes are blocked by a process running at higher priority, do the following:
If you find that this condition exists, your only recourse is to adjust the
process priorities. See Section 13.3 for a discussion of how to change
the process priorities assigned in the UAF, define priorities in the
login command procedure, or change the priorities of processes while
they execute.
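The following sketch identifies the heaviest CPU consumers and then adjusts the priority of one of them; the process identification and priority value are only examples, and raising a priority above a process's base priority requires the ALTPRI privilege:
$ ! Identify the processes consuming the most CPU time
$ MONITOR PROCESSES/TOPCPU
$ ! Adjust the priority of a specific process (PID and value are examples)
$ SET PROCESS/IDENTIFICATION=2040012A/PRIORITY=4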
9.2.2 Time Slicing Between Processes
Once you rule out the possibility of preemption by higher-priority processes, you need to determine if there is a serious problem with time slicing between processes at the same priority. Using the list of top CPU users, compare the priorities and assess how many processes are operating at the same priority. If you conclude that the priorities are inappropriate, refer to Section 13.3.
However, if you decide that the priorities are correct and will not
benefit from such adjustments, you are confronted with a situation that
will not respond to any form of system tuning. Again, the only
appropriate solution here is to adjust the work load to decrease the
demand or add CPU capacity (see Section 13.7).
9.2.3 Excessive Interrupt State Activity
If you discover that blocking is not due to contention with other processes at the same or higher priorities, you need to find out if there is too much activity in interrupt state. In other words, is the rate of interrupts so excessive that it is preventing processes from using the CPU?
You can determine how much time is spent in interrupt state from the MONITOR MODES display. A percentage of time in interrupt state less than 10 percent is moderate; 20 percent or more is excessive. (The higher the percentage, the more effort you should dedicate to solving this resource drain.)
If the interrupt time is excessive, you need to explore which devices cause significant numbers of interrupts on your system and how you might reduce the interrupt rate.
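As a starting point, the sketch below displays interrupt-state time alongside the I/O, distributed lock, and SCS activity that typically generates it; which classes matter depends on your configuration:
$ ! Interrupt time together with likely sources of interrupts
$ MONITOR MODES,IO,DLOCK,SCS /AVERAGE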
The decisions you make will depend on the source of heavy interrupts.
Perhaps they are due to communications devices or special hardware used
in real-time applications. Whatever the source, you need to find ways
to reduce the number of interrupts so that the CPU can handle work from
other processes. Otherwise, the solution may require you to adjust the
work load or acquire CPU capacity (see Section 13.7).
9.2.4 Disguised Memory Limitation
Once you have either ruled out or resolved a CPU limitation, you need
to determine which other resource limitation produces the block. Your
next check should be for the amount of idle time. See Figure A-17.
Use the MONITOR MODES command. If there is any idle time, another
resource is the problem and you may be able to tune for a solution. If
you reexamine the MONITOR STATES display, you will likely observe a
number of processes in the COMO state. You can conclude that this
condition reflects a memory limitation, not a CPU limitation. Follow
the procedures described in Chapter 7 to find the cause of the
blockage, and then take the corrective action recommended in
Chapter 10.
9.2.5 Operating System Overhead
If the MONITOR MODES display indicates that there is no idle time, your CPU is 100 percent busy. You will find that processes are in the COM state on the MONITOR STATES display. You must answer one more question. Is the CPU being used for real work or for nonessential operating system functions? If there is operating system overhead, you may be able to reduce it.
Analyze the MONITOR MODES display carefully. If your system exhibits excessive kernel mode activity, it is possible that the operating system is incurring overhead in the areas of memory management, I/O handling, or scheduling. Investigate the memory limitation and I/O limitation (Chapters 7 and 8), if you have not already done so.
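For example, the sketch below displays kernel-mode time together with the paging, I/O, lock, and file-system activity that usually accounts for it:
$ ! Kernel-mode time with its most common contributors
$ MONITOR MODES,PAGE,IO,LOCK,FCP /AVERAGE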
Once you rule out the possibility of improving memory management or I/O
handling, the problem of excessive kernel mode activity might be due to
scheduling overhead. However, you can do practically nothing to tune
the scheduling function. There is only one case that might respond to
tuning. The clock-based rescheduling that can occur at quantum end is
costlier than the typical rescheduling that is event driven by process
state. Explore whether the value of the system parameter QUANTUM is too
low and can be increased to bring about a performance improvement by
reducing the frequency of this clock-based rescheduling (see
Section 13.4). If not, your only other recourse is to adjust the work
load or acquire CPU capacity (see Section 13.7).
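If you do experiment with QUANTUM, Section 13.4 is the authoritative procedure; the sketch below shows only a temporary, dynamic change (the value 30, that is 300 milliseconds, is an example), and a permanent change should normally be made through MODPARAMS.DAT and AUTOGEN.
$ ! QUANTUM is a dynamic parameter; WRITE ACTIVE applies the change without a reboot
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SET QUANTUM 30
SYSGEN> WRITE ACTIVE
SYSGEN> EXIT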
9.2.6 RMS Misused
If the MONITOR MODES display indicates that a great deal of time is
spent in executive mode, it is possible that RMS is being misused. If
you suspect this problem, proceed to the steps described in
Section 8.3.3 for RMS-induced I/O limitations, making any changes that
seem indicated. You should also consult the Guide to OpenVMS File Applications.
9.2.7 CPU at Full Capacity
If at this point in your investigation the MONITOR MODES display
indicates that most of the time is spent in supervisor mode or user
mode, you are confronted with a situation where the CPU is performing
real work and the demand exceeds the capacity. You must either make
adjustments in the work load to reduce demand (by more efficient coding
of applications, for example) or you must add CPU capacity (see
Section 13.7).
9.3 MONITOR Statistics for the CPU Resource
Use the following MONITOR commands to obtain the appropriate statistic:
Command | Statistic |
---|---|
Compute Queue | |
STATES | Number of processes in compute (COM) and compute outswapped (COMO) scheduling states |
Estimating CPU Capacity | |
STATES | All items |
MODES | Idle time |
Voluntary Wait States | |
STATES | Number of processes in local event flag wait (LEF), common event flag wait (CEF), hibernate (HIB), and suspended (SUSP) states |
LOCK | ENQs Forced to Wait Rate |
MODES | MP synchronization |
Involuntary Wait States | |
STATES | Number of processes in miscellaneous resource wait (MWAIT) state |
PROCESSES | Types of resource waits (RW xxx) |
Reducing CPU Consumption | |
MODES | All items |
Interrupt State | |
IO | Direct I/O Rate, Buffered I/O Rate, Page Read I/O Rate, Page Write I/O Rate |
DLOCK | All items |
SCS | All items |
MP Synchronization Mode | |
MODES | MP Synchronization |
IO | Direct I/O Rate, Buffered I/O Rate |
DLOCK | All items |
PAGE | All items |
DISK | Operation Rate |
Kernel Mode | |
MODES | Kernel mode |
IO | Page Fault Rate, Inswap Rate, Logical Name Translation Rate |
LOCK | New ENQ Rate, Converted ENQ Rate, DEQ Rate |
FCP | All items |
PAGE | Demand Zero Fault Rate, Global Valid Fault Rate, Page Read I/O Rate |
DECNET | Sum of packet rates |
CPU Load Balancing | |
MODES | Time spent by processors in each mode |
See Table B-1 for a summary of MONITOR data items.
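To gather these statistics in one pass, the sketch below records the classes discussed in this chapter and then produces a summary report; the interval and file names are illustrative:
$ ! Record the relevant classes for later analysis
$ MONITOR STATES,MODES,PROCESSES,IO,LOCK,PAGE,DISK -
  /INTERVAL=60 /NODISPLAY /RECORD=SYS$MONITOR:CPU_DATA.DAT
$ ! Produce a summary report from the recording
$ MONITOR STATES,MODES,PROCESSES,IO,LOCK,PAGE,DISK -
  /INPUT=SYS$MONITOR:CPU_DATA.DAT /SUMMARY=CPU_DATA.SUM /NODISPLAY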