pfm(7)
NAME
pfm - The on-chip performance counter pseudo-device
SYNOPSIS
pseudo-device pfm
DESCRIPTION
The pfm pseudo-device is the interface to Alpha implementation-specific
on-chip performance counters. A set of ioctl calls forms the interface, as
defined in the <sys/pfcntr.h> header file.
The kernel in use must have the pfm pseudo-device configured into it. To do
this, use one of the following methods:
· Add the following line to the kernel configuration file and rebuild
the kernel. Do not use this method if CPU hot-swap is supported by the
system, because it does not allow pfm to be easily unconfigured as
required for a hot-swap; instead, use the sysconfig method below.
pseudo-device pfm
· Enter the following command from the root account. Do not configure
pfm if CPU hot-swap is anticipated.
# sysconfig -c pfm
If pfm is configured, the CPU hot-swap procedure requires that it be
unconfigured, using the following command, before any CPU is swapped.
# sysconfig -u pfm
The autosysconfig program can be used to automatically load the
configurable pfm device at each system startup.
EV4 INTERFACE DESCRIPTION
The EV4 implementations (21064, 21064A, 21066, and 21068) have two
counters, each of which can be independently programmed to count certain
internal or external events. Each counter interrupts the system when a
certain number of the selected events have been counted. Any combination of
the following three actions can happen at each interrupt (tick):
· Counters (PFM_COUNTERS)
· IPL histogramming (PFM_IPL)
· User or kernel PC profiling (PFM_PROFILING)
These values are defined in <sys/pfcntr.h> and can be selected orthogonally
by bitwise ORing the selections together and passing the result to the
PCNTSETITEMS ioctl request.
If counters are enabled, the interrupt count for this event is incremented.
This records the number of times each event has happened, in multiples of
the interrupt frequency selected (PCNTSETMUX). Note that the driver can
only count the interrupts generated; no direct access to the EV4 on-chip
counter values is provided.
If IPL histogramming is enabled, the appropriate entry in the IPL array is
incremented. The entries are:
· 0-5 refer to IPL0-IPL5.
· 6 is unused. (IPL6 is the level of the performance counter
interrupts.)
· 7 counts "idle" ticks (IPL = 0 and current_thread = idle_thread).
· 8 counts user mode ticks.
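The entry layout above can be sketched as a small lookup table. This is
only an illustration: ipl_slot_label and IPL_HIST_SLOTS are hypothetical
names, not part of <sys/pfcntr.h>, and the sketch assumes the histogram has
exactly the nine entries listed.

```c
#include <stddef.h>

/* Number of entries in the IPL histogram as described above (0 through 8). */
#define IPL_HIST_SLOTS 9

/*
 * ipl_slot_label is a hypothetical helper, not part of <sys/pfcntr.h>.
 * It returns a printable label for histogram entry idx, or NULL if idx
 * is outside the documented range.
 */
const char *ipl_slot_label(int idx)
{
    static const char *const labels[IPL_HIST_SLOTS] = {
        "IPL0", "IPL1", "IPL2", "IPL3", "IPL4", "IPL5",
        "unused", /* IPL6 is the level of the counter interrupts */
        "idle",   /* IPL = 0 and current_thread = idle_thread */
        "user"    /* user mode ticks */
    };

    if (idx < 0 || idx >= IPL_HIST_SLOTS)
        return NULL;
    return labels[idx];
}
```

A reporting tool could call such a helper once per entry when printing the
histogram returned by PCNTGETIPLHIS.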
If profiling is enabled, a PC sample is added to the profile histogram if
the mode is correct (kernel or user).
Each CPU in a multiprocessor platform has separate counters, and the device
can be opened in three different ways:
· PCNTOPENONE opens and collects data on only the CPU that the program
is running on.
· PCNTOPENEACH opens all CPUs but keeps data for each one separately.
· PCNTOPENALL opens all CPUs, aggregating the data for all CPUs into one
collection.
These values are defined in <sys/pfcntr.h> and are bitwise ORed into the
mode passed to the device open call. Note that if PCNTOPENONE is selected,
the opening thread/process must be bound to that processor; otherwise, the
open will fail. It must also remain bound to that processor for as long as
the driver is in use; otherwise, the results are unpredictable.
The following ioctl calls apply to the performance counter pseudo-device.
Note that most of the EV4 ioctls can also be used on EV5, EV6, and EV7:
PCNTRDISABLE
Disables performance counter interrupts on the CPU. Takes no arguments.
PCNTRENABLE
Enables performance counter interrupts on the CPU. Takes no arguments.
PCNTSETMUX (EV4 only)
Selects the statistics to be counted by each performance counter and
the interrupt frequency. Takes a pointer to a struct iccsr that
contains the MUX register values desired. The fields in this register
are:
iccsr_pc0
Controls the interrupt frequency of performance counter 0. If set,
interrupt frequency is every 2^12 events. If clear, interrupt
frequency is every 2^16 events.
iccsr_pc1
Controls the interrupt frequency of performance counter 1. If set,
interrupt frequency is every 2^8 events. If clear, interrupt
frequency is every 2^12 events.
iccsr_mux0
Selects the event counted by counter 0. One of: PF_ISSUES,
PF_PIPEDRY, PF_LOADI, PF_PIPEFROZEN, PF_BRANCHI, PF_CYCLES,
PF_PALMODE, PF_NONISSUES, PF_EXTPIN0
iccsr_mux1
Selects the event counted by counter 1. One of: PF_DCACHE,
PF_ICACHE, PF_DUAL, PF_BRANCHMISS, PF_FPINST, PF_INTOPS, PF_STOREI,
PF_EXTPIN1
iccsr_disable
Contains two bits, each of which disables data collection on the
specified counter. For example, set to 2 to disable counter 1 and
enable counter 0. Cannot be set to 3 (which disables both counters,
causing PCNTSETMUX to return EINVAL).
iccsr_ign0, iccsr_ign1, iccsr_ign2, iccsr_ign3
Do not set these fields. Must be zero.
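Because the driver counts interrupts rather than events, an event total has
to be estimated from the interrupt count and the frequency selected with
iccsr_pc0 or iccsr_pc1. The following is a minimal sketch of that
arithmetic. Both function names are hypothetical and are not defined in
<sys/pfcntr.h>.

```c
/*
 * Hypothetical helpers, not part of <sys/pfcntr.h>.
 *
 * ev4_events_per_interrupt returns the number of events between
 * interrupts for EV4 counter 0 or 1, given the value of the matching
 * iccsr_pc0 or iccsr_pc1 bit. Returns 0 for an unknown counter.
 */
unsigned long ev4_events_per_interrupt(int counter, int freq_bit_set)
{
    switch (counter) {
    case 0: /* iccsr_pc0: set = 2^12, clear = 2^16 */
        return freq_bit_set ? (1UL << 12) : (1UL << 16);
    case 1: /* iccsr_pc1: set = 2^8, clear = 2^12 */
        return freq_bit_set ? (1UL << 8) : (1UL << 12);
    default:
        return 0;
    }
}

/* Approximate event total: interrupts counted times events per interrupt. */
unsigned long ev4_estimated_events(unsigned long interrupts, int counter,
                                   int freq_bit_set)
{
    return interrupts * ev4_events_per_interrupt(counter, freq_bit_set);
}
```

The estimate is a lower bound in practice, since events counted since the
last interrupt are not visible to the driver on EV4.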
PCNTSETITEMS
Selects the data items to be collected at each tick:
· Counters (PFM_COUNTERS)
· IPL histogramming (PFM_IPL)
· User or kernel PC profiling (PFM_PROFILING - see PCNTSETUADDR,
PCNTSETURANGE, PCNTSETKADDR, and PCNTSETKRANGE)
These values are defined in <sys/pfcntr.h> and can be selected
orthogonally by bitwise ORing the selections together into the integer
argument. If no items are selected, returns EINVAL.
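The selection rule can be sketched as a check a caller might make before
issuing the ioctl. The DEMO_ bit values below are placeholders chosen for
illustration; the real PFM_COUNTERS, PFM_IPL, and PFM_PROFILING values come
from <sys/pfcntr.h>, and demo_check_items is a hypothetical name.

```c
#include <errno.h>

/* Placeholder bit values for illustration only; the real PFM_COUNTERS,
 * PFM_IPL, and PFM_PROFILING values are defined in <sys/pfcntr.h>. */
enum {
    DEMO_PFM_COUNTERS  = 1 << 0,
    DEMO_PFM_IPL       = 1 << 1,
    DEMO_PFM_PROFILING = 1 << 2
};

/*
 * demo_check_items is a hypothetical pre-check mirroring the documented
 * rule: an empty selection is rejected with EINVAL. Returns 0 if at
 * least one known item is selected.
 */
int demo_check_items(int items)
{
    const int known = DEMO_PFM_COUNTERS | DEMO_PFM_IPL | DEMO_PFM_PROFILING;

    if ((items & known) == 0)
        return EINVAL;
    return 0;
}
```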
PCNTLOGALL
Sets the on-chip counters to count all system activity. Takes no
arguments and returns no errors.
PCNTLOGSELECT
Sets the on-chip counters to count only those threads/processes with
the PCB_PME_BIT set in their PCBs, and sets the PCB_PME_BIT for this
process. This bit is inherited across fork/exec, setting it for all
children. Takes no arguments and returns no errors.
PCNTCLEARPCBPME
Clears the PCB_PME_BIT in the PCB of the current process. Takes no
arguments and returns no errors.
PCNTCLEARCNT
Clears the driver's internal counters appropriate to the actions
selected. If PFM_COUNTERS is enabled, the interrupt counters and cycle
counter value are reset. If PFM_IPL is enabled, the IPL histogram is
reset. If neither is enabled (PFM_PROFILING only), returns EINVAL and
nothing is cleared. Takes no arguments.
PCNTGETCNT (EV4 only)
Returns the driver's counter values and the pcc value(s). Takes a
pointer to an array of struct pfcntrs; the array is filled in with the
values. Sample usage of this ioctl is:
struct pfcntrs cntrs[NUM_OF_CPUS];
struct pfcntrs *pfcntrs = cntrs;
ioctl (fd, PCNTGETCNT, &pfcntrs);
If the driver is opened in mode PCNTOPENEACH, the underlying array must
be big enough to hold all of the data for each CPU; otherwise, EFAULT
is returned. If the driver is opened in mode PCNTOPENONE or
PCNTOPENALL, the array can be one element. If PFM_COUNTERS is not
enabled, returns EINVAL.
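The array sizing rule can be sketched as follows. The enum is a stand-in
for the real open modes, and demo_result_elements is a hypothetical helper;
neither is defined in <sys/pfcntr.h>. The CPU count is assumed to be
supplied by the caller.

```c
#include <stddef.h>

/* Stand-ins for the three open modes; the real PCNTOPENONE, PCNTOPENEACH,
 * and PCNTOPENALL constants are defined in <sys/pfcntr.h>. */
enum demo_open_mode { DEMO_OPEN_ONE, DEMO_OPEN_EACH, DEMO_OPEN_ALL };

/*
 * demo_result_elements is a hypothetical helper: it returns how many
 * array elements the caller should provide for a given open mode.
 * ncpus is the number of CPUs in the system, supplied by the caller.
 */
size_t demo_result_elements(enum demo_open_mode mode, size_t ncpus)
{
    /* Only PCNTOPENEACH keeps separate data for every CPU. */
    return (mode == DEMO_OPEN_EACH) ? ncpus : 1;
}
```

The same sizing applies to the struct pfipls array used with PCNTGETIPLHIS.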
PCNTGETRSIZE
Returns the number of bytes of data available to read for getting the
PC profiling samples. By default this will be equal to one fourth of
the address range being profiled. (By default, profiling data is kept
as one bucket per four instructions, which corresponds to a default
profiling stride of 4 instructions per sample count.) If the driver is
opened in mode PCNTOPENEACH, this number of bytes will be multiplied by
the number of CPUs.
To set the profiling address range and stride (and select user or
kernel profiling), use the PCNTSETURANGE or PCNTSETKRANGE ioctl,
respectively. To set the address range without changing the stride, you
can also use the PCNTSETUADDR or PCNTSETKADDR ioctl.
The PCNTGETRSIZE ioctl takes a pointer to a long and returns no errors.
The returned value will be 0 if profiling is not currently selected or
if the address range and mode have not been specified.
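For planning a read buffer, the default size described above can be
estimated ahead of time. This is a sketch under the stated defaults only;
demo_default_rsize is a hypothetical name, and a program should still use
the value that PCNTGETRSIZE actually returns.

```c
/*
 * demo_default_rsize is a hypothetical estimate of the value PCNTGETRSIZE
 * reports with the default stride: one fourth of the profiled address
 * range, multiplied by the CPU count when per_cpu is nonzero (the
 * PCNTOPENEACH case). Returns 0 for an empty or inverted range.
 */
unsigned long demo_default_rsize(unsigned long start, unsigned long end,
                                 int per_cpu, unsigned long ncpus)
{
    unsigned long bytes;

    if (end <= start)
        return 0;
    bytes = (end - start) / 4;
    return per_cpu ? bytes * ncpus : bytes;
}
```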
PCNTGETIPLHIS
Returns the current IPL histogram(s). Takes a pointer to an array of
struct pfipls; the array is filled in with the values. Sample usage of
this ioctl is:
struct pfipls ipls[NUM_OF_CPUS];
struct pfipls *pfipls = ipls;
ioctl (fd, PCNTGETIPLHIS, &pfipls);
If the driver is opened in mode PCNTOPENEACH, the underlying array must
be big enough to hold all of the data for each CPU. If the underlying
array is not big enough, EFAULT might be returned or other data in the
program might be overwritten.
If the driver is opened in mode PCNTOPENONE or PCNTOPENALL, the array
can be one element. If PFM_IPL is not enabled, returns EINVAL.
PCNTCALLER
If kernel mode profiling is turned on (with PCNTSETKADDR or
PCNTSETKRANGE), directs the profiler to collect data on the caller of
certain system utility routines (for example, bcopy, bzero,
simple_lock). If kernel mode profiling is not turned on, returns
EINVAL. (See also the descriptions of PCNTSETKADDR and PCNTSETKRANGE
for information about their use in PCNTCALLER mode.)
PCNTSETKADDR
Sets the kernel address range to profile and turns on kernel mode PC
profiling. If the device is not open for profiling, returns EINVAL. If
memory cannot be obtained for the sample data, returns ENOMEM.
If PCNTCALLER kernel profiling mode is engaged, specifies an additional
address range to collect profiling data on the caller of a routine,
instead of the routine itself. Takes a start and end address range. Up
to 4 additional address ranges may be added; additional attempts will
return ENOSPC. If the addresses are out of range of kernel text, not
aligned, or otherwise invalid, returns EFAULT.
Note that PCNTSETKRANGE performs the same functions as PCNTSETKADDR
and, in addition, lets you set the profiling stride.
PCNTSETKRANGE
Sets the kernel address range to profile and sets the profile stride
(the number of consecutive instructions grouped together for each
sample count). The stride must be zero or a power of two (for example, 0,
1, 2, 4, 8). A zero stride means there should be only one counter for the
whole address range. This ioctl also turns on kernel mode PC profiling.
If the device is not open for profiling, returns EINVAL. If memory
cannot be obtained for the sample data, returns ENOMEM.
If PCNTCALLER kernel profiling mode is engaged, specifies an additional
address range to collect profiling data on the caller of a routine,
instead of the routine itself. Takes a start and end address range, and
ignores the stride. Up to 4 additional address ranges may be added;
additional attempts will return ENOSPC. If the addresses are out of
range of kernel text, not aligned, or otherwise invalid, returns
EFAULT.
PCNTSETUADDR
Sets the user address range to profile and turns on user mode PC
profiling. If the device is not open for profiling, returns EINVAL. If
memory cannot be obtained for the sample data, returns ENOMEM. Note
that PCNTSETURANGE performs the same functions as PCNTSETUADDR and, in
addition, lets you set the profiling stride.
PCNTSETURANGE
Sets the user address range to profile and sets the profile stride (the
number of consecutive instructions grouped together for each sample
count). The stride must be zero or a power of two (for example, 0, 1, 2, 4,
8). A zero stride means there should be only one counter for the whole
address range. This ioctl also turns on user mode PC profiling. If the
device is not open for profiling, returns EINVAL. If memory cannot be
obtained for the sample data, returns ENOMEM.
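The stride rule for PCNTSETKRANGE and PCNTSETURANGE can be checked before
the ioctl is issued. The following is a minimal sketch; demo_stride_ok is a
hypothetical name and is not a driver routine.

```c
/*
 * demo_stride_ok is a hypothetical pre-check, not a driver routine.
 * It accepts zero (one counter for the whole range) or any power of two,
 * and returns 1 for an acceptable stride, 0 otherwise.
 */
int demo_stride_ok(unsigned long stride)
{
    /* A power of two has exactly one bit set, so stride & (stride - 1)
     * clears that bit and leaves zero. Zero also passes this test. */
    return (stride & (stride - 1)) == 0;
}
```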
Only one process can have the pfm device open at any point in time. If the
device is opened with PCNTOPENONE, only the specified CPU is considered
open; subsequent open attempts will return EBUSY. If the device is opened
with PCNTOPENALL or PCNTOPENEACH, all CPUs must be available; otherwise,
returns EBUSY.
EBUSY will also be returned if another tool is using the performance
counters (or has used them but has not restored the default performance
counter interrupt handler). In this case, if you are sure no other users
are using the performance counters, re-execute the open call with superuser
privilege. This will reset the busy status and proceed to use the counters.
It is sufficient to open the device read-only. Opening the device will
disable interrupts (PCNTRDISABLE) and log all system activity (PCNTLOGALL),
generating simple counters only. The counters are not cleared. Closing the
device automatically disables interrupts and resets the service routines
(PCNTRDISABLE).
EV4 DETAILED STAT DESCRIPTIONS
Following are more detailed descriptions of each of the events that can be
counted by the two on-chip counters associated with the EV4
implementations. For more information, consult the 21064 chip
specification.
Counter 0:
Issues (Total Issues Divided By 2)
This counter is incremented by one for each cycle in which two
instructions are issued and is incremented by 1/2 for each cycle in
which one instruction is issued. The number of cycles in which one
instruction is issued can be found by using the Dual Issues field and
the equation S = (I - D) * 2, where S = Single Issues, D = Dual Issues,
and I = Issues.
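The equation above can be applied directly to a pair of samples. In this
sketch the function name is illustrative, and both inputs are assumed to be
event totals already derived from the interrupt counts.

```c
/*
 * single_issue_cycles applies S = (I - D) * 2 from the text above.
 * issues is the Issues count (counter 0) and dual is the Dualissues
 * count (counter 1), both already converted to event totals. The
 * function name is illustrative. Returns 0 if dual exceeds issues,
 * which would indicate inconsistent samples.
 */
unsigned long single_issue_cycles(unsigned long issues, unsigned long dual)
{
    if (dual > issues)
        return 0;
    return (issues - dual) * 2;
}
```

Issues is selected on counter 0 and Dualissues on counter 1, so both inputs
can be collected in the same run.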
Pipedry
This counter is incremented by one for each cycle in which nothing is
issued due to the lack of valid instruction stream data. The causes
could be instruction cache refill operations (due to normal sequential
operation or delays while fetching the target of a branch) or delays
caused by the draining of the pipeline in response to an exception.
Loads
This counter is incremented for each load instruction. Note: If a load
misses in the primary data cache, the replay of the instruction will
cause the load counter to be incremented again.
Pipefrozen
This counter is incremented for each cycle in which nothing is issued
due to a resource conflict within the pipeline. Examples are:
· Not all source and destination registers are available
· A load miss or write buffer overflow occurs
· A conditional branch cannot be issued in the cycle following a
jump
· Memory Barrier instruction processing can cause the pipe to freeze
Branches
This counter is incremented for each branch instruction.
Cycles
This counter is incremented for each cycle.
PALcycles
This counter is incremented for each cycle spent in PALmode.
Nonissues (Total Non-issues Divided By 2)
This counter is incremented by one for each cycle in which no
instructions are issued and is incremented by 1/2 for each cycle in
which only one instruction is issued. This counter is the complement of
the Issues counter: in each cycle the two increments sum to 1, so over any
interval Non-issues = Cycles - Issues.
Victims (External Pin 0)
This counter is incremented for each external event supplied to
external pin 0. On the DEC 3000/500 and DEC 3000/400, this pin is
connected to logic that indicates external cache misses with victims. A
victim is a data block that must be written back to main memory before
it is reused.
Counter 1:
Dcache
This counter is incremented for each primary data cache miss. Note:
this counter actually is incremented each time a primary data cache
probe does not complete in one cycle. This includes all misses, but
also includes hits that are stalled for other reasons such as bus
traffic holding previous misses pending.
Icache
This counter is incremented for each primary instruction cache miss.
Dualissues
This counter is incremented for each cycle in which two instructions
are dual-issued.
Mispredicts
This counter is incremented for each incorrectly predicted branch.
Floatops
This counter is incremented for each floating-point operate
instruction. The floating-point operate instructions do not include the
floating-point load, floating-point branch and floating-point store
instructions.
Intops
This counter is incremented for each integer operate instruction as
well as for each Load Address and Load Address High instruction.
Stores
This counter is incremented for each store instruction.
Novictims (External Pin 1)
This counter is incremented for each external event supplied to
external pin 1. On the DEC 3000/500 and DEC 3000/400, this pin is
connected to logic that indicates external cache misses without
victims.
Most items count the instances of different types of instructions. These
counters are incremented for each occurrence, and they do not give
information about the cost of executing the instruction. The Pipe
Frozen/Dry counter increments for each frozen or dry cycle, not for each
instance of pipe freeze or pipe dry.
EV5 INTERFACE DESCRIPTION
The EV5 implementations (21164, 21164A, and 21164PC) have three counters,
each of which can be independently programmed to count certain internal or
external events. They operate in much the same way as on EV4. Most of the
EV4 ioctl calls can also be used on EV5. Here are some descriptions for
EV5-specific ioctl calls:
PCNT5MUX
Selects the events counted by all three counters. The argument is a
bitwise OR of one event name for each counter. See <sys/pfcntr.h> for
the identifiers for the events: PF5_MUX0_*, PF5_MUX1_*, PF5_MUX2_*.
PCNT5FREQ
Selects the sampling interrupt frequency for all three counters. The
argument is a bitwise OR of one frequency indicator for each counter. A
frequency of 256 requires superuser privilege because it can place an
extremely heavy load on the system. Only carefully selected rare events
should be counted with such a high frequency. A lower frequency is
usually advisable, for example:
PF5_C0_INT_EVERY_65536
PF5_C1_INT_EVERY_65536
PF5_C2_INT_EVERY_16384
PCNT5ENABLE, PCNT5RESTART
Enables selected counters. (PCNT5RESTART zeroes them first.) The
argument is the address of the pmctrs_ev5_long member of a union
pmctrs_ev5, with the following additional field-member assignments:
· pmctrs_ev5_cpu = PMCTRS_ALL_CPUS
· pmctrs_ev5_select = any combination of PF5_SEL_COUNTER_0,
PF5_SEL_COUNTER_1, and PF5_SEL_COUNTER_2 using a bitwise OR
operator
PCNT5DISABLE
Disables selected counters.
PCNT5CLEAR, PCNT5SETCNTRS
Clears or writes selected counters on selected CPUs. The argument is
the address of the pmctrs_ev5_long member of a union pmctrs_ev5. See
<sys/pfcntr.h> for more information.
PCNT5CTXTS
Sets contexts in which to count. The argument is a bitwise OR of
selected PF5_CTXT_* values.
PCNT5GETCNT
Similar to EV4's PCNTGETCNT except that the argument is a pointer to an
array of struct pfcntrs_ev5.
PCNT5READCNTRS
Similar to PCNT5GETCNT except that the driver's counter values (i.e.,
the number of interrupts from each counter) are shifted left by the
counter width. The current raw hardware counters are read and added to
the tally.
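The tally described for PCNT5READCNTRS can be modeled as a shift and an
add. This is a sketch of the arithmetic only; demo_combined_count is a
hypothetical name, and the counter width is passed in by the caller rather
than assumed here.

```c
/*
 * demo_combined_count is a hypothetical model of the tally described
 * above: the interrupt count shifted left by the counter width, plus the
 * raw hardware counter. width_bits is supplied by the caller and is
 * assumed to be smaller than the width of unsigned long.
 */
unsigned long demo_combined_count(unsigned long interrupts,
                                  unsigned int width_bits,
                                  unsigned long raw)
{
    return (interrupts << width_bits) + raw;
}
```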
PCNT5GETCNTRS
Reads the hardware counters from the selected CPU. The argument is the
address of the pmctrs_ev5_long member of a union pmctrs_ev5. See
<sys/pfcntr.h> for more information.
EV5 DETAILED STAT DESCRIPTIONS
Following are more detailed descriptions of each of the events that can be
counted by the three on-chip counters associated with the EV5
implementations. For more information, see the 21164 or 21164PC chip
specification.
All EV5 Implementations (EV5, EV56, PCA56)
Counter 0:
Cycles0
This counter is incremented for each cycle. (Note that counter 2 also
has a cycles counter.)
Issues
This counter is incremented for each instruction.
Counter 1:
Nonissues
This counter is incremented for each cycle in which valid instructions
are ready for issue, but none are issued because of a pipeline stall or
because the resources they need are not available.
Splitissue
This counter is incremented for each cycle in which some but not all of
the maximum of four instructions are issued.
Pipedry
This counter is incremented for each cycle in which no instructions are
ready to issue.
Replay
This counter is incremented for each time an instruction has to be
executed again (instead of those behind it in the pipeline) because
resources it needed were found to be unavailable the first time it
executed.
Singleissues
This counter is incremented for each cycle in which one instruction is
issued.
Dualissues
This counter is incremented for each cycle in which two instructions
are issued.
Tripleissues
This counter is incremented for each cycle in which three instructions
are issued.
Quadissues
This counter is incremented for each cycle in which four instructions
are issued.
Flowchanges
This counter is incremented for each branch, jump, or return
instruction.
Intops
This counter is incremented for each integer operation.
Floatops
This counter is incremented for each floating-point operation.
Loads
This counter is incremented for each load operation.
Stores
This counter is incremented for each store operation.
Icacheacc
This counter is incremented for each Instruction Cache access.
Dcacheacc
This counter is incremented for each Data Cache access.
Counter 2:
Longstalls
This counter is incremented for each long pipeline stall (over 15
cycles).
Pcmispredicts
This counter is incremented for each PC misprediction.
Branchmispredicts
This counter is incremented for each branch misprediction.
Icachemisses
This counter is incremented for each instruction not found in either
the Instruction Cache or the associated Refill Buffer.
Itbmisses
This counter is incremented for each Instruction Cache miss for which
the instruction's page entry is not stored in the Instruction
Translation Buffer.
Dcacheldmisses
This counter is incremented for each load of a value that is not in the
Data Cache.
Dtbmisses
This counter is incremented for each Data Cache miss for which the data
page entry is not stored in the Data Translation Buffer.
Ldsmerged
This counter is incremented for each load from an address that misses
in the Data Cache but is merged with another load from the same address
that is already in the Missed Address File.
Ldureplays
This counter is incremented for each Data Cache miss (for a load) that
causes the replay of a later instruction that uses the loaded value.
Fullreplays
This counter is incremented for each store that is replayed because the
Write Buffer is full and for each load that is replayed because the
Missed Address File is full.
Externalinput
This counter is incremented for each cycle for which the perf_mon_h
External Input pin is true.
Cycles2
This counter is incremented for each cycle. (Note that counter 0 also
has a cycles counter.)
Memorybarriers
This counter is incremented for each stall cycle resulting from a
Memory Barrier.
Lockedloads
This counter is incremented for each Locked Load instruction.
EV5 and EV56 Implementations Only
Counter 1:
Scacheacc
This counter is incremented for each Secondary Cache access (for either
instructions or data).
Scachereads
This counter is incremented for each read from the Secondary Cache.
Scachewrites1
This counter is incremented for each write to the Secondary Cache.
(Note that counter 2 also has a scachewrites counter.)
Scachevictim
This counter is incremented for each time a data block in the Secondary
Cache must be written back to main memory before it is reused.
Bcacheref
This counter is incremented for each access to the optional, board-
level Backup Cache.
Bcachevictim
This counter is incremented for each time a data block in the Backup
Cache must be written back to main memory before it is reused.
Sysreqs
This counter is incremented for each system request.
Counter 2:
Scachemisses
This counter is incremented for each Secondary Cache miss.
Scachereadmisses
This counter is incremented for each Secondary Cache Read miss.
Scachewritemisses
This counter is incremented for each Secondary Cache Write miss.
Scachesharedwrites
This counter is incremented for each Secondary Cache Shared Write
operation.
Scachewrites2
This counter is incremented for each Secondary Cache Write operation.
(Note that counter 1 also has a scachewrites counter.)
Bcachemisses
This counter is incremented for each miss in the optional board-level
Backup Cache.
Systeminvalidates
This counter is incremented for each System Invalidate operation.
Systemreadrequests
This counter is incremented for each System Read Request.
PCA56 Implementation Only
Counter 1:
bcachereads
This counter is incremented for each read request from the MBOX.
bcachedreadhits
This counter is incremented for each Dstream read request that hits in
the Bcache.
bcachedreadfills
This counter is incremented for each Dstream read fill to the Bcache.
bcachewrites
This counter is incremented for each write request from the MBOX.
bcachecleanwritehits
This counter is incremented for each write that hits a clean block in
the Bcache.
bcachevictims
This counter is incremented for each VICTIM command issued by the
21164PC.
readmisstwo
This counter is incremented each time a second READ_MISS is sent to the
system while an earlier READ_MISS command is still outstanding.
Counter 2:
bcachedreads
This counter is incremented for each Dstream read request from the
MBOX.
bcachereadhits
This counter is incremented for each read request that hits in the
Bcache.
bcachereadfills
This counter is incremented for each read fill to the Bcache.
bcachewritehits
This counter is incremented for each write that hits in the Bcache.
bcachewritefills
This counter is incremented for each write fill to the Bcache.
sysreadflushhits
This counter is incremented for each system READ or FLUSH hit in the
Bcache.
sysreadflushmisses
This counter is incremented for each system READ or FLUSH request.
readmissthree
This counter is incremented each time a third READ_MISS is sent to the
system while two earlier READ_MISS commands are still outstanding.
EV6 INTERFACE DESCRIPTION
The EV6 implementation (21264) has two counters, each of which can be
programmed to count certain internal or external events. They operate in
much the same way as the counters on EV4 and EV5. Most of the EV4 ioctl
calls can also be used on EV6. Below are some descriptions for EV6-specific
ioctl calls. Note that the EV6 interface should also be used on EV7
systems.
PCNT6MUX
Selects the events counted by the two counters. The argument is a
bitwise OR of one event name for each counter. See <sys/pfcntr.h> for
the identifiers for the events: PF6_MUX0_*, PF6_MUX1_*.
PCNT6ENABLE, PCNT6RESTART, PCNT6ENABWRITE
Enables selected counters. PCNT6RESTART zeros them first.
PCNT6ENABWRITE sets them to specified values. The argument is the
address of the pmctrs_ev6_long member of a union pmctrs_ev6, with the
following additional field-member assignments:
· pmctrs_ev6_cpu = PMCTRS_ALL_CPUS
· pmctrs_ev6_select = any combination of PF6_SEL_COUNTER_0 and
PF6_SEL_COUNTER_1 using a bitwise OR operator.
PCNT6DISABLE
Disables selected counters.
PCNT6CLEAR, PCNT6SETCNTRS
Clears or writes selected counters on selected CPUs. The argument is
the address of the pmctrs_ev6_long member of a union pmctrs_ev6. See
<sys/pfcntr.h> for more information.
PCNT6GETCNT
Similar to EV4's PCNTGETCNT except that the argument is a pointer to an
array of struct pfcntrs_ev6.
PCNT6GETCNTRS
Reads the hardware counters from the selected CPU. The argument is the
address of the pmctrs_ev6_long member of a union pmctrs_ev6. See
<sys/pfcntr.h> for more information.
PCNT6READCNTRS
Similar to PCNT6GETCNT except that the driver's counter values (i.e.,
the number of interrupts from each counter) are shifted left by the
counter width. The current raw hardware counters are read and added to
the tally.
EV6 DETAILED STAT DESCRIPTIONS
Following are more detailed descriptions of each of the events that can be
counted by the two on-chip counters associated with the EV6 implementation.
For more information, see the 21264 chip specification.
Counter 0:
cycles0
This counter is incremented for each cycle. (Note that counter 1 also
has a cycles counter.)
retinst
This counter is incremented for every retired instruction.
Counter 1:
cycles1
This counter is incremented for each cycle. (Note that counter 0 also
has a cycles counter.)
retcondbranch
This counter is incremented for each retired conditional branch.
retdtb1miss
This counter is incremented twice for each retired single dstream
translation buffer (DTB) miss.
retdtb2miss
This counter is incremented for each retired double DTB miss.
retitbmiss
This counter is incremented for each retired instruction translation
buffer (ITB) miss.
retunaltrap
This counter is incremented for each retired unaligned trap.
replay
This counter is incremented for each replay trap.
EV67 AND EV7 DETAILED STAT DESCRIPTIONS
Following are some descriptions of events that can be counted by the on-
chip counters associated with the EV67 implementation. The EV67 counters
may be used in two mutually exclusive modes: traditional aggregate and
profile-me. The EV67 traditional aggregate counters are not completely
independent. Any one statistic may be selected, or one of the following
pairs may be selected: (cycles0, replay); (retinst, cycles1); (retinst,
bcachemisses). EV7 provides the same statistics that EV67 does.
Counter 0:
cycles0
This counter is incremented for each cycle. (Note that counter 1 also
has a cycles counter.)
retinst
This counter is incremented for every retired instruction.
Counter 1:
cycles1
This counter is incremented for each cycle. (Note that counter 0 also
has a cycles counter.)
bcachemisses
This counter is incremented for each miss in the Backup Cache.
replay
This counter is incremented for each replay trap.
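The restriction on EV67 traditional aggregate selections can be sketched as
a validity check. The enum identifiers and the function name are
illustrative only; they are not the PF6_MUX0_* and PF6_MUX1_* names from
<sys/pfcntr.h>.

```c
/* Illustrative identifiers for the EV67 aggregate statistics; these are
 * not the PF6_MUX* names from <sys/pfcntr.h>. */
enum demo_ev67_stat {
    DEMO_NONE,
    DEMO_CYCLES0, DEMO_RETINST,                  /* counter 0 */
    DEMO_CYCLES1, DEMO_BCACHEMISSES, DEMO_REPLAY /* counter 1 */
};

/*
 * demo_ev67_selection_ok returns 1 if the (counter 0, counter 1) choice
 * is a single statistic or one of the three documented pairs, else 0.
 * Pass DEMO_NONE for a counter that is not used.
 */
int demo_ev67_selection_ok(enum demo_ev67_stat c0, enum demo_ev67_stat c1)
{
    int c0_valid = (c0 == DEMO_CYCLES0 || c0 == DEMO_RETINST);
    int c1_valid = (c1 == DEMO_CYCLES1 || c1 == DEMO_BCACHEMISSES ||
                    c1 == DEMO_REPLAY);

    if (c0 == DEMO_NONE)
        return c1_valid; /* one counter 1 statistic alone */
    if (c1 == DEMO_NONE)
        return c0_valid; /* one counter 0 statistic alone */

    return (c0 == DEMO_CYCLES0 && c1 == DEMO_REPLAY) ||
           (c0 == DEMO_RETINST && c1 == DEMO_CYCLES1) ||
           (c0 == DEMO_RETINST && c1 == DEMO_BCACHEMISSES);
}
```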
EV67 profile-me mode and traditional aggregate counters work differently:
instead of counting events as done by traditional aggregate counters,
instructions in profile-me mode are uniformly selected and various events
are recorded during the execution of each selected instruction.
The descriptions below are written from the perspective of a uprofile or
kprofile user. For example, the *_per_ret statistics actually cause the pfm
driver to return (statistic, retired) pairs which are later processed by
uprofile or kprofile. Similarly, the freq statistic is merely the same as
the retired statistic until uprofile or kprofile postprocesses it.
Any one of the following profile-me statistics may be selected.
Profile-me:
abort
This statistic is incremented if the profiled execution is aborted.
abort_per_ret
This ratio is the abort statistic scaled by 100 and divided by the
retired statistic.
arith_trap
This statistic is incremented if the profiled execution causes an
arithmetic trap.
cbr_taken
This statistic is incremented if the profiled execution is a taken
conditional branch.
cbr_taken_per_ret
This ratio is the cbr_taken statistic scaled by 100 and divided by the
retired statistic.
cycles
This statistic is incremented by the approximate number of cycles the
execution was in flight.
cycles_per_ret
This ratio is the cycles statistic divided by the retired statistic.
delay
This statistic is incremented by the approximate retire delay of the
profiled execution.
delay_per_ret
This ratio is the delay statistic scaled by 100 and divided by the
retired statistic.
dstream_fault
This statistic is incremented if the profiled execution causes a
Dstream fault.
dtb_miss
This statistic is incremented if the profiled execution causes a DTB
single miss.
dtb_miss_per_ret
This ratio is the dtb_miss statistic scaled by 100 and divided by the
retired statistic.
dtb_miss3
This statistic is incremented if the profiled execution causes a DTB
double miss (3 level page tables).
dtb_miss4
This statistic is incremented if the profiled execution causes a DTB
double miss (4 level page tables).
early_kill
This statistic is incremented if the profiled execution is killed early
in the pipeline.
early_kill_per_ret
This ratio is the early_kill statistic scaled by 100 and divided by the
retired statistic.
fp_disabled
This statistic is incremented if the profiled execution causes a
floating-point disabled trap.
freq
This statistic is incremented if the profiled execution retires.
uprofile and kprofile average this statistic within basic blocks to
provide instruction execution frequency estimates.
icache_miss
This statistic is incremented if the profiled execution was not yet
prefetched for the cache. Note the profiled instruction may experience
an unrecorded icache miss if the fetch is in progress.
icache_miss_per_ret
This ratio is the icache_miss statistic scaled by 100 and divided by
the retired statistic.
icache_parity
This statistic is incremented if the profiled execution experienced an
icache parity error.
inflt_bcache
This statistic is incremented by the approximate number of bcache
misses during the profiled execution.
inflt_replays
This statistic is incremented by the approximate number of replay traps
during the profiled execution.
inflt_retires
This statistic is incremented by the approximate number of instruction
retires during the profiled execution.
interrupt
This statistic is incremented if the profiled execution is pre-empted
by an interrupt.
istream_accvio
This statistic is incremented if the profiled execution causes an
istream access violation.
itb_miss
This statistic is incremented if the profiled execution causes an ITB
miss.
ldst_order
This statistic is incremented if the profiled execution causes a load-
store order trap.
ldst_unalign
This statistic is incremented if the profiled execution causes an
unaligned load or store.
map_stall
This statistic is incremented if the profiled execution stalled before
it was mapped.
map_stall_per_ret
This ratio is the map_stall statistic scaled by 100 and divided by the
retired statistic.
mispredict
This statistic is incremented if the profiled execution experiences a
misprediction.
mispredict_per_ret
This ratio is the mispredict statistic scaled by 100 and divided by the
retired statistic.
opcdec
This statistic is incremented if the profiled execution causes a
reserved opcode trap.
replay_trap
This statistic is incremented if the profiled execution causes a replay
trap.
replay_trap_per_ret
This ratio is the replay_trap statistic scaled by 100 and divided by
the retired statistic.
retire
This statistic is incremented if the profiled execution retires.
trap
This statistic is incremented if the profiled execution causes a trap.
trap_per_ret
This ratio is the trap statistic scaled by 100 and divided by the
retired statistic.
valid
This statistic is incremented if the profiled execution is valid.
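The *_per_ret ratios above are computed from (statistic, retired) pairs
after collection. The following sketch shows that calculation under the
definitions given in this section; demo_per_ret is a hypothetical name and
is not taken from uprofile, kprofile, or the pfm driver.

```c
/*
 * demo_per_ret is a hypothetical post-processing step of the kind the
 * text attributes to uprofile and kprofile: it turns a
 * (statistic, retired) pair into a *_per_ret ratio, scaled by 100.
 * Returns 0.0 when nothing retired, to avoid dividing by zero.
 */
double demo_per_ret(unsigned long statistic, unsigned long retired)
{
    if (retired == 0)
        return 0.0;
    return (double)statistic * 100.0 / (double)retired;
}
```

Note that cycles_per_ret is described without the scale factor of 100, so
it would be computed as a plain quotient.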
For more information, see the 21264A chip specification.
NOTES
The notes in this section pertain only to EV4 processors.
Disabling an EV4 counter cannot actually disable it from interrupting the
CPU. However, the interrupt will be dismissed without recording any data.
Connections of the CPU's External Input pins to external events are
platform dependent. The DEC 3000/400, /500, /600, /800 workstations have
these connections; they count BCache Misses and BCache Misses with Victims.
Generating statistics on a per-process basis is only possible on 21064 Pass
3 or later processors. Attempts to do this on a Pass 2 or earlier processor
will gather statistics for the entire system.
FILES
/dev/pfcntr
The device entry (character, dev# 26/0)
/usr/include/sys/pfcntr.h
Structure definitions
SEE ALSO
Commands: kprofile(1), uprofile(1), prof(1), sysconfig(8), autosysconfig(8)