prof(1)
NAME
prof, pixstats - Analyzes profile data
SYNOPSIS
prof [options] [prog_name [PC-sampling_data_file]...]
prof -pixie [options] [prog_name [Addrs_file | Counts_file]...]
prof -pixstats [options] [prog_name [Addrs_file | Counts_file]...]
pixstats [options] [prog_name [Addrs_file | Counts_file]...]
OPERANDS
prog_name
Name of the program executable to be profiled. This program should be
compiled with the -g1, -g2, or -g3 option to obtain more complete
profiling information. If the default symbol table level (-g0) has
been used, line number information, static procedure names, and file
names are unavailable to the profiling code.
PC-sampling_data_file
Name of a profiling data file (default mon.out) produced by executing a
program that has been linked with the cc -p command.
Counts_file
Name of an instruction-counts file produced by executing a program that
has been instrumented with pixie. If no Counts_file or Addrs_file is
specified, prog_name.Counts is used if found in the current working
directory.
Addrs_file
Name of an instruction-address file produced when the executable or
shared library object is instrumented with pixie. By default, the path
of each object.Addrs file will be recorded in the Counts_file, so they
do not need to be specified. The order of precedence for finding an
Addrs_file is as follows: Addrs_file path specified on command line,
current directory, directory of object specified in command line
argument, directory where pixie created it.
OPTIONS
For each prof option, you need to type only enough of the name to
distinguish it from the other options. If you do not specify any options,
prof uses -procedures by default. Always specify -pixie or -pixstats when
you process .Addrs and .Counts files.
The prof command accepts the following options:
-all
Causes the profiles for all shared libraries (if any) described in the
data file(s) to be displayed, in addition to the profile for the
executable.
-asm
Causes the profiler to print the assembly instructions for each
subroutine along with the cycle counts for each instruction. The
subroutines are sorted from highest cycle count to lowest. The
instructions for each subroutine are printed in order; they are not
sorted by cycle count.
When used without the -pixie option for a PC-sampling profile, the CPU
time used by each instruction is presented in milliseconds. (For
uprofile and kprofile, per-instruction sample counts are also provided
for events other than time.)
-clock megahertz
Alters the appropriate parts of the listing to reflect the clock speed
of the CPU. By default, the cycle time of the processor on which the
program was run is used. (Use this option only with the -pixie option.)
-disassemble
Disassembles and shows the analyzed object code. (Use this option only
with the -pixstats option.)
-dislimit f
Limits the disassembly to blocks with f% frequency. (Use this option
only with the -pixstats option.)
-exclude procedure_name
If you use one or more -exclude options, the profiler omits the
specified procedure and its descendants from the listing. If any
option uses an uppercase "E" (for "Exclude"), prof also omits that
procedure from the base upon which it calculates percentages. To
represent all of the variations of an overloaded C++ function name, you
can specify just the part of the name up to but not including the "(".
-excobj object_file_name
Causes the profile for the named executable or shared library not to be
printed. You can use this option multiple times in a single prof
command.
-feedback filename
Produces a file with information that the compiler system can use to
decide which parts of the program will benefit most from global
optimization and which parts will benefit most from in-line procedure
substitution (requires basic-block counting). (Use this option only
with the -pixie option.)
This option is for compilers whose -feedback option requires a feedback
file (rather than an executable file) and that do not support the prof
command's -update option. For compilers that support the -update
option, better results can be achieved using that option instead of the
(prof) -feedback option.
-heavy
Reports the most heavily used lines in descending order of use.
-incobj object_file_name
Causes the profile for the named shared library to be printed, in
addition to the profile for the executable. You can use this option
multiple times in a single prof command.
-invocations
For each procedure, reports how many times the procedure was invoked
from each of its possible callers (requires basic-block counting). For
this listing, the -exclude and -only options apply to callees, but not
to callers. (Use this option only with the -pixie option.)
-Ldir
Changes the library directory search order for shared object libraries
so that prof looks for them in dir before the library recorded in
profile_file and the default library directories. You can specify
multiple -Ldir switches to specify several directory names.
-L
Changes the library directory search order for shared object libraries
so that prof never looks for them in the default library directories.
Use this option when the default library directories should not be
searched and only the directories specified by -Ldir are to be
searched.
-lines
Gives the lines in order of occurrence within procedures. The
procedures are sorted in descending order of use.
-merge filename
Sums the sampling data files (or, in pixie mode, the .Counts files) and
writes the result into a new file with the specified name. The -only
and -exclude options have no effect on the merged data.
-nocounts
Uses 1 for each basic block count. (Use this option only with the
-pixstats or -pixie option.)
-numbers
Prints each procedure's starting line number if source file information
is available from the object file.
-only procedure_name
If you use one or more -only options, the profile listing includes only
the named procedures, rather than the entire program. If any option
uses an uppercase "O" for "Only," prof uses only the named procedures,
rather than the entire program, as the base upon which it calculates
percentages. To represent all of the variations of an overloaded C++
function name, you can specify just the part of the name up to but not
including the "(".
-pixie
Selects pixie mode, as opposed to sampling mode.
-pixstats
Selects generation of an alternative pixie-mode report for basic-block
profiling data, as previously produced by the pixstats(1) command. All
options of the previous version of pixstats(1) are recognized, for
compatibility.
-procedures
Reports time spent per procedure (using data obtained from sampling or
basic-block counting; the listing tells which one). For basic-block
counting, this option also reports the number of invocations per
procedure, including the aggregated invocations of any alternate entry
points.
-quit n
Truncates listings after n lines (if n is an integer), after the first
entry that represents less than n percent of the total (if n is
followed immediately by a "%" character), or after enough entries have
been printed to account for n percent of the total (if n is followed
immediately by "cum%"). For example, "-quit 15" truncates each part of
the listing after 15 lines of text, "-quit 15%" truncates each part
after the first line that represents less than 15 percent of the whole,
and "-quit 15cum%" truncates each part after the line that brought the
cumulative percentage above 15 percent.
-testcoverage
Reports all lines that never executed. (Use this option only with the
-pixie option.)
-totals
For -procedures and -invocations listings, prints cumulative statistics
for the entire object file instead of for each procedure in the object.
-truecycles [0,1,2]
Generates more analysis of a program to provide a more accurate reading
of cycles, instead of the default which assumes each instruction
executes in one cycle. The higher the number chosen from the arguments,
the more accurate the reading, although the profiler will run slower,
and memory-access delays are still not reflected. This option has
little or no effect on EV6 (21264) and later Alpha systems. (Use this
option only with the -pixie option.)
-update
Updates the program executable (prog_name) with profiling information
in the specified .Counts files, for use in future cc -feedback
prog_name command(s). This option requires that prog_name have been
compiled with the -feedback prog_name option or updating will fail.
This option will not generate a display unless another option forcing
the display behavior is specified. (Use this option only with the
-pixie option.)
-version
Prints the tool's version number.
-zero
Prints a list of procedures that were never invoked (requires basic-
block counting). (Use this option only with the -pixie option.)
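The three forms of the -quit argument described above (n, n%, and ncum%)
can be modeled with a short Python sketch. This is an illustration of the
documented truncation rules only, not prof's implementation. The function
name and the list-of-percentages input are hypothetical, entries are
assumed to be sorted in descending order of use, and the treatment of an
entry that lands exactly on the cumulative limit is an assumption.

```python
# Model of the documented -quit truncation rules; not prof's actual code.
# Assumption: `percents` holds each listing entry's share of the total,
# already sorted in descending order.

def quit_truncate(percents, spec):
    """Return the entries that survive a given -quit argument."""
    if spec.endswith("cum%"):
        # Stop once the printed entries account for the given percentage.
        # Assumption: reaching the limit exactly also stops the listing.
        limit = float(spec[:-len("cum%")])
        kept, running = [], 0.0
        for p in percents:
            kept.append(p)
            running += p
            if running >= limit:
                break
        return kept
    if spec.endswith("%"):
        # Stop after the first entry that is below the given percentage.
        limit = float(spec[:-1])
        kept = []
        for p in percents:
            kept.append(p)
            if p < limit:
                break
        return kept
    # A plain integer truncates after that many entries.
    return percents[:int(spec)]


# Hypothetical listing: five procedures and their share of the total.
shares = [40.0, 30.0, 20.0, 6.0, 4.0]
by_lines = quit_truncate(shares, "2")
by_percent = quit_truncate(shares, "10%")
by_cumulative = quit_truncate(shares, "50cum%")
```

Note that the n% form keeps the first entry that falls below the limit,
because the listing is truncated after that entry rather than before it.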
DESCRIPTION
The prof command analyzes one or more data files generated by the
compiler's execution-profiling system and produces a listing. The prof
command can also combine those data files or produce a feedback file that
lets the optimizer take into account the program's run-time behavior during
a subsequent compilation. Profiling is a three-step process:
1. Compile the program
2. Execute the program
3. Run prof to analyze the data.
The compiler system provides two kinds of profiling:
PC-sampling
Interrupts the program periodically, recording the value of the program
counter.
Basic-block counting
Divides the program into blocks delimited by labels, jump instructions,
and branch instructions. It counts the number of times each block
executes.
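As an illustration of the block boundaries described above, the following
Python sketch splits a short instruction listing into basic blocks. It is
a simplified model, not pixie's implementation: the function names, the
(label, instruction) tuple layout, and the sample listing are hypothetical,
and only a few Alpha control-transfer mnemonics are recognized.

```python
# Simplified model of basic-block splitting; not pixie's actual algorithm.
# Assumption: each instruction is a (label_or_None, text) pair.

# A few Alpha control-transfer mnemonics (jumps and branches).
TRANSFERS = {"br", "bsr", "beq", "bne", "blt", "bge", "jmp", "jsr", "ret"}


def is_transfer(text):
    """True if the instruction's mnemonic is a jump or branch."""
    return text.split()[0] in TRANSFERS


def split_basic_blocks(insns):
    """Split a list of (label, text) pairs into basic blocks.

    A label starts a new block; a jump or branch ends the current one.
    """
    blocks, current = [], []
    for label, text in insns:
        if label is not None and current:
            blocks.append(current)
            current = []
        current.append((label, text))
        if is_transfer(text):
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# Hypothetical listing used only for illustration.
listing = [
    ("main", "lda sp, -16(sp)"),
    (None, "beq a0, done"),
    (None, "addq a0, 1, a0"),
    ("done", "lda sp, 16(sp)"),
    (None, "ret zero, (ra)"),
]
blocks = split_basic_blocks(listing)
```

Basic-block counting then keeps one execution counter per block, so the
count for a block applies to every instruction in it.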
The uprofile and kprofile tools provide a third kind of profiling,
performance counter sampling. The Alpha architecture on-chip performance
counters are used in performance counter sampling.
The following sections describe how to perform the various kinds of
profiling.
PC-Sampling Profiles
To use PC-sampling, compile your program with the -p option (strictly
speaking, it is sufficient to use this option only when linking the
program). Then, run the program containing the profiling startup routine
that calls monstartup to allocate extra memory to hold the profiling data.
If the program terminates normally or calls exit(2), it records the data in
a file at the end of execution.
If your program uses shared libraries, note that only its call-shared
portion is profiled in detail. Only the total time spent in each shared
library is recorded. To individually profile all library routines a program
uses, build the program with the -non_shared switch (by default, the
compiler produces a call-shared object unless -non_shared is explicitly
specified), or set the PROFFLAGS environment variable as described in the
Environment Variables section.
After running your program, use prof to analyze the PC-sampling data file.
For example:
cc -c myprog.c
cc -p -o myprog myprog.o
myprog (generates mon.out)
prof myprog mon.out
When you use prof for PC-sampling, the program name defaults to a.out. The
PC-sampling data file name defaults to mon.out; if you specify more than
one PC-sampling data file, prof reports the sum of the data.
PC-Sampling Environment Variables
You can use environment variables to change the default PC sampling and
profile data collection behavior. The variables are PROFDIR and PROFFLAGS.
The general form for setting these variables is:
· For C shell: setenv varname "value"
· For Bourne shell: varname="value"; export varname
· For Korn shell: export varname=value
In the preceding example, varname can be one of the following:
PROFDIR
This environment variable causes PC-sampling data files to be generated
with unique file names in a specified directory.
You specify a directory path as the value and your prof results are
placed in the file path/pid.progname where path is the pathname, pid is
the process ID of the executing program, and progname is the program
name.
PROFFLAGS
This environment variable can take any of the following values:
-threads
Causes a separate data file to be generated for each thread. The
name of the data file takes the form pid.sid.progname, where pid
is the process ID of the program, sid is the sequence number of
the thread, and progname is the name of the program being
profiled.
-all
Causes the program to fully profile all the permanently loaded
shared libraries, in addition to the nonshared or call-shared
executable.
-incobj name
Causes the program to profile only the named executable or shared
library.
-excobj name
Causes the program not to profile the named executable or shared
library.
-stride
Causes prof to change the ratio of text segment stride size to PC-
sample counter buffer size, that is, the number of instructions
that are counted together in a single counter word. The appropriate
ratio involves a tradeoff of size versus precision. Strides of 1,
2, 4, and 8 are supported. A special stride of 0 causes a single
PC-sample count to be recorded for each text segment.
The default stride is 2 for the executable, and 0 for each of its
shared libraries. If -all or -incobj are specified, all selected
objects are profiled with the same stride.
-sigdump signal-name
Automatically establishes monitor_signal(3) as the signal handler
for the named signal, and it causes monitor_signal(3) to zero the
profile after it is written to a file. This allows a signal to be
sent several times without the successive profiles overlapping, if
the file is renamed. The asynchronous nature of a signal may cause
small variations in the profile. Unrecognized signal-names are
ignored. The -threads option is ignored if combined with -sigdump.
-dirname directory
Specifies the directory path in which the profiling data file or
files are created.
-[no]pids
[Disables] or enables the addition of the process-id number to the
name of the profiling data file or files.
You can use the PROFDIR and PROFFLAGS environment variables together. For
more information, see the Programmer's Guide.
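The size-versus-precision tradeoff of the -stride value described above can
be worked through with a little arithmetic. The Python sketch below is an
illustration only: the function names are hypothetical, and the real layout
of the PC-sample counter buffer is not described here. It relies on the
fact that Alpha instructions are a fixed 4 bytes long, and on the
description of the stride as the number of instructions that share one
counter word.

```python
# Arithmetic implied by the -stride description; not the profiling
# library's actual buffer layout.

INSN_BYTES = 4  # Alpha instructions are a fixed 32 bits wide
SUPPORTED_STRIDES = (0, 1, 2, 4, 8)


def counter_count(text_bytes, stride):
    """Number of counter words needed for a text segment."""
    if stride not in SUPPORTED_STRIDES:
        raise ValueError("unsupported stride: %r" % (stride,))
    if stride == 0:
        # Special case: one count for the whole text segment.
        return 1
    insns = text_bytes // INSN_BYTES
    return -(-insns // stride)  # ceiling division


def counter_index(pc, text_start, stride):
    """Index of the counter word that a sampled PC falls into."""
    if stride not in SUPPORTED_STRIDES:
        raise ValueError("unsupported stride: %r" % (stride,))
    if stride == 0:
        return 0
    return (pc - text_start) // INSN_BYTES // stride


# Hypothetical 4096-byte text segment starting at address 0x1000.
default_counters = counter_count(4096, 2)
precise_counters = counter_count(4096, 1)
slot = counter_index(0x1010, 0x1000, 2)
```

With a stride of 1 every instruction has its own counter, at twice the
buffer size of the default stride of 2. Larger strides save memory but
attribute each sample to a group of adjacent instructions.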
Basic-Block Counting
To use basic-block counting, compile your program without the option -p.
Use the pixie program to translate your program into a profiling version
and generate a file (prog_name.Addrs) containing block addresses. Then, run
the pixie version of the program, which (assuming the program terminates
normally or calls exit(2)) will generate a file (prog_name.Counts)
containing block counts.
After running the pixie version of your program, use prof with the -pixie
option to analyze the .Addrs and .Counts files. Notice that you must
specify the name of your original program, not the name of the .pixie
version. For example:
cc -c myprog.c
cc -o myprog myprog.o
pixie myprog (generates myprog.Addrs and myprog.pixie)
myprog.pixie (generates myprog.Counts)
prof -pixie myprog myprog.Addrs myprog.Counts
When you use prof with the -pixie option, the .Addrs file name defaults to
prog_name.Addrs, and the .Counts file name defaults to prog_name.Counts.
Note that, when the .Counts file name defaults to prog_name.Counts, prof
does not attach any path prefix to prog_name, and it looks for the .Counts
file in the current working directory. If you specify more than one .Counts
file, prof reports the sum of the data.
For each shared library selected for profiling, the prof command searches
for an .Addrs file in the following locations if the file location is not
explicitly specified on the command line:
· Current directory
· Directory in which the object file is located if the location of the
object file is explicitly specified on the command line
· Directory in which pixie created it, as recorded in the .Counts file
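The search order for an .Addrs file listed above can be sketched in Python
as follows. This is a model of the documented order, not prof's code. The
function name, its parameters, and the paths in the example are all
hypothetical, and returning an explicitly specified path without checking
that it exists is an assumption.

```python
import posixpath

# Model of the documented .Addrs search order; not prof's actual code.


def find_addrs(name, explicit=None, cwd=".", object_dir=None,
               recorded_dir=None, exists=posixpath.isfile):
    """Return the .Addrs path chosen by the documented search order.

    explicit:     path given on the command line (used as-is; assumption)
    object_dir:   directory of the object file, only when the object's
                  location was given explicitly on the command line
    recorded_dir: directory in which pixie created the file, as recorded
                  in the .Counts file
    exists:       predicate used to test each candidate path
    """
    if explicit is not None:
        return explicit
    candidates = [posixpath.join(cwd, name)]
    if object_dir is not None:
        candidates.append(posixpath.join(object_dir, name))
    if recorded_dir is not None:
        candidates.append(posixpath.join(recorded_dir, name))
    for path in candidates:
        if exists(path):
            return path
    return None


# Hypothetical file system contents used only for illustration.
present = {"/build/lib/libfoo.so.Addrs", "/pixie/out/libfoo.so.Addrs"}
found = find_addrs("libfoo.so.Addrs", cwd="/work",
                   object_dir="/build/lib", recorded_dir="/pixie/out",
                   exists=present.__contains__)
```

In this example the file is missing from the current directory, so the
copy next to the explicitly named object file is chosen ahead of the copy
in the directory recorded by pixie.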
For each selected shared library, the prof command searches for an object
file in the following locations:
· Directories specified in -Ldir options
· Directory in which pixie found it, as recorded in the .Addrs file, if
the -L option is specified
· Standard library search directories, as searched by ld, if the -L
option is not specified
Basic-Block Statistics
Use the -pixstats option to get an alternative profile. All options of the
previous version of the pixstats(1) command are recognized, for
compatibility.
If a disassembly is requested, all basic blocks (or those whose execution
count exceeds the -dislimit percentage of total instructions) are
disassembled, in increasing address order. Each block is labeled with its
procedure name and any offset from the start of the procedure. For each
instruction, the relative estimated CPU cycle at which the instruction
executes is printed, plus its source line, address, binary code, and
assembly language. The total CPU cycles used by one execution of the
block, the number of times it was executed, and its percentage of all
instructions executed are printed at the end of the block, following any
line reporting a non-zero delay caused to a follow-on block.
The main report begins with a record of the command line. This is followed
by a summary of the program's behavior:
· Total CPU cycles used by the profiled objects, plus the equivalent
number of seconds
· Total number of instructions executed
· Total delay caused by instructions executed in the preceding basic
block
· Total integer and floating-point no-op, arithmetic and logical,
logical, shift, load, store, load and store, load followed by load,
load and store and fetch (data bus use), load and store relative to
the stack or global pointers, floating-point, floating-point compare,
conditional branch instructions executed (itemized). Also, total
number of branch instructions executed whose target instruction is
another branch; and total number of such branches that are estimated
to be taken, rather than executing the next instruction in line.
· Total basic blocks, procedure calls, and branches that skip a single
instruction that were executed.
Next, some ratios are printed:
· Stores : stores + loads
· Instructions : basic block
· Instructions : branches
· Backward branches : branches
· CPU cycles : procedure calls
· Instructions : procedure calls
· Integer no-ops : integer and floating-point no-ops
· Floating-point no-ops : integer and floating-point no-ops
· Floating-point pipeline interlocks : floating-point operators
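The relationship between the cycle total, the equivalent number of
seconds, and the ratios listed above can be shown with a few lines of
Python. This is an illustrative sketch: the function names and the sample
counts are hypothetical, and the assumption that seconds are derived by
dividing cycles by the clock rate is not stated explicitly in this
reference page.

```python
# Illustrative arithmetic only; the counts below are made up.


def cycles_to_seconds(cycles, clock_mhz):
    """Convert a cycle count to seconds at the given clock rate in MHz."""
    return cycles / (clock_mhz * 1_000_000)


def ratio(numerator, denominator):
    """Return numerator/denominator, or None when the base is zero."""
    return numerator / denominator if denominator else None


# Hypothetical totals for one profiled run.
total_cycles = 500_000_000
loads, stores = 75, 25

seconds = cycles_to_seconds(total_cycles, 500)
store_ratio = ratio(stores, stores + loads)  # "Stores : stores + loads"
```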
Next, basic blocks are analyzed according to how many instructions they
contain. For each size, pixstats reports the execution count, its
percentage and cumulative percentage relative to both instructions and
basic blocks, the number of instructions contained in blocks of that size,
the percentage and cumulative percentage of this relative to all
instructions, and the CPU-cycle cost per instruction of blocks of that
size. Then, pixstats prints various averages and quartiles of basic block
size, plus the largest basic block execution count encountered (to indicate
the chance of integer overflow in the analysis).
Next, pixstats analyzes the number of registers (integer and floating-
point) that are saved on procedure entry (and restored on exit). It prints
the number of procedure entries that save a given number of registers, and
the percentage and cumulative percentage of this relative to all procedure
entries, all registers saved, and all instructions executed. Finally, it
prints some averages and ratios.
The next two tables contain information on the sizes of executed
procedures' stack frames and the frequency of execution of each kind of
instruction. Frame sizes are reported in "bits"; for example, 6 bits means
a 32- to 48-byte stack frame. The number, percentage, and cumulative
percentage of executed calls to procedures with the given frame size is
printed. Similarly, the execution count is printed for each machine
instruction code, but this table is ordered by decreasing usage.
The next four tables are similar. They provide information about the size
of literals used by various categories of Alpha instructions:
· ADD,SUB,CMP instructions
· AND,BIC,BIS,XOR,CMOV instructions
· MUL instructions
· SHIFT,EXT,INS,MSK,ZAP instructions
(Note that a table may be omitted if there is no use of literals in the
program for the particular instruction category). For each of these tables
the size of the literal is reported in bits (for example, 4 bits means the
literal is greater than or equal to 8 and less than 16).
The next six tables are similar. They contain information on the size of
the memory displacement from a base register:
· LDA displacement from 0 (used like a load immediate instruction)
· LDAH displacement from 0 (used like a load immediate high)
· Branch
· SP-based load/store (load or store within a stack frame)
· GP-based load/store (load or store within a global offset table)
· All load or store instructions
Again, the "size" of the displacement is reported in bits; for example, 6
bits means a 32 to 63 byte displacement. For both positive displacements
(in the "0-extend" column) and negative displacements (in the "1-extend"
column), the execution count is printed along with percentage and
cumulative percentage. The summed cumulative percentage is printed last (in
the "Total" column).
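The "bits" sizes used in the frame-size, literal, and displacement tables
agree with the number of bits needed to represent the value, as the two
examples above show (8 to 15 is 4 bits; 32 to 63 is 6 bits). The Python
sketch below reproduces that bucketing for non-negative values. The
function names are hypothetical, and how pixstats sizes a value of zero or
a negative displacement is an assumption not covered here.

```python
# Bucketing consistent with the examples in the text; not pixstats code.


def size_in_bits(value):
    """Number of bits needed to represent a non-negative value."""
    if value < 0:
        raise ValueError("only non-negative values are modeled")
    return value.bit_length()


def bucket_range(bits):
    """Smallest and largest values that fall in a given bit-size bucket."""
    if bits == 0:
        return (0, 0)  # assumption: zero is its own bucket
    return (1 << (bits - 1), (1 << bits) - 1)


literal_bucket = size_in_bits(12)       # a literal between 8 and 15
displacement_bucket = size_in_bits(40)  # a displacement between 32 and 63
```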
In the "static" analysis of instructions, each instruction is counted once
per executed basic-block. The "static" distribution will be the same as the
regular opcode distribution when -nocounts is specified. Following "static"
totals for instructions and basic blocks, the number and percentage of each
instruction code is listed.
The next two tables contain information on how many times each integer and
floating-point register was accessed, plus its percentage, ordered by
register number. For integer registers, the number and percent of uses as
a base register in memory operations is also listed.
Finally, pixstats prints a flat profile of CPU cycles used by procedures.
This includes the CPU cycles used by the procedure, the percentage of the
total, the cumulative percentage, the number of instructions executed as
part of the procedure, its average number of CPU cycles per instruction,
the number of calls made to the procedure, the average number of CPU cycles
per call, and the procedure name. If -numbers is specified, the object and
source file names and line number are also printed.
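The derived columns of the flat profile described above follow from three
raw numbers per procedure: CPU cycles, instructions executed, and calls.
The Python sketch below shows one way to compute them. It is a model, not
the pixstats implementation. The function name, the tuple layout, and the
sample data are hypothetical, and sorting by descending cycle count is an
assumption.

```python
# Model of the flat-profile columns; not the actual pixstats code.


def flat_profile(procs):
    """Build flat-profile rows from (name, cycles, instructions, calls)."""
    total = sum(cycles for _, cycles, _, _ in procs)
    rows, running = [], 0
    # Assumption: procedures are listed in descending order of cycles.
    for name, cycles, insns, calls in sorted(
            procs, key=lambda p: p[1], reverse=True):
        running += cycles
        rows.append({
            "name": name,
            "cycles": cycles,
            "percent": 100.0 * cycles / total if total else 0.0,
            "cum_percent": 100.0 * running / total if total else 0.0,
            "instructions": insns,
            "cycles_per_insn": cycles / insns if insns else None,
            "calls": calls,
            "cycles_per_call": cycles / calls if calls else None,
        })
    return rows


# Hypothetical per-procedure totals used only for illustration.
rows = flat_profile([("main", 100, 80, 1), ("work", 300, 200, 10)])
```

A cycles-per-instruction value above 1 reflects the pipeline interlocks
that the cycle counts include.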
Performance Counter Samples
After running the uprofile or kprofile utility to collect profiling data for
your program or the kernel, respectively, run prof to examine the resulting
umon.out or kmon.out file, as follows:
· For uprofile output: prof prog_name umon.out
· For kprofile output: prof /vmunix kmon.out
Use prof as for PC sampling, except that only the executable has a profile.
Old performance counter sample data files, generated on versions of the
operating system prior to DIGITAL UNIX Version 4.0, must be analyzed as if
they contained PC-sampling data.
RESTRICTIONS
The -pixstats option models execution assuming a perfect memory system.
Memory system events such as cache misses will increase execution time above the
-pixstats predictions.
The set of statistics reported by the -pixstats option and the format of
the report are the same as for previous versions of the pixstats(1)
command, but note the following:
· The labels on disassembled basic blocks take the form procedure-name
(or proc_at_0x... if no symbol is available) for an initial block and
procedure-name+offset for subsequent blocks.
· All reported cycles reflect CPU pipeline interlocks, so they usually
do not match the reported instruction counts.
· If not all the shared objects used by a program are profiled, the
procedure-call counts may be smaller than the jsr/bsr instruction
counts.
FILES
crt0.o
Normal startup code
mcrt0.o
Startup code for PC-sampling
libprof1.a
Library for PC-sampling
kmon.out
Default kprofile data file
mon.out
Default PC-sampling data file
umon.out
Default uprofile data file
SEE ALSO
Introduction: prof_intro(1)
Commands: as(1), cc(1), gprof(1), pixie(1), uprofile(1), kprofile(1),
dxprof(1). (dxprof is available as an option.)
Functions: monitor(3), profil(2)
Programmer's Guide