HP OpenVMS MACRO Compiler Porting and User's Guide

4.2.1 Default Code Flow and Branch Prediction

Generally, the compiler generates Alpha and Itanium code that follows unconditional VAX MACRO branches and falls through conditional VAX MACRO branches unless it is directed otherwise. For example, consider the following VAX MACRO code sequence:

            (Code block A)
            BLBS    R0,10$
            (Code block B)
    10$:
            (Code block C)
            BRB     30$
    20$:
            .
            (Code block D)
            .
    30$:
            (Code block E)

The Alpha code generated for this sequence looks like the following:

            (Code block A)
            BLBS    R0,10$
            (Code block B)
    10$:
            (Code block C)
    30$:
            (Code block E)

Note that the compiler fell through the BLBS instruction, continuing with the instructions immediately following the BLBS. At the BRB instruction, it did not generate a branch instruction at all but followed the Alpha and Itanium code generated from Code block C with the Alpha and Itanium code generated from Code block E, at the branch destination. Code from Code block D at label 20$ will be generated at a later point in the routine. If there is no branch to the label 20$, the compiler will report the following informational message and will not generate Alpha or Itanium code for Code block D:

UNRCHCODE, unreachable code

In most cases, this algorithm produces Alpha and Itanium code that matches the assumptions of the architecture:

If a conditional branch is backward in the VAX MACRO code, then the destination likely has been generated already in the Alpha and Itanium code, and so the generated branch will also be backward.
If the conditional branch is forward in the VAX MACRO code, then the destination will likely not have been generated yet in the Alpha and Itanium code, and so the generated branch will also be forward.

However, because the compiler follows unconditional branches, the destination of a backward VAX MACRO branch may not have been generated yet. In this case, a conditional branch that was backward in the VAX MACRO source code may become a forward branch in the generated Alpha and Itanium code. See Section 4.2.5 for a further discussion and resolution of this problem.

There are some cases where the compiler may assume that a forward branch is taken. For example, consider the following common VAX MACRO coding practice:

            JSB     XYZ             ;Call a routine
            BLBS    R0,10$          ;Branch to continue on success
            BRW     ERROR           ;Destination too far for byte offset
    10$:

In this case, and any case where the inline code following the branch is only a few lines and does not rejoin the code flow at the branch destination, the forward branch is considered taken. This eliminates the delay that occurs on OpenVMS Alpha and OpenVMS I64 systems for a mispredicted branch. The compiler will automatically change the sense of the branch, and will move the code between the branch and the label out of line to a point beyond the normal exit of the routine. For this example it would generate the following code:

            JSR     XYZ
            BLBC    R0,$L1
    10$:
            .
            .
            .
            (routine exit)
    $L1:    BRW     ERROR

4.2.2 Changing the Compiler's Branch Prediction

The compiler provides two directives, .BRANCH_LIKELY and .BRANCH_UNLIKELY, to change its assumptions about branch prediction. The directive .BRANCH_LIKELY is for use with forward conditional branches when the probability that the branch is taken is large, say 75 percent or more. The directive .BRANCH_UNLIKELY is for use with backward conditional branches when the probability that the branch is taken is less than 25 percent.

These directives should only be used in performance-sensitive code. Furthermore, you should be more cautious when adding .BRANCH_UNLIKELY, because it introduces an additional branch indirection for the case when the branch is actually taken. That is, the branch is changed to a forward branch to a branch instruction, which in turn branches to the original branch target.

There is no directive to tell the compiler not to follow an unconditional branch. However, if you want the compiler to generate code that does not follow the branch, you can change the unconditional branch to be a conditional branch that you know will always be taken. For example, if you know that in the current code section R3 always contains the address of a data structure, you could change a BRB instruction to a TSTL R3 followed by a BNEQ instruction. This branch will always be taken, but the compiler will fall through and continue code generation with the next instruction. This will always cause a mispredicted branch when executed, but may be useful in some situations.
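
As a minimal sketch of the substitution described above, the following fragment shows an unconditional branch and its always-taken conditional replacement. The label 50$ is hypothetical, and the fragment assumes, as in the text, that R3 is known to hold a nonzero structure address at this point.

```
; Original: the compiler follows this branch and continues code
; generation at 50$ (hypothetical label).
        BRB     50$

; Replacement: R3 is assumed to hold a nonzero address here, so
; TSTL leaves the Z condition code clear and BNEQ is always taken
; at run time. The compiler treats BNEQ as a conditional branch
; and continues code generation with the next instruction.
        TSTL    R3
        BNEQ    50$
```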

4.2.3 How to Use .BRANCH_LIKELY

If your code has forward conditional branches that you know will most likely be taken, you can instruct the compiler to generate code using that assumption by inserting the directive .BRANCH_LIKELY immediately before the branch instruction. For example:

            MOVL    (R0),R1         ; Get structure
            .BRANCH_LIKELY
            BNEQ    10$             ; Structure exists
            .
            (Code to deal with missing structure, which is too large for
            the compiler to automatically change the branch prediction)
            .
    10$:

The compiler will follow the branch and will modify the code flow as described in the last example in Section 4.2.1, moving all the code that deals with the missing structure out of line to the end of the module.

4.2.4 How to Use .BRANCH_UNLIKELY

If your code has backward conditional branches that you know will most likely not be taken, you can instruct the compiler to generate code using that assumption by inserting the directive .BRANCH_UNLIKELY immediately before the branch instruction. For example:

            MOVL    #QUEUE,R0       ;Get queue header
    10$:    MOVL    (R0),R0         ;Get entry from queue
            BEQL    20$             ;Forward branch assumed unlikely
            .                       ;by default
            .                       ;Process queue entry
            .
            TSTL    (R0)            ;More than one entry (known to be
            .BRANCH_UNLIKELY        ;unlikely)
            BNEQ    10$             ;This branch made into forward
    20$:                            ;conditional branch

The .BRANCH_UNLIKELY directive is used here because the compiler would predict a backward branch to 10$ as likely to be taken. The programmer knows it is a rare case, so the directive is used to change the branch to a forward branch, which is predicted not taken.

There is an unconditional branch instruction at the forward branch destination which branches back to the original destination. Again, this code fragment is moved to a point beyond the normal routine exit point. The code that would be generated by the previous VAX MACRO code follows:

            LDQ     R0, 48(R27)     ;Get address of QUEUE from linkage sect.
    10$:    LDL     R0, (R0)        ;Get entry from QUEUE
            BEQ     R0, 20$
            .
            .                       ;Process queue entry
            .
            LDL     R22, (R0)       ;Load temporary register with (R0)
            BNE     R22,$L1         ;Conditional forward branch predicted
    20$:                            ;not taken by Alpha hardware
            .
            .
            .
            (routine exit)
    $L1:    BR      10$             ;Branch to original destination

4.2.5 Forward Jumps into Loops

Because of the way that the compiler follows the code flow, a particular case that may not compile well is a forward unconditional branch into a loop. The code generated for this case usually splits the loop into two widely separated pieces. For example, consider the following macro coding construct:

            (Allocate a data block and set up initial pointer)
            BRB     20$
    10$:    (Move block pointer to next section to be moved)
    20$:    (Move block of data)
            (Test - is there more to move?)
            (Yes, branch to 10$)
            (Remainder of routine)

The MACRO compiler will follow the BRB instruction when generating the code flow and will then fall through the subsequent conditional branch to 10$. However, because the code at 10$ was skipped over by the BRB instruction, it will not be generated until after the end of the routine. This will convert the conditional branch into a forward branch instead of a backward branch. The generated code layout will look like the following:

            (Allocate a data block and set up initial pointer)
    20$:    (Move block of data)
            (Test - is there more to move?)
            (Yes, branch to 10$)
            .
            .
            (Remainder of routine)
            (Routine exit)
            .
            .
    10$:    (Move block pointer to next section to be moved)
            BRB     20$

This results in the loop being very slow because the branch to 10$ is always predicted not taken, and the code flow has to keep going back and forth between the two locations. This situation can be fixed by inserting a .BRANCH_LIKELY directive before the conditional branch back to 10$. This will result in the following code flow:

            (Allocate a data block and set up initial pointer)
    20$:    (Move block of data)
            (Test - is there more to move?)
            (No, branch to $L1)
    10$:    (Move block pointer to next section to be moved)
            BRB     20$
    $L1:    (Remainder of routine)
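
For a concrete version of this pattern, the following sketch clears a table of longwords and places .BRANCH_LIKELY before the conditional branch back to 10$. It is illustrative only: TABLE and COUNT are hypothetical symbols, the register choices are arbitrary, and COUNT is assumed to be at least 1.

```
        MOVAL   TABLE,R2        ;Set up initial pointer (TABLE is hypothetical)
        MOVL    #COUNT,R3       ;Number of longwords (COUNT is hypothetical)
        BRB     20$             ;Forward unconditional branch into the loop
10$:    ADDL2   #4,R2           ;Move pointer to next longword
20$:    CLRL    (R2)            ;Process the current longword
        DECL    R3              ;Test - is there more to do?
        .BRANCH_LIKELY          ;Loop branch is known to be usually taken
        BGTR    10$             ;Yes, branch to 10$
        (Remainder of routine)
```

With the directive in place, the code at 10$ should be generated inline immediately after the loop test, as in the layout shown above, rather than after the routine exit.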

4.3 Code Optimization

The MACRO compiler performs several optimizations on the generated code. It performs all of them by default except VAXREGS, which is available on OpenVMS Alpha systems only. You can change these defaults with the /OPTIMIZE switch on the command line. The valid options are:

ADDRESSES
The compiler recognizes that the same address is referenced multiple times, and only loads the address once for use by multiple references.
REFERENCES
The compiler recognizes that the same data value is referenced multiple times, and only loads the data once for use by multiple references, subject to restrictions to ensure that the data being used is not stale.
PEEPHOLE
The compiler identifies instruction sequences that can be identically performed by smaller instruction sequences, and replaces the longer sequences with the shorter ones.
SCHEDULING
The compiler uses its knowledge of the multiple-instruction-issue capability of the Alpha and Itanium architectures to reschedule the code for optimum performance.
VAXREGS (Alpha systems only)
By default, the registers from R13 through R28 may be used as temporary scratch registers by the compiler if they are not used in the source code. When VAXREGS is specified, the compiler may also use any registers in the VAX register set (R0 through R12) that are not explicitly used by the MACRO source code. VAX registers used in this way will be restored to their original values at routine exit unless declared SCRATCH.

Note

Debugging is simplified if you specify /NOOPTIMIZE, because the optimizations include relocating and rescheduling code. For more information, see Section 2.13.1.

4.3.1 Using the VAXREGS Optimization (OpenVMS Alpha only)

To use the VAXREGS optimization, you must ensure that all routines correctly declare their register usage in their .CALL_ENTRY, .JSB_ENTRY, or .JSB32_ENTRY routine declarations. In addition, you must identify any VAX registers that are required or modified by any routines that are called. By default, the compiler assumes that no VAX registers are required as input to any called routine, and that all VAX registers except R0 and R1 are preserved across the call. To declare this usage, use the READ and WRITTEN qualifiers to the compiler directive .SET_REGISTERS. For example:

            .SET_REGISTERS READ=<R3,R4>, WRITTEN=R5
            JSB     DO_SOMETHING_USEFUL

In this example, the compiler will assume that R3 and R4 are required inputs to the routine DO_SOMETHING_USEFUL, and that R5 is overwritten by the routine. The register usage can be determined by using the input mask of DO_SOMETHING_USEFUL as the READ qualifier, and the combined output and scratch masks as the WRITTEN qualifier.
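
The following sketch shows how an entry-point declaration and the matching call-site declaration might line up. The routine body and the use of R6 as a scratch register are hypothetical additions for illustration, which is why WRITTEN here lists R6 as well as R5.

```
; Hypothetical routine: R3 and R4 are inputs, R5 is the output,
; and R6 is used as scratch.
DO_SOMETHING_USEFUL: .JSB_ENTRY INPUT=<R3,R4>, OUTPUT=<R5>, SCRATCH=<R6>
        ADDL3   R3,R4,R6        ;R6 = R3 + R4 (scratch)
        MULL3   #2,R6,R5        ;R5 = 2 * R6 (output)
        RSB

; At the call site, READ is the routine's input mask and WRITTEN
; is its combined output and scratch masks.
        .SET_REGISTERS READ=<R3,R4>, WRITTEN=<R5,R6>
        JSB     DO_SOMETHING_USEFUL
```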

Note

Using the VAXREGS qualifier without correct register declaration for both routine entry points and routine calls will produce incorrect code.

4.4 Common-Based Referencing

On an OpenVMS Alpha system, references to data cells generally require two memory references: one reference to load the data cell address from the linkage section and another reference to the data cell itself. If several data cells are located in proximity to one another, and the ADDRESSES optimization is used, the compiler can load a register with a common base address and then reference the individual data cells as offsets from that base address. This eliminates the load of each individual data cell address and is known as common-based referencing.

The compiler performs this optimization automatically for local data psects when the ADDRESSES optimization is turned on. The compiler generates symbols of the form $PSECT_BASEn to use as the base of a local psect.

To use common-based referencing for external data psects, you must create a prefix file which defines symbols as offsets from a common base. The prefix file cannot be used when assembling the module for OpenVMS VAX because the VAX MACRO assembler does not allow symbols to be defined as offsets from external symbols.

4.4.1 Creating a Prefix File for Common-Based Referencing

The following example illustrates the benefits of creating a prefix file to use common-based referencing. It shows:

Code generated without the use of a prefix file
How to create a prefix file
Code generated with the use of a prefix file

Consider the following simple code section (CODE.MAR), which refers to data cells in another module (DATA.MAR):

Module DATA.MAR:

            .PSECT  DATA NOEXE
    BASE::
    A::     .LONG   1
    B::     .LONG   2
    C::     .LONG   3
    D::     .LONG   4
            .END

Module CODE.MAR:

            .PSECT  CODE NOWRT
    E::     .CALL_ENTRY
            MOVL    A,R1
            MOVL    B,R2
            MOVL    C,R3
            MOVL    D,R4
            RET
            .END

When compiling CODE.MAR without using common-based referencing, the following code is generated:

In the linkage section:

            .ADDRESS A
            .ADDRESS B
            .ADDRESS C
            .ADDRESS D

In the code section (not including the prologue/epilogue code):

            LDQ     R28, 40(R27)    ;Load address of A from linkage section
            LDQ     R26, 48(R27)    ;Load address of B from linkage section
            LDQ     R25, 56(R27)    ;Load address of C from linkage section
            LDQ     R24, 64(R27)    ;Load address of D from linkage section
            LDL     R1, (R28)       ;Load value of A
            LDL     R2, (R26)       ;Load value of B
            LDL     R3, (R25)       ;Load value of C
            LDL     R4, (R24)       ;Load value of D

By creating a prefix file that defines external data cells as offsets from a common base address, you can cause the compiler to use common-based referencing for external references. A prefix file for this example, which defines A, B, C, and D in terms of BASE, follows:

    A = BASE+0
    B = BASE+4
    C = BASE+8
    D = BASE+12

When compiling CODE.MAR using this prefix file and the ADDRESSES optimization, the following code is generated:

In the linkage section:

.ADDRESS BASE ;Base of data psect

In the code section (not including the prologue/epilogue code):

            LDQ     R16, 40(R27)    ;Load address of BASE from linkage section
            LDL     R1, (R16)       ;Load value of A
            LDL     R2, 4(R16)      ;Load value of B
            LDL     R3, 8(R16)      ;Load value of C
            LDL     R4, 12(R16)     ;Load value of D

In this example, common-based referencing shrinks the size of both the code and the linkage sections and eliminates three memory references. This method of creating a prefix file to enable common-based referencing of external data cells can be useful if you have one large, separate module that defines a data area used by many modules.
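
The offsets in a prefix file must track the size of each data cell. As a sketch under stated assumptions, suppose a hypothetical data module declares cells of different sizes. The names DATA2, BASE2, X, Y, and Z are invented for illustration and do not appear in the example above.

```
; Hypothetical data module
        .PSECT  DATA2,NOEXE
BASE2::
X::     .QUAD   0               ;8 bytes at offset 0
Y::     .LONG   0               ;4 bytes at offset 8
Z::     .WORD   0               ;2 bytes at offset 12
        .END

; Corresponding prefix file: each offset is the sum of the sizes
; of the cells that precede it.
X = BASE2+0
Y = BASE2+8
Z = BASE2+12
```

The prefix file has to be kept in step with the data module: if a cell is added, removed, or resized, every later offset changes.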

4.4.1.1 Code Sequence Differences on OpenVMS I64 Systems

The same effect shown in Section 4.4.1 occurs on OpenVMS I64 systems, though the details of the code sequences differ. The unmodified code results in an instruction sequence such as the following:

            add     r19 = D, r1
            add     r22 = C, r1
            add     r23 = B, r1
            add     r24 = A, r1
            ld8     r19 = [r19]
            ld8     r24 = [r24]
            ld8     r22 = [r22]
            ld8     r23 = [r23]
            ld4     r4 = [r19]
            ld4     r9 = [r24]
            ld4     r3 = [r22]
            ld4     r28 = [r23]
            sxt4    r4 = r4
            sxt4    r9 = r9
            sxt4    r3 = r3
            sxt4    r28 = r28

By using the prefix file method shown in Section 4.4.1, an instruction sequence with almost half of the memory accesses removed will result:

            add     r24 = BASE, r1
            ld8     r24 = [r24]
            mov     r23 = r24
            ld4     r9 = [r24]
            adds    r18 = 12, r24
            adds    r19 = 8, r24
            adds    r22 = 4, r24
            adds    r24 = 12, r24
            ld4     r4 = [r24], -4
            sxt4    r9 = r9
            ld4     r3 = [r24], -4
            sxt4    r4 = r4
            sxt4    r3 = r3
            ld4     r28 = [r24], -4
            sxt4    r28 = r28
