HP OpenVMS systems documentation
All floating-point instructions and directives, with the exception of POLYx, EMODx and all H_floating instructions, are supported.
These instructions are emulated by means of subroutine calls. This support is provided to allow hands-off compatibility for most existing VAX MACRO modules and is not designed for fast floating-point performance.
Besides the overhead of the emulation routine call, on OpenVMS Alpha systems all floating-point operands must be passed through memory, because the Alpha architecture does not have instructions to move values directly from the integer registers to the floating-point registers. In addition, the first floating-point instruction sets the FEN (floating-point enable) bit for the process, which causes the entire floating-point register set to be saved and restored on every context switch for the life of the image.
2.10.1 Differences Between the OpenVMS VAX and OpenVMS Alpha/I64 Implementations
The differences between the implementations on OpenVMS VAX and OpenVMS Alpha/I64 systems are noted in the following list:
    MOVF    @8(AP), @4(AP)

    MOVL    8(AP), R1
    MOVL    4(AP), R2
    MOVF    (R1),(R2)
This support does not make the floating-point register set visible to the compiler. It simply allows floating-point operations to be done on the integer registers. This means that routines in other languages that interface with a VAX MACRO routine, either calling it or being called by it, must not expect any floating-point values as inputs or outputs, because compilers for other languages pass these values in the floating-point registers. Floating-point arguments can be passed into or out of a VAX MACRO routine only by pointer.
Calls to run-time library (RTL) routines of other languages fall into this category. For example, a call to MTH$RANDOM returns a floating-point value in floating-point register F0, which compiled VAX MACRO code cannot read directly. You need either to create a jacket routine (in another language) that makes the call to MTH$RANDOM and then moves the result to R0, or to write a separate routine that only does the move.
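The following C sketch illustrates the jacket-routine approach. The routine stand_in_random is a hypothetical placeholder for a float-returning RTL routine such as MTH$RANDOM (it is a simple linear congruential generator, not the RTL's implementation), and it uses the host's native single-precision format rather than VAX F_floating. The jacket random_jacket, also hypothetical, hands the result back only through memory (by pointer) and as a raw bit pattern in an integer return value, which is what a caller restricted to the integer registers can consume.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a float-returning RTL routine such as
   MTH$RANDOM.  This is a plain LCG, not the real RTL algorithm. */
static float stand_in_random(uint32_t *seed)
{
    *seed = *seed * 69069u + 1u;               /* advance the seed */
    return (float)(*seed >> 8) / 16777216.0f;  /* top 24 bits -> [0,1) */
}

/* Hypothetical jacket routine: the float result never leaves in a
   floating-point register.  It is stored through a pointer, and its raw
   bit pattern is also returned as an integer (the value a MACRO caller
   would find in R0). */
uint32_t random_jacket(uint32_t *seed, float *result)
{
    float value = stand_in_random(seed);
    uint32_t bits;

    *result = value;                       /* pass the value out by pointer */
    memcpy(&bits, &value, sizeof bits);    /* copy the bits without aliasing issues */
    return bits;
}
```

A VAX MACRO caller would pass the addresses of the seed and the result, and either read the float only through the result pointer or treat the returned longword as uninterpreted bits.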
2.11 Preserving VAX Atomicity and Granularity
The VAX architecture includes instructions that perform a read-modify-write memory operation so that it appears to be a single, noninterruptible operation in a uniprocessing system. Atomicity is the term used to describe the ability to modify memory in one operation. Because the complexity of such instructions severely limits performance, read-modify-write operations on an Alpha or I64 system can be performed only by nonatomic, interruptible instruction sequences.
Furthermore, VAX instructions can address single aligned or unaligned byte, word, and longword locations in memory without affecting the surrounding memory locations. (A data item is considered aligned if its address is an integer multiple of the item's size in bytes.) Granularity is the term used to describe the ability to write independently to portions of aligned longwords.
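As a minimal illustration of the alignment definition, the hypothetical C helper below tests whether an item of a given size at a given address is aligned:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when an item of 'size' bytes at 'address' is aligned, that is,
   when the address is an integer multiple of the item's size. */
bool is_aligned(uintptr_t address, uintptr_t size)
{
    return address % size == 0;
}
```

For example, a longword at address 0x1002 is unaligned, while a word at the same address is aligned.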
Because byte, word, and unaligned longword access also severely limits performance, an OpenVMS Alpha system can only access aligned longword or quadword locations. Therefore, a sequence of instructions to write a single byte, word, or unaligned longword causes some of the surrounding bytes to be read and rewritten.
While Itanium has instructions for accessing bytes and words, unaligned accesses incur a performance penalty.
These architectural differences can cause data to become corrupted under certain conditions.
In an OpenVMS Alpha system, atomicity and granularity are preserved not by locking out other threads from modifying memory, but by providing a way to determine whether a piece of memory may have been modified during the read-modify-write operation. If it may have been, the read-modify-write operation is retried.
In an OpenVMS I64 system, atomicity is achieved by retrying the operation as for OpenVMS Alpha.
To ensure data integrity, the compiler provides certain qualifiers and
directives to be used for the conditions described in the following
sections.
2.11.1 Preserving Atomicity
On OpenVMS VAX, OpenVMS Alpha, and OpenVMS I64 multiprocessing systems, an application in which multiple concurrent threads can modify shared data in a writable global section must have some way of synchronizing their access to that data. On an OpenVMS VAX single-processor system, a memory modification instruction is sufficient to provide synchronized access to shared data. However, it is not sufficient on OpenVMS Alpha or OpenVMS I64 systems.
The compiler provides the /PRESERVE=ATOMICITY option to guarantee the integrity of read-modify-write operations for VAX instructions that have a memory modify operand. Alternatively, you can insert the .PRESERVE ATOMICITY and .NOPRESERVE ATOMICITY directives in sections of VAX MACRO source code as required to enable and disable atomicity.
For instance, assume the following instruction, which requires a read, modify, and write sequence on the data pointed to by R1:
    INCL    (R1)
In an OpenVMS VAX system, the microcode performs these three operations, so an interrupt cannot occur until the sequence is fully completed.
In an OpenVMS Alpha system, the following three instructions are required to perform the one VAX instruction:
    LDL     R27, (R1)
    ADDL    R27, 1, R27
    STL     R27, (R1)
Similarly, in an OpenVMS I64 system, the following four instructions are required:
    ld4 r22 = [r9]
    sxt4 r22 = r22
    adds r22 = 1, r22
    st4 [r9] = r22
The problem with these Alpha and Itanium code sequences is that an interrupt can occur between any of the instructions. If the interrupt causes an AST routine to execute, or causes another process to be scheduled, between the load (LDL) and the store (STL), and the AST or other process updates the data pointed to by R1, the STL stores into (R1) a result that is based on stale data.
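The following C sketch, assuming a single-threaded simulation in which a hypothetical ast_routine runs between the load and the store, shows how the update made by the interrupting code is lost:

```c
#include <stdint.h>

/* Hypothetical AST routine that updates the shared longword while the
   mainline sequence is between its load and its store. */
static void ast_routine(int32_t *p)
{
    *p += 100;
}

/* Non-atomic INCL (R1): load, modify, store, with an "interrupt"
   optionally simulated between the load and the store. */
int32_t incl_nonatomic(int32_t *r1, int interrupt_after_load)
{
    int32_t r27 = *r1;            /* LDL  R27,(R1)  */
    if (interrupt_after_load)
        ast_routine(r1);          /* AST runs here  */
    r27 = r27 + 1;                /* ADDL R27,1,R27 */
    *r1 = r27;                    /* STL  R27,(R1): result based on stale data */
    return *r1;
}
```

Starting from 5, the AST's addition of 100 is overwritten, and the longword ends up as 6 rather than 106.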
When an atomic operation is required, and /PRESERVE=ATOMICITY (or .PRESERVE ATOMICITY) is specified, the compiler generates the following Alpha instruction sequence for INCL (R1):
    Retry:  LDL_L   R28,(R1)
            ADDL    R28,#1,R28
            STL_C   R28,(R1)
            BEQ     R28, fail
            .
            .
            .
    fail:   BR      Retry
and the following Itanium instruction sequence:
    $L3:    ld4 r23 = [r9]
            mov.m ar.ccv = r23
            mov r22 = r23
            sxt4 r23 = r23
            adds r23 = 1, r23
            cmpxchg4.acq r23, [r9] = r23
            cmp.eq pr0, pr6 = r22, r23
            (pr6) br.cond.dpnt.few $L3
On the OpenVMS Alpha system, if (R1) is modified by any other code thread on the current or any other processor during this sequence, the Store Longword Conditional instruction (STL_C) will not update (R1), but will indicate an error by writing 0 into R28. In this case, the code branches back and retries the operation until it completes without interference.
The BEQ fail and BR Retry pair is used instead of a single BEQ Retry because the branch prediction logic of the Alpha architecture assumes that backward conditional branches will be taken. Because this operation rarely needs to be retried, it is more efficient to use a forward conditional branch, which is predicted not to be taken.
Because of the way atomicity is preserved on OpenVMS Alpha systems, this guarantee of atomicity applies to both uniprocessor and multiprocessor systems. This guarantee applies only to the actual modify instruction and does not extend interlocking to subsequent or previous memory accesses (see Section 2.11.6).
The OpenVMS I64 version of the code uses the compare-exchange instruction (cmpxchg) to implement the locked access, but the effect is the same: If other code has modified the location being modified here, then this code will loop back and retry the operation.
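Portable C has no direct access to LDL_L/STL_C or cmpxchg, but the C11 compare-exchange operation expresses the same retry pattern. In this sketch (the function name incl_atomic is hypothetical), the loop mirrors the Itanium sequence: load the longword, compute the new value, store it only if the location still holds the value that was loaded, and otherwise retry. atomic_compare_exchange_weak is permitted to fail spuriously, which is what allows it to be implemented with load-locked/store-conditional pairs on architectures such as Alpha.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomic INCL (R1) as a compare-exchange retry loop.  Unsigned
   arithmetic is used so that the increment wraps like the VAX INCL. */
uint32_t incl_atomic(_Atomic uint32_t *r1)
{
    uint32_t old = atomic_load_explicit(r1, memory_order_relaxed); /* ld4 */
    uint32_t desired;

    do {
        desired = old + 1u;                                        /* adds */
        /* Store 'desired' only if *r1 still equals 'old'; on failure,
           'old' receives the current contents and the loop retries. */
    } while (!atomic_compare_exchange_weak_explicit(
                 r1, &old, desired,
                 memory_order_acquire,   /* like cmpxchg4.acq */
                 memory_order_relaxed));
    return desired;
}
```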
You should take special care in porting an application to an OpenVMS Alpha or OpenVMS I64 system if it involves multiple processes that modify shared data in a writable global section, even if the application executes only on a single processor. Additionally, you should examine any application in which a mainline process routine modifies data in process space that can also be modified by an asynchronous system trap (AST) routine or condition handler. See Migrating to an OpenVMS AXP System: Recompiling and Relinking Applications (footnote 1) for a more complete discussion of the programming issues involved in read-modify-write operations in an Alpha system.
When preserving atomicity, the compiler generates aligned memory instructions that cannot be handled by the Alpha PALcode unaligned fault handler. They will cause a fatal reserved operand fault on unaligned addresses. Therefore, all memory references for which .PRESERVE ATOMICITY is specified must be to aligned addresses (see Section 2.11.5).
2.11.2 Preserving Granularity
Preserving the granularity of a VAX MACRO memory write instruction on a byte, word, or unaligned longword on Alpha means guaranteeing that the instruction executes successfully on the specified data and preserves the integrity of the surrounding data.
The VAX architecture includes instructions that perform independent access to byte, word, and unaligned longword locations in memory so two processes can write simultaneously to different bytes of the same aligned longword without interfering with each other.
The Alpha architecture, as originally implemented, defined instructions that could address only aligned longword and quadword operands. However, byte and word operands for load and store were later added.
On Alpha, code that writes a data field to memory that is less than a longword in length or is not aligned can do so only by using an interruptible instruction sequence that involves a quadword load, an insertion of the modified data into the quadword, and a quadword store. In this case, two processes that intend to write to different bytes in the same quadword will actually load, perform operations on, and store the whole quadword. Depending on the timing of the load and store operations, one of the byte writes could be lost.
The Itanium architecture has byte, word, longword, and quadword addressability, so granularity of access is obtained without having to request it with an option or directive.
The compiler provides the /PRESERVE=GRANULARITY option to guarantee the integrity of byte, word, and unaligned longword writes. The /PRESERVE=GRANULARITY option causes the compiler to generate Alpha instructions that provide granularity preservation for any VAX instructions that write to bytes, words, or unaligned longwords. Alternatively, you can insert the .PRESERVE GRANULARITY and .NOPRESERVE GRANULARITY directives in sections of VAX MACRO source code as required to enable and disable granularity preservation.
For example, the instruction MOVB R1, (R2) generates the following Alpha code sequence:
    LDQ_U   R23, (R2)
    INSBL   R1, R2, R22
    MSKBL   R23, R2, R23
    BIS     R23, R22, R23
    STQ_U   R23, (R2)
If any other code thread modifies part of the data pointed to by (R2) between the LDQ_U and the STQ_U instructions, that data will be overwritten and lost.
The following Itanium code sequence is generated:
    st1 [r28] = r9
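To make the hazard in the Alpha LDQ_U/INSBL/MSKBL/BIS/STQ_U sequence concrete, the following C sketch models memory as an array of aligned quadwords (little-endian, as on Alpha) and performs the same quadword read-merge-write. The helpers insbl and mskbl only imitate the effect of the corresponding Alpha instructions for this byte case, and other_writes_byte1 is a hypothetical stand-in for another thread that writes a neighboring byte between the load and the store.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte address a lives in quadword a >> 3, at byte lane a & 7. */

/* Like INSBL: move the low byte of value into the lane selected by addr. */
static uint64_t insbl(uint64_t value, uint64_t addr)
{
    return (value & 0xFFu) << ((addr & 7u) * 8u);
}

/* Like MSKBL: clear the byte lane selected by addr. */
static uint64_t mskbl(uint64_t quad, uint64_t addr)
{
    return quad & ~((uint64_t)0xFFu << ((addr & 7u) * 8u));
}

/* Hypothetical other thread: writes 0xAB to byte address 1. */
void other_writes_byte1(uint64_t *mem)
{
    mem[0] = mskbl(mem[0], 1) | insbl(0xAB, 1);
}

/* MOVB R1,(R2) without granularity.  'other' (may be NULL) simulates a
   write by another thread landing between the LDQ_U and the STQ_U. */
void movb_nongranular(uint64_t *mem, uint64_t r1, uint64_t r2,
                      void (*other)(uint64_t *mem))
{
    uint64_t r23 = mem[r2 >> 3];     /* LDQ_U R23,(R2)    */
    uint64_t r22 = insbl(r1, r2);    /* INSBL R1,R2,R22   */
    r23 = mskbl(r23, r2);            /* MSKBL R23,R2,R23  */
    r23 |= r22;                      /* BIS   R23,R22,R23 */
    if (other != NULL)
        other(mem);                  /* interference      */
    mem[r2 >> 3] = r23;              /* STQ_U R23,(R2): overwrites byte 1 */
}
```

With interference, the final quadword holds only the byte written by MOVB; the 0xAB written to the neighboring byte is lost.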
If you have specified that granularity be preserved for the same instruction, by either the command qualifier or the directive, the Alpha code sequence becomes the following:
            BIC     R2,#^B0111,R24
    RETRY:  LDQ_L   R28,(R24)
            MSKBL   R28,R2,R28
            INSBL   R1,R2,R25
            BIS     R25,R28,R25
            STQ_C   R25,(R24)
            BEQ     R25, FAIL
            .
            .
            .
    FAIL:   BR      RETRY
In this case, if the data pointed to by (R2) is modified by another code thread, the operation will be retried.
The Itanium code sequence would be unchanged, because it already writes only the affected byte.
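The granularity-preserving Alpha sequence can be approximated in C11 by a compare-exchange loop on the containing quadword. This is a sketch, not the compiler's output, and movb_granular is a hypothetical name. If any byte of the quadword changes between the load and the exchange, the merge is redone with the fresh contents, so a concurrent write to a neighboring byte is not lost.

```c
#include <stdatomic.h>
#include <stdint.h>

/* MOVB R1,(R2) with granularity preserved.  Memory is modelled as an
   array of aligned atomic quadwords (little-endian byte lanes). */
void movb_granular(_Atomic uint64_t *mem, uint64_t r1, uint64_t r2)
{
    _Atomic uint64_t *quad = &mem[r2 >> 3];     /* aligned quadword (BIC R2,#^B0111,R24) */
    unsigned shift = (unsigned)(r2 & 7u) * 8u;  /* byte lane within it */
    uint64_t lane = (uint64_t)0xFFu << shift;
    uint64_t old = atomic_load(quad);           /* LDQ_L */
    uint64_t merged;

    do {
        merged = (old & ~lane)                  /* MSKBL: clear the lane */
               | ((r1 & 0xFFu) << shift);       /* INSBL + BIS: insert the byte */
    } while (!atomic_compare_exchange_weak(quad, &old, merged)); /* STQ_C; retry on failure */
}
```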
For a MOVW R1,(R2) instruction, the Alpha code generated to preserve granularity depends on whether the register R2 is currently assumed to be aligned by the compiler's register alignment tracking. If R2 is assumed to be aligned, the compiler generates essentially the same code as in the preceding MOVB example, except that it uses INSWL and MSKWL instructions instead of INSBL and MSKBL, and it uses #^B0110 in the BIC of the R2 address. If R2 is assumed to be unaligned, the compiler generates two separate LDQ_L/STQ_C pairs to ensure that the word is correctly written even if it crosses a quadword boundary.
Similarly, for Itanium, the compiler will simply generate st2 [r28] = r9 if the address is word-aligned.
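Whether one or two LDQ_L/STQ_C pairs are needed depends on whether the written bytes fall within a single aligned quadword. The hypothetical helpers below sketch that test in C; quadword_base clears the low three address bits, as the BIC with #^B0111 does in the MOVB example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address of the aligned quadword that contains addr. */
uint64_t quadword_base(uint64_t addr)
{
    return addr & ~(uint64_t)7u;
}

/* True if an access of 'size' bytes starting at addr spans two
   quadwords, so that a granular write must update both of them. */
bool crosses_quadword(uint64_t addr, uint64_t size)
{
    return quadword_base(addr) != quadword_base(addr + size - 1u);
}
```

For example, a word at 0x1006 stays within one quadword, while a word at 0x1007 spans two.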
The code generated for an aligned word write, with granularity preservation enabled, will cause a fatal reserved operand fault at run time if the address is not aligned. If the address being written to could ever be unaligned, inform the compiler that it should generate code that can write to an unaligned word by using the compiler directive .SET_REGISTERS UNALIGNED=Rn immediately before the write instruction.
To preserve the granularity of a MOVL R1,(R2) instruction, the compiler always writes whole longwords with an STL instruction, even if the address to which it is writing is assumed to be unaligned. If the address is unaligned, the STL instruction causes an unaligned memory reference fault, and the PALcode unaligned fault handler then does the loads, masks, and stores necessary to write the unaligned longword. Because PALcode is noninterruptible, the surrounding memory locations are not corrupted.
When porting an application to an OpenVMS Alpha system, you should determine whether the application performs byte, word, or unaligned longword writes to memory that is shared either with processes executing on the local processor, or with processes executing on another processor in the system, or with an AST routine or condition handler. See Migrating to an OpenVMS AXP System: Recompiling and Relinking Applications for a more complete discussion of the programming issues involved in granularity operations in an OpenVMS Alpha system.
INSV instructions do not generate code that correctly preserves granularity when granularity is turned on.
2.11.3 Precedence of Atomicity Over Granularity
If you enable the preservation of both granularity and atomicity, and the compiler encounters VAX code that requires that both be preserved, atomicity takes precedence over granularity.
For example, the instruction INCW 1(R0), when compiled with /PRESERVE=GRANULARITY, retries the write of the new word value if it is interrupted. When compiled with /PRESERVE=ATOMICITY, it also refetches the initial value and increments it again if interrupted. If both options are specified, it does the latter.
In addition, while the compiler can successfully generate code for
unaligned words and longwords that preserves granularity, it cannot
generate code for unaligned words or longwords that preserves
atomicity. If both options are specified, all memory references must be
to aligned addresses.
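The difference between the two retry strategies for INCW can be sketched in C as a single-threaded simulation. The hypothetical function interfere stands for another thread that adds 10 to the same word after the first fetch; the granularity-style routine retries only the write of the value it already computed, while the atomicity-style routine refetches and increments again. Both function names are illustrative, not compiler output.

```c
#include <stdint.h>

/* Hypothetical interference: another thread adds 10 to the word
   between the first fetch and the first store attempt. */
static void interfere(uint16_t *w)
{
    *w = (uint16_t)(*w + 10u);
}

/* Granularity only: the new value is computed once; when the store is
   disturbed, only the write of that same value is retried. */
uint16_t incw_granular(uint16_t *w)
{
    uint16_t new_value = (uint16_t)(*w + 1u);  /* fetch and increment once */
    interfere(w);                              /* first store attempt disturbed */
    *w = new_value;                            /* retried write: same value */
    return *w;
}

/* Atomicity: a disturbed store causes the initial value to be
   refetched and incremented again. */
uint16_t incw_atomic(uint16_t *w)
{
    uint16_t old = *w;
    interfere(w);                              /* first store attempt disturbed */
    if (*w != old)                             /* store-conditional would fail */
        old = *w;                              /* refetch the initial value */
    *w = (uint16_t)(old + 1u);
    return *w;
}
```

Starting from 5, the granularity-style routine returns 6 (the other update is lost) and the atomicity-style routine returns 16. Granularity preservation protects the surrounding bytes, not the read-modify-write of the word itself.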
2.11.4 When Atomicity Cannot Be Guaranteed
Because compiler atomicity guarantees only affect memory modification operands in VAX instructions, you should take special care in examining VAX MACRO sources for coding problems that /PRESERVE=ATOMICITY cannot resolve on OpenVMS Alpha or OpenVMS I64 systems.
For example, consider the following VAX instruction:
    ADDL2   (R1),4(R1)
For this instruction, the compiler generates an Alpha code sequence such as the following, when /PRESERVE=ATOMICITY (or .PRESERVE ATOMICITY) is specified:
            LDL     R28,(R1)
    Retry:  LDL_L   R24,4(R1)
            ADDL    R28,R24,R24
            STL_C   R24,4(R1)
            BEQ     R24, fail
            .
            .
            .
    fail:   BR      Retry
Note that, in this code sequence, when the STL_C fails, only the modify operand is reread before the add. The data (R1) is not reread. This behavior differs slightly from VAX behavior. In an OpenVMS VAX system, the entire instruction would execute without interruption; in an OpenVMS Alpha or OpenVMS I64 system, only the modify operand is updated atomically.
As a result, code that requires the read of the data (R1) to be atomic must use another method, such as a lock, to obtain that level of synchronization.
For this instruction, the compiler generates an Itanium code sequence such as the following:
            ld4 r19 = [r9]
            sxt4 r19 = r19
            adds r16 = 4, r9
    $L4:    ld4 r17 = [r16]
            mov.m ar.ccv = r17
            mov r15 = r17
            sxt4 r17 = r17
            add r17 = r19, r17
            cmpxchg4.acq r17, [r16] = r17
            cmp.eq pr0, pr7 = r15, r17
            (pr7) br.cond.dpnt.few $L4
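The same behavior can be modeled with C11 atomics in the following sketch (addl2_atomic is a hypothetical name, and the longwords at (R1) and 4(R1) are represented as r1[0] and r1[1]). The source operand is read once, before the loop; a failed compare-exchange refreshes only the destination value, so a concurrent change to (R1) during the loop is not reflected in the sum.

```c
#include <stdatomic.h>
#include <stdint.h>

/* ADDL2 (R1),4(R1) with only the modify operand updated atomically. */
void addl2_atomic(_Atomic uint32_t *r1)
{
    uint32_t src = atomic_load_explicit(&r1[0], memory_order_relaxed); /* LDL R28,(R1), read once */
    _Atomic uint32_t *dst = &r1[1];                                   /* 4(R1) */
    uint32_t old = atomic_load_explicit(dst, memory_order_relaxed);

    while (!atomic_compare_exchange_weak(dst, &old, old + src))
        ;   /* 'old' was refreshed; 'src' is NOT reread */
}
```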
Consider another VAX instruction:
    MOVL    (R1),4(R1)
Even with /PRESERVE=ATOMICITY specified, the compiler generates the following Alpha code sequence:
    LDL     R28,(R1)
    STL     R28,4(R1)
The VAX instruction in this example is atomic on a single VAX CPU, but the Alpha instruction sequence is not atomic on a single Alpha CPU. Because the 4(R1) operand is a write operand and not a modify operand, the operation is not made atomic by the use of the LDL_L and STL_C.
On OpenVMS I64 systems, the code sequence would be something like the following:
    ld4 r14 = [r9]
    sxt4 r14 = r14
    adds r24 = 4, r9
    st4 [r24] = r14
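When the read of (R1) and the write of 4(R1) must appear as one operation, a lock is one way to obtain that synchronization, as noted earlier for the ADDL2 case. The C11 sketch below uses a hypothetical spin lock (guarded_pair, acquire, release, and movl_locked are illustrative names); it only helps if every code path that reads or writes either longword takes the same lock.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Two longwords, (R1) and 4(R1), guarded by one spin lock. */
typedef struct {
    atomic_flag lock;
    uint32_t longword[2];
} guarded_pair;

static void acquire(guarded_pair *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;   /* spin until the current holder releases the lock */
}

static void release(guarded_pair *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* MOVL (R1),4(R1) performed as one unit with respect to other code
   that also takes the lock. */
void movl_locked(guarded_pair *p)
{
    acquire(p);
    p->longword[1] = p->longword[0];
    release(p);
}
```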
Finally, consider a more complex VAX INCL instruction:
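    INCL    @(R1)
With /PRESERVE=ATOMICITY specified, the compiler generates an Alpha code sequence such as the following:
            LDL     R28,(R1)
    Retry:  LDL_L   R24,(R28)
            ADDL    R24,#1,R24
            STL_C   R24,(R28)
            BEQ     R24, fail
            .
            .
            .
    fail:   BR      Retry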
Here, only the update of the modify data is atomic. The fetch required to obtain the address of the modify data is not part of the atomic sequence.
On OpenVMS I64 systems, the code sequence would be similar to the following:
            ld4 r16 = [r9]
            sxt4 r16 = r16
    $L5:    ld4 r14 = [r16]
            mov.m ar.ccv = r14
            mov r24 = r14
            sxt4 r14 = r14
            adds r14 = 1, r14
            cmpxchg4.acq r14, [r16] = r14
            cmp.eq pr0, pr8 = r24, r14
            (pr8) br.cond.dpnt.few $L5
1 This manual has been archived but is available on the OpenVMS Documentation CD-ROM.