HP OpenVMS systems documentation
The areas of noncompliance detected by the SRM_CHECK tool can be grouped into the following four categories. Most of these can be fixed by recompiling with new compilers. In rare cases, the source code may need to be modified. See Section 6.6.3.5 for information about compiler versions.
If the SRM_CHECK tool finds a violation in an image, the source code for
the image should be modified if necessary and recompiled with the
appropriate compiler (see Section 6.6.3.5). After recompiling, the image
should be analyzed again. If violations remain, the source code must be
examined to determine why the code scheduling violation exists, and then
modified to correct it.
6.6.3.4 Coding Requirements
The Alpha Architecture Reference Manual describes how an atomic update of data between processors must be formed. The Third Edition, in particular, has much more information on this topic.
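Exceptions to the following two requirements are the source of all known noncompliant code:

- There cannot be a memory operation (load or store) between the LDx_L (load locked) and STx_C (store conditional) instructions in an interlocked sequence.
- There cannot be a branch taken between an LDx_L and an STx_C instruction. Rather, execution must "fall through" from the LDx_L to the STx_C without taking a branch. Any branch whose target is between an LDx_L and the matching STx_C creates a noncompliant sequence. For instance, any branch to "label" in the following example results in noncompliant code, regardless of whether the branch instruction itself is within or outside the sequence:

      LDx_L Rx, n(Ry)
      ...
      label:
      ...
      STx_C Rx, n(Ry)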
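Therefore, the SRM_CHECK tool looks for the following:

- Any memory operation (LDx/STx) between an LDx_L and an STx_C
- Any branch that has a destination between an LDx_L and an STx_C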
To illustrate, the following are examples of code flagged by SRM_CHECK.
    ** Found an unexpected ldq at 0008291C
    00082914 AC300000    ldq_l   R1, (R16)
    00082918 2284FFEC    lda     R20, 0xFFEC(R4)
    0008291C A6A20038    ldq     R21, 0x38(R2)
In the preceding example, an LDQ instruction was found after an LDQ_L but before the matching STQ_C. The LDQ must be moved out of the sequence, either by recompiling or by source code changes. (See Section 6.6.3.3.)
    ** Backward branch from 000405B0 to a STx_C sequence at 0004059C
    00040598 C3E00003    br      R31, 000405A8
    0004059C 47F20400    bis     R31, R18, R0
    000405A0 B8100000    stl_c   R0, (R16)
    000405A4 F4000003    bne     R0, 000405B4
    000405A8 A8300000    ldl_l   R1, (R16)
    000405AC 40310DA0    cmple   R1, R17, R0
    000405B0 F41FFFFA    bne     R0, 0004059C
In the preceding example, a branch was discovered between the LDL_L and STL_C. In this case, there is no "fall through" path from the LDx_L to the STx_C, as the architecture requires.
This backward branch from the LDx_L to the STx_C is characteristic of the noncompliant code introduced by the "loop rotation" optimization.
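The two rules can be made concrete with a toy checker. This is a minimal sketch, not how SRM_CHECK is implemented: it assumes a hypothetical, simplified instruction model (enum kind, struct insn) in which every branch records the index of its target, and it treats the code following each LDx_L up to the next STx_C as the locked sequence.

```c
#include <stddef.h>

/* Hypothetical, simplified instruction model for illustration only. */
enum kind { OP_OTHER, OP_LDX_L, OP_STX_C, OP_MEM, OP_BRANCH };

struct insn {
    enum kind kind;
    int target;   /* index of the branch target; -1 if not a branch */
};

/* Returns 1 if any LDx_L/STx_C sequence breaks a rule, else 0.
   Rule 1: no memory operation between the LDx_L and the STx_C.
   Rule 2: no taken branch inside the sequence (execution must fall
   through), and no branch anywhere may target a point after the
   LDx_L up to and including the STx_C. */
static int is_noncompliant(const struct insn *code, int n)
{
    for (int i = 0; i < n; i++) {
        if (code[i].kind != OP_LDX_L)
            continue;

        /* Find the matching STx_C; if there is none, j ends up as n. */
        int j = i + 1;
        while (j < n && code[j].kind != OP_STX_C)
            j++;
        int end = (j < n) ? j : n - 1;

        /* Rule 1 and the first half of rule 2. */
        for (int k = i + 1; k < j; k++)
            if (code[k].kind == OP_MEM || code[k].kind == OP_BRANCH)
                return 1;

        /* Second half of rule 2: branch targets inside the sequence. */
        for (int k = 0; k < n; k++)
            if (code[k].kind == OP_BRANCH &&
                code[k].target > i && code[k].target <= end)
                return 1;
    }
    return 0;
}
```

Encoding the two SRM_CHECK listings above in this model flags both; encoding the corrected MACRO-32 sequence shown later in this section (memory access first, then LDx_L, register operations, STx_C, and a retry branch back to the LDx_L) does not, because a branch that targets the LDx_L itself lies outside the protected range.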
The following MACRO-32 source code demonstrates code where there is a "fall through" path, but this case is still noncompliant because of a potential branch and a memory reference in the lock sequence.
    getlck: evax_ldql   r0, lockdata(r8)   ; Get the lock data
            movl        index, r2          ; and the current index.
            tstl        r0                 ; If the lock is zero,
            beql        is_clear           ; skip ahead to store.
            movl        r3, r2             ; Else, set special index.
    is_clear:
            incl        r0                 ; Increment lock count
            evax_stqc   r0, lockdata(r8)   ; and store it.
            tstl        r0                 ; Did store succeed?
            beql        getlck             ; Retry if not.
To correct this code, the memory access to read the value of INDEX must first be moved outside the LDQ_L/STQ_C sequence. Next, the branch between the LDQ_L and STQ_C, to the label IS_CLEAR, must be eliminated. In this case, it could be done using a CMOVEQ instruction. The CMOVxx instructions are frequently useful for eliminating branches around simple value moves. The following example shows the corrected code:
            movl        index, r2          ; Get the current index
    getlck: evax_ldql   r0, lockdata(r8)   ; and then the lock data.
            evax_cmoveq r0, r3, r2         ; If zero, use special index.
            incl        r0                 ; Increment lock count
            evax_stqc   r0, lockdata(r8)   ; and store it.
            tstl        r0                 ; Did write succeed?
            beql        getlck             ; Retry if not.
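For comparison, the same lock-count update can be sketched in C11. This sketch assumes a compiler that implements atomic_compare_exchange_weak on Alpha with a compliant LDQ_L/STQ_C sequence, so the selection logic and the retry loop stay outside the locked sequence the compiler emits. The names get_lock, cur_index, special, and chosen are hypothetical. The selection is recomputed on every attempt from the value the exchange last observed.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical C analog of the getlck sequence: increments *lockdata and
   reports which index was selected for the value that was actually
   incremented. Returns the new lock count. */
static int64_t get_lock(_Atomic int64_t *lockdata, int64_t cur_index,
                        int64_t special, int64_t *chosen)
{
    int64_t old = atomic_load_explicit(lockdata, memory_order_relaxed);
    int64_t sel;
    do {
        /* Register-only selection, like EVAX_CMOVEQ: no extra memory
           reference, no branch target inside the compiler's sequence. */
        sel = (old == 0) ? special : cur_index;
        /* On failure, old is refreshed with the current value and the
           selection is redone. */
    } while (!atomic_compare_exchange_weak(lockdata, &old, old + 1));
    *chosen = sel;
    return old + 1;
}
```

A compare-exchange loop hands the entire locked sequence to the compiler, which is one way to keep hand-written code from placing a memory reference or a branch target inside it.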
6.6.3.5 Compiler Versions
This section contains information about versions of compilers that may generate noncompliant code sequences and the minimum recommended versions to use when recompiling.
Table 6-1 contains information for OpenVMS compilers.
Old Version | Recommended Minimum Version |
---|---|
BLISS V1.1 | BLISS V1.3 |
DEC C V5.x | HP C V6.0 |
DEC C++ V5.x | HP C++ V6.0 |
DEC Pascal V5.0-2 | HP Pascal V5.1-11 |
MACRO-32 V3.0 | V3.1 for OpenVMS Version 7.1-2; V4.1 for OpenVMS Version 7.2 |
MACRO-64 V1.2 | See below. |
Current versions of the MACRO-64 assembler may still encounter the
loop rotation issue. However, MACRO-64 does not perform code
optimization by default, and this problem occurs only when optimization
is enabled. If SRM_CHECK indicates a noncompliant sequence in the
MACRO-64 code, it should first be recompiled without optimization. If
the sequence is still flagged when retested, the source code itself
contains a noncompliant sequence that must be corrected.
6.6.3.6 Interlocked Memory Sequence Checking for the MACRO-32 Compiler
The MACRO-32 Compiler for OpenVMS Alpha Version 4.1 and later performs additional code checking and displays the following warning messages for noncompliant code sequences:
BRNDIRLOC, branch directive ignored in locked memory sequence
BRNTRGLOC, branch target within locked memory sequence in routine 'routine_name'
MEMACCLOC, memory access within locked memory sequence in routine 'routine_name'
RETFOLLOC, RET/RSB follows LDx_L instruction
RTNCALLOC, routine call within locked memory sequence in routine 'routine_name'
STCMUSFOL, STx_C instruction must follow LDx_L instruction
Any MACRO-32 code on OpenVMS Alpha that invokes either the ALONONPAGED_INLINE or the LAL_REMOVE_FIRST macros from the SYS$LIBRARY:LIB.MLB macro library must be recompiled on OpenVMS Version 7.2 and later to obtain a correct version of these macros. The change to these macros corrects a potential synchronization problem that is more likely to be encountered on the Alpha 21264 (EV6) and subsequent processors.
Source modules that call the EXE$ALONONPAGED routine (or any of its variants) do not need to be recompiled. These modules transparently use the correct version of the routine that is included in this release.
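On VAX systems, seven instructions interlock memory. A memory interlock enables a VAX CPU or I/O processor to make an atomic read-modify-write operation to a location in memory that is shared by multiple processors. The memory interlock is implemented at the level of the memory controller. On a VAX multiprocessor system, an interlocked instruction is the only way to perform an atomic read-modify-write on a shared piece of data. The seven interlock memory instructions are as follows:

- ADAWI, Add Aligned Word Interlocked
- BBCCI, Branch on Bit Clear and Clear Interlocked
- BBSSI, Branch on Bit Set and Set Interlocked
- INSQHI, Insert Entry into Queue at Head, Interlocked
- INSQTI, Insert Entry into Queue at Tail, Interlocked
- REMQHI, Remove Entry from Queue at Head, Interlocked
- REMQTI, Remove Entry from Queue at Tail, Interlocked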
The VAX architecture interlock memory instructions are described in detail in the VAX Architecture Reference Manual.
The following description of the interlocked instruction mechanism assumes that the interlock is implemented by the memory controller and that the memory contents are fresh.
When a VAX CPU executes an interlocked instruction, it issues an interlock-read command to the memory controller. The memory controller sets an internal flag and responds with the requested data. While the flag is set, the memory controller stalls any subsequent interlock-read commands for the same aligned longword from other CPUs and I/O processors, even though it continues to process ordinary reads and writes. Because interlocked instructions are noninterruptible, they are atomic with respect to threads of execution on the same processor.
When the VAX processor that is executing the interlocked instruction issues a write-unlock command, the memory controller writes the modified data back and clears its internal flag. The memory interlock exists for the duration of only one instruction. Execution of an interlocked instruction includes paired interlock-read and write-unlock memory controller commands.
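The interlock-read/write-unlock protocol described above can be modeled sequentially in C. In this hypothetical sketch, struct mem_longword stands for one aligned longword together with the memory controller's internal flag, and all function names are illustrative. Because the model is single-threaded, it shows the protocol's rules rather than real concurrency; the hardware controller is what serializes the commands.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one aligned longword behind a memory controller. */
struct mem_longword {
    int32_t value;
    bool interlocked;   /* the controller's internal flag */
};

/* Interlock-read: refused (the requester must stall and retry) while
   another interlocked access to this longword is in progress. */
static bool interlock_read(struct mem_longword *m, int32_t *out)
{
    if (m->interlocked)
        return false;
    m->interlocked = true;
    *out = m->value;
    return true;
}

/* Write-unlock: write the modified data back and clear the flag. */
static void write_unlock(struct mem_longword *m, int32_t v)
{
    m->value = v;
    m->interlocked = false;
}

/* Ordinary reads are still processed while the flag is set. */
static int32_t ordinary_read(const struct mem_longword *m)
{
    return m->value;
}

/* One interlocked instruction = paired interlock-read and write-unlock.
   Returns false if the caller would have to stall and retry. */
static bool interlocked_add(struct mem_longword *m, int32_t addend)
{
    int32_t v;
    if (!interlock_read(m, &v))
        return false;
    write_unlock(m, v + addend);
    return true;
}
```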
When you synchronize data with interlocks, you must make sure that all accessors of that data use them, because the memory references of an interlocked instruction are atomic only with respect to other interlocked memory references.
On VAX systems, the granularity of the interlock depends on the type of
VAX system. A given VAX implementation is free to implement a larger
interlock granularity than that which is required by the set of
interlocked instructions listed above. On some processors, for example,
while an interlocked access to a location is in progress, interlocked
access to any other location in memory is not allowed.
6.6.5 Memory Barriers (Alpha Only)
On Alpha systems, there are no implied memory barriers except those performed by the PALcode routines that emulate the interlocked queue instructions. Wherever necessary, you must insert explicit memory barriers into your code to impose an order on references to data shared with threads of execution that could be running on other members of an SMP system. Memory barriers are required to ensure both the order in which other members of an SMP system or an I/O processor see writes to shared data, and the order in which reads of shared data complete.
There are two types of memory barrier: the MB instruction and the instruction memory barrier (IMB) PALcode routine.
The MB instruction guarantees that all subsequent loads and stores do not access memory until after all previous loads and stores have accessed memory from the viewpoint of multiple threads of execution. Alpha compilers provide semantics for generating memory barriers when needed for specific operations on data items.
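The ordering an MB provides can be sketched with C11 fences. This sketch assumes that a compiler implements a sequentially consistent fence with MB on Alpha (and, on I64, with an mf instruction); the variables and functions (shared_data, data_ready, publish, try_consume) are hypothetical. The producer orders its data write before its flag write, and the consumer orders its flag read before its data read, so a consumer that sees the flag also sees the data.

```c
#include <stdatomic.h>

static int shared_data;          /* ordinary shared data           */
static atomic_int data_ready;    /* flag that publishes shared_data */

/* Producer: write the data, then the flag, with a barrier between. */
static void publish(int value)
{
    shared_data = value;
    atomic_thread_fence(memory_order_seq_cst);   /* data before flag */
    atomic_store_explicit(&data_ready, 1, memory_order_relaxed);
}

/* Consumer: returns 1 and stores the data in *out once it has been
   published; returns 0 if nothing has been published yet. */
static int try_consume(int *out)
{
    if (atomic_load_explicit(&data_ready, memory_order_relaxed) == 0)
        return 0;
    atomic_thread_fence(memory_order_seq_cst);   /* flag before data */
    *out = shared_data;
    return 1;
}
```

Without either fence, another processor could observe the flag set while still reading the old contents of shared_data.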
Code that modifies the instruction stream must be changed to synchronize the old and new instruction streams properly. Use of an REI instruction to accomplish this does not work on OpenVMS Alpha systems.
The instruction memory barrier (IMB) PALcode routine must be used after a modification to the instruction stream to flush prefetched instructions. It also provides the same ordering effects as the MB instruction.
If a kernel mode code sequence changes the expected instruction stream, it must issue an IMB instruction after changing the instruction stream and before the time the change is executed. For example, if a device driver stores an instruction sequence in an extension to the unit control block (UCB) and then transfers control there, it must issue an IMB instruction after storing the data in the UCB but before transferring control to the UCB data.
The MACRO-32 compiler for OpenVMS Alpha provides the EVAX_IMB built-in
to explicitly insert an IMB instruction in the instruction stream.
6.6.6 Memory Fences (I64 Only)
The I64 memory fence (mf) instruction causes all memory operations
before the mf instruction to complete before any memory operations
after the mf instruction are allowed to begin. Fence instructions
combine the release and acquire semantics into a bidirectional fence;
that is, they guarantee that all previous orderable instructions are
made visible prior to any subsequent orderable instruction being made
visible.
6.6.7 PALcode Routines (Alpha Only)
Privileged architecture library (PALcode) routines include Alpha
instructions that emulate VAX queue and interlocked queue instructions.
PALcode executes in a special environment with interrupts blocked. This
feature results in noninterruptible updates. A PALcode routine can
perform the multiple memory reads and memory writes that insert or
remove a queue element without interruption.
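The idea of making the several reads and writes of a queue update appear as one indivisible operation can be sketched in C11. This sketch uses a simple spinlock as a stand-in for PALcode's noninterruptible, interlocked environment; the types and functions (struct queue, insert_head, remove_head) are hypothetical and do not use the VAX queue formats.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical doubly linked queue; the header is itself an entry. */
struct qentry {
    struct qentry *flink;   /* forward link  */
    struct qentry *blink;   /* backward link */
    int value;
};

struct queue {
    struct qentry head;     /* empty queue: links point to the header */
    atomic_int lock;        /* 0 = free, 1 = update in progress       */
};

static void queue_init(struct queue *q)
{
    q->head.flink = q->head.blink = &q->head;
    atomic_init(&q->lock, 0);
}

static void queue_lock(struct queue *q)
{
    /* Spin until this caller changes the lock from 0 to 1. */
    while (atomic_exchange_explicit(&q->lock, 1, memory_order_acquire) != 0)
        ;
}

static void queue_unlock(struct queue *q)
{
    atomic_store_explicit(&q->lock, 0, memory_order_release);
}

/* Insert e at the head; returns 1 if the queue was empty beforehand. */
static int insert_head(struct queue *q, struct qentry *e)
{
    queue_lock(q);
    int was_empty = (q->head.flink == &q->head);
    /* Four link updates that no other locker can see half-done. */
    e->flink = q->head.flink;
    e->blink = &q->head;
    q->head.flink->blink = e;
    q->head.flink = e;
    queue_unlock(q);
    return was_empty;
}

/* Remove and return the head entry, or NULL if the queue is empty. */
static struct qentry *remove_head(struct queue *q)
{
    queue_lock(q);
    struct qentry *e = q->head.flink;
    if (e == &q->head) {
        e = NULL;
    } else {
        q->head.flink = e->flink;
        e->flink->blink = &q->head;
    }
    queue_unlock(q);
    return e;
}
```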
6.6.8 I64 Emulation of PALcode Built-ins
The VAX interlocked queue instructions work unchanged on OpenVMS I64 systems: they result in calls to the SYS$PAL_xxxx run-time routines, the PALcode equivalents, which incorporate the necessary interlocks and memory barriers.
Whenever possible, the OpenVMS I64 BLISS, C, and MACRO compilers convert CALL_PAL macros to the equivalent OpenVMS-provided SYS$PAL_xxxx operating system calls for backward compatibility.
The BLISS compiler compiles each of the queue manipulation PALcode built-ins into SYS$PAL_xxxx system service requests.
Refer to Porting Applications from HP OpenVMS Alpha to HP OpenVMS Industry Standard 64 for Integrity Servers for complete information on the BLISS implementation.
6.7 Software-Level Synchronization
The operating system uses the synchronization primitives provided by
the hardware as the basis for several different synchronization
techniques. The following sections summarize the operating system's
synchronization techniques available to application software.
6.7.1 Synchronization Within a Process
On Alpha and I64 systems without kernel threads, only one thread of execution can execute within a process at a time, so synchronization of threads that execute simultaneously is not a concern. However, the delivery of an AST or the occurrence of an exception can interrupt a sequence of instructions in one thread of execution. Because these conditions can occur, application design must take into account the need for synchronization with condition handlers and AST procedures.
On Alpha systems without the byte-word extension, writing bytes or words, or performing any read-modify-write operation, requires a sequence of Alpha instructions. If the sequence is interrupted by AST delivery or an exception, another code thread in the process can run. If that thread accesses the same data, it can read incompletely written data or cause data corruption. Aligning data on natural boundaries and unpacking word and byte data reduce this risk.
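To see how an interrupted byte write corrupts data, the following C sketch simulates a byte store on a machine that can only load and store whole words: load the containing word, merge in the byte, store the word back. The hook parameter is a hypothetical stand-in for an AST or exception handler that runs between the load and the store; a 32-bit word is used for brevity, and all names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an AST or handler that interrupts the sequence. */
typedef void (*interrupt_hook)(uint32_t *word);

/* Byte store built from a whole-word read-modify-write sequence. */
static void store_byte_rmw(uint32_t *word, unsigned byte_index,
                           uint8_t value, interrupt_hook hook)
{
    unsigned shift = byte_index * 8u;
    uint32_t w = *word;                  /* 1. load the containing word  */
    if (hook != NULL)
        hook(word);                      /* 2. interruption occurs here  */
    w = (w & ~(UINT32_C(0xFF) << shift)) /* 3. merge the new byte        */
        | ((uint32_t)value << shift);
    *word = w;                           /* 4. store the whole word back */
}

/* Handler that writes a different byte (byte 0) of the same word. */
static void ast_writes_byte0(uint32_t *word)
{
    store_byte_rmw(word, 0, 0xAA, NULL);
}
```

When ast_writes_byte0 runs between steps 1 and 4, the mainline stores a stale copy of byte 0 and the handler's write is lost. If the handler's byte lived in a word of its own (unpacked data), the mainline store could not overwrite it.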
On Alpha and I64 systems, an application written in a language other than MACRO-32 must identify to the compiler data accessed by any combination of mainline code, AST procedures, and condition handlers to ensure that the compiler generates code that is atomic with respect to other threads. Also, data shared with other processes must be identified.
With process-private data accessed from both AST and non-AST threads of execution, the non-AST thread can block AST delivery by using the Set AST Enable (SYS$SETAST) system service. If the code is running in kernel mode, it can also raise IPL to block AST delivery. The Guide to Creating OpenVMS Modular Procedures describes the concept of AST reentrancy.
On a uniprocessor or in a symmetric multiprocessing (SMP) system,
access to multiple locations with a read or write instruction or with a
read-modify-write sequence is not atomic on OpenVMS systems. Additional
synchronization methods are required to control access to the data. See
Section 6.7.4 and the sections following it, which describe the use of
higher-level synchronization techniques.
6.7.2 Synchronization in Inner Mode (Alpha and I64 Only)
On Alpha and I64 systems with kernel threads, the system allows multiple execution contexts, or threads, within a process to run simultaneously, all sharing the same address space. The synchronization provided by spinlocks continues to allow thread-safe access to process data structures such as the process control block (PCB). However, exclusive access to process address space, and to any structures not currently synchronized explicitly with spinlocks, is no longer guaranteed solely by access mode. In the multithreaded environment, a new process-level synchronization mechanism is required.
Because spinlocks operate on a systemwide level and do not offer the process-level granularity required for inner-mode access synchronization in a multithreaded environment, a process-level semaphore is necessary to serialize inner-mode (kernel and executive) access. User-mode and supervisor-mode threads are allowed to run without any required synchronization.
The process-level semaphore for inner-mode synchronization is the inner mode (IM) semaphore. For each process, the IM semaphore is created in the first floating-point registers and execution data block (FRED) page in the process's balance set slot. In a multithreaded environment, a thread requiring inner-mode access acquires ownership of the IM semaphore. That is, in general, two threads associated with the same process cannot execute in inner mode simultaneously. If the semaphore is owned by another thread, the requesting thread spins until inner-mode access becomes available or until a specified timeout value has expired.
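The spin-until-available-or-timeout acquisition described above can be sketched with C11 atomics. This is an illustration under stated assumptions, not the OpenVMS implementation: struct im_semaphore, im_acquire, and im_release are hypothetical names, the owner is recorded as a nonzero thread identifier, and the timeout is modeled as a maximum number of spin attempts rather than elapsed time.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-process semaphore: 0 = free, else owning thread id. */
struct im_semaphore {
    atomic_int owner;
};

/* Spin until the semaphore is acquired or max_spins attempts fail.
   thread_id must be nonzero. Returns true if ownership was obtained. */
static bool im_acquire(struct im_semaphore *s, int thread_id, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        int expected = 0;   /* succeed only if currently unowned */
        if (atomic_compare_exchange_weak_explicit(&s->owner, &expected,
                thread_id, memory_order_acquire, memory_order_relaxed))
            return true;
    }
    return false;           /* "timeout" expired */
}

static void im_release(struct im_semaphore *s)
{
    atomic_store_explicit(&s->owner, 0, memory_order_release);
}
```

A second thread of the same process calling im_acquire while the first still owns the semaphore spins and then gives up, which mirrors the rule that two threads of one process do not run in inner mode at the same time.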