The following recommendations apply to aligning data:
On Alpha systems, the VAX Environment Software Translator (VEST) utility, a component of DECmigrate for OpenVMS Alpha, translates binary OpenVMS VAX image files into OpenVMS Alpha image files. Image files are also called executable files. VEST is similar to a compiler, but it translates binaries instead of sources.
VEST deals with alignment in two different modes: pessimistic and optimistic. VEST is optimistic by default; but whether optimistic or pessimistic, the alignment of program counter (PC) relative data is known at translation time, and the appropriate instruction sequence can be generated.
In pessimistic mode, all non-PC-relative references are treated as unaligned, using the safe access sequences. In optimistic mode, the emulated VAX registers (R0-R14) are assumed to be quadword aligned upon entry to each basic block. Autoincrement and autodecrement changes to the base registers are tracked. The offset plus the base register alignment determines the alignment of each reference, and the appropriate access sequence is generated.
The /OPTIMIZE=NOALIGN qualifier on the VEST command tells VEST to be pessimistic: it assumes that base registers are not aligned and generates the safe instruction sequences. If there are no unaligned data references, this can slow execution by a factor of two or more. On the other hand, it can result in a performance gain if there are a significant number of unaligned references, because the safe sequences avoid unaligned data traps.
Additional controls preserve atomicity in longword data that is not
naturally aligned. Wherever possible, data should be aligned in the VAX
source code and the image rebuilt before translating the image with
DECmigrate. This results in better performance on both VAX and Alpha
systems.
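One simple way to improve alignment at the source level is to order structure members from largest to smallest so that each member falls on its natural boundary. The following C sketch (the structure and member names are hypothetical) illustrates the idea; the same principle applies whether the code is compiled on VAX or Alpha.

#include <stdio.h>
#include <stddef.h>

/* Mixing small and large members this way forces the compiler either to
   insert padding or, if the structure is packed, to place count and total
   on unaligned boundaries. */
struct unfriendly {
    char   flag;       /* 1 byte                                  */
    int    count;      /* wants a longword (4-byte) boundary      */
    char   tag;        /* 1 byte                                  */
    double total;      /* wants a quadword (8-byte) boundary      */
};

/* Ordering members from largest to smallest keeps every member naturally
   aligned and usually shrinks the structure as well. */
struct friendly {
    double total;
    int    count;
    char   flag;
    char   tag;
};

int main(void)
{
    printf("unfriendly: size %d, offset of total %d\n",
           (int) sizeof(struct unfriendly),
           (int) offsetof(struct unfriendly, total));
    printf("friendly:   size %d, offset of total %d\n",
           (int) sizeof(struct friendly),
           (int) offsetof(struct friendly, total));
    return 0;
}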
15.3 Using Tools for Finding Unaligned Data
Tools that help you find unaligned data include the OpenVMS
Debugger, the Performance and Coverage Analyzer (PCA), and eight system
services. These tools are discussed in the following sections.
15.3.1 The OpenVMS Debugger
With the OpenVMS Debugger, you can turn unaligned data exception breakpoints on and off by using the SET BREAK/UNALIGNED_DATA and CANCEL BREAK/UNALIGNED_DATA commands. These commands must be used with the SET BREAK/EXCEPTION command. When the debugger breaks at an unaligned data exception, the context is the same as for any other exception: you can examine the program counter (PC), the processor status (PS), and the virtual address of the unaligned reference. Example 15-1 shows the debugger output for a simple program, captured with the SET OUTPUT LOG command.
Example 15-1 OpenVMS Debugger Output from SET OUTPUT LOG Command
#include <stdio.h>
#include <stdlib.h>

main( )
{
    char *p;
    long *lp;

    /* malloc returns at least quadword aligned pointer */
    p = (char *)malloc( 32 );

    /* construct unaligned longword pointer and place into lp */
    lp = (long *)((char *)(p+1));

    /* load data into unaligned longword */
    lp[0] = 123456;
    printf( "data - %d\n", lp[0] );
    return;
}

------- Compile and Link commands -------

$ cc/debug debug_example
$ link/debug debug_example
$ run debug_example

------- DEBUG session using set output log -------

Go
   ! break at routine DEBUG_EXAMPLE\main
   !   598:     p = (char *)malloc( 32 );
set break/unaligned_data
set break/exception
set radix hexadecimal
Go
   !Unaligned data access: virtual address = 003CEEA1, PC = 00020048
   !break on unaligned data trap preceding DEBUG_EXAMPLE\main\%LINE 602
   !   602:     printf( "data - %d\n", lp[0] );
ex/inst 00020048-4
   !DEBUG_EXAMPLE\main\%LINE 600+4:   STL   R1,(R0)
ex r0
   !DEBUG_EXAMPLE\main\%R0:  00000000 003CEEA1
15.3.2 The Performance and Coverage Analyzer (PCA)
The PCA allows you to detect and fix performance problems. Because unaligned data handling can significantly increase overhead, PCA can collect and present information on unaligned data exceptions, and it provides commands for collecting and displaying them. PCA can also display the data by the PC of the fault or by the virtual address of the unaligned data.
15.3.3 System Services (Alpha and I64 Only)
On Alpha and I64 systems, there are eight system services to help locate unaligned data. The first three system services establish temporary image reporting; the next two provide process-permanent reporting; and the last three provide systemwide alignment fault tracking. The symbols used in calling all eight of these system services are located in $AFRDEF in the Macro-32 library, SYS$LIBRARY:STARLET.MLB. You can also call these system services in C with #include <afrdef.h>.
The first three system services can be used together; they report on the currently executing image. They are SYS$START_ALIGN_FAULT_REPORT, SYS$STOP_ALIGN_FAULT_REPORT, and SYS$GET_ALIGN_FAULT_DATA.
You can use two of the eight system services to report unaligned data exceptions for the current process: SYS$PERM_REPORT_ALIGN_FAULT and SYS$PERM_DIS_ALIGN_FAULT_REPORT.
The three system services that allow you to track systemwide alignment faults are SYS$INIT_SYS_ALIGN_FAULT_REPORT, SYS$GET_SYS_ALIGN_FAULT_DATA, and SYS$STOP_SYS_ALIGN_FAULT_REPORT.
These services require CMKRNL privilege. Alignment faults for all modes
and all addresses can be reported using these services. The user can
also set up masks to report only certain types of alignment faults. For
example, you can get reports on only kernel modes, only user PC, or
only data in system space.
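The following C sketch shows, for illustration only, the general calling pattern for image-level reporting using the buffered method. The argument lists shown for the services and the AFR$ constants used here are assumptions and should be verified against afrdef.h and the HP OpenVMS System Services Reference Manual before use.

/* Hedged sketch of image-level alignment fault reporting (buffered method).
   The service prototypes and AFR$ constants used here are assumptions;
   check afrdef.h and the System Services Reference Manual. */
#include <stdio.h>
#include <stdlib.h>
#include <afrdef.h>       /* AFR$ symbols for the alignment fault services */
#include <starlet.h>      /* sys$ system service prototypes               */

int main(void)
{
    char buffer[1024];    /* receives the alignment fault records          */
    unsigned int retlen;  /* assumed: length of fault data returned        */
    int  status;
    char *p  = malloc(32);
    int  *lp = (int *)(p + 1);    /* deliberately unaligned pointer        */

    /* Assumed form: reporting method, buffer, buffer size. */
    status = sys$start_align_fault_report(AFR$C_BUFFERED, buffer,
                                          sizeof(buffer));
    if (!(status & 1))
        exit(status);

    lp[0] = 123456;               /* unaligned store: generates one fault  */

    /* Assumed form: buffer, buffer size, address of returned length. */
    status = sys$get_align_fault_data(buffer, sizeof(buffer), &retlen);
    if (status & 1)
        printf("collected %u bytes of alignment fault data\n", retlen);

    /* Assumed here to take no arguments. */
    sys$stop_align_fault_report();
    return 0;
}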
15.3.4 Alignment Fault Utility (Alpha and I64 Only)
You can use the Alignment Fault Utility (FLT) to find alignment faults. This utility can be started and stopped on the fly, without a system reboot. It records all alignment faults in a ring buffer, which can be sized when alignment fault tracing is started. The summary screen displays the results, sorted by the program counter (PC) values that have incurred the most alignment faults. The detailed trace output also shows the process identification (PID) of the process that caused each alignment fault, along with the virtual address that triggered it. The following example shows sample summary output.
$ ANALYZE/SYSTEM
SDA> FLT LOAD
SDA> FLT START TRACE
SDA> FLT SHOW TRACE /SUMMARY

Fault Trace Information: (at 18-AUG-2004 04:49:58.61, trace time 00:00:45.229810)
---------------------------------------------------------------------------------

 Exception PC        Count   Exception PC             Module     Offset
 -----------------   ------  -----------------------  ---------  --------
 FFFFFFFF.80B25621     1260  SECURITY+1B021           SECURITY   0001B021
 FFFFFFFF.80B25641     1260  SECURITY+1B041           SECURITY   0001B041
 FFFFFFFF.80B25660     1260  SECURITY+1B060           SECURITY   0001B060
 FFFFFFFF.80B25671     1260  SECURITY+1B071           SECURITY   0001B071
 FFFFFFFF.80B25691     1260  SECURITY+1B091           SECURITY   0001B091
 FFFFFFFF.80B39330     1243  NSA$SIZE_NSAB_C+00920    SECURITY   0002ED30
 FFFFFFFF.807273A1     1144  LOCKING+271A1            LOCKING    000271A1
 FFFFFFFF.807273D1     1144  LOCKING+271D1            LOCKING    000271D1
 FFFFFFFF.80B25631     1131  SECURITY+1B031           SECURITY   0001B031
 FFFFFFFF.80B25661     1131  SECURITY+1B061           SECURITY   0001B061
 FFFFFFFF.80B25600     1131  SECURITY+1B000           SECURITY   0001B000
 FFFFFFFF.80B25650     1131  SECURITY+1B050           SECURITY   0001B050
 FFFFFFFF.80B25680     1131  SECURITY+1B080           SECURITY   0001B080
 FFFFFFFF.84188930      999  LIBRTL+00158930          LIBRTL     00158930
 FFFFFFFF.80A678E0      991  RMS+001D4EE0             RMS        001D4EE0
 FFFFFFFF.841888A0      976  LIBRTL+001588A0          LIBRTL     001588A0
 FFFFFFFF.80B25AE0      392  EXE$TLV_TO_PSB_C+003B0   SECURITY   0001B4E0
 FFFFFFFF.80B26870      392  SECURITY+1C270           SECURITY   0001C270
 FFFFFFFF.80B256F0      360  SECURITY+1B0F0           SECURITY   0001B0F0
 FFFFFFFF.80B25AC0      336  EXE$TLV_TO_PSB_C+00390   SECURITY   0001B4C0
 FFFFFFFF.80B25EF0      336  EXE$TLV_TO_PSB_C+007C0   SECURITY   0001B8F0
 FFFFFFFF.80B256E0      326  SECURITY+1B0E0           SECURITY   0001B0E0
 [...............]

SDA> FLT STOP TRACE
SDA> FLT UNLOAD
Chapter 16
Memory Management with VLM Features
OpenVMS Alpha and OpenVMS I64 very large memory (VLM) features for memory management provide extended support for database, data warehouse, and other very large database (VLDB) products. The VLM features enable database products and data warehousing applications to realize increased capacity and performance gains.
By using the extended VLM features, application programs can create large, in-memory global data caches that do not require an increase in process quotas. These large memory-resident global sections can be mapped with shared global pages to dramatically reduce the system overhead required to map large amounts of memory.
This chapter describes the following OpenVMS Alpha and OpenVMS I64 memory management VLM features: memory-resident global sections, Fast I/O and buffer objects for global sections, shared page tables, the expandable global page table, and the Reserved Memory Registry.
To see an example program that demonstrates many of these VLM features,
refer to Appendix C.
16.1 Overview of VLM Features
Memory-resident global sections allow a database server to keep larger amounts of hot data cached in physical memory. The database server then accesses the data directly from physical memory without performing I/O read operations from the database files on disk. With faster access to the data in physical memory, run-time performance increases dramatically.
Fast I/O reduces CPU costs per I/O request, which increases the performance of database operations. Fast I/O requires data to be locked in memory through buffer objects. Buffer objects can be created for global pages, including pages in memory-resident sections.
Shared page tables allow that same database server to reduce the amount of physical memory consumed within the system. Because multiple server processes share the same physical page tables that map the large database cache, an OpenVMS Alpha or OpenVMS I64 system can support more server processes. This increases overall system capacity and decreases response time to client requests.
Shared page tables dramatically reduce the database server startup time because server processes can map memory-resident global sections hundreds of times faster than traditional global sections. With a multiple gigabyte global database cache, the server startup performance gains can be significant.
The system parameters GBLPAGES and GBLPAGFIL are dynamic parameters. Users with the CMKRNL privilege can now change these parameter values on a running system. Increasing the value of the GBLPAGES parameter allows the global page table to expand, on demand, up to the new maximum size.
The Reserved Memory Registry supports memory-resident
global sections and shared page tables. Through its interface within
the SYSMAN utility, the Reserved Memory Registry allows an OpenVMS
system to be configured with large amounts of memory set aside for use
within memory-resident sections or other privileged code. The Reserved
Memory Registry also allows an OpenVMS system to be properly tuned
through AUTOGEN, thus accounting for the preallocated reserved memory.
For information about using the reserved memory registry, see the
HP OpenVMS System Manager's Manual.
16.2 Memory-Resident Global Sections
Memory-resident global sections are non-file-backed global sections. This means that the pages within a memory-resident global section are not backed by the pagefile or by any other file on disk. Thus, no pagefile quota is charged to any process or charged to the system. When a process maps to a memory-resident global section and references the pages, working set list entries are not created for the pages. No working set quota is charged to the process.
Pages within a memory-resident global demand zero (DZRO) section initially have zero contents.
Creating a memory-resident global DZRO section is performed by calling either the SYS$CREATE_GDZRO system service or the SYS$CRMPSC_GDZRO_64 system service.
Mapping to a memory-resident global DZRO section is performed by calling either the SYS$CRMPSC_GDZRO_64 system service or the SYS$MGBLSC_64 system service.
To create a memory-resident global section, the process must have been granted the VMS$MEM_RESIDENT_USER rights identifier. Mapping to a memory-resident global section does not require this rights identifier.
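As a rough sketch only, creating a named memory-resident global DZRO section from C might look like the following. The section name and size are hypothetical, and the argument list shown for SYS$CREATE_GDZRO is an assumption to be verified against the HP OpenVMS System Services Reference Manual; the complete, supported example program is in Appendix C.

/* Hedged sketch: creating a memory-resident global DZRO section.
   The sys$create_gdzro argument list shown here is an assumption;
   verify it against the System Services Reference Manual.  See the
   Appendix C example program for the complete, supported version. */
#include <stdio.h>
#include <descrip.h>      /* $DESCRIPTOR                      */
#include <starlet.h>      /* sys$ system service prototypes   */

int main(void)
{
    /* Hypothetical section name and size. */
    $DESCRIPTOR(gs_name, "MY_VLM_CACHE");
    unsigned __int64 length = 128 * 1024 * 1024;   /* 128 MB */
    int status;

    /* Assumed form: name, version ident, protection, length, access
       mode, flags.  Zeros are placeholders for the version ident,
       protection, access mode, and flags; the caller must hold the
       VMS$MEM_RESIDENT_USER rights identifier. */
    status = sys$create_gdzro(&gs_name, 0, 0, length, 0, 0);
    if (!(status & 1))
        printf("sys$create_gdzro failed, status = %d\n", status);
    return 0;
}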
Two options are available when creating a memory-resident global DZRO section: the fault option and the allocate option.
To use the fault option, it is recommended, but not required, that the pages within the memory-resident global section be deducted from the system's fluid page count through the Reserved Memory Registry.
Using the Reserved Memory Registry ensures that AUTOGEN tunes the system properly to exclude memory-resident global section pages in its calculation of the system's fluid page count. AUTOGEN sizes the system pagefile, number of processes, and working set maximum size based on the system's fluid page count.
If the memory-resident global section has not been registered through the Reserved Memory Registry, the system service call fails if there are not enough fluid pages left in the system to accommodate the memory-resident global section.
If the memory-resident global section has been registered through the Reserved Memory Registry, the system service call fails if the size of the global section exceeds the size of reserved memory and there are not enough fluid pages left in the system to accommodate the additional pages.
If memory has been reserved using the Reserved Memory Registry, that memory must be used for the global section named in the SYSMAN command. To return the memory to the system, SYSMAN can be run to free the reserved memory, thus returning the pages back into the system's count of fluid pages.
If the name of the memory-resident global section is not known at boot time, or if a large amount of memory is to be configured out of the system's pool of fluid memory, entries in the Reserved Memory Registry can be added and the system can be retuned with AUTOGEN. After the system reboots, the reserved memory can be freed for use by any application in the system with the VMS$MEM_RESIDENT_USER rights identifier. This technique increases the availability of fluid memory for use within memory-resident global sections without committing to which applications or named global sections will receive the reserved memory.
To use the allocate option, the memory must be preallocated during system initialization to ensure that contiguous, aligned physical pages are available. OpenVMS attempts to allow the use of granularity hints, so that in many or even most cases, preallocated memory-resident sections are physically contiguous. However, on systems supporting resource affinity domains (RADs), for example, OpenVMS intentionally tries to "stripe" memory across all RADs unless told to use only a single RAD. Granularity hints can be used when mapping to the memory-resident global section if the virtual alignment of the mapping is on an even 8-page, 64-page, or 512-page boundary. (With a system page size of 8 KB, granularity hint virtual alignments are on 64-KB, 512-KB, and 4-MB boundaries.) The maximum granularity hint on Alpha and I64 covers 512 pages; with 8-KB pages, this is 4 MB. If your selection is below this limit, there is an excellent chance that it will be contiguous. Currently, however, there is no guarantee of contiguousness for application software. OpenVMS chooses an optimal virtual alignment to use granularity hints if the flag SEC$M_EXPREG is set on the call to one of the mapping system services, such as SYS$MGBLSC.
Sufficiently contiguous, aligned PFNs are reserved using the Reserved Memory Registry. These pages are allocated during system initialization, based on the description of the reserved memory. The memory-resident global section size must be less than or equal to the size of the reserved memory or an error is returned from the system service call.
If memory has been reserved using the Reserved Memory Registry, that
memory must be used for the global section named in the SYSMAN command.
To return the memory to the system, SYSMAN can be run to free the
prereserved memory. Once the prereserved memory has been freed, the
allocate option can no longer be used to create the memory-resident
global section.
16.3 Fast I/O and Buffer Objects for Global Sections
VLM applications can use Fast I/O for memory shared by processes through global sections. Fast I/O requires data to be locked in memory through buffer objects. Database applications in which multiple processes share a large cache can create buffer objects for global section pages, including pages in memory-resident global sections.
Buffer objects enable Fast I/O system services, which can be used to read and write very large amounts of shared data to and from I/O devices at an increased rate. By reducing the CPU cost per I/O request, Fast I/O increases performance for I/O operations.
Fast I/O improves the ability of VLM applications, such as database
servers, to handle larger capacities and higher data throughput rates.
16.3.1 Comparison of $QIO and Fast I/O
The $QIO system service must ensure that a specified memory range exists and is accessible for the duration of each direct I/O request. Validating that the buffer exists and is accessible is done in an operation called probing. Making sure that the buffer cannot be deleted and that the access protection is not changed while the I/O is still active is achieved by locking the memory pages for I/O and by unlocking them at I/O completion.
The probing and locking/unlocking operations for I/O are costly. Having to do this work for each I/O request can consume a significant percentage of CPU capacity. The advantage of the per-request $QIO approach is that memory is locked only for the duration of a single I/O and can otherwise be paged.
Fast I/O must still ensure that the buffer is available, but if many I/O requests are performed from the same memory cache, performance can increase if the cache is probed and locked only once, instead of once for each I/O. OpenVMS must then ensure only that the memory access is unchanged between multiple I/Os. Fast I/O uses buffer objects to achieve this goal. Fast I/O gains additional performance advantages by preallocating some system resources and by streamlining the I/O flow in general.
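As an illustration, and assuming a hypothetical heap buffer in place of a mapped global section, the following C sketch shows a buffer object being created over the memory that Fast I/O will use. The argument list shown for SYS$CREATE_BUFOBJ_64 is an assumption to be verified against the HP OpenVMS System Services Reference Manual; the returned handle would then be passed to the Fast I/O services (SYS$IO_SETUP, SYS$IO_PERFORM, SYS$IO_CLEANUP) so that the pages are probed and locked once rather than on every request.

/* Hedged sketch: creating a buffer object over memory to be used for
   Fast I/O.  The sys$create_bufobj_64 argument list shown here is an
   assumption; verify it against the System Services Reference Manual. */
#include <stdio.h>
#include <stdlib.h>
#include <starlet.h>      /* sys$ system service prototypes */

int main(void)
{
    /* Hypothetical I/O cache; a VLM application would typically use a
       mapped global section rather than heap memory. */
    unsigned __int64 cache_len = 64 * 1024;
    void *cache = malloc(cache_len);

    void *ret_va;                  /* address range actually covered */
    unsigned __int64 ret_len;      /* length actually covered        */
    unsigned __int64 bufhandle;    /* buffer object handle           */
    int status;

    /* Assumed form: start address, length, access mode, flags,
       return address, return length, buffer handle. */
    status = sys$create_bufobj_64(cache, cache_len, 0, 0,
                                  &ret_va, &ret_len, &bufhandle);
    if (!(status & 1))
        printf("sys$create_bufobj_64 failed, status = %d\n", status);

    /* The buffer handle would now be passed to the Fast I/O services
       (sys$io_setup, sys$io_perform, sys$io_cleanup), so the buffer is
       probed and locked once instead of on every I/O request. */
    return 0;
}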