Document revision date: 30 March 2001
This chapter explains in detail the following topics:
Descriptor Names and Field Names
In this chapter and throughout this manual it is generally the practice
to use only the main part of a descriptor name or a descriptor field
name, without the 32-bit or 64-bit prefix used in the actual code. For
example, the length field is referred to using LENGTH rather than by
mentioning both DSC$W_LENGTH and DSC64$Q_LENGTH. The complete
descriptor or field name, including the prefix, is used only when
referring to one particular form of the descriptor.
2.1 String Semantics in the Run-Time Library
The semantics of a string refers to the conventions
that determine how a string is stored, written, and read. The Alpha and
VAX architectures support three string semantics: fixed length, varying
length, and dynamic length.
2.1.1 Fixed-Length Strings
Fixed-length strings have the following attributes:
The length of a fixed-length string is constant. It is usually initialized when the program is compiled or linked. After initialization, this length is read but never written. When a Run-Time Library routine copies a source string into a longer fixed-length destination string, the routine pads the destination string with trailing blanks.
When you pass a string to a Run-Time Library routine, you pass the string by descriptor. For a fixed-length string, the descriptor must contain this information:
In most cases, you do not have to construct an actual descriptor. By default, most OpenVMS Alpha and OpenVMS VAX languages pass strings by descriptor. For information about how the language you are using handles strings, see your language reference manual. For more information about descriptors used for fixed-length strings, refer to OpenVMS Programming Interfaces: Calling a System Routine [1].
In contrast to Run-Time Library routines, system services do not pad output strings. For this reason, when a program calls a system service that returns a fixed-length string, the program should supply an additional argument that indicates how many bytes the system service actually deposited in the fixed-length buffer of the calling program. Some system service routines have corresponding Run-Time Library routines that provide the proper semantics for fixed-length, varying-length, and dynamic output strings.
2.1.2 Varying-Length Strings
Varying-length strings have the following attributes:
The current length, in bytes, of a varying-length string is stored in a two-byte field, called CURLEN, preceding the text of the string. The address of the string points to the beginning of this CURLEN field, not to the beginning of the string's text.
The maximum string length is a field in the string's descriptor. This field specifies how much space is allocated to the string in a program. The maximum string length is fixed and does not change.
The value in the CURLEN field specifies how many bytes beyond the CURLEN field are occupied by the string's text. The character positions beyond this range are reserved for the growth of the string. Their contents are undefined.
For example, assume a varying string whose CURLEN is 3 and whose maximum length is 6. If a string 'ABCD' is copied into this string, the result is 'ABCD' and the CURLEN is changed to 4. If a string 'XYZ' is now copied into the same varying string, the resulting string is 'XYZ' with a CURLEN of 3. The maximum length is still 6. The bytes beyond the range designated by CURLEN are undefined.
For varying-length strings pointed to by both 32-bit and 64-bit
descriptors, CURLEN is a two-byte field. Because of this, the maximum
length of a varying-length string is limited to 2^16 - 1, or
65,535, characters.
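The CURLEN behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not Run-Time Library code: the class name `VaryingString` and its methods are hypothetical, the little-endian byte order of CURLEN is an assumption, and truncating a source longer than the maximum length is also an assumption of the sketch.

```python
import struct


class VaryingString:
    """Hypothetical model of a varying string: 2-byte CURLEN, then the text area."""

    def __init__(self, maxstrlen: int):
        # CURLEN occupies two bytes, so the maximum length must fit in 16 bits.
        if not 0 <= maxstrlen <= 0xFFFF:
            raise ValueError("maximum length must be between 0 and 65,535")
        self.maxstrlen = maxstrlen
        # The string address points at CURLEN; the text starts two bytes later.
        self.body = bytearray(2 + maxstrlen)

    @property
    def curlen(self) -> int:
        # Assumption: CURLEN is stored as a little-endian unsigned word.
        return struct.unpack_from("<H", self.body, 0)[0]

    def copy_in(self, source: bytes) -> None:
        # Assumption: a source longer than the maximum length is truncated.
        n = min(len(source), self.maxstrlen)
        self.body[2:2 + n] = source[:n]
        struct.pack_into("<H", self.body, 0, n)

    def text(self) -> bytes:
        # Only the first CURLEN bytes after the CURLEN field are meaningful.
        return bytes(self.body[2:2 + self.curlen])


s = VaryingString(6)
s.copy_in(b"ABCD")   # CURLEN becomes 4
s.copy_in(b"XYZ")    # CURLEN becomes 3; the maximum length is still 6
```

In this model the byte that held 'D' is still present in the text area after the second copy, but it lies beyond CURLEN, so a reader must treat it as undefined.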
2.1.3 Dynamic-Length Strings
Dynamic-length strings have the following attributes:
Theoretically, dynamic strings have unbounded length. However, the descriptor LENGTH field contains the length of the string as an unsigned value. This effectively limits the maximum length of the string to the maximum unsigned integer value this field can hold.
For 32-bit dynamic descriptors, the LENGTH field is an unsigned value occupying two bytes. Because its maximum value is 2^16 - 1, or 65,535, the maximum length of a string is limited to 65,535 characters.
On Alpha systems, the LENGTH field of a 64-bit dynamic descriptor is an unsigned value occupying eight bytes. Because its maximum value is 2^64 - 1, the maximum length of a string is 2^64 - 1 characters.
The actual space for a dynamic-length string is allocated from heap storage by the Run-Time Library. When a Run-Time Library routine copies a character string into a dynamic string, and the currently allocated heap storage is not large enough to contain the string, the currently allocated storage returns to a pool of heap storage maintained by the string routines. Then the string routines obtain a new area of the correct size. As a result of this process of deallocation and reallocation, both the current-length field and the address portion of the string's descriptor may change. Often, dynamic strings are the most convenient type to write.
The Run-Time Library STR$ routines are the only routines that you should use to alter the length or address of a dynamic string. Do not use LIB$GET_VM or LIB$GET_VM_64 for this purpose.
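The reallocation behavior of dynamic strings can be modeled roughly as follows. This is a sketch under stated assumptions, not the library's implementation: `DynamicString` is a hypothetical class, a Python `bytearray` stands in for heap storage, and the `max_length` parameter models the 16-bit LENGTH field of a 32-bit descriptor.

```python
class DynamicString:
    """Hypothetical stand-in for a dynamic descriptor and its heap storage."""

    def __init__(self):
        self.length = 0       # models the LENGTH field
        self.storage = None   # models the area addressed by the POINTER field

    def copy_in(self, source: bytes, max_length: int = 0xFFFF) -> None:
        if len(source) > max_length:
            raise ValueError("string does not fit in the LENGTH field")
        if self.storage is None or len(self.storage) < len(source):
            # Current storage is too small: give it up and obtain a new
            # area of the correct size, so the "address" changes.
            self.storage = bytearray(len(source))
        self.storage[:len(source)] = source
        self.length = len(source)

    def text(self) -> bytes:
        return bytes(self.storage[:self.length]) if self.storage else b""


d = DynamicString()
d.copy_in(b"ABC")
first_area = d.storage
d.copy_in(b"ABCDEF")   # too large for the first area: storage is replaced
```

After the second copy both the length and the storage object differ from their earlier values, which is why a caller should never cache the address or length of a dynamic string across a call that may write to it.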
The following examples illustrate what happens when the string 'ABCDEF' (of length 6) is copied into various destination strings:
Length of output string | Result |
---|---|
10 | 'ABCDEF    ' (padded with four trailing blanks) |
6 | 'ABCDEF' |
3 | 'ABC' |
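The padding and truncation shown in these examples can be reproduced with a short Python sketch. The function name `copy_fixed` is hypothetical and only models the fixed-length semantics described in this chapter; it does not call any Run-Time Library routine.

```python
def copy_fixed(source: str, dest_length: int) -> str:
    """Model a copy into a fixed-length destination of dest_length characters."""
    # Truncate a source that is too long, then pad a short one with blanks.
    return source[:dest_length].ljust(dest_length)


for length in (10, 6, 3):
    print(length, repr(copy_fixed("ABCDEF", length)))
```

The destination length never changes: only the contents do, which is why the caller needs a separate length argument if it wants to know how many characters were actually copied.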
[1] This manual has been archived but is available on the OpenVMS Documentation CD-ROM.
2.2 Descriptor Classes and String Semantics
A calling program passes strings to an STR$ routine by descriptor. That is, the argument list entry for an input or output string is actually the address of a string descriptor. All STR$ routines handle both 32-bit and 64-bit descriptors in the argument list.
The calling program allocates a descriptor for the input string that indicates the string's address and length, so that the called routine can find the string's text and operate on it. The calling program also allocates a descriptor for the output string. In addition to length and address fields, each descriptor contains a field (DSC$B_CLASS or DSC64$B_CLASS) indicating the descriptor's class. The STR$ routine reads the class field to determine whether to write the output string as a fixed-length, varying-length, or dynamic string.
To determine the address and length of the data in the input string, Run-Time Library routines call one of the string descriptor analysis routines: LIB$ANALYZE_SDESC, LIB$ANALYZE_SDESC_64, STR$ANALYZE_SDESC, or STR$ANALYZE_SDESC_64.
The STR$ routines provide a centralized facility for analyzing string descriptors, allowing string-handling routines to function independently of the class of the input string. This means that if the Run-Time Library recognizes new string types, only the analysis routine needs to be changed, not the string routines themselves. If you are writing a routine that recognizes all the string types recognized by the Run-Time Library, your routine should first call the appropriate string-descriptor analysis routine to obtain the address and length of the input string.
You can also use the string descriptor analysis routines to find the length of a returned string. Assume that your called routine calls one of the Run-Time Library string-copying routines to create a new string. You now want the called routine to return the actual length of the new string to the calling program. The called routine calls one of the string-descriptor analysis routines to determine this length. This sequence of calls allows you to create the new string without knowing its ultimate length at the time it is created.
The Run-Time Library routines recognize the following classes of string descriptors:
For a detailed description of these descriptor classes and their fields, see the OpenVMS Calling Standard.
Table 2-1 indicates how the Run-Time Library routines access the fields of the descriptor for input and output string arguments. Given the class of the string and the field of the descriptor, the table shows whether the routine reads, writes, or modifies the field.
String Type | Descriptor Field: Class | Descriptor Field: Length | Descriptor Field: Pointer |
---|---|---|---|
Input Argument to Routines | | | |
Input string passed by descriptor | Read | Read | Read |
Output Argument from Routines; Called Routine Assumes the Descriptor Class | | | |
Output string passed by descriptor, fixed-length | Ignored | Read | Read |
Output string passed by descriptor, dynamic | Ignored | Read, can be modified | Read, can be modified |
Output Argument from Routines; Calling Program Specifies the Descriptor Class in the Descriptor | | | |
Output string, fixed-length (descriptor class: S, Z, A, NCA, SD) | Read | Read | Read |
Output string, dynamic (descriptor class: D) | Read | Read, can be modified | Read, can be modified |
Output string, varying-length (descriptor class: VS) | Read | MAXSTRLEN is read; CURLEN is modified | Read |
When a calling program passes a string as an argument to a Run-Time Library routine, the argument contains the address of a descriptor. The called routine examines the CLASS field of the descriptor to determine in which fields it can find the length of the string and the first byte of the string's text. For each descriptor class, Table 2-2 indicates which descriptor fields the routine uses to locate this information. For diagrams of the descriptors, see the OpenVMS Calling Standard manual.
Class | String Length | Address of First Byte of Data |
---|---|---|
Z | DSC$W_LENGTH or DSC64$Q_LENGTH | DSC$A_POINTER or DSC64$PQ_POINTER |
S | DSC$W_LENGTH or DSC64$Q_LENGTH | DSC$A_POINTER or DSC64$PQ_POINTER |
D | DSC$W_LENGTH or DSC64$Q_LENGTH | DSC$A_POINTER or DSC64$PQ_POINTER |
A | DSC$L_ARSIZE or DSC64$Q_ARSIZE | DSC$A_POINTER or DSC64$PQ_POINTER |
SD | DSC$W_LENGTH or DSC64$Q_LENGTH | DSC$A_POINTER or DSC64$PQ_POINTER |
NCA | DSC$L_ARSIZE or DSC64$Q_ARSIZE | DSC$A_POINTER or DSC64$PQ_POINTER |
VS | Word at DSC$A_POINTER or at DSC64$PQ_POINTER (CURLEN field) | Value of DSC$A_POINTER + 2 or of DSC64$PQ_POINTER + 2 (byte after CURLEN field) |
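The class-based lookup in Table 2-2 can be illustrated with a small Python model. This is a sketch, not the logic of LIB$ANALYZE_SDESC itself: the `Descriptor` dataclass, the `analyze` function, and the flat `memory` buffer are all hypothetical, addresses are modeled as offsets into that buffer, and CURLEN is assumed to be a little-endian word.

```python
import struct
from dataclasses import dataclass


@dataclass
class Descriptor:
    cls: str          # "Z", "S", "D", "A", "SD", "NCA", or "VS"
    pointer: int      # POINTER field, modeled as an offset into memory
    length: int = 0   # LENGTH field (MAXSTRLEN for class VS)
    arsize: int = 0   # ARSIZE field, used only by the array classes


def analyze(desc: Descriptor, memory: bytearray) -> tuple[int, int]:
    """Return (string length, address of first data byte) for a descriptor."""
    if desc.cls in ("Z", "S", "D", "SD"):
        return desc.length, desc.pointer
    if desc.cls in ("A", "NCA"):
        return desc.arsize, desc.pointer
    if desc.cls == "VS":
        # The current length is the word at the pointer; text follows it.
        (curlen,) = struct.unpack_from("<H", memory, desc.pointer)
        return curlen, desc.pointer + 2
    raise ValueError(f"unrecognized descriptor class: {desc.cls}")


# A varying string 'XYZ' at offset 0 and a plain string 'HELLO' at offset 8.
memory = bytearray(b"\x03\x00XYZ???HELLO")
print(analyze(Descriptor("VS", pointer=0, length=6), memory))
print(analyze(Descriptor("S", pointer=8, length=5), memory))
```

Keeping the dispatch in one function mirrors the design point made earlier in this section: a string routine that calls the analysis step first does not need to know about individual descriptor classes.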
Normally, Run-Time Library routines return the result of an operation in one of the following ways: first, as a function value; second, as an output argument.
The STR$ routines that produce string results use the first method to pass the results back to the calling program. Because a result string, by definition, does not fit in R0/R1, the function value from an STR$ routine is placed in the first position in the argument list.
The string manipulation routines in the LIB$ and OTS$ facilities use the second method, returning their results as output arguments.
For example, there are three entry points for the string-copying routine: LIB$SCOPY_DXDX, OTS$SCOPY_DXDX, and STR$COPY_DX. These copy the source string to the destination string. Their formats are as follows:
LIB$SCOPY_DXDX(source-string ,destination-string)
OTS$SCOPY_DXDX(source-string ,destination-string)
STR$COPY_DX(destination-string ,source-string)
Because the STR$ entry point places the result string in the first position, you can call STR$COPY_DX using a function reference in languages that support string functions. In Fortran, for example, you can use a function reference to invoke STR$COPY_DX in the following ways:
CHARACTER*80 STR$COPY_DX
RETURN_STATUS = STR$COPY_DX(DESTINATION_STRING, SOURCE_STRING)
or
DESTINATION_STRING = STR$COPY_DX(SOURCE_STRING)
If you use the second form, you cannot access the return status, which is used to indicate truncation.
If you use a function reference to invoke a string manipulation routine in a language that does not support the concept of a string function (such as MACRO, BLISS, and Pascal), you must place the destination string variable in the argument list. In Pascal, for example, you can use a function reference to invoke STR$COPY_DX as follows:
STATUS := STR$COPY_DX(DESTINATION_STRING, SOURCE_STRING);
However, the following statement results in an error:
DESTINATION_STRING := STR$COPY_DX(SOURCE_STRING)
In addition to allocating a variable for the output string, the calling program must allocate the space for and fill in the fields of the output string descriptor at compile, link, or run time. High-level languages do this automatically.
When a Run-Time Library routine returns an output string argument to the calling program, the argument list entry is the address of a descriptor. The routine determines the semantics of the output string (fixed, varying, or dynamic) by examining the class of the descriptor for the destination string. Given the class of the output string's descriptor, Table 2-3 specifies the semantics used by Run-Time Library routines when writing the string.
Class | Description | Restrictions | Semantics |
---|---|---|---|
Z | Unspecified | Treated as class S. | Fixed-length string |
S | Scalar, string | None. | Fixed-length string |
D | Dynamic string | String length: DSC$W_LENGTH < 2^16 (64K). | Dynamic-length string |
A | Array | Array is one-dimensional (DIMCT = 1). String length: DSC$L_ARSIZE < 2^16 (64K). Length of array elements is 1 byte (LENGTH = 1). | Fixed-length string |
SD | Scalar decimal | The DIGITS and SCALE fields are ignored. | Fixed-length string |
NCA | Noncontiguous array | Array is one-dimensional (DIMCT = 1). String length: DSC$L_ARSIZE < 2^16 (64K). Length of array elements is 1 byte (LENGTH = 1). Array is contiguous (S1 = LENGTH). | Fixed-length string |
VS | Varying string | Current length is less than or equal to the maximum string length (CURLEN <= MAXSTRLEN <= 2^16 (64K)). | Varying-length string |
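The mapping from descriptor class to output semantics in Table 2-3 can be sketched as a lookup followed by a class-specific write. The names `SEMANTICS` and `write_output` are hypothetical, the `capacity` parameter stands for the destination's length (or maximum length for class VS), and truncation of an over-long source into a varying string is an assumption of this model.

```python
# Output semantics selected by the class of the destination descriptor.
SEMANTICS = {
    "Z": "fixed", "S": "fixed", "A": "fixed", "SD": "fixed", "NCA": "fixed",
    "D": "dynamic",
    "VS": "varying",
}


def write_output(cls: str, source: str, capacity: int = 0) -> tuple[str, int]:
    """Return (text written, resulting string length) for a destination class."""
    semantics = SEMANTICS[cls]
    if semantics == "fixed":
        # Length is constant: truncate or pad with trailing blanks.
        return source[:capacity].ljust(capacity), capacity
    if semantics == "varying":
        # Only CURLEN changes; it can never exceed the maximum length.
        text = source[:capacity]
        return text, len(text)
    # Dynamic: storage is resized, so the whole source always fits.
    return source, len(source)


print(write_output("S", "ABCDEF", 10))
print(write_output("VS", "ABCDEF", 4))
print(write_output("D", "ABCDEF"))
```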
When a called routine returns a string whose length cannot be determined by the calling routine, the calling routine should also pass an optional argument to contain the output length. If the output string is a fixed-length string, the length argument would reflect the number of characters written, not counting the fill characters.
The output length argument is useful, for instance, when your program is reading variable-length records. The program can read the input strings into a buffer that is large enough to contain the largest. When you want to perform the next operation on the contents of the buffer, the length argument indicates exactly how many characters have been read, so that the program does not need to manipulate the whole buffer.
For example, LIB$GET_INPUT has the optional argument resultant-length. If LIB$GET_INPUT is called with a fixed-length, 5-character string as an argument, and the routine reads a record containing 'ABC', then resultant-length has a value of 3 and the output string contains the characters ABC followed by two blanks. But if the routine reads a record containing the value 'ABCDEFG', resultant-length has a value of 5 and the output string is 'ABCDE'. In either case, the calling program knows exactly how many characters (not counting fillers) the routine has read.
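The two cases above can be checked with a small Python model. The function `get_input_fixed` is hypothetical: it imitates only the fixed-length output and resultant-length behavior described here and is not a binding to LIB$GET_INPUT.

```python
def get_input_fixed(record: str, buffer_length: int) -> tuple[str, int]:
    """Model reading a record into a fixed-length buffer.

    Returns the buffer contents and the resultant length, which counts
    the characters read and excludes any blank fill.
    """
    resultant_length = min(len(record), buffer_length)
    buffer = record[:buffer_length].ljust(buffer_length)
    return buffer, resultant_length


print(get_input_fixed("ABC", 5))
print(get_input_fixed("ABCDEFG", 5))
```

Slicing the buffer to the resultant length recovers exactly the characters that were read, so the program does not have to guess whether trailing blanks are data or fill.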
A routine such as STR$COPY_DX does not need the length argument, because the calling program can determine the length of the output string. If the output string is dynamic, the length is the same as the input string length. If the output string is fixed-length, the number of characters copied is the shorter of the source and destination lengths.