HP C
Run-Time Library Reference Manual for OpenVMS Systems




Chapter 10
Developing International Software

This chapter describes typical features of international software and the features provided with the HP C Run-Time Library (RTL) that enable you to design and implement international software.

See the Reference Section for more detailed information on the functions described in this chapter.

10.1 Internationalization Support

The HP C RTL provides capabilities that allow application developers to create international software. The HP C RTL obtains information about a language and a culture by reading it from locale files.

10.1.1 Installation

To use these HP C RTL capabilities, you must install a separate kit that provides the locale files on your system. See the appendix "Installing OpenVMS Internationalization data kit" in the OpenVMS Upgrade and Installation Guide.

On OpenVMS VAX systems, the save set VMSI18N0nn is provided on the same media as the OpenVMS operating system.

On OpenVMS Alpha systems, the save set is provided on the Layered Product CD and is named VMSI18N0nn or ALPVMSI18N0n_07nn.

To install this save set, follow the standard OpenVMS installation procedures, using the save-set name as the name of the kit. The locales are grouped into several categories, and you can install as many of them as you need by answering the following prompts:


* Do you want European and US support? [YES]? 
* Do you want Chinese GB18030 support (locale and Unicode converters) [YES]? 
* Do you want Chinese support? [YES]? 
* Do you want Japanese support? [YES]? 
* Do you want Korean support? [YES]? 
* Do you want Thai support? [YES]? 
* Do you want the Unicode converters? [YES]? 

The kit also includes an Installation Verification Procedure (IVP), which we recommend running to verify that the kit is installed correctly.

10.1.2 Unicode Support

In OpenVMS Version 7.2, the HP C Run-Time Library added the Universal Unicode locale, which is distributed with the OpenVMS system, not with the VMSI18N0nn kit. The name of the Unicode locale is:


   UTF8-20 

Like those locales shipped with the VMSI18N0nn kit, the Unicode locale is located at the standard location referred to by the SYS$I18N_LOCALE logical name.

The UTF8-20 locale is based on Version 2.0 of the Unicode standard. It uses UCS-4 as its wide-character encoding and UTF-8 as its multibyte character encoding.
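To make the relationship between the two encodings concrete, the following sketch encodes a single UCS-4 value as a UTF-8 byte sequence. The function name ucs4_to_utf8 is hypothetical and is not part of the HP C RTL; in an application running under the UTF8-20 locale you would normally let wcstombs perform this conversion. The sketch assumes values no larger than 0x10FFFF (at most four bytes) and does not reject surrogate values.

```c
#include <stddef.h>

/* Hypothetical helper (not an HP C RTL function): encode one UCS-4 value
   as UTF-8. Writes at most 4 bytes to out and returns the number written,
   or 0 if the value is above 0x10FFFF, which this sketch does not handle.
   Surrogate values (0xD800-0xDFFF) are not rejected. */
size_t ucs4_to_utf8(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* 1 byte:  0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    } else if (cp < 0x800) {            /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    } else if (cp < 0x10000) {          /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    } else if (cp <= 0x10FFFF) {        /* 4 bytes: 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* outside the range of this sketch */
}
```

For example, U+00E9 (LATIN SMALL LETTER E WITH ACUTE) encodes as the two bytes 0xC3 0xA9, and U+20AC (EURO SIGN) as the three bytes 0xE2 0x82 0xAC.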

The HP C RTL also includes converters between Unicode and the other supported character sets. This expanded set includes converters for the UCS-2, UCS-4, and UTF-8 Unicode encodings. The Unicode converters can be used by the ICONV CONVERT utility and by the iconv family of functions in the HP C Run-Time Library.

In OpenVMS Version 7.2, the HP C Run-Time Library added Unicode character set converters for Microsoft Code Page 437.

10.2 Features of International Software

International software is software that can support multiple languages and cultures. An international program should be able to:

To meet these requirements, an application should make no assumptions about the language, local customs, or coded character set in use. All such localization data should be defined separately from the program and bound to it only at run time.

The rest of this chapter describes how you can create international software using HP C.

10.3 Developing International Software Using HP C

The HP C environment provides the following facilities to create international software:

10.4 Locales

A locale consists of different categories, each of which determines one aspect of the international environment. Table 10-1 lists the categories in a locale and describes the information in each.

Table 10-1 Locale Categories

Category      Description
LC_COLLATE    Contains information about collating sequences.
LC_CTYPE      Contains information about character classification.
LC_MESSAGES   Defines the answers that are expected in response to yes/no prompts.
LC_MONETARY   Contains monetary formatting information.
LC_NUMERIC    Contains information about formatting numbers.
LC_TIME       Contains time and date information.
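The categories in Table 10-1 can be queried individually. The following sketch, built around a hypothetical helper named show_locale_categories, passes NULL as the locale argument to setlocale, which returns the name of the locale currently in effect for a category without changing it. LC_MESSAGES is defined by POSIX and X/Open rather than by ISO C, so the sketch guards it with #ifdef.

```c
#include <locale.h>
#include <stdio.h>

/* The categories listed in Table 10-1, paired with printable names. */
static const struct {
    int category;
    const char *name;
} categories[] = {
    { LC_COLLATE,  "LC_COLLATE"  },
    { LC_CTYPE,    "LC_CTYPE"    },
#ifdef LC_MESSAGES                  /* POSIX/X/Open, not ISO C */
    { LC_MESSAGES, "LC_MESSAGES" },
#endif
    { LC_MONETARY, "LC_MONETARY" },
    { LC_NUMERIC,  "LC_NUMERIC"  },
    { LC_TIME,     "LC_TIME"     },
};

/* Hypothetical helper: print the locale in effect for each category.
   A NULL locale argument queries setlocale without changing anything.
   Returns the number of categories whose query succeeded. */
int show_locale_categories(FILE *out)
{
    int ok = 0;
    for (size_t i = 0; i < sizeof categories / sizeof categories[0]; i++) {
        const char *current = setlocale(categories[i].category, NULL);
        fprintf(out, "%-12s %s\n", categories[i].name,
                current ? current : "(unknown)");
        if (current != NULL)
            ok++;
    }
    return ok;
}
```

Called before the program changes its locale, the sketch reports C for every category, because that is the default.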

The locales provided reside in the directory defined by the SYS$I18N_LOCALE logical name. The file-naming convention for locales is:


language_country_codeset.locale 

Where:

10.5 Using the setlocale Function to Set Up an International Environment

An application sets up its international environment at run time by calling the setlocale function. The international environment is set up in one of two ways:

The syntax for the setlocale function is:

char *setlocale(int category, const char *locale)

Where:

If an application does not call the setlocale function, the C locale is in effect by default. Such an application can still call functions that use information in the current locale; those functions simply use the information in the C locale.
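setlocale returns a pointer to a string that a later call may overwrite, so code that changes a category temporarily must copy the current locale name before switching. The following sketch assumes a hypothetical helper, format_c_numeric, that formats a number using the C locale's radix character and then restores the caller's LC_NUMERIC setting.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: format value with "%.2f" using the C locale's
   radix character ('.'), then restore the caller's LC_NUMERIC setting.
   Returns the snprintf result, or -1 if the locale could not be handled. */
int format_c_numeric(double value, char *buf, size_t size)
{
    /* The string returned by setlocale may be overwritten by the next
       call, so copy the current name before switching. */
    const char *current = setlocale(LC_NUMERIC, NULL);
    if (current == NULL)
        return -1;
    char *saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_NUMERIC, "C") == NULL) {
        free(saved);
        return -1;
    }
    int n = snprintf(buf, size, "%.2f", value);

    setlocale(LC_NUMERIC, saved);       /* restore the caller's setting */
    free(saved);
    return n;
}
```

This pattern is useful for writing data that must be read back identically regardless of the user's locale. Because the locale is global to the program, the sketch is not safe while other threads depend on LC_NUMERIC.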

Specifying the Locale Using Logical Names

If the setlocale function is called with "" as the locale argument, the function checks for a number of logical names to determine the locale name for the category specified.

There are a number of logical names that users can set up to define their international environment:

In addition to the logical names defined by a user, there are a number of systemwide logical names, set up during system startup, that define the default international environment for all users on a system:

The setlocale function checks for user-defined logical names first, and if these are not defined, it checks the system logical names.
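A minimal sketch of the usual startup call follows, using a hypothetical helper named init_user_locale. It passes "" so that setlocale determines the locale from the logical names described above, and falls back to the C locale if that lookup names a locale that cannot be loaded (for example, one whose locale file is not installed). On failure, setlocale leaves the current locale unchanged and returns NULL.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: establish the user's international environment.
   On OpenVMS, "" makes setlocale consult the user-defined logical names
   and then the systemwide ones; other POSIX systems consult environment
   variables such as LC_ALL and LANG instead.
   Returns the name of the locale now in effect. */
const char *init_user_locale(void)
{
    const char *name = setlocale(LC_ALL, "");
    if (name == NULL) {
        /* The requested locale is unavailable; use the C locale. */
        fprintf(stderr, "warning: user locale unavailable; using C locale\n");
        name = setlocale(LC_ALL, "C");
    }
    return name;
}
```

A program would typically call init_user_locale once, at the start of main, before calling any function that depends on the locale.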

10.6 Using Message Catalogs

An important requirement for international software is that it should be able to communicate with the user in the user's own language. The messaging system enables program messages to be created separately from the program source, and linked to the program at run time.

Messages are defined in a message text source file and compiled into a message catalog with the GENCAT command. A program accesses the message catalog through functions provided in the HP C RTL.

The functions provided to access the messages in a catalog are:

For information on generating message catalogs, see the GENCAT command description in the OpenVMS system documentation.
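The following sketch shows the typical open, read, and close sequence with catopen, catgets, and catclose. The catalog name, set number, message number, and the helpers get_message and print_greeting are hypothetical; in practice the numbers would correspond to a message text source file compiled with GENCAT. When the catalog cannot be opened, the sketch falls back to default text built into the program, so the program still runs without a catalog.

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical set and message numbers from a GENCAT source file. */
#define MSG_SET_MAIN  1
#define MSG_GREETING  1

/* Return message msg_id of set set_id, or default_text if the catalog
   was not opened or does not contain that message. The explicit check
   avoids passing an invalid descriptor to catgets. */
const char *get_message(nl_catd catd, int set_id, int msg_id,
                        const char *default_text)
{
    if (catd == (nl_catd) -1)
        return default_text;
    return catgets(catd, set_id, msg_id, default_text);
}

/* Open the catalog, print one message, and close the catalog.
   The message is printed before catclose because the string returned
   by catgets may not remain valid after the catalog is closed.
   Returns the number of characters printed. */
int print_greeting(FILE *out, const char *catalog_name)
{
    nl_catd catd = catopen(catalog_name, NL_CAT_LOCALE);
    int n = fprintf(out, "%s\n",
                    get_message(catd, MSG_SET_MAIN, MSG_GREETING, "Hello"));
    if (catd != (nl_catd) -1)
        catclose(catd);
    return n;
}
```

For example, print_greeting(stdout, "myapp.cat") (a hypothetical catalog name) prints the translated greeting if the catalog is found, and Hello otherwise.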

10.7 Handling Different Character Sets

The HP C RTL supports a number of state-independent codesets and codeset encoding schemes that contain the ASCII-encoded Portable Character Set. State-dependent codesets are not supported. The supported codesets are:

10.7.1 Charmap File

The characters in a codeset are defined in a charmap file. The charmap files supplied by HP are located in the directory defined by the SYS$I18N_LOCALE logical name. The file type for a charmap file is .CMAP.

10.7.2 Converter Functions

As well as supporting different coded character sets, the HP C RTL provides the following converter functions that enable you to convert characters from one codeset to another:

10.7.3 Using Codeset Converter Files

The file-naming convention for codeset converters is:


fromcode_tocode.iconv 

Where fromcode is the name of the source codeset, and tocode is the name of the codeset to which characters are converted.
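The converter functions are typically used together as in the following sketch, which wraps iconv_open, iconv, and iconv_close in a hypothetical helper named convert_codeset. Note that iconv_open takes the target codeset first and the source codeset second, which is the reverse of the order in the fromcode_tocode.iconv file name. The codeset names ISO-8859-1 and UTF-8 in the usage note are assumptions; the names that iconv_open accepts vary between systems, so check which converters are installed on yours.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert in_len bytes from fromcode to tocode.
   Returns the number of bytes written to out, or (size_t) -1 if no
   converter exists for the pair, the input is invalid or incomplete,
   or out is too small. */
size_t convert_codeset(const char *tocode, const char *fromcode,
                       const char *in, size_t in_len,
                       char *out, size_t out_size)
{
    /* Target codeset first, source codeset second. */
    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t) -1)
        return (size_t) -1;             /* no converter for this pair */

    char *inp = (char *) in;            /* iconv takes char ** for input */
    char *outp = out;
    size_t in_left = in_len, out_left = out_size;

    size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    iconv_close(cd);

    if (rc == (size_t) -1 || in_left != 0)
        return (size_t) -1;
    return out_size - out_left;         /* bytes produced */
}
```

For example, convert_codeset("UTF-8", "ISO-8859-1", "caf\xe9", 4, out, sizeof out) converts the Latin-1 byte 0xE9 into the two UTF-8 bytes 0xC3 0xA9 and returns 5.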

You can add codeset converters to a given system by installing the converter files in the directory pointed to by the logical name SYS$I18N_ICONV.

Codeset converter files can be implemented either as table-based conversion files or as algorithm-based converter files created as OpenVMS shareable images.

Creating a Table-Based Conversion File

The following summarizes the necessary steps to create a table-based codeset converter file:

  1. Create a text file that describes how each character in the source codeset maps to the target codeset. For the format of this file, see the description of the DCL command ICONV COMPILE in the OpenVMS New Features Manual; ICONV COMPILE processes such a file and creates a codeset converter table file.
  2. Copy the file created in the previous step to the directory pointed to by the logical name SYS$I18N_ICONV, assuming you have the privilege to do so.

Creating an Algorithm-Based Conversion File

To create an algorithm-based codeset converter file implemented as a shareable image, follow these steps:

  1. Create C source files that implement the codeset converter. The API is documented in the public header file <iconv.h> as follows:
  2. Compile and link the modules that comprise the codeset converter as an OpenVMS shareable image, making sure that the file name adheres to the preceding conventions.
  3. Copy the file created in the previous step to the directory pointed to by the logical name SYS$I18N_ICONV, assuming you have the privilege to do so.

Some Final Notes

By default, SYS$I18N_ICONV is a search list whose first directory, SYS$SYSROOT:[SYS$I18N.ICONV.USER], is intended as a site-specific repository for iconv codeset converters.

The codesets and locales installed vary from system to system. Check the SYS$I18N directory tree for the codesets, converters, and locales installed on your system.

10.8 Handling Culture-Specific Information

Each locale contains the following cultural information:

You can extract some of this cultural information using the nl_langinfo function and the localeconv function. See Section 10.8.1.

10.8.1 Extracting Cultural Information From a Locale

The nl_langinfo function returns a pointer to a string that contains an item of information obtained from the program's current locale. The information you can extract from the locale is:

The localeconv function returns a pointer to a data structure that contains numeric formatting and monetary formatting data from the LC_NUMERIC and LC_MONETARY categories.
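A short sketch that prints a few items from each source follows; the helper name show_cultural_info is hypothetical. CODESET, D_T_FMT, and YESEXPR are standard nl_langinfo item names, and decimal_point, thousands_sep, currency_symbol, and int_curr_symbol are standard members of struct lconv.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: print some cultural information from the current
   locale. The strings returned by nl_langinfo and the structure returned
   by localeconv must not be modified, and later calls may overwrite them,
   so copy any value that must persist. */
void show_cultural_info(FILE *out)
{
    fprintf(out, "codeset:         %s\n", nl_langinfo(CODESET));
    fprintf(out, "date/time fmt:   %s\n", nl_langinfo(D_T_FMT));
    fprintf(out, "yes expression:  %s\n", nl_langinfo(YESEXPR));

    const struct lconv *lc = localeconv();      /* LC_NUMERIC, LC_MONETARY */
    fprintf(out, "decimal point:   \"%s\"\n", lc->decimal_point);
    fprintf(out, "thousands sep:   \"%s\"\n", lc->thousands_sep);
    fprintf(out, "currency symbol: \"%s\"\n", lc->currency_symbol);
    fprintf(out, "intl currency:   \"%s\"\n", lc->int_curr_symbol);
}
```

In the C locale the decimal point is "." and the thousands separator and currency symbols are empty strings; other locales supply their own values.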

10.8.2 Date and Time Formatting Functions

The functions that use the date and time information are:
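The strftime function, for example, takes its day names, month names, and date formats from the LC_TIME category. In the following sketch, format_date is a hypothetical wrapper; the caller is assumed to fill in the struct tm, including tm_wday, because strftime uses that field as given rather than computing it.

```c
#include <time.h>

/* Hypothetical wrapper: format *tm as "<weekday> <day> <month> <year>".
   %A and %B produce the full weekday and month names of the current
   locale's LC_TIME category. Returns the number of characters written
   (excluding the terminating null), or 0 if buf is too small. */
size_t format_date(char *buf, size_t size, const struct tm *tm)
{
    return strftime(buf, size, "%A %d %B %Y", tm);
}
```

In the C locale, a struct tm for Saturday, 1 January 2000 formats as Saturday 01 January 2000; under another locale, the names come from that locale's LC_TIME data.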

10.8.3 Monetary Formatting Function

The strfmon function uses the monetary information in a locale to convert a number of values into a string. The format of the string is controlled by a format string.
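A minimal sketch follows, assuming a hypothetical wrapper named format_money. The %n conversion uses the locale's national currency format and %i its international format; the exact output, including any currency symbol, depends entirely on the LC_MONETARY data of the current locale.

```c
#include <monetary.h>
#include <sys/types.h>                  /* ssize_t */

/* Hypothetical wrapper: format amount according to LC_MONETARY.
   Uses the international format (%i) if international is nonzero,
   otherwise the national format (%n). Returns the number of bytes
   written, or -1 if the result does not fit in buf. */
ssize_t format_money(char *buf, size_t size, double amount, int international)
{
    if (international)
        return strfmon(buf, size, "%i", amount);
    return strfmon(buf, size, "%n", amount);
}
```

Because the C locale defines no currency symbol, the output is most informative after the program has selected a locale with real monetary data, for example by calling setlocale(LC_ALL, "").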

10.8.4 Numeric Formatting

The information in LC_NUMERIC is used by various functions. For example, strtod, wcstod, and the print and scan functions (the printf and scanf families) determine the radix character, that is, the decimal point, from the LC_NUMERIC category.
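Because the radix character is locale-dependent, parsing code should not assume it is a period. The following sketch, using a hypothetical helper named parse_number, accepts a string only if strtod consumes all of it, which makes a mismatch between the input's radix character and the current locale visible instead of silently truncating the value.

```c
#include <stdlib.h>

/* Hypothetical helper: parse text as a double using the radix character
   of the current LC_NUMERIC category. Returns 1 and stores the result in
   *value only if strtod consumed the entire string; returns 0 otherwise. */
int parse_number(const char *text, double *value)
{
    char *end;
    double v = strtod(text, &end);
    if (end == text || *end != '\0')
        return 0;                       /* nothing parsed, or trailing characters */
    *value = v;
    return 1;
}
```

In the C locale, parse_number("2.5", &v) succeeds, while parse_number("2,5", &v) fails because strtod stops at the comma. Under a locale whose radix character is a comma, the results would be reversed.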

10.9 Functions for Handling Wide Characters

A character can be represented by a single-byte or a multibyte value, depending on the codeset. So that single-byte and multibyte characters can be handled in the same way, the HP C RTL defines a wide-character data type, wchar_t. This data type can store characters that are represented by 1-, 2-, 3-, or 4-byte values.
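The following sketch, using a hypothetical helper named to_wide, converts a multibyte string to wchar_t values with mbstowcs. The conversion is governed by the LC_CTYPE category, so each wide character holds one complete character regardless of how many bytes it occupied in the multibyte string.

```c
#include <stdlib.h>

/* Hypothetical helper: convert the multibyte string mbs into out, which
   holds out_len wide characters. Returns the number of wide characters
   stored (out is then null-terminated), or (size_t) -1 if mbs contains a
   sequence that is invalid in the current codeset or out has no room for
   the terminating null. */
size_t to_wide(const char *mbs, wchar_t *out, size_t out_len)
{
    if (out_len == 0)
        return (size_t) -1;
    size_t n = mbstowcs(out, mbs, out_len);
    if (n == (size_t) -1 || n == out_len)
        return (size_t) -1;             /* invalid sequence, or no room for L'\0' */
    return n;
}
```

In the C locale each byte is one character, so to_wide("hello", w, 16) returns 5. Under the UTF8-20 locale, a two-byte sequence such as 0xC3 0xA9 would convert to a single wide character.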

The functions provided to support wide characters are:

