OpenVMS Cluster Systems

Document revision date: 15 July 2002

Table A-2 lists system parameters that should not require adjustment at any time. These parameters are provided for use in system debugging. Compaq recommends that you do not change these parameters unless you are advised to do so by your Compaq support representative. Incorrect adjustment of these parameters can result in cluster failures.

Table A-2 Cluster System Parameters Reserved for OpenVMS Use Only
Parameter Description

++MC_SERVICES_P1 (dynamic) The value of this parameter must be the same on all nodes connected by MEMORY CHANNEL.

++MC_SERVICES_P5 (dynamic) This parameter must remain at the default value of 8000000. This parameter value must be the same on all nodes connected by MEMORY CHANNEL.

++MC_SERVICES_P8 (static) This parameter must remain at the default value of 0. This parameter value must be the same on all nodes connected by MEMORY CHANNEL.

++MPDEV_D1 A multipath system parameter.

PAMAXPORT PAMAXPORT specifies the maximum port number to be polled on each CI and DSSI. The CI and DSSI port drivers poll to discover newly initialized ports or the absence or failure of previously responding remote ports.
A system will not detect the existence of ports whose port numbers are higher than this parameter's value. Thus, this parameter should be set to a value that is greater than or equal to the highest port number being used on any CI or DSSI connected to the system.
You can decrease this parameter to reduce polling activity if the hardware configuration has fewer than 16 ports. For example, if the CI or DSSI with the largest configuration has a total of five ports assigned to port numbers 0 through 4, you could set PAMAXPORT to 4.
If no CI or DSSI devices are configured on your system, this parameter is ignored.
The default for this parameter is 15 (poll for all possible ports 0 through 15). Compaq recommends that you set this parameter to the same value on each cluster computer.

PANOPOLL Disables CI and DSSI polling for ports if set to 1. (The default is 0.) When PANOPOLL is set, a computer will not promptly discover that another computer has shut down or powered down and will not discover a new computer that has booted. This parameter is useful when you want to bring up a computer detached from the rest of the cluster for debugging purposes.
PANOPOLL is functionally equivalent to uncabling the system from the DSSI or star coupler. This parameter does not affect OpenVMS Cluster communications over the LAN.
The default value of 0 is the normal setting and is required if you are booting from an HSC controller or if your system is joining an OpenVMS Cluster. This parameter is ignored if there are no CI or DSSI devices configured on your system.

PANUMPOLL Establishes the number of CI and DSSI ports to be polled during each polling interval. The normal setting for PANUMPOLL is 16.
On older systems with less powerful CPUs, the parameter may be useful in applications sensitive to the amount of contiguous time that the system spends at IPL 8. Reducing PANUMPOLL reduces the amount of time spent at IPL 8 during each polling interval while increasing the number of polling intervals needed to discover new or failed ports.
If no CI or DSSI devices are configured on your system, this parameter is ignored.

PAPOLLINTERVAL Specifies, in seconds, the polling interval the CI port driver uses to poll for a newly booted computer, a broken port-to-port virtual circuit, or a failed remote computer.
This parameter trades polling overhead against quick response to virtual circuit failures. This parameter should be set to the same value on each cluster computer.

PAPOOLINTERVAL Specifies, in seconds, the interval at which the port driver checks available nonpaged pool after a pool allocation failure.
This parameter trades faster response to pool allocation failures for increased system overhead.
If CI or DSSI devices are not configured on your system, this parameter is ignored.

PASANITY PASANITY controls whether the CI and DSSI port sanity timers are enabled to permit remote systems to detect a system that has been hung at IPL 8 or higher for 100 seconds. It also controls whether virtual circuit checking is enabled on the local system. The TIMVCFAIL parameter controls the time (1 to 99 seconds).
PASANITY is normally set to 1 and should be set to 0 only if you are debugging with XDELTA or planning to halt the CPU for periods of 100 seconds or more.
PASANITY is only semidynamic. A new value of PASANITY takes effect on the next CI or DSSI port reinitialization.
If CI or DSSI devices are not configured on your system, this parameter is ignored.

PASTDGBUF The number of datagram receive buffers to queue initially for each CI or DSSI port driver's configuration poller; the initial value is expanded during system operation, if needed.
If no CI or DSSI devices are configured on your system, this parameter is ignored.

PASTIMOUT The basic interval at which the CI port driver wakes up to perform time-based bookkeeping operations. It is also the period after which a timeout will be declared if no response to a start handshake datagram has been received.
If no CI or DSSI device is configured on your system, this parameter is ignored.

PRCPOLINTERVAL Specifies, in seconds, the polling interval used to look for SCS applications, such as the connection manager and MSCP disks, on other computers. Each computer is polled, at most, once each interval.
This parameter trades polling overhead against quick recognition of new computers or servers as they appear.

SCSMAXMSG The maximum number of bytes of system application data in one sequenced message. The amount of physical memory consumed by one message is SCSMAXMSG plus the overhead for buffer management.
If an SCS port is not configured on your system, this parameter is ignored.

SCSMAXDG Specifies the maximum number of bytes of application data in one datagram.
If an SCS port is not configured on your system, this parameter is ignored.

SCSFLOWCUSH Specifies the lower limit for receive buffers at which point SCS starts to notify the remote SCS of new receive buffers. For each connection, SCS tracks the number of receive buffers available. SCS communicates this number to the SCS at the remote end of the connection. However, SCS does not need to do this for each new receive buffer added. Instead, SCS notifies the remote SCS of new receive buffers if the number of receive buffers falls as low as the SCSFLOWCUSH value.
If an SCS port is not configured on your system, this parameter is ignored.

++Alpha specific
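
Because the parameters in Table A-2 are reserved, the safest way to work with them is read-only. The following is a minimal sketch of displaying a few current values with SYSGEN. The choice of parameters is only illustrative, and no SET or WRITE command is issued, so nothing is changed.

```dcl
$ ! Display (do not modify) the active values of some Table A-2 parameters.
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SHOW PAMAXPORT
SYSGEN> SHOW PANUMPOLL
SYSGEN> SHOW PASANITY
SYSGEN> EXIT
```

Comparing the PAMAXPORT value displayed with the highest CI or DSSI port number in use is one way to confirm that the value is greater than or equal to that port number, as the PAMAXPORT description requires.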

Appendix B
Building Common Files

This appendix provides guidelines for building a common user authorization file (UAF) from computer-specific files. It also describes merging RIGHTSLIST.DAT files.

For more detailed information about how to set up a computer-specific authorization file, see the descriptions in the OpenVMS Guide to System Security.

B.1 Building a Common SYSUAF.DAT File

To build a common SYSUAF.DAT file, follow the steps in Table B-1.

Table B-1 Building a Common SYSUAF.DAT File
Step Action

1 Print a listing of SYSUAF.DAT on each computer. To print this listing, invoke AUTHORIZE and specify the AUTHORIZE command LIST as follows:
$ SET DEF SYS$SYSTEM
$ RUN AUTHORIZE
UAF> LIST/FULL [*,*]

2 Use the listings to compare the accounts from each computer. On the listings, mark any necessary changes. For example:

Delete any accounts that you no longer need.
Make sure that UICs are set appropriately:

User UICs
Check each user account in the cluster to see whether it should have a unique user identification code (UIC). For example, OpenVMS Cluster member VENUS may have a user account JONES that has the same UIC as user account SMITH on computer MARS. When computers VENUS and MARS are joined to form a cluster, accounts JONES and SMITH will exist in the cluster environment with the same UIC. If the UICs of these accounts are not differentiated, each user will have the same access rights to various objects in the cluster. In this case, you should assign each account a unique UIC.
Group UICs
Make sure that accounts that perform the same type of work have the same group UIC. Accounts in a single-computer environment probably follow this convention. However, there may be groups of users on each computer that will perform the same work in the cluster but that have group UICs unique to their local computer. As a rule, the group UIC for any given work category should be the same on each computer in the cluster. For example, data entry accounts on VENUS should have the same group UIC as data entry accounts on MARS.

Note: If you change the UIC for a particular user, you should also change the owner UICs for that user's existing files and directories. You can use the DCL commands SET FILE and SET DIRECTORY to make these changes. These commands are described in detail in the OpenVMS DCL Dictionary.

3 Choose the SYSUAF.DAT file from one of the computers to be a master SYSUAF.DAT.
Note: The default values for a number of SYSUAF process limits and quotas are higher on an Alpha computer than they are on a VAX computer. See A Comparison of System Management on OpenVMS AXP and OpenVMS VAX ¹ for information about setting values on both computers.

4 Merge the SYSUAF.DAT files from the other computers to the master SYSUAF.DAT by running the Convert utility (CONVERT) on the computer that owns the master SYSUAF.DAT. (See the OpenVMS Record Management Utilities Reference Manual for a description of CONVERT.) To use CONVERT to merge the files, each SYSUAF.DAT file must be accessible to the computer that is running CONVERT.
Syntax: To merge the UAFs into the master SYSUAF.DAT file, specify the CONVERT command in the following format:
CONVERT SYSUAF1,SYSUAF2,...SYSUAFn MASTER_SYSUAF

Note that if a given user name appears in more than one source file, only the first occurrence of that name appears in the merged file.
Example: The following command sequence example creates a new SYSUAF.DAT file from the combined contents of the two input files:
$ SET DEFAULT SYS$SYSTEM
$ CONVERT/MERGE [SYS1.SYSEXE]SYSUAF.DAT, -
_$ [SYS2.SYSEXE]SYSUAF.DAT SYSUAF.DAT

The CONVERT command in this example adds the records from the files [SYS1.SYSEXE]SYSUAF.DAT and [SYS2.SYSEXE]SYSUAF.DAT to the file SYSUAF.DAT on the local computer.
After you run CONVERT, you have a master SYSUAF.DAT that contains records from the other SYSUAF.DAT files.

5 Use AUTHORIZE to modify the accounts in the master SYSUAF.DAT according to the changes you marked on the initial listings of the SYSUAF.DAT files from each computer.

6 Place the master SYSUAF.DAT file in SYS$COMMON:[SYSEXE].

7 Remove all node-specific SYSUAF.DAT files.
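
The note in step 2 mentions changing file ownership after a UIC change. As a sketch only, assuming a hypothetical account JONES whose new UIC is [360,21] and whose files reside under a hypothetical DISK$USER:[JONES] directory tree, the sequence might look like the following. The account name, UIC, and device name are assumptions for illustration, and the commands require sufficient privilege to change file ownership.

```dcl
$ ! Hypothetical values: account JONES, new UIC [360,21], DISK$USER:[JONES].
$ SET DEFAULT SYS$SYSTEM
$ RUN AUTHORIZE
UAF> MODIFY JONES/UIC=[360,21]
UAF> EXIT
$ ! Reassign ownership of the user's directories, then of the files in them.
$ SET DIRECTORY/OWNER_UIC=[360,21] DISK$USER:[JONES...]
$ SET FILE/OWNER_UIC=[360,21] DISK$USER:[JONES...]*.*;*
```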

¹This manual has been archived but is available in PostScript and DECW$BOOK (Bookreader) formats on the OpenVMS Documentation CD-ROM. A printed book can be ordered through DECdirect (800-354-4825).

B.2 Merging RIGHTSLIST.DAT Files

If you need to merge RIGHTSLIST.DAT files, you can use a command sequence like the following:

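$ ! Locate the active rights database (honors a RIGHTSLIST logical name if one is defined).
$ ACTIVE_RIGHTSLIST = F$PARSE("RIGHTSLIST","SYS$SYSTEM:.DAT")
$ ! Copy the active file to a working file while it remains open for shared access.
$ CONVERT/SHARE/STAT 'ACTIVE_RIGHTSLIST' RIGHTSLIST.NEW
$ ! Merge the other computers' files; rejected duplicate records go to the exception file.
$ CONVERT/MERGE/STAT/EXCEPTION=RIGHTSLIST_DUPLICATES.DAT -
_$ [SYS1.SYSEXE]RIGHTSLIST.DAT, [SYS2.SYSEXE]RIGHTSLIST.DAT RIGHTSLIST.NEW
$ ! Review the rejected duplicates, then replace the active file with the merged file.
$ DUMP/RECORD RIGHTSLIST_DUPLICATES.DAT
$ CONVERT/NOSORT/FAST/STAT RIGHTSLIST.NEW 'ACTIVE_RIGHTSLIST'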

The commands in this example add the RIGHTSLIST.DAT files from two OpenVMS Cluster computers to the master RIGHTSLIST.DAT file in the current default directory. For detailed information about creating and maintaining RIGHTSLIST.DAT files, see the security guide for your system.

Appendix C
Cluster Troubleshooting

C.1 Diagnosing Computer Failures

This appendix contains information to help you perform troubleshooting operations for the following:

Failures of computers to boot or to join the cluster
Cluster hangs
CLUEXIT bugchecks
Port device problems

C.1.1 Preliminary Checklist

Before you initiate diagnostic procedures, be sure to verify that these conditions are met:

All cluster hardware components are correctly connected and checked for proper operation.
When you attempt to add a new or recently repaired CI computer to the cluster, verify that the CI cables are correctly connected, as described in Section C.10.5.
OpenVMS Cluster computers and mass storage devices are configured according to requirements specified in the OpenVMS Cluster Software Software Product Description (SPD 29.78.xx).
When attempting to add a satellite to a cluster, you must verify that the LAN is configured according to requirements specified in the OpenVMS Cluster Software SPD. You must also verify that you have correctly configured and started the network, following the procedures described in Chapter 4.

If, after performing preliminary checks and taking appropriate corrective action, you find that a computer still fails to boot or to join the cluster, you can follow the procedures in Sections C.2 through C.4 to attempt recovery.

C.1.2 Sequence of Booting Events

To perform diagnostic and recovery procedures effectively, you must understand the events that occur when a computer boots and attempts to join the cluster. This section outlines those events and shows typical messages displayed at the console.

Note that events vary, depending on whether a computer is the first to boot in a new cluster or whether it is booting in an active cluster. Note also that some events (such as loading the cluster database containing the password and group number) occur only in OpenVMS Cluster systems on a LAN.

The normal sequence of events is shown in Table C-1.

Table C-1 Sequence of Booting Events
Step Action

1 The computer boots. If the computer is a satellite, a message like the following shows the name and LAN address of the MOP server that has downline loaded the satellite. At this point, the satellite has completed communication with the MOP server and further communication continues with the system disk server, using OpenVMS Cluster communications.
%VAXcluster-I-SYSLOAD, system loaded from Node X...

For any booting computer, the OpenVMS "banner message" is displayed in the following format:
operating-system Version n.n dd-mmm-yyyy hh:mm.ss

2 The computer attempts to form or join the cluster, and the following message appears:
waiting to form or join an OpenVMS Cluster system

If the computer is a member of an OpenVMS Cluster based on the LAN, the cluster security database (containing the cluster password and group number) is loaded. Optionally, the MSCP server and TMSCP server can be loaded:
%VAXcluster-I-LOADSECDB, loading the cluster security database
%MSCPLOAD-I-LOADMSCP, loading the MSCP disk server
%TMSCPLOAD-I-LOADTMSCP, loading the TMSCP tape server

3 If the computer discovers a cluster, the computer attempts to join it. If a cluster is found, the connection manager displays one or more messages in the following format:
%CNXMAN, Sending VAXcluster membership request to system X...

Otherwise, the connection manager forms the cluster when it has enough votes to establish quorum (that is, when enough voting computers have booted).

4 As the booting computer joins the cluster, the connection manager displays a message in the following format:
%CNXMAN, now a VAXcluster member -- system X...

Note that if quorum is lost while the computer is booting, or if a computer is unable to join the cluster within 2 minutes of booting, the connection manager displays messages like the following:
%CNXMAN, Discovered system X...
%CNXMAN, Deleting CSB for system X...
%CNXMAN, Established "connection" to quorum disk
%CNXMAN, Have connection to system X...
%CNXMAN, Have "connection" to quorum disk

The last two messages show any connections that have already been formed.

5 If the cluster includes a quorum disk, you may also see messages like the following:
%CNXMAN, Using remote access method for quorum disk
%CNXMAN, Using local access method for quorum disk

The first message indicates that the connection manager is unable to access the quorum disk directly, either because the disk is unavailable or because it is accessed through the MSCP server. Another computer in the cluster that can access the disk directly must verify that a reliable connection to the disk exists.
The second message indicates that the connection manager can access the quorum disk directly and can supply information about the status of the disk to computers that cannot access the disk directly.
Note: The connection manager may not see the quorum disk initially because the disk may not yet be configured. In that case, the connection manager first uses remote access, then switches to local access.

6 Once the computer has joined the cluster, normal startup procedures execute. One of the first functions is to start the OPCOM process:
%%%%%%%%%%% OPCOM 15-JAN-1994 16:33:55.33 %%%%%%%%%%%
Logfile has been initialized by operator _X...$OPA0:
Logfile is SYS$SYSROOT:[SYSMGR]OPERATOR.LOG;17
%%%%%%%%%%% OPCOM 15-JAN-1994 16:33:56.43 %%%%%%%%%%%
16:32:32.93 Node X... (csid 0002000E) is now a VAXcluster member

7 As other computers join the cluster, OPCOM displays messages like the following:
%%%%% OPCOM 15-JAN-1994 16:34:25.23 %%%%% (from node X...)
16:34:24.42 Node X... (csid 000100F3)
received VAXcluster membership request from node X...
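
When you reconstruct the boot sequence after the fact, the OPCOM messages in steps 6 and 7 can be located in the operator log. The following is a minimal sketch, assuming the log is at the location shown in step 6 and that the message wording matches the examples in the table:

```dcl
$ ! Find cluster membership events recorded by OPCOM in the operator log.
$ SEARCH SYS$SYSROOT:[SYSMGR]OPERATOR.LOG -
_$ "now a VAXcluster member","membership request"
```

By default SEARCH reports lines that match either string, so both the join confirmations and the membership requests are listed.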

As startup procedures continue, various messages report startup events.

Hint: For troubleshooting purposes, you can include messages in your site-specific startup procedures that announce each phase of the startup process, for example, mounting disks or starting queues.
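
A fragment along the following lines illustrates the hint. It is a sketch for a site-specific startup procedure such as SYS$MANAGER:SYSTARTUP_VMS.COM; the message wording, device name, volume label, and queue name are hypothetical and should be replaced with your own.

```dcl
$ ! Announce each startup phase on the console before performing it.
$ ! $1$DUA1:, USERDISK, and SYS$BATCH are placeholder names.
$ WRITE SYS$OUTPUT "Startup phase: mounting disks"
$ MOUNT/SYSTEM/NOASSIST $1$DUA1: USERDISK
$ WRITE SYS$OUTPUT "Startup phase: starting queues"
$ START/QUEUE/MANAGER
$ START/QUEUE SYS$BATCH
$ WRITE SYS$OUTPUT "Startup phase: site-specific startup complete"
```

If a computer hangs during startup, the last phase message displayed on the console indicates roughly where the procedure stopped.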

C.2 Computer on the CI Fails to Boot

If a CI computer fails to boot, perform the following checks:

Step Action

1 Verify that the computer's SCSNODE and SCSSYSTEMID parameters are unique in the cluster. If they are not, you must either alter both values or reboot all other computers.

2 Verify that you are using the correct bootstrap command file. This file must specify the internal bus computer number (if applicable), the HSC or HSJ node number, and the disk from which the computer is to boot. Refer to your processor-specific installation and operations guide for information about setting values in default bootstrap command procedures.

3 Verify that the PAMAXPORT system parameter is set to a value greater than or equal to the largest CI port number.

4 Verify that the CI port has a unique hardware station address.

5 Verify that the HSC subsystem is on line. The ONLINE switch on the HSC operator control panel should be pressed in.

6 Verify that the disk is available. The correct port switches on the disk's operator control panel should be pressed in.

7 Verify that the computer has access to the HSC subsystem. The SHOW HOSTS command of the HSC SETSHO utility displays status for all computers (hosts) in the cluster. If the computer in question appears in the display as DISABLED, use the SETSHO utility to set the computer to the ENABLED state.
Reference: For complete information about the SETSHO utility, consult the HSC hardware documentation.

8 Verify that the HSC subsystem allows access to the boot disk. Invoke the SETSHO utility to ensure that the boot disk is available to the HSC subsystem. The utility's SHOW DISKS command displays the current state of all disks visible to the HSC subsystem and displays all disks in the no-host-access table.

If the boot disk appears in the no-host-access table, use the SETSHO utility to set the boot disk to host-access.

If the boot disk is available or mounted and host access is enabled, but the disk does not appear in the no-host-access table, contact your support representative and explain both the problem and the steps you have taken.
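
Steps 1 and 3 can be checked from a computer that is already a cluster member. The following sketch uses SYSMAN to display the relevant parameters on every running member so that you can compare them with the values intended for the failing computer. It only displays values, and it assumes that you have the privileges SYSMAN requires.

```dcl
$ ! Display SCSNODE, SCSSYSTEMID, and PAMAXPORT on all running cluster members.
$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> SET ENVIRONMENT/CLUSTER
SYSMAN> PARAMETERS USE ACTIVE
SYSMAN> PARAMETERS SHOW SCSNODE
SYSMAN> PARAMETERS SHOW SCSSYSTEMID
SYSMAN> PARAMETERS SHOW PAMAXPORT
SYSMAN> EXIT
```

The SCSNODE and SCSSYSTEMID values of the failing computer must not match any value displayed, and PAMAXPORT on each member should be at least as large as the highest CI port number in use.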
