From: CSBVAX::CSBVAX::MRGATE::"SMTP::CRVAX.SRI.COM::RELAY-INFO-VAX" 9-MAR-1989 03:49
To: MRGATE::"ARISIA::EVERHART"
Subj: VMS 5.X Cluster Disk Corruption Problems

Received: From KL.SRI.COM by CRVAX.SRI.COM with TCP; Wed, 8 MAR 89 14:25:56 PDT
Received: from UICVM.uic.edu ([128.248.2.50]) by KL.SRI.COM with TCP; Wed, 8 Mar 89 13:59:13 PST
Received: from IUBACS.BITNET by UICVM.uic.edu (IBM VM SMTP R1.2) with BSMTP id 5194; Wed, 08 Mar 89 15:58:14 CST
Date: Wed, 8 Mar 89 16:56 EST
From:
Subject: VMS 5.X Cluster Disk Corruption Problems
To: info-vax@kl.sri.com
X-Original-To: "info-vax@kl.sri.com", FLOWERS

At Indiana University, we have uncovered (or perhaps rediscovered?!) a nasty problem with VMS 5.0 and 5.1 which looks to us like a design oversight in the VMS file management system in a clustered disk environment. It involves one cluster node extending the INDEXF.SYS file, followed by another node writing to the same disk. When one views the disk (DIR/DAT etc.) from the node that extended INDEXF.SYS, the error "bad directory file format" is returned! This is apparently because the two nodes do not see the same INDEXF.SYS file. It is likely caused by cached data not being written to the disk.

We feel that should a system crash occur while this "window of error" exists, user files would be corrupted. (The "window" lasts for minutes and eventually clears up on its own.) In fact, we have had files lost: after a system crash, users see the error "unsupported file structure level" when they try to use their files. After an ANALYZE/DISK/REPAIR run, the error message changes to "no such file", which is not much better for the system user.

While we have not intentionally crashed the cluster to exactly duplicate the original bashed-files problem, I feel that there is a reasonable probability that just such a disk file management snafu could cause it. Apart from any other evidence, this problem should be fixed.
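
The suspected mechanism (two nodes holding different views of the same on-disk index data for a few minutes) can be illustrated with a toy model. This is only a sketch of a generic stale-cache window, not of how VMS or its XQP actually manages INDEXF.SYS; the names `SharedDisk`, `Node`, `create_file`, `lookup`, and `refresh` are hypothetical and chosen for illustration.

```python
class SharedDisk:
    """Toy stand-in for a disk shared by all cluster nodes."""

    def __init__(self):
        # File name -> header slot number (hypothetical, simplified metadata).
        self.metadata = {}


class Node:
    """Toy cluster node that keeps a private cached copy of the disk metadata."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = dict(disk.metadata)

    def create_file(self, name, slot):
        # The writing node updates the disk and its own cache,
        # but nothing tells the other nodes to invalidate theirs.
        self.disk.metadata[name] = slot
        self.cache[name] = slot

    def lookup(self, name):
        # Lookups are served from the private cache only.
        return self.cache.get(name)

    def refresh(self):
        # Models the moment the "window" closes and the node rereads the disk.
        self.cache = dict(self.disk.metadata)


def demo():
    disk = SharedDisk()
    node_a = Node(disk)
    node_b = Node(disk)

    node_b.create_file("USER.DAT", 7)

    during_window = node_a.lookup("USER.DAT")  # stale view on node A
    node_a.refresh()
    after_window = node_a.lookup("USER.DAT")   # consistent again
    return during_window, after_window


if __name__ == "__main__":
    print(demo())
```

In this model the error "clears up" only because the stale node eventually rereads the disk; a crash before that point would leave whatever the stale node had written based on its out-of-date view.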
Users should not be seeing disk file error messages that "clear up" on their own. Either there is an error or there is not.

We can recreate the error window at will. We have reported the problem to Digital but have received no fixes as yet. We feel that some of you out there must have seen similar problems, and we would appreciate some feedback on your solutions. Our workaround has been to preallocate large (75,000 block) INDEXF.SYS files to avoid the extension problem.

We are running a mixed-mode cluster with VMS 4.7 on three VAXes and A5.0-2 on our two 8820's. We have 52 RA82 disks in the cluster.

Chuck Flowers
Operating Systems Manager
Bloomington Academic Computing Services
Indiana University
Bloomington, IN 47405-4801