Troubleshooting locking problems

Lock problems will be evident when an NFS client tries to lock a file, and it fails because someone has it locked. For applications that share access to files, the expectation is that locks will be short-lived. Thus, the pattern your users will notice when something is awry is that yesterday an application started up quite quickly, but today it hangs. Usually it is because an NFS/NLM client holds a lock on a file that your application needs to lock, and the holding client has crashed.

Diagnosing NFS lock hangs

On Solaris, you can use tools like pstack and truss to verify that processes are hanging in a lock request:

client1% ps -eaf | grep SuperApp mre 23796 10031 0 11:13:22 pts/6 0:00 SuperApp client1% pstack 23796 23796: SuperApp ff313134 fcntl (1, 7, ffbef9dc) ff30de48 fcntl (1, 7, ffbef9dc, 0, 0, 0) + 1c8 ff30e254 lockf (1, 1, 0, 2, ff332584, ff2a0140) + 98 0001086c main (1, ffbefac4, ffbefacc, 20800, 0, 0) + 1c 00010824 _start (0, 0, 0, 0, 0, 0) + dc client1% truss -p 23796 fcntl(1, F_SETLKW, 0xFFBEF9DC) (sleeping...)

This verifies that the application is stuck in a lock request. We can use pfiles to see what is going on with the files of process :

client1% pfiles 23796 pfiles 23796 23796: SuperApp Current rlimit: 256 file descriptors 0: S_IFCHR mode:0620 dev:136,0 ino:37990 uid:466 gid:7 rdev:24,37 O_RDWR 1: S_IFREG mode:0644 dev:208,1823 ino:5516985 uid:466 gid:300 size:0 O_WRONLY|O_LARGEFILE advisory write lock set by process 3242 2: S_IFCHR mode:0620 dev:136,0 ino:37990 uid:466 gid:7 rdev:24,37 O_RDWR

That we are told that there is an advisory lock set on file descriptor 1 that is set by another process, process ID 3242, is useful, but unfortunately it doesn't tell us if 3242 is a local process or a process on another NFS client or NFS server. We also aren't told if the file mapped to file descriptor 1 is a local file, or an NFS file. We are, however, told that the major and minor device numbers of the filesystem are 208 and 1823 respectively. If you run the mount command without any arguments, this dumps the list of mounted file systems. You should see a display similar to:

/ on /dev/dsk/c0t0d0s0 read/write/setuid/intr/largefiles/onerror=panic/dev=2200000 on Thu Dec 21 11:13:33 2000 /usr on /dev/dsk/c0t0d0s6 read/write/setuid/intr/largefiles/onerror=panic/ dev=2200006 on Thu Dec 21 11:13:34 2000 /proc on /proc read/write/setuid/dev=31c0000 on Thu Dec 21 11:13:29 2000 /dev/fd on fd read/write/setuid/dev=32c0000 on Thu Dec 21 11:13:34 2000 /etc/mnttab on mnttab read/write/setuid/dev=3380000 on Thu Dec 21 11:13:35 2000 /var on /dev/dsk/c0t0d0s7 read/write/setuid/intr/largefiles/onerror=panic/ dev=2200007 on Thu Dec 21 11:13:40 2000 /home/mre on spike:/export/home/mre remote/read/write/setuid/intr/dev=340071f on Thu Dec 28 08:51:30 2000

The numbers after dev= are in hexadecimal. Device numbers are constructed by taking the major number, shifting it left several bits, and then adding the minor number. Convert the minor number 1823 to hexadecimal, and look for it in the mount table:

client1% printf "%x\n" 1823 71f client1% mount | grep 'dev=.*71f' /home/mre on spike:/export/home/mre remote/read/write/setuid/intr/dev=340071f on Thu Dec 28 08:51:30 2000

We now know four things:

This is an NFS file we are blocking on.
The NFS server name is spike.
The filesystem on the server is /export/home/mre.
The inode number of the file is .

One obvious cause you should first eliminate is whether the NFS server spike has crashed or not. If it hasn't crashed, then the next step is to examine the server.

Examining lock state on NFS/NLM servers

Solaris and other System V-derived systems have a useful tool called crash for analyzing system state. Crash actually reads the Unix kernel's memory and formats its data structures in a more human readable form. Continuing with the example from "Diagnosing NFS lock hangs", assuming /export/home/mre is a directory on a UFS filesystem, which can be verified by doing:

spike# df -F ufs | grep /export /export (/dev/dsk/c0t0d0s7 ): 503804 blocks 436848 files

then you can use crash to get more lock state.The crash command is like a shell, but with internal commands for examining kernel state. The internal command we will be using is lck :

spike# crash dumpfile = /dev/mem, namelist = /dev/ksyms, outfile = stdout > lck Active and Sleep Locks: INO TYP START END PROC PID FLAGS STATE PREV NEXT LOCK 30000c3ee18 w 0 0 13 136 0021 3 48bf0f8 ae9008 6878d00 30000dd8710 w 0 MAXEND 17 212 0001 3 8f1a48 8f02d8 8f0e18 30001cce1c0 w 193 MAXEND -1 3242 2021 3 6878850 c43a08 2338a38 Summary From List: TOTAL ACTIVE SLEEP 3 3 0 >

An important field is PROC. PROC is the "slot" number of the process. If it is -1, that indicates that the lock is being held by a nonlocal (i.e., an NFS client) process, and the PID field thus indicates the process ID, relative to the NFS client. In the sample display, we see one such entry:

cce1c0 w 193 MAXEND -1 3242 2021 3 6878850 c43a08 2338a38

Note that the process id, 3242, is equal to that which the pfiles command displayed earlier in this example. We can confirm that this lock is for the file in question via crash's uinode command:

> uinode 30001cce1c0 UFS INODE MAX TABLE SIZE = 34020 ADDR MAJ/MIN INUMB RCNT LINK UID GID SIZE MODE FLAGS 30001cce1c0 136, 7 5516985 2 1 466 300 403 f---644 mt rf >

The inode numbers match what pfiles earlier displayed on the NFS client. However, inode numbers are unique per local filesystem. We can make doubly sure this is the file by comparing the major and minor device numbers from the uinode command, 136 and 7, with that of the filesystem that is mounted on /export :

spike# ls -lL /dev/dsk/c0t0d0s7 brw------- 1 root sys 136, 7 May 6 2000 /dev/dsk/c0t0d0s7 spike#

Clearing lock state

Continuing with our example from "Examining lock state on NFS/NLM servers", at this point we know that the file is locked by another NFS client. Unfortunately, we don't know which client it is, as crash won't give us that information. We do however have a potential list of clients in the server's /var/statmon/sm directory:

spike# cd /var/statmon/sm spike# ls client1 ipv4.10.1.0.25 ipv4.10.1.0.26 gonzo java

The entries prefixed with ipv4 are just symbolic links to other entries. The non-symbolic link entries identify the hosts we want to check for. The most likely cause of the lock not getting released is that the holding NFS client has crashed. You can take the list of hosts from the /var/statmon/sm directory and check if any are dead, or not responding due to a network partition. Once you determine which are dead, you can use Solaris's clear_locks command to clear lock state. Let's suppose you determine that gonzo is dead. Then you would do:

spike# clear_locks gonzo

If clearing the lock state of dead clients doesn't fix the problem, then perhaps a now-live client crashed, but for some reason after it rebooted, its status monitor did not send a notification to the NLM server's status monitor. You can log onto the live clients and check if they are currently mounting the filesystem from the server (in our example, spike:/export). If they are not, then you should consider using clear_locks to clear any residual lock state those clients might have had. Ultimately, you may be forced to reboot your server. Short of that there are other things you could do. Since you know the inode number and filesystem of file in question, you can determine the file's name:

spike# cd /export find . -inum 5516985 -print ./home/mre/database

You could rename file database to something else, and copy it back to a file named database. Then kill and restart the SuperApp application on client1. Of course, such an approach requires intimate knowledge or experience with the application to know if this will be safe.