client1                                 client2 or server
% cd /shared/mod1
                                        % cd /shared
                                        % rm -rf mod1
% ls
.: Stale File Handle
It is important to note that recreating the removed directory before
client1 lists the directory would not have
prevented the stale filehandle problem:
client1                                 client2 or server
% cd /shared/mod1
                                        % cd /shared
                                        % rm -rf mod1
                                        % mkdir mod1
% ls
.: Stale File Handle
This occurs because the client filehandle is tied to the inode number
and generation count of the file or directory. Removing and
recreating the directory mod1 results in the
creation of a new directory entry with the same name as before but
with a different inode number and generation count (and consequently
a different filehandle). This explains why clients get stale
filehandle errors when files or directories on the server are moved
to a different filesystem. Be careful when you perform filesystem
maintenance on the NFS server. Unfortunately you cannot bring a
server down, move files to a new filesystem (perhaps to a larger
disk), and reshare the new filesystem without risking your clients
getting stale filehandles. Moving the files to a new filesystem on
the server results in new inode numbers and generation counts for the
files since inode numbers are not preserved across filesystem moves.
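The identity change can be approximated on a local filesystem, with the caveat that this is only an analogy and not NFS itself. In the sketch below, the scratch directory and the variable names are illustrative assumptions. A shell stays in mod1 while the directory is removed and recreated by name. Because the shell still holds the old directory, the recreated mod1 has to be a different object with a different inode number, which is the same kind of change that invalidates an NFS filehandle.

```shell
#!/bin/sh
# Local sketch only: a scratch directory stands in for /shared.
base=$(mktemp -d)
mkdir "$base/mod1"

# "client1" enters the directory and records its inode number.
cd "$base/mod1"
old_ino=$(ls -di . | awk '{print $1}')

# "client2 or server" removes and recreates the directory by name.
( cd "$base" && rm -rf mod1 && mkdir mod1 )

# Inode of the directory now reachable by name, and of the one still held.
new_ino=$(ls -di "$base/mod1" | awk '{print $1}')
held_ino=$(ls -di . | awk '{print $1}')

echo "before: $old_ino  still held: $held_ino  recreated: $new_ino"

# Clean up the scratch area.
cd /
rm -rf "$base"
```

A local kernel keeps the old directory alive until the last process leaves it, so the ls commands here still succeed. An NFS server generally gives its clients no such guarantee, which is why they see a stale filehandle instead.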
If your client gets stale filehandles, then you may need to terminate
all processes accessing the filesystem on the client, and unmount the
NFS filesystem in order to clear the large number of stale
filehandles. Unfortunately, identifying all the processes that hold a
filesystem busy is not always feasible, in which case you may have to
resort to forcibly unmounting the filesystem:
# umount -f /shared
Specify the -f option to the umount command [59] to forcibly unmount a filesystem. This should be done only as a last resort, since using this option can cause data loss for open files.
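Before forcing the unmount, it may be worth checking which processes still hold the filesystem busy. The following is a minimal sketch that assumes the fuser utility is installed on the client; the function name list_busy is hypothetical. The -c option asks fuser to treat its argument as a mount point, and -u adds the owning user name to each process ID. The listing may be incomplete, which matches the warning that identifying every process is not always feasible.

```shell
#!/bin/sh
# list_busy MOUNTPOINT (hypothetical helper): print the processes that are
# using files on the filesystem mounted at MOUNTPOINT.
list_busy() {
    if command -v fuser >/dev/null 2>&1; then
        # -c: argument is a mount point; -u: append the user name to each PID.
        fuser -cu "$1"
    else
        echo "fuser not found on this client" >&2
        return 2
    fi
}

# Example, run as root on the client, using the mount point from the text:
#   list_busy /shared
#   umount /shared || umount -f /shared
```

Trying a plain umount first and falling back to umount -f only if it fails keeps the forced unmount as the last resort.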
[59] The ability to forcibly unmount a filesystem was introduced in Solaris 8. This feature is supported by the Linux kernel 2.1.116 or later. Previously, you would have had to reboot the NFS client to clear the stale filehandles.
You will also get stale filehandle errors when the server or another client removes a file that your client currently has open:
Process A on client1                    client2 or server
...
fd = open("/shared/foo", O_RDONLY);
                                        % rm /shared/foo
read(fd, &buffer, buffer_len);
Read fails! Stale File Handle
If you consistently suffer from stale filehandle errors, you should
look at the way in which users share files using NFS. Even though
users see the same set of files, they do not necessarily have to do
their work in the same directories. Watch out for users who share
directories or copies of code. Use a source code control system that
lets them make private copies of source files in their own
directories. NFS provides an excellent mechanism for allowing all
users to see the common source tree, but nobody should be doing
development in it. Similarly, users who share scratch space may
decide to clean it out periodically. Any user who had a scratch file
open when another user on another NFS client purged the scratch
directory will receive stale filehandle errors on the next reference
to the (now removed) scratch file.
As with most things, it helps to have an understanding of how your users are using the filesystems presented to them by NFS. In many cases, users want access to a wide variety of filesystems, but they do not want all of them mounted at all times (for fear of server crashes), nor do they want to keep track of where all filesystems are exported from and where they should be mounted. The NFS automounter solves all of these problems by applying NIS management to NFS mount information. As part of your client tuning, consider using the automounter to make client NFS administration easier. Chapter 9, "The Automounter," describes the automounter in detail.