NFS file problem: maybe a stale/stuck handle?


  1. #1

    NFS file problem: maybe a stale/stuck handle?

    This has me stumped. My apologies if the subject line misses the
    mark.

    I've recently migrated some directories from one NFS file server
    (SunOS 5.6) to another (IRIX 6.5.13m) using "cp -Rp" as root from an
    NFS client (IRIX 6.5.19m).

    Before the migration, the NFS client did this:

    /usr/local -> /net/sun-host/mnt/usr/local

    After the migration, it does this:

    /usr/local -> /net/irix-host/mnt/usr/local

    Everything seems fine, except for one file:

    /usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat

    The symptom is that Netscape mail's "Spelling" tool is unavailable.
    I've traced the spelling problem to that particular file.

    I've compared the files on the SunOS and IRIX servers with diff and
    cmp, as a regular user (no special privileges), and they appear to
    be identical. Neither diff nor cmp has any trouble reading either
    file. Ownership, sizes, dates, permissions, everything I can think
    of to compare is identical, except for their physical locations on
    disk. In trying to debug this, I've found the following:

    * Only netscape seems to have any trouble accessing the file.
    Other apps (diff, cmp, strings, cp) have no trouble reading it.

    * If I replace the file on the IRIX server with a symlink back to
    the SunOS server, it works in netscape:

    pen4s324.dat -> /net/sun-host/mnt/usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat

    * If I replace it with a symlink to another duplicate of the file
    on the NFS client's internal disk, it works in netscape (only on
    that host, of course):

    pen4s324.dat -> /usr/local-on-nfs-client/pen4s324.dat

    * If I make another duplicate of the file, or even the entire
    netscape_4.79_irix6.5 directory, on the IRIX server, it doesn't
    work. Everything else in netscape seems to run fine, except for
    the "Spelling" tool.

    * If I log in to the IRIX server and run netscape there (with
    DISPLAY pointing back to the NFS client), THAT works. It avoids
    NFS and accesses the file locally:

    /usr/local -> /mnt/usr/local

    I've rebooted both the IRIX NFS server and the NFS client, but this
    behavior persists. It's not tied to netscape's pathname to the file
    (since the pathname works if it's a symlink to the SunOS server).
    It's not tied to the file's inode (since copying the file to a
    different inode doesn't change anything). It's not tied to the
    file's contents (since it works as long as the app is running on the
    NFS server rather than the NFS client).

    The last point seems to indicate some sort of NFS problem, but as I
    said, I've rebooted both machines to no avail. I'd appreciate any
    tips.

    --

    Ted Hall

  3. #2

    Re: NFS file problem: maybe a stale/stuck handle?

    "Theodore W. Hall" <twhall@cuhk.edu.hk> writes:
    >This has me stumped. My apologies if the subject line misses the
    >mark.
    >...
    >Everything seems fine, except for one file:
    > /usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat
    >...
    >* If I make another duplicate of the file, or even the entire
    > netscape_4.79_irix6.5 directory, on the IRIX server, it doesn't
    > work. Everything else in netscape seems to run fine, except for
    > the "Spelling" tool.
    I wonder if pen4s324.dat might be a "sparse file" that did not get
    copied correctly onto the new server. Try running cksum on both
    files, locally and over the NFS mount, and see whether the checksums
    differ. Along the same line of reasoning, if the file did get copied
    OK but is indeed a sparse file, you may be running into some sort of
    NFS bug related to sparse file handling.

    Good Luck!

    Mark Hittinger
    bugs@pu.net

  4. #3

    Re: NFS file problem: maybe a stale/stuck handle?

    In article <eKOdnVU3rrwIqafdRVn_vA@comcast.com>,
    bugs@pu.net (Mark Hittinger) wrote:
    > "Theodore W. Hall" <twhall@cuhk.edu.hk> writes:
    > >This has me stumped. My apologies if the subject line misses the
    > >mark.
    > >...
    > >Everything seems fine, except for one file:
    > > /usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat
    > >...
    > >* If I make another duplicate of the file, or even the entire
    > > netscape_4.79_irix6.5 directory, on the IRIX server, it doesn't
    > > work. Everything else in netscape seems to run fine, except for
    > > the "Spelling" tool.
    >
    > I wonder if the pen4s324.dat might be a "sparse file" which may not
    > have gotten copied correctly onto the new server. Try running a local
    > and nfs mounted cksum on both files and see if there is a checksum
    > difference. Along the same line of reasoning if the file did get
    > copied OK but is indeed a sparse file there may be some sort of nfs
    > bug that you are running into related to sparse file handling.
    He said he already compared them using "cmp" and "diff"; I can't imagine
    how a checksum would detect a difference that these didn't. Sparseness is
    transparent to user-level applications.

    I suggest the OP use truss on the client to see what it's doing when
    Netscape hangs.

    --
    Barry Margolin, barmar@alum.mit.edu
    Arlington, MA
    *** PLEASE post questions in newsgroups, not directly to me ***

  5. #4

    Re: NFS file problem: maybe a stale/stuck handle?

    Theodore W. Hall wrote:
    >
    > I've recently migrated some directories from one NFS file server
    > (SunOS 5.6) to another (IRIX 6.5.13m) using "cp -Rp" as root from an
    > NFS client (IRIX 6.5.19m).
    cp isn't a good tool to copy directory trees from one machine to
    another.
    > /usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat
    >
    > The symptom is that Netscape mail's "Spelling" tool is unavailable.
    > I've traced the spelling problem to that particular file.
    >
    > I've compared the files on the SunOS and IRIX servers with diff and
    > cmp, as a regular user (no special privileges), and they appear to
    > be identical. Neither diff nor cmp have any trouble reading either
    > file. Ownership, sizes, dates, permissions, everything I can think
    > of to compare is identical, except for their physical locations on
    > disk.
    Next check the UID and GID numbers that give the names you see.
    Check them on both machines. The ownership could still be wrong.

    Finally, there's a topic that comes up a lot with tape drives that may
    apply. IRIX uses little-endian and Solaris uses big-endian (or is
    it the other way around? Anyway, they are different). Reading a file
    locally won't do htonl() and ntohl() mapping of binary files. Reading
    a file over NFS should do network order mapping. I suspect it is a
    binary file and the XDR layer of NFS broke its byte order.

  6. #5

    Re: NFS file problem: maybe a stale/stuck handle?

    Barry Margolin wrote:
    > I suggest the OP use truss on the client to see what it's doing
    > when Netscape hangs.
    That was revealing. "man -k truss" on IRIX led me to /usr/sbin/par,
    which reports an error of "No locks available":

    open("/usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat", O_RDONLY, 0777) = 28
    fcntl(28, F_GETLK, 0x7ffefba0) errno = 46 (No locks available)
    close(28) OK
    close(28) errno = 9 (Bad file number)

    Moreover, when I shuffle things to get around that (as I described
    in my original post), I get a similar "No locks available" error on
    a couple of other spelling related files -- netscape.dic and
    ${HOME}/.netscape/custom.dic

    open("/usr/local/netscape_4.79_irix6.5/spell/netscape.dic", O_RDONLY, 0777) = 29
    fcntl(29, F_GETLK, 0x7ffef950) errno = 46 (No locks available)
    close(29)
    END-close() OK
    ...
    open("/usr/people/hall/.netscape/custom.dic", O_RDONLY, 0777) = 29
    fcntl(29, F_GETLK, 0x7ffef950) errno = 46 (No locks available)
    close(29) OK

    Apparently, if "pen4s324.dat" fails, then netscape doesn't even try
    to open the custom dictionaries. If "pen4s324.dat" succeeds, then
    the Spelling tool is available even though it fails to open the
    custom dictionaries. ("pen4s324.dat" seems to be the main
    dictionary, in some binary format. "netscape.dic" is a text file
    with only a few entries such as "Netscape", "HTML", "browser",
    "Collabra", "applets", ... not essential to the tool.)

    I've rebooted the NFS client again, but it made no difference. So
    the next question is: Why are there no locks available for these
    few files? Everything else seems to be fine.
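    The failing call can be reproduced without Netscape. A minimal C sketch, assuming the goal is to mimic the F_GETLK query in the par trace: open the file read-only and ask whether a lock could be placed. On an NFSv2/v3 mount that query is forwarded to the server's lock manager, so if lockd is not running there the call would be expected to fail with ENOLCK, as in the trace. probe_getlk() is a hypothetical name, and asking about a read lock over the whole file is an assumption; the trace does not show which lock type Netscape requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: mimics the F_GETLK query seen in the par trace.
 * Returns 0 if the query succeeded, otherwise the errno of the failing
 * call (e.g. ENOENT from open(), ENOLCK from fcntl()). */
int probe_getlk(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_RDLCK;     /* assumption: ask about a shared (read) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "through end of file" */

    /* Over NFSv2/v3 this query is answered via the lock manager. */
    int err = (fcntl(fd, F_GETLK, &fl) == -1) ? errno : 0;
    close(fd);               /* close() may change errno, so err was saved */

    if (err == ENOLCK)
        fprintf(stderr, "%s: %s (is lockd running on the server?)\n",
                path, strerror(err));
    return err;
}
```

    Run against the same file through each mount, a return of 0 for the SunOS copy and ENOLCK for the IRIX copy would point at the lock manager rather than at the file itself.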

    --

    Ted Hall

  7. #6

    Re: NFS file problem: maybe a stale/stuck handle?

    Mark Hittinger wrote:
    > I wonder if the pen4s324.dat might be a "sparse file" which may not
    > have gotten copied correctly onto the new server. Try running a local
    > and nfs mounted cksum on both files and see if there is a checksum
    > difference. Along the same line of reasoning if the file did get
    > copied OK but is indeed a sparse file there may be some sort of nfs
    > bug that you are running into related to sparse file handling.
    >
    > Good Luck!
    Thanks. I tried /usr/bin/cksum on both NFS servers and the NFS
    client and got identical results on all three hosts.

    I've also discovered via /usr/sbin/par that a similar error is
    occurring on a plain text file of just 144 bytes (netscape.dic), so
    it doesn't seem to be related to "sparsity". I didn't notice this
    error previously since the only "damage" is the loss of a handful
    of custom dictionary entries.

    --

    Ted Hall

  8. #7

    Re: NFS file problem: maybe a stale/stuck handle?

    Doug Freyburger wrote:
    > Next check the UID and GID numbers that give the names you see.
    > Check them on both machines. The ownership could still be wrong.
    UID = 0, GID = 0, mode = -rw-r--r--

    > Finally there's a topic that comes up a lot in tape drives that may
    > apply. IRIX uses little-endian and Solaris uses big-endian (or is
    > it the other way around, anyways they are different). Reading a
    > file locally won't do htonl() and ntohl() mapping of binary files.
    > Reading a file over NFS should do network order mapping. I suspect
    > it is a binary file and the XDR layer of NFS broke its byte order.
    Mmm, in my experience, IRIX-MIPS and SunOS-SPARC are both big-endian.

    Anyway, /usr/sbin/par reveals that I'm getting a similar error
    "No locks available" on a 144-byte plain text file, "netscape.dic".

    Thanks for your suggestions. I've been bitten by byte order before,
    but not this time.

    --

    Ted Hall

  9. #8

    Re: NFS file problem: maybe a stale/stuck handle?

    In article <403B697A.96099D1E@cuhk.edu.hk>,
    "Theodore W. Hall" <twhall@cuhk.edu.hk> wrote:
    > That was revealing. "man -k truss" on IRIX led me to /usr/sbin/par,
    > which reports an error of "No locks available":
    Sounds like there's a problem with file locking on the server you copied
    the files to.

    --
    Barry Margolin, barmar@alum.mit.edu
    Arlington, MA
    *** PLEASE post questions in newsgroups, not directly to me ***

  10. #9

    Re: NFS file problem: maybe a stale/stuck handle?

    In article <403B697A.96099D1E@cuhk.edu.hk>, I wrote:
    > That was revealing. "man -k truss" on IRIX led me to
    > /usr/sbin/par, which reports an error of "No locks available":

    Barry Margolin wrote:
    > Sounds like there's a problem with file locking on the server you
    > copied the files to.
    Eureka! I've just discovered that the SGI NFS server isn't
    running lockd -- it's a separate switch from nfsd. nfsd is on,
    but lockd is off. Urggh ... live and learn ... I'm obviously
    an amateur at this.

    I can't reboot now, but I'm confident that starting lockd on the
    next reboot will clear this up. If it doesn't, I'll be back ...

    --

    Ted Hall
