Theodore W. Hall #1
NFS file problem: maybe a stale/stuck handle?
This has me stumped. My apologies if the subject line misses the
mark.
I've recently migrated some directories from one NFS file server
(SunOS 5.6) to another (IRIX 6.5.13m) using "cp -Rp" as root from an
NFS client (IRIX 6.5.19m).
Before the migration, the NFS client did this:
/usr/local -> /net/sun-host/mnt/usr/local
After the migration, it does this:
/usr/local -> /net/irix-host/mnt/usr/local
Everything seems fine, except for one file:
/usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat
The symptom is that Netscape mail's "Spelling" tool is unavailable.
I've traced the spelling problem to that particular file.
I've compared the files on the SunOS and IRIX servers with diff and
cmp, as a regular user (no special privileges), and they appear to
be identical. Neither diff nor cmp have any trouble reading either
file. Ownership, sizes, dates, permissions, everything I can think
of to compare is identical, except for their physical locations on
disk. In trying to debug this, I've found the following:
* Only netscape seems to have any trouble accessing the file.
Other apps (diff, cmp, strings, cp) have no trouble reading it.
* If I replace the file on the IRIX server with a symlink back to
the SunOS server, it works in netscape:
pen4s324.dat -> /net/sun-host/mnt/usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat
* If I replace it with a symlink to another duplicate of the file
on the NFS client's internal disk, it works in netscape (only on
that host, of course):
pen4s324.dat -> /usr/local-on-nfs-client/pen4s324.dat
* If I make another duplicate of the file, or even the entire
netscape_4.79_irix6.5 directory, on the IRIX server, it doesn't
work. Everything else in netscape seems to run fine, except for
the "Spelling" tool.
* If I log in to the IRIX server and run netscape there (with
DISPLAY pointing back to the NFS client), THAT works. It avoids
NFS and accesses the file locally:
/usr/local -> /mnt/usr/local
I've rebooted both the IRIX NFS server and the NFS client, but this
behavior persists. It's not tied to netscape's pathname to the file
(since the pathname works if it's a symlink to the SunOS server).
It's not tied to the file's inode (since copying the file to a
different inode doesn't change anything). It's not tied to the
file's contents (since it works as long as the app is running on the
NFS server rather than the NFS client).
The last point seems to indicate some sort of NFS problem, but as I
said, I've rebooted both machines to no avail. I'd appreciate any
tips.
--
Ted Hall
Theodore W. Hall Guest
-
Mark Hittinger #2
Re: NFS file problem: maybe a stale/stuck handle?
"Theodore W. Hall" <twhall@cuhk.edu.hk> writes:
>This has me stumped. My apologies if the subject line misses the
>mark.
>...
>Everything seems fine, except for one file:
> /usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat
>...
>* If I make another duplicate of the file, or even the entire
> netscape_4.79_irix6.5 directory, on the IRIX server, it doesn't
> work. Everything else in netscape seems to run fine, except for
> the "Spelling" tool.
I wonder if the pen4s324.dat might be a "sparse file" which may not
have gotten copied correctly onto the new server. Try running a local
and nfs mounted cksum on both files and see if there is a checksum
difference. Along the same line of reasoning if the file did get
copied OK but is indeed a sparse file there may be some sort of nfs
bug that you are running into related to sparse file handling.
Good Luck!
Mark Hittinger
bugs@pu.net
Mark Hittinger Guest
-
Barry Margolin #3
Re: NFS file problem: maybe a stale/stuck handle?
In article <eKOdnVU3rrwIqafdRVn_vA@comcast.com>,
bugs@pu.net (Mark Hittinger) wrote:
> "Theodore W. Hall" <twhall@cuhk.edu.hk> writes:
> >This has me stumped. My apologies if the subject line misses the
> >mark.
> >...
> >Everything seems fine, except for one file:
> > /usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat
> >...
> >* If I make another duplicate of the file, or even the entire
> > netscape_4.79_irix6.5 directory, on the IRIX server, it doesn't
> > work. Everything else in netscape seems to run fine, except for
> > the "Spelling" tool.
> I wonder if the pen4s324.dat might be a "sparse file" which may not
> have gotten copied correctly onto the new server. Try running a local
> and nfs mounted cksum on both files and see if there is a checksum
> difference. Along the same line of reasoning if the file did get
> copied OK but is indeed a sparse file there may be some sort of nfs
> bug that you are running into related to sparse file handling.
He said he already compared them using "cmp" and "diff"; I can't imagine
how checksum would detect a difference that these didn't. Sparseness is
transparent to user-level applications.
I suggest the OP use truss on the client to see what it's doing when
Netscape hangs.
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
Barry Margolin Guest
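The point about sparseness being transparent can be checked directly. The following is a minimal sketch, assuming Python 3 on a POSIX-like system; the file names `sparse.dat` and `dense.dat` are hypothetical. It creates one file with a hole and one with the same zero bytes written out, then checksums both the way cksum, cmp, or diff would see them, through ordinary reads. The `looks_sparse` helper is only a heuristic, since not every file system reports allocated blocks in a useful way.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents as read() sees them."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_sparse(path):
    """Heuristic: fewer allocated bytes than the logical size suggests holes.

    st_blocks is counted in 512-byte units on POSIX systems; some file
    systems do not report it in a way that makes this check meaningful.
    """
    info = os.stat(path)
    return info.st_blocks * 512 < info.st_size


def make_demo_files(directory, size=1 << 20):
    """Create a file with a hole and a fully written file with the same bytes."""
    sparse_path = os.path.join(directory, "sparse.dat")  # hypothetical name
    dense_path = os.path.join(directory, "dense.dat")    # hypothetical name
    with open(sparse_path, "wb") as handle:
        handle.truncate(size)        # extends the file without writing data
    with open(dense_path, "wb") as handle:
        handle.write(b"\0" * size)   # writes every zero byte explicitly
    return sparse_path, dense_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        sparse_path, dense_path = make_demo_files(workdir)
        print("same checksum:", sha256_of(sparse_path) == sha256_of(dense_path))
        print("sparse file looks sparse:", looks_sparse(sparse_path))
```

If the two digests match, a checksum cannot tell a sparse copy from a dense one, which is consistent with the observation that cmp and diff already found the files identical.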
-
Doug Freyburger #4
Re: NFS file problem: maybe a stale/stuck handle?
Theodore W. Hall wrote:
>
> I've recently migrated some directories from one NFS file server
> (SunOS 5.6) to another (IRIX 6.5.13m) using "cp -Rp" as root from an
> NFS client (IRIX 6.5.19m).
cp isn't a good tool to copy directory trees from one machine to
another.
> /usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat
>
> The symptom is that Netscape mail's "Spelling" tool is unavailable.
> I've traced the spelling problem to that particular file.
>
> I've compared the files on the SunOS and IRIX servers with diff and
> cmp, as a regular user (no special privileges), and they appear to
> be identical. Neither diff nor cmp have any trouble reading either
> file. Ownership, sizes, dates, permissions, everything I can think
> of to compare is identical, except for their physical locations on
> disk.
Next check the UID and GID numbers that give the names you see.
Check them on both machines. The ownership could still be wrong.
Finally there's a topic that comes up a lot in tape drives that may
apply. IRIX uses little-endian and Solaris uses big-endian (or is
it the other way around, anyways they are different). Reading a file
locally won't do htonl() and ntohl() mapping of binary files. Reading
a file over NFS should do network order mapping. I suspect it is a
binary file and the XDR layer of NFS broke its byte order.
Doug Freyburger Guest
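The suggestion to compare numeric UID and GID values rather than the names they map to can be sketched as follows. This is an illustrative helper, assuming Python 3 on the NFS client; the function names `describe` and `metadata_differences` are made up for this example and are not part of any tool mentioned in the thread.

```python
import os
import stat


def describe(path):
    """Collect the numeric ownership and other metadata that ls turns into names."""
    info = os.stat(path)
    return {
        "uid": info.st_uid,                   # numeric owner, not the user name
        "gid": info.st_gid,                   # numeric group, not the group name
        "mode": stat.filemode(info.st_mode),  # e.g. "-rw-r--r--"
        "size": info.st_size,
        "mtime": int(info.st_mtime),
    }


def metadata_differences(path_a, path_b):
    """Return {field: (value_a, value_b)} for every field that differs."""
    a, b = describe(path_a), describe(path_b)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}
```

Called with the copy under /net/sun-host/mnt/usr/local and the copy under /net/irix-host/mnt/usr/local, an empty result would mean the two files agree on every field listed. Running it on each machine separately matters because the same number can map to different names on different hosts.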
-
Theodore W. Hall #5
Re: NFS file problem: maybe a stale/stuck handle?
Barry Margolin wrote:
> I suggest the OP use truss on the client to see what it's doing
> when Netscape hangs.
That was revealing. "man -k truss" on IRIX led me to /usr/sbin/par,
which reports an error of "No locks available":
open("/usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat", O_RDONLY, 0777) = 28
fcntl(28, F_GETLK, 0x7ffefba0) errno = 46 (No locks available)
close(28) OK
close(28) errno = 9 (Bad file number)
Moreover, when I shuffle things to get around that (as I described
in my original post), I get a similar "No locks available" error on
a couple of other spelling related files -- netscape.dic and
${HOME}/.netscape/custom.dic
open("/usr/local/netscape_4.79_irix6.5/spell/netscape.dic", O_RDONLY, 0777) = 29
fcntl(29, F_GETLK, 0x7ffef950) errno = 46 (No locks available)
close(29)
END-close() OK
...
open("/usr/people/hall/.netscape/custom.dic", O_RDONLY, 0777) = 29
fcntl(29, F_GETLK, 0x7ffef950) errno = 46 (No locks available)
close(29) OK
Apparently, if "pen4s324.dat" fails, then netscape doesn't even try
to open the custom dictionaries. If "pen4s324.dat" succeeds, then
the Spelling tool is available even though it fails to open the
custom dictionaries. ("pen4s324.dat" seems to be the main
dictionary, in some binary format. "netscape.dic" is a text file
with only a few entries such as "Netscape", "HTML", "browser",
"Collabra", "applets", ... not essential to the tool.)
I've rebooted the NFS client again, but it made no difference. So
the next question is: Why are there no locks available for these
few files? Everything else seems to be fine.
--
Ted Hall
Theodore W. Hall Guest
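The failure in the trace can probably be reproduced without Netscape. The following is a minimal sketch, assuming Python 3 with the standard `fcntl` module on the NFS client; `probe_lock` is a hypothetical helper name. Note one difference from the trace: it requests a non-blocking shared lock instead of issuing F_GETLK, on the assumption that both operations depend on the same lock service when the file lives on an NFS mount.

```python
import errno
import fcntl


def probe_lock(path):
    """Try a non-blocking shared lock on path and report what happened.

    Returns "ok" if the lock was granted, "no-locks" if the call failed with
    ENOLCK (the "No locks available" error), "busy" if another process holds
    a conflicting lock, or the errno name for any other failure.
    """
    with open(path, "rb") as handle:
        try:
            fcntl.lockf(handle, fcntl.LOCK_SH | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno == errno.ENOLCK:
                return "no-locks"
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return "busy"
            return errno.errorcode.get(exc.errno, str(exc.errno))
        fcntl.lockf(handle, fcntl.LOCK_UN)  # release before the file is closed
        return "ok"
```

Running `probe_lock` on the same path through each mount (the SunOS server, the IRIX server, and the client's local disk) would show whether the error follows the server rather than the file.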
-
Theodore W. Hall #6
Re: NFS file problem: maybe a stale/stuck handle?
Mark Hittinger wrote:
> I wonder if the pen4s324.dat might be a "sparse file" which may not
> have gotten copied correctly onto the new server. Try running a local
> and nfs mounted cksum on both files and see if there is a checksum
> difference. Along the same line of reasoning if the file did get
> copied OK but is indeed a sparse file there may be some sort of nfs
> bug that you are running into related to sparse file handling.
>
> Good Luck!
Thanks. I tried /usr/bin/cksum on both NFS servers and the NFS
client and got identical results on all three hosts.
I've also discovered via /usr/sbin/par that a similar error is
occurring on a plain text file of just 144 bytes (netscape.dic), so
it doesn't seem to be related to "sparsity". I didn't notice this
error previously since the only "damage" is the loss of a handful
of custom dictionary entries.
--
Ted Hall
Theodore W. Hall Guest
-
Theodore W. Hall #7
Re: NFS file problem: maybe a stale/stuck handle?
Doug Freyburger wrote:
> Next check the UID and GID numbers that give the names you see.
> Check them on both machines. The ownership could still be wrong.
UID = 0, GID = 0, mode = -rw-r--r--
> Finally there's a topic that comes up a lot in tape drives that may
> apply. IRIX uses little-endian and Solaris uses big-endian (or is
> it the other way around, anyways they are different). Reading a
> file locally won't do htonl() and ntohl() mapping of binary files.
> Reading a file over NFS should do network order mapping. I suspect
> it is a binary file and the XDR layer of NFS broke its byte order.
Mmm, in my experience, IRIX-MIPS and SunOS-SPARC are both big-endian.
Anyway, /usr/sbin/par reveals that I'm getting a similar error
"No locks available" on a 144-byte plain text file, "netscape.dic".
Thanks for your suggestions. I've been bitten by byte order before,
but not this time.
--
Ted Hall
Theodore W. Hall Guest
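For readers unfamiliar with the byte-order question raised here, the following small sketch shows what the concern looks like in practice. It assumes Python 3 and uses only the standard `struct` module; the helper names and the sample value are illustrative. It packs one 32-bit integer in both byte orders and shows what happens when big-endian data is read as little-endian. As far as NFS goes, file contents are carried as opaque bytes, so a mismatch of this kind would come from the application reading the file on a host with a different byte order, not from the transfer.

```python
import struct
import sys


def as_big_endian(value):
    """Pack an unsigned 32-bit integer most significant byte first."""
    return struct.pack(">I", value)


def as_little_endian(value):
    """Pack an unsigned 32-bit integer least significant byte first."""
    return struct.pack("<I", value)


if __name__ == "__main__":
    value = 0x01020304
    print("host byte order:", sys.byteorder)
    print("big-endian bytes:", as_big_endian(value).hex())
    print("little-endian bytes:", as_little_endian(value).hex())
    # Interpreting big-endian bytes with the wrong order yields another number.
    misread = struct.unpack("<I", as_big_endian(value))[0]
    print("big-endian data misread as little-endian:", hex(misread))
```

Since both hosts in this thread are reported to be big-endian, and the same error shows up on a plain text file, byte order does not explain the symptom.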
-
Barry Margolin #8
Re: NFS file problem: maybe a stale/stuck handle?
In article <403B697A.96099D1E@cuhk.edu.hk>,
"Theodore W. Hall" <twhall@cuhk.edu.hk> wrote:
> That was revealing. "man -k truss" on IRIX led me to /usr/sbin/par,
> which reports an error of "No locks available":
Sounds like there's a problem with file locking on the server you copied
the files to.
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
Barry Margolin Guest
-
Theodore W. Hall #9
Re: NFS file problem: maybe a stale/stuck handle?
In article <403B697A.96099D1E@cuhk.edu.hk>, I wrote:
> That was revealing. "man -k truss" on IRIX led me to
> /usr/sbin/par, which reports an error of "No locks available":
Barry Margolin wrote:
> Sounds like there's a problem with file locking on the server you
> copied the files to.
Eureka! I've just discovered that the SGI NFS server isn't
running lockd -- it's a separate switch from nfsd. nfsd is on,
but lockd is off. Urggh ... live and learn ... I'm obviously
an amateur at this.
I can't reboot now, but I'm confident that starting lockd on the
next reboot will clear this up. If it doesn't, I'll be back ...
--
Ted Hall
Theodore W. Hall Guest
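One way to confirm the diagnosis from the client side is to ask the server's portmapper whether the NFS lock manager is registered. The following is a minimal sketch, assuming Python 3.7 or later and an `rpcinfo` command on the client that supports `-p host`. The function names are hypothetical; 100021 is the RPC program number assigned to the lock manager (usually listed as nlockmgr), and the parser relies only on the program number appearing in the first column.

```python
import subprocess

NLOCKMGR_PROGRAM = 100021  # RPC program number of the NFS lock manager


def lock_manager_registered(rpcinfo_output):
    """Return True if any line of `rpcinfo -p` output lists program 100021."""
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Skip the header line and blank lines; data lines start with a number.
        if fields and fields[0].isdigit() and int(fields[0]) == NLOCKMGR_PROGRAM:
            return True
    return False


def server_has_lock_manager(host):
    """Run `rpcinfo -p host` and check its output for the lock manager."""
    result = subprocess.run(
        ["rpcinfo", "-p", host], capture_output=True, text=True, check=True
    )
    return lock_manager_registered(result.stdout)
```

Calling `server_has_lock_manager("irix-host")` before and after lockd is enabled would be expected to change from False to True, assuming the daemon registers with the portmapper when it starts.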


