Strange crash on E-450


  1. #1

    Strange crash on E-450

    I have an E-450 that has crashed the past 2 days. When I get in in
    the morning the server is single-user waiting for a password. The
    strange part is, it appears the server did not go all the way down and
    then boot back up. The /var, /opt and some other file systems are
    corrupt. I can fsck /var, but not the others: I get a "Can't open
    /dev/vx/dsk/opt" error. When I fix /var and mount it, all the users
    are still logged in. The server logs show that it has NOT rebooted.
    The uptime is from the last known reboot. There are no logs, since
    /var is not mounted. And I am not getting a crash dump.

    We are running Solaris 8. The only app running is Sybase 12.5.

    Any ideas...


    Joe
    Joe Guest

  3. #2

    Re: Strange crash on E-450

    Joe wrote:
    >
    > I have an E-450 that has crashed the past 2 days. When I get in in
    > the morning the server is single-user waiting for a password.
    Can you describe this in a little more detail?
    > The strange part is, it appears the server did not go all the way down
    > and then boot back up. The /var, /opt and some other file systems are
    > corrupt. I can fsck /var, but not the others: I get a "Can't open
    > /dev/vx/dsk/opt" error.
    You do realise you should fsck the raw device /dev/vx/rdsk/...?
    > When I fix /var and mount it, all the users are still logged in.
    Are you sure it was in single user mode?
    > The server logs show that it has NOT rebooted.
    > The uptime is from the last known reboot.
    What does last show?
    > There are no logs, since /var is not mounted.
    Hmmm ... last or the others won't show anything then.
    > And I am not getting a crash dump.
    >
    > We are running Solaris 8. The only app running is Sybase 12.5.
    What about ASE's log? Is its filesystem unmounted? If not,
    check ASE's log for anything unusual.
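
    To illustrate the raw-device point above: Solaris pairs each block
    device under a dsk directory with a character (raw) device under the
    matching rdsk directory, and fsck is normally pointed at the raw one.
    Here is a minimal Python sketch of that path mapping. The helper name
    raw_device_path is hypothetical, and it only rewrites the string; it
    does not check that the device actually exists.

```python
from pathlib import PurePosixPath


def raw_device_path(block_path: str) -> str:
    """Return the raw (character) device path for a block device path.

    Hypothetical helper: it renames the last 'dsk' directory component
    to 'rdsk' and leaves everything else untouched.
    """
    parts = list(PurePosixPath(block_path).parts)
    if "rdsk" in parts:
        # Already a raw device path, nothing to do.
        return block_path
    if "dsk" not in parts:
        raise ValueError(f"no 'dsk' component in {block_path!r}")
    # Index of the last 'dsk' component in the path.
    idx = len(parts) - 1 - parts[::-1].index("dsk")
    parts[idx] = "rdsk"
    return str(PurePosixPath(*parts))


if __name__ == "__main__":
    print(raw_device_path("/dev/vx/dsk/opt"))
```

    The printed path is what you would hand to fsck in place of the
    block device.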

    -am © 2003
    Anthony Mandic Guest

  4. #3

    Re: Strange crash on E-450

    On 9 Jul 2003 06:06:15 -0700, joehoechst@cs.com (Joe) wrote:
    >I have an E-450 that has crashed the past 2 days. When I get in in
    >the morning the server is single-user waiting for a password. The
    >strange part is, it appears the server did not go all the way down and
    >then boot back up. The /var, /opt and some other file systems are
    >corrupt. I can fsck /var, but not the others: I get a "Can't open
    >/dev/vx/dsk/opt" error. When I fix /var and mount it, all the users
    >are still logged in. The server logs show that it has NOT rebooted.
    >The uptime is from the last known reboot. There are no logs, since
    >/var is not mounted. And I am not getting a crash dump.
    >
    >We are running Solaris 8. The only app running is Sybase 12.5.
    >
    >Any ideas...
    >
    >
    >Joe
    You might want to set up mirrors for your system logs on partitions
    that stay up. (see /etc/syslog.conf)

    Does Sybase stay up? Are there any anomalies in the Sybase logs?

    When you say "is in single user waiting for the password", do you mean
    that you can't telnet in from another machine, or that the console is
    in text mode at the password prompt?

    Are there any errors in the system logs at all that might indicate a
    problem?

    When you say "the users are still logged in", do you mean that their
    processes are still there, but disconnected from their clients, or
    that these users can continue working as before? This bit is
    particularly confusing to me.

    How do you fix /opt and the other FS? Do you have to reboot?

    Does prtdiag -v show everything you think it should?

    You have to be more specific.

    --Kent
    =================================
    Kent Smith * IPSO Incorporated
    Business * Technology * Solutions
    Financial Services and Accounting Systems Consulting

    http://www.ipsoinc.com
    Kent Smith Guest

  5. #4

    Re: Strange crash on E-450

    Kent Smith <ksmith@ipsoinc.com> wrote in message news:<c49ogv446o517lo5phugikp84b1lkqvl9e@4ax.com>...
    > On 9 Jul 2003 06:06:15 -0700, joehoechst@cs.com (Joe) wrote:
    > >
    > >Joe
    > You might want to set up mirrors for your system logs on partitions
    > that stay up. (see /etc/syslog.conf)
    >
    > Does Sybase stay up? Are there any anomalies in the Sybase logs?
    No, Sybase does not stay up. It is on one of the corrupted file systems.
    >
    > When you say "is in single user waiting for the password", do you mean
    > that you can't telnet in from another machine, or that the console is
    > in text mode at the password prompt?
    No, you cannot telnet in from other servers. This is what is at the
    console (without the dashes):

    --------------
    Login incorrect

    Type control-d to proceed with normal startup,
    (or give root password for system maintenance):
    -------------------
    >
    > Are there any errors in the system logs at all that might indicate a
    > problem?
    No, there are no logs to speak of. We send the messages to another
    server (and keep them local). The only thing I get in the messages
    log is the typical messages when I reboot the server. There are no
    crash dumps either.
    >
    > When you say "the users are still logged in", do you mean that their
    > processes are still there, but disconnected from their clients, or
    > that these users can continue working as before? This bit is
    > particularly confusing to me.
    >
    I did not check on their processes. When I did the last command, I
    saw they were still logged in. The finger command confirmed that. I
    did a "who -b" and it said it was still up from April 7th (the first
    time this happened)
    > How do you fix /opt and the other FS? Do you have to reboot?
    The only file system that I could fsck was /var. I had to reboot to
    fix the rest. When I tried to fsck the file systems I got "Can't
    open /dev/vx/dsk/opt". I also tried to fsck /dev/vx/rdk/opt, with the
    same message.
    >
    > does prtdiag -v show everything you think it should?
    I believe it does. I don't see any errors. The one thing I notice is
    prtdiag on an E-450 does not show last power failure.
    >
    > You have to be more specific.
    I know. This sounds very strange. It appears as if the server is
    going DOWN to single user mode, not rebooting to single user mode.
    Otherwise I would assume a bad power supply/input. But since it
    appears that the server does not completely reboot, that theory does
    not seem possible.

    >
    > --Kent
    > =================================
    > Kent Smith * IPSO Incorporated
    > Business * Technology * Solutions
    > Financial Services and Accounting Systems Consulting
    >
    > http://www.ipsoinc.com
    Joe Guest

  6. #5

    Re: Strange crash on E-450

    Joe wrote:
    >
    > I know. This sounds very strange. It appears as if the server is
    > going DOWN to single user mode, not rebooting to single user mode.
    > Otherwise I would assume a bad power supply/input. But since it
    > appears that the servers does not completely re-boot, that theory does
    > not seem possible.
    Caution: this is from another unix variant, belief is that Solaris
    would do the same thing tho... and someone will correct if wrong.

    Have seen a similar issue before, mysteriously going to single
    mode. A wacked out user application was sending kill signals
    without first making sure that the PID destination was properly
    initialized and made sense. Murphy's Law dictated that the
    PID it would end up with was "-1" which would kill the init
    process group. Due to an admin oopsie, this wacked app was
    running as root.
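
    For what it's worth, the missing check can be sketched in a few
    lines. On POSIX systems, kill() with a PID of -1 signals every
    process the caller is allowed to signal, which for root is nearly
    everything, and a PID of 0 signals the caller's own process group.
    The Python sketch below uses a hypothetical wrapper, safe_kill, that
    refuses those values (and PID 1, which is init) before calling the
    real os.kill. It illustrates the guard and is not what the
    application in question actually did.

```python
import os
import signal


def safe_kill(pid: int, sig: int = signal.SIGTERM) -> None:
    """Send sig to exactly one ordinary process.

    Hypothetical guard: rejects PIDs that would address init (1),
    the caller's process group (0), or a broadcast/group target (< 0).
    """
    if isinstance(pid, bool) or not isinstance(pid, int):
        raise TypeError(f"pid must be an int, got {type(pid).__name__}")
    if pid <= 1:
        raise ValueError(f"refusing to signal pid {pid}")
    os.kill(pid, sig)


if __name__ == "__main__":
    # Signal 0 performs the permission/existence check without
    # delivering anything, so this is harmless.
    safe_kill(os.getpid(), 0)
    try:
        safe_kill(-1, signal.SIGTERM)
    except ValueError as exc:
        print("blocked:", exc)
```

    An uninitialized or error-valued PID variable is typically 0 or -1,
    so this one range check covers the failure described above.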

    Lon Stowell Guest

  7. #6

    Re: Strange crash on E-450


    "Lon Stowell" <lon.stowell@comcast.net> wrote in message
    news:Di_Oa.19146$Ph3.1404@sccrnsc04...
    > Joe wrote:
    >
    > >
    > > I know. This sounds very strange. It appears as if the server is
    > > going DOWN to single user mode, not rebooting to single user mode.
    > > Otherwise I would assume a bad power supply/input. But since it
    > > appears that the servers does not completely re-boot, that theory does
    > > not seem possible.
    >
    > Caution: this is from another unix variant, belief is that Solaris
    > would do the same thing tho... and someone will correct if wrong.
    >
    > Have seen a similar issue before, mysteriously going to single
    > mode. A wacked out user application was sending kill signals
    > without first making sure that the PID destination was properly
    > initialized and made sense. Murphy's Law dictated that the
    > PID it would end up with was "-1" which would kill the init
    > process group. Due to an admin oopsie, this wacked app was
    > running as root.
    >
    That makes very good sense.
    I still say have someone profile the power and environmental
    aspects in detail...

    Is this system accessible from the internet? Is a firewall in place?

    There are no logs anywhere on this server, with any clue as to what
    happened?

    Have you checked with Sun? Any patches available that address symptoms like
    this?

    Put a performance monitor product on both the power and on the E-450 /
    Solaris and see what it says over a week's worth of time. Profile in detail
    what goes on, when, and what effect it has on operational dynamics and
    resources. If something is going on, you'll have a clear indication as to
    exactly when, what, how and why. You can get a 10 day free trial version of an
    extremely detailed, low overhead Solaris performance monitor at
    www.deltekonline.com (Solaris Agent, Windows Performance Console - two
    components).

    hth.
    --
    Regards,
    Scott


    Scott Richardson Guest

  8. #7

    Re: Strange crash on E-450

    On 9 Jul 2003 12:39:20 -0700, joehoechst@cs.com (Joe) wrote:
    >Kent Smith <ksmith@ipsoinc.com> wrote in message news:<c49ogv446o517lo5phugikp84b1lkqvl9e@4ax.com>...
    >> On 9 Jul 2003 06:06:15 -0700, joehoechst@cs.com (Joe) wrote:
    >
    >> >
    >> >Joe
    >I know. This sounds very strange. It appears as if the server is
    >going DOWN to single user mode, not rebooting to single user mode.
    >Otherwise I would assume a bad power supply/input. But since it
    >appears that the servers does not completely re-boot, that theory does
    >not seem possible.
    >
    What does "uptime" report? Does the system think it rebooted? What
    is the date of the "init" process (as reported by ps -ef)? Your last
    scheduled reboot, or the time of the anomalous behavior?

    Do you have savecore enabled? (in my 2.6 system that is done in
    rc2.d/S20sysetup). If not, you might want to. If so, did you get a
    system core? Anything interesting in it?

    This is a real stumper!

    --Kent
    =================================
    Kent Smith * IPSO Incorporated
    Business * Technology * Solutions
    Financial Services and Accounting Systems Consulting

    http://www.ipsoinc.com
    Kent Smith Guest

  9. #8

    Re: Strange crash on E-450

    Scott Richardson wrote:
    >
    > Lon Stowell wrote:
    >
    > > Have seen a similar issue before, mysteriously going to single
    > > mode. A wacked out user application was sending kill signals
    > > without first making sure that the PID destination was properly
    > > initialized and made sense. Murphy's Law dictated that the
    > > PID it would end up with was "-1" which would kill the init
    > > process group. Due to an admin oopsie, this wacked app was
    > > running as root.
    >
    > That makes very good sense.
    Yeah, I knew someone who once did "kill 1 1" instead of
    "kill -1 1".
    > I still say have someone detail profile the power and environmental
    > aspects...
    >
    > Is this system accesible to the internet? A Firewall in place?
    >
    > There are no logs anywhere on this server, with any clue as to what
    > happened?
    I'd suggest pointing the loghost to another machine. Even if
    this one hangs and doesn't write to its logs, its syslogd
    might be able to send a log message out to another machine.
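
    Along the same lines, a periodic heartbeat sent to the remote log
    host would at least give you a last-known-alive timestamp. Note that
    this is not syslogd forwarding itself; it is a separate sketch that
    builds a BSD-style syslog datagram by hand and sends it over UDP
    port 514. The function names, the "heartbeat" tag, and the host name
    passed to send_heartbeat are all assumptions for illustration.

```python
import socket

SYSLOG_PORT = 514  # conventional UDP syslog port


def build_syslog_packet(facility: int, severity: int,
                        tag: str, message: str) -> bytes:
    """Build a minimal BSD-syslog style datagram: <PRI>tag: message.

    PRI is facility * 8 + severity. The receiving syslogd is expected
    to add its own timestamp and the sender's address.
    """
    if not 0 <= facility <= 23:
        raise ValueError("facility must be in 0..23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be in 0..7")
    priority = facility * 8 + severity
    return f"<{priority}>{tag}: {message}".encode("ascii", errors="replace")


def send_heartbeat(loghost: str, message: str = "alive") -> None:
    """Send one user.info heartbeat datagram to loghost (hypothetical)."""
    packet = build_syslog_packet(1, 6, "heartbeat", message)  # user.info
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (loghost, SYSLOG_PORT))
```

    Run send_heartbeat from cron every few minutes; the time of the last
    heartbeat in the remote log then brackets when the machine dropped
    out of multi-user mode.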

    -am © 2003
    Anthony Mandic Guest

  10. #9

    Re: Strange crash on E-450

    This sounds similar to an issue I had about 8 months ago. I had an E4500
    with 8 CPUs and 4 GB of RAM with an external storage array in a room that was
    supposed to have 24x7 AC. As it turned out, the room's AC unit was on a
    controller that was set to cool only during work hours. Bottom line is that the
    server was shutting itself down because it was getting too hot during the
    night when the AC was not on. We did not catch the issue until I had to
    physically do something to one of the other servers in the same room. When I
    walked into the room it was easily 100 degrees Fahrenheit in the room, and
    the system was reporting an internal temp of 140 degrees centigrade.

    Charles

    "Joe" <joehoechst@cs.com> wrote in message
    news:461cdbbf.0307090506.71bddbb5@posting.google.com...
    > I have an E-450 that has crashed the past 2 days. When I get in in
    > the morning the server is single-user waiting for a password. The
    > strange part is, it appears the server did not go all the way down and
    > then boot back up. The /var, /opt and some other file systems are
    > corrupt. I can fsck /var, but not the others: I get a "Can't open
    > /dev/vx/dsk/opt" error. When I fix /var and mount it, all the users
    > are still logged in. The server logs show that it has NOT rebooted.
    > The uptime is from the last known reboot. There are no logs, since
    > /var is not mounted. And I am not getting a crash dump.
    >
    > We are running Solaris 8. The only app running is Sybase 12.5.
    >
    > Any ideas...
    >
    >
    > Joe

    Lacour Guest
