5.0.7 machine locks up! - SCO

  1. #1

    5.0.7 machine locks up!

    I just put my new OSR 5.0.7 machine into production, running an
    orthodontist application. Several times now the system has become
    unresponsive and I have had to power-cycle it (ouch). I tried a
    terminal login, a virtual console, and even a telnet login. All I had
    was one shell that responded, but when I tried to switch to the root
    user, it would become unresponsive. Where can I look to find out what
    caused this?

    Here is how the server is configured.
    AMD Athlon
    512 MB ECC RAM
    120 GB EIDE drive
    Intel Ethernet
    Panasonic EIDE DVD-RAM drive.

    I also have a Digi Portserver TS 16 for the terminals and printers. I
    am thinking that maybe it is causing some sort of kernel blocking.

    brian

    Brian Guest

  2. #2

    Re: 5.0.7 machine locks up!

    Brian Lavender wrote:
     

    Let's start by getting a good description of the problem. I don't
    understand "All I had was one shell that responded, but when I tried to
    switch to the root user, it would become unresponsive". Was this a
    pre-existing shell before the problem, or do you mean that you were able
    to login, but no more than that?

    When you say you tried "a terminal login, a virtual console, and even a
    telnet login", what happened in each case? Be explicit. For instance,
    for the [serial?] terminal login, presumably you walked up to a terminal
    that already had a "login:" prompt waiting. You typed "root", did it
    echo the characters? If so, you hit return, did it give you a
    "Password:" prompt? Describe all the attempts at this level of detail.

    Before the problem, was the console sitting at a graphical or text
    screen? If text, were there any messages? If graphical, were you able
    to switch away to a text screen? (That should be part of your
    description of the "virtual console" attempt...)
     

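    >I also have a Digi Portserver TS 16 for the terminals and printers. I
    >am thinking that maybe it is causing some sort of kernel blocking.

    Maybe. Make sure that the Digi cabling is correct; in particular, make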
    sure you aren't using longer cables than recommended by Digi, to connect
    the external Portserver box to the PCI card.

    Next time it happens, record details of all the attempts to get in.

    Note whether it responds to `ping` from another system.
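
One way to keep such a record is a small timestamped log kept on the other system. This is only a sketch: `note`, `probe`, `HANGLOG` and `PING_CMD` are made-up names, and `ping -c 1` is an assumption about the second machine's ping (change PING_CMD if its ping takes different options).

```shell
#!/bin/sh
# Sketch only: timestamped notes taken from a second machine during a hang.
# HANGLOG and PING_CMD are illustrative names, not part of any SCO tool.
HANGLOG=${HANGLOG:-hang-notes.log}
PING_CMD=${PING_CMD:-ping -c 1}   # assumption: this ping accepts "-c 1"

# Append one dated line to the log.
note() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$HANGLOG"
}

# Ping the host once and log whether it answered.
probe() {
    host=$1
    # PING_CMD is left unquoted on purpose so its options split into words
    if $PING_CMD "$host" >/dev/null 2>&1; then
        note "ping $host: reply"
    else
        note "ping $host: no reply"
    fi
}
```

For example, `probe server1` followed by `note "tty02: w still responds"` would append two dated lines to hang-notes.log (`server1` is a placeholder hostname).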

    From the first moment the hang is noticed, try to keep an eye on the
    hard disk light(s) on the machine, see whether you can detect any hard
    disk activity (lights blinking). If the light is stuck on _or_ stuck
    off, the drive may be hanging. If the symptoms can be described like
    this, it's probably a disk hang:

    Programs that are already running, like an existing shell prompt, a
    login: prompt, or inetd, respond normally; but they stop responding as
    soon as they need to access the disk. Running an internal command
    like "echo foo" at the shell prompt works. Running an external
    command like `uptime` hangs. Characters can be typed into the login
    prompt, but when you hit return, no password prompt appears. inetd-
    started services like telnet & ftp accept connections, but no login
    prompt appears; the FTP client never starts up.

    If you determine that it's a disk hang (or if you aren't sure), please
    run the following commands and show the output:

    # drive=/dev/rhd00
    # dparam $drive
    # fdisk -p -f $drive
    # divvy -S $drive
    # divvy -N -P $drive
    # divvy -R $drive

    If there's more than one disk and you suspect a different drive is
    hanging, run through those commands, starting with "drive=/dev/rhd10" or
    whatever drive it is.
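
To make the output easier to post, the commands listed above could be wrapped in a small function and redirected to a file. This is a sketch under assumptions: `collect_disk_info` and the `RUN` variable are illustrative names; only the five commands themselves come from the post.

```shell
#!/bin/sh
# Sketch only: gather the requested disk details for one drive in one report.
# The five commands are the ones listed in the post; the function name and
# the RUN variable are illustrative additions.
# Set RUN=echo for a dry run that prints each command instead of running it.

collect_disk_info() {
    drive=$1
    for cmd in "dparam" "fdisk -p -f" "divvy -S" "divvy -N -P" "divvy -R"; do
        echo "### $cmd $drive"
        # $cmd is left unquoted on purpose so its options split into words
        ${RUN:-} $cmd "$drive" 2>&1
    done
}
```

Run as root, `collect_disk_info /dev/rhd00 > /tmp/hd00.txt` would save everything for the first drive in one file; pass /dev/rhd10 (or whichever drive is suspect) for another disk.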
     
    Bela Guest

  3. #3

    Re: 5.0.7 machine locks up!

    On Fri, 5 Sep 2003 02:06:25 GMT, Bela Lubkin wrote:
     
    >
    >Let's start by getting a good description of the problem. I don't
    >understand "All I had was one shell that responded, but when I tried to
    >switch to the root user, it would become unresponsive". Was this a
    >pre-existing shell before the problem, or do you mean that you were able
    >to login, but no more than that?
    >
    >When you say you tried "a terminal login, a virtual console, and even a
    >telnet login", what happened in each case? Be explicit. For instance,
    >for the [serial?] terminal login, presumably you walked up to a terminal
    >that already had a "login:" prompt waiting. You typed "root", did it
    >echo the characters? If so, you hit return, did it give you a
    >"Password:" prompt? Describe all the attempts at this level of detail.
    >
    >Before the problem, was the console sitting at a graphical or text
    >screen? If text, were there any messages? If graphical, were you able
    >to switch away to a text screen? (That should be part of your
    >description of the "virtual console" attempt...)


    Here's what the machine has attached to it.

    2 telnet logins
    2 serial-based terminal logins:
      1 through the Digi Terminal Server
      1 through tty1a
    The console, with three virtual consoles

    The telnet and serial-based terminals became totally unresponsive. I
    had one console on tty02 in an existing shell where I could type
    $ w
    and it would respond. I could type a few other commands as well. On
    the other virtual consoles, if I logged out, I would get a login:
    prompt. I could type in the user name, but then I would get no
    Password: prompt. I believe that after waiting a long time I was
    able to get a password prompt, and then I did get a # prompt. I typed
    # init 6
    but it wouldn't reboot. The output of the w command showed
    zero load. I power-cycled the box, and after rebooting I
    checked syslog and messages. I couldn't see anything there that
    pointed to the problem.

    The Digi Terminal Server doesn't connect via PCI or ISA. It sits on
    the network and uses a driver to make the serial ports look as if they
    are local.

    A friend suggested I look at ps and see if there is a process that has
    a blocked or waiting interrupt. He also suggested looking at lsof.

    The one thing I do know is that one of the serial-based terminals
    shows the names of patients who are scheduled to arrive. The patients
    normally check themselves in. If the receptionist checks a patient in
    instead, the program removes the patient from the list and updates
    the patient login screen. The tty for the patient checkin is
    writeable by other users. I am thinking that maybe there is some
    process that has or is waiting for something to come available, and it
    is causing the system to block. There doesn't seem to be any specific
    condition that causes this lockup.

    Any suggestions on how to troubleshoot this?

    brian
    Brian Guest

  4. #4

    Re: 5.0.7 machine locks up!

    Brian Lavender wrote:

    This could very well be a disk hang. Next time it happens, watch the
    disk light very carefully while provoking as much response as you can
    still get (e.g. running `w`, getting to login prompts on multiscreens,
    typing username, hitting return, no password prompt; etc.) If there is
    no disk light activity (remains stuck off or on), it's probably a disk
    hang. Then, when rebooting, watch the light again just to be sure that
    it works under normal conditions (it wouldn't do to blame the disk when
    actually the cable to the LED wasn't connected...)

    Then, check all disk cabling and related areas. If it's SCSI, go into
    the HBA's BIOS setup, make sure it's configured sanely.
     

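    >A friend suggested I look at ps and see if there is a process that has
    >a blocked or waiting interrupt. He also suggested looking at lsof.

    Something that causes all terminal processes on the system to become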
    unresponsive is unlikely to be caused by a process. But do look at `ps
    -ef` if you can. Column 4 (labeled "C") is an indicator of recent CPU
    usage. If you see one or more processes that show 80 (the max value),
    they're probably spinning consuming CPU. But normally that would only
    slow down a system, not stop it.
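
That check of the "C" column can be done with a short filter. This is a sketch only: `busy_procs` is an illustrative name, and it assumes an awk that supports `-v` and a `ps -ef` layout where "C" is the fourth field, as described above.

```shell
#!/bin/sh
# Sketch only: read `ps -ef` output on stdin and print the header line plus
# every process whose C column (field 4) is at or above a threshold.
# The default threshold of 80 is the maximum value mentioned in the post.
# busy_procs is an illustrative name; assumes an awk that supports -v.

busy_procs() {
    threshold=${1:-80}
    awk -v t="$threshold" 'NR == 1 { print; next } $4 + 0 >= t + 0 { print }'
}
```

Typical use would be `ps -ef | busy_procs`, or `ps -ef | busy_procs 40` to use a lower threshold.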
     

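    >The tty for the patient checkin is
    >writeable by other users. I am thinking that maybe there is some
    >process that has or is waiting for something to come available, and it
    >is causing the system to block.

    This doesn't sound like a probable cause. Though I must say I still
    don't particularly understand the description.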
     

    >Any suggestions on how to troubleshoot this?

    Look into the disk stuff. Observe the disk lights; also run the
    commands I asked about earlier (dparam etc.).

    Link scodb into the kernel, break into the kernel debugger when the
    system hangs, look around. If you set up a serial console (easy, you've
    already got tty1a active), you can capture scodb sessions nicely. To do
    a temporary serial console, get to the boot prompt and type "systty=1".
    The boot prompt will move to tty1a == COM1 (9600/8/N/1). Have a PC
    waiting there with "capture" turned on. Control-X is the "enter scodb"
    command on a serial console. Once the machine hangs, hit ^X on the
    serial console, then:

    scodb> stack
    scodb> ps()

    There's much more you can do, but that's a decent start. One other
    thing you can do is capture a system image:

    scodb> sysdump()
    scodb> reboot()

    (the machine reboots as if you'd hit the power switch). Now go to
    single-user mode and run:

    sysdump -i /dev/swap -fu -o - | bzip2 > /tmp/dump.bz2

    Save that, I might want to look at it.
     
    Bela Guest
