
A1000: Determining bad disk - Sun Solaris


  1. #1

    A1000: Determining bad disk

    I am seeing some SCSI transport failures in /var/adm/messages on one of my
    LUNs. All of the LUNs on the A1000 are RAID5.

    I suspect the disk is going bad.

    How do I go about identifying the bad disk? I do have the /pci..../sd4,1
    path in the messages file, but how can I identify the physical disk from
    this path? The slots on the A1000 are labelled as [0,0], [0,1], etc.

    Thanks
    Vikas Guest

  2. #2

    Re: A1000: Determining bad disk

    In article <fu-berlin.de>,
    Vikas Agnihotri <mailshell.com> wrote:
     

    Not an easy thing. Somewhere in the /var/adm/messages file after a boot
    is the listing of sd device numbers, SCSI target # (slot dependent),
    and WWN#; these messages are printed during the device discovery
    process. Take the WWN of the disk that's going bad and use 'luxadm
    display all' to map it to a physical drive.

    [taken from memory...it's been a couple years since I mucked with a
    photon box]

    --
    DeeDee, don't press that button! DeeDee! NO! Dee...



    Michael Guest

  3. #3

    Re: A1000: Determining bad disk

    In comp.unix.solaris Michael wrote:

    Why? If you do, you should run rm6 and run a healthcheck.

    You don't. The path is to the LUN, which is a virtual object, not a
    drive.
     

    The A1000 is HVD SCSI hardware RAID. No WWNs.
     

    That would be the 5000 series.

    --
    Darren Dunham com
    Unix System Administrator Taos - The SysAdmin Company
    Got some Dr Pepper? San Francisco, CA bay area
    < This line left intentionally blank to confuse you. >
    Darren Guest

  4. #4

    Re: A1000: Determining bad disk

    On Thu, 25 Sep 2003 18:33:09 GMT, Darren Dunham <taos.com>
    wrote:
    > Why? If you do, you should run rm6 and run a healthcheck.

    I don't like the rm6 GUI; the CLI equivalent is 'healthck', right? I did a
    'healthck -a' and got 'Optimal'. I didn't expect anything else.

    I don't know how thorough 'healthck' is anyway. If a disk were going bad
    and I knew about it in advance, I could mark the drive failed on demand
    using 'drivutil' and take the reconstruction hit when I choose, instead of
    waiting for it to happen at some random time!

    How about 'parityck', is that a more exhaustive disk check?

    Anyway, as it turned out in this particular case, my SCSI errors were due
    to "disconnected tagged commands", for which Sun support suggested that
    I consider reducing 'set sd:sd_max_throttle' (in /etc/system) to something
    like 10 or so (the default is 256).
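
For reference, a minimal sketch of what that /etc/system entry could look like, plus a small check that reads the value back. The sample fragment and the helper name `read_sd_max_throttle` are made up for illustration; the value 10 is just the figure mentioned above, not a recommendation. It assumes the usual /etc/system conventions, where lines starting with `*` are comments.

```python
# Hypothetical /etc/system fragment, for illustration only.
SAMPLE_SYSTEM = """\
* Limit the number of commands the sd driver queues per device
set sd:sd_max_throttle=10
"""


def read_sd_max_throttle(system_text):
    """Return the last sd:sd_max_throttle value found in the text, or None."""
    value = None
    for raw in system_text.splitlines():
        line = raw.strip()
        # Skip blank lines and '*' comment lines.
        if not line or line.startswith("*"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2 or parts[0] != "set":
            continue
        name, sep, val = parts[1].partition("=")
        if sep and name.strip() == "sd:sd_max_throttle":
            # Base 0 accepts both decimal and 0x-prefixed hex values.
            value = int(val.strip(), 0)
    return value


if __name__ == "__main__":
    print(read_sd_max_throttle(SAMPLE_SYSTEM))
```

A change to /etc/system only takes effect after a reboot, so a check like this tells you what is configured, not necessarily what the running kernel is using.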

    Is it common practice to throttle down the 'sd' driver with the RAID
    A1000? Is this because the disks are too fast for the sd driver? [Or is it
    the other way around?]

    Thanks
    Vikas Guest

  5. #5

    Re: A1000: Determining bad disk

    On Wed, 24 Sep 2003 23:01:14 -0400, Vikas Agnihotri
    <mailshell.com> wrote:
     

    /usr/lib/osa/bin/rm6
    John Guest

  6. #6

    Re: A1000: Determining bad disk

    In article <fu-berlin.de>,
    Vikas Agnihotri <mailshell.com> wrote: 

    Does the A1000 have a _differential_ terminator on the back of it? The
    vast majority of SCSI transport errors on A1000s are because someone
    forgot to put a terminator on the back. A1000s do NOT auto-terminate
    like most other Sun disks.

    Scott
    Scott Guest

  7. #7

    Re: A1000: Determining bad disk



    On Sat, 27 Sep 2003, Vikas Agnihotri wrote:
     
    > > Why? If you do, you should run rm6 and run a healthcheck.
    >
    > I don't like the rm6 GUI; the CLI equivalent is 'healthck', right? I did a
    > 'healthck -a' and got 'Optimal'. I didn't expect anything else.

    Well, if you get Optimal, then the A1000 itself thinks it's OK.
     

    If a disk were failing, the A1000 would probably give you a few events
    anyway; it's quite good at kicking bad drives. That's why you use RAID5, so
    that it CAN kick a drive without you losing data. If you have a hot spare
    activated, that's even better.
     

    Yes and no. It checks the parity of the RAID5, which as it happens it does
    by reading all the data on the disks. That does, in a way, test the disks,
    but it's the data on them that's really being checked.
     

    Yup, it's in the best practices for the A1000-A3500.
     

    I'll leave that for a SCSI expert, but basically it doesn't have to do with
    slow or fast, but rather with how many SCSI commands you can "queue" to the
    controller. I don't remember all the facts, but I'm sure someone else does
    :-)

    /Johan A
    Mr. Guest

  8. #8

    Re: A1000: Determining bad disk

    In comp.unix.solaris Vikas Agnihotri <mailshell.com> wrote: 
    >> Why? If you do, you should run rm6 and run a healthcheck.
     

    Then why do you think a disk is going bad?
     

    I suppose, but what makes you think a disk is going bad?
     
     

    Yes. That's what I was (badly) trying to say. Since the OS can't "see"
    any of the disks anyway, any SCSI errors in /var/adm/messages will be
    unrelated to disk errors. They would have to do with the RAID
    controller, the cable, the host adapter, and any SCSI settings.
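
Since the path in the log only identifies a LUN, about the most you can get out of the messages file is how often each LUN path is affected. Here is a rough sketch of such a tally. The sample lines and device paths are invented, and the layout it assumes (a "WARNING: <path> (<instance>):" line followed by a "SCSI transport failed" line) is an assumption that should be checked against your own /var/adm/messages.

```python
import re
from collections import Counter

# Assumed shape of the warning line that names the device path.
WARNING_RE = re.compile(r"WARNING:\s+(\S+)\s+\((\w+)\):")

# Invented sample lines; the device paths are hypothetical.
SAMPLE_LINES = [
    "Sep 24 22:15:01 host scsi: WARNING: /pci@example/scsi@0/sd@1,0 (sd1):",
    "Sep 24 22:15:01 host     SCSI transport failed: reason 'reset': retrying command",
    "Sep 24 22:16:40 host scsi: WARNING: /pci@example/scsi@0/sd@2,0 (sd2):",
    "Sep 24 22:16:40 host     SCSI transport failed: reason 'reset': retrying command",
    "Sep 24 22:20:12 host unrelated: some other message",
    "Sep 24 22:31:09 host scsi: WARNING: /pci@example/scsi@0/sd@1,0 (sd1):",
    "Sep 24 22:31:09 host     SCSI transport failed: reason 'timeout': retrying command",
]


def tally_transport_failures(lines):
    """Count 'SCSI transport failed' messages per preceding WARNING path."""
    counts = Counter()
    current = None
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            # Remember the path until the detail line arrives.
            current = match.group(1)
            continue
        if "SCSI transport failed" in line and current is not None:
            counts[current] += 1
            current = None  # do not attribute later lines to a stale path
    return counts


if __name__ == "__main__":
    for path, n in tally_transport_failures(SAMPLE_LINES).most_common():
        print(n, path)
```

If the failures are spread across every LUN behind the same controller, that points more toward the cable, termination, or host adapter than toward anything specific to one LUN.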
     

    Which adapter is this? I see a Sun Alert here...

    http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=fsalert/22803

    --
    Darren Dunham com
    Unix System Administrator Taos - The SysAdmin Company
    Got some Dr Pepper? San Francisco, CA bay area
    < This line left intentionally blank to confuse you. >
    Darren Guest
