
I/O performance with Promise HW RAID - MySQL


  1. #1

    I/O performance with Promise HW RAID

    We use MySQL 4.0.26 on CentOS 3.6 x86_64 running on a server with dual
    2.2GHz Opterons and 16GB RAM. The databases are stored on a Promise
    UltraTrak RM8000 in RAID 10 configuration, connected via an Adaptec
    SCSI card. We are seeing what appears to be gradual I/O performance
    degradation over time; it seems to be OK for up to about 90 days, but
    not long after that both CPUs end up continuously spending 50-99% of
    their time in "iowait" state when reading/writing the RAID device, and
    the mysqld-max processes begin to be stuck for minutes at a time in
    disk wait state, until finally the server becomes unusable.

    A simple reboot, even with a forced fsck, does not clear this up, but a
    full shutdown followed by power cycling the RAID device and then
    rebooting seems to return things to normal. However, I'm having a
    hard time believing that this is a hardware problem rather than a
    software one.

    After doing some research when this most recently happened, we have
    used elvtune to lower the read and write latency on /dev/sda4 (which is
    the mysql filesystem on the RAID) to 128 and 256 respectively. However,
    we don't yet know whether this will make any difference, as it has only
    been 48 hours since the power cycle and it usually takes months for the
    problem to become noticeable. I'd like to get out ahead of it this time
    if I can, so that we either know when to schedule a power cycle or have
    some confidence that we won't need to.

    Any information or suggestions would be appreciated.

    (Below this point is just hardware data in case it is helpful.)

    Some data from "lshw":

    description: Motherboard
    product: GT24-B2891
    vendor: TYAN Computer Corp
    physical id: 0
    slot: H1 L1 Cache

    The SCSI card:

    description: SCSI storage controller
    product: AIC-7892A U160/m
    vendor: Adaptec
    physical id: 8
    bus info: pci09:08.0
    logical name: scsi0
    version: 02
    width: 64 bits
    clock: 66MHz
    capabilities: scsi bus_master cap_list scsi-host
    configuration: driver=aic7 latency=72 maxlatency=25 mingnt=40
    resources: ioport:3000-30ff iomemory:df300000-df300fff irq:24

    /proc/cpuinfo:

    processor : 0
    vendor_id : AuthenticAMD
    cpu family : 15
    model : 5
    model name : AMD Opteron(tm) Processor 248
    physical id : 255
    siblings : 1
    stepping : 10
    cpu MHz : 2210.197
    cache size : 1024 KB
    fpu : yes
    fpu_exception : yes
    cpuid level : 1
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
    mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm
    3dnowext 3dnow
    bogomips : 4404.01
    TLB size : 1088 4K pages
    clflush size : 64
    address sizes : 40 bits physical, 48 bits virtual
    power management: ts fid vid ttp

    processor : 1
    vendor_id : AuthenticAMD
    cpu family : 15
    model : 5
    model name : AMD Opteron(tm) Processor 248
    physical id : 255
    siblings : 1
    stepping : 10
    cpu MHz : 2210.197
    cache size : 1024 KB
    fpu : yes
    fpu_exception : yes
    cpuid level : 1
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
    mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm
    3dnowext 3dnow
    bogomips : 4404.01
    TLB size : 1088 4K pages
    clflush size : 64
    address sizes : 40 bits physical, 48 bits virtual
    power management: ts fid vid ttp

    barton.schaefer@gmail.com Guest
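
Since the degradation described above builds up over months, one way to "get out ahead of it" is to log the iowait share at regular intervals and watch the trend. The following is a minimal sketch, assuming the kernel exposes an iowait counter as the fifth value on the aggregate `cpu` line of /proc/stat (older kernels may report only four counters). The function names, the 60-second interval, and the output format are illustrative choices, not something from the original post.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of elapsed CPU time spent in iowait between two samples.

    Assumes the counter order user, nice, system, idle, iowait, ...;
    a kernel that reports only four counters has no iowait field.
    """
    if len(before) < 5 or len(after) < 5:
        raise ValueError("no iowait counter in this /proc/stat")
    # Only the first eight counters are summed, so that guest time
    # (already included in user time on newer kernels) is not counted twice.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total > 0 else 0.0


def monitor(interval=60, samples=3, path="/proc/stat"):
    """Print one timestamped iowait percentage per interval."""
    def read():
        with open(path) as f:
            return parse_cpu_line(f.read())

    prev = read()
    for _ in range(samples):
        time.sleep(interval)
        cur = read()
        print(time.strftime("%Y-%m-%d %H:%M:%S"),
              "iowait=%.1f%%" % iowait_percent(prev, cur))
        prev = cur
```

Calling `monitor()` from cron or a long-running shell and appending the output to a file would give a history to compare against the roughly 90-day pattern; the threshold at which to schedule a power cycle would have to be chosen from that history.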

  2. #2

    Re: I/O performance with Promise HW RAID

    barton.schaefer@gmail.com wrote:
     

    Quite surely not a MySQL problem. Any process waiting for I/O will
    become unusable (because it simply stops working).

    Did you check the syslog for suspicious messages around the SCSI
    subsystem? Does the RAID box do any logging?
     

    Very strong clue that the problem is in the storage box itself.
     

    Actually there *is* a lot of software involved. Inside the storage
    box an i960RM RISC processor [1][2] does all the RAID work and acts
    as the SCSI target towards your host controller.

    If there is a bug in the firmware, one possible manifestation could
    be SCSI communication problems. Are you running the latest firmware?
    Did you talk to Promise about your problems?


    [1] http://www.cpu-world.com/CPUs/80960/Intel-GC80960RM100.html
    [2] interesting how much you can find out by taking a quick glance
    at the firmware: unzip 2_SX8K_B43.zip; strings SX8K_B43.BIN|less

    XL
    --
    Axel Schwenke, Senior Software Developer, MySQL AB

    Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
    MySQL User Forums: http://forums.mysql.com/
    Axel Guest
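
The syslog check suggested above can be partly automated. This is a rough sketch, assuming the kernel log is at /var/log/messages and that SCSI trouble shows up as lines mentioning a SCSI host, the Adaptec driver, or an `sd` device together with words such as "error", "abort", or "reset". Both patterns are assumptions for illustration, not an exhaustive or authoritative list of kernel messages.

```python
import re

# Assumed device tokens: SCSI host names, the Adaptec driver, sd devices.
DEVICE = re.compile(r"\b(scsi\d*|aic7xxx|sd[a-z]\d*)\b", re.IGNORECASE)
# Assumed trouble keywords; adjust to what the local logs actually contain.
PROBLEM = re.compile(r"error|timeout|timed out|abort|reset|parity",
                     re.IGNORECASE)


def suspicious_lines(lines):
    """Return log lines that mention a SCSI device and a trouble keyword."""
    return [line.rstrip("\n") for line in lines
            if DEVICE.search(line) and PROBLEM.search(line)]


def scan_file(path="/var/log/messages"):
    """Scan a log file; undecodable bytes are replaced rather than fatal."""
    with open(path, errors="replace") as f:
        return suspicious_lines(f)
```

Running `scan_file()` on the current and rotated logs and comparing the hit count per week is one way to see whether SCSI-level complaints increase as the slowdown approaches.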

  3. #3

    Re: I/O performance with Promise HW RAID

    On Oct 25, 8:08 am, Axel Schwenke wrote:
     

    Some more details, gleaned from snapshots of the process list taken
    while the problem was occurring:

    Many mysql threads would be in "Locked" state, even for tables where
    the only active threads were read-only queries. We can see that a
    thread locks the table for write, spends much longer than usual for
    that operation, and then completes ... but other reading threads that
    stacked up on the write lock do not immediately wake up when the
    writing thread is gone, remaining in locked state often for 2 or more
    minutes after the lock should have been freed. (MyISAM tables
    involved.)
     

    The RAID box reported everything normal (via the serial-line command
    connection) and there's nothing interesting in syslog.

    Thanks for your advice.

    barton.schaefer@gmail.com Guest

  4. #4

    Re: I/O performance with Promise HW RAID

    barton.schaefer@gmail.com wrote:
    >
    > Some more details, gleaned from snapshots of the process list taken
    > while the problem was occurring:
    >
    > Many mysql threads would be in "Locked" state, even for tables where
    > the only active threads were read-only queries. We can see that a
    > thread locks the table for write, spends much longer than usual for
    > that operation, and then completes ... but other reading threads that
    > stacked up on the write lock do not immediately wake up when the
    > writing thread is gone, remaining in locked state often for 2 or more
    > minutes after the lock should have been freed. (MyISAM tables
    > involved.)

    I've never seen something like that and in fact it's hard to believe.

    MyISAM tables use a simple mutex for the WRITE lock. So as soon as the
    WRITE is finished the suspended threads in the READ queue will continue.
    Unless your thread library is severely broken there should be no
    noticeable delay.

    Can you please provide the output of SHOW FULL PROCESSLIST for such a
    situation? (you should execute this as superuser to see all threads)


    XL
    --
    Axel Schwenke, Senior Software Developer, MySQL AB

    Axel Guest
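
The expectation stated above, that threads queued behind a write lock continue as soon as the writer is done, can be illustrated with a toy model. This is a sketch in Python using a plain `threading.Lock`; it is not MySQL's locking code, and the function name and parameters are made up for the illustration. It only measures how long blocked threads take to proceed once a lock holder lets go.

```python
import threading
import time


def run_demo(n_readers=5, write_seconds=0.2):
    """Return how long each blocked reader waited after the writer finished."""
    table_lock = threading.Lock()        # toy stand-in for a table write lock
    writer_has_lock = threading.Event()
    released_at = []                     # one timestamp, set by the writer
    wake_delays = []
    guard = threading.Lock()             # protects wake_delays

    def writer():
        with table_lock:
            writer_has_lock.set()
            time.sleep(write_seconds)    # simulates a slow write
            released_at.append(time.monotonic())

    def reader():
        writer_has_lock.wait()           # queue up behind the writer
        with table_lock:
            woke = time.monotonic()
        with guard:
            wake_delays.append(woke - released_at[0])

    threads = [threading.Thread(target=writer)]
    threads += [threading.Thread(target=reader) for _ in range(n_readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return wake_delays
```

`print(max(run_demo()))` shows the worst wake-up delay in seconds. A delay of minutes, as reported in the thread, would point at something other than the lock hand-off itself, for example the woken threads immediately blocking on disk I/O.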

  5. #5

    Re: I/O performance with Promise HW RAID

    Axel Schwenke wrote: 
    >
    > I've never seen something like that and in fact it's hard to believe.

    Well, yes, we were rather surprised, too.
     

    Unfortunately, I cannot. It has not recurred since the incident two
    weeks ago, so I can't get a fresh snapshot. At the time, we processed
    the output through perl to sort by Time and limit to the
    longest-running threads, because otherwise there was just too much to
    deal with (400+ threads, many of them sleeping or waiting for the
    delayed insert handler). If it happens again I'll certainly save some
    complete dumps from "mysqladmin -v processlist" for posting, but
    frankly I'm hoping that it never happens again.

    When you say "as superuser" ... does that mean as any user with full
    GRANT privileges? Or is it important that it actually be the "root"
    user? The snapshots we were looking at were generated from a non-root
    but full-access mysql account. Could that cause us to miss some
    threads?

    barton.schaefer@gmail.com Guest
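
The post above mentions filtering the process list through perl to sort by Time and keep only the longest-running threads. A comparable filter is sketched here in Python, assuming the usual ASCII table layout of `mysqladmin -v processlist` with Command, Time, and Info columns. The function names and the default of skipping only "Sleep" threads are illustrative assumptions; the original perl script is not shown in the thread.

```python
def parse_processlist(text):
    """Parse the ASCII table printed by `mysqladmin -v processlist`."""
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue                       # borders, blanks, continuation lines
        inner = line[1:-1] if line.endswith("|") else line[1:]
        if header is None:
            header = [c.strip() for c in inner.split("|")]
            continue
        # Split at most len(header) - 1 times so that a '|' inside the
        # last column (the query text) stays part of that column.
        cells = [c.strip() for c in inner.split("|", len(header) - 1)]
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def longest_running(rows, limit=20, skip_commands=("Sleep",)):
    """Return the `limit` longest-running threads, idle ones excluded."""
    def seconds(row):
        value = row.get("Time", "")
        return int(value) if value.isdigit() else 0

    active = [r for r in rows if r.get("Command") not in skip_commands]
    return sorted(active, key=seconds, reverse=True)[:limit]
```

Saving the complete dump to a file first and applying the filter afterwards keeps the full snapshot available for posting while still giving a manageable view of 400+ threads. Other Command values to hide can be passed through `skip_commands`.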

  6. #6

    Re: I/O performance with Promise HW RAID

    barton.schaefer@gmail.com wrote:
    >
    > Unfortunately, I cannot. It has not recurred since the incident two
    > weeks ago, so I can't get a fresh snapshot.
    > .... If it happens again I'll certainly save some
    > complete dumps from "mysqladmin -v processlist" for posting, but
    > frankly I'm hoping that it never happens again.

    Hope dies last (German proverb :-)
     

    There is a SUPER privilege. Any user with that privilege is a
    superuser. The user name is irrelevant. The SUPER privilege can
    only be granted globally (GRANT ... ON *.* TO ...).


    XL
    --
    Axel Schwenke, Senior Software Developer, MySQL AB

    Axel Guest
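
Whether a given account holds the global SUPER privilege mentioned above can be read off the output of `SHOW GRANTS FOR ...`. The sketch below checks such output in Python; it assumes each grant is reported as a line of the form `GRANT <privileges> ON *.* TO ...` and treats a global `ALL PRIVILEGES` as including SUPER. The function name and the sample account names are hypothetical.

```python
import re

# Matches only global grants, i.e. those made ON *.*
GLOBAL_GRANT = re.compile(r"^GRANT\s+(.+?)\s+ON\s+\*\.\*\s+TO\s", re.IGNORECASE)


def has_super(grant_lines):
    """Return True if any global grant line includes SUPER (or ALL)."""
    for line in grant_lines:
        match = GLOBAL_GRANT.match(line.strip())
        if not match:
            continue                     # database- or table-level grant
        privileges = [p.strip().upper() for p in match.group(1).split(",")]
        if ("SUPER" in privileges or "ALL PRIVILEGES" in privileges
                or "ALL" in privileges):
            return True
    return False
```

For example, `has_super(["GRANT SELECT, SUPER ON *.* TO 'monitor'@'localhost'"])` returns `True`, while a grant made only on a single database does not count, which matches the point that SUPER can only be granted globally.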

  7. #7

    Re: I/O performance with Promise HW RAID

    On Nov 2, 9:18 am, Axel Schwenke wrote:

    OK ... yes, the user used to grab the processlists did have that
    privilege.

    barton.schaefer@gmail.com Guest
