Managing 300+ io's per second

Ask a Question related to AIX, Design and Development.

  1. #1

    Default Managing 300+ io's per second

    Hi All,

    I'm fairly new to AIX tuning. I try to combine my knowledge of Oracle
    and AIX to improve the performance of database servers.

    This is the story: today I ran into an application that generated 15
    million IO's in a 17 hour period. That's approximately 250 ios per
    second on average. Regarding the fact that our EMC guys tell us that
    the storage subsystem is not too busy (and I have no reason to think
    otherwise), I started looking at some AIX parameters. The database is
    stored in 9 filesystems, mounted on 1 huge EMC 16-way striped LV
    (250G).

    I run vmtune -a every ten minutes. From the output I conclude that
    there is a shortage in bufstructs and pbufs (hd_pendblked and
    fsbufwaitcnt / psbufwaitcnt increase quite rapidly). I think adding
    more async IO servers won't help in that case; first we have to get
    rid of the waits. We already set the following vmtune command:
    vmtune -b 186 -p20 -P40

    The 9 filesystems are pounded with an equal amount of io. If my
    calculations are right, the system should be able to service 9*186 ios
    at a time. Or am I missing something?

    1) Is it OK to increase numfsbuf to, say, 400? What are the costs? We
    have plenty of memory...
    2) Can we safely increase numpsbufs using the vmtune -B command? Any
    side-effects?
    3) Are there any more AIX parameters we can use to facilitate this
    load?

    Thanks in advance.

    Regards,
    Hans de Git
    Hans de Git Guest

  2. Similar Questions and Discussions

    1. Help in managing the alert
      In my application, I need to show the alert on the parent window for displaying some message. As soon as the alert appears, parent window(below the...
    2. Managing all these instances
      I really dont want to load up 6 seperate coldfusion administrators- that would be dull - any idea how I manage this??
    3. managing date
      Hi, I've got a string that contains a date with the following format : yyyy-mm-dd hh:mm:ss ex : 2003-10-10 11:37:47 I'd like to get 4...
    4. How are you managing your users?
      I'm embarking on a project, aimed at managing our user accounts. I'm hoping to get some suggestions from you folks. I really want to avoid having to...
    5. managing DB schema with VSS
      Execution Plans have now been incorporated into mssqlXpress Jacco and some more of the complex code complete features are been developed. ***...
  3. #2

    Default Re: Managing 300+ io's per second

    Hans de Git wrote:
    > Hi All,
    >
    > I'm fairly new to AIX tuning. I try to combine my knowledge of Oracle
    > and AIX to improve the performance of database servers.
    >
    > This is the story: today I ran into an application that generated 15
    > million IO's in a 17 hour period. That's approximately 250 ios per
    > second on average. Regarding the fact that our EMC guys tell us that
    > the storage subsystem is not too busy (and I have no reason to think
    > otherwise),
    - AIX Version (if V5, use JFS2 instead of JFS).
    - Convert the whole database to AIX raw LVs. (Planning, loads of planning).
    - Oracle version ? 32 or 64 bits ?
    - Have you checked the amount of full table scans ?
    > [...] Or am I missing something?
    Yes. Performance starts at the application level, then DB and only OS
    in the end.
    > 1) Is it OK to increase numfsbuf to, say, 400? What are the costs? We
    > have plenty of memory...
    Increase the Oracle SGA.

    --
    Doing AIX support was the most monty-pythonesque
    activity available at the time.
    Eagerly awaiting my thin chocolat mint.

    Jose Pina Coelho Guest

  4. #3

    Default Re: Managing 300+ io's per second

    Jose Pina Coelho <eresquigal@netcabo.pt> wrote in message news:<3f6f7f10$0$210$a729d347@news.telepac.pt>...
    > - AIX Version (if V5, use JFS2 instead of JFS).
    Not feasible. At least not for a year or so..
    > - Convert the whole database to AIX raw LVs. (Planning, loads of planning).
    As a last resort. I'm very much in favor of raw volumes. However,
    'normal' files are easier to manage.
    > - Oracle version ? 32 or 64 bits ?
    Oracle 8.1.7.4, 32 bits. Cannot change that.
    > - Have you checked the amount of full table scans ?
    What is wrong with a full table scan?? If it means lesser IO's to full
    scan a table, then that's the best execution plan for the SQL
    statement.
    > Yes. Performance starts at the application level, then DB and only OS
    > in the end.
    Agreed. But: you don't know what effort has been taken to tune the
    application and the database design. The application is good, the
    desing is OK, the amount of IO's is inevitable. So, we're at the third
    step: OS.
    > Increase the Oracle SGA.
    Wanna keep a 300G database in memory?

    So, I'll ask again:
    1) any reason why we should not increase numfsbuf/numpsbuf?
    2) Other suggestions? (BTW: the raw-volume option is a good one).

    Thanks.
    Hans de Git Guest

  5. #4

    Default Re: Managing 300+ io's per second


    "Hans de Git" <hansdegit@hotmail.com> schrieb im Newsbeitrag
    news:72383d4a.0309222300.2134a967@posting.google.c om...
    > Jose Pina Coelho <eresquigal@netcabo.pt> wrote in message
    news:<3f6f7f10$0$210$a729d347@news.telepac.pt>...
    > > - AIX Version (if V5, use JFS2 instead of JFS).
    > Not feasible. At least not for a year or so..
    >
    > > - Convert the whole database to AIX raw LVs. (Planning, loads of
    planning).
    > As a last resort. I'm very much in favor of raw volumes. However,
    > 'normal' files are easier to manage.
    >
    > > - Oracle version ? 32 or 64 bits ?
    > Oracle 8.1.7.4, 32 bits. Cannot change that.
    >
    > > - Have you checked the amount of full table scans ?
    > What is wrong with a full table scan?? If it means lesser IO's to full
    > scan a table, then that's the best execution plan for the SQL
    > statement.
    >
    > > Yes. Performance starts at the application level, then DB and only OS
    > > in the end.
    > Agreed. But: you don't know what effort has been taken to tune the
    > application and the database design. The application is good, the
    > desing is OK, the amount of IO's is inevitable. So, we're at the third
    > step: OS.
    >
    > > Increase the Oracle SGA.
    > Wanna keep a 300G database in memory?
    >
    > So, I'll ask again:
    > 1) any reason why we should not increase numfsbuf/numpsbuf?
    > 2) Other suggestions? (BTW: the raw-volume option is a good one).
    >
    > Thanks.
    Hallo Hans,

    from my experience with oracle on raw devices I would not expect more than a
    5% performance 'boost' - therfore I do not think that this is worth the
    hazzle. 32 Bit Oracle with jfs is fine as it is much easier to administer.
    Stick with it.
    You say that you think it does not make sense to use more aio server and
    that you rely on the information from the 'EMC guys' that the storage
    subsystem is not too busy..... So I assume you are more on the db admin side
    of life :-) From the AIX administration point of view I would be interested
    in more information about type and configuration of the server and the
    storage subsystem you are using: how many physical disks of what type and
    size are utilised over how many power paths? What does the disk layout look
    like, are there any hot spots in the EMC regarding your application? It
    would also help if you give the vmstat output itself. If you just let us
    know your interpretation of the output you are directing possible answers
    into a direction you already suspect. However, that might not be helpful.
    The same applies to your statement that you 'already set the following
    vmtune command: vmtune -b 186 -p20 -P40' Without information of how much
    physical ram your server has ('plenty' ...) nobody who has no crystal ball
    will be able to tell you whether it makes sense.

    Regards,
    Andreas


    Andreas Schulze Guest

  6. #5

    Default Re: Managing 300+ io's per second

    >So I assume you are more on the db admin side of life

    Yep, I think I know a bit of Oracle performance tuning, with a little
    understanding of AIX performance tuning.

    Server is an M80, dedicated to the database. 4 fiberchannels plugged
    directly into symmetrix 8830. Not sure how many PowerPaths. I'm pretty
    sure the DB is placed on a 16-disk EMC stripe. Raw redologs are on
    separate devices, which are warm, not hot.

    'Plenty RAM' equals 20G. Oracle SGA is about 3G.

    Look at it this way: with a peak of 1528 i/o's per second (tablespaces
    only), what can I do to make sure that AIX will never be a bottleneck?
    Our EMC guys assure me that they can alleviate every IO problem. "we
    have a large cache, right?" Sure... OK, in that case, I want to stress
    the symmetrix as much as I can.

    vmtune output:
    =>/usr/samples/kernel/vmtune
    vmtune: current values:
    -p -P -r -R -f -F -N
    -W
    minperm maxperm minpgahead maxpgahead minfree maxfree pd_npages
    maxrandwrt
    262141 1310707 2 8 1024 4096 524288
    0

    -M -w -k -c -b -B -u
    -l -d
    maxpin npswarn npskill numclust numfsbufs hd_pbuf_cnt lvm_bufcnt
    lrubucket defps
    2097132 49152 12288 1 186 481 9
    131072 1

    -s -n -S -L -g
    -h
    sync_release_ilock nokilluid v_pinshm lgpg_regions lgpg_size
    strict_maxperm
    0 0 0 0 0
    0

    number of valid memory pages = 2621415 maxperm=50.0% of real memory
    maximum pinable=80.0% of real memory minperm=10.0% of real memory
    number of file memory pages = 1597882 numperm=61.0% of real memory


    So, will it hurt to adjust the vmtune parms?

    BTW: thanks for your input.

    HTH,
    Hans de Git
    Hans de Git Guest

  7. #6

    Default Re: Managing 300+ io's per second

    "Hans de Git" <hansdegit@hotmail.com> schrieb im Newsbeitrag
    news:72383d4a.0309232310.3fd0489c@posting.google.c om...
    > Server is an M80, dedicated to the database. 4 fiberchannels plugged
    > directly into symmetrix 8830. Not sure how many PowerPaths. I'm pretty
    > sure the DB is placed on a 16-disk EMC stripe. Raw redologs are on
    > separate devices, which are warm, not hot.
    >
    > 'Plenty RAM' equals 20G. Oracle SGA is about 3G.
    >
    > Look at it this way: with a peak of 1528 i/o's per second (tablespaces
    > only), what can I do to make sure that AIX will never be a bottleneck?
    > Our EMC guys assure me that they can alleviate every IO problem. "we
    > have a large cache, right?" Sure... OK, in that case, I want to stress
    > the symmetrix as much as I can.
    >
    > vmtune output:
    > =>/usr/samples/kernel/vmtune
    > vmtune: current values:
    > -p -P -r -R -f -F -N
    > -W
    > minperm maxperm minpgahead maxpgahead minfree maxfree pd_npages
    > maxrandwrt
    > 262141 1310707 2 8 1024 4096 524288
    > 0
    >
    > -M -w -k -c -b -B -u
    > -l -d
    > maxpin npswarn npskill numclust numfsbufs hd_pbuf_cnt lvm_bufcnt
    > lrubucket defps
    > 2097132 49152 12288 1 186 481 9
    > 131072 1
    >
    > -s -n -S -L -g
    > -h
    > sync_release_ilock nokilluid v_pinshm lgpg_regions lgpg_size
    > strict_maxperm
    > 0 0 0 0 0
    > 0
    >
    > number of valid memory pages = 2621415 maxperm=50.0% of real memory
    > maximum pinable=80.0% of real memory minperm=10.0% of real memory
    > number of file memory pages = 1597882 numperm=61.0% of real memory
    >
    >
    > So, will it hurt to adjust the vmtune parms?
    >
    > BTW: thanks for your input.
    >
    > HTH,
    > Hans de Git
    Hallo Hans,

    so you have a M80 with 20 Gig and your data is striped over 16 physical
    disks (which your server has exclusive rights for hopefully). I also assume
    that the logical volumes your database uses are spread exactly of all 16
    disks. This should be the recommended Oracle/EMC installation anyway. But
    what about the number of cpus, the aio servers and what is a 'large' EMC
    cache? (To me a large cache starts with 32 GB ;-))
    You want to stress the symmetrix - I suggest to start tuning the following
    settings.
    Do a
    # lsattr -El aio0
    This will give you min- and maxservers for asynchronous I/O. You did not
    mention it but the default is likely to be 1 and 10. You can change
    minservers to 2*(number of cpus) but you should change maxservers to at
    least 80. Increase maxservers up to 160 in steps of 20 if you can afford the
    time. Check that the cpu load can handle this (have had higher values on a
    2-way M80 with 8Gig and 4 disks so I do not think that it will be a cpu
    problem but results may be different depending on the sql requests)
    Secondly increase maxreqs to 8192. After increasing aioserves and maxreqs
    watch your DB in action. Ask the EMC guys whether it makes a difference on
    their system and if it is neither a problem for your cpus nor the EMC then
    increase maxreqs to 12288 or even 16384. Should you happen to use the AIX
    default values till now, changing them should speed up things as the aio is
    kernelized and therefore really fast.
    Changes are set with
    # smitty chaio
    but, unfortunately, values will be in effect after the next reboot only.
    Sorry.

    Now, you need to know whether the stripe size on the EMC fits the block size
    your DB uses to calculate how many pages AIX shall read in advance. You did
    not tell us whether you have large sequential reads/writes and whether you
    have OLTP apps or Data Warehouse apps but I think your maxpgahead value is
    way too low at the moment. AIX uses 4K Blocks while on the Oracle side 8k
    and 16k are much more common. Furthermore the data is striped over 16 disks
    (this leads to the wanted/hoped for effect that a file consisting of 16
    blocks is spread exactly over 16 disks leading to it being read in on go by
    every disk reading on block at the same time). Thus I would increase
    minpgahead at least to a minimum value of 16 and maxpgahead to 64 (must be
    power of 2). AIX doubles this value on every consecutive read up to
    maxpgahead (e.g. first read x, second request x*2, third request (x*2)*2
    4KBlocks). Rule of thumb is (stripe size * #physical disks)/4K rounding up
    to the nearest power of 2. So maxpgahead may be much higher for really large
    reads/writes on large stripe sets (e.g. 256, 512). Mind also that the larger
    the DB blocksize, the higher the percentage of split blocks and consequently
    the I/O performance degregation possible. You will find the best setting
    yourself.

    Another problem is file caching. I do not understand your maxperm setting
    which is 50 percent of the available physical memory. With a 3Gig SGA on a
    20Gig M80 I would expect (and recommend for a start) something like
    minperm -p 15 and maxperm -P 20. This would leave you with a cache of at
    least 16 GB (for your database data). If you can afford the time for testing
    and need the memory desperately you might adjust the minperm value in 1
    percent steps. Subsequently after reducing minperm you could decrease
    maxperm then too but probably not below 15. Opposed to the aio settings
    these settings are in effect immediately but will be lost after a reboot if
    not placed in the /etc/rc.local. Keep your eye on paging space utilisation
    during testing.

    It might also be worthwhile to check the size of your jfs logs as the
    default size is too small for large filesystems (which usually applies to db
    filesystems). IBM recommends 2 Megs jfslog per GB of filesystem. To make
    sure that your jfslogs are not becomming bottlenecks during heavy writes
    spread the jfslogs across as many disks as possible (16 in your case). You
    can change this for existing filesystems while they are not mounted (no
    reboot necessary). Remove the old jfslog, create a new one with 128K
    stripes, logform it and mount the filesystem again.

    One last thought (because I know you have been waiting for it ;-)): of
    course you can increase the numfsbufs -b for a high I/O rate, my
    recommendation for your server/storage combination is 980 (5*default).

    HTH,
    Andreas


    Andreas Schulze Guest

  8. #7

    Default Re: Managing 300+ io's per second

    > so you have a M80 with 20 Gig.
    Oops, my mistake. There are 'only' 10 GBytes.

    Not sure whether only we have access to the spindles. Database is
    about 250G, a 'standard' EMC Hyper is about 9 gigs, so yeah, I do
    believe that those 16 disks are there for our exclusive usage. Its
    actually one large VG over which we have mounted several filesystems.
    It's an EMC stripe, I believe (though I'm not sure) that the load is
    evenly balanced.

    My biggest problem is that I don't have access to the machine. I can
    only see things from within Oracle and through the eyes of the system
    enigneers. I'll try, as well as I can, to answer your questions. I
    believe the box is equipped with 6 cpu's.

    About the aio servers: min 75, max 150, maxrqs default 4096.
    EMC cache is max, 64G.

    About the maxpgahead values and caching of AIX. Read ahead is only
    interesting when
    a) Oracle blocks are in the same order as JFS pages
    and b) Oracle extents are huge
    and c) we do lots of full table scans.

    You can't be sure about a), b) is true, c) is not true. We do a lot
    of indexed IO on the io intensive tablespaces. That makes read ahead a
    pain in the ...because AIX is reading blocks that Oracle will not
    need.
    > Another problem is file caching. I do not understand your maxperm setting
    > which is 50 percent of the available physical memory.
    Well, BIG files are downloaded onto to the filesystem and loaded into
    the database afterwards. Perhaps maxperm can be lowered a bit....will
    try.
    > It might also be worthwhile to check the size of your jfs logs as the
    > default size is too small for large filesystems (which usually applies to db
    > filesystems)
    Thank you, will look into it.
    > One last thought (because I know you have been waiting for it ;-)): of
    > course you can increase the numfsbufs -b for a high I/O rate, my
    > recommendation for your server/storage combination is 980 (5*default).
    I always believed (and am pretty sure) that the default number of
    numfsbufs is 93 per filesystem....

    One more thing: I see that the shortage of fsbufs is at its peak
    levels whenever a log is archived. To archive a log, Oracle has to
    read from the raw redolog, then put that data into a filesystem (which
    is on a different VG). Suppose I want the archiving process to be as
    fast as possible (even though it is a background job) is it safe to
    increase lvm_bufcount to, say, 30?

    You have been of great help so far, Andreas. If I can ever be of any
    help regarding Oracle, don't hesitate to send an email.

    HTH,
    Hans
    Hans de Git Guest

  9. #8

    Default Re: Managing 300+ io's per second

    "Hans de Git" <hansdegit@hotmail.com> schrieb im Newsbeitrag
    news:72383d4a.0309241131.1447a8d2@posting.google.c om...

    Hallo Hans,
    > About the aio servers: min 75, max 150, maxrqs default 4096.
    > EMC cache is max, 64G.
    do a
    # pstat -a |grep -c aio
    Should the result be 150 push maxservers to 160. In any case increase
    maxreqs as 4096 is probably much too low. Watch the CPUs (average
    utilisation should be around 70%) and disks (average utilisation around 40
    %).
    >
    > About the maxpgahead values and caching of AIX. Read ahead is only
    > interesting when
    > a) Oracle blocks are in the same order as JFS pages
    They really should....
    > and b) Oracle extents are huge
    > and c) we do lots of full table scans.
    > You can't be sure about a), b) is true, c) is not true. We do a lot
    > of indexed IO on the io intensive tablespaces. That makes read ahead a
    > pain in the ...because AIX is reading blocks that Oracle will not
    > need.
    Well ... with the 64 GB cache ( great! :-)) in the EMC in mind I still think
    that it makes sense to adapt the read ahead configuration to the EMC stripe
    size. Did you test your DB's behavior with larger values, did it really slow
    down? I would be surprised. You also need to put AIX read ahead into
    correlation with db_file_multiblock_read_count_parameter (which AFAIK
    applies for Oracle full table scans).
    > > you can increase the numfsbufs -b for a high I/O rate, my
    > > recommendation for your server/storage combination is 980 (5*default).
    >
    > I always believed (and am pretty sure) that the default number of
    > numfsbufs is 93 per filesystem....
    Where did you get that information from? As always this number might give a
    starting point but is hardly true for all type of fs usage. IBM would put it
    this way: Your personal mileage may vary :-)
    Btw. you can speed up Oracle by using fewer (larger) files as Oracle looks
    at every tablespace/log on every start/stop and thus a smaller number of
    files results in faster start/stop sequences.
    >
    > One more thing: I see that the shortage of fsbufs is at its peak
    > levels whenever a log is archived. To archive a log, Oracle has to
    > read from the raw redolog, then put that data into a filesystem (which
    > is on a different VG).
    And on a different physical disk, hopefully.
    >Suppose I want the archiving process to be as
    > fast as possible (even though it is a background job) is it safe to
    > increase lvm_bufcount to, say, 30?
    Hmh, as you are using both jfs and raw devices I admit that I am not sure
    about how Oracle behaves in this respect because I have not dealt with such
    a combination. Should your default setting for lvm_bufcount be around 10 I
    would start with doubling it to about 20 and watch whether it has any
    effect. Running in the background should not be a problem. Btw. at least
    this is a situation where read ahead might help a lot, don't you think? As
    this is about the archive logs: what are your settings of the
    log_archive_buffer_size and how many log_archive_buffers do you use? Can you
    tune some values there?
    >
    > You have been of great help so far, Andreas. If I can ever be of any
    > help regarding Oracle, don't hesitate to send an email.
    At my current project I am dealing not with Oracle but with DB2 on AIX -
    however I will remember your offer another day ;-)
    >
    > HTH,
    > Hans
    Regards,
    Andreas



    Andreas Schulze Guest

  10. #9

    Default Re: Managing 300+ io's per second

    Hallo Andreas,
    > maxreqs as 4096 is probably much too low.
    I read (on the net) that: maxreqs = maxservers * queue_depth rounded
    up to a multiple of 4096. Queue_depth is configured at 8 (an EMC
    engineer just told me to boost that up to 32), so increasing maxreqs
    wont help while we stay on a queue_depth of 8, beacuse the ios would
    be blocked by AIX. By increasing maxreqs we could, perhaps, cache only
    one DBWR sweep. At some stage, the io's have to be flushed to the
    symmetrix.

    BTW: what is your setting of queue_depth? (do a lsattr -El
    hdiskpowerX).
    > Did you test your DB's behavior with larger values, did it really slow
    > down? I would be surprised. You also need to put AIX read ahead into
    > correlation with db_file_multiblock_read_count_parameter (which AFAIK
    > applies for Oracle full table scans).
    Nope, did not test it. AFAIK, when Oracle wants a 16k block
    (db_block_size=16384), the read-ahead mechanism actually reads 28k,
    because 4+8+16=28K. Not nice when doing random i/o...

    Check this link: (also for default numfsbufs)
    [url]http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/IBMp690/IBM/usr/share/man/info/en_US/a_doc_lib/aixbman/prftungd/2365c89.htm[/url]
    > Btw. you can speed up Oracle by using fewer (larger) files as Oracle looks
    > at every tablespace/log on every start/stop and thus a smaller number of
    > files results in faster start/stop sequences.
    Actually, every session opens the db files. So, using lesser files
    results in faster session creation. But, when using more filesystems
    (and thus using more bufstructs) the tablespace has to be split up
    into smaller files.
    > this is a situation where read ahead might help a lot, don't you think?
    No, because Oracle would be reading from a raw volume. The VMM does
    not do read-ahead on raw volumes. Perhaps EMC read-ahead would help.
    > what are your settings of the
    > log_archive_buffer_size and how many log_archive_buffers do you use? Can you
    > tune some values there?
    No, because these parameters do not exist (anymore). Not in Oracle 8i,
    not in 9i.

    This is getting more and more interesting. From what I've read, moving
    to AIX5 / JFS2 / Direct-IO will help a lot. But that is not an option
    for now.

    Do we ever stop learning?

    Regards,
    Hans
    Hans de Git Guest

  11. #10

    Default Re: Managing 300+ io's per second

    [email]hansdegit@hotmail.com[/email] (Hans de Git) wrote in message news:<72383d4a.0309260707.27b32c50@posting.google. com>...

    Hallo Hans,
    >
    > > maxreqs as 4096 is probably much too low.
    > I read (on the net) that: maxreqs = maxservers * queue_depth rounded
    > up to a multiple of 4096. Queue_depth is configured at 8 (an EMC
    > engineer just told me to boost that up to 32), so increasing maxreqs
    > wont help while we stay on a queue_depth of 8, beacuse the ios would
    > be blocked by AIX. By increasing maxreqs we could, perhaps, cache only
    > one DBWR sweep. At some stage, the io's have to be flushed to the
    > symmetrix.
    > BTW: what is your setting of queue_depth? (do a lsattr -El
    > hdiskpowerX).
    yeah, there are quite some information about settings available.
    However, even information from IBM regarding the same parameters are
    not always consistent You might even find settings recommended that
    are mutual exclusive. The values I suggestet are my own, from my
    personal experience with IBM, EMC and Hitachi SAN storage in
    combination with Oracle and DB2 UDB database systems. And from my
    personal experiences I think you might push your queue depth up to a
    value of 24. Use your best judgement to find the adequate setting for
    your system. This requires some tests (at the cost of some downtime).
    >
    > > Did you test your DB's behavior with larger values, did it really slow
    > > down? I would be surprised. You also need to put AIX read ahead into
    > > correlation with db_file_multiblock_read_count_parameter (which AFAIK
    > > applies for Oracle full table scans).
    >
    > Nope, did not test it. AFAIK, when Oracle wants a 16k block
    > (db_block_size=16384), the read-ahead mechanism actually reads 28k,
    > because 4+8+16=28K. Not nice when doing random i/o...
    Your AIX settings (from No. 5 in thread) would read ahead
    2*(4096)+4*(4096)=18K with 2 reads. Btw. did not write that explicitly
    before so it might have been overlooked but best blocksize setting for
    AIX/Oracle is 4K when using jfs and lots of random I/O. However, when
    you have got 16K Blocks within Oracle setting a minpagahead of 4 makes
    sense as it would read 16K in one go.
    >
    > Check this link: (also for default numfsbufs)
    > [url]http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/IBMp690/IBM/usr/share/man/info/en_US/a_doc_lib/aixbman/prftungd/2365c89.htm[/url]
    Ah, right so you are using AIX 5.1 - from your message (No. 3 in
    thread) I saw that you cannot use JFS2 and understood therefrom that
    you are using 4.3.3 instead. Nonetheless, reading ahaed 18 K (of which
    you actually need 16K) when you have got 64 Gigabyte of Cache in the
    EMC should not be a problem. You should give a higher value for read
    ahead/numfsbufs etc. a try, as the official information might result
    from a complete different machine/software installation and thus your
    results might differ quite a lot from the official recommendation.
    When you check these recommended settings, which I always refer to as
    a good starting point, with some real life installation or benchmarks
    you will find that nobody ever ends up with the first values from the
    documentation. Usually they are too low.
    >
    > > Btw. you can speed up Oracle by using fewer (larger) files as Oracle looks
    > > at every tablespace/log on every start/stop and thus a smaller number of
    > > files results in faster start/stop sequences.
    > Actually, every session opens the db files. So, using lesser files
    > results in faster session creation. But, when using more filesystems
    > (and thus using more bufstructs) the tablespace has to be split up
    > into smaller files.
    Did not get that - do you create one filesystem per tablespace?
    >
    > > this is a situation where read ahead might help a lot, don't you think?
    > No, because Oracle would be reading from a raw volume. The VMM does
    > not do read-ahead on raw volumes. Perhaps EMC read-ahead would help.
    Did not get that either - isn't the read ahead from raw devices
    organised from within Oracle? That is what I referred to but my Oracle
    know how is not so good when it comes to those details.
    >
    > > what are your settings of the
    > > log_archive_buffer_size and how many log_archive_buffers do you use? Can you
    > > tune some values there?
    > No, because these parameters do not exist (anymore). Not in Oracle 8i,
    > not in 9i.
    I worked with Oracle 8.1.6i and 8.1.7i in January the last time but
    did not realise that these parameters do not exist anymore. However,
    I'd guess that these parameters either changed their names or that the
    settings are selftuning by now. I do not think that they were dropped
    completely.
    >
    > This is getting more and more interesting. From what I've read, moving
    > to AIX5 / JFS2 / Direct-IO will help a lot. But that is not an option
    > for now.
    From what I see and hear 64 Bit AIX and JFS2 is not so stable right
    now so I would stay away from it for production if I have the choice.
    Remember it took 4.3.3 to pass maintenance level 8 to become 'good'
    (most complains stopped after ML 9 only). We are on ML 4 with 5.1...
    Takes some time.
    >
    > Do we ever stop learning?
    Probably we should not ;-) Too many things are changing - within the
    operating system (sometimes great differences even within different
    maintenance levels) and within the databases (think of how Oracle
    changed handling of raw devices from 9.1 to 9.2(.0.3). See the dropped
    parameters for log_buffers from 8i on :-( Coming to think about this:
    at least we will never get bored :-)
    >
    > Regards,
    > Hans
    Regards,
    Andreas
    Andreas Schulze Guest

  12. #11

    Default Re: Managing 300+ io's per second

    Hi Andreas,

    See comments below.
    > Your AIX settings (from No. 5 in thread) would read ahead
    > 2*(4096)+4*(4096)=18K with 2 reads. Btw. did not write that explicitly
    > before so it might have been overlooked but best blocksize setting for
    > AIX/Oracle is 4K when using jfs and lots of random I/O. However, when
    > you have got 16K Blocks within Oracle setting a minpagahead of 4 makes
    > sense as it would read 16K in one go.
    Read ahead is triggered at the moment when the second consecutive page
    is read, so with a minpageahead of 4, I would still be reading in too
    much data: a minumimum of 20K in 2 goes. And when Oracle accesses a
    page that is already read in by the read-ahead mechanism, more read
    aheads occur. No, Oracle knows best what data is requested.
    I also do not believe that the AIX read-ahead mechanism speeds up full
    table scans (or multiblock reads in general). The multiblock reads are
    scheduled in groups (N parallel reads of db_file_multiblock_read_count
    * db_block_size).
    > Ah, right so you are using AIX 5.1
    No, that's just the webpage that tells me something about the
    techniques implemented in the VMM. Is there a webpage that reveals how
    the 4.3.3 VMM works?
    > do you create one filesystem per tablespace?
    Almost...maximum of three files in heavily accessed filesystems.
    > From what I see and hear 64 Bit AIX and JFS2 is not so stable right
    > now so I would stay away from it for production if I have the choice.
    > Remember it took 4.3.3 to pass maintenance level 8 to become 'good'
    > (most complains stopped after ML 9 only). We are on ML 4 with 5.1...
    > Takes some time.
    Bummer...
    I'm still investigating whether Oracle opens its files with the
    o_direct option in AIX 5.2. For future use. I'll let you know.

    HTH,
    Hans
    Hans de Git Guest

  13. #12

    Default Re: Managing 300+ io's per second

    [email]hansdegit@hotmail.com[/email] (Hans de Git) wrote in message news:<72383d4a.0309300439.6459d09c@posting.google. com>...
    >
    Hallo Hans,
    > > Ah, right so you are using AIX 5.1
    > No, that's just the webpage that tells me something about the
    > techniques implemented in the VMM. Is there a webpage that reveals how
    > the 4.3.3 VMM works?
    The best place to start is the central IBM page for 4.3.3
    documentation:
    [url]http://publib16.boulder.ibm.com/pseries/en_US/infocenter/base/aix43.htm[/url]
    Choose the Performance Management Guide and search for VMM from there.
    Btw. I found the value of 93 numfsbufs there you mentioned a few
    postings ago. However, I still suspect that this value was valid five
    years ago (when the documentation was written) and does not make sense
    on a pSeries today.

    HTH,
    Andreas
    Andreas Schulze Guest

  14. #13

    Default Re: Managing 300+ io's per second

    Hi Hans, Picked up on this thread a little late...

    You have JFS files for the DBF's and you are not using direct i/o..
    so, you need to make the most out of your JFS cache.. I would look at
    upping the maxperm value and see if your buffer cache hit ratios
    increase any more with time.

    Please post your current vmtune (and -a) and vmstat 1 10 under load.

    From the formatting of the vmtune in the thread.. i cannot see if
    sync_release_ilock is on, you are using JFS files, so set it to "1" if
    you have not already done so...

    Also, you will want to be using filemon to map your i/o patterns ..
    i.e no good tuning for sequential writes, when 99% of the time you are
    doing random reads..

    Also, this will tell you if you are hitting any jfslog problems..

    You are gonna have to work out which tables are being hit, which lv's
    they are on.. have you got enough indexes on that table etc...

    you can't tune AIX and not tune the DB .. it wont work, data placement
    and finely tuned and optimised SQL are key, then tune AIX to match up
    with what you find...

    EMC should be ok, as long as you have spread your filesystems across
    many ranks and busy LVs dont cause hotspots on the EMC.

    Rgds
    Mark
    mark taylor Guest

  15. #14

    Default Re: Managing 300+ io's per second



    Andreas Schulze wrote:

    > Btw. I found the value of 93 numfsbufs there you mentioned a few
    > postings ago. However, I still suspect that this value was valid five
    > years ago (when the documentation was written) and does not make sense
    > on a pSeries today.

    Yes, thats true for many parameters.
    Many have defaults that were evaluated for 512mb ram-nodes.

    numfsbufs is easy to evaluate:
    look at fsbufwaitcount (output from vmtune -a) and if it increases,
    increase numfsbufs. One numfsbuf takes approx. 250bytes *
    number_of_filesystems

    you have to remount the fs before an increase takes effect

    Holger van Koll Guest

Posting Permissions

  • You may not post new threads
  • You may post replies
  • You may not post attachments
  • You may not edit your posts

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139