Professional Web Applications Themes

Need optimization suggestions (cleaning large table) - MySQL

Ignoramus23984 wrote: > This is for my website algebra.com. > > I have a large table, currently about 12 gigabytes. It is alled > "bincache". > > create table bincache ( > id VARCHAR(250) PRIMARY KEY, > dataclass VARCHAR(32), > lastdate DATETIME, > lastused DATETIME, > dataitem blob > ) > > It stores various expensive to obtain (compute or download) > stuff. Every night I clean it by deleting data items that are over, > say 2 weeks old (using lastdate field). > > My issue is that delete takes too long. I would like to know if, say, ...

  1. #1

    Default Re: Need optimization suggestions (cleaning large table)

    Ignoramus23984 wrote:
    > This is for my website algebra.com.
    >
    > I have a large table, currently about 12 gigabytes. It is alled
    > "bincache".
    >
    > create table bincache (
    > id VARCHAR(250) PRIMARY KEY,
    > dataclass VARCHAR(32),
    > lastdate DATETIME,
    > lastused DATETIME,
    > dataitem blob
    > )
    >
    > It stores various expensive to obtain (compute or download)
    > stuff. Every night I clean it by deleting data items that are over,
    > say 2 weeks old (using lastdate field).
    >
    > My issue is that delete takes too long. I would like to know if, say,
    > creating index based on lastdate would alleviate the problem.
    >
    > i
    >
    Impossible to say with the information you've given us.

    Performance is related to data - things such as amount of data as well
    as type of data, (to a certain extent) data distribution, the queries
    being performed, CPU load, disk speed, the phases of the moon...

    An index on lastdate *may* decrease the time - it depends on all of the
    above. But it also will *increase* the amount of time for every insert,
    and every update which changes this field.

    How about more details on the data and the query you're using?

    --
    ==================
    Remove the "x" from my email address
    Jerry Stuckle
    JDS Computer Training Corp.
    [email]jstucklexattglobal.net[/email]
    ==================
    Jerry Stuckle Guest

  2. #2

    Default Re: Need optimization suggestions (cleaning large table)

    Ignoramus23984 <ignoramus23984NOSPAM.23984.invalid> wrote:
    >
    > I have a large table, currently about 12 gigabytes. It is alled
    > "bincache".
    Hmm. I guess it is MyISAM?
    > create table bincache (
    > id VARCHAR(250) PRIMARY KEY,
    > dataclass VARCHAR(32),
    > lastdate DATETIME,
    > lastused DATETIME,
    > dataitem blob
    > )
    >
    > It stores various expensive to obtain (compute or download)
    > stuff. Every night I clean it by deleting data items that are over,
    > say 2 weeks old (using lastdate field).
    I'm quite sure that your datafile is fragmented. Could you post the
    result of SHOW TABLE STATUS LIKE 'bincache'? Have a look at the
    `Data_length` and `Data_free` columns. If the latter is big (bigger
    than the first) your table is fragmented. To defragment it, you can
    use OPTIMIZE TABLE.
    > My issue is that delete takes too long. I would like to know if, say,
    > creating index based on lastdate would alleviate the problem.
    You say you DELETE daily, everything older than 2 weeks. That means
    you delete 1/15th of your rows on average. Probably an index on
    `lastdate` (IMHO `lastused` would be better) could speedup the
    selection of those rows. You already have significant costs for
    updating the primary key so another index would not be too bad.


    I would suggest to delete less often, say once a week and use the
    following algorithm:

    1. create a second, identical table bincache_new
    2. select all rows to be preserved from bincache and
    insert them into bincache_new (use INSERT ... SELECT ... WHERE)
    3. rename tables: bincache -> bincache_drop, bincache_new -> bincache
    4. drop bincache_drop

    That way you avoid fragmentation at all. Plus you get a nice clean
    fresh index distribution every time.


    XL
    --
    Axel Schwenke, Senior Software Developer, MySQL AB

    Online User Manual: [url]http://dev.mysql.com/doc/refman/5.0/en/[/url]
    MySQL User Forums: [url]http://forums.mysql.com/[/url]
    Axel Schwenke Guest

  3. #3

    Default Re: Need optimization suggestions (cleaning large table)

    Ignoramus23984 wrote:
    > On Thu, 02 Mar 2006 08:03:40 -0500, Jerry Stuckle <jstucklexattglobal.net> wrote:
    >
    >>Ignoramus23984 wrote:
    >>
    >>>This is for my website algebra.com.
    >>>
    >>>I have a large table, currently about 12 gigabytes. It is alled
    >>>"bincache".
    >>>
    >>>create table bincache (
    >>> id VARCHAR(250) PRIMARY KEY,
    >>> dataclass VARCHAR(32),
    >>> lastdate DATETIME,
    >>> lastused DATETIME,
    >>> dataitem blob
    >>> )
    >>>
    >>>It stores various expensive to obtain (compute or download)
    >>>stuff. Every night I clean it by deleting data items that are over,
    >>>say 2 weeks old (using lastdate field).
    >>>
    >>>My issue is that delete takes too long. I would like to know if, say,
    >>>creating index based on lastdate would alleviate the problem.
    >>>
    >>>i
    >>>
    >>
    >>Impossible to say with the information you've given us.
    >>
    >>Performance is related to data - things such as amount of data as well
    >>as type of data, (to a certain extent) data distribution, the queries
    >>being performed, CPU load, disk speed, the phases of the moon...
    >
    >
    > The amount of data is about 12 Gigs. Data is records of the above
    > table, the blob stuff is perhaps 20-40 kb on average.
    >
    > It is added at random moments throughout the day.
    >
    > It is retrieved by key.
    >
    >
    >>An index on lastdate *may* decrease the time - it depends on all of the
    >>above. But it also will *increase* the amount of time for every insert,
    >>and every update which changes this field.
    >
    >
    > Time for every insert is not an issue, as inserts happen relatively
    > rarely (say, no more than once every few seconds).
    >
    >
    >>How about more details on the data and the query you're using?
    >
    >
    > I would be delighted to provide all requisite data. Basically any
    > downtime is lost money, although at 2am when I do it, it is not as big
    > a deal.
    >
    > Thank you for your help.
    >
    > igor
    >
    Igor,

    Sorry I wasn't clear - by how much data, I was asking how many rows, not
    bytes. However, from your info and taking an average of 30K per blob,
    that's about 35K rows. Not that big of a database.

    An index on lastdate will probably help in this case, and shouldn't slow
    down performance on the inserts significantly. 35K datetime values
    isn't much.

    The main reason the index will help here is MySQL has to do a table scan
    when searching by lastdate. This means it has to read the entire 12GB
    of data - which just takes a while. I suspect this is a lot of your
    slowdown.

    Depending on your other operations, it might also help to put the blob
    data in a second table. I've found that can improve performance,
    especially if you're not selecting the blob data much of the time (i.e.
    you just get the title for an article instead of the entire article).
    That's because even if MySQL doesn't return the blob, it still must read
    it if it's in the same table (in MySQL it's one physical file).

    If you do the latter, create a new primary key of integer with
    autoincrement. You can keep your varchar id column and index, but make
    the other the primary key. The blob table would then contain an integer
    and the blob. Reason being - it's much, much faster to compare integers
    than strings.

    Hope this helps.


    --
    ==================
    Remove the "x" from my email address
    Jerry Stuckle
    JDS Computer Training Corp.
    [email]jstucklexattglobal.net[/email]
    ==================
    Jerry Stuckle Guest

Similar Threads

  1. Large site Structure suggestions
    By Mark in forum Coldfusion Database Access
    Replies: 0
    Last Post: September 9th, 11:37 PM
  2. table space impact on query optimization
    By xixi in forum IBM DB2
    Replies: 2
    Last Post: August 6th, 04:16 PM
  3. Replies: 0
    Last Post: July 7th, 05:31 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139