Professional Web Applications Themes

SELECT...LIMIT that does a partial select? - MySQL

I am in big trouble with MySQL 5. I have a huge table, about 25 GIGABYTES, that is a cache for hard to compute items. Each row has two timestamps, last updated and last retrieved timestamp, some metadata, key and a BLOB. I would like to clean that table based on timestamps. I used to clean it periodically, but doing a big DELETE...WHERE query locks it up for HOURS and that means downtime for my website, lost money, ed off tutors, etc. So, I thought, I would write a process that would do SELECT id FROM bincache WHERE ... LIMIT ...

  1. #1

    Default SELECT...LIMIT that does a partial select?

    I am in big trouble with MySQL 5.

    I have a huge table, about 25 GIGABYTES, that is a cache for hard to
    compute items. Each row has two timestamps, last updated and last
    retrieved timestamp, some metadata, key and a BLOB.

    I would like to clean that table based on timestamps. I used to clean
    it periodically, but doing a big DELETE...WHERE query locks it up for
    HOURS and that means downtime for my website, lost money, ed off
    tutors, etc.

    So, I thought, I would write a process that would do

    SELECT id FROM bincache WHERE ... LIMIT 100

    and that would run quickly to give me 100 IDs, that would then DELETE
    at my leisure without locking up the table. Repeat ad infinitum.

    Well, I just learned that LIMIT 100 is applied after ALL select work
    is done, which means that it would have to search gigabytes of data
    and return a many gigabyte result, only to trim it to 100 rows at the
    last moment.

    Even SELECT..LIMIT 1 takes forever and I had to kill it.

    How can I get around this awful problem.

    i
    Ignoramus24559 Guest

  2. #2

    Default Re: SELECT...LIMIT that does a partial select?

    Ignoramus24559 wrote:
     


    If you ORDER BY something that is indexed then it will read the table in the
    order of the index and stop after it has match 100 rows, it should be very
    fast.


    --
    Brian Wakem
    Email: http://homepage.ntlworld.com/b.wakem/myemail.png
    Brian Guest

  3. #3

    Default Re: SELECT...LIMIT that does a partial select?

    On Fri, 15 Dec 2006 19:25:22 +0000, Brian Wakem <com> wrote: 
    >
    >
    > If you ORDER BY something that is indexed then it will read the table in the
    > order of the index and stop after it has match 100 rows, it should be very
    > fast.
    >
    >[/ref]

    Brian, this is beautiful and seems to work. At least when I am making
    my selects without having a lot of entries cleaned in the beginning of
    the index. I am a little concerned that this select...order by would
    become slower if it goes through the index, first scrolling through
    thousands of entries that already are devoid of deletable items.

    Anyway, I will see what I can do...

    i
    Ignoramus24559 Guest

  4. #4

    Default Re: SELECT...LIMIT that does a partial select?

    Ignoramus24559 <24559.invalid> wrote: 

    First: this design does not seem to be very clever. You have 25GB
    worth of "hard to compute" binary stuff? Why do you put it in the
    database anyway? Wouldn't it be better to keep that in plain old
    files? On the web server(s)? That should scale much better. Plus,
    filesystems are quite good at dealing with binary data. Better than
    most database engines.

    Next: you use a MyISAM table for that. Not good. With variable length
    records and heavy fluctuation you will experience tablespace
    fragmentation very soon. You would be better off with an InnoDB table.
    I can also imagine a set of MyISAM tables, bundled by the MERGE engine.
    If you partition your data by timestamp, cleaning out old entries would
    become as simple as CREATE fresh_table, ALTER merge_table, DROP
    oldest_table.
     

    Locking is a MyISAM only problem.

    Also you can do DELETE ... LIMIT. No need to break that in two steps.
    However, deleting from a big table is expensive:

    1. the rows to be deleted must be found
    2. rows must be deleted from tablespace (this may cause tablespace
    reorganisation)
    3. indexes must be updated

    If you delete big part of a table (say: 80% of the records) you will be
    better off with the following approach:

    1. create a new table with the same layout
    2. populate the new table with INSERT .... SELECT with all records
    you plan to keep
    3. rename tables (this is an atomic operation)
    4. drop the old table
     

    This clearly depends on your query. If MySQL does not need to
    materialize the result set (i.e. for sorting), it stops just after
    it processed the requested number of rows.


    XL
    --
    Axel Schwenke, Senior Software Developer, MySQL AB

    Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
    MySQL User Forums: http://forums.mysql.com/
    Axel Guest

  5. #5

    Default Re: SELECT...LIMIT that does a partial select?


    Axel Schwenke wrote: 
    >
    > First: this design does not seem to be very clever. You have 25GB
    > worth of "hard to compute" binary stuff? Why do you put it in the
    > database anyway? Wouldn't it be better to keep that in plain old
    > files? On the web server(s)? That should scale much better. Plus,
    > filesystems are quite good at dealing with binary data. Better than
    > most database engines.
    >
    > Next: you use a MyISAM table for that. Not good. With variable length
    > records and heavy fluctuation you will experience tablespace
    > fragmentation very soon. You would be better off with an InnoDB table.
    > I can also imagine a set of MyISAM tables, bundled by the MERGE engine.
    > If you partition your data by timestamp, cleaning out old entries would
    > become as simple as CREATE fresh_table, ALTER merge_table, DROP
    > oldest_table.

    >
    > Locking is a MyISAM only problem.
    >
    > Also you can do DELETE ... LIMIT. No need to break that in two steps.
    > However, deleting from a big table is expensive:
    >
    > 1. the rows to be deleted must be found
    > 2. rows must be deleted from tablespace (this may cause tablespace
    > reorganisation)
    > 3. indexes must be updated
    >
    > If you delete big part of a table (say: 80% of the records) you will be
    > better off with the following approach:
    >
    > 1. create a new table with the same layout
    > 2. populate the new table with INSERT .... SELECT with all records
    > you plan to keep
    > 3. rename tables (this is an atomic operation)
    > 4. drop the old table

    >
    > This clearly depends on your query. If MySQL does not need to
    > materialize the result set (i.e. for sorting), it stops just after
    > it processed the requested number of rows.
    >
    >
    > XL
    > --
    > Axel Schwenke, Senior Software Developer, MySQL AB
    >
    > Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
    > MySQL User Forums: http://forums.mysql.com/[/ref]

    IMNSHO, using partitioned tables would be a much better alternative. I
    have administered real databases in the multi-terabyte range with this
    approach.

    create tablespace p0...
    create tablespace p1 etc...

    CREATE TABLE t1 (
    id INT,
    year_col INT
    )
    PARTITION BY RANGE (year_col) (
    PARTITION p0 VALUES LESS THAN (1991),
    PARTITION p1 VALUES LESS THAN (1995),
    PARTITION p2 VALUES LESS THAN (1999)
    );
    You can add a new partition p3 to this table for storing values less
    then 2002 as follows:

    ALTER TABLE t1 ADD PARTITION (PARTITION p3 VALUES LESS THAN (2002));

    Then look at how to drop the older partitions.

    One thing you need to do is to also partition your indexes (it if is
    allowed --I have not exhausted the docs on this topic...) such that if
    you drop the oldest partition - you do so on the indexes as well.

    Someone once said in this forum that in todays storage array technology
    it is not necessary to seperate data and indexes onto seperate
    filesystems (or disk drives depending on your platform). I would have
    to disagree because the problem is not necessarily on the storage
    array, but the shear volume of I/O's the OS can queue up on a single
    logical device. No OS handles hundreds or thousands of I/O's per
    second to a single-LVM very well unless your "SYSTEM" is designed
    properly. You must continue to "DESIGN" your system to match the
    requirements - don't just throw it together based on random pieces of
    comments (including mine) and expect it to perform optimally.

    If someone says `it "should" perform if you....`, RUN!!! Unless they
    have designed it before in a proven production environment, then they
    are just guessing. It may be a very good guess, but it is a guess
    nevertheless...

    Database System designers, IMHO, should have DBA, SA, Network and
    Storage Administration skills, because to design these types of
    systems, you must have an intimate, working knowlege of all of these
    disciplines and MUST work very closely with their counterparts in these
    areas.

    Database Consultant.

    onedbguru Guest

  6. #6

    Default Re: SELECT...LIMIT that does a partial select?

    "onedbguru" <com> wrote: 

    [Fullquote, including signature]

    Please don't fullquote!
     

    Did you notice the word "mysql" in this groups name? Your suggestion
    does not work with MySQL. In fact MySQL 5.1 does support partitions -
    but it is currently BETA and the syntax is slightly different.
    MERGE tables (as suggested by me) are the poor mans approximation of
    partitions, available in stable MySQL editions.

    However, the OPs problem is, that DELETE on a large table

    a) may take a long time and
    b) locks that table

    His original approach (deleting small chunks) tackles a). Using an
    InnoDB table would eliminate b). Partitions won't solve any of those
    in the first place - only if DELETE is implemented as DROP PARTITION.


    XL
    --
    Axel Schwenke, Senior Software Developer, MySQL AB

    Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
    MySQL User Forums: http://forums.mysql.com/
    Axel Guest

  7. #7

    Default Re: SELECT...LIMIT that does a partial select?

    Axel Schwenke wrote: 
    >
    > Did you notice the word "mysql" in this groups name? Your suggestion
    > does not work with MySQL. In fact MySQL 5.1 does support partitions -
    > but it is currently BETA and the syntax is slightly different.
    > MERGE tables (as suggested by me) are the poor mans approximation of
    > partitions, available in stable MySQL editions.[/ref]

    UM, no. I've been using them since 3.x, and they work great for huge
    amounts of data.


    Wayne Guest

Similar Threads

  1. Replies: 0
    Last Post: September 24th, 03:24 AM
  2. Replies: 0
    Last Post: September 11th, 12:19 AM
  3. Replies: 4
    Last Post: July 3rd, 08:55 AM
  4. Replies: 0
    Last Post: April 15th, 01:22 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139