Professional Web Applications Themes

Assistance with Query Optimization - MySQL

Hi all. I have a table with ~250,000 entries. Each entry has an associated year value, which could be any year in the past 200 or so... [id|item|year] I am creating a PHP page to display the years that have items, ordered by decade. $sql="SELECT year, SUBSTRING(year,2,2) AS decade FROM `items` GROUP BY decade, ORDER BY year DESC;" This query takes around 0.5s to execute. Results: [2000|00] [1990|90] .. .. .. [1780|78] I display these items as links (using anchors) to the full decade results further down the page, where I list each and every year I have items for ...

  1. #1

    Default Assistance with Query Optimization

    Hi all. I have a table with ~250,000 entries. Each entry has an associated
    year value, which could be any year in the past 200 or so...

    [id|item|year]

    I am creating a PHP page to display the years that have items, ordered by
    decade.

    $sql="SELECT year, SUBSTRING(year,2,2) AS decade FROM `items` GROUP BY
    decade, ORDER BY year DESC;"

    This query takes around 0.5s to execute. Results:

    [2000|00]
    [1990|90]
    ..
    ..
    ..
    [1780|78]

    I display these items as links (using anchors) to the full decade results
    further down the page, where I list each and every year I have items for in
    a given decade. In each individual decade section of the page, I then
    execute the following query to get the values for each year located in that
    decade with items:

    $sql = "SELECT DISTINCT year FROM `items` WHERE SUBSTRING(year, 2, 2) = 00
    ORDER BY year DESC;"
    $sql = "SELECT DISTINCT year FROM `items` WHERE SUBSTRING(year, 2, 2) = 90
    ORDER BY year DESC;"
    ..
    ..
    ..
    $sql = "SELECT DISTINCT year FROM `items` WHERE SUBSTRING(year, 2, 2) = 78
    ORDER BY year DESC;"

    Each of these queries is taking around 0.3 seconds to process, with the end
    result being around 7-8 seconds for the page to load... which I am getting
    complaints about.

    EXPLAIN SELECT DISTINCT year FROM `items` WHERE SUBSTRING(year, 2, 2) = 78
    ORDER BY year DESC;

    id select_type table type possible_keys key key_len
    ref rows Extra
    1 SIMPLE rankings index NULL iYear 2
    NULL 237945 Using where; Using index

    I have an index on my year column, but this does not look to be helping very
    much. Any ideas on how to better optimize this situation, and or do this?
    I have been beating my head against my keyboard here, as I am sure I am
    missing something obvious, so any advice is appreciated.

    GS


    GS Guest

  2. #2

    Default Re: Assistance with Query Optimization

    Hi,

    Looks like the substring() function is the one taking most of the time.

    I know this will sound like a nightmare to a purist DBA, but, if
    performance is your priority, how bad of an idea it is to add a decade
    to your table and populate it as you do an insert via substring()? This
    will take out the substring() function out of your selects and put it
    in your inserts. This might help big time with the performance.

    Let me know if this is worth a consideration :)

    -cheers,
    Manish

    Manish Guest

  3. #3

    Default Re: Assistance with Query Optimization

    "GS" <com> wrote:
     
     

    Of course your 'decade' is not unique as soon as you reach more than
    1000 years in your table. Also the above SELECT is problematic:

    - you GROUP BY decade, then year is undefined. MySQL will pick a random
    year out of the decade. Other DBMS will raise an error here.

    - ORDER BY year is the same than ORDER BY decade. Maybe slower.
    I expect your query to do an index scan (reading rows in sort order).
    This may be faster than doing a separate sorting pass.

    To make this query faster, you would have to introduce a real `decade`
    field in your table and have an index on it. With MySQL 5.0 or later
    you can use a trigger to update the `decade` field automatically.
     

    There are two problems:

    1. lots of queries
    2. queries do not use indexes
     

    The query is doing a scan on the whole `iYear` index, calculating
    SUBSTRING() for each row. It would be much faster if you write:

    $lower= 1000 + $decade*10;
    if ($decade < 78 ) { #no decades before 1780
    $lower+= 1000;
    }
    $upper= $lower + 9;

    SELECT ... WHERE year BETWEEN $lower AND $upper

    (feel free to calculate the upper/lower bounds of your decades in the
    SQL statement (using IF()) or in your application)

    This query would do a range scan on the index and should be signifi-
    cantly faster. Probably your page will load fast fast enough then.


    There is still the problem of the many queries. In worst case you can
    have up to 100 decades - thus 100 queries in the same page.

    Why not put everything in one query:

    SELECT DISTINCT year,
    SUBSTRING(year,2,2) AS decade
    FROM items ORDER BY year DESC

    then read your result set as follows:

    $last_decade= -1; #impossible value
    while ($row= fetch_hash()) {
    if ($row['decade'] != $last_decade) {
    if ($last_decade != -1) {
    finish($last_decade);
    }
    start($row['decade']);
    $last_decade= $row['decade']
    }
    process($row['year']);
    }
    finish($last_decade);

     

    It does. Scanning the index is still faster than scanning the table.


    HTH, XL
    --
    Axel Schwenke, Senior Software Developer, MySQL AB

    Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
    MySQL User Forums: http://forums.mysql.com/
    Axel Guest

  4. #4

    Default Re: Assistance with Query Optimization

    "Axel Schwenke" <de> wrote in message
    news:homelinux.org... 

    >
    > Of course your 'decade' is not unique as soon as you reach more than
    > 1000 years in your table. Also the above SELECT is problematic:
    >
    > - you GROUP BY decade, then year is undefined. MySQL will pick a random
    > year out of the decade. Other DBMS will raise an error here.
    >
    > - ORDER BY year is the same than ORDER BY decade. Maybe slower.
    > I expect your query to do an index scan (reading rows in sort order).
    > This may be faster than doing a separate sorting pass.
    >
    > To make this query faster, you would have to introduce a real `decade`
    > field in your table and have an index on it. With MySQL 5.0 or later
    > you can use a trigger to update the `decade` field automatically.

    >
    > There are two problems:
    >
    > 1. lots of queries
    > 2. queries do not use indexes

    >
    > The query is doing a scan on the whole `iYear` index, calculating
    > SUBSTRING() for each row. It would be much faster if you write:
    >
    > $lower= 1000 + $decade*10;
    > if ($decade < 78 ) { #no decades before 1780
    > $lower+= 1000;
    > }
    > $upper= $lower + 9;
    >
    > SELECT ... WHERE year BETWEEN $lower AND $upper
    >
    > (feel free to calculate the upper/lower bounds of your decades in the
    > SQL statement (using IF()) or in your application)
    >
    > This query would do a range scan on the index and should be signifi-
    > cantly faster. Probably your page will load fast fast enough then.
    >
    >
    > There is still the problem of the many queries. In worst case you can
    > have up to 100 decades - thus 100 queries in the same page.
    >
    > Why not put everything in one query:
    >
    > SELECT DISTINCT year,
    > SUBSTRING(year,2,2) AS decade
    > FROM items ORDER BY year DESC
    >
    > then read your result set as follows:
    >
    > $last_decade= -1; #impossible value
    > while ($row= fetch_hash()) {
    > if ($row['decade'] != $last_decade) {
    > if ($last_decade != -1) {
    > finish($last_decade);
    > }
    > start($row['decade']);
    > $last_decade= $row['decade']
    > }
    > process($row['year']);
    > }
    > finish($last_decade);
    >

    >
    > It does. Scanning the index is still faster than scanning the table.
    >
    >
    > HTH, XL
    > --
    > Axel Schwenke, Senior Software Developer, MySQL AB
    >
    > Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
    > MySQL User Forums: http://forums.mysql.com/[/ref]

    That worked great. I cleaned up my initial query per your recommendations,
    creating a separate indexed decade field.

    I then did a second query grabbing all of the years I have items for and
    used PHP to p the result set. Much faster.

    Interesting discovery... when selecting the years from an indexed column,
    SELECT DISTINCT year... ended up being almost twice as slow as SELECT year
    .... GROUP BY year. Does this make sense?

    GS


    GS Guest

  5. #5

    Default Re: Assistance with Query Optimization

    "GS" <com> wrote: 

    [full quote - including signature - removed]

    Please, do not quote complete postings. Read here:
    http://www.netmeister.org/news/learn2quote2.html
    for reasoning. Thanks!
     

    Glad to hear that.
     

    No. Maybe a deficiency in the query optimizer.
    Can you please run EXPLAIN for both queries?
    What is your *exact* version of MySQL?


    XL
    --
    Axel Schwenke, Senior Software Developer, MySQL AB

    Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
    MySQL User Forums: http://forums.mysql.com/
    Axel Guest

Similar Threads

  1. pdf optimization
    By palaksha@adobeforums.com in forum Adobe Acrobat SDK
    Replies: 0
    Last Post: July 18th, 08:01 AM
  2. table space impact on query optimization
    By xixi in forum IBM DB2
    Replies: 2
    Last Post: August 6th, 04:16 PM
  3. Need Urgent assistance with a SQL Query
    By Sandeep Commar in forum Microsoft SQL / MS SQL Server
    Replies: 10
    Last Post: July 23rd, 07:01 AM
  4. Replies: 3
    Last Post: July 11th, 09:08 AM
  5. SQL Query Assistance
    By Sam Commar in forum Microsoft SQL / MS SQL Server
    Replies: 0
    Last Post: July 10th, 12:37 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139