Professional Web Applications Themes

Settle a debate - to index or not to index? - MySQL

I have 2 tables: mysql> desc a; +---------+------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +---------+------------------+------+-----+---------+-------+ | item_id | int(10) unsigned | NO | | 0 | | +---------+------------------+------+-----+---------+-------+ 1 row in set (0.00 sec) mysql> select count(*) from a; +----------+ | count(*) | +----------+ | 38364373 | +----------+ 1 row in set (0.00 sec) mysql> desc b; +---------+---------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +---------+---------------------+------+-----+---------+----------------+ | item_id | int(10) unsigned | NO | PRI | NULL | auto_increment | | col2 | int(10) unsigned | NO ...

  1. #1

    Default Settle a debate - to index or not to index?

    I have 2 tables:

    mysql> desc a;
    +---------+------------------+------+-----+---------+-------+
    | Field | Type | Null | Key | Default | Extra |
    +---------+------------------+------+-----+---------+-------+
    | item_id | int(10) unsigned | NO | | 0 | |
    +---------+------------------+------+-----+---------+-------+
    1 row in set (0.00 sec)

    mysql> select count(*) from a;
    +----------+
    | count(*) |
    +----------+
    | 38364373 |
    +----------+
    1 row in set (0.00 sec)

    mysql> desc b;
    +---------+---------------------+------+-----+---------+----------------+
    | Field | Type | Null | Key | Default | Extra
    |
    +---------+---------------------+------+-----+---------+----------------+
    | item_id | int(10) unsigned | NO | PRI | NULL | auto_increment
    |
    | col2 | int(10) unsigned | NO | MUL | |
    |
    +---------+---------------------+------+-----+---------+----------------+
    5 rows in set (0.01 sec)

    mysql> show index from b;
    +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table | Non_unique | Key_name | Seq_in_index | Column_name |
    Collation | Cardinality | Sub_part | Packed | Null | Index_type |
    Comment |
    +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | b | 0 | PRIMARY | 1 | item_id | A
    | 60688657 | NULL | NULL | | BTREE | |
    | b | 1 | col2 | 1 | col2 | A
    | NULL | NULL | NULL | | BTREE | |
    +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    3 rows in set (0.00 sec)

    mysql> select count(*) from b;
    +----------+
    | count(*) |
    +----------+
    | 60688657 |
    +----------+
    1 row in set (0.00 sec)

    There are no duplicates in table a. I want to run the following query:
    SELECT col2 FROM a JOIN b USING(`item_id`)

    I am arguing that an index (a primary key) on the only column in table
    a will not make any difference whatsoever. Our DBA is telling me
    otherwise, claiming that w/o an index "it will go thru the whole table
    every time it can't find a matching row." It is not clear to me what
    he is trying to say; I disagree that it will be any slower to run the
    query w/o an index on table a.

    I see mysql doing the following: going thru each row in table a using
    natural row order and joining table b on it. On the join it uses the
    index on `item_id` in table b to see if there is a match and selects
    col2 if found. I do not see how an index on table a could optimize
    this process.

    The DBA also claims that because it would do lookups out of order, it
    would have to "seek" more. I don't think one can assume which way it
    would be faster - it all depends on how the data is physically laid out
    on disk, which we don't know, so no assumptions can be made either way.
    He also mentioned having to move the row pointer more; I do not know
    enough about the innards of mysql to make a comment on this.

    EXPLAIN seems to support my reasoning: I created a copy of table a and
    added a primary key on it; EXPLAIN says it will scan the same number of
    rows in both cases - all of them:
    mysql> EXPLAIN SELECT col2 FROM a JOIN b USING(`item_id`);
    +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
    | id | select_type | table | type | possible_keys | key | key_len
    | ref | rows | Extra |
    +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
    | 1 | SIMPLE | a | ALL | NULL | NULL | NULL
    | NULL | 38364373 | |
    | 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4
    | a.item_id | 1 | Using where |
    +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
    2 rows in set (0.00 sec)

    mysql> EXPLAIN SELECT col2 FROM a_indexed JOIN b USING(`item_id`);
    +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
    | id | select_type | table | type | possible_keys | key |
    key_len | ref | rows | Extra |
    +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
    | 1 | SIMPLE | a_indexed | index | PRIMARY | PRIMARY | 4
    | NULL | 38364373 | Using index |
    | 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4
    | a_indexed.item_id | 1 | Using where |
    +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
    2 rows in set (0.01 sec)

    The DBA said that EXPLAIN is just an approximation. While I agree with
    this statement, I don't see how it applies to this problem.

    Am I wrong? Am I right? Please enlighten me (with a detailed technical
    explanation).


    P.S. The names of the tables & columns have been changed for privacy.

    Duke Ionescu Guest

  2. #2

    Default Re: Settle a debate - to index or not to index?

    Duke Ionescu wrote:
    > I have 2 tables:
    >
    > mysql> desc a;
    > +---------+------------------+------+-----+---------+-------+
    > | Field | Type | Null | Key | Default | Extra |
    > +---------+------------------+------+-----+---------+-------+
    > | item_id | int(10) unsigned | NO | | 0 | |
    > +---------+------------------+------+-----+---------+-------+
    > 1 row in set (0.00 sec)
    >
    > mysql> select count(*) from a;
    > +----------+
    > | count(*) |
    > +----------+
    > | 38364373 |
    > +----------+
    > 1 row in set (0.00 sec)
    >
    > mysql> desc b;
    > +---------+---------------------+------+-----+---------+----------------+
    > | Field | Type | Null | Key | Default | Extra
    > |
    > +---------+---------------------+------+-----+---------+----------------+
    > | item_id | int(10) unsigned | NO | PRI | NULL | auto_increment
    > |
    > | col2 | int(10) unsigned | NO | MUL | |
    > |
    > +---------+---------------------+------+-----+---------+----------------+
    > 5 rows in set (0.01 sec)
    >
    > mysql> show index from b;
    > +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    > | Table | Non_unique | Key_name | Seq_in_index | Column_name |
    > Collation | Cardinality | Sub_part | Packed | Null | Index_type |
    > Comment |
    > +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    > | b | 0 | PRIMARY | 1 | item_id | A
    > | 60688657 | NULL | NULL | | BTREE | |
    > | b | 1 | col2 | 1 | col2 | A
    > | NULL | NULL | NULL | | BTREE | |
    > +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    > 3 rows in set (0.00 sec)
    >
    > mysql> select count(*) from b;
    > +----------+
    > | count(*) |
    > +----------+
    > | 60688657 |
    > +----------+
    > 1 row in set (0.00 sec)
    >
    > There are no duplicates in table a. I want to run the following query:
    > SELECT col2 FROM a JOIN b USING(`item_id`)
    >
    > I am arguing that an index (a primary key) on the only column in table
    > a will not make any difference whatsoever. Our DBA is telling me
    > otherwise, claiming that w/o an index "it will go thru the whole table
    > every time it can't find a matching row." It is not clear to me what
    > he is trying to say; I disagree that it will be any slower to run the
    > query w/o an index on table a.
    >
    > I see mysql doing the following: going thru each row in table a using
    > natural row order and joining table b on it. On the join it uses the
    > index on `item_id` in table b to see if there is a match and selects
    > col2 if found. I do not see how an index on table a could optimize
    > this process.
    >
    > The DBA also claims that because it would do lookups out of order, it
    > would have to "seek" more. I don't think one can assume which way it
    > would be faster - it all depends on how the data is physically laid out
    > on disk, which we don't know, so no assumptions can be made either way.
    > He also mentioned having to move the row pointer more; I do not know
    > enough about the innards of mysql to make a comment on this.
    >
    > EXPLAIN seems to support my reasoning: I created a copy of table a and
    > added a primary key on it; EXPLAIN says it will scan the same number of
    > rows in both cases - all of them:
    > mysql> EXPLAIN SELECT col2 FROM a JOIN b USING(`item_id`);
    > +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
    > | id | select_type | table | type | possible_keys | key | key_len
    > | ref | rows | Extra |
    > +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
    > | 1 | SIMPLE | a | ALL | NULL | NULL | NULL
    > | NULL | 38364373 | |
    > | 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4
    > | a.item_id | 1 | Using where |
    > +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
    > 2 rows in set (0.00 sec)
    >
    > mysql> EXPLAIN SELECT col2 FROM a_indexed JOIN b USING(`item_id`);
    > +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
    > | id | select_type | table | type | possible_keys | key |
    > key_len | ref | rows | Extra |
    > +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
    > | 1 | SIMPLE | a_indexed | index | PRIMARY | PRIMARY | 4
    > | NULL | 38364373 | Using index |
    > | 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4
    > | a_indexed.item_id | 1 | Using where |
    > +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
    > 2 rows in set (0.01 sec)
    >
    > The DBA said that EXPLAIN is just an approximation. While I agree with
    > this statement, I don't see how it applies to this problem.
    >
    > Am I wrong? Am I right? Please enlighten me (with a detailed technical
    > explanation).
    >
    >
    > P.S. The names of the tables & columns have been changed for privacy.
    >
    I would argue that your DBA is correct, When you select from a_indexed, an
    index would allow you to select in numeric order. As the join values would now
    be in numeric order, there would potentially be less I/O on table b.

    For instance - with an index it could select in order of 1,2,3,4, etc. But
    without the index it could select in the order of 25694, 30125, 81320, 301976, etc.

    The first would be able to use many of the rows already buffered by the index in
    b. The second would have to select several different blocks from the index for b.

    Now - if your buffers are big enough to hold all the indexes, than probably
    MySQLr would eventually find everything in the buffer (you'd only have to read
    each block of the index once). But if your buffers are too small, the second
    may have to re-read the same block several times.

    --
    ==================
    Remove the "x" from my email address
    Jerry Stuckle
    JDS Computer Training Corp.
    [email]jstucklexattglobal.net[/email]
    ==================
    Jerry Stuckle Guest

Similar Threads

  1. index.php
    By laidback64 in forum Macromedia Contribute General Discussion
    Replies: 1
    Last Post: October 13th, 09:36 PM
  2. Replies: 4
    Last Post: August 3rd, 03:11 PM
  3. Index Topics and Index References
    By Tom_Cusick@adobeforums.com in forum Adobe Indesign Windows
    Replies: 2
    Last Post: July 8th, 07:20 PM
  4. Newb query: index.htm & index.php & the server default
    By Lab309 in forum PHP Development
    Replies: 7
    Last Post: September 22nd, 02:08 PM
  5. Create index VS Set Index Enabled - IDS 7.31
    By Stéphane Gadoury in forum Informix
    Replies: 2
    Last Post: July 21st, 02:00 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139