Ask a Question related to MySQL, Design and Development.
-
Duke Ionescu #1
Settle a debate - to index or not to index?
I have 2 tables:
mysql> desc a;
+---------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| item_id | int(10) unsigned | NO | | 0 | |
+---------+------------------+------+-----+---------+-------+
1 row in set (0.00 sec)
mysql> select count(*) from a;
+----------+
| count(*) |
+----------+
| 38364373 |
+----------+
1 row in set (0.00 sec)
mysql> desc b;
+---------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra
|
+---------+---------------------+------+-----+---------+----------------+
| item_id | int(10) unsigned | NO | PRI | NULL | auto_increment
|
| col2 | int(10) unsigned | NO | MUL | |
|
+---------+---------------------+------+-----+---------+----------------+
5 rows in set (0.01 sec)
mysql> show index from b;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name |
Collation | Cardinality | Sub_part | Packed | Null | Index_type |
Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| b | 0 | PRIMARY | 1 | item_id | A
| 60688657 | NULL | NULL | | BTREE | |
| b | 1 | col2 | 1 | col2 | A
| NULL | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.00 sec)
mysql> select count(*) from b;
+----------+
| count(*) |
+----------+
| 60688657 |
+----------+
1 row in set (0.00 sec)
There are no duplicates in table a. I want to run the following query:
SELECT col2 FROM a JOIN b USING(`item_id`)
I am arguing that an index (a primary key) on the only column in table
a will not make any difference whatsoever. Our DBA is telling me
otherwise, claiming that w/o an index "it will go thru the whole table
every time it can't find a matching row." It is not clear to me what
he is trying to say; I disagree that it will be any slower to run the
query w/o an index on table a.
I see mysql doing the following: going thru each row in table a using
natural row order and joining table b on it. On the join it uses the
index on `item_id` in table b to see if there is a match and selects
col2 if found. I do not see how an index on table a could optimize
this process.
The DBA also claims that because it would do lookups out of order, it
would have to "seek" more. I don't think one can assume which way it
would be faster - it all depends on how the data is physically laid out
on disk, which we don't know, so no assumptions can be made either way.
He also mentioned having to move the row pointer more; I do not know
enough about the innards of mysql to make a comment on this.
EXPLAIN seems to support my reasoning: I created a copy of table a and
added a primary key on it; EXPLAIN says it will scan the same number of
rows in both cases - all of them:
mysql> EXPLAIN SELECT col2 FROM a JOIN b USING(`item_id`);
+----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len
| ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
| 1 | SIMPLE | a | ALL | NULL | NULL | NULL
| NULL | 38364373 | |
| 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4
| a.item_id | 1 | Using where |
+----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
2 rows in set (0.00 sec)
mysql> EXPLAIN SELECT col2 FROM a_indexed JOIN b USING(`item_id`);
+----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
| 1 | SIMPLE | a_indexed | index | PRIMARY | PRIMARY | 4
| NULL | 38364373 | Using index |
| 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4
| a_indexed.item_id | 1 | Using where |
+----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
2 rows in set (0.01 sec)
The DBA said that EXPLAIN is just an approximation. While I agree with
this statement, I don't see how it applies to this problem.
Am I wrong? Am I right? Please enlighten me (with a detailed technical
explanation).
P.S. The names of the tables & columns have been changed for privacy.
Duke Ionescu Guest
-
index.php
I am knew to all of this. When ever I try to edit my web page, it brings up index.php. No image comes up with this file, you can view some code. ... -
Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: index
Hey Folks,(New to .NET) This is driving me NUTZ... If anyone out there can resolve this from me I would greatly appreciate it... Line 238: Line... -
Index Topics and Index References
Ok here is my problem. I Generate my Index and it keeps coming up with the original index i created. I went through all the chapters in my manual... -
Newb query: index.htm & index.php & the server default
The problem I'm trying to solve is as follows: The website has two subdirectories: /ordinary and /phpstuff. Users typing hostname/ordinary get the... -
Create index VS Set Index Enabled - IDS 7.31
Hi List, I am in the process of re-doing my production DB. One of the step I decide to perform was to "Set index enabled" instead of of... -
Jerry Stuckle #2
Re: Settle a debate - to index or not to index?
Duke Ionescu wrote:
I would argue that your DBA is correct, When you select from a_indexed, an> I have 2 tables:
>
> mysql> desc a;
> +---------+------------------+------+-----+---------+-------+
> | Field | Type | Null | Key | Default | Extra |
> +---------+------------------+------+-----+---------+-------+
> | item_id | int(10) unsigned | NO | | 0 | |
> +---------+------------------+------+-----+---------+-------+
> 1 row in set (0.00 sec)
>
> mysql> select count(*) from a;
> +----------+
> | count(*) |
> +----------+
> | 38364373 |
> +----------+
> 1 row in set (0.00 sec)
>
> mysql> desc b;
> +---------+---------------------+------+-----+---------+----------------+
> | Field | Type | Null | Key | Default | Extra
> |
> +---------+---------------------+------+-----+---------+----------------+
> | item_id | int(10) unsigned | NO | PRI | NULL | auto_increment
> |
> | col2 | int(10) unsigned | NO | MUL | |
> |
> +---------+---------------------+------+-----+---------+----------------+
> 5 rows in set (0.01 sec)
>
> mysql> show index from b;
> +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
> | Table | Non_unique | Key_name | Seq_in_index | Column_name |
> Collation | Cardinality | Sub_part | Packed | Null | Index_type |
> Comment |
> +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
> | b | 0 | PRIMARY | 1 | item_id | A
> | 60688657 | NULL | NULL | | BTREE | |
> | b | 1 | col2 | 1 | col2 | A
> | NULL | NULL | NULL | | BTREE | |
> +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
> 3 rows in set (0.00 sec)
>
> mysql> select count(*) from b;
> +----------+
> | count(*) |
> +----------+
> | 60688657 |
> +----------+
> 1 row in set (0.00 sec)
>
> There are no duplicates in table a. I want to run the following query:
> SELECT col2 FROM a JOIN b USING(`item_id`)
>
> I am arguing that an index (a primary key) on the only column in table
> a will not make any difference whatsoever. Our DBA is telling me
> otherwise, claiming that w/o an index "it will go thru the whole table
> every time it can't find a matching row." It is not clear to me what
> he is trying to say; I disagree that it will be any slower to run the
> query w/o an index on table a.
>
> I see mysql doing the following: going thru each row in table a using
> natural row order and joining table b on it. On the join it uses the
> index on `item_id` in table b to see if there is a match and selects
> col2 if found. I do not see how an index on table a could optimize
> this process.
>
> The DBA also claims that because it would do lookups out of order, it
> would have to "seek" more. I don't think one can assume which way it
> would be faster - it all depends on how the data is physically laid out
> on disk, which we don't know, so no assumptions can be made either way.
> He also mentioned having to move the row pointer more; I do not know
> enough about the innards of mysql to make a comment on this.
>
> EXPLAIN seems to support my reasoning: I created a copy of table a and
> added a primary key on it; EXPLAIN says it will scan the same number of
> rows in both cases - all of them:
> mysql> EXPLAIN SELECT col2 FROM a JOIN b USING(`item_id`);
> +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
> | id | select_type | table | type | possible_keys | key | key_len
> | ref | rows | Extra |
> +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
> | 1 | SIMPLE | a | ALL | NULL | NULL | NULL
> | NULL | 38364373 | |
> | 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4
> | a.item_id | 1 | Using where |
> +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
> 2 rows in set (0.00 sec)
>
> mysql> EXPLAIN SELECT col2 FROM a_indexed JOIN b USING(`item_id`);
> +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
> | id | select_type | table | type | possible_keys | key |
> key_len | ref | rows | Extra |
> +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
> | 1 | SIMPLE | a_indexed | index | PRIMARY | PRIMARY | 4
> | NULL | 38364373 | Using index |
> | 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4
> | a_indexed.item_id | 1 | Using where |
> +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
> 2 rows in set (0.01 sec)
>
> The DBA said that EXPLAIN is just an approximation. While I agree with
> this statement, I don't see how it applies to this problem.
>
> Am I wrong? Am I right? Please enlighten me (with a detailed technical
> explanation).
>
>
> P.S. The names of the tables & columns have been changed for privacy.
>
index would allow you to select in numeric order. As the join values would now
be in numeric order, there would potentially be less I/O on table b.
For instance - with an index it could select in order of 1,2,3,4, etc. But
without the index it could select in the order of 25694, 30125, 81320, 301976, etc.
The first would be able to use many of the rows already buffered by the index in
b. The second would have to select several different blocks from the index for b.
Now - if your buffers are big enough to hold all the indexes, than probably
MySQLr would eventually find everything in the buffer (you'd only have to read
each block of the index once). But if your buffers are too small, the second
may have to re-read the same block several times.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
[email]jstucklex@attglobal.net[/email]
==================
Jerry Stuckle Guest



Reply With Quote

