I came today upon a very strange issue. It seems like MySQL is unable to utilize a compound index when evaluating a plan for a query with a range condition. I’m looking for an explanation. I’ll appreciate any insight on this.
Take a look at the following table:
CREATE TABLE `t` ( `a` int(11) NOT NULL default '0', `b` int(11) NOT NULL default '0', `c` int(11) default NULL, PRIMARY KEY (`a`,`b`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
Filled with this data:
mysql> SELECT * FROM t; +---+---+------+ | a | b | c | +---+---+------+ | 1 | 1 | NULL | | 1 | 2 | NULL | | 1 | 3 | NULL | | 1 | 4 | NULL | | 1 | 5 | NULL | | 2 | 1 | NULL | | 2 | 2 | NULL | | 2 | 3 | NULL | | 2 | 4 | NULL | | 2 | 5 | NULL | +---+---+------+ 10 rows in set (0.00 sec)
Now, it is known that I can query by tuples:
mysql> SELECT * FROM t WHERE (a,b) < (2,2); +---+---+------+ | a | b | c | +---+---+------+ | 1 | 1 | NULL | | 1 | 2 | NULL | | 1 | 3 | NULL | | 1 | 4 | NULL | | 1 | 5 | NULL | | 2 | 1 | NULL | +---+---+------+ 6 rows in set (0.00 sec)
MySQL understands tuple comparison (e.g. (a,b) < (2.2)) and returns correct results. Now here’s my issue: I would assume the PRIMARY KEY is used – since it’s on (a,b) – so that’s a simple (well, compound) range condition. Alas:
mysql> EXPLAIN SELECT * FROM t WHERE (a,b) < (2,2)G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 10 Extra: Using where 1 row in set (0.00 sec)
We get a full table scan! Now, MySQL has no problem when I do an equality search (e.g. (a,b) = (2,2)). In that case, the PRIMARY KEY is utilized.
Why, then, would it not use it in the range query? Why would it not consider it as a possible key (I could live with FORCE KEY if that would solve the problem). This problem undermines some efforts of mine for nice optimization tricks.
[UPDATE: the above is a simplified version of a very large table I was using (~50M rows, ~30GB), and on which same results were achieved]
Any ideas or suggestions?
Hi Shlomi!
I think this is because you have such a smalll dataset – just 10 wee rows.
Make it a some hundred thousends (or tens of thousends) and you should see the index being used.
It all boils to to optimizer statistics.
You will get the same result if you do:
select * from t where a<2;
This is because your range scan reads the entire table. It is faster to read the entire table with a FTS than it is to read the full table by index.
Do you get a range scan when you do this?
select * from t where (a,b) < (1,2);
And do you get an index scan (covering index) when you do this?
select a,b from t where (a,b) < (2,2);
If the answer to the last two questions is YES, then the optimizer is doing what it is supposed to do.
Hi Roland!
Thanks for your comment. I actually found the problem out on a 30GB (~50M rows) table.
You were suggesting that MySQL preferred a full table scan – but that is not the case, since it would not follow FORCE INDEX, and at any case, does not provide with “possible_keys” on EXPLAIN.
Justin,
Thanks. Please see my comment for Roland. If the case was as you suggest, then FORCE INDEX would have done the job – but it does not.
This also happens on very large tables.
This is to emphasize:
The above happens on very large tables, where the search condition is known to be very selective.
If I translate as …WHERE (a < 2) OR (a = 2 AND b < 2), this translates to a PRIMARY KEY range search. At current, this is the only solution I see, but I'm a bit skeptical whether the optimizer would keep up if I had a 3 columns (uncommon) compound key, 4 columns (though rare) etc.