MySQL not being able to utilize a compound index?

I came today upon a very strange issue. It seems like MySQL is unable to utilize a compound index when evaluating a plan for a query with a range condition. I’m looking for an explanation. I’ll appreciate any insight on this.

Take a look at the following table:

CREATE TABLE `t` (
  `a` int(11) NOT NULL default '0',
  `b` int(11) NOT NULL default '0',
  `c` int(11) default NULL,
  PRIMARY KEY  (`a`,`b`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Filled with this data:

mysql> SELECT * FROM t;
+---+---+------+
| a | b | c    |
+---+---+------+
| 1 | 1 | NULL |
| 1 | 2 | NULL |
| 1 | 3 | NULL |
| 1 | 4 | NULL |
| 1 | 5 | NULL |
| 2 | 1 | NULL |
| 2 | 2 | NULL |
| 2 | 3 | NULL |
| 2 | 4 | NULL |
| 2 | 5 | NULL |
+---+---+------+
10 rows in set (0.00 sec)

Now, it is known that I can query by tuples:

mysql> SELECT * FROM t WHERE (a,b) < (2,2);
+---+---+------+
| a | b | c    |
+---+---+------+
| 1 | 1 | NULL |
| 1 | 2 | NULL |
| 1 | 3 | NULL |
| 1 | 4 | NULL |
| 1 | 5 | NULL |
| 2 | 1 | NULL |
+---+---+------+
6 rows in set (0.00 sec)

MySQL understands tuple comparison (e.g. (a,b) < (2.2)) and returns correct results. Now here’s my issue: I would assume the PRIMARY KEY is used – since it’s on (a,b) – so that’s a simple (well, compound) range condition. Alas:

mysql> EXPLAIN SELECT * FROM t WHERE (a,b) < (2,2)G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10
        Extra: Using where
1 row in set (0.00 sec)

We get a full table scan! Now, MySQL has no problem when I do an equality search (e.g. (a,b) = (2,2)). In that case, the PRIMARY KEY is utilized.

Why, then, would it not use it in the range query? Why would it not consider it as a possible key (I could live with FORCE KEY if that would solve the problem). This problem undermines some efforts of mine for nice optimization tricks.

[UPDATE: the above is a simplified version of a very large table I was using (~50M rows, ~30GB), and on which same results were achieved]

Any ideas or suggestions?

25 thoughts on “MySQL not being able to utilize a compound index?”

Rick James says:

March 7, 2012 at 12:43 am

(a, b) < (…) is notorious for being poorly optimized.

The optimal way to do the query is not

(a<2) OR (a=2 AND b<2)

which involves OR at the top level, hence probably a table scan. Instead, use this equivalent:

(a<=2) AND (a<2 OR b<2)

(Caveat: When there is a huge number of rows with a=2, it will still scan all the a=2 values.)
Rick James says:

March 7, 2012 at 1:04 am

A further note. (HINT to developers)

It should be easy to transform the parse tree for

WHERE (a, b) <* (x, y)

into

a <= x AND (a < x OR b <* y)

Notes:

* <* represents either < or <=

* other operators should have similar transformations

* (a,b,c) < … — also possible, but much messier (exercise for the reader)
* This is useful only if you have an index starting with `a`. (Only `a` will be used, so including `b` is optional.)

* The index will be scanned _through_ a=x, regardless of b and y.
shlomi says:

March 7, 2012 at 7:12 am

@Rick,
There is no table scan. Although there is this “OR”, the optimizer works it out well.
shlomi says:

March 7, 2012 at 7:14 am

@Rick,
The “(a<=2) AND (a<2 OR b<2)" is a cool transformation. I need to test this.
Pingback: MySql is not optimizing the query properly | XL-UAT

This site uses Akismet to reduce spam. Learn how your comment data is processed.

25 thoughts on “MySQL not being able to utilize a compound index?”

Leave a Reply