MySQL not being able to utilize a compound index?

I came today upon a very strange issue. It seems like MySQL is unable to utilize a compound index when evaluating a plan for a query with a range condition. I’m looking for an explanation. I’ll appreciate any insight on this.

Take a look at the following table:

CREATE TABLE `t` (
  `a` int(11) NOT NULL default '0',
  `b` int(11) NOT NULL default '0',
  `c` int(11) default NULL,
  PRIMARY KEY  (`a`,`b`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Filled with this data:

mysql> SELECT * FROM t;
+---+---+------+
| a | b | c    |
+---+---+------+
| 1 | 1 | NULL |
| 1 | 2 | NULL |
| 1 | 3 | NULL |
| 1 | 4 | NULL |
| 1 | 5 | NULL |
| 2 | 1 | NULL |
| 2 | 2 | NULL |
| 2 | 3 | NULL |
| 2 | 4 | NULL |
| 2 | 5 | NULL |
+---+---+------+
10 rows in set (0.00 sec)

Now, it is known that I can query by tuples:

mysql> SELECT * FROM t WHERE (a,b) < (2,2);
+---+---+------+
| a | b | c    |
+---+---+------+
| 1 | 1 | NULL |
| 1 | 2 | NULL |
| 1 | 3 | NULL |
| 1 | 4 | NULL |
| 1 | 5 | NULL |
| 2 | 1 | NULL |
+---+---+------+
6 rows in set (0.00 sec)

MySQL understands tuple comparison (e.g. (a,b) < (2.2)) and returns correct results. Now here’s my issue: I would assume the PRIMARY KEY is used – since it’s on (a,b) – so that’s a simple (well, compound) range condition. Alas:

mysql> EXPLAIN SELECT * FROM t WHERE (a,b) < (2,2)G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10
        Extra: Using where
1 row in set (0.00 sec)

We get a full table scan! Now, MySQL has no problem when I do an equality search (e.g. (a,b) = (2,2)). In that case, the PRIMARY KEY is utilized.

Why, then, would it not use it in the range query? Why would it not consider it as a possible key (I could live with FORCE KEY if that would solve the problem). This problem undermines some efforts of mine for nice optimization tricks.

[UPDATE: the above is a simplified version of a very large table I was using (~50M rows, ~30GB), and on which same results were achieved]

Any ideas or suggestions?

25
Leave a Reply

avatar
25 Comment threads
0 Thread replies
0 Followers
 
Most reacted comment
Hottest comment thread
10 Comment authors
Rick JamesXaprbShantanu OakDean Ellishartmut Recent comment authors

This site uses Akismet to reduce spam. Learn how your comment data is processed.

  Subscribe  
Notify of
Roland Bouman
Guest

Hi Shlomi!

I think this is because you have such a smalll dataset – just 10 wee rows.

Make it a some hundred thousends (or tens of thousends) and you should see the index being used.

Justin Swanhart
Guest

It all boils to to optimizer statistics. You will get the same result if you do: select * from t where a<2; This is because your range scan reads the entire table. It is faster to read the entire table with a FTS than it is to read the full table by index. Do you get a range scan when you do this? select * from t where (a,b) < (1,2); And do you get an index scan (covering index) when you do this? select a,b from t where (a,b) < (2,2); If the answer to the last two questions… Read more »

Justin Swanhart
Guest

Also, I’m a little unsure what mean with row comparators.

is
select * from t where (a,b) < (2,2) equivalent to:

select * from t where a<2 and b<2
OR
select * from t where a<=2 and b<2

in other words:

is (2,1) < (2,2) ?

Harrison Fisk
Guest
Harrison Fisk

I’m pretty sure that row comparison operators just can’t use indexes in MySQL.

If you check out:

http://dev.mysql.com/doc/refman/5.0/en/subquery-restrictions.html

You see this gem:

* Row constructors are not well optimized. The following two expressions are equivalent, but only the second can be optimized:

(col1, col2, …) = (val1, val2, …)
col1 = val1 AND col2 = val2 AND …

Matthew Montgomery
Guest
Matthew Montgomery

Possible work around is to make sure you have a covering index when doing ROW() comparisons. mysql> desc SELECT * FROM t WHERE (a,b) > (2,2); +—-+————-+——-+——+—————+——+———+——+——+————-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +—-+————-+——-+——+—————+——+———+——+——+————-+ | 1 | SIMPLE | t | ALL | NULL | NULL | NULL | NULL | 37 | Using where | +—-+————-+——-+——+—————+——+———+——+——+————-+ 1 row in set (0.00 sec) mysql> alter table t add index (a,b,c); Query OK, 37 rows affected (0.08 sec) Records: 37 Duplicates: 0 Warnings: 0 mysql> desc… Read more »

hartmut
Guest
hartmut

I *think* we have a bug report for that already, can’t find it in the bugs database though …

Dean Ellis
Guest

Row comparators simply aren’t very well-optimized in MySQL yet. This is a documented limitation, although if memory serves it’s actually mentioned as a subquery limitation.

Some cases are handled properly (equality checks should behave in 5.1). Others you simply have to expand (ie: WHERE (a<2) OR (a=2 AND b<2)).

Roland Bouman
Guest

Hi Justin,

the syntax (a,b) < (x,y)

means:

CASE
WHEN a > x THEN FALSE
WHEN a = x THEN
CASE
WHEN b < y THEN TRUE
ELSE FALSE
END
END

Roland Bouman
Guest

darn…something is not well with literal <

me tries again….

(a,b) < (x,y) is equivalent to:

CASE
WHEN a > x THEN FALSE
WHEN a < x THEN TRUE
ELSE CASE
WHEN b < y THEN TRUE
ELSE FALSE
END
END

Shantanu Oak
Guest

1) Have you tried it on 5.1 version of MySQL?
2) Do you get the same explain plan with and without “limit” ?
3) What is the actual time that it took to return the results?

Harrison Fisk
Guest
Harrison Fisk

The query isn’t a subquery, however for some reason that is where the docs mentions the restriction about row operators. I suspect it was because the row operators were added at the same time as subqueries.

Really, the restriction should be put somewhere else in the documentation, such as the index section or even where row operators are mentioned in the docs.

Xaprb
Guest

Shlomi, as others said — this is just an optimizer weakness at the moment. Peter posted on this a while ago, too: http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/

trackback

[…] code.openark.org, Shlomi Noach writes It seems like MySQL is unable to utilize a compound index when evaluating a plan for a query with a range condition. I’m looking for an explanation. I’ll […]

Rick James
Guest

(a, b) < (…) is notorious for being poorly optimized.

The optimal way to do the query is not

(a<2) OR (a=2 AND b<2)

which involves OR at the top level, hence probably a table scan. Instead, use this equivalent:

(a<=2) AND (a<2 OR b<2)

(Caveat: When there is a huge number of rows with a=2, it will still scan all the a=2 values.)

Rick James
Guest

A further note. (HINT to developers)

It should be easy to transform the parse tree for

WHERE (a, b) <* (x, y)

into

a <= x AND (a < x OR b <* y)

Notes:

* <* represents either < or <=

* other operators should have similar transformations

* (a,b,c) < … — also possible, but much messier (exercise for the reader)
* This is useful only if you have an index starting with `a`. (Only `a` will be used, so including `b` is optional.)

* The index will be scanned _through_ a=x, regardless of b and y.

trackback

[…] There’s Bug #35819, which I originally found in this article, which was in turn mentioned in the comments on this post. […]