MySQL not being able to utilize a compound index?

May 7, 2009

I came today upon a very strange issue. It seems like MySQL is unable to utilize a compound index when evaluating a plan for a query with a range condition. I'm looking for an explanation. I'll appreciate any insight on this.

Take a look at the following table:

  `a` int(11) NOT NULL default '0',
  `b` int(11) NOT NULL default '0',
  `c` int(11) default NULL,
  PRIMARY KEY  (`a`,`b`)

Filled with this data:

mysql> SELECT * FROM t;
| a | b | c    |
| 1 | 1 | NULL |
| 1 | 2 | NULL |
| 1 | 3 | NULL |
| 1 | 4 | NULL |
| 1 | 5 | NULL |
| 2 | 1 | NULL |
| 2 | 2 | NULL |
| 2 | 3 | NULL |
| 2 | 4 | NULL |
| 2 | 5 | NULL |
10 rows in set (0.00 sec)

Now, it is known that I can query by tuples:

mysql> SELECT * FROM t WHERE (a,b) < (2,2);
| a | b | c    |
| 1 | 1 | NULL |
| 1 | 2 | NULL |
| 1 | 3 | NULL |
| 1 | 4 | NULL |
| 1 | 5 | NULL |
| 2 | 1 | NULL |
6 rows in set (0.00 sec)

MySQL understands tuple comparison (e.g. (a,b) < (2.2)) and returns correct results. Now here's my issue: I would assume the PRIMARY KEY is used - since it's on (a,b) - so that's a simple (well, compound) range condition. Alas:

mysql> EXPLAIN SELECT * FROM t WHERE (a,b) < (2,2)G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10
        Extra: Using where
1 row in set (0.00 sec)

We get a full table scan! Now, MySQL has no problem when I do an equality search (e.g. (a,b) = (2,2)). In that case, the PRIMARY KEY is utilized.

Why, then, would it not use it in the range query? Why would it not consider it as a possible key (I could live with FORCE KEY if that would solve the problem). This problem undermines some efforts of mine for nice optimization tricks.

[UPDATE: the above is a simplified version of a very large table I was using (~50M rows, ~30GB), and on which same results were achieved]

Any ideas or suggestions?

  • Hi Shlomi!

    I think this is because you have such a smalll dataset - just 10 wee rows.

    Make it a some hundred thousends (or tens of thousends) and you should see the index being used.

  • It all boils to to optimizer statistics.

    You will get the same result if you do:
    select * from t where a<2;

    This is because your range scan reads the entire table. It is faster to read the entire table with a FTS than it is to read the full table by index.

    Do you get a range scan when you do this?
    select * from t where (a,b) < (1,2);

    And do you get an index scan (covering index) when you do this?
    select a,b from t where (a,b) < (2,2);

    If the answer to the last two questions is YES, then the optimizer is doing what it is supposed to do.

  • Hi Roland!

    Thanks for your comment. I actually found the problem out on a 30GB (~50M rows) table.
    You were suggesting that MySQL preferred a full table scan - but that is not the case, since it would not follow FORCE INDEX, and at any case, does not provide with "possible_keys" on EXPLAIN.

  • Justin,

    Thanks. Please see my comment for Roland. If the case was as you suggest, then FORCE INDEX would have done the job - but it does not.
    This also happens on very large tables.

  • This is to emphasize:
    The above happens on very large tables, where the search condition is known to be very selective.

    If I translate as ...WHERE (a < 2) OR (a = 2 AND b < 2), this translates to a PRIMARY KEY range search. At current, this is the only solution I see, but I'm a bit skeptical whether the optimizer would keep up if I had a 3 columns (uncommon) compound key, 4 columns (though rare) etc.

  • Also, I'm a little unsure what mean with row comparators.

    select * from t where (a,b) < (2,2) equivalent to:

    select * from t where a<2 and b<2
    select * from t where a<=2 and b<2

    in other words:

    is (2,1) < (2,2) ?

  • Harrison Fisk

    I'm pretty sure that row comparison operators just can't use indexes in MySQL.

    If you check out:

    You see this gem:

    * Row constructors are not well optimized. The following two expressions are equivalent, but only the second can be optimized:

    (col1, col2, ...) = (val1, val2, ...)
    col1 = val1 AND col2 = val2 AND ...

  • Matthew Montgomery

    Possible work around is to make sure you have a covering index when doing ROW() comparisons.

    mysql> desc SELECT * FROM t WHERE (a,b) > (2,2);
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    | 1 | SIMPLE | t | ALL | NULL | NULL | NULL | NULL | 37 | Using where |
    1 row in set (0.00 sec)

    mysql> alter table t add index (a,b,c);
    Query OK, 37 rows affected (0.08 sec)
    Records: 37 Duplicates: 0 Warnings: 0

    mysql> desc SELECT * FROM t WHERE (a,b) > (2,2);
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    | 1 | SIMPLE | t | index | NULL | a | 13 | NULL | 37 | Using where; Using index |
    1 row in set (0.00 sec)

  • hartmut

    I *think* we have a bug report for that already, can't find it in the bugs database though ...

  • Row comparators simply aren't very well-optimized in MySQL yet. This is a documented limitation, although if memory serves it's actually mentioned as a subquery limitation.

    Some cases are handled properly (equality checks should behave in 5.1). Others you simply have to expand (ie: WHERE (a<2) OR (a=2 AND b<2)).

  • Hi Justin,

    the syntax (a,b) < (x,y)


    WHEN a = x THEN
    WHEN b < y THEN TRUE

  • darn...something is not well with literal <

    me tries again....

    (a,b) < (x,y) is equivalent to:

    WHEN a < x THEN TRUE
    WHEN b < y THEN TRUE

  • @Matthew,

    This doesn't really change much - you just get a full index scan instead of a full table scan. If I need to get an entire row, I get no advantage. Thanks!

  • @Harrison, @Dean

    I don't think this query can be categorized as "subquery" in any way, or else the optimizer is very wrong.
    Do you find that there's a "sub" here?

  • 1) Have you tried it on 5.1 version of MySQL?
    2) Do you get the same explain plan with and without "limit" ?
    3) What is the actual time that it took to return the results?

  • Harrison Fisk

    The query isn't a subquery, however for some reason that is where the docs mentions the restriction about row operators. I suspect it was because the row operators were added at the same time as subqueries.

    Really, the restriction should be put somewhere else in the documentation, such as the index section or even where row operators are mentioned in the docs.

  • @Shantanu,

    1. Haven't tried this with 5.1
    2. I do
    3. Unlimited (killed query after realizing it was running for a few minutes)

  • @Harrison,

    I agree. Thanks for the reference!

  • Shlomi, as others said -- this is just an optimizer weakness at the moment. Peter posted on this a while ago, too:

  • Pingback: Log Buffer #146: a Carnival of the Vanities for DBAs | Pythian Group Blog()

  • (a, b) < (...) is notorious for being poorly optimized.

    The optimal way to do the query is not

    (a<2) OR (a=2 AND b<2)

    which involves OR at the top level, hence probably a table scan. Instead, use this equivalent:

    (a<=2) AND (a<2 OR b<2)

    (Caveat: When there is a huge number of rows with a=2, it will still scan all the a=2 values.)

  • A further note. (HINT to developers)

    It should be easy to transform the parse tree for

    WHERE (a, b) <* (x, y)


    a <= x AND (a < x OR b <* y)


    * <* represents either < or <=

    * other operators should have similar transformations

    * (a,b,c) < ... -- also possible, but much messier (exercise for the reader)
    * This is useful only if you have an index starting with `a`. (Only `a` will be used, so including `b` is optional.)

    * The index will be scanned _through_ a=x, regardless of b and y.

  • @Rick,
    There is no table scan. Although there is this "OR", the optimizer works it out well.

  • @Rick,
    The "(a<=2) AND (a<2 OR b<2)" is a cool transformation. I need to test this.

  • Pingback: MySql is not optimizing the query properly | XL-UAT()

Powered by Wordpress and MySQL. Theme by