MySQL not being able to utilize a compound index?

May 7, 2009

I came today upon a very strange issue. It seems like MySQL is unable to utilize a compound index when evaluating a plan for a query with a range condition. I'm looking for an explanation. I'll appreciate any insight on this.

Take a look at the following table:

CREATE TABLE `t` (
  `a` int(11) NOT NULL default '0',
  `b` int(11) NOT NULL default '0',
  `c` int(11) default NULL,
  PRIMARY KEY  (`a`,`b`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Filled with this data:

mysql> SELECT * FROM t;
+---+---+------+
| a | b | c    |
+---+---+------+
| 1 | 1 | NULL |
| 1 | 2 | NULL |
| 1 | 3 | NULL |
| 1 | 4 | NULL |
| 1 | 5 | NULL |
| 2 | 1 | NULL |
| 2 | 2 | NULL |
| 2 | 3 | NULL |
| 2 | 4 | NULL |
| 2 | 5 | NULL |
+---+---+------+
10 rows in set (0.00 sec)

Now, it is known that I can query by tuples:

mysql> SELECT * FROM t WHERE (a,b) < (2,2);
+---+---+------+
| a | b | c    |
+---+---+------+
| 1 | 1 | NULL |
| 1 | 2 | NULL |
| 1 | 3 | NULL |
| 1 | 4 | NULL |
| 1 | 5 | NULL |
| 2 | 1 | NULL |
+---+---+------+
6 rows in set (0.00 sec)

MySQL understands tuple comparison (e.g. (a,b) < (2.2)) and returns correct results. Now here's my issue: I would assume the PRIMARY KEY is used - since it's on (a,b) - so that's a simple (well, compound) range condition. Alas:

mysql> EXPLAIN SELECT * FROM t WHERE (a,b) < (2,2)G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10
        Extra: Using where
1 row in set (0.00 sec)

We get a full table scan! Now, MySQL has no problem when I do an equality search (e.g. (a,b) = (2,2)). In that case, the PRIMARY KEY is utilized.

Why, then, would it not use it in the range query? Why would it not consider it as a possible key (I could live with FORCE KEY if that would solve the problem). This problem undermines some efforts of mine for nice optimization tricks.

[UPDATE: the above is a simplified version of a very large table I was using (~50M rows, ~30GB), and on which same results were achieved]

Any ideas or suggestions?

tags:
posted in MySQL by shlomi

« | »

Follow comments via the RSS Feed | Leave a comment | Trackback URL

24 Comments to "MySQL not being able to utilize a compound index?"

  1. Roland Bouman wrote:

    Hi Shlomi!

    I think this is because you have such a smalll dataset - just 10 wee rows.

    Make it a some hundred thousends (or tens of thousends) and you should see the index being used.

  2. Justin Swanhart wrote:

    It all boils to to optimizer statistics.

    You will get the same result if you do:
    select * from t where a<2;

    This is because your range scan reads the entire table. It is faster to read the entire table with a FTS than it is to read the full table by index.

    Do you get a range scan when you do this?
    select * from t where (a,b) < (1,2);

    And do you get an index scan (covering index) when you do this?
    select a,b from t where (a,b) < (2,2);

    If the answer to the last two questions is YES, then the optimizer is doing what it is supposed to do.

  3. shlomi wrote:

    Hi Roland!

    Thanks for your comment. I actually found the problem out on a 30GB (~50M rows) table.
    You were suggesting that MySQL preferred a full table scan - but that is not the case, since it would not follow FORCE INDEX, and at any case, does not provide with "possible_keys" on EXPLAIN.

  4. shlomi wrote:

    Justin,

    Thanks. Please see my comment for Roland. If the case was as you suggest, then FORCE INDEX would have done the job - but it does not.
    This also happens on very large tables.

  5. shlomi wrote:

    This is to emphasize:
    The above happens on very large tables, where the search condition is known to be very selective.

    If I translate as ...WHERE (a < 2) OR (a = 2 AND b < 2), this translates to a PRIMARY KEY range search.
    At current, this is the only solution I see, but I'm a bit skeptical whether the optimizer would keep up if I had a 3 columns (uncommon) compound key, 4 columns (though rare) etc.

  6. Justin Swanhart wrote:

    Also, I'm a little unsure what mean with row comparators.

    is
    select * from t where (a,b) < (2,2) equivalent to:

    select * from t where a<2 and b<2
    OR
    select * from t where a<=2 and b<2

    in other words:

    is (2,1) < (2,2) ?

  7. Harrison Fisk wrote:

    I'm pretty sure that row comparison operators just can't use indexes in MySQL.

    If you check out:

    http://dev.mysql.com/doc/refman/5.0/en/subquery-restrictions.html

    You see this gem:

    * Row constructors are not well optimized. The following two expressions are equivalent, but only the second can be optimized:

    (col1, col2, ...) = (val1, val2, ...)
    col1 = val1 AND col2 = val2 AND ...

  8. Matthew Montgomery wrote:

    Possible work around is to make sure you have a covering index when doing ROW() comparisons.

    mysql> desc SELECT * FROM t WHERE (a,b) > (2,2);
    +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
    | 1 | SIMPLE | t | ALL | NULL | NULL | NULL | NULL | 37 | Using where |
    +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
    1 row in set (0.00 sec)

    mysql> alter table t add index (a,b,c);
    Query OK, 37 rows affected (0.08 sec)
    Records: 37 Duplicates: 0 Warnings: 0

    mysql> desc SELECT * FROM t WHERE (a,b) > (2,2);
    +----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
    | 1 | SIMPLE | t | index | NULL | a | 13 | NULL | 37 | Using where; Using index |
    +----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
    1 row in set (0.00 sec)

  9. hartmut wrote:

    I *think* we have a bug report for that already, can't find it in the bugs database though ...

  10. Dean Ellis wrote:

    Row comparators simply aren't very well-optimized in MySQL yet. This is a documented limitation, although if memory serves it's actually mentioned as a subquery limitation.

    Some cases are handled properly (equality checks should behave in 5.1). Others you simply have to expand (ie: WHERE (a<2) OR (a=2 AND b<2)).

  11. Roland Bouman wrote:

    Hi Justin,

    the syntax (a,b) < (x,y)

    means:

    CASE
    WHEN a > x THEN FALSE
    WHEN a = x THEN
    CASE
    WHEN b < y THEN TRUE
    ELSE FALSE
    END
    END

  12. Roland Bouman wrote:

    darn...something is not well with literal <

    me tries again....

    (a,b) < (x,y) is equivalent to:

    CASE
    WHEN a > x THEN FALSE
    WHEN a < x THEN TRUE
    ELSE CASE
    WHEN b < y THEN TRUE
    ELSE FALSE
    END
    END

  13. shlomi wrote:

    @Matthew,

    This doesn't really change much - you just get a full index scan instead of a full table scan. If I need to get an entire row, I get no advantage. Thanks!

  14. shlomi wrote:

    @Harrison, @Dean

    I don't think this query can be categorized as "subquery" in any way, or else the optimizer is very wrong.
    Do you find that there's a "sub" here?

  15. Shantanu Oak wrote:

    1) Have you tried it on 5.1 version of MySQL?
    2) Do you get the same explain plan with and without "limit" ?
    3) What is the actual time that it took to return the results?

  16. Harrison Fisk wrote:

    The query isn't a subquery, however for some reason that is where the docs mentions the restriction about row operators. I suspect it was because the row operators were added at the same time as subqueries.

    Really, the restriction should be put somewhere else in the documentation, such as the index section or even where row operators are mentioned in the docs.

  17. shlomi wrote:

    @Shantanu,

    Thanks,
    1. Haven't tried this with 5.1
    2. I do
    3. Unlimited (killed query after realizing it was running for a few minutes)

  18. shlomi wrote:

    @Harrison,

    I agree. Thanks for the reference!

  19. Xaprb wrote:

    Shlomi, as others said -- this is just an optimizer weakness at the moment. Peter posted on this a while ago, too: http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/

  20. Log Buffer #146: a Carnival of the Vanities for DBAs | Pythian Group Blog wrote:

    [...] code.openark.org, Shlomi Noach writes It seems like MySQL is unable to utilize a compound index when evaluating a plan for a query with a range condition. I’m looking for an explanation. I’ll [...]

  21. Rick James wrote:

    (a, b) < (...) is notorious for being poorly optimized.

    The optimal way to do the query is not

    (a<2) OR (a=2 AND b<2)

    which involves OR at the top level, hence probably a table scan. Instead, use this equivalent:

    (a<=2) AND (a<2 OR b<2)

    (Caveat: When there is a huge number of rows with a=2, it will still scan all the a=2 values.)

  22. Rick James wrote:

    A further note. (HINT to developers)

    It should be easy to transform the parse tree for

    WHERE (a, b) <* (x, y)

    into

    a <= x AND (a < x OR b <* y)

    Notes:

    * <* represents either < or <=

    * other operators should have similar transformations

    * (a,b,c) < ... -- also possible, but much messier (exercise for the reader)
    * This is useful only if you have an index starting with `a`. (Only `a` will be used, so including `b` is optional.)

    * The index will be scanned _through_ a=x, regardless of b and y.

  23. shlomi wrote:

    @Rick,
    There is no table scan. Although there is this "OR", the optimizer works it out well.

  24. shlomi wrote:

    @Rick,
    The "(a<=2) AND (a<2 OR b<2)" is a cool transformation. I need to test this.

Leave Your Comment

 

 
Powered by Wordpress and MySQL. Theme by openark.org