7 ways to convince MySQL to use the right index

April 2, 2009

Sometimes MySQL gets it wrong. It doesn't use the right index.

It happens that MySQL generates a query plan which is really bad (EXPLAIN says it's going to explore some 10,000,000 rows), when another plan (soon to show how was generated) says: "Sure, I can do that with 100 rows using a key".

A true story

A customer had issues with his database. Queries were taking 15 minutes to complete, and the db in general was not responsive. Looking at the slow query log, I found the criminal query. Allow me to bring you up to speed:

A table is defined like this:

CREATE TABLE t (
  id INT UNSIGNED AUTO_INCREMENT,
  type INT UNSIGNED,
  level TINYINT unsigned,
  ...
  PRIMARY KEY(id),
  KEY `type` (type)
) ENGINE=InnoDB;

The offending query was this:

SELECT id FROM data
WHERE type=12345 AND level > 3
ORDER BY id

The facts were:

  • `t` has about 10,000,000 rows.
  • The index on `type` is selective: about 100 rows per value on average.
  • The query took a long time to complete.
  • EXPLAIN has shown that MySQL uses the PRIMARY KEY, hence searches 10,000,000 rows, filtered "using where".
  • The other EXPLAIN has shown that by using the `type` key, only 110 rows are expected, to be filtered "using where", then sorted "using filesort"

So MySQL acknowledged it was generating the wrong plan. The other plan was better by its own standards.

Solving the problem

Let's walk through 7 ways to solve the problem, starting with the more aggressive solutions, refining to achieve desired behavior through subtle changes.

Solution #1: OPTIMIZE

If MySQL got it wrong, it may be because the table was frequently changed. This affects the statistics. If we can spare the time (table is locked during that time), we could help out by rebuilding the table.

Solution #2: ANALYZE

ANALYZE TABLE is less time consuming, in particular on InnoDB, where it is barely noticed. An ANALYZE will update the index statistics and help out in generating better query plans.

But hold on, the above two solutions are fine, but in the given case, MySQL already acknowledges better plans are at hand. The fact was I tried to run ANALYZE a few times, to no avail.

Solution #3: USE INDEX

Since the issue was urgent, my first thought went for the ultimate weapon:

SELECT id FROM data USE INDEX(type)
WHERE type=12345 AND level > 3
ORDER BY id

This instructs MySQL to only consider the indexes listed; in our example, I only want MySQL to consider using the `type` index. It is using this method that generated the other (good) EXPLAIN result. I could have gone even more ruthless and ask for FORCE INDEX.

Solution #4: IGNORE INDEX

A similar approach would be to explicitly negate the use of the PRIMARY KEY, like this:

SELECT id FROM data IGNORE INDEX(PRIMARY)
WHERE type=12345 AND level > 3
ORDER BY id

A moment of thinking

The above solutions are "ugly", in the sense that this is not standard SQL. It's too MySQL specific.

I've asked the programmers to do a quick rewrite, and had a few moments to consider: why did MySQL insist on using the PRIMARY KEY. Was it because I've asked it for the `id` column only? I rewrote as follows:

SELECT id, type, level FROM data
WHERE type=12345 AND level > 3
ORDER BY id

Nope. EXPLAIN got me the same bad plan. Then it must be the ORDER BY clause:

SELECT id FROM data
WHERE type=12345 AND level > 3

Sure enough, EXPLAIN now  indicates using the `type` index, only reading 110 rows. So MySQL preferred to scan 10,000,000 rows, just so that the rows are generated in the right ORDER, and so no sorting is required, when it could have read 110 rows (where each row is a mere INT) and sort them in no time.

Armed with this knowledge, a few more options come at hand.

Solution #5:Move some logic to the application

At about that point I got a message that the programmers were unable to add the USE INDEX part. Why? They were using the EJB framework, which limits your SQL-like queries to something very generic. Well, you can always drop the ORDER BY part and sort on the application side. That isn't fun, but it's been done.

Solution #6: Negate use of PRIMARY KEY

Can we force MySQL to use the `type` index, retain the ORDER BY, and do it all with standard SQL? Sure. The following query does this:

SELECT id, type, level FROM data
WHERE type=12345 AND level > 3
ORDER BY id+0

id+0 is a function on the `id` column. This makes MySQL unable to utilize the PRIMARY KEY (or any other index on `id`, had there been one).

In his book "SQL Tuning", Dan Tow dedicates a chapter on hints and tips like the above. He shows how to control the use or non-use of indexes, the order by which subqueries are calculated, and more.

Unfortunately, the EJB specification said this was not allowed. You could not ORDER BY a fucntion. Only on normal column.

Solution #7: Make MySQL think the problem is harder than it really is

Almost out of options. Just a moment before settling for sorting on the application side, another issue can be considered: since MySQL was fooled once, can it be fooled again to make things right? Can we fool it to believe that the PRIMARY KEY would not be worthwhile to use? The following query does this:

SELECT id, type, level FROM data
WHERE type=12345 AND level > 3
ORDER BY id, type, level

Let's reflect on this one. What is the order by which the rows are returned now? Answer: exactly as before. Since `id` is PRIMARY KEY, it is also UNIQUE, so no two `id` values are the same. Therefore, the secondary sorting column is redudant, and so is the following one. We get exactly the same result as "ORDER BY id".

But MySQL didn't catch this. This query caused MySQL to say: "Mmmmm. 'ORDER BY id, type, level' is not doable with the PRIMARY KEY only. Well, in this case, I had better used the `type` index". Is this a weakness of MySQL? I guess so. Maybe it will be fixed in the future. But this was the fix that made the day.

tags: , , , ,
posted in MySQL by shlomi

« | »

Follow comments via the RSS Feed | Leave a comment | Trackback URL

30 Comments to "7 ways to convince MySQL to use the right index"

  1. Mark De Verno wrote:

    Very interesting read.

    Thank you.

  2. Antony Curtis wrote:

    Historically (ie, I haven't checked recently if true) given a set of available indexes, MySQL would use the index which is represented by the lowest bit in the index bitmask and MySQL is only able to choose from about 60 indexes, max.
    The primary index is always the first index (first bit). The remaining indexes are sorted in order of uniqueness and length. This index ordering is done at the table creation time.

    Which reminds me ... for a storage engine that I wrote for a past employer about 10 years ago, I had implemented a form of table discovery in MySQL which index 0 (usually primary key) was actually the recid ... and the remaining keys were in the order used by the native database. Since the recid was not a user-visible column, it was not generally possible to use it in a query. (I later implemented hacks to make it a virtual column which didn't show up on SHOW CREATE etc but it was not recommended to use because a rows recid can change when the row is modified).
    MySQL wrote out the frm files from the discovered tables without the use of the upper parsing/sorting layers of MySQL and so I was free to format the table layout however I liked, including having 'spare' space in the record and having 'overlap' fields.

    It worked very well. Fun times.

  3. Arjen Lentz wrote:

    Your logic in Solution#1.... the statistics the InnoDB subsystem and the optimiser has don't know about fragmentation, so it won't affect the choice of execution method.

  4. shlomi wrote:

    Arjen,

    Thanks. You are right, I didn't even mean defragmentation. I've corrected the text.

  5. Arjen Lentz wrote:

    @shlomi ye of course it'll update the stats, but OPTIMIZE TABLE is a very costly way of accomplishing that. As you noted, it'll lock the table for reads as well as writes for the duration as it's rewriting the entire table.

    To just update the stats, ANALYZE TABLE is the way to go, that was your method#2. the issue there is that InnoDB just takes a few index dives, so it can be a) statistically wrong and b) it has a higher chance of being wrong with a bigger dataset, as the # of dives is a constant. I think there's a Google or Percona patch to modify the # of dives (server-global), that can be useful when you know you have a big(ger) dataset.

  6. shlomi wrote:

    Arjen,

    Text updated. Obviously OPTIMIZE TABLE is a killer, which is why it is first in the list (going from crude to refined).

    Thanks for the info about having the number of dives configurable in google patches. I'm I'm not mistaken, it is 10 dives in InnoDB.

  7. Hannes wrote:

    My first guess would have been:

    ORDER BY type, id

    Have you tried this?

  8. shlomi wrote:

    Hi Hannes,

    I did try it, but the query plan still went for the PRIMARY key only.
    It's interesting to note that, on InnoDB tables, the key on 'type' actually includes the 'id' column implicitly, since secondary indexes contains the value of the primary keys.
    If the optimizer chooses to use the 'type' index on "ORDER BY type, id", it can do so using index only (however, in our case we also had a filter on the 'level' column, which negates this optimization)

  9. Aaron Suggs wrote:

    Another option is to get rid of the `type` index and add a multicolumn index on `type` and `id` (in that order, not `id` then `type`).

    With a multicolumn index, the plan for the original offending query would expect 110 rows, and wouldn't use filesort (since the index contains the sorted values).

  10. shlomi wrote:

    Hi Aaron,

    Please see my previous comment: on InnoDB tables, and when the PRIMARY KEY is `id`, the `type` index is implicitly an index on (`type`, `id`). This is because of the implementation InnoDB does for secondary indexes, which point to PRIMARY key values.

  11. Aaron Suggs wrote:

    Ah, you're right. Putting the primary key as the last column in a multicolumn index is redundant. I hadn't thought of that.

  12. nolan wrote:

    What if you broke it into a query and subquery? i.e.:

    SELECT id from (SELECT id FROM data
    WHERE type=12345 AND level > 3)
    ORDER BY id

    This should make MySQL ignore the primary key in the subquery. I guess you'll also lose the primary key in the main query, but that just means sorting a list of ~100 entries, so it's not so bad.

  13. shlomi wrote:

    Hi nolan,

    Yes, I think this should work also. Since I don't have that DB available to me anymore, and since I cannot reproduce the problem, I can't test...
    If indeed it works (and if EJB supports it) - I like it better than any of the above solutions.

    Thanks

  14. Interesting Items - April 27, 2009 | Oxidiser techblog wrote:

    [...] 7 ways to convince MySQL to use the right index [...]

  15. Artículos destacados de Abril 2009 | cambrico.net wrote:

    [...] formas de forzar el uso de un índice determinado, en openark.org. (en [...]

  16. Jim wrote:

    This bug may be fixed in 5.1.37: http://dev.mysql.com/doc/refman/5.1/en/news-5-1-37.html

    We had EXACTLY the same problem as described in this blog, and upgrading to 5.1.32 seems to have fixed it.

  17. Math.N wrote:

    ORDER BY NULL was the first solution you should have to try.

  18. How to Use Indexes In MySQL - MILOL Blog wrote:

    [...] http://code.openark.org/blog/mysql/7-ways-to-convince-mysql-to-use-the-right-index [...]

  19. pradeep jangid wrote:

    hi all,

    i select data from table and show on one page but page load take more time bcz table have 4 lac records

    so i what do for quickly fetch data from table plz tell me any solution for it immediately any one

    regards,

    pradeep kumar jangir

  20. shlomi wrote:

    @pradeep,

    Best if you ask in a proper forum.
    You should also try and explain your problem more clearly, it is impossible to answer like that. But, really, this is not the best place to ask questions of this sort.

  21. Bubba wrote:

    How to get MySql to use the right index.
    Step 1: Install Postgres.
    Step 2: Delete MySql.

  22. shlomi wrote:

    @Bubba:
    1. Shutdown computer
    2. Hammer computer
    3. Grow vegetables for living

  23. @shlomi wrote:

    @shlomi:

    Step 1 may be redundant, given step 2. Further testing needed.

  24. Baron Schwartz wrote:

    I usually just moderate comments like Bubba's :) It always makes me feel good to just click the Delete button before a comment ever appears live. It reminds me that I am a powerful person who is not to be trifled with! Bwah hah hah!

  25. shlomi wrote:

    @xaprb,
    I actually did, but a day later decided it was too ridiculous to let by.

  26. pirani wrote:

    [Translated from Persian via Google translate]

    Good day.
    I think I'm the problem here
    I have a table with a varchar field with 4 1000000 Record

    The index for all fields I

    But when running the query with a long pause (20) seconds to get across.

    This pause when sorting (order by) more!

    I did not see any difference in the index itself was removed

    Index seems to have no effect.

    Please Please Help

    please use http://translate.google.com. persian to english

  27. shlomi wrote:

    @pirani, I'm sorry. I can't possibly understand the question. My advice is to uploadyour question to StackOverflow, or else get decent consulting.

  28. ANkit wrote:

    Option #7 indeed helped, I was going nuts as to why MySQL suddenly stopped using index - might be due to fragmentation.

    Adding order by column forced optimizer to use the correct Index.

  29. R22 wrote:

    Thank You very much for the article.
    According to my "true story" as You wrote, I'll add option#8. In my case there were two big tables with 50 fields. Yes such amount is madness, but we have to do so, because a searching in one table was much more faster then joining of nearby 35 tables with indexes. So, in one table indexes was working fine, but in another the indexes refused to work. I spent over 12 hours to optimize a query in the second table, according to all advices over the internet. They did not bring results! I was ready to surrender, but in the last moment I found the error during the designing of these tables. First table engine was innoDB - indexes works fine, but the second was myIsam. After rebuilding the second table to InnoDB - indexes started to work. I tried to find out the reason of it in manuals - but couldn't! But I think the reason is in table engine properties as for indexes and keys. They are different.
    Best regards!

  30. Jayadev Swain wrote:

    Well documented article Shlomi!

    What could have also helped is if you had mentioned the Exec Time for each of the Solutions#.

    It would have helped with the comparisons. But nevertheless, this gives us techniques which can put to practice.

Leave Your Comment

 
Powered by Wordpress and MySQL. Theme by openark.org