Beware of implicit casting

February 2, 2010

Ever so often a query provides a "bad" execution plan. Adding a missing index can many times solve the problem. However, not everything can be solved with an index. I wish to highlight the point of having an implicit cast, which negates the use of an index on MySQL.

I see this happening a lot on customers' databases, and this begs for a short introduction.

MySQL doesn't support index functions

Let's assume the following table:

CREATE TABLE `person` (
 `id` bigint(20) NOT NULL AUTO_INCREMENT,
 `first_name` varchar(32) CHARACTER SET utf8 DEFAULT NULL,
 `last_name` varchar(32) CHARACTER SET utf8 DEFAULT NULL,
 `driver_license_registration` bigint(20) DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `last_name` (`last_name`),
 KEY `driver_license_registration` (`driver_license_registration`)

And suppose we're looking for persons whose last name begin with "Smith". The following query will NOT utilize an index:

SELECT * FROM person WHERE LEFT(last_name, 5) = 'Smith';

Why not? Because we use a function (LEFT) over the last_name column, and this makes MySQL unable to deduce the outcome of the expression. Will it be ordered in the same way as the original column? (in this case, the answer is yes, LEFT is a monotonic function) Is it possible to produce the reverse function? (In this case, no). MySQL cannot and does not handle these questions and therefore avoids using the index on last_name altogether:

EXPLAIN SELECT * FROM person WHERE LEFT(last_name, 5) = 'Smith';
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows | Extra       |
|  1 | SIMPLE      | person | ALL  | NULL          | NULL | NULL    | NULL | 1000 | Using where |

To solve the problem above, a simple change to the query will do the trick:

EXPLAIN SELECT * FROM person WHERE last_name LIKE 'Smith%';
| id | select_type | table  | type  | possible_keys | key       | key_len | ref  | rows | Extra       |
|  1 | SIMPLE      | person | range | last_name     | last_name | 99      | NULL |    5 | Using where |

Not all functions are explicit

This leads us to the subject of this post. What if we were to look for persons whose driver license registration number begins with '123'? Trying to learn from the above example, we write:

SELECT * FROM person WHERE driver_license_registration LIKE '123%';

But the above query does NOT utilize an index! See:

EXPLAIN SELECT * FROM person WHERE driver_license_registration LIKE '123%';
| id | select_type | table  | type | possible_keys               | key  | key_len | ref  | rows | Extra       |
|  1 | SIMPLE      | person | ALL  | driver_license_registration | NULL | NULL    | NULL | 1000 | Using where |

Why not?

Because the driver_license_registration column is an integer. The query asks for a string comparison (LIKE '123%'). This kind of comparison cannot be performed on integers. The column value will have to transform to text in order to work out the query. Effectively, it's as if we wrote:

SELECT * FROM person WHERE CAST(driver_license_registration AS CHAR) LIKE '123%';

There's no immediate solution to this last query. Perhaps the column should be textual after all, perhaps maintain a "ghost" column. If the number of digits in a number if fixed an known, then we can convert the above to a range query.

Casting is an implicit function

And, alas, it's a well hidden function. There's nothing to suggest a problem here. This is SQL, not Java. It's not a strongly typed language. I see the above example quite a lot. Another common casting mistake is comparing Timestamp values to strings.

So, until MySQL supports index functions, watch out for these nuances.

  • beware of joining two tables on columns with different collations/charsets. these won't use an index either.
    EXPLAIN EXTENDED followed by SHOW WARNINGS will allow you to catch these cases, since there will be a CONVERT function used...

  • I recommend you to convert the "driver_license_registration" column to a CHAR/VARCHAR column, numeric fields is just required when you need arithmetic operations (that's not the case here... I think)

  • @sbester
    Thank you

    driver_license_Registration being an example only; an integer column also has the advantage of low storage requirements, so it's not all about arithmetics.
    But you are right in suggesting this may be the right solution, as I also suggest above. Just note that this may affect other tables as well, which may join to this table using this column.

  • Pingback: 11 MySQL resources you must read!()

  • Pingback: Implicit casting you don’t want to see around |

  • Pingback: 23:59:59 is not the end of the day. No, really! |

Powered by Wordpress and MySQL. Theme by