EXPLAIN: missing db info

May 11, 2010

I'm further developing a general log hook, which can stream queries from the general log.

A particular direction I'm taking is to filter queries by their type of actions. For example, the tool (oak-hook-general-log) can be instructed to only stream out those queries which involve creation of a temporary table; or those which cause for a filesort, or full index scan, etc.

This is done by evaluating of query execution plans on the fly. I suspect the MySQL query analyzer roughly does the same (as a small part of what it does).

It's almost nothing one cannot do with sed/awk. However, I bumped into a couple of problems:

  1. The general log (and the mysql.general_log table, in  particular) does not indicate the particular database one is using for the query. Since slow log does indicate this data, I filed a bug on this. I currently solve this by crossing connection id with the process list, where the current database is listed. It's shaky, but mostly works.
  2. Just realized: there's no DB info in the EXPLAIN output! It's weird, since I've been EXPLAINing queries for years now. But I've always had the advantage of knowing the schema used: either because I was manually executing the query on a known schema, or mk-query-digest was kind enough to let me know.

For example, look at the following imaginary query, involving both the world and sakila databases:

mysql> use test;
Database changed
mysql> EXPLAIN SELECT * FROM world.Country JOIN sakila.city WHERE Country.Capital = city.city_id;
+----+-------------+---------+--------+---------------+---------+---------+-----------------------+------+-------------+
| id | select_type | table   | type   | possible_keys | key     | key_len | ref                   | rows | Extra       |
+----+-------------+---------+--------+---------------+---------+---------+-----------------------+------+-------------+
|  1 | SIMPLE      | Country | ALL    | NULL          | NULL    | NULL    | NULL                  |  239 |             |
|  1 | SIMPLE      | city    | eq_ref | PRIMARY       | PRIMARY | 2       | world.Country.Capital |    1 | Using where |
+----+-------------+---------+--------+---------------+---------+---------+-----------------------+------+-------------+
2 rows in set (0.00 sec)

The query is imaginary, since the tables don't actually have anything in common. But look at the EXPLAIN result: can you tell where city came from? Country can somehow be parsed from the ref column, but no help on city.

Moreover, table aliases show on the EXPLAIN plan (which is good), but with no reference to the original table.

So, is it back to parsing of the SQL query? I'm lazy reluctant to do that. It's error prone, and one needs to implement, or use, a good parser, which also accepts MySQL dialect. Haven't looked into this yet.

I'm currently at a standstill with regard to automated query execution plan evaluation where database cannot be determined.

tags: , , , ,
posted in MySQL by shlomi

« | »

Follow comments via the RSS Feed | Leave a comment | Trackback URL

13 Comments to "EXPLAIN: missing db info"

  1. Peter Laursen wrote:

    Exactly. All issues you describe here are really showstoppers for *automated* analysis of the general log. Users analyzing *interactively* may have workarounds of the type you describe.

  2. Rob wrote:

    What about using explain extended? The warning generated is the rewritten query with full database.table.column notation that you can parse with the standard explain output.

  3. shlomi wrote:

    @Rob,

    This still leaves me with SQL parsing. Not only do I need a good parser, but I will also need to be able to cross-match the parser's results with the execution plan.
    I confess I have not even tried it; but I can think of a couple problems on the way. For example, what's the alias for a derived table? EXPLAIN uses 'derived1' etc. How can I follow the exact same path?

    But yes, the result of EXPLAIN EXTENDED if far more verbose than the original query. Thank you.

  4. shlomi wrote:

    @Peter,
    I see you've tried the same path...

  5. Roland Bouman wrote:

    Shlomi, Rob has a point. The "canonical" query in the EXPLAIN EXTENDED output is very ugly, but probably quite easy to parse, as all identifiers are quoted, join conditions are fully parenthesized, whitespace is normalized - lot of headeaches gone.

    I suspect you could even do it with a hand-crafted operator precedence parser.

  6. Xaprb wrote:

    Take a look at what mk-index-usage does. There is a lot of overlap between your case and what that tool does. Yes, there is a lot of SQL parsing, and it is not easy. We have a lot of test cases.

  7. shlomi wrote:

    Roland,

    It still does not help in matching the query against the execution plan.

    mysql> explain extended select id from world.City ci, (select code from world.Country co) s1;
    +----+-------------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------------+
    | id | select_type | table      | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra                          |
    +----+-------------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------------+
    |  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL    | NULL    | NULL |  239 |   100.00 |                                |
    |  1 | PRIMARY     | ci         | index | NULL          | PRIMARY | 4       | NULL | 4079 |   100.00 | Using index; Using join buffer |
    |  2 | DERIVED     | co         | index | NULL          | PRIMARY | 3       | NULL |  239 |   100.00 | Using index                    |
    +----+-------------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------------+
    
    
    

    How can I guess the <derived2> alias and match it against the query? How about two derived tables?

    You are right; I don't have to go to that depth; but the job becomes very difficult. I suspect there are other examples.

  8. shlomi wrote:

    @Baron,

    Thanks; I've just seen it. I think mk-query-digest already uses a very smart SQL parser.
    If I happen to work on my perl skills, I'll take a better look.

  9. Roland Bouman wrote:

    @shlomi,

    "How can I guess the alias and match it against the query?"

    I think I picked this up during Baron's "Explain Demystified" talk from two years back at the MySQL user conference.

    Assuming you can parse the SQL and succesfully match the ci and co aliases in the explain output to the query parse tree, you then know that `co` is a DERIVED table. Together with the value in the id column for that table, you get "derived2" (as in, CONCAT(LOWER(select_type), id)

    Try it with another query like this:

    explain extended select id from world.City ci, (select code from (select * from world.Country) co) s1;

    You'll notice that it actually works.

    Of course, don't mistake my explanation as approval - this is an ugly mess, at least for what you want to achieve :)

  10. shlomi wrote:

    @Roland,
    Hey, that's pretty cool! I may play around with this (or maybe not just right now). I'll let you know if I happen to get results.

  11. Roland Bouman wrote:

    @shlomi,

    well, thank baron :)

    Here's the slides:

    http://assets.en.oreilly.com/1/event/36/EXPLAIN%20Demystified%20Presentation%201.pdf

    (these are of the 2010 talk - not the one I attended but I see the info is still there)

  12. shlomi wrote:

    @Baron

    Thank you; Roland is a good listener :)

  13. Xaprb wrote:

    I wish I had a memory that good.

    Even if you don't learn Perl, the test cases in trunk/common/t/QueryParser.t might be useful to you.

Leave Your Comment

 
Powered by Wordpress and MySQL. Theme by openark.org