logs – code.openark.org http://shlomi-noach.github.io/blog/ Blog by Shlomi Noach Wed, 15 Dec 2010 17:46:06 +0000 en-US hourly 1 https://wordpress.org/?v=5.3.3 32412571 oak-hook-general-log: your poor man’s Query Analyzer https://shlomi-noach.github.io/blog/mysql/oak-hook-general-log-your-poor-mans-query-analyzer https://shlomi-noach.github.io/blog/mysql/oak-hook-general-log-your-poor-mans-query-analyzer#respond Wed, 15 Dec 2010 17:46:06 +0000 https://shlomi-noach.github.io/blog/?p=3032 The latest release of openark kit introduces oak-hook-general-log, a handy tool which allows for some analysis of executing queries.

Initially I just intended for the tool to be able to dump the general log to standard output, from any machine capable to connect to MySQL. Quick enough, I realized the power it brings.

With this tool, one can dump to standard output all queries using temporary tables; or using a specific index; or doing a full index scan; or just follow up on connections; or… For example, the following execution will only log queries which make for filesort:

oak-hook-general-log --user=root --host=localhost --password=123456 --filter-explain-filesort

The problem with using the standard logs

So you have the general log, which you don’t often enable, since it tends to grow huge within moments. You then have the slow log. Slow log is great, and is among the top tools for MySQL diagnosis.

The slow log allows for log-queries-not-using-indexes, which is yet another nice feature. Not only should you log any query running for over X seconds, but also log any query which does not use an index.

Wait. This logs all single-row tables (no single row table will use an index), as well as very small tables (a common 20 rows lookup table will most often be scanned). These are OK scans. This makes for some noise in the slow log.

And how about queries which do use an index, but do so poorly? They use an index, but retrieve some 12,500,000 rows, using temporary table & filesort?

What oak-hook-general-log does for you

This tool streams out the general log, and filters out queries based on their role or on their execution plan.

To work at all, it must enable the general log. Moreover, it directs the general log to log table. Mind that this makes for a performance impact, which is why the tool auto-terminates and restores original log settings (default is 1 minute, configurable). It’s really not a tool you should keep running for days. But during the few moments it runs, it will:

  • Routinely rotate the mysql.general_log table so that it doesn’t fill up
  • Examine entries found in the general log
  • Cross reference entries to the PROCESSLIST so as to deduce database context (bug #52554)
  • If required and appropriate, evaluate a query’s execution plan
  • Decide whether to dump each entry based on filtering rules

Filtering rules

Filtering rules are passed as command line options. At current, only one filtering rule applies (if more than one specified only one is used, so no point in passing more than one). Some of the rules are:

  • filter-connection: only log connect/quit entries
  • filter-explain-fullscan: only log full table scans
  • filter-explain-temporary: only log queries which create implicit temporary tables
  • filter-explain-rows-exceed: only log queries where more than X number of rows are being accessed on some table (estimated)
  • filter-explain-total-rows-exceed: only log queries where more than X number of rows are accessed on all tables combined (estimated, with possibly incorrect numbers on some queries)
  • filter-explain-key: only log queries using a specific index. This feature somewhat overlaps with Maatkit’s mk-index-usage (read announcement).
  • filter-explain-contains: a general purpose grep on the execution plan. Log queries where the execution plan contains some text.

There are other filters, and I will possibly add more in due time.

Here are a couple cases I used oak-hook-general-log for:

Use case: temporary tables

I have a server with this alarming chart (courtesy mycheckpoint) of temporary tables:


What could possibly create 30 temporary tables per second on average?

The slow log produced nothing helpful, even with log-queries-not-using-indexes enabled. There were a lot of queries not using indexes there, but nothing at these numbers. With:

oak-hook-general-log --filter-explain-temporary

enabled for 1 minute, nothing came out. Weird. Enabled for 5 minutes, I got one entry. Turned out a scheduled script, acting once per 5 minutes, was making a single complicated query involving many nested views, which accounted for some hundreds of temporary tables created. All of them very small, query time was very fast. There is no temporary tables problem with this server, case closed.

Use case: connections

A server had issues with some exceptions being thrown on the client side. There was a large number of new connections created per second although the client was using a connection pool. Suspecting the pool didn’t work well, I issued:

oak-hook-general-log --filter-connect

The pool was working well, all right. No entries for that client were recorder in 1 minute of testing. However, it turned out some old script was flooding the MySQL server with requests, every second. The log showed root@somehost, and sure enough, the script was disabled. Exceptions were due to another reason; it was good to eliminate a suspect.

Some of the tool’s use case is relatively easy to solve with tail, grep & awk; others are not. I am using it more and more often, and find it to make significant shortcuts in tracking down queries.

Get it

Download the tool as part of openark kit: access the openark kit project page.

Or get the source code directly.

Feedback is most welcome.

]]>
https://shlomi-noach.github.io/blog/mysql/oak-hook-general-log-your-poor-mans-query-analyzer/feed 0 3032
EXPLAIN: missing db info https://shlomi-noach.github.io/blog/mysql/explain-missing-db-info https://shlomi-noach.github.io/blog/mysql/explain-missing-db-info#comments Tue, 11 May 2010 04:57:02 +0000 https://shlomi-noach.github.io/blog/?p=2368 I’m further developing a general log hook, which can stream queries from the general log.

A particular direction I’m taking is to filter queries by their type of actions. For example, the tool (oak-hook-general-log) can be instructed to only stream out those queries which involve creation of a temporary table; or those which cause for a filesort, or full index scan, etc.

This is done by evaluating of query execution plans on the fly. I suspect the MySQL query analyzer roughly does the same (as a small part of what it does).

It’s almost nothing one cannot do with sed/awk. However, I bumped into a couple of problems:

  1. The general log (and the mysql.general_log table, in  particular) does not indicate the particular database one is using for the query. Since slow log does indicate this data, I filed a bug on this. I currently solve this by crossing connection id with the process list, where the current database is listed. It’s shaky, but mostly works.
  2. Just realized: there’s no DB info in the EXPLAIN output! It’s weird, since I’ve been EXPLAINing queries for years now. But I’ve always had the advantage of knowing the schema used: either because I was manually executing the query on a known schema, or mk-query-digest was kind enough to let me know.

For example, look at the following imaginary query, involving both the world and sakila databases:

mysql> use test;
Database changed
mysql> EXPLAIN SELECT * FROM world.Country JOIN sakila.city WHERE Country.Capital = city.city_id;
+----+-------------+---------+--------+---------------+---------+---------+-----------------------+------+-------------+
| id | select_type | table   | type   | possible_keys | key     | key_len | ref                   | rows | Extra       |
+----+-------------+---------+--------+---------------+---------+---------+-----------------------+------+-------------+
|  1 | SIMPLE      | Country | ALL    | NULL          | NULL    | NULL    | NULL                  |  239 |             |
|  1 | SIMPLE      | city    | eq_ref | PRIMARY       | PRIMARY | 2       | world.Country.Capital |    1 | Using where |
+----+-------------+---------+--------+---------------+---------+---------+-----------------------+------+-------------+
2 rows in set (0.00 sec)

The query is imaginary, since the tables don’t actually have anything in common. But look at the EXPLAIN result: can you tell where city came from? Country can somehow be parsed from the ref column, but no help on city.

Moreover, table aliases show on the EXPLAIN plan (which is good), but with no reference to the original table.

So, is it back to parsing of the SQL query? I’m lazy reluctant to do that. It’s error prone, and one needs to implement, or use, a good parser, which also accepts MySQL dialect. Haven’t looked into this yet.

I’m currently at a standstill with regard to automated query execution plan evaluation where database cannot be determined.

]]>
https://shlomi-noach.github.io/blog/mysql/explain-missing-db-info/feed 13 2368
oak-hook-general-log: streaming general log https://shlomi-noach.github.io/blog/mysql/oak-hook-general-log-streaming-general-log https://shlomi-noach.github.io/blog/mysql/oak-hook-general-log-streaming-general-log#comments Sun, 21 Mar 2010 08:45:58 +0000 https://shlomi-noach.github.io/blog/?p=2253 I’m seeking input on a new openark kit utility I’ve started to implement.

The tool, oak-hook-general-log, will hook up to a MySQL (>= 5.1) server, and stream the general log into standard output. It looks like this:

bash$ python src/oak/oak-hook-general-log.py --socket=/tmp/mysql.sock --user=root
2010-03-21 10:18:42     root[root] @ localhost []       79      1       Query   SELECT COUNT(*) FROM City
2010-03-21 10:18:48     root[root] @ localhost []       79      1       Query   DELETE FROM City WHERE id=1000
2010-03-21 10:18:54     root[root] @ localhost []       79      1       Query   SHOW PROCESSLIST
2010-03-21 10:19:06     root[root] @ localhost []       79      1       Quit
2010-03-21 10:19:07     root[root] @ localhost []       93      1       Connect root@localhost on
2010-03-21 10:19:07     root[root] @ localhost []       93      1       Query   select @@version_comment limit 1
2010-03-21 10:22:33     root[root] @ localhost []       93      1       Query   SELECT City.Name, Country.Name FROM Country JOIN City ON Country.Capit
2010-03-21 10:22:58     root[root] @ localhost []       93      1       Quit

Since output is written to stdout, one can further:

bash$ python src/oak/oak-hook-general-log.py --socket=/tmp/mysql.sock --user=root | grep Connect
bash$ python src/oak/oak-hook-general-log.py --socket=/tmp/mysql.sock --user=root | grep webuser@webhost

What the tool does is to enable table logs, and periodically rotate the mysql.general_log table, read and dump its content.

The tool:

  • Stores and restores the original log state (general log enabled/disabled, log output).
  • Disables printing of its own queries to the general log.
  • Automatically times out (timeout configurable) so as not to enter a situation where the general log is forgotten to be turned on.
  • Can discard pre-existing data on the mysql.general_log table.
  • Will cleanup the mysql.slow_log table, if it wasn’t originally used (turning on table logs applies to both general log and slow log).

What would you have the tool do further? Should it provide filtering, or should we just use grep/sed/awk for that? Any internal aggregation of data?

I would love to hear your thoughts. Meanwhile, view or grab the python script file.

]]>
https://shlomi-noach.github.io/blog/mysql/oak-hook-general-log-streaming-general-log/feed 9 2253