Reading RBR binary logs with pt-query-digest

For purposes of auditing everything that goes on on our servers, we’re looking to parse the binary logs of all servers (masters), as with “Anemomaster”. With Row Based Replication this is problematic, since pt-query-digest does not support parsing RBR binary logs (true for 2.2.12, the latest at this time).

I’ve written a simple script that translates RBR logs to SBR-like logs, with a little bit of cheating. My only interest is that pt-query-digest be able to capture and count the queries, nothing else. By doing some minimal text manipulation on the binary log I’m now able to feed it to pt-query-digest, which seems to be happy.

The script does not, of course, parse the binary log directly; it requires the binary log to first be extracted via:

mysqlbinlog --verbose --base64-output=DECODE-ROWS your-mysql-binlog-filename.000001

The above adds the interpretation of the RBR entries in the form of (unconventional) statements, commented out, and strips away the cryptic base64 row-event text. All that is left is a little manipulation of entry headers and uncommenting the interpreted queries.
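To illustrate: with --verbose and DECODE-ROWS, each row event is interpreted as a commented pseudo-statement with positional @N column references, along the lines of (a made-up sample, not the script’s output):

### INSERT INTO `sakila`.`rental`
### SET
###   @1=15243
###   @2='2014-10-01 12:34:56'
###   @3=341

The script strips the leading “### ” markers and folds such an entry into a single uncommented, SBR-like statement, which is enough for pt-query-digest to fingerprint and count.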

The script can be found in my gist repositories. Current version is as follows: Continue reading » “Reading RBR binary logs with pt-query-digest”

Bash script: report largest InnoDB files

The following script reports the largest InnoDB tables under the data directory: schema, table & size in bytes. A table may be non-partitioned, in which case this is simply the size of the corresponding .ibd file, or partitioned, in which case the reported size is the sum of all partition files. It is assumed that tables reside in their own tablespace files, i.e. were created with innodb_file_per_table=1.

(
    mysql_datadir=$(grep datadir /etc/my.cnf | cut -d "=" -f 2)
    cd $mysql_datadir
    for frm_file in $(find . -name "*.frm")
    do
        tbl_file=${frm_file//.frm/.ibd}
        table_schema=$(echo $frm_file | cut -d "/" -f 2)
        table_name=$(echo $frm_file | cut -d "/" -f 3 | cut -d "." -f 1)
        if [ -f $tbl_file ]
        then
            # unpartitioned table
            file_size=$(du -cb $tbl_file 2> /dev/null | tail -n 1) 
        else
            # attempt partitioned innodb table
            tbl_file_partitioned=${frm_file//.frm/#*.ibd}
            file_size=$(du -cb $tbl_file_partitioned 2> /dev/null | tail -n 1)
        fi
        file_size=${file_size//total/}
        # Replace the below with whatever action you want to take,
        # for example, push the values into graphite.
        echo $file_size $table_schema $table_name
    done
) | sort -k 1 -nr | head -n 20

We use this to push table statistics to our graphite service; we keep an eye on table growth (we actually do not limit ourselves to the top 20, but monitor them all). File size does not reflect the real table data size (which can be smaller, due to tablespace fragmentation). It does give the correct information if you’re concerned about disk space. For table data we also monitor SHOW TABLE STATUS / INFORMATION_SCHEMA.TABLES, which are themselves inaccurate. Gotta go by something.
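For reference, the INFORMATION_SCHEMA side of that monitoring amounts roughly to the following (a sketch; as noted, these figures are estimates):

SELECT
  TABLE_SCHEMA,
  TABLE_NAME,
  DATA_LENGTH + INDEX_LENGTH AS total_bytes
FROM INFORMATION_SCHEMA.TABLES
WHERE ENGINE = 'InnoDB'
ORDER BY total_bytes DESC
LIMIT 20;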

common_schema 1.1 released: split(), try-catch, killall(), profiling

I’m very happy to announce the release of common_schema, version 1.1 (revision 300).

This version boasts compelling new features: innovative QueryScript syntax, libraries, and views which add to your skills as a DBA, making some maintenance and management tasks a breeze.

  • QueryScript, split statement: automagically break long queries into smaller chunks, avoid long locks and reduce query/transaction overhead
  • QueryScript, try-catch statement: just try { something; } catch { act_on_error; } (see the sketch after this list).
  • killall(): quickly kill connections based on grantee/user/host information.
  • profiling/profiling_last: utility views to assist in query profiling diagnostics
  • 1 size fits all: a single installer which auto-recognizes available server features and enables respective common_schema features accordingly.
  • QueryScript performance boost
  • much much more…
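As a rough sketch of the first two items (QueryScript executes via common_schema’s run() routine; the syntax follows my reading of the QueryScript documentation of the time, and the table and condition here are made up):

call common_schema.run("
  try {
    split (DELETE FROM sakila.rental WHERE rental_date < '2005-06-01')
      pass;
  }
  catch {
    SELECT 'purge failed' AS msg;
  }
");

split breaks the DELETE into small chunks behind the scenes, avoiding one long lock; the catch block only runs if any chunk errors out.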

Not familiar with common_schema? It allows you to do stuff on the server side, by selecting from views, calling upon useful routines or writing easy-to-manage scripts.

I’m suggesting that common_schema is a really-should-have tool to accompany your MySQL install. Did I say “tool”? It’s merely a schema. But it makes for a great framework:

In High Performance MySQL, 3rd edition, Baron Schwartz describes common_schema:

The common_schema is to MySQL as jQuery is to JavaScript

Reviewing highlights for version 1.1:

QueryScript

QueryScript is a scripting language. It sees some major improvements here. I’ve gained some speed boosts by avoiding temporary tables and using string parsing instead.

Without doubt the two most handy statements added to QueryScript are: Continue reading » “common_schema 1.1 released: split(), try-catch, killall(), profiling”

common_schema, rev. 178: foreach(), repeat_exec(), Roland Bouman, query analysis

common_schema, revision 178 is now released, with major additions. This revision turns common_schema into a framework, rather than a set of views and functions.

common_schema provides query scripting, analysis & informational views, and a function library, allowing for easier administration and diagnostics of MySQL. It introduces SQL-based tools which simplify otherwise complex shell and client scripts, allowing the DBA to be independent of operating system, installed packages and dependencies.

There’s no Perl or Python, and no dependencies to install. It’s just a schema.

Some highlights for the new revision:

  • foreach(), aka $(): loop through a collection, execute callback commands per element.
  • repeat_exec(): a repeat-until device: execute queries until some condition holds.
  • exec_file(): execute files à la SOURCE, but on the server side
  • Query analysis: analyze query text, view or routine definitions to detect dependency objects.
  • Improvements to views and routines, new routines introduced.

Let’s take a closer look:

Roland Bouman

I’m very happy to have Roland Bouman working on this project. He introduced some sophisticated code, without which some functionality could not have been realized. I’m sure I don’t need to introduce his great capabilities; I’ll just note that it is very good working with him!

foreach()

Introducing a looping device which can iterate over a collection and execute callback commands.

What’s a collection? A range of numbers; a set of constants; the result set of a SELECT query; tables in your database and more.

What is a callback? A query or set of queries to invoke on the specific elements in the collection. For example:

call foreach('table in sakila', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB ROW_FORMAT=COMPRESSED');

I’ll publish dedicated posts on foreach(), aka $(), following this post. Official documentation is here.

repeat_exec()

Repeat executing queries in a given interval, until some condition holds.

What kind of condition? You can loop forever, until a given time has elapsed, or until a given number of iterations has passed. Continue reading » “common_schema, rev. 178: foreach(), repeat_exec(), Roland Bouman, query analysis”

MySQL Global status difference using single query

I have just read MySQL Global status difference using MySQL procedures / functions, by Anders Karlsson. I commented, but realized I did not provide a direct answer. In the comment, I suggested checking out a solution based on views, found in common_schema. But the solution in common_schema is split into two views, due to the fact that views cannot handle derived table subqueries.

Well, here’s a single query to do that: it checks GLOBAL_STATUS twice, 10 seconds apart in the following sample. It uses SLEEP() to actually wait between the two reads. Yes, you can do that with a query.

The following query shows all GLOBAL_STATUS values that have changed during the sample period.
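The post’s actual query is behind the link; purely as an illustration of the technique (my own reconstruction, not the original), it could look roughly like this. Note that it leans on the derived tables being materialized left to right, which is exactly the optimizer-dependent behavior the update note below refers to:

SELECT
  gs0.VARIABLE_NAME,
  gs1.VARIABLE_VALUE - gs0.VARIABLE_VALUE AS diff
FROM
  (SELECT VARIABLE_NAME, VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS) AS gs0
  JOIN (SELECT SLEEP(10) AS slept) AS waited
  JOIN (SELECT VARIABLE_NAME, VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS) AS gs1
    ON gs1.VARIABLE_NAME = gs0.VARIABLE_NAME
WHERE gs1.VARIABLE_VALUE != gs0.VARIABLE_VALUE;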

[UPDATE] query updated to work with MySQL 5.6 optimizer Continue reading » “MySQL Global status difference using single query”

Announcing common_schema: common views & routines for MySQL

Today I have released common_schema, a utility schema for MySQL which includes many views and functions, and is intended to be installed on any MySQL server.

What does it do?

There are views answering for all sorts of useful information: stuff related to schema analysis, data dimensions, monitoring, processes & transactions, security, internals… There are basic functions answering for common needs.

Some of the views/routines simply formalize those queries we tend to write over and over again. Others take the place of external tools, answering complex questions via SQL and metadata. Still others help out with SQL generation.

Here are a few highlights:

There’s more. Take a look at the common_schema documentation for full listing. And it’s evolving: I’ve got quite a few ideas already for future components.

Some of these views rely on heavyweight INFORMATION_SCHEMA tables. You should be aware of the impact and risks.

What do I need to install?

There’s no script or executable file. It’s just a schema. The distribution is an SQL file which generates common_schema. Much like a dump file.

Continue reading » “Announcing common_schema: common views & routines for MySQL”

Checking for AUTO_INCREMENT capacity with single query

Darn! This means oak-show-limits becomes redundant. Am I not supposed to speak about it at my upcoming presentation? Bad timing!

You have AUTO_INCREMENT columns. How far are you pushing the limits? Are you going to run out of AUTO_INCREMENT values soon? Perhaps you wonder whether you should ALTER from INT to BIGINT?

The answer is all there in INFORMATION_SCHEMA. The TABLES table shows the current AUTO_INCREMENT value per table, and the COLUMNS table tells us all about a column’s data type.
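Purely as a sketch of the idea (my own simplified illustration, not the post’s query; only the common integer types are handled here):

SELECT
  t.TABLE_SCHEMA,
  t.TABLE_NAME,
  c.COLUMN_NAME,
  c.COLUMN_TYPE,
  t.AUTO_INCREMENT,
  t.AUTO_INCREMENT / (
    CASE c.DATA_TYPE
      WHEN 'tinyint'   THEN IF(c.COLUMN_TYPE LIKE '%unsigned', 255, 127)
      WHEN 'smallint'  THEN IF(c.COLUMN_TYPE LIKE '%unsigned', 65535, 32767)
      WHEN 'mediumint' THEN IF(c.COLUMN_TYPE LIKE '%unsigned', 16777215, 8388607)
      WHEN 'int'       THEN IF(c.COLUMN_TYPE LIKE '%unsigned', 4294967295, 2147483647)
      WHEN 'bigint'    THEN IF(c.COLUMN_TYPE LIKE '%unsigned', 18446744073709551615, 9223372036854775807)
    END
  ) AS capacity_used_ratio
FROM INFORMATION_SCHEMA.TABLES AS t
JOIN INFORMATION_SCHEMA.COLUMNS AS c
  ON c.TABLE_SCHEMA = t.TABLE_SCHEMA AND c.TABLE_NAME = t.TABLE_NAME
WHERE c.EXTRA = 'auto_increment' AND t.AUTO_INCREMENT IS NOT NULL
ORDER BY capacity_used_ratio DESC;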

It takes some ugly code to deduce the maximum value per column type, what with signed/unsigned and data type, but then it’s very simple. Here is the query: Continue reading » “Checking for AUTO_INCREMENT capacity with single query”

oak-hook-general-log: your poor man’s Query Analyzer

The latest release of openark kit introduces oak-hook-general-log, a handy tool which allows for some analysis of executing queries.

Initially I just intended for the tool to be able to dump the general log to standard output, from any machine capable of connecting to MySQL. Quickly enough, I realized the power it brings.

With this tool, one can dump to standard output all queries using temporary tables; or using a specific index; or doing a full index scan; or just follow up on connections; or… For example, the following execution will only log queries which cause a filesort:

oak-hook-general-log --user=root --host=localhost --password=123456 --filter-explain-filesort

The problem with using the standard logs

So you have the general log, which you don’t often enable, since it tends to grow huge within moments. You then have the slow log. The slow log is great, and is among the top tools for MySQL diagnosis.

The slow log allows for log-queries-not-using-indexes, which is yet another nice feature. Not only do you log any query running for over X seconds, you also log any query which does not use an index.

Wait. This logs all queries on single-row tables (no single-row table will use an index), as well as on very small tables (a common 20-row lookup table will most often be scanned). These are OK scans. This makes for some noise in the slow log.

And how about queries which do use an index, but do so poorly? They use an index, but retrieve some 12,500,000 rows, using a temporary table & filesort?

What oak-hook-general-log does for you

This tool streams out the general log, and filters out queries based on their role or on their execution plan.

To work at all, it must enable the general log. Moreover, it directs the general log to a log table. Mind that this makes for a performance impact, which is why the tool auto-terminates and restores the original log settings (the default is 1 minute, configurable). It’s really not a tool you should keep running for days. But during the few moments it runs, it will:

  • Routinely rotate the mysql.general_log table so that it doesn’t fill up
  • Examine entries found in the general log
  • Cross-reference entries with the PROCESSLIST so as to deduce database context (bug #52554)
  • If required and appropriate, evaluate a query’s execution plan
  • Decide whether to dump each entry based on filtering rules
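The general-log redirection mentioned above corresponds roughly to the following server settings (standard MySQL variables; the surrounding flow is my own illustration, not the tool’s actual code, and the tool restores whatever the original settings were):

SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 1;

-- ... poll and rotate mysql.general_log, filter entries ...

SET GLOBAL general_log = 0;
SET GLOBAL log_output = 'FILE';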

Filtering rules

Filtering rules are passed as command line options. At present, only one filtering rule applies (if more than one is specified, only one is used, so there is no point in passing more than one). Some of the rules are: Continue reading » “oak-hook-general-log: your poor man’s Query Analyzer”

openark-kit (rev. 170): new tools, new functionality

I’m pleased to announce a new release of the openark kit. There’s a lot of new functionality inside; following is a brief overview.

The openark kit is a set of utilities for MySQL. They solve everyday maintenance tasks, which may be complicated or time consuming to perform by hand.

It’s been a while since the last announced release. Most of my attention was on mycheckpoint, building new features, writing documentation etc. However, my own use of the openark kit has only increased in the past few months, and new useful solutions to common problems have been developed.

I’ve used and improved many tools over this time, but doing the final cut, along with proper documentation, took some time. Anyway, here are the highlights:

New tool: oak-hook-general-log

oak-hook-general-log hooks into a MySQL server and dumps the general log based on filtering rules, applying to query role or execution plan. It is possible to only dump connect/disconnect entries, queries which make a full table scan, or use temporary tables, or scan more than X number of rows, or…

I’ll write more on this tool shortly.

New tool: oak-prepare-shutdown

This tool makes for an orderly and faster shutdown by safely stopping replication and flushing InnoDB pages to disk prior to shutting down (keeping the server available for connections even while dirty pages are being flushed to disk). A typical use case would be:

oak-prepare-shutdown --user=root --ask-pass --socket=/tmp/mysql.sock && /etc/init.d/mysql stop
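Conceptually, the preparation step amounts to something like the following (a sketch of the idea, not the tool’s actual code):

STOP SLAVE;
SET GLOBAL innodb_max_dirty_pages_pct = 0;

-- ...then poll SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty'
-- until the value is low enough, and only then shut down.

The point is that the flushing happens while the server still serves connections, so the eventual shutdown has little left to do.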

New tool: oak-repeat-query

oak-repeat-query repeats executing a given query until some condition holds. The condition can be:

  • Number of given iterations has been reached
  • Given time has elapsed
  • No rows have been affected by query

The tool comes in handy for cleanup jobs, warming up caches, etc. Continue reading » “openark-kit (rev. 170): new tools, new functionality”

EXPLAIN: missing db info

I’m further developing a general log hook, which can stream queries from the general log.

A particular direction I’m taking is to filter queries by their type of actions. For example, the tool (oak-hook-general-log) can be instructed to only stream out those queries which involve creation of a temporary table; or those which cause for a filesort, or full index scan, etc.

This is done by evaluating query execution plans on the fly. I suspect the MySQL query analyzer roughly does the same (as a small part of what it does).

There’s almost nothing here one cannot do with sed/awk. However, I bumped into a couple of problems:

  1. The general log (and the mysql.general_log table, in particular) does not indicate the particular database one is using for the query. Since the slow log does indicate this data, I filed a bug on this. I currently solve this by cross-referencing the connection id with the process list, where the current database is listed (see the sketch following this list). It’s shaky, but mostly works.
  2. Just realized: there’s no DB info in the EXPLAIN output! It’s weird, since I’ve been EXPLAINing queries for years now. But I’ve always had the advantage of knowing the schema used: either because I was manually executing the query on a known schema, or mk-query-digest was kind enough to let me know.
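That cross-referencing is, roughly, a join along these lines (a sketch, not the tool’s actual code; it only works while the connection is still alive in the processlist):

SELECT gl.event_time, pl.DB, gl.argument
FROM mysql.general_log AS gl
JOIN INFORMATION_SCHEMA.PROCESSLIST AS pl ON pl.ID = gl.thread_id
WHERE gl.command_type = 'Query';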

Continue reading » “EXPLAIN: missing db info”