Slides for “openark-kit: MySQL utilities for everyday use”

Today I have delivered my talk, openark-kit: MySQL utilities for everyday use, at the O’REILLY  MySQL Conference 2011.

The slides are uploaded to the O’Reilly site, and I’m attaching them here as well. Feel free to download the PDF: openark-kit-mysqlconf11.pdf

I wish to thank all who attended my talk!

 

Checking for AUTO_INCREMENT capacity with single query

Darn! This means oak-show-limits becomes redundant. Am I not supposed to speak about it on my coming presentation? Bad timing!

You have AUTO_INCREMENT columns. How far are you pushing the limits? Are you going to run out of AUTO_INCREMENT values soon? Perhaps you wonder whether you should ALTER from INT to BIGINT?

The answer is all there in INFORMATION_SCHEMA. The TABLES table shows the current AUTO_INCREMENT value per table, and the COLUMNS table tells us all about a column’s data type.

It takes some ugly code to deduce the maximum value per column type, what with signed/unsigned and data type, but then its very simple. Here is the query: Continue reading » “Checking for AUTO_INCREMENT capacity with single query”

Would you be my friend on mysqlconf? (tempting offer inside)

I’m still throwing papers to the trash and starting all over, fixing, rewriting and improving my talk at mysqlconf 2011, where I will be presenting openark-kit: MySQL utilities for everyday use.

However I’ve got something up my sleeve: a benefit many can enjoy, that’ll make me a respectful, popular and sought after speaker. While others may try and lure you with such earthly temptations as a 20% off discount, I am in a position to offer you a more spiritual gift: my friendship!

See, if you become my friend, I can offer you a 25% discount on the MySQL conference. Yes, that’s 5% more than my competitors! The only thing I ask in return is that you be my friend (hey, it’s called “friends of speaker”). Not like a FB virtual friend, but a real friendship! One where you can buy me beer or dinner!

If you agree to such humane terms, I will be in the position to let you know that all you have to do is fill in mys11fsd in your registration form.

No, wait! I let it slip! Rewrite: You should fill in mys11fsd [will only tell you this password after your commitment to a beer] in your registration form.

Oh no, not again!

Don’t use mys11fsd without talking to me first… You’re not supposed to… Oh, my beer!

Argghhh!

Speaking at the O’Reilly MySQL Conference 2011

I’m very pleased and humbled to announce that my submission to the upcoming O’Reilly MySQL Conference, April 2011, has been accepted.

I will present a 45 minute session titled openark-kit: MySQL utilities for everyday use.

In this session, I will present some of the tools in the openark kit. We’ll discuss some limitations of the MySQL server, and how openark kit tools overcome those limitations and provide with solutions to common maintenance and audit problems.

This will be a technical session and will discuss various topics of the MySQL server: security, execution plans, replication, triggers and more. I do not intend to discuss all tools, nor to cover the various options. Instead, I’ll present the “behind the scenes“, show why the tools work, present common problems and typical use case.

This will be the first time I present at the MySQL Conference (or any conference outside Israel, for that matter). I hope to have a good session. As extra measure of safety, I’ll bring along a couple basketballs; if the sun shines, we can all go outside and have a good time!

The idea to submit this talk (credit Roland Bouman) has given me the inspiration to put effort in making a new release with new and updated tools. So this talk is already a success as far as I’m concerned.

Hope to see you there!

[PS shameless plug: openark kit.]

oak-hook-general-log: your poor man’s Query Analyzer

The latest release of openark kit introduces oak-hook-general-log, a handy tool which allows for some analysis of executing queries.

Initially I just intended for the tool to be able to dump the general log to standard output, from any machine capable to connect to MySQL. Quick enough, I realized the power it brings.

With this tool, one can dump to standard output all queries using temporary tables; or using a specific index; or doing a full index scan; or just follow up on connections; or… For example, the following execution will only log queries which make for filesort:

oak-hook-general-log --user=root --host=localhost --password=123456 --filter-explain-filesort

The problem with using the standard logs

So you have the general log, which you don’t often enable, since it tends to grow huge within moments. You then have the slow log. Slow log is great, and is among the top tools for MySQL diagnosis.

The slow log allows for log-queries-not-using-indexes, which is yet another nice feature. Not only should you log any query running for over X seconds, but also log any query which does not use an index.

Wait. This logs all single-row tables (no single row table will use an index), as well as very small tables (a common 20 rows lookup table will most often be scanned). These are OK scans. This makes for some noise in the slow log.

And how about queries which do use an index, but do so poorly? They use an index, but retrieve some 12,500,000 rows, using temporary table & filesort?

What oak-hook-general-log does for you

This tool streams out the general log, and filters out queries based on their role or on their execution plan.

To work at all, it must enable the general log. Moreover, it directs the general log to log table. Mind that this makes for a performance impact, which is why the tool auto-terminates and restores original log settings (default is 1 minute, configurable). It’s really not a tool you should keep running for days. But during the few moments it runs, it will:

  • Routinely rotate the mysql.general_log table so that it doesn’t fill up
  • Examine entries found in the general log
  • Cross reference entries to the PROCESSLIST so as to deduce database context (bug #52554)
  • If required and appropriate, evaluate a query’s execution plan
  • Decide whether to dump each entry based on filtering rules

Filtering rules

Filtering rules are passed as command line options. At current, only one filtering rule applies (if more than one specified only one is used, so no point in passing more than one). Some of the rules are: Continue reading » “oak-hook-general-log: your poor man’s Query Analyzer”

openark-kit (rev. 170): new tools, new functionality

I’m pleased to announce a new release of the openark kit. There’s a lot of new functionality inside; following is a brief overview.

The openark kit is a set of utilities for MySQL. They solve everyday maintenance tasks, which may be complicated or time consuming to work by hand.

It’s been a while since the last announced release. Most of my attention was on mycheckpoint, building new features, writing documentation etc. However my own use of openark kit has only increased in the past few months, and there’s new useful solutions to common problems that have been developed.

I’ve used and improved many tools over this time, but doing the final cut, along with proper documentation, took some time. Anyway, here are the highlights:

New tool: oak-hook-general-log

oak-hook-general-log hooks up a MySQL server and dumps the general log based on filtering rules, applying to query role or execution plan. It is possible to only dump connect/disconnect entries, queries which make a full table scan, or use temporary tables, or scan more than X number of rows, or…

I’ll write more on this tool shortly.

New tool: oak-prepare-shutdown

This tool makes for an orderly and faster shutdown by safely stopping replication, and flushing InnoDB pages to disk prior to shutting down (keeping server available for connections even while attempting to flush dirty pages to disk). A typical use case would be:

oak-prepare-shutdown --user=root --ask-pass --socket=/tmp/mysql.sock && /etc/init.d/mysql stop

New tool: oak-repeat query

oak-repeat-query repeats executing a given query until some condition holds. The condition can be:

  • Number of given iterations has been reached
  • Given time has elapsed
  • No rows have been affected by query

The tool comes in handy for cleanup jobs, warming up caches, etc. Continue reading » “openark-kit (rev. 170): new tools, new functionality”

Thoughts and ideas for Online Schema Change

Here’s a few thoughts on current status and further possibilities for Facebook’s Online Schema Change (OSC) tool. I’ve had these thoughts for months now, pondering over improving oak-online-alter-table but haven’t got around to implement them nor even write them down. Better late than never.

The tool has some limitations. Some cannot be lifted, some could. Quoting from the announcement and looking at the code, I add a few comments. I conclude with a general opinion on the tool’s abilities.

“The original table must have PK. Otherwise an error is returned.”

This restriction could be lifted: it’s enough that the table has a UNIQUE KEY. My original oak-online-alter-table handled that particular case. As far as I see from their code, the Facebook code would work just as well with any unique key.

However, this restriction is of no real interest. As we’re mostly interested in InnoDB tables, and since any InnoDB table should have a PRIMARY KEY, we shouldn’t care too much.

“No foreign keys should exist. Otherwise an error is returned.”

Tricky stuff. With oak-online-alter-table, changes to the original table were immediately reflected in the ghost table. With InnoDB tables, that meant same transaction. And although I never got to update the text and code, there shouldn’t be a reason for not using child-side foreign keys (the child-side is the table on which the FK constraint is defined).

The Facebook patch works differently: it captures changes and writes them to a delta table,  to be later (asynchronously) analyzed and make for a replay of actions on the ghost table. Continue reading » “Thoughts and ideas for Online Schema Change”

openark-kit, Facebook Online Schema Change, and thoughts on open source licenses

MySQL@Facebook team have recently published an Online Schema Change code for non blocking ALTER TABLE operations. Thumbs Up!

The code is derived from oak-online-alter-table, part of openark-kit, a toolkit I’m authoring. Looking at the documentation I can see many ideas were incorporated as well. And of course many things are different, a lot of work has been put to it by MySQL@Facebook.

openark-kit is currently released under the new BSD license, and, as far as I can tell (I’m not a lawyer), Facebook’s work has followed the license to the letter. It is a strange thing to see your code incorporated into another project. While I knew work has begun on the tool by Facebook, I wasn’t in on it except for a few preliminary email exchanges.

And this is the beauty

You release code under open source license, and anyone can pick it up and continue working on it. One doesn’t have to ask or even let you know. Eventually one may release back to the community improved code, more tested (not many comments on oak-online-alter-table in the past 18 months).

It is a beauty, that you can freely use one’s patches, and he can then use yours.

And here is my concern

When I created both openark-kit and mycheckpoint, I licensed them under the BSD license. A very permissive license. Let anyone do what they want with it, I thought. However Facebook’s announcement suddenly hit me: what license would other people use for their derived work?

The OSC has been release under permissive license back to the community (again, I am not a lawyer). But, someone else could have made it less friendly. Perhaps not release the code at all: just sell it, closed-source, embedded in their product. And I found out that I do not want anyone to do whatever they want with my code.

I want all derived work to remain open!

Which is why in next releases of code I’m authoring the license will change to less permissive and more open license, such as GPL or LGPL. (Of course, all code released so far remains under the BSD license).

EXPLAIN: missing db info

I’m further developing a general log hook, which can stream queries from the general log.

A particular direction I’m taking is to filter queries by their type of actions. For example, the tool (oak-hook-general-log) can be instructed to only stream out those queries which involve creation of a temporary table; or those which cause for a filesort, or full index scan, etc.

This is done by evaluating of query execution plans on the fly. I suspect the MySQL query analyzer roughly does the same (as a small part of what it does).

It’s almost nothing one cannot do with sed/awk. However, I bumped into a couple of problems:

  1. The general log (and the mysql.general_log table, in  particular) does not indicate the particular database one is using for the query. Since slow log does indicate this data, I filed a bug on this. I currently solve this by crossing connection id with the process list, where the current database is listed. It’s shaky, but mostly works.
  2. Just realized: there’s no DB info in the EXPLAIN output! It’s weird, since I’ve been EXPLAINing queries for years now. But I’ve always had the advantage of knowing the schema used: either because I was manually executing the query on a known schema, or mk-query-digest was kind enough to let me know.

Continue reading » “EXPLAIN: missing db info”

oak-hook-general-log: streaming general log

I’m seeking input on a new openark kit utility I’ve started to implement.

The tool, oak-hook-general-log, will hook up to a MySQL (>= 5.1) server, and stream the general log into standard output. It looks like this:

bash$ python src/oak/oak-hook-general-log.py --socket=/tmp/mysql.sock --user=root
2010-03-21 10:18:42     root[root] @ localhost []       79      1       Query   SELECT COUNT(*) FROM City
2010-03-21 10:18:48     root[root] @ localhost []       79      1       Query   DELETE FROM City WHERE id=1000
2010-03-21 10:18:54     root[root] @ localhost []       79      1       Query   SHOW PROCESSLIST
2010-03-21 10:19:06     root[root] @ localhost []       79      1       Quit
2010-03-21 10:19:07     root[root] @ localhost []       93      1       Connect root@localhost on
2010-03-21 10:19:07     root[root] @ localhost []       93      1       Query   select @@version_comment limit 1
2010-03-21 10:22:33     root[root] @ localhost []       93      1       Query   SELECT City.Name, Country.Name FROM Country JOIN City ON Country.Capit
2010-03-21 10:22:58     root[root] @ localhost []       93      1       Quit

Since output is written to stdout, one can further:

bash$ python src/oak/oak-hook-general-log.py --socket=/tmp/mysql.sock --user=root | grep Connect
bash$ python src/oak/oak-hook-general-log.py --socket=/tmp/mysql.sock --user=root | grep webuser@webhost

What the tool does is to enable table logs, and periodically rotate the mysql.general_log table, read and dump its content.

Continue reading » “oak-hook-general-log: streaming general log”