Generating Google line charts with SQL, part I

In this series of posts I wish to show how Google Charts can be generated via SQL. We’ll discuss the Google Charts limitations that must be worked around, and work our way towards a simple chart.

I’m going to present the algorithm I use in mycheckpoint, a MySQL monitoring utility, which generates Google charts from raw data using views. An example of such a chart follows:


http://chart.apis.google.com/chart?cht=lc&chs=370x180&chts=303030,12&chtt=Latest+24+hours:+Nov+9,+05:50++-++Nov+10,+05:50&chf=c,s,ffffff&chdl=Rentals+rate:+custom_1_psec&chdlp=b&chco=ff8c00&chd=s:GDGKGFLFGMJHRLMPPNULJRPLTOPRUMYPPVRNbQUSUSbSNWUOfSWTObVSUVWSVYVPbTPjfTbRTdXReUWhcTQRQZbTWYVYPaVZXdYYWPTabYUTbW99QLgLNIOIRNNMIKRJEHGFHGJGGFIFDFGDK&chxt=x,y&chxr=1,0,8.720000&chxl=0:|+||08:00||+||12:00||+||16:00||+||20:00||+||00:00||+||04:00||&chxs=0,505050,10,0,lt&chg=4.17,25,1,2,0.69,0&chxp=0,0.69,4.86,9.03,13.20,17.37,21.54,25.71,29.88,34.05,38.22,42.39,46.56,50.73,54.90,59.07,63.24,67.41,71.58,75.75,79.92,84.09,88.26,92.43,96.60&tsstart=2010-11-09+05:50:00&tsstep=600

mycheckpoint does not actually call on Google to do the chart rendering, but invokes its own JavaScript code to visualize the URL locally.

Here are some downsides to using Google Charts:

  • The URL cannot be as long as you like: 2048 characters is an upper bound you’ll want to stay below. [Google Charts POST method calls are available, which allow for the equivalent of 16K characters of URL length; this is still not too helpful, due to the nature of POST calls.]
  • Features are inconsistent. To specify label or tick positions, one must specify exact positions. To specify grid positions, one must supply a step, an offset, etc. There are more such inconsistencies.
  • Google Charts is not too friendly. Taking the ticks and grids example above: there really shouldn’t be a reason why grids are not automatically generated according to the tick definitions, yet we are required to specify positions for the ticks as well as for the grids.
  • There is no support for time series. One must translate time into x-axis values.
  • Perhaps most intimidating to many people: to generate a Google chart, one must send data to Google. This is the main reason I use local JavaScript rendering.

Anyway, let’s build a very simple chart. Since I will not cover everything in this post, we’ll allow ourselves some relaxed conditions. Continue reading » “Generating Google line charts with SQL, part I”
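To give a taste of the approach, here is a minimal sketch of producing the chd=s: data parameter (Google’s “simple encoding”) with plain SQL. The chart_data table, its id and value columns, and the scaling are all hypothetical; this is not mycheckpoint’s actual code:

-- Map each value onto the 62-character simple-encoding alphabet,
-- scaled against the series maximum, and concatenate in row order.
SELECT CONCAT(
    'http://chart.apis.google.com/chart?cht=lc&chs=370x180&chd=s:',
    GROUP_CONCAT(
      SUBSTRING(
        'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
        1 + ROUND(61 * value / max_value),
        1)
      ORDER BY id SEPARATOR '')
  ) AS chart_url
FROM chart_data, (SELECT MAX(value) AS max_value FROM chart_data) AS sel_max;

The real thing must also deal with NULLs, y-axis scaling, labels and grids, all within the URL length budget.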

Multi condition UPDATE query

A simple question I’ve been asked:

Is it possible to merge two UPDATE queries, each on different WHERE conditions, into a single query?

For example, is it possible to merge the following two UPDATE statements into one?

mysql> UPDATE film SET rental_duration=rental_duration+1 WHERE rating = 'G';
Query OK, 178 rows affected (0.01 sec)

mysql> UPDATE film SET rental_rate=rental_rate-0.5 WHERE length < 90;
Query OK, 320 rows affected (0.01 sec)

To verify our tests, we take a checksum:

mysql> pager md5sum
PAGER set to 'md5sum'
mysql> SELECT film_id, title, rental_duration, rental_rate FROM film ORDER BY film_id;
c2d253c3919efaa6d11487b1fd5061f3  -

Obviously, the following query is incorrect: Continue reading » “Multi condition UPDATE query”
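To hint at where this is headed, here is a hedged sketch of one way such a merge can be expressed, making each assignment conditional per row (illustrative only; not necessarily the post’s final answer):

UPDATE film
SET
  rental_duration = rental_duration + IF(rating = 'G', 1, 0),
  rental_rate = rental_rate - IF(length < 90, 0.5, 0)
WHERE rating = 'G' OR length < 90;

The WHERE clause merely limits the affected rows; the IF() expressions ensure each column only changes for rows satisfying its own condition.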

Another use for “top N records per group” query

A few days ago I published SQL: selecting top N records per group. It just dawned on me that the very same query solved another type of problem I was having a couple of years ago.

BTW, for reference, Baron Schwartz posted this 4 years ago. There are very interesting approaches in the text and in the comments. Good to see the ancients already knew of such problems; I should study my history better.

(Kidding, kidding! This is self criticism of course)

In the case I present now, I have a table with recurring data which, to some extent, represents revisions of data. For the sake of simplicity, let’s describe it as a simple simulation of a revision system:

  • I have text files, whose content I store within a row
  • Each text file uses at least one row in the table
  • Text files can be edited; an edited file is written as a new row (rows are never UPDATEd).

Hold your horses: I’m not really implementing a revision system, I just can’t get into the actual details here.

The table becomes large quickly, and it’s desired to purge rows from it. The rule is: we must retain the two most recent versions of each file (if there is indeed more than one). All others can be purged. So the question is: Continue reading » “Another use for “top N records per group” query”
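For contrast, here is a hedged sketch of a straightforward purge without the single-pass trick, assuming a hypothetical file_revisions table whose id increases with each edit:

-- Delete any row that has at least two newer revisions of the same file,
-- i.e. keep only the two most recent rows per file.
DELETE fr
FROM file_revisions AS fr
JOIN file_revisions AS newer
  ON newer.file_id = fr.file_id AND newer.id > fr.id
JOIN file_revisions AS newest
  ON newest.file_id = fr.file_id AND newest.id > newer.id;

On a large table this self-join grows expensive, which is exactly what makes the single-query, no-subquery approach from the referenced post attractive.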

SQL: selecting top N records per group

A while back I presented an SQL trick to return a non-aggregated column on a GROUP BY query, without the use of subqueries or derived tables.

Based on a similar concept, combined with string walking, I now present a query which selects the top N records for each group, ordered by some condition. It requires no subqueries, and executes faster than its more conventional alternatives.

[UPDATE: this is MySQL only. Others can use Window Functions where available]

Using the simple world database, we answer the following question:

What are the top 5 largest (by area) countries for each continent? What are their names, surface area and population?

Similar questions would be:

What were the latest 5 films rented by each customer?

What were the most presented advertisements for each user?

etc.

Step 1: getting the top

We already know how to get a single column’s value for the top country, as presented in the aforementioned post: Continue reading » “SQL: selecting top N records per group”
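To illustrate the kind of trick being referred to, here is a hedged sketch (not necessarily the post’s exact query) which extracts the largest country per continent by ordering names within GROUP_CONCAT and peeling off the first element:

-- Names containing the separator character would call for a different
-- separator; group_concat_max_len must also be large enough.
SELECT
  Continent,
  SUBSTRING_INDEX(
    GROUP_CONCAT(Name ORDER BY SurfaceArea DESC SEPARATOR ','),
    ',', 1) AS largest_country
FROM Country
GROUP BY Continent;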

Speaking at the O’Reilly MySQL Conference 2011

I’m very pleased and humbled to announce that my submission to the upcoming O’Reilly MySQL Conference, April 2011, has been accepted.

I will present a 45 minute session titled openark-kit: MySQL utilities for everyday use.

In this session, I will present some of the tools in the openark kit. We’ll discuss some limitations of the MySQL server, and how openark kit tools overcome those limitations and provide solutions to common maintenance and audit problems.

This will be a technical session, discussing various aspects of the MySQL server: security, execution plans, replication, triggers and more. I do not intend to discuss all the tools, nor to cover all their options. Instead, I’ll present the “behind the scenes”, show why the tools work, and present common problems and typical use cases.

This will be the first time I present at the MySQL Conference (or at any conference outside Israel, for that matter). I hope to deliver a good session. As an extra measure of safety, I’ll bring along a couple of basketballs; if the sun shines, we can all go outside and have a good time!

The idea to submit this talk (credit: Roland Bouman) gave me the inspiration to put effort into making a new release, with new and updated tools. So this talk is already a success as far as I’m concerned.

Hope to see you there!

[PS shameless plug: openark kit.]

oak-hook-general-log: your poor man’s Query Analyzer

The latest release of openark kit introduces oak-hook-general-log, a handy tool which allows for some analysis of executing queries.

Initially I just intended for the tool to be able to dump the general log to standard output, from any machine capable of connecting to MySQL. Quickly enough, I realized the power it brings.

With this tool, one can dump to standard output all queries using temporary tables; or using a specific index; or doing a full index scan; or just follow up on connections; or… For example, the following execution will only log queries which cause a filesort:

oak-hook-general-log --user=root --host=localhost --password=123456 --filter-explain-filesort

The problem with using the standard logs

So you have the general log, which you don’t often enable, since it tends to grow huge within moments. You then have the slow log. The slow log is great, and is among the top tools for MySQL diagnostics.

The slow log allows for log-queries-not-using-indexes, which is yet another nice feature. Not only can you log any query running for over X seconds, you can also log any query which does not use an index.

Wait. This logs all queries on single-row tables (no single-row table will use an index), as well as on very small tables (a common 20-row lookup table will most often be scanned). These are OK scans. This makes for some noise in the slow log.

And how about queries which do use an index, but do so poorly? They use an index, yet retrieve some 12,500,000 rows, using a temporary table & filesort?

What oak-hook-general-log does for you

This tool streams out the general log, and filters queries based on their role or their execution plan.

To work at all, it must enable the general log. Moreover, it directs the general log to the log table, as sketched after the list below. Mind that this makes for a performance impact, which is why the tool auto-terminates and restores the original log settings (the default is 1 minute, configurable). It’s really not a tool you should keep running for days. But during the few moments it runs, it will:

  • Routinely rotate the mysql.general_log table so that it doesn’t fill up
  • Examine entries found in the general log
  • Cross reference entries to the PROCESSLIST so as to deduce database context (bug #52554)
  • If required and appropriate, evaluate a query’s execution plan
  • Decide whether to dump each entry based on filtering rules
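For reference, the server-side switches involved look roughly like this (a hedged sketch of what the tool manipulates, not its literal statements):

-- Direct the general log to the mysql.general_log table rather than a file,
-- then enable it; the tool restores the original settings on termination.
SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 'ON';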

Filtering rules

Filtering rules are passed as command line options. At present, only one filtering rule applies (if more than one is specified, only one is used, so there’s no point in passing several). Some of the rules are: Continue reading » “oak-hook-general-log: your poor man’s Query Analyzer”

openark-kit (rev. 170): new tools, new functionality

I’m pleased to announce a new release of the openark kit. There’s a lot of new functionality inside; following is a brief overview.

The openark kit is a set of utilities for MySQL. They solve everyday maintenance tasks which may be complicated or time consuming to perform by hand.

It’s been a while since the last announced release. Most of my attention went to mycheckpoint: building new features, writing documentation, etc. However, my own use of the openark kit has only increased in the past few months, and new, useful solutions to common problems have been developed.

I’ve used and improved many tools over this time, but doing the final cut, along with proper documentation, took some time. Anyway, here are the highlights:

New tool: oak-hook-general-log

oak-hook-general-log hooks up to a MySQL server and dumps the general log based on filtering rules, applying to query role or execution plan. It is possible to only dump connect/disconnect entries, queries which make a full table scan, or which use temporary tables, or which scan more than X rows, or…

I’ll write more on this tool shortly.

New tool: oak-prepare-shutdown

This tool makes for an orderly and faster shutdown by safely stopping replication and flushing InnoDB pages to disk prior to shutting down (keeping the server available for connections even while dirty pages are being flushed). A typical use case would be:

oak-prepare-shutdown --user=root --ask-pass --socket=/tmp/mysql.sock && /etc/init.d/mysql stop
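Conceptually, the preparation amounts to something like the following (a hedged sketch of the idea, not necessarily the tool’s exact statements):

-- Stop replication safely, then urge InnoDB to flush dirty pages now,
-- so the eventual shutdown has little left to do.
STOP SLAVE;
SET GLOBAL innodb_max_dirty_pages_pct = 0;
-- Poll this until the dirty pages count approaches zero, then shut down:
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';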

New tool: oak-repeat-query

oak-repeat-query repeats executing a given query until some condition holds. The condition can be:

  • A given number of iterations has been reached
  • A given time period has elapsed
  • No rows have been affected by the query

The tool comes in handy for cleanup jobs, warming up caches, etc. Continue reading » “openark-kit (rev. 170): new tools, new functionality”
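As an example of the cleanup use case, the repeated query would typically be a small, LIMITed DELETE, re-executed until no rows are affected (the table and column here are hypothetical):

-- Purge old rows in small chunks; repeating until zero rows are affected
-- removes everything eligible without holding long locks.
DELETE FROM access_log
WHERE created_at < NOW() - INTERVAL 30 DAY
LIMIT 1000;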

Where’s my cnf file?

So you have a running MySQL server, it’s working well and everyone’s happy. You want to make a minor change to the configuration file, so you edit the file and restart MySQL – but the change doesn’t take!

Or maybe you want to check that some global variable has not been dynamically changed without an update to the configuration file. But the configuration file doesn’t make any sense: it looks like the file and the server have nothing in common.

Wait, which my.cnf file does MySQL read? Rather, which my.cnf files?

Ever happened to you? If you’re well organized, and only keep a single /etc/my.cnf file, you know exactly where everything is. But some systems are messier, with lots of configuration files hanging around. Which ones apply?

Let’s find out which configuration files apply.

No direct information

It would all be easier if we could just SHOW GLOBAL VARIABLES LIKE 'configuration_files_that_this_server_has_read_list'. There is no such option.
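A well-known starting point (a hint, not a full answer; it assumes mysqld is on your PATH, and shows which files would be read, not which ones actually were) is to ask the server binary itself:

mysqld --verbose --help 2>/dev/null | grep -A 1 "Default options"

This prints the candidate configuration files in the order the server would read them.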

Continue reading » “Where’s my cnf file?”

Guide to MySQL installation files

Even for DBAs already familiar with MySQL, the choice of installation methods and the variety of install/package files is overwhelming and confusing.

I’ll make a (very!) brief introduction to the various installation options, concentrating on the Linux operating system, and provide a simple shopping list.

For illustration, I’ll refer to the MySQL 5.1.52 community edition, the latest at the time of this writing. Downloads are available at http://dev.mysql.com/downloads/mysql/5.1.html.

I will then refer to alternate distributions.

A note for Windows users

You are lucky: your choice is very clear. Download the Windows MSI package. Choose 32 or 64 bit architecture according to your OS. If you know about the other ways to install and setup MySQL under Windows, you don’t need this post.

Linux: repositories

The easiest way to install MySQL on Linux is to use your distro’s repository. Just:

sudo apt-get install mysql-server-5.1

or

sudo yum install MySQL-Server-5.1

Your distro should resolve any package dependencies.

I’m of the opinion that, if MySQL is the main application to be used on a server, the distribution’s default repositories are not the way to go. Reasons include outdated MySQL versions, the inability to install multiple instances, and the danger of automatic upgrades or downgrades. Read this post and the discussion that follows for more.

Linux: packages

If you’re a RedHat/CentOS/SuSE user, you’re in luck: MySQL provides pre-built RPM packages for your system. I vaguely remember MySQL announcing that Ubuntu was to be supported. That was a couple of years ago; there is still no package for Debian/Ubuntu.

Continue reading » “Guide to MySQL installation files”

An argument for using mysqldump

I fully agree with Morgan’s An argument for not using mysqldump. This post does not come to contradict it, but rather to shed a positive light on mysqldump.

I usually prefer an LVM snapshot based backup, or using XtraBackup. And with databases as large as dozens of GB and above, I consider mysqldump a poor alternative: poor in runtime, poor in overhead while taking the backup.

However once in a while I get to be reminded that mysqldump just works.

As a recent example, I had a server which was killed after an ALTER TABLE statement hung forever (the table was already ALTERed, but the old copy was never dropped). The old table’s data still hung around the file system, but was not recognized by InnoDB. Trying out DISCARD TABLESPACE did not do the job, and eventually the file was dropped.

So far, reasonable. InnoDB would complain about some table it never recognized in the first place, but all would work. That is, until backups came into the picture. With innobackup or XtraBackup the restore would fail on some internal problem. LVM would work, but would only copy+paste the problem: innobackup would never again be usable on this database. Continue reading » “An argument for using mysqldump”