INFORMATION_SCHEMA.GLOBAL_STATUS: watch out

MySQL 5.1 boasts some new and useful INFORMATION_SCHEMA tables. Among them is the GLOBAL_STATUS table.

At last, it is possible to ask questions like:

node1> SELECT * FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'innodb_os_log_written';
+-----------------------+----------------+
| VARIABLE_NAME         | VARIABLE_VALUE |
+-----------------------+----------------+
| INNODB_OS_LOG_WRITTEN | 512            |
+-----------------------+----------------+
1 row in set (0.00 sec)

node1> SELECT * FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'questions';
+---------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------+----------------+
| QUESTIONS     | 28             |
+---------------+----------------+
1 row in set (0.00 sec)

Watch out #1

As with all INFORMATION_SCHEMA tables, to get a single row one needs to materialize the entire table. To ask the above two questions, the table will materialize twice. This means gathering all the information — twice. To get 20 values, we materialize the table 20 times. It not only takes time, but also increases some of the status variables themselves, like questions, select_scan, created_tmp_tables. Ironically, when we used SHOW GLOBAL STATUS and had to parse the results in our application code, we only issued the query once. But with the convenience of INFORMATION_SCHEMA, it’s much easier (and makes more sense!) to query per variable.

Continue reading » “INFORMATION_SCHEMA.GLOBAL_STATUS: watch out”

How NOT to test that mysqld is alive

I had a call from a new customer last week. They had issues with their MySQL server. Apparently, it was repeatingly crashing, every few hours. To have their production system kept alive, they used a script to periodically see if MySQL was still alive, and if not – start it.

I was first thinking in directions of old installations; wrong memory allocations (too little memory for large content; to large memory allocation for weak machine). When I reached the customer’s premises, I quickly reviewed general settings, while explaining some basic configuration guidelines. There were peculiarities (e.g. query_cache_limit being larger than query_cache_size), but nothing to obviously suggest a reason for crash.

I then observed the error log. Took some time to find it, since the log_error parameter appeared twice in the my.cnf file; first one appearing at the very beginning, the second (overwriting the first) was far into the file.

Sure enough, there were a lot of startup messages. And… shutdown messages. In fact, the log looked something like: Continue reading » “How NOT to test that mysqld is alive”

High Performance MySQL – a book to re-read

I first read High Performance MySQL, 2nd edition about a year ago, when it first came out. I since re-read a few pages on occasion.

In my previous posts I’ve suggested ways to improve upon the common ranking solution. Very innovative stuff! Or… so I thought.

I happened to browse through the book today, and a section on User Variables caught my eye. “Let’s see if I get get some insight“, I thought to myself. Imagine my surprise when I realized almost everything I’ve suggested is discussed in this modest section, black on white, sitting on my bookshelf for over a year!

I have read it a year back, have forgotten all about it, have re-invented stuff already solved and discussed… Oh, for more brain capacity…

To be honest, this has happened to me more than once in the past few months; I’m taking the habit of browsing the web when I’m looking for answers to my problems; I forget that this book contains the answers to so many common, practical MySQL problems, and does so in a very direct and helpful manner.

So, yet again, thumbs up to High Performance MySQL. Really a must book. Get it if you haven’t already!

SQL: ranking without self join, revisited

This post follows SQL: Ranking without self join and On user variables evaluation order. I wish to share some insights with regard to user variables evaluation, as well as provide yet another ranking solution, which attempts to overcome the uncertainty factor with user variables.

There will be hand waving in this post (albeit empirical hand waving). Stop here if you don’t like hand waving. Continue if you feel curious or wish to contradict my assumptions.

Recap

The order of evaluation of user variables is undefined. The documentation has some contradicting example (bug 47514), but states that variables should not be read and assigned in different parts of the same statement (just what is a different part? Bug 47516).

Looking for a solution

There doesn’t seem to be a problem with reading and assigning variables in the very same part of the statement. For example, SELECT @a := @a+1… is exactly such a case.

Continue reading » “SQL: ranking without self join, revisited”

On user variables evaluation order

There is something very unclear about what is defined and is undefined with regard to the order by which user variables are evaluated, even within the MySQL documentation itself.

I wish to present some examples and draw a conclusion. Since I will following state there’s missing information, I would greatly appreciate any educated comments.

The trivial “reordering problem” case

Let’s look at the following query: Continue reading » “On user variables evaluation order”

SQL: Ranking without self join

The common way of solving the classic SQL problem of ranking, involves a  self join. I wish to present a different solution, which only iterates the table once, and provides the same output.

The ranking problem

Given a table with names and scores (e.g. students exams scores), add rank for each row, such that the rank identifies her position among other rows. Rows with identical scores should receive the same rank (e.g. both contenders got the silver medal).

Consider the following table (download score.sql):

mysql> select * from score;
+----------+--------------+-------+
| score_id | student_name | score |
+----------+--------------+-------+
|        1 | Wallace      |    95 |
|        2 | Gromit       |    97 |
|        3 | Shaun        |    85 |
|        4 | McGraw       |    92 |
|        5 | Preston      |    92 |
+----------+--------------+-------+
5 rows in set (0.00 sec)

We wish to present ranks in some way similar to:

+----------+--------------+-------+------+
| score_id | student_name | score | rank |
+----------+--------------+-------+------+
|        2 | Gromit       |    97 |    1 |
|        1 | Wallace      |    95 |    2 |
|        4 | McGraw       |    92 |    3 |
|        5 | Preston      |    92 |    3 |
|        3 | Shaun        |    85 |    4 |
+----------+--------------+-------+------+

Continue reading » “SQL: Ranking without self join”

InnoDB is dead. Long live InnoDB!

I find myself converting more and more customers’ databases to InnoDB plugin. In one case, it was a last resort: disk space was running out, and plugin’s compression released 75% space; in another, a slow disk made for IO bottlenecks, and plugin’s improvements & compression alleviated the problem; in yet another, I used the above to fight replication lag on a stubborn slave.

In all those case, I needed to justify the move to “new technology”. The questions “Is it GA? Is it stable?” are being asked a lot. Well, just a few days ago the MySQL 5.1 distribution started shipping with InnoDB plugin 1.0.4. That gives some weight to the stability question when facing a doubtful customer.

But I realized that wasn’t the point.

Continue reading » “InnoDB is dead. Long live InnoDB!”

SphinxSE 0.9.9-RC2 bug workaround

There is a serious bug with the sphinx storage engine, introduced in 0.9.9-RC2 (and which has not been fixed in latest revisions, as yet – last checked with rev 2006).

I would usually just revert to an older version (0.9.9-RC1 does not contain this bug), but for the reason that RC2 introduces an important feature: the sphinx_snippets() function, which allows for creation of snippets from within MySQL, and which makes the sphinx integration with MySQL complete, as far as the application is concerned.

The bug

The bug is described here and here (and see further discussions). Though it’s claimed to have been fixed, it’s been re-reported, and I’ve tried quite a few revisions and verified it has not been fixed (tested on Debian/Ubuntu x64). Essentially, the bug does not allow you to set filters on a query issued from within the SphinxSE. For example, the following queries fail:

SELECT ... FROM ... WHERE query='python;mode=any;sort=relevance;limit=200;range=myUnixTimestamp,1249506000,1252184400;'
SELECT ... FROM ... WHERE query='python;mode=any;sort=relevance;limit=200;filter=my_field,1;'

While the following query succeeds:

SELECT ... FROM ... WHERE query='python;mode=any;sort=relevance;limit=200;'

The error message is this:

ERROR 1430 (HY000): There was a problem processing the query on the foreign data source. Data source error: searchd error: invalid or truncated request

I see this as a serious bug in the SphinxSE: it renders it useless; searching without the ability to filter is not something I can live with. Continue reading » “SphinxSE 0.9.9-RC2 bug workaround”

Generating numbers out of seemingly thin air

In some of my previous posts I’ve used a numbers table, like one holding values 1, 2, 3, …, 255. Such table can be used for string walking, joining with other tables, performing iterations.

The existence of number tables has always been a little pain. Yes, they’re very, very simple, but they need to be there. So if you just need to script some SQL query, you may find that you need to create such tables. Ummm… this means you need to have privileges (at least CREATE TEMPORARY and INSERT, if not CREATE).

The other day, Baron Schwartz posted How to round to the nearest whole multiple or fraction in SQL. In an offhand way, he generated some random numbers using the mysql.help_topic table. I then realized that post solved something I’ve been looking for: using a sure-to-exist table on any MySQL installation.

Continue reading » “Generating numbers out of seemingly thin air”

SQL pie chart

My other half says I’m losing it. But I think that as an enthusiast kernel developer she doesn’t have the right to criticize people. (“I like user space better!” – she exclaims upon reading this).

Shown below is a (single query) SQL-generated pie chart. I will walk through the steps towards making this happen, and conclude with what, I hope you’ll agree, are real-world, useful usage samples.

+----------------------------------------------------------------------+
| pie_chart                                                            |
+----------------------------------------------------------------------+
|                                                                      |
|                         ;;;;;;;;;;;;;;;;;;;;;                        |
|                  oooooooo;;;;;;;;;;;;;;;;;;;;;;;;;;;                 |
|             oooooooooooooo;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;            |
|          ooooooooooooooooo                 ;;;;;;;;;;;;#####         |
|        oooooooooooooo                           ;#############       |
|       oooooooooooo                                 ############      |
|      oooooooooooo                                   ############     |
|      ooooooooooo                                     ###########     |
|      oooooooooooo                                   ::::::::::::     |
|       oooooooooooo                                 ::::::::::::      |
|        ooooooooo:::::                           ::::::::::::::       |
|          o::::::::::::::::                 :::::::::::::::::         |
|             :::::::::::::::::::::::::::::::::::::::::::::            |
|                  :::::::::::::::::::::::::::::::::::                 |
|                         :::::::::::::::::::::                        |
|                                                                      |
| ##  red: 1 (10%)                                                     |
| ;;  blue: 2 (20%)                                                    |
| oo  orange: 3 (30%)                                                  |
| ::  white: 4 (40%)                                                   |
+----------------------------------------------------------------------+

Requirements

We need a generic query, which returns at least these two columns: name_column and value_column. For example, the following query will do: Continue reading » “SQL pie chart”