SQL: Ranking without self join

The common way of solving the classic SQL ranking problem involves a self join. I wish to present a different solution, which iterates the table only once and provides the same output.

The ranking problem

Given a table with names and scores (e.g. students’ exam scores), add a rank to each row, such that the rank identifies the row’s position among the other rows. Rows with identical scores should receive the same rank (e.g. both contenders got the silver medal).

Consider the following table (download score.sql):

mysql> select * from score;
+----------+--------------+-------+
| score_id | student_name | score |
+----------+--------------+-------+
|        1 | Wallace      |    95 |
|        2 | Gromit       |    97 |
|        3 | Shaun        |    85 |
|        4 | McGraw       |    92 |
|        5 | Preston      |    92 |
+----------+--------------+-------+
5 rows in set (0.00 sec)

We wish to present ranks in some way similar to:

+----------+--------------+-------+------+
| score_id | student_name | score | rank |
+----------+--------------+-------+------+
|        2 | Gromit       |    97 |    1 |
|        1 | Wallace      |    95 |    2 |
|        4 | McGraw       |    92 |    3 |
|        5 | Preston      |    92 |    3 |
|        3 | Shaun        |    85 |    4 |
+----------+--------------+-------+------+
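
For comparison, the common self-join approach mentioned above can be sketched as follows. This is a minimal sketch against the score table shown here, not the single-pass solution this post goes on to present. The aliases s1 and s2 are illustrative, and rank is quoted with backticks because it is a reserved word in recent MySQL versions.

```sql
-- Self-join baseline: a row's rank is the number of distinct scores
-- that are greater than or equal to its own score, so rows with
-- identical scores share a rank and no rank numbers are skipped.
SELECT
  s1.score_id,
  s1.student_name,
  s1.score,
  COUNT(DISTINCT s2.score) AS `rank`
FROM score AS s1
JOIN score AS s2 ON s2.score >= s1.score
GROUP BY s1.score_id, s1.student_name, s1.score
ORDER BY s1.score DESC, s1.score_id;
```

Every row of s1 is matched against all rows of s2 with an equal or higher score, which is why this approach becomes expensive on larger tables.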


InnoDB is dead. Long live InnoDB!

I find myself converting more and more customers’ databases to the InnoDB plugin. In one case, it was a last resort: disk space was running out, and the plugin’s compression freed 75% of the space; in another, a slow disk made for IO bottlenecks, and the plugin’s improvements & compression alleviated the problem; in yet another, I used the above to fight replication lag on a stubborn slave.

In all those cases, I needed to justify the move to “new technology”. The questions “Is it GA? Is it stable?” are being asked a lot. Well, just a few days ago the MySQL 5.1 distribution started shipping with InnoDB plugin 1.0.4. That gives some weight to the stability argument when facing a doubtful customer.

But I realized that wasn’t the point.


SphinxSE 0.9.9-RC2 bug workaround

There is a serious bug in the sphinx storage engine, introduced in 0.9.9-RC2 (and not yet fixed in the latest revisions; last checked with rev 2006).

I would usually just revert to an older version (0.9.9-RC1 does not contain this bug), but RC2 introduces an important feature: the sphinx_snippets() function, which allows for creation of snippets from within MySQL, and which makes the sphinx integration with MySQL complete, as far as the application is concerned.

The bug

The bug is described here and here (and see further discussions). Though it’s claimed to have been fixed, it’s been re-reported, and I’ve tried quite a few revisions and verified it has not been fixed (tested on Debian/Ubuntu x64). Essentially, the bug does not allow you to set filters on a query issued from within the SphinxSE. For example, the following queries fail:

SELECT ... FROM ... WHERE query='python;mode=any;sort=relevance;limit=200;range=myUnixTimestamp,1249506000,1252184400;'
SELECT ... FROM ... WHERE query='python;mode=any;sort=relevance;limit=200;filter=my_field,1;'

While the following query succeeds:

SELECT ... FROM ... WHERE query='python;mode=any;sort=relevance;limit=200;'

The error message is this:

ERROR 1430 (HY000): There was a problem processing the query on the foreign data source. Data source error: searchd error: invalid or truncated request

I see this as a serious bug in SphinxSE: it renders the engine useless; searching without the ability to filter is not something I can live with.

Generating numbers out of seemingly thin air

In some of my previous posts I’ve used a numbers table, like one holding values 1, 2, 3, …, 255. Such a table can be used for string walking, joining with other tables, and performing iterations.

The need for numbers tables has always been a bit of a pain. Yes, they’re very, very simple, but they need to be there. So if you just need to script some SQL query, you may find that you need to create such tables. Ummm… this means you need to have privileges (at least CREATE TEMPORARY and INSERT, if not CREATE).

The other day, Baron Schwartz posted How to round to the nearest whole multiple or fraction in SQL. In an offhand way, he generated some random numbers using the mysql.help_topic table. I then realized that post solved something I’ve been looking for: using a sure-to-exist table on any MySQL installation.
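
A minimal sketch of the idea, assuming the help tables are populated (as they normally are on a standard installation), that mysql.help_topic holds at least 255 rows, and that the account may SELECT from the mysql schema. The user variable @n and the alias init are illustrative names, and this is not necessarily the exact query the full post arrives at:

```sql
-- Generate the numbers 1..255 without creating any table.
-- mysql.help_topic serves only as a source of "enough rows";
-- its contents are ignored.
SELECT @n := @n + 1 AS n
FROM mysql.help_topic,
     (SELECT @n := 0) AS init
LIMIT 255;
```

Note that assigning user variables inside expressions is deprecated as of MySQL 8.0, so this is best read as a trick for the MySQL versions discussed here.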


SQL pie chart

My other half says I’m losing it. But I think that as an enthusiast kernel developer she doesn’t have the right to criticize people. (“I like user space better!” – she exclaims upon reading this).

Shown below is a (single query) SQL-generated pie chart. I will walk through the steps towards making this happen, and conclude with what, I hope you’ll agree, are real-world, useful usage samples.

+----------------------------------------------------------------------+
| pie_chart                                                            |
+----------------------------------------------------------------------+
|                                                                      |
|                         ;;;;;;;;;;;;;;;;;;;;;                        |
|                  oooooooo;;;;;;;;;;;;;;;;;;;;;;;;;;;                 |
|             oooooooooooooo;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;            |
|          ooooooooooooooooo                 ;;;;;;;;;;;;#####         |
|        oooooooooooooo                           ;#############       |
|       oooooooooooo                                 ############      |
|      oooooooooooo                                   ############     |
|      ooooooooooo                                     ###########     |
|      oooooooooooo                                   ::::::::::::     |
|       oooooooooooo                                 ::::::::::::      |
|        ooooooooo:::::                           ::::::::::::::       |
|          o::::::::::::::::                 :::::::::::::::::         |
|             :::::::::::::::::::::::::::::::::::::::::::::            |
|                  :::::::::::::::::::::::::::::::::::                 |
|                         :::::::::::::::::::::                        |
|                                                                      |
| ##  red: 1 (10%)                                                     |
| ;;  blue: 2 (20%)                                                    |
| oo  orange: 3 (30%)                                                  |
| ::  white: 4 (40%)                                                   |
+----------------------------------------------------------------------+

Requirements

We need a generic query, which returns at least these two columns: name_column and value_column. For example, the following query will do:

SQL graphics

SQL is not meant to generate graphics, for sure; but I see some cases where generating non-tabular output can be desirable, as I will show in future posts.

I’d like to explain the basics of working with SQL graphics: it is actually possible to do whatever you like. How?

Coordinate system

We’ll now develop a coordinate system using SQL. By producing this, I will have proven my point that anything is possible, and will provide an additional proof of concept.

To start generating coordinates, I’ll need a helper table: a numbers table (tinyint_asc, for example, with numbers ranging 0..255).

We’ll strive to produce a 10×10 coordinate matrix. To do this, we’ll self-join the numbers table, and use a helper variable to set the size of the matrix.
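
The 10×10 matrix described in this excerpt can be sketched roughly as follows. This is an illustration under assumptions rather than the query from the full post: the column name value, the variable name @size and the aliases numbers_x and numbers_y are hypothetical, and tinyint_asc is assumed to hold the integers 0..255.

```sql
-- Helper variable: the width and height of the coordinate matrix.
SET @size := 10;

-- Self-join the numbers table to get every (x, y) pair
-- with 0 <= x < @size and 0 <= y < @size.
SELECT
  numbers_y.value AS y,
  numbers_x.value AS x
FROM tinyint_asc AS numbers_x
CROSS JOIN tinyint_asc AS numbers_y
WHERE numbers_x.value < @size
  AND numbers_y.value < @size
ORDER BY y, x;
```

With @size set to 10 this returns 100 rows, one per cell, ordered row by row.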

“Vote for me…” how to embed in WordPress

[Clarification: I’m not actually asking you to vote for me :D, the title just follows a previous post]

Diego Medina has published JavaScript code that can be embedded in your blog posts, and which allows for voting on Planet MySQL from within your blog.

Shared below is how to set this up for WordPress users. This is not a WordPress plugin, mind you. You’ll need to manually edit the WordPress template files (can be done from the Dashboard->Appearance->Editor->Single post).

The page you’re likely to edit is single.php, but depending on your template this can change. The explanation below assumes a single post page. This can also be worked out for your blog’s home page, which lists several entries.

Since there is no point in presenting the Planet MySQL voting widget for entries which do not relate to MySQL, the code verifies that the post is in the ‘MySQL’ category. You need to change this if your categorization differs. Mind that the category’s name is case sensitive.

Reasons to use InnoDB Plugin

I wish to present some compelling reasons to use the InnoDB plugin. The plugin is a drop-in replacement for the “normal” InnoDB engine, enabling many new features. It is the outcome of a long silence from Innobase (Oracle), which was thought to be neglecting the InnoDB engine.

I’m going to leave out “performance” for the reason that grander forces have benchmarked and written about it.

Compression

Using the new Barracuda file format, table data can be compressed. Compression depends on the type of data you have in your table, and on KEY_BLOCK_SIZE. I have found tables with lots of textual data to compress well, to about 25% volume (that is, reduction of 75%), and strictly integer-typed tables (like an a-2-b connecting table) to compress poorly.

I have seen a 50GB InnoDB database shrink to some 12GB only. Wow! That meant a server which only had two 72GB disks in RAID 1, and which was dangerously close to running out of disk space, could now accommodate the database, a backup, and then some!
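
As a rough sketch of what enabling compression looks like with the InnoDB plugin on MySQL 5.1: the table name my_table is hypothetical, and KEY_BLOCK_SIZE=8 is only an example value to experiment with, not a recommendation from this post.

```sql
-- Compressed tables require per-table tablespaces and the Barracuda file format.
-- (These settings apply to the InnoDB plugin era; innodb_file_format
-- no longer exists in MySQL 8.0.)
SET GLOBAL innodb_file_per_table = 1;
SET GLOBAL innodb_file_format = 'Barracuda';

-- Rebuild an existing table as compressed, with 8KB compressed pages.
ALTER TABLE my_table
  ENGINE=InnoDB
  ROW_FORMAT=COMPRESSED
  KEY_BLOCK_SIZE=8;
```

The ALTER rebuilds the whole table, so on a large table it is worth trying a copy first and comparing the resulting .ibd file sizes for a few KEY_BLOCK_SIZE values.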


Auto scaling, scaled SQL graphs concluded

I wasn’t sure I would go this far. After catching my breath, I have added the following to Generic, auto scaling, scaled SQL graphs, and these will conclude my current hacks:

  • Displaying X-axis min/max values.
  • Support for Y-axis values precision.
  • Support for pre-defined scale range.

The addition of the above makes for presentable, usable graphs. See also sample graphs at the end of this post.

Step 8: adding X-axis values

I add minimum/maximum X-scale values to the graph. What was just ordering_column before now turns out to be the x in the y = f(x) function.

Generic, auto scaling, scaled SQL graphs

In Rotating SQL graphs horizontally, I have shown how to rotate an ASCII SQL graph into horizontal position.

I was dissatisfied with some parts of the solution, which I will show now how to fix:

  • I had to manually scale the graph values so as to fit nicely on screen.
  • I had to rely on hard coded scaling schemes.
  • I had to rely on hard coded column names.
  • I had no y-axis legend.

I will now present an SQL query which allows for pluggable queries, and which creates automatically scaled graphs, along with y-axis scales.

Using deeply nested subqueries, we will evolve a simple SELECT query into an elaborate graph. I will present the many steps required, followed by explanations and sample results. But in the end – the steps are unimportant. I’ll present a generic query, into which your own SELECT can be embedded, and which will provide you with the graph.

We’ll use the same example, found in graph.sql.
