Dynamic sequencing with a single query

It is a known trick to use a session variables for dynamically counting/sequencing rows. The way to go is to SET a variable to zero, then use arithmetic within assignment to increment its value for each row in the SELECTed rows.

For example, the following query lists the top 10 populated countries, using MySQL’s world database:

SELECT Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+--------------------+------------+
| Code | Name               | Population |
+------+--------------------+------------+
| CHN  | China              | 1277558000 |
| IND  | India              | 1013662000 |
| USA  | United States      |  278357000 |
| IDN  | Indonesia          |  212107000 |
| BRA  | Brazil             |  170115000 |
| PAK  | Pakistan           |  156483000 |
| RUS  | Russian Federation |  146934000 |
| BGD  | Bangladesh         |  129155000 |
| JPN  | Japan              |  126714000 |
| NGA  | Nigeria            |  111506000 |
+------+--------------------+------------+

The results do not provide any sequence number. Nor does the table have an AUTO_INCREMENT or otherwise unique row number. If I were to rate the countries by population, the common trick is:

SET @rank := 0;
SELECT
  @rank := @rank+1 AS rank,
  Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|    1 | CHN  | China              | 1277558000 |
|    2 | IND  | India              | 1013662000 |
|    3 | USA  | United States      |  278357000 |
|    4 | IDN  | Indonesia          |  212107000 |
|    5 | BRA  | Brazil             |  170115000 |
|    6 | PAK  | Pakistan           |  156483000 |
|    7 | RUS  | Russian Federation |  146934000 |
|    8 | BGD  | Bangladesh         |  129155000 |
|    9 | JPN  | Japan              |  126714000 |
|   10 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+

The first query sets the @rank to zero, so that it is not NULL (since no arithmetic can be done with NULL).  The second query relies on its success.

Can the same be achieved with one query only? That’s more of a problem. To try this out, I log out (important, otherwise @rank still has its previous value) from my client, log in again, and try the following:

SELECT
  @rank := IFNULL(@rank,0)+1 AS rank,
  Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|    1 | CHN  | China              | 1277558000 |
|    1 | IND  | India              | 1013662000 |
|    1 | USA  | United States      |  278357000 |
|    1 | IDN  | Indonesia          |  212107000 |
|    1 | BRA  | Brazil             |  170115000 |
|    1 | PAK  | Pakistan           |  156483000 |
|    1 | RUS  | Russian Federation |  146934000 |
|    1 | BGD  | Bangladesh         |  129155000 |
|    1 | JPN  | Japan              |  126714000 |
|    1 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+

Ooops. When a session variable is NULL, it only gets assigned after the query completes, instead of per row. (For fun, try running the above query again in the same session, and see what values you get for @rank).

I do not know the reason for this behavior. I don’t even know if it’s intended. But I do want to make a workaround. So I try by using various techniques:

SELECT
  @rank := CASE @rank WHEN NULL THEN 0 ELSE @rank + 1 END AS rank,
  Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
| NULL | CHN  | China              | 1277558000 |
| NULL | IND  | India              | 1013662000 |
| NULL | USA  | United States      |  278357000 |
| NULL | IDN  | Indonesia          |  212107000 |
| NULL | BRA  | Brazil             |  170115000 |
| NULL | PAK  | Pakistan           |  156483000 |
| NULL | RUS  | Russian Federation |  146934000 |
| NULL | BGD  | Bangladesh         |  129155000 |
| NULL | JPN  | Japan              |  126714000 |
| NULL | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+

Well, that wouldn’t work since NULL compared with NULL returns NULL, right? Let’s try another:

SELECT
  @rank := CASE WHEN @rank IS NULL THEN 0 ELSE @rank + 1 END AS rank,
  Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|    0 | CHN  | China              | 1277558000 |
|    0 | IND  | India              | 1013662000 |
|    0 | USA  | United States      |  278357000 |
|    0 | IDN  | Indonesia          |  212107000 |
|    0 | BRA  | Brazil             |  170115000 |
|    0 | PAK  | Pakistan           |  156483000 |
|    0 | RUS  | Russian Federation |  146934000 |
|    0 | BGD  | Bangladesh         |  129155000 |
|    0 | JPN  | Japan              |  126714000 |
|    0 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+

We can go on like this (and I did) trying to force the session variable into being set to 0 after the first row. Once can try nested assignment, selecting from DUAL, using IF, NULLIF and more. Still, MySQL will only set the variable, if it’s NULL, after the query completes. A solution is to force the variable to zero before the query begins. I will use a UNION ALL, in which the first part sets the @rank, and the second performs the query. Since its a UNION, I need to have the same number of columns in both parts. Moreover, since I’m ORDERing by Population, a column named `Population` must exist in the first part. This leads to the following query:

SELECT NULL AS rank, NULL AS Code, NULL AS Name, NULL AS Population
  FROM DUAL WHERE (@rank := 0)<0
UNION ALL
SELECT @rank := @rank + 1 AS rank, Code, Name, Population
  FROM Country ORDER BY Population DESC LIMIT 10

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|   94 | CHN  | China              | 1277558000 |
|   72 | IND  | India              | 1013662000 |
|  229 | USA  | United States      |  278357000 |
|   71 | IDN  | Indonesia          |  212107000 |
|   29 | BRA  | Brazil             |  170115000 |
|  152 | PAK  | Pakistan           |  156483000 |
|  226 | RUS  | Russian Federation |  146934000 |
|   19 | BGD  | Bangladesh         |  129155000 |
|   82 | JPN  | Japan              |  126714000 |
|  146 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+

The first query in the UNION should not return any rows, hence the impossible (@rank := 0)<0 condition.

Well, the rank has numbers all right, but what kind of numbers are these? Apparently the ranking took place before the ORDER BY. Not giving up, we try one more time:

SELECT NULL AS rank, NULL AS Code, NULL AS Name, NULL AS Population
  FROM DUAL WHERE (@rank := 0)<0
UNION ALL
SELECT @rank := @rank + 1 AS rank, Code, Name, Population
  FROM (SELECT Code, Name, Population
    FROM Country ORDER BY Population DESC LIMIT 10) AS c

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|    1 | CHN  | China              | 1277558000 |
|    2 | IND  | India              | 1013662000 |
|    3 | USA  | United States      |  278357000 |
|    4 | IDN  | Indonesia          |  212107000 |
|    5 | BRA  | Brazil             |  170115000 |
|    6 | PAK  | Pakistan           |  156483000 |
|    7 | RUS  | Russian Federation |  146934000 |
|    8 | BGD  | Bangladesh         |  129155000 |
|    9 | JPN  | Japan              |  126714000 |
|   10 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+

Now we’ve got it!

The question arises: why go through all this when a simple two-queries solution is available?

First, as a MySQL excercise, I find this an interesting problem. Second, it just may be possible you’ll be bound with one single query. For example, reporting tools may only allow for one query per report table. As another example, you may not have a sophisticated connection pool, and you are bound for sending one query per connection, hence unable to store session variables in between.

If you know of other solutions, hopefully simpler ones, please comment below!

12 thoughts on “Dynamic sequencing with a single query

  1. http://dev.mysql.com/doc/refman/5.0/en/unnamed-views.html

    “Subqueries in the FROM clause are executed even for the EXPLAIN statement (that is, derived temporary tables are built). This occurs because upper-level queries need information about all tables during the optimization phase, and the table represented by a subquery in the FROM clause is unavailable unless the subquery is executed.”

    So, it sounds like the derived query will be executed before the optimizer even creates the plan for the outer query, assuming the outer query references tables.

    That page also has an interesting example demonstrating how this side effect can result in the modification of table data if you do an EXPLAIN SELECT when the subquery contains a call to a stored procedure.

  2. Hi Shlomi!

    Just a nitpick – a “rank” is commonly understood as assigning a grade corresponding to the sorting order of the attribute *value* – not necessarily the same as the occurrence.

    In the case you present, the top 10 countries by population all have a different value for population, so you get to assign a unique number. But what if two (or more) of the top 10 happen to have the same value for population?

    Well, the common notion for rank is that it should assign a number the occurrence of distinct values – not the occurrence of rows. This means that the same rank can appear if the population value happened to be the same.

    What you are doing here is number the rows in order of appearance by population.

    Here’s a good explanation of the difference between rank and ordered numbering:

    http://technology.amis.nl/blog/189/analytical-sql-functions-theory-and-examples-part-2-on-the-order-by-and-windowing-clauses

    kind regards,

    ROland Bouman

  3. You were so close early on!

    SELECT
    @rank := IFNULL(@rank,0)+1 AS rank,
    Code, Name, Population
    FROM Country ORDER BY Population DESC LIMIT 10;

    If you use COALESCE instead of IFNULL, for the same purpose you were trying to achieve, you would have achieved it!

    http://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html#function_coalesce

    select @rank:=COALESCE(@rank,0)+1 AS rank,
    Code, Name, Population
    FROM Country ORDER BY Population DESC LIMIT 10;

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.