Dynamic sequencing with a single query

December 3, 2008

It is a known trick to use a session variables for dynamically counting/sequencing rows. The way to go is to SET a variable to zero, then use arithmetic within assignment to increment its value for each row in the SELECTed rows.

For example, the following query lists the top 10 populated countries, using MySQL's world database:

SELECT Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+--------------------+------------+
| Code | Name               | Population |
+------+--------------------+------------+
| CHN  | China              | 1277558000 |
| IND  | India              | 1013662000 |
| USA  | United States      |  278357000 |
| IDN  | Indonesia          |  212107000 |
| BRA  | Brazil             |  170115000 |
| PAK  | Pakistan           |  156483000 |
| RUS  | Russian Federation |  146934000 |
| BGD  | Bangladesh         |  129155000 |
| JPN  | Japan              |  126714000 |
| NGA  | Nigeria            |  111506000 |
+------+--------------------+------------+

The results do not provide any sequence number. Nor does the table have an AUTO_INCREMENT or otherwise unique row number. If I were to rate the countries by population, the common trick is:

SET @rank := 0;
SELECT
  @rank := @rank+1 AS rank,
  Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|    1 | CHN  | China              | 1277558000 |
|    2 | IND  | India              | 1013662000 |
|    3 | USA  | United States      |  278357000 |
|    4 | IDN  | Indonesia          |  212107000 |
|    5 | BRA  | Brazil             |  170115000 |
|    6 | PAK  | Pakistan           |  156483000 |
|    7 | RUS  | Russian Federation |  146934000 |
|    8 | BGD  | Bangladesh         |  129155000 |
|    9 | JPN  | Japan              |  126714000 |
|   10 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+

The first query sets the @rank to zero, so that it is not NULL (since no arithmetic can be done with NULL).  The second query relies on its success.

Can the same be achieved with one query only? That's more of a problem. To try this out, I log out (important, otherwise @rank still has its previous value) from my client, log in again, and try the following:

SELECT
  @rank := IFNULL(@rank,0)+1 AS rank,
  Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|    1 | CHN  | China              | 1277558000 |
|    1 | IND  | India              | 1013662000 |
|    1 | USA  | United States      |  278357000 |
|    1 | IDN  | Indonesia          |  212107000 |
|    1 | BRA  | Brazil             |  170115000 |
|    1 | PAK  | Pakistan           |  156483000 |
|    1 | RUS  | Russian Federation |  146934000 |
|    1 | BGD  | Bangladesh         |  129155000 |
|    1 | JPN  | Japan              |  126714000 |
|    1 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+

Ooops. When a session variable is NULL, it only gets assigned after the query completes, instead of per row. (For fun, try running the above query again in the same session, and see what values you get for @rank).

I do not know the reason for this behavior. I don't even know if it's intended. But I do want to make a workaround. So I try by using various techniques:

SELECT
  @rank := CASE @rank WHEN NULL THEN 0 ELSE @rank + 1 END AS rank,
  Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
| NULL | CHN  | China              | 1277558000 |
| NULL | IND  | India              | 1013662000 |
| NULL | USA  | United States      |  278357000 |
| NULL | IDN  | Indonesia          |  212107000 |
| NULL | BRA  | Brazil             |  170115000 |
| NULL | PAK  | Pakistan           |  156483000 |
| NULL | RUS  | Russian Federation |  146934000 |
| NULL | BGD  | Bangladesh         |  129155000 |
| NULL | JPN  | Japan              |  126714000 |
| NULL | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+

Well, that wouldn't work since NULL compared with NULL returns NULL, right? Let's try another:

SELECT
  @rank := CASE WHEN @rank IS NULL THEN 0 ELSE @rank + 1 END AS rank,
  Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|    0 | CHN  | China              | 1277558000 |
|    0 | IND  | India              | 1013662000 |
|    0 | USA  | United States      |  278357000 |
|    0 | IDN  | Indonesia          |  212107000 |
|    0 | BRA  | Brazil             |  170115000 |
|    0 | PAK  | Pakistan           |  156483000 |
|    0 | RUS  | Russian Federation |  146934000 |
|    0 | BGD  | Bangladesh         |  129155000 |
|    0 | JPN  | Japan              |  126714000 |
|    0 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+

We can go on like this (and I did) trying to force the session variable into being set to 0 after the first row. Once can try nested assignment, selecting from DUAL, using IF, NULLIF and more. Still, MySQL will only set the variable, if it's NULL, after the query completes. A solution is to force the variable to zero before the query begins. I will use a UNION ALL, in which the first part sets the @rank, and the second performs the query. Since its a UNION, I need to have the same number of columns in both parts. Moreover, since I'm ORDERing by Population, a column named `Population` must exist in the first part. This leads to the following query:

SELECT NULL AS rank, NULL AS Code, NULL AS Name, NULL AS Population
  FROM DUAL WHERE (@rank := 0)<0
UNION ALL
SELECT @rank := @rank + 1 AS rank, Code, Name, Population
  FROM Country ORDER BY Population DESC LIMIT 10

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|   94 | CHN  | China              | 1277558000 |
|   72 | IND  | India              | 1013662000 |
|  229 | USA  | United States      |  278357000 |
|   71 | IDN  | Indonesia          |  212107000 |
|   29 | BRA  | Brazil             |  170115000 |
|  152 | PAK  | Pakistan           |  156483000 |
|  226 | RUS  | Russian Federation |  146934000 |
|   19 | BGD  | Bangladesh         |  129155000 |
|   82 | JPN  | Japan              |  126714000 |
|  146 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+

The first query in the UNION should not return any rows, hence the impossible (@rank := 0)<0 condition.

Well, the rank has numbers all right, but what kind of numbers are these? Apparently the ranking took place before the ORDER BY. Not giving up, we try one more time:

SELECT NULL AS rank, NULL AS Code, NULL AS Name, NULL AS Population
  FROM DUAL WHERE (@rank := 0)<0
UNION ALL
SELECT @rank := @rank + 1 AS rank, Code, Name, Population
  FROM (SELECT Code, Name, Population
    FROM Country ORDER BY Population DESC LIMIT 10) AS c

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|    1 | CHN  | China              | 1277558000 |
|    2 | IND  | India              | 1013662000 |
|    3 | USA  | United States      |  278357000 |
|    4 | IDN  | Indonesia          |  212107000 |
|    5 | BRA  | Brazil             |  170115000 |
|    6 | PAK  | Pakistan           |  156483000 |
|    7 | RUS  | Russian Federation |  146934000 |
|    8 | BGD  | Bangladesh         |  129155000 |
|    9 | JPN  | Japan              |  126714000 |
|   10 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+

Now we've got it!

The question arises: why go through all this when a simple two-queries solution is available?

First, as a MySQL excercise, I find this an interesting problem. Second, it just may be possible you'll be bound with one single query. For example, reporting tools may only allow for one query per report table. As another example, you may not have a sophisticated connection pool, and you are bound for sending one query per connection, hence unable to store session variables in between.

If you know of other solutions, hopefully simpler ones, please comment below!

  • Same results but a little cleaner, by putting the initial rank setting inside the join.

    SELECT   @rank := @rank+1 AS rank, Code, Name, Population 
    FROM Country 
    CROSS JOIN (select @rank := 0) qq 
    ORDER BY Population DESC LIMIT 10;
    
    +------+------+--------------------+------------+
    | rank | Code | Name               | Population |
    +------+------+--------------------+------------+
    |    1 | CHN  | China              | 1277558000 | 
    |    2 | IND  | India              | 1013662000 | 
    |    3 | USA  | United States      |  278357000 | 
    |    4 | IDN  | Indonesia          |  212107000 | 
    |    5 | BRA  | Brazil             |  170115000 | 
    |    6 | PAK  | Pakistan           |  156483000 | 
    |    7 | RUS  | Russian Federation |  146934000 | 
    |    8 | BGD  | Bangladesh         |  129155000 | 
    |    9 | JPN  | Japan              |  126714000 | 
    |   10 | NGA  | Nigeria            |  111506000 | 
    +------+------+--------------------+------------+
    
  • Hi Ryan,

    Thanks! I think it is much cleaner.

    Shlomi

  • William

    My version:

    SELECT @rank := @rank+1 AS rank, Code, Name, Population
    FROM Country, (SELECT @rank := 0) d
    ORDER BY Population DESC LIMIT 10;

  • You can also use a self JOIN with no variable to do ranking.

  • William - thanks! Looks the same as Ryan's.
    I wonder by what rule MySQL decides to first perform the assignment in the second SELECT.

    Toby - do you have an example?

    Shlomi

  • http://dev.mysql.com/doc/refman/5.0/en/unnamed-views.html

    "Subqueries in the FROM clause are executed even for the EXPLAIN statement (that is, derived temporary tables are built). This occurs because upper-level queries need information about all tables during the optimization phase, and the table represented by a subquery in the FROM clause is unavailable unless the subquery is executed."

    So, it sounds like the derived query will be executed before the optimizer even creates the plan for the outer query, assuming the outer query references tables.

    That page also has an interesting example demonstrating how this side effect can result in the modification of table data if you do an EXPLAIN SELECT when the subquery contains a call to a stored procedure.

  • Robert, many thanks!

  • Hi Shlomi!

    Just a nitpick - a "rank" is commonly understood as assigning a grade corresponding to the sorting order of the attribute *value* - not necessarily the same as the occurrence.

    In the case you present, the top 10 countries by population all have a different value for population, so you get to assign a unique number. But what if two (or more) of the top 10 happen to have the same value for population?

    Well, the common notion for rank is that it should assign a number the occurrence of distinct values - not the occurrence of rows. This means that the same rank can appear if the population value happened to be the same.

    What you are doing here is number the rows in order of appearance by population.

    Here's a good explanation of the difference between rank and ordered numbering:

    http://technology.amis.nl/blog/189/analytical-sql-functions-theory-and-examples-part-2-on-the-order-by-and-windowing-clauses

    kind regards,

    ROland Bouman

  • You were so close early on!

    SELECT
    @rank := IFNULL(@rank,0)+1 AS rank,
    Code, Name, Population
    FROM Country ORDER BY Population DESC LIMIT 10;

    If you use COALESCE instead of IFNULL, for the same purpose you were trying to achieve, you would have achieved it!

    http://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html#function_coalesce

    select @rank:=COALESCE(@rank,0)+1 AS rank,
    Code, Name, Population
    FROM Country ORDER BY Population DESC LIMIT 10;

  • Roland - thanks for the nitpick :), I stand corrected. My choice of terms was indeed misleading.

  • Sheeri,
    Sorry, no go! When executing on a new client, I get:

    select @rank:=COALESCE(@rank,0)+1 AS rank, Code, Name, Population FROM Country ORDER BY Population DESC LIMIT 10;
    +------+------+--------------------+------------+
    | rank | Code | Name               | Population |
    +------+------+--------------------+------------+
    |    1 | CHN  | China              | 1277558000 |
    |    1 | IND  | India              | 1013662000 |
    |    1 | USA  | United States      |  278357000 |
    |    1 | IDN  | Indonesia          |  212107000 |
    |    1 | BRA  | Brazil             |  170115000 |
    |    1 | PAK  | Pakistan           |  156483000 |
    |    1 | RUS  | Russian Federation |  146934000 |
    |    1 | BGD  | Bangladesh         |  129155000 |
    |    1 | JPN  | Japan              |  126714000 |
    |    1 | NGA  | Nigeria            |  111506000 |
    +------+------+--------------------+------------+

    I think you may have tried your query in an already "used" client.
    See, if I then run the same query again, I get:

    +------+------+--------------------+------------+
    | rank | Code | Name               | Population |
    +------+------+--------------------+------------+
    |    2 | CHN  | China              | 1277558000 |
    |    3 | IND  | India              | 1013662000 |
    |    4 | USA  | United States      |  278357000 |
    |    5 | IDN  | Indonesia          |  212107000 |
    |    6 | BRA  | Brazil             |  170115000 |
    |    7 | PAK  | Pakistan           |  156483000 |
    |    8 | RUS  | Russian Federation |  146934000 |
    |    9 | BGD  | Bangladesh         |  129155000 |
    |   10 | JPN  | Japan              |  126714000 |
    |   11 | NGA  | Nigeria            |  111506000 |
    +------+------+--------------------+------------+

    Regards,
    Shlomi

  • Jannes

    I think the reason your first UNION attempt failed is because the ORDER BY ... LIMIT 10 is interpreted for the whole union, not just for the last query.

    I didn't try, but I would expect something like this to at least give the correct result (but of course the JOIN solutions are much cleaner anyway) :

    SELECT NULL AS rank, NULL AS Code, NULL AS Name, NULL AS Population
    FROM DUAL WHERE (@rank := 0)<0
    UNION ALL
    (SELECT @rank := @rank + 1 AS rank, Code, Name, Population
    FROM Country ORDER BY Population DESC LIMIT 10)

 
Powered by Wordpress and MySQL. Theme by openark.org