On user variables evaluation order

September 17, 2009

There is something very unclear about what is defined and is undefined with regard to the order by which user variables are evaluated, even within the MySQL documentation itself.

I wish to present some examples and draw a conclusion. Since I will following state there's missing information, I would greatly appreciate any educated comments.

The trivial "reordering problem" case

Let's look at the following query:

mysql> SET @a := 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SET @b := 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @a:=@b+1, @b:=@a FROM mysql.user;
+----------+--------+
| @a:=@b+1 | @b:=@a |
+----------+--------+
|        1 |      1 |
|        2 |      2 |
|        3 |      3 |
|        4 |      4 |
|        5 |      5 |
+----------+--------+
5 rows in set (0.04 sec)

According to the docs, I can't rely on the fact that the @b:=@a part reads the value of @a after is has incremented, So I may eventually end up with:

+----------+--------+
| @a:=@b+1 | @b:=@a |
+----------+--------+
|        1 |      0 |
|        2 |      1 |
|        3 |      2 |
|        4 |      3 |
|        5 |      4 |
+----------+--------+

I wish to follow the documentation and see about other cases.

The trivial "trivial" case

Quoting the MySQL documentation (8.4. User-Defined Variables):

...The general rule is never to assign a value to a user variable in one part of a statement and use the same variable in some other part of the same statement. ...

How does the above quote (focus on bold text) from the manual apply to the trivial:

SET @a := 0;
SELECT *, @a := @a+1 FROM mysql.user;

We've just used/read the value of @a, and have assigned it a new value (@a+1) at the same statement. What's an other part? Is this considered to be the same part just because it's the same column in the result set?

Let's complicate this further:

A complicated case

Quoting the manual again, we see:

mysql> SET @t1=0, @t2=0, @t3=0;
mysql> SELECT @t1:=(@t2:=1)+@t3:=4,@t1,@t2,@t3;
+----------------------+------+------+------+
| @t1:=(@t2:=1)+@t3:=4 | @t1  | @t2  | @t3  |
+----------------------+------+------+------+
|                    5 |    5 |    1 |    4 |
+----------------------+------+------+------+

What's going on here? Haven't we been told not to assign and use the same variable in the same statement? @t1 has been assigned in the first part, then has been used in the second (according to the documentation itself, we could have got '0' for @t1, second column). But I'm not keen on ranting, look further:

Looking at the first part of the statement: @t1:=(@t2:=1)+@t3:=4, what promises do I get here?

  • Does the use of brackets ensure that @t2 is evaluated before @t1?
  • Does the use of brackets ensure that @t3 is evaluated before @t1?
  • Does the order of expression tokens ensure that @t2 is evaluated before @t3 is assigned?

I think the answer to the above is 'yes' to all. But I can't verify this using the text. The reasons I think all answer 'yes' are:

  • This all happens in only one part of the statement (ignoring the use of @t1, @t2, @t3 later on).
  • Experience verifies it is so.
  • I don't see how ORDER BY, GROUP BY, HAVING (choosing examples from the docs) will change this: my understanding is that it all relates to column computation, and here this is a single column.

Drawing a conclusion

If the above is correct, and within the same part of the statement the use of brackets can dictate ordering of assignments, then we can workaround many problems arising from the danger of assigning and reading variables in the same statement.

If, in addition, the ordering of the tokens in the expression guarantees execution order within that part of statement, then the workaround does not need to be too ugly.

Using such expressions as COALESCE(@a := 1, @b := @a+1, ...) or otherwise using IF/CASE/other tricks, we can perform as many assignments as we wish within the same part of the statement.

Looking back at SQL: Ranking without self join, the solution can be fixed, if the answers to the above are correct.

The use of user variables is not a child's play; it can help solving problems with great speed, and small memory footprint. I think this issue is a little too obscurred. I call for clarifications.

5
Leave a Reply

avatar
5 Comment threads
0 Thread replies
0 Followers
 
Most reacted comment
Hottest comment thread
3 Comment authors
shlomiBaronRoland Bouman Recent comment authors

This site uses Akismet to reduce spam. Learn how your comment data is processed.

  Subscribe  
Notify of
Roland Bouman
Guest

Hi Shlomi, I think you raise a few valid points. For instance I think that it is indeed correct to assume that one could force the order of evaluation with parenthesis if it is all contained within the same expression. However, the way I see it, this will only force the per-row evaluation order. Major problems can still arise when your expression relies on assignments that happened during the processing of some row other than the current row. Since the semantics of the SELECT statement is to deliver a set of rows, there is now way to force the order… Read more »

Baron
Guest

Shlomi, I've actually tried to read the mysqld source code to see if I could find out how variables are used, and came to the conclusion that it's buried deep in the optimizer and I don't understand it well enough to draw conclusions.

trackback

[...] On user variables evaluation order [...]

 
Powered by Wordpress and MySQL. Theme by openark.org