A simple question I’ve been asked:
Is it possible to merge two UPDATE queries, each on different WHERE conditions, into a single query?
For example, is it possible to merge the following two UPDATE statements into one?
mysql> UPDATE film SET rental_duration=rental_duration+1 WHERE rating = 'G'; Query OK, 178 rows affected (0.01 sec) mysql> UPDATE film SET rental_rate=rental_rate-0.5 WHERE length < 90; Query OK, 320 rows affected (0.01 sec)
To verify our tests, we take a checksum:
mysql> pager md5sum PAGER set to 'md5sum' mysql> SELECT film_id, title, rental_duration, rental_rate FROM film ORDER BY film_id; c2d253c3919efaa6d11487b1fd5061f3 -
Obviously, the following query is incorrect:
mysql> UPDATE film SET rental_duration=rental_duration+1, rental_rate=rental_rate-0.5 WHERE rating = 'G' OR length < 90; Query OK, 431 rows affected (0.03 sec) mysql> pager md5sum PAGER set to 'md5sum' mysql> SELECT film_id, title, rental_duration, rental_rate FROM film ORDER BY film_id; 09d450806e2cd7fa78a83ac5bef72d2b -
Motivation
Why would you want to do that?
- While it may seem strange, the merge can be logically (application-wise) perfectly reasonable.
- The UPDATE may be time consuming – perhaps it requires full table scan on a large table. Doing it with one scan is faster than two scans.
The solution
Use a condition for the SET clauses, optionally drop the WHERE conditions.
UPDATE film SET rental_duration=IF(rating = 'G', rental_duration+1, rental_duration), rental_rate=IF(length < 90, rental_rate-0.5, rental_rate) ; mysql> pager md5sum PAGER set to 'md5sum' mysql> SELECT film_id, title, rental_duration, rental_rate FROM film ORDER BY film_id; c2d253c3919efaa6d11487b1fd5061f3 -
The above query necessarily does a full table scan. If there’s a benefit to using indexes in the WHERE clause, it may still be applied, using an OR condition:
UPDATE film SET rental_duration=IF(rating = 'G', rental_duration+1, rental_duration), rental_rate=IF(length < 90, rental_rate-0.5, rental_rate) WHERE rating = 'G' OR length < 90 ;
If there is a computational overhead to the IF statement, I have not noticed it. This kind of solution plays well when each of the distinct queries requires a full scan, on large tables.
I don’t know how it is on mysql, but on another db (don’t remember right now), the overhead of if is that it calculate the two options in any case. if there is a big calculation in one case (or both of them) – there will be significant overhead.
Technically IF is a flow-control construct – I believe CASE would be more appropriate here.
@xerxes,
This is not the same IF() as in stored routine’s IF. It is not a control flow construct but a simple function.
CASE is ANSI. But IF() is so much shorter to write…
Nice writeup, good trick 🙂
@xerxes like shlomi says, there are two different kind of IFs, IF function and IF statement. Likewise there are two different CASEs, CASE expression (or operator if you like that better) and CASE statement. And each of these come in two flavors, simple (CASE expr WHEN val1 THEN …. END) and searched (CASE WHEN condition1 THEN … END CASE)
Roland, thank you for expanding on this.
Xerxes, see this link: http://dev.mysql.com/doc/refman/5.1/en/control-flow-functions.html in the MySQL manual.