Table split(…) for the masses

(pun intended)

common_schema‘s new split statement (see release announcement) auto-splits complex queries over large tables into smaller ones: instead of issuing one huge query, split breaks one’s query into smaller queries, each working on a different set of rows (a chunk).

Thus, it is possible to avoid holding locks for long times, allowing for smaller transactions. It also makes for breathing space for the RDBMS, at times boosting operation speed, and at times prolonging operation speed at will.

In this post I show how split exposes itself to the user, should the user wish so.

split can manage queries of the following forms:

  • DELETE FROM table_name [WHERE]…
  • DELETE FROM table_name USING <multi table syntax> [WHERE]…
  • UPDATE table_name SET … [WHERE]…
  • UPDATE <multiple tables> SET … [WHERE]…
  • INSERT INTO some_table SELECT … FROM <single or multiple tables> [WHERE]…
  • REPLACE INTO some_table SELECT … FROM <single or multiple tables> [WHERE]…
  • SELECT … FROM <multiple tables> [WHERE]…

The latter being a non-obvious one at first sight.

Basically, it’ automatic

You just say:

split (UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR)
  throttle 2;

And split identifies sakila.inventory as the table which needs to be split, and injects appropriate conditions so as to work on a subset of the rows, in multiple steps.

By the way, here’s how to execute a QueryScript code like the above.

But you can drive in manual mode

You can use the following syntax:

split (sakila.inventory)
{
  -- No action taken, but this block of code
  -- is executed per chunk of the table.
  -- I wonder what can be done here?
}

split provides with magic variables, which you can use in the action block. These are:

  • $split_step: 1-based loop counter
  • $split_rowcount: number of rows affected in current chunk operation
  • $split_total_rowcount: total number of rows affected during this split statement
  • $split_total_elapsed_time: number of seconds elapsed since beginning of this split operation.
  • $split_clause: the magic variable: the filtering condition limiting rows to current chunk.
  • $split_table_schema: the explicit or inferred schema of split table
  • $split_table_name: the explicit or inferred table being split

To illustrate, consider the following script:

split (sakila.inventory)
{
  select $split_step as step, $split_clause as clause;
}

The output is this:

+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| step | clause                                                                                                                                                                    |
+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|    1 | ((((`inventory`.`inventory_id` > '1')) OR ((`inventory`.`inventory_id` = '1'))) AND (((`inventory`.`inventory_id` < '1000')) OR ((`inventory`.`inventory_id` = '1000')))) |
+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

+------+--------------------------------------------------------------------------------------------------------------------------------------+
| step | clause                                                                                                                               |
+------+--------------------------------------------------------------------------------------------------------------------------------------+
|    2 | ((((`inventory`.`inventory_id` > '1000'))) AND (((`inventory`.`inventory_id` < '2000')) OR ((`inventory`.`inventory_id` = '2000')))) |
+------+--------------------------------------------------------------------------------------------------------------------------------------+

+------+--------------------------------------------------------------------------------------------------------------------------------------+
| step | clause                                                                                                                               |
+------+--------------------------------------------------------------------------------------------------------------------------------------+
|    3 | ((((`inventory`.`inventory_id` > '2000'))) AND (((`inventory`.`inventory_id` < '3000')) OR ((`inventory`.`inventory_id` = '3000')))) |
+------+--------------------------------------------------------------------------------------------------------------------------------------+

+------+--------------------------------------------------------------------------------------------------------------------------------------+
| step | clause                                                                                                                               |
+------+--------------------------------------------------------------------------------------------------------------------------------------+
|    4 | ((((`inventory`.`inventory_id` > '3000'))) AND (((`inventory`.`inventory_id` < '4000')) OR ((`inventory`.`inventory_id` = '4000')))) |
+------+--------------------------------------------------------------------------------------------------------------------------------------+

+------+--------------------------------------------------------------------------------------------------------------------------------------+
| step | clause                                                                                                                               |
+------+--------------------------------------------------------------------------------------------------------------------------------------+
|    5 | ((((`inventory`.`inventory_id` > '4000'))) AND (((`inventory`.`inventory_id` < '4581')) OR ((`inventory`.`inventory_id` = '4581')))) |
+------+--------------------------------------------------------------------------------------------------------------------------------------+

So you can get yourself a nice present: the SQL clause which filters the distinct chunks.

A simple demo: what can the user do with “manual mode”?

Normally, I would expect the user to use the automated version of split. Let it do the hard work! But sometimes, you may wish to take control into your hands.

Consider an example: I wish to export a table into CSV file, but in chunks. pt-archiver does that. But it is also easily achievable with split:

split (sakila.inventory) {
  var $file_name := QUOTE(CONCAT('/tmp/inventory_chunk_', $split_step, '.csv'));
  select * from sakila.inventory WHERE :${split_clause} INTO OUTFILE :${file_name};
}

This script uses the powerful variable expansion feature of QueryScript: it extracts the text behind :${split_clause} and plants it as part of the query. It does the same for :${file_name}, making a variable possible where MySQL would normally disallow one (the INTO OUTFILE clause only accepts a constant string).

What do we get as result?

bash:/tmp$ ls -s1 inventory_chunk_*
32 inventory_chunk_1.csv
32 inventory_chunk_2.csv
32 inventory_chunk_3.csv
32 inventory_chunk_4.csv
20 inventory_chunk_5.csv

Conclusion

During the past months, and even as I developed split for QueryScript, I found myself using it more and more for my own purposes. As it evolved I realized how much more simple it makes these complex operations. Heck, it beats oak-chunk-update in its ease of use. They both have their place, but split is so much more intuitive and easy to write. And, no external scripts, no package dependencies.

I suggest that split is a major tool for server side scripting, server maintenance, developer operations. Check it out!

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.