MySQL foreach()

A new routine is now available in common_schema, which makes for an easier execution syntax for some operations:

foreach(collection_to_iterate_over, queries_to_execute_per_iteration_step);

To illustrate what it can do, consider:

call foreach('table in sakila', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB ROW_FORMAT=COMPACT');

call $('schema like shard_%', 'CREATE TABLE ${schema}.messages (id INT)');

call $('2000:2009', 'INSERT IGNORE INTO report (report_year) VALUES (${1})');

$() stands as a synonym to foreach(). I suspect it should look familiar to web programmers.

The idea for foreach() was introduced by Giuseppe Maxia during a correspondence. At first I was skeptic: this isn’t jQuery; this is SQL. Why would I want to use foreach()?

Then Giuseppe provided some use cases, and as I started thinking about it, I found more and more cases where such a tool might considerably reduce scripting overhead and avoid requiring SQL-fu skills. In fact, I have been using it myself for the past few weeks

I provide examples which I hope will convince the reader as for the simplicity of using such syntax. Showing off the types of input foreach() accepts (query, table search, schema search, set of constants, single or double numbers range), and the types of queries it can execute (single, multiple, using placeholders).

I stress that this is not a replacement for common queries (i.e. does not come to replace your common SELECT, UPDATE, DELETE), but more for working out administrative tasks. Nevertheless, the last example in this post does provide with an interesting insight on possible “normal” use.

Use case: using values from query

Let’s kill all queries running for over 20 seconds:

call foreach('SELECT id FROM INFORMATION_SCHEMA.PROCESSLIST WHERE time > 20', 'KILL QUERY ${1}');

The thing I like most about foreach() is that it’s self explanatory. Nevertheless, I note:

The KILL command is executed for each process running for more than 20 seconds (I did round up corners, since I didn’t check for sleeping processes, for simplicity).
I also use the ${1} placeholder: much like in awk, this will get the first column in the result set. In our case, it is the single column, id.
I chose to invoke a single query/command per iteration step.

Compare the above with another solution to the same problem, using eval():

call eval('SELECT CONCAT(\'KILL QUERY \',id) FROM INFORMATION_SCHEMA.PROCESSLIST WHERE TIME > 20');

They both get the same thing done. But foreach() is just a bit more friendly to write (and read).

Let’s move to a more complicated example.

Use case: using multiple values from a query, invoking multiple commands

Let’s kill some queries, as above, but also write down a log entry so that we know what happened:

call foreach(
  'SELECT id, user FROM INFORMATION_SCHEMA.PROCESSLIST WHERE time > 20',
  'KILL QUERY ${1}; INSERT INTO my_log VALUES (\'have just killed query ${1}, executed by ${2}\');')
;

In the above, for each long running process, we:

Kill the process’ query. id being the first column, is referenced by ${1}.
INSERT to my_log that said process has been killed. We note both id and user using placeholders ${1} and ${2}, respectively.

It’s possible to invoke as many queries/commands per iteration step. It is possible to use placeholders ${1} through ${9}, as well as ${NR}, which works as in awk: it is a row-counter, 1-based.

This example can still be written with eval(), but in much uglier form. I can’t just first KILL the processes, then log about them, since by the time I want to log, the queries will not be running; the commands must be coupled. This is naturally done with foreach().

Use case: iterating constant values, invoking DDL

The commands invoked by foreach() can take the form of DML (INSERT/UPDATE/…), DDL (CREATE/ALTER/…) or other (KILL/SET/…). The placeholders can be used anywhere within the text.

Take an installation where different schemata have the same exact table structure. We want to refactor a table on all schemata:

call $('{USA, UK, Japan, NZ}', 'ALTER TABLE db_region_${1}.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8');

The above translates to the following commands:

ALTER TABLE db_region_USA.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8;
ALTER TABLE db_region_UK.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8;
ALTER TABLE db_region_Japan.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8;
ALTER TABLE db_region_NZ.customer ADD COLUMN personal_notes VARCHAR(4096) CHARSET utf8;

In the above, we:

Provide a list of constant values. These can be strings, numbers, whatever. They are space delimited.
Use the ${1} place holder. We can also use ${NR}.

Use case: loop through number sequence

We wish to populate a table with values:

call foreach('1970:2038', 'INSERT INTO test.test_dates (dt) VALUES (DATE(\'${1}-01-01\'))');

The above results with:

mysql> SELECT dt FROM test_dates;
+------------+
| dt         |
+------------+
| 1970-01-01 |
| 1971-01-01 |
| 1972-01-01 |
| 1973-01-01 |
| 1974-01-01 |
...
| 2036-01-01 |
| 2037-01-01 |
| 2038-01-01 |
+------------+

With numbers range:

Integers are assumed
Range is indicated by low and high values, both inclusive
Negatives allowed (e.g. ‘-5:5’, resulting with 11 steps)
Placeholders ${1} and ${NR} are allowed.

Use case: iterating through two dimensional numbers range:

We use 3 template tables; we create 15 schemata; in each we create 3 tables based on the template tables:

call foreach( '1:15,1:3',
  'CREATE DATABASE IF NOT EXISTS db_test_${1}; CREATE TABLE db_test_${1}.tbl_${2} LIKE db_template.my_table_${2};'
);

Notes:

Each of the number ranges has the same restrictions and properties as listed above (integers, inclusive, ascending)
We can now use ${1} and ${2} placeholders, noting the first and second numbers range, respectively.
We may also use ${NR}, which, in this case, will run 1 through 45 (15 times 3).
We use multiple queries per iteration step.

Use case: overcoming MySQL limitations

MySQL does not support ORDER BY & LIMIT in multi-table UPDATE and DELETE statements (as noted last year). So we cannot:

DELETE FROM t1 USING t1 JOIN t2 ON (...) JOIN t3 ON (..) WHERE x = 7 ORDER BY ts LIMIT 100;

However, we can:

call foreach(
  'SELECT t1.id FROM t1 JOIN t2 ON (...) JOIN t3 ON (..) WHERE x = 7 ORDER BY ts LIMIT 100',
  'DELETE FROM t1 WHERE id = ${1}'
);

Of course, it will do a lot of single row DELETEs. There are further MySQL limitations which complicate things if I want to overcome this. Perhaps at a later blog post.

Acknowledgements

I hit a weird bug which prevented me from releasing this earlier on. Actually it’s a duplicate of this bug, which makes it 6 years old. Hurray.

To the rescue came Roland Bouman, who suggested an idea so crazy even I was skeptic: to parse and modify the original query so as to rename column names according to my scheme. And of course he made it happen, along with some additional very useful stuff. It’s really a super-ultra-meta-meta-sql-fu magic he does there.

So, thanks, Roland, for joining the ride, and thanks, Giuseppe, for testing and helping out to shape this functionality. It’s great fun working with other people on open-source — a new experience for me.

Continued

In this post I’ve covered the general-purpose iterations. There are also more specific types of iterations with foreach(). Continued next.

3 thoughts on “MySQL foreach()”

Pingback: More MySQL foreach() | code.openark.org
Matthew Montgomery says:

December 2, 2011 at 8:23 pm

I love it! Another great use case for this is in MySQL Cluster during on line add node. After new nodes are added, have to reorganize partition for each table to use the new nodes.

CALL foreach(‘SELECT TABLE_SCHEMA, TABLE_NAME FROM INFOMATION_SCHEMA.TABLES WHERE ENGINE=\’ndbcluster\”, ‘ALTER TABLE ${1}.${2} REORGANIZE PARTITION’);

It would be slightly nicer to have an option to

CALL foreach(‘table engine ndbcluster’, ‘ALTER TABLE ${schema}.${table} REORGANIZE PARTITION’)

But only slightly so… no worries.
shlomi says:

December 2, 2011 at 8:36 pm

Matthew,
There’s a lot planned in terms of more sophisticated syntax like the one you suggested. Most of it is neatly organized in my brain. I’m just waiting to see if the community or user base approve of the direction I’m taking (as I’m hoping it will).
Filtering tables by engine, by format, by create options in general, by size etc. is on the roadmap.

With regard to your query (great use case!), in a month or so there’s yet a new interesting release I’m working on that’s related. The plot thickens!

This site uses Akismet to reduce spam. Learn how your comment data is processed.