Indexing – code.openark.org

Purging old rows with QueryScript: three use cases

shlomi — Wed, 14 Nov 2012 09:15:35 +0000

Problem: you need to purge old rows from a table. This may be your weekly/monthly cleanup task. The table is large, the amount of rows to be deleted is large, and doing so in one big DELETE is too heavy.

You can use oak-chunk-update or pt-archiver to accomplish the task. You can also use server side scripting with QueryScript, offering a very simple syntax with no external scripting, dependencies and command line options.

I wish to present three cases of row deletion, with three different solutions. In all cases we assume some TIMESTAMP column exists in table, by which we choose to purge the row. In all cases we assume we wish to purge rows older than 1 month.

We assume the naive query is this:

DELETE FROM my_schema.my_table WHERE row_timestamp < CURDATE() - INTERVAL 1 MONTH

Case 1: TIMESTAMP column is indexed

I almost always index a timestamp column, if only for being able to quickly purge data (but usually also to slice data by date). In this case where the column is indexed, it’s very easy to figure out which rows are older than 1 month.

We break the naive query into smaller parts, and execute these in sequence:

while (DELETE FROM my_schema.my_table WHERE row_timestamp < CURDATE() - INTERVAL 1 MONTH ORDER BY row_timestamp LIMIT 1000)
  throttle 1;

How does the above work?

QueryScript accepts a DELETE statement as a conditional expression in a while loop. The expression evaluates to TRUE when the DELETE affects rows. Once the DELETE ceases to affect rows (when no more rows match the WHERE condition), the while loop terminates.

The throttle command allows us to play nice: by throttling we increase the total runtime through sleeping in between loop iterations.

Case 2: TIMESTAMP column is not indexed, and there is no heuristic for matching rows

This case is hardest to tackle by means of optimization: there is no index, and we cannot assume or predict anything about the distribution of old rows. We must therefore scan the entire table so as to be able to purge old rows.

This does not mean we have to do one huge full table scan. As long as we have some way to split the table, we are still good. We can utilize the PRIMARY KEY or another UNIQUE KEY so as to break the table into smaller, distinct parts, and work our way on these smaller chunks:

split (DELETE FROM my_schema.my_table WHERE row_timestamp < CURDATE() - INTERVAL 1 MONTH)
  throttle 1;

The split statement will automagically calculate the chunks and inject filtering conditions onto the query, such that each execution of the query relates to a distinct set of rows.

Case 3: TIMESTAMP column not indexed, but known to be monotonic

This is true for many tables. Rows with AUTO_INCREMENT columns and TIMESTAMP columns are created with CURRENT_TIMESTAMP values. This makes for a monotonic function: as the AUTO_INCREMENT grows, so does the TIMESTAMP.

This makes for the following observation: it we iterate the table row by row, and reach a point where the current row is not old, then we can stop looking. Timestamps will only increase by value, which means further rows only turn to be newer.

With this special case at hand, we can:

split (DELETE FROM my_schema.my_table WHERE row_timestamp < CURDATE() - INTERVAL 1 MONTH) {
  if ($split_rowcount = 0)
    break;
  throttle 1;
}

split is a looping device, and a break statement works on split just as on a while statement.

split provides with magic variables which describe current chunk status. $split_rowcount relates to the number of rows affected by last chunk query. No more rows affected? This means we’ve hit recent rows, and we do not expect to find old rows any further. We can stop looking.

How common_schema split()s tables – internals

shlomi — Thu, 06 Sep 2012 05:25:07 +0000

This post exposes some of the internals, and the SQL behind QueryScript’s split. common_schema/QueryScript 1.1 introduces the split statement, which auto-breaks a “large” query (one which operates on large tables as a whole or without keys) into smaller queries, and executes them in sequence.

This makes for easier transactions, less locks held, potentially (depending on the user) more idle time released back to the database. split has similar concepts to oak-chunk-update and pt-archiver, but works differently, and implemented entirely in SQL on server side.

Take the following statement as example:

split (UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR)
  pass;

It yields with (roughly) the following statements:

UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '1')) OR ((`inventory`.`inventory_id` = '1'))) AND (((`inventory`.`inventory_id` < '1000')) OR ((`inventory`.`inventory_id` = '1000'))));
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '1000'))) AND (((`inventory`.`inventory_id` < '2000')) OR ((`inventory`.`inventory_id` = '2000'))));
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '2000'))) AND (((`inventory`.`inventory_id` < '3000')) OR ((`inventory`.`inventory_id` = '3000'))));
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '3000'))) AND (((`inventory`.`inventory_id` < '4000')) OR ((`inventory`.`inventory_id` = '4000'))));
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '4000'))) AND (((`inventory`.`inventory_id` < '4581')) OR ((`inventory`.`inventory_id` = '4581'))));

(I say “roughly” because internally there are user defined variables at play, but for convenience, I verbose the actual values as constants.)

How does that work?

common_schema works on server side. There is no Perl script or anything. It must therefore use server-side operations to:

Identify table to be split
Analyze the table in the first place, deciding how to split it
Analyze the query, deciding on how to rewrite it
Split the table (logically) into unique and distinct chunks
Work out the query on each such chunk

Following is an internal look at how common_schema does all the above.

Identifying the table

When query operates on a single table, split is able to parse the query’s SQL and find out that table. When multiple tables are involved, split requires user instruction: which table is it that the query should be split by?

Analyzing the table

Table analysis is done via a similar method to candidate_keys_recommended. It is almost identical, only it uses INFORMATION_SCHEMA optimizations to make the query short and lightweight. Simulating the analysis using candidate_keys_recommended, we get:

mysql> select * from candidate_keys_recommended where table_name='inventory' \G
*************************** 1. row ***************************
          table_schema: sakila
            table_name: inventory
recommended_index_name: PRIMARY
          has_nullable: 0
            is_primary: 1
 count_column_in_index: 1
          column_names: inventory_id

This is cool, simple and very easy to work with: we choose to split the table via the inventory_id column, which is conveniently an integer. We’ll soon see split can handle complex cases as well.

Analyzing the query

This is done in part via Roland’s query_analysis_routines, and in part just parsing the query, looking for WHERE, GROUP BY, LIMIT etc. clauses.

The nice part is injecting a WHERE condition, which didn’t appear in the original query. That WHERE condition is what limits the query to a distinct chunk of rows.

Splitting the table

With a single INTEGER PRIMARY KEY this sounds simple, right? Take rows 1..1,000, then 1,001..2,000, then 2,001..3,000 etc.

Wrong: even with this simple scenario, things are much more complex. Are the numbers successive? What if there are holes? What if there is a 1,000,000 gap between every two numbers? What if there are multiple holes of differing size and frequency?

And if we have two columns in our UNIQUE KEY? What if one of them is textual, not an INTEGER, the other a TIMESTAMP, not an INTEGER either?

split doesn’t work in that naive way. It makes no assumptions on the density of values. It only requires:

some UNIQUE KEY to work with,
which has no NULL values.

Given the above, it uses User Defined Variables to setup the chunks. With our single INTEGER column, the minimum value is set like this:

select 
  inventory_id 
from 
  `sakila`.`inventory` 
order by 
  inventory_id ASC 
limit 1  
into @_split_column_variable_min_1
;

This sets the first value of the first chunk. What value terminates this chunk? It is calculated like this:

select 
  inventory_id 
from (
  select 
    inventory_id 
  from 
    `sakila`.`inventory` 
  where 
    (((`inventory`.`inventory_id` > @_split_column_variable_range_start_1)) OR ((`inventory`.`inventory_id` = @_split_column_variable_range_start_1))) and (((`inventory`.`inventory_id` < @_split_column_variable_max_1)) OR ((`inventory`.`inventory_id` = @_split_column_variable_max_1))) 
  order by 
    inventory_id ASC limit 1000 
  ) sel_split_range  
order by 
  inventory_id DESC 
limit 1  
into @_split_column_variable_range_end_1
;

Now there’s a query you wouldn’t want to work by hand, now would you?

The cool part here is that the above works well for any type of column; this doesn’t have to be an INTEGER. Dates, strings etc. are all just fine.

The above also works well for multiple columns, where the query gets more complicated (see following).

Working out the query per chunk

This part is the easy one, now that all the hard work is done. We know ho to manipulate the query, we know the lower and upper boundaries of the chunk, so we just fill in the values and execute.

Multi-columns keys

Consider a similar query on sakila.film_actor, where the PRIMARY KEY is a compound of two columns:

split (UPDATE sakila.film_actor SET last_update = last_update + INTERVAL 6 HOUR)
  throttle 2;

The chunked queries will look like this:

UPDATE sakila.film_actor SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`film_actor`.`actor_id` > '1')) OR ((`film_actor`.`actor_id` = '1') AND (`film_actor`.`film_id` > '1')) OR ((`film_actor`.`actor_id` = '1') AND (`film_actor`.`film_id` = '1'))) AND (((`film_actor`.`actor_id` < '39')) OR ((`film_actor`.`actor_id` = '39') AND (`film_actor`.`film_id` < '293')) OR ((`film_actor`.`actor_id` = '39') AND (`film_actor`.`film_id` = '293'))));
UPDATE sakila.film_actor SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`film_actor`.`actor_id` > '39')) OR ((`film_actor`.`actor_id` = '39') AND (`film_actor`.`film_id` > '293'))) AND (((`film_actor`.`actor_id` < '76')) OR ((`film_actor`.`actor_id` = '76') AND (`film_actor`.`film_id` < '234')) OR ((`film_actor`.`actor_id` = '76') AND (`film_actor`.`film_id` = '234'))));
UPDATE sakila.film_actor SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`film_actor`.`actor_id` > '76')) OR ((`film_actor`.`actor_id` = '76') AND (`film_actor`.`film_id` > '234'))) AND (((`film_actor`.`actor_id` < '110')) OR ((`film_actor`.`actor_id` = '110') AND (`film_actor`.`film_id` < '513')) OR ((`film_actor`.`actor_id` = '110') AND (`film_actor`.`film_id` = '513'))));
UPDATE sakila.film_actor SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`film_actor`.`actor_id` > '110')) OR ((`film_actor`.`actor_id` = '110') AND (`film_actor`.`film_id` > '513'))) AND (((`film_actor`.`actor_id` < '146')) OR ((`film_actor`.`actor_id` = '146') AND (`film_actor`.`film_id` < '278')) OR ((`film_actor`.`actor_id` = '146') AND (`film_actor`.`film_id` = '278'))));
UPDATE sakila.film_actor SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`film_actor`.`actor_id` > '146')) OR ((`film_actor`.`actor_id` = '146') AND (`film_actor`.`film_id` > '278'))) AND (((`film_actor`.`actor_id` < '183')) OR ((`film_actor`.`actor_id` = '183') AND (`film_actor`.`film_id` < '862')) OR ((`film_actor`.`actor_id` = '183') AND (`film_actor`.`film_id` = '862'))));
UPDATE sakila.film_actor SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`film_actor`.`actor_id` > '183')) OR ((`film_actor`.`actor_id` = '183') AND (`film_actor`.`film_id` > '862'))) AND (((`film_actor`.`actor_id` < '200')) OR ((`film_actor`.`actor_id` = '200') AND (`film_actor`.`film_id` < '993')) OR ((`film_actor`.`actor_id` = '200') AND (`film_actor`.`film_id` = '993'))));

View the complete command to realize just how much more complex each query is, and how much more complex the chunking becomes. Here’s how I evaluate the chunk’s “next range end” variables:

select 
  actor_id, film_id 
from (
  select 
    actor_id, film_id 
  from 
    `sakila`.`film_actor` 
  where 
    (((`film_actor`.`actor_id` > @_split_column_variable_range_start_1)) OR ((`film_actor`.
`actor_id` = @_split_column_variable_range_start_1) AND (`film_actor`.`film_id` > @_split_column_variable_range_start_2))) and (((`film_actor`.`actor_id` < @_split_column_variable_max_1)) OR ((`film_actor`.`actor_id` = @_split_column_variable_max_1) AND (`film_actor`.`film_id` < @_split_column_variable_max_2)) OR ((`film_actor`.`actor_id` = @_split_column_variable_max_1) AND (`film_actor`.`film_id` = @_split_column_variable_max_2))) 
  order by 
    actor_id ASC, film_id ASC 
  limit 1000 
  ) sel_split_range  
order by 
  actor_id DESC, film_id DESC 
limit 1  
into @_split_column_variable_range_end_1, @_split_column_variable_range_end_2
;

By the way, you may recall that everything is done server side. The WHERE condition for the chunked queries is in itself generated via SQL statement, and not too much by programmatic logic. Here’s part of the query which computes the limiting condition:

  select
    group_concat('(', partial_comparison, ')' order by n separator ' OR ') as comparison
  from (
    select 
      n,
      group_concat('(', column_name, ' ', if(is_last, comparison_operator, '='), ' ', variable_name, ')' order by column_order separator ' AND ') as partial_comparison
    from (
      select 
        n, CONCAT(mysql_qualify(split_table_name), '.', mysql_qualify(column_name)) AS column_name,
        case split_variable_type
          when 'range_start' then range_start_variable_name
          when 'range_end' then range_end_variable_name
          when 'max' then max_variable_name
        end as variable_name,
        _split_column_names_table.column_order, _split_column_names_table.column_order = n as is_last 
      from 
        numbers, _split_column_names_table 
      where 
        n between _split_column_names_table.column_order and num_split_columns 
      order by n, _split_column_names_table.column_order
    ) s1
    group by n
  ) s2
  into return_value
  ;

There is a lot of complexity to split to make it able to provide with as clean a syntax for the user as possible.

Table split(…) for the masses

shlomi — Wed, 05 Sep 2012 05:04:05 +0000

(pun intended)

common_schema‘s new split statement (see release announcement) auto-splits complex queries over large tables into smaller ones: instead of issuing one huge query, split breaks one’s query into smaller queries, each working on a different set of rows (a chunk).

Thus, it is possible to avoid holding locks for long times, allowing for smaller transactions. It also makes for breathing space for the RDBMS, at times boosting operation speed, and at times prolonging operation speed at will.

In this post I show how split exposes itself to the user, should the user wish so.

split can manage queries of the following forms:

DELETE FROM table_name [WHERE]…
DELETE FROM table_name USING [WHERE]…
UPDATE table_name SET … [WHERE]…
UPDATE SET … [WHERE]…
INSERT INTO some_table SELECT … FROM [WHERE]…
REPLACE INTO some_table SELECT … FROM [WHERE]…
SELECT … FROM [WHERE]…

The latter being a non-obvious one at first sight.

Basically, it’ automatic

You just say:

split (UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR)
  throttle 2;

And split identifies sakila.inventory as the table which needs to be split, and injects appropriate conditions so as to work on a subset of the rows, in multiple steps.

By the way, here’s how to execute a QueryScript code like the above.

But you can drive in manual mode

You can use the following syntax:

split (sakila.inventory)
{
  -- No action taken, but this block of code
  -- is executed per chunk of the table.
  -- I wonder what can be done here?
}

split provides with magic variables, which you can use in the action block. These are:

$split_step: 1-based loop counter
$split_rowcount: number of rows affected in current chunk operation
$split_total_rowcount: total number of rows affected during this split statement
$split_total_elapsed_time: number of seconds elapsed since beginning of this split operation.
$split_clause: the magic variable: the filtering condition limiting rows to current chunk.
$split_table_schema: the explicit or inferred schema of split table
$split_table_name: the explicit or inferred table being split

To illustrate, consider the following script:

split (sakila.inventory)
{
  select $split_step as step, $split_clause as clause;
}

The output is this:

+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| step | clause                                                                                                                                                                    |
+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|    1 | ((((`inventory`.`inventory_id` > '1')) OR ((`inventory`.`inventory_id` = '1'))) AND (((`inventory`.`inventory_id` < '1000')) OR ((`inventory`.`inventory_id` = '1000')))) |
+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

+------+--------------------------------------------------------------------------------------------------------------------------------------+
| step | clause                                                                                                                               |
+------+--------------------------------------------------------------------------------------------------------------------------------------+
|    2 | ((((`inventory`.`inventory_id` > '1000'))) AND (((`inventory`.`inventory_id` < '2000')) OR ((`inventory`.`inventory_id` = '2000')))) |
+------+--------------------------------------------------------------------------------------------------------------------------------------+

+------+--------------------------------------------------------------------------------------------------------------------------------------+
| step | clause                                                                                                                               |
+------+--------------------------------------------------------------------------------------------------------------------------------------+
|    3 | ((((`inventory`.`inventory_id` > '2000'))) AND (((`inventory`.`inventory_id` < '3000')) OR ((`inventory`.`inventory_id` = '3000')))) |
+------+--------------------------------------------------------------------------------------------------------------------------------------+

+------+--------------------------------------------------------------------------------------------------------------------------------------+
| step | clause                                                                                                                               |
+------+--------------------------------------------------------------------------------------------------------------------------------------+
|    4 | ((((`inventory`.`inventory_id` > '3000'))) AND (((`inventory`.`inventory_id` < '4000')) OR ((`inventory`.`inventory_id` = '4000')))) |
+------+--------------------------------------------------------------------------------------------------------------------------------------+

+------+--------------------------------------------------------------------------------------------------------------------------------------+
| step | clause                                                                                                                               |
+------+--------------------------------------------------------------------------------------------------------------------------------------+
|    5 | ((((`inventory`.`inventory_id` > '4000'))) AND (((`inventory`.`inventory_id` < '4581')) OR ((`inventory`.`inventory_id` = '4581')))) |
+------+--------------------------------------------------------------------------------------------------------------------------------------+

So you can get yourself a nice present: the SQL clause which filters the distinct chunks.

A simple demo: what can the user do with “manual mode”?

Normally, I would expect the user to use the automated version of split. Let it do the hard work! But sometimes, you may wish to take control into your hands.

Consider an example: I wish to export a table into CSV file, but in chunks. pt-archiver does that. But it is also easily achievable with split:

split (sakila.inventory) {
  var $file_name := QUOTE(CONCAT('/tmp/inventory_chunk_', $split_step, '.csv'));
  select * from sakila.inventory WHERE :${split_clause} INTO OUTFILE :${file_name};
}

This script uses the powerful variable expansion feature of QueryScript: it extracts the text behind :${split_clause} and plants it as part of the query. It does the same for :${file_name}, making a variable possible where MySQL would normally disallow one (the INTO OUTFILE clause only accepts a constant string).

What do we get as result?

bash:/tmp$ ls -s1 inventory_chunk_*
32 inventory_chunk_1.csv
32 inventory_chunk_2.csv
32 inventory_chunk_3.csv
32 inventory_chunk_4.csv
20 inventory_chunk_5.csv

Conclusion

During the past months, and even as I developed split for QueryScript, I found myself using it more and more for my own purposes. As it evolved I realized how much more simple it makes these complex operations. Heck, it beats oak-chunk-update in its ease of use. They both have their place, but split is so much more intuitive and easy to write. And, no external scripts, no package dependencies.

I suggest that split is a major tool for server side scripting, server maintenance, developer operations. Check it out!

common_schema rev. 68: eval(), processlist_grantees, candidate_keys, easter_day()

shlomi — Tue, 06 Sep 2011 07:05:34 +0000

Revision 68 of common_schema is out, and includes some interesting features:

eval(): Evaluates the queries generated by a given query
match_grantee(): Match an existing account based on user+host
processlist_grantees: Assigning of GRANTEEs for connected processes
candidate_keys: Listing of prioritized candidate keys: keys which are UNIQUE, by order of best-use.
easter_day(): Returns DATE of easter day in given DATETIME’s year.

Let’s take a slightly closer look at these:

eval()

I’ve dedicated this blog post on MySQL eval() to describe it. In simple summary: eval() takes a query which generates queries (most common use queries on INFORMATION_SCHEMA) and auto-evaluates (executes) those queries. Read more

match_grantee()

As presented in Finding CURRENT_USER for any user, I’ve developed the algorithm to match a connected user+host details (as presented with PROCESSLIST) with the grantee tables (i.e. the mysql.user table), in a manner which simulates the MySQL server account matching algorithm.

This is now available as a stored function: given a user+host, the function returns with the best matched grantee. Read more

processlist_grantees

This view relies on the above, and maps the entire PROCESSLIST onto GRANTEEs. The view maps each process onto the GRANTEE (MySQL account) which is the owner of that process. Surprisingly, MySQL does not provide one with such information.

The view also provides with the following useful metadata:

Is said process executes under a SUPER privilege?
Is this a replication thread, or serving a replicating client?
Is this process the current connection (myself)?

In the spirit of common_schema, it provides with the SQL commands necessary to KILL and KILL QUERY for each process. A sample output:

mysql> SELECT * FROM common_schema.processlist_grantees;
+--------+------------+---------------------+------------------------+--------------+--------------+----------+---------+-------------------+---------------------+
| ID     | USER       | HOST                | GRANTEE                | grantee_user | grantee_host | is_super | is_repl | sql_kill_query    | sql_kill_connection |
+--------+------------+---------------------+------------------------+--------------+--------------+----------+---------+-------------------+---------------------+
| 650472 | replica    | jboss00.myweb:34266 | 'replica'@'%.myweb'    | replica      | %.myweb      |        0 |       1 | KILL QUERY 650472 | KILL 650472         |
| 692346 | openarkkit | jboss02.myweb:43740 | 'openarkkit'@'%.myweb' | openarkkit   | %.myweb      |        0 |       0 | KILL QUERY 692346 | KILL 692346         |
| 842853 | root       | localhost           | 'root'@'localhost'     | root         | localhost    |        1 |       0 | KILL QUERY 842853 | KILL 842853         |
| 843443 | jboss      | jboss03.myweb:40007 | 'jboss'@'%.myweb'      | jboss        | %.myweb      |        0 |       0 | KILL QUERY 843443 | KILL 843443         |
| 843444 | jboss      | jboss03.myweb:40012 | 'jboss'@'%.myweb'      | jboss        | %.myweb      |        0 |       0 | KILL QUERY 843444 | KILL 843444         |
| 843510 | jboss      | jboss00.myweb:49850 | 'jboss'@'%.myweb'      | jboss        | %.myweb      |        0 |       0 | KILL QUERY 843510 | KILL 843510         |
| 844559 | jboss      | jboss01.myweb:37031 | 'jboss'@'%.myweb'      | jboss        | %.myweb      |        0 |       0 | KILL QUERY 844559 | KILL 844559         |
+--------+------------+---------------------+------------------------+--------------+--------------+----------+---------+-------------------+---------------------+

Finally, it is now possible to execute the following: “Kill all slow queries which are not executed by users with the SUPER privilege or are replication threads”. To just generate the commands, execute:

mysql> SELECT sql_kill_connection FROM common_schema.processlist_grantees WHERE is_super = 0 AND is_repl = 0;

Sorry, did you only want to kill the queries? Those which are very slow? Do as follows:

mysql> SELECT sql_kill_connection FROM common_schema.processlist_grantees JOIN INFORMATION_SCHEMA.PROCESSLIST USING(ID) WHERE TIME > 10 AND is_super = 0 AND is_repl = 0;

But, really, we don’t just want commands. We really want to execute this!

Good! Step in eval():

mysql> CALL common_schema.eval('SELECT sql_kill_query FROM common_schema.processlist_grantees JOIN INFORMATION_SCHEMA.PROCESSLIST USING(id) WHERE TIME > 10 AND is_super = 0 AND is_repl = 0');

candidate_keys

A view which lists the candidate keys for tables and provides ranking for those keys, based on some simple heuristics.

This view uses the same algorithm as that used by oak-chunk-update and oak-online-alter-table, tools in the openark kit. So it provides with a way to choose the best candidate key to walk through a table. At current, a table’s PRIMARY KEY is always considered to be best, because of InnoDB’s structure of clustered index. But I intend to change that as well and provide general recommendation about candidate keys (so for example, I would be able to recommend that the PRIMARY KEY is not optimal for some table).

Actually, after a discussion initiated by Giuseppe and Roland, starting here and continuing on mail, there are more checks to be made for candidate keys, and I suspect the next version of candidate_keys will be more informational.

easter_day()

Many thanks to Roland Bouman who suggested his code for calculating easter day for a given year. Weehee! This is the first contribution to common_schema! Read more

Get it

common_schema is an open source project. It is released under the BSD license.

Find it here.

Announcing common_schema: common views & routines for MySQL

shlomi — Wed, 13 Jul 2011 04:25:24 +0000

Today I have released common_schema, a utility schema for MySQL which includes many views and functions, and is aimed to be installed on any MySQL server.

What does it do?

There are views answering for all sorts of useful information: stuff related to schema analysis, data dimensions, monitoring, processes & transactions, security, internals… There are basic functions answering for common needs.

Some of the views/routines simply formalize those queries we tend to write over and over again. Others take the place of external tools, answering complex questions via SQL and metadata. Still others help out with SQL generation.

Here are a few highlights:

Did you know you can work out simple monitoring of your server with a query? There’s a view to do that for you.
How about showing just the good parts of the processlist?
Does your schema have redundant keys?
Or InnoDB tables with no PRIMARY KEY?
Is AUTO_INCREMENT running out of space?
Can I get the SQL statements to generate my FOREIGN KEYs? To drop them?
And can we finally get SHOW GRANTS for all accounts, and as an SQL query?
Ever needed a 64 bit CRC function?
And aren’t you tired of writing the cumbersome SUBSTRING_INDEX(SUBSTRING_INDEX(str, ‘,’, 3), ‘,’, -1)? There’s an alternative.

There’s more. Take a look at the common_schema documentation for full listing. And it’s evolving: I’ve got quite a few ideas already for future components.

Some of these views rely on heavyweight INFORMATION_SCHEMA tables. You should be aware of the impact and risks.

What do I need to install?

There’s no script or executable file. It’s just a schema. The distribution in an SQL file which generates common_schema. Much like a dump file.

What are the system requirements?

It’s just between you and your MySQL. There are currently three distribution files, dedicated for different versions of MySQL (and allowing for increased functionality):

common_schema_mysql_51: fits all MySQL >= 5.1 distributions
common_schema_innodb_plugin: fits MySQL >= 5.1, with InnoDB plugin + INFORMATION_SCHEMA tables enabled
common_schema_percona_server: fits Percona Server >= 5.1

Refer to the documentation for more details.

What are the terms of use?

common_schema is released under the BSD license.

Where can I download it?

On the common_schema project page. Enjoy it!

Reasons to use AUTO_INCREMENT columns on InnoDB

shlomi — Tue, 22 Mar 2011 06:31:18 +0000

An InnoDB table must have a primary key (one is created if you don’t do it yourself). You may have a natural key at hand. Stop! Allow me to suggest an AUTO_INCREMENT may be better.

Why should one add an AUTO_INCREMENT PRIMARY KEY on a table on which there’s a natural key? Isn’t an AUTO_INCREMENT a pseudo key, meaning, it doesn’t have any explicit relation to the row data, other than it is a number and unique?

Yes, indeed so. Nevertheless, consider:

Natural keys are many times multi-columned.
Multi column PRIMARY KEYs make for larger keys, and make for bloated secondary keys as well. You may be wasting space for storing the additional AUTO_INCREMENT column, but you may gain space back on secondary keys.
Multi column PRIMARY KEYs make for more locks. See also this post.
InnoDB INSERTs work considerably faster when worked in ascending PRIMARY KEY order. Can you ensure your natural key is in such order?
Even though an AUTO_INCREMENT makes for an INSERT bottleneck (values must be given serially), it is in particular helpful to InnoDB by ensuring PRIMARY KEY values are in ascending order.
AUTO_INCEMENT makes for chronological resolution. You know what came first, and what came next.
In many datasets, more recent entries are often being accessed more, and are therefore “hotter”. By using AUTO_INCREMENT, you’re ensuring that recent entries are grouped together within the B+ Tree. This means less random I/O when looking for recent data.
A numerical key is in particular helpful in splitting your table (and tasks on your table) into smaller chunks. I write tools which can work out with any PRIMARY KEY combination, but it’s easier to work with numbers.

Multi condition UPDATE query

shlomi — Thu, 27 Jan 2011 08:30:24 +0000

A simple question I’ve been asked:

Is it possible to merge two UPDATE queries, each on different WHERE conditions, into a single query?

For example, is it possible to merge the following two UPDATE statements into one?

mysql> UPDATE film SET rental_duration=rental_duration+1 WHERE rating = 'G';
Query OK, 178 rows affected (0.01 sec)

mysql> UPDATE film SET rental_rate=rental_rate-0.5 WHERE length < 90;
Query OK, 320 rows affected (0.01 sec)

To verify our tests, we take a checksum:

mysql> pager md5sum
PAGER set to 'md5sum'
mysql> SELECT film_id, title, rental_duration, rental_rate FROM film ORDER BY film_id;
c2d253c3919efaa6d11487b1fd5061f3  -

Obviously, the following query is incorrect:

mysql> UPDATE film SET rental_duration=rental_duration+1, rental_rate=rental_rate-0.5  WHERE rating = 'G' OR length < 90;
Query OK, 431 rows affected (0.03 sec)

mysql> pager md5sum
PAGER set to 'md5sum'
mysql> SELECT film_id, title, rental_duration, rental_rate FROM film ORDER BY film_id;
09d450806e2cd7fa78a83ac5bef72d2b  -

Motivation

Why would you want to do that?

While it may seem strange, the merge can be logically (application-wise) perfectly reasonable.
The UPDATE may be time consuming – perhaps it requires full table scan on a large table. Doing it with one scan is faster than two scans.

The solution

Use a condition for the SET clauses, optionally drop the WHERE conditions.

UPDATE
 film
SET
 rental_duration=IF(rating = 'G', rental_duration+1, rental_duration),
 rental_rate=IF(length < 90, rental_rate-0.5, rental_rate)
;

mysql> pager md5sum
PAGER set to 'md5sum'
mysql> SELECT film_id, title, rental_duration, rental_rate FROM film ORDER BY film_id;
c2d253c3919efaa6d11487b1fd5061f3  -

The above query necessarily does a full table scan. If there’s a benefit to using indexes in the WHERE clause, it may still be applied, using an OR condition:

UPDATE
 film
SET
 rental_duration=IF(rating = 'G', rental_duration+1, rental_duration),
 rental_rate=IF(length < 90, rental_rate-0.5, rental_rate)
WHERE
 rating = 'G'
 OR length < 90
;

If there is a computational overhead to the IF statement, I have not noticed it. This kind of solution plays well when each of the distinct queries requires a full scan, on large tables.

Simple guideline for choosing appropriate InnoDB PRIMARY KEYs

shlomi — Thu, 21 Oct 2010 05:52:45 +0000

Risking some flames, I’d like to suggest only two options for choosing PRIMARY KEYs for InnoDB tables. I suggest they should cover 99% (throwing numbers around) of cases.

PRIMARY KEY cases

An integer (SMALLINT / INT / BIGINT), possibly AUTO_INCREMENT column.
The combination of two columns on a many-to-many connecting table (e.g. film_actor, which connects films to actors), the two columns being the PRIMARY KEYs of respective data tables. This rule may be extended to 3-way relation tables.

A short recap: an InnoDB must have a PRIMARY KEY. It will pick one if you don’t offer it. It can pick a really bad UNIQUE KEY (e.g. website_url(255)) or make one up using InnoDB internal row ids. If you don’t have a good candidate, an AUTO_INCREMENT PRIMARY KEY is probably the easiest way out.

A 2-column combination for a many-to-many connection table is common and viable. The PRIMARY KEY will not only provide with good join access method, but will also provide with the required UNIQUE constraint.

An integer-based PRIMARY KEY will make for more compact & shallow index tree structures, which leads to less I/O and page reads.

An AUTO_INCREMENT will allow for ascending PRIMARY KEY order of INSERT, which is InnoDB-friendly: index pages will be more utilized, less fragmented.

Exceptions

You have a partitioned table, e.g. on date range. With partitioned tables, every UNIQUE KEY, including the PRIMARY KEY, must include partitioning columns. In such case you will have to extend the PRIMARY KEY.
The only key on your table is a unique constraint on some column, e.g. UNIQUE KRY (url). On one hand, it seems wasteful to create another column (e.g. AUTO_INCREMENT) to use as PRIMARY KEY. On the other hand, I’ve seen many cases where this kind of PK didn’t hold up. At some point there was need for another index. Or some method had to be devised for chunking up table data (oak-chunk-update can do that even with non-integer PKs). I’m reluctant to use such keys as PRIMARY.
I’m sure there are others.

Umm…

I wrote the draft for this post a while ago. And then came Domas and ruined it. Wait for 5.1.52?

Thoughts and ideas for Online Schema Change

shlomi — Thu, 07 Oct 2010 08:29:10 +0000

Here’s a few thoughts on current status and further possibilities for Facebook’s Online Schema Change (OSC) tool. I’ve had these thoughts for months now, pondering over improving oak-online-alter-table but haven’t got around to implement them nor even write them down. Better late than never.

The tool has some limitations. Some cannot be lifted, some could. Quoting from the announcement and looking at the code, I add a few comments. I conclude with a general opinion on the tool’s abilities.

“The original table must have PK. Otherwise an error is returned.”

This restriction could be lifted: it’s enough that the table has a UNIQUE KEY. My original oak-online-alter-table handled that particular case. As far as I see from their code, the Facebook code would work just as well with any unique key.

However, this restriction is of no real interest. As we’re mostly interested in InnoDB tables, and since any InnoDB table should have a PRIMARY KEY, we shouldn’t care too much.

“No foreign keys should exist. Otherwise an error is returned.”

Tricky stuff. With oak-online-alter-table, changes to the original table were immediately reflected in the ghost table. With InnoDB tables, that meant same transaction. And although I never got to update the text and code, there shouldn’t be a reason for not using child-side foreign keys (the child-side is the table on which the FK constraint is defined).

The Facebook patch works differently: it captures changes and writes them to a delta table, to be later (asynchronously) analyzed and make for a replay of actions on the ghost table.

So in the Facebook code, some cases will lead to undesired behavior. Consider two tables, country and city, with city holding a RESTRICT/NO ACTION foreign key on country‘s id. Now consider the scenario:

Rows from city are DELETEd, where the country Id is Spain’s.
- city‘s ghost table is still unaffected, Spain’s cities are still there.
- A change is written to the delta table to mark these rows for deletion.
A DELETE is issued on country‘s Spain record.
- The DELETE should work, from the user’s perspective
- But it will fail: city’s ghost table has not received the changes yet. There’s still matching rows. The NO ACTION constraint will fail the DELETE statement.

Now, this does not lead to corruption, just to seemingly unreasonable behavior on the database part. This behavior is probably undesired. NO ACTION constraint won’t do.

However, with CASCADE or SET NULL options, there is less of an issue: operations on the parent table (e.g. country) cannot fail. We must make sure operations on the ghost table make it consistent with the original table (e.g. city).

Consider the following scenario:

A new country is created, called “Sleepyland”. An INSERT is made to country.
- Both city and city‘s ghost are immediately aware of it.
A new town is created and INSERTed to city. The town is called “Naphaven”.
- The change takes time to propagate to city‘s ghost table.
Meanwhile, we realized we made a mistake. We’ve been had. There’s no such city nor country.
1. We DELETE “Naphaven” from city.
2. We DELETE “Sleepyland” from country.
- Note that city‘s ghost table still hasn’t caught up with the changes.
Eventually, the INSERT statement for “Naphaven” reaches city‘s ghost table.
- What should happen now? The INSERT cannot succeed.
- Will this fail the entire process?

Looking at the PHP code, I see that changes written on the delta table are blindly replayed on the ghost table.

Since the process is asynchronous, this should not be the case. We can solve the above if we use INSERT IGNORE instead of INSERT. The statement will fail without failing anything else. The row cannot exist, and that’s because the original row does not exist anymore.

Unlike a replication corruption, this does not lead to accumulation mistakes. The replay is static, somewhat like in binary log format. Changes are just written, regardless of existing data.

I have given this considerable thought, and I can’t say I’ve covered all the possible scenario. However I believe that with proper use of INSERT IGNORE and REPLACE INTO (two statements I heavily relied on with oak-online-alter-table), correctness can be achieved.

There’s the small pain of re-generating the foreign key definition on the “ghost” table (CREATE TABLE LIKE … does not copy FK definitions). And since foreign key names are unique, a new name must be picked up. Not pretty, but perfectly doable.

“No AFTER_{INSERT/UPDATE/DELETE} triggers must exist.”

It would be nicer if MySQL had an ALTER TRIGGER statement. There isn’t such statement. If there were such an atomic statement, then we would be able to rewrite the trigger, so as to add our own code to the end of the trigger’s code. Yuck. Would be even nicer if we were allowed to have multiple triggers of same event.

So, we are left with DROP and CREATE triggers. Alas, this makes for a short period where the trigger does not exist. Bad. The easy solution would be to LOCK WRITE the table, but apparently you can’t DROP the trigger (*) when the table is locked. Sigh.

(*) Happened to me, apparently to Facebook too; With latest 5.1 (5.1.51) version this actually works. With 5.0 it didn’t use to; this needs more checking.

Use of INFORMATION_SCHEMA

As with oak-online-alter-table, the OSC checks for triggers, indexes, column by searching on the INFORMATION_SCHEMA tables. This makes for nice SQL for getting the exact listing and types of PRIMARY KEY columns, whether or not AFTER triggers exist, and so on.

I’ve always considered this to be the weak part of openark-kit, that it relies on INFORMATION_SCHEMA so much. It’s easier, it’s cleaner, it’s even more correct to work that way — but it just puts too much locks. I think Baron Schwartz (and now Daniel Nichter) did amazing work on analyzing table schemata by parsing the SHOW CREATE TABLE and other SHOW commands regex-wise with Maatkit. It’s a crazy work! Had I written openark-kit in Perl, I would have just import their code. But I’m too lazy busy to do the conversion from Perl to Python, and rewrite that code, what with all the debugging.

OSC is written in PHP. Again, much conversion work. I think performance-wise this is an important step to make.

A word for the critics

Finally, a word for the critics. I’ve read some Facebook/MySQL bashing comments and wish to relate.

In his interview to The Register, Mark Callaghan gave the example that “Open Schema Change lets the company update indexes without user downtime, according to Callaghan”.

PostgreSQL was mentioned for being able to add index with only read locks taken, or being able to do the work with no locks using CREATE INDEX CONCURRENTLY. I wish MySQL had that feature! Yes, MySQL has a lot to improve upon, and the latest PostgreSQL 9.0 brings valuable new features. (Did I make it clear I have no intention of bashing PostgreSQL? If not, please re-read this paragraph until convinced).

Bashing related to the notion of MySQL being so poor that Facebook used an even poorer mechanism to work out the ALTER TABLE.

Well, allow me to add a few words: the CREATE INDEX is by far not the only thing you can achieve with OSC (although it may be Facebook’s major concern). You should be able to:

Add columns
Drop columns
Convert character sets
Modify column types
Add partitioning
Reorganize partitioning
Compress the table
Otherwise changing table format
Heck, you could even modify the storage engine! (To other transactional engine)

These are giant steps. How easy would it be to write these down into the database? It only takes a few weeks time to work out a working solution with reasonable limitations, just using the resources the MySQL server provides you with. The MySQL@Facebook team should be given credit for that.

How often should you use OPTIMIZE TABLE? – followup

shlomi — Mon, 04 Oct 2010 08:07:45 +0000

This post follows up on Baron’s How often should you use OPTIMIZE TABLE?. I had the opportunity of doing some massive purging of data from large tables, and was interested to see the impact of the OPTIMIZE operation on table’s indexes. I worked on some production data I was authorized to provide as example.

The use case

I’ll present a single use case here. The table at hand is a compressed InnoDB table used for logs. I’ve rewritten some column names for privacy:

mysql> show create table logs \G

Create Table: CREATE TABLE `logs` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
 `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 `origin` varchar(64) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
 `message` text NOT NULL,
 `level` tinyint(11) NOT NULL DEFAULT '0',
 `s` char(16) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `s` (`s`),
 KEY `name` (`name`,`ts`),
 KEY `origin` (`origin`,`ts`)
) ENGINE=InnoDB AUTO_INCREMENT=186878729 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8

The table had log records starting 2010-08-23 and up till 2010-09-02 noon. Table status:

mysql> show table status like 'logs'\G
*************************** 1. row ***************************
           Name: logs
         Engine: InnoDB
        Version: 10
     Row_format: Compressed
           Rows: 22433048
 Avg_row_length: 206
    Data_length: 4625285120
Max_data_length: 0
   Index_length: 1437073408
      Data_free: 4194304
 Auto_increment: 186878920
    Create_time: 2010-08-24 18:10:49
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options: row_format=COMPRESSED KEY_BLOCK_SIZE=8
        Comment:

(A bit puzzled on the Create_time; the table was taken from an LVM snapshot of another server, so it existed for a very long time before. Not sure why the Create_time field is as it is here; I assume the MySQL upgrade marked it so, did not have the time nor need to look into it).

I was using Percona-Server-5.1.47-11.2, and so was able to look at the index statistics for that table:

mysql> SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';
+--------------+------------+--------------+--------+----------------+------------+------------+
| table_schema | table_name | index_name   | fields | row_per_keys   | index_size | leaf_pages |
+--------------+------------+--------------+--------+----------------+------------+------------+
| newsminer    | logs       | PRIMARY      |      1 | 1              |     282305 |     246856 |
| newsminer    | logs       | s            |      2 | 17, 1          |      38944 |      33923 |
| newsminer    | logs       | name         |      3 | 2492739, 10, 2 |      22432 |      19551 |
| newsminer    | logs       | origin       |      3 | 1303, 4, 1     |      26336 |      22931 |
+--------------+------------+--------------+--------+----------------+------------+------------+

Status after massive purge

My first requirement was to purge out all record up to 2010-09-01 00:00:00. I did so in small chunks, using openark kit‘s oak-chunk-update (same can be achieved with maatkit‘s mk-archiver). The process purged 1000 rows at a time, with some sleep in between, and ran for about a couple of hours. It may be interesting to note that since ts is in monotonically ascending values, purging of old rows also means purging of lower PKs, which means we’re trimming the PK tree from left.

Even while purging took place, I could see the index_size/leaf_pages values dropping, until, finally:

mysql> SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';
+--------------+------------+--------------+--------+--------------+------------+------------+
| table_schema | table_name | index_name   | fields | row_per_keys | index_size | leaf_pages |
+--------------+------------+--------------+--------+--------------+------------+------------+
| newsminer    | logs       | PRIMARY      |      1 | 1            |      40961 |      35262 |
| newsminer    | logs       | s            |      2 | 26, 1        |      34440 |       3798 |
| newsminer    | logs       | name         |      3 | 341011, 4, 1 |       4738 |       2774 |
| newsminer    | logs       | origin       |      3 | 341011, 4, 2 |      10178 |       3281 |
+--------------+------------+--------------+--------+--------------+------------+------------+

The number of deleted rows was roughly 85% of total rows, so down to 15% number of rows.

Status after OPTIMIZE TABLE

Time to see whether OPTIMIZE really optimizes! Will it reduce number of leaf pages in PK? In secondary keys?

mysql> OPTIMIZE TABLE logs;
...
mysql> SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';
+--------------+------------+--------------+--------+--------------+------------+------------+
| table_schema | table_name | index_name   | fields | row_per_keys | index_size | leaf_pages |
+--------------+------------+--------------+--------+--------------+------------+------------+
| newsminer    | logs       | PRIMARY      |      1 | 1            |      40436 |      35323 |
| newsminer    | logs       | s            |      2 | 16, 1        |       5489 |       4784 |
| newsminer    | logs       | name         |      3 | 335813, 7, 1 |       3178 |       2749 |
| newsminer    | logs       | origin       |      3 | 335813, 5, 2 |       3951 |       3446 |
+--------------+------------+--------------+--------+--------------+------------+------------+
4 rows in set (0.00 sec)

The above shows no significant change in either of the indexes: not for index_size, not for leaf_pages, not for statistics (row_per_keys). The OPTIMIZE did not reduce index size. It did not reduce the number of index pages (leaf_pages are the major factor here). Some leaff_pages values have even increased, but in small enough margin to consider as equal.

Index-wise, the above example does not show an advantage to using OPTIMIZE. I confess, I was surprised. And for the better. This indicates InnoDB makes good merging of index pages after massive purging.

So, no use for OPTIMIZE?

Think again: file system-wise, things look different.

Before purging of data:

bash:~# ls -l logs.* -h
-rw-r----- 1 mysql mysql 8.6K 2010-08-15 17:40 logs.frm
-rw-r----- 1 mysql mysql 2.9G 2010-09-02 14:01 logs.ibd

After purging of data:

bash:~# ls -l logs.* -h
-rw-r----- 1 mysql mysql 8.6K 2010-08-15 17:40 logs.frm
-rw-r----- 1 mysql mysql 2.9G 2010-09-02 14:21 logs.ibd

Recall that InnoDB never releases table space back to file system!

After OPTIMIZE on table:

bash:~# ls -l logs.* -h
-rw-rw---- 1 mysql mysql 8.6K 2010-09-02 14:26 logs.frm
-rw-rw---- 1 mysql mysql 428M 2010-09-02 14:43 logs.ibd

On innodb_file_per_table an OPTIMIZE creates a new table space, and the old one gets destroyed. Space goes back to file system. Don’t know about you; I like to have my file system with as much free space as possible.

Need to verify

I’ve tested Percona Server, since this is where I can find INNODB_INDEX_STATS. But this begs the following questions:

Perhaps the results only apply for Percona Server? (I’m guessing not).
Or only for InnoDB plugin? Does the same hold for “builtin” InnoDB? (dunno)
Only on >= 5.1? (Maybe; 5.0 is becoming rare now anyway)
Only on InnoDB (Well, of course this test is storage engine dependent!)

Conclusion

The use case above is a particular example. Other use cases may include tables where deletions often occur in middle of table (remember we were trimming the tree from left side only). Other yet may need to handle UPDATEs to indexed columns. I have some more operations to do here, with larger tables (e.g. 40GB compressed). If anything changes, I’ll drop a note.