How common_schema installs itself

Up till common_schema version 1.1, the user would need to choose from distinct distribution files: an install compatible with MySQL 5.1, one compatible with InnoDB Plugin enabled servers, and one compatible with Percona Server. The difference between the three is the availability of certain INFORMATION_SCHEMA tables.

With 1.1, this is no longer the case: common_schema auto-detects the server and available feature set, and installs accordingly.

Wait, isn’t common_schema just a SQL file?

Yes. It’s not like there’s an installer like InstallShield or anything. Nevertheless, common_schema offers a smart way of conditional handling, which is uses in itself. It’s called QueryScript.

common_schema is installed by importing the SQL file (via SOURCE command; the mysql client; your favorite GUI). This creates your usual tables, views and routines. But some of these routines make for an interpreter for QueryScript. Somewhere along the installation process (remember – it’s just a SQL import), common_schema switches over to executing scripts to manage the installation. In particular, there are a few views which depend on optional tables, such as InnoDB Plugin’s tables for INFORMATION_SCHEMA. Continue reading » “How common_schema installs itself”

How common_schema split()s tables – internals

This post exposes some of the internals, and the SQL behind QueryScript’s split. common_schema/QueryScript 1.1 introduces the split statement, which auto-breaks a “large” query (one which operates on large tables as a whole or without keys) into smaller queries, and executes them in sequence.

This makes for easier transactions, less locks held, potentially (depending on the user) more idle time released back to the database. split has similar concepts to oak-chunk-update and pt-archiver, but works differently, and implemented entirely in SQL on server side.

Take the following statement as example:

split (UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR)
  pass;

It yields with (roughly) the following statements:

UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '1')) OR ((`inventory`.`inventory_id` = '1'))) AND (((`inventory`.`inventory_id` < '1000')) OR ((`inventory`.`inventory_id` = '1000'))));
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '1000'))) AND (((`inventory`.`inventory_id` < '2000')) OR ((`inventory`.`inventory_id` = '2000'))));
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '2000'))) AND (((`inventory`.`inventory_id` < '3000')) OR ((`inventory`.`inventory_id` = '3000'))));
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '3000'))) AND (((`inventory`.`inventory_id` < '4000')) OR ((`inventory`.`inventory_id` = '4000'))));
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '4000'))) AND (((`inventory`.`inventory_id` < '4581')) OR ((`inventory`.`inventory_id` = '4581'))));

(I say “roughly” because internally there are user defined variables at play, but for convenience, I verbose the actual values as constants.)

How does that work?

common_schema works on server side. There is no Perl script or anything. It must therefore use server-side operations to:

  • Identify table to be split
  • Analyze the table in the first place, deciding how to split it
  • Analyze the query, deciding on how to rewrite it
  • Split the table (logically) into unique and distinct chunks
  • Work out the query on each such chunk

Following is an internal look at how common_schema does all the above. Continue reading » “How common_schema split()s tables – internals”

Table split(…) for the masses

(pun intended)

common_schema‘s new split statement (see release announcement) auto-splits complex queries over large tables into smaller ones: instead of issuing one huge query, split breaks one’s query into smaller queries, each working on a different set of rows (a chunk).

Thus, it is possible to avoid holding locks for long times, allowing for smaller transactions. It also makes for breathing space for the RDBMS, at times boosting operation speed, and at times prolonging operation speed at will.

In this post I show how split exposes itself to the user, should the user wish so.

split can manage queries of the following forms:

  • DELETE FROM table_name [WHERE]…
  • DELETE FROM table_name USING <multi table syntax> [WHERE]…
  • UPDATE table_name SET … [WHERE]…
  • UPDATE <multiple tables> SET … [WHERE]…
  • INSERT INTO some_table SELECT … FROM <single or multiple tables> [WHERE]…
  • REPLACE INTO some_table SELECT … FROM <single or multiple tables> [WHERE]…
  • SELECT … FROM <multiple tables> [WHERE]…

The latter being a non-obvious one at first sight.

Basically, it’ automatic

You just say:

split (UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR)
  throttle 2;

And split identifies sakila.inventory as the table which needs to be split, and injects appropriate conditions so as to work on a subset of the rows, in multiple steps.

By the way, here’s how to execute a QueryScript code like the above. Continue reading » “Table split(…) for the masses”

common_schema 1.1 released: split(), try-catch, killall(), profiling

I’m very happy to announce the release of common_schema, version 1.1 (revision 300).

This version boasts with compelling new features: innovative QueryScript syntax, libraries, views which add to your skills as a DBA, making some maintenance and management tasks a breeze.

  • QueryScript, split statement: automagically break long queries into smaller chunks, avoid long locks and reduce query/transaction overhead
  • QueryScript, try-catch statement: just try { something; } catch { act_on_error; }.
  • killall(): quickly kill connections based on grantee/user/host information.
  • profiling/profiling_last: utility views to assist in query profiling diagnostics
  • 1 size fits all: a single installer which auto-recognizes available server features and enables respective common_schema features accordingly.
  • QueryScript performance boost
  • much much more…

Not familiar with common_schema? It allows you to do stuff on server side, by selecting from views, calling upon useful routines or writing easy-to-manage scripts.

I’m suggesting that common_schema should be a really-should-have tool to accompany your MySQL install. Did I say “tool”? It’s merely a schema. But it makes for a great framework:

In High Performance MySQL, 3rd edition, Baron Schwartz describes common_schema:

The common_schema is to MySQL as jQuery is to javaScript

Reviewing highlights for version 1.1:

QueryScript

QueryScript is a scripting language. It sees some major improvements here. I’ve made some speed boosts by avoiding using temporary tables, and by using string parsing instead.

Without doubt the two most handy statements added to QueryScript are: Continue reading » “common_schema 1.1 released: split(), try-catch, killall(), profiling”

DELETE, don’t INSERT

Have just read INSERT, Don’t DELETE by Aaron Brown, and have some lengthy response, which is why I write this post instead of commenting on said post.

I wish to offer my counter thought and suggest that DELETEs are probably the better choice.

Aaron suggests that, when one wishes to purge rows from some table, a trick can be used: instead of DELETEing unwanted rows, one can INSERT “good” rows into a new table, then switch over with RENAME (but please read referenced post for complete details).

I respectfully disagree on several points discussed.

Lockdown

The fact one needs to block writes during the time of creation of new table is problematic: you need to essentially turn off parts of your application. The posts suggests one could use a slave – but this solution is far from being trivial as well. To switch over, you yet again need to turn off access to DB, even if for a short while. Continue reading » “DELETE, don’t INSERT”

On stored routines and dynamic statements

I very much enjoyed reading Overloading Procedures by Michael McLaughlin: good stuff!

I’m dealing with similar issues in common_schema/QueryScript, where I implement a whole new scripting language within MySQL, interpreted by stored routines. I am now finalizing the next version of common_schema/QueryScript, with a major addition to the scripting language to put yet even more power at the hands of the programmer/DBA using simple, clean syntax.

Still hush hush, the development of that feature touched at the very same issues described in Michael’s post. Present in current release, these issues are intensified by the use and complexity of the new development. Here are a few insights of mine:

Internal array implementation

Like Michael, I started by implementing arrays through tables. That is, create a (temporary, in my case) table, wrap it up with a lot of stored routine code, and simulate an array. This array is not yet provided to the user, but is used internally for QueryScript’s own code.

Well, disappointment here: during load tests on intense structures, such as a foreach loop, where each iteration of the loop requires the creation of an array, I found that the current solution does not hold well on busy servers.

Seemingly, there’s nothing wrong with the creation of a new table every once in a while — and in particular a temporary table. However, I quickly found out that a busy server thrashes the table cache with such intense rate of creation/dropping of tables. The competition over the table cache mutex becomes intolerable and hogs not only the script’s execution but the entire server’s.

There’s also the issue of the type of array values — no going around using textual columns, of course, but — how long? A VARCHAR(32767) should be enough for any reasonable implementation, but — how much memory would that consume? Both MEMORY and standard temporary tables (Percona Server has that partially resolved) use a fixed row format, which means a 32K text is actually allocated in memory even when your value is ‘x’. Continue reading » “On stored routines and dynamic statements”

MySQL/QueryScript use case: DELETE all but top N records per group

Some administrative tasks can be simplified by using common_schema/QueryScript. I’m collecting a bunch of these for documentation. Here’s one for example:

The DBA/developer has the task of retaining only top 3 most populated countries per continent. That is, she has to DELETE 4th, 5th, 6th, … most populated counties in each continent.

Is it possible to work out with a single query? Yes. But the query is not pretty. In fact, it is quite complicated, and either involves unintuitive subqueries, or unintuitive hacks. A normal DBA would not want to write, neither maintain this kind of query, unless top-notch-geek, which is fine.

Since this is a one time job, we just need to get it done. And common_schema/QueryScript provide with the intuitive solution: if we read our demand aloud, we realize we want to delete 4th, 5th, 6th, … populated countries for each continent.

I present a solution made available by QueryScript, and discuss the ways in which the code overcomes limitations, or simplifies complexity:

var $num_countries_to_delete;
foreach($continent, $num_countries: SELECT continent, COUNT(*) FROM world.Country GROUP BY continent)
{
  if ($num_countries > 3)
  {
    set $num_countries_to_delete := $num_countries - 3;
    DELETE FROM world.Country WHERE Continent = $continent ORDER BY Population ASC LIMIT :$num_countries_to_delete;
  }
}

Discussion

The first thing that should be apparent from the above is that this is a programmatic solution. Queries are declarative, which is why complex ones sometimes look incomprehensible. The above is more straightforward. Continue reading » “MySQL/QueryScript use case: DELETE all but top N records per group”

common_schema rev. 218: QueryScript, throttling, processes, documentation

common_schema, revision 218 is released, with major new features, top one being server side scripting. Here are the highlights:

  • QueryScript: server side scripting is now supported by common_schema, which acts as an interpreter for QueryScript code.
  • Throttling for queries is now made available via the throttle() function.
  • Enhancements to processlist-related views, including the new slave_hosts view.
  • Inline documentation/help is available via the help() routine.
  • more…

QueryScript

common_schema makes for a QueryScript implementation for MySQL. You can run server side scripts, interpreted by common_schema, which allow for easy syntax and greater power than was otherwise previously available on the MySQL server. For example:

foreach($table, $schema, $engine: table like '%')
  if ($engine = 'ndbcluster')
    ALTER ONLINE TABLE :$schema.:$table REORGANIZE PARTITION;

QueryScript includes flow control, conditional branching, variables & variable expansion, script throttling and more.

Read more on common_schema’s QueryScript implementation.

Continue reading » “common_schema rev. 218: QueryScript, throttling, processes, documentation”

QueryScript: SQL scripting language

Introducing QueryScript: a programming language aimed for SQL scripting, seamlessly combining scripting power such as flow control & variables with standard SQL statements or RDBMS-specific commands.

QueryScript is available fro MySQL via common_schema, which adds MySQL-specific usage.

What does QueryScript look like? Here are a few code samples:

Turn a bulk DELETE operation into smaller tasks. Throttle in between.

while (DELETE FROM archive.events WHERE ts < CURDATE() LIMIT 1000)
{
  throttle 2;
}

Convert all InnoDB tables in the ‘sakila’ database to compressed format:

foreach ($table, $schema, $engine: table in sakila)
{
  if ($engine = 'InnoDB')
    ALTER TABLE :$schema.:$table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
}

Shard your data across multiple schemata:

foreach($shard: {USA, GBR, JAP, FRA})
{
  CREATE DATABASE db_:$shard;
  CREATE TABLE db_:$shard.city LIKE world.City;
  INSERT INTO db_:$shard.city SELECT * FROM world.City WHERE CountryCode = $shard;
}

Continue reading » “QueryScript: SQL scripting language”