common_schema 1.1 released: split(), try-catch, killall(), profiling

I’m very happy to announce the release of common_schema, version 1.1 (revision 300).

This version boasts with compelling new features: innovative QueryScript syntax, libraries, views which add to your skills as a DBA, making some maintenance and management tasks a breeze.

  • QueryScript, split statement: automagically break long queries into smaller chunks, avoid long locks and reduce query/transaction overhead
  • QueryScript, try-catch statement: just try { something; } catch { act_on_error; }.
  • killall(): quickly kill connections based on grantee/user/host information.
  • profiling/profiling_last: utility views to assist in query profiling diagnostics
  • 1 size fits all: a single installer which auto-recognizes available server features and enables respective common_schema features accordingly.
  • QueryScript performance boost
  • much much more…

Not familiar with common_schema? It allows you to do stuff on server side, by selecting from views, calling upon useful routines or writing easy-to-manage scripts.

I’m suggesting that common_schema should be a really-should-have tool to accompany your MySQL install. Did I say “tool”? It’s merely a schema. But it makes for a great framework:

In High Performance MySQL, 3rd edition, Baron Schwartz describes common_schema:

The common_schema is to MySQL as jQuery is to javaScript

Reviewing highlights for version 1.1:

QueryScript

QueryScript is a scripting language. It sees some major improvements here. I’ve made some speed boosts by avoiding using temporary tables, and by using string parsing instead.

Without doubt the two most handy statements added to QueryScript are: Continue reading » “common_schema 1.1 released: split(), try-catch, killall(), profiling”

MySQL/QueryScript use case: DELETE all but top N records per group

Some administrative tasks can be simplified by using common_schema/QueryScript. I’m collecting a bunch of these for documentation. Here’s one for example:

The DBA/developer has the task of retaining only top 3 most populated countries per continent. That is, she has to DELETE 4th, 5th, 6th, … most populated counties in each continent.

Is it possible to work out with a single query? Yes. But the query is not pretty. In fact, it is quite complicated, and either involves unintuitive subqueries, or unintuitive hacks. A normal DBA would not want to write, neither maintain this kind of query, unless top-notch-geek, which is fine.

Since this is a one time job, we just need to get it done. And common_schema/QueryScript provide with the intuitive solution: if we read our demand aloud, we realize we want to delete 4th, 5th, 6th, … populated countries for each continent.

I present a solution made available by QueryScript, and discuss the ways in which the code overcomes limitations, or simplifies complexity:

var $num_countries_to_delete;
foreach($continent, $num_countries: SELECT continent, COUNT(*) FROM world.Country GROUP BY continent)
{
  if ($num_countries > 3)
  {
    set $num_countries_to_delete := $num_countries - 3;
    DELETE FROM world.Country WHERE Continent = $continent ORDER BY Population ASC LIMIT :$num_countries_to_delete;
  }
}

Discussion

The first thing that should be apparent from the above is that this is a programmatic solution. Queries are declarative, which is why complex ones sometimes look incomprehensible. The above is more straightforward. Continue reading » “MySQL/QueryScript use case: DELETE all but top N records per group”

common_schema rev. 218: QueryScript, throttling, processes, documentation

common_schema, revision 218 is released, with major new features, top one being server side scripting. Here are the highlights:

  • QueryScript: server side scripting is now supported by common_schema, which acts as an interpreter for QueryScript code.
  • Throttling for queries is now made available via the throttle() function.
  • Enhancements to processlist-related views, including the new slave_hosts view.
  • Inline documentation/help is available via the help() routine.
  • more…

QueryScript

common_schema makes for a QueryScript implementation for MySQL. You can run server side scripts, interpreted by common_schema, which allow for easy syntax and greater power than was otherwise previously available on the MySQL server. For example:

foreach($table, $schema, $engine: table like '%')
  if ($engine = 'ndbcluster')
    ALTER ONLINE TABLE :$schema.:$table REORGANIZE PARTITION;

QueryScript includes flow control, conditional branching, variables & variable expansion, script throttling and more.

Read more on common_schema’s QueryScript implementation.

Continue reading » “common_schema rev. 218: QueryScript, throttling, processes, documentation”

QueryScript: SQL scripting language

Introducing QueryScript: a programming language aimed for SQL scripting, seamlessly combining scripting power such as flow control & variables with standard SQL statements or RDBMS-specific commands.

QueryScript is available fro MySQL via common_schema, which adds MySQL-specific usage.

What does QueryScript look like? Here are a few code samples:

Turn a bulk DELETE operation into smaller tasks. Throttle in between.

while (DELETE FROM archive.events WHERE ts < CURDATE() LIMIT 1000)
{
  throttle 2;
}

Convert all InnoDB tables in the ‘sakila’ database to compressed format:

foreach ($table, $schema, $engine: table in sakila)
{
  if ($engine = 'InnoDB')
    ALTER TABLE :$schema.:$table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
}

Shard your data across multiple schemata:

foreach($shard: {USA, GBR, JAP, FRA})
{
  CREATE DATABASE db_:$shard;
  CREATE TABLE db_:$shard.city LIKE world.City;
  INSERT INTO db_:$shard.city SELECT * FROM world.City WHERE CountryCode = $shard;
}

Continue reading » “QueryScript: SQL scripting language”

MySQL command line vs. visual editors

Students in my training classes usually prefer to use some kind of visual editor for MySQL. Typically this would be the software they’re using at work. Sometimes they just bring over their laptops with the software installed. Or they would use MySQL Workbench, which is what I usually have pre-installed on their desktops.

I see MySQL Workbench, SQLyog, Toad for MySQL, or several more.

I always humbly suggest they close down their software and open up a command line.

It isn’t fancy. It may not even be convenient (especially on Windows, in my opinion). And repeating your last command with a minor modification requires a lot of key stroking. Or you would copy+paste from some text editor. Most students will give it a shot, then go back to their favorite editor.

Well, again and again I reach the same conclusion:

Visual editors are not as trustworthy as the command line.

Time and again students show me something on their editor. Behavior seems strange to me. Opening up a console shows a completely different picture.

Things like: Continue reading » “MySQL command line vs. visual editors”

More MySQL foreach()

In my previous post I’ve shown several generic use cases for foreach(), a new scripting functionality introduced in common_schema.

In this part I present DBA’s handy syntax for schema and table operations and maintenance.

Confession: while I love INFORMATION_SCHEMA‘s power, I just hate writing queries against it. It’s just so much typing! Just getting the list of tables in a schema makes for this heavy duty query:

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='sakila' AND TABLE_TYPE='BASE TABLE';

When a join is involved this really becomes a nightmare. I think it’s cumbersome, and as result, many do not remember the names and meaning of columns, making for “oh, I need to read the manual all over again just to get that query right”. Anyway, that’s my opinion.

A SHOW TABLES statement is easier to type, but cannot be integrated into a SELECT query (though we have a partial solution for that, too), and besides, when filtering out the views, the SHOW statement becomes almost as cumbersome as the one on INFORMATION_SCHEMA.

Which is why foreach() offers handy shortcuts to common iterations on schemata and tables, as follows:

Use case: iterate all databases

call foreach('schema', 'CREATE TABLE ${schema}.event(event_id INT, msg VARCHAR(128))');

In the above we execute a query on each database. Hmmm, maybe not such a good idea to perform this operation on all databases? Let’s filter them:

Use case: iterate databases by name match

call foreach('schema like wordpress_%', 'ALTER TABLE ${schema}.wp_posts MODIFY COLUMN comment_author VARCHAR(96) NOT NULL');

The above will only iterate my WordPress databases (I have several of these), performing an ALTER on wp_posts for each of those databases. Continue reading » “More MySQL foreach()”

MySQL foreach()

A new routine is now available in common_schema, which makes for an easier execution syntax for some operations:

foreach(collection_to_iterate_over, queries_to_execute_per_iteration_step);

To illustrate what it can do, consider:

call foreach('table in sakila', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB ROW_FORMAT=COMPACT');

call $('schema like shard_%', 'CREATE TABLE ${schema}.messages (id INT)');

call $('2000:2009', 'INSERT IGNORE INTO report (report_year) VALUES (${1})');

$() stands as a synonym to foreach(). I suspect it should look familiar to web programmers.

The idea for foreach() was introduced by Giuseppe Maxia during a correspondence. At first I was skeptic: this isn’t jQuery; this is SQL. Why would I want to use foreach()?

Then Giuseppe provided some use cases, and as I started thinking about it, I found more and more cases where such a tool might considerably reduce scripting overhead and avoid requiring SQL-fu skills. In fact, I have been using it myself for the past few weeks

I provide examples which I hope will convince the reader as for the simplicity of using such syntax. Showing off the types of input foreach() accepts (query, table search, schema search, set of constants, single or double numbers range), and the types of queries it can execute (single, multiple, using placeholders).

I stress that this is not a replacement for common queries (i.e. does not come to replace your common SELECT, UPDATE, DELETE), but more for working out administrative tasks. Nevertheless, the last example in this post does provide with an interesting insight on possible “normal” use. Continue reading » “MySQL foreach()”

Test-driven SQL development

I’m having a lot of fun writing common_schema, an SQL project which includes views, tables and stored routines.

As the project grows (and it’s taking some interesting directions, in my opinion) more dependencies are being introduced, and a change to one routine or view may affect many others. This is why I’ve turned the development on common_schema to be test driven.

Now, just how do you test drive an SQL project?

Well, much like the way you test any other project in your favorite programming language. If its functions you’re testing, that’s all too familiar: functions get some input and provide some output. Hmmm, they might be changing SQL data during that time. With procedures it’s slightly more complex, since they do not directly return output but result sets.

Here’s the testing scheme I use: Continue reading » “Test-driven SQL development”

Useful sed / awk liners for MySQL

Listing some useful sed / awk liners to use with MySQL. I use these on occasion.

sed, awk & grep have many overlapping features. Some simple tasks can be performed by either. For example, stripping empty lines can be performed by either:

grep '.'
awk '/./'
sed '/./!d'
grep -v '^$'
awk '!/^$/'
sed '/^$/d'

It’s a matter of taste & convention which tool and variation to use. So for any script I suggest, there may be many variations, possibly cleaner, shorter; feel free to comment.

mysqldump

The output of mysqldump is in particular useful when one wishes to make transformation on data or metadata. Continue reading » “Useful sed / awk liners for MySQL”

oak-hook-general-log: your poor man’s Query Analyzer

The latest release of openark kit introduces oak-hook-general-log, a handy tool which allows for some analysis of executing queries.

Initially I just intended for the tool to be able to dump the general log to standard output, from any machine capable to connect to MySQL. Quick enough, I realized the power it brings.

With this tool, one can dump to standard output all queries using temporary tables; or using a specific index; or doing a full index scan; or just follow up on connections; or… For example, the following execution will only log queries which make for filesort:

oak-hook-general-log --user=root --host=localhost --password=123456 --filter-explain-filesort

The problem with using the standard logs

So you have the general log, which you don’t often enable, since it tends to grow huge within moments. You then have the slow log. Slow log is great, and is among the top tools for MySQL diagnosis.

The slow log allows for log-queries-not-using-indexes, which is yet another nice feature. Not only should you log any query running for over X seconds, but also log any query which does not use an index.

Wait. This logs all single-row tables (no single row table will use an index), as well as very small tables (a common 20 rows lookup table will most often be scanned). These are OK scans. This makes for some noise in the slow log.

And how about queries which do use an index, but do so poorly? They use an index, but retrieve some 12,500,000 rows, using temporary table & filesort?

What oak-hook-general-log does for you

This tool streams out the general log, and filters out queries based on their role or on their execution plan.

To work at all, it must enable the general log. Moreover, it directs the general log to log table. Mind that this makes for a performance impact, which is why the tool auto-terminates and restores original log settings (default is 1 minute, configurable). It’s really not a tool you should keep running for days. But during the few moments it runs, it will:

  • Routinely rotate the mysql.general_log table so that it doesn’t fill up
  • Examine entries found in the general log
  • Cross reference entries to the PROCESSLIST so as to deduce database context (bug #52554)
  • If required and appropriate, evaluate a query’s execution plan
  • Decide whether to dump each entry based on filtering rules

Filtering rules

Filtering rules are passed as command line options. At current, only one filtering rule applies (if more than one specified only one is used, so no point in passing more than one). Some of the rules are: Continue reading » “oak-hook-general-log: your poor man’s Query Analyzer”