common_schema over traditional scripts

If you are familiar with both openark kit and common_schema, you’ll notice I’ve incorporated some functionality already working in openark kit into common_schema, essentially rewriting what used to be a Python script into SQL/QueryScript.

What was my reasoning for rewriting good code? I wish to explain that, and provide a couple of examples.

I’m generally interested in pushing as much functionality as possible into the MySQL server. When using an external script, one:

  • Needs the right dependencies (OS, Perl/Python version, Perl/Python modules).
  • Needs to provide connection parameters.
  • Needs to get acquainted with a lot of command line options.
  • Is limited by whatever command line options are provided.
  • Has to invoke that script (duh!) to get the work done.

This last bullet is not so trivial: it means you can’t perform some operation with your favorite GUI client, because it has no notion of your Perl script; it does not run on the same machine where your Python code resides; it simply can’t run those scripts for you.

With server-side code, functionality is accessible via any client. You run your operation via a query (e.g. CALL some_procedure). That can be done from your GUI client, your command line client, your event scheduler, your cronjob, all equally. You only need access to your MySQL server, which is trivial.

Of course, server side scripting is limited. Some stuff simply can’t be written solely on the server side. If you want to consult your replicating slave, gracefully take action on a user’s Ctrl+C, or send data over the web, you’ll have to do it with an external tool. There are actually a lot of surprising limitations to things one would assume are possible on the server side. You may already know how frustrated I am by the fact that one can hardly get information out of SHOW commands.

But, when it works, it shines

Let’s review a couple of examples. The first one is nearly trivial; the second, less so.

Killing InnoDB idle transactions

The issue of terminating long-idle, open InnoDB transactions has recently been discussed by many. I wish to add my share, by proposing a quick and clean solution via common_schema.

common_schema 1.2 provides the innodb_transactions view, which relies on INNODB_TRX – one of the InnoDB Plugin views in INFORMATION_SCHEMA – as well as on PROCESSLIST, and so is able to determine with certainty that a transaction has been idle for a long time.

innodb_transactions offers a sql_kill_query column, which produces a ‘KILL QUERY 12345’ type of value. So we can:

SELECT sql_kill_query FROM innodb_transactions WHERE trx_idle_seconds >= 10; 
+-------------------+
| sql_kill_query    |
+-------------------+
| KILL QUERY 292509 |
| KILL QUERY 292475 |
+-------------------+ 

common_schema‘s useful eval() routine allows us to actually invoke those KILL statements, all in a one-liner:

call eval('SELECT sql_kill_query FROM innodb_transactions WHERE trx_idle_seconds >= 10');
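If you wish to automate this, one option (a sketch of mine, assuming the event scheduler is enabled; the event name is made up) is to wrap the one-liner in a recurring event:

CREATE EVENT kill_idle_innodb_transactions
  ON SCHEDULE EVERY 1 MINUTE
  DO
    CALL common_schema.eval('SELECT sql_kill_query FROM common_schema.innodb_transactions WHERE trx_idle_seconds >= 10');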


Speaking in London: common_schema, DBA’s framework for MySQL

I’m happy to be speaking about common_schema at Percona Live London, Dec 4th, 2012. This will be the first chance for me to speak about common_schema, what I believe to be an extremely useful companion to your MySQL server.

I will show you how common_schema can leverage your SQL skills and give you new powers. Stuff you had to be an SQL-Perl super-expert to work through is easily attainable with common_schema and QueryScript. Some maintenance, security and auditing tasks are now a breeze. Writing server side scripts can be fun!

What I won’t do:

  • Read the fancy manual aloud
  • Walk you through every possible view, column, function, parameter and script

What I will do:

  • Explain the nature of common_schema, show you the concepts behind it (and it is built with concepts in mind)
  • Provide selected examples (schema analysis, security, processes & transactions) — there’s much to go by, but no boring drill-down into all the details.
  • Expose some cool hacks with QueryScript
  • Do very brief (as time allows) live demos.

common_schema is an open source project and released under the BSD license.

Last April Roland Bouman gave a great talk about common_schema. This will not be the same talk; a lot has been developed since then.

I’ll be speaking at a very challenging time slot; the three other talks running concurrently with my own are all great ones; I actually wanted to attend all three of them. Heck, Peter wants to (though, contrary to belief, he can’t fork() himself). So, no hard feelings if you choose another talk during this time slot. But I do know where you live.

Purging old rows with QueryScript: three use cases

Problem: you need to purge old rows from a table. This may be your weekly/monthly cleanup task. The table is large, the number of rows to be deleted is large, and doing so in one big DELETE is too heavy.

You can use oak-chunk-update or pt-archiver to accomplish the task. You can also use server-side scripting with QueryScript, which offers a very simple syntax, with no external scripting, dependencies, or command line options.

I wish to present three cases of row deletion, with three different solutions. In all cases we assume some TIMESTAMP column exists in the table, by which we choose the rows to purge. In all cases we assume we wish to purge rows older than 1 month.

We assume the naive query is this:

DELETE FROM my_schema.my_table WHERE row_timestamp < CURDATE() - INTERVAL 1 MONTH

Case 1: TIMESTAMP column is indexed

I almost always index a timestamp column, if only to be able to quickly purge data (but usually also to slice data by date). In this case, where the column is indexed, it’s very easy to figure out which rows are older than 1 month.

We break the naive query into smaller parts, and execute these in sequence, as sketched below.
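A minimal QueryScript sketch of this case (the chunk size and throttle value here are arbitrary choices of mine): the while loop repeats the chunked DELETE for as long as rows are actually being deleted:

while (DELETE FROM my_schema.my_table WHERE row_timestamp < CURDATE() - INTERVAL 1 MONTH LIMIT 1000)
{
  throttle 2;
}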

common_schema 1.2: security, partition management, processes, QueryScript goodies

common_schema 1.2 is released! This version comes shortly after 1.1, yet contains quite a few interesting goodies:

  • Account blocking
  • Security audit
  • RANGE partition management
  • Slave status
  • Better blocking and idle transaction management
  • QueryScript goodies (a quick sketch follows this list):
    • echo, report
    • while-otherwise statement; foreach-otherwise statement
    • Better variable scope handling
    • Complete support for variable expansion
    • Transaction support within QueryScript
  • More summary info and SQL statements in processlist-related views
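To give a quick taste of the QueryScript additions (a sketch of mine; the table and messages are made up), echo and the while-otherwise construct combine like so — the otherwise block runs only if the loop never iterates:

while (DELETE FROM my_schema.my_table WHERE row_timestamp < CURDATE() - INTERVAL 1 MONTH LIMIT 500)
{
  echo Deleted another chunk;
}
otherwise
{
  echo Found nothing to delete;
}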

A closer look at these follows:

Account blocking

A new view called sql_accounts, inspired by oak-block-account (also see here and here), provides the means of blocking user accounts (and releasing them, of course) without revoking their privileges. It offers the SQL statements to block an account (by modifying its password in a symmetric way) and to release an account (by modifying its password back to normal). It really works like a charm. Together with killall(), sql_accounts gives the administrator great control over accounts.
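By way of illustration (the account name is made up, and I assume the view’s generated-statement columns are named sql_block_account and sql_release_account), these statements can be fed into eval(), just like the KILL example above:

call eval("SELECT sql_block_account FROM sql_accounts WHERE user = 'app_user'");
call eval("SELECT sql_release_account FROM sql_accounts WHERE user = 'app_user'");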

Security audit

Imported from openark kit, and implemented via QueryScript, the security_audit() procedure will audit your accounts, passwords and general settings to find problems, pitfalls and security hazards. I will write more on this later.
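Running the audit is a one-liner:

call security_audit();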

RANGE partition management

The sql_range_partitions view manages your RANGE and RANGE COLUMNS partitioned tables by providing the SQL statements to drop the oldest partitions and to create the next (in sequence) partitions. See my earlier post.

Your magical RANGE partitioning maintenance query

If you use RANGE (or RANGE COLUMNS) partitioning, and in particular when partitioning by date/time, then you are subject to the following questions: how and when do you create the “next” partition? How and when do you drop your older partitions?

Many people use in-house scripting to do that, and Giuseppe Maxia wrote the Partition Helper. But I would like to take you one step forward, and provide a query (based on views) which automagically understands which new partition you want to create, and provides you with the statement to do so. It looks somewhat like this (a demo follows later on):

mysql> SELECT * FROM sql_range_partitions \G
*************************** 1. row ***************************
            table_schema: test
              table_name: city
sql_drop_first_partition: alter table `test`.`city` drop partition `p3`
  sql_add_next_partition: alter table `test`.`city` add partition (partition `p_20160101000000` values less than (736329) /* 2016-01-01 00:00:00 */ )
*************************** 2. row ***************************
            table_schema: test
              table_name: quarterly_report_status
sql_drop_first_partition: alter table `test`.`quarterly_report_status` drop partition `p3`
  sql_add_next_partition: alter table `test`.`quarterly_report_status` reorganize partition `p_maxvalue` into (partition `p_20110401000000` values less than (1301608800) /* 2011-04-01 00:00:00 */ , partition p_maxvalue values less than MAXVALUE)
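As with sql_kill_query earlier, these generated statements can be executed via eval(); for example, to add the next partition for a single table of interest:

call eval("SELECT sql_add_next_partition FROM sql_range_partitions WHERE table_schema = 'test' AND table_name = 'city'");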

A closer look at why this is magic

This query just gave you the DROP PARTITION and ADD PARTITION statements for all tables in your databases that use a RANGE partitioning scheme. But, consider:

How common_schema installs itself

Up until common_schema version 1.1, the user would need to choose among distinct distribution files: an install compatible with MySQL 5.1, one compatible with InnoDB Plugin enabled servers, and one compatible with Percona Server. The difference between the three is the availability of certain INFORMATION_SCHEMA tables.

With 1.1, this is no longer the case: common_schema auto-detects the server and available feature set, and installs accordingly.

Wait, isn’t common_schema just a SQL file?

Yes. It’s not like there’s an installer like InstallShield or anything. Nevertheless, common_schema offers a smart way of conditional handling, which it uses on itself. It’s called QueryScript.

common_schema is installed by importing the SQL file (via the SOURCE command; the mysql client; your favorite GUI). This creates your usual tables, views and routines. But some of these routines make up an interpreter for QueryScript. Somewhere along the installation process (remember – it’s just a SQL import), common_schema switches over to executing scripts to manage the installation. In particular, there are a few views which depend on optional tables, such as InnoDB Plugin’s tables for INFORMATION_SCHEMA.
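To sketch the idea (my own simplified illustration, not the actual installation code), a QueryScript snippet can test for an optional INFORMATION_SCHEMA table and branch accordingly:

if (SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'information_schema' AND TABLE_NAME = 'INNODB_TRX')
{
  echo Found InnoDB Plugin tables, will install the dependent views;
}
else
{
  echo No InnoDB Plugin tables, will install fallback views;
}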

How common_schema split()s tables – internals

This post exposes some of the internals, and the SQL behind QueryScript’s split. common_schema/QueryScript 1.1 introduces the split statement, which auto-breaks a “large” query (one which operates on large tables as a whole or without keys) into smaller queries, and executes them in sequence.

This makes for easier transactions, fewer locks held, and potentially (depending on the user) more idle time released back to the database. split shares concepts with oak-chunk-update and pt-archiver, but works differently, and is implemented entirely in SQL, on the server side.

Take the following statement as an example:

split (UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR)
  pass;

It yields (roughly) the following statements:

UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '1')) OR ((`inventory`.`inventory_id` = '1'))) AND (((`inventory`.`inventory_id` < '1000')) OR ((`inventory`.`inventory_id` = '1000'))));
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '1000'))) AND (((`inventory`.`inventory_id` < '2000')) OR ((`inventory`.`inventory_id` = '2000'))));
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '2000'))) AND (((`inventory`.`inventory_id` < '3000')) OR ((`inventory`.`inventory_id` = '3000'))));
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '3000'))) AND (((`inventory`.`inventory_id` < '4000')) OR ((`inventory`.`inventory_id` = '4000'))));
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '4000'))) AND (((`inventory`.`inventory_id` < '4581')) OR ((`inventory`.`inventory_id` = '4581'))));

(I say “roughly” because internally there are user defined variables at play, but for convenience, I spell out the actual values as constants.)

How does that work?

common_schema works on the server side. There is no Perl script or anything. It must therefore use server-side operations to:

  • Identify the table to be split
  • Analyze the table in the first place, deciding how to split it
  • Analyze the query, deciding how to rewrite it
  • Split the table (logically) into unique and distinct chunks
  • Work out the query on each such chunk

Following is an internal look at how common_schema does all the above.
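To give the flavor of it (a simplified sketch of mine, not the literal internal code; I use the same chunk size of 1,000 rows as in the statements above), walking the table’s PRIMARY KEY one chunk at a time might look like this:

-- Lower bound of the first chunk: the minimal key value
SELECT MIN(inventory_id) INTO @range_start FROM sakila.inventory;
-- Upper bound of the chunk: step 1,000 rows up the index
SELECT inventory_id INTO @range_end FROM sakila.inventory
  WHERE inventory_id >= @range_start
  ORDER BY inventory_id LIMIT 999, 1;
-- Apply the original query to this chunk only; @range_end then
-- serves as the next chunk's @range_start
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR
  WHERE inventory_id >= @range_start AND inventory_id <= @range_end;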

Table split(…) for the masses

(pun intended)

common_schema‘s new split statement (see release announcement) auto-splits complex queries over large tables into smaller ones: instead of issuing one huge query, split breaks one’s query into smaller queries, each working on a different set of rows (a chunk).

Thus, it is possible to avoid holding locks for long times, allowing for smaller transactions. It also makes for breathing space for the RDBMS, at times boosting operation speed, and at times prolonging operation time at will (throttling).

In this post I show how split exposes itself to the user, should the user so wish.

split can manage queries of the following forms:

  • DELETE FROM table_name [WHERE]…
  • DELETE FROM table_name USING <multi table syntax> [WHERE]…
  • UPDATE table_name SET … [WHERE]…
  • UPDATE <multiple tables> SET … [WHERE]…
  • INSERT INTO some_table SELECT … FROM <single or multiple tables> [WHERE]…
  • REPLACE INTO some_table SELECT … FROM <single or multiple tables> [WHERE]…
  • SELECT … FROM <multiple tables> [WHERE]…

The last of these is non-obvious at first sight.

Basically, it’s automatic

You just say:

split (UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR)
  throttle 2;

And split identifies sakila.inventory as the table which needs to be split, and injects appropriate conditions so as to work on a subset of the rows, in multiple steps.

By the way, here’s how to execute QueryScript code like the above.
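In short, QueryScript code is passed as text to common_schema’s run() routine; for instance, with the split example from above:

call common_schema.run("split (UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR) throttle 2;");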