Percona Live MySQL Conference and Expo 2013, Call for Papers

I’m honored to accept the role of conference chairman in the upcoming Percona Live MySQL Conference and Expo, April 22nd-25th, 2013, Santa Clara, CA. Here are a few administrative announcements:

Call for Papers

The call for papers is extended until Oct 31st. We’ve already received a good number of submissions, but wish to provide another chance for latecomers. Now is a good time to submit!

The committee will begin the long reviewing process shortly.

Committee

Members of the reviewing committee are:

  • Roland Bouman (Pentaho)
  • Ronald Bradford (Continuent)
  • Laine Campbell (PalominoDB)
  • Colin Charles (MontyProgram)
  • Jeremy Cole (Twitter)
  • Patrick Galbraith (HP)
  • Henrik Ingo (Nokia)
  • Jay Janssen (Percona)
  • Giuseppe Maxia (Continuent)
  • Domas Mituzas (Facebook)
  • Ivan Zoratti (SkySQL)

Continue reading » “Percona Live MySQL Conference and Expo 2013, Call for Papers”

Thoughts on MySQL 5.6 new replication features

After playing a little bit with MySQL 5.6 (RC), and following closely on Giuseppe’s MySQL 5.6 replication gotchas (and bugs), I was having some thoughts.

These are shared for a few reasons:

  • Maybe I didn’t understand it well, and someone could correct me
  • Or I understood it well, and my input could be of service to the developers
  • Or it could be of service to the users

InnoDB tables in mysql schema

The introduction of InnoDB tables in mysql makes for crash-safe replication information: the exact replication position (master log file+pos, relay log file+pos etc.) is updated on InnoDB tables; with innodb_flush_log_at_trx_commit=1 this means replication status is durable and consistent with server data. This is great news!
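
For reference, the replication state now lives in InnoDB tables in the mysql schema; here’s a quick peek, assuming the server is configured with master_info_repository=TABLE and relay_log_info_repository=TABLE (column names as per the 5.6 documentation):

-- replication coordinates as stored by the 5.6 crash-safe repositories
SELECT Master_log_name, Master_log_pos
  FROM mysql.slave_master_info;

SELECT Relay_log_name, Relay_log_pos, Master_log_name, Master_log_pos
  FROM mysql.slave_relay_log_info;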

However, the introduction of InnoDB tables to the mysql schema also breaks some common usage on installation and setup of MySQL servers. You can’t just drop your ibdata1 file upon dump+restore, since it also contains internal data. Giuseppe outlines the workaround for that.

I was thinking: would it be possible to have a completely different tablespace for MySQL’s internal InnoDB tables? That could be a single tablespace file (who cares about file-per-table on a few internal tables). And I’m throwing an idea without being intimate with the internals: you know how it is possible to span the shared tablespace across multiple files, as in: Continue reading » “Thoughts on MySQL 5.6 new replication features”
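
For reference, the familiar way of spanning the shared tablespace across multiple files is the innodb_data_file_path setting, along these lines (a my.cnf sketch; file names and sizes here are arbitrary examples):

[mysqld]
# shared tablespace spread over two files, the last one auto-extending
innodb_data_file_path = ibdata1:512M;ibdata2:512M:autoextend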

Your magical RANGE partitioning maintenance query

If you use RANGE (or RANGE COLUMNS) partitioning, and in particular when partitioning by date/time, then you are subject to the following questions: how and when do you create the “next” partition? How and when do you drop your older partitions?

Many people use in-house scripting to do that, and Giuseppe Maxia wrote Partition Helper. But I would like to take you one step forward, and provide you with a query (based on views) which automagically understands which new partition you want to create, and provides you with the statement to do so. It looks somewhat like this (a demo follows later on):

mysql> SELECT * FROM sql_range_partitions \G
*************************** 1. row ***************************
            table_schema: test
              table_name: city
sql_drop_first_partition: alter table `test`.`city` drop partition `p3`
  sql_add_next_partition: alter table `test`.`city` add partition (partition `p_20160101000000` values less than (736329) /* 2016-01-01 00:00:00 */ )
*************************** 2. row ***************************
            table_schema: test
              table_name: quarterly_report_status
sql_drop_first_partition: alter table `test`.`quarterly_report_status` drop partition `p3`
  sql_add_next_partition: alter table `test`.`quarterly_report_status` reorganize partition `p_maxvalue` into (partition `p_20110401000000` values less than (1301608800) /* 2011-04-01 00:00:00 */ , partition p_maxvalue values less than MAXVALUE)
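
For context, the partition layout these statements are derived from is available via the standard partitioning metadata; here is a simplified peek at the demo’s test.city table (this is not the view’s actual definition, just the metadata it reasons about):

-- RANGE partition boundaries as exposed by INFORMATION_SCHEMA
SELECT PARTITION_NAME, PARTITION_METHOD, PARTITION_EXPRESSION, PARTITION_DESCRIPTION
  FROM INFORMATION_SCHEMA.PARTITIONS
  WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'city'
  ORDER BY PARTITION_ORDINAL_POSITION;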

A closer look at why this is magic

This query just gave you the DROP PARTITION and ADD PARTITION for all tables in your databases that use a RANGE partitioning scheme. But, consider: Continue reading » “Your magical RANGE partitioning maintenance query”

MySQL 5.6 new features: the user’s perspective

This is yet another compilation of the new MySQL 5.6 feature set. It is not a complete drill down. This list reflects what I believe to be the interesting new features, user- and usability-wise.

For example, I won’t be listing InnoDB’s split of kernel mutex. I’m assuming it can have a great impact on overall performance due to reducing lock contention; but usability-wise, this is very internal.

This compilation is an aggregate of the many announcements and other compilations published earlier on. See a reference at the end of this post.

Do note I am not using 5.6 as yet; it is in RC, not GA. I am mostly excited just to write down this list.

InnoDB

  • Online ALTER TABLE: if there is one major new feature in 5.6 you would want to upgrade for, this would be it. Add columns, drop columns, rename columns, add indexes, drop indexes – now online, while your SELECT, INSERT, UPDATE and DELETE statements are running.
  • Transportable tablespace files: copy+paste your_table.ibd files with FLUSH TABLES … FOR EXPORT and ALTER TABLE … IMPORT TABLESPACE (a sketch follows after this list).
  • FULLTEXT: for many, the one thing holding them back from leaving MyISAM behind. Now available in InnoDB with same syntax as with MyISAM.
  • Memcached API: access InnoDB data via the memcached protocol, and skip the SQL interface.
  • User defined table location: place your tables in your pre-defined location. Place other tables elsewhere. This is something I’ve been asked about for ages.
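
For illustration, the transportable tablespace flow is expected to look roughly like this (a sketch based on the 5.6 documentation; table and file names are made up, and the destination table must already exist with the same definition):

-- on the source server: quiesce the table, then copy its .ibd and .cfg files away
FLUSH TABLES world.City FOR EXPORT;
-- (copy City.ibd and City.cfg out of the source datadir at this point)
UNLOCK TABLES;

-- on the destination server: discard the empty tablespace, place the copied
-- files in the datadir, then import
ALTER TABLE world.City DISCARD TABLESPACE;
ALTER TABLE world.City IMPORT TABLESPACE;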

Continue reading » “MySQL 5.6 new features: the user’s perspective”

Three wishes for a new year

Another new year by Jewish calendar. What do I wish for the following year?

  1. World peace
  2. Good health to all
  3. Get a decent, long-awaited implementation of Window Functions (aka Analytic Functions) for MySQL.

I mean, I like GROUP_CONCAT, and the many hacks it provides: [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. But it makes for a poor substitute for Window Functions, and only solves a subset of issues.

My wishes in the previous two years [2010], [2011] have not come true. I’m still willing to settle for two out of three.

How common_schema installs itself

Up until common_schema version 1.1, the user would need to choose from distinct distribution files: an install compatible with MySQL 5.1, one compatible with InnoDB Plugin-enabled servers, and one compatible with Percona Server. The difference between the three is the availability of certain INFORMATION_SCHEMA tables.

With 1.1, this is no longer the case: common_schema auto-detects the server and available feature set, and installs accordingly.

Wait, isn’t common_schema just a SQL file?

Yes. It’s not like there’s an installer like InstallShield or anything. Nevertheless, common_schema offers a smart way of conditional handling, which it uses on itself. It’s called QueryScript.

common_schema is installed by importing the SQL file (via SOURCE command; the mysql client; your favorite GUI). This creates your usual tables, views and routines. But some of these routines make for an interpreter for QueryScript. Somewhere along the installation process (remember – it’s just a SQL import), common_schema switches over to executing scripts to manage the installation. In particular, there are a few views which depend on optional tables, such as InnoDB Plugin’s tables for INFORMATION_SCHEMA. Continue reading » “How common_schema installs itself”
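
As a simplified illustration of the kind of check involved (this is not the actual installer code), detecting whether the InnoDB Plugin INFORMATION_SCHEMA tables are present boils down to a plain query:

-- does this server expose the InnoDB Plugin INFORMATION_SCHEMA tables?
SELECT COUNT(*) > 0 AS has_innodb_plugin_tables
  FROM INFORMATION_SCHEMA.TABLES
  WHERE TABLE_SCHEMA = 'information_schema'
    AND TABLE_NAME LIKE 'INNODB_%';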

How common_schema split()s tables – internals

This post exposes some of the internals, and the SQL behind QueryScript’s split. common_schema/QueryScript 1.1 introduces the split statement, which auto-breaks a “large” query (one which operates on large tables as a whole or without keys) into smaller queries, and executes them in sequence.

This makes for easier transactions, fewer locks held, and potentially (depending on the user) more idle time released back to the database. split shares concepts with oak-chunk-update and pt-archiver, but works differently, and is implemented entirely in SQL, on the server side.

Take the following statement as example:

split (UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR)
  pass;

It yields (roughly) the following statements:

UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '1')) OR ((`inventory`.`inventory_id` = '1'))) AND (((`inventory`.`inventory_id` < '1000')) OR ((`inventory`.`inventory_id` = '1000'))));
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '1000'))) AND (((`inventory`.`inventory_id` < '2000')) OR ((`inventory`.`inventory_id` = '2000'))));
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '2000'))) AND (((`inventory`.`inventory_id` < '3000')) OR ((`inventory`.`inventory_id` = '3000'))));
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '3000'))) AND (((`inventory`.`inventory_id` < '4000')) OR ((`inventory`.`inventory_id` = '4000'))));
UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR WHERE ((((`inventory`.`inventory_id` > '4000'))) AND (((`inventory`.`inventory_id` < '4581')) OR ((`inventory`.`inventory_id` = '4581'))));

(I say “roughly” because internally there are user-defined variables at play, but for convenience, I spell out the actual values as constants.)

How does that work?

common_schema works on the server side. There is no Perl script or anything. It must therefore use server-side operations to:

  • Identify the table to be split
  • Analyze the table in the first place, deciding how to split it
  • Analyze the query, deciding on how to rewrite it
  • Split the table (logically) into unique and distinct chunks (a rough sketch of this step follows the list)
  • Work out the query on each such chunk
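
As a rough illustration of the chunking step alone (this is not common_schema’s actual internal code, which relies on user defined variables and dynamic SQL), the next chunk’s upper bound can be derived by walking the PRIMARY KEY:

-- given the previous chunk's upper bound, find the next one by walking
-- 1,000 rows ahead on the PRIMARY KEY; near the end of the table this
-- returns no row, and the last chunk is closed at MAX(inventory_id)
SELECT inventory_id INTO @next_chunk_end
  FROM sakila.inventory
  WHERE inventory_id > @prev_chunk_end
  ORDER BY inventory_id
  LIMIT 999, 1;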

Following is an internal look at how common_schema does all the above. Continue reading » “How common_schema split()s tables – internals”

Table split(…) for the masses

(pun intended)

common_schema‘s new split statement (see release announcement) auto-splits complex queries over large tables into smaller ones: instead of issuing one huge query, split breaks one’s query into smaller queries, each working on a different set of rows (a chunk).

Thus, it is possible to avoid holding locks for long periods, allowing for smaller transactions. It also makes for breathing space for the RDBMS, at times boosting operation speed, and at times prolonging operation time at will.

In this post I show how split exposes itself to the user, should the user so wish.

split can manage queries of the following forms:

  • DELETE FROM table_name [WHERE]…
  • DELETE FROM table_name USING <multi table syntax> [WHERE]…
  • UPDATE table_name SET … [WHERE]…
  • UPDATE <multiple tables> SET … [WHERE]…
  • INSERT INTO some_table SELECT … FROM <single or multiple tables> [WHERE]…
  • REPLACE INTO some_table SELECT … FROM <single or multiple tables> [WHERE]…
  • SELECT … FROM <multiple tables> [WHERE]…

The latter is a non-obvious one at first sight.

Basically, it’s automatic

You just say:

split (UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR)
  throttle 2;

And split identifies sakila.inventory as the table which needs to be split, and injects appropriate conditions so as to work on a subset of the rows, in multiple steps.

By the way, here’s how to execute a QueryScript code like the above. Continue reading » “Table split(…) for the masses”
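
In short, a script is handed over as text for common_schema to execute; here is a minimal sketch, assuming common_schema’s run() routine (check your version’s documentation for the exact invocation):

-- assuming common_schema's run() routine accepts the script text
call common_schema.run("
  split (UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR)
    throttle 2;
");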

common_schema 1.1 released: split(), try-catch, killall(), profiling

I’m very happy to announce the release of common_schema, version 1.1 (revision 300).

This version boasts compelling new features: innovative QueryScript syntax, libraries, and views which add to your skills as a DBA, making some maintenance and management tasks a breeze.

  • QueryScript, split statement: automagically break long queries into smaller chunks, avoid long locks and reduce query/transaction overhead
  • QueryScript, try-catch statement: just try { something; } catch { act_on_error; } (a sketch follows after this list).
  • killall(): quickly kill connections based on grantee/user/host information.
  • profiling/profiling_last: utility views to assist in query profiling diagnostics
  • One size fits all: a single installer which auto-recognizes available server features and enables respective common_schema features accordingly.
  • QueryScript performance boost
  • much much more…
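
As a taste of the try-catch syntax quoted above, here is a minimal QueryScript sketch (the statements inside are arbitrary examples, and the script is to be run through common_schema’s script execution):

-- minimal try-catch sketch, following the syntax quoted in the list above
try {
  UPDATE sakila.film SET rental_rate = rental_rate * 1.1;
}
catch {
  SELECT 'update failed, moving on' AS note;
}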

Not familiar with common_schema? It allows you to do stuff on the server side, by selecting from views, calling upon useful routines or writing easy-to-manage scripts.

I’m suggesting that common_schema should be a really-should-have tool to accompany your MySQL install. Did I say “tool”? It’s merely a schema. But it makes for a great framework:

In High Performance MySQL, 3rd edition, Baron Schwartz describes common_schema:

The common_schema is to MySQL as jQuery is to JavaScript

Reviewing highlights for version 1.1:

QueryScript

QueryScript is a scripting language. It sees some major improvements here. I’ve made some speed boosts by avoiding temporary tables, and by using string parsing instead.

Without doubt the two most handy statements added to QueryScript are: Continue reading » “common_schema 1.1 released: split(), try-catch, killall(), profiling”

SQL: selecting top N records per group, another solution

A while back I presented SQL: selecting top N records per group, a “give me the top 5 countries in each continent” type of query, which used an external numbers table and a lot of tedious casting.

Here’s another solution I came up with (*). Still using GROUP_CONCAT (how else?), but no external table and no casting. The query outputs the largest 5 countries (by surface area) per continent.

SELECT
  Continent,
  Name,
  SurfaceArea,
  Population
FROM
  world.Country,
  (
    SELECT 
      GROUP_CONCAT(top_codes_per_group) AS top_codes
    FROM
      (
        SELECT 
          SUBSTRING_INDEX(GROUP_CONCAT(Code ORDER BY SurfaceArea DESC), ',', 5) AS top_codes_per_group
        FROM
          world.Country
        GROUP BY
          Continent
      ) s_top_codes_per_group
  ) s_top_codes
WHERE
  FIND_IN_SET(Code, top_codes)
ORDER BY
  Continent,
  SurfaceArea DESC
;

+---------------+----------------------------------------------+-------------+------------+
| Continent     | Name                                         | SurfaceArea | Population |
+---------------+----------------------------------------------+-------------+------------+
| Asia          | China                                        |  9572900.00 | 1277558000 |
| Asia          | India                                        |  3287263.00 | 1013662000 |
| Asia          | Kazakstan                                    |  2724900.00 |   16223000 |
| Asia          | Saudi Arabia                                 |  2149690.00 |   21607000 |
| Asia          | Indonesia                                    |  1904569.00 |  212107000 |
| Europe        | Russian Federation                           | 17075400.00 |  146934000 |
| Europe        | Ukraine                                      |   603700.00 |   50456000 |
| Europe        | France                                       |   551500.00 |   59225700 |
| Europe        | Spain                                        |   505992.00 |   39441700 |
| Europe        | Sweden                                       |   449964.00 |    8861400 |
| North America | Canada                                       |  9970610.00 |   31147000 |
| North America | United States                                |  9363520.00 |  278357000 |
| North America | Greenland                                    |  2166090.00 |      56000 |
| North America | Mexico                                       |  1958201.00 |   98881000 |
| North America | Nicaragua                                    |   130000.00 |    5074000 |
| Africa        | Sudan                                        |  2505813.00 |   29490000 |
| Africa        | Algeria                                      |  2381741.00 |   31471000 |
| Africa        | Congo, The Democratic Republic of the        |  2344858.00 |   51654000 |
| Africa        | Libyan Arab Jamahiriya                       |  1759540.00 |    5605000 |
| Africa        | Chad                                         |  1284000.00 |    7651000 |
| Oceania       | Australia                                    |  7741220.00 |   18886000 |
| Oceania       | Papua New Guinea                             |   462840.00 |    4807000 |
| Oceania       | New Zealand                                  |   270534.00 |    3862000 |
| Oceania       | Solomon Islands                              |    28896.00 |     444000 |
| Oceania       | New Caledonia                                |    18575.00 |     214000 |
| Antarctica    | Antarctica                                   | 13120000.00 |          0 |
| Antarctica    | French Southern territories                  |     7780.00 |          0 |
| Antarctica    | South Georgia and the South Sandwich Islands |     3903.00 |          0 |
| Antarctica    | Heard Island and McDonald Islands            |      359.00 |          0 |
| Antarctica    | Bouvet Island                                |       59.00 |          0 |
| South America | Brazil                                       |  8547403.00 |  170115000 |
| South America | Argentina                                    |  2780400.00 |   37032000 |
| South America | Peru                                         |  1285216.00 |   25662000 |
| South America | Colombia                                     |  1138914.00 |   42321000 |
| South America | Bolivia                                      |  1098581.00 |    8329000 |
+---------------+----------------------------------------------+-------------+------------+

The conditions by which we nominate our selected rows are the ORDER BY SurfaceArea DESC inside the GROUP_CONCAT (the sort condition) and the 5 passed to SUBSTRING_INDEX (the number of rows), hence the 5 largest countries per continent.
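
To see the core trick in isolation: per continent, GROUP_CONCAT emits the codes ordered by surface area, and SUBSTRING_INDEX keeps only the first 5. A tiny, self-contained illustration (the code list below is a made-up ordering):

SELECT SUBSTRING_INDEX('CHN,IND,KAZ,SAU,IDN,MNG,IRN', ',', 5) AS top_codes_per_group;
-- returns: 'CHN,IND,KAZ,SAU,IDN'

On larger groups, note that group_concat_max_len may need to be increased so that GROUP_CONCAT does not silently truncate the list.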

Continue reading » “SQL: selecting top N records per group, another solution”