Online ALTER TABLE now available in openark kit

March 24, 2009

A new utility in openark kit allows for an online ALTER TABLE operation: that is, modifying a table's structure without locking down the entire table for the duration of the operation. The oak-online-alter-table utility works under the following restrictions:

  • The table has at least one single-column UNIQUE KEY [*]
  • Altered table shares a single-column UNIQUE KEY with the original table [*]
  • No 'AFTER' triggers are defined on the table (the utility creates its own triggers for the duration of the operation)
  • The table has no FOREIGN KEYs [*][#]
  • Table name is no longer than 57 characters

[*]: Restriction is scheduled to be removed or partly removed.

[#]: 'Child-side' foreign keys may actually work, but have not been tested.

What follows is a mini FAQ introducing the utility.

So what exactly does this utility provide?

  • First and foremost, the ability to perform a non-blocking ALTER TABLE. This has long been an issue with MySQL, and complex Master-Master, application-aware solutions are currently required in order to perform an ALTER TABLE with minimal downtime. The utility offers a no-downtime solution, albeit with a performance penalty for the duration of its runtime, and some requirements to meet.
  • It also supports a 'null' ALTER. That is, an ALTER TABLE which does not change anything. This effectively means a rebuild of the table. For InnoDB tables with innodb_file_per_table, for example, this could be a means of regaining disk space after removing many rows from the table. Also, while it does not strictly act like OPTIMIZE TABLE, running this utility should produce a better-organized table on disk (as yet unverified).
  • Another thing this utility supports is the building of a ghost table: a duplicate of a given table, which keeps mirroring the original table via triggers. [May be removed in future versions]

How does it work?

Well, there are some wiki pages [1] [2] [3] which explain the details, but in general it goes as follows:

  1. The utility creates an empty 'ghost' table, a copy of the original
  2. The ghost table is ALTERed (or left unchanged, if no ALTER specified)
  3. Triggers are created on the original table, which propagate changes to the ghost table.
  4. In two passes, data is copied from the original table to the ghost table; rows that were meanwhile deleted from the original are then purged from the ghost table.
  5. The ghost table is renamed to take place of the original table; the original table is dropped.
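The five steps above can be sketched in code. The following is a minimal, self-contained simulation using Python's sqlite3 standing in for MySQL; the table `t`, the chunk size, and the column names are all made up for illustration, and the real utility issues MySQL-specific statements and handles locking:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The original table, with a single-column UNIQUE KEY (here, the PRIMARY KEY).
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
cur.executemany("INSERT INTO t VALUES (?, ?)",
                [(i, "row%d" % i) for i in range(1, 101)])

# Steps 1-2: create an empty 'ghost' copy, then ALTER it (here: add a column).
cur.execute("CREATE TABLE t_ghost (id INTEGER PRIMARY KEY, val TEXT)")
cur.execute("ALTER TABLE t_ghost ADD COLUMN extra TEXT DEFAULT ''")

# Step 3: triggers on the original table propagate ongoing changes.
cur.executescript("""
CREATE TRIGGER t_ai AFTER INSERT ON t BEGIN
  INSERT OR REPLACE INTO t_ghost (id, val) VALUES (NEW.id, NEW.val);
END;
CREATE TRIGGER t_au AFTER UPDATE ON t BEGIN
  INSERT OR REPLACE INTO t_ghost (id, val) VALUES (NEW.id, NEW.val);
END;
CREATE TRIGGER t_ad AFTER DELETE ON t BEGIN
  DELETE FROM t_ghost WHERE id = OLD.id;
END;
""")

# Step 4, first pass: copy existing rows in small chunks, walking the UNIQUE KEY.
low, chunk = 0, 10
while True:
    cur.execute("INSERT OR IGNORE INTO t_ghost (id, val) "
                "SELECT id, val FROM t WHERE id > ? ORDER BY id LIMIT ?",
                (low, chunk))
    cur.execute("SELECT MAX(id) FROM t_ghost WHERE id > ?", (low,))
    top = cur.fetchone()[0]
    if top is None:
        break
    low = top

# Writes arriving meanwhile are caught by the triggers.
cur.execute("INSERT INTO t VALUES (200, 'late row')")
cur.execute("DELETE FROM t WHERE id = 1")

# Step 4, second pass: purge ghost rows whose original row was deleted.
cur.execute("DELETE FROM t_ghost WHERE id NOT IN (SELECT id FROM t)")

# Step 5: drop the original (its triggers go with it) and rename the ghost.
cur.execute("DROP TABLE t")
cur.execute("ALTER TABLE t_ghost RENAME TO t")

cur.execute("SELECT COUNT(*) FROM t")
print(cur.fetchone()[0])  # prints 100 (100 original rows - 1 deleted + 1 inserted)
```

Note how the chunked copy only ever touches a small UNIQUE KEY range at a time; this is what keeps the original table available for reads and writes throughout.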

Here's an invocation sample, in which we modify the type of the AUTO_INCREMENT column, add a column, add a key and drop a column - all at the same time, all online:

oak-online-alter-table -u root --ask-pass --socket=/tmp/mysql.sock --database=world --table=City --alter="MODIFY Id BIGINT UNSIGNED AUTO_INCREMENT, ADD COLUMN Mayor VARCHAR(64) CHARSET utf8, DROP COLUMN District, ADD KEY(Population)"

What kind of ALTERs are supported?

  • ADD COLUMN (new column must have a default value)
  • DROP COLUMN (as long as there's still a shared single-column UNIQUE KEY between the old table and the altered one)
  • MODIFY COLUMN (change type, including UNIQUE KEY columns)
  • ADD KEY (normal, unique, fulltext etc.)
  • DROP KEY (as long as there's still a shared single-column UNIQUE KEY between the old table and the altered one)
  • Change ENGINE: works, but great care should be taken when dealing with non-transactional engines
  • Adding FOREIGN KEY constraints is possible
  • More... Not all ALTERs have been tested, e.g. CONVERT, partition handling...
  • none: it is possible to just do an online rebuild of the table, with no modification to its structure

The following are not allowed while the utility is running:

  • An ALTER TABLE on the original table (well, obviously)
  • A TRUNCATE on the original table
  • LOAD DATA INFILE into the original table
  • OPTIMIZE on the original table

In short, while this utility runs, just act normal...

Which engines are supported?

MyISAM & InnoDB are both supported. Any engine which uses table-level locks is implicitly supported (Memory, Archive). There is currently no support for Maria (the transactional engine), PBXT or Falcon, and no near-future plans to add them.

It is possible to ALTER the ENGINE of a table, but this is strongly discouraged (due to differences in transactional behavior, or lack of transactional support altogether).

With InnoDB, and when long transactions are running during the runtime of this utility, one of the steps (which requires a LOCK TABLES ... WRITE) may take some time to complete: the utility repeatedly attempts to acquire the locks until successful. However, these attempts do not block other sessions, and once the locks are gained they are held only briefly.
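This retry behavior amounts to the following pattern (a minimal sketch with made-up names; the real utility issues a LOCK TABLES ... WRITE with a short timeout against MySQL, rather than calling a Python function):

```python
import time

def lock_with_retries(try_lock, max_attempts=100, sleep_seconds=0.1):
    """Keep retrying a short lock attempt so other sessions are never
    blocked for long; return the attempt number that succeeded."""
    for attempt in range(1, max_attempts + 1):
        if try_lock():  # stands in for a short-timeout LOCK TABLES ... WRITE
            return attempt
        time.sleep(sleep_seconds)  # back off; ongoing transactions keep running
    raise TimeoutError("could not acquire table lock")

# Simulated server: a long transaction releases its locks on the 3rd attempt.
state = {"calls": 0}
def fake_try_lock():
    state["calls"] += 1
    return state["calls"] >= 3

print(lock_with_retries(fake_try_lock, sleep_seconds=0))  # prints 3
```

The key point is that each individual attempt times out quickly, so concurrent transactions are never stuck waiting behind the utility's lock request.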

What impact does this utility have?

For the duration of runtime, you can expect a performance penalty. There are many reasons for that:

  • Triggers are added to your table. This, by itself, is a performance penalty.
  • Any change to your table is propagated to the ghost table, so every INSERT effectively becomes two.
  • More locks are held. While this utility is non-blocking, it does lock small chunks in turn, for very short periods. Still, locks are held.
  • While the ghost table is being built, disk space of up to the same size as your original table is consumed. Depending on your ENGINE and storage means, some disk space may be released later on.

I was drunk when I decided to run it, but now that I'm sober, I find out it's still running and I want to undo everything!

OK. First, don't drink and DBA. Next, stop the utility, then run it again with --cleanup. All should be back to normal.

How mature is this utility? I think there's a bug!

It is very experimental. It seems to be working well as far as I know, but community feedback is appreciated. Please report bugs here. Thank you!


  • While useful in some cases, the problem with this is that for big tables the process is still quite painful and in fact made worse; with this process, at some point there'll be 3 copies of the data (the original table, the extra one, and the newly written one while ALTER TABLE is doing its thing).
    For large tables the issue is often both diskspace as well as disk I/O.

  • @Arjen
    Not so. There will only be two tables: the original one, and the ALTERed one.
    See, I create the ALTERed table as empty, and fill it on the fly. This is why you only have two copies - not worse than the "real" ALTER TABLE, which builds a copy - while blocking your entire table.

  • @Arjen
    Disk space - you get the same issue when doing a normal ALTER TABLE, either directly, or with M-M replication. In any case you're spending disk space (unless you dump+import, which is a pain in itself)

  • flytox

    Interesting work. I'm curious to check the generated overhead while altering the table.

  • @flytox,

    I'm actually running this on live servers (my own...) and find that load average does not increase significantly. What's most affected is slave lag. Using the correct --sleep-time allows slaves to keep up better.


  • Great work, I'm interested in testing it

  • Mohammad,
    please do, I will appreciate any feedback


  • Jon d

    Thanks for the tool, I'm using it now... I've been doing similar things myself manually: first create the schema, then do a first pass, then copy the remaining rows, then rename/drop. Great script.

  • The link to oak-online-alter-table in your post is now broken.

  • @Ronald: thanks, fixed

  • Saurav Kothari

    Lots of broken links. Links to oak-online-alter-table from openark homepage and other pages are broken.
