mk-schema-change? Check out ideas from oak-online-alter-table

March 10, 2010

In response to Mark Callaghan's post mk-schema-change.

I apologize for not commenting on the post itself; I do not hold a Facebook account. Anyway, this is a long write-up, so it may as well deserve a post of its own.

Some of the work Mark is describing already exists under openark kit's oak-online-alter-table. Allow me to explain what I have achieved there, and how the issue can be further pursued; there is relevance to Mark's suggestion.

oak-online-alter-table uses a combination of locks, chunks and triggers to achieve an almost non-blocking ALTER TABLE effect. I had a very short opportunity to speak with Mark at last year's conference, in between bites. Mark stated that anything involving triggers was irrelevant in his case.

The triggers are a pain, but I believe a few other insights from oak-online-alter-table can be of interest.

The first attempt

My first attempt with the script assumed:

  • Table has an AUTO_INCREMENT PRIMARY KEY column
  • New rows always gain ascending PRIMARY KEY values
  • PRIMARY KEY never changes for an existing row
  • PRIMARY KEY values are never reused
  • Rows may be deleted at will
  • No triggers exist on the table
  • No FOREIGN KEYs exist on the table

So, when one wants to do an ALTER TABLE, the idea was:

  1. Create a ghost table with the new structure.
  2. Read the minimum and maximum PK values.
  3. Create AFTER INSERT, AFTER UPDATE, AFTER DELETE triggers on the original table. These triggers will propagate the changes onto the ghost table.
  4. Working slowly, and in small chunks, copy rows within the recorded min-max range into the ghost table. The interesting part is where the script makes sure there's no contradiction between these actions and those of the triggers (whichever came first!). This is largely solved using INSERT IGNORE and REPLACE INTO in the proper context.
  5. Working slowly and in chunks again, remove rows from the ghost table which no longer exist in the original table.
  6. Once all chunking is complete, RENAME the original table to *_old, and the ghost table in place of the original.

Steps 4 & 5 are similar in concept to transactional recovery through redo logs and undo logs.
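The trigger and chunk-copy statements of steps 3 & 4 can be sketched as code-generated SQL. This is a minimal illustration of the idea only, not the actual oak-online-alter-table code; the table, column and trigger names are hypothetical, and the real tool handles chunking, locking and cleanup with far more care:

```python
# Sketch of steps 3 & 4, assuming a hypothetical table with an integer
# PRIMARY KEY column `id`. Not the real oak-online-alter-table code.

def propagation_triggers(table, ghost, columns):
    """Generate AFTER INSERT/UPDATE/DELETE triggers that mirror
    ongoing changes from the original table onto the ghost table."""
    cols = ", ".join(columns)
    new_vals = ", ".join("NEW." + c for c in columns)
    return [
        # REPLACE INTO lets the trigger win regardless of whether the
        # chunked copy already wrote this row, or will write it later.
        f"CREATE TRIGGER {table}_ai AFTER INSERT ON {table} FOR EACH ROW "
        f"REPLACE INTO {ghost} ({cols}) VALUES ({new_vals})",
        f"CREATE TRIGGER {table}_au AFTER UPDATE ON {table} FOR EACH ROW "
        f"REPLACE INTO {ghost} ({cols}) VALUES ({new_vals})",
        f"CREATE TRIGGER {table}_ad AFTER DELETE ON {table} FOR EACH ROW "
        f"DELETE FROM {ghost} WHERE id = OLD.id",
    ]

def chunk_copy(table, ghost, columns, lo, hi):
    """One chunk of step 4. INSERT IGNORE defers to rows the triggers
    have already propagated, which are by definition newer."""
    cols = ", ".join(columns)
    return (f"INSERT IGNORE INTO {ghost} ({cols}) "
            f"SELECT {cols} FROM {table} WHERE id BETWEEN {lo} AND {hi}")
```

The INSERT IGNORE / REPLACE INTO pairing is the heart of the "no contradiction" property: whichever of the trigger and the chunk copy runs first, the trigger's version of the row survives.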

The next attempt

The next phase removed the AUTO_INCREMENT requirement, as well as the "no reuse of PK" constraint. In fact, the only remaining constraints were:

  • There is some UNIQUE KEY on the table which is unaffected by the ALTER operation
  • No triggers exist on the table
  • No FOREIGN KEYs exist on the table

The steps are in general very similar to those listed previously, only now a more elaborate chunking method is used, with a possibly non-integer, possibly multi-column chunking algorithm. Also, the triggers take care of changes to the UNIQUE KEY values themselves.
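The multi-column chunking boils down to expanding a tuple comparison into a form the optimizer handles well. Here is an illustrative generator in the spirit of what the tool does (not its actual code; value quoting is simplified):

```python
def range_clause(columns, last_values):
    """Expand a conceptual tuple comparison (col1, col2, ...) > (x, y, ...)
    into an explicit disjunction, e.g. for two columns:
    ((col1 > x) OR (col1 = x AND col2 > y)).
    `last_values` are the key values where the previous chunk ended."""
    disjuncts = []
    for i in range(len(columns)):
        # All columns before position i are pinned to equality...
        conjuncts = [f"{columns[j]} = {last_values[j]!r}" for j in range(i)]
        # ...and column i must strictly exceed its last seen value.
        conjuncts.append(f"{columns[i]} > {last_values[i]!r}")
        disjuncts.append("(" + " AND ".join(conjuncts) + ")")
    return "(" + " OR ".join(disjuncts) + ")"
```

The clause grows with the number of key columns, but since it is generated rather than hand-written, the verbosity is harmless.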


Have a look at the wiki pages for OnlineAlterTable*. There is some discussion on concurrency issues and on transactional behavior, which explains why oak-online-alter-table performs correctly. Some of these are very relevant, I believe, to Mark's suggestion: in particular, making the chunked copies while retaining transactional integrity.

To remove any doubt: oak-online-alter-table is not production ready, or anywhere near it. Use at your own risk. I've seen it work, and I've seen it crash. I got little feedback and thus little chance to fix things. I also haven't touched the code for quite a few months now, so I'm a little rusty myself.

  • Mark Callaghan

    Thanks for being specific with my quote on triggers. I am not willing to use them in production. Maybe I am not able to use them. They might work great for others but I have enough features to worry about and am cautious to take on more. If Shlomi or another MySQL expert suggests you should use triggers for a specific problem, then by all means do so.

  • I think your quote on triggers took some 30% of our entire conversation 😀

    By all means, this is not a discussion on triggers. There's an algorithm besides the triggers here.

  • Mark Callaghan

    We finally got around to working on a tool for this. It uses triggers. Did you publish code for your version?

  • Nagavamsi Ponnekanti (Vamsi)

    Interesting approach. Some questions I have are
    1. For a multi-column unique key, chunking would make the WHERE clause complicated, right? E.g., for a 2-column key, if the 1st select reads up to [x, y], the WHERE clause of the 2nd select will look like ((col1 = x AND col2 > y) OR (col1 > x)). Seems to get even messier with 3-column keys. If we use such complex WHERE clauses, is the MySQL optimizer smart enough to make use of the index on the unique key?

    2. MySQL seems to take read locks on rows of table S in statements like "insert into T(...) select ... from S". Are those row locks held till commit, or released as soon as the scan moves to the next row?

    3. Likewise, does "delete from T where key not in (select ... from S)" also take READ row locks on S, and if so, are those read locks held till commit?

  • Hi Mark,
    Yes, code is published for over a year: openark kit
    Look for oak-online-alter-table.

  • @Vamsi,

    1. Yes. Originally I tried tuple comparison, e.g. (a,b,c) >= (3,4,5). That didn't turn out well, and I've documented it in: MySQL not being able to utilize a compound index?.
    With multiple columns, breaking down the range comparison is ugly indeed, but it's code-generated, so who cares...? Plus, the optimizer uses the key well.

    2. With InnoDB, locks are never released during a transaction, but only when it commits or rolls back.

    3. Likewise


  • Mark Callaghan

    @Vamsi - InnoDB gets S locks on rows in the SELECT portion of INSERT/UPDATE/DELETE ... SELECT * from FOO. This is done so that the statement produces the same result on a slave. If this were not done, then another connection could start running after the long-running statement above, update the selected table (FOO above), commit, and write to the binlog before the INSERT/UPDATE/DELETE ... SELECT * from FOO above does.

    In that case the long-running statement above doesn't see the fast update on the master. But on the slave the fast update is applied first from the relay log and the long-running INSERT/UPDATE/DELETE reads it.

    Unfortunately, InnoDB almost always gets the S lock even when it shouldn't -- the obvious case being when the binlog is disabled. It might also get the S lock when sql_log_bin=0. Of course, we can fix things like that.

  • Nagavamsi Ponnekanti (Vamsi)

    wget is able to get,
    but tar complains.

    tar -zxvf online.tar.gz

    gzip: stdin: not in gzip format
    tar: Child returned status 1
    tar: Error exit delayed from previous errors

  • @Vamsi,
    That's because Google changed their linking method; what you have downloaded is an HTML page. Please follow this link with your browser to get to the real download link.


  • Nagavamsi Ponnekanti (Vamsi)

    Hi Shlomi,
    Regarding your comment "I've seen it work, and I've seen it crash.", do you remember which step(s) it was crashing at?

  • @Vamsi,

    Shall we continue this discussion by mail? You can find my email address here.
