In response to Mark Callaghan’s post mk-schema-change.
I apologize for not commenting on the post itself; I do not hold a Facebook account. Anyway, this is a long write-up, so it may as well deserve a post of its own.
Some of the work Mark is describing already exists in openark kit's oak-online-alter-table. Allow me to explain what I have learned there, and how the issue can be further pursued. There is relevance to Mark's suggestion.
oak-online-alter-table uses a combination of locks, chunks and triggers to achieve an almost non-blocking ALTER TABLE effect. I had a very short opportunity to speak with Mark at last year's conference, in between bites. Mark stated that anything involving triggers was irrelevant in his case.
The triggers are a pain, but I believe a few other insights from oak-online-alter-table can be of interest.
The first attempt
My first attempt with the script assumed:
- Table has an AUTO_INCREMENT PRIMARY KEY column
- New rows always gain ascending PRIMARY KEY values
- PRIMARY KEY never changes for an existing row
- PRIMARY KEY values are never reused
- Rows may be deleted at will
- No triggers exist on the table
- No FOREIGN KEYs exist on the table.
So the idea was: when one wants to do an ALTER TABLE:
- Create a ghost table with the new structure.
- Read the minimum and maximum PK values.
- Create AFTER INSERT, AFTER UPDATE, AFTER DELETE triggers on the original table. These triggers will propagate the changes onto the ghost table.
- Working slowly, and in small chunks, copy rows within the recorded min-max range into the ghost table. The interesting part is where the script makes sure there is no contradiction between these actions and those of the triggers (whichever came first!). This is largely solved by using INSERT IGNORE and REPLACE INTO in the proper context.
- Working slowly and in chunks again, remove from the ghost table those rows which no longer exist in the original table.
- Once all chunking is complete, RENAME the original table to *_old, and the ghost table in place of the original table.
Steps 4 & 5 are similar in concept to transactional recovery through redo logs and undo logs.
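The steps above can be sketched as follows. This is a minimal illustration only, using a hypothetical table my_table(id, data); the names and the added column are my own assumptions, not actual oak-online-alter-table output:

```sql
-- Step 1: ghost table with the new structure (assumed change: add a column).
CREATE TABLE my_table_ghost LIKE my_table;
ALTER TABLE my_table_ghost ADD COLUMN new_col INT;

-- Step 3: propagate ongoing changes onto the ghost table.
CREATE TRIGGER my_table_ai AFTER INSERT ON my_table FOR EACH ROW
  REPLACE INTO my_table_ghost (id, data) VALUES (NEW.id, NEW.data);
CREATE TRIGGER my_table_au AFTER UPDATE ON my_table FOR EACH ROW
  REPLACE INTO my_table_ghost (id, data) VALUES (NEW.id, NEW.data);
CREATE TRIGGER my_table_ad AFTER DELETE ON my_table FOR EACH ROW
  DELETE FROM my_table_ghost WHERE id = OLD.id;

-- Step 4: copy one chunk; INSERT IGNORE defers to any row the triggers
-- have already written, so the trigger's (newer) version wins.
INSERT IGNORE INTO my_table_ghost (id, data)
  SELECT id, data FROM my_table
  WHERE id >= @chunk_start AND id < @chunk_start + @chunk_size;

-- Step 6: atomic swap.
RENAME TABLE my_table TO my_table_old, my_table_ghost TO my_table;
```

The RENAME of both tables in one statement is what makes the final swap atomic from the application's point of view.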
The next attempt
Next phase removed the AUTO_INCREMENT requirement, as well as the “no reuse of PK”. In fact, the only remaining constraints were:
- There is some UNIQUE KEY on the table which is unaffected by the ALTER operation
- No triggers exist on the table
- No FOREIGN KEYs exist on the table.
The steps are in general very similar to those listed previously, only now a more elaborate chunking method is used, with a possibly non-integer, possibly multi-column chunking algorithm. Also, the triggers take care of changes to the UNIQUE KEY values themselves.
mk-schema-change?
Have a look at the wiki pages for OnlineAlterTable*. There is some discussion on concurrency issues and on transactional behavior, which explains why oak-online-alter-table performs correctly. Some of these are very relevant, I believe, to Mark's suggestion; in particular, how the chunks are copied, retaining transactional integrity, etc.
To remove any doubt, oak-online-alter-table is not production ready or anywhere near. Use at your own risk. I’ve seen it work, and I’ve seen it crash. I got little feedback and thus little chance to fix things. I also didn’t touch the code for quite a few months now, so I’m a little rusty myself.
@Vamsi,
1. Yes. Originally I tried tuple comparison, e.g. (a,b,c) >= (3,4,5). That didn't turn out well, and I've documented it in: MySQL not being able to utilize a compound index?.
With multiple columns, breaking down the range comparison is ugly indeed, but it's code-generated, so who cares…? Plus, the optimizer uses the key well.
2. With InnoDB, locks are never released during a transaction, but only when it commits or rolls back.
3. Likewise
Shlomi
@Vamsi – InnoDB gets S (shared) locks on rows in the SELECT portion of INSERT/UPDATE/DELETE … SELECT * from FOO. This is done so that the statement produces the same result on a slave. If this were not done, then another connection could start running after the long-running statement above, update the selected table (FOO above), commit and write to the binlog before the INSERT/UPDATE/DELETE … SELECT * from FOO above does.
In that case the long-running statement above doesn’t see the fast update on the master. But on the slave the fast update is applied first from the relay log and the long-running INSERT/UPDATE/DELETE reads it.
Unfortunately, InnoDB almost always gets the S lock even when it shouldn't; the obvious case is when the binlog is disabled. It might also get the S lock when sql_log_bin=0. Of course, we can fix things like that.
Shlomi,
wget is able to get http://code.google.com/p/openarkkit/downloads/detail?name=openark-kit-111.tar.gz,
but tar complains.
tar -zxvf online.tar.gz
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
@Vamsi,
That's because Google changed their linking method; what you have downloaded is an HTML page. Please follow that link with your browser to get to the real download link.
Shlomi
Hi Shlomi,
Regarding your comment “I’ve seen it work, and I’ve seen it crash.”, do you remember which step(s) it was crashing at?
Thanks.