Review – code.openark.org http://shlomi-noach.github.io/blog/ Blog by Shlomi Noach Tue, 18 Dec 2012 18:41:36 +0000 en-US hourly 1 https://wordpress.org/?v=5.3.3 32412571 State of InnDB Online DDL in MySQL 5.6.9-RC (good news included) https://shlomi-noach.github.io/blog/mysql/state-of-inndb-online-ddl-in-mysql-5-6-9-rc-good-news-included https://shlomi-noach.github.io/blog/mysql/state-of-inndb-online-ddl-in-mysql-5-6-9-rc-good-news-included#comments Tue, 18 Dec 2012 11:21:12 +0000 https://shlomi-noach.github.io/blog/?p=5882 5.6.9-RC is out, and I was curious to see how the online DDL has improved since my 5.6.8 review. I also owe James Day this review, since he came up with results inconsistent with my own.

We both agreed the dataset I was using was too small, but I got similar results even on larger scale. Then some time passed, and 5.6.9 was announced.

So for the 5.6.9 test I took one of my real tables on production. It is not extremely large: it’s a ~ 300MB .ibd file, in the following format:

mysql> show create table tbl \G

CREATE TABLE `tbl` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `a` varchar(255) CHARACTER SET utf8 NOT NULL DEFAULT '',
  `w` smallint(11) NOT NULL DEFAULT '0',
  `d` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `icount` smallint(5) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`) KEY_BLOCK_SIZE=8,
  UNIQUE KEY `u_idx` (`a`,`w`,`d`) KEY_BLOCK_SIZE=8,
  KEY `d` (`d`) KEY_BLOCK_SIZE=8
) ENGINE=InnoDB AUTO_INCREMENT=16960441 DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16

Got some 2.5M rows in the table; desktop machine, 64 bit Linux, mysqlsandbox.

I have crossed several DDL statements with several DML statements. The DDL statements in this test are (ALTER TABLE…):

  • ROW_FORMAT=COMPACT
  • AUTO_INCREMENT=16960441
  • ADD INDEX (w)
  • DROP INDEX w
  • ADD COLUMN c CHAR(1) NOT NULL
  • DROP COLUMN c

The DML statements are:

  1. select max(id) from test.tbl; — this queries the AUTO_INCREMENT value, which is of course a PRIMARY KEY
  2. select min(d) from test.tbl; — there is an index on d, and normal execution plan is to optimize table away and just use the index
  3. select min(icount) from test.tbl; — there is no index on icount, and full table scan is required
  4. update test.tbl set d = d + interval 1 second where id = 8057370; — the UPDATE uses the PRIMARY KEY
  5. update test.tbl set d = d + interval 1 second where icount = 200; — will affect 4 rows, but requires full scan.

The results?

ALTER TABLE… Time (sec) General comments select max PK select min by index select min by full scan update by PK update by full scan
ROW_FORMAT=COMPACT 183 online online online online online
AUTO_INCREMENT=16960441 0.24 [Instant operation] n/a n/a n/a n/a n/a
ADD INDEX (w) 21 online online online online online
DROP INDEX w 0.1 [Instant operation] n/a n/a n/a n/a n/a
ADD COLUMN c CHAR(1) NOT NULL 103 online online online online online
DROP COLUMN c 110 online online online online online

Notes

  • All operations were online: operations did not wait for ALTER to complete.
  • I executed all operations multiple times during each ALTER.
  • In addition, I executed operations from another client.
  • Some operations were fast, others sometimes took as long as 7.34 seconds to complete. This is no small matter: the time it took for each DML was indeterministic, and longer than what it would usually take it. That’s perfectly understandable. Just note that some operations took exceedingly long time to complete. My understanding is that the ALTER operations happens in chunks. DML statements are allowed in between these chunks. This is the reason why on smaller tables there didn’t seem to be any “online” statement: the chunks were just too large in relation to table size. And so, and this is still my own understanding, your query may get lucky or unlucky depending on the exact moment it has been issued.
  • I did not try it with FOREIGN KEYs. I previously concluded that foreign keys were a no-go for online DDL. I’m not sure if this is still the case. Another time for this test – but it must take place.

Conclusions

Still RC – but for the first time the online DDL seem to deliver what’s promised. I’m very happy to see this.

I am yet to understand how the ALTER works via replication. With single threaded replication I would assume it’s back to “wait till I’m done” on the slave, in which case the “online” term is not there yet. Even on multi-threaded replication DML on same schema would hang. I’m happy to be corrected on this by an authority.

My predicament is that oak-online-alter-table or pt-online-schema-change are here to stay for the next couple of years at least. Some operations, like partitioning, are not supported by current online InnoDB DDL. Also, these scripts allow you some control over the speed at which the ALTER process works, allowing for pre-defined sleep time in between chunks, so as to let the server – and its slaves – recover their breath.

Nonetheless, big kudos for the InnoDB team at Oracle for pulling this one out!

]]>
https://shlomi-noach.github.io/blog/mysql/state-of-inndb-online-ddl-in-mysql-5-6-9-rc-good-news-included/feed 4 5882
State of InnDB Online DDL in MySQL 5.6.8-RC https://shlomi-noach.github.io/blog/mysql/state-of-inndb-online-ddl-in-mysql-5-6-8-rc https://shlomi-noach.github.io/blog/mysql/state-of-inndb-online-ddl-in-mysql-5-6-8-rc#comments Tue, 20 Nov 2012 09:49:14 +0000 https://shlomi-noach.github.io/blog/?p=5823 5.6.8-rc is out, and so I’m following up on InnoDB’s online DDL new feature: the ability to SELECT, INSERT, DELETE, UPDATE a table even while an ALTER TABLE is executing on same table.

The brief summary

Not as advertised; many things can’t be done.

The longer review

I’m using 5.6.8-rc 64bit binary distribution for Linux, installed via mysqlsandbox. My hardware is irrelevant, but the fact I’m testing on my laptop assists me in that ALTER TABLE operations take a while, so that I’m able to easily type commands in two terminals and have the time to watch them being executed. Query cache is disabled.

I’m using the sakila sample database, and in particular I’m working with the rental table. Here’s the table definition:

CREATE TABLE `rental` (
  `rental_id` int(11) NOT NULL AUTO_INCREMENT,
  `rental_date` datetime NOT NULL,
  `inventory_id` mediumint(8) unsigned NOT NULL,
  `customer_id` smallint(5) unsigned NOT NULL,
  `return_date` datetime DEFAULT NULL,
  `staff_id` tinyint(3) unsigned NOT NULL,
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`rental_id`),
  UNIQUE KEY `rental_date` (`rental_date`,`inventory_id`,`customer_id`),
  KEY `idx_fk_inventory_id` (`inventory_id`),
  KEY `idx_fk_customer_id` (`customer_id`),
  KEY `idx_fk_staff_id` (`staff_id`),
  CONSTRAINT `fk_rental_staff` FOREIGN KEY (`staff_id`) REFERENCES `staff` (`staff_id`) ON UPDATE CASCADE,
  CONSTRAINT `fk_rental_inventory` FOREIGN KEY (`inventory_id`) REFERENCES `inventory` (`inventory_id`) ON UPDATE CASCADE,
  CONSTRAINT `fk_rental_customer` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`customer_id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=16050 DEFAULT CHARSET=utf8

Highlights for the table: AUTO_INCREMENT PRIMARY KEY, some columns indexed, some not, and Foreign Keys in place. Pretty much a standard table. It contains 16,044 rows. Row format is COMPACT.

What I want to know is: which DDL commands allow for which online DML commands?

So, on terminal #1 I will issue queries like:

node1 5.6.8-rc-log sakila> alter table sakila.rental ROW_FORMAT=COMPACT /* or whatever */;
Query OK, 0 rows affected (10.57 sec)
Records: 0  Duplicates: 0  Warnings: 0

And during the above operation, I will execute the following on terminal #2:

  1. select max(rental_id) from sakila.rental; this queries the AUTO_INCREMENT value, which is of course a PRIMARY KEY
  2. select min(rental_date) from sakila.rental; there is an index on rental_date, and normal execution plan is to optimize table away and just use the index
  3. select min(return_date) from sakila.rental; there is no index on return_date, and full table scan is required
  4. update rental set return_date = return_date + interval 1 second where rental_id=3; the UPDATE uses the PRIMARY KEY
  5. update rental set return_date = return_date + interval 1 second where return_date = NOW(); won’t actually affect anything, but requires full scan.

So here are the results:

+-------------------------------------------------------------+-------+---------------------------+---------------+---------------------+-------------------------+--------------+---------------------+
| ALTER statement                                             | Time  | General comments          | select max PK | select min by index | select min by full scan | update by PK | update by full scan |
+-------------------------------------------------------------+-------+---------------------------+---------------+---------------------+-------------------------+--------------+---------------------+
| ROW_FORMAT=COMPACT                                          | 10.92 |                           | Instant       | Instant             | Instant                 | blocked      | blocked             |
| AUTO_INCREMENT=16051                                        |  0.06 | Instant, no table rebuild | N/A           | N/A                 | N/A                     | N/A          | N/A                 |
| ADD INDEX(last_update)                                      |  2.37 |                           | blocked       | blocked             | blocked                 | blocked      | blocked             |
| ADD INDEX(last_update), ALGORITHM=INPLACE                   |  1.83 |                           | blocked       | blocked             | blocked                 | blocked      | blocked             |
| ADD INDEX(last_update), ALGORITHM=INPLACE, LOCK=NONE        |  0.00 | ERROR 1235 (42000): ...   | N/A           | N/A                 | N/A                     | N/A          | N/A                 |
| ADD COLUMN c CHAR(1) NOT NULL                               | 11.20 |                           | Instant       | Instant             | Instant                 | blocked      | blocked             |
| ADD COLUMN c CHAR(1) NOT NULL, ALGORITHM=INPLACE, LOCK=NONE |  0.00 | ERROR 1235 (42000): .     | N/A           | N/A                 | N/A                     | N/A          | N/A                 |
+-------------------------------------------------------------+-------+---------------------------+---------------+---------------------+-------------------------+--------------+---------------------+

Rather surprising, I would say.

  • None of my tests resolved with online write (UPDATE). At best I could get online read (SEELCT).
  • AUTO_INCREMENT is instantaneous. High time for that! It’s just some number in the .frm file, never understood the need for table rebuild.
  • Apparently ADD COLUMN is more online than ADD INDEX, and I’ve tested this again and again and again to make sure I was doing it right. This is quite weird, even according to the docs.
  • In none of the above tests (and others, non listed), have I been able to specify LOCK=NONE. It’s always ERROR 1235 (42000): This version of MySQL doesn’t yet support ‘alter table sakila.rental <whatever>, algorithm=inplace, lock=none’.

So what’s so online about this? Online reads are nice, but most everyone cannot accept blocking writes (for same reason no one would use mysqlhotcopy, also so wrongly named). This leaves us again with oak-online-alter-table and pt-online-schema-change.

The butler did it

Apologies to the butler, the FOREIGN KEYs did it. Let’s try the same again without foreign keys:

node1 5.6.8-rc-log sakila> create table rental2 like rental;
node1 5.6.8-rc-log sakila> insert into rental2 select * from rental;
node1 5.6.8-rc-log sakila> rename table rental to rental_old, rental2 to rental;
Query OK, 0 rows affected (0.31 sec)

Here are the results:

+-------------------------------------------------------------+-------+---------------------------+----------------+---------------------+-------------------------+----------------+---------------------+
| ALTER statement                                             | Time  | General comments          | select max PK  | select min by index | select min by full scan | update by PK   | update by full scan |
+-------------------------------------------------------------+-------+---------------------------+----------------+---------------------+-------------------------+----------------+---------------------+
| ROW_FORMAT=COMPACT                                          | 11.03 |                           | Instant        | Instant             | Instant                 | Instant        | Instant             |
| AUTO_INCREMENT=16051                                        |  0.05 | Instant, no table rebuild | N/A            | N/A                 | N/A                     | N/A            | N/A                 |
| ADD INDEX(last_update)                                      |  2.04 |                           | * Inconsistent | * Inconsistent      | * Inconsistent          | blocked        | blocked             |
| ADD INDEX(last_update), ALGORITHM=INPLACE, LOCK=NONE        |  3.14 |                           | * Inconsistent | * Inconsistent      | * Inconsistent          | blocked        | blocked             |
| ADD COLUMN c CHAR(1) NOT NULL                               |    ** |                           | * Inconsistent | * Inconsistent      | * Inconsistent          | * Inconsistent | * Inconsistent      |
| ADD COLUMN c CHAR(1) NOT NULL, ALGORITHM=INPLACE, LOCK=NONE |    ** |                           | * Inconsistent | * Inconsistent      | * Inconsistent          | * Inconsistent | * Inconsistent      |
+-------------------------------------------------------------+-------+---------------------------+----------------+---------------------+-------------------------+----------------+---------------------+

What’s going on here?

  • ALGORITHM=INPLACE, LOCK=NONE is accepted! Bad, bad foreign keys!
  • * ADD INDEX usually allows for concurrent reads, but after repeated tests SELECTs start to block. Then they don’t work concurrently anymore until table is recreated. But even that not always, so I’m not sure what the inconsistency is.
  • * ADD COLUMN is still more concurrent than ADD INDEX, and actually allows for concurrent writes! Though, inconsistently. Sometimes it does not allow for concurrent writes.
  • ** ADD COLUMN runtime highly affected by concurrent queries. It wents as high as 45 seconds on my laptop. Now, to make things clear, I’m not running an automated benchmark here: I’m copying+pasting the statements from my editor to the mysql CLI. So, maybe 10 or 15SELECT and UPDATE queries executes. How does that justify 35 seconds delay in table rebuild?

Some conclusions:

  • The documentation does not specify anything about FOREIGN KEYs crashing the party. It should.
  • The documentation specifically mentions the ADD/DROP INDEX statements to be online. ADD INDEX is less online than ADD COLUMN.
  • Everything is still shaky. Sometimes things work, sometimes they don’t.
  • Runtimes are unproportionally affected by concurrent queries.
  • For the meantime, I keep to my online alter table scripts. Been using them for 3.5 years now.
]]>
https://shlomi-noach.github.io/blog/mysql/state-of-inndb-online-ddl-in-mysql-5-6-8-rc/feed 1 5823