Or: how to make it work in TokuDB version 7.0.1. This is a follow-up to a discussion on the tokudb-user group.
I wanted to test TokuDB's compression. I took a staging machine of mine, with production data, and migrated it from Percona Server 5.5 to MariaDB 5.5+TokuDB 7.0.1. The migration went well, with no problems.
To my surprise, when I converted tables from InnoDB to TokuDB, I saw an increase in table file size on disk. As explained by Tim Callaghan, this was due to TokuDB interpreting my compressed table's "KEY_BLOCK_SIZE=4" as an instruction for TokuDB's page size. TokuDB should be using a 4MB block size, but thinks it's being instructed to use 4KB. The problem is that you can't get rid of table options: even when one converts the table to InnoDB in ROW_FORMAT=COMPACT, or to MyISAM, the KEY_BLOCK_SIZE option keeps lurking in the dark.
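To get a feel for how many tables carry the option before converting anything, a query along these lines may help. It is a minimal sketch: it assumes the table-level option shows up in the CREATE_OPTIONS column of information_schema.TABLES, and it says nothing about per-index settings.

```sql
-- List tables whose table-level options mention KEY_BLOCK_SIZE.
-- Assumption: CREATE_OPTIONS reports the option for these tables.
SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE, ROW_FORMAT, CREATE_OPTIONS
FROM information_schema.TABLES
WHERE CREATE_OPTIONS LIKE '%KEY_BLOCK_SIZE%'
  AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema');
```

Any table listed here is a candidate for the bloat described above when converted to TokuDB 7.0.1.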
So until this is hopefully resolved in TokuDB's next version, here's a way to work around the problem.
The case at hand
Consider the following table:
CREATE TABLE `t` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c1` int(10) unsigned NOT NULL DEFAULT '0',
  `c2` int(10) unsigned NOT NULL DEFAULT '0',
  `c3` int(10) unsigned NOT NULL DEFAULT '0',
  `c4` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `c5` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `c6` smallint(10) unsigned NOT NULL DEFAULT '0',
  `c7` smallint(10) unsigned NOT NULL DEFAULT '0',
  `c8` smallint(10) unsigned NOT NULL DEFAULT '0',
  `c9` smallint(10) unsigned NOT NULL DEFAULT '0',
  `c10` smallint(10) unsigned NOT NULL DEFAULT '0',
  `c11` smallint(10) NOT NULL DEFAULT '0',
  `c12` smallint(10) NOT NULL DEFAULT '0',
  `c13` smallint(10) NOT NULL DEFAULT '0',
  `c14` smallint(10) NOT NULL DEFAULT '0',
  `ct` text NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `c1c4` (`c1`,`c4`),
  KEY `c4` (`c4`)
) ENGINE=InnoDB AUTO_INCREMENT=4688271 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4
Note that it is in COMPRESSED format, with KEY_BLOCK_SIZE=4. It mostly has INT columns, so I don't expect it to compress by much.
On disk, the .ibd file amounts to 160MB. Table has 3,587,488 rows. Same table in InnoDB COMPACT row format amounts to 412MB on disk.
Converting the table to TokuDB with aggressive compression resulted in:
mysql> alter table t engine=tokudb row_format=tokudb_lzma;
Query OK, 3587488 rows affected (29 min 48.79 sec)
Records: 3587488 Duplicates: 0 Warnings:
And over 873MB of combined files on disk! Also note it took nearly 30 minutes to ALTER. Clearly this is not the expected outcome.
Attempt to make it work
I tried both the following approaches:
- alter table t engine=tokudb row_format=tokudb_lzma key_block_size=4096: I thought this would fool TokuDB into creating a 4MB block size.
- alter table t engine=tokudb row_format=tokudb_lzma key_block_size=0: an attempt to reset the key block size.
Both attempts resulted in the same bloat in the resulting table.
The reason? When ALTERing a table with another KEY_BLOCK_SIZE, the keys on the table remain with their old KEY_BLOCK_SIZE. They are unaffected by the ALTER. As suggested by Nail Kashapov, the indexes must be rebuilt as well.
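One way to see the leftover per-index setting is to look at the table definition after such an ALTER. This is only a sketch of the check: to my understanding, MySQL and MariaDB print a KEY_BLOCK_SIZE clause on an individual index line when that index's value differs from the table-level one, but I have not confirmed how TokuDB 7.0.1 reports it.

```sql
-- After an ALTER that changed only the table-level KEY_BLOCK_SIZE,
-- inspect the definition and look at each PRIMARY KEY / KEY line.
-- Assumption: a lingering per-index KEY_BLOCK_SIZE is printed next to the index.
SHOW CREATE TABLE t;
```

If the index lines still carry the old value, the indexes were not rebuilt with the new setting.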
Making it work
The following ALTER modifies the KEY_BLOCK_SIZE and rebuilds all the indexes on the table:
mysql> alter table t
  drop primary key, add primary key(id),
  drop key c1c4, add unique key `c1c4` (c1, c4),
  drop key c4, add key `c4` (c4),
  engine=tokudb row_format=tokudb_lzma key_block_size=0;
Query OK, 3587488 rows affected (2 min 7.97 sec)
Records: 3587488 Duplicates: 0 Warnings: 0
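Writing the drop/add pairs by hand gets tedious with many tables. The query below is a hedged sketch of how such a statement could be generated from information_schema.STATISTICS. The schema name 'mydb' is a placeholder, and the sketch assumes plain indexes only: it ignores prefix lengths (SUB_PART) as well as FULLTEXT and SPATIAL indexes, so review the generated statement before running it.

```sql
-- Sketch: build a "drop and re-add every index" ALTER for one table.
-- Assumptions: 'mydb' is a placeholder schema name; index prefix lengths
-- and FULLTEXT/SPATIAL indexes are not handled.
SET SESSION group_concat_max_len = 65536;  -- avoid truncating long statements

SELECT CONCAT(
         'ALTER TABLE `', i.TABLE_SCHEMA, '`.`', i.TABLE_NAME, '` ',
         GROUP_CONCAT(i.clause ORDER BY i.INDEX_NAME SEPARATOR ', '),
         ', ENGINE=TokuDB ROW_FORMAT=TOKUDB_LZMA KEY_BLOCK_SIZE=0'
       ) AS rebuild_statement
FROM (
  -- One row per index, holding its DROP ... ADD ... clause.
  SELECT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,
         CASE
           WHEN INDEX_NAME = 'PRIMARY' THEN CONCAT(
             'DROP PRIMARY KEY, ADD PRIMARY KEY (',
             GROUP_CONCAT(CONCAT('`', COLUMN_NAME, '`')
                          ORDER BY SEQ_IN_INDEX SEPARATOR ', '),
             ')')
           ELSE CONCAT(
             'DROP KEY `', INDEX_NAME, '`, ADD ',
             IF(MAX(NON_UNIQUE) = 0, 'UNIQUE ', ''),
             'KEY `', INDEX_NAME, '` (',
             GROUP_CONCAT(CONCAT('`', COLUMN_NAME, '`')
                          ORDER BY SEQ_IN_INDEX SEPARATOR ', '),
             ')')
         END AS clause
  FROM information_schema.STATISTICS
  WHERE TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 't'
  GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) AS i
GROUP BY i.TABLE_SCHEMA, i.TABLE_NAME;
```

The inner query produces one clause per index and the outer query stitches them into a single ALTER, so the table is rebuilt once rather than once per index.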
Yep! Runtime seems much more agreeable. Total size on disk? A little over 26MB. Did I say I wasn't expecting a good reduction in terms of compression?
Have done the same for multiple tables; compression is consistently strong (e.g. 16MB InnoDB compressed -> 3.5MB TokuDB aggressive, 548MB InnoDB non-compressed -> 36MB TokuDB aggressive), on varying table schemata. Very impressive reduction in disk space!
Next version of TokuDB is expected to ignore the KEY_BLOCK_SIZE table option; until then converting compressed tables to TokuDB is a pain in terms of the syntax -- but worthwhile in terms of disk space.