{"id":6368,"date":"2013-06-05T09:10:12","date_gmt":"2013-06-05T07:10:12","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=6368"},"modified":"2013-06-05T09:10:12","modified_gmt":"2013-06-05T07:10:12","slug":"converting-compressed-innodb-tables-to-tokudb-7-0-1","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/converting-compressed-innodb-tables-to-tokudb-7-0-1","title":{"rendered":"Converting compressed InnoDB tables to TokuDB 7.0.1"},"content":{"rendered":"<p>Or: how to make it work in TokuDB version <strong>7.0.1<\/strong>. This is a follow up on a <a href=\"https:\/\/groups.google.com\/forum\/?fromgroups=#!topic\/tokudb-user\/hLlHwlp2AL0\">discussion on the tokudb-user group<\/a>.<\/p>\n<h4>Background<\/h4>\n<p>I wanted to test TokuDB&#8217;s compression. I took a staging machine of mine, with production data, and migrated it from <strong>Percona Server 5.5<\/strong> To <strong>MariaDB 5.5+TokuDB 7.0.1<\/strong>. Migration went well, no problems.<\/p>\n<p>To my surprise, when I converted tables from InnoDB to TokuDB, I saw an <em>increase<\/em> in table file size on disk. As explained by Tim Callaghan, this was due to TokuDB interpreting my compressed table&#8217;s <strong>&#8220;KEY_BLOCK_SIZE=4&#8221;<\/strong> as an instruction for TokuDB&#8217;s page size. TokuDB should be using <strong>4MB<\/strong> block size, but thinks it&#8217;s being instructed to use <strong>4KB<\/strong>. Problem is, you <a href=\"http:\/\/bugs.mysql.com\/bug.php?id=67727\">can&#8217;t get rid of table options<\/a>. When one converts a table to InnoDB in <strong>ROW_FORMAT=COMPACT<\/strong>, or even to MyISAM, the <strong>KEY_BLOCK_SIZE<\/strong> option keeps lurking in the dark.<\/p>\n<p>So until this is hopefully resolved in TokuDB&#8217;s next version, here&#8217;s a way to go around the problem.<!--more--><\/p>\n<h4>The case at hand<\/h4>\n<p>Consider the following table:<\/p>\n<blockquote>\n<pre> CREATE TABLE `t` (\r\n  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,\r\n  `c1` int(10) unsigned NOT NULL DEFAULT '0',\r\n  `c2` int(10) unsigned NOT NULL DEFAULT '0',\r\n  `c3` int(10) unsigned NOT NULL DEFAULT '0',\r\n  `c4` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',\r\n  `c5` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',\r\n  `c6` smallint(10) unsigned NOT NULL DEFAULT '0',\r\n  `c7` smallint(10) unsigned NOT NULL DEFAULT '0',\r\n  `c8` smallint(10) unsigned NOT NULL DEFAULT '0',\r\n  `c9` smallint(10) unsigned NOT NULL DEFAULT '0',\r\n  `c10` smallint(10) unsigned NOT NULL DEFAULT '0',\r\n  `c11` smallint(10) NOT NULL DEFAULT '0',\r\n  `c12` smallint(10) NOT NULL DEFAULT '0',\r\n  `c13` smallint(10) NOT NULL DEFAULT '0',\r\n  `c14` smallint(10) NOT NULL DEFAULT '0',\r\n  `ct` text NOT NULL,\r\n  PRIMARY KEY (`id`),\r\n  UNIQUE KEY `c1c4` (`c1`,`c4`),\r\n  KEY `c4` (`c4`)\r\n) ENGINE=InnoDB AUTO_INCREMENT=4688271 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4<\/pre>\n<\/blockquote>\n<p>Note that it is in <strong>COMPRESSED<\/strong> format, with <strong>KEY_BLOCK_SIZE=4<\/strong>. It mostly has <strong>INT<\/strong> columns, so I don&#8217;t expect it to compress by much.<\/p>\n<p>On disk, the <strong>.ibd<\/strong> file amounts to <strong>160MB<\/strong>. Table has<strong> <\/strong><strong>3,587,488<\/strong> rows. Same table in InnoDB COMPACT row format amounts to <strong>412MB<\/strong> on disk.<\/p>\n<p>Converting the table to TokuDB with aggressive compression resulted with:<\/p>\n<blockquote>\n<pre>mysql&gt; alter table t engine=tokudb row_format=tokudb_lzma;\r\nQuery OK, 3587488 rows affected (29 min 48.79 sec)\r\nRecords: 3587488\u00a0 Duplicates: 0\u00a0 Warnings:<\/pre>\n<\/blockquote>\n<p>And over <strong>873MB<\/strong> of combined files on disk! Also note it took nearly <strong>30<\/strong> minutes to <strong>ALTER<\/strong>. Clearly this is not the expected outcome.<\/p>\n<h4>Attempt to make it work<\/h4>\n<p>I tried both the following approaches:<\/p>\n<ul>\n<li><strong>alter table t engine=tokudb row_format=tokudb_lzma key_block_size=4096<\/strong>: thought it would fool TokuDB to think it should create a 4M key block size.<\/li>\n<li><strong>alter table t engine=tokudb row_format=tokudb_lzma key_block_size=0<\/strong>: try and reset the key block size.<\/li>\n<\/ul>\n<p>Both the above attempts resulted with same bloat in resulting table.<\/p>\n<p>The reason? When ALTERing a table with a nother <strong>KEY_BLOCK_SIZE<\/strong>, the keys on the table remain with their old <strong>KEY_BLOCK_SIZE<\/strong>. They are unaffected by the <strong>ALTER<\/strong>. As suggested by <em>Nail Kashapov<\/em>, indexes must be rebuilt as well.<\/p>\n<h4>Making it work<\/h4>\n<p>The next <strong>ALTER<\/strong> modifies the <strong>KEY_BLOCK_SIZE<\/strong> <em>and<\/em> rebuilds all the indexes on the table:<\/p>\n<blockquote>\n<pre>mysql&gt; alter table t drop primary key, add primary key(id), drop key c1c4, add unique key `c1c4` (c1, c4), drop key c4, add key `c4` (c4), engine=tokudb row_format=tokudb_lzma key_block_size=0;\r\nQuery OK, 3587488 rows affected (2 min 7.97 sec)\r\nRecords: 3587488\u00a0 Duplicates: 0\u00a0 Warnings: 0<\/pre>\n<\/blockquote>\n<p>Yep! Runtime seems much more agreeable. Total size on disk? Little over <strong>26M<\/strong>. Did I say I wasn&#8217;t expecting good reduction in terms of compression?<\/p>\n<p>Have done the same for multiple tables; compression is consistently strong (e.g. <strong>16MB<\/strong> InnoDB compressed -&gt; <strong>3.5MB<\/strong> TokuDB aggressive, <strong>548MB<\/strong> InnoDB non-compressed -&gt; <strong>36MB<\/strong> TokuDB aggressive), on varying table schemata. Very impressive reduction in disk space!<\/p>\n<h4>Conclusion<\/h4>\n<p>Next version of TokuDB is expected to ignore the <strong>KEY_BLOCK_SIZE<\/strong> table option; until then converting compressed tables to TokuDB is a pain in terms of the syntax &#8212; but worthwhile in terms of disk space.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Or: how to make it work in TokuDB version 7.0.1. This is a follow up on a discussion on the tokudb-user group. Background I wanted to test TokuDB&#8217;s compression. I took a staging machine of mine, with production data, and migrated it from Percona Server 5.5 To MariaDB 5.5+TokuDB 7.0.1. Migration went well, no problems. [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[14,102],"class_list":["post-6368","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-innodb","tag-tokudb"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1EI","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/6368","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=6368"}],"version-history":[{"count":9,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/6368\/revisions"}],"predecessor-version":[{"id":6377,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/6368\/revisions\/6377"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=6368"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=6368"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=6368"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}