{"id":3442,"date":"2012-05-22T07:33:05","date_gmt":"2012-05-22T05:33:05","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=3442"},"modified":"2012-05-22T07:34:23","modified_gmt":"2012-05-22T05:34:23","slug":"getting-rid-of-huge-ibdata-file-no-dump-required","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/getting-rid-of-huge-ibdata-file-no-dump-required","title":{"rendered":"Getting rid of huge ibdata file, no dump required"},"content":{"rendered":"<p>You <a href=\"http:\/\/code.openark.org\/blog\/mysql\/upgrading-to-barracuda-getting-rid-of-huge-ibdata1-file\">have<\/a> been <a href=\"http:\/\/ronaldbradford.com\/blog\/leveraging-the-innodb-plugin-2011-02-11\/\">told<\/a> (guilty as charged) that the only way to get rid of the huge InnoDB tablespace file (commonly named <strong>ibdata1<\/strong>), when moving to <strong>innodb_file_per_table<\/strong>, is to do a logical dump of your data, completely erase everything, then import the dump.<\/p>\n<p>To quickly reiterate, you can only delete the <strong>ibdata1<\/strong> file when no InnoDB tables exist. Delete this file while any InnoDB table still exists, even a table in its own tablespace, and nothing ever works again.<\/p>\n<h4>The problem with the dump-based solution<\/h4>\n<p>The impact of doing a logical dump is often overwhelming. Well, the dump itself may be tolerable, but the restore takes much longer. The real pain is that you can&#8217;t do this one table at a time: you have to destroy everything before dropping the <strong>ibdata1<\/strong> file; you then have to import everything.<\/p>\n<p>Perhaps the most common scenario is that we do the changes on a slave, so as not to completely shut down our database. This is nice; no one is aware of the shutdown process. 
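<\/p>\n<p>For a sense of the baseline, the dump-based procedure amounts to something like the following sketch (paths and options here are illustrative, not an exact recipe for your setup):<\/p>\n<pre>\nmysqldump --all-databases --routines --triggers &gt; full_dump.sql\n# shut MySQL down; remove ibdata1 and ib_logfile*;\n# enable innodb_file_per_table; start MySQL; then reload:\nmysql &lt; full_dump.sql\n<\/pre>\n<p>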
However, Houston, we have a problem: we need to make sure we can retain the binary logs on the master for the duration of the <em>entire process<\/em>.<!--more--><\/p>\n<h4>A semi-solution for binary logs<\/h4>\n<p>You may get by if you keep the <strong>IO_THREAD<\/strong> running on the slave while the dump is taken (the SQL thread is better turned off). If you&#8217;re careful, you could do the same after restarting the database: you should still be able to accumulate relay logs. With row-based replication becoming more common, the problem of binary log disk space returns: the logs (rather, the log entries) are just so much larger!<\/p>\n<p>Either way, the process can take days, at the end of which your slave is up, but lagging days behind.<\/p>\n<h4>Wishful thought: do it one table at a time<\/h4>\n<p>If we could do it one table at a time, and assuming our dataset is fairly evenly split among several tables (i.e. not all of our <strong>500GB<\/strong> of data is in one huge table), life would be easier: we could work on a single table, resume replication, let the slave catch up, then do the same for the next table.<\/p>\n<p>How? Didn&#8217;t we just say one can only drop the <strong>ibdata1<\/strong> file when no InnoDB tables exist?<\/p>\n<h4>Solution: do it one table at a time<\/h4>\n<p>I&#8217;m going to illustrate what seems like a longer procedure. I will later show why it is not, in fact, longer.<\/p>\n<p>The idea is to first convert all your tables to MyISAM (Yay! A use for MyISAM!). That is, convert your tables one table at a time, using a normal <strong>ALTER TABLE t ENGINE=MyISAM<\/strong>.<\/p>\n<p>Please let go of the foreign keys issue for now. 
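<\/p>\n<p>One way to enumerate the tables that need converting, and to generate the statements themselves, is via <strong>information_schema<\/strong> (a sketch; adjust the excluded schemas to your own setup):<\/p>\n<pre>\nSELECT CONCAT('ALTER TABLE ', TABLE_SCHEMA, '.', TABLE_NAME, ' ENGINE=MyISAM;')\n  FROM INFORMATION_SCHEMA.TABLES\n  WHERE ENGINE = 'InnoDB'\n    AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema');\n<\/pre>\n<p>As for the foreign keys: 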
I will address it later; there&#8217;s a lot to be addressed.<\/p>\n<p>So, on a slave:<\/p>\n<ol>\n<li><strong>STOP SLAVE<\/strong><\/li>\n<li>Run one <strong>ALTER TABLE &#8230; ENGINE=MyISAM<\/strong><\/li>\n<li><strong>START SLAVE<\/strong> again<\/li>\n<li>Wait for the slave to catch up<\/li>\n<li>GOTO <strong>1<\/strong><\/li>\n<\/ol>\n<p>What do we end up with? A MyISAM-only database. What do we do with it? Why, convert it back to InnoDB, of course!<\/p>\n<p>But, before that, we:<\/p>\n<ol>\n<li>Shut MySQL down<\/li>\n<li>Delete the <strong>ibdata1<\/strong> file and the <strong>ib_logfile[01]<\/strong> files (i.e. delete all InnoDB files)<\/li>\n<li>Start MySQL<\/li>\n<\/ol>\n<p>A new <strong>ibdata1<\/strong> file and new transaction log files will be created. Note: the new ibdata1 file is <em>small<\/em>. Mission almost accomplished.<\/p>\n<p>We then:<\/p>\n<ol>\n<li><strong>STOP SLAVE<\/strong><\/li>\n<li>Run one <strong>ALTER TABLE &#8230; ENGINE=InnoDB [ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8 &#8230;]<\/strong><\/li>\n<li><strong>START SLAVE<\/strong> again<\/li>\n<li>Wait for the slave to catch up<\/li>\n<li>GOTO <strong>1<\/strong><\/li>\n<\/ol>\n<p>What do we end up with? An InnoDB-only database, with a true file per table, and a small <strong>ibdata1<\/strong> file. Space recovered!<\/p>\n<h4>The advantage of this method<\/h4>\n<p>The thing is, we resume replication after each table alteration. This means breaking the one long lag period into many smaller ones. While the <em>total<\/em> runtime is not reduced, the maximum lag time is. And this makes for easier recovery: there is no need to store multitudes of binary logs!<\/p>\n<h4>So what about the foreign keys?<\/h4>\n<p>Phew. 
Continued next post.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You have been told (guilty as charged), that the only way to get rid of the huge InnoDB tablespace file (commonly named ibdata1), when moving to innodb_file_per_table, is to do a logical dump of your data, completely erase everything, then import the dump. To quickly reiterate, you can only delete the ibdata1 file when no [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[74,14,13,65],"class_list":["post-3442","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-hack","tag-innodb","tag-myisam","tag-refactoring"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-Tw","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/3442","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=3442"}],"version-history":[{"count":16,"href":"https:\/\/code.openark.org\/blog\/
wp-json\/wp\/v2\/posts\/3442\/revisions"}],"predecessor-version":[{"id":4913,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/3442\/revisions\/4913"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=3442"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=3442"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=3442"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}