{"id":2882,"date":"2010-10-04T10:07:45","date_gmt":"2010-10-04T08:07:45","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=2882"},"modified":"2010-10-04T10:07:45","modified_gmt":"2010-10-04T08:07:45","slug":"how-often-should-you-use-optimize-table-followup","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/how-often-should-you-use-optimize-table-followup","title":{"rendered":"How often should you use OPTIMIZE TABLE? &#8211; followup"},"content":{"rendered":"<p>This post follows up on Baron&#8217;s <a href=\"http:\/\/www.xaprb.com\/blog\/2010\/02\/07\/how-often-should-you-use-optimize-table\/\">How often should you use OPTIMIZE TABLE?<\/a>. I had the opportunity of doing some massive purging of data from large tables, and was interested to see the impact of the <strong>OPTIMIZE<\/strong> operation on table&#8217;s indexes. I worked on some production data I was authorized to provide as example.<\/p>\n<h4>The use case<\/h4>\n<p>I&#8217;ll present a single use case here. The table at hand is a compressed InnoDB table used for logs. I&#8217;ve rewritten some column names for privacy:<\/p>\n<blockquote>\n<pre>mysql&gt; show create table logs \\G\r\n\r\nCreate Table: CREATE TABLE `logs` (\r\n `id` int(11) NOT NULL AUTO_INCREMENT,\r\n `name` varchar(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,\r\n `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,\r\n `origin` varchar(64) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,\r\n `message` text NOT NULL,\r\n `level` tinyint(11) NOT NULL DEFAULT '0',\r\n `s` char(16) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '',\r\n PRIMARY KEY (`id`),\r\n KEY `s` (`s`),\r\n KEY `name` (`name`,`ts`),\r\n KEY `origin` (`origin`,`ts`)\r\n) ENGINE=InnoDB AUTO_INCREMENT=186878729 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8<\/pre>\n<\/blockquote>\n<p>The table had log records starting <strong>2010-08-23<\/strong> and up till <strong>2010-09-02<\/strong> noon. Table status:<!--more--><\/p>\n<blockquote>\n<pre>mysql&gt; show table status like 'logs'\\G\r\n*************************** 1. row ***************************\r\n           Name: logs\r\n         Engine: InnoDB\r\n        Version: 10\r\n     Row_format: Compressed\r\n           Rows: 22433048\r\n Avg_row_length: 206\r\n    Data_length: 4625285120\r\nMax_data_length: 0\r\n   Index_length: 1437073408\r\n      Data_free: 4194304\r\n Auto_increment: 186878920\r\n    Create_time: 2010-08-24 18:10:49\r\n    Update_time: NULL\r\n     Check_time: NULL\r\n      Collation: utf8_general_ci\r\n       Checksum: NULL\r\n Create_options: row_format=COMPRESSED KEY_BLOCK_SIZE=8\r\n        Comment:<\/pre>\n<\/blockquote>\n<p>(A bit puzzled on the <strong>Create_time<\/strong>; the table was taken from an LVM snapshot of another server, so it existed for a very long time before. Not sure why the <strong>Create_time<\/strong> field is as it is here; I assume the MySQL upgrade marked it so, did not have the time nor need to look into it).<\/p>\n<p>I was using <a href=\"http:\/\/www.percona.com\/downloads\/Percona-Server-5.1\/\">Percona-Server-5.1.47-11.2<\/a>, and so was able to look at the index statistics for that table:<\/p>\n<blockquote>\n<pre>mysql&gt; SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';\r\n+--------------+------------+--------------+--------+----------------+------------+------------+\r\n| table_schema | table_name | index_name\u00a0\u00a0 | fields | row_per_keys\u00a0\u00a0 | index_size | leaf_pages |\r\n+--------------+------------+--------------+--------+----------------+------------+------------+\r\n| newsminer\u00a0\u00a0\u00a0 | logs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | PRIMARY\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0 1 | 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0 282305 |\u00a0\u00a0\u00a0\u00a0 246856 |\r\n| newsminer\u00a0\u00a0\u00a0 | logs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | s            |\u00a0\u00a0\u00a0\u00a0\u00a0 2 | 17, 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0 38944 |\u00a0\u00a0\u00a0\u00a0\u00a0 33923 |\r\n| newsminer\u00a0\u00a0\u00a0 | logs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | name\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0 3 | 2492739, 10, 2 |\u00a0\u00a0\u00a0\u00a0\u00a0 22432 |\u00a0\u00a0\u00a0\u00a0\u00a0 19551 |\r\n| newsminer\u00a0\u00a0\u00a0 | logs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | origin\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0 3 | 1303, 4, 1\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0 26336 |\u00a0\u00a0\u00a0\u00a0\u00a0 22931 |\r\n+--------------+------------+--------------+--------+----------------+------------+------------+<\/pre>\n<\/blockquote>\n<h4>Status after massive purge<\/h4>\n<p>My first requirement was to purge out all record up to <strong>2010-09-01 00:00:00<\/strong>. I did so in small chunks, using <a href=\"http:\/\/code.openark.org\/forge\/openark-kit\">openark kit<\/a>&#8216;s oak-chunk-update (same can be achieved with <a href=\"http:\/\/www.maatkit.org\/\">maatkit<\/a>&#8216;s mk-archiver). The process purged <strong>1000<\/strong> rows at a time, with some sleep in between, and ran for about a couple of hours. It may be interesting to note that since ts is in <a href=\"http:\/\/code.openark.org\/blog\/mysql\/monotonic-functions-sql-and-mysql\">monotonically ascending<\/a> values, purging of old rows also means purging of lower PKs, which means we&#8217;re trimming the PK tree from left.<\/p>\n<p>Even while purging took place, I could see the index_size\/leaf_pages values dropping, until, finally:<\/p>\n<blockquote>\n<pre>mysql&gt; SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';\r\n+--------------+------------+--------------+--------+--------------+------------+------------+\r\n| table_schema | table_name | index_name\u00a0\u00a0 | fields | row_per_keys | index_size | leaf_pages |\r\n+--------------+------------+--------------+--------+--------------+------------+------------+\r\n| newsminer\u00a0\u00a0\u00a0 | logs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | PRIMARY\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0 1 | 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0 40961 |\u00a0\u00a0\u00a0\u00a0\u00a0 35262 |\r\n| newsminer\u00a0\u00a0\u00a0 | logs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | s            |\u00a0\u00a0\u00a0\u00a0\u00a0 2 | 26, 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0 34440 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 3798 |\r\n| newsminer\u00a0\u00a0\u00a0 | logs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | name\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0 3 | 341011, 4, 1 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 4738 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2774 |\r\n| newsminer\u00a0\u00a0\u00a0 | logs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | origin\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0 3 | 341011, 4, 2 |\u00a0\u00a0\u00a0\u00a0\u00a0 10178 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 3281 |\r\n+--------------+------------+--------------+--------+--------------+------------+------------+<\/pre>\n<\/blockquote>\n<p>The number of deleted rows was roughly <strong>85%<\/strong> of total rows, so down to <strong>15%<\/strong> number of rows.<\/p>\n<h4>Status after OPTIMIZE TABLE<\/h4>\n<p>Time to see whether <strong>OPTIMIZE<\/strong> really optimizes! Will it reduce number of leaf pages in PK? In secondary keys?<\/p>\n<blockquote>\n<pre>mysql&gt; OPTIMIZE TABLE logs;\r\n...\r\nmysql&gt; SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';\r\n+--------------+------------+--------------+--------+--------------+------------+------------+\r\n| table_schema | table_name | index_name\u00a0\u00a0 | fields | row_per_keys | index_size | leaf_pages |\r\n+--------------+------------+--------------+--------+--------------+------------+------------+\r\n| newsminer\u00a0\u00a0\u00a0 | logs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | PRIMARY\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0 1 | 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0 40436 |\u00a0\u00a0\u00a0\u00a0\u00a0 35323 |\r\n| newsminer\u00a0\u00a0\u00a0 | logs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | s            |\u00a0\u00a0\u00a0\u00a0\u00a0 2 | 16, 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 5489 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 4784 |\r\n| newsminer\u00a0\u00a0\u00a0 | logs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | name\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0 3 | 335813, 7, 1 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 3178 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2749 |\r\n| newsminer\u00a0\u00a0\u00a0 | logs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | origin\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0 3 | 335813, 5, 2 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 3951 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 3446 |\r\n+--------------+------------+--------------+--------+--------------+------------+------------+\r\n4 rows in set (0.00 sec)<\/pre>\n<\/blockquote>\n<p>The above shows no significant change in either of the indexes: not for <strong>index_size<\/strong>, not for <strong>leaf_pages<\/strong>, not for statistics (<strong>row_per_keys<\/strong>). The <strong>OPTIMIZE<\/strong> did not reduce index size. It did not reduce the number of index pages (<strong>leaf_pages<\/strong> are the major factor here). Some <strong>leaff_pages<\/strong> values have even increased, but in small enough margin to consider as equal.<\/p>\n<p>Index-wise, the above example does not show an advantage to using <strong>OPTIMIZE<\/strong>. I confess, I was surprised. And for the better. This indicates InnoDB makes good merging of index pages after massive purging.<\/p>\n<h4>So, no use for OPTIMIZE?<\/h4>\n<p>Think again: file system-wise, things look different.<\/p>\n<p>Before purging of data:<\/p>\n<blockquote>\n<pre>bash:~# ls -l logs.* -h\r\n-rw-r----- 1 mysql mysql 8.6K 2010-08-15 17:40 logs.frm\r\n-rw-r----- 1 mysql mysql 2.9G 2010-09-02 14:01 logs.ibd<\/pre>\n<\/blockquote>\n<p>After purging of data:<\/p>\n<blockquote>\n<pre>bash:~# ls -l logs.* -h\r\n-rw-r----- 1 mysql mysql 8.6K 2010-08-15 17:40 logs.frm\r\n-rw-r----- 1 mysql mysql 2.9G 2010-09-02 14:21 logs.ibd<\/pre>\n<\/blockquote>\n<p>Recall that InnoDB never releases table space back to file system!<\/p>\n<p>After <strong>OPTIMIZE<\/strong> on table:<\/p>\n<blockquote>\n<pre>bash:~# ls -l logs.* -h\r\n-rw-rw---- 1 mysql mysql 8.6K 2010-09-02 14:26 logs.frm\r\n-rw-rw---- 1 mysql mysql 428M 2010-09-02 14:43 logs.ibd<\/pre>\n<\/blockquote>\n<p>On <strong>innodb_file_per_table<\/strong> an <strong>OPTIMIZE<\/strong> creates a new table space, and the old one gets destroyed. Space goes back to file system. Don&#8217;t know about you; I like to have my file system with as much free space as possible.<\/p>\n<h4>Need to verify<\/h4>\n<p>I&#8217;ve tested Percona Server, since this is where I can find <strong>INNODB_INDEX_STATS<\/strong>. But this begs the following questions:<\/p>\n<ul>\n<li>Perhaps the results only apply for Percona Server? (I&#8217;m guessing not).<\/li>\n<li>Or only for InnoDB plugin? Does the same hold for &#8220;builtin&#8221; InnoDB? (dunno)<\/li>\n<li>Only on &gt;= 5.1? (Maybe; 5.0 is becoming rare now anyway)<\/li>\n<li>Only on InnoDB (Well, of course this test is storage engine dependent!)<\/li>\n<\/ul>\n<h4>Conclusion<\/h4>\n<p>The use case above is a particular example. Other use cases may include tables where deletions often occur in middle of table (remember we were trimming the tree from left side only). Other yet may need to handle <strong>UPDATE<\/strong>s to indexed columns. I have some more operations to do here, with larger tables (e.g. <strong>40GB<\/strong> compressed). If anything changes, I&#8217;ll drop a note.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post follows up on Baron&#8217;s How often should you use OPTIMIZE TABLE?. I had the opportunity of doing some massive purging of data from large tables, and was interested to see the impact of the OPTIMIZE operation on table&#8217;s indexes. I worked on some production data I was authorized to provide as example. The [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[26,14,52],"class_list":["post-2882","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-indexing","tag-innodb","tag-performance"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-Ku","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/2882","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=2882"}],"version-history":[{"count":21,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/2882\/revisions"}],"predecessor-version":[{"id":3004,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/2882\/revisions\/3004"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=2882"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=2882"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=2882"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}