How often should you use OPTIMIZE TABLE? – followup

This post follows up on Baron’s How often should you use OPTIMIZE TABLE?. I recently had the opportunity to do some massive purging of data from large tables, and was interested to see the impact of the OPTIMIZE operation on the tables’ indexes. I worked on production data I was authorized to share as an example.

The use case

I’ll present a single use case here. The table at hand is a compressed InnoDB table used for logs. I’ve rewritten some column names for privacy:

mysql> show create table logs \G

Create Table: CREATE TABLE `logs` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
 `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 `origin` varchar(64) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
 `message` text NOT NULL,
 `level` tinyint(11) NOT NULL DEFAULT '0',
 `s` char(16) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `s` (`s`),
 KEY `name` (`name`,`ts`),
 KEY `origin` (`origin`,`ts`)
) ENGINE=InnoDB AUTO_INCREMENT=186878729 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8

The table had log records starting 2010-08-23 and up till 2010-09-02 noon. Table status:

mysql> show table status like 'logs'\G
*************************** 1. row ***************************
           Name: logs
         Engine: InnoDB
        Version: 10
     Row_format: Compressed
           Rows: 22433048
 Avg_row_length: 206
    Data_length: 4625285120
Max_data_length: 0
   Index_length: 1437073408
      Data_free: 4194304
 Auto_increment: 186878920
    Create_time: 2010-08-24 18:10:49
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options: row_format=COMPRESSED KEY_BLOCK_SIZE=8
        Comment:

(I’m a bit puzzled by the Create_time: the table was taken from an LVM snapshot of another server, so it had existed for a long time before that date. I assume the MySQL upgrade set it; I did not have the time nor the need to look into it.)

I was using Percona-Server-5.1.47-11.2, and so was able to look at the index statistics for that table:

mysql> SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';
+--------------+------------+--------------+--------+----------------+------------+------------+
| table_schema | table_name | index_name   | fields | row_per_keys   | index_size | leaf_pages |
+--------------+------------+--------------+--------+----------------+------------+------------+
| newsminer    | logs       | PRIMARY      |      1 | 1              |     282305 |     246856 |
| newsminer    | logs       | s            |      2 | 17, 1          |      38944 |      33923 |
| newsminer    | logs       | name         |      3 | 2492739, 10, 2 |      22432 |      19551 |
| newsminer    | logs       | origin       |      3 | 1303, 4, 1     |      26336 |      22931 |
+--------------+------------+--------------+--------+----------------+------------+------------+

Status after massive purge

My first requirement was to purge all records up to 2010-09-01 00:00:00. I did so in small chunks, using openark kit’s oak-chunk-update (the same can be achieved with maatkit’s mk-archiver). The process purged 1000 rows at a time, with some sleep in between, and ran for about a couple of hours. It may be interesting to note that since ts holds monotonically ascending values, purging old rows also means purging the lower PKs, so we are trimming the PK tree from the left.
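For illustration, here is a rough sketch of the statement a single purge chunk boils down to; the 1000-row chunk size and the cutoff timestamp are the ones described above, while oak-chunk-update itself walks the PK in ranges and handles the iteration and the sleeps for you, so treat this only as an approximation:

mysql> DELETE FROM logs
    ->   WHERE ts < '2010-09-01 00:00:00'
    ->   ORDER BY id
    ->   LIMIT 1000;

One repeats this (sleeping a bit in between) until no more rows are affected.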

Even while the purge was in progress, I could see the index_size/leaf_pages values dropping, until finally:

mysql> SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';
+--------------+------------+--------------+--------+--------------+------------+------------+
| table_schema | table_name | index_name   | fields | row_per_keys | index_size | leaf_pages |
+--------------+------------+--------------+--------+--------------+------------+------------+
| newsminer    | logs       | PRIMARY      |      1 | 1            |      40961 |      35262 |
| newsminer    | logs       | s            |      2 | 26, 1        |      34440 |       3798 |
| newsminer    | logs       | name         |      3 | 341011, 4, 1 |       4738 |       2774 |
| newsminer    | logs       | origin       |      3 | 341011, 4, 2 |      10178 |       3281 |
+--------------+------------+--------------+--------+--------------+------------+------------+

The number of deleted rows was roughly 85% of the total, leaving about 15% of the rows in place.

Status after OPTIMIZE TABLE

Time to see whether OPTIMIZE really optimizes! Will it reduce the number of leaf pages in the PK? In the secondary keys?

mysql> OPTIMIZE TABLE logs;
...
mysql> SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';
+--------------+------------+--------------+--------+--------------+------------+------------+
| table_schema | table_name | index_name   | fields | row_per_keys | index_size | leaf_pages |
+--------------+------------+--------------+--------+--------------+------------+------------+
| newsminer    | logs       | PRIMARY      |      1 | 1            |      40436 |      35323 |
| newsminer    | logs       | s            |      2 | 16, 1        |       5489 |       4784 |
| newsminer    | logs       | name         |      3 | 335813, 7, 1 |       3178 |       2749 |
| newsminer    | logs       | origin       |      3 | 335813, 5, 2 |       3951 |       3446 |
+--------------+------------+--------------+--------+--------------+------------+------------+
4 rows in set (0.00 sec)

The above shows no significant change in any of the indexes: not in index_size, not in leaf_pages, not in statistics (row_per_keys). The OPTIMIZE did not reduce index size, nor did it reduce the number of index pages (leaf_pages is the major factor here). Some leaf_pages values even increased, but by a small enough margin to be considered equal.

Index-wise, the above example shows no advantage to using OPTIMIZE. I confess I was surprised, and pleasantly so: it indicates InnoDB does a good job of merging index pages after a massive purge.

So, no use for OPTIMIZE?

Think again: file system-wise, things look different.

Before purging of data:

bash:~# ls -l logs.* -h
-rw-r----- 1 mysql mysql 8.6K 2010-08-15 17:40 logs.frm
-rw-r----- 1 mysql mysql 2.9G 2010-09-02 14:01 logs.ibd

After purging of data:

bash:~# ls -l logs.* -h
-rw-r----- 1 mysql mysql 8.6K 2010-08-15 17:40 logs.frm
-rw-r----- 1 mysql mysql 2.9G 2010-09-02 14:21 logs.ibd

Recall that InnoDB never releases table space back to the file system!

After OPTIMIZE on table:

bash:~# ls -l logs.* -h
-rw-rw---- 1 mysql mysql 8.6K 2010-09-02 14:26 logs.frm
-rw-rw---- 1 mysql mysql 428M 2010-09-02 14:43 logs.ibd

With innodb_file_per_table, an OPTIMIZE creates a new tablespace and destroys the old one, so the space goes back to the file system. I don’t know about you, but I like to keep my file system with as much free space as possible.
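As a side note, OPTIMIZE on an InnoDB table is, to my understanding, internally mapped to a recreate plus analyze, so an explicit ALTER TABLE logs ENGINE=InnoDB rebuild reclaims the space just as well. Either way, a quick sanity check and rebuild would look like this:

mysql> SHOW VARIABLES LIKE 'innodb_file_per_table';
mysql> OPTIMIZE TABLE logs;

or, equivalently:

mysql> ALTER TABLE logs ENGINE=InnoDB;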

Need to verify

I’ve tested on Percona Server, since that is where I can find INNODB_INDEX_STATS. But this raises the following questions:

  • Perhaps the results only apply to Percona Server? (I’m guessing not.)
  • Or only to the InnoDB plugin? Does the same hold for “builtin” InnoDB? (I don’t know.)
  • Only on >= 5.1? (Maybe; 5.0 is becoming rare now anyway.)
  • Only on InnoDB? (Well, of course this test is storage engine dependent!)

Conclusion

The use case above is one particular example. Other use cases may include tables where deletions occur in the middle of the table (remember we were trimming the tree from the left side only). Others yet may involve UPDATEs to indexed columns. I have some more operations to do here, with larger tables (e.g. 40GB compressed). If anything changes, I’ll drop a note.

7 thoughts on “How often should you use OPTIMIZE TABLE? – followup”

  1. Hi Shlomi,

    A couple of things to note:

    – I would recommend using xtrabackup –stats to make an assessment on fragmentation. See: http://www.mysqlperformanceblog.com/2009/09/14/statistics-of-innodb-tables-and-indexes-available-in-xtrabackup/

    – I don’t think it matters that you are truncating from the end of your primary key versus midway through a primary key.

    (More related to the original blog post:) What fragmentation OPTIMIZE will not help with is when id is monotonic, but (s), (name) and (origin) are not – or, in the worst case, are in the complete opposite order. As OPTIMIZE creates its temporary table, it will not pre-sort these indexes before creating them – so you have a trickle load of data that causes page splits straight away.

  2. What about a ‘ALTER TABLE…ENGINE=InnoDB’ instead of a OPTIMIZE TABLE to reclaim space?

  3. Is it safe to run OPTIMIZE TABLE while the database is in use by an application? Or does it need to be taken down and brought up with the “skip-networking” option so no one can connect to it?

  4. @Jason,

    It is safe, in that the table is completely locked down for the duration of the ALTER operation. You can’t read from it, you can’t even look at it.

    If you can guarantee no one will access this table, it also makes for an online operation. Otherwise, it is often undesirable to do this on a busy server, as many connections get locked, as does the table cache.
