Reasons to use InnoDB Plugin

I wish to present some compelling reasons to use the InnoDB plugin. The plugin is a drop-in replacement for “normal” InnoDB tables; enabling many new features. It is the outcome of a long termed silence from InnoBase (Oracle), which were thought to be neglecting the InnoDB engine.

I’m going to leave out “performance” for the reason that grander forces have benchmarked and written about it.

Compression

Using the new Barracuda table format, table data can be compressed. Compression depends on the type of data you have in your table, and in KEY_BLOCK_SIZE. I have found tables with lots of textual data to compress well, to about 25% volume (that is, reduction of 75%), and strictly integer-typed tables (like an a-2-b connecting table) to compress poorly.

I have seen an InnoDB 50GB database shrink into some 12GB only. Wow! That meant a server which only had RAID 1 two 72GB disks, and which was dangerously filled up with disk space, could now accommodate the database, a backup, and then some!

Continue reading » “Reasons to use InnoDB Plugin”

Reasons to use innodb_file_per_table

When working with InnoDB, you have two ways for managing the tablespace storage:

  1. Throw everything in one big file (optionally split).
  2. Have one file per table.

I will discuss the advantages and disadvantages of the two options, and will strive to convince that innodb_file_per_table is preferable.

A single tablespace

Having everything in one big file means all tables and indexes, from all schemes, are ‘mixed’ together in that file.

This allows for the following nice property: free space can be shared between different tables and different schemes. Thus, if I purge many rows from my log table, the now unused space can be occupied by new rows of any other table.

This same nice property also translates to a not so nice one: data can be greatly fragmented across the tablespace.

An annoying property of InnoDB’s tablespaces is that they never shrink. So after purging those rows from the log table, the tablespace file (usually ibdata1) still keeps the same storage. It does not release storage to the file system.

I’ve seen more than once how certain tables are left unwatched, growing until disk space reaches 90% and SMS notifications start beeping all around. Continue reading » “Reasons to use innodb_file_per_table”

The depth of an index: primer

InnoDB and MyISAM use B+ and B trees for indexes (InnoDB also has internal hash index).

In both these structures, the depth of the index is an important factor. When looking for an indexed row, a search is made on the index, from root to leaves.

Assuming the index is not in memory, the depth of the index represents the minimal cost (in I/O operation) for an index based lookup. Of course, most of the time we expect large portions of the indexes to be cached in memory. Even so, the depth of the index is an important factor. The deeper the index is, the worse it performs: there are simply more lookups on index nodes.

What affects the depth of an index?

There are quite a few structural issues, but it boils down to two important factors:

  1. The number of rows in the table: obviously, more rows leads to larger index, larger indexes grow in depth.
  2. The size of the indexed column(s). An index on an INT column can be expected to be shallower than an index on a CHAR(32) column (on a very small number of rows they may have the same depth, so we’ll assume a large number of rows).

Continue reading » “The depth of an index: primer”

MySQL User Group Meetings in Israel

This is a short note that the MySQL User Group Meetings in Israel are established (well, re-established after a very long period).

Thanks to Eddy Resnick from Sun Microsystems Israel who has set up the meetings. So far, we’ve had 2 successful meetings, and we intend to have more! First one was in Sun’s offices in Herzlia; second one, held last week, was at Interbit (a MySQL training center) in Ramat Gan. We hope to hold these meetings on a monthly basis, and the next ones are expected to be held at Interbit.

A new (blessed) law in Israel forbids us from sending invitations for these meetings via email without prior consent of the recepient (this law has passed as means of stopping spam). We do realize there are many users out there who would be interested in these meeting. For those users: please stay tuned to Interbit’s website, where future meetings will be published – or just give them a call!

It was my honor to present a short session, one of three in this last meeting. Other presenters were Erad Deutch, who presented “MySQL Success Stories”, and Moshe Kaplan, who presented “Sharding Solutions”. I have presented “MyISAM & InnoDB Tuning Fundamentals”, where I have layed down the basics behind parameter tuning for these storage engines.

As per audience request, here’s the presentation in PDF format:

I intend to give sessions in future meetings, and have already started working on my next one. So please come, it’s a fun way to pass a nice afternoon. See you there!

Two storage engines; different plans, Part II

In Part I of this article, we have seen how the internal structure of the storage engine’s index can affect an execution plan. We’ve seen that some plans are inherent to the way engines are implemented.

We wish to present a second scenario in which execution plans vary for different storage engines. Again, we will consider MyISAM and InnoDB. Again, we will use the world database for testing. This time, we will see how confident the storage engines are in their index search capabilities.

Many newcomers to databases often believe that an index search is always preferable to full table scan. This is not the case. If I were to look for 10 rows in a 1,000,000 rows table, using an indexed column – I could benefit from an index search. However, if I’m looking for 200,000 rows on that table (that’s 20% of the rows) – an index search can actually be much more expensive than a full table scan. Continue reading » “Two storage engines; different plans, Part II”

Two storage engines; different plans, Part I

A popping question is: “Can an execution plan change for different storage engines?”

The answer is “Yes”. I will present two such cases, where the MySQL optimizer will choose different execution plans, based on our choice of storage engine.

We will consider MyISAM and InnoDB, the two most popular engines. The two differ in many respects, and in particular, the way they implement indexes and statistics: two major players in the optimizer’s point of view. Continue reading » “Two storage engines; different plans, Part I”