Two storage engines; different plans, Part I

A popping question is: “Can an execution plan change for different storage engines?”

The answer is “Yes”. I will present two such cases, where the MySQL optimizer will choose different execution plans, based on our choice of storage engine.

We will consider MyISAM and InnoDB, the two most popular engines. The two differ in many respects, and in particular, the way they implement indexes and statistics: two major players in the optimizer’s point of view.

Let’s start with the famous world database, available from dev.mysql.com. All tables in this schema are defined as MyISAM. We will alter them between MyISAM and InnoDB as we go along.

A peek at the Country table reveals:

mysql> SHOW CREATE TABLE Country \G
*************************** 1. row ***************************
Table: Country
Create Table: CREATE TABLE `Country` (
`Code` char(3) NOT NULL default '',
`Name` char(52) NOT NULL default '',
`Continent` enum('Asia','Europe','North America','Africa','Oceania','Antarctica','South America') NOT NULL default 'Asia',
...
PRIMARY KEY (`Code`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

To see the first example of execution plan difference, we will add an index on the Country table:

ALTER TABLE Country ADD INDEX (Continent);

And run the following query to find European country codes:

mysql> SELECT Code FROM Country WHERE Continent = 'Europe';
+------+
| Code |
+------+
| NLD |
| ALB |
| AND |
| BEL |
| BIH |
| GBR |
...

But how is this query executed?

mysql> EXPLAIN SELECT Code FROM Country WHERE Continent = 'Europe'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: Country
type: ref
possible_keys: Continent
key: Continent
key_len: 1
ref: const
rows: 37
Extra: Using where
1 row in set (0.00 sec)

Simple enough: we asked for European countries only. MySQL has found the index on Continent to be appropriate. However, to get the actual Code, a table row read was necessary.

InnoDB will provide a different plan, though:

mysql> ALTER TABLE Country ENGINE=InnoDB;
Query OK, 239 rows affected (0.18 sec)
Records: 239 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT Code FROM Country WHERE Continent = 'Europe'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: Country
type: ref
possible_keys: Continent
key: Continent
key_len: 1
ref: const
rows: 46
Extra: Using where; Using index
1 row in set (0.00 sec)

Can you spot the difference? The “Extra” column now indicates “Using index” (The numbers of expected rows also differ, but that’s another issue).

The reason for this change lies with the way MyISAM and InnoDB implement indexes. MyISAM takes the approach where the table data resides in its own space (and in fact, its own file), and all indexes refer to rows in that space. MyISAM is using nonclustered indexes.

InnoDB, however, uses a clustered index on the PRIMARY KEY. That is, for every table there is always a PRIMARY KEY index (even if we never defined one), and table data is aggregated withing the index’ structure. And so, to access table rows, one must first traverse the PRIMARY KEY index. This type of index is called a “clustered index”. The Code column is the primary key, and therefore the data is clustered on the Code column.

InnoDB’s secondary indexes behave altogether differently. A secondary index does not refer to the table rows directly, but instead refer to the PRIMARY KEY value, which relates to those rows. A table look up using a secondary key involves a search on that key, only to get a PRIMARY KEY value, and search on that clustered index as well. A side effect is that a secondary index includes the values of the PRIMARY KEY. Each secondary index, like the one we created on Continent, is somewhat a compound index, like on (Continent, Code). This is the reason that for our query, a search on the index was enough. There was no need to access table data, since all relevant data could be found within the index.

I say “somewhat”, because in contrast with an index on (Continent, Code), the index does not necessarily store the PRIMARY KEY values in any particular order. To prove this, let’s ask the following:

mysql> EXPLAIN SELECT Code FROM Country WHERE Continent = 'Europe' ORDER BY Code\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: Country
type: ref
possible_keys: Continent
key: Continent
key_len: 1
ref: const
rows: 46
Extra: Using where; Using index; Using filesort
1 row in set (0.00 sec)

There’s a “Using filesort” comment in the “Extra” column, which would not be there had we used a compound index on (Continent, Code).

2 thoughts on “Two storage engines; different plans, Part I

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.