The depth of an index: primer

InnoDB and MyISAM use B+ and B trees for indexes (InnoDB also has internal hash index).

In both these structures, the depth of the index is an important factor. When looking for an indexed row, a search is made on the index, from root to leaves.

Assuming the index is not in memory, the depth of the index represents the minimal cost (in I/O operation) for an index based lookup. Of course, most of the time we expect large portions of the indexes to be cached in memory. Even so, the depth of the index is an important factor. The deeper the index is, the worse it performs: there are simply more lookups on index nodes.

What affects the depth of an index?

There are quite a few structural issues, but it boils down to two important factors:

  1. The number of rows in the table: obviously, more rows leads to larger index, larger indexes grow in depth.
  2. The size of the indexed column(s). An index on an INT column can be expected to be shallower than an index on a CHAR(32) column (on a very small number of rows they may have the same depth, so we’ll assume a large number of rows).

Continue reading » “The depth of an index: primer”

MySQL’s character sets and collations demystified

MySQL’s character sets and collations are often considered as a mystery, and many users either completely disregard them and keep with the defaults, or set everything to UTF8.

This post will attempt to shed some light on the mystery, and provide with some best practices for use with text columns with regard to character sets. Continue reading » “MySQL’s character sets and collations demystified”

Common wrong Data Types compilation

During my work with companies using MySQL, I have encountered many issues with regard to schema design, normalization and indexing. Of the most common errors are incorrect data types definition. Many times the database is designed by programmers or otherwise non-expert DBAs. Some companies do not have the time and cannot spare the effort of redesigning and refactoring their databases, and eventually face poor performance issues.

Here’s a compilation of “the right and the wrong” data types. Continue reading » “Common wrong Data Types compilation”