Table refactoring & application version upgrades, Part II

Continuing Table refactoring & application version upgrades, Part I, we now discuss code & database upgrades which require DROP operations. As before, we break apart the upgrade process into sequential steps, each involving either the application or the database, but not both.

As I’ll show, DROP operations are significantly simpler than creation operations. Interestingly, it’s the same as in life.

DROP COLUMN

A column turns to be redundant, unused. Before it is dropped from the database, we must ensure no one is using it anymore. The steps are:

  1. App: V1 -> V2. Remove all references to column; make sure no queries use said column.
  2. DB: V1 -> V2 (possibly failover from M1 to M2), change is DROP COLUMN.

DROP INDEX

A possibly simpler case here. Why would you drop an index? Is it because you found out you never use it anymore? Then all you have to do is just drop it.

Or perhaps you don’t need the functionality the index supports anymore? Then first drop the functionality:

  1. (optional) App: V1 -> V2. Discard using functionality which relies on index.
  2. DB: V1 -> V2 (possibly failover from M1 to M2), change is DROP INDEX. Check out InnoDB Plugin here. Continue reading » “Table refactoring & application version upgrades, Part II”

Table refactoring & application version upgrades, Part I

A developer’s major concern is: How do I do application & database upgrades with minimal downtime? How do I synchronize between a DB’s version upgrade and an application’s version upgrade?

I will break down the discussion into types of database refactoring operations, and I will limit to single table refactoring. The discussion will try to understand the need for refactoring and will dictate the steps towards a successful upgrade.

Reader prerequisites

I will assume MySQL to be the underlying database. To take a major component out of the equation: we may need to deal with very large tables, for which an ALTER command may take long hours. I will assume familiarity with Master-Master (Active-Passive) replication, with possible use of MMM for MySQL. When I describe “Failover from M1 to M2“, I mean “Make the ALTER changes on M2 (passive), then switch your application from M1 to M2 (change of IPs, VIP, etc.), promoting M2 to active position, then apply same changes on M1 (now passive) or completely rebuild it”.

Phew, a one sentence description of M-M usage…

I also assume the reader’s understanding that a table’s schema can be different on master & slave, which is the basis for the “use replication for refactoring” trick. But it cannot be too different, or, to be precise, the two schemata must both support the ongoing queries for the table.

A full discussion of the above is beyond the scope of this post.

Types of refactoring needs

As I limit this discussion to single table refactoring,we can look at major refactoring operations and their impact on application & upgrades. We will discuss ADD/DROP COLUMN, ADD/DROP INDEX, ADD/DROP UNIQUE INDEX, ADD/DROP FOREIGN KEY, ADD/DROP TABLE.

We will assume the database and application are both in Version #1 (V1), and need to be upgraded to V2 or greater. Continue reading » “Table refactoring & application version upgrades, Part I”

SQL: forcing single row tables integrity

Single row tables are used in various cases. Such tables can be used for “preferences” or “settings”; for managing counters (e.g. summary tables), for general-purpose administration tasks (e.g. heartbeat table) etc.

The problem with single row tables is that, well, they must have s single row. And the question is: how can you force them to have just one row?

The half-baked solution

The common solution is to create a PRIMARY KEY and always use the same value for that key. In addition, using REPLACE or INSERT INTO ON DUPLICATE KEY UPDATE helps out in updating the row. For example:

CREATE TABLE heartbeat (
 id int NOT NULL PRIMARY KEY,
 ts datetime NOT NULL
 );

The above table definition is taken from mk-heartbeat. It should be noted that mk-heartbeat in itself does not require that the table has a single row, so it is not the target of this post. I’m taking the above table definition as a very simple example.

So, we assume we want this table to have a single row, for whatever reasons we have. We would usually do:

REPLACE INTO heartbeat (id, ts) VALUES (1, NOW());

or

INSERT INTO heartbeat (id, ts) VALUES (1, NOW()) ON DUPLICATE KEY UPDATE ts = NOW();

Why is the above a “half baked solution”? Because it is up to the application to make sure it reuses the same PRIMARY KEY value. There is nothing in the database to prevent the following: Continue reading » “SQL: forcing single row tables integrity”

Reducing locks by narrowing primary key

In a period of two weeks, I had two cases with the exact same symptoms.

Database users were experiencing low responsiveness. DBAs were seeing locks occurring on seemingly normal tables. In particular, looking at Innotop, it seemed that INSERTs were causing the locks.

In both cases, tables were InnoDB. In both cases, there was a PRIMARY KEY on the combination of all 5 columns. And in both cases, there was no clear explanation as for why the PRIMARY KEY was chosen as such.

Choosing a proper PRIMARY KEY

Especially with InnoDB, which uses clustered index structure, the PRIMARY KEY is of particular importance. Besides the fact that a bloated PRIMARY KEY bloats the entire clustered index and secondary keys (see: The depth of an index: primer), it is also a source for locks. It’s true that any UNIQUE KEY can serve as a PRIMARY KEY. But not all such keys are good candidates. Continue reading » “Reducing locks by narrowing primary key”

Monotonic functions, SQL and MySQL

In mathematics, a monotonic function (or monotone function) is a function which preserves the given order. [Wikipedia]

To be more precise, a function f is monotonic increasing, if for every x ≤ y it holds that f(x) ≤ f(y). f is said to be strictly monotonic increasing is for every x < y it holds that f(x) < f(y).

So, if we follow values in some order, we say that f is monotonic increasing if f‘s value never decreases (it either increases or stays the same), and we say that f is strictly increasing if f‘s value is always changes “upwards”.

Monotonic functions play an important role in SQL. To discuss monotonic functions in SQL we must first determine what the order is, and then, what the function is.

Well, they both change according to our point of view. Let’s look at some examples. Take a look at the following table: Continue reading » “Monotonic functions, SQL and MySQL”

Beware of implicit casting

Ever so often a query provides a “bad” execution plan. Adding a missing index can many times solve the problem. However, not everything can be solved with an index. I wish to highlight the point of having an implicit cast, which negates the use of an index on MySQL.

I see this happening a lot on customers’ databases, and this begs for a short introduction.

MySQL doesn’t support index functions

Let’s assume the following table: Continue reading » “Beware of implicit casting”

Useful temporal functions & queries

Here’s a complication of some common and useful time & date calculations and equations. Some, though very simple, are often misunderstood, leading to inefficient or incorrect implementations.

There are many ways to solve such problems. I’ll present my favorites.

Querying for time difference

Given two timestamps: ts1 (older) and ts2 (newer), how much time has passed between them?

One can use TIMEDIFF() & DATEDIFF(), or compare two UNIX_TIMESTAMP() values. My personal favorite is to use TIMESTAMPDIFF(). Reason being that I’m usually interested in a specific metric, like the number of hours which have passed, or the number of days, disregarding the smaller minute/second resolution. Which allows one to:

SELECT TIMESTAMPDIFF(HOUR, ts1, ts2)

Take, for example:

SELECT TIMESTAMPDIFF(MONTH, '2008-10-07 00:00:00', '2009-12-06 00:00:00')

The function correctly identifies the number of days per month, and provides with 13, being the truncated number of full months.

Doing arithmetics

One can use TIMESTAMPADD(), or DATE_SUB(), but, again, when dealing with specific resolutions, I find “+ INTERVAL” to be the most convenient:

SELECT ts1 + INTERVAL 10 HOUR

Continue reading » “Useful temporal functions & queries”

SQL: finding a user’s country/region based on IP

I’ve encountered the same problem twice for different customers, so I guess it’s worth a discussion.

A common task for web applications is to find out the country/region of a user, based on her IP address, as can be detected in the HTTP request. Depending on the country of origin, the website can translate dates for different time zones, can change locale settings, and, perhaps most commonly, show advertisements in her native language.

To start with, there’s a table which lists the IP ranges per country/region. Let’s assume we’re only dealing with IPv4:

CREATE TABLE regions_ip_range (
  regions_ip_range_id INT UNSIGNED AUTO_INCREMENT,
  country VARCHAR(64) CHARSET utf8,
  region VARCHAR(64) CHARSET utf8,
  start_ip INT UNSIGNED,
  end_ip INT UNSIGNED,
  …
  PRIMARY KEY(regions_ip_range_id),
  ...
);

The table is fixed, and is populated. Now the question arises: how do we query this table, and which indexes should be created?

The wrong way

The form I’ve encountered is as follows: an index is declared on regions_ip_range:

KEY ip_range_idx (start_ip, end_ip)

And the query goes like this:

SELECT * FROM regions_ip_range
WHERE my_ip BETWEEN start_ip AND end_ip

Continue reading » “SQL: finding a user’s country/region based on IP”

The depth of an index: primer

InnoDB and MyISAM use B+ and B trees for indexes (InnoDB also has internal hash index).

In both these structures, the depth of the index is an important factor. When looking for an indexed row, a search is made on the index, from root to leaves.

Assuming the index is not in memory, the depth of the index represents the minimal cost (in I/O operation) for an index based lookup. Of course, most of the time we expect large portions of the indexes to be cached in memory. Even so, the depth of the index is an important factor. The deeper the index is, the worse it performs: there are simply more lookups on index nodes.

What affects the depth of an index?

There are quite a few structural issues, but it boils down to two important factors:

  1. The number of rows in the table: obviously, more rows leads to larger index, larger indexes grow in depth.
  2. The size of the indexed column(s). An index on an INT column can be expected to be shallower than an index on a CHAR(32) column (on a very small number of rows they may have the same depth, so we’ll assume a large number of rows).

Continue reading » “The depth of an index: primer”