The problem with MySQL foreign key constraints in Online Schema Changes

This post explains the inherent problem of running online schema changes in MySQL, on tables participating in a foreign key relationship. We’ll lay some ground rules and facts, sketch a simplified schema, and dive into an online schema change operation.

Our discussion applies to pt-online-schema-change, gh-ost, and Vitess based migrations, or any other online schema change tool that works with a shadow/ghost table like the Facebook tools.

Why Online Schema Change?

Online schema change tools come as workarounds to an old problem: schema migrations in MySQL were blocking, uninterruptible, aggressive in resources, replication unfriendly. Running a straight ALTER TABLE in production means locking your table, generating high load on the primary, causing massive replication lag on replicas once the migration moves down the replication stream.

Isn’t there some Online DDL?

Yes. InnoDB supports Online DDL, where for many ALTER types, your table remains unblocked throughout the migration. That’s an important improvement, but unfortunately not enough. Some migration types do not permit concurrent DDL (notably changing column data type, e.g. from INT to BIGINT). Migration is still aggressive and generates high load on your server. Replicas still run the migration sequentially. If your migration takes 5 hours to run concurrently on the primary, expect a 5 hour replication lag on your replica, i.e. complete loss of your fresh read capacity.

Isn’t there some Instant DDL?

Yes. But unfortunately extremely limited. Mostly just for adding a new column. See here or again here. Instant DDLs showed great promise when introduced (contributed to MySQL by Tencent Games DBA Team) three years ago, and the hope was that MySQL would support many more types of ALTER TABLE in INSTANT DDL. At this time this has not happened yet, and we do with what we have.

Not everyone is Google or Facebook scale, right?

True. But you don’t need to to be Google, or Facebook, or GitHub etc. scale to feel the pain of schema changes. Any non trivially sized table takes time to ALTER, which results with lock/downtime. If your tables are limited to hundreds or mere thousands of small rows, you can get away with it. When your table grows, and a mere dozens of MB of data is enough, ALTER becomes non-trivial at best case, and outright a cause of outage in a common scenario, in my experience.

Let’s discuss foreign key constraints

In the relational model tables have relationships. A column in one table indicates a column in another table, so that a row in one table has a relationship one or more rows in another table. That’s the “foreign key”. A foreign key constraint is the enforcement of that relationship. A foreign key constraint is a database construct which watches over rows in different tables and ensures the relationship does not break. For example, it may prevent me from deleting a row that is in a relationship, to prevent the related row(s) from becoming orphaned. Continue reading » “The problem with MySQL foreign key constraints in Online Schema Changes”