Last week I had the pleasant opportunity of introducing and discussing the operation of online schema migrations to MySQL's engineering managers, as part of their annual meeting, in London.
Together with Simon J. Mudd of Booking.com, we discussed our perception of what it takes to run online schema migrations on a live, busy system.
While the Oracle/MySQL engineers develop new features or optimize behavior in MySQL, we of the industry have the operational expertise and understanding of the flow of working with MySQL. In all topics, and in schema migration in particular, there is a gap between what's perceived to be the use case and what the use case actually is. It is the community's task to provide feedback to Oracle so as to align development with operational needs where possible.
Our meeting included the following:
Need for schema migrations
We presented the case for online schema migrations, based on our experience in current and past companies and on the experience of our friends in the community. At GitHub, at Booking.com and in many other companies I'm familiar with, we continuously deploy to production, and this implies continuous schema migrations to our production databases. We have migrations running daily; sometimes multiple per day, sometimes none.
With continuous deployment, we as Guardians of the Database do not wish to be blockers for the development cycle. On the contrary, we want to be out of the way as soon as possible, other than verifying a requested migration is safe. We wish to be able to deliver a migration at any given time.
Not all companies behave this way; some run a weekly aggregation of migrations. Others still use the Thou Shalt Not Pass DBA model. We tried to depict the various approaches, with strong emphasis on our own approach, which is the most demanding of schema migration solutions.
The MySQL ALTER
We proceeded to discuss the built-in ALTER statement and InnoDB online DDL, and pointed out the limitations those impose on "online" operations, to the effect of rendering these solutions unused by many. The serialization in the replication stream means losing serving capacity and getting lagging replicas. The lack of an escape path means committing to hours' worth of uninterruptible operation. The lack of resource control implies degraded performance throughout the operation.
We briefly touched on TokuDB's ALTER and how it works.
We discussed migrating via replication: running migrations on one or more replicas at a time, finally failing over onto a promoted replica once all replicas are updated.
We know this solution to be in use in companies such as Dropbox, Etsy and others. We illustrated our own reasoning for not using this solution:
- Increased clock-time for running a migration: running on one replica at a time, or on a few replicas at a time, can double, triple, quadruple and so forth the overall migration time.
- Concurrent migration complexity: since runtime increases, so does the likelihood of needing to run an additional migration at the same time, which highly complicates the flow in a one-at-a-time or few-at-a-time model.
- Serving capacity: in this model some servers, up to half of them, are non-operational. Serving capacity is reduced and we need more hardware to compensate.
- Failover: the failover is not smooth; it includes either some outage or some block time, and in any case it is noticeable in production. Having a planned failover once in a while is OK, but having a failover multiple times a day is too much of a hassle in our current setup.
- Topology complexity: our topologies always have some special cases, such as cross-DC replication with reduced cross-DC network traffic via intermediate masters, testing replicas with newer versions, developer-dedicated servers and others, that make shuffling replicas around difficult to automate.
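To put rough numbers on the clock-time and serving-capacity points above, here is a minimal back-of-the-envelope sketch. The function names and the example figures are hypothetical, and the model is deliberately naive: it assumes every replica takes the same time to ALTER, and it ignores replication catch-up and the final failover.

```python
import math


def rolling_migration_hours(replicas: int, batch_size: int, hours_per_alter: float) -> float:
    """Wall-clock hours to ALTER all replicas when only `batch_size` run at once.

    Naive model: batches run back to back, each taking `hours_per_alter`.
    """
    if replicas < 1 or batch_size < 1:
        raise ValueError("replicas and batch_size must be positive")
    batches = math.ceil(replicas / batch_size)
    return batches * hours_per_alter


def serving_fraction(replicas: int, batch_size: int) -> float:
    """Fraction of replicas still serving traffic while one batch is being altered."""
    if replicas < 1 or batch_size < 1:
        raise ValueError("replicas and batch_size must be positive")
    return (replicas - min(batch_size, replicas)) / replicas


if __name__ == "__main__":
    # Hypothetical topology: 8 replicas, a 3-hour ALTER.
    for batch in (8, 4, 2, 1):
        print(batch, rolling_migration_hours(8, batch, 3.0), serving_fraction(8, batch))
```

In this model, taking half the replicas out at a time doubles the overall migration time while halving serving capacity; smaller batches keep more capacity but stretch the migration further.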
We did not discuss Galera's Rolling Schema Upgrades, as we personally do not have experience working with it. It solves the failover issue above, but given a "normal" replication tree under the cluster, the same problems as above apply.
We concluded with our personal take, that like everything else, we just like to write stuff directly onto our masters, and let the natural replication flow deal with it and get our entire topology to be consistent.
Existing trigger based migrations
We drilled down into the algorithms behind pt-online-schema-change and Facebook's OSC (the latter being rewritten today, not yet released as open source). We elaborated on the pains we saw in trigger based migrations: being unsuspendable, causing lock spaghetti, impacting write latency on the master to the point of a standstill on busy servers, being untestable.
I presented gh-ost, our own, triggerless take on schema migrations. I discussed the logic behind gh-ost and how it decouples migration load from production load; the low impact the triggerless migration has on the master and on the entire replication chain, leading to low, subsecond replication lags throughout the migration and eliminating locking contention on the master. This was basically the presentation Tom Krouper and I gave at Percona Live Amsterdam.
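The decoupling idea can be illustrated with a toy, in-memory sketch. This is not gh-ost's code: the class, its method names and the `max_lag` threshold are all hypothetical, tables are plain dicts, and a simple queue stands in for a change stream such as the binary log. The point it shows is that, without triggers, production writes never wait on the migration, and the migration can stop doing work entirely whenever replicas lag.

```python
from collections import deque


class ToyMigration:
    """Toy model of a triggerless, throttled table copy (illustrative only)."""

    def __init__(self, source: dict, chunk_size: int = 2, max_lag: float = 1.0):
        self.source = source                    # stands in for the original table
        self.ghost = {}                         # stands in for the new, altered table
        self.changelog = deque()                # stands in for a change stream
        self._to_copy = deque(sorted(source))   # keys still waiting to be copied
        self.chunk_size = chunk_size
        self.max_lag = max_lag                  # hypothetical throttle threshold, seconds

    def production_write(self, key, value) -> None:
        """Application write; value=None means delete. Never touches the ghost table."""
        if value is None:
            self.source.pop(key, None)
        else:
            self.source[key] = value
        self.changelog.append((key, value))

    def step(self, replica_lag: float) -> bool:
        """Do one unit of migration work, or nothing at all if replicas lag."""
        if replica_lag > self.max_lag:
            return False  # throttled: migration load drops to zero
        # Replay recorded changes first; they overwrite whatever was copied.
        while self.changelog:
            key, value = self.changelog.popleft()
            if value is None:
                self.ghost.pop(key, None)
            else:
                self.ghost[key] = value
        # Copy the next chunk of existing rows without clobbering replayed changes.
        for _ in range(min(self.chunk_size, len(self._to_copy))):
            key = self._to_copy.popleft()
            if key in self.source:
                self.ghost.setdefault(key, self.source[key])
        return True

    def done(self) -> bool:
        return not self._to_copy and not self.changelog
```

A usage sketch: create `ToyMigration({1: "a", 2: "b", 3: "c"})`, call `step(lag)` in a loop with a measured lag, and interleave `production_write` calls; once `done()` is true, `ghost` holds the same rows as `source`.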
Want to Have
We followed up with a list of feature requests we would enjoy. These were largely technical issues gh-ost would benefit from, simplifying its behavior or ensuring its correctness in complex cases. We discussed dropping tables at the end of a migration, getting more info in the binary logs, GTID issues and more.
Thank you to Morgan Tocker for officially inviting us to this gathering. There were quite a few familiar faces in the room, and it was a friendly gathering. Thank you to all the engineering managers with whom we met!
The discussion was lively, friendly and receptive. The Oracle engineers laid out the internals of the online DDL, shared some of their thoughts on the potential of the JSON format, and gave advice on the technical issues presented. I'd like to thank them for listening to our take on the subject. There was a discussion on the possible paths Oracle can take to improve online schema operations, and I'd like to thank Oracle for sharing their own thoughts and advice!