MySQL vs. PostgreSQL, gh-ost perspective

August 11, 2016

Last week we released gh-ost, GitHub's online schema migration tool for MySQL. As with other open source releases in the MySQL ecosystem, this release was echoed by several "Why not PostgreSQL?" comments. Having been active in open source since many years now, I'm familiar with these responses, and I find this is a good time to share my thoughts. Why? XKCD knows the answer:

XKCD: Duty Calls

I picked one post I wish to address (latest commit: 3dfbd2cd3f5468f035ec86442d2c670a510118d8). The author invested some time writing it. It nicely summarizes claims I've heard over the years, as well as some prejudice. Through responding to this post I will be generalizing thoughts and impressions to address the common reactions. Dear @brandur, let's grab a beer some day; I fundamentally disagree with your post and with its claims.

EDIT: linked post has been updated following this writing; I'd like to thank the author for his consideration. Also see his followup post. The version I've responded to in this post is this commit. Continue Reading »

Introducing gh-ost: triggerless online schema migrations

August 1, 2016

I'm thoroughly happy to introduce gh-ost: triggerless, controllable, auditable, testable, trusted online schema change tool released today by GitHub.

gh-ost now powers our production schema migrations. We hit some serious limitations using pt-online-schema-change on our large volume, high traffic tables, to the effect of driving our database to a near grinding halt or even to the extent of causing outages. With gh-ost, we are now able to migrate our busiest tables at any time, peak hours and heavy workloads included, without causing impact to our service.

gh-ost supports testing in production. It goes a long way to build trust, both in integrity and in control. Are your databases just too busy and you cannot run existing online-schema-change tools? Have you suffered outages due to migrations? Are you tired of babysitting migrations that run up to 3:00am? Tired of being the only one tailing logs? Please, take a look at gh-ost. I believe it changes online migration paradigm.

For a more thorough overview, please read the announcement on the GitHub Engineering Blog, and proceed to the documentation.

gh-ost is open sourced under the MIT license.

Solving the non-atomic table swap, Take III: making it atomic

July 7, 2016

With the unintended impression of becoming live blogging, we now follow up on Solving the non-atomic table swap, Take II and Solving the Facebook-OSC non-atomic table swap problem with a safe, blocking, atomic solution

Why yet another iteration?

The solution presented in Solving the non-atomic table swap, Take II was good, in that it was safe. No data corruption. Optimistic: if no connection is killed throughout the process, then completely blocking.

Two outstanding issues remained:

  • If something did go wrong, the solution reverted to a table-outage
  • On replicas, the table swap is non atomic, non blocking. There's table-outage scenario on replica.

As it turns out, there's a simpler solution which overcomes both the above. As with math and physics, the simpler solution is often the preferred one. But it took those previous iterations to gather a few ideas together. So, anyway:

Safe, locking, atomic, asynchronous table swap

Do read the aforementioned previous posts; the quick-quick recap is: we want to be able to LOCK a table tbl, then do some stuff, then swap it out and put some ghost table in its place. MySQL does not allow us to rename tbl to tbl_old, ghost to tbl if we have locks on tbl in that session.

The solution we offer is now based on two connections only (as opposed to three, in the optimistic approach). "Our" connections will be C10, C20. The "normal" app connections are C1..C9, C11..C19, C21..C29.

  • Connections C1..C9 operate on tbl with normal DML: INSERT, UPDATE, DELETE
  • Connection C10: CREATE TABLE tbl_old (id int primary key) COMMENT='magic-be-here'
  • Connection C10: LOCK TABLES tbl WRITE, tbl_old WRITE
  • Connections C11..C19, newly incoming, issue queries on tbl but are blocked due to the LOCK
  • Connection C20: RENAME TABLE tbl TO tbl_old, ghost TO tbl
    This is blocked due to the LOCK, but gets prioritized on top connections C11..C19 and on top C1..C9 or any other connection that attempts DML on tbl
  • Connections C21..C29, newly incoming, issue queries on tbl but are blocked due to the LOCK and due to the RENAME, waiting in queue
  • Connection C10: checks that C20's RENAME is applied (looks for the blocked RENAME in processlist)
  • Connection 10: DROP TABLE tbl_old
    Nothing happens yet; tbl is still locked. All other connections still blocked.
  • Connection 10: UNLOCK TABLES
    The RENAME is first to execute, ghost table is swapped in place of tbl, then C1..C9, C11..C19, C21..C29 all get to operate on the new and shiny tbl

Some notes Continue Reading »

Solving the non-atomic table swap, Take II

June 20, 2016

Following up and improving on Solving the Facebook-OSC non-atomic table swap problem, we present a better, safe solution.

Quick, quickest recap:

We are working on a triggerless online schema migration solution. It is based on an asynchronous approach, similarly to the FB osc and as opposed to the synchronous solution as used by pt-online-schema-change.

We asynchronously synchronize (is that even a valid statement?) between some table tbl and a ghost table ghost, and at some time we want to cut-over: swap the two; kick out tbl and put ghost in its place and under its name.

However, we cannot use the single statement rename tbl to tbl_old, ghost to tbl, because we use the asynchronous approach, where at the time we lock tbl for writes, we still have some events we need to process and apply onto ghost before swapping the two.

And MySQL does not allow a lock tables tbl write; ... ; rename tbl to tbl_old, ghost to tbl.

In Solving the Facebook-OSC non-atomic table swap problem we suggested a way that works, unless when it doesn't work. Read the caveat at the end of the post. Premature death of a connection that participates in the algorithm causes a chain reaction that leads to the premature execution of the rename statement, potentially before we've applied those remaining events. This leads to data inconsistency between the old table and the new table, and is unacceptable.

To that effect, we were more inclined to go with the Facebook solution, which makes a two-step: lock tables tbl write; alter table tbl rename to tbl_old; ... ; alter table ghost rename to tbl;

This two-step solution is guaranteed not to have data inconsistency. Alas, it also implies an outage. There's a brief moment, in between the two renames, and during that time where we apply those last changes, where the table tbl is simply not there.

Not all applications will fail gracefully on such a scenario. Continue Reading »

Solving the Facebook-OSC non-atomic table swap problem

May 3, 2016

We present a way to use an atomic, blocking table swap in the Facebook Online-Schema-Change solution, as well as in a rumored, other Online-Schema-rumored-Change solution. Update: also a caveat.

Quick recap (really quick)

pt-online-schema-change and facebook-osc are two popular online-schema-change solutions for MySQL. They both use triggers, but in different ways. While the Percona tool uses synchronous table updates, such that any INSERT|UPDATE|DELETE on the modified table causes an INSERT|UPDATE|DELETE on a ghost table, in the Facebook tool all cause an INSERT on a changelog table, which is then iterated, read, having entries applied on the ghost table.

The TL;DR is that DMLs on the table propagate synchronously, within same transaction in the Percona tool, and asynchronously, with lag, in the Facebook tool.

What's the problem with the table swap?

In the Percona tool, once the logic is satisfied the copy is complete, we issue this query:

RENAME TABLE tbl TO tbl_old, tbl_new TO tbl;

This is an atomic, two table RENAME operation.

However with the asynchronous nature of the Facebook tool, such a RENAME would be a mistake. We must first block writes to the modified table, then make sure we have iterated the changelog table to the point of lock, apply those changes onto the ghost table, and only then do the swap.

The problem is: you cannot RENAME TABLES while one of them is LOCKed.

This is silly, and inconsistent, because:

Query OK, 0 rows affected (0.00 sec)

> RENAME TABLE tbl TO tbl_old, tbl_new TO tbl;
ERROR 1192 (HY000): Can't execute the given command because you have active locked tables or an active transaction

> ALTER TABLE tbl RENAME TO tbl_old;
Query OK, 0 rows affected (0.00 sec)

Why would the RENAME fail where the ALTER works?

Small thing, but critical to the operation of the online-schema-change. From the Facebook OSC documentation:

Since alter table causes an implicit commit in innodb, innodb locks get released after the first alter table. So any transaction that sneaks in after the first alter table and before the second alter table gets a 'table not found' error. The second alter table is expected to be very fast though because copytable is not visible to other transactions and so there is no need to wait.

What the FB solution means

It means for a very brief duration, the table is not there. Your app will get errors.

Of course, we should be able to handle errors anytime, aywhere. But the honest truth is: we (as in the world) do not. Many apps will fail ungracefully should they get a table not found error.

An atomic swap, as compared, would make for briefly blocking operations, making the app ignorant of the swap.


Rumor has it that we at GitHub are developing a new, triggerless, Online Schema Change tool. It is rumored to be based off binary logs and is rumored to have lots of interesting rumored implications.

Such rumored implementation would have to be asynchronous by nature, or so rumors say. And as such, it would fall for the same non-atomic table swap problem.


Once we heard it was rumored we were working on a triggerless online schema change tool, we realized we would have to solve the non-atomic swap problem. What we did was to gossip about it in between ourselves, which led to three different rumors of a solution, eventually manifested as three different working solutions. All three solutions make for blocking queries on the app's side. I will present one of these solution here, based on voluntary locks. Continue Reading »

MySQL Community Awards 2016: the Winners

April 22, 2016

The MySQL Community Awards initiative is an effort to acknowledge and thank individuals and corporates for their contributions to the MySQL ecosystem. It is a from-the-community, by-the-community and for-the-community effort. The committee is composed of an independent group of community members of different orientation and opinion, themselves past winners or known contributors to the community.

The 2016 community awards were presented on April 21st, 2016, during the keynotes at the Percona Live conference. The winners are:

MySQL Community Awards: Community Contributor of the year 2016

  • Bill Karwin
    Bill has been working with the community for years, helping them understand SQL. Bill is the author of the great book "SQL Antipatterns". He has given a large amount of help on sites such as StackOverflow, Quora, and of course many conference talks. Bill has provided a huge amount of help to the community directly.
  • Domas Mituzas
    Domas Mituzas started in the MySQL ecosystem as a MySQL Support Engineer at MySQL AB. Since he had some spare time, he did a lot of work to scale MySQL at Wikipedia. He is now a small data engineer at Facebook, mostly working with user-facing data systems. He continues to write very interesting blog posts and bug reports. Domas is responsible for giving us MyDumper, PoorMansProfiler, and the infamous Query Cache tuner!
  • Yoshinori Matsunobu
    Yoshinori Matsunobu is currently leading the MyRocks effort to get the RocksDB storage engine for MySQL into production at Facebook. Previously (amongst his other accomplishments) he created HandlerSocket, and implemented MHA to support failover automation for MySQL – both of which have been used at many companies. He is a frequent speaker at community events, and his tutorials and slide decks do a lot to increase expertise in the community. He is a frequent bug reporter with a focus on replication (RBR, semi-sync).

MySQL Community Awards: Application of the year 2016

Continue Reading »

MySQL Community Awards 2016: Call for Nominations!

February 5, 2016

The 2016 MySQL Community Awards event will take place, as usual, in Santa Clara, during the Percona Live Data Performance Conference, April 2016.

The MySQL Community Awards is a community based initiative. The idea is to publicly recognize contributors to the MySQL ecosystem. The entire process of discussing, voting and awarding is controlled by an independent group of community members, typically based of past winners or their representatives, as well as known contributors.

It is a self-appointed, self-declared, self-making-up-the-rules-as-it-goes committee. It is also very aware of the importance of the community; a no-nonsense, non-political, adhering to tradition, self criticizing committee.

The Call for Nominations is open. We are seeking the community’s assistance in nominating candidates in the following categories:

MySQL Community Awards: Community Contributor of the year 2016

This is a personal award; a winner would a person who has made contribution to the MySQL ecosystem. This could be via development, advocating, blogging, speaking, supporting, etc. All things go.

MySQL Community Awards: Application of the year 2016

An application, project, product etc. which supports the MySQL ecosystem by either contributing code, complementing its behaviour, supporting its use, etc. This could range from a one man open source project to a large scale social service.

MySQL Community Awards: Corporate Contributor of the year 2016

A company who made contribution to the MySQL ecosystem. This might be a corporate which released major open source code; one that advocates for MySQL; one that help out community members by... anything.

For a list of previous winners, please see MySQL Hall of Fame. Continue Reading »

Orchestrator progress

December 23, 2015

This comes mostly to reassure, having moved into GitHub: orchestrator development continues.

I will have the privilege of working on this open source solution in GitHub. There are a few directions we can take orchestrator to, and we will be looking into the possibilities. We will continue to strengthen the crash recovery process, and in fact I've got a couple ideas on drastically shortening Pseudo-GTID recovery time as well as other debts. We will look into yet other directions, which we will share. My new and distinguished team will co-work on/with orchestrator and will no doubt provide useful and actionable input.

Orchestrator continues to be open for pull requests, with a temporal latency in response time (it's the Holidays, mostly).

Some Go(lang) limitations (namely the import path, I'll blog more about it) will most probably imply some changes to the code, which will be well communicated to existing collaborators.

Most of all, we will keep orchestrator a generic solution, while keeping focus on what we think is most important - and there's some interesting vision here. Time will reveal as we make progress.


Joining GitHub

November 30, 2015

Today was my last day at, and shortly I will be joining the team at GitHub.

I'd like to thank the many kind, friendly & smart people I've worked with at!

The challenges at are big. There is such a diversity within the technology stack; even within the database range. A solution that works on all the various production environments is something to value. Indeed, the Production environment it is an amazing playground for developers, offering high volume, large numbers, and differing workloads to tackle. Your code just gets hammered down and you get very quick feedback on whether you did it right or wrong.

I was happy to have worked on serious reliability and operational topics, and to have made a meaningful contribution.

Joining GitHub, I'm to be a systems engineer in a great team (friends included), building great products, in and around the database zone, delivering open source, pretty much expecting to do awesome stuff! That, and the swag.

Forking Golang repositories on GitHub and managing the import path

November 23, 2015

Problem: there's an awesome Golang project on GitHub which you want to fork. You want to develop & collaborate on that fork, but the golang import path, in your source code, still references the original path, breaking everything.

A couple solutions offered below. First, though, let's get some names.

A sample case, the problem at hand

There's an awesome tool on You successfully fork it onto

You want to collaborate on; you wish to pull, commit & push. Maybe you want to send pull requests to the origin.

The following is commonly found throughout .go files in the repository:

import (

If you:

go get

golang creates your $GOPATH/src/, which is awesome. However, as you resolve dependencies via

cd $GOPATH/src/ ; go get ./...

golang digs into the source code, finds references to etc, and fetches those from and onto $GOPATH/src/, which is not awesome. You actually have two copies of the code, one from your fork, one from the origin, and your own fork will be largely ignored as it mostly points back to the origin.

A bad solution

The dirty, bad solution would be for you to go over the source code and replace "" entries with "". It is bad for two reasons:

  • You will not be able to further pull changes from upstream
  • You will not be able to pull-request and push your own changes upstream

Continue Reading »

Powered by Wordpress and MySQL. Theme by