Practical Orchestrator, BoF, GitHub and other talks at Percona Live 2017

April 19, 2017

Next week I will be presenting Practical Orchestrator at Percona Live, Santa Clara.

Unlike my previous orchestrator talks, which were either high level or algorithmic, Practical Orchestrator will be, well... practical.

The objective for this talk is that attendees leave the classroom with a good grasp of orchestrator's powers, and know how to set up orchestrator in their environment.

We will walk through discovery, refactoring, recovery, HA. I will walk through the most important configuration settings, share advice on what makes a good deployment, and tell you how we and others run orchestrator. We'll present a few scripting/automation examples. We will literally set up orchestrator on my computer.

It's a 50 minute talk and it will be fast paced!

ProxySQL & Orchestrator BoF

ProxySQL is all the rage, and over the past 18 months René Cannaò and I have discussed, a few times, the potential for integration between ProxySQL and Orchestrator. We've also received several requests from the community.

We will run a BoF, a very informal session where we openly discuss our thoughts on possible integration, what makes sense and what doesn't. Above all else, we would love to hear the attendees' thoughts. We might come out of this session with some plan to pick low hanging fruit, who knows?

The current link to the BoF sessions is this. It seems terribly broken, and hopefully I'll replace it later on.

GitHub talks

GitHub engineers will further present these talks: Continue Reading »

"MySQL High Availability tools" followup, the missing piece: orchestrator

April 6, 2017

I read with interest MySQL High Availability tools - Comparing MHA, MRM and ClusterControl by SeveralNines. I thought there was a missing piece in the comparison: orchestrator, and that as a result the comparison was missing scope and context.

I'd like to add my thoughts on topics addressed in the post. I'm by no means an expert on MHA, MRM or ClusterControl, and will mostly focus on how orchestrator tackles high availability issues raised in the post.

What this is

This is to add insights on the complexity of failovers. Over the past three years I kept thinking I'd seen it all, only to get hit by yet another crazy scenario. Doing the right thing automatically is difficult.

In this post, I'm not trying to convince you to use orchestrator (though I'd be happy if you did). To be very clear, I'm not claiming it is better than any other tool. As always, each tool has pros and cons.

This post does not claim other tools are not good. Nor that orchestrator has all the answers. At the end of the day, pick the solution that works best for you. I'm happy to use a solution that reliably solves 99% of the cases as opposed to an unreliable solution that claims to solve 99.99% of the cases.

Quick background

orchestrator is actively maintained by GitHub. It manages automated failovers at GitHub. It manages automated failovers at, one of the largest MySQL setups on this planet. It manages automated failovers as part of Vitess. These are some names I'm free to disclose, and browsing the issues shows a few more users running failovers in production. Otherwise, it is used for topology management and visualization in a large number of companies such as Square, Etsy, Sendgrid, Godaddy and more.

Let's now follow one-by-one the observations on the SeveralNines post. Continue Reading »

MySQL Community Awards 2017: Call for Nominations!

March 16, 2017

The 2017 MySQL Community Awards event will take place, as usual, in Santa Clara, during the Percona Live Data Performance Conference, April 2017.

The MySQL Community Awards is a community based initiative. The idea is to publicly recognize contributors to the MySQL ecosystem. The entire process of discussing, voting and awarding is controlled by an independent group of community members, typically composed of past winners or their representatives, as well as known contributors.

It is a self-appointed, self-declared, self-making-up-the-rules-as-it-goes committee. It is also very aware of the importance of the community; a no-nonsense, non-political, adhering to tradition, self criticizing committee.

The Call for Nominations is open. We are seeking the community’s assistance in nominating candidates in the following categories:

MySQL Community Awards: Community Contributor of the year 2017

This is a personal award; a winner would be a person who has made a contribution to the MySQL ecosystem. This could be via development, advocating, blogging, speaking, supporting, etc. All things go.

MySQL Community Awards: Application of the year 2017

An application, project, product etc. which supports the MySQL ecosystem by either contributing code, complementing its behavior, supporting its use, etc. This could range from a one man open source project to a large scale social service.

MySQL Community Awards: Corporate Contributor of the year 2017

A company that has made a contribution to the MySQL ecosystem. This might be a corporation which released major open source code; one that advocates for MySQL; one that helps out community members by... anything.

For a list of previous winners, please see MySQL Hall of Fame. Continue Reading »

orchestrator Puppet module now available

February 1, 2017

We have just open sourced and published an orchestrator puppet module. This module is authored by Tom Krouper of GitHub's database infrastructure team, and is what we use internally at GitHub for deploying orchestrator.

The module manages the orchestrator service, the config file (inherit to override values), etc (pun intended). Check it out!



Some observations on MySQL to sqlite migration & compatibility

January 30, 2017

I'm experimenting with sqlite as a backend database for orchestrator. While orchestrator manages MySQL replication topologies, it also uses MySQL as its backend. For some deployments, and I'm looking into one such deployment, having MySQL as the backend is a considerable overhead.

This sent me down the route of looking into a self contained orchestrator binary + backend DB. I would have orchestrator spawn its own backend database instead of connecting to an external one.

Why even relational?

Can't orchestrator just use a key-value backend?

Maybe it could. But frankly I enjoy the power of relational databases, and the versatility they offer has proven itself multiple times with orchestrator, being able to answer interesting, new, complex questions about one's topology by crafting SQL queries.
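
To illustrate the kind of question a relational backend answers with a single query, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The instance table, its columns and the host names are hypothetical and much simpler than orchestrator's actual backend schema; the point is only that "which replicas run a different version than their master?" becomes a self-join.

```python
import sqlite3

# Hypothetical, simplified schema: NOT orchestrator's actual backend tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE instance (
        hostname        TEXT PRIMARY KEY,
        master_hostname TEXT,     -- NULL for the topology's master
        version         TEXT,
        lag_seconds     INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO instance VALUES (?, ?, ?, ?)",
    [
        ("db-master", None, "5.7.17", 0),
        ("db-replica-1", "db-master", "5.7.17", 0),
        ("db-replica-2", "db-master", "5.6.35", 4),
        ("db-replica-3", "db-replica-1", "5.7.17", 1),
    ],
)


def replicas_with_different_version(conn):
    """Return (replica, replica_version, master_version) rows where a
    replica's version differs from that of its direct master."""
    rows = conn.execute(
        """
        SELECT r.hostname, r.version, m.version
        FROM instance r
        JOIN instance m ON m.hostname = r.master_hostname
        WHERE r.version <> m.version
        ORDER BY r.hostname
        """
    )
    return rows.fetchall()


mismatches = replicas_with_different_version(conn)
print(mismatches)
```

With a key-value store, the same question would typically mean fetching every instance and joining in application code.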

Moreover, orchestrator is already heavily invested in the relational model. At this time, replacing all SQL queries with key-value reads seems to me a significant investment in time and risk. So I was looking for a relational, SQL accessible, embeddable database for orchestrator.

Why sqlite?

I am in particular looking at two options: sqlite (via the go-sqlite3 binding) and TiDB. sqlite does not need much introduction, and I'll just say it's embeddable within the golang-built binary. Continue Reading »

Discussing online schema migrations with Oracle's MySQL engineering managers

November 23, 2016

Last week I had the pleasant opportunity of introducing and discussing the operation of online schema migrations to MySQL's engineering managers, as part of their annual meeting, in London.

Together with Simon J. Mudd of, we discussed our perception of what it takes to run online schema migrations on a live, busy system.

While the Oracle/MySQL engineers develop new features or optimize behavior in MySQL, we of the industry have the operational expertise and understanding of the flow of working with MySQL. In all topics, and in schema migration in particular, there is a gap between what's perceived to be the use case and what the use case actually is. It is the community's task to provide feedback to Oracle so as to align development with operational needs where possible.

Our meeting included the following:

Need for schema migrations

We presented, based on our experience in current and past companies, and based on the experience of our friends in the community, the case for online schema migrations. At GitHub, at and in many other companies I'm familiar with, we continuously deploy to production, and this implies continuous schema migrations to our production databases. We have migrations running daily; sometimes multiple per day, sometimes none. Continue Reading »

Three wishes for a new year

September 28, 2016

(Almost) another new year by Jewish calendar. What do I wish for the following year?

  1. World peace
  2. Good health to all
  3. Relaxed GTID constraints

I'm still not using GTID, and still see operational issues with working with GTID. As a latest example, our new schema migration solution, gh-ost, allows us to test migrations in production, on replicas. The GTID catch? gh-ost has to write something to the binary log. Thus, it "corrupts" the replica with a bogus GTID entry that will never be met in another server, thus making said replica unsafe to promote. We can work around this, but...

I understand the idea and need for the Executed GTID Set. It will certainly come in handy with multi-writer InnoDB Cluster. However for most use cases GTID poses a burden. The reason is that our topologies are imperfect, and we as humans are imperfect, and operations are most certainly imperfect. We may wish to operate on a replica: test something, by intention or mistake. We may wish to use a subchain as the seed for a new cluster split. We may wish to be able to write to downstream replicas. We may use a 3rd party tool that issues a flush tables with read lock without disabling sql_log_bin. Things just happen.

For that, I would like to suggest GTID control levels, such as:

  1. Strict: same as Oracle's existing implementation. Executed sets, purged sets, whatnot.
  2. Last executed: a mode where the only thing that counts is the last executed GTID value. If I repoint a replica, all it needs to check is "hey this is my last executed GTID entry, give me the coordinates of yours. And, no, I don't care about comparing executed and purged sets, I will trust you and keep running from that point on".
  3. Declarative: GTIDs are generated, are visible in each and every binary log entry, but are completely ignored.
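
The three levels above can be sketched as a toy decision function. This is a deliberately simplified model and not MySQL's actual GTID logic: GTIDs are plain strings, the replica's history is an ordered list, and the names GtidLevel and can_replicate_from are hypothetical. "Strict" is reduced to a single check, namely that the source has not purged anything the replica still needs.

```python
from enum import Enum


class GtidLevel(Enum):
    STRICT = "strict"
    LAST_EXECUTED = "last_executed"
    DECLARATIVE = "declarative"


def can_replicate_from(level, replica_executed, source_executed, source_purged):
    """Decide whether a replica may be repointed under a source.

    replica_executed: list of GTID strings, oldest first (toy model).
    source_executed / source_purged: sets of GTID strings on the source.
    """
    if level is GtidLevel.DECLARATIVE:
        # GTIDs are recorded but never consulted.
        return True
    if level is GtidLevel.LAST_EXECUTED:
        # Only the replica's last executed GTID must be known to the source.
        return bool(replica_executed) and replica_executed[-1] in source_executed
    # STRICT (simplified): the source must still hold, in its binary logs,
    # every transaction the replica is missing.
    return source_purged <= set(replica_executed)


# A replica that received a bogus local entry ("r:1") gets promoted, and
# that entry has since been purged from its binary logs.
promoted_executed = {"m:1", "m:2", "m:3", "r:1"}
promoted_purged = {"m:1", "r:1"}
# A sibling replica that never saw "r:1" is repointed under it.
sibling_executed = ["m:1", "m:2", "m:3"]

for level in GtidLevel:
    ok = can_replicate_from(level, sibling_executed, promoted_executed, promoted_purged)
    print(level.value, ok)
```

In this toy scenario only the strict level refuses the repoint, which mirrors the "unsafe to promote" replica described earlier.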

I realize Oracle MySQL GTID has been out for over 3 years now, but I'm sorry - I still have reservations and see use cases where I fear it will not serve me right.

How about my previous years' wishes? World peace and good health never came through, however:

  • My 2015 wish for "decent, operations friendly built in online table refactoring" was unmet, however gh-ost is a thing now and exceeds my expectations. No, really. Please come see Tom & myself present gh-ost and how it changed our migration paradigm.
  • My 2012 wish for "decent, long waited for, implementation of Window Functions (aka Analytic Functions) for MySQL" was met by MariaDB's window functions.
    Not strictly Window Functions, but Oracle MySQL 8.0 will support CTE (hierarchical/recursive), worth a mention.

See you in Amsterdam!

gh-ost 1.0.17: Hooks, Sub-second lag control, Amazon RDS and more

September 6, 2016

gh-ost version 1.0.17 is now released, with various additions and fixes. Here are some notes of interest:


gh-ost now supports hooks. These are your own executables that gh-ost will invoke at particular points of interest (validation pass, about to cut-over, success, failure, status, etc.)

gh-ost will set various environment variables for your executables to pick up, passing along such information as migrated/ghost table name, elapsed time, processed rows, migrated host etc.
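
A minimal sketch of what such a hook executable might look like, written in Python. The environment variable names used here (GH_OST_DATABASE_NAME, GH_OST_TABLE_NAME, GH_OST_GHOST_TABLE_NAME, GH_OST_ELAPSED_SECONDS, GH_OST_MIGRATED_HOST) are assumptions and should be verified against the gh-ost hooks documentation; the function name describe_migration is hypothetical.

```python
#!/usr/bin/env python3
import os


def describe_migration(env):
    """Build a one-line summary from the variables gh-ost is assumed to export.

    Missing variables are rendered as "?" rather than raising, so the hook
    never fails just because a value is absent.
    """
    database = env.get("GH_OST_DATABASE_NAME", "?")      # assumed name
    table = env.get("GH_OST_TABLE_NAME", "?")            # assumed name
    ghost_table = env.get("GH_OST_GHOST_TABLE_NAME", "?")  # assumed name
    elapsed = env.get("GH_OST_ELAPSED_SECONDS", "?")     # assumed name
    host = env.get("GH_OST_MIGRATED_HOST", "?")          # assumed name
    return f"{database}.{table} -> {ghost_table} on {host}, {elapsed}s elapsed"


if __name__ == "__main__":
    # A real hook might post this to chat or a log collector; here we print.
    print(describe_migration(os.environ))
```

Keeping the formatting in a pure function makes the hook easy to test without running a migration.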

Sub-second lag control

At GitHub we're very strict about replication lag. We keep it well under 1 second at most times. gh-ost can now identify sub-second lag on replicas (well, you need to supply it with the right query). Our current production migrations are set by default with --max-lag-millis=500 or less, and our most intensive migrations keep replication lag well below 1sec or even below 500ms.
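
The throttling decision can be sketched as follows. This is not gh-ost's implementation; it assumes lag is derived from a heartbeat timestamp with sub-second resolution that some process writes on the master and that is then read back on the replica. The function names are hypothetical, and only the 500ms threshold comes from the text above.

```python
from datetime import datetime, timedelta


def lag_millis(last_heartbeat, now):
    """Replication lag in whole milliseconds, never negative.

    last_heartbeat: timestamp last seen on the replica (assumed to be
    written on the master with sub-second resolution).
    """
    elapsed = (now - last_heartbeat) // timedelta(milliseconds=1)
    return max(0, elapsed)


def should_throttle(last_heartbeat, now, max_lag_millis=500):
    """True when the migration should pause to let replicas catch up."""
    return lag_millis(last_heartbeat, now) > max_lag_millis


now = datetime(2016, 9, 6, 12, 0, 0)
print(should_throttle(now - timedelta(milliseconds=200), now))  # within budget
print(should_throttle(now - timedelta(milliseconds=750), now))  # over budget
```

A whole-second lag measurement could not distinguish these two cases, which is why a sub-second source of truth is needed for thresholds under one second.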


The SUPER privilege is required to set global binlog_format='ROW' and for STOP SLAVE; START SLAVE;

If you know your replica has RBR, you can pass --assume-rbr and gh-ost skips those steps.


Hooks + No Super = RDS, as seems to be the case. For --test-on-replica you will need to supply your own gh-ost-on-stop-replication hook, to stop your RDS replica at cut-over phase. See this tracking issue Continue Reading »

MySQL vs. PostgreSQL, gh-ost perspective

August 11, 2016

Last week we released gh-ost, GitHub's online schema migration tool for MySQL. As with other open source releases in the MySQL ecosystem, this release was echoed by several "Why not PostgreSQL?" comments. Having been active in open source for many years now, I'm familiar with these responses, and I find this is a good time to share my thoughts. Why? XKCD knows the answer:

XKCD: Duty Calls

I picked one post I wish to address (latest commit: 3dfbd2cd3f5468f035ec86442d2c670a510118d8). The author invested some time writing it. It nicely summarizes claims I've heard over the years, as well as some prejudice. Through responding to this post I will be generalizing thoughts and impressions to address the common reactions. Dear @brandur, let's grab a beer some day; I fundamentally disagree with your post and with its claims.

EDIT: linked post has been updated following this writing; I'd like to thank the author for his consideration. Also see his followup post. The version I've responded to in this post is this commit. Continue Reading »

Introducing gh-ost: triggerless online schema migrations

August 1, 2016

I'm thoroughly happy to introduce gh-ost: triggerless, controllable, auditable, testable, trusted online schema change tool released today by GitHub.

gh-ost now powers our production schema migrations. We hit some serious limitations using pt-online-schema-change on our large volume, high traffic tables, to the effect of driving our database to a near grinding halt or even to the extent of causing outages. With gh-ost, we are now able to migrate our busiest tables at any time, peak hours and heavy workloads included, without causing impact to our service.

gh-ost supports testing in production. It goes a long way to build trust, both in integrity and in control. Are your databases just too busy and you cannot run existing online-schema-change tools? Have you suffered outages due to migrations? Are you tired of babysitting migrations that run up to 3:00am? Tired of being the only one tailing logs? Please, take a look at gh-ost. I believe it changes the online migration paradigm.

For a more thorough overview, please read the announcement on the GitHub Engineering Blog, and proceed to the documentation.

gh-ost is open sourced under the MIT license.
