Using dbdeployer in CI tests

February 20, 2018

I was very pleased when Giuseppe Maxia (aka datacharmer) unveiled dbdeployer in his talk at pre-FOSDEM MySQL day. The announcement came just at the right time. I wish to briefly describe how we use dbdeployer (work in progress).

The case for gh-ost

A user opened an issue on gh-ost while running MySQL 5.5. gh-ost is tested on 5.7, where the problem does not reproduce. A discussion with Gillian Gunson raised the concern of not testing on all versions. Can we run gh-ost tests for all MySQL/Percona/MariaDB versions? Should we? How easy would it be?

gh-ost tests

gh-ost has three different test types:

  • Unit tests: these are plain golang logic tests which are very easy and quick to run.
  • Integration tests: the topic of this post, see following. Today these do not run as part of automated CI testing.
  • System tests: putting our production tables to the test, continuously migrating our production data on dedicated replicas, verifying checksums are identical and data is intact, read more.

Unit tests are already running as part of automated CI (every PR is subjected to those tests). Systems tests are clearly tied to our production servers. What's the deal with the integration tests? Continue Reading »
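A version-matrix run with dbdeployer could look roughly like the following sketch. Note this is an assumption of how such a CI step might be wired up, not our actual pipeline; the exact dbdeployer subcommand syntax may differ between dbdeployer versions, and the test entry point path is illustrative:

```
# Deploy a sandbox per MySQL version, run gh-ost integration tests
# against it, then tear it down. Versions listed are illustrative.
set -e
for version in 5.5.53 5.6.39 5.7.21 ; do
  dbdeployer deploy single $version
  ./localtests/test.sh                  # integration test entry point (path assumed)
  dbdeployer delete msb_${version//./_}
done
```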

orchestrator 3.0.6: faster crash detection & recoveries, auto Pseudo-GTID, semi-sync and more

January 29, 2018

orchestrator 3.0.6 is released and includes some exciting improvements and features. It quickly follows up on 3.0.5 released recently, and this post gives a breakdown of some notable changes:

Faster failure detection

Recall that orchestrator uses a holistic approach for failure detection: it reads state not only from the failed server (e.g. master) but also from its replicas. orchestrator now detects failure faster than before:

  • A detection cycle has been eliminated, leading to quicker resolution of a failure. On our setup, where we poll servers every 5sec, failure detection time dropped from 7-10sec to 3-5sec, without sacrificing reliability: the reduction in time does not lead to increased false positives.
    Side note: you may see increased not-quite-failure analysis such as "I can't see the master" (UnreachableMaster).
  • Better handling of network scenarios where packets are dropped. Instead of hanging till TCP timeout, orchestrator now observes server discovery asynchronously. We have specialized failover tests that simulate dropped packets. The change reduces detection time by some 5sec.

Faster master recoveries

Promoting a new master is a complex task which attempts to promote the best replica out of the pool of replicas. It's not always the most up-to-date replica. The choice varies depending on replica configuration, version, and state.

With recent changes, orchestrator is able to recognize, early on, whether the replica it would like to promote as master is the ideal candidate. When that is the case, orchestrator immediately promotes it (i.e. runs hooks, sets read_only=0 etc.), and runs the rest of the failover logic, i.e. the rewiring of replicas under the newly promoted master, asynchronously.

This allows the promoted server to take writes sooner, even while its replicas are not yet connected. It also means external hooks are executed sooner.

Between faster detection and faster recoveries, we're looking at some 10sec reduction in overall recovery time: from the moment of crash to the moment a new master accepts writes. We stand now at < 20sec in almost all cases, and < 15sec in optimal cases. Those times are measured on our failover tests.

We are working on reducing failover time unrelated to orchestrator and hope to update soon.

Automated Pseudo-GTID

As a reminder, Pseudo-GTID is an alternative to GTID, without the kind of commitment you make with GTID. It provides the same "point your replica under any other server" behavior that GTID allows. Continue Reading »

Implementing non re-entrant functions in Golang

January 4, 2018

A non re-entrant function is a function that can only be executing once at any point in time, regardless of how many times it is invoked and by how many goroutines.

This post illustrates blocking non re-entrant functions and yielding non re-entrant functions implementations in golang.

A use case

A service is polling for some conditions, monitoring some statuses once per second. We want each status to be checked independently of others without blocking. An implementation might look like:

func main() {
    tick := time.Tick(time.Second)
    go func() {
        for range tick {
            go CheckSomeStatus()
            go CheckAnotherStatus()
        }
    }()
    // ... rest of the service ...
}

We choose to run each status check in its own goroutine so that CheckAnotherStatus() doesn't wait upon CheckSomeStatus() to complete.

Each of these checks typically takes a very short amount of time, far less than a second. What happens, though, if CheckAnotherStatus() itself takes more than one second to run? Perhaps there's an unexpected network or disk latency affecting the execution time of the check.

Does it make sense for the function to be executed twice at the same time? If not, we want it to be non re-entrant. Continue Reading »

orchestrator 3.0.3: auto provisioning raft nodes, native Consul support and more

November 16, 2017

orchestrator 3.0.3 is released! There's been a lot going on since 3.0.2:

orchestrator/raft: auto-provisioning nodes via lightweight snapshots

In an orchestrator/raft setup, we have n hosts forming a raft cluster. In a 3-node setup, for example, one node can go down, and still the remaining two will form a consensus, keeping the service operational. What happens when the failed node returns?

With 3.0.3 the failed node can go down for as long as it wants. Once it comes back, it attempts to join the raft cluster. A node keeps its own snapshots and its raft log outside the relational backend DB. If it has recent-enough data, it just needs to catch up with the raft replication log, which it acquires from one of the active nodes.

If its data is very stale, it will request a snapshot from an active node, which it will import, and will just resume from that point.

If its data is gone, that's not a problem. It gets a snapshot from an active node, imports it, and keeps running from that point.

If it's a newly provisioned box, that's not a problem. It gets a snapshot from an active node, ... etc.

  • SQLite backed setups can just bootstrap new nodes. No need to dump+load or import any data.
    • Side effect: you may actually use :memory:, where SQLite does not persist any data to disk. Remember that the raft snapshots and replication log will cover you. The cheat is that the raft replication log itself is managed and persisted by an independent SQLite database.
  • MySQL backed setups will still need to make sure orchestrator has the privileges to deploy itself.

More info in the docs.

This plays very nicely into the hands of Kubernetes, which is on orchestrator's roadmap.

Key Value, native Consul support (Zk TODO)

orchestrator now supports Key-Value stores built-in, and Consul in particular.

At this time the purpose of orchestrator KV is to support master discovery. orchestrator will write the identity of the master of each cluster to KV store. The user will use that information to apply changes to their infrastructure.

For example, the user will rely on Consul KV entries, written by orchestrator, to generate proxy config files via consul-template, such that traffic is directed via the proxy onto the correct master.
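For illustration (the KV key layout and cluster alias here are assumptions; the actual path depends on your orchestrator configuration), a consul-template fragment for an HAProxy config might read:

```
# haproxy.cfg.ctmpl (sketch): route writes to the current master,
# as published by orchestrator into Consul KV
listen mysql-writer
    bind *:3306
    server master {{ key "mysql/master/mycluster" }} check
```

consul-template re-renders the file and reloads the proxy whenever the key changes, i.e. whenever orchestrator performs a failover.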

orchestrator supports:

  • Manually writing identity of cluster's master to KV store
    • e.g. `orchestrator-client -c submit-masters-to-kv-stores -alias mycluster`
  • Automatically updating the master's identity upon failover

Key-value pairs are in the form of `<cluster-alias>` → `<master>`. For example:

  • Key is `main_cluster`
  • Value is the identity (hostname & port) of the cluster's master

Web UI improvements

Using the web UI, you can now: Continue Reading »

gh-ost 1.0.42 released: JSON support, optimizations

September 14, 2017

gh-ost 1.0.42 is released and available for download.


MySQL 5.7's JSON data type is now supported.

There is a soft limitation: your JSON column may not be part of your PRIMARY KEY. MySQL doesn't currently support that anyhow.
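By way of illustration (host, credentials and schema names are placeholders), a migration adding a JSON column could be invoked as:

```
gh-ost \
  --host=replica.example.com --user=gh-ost --ask-pass \
  --database=mydb --table=mytable \
  --alter="ADD COLUMN attrs JSON" \
  --execute
```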


Two noteworthy changes are:

  • Client side prepared statements reduce network traffic and round trips to the server.
  • Range query iteration avoids creating temporary tables and filesorting.

We have not run benchmarks at this time to measure the performance gains.


More tests validating 5.7 compatibility (at this time GitHub runs MySQL 5.7 in production).


Many other changes included.

We are grateful for all community feedback in form of open Issues, Pull Requests and questions!

gh-ost is authored by GitHub. It is free and open source and is available under the MIT license.


In two weeks time, Jonah Berquist will present gh-ost: Triggerless, Painless, Trusted Online Schema Migrations at Percona Live, Dublin.

Tom Krouper and I will present MySQL Infrastructure Testing Automation at GitHub, where, among other things, we describe how we test gh-ost in production.

Speaking at Percona Live Dublin: keynote, orchestrator tutorial, MySQL testing automation

September 13, 2017

I'm looking forward to a busy Percona Live Dublin conference, delivering three talks. Chronologically, these are:

  • Practical orchestrator tutorial
    Attend this 3 hour tutorial for a thorough overview on orchestrator: what, why, how to configure, best advice, deployments, failovers, security, high availability, common operations, ...
    We will of course discuss the new orchestrator/raft setup and share our experience running it in production.
    The tutorial will allow for general questions from the audience and open discussions.
  • Why Open Sourcing Our Database Tooling was the Smart Decision
    What it says. A 10 minute journey advocating for open sourcing infrastructure.
  • MySQL Infrastructure Testing Automation at GitHub
    Co-presenting with Tom Krouper, we share how & why we run infrastructure tests in and near production that give us trust in many of our ongoing, ever-changing operations. Essentially this is "why you should feel OK trusting us with your data".

See you there!

orchestrator 3.0.2 GA released: raft consensus, SQLite

September 12, 2017

orchestrator 3.0.2 GA is released and available for download (see also packagecloud repository).

3.0.2 is the first stable release in the 3.0* series, introducing (recap from 3.0 pre-release announcement):


Raft is a consensus protocol, supporting leader election and consensus across a distributed system.  In an orchestrator/raft setup orchestrator nodes talk to each other via raft protocol, form consensus and elect a leader. Each orchestrator node has its own dedicated backend database. The backend databases do not speak to each other; only the orchestrator nodes speak to each other.

No MySQL replication setup needed; the backend DBs act as standalone servers. In fact, the backend server doesn't have to be MySQL, and SQLite is supported. orchestrator now ships with SQLite embedded, no external dependency needed.

For details, please refer to the documentation:


Suggested and requested by many was the removal of orchestrator's own dependency on a MySQL backend. orchestrator now supports a SQLite backend.

SQLite is a transactional, relational, embedded database, and as of 3.0 it is embedded within orchestrator, no external dependency required.
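Opting into the SQLite backend is a small configuration change; a sketch (the data file path is illustrative):

```
{
  "BackendDB": "sqlite",
  "SQLite3DataFile": "/var/lib/orchestrator/orchestrator.sqlite3"
}
```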


orchestrator-client is a client shell script which mimics the command line interface, while running curl | jq requests against the HTTP API. It stands to simplify your deployments: interacting with the orchestrator service via orchestrator-client is easier and only requires you to place a shell script (this is as opposed to installing the orchestrator binary + configuration file).

orchestrator-client is the way to interact with your orchestrator/raft cluster. orchestrator-client now has its own RPM/deb release package.

You may still use the web interface and web API; and a special --ignore-raft-setup flag keeps power in your hands (use at your own risk).

State of orchestrator/raft

orchestrator/raft is a big change: Continue Reading »

Remembering Jaakko Pesonen

September 9, 2017

I was sorrowed to hear that Jaakko Pesonen has passed away after battling cancer.

I first met Jaakko a few years back, during a Percona Live conference, and as community goes, our paths crossed again a few times. He spoke at and attended conferences where we'd have casual chats.

We were both expats in the Netherlands for a period. As I moved in from Israel, he was already working at Spil Games, having relocated from Finland, his home country. We shared expat experiences and longing for our homes. One day he pinged me that he was planning a trip to Israel - and the next few days were all about planning the best culinary experience of his travel (he approved of the results).

He was happy for the opportunity to work for Percona, as this allowed him to move back home to Finland.

Jaakko had the biggest, widest, most consuming smile, and this smile will sure be the most vivid memory of him that I'll keep.

I do not have personal pictures of Jaakko. This picture was taken by Julian Cash at Percona Live. A rare non-smiling appearance.




Speaking at August Penguin, MySQL Track, GitHub sponsored

September 3, 2017

This Thursday I'll be presenting at August Penguin, conveniently taking place September 7th-8th in Ramat Gan, Israel.

I will be speaking as part of the MySQL track, 2nd half of Thursday. The (Hebrew) schedule is here.

My talk is titled Reliable failovers, safe schema migrations: open source solutions to MySQL problems. I will describe some of the open source MySQL infrastructure work we run at GitHub; how it addresses reliability, availability and usability. I'll describe some of our internal workflows and our use of chat and chatops.

I'm proud to announce GitHub sponsors the event. We won't have a booth, but please do grab me in the hallways or over lunch to chat!

And, yes, octocat stickers will be made available 🙂


orchestrator/raft: Pre-Release 3.0

August 3, 2017

orchestrator 3.0 Pre-Release is now available. Most notable are Raft consensus, SQLite backend support, orchestrator-client no-binary-required client script.


You may now set up high availability for orchestrator via raft consensus, without the need to set up high availability for orchestrator's backend MySQL servers (such as Galera/InnoDB Cluster). In fact, you can run an orchestrator/raft setup using an embedded SQLite backend DB. Read on.

orchestrator still supports the existing shared backend DB paradigm; nothing dramatic changes if you upgrade to 3.0 and do not configure raft.
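Enabling raft is likewise a configuration matter. A sketch for one node of a 3-node setup (hostnames are illustrative; see the orchestrator/raft documentation for the full set of variables):

```
{
  "RaftEnabled": true,
  "RaftDataDir": "/var/lib/orchestrator",
  "RaftBind": "node1.example.com",
  "DefaultRaftPort": 10008,
  "RaftNodes": ["node1.example.com", "node2.example.com", "node3.example.com"]
}
```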


Raft is a consensus protocol, supporting leader election and consensus across a distributed system.  In an orchestrator/raft setup orchestrator nodes talk to each other via raft protocol, form consensus and elect a leader. Each orchestrator node has its own dedicated backend database. The backend databases do not speak to each other; only the orchestrator nodes speak to each other.

No MySQL replication setup needed; the backend DBs act as standalone servers. In fact, the backend server doesn't have to be MySQL, and SQLite is supported. orchestrator now ships with SQLite embedded, no external dependency needed. Continue Reading »
