Using dbdeployer in CI tests

I was very pleased when Giuseppe Maxia (aka datacharmer) unveiled dbdeployer in his talk at pre-FOSDEM MySQL day. The announcement came just at the right time. I wish to briefly describe how we use dbdeployer (work in progress).

The case for gh-ost

A user opened an issue on gh-ost, and the user was using MySQL 5.5. gh-ost is being tested on 5.7 where the problem does not reproduce. A discussion with Gillian Gunson raised the concern of not testing on all versions. Can we run gh-ost tests for all MySQL/Percona/MariaDB versions? Should we? How easy would it be?

gh-ost tests

gh-ost has three different test types:

  • Unit tests: these are plain golang logic tests which are very easy and quick to run.
  • Integration tests: the topic of this post, see following. Today these do not run as part of an automated CI testing.
  • System tests: putting our production tables to the test, continuously migrating our production data on dedicated replicas, verifying checksums are identical and data is intact, read more.

Unit tests are already running as part of automated CI (every PR is subjected to those tests). Systems tests are clearly tied to our production servers. What’s the deal with the integration tests? Continue reading » “Using dbdeployer in CI tests”

gh-ost 1.0.42 released: JSON support, optimizations

gh-ost 1.0.42 is released and available for download.

JSON

MySQL 5.7’s JSON data type is now supported.

There is a soft-limitation, that your JSON may not be part of your PRIMARY KEY. Currently this isn’t even supported by MySQL anyhow.

Performance

Two noteworthy changes are:

  • Client side prepared statements reduce network traffic and round trips to the server.
  • Range query iteration avoids creating temporary tables and filesorting.

We’re not running benchmarks at this time to observe performance gains.

5.7

More tests validating 5.7 compatibility (at this time GitHub runs MySQL 5.7 in production).

Ongoing

Many other changes included.

We are grateful for all community feedback in form of open Issues, Pull Requests and questions!

gh-ost is authored by GitHub. It is free and open source and is available under the MIT license.

Speaking

In two weeks time, Jonah Berquist will present gh-ost: Triggerless, Painless, Trusted Online Schema Migrations at Percona Live, Dublin.

Tom Krouper and myself will present MySQL Infrastructure Testing Automation at GitHub, where, among other things, we describe how we test gh-ost in production.

Speaking at August Penguin, MySQL Track, GitHub sponsored

This Thursday I’ll be presenting at August Penguin, conveniently taking place September 7th, 8th, Ramat Gan, Israel.

I will be speaking as part of the MySQL track, 2nd half of Thursday. The (Hebrew) schedule is here.

My talk is titled Reliable failovers, safe schema migrations: open source solutions to MySQL problems. I will describe some of the open source MySQL infrastructure work we run at GitHub ; how it solves reliability, availability and usability. I’ll describe some of our internal workflows and our use of chat and chatops.

I’m proud to announce GitHub sponsors the event. We won’t have a booth, but please do grab me in the hallways or over lunch to chat!

And, yes, octocat stickers will be made available 🙂

 

Discussing online schema migrations with Oracle’s MySQL engineering managers

Last week I had the pleasant opportunity of introducing and discussing the operation of online schema migrations to MySQL’s engineering managers, as part of their annual meeting, in London.

Together with Simon J. Mudd of Booking.com, we discussed our perception of what it takes to run online schema migrations on a live, busy system.

While the Oracle/MySQL engineers develop new features or optimize behavior in the MySQL, we of the industry have the operational expertise and understanding of the flow of working with MySQL. In all topics, and in schema migration in particular, there is a gap between what’s perceived to be the use case and what the use case actually is. It is the community’s task to provide feedback back to Oracle so as to align development to match operations need where possible.

Our meeting included the following:

Need for schema migrations

We presented, based on our experience in current and past companies, and based on our friends of the community’s experience, the case for online schema migrations. At GitHub, at Booking.com and in many other companies I’m familiar with, we continuously deploy to production, and this implies continuous schema migrations to our production databases. We have migrations running daily; sometimes multiple per day, some time none. Continue reading » “Discussing online schema migrations with Oracle’s MySQL engineering managers”

Three wishes for a new year

(Almost) another new year by Jewish calendar. What do I wish for the following year?

  1. World peace
  2. Good health to all
  3. Relaxed GTID constraints

I’m still not using GTID, and still see operational issues with working with GTID. As a latest example, our new schema migration solution, gh-ost, allows us to test migrations in production, on replicas. The GTID catch? gh-ost has to write something to the binary log. Thus, it “corrupts” the replica with a bogus GTID entry that will never be met in another server, thus making said replica unsafe to promote. We can work around this, but…

I understand the idea and need for the Executed GTID Set. It will certainly come in handy with multi-writer InnoDB Cluster. However for most use cases GTID poses a burden. The reason is that our topologies are imperfect, and we as humans are imperfect, and operations are most certainly imperfect. We may wish to operate on a replica: test something, by intention or mistake. We may wish to use a subchain as the seed for a new cluster split. We may wish to be able to write to downstream replicas. We may use a 3rd party tool that issues a flush tables with read lock without disabling sql_log_bin. Things just happen.

For that, I would like to suggest GTID control levels, such as:

  1. Strict: same as Oracle’s existing implementation. Executed sets, purged sets, whatnot.
  2. Last executed: a mode where the only thing that counts is the last executed GTID value. If I repoint replica, all it needs to check is “hey this is my last executed GTID entry, give me the coordinates of yours. And, no, I don’t care about comparing executed and purged sets, I will trust you and keep running from that point on”
  3. Declarative: GTIDs are generated, are visible in each and every binary log entry, but are completely ignored.

I realize Oracle MySQL GTID is out for some over 3 years now, but I’m sorry – I still have reservations and see use cases where I fear it will not serve me right.

How about my previous years wishes? World peace and good health never came through, however:

  • My 2015 wish for “decent, operations friendly built in online table refactoring” was unmet, however gh-ost is a thing now and exceeds my expectations. No, really. Please come see Tom & myself present gh-ost and how it changed our migration paradigm.
  • My 2012 wish for “decent, long waited for, implementation of Window Functions (aka Analytic Functions) for MySQL” was met by MariaDB’s window functions.
    Not strictly Window Functions, but Oracle MySQL 8.0 will support CTE (hierarchial/recursive), worth a mention.

See you in Amsterdam!

gh-ost 1.0.17: Hooks, Sub-second lag control, Amazon RDS and more

gh-ost version 1.0.17 is now released, with various additions and fixes. Here are some notes of interest:

Hooks

gh-ost now supports hooks. These are your own executables that gh-ost will invoke at particular points of interest (validation pass, about to cut-over, success, failure, status, etc.)

gh-ost will set various environment variables for your executables to pick up, passing along such information as migrated/ghost table name, elapsed time, processed rows, migrated host etc.

Sub-second lag control

At GitHub we’re very strict about replication lag. We keep it well under 1 second at most times. gh-ost can now identify sub-second lag on replicas (well, you need to supply with the right query). Our current production migrations are set by default with --max-lag-millis=500 or less, and our most intensive migrations keep replication lag well below 1sec or even below 500ms

No SUPER

The SUPER privilege is required to set global binlog_format='ROW' and for STOP SLAVE; START SLAVE;

If you know your replica has RBR, you can pass --assume-rbr and skips those steps.

RDS

Hooks + No Super = RDS, as seems to be the case. For --test-on-replica you will need to supply your own gh-ost-on-stop-replication hook, to stop your RDS replica at cut-over phase. See this tracking issue Continue reading » “gh-ost 1.0.17: Hooks, Sub-second lag control, Amazon RDS and more”