I was very pleased when Giuseppe Maxia (aka datacharmer) unveiled dbdeployer in his talk at the pre-FOSDEM MySQL day. The announcement came just at the right time. I wish to briefly describe how we use dbdeployer (work in progress).
The case for gh-ost
A user opened an issue on gh-ost, and that user was running MySQL 5.5. gh-ost is tested on 5.7, where the problem does not reproduce. A discussion with Gillian Gunson raised the concern of not testing on all versions. Can we run gh-ost tests for all MySQL/Percona/MariaDB versions? Should we? How easy would it be?
gh-ost tests
gh-ost has three different test types:
- Unit tests: these are plain golang logic tests which are very easy and quick to run.
- Integration tests: the topic of this post; see below. Today these do not run as part of automated CI testing.
- System tests: putting our production tables to the test, continuously migrating our production data on dedicated replicas, verifying checksums are identical and data is intact, read more.
Unit tests are already running as part of automated CI (every PR is subjected to those tests). System tests are clearly tied to our production servers. What's the deal with the integration tests?
gh-ost integration tests
The gh-ost integration tests are a suite of scenarios which verify gh-ost's operation is sound. These scenarios are mostly concerned with data types, special alter statements etc. Is converting DATETIME to TIMESTAMP working properly? Are latin1 columns being updated correctly? How about renaming a column? Changing a PRIMARY KEY? Column reorder? 5.7 JSON values? And so on. Each test will recreate the table, run migration, stop replication, check the result, resume replication…
The environment for these tests is a master-replica setup, where gh-ost modifies the table on the replica and can then checksum or compare both the original and the altered ghost table.
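To make the shape of such a scenario concrete, here is a rough, hand-driven sketch of what a single case (say, converting DATETIME to TIMESTAMP) could look like. The table name, ports, credentials and comparison step are illustrative assumptions, not the actual test harness:

    # Illustrative only; ports depend on the deployed sandbox, credentials omitted for brevity.
    mysql --host=127.0.0.1 --port=$MASTER_PORT test -e "
      DROP TABLE IF EXISTS datetime_test;
      CREATE TABLE datetime_test (id INT PRIMARY KEY, dt DATETIME NOT NULL);
      INSERT INTO datetime_test VALUES (1, NOW()), (2, NOW());"

    # Migrate on the replica; --test-on-replica leaves replication stopped at the end,
    # so that the original table and the ghost table can be compared.
    gh-ost \
      --host=127.0.0.1 --port=$REPLICA_PORT \
      --database=test --table=datetime_test \
      --alter="CHANGE dt dt TIMESTAMP NOT NULL" \
      --test-on-replica \
      --execute

    # Compare contents of the original and the ghost table (gh-ost names it _<table>_gho):
    mysql --host=127.0.0.1 --port=$REPLICA_PORT test -e "SELECT * FROM datetime_test ORDER BY id" > /tmp/original.txt
    mysql --host=127.0.0.1 --port=$REPLICA_PORT test -e "SELECT * FROM _datetime_test_gho ORDER BY id" > /tmp/ghost.txt
    diff /tmp/original.txt /tmp/ghost.txt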
We develop gh-ost internally at GitHub, but it's also an open source project. We have our own internal CI environment, but we also wish the public to have visibility into test failures (so that a user can submit a PR and get reliable automated feedback). We use Travis CI for the public facing tests.
To run gh-ost's integration tests as described above as part of our CI tests, we should be able to:
- Create a master/replica setup in CI.
- Actually, create a master/replica setup in any CI, and namely in Travis CI.
- Actually, create multiple master/replica setups, of varying versions and vendors, in any CI, including both our internal CI and Travis CI.
I was about to embark on a MySQL Sandbox setup, which I was not keen on. But FOSDEM was around the corner and I had other things to complete beforehand. Lucky me, dbdeployer stepped in.
dbdeployer
dbdeployer is a rewrite of, and a replacement for, MySQL Sandbox. I've been using MySQL Sandbox for many years, and my laptop is running two sandboxes at this very moment. But MySQL Sandbox has a few limitations or complications:
- Perl. Versions of Perl. Dependencies of packages of Perl. I mean, it’s fine, we can automate that.
- Command line flag complexity: I always get lost in the complexity of the flags.
- Get it right or prepare for battle: if you deployed something, but not the way you wanted, there are sometimes limbo situations where you cannot re-deploy the same sandbox again, or you end up deleting files everywhere.
- Deploy, not remove. Adding a sandbox is one thing. How about removing it?
dbdeployer is a golang rewrite, which solves the dependency problem. It ships as a single binary and nothing more is needed. It is simple to use. While it generates the equivalent of a MySQL Sandbox setup, it does so with fewer command line flags and less confusion. There's first class handling of the MySQL binaries: you unpack MySQL tarballs, and you can list what's available. You can then create sandbox environments: replication, standalone, etc. You can then delete those.
It’s pretty simple and I have not much more to add — which is the best thing about it.
So, with dbdeployer it is easy to create a master/replica setup. Something like:

    dbdeployer unpack path/to/5.7.21.tar.gz --unpack-version=5.7.21 \
      --sandbox-binary ${PWD}/sandbox/binary

    dbdeployer replication 5.7.21 --nodes 2 \
      --sandbox-binary ${PWD}/sandbox/binary --sandbox-home ${PWD}/sandboxes \
      --gtid \
      --my-cnf-options log_slave_updates \
      --my-cnf-options log_bin \
      --my-cnf-options binlog_format=ROW
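The deployed sandbox comes with helper scripts in the spirit of MySQL Sandbox. The snippet below is only a sketch: the script names follow MySQL Sandbox conventions, and the listing/removal subcommands are assumptions to verify against the dbdeployer version in use:

    # Connect to master and replica via the generated helper scripts (names assumed):
    ./sandboxes/rsandbox_5_7_21/m  -e "SELECT @@port"    # master client
    ./sandboxes/rsandbox_5_7_21/s1 -e "SELECT @@port"    # replica client
    ./sandboxes/rsandbox_5_7_21/check_slaves             # sanity-check replication

    # List and remove sandboxes when done (subcommand names assumed):
    dbdeployer sandboxes --sandbox-home ${PWD}/sandboxes
    dbdeployer delete rsandbox_5_7_21 --sandbox-home ${PWD}/sandboxes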
Where does it all fit in, and what about the MySQL binaries though?
So, should dbdeployer be part of the gh-ost repo? And where does one get those MySQL binaries from? Are they to be part of the gh-ost repo? Aren't they a few GB to extract?
Neither dbdeployer nor the MySQL binaries should be added to the gh-ost repo. And fortunately, Giuseppe also solved the MySQL binaries problem.
The scheme I’m looking at right now is as follows:
- A new public repo, gh-ost-ci-env, is created. This repo includes:
  - dbdeployer compiled binaries
  - Minimal MySQL tarballs for selected versions. Those tarballs are reasonably small: between `14MB` and `44MB` at this time.
- gh-ost's CI to git clone https://github.com/github/gh-ost-ci-env.git (code)
- gh-ost's CI to set up a master/replica sandbox (one, two).
- Kick the tests (the whole flow is sketched below).
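Pieced together, the flow above amounts to something like the following bootstrap. This is a sketch under assumptions: the repo's internal layout, file names and the test entry point are illustrative, not the actual gh-ost CI scripts:

    #!/bin/bash
    # Illustrative CI bootstrap: fetch the prebuilt env, deploy a replication sandbox, run tests.
    set -e

    git clone https://github.com/github/gh-ost-ci-env.git
    mkdir -p sandbox/binary sandboxes

    # Locations of the dbdeployer binary and MySQL tarball inside gh-ost-ci-env are assumed:
    gh-ost-ci-env/bin/dbdeployer unpack gh-ost-ci-env/mysql-tarballs/5.7.21.tar.gz \
      --unpack-version=5.7.21 --sandbox-binary ${PWD}/sandbox/binary

    gh-ost-ci-env/bin/dbdeployer replication 5.7.21 --nodes 2 \
      --sandbox-binary ${PWD}/sandbox/binary --sandbox-home ${PWD}/sandboxes \
      --gtid \
      --my-cnf-options log_slave_updates \
      --my-cnf-options log_bin \
      --my-cnf-options binlog_format=ROW

    # Kick the integration test suite against the sandbox topology (entry point assumed):
    ./localtests/test.sh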
The above is a work in progress:
- At this time it only runs a single MySQL version.
- There is a known issue where, after a test, replication may take time to resume. Currently, on slower boxes (such as the Travis CI containers), this leads to failures; a possible mitigation is sketched below.
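One possible mitigation would be for the test harness to explicitly wait for the replica to catch up before checking results. A minimal sketch, assuming the sandbox replica client script from earlier and arbitrary timeout values:

    # Poll the replica until it reports zero lag, or give up after ~60 seconds.
    wait_for_replication() {
      local replica_client="./sandboxes/rsandbox_5_7_21/s1"
      local i
      for i in $(seq 1 60); do
        lag=$($replica_client -e "SHOW SLAVE STATUS\G" | awk '/Seconds_Behind_Master/ {print $2}')
        [ "$lag" == "0" ] && return 0
        sleep 1
      done
      echo "replication did not catch up in time" >&2
      return 1
    }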
Another concern I have at this time is build time. For a single MySQL version, it takes some 5-7 minutes on my local laptop to run all integration tests. It will be faster on our internal CI. It will be considerably slower on Travis CI; I can expect between 10m - 15m. Add multiple versions and we're looking at a 1hr build. Such long build times will affect our development and delivery times, and so we will split them off the main build. I need to consider what the best approach is.
That's all for now. I'm pretty excited for the potential of dbdeployer and will be looking into incorporating the same for orchestrator CI tests.