orchestrator: what’s new in CI, testing & development

A recent focus on development & testing has yielded new orchestrator environments and offerings for developers, along with increased reliability and trust. This post illustrates the new changes; see the Developers section of the official documentation for more details.

Testing

For the past four years orchestrator was developed at GitHub, using GitHub’s environments for testing. This was very useful for testing orchestrator’s behavior within GitHub, interacting with its internal infrastructure, and validating failover behavior in a production environment. These tests and their results are not visible to the public, though.

Now that orchestrator is developed outside GitHub (that is, outside GitHub the company, not GitHub the platform) I wanted to improve on the testing framework, making it visible, accessible and contribute-able to the community. Thankfully, the GitHub platform has much to offer on that front and orchestrator now uses GitHub Actions more heavily for testing.

GitHub Actions provide a way to run code in a container in the context of the repository. The most common use case is to run CI tests on receiving a Pull Request. Indeed, when GitHub Actions became available, we switched out of Travis CI and into Actions for orchestrator’s CI.

Today, orchestrator runs three different tests:

Build, unit testing, integration testing, code & doc validation
Upgrade testing
System testing

To highlight what each does:

Build, unit testing, integration testing

Based on the original CI (and possibly to be split into distinct tests), this CI Action compiles the code, runs unit tests, and runs the suite of integration tests (it spins up both MySQL and SQLite databases and runs a series of tests on each backend). This CI job is the “basic” test to see that the contributed code even makes sense.

What’s new in this test is that it now produces an artifact: an orchestrator binary for Linux/amd64. This is again a feature of GitHub Actions; the artifact is kept for a couple of months or so per the Actions retention policy. Here’s an example; by the time you read this the binary artifact may or may not still be there.

This means you don’t actually need a development environment on your laptop to be able to build an orchestrator binary. More on this later.

Upgrade testing

Until recently this was not formalized; I’d test upgrades by deploying them internally at GitHub onto a staging environment. Now upgrades are tested per Pull Request: we spin up a container, deploy orchestrator from the master branch using both MySQL and SQLite backends, then check out the PR branch and redeploy orchestrator using the existing backends. This verifies that, at least backend-database wise, there are no upgrade errors.

At this time the test only validates the database changes are applicable; in the future this may expand onto more elaborate tests.
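The idea of that check can be sketched in a few lines of Python. This is a minimal illustration and not orchestrator’s actual test code: the table, the column and the function names are all hypothetical, and it only uses an in-memory SQLite database. It deploys a “master” schema first, then redeploys with the “PR” changes on the same backend, and reports the first statement that fails to apply.

```python
import sqlite3

# Hypothetical schema statements, for illustration only (not orchestrator's real schema).
BASE_SCHEMA = [
    "CREATE TABLE IF NOT EXISTS instance ("
    "hostname TEXT NOT NULL, port INTEGER NOT NULL, PRIMARY KEY (hostname, port))",
]
PR_PATCHES = [
    "ALTER TABLE instance ADD COLUMN region TEXT NOT NULL DEFAULT ''",
]


def deploy(conn, statements):
    """Apply statements in order; return (statement, error) for the first failure, else None."""
    for statement in statements:
        try:
            conn.execute(statement)
        except sqlite3.Error as err:
            return statement, str(err)
    conn.commit()
    return None


def upgrade_check(base, patches):
    """Deploy the base schema, then redeploy base + patches on the same backend."""
    conn = sqlite3.connect(":memory:")
    try:
        failure = deploy(conn, base)
        if failure is not None:
            return failure
        # The second deployment runs against the existing backend, as an upgrade would.
        return deploy(conn, base + patches)
    finally:
        conn.close()
```

Calling `upgrade_check(BASE_SCHEMA, PR_PATCHES)` returns `None` when every change applies cleanly; a patch that references a missing table comes back together with the database error.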

System testing

I’m most excited about this one. Taking ideas from our approach to testing gh-ost with dbdeployer, I created https://github.com/openark/orchestrator-ci-env, which offers a full-blown testing environment for orchestrator, including a MySQL replication topology (courtesy of dbdeployer), Consul, HAProxy and more.

This CI testing environment can also serve as a playground in your local Docker setup, as described shortly.

The system tests suite offers full blown cluster-wide operations such as graceful takeovers, master failovers, errant GTID transaction analysis and recovery and more. The suite utilizes the CI testing environment, breaks it, rebuilds it, validates it… Expects specific output, expects specific failure messages, specific analysis, specific outcomes.
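To make “expects specific output, expects specific failure messages” concrete, here is a minimal Python sketch of such an expectation check. It is not the actual system tests suite: `run_case` is a hypothetical helper, and the Python interpreter stands in for the tool under test, printing made-up messages.

```python
import subprocess
import sys


def run_case(command, expected_output, expect_failure=False):
    """Run a command; return a list of mismatch descriptions (empty list means pass)."""
    result = subprocess.run(command, capture_output=True, text=True)
    problems = []
    failed = result.returncode != 0
    if failed != expect_failure:
        problems.append(
            f"exit code {result.returncode}, expect_failure={expect_failure}"
        )
    # For an expected failure compare the error message; otherwise compare standard output.
    actual = (result.stderr if expect_failure else result.stdout).strip()
    if actual != expected_output:
        problems.append(f"expected {expected_output!r}, got {actual!r}")
    return problems


# Stand-in commands with made-up messages, for illustration only.
ok_case = [sys.executable, "-c", "print('promoted: replica-2')"]
fail_case = [sys.executable, "-c", "import sys; sys.exit('no candidate found')"]
```

A test passes when `run_case` returns an empty list, for example `run_case(fail_case, "no candidate found", expect_failure=True)`. Checking the failure message, and not just the exit code, is what lets a test tell an expected failure apart from an unrelated one.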

As an example, with the system tests suite we can test the behavior of a master failover in a multi-DC, multi-region (obviously simulated) environment, where a server marked as “candidate” is lagging behind all others, with strict rules for cross-site/cross-region failovers, and where we still wish to see that particular replica get promoted as master. We can test not only the topology aspect of the failover, but also the failover hooks, Consul integration and its effects, etc.

Development

There are now multiple options for developers/contributors to build or just try out orchestrator.

Build on GitHub

As mentioned earlier, you actually don’t need a development environment. You can use orchestrator CI to build and generate a Linux/amd64 orchestrator binary, which you can download & deploy as you see fit.

I’ve signed up for the GitHub Codespaces beta program, and hope to make that available for orchestrator, as well.

Build via Docker

orchestrator offers various Docker build/run environments, accessible via the script/dock script:

`script/dock alpine` will build and spawn `orchestrator` on a minimal Alpine Linux
`script/dock test` will build and run the same CI tests (unit, integration) as mentioned earlier, but in your own Docker environment
`script/dock pkg` will build and generate `.rpm` and `.deb` packages

CI environment: the “full orchestrator experience”

This is the orchestrator amusement park. Run `script/dock system` to spawn the aforementioned CI environment used in system tests, and on top of that, an orchestrator setup fully integrated with that system.

So that’s an orchestrator-MySQL topology-Consul-HAProxy setup, where orchestrator already has the credentials for, and pre-loads, the MySQL topology, and is pre-configured to update Consul upon failover, with HAProxy config populated by consul-template, heartbeat injection, and more. It resembles the HA setup at GitHub, and in the future I expect to provide alternate setups (on top).

Once in that Docker environment, one can try running relocations and failovers, test orchestrator’s behavior, etc.

Community

GitHub recently announced GitHub Discussions; think of a Stack Overflow-like place within one’s repo to ask questions, discuss, and vote on answers. It’s expected to be available this summer. When it is, I’ll encourage the community to use it instead of today’s orchestrator-mysql Google Group and, of course, the many questions posted as Issues.

There’s been a bunch of PRs merged recently, with more to come later on. I’m grateful for all contributions. Please understand if I’m still slow to respond.