Open Source – code.openark.org http://shlomi-noach.github.io/blog/ Blog by Shlomi Noach Tue, 26 May 2020 17:52:52 +0000 en-US hourly 1 https://wordpress.org/?v=5.3.3 32412571 orchestrator on DB AMA: show notes https://shlomi-noach.github.io/blog/mysql/orchestrator-on-db-ama-show-notes https://shlomi-noach.github.io/blog/mysql/orchestrator-on-db-ama-show-notes#respond Tue, 26 May 2020 17:52:52 +0000 https://shlomi-noach.github.io/blog/?p=8101 Earlier today I presented orchestrator on DB AMA. Thank you to the organizers Morgan Tocker, Liz van Dijk and Frédéric Descamps for hosting me, and thank you to all who participated!

This was a no-slides, all command-line walkthrough of some of orchestrator‘s capabilities, highlighting refactoring, topology analysis, takeovers and failovers, and discussing a bit of scripting and HTTP API tips.

The recording is available on YouTube (also embedded on https://dbama.now.sh/#history).

To present orchestrator, I used the new shiny docker CI environment; it’s a single docker image running orchestrator, a 4-node MySQL replication topology (courtesy dbdeployer), heartbeat injection, Consul, consul-template and HAProxy. You can run it, too! Just clone the orchestrator repo, then run:

./script/dock system

From there, you may follow the same playbook I used in the presentation, available as orchestrator-demo-playbook.sh.

Hope you find the presentation and the playbook to be useful resources.

]]>
https://shlomi-noach.github.io/blog/mysql/orchestrator-on-db-ama-show-notes/feed 0 8101
orchestrator: what’s new in CI, testing & development https://shlomi-noach.github.io/blog/mysql/orchestrator-whats-new-in-ci-testing-development https://shlomi-noach.github.io/blog/mysql/orchestrator-whats-new-in-ci-testing-development#respond Mon, 11 May 2020 08:01:08 +0000 https://shlomi-noach.github.io/blog/?p=8077 Recent focus on development & testing yielded with new orchestrator environments and offerings for developers and with increased reliability and trust. This post illustrates the new changes, and see Developers section on the official documentation for more details.

Testing

In the past four years orchestrator was developed at GitHub, and using GitHub’s environments for testing. This is very useful for testing orchestrator‘s behavior within GitHub, interacting with its internal infrastructure, and validating failover behavior in a production environment. These tests and their results are not visible to the public, though.

Now that orchestrator is developed outside GitHub (that is, outside GitHub the company, not GitHub the platform) I wanted to improve on the testing framework, making it visible, accessible and contribute-able to the community. Thankfully, the GitHub platform has much to offer on that front and orchestrator now uses GitHub Actions more heavily for testing.

GitHub Actions provide a way to run code in a container in the context of the repository. The most common use case is to run CI tests on receiving a Pull Request. Indeed, when GitHub Actions became available, we switched out of Travis CI and into Actions for orchestrator‘s CI.

Today, orchestrator runs three different tests:

  • Build, unit testing, integration testing, code & doc validation
  • Upgrade testing
  • System testing

To highlight what each does:

Build, unit testing, integration testing

Based on the original CI (and possibly will split into distinct tests), this CI Action compiles the code, runs unit tests, runs the suite of integration tests (spins up both MySQL and SQLite databases and runs a series of tests on each backend), this CI job is the “basic” test to see that the contributed code even makes sense.

What’s new in this test is that it now produces an artifact: an orchestrator binary for Linux/amd64. This is again a feature for GitHub Actions; the artifact is kept for a couple months or so per Actions retention policy. Here‘s an example; by the time you read this the binary artifact may or may not still be there.

This means you don’t actually need a development environment on your laptop to be able to build and orchestrator binary. More on this later.

Upgrade testing

Until recently not formalized; I’d test upgrades by deploying them internally at GitHub onto a staging environment. Now upgrades are tested per Pull Request: we spin up a container, deploy orchestrator from master branch using both MySQL and SQLite backends, then checkout the PR branch, and redeploy orchestrator using the existing backends — this verifies that at least backend-database wise, there’s not upgrade errors.

At this time the test only validates the database changes are applicable; in the future this may expand onto more elaborate tests.

System testing

I’m most excited about this one. Taking ideas from our approach to testing gh-ost with dbdeployer, I created https://github.com/openark/orchestrator-ci-env, which offers a full blown testing enviroment for orchestrator, including a MySQL replication topology (courtesy dbdeployer), Consul, HAProxy and more.

This CI testing environment can also serve as a playground in your local docker setup, see shortly.

The system tests suite offers full blown cluster-wide operations such as graceful takeovers, master failovers, errant GTID transaction analysis and recovery and more. The suite utilizes the CI testing environment, breaks it, rebuilds it, validates it… Expects specific output, expects specific failure messages, specific analysis, specific outcomes.

As example, with the system tests suite, we can test the behavior of a master failover in a multi-DC, multi-region (obviously simulated) environment, where a server marked as “candidate” is lagging behind all others, with strict rules for cross-site/cross-region failovers, and still we wish to see that particular replica get promoted as master. We can test not only the topology aspect of the failover, but also the failover hooks, Consul integration and its effects, etc.

Development

There’s now multiple options for developers/contributors to build or just try out orchestrator.

Build on GitHub

As mentioned earlier, you actually don’t need a development environment. You can use orchestrator CI to build and generate a Linux/amd64 orchestrator binary, which you can download & deploy as you see fit.

I’ve signed up for the GitHub Codespaces beta program, and hope to make that available for orchestrator, as well.

Build via Docker

orchestrator offers various Docker build/run environments, accessible via the script/dock script:

  • `script/dock alpine` will build and spawn `orchestrator` on a minimal alpine linux
  • `script/dock test` will build and run the same CI tests (unit, integration) as mentioned earlier, but on your own docker environemtn
  • `script/dock pkg` will build and generate `.rpm` and `.deb` packages

CI environment: the “full orchestrator experience”

This is the orchestrator amusement park. Run script/dock system to spawn the aforementioned CI environment used in system tests, and on top of that, an orchestrator setup fully integrated with that system.

So that’s an orchestrator-MySQL topology-Consul-HAProxy setup, where orchestrator already has the credentials for, and pre-loads the MySQL topology, pre-configured to update Consul upon failover, HAProxy config populated by consul-template, heartbeat injection, and more. It resembles the HA setup at GitHub, and in the future I expect to provide alternate setups (on top).

Once in that docker environment, one can try running relocations, failovers, test orchestrator‘s behavior, etc.

Community

GitHub recently announced GitHub Discussions ; think a stackoverflow like place within one’s repo to ask questions, discuss, vote on answers. It’s expected to be available this summer. When it does, I’ll encourage the community to use it instead of today’s orchestrator-mysql Google Group and of course the many questions posted as Issues.

There’s been a bunch of PRs merged recently, with more to come later on. I’m grateful for all contributions. Please understand if I’m still slow to respond.

]]>
https://shlomi-noach.github.io/blog/mysql/orchestrator-whats-new-in-ci-testing-development/feed 0 8077
The state of Orchestrator, 2020 (spoiler: healthy) https://shlomi-noach.github.io/blog/mysql/the-state-of-orchestrator-2020-spoiler-healthy-2 https://shlomi-noach.github.io/blog/mysql/the-state-of-orchestrator-2020-spoiler-healthy-2#respond Tue, 18 Feb 2020 19:14:12 +0000 https://shlomi-noach.github.io/blog/?p=8016 This post serves as a pointer to my previous announcement about The state of Orchestrator, 2020.

Thank you to Tom Krouper who applied his operational engineer expertise to content publishing problems.

]]>
https://shlomi-noach.github.io/blog/mysql/the-state-of-orchestrator-2020-spoiler-healthy-2/feed 0 8016
The state of Orchestrator, 2020 (spoiler: healthy) https://shlomi-noach.github.io/blog/mysql/the-state-of-orchestrator-2020-spoiler-healthy https://shlomi-noach.github.io/blog/mysql/the-state-of-orchestrator-2020-spoiler-healthy#respond Tue, 18 Feb 2020 08:09:05 +0000 https://shlomi-noach.github.io/blog/?p=7996 Yesterday was my last day at GitHub, and this post explains what this means for orchestrator. First, a quick historical review:

  • 2014: I began work on orchestrator at Outbrain, as https://github.com/outbrain/orchestrator. I authored several open source projects while working for Outbrain, and created orchestrator to solve discovery, visualization and simple refactoring needs. Outbrain was happy to have the project developed as a public, open source repo from day 1, and it was released under the Apache 2 license. Interestingly, the idea to develop orchestrator came after I attended Percona Live Santa Clara 2014 and watched “ChatOps: How GitHub Manages MySQL” by one Sam Lambert.
  • 2015: Joined Booking.com where my main focus was to redesign and solve issues with the existing high availability setup. With Booking.com’s support, I continued work on orchestrator, pursuing better failure detection and recovery processes. Booking.com was an incredible playground and testbed for orchestrator, a massive deployment of multiple MySQL/MariaDB flavors and configuration.
  • 2016 – 2020: Joined GitHub. GitHub adopted orchestrator and I developed it under GitHub’s own org, at https://github.com/github/orchestrator. It became a core component in github.com’s high availability design, running failure detection and recoveries across sites and geographical regions, with more to come. These 4+ years have been critical to orchestrator‘s development and saw its widespread use. At this time I’m aware of multiple large-scale organizations using orchestrator for high availability and failovers. Some of these are GitHub, Booking.com, Shopify, Slack, Wix, Outbrain, and more. orchestrator is the underlying failover mechanism for vitess, and is also included in Percona’s PMM. These years saw a significant increase in community adoption and contributions, in published content, such as Pythian and Percona technical blog posts, and, not surprisingly, increase in issues and feature requests.


2020

GitHub was very kind to support moving the orchestrator repo under my own https://github.com/openark org. This means all issues, pull requests, releases, forks, stars and watchers have automatically transferred to the new location: https://github.com/openark/orchestrator. The old links do a “follow me” and implicitly direct to the new location. All external links to code and docs still work. I’m grateful to GitHub for supporting this transfer.

I’d like to thank all the above companies for their support of orchestrator and of open source in general. Being able to work on the same product throughout three different companies is mind blowing and an incredible opportunity. orchestrator of course remains open source and licensed with Apache 2. Existing Copyrights are unchanged.

As for what’s next: some personal time off, please understand if there’s delays to reviews/answers. My intention is to continue developing orchestrator. Naturally, the shape of future development depends on how orchestrator meets my future work. Nothing changes in that respect: my focus on orchestrator has always been first and foremost the pressing business needs, and then community support as possible. There are some interesting ideas by prominent orchestrator users and adopters and I’ll share more thoughts in due time.

 

]]>
https://shlomi-noach.github.io/blog/mysql/the-state-of-orchestrator-2020-spoiler-healthy/feed 0 7996
Un-split brain MySQL via gh-mysql-rewind https://shlomi-noach.github.io/blog/mysql/un-split-brain-mysql-via-gh-mysql-rewind https://shlomi-noach.github.io/blog/mysql/un-split-brain-mysql-via-gh-mysql-rewind#respond Tue, 05 Mar 2019 13:51:43 +0000 https://shlomi-noach.github.io/blog/?p=7928 We are pleased to release gh-mysql-rewind, a tool that allows us to move MySQL back in time, automatically identify and rewind split brain changes, restoring a split brain server into a healthy replication chain.

I recently had the pleasure of presenting gh-mysql-rewind at FOSDEM. Video and slides are available. Consider following along with the video.

Motivation

Consider a split brain scenario: a “standard” MySQL replication topology suffered network isolation, and one of the replicas was promoted as new master. Meanwhile, the old master was still receiving writes from co-located apps.

Once the network isolation is over, we have a new master and an old master, and a split-brain situation: some writes only took place on one master; others only took place on the other. What if we wanted to converge the two? What paths do we have to, say, restore the old, demoted master, as a replica of the newly promoted master?

The old master is unlikely to agree to replicate from the new master. Changes have been made. AUTO_INCREMENT values have been taken. UNIQUE constraints will fail.

A few months ago, we at GitHub had exactly this scenario. An entire data center went network isolated. Automation failed over to a 2nd DC. Masters in the isolated DC meanwhile kept receiving writes. At the end of the failover we ended up with a split brain scenario – which we expected. However, an additional, unexpected constraint forced us to fail back to the original DC.

We had to make a choice: we’ve already operated for a long time in the 2nd DC and took many writes, that we were unwilling to lose. We were OK to lose (after auditing) the few seconds of writes on the isolated DC. But, how do we converge the data?

Backups are the trivial way out, but they incur long recovery time. Shipping backup data over the network for dozens of servers takes time. Restore time, catching up with changes since backup took place, warming up the servers so that they can handle production traffic, all take time.

Could we have reduces the time for recovery?

There are multiple ways to do that: local backups, local delayed replicas, snapshots… We have embarked on several. In this post I wish to outline gh-mysql-rewind, which programmatically identifies the rogue (aka “bad”) transactions on the network isolated master, rewinds/reverts them, applies some bookkeeping and restores the demoted master as a healthy replica under the newly promoted master, thereby prepared to be promoted if needed.

General overview

gh-mysql-rewind is a shell script. It utilizes multiple technologies, some of which do not speak to each other, to be able to do the magic. It assumes and utilizes the following:

Some breakdown follows.

GTID

MySQL GTIDs keep track of all transactions executed on a given server. GTIDs indicate which server (UUID) originated a write, and ranges of transaction sequences. In a clean state, only one writer will generate GTIDs, and on all the replicas we would see the same GTID set, originated with the writer’s UUID.

In a split brain scenario, we would see divergence. It is possible to use GTID_SUBTRACT(old_master-GTIDs, new-master-GTIDs) to identify the exact set of transactions executed on the old, demoted master, right after the failover. This is the essence of the split brain.

For example, assume that just before the network partition, GTID on the master was 00020192-1111-1111-1111-111111111111:1-5000. Assume after the network partition the new master has UUID of 00020193-2222-2222-2222-222222222222. It began to take writes, and after some time its GTID set showed 00020192-1111-1111-1111-111111111111:1-5000,00020193-2222-2222-2222-222222222222:1-200.

On the demoted master, other writes took place, leading to the GTID set 00020192-1111-1111-1111-111111111111:1-5042.

We will run…

SELECT GTID_SUBTRACT(
  '00020192-1111-1111-1111-111111111111:1-5042',
  '00020192-1111-1111-1111-111111111111:1-5000,00020193-2222-2222-2222-222222222222:1-200'
);

> '00020192-1111-1111-1111-111111111111:5001-5042'

…to identify the exact set of “bad transactions” on the demoted master.

Row Based Replication

With row based replication, and with FULL image format, each DML (INSERT, UPDATE, DELETE) writes to the binary log the complete row data before and after the operation. This means the binary log has enough information for us to revert the operation.

Flashback

Developed by Alibaba, flashback has been incorporated in MariaDB. MariaDB’s mysqlbinlog utility supports a --flashback flag, which interprets the binary log in a special way. Instead of printing out the events in the binary log in order, it prints the inverted operations in reverse order.

To illustrate, let’s assume this pseudo-code sequence of events in the binary log:

insert(1, 'a')
insert(2, 'b')
insert(3, 'c')
update(2, 'b')->(2, 'second')
update(3, 'c')->(3, 'third')
insert(4, 'd')
delete(1, 'a')

A --flashback of this binary log would produce:

insert(1, 'a')
delete(4, 'd')
update(3, 'third')->(3, 'c')
update(2, 'second')->(2, 'b')
delete(3, 'c')
delete(2, 'b')
delete(1, 'a')

Alas, MariaDB and flashback do not speak MySQL GTID language. GTIDs are one of the major points where MySQL and MariaDB have diverged beyond compatibility.

The output of MariaDB’s mysqlbinlog --flashback has neither any mention of GTIDs, nor does the tool take notice of GTIDs in the binary logs in the first place.

gh-mysql-rewind

This is where we step in. GTIDs provide the information about what went wrong. flashback has the mechanism to generate the reverse sequence of statements. gh-mysql-rewind:

  • uses GTIDs to detect what went wrong
  • correlates those GTID entries with binary log files: identifies which binary logs actually contain those GTID events
  • invokes MariaDB’s mysqlbinlog --flashback to generate the reverse of those binary logs
  • injects (dummy) GTID information into the output
  • computes ETA

This last part is worth elaborating. We have created a time machine. We have the mechanics to make it work. But as any Sci-Fi fan knows, one of the most important parts of time travel is knowing ahead where (when) you are going to land. Are you back in the Renaissance? Or are you suddenly to appear on board the French Revolution? Better dress accordingly.

In our scenario it is not enough to move MySQL back in time to some consistent state. We want to know at what time we landed, so that we can instruct the rewinded server to join the replication chain as a healthy replica. In MySQL terms, we need to make MySQL “forget” everything that ever happened after the split brain: not only in terms of data (which we already did), but in terms of GTID history.

gh-mysql-rewind will do the math to project, ahead of time, at what “time” (i.e. GTID set) our time machine arrived. It will issue a `RESET MASTER; SET GLOBAL gtid_purged=’gtid-of-the-landing-time'” to make our re-winded MySQL consistent not only with some past dataset, but also with its own perception of the point in time where that dataset existed.

Limitations

Some limitations are due to MariaDB’s incompatibility with MySQL, some are due to MySQL DDL nature, some due to the fact gh-mysql-rewind is a shell script.

  • Cannot rewind DDL. DDLs are silently ignored, and will impose a problem when trying to re-apply them.
  • JSON, POINT data types are not supported.
  • The logic rewinds the MySQL server farther into the past than strictly required. This simplifies the code considerably, but imposed superfluous time to rewind+reapply, i.e. time to recover.
  • Currently, this only works one server at a time. If a group of 10 servers were network isolated together, the operation would need to run on each of these 10 servers.
  • Runs locally on each server. Requires both MySQL’s mysqlbinlog as well as MariaDB’s mysqlbinlog.

Testing

There’s lot of moving parts to this mechanism. A mixture of technologies that don’t normally speak to each other, injection of data, prediction of ETA… How reliable is all this?

We run continuous gh-mysql-rewind testing in production to consistently prove that it works as expected. Our testing uses a non-production, dedicated, functional replica. It contaminates the data on the replica. It lets gh-mysql-rewind automatically move it back in time, it joins the replica back into the healthy chain.

That’s not enough. We actually create a scenario where we can predict, ahead of testing, what the time-of-arrival will be. We checksum the data on that replica at that time. After contaminating and effectively breaking replication, we expect gh-mysql-rewind to revert the changes back to our predicted point in time. We checksum the data again. We expect 100% match.

See the video or slides for more detail on our testing setup.

Status

At this time the tool in one of several solutions we hope to never need to employ. It is stable and tested. We are looking forward to a promising MySQL development that will provide GTID-revert capabilities using standard commands, such as SELECT undo_transaction('00020192-1111-1111-1111-111111111111:5042').

We have released gh-mysql-rewind as open source, under the MIT license. The public release is a stripped down version of our own script, which has some GitHub-specific integration. We have general ideas in incorporating this functionality into higher level tools.

gh-mysql-rewind is developed by the database-infrastructure team at GitHub.

]]>
https://shlomi-noach.github.io/blog/mysql/un-split-brain-mysql-via-gh-mysql-rewind/feed 0 7928
Using dbdeployer in CI tests https://shlomi-noach.github.io/blog/mysql/using-dbdeployer-in-ci-tests https://shlomi-noach.github.io/blog/mysql/using-dbdeployer-in-ci-tests#respond Tue, 20 Feb 2018 07:29:58 +0000 https://shlomi-noach.github.io/blog/?p=7848 I was very pleased when Giuseppe Maxia (aka datacharmer) unveiled dbdeployer in his talk at pre-FOSDEM MySQL day. The announcement came just at the right time. I wish to briefly describe how we use dbdeployer (work in progress).

The case for gh-ost

A user opened an issue on gh-ost, and the user was using MySQL 5.5. gh-ost is being tested on 5.7 where the problem does not reproduce. A discussion with Gillian Gunson raised the concern of not testing on all versions. Can we run gh-ost tests for all MySQL/Percona/MariaDB versions? Should we? How easy would it be?

gh-ost tests

gh-ost has three different test types:

  • Unit tests: these are plain golang logic tests which are very easy and quick to run.
  • Integration tests: the topic of this post, see following. Today these do not run as part of an automated CI testing.
  • System tests: putting our production tables to the test, continuously migrating our production data on dedicated replicas, verifying checksums are identical and data is intact, read more.

Unit tests are already running as part of automated CI (every PR is subjected to those tests). Systems tests are clearly tied to our production servers. What’s the deal with the integration tests?

gh-ost integration tests

The gh-ost integration tests are a suite of scenarios which verify gh-ost‘s operation is sound. These scenarios are mostly concerned with data types, special alter statements etc. Is converting DATETIME to TIMESTAMP working properly? Are latin1 columns being updated correctly? How about renaming a column? Changing a PRIMARY KEY? Column reorder? 5.7 JSON values? And so on. Each test will recreate the table, run migration, stop replication, check the result, resume replication…

The environment for these tests is a master-replica setup, where gh-ost modifies on the table on the replica and can then checksum or compare both the original and the altered ghost table.

We develop gh-ost internally at GitHub, but it’s also an open source project. We have our own internal CI environment, but then we also wish the public to have visibility into test failures (so that a user can submit a PR and get a reliable automated feedback). We use Travis CI for the public facing tests.

To run gh-ost‘s integration tests as described above as part of our CI tests we should be able to:

  • Create a master/replica setup in CI.
  • Actually, create a master/replica setup in any CI, and namely in Travis CI.
  • Actually, create multiple master/replica setups, of varying versions and vendors, in any ci, including both our internal CI and Travis CI.

I was about to embark on a MySQL Sandbox setup, which I was not keen on. But FOSDEM was around the corner and I had other things to complete beforehand. Lucky me, dbdeplyer stepped in.

dbdeployer

dbdeployer is a rewrite, a replacement to MySQL Sandbox. I’ve been using MySQL Sandbox for many years, and my laptop is running two sandboxes at this very moment. But MySQL Sandbox has a few limitations or complications:

  • Perl. Versions of Perl. Dependencies of packages of Perl. I mean, it’s fine, we can automate that.
  • Command line flag complexity: I always get lost in the complexity of the flags.
  • Get it right or prepare for battle: if you deployed something, but not the way you wanted, there’s sometimes limbo situations where you cannot re-deploy the same sandbox again, or you should start deleting files everywhere.
  • Deploy, not remove. Adding a sandbox is one thing. How about removing it?

dbdeployer is a golang rewrite, which solves the dependency problem. It ships as a single binary and nothing more is needed. It is simple to use. While it generates the equivalence of a that of a MySQL Sandbox, it does so with less command line flags and less confusion. There’s first class handling of the MySQL binaries: you unpack MySQL tarballs, you can list what’s available. You can then create sandbox environments: replication, standalone, etc. You can then delete those.

It’s pretty simple and I have not much more to add — which is the best thing about it.

So, with dbdeployer it is easy to create a master/replica. Something like:

dbdeployer unpack path/to/5.7.21.tar.gz --unpack-version=5.7.21 --sandbox-binary ${PWD}/sandbox/binary
dbdeployer replication 5.7.21 --nodes 2 --sandbox-binary ${PWD}/sandbox/binary --sandbox-home ${PWD}/sandboxes --gtid --my-cnf-options log_slave_updates --my-cnf-options log_bin --my-cnf-options binlog_format=ROW

Where does it all fit in, and what about the MySQL binaries though?

So, should dbdeployer be part of the gh-ost repo? And where does one get those MySQL binaries from? Are they to be part of the gh-ost repo? Aren’t they a few GB to extract?

Neither dbdeployer nor MySQL binaries should be added to the gh-ost repo. And fortunately, Giuseppe also solved the MySQL binaries problem.

The scheme I’m looking at right now is as follows:

  • A new public repo, gh-ost-ci-env is created. This repo includes:
    • dbdeployer compiled binaries
    • Minimal MySQL tarballs for selected versions. Those tarballs are reasonably small: between `14MB` and `44MB` at this time.
  • gh-ost‘s CI to git clone https://github.com/github/gh-ost-ci-env.git (code)
  • gh-ost‘s CI to setup a master/replica sandbox (one, two).
  • Kick the tests.

The above is a work in progress:

  • At this time only runs a single MySQL version.
  • There is a known issue where after a test, replication may take time to resume. Currently on slower boxes (such as the Travis CI containers) this leads to failures.

Another concern I have at this time is build time. For a single MySQL version, it takes some 5-7 minutes on my local laptop to run all integration tests. It will be faster on our internal CI. It will be considerably slower on Travis CI, I can expect between 10m - 15m. Add multiple versions and we’re looking at a 1hr build. Such long build times will affect our development and delivery times, and so we will split them off the main build. I need to consider what the best approach is.

That’s all for now. I’m pretty excited for the potential of dbdeployer and will be looking into incorporating the same for orchestrator CI tests.

 

 

]]>
https://shlomi-noach.github.io/blog/mysql/using-dbdeployer-in-ci-tests/feed 0 7848
orchestrator 3.0.6: faster crash detection & recoveries, auto Pseudo-GTID, semi-sync and more https://shlomi-noach.github.io/blog/mysql/orchestrator-3-0-6-faster-crash-detection-recoveries-auto-pseudo-gtid-semi-sync-and-more https://shlomi-noach.github.io/blog/mysql/orchestrator-3-0-6-faster-crash-detection-recoveries-auto-pseudo-gtid-semi-sync-and-more#respond Mon, 29 Jan 2018 09:40:05 +0000 https://shlomi-noach.github.io/blog/?p=7833 orchestrator 3.0.6 is released and includes some exciting improvements and features. It quickly follows up on 3.0.5 released recently, and this post gives a breakdown of some notable changes:

Faster failure detection

Recall that orchestrator uses a holistic approach for failure detection: it reads state not only from the failed server (e.g. master) but also from its replicas. orchestrator now detects failure faster than before:

  • A detection cycle has been eliminated, leading to quicker resolution of a failure. On our setup, where we poll servers every 5sec, failure detection time dropped from 7-10sec to 3-5sec, keeping reliability. The reduction in time does not lead to increased false positives.
    Side note: you may see increased not-quite-failure analysis such as “I can’t see the master” (UnreachableMaster).
  • Better handling of network scenarios where packets are dropped. Instead of hanging till TCP timeout, orchestrator now observes server discovery asynchronously. We have specialized failover tests that simulate dropped packets. The change reduces detection time by some 5sec.

Faster master recoveries

Promoting a new master is a complex task which attempts to promote the best replica out of the pool of replicas. It’s not always the most up-to-date replica. The choice varies depending on replica configuration, version, and state.

With recent changes, orchestrator is able to to recognize, early on, that the replica it would like to promote as master is ideal. Assuming that is the case, orchestrator is able to immediate promote it (i.e. run hooks, set read_only=0 etc.), and run the rest of the failover logic, i.e. the rewiring of replicas under the newly promoted master, asynchronously.

This allows the promoted server to take writes sooner, even while its replicas are not yet connected. It also means external hooks are executed sooner.

Between faster detection and faster recoveries, we’re looking at some 10sec reduction in overall recovery time: from moment of crash to moment where a new master accepts writes. We stand now at < 20sec in almost all cases, and < 15s in optimal cases. Those times are measured on our failover tests.

We are working on reducing failover time unrelated to orchestrator and hope to update soon.

Automated Pseudo-GTID

As reminder, Pseudo-GTID is an alternative to GTID, without the kind of commitment you make with GTID. It provides similar “point your replica under any other server” behavior GTID allows.

There’s still many setups out there where GTID is not (yet?) deployed and enabled. However, Pseudo-GTID is often misunderstood, and though I’ve blogged and presented Pseudo-GTID many times in the past, I still find myself explaining to people the setup is simple and does not involve change to one’s topologies.

Well, it just got simpler. orchestrator is now able to automatically inject Pseudo-GTID for you.

Say the word: "AutoPseudoGTID": true, grant the necessary privilege, and your non-GTID topology is suddenly supercharged with magical Pseudo-GTID tokens that provide you with:

  • Arbitrary relocation of replicas
  • Automated or manual failovers (masters and intermediate masters)
  • Vendor freedom: runs on Oracle MySQL, Percona Server, MariaDB, or all of the above at the very same time.
  • Version freedom (still on 5.5? No problem. Oh, this gets you crash-safe replication as extra bonus, too)

Auto-Pseudo-GTID further simplifies the infrastructure in that you no longer need to take care of injecting Pseudo-GTID onto the master as well as handle master identity changes. No more event_scheduler to enable/disable nor services to start/stop.

More and more setups are moving to GTID. We may, too! But I find it peculiar that Pseudo-GTID was suggested 4 years ago, when 5.6 GTID was already released, and still many setups are not yet running GTID. If you’re not using GTID, please try Pseudo-GTID! Read more.

Semi-sync support

Semi-sync has been internally supported via a specialized patch contributed by Vitess, to flag a server as semi-sync-able and handle enablement of semi-sync upon master failover.

orchestrator now supports semi-sync more generically. You may use orchestrator to enable/disable semi-sync master/replica side, via orchestrator -c enable-semi-sync-master, orchestrator -c enable-semi-sync-replica, orchestrator -c disable-semi-sync-master, orchestrator -c disable-semi-sync-replica commands (or API equivalent).

The API will also tell you whether semi-sync is enabled on instances. Noteworthy that configured != enabled. A server can be configured with rpl_semi_sync_master_enabled=ON, but if no semi-sync replicas are found, the Rpl_semi_sync_master_status state is OFF.

More

UI changes, removal of prepared statements, documentation updates, raft updates…

orchestrator is free and open source and released under the Apache 2 license. It is authored at and used by GitHub.

I’ll be presenting orchestrator/raft in FOSDEM next week, at the MySQL and Friends Room.

]]>
https://shlomi-noach.github.io/blog/mysql/orchestrator-3-0-6-faster-crash-detection-recoveries-auto-pseudo-gtid-semi-sync-and-more/feed 0 7833
gh-ost 1.0.42 released: JSON support, optimizations https://shlomi-noach.github.io/blog/mysql/gh-ost-1-0-42-released-json-support-optimizations https://shlomi-noach.github.io/blog/mysql/gh-ost-1-0-42-released-json-support-optimizations#comments Thu, 14 Sep 2017 07:26:16 +0000 https://shlomi-noach.github.io/blog/?p=7799 gh-ost 1.0.42 is released and available for download.

JSON

MySQL 5.7’s JSON data type is now supported.

There is a soft-limitation, that your JSON may not be part of your PRIMARY KEY. Currently this isn’t even supported by MySQL anyhow.

Performance

Two noteworthy changes are:

  • Client side prepared statements reduce network traffic and round trips to the server.
  • Range query iteration avoids creating temporary tables and filesorting.

We’re not running benchmarks at this time to observe performance gains.

5.7

More tests validating 5.7 compatibility (at this time GitHub runs MySQL 5.7 in production).

Ongoing

Many other changes included.

We are grateful for all community feedback in form of open Issues, Pull Requests and questions!

gh-ost is authored by GitHub. It is free and open source and is available under the MIT license.

Speaking

In two weeks time, Jonah Berquist will present gh-ost: Triggerless, Painless, Trusted Online Schema Migrations at Percona Live, Dublin.

Tom Krouper and myself will present MySQL Infrastructure Testing Automation at GitHub, where, among other things, we describe how we test gh-ost in production.

]]>
https://shlomi-noach.github.io/blog/mysql/gh-ost-1-0-42-released-json-support-optimizations/feed 4 7799
Speaking at Percona Live Dublin: keynote, orchestrator tutorial, MySQL testing automation https://shlomi-noach.github.io/blog/mysql/speaking-at-percona-live-dublin-keynote-orchestrator-tutorial-mysql-testing-automation https://shlomi-noach.github.io/blog/mysql/speaking-at-percona-live-dublin-keynote-orchestrator-tutorial-mysql-testing-automation#respond Wed, 13 Sep 2017 05:20:56 +0000 https://shlomi-noach.github.io/blog/?p=7794 I’m looking forward to a busy Percona Live Dublin conference, delivering three talks. Chronologically, these are:

  • Practical orchestrator tutorial
    Attend this 3 hour tutorial for a thorough overview on orchestrator: what, why, how to configure, best advice, deployments, failovers, security, high availability, common operations, …
    We will of course discuss the new orchestrator/raft setup and share our experience running it in production.
    The tutorial will allow for general questions from the audience and open discussions.
  • Why Open Sourcing Our Database Tooling was the Smart Decision
    What it says. A 10 minute journey advocating for open sourcing infrastructure.
  • MySQL Infrastructure Testing Automation at GitHub
    Co-presenting with Tom Krouper, we share how & why we run infrastructure tests in and near production that gives us trust in many of our ongoing, ever changing operations. Essentially this is “why you should feel OK trusting us with your data”.

See you there!

]]>
https://shlomi-noach.github.io/blog/mysql/speaking-at-percona-live-dublin-keynote-orchestrator-tutorial-mysql-testing-automation/feed 0 7794
orchestrator 3.0.2 GA released: raft consensus, SQLite https://shlomi-noach.github.io/blog/mysql/orchestrator-3-0-2-ga-released-raft-consensus-sqlite https://shlomi-noach.github.io/blog/mysql/orchestrator-3-0-2-ga-released-raft-consensus-sqlite#comments Tue, 12 Sep 2017 08:39:10 +0000 https://shlomi-noach.github.io/blog/?p=7784 orchestrator 3.0.2 GA is released and available for download (see also packagecloud repository).

3.0.2 is the first stable release in the 3.0* series, introducing (recap from 3.0 pre-release announcement):

orchestrator/raft

Raft is a consensus protocol, supporting leader election and consensus across a distributed system.  In an orchestrator/raft setup orchestrator nodes talk to each other via raft protocol, form consensus and elect a leader. Each orchestrator node has its own dedicated backend database. The backend databases do not speak to each other; only the orchestrator nodes speak to each other.

No MySQL replication setup needed; the backend DBs act as standalone servers. In fact, the backend server doesn’t have to be MySQL, and SQLiteis supported. orchestrator now ships with SQLite embedded, no external dependency needed.

For details, please refer to the documentation:

SQLite

Suggested and requested by many, is to remove orchestrator‘s own dependency on a MySQL backend. orchestrator now supports a SQLite backend.

SQLite is a transactional, relational, embedded database, and as of 3.0 it is embedded within orchestrator, no external dependency required.

orchestrator-client

orchestrator-client is a client shell script which mimics the command line interface, while running curl | jq requests against the HTTP API. It stands to simplify your deployments: interacting with the orchestrator service via orchestrator-client is easier and only requires you to place a shell script (this is as opposed to installing the orchestrator binary + configuration file).

orchestrator-client is the way to interact with your orchestrator/raft cluster. orchestrator-client now has its own RPM/deb release package.

You may still use the web interface, web API ; and a special --ignore-raft-setup keeps power at your hand (use at your own risk).

State of orchestrator/raft

orchestrator/raft is a big change:

  • In the way it is deployed
  • In the way it is operated
  • In the high availability it provides
  • and more

This is why it has been tested in production for a few months.

orchestrator/raft now runs in production at GitHub ; we’ve decommissioned the “old” orchestrator setup having run both in parallel for a while. It drives our failovers and deploys over three data centers.

We are using MySQL as backend to our orchestrator cluster. We will introduce more staging tests for SQLite-based setups.

Roadmap

There’s much to do, and we chose to release a version that has a way to go. We expect:

  • Dynamic raft cluster join/leave operations (right now the cluster is static and configuration based)
  • New nodes joining the cluster to auto-populate data from the cluster. This is actually a built-in feature to the hashicorp/raft library that we use, however we intentionally did not use this functionality, and expect to re-introduce it. At this time, adding a newly provisioned node to the cluster requires a backup/restore or dump/load of DB data from an existing node.
  • Partitioning of probe tasks across nodes
  • Various thoughts on proxy integrations
  • More…

We will focus on making operations simpler, and of course keep stability and reliability at highest priority.

orchestrator tutorial

In two weeks time I will be presenting the practical orchestrator tutorial, a 3 hour practical walkthrough on deployment, configuration, reasoning and more.

]]>
https://shlomi-noach.github.io/blog/mysql/orchestrator-3-0-2-ga-released-raft-consensus-sqlite/feed 1 7784