orchestrator – code.openark.org http://shlomi-noach.github.io/blog/ Blog by Shlomi Noach Tue, 26 May 2020 17:52:52 +0000 en-US hourly 1 https://wordpress.org/?v=5.3.3 32412571 orchestrator on DB AMA: show notes https://shlomi-noach.github.io/blog/mysql/orchestrator-on-db-ama-show-notes https://shlomi-noach.github.io/blog/mysql/orchestrator-on-db-ama-show-notes#respond Tue, 26 May 2020 17:52:52 +0000 https://shlomi-noach.github.io/blog/?p=8101 Earlier today I presented orchestrator on DB AMA. Thank you to the organizers Morgan Tocker, Liz van Dijk and Frédéric Descamps for hosting me, and thank you to all who participated!

This was a no-slides, all command-line walkthrough of some of orchestrator‘s capabilities, highlighting refactoring, topology analysis, takeovers and failovers, and discussing a bit of scripting and HTTP API tips.

The recording is available on YouTube (also embedded on https://dbama.now.sh/#history).

To present orchestrator, I used the new shiny docker CI environment; it’s a single docker image running orchestrator, a 4-node MySQL replication topology (courtesy dbdeployer), heartbeat injection, Consul, consul-template and HAProxy. You can run it, too! Just clone the orchestrator repo, then run:

./script/dock system

From there, you may follow the same playbook I used in the presentation, available as orchestrator-demo-playbook.sh.

Hope you find the presentation and the playbook to be useful resources.

]]>
https://shlomi-noach.github.io/blog/mysql/orchestrator-on-db-ama-show-notes/feed 0 8101
orchestrator: what’s new in CI, testing & development https://shlomi-noach.github.io/blog/mysql/orchestrator-whats-new-in-ci-testing-development https://shlomi-noach.github.io/blog/mysql/orchestrator-whats-new-in-ci-testing-development#respond Mon, 11 May 2020 08:01:08 +0000 https://shlomi-noach.github.io/blog/?p=8077 Recent focus on development & testing yielded with new orchestrator environments and offerings for developers and with increased reliability and trust. This post illustrates the new changes, and see Developers section on the official documentation for more details.

Testing

In the past four years orchestrator was developed at GitHub, and using GitHub’s environments for testing. This is very useful for testing orchestrator‘s behavior within GitHub, interacting with its internal infrastructure, and validating failover behavior in a production environment. These tests and their results are not visible to the public, though.

Now that orchestrator is developed outside GitHub (that is, outside GitHub the company, not GitHub the platform) I wanted to improve on the testing framework, making it visible, accessible and contribute-able to the community. Thankfully, the GitHub platform has much to offer on that front and orchestrator now uses GitHub Actions more heavily for testing.

GitHub Actions provide a way to run code in a container in the context of the repository. The most common use case is to run CI tests on receiving a Pull Request. Indeed, when GitHub Actions became available, we switched out of Travis CI and into Actions for orchestrator‘s CI.

Today, orchestrator runs three different tests:

  • Build, unit testing, integration testing, code & doc validation
  • Upgrade testing
  • System testing

To highlight what each does:

Build, unit testing, integration testing

Based on the original CI (and possibly will split into distinct tests), this CI Action compiles the code, runs unit tests, runs the suite of integration tests (spins up both MySQL and SQLite databases and runs a series of tests on each backend), this CI job is the “basic” test to see that the contributed code even makes sense.

What’s new in this test is that it now produces an artifact: an orchestrator binary for Linux/amd64. This is again a feature for GitHub Actions; the artifact is kept for a couple months or so per Actions retention policy. Here‘s an example; by the time you read this the binary artifact may or may not still be there.

This means you don’t actually need a development environment on your laptop to be able to build and orchestrator binary. More on this later.

Upgrade testing

Until recently not formalized; I’d test upgrades by deploying them internally at GitHub onto a staging environment. Now upgrades are tested per Pull Request: we spin up a container, deploy orchestrator from master branch using both MySQL and SQLite backends, then checkout the PR branch, and redeploy orchestrator using the existing backends — this verifies that at least backend-database wise, there’s not upgrade errors.

At this time the test only validates the database changes are applicable; in the future this may expand onto more elaborate tests.

System testing

I’m most excited about this one. Taking ideas from our approach to testing gh-ost with dbdeployer, I created https://github.com/openark/orchestrator-ci-env, which offers a full blown testing enviroment for orchestrator, including a MySQL replication topology (courtesy dbdeployer), Consul, HAProxy and more.

This CI testing environment can also serve as a playground in your local docker setup, see shortly.

The system tests suite offers full blown cluster-wide operations such as graceful takeovers, master failovers, errant GTID transaction analysis and recovery and more. The suite utilizes the CI testing environment, breaks it, rebuilds it, validates it… Expects specific output, expects specific failure messages, specific analysis, specific outcomes.

As example, with the system tests suite, we can test the behavior of a master failover in a multi-DC, multi-region (obviously simulated) environment, where a server marked as “candidate” is lagging behind all others, with strict rules for cross-site/cross-region failovers, and still we wish to see that particular replica get promoted as master. We can test not only the topology aspect of the failover, but also the failover hooks, Consul integration and its effects, etc.

Development

There’s now multiple options for developers/contributors to build or just try out orchestrator.

Build on GitHub

As mentioned earlier, you actually don’t need a development environment. You can use orchestrator CI to build and generate a Linux/amd64 orchestrator binary, which you can download & deploy as you see fit.

I’ve signed up for the GitHub Codespaces beta program, and hope to make that available for orchestrator, as well.

Build via Docker

orchestrator offers various Docker build/run environments, accessible via the script/dock script:

  • `script/dock alpine` will build and spawn `orchestrator` on a minimal alpine linux
  • `script/dock test` will build and run the same CI tests (unit, integration) as mentioned earlier, but on your own docker environemtn
  • `script/dock pkg` will build and generate `.rpm` and `.deb` packages

CI environment: the “full orchestrator experience”

This is the orchestrator amusement park. Run script/dock system to spawn the aforementioned CI environment used in system tests, and on top of that, an orchestrator setup fully integrated with that system.

So that’s an orchestrator-MySQL topology-Consul-HAProxy setup, where orchestrator already has the credentials for, and pre-loads the MySQL topology, pre-configured to update Consul upon failover, HAProxy config populated by consul-template, heartbeat injection, and more. It resembles the HA setup at GitHub, and in the future I expect to provide alternate setups (on top).

Once in that docker environment, one can try running relocations, failovers, test orchestrator‘s behavior, etc.

Community

GitHub recently announced GitHub Discussions ; think a stackoverflow like place within one’s repo to ask questions, discuss, vote on answers. It’s expected to be available this summer. When it does, I’ll encourage the community to use it instead of today’s orchestrator-mysql Google Group and of course the many questions posted as Issues.

There’s been a bunch of PRs merged recently, with more to come later on. I’m grateful for all contributions. Please understand if I’m still slow to respond.

]]>
https://shlomi-noach.github.io/blog/mysql/orchestrator-whats-new-in-ci-testing-development/feed 0 8077
Pulling this blog out of Planet MySQL aggregator, over community concerns https://shlomi-noach.github.io/blog/mysql/pulling-this-blog-out-of-planet-mysql-aggregator-over-community-concerns https://shlomi-noach.github.io/blog/mysql/pulling-this-blog-out-of-planet-mysql-aggregator-over-community-concerns#respond Thu, 23 Apr 2020 15:26:24 +0000 https://shlomi-noach.github.io/blog/?p=8020 I’ve decided to pull this blog (https://shlomi-noach.github.io/blog/) out of the planet.mysql.com aggregator.

planet.mysql.com (formerly planetmysql.com) serves as a blog aggregator, and collects news and blog posts on various MySQL and its ecosystem topics. It collects some vendor and team blogs as well as “indie” blogs such as this one.

It has traditionally been the go-to place to catch up on the latest developments, or to read insightful posts. This blog itself has been aggregated in Planet MySQL for some eleven years.

Planet MySQL used to be owned by the MySQL community team. This recently changed with unwelcoming implications for the community.

I recently noticed how a blog post of mine, The state of Orchestrator, 2020 (spoiler: healthy), did not get aggregated in Planet MySQL. After a quick discussion and investigation, it was determined (and confirmed) it was filtered out because it contained the word “MariaDB”. It has later been confirmed that Planet MySQL now filters out posts indicating its competitors, such as MariaDB, PostgreSQL, MongoDB.

Planet MySQL is owned by Oracle and it is their decision to make. Yes, logic implies they would not want to publish a promotional post for a competitor. However, I wish to explain how this blind filtering negatively affects the community.

But, before that, I’d like to share that I first attempted to reach out to whoever is in charge of Planet MySQL at this time (my understanding is that this is a marketing team). Sadly, two attempts at reaching out to them individually, and another attempt at reaching out on behalf of a small group of individual contributors, yielded no response. The owners would not have audience with me, and would not hear me out. I find it disappointing and will let others draw morals.

Why filtering is harmful for the community

We recognize that planet.mysql.com is an important information feed. It is responsible for a massive ratio of the traffic on my blog, and no doubt for many others. Indie blog posts, or small-team blog posts, practically depend on planet.mysql.com to get visibility.

And this is particularly important if you’re an open source developer who is trying to promote an open source project in the MySQL ecosystem. Without this aggregation, you will get significantly less visibility.

But, open source projects in the MySQL ecosystem do not live in MySQL vacuum, and typically target/support MySQL, Percona Server and MariaDB. As examples:

  • DBDeployer should understand MariaDB versioning scheme

  • skeema needs to recognize MariaDB features not present in MySQL

  • ProxySQL needs to support MariaDB Galera queries

  • orchestrator needs to support MariaDB’s GTID flavor

Consider that a blog post of the form “Project version 1.2.3 now released!” is likely to mention things like “fixed MariaDB GTID setup” or “MariaDB 10.x now supported” etc. Consider just pointing out that “PROJECT X supports MySQL, MariaDB and Percona Server”.

Consider that merely mentioning “MariaDB” gets your blog post filtered out on planet.mysql.com. This has an actual impact on open source development in the MySQL ecosystem. We will lose audience and lose adoption.

I believe the MySQL ecosystem as a whole will be negatively affected as result, and this will circle back to MySQL itself. I believe this goes against the very interests of Oracle/MySQL.

I’ve been around the MySQL community for some 12 years now. From my observation, there is no doubt that MySQL would not thrive as it does today, without the tooling, blogs, presentations and general advice by the community.

This is more than an estimation. I happen to know that, internally at MySQL, they have used or are using open source projects from the community, projects whose blog posts get filtered out today because they mention “MariaDB”. I find that disappointing.

I have personally witnessed how open source developments broke existing barriers to enable companies to use MySQL at greater scale, in greater velocity, with greater stability. I was part of such companies and I’ve personally authored such tools. I’m disappointed that planet.mysql.com filters out my blog posts for those tools and without giving me audience, and extend my disappointment for all open source project maintainers.

At this time I consider planet.mysql.com to be a marketing blog, not a community feed, and do not want to participate in its biased aggregation.

]]>
https://shlomi-noach.github.io/blog/mysql/pulling-this-blog-out-of-planet-mysql-aggregator-over-community-concerns/feed 0 8020
The state of Orchestrator, 2020 (spoiler: healthy) https://shlomi-noach.github.io/blog/mysql/the-state-of-orchestrator-2020-spoiler-healthy-2 https://shlomi-noach.github.io/blog/mysql/the-state-of-orchestrator-2020-spoiler-healthy-2#respond Tue, 18 Feb 2020 19:14:12 +0000 https://shlomi-noach.github.io/blog/?p=8016 This post serves as a pointer to my previous announcement about The state of Orchestrator, 2020.

Thank you to Tom Krouper who applied his operational engineer expertise to content publishing problems.

]]>
https://shlomi-noach.github.io/blog/mysql/the-state-of-orchestrator-2020-spoiler-healthy-2/feed 0 8016
The state of Orchestrator, 2020 (spoiler: healthy) https://shlomi-noach.github.io/blog/mysql/the-state-of-orchestrator-2020-spoiler-healthy https://shlomi-noach.github.io/blog/mysql/the-state-of-orchestrator-2020-spoiler-healthy#respond Tue, 18 Feb 2020 08:09:05 +0000 https://shlomi-noach.github.io/blog/?p=7996 Yesterday was my last day at GitHub, and this post explains what this means for orchestrator. First, a quick historical review:

  • 2014: I began work on orchestrator at Outbrain, as https://github.com/outbrain/orchestrator. I authored several open source projects while working for Outbrain, and created orchestrator to solve discovery, visualization and simple refactoring needs. Outbrain was happy to have the project developed as a public, open source repo from day 1, and it was released under the Apache 2 license. Interestingly, the idea to develop orchestrator came after I attended Percona Live Santa Clara 2014 and watched “ChatOps: How GitHub Manages MySQL” by one Sam Lambert.
  • 2015: Joined Booking.com where my main focus was to redesign and solve issues with the existing high availability setup. With Booking.com’s support, I continued work on orchestrator, pursuing better failure detection and recovery processes. Booking.com was an incredible playground and testbed for orchestrator, a massive deployment of multiple MySQL/MariaDB flavors and configuration.
  • 2016 – 2020: Joined GitHub. GitHub adopted orchestrator and I developed it under GitHub’s own org, at https://github.com/github/orchestrator. It became a core component in github.com’s high availability design, running failure detection and recoveries across sites and geographical regions, with more to come. These 4+ years have been critical to orchestrator‘s development and saw its widespread use. At this time I’m aware of multiple large-scale organizations using orchestrator for high availability and failovers. Some of these are GitHub, Booking.com, Shopify, Slack, Wix, Outbrain, and more. orchestrator is the underlying failover mechanism for vitess, and is also included in Percona’s PMM. These years saw a significant increase in community adoption and contributions, in published content, such as Pythian and Percona technical blog posts, and, not surprisingly, increase in issues and feature requests.


2020

GitHub was very kind to support moving the orchestrator repo under my own https://github.com/openark org. This means all issues, pull requests, releases, forks, stars and watchers have automatically transferred to the new location: https://github.com/openark/orchestrator. The old links do a “follow me” and implicitly direct to the new location. All external links to code and docs still work. I’m grateful to GitHub for supporting this transfer.

I’d like to thank all the above companies for their support of orchestrator and of open source in general. Being able to work on the same product throughout three different companies is mind blowing and an incredible opportunity. orchestrator of course remains open source and licensed with Apache 2. Existing Copyrights are unchanged.

As for what’s next: some personal time off, please understand if there’s delays to reviews/answers. My intention is to continue developing orchestrator. Naturally, the shape of future development depends on how orchestrator meets my future work. Nothing changes in that respect: my focus on orchestrator has always been first and foremost the pressing business needs, and then community support as possible. There are some interesting ideas by prominent orchestrator users and adopters and I’ll share more thoughts in due time.

 

]]>
https://shlomi-noach.github.io/blog/mysql/the-state-of-orchestrator-2020-spoiler-healthy/feed 0 7996
MySQL master discovery methods, part 5: Service discovery & Proxy https://shlomi-noach.github.io/blog/mysql/mysql-master-discovery-methods-part-5-service-discovery-proxy https://shlomi-noach.github.io/blog/mysql/mysql-master-discovery-methods-part-5-service-discovery-proxy#respond Mon, 14 May 2018 08:08:32 +0000 https://shlomi-noach.github.io/blog/?p=7869 This is the fifth in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master.

These posts are not concerned with the manner by which the replication failure detection and recovery take place. I will share orchestrator specific configuration/advice, and point out where cross DC orchestrator/raft setup plays part in discovery itself, but for the most part any recovery tool such as MHA, replication-manager, severalnines or other, is applicable.

We discuss asynchronous (or semi-synchronous) replication, a classic single-master-multiple-replicas setup. A later post will briefly discuss synchronous replication (Galera/XtraDB Cluster/InnoDB Cluster).

Master discovery via Service discovery and Proxy

Part 4 presented with an anti-pattern setup, where a proxy would infer the identify of the master by drawing conclusions from backend server checks. This led to split brains and undesired scenarios. The problem was the loss of context.

We re-introduce a service discovery component (illustrated in part 3), such that:

  • The app does not own the discovery, and
  • The proxy behaves in an expected and consistent way.

In a failover/service discovery/proxy setup, there is clear ownership of duties:

  • The failover tool own the failover itself and the master identity change notification.
  • The service discovery component is the source of truth as for the identity of the master of a cluster.
  • The proxy routes traffic but does not make routing decisions.
  • The app only ever connects to a single target, but should allow for a brief outage while failover takes place.

Depending on the technologies used, we can further achieve:

  • Hard cut for connections to old, demoted master M.
  • Black/hold off for incoming queries for the duration of failover.

We explain the setup using the following assumptions and scenarios:

  • All clients connect to master via cluster1-writer.example.net, which resolves to a proxy box.
  • We fail over from master M to promoted replica R.

A non planned failover illustration #1

Master M has died, the box had a power failure. R gets promoted in its place. Our recovery tool:

  • Updates service discovery component that R is the new master for cluster1.

The proxy:

  • Either actively or passively learns that R is the new master, rewires all writes to go to R.
  • If possible, kills existing connections to M.

The app:

  • Needs to know nothing. Its connections to M fail, it reconnects and gets through to R.

A non planned failover illustration #2

Master M gets network isolated for 10 seconds, during which time we failover. R gets promoted.

Everything is as before.

If the proxy kills existing connections to M, then the fact M is back alive turns meaningless. No one gets through to M. Clients were never aware of its identity anyhow, just as they are unaware of R‘s identity.

Planned failover illustration

We wish to replace the master, for maintenance reasons. We successfully and gracefully promote R.

  • In the process of promotion, M turned read-only.
  • Immediately following promotion, our failover tool updates service discovery.
  • Proxy reloads having seen the changes in service discovery.
  • Our app connects to R.

Discussion

This is a setup we use at GitHub in production. Our components are:

  • orchestrator for failover tool.
  • Consul for service discovery.
  • GLB (HAProxy) for proxy
  • Consul template running on proxy hosts:
    • listening on changes to Consul’s KV data
    • Regenerate haproxy.cfg configuration file
    • reload haproxy

As mentioned earlier, the apps need not change anything. They connect to a name that is always resolved to proxy boxes. There is never a DNS change.

At the time of failover, the service discovery component must be up and available, to catch the change. Otherwise we do not strictly require it to be up at all times.

For high availability we will have multiple proxies. Each of whom must listen on changes to K/V. Ideally the name (cluster1-writer.example.net in our example) resolves to any available proxy box.

  • This, in itself, is a high availability issue. Thankfully, managing the HA of a proxy layer is simpler than that of a MySQL layer. Proxy servers tend to be stateless and equal to each other.
  • See GLB as one example for a highly available proxy layer. Cloud providers, Kubernetes, two level layered proxies, Linux Heartbeat, are all methods to similarly achieve HA.

See also:

Sample orchestrator configuration

An orchestrator configuration would look like this:

  "ApplyMySQLPromotionAfterMasterFailover": true,
  "KVClusterMasterPrefix": "mysql/master",
  "ConsulAddress": "127.0.0.1:8500",
  "ZkAddress": "srv-a,srv-b:12181,srv-c",
  "PostMasterFailoverProcesses": [
    “/just/let/me/know about failover on {failureCluster}“,
  ],

In the above:

  • If ConsulAddress is specified, orchestrator will update given Consul setup with K/V changes.
  • At 3.0.10, ZooKeeper, via ZkAddress, is still not supported by orchestrator.
  • PostMasterFailoverProcesses is here just to point out hooks are not strictly required for the operation to run.

See orchestrator configuration documentation.

All posts in this series

]]>
https://shlomi-noach.github.io/blog/mysql/mysql-master-discovery-methods-part-5-service-discovery-proxy/feed 0 7869
MySQL master discovery methods, part 4: Proxy heuristics https://shlomi-noach.github.io/blog/mysql/mysql-master-discovery-methods-part-4-proxy-heuristics https://shlomi-noach.github.io/blog/mysql/mysql-master-discovery-methods-part-4-proxy-heuristics#respond Thu, 10 May 2018 06:10:34 +0000 https://shlomi-noach.github.io/blog/?p=7867 Note: the method described here is an anti pattern

This is the fourth in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master.

These posts are not concerned with the manner by which the replication failure detection and recovery take place. I will share orchestrator specific configuration/advice, and point out where cross DC orchestrator/raft setup plays part in discovery itself, but for the most part any recovery tool such as MHA, replication-manager, severalnines or other, is applicable.

We discuss asynchronous (or semi-synchronous) replication, a classic single-master-multiple-replicas setup. A later post will briefly discuss synchronous replication (Galera/XtraDB Cluster/InnoDB Cluster).

Master discovery via Proxy Heuristics

In Proxy Heuristics all clients connect to the master through a proxy. The proxy observes the backend MySQL servers and determines who the master is.

This setup is simple and easy, but is an anti pattern. I recommend against using this method, as explained shortly.

Clients are all configured to connect to, say, cluster1-writer.proxy.example.net:3306. The proxy will intercept incoming requests either based on hostname or by port. It is aware of all/some MySQL backend servers in that cluster, and will route traffic to the master M.

A simple heuristic that I’ve seen in use is: pick the server that has read_only=0, a very simple check.

Let’s take a look at how this works and what can go wrong.

A non planned failover illustration #1

Master M has died, the box had a power failure. R gets promoted in its place. Our recovery tool:

  • Fails over, but doesn’t need to run any hooks.

The proxy:

  • Knows both about M and R.
  • Notices M fails health checks (select @@global.read_only returns error since the box is down).
  • Notices R reports healthy and with read_only=0.
  • Routes all traffic to R.

Success, we’re happy.

Configuration tip

With an automated failover solution, use read_only=1 in my.cnf at all times. Only the failover solution will set a server to read_only=0.

With this configuration, when M restarts, MySQL starts up as read_only=1.

A non planned failover illustration #2

Master M gets network isolated for 10 seconds, during which time we failover. R gets promoted. Our tool:

  • Fails over, but doesn’t need to run any hooks.

The proxy:

  • Knows both about M and R.
  • Notices M fails health checks (select @@global.read_only returns error since the box is down).
  • Notices R reports healthy and with read_only=0.
  • Routes all traffic to R.
  • 10 seconds later M comes back to life, claiming read_only=0.
  • The proxy now sees two servers reporting as healthy and with read_only=0.
  • The proxy has no context. It does not know why both are reporting the same. It is unaware of failovers. All it sees is what the backend MySQL servers report.

Therein lies the problem: you can not trust multiple servers (MySQL backends) to deterministically pick a leader (the master) without them collaborating on some elaborate consensus communication.

A non planned failover illustration #3

Master M box is overloaded, issuing too many connections for incoming connections.

Our tool decides to failover.

  • And doesn’t need to run any hooks.

The proxy:

  • Notices M fails health checks (select @@global.read_only does not respond because of the load).
  • Notices R reports healthy and with read_only=0.
  • Routes all traffic to R.
  • Shortly followed by M recovering (since no more writes are sent its way), claiming read_only=0.
  • The proxy now sees two servers reporting as healthy and with read_only=0.

Again, the proxy has no context, and neither do M and R, for that matter. The context (the fact we failed over from M to R) was known to our failover tool, but was lost along the way.

Planned failover illustration

We wish to replace the master, for maintenance reasons. We successfully and gracefully promote R.

  • M is available and responsive, we set it to read_only=1.
  • We set R to read_only=0.
  • All new connections route to R.
  • We should also instruct our Proxy to kill all previous connections to M.

This works very nicely.

Discussion

There is a substantial risk to this method. Correlation between failover and network partitioning/load (illustrations #2 and #3) is reasonable.

The root of the problem is that we expect individual servers to resolve conflicts without speaking to each other: we expect the MySQL servers to correctly claim “I’m the master” without context.

We then add to that problem by using the proxy to “pick a side” without giving it any context, either.

Sample orchestrator configuration

By way of discouraging use of this method I do not present an orchestrator configuration file.

All posts in this series

]]>
https://shlomi-noach.github.io/blog/mysql/mysql-master-discovery-methods-part-4-proxy-heuristics/feed 0 7867
MySQL master discovery methods, part 3: app & service discovery https://shlomi-noach.github.io/blog/mysql/mysql-master-discovery-methods-part-3-app-service-discovery https://shlomi-noach.github.io/blog/mysql/mysql-master-discovery-methods-part-3-app-service-discovery#respond Tue, 08 May 2018 08:02:19 +0000 https://shlomi-noach.github.io/blog/?p=7865 This is the third in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master.

These posts are not concerned with the manner by which the replication failure detection and recovery take place. I will share orchestrator specific configuration/advice, and point out where cross DC orchestrator/raft setup plays part in discovery itself, but for the most part any recovery tool such as MHA, replication-manager, severalnines or other, is applicable.

We discuss asynchronous (or semi-synchronous) replication, a classic single-master-multiple-replicas setup. A later post will briefly discuss synchronous replication (Galera/XtraDB Cluster/InnoDB Cluster).

App & service discovery

Part 1 and part 2 presented solutions where the app remained ingorant of master’s identity. This part takes a complete opposite direction and gives the app ownership on master access.

We introduce a service discovery component. Commonly known are Consul, ZooKeeper, etcd, highly available stores offering key/value (K/V) access, leader election or full blown service discovery & health.

We satisfy ourselves with K/V functionality. A key would be mysql/master/cluster1 and a value would be the master’s hostname/port.

It is the app’s responsibility at all times to fetch the identity of the master of a given cluster by querying the service discovery component, thereby opening connections to the indicated master.

The service discovery component is expected to be up at all times and to contain the identity of the master for any given cluster.

A non planned failover illustration #1

Master M has died. R gets promoted in its place. Our recovery tool:

  • Updates the service discovery component, key is mysql/master/cluster1, value is R‘s hostname.

Clients:

  • Listen on K/V changes, recognize that master’s value has changed.
  • Reconfigure/refresh/reload/do what it takes to speak to new master and to drop connections to old master.

A non planned failover illustration #2

Master M gets network isolated for 10 seconds, during which time we failover. R gets promoted. Our tool (as before):

  • Updates the service discovery component, key is mysql/master/cluster1, value is R‘s hostname.

Clients (as before):

  • Listen on K/V changes, recognize that master’s value has changed.
  • Reconfigure/refresh/reload/do what it takes to speak to new master and to drop connections to old master.
  • Any changes not taking place in a timely manner imply some connections still use old master M.

Planned failover illustration

We wish to replace the master, for maintenance reasons. We successfully and gracefully promote R.

  • App should start connecting to R.

Discussion

The app is the complete owner. This calls for a few concerns:

  • How does a given app refresh and apply the change of master such that no stale connections are kept?
    • Highly concurrent apps may be more difficult to manage.
  • In a polyglot app setup, you will need all clients to use the same setup. Implement same listen/refresh logic for Ruby, golang, Java, Python, Perl and notably shell scripts.
    • The latter do not play well with such changes.
  • How can you validate that the change of master has been detected by all app nodes?

As for the service discovery:

  • What load will you be placing on your service discovery component?
    • I was familiar with a setup where there were so many apps and app nodes and app instances, such that the amount of connections was too much for the service discovery . In that setup caching layers were created, which introduced their own consistency problems.
  • How do you handle service discovery outage?
    • A reasonable approach is to keep using last known master idendity should service discovery be down. This, again, plays better wih higher level applications, but less so with scripts.

It is worth noting that this setup does not suffer from geographical limitations to the master’s identity. The master can be anywhere; the service discovery component merely points out where the master is.

Sample orchestrator configuration

An orchestrator configuration would look like this:

  "ApplyMySQLPromotionAfterMasterFailover": true,
  "KVClusterMasterPrefix": "mysql/master",
  "ConsulAddress": "127.0.0.1:8500",
  "ZkAddress": "srv-a,srv-b:12181,srv-c",
  "PostMasterFailoverProcesses": [
    “/just/let/me/know about failover on {failureCluster}“,
  ],

In the above:

  • If ConsulAddress is specified, orchestrator will update given Consul setup with K/V changes.
  • At 3.0.10, ZooKeeper, via ZkAddress, is still not supported by orchestrator.
  • PostMasterFailoverProcesses is here just to point out hooks are not strictly required for the operation to run.

See orchestrator configuration documentation.

All posts in this series

]]>
https://shlomi-noach.github.io/blog/mysql/mysql-master-discovery-methods-part-3-app-service-discovery/feed 0 7865
MySQL master discovery methods, part 2: VIP & DNS https://shlomi-noach.github.io/blog/mysql/mysql-master-discovery-methods-part-2-vip-dns https://shlomi-noach.github.io/blog/mysql/mysql-master-discovery-methods-part-2-vip-dns#comments Mon, 07 May 2018 06:46:22 +0000 https://shlomi-noach.github.io/blog/?p=7863 This is the second in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master.

These posts are not concerned with the manner by which the replication failure detection and recovery take place. I will share orchestrator specific configuration/advice, and point out where cross DC orchestrator/raft setup plays part in discovery itself, but for the most part any recovery tool such as MHA, replication-manager, severalnines or other, is applicable.

We discuss asynchronous (or semi-synchronous) replication, a classic single-master-multiple-replicas setup. A later post will briefly discuss synchronous replication (Galera/XtraDB Cluster/InnoDB Cluster).

Master discovery via VIP

In part 1 we saw that one the main drawbacks of DNS discovery is the time it takes for the apps to connect to the promoted master. This is the result of both DNS deployment time as well as client’s TTL.

A quicker method is offered: use of VIPs (Virtual IPs). As before, apps would connect to cluster1-writer.example.net, cluster2-writer.example.net, etc. However, these would resolve to specific VIPs.

Say cluster1-writer.example.net resolves to 10.10.0.1. We let this address float between servers. Each server has its own IP (say 10.20.0.XXX) but could also potentially claim the VIP 10.10.0.1.

VIPs can be assigned by switches and I will not dwell into the internals, because I’m not a network expert. However, the following holds:

  • Acquiring a VIP is a very quick operation.
  • Acquiring a VIP must take place on the acquiring host.
  • A host may be unable to acquire a VIP should another host holds the same VIP.
  • A VIP can only be assigned within a bounded space: hosts connected to the same switch; hosts in the same Data Center or availability zone.

A non planned failover illustration #1

Master M has died, the box had a power failure. R gets promoted in its place. Our recovery tool:

  • Attempts to connect to M so that it can give up the VIP. The attempt fails because M is dead.
  • Connects to R and instructs it to acquire the VIP. Since M is dead there is no objection, and R successfully grabs the VIP.
  • Any new connections immediately route to the new master R.
  • Clients with connections to M cannot connect, issue retries, immediately route to R.

A non planned failover illustration #2

Master M gets network isolated for 30 seconds, during which time we failover. R gets promoted. Our tool:

  • Attempts to connect to M so that it can give up the VIP. The attempt fails because M is network isolated.
  • Connects to R and instructs it to acquire the VIP. Since M is network isolated there is no objection, and R successfully grabs the VIP.
  • Any new connections immediately route to the new master R.
  • Clients with connections to M cannot connect, issue retries, immediately route to R.
  • 30 seconds later M reappears, but no one pays any attention.

A non planned failover illustration #3

Master M box is overloaded. It is not responsive to new connections but may slowly serves existing connections. Our tool decides to failover:

  • Attempts to connect to M so that it can give up the VIP. The attempt fails because M is very loaded.
  • Connects to R and instructs it to acquire the VIP. Unfortunately, M hasn’t given up the VIP and still shows up as owning it.
  • All existing and new connections keep on routing to M, even as R is the new master.
  • This continues until some time has passed and we are able to manually grab the VIP on R, or until we forcibly network isolate M or forcibly shut it down.

We suffer outage.

Planned failover illustration

We wish to replace the master, for maintenance reasons. We successfully and gracefully promote R.

  • M is available and responsive, we ask it to give up the VIP, which is does.
  • We ask R to grab the VIP, which it does.
  • All new connections route to R.
  • We may still see old connections routing to M. We can forcibly network isolate M to break those connections so as to cause reconnects, or restart apps.

Discussion

As with DNS discovery, the apps are never told of the change. They may be forcibly restarted though.

Grabbing a VIP is a quick operation. However, consider:

  • It is not guaranteed to succeed. I have seen it fail in various situations.
  • Since releasing/acquiring of VIP can only take place on the demoted/promoted servers, respectively, our failover tool will need to:
    • Remote SSH onto both boxes, or
    • Remote exec a command on those boxes
  • Moreover, the tool will do so sequentially. First we must connect to demoted master to give up the VIP, then to promoted master to acquire it.
  • This means the time at which the new master grabs the VIP depends on how long it takes to connect to the old master to give up the VIP. Seeing that the old master had trouble causing failover, we can expect correlation to not being able to connect to old master, or seeing slow connect time.
  • An alternative exists, in the form of Pacemaker. Consider Percona’s Replication Manager guide for more insights. Pacemaker provides a single point of access from where the VIP can be moved, and behind the scenes it will communicate to relevant nodes. This makes it simpler on the failover solution configuration.
  • We are constrained by physical location.
  • It is still possible for existing connection to keep on communicating to the demoted master, even while the VIP has been moved.

VIP & DNS combined

Per physical location, we could choose to use VIP. But should we need to failover to a server in another DC, we could choose to combine the DNS discovery, discussed in part 1.

We can expect to see faster failover time on a local physical location, and longer failover time on remote location.

Sample orchestrator configuration

What kind of remote exec method will you have? In this sample we will use remote (passwordless) SSH.

An orchestrator configuration would look like this:

  "ApplyMySQLPromotionAfterMasterFailover": true,
  "PostMasterFailoverProcesses": [
    "ssh {failedHost} 'sudo ifconfig the-vip-interface down'",
    "ssh {successorHost} 'sudo ifconfig the-vip-interface up'",
    "/do/what/you/gotta/do to apply dns change for {failureClusterAlias}-writer.example.net to {successorHost}"
  ],  

In the above:

  • Replace SSH with any remote exec method you may use.
    • But you will need to set up the access/credentials for orchestrator to run those operations.
  • Replace ifconfig with service quagga stop/start or any method you use to release/grab VIPs.

See orchestrator configuration documentation.

All posts in this series

]]>
https://shlomi-noach.github.io/blog/mysql/mysql-master-discovery-methods-part-2-vip-dns/feed 3 7863
MySQL master discovery methods, part 1: DNS https://shlomi-noach.github.io/blog/mysql/mysql-master-discovery-methods-part-1-dns https://shlomi-noach.github.io/blog/mysql/mysql-master-discovery-methods-part-1-dns#respond Thu, 03 May 2018 10:56:46 +0000 https://shlomi-noach.github.io/blog/?p=7861 This is the first in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master.

These posts are not concerned with the manner by which the replication failure detection and recovery take place. I will share orchestrator specific configuration/advice, and point out where cross DC orchestrator/raft setup plays part in discovery itself, but for the most part any recovery tool such as MHA, replication-manager, severalnines or other, is applicable.

We discuss asynchronous (or semi-synchronous) replication, a classic single-master-multiple-replicas setup. A later post will briefly discuss synchronous replication (Galera/XtraDB Cluster/InnoDB Cluster).

Master discovery via DNS

In DNS master discovery applications connect to the master via a name that gets resolved to the master’s box. By way of example, apps would target the masters of different clusters by connecting to cluster1-writer.example.net, cluster2-writer.example.net, etc. It is up for the DNS to resolve those names to IPs.

Issues for concern are:

  • You will likely have multiple DNS servers. How many? In which data centers / availability zones?
  • What is your method for distributing/deploying a name change to all your DNS servers?
  • DNS will indicate a TTL (Time To Live) such that clients can cache the IP associated with a name for a given number of seconds. What is that TTL?

As long as things are stable and going well, discovery via DNS makes sense. Trouble begins when the master fails over. Assume M used to be the master, but got demoted. Assume R used to be a replica, that got promoted and is now effectively the master of the topology.

Our failover solution has promoted R, and now needs to somehow apply the change, such that the apps connect to R instead of M. Some notes:

  • The apps need not change configuration. They should still connect to cluster1-writer.example.net, cluster2-writer.example.net, etc.
  • Our tool instructs DNS servers to make the change.
  • Clients will still resolve to old IP based on TTL.

A non planned failover illustration #1

Master M dies. R gets promoted. Our tool instructs all DNS servers on all DCs to update the IP address.

Say TTL is 60 seconds. Say update to all DNS servers takes 10 seconds. We will have between 10 and 70 seconds until all clients connect to the new master R.

During that time they will continue to attempt connecting to M. Since M is dead, those attempts will fail (thankfully).

A non planned failover illustration #2

Master M gets network isolated for 30 seconds, during which time we failover. R gets promoted. Our tool instructs all DNS servers on all DCs to update the IP address.

Again, assume TTL is 60 seconds. As before, it will take between 10 and 70 seconds for clients to learn of the new IP.

Clients who will require between 40 and 70 seconds to learn of the new IP will, however, hit an unfortunate scenario: the old master M reappears on the grid. Those clients will successfully reconnect to M and issue writes, leading to data loss (writes to M no longer replicate anywhere).

Planned failover illustration

We wish to replace the master, for maintenance reasons. We successfully and gracefully promote R. We need to change DNS records. Since this is a planned failover, we set the old master to read_only=1, or even better, we network isolated it.

And still our clients take 10 to 70 seconds to recognize the new master.

Discussion

The above numbers are just illustrative. Perhaps DNS deployment is quicker than 10 seconds. You should do your own math.

TTL is a compromise which you can tune. Setting lower TTL will mitigate the problem, but will cause more hits on the DNS servers.

For planned takeover we can first deploy a change to the TTL, to, say, 2sec, wait 60sec, then deploy the IP change, then restore TTL to 60.

You may choose to restart apps upon DNS deployment. This emulates apps’ awareness of the change.

Sample orchestrator configuration

orchestrator configuration would look like this:

  "ApplyMySQLPromotionAfterMasterFailover": true,
  "PostMasterFailoverProcesses": [
    "/do/what/you/gotta/do to apply dns change for {failureClusterAlias}-writer.example.net to {successorHost}"
  ],  

In the above:

  • ApplyMySQLPromotionAfterMasterFailover instructs orchestrator to set read_only=0; reset slave all on promoted server.
  • PostMasterFailoverProcesses really depends on your setup. But orchestrator will supply with hints to your scripts: identity of cluster, identity of successor.

See orchestrator configuration documentation.

All posts in this series

]]>
https://shlomi-noach.github.io/blog/mysql/mysql-master-discovery-methods-part-1-dns/feed 0 7861