State of automated recovery via Pseudo-GTID & Orchestrator @ Booking.com

This post sums up some of my work on MySQL resilience and high availability at Booking.com by presenting the current state of automated master and intermediate master recoveries via Pseudo-GTID & Orchestrator.

Booking.com uses many different MySQL topologies, of varying vendors, configurations and workloads: Oracle MySQL, MariaDB, statement based replication, row based replication, hybrid, OLTP, OLAP, GTID (few), no GTID (most), Binlog Servers, filters, hybrid of all the above.

Topology sizes vary from a single server to many-many-many. Our typical topology has a master in one datacenter, a bunch of slaves in the same DC, and a slave in another DC acting as an intermediate master to a further bunch of slaves in that DC. Something like this, give or take:

[image: booking-topology-sample]

However, as we build our third data center (MySQL deployments mostly completed), the graph is turning more complex.

Two high availability questions are:

  • What happens when an intermediate master dies? What of all its slaves?
  • What happens when the master dies? What of the entire topology?

This is not a technical drill-down into the solution, but rather an overview of the state. For more, please refer to recent presentations in September and April.

At this time we have:

  • Pseudo-GTID deployed on all chains
  • Pseudo-GTID based automated failover for intermediate masters on all chains
  • Pseudo-GTID based automated failover for masters on roughly 30% of the chains.
    • The remaining 70% of chains are set for manual failover using Pseudo-GTID.

Pseudo-GTID is used in particular for:

  • Salvaging slaves of a dead intermediate master
  • Correctly grouping and connecting slaves of a dead master
  • Routine refactoring of topologies. This includes:
    • Manual repointing of slaves for various operations (e.g. offloading slaves from a busy box)
    • Automated refactoring (for example, used by our automated upgrading script, which consults with orchestrator, upgrades, shuffles slaves around, updates the intermediate master, shuffles back…)
  • (In the works), failing over binlog reader apps that audit our binary logs.

Continue reading » “State of automated recovery via Pseudo-GTID & Orchestrator @ Booking.com”

Orchestrator & Pseudo-GTID for binlog reader failover

One of our internal apps at Booking.com audits changes to our tables on various clusters. We used to use tungsten replicator, but have since migrated to our own solution.

We have a binlog reader (it uses open-replicator) running on a slave. It expects Row Based Replication, hence our slave runs with log-slave-updates, binlog-format='ROW', to translate from the master's Statement Based Replication. The binlog reader reads what it needs to read, audits what it needs to audit, and we're happy.
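
For illustration, a minimal my.cnf fragment for such a slave might look like the following (a sketch covering only the options discussed above; server IDs and the rest of the replication setup omitted):

[mysqld]
# write a binary log, and also log the events this slave applies from its master
log-bin=mysql-bin
log-slave-updates
# re-log applied changes in row format, regardless of the master's format
binlog-format=ROW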

However, what happens if that slave dies?

In such a case we need to be able to point our binlog reader at another slave, and it needs to be able to pick up auditing from the same point.

This sounds an awful lot like slave repointing in the case of master/intermediate master failure, and indeed the solutions are similar. However, our binlog reader is not a real MySQL server and does not understand replication. It does not really replicate; it just parses binary logs.

We’re also not using GTID. But we are using Pseudo-GTID. As it turns out, the failover solution is already built into orchestrator, and this is how it goes:

Normal execution

Our binlog app reads entries from the binary log. Some are of interest for auditing purposes, some are not. An occasional Pseudo-GTID entry is found, and is stored in ZooKeeper, tagged as “last seen and processed Pseudo-GTID”.
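
Conceptually, the bookkeeping amounts to something like the following (an illustration only: the znode path is made up, and the app would of course use a ZooKeeper client library rather than the CLI):

# hypothetical znode path; store the last seen & processed Pseudo-GTID entry
zkCli.sh -server zk1.mydomain.com:2181 set /binlog-audit/mycluster/last_pseudo_gtid \
  "drop view if exists \`meta\`.\`_pseudo_gtid_hint__asc:56373F17:00000000012B1C8B:50EC77A1\`"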

Upon slave failure

We recognize the death of a slave; we have other slaves in the pool; we pick another. Now we need to find the coordinates from which to carry on.

We read our “last seen and processed Pseudo-GTID”. Say it reads:

drop view if exists `meta`.`_pseudo_gtid_hint__asc:56373F17:00000000012B1C8B:50EC77A1`

We now issue:

$ orchestrator -c find-binlog-entry -i new.slave.fqdn.com --pattern='drop view if exists `meta`.`_pseudo_gtid_hint__asc:56373F17:00000000012B1C8B:50EC77A1`'

The output of this command is the binlog coordinates of that same entry as found in the new slave's binlogs:

binlog.000148:43664433

Pseudo-GTID entries are only injected once every few seconds (5 in our case). Either: Continue reading » “Orchestrator & Pseudo-GTID for binlog reader failover”

Orchestrator visual cheatsheet, TL;DR the “smart” way

Orchestrator is really growing. And the number of users (DBAs, sys admins) using it is growing. Which gives me a lot of immediate feedback in the form of “Look, there’s just too many options to move slaves around! Which ones should we use?”

TL;DR look at the two visualized commands below

They are enough

The “smart” commands to end all commands

So all relocation commands are important, and give you fine-grained, pinpointed control over the method of topology refactoring. However, most of the time you just want to move those servers around. Which is why there's a new “smart” mode which supports these two commands, which you should be happy to use:

  • relocate: move a single slave to another position
  • relocate-slaves: move all/some slaves of some server to another position.

What makes these commands Smart? You can move slaves around from anywhere to anywhere. And orchestrator figures out the best execution path. If possible, it uses GTID. Not possible? Is Pseudo-GTID available? Great, using Pseudo-GTID. Oh, are there binlog servers involved? Really simple, use them. None of the above? Orchestrator will use “standard” binlog file:pos math (with limitations). Orchestrator will even figure out if multiple steps are necessary and will combine any of the above.
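
On the command line, this boils down to something like the following sketch (hostnames are placeholders; -c names the command, -i the instance to act on, -d the destination):

# move a single slave below another server; orchestrator picks the best method
orchestrator -c relocate -i slave1.mydomain.com -d new.master.mydomain.com

# move the slaves of a given server below another server
orchestrator -c relocate-slaves -i busy.master.mydomain.com -d other.server.mydomain.com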

So you don’t have to remember all the possible ways and options. The visual cheatsheet now boils down to these two:

[image: orchestrator-cheatsheet-visualized-relocate]

[image: orchestrator-cheatsheet-visualized-relocate-slaves]

Let’s take a slightly deeper look Continue reading » “Orchestrator visual cheatsheet, TL;DR the “smart” way”

Orchestrator 1.4.340: GTID, binlog servers, Smart Mode, failovers and lots of goodies

Orchestrator 1.4.340 is released. Not quite competing with the latest MySQL changelog; and as I haven't blogged about the orchestrator feature set in a while, here is a quick listing of orchestrator features available since my last publication:

  • Supports GTID (Oracle & MariaDB)
    • GTID still not being used in automated recovery — in progress.
    • enable-gtid, disable-gtid, skip-query for GTID commands (see the command-line examples after this list)
  • Supports binlog servers (MaxScale)
    • Discovery & operations on binlog servers
    • Understanding slave repositioning in a binlog-server architecture
  • Smart mode: relocate & relocate-below commands (or Web/GUI drag-n-drop) let orchestrator figure out the best way of slave repositioning. Orchestrator picks from GTID, Pseudo GTID, binlog servers, binlog file:pos math (and more), or combinations of the above. Fine-grained commands are still there, but mostly you won't need them.
  • Crash recoveries (did you know orchestrator does that?):
    • For intermediate master recovery: improved logic in picking the best recovery plan (prefer in-DC, prefer promoting local slave, supporting binlog server topologies, …)
    • For master recovery: even better slave promotion; supports candidate slaves (prefer promoting such slaves); supports binlog server shared topologies
    • Better auditing and logging of recovery cases
    • Better analysis of crash scenarios, also in the event of lost VIPs or hanging connections; emergent checks in crash-suspected scenarios
    • recover-lite: do all topology-only recovery steps, without invoking external processes
  • Better browser support: it used to work only on Firefox and Chrome (and the latter had issues); the Web UI should now work well on all browsers, at the cost of reduced d3 animation. More work still in progress.
  • Faster, more parallel, less blocking operations on all counts; removed a lot of serialized code; fewer locks.
  • Web enhancements
    • More verbose drag-n-drop (operation hint; color hints)
    • Drag-n-drop for slaves-of-a-server
    • Replication/crash analysis dashboard
  • Pools: orchestrator can be told about instance-to-pool association (submit-pool-instances command)
    • And can then present pool status (web)
    • Or pool hints within topologies (web)
    • Or queried for all pools (cluster-pool-instances command)
  • Other:
    • Supports MySQL 5.7 (tested with 5.7.8)
    • Configurable graphite path for metrics
    • --noop flag: does all the work except for actually changing masters on slaves; shows intentions.
    • Web (or the cli which-cluster-osc-slaves command) provides a list of control slaves to use in a pt-osc operation
    • hostname-unresolve: force orchestrator to unresolve a fqdn into VIP/CNAME/… when issuing a CHANGE MASTER TO
  • 3rd party contributions (hey, thanks!) include:
    • More & better SSL support
    • Vagrant templates
  • For developers:
    • Orchestrator is now go-gettable: just go get github.com/outbrain/orchestrator
    • Improved build script; supports more architectures
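
To give a command-line flavor of a few of the features listed above (a sketch; hostnames are placeholders):

# GTID commands
orchestrator -c enable-gtid -i slave1.mydomain.com
orchestrator -c skip-query -i slave1.mydomain.com

# list control slaves to use in a pt-osc operation on a given cluster
orchestrator -c which-cluster-osc-slaves -i some.instance.in.cluster.mydomain.com

# dry run: show intentions without actually changing masters
orchestrator -c relocate -i slave2.mydomain.com -d some.master.mydomain.com --noop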

Continue reading » “Orchestrator 1.4.340: GTID, binlog servers, Smart Mode, failovers and lots of goodies”

Pseudo GTID, ASCENDING

Pseudo GTID is a technique where we inject Globally Unique entries into MySQL, gaining GTID abilities without using GTID. It is supported by orchestrator and described in more detail here, here and here.

Quick recap: we can join two slaves to replicate from one another even if they were never in a parent-child relationship, based on our uniquely identifiable entries, which can be found in the slaves’ binary logs or relay logs. Having Pseudo-GTID injected and controlled by us allows us to optimize failovers into quick operations, especially where a large number of servers is involved.

Ascending Pseudo-GTID further speeds up this process for delayed/lagging slaves.

Recap, visualized

(but do look at the presentation):

[image: pseudo-gtid-quick]

  1. Find the last Pseudo-GTID entry in the slave's binary log (or the last applied one in its relay log)
  2. Search for an exact match in the new master's binary logs
  3. Fast-forward both through successive identical statements, until the end of the slave's applied entries is reached
  4. Point the slave at the resulting cursor position on the master
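
Step #2 is the same mechanism shown in the binlog-reader post above: given the slave's last Pseudo-GTID entry, find-binlog-entry returns its coordinates in the prospective master's binary logs. A sketch:

# search the new master's binary logs for the slave's last Pseudo-GTID entry
orchestrator -c find-binlog-entry -i new.master.fqdn.com --pattern='drop view if exists `meta`.`_pseudo_gtid_hint__asc:56373F17:00000000012B1C8B:50EC77A1`'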

What happens if the slave we wish to reconnect is lagging? Or perhaps it is a delayed replica, set to run 24 hours behind its master?

The naive approach would expand step #2 into:

  • Search for an exact match in the master's most recent binary log
  • Not found? Move on to the previous (older) binary log on the master
  • Repeat

The last Pseudo-GTID executed by the slave was issued by the master over 24 hours ago. Suppose the master generates one binary log per hour. This means we would need to full-scan 24 of the master's binary logs in which the entry will not be found, only to match it in the 25th binary log (it's an off-by-one problem; don't hold the exact number against me).

Ascending Pseudo GTID

Since we control the generation of Pseudo-GTID, and since we control the search for Pseudo-GTID, we are free to choose the form of Pseudo-GTID entries. We recently switched to using Ascending Pseudo-GTID entries, and this works like a charm. Consider these Pseudo-GTID entries: Continue reading » “Pseudo GTID, ASCENDING”

Speaking at Percona Live: Pseudo GTID and Easy Replication Topology Management

In two weeks' time I will be presenting Pseudo GTID and Easy Replication Topology Management at Percona Live. Since I submitted the proposal, a LOT has been developed, experimented with, deployed and used, with both Pseudo GTID and orchestrator. In my talk I will:

  • Suggest that you skip the “to GTID or not to GTID” question and go for the lightweight Pseudo GTID
  • Show how Pseudo GTID is used in production to recover from various replication failures and server crashes
  • Do an outrageous demonstration
  • Tell you about 50,000 successful experiments and tests done in production
  • Show off orchestrator and its support for Pseudo GTID, including its automated crash analysis and recovery mechanisms.

I will further show how the orchestrator tooling makes for a less restrictive, more performant, less locking, non-intrusive, trusted and lightweight replication topology management solution. Continue reading » “Speaking at Percona Live: Pseudo GTID and Easy Replication Topology Management”

Speaking at FOSDEM: Pseudo GTID and easy replication management

This coming Sunday I’ll be presenting Pseudo GTID and easy replication management at FOSDEM, Brussels.

There’s been a lot of development on Pseudo GTID these last few weeks. In this talk I'll show you how you can use Pseudo GTID instead of “normal” GTID to easily repoint your slaves, recover from intermediate master failure, promote slaves to masters, as well as employ crash safe replication without crash safe replication.

Moreover, I will show how you can achieve all of the above with fewer constraints than GTID, and, for bulk operations, with less overhead and in less time. You will also see that Pseudo GTID is a non-intrusive solution which does not require you to change anything in your topologies.

Moral: I’ll try and convince you to drop your plans for using GTID in favor of Pseudo GTID.

We will be employing Pseudo GTID as the basis for high availability and failover at Booking.com on many topologies, and as a safety mechanism in other topologies where we will employ Binlog servers.

[slides: Pseudo gtid & easy replication topology management, from Shlomi Noach]

Semi-automatic slave/master promotion via Pseudo GTID

Orchestrator release 1.2.7-beta now supports semi-automatic slave promotion to master upon master death, via Pseudo GTID.

[image: orchestrator-make-master-highlighted]

When the master is dead, orchestrator automatically picks the most up-to-date slaves and marks them as “Master candidates”. It allows a /api/make-master call on such a slave (S), in which case it uses Pseudo GTID to enslave S's siblings, and sets read_only = 0 on S. All we need to do is click the “Make master” button. Continue reading » “Semi-automatic slave/master promotion via Pseudo GTID”
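
For the record, a sketch of what invoking that API directly could look like (host, port and URL layout here are assumptions; the “Make master” button does this for you):

# hypothetical orchestrator host/port; promote the candidate slave S
curl -s "http://orchestrator.mydomain.com:3000/api/make-master/candidate.slave.fqdn.com/3306"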

Refactoring replication topologies with Pseudo GTID: a visual tour

Orchestrator 1.2.1-beta supports Pseudo GTID (read the announcement): a means to refactor the replication topology and connect slaves even without a direct relationship; even across failed servers. This post illustrates two such scenarios and shows the visual way of matching/re-syncing slaves.

Of course, orchestrator is not just a GUI tool; anything done with drag-and-drop is also done via the web API (in fact, the drag-and-drop invokes the web API) as well as via the command line. I mention this because it is the groundwork for the failover automation planned for the future.

Scenario 1: the master unexpectedly dies

The master crashes and cannot be contacted. All slaves stop as a result, each at a different position. Some managed to salvage relay log entries just before the master died, some didn't. In our scenario, all three slaves are at least caught up with their relay logs (that is, whatever they managed to pull through the network, they have already managed to execute). So they're otherwise sitting idle, waiting for something to happen. Well, something's about to happen.

[image: orchestrator-pseudo-gtid-dead-master]

Note the green “Safe mode” button to the right. It means operations are carried out by calculating binary log files & positions relative to a slave's master. But the master is now dead, so let's switch to adventurous mode; in this mode we can drag and drop slaves onto instances that would normally be forbidden. At this stage the web interface allows us to drop a slave onto its sibling or onto any of its ancestors (including its very own parent, which is a means of reconnecting a slave with its parent). Anyhow:

[image: orchestrator-pseudo-gtid-dead-master-pseudo-gtid-mode]

We notice that orchestrator is already kind enough to say which slave is the best candidate to be the new master (127.0.0.1:22990): this is the slave (or one of the slaves) with the most up-to-date data. So we choose to take another server and make it a slave of 127.0.0.1:22990: Continue reading » “Refactoring replication topologies with Pseudo GTID: a visual tour”

Orchestrator 1.2.1 BETA: Pseudo GTID support, reconnect slaves even after master failure

orchestrator 1.2.1 BETA is released. This version supports Pseudo GTID, and provides powerful refactoring of one's replication topologies, even across failed instances.

[image: orchestrator-pseudo-gtid-dead-relay-master-begin-drag]

Depicted: moving a slave up the topology even though its local master is inaccessible

Enabling Pseudo-GTID

You will need to:

  1. Inject a periodic unique entry into your binary logs
  2. Configure orchestrator to recognize said entry (a configuration sketch follows the injection example below).

Pseudo GTID injection example

We will use the event scheduler (which must be enabled) to inject an entry every 10 seconds, recognizable in both statement-based and row-based replication.

create database if not exists meta;

drop event if exists meta.create_pseudo_gtid_view_event;

delimiter ;;
create event if not exists
  meta.create_pseudo_gtid_view_event
  on schedule every 10 second starts current_timestamp
  on completion preserve
  enable
  do
    begin
      -- generate a globally unique value
      set @pseudo_gtid := uuid();
      -- recreate the view; this DDL makes it to the binary log as a statement
      -- even under row-based replication, hence works with either binlog format
      set @_create_statement := concat('create or replace view meta.pseudo_gtid_view as select \'', @pseudo_gtid, '\' as pseudo_gtid_unique_val from dual');
      PREPARE st FROM @_create_statement;
      EXECUTE st;
      DEALLOCATE PREPARE st;
    end
;;

delimiter ;

set global event_scheduler := 1;

Make sure to enable event_scheduler in your my.cnf config file.
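
As for step #2 above, orchestrator is told how to recognize the injected entries via its PseudoGTIDPattern configuration setting, a regular expression matched against binary log entries. A sketch matching the event above (MySQL rewrites CREATE VIEW statements in the binary log, hence the wildcards; the exact value is an assumption, so verify against your own binlogs):

{
  "PseudoGTIDPattern": "CREATE OR REPLACE .*? VIEW .*? `pseudo_gtid_view`"
}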

An entry in the binary logs would look like this: Continue reading » “Orchestrator 1.2.1 BETA: Pseudo GTID support, reconnect slaves even after master failure”