November 18, 2015
Having attended a talk (as part of the MariaDB Developer Meeting in Amsterdam) about recent developments of MaxScale in executing automated failovers, here are some (late) observations of mine.
I will begin by noting that the project is stated to be pre-production, and so of course none of the below are complaints, but rather food for thought, points for action and otherwise recommendations.
Some functionality of the MaxScale failover is also implemented by orchestrator, which I author. Orchestrator was built in production environments, by and for operational people. In this respect it has gained many insights and has had to cope with many real-world cases, special cases & Murphy's law cases. This post compares the logic, feature set and capabilities of the two where relevant. To some extent the below will read as "hey, I've already implemented this; shame to re-implement the same", and indeed I think that way; but it wouldn't be the first time code of mine was re-implemented by someone else, and I've done the same myself.
I'm describing the solution the way I understood it from the talk. If I'm wrong on any account I'm happy to be corrected via comments below. Edit: please see comment by Dipti Joshi
The idea is that MaxScale operates as a proxy to your topology: you do not connect to your master directly, but rather through MaxScale.
The next phase is that MaxScale would also auto-detect master failure, fix the topology for you, promote a new master, and keep your application unaware of all this complexity, without the app having to change setup/DNS/whatever. Of course some write downtime is implied.
Now for some breakdown.
The detection of a dead master, the check by which a failover is initiated, is based on MaxScale not being able to query the master. This calls for some points for consideration:
- Typically, I would see "I can't connect to the master therefore failover" as too hysterical, and the basis for a lot of false positives.
- However, since in the discussed configuration MaxScale is the only access point to the master, the fact MaxScale cannot connect to the master means the master is inaccessible de-facto.
- In light of the above, the decision makes sense - but I still hold that it would produce false positives.
- I'm unsure (I think not; can anyone comment?) if MaxScale would make multiple attempts over time and only reach the conclusion after X successive failures. This would reduce the false positives.
- I have a growing dislike for the Nagios-style "check 4 successive times, then alert/failover" behavior. Orchestrator takes a different approach: it recognizes a master's death by not being able to connect to the master, while also being able to connect to the 1st tier slaves, check their status and observe that they're unable to connect to the master either. See What makes a MySQL server failure/recovery case?. This approach still calls for further refinement (what if the master is temporarily deadlocked? Is this a failover case or not?).
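The combined check above can be sketched as follows. This is my own minimal illustration, not orchestrator's actual code, and the probe functions (can_query, replicas_of, io_thread_sees_master) are hypothetical stand-ins for the checks an observer would perform:

```python
def master_is_dead(master, can_query, replicas_of, io_thread_sees_master):
    """Declare the master dead only when we cannot reach it AND all of its
    first-tier replicas are reachable yet agree they cannot reach it."""
    if can_query(master):
        return False  # master answers us; nothing to do
    replicas = replicas_of(master)
    if not replicas:
        return False  # no corroborating witnesses; don't act on our view alone
    # every replica must be reachable and must itself report a broken IO thread
    return all(can_query(r) and not io_thread_sees_master(r) for r in replicas)
```

The point of the second condition is to distinguish "the master is dead" from "the observer is network-partitioned": if the replicas still see the master, or cannot be reached themselves, no failover is declared.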
Continue Reading »
November 13, 2015
We are used to issuing queries with an IN clause of the form:
... where state in ('started', 'completed', 'failed') ...
However I've had a few cases where I used an inverted format. Here's one use case followed by an inverted IN clause.
Dynamic query building
Say we have a function which, based on whether the given clusterName is empty or not, would return the list of all lagging slaves, or only those in the given cluster, respectively: Continue Reading »
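To illustrate the inverted IN clause, here is a runnable demo; the parameter appears on the left of IN and the column on the right, so an empty parameter matches every row while a non-empty one filters by cluster. The table and column names are made up for this example, and sqlite3 stands in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE database_instance (hostname TEXT, cluster_name TEXT, lag INTEGER)")
conn.executemany("INSERT INTO database_instance VALUES (?, ?, ?)", [
    ("host1", "alpha", 30), ("host2", "alpha", 0), ("host3", "beta", 45),
])

# Inverted IN: the bound parameter is the left operand, the column is in the list.
# '' IN ('', cluster_name) is true for every row; 'alpha' IN ('', cluster_name)
# is true only for rows whose cluster_name is 'alpha'.
query = "SELECT hostname FROM database_instance WHERE lag > 10 AND ? IN ('', cluster_name)"

all_lagging = [r[0] for r in conn.execute(query, ("",))]         # every cluster
alpha_lagging = [r[0] for r in conn.execute(query, ("alpha",))]  # one cluster
```

The same single query thus serves both the "all clusters" and the "one cluster" cases, with no dynamic string building.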
October 14, 2015
Being a stateful, centralized datastore, MySQL can serve in negotiating leadership: a mechanism to elect a single service out of multiple services; moreover, a mechanism to promote a new leader should the existing leader cease to function.
What of Zookeeper?
Zookeeper makes for an excellent leader election mechanism. This is one of the most recognized uses for Zookeeper. It has HA via multiple nodes & quorum, ephemeral nodes, all you need. To achieve similar benefits with MySQL you'd need to use Galera or NDB Cluster; so why not use Zk?
The use case at hand is orchestrator, a multi-node, mostly stateless service that happens to use MySQL as its backend datastore. It already relies on MySQL to exist in the backend, and expects it to be there. If the MySQL server is down, so is the service, effectively. In such a case it doesn't hurt adding another dependency on MySQL; this does not reduce HA. You need to take care of MySQL HA anyhow, so there's no additional cost. In fact, going to Zookeeper incurs the additional cost, as you introduce a new component to the system that could otherwise be avoided.
Terms of the solution
Our proposed solution offers:
- Single leader election out of multiple nodes
- Leader actively reaffirms its leadership periodically
- Timeout based re-election: decision to re-elect new leader based on the fact current leader has not reaffirmed its leadership over X seconds
- A way to forcibly assume leadership for a specific node
- A way to forcibly call for re-elections by demoting existing leader
- A node/service can easily tell whether it's the leader or not
- Anyone can tell who the leader is
The solution is composed of a single table and a set of queries which implement the above. We assume a service can uniquely identify itself; this is easy to achieve:
- If services are running from different hosts (as should be the case, this is service HA), use hostname for ID
- But what if the service restarts? Are you good with calling this "the same service" or is this now a new service running on the same host?
- In such case use combination of hostname & OS process ID
- Or generate a random token upon startup
- Or use startup timestamp
Whichever solution you pick, make sure it is human readable, such that it is easy to tell which service is the leader. This helps operations. We denote this identifier as service_id.
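For illustration, the options above can each be produced in a line or two; the exact format is a matter of taste, these are just sketches:

```python
import os
import socket
import time

# Human-readable service_id candidates, matching the options listed above:
host_only = socket.gethostname()                                     # survives restarts
host_and_pid = "%s:%d" % (socket.gethostname(), os.getpid())         # new id per process
host_and_start = "%s:%d" % (socket.gethostname(), int(time.time()))  # new id per startup
```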
The following table will have a single row; the service_id in that row is the active leader.
CREATE TABLE service_election (
  anchor tinyint(3) unsigned NOT NULL,
  service_id varchar(128) NOT NULL,
  last_seen_active timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (anchor)
);
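To show the election logic end to end, here is a runnable adaptation. sqlite3 stands in for MySQL (so INSERT OR IGNORE replaces INSERT IGNORE, and an integer clock replaces NOW() and INTERVAL arithmetic), and the 20-second timeout is an arbitrary choice for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE service_election (
    anchor INTEGER NOT NULL PRIMARY KEY,
    service_id TEXT NOT NULL,
    last_seen_active INTEGER NOT NULL
)""")

def attempt_leadership(service_id, now):
    # Grab leadership if the single row does not exist yet...
    conn.execute(
        "INSERT OR IGNORE INTO service_election (anchor, service_id, last_seen_active) "
        "VALUES (1, ?, ?)", (service_id, now))
    # ...or reaffirm it if we are already the leader, or take over if the
    # current leader has not reaffirmed within the timeout (20 seconds here).
    conn.execute(
        "UPDATE service_election SET service_id = ?, last_seen_active = ? "
        "WHERE anchor = 1 AND (service_id = ? OR last_seen_active < ? - 20)",
        (service_id, now, service_id, now))

def current_leader():
    # Anyone can tell who the leader is by reading the single row
    return conn.execute("SELECT service_id FROM service_election").fetchone()[0]
```

Every node runs attempt_leadership periodically; the single-row PRIMARY KEY guarantees there is exactly one leader, and the timeout clause implements re-election when the leader goes silent.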
Continue Reading »
October 8, 2015
Following up on New features I'd like to see in MySQL 5.8, here are statements I would like to see in MySQL 5.8:
- ENABLE EVENTS;
When promoting a slave to master, I want to be able to enable all those events that are in SLAVESIDE_DISABLED state. Today I script an iteration over the events, constructing the ALTER EVENT...ENABLE statements one by one. Just activate those!
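The iteration I script today looks roughly like this; the row fetching is stubbed out here, and on a real server you would run the SELECT through your client of choice:

```python
# Find events disabled on the slave side, per information_schema
FIND_DISABLED_EVENTS = """
    SELECT EVENT_SCHEMA, EVENT_NAME
    FROM information_schema.EVENTS
    WHERE STATUS = 'SLAVESIDE_DISABLED'
"""

def enable_statements(rows):
    # rows: (schema, event_name) tuples as returned by the query above
    return ["ALTER EVENT `%s`.`%s` ENABLE;" % (schema, name) for schema, name in rows]
```

An ENABLE EVENTS statement would replace all of this with a single server-side operation.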
- SKIP GTID TRANSACTION;
I have a transaction on a slave that I want to skip, and GTID is enabled. The sequence of SET GTID_NEXT='<uuid>:<sequence>'; BEGIN; COMMIT; SET GTID_NEXT='AUTOMATIC'; is just something I don't want to do. To compute the GTID_NEXT value; to open a transaction; to use session variables; it may seem straightforward to issue the above from a shell script, but calculating the next GTID is not entirely trivial, and issuing the above from your programming language makes for a weird "all these have to be in the same session AND you're going to do a transaction meanwhile". With golang it's actually a problem.
Make it simple for me. I'm willing to do the STOP/START SLAVE.
- BINLOG ENTRY '<arbitrary text>';
Very selfishly, I want to be able to inject a comment into the binary log, of arbitrary text. I want this comment to appear in SBR format, as if it were a DDL.
My selfish reason: injection of Pseudo-GTID. But I can see various other use cases, such as application level injection of app-logic checkpointing; chef/glu injection of "code deployed at this time"; application injection of "daily audit done to this point". This is too cool and too easy to skip.
- SHOW RELAY LOGS;
Similar to SHOW BINARY LOGS;
- PURGE RELAY LOGS TO '...';
Similar to PURGE BINARY LOGS TO '...';
It's time relay logs stopped being 2nd class citizens.
- SHOW NONBLOCKING [GLOBAL|SESSION] STATUS;
Issue a SHOW GLOBAL|SESSION STATUS query that only shows those variables for which it does not need to block. i.e. this is a safe, fast "show me everything you've got that I won't need to pay for".
Yes, yes, statements are also features, I know.
October 7, 2015
Following up on Morgan Tocker's What would you like to see in MySQL 5.8?, having attended and participated at the brainstorming at Percona Live Amsterdam, and publishing this post while failing to comply with any of Morgan's suggested media, these are the features I would like to see in MySQL 5.8:
- Dynamically enable/disable log-bin and log-slave-updates
Today, when changing chef/puppet role of a server from a simple slave to an intermediate master and vice versa, a MySQL restart is required. This is a very big pain which makes replication automation complex, not to mention warmup times.
- I want to be able to execute a query that is nice, i.e. has lower priority; will not consume all resources; will stall/throttle so as to allow other queries to complete. Luis asked and I said this could be on a per-statement basis, e.g. add a SQL_NICE query hint. But I'm unsure that would be the correct behavior. It also makes sense to do so on a per-connection basis (perhaps provide connection attributes to hint niceness?).
- Online-ier ALTER TABLE. I would in particular want it to apply the nice feature, above. Otherwise throttle by user defined metrics.
- Online-ier ALTER TABLE in replication stream. Can the slaves run the ALTER statement in parallel?
- Re-Group Commit: in MTS, and when intermediate masters are involved, copy+paste the group commit as applied on the master onto the downstream slaves. I suspect this is easily achievable. The result: same parallelism for replicating slaves at all levels, whether they replicate directly from the master or from 2nd or 3rd tier intermediate masters. Today parallelism decreases as we go downstream.
- Global user-defined-variables. I want to be able to define arbitrary (global) variables that I can later query via SELECT @@global.arbitrary. This would be similar to HTML 5's "data-*" attributes. I often wish I could tell & ask MySQL my puppet role; or the server status (is it live? Is it offline? Does it belong to a specific pool? etc.). Similar to "loose-*" syntax, this could be a "data-*" or "user-*" name prefix system.
I will follow up on new statements I would like to see in MySQL 5.8.
The brainstorming session at PerconaLive, I should note, was pure joy, and apart from getting two nice furry dolphins I enjoyed the engagement, the diversity of ideas, and the fact Oracle engineers (Mark in particular) were very busy taking notes or otherwise openly discussing the viability of some requested features.
September 18, 2015
Registration for the Percona Live Amsterdam conference community dinner at Booking.com is ongoing. Please note the following:
- By Monday noon the kitchen will make the necessary purchases. At that time we will have to give them a number, which we will not exceed.
- The number we will give them is $(number of registrants Monday 12:00) + X, X being a constant
- Those X tickets will be available until Tuesday 12:00
- After which the registration is closed. We wish to avoid throwing away food, on one hand, as well as respect those who have reserved place and avoid running out of food, on the other.
Entry to the Booking.com building will be made available via Security personnel to those people who will be listed by the eventbrite registration. We will not be able to have last moment registrants; we will not collect money at the entrance; no credit cards accepted at the doorway.
If you'd like to attend the community dinner, please register now!
FYI Percona has arranged for boats to make the travel from the conference venue to Booking.com (no registration required, but room limited on those boats as well).
Now ain't I being dramatic here? So happy to see everyone in Amsterdam in a few days!
September 15, 2015
In a week's time I'll be speaking at Percona Live Amsterdam. I will be presenting:
Managing and Visualizing your replication topologies with Orchestrator
23 September 4:20PM
This talk will present orchestrator, on which I've been working for the last year and a half, originally at Outbrain and now at Booking.com.
I will show off what orchestrator can do to manage your replication topologies. From visualization, through topology refactoring, to automated crash recoveries, orchestrator today plays a key role in Booking.com's infrastructure, at scale (oh, I love using these words).
You can expect an outrageous demo, a visual walkthrough, some command line examples, and a lot on the logic and mechanisms behind orchestrator. I will present the difficult problems orchestrator covers.
orchestrator is free and open source, and is built to be as generic as possible; it is known to be used by multiple well known companies these days, so please join the party.
With that, I conclude with the almighty motto: Continue Reading »
September 2, 2015
Orchestrator is really growing. And the number of users (DBAs, sys admins) using it is growing too. Which gives me a lot of immediate feedback in the form of "Look, there are just too many options to move slaves around! Which ones should we use?"
TL;DR look at the two visualized commands below
They are enough
The "smart" commands to end all commands
So all relocation commands are important, and give you fine-grained, pin-pointed control over the method of topology refactoring. However, most of the time you just want to move those servers around. Which is why there's a new "smart" mode supporting these two commands, which you should be happy to use:
- relocate: move a single slave to another position
- relocate-slaves: move all/some slaves of some server to another position.
What makes these commands smart? You can move slaves around from anywhere to anywhere, and orchestrator figures out the best execution path. If possible, it uses GTID. Not possible? Is Pseudo-GTID available? Great, using Pseudo-GTID. Oh, are there binlog servers involved? Really simple, use them. None of the above? Orchestrator will use "standard" binlog file:pos math (with limitations). Orchestrator will even figure out if multiple steps are necessary and will combine any of the above.
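By way of usage, invocations look roughly like the below; the hostnames are of course placeholders, and you should check orchestrator's own help output for the exact flags of your version:

```shell
# move a single slave to replicate from a new position
orchestrator -c relocate -i slave.to.move.example.com -d new.master.example.com

# move all/some slaves of a server below another server
orchestrator -c relocate-slaves -i server.whose.slaves.example.com -d new.parent.example.com
```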
So you don't have to remember all the possible ways and options. The visual cheatsheet now boils down to these two:
Let's take a slightly deeper look Continue Reading »
September 1, 2015
Orchestrator 1.4.340 is released. Not quite competing with the MySQL latest changelog, and as I haven't blogged about orchestrator featureset in a while, this is a quick listing of orchestrator features available since my last publication:
- Supports GTID (Oracle & MariaDB)
- GTID still not being used in automated recovery -- in progress.
- enable-gtid, disable-gtid, skip-query for GTID commands
- Supports binlog servers (MaxScale)
- Discovery & operations on binlog servers
- Understanding slave repositioning in a binlog-server architecture
- Smart mode: relocate & relocate-below commands (or Web/GUI drag-n-drop) let orchestrator figure out the best way of slave repositioning. Orchestrator picks from GTID, Pseudo GTID, binlog servers, binlog file:pos math (and more) options, or combinations of the above. Fine grained commands still there, but mostly you won't need them.
- Crash recoveries (did you know orchestrator does that?):
- For intermediate master recovery: improved logic in picking the best recovery plan (prefer in-DC, prefer promoting local slave, supporting binlog server topologies, ...)
- For master recovery: even better slave promotion; supports candidate slaves (prefer promoting such slaves); supports binlog server shared topologies
- Better auditing and logging of recovery cases
- Better analysis of crash scenarios, also in the event of lost VIPs, hanging connections; emergent checks in crash suspected scenarios
- recover-lite: do all topology-only recovery steps, without invoking external processes
- Better browser support: used to only work on Firefox and Chrome (and the latter has had issues), the Web UI should now work well on all browsers, at the cost of reduced d3 animation. More work still in progress.
- Faster, more parallel, less blocking operations on all counts; removed a lot of serialized code; fewer locks.
- Web enhancements
- More verbose drag-n-drop (operation hint; color hints)
- Drag-n-drop for slaves-of-a-server
- Replication/crash analysis dashboard
- Pools: orchestrator can be told about instance-to-pool association (submit-pool-instances command)
- And can then present pool status (web)
- Or pool hints within topologies (web)
- Or queried for all pools (cluster-pool-instances command)
- Supports MySQL 5.7 (tested with 5.7.8)
- Configurable graphite path for metrics
- --noop flag; does all the work except for actually changing master on slaves. Shows intentions.
- Web (or cli which-cluster-osc-slaves command) provide list of control slaves to use in pt-osc operation
- hostname-unresolve: force orchestrator to unresolve a fqdn into VIP/CNAME/... when issuing a CHANGE MASTER TO
- 3rd party contributions (hey, thanks!) include:
- More & better SSL support
- Vagrant templates
- For developers:
- Orchestrator now go-gettable. Just go get github.com/outbrain/orchestrator
- Improved build script; supports more architectures
Continue Reading »