Percona Live Amsterdam: Community Dinner, last updates

Registration for the Percona Live Amsterdam conference community dinner at Booking.com is ongoing. Please note the following:

  • By Monday noon the kitchen will make the necessary purchasing. At this time we will have to give them a number, which we will not exceed.
  • The number we will give them is $(number of registrants Monday 12:00) + X, X being a constant
  • Those X tickets will be available until Tuesday 12:00
  • After that, registration is closed. We wish to avoid throwing away food, on one hand, and to respect those who have reserved a place and avoid running out of food, on the other.

Entry to the Booking.com building will be granted by Security personnel to those people listed in the Eventbrite registration. We will not be able to accept last-minute registrants; we will not collect money at the entrance; no credit cards are accepted at the door.

If you’d like to attend the community dinner, please register now!

FYI Percona has arranged for boats to make the travel from the conference venue to Booking.com (no registration required, but room limited on those boats as well).

Now ain’t I being dramatic here. So happy to see everyone here in Amsterdam in a few days!

Speaking at Percona Live Amsterdam: Orchestrator

In a week’s time I’ll be speaking at Percona Live Amsterdam. I will be presenting:

Managing and Visualizing your replication topologies with Orchestrator
23 September 4:20PM

This talk will present orchestrator, on which I’ve been working for the last year and a half, originally at Outbrain and now at Booking.com.

I will show off what orchestrator can do to manage your replication topologies. From visualization, through topology refactoring, to automated crash recoveries, orchestrator today plays a key role in Booking.com's infrastructure, at scale (oh I love using these words).

You can expect an outrageous demo, a visual walkthrough, some command line examples, and a lot on the logic and mechanisms behind orchestrator. I will present the difficult problems orchestrator covers.

orchestrator is free and open source, and is built to be as generic as possible; it is known to be used by multiple well known companies these days, so please join the party.

With that, I conclude with the almighty motto: Continue reading » “Speaking at Percona Live Amsterdam: Orchestrator”

Three wishes for a new year

Another new year by the Jewish calendar. What do I wish for the following year?

  1. World peace
  2. Good health to all
  3. Decent, operations friendly built in online table refactoring

The existing online ALTER TABLE solution is still not there. It does not take into consideration the load on the server. In the replication stream it's serialized just like any other statement. It's not easy to stop or revert in case of problems.

The existing online-schema-change scripts are still based on my 6-year old original solution, I’m proud to say, but the technology used, i.e. triggers, hasn’t really evolved since then. We are still fighting deadlocks induced by the use of triggers and/or concurrent copying of data. There are still limitations on use of foreign keys.

None of my wishes in previous years [2010], [2011], [2012], [2013], [2014] came true (and things have mostly gotten worse). I’m still willing to settle for two out of three.

Orchestrator visual cheatsheet, TL;DR the “smart” way

Orchestrator is really growing. And the number of users (DBAs, sys admins) using it is growing. Which gives me a lot of immediate feedback in the form of “Look, there’s just too many options to move slaves around! Which ones should we use?”

TL;DR look at the two visualized commands below

They are enough

The “smart” commands to end all commands

So all relocation commands are important, and give you fine-grained, pin-pointed control of the method of topology refactoring. However, most of the time you just want to move those servers around. Which is why there’s a new “smart” mode which supports these two commands, which you should be happy using:

  • relocate: move a single slave to another position
  • relocate-slaves: move all/some slaves of some server to another position.

What makes these commands Smart? You can move slaves around from anywhere to anywhere. And orchestrator figures out the best execution path. If possible, it uses GTID. Not possible? Is Pseudo-GTID available? Great, using Pseudo-GTID. Oh, are there binlog servers involved? Really simple, use them. None of the above? Orchestrator will use “standard” binlog file:pos math (with limitations). Orchestrator will even figure out if multiple steps are necessary and will combine any of the above.
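The fallback order described above can be sketched as a simple decision chain. This is only an illustration of the order in which the options are tried, not orchestrator's actual code: the `Capabilities` class, its fields and the returned labels are all hypothetical names, and the sketch ignores the multi-step combinations mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Capabilities:
    """Hypothetical summary of what a slave and its target position support."""
    gtid: bool = False
    pseudo_gtid: bool = False
    binlog_server: bool = False


def choose_relocation_method(caps: Capabilities) -> str:
    """Return the first applicable method, in the order the text lists them."""
    if caps.gtid:
        return "gtid"
    if caps.pseudo_gtid:
        return "pseudo-gtid"
    if caps.binlog_server:
        return "binlog-server"
    # None of the above: fall back to plain binlog file:pos math (with limitations).
    return "file:pos"
```

For example, `choose_relocation_method(Capabilities(pseudo_gtid=True))` picks Pseudo-GTID, while an empty `Capabilities()` falls through to file:pos math.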

So you don’t have to remember all the possible ways and options. The visual cheatsheet now boils down to these two:

orchestrator-cheatsheet-visualized-relocate

orchestrator-cheatsheet-visualized-relocate-slaves

Let’s take a slightly deeper look Continue reading » “Orchestrator visual cheatsheet, TL;DR the “smart” way”

Orchestrator 1.4.340: GTID, binlog servers, Smart Mode, failovers and lots of goodies

Orchestrator 1.4.340 is released. Not quite competing with the MySQL latest changelog, and as I haven’t blogged about orchestrator featureset in a while, this is a quick listing of orchestrator features available since my last publication:

  • Supports GTID (Oracle & MariaDB)
    • GTID still not being used in automated recovery — in progress.
    • enable-gtid, disable-gtid, skip-query for GTID commands
  • Supports binlog servers (MaxScale)
    • Discovery & operations on binlog servers
    • Understanding slave repositioning in a binlog-server architecture
  • Smart mode: relocate & relocate-below commands (or Web/GUI drag-n-drop) let orchestrator figure out the best way of slave repositioning. Orchestrator picks from GTID, Pseudo GTID, binlog servers, binlog file:pos math (and more) options, or combinations of the above. Fine grained commands still there, but mostly you won’t need them.
  • Crash recoveries (did you know orchestrator does that?):
    • For intermediate master recovery: improved logic in picking the best recovery plan (prefer in-DC, prefer promoting local slave, supporting binlog server topologies, …)
    • For master recovery: even better slave promotion; supports candidate slaves (prefer promoting such slaves); supports binlog server shared topologies
    • Better auditing and logging of recovery cases
    • Better analysis of crash scenarios, also in the event of lost VIPs, hanging connections; emergent checks in crash suspected scenarios
    • recover-lite: do all topology-only recovery steps, without invoking external processes
  • Better browser support: used to only work on Firefox and Chrome (and the latter has had issues), the Web UI should now work well on all browsers, at the cost of reduced d3 animation. More work still in progress.
  • Faster, more parallel, less blocking operations on all counts; removed a lot of serialized code; fewer locks.
  • Web enhancements
    • More verbose drag-n-drop (operation hint; color hints)
    • Drag-n-drop for slaves-of-a-server
    • Replication/crash analysis dashboard
  • Pools: orchestrator can be told about instance-to-pool association (submit-pool-instances command)
    • And can then present pool status (web)
    • Or pool hints within topologies (web)
    • Or queried for all pools (cluster-pool-instances command)
  • Other:
    • Supports MySQL 5.7 (tested with 5.7.8)
    • Configurable graphite path for metrics
    • --noop flag; does all the work except for actually changing master on slaves. Shows intentions.
    • Web (or cli which-cluster-osc-slaves command) provides a list of control slaves to use in a pt-osc operation
    • hostname-unresolve: force orchestrator to unresolve a fqdn into VIP/CNAME/… when issuing a CHANGE MASTER TO
  • 3rd party contributions (hey, thanks!) include:
    • More & better SSL support
    • Vagrant templates
  • For developers:
    • Orchestrator now go-gettable. Just go get github.com/outbrain/orchestrator
    • Improved build script; supports more architectures


Percona Live Amsterdam: Community Dinner, Sep. 22nd

Keeping up with tradition, there will be a community event held at the upcoming Percona Live Europe: Amsterdam 2015 conference.

This year, Booking.com will be hosting the event at the company’s headquarters in the heart of Amsterdam.

We will hold a community dinner (dish selection, includes vegetarian; beverages will be served) in our cafeteria and hope to add some spicy activities to the event!

Space is limited, and tickets can be purchased via Eventbrite.

Special thanks to Daniël van Eeden and Jean-François Gagné for their work in making this happen! Continue reading » “Percona Live Amsterdam: Community Dinner, Sep. 22nd”

Baffling 5.7 global/status variables issues, unclean migration path

MySQL 5.7 introduces a change in the way we query for global variables and status variables: the INFORMATION_SCHEMA.(GLOBAL|SESSION)_(VARIABLES|STATUS) tables are now deprecated and empty. Instead, we are to use the respective performance_schema.(global|session)_(variables|status) tables.

But the change goes further than that; there is also a security change. Oracle created a pitfall of 2 changes at the same time:

  1. Variables/status moved to a different table
  2. Privileges required on said table

As an example, my non-root user gets:

mysql> show session variables like 'tx_isolation';
ERROR 1142 (42000): SELECT command denied to user 'normal_user'@'my_host' for table 'session_variables'

Who gets affected by this? Nearly everyone and everything.

  • Your Nagios will not be able to read status variables
  • Your ORM will not be able to determine session variables
  • Your replication user will fail connecting (see this post by Giuseppe)
  • And most everyone else.

The problem with the above is that it involves two unrelated changes to your setup, which are not entirely simple to coordinate:

  1. Change your app code to choose the correct schema (information_schema vs. performance_schema)
  2. GRANT the permissions on your database
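To make the first change concrete, here is a minimal sketch of app code choosing the table by server version. It rests on assumptions that are not in the text above: that the cutoff is 5.7.6 (my reading of when the INFORMATION_SCHEMA tables were deprecated; adjust if your servers differ), and that the version string comes from an Oracle MySQL server (a MariaDB version such as 10.x would be misclassified). The function names are hypothetical.

```python
import re

# Assumption: the INFORMATION_SCHEMA tables are deprecated from 5.7.6 onward.
CUTOFF = (5, 7, 6)


def parse_version(version: str) -> tuple:
    """Turn a string such as '5.7.8-log' into (5, 7, 8)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def variables_table(version: str, scope: str = "global", kind: str = "status") -> str:
    """Return the table to query for (global|session) (variables|status)."""
    if scope not in ("global", "session") or kind not in ("variables", "status"):
        raise ValueError("scope must be global/session, kind must be variables/status")
    if parse_version(version) >= CUTOFF:
        return f"performance_schema.{scope}_{kind}"
    return f"INFORMATION_SCHEMA.{scope.upper()}_{kind.upper()}"
```

The version string would typically come from `SELECT VERSION()`. Note this only addresses the first change; the privileges still have to be in place on whichever table gets picked.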

Perhaps at this point you still do not consider this to be a problem. You may be thinking: well, let’s first prepare by creating the GRANTs, and once that is in place, we can, at our leisure, modify the code.

Not so fast. Can you really create those GRANTs that simply? Continue reading » “Baffling 5.7 global/status variables issues, unclean migration path”

On SHOW BINLOG/RELAYLOG EVENTS

Some notes after working with SHOW BINLOG EVENTS and SHOW RELAYLOG EVENTS statements; there are a few gotchas and some interesting facts. My reflections also follow.

I’m calling these commands from orchestrator when working with Pseudo-GTID (which I do a lot). I prefer to work with an agent-free design, where a single, remote service can do everything: examine replication status, scan binary logs for information, and recover broken topologies by gluing together servers that were not previously directly associated.

Alas, documentation is short on these commands, and some stuff I learned the hard way.

Basically, SHOW BINLOG/RELAYLOG EVENTS commands are a poor man’s replacement for mysqlbinlog, only you can issue them over the MySQL protocol, and you do not have to have the binary/relay log files locally on your host.

Fun fact

The binary logs are called so because they are stored in a binary format; they are not compressed. You are familiar with the binlog position you see on SHOW MASTER STATUS or SHOW SLAVE STATUS. You are familiar with the binlog position as you see it when you execute “mysqlbinlog mybinlog.001234”. The position of a new entry equals the file size of the binary log at that time. If:

$ ls -l master/data/mysql-bin.015901
-rw-rw---- 1 user user 401408 Jul 18 02:44 master/data/mysql-bin.015901

Then the next entry will be at position 401408, as this is the file size in bytes.

And so when MySQL writes an entry to the binary log, it (of course) knows the entry’s position in the binary log, but then also immediately knows the position of the next entry.
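A minimal sketch of that arithmetic, assuming the binary log file is readable on the local host and fully flushed (neither is guaranteed on a live server). Both function names are made up for illustration.

```python
import os


def next_binlog_position(binlog_path: str) -> int:
    """The next entry starts at the current size of the file, in bytes."""
    return os.path.getsize(binlog_path)


def end_position(start_pos: int, event_length: int) -> int:
    """An entry written at start_pos ends exactly where the next one begins."""
    return start_pos + event_length
```

With the listing above, `next_binlog_position("master/data/mysql-bin.015901")` would give 401408, and an entry of, say, 120 bytes written there would place the following entry at `end_position(401408, 120)`.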

We’ll revisit this fact later.

Output of SHOW BINLOG/RELAYLOG EVENTS

The output of both statements depends on the binlog_format. In Statement Based Replication it may look like: Continue reading » “On SHOW BINLOG/RELAYLOG EVENTS”

Pseudo GTID, ASCENDING

Pseudo GTID is a technique where we inject Globally Unique entries into MySQL, gaining GTID abilities without using GTID. It is supported by orchestrator and described in more detail here, here and here.

Quick recap: we can join two slaves to replicate from one another even if they were never in a parent-child relationship, based on our uniquely identifiable entries which can be found in the slaves’ binary logs or relay logs. Having Pseudo-GTID injected and controlled by us allows us to optimize failovers into quick operations, especially where a large number of servers are involved.

Ascending Pseudo-GTID further speeds up this process for delayed/lagging slaves.

Recap, visualized

(but do look at the presentation):

pseudo-gtid-quick

  1. Find last pseudo GTID in slave’s binary log (or last applied one in relay log)
  2. Search for exact match on new master’s binary logs
  3. Fast forward both through successive identical statements until end of slave’s applied entries is reached
  4. Point slave to cursor position on master
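The four steps can be sketched over in-memory logs. This is a toy model under stated assumptions, not how orchestrator reads real binary logs: each log is a list of `Entry` tuples, Pseudo-GTID entries are recognized by a hypothetical `PSEUDO_GTID:` prefix, and every entry carries the position of the entry that follows it.

```python
from typing import List, NamedTuple, Optional

MARKER = "PSEUDO_GTID:"  # hypothetical prefix identifying injected entries


class Entry(NamedTuple):
    pos: int        # where this entry starts
    next_pos: int   # where the following entry starts
    statement: str


def find_master_position(slave_log: List[Entry], master_log: List[Entry]) -> Optional[int]:
    # Step 1: find the last Pseudo-GTID entry in the slave's log.
    last = None
    for i, entry in enumerate(slave_log):
        if entry.statement.startswith(MARKER):
            last = i
    if last is None:
        return None
    # Step 2: search for an exact match in the master's log.
    wanted = slave_log[last].statement
    m = next((i for i, e in enumerate(master_log) if e.statement == wanted), None)
    if m is None:
        return None
    # Step 3: fast forward both logs through successive identical statements.
    s = last
    while s + 1 < len(slave_log):
        s += 1
        m += 1
        if m >= len(master_log) or master_log[m].statement != slave_log[s].statement:
            return None  # logs diverge: refuse to guess a position
    # Step 4: the slave continues from right after the last matched master entry.
    return master_log[m].next_pos
```

Returning `None` on divergence is a design choice of this sketch: pointing a slave at a guessed position is worse than not moving it at all.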

What happens if the slave we wish to reconnect is lagging? Or perhaps it is a delayed replica, set to run 24 hours behind its master?

The naive approach would expand bullet #2 into:

  • Search for exact match on master’s last binary logs
  • Unfound? Move on to previous (older) binary log on master
  • Repeat

The last Pseudo-GTID executed by the slave was issued by the master over 24 hours ago. Suppose the master generates one binary log per hour. This means we would need to full-scan 24 binary logs of the master without finding the entry, only to match it in the 25th binary log (it’s an off-by-one problem, don’t hold the exact number against me).
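To put numbers on the naive approach, here is a small sketch that counts how many binary logs get fully scanned. Each binary log is modelled as a plain list of entry strings, ordered oldest to newest; the function name and the entry format are made up for illustration.

```python
from typing import List, Optional, Tuple


def naive_search(binlogs: List[List[str]], wanted: str) -> Tuple[Optional[int], int]:
    """Scan logs from newest to oldest.

    Returns (index of the log holding `wanted`, number of logs scanned),
    with None as the index when the entry is not found in any log.
    """
    scanned = 0
    for index in range(len(binlogs) - 1, -1, -1):
        scanned += 1
        if wanted in binlogs[index]:  # a full scan of this log
            return index, scanned
    return None, scanned


# 25 hourly logs; the entry we look for sits in the oldest one.
binlogs = [[f"pseudo-gtid-{hour}"] for hour in range(25)]
index, scanned = naive_search(binlogs, "pseudo-gtid-0")
```

In this setup `scanned` ends up at 25: 24 fruitless full scans, then the match in the oldest log.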

Ascending Pseudo GTID

Since we control the generation of Pseudo-GTID, and since we control the search for Pseudo-GTID, we are free to choose the form of Pseudo-GTID entries. We recently switched to using Ascending Pseudo-GTID entries, and this works like a charm. Consider these Pseudo-GTID entries: Continue reading » “Pseudo GTID, ASCENDING”

What makes a MySQL server failure/recovery case?

Or: How do you reach the conclusion your MySQL master/intermediate-master is dead and must be recovered?

This is an attempt at making a holistic diagnosis of our replication topologies. The aim is to cover obvious and not-so-obvious crash scenarios, and to be able to act accordingly and heal the topology.

At Booking.com we are dealing with a very large number of MySQL servers. We have many topologies, and many servers in each topology. See past numbers to get a feel for it. At these numbers failures happen frequently. Typically we would see normal slaves failing, but occasionally (and far more frequently than we would like to be paged for) an intermediate master or a master would crash. But our current (and ever in transition) setup also includes SANs, DNS records, VIPs, any of which can fail and bring down our topologies.

Having tackled issues of monitoring, disaster analysis and recovery processes, I feel safe making the following claims:

  • The fact your monitoring tool cannot access your database does not mean your database has failed.
  • The fact your monitoring tool can access your database does not mean your database is available.
  • The fact your database master is unwell does not mean you should fail over.
  • The fact your database master is alive and well does not mean you should not fail over.

Bummer. Let’s review a simplified topology with a few failure scenarios. Some of these scenarios you will find familiar. Some others may be caused by setups you’re not using. I would love to say I’ve seen it all but the more I see the more I know how strange things can become. Continue reading » “What makes a MySQL server failure/recovery case?”