When you cannot replicate from master M to slave S

While working on some replication topology automation, I've compiled some rules that will prevent you from replicating from a MySQL server M to a slave S (a quick configuration sanity check follows the list):

  • M does not have binary logs (log_bin) enabled
  • M is itself a slave and does not have log_slave_updates enabled
  • M has a higher major version than S (e.g. M is 5.6 and S is 5.5, but do note log_bin_use_v1_row_events)
  • Both servers have the same server_id
  • S has log_bin & log_slave_updates, uses STATEMENT binlog_format, and M uses ROW or MIXED binlog_format
  • S has log_bin & log_slave_updates, uses MIXED binlog_format, and M uses ROW binlog_format
  • S and M are using different replication filters (some rules could work; good luck if you’re that adventurous)

[EDIT: the above is configuration-wise]
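
Configuration-wise, most of the list can be audited by comparing a handful of variables on both servers. A minimal sketch (standard MySQL; run on M and on S and compare):

  -- On each of M and S:
  SELECT @@server_id, @@version, @@log_bin, @@log_slave_updates, @@binlog_format;
  -- M must have log_bin enabled (and log_slave_updates, if M is itself a slave);
  -- the two server_id values must differ; and if S writes its own binary logs,
  -- its binlog_format must be able to log whatever M sends
  -- (e.g. S=STATEMENT cannot log M's ROW events).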

Did I miss anything? Please comment below.

The mystery of MySQL 5.6 excessive buffer pool flushing

I'm experimenting with upgrading to MySQL 5.6 and am experiencing an unexplained increase in disk I/O utilization. After discussing this with several people, I'm publishing in the hope that someone can shed light on this.

We have a few dozen servers in a normal replication topology. On this particular replication topology we've already determined that STATEMENT based replication is faster for us than ROW based replication, and so we use SBR. We have two different workloads on our slaves, applied by two different HAProxy groups, in three different data centres. Hardware-wise, servers of the two groups use either Virident SSD cards or normal SAS spindle disks.

Our servers are I/O bound. A common query used by both workloads looks up data that does not necessarily have a hotspot, and is very large in volume. DML is low: we only have a few hundred statements per second executed on the master (and propagated through replication).
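
For a sense of scale: the I/O figures discussed here are sampled from the standard InnoDB status counters — nothing 5.6-specific in the sampling itself:

  SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_flushed';
  SHOW GLOBAL STATUS LIKE 'Innodb_data_writes';
  SHOW GLOBAL STATUS LIKE 'Innodb_data_written';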

We have upgraded 6 servers across all data centres to 5.6, both on SSD and spindle disks, and are experiencing the following phenomena: Continue reading » “The mystery of MySQL 5.6 excessive buffer pool flushing”

“Anemomaster”: DML visibility. Your must-do for tomorrow

Here's our take on master DML query monitoring at Outbrain (presented April 2014). It took half a day to code, implement, automate and deploy, and within the first hour of operation we managed to catch multiple ill-behaving services and scripts. You might want to try this out for yourself.

What’s this about?

What queries do you monitor on your MySQL servers? Many don't monitor queries at all, and only look at slow queries on occasion, using pt-query-digest. Some monitor slow queries continuously, and Anemometer (relying on pt-query-digest) is a very good tool for that. At the extreme, some monitor TCP traffic on all MySQL servers — good for you! In between, there's a particular type of query that is of special interest: DML (INSERT/UPDATE/DELETE) statements issued against the master.

They are of particular interest because they are issued just once, against the master, yet propagate through the replication topology to execute on all slaves. These queries have a direct impact on your slave lag and on your overall replication capacity. I suggest you be familiar with your DMLs just as you are with your slow queries.

In particular, we had multiple occasions in the past where all or most slaves started lagging. Frantically, we would go to our metrics; yes! We would see a spike in Com_insert. Someone (some service) was obviously generating more INSERTs than usual, at a rate the slaves could not keep up with. But which INSERT was it? Blindly, we would look at the binary logs. Well, erm, what are we looking for, exactly?

Two such occasions convinced us there should be a solution, though it took some time till it hit us: we were already using Anemometer to monitor our slow logs; we could do the same to monitor our binary logs. Thus was born “Anemomaster”.

Quick recap on how Anemometer works: you run pt-query-digest on the slow logs of all your MySQL hosts (we actually first ship the slow logs to a central place where we analyse them; same thing). This is done periodically, and the slow logs are then rotated. You write the output of pt-query-digest into a central database (this is built into pt-query-digest; it doesn't necessarily produce human-readable reports). Anemometer then reads that central database and visualizes the slow queries.
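
To illustrate, a typical periodic run looks like the following (host, schema and table names are placeholders; --review, --history, --no-report and --limit are standard pt-query-digest options):

  pt-query-digest --no-report --limit=0% \
    --review h=anemometer.example.com,D=slow_query_log,t=global_query_review \
    --history h=anemometer.example.com,D=slow_query_log,t=global_query_review_history \
    /var/log/mysql/mysql-slow.log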

Analysing DMLs

But pt-query-digest doesn't only parse slow logs: it can parse binary logs, too. Instead of asking for total query time, we ask for query count, and on we go to establish the same mechanism, using the same pt-query-digest and the same Anemometer to store and visualize the DMLs issued on our masters.

When analysing DMLs we're interested in parsing binary logs only — and it makes no sense to do so on all slaves, since all slaves carry the same binlog entries the master produces. It takes just one server to get an accurate picture of the DMLs in your entire replication topology.
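
A sketch of the Anemomaster-style run (file names and DSN are placeholders; pt-query-digest wants the binary log decoded to text first, and this assumes statement-based binary logs, as we use):

  mysqlbinlog mysql-bin.000123 > /tmp/mysql-bin.000123.txt
  pt-query-digest --type binlog --order-by Query_time:cnt --no-report \
    --history h=anemometer.example.com,D=slow_query_log,t=global_query_review_history \
    /tmp/mysql-bin.000123.txt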

Continue reading » ““Anemomaster”: DML visibility. Your must-do for tomorrow”

The “once and for all” SHOW SLAVE STATUS log files & positions explained

True, GTID is upon us, whether via MySQL 5.6 or Tungsten Replicator (and wasn't it in the Google Patches as far back as 2009?).

But some of us are still using standard replication with MySQL 5.5, and the “what's with all these binary log files and positions?” question keeps erupting. The output of SHOW SLAVE STATUS confuses people new to it. It confuses me, time and again.

So here's a semi-visual guide to interpreting SHOW SLAVE STATUS.

About binary logs and relay logs

A master writes binary logs. These are typically and conventionally called mysql-bin.##### or mysqld-bin.##### (replace ##### with digits).

A slave connects to its master, and reads entries from the master’s binary logs. The slave writes those entries into its own relay logs. These are typically and conventionally called mysql-relay.##### or mysqld-relay.##### (replace ##### with digits).

There is nothing at all that connects the name or number of a slave's relay log with those of the master's binary log, and nothing that connects the position within the relay log with the position within the master's binary log. Files are flushed/rotated, have different size configurations, are re-created. However, the slave does keep track of the current relay-log entry: it knows the matching entry in the master's binary logs. This is an important piece of information.

While the slave fetches entries and writes them into the relay log (via the IO_THREAD), it also reads the relay log to replay those entries (via the SQL_THREAD).

And so, at each point in time, we are interested in the following “coordinates” (mapped to the actual SHOW SLAVE STATUS fields in the sketch after this list):

  • What are we fetching from the master? Which file are we fetching and from which position?
  • Where are we writing this to? (This is implicitly the latest relay log file and its size)
  • What's the position of the currently executing slave query, in relay-log coordinates? As the slave lags, these coordinates fall farther behind (are smaller than) the written-to position.
  • What’s the position of currently executing slave query, in master binary-log coordinates? This information really tells us how far apart we are from the master. Continue reading » “The “once and for all” SHOW SLAVE STATUS log files & positions explained”
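
In SHOW SLAVE STATUS terms, these coordinates map to the following standard fields:

  SHOW SLAVE STATUS \G
  -- Master_Log_File + Read_Master_Log_Pos:
  --   master binlog coordinates the IO_THREAD is currently fetching
  -- Relay_Log_File + Relay_Log_Pos:
  --   relay-log coordinates of the currently executing (SQL_THREAD) query
  -- Relay_Master_Log_File + Exec_Master_Log_Pos:
  --   master binlog coordinates of that same currently executing query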

Seconds_behind_master vs. Absolute slave lag

I am unable to bring myself to trust the Seconds_behind_master value in SHOW SLAVE STATUS. Even with MySQL 5.5's CHANGE MASTER TO … MASTER_HEARTBEAT_PERIOD (a good thing, applied when no traffic goes from master to slave), it's easy and common to find fluctuations in the Seconds_behind_master value.

And, when sampled by your favourite monitoring tool, this often leads to many false negatives.

At Outbrain we use HAProxy as a proxy to our slaves, on multiple clusters. More on that in a future post. What's important here is that our decision whether a slave enters or leaves a given pool (i.e. gets UP or DOWN status in HAProxy) is based on replication lag. Taking slaves out when they are in fact replicating well is bad, since it reduces the number of serving instances. Putting slaves into the pool when they are actually lagging too far is just as bad, as they contain stale, irrelevant data.

To top it all, even when correct, the Seconds_behind_master value is practically irrelevant on 2nd-level slaves. In a Master -> Slave1 -> Slave2 setup, what does it mean that Slave2 has Seconds_behind_master = 0? Not much, as far as the application is concerned: Slave1 might be lagging an hour behind the master, or might not be replicating at all. Slave2 might be missing an hour's worth of data even while reporting that its own replication is fine.

None of the above is news, and yet many fall into this pitfall. The solution is quite old as well, and it is very simple: roll your own heartbeat mechanism, at your favourite time resolution, and measure slave lag by a timestamp you yourself updated on the master.
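
A minimal home-brewed sketch of the idea (schema, table and event names are made up; pt-heartbeat does all this and more, and the slave-side reading assumes clocks are in sync):

  -- On the master (requires event_scheduler=ON; a cron'ed UPDATE works just as well):
  CREATE DATABASE IF NOT EXISTS meta;
  CREATE TABLE meta.heartbeat (
    id INT UNSIGNED NOT NULL PRIMARY KEY,
    ts TIMESTAMP NOT NULL
  ) ENGINE=InnoDB;
  INSERT INTO meta.heartbeat (id, ts) VALUES (1, NOW());
  CREATE EVENT meta.update_heartbeat
    ON SCHEDULE EVERY 1 SECOND
    DO UPDATE meta.heartbeat SET ts = NOW() WHERE id = 1;

  -- On any slave, at any depth in the topology:
  SELECT TIMESTAMPDIFF(SECOND, ts, NOW()) AS absolute_slave_lag
    FROM meta.heartbeat WHERE id = 1;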

Maatkit/percona-toolkit did this long ago with mk-heartbeat/pt-heartbeat. We're doing it in a very similar manner. The benefit is obvious. Consider the following two graphs: the first shows Seconds_behind_master, the second shows our own Absolute_slave_lag measurement. Continue reading » “Seconds_behind_master vs. Absolute slave lag”

Thoughts on MySQL 5.6 new replication features

After playing a little with MySQL 5.6 (RC), and following closely on Giuseppe's MySQL 5.6 replication gotchas (and bugs), I've been having some thoughts.

These are shared for a few reasons:

  • Maybe I didn’t understand it well, and someone could correct me
  • Or I understood it well, and my input could be of service to the developers
  • Or it could be of service to the users

InnoDB tables in mysql schema

The introduction of InnoDB tables in the mysql schema makes for crash-safe replication information: the exact replication position (master log file+pos, relay log file+pos, etc.) is updated in InnoDB tables; with innodb_flush_log_at_trx_commit=1 this means the replication status is durable and consistent with the server's data. This is great news!
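
For reference, these are the 5.6 settings that direct replication state into those InnoDB tables (mysql.slave_master_info and mysql.slave_relay_log_info):

  # my.cnf
  master_info_repository = TABLE
  relay_log_info_repository = TABLE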

However, the introduction of InnoDB tables to the mysql schema also breaks some common practices in the installation and setup of MySQL servers. You can't just drop your ibdata1 file upon dump+restore, since it now also contains internal data. Giuseppe outlines the workaround for that.

I was thinking: would it be possible to have a completely separate tablespace for MySQL's internal InnoDB tables? That could be a single tablespace file (who cares about file-per-table on a few internal tables?). And I'm throwing out an idea without being intimate with the internals: you know how it is possible to span the shared tablespace across multiple files, as in: Continue reading » “Thoughts on MySQL 5.6 new replication features”
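
(For reference, the multi-file spanning alluded to above uses the standard innodb_data_file_path syntax, e.g.:)

  # my.cnf
  innodb_data_file_path = ibdata1:1G;ibdata2:1G:autoextend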

MySQL 5.6 new features: the user’s perspective

This is yet another compilation of the new MySQL 5.6 feature set. It is not a complete drill-down. This list reflects what I believe to be the interesting new features user- and usability-wise.

For example, I won't be listing InnoDB's split of the kernel mutex. I'm assuming it can have a great impact on overall performance by reducing lock contention; but usability-wise, it is very much internal.

The compilation is an aggregate of the many announcements and other compilations published earlier on. See a reference at the end of this post.

Do note I am not using 5.6 yet; it is in RC, not GA. I am mostly excited just to write down this list.

InnoDB

  • Online ALTER TABLE: if there is one major new feature in 5.6 you would want to upgrade for, this would be it. Add columns, drop columns, rename columns, add indexes, drop indexes – now online, while your SELECT, INSERT, UPDATE and DELETE statements are running.
  • Transportable tablespace files: copy+paste your_table.ibd files with FLUSH TABLES … FOR EXPORT and ALTER TABLE … IMPORT TABLESPACE (see the sketch after this list).
  • FULLTEXT: for many, the one thing holding them back from leaving MyISAM behind. Now available in InnoDB, with the same syntax as in MyISAM.
  • Memcached API: access InnoDB data via the memcached protocol, skipping the SQL interface.
  • User-defined table location: place your tables in your pre-defined location; place other tables elsewhere. This is something I've been asked about for ages.
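
The transportable-tablespace flow, sketched end to end (schema and table names are made up; the statements themselves are the documented 5.6 syntax):

  -- On the source server:
  FLUSH TABLES my_schema.my_table FOR EXPORT;
  -- ... copy my_table.ibd and my_table.cfg out of the source datadir ...
  UNLOCK TABLES;

  -- On the destination server, with an identical empty table in place:
  ALTER TABLE my_schema.my_table DISCARD TABLESPACE;
  -- ... copy the .ibd and .cfg files into the destination datadir ...
  ALTER TABLE my_schema.my_table IMPORT TABLESPACE;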

Continue reading » “MySQL 5.6 new features: the user’s perspective”

Notes on row based replication

MySQL's Row Based Replication (RBR) succeeds (though does not replace) Statement Based Replication (SBR), as of version 5.1.

Anyone who is familiar with replication data drift — the unexplained, growing data difference between master & slave — might wish to look into row based replication. On multiple servers I'm handling, the change to RBR has eliminated (to the best of my findings) replication data drift.

This is easily explained, as RBR writes the identity of each affected row into the binary log. You no longer need to hope that the statement

DELETE FROM my_table ORDER BY my_column LIMIT 100

acts deterministically on all servers (is my_column UNIQUE?). With row based replication the particular rows deleted on the master are then deleted on the slave.
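
For reference, making the switch is a one-liner (the global setting only affects new connections; persist it with binlog_format=ROW in my.cnf):

  SET GLOBAL binlog_format = 'ROW';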

Here are three notes on RBR: Continue reading » “Notes on row based replication”

Auto caching INFORMATION_SCHEMA tables: seeking input

The short version

I have it all working. It's kind of magic. But there are issues; I'm not sure it should even exist, and I'm looking for input.

The long version

In Auto caching tables I presented a hack which allows getting cached or fresh results via simple SELECT queries.

The drive for the above hack was INFORMATION_SCHEMA tables. There are two major problems with INFORMATION_SCHEMA:

  1. Queries on schema-oriented tables such as TABLES, COLUMNS, STATISTICS, etc. are heavyweight. How heavyweight? Enough to lock down your database. Enough, in some cases, to crash your database.
  2. The data is always generated on the fly, as you request it. Query the COLUMNS table twice, and risk two lockdowns of your database.

The auto-cache mechanism solves issue #2. I have it working, time-based: there is an auto-cache table for each of the heavyweight INFORMATION_SCHEMA tables. Say, every 30 minutes the cache is invalidated; throughout those 30 minutes, you get a free pass!

The auto-cache mechanism also paves the road to solving issue #1: since it works by invoking a stored routine, I have better control over the way I read INFORMATION_SCHEMA. Thus, I can take advantage of INFORMATION_SCHEMA optimizations. It's tedious, but not complicated.

For example, if I wanted to cache the TABLES table, I wouldn't necessarily read the entire TABLES data in one go. Instead, I can iterate the schemata, get the list of table names per schema, then read full row data for these, table by table. The result? Many, many more SELECTs, but each optimized, and no one-big-lock-it-all query.
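
A sketch of that iteration; the explicit TABLE_SCHEMA/TABLE_NAME conditions are exactly what lets the INFORMATION_SCHEMA optimizations kick in ('sakila' and 'actor' stand in for whatever the iteration yields):

  -- One lightweight pass to list schemata:
  SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
  -- Per schema, list just its table names:
  SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'sakila';
  -- Per table, fetch the full row:
  SELECT * FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_SCHEMA = 'sakila' AND TABLE_NAME = 'actor';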

And the problem is…

Continue reading » “Auto caching INFORMATION_SCHEMA tables: seeking input”

Problems with MMM for MySQL

I recently encountered troubling issues with MMM for MySQL deployments, leading me to the decision to cease using it in production.

On the very same day I started writing about it, Baron published What's wrong with MMM?. I wish to present the problems I encountered and the reasons I find MMM to be flawed. In a period of two weeks, two different deployments presented me with 4 crashes, in 3 different scenarios.

In all the following scenarios there is an Active/Passive Master-Master deployment, with one VIP (virtual IP) set for the writer role and one VIP set for the reader role.

Problem #1: unjustified failover, broken replication

Unjustified failover must be the most common scenario. It's also a scenario I can live with: a few seconds of downtime once every couple of months are OK with me.

But on two different installations, a few days apart, I had two seemingly unjustified failovers followed by a troubling issue: replication broke. Continue reading » “Problems with MMM for MySQL”