Quick hack for GTID_OWN lack

One of the benefits of MySQL GTIDs is that each server remembers all GTID entries ever executed. Normally these would be ranges, e.g. 0041e600-f1be-11e9-9759-a0369f9435dc:1-3772242 or multi-ranges, e.g. 24a83cd3-e30c-11e9-b43d-121b89fcdde6:1-103775793, 2efbcca6-7ee1-11e8-b2d2-0270c2ed2e5a:1-356487160, 46346470-6561-11e9-9ab7-12aaa4484802:1-26301153, 757fdf0d-740e-11e8-b3f2-0a474bcf1734:1-192371670, d2f5e585-62f5-11e9-82a5-a0369f0ed504:1-10047.

One of the common problems in asynchronous replication is the issue of consistent reads. I’ve just written to the master. Is the data available on a replica yet? We have iterated on this, from reading on master, to heuristically finding up-to-date replicas based on heartbeats (see presentation and slides) via freno, and now settled, on some parts of our apps, to using GTID.

GTIDs are reliable as any replica can give you a definitive answer to the question: have you applied a given transaction or not?. Given a GTID entry, say f7b781a9-cbbd-11e9-affb-008cfa542442:12345, one may query for the following on a replica:

mysql> select gtid_subset('f7b781a9-cbbd-11e9-affb-008cfa542442:12345', @@global.gtid_executed) as transaction_found;
+-------------------+
| transaction_found |
+-------------------+
|                 1 |
+-------------------+

mysql> select gtid_subset('f7b781a9-cbbd-11e9-affb-008cfa542442:123450000', @@global.gtid_executed) as transaction_found;
+-------------------+
| transaction_found |
+-------------------+
|                 0 |
+-------------------+

Getting OWN_GTID

This is all well, but, given some INSERT or UPDATE on the master, how can I tell what’s the GTID associated with that transaction? There\s good news and bad news.

  • Good news is, you may SET SESSION session_track_gtids = OWN_GTID. This makes the MySQL protocol return the GTID generated by your transaction.
  • Bad news is, this isn’t a standard SQL response, and the common MySQL drivers offer you no way to get that information!

At GitHub we author our own Ruby driver, and have implemented the functionality to extract OWN_GTID, much like you’d extract LAST_INSERT_ID. But, how does one solve that without modifying the drivers? Here’s a poor person’s solution which gives you an inexact, but good enough, info. Following a write (insert, delete, create, …), run:

select gtid_subtract(concat(@@server_uuid, ':1-1000000000000000'), gtid_subtract(concat(@@server_uuid, ':1-1000000000000000'), @@global.gtid_executed)) as master_generated_gtid;

The idea is to “clean” the executed GTID set from irrelevant entries, by filtering out all ranges that do not belong to the server you’ve just written to (the master). The number 1000000000000000 stands for “high enough value that will never be reached in practice” – set to your own preferred value, but this value should take you beyond 300 years assuming 100,000 transactions per second.

Continue reading » “Quick hack for GTID_OWN lack”

Three wishes for a new year

(Almost) another new year by Jewish calendar. What do I wish for the following year?

  1. World peace
  2. Good health to all
  3. Relaxed GTID constraints

I’m still not using GTID, and still see operational issues with working with GTID. As a latest example, our new schema migration solution, gh-ost, allows us to test migrations in production, on replicas. The GTID catch? gh-ost has to write something to the binary log. Thus, it “corrupts” the replica with a bogus GTID entry that will never be met in another server, thus making said replica unsafe to promote. We can work around this, but…

I understand the idea and need for the Executed GTID Set. It will certainly come in handy with multi-writer InnoDB Cluster. However for most use cases GTID poses a burden. The reason is that our topologies are imperfect, and we as humans are imperfect, and operations are most certainly imperfect. We may wish to operate on a replica: test something, by intention or mistake. We may wish to use a subchain as the seed for a new cluster split. We may wish to be able to write to downstream replicas. We may use a 3rd party tool that issues a flush tables with read lock without disabling sql_log_bin. Things just happen.

For that, I would like to suggest GTID control levels, such as:

  1. Strict: same as Oracle’s existing implementation. Executed sets, purged sets, whatnot.
  2. Last executed: a mode where the only thing that counts is the last executed GTID value. If I repoint replica, all it needs to check is “hey this is my last executed GTID entry, give me the coordinates of yours. And, no, I don’t care about comparing executed and purged sets, I will trust you and keep running from that point on”
  3. Declarative: GTIDs are generated, are visible in each and every binary log entry, but are completely ignored.

I realize Oracle MySQL GTID is out for some over 3 years now, but I’m sorry – I still have reservations and see use cases where I fear it will not serve me right.

How about my previous years wishes? World peace and good health never came through, however:

  • My 2015 wish for “decent, operations friendly built in online table refactoring” was unmet, however gh-ost is a thing now and exceeds my expectations. No, really. Please come see Tom & myself present gh-ost and how it changed our migration paradigm.
  • My 2012 wish for “decent, long waited for, implementation of Window Functions (aka Analytic Functions) for MySQL” was met by MariaDB’s window functions.
    Not strictly Window Functions, but Oracle MySQL 8.0 will support CTE (hierarchial/recursive), worth a mention.

See you in Amsterdam!

Refactoring replication topology with Pseudo GTID

This post describes in detail the method of using Pseudo GTID to achieve unplanned replication topology changes, i.e. connecting two arbitrary slaves, or recovering from a master failure even as all its slaves are hanging in different positions.

Please read Pseudo GTID and Pseudo GTID, RBR as introduction.

Consider the following case: the master dies unexpectedly, and its three slaves are all hanging, not necessarily at same binary log file/position (network broke down while some slaves managed to salvage more entries into their relay logs than others)

orchestrator-failed-master

(Did you notice the “Candidate for master” message? To be discussed shortly)

GTID

With GTID each transaction (and entry in the binary log) is associated with a unique mark — the Global Transaction ID. Just pick the slave with the most advanced GTID to be the next master, and just CHANGE MASTER TO MASTER_HOST=’chosen_slave’ on the other slaves, and everything magically works. A slave knows which GTID it has already processed, and can look that entry on its master’s binary logs, resuming replication on the one that follows.

How does that work? The master’s binary logs are searched for that GTID entry. I’m not sure how brute-force this is, since I’m aware of a subtlety which requires brute-force scan of all binary logs; I don’t actually know if it’s always like that.

Pseudo GTID

We can mimick that above, but our solution can’t be as fine grained. With the injection of Pseudo GTID we mark the binary log for unique entries. But instead of having a unique identifier for every entry, we have a unique identifier for every second, 10 seconds, or what have you, with otherwise normal, non-unique entries in between our Pseudo GTID entries.

Recognizing which slave is more up to date

Given two slaves, which is more up to date?

  • If both replicate(d) from same master, a SHOW SLAVE STATUS comparison answers (safe method: wait till SQL thread catches up with broken IO thread, compare relay_master_log_file, exec_master_log_pos on both machines). This is the method by which the above “Candidate for master” message is produced.
  • If one is/was descendent of the other, then obviously it is less advanced or equals its ancestor.
  • Otherwise we’re unsure – still solvable via bi-directional trial & error, as explained later on.

For now, let’s assume we know which slave is more up to date (has received and executed more relay logs). Let’s call it S1, whereas the less up-to-date will be S2. This will make our discussion simpler. Continue reading » “Refactoring replication topology with Pseudo GTID”

Pseudo GTID, Row Based Replication

This post continues Pseudo GTID, in a series of posts describing an alternative to using MySQL GTIDs.

The solution offered in the last post does not work too well for row based replication. The binary log entries for the INSERT statement look like this:

# at 1020
# at 1074
#141020 12:36:21 server id 1  end_log_pos 1074  Table_map: `test`.`pseudo_gtid` mapped to number 33
#141020 12:36:21 server id 1  end_log_pos 1196  Update_rows: table id 33 flags: STMT_END_F

BINLOG '
lddEVBMBAAAANgAAADIEAAAAACEAAAAAAAEABHRlc3QAC3BzZXVkb19ndGlkAAMDBw8CQAAE
lddEVBgBAAAAegAAAKwEAAAAACEAAAAAAAEAA///+AEAAACL10RUJDg2ZmRhMDk1LTU4M2MtMTFl
NC05NzYyLTNjOTcwZWEzMWVhOPgBAAAAlddEVCQ4Y2YzOWMyYy01ODNjLTExZTQtOTc2Mi0zYzk3
MGVhMzFlYTg=
'/*!*/;

Where’s our unique value? Encoded within something that cannot be trusted to be unique. Issuing mysqlbinlog –verbose helps out:

BEGIN
/*!*/;
# at 183
# at 237
#141020 12:35:51 server id 1  end_log_pos 237   Table_map: `test`.`pseudo_gtid` mapped to number 33
#141020 12:35:51 server id 1  end_log_pos 359   Update_rows: table id 33 flags: STMT_END_F

BINLOG '
d9dEVBMBAAAANgAAAO0AAAAAACEAAAAAAAEABHRlc3QAC3BzZXVkb19ndGlkAAMDBw8CQAAE
d9dEVBgBAAAAegAAAGcBAAAAACEAAAAAAAEAA///+AEAAABt10RUJDc1MWJkYzEwLTU4M2MtMTFl
NC05NzYyLTNjOTcwZWEzMWVhOPgBAAAAd9dEVCQ3YjExZDQzYy01ODNjLTExZTQtOTc2Mi0zYzk3
MGVhMzFlYTg=
'/*!*/;
### UPDATE `test`.`pseudo_gtid`
### WHERE
###   @1=1
###   @2=1413797741
###   @3='751bdc10-583c-11e4-9762-3c970ea31ea8'
### SET
###   @1=1
###   @2=1413797751
###   @3='7b11d43c-583c-11e4-9762-3c970ea31ea8'

and that’s something we can work with. However, I like to do stuff from within MySQL, and rely as little as possible on external tools. How do the binary log entries look via SHOW BINLOG EVENTS? Not good.

master [localhost] {msandbox} (test) > show binlog events in 'mysql-bin.000058' limit 20;
+------------------+------+-------------+-----------+-------------+---------------------------------------+
| Log_name         | Pos  | Event_type  | Server_id | End_log_pos | Info                                  |
+------------------+------+-------------+-----------+-------------+---------------------------------------+
| mysql-bin.000058 |    4 | Format_desc |         1 |         107 | Server ver: 5.5.32-log, Binlog ver: 4 |
| mysql-bin.000058 |  107 | Query       |         1 |         183 | BEGIN                                 |
| mysql-bin.000058 |  183 | Table_map   |         1 |         237 | table_id: 33 (test.pseudo_gtid)       |
| mysql-bin.000058 |  237 | Update_rows |         1 |         359 | table_id: 33 flags: STMT_END_F        |
| mysql-bin.000058 |  359 | Xid         |         1 |         386 | COMMIT /* xid=5460 */                 |
| mysql-bin.000058 |  386 | Query       |         1 |         462 | BEGIN                                 |
| mysql-bin.000058 |  462 | Table_map   |         1 |         516 | table_id: 33 (test.pseudo_gtid)       |
| mysql-bin.000058 |  516 | Update_rows |         1 |         638 | table_id: 33 flags: STMT_END_F        |
| mysql-bin.000058 |  638 | Xid         |         1 |         665 | COMMIT /* xid=5471 */                 |
| mysql-bin.000058 |  665 | Query       |         1 |         741 | BEGIN                                 |
| mysql-bin.000058 |  741 | Table_map   |         1 |         795 | table_id: 33 (test.pseudo_gtid)       |
| mysql-bin.000058 |  795 | Update_rows |         1 |         917 | table_id: 33 flags: STMT_END_F        |
| mysql-bin.000058 |  917 | Xid         |         1 |         944 | COMMIT /* xid=5474 */                 |
| mysql-bin.000058 |  944 | Query       |         1 |        1020 | BEGIN                                 |
| mysql-bin.000058 | 1020 | Table_map   |         1 |        1074 | table_id: 33 (test.pseudo_gtid)       |
| mysql-bin.000058 | 1074 | Update_rows |         1 |        1196 | table_id: 33 flags: STMT_END_F        |
| mysql-bin.000058 | 1196 | Xid         |         1 |        1223 | COMMIT /* xid=5476 */                 |
| mysql-bin.000058 | 1223 | Query       |         1 |        1299 | BEGIN                                 |
| mysql-bin.000058 | 1299 | Table_map   |         1 |        1353 | table_id: 33 (test.pseudo_gtid)       |
| mysql-bin.000058 | 1353 | Update_rows |         1 |        1475 | table_id: 33 flags: STMT_END_F        |
+------------------+------+-------------+-----------+-------------+---------------------------------------+

The representation of row-format entries in the SHOW BINLOG EVENTS output is really poor. Why, there’s nothing to tell me at all about what’s been done, except that this is some operation on test.pseudo_gtid. Obviously I cannot find anything unique over here.

Not all is lost. How about DDL statements? Those are still written in SBR format (there’s no rows to log upon creating a table). A solution could be somehow manipulating a unique value in a DDL statement. There could be various such solutions, and I chose to use a CREATE VIEW statement, dynamically composed of a UUID(): Continue reading » “Pseudo GTID, Row Based Replication”

Pseudo GTID

Pseudo GTID is a method to implement a GTID-like solution where slaves are easily connected to one another. This blog post and the following ones will describe work in progress (some 80% completed), where simulation of GTID makes for a good enough basis for refactoring replication topologies. I’m coding this in orchestrator, which already provides a substantial infrastructure support for this.

The final goal: orchestrator will allow you to move a slave below another, using only the data available by those two slaves. The usage is obvious:

  • Easy master failover (master dead? Orchestrator will choose the most advanced slave to promote and make it master of its siblings)
  • Slave promotion in complex topologies (with deep nested topologies, be able to move a slave up the hierarchy even if its local master is corrupted).

This can all happen with your normal, non GTID, MySQL replication, using your normal binary log files & positions.

This work in progress is inspired by Sam Lambert at GitHub, who has worked on a similar solution with different implementation. I also recall discussions with other DBAs having similar solution.

Pseudo GTID

First thing’s first, the basis for proposed solution is a pseudo-GTID. A unique entry in the binary logs (not necessarily sequential; not necessarily in ascending order). While in GTID implementations we have a unique identifier for each entry in the binary log, with pseudo-GTID we accept an occasional (or frequent) unique entry in the binary log.

There are many ways to do so. Certainly a client can generate a unique Id and invoke some statement on MySQL involving that ID. That would serve as valid grounds for the proposed solution. But I like things to be contained within MySQL. Consider, for example, the following event, which would be my preferred choice in Statement Based Replication (for RBR solution, see next post):

drop table if exists test.pseudo_gtid;
create table if not exists test.pseudo_gtid (
  id int unsigned not null primary key,
  ts timestamp,
  gtid varchar(64) charset ascii
);


drop event if exists test.update_pseudo_gtid_event;

delimiter ;;
create event if not exists
  test.update_pseudo_gtid_event
  on schedule every 10 second starts current_timestamp
  on completion preserve
  enable
  do
    begin
      set @pseudo_gtid := uuid();
      insert into test.pseudo_gtid (id, ts, gtid) values (1, NOW(), @pseudo_gtid) on duplicate key update ts=NOW(), gtid=VALUES(gtid);
    end
;;

delimiter ;

The above is based on Making UUID() and RAND() replication safe. What do we get? Once in 10 seconds (or what have you), a unique entry is written to the binary log.

Continue reading » “Pseudo GTID”