{"id":7299,"date":"2015-07-29T12:59:50","date_gmt":"2015-07-29T10:59:50","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=7299"},"modified":"2015-07-29T12:59:50","modified_gmt":"2015-07-29T10:59:50","slug":"pseudo-gtid-ascending","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/pseudo-gtid-ascending","title":{"rendered":"Pseudo GTID, ASCENDING"},"content":{"rendered":"<p>Pseudo GTID is\u00a0a technique where we inject Globally Unique entries into MySQL, gaining GTID abilities without using GTID. It is supported by <strong><a href=\"https:\/\/github.com\/outbrain\/orchestrator\">orchestrator<\/a><\/strong> and described in more detail <a href=\"https:\/\/speakerdeck.com\/shlominoach\/pseudo-gtid-and-easy-mysql-replication-topology-management\">here<\/a>, <a href=\"http:\/\/code.openark.org\/blog\/tag\/pseudo-gtid\">here<\/a> and <a href=\"https:\/\/github.com\/outbrain\/orchestrator\/wiki\/Orchestrator-Manual#pseudo-gtid\">here<\/a>.<\/p>\n<p>Quick recap: we can make one slave replicate from another even if they were never in a parent-child relationship, based on our uniquely identifiable entries, which can be found in the slaves&#8217; binary logs or relay logs.\u00a0Having Pseudo-GTID injected and controlled by us allows us to optimize failovers into quick operations, especially where a large number of servers are involved.<\/p>\n<p><strong>Ascending Pseudo-GTID<\/strong> further speeds up this process for delayed\/lagging slaves.<\/p>\n<h3>Recap, visualized<\/h3>\n<p>(but do look at the <a href=\"https:\/\/speakerdeck.com\/shlominoach\/pseudo-gtid-and-easy-mysql-replication-topology-management\">presentation<\/a>):<\/p>\n<blockquote><p><a href=\"http:\/\/code.openark.org\/blog\/wp-content\/uploads\/2015\/07\/pseudo-gtid-quick1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-7303\" src=\"http:\/\/code.openark.org\/blog\/wp-content\/uploads\/2015\/07\/pseudo-gtid-quick1.png\" 
alt=\"pseudo-gtid-quick\" width=\"636\" height=\"366\" srcset=\"https:\/\/code.openark.org\/blog\/wp-content\/uploads\/2015\/07\/pseudo-gtid-quick1.png 636w, https:\/\/code.openark.org\/blog\/wp-content\/uploads\/2015\/07\/pseudo-gtid-quick1-300x173.png 300w\" sizes=\"auto, (max-width: 636px) 100vw, 636px\" \/><\/a><\/p>\n<ol>\n<li>Find the last Pseudo-GTID in the slave\u2019s binary log (or the last applied one in its relay log)<\/li>\n<li>Search for an exact match in the new master\u2019s binary logs<\/li>\n<li>Fast-forward both through successive identical statements until the end of the slave\u2019s applied entries is reached<\/li>\n<li>Point the slave at the cursor position on the master<\/li>\n<\/ol>\n<\/blockquote>\n<p>What happens if the slave we wish to reconnect is lagging? Or perhaps it is a delayed replica, set to run <strong>24<\/strong> hours behind its master?<\/p>\n<p>The naive approach would expand step <strong>#2<\/strong> into:<\/p>\n<ul>\n<li>Search for an exact match in the master\u2019s last binary log<\/li>\n<li>Not found? Move on to the previous (older) binary log on the master<\/li>\n<li>Repeat<\/li>\n<\/ul>\n<p>The last Pseudo-GTID executed by the slave was issued by the master over <strong>24<\/strong> hours ago. Suppose the master generates one binary log per hour.\u00a0This means we would need to fully scan <strong>24<\/strong> of the master&#8217;s binary logs in which the entry will not be found, only to find it in the <strong>25th<\/strong> binary log (it&#8217;s an off-by-one problem; don&#8217;t hold the exact number against me).<\/p>\n<h3>Ascending Pseudo GTID<\/h3>\n<p>Since we control the generation of Pseudo-GTID, and since we control the search for Pseudo-GTID, we are free to choose the form of Pseudo-GTID entries. We recently switched to using Ascending Pseudo-GTID entries, and this works like a charm. 
Consider these Pseudo-GTID entries:<!--more--><\/p>\n<blockquote>\n<pre>drop view if exists `meta`.`<strong>_pseudo_gtid_hint__asc:55B364E3:0000000000056EE2:6DD57B85<\/strong>`\r\ndrop view if exists `meta`.`<strong>_pseudo_gtid_hint__asc:55B364E8:0000000000056EEC:ACF03802<\/strong>`\r\ndrop view if exists `meta`.`<strong>_pseudo_gtid_hint__asc:55B364ED:0000000000056EF8:06279C24<\/strong>`\r\ndrop view if exists `meta`.`<strong>_pseudo_gtid_hint__asc:55B364F2:0000000000056F02:19D785E4<\/strong>`<\/pre>\n<\/blockquote>\n<p>These entries are in ascending lexical order. They are generated using a UTC timestamp, along with other watchdog\/random values. For a moment, let&#8217;s trust that our generation is indeed always ascending. How does that help us?<\/p>\n<p>Suppose the last entry found in the slave is:<\/p>\n<blockquote>\n<pre>drop view if exists `meta`.`<strong>_pseudo_gtid_hint__asc:55B364E3:0000000000056EE2:6DD57B85<\/strong>`<\/pre>\n<\/blockquote>\n<p>This is the entry we need to search for in the master&#8217;s binary logs. Starting with the optimistic hope that the entry is in the master&#8217;s last binary log, we start reading.\u00a0By the nature of binary logs, we have to scan them sequentially from start to end. As we read the binary log entries, we soon meet the first Pseudo-GTID injection, and it reads:<\/p>\n<blockquote>\n<pre>drop view if exists `meta`.`<strong>_pseudo_gtid_hint__asc:55B730E6:0000000000058F02:19D785E4<\/strong>`<\/pre>\n<\/blockquote>\n<p>At this stage we know we can completely skip scanning the rest of the binary log. Our entry will not be there: this entry is larger than the one we&#8217;re looking for, and entries only get larger as we progress through the binary log. 
It is therefore safe to ignore the rest of this file and move on to the next-older binary log on the master, to repeat our search there.<\/p>\n<p>Binary logs that cannot contain the entry are only briefly examined: <em>orchestrator<\/em> will probably read no more than the first <strong>1,000<\/strong> or so entries (I can&#8217;t give you an exact number; it depends on your workload) before giving up on the binary log.<\/p>\n<p>On every topology chain we have <strong>2<\/strong> delayed replica slaves, to help us out in case we make the grave mistake of DELETE-ing the wrong data. On some chains, these slaves used to take <strong>5-6<\/strong> minutes to reconnect to a new master using Pseudo-GTID, since doing so required scanning many, many GBs of binary logs. This is no longer the case; we&#8217;ve reduced scan time for such servers to about <strong>25s<\/strong> in the worst case, and much less on average. There can still be dozens of binary logs to open, but all but one are abandoned very quickly. I should stress that those <strong>25s<\/strong> are nonblocking for other slaves that are more up to date than the delayed replicas.<\/p>\n<h3>Can there be a mistake?<\/h3>\n<p>Notice that the above algorithm does not require each and every entry to be ascending; it just compares the first Pseudo-GTID entry in each binary log to determine whether our target entry is there or not. This means that if we&#8217;ve messed up our ascending order and injected some out-of-order entries, we can still get away with it, as long as those entries are neither the first Pseudo-GTID entries in a binary log nor the last entry executed by the slave.<\/p>\n<p>But why be so negative? We&#8217;re using a UTC timestamp as the major sort key, and we inject Pseudo-GTID every <strong>5<\/strong> seconds; even with leap seconds we&#8217;re comfortable.<\/p>\n<p>Also on my TODO list is a &#8220;Plan B&#8221; full-scan search: if the Ascending algorithm fails, we can still fall back to a full scan. 
So there would be no risk at all.<\/p>\n<h3>Example<\/h3>\n<p>We inject Pseudo-GTID via the event scheduler. These are the good parts of the event definition:<\/p>\n<blockquote>\n<pre>delimiter $$\r\ncreate event if not exists\r\n  create_pseudo_gtid_event\r\n  on schedule every 5 second starts current_timestamp\r\n  on completion preserve\r\n  enable\r\n  do\r\n    begin\r\n      set @connection_id := connection_id();\r\n      set @now := now();\r\n      set @rand := floor(rand()*(1 &lt;&lt; 32));\r\n      -- fixed-width hex fields; the timestamp leads, so entries sort lexically by time\r\n      <strong>set @pseudo_gtid_hint := concat_ws(':', lpad(hex(unix_timestamp(@now)), 8, '0'), lpad(hex(@connection_id), 16, '0'), lpad(hex(@rand), 8, '0'));<\/strong>\r\n<strong>\r\n      set @_create_statement := concat('drop ', 'view if exists `meta`.`_pseudo_gtid_', 'hint__asc:', @pseudo_gtid_hint, '`');<\/strong>\r\n      PREPARE st FROM @_create_statement;\r\n      EXECUTE st;\r\n      DEALLOCATE PREPARE st;\r\n    end\r\n$$\r\ndelimiter ;\r\n<\/pre>\n<\/blockquote>\n<p>We accompany this with the following <em>orchestrator<\/em> configuration:<\/p>\n<blockquote>\n<pre> \"PseudoGTIDPattern\": \"drop view if exists .*?`_pseudo_gtid_hint__\",\r\n \"PseudoGTIDMonotonicHint\": \"asc:\",<\/pre>\n<\/blockquote>\n<p><strong>&#8220;PseudoGTIDMonotonicHint&#8221;<\/strong> specifies a string; if that string (<strong>&#8220;asc:&#8221;<\/strong>) is found in the slave&#8217;s Pseudo-GTID entry, then the entry is assumed to have been injected as part of an ascending sequence, and the optimization kicks in.<\/p>\n<p><a href=\"https:\/\/github.com\/outbrain\/orchestrator\/wiki\/Orchestrator-Manual#pseudo-gtid\">The Manual<\/a> has more on this.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Pseudo GTID is\u00a0a technique where we inject Globally Unique entries into MySQL, gaining GTID abilities without using GTID. It is supported by orchestrator and described in more detail here, here and here. 
Quick recap: we can make one slave replicate from another even if they were never in a parent-child relationship, based on our [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[108,115,8],"class_list":["post-7299","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-orchestrator","tag-pseudo-gtid","tag-replication"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1TJ","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7299","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=7299"}],"version-history":[{"count":9,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7299\/revisions"}],"predecessor-version":[{"id":7324,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7299\/revisions\/7324"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=7299"}],"w
p:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=7299"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=7299"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}