{"id":7453,"date":"2015-11-20T11:41:13","date_gmt":"2015-11-20T09:41:13","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=7453"},"modified":"2015-11-20T11:41:13","modified_gmt":"2015-11-20T09:41:13","slug":"state-of-automated-recovery-via-pseudo-gtid-orchestrator-booking-com","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/state-of-automated-recovery-via-pseudo-gtid-orchestrator-booking-com","title":{"rendered":"State of automated recovery via Pseudo-GTID &#038; Orchestrator @ Booking.com"},"content":{"rendered":"<p>This post sums up some of my work on MySQL resilience and high availability at <a href=\"http:\/\/www.booking.com\">Booking.com<\/a> by presenting the current state of automated master and intermediate master recoveries via <a href=\"http:\/\/code.openark.org\/blog\/mysql\/refactoring-replication-topology-with-pseudo-gtid\">Pseudo-GTID<\/a> &amp; <strong><a href=\"https:\/\/github.com\/outbrain\/orchestrator\">Orchestrator<\/a><\/strong>.<\/p>\n<p>Booking.com uses many different MySQL topologies, of varying vendors, configurations and workloads: Oracle MySQL, MariaDB, statement based replication, row based replication, hybrid, OLTP, OLAP,\u00a0GTID (few), no GTID (most), Binlog Servers, filters, hybrid of all the above.<\/p>\n<p>Topologies size\u00a0varies from a single server to many-many-many. Our typical topology has a master in one datacenter, a bunch of slaves in same DC, a slave in another DC acting as an intermediate master to further bunch of slaves in the other DC. 
Something like this, give or take:<\/p>\n<blockquote><p><a href=\"http:\/\/code.openark.org\/blog\/wp-content\/uploads\/2015\/11\/booking-topology-sample.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-7480 size-medium\" src=\"http:\/\/code.openark.org\/blog\/wp-content\/uploads\/2015\/11\/booking-topology-sample-300x169.png\" alt=\"booking-topology-sample\" width=\"300\" height=\"169\" srcset=\"https:\/\/code.openark.org\/blog\/wp-content\/uploads\/2015\/11\/booking-topology-sample-300x169.png 300w, https:\/\/code.openark.org\/blog\/wp-content\/uploads\/2015\/11\/booking-topology-sample-1024x576.png 1024w, https:\/\/code.openark.org\/blog\/wp-content\/uploads\/2015\/11\/booking-topology-sample-900x506.png 900w, https:\/\/code.openark.org\/blog\/wp-content\/uploads\/2015\/11\/booking-topology-sample.png 1600w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p><\/blockquote>\n<p>However, as we are building our third data center (with MySQL deployments mostly\u00a0completed), the graph becomes more complex.<\/p>\n<p>Two high availability questions are:<\/p>\n<ul>\n<li>What happens when an intermediate master dies? What of all its slaves?<\/li>\n<li>What happens when the master dies? What of the entire topology?<\/li>\n<\/ul>\n<p>This is not a technical drill-down into the solution, but rather an overview of the state. 
For more, please refer to recent presentations in <a href=\"https:\/\/speakerdeck.com\/shlominoach\/managing-and-visualizing-your-replication-topologies-with-orchestrator\">September<\/a> and <a href=\"https:\/\/speakerdeck.com\/shlominoach\/pseudo-gtid-and-easy-mysql-replication-topology-management\">April<\/a>.<\/p>\n<p>At this time we have:<\/p>\n<ul>\n<li>Pseudo-GTID deployed on all chains\n<ul>\n<li>Injected every 5 seconds<\/li>\n<li>Using the <a href=\"http:\/\/code.openark.org\/blog\/mysql\/pseudo-gtid-ascending\">monotonically ascending<\/a> variation<\/li>\n<\/ul>\n<\/li>\n<li>Pseudo-GTID based automated failover for intermediate masters on all chains<\/li>\n<li>Pseudo-GTID based automated failover for masters on roughly 30% of the chains.\n<ul>\n<li>The remaining 70% of chains are set up for manual failover using Pseudo-GTID.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Pseudo-GTID is used in particular for:<\/p>\n<ul>\n<li>Salvaging slaves of a dead intermediate master<\/li>\n<li>Correctly grouping and connecting slaves of a dead master<\/li>\n<li>Routine refactoring of topologies. This includes:\n<ul>\n<li>Manual repointing of slaves for various operations (e.g. 
offloading slaves from a busy box)<\/li>\n<li>Automated refactoring (for example, used by our automated upgrading script, which consults with <em>orchestrator<\/em>, upgrades, shuffles slaves around, updates the intermediate master, shuffles back&#8230;)<\/li>\n<\/ul>\n<\/li>\n<li>(In the works) failing over\u00a0binlog reader apps that audit our binary logs.<\/li>\n<\/ul>\n<p><!--more-->Furthermore, Booking.com is also <a href=\"https:\/\/www.percona.com\/live\/europe-amsterdam-2015\/sessions\/binlog-servers-bookingcom\">working on Binlog Servers<\/a>:<\/p>\n<ul>\n<li>These take production traffic and offload masters and intermediate masters<\/li>\n<li>Often co-serve slaves using a\u00a0round-robin VIP, such that failure of one Binlog Server makes for simple slave replication self-recovery.<\/li>\n<li>Are interleaved alongside\u00a0standard replication\n<ul>\n<li>At this time we have no &#8220;pure&#8221; Binlog Server topology in production; we always have normal intermediate masters and slaves<\/li>\n<\/ul>\n<\/li>\n<li>This hybrid state makes for greater complexity:\n<ul>\n<li>Binlog Servers are not designed to participate in a game of changing masters\/intermediate masters, unless <a href=\"http:\/\/jfg-mysql.blogspot.nl\/2015\/09\/abstracting-binlog-servers-and-mysql-master-promotion-wo-reconfiguring-slaves.html\">successors come from their own sub-topology<\/a>,\u00a0which\u00a0is not the case today.\n<ul>\n<li>For example, a Binlog Server that replicates directly from the master cannot be repointed to just any new master.<\/li>\n<li>But it can still hold valuable binary log entries that other slaves may not.<\/li>\n<\/ul>\n<\/li>\n<li>Are not actual MySQL servers, and therefore of course cannot be promoted as masters<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><em>Orchestrator<\/em> &amp; Pseudo-GTID make this hybrid topology resilient nonetheless:<\/p>\n<ul>\n<li><em>Orchestrator<\/em> understands the limitations of the hybrid topology and can salvage slaves of first-tier Binlog 
Servers via Pseudo-GTID<\/li>\n<li>In the case where the Binlog Servers were the most up-to-date slaves of a failed master, <em>orchestrator<\/em> knows to first move potential candidates under the Binlog Server and then extract them out again.<\/li>\n<li>At this time Binlog Servers are still unstable. Pseudo-GTID allows us to comfortably test them\u00a0on a large setup with reduced fear of losing slaves.<\/li>\n<\/ul>\n<p>Otherwise, <em>orchestrator<\/em> already understands pure Binlog Server topologies and can\u00a0do master promotion. Once pure Binlog Server topologies are\u00a0in production, <em>orchestrator<\/em> will be there to watch over them.<\/p>\n<h3>Summary<\/h3>\n<p>To date, Pseudo-GTID has high scores in automated failovers of our topologies; <em>orchestrator&#8217;s<\/em> <a href=\"http:\/\/code.openark.org\/blog\/mysql\/what-makes-a-mysql-server-failurerecovery-case\">holistic approach<\/a> makes for reliable diagnostics; together they reduce our dependency on specific servers &amp; hardware, physical location, and the latency implied by SAN devices.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post sums up some of my work on MySQL resilience and high availability at Booking.com by presenting the current state of automated master and intermediate master recoveries via Pseudo-GTID &amp; Orchestrator. 
Booking.com uses many different MySQL topologies, of varying vendors, configurations and workloads: Oracle MySQL, MariaDB, statement based replication, row based replication, hybrid, OLTP, [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[116,62,108,115,8],"class_list":["post-7453","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-failover","tag-high-availability","tag-orchestrator","tag-pseudo-gtid","tag-replication"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1Wd","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7453","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=7453"}],"version-history":[{"count":18,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7453\/revisions"}],"predecessor-version":[{"id":7504,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7453\/revisions\/7504"
}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=7453"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=7453"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=7453"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}