{"id":7867,"date":"2018-05-10T08:10:34","date_gmt":"2018-05-10T06:10:34","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=7867"},"modified":"2018-05-22T10:43:52","modified_gmt":"2018-05-22T08:43:52","slug":"mysql-master-discovery-methods-part-4-proxy-heuristics","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/mysql-master-discovery-methods-part-4-proxy-heuristics","title":{"rendered":"MySQL master discovery methods, part 4: Proxy heuristics"},"content":{"rendered":"<p><em>Note: the method described here is an anti pattern<\/em><\/p>\n<p>This is the fourth in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master.<\/p>\n<p>These posts are not concerned with the manner by which the replication failure detection and recovery take place. I will share <code>orchestrator<\/code> specific configuration\/advice, and point out where cross DC <code>orchestrator\/raft<\/code> setup plays part in discovery itself, but for the most part any recovery tool such as <code>MHA<\/code>, <code>replication-manager<\/code>, <code>severalnines<\/code> or other, is applicable.<\/p>\n<p>We discuss asynchronous (or semi-synchronous) replication, a classic single-master-multiple-replicas setup. A later post will briefly discuss synchronous replication (Galera\/XtraDB Cluster\/InnoDB Cluster).<\/p>\n<h3>Master discovery via Proxy Heuristics<\/h3>\n<p>In Proxy Heuristics all clients connect to the master through a proxy. The proxy observes the backend MySQL servers and determines who the master is.<\/p>\n<p><strong>This setup is simple and easy, but is an anti pattern. I recommend against using this method, as explained shortly<\/strong>.<\/p>\n<p>Clients are all configured to connect to, say, <code>cluster1-writer.proxy.example.net:3306<\/code>. The proxy will intercept incoming requests either based on hostname or by port. It is aware of all\/some MySQL backend servers in that cluster, and will route traffic to the master <code>M<\/code>.<\/p>\n<p>A simple heuristic that I&#8217;ve seen in use is: pick the server that has <code>read_only=0<\/code>, a very simple check.<\/p>\n<p>Let&#8217;s take a look at how this works and what can go wrong.<\/p>\n<p><!--more--><\/p>\n<h3>A non planned failover illustration #1<\/h3>\n<p>Master <code>M<\/code> has died, the box had a power failure. <code>R<\/code> gets promoted in its place. Our recovery tool:<\/p>\n<ul>\n<li>Fails over, but doesn&#8217;t need to run any hooks.<\/li>\n<\/ul>\n<p>The proxy:<\/p>\n<ul>\n<li>Knows both about <code>M<\/code> and <code>R<\/code>.<\/li>\n<li>Notices <code>M<\/code> fails health checks (<code>select @@global.read_only<\/code> returns error since the box is down).<\/li>\n<li>Notices <code>R<\/code> reports healthy and with <code>read_only=0<\/code>.<\/li>\n<li>Routes all traffic to <code>R<\/code>.<\/li>\n<\/ul>\n<p>Success, we&#8217;re happy.<\/p>\n<h3>Configuration tip<\/h3>\n<p>With an automated failover solution, use <code>read_only=1<\/code> in <code>my.cnf<\/code> at all times. Only the failover solution will set a server to <code>read_only=0<\/code>.<\/p>\n<p>With this configuration, when <code>M<\/code> restarts, MySQL starts up as <code>read_only=1<\/code>.<\/p>\n<h3>A non planned failover illustration #2<\/h3>\n<p>Master <code>M<\/code> gets network isolated for <code>10<\/code> seconds, during which time we failover. <code>R<\/code> gets promoted. Our tool:<\/p>\n<ul>\n<li>Fails over, but doesn&#8217;t need to run any hooks.<\/li>\n<\/ul>\n<p>The proxy:<\/p>\n<ul>\n<li>Knows both about <code>M<\/code> and <code>R<\/code>.<\/li>\n<li>Notices <code>M<\/code> fails health checks (<code>select @@global.read_only<\/code> returns error since the box is down).<\/li>\n<li>Notices <code>R<\/code> reports healthy and with <code>read_only=0<\/code>.<\/li>\n<li>Routes all traffic to <code>R<\/code>.<\/li>\n<li><code>10<\/code> seconds later <code>M<\/code> comes back to life, claiming <code>read_only=0<\/code>.<\/li>\n<li>The proxy now sees two servers reporting as healthy and with <code>read_only=0<\/code>.<\/li>\n<li>The proxy has no context. It does not know why both are reporting the same. It is unaware of failovers. All it sees is what the backend MySQL servers report.<\/li>\n<\/ul>\n<p>Therein lies the problem: you can not trust multiple servers (MySQL backends) to deterministically pick a leader (the master) without them collaborating on some elaborate consensus communication.<\/p>\n<h3>A non planned failover illustration #3<\/h3>\n<p>Master <code>M<\/code> box is overloaded, issuing <code>too many connections<\/code> for incoming connections.<\/p>\n<p>Our tool decides to failover.<\/p>\n<ul>\n<li>And doesn&#8217;t need to run any hooks.<\/li>\n<\/ul>\n<p>The proxy:<\/p>\n<ul>\n<li>Notices <code>M<\/code> fails health checks (<code>select @@global.read_only<\/code> does not respond because of the load).<\/li>\n<li>Notices <code>R<\/code> reports healthy and with <code>read_only=0<\/code>.<\/li>\n<li>Routes all traffic to <code>R<\/code>.<\/li>\n<li>Shortly followed by <code>M<\/code> recovering (since no more writes are sent its way), claiming <code>read_only=0<\/code>.<\/li>\n<li>The proxy now sees two servers reporting as healthy and with <code>read_only=0<\/code>.<\/li>\n<\/ul>\n<p>Again, the proxy has no context, and neither do <code>M<\/code> and <code>R<\/code>, for that matter. The context (the fact we failed over from <code>M<\/code> to <code>R<\/code>) was known to our failover tool, but was lost along the way.<\/p>\n<h3>Planned failover illustration<\/h3>\n<p>We wish to replace the master, for maintenance reasons. We successfully and gracefully promote <code>R<\/code>.<\/p>\n<ul>\n<li><code>M<\/code> is available and responsive, we set it to <code>read_only=1<\/code>.<\/li>\n<li>We set <code>R<\/code> to <code>read_only=0<\/code>.<\/li>\n<li>All new connections route to <code>R<\/code>.<\/li>\n<li>We should also instruct our Proxy to kill all previous connections to <code>M<\/code>.<\/li>\n<\/ul>\n<p>This works very nicely.<\/p>\n<h3>Discussion<\/h3>\n<p>There is a substantial risk to this method. Correlation between failover and network partitioning\/load (illustrations #2 and #3) is reasonable.<\/p>\n<p>The root of the problem is that we expect individual servers to resolve conflicts without speaking to each other: we expect the MySQL servers to correctly claim &#8220;I&#8217;m the master&#8221; without context.<\/p>\n<p>We then add to that problem by using the proxy to &#8220;pick a side&#8221; without giving it any context, either.<\/p>\n<h3>Sample orchestrator configuration<\/h3>\n<p>By way of discouraging use of this method I do not present an <code>orchestrator<\/code> configuration file.<\/p>\n<h3>All posts in this series<\/h3>\n<ul>\n<li><a href=\"http:\/\/code.openark.org\/blog\/mysql\/mysql-master-discovery-methods-part-1-dns\">MySQL master discovery methods, part 1: DNS<\/a><\/li>\n<li><a href=\"http:\/\/code.openark.org\/blog\/mysql\/mysql-master-discovery-methods-part-2-vip-dns\">MySQL master discovery methods, part 2: VIP &amp; DNS<\/a><\/li>\n<li><a href=\"http:\/\/code.openark.org\/blog\/mysql\/mysql-master-discovery-methods-part-3-app-service-discovery\">MySQL master discovery methods, part 3: app &amp; service discovery<\/a><\/li>\n<li><a href=\"http:\/\/code.openark.org\/blog\/mysql\/mysql-master-discovery-methods-part-4-proxy-heuristics\">MySQL master discovery methods, part 4: Proxy heuristics<\/a><\/li>\n<li><a href=\"http:\/\/code.openark.org\/blog\/mysql\/mysql-master-discovery-methods-part-5-service-discovery-proxy\">MySQL master discovery methods, part 5: Service discovery &amp; Proxy<\/a><\/li>\n<li><a href=\"http:\/\/code.openark.org\/blog\/mysql\/mysql-master-discovery-methods-part-6-other-methods\">MySQL master discovery methods, part 6: other methods<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Note: the method described here is an anti pattern This is the fourth in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master. [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[62,108,8],"class_list":["post-7867","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-high-availability","tag-orchestrator","tag-replication"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-22T","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7867","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=7867"}],"version-history":[{"count":8,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7867\/revisions"}],"predecessor-version":[{"id":7905,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7867\/revisions\/7905"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=7867"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=7867"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=7867"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}