{"id":7865,"date":"2018-05-08T10:02:19","date_gmt":"2018-05-08T08:02:19","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=7865"},"modified":"2018-05-22T10:44:07","modified_gmt":"2018-05-22T08:44:07","slug":"mysql-master-discovery-methods-part-3-app-service-discovery","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/mysql-master-discovery-methods-part-3-app-service-discovery","title":{"rendered":"MySQL master discovery methods, part 3: app &#038; service discovery"},"content":{"rendered":"<p>This is the third in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree, and by which, upon master failover, it identifies and connects to the newly promoted master.<\/p>\n<p>These posts are not concerned with the manner in which replication failure detection and recovery take place. I will share <code>orchestrator<\/code>-specific configuration\/advice, and point out where a cross-DC <code>orchestrator\/raft<\/code> setup plays a part in discovery itself, but for the most part any recovery tool, such as <code>MHA<\/code>, <code>replication-manager<\/code>, <code>severalnines<\/code> or others, is applicable.<\/p>\n<p>We discuss asynchronous (or semi-synchronous) replication, a classic single-master-multiple-replicas setup. A later post will briefly discuss synchronous replication (Galera\/XtraDB Cluster\/InnoDB Cluster).<\/p>\n<h3>App &amp; service discovery<\/h3>\n<p><a href=\"http:\/\/code.openark.org\/blog\/mysql\/mysql-master-discovery-methods-part-1-dns\">Part 1<\/a> and <a href=\"http:\/\/code.openark.org\/blog\/mysql\/mysql-master-discovery-methods-part-2-vip-dns\">part 2<\/a> presented solutions where the app remained ignorant of the master&#8217;s identity. This part takes the opposite direction and gives the app ownership of master access.<\/p>\n<p>We introduce a service discovery component. 
Commonly known examples are <em>Consul<\/em>, <em>ZooKeeper<\/em> and <em>etcd<\/em>: highly available stores offering key\/value (K\/V) access, leader election or full-blown service discovery &amp; health checks.<\/p>\n<p>We satisfy ourselves with K\/V functionality. A key would be <code>mysql\/master\/cluster1<\/code> and a value would be the master&#8217;s hostname\/port.<\/p>\n<p>It is the app&#8217;s responsibility at all times to fetch the identity of the master of a given cluster by querying the service discovery component, and to open connections to the indicated master.<\/p>\n<p>The service discovery component is expected to be up at all times and to contain the identity of the master for any given cluster.<\/p>\n<p><!--more--><\/p>\n<h3>An unplanned failover illustration #1<\/h3>\n<p>Master <code>M<\/code> has died. <code>R<\/code> gets promoted in its place. Our recovery tool:<\/p>\n<ul>\n<li>Updates the service discovery component: key is <code>mysql\/master\/cluster1<\/code>, value is <code>R<\/code>&#8217;s hostname.<\/li>\n<\/ul>\n<p>Clients:<\/p>\n<ul>\n<li>Listen for K\/V changes and recognize that the master&#8217;s value has changed.<\/li>\n<li>Reconfigure\/refresh\/reload\/do whatever it takes to speak to the new master and to drop connections to the old master.<\/li>\n<\/ul>\n<h3>An unplanned failover illustration #2<\/h3>\n<p>Master <code>M<\/code> gets network-isolated for <code>10<\/code> seconds, during which time we fail over. <code>R<\/code> gets promoted. 
Our tool (as before):<\/p>\n<ul>\n<li>Updates the service discovery component: key is <code>mysql\/master\/cluster1<\/code>, value is <code>R<\/code>&#8217;s hostname.<\/li>\n<\/ul>\n<p>Clients (as before):<\/p>\n<ul>\n<li>Listen for K\/V changes and recognize that the master&#8217;s value has changed.<\/li>\n<li>Reconfigure\/refresh\/reload\/do whatever it takes to speak to the new master and to drop connections to the old master.<\/li>\n<li>Any clients that do not apply these changes in a timely manner keep using the old master <code>M<\/code>, which becomes reachable again once its network isolation ends.<\/li>\n<\/ul>\n<h3>Planned failover illustration<\/h3>\n<p>We wish to replace the master for maintenance reasons. We successfully and gracefully promote <code>R<\/code>.<\/p>\n<ul>\n<li>The app should start connecting to <code>R<\/code>.<\/li>\n<\/ul>\n<h3>Discussion<\/h3>\n<p>The app is the complete owner of master discovery. This raises a few concerns:<\/p>\n<ul>\n<li>How does a given app refresh and apply the change of master such that no stale connections are kept?\n<ul>\n<li>Highly concurrent apps may be more difficult to manage.<\/li>\n<\/ul>\n<\/li>\n<li>In a polyglot app setup, you will need all clients to use the same setup, implementing the same listen\/refresh logic in Ruby, golang, Java, Python, Perl and, notably, shell scripts.\n<ul>\n<li>The latter do not play well with such changes.<\/li>\n<\/ul>\n<\/li>\n<li>How can you validate that the change of master has been detected by all app nodes?<\/li>\n<\/ul>\n<p>As for the service discovery:<\/p>\n<ul>\n<li>What load will you be placing on your service discovery component?\n<ul>\n<li>I was familiar with a setup where there were so many apps, app nodes and app instances that the number of connections was too much for the service discovery component. 
In that setup, caching layers were introduced, and these brought their own consistency problems.<\/li>\n<\/ul>\n<\/li>\n<li>How do you handle a service discovery outage?\n<ul>\n<li>A reasonable approach is to keep using the last known master identity should service discovery be down. This, again, plays better with higher-level applications, but less so with scripts.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>It is worth noting that this setup does not suffer from geographical limitations on the master&#8217;s location. The master can be anywhere; the service discovery component merely points out where the master is.<\/p>\n<h3>Sample orchestrator configuration<\/h3>\n<p>The relevant settings in an <code>orchestrator<\/code> configuration would look like this:<\/p>\n<pre><code class=\"json\">{\n  \"ApplyMySQLPromotionAfterMasterFailover\": true,\n  \"KVClusterMasterPrefix\": \"mysql\/master\",\n  \"ConsulAddress\": \"127.0.0.1:8500\",\n  \"ZkAddress\": \"srv-a,srv-b:12181,srv-c\",\n  \"PostMasterFailoverProcesses\": [\n    \"\/just\/let\/me\/know about failover on {failureCluster}\"\n  ]\n}\n<\/code><\/pre>\n<p>In the above:<\/p>\n<ul>\n<li>If <code>ConsulAddress<\/code> is specified, <code>orchestrator<\/code> will update the given <em>Consul<\/em> setup with K\/V changes.<\/li>\n<li>As of <code>3.0.10<\/code>, <em>ZooKeeper<\/em> (via <code>ZkAddress<\/code>) is still not supported by <code>orchestrator<\/code>.<\/li>\n<li><code>PostMasterFailoverProcesses<\/code> is included here only to point out that hooks are not strictly required for the K\/V update to take place.<\/li>\n<\/ul>\n<p>See the <a href=\"https:\/\/github.com\/github\/orchestrator\/blob\/master\/docs\/configuration.md\">orchestrator configuration<\/a> documentation.<\/p>\n<h3>All posts in this series<\/h3>\n<ul>\n<li><a href=\"http:\/\/code.openark.org\/blog\/mysql\/mysql-master-discovery-methods-part-1-dns\">MySQL master discovery methods, part 1: DNS<\/a><\/li>\n<li><a href=\"http:\/\/code.openark.org\/blog\/mysql\/mysql-master-discovery-methods-part-2-vip-dns\">MySQL 
master discovery methods, part 2: VIP &amp; DNS<\/a><\/li>\n<li><a href=\"http:\/\/code.openark.org\/blog\/mysql\/mysql-master-discovery-methods-part-3-app-service-discovery\">MySQL master discovery methods, part 3: app &amp; service discovery<\/a><\/li>\n<li><a href=\"http:\/\/code.openark.org\/blog\/mysql\/mysql-master-discovery-methods-part-4-proxy-heuristics\">MySQL master discovery methods, part 4: Proxy heuristics<\/a><\/li>\n<li><a href=\"http:\/\/code.openark.org\/blog\/mysql\/mysql-master-discovery-methods-part-5-service-discovery-proxy\">MySQL master discovery methods, part 5: Service discovery &amp; Proxy<\/a><\/li>\n<li><a href=\"http:\/\/code.openark.org\/blog\/mysql\/mysql-master-discovery-methods-part-6-other-methods\">MySQL master discovery methods, part 6: other methods<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>This is the third in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master. 
These posts are not concerned with the manner by [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[62,108,8],"class_list":["post-7865","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-high-availability","tag-orchestrator","tag-replication"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-22R","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7865","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=7865"}],"version-history":[{"count":7,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7865\/revisions"}],"predecessor-version":[{"id":7906,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7865\/revisions\/7906"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=7865"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openar
k.org\/blog\/wp-json\/wp\/v2\/categories?post=7865"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=7865"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}