{"id":7384,"date":"2015-10-14T07:52:21","date_gmt":"2015-10-14T05:52:21","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=7384"},"modified":"2019-03-25T12:48:45","modified_gmt":"2019-03-25T10:48:45","slug":"leader-election-using-mysql","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/leader-election-using-mysql","title":{"rendered":"Leader election using MySQL"},"content":{"rendered":"<p>Being a stateful, centralized datastore, MySQL can serve in negotiating leadership:\u00a0a mechanism to elect a single service out of multiple services; moreover, a mechanism to promote a new leader should the existing leader cease to function.<\/p>\n<h3>What of Zookeeper?<\/h3>\n<p>Zookeeper\u00a0makes for an excellent leader election mechanism. This is one of the most recognized uses for Zookeeper. It has HA via multiple nodes &amp; quorum, plus ephemeral nodes: all you need. To achieve similar benefits with MySQL you&#8217;d need to use Galera or NDB Cluster; so why not use Zk? The use case at hand is <strong><a href=\"https:\/\/github.com\/outbrain\/orchestrator\">orchestrator<\/a><\/strong>, a multi-node, mostly stateless service that happens to use MySQL as its backend datastore. It relies on MySQL being present in the backend; it already <em>expects it to be there<\/em>. If the MySQL server is down, so, effectively, is the service. In such a case, adding another dependency on MySQL doesn&#8217;t hurt; this does not reduce HA. You need to take care of MySQL HA anyhow, so there&#8217;s no additional cost. 
In fact, going with Zookeeper incurs additional cost, as it introduces a new component into the system that can be avoided.<\/p>\n<h3>Terms of the solution<\/h3>\n<p>Our proposed solution offers:<\/p>\n<ul>\n<li>Single leader election out of multiple nodes<\/li>\n<li>Leader actively reaffirms its leadership periodically<\/li>\n<li>Timeout-based re-election: a new leader is elected when the current leader has not\u00a0reaffirmed its leadership for\u00a0X seconds<\/li>\n<li>A way to forcibly assume leadership for\u00a0a specific node<\/li>\n<li>A way to forcibly call for re-elections by demoting the existing leader<\/li>\n<li>A node\/service can easily tell whether it&#8217;s the leader or not<\/li>\n<li>Anyone can tell who the leader is<\/li>\n<\/ul>\n<h3>SQL solution<\/h3>\n<p>The solution is composed of a single table and a set of queries which implement the above requirements. We assume a service can uniquely identify itself; this is easy to achieve:<\/p>\n<ul>\n<li>If services are running from different hosts (as should be the case; this is service HA), use the hostname for ID\n<ul>\n<li>But what if the service restarts? Are you good with calling this &#8220;the same service&#8221;, or is this now a new service running on the same host?<\/li>\n<\/ul>\n<\/li>\n<li>In such a case, use a combination of hostname &amp; OS process ID\n<ul>\n<li>Or generate a random token upon startup<\/li>\n<li>Or use the startup timestamp<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Whichever solution you pick, make sure it is human readable, such that\u00a0it is easy to tell <em>which service\u00a0is the leader<\/em>. This helps operations. 
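<\/p>\n<p>For illustration only (a sketch of my own; the exact composition is arbitrary and not prescribed by the scheme), a service might compose such an ID at startup from hostname, process ID and startup time:<\/p>\n<pre><code class=\"bash\">#!\/bin\/bash \n# Compose a human-readable service_id out of hostname, PID and startup time. \n# Which components you combine is a matter of taste, per the options above. \nservice_id=\"$(hostname):$$:$(date +%Y%m%d%H%M%S)\" \necho \"$service_id\" \n<\/code><\/pre>\n<p>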
We note this as\u00a0<strong>service_id<\/strong>.<\/p>\n<h4>Table<\/h4>\n<p>The following table will have a single row; the <strong>service_id<\/strong> in that row is the active leader.<\/p>\n<pre><code class=\"sql\">CREATE TABLE service_election ( \n  anchor tinyint(3) unsigned NOT NULL, \n  service_id varchar(128) NOT NULL, \n  last_seen_active timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, \n  PRIMARY KEY (anchor) \n) ENGINE=InnoDB \n<\/code><\/pre>\n<h4>Attempt election<\/h4>\n<p><!--more--> All services issue the following, periodically. Say every service issues this once per second:<\/p>\n<pre><code class=\"sql\">insert ignore into service_election ( \n    anchor, service_id, last_seen_active \n  ) values ( \n    1, 'my_service_number_7', now() \n  ) \n  on duplicate key update \n    service_id = if(last_seen_active &lt; now() - interval 20 second, values(service_id), service_id), \n    last_seen_active = if(service_id = values(service_id), values(last_seen_active), last_seen_active) \n;\n<\/code><\/pre>\n<p>Replace <strong>&#8216;my_service_number_7&#8217;<\/strong> with the specific <strong>service_id<\/strong> per service. The above is the most complex query in our solution; breakdown:<\/p>\n<ul>\n<li>Assume the timeout is set to <strong>20<\/strong> seconds<\/li>\n<li>The first ever service (<strong>my_service_number_7<\/strong>)\u00a0succeeds in registering (inserting its own <strong>service_id<\/strong> into the table)<\/li>\n<li>The next time this service issues the query, it finds that it is already the owner, hence updates <strong>last_seen_active<\/strong>.\n<ul>\n<li>And while it is alive and keeps on polling, it\u00a0remains the leader.<\/li>\n<\/ul>\n<\/li>\n<li>Immediately following, another service (<strong>my_service_number_12<\/strong>) does not update anything: since\u00a0<strong>last_seen_active &lt; now() - interval 20 second<\/strong> is <strong>false<\/strong>,\u00a0<strong>service_id<\/strong> is unchanged, nor is 
<strong>last_seen_active<\/strong>.<\/li>\n<li>Assuming said service (<strong>my_service_number_7<\/strong>) has died and\u00a0hasn&#8217;t updated the table in the last <strong>20<\/strong> seconds, a different service will insert its own <strong>service_id<\/strong> as well as update <strong>last_seen_active<\/strong>. It will become the leader.<\/li>\n<\/ul>\n<h4>Forcibly assume leadership<\/h4>\n<p>Make a certain service the leader:<\/p>\n<pre><code class=\"sql\">replace into service_election ( anchor, service_id, last_seen_active ) values ( 1, 'my_service_number_12', now() ) \n<\/code><\/pre>\n<p>The next time\u00a0<strong>my_service_number_12<\/strong> attempts to register its own leadership, it finds out it&#8217;s <em>already<\/em> the leader and updates <strong>last_seen_active<\/strong>.<\/p>\n<h4>Force re-elections<\/h4>\n<p>The next service to attempt election will succeed after\u00a0this:<\/p>\n<pre><code class=\"sql\">delete from service_election;\n<\/code><\/pre>\n<h4>Am I the leader?<\/h4>\n<p>A service can query as follows:<\/p>\n<pre><code class=\"sql\">select count(*) as is_leader from service_election where anchor=1 and service_id='my_service_number_7';\n<\/code><\/pre>\n<h4>Who is the leader?<\/h4>\n<p>This is of course very simple:<\/p>\n<pre><code class=\"sql\">select max(service_id) as leader from service_election where anchor=1;\n<\/code><\/pre>\n<h3>What of GET_LOCK()?<\/h3>\n<p><a href=\"https:\/\/dev.mysql.com\/doc\/refman\/5.6\/en\/miscellaneous-functions.html#function_get-lock\">GET_LOCK()<\/a> allows one to acquire a helper-lock of an arbitrary name; this lock is reserved for as long as the connection that acquired it lives. No other connection can acquire a lock of the same name.<\/p>\n<pre><code class=\"sql\">SELECT GET_LOCK(\"my_service_leadership\", 0)\n<\/code><\/pre>\n<p>Returns <strong>1<\/strong>\u00a0on success, <strong>0<\/strong>\u00a0on failure to obtain the lock. This supposedly makes a much simpler\u00a0leader election mechanism. 
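<\/p>\n<p>As an aside, the companion functions behave as you&#8217;d expect (the lock name below is just the one from the example above): <strong>RELEASE_LOCK()<\/strong> frees the lock explicitly, and <strong>IS_USED_LOCK()<\/strong> returns the connection id of whichever connection currently holds it, or <strong>NULL<\/strong> if it is free. Note that a connection id alone still doesn&#8217;t tell you which <em>service<\/em> that is.<\/p>\n<pre><code class=\"sql\">SELECT RELEASE_LOCK(\"my_service_leadership\"); -- 1 if released, 0 if held by another connection \nSELECT IS_USED_LOCK(\"my_service_leadership\"); -- holder's connection id, or NULL if free \n<\/code><\/pre>\n<p>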
However, I don&#8217;t like it for the following reasons:<\/p>\n<ul>\n<li>You must allocate a dedicated\u00a0connection that lives for the duration of your service<\/li>\n<li>This doesn&#8217;t play too well with ORMs or typical connection pools<\/li>\n<li>There is no visibility (&lt; MySQL 5.7) into <em>who<\/em> is holding the lock. You only know\u00a0that it&#8217;s being held.<\/li>\n<li>If your service hangs (but does not die), the lock is still being held. There is no &#8220;keepalive&#8221; requirement on the lock&#8217;s side.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Being a stateful, centralized datastore, MySQL can serve in negotiating leadership:\u00a0a mechanism to elect a single service out of multiple services; moreover, a mechanism to promote a new leader should the existing leader cease to function. What of Zookeeper? Zookeeper\u00a0makes for an excellent leader election mechanism. This is one of the most recognized uses for [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[21],"class_list":["post-7384","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-sql"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1V6","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7384","targetHints
":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=7384"}],"version-history":[{"count":22,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7384\/revisions"}],"predecessor-version":[{"id":7967,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7384\/revisions\/7967"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=7384"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=7384"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=7384"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}