{"id":7577,"date":"2016-07-07T14:54:25","date_gmt":"2016-07-07T12:54:25","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=7577"},"modified":"2016-07-07T14:54:25","modified_gmt":"2016-07-07T12:54:25","slug":"solving-the-non-atomic-table-swap-take-iii-making-it-atomic","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/solving-the-non-atomic-table-swap-take-iii-making-it-atomic","title":{"rendered":"Solving the non-atomic table swap, Take III: making it atomic"},"content":{"rendered":"<p>With the unintended impression of becoming live blogging, we now follow up on <a title=\"Link to Solving the non-atomic table swap, Take II\" href=\"http:\/\/code.openark.org\/blog\/mysql\/solving-the-non-atomic-table-swap-take-ii\" rel=\"bookmark\">Solving the non-atomic table swap, Take II<\/a> and <a title=\"Link to Solving the Facebook-OSC non-atomic table swap problem\" href=\"http:\/\/code.openark.org\/blog\/mysql\/solving-the-facebook-osc-non-atomic-table-swap-problem\" rel=\"bookmark\">Solving the Facebook-OSC non-atomic table swap problem<\/a> with a safe, blocking, <em>atomic<\/em> solution<\/p>\n<h3>Why yet another iteration?<\/h3>\n<p>The solution presented in <a title=\"Link to Solving the non-atomic table swap, Take II\" href=\"http:\/\/code.openark.org\/blog\/mysql\/solving-the-non-atomic-table-swap-take-ii\" rel=\"bookmark\">Solving the non-atomic table swap, Take II<\/a> was good, in that it was safe. No data corruption. Optimistic: if no connection is killed throughout the process, then completely blocking.<\/p>\n<p>Two outstanding issues remained:<\/p>\n<ul>\n<li>If something did go wrong, the solution reverted to a table-outage<\/li>\n<li>On replicas, the table swap is non atomic, non blocking. There&#8217;s table-outage scenario on replica.<\/li>\n<\/ul>\n<p>As it turns out, there&#8217;s a simpler solution which overcomes both the above. As with math and physics, the simpler solution is often the preferred one. But it took those previous iterations to gather a few ideas together. So, anyway:<\/p>\n<h3>Safe, locking, atomic, asynchronous table swap<\/h3>\n<p>Do read the aforementioned previous posts; the quick-quick recap is: we want to be able to <strong>LOCK<\/strong> a table <strong>tbl<\/strong>, then do some stuff, then swap it out and put some <strong>ghost<\/strong> table in its place. MySQL does not allow us to <strong>rename tbl to tbl_old, ghost to tbl<\/strong> if we have locks on <strong>tbl<\/strong> in that session.<\/p>\n<p>The solution we offer is now based on two connections only (as opposed to three, in the <em>optimistic<\/em> approach). &#8220;Our&#8221; connections will be C10, C20. The &#8220;normal&#8221; app connections are C1..C9, C11..C19, C21..C29.<\/p>\n<ul>\n<li>Connections C1..C9 operate on <strong>tbl<\/strong> with normal DML: <strong>INSERT, UPDATE, DELETE<\/strong><\/li>\n<li>Connection C10: <strong>CREATE TABLE tbl_old (id int primary key) COMMENT=&#8217;magic-be-here&#8217;<\/strong><\/li>\n<li>Connection C10: <strong>LOCK TABLES tbl WRITE, tbl_old WRITE<\/strong><\/li>\n<li>Connections C11..C19, newly incoming, issue queries on <strong>tbl<\/strong> but are blocked due to the <strong>LOCK<\/strong><\/li>\n<li>Connection C20: <strong>RENAME TABLE tbl TO tbl_old, ghost TO tbl<\/strong><br \/>\nThis is blocked due to the <strong>LOCK<\/strong>, <em>but<\/em> gets prioritized on top connections C11..C19 and on top C1..C9 or any other connection that attempts DML on <strong>tbl<\/strong><\/li>\n<li>Connections C21..C29, newly incoming, issue queries on <strong>tbl<\/strong> but are blocked due to the <strong>LOCK<\/strong> and due to the <strong>RENAME<\/strong>, waiting in queue<\/li>\n<li>Connection C10: checks that C20&#8217;s <strong>RENAME<\/strong> is applied (looks for the blocked <strong>RENAME<\/strong> in processlist)<\/li>\n<li>Connection 10: <strong>DROP TABLE tbl_old<\/strong><br \/>\nNothing happens yet; <strong>tbl<\/strong> is still locked. All other connections still blocked.<\/li>\n<li>Connection 10: <strong>UNLOCK TABLES<br \/>\nBAM!<\/strong> The <strong>RENAME<\/strong> is first to execute, <strong>ghost<\/strong> table is swapped in place of <strong>tbl<\/strong>, then C1..C9, C11..C19, C21..C29 all get to operate on the new and shiny <strong>tbl<\/strong><\/li>\n<\/ul>\n<p>Some notes<!--more--><\/p>\n<ul>\n<li>We create <strong>tbl_old<\/strong> as a blocker for a premature swap<\/li>\n<li>It is allowed for a connection to <strong>DROP<\/strong> a table it has under a <strong>WRITE LOCK<\/strong><\/li>\n<li>A blocked <strong>RENAME<\/strong> is always prioritized over a blocked <strong>INSERT\/UPDATE\/DELETE<\/strong>, no matter who came first<\/li>\n<\/ul>\n<h3>What happens on failures?<\/h3>\n<p>Much fun. Just works; no rollback required.<\/p>\n<ul>\n<li>If C10 errors on the <strong>CREATE<\/strong> we do not proceed.<\/li>\n<li>If C10 errors on the <strong>LOCK<\/strong> statement, we do not proceed. The table is not locked. App continues to operate as normal.<\/li>\n<li>If C10 dies just as C20 is about to issue the <strong>RENAME<\/strong>:\n<ul>\n<li>The lock is released, the queries C1..C9, C11..C19 immediately operate on <strong>tbl<\/strong>.<\/li>\n<li>C20&#8217;s <strong>RENAME<\/strong> immediately fails because <strong>tbl_old<\/strong> exists.<br \/>\nThe entire operation is failed, but nothing terrible happens; some queries were blocked for some time is all. We will need to retry everything<\/li>\n<\/ul>\n<\/li>\n<li>If C10 dies while C20 is blocked on <strong>RENAME<\/strong>: Mostly similar to the above. Lock released, then C20 fails the <strong>RENAME<\/strong> (because <strong>tbl_old<\/strong> exists), then all queries resume normal operation<\/li>\n<li>If C20 dies before C10 drops the table, we catch the error and let C10 proceed as planned: <strong>DROP, UNLOCK<\/strong>. Nothing terrible happens, some queries were blocked for some time. We will need to retry<\/li>\n<li>If C20 dies just after C10 <strong>DROP<\/strong>s the table but before the unlock, same as above.<\/li>\n<li>If both C10 and C20 die, no problem: <strong>LOCK<\/strong> is cleared; <strong>RENAME<\/strong> lock is cleared. C1..C9, C11..C19, C21..C29 are free to operate on <strong>tbl<\/strong>.<\/li>\n<\/ul>\n<p>No matter what happens, at the end of operation we look for the <strong>ghost<\/strong> table. Is it still there? Then we know the operation failed, &#8220;atomically&#8221;. Is it not there? Then it has been renamed to <strong>tbl<\/strong>, and the operation worked atomically.<\/p>\n<p>A side note on failure is the matter of cleaning up the magic <strong>tbl_old<\/strong>. Here this is a matter of taste. Maybe just let it live and avoid recreating it, or you can drop it if you like.<\/p>\n<h3>Impact on app<\/h3>\n<p>App connections are guaranteed to be blocked, either until <strong>ghost<\/strong> is swapped in, or until operation fails. In the former, they proceed to operate on the new table. In the latter, they proceed to operate on the original table.<\/p>\n<h3>Impact on replication<\/h3>\n<p>Replication only sees the <strong>RENAME<\/strong>. There is no <strong>LOCK<\/strong> in the binary logs. Thus, replication sees an atomic two-table swap. There is no table-outage.<\/p>\n<h3>Conclusion<\/h3>\n<p>This solution satisfies all we wanted to achieve. We&#8217;re unlikely to give this another iteration. Well, if some yet-more-elegant solution comes along I&#8217;ll be tempted, for the beauty of it, but the solution offered in this post is simple-enough, safe, atomic, replication friendly, and should make everyone happy.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>With the unintended impression of becoming live blogging, we now follow up on Solving the non-atomic table swap, Take II and Solving the Facebook-OSC non-atomic table swap problem with a safe, blocking, atomic solution Why yet another iteration? The solution presented in Solving the non-atomic table swap, Take II was good, in that it was [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[57,121],"class_list":["post-7577","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-open-source","tag-operations"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1Yd","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7577","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=7577"}],"version-history":[{"count":8,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7577\/revisions"}],"predecessor-version":[{"id":7592,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7577\/revisions\/7592"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=7577"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=7577"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=7577"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}