{"id":5034,"date":"2012-09-05T07:04:05","date_gmt":"2012-09-05T05:04:05","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=5034"},"modified":"2012-09-05T07:05:56","modified_gmt":"2012-09-05T05:05:56","slug":"table-split-for-the-masses","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/table-split-for-the-masses","title":{"rendered":"Table split(&#8230;) for the masses"},"content":{"rendered":"<p>(pun intended)<\/p>\n<p><em>common_schema<\/em>&#8216;s new <a href=\"http:\/\/common-schema.googlecode.com\/svn\/trunk\/common_schema\/doc\/html\/query_script_split.html\"><strong>split<\/strong><\/a> statement (see <a href=\"http:\/\/code.openark.org\/blog\/mysql\/common_schema-1-1-released-split-try-catch-killall-profiling\">release announcement<\/a>) auto-splits complex queries over large tables into smaller ones: instead of issuing one huge query, <em>split<\/em> breaks one&#8217;s query into smaller queries, each working on a different set of rows (a chunk).<\/p>\n<p>Thus, it is possible to avoid holding locks for long times, allowing for smaller transactions. It also makes for breathing space for the RDBMS, at times boosting operation speed, and at times prolonging operation speed at will.<\/p>\n<p>In this post I show how <em>split<\/em> exposes itself to the user, should the user wish so.<\/p>\n<p><em>split<\/em> can manage queries of the following forms:<\/p>\n<ul>\n<li>DELETE FROM table_name [WHERE]&#8230;<\/li>\n<li>DELETE FROM table_name USING &lt;multi table syntax&gt; [WHERE]&#8230;<\/li>\n<li>UPDATE table_name SET &#8230; [WHERE]&#8230;<\/li>\n<li>UPDATE &lt;multiple tables&gt; SET &#8230; [WHERE]&#8230;<\/li>\n<li>INSERT INTO some_table SELECT &#8230; FROM &lt;single or multiple tables&gt; [WHERE]&#8230;<\/li>\n<li>REPLACE INTO some_table SELECT &#8230; FROM &lt;single or multiple tables&gt; [WHERE]&#8230;<\/li>\n<li>SELECT &#8230; FROM &lt;multiple tables&gt; [WHERE]&#8230;<\/li>\n<\/ul>\n<p>The latter being a non-obvious one at first sight.<\/p>\n<h4>Basically, it&#8217; automatic<\/h4>\n<p>You just say:<\/p>\n<blockquote>\n<pre><strong>split<\/strong> (UPDATE sakila.inventory SET last_update = last_update + INTERVAL 6 HOUR)\r\n\u00a0 throttle 2;<\/pre>\n<\/blockquote>\n<p>And <em>split<\/em> identifies <strong>sakila.inventory<\/strong> as the table which needs to be split, and injects appropriate conditions so as to work on a subset of the rows, in multiple steps.<\/p>\n<p>By the way, here&#8217;s <a href=\"http:\/\/common-schema.googlecode.com\/svn\/trunk\/common_schema\/doc\/html\/query_script_execution.html\">how to execute a QueryScript code<\/a> like the above.<!--more--><\/p>\n<h4>But you can drive in manual mode<\/h4>\n<p>You can use the following syntax:<\/p>\n<blockquote>\n<pre><strong>split<\/strong> (sakila.inventory)\r\n{\r\n\u00a0 -- No action taken, but this block of code\r\n\u00a0 -- is executed per chunk of the table.\r\n\u00a0 -- I wonder what can be done here?\r\n}<\/pre>\n<\/blockquote>\n<p><em>split<\/em> provides with <em>magic variables<\/em>, which you can use in the action block. These are:<\/p>\n<ul>\n<li><strong>$split_step<\/strong>: <strong>1<\/strong>-based loop counter<\/li>\n<li><strong>$split_rowcount<\/strong>: number of rows affected in current chunk operation<\/li>\n<li><strong>$split_total_rowcount<\/strong>: total number of rows affected during this <em>split<\/em> statement<\/li>\n<li><strong>$split_total_elapsed_time<\/strong>: number of seconds elapsed since beginning of this <em>split<\/em> operation.<\/li>\n<li><strong>$split_clause<\/strong>: <em>the<\/em> magic variable: the filtering condition limiting rows to current chunk.<\/li>\n<li><strong>$split_table_schema<\/strong>: the explicit or inferred schema of split table<\/li>\n<li><strong>$split_table_name<\/strong>: the explicit or inferred table being split<\/li>\n<\/ul>\n<p>To illustrate, consider the following script:<\/p>\n<blockquote>\n<pre><strong>split<\/strong> (sakila.inventory)\r\n{\r\n\u00a0 select <strong>$split_step<\/strong> as step, <strong>$split_clause<\/strong> as clause;\r\n}<\/pre>\n<\/blockquote>\n<p>The output is this:<\/p>\n<blockquote>\n<pre>+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\r\n| step | clause\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\r\n+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\r\n|\u00a0\u00a0\u00a0 1 | ((((`inventory`.`inventory_id` &gt; '1')) OR ((`inventory`.`inventory_id` = '1'))) AND (((`inventory`.`inventory_id` &lt; '1000')) OR ((`inventory`.`inventory_id` = '1000')))) |\r\n+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\r\n\r\n+------+--------------------------------------------------------------------------------------------------------------------------------------+\r\n| step | clause\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\r\n+------+--------------------------------------------------------------------------------------------------------------------------------------+\r\n|\u00a0\u00a0\u00a0 2 | ((((`inventory`.`inventory_id` &gt; '1000'))) AND (((`inventory`.`inventory_id` &lt; '2000')) OR ((`inventory`.`inventory_id` = '2000')))) |\r\n+------+--------------------------------------------------------------------------------------------------------------------------------------+\r\n\r\n+------+--------------------------------------------------------------------------------------------------------------------------------------+\r\n| step | clause\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\r\n+------+--------------------------------------------------------------------------------------------------------------------------------------+\r\n|\u00a0\u00a0\u00a0 3 | ((((`inventory`.`inventory_id` &gt; '2000'))) AND (((`inventory`.`inventory_id` &lt; '3000')) OR ((`inventory`.`inventory_id` = '3000')))) |\r\n+------+--------------------------------------------------------------------------------------------------------------------------------------+\r\n\r\n+------+--------------------------------------------------------------------------------------------------------------------------------------+\r\n| step | clause\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\r\n+------+--------------------------------------------------------------------------------------------------------------------------------------+\r\n|\u00a0\u00a0\u00a0 4 | ((((`inventory`.`inventory_id` &gt; '3000'))) AND (((`inventory`.`inventory_id` &lt; '4000')) OR ((`inventory`.`inventory_id` = '4000')))) |\r\n+------+--------------------------------------------------------------------------------------------------------------------------------------+\r\n\r\n+------+--------------------------------------------------------------------------------------------------------------------------------------+\r\n| step | clause\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\r\n+------+--------------------------------------------------------------------------------------------------------------------------------------+\r\n|\u00a0\u00a0\u00a0 5 | ((((`inventory`.`inventory_id` &gt; '4000'))) AND (((`inventory`.`inventory_id` &lt; '4581')) OR ((`inventory`.`inventory_id` = '4581')))) |\r\n+------+--------------------------------------------------------------------------------------------------------------------------------------+<\/pre>\n<\/blockquote>\n<p>So you can get yourself a nice present: the SQL clause which filters the distinct chunks.<\/p>\n<h4>A simple demo: what can the user do with &#8220;manual mode&#8221;?<\/h4>\n<p>Normally, I would expect the user to use the automated version of <em>split<\/em>. Let it do the hard work! But sometimes, you may wish to take control into your hands.<\/p>\n<p>Consider an example: I wish to export a table into CSV file, but in chunks. <a href=\"http:\/\/www.percona.com\/doc\/percona-toolkit\/2.1\/pt-archiver.html\">pt-archiver<\/a> does that. But it is also easily achievable with <em>split<\/em>:<\/p>\n<blockquote>\n<pre><strong>split<\/strong> (sakila.inventory) {\r\n\u00a0 var <strong>$file_name<\/strong> := QUOTE(CONCAT('\/tmp\/inventory_chunk_', <strong>$split_step<\/strong>, '.csv'));\r\n\u00a0 select * from sakila.inventory WHERE <strong>:${split_clause}<\/strong> INTO OUTFILE <strong>:${file_name}<\/strong>;\r\n}<\/pre>\n<\/blockquote>\n<p>This script uses the powerful <a href=\"http:\/\/common-schema.googlecode.com\/svn\/trunk\/common_schema\/doc\/html\/query_script_variables.html\">variable expansion<\/a> feature of QueryScript: it extracts the text behind <strong><\/strong><strong>:${split_clause}<\/strong> and plants it as part of the query. It does the same for <strong><\/strong><strong>:${file_name}<\/strong>, making a variable possible where MySQL would normally disallow one (the <strong>INTO OUTFILE<\/strong> clause only accepts a constant string).<\/p>\n<p>What do we get as result?<\/p>\n<blockquote>\n<pre><strong>bash:\/tmp$ ls -s1 inventory_chunk_*<\/strong>\r\n32 inventory_chunk_1.csv\r\n32 inventory_chunk_2.csv\r\n32 inventory_chunk_3.csv\r\n32 inventory_chunk_4.csv\r\n20 inventory_chunk_5.csv<\/pre>\n<\/blockquote>\n<h4>Conclusion<\/h4>\n<p>During the past months, and even as I developed <em>split<\/em> for <a href=\"http:\/\/common-schema.googlecode.com\/svn\/trunk\/common_schema\/doc\/html\/query_script.html\">QueryScript<\/a>, I found myself using it more and more for my own purposes. As it evolved I realized how much more simple it makes these complex operations. Heck, it beats <a href=\"http:\/\/openarkkit.googlecode.com\/svn\/trunk\/openarkkit\/doc\/html\/oak-chunk-update.html\">oak-chunk-update<\/a> in its ease of use. They both have their place, but <em>split<\/em> is so much more intuitive and easy to write. And, no external scripts, no package dependencies.<\/p>\n<p>I suggest that <em>split<\/em> is a major tool for server side scripting, server maintenance, developer operations. <a href=\"http:\/\/code.google.com\/p\/common-schema\/\">Check it out<\/a>!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(pun intended) common_schema&#8216;s new split statement (see release announcement) auto-splits complex queries over large tables into smaller ones: instead of issuing one huge query, split breaks one&#8217;s query into smaller queries, each working on a different set of rows (a chunk). Thus, it is possible to avoid holding locks for long times, allowing for smaller [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[67,26,24,76,50],"class_list":["post-5034","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-common_schema","tag-indexing","tag-information_schema","tag-queryscript","tag-scripts"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1jc","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/5034","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=5034"}],"version-history":[{"count":48,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/5034\/revisions"}],"predecessor-version":[{"id":5346,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/5034\/revisions\/5346"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=5034"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=5034"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=5034"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}