{"id":4245,"date":"2011-10-27T19:20:30","date_gmt":"2011-10-27T17:20:30","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=4245"},"modified":"2011-11-24T20:28:36","modified_gmt":"2011-11-24T18:28:36","slug":"contest-for-glory-write-a-self-throttling-mysql-query","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/contest-for-glory-write-a-self-throttling-mysql-query","title":{"rendered":"Contest for Glory: write a self throttling MySQL query"},"content":{"rendered":"<p>What&#8217;s all this about?<\/p>\n<p>I&#8217;ve you&#8217;ve <a href=\"http:\/\/code.openark.org\/blog\/mysql\/slides-from-my-talk-programmatic-queries-things-you-can-code-with-sql\">been to my talk<\/a> in London, then you&#8217;ve already got this as homework. If not, allow me to fill in the details:<\/p>\n<p>I was speaking about MySQL&#8217;s programmatic nature in many aspects (best if you read the slides!). We discussed user defined variables, derived tables, ordering constructs, order of evaluation, time suspension and time calculation.<\/p>\n<p>An issue I presented was that of a very long running query. Say it runs for <strong>20<\/strong> minutes. It kills your I\/O. Do you <strong>KILL<\/strong> the query or wait? It <em>could<\/em> terminate in <strong>5<\/strong> seconds from now, and if you kill it now, you lose everything. But it may yet run for <strong>3<\/strong> more hours!<\/p>\n<p>We discussed a futuristic feature where the query would gracefully terminate after some designated time. However, futuristic wishes do not help us.<\/p>\n<h4>A self throttling query<\/h4>\n<p>I suggested the idea for a self throttling query. We know how to throttle writing queries, such as <strong>DELETE<\/strong> queries: we break them into small chunks, then work each chunk at a time, setting ourselves to sleep in between chunks. This is how <strong>&#8211;sleep<\/strong> and <strong>&#8211;sleep-ratio<\/strong> work in <a href=\"http:\/\/openarkkit.googlecode.com\/svn\/trunk\/openarkkit\/doc\/html\/oak-chunk-update.html\">oak-chunk-update<\/a>. It is how <strong>&#8211;sleep<\/strong> and<strong> &#8211;sleep-coef<\/strong> work in <a href=\"http:\/\/www.percona.com\/doc\/percona-toolkit\/pt-archiver.html\">pt-archiver<\/a>.<\/p>\n<p>But can the same be done for <strong>SELECT<\/strong> queries?<!--more--><\/p>\n<p>The answer is: yes. And it can be done from within the query itself. And it&#8217;s your job to do it.<\/p>\n<p>So here are the rules:<\/p>\n<ul>\n<li>For simplicity, assume the <strong>SELECT<\/strong> executes over a single large table.<\/li>\n<li>We assume no <strong>GROUP BY<\/strong>, <strong>DISTINCT<\/strong> and the like.<\/li>\n<li>There can be <strong>WHERE<\/strong>, <strong>ORDER BY<\/strong> etc.<\/li>\n<\/ul>\n<p>As an example:<\/p>\n<blockquote>\n<pre>SELECT rental_id, TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days FROM sakila.rental<\/pre>\n<\/blockquote>\n<p>Here is the problem:<\/p>\n<ul>\n<li>The query takes too much disk I\/O<\/li>\n<\/ul>\n<p>Suggestion: make the query self throttling. We&#8217;re going to make it run for a <em>longer<\/em> time, allowing for non-I\/O periods within the query. We assume we are not bothered with locks.<\/p>\n<h4>Task #1<\/h4>\n<ul>\n<li>Write a self throttling query: rewrite the query above such that for every <strong>1,000<\/strong> rows, it goes to sleep for <strong>2<\/strong> seconds.<\/li>\n<\/ul>\n<h4>Task #2<\/h4>\n<ul>\n<li>Improve upon the previous solution: rewrite the query such that for every <strong>1,000<\/strong> rows, it goes to sleep for the same amount of time it took for those <strong>1,000<\/strong> rows to execute (effectively doubling the execution time of the query).<\/li>\n<\/ul>\n<h4>Advanced<\/h4>\n<ul>\n<li>There are issues with security, but in principal it is possible to have the query go to sleep when actual disk I\/O exceeds some value. Code it!<\/li>\n<\/ul>\n<h4>Prove it<\/h4>\n<p>devise a method to <em>prove<\/em> that sleeps do occur as required.<\/p>\n<h4>How to submit a solution?<\/h4>\n<ul>\n<li>Simplest is to add as comment on this post. Use <strong>&lt;pre&gt;&#8230;&lt;\/pre&gt;<\/strong> for formatted code.<\/li>\n<li>You may also email me directly.<\/li>\n<\/ul>\n<p>I will pick up the best solution according to correctness, elegance and time of submission.<\/p>\n<p>The winners get <em><strong>glory<\/strong><\/em>: their names to be mentioned on this blog. Shall we save the discussion on the exact font face and size for later on?<\/p>\n<p><em><strong>UPDATE:<\/strong> it should have been clear from my talk: no stored routines involved!<\/em><\/p>\n<p><em><strong>UPDATE<\/strong><\/em>: solution is on: <a href=\"http:\/\/code.openark.org\/blog\/mysql\/self-throttling-mysql-queries\">Self throttling MySQL queries<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>What&#8217;s all this about? I&#8217;ve you&#8217;ve been to my talk in London, then you&#8217;ve already got this as homework. If not, allow me to fill in the details: I was speaking about MySQL&#8217;s programmatic nature in many aspects (best if you read the slides!). We discussed user defined variables, derived tables, ordering constructs, order of [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[69,21],"class_list":["post-4245","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-community","tag-sql"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-16t","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/4245","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=4245"}],"version-history":[{"count":23,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/4245\/revisions"}],"predecessor-version":[{"id":4457,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/4245\/revisions\/4457"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=4245"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=4245"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=4245"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}