{"id":5008,"date":"2012-06-27T07:25:09","date_gmt":"2012-06-27T05:25:09","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=5008"},"modified":"2012-06-27T07:47:38","modified_gmt":"2012-06-27T05:47:38","slug":"delete-dont-insert","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/delete-dont-insert","title":{"rendered":"DELETE, don&#8217;t INSERT"},"content":{"rendered":"<p>Have just read <a href=\"http:\/\/blog.9minutesnooze.com\/insert-delete\/\">INSERT, Don\u2019t DELETE<\/a> by Aaron Brown, and have some lengthy response, which is why I write this post instead of commenting on said post.<\/p>\n<p>I wish to offer my counter thought and suggest that <strong>DELETE<\/strong>s are probably the better choice.<\/p>\n<p>Aaron suggests that, when one wishes to purge rows from some table, a trick can be used: instead of <strong>DELETE<\/strong>ing unwanted rows, one can <strong>INSERT<\/strong> &#8220;good&#8221; rows into a new table, then switch over with <strong>RENAME<\/strong> (but please read referenced post for complete details).<\/p>\n<p>I respectfully disagree on several points discussed.<\/p>\n<h4>Lockdown<\/h4>\n<p>The fact one needs to block writes during the time of creation of new table is problematic: you need to essentially turn off parts of your application. The posts suggests one could use a slave &#8211; but this solution is far from being trivial as well. To switch over, you yet again need to turn off access to DB, even if for a short while.<!--more--><\/p>\n<p>A switch over to a slave is quite a big deal, in my opinion, for the mere purpose of deletion of rows.<\/p>\n<h4>DELETEs are easy<\/h4>\n<p>The DELETEs are so much easier: the first thing to note is the following: <em>You don&#8217;t actually have to delete all the rows *at once*<\/em>.<\/p>\n<p>You just need to drop some rows, right? Why waste a huge transaction that takes minutes, when you can drop the rows by chunks, one at a time?<br \/>\nFor that, you can use either <a href=\"http:\/\/www.percona.com\/doc\/percona-toolkit\/2.1\/pt-archiver.html\">pt-archive<\/a> from <em>Percona Toolkit<\/em>, <a href=\"http:\/\/openarkkit.googlecode.com\/svn\/trunk\/openarkkit\/doc\/html\/oak-chunk-update.html\">oak-chunk-update<\/a> from <em>openark-kit<\/em>, or write a simple <a href=\"http:\/\/common-schema.googlecode.com\/svn\/trunk\/common_schema\/doc\/html\/query_script.html\">QueryScript<\/a> code with <em>common_schema<\/em>:<\/p>\n<blockquote>\n<pre>while (DELETE FROM title WHERE title &lt;= 'g' LIMIT 1000)\r\n{\r\n\u00a0\u00a0throttle 1;\r\n}<\/pre>\n<\/blockquote>\n<p>So, drop <strong>1,000<\/strong> rows or so at a time, then sleep some time, etc. The total runtime is longer, but who cares? The impact can be reduced to be unnoticeable.<\/p>\n<h4>Space reclaim<\/h4>\n<p>You can use online table operations to rebuild your table and reclaim the disk space. Either see <a href=\"http:\/\/openarkkit.googlecode.com\/svn\/trunk\/openarkkit\/doc\/html\/oak-online-alter-table.html\">oak-online-alter-table<\/a> or <a href=\"http:\/\/www.percona.com\/doc\/percona-toolkit\/2.1\/pt-online-schema-change.html\">pt-online-schema-change<\/a>. Again, both work in small chunks, so no long stalls.<\/p>\n<p>But more on this: my usual purge scenario shows that it is repetitive. You purge, data fills again, you purge again, and so on.<\/p>\n<p>Which is why it doesn&#8217;t make much sense to rebuild the table and reclaim the disk space: it just grows again to roughly same dimensions.<br \/>\nFor a one time operation (e.g. after neglect of cleanup for long time) &#8212; yes, absolutely, do a rebuild and reclaim. For repetitive cleanup &#8211; I don&#8217;t bother.<\/p>\n<h4>Conclusion<\/h4>\n<p>Aaron does make note at the end of his post that <strong>DELETE<\/strong> operations can be done online, while the <strong>INSERT<\/strong> trick requires downtime, and this is a fair assessment.<\/p>\n<p>But just to make a point: none of the <strong>DELETE<\/strong> timings are interesting. Since we are not concerned with deleting the rows in a given time (no &#8220;press the red button&#8221;), we can spread them over time and make the impact negligible. So not only is everything done online, it also goes unnoticed by the user. And this, I believe, is the major thing to consider.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Have just read INSERT, Don\u2019t DELETE by Aaron Brown, and have some lengthy response, which is why I write this post instead of commenting on said post. I wish to offer my counter thought and suggest that DELETEs are probably the better choice. Aaron suggests that, when one wishes to purge rows from some table, [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[67,34,52,76],"class_list":["post-5008","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-common_schema","tag-openark-kit","tag-performance","tag-queryscript"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1iM","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/5008","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=5008"}],"version-history":[{"count":12,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/5008\/revisions"}],"predecessor-version":[{"id":5020,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/5008\/revisions\/5020"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=5008"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=5008"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=5008"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}