{"id":3206,"date":"2011-01-10T10:24:44","date_gmt":"2011-01-10T08:24:44","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=3206"},"modified":"2011-01-10T12:59:11","modified_gmt":"2011-01-10T10:59:11","slug":"another-use-for-top-n-records-per-group-query","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/another-use-for-top-n-records-per-group-query","title":{"rendered":"Another use for &#8220;top N records per group&#8221; query"},"content":{"rendered":"<p>A few days ago I published <a href=\"http:\/\/code.openark.org\/blog\/mysql\/sql-selecting-top-n-records-per-group\">SQL: selecting top N records per group<\/a>. It just dawned on me that the very same query solved another type of problem I was having a couple years ago.<\/p>\n<blockquote><p>BTW, for reference, Baron Schwartz posted <a href=\"http:\/\/www.xaprb.com\/blog\/2006\/12\/07\/how-to-select-the-firstleastmax-row-per-group-in-sql\/\">this<\/a> 4 years ago. There are very interesting approaches in text and in comments. Good to see the ancients already knew of such problems, I should study my history better.<\/p>\n<p>(Kidding, kidding! This is self criticism of course)<\/p><\/blockquote>\n<p>In this case I present now I have a table with recurring data, which, to some extent, represents revision of data. For the sake of simplicity let&#8217;s describe it as a simple simulation of a revision system:<\/p>\n<ul>\n<li>I have text files, whose content I store within a row<\/li>\n<li>Each text file uses at least one row in the table<\/li>\n<li>Text files can be edited, whereas an edited file is written in a new row (never UPDATEd).<\/li>\n<\/ul>\n<p>Hold your horses: I&#8217;m not really implementing a revision system, I just can&#8217;t get into the actual details here.<\/p>\n<p>The table becomes large quickly, and it&#8217;s desired to purge rows from the table. The rule is: we must obtain the most recent two versions for each file (if there are indeed more than one). All others can be purged. So the question is:<!--more--><\/p>\n<blockquote><p>Which are all the 3rd newest, 4th newst, &#8230;, older, &#8230;, oldest revisions per file?<\/p><\/blockquote>\n<p>To use the example from the aforementioned post:<\/p>\n<blockquote><p>Which are the 3rd largest, 4th largest, &#8230;, smallest countries in each continent?<\/p><\/blockquote>\n<p>Assuming <strong>group_concat_max_len<\/strong> allows, and enough numbers are available, all we need to do is change this condition:<\/p>\n<blockquote>\n<pre>WHERE\r\n  tinyint_asc.value &gt;= 1 AND tinyint_asc.value &lt;= 5\r\n<\/pre>\n<\/blockquote>\n<p>into:<\/p>\n<blockquote>\n<pre>WHERE\r\n  tinyint_asc.value &gt;= 3\r\n<\/pre>\n<\/blockquote>\n<h4>More memory, more CPU<\/h4>\n<p>The original query limited the number of resulting rows (only 5 per group). There&#8217;s no such limitation now; there could be <strong>1,000<\/strong> revision per file, there could be more. So this latest query could incur more overhead. Back to the original problem, if I do the purging frequently enough, this shouldn&#8217;t be a problem.<\/p>\n<p>Right. Now to the task of revisiting old, forgotten code \ud83d\ude42<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A few days ago I published SQL: selecting top N records per group. It just dawned on me that the very same query solved another type of problem I was having a couple years ago. BTW, for reference, Baron Schwartz posted this 4 years ago. There are very interesting approaches in text and in comments. [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[21],"class_list":["post-3206","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-sql"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-PI","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/3206","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=3206"}],"version-history":[{"count":20,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/3206\/revisions"}],"predecessor-version":[{"id":3248,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/3206\/revisions\/3248"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=3206"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=3206"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=3206"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}