{"id":5249,"date":"2012-08-21T06:49:43","date_gmt":"2012-08-21T04:49:43","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=5249"},"modified":"2012-08-21T09:03:00","modified_gmt":"2012-08-21T07:03:00","slug":"sql-selecting-top-n-records-per-group-another-solution","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/sql-selecting-top-n-records-per-group-another-solution","title":{"rendered":"SQL: selecting top N records per group, another solution"},"content":{"rendered":"<p>A while back I presented <a title=\"Permanent Link to SQL: selecting top N records per group\" href=\"http:\/\/code.openark.org\/blog\/mysql\/sql-selecting-top-n-records-per-group\" rel=\"bookmark\">SQL: selecting top N records per group<\/a>, a &#8220;give me the top <strong>5<\/strong> countries in each continent&#8221; type of query, and which used an external <em>numbers<\/em> table and a lot of tedious casting.<\/p>\n<p>Here&#8217;s another solution I came up with (<a href=\"#update\">*<\/a>). Still using <strong>GROUP_CONCAT<\/strong> (how else?), but no external table and no casting. The query outputs the largest <strong>5<\/strong> countries (by surface area) per continent.<\/p>\n<blockquote>\n<pre>SELECT\r\n\u00a0 Continent,\r\n\u00a0 Name,\r\n\u00a0 SurfaceArea,\r\n\u00a0 Population\r\nFROM\r\n\u00a0 world.Country,\r\n\u00a0 (\r\n\u00a0\u00a0\u00a0 SELECT \r\n\u00a0\u00a0\u00a0\u00a0\u00a0 GROUP_CONCAT(top_codes_per_group) AS top_codes\r\n\u00a0\u00a0\u00a0 FROM\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 (\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SUBSTRING_INDEX(GROUP_CONCAT(<strong>Code ORDER BY SurfaceArea DESC<\/strong>), ',', <strong>5<\/strong>) AS top_codes_per_group\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 world.Country\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 GROUP BY\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Continent\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 ) s_top_codes_per_group\r\n\u00a0 ) s_top_codes\r\nWHERE\r\n\u00a0 FIND_IN_SET(Code, top_codes)\r\nORDER BY\r\n\u00a0 Continent,\r\n\u00a0 SurfaceArea DESC\r\n;\r\n\r\n+---------------+----------------------------------------------+-------------+------------+\r\n| Continent     | Name                                         | SurfaceArea | Population |\r\n+---------------+----------------------------------------------+-------------+------------+\r\n| Asia          | China                                        |  9572900.00 | 1277558000 |\r\n| Asia          | India                                        |  3287263.00 | 1013662000 |\r\n| Asia          | Kazakstan                                    |  2724900.00 |   16223000 |\r\n| Asia          | Saudi Arabia                                 |  2149690.00 |   21607000 |\r\n| Asia          | Indonesia                                    |  1904569.00 |  212107000 |\r\n| Europe        | Russian Federation                           | 17075400.00 |  146934000 |\r\n| Europe        | Ukraine                                      |   603700.00 |   50456000 |\r\n| Europe        | France                                       |   551500.00 |   59225700 |\r\n| Europe        | Spain                                        |   505992.00 |   39441700 |\r\n| Europe        | Sweden                                       |   449964.00 |    8861400 |\r\n| North America | Canada                                       |  9970610.00 |   31147000 |\r\n| North America | United States                                |  9363520.00 |  278357000 |\r\n| North America | Greenland                                    |  2166090.00 |      56000 |\r\n| North America | Mexico                                       |  1958201.00 |   98881000 |\r\n| North America | Nicaragua                                    |   130000.00 |    5074000 |\r\n| Africa        | Sudan                                        |  2505813.00 |   29490000 |\r\n| Africa        | Algeria                                      |  2381741.00 |   31471000 |\r\n| Africa        | Congo, The Democratic Republic of the        |  2344858.00 |   51654000 |\r\n| Africa        | Libyan Arab Jamahiriya                       |  1759540.00 |    5605000 |\r\n| Africa        | Chad                                         |  1284000.00 |    7651000 |\r\n| Oceania       | Australia                                    |  7741220.00 |   18886000 |\r\n| Oceania       | Papua New Guinea                             |   462840.00 |    4807000 |\r\n| Oceania       | New Zealand                                  |   270534.00 |    3862000 |\r\n| Oceania       | Solomon Islands                              |    28896.00 |     444000 |\r\n| Oceania       | New Caledonia                                |    18575.00 |     214000 |\r\n| Antarctica    | Antarctica                                   | 13120000.00 |          0 |\r\n| Antarctica    | French Southern territories                  |     7780.00 |          0 |\r\n| Antarctica    | South Georgia and the South Sandwich Islands |     3903.00 |          0 |\r\n| Antarctica    | Heard Island and McDonald Islands            |      359.00 |          0 |\r\n| Antarctica    | Bouvet Island                                |       59.00 |          0 |\r\n| South America | Brazil                                       |  8547403.00 |  170115000 |\r\n| South America | Argentina                                    |  2780400.00 |   37032000 |\r\n| South America | Peru                                         |  1285216.00 |   25662000 |\r\n| South America | Colombia                                     |  1138914.00 |   42321000 |\r\n| South America | Bolivia                                      |  1098581.00 |    8329000 |\r\n+---------------+----------------------------------------------+-------------+------------+\r\n<\/pre>\n<\/blockquote>\n<p>In bold are the conditions by which we nominate our selected rows (condition is <strong>SurfaceArea DESC<\/strong>, number of rows is <strong>5<\/strong>, so 5 largest countries).<\/p>\n<h4><!--more-->What&#8217;s going on here?<\/h4>\n<p>So the inner <strong>s_top_codes_per_group<\/strong> query produces the codes for largest countries per continent:<\/p>\n<blockquote>\n<pre>+---------------------+\r\n| top_codes_per_group |\r\n+---------------------+\r\n| CHN,IND,KAZ,SAU,IDN |\r\n| RUS,UKR,FRA,ESP,SWE |\r\n| CAN,USA,GRL,MEX,NIC |\r\n| SDN,DZA,COD,LBY,TCD |\r\n| AUS,PNG,NZL,SLB,NCL |\r\n| ATA,ATF,SGS,HMD,BVT |\r\n| BRA,ARG,PER,COL,BOL |\r\n+---------------------+<\/pre>\n<\/blockquote>\n<p>The wrapping <strong>s_top_codes<\/strong> query concatenates all the above to one long text:<\/p>\n<blockquote>\n<pre>+---------------------------------------------------------------------------------------------------------------------------------------------+\r\n| top_codes\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\r\n+---------------------------------------------------------------------------------------------------------------------------------------------+\r\n| CHN,IND,KAZ,SAU,IDN,RUS,UKR,FRA,ESP,SWE,CAN,USA,GRL,MEX,NIC,SDN,DZA,COD,LBY,TCD,AUS,PNG,NZL,SLB,NCL,ATA,ATF,SGS,HMD,BVT,BRA,ARG,PER,COL,BOL |\r\n+---------------------------------------------------------------------------------------------------------------------------------------------+<\/pre>\n<\/blockquote>\n<p>And the final query simply demands that <strong>Code<\/strong> must be found within this string, by calling upon <strong>FIND_IN_SET(Code, top_codes)<\/strong>.<\/p>\n<h4>Notes<\/h4>\n<ul>\n<li>This solution works for <strong>PRIMARY KEY<\/strong>s or otherwise <strong>UNIQUE KEY<\/strong>s of all sorts (a <strong>CHAR(3)<\/strong> in our example, but same for integers etc.)<\/li>\n<li>And you still have to have a sufficient <strong>group_concat_max_len<\/strong> (see <a title=\"Those oversized, undersized variables defaults\" href=\"http:\/\/code.openark.org\/blog\/mysql\/those-oversized-undersized-variables-defaults\">this post<\/a>). You <em>must<\/em> have a large enough value to fit in the very long text you may be expecting in <strong>s_top_codes<\/strong>.<\/li>\n<li>Performance-wise there are full scans here, as well as string searching.<\/li>\n<\/ul>\n<p><a name=\"update\"><\/a><\/p>\n<h4>* UPDATE<\/h4>\n<p>I should pay closer attention. <a href=\"http:\/\/www.xaprb.com\/blog\/2006\/12\/07\/how-to-select-the-firstleastmax-row-per-group-in-sql\/#comment-13284\">This comment<\/a> had it <strong>5<\/strong> years ago.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A while back I presented SQL: selecting top N records per group, a &#8220;give me the top 5 countries in each continent&#8221; type of query, and which used an external numbers table and a lot of tedious casting. Here&#8217;s another solution I came up with (*). Still using GROUP_CONCAT (how else?), but no external table [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[74,21],"class_list":["post-5249","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-hack","tag-sql"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1mF","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/5249","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=5249"}],"version-history":[{"count":20,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/5249\/revisions"}],"predecessor-version":[{"id":5270,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/5249\/revisions\/5270"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=5249"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=5249"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=5249"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}