{"id":246,"date":"2008-12-01T18:16:51","date_gmt":"2008-12-01T16:16:51","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=246"},"modified":"2008-12-13T06:33:43","modified_gmt":"2008-12-13T04:33:43","slug":"selecting-a-specific-non-aggregated-column-data-in-group-by","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/selecting-a-specific-non-aggregated-column-data-in-group-by","title":{"rendered":"Selecting a specific non aggregated column data in GROUP BY"},"content":{"rendered":"<p>In a GROUP BY query, MySQL may allow specifying non aggregated columns. For example, using <a title=\"Setting up the world database\" href=\"http:\/\/dev.mysql.com\/doc\/world-setup\/en\/world-setup.html\">MySQL&#8217;s world database<\/a>, I wish to get the number of countries per continent, along with a &#8220;sample&#8221; country:<\/p>\n<blockquote>\n<pre><strong>SELECT <\/strong>Continent, <strong>COUNT<\/strong>(*), Name\r\n<strong>FROM <\/strong>`Country` <strong>GROUP BY <\/strong>Continent\r\n\r\n+---------------+----------+----------------+\r\n| Continent     | COUNT(*) | Name           |\r\n+---------------+----------+----------------+\r\n| Asia          |       51 | Afghanistan    |\r\n| Europe        |       46 | Albania        |\r\n| North America |       37 | Aruba          |\r\n| Africa        |       58 | Angola         |\r\n| Oceania       |       28 | American Samoa |\r\n| Antarctica    |        5 | Antarctica     |\r\n| South America |       14 | Argentina      |\r\n+---------------+----------+----------------+<\/pre>\n<\/blockquote>\n<p>What if I want to choose that &#8220;sample&#8221; country? For example, for each continent, I wish to show the country with the largest population. To simply see the largest population, I would use MAX(Population). But which country is referred? I wish to provide a solution which does not involve sub-queries, HAVING or JOINs.<!--more--><\/p>\n<p>We begin by observing MySQL&#8217;s GROUP_CONCAT() fcuntion:<\/p>\n<blockquote>\n<pre><strong>SELECT <\/strong>Continent, <strong>COUNT<\/strong>(*), <strong>GROUP_CONCAT<\/strong>(Name)\r\n<strong>FROM <\/strong>`Country` <strong>GROUP BY <\/strong>Continent\r\n\r\n+---------------+----------+-------------------------------+\r\n| Continent     | COUNT(*) | GROUP_CONCAT(Name)            |\r\n+---------------+----------+-------------------------------+\r\n| Asia          |       51 | Sri Lanka,South Korea,Mald... |\r\n| Europe        |       46 | Iceland,Faroe Islands,Fran... |\r\n| North America |       37 | Martinique,Mexico,Puerto R... |\r\n| Africa        |       58 | Zambia,South Africa,Nigeri... |\r\n| Oceania       |       28 | Tokelau,American Samoa,Sam... |\r\n| Antarctica    |        5 | South Georgia and the Sout... |\r\n| South America |       14 | Suriname,Argentina,Ecuador... |\r\n+---------------+----------+-------------------------------+<\/pre>\n<\/blockquote>\n<p>Mmmm. The order by which the names are presented does not seem to be of use to me. Didn&#8217;t we say we want to see the country with the largest population? It just so happens that GROUP_CONCAT allows for ORDER BY:<\/p>\n<blockquote>\n<pre><strong>SELECT <\/strong>Continent, <strong>COUNT<\/strong>(*),\r\n  <strong>GROUP_CONCAT<\/strong>(Name <strong>ORDER BY <\/strong>Population <strong>DESC<\/strong>)\r\n<strong>FROM <\/strong>`Country` <strong>GROUP BY <\/strong>Continent\r\n\r\n+---------------+----------+-------------------------------------------------------+\r\n| Continent     | COUNT(*) | GROUP_CONCAT(Name ORDER BY Population DESC)           |\r\n+---------------+----------+-------------------------------------------------------+\r\n| Asia          |       51 | China,India,Indonesia,Pakistan,Bangladesh,Japan,Vi... |\r\n| Europe        |       46 | Russian Federation,Germany,United Kingdom,France,I... |\r\n| North America |       37 | United States,Mexico,Canada,Guatemala,Cuba,Dominic... |\r\n| Africa        |       58 | Nigeria,Egypt,Ethiopia,Congo, The Democratic Repub... |\r\n| Oceania       |       28 | Australia,Papua New Guinea,New Zealand,Fiji Island... |\r\n| Antarctica    |        5 | Heard Island and McDonald Islands,Antarctica,Bouve... |\r\n| South America |       14 | Brazil,Colombia,Argentina,Peru,Venezuela,Chile,Ecu... |\r\n+---------------+----------+-------------------------------------------------------+<\/pre>\n<\/blockquote>\n<p>Good progress! We can now see our desired countries first in list. What&#8217;s next? We want to isolate the first country for each continent. SUBSTRING_INDEX() comes to the rescue. Finally, we reach the following query:<\/p>\n<blockquote>\n<pre><strong>SELECT <\/strong>Continent, <strong>COUNT<\/strong>(*),\r\n  <strong>SUBSTRING_INDEX<\/strong>(\r\n    <strong>CONCAT<\/strong>(\r\n      <strong>GROUP_CONCAT<\/strong>(Name <strong>ORDER BY <\/strong>Population <strong>DESC<\/strong>),\r\n      ','),\r\n    ',', 1) <strong>AS <\/strong>largest_country\r\n<strong>FROM <\/strong>`Country` <strong>GROUP BY <\/strong>Continent\r\n\r\n+---------------+----------+-----------------------------------+\r\n| Continent     | COUNT(*) | largest_country                   |\r\n+---------------+----------+-----------------------------------+\r\n| Asia          |       51 | China                             |\r\n| Europe        |       46 | Russian Federation                |\r\n| North America |       37 | United States                     |\r\n| Africa        |       58 | Nigeria                           |\r\n| Oceania       |       28 | Australia                         |\r\n| Antarctica    |        5 | Heard Island and McDonald Islands |\r\n| South America |       14 | Brazil                            |\r\n+---------------+----------+-----------------------------------+<\/pre>\n<\/blockquote>\n<p>This query provides us with the right answer. No sub-selects, no joins required. Granted, it&#8217;s not too pretty, and it&#8217;s not clean SQL (it&#8217;s not even <em>standard<\/em> SQL) &#8211; but it just may save some query time: there&#8217;s an index to add which will cause this query to run &#8220;using index&#8221;:<\/p>\n<blockquote>\n<pre><strong>ALTER TABLE <\/strong>Country <strong>ADD INDEX <\/strong>`Continent_Name_Population` (`Continent`,`name`,`Population`)<\/pre>\n<\/blockquote>\n<h4>Notes<\/h4>\n<ul>\n<li>The above assumes the comma (&#8216;,&#8217;) does not appear in the text of the column. Otherwise, you can choose your own delimiter in GROUP_CONCAT().<\/li>\n<li>The concatenated string can grow long if many rows are aggregated per group. Check out <a title=\"MySQL docs\" href=\"http:\/\/dev.mysql.com\/doc\/refman\/5.0\/en\/server-system-variables.html#sysvar_group_concat_max_len\">group_concat_max_len<\/a>, which limits the length of the GROUP_CONCAT string. Also see<code class=\"literal\"> max_allowed_packet<\/code>, or read the whole deal on the <a title=\"MySQL docs\" href=\"http:\/\/dev.mysql.com\/doc\/refman\/5.0\/en\/group-by-functions.html#function_group-concat\">MySQL online documentation<\/a>.<\/li>\n<\/ul>\n<p>[<strong>UPDATE<\/strong>: Turns out an anonymous comment to <a href=\"http:\/\/kristiannielsen.livejournal.com\/6745.html\">Kristian Nielsen&#8217;s post<\/a> suggested the same solution, about a week ago. Moreover, it suggests that the CONCAT() part in the above solution is not required, as SUBSTRING_INDEX does not require the delimiter to be present]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In a GROUP BY query, MySQL may allow specifying non aggregated columns. For example, using MySQL&#8217;s world database, I can get the number of countries per continent, along with a &#8220;sample&#8221; country.<br \/>\nWhat if I want to choose that &#8220;sample&#8221; country? For example, for each continent, I wish to show the country with the largest population. To simply see the largest population, I would use MAX(Population). But which country is referred? I wish to provide a solution which does not involve sub-queries, HAVING or JOINs.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[21,20],"class_list":["post-246","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-sql","tag-syntax"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-3Y","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/246","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=246"}],"version-history":[{"count":15,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/246\/revisions"}],"predecessor-version":[{"id":267,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/246\/revisions\/267"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=246"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=246"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=246"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}