Unwalking a string with GROUP_CONCAT

June 16, 2009

"Walking a string" is an SQL technique to convert a single value into multiple rows result set. For example, walking the string 'hello' results with 5 rows, each of which contains a single character from the text.

I'll present a brief example of walking a string, and then show how to "unwalk" the string: do the reverse operation.

To walk a string, an integers table is required (or this could be a good use for SeqEngine):

CREATE TABLE `int_table` (
  `int_col` int(11) NOT NULL,
  PRIMARY KEY  (`int_col`)

-- ...
-- INSERTS follow here
-- ...

mysql> SELECT * FROM int_table;
| int_col |
|       0 |
|       1 |
|       2 |
|       3 |
|       4 |
|       5 |
|       6 |
|       7 |
|       8 |
|       9 |
10 rows in set (0.00 sec)

To convert a string to rows of characters, we join the text with the integers table (we assume there are enough numbers for covering the length of the text):

mysql> SELECT
         SUBSTRING(s, int_col+1, 1) AS c
       FROM int_table, (SELECT 'hello' AS s) sel1
       WHERE int_col < char_length(s);
| c |
| h |
| e |
| l |
| l |
| o |
5 rows in set (0.00 sec)

More on this can be found in the excellent SQL Cookbook.

Unwalking the string

Doing the inverse action - combining the string back from the multiple rows, can be easily done using GROUP_CONCAT. It's interesting to learn that GROUP_CONCAT does not actually require any GROUP BY clause. When no such clause is provided in the SQL query, all searched rows are used.

Let's assume now that we have a table of character values, which we want to concatenate back to a complete string. We can easily build this table:

CREATE TABLE characters AS
    SUBSTRING(s, int_col+1, 1) AS c
  FROM int_table, (SELECT 'hello' AS s) sel1
  WHERE int_col < char_length(s);

To reconstruct the text, we simply use MySQL's GROUP_CONCAT with an empty separator:

mysql> SELECT GROUP_CONCAT(c separator '') AS s FROM characters;
| s     |
| hello |
1 row in set (0.00 sec)
  • Hi!

    nice one - GROUP_CONCAT is a friend of mine 😉

    Just caught this:

    "It’s interesting to learn that GROUP_CONCAT does not actually require any GROUP BY clause"

    Well, this is not specific for GROUP_CONCAT - all aggregate functions have this property. And this leads us to one of the biggest SQL WTF's ever: Both the presence of a GROUP BY clause as well as the presence of an aggregate function (any aggregate function) causes aggregation to occur.

  • Hi Roland!

    Perhaps it's a matter of semantics, but I'm not sure I understand what you mean.
    If I only GROUP BY, is that not considered an aggregation?
    For example, in MySQL I can SELECT a non-grouped column with a GROUP BY clause. The selected value is arbitrary, as far as I'm concerned, but can't it be said that it's aggregated?

    In math we would call this value a 'representer' of the group (I'm bending some rules here).

  • Hi Shlomi,

    Perhaps it was me that wasn't clear.

    In your original post, you mentioned that it was interesting that GROUP_CONCAT does not require a GROUP BY clause. I just pointed out that none of the aggregate functions require a GROUP BY clause, so GROUP_CONCAT is not different from the other aggregate functions in that respect.

    I do agree that applying only GROUP BY is aggregation. This brings us to the WTF:

    SQL provides two completely different constructs that cause aggregation (as in grouping of rows):
    - a GROUP BY clause
    - aggregate function

    So I'm just saying that IMO, it is strange that the presence of an aggregate function causes aggregation. It would make more sense to me if a grouping construct would first be required before you would be allowed to apply aggregate functions. So in my mind, it would make more sense to write this:

    FROM t

    instead of this

    FROM t

    (of course , GROUP BY NULL is strange, that's because the keywords were designed to always group according to an expression. That could be solved by making the [BY exprlist] part optional, in which case you would have to write GROUP to convey that you want all rows to be grouped into one row.)



  • RT

    Hi Shlomi,
    Can you please tell how you added that favicon to your blog...


  • Hi,

    Why, with GROUP_CONCAT, of course.
    You'll need to create the icon (created mine with TheGimp),
    name it favicon.ico and put it in your theme directory (so one icon per theme)

    Edit header.php and add the following inside your <head> tag:
    <link rel="shortcut icon" href="<?php bloginfo('template_directory'); ?>/favicon.ico" />

    That should do the work

  • I was just thinking of writing a SP.
    Thanks a lot for sharing this.
    I selected the int_col because I wanted the exact position of the character.

  • I forgot to mention one thing about group_concat. I am getting 54 strings as against expected 74 when I use group_concat. I do not have access to my.cnf file so I changed the session variable but it did not help.
    SET SESSION group_concat_max_len = 1000000;

  • Hi Shantanu,

    do you get any warnings? If group_concat_max_len really is too short to hold the result, you should get a warning:

    1 row in set, 1 warning (0.00 sec)

    mysql> show warnings;
    | Level | Code | Message |
    | Warning | 1260 | 1 line(s) were cut by GROUP_CONCAT() |

    If you want to be sure, don't set group_concat_max_len to a particular length, just set it to max_allowed_packet. This is the practical maximum of any string length and if you exceed that you will get an entirely different error message, see: http://dev.mysql.com/doc/refman/5.1/en/packet-too-large.html

  • Yes, I got exactly the same warning and after following your tip, I got the results. Thanks.

  • Pingback: Rotating SQL graphs horizontally | code.openark.org()

  • Pingback: Checking for string permutation | code.openark.org()

  • Pingback: SQL: selecting top N records per group | code.openark.org()

  • Anthony

    Nice article.

    The SET SESSION group_concat_max_len = 1000000 was exactly the info I required.

    I had trouble with this issue as follows:-

    I used group_concat() in isolation and no truncation occured.

    I used the same "isolated" group_concat() query as a subquery and joined to it. Then truncation occurred but NO WARNING/ERROR.

    I moved the subquery into a temp table and then joined on that. Then, the now infamous 1260 ERROR was thrown.

    Needless to say, problem solved!

Powered by Wordpress and MySQL. Theme by openark.org