Unwalking a string with GROUP_CONCAT

June 16, 2009

"Walking a string" is an SQL technique to convert a single value into multiple rows result set. For example, walking the string 'hello' results with 5 rows, each of which contains a single character from the text.

I'll present a brief example of walking a string, and then show how to "unwalk" the string: do the reverse operation.

To walk a string, an integers table is required (or this could be a good use for SeqEngine):

CREATE TABLE `int_table` (
  `int_col` int(11) NOT NULL,
  PRIMARY KEY  (`int_col`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

-- ...
-- INSERTS follow here
-- ...

mysql> SELECT * FROM int_table;
+---------+
| int_col |
+---------+
|       0 |
|       1 |
|       2 |
|       3 |
|       4 |
|       5 |
|       6 |
|       7 |
|       8 |
|       9 |
+---------+
10 rows in set (0.00 sec)

To convert a string to rows of characters, we join the text with the integers table (we assume there are enough numbers for covering the length of the text):

mysql> SELECT
         SUBSTRING(s, int_col+1, 1) AS c
       FROM int_table, (SELECT 'hello' AS s) sel1
       WHERE int_col < char_length(s);
+---+
| c |
+---+
| h |
| e |
| l |
| l |
| o |
+---+
5 rows in set (0.00 sec)

More on this can be found in the excellent SQL Cookbook.

Unwalking the string

Doing the inverse action - combining the string back from the multiple rows, can be easily done using GROUP_CONCAT. It's interesting to learn that GROUP_CONCAT does not actually require any GROUP BY clause. When no such clause is provided in the SQL query, all searched rows are used.

Let's assume now that we have a table of character values, which we want to concatenate back to a complete string. We can easily build this table:

CREATE TABLE characters AS
  SELECT
    SUBSTRING(s, int_col+1, 1) AS c
  FROM int_table, (SELECT 'hello' AS s) sel1
  WHERE int_col < char_length(s);

To reconstruct the text, we simply use MySQL's GROUP_CONCAT with an empty separator:

mysql> SELECT GROUP_CONCAT(c separator '') AS s FROM characters;
+-------+
| s     |
+-------+
| hello |
+-------+
1 row in set (0.00 sec)

tags: ,
posted in MySQL by shlomi

« | »

Follow comments via the RSS Feed | Leave a comment | Trackback URL

13 Comments to "Unwalking a string with GROUP_CONCAT"

  1. Roland Bouman wrote:

    Hi!

    nice one - GROUP_CONCAT is a friend of mine ;)

    Just caught this:

    "It’s interesting to learn that GROUP_CONCAT does not actually require any GROUP BY clause"

    Well, this is not specific for GROUP_CONCAT - all aggregate functions have this property. And this leads us to one of the biggest SQL WTF's ever: Both the presence of a GROUP BY clause as well as the presence of an aggregate function (any aggregate function) causes aggregation to occur.

  2. shlomi wrote:

    Hi Roland!

    Perhaps it's a matter of semantics, but I'm not sure I understand what you mean.
    If I only GROUP BY, is that not considered an aggregation?
    For example, in MySQL I can SELECT a non-grouped column with a GROUP BY clause. The selected value is arbitrary, as far as I'm concerned, but can't it be said that it's aggregated?

    In math we would call this value a 'representer' of the group (I'm bending some rules here).

  3. Roland Bouman wrote:

    Hi Shlomi,

    Perhaps it was me that wasn't clear.

    In your original post, you mentioned that it was interesting that GROUP_CONCAT does not require a GROUP BY clause. I just pointed out that none of the aggregate functions require a GROUP BY clause, so GROUP_CONCAT is not different from the other aggregate functions in that respect.

    I do agree that applying only GROUP BY is aggregation. This brings us to the WTF:

    SQL provides two completely different constructs that cause aggregation (as in grouping of rows):
    - a GROUP BY clause
    - aggregate function

    So I'm just saying that IMO, it is strange that the presence of an aggregate function causes aggregation. It would make more sense to me if a grouping construct would first be required before you would be allowed to apply aggregate functions. So in my mind, it would make more sense to write this:

    SELECT COUNT(*)
    FROM t
    GROUP BY NULL

    instead of this

    SELECT COUNT(*)
    FROM t

    (of course , GROUP BY NULL is strange, that's because the keywords were designed to always group according to an expression. That could be solved by making the [BY exprlist] part optional, in which case you would have to write GROUP to convey that you want all rows to be grouped into one row.)

    HTH,

    Roland

  4. RT wrote:

    Hi Shlomi,
    Can you please tell how you added that favicon to your blog...

    RT.

  5. shlomi wrote:

    Hi,

    Why, with GROUP_CONCAT, of course.
    You'll need to create the icon (created mine with TheGimp),
    name it favicon.ico and put it in your theme directory (so one icon per theme)

    Edit header.php and add the following inside your <head> tag:
    <link rel="shortcut icon" href="<?php bloginfo('template_directory'); ?>/favicon.ico" />

    That should do the work

  6. Shantanu Oak wrote:

    I was just thinking of writing a SP.
    Thanks a lot for sharing this.
    I selected the int_col because I wanted the exact position of the character.

  7. Shantanu Oak wrote:

    I forgot to mention one thing about group_concat. I am getting 54 strings as against expected 74 when I use group_concat. I do not have access to my.cnf file so I changed the session variable but it did not help.
    SET SESSION group_concat_max_len = 1000000;

  8. Roland Bouman wrote:

    Hi Shantanu,

    do you get any warnings? If group_concat_max_len really is too short to hold the result, you should get a warning:

    1 row in set, 1 warning (0.00 sec)

    mysql> show warnings;
    +---------+------+--------------------------------------+
    | Level | Code | Message |
    +---------+------+--------------------------------------+
    | Warning | 1260 | 1 line(s) were cut by GROUP_CONCAT() |
    +---------+------+--------------------------------------+

    If you want to be sure, don't set group_concat_max_len to a particular length, just set it to max_allowed_packet. This is the practical maximum of any string length and if you exceed that you will get an entirely different error message, see: http://dev.mysql.com/doc/refman/5.1/en/packet-too-large.html

  9. Shantanu Oak wrote:

    Yes, I got exactly the same warning and after following your tip, I got the results. Thanks.

  10. Rotating SQL graphs horizontally | code.openark.org wrote:

    [...] Unwalking a string with GROUP_CONCAT [...]

  11. Checking for string permutation | code.openark.org wrote:

    [...] to business: how do we normalize a text using SQL? Well, once again, string walking and string unwalking to the rescue. The trick is to break the string apart (to distinct characters), then re-combine the [...]

  12. SQL: selecting top N records per group | code.openark.org wrote:

    [...] string walking. Examples for string walking (described in the excellent SQL Cookbook) can be found here, here and here. We’ll be using a numbers table: a simple table which lists ascending integer [...]

  13. Anthony wrote:

    Nice article.

    The SET SESSION group_concat_max_len = 1000000 was exactly the info I required.

    I had trouble with this issue as follows:-

    I used group_concat() in isolation and no truncation occured.

    I used the same "isolated" group_concat() query as a subquery and joined to it. Then truncation occurred but NO WARNING/ERROR.

    I moved the subquery into a temp table and then joined on that. Then, the now infamous 1260 ERROR was thrown.

    Needless to say, problem solved!

Leave Your Comment

 

 
Powered by Wordpress and MySQL. Theme by openark.org