Unwalking a string with GROUP_CONCAT

“Walking a string” is an SQL technique to convert a single value into multiple rows result set. For example, walking the string ‘hello’ results with 5 rows, each of which contains a single character from the text.

I’ll present a brief example of walking a string, and then show how to “unwalk” the string: do the reverse operation.

To walk a string, an integers table is required (or this could be a good use for SeqEngine):

CREATE TABLE `int_table` (
  `int_col` int(11) NOT NULL,
  PRIMARY KEY  (`int_col`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

-- ...
-- INSERTS follow here
-- ...

mysql> SELECT * FROM int_table;
+---------+
| int_col |
+---------+
|       0 |
|       1 |
|       2 |
|       3 |
|       4 |
|       5 |
|       6 |
|       7 |
|       8 |
|       9 |
+---------+
10 rows in set (0.00 sec)

To convert a string to rows of characters, we join the text with the integers table (we assume there are enough numbers for covering the length of the text):

mysql> SELECT
         SUBSTRING(s, int_col+1, 1) AS c
       FROM int_table, (SELECT 'hello' AS s) sel1
       WHERE int_col < char_length(s);
+---+
| c |
+---+
| h |
| e |
| l |
| l |
| o |
+---+
5 rows in set (0.00 sec)

More on this can be found in the excellent SQL Cookbook.

Unwalking the string

Doing the inverse action – combining the string back from the multiple rows, can be easily done using GROUP_CONCAT. It’s interesting to learn that GROUP_CONCAT does not actually require any GROUP BY clause. When no such clause is provided in the SQL query, all searched rows are used.

Let’s assume now that we have a table of character values, which we want to concatenate back to a complete string. We can easily build this table:

CREATE TABLE characters AS
  SELECT
    SUBSTRING(s, int_col+1, 1) AS c
  FROM int_table, (SELECT 'hello' AS s) sel1
  WHERE int_col < char_length(s);

To reconstruct the text, we simply use MySQL’s GROUP_CONCAT with an empty separator:

mysql> SELECT GROUP_CONCAT(c separator '') AS s FROM characters;
+-------+
| s     |
+-------+
| hello |
+-------+
1 row in set (0.00 sec)

13
Leave a Reply

avatar
13 Comment threads
0 Thread replies
0 Followers
 
Most reacted comment
Hottest comment thread
5 Comment authors
AnthonyShantanu OakshlomiRTRoland Bouman Recent comment authors

This site uses Akismet to reduce spam. Learn how your comment data is processed.

  Subscribe  
Notify of
Roland Bouman
Guest

Hi!

nice one – GROUP_CONCAT is a friend of mine 😉

Just caught this:

“It’s interesting to learn that GROUP_CONCAT does not actually require any GROUP BY clause”

Well, this is not specific for GROUP_CONCAT – all aggregate functions have this property. And this leads us to one of the biggest SQL WTF’s ever: Both the presence of a GROUP BY clause as well as the presence of an aggregate function (any aggregate function) causes aggregation to occur.

Roland Bouman
Guest

Hi Shlomi, Perhaps it was me that wasn’t clear. In your original post, you mentioned that it was interesting that GROUP_CONCAT does not require a GROUP BY clause. I just pointed out that none of the aggregate functions require a GROUP BY clause, so GROUP_CONCAT is not different from the other aggregate functions in that respect. I do agree that applying only GROUP BY is aggregation. This brings us to the WTF: SQL provides two completely different constructs that cause aggregation (as in grouping of rows): – a GROUP BY clause – aggregate function So I’m just saying that IMO,… Read more »

RT
Guest
RT

Hi Shlomi,
Can you please tell how you added that favicon to your blog…

RT.

Shantanu Oak
Guest

I was just thinking of writing a SP.
Thanks a lot for sharing this.
I selected the int_col because I wanted the exact position of the character.

Shantanu Oak
Guest

I forgot to mention one thing about group_concat. I am getting 54 strings as against expected 74 when I use group_concat. I do not have access to my.cnf file so I changed the session variable but it did not help.
SET SESSION group_concat_max_len = 1000000;

Roland Bouman
Guest

Hi Shantanu, do you get any warnings? If group_concat_max_len really is too short to hold the result, you should get a warning: 1 row in set, 1 warning (0.00 sec) mysql> show warnings; +———+——+————————————–+ | Level | Code | Message | +———+——+————————————–+ | Warning | 1260 | 1 line(s) were cut by GROUP_CONCAT() | +———+——+————————————–+ If you want to be sure, don’t set group_concat_max_len to a particular length, just set it to max_allowed_packet. This is the practical maximum of any string length and if you exceed that you will get an entirely different error message, see: http://dev.mysql.com/doc/refman/5.1/en/packet-too-large.html

Shantanu Oak
Guest

Yes, I got exactly the same warning and after following your tip, I got the results. Thanks.

trackback

[…] Unwalking a string with GROUP_CONCAT […]

trackback

[…] to business: how do we normalize a text using SQL? Well, once again, string walking and string unwalking to the rescue. The trick is to break the string apart (to distinct characters), then re-combine the […]

trackback

[…] string walking. Examples for string walking (described in the excellent SQL Cookbook) can be found here, here and here. We’ll be using a numbers table: a simple table which lists ascending integer […]

Anthony
Guest
Anthony

Nice article.

The SET SESSION group_concat_max_len = 1000000 was exactly the info I required.

I had trouble with this issue as follows:-

I used group_concat() in isolation and no truncation occured.

I used the same “isolated” group_concat() query as a subquery and joined to it. Then truncation occurred but NO WARNING/ERROR.

I moved the subquery into a temp table and then joined on that. Then, the now infamous 1260 ERROR was thrown.

Needless to say, problem solved!