CHAR and spaces

June 12, 2012

I know about it, I knew about it all along, but... it's so easy to fall for it; there's just so much absurdity!

A CHAR type has a known number of characters. For example, the column:

CountryCode CHAR(3) CHARSET ascii NOT NULL

- is known to have exactly three characters. These could be 'USA', 'FRA', etc.

What happens with spaces? A SQL CHAR type ignores any trailing spaces; thus, the code 'GB ' (the characters 'G', 'B', and the space ' ') is interpreted as 'GB'. Trailing spaces are not regarded as part of the text. Want to see some absurdity?

CREATE TABLE `c_test` (
  `c` char(1) DEFAULT NULL
);
INSERT INTO c_test VALUES ('a');
INSERT INTO c_test VALUES ('b');
INSERT INTO c_test VALUES (' ');

SELECT c, LENGTH(c), c = 'a', c = 'b', c = ' ', LENGTH(' ') FROM c_test;
+------+-----------+---------+---------+---------+-------------+
| c    | LENGTH(c) | c = 'a' | c = 'b' | c = ' ' | LENGTH(' ') |
+------+-----------+---------+---------+---------+-------------+
| a    |         1 |       1 |       0 |       0 |           1 |
| b    |         1 |       0 |       1 |       0 |           1 |
|      |         0 |       0 |       0 |       1 |           1 |
+------+-----------+---------+---------+---------+-------------+

Note that LENGTH(c) is zero on third row, but nevertheless it equals the space character (c = ' '). Although, the length of the space character is... 1 (and this is because the constant ' ' is interpreted as VARCHAR).

To emphasize the absurdity, now take a closer look:

SELECT c, LENGTH(c), c = ' ', LENGTH(' '), c = '', LENGTH('') FROM c_test;
+------+-----------+---------+-------------+--------+------------+
| c    | LENGTH(c) | c = ' ' | LENGTH(' ') | c = '' | LENGTH('') |
+------+-----------+---------+-------------+--------+------------+
| a    |         1 |       0 |           1 |      0 |          0 |
| b    |         1 |       0 |           1 |      0 |          0 |
|      |         0 |       1 |           1 |      1 |          0 |
+------+-----------+---------+-------------+--------+------------+

I add a comparison with the empty string '', and it turns out the space character equals both a space character and the empty string, though their lengths do not agree.

Now, I usually don't bother with CHAR types, and prefer the VARCHAR or TEXT types, but every once in a while I do use it. Most recently, in a stored routine code in common_schema, I iterate & parse some text. I iterated the text char by char, and declared a local CHAR(1) variable to assign iterated value to. It sure bit me when I got to a space character which insisted it was the empty text (indicating end of iteration).

Of course the immediate and correct solution was to change the local variable's type to VARCHAR(1) instead. But the time it consumed to find the problem... Be warned and be aware!

tags:
posted in MySQL by shlomi

« | »

Follow comments via the RSS Feed | Leave a comment | Trackback URL

12 Comments to "CHAR and spaces"

  1. Roland Bouman wrote:

    Hey, that's a pretty nasty gotcha. I "knew" about the spaces, but didn't realize its consequences. Thanks for pointing it out.

  2. Peter Laursen wrote:

    What is your sql_mode?

    http://dev.mysql.com/doc/refman/5.5/en/server-sql-mode.html#sqlmode_pad_char_to_full_length

    "By default, trailing spaces are trimmed from CHAR column values on retrieval. If PAD_CHAR_TO_FULL_LENGTH is enabled, trimming does not occur and retrieved CHAR values are padded to their full length. This mode does not apply to VARCHAR columns, for which trailing spaces are retained on retrieval."

  3. shlomi wrote:

    Roland,
    You were aware enough to use it in _wrap_select_list_columns:
    "...
    declare v_whitespace varchar(1) default '';
    ..."

  4. shlomi wrote:

    Peter,
    There's an sql_mode to that, too?!?
    My hatred towards sql_mode is growing by the minute!

  5. shlomi wrote:

    @Peter,
    Even with sql_mode='PAD_CHAR_TO_FULL_LENGTH', I get:

    SELECT c, LENGTH(c), c = ' ', LENGTH(' '), c = '', LENGTH('') FROM c_test;
    +------+-----------+---------+-------------+--------+------------+
    | c    | LENGTH(c) | c = ' ' | LENGTH(' ') | c = '' | LENGTH('') |
    +------+-----------+---------+-------------+--------+------------+
    | a    |         1 |       0 |           1 |      0 |          0 |
    | b    |         1 |       0 |           1 |      0 |          0 |
    |      |         1 |       1 |           1 |      1 |          0 |
    +------+-----------+---------+-------------+--------+------------+
    

    Which is different than before, but still absurd.

  6. Peter Laursen wrote:

    What is absurd here is that "SELECT c = '';" returns "1", right? (in other words: empty string == SPACE-character). This I consider a plain bug. What say?

    "SELECT LENGTH(c)" will simply return the column width with this SQL_mode, I think, as "When CHAR values are stored, they are right-padded with spaces to the specified length." (quote from http://dev.mysql.com/doc/refman/5.5/en/char.html)

  7. Davi Arnaut wrote:

    "I add a comparison with the empty string '', and it turns out the space character equals both a space character and the empty string, though their lengths do not agree."

    I think this might be blank-padding semantics, where the SQL standard requires padding so that when compared the values will have the same length.

    Test with, let'say PostgreSQL, it should also have this behavior.

  8. shlomi wrote:

    Peter,
    Yes, that would be a bug, I believe. Will report.

  9. Rick James wrote:

    Bottom line: Use CHAR only for truly fixed length fields.

  10. shlomi wrote:

    @Rick,

    Well, only for truly fixed length fields which do not end with spaces, that is.
    I had a perfectly true fixed length field which was spaces, those spaces being ignored...

  11. Daniël van Eeden wrote:

    The behaviour for PostgreSQL 8.4 seems identical.

    chartest=# CREATE TABLE c_test (c char(1) DEFAULT NULL);
    CREATE TABLE
    chartest=# INSERT INTO c_test VALUES ('a');
    INSERT 0 1
    chartest=# INSERT INTO c_test VALUES ('b');
    INSERT 0 1
    chartest=# INSERT INTO c_test VALUES (' ');
    INSERT 0 1
    chartest=# SELECT c, LENGTH(c), c = 'a', c = 'b', c = ' ', LENGTH(' ') FROM c_test;
    c | length | ?column? | ?column? | ?column? | length
    ---+--------+----------+----------+----------+--------
    a | 1 | t | f | f | 1
    b | 1 | f | t | f | 1
    | 0 | f | f | t | 1
    (3 rows)

    chartest=# SELECT c, LENGTH(c), c = ' ', LENGTH(' '), c = '', LENGTH('') FROM c_test;
    c | length | ?column? | length | ?column? | length
    ---+--------+----------+--------+----------+--------
    a | 1 | f | 1 | f | 0
    b | 1 | f | 1 | f | 0
    | 0 | t | 1 | t | 0
    (3 rows)

  12. Daniël van Eeden wrote:

    And now with pre tags:

    chartest=# CREATE TABLE c_test (c char(1) DEFAULT NULL);
    CREATE TABLE
    chartest=# INSERT INTO c_test VALUES ('a');
    INSERT 0 1
    chartest=# INSERT INTO c_test VALUES ('b');
    INSERT 0 1
    chartest=# INSERT INTO c_test VALUES (' ');
    INSERT 0 1
    chartest=# SELECT c, LENGTH(c), c = 'a', c = 'b', c = ' ', LENGTH(' ') FROM c_test;
    c | length | ?column? | ?column? | ?column? | length
    ---+--------+----------+----------+----------+--------
    a | 1 | t | f | f | 1
    b | 1 | f | t | f | 1
    | 0 | f | f | t | 1
    (3 rows)

    chartest=# SELECT c, LENGTH(c), c = ' ', LENGTH(' '), c = '', LENGTH('') FROM c_test;
    c | length | ?column? | length | ?column? | length
    ---+--------+----------+--------+----------+--------
    a | 1 | f | 1 | f | 0
    b | 1 | f | 1 | f | 0
    | 0 | t | 1 | t | 0
    (3 rows)

    chartest=# select version();
    version
    -----------------------------------------------------------------------------------------------------------------
    PostgreSQL 8.4.9 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.4.5 20110214 (Red Hat 4.4.5-6), 64-bit
    (1 row)

Leave Your Comment

 
Powered by Wordpress and MySQL. Theme by openark.org