{"id":4528,"date":"2012-06-12T07:28:08","date_gmt":"2012-06-12T05:28:08","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=4528"},"modified":"2012-06-12T07:28:08","modified_gmt":"2012-06-12T05:28:08","slug":"char-and-spaces","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/char-and-spaces","title":{"rendered":"CHAR and spaces"},"content":{"rendered":"<p>I know about it, I knew about it all along, but&#8230; it&#8217;s so easy to fall for it; there&#8217;s just so much absurdity!<\/p>\n<p>A <strong>CHAR<\/strong> type has a known number of characters. For example, the column:<\/p>\n<blockquote>\n<pre>CountryCode CHAR(3) CHARSET ascii NOT NULL<\/pre>\n<\/blockquote>\n<p>&#8211; is known to have exactly three characters. These could be <strong>&#8216;USA&#8217;<\/strong>, <strong>&#8216;FRA&#8217;<\/strong>, etc.<\/p>\n<p>What happens with spaces? A SQL <strong>CHAR<\/strong> type ignores any trailing spaces; thus, the code <strong>&#8216;GB &#8216;<\/strong> (the characters <strong>&#8216;G&#8217;<\/strong>, <strong>&#8216;B&#8217;<\/strong>, and the space <strong>&#8216; &#8216;<\/strong>) is interpreted as <strong>&#8216;GB&#8217;<\/strong>. Trailing spaces are not regarded as part of the text. Want to see some absurdity?<\/p>\n<blockquote>\n<pre>CREATE TABLE `c_test` (\r\n\u00a0 `c` char(1) DEFAULT NULL\r\n);\r\nINSERT INTO c_test VALUES ('a');\r\nINSERT INTO c_test VALUES ('b');\r\nINSERT INTO c_test VALUES (' ');\r\n\r\nSELECT c, LENGTH(c), c = 'a', c = 'b', c = ' ', LENGTH(' ') FROM c_test;\r\n+------+-----------+---------+---------+---------+-------------+\r\n| c\u00a0\u00a0\u00a0 | LENGTH(c) | c = 'a' | c = 'b' | c = ' ' | LENGTH(' ') |\r\n+------+-----------+---------+---------+---------+-------------+\r\n| a\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\r\n| b\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\r\n|\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\r\n+------+-----------+---------+---------+---------+-------------+<\/pre>\n<\/blockquote>\n<p><!--more-->Note that <strong>LENGTH(c)<\/strong> is zero on third row, but nevertheless it equals the space character (<strong>c = &#8216; &#8216;<\/strong>). Although, the length of the space character is&#8230; <strong>1<\/strong> (and this is because the constant <strong>&#8216; &#8216;<\/strong> is interpreted as <strong>VARCHAR)<\/strong>.<\/p>\n<p>To emphasize the absurdity, now take a closer look:<\/p>\n<blockquote>\n<pre>SELECT c, LENGTH(c), c = ' ', LENGTH(' '), c = '', LENGTH('') FROM c_test;\r\n+------+-----------+---------+-------------+--------+------------+\r\n| c\u00a0\u00a0\u00a0 | LENGTH(c) | c = ' ' | LENGTH(' ') | c = '' | LENGTH('') |\r\n+------+-----------+---------+-------------+--------+------------+\r\n| a\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\r\n| b\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\r\n|\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\r\n+------+-----------+---------+-------------+--------+------------+<\/pre>\n<\/blockquote>\n<p>I add a comparison with the empty string <strong>&#8221;<\/strong>, and it turns out the space character equals both a space character and the empty string, though their lengths do not agree.<\/p>\n<p>Now, I usually don&#8217;t bother with <strong>CHAR<\/strong> types, and prefer the <strong>VARCHAR<\/strong> or <strong>TEXT<\/strong> types, but every once in a while I do use it. Most recently, in a stored routine code in <a href=\"http:\/\/code.google.com\/p\/common-schema\">common_schema<\/a>, I iterate &amp; parse some text. I iterated the text char by char, and declared a local<strong> CHAR(1)<\/strong> variable to assign iterated value to. It sure bit me when I got to a space character which insisted it was the empty text (indicating end of iteration).<\/p>\n<p>Of course the immediate and correct solution was to change the local variable&#8217;s type to <strong>VARCHAR(1)<\/strong> instead. But the time it consumed to find the problem&#8230; Be warned and be aware!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I know about it, I knew about it all along, but&#8230; it&#8217;s so easy to fall for it; there&#8217;s just so much absurdity! A CHAR type has a known number of characters. For example, the column: CountryCode CHAR(3) CHARSET ascii NOT NULL &#8211; is known to have exactly three characters. These could be &#8216;USA&#8217;, &#8216;FRA&#8217;, [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[18],"class_list":["post-4528","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-data-types"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1b2","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/4528","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=4528"}],"version-history":[{"count":10,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/4528\/revisions"}],"predecessor-version":[{"id":4946,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/4528\/revisions\/4946"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=4528"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=4528"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=4528"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}