{"id":85,"date":"2008-11-18T09:37:57","date_gmt":"2008-11-18T07:37:57","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=85"},"modified":"2008-12-13T06:34:09","modified_gmt":"2008-12-13T04:34:09","slug":"common-data-types-errors-compilation","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/common-data-types-errors-compilation","title":{"rendered":"Common wrong Data Types compilation"},"content":{"rendered":"<p>During my work with companies using MySQL, I have encountered many issues with regard to schema design, normalization and indexing. Of the most common errors are incorrect data types definition. Many times the database is designed by programmers or otherwise non-expert DBAs. Some companies do not have the time and cannot spare the effort of redesigning and refactoring their databases, and eventually face poor performance issues.<\/p>\n<p>Here&#8217;s a compilation of &#8220;the right and the wrong&#8221; data types.<!--more--><\/p>\n<ul>\n<li><strong><code>INT(1)<\/code><\/strong> is not one byte long. <strong><code>INT(10)<\/code><\/strong> is no bigger than <strong><code>INT(2)<\/code><\/strong>. The number in parenthesis is misleading, and only describes the text alignment of the number, when displayed in an interactive shell. All mentioned types are the same INT, have the same storage capacity, and the same range. If you want a one-byte <strong><code>INT<\/code><\/strong>, use <strong><code>TINYINT<\/code><\/strong>.<\/li>\n<\/ul>\n<ul>\n<li>An integer <strong><code>PRIMARY KEY<\/code><\/strong> is preferable, especially if you&#8217;re using the InnoDB storage engine. If possible, avoid using <strong><code>VARCHAR<\/code><\/strong> as <strong><code>PRIMARY KEY<\/code><\/strong>. In InnoDB, this will make the clustered index deeper, secondary indexes larger (sometimes much larger) and look ups slower.<\/li>\n<\/ul>\n<ul>\n<li>Do not use <strong><code>VARCHAR<\/code><\/strong> to represent timestamps. It may look like <strong><code>'2008-11-14 07:59:13'<\/code><\/strong> is a textual field, but in fact it&#8217;s just an integer counting the seconds elapsed from 1970-01-01. That&#8217;s 4 bytes vs. 19 if you&#8217;re using <strong><code>CHAR<\/code><\/strong> with <strong><code>ASCII<\/code><\/strong> charset, or more if you&#8217;re using <strong><code>UTF8<\/code><\/strong> or <strong><code>VARCHAR<\/code><\/strong>.<\/li>\n<\/ul>\n<ul>\n<li>Do not use <strong><code>VARCHAR<\/code><\/strong> to represent IPv4 addresses. This one is quite common. The IP 192.168.100.255 can be represented with <strong><code>VARCHAR(15)<\/code><\/strong>, true, but could be better represented with a 4-byte int. That&#8217;s what IPv4 is: four bytes. Use the <strong><code>INET_ATON()<\/code><\/strong> and <strong><code>INET_NTOA()<\/code><\/strong> functions to translate between the INT value and textual value.<\/li>\n<\/ul>\n<ul>\n<li>This one should be obvious, but I&#8217;ve seen it in reality, where the schema was auto generated by some naive generator: do not represent numbers as text. Yes, I have seen integer columns represented by <strong><code>VARCHAR<\/code><\/strong>. Don&#8217;t ask how the performance was.<\/li>\n<\/ul>\n<ul>\n<li><strong><code>MD5()<\/code><\/strong> columns shouldn&#8217;t be <strong><code>VARCHAR<\/code><\/strong>. Use <strong><code>CHAR(32)<\/code><\/strong> instead. It&#8217;s always 32 bytes long, so no need for <strong><code>VARCHAR<\/code><\/strong>&#8216;s additional byte overhead. If your tables or database are <strong><code>UTF8<\/code><\/strong> by default, make sure the MD5 column&#8217;s charset is <strong><code>ASCII<\/code><\/strong>, or it will consume 96 bytes instead of just 32. I also suggest the case-sensitive <strong><code>ascii_bin<\/code><\/strong> collation, but that&#8217;s a more minor issue.<\/li>\n<\/ul>\n<ul>\n<li><strong><code>PASSWORD()<\/code><\/strong> columns shouldn&#8217;t be <strong><code>VARCHAR<\/code><\/strong>, but <strong><code>CHAR<\/code><\/strong>. The length depends on whether you&#8217;re using <strong><code>old-passwords<\/code><\/strong> variable (for some strange reason, this variable always appears in the MySQL sample configuration files &#8211; though you really don&#8217;t want it unless it&#8217;s for backward compatibility with older MySQL versions). As in the MD5 note, use <strong><code>ASCII<\/code><\/strong> charset.<\/li>\n<\/ul>\n<ul>\n<li>Better use <strong><code>TIMESTAMP<\/code><\/strong> than <strong><code>INT<\/code><\/strong> to count seconds, as MySQL has many supportive functions for this data type.<\/li>\n<\/ul>\n<ul>\n<li>Use <strong><code>TINYINT<\/code><\/strong>, <strong><code>SMALLINT<\/code><\/strong>, <strong><code>MEDIUMINT<\/code><\/strong> instead of <strong><code>INT<\/code><\/strong> when possible. Do you expect to have 4000000000 customers? No? Then a &#8220;<strong><code>id SMALLINT<\/code><\/strong>&#8221; may suffice as <strong><code>PRIMARY KEY<\/code><\/strong>.<\/li>\n<\/ul>\n<ul>\n<li>Use <strong><code>CHARACTER SET<\/code><\/strong>s with care. More on this on future posts.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>During my work with companies using MySQL, I have encountered many issues with regard to schema design, normalization and indexing. Of the most common errors are incorrect data types definition. <\/p>\n<p>Here&#8217;s a compilation of &#8220;the right and the wrong&#8221; data types.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[18,19,17],"class_list":["post-85","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-data-types","tag-normalization","tag-schema"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1n","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/85","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=85"}],"version-history":[{"count":13,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/85\/revisions"}],"predecessor-version":[{"id":357,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/85\/revisions\/357"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=85"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=85"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=85"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}