<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>code.openark.org &#187; Data Types</title>
	<atom:link href="http://code.openark.org/blog/tag/data-types/feed" rel="self" type="application/rss+xml" />
	<link>http://code.openark.org/blog</link>
	<description>Blog by Shlomi Noach</description>
	<lastBuildDate>Thu, 09 Sep 2010 16:15:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Implicit casting you don&#8217;t want to see around</title>
		<link>http://code.openark.org/blog/mysql/implicit-casting-you-dont-want-to-see-around</link>
		<comments>http://code.openark.org/blog/mysql/implicit-casting-you-dont-want-to-see-around#comments</comments>
		<pubDate>Wed, 07 Jul 2010 08:53:37 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Data Types]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2344</guid>
		<description><![CDATA[In Beware of implicit casting, I have outlined the dangers of implicit casting. Here&#8217;s a few more real-world examples I have tackled: Number-String comparisons Much like in programming languages, implicit casting is made to numbers when at least one of the arguments is a number. Thus: mysql&#62; SELECT 3 = '3.0'; +-----------+ &#124; 3 = [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://code.openark.org/blog/mysql/beware-of-implicit-casting">Beware of implicit casting</a>, I outlined the dangers of implicit casting. Here are a few more real-world examples I have tackled:</p>
<h4>Number-String comparisons</h4>
<p>Much like in programming languages, the operands are implicitly cast to numbers when at least one of them is a number. Thus:</p>
<blockquote><pre class="brush: sql;">
mysql&gt; SELECT 3 = '3.0';
+-----------+
| 3 = '3.0' |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

mysql&gt; SELECT '3' = '3.0';
+-------------+
| '3' = '3.0' |
+-------------+
|           0 |
+-------------+
</pre>
</blockquote>
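<p>The second query is a pure string comparison. MySQL has no way to determine that a numeric comparison was intended.</p>
<h4>Direct DATE arithmetic</h4>
<p>The first query <em>seems</em> to work, but is completely incorrect: the date is implicitly cast to the number 20100101, and plain integer arithmetic follows. The second query makes this obvious. The third is a total mess: the string is truncated to the number 2010 before the subtraction takes place.<span id="more-2344"></span></p>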
<blockquote><pre class="brush: sql;">
mysql&gt; SELECT DATE('2010-01-01')+3;
+----------------------+
| DATE('2010-01-01')+3 |
+----------------------+
|             20100104 |
+----------------------+
1 row in set (0.00 sec)

mysql&gt; SELECT DATE('2010-01-01')-3;
+----------------------+
| DATE('2010-01-01')-3 |
+----------------------+
|             20100098 |
+----------------------+
1 row in set (0.00 sec)

mysql&gt; SELECT '2010-01-01' - 3;
+------------------+
| '2010-01-01' - 3 |
+------------------+
|             2007 |
+------------------+
1 row in set, 1 warning (0.00 sec)
</pre>
</blockquote>
<h4>Number-String comparisons, big integers</h4>
<p>Look at the following crazy comparisons:</p>
<blockquote><pre class="brush: sql;">
mysql&gt; SELECT 1234 = '1234';
+---------------+
| 1234 = '1234' |
+---------------+
|             1 |
+---------------+

mysql&gt; SELECT 123456789012345678 = '123456789012345678';
+-------------------------------------------+
| 123456789012345678 = '123456789012345678' |
+-------------------------------------------+
|                                         0 |
+-------------------------------------------+

mysql&gt; SELECT 123456789012345678 = '123456789012345677';
+-------------------------------------------+
| 123456789012345678 = '123456789012345677' |
+-------------------------------------------+
|                                         1 |
+-------------------------------------------+
</pre>
</blockquote>
<p>The amazing result of the last two comparisons may strike you as odd. Actually, it may strike you as a bug, and indeed when a customer approached me with this behavior I was at a loss for words. But this is <a href="http://dev.mysql.com/doc/refman/5.0/en/type-conversion.html">documented</a>. The manual describes the cases for casting, then states: &#8220;&#8230; In all other cases, the arguments are compared <em>as floating-point (real) numbers</em>. &#8230;&#8221;</p>
<h4>Lessons learned:</h4>
<ul>
<li>Be careful when comparing strings with floating point values. Matching depends on how both are represented.</li>
<li>Avoid converting temporal types to strings when doing date manipulation.</li>
<li>Avoid direct math on temporal types.</li>
<li>Avoid comparing <strong>BIGINT</strong>s with their string representations. The comparison is done on floating-point values, which cannot represent every large integer exactly, and so may be incorrect.</li>
</ul>
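<p>As a minimal sketch of the lessons above (the literal values are only illustrative), explicit interval arithmetic and an explicit cast keep MySQL from guessing. The date results follow from the calendar. The <strong>CAST</strong> comparison is expected to be evaluated on integers rather than on floating-point values, but do verify it on your own server version:</p>
<blockquote><pre class="brush: sql;">
-- Date arithmetic with INTERVAL instead of a bare number
SELECT DATE('2010-01-01') + INTERVAL 3 DAY;  -- 2010-01-04
SELECT DATE('2010-01-01') - INTERVAL 3 DAY;  -- 2009-12-29

-- Cast the string to an integer explicitly, so that both sides are integers
SELECT 123456789012345678 = CAST('123456789012345678' AS UNSIGNED);
</pre>
</blockquote>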
<p>Last but not least:</p>
<ul>
<li>Use the proper data types for your data&#8217;s representation. When dealing with numbers, use numbers. When dealing with temporal values, use temporal types.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/implicit-casting-you-dont-want-to-see-around/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Choosing MySQL boolean data types</title>
		<link>http://code.openark.org/blog/mysql/choosing-mysql-boolean-data-types</link>
		<comments>http://code.openark.org/blog/mysql/choosing-mysql-boolean-data-types#comments</comments>
		<pubDate>Thu, 03 Jun 2010 05:24:11 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Data Types]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2181</guid>
		<description><![CDATA[How do you implement True/False columns? There are many ways to do it, each with its own pros and cons. ENUM Create your column as ENUM('F', 'T'), or ENUM('N', 'Y') or ENUM('0', '1'). This is the method used in the mysql tables (e.g. mysql.user privileges table). It&#8217;s very simple and intuitive. It truly restricts the values [...]]]></description>
			<content:encoded><![CDATA[<p>How do you implement <strong>True</strong>/<strong>False</strong> columns?</p>
<p>There are many ways to do it, each with its own pros and cons.</p>
<h4>ENUM</h4>
<p>Create your column as <strong>ENUM('F', 'T')</strong>, or <strong>ENUM('N', 'Y')</strong> or <strong>ENUM('0', '1')</strong>.</p>
<p>This is the method used in the <strong>mysql</strong> tables (e.g. <strong>mysql.user</strong> privileges table). It&#8217;s very simple and intuitive. It truly restricts the values to just two options, which serves well. It&#8217;s compact (just one byte).</p>
<p>A couple of disadvantages to this method:</p>
<ol>
<li>Enums are represented by numerical values (which is good) and start with <strong>1</strong> instead of <strong>0</strong>. This means <strong>'F'</strong> is <strong>1</strong>, and <strong>'T'</strong> is <strong>2</strong>, and they both translate to <strong>True</strong> when used directly in a boolean expression (e.g. <strong>IF(val, 'True', 'False')</strong> always yields <strong>'True'</strong>).</li>
<li>There&#8217;s no real convention. Is it <strong>'Y'/'N'</strong>? <strong>'T'/'F'</strong>? <strong>'P'/'N'</strong>? <strong>'1'/'0'</strong>?</li>
</ol>
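<p>A small sketch of the first pitfall, using a hypothetical <strong>t_enum</strong> table (the table and alias names are made up for illustration). Adding zero exposes the internal index of each value; the safe test is an explicit comparison against the string:</p>
<blockquote>
<pre>CREATE TABLE t_enum (
 val ENUM('F', 'T') NOT NULL
);
INSERT INTO t_enum VALUES ('F'), ('T');

-- val + 0 yields the internal index: 1 for 'F', 2 for 'T'
SELECT val, val + 0 AS enum_index FROM t_enum;

-- Compare against the string explicitly rather than using val as a boolean
SELECT val, IF(val = 'T', 'True', 'False') AS as_boolean FROM t_enum;
</pre>
</blockquote>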
<h4>CHAR(1)</h4>
<p>Simple again. Proposed values are, as before, <strong>&#8216;F&#8217;</strong>, <strong>&#8216;T&#8217;</strong> etc. This time there&#8217;s no way to limit the range of values. You cannot (in MySQL, unless using triggers) prevent an &#8216;X&#8217;.</p>
<p>Watch out for the charset! If it&#8217;s <strong>utf8</strong> you pay with 3 bytes instead of just 1. And, again, these values do not work in boolean expressions: <strong>'T'</strong>, <strong>'F'</strong>, <strong>'Y'</strong>, <strong>'N'</strong> are all non-numeric strings, which MySQL converts to <strong>0</strong>, so they all evaluate as <strong>False</strong>. It is possible to store <strong>'0'</strong> and <strong>'1'</strong> instead, but that defeats the purpose of using <strong>CHAR</strong>.<span id="more-2181"></span></p>
<h4>CHAR(0)</h4>
<p>Many are unaware that it&#8217;s even valid to make this definition. What does it mean? Take a look at the following table:</p>
<blockquote>
<pre>CREATE TABLE `t1` (
 `bval` char(0) DEFAULT NULL
);
mysql&gt; INSERT INTO t1 VALUES ('');
mysql&gt; INSERT INTO t1 VALUES ('');
mysql&gt; INSERT INTO t1 VALUES (NULL);

mysql&gt; SELECT * FROM t1;
+------+
| bval |
+------+
|      |
|      |
| NULL |
+------+
</pre>
</blockquote>
<p>A <strong>CHAR(0)</strong> column can hold only two values: the empty string and NULL. The value itself takes no storage, so all that remains is the NULL flag. NULLable columns cost additional storage per row: there&#8217;s one bit per NULLable column which notes whether the column&#8217;s value is NULL or not. If you only have one NULLable column, you must pay for this bit with 1 byte. If you have two NULLable columns, you still only pay with 1 byte.</p>
<p>Furthermore:</p>
<blockquote>
<pre>mysql&gt; SELECT bval IS NOT NULL FROM t1;
+------------------+
| bval IS NOT NULL |
+------------------+
|                1 |
|                1 |
|                0 |
+------------------+
</pre>
</blockquote>
<p>So this plays somewhat nicely into boolean expressions.</p>
<p>However, this method is unintuitive and confusing. I personally don&#8217;t use it.</p>
<h4>TINYINT</h4>
<p>With integer values, we can get down to <strong>0</strong> and <strong>1</strong>. With <strong>TINYINT</strong>, we only pay with 1 byte of storage. As with <strong>CHAR(1)</strong>, we cannot prevent anyone from INSERTing other values. But that doesn&#8217;t really matter, if we&#8217;re willing to accept that 0 evaluates as <strong>False</strong>, and <em>all other values</em> as <strong>True</strong>. In this case, boolean expressions work very well with your column values.</p>
<h4>BOOL/BOOLEAN</h4>
<p>These are just synonyms for <strong>TINYINT(1)</strong>. I like to define my boolean values as such. Alas, when issuing a <strong>SHOW CREATE TABLE</strong> the definition shows up as a plain <strong>tinyint(1)</strong>. Still, it is clearer to look at if you&#8217;re storing your table schema under version control.</p>
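<p>A minimal sketch of the <strong>BOOLEAN</strong> approach, using a hypothetical <strong>user_flags</strong> table. Note that nothing stops the value <strong>7</strong> from being stored; it simply evaluates as <strong>True</strong>:</p>
<blockquote>
<pre>CREATE TABLE user_flags (
 id INT UNSIGNED NOT NULL PRIMARY KEY,
 is_active BOOLEAN NOT NULL DEFAULT FALSE
);
INSERT INTO user_flags VALUES (1, TRUE), (2, FALSE), (3, 7);

-- Rows 1 and 3 yield 'True', row 2 yields 'False'
SELECT id, IF(is_active, 'True', 'False') AS active FROM user_flags;

-- The column can be used directly as a condition
SELECT id FROM user_flags WHERE is_active;
</pre>
</blockquote>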
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/choosing-mysql-boolean-data-types/feed</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>But I DO want MySQL to say &#8220;ERROR&#8221;!</title>
		<link>http://code.openark.org/blog/mysql/but-i-do-want-mysql-to-say-error</link>
		<comments>http://code.openark.org/blog/mysql/but-i-do-want-mysql-to-say-error#comments</comments>
		<pubDate>Fri, 12 Mar 2010 04:53:28 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[Data Types]]></category>
		<category><![CDATA[sql_mode]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2005</guid>
		<description><![CDATA[MySQL is known for its willingness to accept invalid queries, data values. It can silently commit your transaction, truncate your data. Using GROUP_CONCAT with a small group_concat_max_len setting? Your result will be silently truncated (make sure to check the warnings though). Calling CREATE TEMPORARY TABLE? You get silent commit. Issuing a ROLLBACK on non-transactional involved [...]]]></description>
			<content:encoded><![CDATA[<p>MySQL is known for its willingness to accept invalid queries and data values. It can silently commit your transaction or truncate your data.</p>
<ul>
<li>Using <strong>GROUP_CONCAT</strong> with a small <strong>group_concat_max_len</strong> setting? Your result will be silently truncated (make sure to check the warnings though).</li>
<li>Calling <strong>CREATE <span style="text-decoration: line-through;">TEMPORARY</span> TABLE</strong>? You get <a href="http://www.joinfu.com/2010/03/a-follow-up-on-the-sql-puzzle/">silent commit</a>.</li>
<li>Issuing a <strong>ROLLBACK</strong> when non-transactional engines are involved? Have a warning; no error.</li>
<li>Using <strong>LOCK IN SHARE MODE</strong> on non transactional tables? Not a problem. Nothing reported.</li>
<li>Adding a <strong>FOREIGN KEY</strong> on a MyISAM table? Good for you; no action actually taken.</li>
<li>Inserting <strong>300</strong> into a <strong>TINYINT UNSIGNED</strong> column in a relaxed <strong>sql_mode</strong>? Give me <strong>255</strong>, I&#8217;ll silently drop the remaining <strong>45</strong>. I owe you.</li>
</ul>
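<p>For the out-of-range case, at least, a strict <strong>sql_mode</strong> already turns the silent truncation into an error. A sketch with a hypothetical <strong>t_tiny</strong> table holding a <strong>TINYINT UNSIGNED</strong> column:</p>
<blockquote>
<pre>CREATE TABLE t_tiny (
 val TINYINT UNSIGNED
);

-- Relaxed mode: the statement succeeds with a warning, and 255 is stored
SET SESSION sql_mode = '';
INSERT INTO t_tiny VALUES (300);
SHOW WARNINGS;

-- Strict mode: the same statement is rejected with an out-of-range error
SET SESSION sql_mode = 'STRICT_ALL_TABLES';
INSERT INTO t_tiny VALUES (300);
</pre>
</blockquote>
<p>This covers invalid data values only; it does nothing for the implicit commits or the ignored foreign keys listed above.</p>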
<h4>Warnings and errors</h4>
<p>It would be nice to:<span id="more-2005"></span></p>
<ul>
<li>Have an <strong>auto_propagate_warning_to_error</strong> server variable (global/session/both) which, well, does what it says.</li>
<li>Have an <strong>i_am_really_not_a_dummy</strong> server variable which implies stricter checks for all the above and prevents you from doing <em>anything</em> that may be problematic (or rolls back your transactions on your invalid actions).</li>
</ul>
<p>Connectors may be nice enough to propagate warnings to errors &#8211; that&#8217;s good. But it is not enough, since by then the data is already committed in MySQL.</p>
<p>If I understand correctly, and maybe it&#8217;s just a myth, it all relates to the times when MySQL had an interest in widespread adoption across the internet, in such a way that it does not interfere too much with the users (hence leading to the common myth that &#8220;MySQL just works out of the box and does not require me to configure or understand anything&#8221;).</p>
<p>MySQL is a database system, and is now widespread, and is used by serious companies and products. It is time to stop playing nice with everyone and provide strict integrity; or, be nice to everyone, but allow me to specify what &#8220;nice&#8221; means for me.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/but-i-do-want-mysql-to-say-error/feed</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>Useful temporal functions &amp; queries</title>
		<link>http://code.openark.org/blog/mysql/useful-temporal-functions-queries</link>
		<comments>http://code.openark.org/blog/mysql/useful-temporal-functions-queries#comments</comments>
		<pubDate>Tue, 08 Dec 2009 09:46:24 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Data Types]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1666</guid>
		<description><![CDATA[Here&#8217;s a compilation of some common and useful time &#38; date calculations and equations. Some, though very simple, are often misunderstood, leading to inefficient or incorrect implementations. There are many ways to solve such problems. I&#8217;ll present my favorites. Querying for time difference Given two timestamps: ts1 (older) and ts2 (newer), how much time has [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a compilation of some common and useful time &amp; date calculations and equations. Some, though very simple, are often misunderstood, leading to inefficient or incorrect implementations.</p>
<p>There are many ways to solve such problems. I&#8217;ll present my favorites.</p>
<h4>Querying for time difference</h4>
<p>Given two timestamps: <em>ts1</em> (older) and <em>ts2</em> (newer), how much time has passed between them?</p>
<p>One can use <strong>TIMEDIFF()</strong> &amp; <strong>DATEDIFF()</strong>, or compare two <strong>UNIX_TIMESTAMP()</strong> values. My personal favorite is <strong>TIMESTAMPDIFF()</strong>, because I&#8217;m usually interested in a specific metric, like the number of hours or days which have passed, disregarding the smaller minute/second resolution. This allows one to write:</p>
<blockquote>
<pre>SELECT TIMESTAMPDIFF(HOUR, ts1, ts2)</pre>
</blockquote>
<p>Take, for example:</p>
<blockquote>
<pre>SELECT TIMESTAMPDIFF(MONTH, '2008-10-07 00:00:00', '2009-12-06 00:00:00')</pre>
</blockquote>
<p>The function correctly accounts for the number of days in each month and returns <strong>13</strong>, the truncated number of full months.</p>
<h4>Doing arithmetic</h4>
<p>One can use <strong>TIMESTAMPADD()</strong>, or <strong>DATE_SUB()</strong>, but, again, when dealing with specific resolutions, I find &#8220;<strong>+ INTERVAL</strong>&#8221; to be the most convenient:</p>
<blockquote>
<pre>SELECT ts1 + INTERVAL 10 HOUR</pre>
</blockquote>
<p><span id="more-1666"></span>This allows me to only add by a specific unit: <strong>SECOND</strong>, <strong>MINUTE</strong>, <strong>HOUR</strong>, <strong>DAY</strong>, <strong>WEEK</strong>, etc. Many times I find this is exactly what I want.</p>
<blockquote>
<pre>SELECT TIMESTAMP('2009-12-06 20:14:52') + INTERVAL 4 WEEK AS ts2;
+---------------------+
| ts2                 |
+---------------------+
| 2010-01-03 20:14:52 |
+---------------------+</pre>
</blockquote>
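<p>Subtraction works the same way, and month arithmetic is calendar-aware: when the target month is shorter, the result is clamped to its last day. A short sketch, with literal values chosen only for illustration:</p>
<blockquote>
<pre>SELECT TIMESTAMP('2009-12-06 20:14:52') - INTERVAL 1 DAY;  -- 2009-12-05 20:14:52
SELECT DATE('2009-01-31') + INTERVAL 1 MONTH;              -- 2009-02-28</pre>
</blockquote>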
<h4>Checking if a timestamp is on a given date</h4>
<p>This one is very popular, and most poorly treated.</p>
<p>Say we have a <strong>sales</strong> table, with some <strong>ts</strong> column. We want to SELECT all sales on Dec 25th, 2008. I&#8217;ve seen so many solutions, many in writing. Let&#8217;s look at them:</p>
<p><em>Wrong:</em></p>
<blockquote>
<pre>SELECT * FROM sales WHERE ts BETWEEN '2008-12-25' AND '2008-12-26'</pre>
</blockquote>
<p>Why is this wrong? Because <strong>BETWEEN</strong> is inclusive. A sale taking place on &#8216;<strong>2008-12-26 00:00:00</strong>&#8216; will match our condition.</p>
<p><em>Correct but inefficient:</em></p>
<blockquote>
<pre>SELECT * FROM sales WHERE DATE(ts) = DATE('2008-12-25')</pre>
</blockquote>
<p>Why is this inefficient? Because a function is used over the <strong>ts</strong> column. This disables use of any index we might have on <strong>ts</strong>, leading to full table scan.</p>
<p><em>Correct but inefficient:</em></p>
<blockquote>
<pre>SELECT * FROM sales WHERE ts LIKE '2008-12-25 %'</pre>
</blockquote>
<p>Why is this inefficient? Because a function is used over the <strong>ts</strong> column. Can you see it? It&#8217;s an implicit CAST function, which casts the TIMESTAMP value to a character value, so as to perform a string comparison.</p>
<p><em>Correct but ugh:</em></p>
<blockquote>
<pre>SELECT * FROM sales WHERE ts BETWEEN '2008-12-25 00:00:00' AND '2008-12-25 23:59:59'</pre>
</blockquote>
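<p>Why is it ugh? Because, well, &#8230;Ugh! It hard-codes the last second of the day, which is easy to get wrong and breaks down should the column ever store sub-second precision.</p>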
<p><em>Correct:</em></p>
<blockquote>
<pre>SELECT * FROM sales WHERE ts &gt;= DATE('2008-12-25') AND ts &lt; DATE('2008-12-26')</pre>
</blockquote>
<p>This allows for indexing to be used properly. The <strong>DATE()</strong> casting is not strictly required here, but is generally safer.</p>
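<p>The same half-open range can be written without spelling out the following day. A sketch, assuming the <strong>sales</strong> table above and a hypothetical index named <strong>ts_idx</strong>:</p>
<blockquote>
<pre>ALTER TABLE sales ADD INDEX ts_idx (ts);

SELECT * FROM sales
WHERE ts &gt;= DATE('2008-12-25')
  AND ts &lt; DATE('2008-12-25') + INTERVAL 1 DAY;

-- Check the plan: a range access on ts_idx is what we hope to see
EXPLAIN SELECT * FROM sales
WHERE ts &gt;= DATE('2008-12-25')
  AND ts &lt; DATE('2008-12-25') + INTERVAL 1 DAY;</pre>
</blockquote>
<p>Whether the optimizer actually picks the index depends on how many rows fall within the range, so treat the <strong>EXPLAIN</strong> as something to check rather than a guarantee.</p>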
<h4>Truncating to last midnight</h4>
<p>Surprisingly, this simple question sees a lot of incorrect solution attempts. The quickest, safest way to get &#8220;last midnight&#8221; is:</p>
<blockquote>
<pre>SELECT DATE(ts)</pre>
</blockquote>
<p>or, if you like to be stricter:</p>
<blockquote>
<pre>SELECT TIMESTAMP(DATE(ts))</pre>
</blockquote>
<p>For example:</p>
<blockquote>
<pre>SELECT TIMESTAMP(DATE('2009-12-06 20:14:52')) AS midnight;
+---------------------+
| midnight            |
+---------------------+
| 2009-12-06 00:00:00 |
+---------------------+</pre>
</blockquote>
<h4>Truncating to last round hour</h4>
<p>Similar to the above, but utilizes arithmetic:</p>
<blockquote>
<pre>SELECT DATE(ts) + INTERVAL HOUR(ts) HOUR</pre>
</blockquote>
<p>For example:</p>
<blockquote>
<pre>SELECT ts, DATE(ts) + INTERVAL HOUR(ts) HOUR FROM sales LIMIT 5;
+---------------------+-----------------------------------+
| ts                  | DATE(ts) + INTERVAL HOUR(ts) HOUR |
+---------------------+-----------------------------------+
| 2009-01-05 05:17:00 | 2009-01-05 05:00:00               |
| 2009-03-09 00:49:00 | 2009-03-09 00:00:00               |
| 2009-02-20 00:14:00 | 2009-02-20 00:00:00               |
| 2009-02-14 22:42:00 | 2009-02-14 22:00:00               |
| 2009-03-14 04:50:00 | 2009-03-14 04:00:00               |
+---------------------+-----------------------------------+</pre>
</blockquote>
<h4>Round to closest round hour</h4>
<p>Taking the classic round() implementation, which states:</p>
<blockquote>
<pre>round(x) := int(x + 0.5)</pre>
</blockquote>
<p>We write:</p>
<blockquote>
<pre>SELECT DATE(ts + INTERVAL 30 MINUTE) + INTERVAL HOUR(ts + INTERVAL 30 MINUTE) HOUR</pre>
</blockquote>
<p>Example:</p>
<blockquote>
<pre>SELECT ts, DATE(ts + INTERVAL 30 MINUTE) + INTERVAL HOUR(ts + INTERVAL 30 MINUTE) HOUR AS rounded FROM sales ORDER BY HOUR(ts) DESC LIMIT 5;
+---------------------+---------------------+
| ts                  | rounded             |
+---------------------+---------------------+
| 2009-03-25 23:54:00 | 2009-03-26 00:00:00 |
| 2009-03-13 23:45:00 | 2009-03-14 00:00:00 |
| 2009-01-29 22:53:00 | 2009-01-29 23:00:00 |
| 2009-01-18 22:22:00 | 2009-01-18 22:00:00 |
| 2009-01-14 22:16:00 | 2009-01-14 22:00:00 |
+---------------------+---------------------+</pre>
</blockquote>
<h4>Count number of midnights between two timestamps, inclusive</h4>
<p>Given two timestamps, <em>ts1</em> and <em>ts2</em>, what is the number of midnights between them?</p>
<blockquote>
<pre>SELECT TIMESTAMPDIFF(DAY, DATE(ts1), ts2) + IF(DATE(ts1) = ts1, 1, 0);</pre>
</blockquote>
<p>Example:</p>
<blockquote>
<pre>SELECT ts, ts2, TIMESTAMPDIFF(DAY, DATE(ts), ts2) + IF(DATE(ts) = ts, 1, 0) AS number_of_midnights FROM sales LIMIT 10;
+---------------------+---------------------+---------------------+
| ts                  | ts2                 | number_of_midnights |
+---------------------+---------------------+---------------------+
| 2009-01-05 05:17:00 | 2009-01-05 19:17:00 |                   0 |
| 2009-03-09 00:49:00 | 2009-03-11 15:49:00 |                   2 |
| 2009-02-20 00:14:00 | 2009-02-23 02:14:00 |                   3 |
| 2009-02-14 22:42:00 | 2009-02-18 07:42:00 |                   4 |
| 2009-03-14 04:50:00 | 2009-03-17 16:50:00 |                   3 |
| 2009-02-16 04:01:00 | 2009-02-19 08:01:00 |                   3 |
| 2009-01-20 05:36:00 | 2009-01-21 08:36:00 |                   1 |
| 2009-02-07 15:57:00 | 2009-02-07 22:57:00 |                   0 |
| 2009-02-13 14:59:00 | 2009-02-15 22:59:00 |                   2 |
| 2009-01-11 03:02:00 | 2009-01-13 11:02:00 |                   2 |
+---------------------+---------------------+---------------------+</pre>
</blockquote>
<h4>Further notes</h4>
<p>A full listing of temporal functions can be found in the <a href="http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html">MySQL documentation</a>. There&#8217;s almost always more than one way to solve a problem. I&#8217;ve seen (and done, in the past) many calculations done on the application side due to lack of familiarity with the available functions.</p>
<p>Please share your own common solutions below!</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/useful-temporal-functions-queries/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Character sets: latin1 vs. ascii</title>
		<link>http://code.openark.org/blog/mysql/character-sets-latin1-vs-ascii</link>
		<comments>http://code.openark.org/blog/mysql/character-sets-latin1-vs-ascii#comments</comments>
		<pubDate>Wed, 08 Jul 2009 07:39:02 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[Data Types]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=828</guid>
		<description><![CDATA[Unless specified otherwise, latin1 is the default character set in MySQL. What I usually find in schemes are columns which are either utf8 or latin1. The utf8 columns being those which need to contain multilingual characters (user names, addresses, articles etc.), and latin1 column being all the rest (passwords, digests, email addresses, hard-coded values etc.) [...]]]></description>
			<content:encoded><![CDATA[<p>Unless specified otherwise, <strong>latin1</strong> is the default character set in MySQL.</p>
<p>What I usually find in schemas are columns which are either <strong>utf8</strong> or <strong>latin1</strong>. The <strong>utf8</strong> columns are those which need to contain multilingual characters (user names, addresses, articles etc.), and the <strong>latin1</strong> columns are all the rest (passwords, digests, email addresses, hard-coded values etc.)</p>
<p>I find <strong>latin1</strong> to be improper for such purposes and suggest that <strong>ascii</strong> be used instead. The reason is that <strong>latin1</strong> implies European text (with Swedish collation). It is unclear to an outsider, when finding a <strong>latin1</strong> column, whether it should actually contain West European characters, or is just being used for ascii text, utilizing the fact that a character in <strong>latin1</strong> only requires 1 byte of storage.<span id="more-828"></span></p>
<p>Well, this is what the <strong>ascii</strong> character set is for. When I see an <strong>ascii</strong> column, I know for sure no West European characters are allowed; just the plain old a-zA-Z0-9 etc. It is clearer from the schema&#8217;s definition what the stored values should be.</p>
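<p>A sketch of what this looks like in a schema. The table and column names are hypothetical; the point is only that each column declares the character set its values actually need:</p>
<blockquote>
<pre>CREATE TABLE user_account (
 id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
 display_name VARCHAR(60) CHARSET utf8 NOT NULL,    -- multilingual text
 email VARCHAR(128) CHARSET ascii NOT NULL,         -- plain ASCII only
 password_digest CHAR(40) CHARSET ascii NOT NULL    -- hex digest, plain ASCII only
);</pre>
</blockquote>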
<h4>A note to MySQL</h4>
<p>It has been a long time since the Swedish roots of the company dictated the defaults. New instances should default to either <strong>ascii</strong> or <strong>utf8</strong> (the latter being the most common and space efficient Unicode encoding): character sets that are locale-neutral. Really, how many people realize that when they <strong>ORDER BY</strong> a text column, rows are sorted according to Swedish dictionary ordering?</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/character-sets-latin1-vs-ascii/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>The depth of an index: primer</title>
		<link>http://code.openark.org/blog/mysql/the-depth-of-an-index-primer</link>
		<comments>http://code.openark.org/blog/mysql/the-depth-of-an-index-primer#comments</comments>
		<pubDate>Thu, 09 Apr 2009 03:55:08 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Data Types]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[MyISAM]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=545</guid>
		<description><![CDATA[InnoDB and MyISAM use B+ and B trees for indexes (InnoDB also has internal hash index). In both these structures, the depth of the index is an important factor. When looking for an indexed row, a search is made on the index, from root to leaves. Assuming the index is not in memory, the depth [...]]]></description>
			<content:encoded><![CDATA[<p>InnoDB and MyISAM use B+ and B trees for indexes (InnoDB also has an internal hash index).</p>
<p>In both these structures, the depth of the index is an important factor. When looking for an indexed row, a search is made on the index, from root to leaves.</p>
<p>Assuming the index is not in memory, the depth of the index represents the minimal cost (in I/O operations) for an index-based lookup. Of course, most of the time we expect large portions of the indexes to be cached in memory. Even so, the depth of the index is an important factor. The deeper the index is, the worse it performs: there are simply more lookups on index nodes.</p>
<p>What affects the depth of an index?</p>
<p>There are quite a few structural issues, but it boils down to two important factors:</p>
<ol>
<li>The number of rows in the table: obviously, more rows lead to a larger index, and larger indexes grow in depth.</li>
<li>The size of the indexed column(s). An index on an INT column can be expected to be shallower than an index on a CHAR(32) column (on a very small number of rows they may have the same depth, so we&#8217;ll assume a large number of rows).</li>
</ol>
<p><span id="more-545"></span>Of course, these two factors also affect the total size of the index, hence its disk usage, but I wish to concentrate on the index depth.</p>
<p>Let&#8217;s emphasize the second factor. It is best to index shorter columns, if that is possible. It is the reason behind using an index on a VARCHAR&#8217;s prefix (e.g. KEY(email_address(16))). It is also a reason to use INT instead of BIGINT columns for your primary key, when BIGINT is not required.</p>
<p>The larger the indexed data type is (or the total size of data types for all columns in a combined index), the fewer values can fit in an index node. The fewer values in a node, the more node splits occur, and the more nodes are required to build the index. The fewer values in a node, the less <em>wide</em> the index tree is. The less wide an index tree is, and the more nodes it has, the deeper it gets.</p>
<p>So bigger data types lead to deeper trees. Deeper trees lead to more IO operations on lookup.</p>
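<p>As a back-of-the-envelope sketch only: the depth of a balanced tree grows roughly as the logarithm of the number of rows, with the number of keys per node as the base. The figures below (100 million rows, 1000 keys per node for a short key, 300 for a long one) are assumptions chosen for illustration, not measurements of any real index:</p>
<blockquote>
<pre>-- LOG(B, X) returns the logarithm of X to the base B
SELECT CEIL(LOG(1000, 100000000)) AS depth_short_keys,  -- 3
       CEIL(LOG(300, 100000000)) AS depth_long_keys;    -- 4</pre>
</blockquote>
<p>One extra level may not sound like much, but it is one more potential I/O operation on every lookup.</p>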
<h4>InnoDB</h4>
<p>On InnoDB there&#8217;s another issue: all tables are clustered by primary key. Any access to table data requires diving into, or traversing the primary key tree.</p>
<p>On InnoDB, a secondary index (any index which is not the primary key) does not lead directly to table data. Instead, the &#8220;data&#8221; in the leaf nodes of a secondary index are the primary key values.</p>
<p>And so, when looking up a value on an InnoDB table using a secondary key, we first search the secondary key to retrieve the primary key value, then go to the primary key tree to retrieve the data.</p>
<p>This means two index lookups, one of which is always the primary key.</p>
<p>On InnoDB, it is therefore particularly important to keep the primary key small. Have small data types. Prefer a SMALLINT to an INT, if possible. Prefer an INT to a BIGINT, if possible. Prefer an integer value over some VARCHAR text.</p>
<p>With long data types used in an InnoDB primary key, not only is the primary key index bloated (deep), but also every other index gets to be bloated, as the leaf values in all other indexes are those same long data types.</p>
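<p>A sketch of how this plays out in a schema, using a hypothetical <strong>customer</strong> table. The long, naturally unique value gets a secondary unique key, while a compact integer serves as the primary key that every secondary index carries in its leaves:</p>
<blockquote>
<pre>CREATE TABLE customer (
 id INT UNSIGNED NOT NULL AUTO_INCREMENT,
 email_address VARCHAR(128) CHARSET ascii NOT NULL,
 country_code CHAR(2) CHARSET ascii NOT NULL,
 PRIMARY KEY (id),                       -- 4 bytes, repeated in every secondary index
 UNIQUE KEY email_uidx (email_address),  -- uniqueness is still enforced
 KEY country_idx (country_code)
) ENGINE=InnoDB;</pre>
</blockquote>
<p>Had <strong>email_address</strong> been the primary key, each entry in <strong>country_idx</strong> would have carried the full email address along with it.</p>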
<h4>MyISAM</h4>
<p>MyISAM does not use clustered trees, hence the primary key is just a regular unique key. All indexes are created equal and an index lookup only consists of a single index search. Therefore, two indexes do not affect one another, with the exception that they compete for the same key cache.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/the-depth-of-an-index-primer/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>MySQL&#8217;s character sets and collations demystified</title>
		<link>http://code.openark.org/blog/mysql/mysqls-character-sets-and-collations-demystified</link>
		<comments>http://code.openark.org/blog/mysql/mysqls-character-sets-and-collations-demystified#comments</comments>
		<pubDate>Mon, 08 Dec 2008 06:44:24 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Data Types]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=10</guid>
		<description><![CDATA[MySQL's character sets and collations are often considered a mystery, and many users either completely disregard them and stick with the defaults, or set everything to UTF8.

This post will attempt to shed some light on the mystery, and provide some best practices for text columns with regard to character sets.]]></description>
			<content:encoded><![CDATA[<p>MySQL&#8217;s character sets and collations are often considered a mystery, and many users either completely disregard them and stick with the defaults, or set everything to UTF8.</p>
<p>This post will attempt to shed some light on the mystery, and provide some best practices for text columns with regard to character sets.<span id="more-10"></span></p>
<h4>Character Sets</h4>
<p>A thorough discussion of how the character sets have evolved through history is beyond the scope of this post. While the Unicode standard is gaining recognition, the &#8220;older&#8221; character sets are still around. Understanding the difference between Unicode and local character sets is crucial.</p>
<p>Consider, for example, MySQL&#8217;s <strong><code>latin1</code></strong> character set. In this character set there are 256 different characters, represented by one byte. The first 128 characters map to ASCII, the standard &#8220;ABCabc012 dot comma&#8221; set, of which most of this post is composed. The latter 128 characters in <strong><code>latin1</code></strong> are composed of West European specific characters, such as À, ë, õ, Ñ.</p>
<p>A <strong><code>Name VARCHAR(60) CHARSET latin1</code></strong> column can describe names with West European characters. But it cannot describe Russian or Hebrew names. To represent a name in Hebrew, you&#8217;d need the <strong><code>hebrew</code></strong> charset (ISO 8859-8), in which the first 128 characters are, as always, mapped to ASCII, and the latter 128 characters describe the Hebrew alphabet and punctuation marks, such as ש,ל,מ,ה. The Cyrillic, Arabic and Turkish charsets follow in a similar manner.</p>
<p>Assume now I&#8217;m building a world wide web application, such as a popular social network. I would like to store the first names of my users, in every possible language. None of the above character sets support all languages. I therefore turn to <a title="What is Unicode" href="http://www.unicode.org/standard/WhatIsUnicode.html">Unicode</a>. In particular, MySQL supports <strong><code>utf8</code></strong>, a Unicode encoding scheme, which is commonly used due to its economical storage requirements.</p>
<p>In Unicode there is a dedicated number for each letter in the known languages and in ancient languages. Some imaginary or otherwise nonexistent languages, such as Klingon (yes, I know there are people who actually speak Klingon), may yet find their way into the standard.</p>
<p>UTF8 (or utf8), a Unicode encoding scheme, states the following: for ASCII characters, such as &#8216;a&#8217;, &#8216;6&#8217;, &#8216;$&#8217;, only one byte of storage is required. For Hebrew, Cyrillic or Turkish characters, 2 bytes are required. For Japanese, Chinese &#8211; more (MySQL supports up to 3 bytes per character). Again, the exact details of the implementation are beyond the scope of this post, and are well described <a title="UTF-8 and Unicode FAQ" href="http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8">here</a> and <a title="Wikipedia - UTF-8" href="http://en.wikipedia.org/wiki/UTF-8">here</a>.</p>
<p>What&#8217;s important to me is that I can define <strong><code>Name VARCHAR(30) CHARSET utf8</code></strong> for my columns, and Voila! Any name can be represented in my database.</p>
<h4>So why not define everything as utf8 and get done with it?</h4>
<p>Well, it just so happens that Unicode comes with a price. See, for example, the following column definition:</p>
<blockquote><p><code>CountryCode CHAR(3) CHARSET utf8</code></p></blockquote>
<p>We are asking for a column with exactly 3 characters. The required storage for this column must be such that any 3-letter name fits in. This means (3 characters) times (3 bytes per character) = 9 bytes of storage. So <strong><code>CHAR</code></strong> and <strong><code>utf8</code></strong> together may be less than ideal. <strong><code>VARCHAR</code></strong> behaves better: it only requires as many bytes per character as described above. So the text &#8220;abc&#8221; will only require 3 bytes (plus <strong><code>VARCHAR</code></strong>&#8217;s leading 1 or 2 bytes).</p>
<h4>Why don&#8217;t we drop the &#8216;CHAR&#8217; altogether, then, and use only &#8216;VARCHAR&#8217;?</h4>
<p>Because some values are simply better represented with <strong><code>CHAR</code></strong>: consider a &#8220;password&#8221; column, encoded with MD5. The <strong><code>MD5()</code></strong> function returns a text 32 characters long. It&#8217;s always 32 characters, and, moreover, it&#8217;s always in ASCII. The best data type and character set definition would be <strong><code>password CHAR(32) CHARSET ascii</code></strong>. We thus ensure exactly 32 bytes are allocated to this column. A <strong><code>VARCHAR</code></strong> will require an additional byte or two, depending on its defined length, to indicate the length of the text.</p>
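<p>A minimal sketch of such a column, using a hypothetical <code>user_credentials</code> table (the table and column names are my own, for illustration only):</p>
<blockquote>
<pre>CREATE TABLE user_credentials (
  user_id INT UNSIGNED NOT NULL,
  password_md5 CHAR(32) CHARSET ascii NOT NULL,  -- always 32 one-byte characters
  PRIMARY KEY (user_id)
);
INSERT INTO user_credentials (user_id, password_md5) VALUES (1, MD5('mypass'));

-- Character count and byte count agree for an ascii column; both are 32:
SELECT CHAR_LENGTH(password_md5), LENGTH(password_md5) FROM user_credentials;</pre>
</blockquote>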
<h4>And why would I care about collations?</h4>
<p>Collations deal with text comparison. The default character set in MySQL is <strong><code>latin1</code></strong>. The default collation is <strong><code>latin1_swedish_ci</code></strong>. In this collation the following holds true: <strong><code>'ABC' = 'abc'</code></strong>.</p>
<p>Wait. What?</p>
<p>Look at the &#8220;ci&#8221; in <strong><code>latin1_swedish_ci</code></strong>. It stands for &#8220;case insensitive&#8221;. Collations which end with &#8220;cs&#8221; or &#8220;bin&#8221; are case sensitive. The <strong><code>utf8</code></strong> character set comes with the <strong><code>utf8_general_ci</code></strong> collation. This can make sense. Let&#8217;s review our web application table (I&#8217;m using plain text passwords here, bear with me for this example):</p>
<blockquote>
<pre>CREATE TABLE my_users (
  name VARCHAR(30) CHARSET utf8 COLLATE utf8_general_ci,
  plainPassword VARCHAR(16) CHARSET ASCII,
  UNIQUE KEY (name)
);
INSERT INTO my_users (name, plainPassword) VALUES ('David', 'mypass');</pre>
</blockquote>
<p>It holds true that the name &#8216;David&#8217; equals &#8216;david&#8217;. If I were to <strong><code>SELECT * FROM my_users WHERE name='david'</code></strong>, I would find the desired row. The unique key will also guarantee that no daVID user can be added.</p>
<p>But David certainly wouldn&#8217;t want users to log in with the password &#8216;MYPASS&#8217;. So we refine our table:</p>
<blockquote>
<pre>CREATE TABLE my_users (
  name VARCHAR(30) CHARSET utf8 COLLATE utf8_general_ci,
  plainPassword VARCHAR(16) CHARSET ascii COLLATE ascii_bin,
  UNIQUE KEY (name)
);</pre>
</blockquote>
<p>The <strong><code>ascii_bin</code></strong> collation is a case sensitive collation for <strong><code>ascii</code></strong>. The following will not find anything:</p>
<blockquote><p><code>SELECT * FROM my_users WHERE name='david' AND plainPassword='MYPASS';</code></p></blockquote>
<p>Holding a plain text password in your database is not a best practice, but apparently it&#8217;s common.</p>
<p>Collations also deal with text ordering. For any two strings, the collation determines which is larger, or if they are equal. Probably the most common situation where you see collations in action is when you <strong>ORDER BY</strong> a text column.</p>
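<p>To see a collation at work, it can be named explicitly with <strong><code>COLLATE</code></strong>. The following is a sketch only; it assumes the <strong><code>my_users</code></strong> table from above, and uses the collation names as they appear in this post:</p>
<blockquote>
<pre>-- Same two strings, two collations of the same character set:
SELECT _latin1'ABC' COLLATE latin1_swedish_ci = _latin1'abc' AS ci_equal;   -- 1
SELECT _latin1'ABC' COLLATE latin1_bin = _latin1'abc' AS bin_equal;         -- 0

-- Ordering by the column's own (case insensitive) collation:
SELECT name FROM my_users ORDER BY name;

-- Ordering by the binary collation instead, which compares character codes:
SELECT name FROM my_users ORDER BY name COLLATE utf8_bin;</pre>
</blockquote>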
<h4>Also keep in mind</h4>
<ul>
<li>When you check for length of strings, do you use the <strong><code>LENGTH()</code></strong> function, as in <strong><code>SELECT LENGTH(Name) FROM City</code></strong>? You probably wish to replace this with <strong><code>CHAR_LENGTH()</code></strong>. <strong><code>LENGTH()</code></strong> returns the number of bytes required for the text storage. <strong><code>CHAR_LENGTH()</code></strong> returns the number of characters in the text, and is usually what you are looking for. It may hold true that for a string s, <strong><code>LENGTH(s)=12</code></strong> and <strong><code>CHAR_LENGTH(s)=8</code></strong>. Watch out for these glitches.</li>
<li>You can convert text between character sets with <strong><code>CONVERT</code></strong>. For example: <strong><code>CONVERT(s USING utf8)</code></strong></li>
<li>Stored routines should not be overlooked. If your stored routine accepts a text argument, or if your stored function returns one, make sure the character sets are properly defined. If not, then your utf8 text may be converted to latin1 during the call to your stored routine. This also applies to local variables within the stored routines.</li>
<li>An <strong><code>ALTER TABLE <em>&lt;some table&gt;</em> CONVERT TO CHARACTER SET <em>&lt;some charset&gt;</em></code></strong> will change the character set not only for the table itself, but also for all existing textual columns.</li>
</ul>
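<p>The difference between <strong><code>LENGTH()</code></strong> and <strong><code>CHAR_LENGTH()</code></strong> can be sketched with a short Hebrew word. The bytes are given as a hex literal so that the example does not depend on the connection character set; the variable name <code>@s</code> is arbitrary:</p>
<blockquote>
<pre>-- The four Hebrew letters of 'shalom', as utf8 bytes (2 bytes per letter):
SET @s = _utf8 X'D7A9D79CD795D79D';

SELECT CHAR_LENGTH(@s), LENGTH(@s);        -- 4 characters, 8 bytes

-- The same text in the single-byte hebrew character set:
SELECT LENGTH(CONVERT(@s USING hebrew));   -- 4 bytes</pre>
</blockquote>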
<p>See the following post: <a title="Useful database analysis queries with INFORMATION_SCHEMA" href="http://code.openark.org/blog/mysql/useful-database-analysis-queries-with-information_schema">Useful database analysis queries with INFORMATION_SCHEMA</a> for queries which diagnose your databases&#8217; character sets.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/mysqls-character-sets-and-collations-demystified/feed</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Common wrong Data Types compilation</title>
		<link>http://code.openark.org/blog/mysql/common-data-types-errors-compilation</link>
		<comments>http://code.openark.org/blog/mysql/common-data-types-errors-compilation#comments</comments>
		<pubDate>Tue, 18 Nov 2008 07:37:57 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Data Types]]></category>
		<category><![CDATA[Normalization]]></category>
		<category><![CDATA[Schema]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=85</guid>
		<description><![CDATA[During my work with companies using MySQL, I have encountered many issues with regard to schema design, normalization and indexing. Among the most common errors are incorrect data type definitions. 

Here's a compilation of "the right and the wrong" data types.]]></description>
			<content:encoded><![CDATA[<p>During my work with companies using MySQL, I have encountered many issues with regard to schema design, normalization and indexing. Among the most common errors are incorrect data type definitions. Many times the database is designed by programmers or otherwise non-expert DBAs. Some companies do not have the time and cannot spare the effort of redesigning and refactoring their databases, and eventually face performance issues.</p>
<p>Here&#8217;s a compilation of &#8220;the right and the wrong&#8221; data types.<span id="more-85"></span></p>
<ul>
<li><strong><code>INT(1)</code></strong> is not one byte long. <strong><code>INT(10)</code></strong> is no bigger than <strong><code>INT(2)</code></strong>. The number in parentheses is misleading, and only describes the text alignment of the number, when displayed in an interactive shell. All mentioned types are the same <strong><code>INT</code></strong>, have the same storage capacity, and the same range. If you want a one-byte <strong><code>INT</code></strong>, use <strong><code>TINYINT</code></strong>.</li>
</ul>
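<p>A small sketch of the above, with a hypothetical <code>int_widths</code> table (note that newer MySQL versions may issue a deprecation warning for the display width):</p>
<blockquote>
<pre>CREATE TABLE int_widths (
  a INT(1),
  b INT(10),
  c TINYINT
);

-- a and b accept the very same range; the maximum signed INT fits in both.
-- c is the true one-byte integer, and 127 is the largest signed value it holds.
INSERT INTO int_widths (a, b, c) VALUES (2147483647, 2147483647, 127);</pre>
</blockquote>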
<ul>
<li>An integer <strong><code>PRIMARY KEY</code></strong> is preferable, especially if you&#8217;re using the InnoDB storage engine. If possible, avoid using <strong><code>VARCHAR</code></strong> as <strong><code>PRIMARY KEY</code></strong>. In InnoDB, this will make the clustered index deeper, secondary indexes larger (sometimes much larger) and lookups slower.</li>
</ul>
<ul>
<li>Do not use <strong><code>VARCHAR</code></strong> to represent timestamps. It may look like <strong><code>'2008-11-14 07:59:13'</code></strong> is a textual field, but in fact a <strong><code>TIMESTAMP</code></strong> is just an integer counting the seconds elapsed since 1970-01-01. That&#8217;s 4 bytes vs. 19 if you&#8217;re using <strong><code>CHAR</code></strong> with <strong><code>ASCII</code></strong> charset, or more if you&#8217;re using <strong><code>UTF8</code></strong> or <strong><code>VARCHAR</code></strong>.</li>
</ul>
<ul>
<li>Do not use <strong><code>VARCHAR</code></strong> to represent IPv4 addresses. This one is quite common. The IP 192.168.100.255 can be represented with <strong><code>VARCHAR(15)</code></strong>, true, but could be better represented with a 4-byte <strong><code>INT UNSIGNED</code></strong>. That&#8217;s what IPv4 is: four bytes. Use the <strong><code>INET_ATON()</code></strong> and <strong><code>INET_NTOA()</code></strong> functions to translate between the integer value and the textual value.</li>
</ul>
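<p>For example, with a hypothetical <code>access_log</code> table (the names are made up). The column must be <strong><code>UNSIGNED</code></strong>, since addresses in the upper half of the range exceed the maximum signed <strong><code>INT</code></strong> value of 2147483647:</p>
<blockquote>
<pre>CREATE TABLE access_log (
  ip INT UNSIGNED NOT NULL,   -- 4 bytes, instead of up to 15 characters of text
  KEY (ip)
);
INSERT INTO access_log (ip) VALUES (INET_ATON('192.168.100.255'));

SELECT ip, INET_NTOA(ip) FROM access_log;   -- 3232261375, '192.168.100.255'</pre>
</blockquote>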
<ul>
<li>This one should be obvious, but I&#8217;ve seen it in reality, where the schema was auto generated by some naive generator: do not represent numbers as text. Yes, I have seen integer columns represented by <strong><code>VARCHAR</code></strong>. Don&#8217;t ask how the performance was.</li>
</ul>
<ul>
<li><strong><code>MD5()</code></strong> columns shouldn&#8217;t be <strong><code>VARCHAR</code></strong>. Use <strong><code>CHAR(32)</code></strong> instead. It&#8217;s always 32 bytes long, so no need for <strong><code>VARCHAR</code></strong>&#8217;s additional byte overhead. If your tables or database are <strong><code>UTF8</code></strong> by default, make sure the MD5 column&#8217;s charset is <strong><code>ASCII</code></strong>, or it will consume 96 bytes instead of just 32. I also suggest the case-sensitive <strong><code>ascii_bin</code></strong> collation, but that&#8217;s a more minor issue.</li>
</ul>
<ul>
<li><strong><code>PASSWORD()</code></strong> columns shouldn&#8217;t be <strong><code>VARCHAR</code></strong>, but <strong><code>CHAR</code></strong>. The length depends on whether you&#8217;re using the <strong><code>old-passwords</code></strong> variable (for some strange reason, this variable always appears in the MySQL sample configuration files &#8211; though you really don&#8217;t want it unless it&#8217;s for backward compatibility with older MySQL versions). As in the MD5 note, use the <strong><code>ASCII</code></strong> charset.</li>
</ul>
<ul>
<li>Prefer <strong><code>TIMESTAMP</code></strong> over <strong><code>INT</code></strong> for counting seconds, as MySQL has many supporting functions for this data type.</li>
</ul>
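<p>A brief sketch of what a <strong><code>TIMESTAMP</code></strong> column buys you, using a hypothetical <code>page_view</code> table. The values returned by <strong><code>UNIX_TIMESTAMP()</code></strong> and <strong><code>FROM_UNIXTIME()</code></strong> depend on the session time zone, so no outputs are listed here:</p>
<blockquote>
<pre>CREATE TABLE page_view (
  viewed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO page_view (viewed_at) VALUES ('2008-11-14 07:59:13');

-- Date functions and interval arithmetic work directly on the column:
SELECT viewed_at,
       DATE(viewed_at),
       viewed_at + INTERVAL 1 DAY,
       UNIX_TIMESTAMP(viewed_at)            -- the underlying seconds count
  FROM page_view;

-- With a plain INT column, every such query would need an explicit conversion:
SELECT FROM_UNIXTIME(1226649553);</pre>
</blockquote>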
<ul>
<li>Use <strong><code>TINYINT</code></strong>, <strong><code>SMALLINT</code></strong>, <strong><code>MEDIUMINT</code></strong> instead of <strong><code>INT</code></strong> when possible. Do you expect to have 4000000000 customers? No? Then an &#8220;<strong><code>id SMALLINT</code></strong>&#8221; may suffice as <strong><code>PRIMARY KEY</code></strong>.</li>
</ul>
<ul>
<li>Use <strong><code>CHARACTER SET</code></strong>s with care. More on this in future posts.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/common-data-types-errors-compilation/feed</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
	</channel>
</rss>
