<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>code.openark.org &#187; SQL</title>
	<atom:link href="http://code.openark.org/blog/tag/sql/feed" rel="self" type="application/rss+xml" />
	<link>http://code.openark.org/blog</link>
	<description>Blog by Shlomi Noach</description>
	<lastBuildDate>Tue, 07 Sep 2010 05:53:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>SQL trick: overcoming GROUP_CONCAT limitation in special cases</title>
		<link>http://code.openark.org/blog/mysql/sql-trick-overcoming-group_concat-limitation-in-special-cases</link>
		<comments>http://code.openark.org/blog/mysql/sql-trick-overcoming-group_concat-limitation-in-special-cases#comments</comments>
		<pubDate>Wed, 21 Jul 2010 13:14:30 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2580</guid>
		<description><![CDATA[In Verifying GROUP_CONCAT limit without using variables, I have presented a test to verify if group_concat_max_len is sufficient for known limitations. I will follow the path where I assume I cannot control group_concat_max_len, not even in session scope, and show an SQL solution, dirty as it is, to overcome the GROUP_CONCAT limitation, under certain conditions. [...]]]></description>
			<content:encoded><![CDATA[<p>In <a title="Link to Verifying GROUP_CONCAT limit without  using variables" rel="bookmark" href="http://code.openark.org/blog/mysql/verifying-group_concat-limit-without-using-variables">Verifying GROUP_CONCAT limit without using variables</a>, I have presented a test to verify if <strong>group_concat_max_len</strong> is sufficient for known limitations. I will follow the path where I assume I cannot control <strong>group_concat_max_len</strong>, not even in session scope, and show an SQL solution, dirty as it is, to overcome the <strong>GROUP_CONCAT</strong> limitation, under certain conditions.</p>
<p>Sheeri rightly <a href="http://code.openark.org/blog/mysql/verifying-group_concat-limit-without-using-variables#comment-14617">asks</a> why I wouldn&#8217;t just set <strong>group_concat_max_len</strong> in session scope. The particular case I have is that I&#8217;m providing a VIEW definition. I&#8217;d like users to &#8220;install&#8221; that view, i.e. to <strong>CREATE</strong> it on their database. The VIEW does some logic, and uses <strong>GROUP_CONCAT</strong> to implement that logic.</p>
<p>Now, I have no control over the DBA or developer who creates the view. The creation of the view has nothing to do with the <strong>group_concat_max_len</strong> setting on her database instance.</p>
<h4>An example</h4>
<p>OK, apologies aside. Using the <a href="http://dev.mysql.com/doc/sakila/en/sakila.html">sakila</a> database, I execute:</p>
<blockquote>
<pre>mysql&gt; SELECT GROUP_CONCAT(last_name) FROM actor \G
*************************** 1. row ***************************
GROUP_CONCAT(last_name): AKROYD,AKROYD,AKROYD,ALLEN,ALLEN,ALLEN,ASTAIRE,BACALL,BAILEY,BAILEY,BALE,BALL,BARRYMORE,BASINGER,BENING,BENING,BERGEN,BERGMAN,BERRY,BERRY,BERRY,BIRCH,BLOOM,BOLGER,BOLGER,BRIDGES,BRODY,BRODY,BULLOCK,CAGE,CAGE,CARREY,CHAPLIN,CHASE,CHASE,CLOSE,COSTNER,CRAWFORD,CRAWFORD,CRONYN,CRONYN,CROWE,CRUISE,CRUZ,DAMON,DAVIS,DAVIS,DAVIS,DAY-LEWIS,DEAN,DEAN,DEE,DEE,DEGENERES,DEGENERES,DEGENERES,DENCH,DENCH,DEPP,DEPP,DERN,DREYFUSS,DUKAKIS,DUKAKIS,DUNST,FAWCETT,FAWCETT,GABLE,GARLAND,GARLAND,GARLAND,GIBSON,GOLDBERG,GOODING,GOODING,GRANT,GUINESS,GUINESS,GUINESS,HACKMAN,HACKMAN,HARRIS,HARRIS,HARRIS,HAWKE,HESTON,HOFFMAN,HOFFMAN,HOFFMAN,HOPE,HOPKINS,HOPKINS,HOPKINS,HOPPER,HOPPER,HUDSON,HUNT,HURT,JACKMAN,JACKMAN,JOHANSSON,JOHANSSON,JOHANSSON,JOLIE,JOVOVICH,KEITEL,KEITEL,KEITEL,KILMER,KILMER,KILMER,KILMER,KILMER,LEIGH,LOLLOBRIGIDA,MALDEN,MANSFIELD,MARX,MCCONAUGHEY,MCCONAUGHEY,MCDORMAND,MCKELLEN,MCKELLEN,MCQUEEN,MCQUEEN,MIRANDA,MONROE,MONROE,MOSTEL,MOSTEL,NEESON,NEESON,NICHOLSON,NOLTE,NOLTE,NOLTE,NOLTE,OLIVIER,OLIVIER,PALTROW,PALTROW,P
1 row in set, 1 warning (0.00 sec)

mysql&gt; SHOW WARNINGS;
+---------+------+--------------------------------------+
| Level   | Code | Message                              |
+---------+------+--------------------------------------+
| Warning | 1260 | 1 line(s) were cut by GROUP_CONCAT() |
+---------+------+--------------------------------------+
1 row in set (0.00 sec)
</pre>
</blockquote>
<p><span id="more-2580"></span>So, my <strong>GROUP_CONCAT</strong> has been truncated. How much did I lose?</p>
<blockquote>
<pre>mysql&gt; SELECT SUM(LENGTH(last_name) + 1) - 1 FROM actor;
+--------------------------------+
| SUM(LENGTH(last_name) + 1) - 1 |
+--------------------------------+
|                           1445 |
+--------------------------------+
</pre>
</blockquote>
<p>(In the above query I counted the separating commas; they are part of the <strong>GROUP_CONCAT</strong> limit).</p>
<h4>The special case at hand</h4>
<p>The proposed SQL trick assumes the following:</p>
<ul>
<li>The length of the <strong>GROUP_CONCAT</strong> result is <em>known to be under a certain value</em>.</li>
<li>A <strong>GROUP_CONCAT</strong> of any set of <em>n</em> rows is <em>known to be shorter than (or equal to) <strong>1024</strong> characters</em>.</li>
</ul>
<p>In our above example, I happen to know that the length of the <strong>GROUP_CONCAT</strong> result is below <strong>2048</strong>. I also happen to know that any <strong>100</strong> rows will yield a <strong>GROUP_CONCAT</strong> length of less than <strong>1024</strong>.</p>
<p>How can I know this? Well, the length of my <strong>VARCHAR</strong>, or the fact I&#8217;m handling <strong>INT</strong> values can give me upper bounds on total lengths.</p>
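<p>One way to sanity-check the second assumption against the actual data is to sum the lengths per chunk. The following is a minimal sketch, assuming chunks of <strong>100</strong> consecutive <strong>actor_id</strong> values; the <strong>chunk</strong> and <strong>chunk_length</strong> aliases are hypothetical names chosen for this example:</p>
<blockquote>
<pre>SELECT
  FLOOR((actor_id - 1) / 100) AS chunk,
  -- each name plus one separating comma, minus the comma the last name does not need
  SUM(LENGTH(last_name) + 1) - 1 AS chunk_length
FROM actor
GROUP BY chunk;
</pre>
</blockquote>
<p>Every <strong>chunk_length</strong> should come out at <strong>1024</strong> or less. Note that this only verifies the data as it is right now; a guaranteed bound still has to come from the column definitions.</p>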
<h4>Steps towards the solution</h4>
<p>Returning to our example, my intention becomes clearer: I want to work it out in two phases (later on I&#8217;ll show how this can be done in more phases). Each of the following stays within the limit:</p>
<blockquote>
<pre>mysql&gt; SELECT GROUP_CONCAT(last_name) FROM actor WHERE actor_id BETWEEN 1 and 100 \G
*************************** 1. row ***************************
GROUP_CONCAT(last_name): GUINESS,WAHLBERG,CHASE,DAVIS,LOLLOBRIGIDA,NICHOLSON,MOSTEL,JOHANSSON,SWANK,GABLE,CAGE,BERRY,WOOD,BERGEN,OLIVIER,COSTNER,VOIGHT,TORN,FAWCETT,TRACY,PALTROW,MARX,KILMER,STREEP,BLOOM,CRAWFORD,MCQUEEN,HOFFMAN,WAYNE,PECK,SOBIESKI,HACKMAN,PECK,OLIVIER,DEAN,DUKAKIS,BOLGER,MCKELLEN,BRODY,CAGE,DEGENERES,MIRANDA,JOVOVICH,STALLONE,KILMER,GOLDBERG,BARRYMORE,DAY-LEWIS,CRONYN,HOPKINS,PHOENIX,HUNT,TEMPLE,PINKETT,KILMER,HARRIS,CRUISE,AKROYD,TAUTOU,BERRY,NEESON,NEESON,WRAY,JOHANSSON,HUDSON,TANDY,BAILEY,WINSLET,PALTROW,MCCONAUGHEY,GRANT,WILLIAMS,PENN,KEITEL,POSEY,ASTAIRE,MCCONAUGHEY,SINATRA,HOFFMAN,CRUZ,DAMON,JOLIE,WILLIS,PITT,ZELLWEGER,CHAPLIN,PECK,PESCI,DENCH,GUINESS,BERRY,AKROYD,PRESLEY,TORN,WAHLBERG,WILLIS,HAWKE,BRIDGES,MOSTEL,DEPP
1 row in set (0.00 sec)

mysql&gt; SELECT GROUP_CONCAT(last_name) FROM actor WHERE actor_id BETWEEN 101 and 200 \G
*************************** 1. row ***************************
GROUP_CONCAT(last_name): DAVIS,TORN,LEIGH,CRONYN,CROWE,DUNST,DEGENERES,NOLTE,DERN,DAVIS,ZELLWEGER,BACALL,HOPKINS,MCDORMAND,BALE,STREEP,TRACY,ALLEN,JACKMAN,MONROE,BERGMAN,NOLTE,DENCH,BENING,NOLTE,TOMEI,GARLAND,MCQUEEN,CRAWFORD,KEITEL,JACKMAN,HOPPER,PENN,HOPKINS,REYNOLDS,MANSFIELD,WILLIAMS,DEE,GOODING,HURT,HARRIS,RYDER,DEAN,WITHERSPOON,ALLEN,JOHANSSON,WINSLET,DEE,TEMPLE,NOLTE,HESTON,HARRIS,KILMER,GIBSON,TANDY,WOOD,MALDEN,BASINGER,BRODY,DEPP,HOPE,KILMER,WEST,WILLIS,GARLAND,DEGENERES,BULLOCK,WILSON,HOFFMAN,HOPPER,PFEIFFER,WILLIAMS,DREYFUSS,BENING,HACKMAN,CHASE,MCKELLEN,MONROE,GUINESS,SILVERSTONE,CARREY,AKROYD,CLOSE,GARLAND,BOLGER,ZELLWEGER,BALL,DUKAKIS,BIRCH,BAILEY,GOODING,SUVARI,TEMPLE,ALLEN,SILVERSTONE,WALKEN,WEST,KEITEL,FAWCETT,TEMPLE
1 row in set (0.00 sec)
</pre>
</blockquote>
<p>It&#8217;s somewhat tempting to try the following trick based on <strong>IF</strong>, but see what happens:</p>
<blockquote>
<pre>mysql&gt; SELECT GROUP_CONCAT(IF(actor_id BETWEEN 1 AND 100, last_name, '')) FROM actor\G
*************************** 1. row ***************************
GROUP_CONCAT(IF(actor_id BETWEEN 1 AND 100, last_name, '')): AKROYD,AKROYD,,,,,ASTAIRE,,BAILEY,,,,BARRYMORE,,,,BERGEN,,BERRY,BERRY,BERRY,,BLOOM,BOLGER,,BRIDGES,BRODY,,,CAGE,CAGE,,CHAPLIN,CHASE,,,COSTNER,CRAWFORD,,CRONYN,,,CRUISE,CRUZ,DAMON,DAVIS,,,DAY-LEWIS,DEAN,,,,DEGENERES,,,DENCH,,DEPP,,,,DUKAKIS,,,FAWCETT,,GABLE,,,,,GOLDBERG,,,GRANT,GUINESS,GUINESS,,HACKMAN,,HARRIS,,,HAWKE,,HOFFMAN,HOFFMAN,,,HOPKINS,,,,,HUDSON,HUNT,,,,JOHANSSON,JOHANSSON,,JOLIE,JOVOVICH,KEITEL,,,KILMER,KILMER,KILMER,,,,LOLLOBRIGIDA,,,MARX,MCCONAUGHEY,MCCONAUGHEY,,MCKELLEN,,MCQUEEN,,MIRANDA,,,MOSTEL,MOSTEL,NEESON,NEESON,NICHOLSON,,,,,OLIVIER,OLIVIER,PALTROW,PALTROW,PECK,PECK,PECK,PENN,,PESCI,,PHOENIX,PINKETT,PITT,POSEY,PRESLEY,,,,,SINATRA,SOBIESKI,STALLONE,STREEP,,,SWANK,TANDY,,TAUTOU,TEMPLE,,,,,TORN,TORN,,TRACY,,VOIGHT,WAHLBERG,WAHLBERG,,WAYNE,,,WILLIAMS,,,WILLIS,WILLIS,,,WINSLET,,,WOOD,,WRAY,ZELLWEGER,,
1 row in set (0.00 sec)
</pre>
</blockquote>
<p>We&#8217;re getting there, though. We will mimic <strong>GROUP_CONCAT</strong>&#8217;s separator by using <strong>CONCAT</strong>, and remove the default separator:</p>
<blockquote>
<pre>SELECT
 GROUP_CONCAT(
   IF(actor_id BETWEEN 1 AND 100, CONCAT(',', last_name), '')
   SEPARATOR ''
 ) AS result
FROM actor
\G
*************************** 1. row ***************************
result: ,AKROYD,AKROYD,ASTAIRE,BAILEY,BARRYMORE,BERGEN,BERRY,BERRY,BERRY,BLOOM,BOLGER,BRIDGES,BRODY,CAGE,CAGE,CHAPLIN,CHASE,COSTNER,CRAWFORD,CRONYN,CRUISE,CRUZ,DAMON,DAVIS,DAY-LEWIS,DEAN,DEGENERES,DENCH,DEPP,DUKAKIS,FAWCETT,GABLE,GOLDBERG,GRANT,GUINESS,GUINESS,HACKMAN,HARRIS,HAWKE,HOFFMAN,HOFFMAN,HOPKINS,HUDSON,HUNT,JOHANSSON,JOHANSSON,JOLIE,JOVOVICH,KEITEL,KILMER,KILMER,KILMER,LOLLOBRIGIDA,MARX,MCCONAUGHEY,MCCONAUGHEY,MCKELLEN,MCQUEEN,MIRANDA,MOSTEL,MOSTEL,NEESON,NEESON,NICHOLSON,OLIVIER,OLIVIER,PALTROW,PALTROW,PECK,PECK,PECK,PENN,PESCI,PHOENIX,PINKETT,PITT,POSEY,PRESLEY,SINATRA,SOBIESKI,STALLONE,STREEP,SWANK,TANDY,TAUTOU,TEMPLE,TORN,TORN,TRACY,VOIGHT,WAHLBERG,WAHLBERG,WAYNE,WILLIAMS,WILLIS,WILLIS,WINSLET,WOOD,WRAY,ZELLWEGER
1 row in set (0.00 sec)
</pre>
</blockquote>
<h4>Solution</h4>
<p>Let&#8217;s combine all we had so far to get the final result:</p>
<blockquote>
<pre>SELECT
  SUBSTRING(
    CONCAT(
      GROUP_CONCAT(
        IF(actor_id BETWEEN 1 AND 100, CONCAT(',', last_name), '')
        SEPARATOR ''
      ),
      GROUP_CONCAT(
        IF(actor_id BETWEEN 101 AND 200, CONCAT(',', last_name), '')
        SEPARATOR ''
      )
    ),
    2
  ) AS result
FROM actor
\G

*************************** 1. row ***************************
result: AKROYD,AKROYD,ASTAIRE,BAILEY,BARRYMORE,BERGEN,BERRY,BERRY,BERRY,BLOOM,BOLGER,BRIDGES,BRODY,CAGE,CAGE,CHAPLIN,CHASE,COSTNER,CRAWFORD,CRONYN,CRUISE,CRUZ,DAMON,DAVIS,DAY-LEWIS,DEAN,DEGENERES,DENCH,DEPP,DUKAKIS,FAWCETT,GABLE,GOLDBERG,GRANT,GUINESS,GUINESS,HACKMAN,HARRIS,HAWKE,HOFFMAN,HOFFMAN,HOPKINS,HUDSON,HUNT,JOHANSSON,JOHANSSON,JOLIE,JOVOVICH,KEITEL,KILMER,KILMER,KILMER,LOLLOBRIGIDA,MARX,MCCONAUGHEY,MCCONAUGHEY,MCKELLEN,MCQUEEN,MIRANDA,MOSTEL,MOSTEL,NEESON,NEESON,NICHOLSON,OLIVIER,OLIVIER,PALTROW,PALTROW,PECK,PECK,PECK,PENN,PESCI,PHOENIX,PINKETT,PITT,POSEY,PRESLEY,SINATRA,SOBIESKI,STALLONE,STREEP,SWANK,TANDY,TAUTOU,TEMPLE,TORN,TORN,TRACY,VOIGHT,WAHLBERG,WAHLBERG,WAYNE,WILLIAMS,WILLIS,WILLIS,WINSLET,WOOD,WRAY,ZELLWEGER,AKROYD,ALLEN,ALLEN,ALLEN,BACALL,BAILEY,BALE,BALL,BASINGER,BENING,BENING,BERGMAN,BIRCH,BOLGER,BRODY,BULLOCK,CARREY,CHASE,CLOSE,CRAWFORD,CRONYN,CROWE,DAVIS,DAVIS,DEAN,DEE,DEE,DEGENERES,DEGENERES,DENCH,DEPP,DERN,DREYFUSS,DUKAKIS,DUNST,FAWCETT,GARLAND,GARLAND,GARLAND,GIBSON,GOODING,GOODING,GUINESS,HACKMAN,HARRIS,HARRIS,HESTON,HOFFMAN,HOPE,HOPKINS,HOPKINS,HOPPER,HOPPER,HURT,JACKMAN,JACKMAN,JOHANSSON,KEITEL,KEITEL,KILMER,KILMER,LEIGH,MALDEN,MANSFIELD,MCDORMAND,MCKELLEN,MCQUEEN,MONROE,MONROE,NOLTE,NOLTE,NOLTE,NOLTE,PENN,PFEIFFER,REYNOLDS,RYDER,SILVERSTONE,SILVERSTONE,STREEP,SUVARI,TANDY,TEMPLE,TEMPLE,TEMPLE,TOMEI,TORN,TRACY,WALKEN,WEST,WEST,WILLIAMS,WILLIAMS,WILLIS,WILSON,WINSLET,WITHERSPOON,WOOD,ZELLWEGER,ZELLWEGER
1 row in set (0.00 sec)
</pre>
</blockquote>
<h4>More than 2048 characters?</h4>
<p>As long as the upper limit is known, we can work this trick in the same manner. Assume the length is expected to be <strong>3000</strong> characters. We can then <strong>CONCAT</strong> three, or four, or five <strong>GROUP_CONCAT</strong> results, each covering as few rows as required. Just copy+paste the above <strong>GROUP_CONCAT(&#8230;)</strong> clause a couple more times, and edit the <strong>actor_id BETWEEN n AND m</strong> clauses.</p>
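<p>For illustration, here is a sketch of a three-part variant on the same 200-row <strong>actor</strong> table. The split points (<strong>70</strong> and <strong>140</strong>) are arbitrary values picked for this example; what matters is that every range is known to concatenate to no more than <strong>1024</strong> characters, and that together the ranges cover all rows:</p>
<blockquote>
<pre>SELECT
  SUBSTRING(
    CONCAT(
      GROUP_CONCAT(
        IF(actor_id BETWEEN 1 AND 70, CONCAT(',', last_name), '')
        SEPARATOR ''
      ),
      GROUP_CONCAT(
        IF(actor_id BETWEEN 71 AND 140, CONCAT(',', last_name), '')
        SEPARATOR ''
      ),
      GROUP_CONCAT(
        IF(actor_id BETWEEN 141 AND 200, CONCAT(',', last_name), '')
        SEPARATOR ''
      )
    ),
    2  -- skip the leading comma
  ) AS result
FROM actor;
</pre>
</blockquote>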
<p>Moreover, further using <strong>MIN(actor_id)</strong>, <strong>MAX(actor_id)</strong> can minimize dependencies on specific values.</p>
<p>Dirty? Ugly? Not arguing. But it&#8217;s working! In some ways it is not such a dirty solution: I&#8217;m avoiding stored routines (which could easily set the <strong>group_concat_max_len</strong> session variable from within a stored function&#8217;s body, see Justin&#8217;s <a href="http://code.openark.org/blog/mysql/verifying-group_concat-limit-without-using-variables#comment-14641">suggestion</a>), so I&#8217;m only relying on SQL, not on &#8220;external&#8221; technology, if I may call it that.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/sql-trick-overcoming-group_concat-limitation-in-special-cases/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Implicit casting you don&#8217;t want to see around</title>
		<link>http://code.openark.org/blog/mysql/implicit-casting-you-dont-want-to-see-around</link>
		<comments>http://code.openark.org/blog/mysql/implicit-casting-you-dont-want-to-see-around#comments</comments>
		<pubDate>Wed, 07 Jul 2010 08:53:37 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Data Types]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2344</guid>
		<description><![CDATA[In Beware of implicit casting, I have outlined the dangers of implicit casting. Here&#8217;s a few more real-world examples I have tackled: Number-String comparisons Much like in programming languages, implicit casting is made to numbers when at least one of the arguments is a number. Thus: mysql&#62; SELECT 3 = '3.0'; +-----------+ &#124; 3 = [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://code.openark.org/blog/mysql/beware-of-implicit-casting">Beware of implicit casting</a>, I have outlined the dangers of implicit casting. Here are a few more real-world examples I have tackled:</p>
<h4>Number-String comparisons</h4>
<p>Much like in programming languages, implicit casting is made to numbers when at least one of the arguments is a number. Thus:</p>
<blockquote><pre class="brush: sql;">
mysql&gt; SELECT 3 = '3.0';
+-----------+
| 3 = '3.0' |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

mysql&gt; SELECT '3' = '3.0';
+-------------+
| '3' = '3.0' |
+-------------+
|           0 |
+-------------+
</pre>
</blockquote>
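<p>The second query is a pure string comparison. MySQL has no way to determine that a numeric comparison should be made.</p>
<h4>Direct DATE arithmetic</h4>
<p>The first query <em>seems</em> to work, but is completely incorrect. The second explains why: the <strong>DATE</strong> is implicitly cast to the number <strong>20100101</strong>, and plain integer arithmetic follows. The third is a total mess: the string is cast to the number <strong>2010</strong>, with a truncation warning.<span id="more-2344"></span></p>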
<blockquote><pre class="brush: sql;">
mysql&gt; SELECT DATE('2010-01-01')+3;
+----------------------+
| DATE('2010-01-01')+3 |
+----------------------+
|             20100104 |
+----------------------+
1 row in set (0.00 sec)

mysql&gt; SELECT DATE('2010-01-01')-3;
+----------------------+
| DATE('2010-01-01')-3 |
+----------------------+
|             20100098 |
+----------------------+
1 row in set (0.00 sec)

mysql&gt; SELECT '2010-01-01' - 3;
+------------------+
| '2010-01-01' - 3 |
+------------------+
|             2007 |
+------------------+
1 row in set, 1 warning (0.00 sec)
</pre>
</blockquote>
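<p>For comparison, here is a sketch of the same intent written with explicit temporal arithmetic. <strong>INTERVAL</strong> and <strong>DATE_SUB</strong> operate on the value as a date rather than as a number:</p>
<blockquote><pre class="brush: sql;">
SELECT DATE('2010-01-01') + INTERVAL 3 DAY;     -- 2010-01-04
SELECT DATE('2010-01-01') - INTERVAL 3 DAY;     -- 2009-12-29
SELECT DATE_SUB('2010-01-01', INTERVAL 3 DAY);  -- 2009-12-29
</pre>
</blockquote>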
<h4>Number-String comparisons, big integers</h4>
<p>Look at the following crazy comparisons:</p>
<blockquote><pre class="brush: sql;">
mysql&gt; SELECT 1234 = '1234';
+---------------+
| 1234 = '1234' |
+---------------+
|             1 |
+---------------+

mysql&gt; SELECT 123456789012345678 = '123456789012345678';
+-------------------------------------------+
| 123456789012345678 = '123456789012345678' |
+-------------------------------------------+
|                                         0 |
+-------------------------------------------+

mysql&gt; SELECT 123456789012345678 = '123456789012345677';
+-------------------------------------------+
| 123456789012345678 = '123456789012345677' |
+-------------------------------------------+
|                                         1 |
+-------------------------------------------+
</pre>
</blockquote>
<p>The amazing result of the last two comparisons may strike you as odd. Actually, it may strike you as a bug, and indeed when a customer approached me with this behavior I was at a loss for words. But this is <a href="http://dev.mysql.com/doc/refman/5.0/en/type-conversion.html">documented</a>. The manual describes the cases for casting, then states: &#8220;&#8230; In all other cases, the arguments are compared <em>as floating-point (real) numbers</em>. &#8230;&#8221;</p>
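<p>One way around this, sketched here under the assumption that the string is known to hold a non-negative integer, is to cast explicitly, so that both sides are compared as integers rather than as floating-point numbers:</p>
<blockquote><pre class="brush: sql;">
-- Explicit integer cast: no floating-point conversion is involved
SELECT 123456789012345678 = CAST('123456789012345678' AS UNSIGNED);
SELECT 123456789012345678 = CAST('123456789012345677' AS UNSIGNED);
</pre>
</blockquote>
<p>With the explicit cast I would expect the first comparison to return <strong>1</strong> and the second <strong>0</strong>.</p>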
<h4>Lessons learned:</h4>
<ul>
<li>Be careful when comparing strings with floating point values. Matching depends on how both are represented.</li>
<li>Avoid converting temporal types to strings when doing date manipulation.</li>
<li>Avoid direct math on temporal types.</li>
<li>Avoid comparing <strong>BIGINT</strong>s with numbers represented by strings. The implicit cast goes through floating-point numbers and may be inexact.</li>
</ul>
<p>Last but not least:</p>
<ul>
<li>Use the proper data types for your data&#8217;s representation. When dealing with numbers, use numbers. When dealing with temporal values, use temporal types.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/implicit-casting-you-dont-want-to-see-around/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>SQL: good comments conventions</title>
		<link>http://code.openark.org/blog/mysql/sql-good-comments-conventions</link>
		<comments>http://code.openark.org/blog/mysql/sql-good-comments-conventions#comments</comments>
		<pubDate>Thu, 01 Jul 2010 07:36:32 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Coding]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2581</guid>
		<description><![CDATA[I happened upon a customer who left me in awe and admiration. The reason: excellent comments for their SQL code. I list four major places where SQL comments are helpful. I&#8217;ll use the sakila database. It is originally scarcely commented; I&#8217;ll present it now enhanced with comments, to illustrate. Table definitions The CREATE TABLE statement [...]]]></description>
			<content:encoded><![CDATA[<p>I happened upon a customer who left me in awe and admiration. The reason: excellent comments for their SQL code.</p>
<p>I list four major places where SQL comments are helpful. I&#8217;ll use the <a href="http://dev.mysql.com/doc/sakila/en/sakila.html">sakila</a> database. The original schema is scarcely commented; I present it here enhanced with comments, to illustrate.</p>
<h4>Table definitions</h4>
<p>The <strong>CREATE TABLE</strong> statement allows for a comment, intended to describe the nature of the table:</p>
<blockquote>
<pre>CREATE TABLE `film_text` (
 `film_id` smallint(6) NOT NULL,
 `title` varchar(255) NOT NULL,
 `description` text,
 PRIMARY KEY (`film_id`),
 FULLTEXT KEY `idx_title_description` (`title`,`description`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 <strong>COMMENT='Reflection of `film`, used for FULLTEXT search.'</strong>
</pre>
</blockquote>
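<p>It&#8217;s too bad the comment&#8217;s maximum length is 60 characters, though (this holds for the MySQL versions current at the time of writing; later releases raise the limit). Even so, it&#8217;s a very powerful field.</p>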
<h4>Column definitions</h4>
<p>One may comment particular columns:<span id="more-2581"></span></p>
<blockquote>
<pre>CREATE TABLE `film` (
 `film_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
 `title` varchar(255) NOT NULL,
 `description` text,
 `release_year` year(4) DEFAULT NULL,
 `language_id` tinyint(3) unsigned NOT NULL <strong>COMMENT 'Soundtrack spoken language'</strong>,
 `original_language_id` tinyint(3) unsigned DEFAULT NULL <strong>COMMENT 'Filmed spoken language'</strong>,
 `rental_duration` tinyint(3) unsigned NOT NULL DEFAULT '3',
 `rental_rate` decimal(4,2) NOT NULL DEFAULT '4.99',
 `length` smallint(5) unsigned DEFAULT NULL,
 `replacement_cost` decimal(5,2) NOT NULL DEFAULT '19.99',
  ...
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8
</pre>
</blockquote>
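<p>Once in place, the comments can be read back without digging through <strong>SHOW CREATE TABLE</strong> output. A minimal sketch, assuming the schema is named <strong>sakila</strong>, using the standard <strong>INFORMATION_SCHEMA</strong> tables:</p>
<blockquote>
<pre>-- Table-level comments
SELECT TABLE_NAME, TABLE_COMMENT
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'sakila';

-- Column-level comments, skipping columns that have none
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_COMMENT
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'sakila'
  AND COLUMN_COMMENT != '';
</pre>
</blockquote>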
<h4>Stored routines definitions</h4>
<p>Here&#8217;s an original <strong>sakila</strong> procedure, untouched. It is already commented:</p>
<blockquote>
<pre>CREATE DEFINER=`root`@`localhost` PROCEDURE `rewards_report`(
 IN min_monthly_purchases TINYINT UNSIGNED
 , IN min_dollar_amount_purchased DECIMAL(10,2) UNSIGNED
 , OUT count_rewardees INT
)
 READS SQL DATA
 <strong>COMMENT 'Provides a customizable report on best customers'</strong>
BEGIN

 DECLARE last_month_start DATE;
 DECLARE last_month_end DATE;
 ...
</pre>
</blockquote>
<h4>SQL queries</h4>
<p>Last but not least, while not part of the schema, SQL queries define the use of the schema. That is, the schema exists for the sole reason of being able to query it.</p>
<p>Where did <em>that</em> query come from? Which piece of code issued it? Why? What&#8217;s its purpose?</p>
<p>Reading the <strong>PROCESSLIST</strong>, the slow log, etc. is much easier when the queries are commented:</p>
<blockquote>
<pre>SELECT
 <strong>/* List film details along with participating actors */</strong>
 <strong>/* Issued by analytics module */</strong>
 film.*,
 COUNT(*) AS count_actors,
 GROUP_CONCAT(CONCAT(actor.first_name, ' ', actor.last_name))
FROM
 film
 JOIN film_actor USING(film_id)
 JOIN actor USING(actor_id)
GROUP BY film.film_id;
</pre>
</blockquote>
<h4>Conclusion</h4>
<p>Commenting source code is an important practice, and one that is usually enforced. Comments on SQL queries and table definitions, by contrast, are often scarce or non-existent. I urge DBAs to adopt a commenting convention for SQL, and apply it whenever they can.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/sql-good-comments-conventions/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>SQL: forcing single row tables integrity</title>
		<link>http://code.openark.org/blog/mysql/sql-forcing-single-row-tables-integrity</link>
		<comments>http://code.openark.org/blog/mysql/sql-forcing-single-row-tables-integrity#comments</comments>
		<pubDate>Tue, 22 Jun 2010 04:58:51 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2523</guid>
		<description><![CDATA[Single row tables are used in various cases. Such tables can be used for &#8220;preferences&#8221; or &#8220;settings&#8221;; for managing counters (e.g. summary tables), for general-purpose administration tasks (e.g. heartbeat table) etc. The problem with single row tables is that, well, they must have a single row. And the question is: how can you force them [...]]]></description>
			<content:encoded><![CDATA[<p>Single row tables are used in various cases. Such tables can be used for &#8220;preferences&#8221; or &#8220;settings&#8221;; for managing counters (e.g. summary tables), for general-purpose administration tasks (e.g. heartbeat table) etc.</p>
<p>The problem with single row tables is that, well, they must have a single row. And the question is: <em>how can you force them to have just one row?</em></p>
<h4>The half-baked solution</h4>
<p>The common solution is to create a <strong>PRIMARY KEY</strong> and always use the same value for that key. In addition, using <strong>REPLACE</strong> or <strong>INSERT &#8230; ON DUPLICATE KEY UPDATE</strong> helps out in updating the row. For example:</p>
<blockquote><pre class="brush: sql;">
CREATE TABLE heartbeat (
 id int NOT NULL PRIMARY KEY,
 ts datetime NOT NULL
 );
</pre>
</blockquote>
<p>The above table definition is taken from <a href="http://www.maatkit.org/doc/mk-heartbeat.html">mk-heartbeat</a>. It should be noted that <em>mk-heartbeat</em> in itself does not require that the table has a single row, so it is not the target of this post. I&#8217;m taking the above table definition as a very simple example.</p>
<p>So, we assume we want this table to have a single row, for whatever reasons we have. We would usually do:</p>
<blockquote><pre class="brush: sql;">
REPLACE INTO heartbeat (id, ts) VALUES (1, NOW());
</pre>
</blockquote>
<p>or</p>
<blockquote><pre class="brush: sql;">
INSERT INTO heartbeat (id, ts) VALUES (1, NOW()) ON DUPLICATE KEY UPDATE ts = NOW();
</pre>
</blockquote>
<p>Why is the above a <em>&#8220;half baked solution&#8221;</em>? Because it is up to the application to make sure it reuses the same <strong>PRIMARY KEY</strong> value. There is nothing in the database to prevent the following:<span id="more-2523"></span></p>
<blockquote><pre class="brush: sql;">
REPLACE INTO heartbeat (id, ts) VALUES (73, NOW()); -- Ooops
</pre>
</blockquote>
<p>One may claim that <em>&#8220;my application has good integrity&#8221;</em>. That may be the case; but I would then raise the question: <em>why, then, would you need <strong>FOREIGN KEY</strong>s</em>? Of course, many people don&#8217;t use <strong>FOREIGN KEY</strong>s, but I think the message is clear.</p>
<h4>A heavyweight solution</h4>
<p>Triggers <a href="http://code.openark.org/blog/mysql/triggers-use-case-compilation-part-i">can help out</a>. But really, this is overkill.</p>
<h4>A solution</h4>
<p>I propose a solution where, much like <strong>FOREIGN KEY</strong>s, the database will enforce the integrity of the table; namely, have it contain <em>at most one row</em>.</p>
<p>For this solution to work, we will need a strict <strong>sql_mode</strong>. I&#8217;ll show later what happens when using a relaxed <strong>sql_mode</strong>:</p>
<blockquote><pre class="brush: sql;">
SET sql_mode='STRICT_ALL_TABLES'; -- Session scope for the purpose of this article
</pre>
</blockquote>
<p>Here&#8217;s a new table definition:</p>
<blockquote><pre class="brush: sql;">
CREATE TABLE heartbeat (
 integrity_keeper ENUM('') NOT NULL PRIMARY KEY,
 ts datetime NOT NULL
);
</pre>
</blockquote>
<p>Let&#8217;s see what happens now:</p>
<blockquote><pre class="brush: sql;">
mysql&gt; INSERT INTO heartbeat (ts) VALUES (NOW());
Query OK, 1 row affected (0.00 sec)

mysql&gt; INSERT INTO heartbeat (ts) VALUES (NOW());
ERROR 1062 (23000): Duplicate entry '' for key 'PRIMARY'
mysql&gt; INSERT INTO heartbeat (integrity_keeper, ts) VALUES ('', NOW());
ERROR 1062 (23000): Duplicate entry '' for key 'PRIMARY'
mysql&gt; INSERT INTO heartbeat (integrity_keeper, ts) VALUES (0, NOW());
ERROR 1265 (01000): Data truncated for column 'integrity_keeper' at row 1
mysql&gt; INSERT INTO heartbeat (integrity_keeper, ts) VALUES (1, NOW());
ERROR 1062 (23000): Duplicate entry '' for key 'PRIMARY'

mysql&gt; REPLACE INTO heartbeat (ts) VALUES (NOW());
Query OK, 2 rows affected (0.00 sec)

mysql&gt; INSERT INTO heartbeat (ts) VALUES (NOW()) ON DUPLICATE KEY UPDATE ts = NOW();
Query OK, 0 rows affected (0.00 sec)

mysql&gt; SELECT * FROM heartbeat;
+------------------+---------------------+
| integrity_keeper | ts                  |
+------------------+---------------------+
|                  | 2010-06-15 09:12:19 |
+------------------+---------------------+
</pre>
</blockquote>
<p>So the trick is to create a <strong>PRIMARY KEY</strong> column which is only allowed a single value.</p>
<p>The above shows I cannot force another row into the table: the schema will prevent me from doing so. Mission accomplished.</p>
<h4>Further thoughts</h4>
<p>The <strong>CHECK</strong> keyword is the real solution to this problem (and other problems). However, at the time of writing it is parsed but ignored by MySQL.</p>
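<p>For readers on a server that does enforce <strong>CHECK</strong> constraints (to my knowledge, MySQL 8.0.16 and later), here is a sketch of what the constraint-based version could look like. The constraint name <strong>heartbeat_single_row</strong> is a hypothetical name chosen for this example:</p>
<blockquote><pre class="brush: sql;">
CREATE TABLE heartbeat (
 id tinyint NOT NULL PRIMARY KEY,
 ts datetime NOT NULL,
 -- only the value 1 is permitted, and the PRIMARY KEY permits it once
 CONSTRAINT heartbeat_single_row CHECK (id = 1)
);

INSERT INTO heartbeat (id, ts) VALUES (1, NOW());   -- accepted
INSERT INTO heartbeat (id, ts) VALUES (73, NOW());  -- expected to be rejected by the CHECK constraint
</pre>
</blockquote>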
<p>It is interesting to note that with a relaxed <strong>sql_mode</strong>, the <strong>INSERT INTO heartbeat (integrity_keeper, ts) VALUES (0, NOW());</strong> query succeeds. Why? The only valid <strong>ENUM</strong> value has index <strong>1</strong>; index <strong>0</strong> is reserved for the special &#8220;invalid value&#8221; empty string, and, being in relaxed mode, it is allowed in even though it is not a valid value (Argh!).</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/sql-forcing-single-row-tables-integrity/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Verifying GROUP_CONCAT limit without using variables</title>
		<link>http://code.openark.org/blog/mysql/verifying-group_concat-limit-without-using-variables</link>
		<comments>http://code.openark.org/blog/mysql/verifying-group_concat-limit-without-using-variables#comments</comments>
		<pubDate>Thu, 10 Jun 2010 07:16:14 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[INFORMATION_SCHEMA]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2534</guid>
		<description><![CDATA[I have a case where I must know if group_concat_max_len is at its default value (1024), which means there are some operations I cannot work out. I&#8217;ve ranted on this here. Normally, I would simply: SELECT @@group_concat_max_len However, I am using views, where session variables are not allowed. Using a stored function can do the [...]]]></description>
			<content:encoded><![CDATA[<p>I have a case where I must know if <strong>group_concat_max_len</strong> is at its default value (<strong>1024</strong>), which means there are some operations I cannot work out. I&#8217;ve ranted on this <a href="http://code.openark.org/blog/mysql/those-oversized-undersized-variables-defaults">here</a>.</p>
<p>Normally, I would simply:</p>
<blockquote><pre class="brush: sql;">
SELECT @@group_concat_max_len
</pre>
</blockquote>
<p>However, I am using views, where session variables are not allowed. Using a stored function can <a href="http://code.openark.org/blog/mysql/views-better-performance-with-condition-pushdown">do the trick</a>, but I wanted to avoid stored routines. So here&#8217;s a very simple test case: is the current <strong>group_concat_max_len</strong> long enough or not? I&#8217;ll present the long version and the short version.</p>
<h4>The long version</h4>
<blockquote><pre class="brush: sql;">
SELECT
  CHAR_LENGTH(
    GROUP_CONCAT(
      COLLATION_NAME SEPARATOR ''
    )
  )
FROM
  INFORMATION_SCHEMA.COLLATIONS;
</pre>
</blockquote>
<p>If the result is <strong>1024</strong>, we are in bad shape. I happen to know that the total length of collation names is above <strong>1800</strong>, and so it is trimmed down. Another variant of the above query would be:<span id="more-2534"></span></p>
<blockquote><pre class="brush: sql;">
SELECT
  CHAR_LENGTH(
    GROUP_CONCAT(
      COLLATION_NAME SEPARATOR ''
    )
  ) = SUM(CHAR_LENGTH(COLLATION_NAME))
    AS group_concat_max_len_is_long_enough
FROM
  INFORMATION_SCHEMA.COLLATIONS;

+-------------------------------------+
| group_concat_max_len_is_long_enough |
+-------------------------------------+
|                                   0 |
+-------------------------------------+
</pre>
</blockquote>
<p>The <strong>COLLATIONS</strong>, <strong>CHARACTER_SETS</strong> or <strong>COLLATION_CHARACTER_SET_APPLICABILITY</strong> tables provide values that are known to exist (assuming you did not compile MySQL with a particular set of charsets). It&#8217;s possible to <strong>CONCAT</strong>, <strong>UNION</strong> or <strong>JOIN</strong> columns and tables to test for <strong>group_concat_max_len</strong> values greater than <strong>1800</strong>. I admit this is becoming ugly, so let&#8217;s move on.</p>
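<p>A rough sketch of that idea, for illustration only: concatenating two columns of <strong>COLLATIONS</strong> makes every row longer, which raises the total length the test can probe. The column alias is reused from the query above; how far this reaches depends on the collations your server was built with.</p>
<blockquote><pre class="brush: sql;">
-- Each row contributes the collation name plus its character set name,
-- so the untruncated total is longer than with COLLATION_NAME alone.
SELECT
  CHAR_LENGTH(
    GROUP_CONCAT(
      CONCAT(COLLATION_NAME, CHARACTER_SET_NAME) SEPARATOR ''
    )
  ) = SUM(CHAR_LENGTH(CONCAT(COLLATION_NAME, CHARACTER_SET_NAME)))
    AS group_concat_max_len_is_long_enough
FROM
  INFORMATION_SCHEMA.COLLATIONS;
</pre>
</blockquote>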
<h4>The short version</h4>
<p>Don&#8217;t want to rely on existing tables? Not sure what values to expect? Look at this:</p>
<blockquote><pre class="brush: sql;">
SELECT CHAR_LENGTH(GROUP_CONCAT(REPEAT('0', 1025))) FROM DUAL
</pre>
</blockquote>
<p><strong>GROUP_CONCAT</strong> doesn&#8217;t really care about the number of rows. In the above example, I&#8217;m using a single row (retrieved from the <strong>DUAL</strong> virtual table), making sure it is long enough. Type in any number in place of <strong>1025</strong>, and you have a metric for your <strong>group_concat_max_len</strong>.</p>
<blockquote><pre class="brush: sql;">
SELECT
  CHAR_LENGTH(GROUP_CONCAT(REPEAT('0', 32768))) &gt;= 32768 AS group_concat_max_len_is_long_enough
FROM
  DUAL;
+-------------------------------------+
| group_concat_max_len_is_long_enough |
+-------------------------------------+
|                                   0 |
+-------------------------------------+
</pre>
</blockquote>
<p>The above makes a computation with <strong>REPEAT</strong>. One can replace this with a big constant.</p>
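<p>Since the whole point was to run the test from within a view, here is a minimal sketch of how the short version might be wrapped in one. The view name and column alias are made up for this example; the threshold of <strong>1024</strong> is the default value discussed above.</p>
<blockquote><pre class="brush: sql;">
-- Hypothetical view: no session variables and no stored routines involved.
CREATE OR REPLACE VIEW group_concat_max_len_check AS
  SELECT
    CHAR_LENGTH(GROUP_CONCAT(REPEAT('0', 1025))) &gt; 1024
      AS group_concat_max_len_above_default
  FROM
    DUAL;

-- Returns 1 when the current limit allows more than 1024 characters, 0 otherwise.
SELECT group_concat_max_len_above_default FROM group_concat_max_len_check;
</pre>
</blockquote>
<p>Other views can then join against this one to decide whether their own <strong>GROUP_CONCAT</strong> results can be trusted.</p>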
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/verifying-group_concat-limit-without-using-variables/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Choosing MySQL boolean data types</title>
		<link>http://code.openark.org/blog/mysql/choosing-mysql-boolean-data-types</link>
		<comments>http://code.openark.org/blog/mysql/choosing-mysql-boolean-data-types#comments</comments>
		<pubDate>Thu, 03 Jun 2010 05:24:11 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Data Types]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2181</guid>
		<description><![CDATA[How do you implement True/False columns? There are many ways to do it, each with its own pros and cons. ENUM Create your column as ENUM(&#8216;F&#8217;, &#8216;T&#8217;), or ENUM(&#8216;N&#8217;,&#8216;Y&#8217;) or ENUM(&#8216;0&#8217;, &#8216;1&#8217;). This is the method used in the mysql tables (e.g. mysql.user privileges table). It&#8217;s very simple and intuitive. It truly restricts the values [...]]]></description>
			<content:encoded><![CDATA[<p>How do you implement <strong>True</strong>/<strong>False</strong> columns?</p>
<p>There are many ways to do it, each with its own pros and cons.</p>
<h4>ENUM</h4>
<p>Create your column as <strong>ENUM(&#8216;F&#8217;, &#8216;T&#8217;)</strong>, or <strong>ENUM(&#8216;N&#8217;,&#8216;Y&#8217;)</strong> or <strong>ENUM(&#8216;0&#8217;, &#8216;1&#8217;)</strong>.</p>
<p>This is the method used in the <strong>mysql</strong> tables (e.g. <strong>mysql.user</strong> privileges table). It&#8217;s very simple and intuitive. It truly restricts the values to just two options, which serves well. It&#8217;s compact (just one byte).</p>
<p>A couple of disadvantages to this method:</p>
<ol>
<li>Enums are represented by numerical values (which is good) and start with <strong>1</strong> instead of <strong>0</strong>. This means <strong>&#8216;F&#8217;</strong> is <strong>1</strong>, and <strong>&#8216;T&#8217;</strong> is <strong>2</strong>, and they both translate to <strong>True</strong> when directly used in a boolean expression (e.g. <strong>IF(val, &#8216;True&#8217;, &#8216;False&#8217;)</strong> always yields <strong>&#8216;True&#8217;</strong>).</li>
<li>There&#8217;s no real convention. Is it <strong>&#8216;Y&#8217;/&#8216;N&#8217;</strong>? <strong>&#8216;T&#8217;/&#8216;F&#8217;</strong>? <strong>&#8216;P&#8217;/&#8216;N&#8217;</strong>? <strong>&#8216;1&#8217;/&#8216;0&#8217;</strong>?</li>
</ol>
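<p>The first point can be checked with a small throwaway table. This is only a sketch: the table name <strong>enum_bool_demo</strong> is made up for the example, and <strong>val + 0</strong> is used to expose the numeric index behind each value.</p>
<blockquote><pre class="brush: sql;">
-- Hypothetical demo table with a two-valued ENUM.
CREATE TABLE enum_bool_demo (
  val ENUM('F', 'T') NOT NULL
);
INSERT INTO enum_bool_demo VALUES ('F'), ('T');

-- internal_index shows the number MySQL stores for each ENUM value;
-- as_boolean uses the column directly as a condition,
-- is_true compares explicitly against the intended "true" value.
SELECT
  val,
  val + 0 AS internal_index,
  IF(val, 'True', 'False') AS as_boolean,
  val = 'T' AS is_true
FROM
  enum_bool_demo;
</pre>
</blockquote>
<p>An explicit comparison such as <strong>val = &#8216;T&#8217;</strong> sidesteps the problem, at the price of repeating the convention in every query.</p>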
<h4>CHAR(1)</h4>
<p>Simple again. Proposed values are, as before, <strong>&#8216;F&#8217;</strong>, <strong>&#8216;T&#8217;</strong> etc. This time there&#8217;s no way to limit the range of values. You cannot (in MySQL, unless using triggers) prevent an &#8216;X&#8217;.</p>
<p>Watch out for the charset! If it&#8217;s <strong>utf8</strong> you pay with 3 bytes instead of just 1. And, again, <strong>&#8216;T&#8217;</strong>, <strong>&#8216;F&#8217;</strong>, <strong>&#8216;Y&#8217;</strong>, <strong>&#8216;N&#8217;</strong> values cannot be told apart in a boolean expression: as non-numeric strings they all convert to <strong>0</strong>, which evaluates as <strong>False</strong>. It is possible to store <strong>&#8216;0&#8217;</strong> and <strong>&#8216;1&#8217;</strong> instead, which do convert as expected, but it defeats the purpose of using <strong>CHAR</strong>.<span id="more-2181"></span></p>
<h4>CHAR(0)</h4>
<p>Many are unaware that it&#8217;s even valid to make this definition. What does it mean? Take a look at the following table:</p>
<blockquote>
<pre>CREATE TABLE `t1` (
 `bval` char(0) DEFAULT NULL
);
mysql&gt; INSERT INTO t1 VALUES ('');
mysql&gt; INSERT INTO t1 VALUES ('');
mysql&gt; INSERT INTO t1 VALUES (NULL);

mysql&gt; SELECT * FROM t1;
+------+
| bval |
+------+
|      |
|      |
| NULL |
+------+
</pre>
</blockquote>
<p>NULLable columns require additional storage per row. There&#8217;s one bit per NULLable column which notes down whether the column&#8217;s value is NULL or not. If you only have one NULLable column, you must pay for this bit with 1 byte. If you have two NULLable columns, you still only pay with 1 byte.</p>
<p>Furthermore:</p>
<blockquote>
<pre>mysql&gt; SELECT bval IS NOT NULL FROM t1;
+------------------+
| bval IS NOT NULL |
+------------------+
|                1 |
|                1 |
|                0 |
+------------------+
</pre>
</blockquote>
<p>So this plays somewhat nicely into boolean expressions.</p>
<p>However, this method is unintuitive and confusing. I personally don&#8217;t use it.</p>
<h4>TINYINT</h4>
<p>With integer values, we can get down to <strong>0</strong> and <strong>1</strong>. With <strong>TINYINT</strong>, we only pay with 1 byte of storage. As with <strong>CHAR(1)</strong>, we cannot prevent anyone from INSERTing other values. But that doesn&#8217;t really matter, if we&#8217;re willing to accept that 0 evaluates as <strong>False</strong>, and <em>all other values</em> as <strong>True</strong>. In this case, boolean expressions work very well with your column values.</p>
<h4>BOOL/BOOLEAN</h4>
<p>These are just synonyms for <strong>TINYINT(1)</strong>. I like to define my boolean values as such. Alas, when issuing a <strong>SHOW CREATE TABLE</strong> the definition is just a normal <strong>TINYINT</strong>. Still, it is clearer to look at if you&#8217;re storing your table schema under your version control.</p>
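<p>A short sketch of the above, using a made-up table <strong>bool_demo</strong>: the column is declared as <strong>BOOLEAN</strong>, accepts any small integer, and can be used directly as a condition.</p>
<blockquote><pre class="brush: sql;">
-- Hypothetical demo table; BOOLEAN is stored as a TINYINT.
CREATE TABLE bool_demo (
  id INT UNSIGNED NOT NULL,
  is_active BOOLEAN NOT NULL DEFAULT FALSE
);

-- TRUE and FALSE are just the constants 1 and 0; nothing prevents the 7.
INSERT INTO bool_demo VALUES (1, TRUE), (2, FALSE), (3, 7);

-- 0 evaluates as False, all other values as True.
SELECT id, is_active, IF(is_active, 'True', 'False') AS as_boolean
FROM bool_demo;

-- The stored definition shows the underlying integer type, not BOOLEAN.
SHOW CREATE TABLE bool_demo;
</pre>
</blockquote>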
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/choosing-mysql-boolean-data-types/feed</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Views: better performance with condition pushdown</title>
		<link>http://code.openark.org/blog/mysql/views-better-performance-with-condition-pushdown</link>
		<comments>http://code.openark.org/blog/mysql/views-better-performance-with-condition-pushdown#comments</comments>
		<pubDate>Thu, 20 May 2010 05:17:05 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Execution plan]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Stored routines]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1328</guid>
		<description><![CDATA[Justin&#8217;s A workaround for the performance problems of TEMPTABLE views post on mysqlperformanceblog.com reminded me of a solution I once saw on a customer&#8217;s site. The customer was using a nested view structure, up to a depth of some 8-9 views. There were a lot of aggregations along the way, and even the simplest query resulted in [...]]]></description>
			<content:encoded><![CDATA[<p>Justin&#8217;s <a href="http://www.mysqlperformanceblog.com/2010/05/19/a-workaround-for-the-performance-problems-of-temptable-views/">A workaround for the performance problems of TEMPTABLE views</a> post on <a href="http://www.mysqlperformanceblog.com/">mysqlperformanceblog.com</a> reminded me of a solution I once saw on a customer&#8217;s site.</p>
<p>The customer was using a nested view structure, up to a depth of some 8-9 views. There were a lot of aggregations along the way, and even the simplest query resulted in a LOT of subqueries, temporary tables, and vast amounts of data, even if only to return a couple of rows.</p>
<p>While we worked to solve this, a developer showed me his own trick. His trick is now impossible to implement, but there&#8217;s a hack around this.</p>
<p>Let&#8217;s use the world database to illustrate. Look at the following view definition:<span id="more-1328"></span></p>
<blockquote><pre class="brush: sql;">
CREATE
  ALGORITHM=TEMPTABLE
VIEW country_languages AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON (Country.CODE = CountryLanguage.CountryCode)
  GROUP BY
    Country.CODE;
</pre>
</blockquote>
<p>The view presents a list of spoken languages per country. The execution plan for querying this view looks like this:</p>
<blockquote>
<pre>mysql&gt; EXPLAIN SELECT * FROM country_languages;
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref                               | rows | Extra                                        |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
|  1 | PRIMARY     | &lt;derived2&gt;      | ALL    | NULL          | NULL    | NULL    | NULL                              |  233 |                                              |
|  2 | DERIVED     | CountryLanguage | index  | PRIMARY       | PRIMARY | 33      | NULL                              |  984 | Using index; Using temporary; Using filesort |
|  2 | DERIVED     | Country         | eq_ref | PRIMARY       | PRIMARY | 3       | world.CountryLanguage.CountryCode |    1 |                                              |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
</pre>
</blockquote>
<p>And, even if we only want the row for a single country, we still get the same plan:</p>
<blockquote>
<pre>mysql&gt; EXPLAIN SELECT * FROM country_languages WHERE Code='USA';
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref                               | rows | Extra                                        |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
|  1 | PRIMARY     | &lt;derived2&gt;      | ALL    | NULL          | NULL    | NULL    | NULL                              |  233 | Using where                                  |
|  2 | DERIVED     | CountryLanguage | index  | PRIMARY       | PRIMARY | 33      | NULL                              |  984 | Using index; Using temporary; Using filesort |
|  2 | DERIVED     | Country         | eq_ref | PRIMARY       | PRIMARY | 3       | world.CountryLanguage.CountryCode |    1 |                                              |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
</pre>
</blockquote>
<p>So, we need to scan the entire <strong>CountryLanguage</strong> and <strong>Country</strong> tables in order to return results for just one row.</p>
<h4>A non-working solution</h4>
<p>The solution offered by the developer was this:</p>
<blockquote><pre class="brush: sql;">
CREATE
  ALGORITHM=MERGE
  VIEW country_languages_non_working AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON
      (Country.CODE = CountryLanguage.CountryCode)
  WHERE
    Country.CODE = @country_code
  GROUP BY Country.CODE;
</pre>
</blockquote>
<p>Followed by:</p>
<blockquote>
<pre>mysql&gt; SET @country_code='USA';
Query OK, 0 rows affected (0.00 sec)

mysql&gt; SELECT * FROM country_languages_non_working;
+------+---------------+----------------------------------------------------------------------------------------------------+
| CODE | country       | languages                                                                                          |
+------+---------------+----------------------------------------------------------------------------------------------------+
| USA  | United States | Chinese,English,French,German,Italian,Japanese,Korean,Polish,Portuguese,Spanish,Tagalog,Vietnamese |
+------+---------------+----------------------------------------------------------------------------------------------------+
</pre>
</blockquote>
<p>So, push down a <strong>WHERE</strong> condition into the view&#8217;s definition. The session variable <strong>@country_code</strong> is used to filter rows. In the above simplified code the value is assumed to be set; tweak it as you see fit (using <strong>IFNULL</strong>, for example, or <strong>OR</strong> conditions) to allow for a full scan in case the variable is undefined.</p>
<p>This doesn&#8217;t work. It used to work a couple years back; but today you cannot create a view which uses session variables or parameters. It is a restriction imposed by views.</p>
<h4>A workaround</h4>
<p>Justin showed a workaround using an additional table. There is another workaround which does not involve tables, but rather stored routines. Now, this is a patch, and an ugly one. It may not work in future versions of MySQL for all I know. But, here it goes:</p>
<blockquote><pre class="brush: sql;">
DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `get_session_country`() RETURNS CHAR(3)
    NO SQL
    DETERMINISTIC
BEGIN
  RETURN @country_code;
END $$
DELIMITER ;

CREATE
  ALGORITHM=MERGE
  VIEW country_languages_2 AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON
      (Country.CODE = CountryLanguage.CountryCode)
  WHERE
    Country.CODE = get_session_country()
  GROUP BY Country.CODE;
</pre>
</blockquote>
<p>And now:</p>
<blockquote>
<pre>mysql&gt; SET @country_code='USA';
Query OK, 0 rows affected (0.00 sec)

mysql&gt; SELECT * FROM country_languages_2;
+------+---------------+----------------------------------------------------------------------------------------------------+
| CODE | country       | languages                                                                                          |
+------+---------------+----------------------------------------------------------------------------------------------------+
| USA  | United States | Chinese,English,French,German,Italian,Japanese,Korean,Polish,Portuguese,Spanish,Tagalog,Vietnamese |
+------+---------------+----------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql&gt; EXPLAIN SELECT * FROM country_languages_2;
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
|  1 | PRIMARY     | &lt;derived2&gt;      | system | NULL          | NULL    | NULL    | NULL |    1 |                          |
|  2 | DERIVED     | Country         | const  | PRIMARY       | PRIMARY | 3       |      |    1 |                          |
|  2 | DERIVED     | CountryLanguage | ref    | PRIMARY       | PRIMARY | 3       |      |    8 | Using where; Using index |
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
</pre>
</blockquote>
<p>Since views are allowed to call stored routines (Justin used this to call upon <strong>CONNECTION_ID()</strong>), and since stored routines can use session variables, we can take advantage and force the view into filtering out irrelevant rows before these accumulate into temporary tables and big joins.</p>
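<p>The earlier remark about allowing a full scan when the variable is undefined can be sketched on top of the same function. The view name <strong>country_languages_optional</strong> is made up for this example, and I have not measured whether the optimizer keeps the fast plan with the extra <strong>OR</strong>, so check with <strong>EXPLAIN</strong> before relying on it.</p>
<blockquote><pre class="brush: sql;">
-- Hypothetical variant: when @country_code is NULL (not set),
-- the first condition is true for every row and nothing is filtered.
CREATE OR REPLACE
  VIEW country_languages_optional AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON
      (Country.CODE = CountryLanguage.CountryCode)
  WHERE
    get_session_country() IS NULL
    OR Country.CODE = get_session_country()
  GROUP BY Country.CODE;

-- All countries:
SET @country_code := NULL;
SELECT * FROM country_languages_optional;

-- A single country:
SET @country_code := 'USA';
SELECT * FROM country_languages_optional;
</pre>
</blockquote>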
<p>Back in the customer&#8217;s office, we witnessed, what with their real data and multiple views, a reduction of query times from ~30 minutes to a few seconds.</p>
<h4>Another kind of use</h4>
<p>Eventually we worked to make better view definitions and query splitting, resulting in clearer code and fast queries, but this solution plays nicely into another kind of problem:</p>
<p>Can we force different customers to see different parts of a given table? e.g., only those rows that relate to that particular customer?</p>
<p>There can be many solutions: different tables; multiple views (one per customer), stored procedures, what have you. The above provides a solution, and I&#8217;ve seen it in use.</p>
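<p>A minimal sketch of that use, under loud assumptions: the <strong>orders</strong> table, its columns, the function <strong>get_session_customer_id()</strong> and the <strong>@customer_id</strong> variable are all hypothetical names invented for this example.</p>
<blockquote><pre class="brush: sql;">
-- Hypothetical function exposing a session variable to views,
-- declared the same way as get_session_country() above.
DELIMITER $$
CREATE FUNCTION get_session_customer_id() RETURNS INT UNSIGNED
    NO SQL
    DETERMINISTIC
BEGIN
  RETURN @customer_id;
END $$
DELIMITER ;

-- Hypothetical table `orders` with a customer_id column.
CREATE VIEW customer_orders AS
  SELECT
    order_id, customer_id, order_date
  FROM
    orders
  WHERE
    customer_id = get_session_customer_id();

-- The application sets the variable once per connection:
SET @customer_id := 17;
SELECT * FROM customer_orders;
</pre>
</blockquote>
<p>Note that any session able to run <strong>SET</strong> can change the variable, so this filters rows for convenience rather than enforcing access control by itself.</p>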
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/views-better-performance-with-condition-pushdown/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Discovery of the day: GROUP BY &#8230; DESC</title>
		<link>http://code.openark.org/blog/mysql/discovery-of-the-day-group-by-desc</link>
		<comments>http://code.openark.org/blog/mysql/discovery-of-the-day-group-by-desc#comments</comments>
		<pubDate>Tue, 04 May 2010 09:38:38 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2381</guid>
		<description><![CDATA[I happened on a query where, by mistake, a SELECT ... ORDER BY x DESC LIMIT 1 was written as SELECT ... GROUP BY x DESC LIMIT 1 And it took me by surprise to realize GROUP BY x DESC is a valid statement. I looked it up: yep! It&#8217;s documented. In MySQL, GROUP BY [...]]]></description>
			<content:encoded><![CDATA[<p>I happened on a query where, by mistake, a</p>
<pre class="brush: sql;">
SELECT ... ORDER BY x DESC LIMIT 1
</pre>
<p>was written as</p>
<pre class="brush: sql;">
SELECT ... GROUP BY x DESC LIMIT 1
</pre>
<p>And it took me by surprise to realize <strong>GROUP BY x DESC</strong> is a valid statement. I looked it up: yep! It&#8217;s <a href="http://dev.mysql.com/doc/refman/5.0/en/group-by-modifiers.html">documented</a>.</p>
<p>In MySQL, <strong>GROUP BY</strong> results are sorted according to the columns of the <strong>GROUP BY</strong> clause. You can override this by adding <strong>ORDER BY NULL</strong> (see <a href="http://code.openark.org/blog/mysql/less-known-sql-syntax-and-functions-in-mysql">past post</a>). I wasn&#8217;t aware you can actually control the sort order.</p>
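<p>For illustration, here is how the two forms might look against the <strong>world</strong> sample database (the choice of table and column is mine). This applies to the MySQL versions discussed here; the <strong>GROUP BY ... DESC</strong> syntax was removed in MySQL 8.0.</p>
<pre class="brush: sql;">
-- Groups are returned in descending Continent order, with no ORDER BY clause.
SELECT Continent, COUNT(*) AS countries
FROM world.Country
GROUP BY Continent DESC;

-- Suppress the implicit sorting altogether.
SELECT Continent, COUNT(*) AS countries
FROM world.Country
GROUP BY Continent
ORDER BY NULL;
</pre>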
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/discovery-of-the-day-group-by-desc/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Proper SQL table alias use conventions</title>
		<link>http://code.openark.org/blog/mysql/proper-sql-table-alias-use-conventions</link>
		<comments>http://code.openark.org/blog/mysql/proper-sql-table-alias-use-conventions#comments</comments>
		<pubDate>Thu, 11 Mar 2010 07:10:09 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Opinions]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2156</guid>
		<description><![CDATA[After seeing quite a few SQL statements over the years, something is bugging me: there is no consistent convention for how to write an SQL query. I&#8217;m going to leave formatting, upper/lower-case issues aside, and discuss a small part of the SQL syntax: table aliases. Looking at three different queries, I will describe what I [...]]]></description>
			<content:encoded><![CDATA[<p>After seeing quite a few SQL statements over the years, something is bugging me: there is no consistent convention for how to write an SQL query.</p>
<p>I&#8217;m going to leave formatting, upper/lower-case issues aside, and discuss a small part of the SQL syntax: table aliases. Looking at three different queries, I will describe what I find to be problematic table alias use.</p>
<p>Using the <a href="http://dev.mysql.com/doc/sakila/en/sakila.html">sakila</a> database, take a look at the following queries:<span id="more-2156"></span></p>
<h4>Query #1</h4>
<blockquote>
<pre><strong>SELECT</strong>
 R.rental_date, C.customer_id, C.first_name, C.last_name
<strong>FROM</strong>
 rental R
 <strong>JOIN</strong> customer C <strong>USING</strong> (customer_id)
<strong>WHERE</strong>
 R.rental_date &gt;= DATE('2005-10-01')
 <strong>AND</strong> C.store_id=1;
</pre>
</blockquote>
<p>The above looks for film rentals done in a specific store (store #<strong>1</strong>), as of Oct. 1st, 2005.</p>
<h4>Query #2</h4>
<blockquote>
<pre><strong>SELECT</strong>
 F.title, C.name
<strong>FROM</strong>
 film <strong>AS</strong> F
 <strong>JOIN</strong> film_category <strong>AS</strong> S <strong>ON</strong> (F.film_id = S.film_id)
 <strong>JOIN</strong> category <strong>AS</strong> C <strong>ON</strong> (S.category_id = C.category_id)
<strong>WHERE</strong> F.length &gt; 180;</pre>
</blockquote>
<p>The above lists the title and category for all films longer than three hours.</p>
<h4>Query #3</h4>
<blockquote>
<pre><strong>SELECT</strong> c.customer_id, c.last_name
<strong>FROM</strong>
  customer c
  <strong>INNER JOIN</strong> address a ON (c.address_id = a.address_id)
  <strong>INNER JOIN</strong> (
    <strong>SELECT</strong>
      c.city_id
    <strong>FROM</strong>
      city AS c
      <strong>JOIN</strong> country s <strong>ON</strong> (c.country_id = s.country_id)
    <strong>WHERE</strong>
      s.country <strong>LIKE</strong> 'F%'
  ) s1 <strong>USING</strong> (city_id)
<strong>WHERE</strong>
  create_date &gt;= DATE('2005-10-01');
</pre>
</blockquote>
<p>The above lists customers created as of Oct. 1st, 2005, and who live in countries starting with an &#8216;F&#8217;. The query could be solved without a subquery, but there&#8217;s a good reason why I made it so.</p>
<h4>The problems</h4>
<p>I used very different conventions in each of the queries, and sometimes within a single query. And it&#8217;s common that I see the same on a customer&#8217;s site, what with having many programmers do the SQL coding. Again, I will only discuss the table alias conventions. I&#8217;ll leave the rest to the reader.</p>
<p>Here&#8217;s where I see problems:</p>
<ul>
<li>Query <strong>#1</strong>: In itself, it looks fine. <strong>Rental</strong> turns to <strong>R</strong>, <strong>Customer</strong> turns to <strong>C</strong>. I will comment on this slightly later on when I provide my full opinion.</li>
<li>Query <strong>#2</strong>: So <strong>film</strong> turns to <strong>F</strong>, <strong>category</strong> turns to <strong>C</strong>. What should <strong>film_category</strong> turn into? <em>Out of letters?</em> Let&#8217;s just go for <strong>S</strong>, shall we? But <strong>S</strong> has nothing to do with <strong>film_category</strong>. Yet it&#8217;s so commonly seen.</li>
<li>Query <strong>#2</strong>: We&#8217;re using the <strong>AS</strong> keyword now. We didn&#8217;t use it before.</li>
<li>Queries <strong>#1</strong>, <strong>#2</strong>: Hold on. Wasn&#8217;t <strong>C</strong> taken for <strong>customer</strong> in Query <strong>#1</strong>? Now, in Query <strong>#2</strong> it stands for <strong>category</strong>? I&#8217;m beginning to get confused.</li>
<li>Query <strong>#3</strong>: Now aliases are lower case; I was just getting used to them being upper case.</li>
<li>Query <strong>#3</strong>: But, hey, <strong>c</strong> is back to <strong>customer</strong>!</li>
<li>Query <strong>#3</strong>: Or, is it? Take a look at the subquery. There&#8217;s another <strong>c</strong> in there! This time it&#8217;s <strong>city</strong>! And it&#8217;s perfectly valid syntax. We actually have two identical aliases in the same query.</li>
<li>Query <strong>#3</strong>: If I could, I would name country with <strong>c</strong> as well. But I can&#8217;t. So why not throw in <strong>s</strong> again?</li>
<li>Query <strong>#3</strong>: And now I don&#8217;t even bother using the alias when accessing <strong>create_date</strong>. Well, there&#8217;s no such column in any of the other tables!</li>
</ul>
<h4>Proper conventions</h4>
<p>What I find so disturbing is that whenever I read a complex query, I need to go back and forth, back and forth between table aliases (found everywhere in the query) and their declaration point. Such irregularities make the queries difficult to read.</p>
<p>Any of the above issues could be justified. But I wish to make some suggestions:</p>
<ul>
<li>Decide whether you&#8217;re going for upper or lower case.</li>
<li>Do not use the same alias twice in your query, even if it&#8217;s valid.</li>
<li>Aliases do not have to be single character. <strong>film_category</strong> may just as well be <strong>FC</strong>.</li>
<li>Do not alias something that is hard to interpret. <strong>s</strong> does not stand for <strong>country</strong>.</li>
<li>Think ahead: use the same aliases throughout all your queries, as far as you can. If uniqueness is a problem, use longer aliases. Use <strong>cust</strong> instead of <strong>c</strong>.</li>
</ul>
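<p>As a sketch of how these suggestions might apply, here is Query <strong>#2</strong> again with one case, no reused letters and multi-character aliases. The particular aliases (<strong>f</strong>, <strong>fc</strong>, <strong>cat</strong>) are just one possible choice.</p>
<blockquote>
<pre><strong>SELECT</strong>
 f.title, cat.name
<strong>FROM</strong>
 film <strong>AS</strong> f
 <strong>JOIN</strong> film_category <strong>AS</strong> fc <strong>ON</strong> (f.film_id = fc.film_id)
 <strong>JOIN</strong> category <strong>AS</strong> cat <strong>ON</strong> (fc.category_id = cat.category_id)
<strong>WHERE</strong> f.length &gt; 180;</pre>
</blockquote>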
<p>The above should make for more organized and readable SQL code. Remember: what one programmer finds as a very intuitive alias, is unintuitive to another!</p>
<h4>My own convention</h4>
<p>Simple: I <em>only use aliases</em> when using self joins. I am aware that queries are much longer what with long table names. I go farther than that: I prefer fully qualifying questionable columns throughout the query. Yes, it makes the query even longer.</p>
<p>I know this does not appeal to many. But there&#8217;s no confusion. And it&#8217;s easily searchable. And it&#8217;s consistent. And if properly formatted, as in the above queries, it is quite readable.</p>
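<p>For comparison, this is roughly what Query <strong>#1</strong> looks like under that convention, with no aliases and every column fully qualified:</p>
<blockquote>
<pre><strong>SELECT</strong>
 rental.rental_date, customer.customer_id,
 customer.first_name, customer.last_name
<strong>FROM</strong>
 rental
 <strong>JOIN</strong> customer <strong>USING</strong> (customer_id)
<strong>WHERE</strong>
 rental.rental_date &gt;= DATE('2005-10-01')
 <strong>AND</strong> customer.store_id=1;
</pre>
</blockquote>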
<p>Now please join me in asking Oracle if they can add multi-line Strings for java, as there are for python.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/proper-sql-table-alias-use-conventions/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Tip: faster than TRUNCATE</title>
		<link>http://code.openark.org/blog/mysql/tip-faster-than-truncate</link>
		<comments>http://code.openark.org/blog/mysql/tip-faster-than-truncate#comments</comments>
		<pubDate>Tue, 09 Mar 2010 11:37:01 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1896</guid>
		<description><![CDATA[TRUNCATE is usually a fast operation (much faster than DELETE FROM). But sometimes it just hangs; I&#8217;ve had several such cheerless events with InnoDB (Plugin) tables which were extensively written to. The TRUNCATE hung; nothing else would work; minutes passed. TRUNCATE on tables with no FOREIGN KEYs should act fast: it translates to dropping the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://dev.mysql.com/doc/refman/5.0/en/truncate-table.html"><strong>TRUNCATE</strong></a> is usually a fast operation (much faster than <strong>DELETE FROM</strong>). But sometimes it just hangs; I&#8217;ve had several such cheerless events with InnoDB (Plugin) tables which were extensively written to. The <strong>TRUNCATE</strong> hung; nothing else would work; minutes passed.</p>
<p><strong>TRUNCATE</strong> on tables with no <strong>FOREIGN KEY</strong>s <em>should</em> act fast: it translates to dropping the table and creating a new one (and it all depends on the MySQL version, see the manual).</p>
<p>What&#8217;s faster than <strong>TRUNCATE</strong>, then? If you have neither triggers nor <strong>FOREIGN KEY</strong>s, a <a href="http://dev.mysql.com/doc/refman/5.0/en/rename-table.html"><strong>RENAME TABLE</strong></a> can come to the rescue. Instead of:</p>
<blockquote>
<pre>TRUNCATE log_table</pre>
</blockquote>
<p>Do:</p>
<blockquote>
<pre>CREATE TABLE log_table_new LIKE log_table;
<strong>RENAME TABLE</strong> log_table TO log_table_old, log_table_new TO log_table;
DROP TABLE log_table_old;</pre>
</blockquote>
<p>I found this to work well for me. Do note that <strong>AUTO_INCREMENT</strong> values can be tricky here: the &#8220;new&#8221; table is created with an <strong>AUTO_INCREMENT</strong> value which is immediately taken in the &#8220;working&#8221; table. If you care about not reusing the same <strong>AUTO_INCREMENT</strong> values, you can:<span id="more-1896"></span></p>
<blockquote>
<pre>ALTER TABLE log_table_new AUTO_INCREMENT=<em>some high enough value;</em></pre>
</blockquote>
<p>Just before renaming.</p>
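<p>Put together, the sequence might look like the sketch below. It assumes the <strong>AUTO_INCREMENT</strong> column is called <strong>id</strong> and that a safety margin of <strong>1000</strong> is enough to cover rows written between the <strong>SELECT</strong> and the <strong>RENAME</strong>; both are assumptions to adjust. A prepared statement is used because <strong>ALTER TABLE</strong> expects a literal value, not a variable.</p>
<blockquote>
<pre>CREATE TABLE log_table_new LIKE log_table;

-- Pick a value above anything currently in use (id and the margin are assumptions).
SELECT IFNULL(MAX(id), 0) + 1000 INTO @next_auto_increment FROM log_table;

-- Build and run the ALTER TABLE with the computed literal.
SET @alter_statement := CONCAT('ALTER TABLE log_table_new AUTO_INCREMENT=', @next_auto_increment);
PREPARE stmt FROM @alter_statement;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

RENAME TABLE log_table TO log_table_old, log_table_new TO log_table;
DROP TABLE log_table_old;</pre>
</blockquote>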
<p>I do not have a good explanation for why the <strong>RENAME TABLE</strong> responds faster than <strong>TRUNCATE</strong>.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/tip-faster-than-truncate/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
	</channel>
</rss>
