<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>code.openark.org &#187; Syntax</title>
	<atom:link href="http://code.openark.org/blog/tag/syntax/feed" rel="self" type="application/rss+xml" />
	<link>http://code.openark.org/blog</link>
	<description>Blog by Shlomi Noach</description>
	<lastBuildDate>Thu, 09 Sep 2010 16:15:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>SQL: good comments conventions</title>
		<link>http://code.openark.org/blog/mysql/sql-good-comments-conventions</link>
		<comments>http://code.openark.org/blog/mysql/sql-good-comments-conventions#comments</comments>
		<pubDate>Thu, 01 Jul 2010 07:36:32 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Coding]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2581</guid>
		<description><![CDATA[I happened upon a customer who left me in awe and admiration. The reason: excellent comments for their SQL code. I list four major places where SQL comments are helpful. I&#8217;ll use the sakila database. It is originally scarcely commented; I&#8217;ll present it now enhanced with comments, to illustrate. Table definitions The CREATE TABLE statement [...]]]></description>
			<content:encoded><![CDATA[<p>I happened upon a customer who left me in awe and admiration. The reason: excellent comments for their SQL code.</p>
<p>I list four major places where SQL comments are helpful. I&#8217;ll use the <a href="http://dev.mysql.com/doc/sakila/en/sakila.html">sakila</a> database. It is originally scarcely commented; I&#8217;ll present it now enhanced with comments, to illustrate.</p>
<h4>Table definitions</h4>
<p>The <strong>CREATE TABLE</strong> statement allows for a comment, intended to describe the nature of the table:</p>
<blockquote>
<pre>CREATE TABLE `film_text` (
 `film_id` smallint(6) NOT NULL,
 `title` varchar(255) NOT NULL,
 `description` text,
 PRIMARY KEY (`film_id`),
 FULLTEXT KEY `idx_title_description` (`title`,`description`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 <strong>COMMENT='Reflection of `film`, used for FULLTEXT search.'</strong>
</pre>
</blockquote>
<p>It&#8217;s too bad the comment&#8217;s max length is 60 characters, though. However, it&#8217;s a very powerful field.</p>
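<p>Once table comments are in place, they can be read back from <strong>INFORMATION_SCHEMA</strong>, and added to existing tables with <strong>ALTER TABLE</strong>. A minimal sketch, assuming the tables live in a schema named <strong>sakila</strong>:</p>
<blockquote>
<pre>-- List the commented base tables in the (assumed) sakila schema.
-- Views are excluded, since they report 'VIEW' as their comment.
SELECT TABLE_NAME, TABLE_COMMENT
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'sakila'
  AND TABLE_TYPE = 'BASE TABLE'
  AND TABLE_COMMENT != '';

-- Add or change the comment on an existing table
ALTER TABLE sakila.film_text
  COMMENT = 'Reflection of `film`, used for FULLTEXT search.';
</pre>
</blockquote>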
<h4>Column definitions</h4>
<p>One may comment particular columns:<span id="more-2581"></span></p>
<blockquote>
<pre>CREATE TABLE `film` (
 `film_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
 `title` varchar(255) NOT NULL,
 `description` text,
 `release_year` year(4) DEFAULT NULL,
 `language_id` tinyint(3) unsigned NOT NULL <strong>COMMENT 'Soundtrack spoken language'</strong>,
 `original_language_id` tinyint(3) unsigned DEFAULT NULL <strong>COMMENT 'Filmed spoken language'</strong>,
 `rental_duration` tinyint(3) unsigned NOT NULL DEFAULT '3',
 `rental_rate` decimal(4,2) NOT NULL DEFAULT '4.99',
 `length` smallint(5) unsigned DEFAULT NULL,
 `replacement_cost` decimal(5,2) NOT NULL DEFAULT '19.99',
  ...
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8
</pre>
</blockquote>
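<p>Column comments can be read back as well. A short sketch, again assuming the <strong>sakila</strong> schema name:</p>
<blockquote>
<pre>-- Quick look: the Comment column is included in the output
SHOW FULL COLUMNS FROM sakila.film;

-- Or list only the commented columns of the table
SELECT COLUMN_NAME, COLUMN_TYPE, COLUMN_COMMENT
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'sakila'
  AND TABLE_NAME = 'film'
  AND COLUMN_COMMENT != '';
</pre>
</blockquote>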
<h4>Stored routines definitions</h4>
<p>Here&#8217;s an original <strong>sakila</strong> procedure, untouched. It is already commented:</p>
<blockquote>
<pre>CREATE DEFINER=`root`@`localhost` PROCEDURE `rewards_report`(
 IN min_monthly_purchases TINYINT UNSIGNED
 , IN min_dollar_amount_purchased DECIMAL(10,2) UNSIGNED
 , OUT count_rewardees INT
)
 READS SQL DATA
 <strong>COMMENT 'Provides a customizable report on best customers'</strong>
BEGIN

 DECLARE last_month_start DATE;
 DECLARE last_month_end DATE;
 ...
</pre>
</blockquote>
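<p>Routine comments are exposed through <strong>INFORMATION_SCHEMA.ROUTINES</strong>. A minimal sketch, assuming the routines are defined in the <strong>sakila</strong> schema:</p>
<blockquote>
<pre>-- List stored procedures and functions along with their comments
SELECT ROUTINE_TYPE, ROUTINE_NAME, ROUTINE_COMMENT
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_SCHEMA = 'sakila'
ORDER BY ROUTINE_TYPE, ROUTINE_NAME;
</pre>
</blockquote>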
<h4>SQL queries</h4>
<p>Last but not least, while not part of the schema, SQL queries define the use of the schema. That is, the schema exists for the sole reason of being able to query it.</p>
<p>Where did <em>that</em> query come from? Which piece of code issued it? Why? What&#8217;s its purpose?</p>
<p>When looking at the <strong>PROCESSLIST</strong>, the slow log, etc., these questions are easier to answer when the queries are commented:</p>
<blockquote>
<pre>SELECT
 <strong>/* List film details along with participating actors */</strong>
 <strong>/* Issued by analytics module */</strong>
 film.*,
 COUNT(*) AS count_actors,
 GROUP_CONCAT(CONCAT(actor.first_name, ' ', actor.last_name))
FROM
 film
 JOIN film_actor USING(film_id)
 JOIN actor USING(actor_id)
GROUP BY film.film_id;
</pre>
</blockquote>
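<p>With a convention like the above in place, running queries can be filtered by their comment. This is only a sketch: the comment text is the one used above, and depending on the client and its settings, comments may be stripped before the statement ever reaches the server, so verify that yours are actually sent.</p>
<blockquote>
<pre>-- Find currently running queries issued by the analytics module.
-- Note: this statement matches itself too, since its own text
-- contains the search string.
SELECT ID, USER, TIME, INFO
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE INFO LIKE '%Issued by analytics module%';
</pre>
</blockquote>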
<h4>Conclusion</h4>
<p>Source code commenting is an important practice, and one that is usually enforced. Comments on SQL queries and table definitions, however, are often scarce or non-existent. I urge DBAs to adopt a comments coding convention for SQL, and apply it whenever they can.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/sql-good-comments-conventions/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Views: better performance with condition pushdown</title>
		<link>http://code.openark.org/blog/mysql/views-better-performance-with-condition-pushdown</link>
		<comments>http://code.openark.org/blog/mysql/views-better-performance-with-condition-pushdown#comments</comments>
		<pubDate>Thu, 20 May 2010 05:17:05 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Execution plan]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Stored routines]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1328</guid>
		<description><![CDATA[Justin&#8217;s A workaround for the performance problems of TEMPTABLE views post on mysqlperformanceblog.com reminded me of a solution I once saw on a customer&#8217;s site. The customer was using nested views structure, up to depth of some 8-9 views. There were a lot of aggregations along the way, and even the simplest query resulted with [...]]]></description>
			<content:encoded><![CDATA[<p>Justin&#8217;s <a href="http://www.mysqlperformanceblog.com/2010/05/19/a-workaround-for-the-performance-problems-of-temptable-views/">A workaround for the performance problems of TEMPTABLE views</a> post on <a href="http://www.mysqlperformanceblog.com/">mysqlperformanceblog.com</a> reminded me of a solution I once saw on a customer&#8217;s site.</p>
<p>The customer was using a nested views structure, up to a depth of some 8-9 views. There were a lot of aggregations along the way, and even the simplest query resulted in a LOT of subqueries, temporary tables, and vast amounts of data, even if only to return a couple of rows.</p>
<p>While we worked to solve this, a developer showed me his own trick. His trick is now impossible to implement, but there&#8217;s a hack around this.</p>
<p>Let&#8217;s use the world database to illustrate. Look at the following view definition:<span id="more-1328"></span></p>
<blockquote><pre class="brush: sql;">
CREATE
  ALGORITHM=TEMPTABLE
VIEW country_languages AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON (Country.CODE = CountryLanguage.CountryCode)
  GROUP BY
    Country.CODE;
</pre>
</blockquote>
<p>The view presents a list of spoken languages per country. The execution plan for querying this view looks like this:</p>
<blockquote>
<pre>mysql&gt; EXPLAIN SELECT * FROM country_languages;
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref                               | rows | Extra                                        |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
|  1 | PRIMARY     | &lt;derived2&gt;      | ALL    | NULL          | NULL    | NULL    | NULL                              |  233 |                                              |
|  2 | DERIVED     | CountryLanguage | index  | PRIMARY       | PRIMARY | 33      | NULL                              |  984 | Using index; Using temporary; Using filesort |
|  2 | DERIVED     | Country         | eq_ref | PRIMARY       | PRIMARY | 3       | world.CountryLanguage.CountryCode |    1 |                                              |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
</pre>
</blockquote>
<p>And, even if we only want to filter out a single country, we still get the same plan:</p>
<blockquote>
<pre>mysql&gt; EXPLAIN SELECT * FROM country_languages WHERE Code='USA';
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref                               | rows | Extra                                        |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
|  1 | PRIMARY     | &lt;derived2&gt;      | ALL    | NULL          | NULL    | NULL    | NULL                              |  233 | Using where                                  |
|  2 | DERIVED     | CountryLanguage | index  | PRIMARY       | PRIMARY | 33      | NULL                              |  984 | Using index; Using temporary; Using filesort |
|  2 | DERIVED     | Country         | eq_ref | PRIMARY       | PRIMARY | 3       | world.CountryLanguage.CountryCode |    1 |                                              |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
</pre>
</blockquote>
<p>So, we need to go through the entire CountryLanguage and Country tables in order to return results for just one row.</p>
<h4>A non-working solution</h4>
<p>The solution offered by the developer was this:</p>
<blockquote><pre class="brush: sql;">
CREATE
  ALGORITHM=MERGE
  VIEW country_languages_non_working AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON
      (Country.CODE = CountryLanguage.CountryCode)
  WHERE
    Country.CODE = @country_code
  GROUP BY Country.CODE;
</pre>
</blockquote>
<p>And follow with:</p>
<blockquote>
<pre>mysql&gt; SET @country_code='USA';
Query OK, 0 rows affected (0.00 sec)

mysql&gt; SELECT * FROM country_languages_non_working;
+------+---------------+----------------------------------------------------------------------------------------------------+
| CODE | country       | languages                                                                                          |
+------+---------------+----------------------------------------------------------------------------------------------------+
| USA  | United States | Chinese,English,French,German,Italian,Japanese,Korean,Polish,Portuguese,Spanish,Tagalog,Vietnamese |
+------+---------------+----------------------------------------------------------------------------------------------------+
</pre>
</blockquote>
<p>So, push a <strong>WHERE</strong> condition down into the view&#8217;s definition. The session variable @country_code is used to filter rows. In the above simplified code the value is assumed to be set; tweak it as you see fit (using <strong>IFNULL</strong>, for example, or <strong>OR</strong> conditions) to allow for a full scan in case the variable is undefined.</p>
<p>This doesn&#8217;t work. It used to work a couple years back; but today you cannot create a view which uses session variables or parameters. It is a restriction imposed by views.</p>
<h4>A workaround</h4>
<p>Justin showed a workaround using an additional table. There is another workaround which does not involve tables, but rather stored routines. Now, this is a patch, and an ugly one. It may not work in future versions of MySQL for all I know. But, here it goes:</p>
<blockquote><pre class="brush: sql;">
DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `get_session_country`() RETURNS CHAR(3)
    NO SQL
    DETERMINISTIC
BEGIN
  RETURN @country_code;
END $$
DELIMITER ;

CREATE
  ALGORITHM=MERGE
  VIEW country_languages_2 AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON
      (Country.CODE = CountryLanguage.CountryCode)
  WHERE
    Country.CODE = get_session_country()
  GROUP BY Country.CODE;
</pre>
</blockquote>
<p>And now:</p>
<blockquote>
<pre>mysql&gt; SET @country_code='USA';
Query OK, 0 rows affected (0.00 sec)

mysql&gt; SELECT * FROM country_languages_2;
+------+---------------+----------------------------------------------------------------------------------------------------+
| CODE | country       | languages                                                                                          |
+------+---------------+----------------------------------------------------------------------------------------------------+
| USA  | United States | Chinese,English,French,German,Italian,Japanese,Korean,Polish,Portuguese,Spanish,Tagalog,Vietnamese |
+------+---------------+----------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql&gt; EXPLAIN SELECT * FROM country_languages_2;
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
|  1 | PRIMARY     | &lt;derived2&gt;      | system | NULL          | NULL    | NULL    | NULL |    1 |                          |
|  2 | DERIVED     | Country         | const  | PRIMARY       | PRIMARY | 3       |      |    1 |                          |
|  2 | DERIVED     | CountryLanguage | ref    | PRIMARY       | PRIMARY | 3       |      |    8 | Using where; Using index |
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
</pre>
</blockquote>
<p>Since views are allowed to call stored routines (Justin used this to call upon <strong>CONNECTION_ID()</strong>), and since stored routines can use session variables, we can take advantage and force the view into filtering out irrelevant rows before these accumulate to temporary tables and big joins.</p>
<p>Back in the customer&#8217;s office, we witnessed, what with their real data and multiple views, a reduction of query times from ~30 minutes to a few seconds.</p>
<h4>Another kind of use</h4>
<p>Eventually we worked to make better view definitions and query splitting, resulting in clearer code and fast queries, but this solution plays nicely into another kind of problem:</p>
<p>Can we force different customers to see different parts of a given table? e.g., only those rows that relate to the customers?</p>
<p>There can be many solutions: different tables; multiple views (one per customer), stored procedures, what have you. The above provides a solution, and I&#8217;ve seen it in use.</p>
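<p>A minimal sketch of that idea follows. The <strong>orders</strong> table, its columns, the <strong>get_session_customer()</strong> function and the <strong>@customer_id</strong> variable are all hypothetical names made up for illustration. Also keep in mind that anyone who can set the session variable can see any customer&#8217;s rows, so by itself this is a convenience, not a security mechanism.</p>
<blockquote><pre class="brush: sql;">
DELIMITER $$
-- Hypothetical helper: exposes the session variable to the view
CREATE FUNCTION get_session_customer() RETURNS INT UNSIGNED
    NO SQL
    DETERMINISTIC
BEGIN
  RETURN @customer_id;
END $$
DELIMITER ;

-- Hypothetical table `orders`; the view only shows the rows
-- belonging to the customer set for the current session
CREATE
  ALGORITHM=MERGE
  VIEW my_orders AS
  SELECT
    order_id, customer_id, order_date
  FROM
    orders
  WHERE
    customer_id = get_session_customer();

-- Usage, per session:
SET @customer_id := 17;
SELECT * FROM my_orders;
</pre>
</blockquote>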
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/views-better-performance-with-condition-pushdown/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Discovery of the day: GROUP BY &#8230; DESC</title>
		<link>http://code.openark.org/blog/mysql/discovery-of-the-day-group-by-desc</link>
		<comments>http://code.openark.org/blog/mysql/discovery-of-the-day-group-by-desc#comments</comments>
		<pubDate>Tue, 04 May 2010 09:38:38 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2381</guid>
		<description><![CDATA[I happened on a query where, by mistake, an SELECT ... ORDER BY x DESC LIMIT 1 was written as SELECT ... GROUP BY x DESC LIMIT 1 And it took me by surprise to realize GROUP BY x DESC is a valid statement. I looked it up: yep! It&#8217;s documented. In MySQL, GROUP BY [...]]]></description>
			<content:encoded><![CDATA[<p>I happened on a query where, by mistake, a</p>
<pre class="brush: sql;">
SELECT ... ORDER BY x DESC LIMIT 1
</pre>
<p>was written as</p>
<pre class="brush: sql;">
SELECT ... GROUP BY x DESC LIMIT 1
</pre>
<p>And it took me by surprise to realize <strong>GROUP BY x DESC</strong> is a valid statement. I looked it up: yep! It&#8217;s <a href="http://dev.mysql.com/doc/refman/5.0/en/group-by-modifiers.html">documented</a>.</p>
<p>In MySQL, <strong>GROUP BY</strong> results are sorted according to the group statement. You can override this by adding <strong>ORDER BY NULL</strong> (see <a href="http://code.openark.org/blog/mysql/less-known-sql-syntax-and-functions-in-mysql">past post</a>). I wasn&#8217;t aware you can actually control the sort order.</p>
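<p>To illustrate, here is a small sketch against the sakila database (assumed to be installed under the schema name <strong>sakila</strong>). It applies to the MySQL 5.x versions discussed here:</p>
<pre class="brush: sql;">
-- Groups are returned sorted by rating, in descending order
SELECT rating, COUNT(*) AS count_films
FROM sakila.film
GROUP BY rating DESC;

-- Skip the implicit sort altogether
SELECT rating, COUNT(*) AS count_films
FROM sakila.film
GROUP BY rating
ORDER BY NULL;
</pre>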
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/discovery-of-the-day-group-by-desc/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>But I DO want MySQL to say &#8220;ERROR&#8221;!</title>
		<link>http://code.openark.org/blog/mysql/but-i-do-want-mysql-to-say-error</link>
		<comments>http://code.openark.org/blog/mysql/but-i-do-want-mysql-to-say-error#comments</comments>
		<pubDate>Fri, 12 Mar 2010 04:53:28 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[Data Types]]></category>
		<category><![CDATA[sql_mode]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2005</guid>
		<description><![CDATA[MySQL is known for its willingness to accept invalid queries, data values. It can silently commit your transaction, truncate your data. Using GROUP_CONCAT with a small group_concat_max_len setting? Your result will be silently truncated (make sure to check the warnings though). Calling CREATE TEMPORARY TABLE? You get silent commit. Issuing a ROLLBACK on non-transactional involved [...]]]></description>
			<content:encoded><![CDATA[<p>MySQL is known for its willingness to accept invalid queries and data values. It can silently commit your transaction or truncate your data.</p>
<ul>
<li>Using <strong>GROUP_CONCAT</strong> with a small <strong>group_concat_max_len</strong> setting? Your result will be silently truncated (make sure to check the warnings though).</li>
<li>Calling <strong>CREATE <span style="text-decoration: line-through;">TEMPORARY</span> TABLE</strong>? You get <a href="http://www.joinfu.com/2010/03/a-follow-up-on-the-sql-puzzle/">silent commit</a>.</li>
<li>Issuing a <strong>ROLLBACK</strong> when non-transactional engines are involved? Have a warning; no error.</li>
<li>Using <strong>LOCK IN SHARE MODE</strong> on non transactional tables? Not a problem. Nothing reported.</li>
<li>Adding a <strong>FOREIGN KEY</strong> on a MyISAM table? Good for you; no action actually taken.</li>
<li>Inserting <strong>300</strong> into a <strong>TINYINT UNSIGNED</strong> column in a relaxed <strong>sql_mode</strong>? Give me <strong>255</strong>, I&#8217;ll silently drop the remaining <strong>45</strong>. I owe you.</li>
</ul>
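<p>For the last item, at least, a stricter <strong>sql_mode</strong> already turns the warning into an error. A small sketch, assuming an unsigned column and a scratch schema named <strong>test</strong> (the table name is made up):</p>
<blockquote>
<pre>CREATE TABLE test.tiny_test (val TINYINT UNSIGNED NOT NULL);

-- Relaxed mode: the statement succeeds, the value is clipped,
-- and only a warning is raised
SET SESSION sql_mode = '';
INSERT INTO test.tiny_test VALUES (300);
SHOW WARNINGS;
SELECT val FROM test.tiny_test;

-- Strict mode: the same statement is rejected with an out-of-range error
SET SESSION sql_mode = 'STRICT_ALL_TABLES';
INSERT INTO test.tiny_test VALUES (300);
</pre>
</blockquote>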
<h4>Warnings and errors</h4>
<p>It would be nice to:<span id="more-2005"></span></p>
<ul>
<li>Have an <strong>auto_propagate_warning_to_error</strong> server variable (global/session/both) which, well, does what it says.</li>
<li>Have an <strong>i_am_really_not_a_dummy</strong> server variable which implies stricter checks for all the above and prevents you from doing <em>anything</em> that may be problematic (or rolls back your transactions on your invalid actions).</li>
</ul>
<p>Connectors may be nice enough to propagate warnings to errors, and that&#8217;s good. But it is not enough, since by then the data is already committed in MySQL.</p>
<p>If I understand correctly, and maybe it&#8217;s just a myth, it all goes back to the time when MySQL was after widespread adoption across the internet, and so tried not to interfere too much with its users (hence the common myth that &#8220;MySQL just works out of the box and does not require me to configure or understand anything&#8221;).</p>
<p>MySQL is a database system, is now widespread, and is used by serious companies and products. It is time to stop playing nice with everyone and provide strict integrity. Or, be nice to everyone, but allow me to specify what &#8220;nice&#8221; means for me.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/but-i-do-want-mysql-to-say-error/feed</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>Proper SQL table alias use conventions</title>
		<link>http://code.openark.org/blog/mysql/proper-sql-table-alias-use-conventions</link>
		<comments>http://code.openark.org/blog/mysql/proper-sql-table-alias-use-conventions#comments</comments>
		<pubDate>Thu, 11 Mar 2010 07:10:09 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Opinions]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2156</guid>
		<description><![CDATA[After seeing quite some SQL statements over the years, something is bugging me: there is no consistent convention as for how to write an SQL query. I&#8217;m going to leave formatting, upper/lower-case issues aside, and discuss a small part of the SQL syntax: table aliases. Looking at three different queries, I will describe what I [...]]]></description>
			<content:encoded><![CDATA[<p>After seeing quite a few SQL statements over the years, something is bugging me: there is no consistent convention as to how to write an SQL query.</p>
<p>I&#8217;m going to leave formatting, upper/lower-case issues aside, and discuss a small part of the SQL syntax: table aliases. Looking at three different queries, I will describe what I find to be problematic table alias use.</p>
<p>Using the <a href="http://dev.mysql.com/doc/sakila/en/sakila.html">sakila</a> database, take a look at the following queries:<span id="more-2156"></span></p>
<h4>Query #1</h4>
<blockquote>
<pre><strong>SELECT</strong>
 R.rental_date, C.customer_id, C.first_name, C.last_name
<strong>FROM</strong>
 rental R
 <strong>JOIN</strong> customer C <strong>USING</strong> (customer_id)
<strong>WHERE</strong>
 R.rental_date &gt;= DATE('2005-10-01')
 <strong>AND</strong> C.store_id=1;
</pre>
</blockquote>
<p>The above looks for film rentals done in a specific store (store #<strong>1</strong>), as of Oct. 1st, 2005.</p>
<h4>Query #2</h4>
<blockquote>
<pre><strong>SELECT</strong>
 F.title, C.name
<strong>FROM</strong>
 film <strong>AS</strong> F
 <strong>JOIN</strong> film_category <strong>AS</strong> S <strong>ON</strong> (F.film_id = S.film_id)
 <strong>JOIN</strong> category <strong>AS</strong> C <strong>ON</strong> (S.category_id = C.category_id)
<strong>WHERE</strong> F.length &gt; 180;</pre>
</blockquote>
<p>The above lists the title and category for all films longer than three hours.</p>
<h4>Query #3</h4>
<blockquote>
<pre><strong>SELECT</strong> c.customer_id, c.last_name
<strong>FROM</strong>
  customer c
  <strong>INNER JOIN</strong> address a ON (c.address_id = a.address_id)
  <strong>INNER JOIN</strong> (
    <strong>SELECT</strong>
      c.city_id
    <strong>FROM</strong>
      city AS c
      <strong>JOIN</strong> country s <strong>ON</strong> (c.country_id = s.country_id)
    <strong>WHERE</strong>
      s.country <strong>LIKE</strong> 'F%'
  ) s1 <strong>USING</strong> (city_id)
<strong>WHERE</strong>
  create_date &gt;= DATE('2005-10-01');
</pre>
</blockquote>
<p>The above lists customers created as of Oct. 1st, 2005, and who live in countries starting with an &#8216;F&#8217;. The query could be solved without a subquery, but there&#8217;s a good reason why I made it so.</p>
<h4>The problems</h4>
<p>I used very different conventions in each of the queries, and sometimes within a single query. And it&#8217;s common that I see the same on a customer&#8217;s site, what with having many programmers do the SQL coding. Again, I will only discuss the table aliases conventions. I&#8217;ll leave the rest to the reader.</p>
<p>Here&#8217;s where I see problems:</p>
<ul>
<li>Query <strong>#1</strong>: In itself, it looks fine. <strong>Rental</strong> turns to <strong>R</strong>, <strong>Customer</strong> turns to <strong>C</strong>. I will comment on this slightly later on when I provide my full opinion.</li>
<li>Query <strong>#2</strong>: So <strong>film</strong> turns to <strong>F</strong>, <strong>category</strong> turns to <strong>C</strong>. What should <strong>film_category</strong> turn into? <em>Out of letters?</em> Let&#8217;s just go for <strong>S</strong>, shall we? But <strong>S</strong> has nothing to do with <strong>film_category</strong>. Yet it&#8217;s so commonly seen.</li>
<li>Query <strong>#2</strong>: We&#8217;re using the <strong>AS</strong> keyword now. We didn&#8217;t use it before.</li>
<li>Queries <strong>#1</strong>, <strong>#2</strong>: Hold on. Wasn&#8217;t <strong>C</strong> taken for <strong>customer</strong> in Query <strong>#1</strong>? Now, in Query <strong>#2</strong> it stands for <strong>category</strong>? I&#8217;m beginning to get confused.</li>
<li>Query <strong>#3</strong>: Now aliases are lower case; I was just getting used to them being upper case.</li>
<li>Query <strong>#3</strong>: But, hey, <strong>c</strong> is back to <strong>customer</strong>!</li>
<li>Query <strong>#3</strong>: Or, is it? Take a look at the subquery. There&#8217;s another <strong>c</strong> in there! This time it&#8217;s <strong>city</strong>! And it&#8217;s perfectly valid syntax. We actually have two identical aliases in the same query.</li>
<li>Query <strong>#3</strong>: If I could, I would name country with <strong>c</strong> as well. But I can&#8217;t. So why not throw in <strong>s</strong> again?</li>
<li>Query <strong>#3</strong>: and now I don&#8217;t even bother using the alias when accessing the <strong>create_date</strong>. Well, there&#8217;s no such column in any of the other tables!</li>
</ul>
<h4>Proper conventions</h4>
<p>What I find so disturbing is that whenever I read a complex query, I need to go back and forth, back and forth between table aliases (found everywhere in the query) and their declaration point. Such irregularities make the queries difficult to read.</p>
<p>Any of the above issues could be justified. But I wish to make some suggestions:</p>
<ul>
<li>Decide whether you&#8217;re going for upper or lower case.</li>
<li>Do not use the same alias twice in your query, even if it&#8217;s valid.</li>
<li>Aliases do not have to be single character. <strong>film_category</strong> may just as well be <strong>FC</strong>.</li>
<li>Do not alias something that is hard to interpret. <strong>s</strong> does not stand for <strong>country</strong>.</li>
<li>Think ahead: use same aliases throughout all your queries, as far as you can. If uniqueness is a problem, make for longer aliases. Use <strong>cust</strong> instead of <strong>c</strong>.</li>
</ul>
<p>The above should make for more organized and readable SQL code. Remember: what one programmer finds as a very intuitive alias, is unintuitive to another!</p>
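<p>As an illustration, here is how Query #2 might look with these suggestions applied. The aliases <strong>fc</strong> and <strong>cat</strong> are just one possible choice; <strong>film</strong> is short enough to be left unaliased:</p>
<blockquote>
<pre>SELECT
  film.title, cat.name
FROM
  film
  JOIN film_category AS fc ON (film.film_id = fc.film_id)
  JOIN category AS cat ON (fc.category_id = cat.category_id)
WHERE
  film.length &gt; 180;
</pre>
</blockquote>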
<h4>My own convention</h4>
<p>Simple: I <em>only use aliases</em> when using self joins. I am aware that queries are much longer what with long table names. I go farther than that: I prefer fully qualifying questionable columns throughout the query. Yes, it makes the query even longer.</p>
<p>I know this does not appeal to many. But there&#8217;s no confusion. And it&#8217;s easily searchable. And it&#8217;s consistent. And if properly formatted, as in the above queries, it is quite readable.</p>
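<p>For example, Query #1 written with no aliases at all, and with every column fully qualified, would look something like this:</p>
<blockquote>
<pre>SELECT
  rental.rental_date, customer.customer_id,
  customer.first_name, customer.last_name
FROM
  rental
  JOIN customer USING (customer_id)
WHERE
  rental.rental_date &gt;= DATE('2005-10-01')
  AND customer.store_id = 1;
</pre>
</blockquote>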
<p>Now please join me in asking Oracle if they can add multi-line Strings for java, as there are for python.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/proper-sql-table-alias-use-conventions/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>7 ways to convince MySQL to use the right index</title>
		<link>http://code.openark.org/blog/mysql/7-ways-to-convince-mysql-to-use-the-right-index</link>
		<comments>http://code.openark.org/blog/mysql/7-ways-to-convince-mysql-to-use-the-right-index#comments</comments>
		<pubDate>Thu, 02 Apr 2009 16:06:32 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Books]]></category>
		<category><![CDATA[Execution plan]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=695</guid>
		<description><![CDATA[Sometimes MySQL gets it wrong. It doesn&#8217;t use the right index. It happens that MySQL generates a query plan which is really bad (EXPLAIN says it&#8217;s going to explore some 10,000,000 rows), when another plan (soon to show how was generated) says: &#8220;Sure, I can do that with 100 rows using a key&#8221;. A true [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes MySQL gets it wrong. It doesn&#8217;t use the right index.</p>
<p>It happens that MySQL generates a query plan which is really bad (EXPLAIN says it&#8217;s going to explore some 10,000,000 rows), when another plan (I&#8217;ll soon show how it was generated) says: &#8220;Sure, I can do that with 100 rows using a key&#8221;.</p>
<h4>A true story</h4>
<p>A customer had issues with his database. Queries were taking 15 minutes to complete, and the db in general was not responsive. Looking at the slow query log, I found the criminal query. Allow me to bring you up to speed:</p>
<p>A table is defined like this:</p>
<blockquote>
<pre>CREATE TABLE data (
  id INT UNSIGNED AUTO_INCREMENT,
  type INT UNSIGNED,
  level TINYINT UNSIGNED,
  ...
  PRIMARY KEY(id),
  KEY `type` (type)
) ENGINE=InnoDB;</pre>
</blockquote>
<p>The offending query was this:</p>
<blockquote>
<pre>SELECT id FROM data
WHERE type=12345 AND level &gt; 3
ORDER BY id</pre>
</blockquote>
<p>The facts were:</p>
<ul>
<li>`data` has about 10,000,000 rows.</li>
<li>The index on `type` is selective: about 100 rows per value on average.</li>
<li>The query took a long time to complete.</li>
<li>EXPLAIN showed that MySQL uses the PRIMARY KEY, hence searches 10,000,000 rows, filtered &#8220;using where&#8221;.</li>
<li>The <em>other</em> EXPLAIN showed that by using the `type` key, only 110 rows are expected, to be filtered &#8220;using where&#8221;, then sorted &#8220;using filesort&#8221;.</li>
</ul>
<p>So MySQL acknowledged it was generating the wrong plan. The <em>other</em> plan was better by its own standards.</p>
<h4>Solving the problem</h4>
<p>Let&#8217;s walk through 7 ways to solve the problem, starting with the more aggressive solutions, refining to achieve desired behavior through subtle changes.<span id="more-695"></span></p>
<h4>Solution #1: OPTIMIZE</h4>
<p>If MySQL got it wrong, it may be because the table was frequently changed. This affects the statistics. If we can spare the time (table is locked during that time), we could help out by rebuilding the table.</p>
<h4>Solution #2: ANALYZE</h4>
<p>ANALYZE TABLE is less time consuming, in particular on InnoDB, where it is barely noticed. An ANALYZE will update the index statistics and help out in generating better query plans.</p>
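<p>As a minimal sketch of the two statements discussed above, assuming the table in question is the `data` table used in the query:</p>
<blockquote>
<pre>-- Solution #1: rebuild the table (can take long on a large table)
OPTIMIZE TABLE data;

-- Solution #2: only refresh the index statistics
ANALYZE TABLE data;</pre>
</blockquote>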
<p>But hold on: the above two solutions are fine in general, but in this case MySQL <em>already</em> acknowledges that a better plan is at hand. In fact, I ran ANALYZE a few times, to no avail.</p>
<h4>Solution #3: USE INDEX</h4>
<p>Since the issue was urgent, my first thought went for the ultimate weapon:</p>
<blockquote>
<pre>SELECT id FROM data USE INDEX(type)
WHERE type=12345 AND level &gt; 3
ORDER BY id</pre>
</blockquote>
<p>This instructs MySQL to only consider the indexes listed; in our example, I only want MySQL to consider using the `type` index. This is the method that generated the <em>other</em> (good) EXPLAIN result. I could have gone even more ruthless and asked for FORCE INDEX.</p>
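<p>To compare the two plans side by side, something along the following lines can be used. This is a sketch only: the EXPLAIN output is omitted, since it depends on the data and on the server version. FORCE INDEX takes the same form as USE INDEX:</p>
<blockquote>
<pre>-- The plan MySQL picks on its own
EXPLAIN SELECT id FROM data
WHERE type=12345 AND level &gt; 3
ORDER BY id;

-- The plan when only the `type` index is considered
EXPLAIN SELECT id FROM data USE INDEX(type)
WHERE type=12345 AND level &gt; 3
ORDER BY id;

-- The more ruthless variant of the hint
SELECT id FROM data FORCE INDEX(type)
WHERE type=12345 AND level &gt; 3
ORDER BY id;</pre>
</blockquote>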
<h4>Solution #4: IGNORE INDEX</h4>
<p>A similar approach would be to explicitly negate the use of the PRIMARY KEY, like this:</p>
<blockquote>
<pre>SELECT id FROM data IGNORE INDEX(PRIMARY)
WHERE type=12345 AND level &gt; 3
ORDER BY id</pre>
</blockquote>
<h4>A moment of thinking</h4>
<p>The above solutions are &#8220;ugly&#8221;, in the sense that this is not standard SQL. It&#8217;s too MySQL-specific.</p>
<p>I&#8217;ve asked the programmers to do a quick rewrite, and had a few moments to consider: why did MySQL insist on using the PRIMARY KEY? Was it because I&#8217;ve asked it for the `id` column only? I rewrote as follows:</p>
<blockquote>
<pre>SELECT id, type, level FROM data
WHERE type=12345 AND level &gt; 3
ORDER BY id</pre>
</blockquote>
<p>Nope. EXPLAIN got me the same bad plan. Then it must be the ORDER BY clause:</p>
<blockquote>
<pre>SELECT id FROM data
WHERE type=12345 AND level &gt; 3</pre>
</blockquote>
<p>Sure enough, EXPLAIN now indicates using the `type` index, only reading 110 rows. So MySQL preferred to scan 10,000,000 rows, just so that the rows are generated in the right ORDER, and so no sorting is required, when it could have read 110 rows (where each row is a mere INT) and sorted them in no time.</p>
<p>Armed with this knowledge, a few more options come to hand.</p>
<h4>Solution #5: Move some logic to the application</h4>
<p>At about that point I got a message that the programmers were unable to add the USE INDEX part. Why? They were using the EJB framework, which limits your SQL-like queries to something very generic. Well, you can always drop the ORDER BY part and sort on the application side. That isn&#8217;t fun, but it&#8217;s been done.</p>
<h4>Solution #6: Negate use of PRIMARY KEY</h4>
<p>Can we force MySQL to use the `type` index, retain the ORDER BY, and do it all with standard SQL? Sure. The following query does this:</p>
<blockquote>
<pre>SELECT id, type, level FROM data
WHERE type=12345 AND level &gt; 3
ORDER BY id+0</pre>
</blockquote>
<p>id+0 is a function on the `id` column. This makes MySQL unable to utilize the PRIMARY KEY (or any other index on `id`, had there been one).</p>
<p>In his book &#8220;<a title="SQL Tuning by Dan Tow" href="http://www.amazon.com/SQL-Tuning-Dan-Tow/dp/0596005733">SQL Tuning</a>&#8221;, Dan Tow dedicates a chapter to hints and tips like the above. He shows how to control the use or non-use of indexes, the order by which subqueries are calculated, and more.</p>
<p>Unfortunately, the EJB specification said this was not allowed. You could not ORDER BY a function, only by a plain column.</p>
<h4>Solution #7: Make MySQL think the problem is harder than it really is</h4>
<p>Almost out of options. Just a moment before settling for sorting on the application side, another issue can be considered: since MySQL was fooled once, can it be fooled again to make things right? Can we fool it to believe that the PRIMARY KEY would not be worthwhile to use? The following query does this:</p>
<blockquote>
<pre>SELECT id, type, level FROM data
WHERE type=12345 AND level &gt; 3
ORDER BY id, type, level</pre>
</blockquote>
<p>Let&#8217;s reflect on this one. What is the order by which the rows are returned now? Answer: exactly as before. Since `id` is PRIMARY KEY, it is also UNIQUE, so no two `id` values are the same. Therefore, the secondary sorting column is redundant, and so is the following one. We get exactly the same result as &#8220;ORDER BY id&#8221;.</p>
<p>But MySQL didn&#8217;t catch this. This query caused MySQL to say: <em>&#8220;Mmmmm. &#8216;ORDER BY id, type, level&#8217; is not doable with the PRIMARY KEY only. Well, in this case, I had better use the `type` index&#8221;</em>. Is this a weakness of MySQL? I guess so. Maybe it will be fixed in the future. But this was the fix that saved the day.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/7-ways-to-convince-mysql-to-use-the-right-index/feed</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>`;`.`*`.`.` is a valid column name</title>
		<link>http://code.openark.org/blog/mysql/is-a-valid-column-name</link>
		<comments>http://code.openark.org/blog/mysql/is-a-valid-column-name#comments</comments>
		<pubDate>Thu, 12 Feb 2009 04:38:11 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=502</guid>
		<description><![CDATA[And the following query: SELECT `;`.`*`.`.` FROM `;`.`*`; is valid as well. So are the following: DROP DATABASE IF EXISTS `;`; CREATE DATABASE `;`; CREATE TABLE `;`.`*` (`.` INT); CREATE TABLE `;`.```` (`.` INT); CREATE TABLE `;`.`$(ls)` (`.` INT); So, on my Linux machine: root@mymachine:/usr/local/mysql/data# ls -l total 30172 drwx------ 2 mysql mysql 4096 2009-01-11 [...]]]></description>
			<content:encoded><![CDATA[<p>And the following query:</p>
<blockquote>
<pre>SELECT `;`.`*`.`.` FROM `;`.`*`;</pre>
</blockquote>
<p>is valid as well. So are the following:</p>
<blockquote>
<pre>DROP DATABASE IF EXISTS `;`;
CREATE DATABASE `;`;
CREATE TABLE `;`.`*` (`.` INT);
CREATE TABLE `;`.```` (`.` INT);
CREATE TABLE `;`.`$(ls)` (`.` INT);</pre>
</blockquote>
<p><span id="more-502"></span>So, on my Linux machine:</p>
<blockquote>
<pre>root@mymachine:/usr/local/mysql/data# ls -l
total 30172
drwx------ 2 mysql mysql     4096 2009-01-11 08:00 ;
-rw-rw---- 1 mysql mysql 18874368 2009-01-09 19:08 ibdata1
-rw-rw---- 1 mysql mysql  5242880 2009-01-09 19:08 ib_logfile0
-rw-rw---- 1 mysql mysql  5242880 2009-01-09 19:08 ib_logfile1
drwxr-x--- 2 mysql mysql     4096 2008-12-09 11:38 mysql
-rw-rw---- 1 mysql mysql  1423612 2009-01-11 08:00 mysql-bin.000001
-rw-rw---- 1 mysql mysql       19 2009-01-04 09:05 mysql-bin.index
drwx------ 2 mysql mysql     4096 2008-12-21 13:58 sakila
-rw-rw---- 1 mysql root      9783 2009-01-04 09:05 mymachine.err
-rw-rw---- 1 mysql mysql        6 2009-01-04 09:05 mymachine.pid
drwx------ 2 mysql mysql     4096 2009-01-04 08:30 world</pre>
</blockquote>
<p>Well then&#8230;</p>
<blockquote>
<pre>root@mymachine:/usr/local/mysql/data# <strong>cd ;</strong>
root@mymachine:~#</pre>
</blockquote>
<p>Trying again:</p>
<blockquote>
<pre>root@mymachine:~# <strong>cd -</strong>
/usr/local/mysql/data
root@mymachine:/usr/local/mysql/data# <strong>cd ";"</strong>
root@mymachine:/usr/local/mysql/data/;#</pre>
</blockquote>
<p>And now:</p>
<blockquote>
<pre>root@mymachine:/usr/local/mysql/data/;# <strong>ls -l *.frm</strong>
-rw-rw---- 1 mysql mysql 8554 2009-01-11 08:00 `.frm
-rw-rw---- 1 mysql mysql 8554 2009-01-11 08:00 *.frm
-rw-rw---- 1 mysql mysql 8554 2009-01-11 08:00 $(ls).frm</pre>
</blockquote>
<p>Oh, sorry, I meant:</p>
<blockquote>
<pre>root@mymachine:/usr/local/mysql/data/;# <strong>ls -l "*".frm</strong>
-rw-rw---- 1 mysql mysql 8554 2009-01-11 08:00 *.frm</pre>
</blockquote>
<p>Weird.</p>
<p>As a nice surprise, though, the dot (.) is not allowed in database or table names (but is allowed in column names). Nor are the slash (/) and backslash (\). Look <a title="Schema Object Names" href="http://dev.mysql.com/doc/refman/5.0/en/identifiers.html">here</a> for more on this.</p>
<h4>Support for non English naming</h4>
<p>I kinda knew about this all along, but never thought of the consequences. It&#8217;s nice to have a relaxed naming rule (I can even name my tables in Hebrew if I like), but &#8220;nice&#8221; doesn&#8217;t always play along with &#8220;practical&#8221;.</p>
<p>As a Hebrew speaker, I repeatedly encounter issues with using my language. In many applications Hebrew encoding is not supported (many times even UTF8 isn&#8217;t). Not to mention the fact that Hebrew is written from right to left. On many occasions I was irritated by the lack of support for non-English or non-ASCII characters.</p>
<p>But not always and not everywhere. I&#8217;ve had my share of programming languages, and, to be honest, I never expected my programming language to support UTF8 encoding for function names, variables, modules, packages or whatever. Using &#8220;a-zA-Z0-9_&#8221; is <em>just fine</em>. Many people who are not well familiar with English just name their variables in their native language, but written with English characters. This works well till you get someone from outside the country, who doesn&#8217;t speak the language and does not understand (nor can pronounce, nor has the matching keyboard layout or knows how to use it) the names.</p>
<p>In the same way, I have no wish for my tables to be named in Hebrew, German or Japanese. English is <em>just fine</em>.</p>
<p>Using non-letter characters just adds to the mess. Popular &#8220;command&#8221; characters such as &#8216;~&#8217;, &#8216;,&#8217;, &#8216;:&#8217;, &#8216;;&#8217;, &#8216;*&#8217;, &#8216;?&#8217;, &#8216;(&#8217;, &#8216;$&#8217; are better left alone. They don&#8217;t belong in database or table names (mapped to file names) or column names (internally handled by MySQL).</p>
<p>English has become the <em>de-facto</em> computer world language. Programming languages, file systems, TCP/IP protocols, SQL: everything &#8220;speaks&#8221; English.</p>
<h4>Security</h4>
<p>There&#8217;s another aspect, though: security. It may sound silly, but you can actually write complete scripts in a table&#8217;s name! Not wanting to give the wrong idea, I&#8217;m not presenting any table names which could wreak havoc on your machine if used improperly.</p>
<p>But think about it: don&#8217;t we all use a couple of scripts which backup/clean/automate some stuff for us? Don&#8217;t these scripts just go ahead and read some table names, then do stuff on those tables? How well do they trust table names?</p>
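<p>As a rough sketch of how such a script could be a little less trusting, the following queries first list the tables whose names contain anything other than plain letters, digits and underscores, and then build backtick-quoted identifiers in which any embedded backtick is doubled. The alias `quoted_name` is made up for illustration. Note that this only makes a name safe to embed in SQL; a name passed on to a shell needs shell quoting of its own:</p>
<blockquote>
<pre>-- Tables with "interesting" names
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME REGEXP '[^A-Za-z0-9_]';

-- Quote schema and table names before pasting them into generated SQL
SELECT CONCAT('`', REPLACE(TABLE_SCHEMA, '`', '``'), '`.`',
              REPLACE(TABLE_NAME, '`', '``'), '`') AS quoted_name
FROM INFORMATION_SCHEMA.TABLES;</pre>
</blockquote>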
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/is-a-valid-column-name/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>REPLACE INTO: think twice</title>
		<link>http://code.openark.org/blog/mysql/replace-into-think-twice</link>
		<comments>http://code.openark.org/blog/mysql/replace-into-think-twice#comments</comments>
		<pubDate>Wed, 17 Dec 2008 07:03:19 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=397</guid>
		<description><![CDATA[The REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation. Sounds very attractive, and has a nice syntax as well: the same syntax as a normal INSERT INTO&#8217;s. [...]]]></description>
			<content:encoded><![CDATA[<p>The <a title="REPLACE Syntax" href="http://dev.mysql.com/doc/refman/5.0/en/replace.html">REPLACE [INTO]</a> syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.</p>
<p>Sounds very attractive, and has a nice syntax as well: the same syntax as a normal INSERT INTO&#8217;s. It certainly has a nicer syntax than <a title="INSERT ... ON DUPLICATE KEY UPDATE Syntax" href="http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html">INSERT INTO &#8230; ON DUPLICATE KEY UPDATE</a>, and it&#8217;s certainly shorter than using a SELECT to see if a row exists, then doing either INSERT or UPDATE.</p>
<p>But weak-hearted people such as myself should be aware of the following: it is a heavyweight solution. It may be just what you were looking for in terms of ease of use, but the fact is that on duplicate keys, a DELETE and INSERT are performed, and this calls for a closer look.<span id="more-397"></span></p>
<p>Whenever a row is deleted, all indexes need to be updated, and most importantly the PRIMARY KEY. When a new row is inserted, the same happens. Especially on InnoDB tables (because of their clustered nature), this means much overhead. The restructuring of an index is an expensive operation. Index nodes may need to be merged upon DELETE. Nodes may need to be split due to INSERT. After many REPLACE INTO executions, it is most probable that your index is more fragmented than it would have been, had you used SELECT/UPDATE or INSERT INTO &#8230; ON DUPLICATE KEY UPDATE.</p>
<p>Also, there&#8217;s the notion of &#8220;well, if the row isn&#8217;t there, we create it. If it&#8217;s there, it simply gets updated&#8221;. This is false. The row doesn&#8217;t just get updated, it is completely removed. The problem is, if there&#8217;s a PRIMARY KEY on that table, and the REPLACE INTO does not specify a value for the PRIMARY KEY (for example, it&#8217;s an AUTO_INCREMENT column), the new row gets a different value, and this may not be what you were looking for in terms of behavior.</p>
<p>Many uses of REPLACE INTO have no intention of changing PRIMARY KEY (or other UNIQUE KEY) values. In that case, it&#8217;s better left alone. On a production system I&#8217;ve seen, changing REPLACE INTO to INSERT INTO &#8230; ON DUPLICATE KEY UPDATE resulted in a tenfold increase in throughput (measured in queries per second) and a drastic decrease in IO operations and in load average.</p>
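<p>To illustrate the difference, here is a small sketch. The `page_hits` table and its columns are hypothetical and made up for this example:</p>
<blockquote>
<pre>CREATE TABLE page_hits (
  id INT UNSIGNED AUTO_INCREMENT,
  url VARCHAR(100) CHARSET ascii NOT NULL,
  hits INT UNSIGNED NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY (url)
) ENGINE=InnoDB;

INSERT INTO page_hits (url, hits) VALUES ('/index.html', 1);

-- UNIQUE KEY violation on url: the existing row is deleted and a new
-- row is inserted, which gets a new AUTO_INCREMENT id
REPLACE INTO page_hits (url, hits) VALUES ('/index.html', 2);

-- Same violation, but the existing row is updated in place and keeps its id
INSERT INTO page_hits (url, hits) VALUES ('/index.html', 1)
  ON DUPLICATE KEY UPDATE hits = hits + 1;

SELECT id, url, hits FROM page_hits;</pre>
</blockquote>
<p>Comparing the `id` value after the REPLACE INTO with the one after the INSERT INTO &#8230; ON DUPLICATE KEY UPDATE shows which statement kept the row and which one recreated it.</p>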
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/replace-into-think-twice/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>MySQL&#8217;s character sets and collations demystified</title>
		<link>http://code.openark.org/blog/mysql/mysqls-character-sets-and-collations-demystified</link>
		<comments>http://code.openark.org/blog/mysql/mysqls-character-sets-and-collations-demystified#comments</comments>
		<pubDate>Mon, 08 Dec 2008 06:44:24 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Data Types]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=10</guid>
		<description><![CDATA[MySQL's character sets and collations are often considered as a mystery, and many users either completely disregard them and keep with the defaults, or set everything to UTF8.

This post will attempt to shed some light on the mystery, and provide with some best practices for use with text columns with regard to character sets.]]></description>
			<content:encoded><![CDATA[<p>MySQL&#8217;s character sets and collations are often considered a mystery, and many users either completely disregard them and stick with the defaults, or set everything to UTF8.</p>
<p>This post will attempt to shed some light on the mystery, and provide some best practices for use with text columns with regard to character sets.<span id="more-10"></span></p>
<h4>Character Sets</h4>
<p>A thorough discussion of how the character sets have evolved through history is beyond the scope of this post. While the Unicode standard is gaining recognition, the &#8220;older&#8221; character sets are still around. Understanding the difference between Unicode and local character sets is crucial.</p>
<p>Consider, for example, MySQL&#8217;s <strong><code>latin1</code></strong> character set. In this character set there are 256 different characters, represented by one byte. The first 128 characters map to ASCII, the standard &#8220;ABCabc012 dot comma&#8221; set, of which most of this post is composed. The latter 128 characters in <strong><code>latin1</code></strong> are composed of West European specific characters, such as À, ë, õ, Ñ.</p>
<p>A <strong><code>Name VARCHAR(60) CHARSET latin1</code></strong> column can describe names with West European characters. But it cannot describe Russian or Hebrew names. To represent a name in Hebrew, you&#8217;d need the <strong><code>hebrew</code></strong> charset (ISO 8859-8), in which the first 128 characters are, as always, mapped to ASCII, and the latter 128 characters describe the Hebrew alphabet and punctuation marks, such as ש,ל,מ,ה. The Cyrillic, Arabic and Turkish charsets follow in a similar manner.</p>
<p>Assume now I&#8217;m building a world wide web application, such as a popular social network. I would like to store the first names of my users, in every possible language. None of the above character sets support all languages. I therefore turn to <a title="What is Unicode" href="http://www.unicode.org/standard/WhatIsUnicode.html">Unicode</a>. In particular, MySQL supports <strong><code>utf8</code></strong>, a Unicode encoding scheme, which is commonly used due to its economical storage requirements.</p>
<p>In Unicode there is a dedicated number for each letter in the known languages and in ancient languages. Some imaginary or otherwise nonexistent languages, such as Klingon (yes, I know there are people who actually speak Klingon), may yet find their way into the standard.</p>
<p>UTF8 (or utf8), a Unicode encoding scheme, states the following: for ASCII characters, such as &#8216;a&#8217;, &#8216;6&#8217;, &#8216;$&#8217;, only one byte of storage is required. For Hebrew, Cyrillic or Turkish characters, 2 bytes are required. For Japanese, Chinese &#8211; more (MySQL supports up to 3 bytes per character). Again, the exact details of the implementation are beyond the scope of this post, and are well described <a title="UTF-8 and Unicode FAQ" href="http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8">here</a> and <a title="Wikipedia - UTF-8" href="http://en.wikipedia.org/wiki/UTF-8">here.</a></p>
<p>What&#8217;s important to me is that I can define <strong><code>Name VARCHAR(30) CHARSET utf8</code></strong> for my columns, and Voila! Any name can be represented in my database.</p>
<h4>So why not define everything as utf8 and get done with it?</h4>
<p>Well, it just so happens that Unicode comes with a price. See, for example, the following column definition:</p>
<blockquote><p><code>CountryCode CHAR(3) CHARSET utf8</code></p></blockquote>
<p>We are asking for a column with 3 characters exactly. The required storage for this column will be such that any 3-letter name must fit in. This means (3 characters) times (3 bytes per character) = 9 bytes of storage. So <strong><code>CHAR</code></strong> and <strong><code>utf8</code></strong> together may be less than ideal. <strong><code>VARCHAR</code></strong> behaves better: it only requires as many bytes per character as described above. So the text &#8220;abc&#8221; will only require 3 bytes (plus <strong><code>VARCHAR</code></strong>&#8217;s leading 1 or 2 bytes).</p>
<h4>Why don&#8217;t we drop the &#8216;CHAR&#8217; altogether, then, and use only &#8216;VARCHAR&#8217;?</h4>
<p>Because some values are simply better represented with <strong><code>CHAR</code></strong>: consider a &#8220;password&#8221; column, encoded with MD5. The <strong><code>MD5()</code></strong> function returns a 32 characters long text. It&#8217;s always 32 characters, and, moreover, it&#8217;s always in ASCII. The best data type and character set definition would be <strong><code>password CHAR(32) CHARSET ascii</code></strong>. We thus ensure exactly 32 bytes are allocated to this column. A <strong><code>VARCHAR</code></strong> will acquire an additional byte or two, depending on its defined length, which will indicate the length of the text.</p>
<h4>And why would I care about collations?</h4>
<p>Collations deal with text comparison. The default character set in MySQL is <strong><code>latin1</code></strong>. The default collation is <strong><code>latin1_swedish_ci</code></strong>. In this collation the following holds true: <strong><code>'ABC' = 'abc'</code></strong>.</p>
<p>Wait. What?</p>
<p>Look at the &#8220;ci&#8221; in <strong><code>latin1_swedish_ci</code></strong>. It stands for &#8220;case insensitive&#8221;. Collations which end with &#8220;cs&#8221; or &#8220;bin&#8221; are case sensitive. The <strong><code>utf8</code></strong> character set comes with the <strong><code>utf8_general_ci</code></strong> collation. This can make sense. Let&#8217;s review our web application table (I&#8217;m using plain text passwords here, bear with me for this example):</p>
<blockquote>
<pre>CREATE TABLE my_users (
  name VARCHAR(30) CHARSET utf8 COLLATE utf8_general_ci,
  plainPassword VARCHAR(16) CHARSET ascii,
  UNIQUE KEY (name)
);
INSERT INTO my_users (name, plainPassword) VALUES ('David', 'mypass');</pre>
</blockquote>
<p>It holds true that the name &#8216;David&#8217; equals &#8216;david&#8217;. If I were to <strong><code>SELECT * FROM my_users WHERE name='david'</code></strong>, I would find the desired row. The unique key will also guarantee that no daVID user can be added.</p>
<p>But David certainly wouldn&#8217;t want users to log in with the password &#8216;MYPASS&#8217;. So we refine our table:</p>
<blockquote>
<pre>CREATE TABLE my_users (
  name VARCHAR(30) CHARSET utf8 COLLATE utf8_general_ci,
  plainPassword VARCHAR(16) CHARSET ascii COLLATE ascii_bin,
  UNIQUE KEY (name)
);</pre>
</blockquote>
<p>The <strong><code>ascii_bin</code></strong> collation is a case sensitive collation for <strong><code>ascii</code></strong>. The following will not find anything:</p>
<blockquote><p><code>SELECT * FROM my_users WHERE name='david' AND plainPassword='MYPASS';</code></p></blockquote>
<p>Holding a plain text password in your database is not a best practice, but apparently it&#8217;s common.</p>
<p>Collations also deal with text ordering. For any two strings, the collation determines which is larger, or if they are equal. Probably the most common situation in which you see collations in action is when you <strong>ORDER BY</strong> a text column.</p>
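<p>Both roles of a collation can be tried out directly. In the sketch below, the <code>_latin1</code> introducers are there so that the comparison does not depend on the connection&#8217;s character set; the first comparison should yield 1 and the second 0. The second query assumes the <code>my_users</code> table from above, and overrides the column&#8217;s collation for one particular sort:</p>
<blockquote>
<pre>-- Case insensitive vs. binary comparison of the same two strings
SELECT _latin1'ABC' = _latin1'abc' COLLATE latin1_swedish_ci AS ci_equal,
       _latin1'ABC' = _latin1'abc' COLLATE latin1_bin AS bin_equal;

-- Sort by byte values rather than by the column's utf8_general_ci collation
SELECT name FROM my_users ORDER BY name COLLATE utf8_bin;</pre>
</blockquote>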
<h4>Also keep in mind</h4>
<ul>
<li>When you check for length of strings, do you use the <strong><code>LENGTH()</code></strong> function, as in <strong><code>SELECT LENGTH(Name) FROM City</code></strong>? You probably wish to replace this with <strong><code>CHAR_LENGTH()</code></strong>. <strong><code>LENGTH()</code></strong> returns the number of bytes required for the text storage. <strong><code>CHAR_LENGTH()</code></strong> returns the number of characters in the text, and is usually what you are looking for. It may hold true that for a string s, <strong><code>LENGTH(s)=12</code></strong> and <strong><code>CHAR_LENGTH(s)=8</code></strong>. Watch out for these glitches.</li>
<li>You can convert text between character sets with <strong><code>CONVERT</code></strong>. For example: <strong><code>CONVERT(s USING utf8)</code></strong></li>
<li>Stored routines should not be overlooked. If your stored routine accepts a text argument, or if your stored function returns one, make sure the character sets are properly defined. If not, then your utf8 text may be converted to latin1 during the call to your stored routine. This also applies to local parameters within the stored routines.</li>
<li>An <strong><code>ALTER TABLE <em>&lt;some table&gt;</em> CONVERT TO CHARACTER SET <em>&lt;some charset&gt;</em></code></strong> will change the character set not only for the table itself, but also for all existing textual columns.</li>
</ul>
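<p>The difference between <strong><code>LENGTH()</code></strong> and <strong><code>CHAR_LENGTH()</code></strong> can be seen with a short sketch. The string below consists of four Hebrew letters, given as their UTF-8 bytes so that the example does not depend on the client&#8217;s character set; the variable name <code>@s</code> is arbitrary:</p>
<blockquote>
<pre>-- Four Hebrew letters, 2 bytes each in utf8
SET @s := CONVERT(UNHEX('D7A9D79CD795D79D') USING utf8);

-- LENGTH() counts bytes (8 here), CHAR_LENGTH() counts characters (4 here)
SELECT LENGTH(@s) AS bytes, CHAR_LENGTH(@s) AS characters;</pre>
</blockquote>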
<p>See the following post: <a title="Useful database analysis queries with INFORMATION_SCHEMA" href="http://code.openark.org/blog/mysql/useful-database-analysis-queries-with-information_schema">Useful database analysis queries with INFORMATION_SCHEMA</a> for queries which diagnose your databases&#8217; character sets.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/mysqls-character-sets-and-collations-demystified/feed</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Dynamic sequencing with a single query</title>
		<link>http://code.openark.org/blog/mysql/dynamic-sequencing-with-a-single-query</link>
		<comments>http://code.openark.org/blog/mysql/dynamic-sequencing-with-a-single-query#comments</comments>
		<pubDate>Wed, 03 Dec 2008 17:59:14 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=271</guid>
		<description><![CDATA[It is a known trick to use a session variable for dynamically counting/sequencing rows. The way to go is to SET a variable to zero, then use arithmetic within assignment to increment its value for each row in the SELECTed rows.

But can it be achieved with one query only? That's more of a problem... I'll provide such a solution, albeit not a pretty one.]]></description>
			<content:encoded><![CDATA[<p>It is a known trick to use a session variable for dynamically counting/sequencing rows. The way to go is to SET a variable to zero, then use arithmetic within assignment to increment its value for each row in the SELECTed rows.</p>
<p>For example, the following query lists the top 10 populated countries, using <a title="MySQL's world database setup" href="http://dev.mysql.com/doc/world-setup/en/world-setup.html">MySQL&#8217;s world database</a>:</p>
<blockquote>
<pre>SELECT Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+--------------------+------------+
| Code | Name               | Population |
+------+--------------------+------------+
| CHN  | China              | 1277558000 |
| IND  | India              | 1013662000 |
| USA  | United States      |  278357000 |
| IDN  | Indonesia          |  212107000 |
| BRA  | Brazil             |  170115000 |
| PAK  | Pakistan           |  156483000 |
| RUS  | Russian Federation |  146934000 |
| BGD  | Bangladesh         |  129155000 |
| JPN  | Japan              |  126714000 |
| NGA  | Nigeria            |  111506000 |
+------+--------------------+------------+</pre>
</blockquote>
<p>The results do not provide any sequence number. Nor does the table have an AUTO_INCREMENT or otherwise unique row number. If I were to rate the countries by population, the common trick is:</p>
<blockquote>
<pre>SET @rank := 0;
SELECT
  @rank := @rank+1 AS rank,
  Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|    1 | CHN  | China              | 1277558000 |
|    2 | IND  | India              | 1013662000 |
|    3 | USA  | United States      |  278357000 |
|    4 | IDN  | Indonesia          |  212107000 |
|    5 | BRA  | Brazil             |  170115000 |
|    6 | PAK  | Pakistan           |  156483000 |
|    7 | RUS  | Russian Federation |  146934000 |
|    8 | BGD  | Bangladesh         |  129155000 |
|    9 | JPN  | Japan              |  126714000 |
|   10 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+</pre>
</blockquote>
<p>The first query sets the @rank to zero, so that it is not NULL (since no arithmetic can be done with NULL).  The second query relies on its success.</p>
<p><strong>Can the same be achieved with one query only? </strong>That&#8217;s more of a problem.<span id="more-271"></span> To try this out, I <em>log out</em> (important, otherwise @rank still has its previous value) from my client, log in again, and try the following:</p>
<blockquote>
<pre>SELECT
  @rank := IFNULL(@rank,0)+1 AS rank,
  Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|    1 | CHN  | China              | 1277558000 |
|    1 | IND  | India              | 1013662000 |
|    1 | USA  | United States      |  278357000 |
|    1 | IDN  | Indonesia          |  212107000 |
|    1 | BRA  | Brazil             |  170115000 |
|    1 | PAK  | Pakistan           |  156483000 |
|    1 | RUS  | Russian Federation |  146934000 |
|    1 | BGD  | Bangladesh         |  129155000 |
|    1 | JPN  | Japan              |  126714000 |
|    1 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+</pre>
</blockquote>
<p>Ooops. When a session variable is NULL, it only gets assigned <em>after the query completes</em>, instead of per row. (For fun, try running the above query again in the same session, and see what values you get for @rank).</p>
<p>I do not know the reason for this behavior. I don&#8217;t even know if it&#8217;s intended. But I do want to make a workaround. So I try by using various techniques:</p>
<blockquote>
<pre>SELECT
  @rank := CASE @rank WHEN NULL THEN 0 ELSE @rank + 1 END AS rank,
  Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
| NULL | CHN  | China              | 1277558000 |
| NULL | IND  | India              | 1013662000 |
| NULL | USA  | United States      |  278357000 |
| NULL | IDN  | Indonesia          |  212107000 |
| NULL | BRA  | Brazil             |  170115000 |
| NULL | PAK  | Pakistan           |  156483000 |
| NULL | RUS  | Russian Federation |  146934000 |
| NULL | BGD  | Bangladesh         |  129155000 |
| NULL | JPN  | Japan              |  126714000 |
| NULL | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+</pre>
</blockquote>
<p>Well, that wouldn&#8217;t work since NULL compared with NULL returns NULL, right? Let&#8217;s try another:</p>
<blockquote>
<pre>SELECT
  @rank := CASE WHEN @rank IS NULL THEN 0 ELSE @rank + 1 END AS rank,
  Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|    0 | CHN  | China              | 1277558000 |
|    0 | IND  | India              | 1013662000 |
|    0 | USA  | United States      |  278357000 |
|    0 | IDN  | Indonesia          |  212107000 |
|    0 | BRA  | Brazil             |  170115000 |
|    0 | PAK  | Pakistan           |  156483000 |
|    0 | RUS  | Russian Federation |  146934000 |
|    0 | BGD  | Bangladesh         |  129155000 |
|    0 | JPN  | Japan              |  126714000 |
|    0 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+</pre>
</blockquote>
<p>We can go on like this (and I did), trying to force the session variable to 0 after the first row. One can try nested assignment, selecting from DUAL, using IF, NULLIF and more. Still, if the variable is NULL, MySQL will only set it <em>after</em> the query completes. A solution is to force the variable to zero before the query begins. I will use a UNION ALL, in which the first part sets @rank and the second performs the query. Since it&#8217;s a UNION, I need to have the same number of columns in both parts. Moreover, since I&#8217;m ORDERing by Population, a column named `Population` must exist in the first part. This leads to the following query:</p>
<blockquote>
<pre>SELECT NULL AS rank, NULL AS Code, NULL AS Name, NULL AS Population
  FROM DUAL WHERE (@rank := 0)&lt;0
UNION ALL
SELECT @rank := @rank + 1 AS rank, Code, Name, Population
  FROM Country ORDER BY Population DESC LIMIT 10

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|   94 | CHN  | China              | 1277558000 |
|   72 | IND  | India              | 1013662000 |
|  229 | USA  | United States      |  278357000 |
|   71 | IDN  | Indonesia          |  212107000 |
|   29 | BRA  | Brazil             |  170115000 |
|  152 | PAK  | Pakistan           |  156483000 |
|  226 | RUS  | Russian Federation |  146934000 |
|   19 | BGD  | Bangladesh         |  129155000 |
|   82 | JPN  | Japan              |  126714000 |
|  146 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+</pre>
<p>The first query in the UNION should not return any rows, hence the impossible (@rank := 0)&lt;0 condition.</p></blockquote>
<p>Well, the rank has numbers all right, but what kind of numbers are these? Apparently the ranking took place <em>before</em> the ORDER BY. Not giving up, we try one more time:</p>
<blockquote>
<pre>SELECT NULL AS rank, NULL AS Code, NULL AS Name, NULL AS Population
  FROM DUAL WHERE (@rank := 0)&lt;0
UNION ALL
SELECT @rank := @rank + 1 AS rank, Code, Name, Population
  FROM (SELECT Code, Name, Population
    FROM Country ORDER BY Population DESC LIMIT 10) AS c

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|    1 | CHN  | China              | 1277558000 |
|    2 | IND  | India              | 1013662000 |
|    3 | USA  | United States      |  278357000 |
|    4 | IDN  | Indonesia          |  212107000 |
|    5 | BRA  | Brazil             |  170115000 |
|    6 | PAK  | Pakistan           |  156483000 |
|    7 | RUS  | Russian Federation |  146934000 |
|    8 | BGD  | Bangladesh         |  129155000 |
|    9 | JPN  | Japan              |  126714000 |
|   10 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+</pre>
</blockquote>
<p>Now we&#8217;ve got it!</p>
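<p>The same two ideas (initialize @rank before any row is read, and number the rows only after they have been sorted) can also be sketched without the UNION ALL, by doing the initialization in a one-row derived table. This is an untested variant rather than a result from the experiments above. The alias <code>init</code> is made up for illustration, and the query assumes that MySQL evaluates the derived tables before the outer select list and keeps the subquery&#8217;s row order, neither of which is guaranteed:</p>
<blockquote>
<pre>-- init is a hypothetical one-row derived table whose only job is to set @rank to 0
SELECT @rank := @rank + 1 AS rank, Code, Name, Population
  FROM (SELECT @rank := 0) AS init,
    (SELECT Code, Name, Population
      FROM Country ORDER BY Population DESC LIMIT 10) AS c;</pre>
</blockquote>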
<p>The question arises: why go through all this when a simple two-query solution is available?</p>
<p>First, as a MySQL exercise, I find this an interesting problem. Second, it just may be that you are bound to a single query. For example, reporting tools may only allow one query per report table. As another example, you may not have a sophisticated connection pool, and you are limited to sending one query per connection, hence unable to keep session variables in between.</p>
<p>If you know of other solutions, hopefully simpler ones, please comment below!</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/dynamic-sequencing-with-a-single-query/feed</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
	</channel>
</rss>
