<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>code.openark.org &#187; Indexing</title>
	<atom:link href="http://code.openark.org/blog/tag/indexing/feed" rel="self" type="application/rss+xml" />
	<link>http://code.openark.org/blog</link>
	<description>Blog by Shlomi Noach</description>
	<lastBuildDate>Wed, 01 Feb 2012 08:19:12 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>common_schema rev. 68: eval(), processlist_grantees, candidate_keys, easter_day()</title>
		<link>http://code.openark.org/blog/mysql/common_schema-rev-68-eval-processlist_grantees-candidate_keys-easter_day</link>
		<comments>http://code.openark.org/blog/mysql/common_schema-rev-68-eval-processlist_grantees-candidate_keys-easter_day#comments</comments>
		<pubDate>Tue, 06 Sep 2011 07:05:34 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[common_schema]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=3952</guid>
		<description><![CDATA[Revision 68 of common_schema is out, and includes some interesting features: eval(): Evaluates the queries generated by a given query match_grantee(): Match an existing account based on user+host processlist_grantees: Assigning of GRANTEEs for connected processes candidate_keys: Listing of prioritized candidate keys: keys which are UNIQUE, by order of best-use. easter_day(): Returns DATE of easter day [...]]]></description>
			<content:encoded><![CDATA[<p>Revision <strong>68</strong> of <a rel="nofollow" href="http://code.google.com/p/common-schema/">common_schema</a> is out, and includes some interesting features:</p>
<ul>
<li><strong>eval()</strong>: Evaluates the queries generated by a given query</li>
<li><strong>match_grantee()</strong>: Match an existing account based on user+host</li>
<li><strong>processlist_grantees</strong>: Assignment of GRANTEEs to connected processes</li>
<li><strong>candidate_keys</strong>: A prioritized listing of candidate keys: keys which are UNIQUE, ordered by best use.</li>
<li><strong>easter_day()</strong>: Returns the DATE of Easter day in the given DATETIME's year.</li>
</ul>
<p>Let's take a slightly closer look at these:</p>
<h4>eval()</h4>
<p>I've dedicated a separate blog post, <a href="http://code.openark.org/blog/mysql/mysql-eval">MySQL eval()</a>, to describing it. In short: <strong>eval()</strong> takes a query which generates queries (most commonly, queries on <strong>INFORMATION_SCHEMA</strong>) and automatically evaluates (executes) those queries. <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/general_procedures.html#eval">Read more</a></p>
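<p>As a hedged sketch of the pattern (assuming <em>common_schema</em> is installed, and using a hypothetical schema named <strong>my_schema</strong>), the first query below only lists the generated statements for review; the second hands the very same generating query to <strong>eval()</strong>, which executes each of them. Here, each generated statement converts a MyISAM table to InnoDB:</p>
<blockquote>
<pre>-- Step 1: review the statements the query generates (nothing is executed yet)
SELECT CONCAT('ALTER TABLE ', TABLE_SCHEMA, '.', TABLE_NAME, ' ENGINE=InnoDB')
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'my_schema' AND ENGINE = 'MyISAM';

-- Step 2: let eval() execute each generated statement
-- (quotes inside the string argument are escaped by doubling them)
CALL common_schema.eval('SELECT CONCAT(''ALTER TABLE '', TABLE_SCHEMA, ''.'', TABLE_NAME, '' ENGINE=InnoDB'') FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = ''my_schema'' AND ENGINE = ''MyISAM''');</pre>
</blockquote>
<p>Converting engines rebuilds entire tables, so take this as an illustration of the generate-then-evaluate flow, not as something to run blindly on a production server.</p>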
<h4>match_grantee()</h4>
<p>As presented in <a title="Link to Finding CURRENT_USER for any user" rel="bookmark" href="http://code.openark.org/blog/mysql/finding-current_user-for-any-user">Finding CURRENT_USER for any user</a>, I've developed an algorithm to match a connected user+host (as presented in <strong>PROCESSLIST</strong>) against the grantee tables (i.e. the <strong>mysql.user</strong> table), in a manner which simulates the MySQL server's account matching algorithm.</p>
<p>This is now available as a stored function: given a user+host, the function returns the best-matching grantee. <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/privileges_functions.html#match_grantee">Read more</a></p>
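<p>A minimal usage sketch, assuming the function takes the connection's user and host as two arguments (check the documentation linked above for the exact signature). Since <strong>PROCESSLIST</strong> reports the host as <em>host:port</em>, the sketch strips the port before matching, to be on the safe side:</p>
<blockquote>
<pre>-- Map each connected process to the account (GRANTEE) that most likely owns it
SELECT
  ID,
  USER,
  HOST,
  common_schema.match_grantee(USER, SUBSTRING_INDEX(HOST, ':', 1)) AS matched_grantee
FROM
  INFORMATION_SCHEMA.PROCESSLIST;</pre>
</blockquote>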
<h4>processlist_grantees</h4>
<p>This view relies on the above, and maps the entire <strong>PROCESSLIST</strong> onto GRANTEEs: each process is mapped to the GRANTEE (MySQL account) which owns it. Surprisingly, MySQL does not provide this information.<span id="more-3952"></span></p>
<p>The view also provides the following useful metadata:</p>
<ul>
<li>Does the process execute with the SUPER privilege?</li>
<li>Is this a replication thread, or one serving a replicating client?</li>
<li>Is this process the current connection (myself)?</li>
</ul>
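<p>Since the view exposes <strong>GRANTEE</strong>, <strong>is_super</strong> and <strong>is_repl</strong> (as in the sample output in this post), it can also be aggregated. As a sketch using only those columns, this counts current connections per account, leaving out replication threads:</p>
<blockquote>
<pre>-- Connections per MySQL account, excluding replication threads
SELECT
  GRANTEE,
  MAX(is_super) AS has_super,
  COUNT(*) AS num_connections
FROM
  common_schema.processlist_grantees
WHERE
  is_repl = 0
GROUP BY
  GRANTEE
ORDER BY
  num_connections DESC;</pre>
</blockquote>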
<p>In the spirit of <strong>common_schema</strong>, it provides the SQL commands needed to <strong>KILL</strong> and <strong>KILL QUERY</strong> each process. A sample output:</p>
<blockquote>
<pre>mysql&gt; SELECT * FROM common_schema.processlist_grantees;
+--------+------------+---------------------+------------------------+--------------+--------------+----------+---------+-------------------+---------------------+
| ID     | USER       | HOST                | GRANTEE                | grantee_user | grantee_host | is_super | is_repl | sql_kill_query    | sql_kill_connection |
+--------+------------+---------------------+------------------------+--------------+--------------+----------+---------+-------------------+---------------------+
| 650472 | replica    | jboss00.myweb:34266 | 'replica'@'%.myweb'    | replica      | %.myweb      |        0 |       1 | KILL QUERY 650472 | KILL 650472         |
| 692346 | openarkkit | jboss02.myweb:43740 | 'openarkkit'@'%.myweb' | openarkkit   | %.myweb      |        0 |       0 | KILL QUERY 692346 | KILL 692346         |
| 842853 | root       | localhost           | 'root'@'localhost'     | root         | localhost    |        1 |       0 | KILL QUERY 842853 | KILL 842853         |
| 843443 | jboss      | jboss03.myweb:40007 | 'jboss'@'%.myweb'      | jboss        | %.myweb      |        0 |       0 | KILL QUERY 843443 | KILL 843443         |
| 843444 | jboss      | jboss03.myweb:40012 | 'jboss'@'%.myweb'      | jboss        | %.myweb      |        0 |       0 | KILL QUERY 843444 | KILL 843444         |
| 843510 | jboss      | jboss00.myweb:49850 | 'jboss'@'%.myweb'      | jboss        | %.myweb      |        0 |       0 | KILL QUERY 843510 | KILL 843510         |
| 844559 | jboss      | jboss01.myweb:37031 | 'jboss'@'%.myweb'      | jboss        | %.myweb      |        0 |       0 | KILL QUERY 844559 | KILL 844559         |
+--------+------------+---------------------+------------------------+--------------+--------------+----------+---------+-------------------+---------------------+</pre>
</blockquote>
<p>Finally, it is now possible to execute the following: “Kill all connections that are neither replication threads nor owned by users with the SUPER privilege”. To just generate the commands, execute:</p>
<blockquote>
<pre>mysql&gt; SELECT <strong>sql_kill_connection</strong> FROM <strong>common_schema.processlist_grantees</strong> WHERE is_super = 0 AND is_repl = 0;</pre>
</blockquote>
<p>Sorry, did you only want to kill the queries? Those which are very slow? Do as follows:</p>
<blockquote>
<pre>mysql&gt; SELECT <strong>sql_kill_query</strong> FROM common_schema.processlist_grantees JOIN INFORMATION_SCHEMA.PROCESSLIST <strong>USING(ID)</strong> WHERE <strong>TIME &gt; 10</strong> AND is_super = 0 AND is_repl = 0;</pre>
</blockquote>
<p>But, really, we don't just want <em>commands</em>. We really want to execute this!</p>
<p>Good! Step in <strong>eval()</strong>:</p>
<blockquote>
<pre>mysql&gt; CALL common_schema.<strong>eval</strong>('SELECT <strong>sql_kill_query</strong> FROM common_schema.processlist_grantees JOIN INFORMATION_SCHEMA.PROCESSLIST USING(id) WHERE TIME &gt; 10 AND is_super = 0 AND is_repl = 0');</pre>
</blockquote>
<p><a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/processlist_grantees.html">Read more</a></p>
<h4>candidate_keys</h4>
<p>A view which lists the candidate keys for tables and provides ranking for those keys, based on some simple heuristics.</p>
<p>This view uses the same algorithm as that used by <a href="http://openarkkit.googlecode.com/svn/trunk/openarkkit/doc/html/oak-chunk-update.html">oak-chunk-update</a> and <a href="http://openarkkit.googlecode.com/svn/trunk/openarkkit/doc/html/oak-online-alter-table.html">oak-online-alter-table</a>, tools in the <a href="http://code.openark.org/forge/openark-kit">openark kit</a>. So it provides a way to choose the best candidate key to walk through a table. Currently, a table's <strong>PRIMARY KEY</strong> is always considered to be the best, because of InnoDB's clustered index structure. But I intend to change that as well, and provide general recommendations about candidate keys (so, for example, I would be able to recommend that the <strong>PRIMARY KEY</strong> is not optimal for some table).</p>
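<p>A sketch of listing the candidate keys for a single table. The filter columns <strong>TABLE_SCHEMA</strong> and <strong>TABLE_NAME</strong>, and the <strong>sakila.film</strong> table, are assumptions for illustration; consult the view's documentation for its actual columns and ranking field:</p>
<blockquote>
<pre>-- Candidate (UNIQUE) keys for sakila.film, as ranked by the view's heuristics
SELECT *
FROM common_schema.candidate_keys
WHERE TABLE_SCHEMA = 'sakila' AND TABLE_NAME = 'film';</pre>
</blockquote>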
<p>Actually, after a discussion initiated by Giuseppe and Roland, starting <a href="http://datacharmer.blogspot.com/2011/09/finding-tables-without-primary-keys.html">here</a> and continuing by mail, there are more checks to be made for candidate keys, and I suspect the next version of <em>candidate_keys</em> will be more informative.</p>
<p><a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/candidate_keys.html">Read more</a></p>
<h4>easter_day()</h4>
<p>Many thanks to <a href="http://rpbouman.blogspot.com/">Roland Bouman</a> who contributed his code for calculating Easter day for a given year. <em>Weehee!</em> This is the first contribution to <em>common_schema</em>! <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/time_functions.html#easter_day">Read more</a></p>
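<p>A usage sketch, assuming <strong>easter_day()</strong> accepts a single DATE/DATETIME argument, as described above, and computes the Western (Gregorian) date. Western Easter Sunday 2012 fell on April 8, so that is the value one would expect from the first query:</p>
<blockquote>
<pre>-- Any date within the year will do: only its year is used
SELECT common_schema.easter_day('2012-06-15') AS easter_2012;

-- Easter day for the current year
SELECT common_schema.easter_day(NOW()) AS easter_this_year;</pre>
</blockquote>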
<h4>Get it</h4>
<p><em>common_schema</em> is an open source project. It is released under the BSD license.</p>
<p><a href="http://code.google.com/p/common-schema/">Find it here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/common_schema-rev-68-eval-processlist_grantees-candidate_keys-easter_day/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Announcing common_schema: common views &amp; routines for MySQL</title>
		<link>http://code.openark.org/blog/mysql/announcing-common_schema-common-views-routines-for-mysql</link>
		<comments>http://code.openark.org/blog/mysql/announcing-common_schema-common-views-routines-for-mysql#comments</comments>
		<pubDate>Wed, 13 Jul 2011 04:25:24 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Analysis]]></category>
		<category><![CDATA[common_schema]]></category>
		<category><![CDATA[Data Types]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[INFORMATION_SCHEMA]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Schema]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Stored routines]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=3794</guid>
		<description><![CDATA[Today I have released common_schema, a utility schema for MySQL which includes many views and functions, and is aimed to be installed on any MySQL server. What does it do? There are views answering for all sorts of useful information: stuff related to schema analysis, data dimensions, monitoring, processes &#38; transactions, security, internals... There are [...]]]></description>
			<content:encoded><![CDATA[<p>Today I have released <a title="common_schema" href="http://code.openark.org/forge/common_schema">common_schema</a>, a utility schema for MySQL which includes many views and functions, and is aimed to be installed on any MySQL server.</p>
<h4>What does it do?</h4>
<p>There are views providing all sorts of useful information: stuff related to schema analysis, data dimensions, monitoring, processes &amp; transactions, security, internals... There are basic functions serving common needs.</p>
<p>Some of the views/routines simply formalize those queries we tend to write over and over again. Others take the place of external tools, answering complex questions via SQL and metadata. Still others help out with SQL generation.</p>
<p>Here are a few highlights:</p>
<ul>
<li>Did you know you can work out <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/global_status_diff_nonzero.html">simple monitoring</a> of your server with a <em>query</em>?  There's a view to do that for you.</li>
<li>How about showing just <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/processlist_top.html">the good parts of the processlist</a>?</li>
<li>Does your schema have <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/redundant_keys.html">redundant keys</a>?</li>
<li>Or InnoDB tables with <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/no_pk_innodb_tables.html">no PRIMARY KEY</a>?</li>
<li>Is AUTO_INCREMENT <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/auto_increment_columns.html">running out of space</a>?</li>
<li>Can I get the SQL statements to <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/sql_foreign_keys.html">generate my FOREIGN KEYs</a>? To drop them?</li>
<li>And can we finally get <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/sql_show_grants.html">SHOW GRANTS for all accounts</a>, and as an <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/sql_grants.html">SQL query</a>?</li>
<li>Ever needed a <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/general_functions.html#crc64">64-bit CRC function</a>?</li>
<li>And aren't you tired of writing the cumbersome SUBSTRING_INDEX(SUBSTRING_INDEX(str, ',', 3), ',', -1)? <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/string_functions.html#split_token">There's an alternative</a>.</li>
</ul>
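<p>To give a feel for it, here are a few of the above used directly from the mysql client. This sketch assumes <em>common_schema</em> is installed; the view names come from the links above, while the argument order of <strong>split_token()</strong> (string, delimiter, 1-based token index) is an assumption to verify against its documentation:</p>
<blockquote>
<pre>-- InnoDB tables lacking a PRIMARY KEY
SELECT * FROM common_schema.no_pk_innodb_tables;

-- Keys made redundant by other keys
SELECT * FROM common_schema.redundant_keys;

-- AUTO_INCREMENT columns, to see whether any is running out of space
SELECT * FROM common_schema.auto_increment_columns;

-- Third comma-separated token. The traditional equivalent,
-- SUBSTRING_INDEX(SUBSTRING_INDEX('a,b,c,d', ',', 3), ',', -1), returns 'c'
SELECT common_schema.split_token('a,b,c,d', ',', 3);</pre>
</blockquote>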
<p>There's more. Take a look at the <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/introduction.html">common_schema documentation</a> for a full listing. And it's evolving: I've already got quite a few ideas for future components.</p>
<p>Some of these views rely on heavyweight INFORMATION_SCHEMA tables. You should be aware of the impact and <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/risks.html">risks</a>.</p>
<h4>What do I need to install?</h4>
<p>There's no script or executable file. It's just a schema. The distribution is an SQL file which generates <em>common_schema</em>, much like a dump file.</p>
<h4><span id="more-3794"></span>What are the system requirements?</h4>
<p>It's just between you and your MySQL. There are currently three distribution files, each targeting different versions of MySQL (and allowing for increased functionality):</p>
<ul>
<li><strong>common_schema_mysql_51</strong>: fits all MySQL &gt;= 5.1 distributions</li>
<li><strong>common_schema_innodb_plugin</strong>: fits MySQL &gt;= 5.1, with InnoDB plugin + INFORMATION_SCHEMA tables enabled</li>
<li><strong>common_schema_percona_server</strong>: fits Percona Server &gt;= 5.1</li>
</ul>
<p>Refer to the <a rel="nofollow" href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/download.html">documentation</a> for more details.</p>
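<p>As a sketch of the whole installation: first check what your server runs, to pick the matching distribution file, then load that file from the mysql client. The file path below is hypothetical, and the <strong>.sql</strong> extension is assumed:</p>
<blockquote>
<pre>-- Which MySQL version/flavor is this? (Percona Server identifies itself in version_comment)
SELECT @@version, @@version_comment;

-- Is the InnoDB plugin, with its INFORMATION_SCHEMA tables, loaded?
SELECT PLUGIN_NAME, PLUGIN_STATUS
FROM INFORMATION_SCHEMA.PLUGINS
WHERE PLUGIN_NAME LIKE 'INNODB%';

-- Load the chosen distribution file (a mysql client command, much like restoring a dump)
SOURCE /path/to/common_schema_mysql_51.sql

-- Verify the schema now exists
SHOW DATABASES LIKE 'common_schema';</pre>
</blockquote>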
<h4>What are the terms of use?</h4>
<p><em>common_schema</em> is released under the <a href="http://www.opensource.org/licenses/bsd-license.php">BSD license</a>.</p>
<h4>Where can I download it?</h4>
<p>On the <a href="http://code.google.com/p/common-schema/">common_schema project page</a>. Enjoy it!</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/announcing-common_schema-common-views-routines-for-mysql/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Reasons to use AUTO_INCREMENT columns on InnoDB</title>
		<link>http://code.openark.org/blog/mysql/reasons-to-use-auto_increment-columns-on-innodb</link>
		<comments>http://code.openark.org/blog/mysql/reasons-to-use-auto_increment-columns-on-innodb#comments</comments>
		<pubDate>Tue, 22 Mar 2011 06:31:18 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[Schema]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=3196</guid>
		<description><![CDATA[An InnoDB table must have a primary key (one is created if you don't do it yourself). You may have a natural key at hand. Stop! Allow me to suggest an AUTO_INCREMENT may be better. Why should one add an AUTO_INCREMENT PRIMARY KEY on a table on which there's a natural key? Isn't an AUTO_INCREMENT [...]]]></description>
			<content:encoded><![CDATA[<p>An InnoDB table must have a primary key (one is created if you don't do it yourself). You may have a <a href="http://en.wikipedia.org/wiki/Natural_key">natural key</a> at hand. Stop! Allow me to suggest an AUTO_INCREMENT may be better.</p>
<p>Why should one add an AUTO_INCREMENT PRIMARY KEY on a table on which there's a natural key? Isn't an AUTO_INCREMENT a pseudo key, meaning, it doesn't have any explicit relation to the row data, other than it is a number and unique?</p>
<p>Yes, indeed so. Nevertheless, consider:</p>
<ul>
<li>Natural keys are often multi-column.</li>
<li>Multi-column PRIMARY KEYs make for larger keys, and for bloated secondary keys as well. You may be wasting space on storing the additional AUTO_INCREMENT column, but you may gain space back on secondary keys.</li>
<li>Multi-column PRIMARY KEYs make for more locks. See also <a href="http://code.openark.org/blog/mysql/reducing-locks-by-narrowing-primary-key">this post</a>.</li>
<li>InnoDB INSERTs work considerably faster when rows are inserted in ascending PRIMARY KEY order. Can you ensure your natural key follows such order?</li>
<li>Even though an AUTO_INCREMENT makes for an INSERT bottleneck (values must be assigned serially), it is particularly helpful to InnoDB by ensuring PRIMARY KEY values are in ascending order.</li>
<li>AUTO_INCREMENT makes for chronological resolution. You <em>know</em> what came first, and what came next.</li>
<li>In many datasets, more recent entries are accessed more often, and are therefore "hotter". By using AUTO_INCREMENT, you're ensuring that recent entries are grouped together within the B+ Tree. This means less random I/O when looking for recent data.</li>
<li>A numerical key is particularly helpful in splitting your table (and tasks on your table) into smaller chunks. I write <a href="http://code.google.com/p/openarkkit/">tools</a> which can work with any PRIMARY KEY combination, but it's easier to work with numbers.</li>
</ul>
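<p>A minimal sketch of the approach, using a hypothetical <strong>daily_film_stats</strong> table whose natural key is (film_id, store_id, stat_date). The natural key is kept as a UNIQUE constraint, while a compact AUTO_INCREMENT column serves as PRIMARY KEY. Since InnoDB secondary indexes store the PRIMARY KEY columns, the index on stat_date then carries a 4-byte id rather than three natural key columns; recent rows get the highest ids; and the table can be walked in numeric chunks:</p>
<blockquote>
<pre>CREATE TABLE daily_film_stats (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  film_id SMALLINT UNSIGNED NOT NULL,
  store_id TINYINT UNSIGNED NOT NULL,
  stat_date DATE NOT NULL,
  num_rentals INT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (id),                                            -- surrogate, ascending key
  UNIQUE KEY natural_key_uidx (film_id, store_id, stat_date),  -- natural key kept as a constraint
  KEY stat_date_idx (stat_date)                                -- entries hold (stat_date, id)
) ENGINE=InnoDB;

INSERT INTO daily_film_stats (film_id, store_id, stat_date, num_rentals) VALUES
  (1, 1, '2011-03-20', 4),
  (1, 2, '2011-03-20', 2),
  (2, 1, '2011-03-21', 7);

-- Chronological resolution: the latest rows have the highest ids
SELECT * FROM daily_film_stats ORDER BY id DESC LIMIT 2;

-- Chunking: find the id range, then work through it one numeric range at a time
SELECT MIN(id), MAX(id) FROM daily_film_stats;
UPDATE daily_film_stats SET num_rentals = num_rentals + 1 WHERE id BETWEEN 1 AND 1000;</pre>
</blockquote>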
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/reasons-to-use-auto_increment-columns-on-innodb/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Multi condition UPDATE query</title>
		<link>http://code.openark.org/blog/mysql/multi-condition-update-query</link>
		<comments>http://code.openark.org/blog/mysql/multi-condition-update-query#comments</comments>
		<pubDate>Thu, 27 Jan 2011 08:30:24 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2401</guid>
		<description><![CDATA[A simple question I've been asked: Is it possible to merge two UPDATE queries, each on different WHERE conditions, into a single query? For example, is it possible to merge the following two UPDATE statements into one? mysql&#62; UPDATE film SET rental_duration=rental_duration+1 WHERE rating = 'G'; Query OK, 178 rows affected (0.01 sec) mysql&#62; UPDATE [...]]]></description>
			<content:encoded><![CDATA[<p>A simple question I've been asked:</p>
<p>Is it possible to merge two <strong>UPDATE</strong> queries, each on different <strong>WHERE</strong> conditions, into a single query?</p>
<p>For example, is it possible to merge the following two <strong>UPDATE</strong> statements into one?</p>
<blockquote>
<pre>mysql&gt; <strong>UPDATE</strong> film <strong>SET</strong> rental_duration=rental_duration+1 <strong>WHERE</strong> rating = 'G';
Query OK, 178 rows affected (0.01 sec)

mysql&gt; <strong>UPDATE</strong> film <strong>SET</strong> rental_rate=rental_rate-0.5 <strong>WHERE</strong> length &lt; 90;
Query OK, 320 rows affected (0.01 sec)
</pre>
</blockquote>
<p>To verify our results, we take a checksum:</p>
<blockquote>
<pre>mysql&gt; pager md5sum
PAGER set to 'md5sum'
mysql&gt; <strong>SELECT</strong> film_id, title, rental_duration, rental_rate <strong>FROM</strong> film <strong>ORDER BY</strong> film_id;
c2d253c3919efaa6d11487b1fd5061f3  -
</pre>
</blockquote>
<p>Obviously, the following query is <strong>incorrect</strong>:<span id="more-2401"></span></p>
<blockquote>
<pre>mysql&gt; <strong>UPDATE</strong> film <strong>SET</strong> rental_duration=rental_duration+1, rental_rate=rental_rate-0.5  <strong>WHERE</strong> rating = 'G' <strong>OR</strong> length &lt; 90;
Query OK, 431 rows affected (0.03 sec)

mysql&gt; pager md5sum
PAGER set to 'md5sum'
mysql&gt; <strong>SELECT</strong> film_id, title, rental_duration, rental_rate <strong>FROM</strong> film <strong>ORDER BY</strong> film_id;
09d450806e2cd7fa78a83ac5bef72d2b  -
</pre>
</blockquote>
<h4>Motivation</h4>
<p>Why would you want to do that?</p>
<ul>
<li>While it may seem strange, the merge can be logically (application-wise) perfectly reasonable.</li>
<li>The <strong>UPDATE</strong> may be time consuming: perhaps it requires a full table scan on a large table. Doing it with one scan is faster than doing it with two.</li>
</ul>
<h4>The solution</h4>
<p>Make each <strong>SET</strong> clause conditional, and optionally drop the <strong>WHERE</strong> conditions.</p>
<blockquote>
<pre><strong>UPDATE</strong>
 film
<strong>SET</strong>
 rental_duration=<strong>IF</strong>(rating = 'G', rental_duration+1, rental_duration),
 rental_rate=<strong>IF</strong>(length &lt; 90, rental_rate-0.5, rental_rate)
;

mysql&gt; pager md5sum
PAGER set to 'md5sum'
mysql&gt; <strong>SELECT</strong> film_id, title, rental_duration, rental_rate <strong>FROM</strong> film <strong>ORDER BY</strong> film_id;
c2d253c3919efaa6d11487b1fd5061f3  -
</pre>
</blockquote>
<p>The above query necessarily does a full table scan. If there's a benefit to using indexes in the <strong>WHERE</strong> clause, it may still be applied, using an <strong>OR</strong> condition:</p>
<blockquote>
<pre><strong>UPDATE</strong>
 film
<strong>SET</strong>
 rental_duration=<strong>IF</strong>(rating = 'G', rental_duration+1, rental_duration),
 rental_rate=<strong>IF</strong>(length &lt; 90, rental_rate-0.5, rental_rate)
<strong>WHERE</strong>
 rating = 'G'
 OR length &lt; 90
;
</pre>
</blockquote>
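<p>The technique can be checked without the sakila database. Below is a self-contained sketch on a hypothetical temporary table with four rows, one per combination of the two conditions. Note that MySQL generally evaluates single-table <strong>UPDATE</strong> assignments from left to right, so a later <strong>IF()</strong> would see a value changed by an earlier assignment; here each condition tests only columns that are not being assigned (rating, length), so the order does not matter:</p>
<blockquote>
<pre>CREATE TEMPORARY TABLE film_demo (
  film_id INT NOT NULL PRIMARY KEY,
  rating VARCHAR(5) NOT NULL,
  length SMALLINT UNSIGNED NOT NULL,
  rental_duration TINYINT UNSIGNED NOT NULL,
  rental_rate DECIMAL(4,2) NOT NULL
);

INSERT INTO film_demo VALUES
  (1, 'G',  80, 3, 2.99),  -- both conditions hold
  (2, 'G', 120, 3, 2.99),  -- rating only
  (3, 'R',  85, 5, 4.99),  -- length only
  (4, 'R', 150, 5, 4.99);  -- neither

UPDATE
  film_demo
SET
  rental_duration = IF(rating = 'G', rental_duration + 1, rental_duration),
  rental_rate = IF(length &lt; 90, rental_rate - 0.5, rental_rate)
WHERE
  rating = 'G'
  OR length &lt; 90;

-- Expected: (1, 4, 2.49), (2, 4, 2.99), (3, 5, 4.49), (4, 5, 4.99)
SELECT film_id, rental_duration, rental_rate FROM film_demo ORDER BY film_id;</pre>
</blockquote>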
<p>If there is a computational overhead to the <strong>IF()</strong> function, I have not noticed it. This kind of solution plays well when each of the separate queries requires a full scan of a large table.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/multi-condition-update-query/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Simple guideline for choosing appropriate InnoDB PRIMARY KEYs</title>
		<link>http://code.openark.org/blog/mysql/simple-guideline-for-choosing-appropriate-innodb-primary-keys</link>
		<comments>http://code.openark.org/blog/mysql/simple-guideline-for-choosing-appropriate-innodb-primary-keys#comments</comments>
		<pubDate>Thu, 21 Oct 2010 05:52:45 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[InnoDB]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2104</guid>
		<description><![CDATA[Risking some flames, I'd like to suggest only two options for choosing PRIMARY KEYs for InnoDB tables. I suggest they should cover 99% (throwing numbers around) of cases. PRIMARY KEY cases An integer (SMALLINT / INT / BIGINT), possibly AUTO_INCREMENT column. The combination of two columns on a many-to-many connecting table (e.g. film_actor, which connects [...]]]></description>
			<content:encoded><![CDATA[<p>Risking some flames, I'd like to suggest only two options for choosing <strong>PRIMARY KEY</strong>s for InnoDB tables. I suggest they should cover 99% (throwing numbers around) of cases.</p>
<h4>PRIMARY KEY cases</h4>
<ol>
<li>An integer (SMALLINT / INT / BIGINT), possibly <strong>AUTO_INCREMENT</strong> column.</li>
<li>The combination of two columns on a many-to-many connecting table (e.g. <strong>film_actor</strong>, which connects <strong>film</strong>s to <strong>actor</strong>s), the two columns being the <strong>PRIMARY KEY</strong>s of respective data tables. This rule may be extended to 3-way relation tables.</li>
</ol>
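<p>A sketch of both cases, on hypothetical tables loosely modeled after sakila's <strong>actor</strong>, <strong>film</strong> and <strong>film_actor</strong>:</p>
<blockquote>
<pre>-- Case 1: integer AUTO_INCREMENT PRIMARY KEYs on the data tables
CREATE TABLE pk_actor (
  actor_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  last_name VARCHAR(45) NOT NULL,
  PRIMARY KEY (actor_id)
) ENGINE=InnoDB;

CREATE TABLE pk_film (
  film_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  title VARCHAR(255) NOT NULL,
  PRIMARY KEY (film_id)
) ENGINE=InnoDB;

-- Case 2: many-to-many connecting table. The two parent keys form the PRIMARY KEY,
-- which doubles as the UNIQUE constraint on (actor, film) pairs
CREATE TABLE pk_film_actor (
  actor_id SMALLINT UNSIGNED NOT NULL,
  film_id SMALLINT UNSIGNED NOT NULL,
  PRIMARY KEY (actor_id, film_id),
  KEY film_id_idx (film_id),  -- lookups from the film side; InnoDB appends actor_id
  FOREIGN KEY (actor_id) REFERENCES pk_actor (actor_id),
  FOREIGN KEY (film_id) REFERENCES pk_film (film_id)
) ENGINE=InnoDB;</pre>
</blockquote>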
<p>A short recap: an InnoDB table must have a <strong>PRIMARY KEY</strong>. InnoDB will pick one if you don't offer it. It can pick a really bad <strong>UNIQUE KEY</strong> (e.g. <strong>website_url(255)</strong>) or make one up using InnoDB internal row ids. If you don't have a good candidate, an <strong>AUTO_INCREMENT PRIMARY KEY</strong> is probably the easiest way out.</p>
<p>A 2-column combination for a many-to-many connection table is common and viable. The <strong>PRIMARY KEY</strong> will not only provide a good join access method, but also the required <strong>UNIQUE</strong> constraint.</p>
<p>An integer-based <strong>PRIMARY KEY</strong> will make for more compact &amp; shallow index tree structures, which leads to less I/O and page reads.</p>
<p>An <strong>AUTO_INCREMENT</strong> will allow for <strong>INSERT</strong>s in ascending <strong>PRIMARY KEY</strong> order, which is InnoDB-friendly: index pages will be better utilized and less fragmented.<span id="more-2104"></span></p>
<h4>Exceptions</h4>
<ul>
<li><strong>You have a partitioned table, e.g. on date range.</strong> With partitioned tables, every UNIQUE KEY, including the PRIMARY KEY, must include the partitioning columns. In such a case you will have to extend the PRIMARY KEY.</li>
<li><strong>The only key on your table is a unique constraint on some column, e.g. UNIQUE KEY (url).</strong> On one hand, it seems wasteful to create <em>another</em> column (e.g. AUTO_INCREMENT) to use as PRIMARY KEY. On the other hand, I've seen many cases where this kind of PK didn't hold up. At some point there was need for another index. Or some method had to be devised for chunking up table data (<a href="http://code.openark.org/forge/openark-kit/oak-chunk-update">oak-chunk-update</a> can do that even with non-integer PKs). I'm reluctant to use such keys as PRIMARY.</li>
<li>I'm sure there are others.</li>
</ul>
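<p>To illustrate the partitioning exception, a sketch with a hypothetical <strong>event_log</strong> table partitioned by date range. Declaring <strong>PRIMARY KEY (id)</strong> alone would be rejected, since every unique key must include all columns used by the partitioning expression, so the key is extended with <strong>created_at</strong>:</p>
<blockquote>
<pre>CREATE TABLE event_log (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  created_at DATE NOT NULL,
  message VARCHAR(255) NOT NULL,
  -- PRIMARY KEY (id) alone is rejected: the partitioning column must be included
  PRIMARY KEY (id, created_at)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(created_at)) (
  PARTITION p2010 VALUES LESS THAN (TO_DAYS('2011-01-01')),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);</pre>
</blockquote>
<p>Keeping <strong>id</strong> as the first PRIMARY KEY column preserves the ascending, InnoDB-friendly insert order.</p>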
<h4>Umm...</h4>
<p>I wrote the draft for this post a while ago. And then came <a href="http://mituzas.lt/2010/07/30/on-primary-keys/">Domas</a> and ruined it. <a href="http://bugs.mysql.com/bug.php?id=55656">Wait for</a> <strong>5.1.52</strong>?</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/simple-guideline-for-choosing-appropriate-innodb-primary-keys/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Thoughts and ideas for Online Schema Change</title>
		<link>http://code.openark.org/blog/mysql/thoughts-and-ideas-for-online-schema-change</link>
		<comments>http://code.openark.org/blog/mysql/thoughts-and-ideas-for-online-schema-change#comments</comments>
		<pubDate>Thu, 07 Oct 2010 08:29:10 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[INFORMATION_SCHEMA]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[openark kit]]></category>
		<category><![CDATA[Opinions]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[Schema]]></category>
		<category><![CDATA[scripts]]></category>
		<category><![CDATA[Triggers]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=3005</guid>
		<description><![CDATA[Here's a few thoughts on current status and further possibilities for Facebook's Online Schema Change (OSC) tool. I've had these thoughts for months now, pondering over improving oak-online-alter-table but haven't got around to implement them nor even write them down. Better late than never. The tool has some limitations. Some cannot be lifted, some could. [...]]]></description>
			<content:encoded><![CDATA[<p>Here's a few thoughts on current status and further possibilities for Facebook's <a href="http://www.facebook.com/note.php?note_id=430801045932">Online Schema Change</a> (OSC) tool. I've had these thoughts for months now, pondering over improving <a href="../../forge/openark-kit/oak-online-alter-table">oak-online-alter-table</a> but haven't got around to implement them nor even write them down. Better late than never.</p>
<p>The tool has some limitations. Some cannot be lifted, some could. Quoting from the <a href="http://www.facebook.com/notes/mysql-at-facebook/online-schema-change-for-mysql/430801045932">announcement</a> and looking at the code, I add a few comments. I conclude with a general opinion on the tool's abilities.</p>
<h4>"The original table must have PK. Otherwise an error is returned."</h4>
<p>This restriction could be lifted: it's enough that the table has a UNIQUE KEY. My original <em>oak-online-alter-table</em> handled that particular case. As far as I can see from their code, the Facebook code would work just as well with any unique key.</p>
<p>However, this restriction is of no real interest. As we're mostly interested in InnoDB tables, and since any InnoDB table <em>should have</em> a PRIMARY KEY, we shouldn't care too much.</p>
<h4>"No foreign keys should exist. Otherwise an error is returned."</h4>
<p>Tricky stuff. With <em>oak-online-alter-table</em>, changes to the original table were immediately reflected in the <em>ghost</em> table. With InnoDB tables, that meant same transaction. And although I never got to update the text and code, there shouldn't be a reason for not using child-side foreign keys (the child-side is the table on which the FK constraint is defined).</p>
<p>The Facebook patch works differently: it captures changes and writes them to a <strong>delta</strong> table, to be analyzed later (asynchronously) and <em>replayed</em> on the <em>ghost</em> table.<span id="more-3005"></span></p>
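<p>To make the mechanism concrete, here is a simplified sketch of trigger-based change capture into a delta table. It is not the Facebook code, which differs in its details; the <strong>country</strong>/<strong>city</strong> definitions, the <strong>city_delta</strong> layout and the trigger names are all hypothetical:</p>
<blockquote>
<pre>CREATE TABLE country (
  id INT NOT NULL PRIMARY KEY,
  name VARCHAR(64) NOT NULL
) ENGINE=InnoDB;

-- Child table; the default foreign key action is RESTRICT/NO ACTION
CREATE TABLE city (
  id INT NOT NULL PRIMARY KEY,
  name VARCHAR(64) NOT NULL,
  country_id INT NOT NULL,
  FOREIGN KEY (country_id) REFERENCES country (id)
) ENGINE=InnoDB;

-- One row per captured change, in capture order
CREATE TABLE city_delta (
  delta_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  dml_type ENUM('INSERT', 'DELETE') NOT NULL,
  id INT NOT NULL,  -- PRIMARY KEY value of the affected city row
  name VARCHAR(64),
  country_id INT
) ENGINE=InnoDB;

-- Changes are captured within the same transaction as the change on city,
-- but applied to the ghost table only later, by a separate replay step
CREATE TRIGGER city_osc_ai AFTER INSERT ON city FOR EACH ROW
  INSERT INTO city_delta (dml_type, id, name, country_id)
  VALUES ('INSERT', NEW.id, NEW.name, NEW.country_id);

CREATE TRIGGER city_osc_ad AFTER DELETE ON city FOR EACH ROW
  INSERT INTO city_delta (dml_type, id) VALUES ('DELETE', OLD.id);

-- An UPDATE trigger would be needed too; it is omitted to keep the sketch short

INSERT INTO country VALUES (1, 'Spain');
INSERT INTO city VALUES (10, 'Madrid', 1);
DELETE FROM city WHERE country_id = 1;</pre>
</blockquote>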
<p>So in the Facebook code, some cases will lead to undesired behavior. Consider two tables, <strong>country</strong> and <strong>city</strong>, with city holding a RESTRICT/NO ACTION foreign key on <strong>country</strong>'s id. Now consider the scenario:</p>
<ol>
<li>Rows from <strong>city</strong> are DELETEd, where the country Id is Spain's.
<ul>
<li><strong>city</strong>'s ghost table is still unaffected, Spain's cities are still there.</li>
<li>A change is written to the delta table to mark these rows for deletion.</li>
</ul>
</li>
<li>A DELETE is issued on <strong>country</strong>'s Spain record.
<ul>
<li>The DELETE should work, from the user's perspective.</li>
<li>But it will fail: city's ghost table has not received the changes yet. There are still matching rows, and the NO ACTION constraint will fail the DELETE statement.</li>
</ul>
</li>
</ol>
<p>Now, this does not lead to corruption, just to seemingly unreasonable behavior on the database's part, which is probably undesired. A RESTRICT/NO ACTION constraint won't do.</p>
<p>However, with CASCADE or SET NULL options, there is less of an issue: operations on the parent table (e.g. <strong>country</strong>) cannot fail. We must still make sure that the replayed operations keep the ghost table consistent with the original table (e.g. <strong>city</strong>).</p>
<p>Consider the following scenario:</p>
<ol>
<li>A new country is created, called "Sleepyland". An INSERT is made to <strong>country</strong>.
<ul>
<li>Both <strong>city</strong> and <strong>city</strong>'s ghost are immediately aware of it.</li>
</ul>
</li>
<li>A new town is created and INSERTed into <strong>city</strong>. The town is called "Naphaven".
<ul>
<li>The change takes time to propagate to <strong>city</strong>'s ghost table.</li>
</ul>
</li>
<li>Meanwhile, we realized we made a mistake. We've been had. There's no such city nor country.
<ol>
<li>We DELETE "Naphaven" from <strong>city</strong>.</li>
<li>We DELETE "Sleepyland" from <strong>country</strong>.</li>
</ol>
<ul>
<li>Note that <strong>city</strong>'s ghost table still hasn't caught up with the changes.</li>
</ul>
</li>
<li>Eventually, the INSERT statement for "Naphaven" reaches <strong>city</strong>'s ghost table.
<ul>
<li>What should happen now? The INSERT cannot succeed: its parent row, "Sleepyland", no longer exists in <strong>country</strong>.</li>
<li>Will this fail the entire process?</li>
</ul>
</li>
</ol>
<p>Looking at the PHP code, I see that changes written on the <strong>delta</strong> table are blindly replayed on the ghost table.</p>
<p>Since the process is asynchronous, changes should not be replayed blindly. We can solve the above by using INSERT IGNORE instead of INSERT: the statement will fail without failing anything else. The row should not exist anyway, since the original row no longer exists.</p>
<p>Unlike replication corruption, this does not lead to accumulating errors. The <strong>replay</strong> is static, somewhat like in <em>binary log format</em>: changes are <em>just written</em>, regardless of existing data.</p>
<p>I have given this considerable thought, and I can't say I've covered all possible scenarios. However, I believe that with proper use of INSERT IGNORE and REPLACE INTO (two statements I heavily relied on with <em>oak-online-alter-table</em>), correctness can be achieved.</p>
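<p>To make the argument concrete, here is a toy Python model of the "Sleepyland" scenario. It is not the OSC PHP code: the dict standing in for <strong>city</strong>'s ghost table, the event tuples and the <code>replay()</code> / <code>ForeignKeyError</code> names are all hypothetical. It only shows that a blind replay aborts on the stale INSERT, whereas an INSERT IGNORE-style replay skips it and ends up matching the original table:</p>

```python
class ForeignKeyError(Exception):
    """Stands in for the error raised when a child row's parent row is missing."""


def replay(ghost, parents, delta, ignore_errors):
    """Apply delta events, in order, to the ghost child table.

    ghost:   dict city -> country, standing in for city's ghost table
    parents: set of countries currently in the (already up to date) parent table
    delta:   list of ("insert", city, country) or ("delete", city) events
    ignore_errors: False models a plain INSERT, True models INSERT IGNORE
    """
    for event in delta:
        if event[0] == "insert":
            _, city, country = event
            if country not in parents:
                if ignore_errors:
                    continue  # INSERT IGNORE: drop this row, keep replaying
                raise ForeignKeyError(city)  # plain INSERT: the whole replay fails
            ghost[city] = country
        else:
            _, city = event
            ghost.pop(city, None)  # deleting an absent row simply affects 0 rows
    return ghost


# "Sleepyland" is already gone from country when the delta reaches the ghost table.
delta = [("insert", "Naphaven", "Sleepyland"), ("delete", "Naphaven")]
print(replay({"Madrid": "Spain"}, {"Spain"}, delta, ignore_errors=True))
# {'Madrid': 'Spain'}
```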
<p>There's the small pain of re-generating the foreign key definitions on the "ghost" table (<strong>CREATE TABLE LIKE ...</strong> does not copy FK definitions). And since foreign key names must be unique within a schema, new names must be picked. Not pretty, but perfectly doable.</p>
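<p>A minimal sketch of that re-generation in Python, assuming the original DDL is taken from SHOW CREATE TABLE (one FK clause per line, as MySQL prints it). The <code>_ghost_</code> prefix and the function name are a hypothetical naming scheme; the new name is truncated to MySQL's 64-character identifier limit, which in the worst case could itself cause a clash:</p>

```python
import re

# One FK clause per line, as printed by SHOW CREATE TABLE, e.g.
#   CONSTRAINT `city_ibfk_1` FOREIGN KEY (`country_id`) REFERENCES `country` (`id`) ON DELETE CASCADE
FK_LINE = re.compile(r"^\s*CONSTRAINT `([^`]+)` (FOREIGN KEY .*?),?$", re.MULTILINE)


def ghost_foreign_keys(create_table_sql, ghost_table, prefix="_ghost_"):
    """Build ALTER TABLE statements re-creating the original FKs on the ghost table.

    Each constraint gets a new, prefixed name (hypothetical scheme), since the
    original name is still taken by the original table.
    """
    statements = []
    for name, definition in FK_LINE.findall(create_table_sql):
        new_name = (prefix + name)[:64]  # MySQL identifiers are at most 64 chars
        statements.append("ALTER TABLE `%s` ADD CONSTRAINT `%s` %s"
                          % (ghost_table, new_name, definition))
    return statements
```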
<h4>"No AFTER_{INSERT/UPDATE/DELETE} triggers must exist."</h4>
<p>It would be nicer if MySQL had an ALTER TRIGGER statement. There is no such statement. If there were such an atomic statement, we would be able to rewrite the trigger so as to add our own code to the <em>end of the trigger's code</em>. Yuck. It would be even nicer if we were <a href="http://code.openark.org/blog/mysql/triggers-use-case-compilation-part-ii">allowed to have multiple triggers</a> for the same event.</p>
<p>So, we are left with DROP and CREATE TRIGGER. Alas, this makes for a short period during which the trigger does not exist. Bad. The easy solution would be to lock the table for WRITE, but apparently you can't DROP the trigger (*) while the table is locked. Sigh.</p>
<p>(*) This happened to me, and apparently to Facebook too. With the latest 5.1 version (5.1.51) this actually works; with 5.0 it did not. This needs more checking.</p>
<h4>Use of INFORMATION_SCHEMA</h4>
<p>As with <em>oak-online-alter-table</em>, OSC checks for triggers, indexes and columns by querying the INFORMATION_SCHEMA tables. This makes for nice SQL for getting the exact listing and types of PRIMARY KEY columns, whether or not AFTER triggers exist, and so on.</p>
<p>I've always considered this to be the weak part of <a href="http://code.openark.org/forge/openark-kit">openark-kit</a>: it relies on INFORMATION_SCHEMA so much. It's easier, it's cleaner, it's even <em>more correct</em> to work that way -- but it just takes too many locks. I think Baron Schwartz (and now Daniel Nichter) did amazing work on analyzing table schemata by parsing SHOW CREATE TABLE and other SHOW commands regex-wise with <a href="http://www.maatkit.org/">Maatkit</a>. It's crazy work! Had I written <em>openark-kit</em> in Perl, I would have just imported their code. But I'm too <span style="text-decoration: line-through;">lazy</span> busy to do the conversion from Perl to Python and rewrite that code, what with all the debugging.</p>
<p>OSC is written in PHP. Again, much conversion work. I think performance-wise this is an important step to make.</p>
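<p>For illustration, this is roughly what the regex-wise approach looks like for one narrow question: "which columns make up the PRIMARY KEY?". It is a minimal Python sketch, not Maatkit's code, and it assumes the PRIMARY KEY clause sits on a line of its own with no trailing options (e.g. no <code>USING BTREE</code>), as SHOW CREATE TABLE usually prints it:</p>

```python
import re

# e.g. "  PRIMARY KEY (`id`)," or "  PRIMARY KEY (`name`,`ts`)"
PK_LINE = re.compile(r"^\s*PRIMARY KEY \((.+)\),?$", re.MULTILINE)
# A backtick-quoted identifier; SHOW CREATE TABLE doubles any embedded backtick.
COLUMN = re.compile(r"`((?:[^`]|``)+)`")


def primary_key_columns(create_table_sql):
    """Return PRIMARY KEY column names parsed from SHOW CREATE TABLE output."""
    match = PK_LINE.search(create_table_sql)
    if match is None:
        return []
    # findall skips prefix lengths such as `name`(10) and commas between columns
    return [name.replace("``", "`") for name in COLUMN.findall(match.group(1))]
```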
<h4>A word for the critics</h4>
<p>Finally, a word for the critics. I've read some Facebook/MySQL bashing comments and wish to respond.</p>
<p>In his <a href="http://www.theregister.co.uk/2010/09/21/facebook_online_schema_change_for_mysql/">interview with The Register</a>, Mark Callaghan gave the example that "Open Schema Change lets the company update indexes without user downtime, according to Callaghan".</p>
<p>PostgreSQL was mentioned for being able to add an index with only read locks taken, or to do the work without blocking writes using CREATE INDEX CONCURRENTLY. I wish MySQL had that feature! Yes, MySQL has a lot to improve upon, and the latest PostgreSQL 9.0 brings valuable new features. (Did I make it clear I have no intention of bashing PostgreSQL? If not, please re-read this paragraph until convinced.)</p>
<p>Some of the bashing related to the notion that MySQL is so poor that Facebook had to use an even poorer mechanism to work out the ALTER TABLE.</p>
<p>Well, allow me to add a few words: CREATE INDEX is far from the only thing you can achieve with OSC (although it may be Facebook's major concern). You should be able to:</p>
<ul>
<li>Add columns</li>
<li>Drop columns</li>
<li>Convert character sets</li>
<li>Modify column types</li>
<li>Add partitioning</li>
<li>Reorganize partitioning</li>
<li>Compress the table</li>
<li>Otherwise change the table format</li>
<li>Heck, you could even modify the storage engine! (To another transactional engine)</li>
</ul>
<p>These are giant steps. How easy would it be to implement all of these inside the database server? It only takes a few weeks' time to work out a working solution with reasonable limitations, just using the resources the MySQL server provides you with. The <a href="http://www.facebook.com/MySQLatFacebook">MySQL@Facebook team</a> should be given credit for that.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/thoughts-and-ideas-for-online-schema-change/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>How often should you use OPTIMIZE TABLE? - followup</title>
		<link>http://code.openark.org/blog/mysql/how-often-should-you-use-optimize-table-followup</link>
		<comments>http://code.openark.org/blog/mysql/how-often-should-you-use-optimize-table-followup#comments</comments>
		<pubDate>Mon, 04 Oct 2010 08:07:45 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2882</guid>
		<description><![CDATA[This post follows up on Baron's How often should you use OPTIMIZE TABLE?. I had the opportunity of doing some massive purging of data from large tables, and was interested to see the impact of the OPTIMIZE operation on table's indexes. I worked on some production data I was authorized to provide as example. The [...]]]></description>
			<content:encoded><![CDATA[<p>This post follows up on Baron's <a href="http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/">How often should you use OPTIMIZE TABLE?</a>. I had the opportunity of doing some massive purging of data from large tables, and was interested to see the impact of the <strong>OPTIMIZE</strong> operation on table's indexes. I worked on some production data I was authorized to provide as example.</p>
<h4>The use case</h4>
<p>I'll present a single use case here. The table at hand is a compressed InnoDB table used for logs. I've rewritten some column names for privacy:</p>
<blockquote>
<pre>mysql&gt; show create table logs \G

Create Table: CREATE TABLE `logs` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
 `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 `origin` varchar(64) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
 `message` text NOT NULL,
 `level` tinyint(11) NOT NULL DEFAULT '0',
 `s` char(16) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `s` (`s`),
 KEY `name` (`name`,`ts`),
 KEY `origin` (`origin`,`ts`)
) ENGINE=InnoDB AUTO_INCREMENT=186878729 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8</pre>
</blockquote>
<p>The table had log records starting <strong>2010-08-23</strong> and up till <strong>2010-09-02</strong> noon. Table status:<span id="more-2882"></span></p>
<blockquote>
<pre>mysql&gt; show table status like 'logs'\G
*************************** 1. row ***************************
           Name: logs
         Engine: InnoDB
        Version: 10
     Row_format: Compressed
           Rows: 22433048
 Avg_row_length: 206
    Data_length: 4625285120
Max_data_length: 0
   Index_length: 1437073408
      Data_free: 4194304
 Auto_increment: 186878920
    Create_time: 2010-08-24 18:10:49
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options: row_format=COMPRESSED KEY_BLOCK_SIZE=8
        Comment:</pre>
</blockquote>
<p>(I'm a bit puzzled by the <strong>Create_time</strong>: the table was taken from an LVM snapshot of another server, so it had existed for a very long time before. I'm not sure why the <strong>Create_time</strong> field is as it is here; I assume the MySQL upgrade marked it so. I had neither the time nor the need to look into it.)</p>
<p>I was using <a href="http://www.percona.com/downloads/Percona-Server-5.1/">Percona-Server-5.1.47-11.2</a>, and so was able to look at the index statistics for that table:</p>
<blockquote>
<pre>mysql&gt; SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';
+--------------+------------+--------------+--------+----------------+------------+------------+
| table_schema | table_name | index_name   | fields | row_per_keys   | index_size | leaf_pages |
+--------------+------------+--------------+--------+----------------+------------+------------+
| newsminer    | logs       | PRIMARY      |      1 | 1              |     282305 |     246856 |
| newsminer    | logs       | s            |      2 | 17, 1          |      38944 |      33923 |
| newsminer    | logs       | name         |      3 | 2492739, 10, 2 |      22432 |      19551 |
| newsminer    | logs       | origin       |      3 | 1303, 4, 1     |      26336 |      22931 |
+--------------+------------+--------------+--------+----------------+------------+------------+</pre>
</blockquote>
<h4>Status after massive purge</h4>
<p>My first requirement was to purge all records up to <strong>2010-09-01 00:00:00</strong>. I did so in small chunks, using <a href="http://code.openark.org/forge/openark-kit">openark kit</a>'s oak-chunk-update (the same can be achieved with <a href="http://www.maatkit.org/">maatkit</a>'s mk-archiver). The process purged <strong>1000</strong> rows at a time, with some sleep in between, and ran for a couple of hours. It may be interesting to note that since <strong>ts</strong> is <a href="http://code.openark.org/blog/mysql/monotonic-functions-sql-and-mysql">monotonically ascending</a> along with <strong>id</strong>, purging old rows also means purging lower PKs, which means we're trimming the PK tree from the left.</p>
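<p>oak-chunk-update walks the table by key ranges. A simpler alternative, sketched here in Python, repeats a small <code>DELETE ... ORDER BY id LIMIT</code> until a short chunk shows that nothing old is left; since <strong>ts</strong> ascends along with <strong>id</strong>, each chunk trims the PK tree from the left, as described above. The <code>execute</code> callback (run one statement, return the affected row count) is a hypothetical stand-in for your database driver:</p>

```python
import time

PURGE_SQL = ("DELETE FROM logs WHERE ts < '2010-09-01 00:00:00' "
             "ORDER BY id LIMIT %d")


def purge_old_rows(execute, chunk_size=1000, sleep_seconds=0.5):
    """Delete old rows in small chunks; return the total number deleted.

    execute(sql) is hypothetical: it runs one statement (autocommitted)
    and returns the number of affected rows.
    """
    total = 0
    while True:
        deleted = execute(PURGE_SQL % chunk_size)
        total += deleted
        if deleted < chunk_size:
            return total  # a short chunk means no old rows are left
        time.sleep(sleep_seconds)  # let replication and other queries catch up
```

Keeping each chunk small keeps each transaction, its locks and its binary log event small, which is the point of purging in chunks rather than with one huge DELETE.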
<p>Even while purging took place, I could see the index_size/leaf_pages values dropping, until, finally:</p>
<blockquote>
<pre>mysql&gt; SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';
+--------------+------------+--------------+--------+--------------+------------+------------+
| table_schema | table_name | index_name   | fields | row_per_keys | index_size | leaf_pages |
+--------------+------------+--------------+--------+--------------+------------+------------+
| newsminer    | logs       | PRIMARY      |      1 | 1            |      40961 |      35262 |
| newsminer    | logs       | s            |      2 | 26, 1        |      34440 |       3798 |
| newsminer    | logs       | name         |      3 | 341011, 4, 1 |       4738 |       2774 |
| newsminer    | logs       | origin       |      3 | 341011, 4, 2 |      10178 |       3281 |
+--------------+------------+--------------+--------+--------------+------------+------------+</pre>
</blockquote>
<p>The number of deleted rows was roughly <strong>85%</strong> of total rows, leaving about <strong>15%</strong> of the rows.</p>
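<p>A quick sanity check of that figure against the PRIMARY KEY numbers from the two <strong>INNODB_INDEX_STATS</strong> listings above:</p>

```python
# PRIMARY index figures, before and after the purge (from the listings above)
leaf_before, leaf_after = 246856, 35262
size_before, size_after = 282305, 40961

leaf_ratio = leaf_after / leaf_before
size_ratio = size_after / size_before
print("leaf_pages kept: {:.1%}, index_size kept: {:.1%}".format(leaf_ratio, size_ratio))
# leaf_pages kept: 14.3%, index_size kept: 14.5%
```

Both numbers are in line with roughly 15% of the rows remaining, i.e. the clustered index shrank along with the data.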
<h4>Status after OPTIMIZE TABLE</h4>
<p>Time to see whether <strong>OPTIMIZE</strong> really optimizes! Will it reduce the number of leaf pages in the PK? In secondary keys?</p>
<blockquote>
<pre>mysql&gt; OPTIMIZE TABLE logs;
...
mysql&gt; SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';
+--------------+------------+--------------+--------+--------------+------------+------------+
| table_schema | table_name | index_name   | fields | row_per_keys | index_size | leaf_pages |
+--------------+------------+--------------+--------+--------------+------------+------------+
| newsminer    | logs       | PRIMARY      |      1 | 1            |      40436 |      35323 |
| newsminer    | logs       | s            |      2 | 16, 1        |       5489 |       4784 |
| newsminer    | logs       | name         |      3 | 335813, 7, 1 |       3178 |       2749 |
| newsminer    | logs       | origin       |      3 | 335813, 5, 2 |       3951 |       3446 |
+--------------+------------+--------------+--------+--------------+------------+------------+
4 rows in set (0.00 sec)</pre>
</blockquote>
<p>The above shows no significant change in the number of leaf pages, nor in statistics (<strong>row_per_keys</strong>). The <strong>OPTIMIZE</strong> did reduce <strong>index_size</strong> for some secondary indexes (<strong>s</strong>, <strong>origin</strong>, <strong>name</strong>), but it did not reduce the number of leaf pages (<strong>leaf_pages</strong> are the major factor here). Some <strong>leaf_pages</strong> values have even increased, most notably for <strong>s</strong>.</p>
<p>Index-wise, the above example does not show an advantage to using <strong>OPTIMIZE</strong>. I confess, I was surprised, and for the better: this indicates InnoDB does a good job of merging index pages after massive purging.</p>
<h4>So, no use for OPTIMIZE?</h4>
<p>Think again: file system-wise, things look different.</p>
<p>Before purging of data:</p>
<blockquote>
<pre>bash:~# ls -l logs.* -h
-rw-r----- 1 mysql mysql 8.6K 2010-08-15 17:40 logs.frm
-rw-r----- 1 mysql mysql 2.9G 2010-09-02 14:01 logs.ibd</pre>
</blockquote>
<p>After purging of data:</p>
<blockquote>
<pre>bash:~# ls -l logs.* -h
-rw-r----- 1 mysql mysql 8.6K 2010-08-15 17:40 logs.frm
-rw-r----- 1 mysql mysql 2.9G 2010-09-02 14:21 logs.ibd</pre>
</blockquote>
<p>Recall that InnoDB never releases table space back to the file system!</p>
<p>After <strong>OPTIMIZE</strong> on table:</p>
<blockquote>
<pre>bash:~# ls -l logs.* -h
-rw-rw---- 1 mysql mysql 8.6K 2010-09-02 14:26 logs.frm
-rw-rw---- 1 mysql mysql 428M 2010-09-02 14:43 logs.ibd</pre>
</blockquote>
<p>With <strong>innodb_file_per_table</strong>, an <strong>OPTIMIZE</strong> creates a new table space, and the old one gets destroyed. Space goes back to the file system. Don't know about you; I like to have my file system with as much free space as possible.</p>
<h4>Need to verify</h4>
<p>I've tested Percona Server, since this is where I can find <strong>INNODB_INDEX_STATS</strong>. But this raises the following questions:</p>
<ul>
<li>Perhaps the results only apply to Percona Server? (I'm guessing not).</li>
<li>Or only for InnoDB plugin? Does the same hold for "builtin" InnoDB? (dunno)</li>
<li>Only on &gt;= 5.1? (Maybe; 5.0 is becoming rare now anyway)</li>
<li>Only on InnoDB (Well, of course this test is storage engine dependent!)</li>
</ul>
<h4>Conclusion</h4>
<p>The use case above is a particular example. Other use cases may include tables where deletions often occur in the middle of the table (remember we were trimming the tree from the left side only). Others yet may need to handle <strong>UPDATE</strong>s to indexed columns. I have some more operations to do here, with larger tables (e.g. <strong>40GB</strong> compressed). If anything changes, I'll drop a note.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/how-often-should-you-use-optimize-table-followup/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Table refactoring &amp; application version upgrades, Part II</title>
		<link>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-ii</link>
		<comments>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-ii#comments</comments>
		<pubDate>Thu, 12 Aug 2010 03:24:06 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Replication]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2801</guid>
		<description><![CDATA[Continuing Table refactoring &#38; application version upgrades, Part I, we now discuss code &#38; database upgrades which require DROP operations. As before, we break apart the upgrade process into sequential steps, each involving either the application or the database, but not both. As I'll show, DROP operations are significantly simpler than creation operations. Interestingly, it's [...]]]></description>
			<content:encoded><![CDATA[<p>Continuing <a href="http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-i">Table refactoring &amp; application version upgrades, Part I</a>, we now discuss code &amp; database upgrades which require <strong>DROP</strong> operations. As before, we break apart the upgrade process into sequential steps, each involving either the application or the database, but not both.</p>
<p>As I'll show, DROP operations are significantly simpler than creation operations. Interestingly, it's the same as in life.</p>
<h4>DROP COLUMN</h4>
<p>A column turns out to be redundant and unused. Before it is dropped from the database, we must ensure no one is using it anymore. The steps are:</p>
<ol>
<li>App: <strong>V1</strong> -&gt; <strong>V2</strong>. Remove all references to column; make sure no queries use said column.</li>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>DROP COLUMN</strong>.</li>
</ol>
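<p>Step 1 is the error-prone one. A crude but conservative aid, sketched in Python, is to scan the queries the application issues (taken, say, from its source code or a query log; how you collect them is up to you) for the column name as a whole word. <code>queries_referencing()</code> is a hypothetical helper, and it cannot see columns hidden behind <code>SELECT *</code>, so such queries still need a human look:</p>

```python
import re


def queries_referencing(column, queries):
    """Return the queries that mention `column` as a whole word (case-insensitive).

    Deliberately conservative: a same-named column of another table, or the
    word inside a string literal, is reported too. Before a DROP COLUMN, a
    false alarm is cheaper than a missed reference.
    """
    pattern = re.compile(r"\b%s\b" % re.escape(column), re.IGNORECASE)
    return [q for q in queries if pattern.search(q)]
```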
<h4>DROP INDEX</h4>
<p>A possibly simpler case here. Why would you drop an index? Is it because you found out you never use it anymore? Then all you have to do is just drop it.</p>
<p>Or perhaps you don't need the functionality the index supports anymore? Then first drop the functionality:</p>
<ol>
<li>(optional) App: <strong>V1</strong> -&gt; <strong>V2</strong>. Discard using functionality which relies on index.</li>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>DROP INDEX</strong>. Check out the InnoDB Plugin here: with its fast index creation, dropping a secondary index does not require copying the table.<span id="more-2801"></span></li>
</ol>
<h4>DROP UNIQUE INDEX</h4>
<p>When using Master-Slave failover for table refactoring, we're now removing a constraint from the slave. Since the master is more constrained than the slave, there is no problem here. It's mostly the same as with a normal DROP INDEX, with a minor addition:</p>
<ol>
<li>(optional) App: <strong>V1</strong> -&gt; <strong>V2</strong>. Discard using functionality which relies on index.</li>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>DROP INDEX</strong>.</li>
<li>(optional) App: <strong>V2</strong> -&gt; <strong>V3</strong>. Enable functionality that inserts duplicates.</li>
</ol>
<h4>DROP FOREIGN KEY</h4>
<p>Again, we are removing a constraint.</p>
<ol>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>DROP FOREIGN KEY</strong>.</li>
<li>(optional) App: <strong>V2</strong> -&gt; <strong>V3</strong>. Enable functionality that conflicts with removed constraint. I mean, if you really know what you are doing.</li>
</ol>
<h4>DROP TABLE</h4>
<p>The very simple steps are:</p>
<ol>
<li>App: <strong>V1</strong> -&gt; <strong>V2</strong>. Make sure no reference to table is made.</li>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong>. Issue a <strong>DROP TABLE</strong>.</li>
</ol>
<p>With <strong>ext3</strong>, dropping a large table is nothing short of a nightmare. Not only does the action take a long time, it also locks down the table cache, which very quickly leads to dozens of hung queries. <strong>xfs</strong> is a good alternative.</p>
<h4>Conclusion</h4>
<p>We looked at single table operations, coupled with application upgrades. By carefully looking at the process breakdown, multiple changes can be addressed with ease and safety. Not all operations are completely safe when used with replication failover. But they are mostly safe if you have some trust in your code.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-ii/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Table refactoring &amp; application version upgrades, Part I</title>
		<link>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-i</link>
		<comments>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-i#comments</comments>
		<pubDate>Tue, 10 Aug 2010 12:36:28 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Replication]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2775</guid>
		<description><![CDATA[A developer's major concern is: How do I do application &#38; database upgrades with minimal downtime? How do I synchronize between a DB's version upgrade and an application's version upgrade? I will break down the discussion into types of database refactoring operations, and I will limit to single table refactoring. The discussion will try to [...]]]></description>
			<content:encoded><![CDATA[<p>A developer's major concern is: <em>How do I do application &amp; database upgrades with minimal downtime? How do I synchronize between a DB's version upgrade and an application's version upgrade?<br />
</em></p>
<p>I will break down the discussion into types of database refactoring operations, and will limit it to single table refactoring. The discussion will try to understand the need for refactoring and will lay out the steps towards a successful upgrade.</p>
<h4>Reader prerequisites</h4>
<p>I will assume MySQL to be the underlying database. To take a major component out of the equation: we may need to deal with very large tables, for which an <strong>ALTER</strong> command may take long hours. I will assume familiarity with Master-Master (Active-Passive) replication, with possible use of <a href="http://mysql-mmm.org/">MMM for MySQL</a>. When I describe "Failover from <strong>M1</strong> to <strong>M2</strong>", I mean "Make the <strong>ALTER</strong> changes on <strong>M2</strong> (passive), then switch your application from <strong>M1</strong> to <strong>M2</strong> (change of IPs, VIP, etc.), promoting <strong>M2</strong> to active position, then apply same changes on <strong>M1</strong> (now passive) or completely rebuild it".</p>
<p>Phew, a one sentence description of M-M usage...</p>
<p>I also assume the reader's understanding that a table's schema can be different on master &amp; slave, which is the basis for the "use replication for refactoring" trick. But it cannot be too different, or, to be precise, the two schemata must both support the ongoing queries for the table.</p>
<p>A full discussion of the above is beyond the scope of this post.</p>
<h4>Types of refactoring needs</h4>
<p>As I limit this discussion to single table refactoring, we can look at major refactoring operations and their impact on application &amp; upgrades. We will discuss ADD/DROP COLUMN, ADD/DROP INDEX, ADD/DROP UNIQUE INDEX, ADD/DROP FOREIGN KEY, ADD/DROP TABLE.</p>
<p>We will assume the database and application are both in Version #1 (<strong>V1</strong>), and need to be upgraded to <strong>V2</strong> or greater.<span id="more-2775"></span></p>
<h4>ADD INDEX</h4>
<p>Starting with the easier actions. Why would you add an index? Either:</p>
<ol>
<li>There is some existing query which can be optimized by the new index</li>
<li>Or there is some new functionality which issues a query for which the new index is required.</li>
</ol>
<p>Adding an index is an easy action in that the table's data does not really change.</p>
<p>In case <strong>#1</strong>, all you need to do is to add the new index (if the table is large, fail over from <strong>M1</strong> to <strong>M2</strong>). There is no application upgrade, so all that happens is that the database upgrades <strong>V1 </strong>-&gt;<strong> V2</strong>.</p>
<p>In case <strong>#2</strong>, the database must be prepared with new schema before the new functionality/query is introduced (since it depends on the existence of the index). The steps, therefore, are:</p>
<ol>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>)</li>
<li>(Sometime later) App: <strong>V1</strong> -&gt; <strong>V2</strong>. Application will issue queries which utilize the new index.</li>
</ol>
<p>The application does not have to be upgraded at the same instant the DB gets upgraded. In fact, we'll see that this is a typical scenario: we can separate upgrades into smaller steps, which allow for time lapse. One <em>could</em> work out steps <strong>1</strong> &amp; <strong>2</strong> together, but that would take an extra effort.</p>
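<p>The ordering rule behind these steps (the schema must support whatever the application currently needs) can be expressed as a small check. This is a toy Python model: schema objects are plain strings with hypothetical names, and data concerns such as backfilling or UNIQUE/FOREIGN KEY conflicts are ignored:</p>

```python
def first_broken_step(db_schema, app_needs, steps):
    """Walk an upgrade plan; return the index of the first step after which the
    application needs a schema object the database lacks, or None if the plan is safe.

    steps: list of (side, added, removed) where side is "db" or "app". For "db",
    added/removed are schema objects; for "app", they are the objects the
    application starts or stops relying on.
    """
    db, app = set(db_schema), set(app_needs)
    for i, (side, added, removed) in enumerate(steps):
        target = db if side == "db" else app
        target |= set(added)    # in-place update of db or app
        target -= set(removed)
        if not app.issubset(db):
            return i
    return None


# Case #2 above: the DB gets the index first, then the app starts relying on it.
add_index_plan = [("db", {"idx_new"}, set()), ("app", {"idx_new"}, set())]
print(first_broken_step({"PRIMARY"}, {"PRIMARY"}, add_index_plan))        # None
print(first_broken_step({"PRIMARY"}, {"PRIMARY"}, add_index_plan[::-1]))  # 0
```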
<h4>ADD COLUMN</h4>
<p>This must be one of the most common table schema upgrades: a new property is needed on the application side. It must be supported by the database. Perhaps a new field in some Java Object, with Hibernate mapping that field onto a new column. Or maybe the new column is there for purpose of de-normalization.</p>
<p>This is also a more complicated task. Let's look at the required steps:</p>
<ol>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>ADD COLUMN</strong>.</li>
<li>App: <strong>V1</strong> -&gt; <strong>V2</strong>. Change is: provide column value for newly <strong>INSERT</strong>ed rows.</li>
<li>If needed, retroactively update column values for all pre-existing rows.</li>
<li>App: <strong>V2</strong> -&gt; <strong>V3</strong>. Application begins to use (read, <strong>SELECT</strong>) new column.</li>
</ol>
<p>The above procedure assumes that the new column must have some calculated value. A 10-million-row table must now be updated, to have the correct values filled in. So we ask the application to start filling in data for new rows, which makes the set of invalid rows static. We can just take a "from row" and a "to row" and fill in the missing column's value for those rows. Only when all rows contain valid values can we let the application start using that column. This makes for <em>two</em> application upgrades.</p>
<p>If you're content with just a static <strong>DEFAULT</strong> value, then step <strong>3</strong> can be skipped, and step <strong>4</strong> can be merged with step <strong>2</strong>.</p>
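<p>Step 3 is where the "from row" and "to row" come in. A minimal Python sketch, assuming an integer AUTO_INCREMENT <code>id</code> primary key: <code>last_id</code> is the highest id that existed before App <strong>V2</strong> started filling in the column itself, so the range is static and can be split into chunks. The table name and SET clause are supplied by the caller; the ones in the example are hypothetical:</p>

```python
def backfill_statements(table, set_clause, first_id, last_id, chunk_size=10000):
    """Split the static id range [first_id, last_id] into chunked UPDATE statements."""
    statements = []
    for start in range(first_id, last_id + 1, chunk_size):
        end = min(start + chunk_size - 1, last_id)  # the last chunk may be shorter
        statements.append("UPDATE %s SET %s WHERE id BETWEEN %d AND %d"
                          % (table, set_clause, start, end))
    return statements


for sql in backfill_statements("t", "new_col = UPPER(name)", 1, 25000):
    print(sql)
# UPDATE t SET new_col = UPPER(name) WHERE id BETWEEN 1 AND 10000
# UPDATE t SET new_col = UPPER(name) WHERE id BETWEEN 10001 AND 20000
# UPDATE t SET new_col = UPPER(name) WHERE id BETWEEN 20001 AND 25000
```

Running the statements one at a time, with a pause in between, keeps each transaction short; rows above <code>last_id</code> never need touching, since the application already writes the column for them.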
<h4>ADD UNIQUE INDEX</h4>
<p>This is an altogether different case than the normal <strong>ADD INDEX</strong>, even though they may seem similar. And the case is particularly different when using Master-Slave failover for rebuilding the table.</p>
<p>Consider the case where we add a <strong>UNIQUE INDEX</strong> on a slave. Some <strong>INSERT</strong> query executes on the master, successfully, and is logged to the binary log. The slave picks it up, tries to execute it, to find that it fails on a DUPLICATE KEY error.</p>
<p>The <strong>UNIQUE INDEX</strong> is a constraint, and it makes the slave more constrained than the master. This is a delicate situation. Here's how to (mostly) work it out:</p>
<ol>
<li>App: <strong>V1</strong> -&gt; <strong>V2</strong>. Change <strong>INSERT</strong> queries on relevant table to <strong>INSERT IGNORE</strong> or <strong>REPLACE</strong> queries, whichever is more appropriate.</li>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>ADD UNIQUE KEY</strong> (and while at it, a tip: are you aware of <a href="http://dev.mysql.com/doc/refman/5.1/en/alter-table.html">ALTER IGNORE TABLE</a>?)</li>
</ol>
<p>The change of query ensures that the query will succeed on the slave (either by silently doing nothing or by actually replacing content). It also means that the slave can now have different data than the master. Of course, if you trust your application to never <strong>INSERT</strong> duplicates, you can sleep better.</p>
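<p>The effect on the slave can be modelled in a few lines of Python. This is a toy model (tables as lists of dicts, hypothetical <code>apply_row()</code> and <code>DuplicateKeyError</code> names), not MySQL itself: it replays the same writes on a master without the UNIQUE KEY and on a slave with it, showing that plain INSERT stops replication, INSERT IGNORE keeps the first row, and REPLACE keeps the last one:</p>

```python
class DuplicateKeyError(Exception):
    """Stands in for MySQL's duplicate-key error (ER_DUP_ENTRY)."""


def apply_row(table, unique_col, statement, row):
    """Apply one replicated write to `table`, a list of dict rows.

    unique_col: None for a server without the UNIQUE KEY (the master), or the
    indexed column name (the refactored slave).
    statement: "INSERT", "INSERT IGNORE" or "REPLACE".
    """
    if unique_col is not None:
        clash = [r for r in table if r[unique_col] == row[unique_col]]
        if clash:
            if statement == "INSERT":
                raise DuplicateKeyError(row[unique_col])  # replication stops here
            if statement == "INSERT IGNORE":
                return  # silently keep the existing row
            for r in clash:  # REPLACE: delete the conflicting row, then insert
                table.remove(r)
    table.append(dict(row))


rows = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "a@example.com"}]
master, slave = [], []
for row in rows:
    apply_row(master, None, "INSERT IGNORE", row)
    apply_row(slave, "email", "INSERT IGNORE", row)
print(len(master), len(slave))  # 2 1: the slave now differs from the master
```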
<p>We do not handle <strong>UPDATE</strong> statements here.</p>
<h4>ADD CONSTRAINT FOREIGN KEY</h4>
<p>As with <strong>ADD UNIQUE INDEX</strong>, there is a new constraint here. A slave becomes more constrained than the master. But we now have to make sure <strong>INSERT</strong>, <strong>UPDATE</strong> and <strong>DELETE</strong> statements all go peacefully (well, it also depends on the type of <strong>ON DELETE</strong> and <strong>ON UPDATE</strong> property of the FK).</p>
<p>The steps would be:</p>
<ol>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>ADD CONSTRAINT FOREIGN KEY</strong>.</li>
</ol>
<p>And then cross your fingers or have trust in your application. If the table is small enough, one does not have to use replication to do the refactoring, and life is simpler. Just execute the <strong>ALTER</strong> on the active master, and continue with your life.</p>
<h4>CREATE TABLE</h4>
<p>This is a simple case, since the table is new. The steps are:</p>
<ol>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (no need to use slaves here)</li>
<li>App: <strong>V1</strong> -&gt; <strong>V2</strong>. Application will start using new table.</li>
</ol>
<h4>Conclusion</h4>
<p>Having such steps formalized helps with development management and database management. It makes clear what is expected of the application, and what is expected of the database. Breaking these operations down into sequential steps allows us to work more slowly: make preparation work, work within our own working hours, get a chance to see the family.</p>
<p>In this post we took a look at "creation" refactoring changes. New columns, new keys, new constraints. In the next part of this article, we'll discuss <strong>DROP</strong> operations.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-i/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SQL: forcing single row tables integrity</title>
		<link>http://code.openark.org/blog/mysql/sql-forcing-single-row-tables-integrity</link>
		<comments>http://code.openark.org/blog/mysql/sql-forcing-single-row-tables-integrity#comments</comments>
		<pubDate>Tue, 22 Jun 2010 04:58:51 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2523</guid>
		<description><![CDATA[Single row tables are used in various cases. Such tables can be used for "preferences" or "settings"; for managing counters (e.g. summary tables), for general-purpose administration tasks (e.g. heartbeat table) etc. The problem with single row tables is that, well, they must have a single row. And the question is: how can you force them [...]]]></description>
			<content:encoded><![CDATA[<p>Single row tables are used in various cases. Such tables can be used for "preferences" or "settings"; for managing counters (e.g. summary tables), for general-purpose administration tasks (e.g. heartbeat table) etc.</p>
<p>The problem with single row tables is that, well, they must have a single row. And the question is: <em>how can you force them to have just one row?</em></p>
<h4>The half-baked solution</h4>
<p>The common solution is to create a <strong>PRIMARY KEY</strong> and always use the same value for that key. In addition, using <strong>REPLACE</strong> or <strong>INSERT ... ON DUPLICATE KEY UPDATE</strong> helps with updating the row. For example:</p>
<blockquote><pre class="brush: sql; title: ; notranslate">
CREATE TABLE heartbeat (
 id int NOT NULL PRIMARY KEY,
 ts datetime NOT NULL
 );
</pre>
</blockquote>
<p>The above table definition is taken from <a href="http://www.maatkit.org/doc/mk-heartbeat.html">mk-heartbeat</a>. It should be noted that <em>mk-heartbeat</em> in itself does not require that the table has a single row, so it is not the target of this post. I'm taking the above table definition as a very simple example.</p>
<p>So, we assume we want this table to have a single row, for whatever reasons we have. We would usually do:</p>
<blockquote><pre class="brush: sql; title: ; notranslate">
REPLACE INTO heartbeat (id, ts) VALUES (1, NOW());
</pre>
</blockquote>
<p>or</p>
<blockquote><pre class="brush: sql; title: ; notranslate">
INSERT INTO heartbeat (id, ts) VALUES (1, NOW()) ON DUPLICATE KEY UPDATE ts = NOW();
</pre>
</blockquote>
<p>Why is the above a <em>"half baked solution"</em>? Because it is up to the application to make sure it reuses the same <strong>PRIMARY KEY</strong> value. There is nothing in the database to prevent the following:<span id="more-2523"></span></p>
<blockquote><pre class="brush: sql; title: ; notranslate">
REPLACE INTO heartbeat (id, ts) VALUES (73, NOW()); -- Ooops
</pre>
</blockquote>
<p>One may claim that <em>"my application has good integrity"</em>. That may be the case; but I would then raise the question: <em>why, then, would you need <strong>FOREIGN KEY</strong>s</em>? Of course, many people don't use <strong>FOREIGN KEY</strong>s, but I think the message is clear.</p>
<h4>A heavyweight solution</h4>
<p>Triggers <a href="http://code.openark.org/blog/mysql/triggers-use-case-compilation-part-i">can help out</a>. But really, this is overkill.</p>
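<p>For comparison, this sketch shows roughly what a trigger approach could look like for the <strong>heartbeat</strong> table above. It rejects any row whose <strong>id</strong> is not <strong>1</strong>, on both <strong>INSERT</strong> and <strong>UPDATE</strong>. It assumes MySQL 5.5 or later, since <strong>SIGNAL</strong> is needed. The trigger names and the message text are made up:</p>
<blockquote><pre class="brush: sql; title: ; notranslate">
DELIMITER //

-- Reject new rows (including those from REPLACE and
-- INSERT ... ON DUPLICATE KEY UPDATE) unless id is 1:
CREATE TRIGGER heartbeat_single_row_bi BEFORE INSERT ON heartbeat
FOR EACH ROW
BEGIN
  IF NEW.id != 1 THEN
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'heartbeat may only hold the row with id = 1';
  END IF;
END//

-- Reject updates that would move the row away from id 1:
CREATE TRIGGER heartbeat_single_row_bu BEFORE UPDATE ON heartbeat
FOR EACH ROW
BEGIN
  IF NEW.id != 1 THEN
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'heartbeat may only hold the row with id = 1';
  END IF;
END//

DELIMITER ;
</pre>
</blockquote>
<p>The <strong>BEFORE UPDATE</strong> trigger is needed too. Without it, an <strong>UPDATE</strong> could move the single row to another <strong>id</strong> and free <strong>1</strong> for a second row. This approach needs two triggers and a delimiter dance, and it keeps the rule outside the table definition. That is part of what makes it heavyweight.</p>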
<h4>A solution</h4>
<p>I propose a solution where, much like with <strong>FOREIGN KEY</strong>s, the database itself enforces the integrity of the table; namely, makes it contain <em>at most one row</em>.</p>
<p>For this solution to work, we will need a strict <strong>sql_mode</strong>. I'll show later what happens when using a relaxed <strong>sql_mode</strong>:</p>
<blockquote><pre class="brush: sql; title: ; notranslate">
SET sql_mode='STRICT_ALL_TABLES'; -- Session scope for the purpose of this article
</pre>
</blockquote>
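<p>The <strong>SET</strong> above only affects the current session. The statements below check which mode is in effect and make strict mode the default for the server. Note that <strong>SET GLOBAL</strong> requires administrative privileges and only applies to connections opened afterwards. It does not survive a restart unless the same setting also goes into the server configuration. It also replaces any other modes listed in the global value:</p>
<blockquote><pre class="brush: sql; title: ; notranslate">
-- Inspect the session and global modes:
SELECT @@SESSION.sql_mode, @@GLOBAL.sql_mode;

-- Make strict mode the default for connections opened from now on:
SET GLOBAL sql_mode='STRICT_ALL_TABLES';
</pre>
</blockquote>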
<p>Here's a new table definition:</p>
<blockquote><pre class="brush: sql; title: ; notranslate">
CREATE TABLE heartbeat (
 integrity_keeper ENUM('') NOT NULL PRIMARY KEY,
 ts datetime NOT NULL
);
</pre>
</blockquote>
<p>Let's see what happens now:</p>
<blockquote><pre class="brush: sql; title: ; notranslate">
mysql&gt; INSERT INTO heartbeat (ts) VALUES (NOW());
Query OK, 1 row affected (0.00 sec)

mysql&gt; INSERT INTO heartbeat (ts) VALUES (NOW());
ERROR 1062 (23000): Duplicate entry '' for key 'PRIMARY'
mysql&gt; INSERT INTO heartbeat (integrity_keeper, ts) VALUES ('', NOW());
ERROR 1062 (23000): Duplicate entry '' for key 'PRIMARY'
mysql&gt; INSERT INTO heartbeat (integrity_keeper, ts) VALUES (0, NOW());
ERROR 1265 (01000): Data truncated for column 'integrity_keeper' at row 1
mysql&gt; INSERT INTO heartbeat (integrity_keeper, ts) VALUES (1, NOW());
ERROR 1062 (23000): Duplicate entry '' for key 'PRIMARY'

mysql&gt; REPLACE INTO heartbeat (ts) VALUES (NOW());
Query OK, 2 rows affected (0.00 sec)

mysql&gt; INSERT INTO heartbeat (ts) VALUES (NOW()) ON DUPLICATE KEY UPDATE ts = NOW();
Query OK, 0 rows affected (0.00 sec)

mysql&gt; SELECT * FROM heartbeat;
+------------------+---------------------+
| integrity_keeper | ts                  |
+------------------+---------------------+
|                  | 2010-06-15 09:12:19 |
+------------------+---------------------+
</pre>
</blockquote>
<p>So the trick is to create a <strong>PRIMARY KEY</strong> column which is only allowed a single value.</p>
<p>The above shows I cannot force another row into the table: the schema will prevent me from doing so. Mission accomplished.</p>
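<p>The same trick works for the "preferences" or "settings" use case mentioned at the start. This sketch uses a hypothetical <strong>app_settings</strong> table with made-up setting columns. Because the table can hold at most one row, updates need no <strong>WHERE</strong> clause:</p>
<blockquote><pre class="brush: sql; title: ; notranslate">
CREATE TABLE app_settings (
 integrity_keeper ENUM('') NOT NULL PRIMARY KEY,
 maintenance_mode TINYINT NOT NULL DEFAULT 0,
 max_upload_mb INT NOT NULL DEFAULT 10
);

-- Create the one and only row, using the column defaults:
INSERT INTO app_settings () VALUES ();

-- Change a setting; there is only one row to update:
UPDATE app_settings SET maintenance_mode = 1;

-- A second row fails with a duplicate-key error, as with heartbeat:
INSERT INTO app_settings () VALUES ();
</pre>
</blockquote>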
<h4>Further thoughts</h4>
<p>The <strong>CHECK</strong> keyword is the real solution to this problem (and other problems). However, it is ignored by MySQL.</p>
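<p>MySQL 8.0.16 and later do enforce <strong>CHECK</strong> constraints; earlier versions do not. On such a version, the "real solution" for the original <strong>heartbeat</strong> definition might look like the sketch below. The constraint name is made up:</p>
<blockquote><pre class="brush: sql; title: ; notranslate">
-- Assumes MySQL 8.0.16+, where CHECK constraints are enforced.
CREATE TABLE heartbeat (
 id int NOT NULL PRIMARY KEY,
 ts datetime NOT NULL,
 CONSTRAINT heartbeat_single_row CHECK (id = 1)
);

-- Accepted: the only value the CHECK allows.
INSERT INTO heartbeat (id, ts) VALUES (1, NOW());

-- Rejected: violates the CHECK constraint, so no second row can appear.
REPLACE INTO heartbeat (id, ts) VALUES (73, NOW());
</pre>
</blockquote>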
<p>It is interesting to note that with a relaxed <strong>sql_mode</strong>, the <strong>INSERT INTO heartbeat (integrity_keeper, ts) VALUES (0, NOW());</strong> query succeeds. Why? The single valid <strong>ENUM</strong> value, <strong>''</strong>, has index <strong>1</strong>. In relaxed mode, <strong>0</strong> is let in as the special "error" value (index <strong>0</strong>), even though it is not a valid member, and it does not collide with the existing row (Argh!).</p>
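<p>The following sketch reproduces this in a scratch session. It assumes the <strong>heartbeat</strong> table above already holds its single row. In a numeric context an <strong>ENUM</strong> column returns its index instead of its string value, so the valid <strong>''</strong> and the error value can be told apart:</p>
<blockquote><pre class="brush: sql; title: ; notranslate">
-- Relaxed mode, for this session only:
SET sql_mode='';

-- Accepted in relaxed mode, with a warning instead of an error:
INSERT INTO heartbeat (integrity_keeper, ts) VALUES (0, NOW());
SHOW WARNINGS;

-- Both rows display as empty strings; integrity_keeper+0 yields the
-- ENUM index, which tells them apart:
SELECT integrity_keeper, integrity_keeper+0 AS enum_index, ts FROM heartbeat;

-- Back to strict mode:
SET sql_mode='STRICT_ALL_TABLES';
</pre>
</blockquote>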
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/sql-forcing-single-row-tables-integrity/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>

