<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>code.openark.org &#187; Schema</title>
	<atom:link href="http://code.openark.org/blog/tag/schema/feed" rel="self" type="application/rss+xml" />
	<link>http://code.openark.org/blog</link>
	<description>Blog by Shlomi Noach</description>
	<lastBuildDate>Tue, 07 Sep 2010 05:53:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>mk-schema-change? Check out ideas from oak-online-alter-table</title>
		<link>http://code.openark.org/blog/mysql/mk-schema-change-check-out-ideas-from-oak-online-alter-table</link>
		<comments>http://code.openark.org/blog/mysql/mk-schema-change-check-out-ideas-from-oak-online-alter-table#comments</comments>
		<pubDate>Wed, 10 Mar 2010 18:28:29 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[openark kit]]></category>
		<category><![CDATA[Schema]]></category>
		<category><![CDATA[scripts]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2144</guid>
		<description><![CDATA[In response to Mark Callaghan&#8217;s post mk-schema-change. I apologize for not commenting on the post itself, I do not hold a Facebook account. Anyway this is a long write, so it may as well deserve a post of its own. Some of the work Mark is describing already exists under openark kit&#8216;s oak-online-alter-table. Allow me [...]]]></description>
			<content:encoded><![CDATA[<p>In response to Mark Callaghan&#8217;s post <a href="http://www.facebook.com/note.php?note_id=356997370932">mk-schema-change</a>.</p>
<p>I apologize for not commenting on the post itself; I do not hold a Facebook account. Anyway, this is a long write-up, so it may as well deserve a post of its own.</p>
<p>Some of the work Mark is describing already exists under <a href="http://code.openark.org/forge/openark-kit">openark kit</a>&#8216;s <a href="http://code.openark.org/forge/openark-kit/oak-online-alter-table">oak-online-alter-table</a>. Allow me to explain what I have gained there, and how the issue can be further pursued. There is relevance to Mark&#8217;s suggestion.</p>
<p><em>oak-online-alter-table</em> uses a combination of locks, chunks and triggers to achieve an almost non-blocking <strong>ALTER TABLE</strong> effect. I had a very short opportunity to speak with Mark at last year&#8217;s conference, in between bites. Mark stated that anything involving triggers was irrelevant in his case.</p>
<p>The triggers are a pain, but I believe a few other insights from <em>oak-online-alter-table</em> can be of interest.<span id="more-2144"></span></p>
<h4>The first attempt</h4>
<p>My first attempt with the script assumed:</p>
<ul>
<li>Table has an <strong>AUTO_INCREMENT PRIMARY KEY</strong> column</li>
<li>New rows always gain ascending <strong>PRIMARY KEY</strong> values</li>
<li><strong>PRIMARY KEY</strong> never changes for an existing row</li>
<li><strong>PRIMARY KEY</strong> values are never reused</li>
<li>Rows may be deleted at will</li>
<li>No triggers exist on the table</li>
<li>No <strong>FOREIGN KEY</strong>s exist on the table.</li>
</ul>
<p>So the idea was: when one wants to do an <strong>ALTER TABLE</strong>:</p>
<ol>
<li>Create a <em>ghost</em> table with the new structure.</li>
<li>Read the minimum and maximum PK values.</li>
<li>Create <strong>AFTER INSERT</strong>, <strong>AFTER UPDATE</strong>, <strong>AFTER DELETE</strong> triggers on the original table. These triggers will propagate the changes onto the <em>ghost</em> table.</li>
<li>Working slowly, and in small chunks, copy the rows within the recorded min-max range into the <em>ghost</em> table. The interesting part is where the script makes sure there&#8217;s no contradiction between these actions and those of the triggers (whichever came first!). This is largely solved using <strong>INSERT IGNORE</strong> and <strong>REPLACE INTO</strong> in the proper context.</li>
<li>Working slowly and in chunks again, <em>remove</em> rows from the <em>ghost</em> table which no longer exist in the original table.</li>
<li>Once all chunking is complete, <strong>RENAME</strong> the original table to *_old, and the <em>ghost</em> table into the place of the original table.</li>
</ol>
<p>Steps <strong>4</strong> &amp; <strong>5</strong> are similar in concept to transactional recovery through <em>redo logs</em> and <em>undo logs</em>.</p>
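<p>The six steps above can be sketched in a few lines. This is only an illustration under loud assumptions: it uses Python with SQLite as a stand-in for MySQL (so <strong>INSERT OR IGNORE</strong> replaces <strong>INSERT IGNORE</strong>), the table <code>t</code>, the extra <code>note</code> column and the <code>on_chunk</code> callback that simulates concurrent writes are all hypothetical, and it is not the actual code of <em>oak-online-alter-table</em>.</p>

```python
import sqlite3


def online_alter_sketch(conn, chunk_size, on_chunk=None):
    """Rebuild table t with an extra column while writes keep arriving."""
    cur = conn.cursor()
    # 1. Ghost table with the new structure (a hypothetical extra "note" column).
    cur.execute("CREATE TABLE t_ghost (id INTEGER PRIMARY KEY, name TEXT, note TEXT)")
    # 2. Record the PK range as it stands right now.
    lo, hi = cur.execute("SELECT MIN(id), MAX(id) FROM t").fetchone()
    if lo is None:  # empty table: nothing to copy
        lo, hi = 0, -1
    # 3. Triggers propagate ongoing changes onto the ghost table.
    cur.executescript("""
        CREATE TRIGGER t_ai AFTER INSERT ON t BEGIN
            REPLACE INTO t_ghost (id, name) VALUES (NEW.id, NEW.name);
        END;
        CREATE TRIGGER t_au AFTER UPDATE ON t BEGIN
            REPLACE INTO t_ghost (id, name) VALUES (NEW.id, NEW.name);
        END;
        CREATE TRIGGER t_ad AFTER DELETE ON t BEGIN
            DELETE FROM t_ghost WHERE id = OLD.id;
        END;
    """)
    chunks = [(s, min(s + chunk_size - 1, hi)) for s in range(lo, hi + 1, chunk_size)]
    # 4. Copy chunk by chunk. INSERT OR IGNORE never overwrites a row
    #    that a trigger has already written with fresher data.
    for chunk_no, (start, end) in enumerate(chunks):
        cur.execute(
            "INSERT OR IGNORE INTO t_ghost (id, name) "
            "SELECT id, name FROM t WHERE id BETWEEN ? AND ?", (start, end))
        conn.commit()
        if on_chunk is not None:
            on_chunk(conn, chunk_no)  # stand-in for concurrent application writes
    # 5. Remove ghost rows that no longer exist in the original table.
    for start, end in chunks:
        cur.execute(
            "DELETE FROM t_ghost WHERE id BETWEEN ? AND ? "
            "AND id NOT IN (SELECT id FROM t)", (start, end))
        conn.commit()
    # 6. Swap the tables. MySQL can do this in a single RENAME TABLE statement;
    #    SQLite needs two statements, so this swap is not atomic.
    cur.executescript("""
        DROP TRIGGER t_ai; DROP TRIGGER t_au; DROP TRIGGER t_ad;
        ALTER TABLE t RENAME TO t_old;
        ALTER TABLE t_ghost RENAME TO t;
    """)


def concurrent_writes(conn, chunk_no):
    if chunk_no == 0:  # fires right after ids 1-2 were copied
        conn.execute("UPDATE t SET name = 'B' WHERE id = 2")  # row already copied
        conn.execute("DELETE FROM t WHERE id = 4")            # row not yet copied
        conn.execute("INSERT INTO t (name) VALUES ('f')")     # beyond recorded max


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.executemany("INSERT INTO t (name) VALUES (?)", [(n,) for n in "abcde"])
conn.commit()
online_alter_sketch(conn, chunk_size=2, on_chunk=concurrent_writes)
rows = conn.execute("SELECT id, name FROM t ORDER BY id").fetchall()
old_rows = conn.execute("SELECT id, name FROM t_old ORDER BY id").fetchall()
```

<p>In this toy run the update, the delete and the insert all happen in the middle of the copy, yet the new table ends up with the same <code>id</code> and <code>name</code> values as the old one. A single connection cannot reproduce real concurrency, so the sketch says nothing about the locking questions a real server raises.</p>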
<h4>The next attempt</h4>
<p>Next phase removed the <strong>AUTO_INCREMENT</strong> requirement, as well as the &#8220;no reuse of PK&#8221;. In fact, the only remaining constraints were:</p>
<ul>
<li>There is some <strong>UNIQUE KEY</strong> on the table which is unaffected by the <strong>ALTER</strong> operation</li>
<li>No triggers exist on the table</li>
<li>No <strong>FOREIGN KEY</strong>s exist on the table.</li>
</ul>
<p>The steps are in general very similar to those listed previously, only now a more elaborate chunking method is used, one which supports non-integer and multi-column keys. Also, the triggers take care of changes in the <strong>UNIQUE KEY</strong> values themselves.</p>
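<p>One way to walk a table in chunks over a non-integer, multi-column <strong>UNIQUE KEY</strong> is to remember the last key seen and ask for the next few rows after it. The sketch below shows that idea only; it again uses Python with SQLite as a stand-in for MySQL, the table and its <code>(country, code)</code> key are hypothetical, and the script&#8217;s real chunking algorithm may well differ.</p>

```python
import sqlite3


def iter_chunks(conn, chunk_size):
    """Yield lists of (country, code) keys, chunk by chunk, in unique-key order."""
    last = None
    while True:
        if last is None:
            rows = conn.execute(
                "SELECT country, code FROM t ORDER BY country, code LIMIT ?",
                (chunk_size,)).fetchall()
        else:
            # "Strictly after the last key seen", spelled out column by column.
            rows = conn.execute(
                "SELECT country, code FROM t "
                "WHERE country > ? OR (country = ? AND code > ?) "
                "ORDER BY country, code LIMIT ?",
                (last[0], last[0], last[1], chunk_size)).fetchall()
        if not rows:
            return
        yield rows
        last = rows[-1]  # the next chunk starts right after this key


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (country TEXT, code INTEGER, UNIQUE (country, code))")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("us", 2), ("de", 9), ("il", 1), ("de", 7), ("il", 3)])
chunks = list(iter_chunks(conn, chunk_size=2))
```

<p>Each chunk is bounded by key values rather than by row positions, so rows inserted or deleted behind the current position do not shift the chunks still to come.</p>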
<h4>mk-schema-change?</h4>
<p>Have a look at the <a href="http://code.google.com/p/openarkkit/w/list">wiki pages</a> for OnlineAlterTable*. There is some discussion of concurrency issues and of transactional behavior, which explains why <em>oak-online-alter-table</em> performs correctly. Some of these are very relevant, I believe, to Mark&#8217;s suggestion; in particular, copying in chunks, retaining transactional integrity, etc.</p>
<p>To remove any doubt, <em>oak-online-alter-table</em> is <strong>not production ready</strong> or anywhere near. Use at your own risk. I&#8217;ve seen it work, and I&#8217;ve seen it crash. I got little feedback and thus little chance to fix things. I also haven&#8217;t touched the code for quite a few months now, so I&#8217;m a little rusty myself.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/mk-schema-change-check-out-ideas-from-oak-online-alter-table/feed</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Common wrong Data Types compilation</title>
		<link>http://code.openark.org/blog/mysql/common-data-types-errors-compilation</link>
		<comments>http://code.openark.org/blog/mysql/common-data-types-errors-compilation#comments</comments>
		<pubDate>Tue, 18 Nov 2008 07:37:57 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Data Types]]></category>
		<category><![CDATA[Normalization]]></category>
		<category><![CDATA[Schema]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=85</guid>
		<description><![CDATA[During my work with companies using MySQL, I have encountered many issues with regard to schema design, normalization and indexing. Among the most common errors are incorrect data type definitions.

Here's a compilation of "the right and the wrong" data types.]]></description>
			<content:encoded><![CDATA[<p>During my work with companies using MySQL, I have encountered many issues with regard to schema design, normalization and indexing. Among the most common errors are incorrect data type definitions. Many times the database is designed by programmers or otherwise non-expert DBAs. Some companies do not have the time and cannot spare the effort of redesigning and refactoring their databases, and eventually face poor performance.</p>
<p>Here&#8217;s a compilation of &#8220;the right and the wrong&#8221; data types.<span id="more-85"></span></p>
<ul>
<li><strong><code>INT(1)</code></strong> is not one byte long. <strong><code>INT(10)</code></strong> is no bigger than <strong><code>INT(2)</code></strong>. The number in parentheses is misleading: it only describes the display width used to align the number when it is shown in an interactive shell. All mentioned types are the same INT, have the same storage capacity, and the same range. If you want a one-byte <strong><code>INT</code></strong>, use <strong><code>TINYINT</code></strong>.</li>
</ul>
<ul>
<li>An integer <strong><code>PRIMARY KEY</code></strong> is preferable, especially if you&#8217;re using the InnoDB storage engine. If possible, avoid using <strong><code>VARCHAR</code></strong> as <strong><code>PRIMARY KEY</code></strong>. In InnoDB, this will make the clustered index deeper, secondary indexes larger (sometimes much larger) and look ups slower.</li>
</ul>
<ul>
<li>Do not use <strong><code>VARCHAR</code></strong> to represent timestamps. <strong><code>'2008-11-14 07:59:13'</code></strong> may look like a textual value, but a <strong><code>TIMESTAMP</code></strong> is in fact just an integer counting the seconds elapsed since 1970-01-01. That&#8217;s 4 bytes vs. 19 if you&#8217;re using <strong><code>CHAR</code></strong> with <strong><code>ASCII</code></strong> charset, or more if you&#8217;re using <strong><code>UTF8</code></strong> or <strong><code>VARCHAR</code></strong>.</li>
</ul>
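<p>The 4 bytes vs. 19 comparison is easy to verify outside the database. A minimal sketch in Python, assuming the text is read as UTC (the variable names are mine, and this does not show how MySQL itself stores the value internally):</p>

```python
import calendar
import struct
import time

FMT = "%Y-%m-%d %H:%M:%S"
text = "2008-11-14 07:59:13"

# Seconds elapsed since 1970-01-01 00:00:00, reading the text as UTC (an assumption).
seconds = calendar.timegm(time.strptime(text, FMT))

# The same moment packed as an unsigned 32-bit integer.
packed = struct.pack("!I", seconds)

# Converting back gives the original text, so nothing was lost.
restored = time.strftime(FMT, time.gmtime(struct.unpack("!I", packed)[0]))

size_as_int = len(packed)                  # bytes used by the integer form
size_as_text = len(text.encode("ascii"))   # bytes used by the ASCII text form
```

<p>Here <code>size_as_int</code> is 4 and <code>size_as_text</code> is 19, and <code>restored</code> equals the original text.</p>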
<ul>
<li>Do not use <strong><code>VARCHAR</code></strong> to represent IPv4 addresses. This one is quite common. The IP 192.168.100.255 can be represented with <strong><code>VARCHAR(15)</code></strong>, true, but could be better represented with a 4-byte int. That&#8217;s what IPv4 is: four bytes. Use the <strong><code>INET_ATON()</code></strong> and <strong><code>INET_NTOA()</code></strong> functions to translate between the INT value and textual value.</li>
</ul>
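<p>To see what the conversion amounts to, here is a small Python sketch of the same arithmetic. The helper names <code>ipv4_to_int</code> and <code>int_to_ipv4</code> are hypothetical; inside MySQL you would simply call <strong><code>INET_ATON()</code></strong> and <strong><code>INET_NTOA()</code></strong>.</p>

```python
import ipaddress


def ipv4_to_int(text):
    """Dotted-quad text to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(text))


def int_to_ipv4(number):
    """32-bit integer value back to dotted-quad text."""
    return str(ipaddress.IPv4Address(number))


as_int = ipv4_to_int("192.168.100.255")
round_trip = int_to_ipv4(as_int)
needs_unsigned = as_int > 2 ** 31 - 1  # too large for a signed 4-byte integer
```

<p>The example address maps to 3232261375, which exceeds the signed 4-byte maximum, so the column should be declared <strong><code>INT UNSIGNED</code></strong>.</p>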
<ul>
<li>This one should be obvious, but I&#8217;ve seen it in reality, where the schema was auto generated by some naive generator: do not represent numbers as text. Yes, I have seen integer columns represented by <strong><code>VARCHAR</code></strong>. Don&#8217;t ask how the performance was.</li>
</ul>
<ul>
<li><strong><code>MD5()</code></strong> columns shouldn&#8217;t be <strong><code>VARCHAR</code></strong>. Use <strong><code>CHAR(32)</code></strong> instead. The value is always 32 characters long, so there is no need for <strong><code>VARCHAR</code></strong>&#8216;s additional byte overhead. If your tables or database are <strong><code>UTF8</code></strong> by default, make sure the MD5 column&#8217;s charset is <strong><code>ASCII</code></strong>, or it will consume 96 bytes instead of just 32. I also suggest the case-sensitive <strong><code>ascii_bin</code></strong> collation, but that&#8217;s a more minor issue.</li>
</ul>
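<p>A quick check of why a fixed-width <strong><code>ASCII</code></strong> column is enough, sketched in Python with an arbitrary input of my own choosing:</p>

```python
import hashlib
import string

digest = hashlib.md5(b"any input at all").hexdigest()

# The hexadecimal form has the same length whatever the input was.
is_fixed_length = len(digest) == 32

# It only ever contains the characters 0-9 and a-f, all of them plain ASCII.
is_plain_hex = set(digest).issubset(set(string.hexdigits.lower()))

ascii_bytes = len(digest.encode("ascii"))  # one byte per character
```

<p>Since every character is plain ASCII, one byte per character is all the column ever needs.</p>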
<ul>
<li><strong><code>PASSWORD()</code></strong> columns shouldn&#8217;t be <strong><code>VARCHAR</code></strong>, but <strong><code>CHAR</code></strong>. The length depends on whether you&#8217;re using the <strong><code>old-passwords</code></strong> variable (for some strange reason, this variable always appears in the MySQL sample configuration files &#8211; though you really don&#8217;t want it unless it&#8217;s for backward compatibility with older MySQL versions). As in the MD5 note, use <strong><code>ASCII</code></strong> charset.</li>
</ul>
<ul>
<li>Prefer <strong><code>TIMESTAMP</code></strong> over <strong><code>INT</code></strong> for counting seconds, as MySQL has many supporting functions for this data type.</li>
</ul>
<ul>
<li>Use <strong><code>TINYINT</code></strong>, <strong><code>SMALLINT</code></strong>, <strong><code>MEDIUMINT</code></strong> instead of <strong><code>INT</code></strong> when possible. Do you expect to have 4000000000 customers? No? Then an &#8220;<strong><code>id SMALLINT</code></strong>&#8221; may suffice as <strong><code>PRIMARY KEY</code></strong>.</li>
</ul>
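<p>To make the choice concrete, the sketch below derives the largest unsigned value of each integer type from its size in bytes and picks the smallest type that fits an expected maximum. It is written in Python, assumes <strong><code>UNSIGNED</code></strong> columns, and the helper names are hypothetical.</p>

```python
# Storage size in bytes of MySQL's integer types, smallest first.
INT_TYPES = [
    ("TINYINT", 1),
    ("SMALLINT", 2),
    ("MEDIUMINT", 3),
    ("INT", 4),
    ("BIGINT", 8),
]


def unsigned_max(nbytes):
    """Largest value an unsigned integer of nbytes bytes can hold."""
    return 2 ** (8 * nbytes) - 1


def smallest_unsigned_type(expected_max):
    """Name of the smallest type whose unsigned range covers expected_max."""
    for name, nbytes in INT_TYPES:
        if unsigned_max(nbytes) >= expected_max:
            return name
    raise ValueError("value does not fit in any integer type")


for_60k_customers = smallest_unsigned_type(60000)
for_100k_customers = smallest_unsigned_type(100000)
for_4bn_customers = smallest_unsigned_type(4000000000)
```

<p>With these numbers, 60000 customers fit in a <strong><code>SMALLINT</code></strong>, 100000 need a <strong><code>MEDIUMINT</code></strong>, and only the 4000000000 case calls for a full <strong><code>INT</code></strong>. Signed columns hold roughly half of each maximum.</p>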
<ul>
<li>Use <strong><code>CHARACTER SET</code></strong>s with care. More on this in future posts.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/common-data-types-errors-compilation/feed</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
	</channel>
</rss>
