<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>code.openark.org &#187; Development</title>
	<atom:link href="http://code.openark.org/blog/category/development/feed" rel="self" type="application/rss+xml" />
	<link>http://code.openark.org/blog</link>
	<description>Blog by Shlomi Noach</description>
	<lastBuildDate>Sat, 08 Jun 2013 04:41:13 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Documentation in SQL: CALL for help()</title>
		<link>http://code.openark.org/blog/mysql/documentation-in-sql-call-for-help</link>
		<comments>http://code.openark.org/blog/mysql/documentation-in-sql-call-for-help#comments</comments>
		<pubDate>Wed, 11 Jan 2012 07:01:54 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[common_schema]]></category>
		<category><![CDATA[documentation]]></category>
		<category><![CDATA[mycheckpoint]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[openark kit]]></category>
		<category><![CDATA[Stored routines]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=4536</guid>
		<description><![CDATA[Documentation is an important part of any project. On the projects I maintain I put a lot of effort on documentation, and, frankly, the majority of time spent on my projects is on documentation. The matter of keeping the documentation faithful is a topic of interest. I'd like to outline a few documentation bundling possibilities, [...]]]></description>
				<content:encoded><![CDATA[<p>Documentation is an important part of any project. On the projects I maintain I put a lot of effort on documentation, and, frankly, the majority of time spent on my projects is on documentation.</p>
<p>The matter of keeping the documentation faithful is a topic of interest. I'd like to outline a few documentation bundling possibilities, and the present the coming new documentation method for <a href="http://code.google.com/p/common-schema/" rel="nofollow">common_schema</a>. I'll talk about any bundling that is NOT <em>man pages</em>.</p>
<h4>High level: web docs</h4>
<p>This is the initial method of documentation I used for <a title="openark kit" href="../../forge/openark-kit">openark kit</a> and <a title="mycheckpoint" href="../../forge/mycheckpoint">mycheckpoint</a>. It's still valid for <em>mycheckpoint</em>. Documentation is web-based. You need Internet access to read it. It's in HTML format.</p>
<p>Well, not exactly HTML format: I wrote it in WordPress. Yes, it's HTML, but there's a lot of noise around (theme, menus, etc.) which is not strictly part of the documentation.</p>
<p>While this is perhaps the easiest way to go, here's a few drawbacks:<span id="more-4536"></span></p>
<ul>
<li>You're bound to some framework (WordPress in this case)</li>
<li>Docs are split between MySQL database (my underlying WordPRess storage) &amp; WordPress files (themes, style, header, footer etc.)</li>
<li>Documentation is separate from your code - they're just not in the same place</li>
<li>There is no version control over the documentation.</li>
</ul>
<p>The result is a single source of documentation, which applies to whatever version is latest. It's impossible to maintain docs for multiple versions. You must manually synchronize your WordPress updates with code commits (or rather - code release!).</p>
<h4>Mid level: version controlled HTML docs</h4>
<p>I first saw this approach on Baron's <a href="http://www.xaprb.com/blog/2010/09/22/aspersa-gets-a-user-manual/" rel="bookmark">Aspersa gets a user manual</a> post. I loved it: the documentation is HTML, but stored as part of your project's code, in same version control.</p>
<p>This means one can <a href="http://openarkkit.googlecode.com/svn/trunk/openarkkit/doc/html/introduction.html">browse the documentation</a> (<em>openark kit</em> in this example) exactly as it appears in the baseline. Depending on your project hosting, one may be able to do so per version.</p>
<p>The approach has the great benefit of having the docs tightly coupled with the code in terms of development. Before committing code, one updates documentation for that code, then commits/releases both together.</p>
<p>You're also not bound to any development framework. You may edit with <em>vim, emacs, gedit, bluefish, eclipse,</em> ... any tool of your choice. It's all down to plain old text files.</p>
<h4>Mid level #2: documentation bundling</h4>
<p>One thing I started doing with common_schema is to release a doc bundle with the code. So one can download a compressed bundle of all HTML files. That way one is absolutely certain what's the right documentation for revision <strong>178</strong>. There's no effort about it: the docs are already tightly coupled with code versions. Just compress and distribute.</p>
<h4>Low level: documentation coupled with your code</h4>
<p>Perl scripts can be written as Perl modules, in which case they are eligible for using the <em>perldoc</em> convention. You code your documentation within your script itself, as comment. <em>Perldoc</em> can extract the documentation and present in man-like format. Same happens with Python's <em>pydoc</em>. Baron's <a href="http://www.xaprb.com/blog/2011/11/07/when-documentation-is-code/" rel="bookmark">When documentation is code</a> illustrates that approach. <a href="http://www.maatkit.org/">Maatkit</a> (now <em>Percona Toolkit</em>) has been using it for years.</p>
<p>This method has the advantage of having the documentation ready right within your shell. You don't need a browser, nor firewall access. The docs are just there for you in the same environment where you're executing the code.</p>
<h4>SQL Low level: CALL for help()</h4>
<p><em>common_schema</em> is a different type of project. It is merely a schema. There's no Perl nor Python. One imports the schema into one's MySQL server.</p>
<p>What's the low-level approach for this type of code?</p>
<p>For <em>common_schema</em> I use three levels of documentation: the mid-level, where one can <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/introduction.html">browse through the versioned docs</a>, the 2nd mid-level, where one can <a href="http://code.google.com/p/common-schema/downloads/list">download bundled documentation</a>, and then a low-level approach: documentation embedded within the code.</p>
<p>MySQL's documentation is also built into the server: see the <strong>help_*</strong> tables within the <strong>mysql</strong> schema. The <em>mysql</em> command line client allows one to access help by supporting the help command, e.g.</p>
<blockquote>
<pre>mysql&gt; help create table;</pre>
</blockquote>
<p>The client intercepts this command (this is not server side command) and searches through the <strong>mysql.help_*</strong> docs.</p>
<p>With <em>common_schema</em>, I don't have control over the client; it's all on server side. But the code being a schema, what with stored routines and tables, it's easy enough to set up documentation.</p>
<p>As of the next version of <em>common_schema</em>, and following MySQL's method, <em>common_schema</em> provides a <strong>help</strong> table:</p>
<blockquote>
<pre>DESC help;
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| topic        | varchar(32) | NO   | PRI | NULL    |       |
| help_message | text        | NO   |     | NULL    |       |
+--------------+-------------+------+-----+---------+-------+</pre>
</blockquote>
<p>And a <strong>help()</strong> procedure, so that you can call for <em>help()</em>. The procedure will look for the best matching document based on your search expression:</p>
<blockquote>
<pre>root@mysql-5.1.51&gt; <strong>CALL help('match');</strong>
<strong>+---------------------------------------</strong>----------------------------------------+
| help                                                                          |
+-------------------------------------------------------------------------------+
|                                                                               |
| NAME                                                                          |
|                                                                               |
| match_grantee(): Match an existing account based on user+host.                |
|                                                                               |
| TYPE                                                                          |
|                                                                               |
| Function                                                                      |
|                                                                               |
| DESCRIPTION                                                                   |
|                                                                               |
| MySQL does not provide with identification of logged in accounts. It only     |
| provides with user + host:port combination within processlist. Alas, these do |
| not directly map to accounts, as MySQL lists the host:port from which the     |
| connection is made, but not the (possibly wildcard) user or host.             |
| This function matches a user+host combination against the known accounts,     |
| using the same matching method as the MySQL server, to detect the account     |
| which MySQL identifies as the one matching. It is similar in essence to       |
| CURRENT_USER(), only it works for all sessions, not just for the current      |
| session.                                                                      |
|                                                                               |
| SYNOPSIS                                                                      |
|                                                                               |
|                                                                               |
|                                                                               |
|        match_grantee(connection_user char(16) CHARSET utf8,                   |
|        connection_host char(70) CHARSET utf8)                                 |
|          RETURNS VARCHAR(100) CHARSET utf8                                    |
|                                                                               |
|                                                                               |
| Input:                                                                        |
|                                                                               |
| * connection_user: user login (e.g. as specified by PROCESSLIST)              |
| * connection_host: login host. May optionally specify port number (e.g.       |
|   webhost:12345), which is discarded by the function. This is to support      |
|   immediate input from as specified by PROCESSLIST.                           |
|                                                                               |
|                                                                               |
| EXAMPLES                                                                      |
|                                                                               |
| Find an account matching the given use+host combination:                      |
|                                                                               |
|                                                                               |
|        mysql&gt; SELECT match_grantee('apps', '192.128.0.1:12345') AS            |
|        grantee;                                                               |
|        +------------+                                                         |
|        | grantee    |                                                         |
|        +------------+                                                         |
|        | 'apps'@'%' |                                                         |
|        +------------+                                                         |
|                                                                               |
|                                                                               |
|                                                                               |
| ENVIRONMENT                                                                   |
|                                                                               |
| MySQL 5.1 or newer                                                            |
|                                                                               |
| SEE ALSO                                                                      |
|                                                                               |
| processlist_grantees                                                          |
|                                                                               |
| AUTHOR                                                                        |
|                                                                               |
| Shlomi Noach                                                                  |
|                                                                               |
+-------------------------------------------------------------------------------+</pre>
</blockquote>
<p>I like HTML for documentation. I think it's a good format, provided you don't start doing funny things. Perhaps <em>TROFF</em> is more suitable; certainly more popular on Unix machines. But I already have everything in HTML. So, what do I do?</p>
<p>My decision was to keep documentation in HTML, and use the handy <em>html2text</em> tool to do the job. And it does it pretty well! The sample you see above is an automated translation of HTML to plain text.</p>
<p>I add a few touches of my own: SELECTing long texts is ugly, whether you do it via "<strong>;</strong>" or "<strong>\G</strong>". The <strong>help()</strong> routine breaks the text by '<strong>\n</strong>', returning a multi row result set. The above sample makes for some <strong>60+</strong> rows, nicely formatted, broken from the original single text appearing in the <strong>help</strong> table.</p>
<p>So now you have an internal help method for <em>common_schema</em>, right where the code is. You don't have to leave the command line client in order to get help.</p>
<p><a href="http://datacharmer.blogspot.com/">Giuseppe</a> offered me the idea for this, even while my own thinking about it was in early stages.</p>
<p>The next version of <em>common_schema</em> will be available in a few weeks. The code is pretty much ready. I just need to work on, ahem..., the documentation.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/documentation-in-sql-call-for-help/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>More MySQL foreach()</title>
		<link>http://code.openark.org/blog/mysql/more-mysql-foreach</link>
		<comments>http://code.openark.org/blog/mysql/more-mysql-foreach#comments</comments>
		<pubDate>Fri, 02 Dec 2011 13:55:32 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[common_schema]]></category>
		<category><![CDATA[Hack]]></category>
		<category><![CDATA[INFORMATION_SCHEMA]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[scripts]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=4171</guid>
		<description><![CDATA[In my previous post I've shown several generic use cases for foreach(), a new scripting functionality introduced in common_schema. In this part I present DBA's handy syntax for schema and table operations and maintenance. Confession: while I love INFORMATION_SCHEMA's power, I just hate writing queries against it. It's just so much typing! Just getting the [...]]]></description>
				<content:encoded><![CDATA[<p>In my <a href="http://code.openark.org/blog/mysql/mysql-foreach">previous post</a> I've shown several generic use cases for <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/foreach.html"><em>foreach()</em></a>, a new scripting functionality introduced in <a href="http://code.google.com/p/common-schema/" rel="nofollow">common_schema</a>.</p>
<p>In this part I present DBA's handy syntax for schema and table operations and maintenance.</p>
<p>Confession: while I love <strong>INFORMATION_SCHEMA</strong>'s power, I just <em>hate</em> writing queries against it. It's just so much typing! Just getting the list of tables in a schema makes for this heavy duty query:</p>
<blockquote>
<pre>SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='sakila' AND TABLE_TYPE='BASE TABLE';</pre>
</blockquote>
<p>When a join is involved this really becomes a nightmare. I think it's cumbersome, and as result, many do not remember the names and meaning of columns, making for <em>"oh, I need to read the manual all over again just to get that query right"</em>. Anyway, that's my opinion.</p>
<p>A <strong>SHOW TABLES</strong> statement is easier to type, but cannot be integrated into a <strong>SELECT</strong> query (though <a href="http://code.openark.org/blog/mysql/reading-results-of-show-statements-on-server-side">we have a partial solution</a> for that, too), and besides, when filtering out the views, the <strong>SHOW</strong> statement becomes almost as cumbersome as the one on <strong>INFORMATION_SCHEMA</strong>.</p>
<p>Which is why <em>foreach()</em> offers handy shortcuts to common iterations on schemata and tables, as follows:</p>
<h4>Use case: iterate all databases</h4>
<blockquote>
<pre>call foreach(<span style="color: #808000;">'schema'</span>, <span style="color: #003366;">'CREATE TABLE ${schema}.event(event_id INT, msg VARCHAR(128))'</span>);</pre>
</blockquote>
<p>In the above we execute a query on each database. Hmmm, maybe not such a good idea to perform this operation on all databases? Let's filter them:</p>
<h4>Use case: iterate databases by name match</h4>
<blockquote>
<pre>call foreach(<span style="color: #808000;">'schema like wordpress_%'</span>, <span style="color: #003366;">'ALTER TABLE ${schema}.wp_posts MODIFY COLUMN comment_author VARCHAR(96) NOT NULL'</span>);</pre>
</blockquote>
<p>The above will only iterate my WordPress databases (I have several of these), performing an <strong>ALTER</strong> on <strong>wp_posts</strong> for each of those databases.<span id="more-4171"></span></p>
<p>I don't have to quote the <em>like</em> expression, but I can, if I wish to.</p>
<p>I can also use a regular expression match:</p>
<blockquote>
<pre>call foreach(<span style="color: #808000;">'schema ~ /^wordpress_[0-9]+$/'</span>, <span style="color: #003366;">'ALTER TABLE ${schema}.wp_posts MODIFY COLUMN comment_author VARCHAR(96) NOT NULL'</span>);</pre>
</blockquote>
<h4>Use case: iterate tables in a specific schema</h4>
<p>Time to upgrade our <strong>sakila</strong> tables to InnoDB's compressed format. We use <strong>$()</strong>, a synonym for <em>foreach()</em>.</p>
<blockquote>
<pre>call $(<span style="color: #808000;">'table in sakila'</span>, <span style="color: #003366;">'ALTER TABLE ${schema}.${table} ENGINE=InnoDB ROW_FORMAT=COMPRESSED'</span>);</pre>
</blockquote>
<p>The above will iterate on tables in <strong>sakila</strong>. I say <em>tables</em>, since it will avoid iterating views (there is still no specific syntax for views iteration). This is done on purpose, as my experience shows there is very little in common between tables and views when it comes to maintenance and operations.</p>
<h4>Use case: iterate tables by name match</h4>
<p>Here's a interesting scenario: you wish to work on all tables matching some name. The naive approach would be to:</p>
<blockquote>
<pre>SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'wp_posts' AND TABLE_TYPE = 'BASE TABLE'</pre>
</blockquote>
<p><em><strong>Wait!</strong></em> Are you aware this may bring your server down? This query will open all databases at once, opening all <strong>.frm</strong> files (though thankfully not data files, since we only check for name and type).</p>
<p>Here's a better approach:</p>
<blockquote>
<pre>call foreach(<span style="color: #808000;">'table like wp_posts'</span>, <span style="color: #003366;">'ALTER TABLE ${schema}.${table} ENGINE=InnoDB'</span>);</pre>
</blockquote>
<p>(There's now FULLTEXT to InnoDB, so the above can make sense in the near future!)</p>
<p>The good part is that <em>foreach()</em> will look for matching tables <em>one database at a time</em>. It will iterate the list of database, then look for matching tables per database, thereby optimizing the query on <strong>INFORMATION_SCHEMA</strong>.</p>
<p>Here, too, I can use regular expressions:</p>
<blockquote>
<pre>call $(<span style="color: #808000;">'table ~ /^wp_.*$/'</span>, <span style="color: #003366;">'ALTER TABLE ${schema}.${table} ENGINE=InnoDB'</span>);</pre>
</blockquote>
<h4>Conclusion</h4>
<p>This is work in the making, but, as someone who maintains a few productions servers, I've already put it to work.</p>
<p>I'm hoping the syntax is easy to comprehend. I know that since I developed it it must be far more intuitive to myself than to others. I've tried to keep close on common syntax and concepts from various programming languages.</p>
<p>I would like to get as much feedback as possible. I have further ideas and thoughts on the direction <a href="http://code.google.com/p/common-schema/">common_schema</a> is taking, but wish take it in small steps. Your feedback is appreciated!</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/more-mysql-foreach/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Test-driven SQL development</title>
		<link>http://code.openark.org/blog/mysql/test-driven-sql-development</link>
		<comments>http://code.openark.org/blog/mysql/test-driven-sql-development#comments</comments>
		<pubDate>Thu, 20 Oct 2011 17:55:04 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Ant]]></category>
		<category><![CDATA[common_schema]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[scripts]]></category>
		<category><![CDATA[Stored routines]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=4036</guid>
		<description><![CDATA[I'm having a lot of fun writing common_schema, an SQL project which includes views, tables and stored routines. As the project grows (and it's taking some interesting directions, in my opinion) more dependencies are being introduced, and a change to one routine or view may affect many others. This is why I've turned the development [...]]]></description>
				<content:encoded><![CDATA[<p>I'm having a lot of fun writing <a href="http://code.google.com/p/common-schema/" rel="nofollow">common_schema</a>, an SQL project which includes views, tables and stored routines.</p>
<p>As the project grows (and it's taking some interesting directions, in my opinion) more dependencies are being introduced, and a change to one routine or view may affect many others. This is why I've turned the development on <em>common_schema</em> to be <em>test driven</em>.</p>
<p>Now, just how do you test drive an SQL project?</p>
<p>Well, much like the way you test any other project in your favorite programming language. If its functions you're testing, that's all too familiar: functions get some input and provide some output. Hmmm, they might be changing SQL data during that time. With procedures it's slightly more complex, since they do not directly return output but result sets.</p>
<p>Here's the testing scheme I use:<span id="more-4036"></span></p>
<ul>
<li>Tests are divided to families. For example, there is a family of tests for the <em>eval()</em> function.</li>
<li>Each test in a family is responsible for checking the simplest, most "atomic" issue. This means many small tests.</li>
<li>Each test can have a <em>"pre-test"</em> step, which prepares the ground (for example, create a table and populate it)</li>
<li>Likewise, a test can have a <em>"post-test"</em> step, which is typically just cleanup code (since the test is already complete by the time the post step is invoked).</li>
<li>Each test is an SQL file: a set of commands to be executed.</li>
<li>A test may have an <em>"expected output"</em> file.</li>
</ul>
<ul>
<li>If no explicit <em>expected</em> exists, the test is expected to return <strong>"1"</strong> (just as the most basic <em>JUnit</em> test assumes an "assert true")</li>
<li>A test family may also have <em>pre-</em> and <em>post-</em> steps.</li>
<li>Any failure in any step fails the entire process. Failures may include:
<ul>
<li>Failure to prepare the grounds for a test or family of tests</li>
<li>Failure in executing the test</li>
<li>Mismatch between test's output and expected result.</li>
<li>Failure in executing the <em>post-</em> step (may indicate yet invalid test result not intercepted by the test)</li>
</ul>
</li>
</ul>
<h4>An example</h4>
<p>The following image presents a single test family: the <em>eval</em> family, testing the <em>eval()</em> routine.</p>
<blockquote><p><a href="http://code.openark.org/blog/wp-content/uploads/2011/10/test-driven-sql-development-01.png"><img class="size-full wp-image-4205 alignnone" title="test-driven-sql-development-01" src="http://code.openark.org/blog/wp-content/uploads/2011/10/test-driven-sql-development-01.png" alt="Test driven SQL development - sample" width="198" height="258" /></a></p></blockquote>
<ul>
<li>In this family, there are two tests.</li>
<li>In both tests, we have a <em>pre-test</em> step, and a test.</li>
<li>We have no <em>post-test</em> here.</li>
<li>Nor do we have an <em>expected-output</em> sample, which means the tests expect to return with <strong>"1"</strong>.</li>
</ul>
<h4>Implementation</h4>
<p>But how are tests conducted? Via <em>mysql</em>, of course. While tests are plain SQL text file, they are being executed against a running MySQL server using the <em>mysql</em> client. It is given the test files as input, and its output is directed to file as well.</p>
<p>This makes it very easy to code the test using a simple shell script. It takes a small measure of file iteration, process invocation, exit code check, and <em>diff</em> execution.</p>
<p>For example, to test <em>eval()</em>'s <strong>01</strong> test, we first execute <em>mysql</em> with <strong>01/pre.sql</strong> as input. Assuming success, we execute <em>mysql</em> again, this time with <strong>01/test.sql</strong>. We capture the output of this execution, and compare it with <em>expected-output</em>, or with <strong>"1"</strong> when no <em>expected-output</em> specified.</p>
<h4>Tests pass, or no code!</h4>
<p>Some <strong>12</strong> years ago, I worked with a less-known version system called <a href="http://aegis.sourceforge.net/documents.html">aegis</a>. The thing I remember most from <em>aegis</em> was that it had a good tests infrastructure. Long before "test-driven development" was coined, or was even commonly practiced, <em>aegis</em> supported tests right into your version control. "Right into", in the sense that <em>you could not merge your code back to the baseline</em> if it didn't pass all of the tests.</p>
<p>I work with SVN for <em>common_schema</em>, and I do not know of such an option in SVN. But I also use <em>ant</em> to build this project, and the dependency is clear there: <strong>ant dist</strong>, my target which creates the distribution files, is dependent on <strong>ant test</strong>, the target which works out the tests.</p>
<p>That is, you cannot generate the distribution files when tests fail.</p>
<h4>Further notes</h4>
<p>Since I'm now retroactively patching tests for already existing functionality, calling it <em>test-driven</em> development is an overstatement; nevertheless new tests are already proving invaluable when I keep changing and improving existing code. Suddenly dependent functionality no longer works as expected. What fun!</p>
<p><a href="http://code.google.com/p/common-schema/source/browse/trunk/common_schema/tests/test_all.sh">The code</a> for the testing suite is actually much shorter than this blog post.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/test-driven-sql-development/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>oak-hook-general-log: your poor man&#039;s Query Analyzer</title>
		<link>http://code.openark.org/blog/mysql/oak-hook-general-log-your-poor-mans-query-analyzer</link>
		<comments>http://code.openark.org/blog/mysql/oak-hook-general-log-your-poor-mans-query-analyzer#comments</comments>
		<pubDate>Wed, 15 Dec 2010 17:46:06 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Analysis]]></category>
		<category><![CDATA[logs]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[openark kit]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[scripts]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=3032</guid>
		<description><![CDATA[The latest release of openark kit introduces oak-hook-general-log, a handy tool which allows for some analysis of executing queries. Initially I just intended for the tool to be able to dump the general log to standard output, from any machine capable to connect to MySQL. Quick enough, I realized the power it brings. With this [...]]]></description>
				<content:encoded><![CDATA[<p>The latest release of <a href="http://code.openark.org/forge/openark-kit">openark kit</a> introduces <a href="http://openarkkit.googlecode.com/svn/trunk/openarkkit/doc/html/oak-hook-general-log.html">oak-hook-general-log</a>, a handy tool which allows for some analysis of executing queries.</p>
<p>Initially I just intended for the tool to be able to dump the general log to standard output, from any machine capable to connect to MySQL. Quick enough, I realized the power it brings.</p>
<p>With this tool, one can dump to standard output all queries using temporary tables; or using a specific index; or doing a full index scan; or just follow up on connections; or... For example, the following execution will only log queries which make for filesort:</p>
<blockquote>
<pre>oak-hook-general-log --user=root --host=localhost --password=123456 --filter-explain-filesort</pre>
</blockquote>
<h4>The problem with using the standard logs</h4>
<p>So you have the <em>general log</em>, which you don't often enable, since it tends to grow huge within moments. You then have the <em>slow log</em>. Slow log is great, and is among the top tools for MySQL diagnosis.</p>
<p>The slow log allows for <strong>log-queries-not-using-indexes</strong>, which is yet another nice feature. Not only should you log any query running for over <strong>X</strong> seconds, but also log any query which does not use an index.</p>
<p>Wait. This logs all single-row tables (no single row table will use an index), as well as very small tables (a common <strong>20</strong> rows lookup table will most often be scanned). These are OK scans. This makes for some noise in the slow log.</p>
<p>And how about queries which do use an index, but do so poorly? They use an index, but retrieve some <strong>12,500,000</strong> rows, <em>using temporary</em> table &amp; <em>filesort</em>?</p>
<h4>What oak-hook-general-log does for you</h4>
<p>This tool streams out the general log, and filters out queries based on their <em>role</em> or on their <em>execution plan</em>.</p>
<p>To work at all, it must enable the general log. Moreover, it directs the general log to log table. Mind that this makes for a performance impact, which is why the tool auto-terminates and restores original log settings (default is <strong>1</strong> minute, configurable). It's really not a tool you should keep running for days. But during the few moments it runs, it will:</p>
<ul>
<li>Routinely rotate the <strong>mysql.general_log</strong> table so that it doesn't fill up</li>
<li>Examine entries found in the general log</li>
<li>Cross reference entries to the PROCESSLIST so as to deduce database context (<a href="http://bugs.mysql.com/bug.php?id=52554">bug #52554</a>)</li>
<li>If required and appropriate, evaluate a query's execution plan</li>
<li>Decide whether to dump each entry based on filtering rules</li>
</ul>
<h4>Filtering rules</h4>
<p>Filtering rules are passed as command line options. At current, only one filtering rule applies (if more than one specified only one is used, so no point in passing more than one). Some of the rules are:<span id="more-3032"></span></p>
<ul>
<li><strong>filter-connection</strong>: only log connect/quit entries</li>
<li><strong>filter-explain-fullscan</strong>: only log full table scans</li>
<li><strong>filter-explain-temporary</strong>: only log queries which create implicit temporary tables</li>
<li><strong>filter-explain-rows-exceed</strong>: only log queries where more than <strong>X</strong> number of rows are being accessed on some table (estimated)</li>
<li><strong>filter-explain-total-rows-exceed</strong>: only log queries where more than <strong>X</strong> number of rows are accessed on all tables combined (estimated, with possibly incorrect numbers on some queries)</li>
<li><strong>filter-explain-key</strong>: only log queries using a specific index. This feature somewhat overlaps with Maatkit's <em>mk-index-usage</em> (read <a href="http://www.mysqlperformanceblog.com/2010/11/11/advanced-index-analysis-with-mk-index-usage/">announcement</a>).</li>
<li><strong>filter-explain-contains</strong>: a general purpose <em>grep</em> on the execution plan. Log queries where the execution plan contains <em>some text</em>.</li>
</ul>
<p>There are other filters, and I will possibly add more in due time.</p>
<p>Here are a couple cases I used <em>oak-hook-general-log</em> for:</p>
<h4>Use case: temporary tables</h4>
<p>I have a server with this alarming chart (courtesy <a href="http://code.openark.org/forge/mycheckpoint">mycheckpoint</a>) of temporary tables:</p>
<blockquote>
<pre><img class="alignnone" title="Created tmp tables per second" src="http://chart.apis.google.com/chart?cht=lc&amp;chs=370x180&amp;chts=303030,12&amp;chtt=Latest+24+hours:+Dec+9,+06:30++-++Dec+10,+06:30&amp;chf=c,s,ffffff&amp;chdl=created_tmp_tables_psec|created_tmp_disk_tables_psec&amp;chdlp=b&amp;chco=ff8c00,4682b4&amp;chd=s:yzzy02zzz100zzz0rv9zz0zyzyz0yy2xz1t11xzztz0xr1xt2tz07vwzz100100z31z111yz1vzzzzz1zs80r902s1111010y20z03z11487zz011z11011002w0q5rxxz0y00z0s02xy1yy0,gggfghggfgggghhgYekhhghhhhhghfjghhdihfhgdghgZhgcicihpcehhhhhhhifkigjihghjehgiigjgYqiYqgiaihiifkhekhfijgiihhggggggggggfhgghffZoYgggggggggdihfggghg&amp;chxt=x,y&amp;chxr=1,0,35.060000&amp;chxl=0:||08:00||+||12:00||+||16:00||+||20:00||+||00:00||+||04:00||+|&amp;chxs=0,505050,10,0,lt&amp;chg=4.17,25,1,2,2.08,0&amp;chxp=0,2.08,6.25,10.42,14.59,18.76,22.93,27.10,31.27,35.44,39.61,43.78,47.95,52.12,56.29,60.46,64.63,68.80,72.97,77.14,81.31,85.48,89.65,93.82,97.99&amp;tsstart=2010-12-09+06:30:00&amp;tsstep=600" alt="" width="370" height="180" />
</pre>
</blockquote>
<p>What could possibly create <strong>30</strong> temporary tables per second on average?</p>
<p>The slow log produced nothing helpful, even with <strong>log-queries-not-using-indexes</strong> enabled. There were a lot of queries not using indexes there, but nothing at these numbers. With:</p>
<blockquote>
<pre>oak-hook-general-log --filter-explain-temporary</pre>
</blockquote>
<p>enabled for <strong>1</strong> minute, nothing came out. Weird. Enabled for <strong>5</strong> minutes, I got one entry. Turned out a scheduled script, acting once per <strong>5</strong> minutes, was making a single complicated query involving many nested views, which accounted for some <em>hundreds</em> of temporary tables created. All of them very small, query time was very fast. There is no temporary tables problem with this server, case closed.</p>
<h4>Use case: connections</h4>
<p>A server had issues with some exceptions being thrown on the client side. There was a large number of new connections created per second although the client was using a connection pool. Suspecting the pool didn't work well, I issued:</p>
<blockquote>
<pre>oak-hook-general-log --filter-connect</pre>
</blockquote>
<p>The pool was working well, all right. No entries for that client were recorder in <strong>1</strong> minute of testing. However, it turned out some old script was flooding the MySQL server with requests, every second. The log showed root@somehost, and sure enough, the script was disabled. Exceptions were due to another reason; it was good to eliminate a suspect.</p>
<p>Some of the tool's use case is relatively easy to solve with tail, grep &amp; awk; others are not. I am using it more and more often, and find it to make significant shortcuts in tracking down queries.</p>
<h4>Get it</h4>
<p>Download the tool as part of <em>openark kit</em>: access the <a href="http://code.google.com/p/openarkkit/">openark kit project page</a>.</p>
<p>Or get the <a href="http://openarkkit.googlecode.com/svn/trunk/openarkkit/src/oak/oak-hook-general-log.py">source code</a> directly.</p>
<p>Feedback is most welcome.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/oak-hook-general-log-your-poor-mans-query-analyzer/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>openark-kit (rev. 170): new tools, new functionality</title>
		<link>http://code.openark.org/blog/mysql/openark-kit-rev-170-new-tools-new-functionality</link>
		<comments>http://code.openark.org/blog/mysql/openark-kit-rev-170-new-tools-new-functionality#comments</comments>
		<pubDate>Wed, 15 Dec 2010 06:31:24 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Analysis]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[openark kit]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[Replication]]></category>
		<category><![CDATA[scripts]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=3124</guid>
		<description><![CDATA[I'm pleased to announce a new release of the openark kit. There's a lot of new functionality inside; following is a brief overview. The openark kit is a set of utilities for MySQL. They solve everyday maintenance tasks, which may be complicated or time consuming to work by hand. It's been a while since the [...]]]></description>
				<content:encoded><![CDATA[<p>I'm pleased to announce a new release of the <a href="http://code.openark.org/forge/openark-kit">openark kit</a>. There's a lot of new functionality inside; following is a brief overview.</p>
<p>The <em>openark kit</em> is a set of utilities for MySQL. They  solve everyday maintenance tasks, which may be complicated or time  consuming to work by hand.</p>
<p>It's been a while since the last announced release. Most of my attention was on <a href="http://code.openark.org/forge/mycheckpoint">mycheckpoint</a>, building new features, writing documentation etc. However my own use of <em>openark kit</em> has only increased in the past few months, and there's new useful solutions to common problems that have been developed.</p>
<p>I've used and improved many tools over this time, but doing the final cut, along with proper documentation, took some time. Anyway, here are the highlights:</p>
<h4>New tool: oak-hook-general-log</h4>
<p><em>oak-hook-general-log</em> hooks up a MySQL server and dumps the general log based on filtering rules, applying to query role or execution plan. It is possible to only dump connect/disconnect entries, queries which make a full table scan, or use temporary tables, or scan more than X number of rows, or...</p>
<p>I'll write more on this tool shortly.</p>
<h4>New tool: oak-prepare-shutdown</h4>
<p>This tool makes for an orderly and faster shutdown by safely stopping replication, and flushing InnoDB pages to disk prior to shutting down (keeping server available for connections even while attempting to flush dirty pages to disk). A typical use case would be:</p>
<blockquote>
<pre>oak-prepare-shutdown --user=root --ask-pass --socket=/tmp/mysql.sock &amp;&amp; /etc/init.d/mysql stop</pre>
</blockquote>
<h4>New tool: oak-repeat query</h4>
<p><em>oak-repeat-query</em> repeats executing a given query until some condition holds. The condition can be:</p>
<ul>
<li>Number of given iterations has been reached</li>
<li>Given time has elapsed</li>
<li>No rows have been affected by query</li>
</ul>
<p>The tool comes in handy for cleanup jobs, warming up caches, etc.<span id="more-3124"></span></p>
<h4>New tool: oak-get-slave-lag</h4>
<p>This simple tool just returns the number of seconds a slave is behind master. But it also returns with an appropriate exit code, based on a given threshold: <strong>0</strong> when lag is good, <strong>1</strong> (error exit code) when lag is too great or slave fails to replicate.</p>
<p>This tool has been used by 3rd party applications, such as a load balancer, to determine whether a slave should be accessed.</p>
<h4>Updated tool: oak-chunk-update</h4>
<p>This extremely useful utility breaks down very long queries into smaller chunks. These could be queries which should affect a huge amount of rows, or queries which cannot utilize an index.</p>
<p>Updates to the tool include limiting the range of rows the tool scans, by specifying start and stop position (either by providing constant values or by SELECT query). Also added is auto-termination when no rows are found to be affected. Last, it is possible to override INFORMATION_SCHEMA lookup by explicitly specifying chunking key.</p>
<p>This tool works great for your daily/weekly/monthly batch jobs; in creating DWH tables; populating new columns; purging old entries; clearing data based on non-indexed values; generating summary tables; and more.</p>
<h4>Frozen tool: oak-apply-ri</h4>
<p>I haven't been using this tool for a while. The main work down by this tool can be done with <em>oak-chunk-update</em>. There are some additional safety checks <em>oak-apply-ri</em> provides; I'm thinking over if they justify the tool's existence.</p>
<h4>Frozen tool: oak-online-alter-table</h4>
<p>With the appearance of Facebook’s <a href="http://www.facebook.com/note.php?note_id=430801045932">Online Schema Change</a> (OSC) tool, which derives from <em>oak-online-alter-table</em>, I'm not sure I will continue developing the tool. I intend to wait for general feedback on OSC before making a decision.</p>
<h4>Documentation</h4>
<p><a href="http://openarkkit.googlecode.com/svn/trunk/openarkkit/doc/html/introduction.html">Documentation</a> is now part of <em>openark kit</em>'s SVN repository.</p>
<h4>Download</h4>
<p>The <em>openark kit</em> project is currently hosted by Google Code.  Downloads are available at the Google Code <a href="http://code.google.com/p/openarkkit/">openark kit project page</a>.</p>
<p>Downloads are available in the following packaging formats:</p>
<ul>
<li><strong>.deb</strong> package, to be installed on <em>debian</em>, <em>ubuntu</em> and otherwise debian based distributions.</li>
<li><strong>.rpm</strong> package, architecture free (<em>noarch</em>), for RPM supporting Linux distributions such as <em>RedHat</em>, <em>Fedora</em>, <em>CentOS</em> etc.</li>
<li><strong>.tar.gz</strong> using python's distutils installer.</li>
<li><strong>source</strong>, directly retrieved from SVN or from above python package.</li>
<li>Some distribution specific <a href="http://software.opensuse.org/search?baseproject=ALL&amp;p=1&amp;q=openark-kit">RPM packages</a>, courtesy Lenz Grimmer.</li>
</ul>
<h4>Feedback</h4>
<p>Your feedback is welcome! I may not always respond promptly; and I confess that some bugs were left open for more than I would have liked them to. I hope to make for good quality of code, and bug reporting is one major factor you can control.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/openark-kit-rev-170-new-tools-new-functionality/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Thoughts and ideas for Online Schema Change</title>
		<link>http://code.openark.org/blog/mysql/thoughts-and-ideas-for-online-schema-change</link>
		<comments>http://code.openark.org/blog/mysql/thoughts-and-ideas-for-online-schema-change#comments</comments>
		<pubDate>Thu, 07 Oct 2010 08:29:10 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[INFORMATION_SCHEMA]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[openark kit]]></category>
		<category><![CDATA[Opinions]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[Schema]]></category>
		<category><![CDATA[scripts]]></category>
		<category><![CDATA[Triggers]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=3005</guid>
		<description><![CDATA[Here's a few thoughts on current status and further possibilities for Facebook's Online Schema Change (OSC) tool. I've had these thoughts for months now, pondering over improving oak-online-alter-table but haven't got around to implement them nor even write them down. Better late than never. The tool has some limitations. Some cannot be lifted, some could. [...]]]></description>
				<content:encoded><![CDATA[<p>Here's a few thoughts on current status and further possibilities for Facebook's <a href="http://www.facebook.com/note.php?note_id=430801045932">Online Schema Change</a> (OSC) tool. I've had these thoughts for months now, pondering over improving <a href="../../forge/openark-kit/oak-online-alter-table">oak-online-alter-table</a> but haven't got around to implement them nor even write them down. Better late than never.</p>
<p>The tool has some limitations. Some cannot be lifted, some could. Quoting from the <a href="http://www.facebook.com/notes/mysql-at-facebook/online-schema-change-for-mysql/430801045932">announcement</a> and looking at the code, I add a few comments. I conclude with a general opinion on the tool's abilities.</p>
<h4>"The original table must have PK. Otherwise an error is returned."</h4>
<p>This restriction could be lifted: it's enough that the table has a UNIQUE KEY. My original <em>oak-online-alter-table</em> handled that particular case. As far as I see from their code, the Facebook code would work just as well with any unique key.</p>
<p>However, this restriction is of no real interest. As we're mostly interested in InnoDB tables, and since any InnoDB table <em>should have</em> a PRIMARY KEY, we shouldn't care too much.</p>
<h4>"No foreign keys should exist. Otherwise an error is returned."</h4>
<p>Tricky stuff. With <em>oak-online-alter-table</em>, changes to the original table were immediately reflected in the <em>ghost</em> table. With InnoDB tables, that meant same transaction. And although I never got to update the text and code, there shouldn't be a reason for not using child-side foreign keys (the child-side is the table on which the FK constraint is defined).</p>
<p>The Facebook patch works differently: it captures changes and writes them to a <strong>delta</strong> table,  to be later (asynchronously) analyzed and make for a <em>replay</em> of actions on the <em>ghost</em> table.<span id="more-3005"></span></p>
<p>So in the Facebook code, some cases will lead to undesired behavior. Consider two tables, <strong>country</strong> and <strong>city</strong>, with city holding a RESTRICT/NO ACTION foreign key on <strong>country</strong>'s id. Now consider the scenario:</p>
<ol>
<li>Rows from <strong>city</strong> are DELETEd, where the country Id is Spain's.
<ul>
<li><strong>city</strong>'s ghost table is still unaffected, Spain's cities are still there.</li>
<li>A change is written to the delta table to mark these rows for deletion.</li>
</ul>
</li>
<li>A DELETE is issued on <strong>country</strong>'s Spain record.
<ul>
<li>The DELETE should work, from the user's perspective</li>
<li>But it will fail: city's ghost table has not received the changes yet. There's still matching rows. The NO ACTION constraint will fail the DELETE statement.</li>
</ul>
</li>
</ol>
<p>Now, this does not lead to corruption, just to seemingly unreasonable behavior on the database part. This behavior is probably undesired. NO ACTION constraint won't do.</p>
<p>However, with CASCADE or SET NULL options, there is less of an issue: operations on the parent table (e.g. <strong>country</strong>) cannot fail. We must make sure operations on the ghost table make it consistent with the original table (e.g. <strong>city</strong>).</p>
<p>Consider the following scenario:</p>
<ol>
<li>A new country is created, called "Sleepyland". An INSERT is made to <strong>country</strong>.
<ul>
<li>Both <strong>city</strong> and <strong>city</strong>'s ghost are immediately aware of it.</li>
</ul>
</li>
<li>A new town is created and INSERTed to <strong>city</strong>. The town is called "Naphaven".
<ul>
<li>The change takes time to propagate to <strong>city</strong>'s ghost table.</li>
</ul>
</li>
<li>Meanwhile, we realized we made a mistake. We've been had. There's no such city nor country.
<ol>
<li>We DELETE "Naphaven" from <strong>city</strong>.</li>
<li>We DELETE "Sleepyland" from <strong>country</strong>.</li>
</ol>
<ul>
<li>Note that <strong>city</strong>'s ghost table still hasn't caught up with the changes.</li>
</ul>
</li>
<li>Eventually, the INSERT statement for "Naphaven" reaches <strong>city</strong>'s ghost table.
<ul>
<li>What should happen now? The INSERT cannot succeed.</li>
<li>Will this fail the entire process?</li>
</ul>
</li>
</ol>
<p>Looking at the PHP code, I see that changes written on the <strong>delta</strong> table are blindly replayed on the ghost table.</p>
<p>Since the process is asynchronous, this should not be the case. We can solve the above if we use INSERT IGNORE instead of INSERT. The statement will fail without failing anything else. The row cannot exist, and that's because the original row does not exist anymore.</p>
<p>Unlike a replication corruption, this does not lead to accumulation mistakes. The <strong>replay</strong> is static, somewhat like in <em>binary log format</em>. Changes are <em>just written</em>, regardless of existing data.</p>
<p>I have given this considerable thought, and I can't say I've covered all the possible scenario. However I believe that with proper use of INSERT IGNORE and REPLACE INTO (two statements I heavily relied on with <em>oak-online-alter-table</em>), correctness can be achieved.</p>
<p>There's the small pain of re-generating the foreign key definition on the "ghost" table (<strong>CREATE TABLE LIKE ...</strong> does not copy FK definitions). And since foreign key names are unique, a new name must be picked up. Not pretty, but perfectly doable.</p>
<h4>"No AFTER_{INSERT/UPDATE/DELETE} triggers must exist."</h4>
<p>It would be nicer if MySQL had an ALTER TRIGGER statement. There isn't such statement. If there were such an atomic statement, then we would be able to rewrite the trigger, so as to add our own code to the <em>end of the trigger's code</em>. Yuck. Would be even nicer if we were <a href="http://code.openark.org/blog/mysql/triggers-use-case-compilation-part-ii">allowed to have multiple triggers</a> of same event.</p>
<p>So, we are left with DROP and CREATE triggers. Alas, this makes for a short period where the trigger does not exist. Bad. The easy solution would be to LOCK WRITE the table, but apparently you can't DROP the trigger (*) when the table is locked. Sigh.</p>
<p>(*) Happened to me, apparently to Facebook too; With latest 5.1 (5.1.51) version this actually works. With 5.0 it didn't use to; this needs more checking.</p>
<h4>Use of INFORMATION_SCHEMA</h4>
<p>As with oak-online-alter-table, the OSC checks for triggers, indexes, column by searching on the INFORMATION_SCHEMA tables. This makes for nice SQL for getting the exact listing and types of PRIMARY KEY columns, whether or not AFTER triggers exist, and so on.</p>
<p>I've always considered this to be the weak part of <a href="http://code.openark.org/forge/openark-kit">openark-kit</a>, that it relies on INFORMATION_SCHEMA so much. It's easier, it's cleaner, it's even <em>more correct</em> to work that way -- but it just puts too much locks. I think Baron Schwartz (and now Daniel Nichter) did amazing work on analyzing table schemata by parsing the SHOW CREATE TABLE and other SHOW commands regex-wise with <a href="http://www.maatkit.org/">Maatkit</a>. It's a crazy work! Had I written <em>openark-kit</em> in Perl, I would have just import their code. But I'm too <span style="text-decoration: line-through;">lazy</span> busy to do the conversion from Perl to Python, and rewrite that code, what with all the debugging.</p>
<p>OSC is written in PHP. Again, much conversion work. I think performance-wise this is an important step to make.</p>
<h4>A word for the critics</h4>
<p>Finally, a word for the critics. I've read some Facebook/MySQL bashing comments and wish to relate.</p>
<p>In his <a href="http://www.theregister.co.uk/2010/09/21/facebook_online_schema_change_for_mysql/">interview to The Register</a>, Mark Callaghan gave the example that "Open Schema Change lets the company update indexes without user downtime, according to Callaghan".</p>
<p>PostgreSQL was mentioned for being able to add index with only read locks taken, or being able to do the work with no locks using CREATE INDEX CONCURRENTLY. I wish MySQL had that feature! Yes, MySQL has a lot to improve upon, and the latest PostgreSQL 9.0 brings valuable new features. (Did I make it clear I have no intention of bashing PostgreSQL? If not, please re-read this paragraph until convinced).</p>
<p>Bashing related to the notion of MySQL being so poor that Facebook used an even poorer mechanism to work out the ALTER TABLE.</p>
<p>Well, allow me to add a few words: the CREATE INDEX is by far not the only thing you can achieve with OSC (although it may be Facebook's major concern). You should be able to:</p>
<ul>
<li>Add columns</li>
<li>Drop columns</li>
<li>Convert character sets</li>
<li>Modify column types</li>
<li>Add partitioning</li>
<li>Reorganize partitioning</li>
<li>Compress the table</li>
<li>Otherwise changing table format</li>
<li>Heck, you could even modify the storage engine! (To other transactional engine)</li>
</ul>
<p>These are giant steps. How easy would it be to write these down into the database? It only takes a few weeks time to work out a working solution with reasonable limitations, just using the resources the MySQL server provides you with. The <a href="http://www.facebook.com/MySQLatFacebook">MySQL@Facebook team</a> should be given credit for that.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/thoughts-and-ideas-for-online-schema-change/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Table refactoring &amp; application version upgrades, Part II</title>
		<link>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-ii</link>
		<comments>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-ii#comments</comments>
		<pubDate>Thu, 12 Aug 2010 03:24:06 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Replication]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2801</guid>
		<description><![CDATA[Continuing Table refactoring &#38; application version upgrades, Part I, we now discuss code &#38; database upgrades which require DROP operations. As before, we break apart the upgrade process into sequential steps, each involving either the application or the database, but not both. As I'll show, DROP operations are significantly simpler than creation operations. Interestingly, it's [...]]]></description>
				<content:encoded><![CDATA[<p>Continuing <a href="http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-i">Table refactoring &amp; application version upgrades, Part I</a>, we now discuss code &amp; database upgrades which require <strong>DROP</strong> operations. As before, we break apart the upgrade process into sequential steps, each involving either the application or the database, but not both.</p>
<p>As I'll show, DROP operations are significantly simpler than creation operations. Interestingly, it's the same as in life.</p>
<h4>DROP COLUMN</h4>
<p>A column turns to be redundant, unused. Before it is dropped from the database, we must ensure no one is using it anymore. The steps are:</p>
<ol>
<li>App: <strong>V1</strong> -&gt; <strong>V2</strong>. Remove all references to column; make sure no queries use said column.</li>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>DROP COLUMN</strong>.</li>
</ol>
<h4>DROP INDEX</h4>
<p>A possibly simpler case here. Why would you drop an index? Is it because you found out you never use it anymore? Then all you have to do is just drop it.</p>
<p>Or perhaps you don't need the functionality the index supports anymore? Then first drop the functionality:</p>
<ol>
<li>(optional) App: <strong>V1</strong> -&gt; <strong>V2</strong>. Discard using functionality which relies on index.</li>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>DROP INDEX</strong>. Check out InnoDB Plugin here.<span id="more-2801"></span></li>
</ol>
<h4>DROP UNIQUE INDEX</h4>
<p>When using Master-Slave failover for table refactoring, we're now removing a constraint from the slave. Since the master is more constrained than the slave, there is no problem here. It's mostly the same as with a normal DROP INDEX, with a minor addition:</p>
<ol>
<li>(optional) App: <strong>V1</strong> -&gt; <strong>V2</strong>. Discard using functionality which relies on index.</li>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>DROP INDEX</strong>.</li>
<li>(optional) App: <strong>V2</strong> -&gt; <strong>V3</strong>. Enable functionality that inserts duplicates.</li>
</ol>
<h4>DROP FOREIGN KEY</h4>
<p>Again, we are removing a constraint.</p>
<ol>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>DROP INDEX</strong>.</li>
<li>(optional) App: <strong>V2</strong> -&gt; <strong>V3</strong>. Enable functionality that conflicts with removed constraint. I mean, if you really know what you are doing.</li>
</ol>
<h4>DROP TABLE</h4>
<p>The very simple steps are:</p>
<ol>
<li>App: <strong>V1</strong> -&gt; <strong>V2</strong>. Make sure no reference to table is made.</li>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong>. Issue a <strong>DROP TABLE</strong>.</li>
</ol>
<p>With <strong>ext3</strong> dropping a large table is no less than a nightmare. Not only does the action take long time, it also locks down the table cache, which very quickly leads to having dozens of queries hang. <strong>xfs</strong> is a good alternative.</p>
<h4>Conclusion</h4>
<p>We looked at single table operations, coupled with application upgrades. By carefully looking at the process breakdown, multiple changes can be addressed with ease and safety. Not all operations are completely safe when used with replication failover. But they are mostly safe if you have some trust in your code.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-ii/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Table refactoring &amp; application version upgrades, Part I</title>
		<link>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-i</link>
		<comments>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-i#comments</comments>
		<pubDate>Tue, 10 Aug 2010 12:36:28 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Replication]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2775</guid>
		<description><![CDATA[A developer's major concern is: How do I do application &#38; database upgrades with minimal downtime? How do I synchronize between a DB's version upgrade and an application's version upgrade? I will break down the discussion into types of database refactoring operations, and I will limit to single table refactoring. The discussion will try to [...]]]></description>
				<content:encoded><![CDATA[<p>A developer's major concern is: <em>How do I do application &amp; database upgrades with minimal downtime? How do I synchronize between a DB's version upgrade and an application's version upgrade?<br />
</em></p>
<p>I will break down the discussion into types of database refactoring operations, and I will limit to single table refactoring. The discussion will try to understand the need for refactoring and will dictate the steps towards a successful upgrade.</p>
<h4>Reader prerequisites</h4>
<p>I will assume MySQL to be the underlying database. To take a major component out of the equation: we may need to deal with very large tables, for which an <strong>ALTER</strong> command may take long hours. I will assume familiarity with Master-Master (Active-Passive) replication, with possible use of <a href="http://mysql-mmm.org/">MMM for MySQL</a>. When I describe "Failover from <strong>M1</strong> to <strong>M2</strong>", I mean "Make the <strong>ALTER</strong> changes on <strong>M2</strong> (passive), then switch your application from <strong>M1</strong> to <strong>M2</strong> (change of IPs, VIP, etc.), promoting <strong>M2</strong> to active position, then apply same changes on <strong>M1</strong> (now passive) or completely rebuild it".</p>
<p>Phew, a one sentence description of M-M usage...</p>
<p>I also assume the reader's understanding that a table's schema can be different on master &amp; slave, which is the basis for the "use replication for refactoring" trick. But it cannot be too different, or, to be precise, the two schemata must both support the ongoing queries for the table.</p>
<p>A full discussion of the above is beyond the scope of this post.</p>
<h4>Types of refactoring needs</h4>
<p>As I limit this discussion to single table refactoring,we can look at major refactoring operations and their impact on application &amp; upgrades. We will discuss ADD/DROP COLUMN, ADD/DROP INDEX, ADD/DROP UNIQUE INDEX, ADD/DROP FOREIGN KEY, ADD/DROP TABLE.</p>
<p>We will assume the database and application are both in Version #1 (<strong>V1</strong>), and need to be upgraded to <strong>V2</strong> or greater.<span id="more-2775"></span></p>
<h4>ADD INDEX</h4>
<p>Starting with the easier actions. Why would you add an index? Either:</p>
<ol>
<li>There is some existing query which can be optimized by the new query</li>
<li>Or there is some new functionality which issues a query for which the new index is required.</li>
</ol>
<p>Adding an index is an easy action in that the table's data does not really change.</p>
<p>In case <strong>#1</strong>, all you need to do is to add the new index (if the table is large, fail over from <strong>M1</strong> to <strong>M2</strong>). There is no application upgrade, so all that happens is that the database upgrades <strong>V1 </strong>-&gt;<strong> V2</strong>.</p>
<p>In case <strong>#2</strong>, the database must be prepared with new schema before the new functionality/query is introduced (since it depends on the existence of the index). The steps, therefore, are:</p>
<ol>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>)</li>
<li>(Sometime later) App: <strong>V1</strong> -&gt; <strong>V2</strong>. Application will issue queries which utilize the new index.</li>
</ol>
<p>The application does not have to be upgraded at the same instant the DB gets upgraded. In fact, we'll see that this is a typical scenario: we can separate upgrades into smaller steps, which allow for time lapse. One <em>could</em> work out steps <strong>1</strong> &amp; <strong>2</strong> together, but that would take an extra effort.</p>
<h4>ADD COLUMN</h4>
<p>This must be one of the most common table schema upgrades: a new property is needed on the application side. It must be supported by the database. Perhaps a new field in some Java Object, with Hibernate mapping that field onto a new column. Or maybe the new column is there for purpose of de-normalization.</p>
<p>This is also a more complicated task. Let's look at the required steps:</p>
<ol>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>ADD COLUMN</strong>.</li>
<li>App: <strong>V1</strong> -&gt; <strong>V2</strong>. Change is: provide column value for newly <strong>INSERT</strong>ed rows.</li>
<li>If needed, retroactively update column values for all pre-existing rows.</li>
<li>App: <strong>V2</strong> -&gt; <strong>V3</strong>. Application begins to use (read, <strong>SELECT</strong>) new column.</li>
</ol>
<p>The above procedure assumes that the new column must have some calculated value. A 10-million rows table must now be updated, to have the correct values filled in. So we ask of the application to start filling in data for new rows, which makes the invalid row set static. We can just take a "from row" and a "to row" and fill in the missing column's value for those rows. Only when all rows contain valid values can we let the application start using that row. This makes for <em>two</em> application upgrades.</p>
<p>If you're content with just a static <strong>DEFAULT</strong> value, then step <strong>3</strong> can be skipped, and step <strong>4</strong> can be merged with step <strong>2</strong>.</p>
<h4>ADD UNIQUE INDEX</h4>
<p>This is an altogether different case than the normal <strong>ADD INDEX</strong>, even though they may seem similar. And the case is particularly different when using Master-Slave failover for rebuilding the table.</p>
<p>Consider the case where we add a <strong>UNIQUE INDEX</strong> on a slave. Some <strong>INSERT</strong> query executes on the master, successfully, and is logged to the binary log. The slave picks it up, tries to execute it, to find that it fails on a DUPLICATE KEY error.</p>
<p>The <strong>UNIQUE INDEX</strong> is a constraint, and it makes the slave more constrained than the master. This is a delicate situation. Here how to (mostly) work it out:</p>
<ol>
<li>App: <strong>V1</strong> -&gt; <strong>V2</strong>. Change <strong>INSERT</strong> queries on relevant table to <strong>INSERT IGNORE</strong> or <strong>REPLACE</strong> queries, whichever is more appropriate.</li>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>ADD UNIQUE KEY</strong> (and while at it, a tip: are you aware of <a href="http://dev.mysql.com/doc/refman/5.1/en/alter-table.html">ALTER IGNORE TABLE</a>?)</li>
</ol>
<p>The change of query ensures that the query will succeed on the slave (either by silently doing nothing or by actually replacing content). It also means that the slave can now have different data than the master. Of course, it you trust your application to never <strong>INSERT</strong> duplicates, you can sleep better.</p>
<p>We do not handle <strong>UPDATE</strong> statements here.</p>
<h4>ADD CONSTRAINT FOREIGN KEY</h4>
<p>As with <strong>ADD UNIQUE INDEX</strong>, there is a new constraint here. A slave becomes more constrained than the master. But we now have to make sure <strong>INSERT</strong>, <strong>UPDATE</strong> and <strong>DELETE</strong> statements all go peacefully (well, it also depends on the type of <strong>ON DELETE</strong> and <strong>ON UPDATE</strong> property of the FK).</p>
<p>The steps would be:</p>
<ol>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>ADD CONSTRAINT FOREIGN KEY</strong>.</li>
</ol>
<p>And then cross your fingers or have trust in your application. If the table is small enough, one does not have to use replication to do the refactoring, and life is simpler. Just execute the <strong>ALTER</strong> on the active master, and continue with your life.</p>
<h4>CREATE TABLE</h4>
<p>This is a simple case, since the table is new. The steps are:</p>
<ol>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (no need to use slaves here)</li>
<li>App: <strong>V1</strong> -&gt; <strong>V2</strong>. Application will start using new table.</li>
</ol>
<h4>Conslusion</h4>
<p>Having such steps formalized help with development management and database management. It makes clear what is expected of the application, and what is expected of the database. The breaking down of these operations into sequential steps allows us to work more slowly; make preparation work; work within our own working hours; get a chance to see the family.</p>
<p>In this post we took a look at "creation" refactoring changes. New columns, new keys, new constraints. In the next part of this article, we'll discuss <strong>DROP</strong> operations.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-i/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SQL: good comments conventions</title>
		<link>http://code.openark.org/blog/mysql/sql-good-comments-conventions</link>
		<comments>http://code.openark.org/blog/mysql/sql-good-comments-conventions#comments</comments>
		<pubDate>Thu, 01 Jul 2010 07:36:32 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Coding]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2581</guid>
		<description><![CDATA[I happened upon a customer who left me in awe and admiration. The reason: excellent comments for their SQL code. I list four major places where SQL comments are helpful. I'll use the sakila database. It is originally scarcely commented; I'll present it now enhanced with comments, to illustrate. Table definitions The CREATE TABLE statement [...]]]></description>
				<content:encoded><![CDATA[<p>I happened upon a customer who left me in awe and admiration. The reason: excellent comments for their SQL code.</p>
<p>I list four major places where SQL comments are helpful. I'll use the <a href="http://dev.mysql.com/doc/sakila/en/sakila.html">sakila</a> database. It is originally scarcely commented; I'll present it now enhanced with comments, to illustrate.</p>
<h4>Table definitions</h4>
<p>The <strong>CREATE TABLE</strong> statement allows for a comment, intended to describe the nature of the table:</p>
<blockquote>
<pre>CREATE TABLE `film_text` (
 `film_id` smallint(6) NOT NULL,
 `title` varchar(255) NOT NULL,
 `description` text,
 PRIMARY KEY (`film_id`),
 FULLTEXT KEY `idx_title_description` (`title`,`description`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 <strong>COMMENT='Reflection of `film`, used for FULLTEXT search.'</strong>
</pre>
</blockquote>
<p>It's too bad the comment's max length is 60 characters, though. However, it's a very powerful field.</p>
<h4>Column definitions</h4>
<p>One may comment particular columns:<span id="more-2581"></span></p>
<blockquote>
<pre>CREATE TABLE `film` (
 `film_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
 `title` varchar(255) NOT NULL,
 `description` text,
 `release_year` year(4) DEFAULT NULL,
 `language_id` tinyint(3) unsigned NOT NULL <strong>COMMENT 'Soundtrack spoken language'</strong>,
 `original_language_id` tinyint(3) unsigned DEFAULT NULL <strong>COMMENT 'Filmed spoken language'</strong>,
 `rental_duration` tinyint(3) unsigned NOT NULL DEFAULT '3',
 `rental_rate` decimal(4,2) NOT NULL DEFAULT '4.99',
 `length` smallint(5) unsigned DEFAULT NULL,
 `replacement_cost` decimal(5,2) NOT NULL DEFAULT '19.99',
  ...
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8
</pre>
</blockquote>
<h4>Stored routines definitions</h4>
<p>Here's an original <strong>sakila</strong> procedure, untouched. It is already commented:</p>
<blockquote>
<pre>CREATE DEFINER=`root`@`localhost` PROCEDURE `rewards_report`(
 IN min_monthly_purchases TINYINT UNSIGNED
 , IN min_dollar_amount_purchased DECIMAL(10,2) UNSIGNED
 , OUT count_rewardees INT
)
 READS SQL DATA
 <strong>COMMENT 'Provides a customizable report on best customers'</strong>
BEGIN

 DECLARE last_month_start DATE;
 DECLARE last_month_end DATE;
 ...
</pre>
</blockquote>
<h4>SQL queries</h4>
<p>Last but not least, while not part of the schema, SQL queries define the use of the schema. That is, the schema exists for the sole reason of being able to query it.</p>
<p>Where did <em>that</em> query come from? Which piece of code issued it? Why? What's its purpose?</p>
<p>Looking at the <strong>PROCESSLIST</strong>, the slow log, etc., it is easier when the queries are commented:</p>
<blockquote>
<pre>SELECT
 <strong>/* List film details along with participating actors */</strong>
 <strong>/* Issued by analytics module */</strong>
 film.*,
 COUNT(*) AS count_actors,
 GROUP_CONCAT(CONCAT(actor.first_name, ' ', actor.last_name))
FROM
 film
 JOIN film_actor USING(film_id)
 JOIN actor USING(actor_id)
GROUP BY film.film_id;
</pre>
</blockquote>
<h4>Conclusion</h4>
<p>Source code commenting is an important practice, and usually watched out for. SQL &amp; table definitions commenting are often scarce or non-existent. I urge DBAs to adopt a comments coding convention for SQL, and apply it whenever they can.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/sql-good-comments-conventions/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Database schema under version control</title>
		<link>http://code.openark.org/blog/mysql/database-schema-under-version-control</link>
		<comments>http://code.openark.org/blog/mysql/database-schema-under-version-control#comments</comments>
		<pubDate>Thu, 22 Apr 2010 07:04:04 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2215</guid>
		<description><![CDATA[How many organization use version control for development? Probably almost every single one. How many store the database schema under version control? Alas, not as many. Coupling one's application with table schema is essential. Organization who actively support multiple versions of the product understand that well. Smaller organizations not always have this realization. How is [...]]]></description>
				<content:encoded><![CDATA[<p>How many organization use version control for development? Probably almost every single one.</p>
<p>How many store the database schema under version control? Alas, not as many.</p>
<p>Coupling one's application with table schema is essential. Organization who actively support multiple versions of the product understand that well. Smaller organizations not always have this realization.</p>
<p>How is it done?</p>
<h4>Ideally</h4>
<p>Ideally one would have:</p>
<ul>
<li>The schema, generated by hand</li>
<li>Essential data (<strong>INSERT INTO</strong> statements for those lookup tables without which you cannot have an application)</li>
<li>Sample data: sufficient real-life data on which to act. This would include customers data, logs, etc.</li>
</ul>
<p>If you can work this way, then creating a staging environment consists of re-creating the entire schema from out schema code, and filling in the essential &amp; sample data.</p>
<p>The thing with this method is that one does not (usually?) apply it on one's live system. Say we were to add a column. On our live servers we would issue an <strong>ALTER TABLE t ADD COLUMN</strong>.</p>
<p>But this means we're using different methods on our staging server and on our production server.</p>
<h4>Incremental</h4>
<p>Another kind of solution would be to hold:</p>
<ul>
<li>The static schema, as before</li>
<li>Essential data, as before</li>
<li>Sample data, as before</li>
<li>A migration script, which is the concatenation of all ALTERs, CREATEs etc. as of the static schema.</li>
</ul>
<p>Once in a while one can do a "reset", and update the static schema with the existing design.</p>
<h4>As you go along</h4>
<p>This solution simply means "we apply the changes on staging; test + version them; then apply on production".</p>
<p>A side effect of this solution is that the database generates the schema, but the schema does not generate the database as in previous cases. This makes for an uglier solution, where you first apply the changes to the database, and then, based on what the database report, enter data into the version control.</p>
<p>How to do that? Easiest would be to use <strong>mysqldump --routines --no-data</strong>. Some further parsing should be done to strip out the <strong>AUTO_INCREMENT</strong> values, which tend to change, as well as the surrounding variables settings (strip out the character set settings etc.).</p>
<h4>Summary</h4>
<p>However you do it, make sure you have some kind of version control on your schema. It pays off just as with doing version control for your code. You get to compare, understand the incremental changes, understand the change in design, etc.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/database-schema-under-version-control/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
