<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>code.openark.org</title>
	<atom:link href="http://code.openark.org/blog/feed" rel="self" type="application/rss+xml" />
	<link>http://code.openark.org/blog</link>
	<description>Blog by Shlomi Noach</description>
	<lastBuildDate>Mon, 14 May 2012 05:52:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Impact of foreign keys absence on replicating slaves</title>
		<link>http://code.openark.org/blog/mysql/impact-of-foreign-keys-absence-on-replicating-slaves</link>
		<comments>http://code.openark.org/blog/mysql/impact-of-foreign-keys-absence-on-replicating-slaves#comments</comments>
		<pubDate>Mon, 14 May 2012 05:52:41 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[common_schema]]></category>
		<category><![CDATA[data integrity]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=4860</guid>
		<description><![CDATA[In this post I describe what happens when a slave's Foreign Key setup is different from that of the master. I'm in particular interested in a setup where the slave has a subset of the master's foreign keys, or no foreign keys at all. I wish to observe whether integrity holds. Making the changes Which [...]]]></description>
			<content:encoded><![CDATA[<p>In this post I describe what happens when a slave's Foreign Key setup is different from that of the master. I'm in particular interested in a setup where the slave has a subset of the master's foreign keys, or no foreign keys at all. I wish to observe whether integrity holds.</p>
<h4>Making the changes</h4>
<p>Which foreign keys do we have and how do we drop them? If you want to do this by hand, well, good luck! Fortunately, <a href="http://code.google.com/p/common-schema/">common_schema</a> provides quite a few handy views and routines to assist us. Consider viewing the existing foreign keys on <strong>sakila</strong>:</p>
<blockquote>
<pre>master&gt; SELECT <strong>create_statement</strong> FROM <strong>common_schema.sql_foreign_keys</strong> WHERE TABLE_SCHEMA='sakila';
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| create_statement                                                                                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ALTER TABLE `sakila`.`address` ADD CONSTRAINT `fk_address_city` FOREIGN KEY (`city_id`) REFERENCES `sakila`.`city` (`city_id`) ON DELETE RESTRICT ON UPDATE CASCADE                             |
| ALTER TABLE `sakila`.`city` ADD CONSTRAINT `fk_city_country` FOREIGN KEY (`country_id`) REFERENCES `sakila`.`country` (`country_id`) ON DELETE RESTRICT ON UPDATE CASCADE                       |
| ALTER TABLE `sakila`.`customer` ADD CONSTRAINT `fk_customer_address` FOREIGN KEY (`address_id`) REFERENCES `sakila`.`address` (`address_id`) ON DELETE RESTRICT ON UPDATE CASCADE               |
| ALTER TABLE `sakila`.`customer` ADD CONSTRAINT `fk_customer_store` FOREIGN KEY (`store_id`) REFERENCES `sakila`.`store` (`store_id`) ON DELETE RESTRICT ON UPDATE CASCADE                       |
| ALTER TABLE `sakila`.`film` ADD CONSTRAINT `fk_film_language` FOREIGN KEY (`language_id`) REFERENCES `sakila`.`language` (`language_id`) ON DELETE RESTRICT ON UPDATE CASCADE                   |
| ALTER TABLE `sakila`.`film` ADD CONSTRAINT `fk_film_language_original` FOREIGN KEY (`original_language_id`) REFERENCES `sakila`.`language` (`language_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`film_actor` ADD CONSTRAINT `fk_film_actor_actor` FOREIGN KEY (`actor_id`) REFERENCES `sakila`.`actor` (`actor_id`) ON DELETE RESTRICT ON UPDATE CASCADE                   |
| ALTER TABLE `sakila`.`film_actor` ADD CONSTRAINT `fk_film_actor_film` FOREIGN KEY (`film_id`) REFERENCES `sakila`.`film` (`film_id`) ON DELETE RESTRICT ON UPDATE CASCADE                       |
| ALTER TABLE `sakila`.`film_category` ADD CONSTRAINT `fk_film_category_category` FOREIGN KEY (`category_id`) REFERENCES `sakila`.`category` (`category_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`film_category` ADD CONSTRAINT `fk_film_category_film` FOREIGN KEY (`film_id`) REFERENCES `sakila`.`film` (`film_id`) ON DELETE RESTRICT ON UPDATE CASCADE                 |
| ALTER TABLE `sakila`.`inventory` ADD CONSTRAINT `fk_inventory_film` FOREIGN KEY (`film_id`) REFERENCES `sakila`.`film` (`film_id`) ON DELETE RESTRICT ON UPDATE CASCADE                         |
| ALTER TABLE `sakila`.`inventory` ADD CONSTRAINT `fk_inventory_store` FOREIGN KEY (`store_id`) REFERENCES `sakila`.`store` (`store_id`) ON DELETE RESTRICT ON UPDATE CASCADE                     |
| ALTER TABLE `sakila`.`payment` ADD CONSTRAINT `fk_payment_customer` FOREIGN KEY (`customer_id`) REFERENCES `sakila`.`customer` (`customer_id`) ON DELETE RESTRICT ON UPDATE CASCADE             |
| ALTER TABLE `sakila`.`payment` ADD CONSTRAINT `fk_payment_rental` FOREIGN KEY (`rental_id`) REFERENCES `sakila`.`rental` (`rental_id`) ON DELETE SET NULL ON UPDATE CASCADE                     |
| ALTER TABLE `sakila`.`payment` ADD CONSTRAINT `fk_payment_staff` FOREIGN KEY (`staff_id`) REFERENCES `sakila`.`staff` (`staff_id`) ON DELETE RESTRICT ON UPDATE CASCADE                         |
| ALTER TABLE `sakila`.`rental` ADD CONSTRAINT `fk_rental_customer` FOREIGN KEY (`customer_id`) REFERENCES `sakila`.`customer` (`customer_id`) ON DELETE RESTRICT ON UPDATE CASCADE               |
| ALTER TABLE `sakila`.`rental` ADD CONSTRAINT `fk_rental_inventory` FOREIGN KEY (`inventory_id`) REFERENCES `sakila`.`inventory` (`inventory_id`) ON DELETE RESTRICT ON UPDATE CASCADE           |
| ALTER TABLE `sakila`.`rental` ADD CONSTRAINT `fk_rental_staff` FOREIGN KEY (`staff_id`) REFERENCES `sakila`.`staff` (`staff_id`) ON DELETE RESTRICT ON UPDATE CASCADE                           |
| ALTER TABLE `sakila`.`staff` ADD CONSTRAINT `fk_staff_address` FOREIGN KEY (`address_id`) REFERENCES `sakila`.`address` (`address_id`) ON DELETE RESTRICT ON UPDATE CASCADE                     |
| ALTER TABLE `sakila`.`staff` ADD CONSTRAINT `fk_staff_store` FOREIGN KEY (`store_id`) REFERENCES `sakila`.`store` (`store_id`) ON DELETE RESTRICT ON UPDATE CASCADE                             |
| ALTER TABLE `sakila`.`store` ADD CONSTRAINT `fk_store_address` FOREIGN KEY (`address_id`) REFERENCES `sakila`.`address` (`address_id`) ON DELETE RESTRICT ON UPDATE CASCADE                     |
| ALTER TABLE `sakila`.`store` ADD CONSTRAINT `fk_store_staff` FOREIGN KEY (`manager_staff_id`) REFERENCES `sakila`.`staff` (`staff_id`) ON DELETE RESTRICT ON UPDATE CASCADE                     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+</pre>
</blockquote>
<p>Most of the foreign key constraints use <strong>RESTRICT</strong> for <strong>DELETE</strong> (meaning you are not allowed to delete a parent row when children exist), and <strong>CASCADE</strong> for <strong>UPDATE</strong> (meaning changes to parent will propagate to children). This is good, since I want to test behavior of both <strong>RESTRICT</strong> and <strong>CASCADE</strong>.</p>
<p><span id="more-4860"></span>OK, we wish to remove these constraints from the slave. To see what we are going to do, consider:</p>
<blockquote>
<pre>slave1&gt; select <strong>drop_statement</strong> from <strong>common_schema.sql_foreign_keys</strong> where table_schema='sakila';
+-----------------------------------------------------------------------------------+
| drop_statement                                                                    |
+-----------------------------------------------------------------------------------+
| ALTER TABLE `sakila`.`address` DROP FOREIGN KEY `fk_address_city`                 |
| ALTER TABLE `sakila`.`city` DROP FOREIGN KEY `fk_city_country`                    |
| ALTER TABLE `sakila`.`customer` DROP FOREIGN KEY `fk_customer_address`            |
| ALTER TABLE `sakila`.`customer` DROP FOREIGN KEY `fk_customer_store`              |
| ALTER TABLE `sakila`.`film` DROP FOREIGN KEY `fk_film_language`                   |
| ALTER TABLE `sakila`.`film` DROP FOREIGN KEY `fk_film_language_original`          |
| ALTER TABLE `sakila`.`film_actor` DROP FOREIGN KEY `fk_film_actor_actor`          |
| ALTER TABLE `sakila`.`film_actor` DROP FOREIGN KEY `fk_film_actor_film`           |
| ALTER TABLE `sakila`.`film_category` DROP FOREIGN KEY `fk_film_category_category` |
| ALTER TABLE `sakila`.`film_category` DROP FOREIGN KEY `fk_film_category_film`     |
| ALTER TABLE `sakila`.`inventory` DROP FOREIGN KEY `fk_inventory_film`             |
| ALTER TABLE `sakila`.`inventory` DROP FOREIGN KEY `fk_inventory_store`            |
| ALTER TABLE `sakila`.`payment` DROP FOREIGN KEY `fk_payment_customer`             |
| ALTER TABLE `sakila`.`payment` DROP FOREIGN KEY `fk_payment_rental`               |
| ALTER TABLE `sakila`.`payment` DROP FOREIGN KEY `fk_payment_staff`                |
| ALTER TABLE `sakila`.`rental` DROP FOREIGN KEY `fk_rental_customer`               |
| ALTER TABLE `sakila`.`rental` DROP FOREIGN KEY `fk_rental_inventory`              |
| ALTER TABLE `sakila`.`rental` DROP FOREIGN KEY `fk_rental_staff`                  |
| ALTER TABLE `sakila`.`staff` DROP FOREIGN KEY `fk_staff_address`                  |
| ALTER TABLE `sakila`.`staff` DROP FOREIGN KEY `fk_staff_store`                    |
| ALTER TABLE `sakila`.`store` DROP FOREIGN KEY `fk_store_address`                  |
| ALTER TABLE `sakila`.`store` DROP FOREIGN KEY `fk_store_staff`                    |
+-----------------------------------------------------------------------------------+</pre>
</blockquote>
<p>To actually make the DROP, we use <em>common_schema</em>'s <a href="http://common-schema.googlecode.com/svn/trunk/common_schema/doc/html/eval.html">eval()</a>:</p>
<blockquote>
<pre>slave1&gt; call <strong>common_schema.eval</strong>("select drop_statement from common_schema.sql_foreign_keys where table_schema='sakila'");</pre>
</blockquote>
<p><em>eval()</em> is a handy routine which invokes statements generated by the given query.</p>
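<p>The same approach lends itself to the "subset" scenario: narrowing the query drops only some of the keys. A minimal sketch, assuming the <strong>sql_foreign_keys</strong> view exposes a <strong>TABLE_NAME</strong> column alongside <strong>TABLE_SCHEMA</strong> (this filter is not part of the setup used here):</p>
<blockquote>
<pre>call common_schema.eval("select drop_statement from common_schema.sql_foreign_keys where table_schema='sakila' and table_name='film_actor'");</pre>
</blockquote>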
<p>This concludes the setup part.</p>
<p>Tests will include:</p>
<ol>
<li>Attempting to delete a parent row</li>
<li>Attempting to add an invalid child row</li>
<li>Attempting to update parent row</li>
</ol>
<p>I was thinking there would be a difference between the two binary log formats: <strong>STATEMENT</strong> and <strong>ROW</strong>. But the tests I produced showed no difference.</p>
<h4>Tests</h4>
<p>Attempting to delete parent row:</p>
<blockquote>
<pre>master&gt; delete from actor where actor_id=1;
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`sakila`.`film_actor`, CONSTRAINT `fk_film_actor_actor` FOREIGN KEY (`actor_id`) REFERENCES `actor` (`actor_id`) ON UPDATE CASCADE)

slave1&gt; select * from actor where actor_id=1;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|        1 | PENELOPE   | GUINESS   | 2006-02-15 04:34:33 |
+----------+------------+-----------+---------------------+</pre>
</blockquote>
<p>Good: the master refused the <strong>DELETE</strong>, and no <strong>DELETE</strong> occurred on slave. Integrity is intact.</p>
<p>Attempting to add an invalid child row:</p>
<blockquote>
<pre>master&gt; insert into film_actor (actor_id, film_id, last_update) values (9999, 1, NOW());
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`sakila`.`film_actor`, CONSTRAINT `fk_film_actor_actor` FOREIGN KEY (`actor_id`) REFERENCES `actor` (`actor_id`) ON UPDATE CASCADE)

slave&gt; select * from film_actor where actor_id=9999;
Empty set (0.00 sec)</pre>
</blockquote>
<p>Integrity is still intact.</p>
<p>Attempting to update parent row: there is nothing invalid about this operation. I'm wondering whether changes are <strong>CASCADE</strong>d on slave as well as on master:</p>
<blockquote>
<pre>master&gt; update actor set actor_id=999 where actor_id=199;

master&gt; select count(*) from film_actor where actor_id=999;
+----------+
| count(*) |
+----------+
|       15 |
+----------+</pre>
</blockquote>
<p>The <strong>999</strong> value wasn't there before on the master, so this verifies the <strong>CASCADE</strong> works on master. As for slave:</p>
<blockquote>
<pre>slave&gt; select count(*) from actor where actor_id=999;
+----------+
| count(*) |
+----------+
|        1 |
+----------+

slave&gt; select count(*) from film_actor where actor_id=999;
+----------+
| count(*) |
+----------+
|        0 |
+----------+</pre>
</blockquote>
<p>Bummer! The actor's row was updated, but cascading did not work on slave.</p>
<p>This is actually <a href="http://dev.mysql.com/doc/refman/5.0/en/innodb-and-mysql-replication.html">documented</a>. However, the documentation only relates to the issue of slave tables being <strong>MyISAM</strong>. The problem occurs even when the slave tables are <strong>InnoDB</strong>, and have no foreign key constraints.</p>
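<p>One way to see the damage is to look, on the slave, for child rows whose parent no longer exists. A minimal sketch, using only the <strong>sakila</strong> tables from the test above; one would expect it to list the <strong>film_actor</strong> rows still pointing at actor <strong>199</strong>:</p>
<blockquote>
<pre>select fa.actor_id, count(*) as orphaned_rows
  from sakila.film_actor fa
  left join sakila.actor a on (a.actor_id = fa.actor_id)
  where a.actor_id is null
  group by fa.actor_id;</pre>
</blockquote>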
<h4>Conclusion</h4>
<p>My personal interest in the scenario is due to something I'm working on; I'll elaborate in a future post. People sometimes hope to get rid of foreign keys, and might wonder whether replication performance would get a boost from having constraints removed on slaves.</p>
<p>When a slave does not enforce foreign keys, you cannot rely on integrity with cascading constraints. An ugly patch might be to use triggers so as to <a href="http://code.openark.org/blog/mysql/triggers-use-case-compilation-part-i">simulate their behavior</a>. Performance-wise this is very bad.</p>
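<p>For illustration, such a trigger could look like the sketch below. The trigger name <strong>actor_au_cascade</strong> is made up, only the <strong>actor</strong> to <strong>film_actor</strong> relation is covered, and the code is not taken from the linked post. Keep in mind that triggers on a slave are invoked for statements replicated in <strong>STATEMENT</strong> format, not for <strong>ROW</strong> format events, so this only helps with statement-based replication.</p>
<blockquote>
<pre>-- hypothetical trigger, to be created on the slave only
create trigger sakila.actor_au_cascade after update on sakila.actor
for each row
  -- mimics ON UPDATE CASCADE of fk_film_actor_actor;
  -- the second condition makes this a no-op when actor_id did not change
  update sakila.film_actor
    set actor_id = NEW.actor_id
    where actor_id = OLD.actor_id and NEW.actor_id != OLD.actor_id;</pre>
</blockquote>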
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/impact-of-foreign-keys-absence-on-replicating-slaves/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Webinar review: Zero-Downtime Schema Changes In MySQL</title>
		<link>http://code.openark.org/blog/mysql/webinar-review-zero-downtime-schema-changes-in-mysql</link>
		<comments>http://code.openark.org/blog/mysql/webinar-review-zero-downtime-schema-changes-in-mysql#comments</comments>
		<pubDate>Thu, 03 May 2012 14:17:19 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[openark kit]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=4895</guid>
		<description><![CDATA[Yesterday I attended the Zero-Downtime Schema Changes In MySQL webinar by Baron Schwartz, Percona (do you say "attended" for something you listened to from your home office?) I was keen to learn about possible enhancements and improvements of pt-online-schema-change over oak-online-alter-table. Here are my impressions: The base logic of pt-online-schema-change is essentially the same as [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday I attended the <a href="http://www.percona.com/webinars/2012-05-02-zero-downtime-schema-changes-in-mysql/">Zero-Downtime Schema Changes In MySQL</a> webinar by Baron Schwartz, Percona (<em>do you say "attended" for something you listened to from your home office?</em>)</p>
<p>I was keen to learn about possible enhancements and improvements of <a href="http://www.percona.com/doc/percona-toolkit/2.1/pt-online-schema-change.html">pt-online-schema-change</a> over <a href="http://openarkkit.googlecode.com/svn/trunk/openarkkit/doc/html/oak-online-alter-table.html">oak-online-alter-table</a>. Here are my impressions:</p>
<p>The base logic of <em>pt-online-schema-change</em> is essentially the same as that of <em>oak-online-alter-table</em>. You create a ghost/shadow table, create complex triggers, copy in chunks, freeze and swap. Both work on any type of <strong>PRIMARY KEY</strong> (<em>oak-online-alter-table</em> can work with any <strong>UNIQUE KEY</strong>, I'm not sure about <em>pt-online-schema-change</em> on this), be it an <strong>INTEGER</strong>, other type, or a multi column one.</p>
<p>However, <em>pt-online-schema-change</em> also adds the following:</p>
<ul>
<li>It supports <strong>FOREIGN KEY</strong>s (to some extent). This is something I've wanted to do with <em>oak-online-alter-table</em> but never got around to it. Foreign keys are very tricky, as Baron noted. With child-side keys, things are reasonably manageable. With parent-side this becomes a nightmare, sometimes unsolvable (when I say "unsolvable", I mean that under the constraint of having the operation run in a non-blocking, transparent way).</li>
<li>Chunk size is auto-calculated by the script. This is a cool addition. Instead of letting the user throw out numbers like <strong>1,000</strong> rows per chunk, in the hope that this is neither too small nor too large, the tool monitors the time it takes a chunk to complete, then adjusts the size of the next chunk accordingly. Hopefully this leads to a more optimized run, where locks are only held for very short periods, yet enough rows are being processed at a time.</li>
<li>The tool looks into replicating slaves to verify they're up to the job. If the slave lags too far, the tool slows down the work. This is an excellent feature, and again, one that I always wanted to have. Great work!</li>
</ul>
<p>So the three bullets above are what I understand to be the major advantages of Percona's tool over <em>oak-online-alter-table</em>.</p>
<h4>Q &amp; A</h4>
<p>The presentation itself was very good, and Baron answered some questions. There was one question he did not answer during the webinar, nor here, and I thought I might pop in and answer it. Although I can't speak for the coders of <em>pt-online-schema-change</em>, I safely assume that since the logic follows that of <em>oak-online-alter-table</em>, the same answer applies in the case of Percona's toolkit.<span id="more-4895"></span></p>
<p>But, first, a background question (asked and answered during the webinar):</p>
<p><strong>Q</strong>: What if my table already has <strong>AFTER TRIGGER</strong>s?</p>
<p><strong>A</strong>: Then this can't work out. The table must not have triggers.</p>
<p>Which led to the next question:</p>
<p><strong>Q</strong>: Can't the tool use <strong>BEFORE TRIGGER</strong>s instead?</p>
<p>Imagine a <strong>MyISAM</strong> table being altered to <strong>InnoDB</strong> (this is a major task for which my tool was built). Suppose we used a <strong>BEFORE</strong> trigger on an <strong>INSERT</strong>, but the <strong>INSERT</strong> failed. That would make the shadow table inconsistent with the original table. Which is the reason why the trigger must be an <strong>AFTER</strong> trigger.</p>
<p>With <strong>InnoDB</strong> this should not be an issue, since triggers and actions all play within the same transaction, so all succeed or all fail. I have this nagging feeling at the back of my head which says I've already had thoughts on this and have found a problem with <strong>InnoDB</strong> tables as well. I can't put my finger on it now, so no comment on this one at this stage.</p>
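<p>To make the ordering concrete, here is a sketch of the kind of trigger involved. The table names <strong>orig_table</strong> and <strong>ghost_table</strong> and their columns are hypothetical, and the actual triggers created by either tool are more elaborate:</p>
<blockquote>
<pre>-- an AFTER trigger runs only once the INSERT on the original table has succeeded
create trigger orig_table_ai after insert on orig_table
for each row
  replace into ghost_table (id, payload) values (NEW.id, NEW.payload);</pre>
</blockquote>
<p>Had this been a <strong>BEFORE</strong> trigger on a <strong>MyISAM</strong> table, the row would already sit in <strong>ghost_table</strong> if the <strong>INSERT</strong> itself then failed, with no transaction to roll it back.</p>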
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/webinar-review-zero-downtime-schema-changes-in-mysql/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Installing MySQL from source/binary tarball as a Linux service</title>
		<link>http://code.openark.org/blog/mysql/installing-mysql-from-sourcebinary-tarball-as-a-linux-service</link>
		<comments>http://code.openark.org/blog/mysql/installing-mysql-from-sourcebinary-tarball-as-a-linux-service#comments</comments>
		<pubDate>Tue, 01 May 2012 08:10:10 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Installation]]></category>
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=4858</guid>
		<description><![CDATA[I've written before I prefer to do a manual install of MySQL over a repository one. I still do: I typically install from binary tarball or by compiling from source. I'd like to share my setup procedure for Linux installation and service setup. I've done this dozens of times, on different Linux flavors, and it [...]]]></description>
			<content:encoded><![CDATA[<p>I've <a href="http://code.openark.org/blog/mysql/to-not-yum-or-to-not-apt-get">written before</a> I prefer to do a manual install of MySQL over a repository one. I still do: I typically install from binary tarball or by compiling from source.</p>
<p>I'd like to share my setup procedure for Linux installation and service setup. I've done this dozens of times, on different Linux flavors, and it works well for me.</p>
<h4>Installing from source</h4>
<p>To get this straight: you sometimes have to compile the source files. I, for example, happen to use the Sphinx MySQLSE extension. You can only use it if it is compiled together with MySQL. Or you may have had to compile a "vanilla" <strong>5.1</strong> version without query cache in order to completely remove the cache's mutex contention.</p>
<p>Anyway, I find the easiest way is to install onto a path associated with the server version. For example, I would install a <strong>5.5</strong> server onto <strong>/usr/local/mysql55</strong></p>
<p>This way, a new version gets its own path, and no ambiguity.</p>
<p>To do that, use the <strong>prefix</strong> option on configuration step:</p>
<blockquote>
<pre>cd /path/to/extracted/source/tarball
sh BUILD/autorun.sh
./configure --prefix=/usr/local/mysql55
make
sudo make install</pre>
</blockquote>
<p>Once this is complete, you have <em>everything</em> under <strong>/usr/local/mysql55</strong>. This means binaries, libraries, scripts, etc.</p>
<p>To install the MySQL server as a service, copy the mysql.server script to <strong>/etc/init.d</strong>:</p>
<blockquote>
<pre>sudo cp /usr/local/mysql55/support-files/mysql.server /etc/init.d/mysql55</pre>
</blockquote>
<p>Again, I'm naming the script after the MySQL version. This avoids conflict with possible past or future installations of the MySQL server, which typically create a service named <strong>mysql</strong> or <strong>mysqld</strong>.<span id="more-4858"></span></p>
<p>A thing to note about the mysql.server script is that it allows you (at around line <strong>#45</strong>) to set two variables:</p>
<ul>
<li><strong>basedir</strong>: path to your installation directory. When compiling from source this is already set up with the path provided to the <strong>configure</strong> script. Thus, in our example, you can expect this variable to read <strong>/usr/local/mysql55</strong>. So basically nothing to do here.</li>
<li><strong>datadir</strong>: path to your data directory. If you're putting your <strong>my.cnf</strong> file in <strong>/etc</strong> or <strong>/etc/mysql</strong>, then setting <strong>datadir</strong> in <strong>my.cnf</strong> suffices. However, if you're going to put <strong>my.cnf</strong> itself in the data directory (e.g. so as to avoid collisions) then make sure to set the variable in the <strong>mysql.server</strong> init script.</li>
</ul>
<p>Depending on your <strong>$PATH</strong> configuration, it is also a good idea to specify the <strong>basedir</strong> variable in your <strong>my.cnf</strong>'s <strong>[mysqld]</strong> section.</p>
<p>Which leads us to <strong>$PATH</strong>: your Linux system is still unaware of the many binaries you've got in there. I typically add the following line at the end of <strong>/etc/bash.bashrc</strong>:</p>
<blockquote>
<pre>export PATH=/usr/local/mysql55/bin:${PATH}</pre>
</blockquote>
<p>This is the most global PATH setting one can make. Alternatively, use <strong>/etc/profile</strong>, <strong>~/.bashrc</strong> etc. (you may have noticed by now I'm working with <strong>bash</strong>).</p>
<p>Finally, you need to set up the init script to run at startup and stop at shutdown.</p>
<ul>
<li>On Debian/Ubuntu/related I use <strong>rcconf</strong> (I'm too lazy to remember the command line setup).</li>
<li>On RedHat/CentOS/related I use <strong>chkconfig --add mysql55</strong>, or <strong>linuxconf</strong> (since I'm lazy).</li>
</ul>
<h4>Installing from binary tarball</h4>
<p>The only difference is that the <strong>mysql.server</strong> script is unaware of our deployment path. So the <strong>basedir</strong> variable must be set in that file. Other than that, follow same steps as for source installation (oh, of course no need to <strong>configure</strong> &amp; <strong>make</strong>...).</p>
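<p>Either way, once the service is up it may be worth confirming which directories the server actually picked up. A quick check from the <strong>mysql</strong> client, using the standard <strong>basedir</strong> and <strong>datadir</strong> system variables; the first should point at the installation path, the second at your data directory:</p>
<blockquote>
<pre>select @@basedir, @@datadir;</pre>
</blockquote>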
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/installing-mysql-from-sourcebinary-tarball-as-a-linux-service/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>It&#039;s *that time* of the year</title>
		<link>http://code.openark.org/blog/mysql/its-that-time-of-the-year</link>
		<comments>http://code.openark.org/blog/mysql/its-that-time-of-the-year#comments</comments>
		<pubDate>Mon, 16 Apr 2012 09:00:17 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[mysqlconf]]></category>
		<category><![CDATA[Opinions]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=4824</guid>
		<description><![CDATA[Even without attending the Percona Live conference in Santa Clara, you could tell something big was going on. One way of measuring it was by looking at the flow of announcements. Here's a brief list, and apologies if I've missed anyone: Monty Program Announcing MariaDB 5.3. (and later Announcing MariaDB 5.5.23 GA) Tokutek Announcing TokuDB [...]]]></description>
			<content:encoded><![CDATA[<p>Even without attending the Percona Live conference in Santa Clara, you could tell something big was going on.</p>
<p>One way of measuring it was by looking at the flow of announcements. Here's a brief list, and apologies if I've missed anyone:</p>
<ul>
<li>Monty Program <a href="http://blog.montyprogram.com/announcing-mariadb-5-3-6/">Announcing MariaDB 5.3.</a> (and later <a href="http://blog.montyprogram.com/announcing-mariadb-5-5-23-ga/">Announcing MariaDB 5.5.23 GA</a>)</li>
<li>Tokutek <a href="http://www.tokutek.com/2012/04/announcing-tokudb-v6-0-less-slave-lag-and-more-compression/">Announcing TokuDB v6.0</a></li>
<li>Twitter <a href="http://blog.jcole.us/2012/04/09/twitter-mysql-published/">releasing its own MySQL fork</a></li>
<li>Oracle announcing so many new features I can't list them all. But they made a <a href="http://sqlhjalp.blogspot.com/2012/04/mysql-565-m8-dmr-table-of-contents.html">TOC</a> for it, and then announced a dozen more, not covered in the TOC.</li>
<li>Continuent <a href="http://continuent-tungsten.blogspot.com/2012/04/continuent-announces-tungsten.html">Announcing Tungsten Enterprise 1.5</a></li>
<li>Zmanda <a href="http://www.zmanda.com/blogs/?p=554">announcing Recovery Manager 3.4</a></li>
<li>And plenty of new partnerships between the major consulting companies</li>
</ul>
<p>All within the first days of the conference.</p>
<h4>What this means, over the surface</h4>
<p>I read a post by someone who was ranting about Oracle making so many announcements just as the conference began. He obviously suspected there was no coincidence. I got the impression he was looking at it the wrong way: as if Oracle's announcements came to discourage the relevance of the conference.</p>
<p>I beg to differ.</p>
<p><span id="more-4824"></span>Obviously no one is insinuating the timing is coincidental. This does not mean, though, that by announcing new features companies try to undermine the conference. On the contrary: it's part of the celebration. The days of the conference are full of excitement. People are meeting, sharing experiences. It's a great opportunity to throw in a few more goodies and let everyone enjoy themselves.</p>
<p>No new development can make a conference's talk obsolete, as was insinuated by another post. We all know it takes time for new releases to become widespread. So it just adds to the excitement that we not only have great fun <em>now</em>, but can also expect the new features to be stable by next year.</p>
<h4>What this means, under the hood</h4>
<p>To make my point even more interesting, consider that it takes a huge amount of energy to have a release, or a set of features to be released <em>at a specific date</em>. You won't hold out for a stable release for <strong>4</strong> months. You won't rush a premature release by <strong>3</strong> months.</p>
<p>It follows that many companies were planning these releases <em>months ahead</em>. Hold on: they were planning these releases months ahead to match the dates of the <em>Percona Live</em> conference. I don't look at this as undermining the conference: I see this as <em>showing confidence</em> in the conference. The conference <em>will be great</em>, so our announcements <em>will play well</em>!</p>
<h4>Even more under the hood</h4>
<p>There is really nothing special about it. You see this happening in other conferences as well. LinuxCon is full of announcements. MySQL's case is actually better. While LinuxCon suffers from premature announcements of new patches, with keynotes and sessions describing those patches, patches that are quickly discarded a few months later, we do happen to work with stable projects and products. No one is immune from the forces of economy, but we usually enjoy reliable announcements.</p>
<p>And, an interesting phenomenon is created: we get a release cycle.</p>
<p>Everyone is eager to announce <em>something</em> at the conference; so we get to <em>expect</em> releases at the conference. With Oracle throwing another conference this fall, we can expect even more announcements. Not unlike Ubuntu's release cycle - April &amp; October, Tick Tock, Tick Tock, it's time for a release.</p>
<p>For all these I congratulate Percona on a job well done!</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/its-that-time-of-the-year/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>common_schema talk at Percona Live</title>
		<link>http://code.openark.org/blog/mysql/common_schema-talk-at-percona-live</link>
		<comments>http://code.openark.org/blog/mysql/common_schema-talk-at-percona-live#comments</comments>
		<pubDate>Sun, 08 Apr 2012 16:57:15 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[common_schema]]></category>
		<category><![CDATA[mysqlconf]]></category>
		<category><![CDATA[Speaking]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=4809</guid>
		<description><![CDATA[Are you attending PerconaLive? Allow me to suggest you attend the Common Schema: a framework for MySQL server administration session on April 12, 14:00 - 14:50 @ Ballroom F. This talk is by none other than Roland Bouman. Roland co-authored parts of common_schema, and is a great speaker. I have a personal interest, of course, [...]]]></description>
			<content:encoded><![CDATA[<p>Are you attending PerconaLive?</p>
<p>Allow me to suggest you attend the <a href="http://www.percona.com/live/mysql-conference-2012/sessions/common-schema-framework-mysql-server-administration">Common Schema: a framework for MySQL server administration</a> session on <strong>April 12</strong>, <strong>14:00 - 14:50</strong> @ Ballroom F.</p>
<p>This talk is by none other than <a href="http://rpbouman.blogspot.com/">Roland Bouman</a>. Roland co-authored parts of <em>common_schema</em>, and is a great speaker.</p>
<p>I have a personal interest, of course, being the author of most of the components in <em>common_schema</em>. I would like to convert you to a supporter of this project. I know a few very smart people who think this project is an important tool. I would like more people to get to know it. Eventually, I would like developers and DBAs alike to consider it an inseparable part of any MySQL installation.</p>
<p>Then I shall have world domination, Bwa ha ha!</p>
<p>PS,</p>
<p>Have fun! Unfortunately, I will not be attending this year. Having been on the program committee, I can tell it's going to be a great conference!</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/common_schema-talk-at-percona-live/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thoughts on using MySQL plugins</title>
		<link>http://code.openark.org/blog/mysql/thoughts-on-using-mysql-plugins</link>
		<comments>http://code.openark.org/blog/mysql/thoughts-on-using-mysql-plugins#comments</comments>
		<pubDate>Tue, 03 Apr 2012 11:05:42 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Opinions]]></category>
		<category><![CDATA[plugin]]></category>
		<category><![CDATA[Sphinx]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=4769</guid>
		<description><![CDATA[I'm giving thoughts on the viability of MySQL plugins. This is due to a particular experience I've had, which is thankfully solved. However, it left some bitter taste in my mouth. MySQL plugins are a tricky business. To create a plugin, you must compile it against the MySQL version you wish the users to use [...]]]></description>
			<content:encoded><![CDATA[<p>I'm giving thoughts on the viability of MySQL plugins. This is due to a particular experience I've had, which is thankfully solved. However, it left some bitter taste in my mouth.</p>
<p>MySQL plugins are a tricky business. To create a plugin, you must compile it against the MySQL version you wish the users to use it with. Theoretically, you should compile it against any existing MySQL version, minors as well (I'm not sure whether it may sometimes or most times work across minor versions).</p>
<p>But, most important, you must adapt your plugin to major versions.</p>
<p>Another option for plugin makers is to <em>not</em> compile it at all, but rather provide the source code and let the end user compile it against her own MySQL version. But here, too, the code must be compatible with whatever changes the new MySQL version may have.</p>
<h4>And if it doesn't compile with the new MySQL version?</h4>
<p>That's what happened to me. The particular case at hand was SphinxSE, a plugin which serves as a bridge between MySQL and a <a href="http://sphinxsearch.com/">Sphinx Search</a> server. I've been using it for years and was happy with it. But, as it happened, it took well over a year for Sphinx to compile with MySQL <strong>5.5</strong>. This meant I was unable to upgrade my <strong>5.1</strong> installation to <strong>5.5</strong>, a thing I was aiming to do for quite a while.<span id="more-4769"></span></p>
<p>Even when fixed, not all features were included, and thankfully I was able to come up with <a href="http://code.openark.org/blog/mysql/sphinx-sphinx_snippets-mysql-5-5">my own patch</a>.</p>
<p>Not complaining about this particular project -- I think Sphinx is <em>awesome</em>, with latest versions providing great features I'm putting into immediate use.</p>
<p>However, how much am I willing to rely on 3rd party projects when planning my MySQL upgrades? I'm now thinking I shouldn't.</p>
<h4>A central repository?</h4>
<p>MariaDB does a great thing: it provides several <a href="http://kb.askmonty.org/en/mariadb-versus-mysql-features">additional features</a> over standard MySQL, including a set of plugins. In effect they act as a Debian-like repository, maintaining the plugins for their own distribution.</p>
<p>[<em>Darn!</em> I just realized I should have looked at what they did with Sphinx in their <strong>5.5</strong> distribution! Need to do my monthly mental examination.]</p>
<p>Anyway, this is something I would like to see outside MariaDB as well: a central repository where plugins are maintained and kept up to the latest releases.</p>
<p>Thoughts, anyone?</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/thoughts-on-using-mysql-plugins/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>sphinx, sphinx_snippets() &amp; MySQL 5.5</title>
		<link>http://code.openark.org/blog/mysql/sphinx-sphinx_snippets-mysql-5-5</link>
		<comments>http://code.openark.org/blog/mysql/sphinx-sphinx_snippets-mysql-5-5#comments</comments>
		<pubDate>Wed, 21 Mar 2012 13:57:59 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Sphinx]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=4775</guid>
		<description><![CDATA[I've written a patch which completes Sphinx's integration with MySQL 5.5. Up until a couple months ago, Sphinx would not compile with MySQL 5.5 at all. This is, thankfully, resolved as of Sphinx 2.0.3. However, to my dismay, I've found out that it only partially works: the sphinx_snippets() user defined function is not included within [...]]]></description>
			<content:encoded><![CDATA[<p>I've written a patch which completes Sphinx's integration with MySQL <strong>5.5</strong>.</p>
<p>Up until a couple months ago, Sphinx would not compile with MySQL <strong>5.5</strong> at all. This is, thankfully, resolved as of Sphinx <strong>2.0.3</strong>.</p>
<p>However, to my dismay, I've found out that it only partially works: the <a href="http://sphinxsearch.com/docs/manual-2.0.4.html#sphinxse-snippets">sphinx_snippets()</a> user defined function is not included within the plugin library. After some quick poking I discovered that it was not added to the build, and when added, would not compile.</p>
<p>I rely on <strong>sphinx_snippets()</strong> quite a lot, and like it. Eventually I wrote a fix to <strong>snippets_udf.cc</strong> which allows it to run in a MySQL <strong>5.5</strong> server.</p>
<p>Here are the changes for the <strong>2.0.4</strong> version of Sphinx:</p>
<ul>
<li><a href="http://code.openark.org/blog/wp-content/uploads/2012/03/snippets_udf.cc">snippets_udf.cc</a></li>
<li><a href="http://code.openark.org/blog/wp-content/uploads/2012/03/CMakeLists.txt">CMakeLists.txt</a></li>
</ul>
<p>Replace your <strong>2.0.4</strong> files with these two and get on compiling your MySQL server.</p>
<h4>Compilation guide</h4>
<p>For completeness, here's how to compile Percona Server <strong>5.5</strong> with Sphinx <strong>2.0.4</strong> including the above patches:<span id="more-4775"></span></p>
<p>Get <a href="http://www.percona.com/downloads/Percona-Server-5.5/LATEST/">Percona Server source code</a> and <a href="http://sphinxsearch.com/downloads/release/">Sphinx Search source code</a>.</p>
<p>I'll be using Percona Server <strong>5.5.21-25.0</strong>. I use <strong>/data/tmp/mysql</strong> as compilation path, and install MySQL on <strong>/usr/local/mysql55</strong>.</p>
<blockquote>
<pre>mkdir -p /data/tmp/mysql
cd /data/tmp/mysql
tar xzfv Percona-Server-5.5.21-rel25.0.tar.gz
tar xzfv sphinx-2.0.4-release.tar.gz
cd Percona-Server-5.5.21-rel25.0/
cp -R /data/tmp/mysql/sphinx-2.0.4-release/mysqlse storage/sphinx</pre>
</blockquote>
<p>Overwrite with patched files included in this post:</p>
<blockquote>
<pre>cp /tmp/CMakeLists.txt storage/sphinx/CMakeLists.txt
cp /tmp/snippets_udf.cc storage/sphinx/snippets_udf.cc</pre>
</blockquote>
<p>Build MySQL:</p>
<blockquote>
<pre>sh BUILD/autorun.sh
./configure --with-plugin-sphinx --prefix=/usr/local/mysql55
make
sudo make install</pre>
</blockquote>
<p>Install the mysql55 service:</p>
<blockquote>
<pre>cd /usr/local/mysql55
sudo cp support-files/mysql.server /etc/init.d/mysql55</pre>
</blockquote>
<p>In <strong>/etc/bash.bashrc</strong>, add:</p>
<blockquote>
<pre>export PATH=/usr/local/mysql55/bin:${PATH}</pre>
</blockquote>
<p>Start MySQL:</p>
<blockquote>
<pre>sudo service mysql55 start</pre>
</blockquote>
<p>Log in to MySQL as an administrator (typically <strong>root</strong>) and install Sphinx:</p>
<blockquote>
<pre>mysql&gt; INSTALL PLUGIN sphinx SONAME 'ha_sphinx.so';
mysql&gt; CREATE FUNCTION sphinx_snippets RETURNS STRING SONAME 'ha_sphinx.so';</pre>
</blockquote>
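<p>To check that both pieces registered, something along these lines should do. This is only a sketch: it assumes the engine registers under the name <strong>SPHINX</strong>, and it uses the standard <strong>INFORMATION_SCHEMA.PLUGINS</strong> and <strong>mysql.func</strong> tables. I'm not listing expected output, since it varies per build:</p>
<blockquote>
<pre>-- Is the storage engine plugin loaded? (plugin name assumed to be SPHINX)
mysql&gt; SELECT PLUGIN_NAME, PLUGIN_STATUS
    -&gt;   FROM INFORMATION_SCHEMA.PLUGINS
    -&gt;   WHERE PLUGIN_NAME = 'SPHINX';

-- Is the UDF registered against the same shared library?
mysql&gt; SELECT name, dl FROM mysql.func WHERE name = 'sphinx_snippets';</pre>
</blockquote>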
<h4>Notes</h4>
<p>See also <a href="http://sphinxsearch.com/bugs/view.php?id=1090">http://sphinxsearch.com/bugs/view.php?id=1090</a> and <a href="http://sphinxsearch.com/forum/view.html?id=8982">http://sphinxsearch.com/forum/view.html?id=8982</a></p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/sphinx-sphinx_snippets-mysql-5-5/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Auto caching INFORMATION_SCHEMA tables: seeking input</title>
		<link>http://code.openark.org/blog/mysql/auto-caching-information_schema-tables-seeking-input</link>
		<comments>http://code.openark.org/blog/mysql/auto-caching-information_schema-tables-seeking-input#comments</comments>
		<pubDate>Thu, 08 Mar 2012 18:31:56 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Hack]]></category>
		<category><![CDATA[INFORMATION_SCHEMA]]></category>
		<category><![CDATA[Replication]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=4761</guid>
		<description><![CDATA[The short version I have it all working. It's kind of magic. But there are issues, and I'm not sure it should even exist, and am looking for input. The long version In Auto caching tables I presented a hack which allows getting cached or fresh results via simple SELECT queries. The drive [...]]]></description>
			<content:encoded><![CDATA[<h4>The short version</h4>
<p>I have it all working. It's kind of magic. But there are issues, and I'm not sure it should even exist, and am looking for input.</p>
<h4>The long version</h4>
<p>In <a title="Link to Auto caching tables" href="http://code.openark.org/blog/mysql/auto-caching-tables" rel="bookmark">Auto caching tables</a> I presented a hack which allows getting cached or fresh results via simple SELECT queries.</p>
<p>The drive for the above hack was <strong>INFORMATION_SCHEMA</strong> tables. There are two major problems with <strong>INFORMATION_SCHEMA</strong>:</p>
<ol>
<li>Queries on schema-oriented tables such as <strong>TABLES</strong>, <strong>COLUMNS</strong>, <strong>STATISTICS</strong>, etc. are heavyweight. How heavyweight? Enough to lock down your database. Enough to crash your database in some cases.</li>
<li>The data is always generated on-the-fly, as you request it. Query the <strong>COLUMNS</strong> table twice, and risk two lockdowns of your database.</li>
</ol>
<p>The auto-cache mechanism solves issue <strong>#2</strong>. I have it working, time based. I have an auto-cache table for each of the <strong>INFORMATION_SCHEMA</strong> heavyweight tables. Say, every <strong>30</strong> minutes the cache is invalidated. Throughout those <strong>30</strong> minutes, you get a free pass!</p>
<p>The auto-cache mechanism also paves the road to solving issue <strong>#1</strong>: since it works by invoking a stored routine, I have better control of the way I read <strong>INFORMATION_SCHEMA</strong>. Thus, I can take advantage of <a href="http://dev.mysql.com/doc/refman/5.1/en/information-schema-optimization.html">INFORMATION_SCHEMA optimization</a>. It's tedious, but not complicated.</p>
<p>For example, if I wanted to cache the <strong>TABLES</strong> table, I don't necessarily read the entire <strong>TABLES</strong> data in one read. Instead, I can iterate the schemata, get a list of table names per schema, then read full row data for these, table by table. The result? Many many more <strong>SELECT</strong>s, but more optimized, and no one-big-lock-it-all query.</p>
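<p>As a rough sketch of that iteration (the schema name <strong>sakila</strong> and table name <strong>film</strong> are placeholders only; a real routine would loop over the results of each step with a cursor), the idea is to always give <strong>INFORMATION_SCHEMA</strong> constant values for <strong>TABLE_SCHEMA</strong> and <strong>TABLE_NAME</strong>, and to ask for name-only columns when listing:</p>
<blockquote>
<pre>-- Step 1: list the schemata
SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA;

-- Step 2: per schema, fetch table names only
-- ('sakila' is a placeholder for each SCHEMA_NAME from step 1)
SELECT TABLE_NAME
  FROM INFORMATION_SCHEMA.TABLES
  WHERE TABLE_SCHEMA = 'sakila';

-- Step 3: per table, fetch the full row, with both lookup values constant
-- ('film' is a placeholder for each TABLE_NAME from step 2)
SELECT *
  FROM INFORMATION_SCHEMA.TABLES
  WHERE TABLE_SCHEMA = 'sakila' AND TABLE_NAME = 'film';</pre>
</blockquote>
<p>Each statement is small and short-lived, which is the whole point: the cost is spread over many cheap reads rather than concentrated in one big one.</p>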
<h4>And the problem is...</h4>
<p><span id="more-4761"></span>I have two burning problems.</p>
<ol>
<li><strong>INFORMATION_SCHEMA</strong> optimization only goes <em>so far</em>. It sometimes does not work. In particular, I've noticed that if you have a view which relies on another view (possibly relying on yet another view), things get out of hand. I author a monitoring tool for MySQL called <a href="http://code.openark.org/forge/mycheckpoint/">mycheckpoint</a>. It uses some fancy techniques for generating aggregated data, HTML and charts, by means of nested views. There are a few views there I can never query <strong>COLUMNS</strong> for. It just crashes my server. Repeatedly. And it's a good machine with good configuration. Make that <strong>5</strong> machines. They all crash, repeatedly. I just can't trust <strong>INFORMATION_SCHEMA</strong>!</li>
<li>Replication: any caching table is bound to replicate. Does it make any sense to replicate cache for internal metadata? Does it make sense to query for the cached table on slave, to have it answer for <em>master</em>'s data? With plain old <strong>INFORMATION_SCHEMA</strong>, every server is on its own. Caching kinda works against this. Or is it fair enough, since we would usually expect master/slaves to reflect the same schema structure?</li>
</ol>
<p>I would feel much better if I could read <strong>SHOW</strong> statements with a <strong>SELECT</strong> query. Though I've found this <a href="http://code.openark.org/blog/mysql/reading-results-of-show-statements-on-server-side">nice hack</a>, it can't work from a stored function, only via stored procedure. So it can't be used from within a <strong>SELECT</strong> query. I've been banging my head for months now, I think I gave up on this one.</p>
<p>Any insights are welcome!</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/auto-caching-information_schema-tables-seeking-input/feed</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Auto caching tables</title>
		<link>http://code.openark.org/blog/mysql/auto-caching-tables</link>
		<comments>http://code.openark.org/blog/mysql/auto-caching-tables#comments</comments>
		<pubDate>Tue, 06 Mar 2012 13:18:36 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Hack]]></category>
		<category><![CDATA[MyISAM]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Stored routines]]></category>
		<category><![CDATA[Views]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=4353</guid>
		<description><![CDATA[Is there a way to create a caching table, some sort of a materialized view, such that upon selecting from that table, its data is validated/invalidated? Hint: yes. But to elaborate the point: say I have some table data_table. Can I rewrite all my queries which access data_table to read from some autocache_data_table, but have [...]]]></description>
			<content:encoded><![CDATA[<p>Is there a way to create a caching table, some sort of a materialized view, such that <em>upon selecting</em> from that table, its data is validated/invalidated?</p>
<p><em>Hint</em>: yes.</p>
<p>But to elaborate the point: say I have some table <strong>data_table</strong>. Can I rewrite all my queries which access <strong>data_table</strong> to read from some <strong>autocache_data_table</strong>, but have nothing changed in the query itself? No caveats, no additional <strong>WHERE</strong>s, and still have that <strong>autocache_data_table</strong> provide the correct data, dynamically updated by some rule <em>of our choice</em>?</p>
<p>And: no <em>crontab</em>, no <em>event scheduler</em>, and no funny triggers on <strong>data_table</strong>? In such way that invalidation/revalidation occurs <em>upon <strong>SELECT</strong></em>?</p>
<p>Well, yes.</p>
<p>This post is long, but I suggest you read it through to understand the mechanism; it will be worthwhile.</p>
<h4>Background</h4>
<p>The following derives from my long research on how to provide better, faster and <em>safer</em> access to <strong>INFORMATION_SCHEMA</strong> tables. It is however not limited to this exact scenario, and in this post I provide a simple, general purpose example. I'll have more to share about <strong>INFORMATION_SCHEMA</strong> specific solutions shortly.</p>
<p>I was looking for a server side solution which would not require query changes, apart from directing the query to other tables. Solution has to be supported by all standard MySQL installs; so: no plugins, no special rebuilds.<span id="more-4353"></span></p>
<h4>Sample data</h4>
<p>I'll explain by walking through the solution. Let's begin with some sample table:</p>
<blockquote>
<pre>CREATE TABLE sample_data (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  dt DATETIME,
  msg VARCHAR(128) CHARSET ascii
);

INSERT INTO sample_data VALUES (1, NOW(), 'sample txt');
INSERT INTO sample_data VALUES (2, NOW(), 'sample txt');
INSERT INTO sample_data VALUES (3, NOW(), 'sample txt');

SELECT * FROM sample_data;
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+</pre>
</blockquote>
<p>In this simplistic example, I wish to create a construct which looks exactly like <strong>sample_data</strong>, but which caches data according to some heuristic. It will, in fact, cache the entire content of <strong>sample_data</strong>.</p>
<p>That much is not a problem: just create another table to cache the data:</p>
<blockquote>
<pre>CREATE TABLE cache_sample_data LIKE sample_data;</pre>
</blockquote>
<p>The big question is: how do you make the table invalidate itself while <strong>SELECT</strong>ing from it?</p>
<p>Here's the deal. I'll ask for your patience while I draw the outline, and start with failed solutions. By the end, everything will work.</p>
<h4>Failed attempt: purge rows from the table even while reading it</h4>
<p>My idea is to create a stored function which purges the <strong>cache_sample_data</strong> table, then fills in with fresh data, according to some heuristic. Something like this:</p>
<blockquote>
<pre>DELIMITER $$

CREATE FUNCTION `revalidate_cache_sample_data`() RETURNS tinyint unsigned
    MODIFIES SQL DATA
    DETERMINISTIC
    SQL SECURITY INVOKER
BEGIN
  if(rand() &gt; 0.1) then
    return 0; -- simplistic heuristic
  end if;

  DELETE FROM cache_sample_data;
  INSERT INTO cache_sample_data SELECT * FROM sample_data;
  RETURN 1; -- 1 signals the cache was revalidated
END $$

DELIMITER ;</pre>
</blockquote>
<p>So the function uses some heuristic. It's a funny <strong>RAND()</strong> in our case; you will want to check up on time stamps, or some flags, what have you. But this is not the important part here, and I want to keep the focus on the main logic.</p>
<p>Upon deciding the table needs refreshing, the function purges all rows, then copies everything from <strong>sample_data</strong>. Sounds fair enough?</p>
<p>Let's try and invoke it. Just write some query by hand:</p>
<blockquote>
<pre>mysql&gt; SELECT revalidate_cache_sample_data();
+--------------------------------+
| revalidate_cache_sample_data() |
+--------------------------------+
|                              <strong>0</strong> |
+--------------------------------+

mysql&gt; SELECT revalidate_cache_sample_data();
+--------------------------------+
| revalidate_cache_sample_data() |
+--------------------------------+
|                              <strong>0</strong> |
+--------------------------------+

mysql&gt; SELECT revalidate_cache_sample_data();
+--------------------------------+
| revalidate_cache_sample_data() |
+--------------------------------+
|                              <strong>1</strong> |
+--------------------------------+</pre>
</blockquote>
<p>First two invocations - nothing. The third one indicated a revalidation of cache data. Let's verify:</p>
<blockquote>
<pre>mysql&gt; SELECT * FROM cache_sample_data;
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+</pre>
</blockquote>
<p>OK, seems like the function works.</p>
<p>We now gather some courage, and try calling this function while SELECTing from the cache table, within the same query, like this:</p>
<blockquote>
<pre>SELECT
  cache_sample_data.*
FROM
  cache_sample_data,
  (SELECT revalidate_cache_sample_data()) AS select_revalidate
;
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+</pre>
</blockquote>
<p>To explain what happens in the above query, consider its <a href="http://code.openark.org/blog/mysql/slides-from-my-talk-programmatic-queries-things-you-can-code-with-sql">programmatic nature</a>: we create a derived table, populated by the function's result. That means the function is invoked in order to generate the derived table. The derived table itself must be materialized before the query begins execution, and so it is that we first invoke the function, then make the <strong>SELECT</strong>.</p>
<p>Don't open the champagne yet. While the above paragraph is correct, we are deceived: in this last invocation, the function did <strong>not</strong> attempt a revalidation. The <strong>RAND()</strong> function just didn't provide the right value.</p>
<p>Let's try again:</p>
<blockquote>
<pre>SELECT
  cache_sample_data.*
FROM
  cache_sample_data,
  (SELECT revalidate_cache_sample_data()) AS select_revalidate
;
<strong>ERROR 1442 (HY000): Can't update table 'cache_sample_data' in stored function/trigger because it is already used by statement which invoked this stored function/trigger.</strong></pre>
</blockquote>
<p>Aha! Bad news. The MySQL manual says on <a href="http://dev.mysql.com/doc/refman/5.1/en/stored-program-restrictions.html">Restrictions on Stored Programs</a>:</p>
<blockquote><p>A stored function or trigger cannot modify a table that is already being used (for reading or writing) by the statement that invoked the function or trigger.</p></blockquote>
<h4>Anyone to the rescue?</h4>
<p>I was quite upset. Can we not make this work? At sorrowful times like these, one reflects back on words of wiser people. What would <a href="http://rpbouman.blogspot.com/">Roland Bouman</a> say on this?</p>
<p>Oh, yes; he would say: <em>"we can use a <strong>FEDERATED</strong> table which connects onto itself, thus bypassing the above restriction"</em>.</p>
<p>Unfortunately, <strong>FEDERATED</strong> is by default disabled nowadays; I cannot rely on its existence. Besides, to use <strong>FEDERATED</strong> one has to fill in passwords and stuff. Definitely not an out-of-the-box solution in this case.</p>
<p>A few more days went by. I decided the problem could not be solved. And then it hit me.</p>
<h4>MyISAM to the rescue</h4>
<p><em><strong>MyISAM</strong></em>? Really?</p>
<p>Yes, and not only <strong>MyISAM</strong>, but also its cousin: its long-abandoned cousin, forgotten once <strong>views</strong> and <strong>partitions</strong> came into MySQL. <strong><a href="http://dev.mysql.com/doc/refman/5.1/en/merge-storage-engine.html">MERGE</a></strong>.</p>
<p><strong>MERGE</strong> reflects the data contained within <strong>MyISAM</strong> tables. Perhaps the most common use for <strong>MERGE</strong> is to work out a partition-like table of records, with a <strong>MyISAM</strong> table per month, and an overlooking <strong>MERGE</strong> table dynamically adding and removing tables from its view.</p>
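<p>For readers who have not met <strong>MERGE</strong> before, that common use looks roughly like the following. It is a sketch only: the <strong>log_*</strong> table names and columns are made up for illustration.</p>
<blockquote>
<pre>-- One MyISAM table per month (hypothetical names)
CREATE TABLE log_2012_01 (
  dt DATETIME NOT NULL,
  msg VARCHAR(128) CHARSET ascii,
  KEY (dt)
) ENGINE=MyISAM;
CREATE TABLE log_2012_02 LIKE log_2012_01;

-- The overlooking MERGE table: same structure, no data of its own
CREATE TABLE log_all LIKE log_2012_01;
ALTER TABLE log_all ENGINE=MERGE UNION=(log_2012_01, log_2012_02) INSERT_METHOD=LAST;

-- A new month begins: add a table, drop the oldest from the union
CREATE TABLE log_2012_03 LIKE log_2012_01;
ALTER TABLE log_all UNION=(log_2012_02, log_2012_03);</pre>
</blockquote>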
<p>But I intend for <strong>MERGE</strong> a different use: just be an identical reflection of <strong>cache_sample_data</strong>.</p>
<p>So we must work out the following:</p>
<blockquote>
<pre>ALTER TABLE <strong>cache_sample_data</strong> ENGINE=<strong>MyISAM</strong>;
CREATE TABLE <strong>cache_sample_data_wrapper</strong> LIKE cache_sample_data;
ALTER TABLE <strong>cache_sample_data_wrapper</strong> ENGINE=<strong>MERGE</strong> <strong>UNION=(cache_sample_data)</strong>;</pre>
</blockquote>
<p>I just want to verify the new table is set up correctly:</p>
<blockquote>
<pre>mysql&gt; SELECT * FROM cache_sample_data_wrapper;
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+</pre>
</blockquote>
<p>Seems fine.</p>
<p>So the next step is what makes the difference: the two tables are <em>not the same</em>. One <em>relies on the other</em>, but they are distinct. Our function <strong>DELETE</strong>s from and <strong>INSERT</strong>s to <strong>cache_sample_data</strong>, but it does <em>not affect, nor lock</em>, <strong>cache_sample_data_wrapper</strong>.</p>
<p>We now rewrite our query to read:</p>
<blockquote>
<pre>SELECT
  cache_sample_data_wrapper.*
FROM
  <strong>cache_sample_data_wrapper</strong>,
  (SELECT revalidate_cache_sample_data()) AS select_revalidate
;</pre>
</blockquote>
<p>This query is perfectly valid. It works. To illustrate, I do:</p>
<blockquote>
<pre>-- Try this a few times till RAND() is lucky:

<strong>TRUNCATE</strong> cache_sample_data;

SELECT
  cache_sample_data_wrapper.*
FROM
  cache_sample_data_wrapper,
  (SELECT revalidate_cache_sample_data()) AS select_revalidate
;
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+</pre>
</blockquote>
<p>Whoa! Where did all this data come from? Didn't we just <strong>TRUNCATE</strong> the table?</p>
<p>The query worked. The function re-populated <strong>cache_sample_data</strong>.</p>
<h4>The final touch</h4>
<p>Isn't the above query just <em>beautiful</em>? I suppose not many will share my opinion. What happened to my declaration that <em>"the original query need not be changed, apart from querying a different table"</em>?</p>
<p>Yes, indeed. It's now time for the final touch. There's nothing amazing in this step, but we all know the way it is packaged is what makes the sale. We will now use <em>views</em>. We use two of them since a view must not contain a <em>subquery</em> in the <strong>FROM</strong> clause. Here goes:</p>
<blockquote>
<pre>CREATE OR REPLACE VIEW <strong>revalidate_cache_sample_data_view</strong> AS
  SELECT revalidate_cache_sample_data()
;

CREATE OR REPLACE VIEW <strong>autocache_sample_data</strong> AS
  SELECT
    cache_sample_data_wrapper.*
  FROM
    cache_sample_data_wrapper,
    revalidate_cache_sample_data_view
;</pre>
</blockquote>
<p>And finally, we can make a very simple query like this:</p>
<blockquote>
<pre>SELECT * FROM <strong>autocache_sample_data</strong>;
--
-- <strong><span style="color: #ff9900;">Magic at work now!</span></strong>
--
+----+---------------------+------------+
| id | dt                  | msg        |
+----+---------------------+------------+
|  1 | 2011-11-24 11:01:30 | sample txt |
|  2 | 2011-11-24 11:01:30 | sample txt |
|  3 | 2011-11-24 11:01:30 | sample txt |
+----+---------------------+------------+</pre>
</blockquote>
<p>Much as we would query the original <strong>sample_data</strong> table.</p>
<h4>Summary</h4>
<p>So what have we got? A stored routine, a <strong>MyISAM</strong> table, a <strong>MERGE</strong> table and two views. Quite a lot of constructs just to cache a table! But a beautiful cache access: <em>plain old SQL queries</em>. The flow looks like this:</p>
<blockquote><p><a href="http://code.openark.org/blog/wp-content/uploads/2011/11/autocache_flow.png"><img class="alignnone size-full wp-image-4463" title="autocache flow chart" src="http://code.openark.org/blog/wp-content/uploads/2011/11/autocache_flow.png" alt="" width="835" height="625" /></a></p></blockquote>
<p>Our cache table is a <strong>MyISAM</strong> table. It can get corrupted, which is bad. But not completely bad: it's nothing more than a cache; we can throw away its entire data, and revalidate. We can actually ask the function to revalidate (say, pass a parameter).</p>
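<p>To make that last remark, and the earlier note about time stamps, a bit more concrete, here is a minimal sketch of a time-based variant that also accepts a "force" parameter. The status table <strong>cache_sample_data_status</strong>, the function name <strong>revalidate_cache_sample_data_timed</strong> and the <strong>30</strong> minute window are all assumptions made for illustration; the routine characteristics are simply copied from the function above.</p>
<blockquote>
<pre>-- Hypothetical bookkeeping table: one row holding the last refresh time
CREATE TABLE cache_sample_data_status (
  id TINYINT UNSIGNED NOT NULL PRIMARY KEY,
  last_refresh DATETIME NOT NULL
) ENGINE=MyISAM;
INSERT INTO cache_sample_data_status VALUES (1, '2000-01-01 00:00:00');

DELIMITER $$

CREATE FUNCTION `revalidate_cache_sample_data_timed`(force_refresh TINYINT UNSIGNED) RETURNS tinyint unsigned
    MODIFIES SQL DATA
    DETERMINISTIC
    SQL SECURITY INVOKER
BEGIN
  DECLARE last_refresh_time DATETIME;

  SELECT last_refresh INTO last_refresh_time
    FROM cache_sample_data_status WHERE id = 1;

  -- Cache is still fresh and no refresh was forced: do nothing
  IF force_refresh = 0 AND last_refresh_time &gt; NOW() - INTERVAL 30 MINUTE THEN
    RETURN 0;
  END IF;

  DELETE FROM cache_sample_data;
  INSERT INTO cache_sample_data SELECT * FROM sample_data;
  UPDATE cache_sample_data_status SET last_refresh = NOW() WHERE id = 1;
  RETURN 1; -- cache was revalidated
END $$

DELIMITER ;</pre>
</blockquote>
<p>The view would then call <strong>revalidate_cache_sample_data_timed(0)</strong>, while a plain <strong>SELECT revalidate_cache_sample_data_timed(1)</strong> rebuilds the cache on demand, for example after the <strong>MyISAM</strong> table got corrupted and was emptied.</p>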
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/auto-caching-tables/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>MySQL monitoring: storing, not caching</title>
		<link>http://code.openark.org/blog/mysql/mysql-monitoring-storing-not-caching</link>
		<comments>http://code.openark.org/blog/mysql/mysql-monitoring-storing-not-caching#comments</comments>
		<pubDate>Wed, 22 Feb 2012 07:44:47 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[mycheckpoint]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=4736</guid>
		<description><![CDATA[I've followed with interest Baron's Why don’t our new Nagios plugins use caching? and Sheeri's Caching for Monitoring: Timing is Everything. I wish to present my take on this, from mycheckpoint's point of view. So mycheckpoint works in a completely different way. On one hand, it doesn't bother with caching. On the other hand, [...]]]></description>
			<content:encoded><![CDATA[<p>I've followed with interest Baron's <a href="http://www.mysqlperformanceblog.com/2012/02/19/why-dont-our-new-nagios-plugins-use-caching/">Why don’t our new Nagios plugins use caching?</a> and Sheeri's <a href="http://www.sheeri.com/content/caching-monitoring-timing-everything">Caching for Monitoring: Timing is Everything</a>. I wish to present my take on this, from <a href="http://code.openark.org/forge/mycheckpoint">mycheckpoint</a>'s point of view.</p>
<p>So <em>mycheckpoint</em> works in a completely different way. On one hand, it doesn't bother with caching. On the other hand, it doesn't bother with re-reads of data.</p>
<p>There are no staleness issues, the data is as consistent as it can get (you can <em>never</em> get a completely atomic read of everything in MySQL), and you can issue as many calculations as you want at the price of one take of monitoring. As in Sheeri's example, you can run <strong>Threads_connected/max_connections*100</strong>, mix status variables, system variables, meta-variables (e.g. Seconds_behind_master), user-created variables (e.g. number of purchases in your online shop) etc.</p>
<p><em>mycheckpoint</em>'s concept is to <strong>store</strong> data. And store it in relational format. That is, <strong>INSERT</strong> it to a table.</p>
<p>A sample run generates one row, which lists all status, server, OS, user and meta variables. It's a huge row, with hundreds of columns: <strong>threads_connected</strong>, <strong>max_connections</strong>, <strong>innodb_buffer_pool_size</strong>, <strong>seconds_behind_master</strong>, etc.</p>
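<p>To make the "one sample, one row" idea concrete, here is a minimal sketch. It is not the actual <em>mycheckpoint</em> schema: the table is cut down to the four columns named above, the <strong>id</strong> and <strong>ts</strong> columns and all column types are assumptions, and the values are made up.</p>
<blockquote>
<pre>-- Simplified, hypothetical layout: the real table has hundreds of columns.
CREATE TABLE status_variables (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  threads_connected BIGINT UNSIGNED,
  max_connections BIGINT UNSIGNED,
  innodb_buffer_pool_size BIGINT UNSIGNED,
  seconds_behind_master BIGINT UNSIGNED
);

-- One sample run stores one row (illustrative values only).
INSERT INTO status_variables
  (threads_connected, max_connections, innodb_buffer_pool_size, seconds_behind_master)
VALUES
  (37, 500, 8589934592, 0);</pre>
</blockquote>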
<p><em>mycheckpoint</em> hardly cares about these columns. It identifies them dynamically. Have you just upgraded to MySQL <strong>5.5</strong>? Oh, there's a new bunch of server and status variables? No problem, <em>mycheckpoint</em> will notice it doesn't have the matching columns and will add them via ALTER TABLE. There you go, now we have a place to store them.</p>
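<p>As a rough illustration of what adding such a column could look like (the column name and type below are hypothetical, and this is not necessarily the exact statement <em>mycheckpoint</em> issues):</p>
<blockquote>
<pre>-- Hypothetical: a newly seen status variable with no matching column yet.
ALTER TABLE status_variables
  ADD COLUMN innodb_some_new_counter BIGINT UNSIGNED;</pre>
</blockquote>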
<p>Running a formula like <strong>Threads_connected/max_connections*100</strong> is as easy as issuing the following query:</p>
<blockquote>
<pre>SELECT Threads_connected/max_connections*100 FROM status_variables WHERE id = ...</pre>
</blockquote>
<p>Hmmm. This means I can run this formula on the most recent row I've just added. But wait, this also means I can run this formula on <em>any</em> row I've ever gathered.<span id="more-4736"></span></p>
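<p>Both cases can be sketched as follows, assuming <strong>id</strong> increases with every sample taken (the <strong>connections_pct</strong> alias is only there for readability):</p>
<blockquote>
<pre>-- The most recent sample only
SELECT id, Threads_connected/max_connections*100 AS connections_pct
FROM status_variables
ORDER BY id DESC
LIMIT 1;

-- Every sample ever gathered
SELECT id, Threads_connected/max_connections*100 AS connections_pct
FROM status_variables
ORDER BY id;</pre>
</blockquote>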
<p>With <em>mycheckpoint</em> you can generate graphs <strong>retroactively</strong> using new formulas. The data is there, vanilla style. Any formula which can be calculated via SQL is good to go. Plus, you get the benefit of cross referencing in fun ways: cross reference to the timestamp at which the sample was taken (so, for example, ignore the spikes generated at this and that timeframe due to maintenance; don't alert me on these), or to system metrics like load average or CPU usage (show me the average <strong>Seconds_behind_master</strong> when load average is over <strong>8</strong>, or the average load average when slow query rate is over some threshold). You don't do that all the time, but when you need it, you can get all the insight you ever wanted.</p>
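<p>A cross-referencing query of this kind might look like the sketch below. The <strong>ts</strong> and <strong>os_loadavg</strong> column names are assumptions made for illustration, and the maintenance window is made up:</p>
<blockquote>
<pre>-- Average replication lag while load average is over 8,
-- ignoring samples taken during a (made-up) maintenance window.
SELECT AVG(seconds_behind_master) AS avg_seconds_behind_master
FROM status_variables
WHERE os_loadavg &gt; 8
  AND ts NOT BETWEEN '2012-02-20 02:00:00' AND '2012-02-20 03:00:00';</pre>
</blockquote>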
<p>Actually storing the monitored data in an easy-to-access format allows one to query, re-query and re-formulate. No worries about caching: you only sample once.</p>
<p>For completeness, all the above is relevant when the data is of numeric types. Other types are far more complicated to manage (the list of running queries is a common example).</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/mysql-monitoring-storing-not-caching/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>

