<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>code.openark.org</title>
	<atom:link href="http://code.openark.org/blog/feed" rel="self" type="application/rss+xml" />
	<link>http://code.openark.org/blog</link>
	<description>Blog by Shlomi Noach</description>
	<lastBuildDate>Thu, 02 Sep 2010 08:56:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Sphinx &amp; MySQL: facts and misconceptions</title>
		<link>http://code.openark.org/blog/mysql/sphinx-mysql-facts-and-misconceptions</link>
		<comments>http://code.openark.org/blog/mysql/sphinx-mysql-facts-and-misconceptions#comments</comments>
		<pubDate>Thu, 02 Sep 2010 08:56:15 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Sphinx]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2754</guid>
		<description><![CDATA[Sphinx search is a full text search engine, commonly used with MySQL. There are some misconceptions about Sphinx and its usage. Following is a list of some of Sphinx&#8217; properties, hoping to answer some common questions. Sphinx is not part of MySQL/Oracle. It is a standalone server; an external application to MySQL. Actually, it is [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.sphinxsearch.com/">Sphinx search</a> is a full text search engine, commonly used with MySQL.</p>
<p>There are some misconceptions about Sphinx and its usage. The following list describes some of Sphinx&#8217;s properties, and should answer some common questions.</p>
<ul>
<li>Sphinx is not part of MySQL/Oracle.</li>
<li>It is a standalone server; an external application to MySQL.</li>
<li>Actually, it is not MySQL specific. It can work with other RDBMS: PostgreSQL, MS SQL Server.</li>
<li>And, although described as &#8220;free open-source SQL full-text search engine&#8221;, it is not SQL-specific: Sphinx can read documents from XML.</li>
<li>It is often described as &#8220;full text search for InnoDB&#8221;. This description is misleading. Sphinx indexes text, be it from any storage engine or external source. It solves, in a way, the issue of &#8220;FULLTEXT is only supported by MyISAM&#8221;. Essentially, it provides full-text indexing for InnoDB tables, but in a <em>very</em> different way than MyISAM&#8217;s <strong>FULLTEXT</strong> index does.</li>
</ul>
<p>Sphinx works by reading documents, usually from databases. Considering the case of MySQL, Sphinx issues a SQL query which retrieves relevant data (mostly the text you want to index, but other properties are allowed as well).<span id="more-2754"></span></p>
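<p>As a rough sketch of what such a source query might look like (the <strong>blog_post</strong> table and its columns are hypothetical, made up for illustration): Sphinx expects the first column to be a unique, positive integer document ID; the text columns get indexed, and the remaining numeric column could be declared as an attribute in the Sphinx configuration.</p>
<blockquote>
<pre>-- Hypothetical source query for Sphinx to read documents from MySQL.
-- Table and column names are assumptions, not taken from any real schema.
SELECT
  id,                                            -- unique document ID (must come first)
  title,                                         -- text to be indexed
  content,                                       -- text to be indexed
  UNIX_TIMESTAMP(published_at) AS published_ts   -- numeric property, usable for filtering
FROM
  blog_post;
</pre>
</blockquote>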
<ul>
<li>Being an external module, it does not update its indexes on the fly. So  if <strong>10</strong> new rows are <strong>INSERT</strong>ed, it has no knowledge of this. It must be  called externally to re-read the data (or just read the new data), and re-index.
<ul>
<li>This is perhaps the greatest difference, functionality-wise, between Sphinx and MyISAM&#8217;s <strong>FULLTEXT</strong>. The latter is always up to date, for every row <strong>INSERT</strong>ed, <strong>DELETE</strong>d or <strong>UPDATE</strong>d. It also suffers from this property, since maintaining the index on every write makes for serious overhead with large volumes.</li>
<li>There&#8217;s more than one way to make that less of an issue. I&#8217;ll write some more in future posts.</li>
</ul>
</li>
<li>Sphinx does not keep the text to itself; just the index. Sphinx cannot be asked &#8220;Give me the blog post content for those posts containing &#8216;open source&#8217;&#8221;.
<ul>
<li>Sphinx will only tell you the ID (i.e. Primary Key) for the row that matches your search.</li>
<li>It is up to you to then get the content from the table.</li>
<li>With SphinxSE (Sphinx Storage Engine for MySQL) this becomes easier: it can all be done in one query.</li>
</ul>
</li>
<li>It can keep other numeric data. Such data can be used to filter results.</li>
<li>It provides <strong>GROUP BY</strong>-like, as well as <strong>ORDER BY</strong>-like, mechanisms.</li>
<li>It allows for ordering results by relevance.</li>
<li>It allows for exact match search, boolean search, and more.</li>
<li>It has an API &amp; implementation for popular programming languages: PHP, Python, Perl, Ruby, Java.</li>
</ul>
<p>The above describes Sphinx as a general fulltext search engine for databases. It does, however, have special treatment for MySQL:</p>
<ul>
<li>First and foremost, it knows how to query MySQL for data (duh!)</li>
<li>If you don&#8217;t mind compiling from source, you can rebuild MySQL with <a href="http://www.sphinxsearch.com/docs/current.html#sphinxse">SphinxSE</a>: a storage engine implementation. This storage engine does not actually hold any data, but rather provides an SQL-like interface to the search daemon.
<ul>
<li>Thus, you can query for search results using <strong>SELECT</strong> statements, <strong>JOIN</strong>ing to document tables, retrieving results, all in one step.</li>
<li>If you do mind compiling MySQL, be aware that newer versions of MariaDB <a href="http://askmonty.org/wiki/MariaDB_versus_MySQL">come with SphinxSE</a> built in.</li>
</ul>
</li>
<li>It implements the MySQL protocol. You can connect to the sphinx server using a MySQL client, and actually issue SQL statements to retrieve data. Not all SQL is supported. The valid subset is called <a href="http://www.sphinxsearch.com/docs/current.html#sphinxql">SphinxQL</a>.</li>
<li>Snippets (excerpts) are <a href="http://www.sphinxsearch.com/docs/current.html#sphinxse-snippets">supported</a> via MySQL UDF.</li>
</ul>
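<p>To illustrate the points above, here is a minimal sketch of the two-step lookup using SphinxQL. It assumes a Sphinx index named <strong>blog_post_index</strong> built over a hypothetical <strong>blog_post</strong> table; both names, as well as the listed IDs, are made up. The first statement is sent to the Sphinx server through a MySQL client; the second goes to MySQL itself.</p>
<blockquote>
<pre>-- Step 1, on the Sphinx server (SphinxQL): get the IDs of matching documents.
-- blog_post_index is a hypothetical index name.
SELECT id FROM blog_post_index WHERE MATCH('open source') LIMIT 10;

-- Step 2, on MySQL: fetch the actual content for the IDs returned above.
-- The ID values here are placeholders for whatever step 1 returned.
SELECT id, title, content FROM blog_post WHERE id IN (17, 42, 108);
</pre>
</blockquote>
<p>Sphinx only hands back document IDs (and attributes), so the second query is where the text actually comes from.</p>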
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/sphinx-mysql-facts-and-misconceptions/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>mylvmbackup HOWTO: minimal privileges &amp; filesystem copy</title>
		<link>http://code.openark.org/blog/mysql/mylvmbackup-howto-minimal-privileges-filesystem-copy</link>
		<comments>http://code.openark.org/blog/mysql/mylvmbackup-howto-minimal-privileges-filesystem-copy#comments</comments>
		<pubDate>Tue, 17 Aug 2010 17:42:40 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Backup]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[scripts]]></category>
		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2839</guid>
		<description><![CDATA[This HOWTO discusses two (unrelated) issues with mylvmbackup: The minimal privileges required to take MySQL backups with mylvmbackup. Making (non compressed) file system copy of one&#8217;s data files. Minimal privileges Some just give mylvmbackup the root account, which is far too permissive. We now consider what the minimal requirements of mylvmbackup are. The queries mylvmbackup [...]]]></description>
			<content:encoded><![CDATA[<p>This HOWTO discusses two (unrelated) issues with <a href="http://www.lenzg.net/mylvmbackup/"><em>mylvmbackup</em></a>:</p>
<ul>
<li>The minimal privileges required to take MySQL backups with <em>mylvmbackup.</em></li>
<li>Making a (non-compressed) file system copy of one&#8217;s data files.</li>
</ul>
<h4>Minimal privileges</h4>
<p>Some just give <em>mylvmbackup</em> the root account, which is far too permissive. We now consider what the minimal requirements of <em>mylvmbackup</em> are.</p>
<p>The queries <em>mylvmbackup</em> issues are:</p>
<ul>
<li><strong>FLUSH TABLES</strong></li>
<li><strong>FLUSH TABLES WITH READ LOCK</strong></li>
<li><strong>SHOW MASTER STATUS</strong></li>
<li><strong>SHOW SLAVE STATUS</strong></li>
<li><strong>UNLOCK TABLES</strong></li>
</ul>
<p>Both <strong>SHOW MASTER STATUS</strong> &amp; <strong>SHOW SLAVE STATUS</strong> require either the <strong>SUPER</strong> or the <strong>REPLICATION CLIENT</strong> privilege. Since <strong>SUPER</strong> is far more powerful than needed, we choose <strong>REPLICATION CLIENT</strong>.</p>
<p>Both <strong>FLUSH TABLES</strong> statements, as well as <strong>UNLOCK TABLES</strong>, require the <strong>RELOAD</strong> privilege.</p>
<p>However, we are not done yet. <em>mylvmbackup</em> connects to the <strong>mysql</strong> database, which means we must have some privilege there, too. We choose the <strong>SELECT</strong> privilege.</p>
<p><span id="more-2839"></span>Finally, here are the commands to create a <em>mylvmbackup</em> user with minimal privileges:</p>
<blockquote>
<pre>CREATE USER 'mylvmbackup'@'localhost' IDENTIFIED BY '12345';
GRANT RELOAD, REPLICATION CLIENT ON *.* TO 'mylvmbackup'@'localhost';
GRANT SELECT ON mysql.* TO 'mylvmbackup'@'localhost';
</pre>
</blockquote>
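<p>One way to check that the account is sufficient is to list its grants, then connect as that user and issue the same statements <em>mylvmbackup</em> does. This is only a sanity check; keep in mind that <strong>FLUSH TABLES WITH READ LOCK</strong> blocks writes until <strong>UNLOCK TABLES</strong> is issued, so try it on a quiet server.</p>
<blockquote>
<pre>-- As an administrative user: review what the account was granted
SHOW GRANTS FOR 'mylvmbackup'@'localhost';

-- Connected as the mylvmbackup user: each statement should succeed
FLUSH TABLES;
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
SHOW SLAVE STATUS;
UNLOCK TABLES;
</pre>
</blockquote>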
<p>In the <strong>mylvmbackup.conf</strong> file, the corresponding entries are:</p>
<blockquote>
<pre>[mysql]
user=mylvmbackup
password=12345
host=localhost
</pre>
</blockquote>
<h4>Filesystem copy</h4>
<p>By default, <em>mylvmbackup</em> creates a <strong>.tar.gz</strong> compressed backup file of your data. This is good if the reason you&#8217;re running <em>mylvmbackup</em> is to, well, make a backup. However, one may also take a backup in order to set up a replication server. In this case you don&#8217;t really want compressed data: you want the data extracted on the replication server, just as it is on the original host.</p>
<p><em>mylvmbackup</em> supports backing up the files using <em>rsync</em>.</p>
<p>To copy MySQL data to a remote host, configure the following in the mylvmbackup.conf file:</p>
<blockquote>
<pre>[fs]
backupdir=shlomi@backuphost:/data/backup/mysql
[misc]
backuptype=rsync
</pre>
</blockquote>
<p>You may be prompted to enter a password, unless you have the user&#8217;s public key stored on the remote host.</p>
<p>Normally, <em>rsync</em> is thought of as <strong>r</strong>emote-<strong>sync</strong>, but it works just as well on local file systems. If you have a remote directory mounted on your file system (e.g. with <em>nfs</em>), you can simply point <em>rsync</em> at the local mount point:</p>
<blockquote>
<pre>[fs]
backupdir=/mnt/backup/mysql
[misc]
backuptype=rsync
</pre>
</blockquote>
<p>Voila! Your backup is complete.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/mylvmbackup-howto-minimal-privileges-filesystem-copy/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>MMM for MySQL single reader role</title>
		<link>http://code.openark.org/blog/mysql/mmm-for-mysql-single-reader-role</link>
		<comments>http://code.openark.org/blog/mysql/mmm-for-mysql-single-reader-role#comments</comments>
		<pubDate>Thu, 12 Aug 2010 12:12:16 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[High availability]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Replication]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2824</guid>
		<description><![CDATA[The standard documentation and tutorials on MMM for MySQL, for master-master replication setup, suggest one Virtual IP for the writer role, and two Virtual IPs for the reader role. It can be desired to only have a single virtual IP for the reader role, as explained below. The two IPs for the reader role A [...]]]></description>
			<content:encoded><![CDATA[<p>The standard documentation and tutorials on <a href="http://mysql-mmm.org/">MMM for MySQL</a>, for master-master replication setup, suggest one Virtual IP for the <em>writer</em> role, and two Virtual IPs for the <em>reader</em> role. It may be desirable to have only a single virtual IP for the reader role, as explained below.</p>
<h4>The two IPs for the reader role</h4>
<p>Here is a simplified excerpt from the <strong>mmm_common.conf</strong> sample configuration file, as found on the project&#8217;s site and most often quoted:<span id="more-2824"></span></p>
<blockquote>
<pre>...
&lt;host db1&gt;
  ip                      192.168.0.11
  mode                    master
  peer                    db2
&lt;/host&gt;

&lt;host db2&gt;
  ip                      192.168.0.12
  mode                    master
  peer                    db1
&lt;/host&gt;
...
&lt;role writer&gt;
  hosts                   db1, db2
  ips                     192.168.0.100
  mode                    exclusive
&lt;/role&gt;

&lt;role reader&gt;
  hosts                   db1, db2
  ips                     192.168.0.101, 192.168.0.102
  mode                    balanced
&lt;/role&gt;
</pre>
</blockquote>
<p>In the above setup <strong>db1</strong> &amp; <strong>db2</strong> participate in master-master active-passive replication. Whenever you need to write something, you use <strong>192.168.0.100</strong>, which is the virtual IP for the writer role. Whenever you need to read something, you use either <strong>192.168.0.101</strong> or <strong>192.168.0.102</strong>, which are the virtual IPs of the two machines, this time in the reader role. The logic is that one wishes to distribute reads between the two machines.</p>
<h4>One IP for reader role</h4>
<p>I have a few cases where the above setup is not satisfactory: there is a requirement to know the IP of the passive (read-only) master. Reason? There are queries which we only want to execute on the slave (reporting, long analysis), and only execute on the active master when this isn&#8217;t possible. Sometimes we might even prefer waiting for a slave to come back up rather than execute a query on the master.</p>
<p>This may involve an application level solution, or a connection-pool level solution (&#8220;get me a slave&#8217;s connection, or, if that&#8217;s not possible, get me the master&#8217;s&#8221;).</p>
<p>Anyway, neither <strong>192.168.0.101</strong> nor <strong>192.168.0.102</strong> relates to a particular machine&#8217;s role status. That is, whether or not one of the machines is in <em>writer</em> mode does not affect these virtual IPs.</p>
<p>The solution is a minor change to the configuration file. Real minor:</p>
<blockquote>
<pre>&lt;role reader&gt;
  hosts                   db1, db2
  ips                     192.168.0.101
  mode                    balanced
&lt;/role&gt;
</pre>
</blockquote>
<p>In this new setup the two nodes compete for a single <em>reader</em> role virtual IP. There is no <strong>192.168.0.102</strong> anymore. Although this is not evident from the configuration file, it turns out MMM acts in a smart way; the way you would expect it to.</p>
<p>There is nothing to suggest in the above that the IPs <strong>192.168.0.100</strong> &amp; <strong>192.168.0.101</strong> will be distributed between the two machines. But you would <em>like</em> them to. And MMM does that. It makes sure that, if possible, one of the machines (say <strong>db1</strong>) gets the <em>writer</em> role, hence <strong>192.168.0.100</strong>, and the other (<strong>db2</strong>) the <em>reader</em> role, hence <strong>192.168.0.101</strong>.</p>
<p>Moreover, it prefers that situation over a current known situation: say <strong>db1</strong> went down. The <em>writer</em> role moves to <strong>db2</strong>. When <strong>db1</strong> is up again, MMM acts smartly: it does <em>not</em> give it back the <em>writer</em> role (since moving the active master around is costly, after all), but <em>does</em> give it the <em>reader</em> role, along with the <strong>192.168.0.101</strong> IP. So it takes care not to leave a server without a role, while preferring to move the <em>writer</em> role as little as possible.</p>
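<p>To see which machine a given virtual IP currently lands on, one can connect through that IP and ask the server to identify itself. This is a small sketch and not part of MMM; whether <strong>read_only</strong> is actually enabled on the passive master is an assumption that should be verified on your own setup.</p>
<blockquote>
<pre>-- Run once while connected via 192.168.0.100 (writer) and once via 192.168.0.101 (reader).
-- The two connections are expected to report different hosts when both nodes are up.
SELECT @@hostname AS host, @@read_only AS is_read_only;
</pre>
</blockquote>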
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/mmm-for-mysql-single-reader-role/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Table refactoring &amp; application version upgrades, Part II</title>
		<link>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-ii</link>
		<comments>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-ii#comments</comments>
		<pubDate>Thu, 12 Aug 2010 03:24:06 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Replication]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2801</guid>
		<description><![CDATA[Continuing Table refactoring &#38; application version upgrades, Part I, we now discuss code &#38; database upgrades which require DROP operations. As before, we break apart the upgrade process into sequential steps, each involving either the application or the database, but not both. As I&#8217;ll show, DROP operations are significantly simpler than creation operations. Interestingly, it&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>Continuing <a href="http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-i">Table refactoring &amp; application version upgrades, Part I</a>, we now discuss code &amp; database upgrades which require <strong>DROP</strong> operations. As before, we break apart the upgrade process into sequential steps, each involving either the application or the database, but not both.</p>
<p>As I&#8217;ll show, DROP operations are significantly simpler than creation operations. Interestingly, it&#8217;s the same as in life.</p>
<h4>DROP COLUMN</h4>
<p>A column turns out to be redundant, unused. Before it is dropped from the database, we must ensure no one is using it anymore. The steps are:</p>
<ol>
<li>App: <strong>V1</strong> -&gt; <strong>V2</strong>. Remove all references to column; make sure no queries use said column.</li>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>DROP COLUMN</strong>.</li>
</ol>
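<p>As a sketch of step <strong>2</strong>, with a hypothetical <strong>customer</strong> table and <strong>fax_number</strong> column (both names are made up for illustration):</p>
<blockquote>
<pre>-- Only run this once the V2 application, which no longer references
-- the column, is the only version in use.
ALTER TABLE customer DROP COLUMN fax_number;
</pre>
</blockquote>
<p>Note that queries using <strong>SELECT *</strong> or <strong>INSERT</strong> without an explicit column list also count as references to the column.</p>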
<h4>DROP INDEX</h4>
<p>A possibly simpler case here. Why would you drop an index? Is it because you found out you never use it anymore? Then all you have to do is just drop it.</p>
<p>Or perhaps you don&#8217;t need the functionality the index supports anymore? Then first drop the functionality:</p>
<ol>
<li>(optional) App: <strong>V1</strong> -&gt; <strong>V2</strong>. Discard using functionality which relies on index.</li>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>DROP INDEX</strong>. Check out InnoDB Plugin here.<span id="more-2801"></span></li>
</ol>
<h4>DROP UNIQUE INDEX</h4>
<p>When using Master-Slave failover for table refactoring, we&#8217;re now removing a constraint from the slave. Since the master is more constrained than the slave, there is no problem here. It&#8217;s mostly the same as with a normal DROP INDEX, with a minor addition:</p>
<ol>
<li>(optional) App: <strong>V1</strong> -&gt; <strong>V2</strong>. Discard using functionality which relies on index.</li>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>DROP INDEX</strong>.</li>
<li>(optional) App: <strong>V2</strong> -&gt; <strong>V3</strong>. Enable functionality that inserts duplicates.</li>
</ol>
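<p>For illustration, step <strong>2</strong> might look as follows. The <strong>customer</strong> table and the <strong>email_uidx</strong> index name are hypothetical; a unique index is dropped with the very same statement as a normal one.</p>
<blockquote>
<pre>-- List existing indexes to find the exact index name
SHOW INDEX FROM customer;

-- Drop the (unique) index; the statement is the same for unique and non-unique indexes
ALTER TABLE customer DROP INDEX email_uidx;
</pre>
</blockquote>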
<h4>DROP FOREIGN KEY</h4>
<p>Again, we are removing a constraint.</p>
<ol>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>DROP FOREIGN KEY</strong>.</li>
<li>(optional) App: <strong>V2</strong> -&gt; <strong>V3</strong>. Enable functionality that conflicts with removed constraint. I mean, if you really know what you are doing.</li>
</ol>
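<p>A sketch of the database step, assuming a hypothetical <strong>order_item</strong> table with a constraint named <strong>order_item_order_fk</strong> (both names are made up). The constraint name is needed for the drop, and can be looked up in the table definition:</p>
<blockquote>
<pre>-- Find the constraint name in the table definition
SHOW CREATE TABLE order_item;

-- Drop the foreign key constraint by name
ALTER TABLE order_item DROP FOREIGN KEY order_item_order_fk;
</pre>
</blockquote>
<p>Dropping the foreign key does not drop the index on the referencing column; if that index is no longer needed, it is removed separately with <strong>DROP INDEX</strong>.</p>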
<h4>DROP TABLE</h4>
<p>The very simple steps are:</p>
<ol>
<li>App: <strong>V1</strong> -&gt; <strong>V2</strong>. Make sure no reference to table is made.</li>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong>. Issue a <strong>DROP TABLE</strong>.</li>
</ol>
<p>With <strong>ext3</strong>, dropping a large table is no less than a nightmare. Not only does the action take a long time, it also locks down the table cache, which very quickly leads to having dozens of queries hang. <strong>xfs</strong> is a good alternative.</p>
<h4>Conclusion</h4>
<p>We looked at single table operations, coupled with application upgrades. By carefully looking at the process breakdown, multiple changes can be addressed with ease and safety. Not all operations are completely safe when used with replication failover. But they are mostly safe if you have some trust in your code.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-ii/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Another Python MySQL template</title>
		<link>http://code.openark.org/blog/mysql/another-python-mysql-template</link>
		<comments>http://code.openark.org/blog/mysql/another-python-mysql-template#comments</comments>
		<pubDate>Wed, 11 Aug 2010 05:51:57 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[scripts]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2815</guid>
		<description><![CDATA[Following up on Matt Reid&#8217;s simple python, mysql connection and iteration, I would like to share one of my own, which is the base for mycheckpoint &#38; openark kit scripts. It is oriented to provide with clean access to the data: the user is not expected to handle cursors and connections. Result sets are returned [...]]]></description>
			<content:encoded><![CDATA[<p>Following up on Matt Reid&#8217;s <a href="http://themattreid.com/wordpress/?p=330">simple python, mysql connection and iteration</a>, I would like to share one of my own, which is the base for mycheckpoint &amp; openark kit scripts.</p>
<p>It is designed to provide clean access to the data: the user is not expected to handle cursors and connections. Result sets are returned as python lists and dictionaries. It is also config file aware and comes with built-in command line options.</p>
<p>I hope it comes to use: <a href="http://code.openark.org/blog/wp-content/uploads/2010/08/my.py">my.py</a></p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/another-python-mysql-template/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Table refactoring &amp; application version upgrades, Part I</title>
		<link>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-i</link>
		<comments>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-i#comments</comments>
		<pubDate>Tue, 10 Aug 2010 12:36:28 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Replication]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2775</guid>
		<description><![CDATA[A developer&#8217;s major concern is: How do I do application &#38; database upgrades with minimal downtime? How do I synchronize between a DB&#8217;s version upgrade and an application&#8217;s version upgrade? I will break down the discussion into types of database refactoring operations, and I will limit to single table refactoring. The discussion will try to [...]]]></description>
			<content:encoded><![CDATA[<p>A developer&#8217;s major concern is: <em>How do I do application &amp; database upgrades with minimal downtime? How do I synchronize between a DB&#8217;s version upgrade and an application&#8217;s version upgrade?<br />
</em></p>
<p>I will break down the discussion into types of database refactoring operations, and I will limit to single table refactoring. The discussion will try to understand the need for refactoring and will dictate the steps towards a successful upgrade.</p>
<h4>Reader prerequisites</h4>
<p>I will assume MySQL to be the underlying database. To take a major component out of the equation: we may need to deal with very large tables, for which an <strong>ALTER</strong> command may take long hours. I will assume familiarity with Master-Master (Active-Passive) replication, with possible use of <a href="http://mysql-mmm.org/">MMM for MySQL</a>. When I describe &#8220;Failover from <strong>M1</strong> to <strong>M2</strong>&#8221;, I mean &#8220;Make the <strong>ALTER</strong> changes on <strong>M2</strong> (passive), then switch your application from <strong>M1</strong> to <strong>M2</strong> (change of IPs, VIP, etc.), promoting <strong>M2</strong> to active position, then apply same changes on <strong>M1</strong> (now passive) or completely rebuild it&#8221;.</p>
<p>Phew, a one sentence description of M-M usage&#8230;</p>
<p>I also assume the reader&#8217;s understanding that a table&#8217;s schema can be different on master &amp; slave, which is the basis for the &#8220;use replication for refactoring&#8221; trick. But it cannot be too different, or, to be precise, the two schemata must both support the ongoing queries for the table.</p>
<p>A full discussion of the above is beyond the scope of this post.</p>
<h4>Types of refactoring needs</h4>
<p>As I limit this discussion to single table refactoring, we can look at major refactoring operations and their impact on application &amp; upgrades. We will discuss ADD/DROP COLUMN, ADD/DROP INDEX, ADD/DROP UNIQUE INDEX, ADD/DROP FOREIGN KEY, ADD/DROP TABLE.</p>
<p>We will assume the database and application are both in Version #1 (<strong>V1</strong>), and need to be upgraded to <strong>V2</strong> or greater.<span id="more-2775"></span></p>
<h4>ADD INDEX</h4>
<p>Starting with the easier actions. Why would you add an index? Either:</p>
<ol>
<li>There is some existing query which can be optimized by the new index</li>
<li>Or there is some new functionality which issues a query for which the new index is required.</li>
</ol>
<p>Adding an index is an easy action in that the table&#8217;s data does not really change.</p>
<p>In case <strong>#1</strong>, all you need to do is to add the new index (if the table is large, fail over from <strong>M1</strong> to <strong>M2</strong>). There is no application upgrade, so all that happens is that the database upgrades <strong>V1 </strong>-&gt;<strong> V2</strong>.</p>
<p>In case <strong>#2</strong>, the database must be prepared with new schema before the new functionality/query is introduced (since it depends on the existence of the index). The steps, therefore, are:</p>
<ol>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>)</li>
<li>(Sometime later) App: <strong>V1</strong> -&gt; <strong>V2</strong>. Application will issue queries which utilize the new index.</li>
</ol>
<p>The application does not have to be upgraded at the same instant the DB gets upgraded. In fact, we&#8217;ll see that this is a typical scenario: we can separate upgrades into smaller steps, which allow for time lapse. One <em>could</em> work out steps <strong>1</strong> &amp; <strong>2</strong> together, but that would take an extra effort.</p>
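<p>As a minimal sketch of the database step, assuming a hypothetical <strong>customer</strong> table with a <strong>last_name</strong> column (names made up for illustration), followed by a check that the query in question can use the new index:</p>
<blockquote>
<pre>-- DB V1 to V2: add the index (on a large table, do this on the passive master first)
ALTER TABLE customer ADD INDEX last_name_idx (last_name);

-- Verify that the optimizer considers the new index for the relevant query
EXPLAIN SELECT id FROM customer WHERE last_name = 'Smith';
</pre>
</blockquote>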
<h4>ADD COLUMN</h4>
<p>This must be one of the most common table schema upgrades: a new property is needed on the application side. It must be supported by the database. Perhaps a new field in some Java Object, with Hibernate mapping that field onto a new column. Or maybe the new column is there for purpose of de-normalization.</p>
<p>This is also a more complicated task. Let&#8217;s look at the required steps:</p>
<ol>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>ADD COLUMN</strong>.</li>
<li>App: <strong>V1</strong> -&gt; <strong>V2</strong>. Change is: provide column value for newly <strong>INSERT</strong>ed rows.</li>
<li>If needed, retroactively update column values for all pre-existing rows.</li>
<li>App: <strong>V2</strong> -&gt; <strong>V3</strong>. Application begins to use (read, <strong>SELECT</strong>) new column.</li>
</ol>
<p>The above procedure assumes that the new column must have some calculated value. A 10-million-row table must now be updated to have the correct values filled in. So we ask the application to start filling in data for new rows, which makes the set of invalid rows static. We can just take a &#8220;from row&#8221; and a &#8220;to row&#8221; and fill in the missing column&#8217;s value for those rows. Only when all rows contain valid values can we let the application start using that column. This makes for <em>two</em> application upgrades.</p>
<p>If you&#8217;re content with just a static <strong>DEFAULT</strong> value, then step <strong>3</strong> can be skipped, and step <strong>4</strong> can be merged with step <strong>2</strong>.</p>
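<p>The database side of the steps above could be sketched as follows. The <strong>customer</strong> table, its columns and the chunk size are all hypothetical; the point is that the backfill of step <strong>3</strong> works on a fixed range of pre-existing rows, in small chunks, so as not to hold locks for long.</p>
<blockquote>
<pre>-- Step 1 (DB V1 to V2): add the new column, initially empty
ALTER TABLE customer ADD COLUMN full_name VARCHAR(128) NULL;

-- Step 3: backfill pre-existing rows, one primary key range at a time.
-- Rows already filled in by the V2 application are left untouched.
UPDATE customer
SET full_name = CONCAT(first_name, ' ', last_name)
WHERE id BETWEEN 1 AND 10000
  AND full_name IS NULL;
-- ...repeat with the next range (10001 to 20000, and so on)

-- Before step 4: make sure no row was left behind
SELECT COUNT(*) FROM customer WHERE full_name IS NULL;
</pre>
</blockquote>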
<h4>ADD UNIQUE INDEX</h4>
<p>This is an altogether different case than the normal <strong>ADD INDEX</strong>, even though they may seem similar. And the case is particularly different when using Master-Slave failover for rebuilding the table.</p>
<p>Consider the case where we add a <strong>UNIQUE INDEX</strong> on a slave. Some <strong>INSERT</strong> query executes on the master, successfully, and is logged to the binary log. The slave picks it up, tries to execute it, to find that it fails on a DUPLICATE KEY error.</p>
<p>The <strong>UNIQUE INDEX</strong> is a constraint, and it makes the slave more constrained than the master. This is a delicate situation. Here is how to (mostly) work it out:</p>
<ol>
<li>App: <strong>V1</strong> -&gt; <strong>V2</strong>. Change <strong>INSERT</strong> queries on relevant table to <strong>INSERT IGNORE</strong> or <strong>REPLACE</strong> queries, whichever is more appropriate.</li>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>ADD UNIQUE KEY</strong> (and while at it, a tip: are you aware of <a href="http://dev.mysql.com/doc/refman/5.1/en/alter-table.html">ALTER IGNORE TABLE</a>?)</li>
</ol>
<p>The change of query ensures that the query will succeed on the slave (either by silently doing nothing or by actually replacing content). It also means that the slave can now have different data than the master. Of course, if you trust your application to never <strong>INSERT</strong> duplicates, you can sleep better.</p>
<p>We do not handle <strong>UPDATE</strong> statements here.</p>
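<p>A sketch of the two steps, again on a hypothetical <strong>customer</strong> table with an <strong>email</strong> column (names and values are made up). The first query is an addition of mine rather than one of the steps: it looks for duplicates that already exist, which would make a plain <strong>ADD UNIQUE KEY</strong> fail.</p>
<blockquote>
<pre>-- Preliminary check: are there already duplicates in the table?
SELECT email, COUNT(*) AS occurrences
FROM customer
GROUP BY email
HAVING COUNT(*) &gt; 1;

-- Step 1 (App V1 to V2): the application issues INSERT IGNORE instead of INSERT
INSERT IGNORE INTO customer (email, first_name, last_name)
VALUES ('jane@example.com', 'Jane', 'Doe');

-- Step 2 (DB V1 to V2): add the constraint
ALTER TABLE customer ADD UNIQUE KEY email_uidx (email);
</pre>
</blockquote>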
<h4>ADD CONSTRAINT FOREIGN KEY</h4>
<p>As with <strong>ADD UNIQUE INDEX</strong>, there is a new constraint here. A slave becomes more constrained than the master. But we now have to make sure <strong>INSERT</strong>, <strong>UPDATE</strong> and <strong>DELETE</strong> statements all go peacefully (well, it also depends on the type of <strong>ON DELETE</strong> and <strong>ON UPDATE</strong> property of the FK).</p>
<p>The steps would be:</p>
<ol>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (possibly failover from <strong>M1</strong> to <strong>M2</strong>), change is <strong>ADD CONSTRAINT FOREIGN KEY</strong>.</li>
</ol>
<p>And then cross your fingers or have trust in your application. If the table is small enough, one does not have to use replication to do the refactoring, and life is simpler. Just execute the <strong>ALTER</strong> on the active master, and continue with your life.</p>
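<p>To rely a little less on crossed fingers, one may look for rows that would violate the constraint before adding it. The following is a sketch with hypothetical <strong>orders</strong> and <strong>order_item</strong> tables; the constraint name and the <strong>ON DELETE</strong> behavior are arbitrary choices made for illustration.</p>
<blockquote>
<pre>-- Find child rows pointing at a parent row that does not exist
SELECT oi.id, oi.order_id
FROM order_item oi
LEFT JOIN orders o ON (o.id = oi.order_id)
WHERE oi.order_id IS NOT NULL
  AND o.id IS NULL;

-- DB V1 to V2: add the constraint once the query above returns no rows
ALTER TABLE order_item
  ADD CONSTRAINT order_item_order_fk
  FOREIGN KEY (order_id) REFERENCES orders (id)
  ON DELETE CASCADE;
</pre>
</blockquote>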
<h4>CREATE TABLE</h4>
<p>This is a simple case, since the table is new. The steps are:</p>
<ol>
<li>DB: <strong>V1</strong> -&gt; <strong>V2</strong> (no need to use slaves here)</li>
<li>App: <strong>V1</strong> -&gt; <strong>V2</strong>. Application will start using new table.</li>
</ol>
<h4>Conclusion</h4>
<p>Having such steps formalized helps with development management and database management. It makes clear what is expected of the application, and what is expected of the database. Breaking these operations down into sequential steps allows us to work more slowly; do preparation work; work within our own working hours; get a chance to see the family.</p>
<p>In this post we took a look at &#8220;creation&#8221; refactoring changes. New columns, new keys, new constraints. In the next part of this article, we&#8217;ll discuss <strong>DROP</strong> operations.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/table-refactoring-application-version-upgrades-part-i/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Tips for taking MySQL backups using LVM</title>
		<link>http://code.openark.org/blog/mysql/tips-for-taking-mysql-backups-using-lvm</link>
		<comments>http://code.openark.org/blog/mysql/tips-for-taking-mysql-backups-using-lvm#comments</comments>
		<pubDate>Tue, 03 Aug 2010 06:45:29 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Backup]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[scripts]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2717</guid>
		<description><![CDATA[LVM uses copy-on-write to implement snapshots. Whenever you&#8217;re writing data to some page, LVM copies the original page (the way it looked like when the snapshot was taken) to the snapshot volume. The snapshot volume must be large enough to accommodate all pages written to for the duration of the snapshot&#8217;s lifetime. In other words, [...]]]></description>
			<content:encoded><![CDATA[<p>LVM uses copy-on-write to implement snapshots. Whenever you&#8217;re writing data to some page, LVM copies the original page (the way it looked when the snapshot was taken) to the snapshot volume. The snapshot volume must be large enough to accommodate all pages written to during the snapshot&#8217;s lifetime. In other words, you must be able to copy the data somewhere outside (tape, NFS, rsync, etc.) in less time than it would take for the snapshot to fill up.</p>
<p>While LVM allows for hot backups of MySQL, it still has an impact on the disks. An LVM snapshot backup may not go unnoticed by MySQL users.</p>
<p>Some general guidelines for making life easier with LVM backups follow.</p>
<h4>Lighter, longer snapshots</h4>
<p>If you&#8217;re confident that you have enough space on your snapshot volume, you may take the opportunity to allow for a <em>longer</em> backup time. Why? Because you would then be able to reduce the stress on the file system. Use <strong>ionice</strong> when copying your data from the snapshot volume:</p>
<blockquote>
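<pre># best-effort class (-c 2) at its lowest priority (-n 7); without -n, ionice falls back to the default priority 4
ionice -c 2 -n 7 cp -R /mnt/mysql_snapshot /mnt/backup/daily/20100719/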
</pre>
</blockquote>
<p><em>[Update: this is only on the cfq I/O scheduler; thanks, Vojtech]</em></p>
<h4>Are you running out of space?</h4>
<p>Monitor snapshot&#8217;s allocated size: if there&#8217;s just one snapshot, do it like this:<span id="more-2717"></span></p>
<blockquote>
<pre>lvdisplay | grep Allocated

 Allocated to snapshot  3.63%
</pre>
</blockquote>
<p>Don&#8217;t let it reach 100%.</p>
<h4>Avoid running out of space</h4>
<p>To make sure you don&#8217;t run out of snapshot space, stop all administrative scripts.</p>
<ul>
<li>Are you running your weekly purging of old data? You will be writing a lot of pages, and all will have to fit in the snapshot.</li>
<li>Building your reports? You may be creating large temporary tables; make sure these are not on the snapshot volume.</li>
<li>Rebuilding your Sphinx fulltext index? Make sure it is not on the snapshot volume, or postpone till after backup.</li>
</ul>
<p>You will gain not only snapshot space, but also faster backups.</p>
<h4>Someone did the job before you</h4>
<p>Use <a href="http://www.lenzg.net/mylvmbackup/">mylvmbackup</a>: the MySQL LVM backup script by Lenz Grimmer. Or do it manually: follow this old-yet-relevant <a href="http://www.mysqlperformanceblog.com/2006/08/21/using-lvm-for-mysql-backup-and-replication-setup/">post</a> by Peter Zaitsev.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/tips-for-taking-mysql-backups-using-lvm/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>SQL trick: overcoming GROUP_CONCAT limitation in special cases</title>
		<link>http://code.openark.org/blog/mysql/sql-trick-overcoming-group_concat-limitation-in-special-cases</link>
		<comments>http://code.openark.org/blog/mysql/sql-trick-overcoming-group_concat-limitation-in-special-cases#comments</comments>
		<pubDate>Wed, 21 Jul 2010 13:14:30 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2580</guid>
		<description><![CDATA[In Verifying GROUP_CONCAT limit without using variables, I have presented a test to verify if group_concat_max_len is sufficient for known limitations. I will follow the path where I assume I cannot control group_concat_max_len, not even in session scope, and show an SQL solution, dirty as it is, to overcome the GROUP_CONCAT limitation, under certain conditions. [...]]]></description>
			<content:encoded><![CDATA[<p>In <a title="Link to Verifying GROUP_CONCAT limit without  using variables" rel="bookmark" href="http://code.openark.org/blog/mysql/verifying-group_concat-limit-without-using-variables">Verifying GROUP_CONCAT limit without using variables</a>, I have presented a test to verify if <strong>group_concat_max_len</strong> is sufficient for known limitations. I will follow the path where I assume I cannot control <strong>group_concat_max_len</strong>, not even in session scope, and show an SQL solution, dirty as it is, to overcome the <strong>GROUP_CONCAT</strong> limitation, under certain conditions.</p>
<p>Sheeri rightfully <a href="http://code.openark.org/blog/mysql/verifying-group_concat-limit-without-using-variables#comment-14617">asks</a> why I wouldn&#8217;t just set <strong>group_concat_max_len</strong> in session scope. The particular case I have is that I&#8217;m providing a VIEW definition. I&#8217;d like users to &#8220;install&#8221; that view, i.e. to <strong>CREATE</strong> it on their database. The VIEW does some logic, and uses <strong>GROUP_CONCAT</strong> to implement that logic.</p>
<p>Now, I have no control over the DBA or developer who creates the view. The creation of the view has nothing to do with the <strong>group_concat_max_len</strong> setting on her database instance.</p>
<h4>An example</h4>
<p>OK, apologies aside. Using the <a href="http://dev.mysql.com/doc/sakila/en/sakila.html">sakila</a> database, I execute:</p>
<blockquote>
<pre>mysql&gt; SELECT GROUP_CONCAT(last_name) FROM actor \G
*************************** 1. row ***************************
GROUP_CONCAT(last_name): AKROYD,AKROYD,AKROYD,ALLEN,ALLEN,ALLEN,ASTAIRE,BACALL,BAILEY,BAILEY,BALE,BALL,BARRYMORE,BASINGER,BENING,BENING,BERGEN,BERGMAN,BERRY,BERRY,BERRY,BIRCH,BLOOM,BOLGER,BOLGER,BRIDGES,BRODY,BRODY,BULLOCK,CAGE,CAGE,CARREY,CHAPLIN,CHASE,CHASE,CLOSE,COSTNER,CRAWFORD,CRAWFORD,CRONYN,CRONYN,CROWE,CRUISE,CRUZ,DAMON,DAVIS,DAVIS,DAVIS,DAY-LEWIS,DEAN,DEAN,DEE,DEE,DEGENERES,DEGENERES,DEGENERES,DENCH,DENCH,DEPP,DEPP,DERN,DREYFUSS,DUKAKIS,DUKAKIS,DUNST,FAWCETT,FAWCETT,GABLE,GARLAND,GARLAND,GARLAND,GIBSON,GOLDBERG,GOODING,GOODING,GRANT,GUINESS,GUINESS,GUINESS,HACKMAN,HACKMAN,HARRIS,HARRIS,HARRIS,HAWKE,HESTON,HOFFMAN,HOFFMAN,HOFFMAN,HOPE,HOPKINS,HOPKINS,HOPKINS,HOPPER,HOPPER,HUDSON,HUNT,HURT,JACKMAN,JACKMAN,JOHANSSON,JOHANSSON,JOHANSSON,JOLIE,JOVOVICH,KEITEL,KEITEL,KEITEL,KILMER,KILMER,KILMER,KILMER,KILMER,LEIGH,LOLLOBRIGIDA,MALDEN,MANSFIELD,MARX,MCCONAUGHEY,MCCONAUGHEY,MCDORMAND,MCKELLEN,MCKELLEN,MCQUEEN,MCQUEEN,MIRANDA,MONROE,MONROE,MOSTEL,MOSTEL,NEESON,NEESON,NICHOLSON,NOLTE,NOLTE,NOLTE,NOLTE,OLIVIER,OLIVIER,PALTROW,PALTROW,P
1 row in set, 1 warning (0.00 sec)

mysql&gt; SHOW WARNINGS;
+---------+------+--------------------------------------+
| Level   | Code | Message                              |
+---------+------+--------------------------------------+
| Warning | 1260 | 1 line(s) were cut by GROUP_CONCAT() |
+---------+------+--------------------------------------+
1 row in set (0.00 sec)
</pre>
</blockquote>
<p><span id="more-2580"></span>So, my <strong>GROUP_CONCAT</strong> has been truncated. How much did I lose?</p>
<blockquote>
<pre>mysql&gt; SELECT SUM(LENGTH(last_name) + 1) - 1 FROM actor;
+--------------------------------+
| SUM(LENGTH(last_name) + 1) - 1 |
+--------------------------------+
|                           1445 |
+--------------------------------+
</pre>
</blockquote>
<p>(In the above query I counted the separating commas; they are part of the <strong>GROUP_CONCAT</strong> limit).</p>
<h4>The special case at hand</h4>
<p>The proposed SQL trick assumes the following:</p>
<ul>
<li>The length of the <strong>GROUP_CONCAT</strong> result is <em>known to be under a certain value</em>.</li>
<li>A <strong>GROUP_CONCAT</strong> of any set of <em>n</em> rows is <em>known to be shorter than (or equal to) <strong>1024</strong> characters</em>.</li>
</ul>
<p>In our above example, I happen to know that the length of the <strong>GROUP_CONCAT</strong> result is below <strong>2048</strong>. I also happen to know that any <strong>100</strong> rows will yield a <strong>GROUP_CONCAT</strong> length of less than <strong>1024</strong>.</p>
<p>How can I know this? Well, the length of my <strong>VARCHAR</strong>, or the fact I&#8217;m handling <strong>INT</strong> values can give me upper bounds on total lengths.</p>
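<p>Such assumptions can at least be sanity-checked against the current data. The following sketch (my own addition, assuming chunks of <strong>100</strong> consecutive <strong>actor_id</strong> values) computes the length each chunk would need, separating commas included:</p>
<blockquote>
<pre>SELECT
  (actor_id - 1) DIV 100 AS chunk,
  COUNT(*) AS num_rows,
  SUM(LENGTH(last_name) + 1) - 1 AS chunk_length
FROM actor
GROUP BY chunk;
</pre>
</blockquote>
<p>Every <strong>chunk_length</strong> should stay safely below <strong>group_concat_max_len</strong> (<strong>1024</strong> by default). This only describes the rows as they are now; it is not a guaranteed upper bound for future data.</p>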
<h4>Steps towards the solution</h4>
<p>Returning to our example, my intention becomes clearer: I want to work it out in two phases (later on I&#8217;ll show how this can be done in more phases). Any of the following is good:</p>
<blockquote>
<pre>mysql&gt; SELECT GROUP_CONCAT(last_name) FROM actor WHERE actor_id BETWEEN 1 and 100 \G
*************************** 1. row ***************************
GROUP_CONCAT(last_name): GUINESS,WAHLBERG,CHASE,DAVIS,LOLLOBRIGIDA,NICHOLSON,MOSTEL,JOHANSSON,SWANK,GABLE,CAGE,BERRY,WOOD,BERGEN,OLIVIER,COSTNER,VOIGHT,TORN,FAWCETT,TRACY,PALTROW,MARX,KILMER,STREEP,BLOOM,CRAWFORD,MCQUEEN,HOFFMAN,WAYNE,PECK,SOBIESKI,HACKMAN,PECK,OLIVIER,DEAN,DUKAKIS,BOLGER,MCKELLEN,BRODY,CAGE,DEGENERES,MIRANDA,JOVOVICH,STALLONE,KILMER,GOLDBERG,BARRYMORE,DAY-LEWIS,CRONYN,HOPKINS,PHOENIX,HUNT,TEMPLE,PINKETT,KILMER,HARRIS,CRUISE,AKROYD,TAUTOU,BERRY,NEESON,NEESON,WRAY,JOHANSSON,HUDSON,TANDY,BAILEY,WINSLET,PALTROW,MCCONAUGHEY,GRANT,WILLIAMS,PENN,KEITEL,POSEY,ASTAIRE,MCCONAUGHEY,SINATRA,HOFFMAN,CRUZ,DAMON,JOLIE,WILLIS,PITT,ZELLWEGER,CHAPLIN,PECK,PESCI,DENCH,GUINESS,BERRY,AKROYD,PRESLEY,TORN,WAHLBERG,WILLIS,HAWKE,BRIDGES,MOSTEL,DEPP
1 row in set (0.00 sec)

mysql&gt; SELECT GROUP_CONCAT(last_name) FROM actor WHERE actor_id BETWEEN 101 and 200 \G
*************************** 1. row ***************************
GROUP_CONCAT(last_name): DAVIS,TORN,LEIGH,CRONYN,CROWE,DUNST,DEGENERES,NOLTE,DERN,DAVIS,ZELLWEGER,BACALL,HOPKINS,MCDORMAND,BALE,STREEP,TRACY,ALLEN,JACKMAN,MONROE,BERGMAN,NOLTE,DENCH,BENING,NOLTE,TOMEI,GARLAND,MCQUEEN,CRAWFORD,KEITEL,JACKMAN,HOPPER,PENN,HOPKINS,REYNOLDS,MANSFIELD,WILLIAMS,DEE,GOODING,HURT,HARRIS,RYDER,DEAN,WITHERSPOON,ALLEN,JOHANSSON,WINSLET,DEE,TEMPLE,NOLTE,HESTON,HARRIS,KILMER,GIBSON,TANDY,WOOD,MALDEN,BASINGER,BRODY,DEPP,HOPE,KILMER,WEST,WILLIS,GARLAND,DEGENERES,BULLOCK,WILSON,HOFFMAN,HOPPER,PFEIFFER,WILLIAMS,DREYFUSS,BENING,HACKMAN,CHASE,MCKELLEN,MONROE,GUINESS,SILVERSTONE,CARREY,AKROYD,CLOSE,GARLAND,BOLGER,ZELLWEGER,BALL,DUKAKIS,BIRCH,BAILEY,GOODING,SUVARI,TEMPLE,ALLEN,SILVERSTONE,WALKEN,WEST,KEITEL,FAWCETT,TEMPLE
1 row in set (0.00 sec)
</pre>
</blockquote>
<p>It&#8217;s somewhat tempting to try the following trick based on <strong>IF</strong>, but see what happens:</p>
<blockquote>
<pre>mysql&gt; SELECT GROUP_CONCAT(IF(actor_id BETWEEN 1 AND 100, last_name, '')) FROM actor\G
*************************** 1. row ***************************
GROUP_CONCAT(IF(actor_id BETWEEN 1 AND 100, last_name, '')): AKROYD,AKROYD,,,,,ASTAIRE,,BAILEY,,,,BARRYMORE,,,,BERGEN,,BERRY,BERRY,BERRY,,BLOOM,BOLGER,,BRIDGES,BRODY,,,CAGE,CAGE,,CHAPLIN,CHASE,,,COSTNER,CRAWFORD,,CRONYN,,,CRUISE,CRUZ,DAMON,DAVIS,,,DAY-LEWIS,DEAN,,,,DEGENERES,,,DENCH,,DEPP,,,,DUKAKIS,,,FAWCETT,,GABLE,,,,,GOLDBERG,,,GRANT,GUINESS,GUINESS,,HACKMAN,,HARRIS,,,HAWKE,,HOFFMAN,HOFFMAN,,,HOPKINS,,,,,HUDSON,HUNT,,,,JOHANSSON,JOHANSSON,,JOLIE,JOVOVICH,KEITEL,,,KILMER,KILMER,KILMER,,,,LOLLOBRIGIDA,,,MARX,MCCONAUGHEY,MCCONAUGHEY,,MCKELLEN,,MCQUEEN,,MIRANDA,,,MOSTEL,MOSTEL,NEESON,NEESON,NICHOLSON,,,,,OLIVIER,OLIVIER,PALTROW,PALTROW,PECK,PECK,PECK,PENN,,PESCI,,PHOENIX,PINKETT,PITT,POSEY,PRESLEY,,,,,SINATRA,SOBIESKI,STALLONE,STREEP,,,SWANK,TANDY,,TAUTOU,TEMPLE,,,,,TORN,TORN,,TRACY,,VOIGHT,WAHLBERG,WAHLBERG,,WAYNE,,,WILLIAMS,,,WILLIS,WILLIS,,,WINSLET,,,WOOD,,WRAY,ZELLWEGER,,
1 row in set (0.00 sec)
</pre>
</blockquote>
<p>We&#8217;re getting there, though. We will mimic <strong>GROUP_CONCAT</strong>&#8217;s separator by using <strong>CONCAT</strong>, and remove the default separator:</p>
<blockquote>
<pre>SELECT
 GROUP_CONCAT(
   IF(actor_id BETWEEN 1 AND 100, CONCAT(',', last_name), '')
   SEPARATOR ''
 ) AS result
FROM actor
\G
*************************** 1. row ***************************
result: ,AKROYD,AKROYD,ASTAIRE,BAILEY,BARRYMORE,BERGEN,BERRY,BERRY,BERRY,BLOOM,BOLGER,BRIDGES,BRODY,CAGE,CAGE,CHAPLIN,CHASE,COSTNER,CRAWFORD,CRONYN,CRUISE,CRUZ,DAMON,DAVIS,DAY-LEWIS,DEAN,DEGENERES,DENCH,DEPP,DUKAKIS,FAWCETT,GABLE,GOLDBERG,GRANT,GUINESS,GUINESS,HACKMAN,HARRIS,HAWKE,HOFFMAN,HOFFMAN,HOPKINS,HUDSON,HUNT,JOHANSSON,JOHANSSON,JOLIE,JOVOVICH,KEITEL,KILMER,KILMER,KILMER,LOLLOBRIGIDA,MARX,MCCONAUGHEY,MCCONAUGHEY,MCKELLEN,MCQUEEN,MIRANDA,MOSTEL,MOSTEL,NEESON,NEESON,NICHOLSON,OLIVIER,OLIVIER,PALTROW,PALTROW,PECK,PECK,PECK,PENN,PESCI,PHOENIX,PINKETT,PITT,POSEY,PRESLEY,SINATRA,SOBIESKI,STALLONE,STREEP,SWANK,TANDY,TAUTOU,TEMPLE,TORN,TORN,TRACY,VOIGHT,WAHLBERG,WAHLBERG,WAYNE,WILLIAMS,WILLIS,WILLIS,WINSLET,WOOD,WRAY,ZELLWEGER
1 row in set (0.00 sec)
</pre>
</blockquote>
<h4>Solution</h4>
<p>Let&#8217;s combine all we had so far to get the final result:</p>
<blockquote>
<pre>SELECT
  SUBSTRING(
    CONCAT(
      GROUP_CONCAT(
        IF(actor_id BETWEEN 1 AND 100, CONCAT(',', last_name), '')
        SEPARATOR ''
      ),
      GROUP_CONCAT(
        IF(actor_id BETWEEN 101 AND 200, CONCAT(',', last_name), '')
        SEPARATOR ''
      )
    ),
    2
  ) AS result
FROM actor
\G

*************************** 1. row ***************************
result: AKROYD,AKROYD,ASTAIRE,BAILEY,BARRYMORE,BERGEN,BERRY,BERRY,BERRY,BLOOM,BOLGER,BRIDGES,BRODY,CAGE,CAGE,CHAPLIN,CHASE,COSTNER,CRAWFORD,CRONYN,CRUISE,CRUZ,DAMON,DAVIS,DAY-LEWIS,DEAN,DEGENERES,DENCH,DEPP,DUKAKIS,FAWCETT,GABLE,GOLDBERG,GRANT,GUINESS,GUINESS,HACKMAN,HARRIS,HAWKE,HOFFMAN,HOFFMAN,HOPKINS,HUDSON,HUNT,JOHANSSON,JOHANSSON,JOLIE,JOVOVICH,KEITEL,KILMER,KILMER,KILMER,LOLLOBRIGIDA,MARX,MCCONAUGHEY,MCCONAUGHEY,MCKELLEN,MCQUEEN,MIRANDA,MOSTEL,MOSTEL,NEESON,NEESON,NICHOLSON,OLIVIER,OLIVIER,PALTROW,PALTROW,PECK,PECK,PECK,PENN,PESCI,PHOENIX,PINKETT,PITT,POSEY,PRESLEY,SINATRA,SOBIESKI,STALLONE,STREEP,SWANK,TANDY,TAUTOU,TEMPLE,TORN,TORN,TRACY,VOIGHT,WAHLBERG,WAHLBERG,WAYNE,WILLIAMS,WILLIS,WILLIS,WINSLET,WOOD,WRAY,ZELLWEGER,AKROYD,ALLEN,ALLEN,ALLEN,BACALL,BAILEY,BALE,BALL,BASINGER,BENING,BENING,BERGMAN,BIRCH,BOLGER,BRODY,BULLOCK,CARREY,CHASE,CLOSE,CRAWFORD,CRONYN,CROWE,DAVIS,DAVIS,DEAN,DEE,DEE,DEGENERES,DEGENERES,DENCH,DEPP,DERN,DREYFUSS,DUKAKIS,DUNST,FAWCETT,GARLAND,GARLAND,GARLAND,GIBSON,GOODING,GOODING,GUINESS,HACKMAN,HARRIS,HARRIS,HESTON,HOFFMAN,HOPE,HOPKINS,HOPKINS,HOPPER,HOPPER,HURT,JACKMAN,JACKMAN,JOHANSSON,KEITEL,KEITEL,KILMER,KILMER,LEIGH,MALDEN,MANSFIELD,MCDORMAND,MCKELLEN,MCQUEEN,MONROE,MONROE,NOLTE,NOLTE,NOLTE,NOLTE,PENN,PFEIFFER,REYNOLDS,RYDER,SILVERSTONE,SILVERSTONE,STREEP,SUVARI,TANDY,TEMPLE,TEMPLE,TEMPLE,TOMEI,TORN,TRACY,WALKEN,WEST,WEST,WILLIAMS,WILLIAMS,WILLIS,WILSON,WINSLET,WITHERSPOON,WOOD,ZELLWEGER,ZELLWEGER
1 row in set (0.00 sec)
</pre>
</blockquote>
<h4>More than 2048 characters?</h4>
<p>As long as the upper limit is known, we can work this trick in the same manner. Assume the length is expected to be <strong>3000</strong> characters. We can then <strong>CONCAT</strong> three, or four, or five <strong>GROUP_CONCAT</strong> results, each covering fewer rows as required. Just copy+paste the above <strong>GROUP_CONCAT(&#8230;)</strong> clause a couple more times, and edit the <strong>actor_id BETWEEN n AND m</strong> clauses.</p>
<p>Moreover, further using <strong>MIN(actor_id)</strong>, <strong>MAX(actor_id)</strong> can minimize dependencies on specific values.</p>
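<p>As a rough sketch of that idea (an illustration of mine, not a tested part of the view), the hard-coded <strong>100</strong> can be replaced with a midpoint computed by a scalar subquery:</p>
<blockquote>
<pre>SELECT
  SUBSTRING(
    CONCAT(
      GROUP_CONCAT(
        IF(actor_id &lt;= (SELECT (MIN(actor_id) + MAX(actor_id)) DIV 2 FROM actor),
           CONCAT(',', last_name), '')
        SEPARATOR ''
      ),
      GROUP_CONCAT(
        IF(actor_id &gt; (SELECT (MIN(actor_id) + MAX(actor_id)) DIV 2 FROM actor),
           CONCAT(',', last_name), '')
        SEPARATOR ''
      )
    ),
    2
  ) AS result
FROM actor
\G
</pre>
</blockquote>
<p>The midpoint splits the rows evenly only if the <strong>actor_id</strong> values are reasonably dense, and each half must still fit within the <strong>GROUP_CONCAT</strong> limit on its own.</p>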
<p>Dirty? Ugly? Not arguing. But it&#8217;s working! In some ways it is not such a dirty solution: I&#8217;m avoiding using stored routines (which could easily set the <strong>group_concat_max_len</strong> session variable from within a stored function&#8217;s body, see Justin&#8217;s <a href="http://code.openark.org/blog/mysql/verifying-group_concat-limit-without-using-variables#comment-14641">suggestion</a>), so I&#8217;m only relying on SQL, not on &#8220;external&#8221; technology, if I may call it that way.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/sql-trick-overcoming-group_concat-limitation-in-special-cases/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>mycheckpoint (rev. 170): improved custom queries; local charting; page/swap I/O monitoring; improved HTML reports</title>
		<link>http://code.openark.org/blog/mysql/mycheckpoint-rev-170-improved-custom-queries-local-charting-pageswap-io-monitoring-improved-html-reports</link>
		<comments>http://code.openark.org/blog/mysql/mycheckpoint-rev-170-improved-custom-queries-local-charting-pageswap-io-monitoring-improved-html-reports#comments</comments>
		<pubDate>Fri, 16 Jul 2010 08:58:40 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Graphs]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[mycheckpoint]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2650</guid>
		<description><![CDATA[Revision 170 of mycheckpoint, a MySQL monitoring solution, has been released. New and updated in this revision: Improved custom queries: lifting of limitations from previous, introductory revision; better HTML presentation Local, inline charting: no rendering of Google Charts, unless explicitly requested. All charts are now rendered locally using JavaScript. No data is now sent over [...]]]></description>
			<content:encoded><![CDATA[<p>Revision <strong>170</strong> of <a href="../../forge/mycheckpoint">mycheckpoint</a>, a MySQL monitoring solution, has  been released. New and updated in this revision:</p>
<ul>
<li><strong>Improved custom queries</strong>: lifting of limitations from previous, introductory revision; better HTML presentation</li>
<li><strong>Local, inline charting</strong>: no rendering of Google Charts, unless explicitly requested. All charts are now rendered locally using JavaScript. No data is now sent over the network.</li>
<li><strong>Page/Swap I/O monitoring</strong>: now monitoring for page ins and outs, swap ins and outs (Linux only).</li>
<li><strong>Improved HTML reports</strong>: several improvements on presentation (see <a href="http://code.openark.org/forge/wp-content/uploads/2010/07/mycheckpoint-report-brief-169.html">sample</a>, more follow).</li>
</ul>
<h4>Improved custom queries</h4>
<p>Some limitations, introduced in revision <strong>132</strong>, are now lifted. New features are introduced.</p>
<ul>
<li>There is now no limit to the number of custom queries (well, an INT limit).</li>
<li>In fact, the data tables adjust themselves to the existing custom queries in the form of auto-deploy: once a new <a href="http://code.openark.org/forge/mycheckpoint/documentation/custom-monitoring">custom query is added</a> or an old one removed, mycheckpoint will add or remove the relevant columns from the data tables.</li>
<li>The <strong>chart_order</strong> column is now utilized: HTML reports which include custom query charts now order those charts according to <strong>chart_order</strong> values. This makes for nicer reports.</li>
<li>The standard <a href="http://code.openark.org/forge/wp-content/uploads/2010/07/mycheckpoint-report-brief-169.html">HTML brief report</a> (<strong>SELECT html FROM sv_report_html_brief</strong>) now automatically includes all custom charts. The HTML brief report is the report one usually wants to look at: it provides the latest 24 hours of metrics for selected values. It now becomes a centralized place for all that is interesting in the past 24 hours.</li>
<li>Custom queries are now allowed to return <strong>NULL</strong>, treated as a missing value. This is a bugfix from previous revisions.</li>
</ul>
<h4>Local charting</h4>
<p>Motivation for local charting is clear: no one likes having their data being sent over the network. And no one likes Google to know about their DML values.</p>
<p>I&#8217;ve been playing around with quite a few charting solutions, and have gone into depths with two of them, adding and rewriting quite a lot of code. Eventually, I settled on my very own rendering. Here&#8217;s what I&#8217;ve seen &amp; tested:<span id="more-2650"></span></p>
<ul>
<li><a href="http://danvk.org/dygraphs/">dygraphs</a>: a very nice time series charting library. I&#8217;ve presented a use case on <a href="http://code.openark.org/blog/mysql/static-charts-vs-interactive-charts">a previous post</a>.
<ul>
<li>Pros: slick, easy to work with.</li>
<li>Cons: uses HTML Canvas for rendering. This is fine on Firefox, Chrome, Safari, you name it. This isn&#8217;t fine on IE, which does not support Canvas. There&#8217;s <a href="http://excanvas.sourceforge.net/">ExplorerCanvas</a>, a hack tool which converts canvas to IE&#8217;s VML, but it is far from satisfactory: it is <em>sloooow</em>. Very, very slow. It is slow with one chart; but loading 21 charts, as I do in some of <em>mycheckpoint</em>&#8217;s reports, can take <em>long minutes</em> on Internet Explorer.</li>
<li>Cons: Only provides a time series chart. No scatter plots.</li>
</ul>
</li>
<li>Because they&#8217;re using ExplorerCanvas for IE, <a href="http://code.google.com/p/flot/">flot</a>, <a href="http://www.jqplot.com/">jqPlot</a> etc., are all unacceptable.</li>
<li><a href="http://g.raphaeljs.com/">gRaphael</a>: very slick charts based on Raphael. The original line charts are very basic, and I have invested a lot of time rewriting a great deal (you can find it all <a href="http://code.google.com/p/mycheckpoint/source/browse/#svn/trunk/graphael">here</a>). Raphael uses VML on IE, and SVG for all other browsers.
<ul>
<li>Pros: very slick. Supports various chart types, including line (though not time-series) and scatter.</li>
<li>Cons: <em>slooooooooow</em> when instantiating multiple charts. Unbearably slow, both on Firefox and IE. Slow as in minutes of waiting.</li>
</ul>
</li>
</ul>
<p>In addition, all of the above solutions were quite heavyweight: at about 45KB to start with, then add ExplorerCanvas or jQuery, or Raphael as supporting libraries, these became a real burden.</p>
<p>So, I had some time to spare (business is fine, thank you. I was a bit ill. I&#8217;m feeling well now, thank you), and was upset about all the time I had invested in the above coding. And I decided to invest even more time, and build <em>my own</em> charts.</p>
<p>Enter <em>openark-charts</em>.</p>
<blockquote>
<pre><a href="http://code.openark.org/blog/wp-content/uploads/2010/07/mycheckpoint-report-html-screenshot.png"><img class="alignnone size-full wp-image-2662" title="mycheckpoint-report-html-screenshot" src="http://code.openark.org/blog/wp-content/uploads/2010/07/mycheckpoint-report-html-screenshot.png" alt="" width="808" height="307" /></a>

<a href="http://code.openark.org/blog/wp-content/uploads/2010/07/mycheckpoint-24-7-report-html-screenshot.png"><img class="alignnone size-full wp-image-2663" title="mycheckpoint-24-7-report-html-screenshot" src="http://code.openark.org/blog/wp-content/uploads/2010/07/mycheckpoint-24-7-report-html-screenshot.png" alt="" width="808" height="267" /></a></pre>
</blockquote>
<p>Currently, these line charts and scatter charts know how to parse a Google Image chart URL (only some features supported &#8212; only those I&#8217;m actually using with <em>mycheckpoint</em>). These are not full-blown solutions: they come to serve mycheckpoint. And they do so nicely, if I may say so. Using Canvas for most browsers, or VML for IE, these very small pieces of code (10K for line chart, 6K for scatter chart, minified) load fast, use very little memory, and do their work well.</p>
<p>Granted, neither provides interactive features: this is planned for the future.</p>
<h4>Page/swap I/O monitoring</h4>
<p>(Linux only) <em>mycheckpoint</em> now reads <strong>/proc/vmstat</strong> to get the <em>pageins</em>, <em>pageouts</em>, <em>swapins</em> and <em>swapouts</em> (since last reboot). I was actually looking at completely different places on the <strong>/proc</strong> file system to get swap info, and was frustrated with the complexity involved, until I bumped into <strong>/proc/vmstat</strong>&#8230; New tricks every day!</p>
<h4>Improved HTML reports</h4>
<p>This is mostly HTML make-up. Some minimal design, some more details thrown into the HTML pages (name of DB, MySQL version, <em>mycheckpoint</em> version). A little more verbosity; all sorts of stuff which was neglected so far.</p>
<p>Here are some <span style="text-decoration: line-through;"><strong>show off</strong></span> examples of the new HTML views: <a href="http://code.openark.org/forge/wp-content/uploads/2010/07/mycheckpoint-report-full-169.html">[full report]</a>, <a href="http://code.openark.org/forge/wp-content/uploads/2010/07/mycheckpoint-report-brief-169.html">[brief report]</a>, <a href="http://code.openark.org/forge/wp-content/uploads/2010/07/mycheckpoint-report-24-7-169.html">[24/7 report]</a>, <a href="http://code.openark.org/forge/wp-content/uploads/2010/07/mycheckpoint-report-custom-full-169.html">[custom full report]</a>, <a href="http://code.openark.org/forge/wp-content/uploads/2010/07/mycheckpoint-report-custom-brief-169.html">[custom brief report]</a>, <a href="http://code.openark.org/forge/wp-content/uploads/2010/07/mycheckpoint-alert-pending-169.html">[alert pending report]</a>.</p>
<p>All HTML views now utilize the new <em>openark-charts</em>, and none renders charts with Google charts. This means when you <a href="http://code.openark.org/forge/mycheckpoint/documentation/generating-html-reports">use your HTML view</a>, your data is safe. No data is sent over the net. All charts are rendered using JavaScript, which is loaded and executed locally.</p>
<p>But if you like, there&#8217;s a [url] link next to each chart, which leads to an (online) Google chart image. Why? Because neither HTML Canvas nor VML allow for a complete rendering of the charts to an image. So this is a way for one to retrieve &amp; store a chart&#8217;s image. Don&#8217;t use it if you see no reason for it; it&#8217;s just there.</p>
<p>And I even threw in rounded corners (IE users: only as of Windows 7).</p>
<h4>Future plans</h4>
<p>Work is going on. These are the non-scheduled future tasks I see:</p>
<ul>
<li>Monitoring InnoDB Plugin &amp; XtraDB status.</li>
<li>Interactive charts. See my <a href="../mysql/static-charts-vs-interactive-charts">earlier  post</a>.</li>
<li>A proper <em>man</em> page.</li>
<li>Anything else that interests me.</li>
</ul>
<h4>Try it out</h4>
<p>Try out <em>mycheckpoint</em>. It’s a different kind of monitoring solution. You will need basic SQL skills, and in return you’ll get a lot of power in your hands.</p>
<ul>
<li>Download mycheckpoint <a href="https://code.google.com/p/mycheckpoint/">here</a></li>
<li>Visit the project’s <a href="../../forge/mycheckpoint">homepage</a></li>
<li>Browse the <a href="../../forge/mycheckpoint/documentation">documentation</a></li>
<li>Report <a href="https://code.google.com/p/mycheckpoint/issues/list">bugs</a></li>
</ul>
<p><em>mycheckpoint</em> is released under the <a href="http://www.opensource.org/licenses/bsd-license.php">New BSD  License</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/mycheckpoint-rev-170-improved-custom-queries-local-charting-pageswap-io-monitoring-improved-html-reports/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Implicit casting you don&#8217;t want to see around</title>
		<link>http://code.openark.org/blog/mysql/implicit-casting-you-dont-want-to-see-around</link>
		<comments>http://code.openark.org/blog/mysql/implicit-casting-you-dont-want-to-see-around#comments</comments>
		<pubDate>Wed, 07 Jul 2010 08:53:37 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Data Types]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2344</guid>
		<description><![CDATA[In Beware of implicit casting, I have outlined the dangers of implicit casting. Here&#8217;s a few more real-world examples I have tackled: Number-String comparisons Much like in programming languages, implicit casting is made to numbers when at least one of the arguments is a number. Thus: mysql&#62; SELECT 3 = '3.0'; +-----------+ &#124; 3 = [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://code.openark.org/blog/mysql/beware-of-implicit-casting">Beware of implicit casting</a>, I have outlined the dangers of implicit casting. Here are a few more real-world examples I have tackled:</p>
<h4>Number-String comparisons</h4>
<p>Much like in programming languages, implicit casting is made to numbers when at least one of the arguments is a number. Thus:</p>
<blockquote><pre class="brush: sql;">
mysql&gt; SELECT 3 = '3.0';
+-----------+
| 3 = '3.0' |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

mysql&gt; SELECT '3' = '3.0';
+-------------+
| '3' = '3.0' |
+-------------+
|           0 |
+-------------+
</pre>
</blockquote>
<p>The second query is a pure string comparison. It has no way to determine that a numeric comparison should be made.</p>
<h4>Direct DATE arithmetic</h4>
<p>The first query <em>seems</em> to work, but is completely incorrect. The second explains why. The third is a total mess.<span id="more-2344"></span></p>
<blockquote><pre class="brush: sql;">
mysql&gt; SELECT DATE('2010-01-01')+3;
+----------------------+
| DATE('2010-01-01')+3 |
+----------------------+
|             20100104 |
+----------------------+
1 row in set (0.00 sec)

mysql&gt; SELECT DATE('2010-01-01')-3;
+----------------------+
| DATE('2010-01-01')-3 |
+----------------------+
|             20100098 |
+----------------------+
1 row in set (0.00 sec)

mysql&gt; SELECT '2010-01-01' - 3;
+------------------+
| '2010-01-01' - 3 |
+------------------+
|             2007 |
+------------------+
1 row in set, 1 warning (0.00 sec)
</pre>
</blockquote>
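<p>For comparison, here is a short sketch of how the same intent could be expressed with <strong>INTERVAL</strong> arithmetic, which treats the value as a date rather than as a number:</p>
<blockquote><pre class="brush: sql;">
-- Add three days to a date.
SELECT DATE('2010-01-01') + INTERVAL 3 DAY;     -- 2010-01-04

-- Subtract three days; the result crosses into the previous year.
SELECT DATE('2010-01-01') - INTERVAL 3 DAY;     -- 2009-12-29

-- The same subtraction using DATE_SUB().
SELECT DATE_SUB('2010-01-01', INTERVAL 3 DAY);  -- 2009-12-29
</pre>
</blockquote>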
<h4>Number-String comparisons, big integers</h4>
<p>Look at the following crazy comparisons:</p>
<blockquote><pre class="brush: sql;">
mysql&gt; SELECT 1234 = '1234';
+---------------+
| 1234 = '1234' |
+---------------+
|             1 |
+---------------+

mysql&gt; SELECT 123456789012345678 = '123456789012345678';
+-------------------------------------------+
| 123456789012345678 = '123456789012345678' |
+-------------------------------------------+
|                                         0 |
+-------------------------------------------+

mysql&gt; SELECT 123456789012345678 = '123456789012345677';
+-------------------------------------------+
| 123456789012345678 = '123456789012345677' |
+-------------------------------------------+
|                                         1 |
+-------------------------------------------+
</pre>
</blockquote>
<p>The result of the last two comparisons may strike you as odd. Actually, it may strike you as a bug, and indeed when a customer approached me with this behavior I was at a loss for words. But this is <a href="http://dev.mysql.com/doc/refman/5.0/en/type-conversion.html">documented</a>. The manual describes the cases for casting, then states: &#8220;&#8230; In all other cases, the arguments are compared <em>as floating-point (real) numbers</em>. &#8230;&#8221; A double-precision floating-point number has a 53-bit significand, so it cannot exactly represent every integer above 2<sup>53</sup> (about 9.0e15). An 18-digit value is rounded before it is compared.</p>
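<p>One way to sidestep the floating-point comparison, sketched here under the assumption that the string holds a non-negative integer, is to cast it explicitly so that both sides are integers:</p>
<blockquote><pre class="brush: sql;">
-- Both operands are integers, so the comparison is exact.
SELECT 123456789012345678 = CAST('123456789012345678' AS UNSIGNED);  -- 1
SELECT 123456789012345678 = CAST('123456789012345677' AS UNSIGNED);  -- 0
</pre>
</blockquote>
<p>Use <strong>SIGNED</strong> instead if the values may be negative.</p>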
<h4>Lessons learned:</h4>
<ul>
<li>Be careful when comparing strings that hold numeric values. Whether they match depends on how both are represented, and on whether the comparison is made on numbers or on strings.</li>
<li>Avoid converting temporal types to strings when doing date manipulation.</li>
<li>Avoid direct math on temporal types.</li>
<li>Avoid relying on implicit casting of <strong>BIGINT</strong>s represented by strings. The comparison will turn out to use floating-point values and may be incorrect.</li>
</ul>
<p>Last but not least:</p>
<ul>
<li>Use the proper data types for your data&#8217;s representation. When dealing with numbers, use numbers. When dealing with temporal values, use temporal types.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/implicit-casting-you-dont-want-to-see-around/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
