<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>code.openark.org &#187; Configuration</title>
	<atom:link href="http://code.openark.org/blog/tag/configuration/feed" rel="self" type="application/rss+xml" />
	<link>http://code.openark.org/blog</link>
	<description>Blog by Shlomi Noach</description>
	<lastBuildDate>Thu, 09 Sep 2010 16:15:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>MMM for MySQL single reader role</title>
		<link>http://code.openark.org/blog/mysql/mmm-for-mysql-single-reader-role</link>
		<comments>http://code.openark.org/blog/mysql/mmm-for-mysql-single-reader-role#comments</comments>
		<pubDate>Thu, 12 Aug 2010 12:12:16 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[High availability]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Replication]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2824</guid>
		<description><![CDATA[The standard documentation and tutorials on MMM for MySQL, for master-master replication setup, suggest one Virtual IP for the writer role, and two Virtual IPs for the reader role. It can be desired to only have a single virtual IP for the reader role, as explained below. The two IPs for the reader role A [...]]]></description>
			<content:encoded><![CDATA[<p>The standard documentation and tutorials on <a href="http://mysql-mmm.org/">MMM for MySQL</a>, for master-master replication setup, suggest one Virtual IP for the <em>writer</em> role, and two Virtual IPs for the <em>reader</em> role. It is sometimes desirable to have only a single virtual IP for the reader role, as explained below.</p>
<h4>The two IPs for the reader role</h4>
<p>A simplified excerpt from the <strong>mmm_common.conf</strong> sample configuration file, as found on the project&#8217;s site and quoted most often:<span id="more-2824"></span></p>
<blockquote>
<pre>...
&lt;host db1&gt;
  ip                      192.168.0.11
  mode                    master
  peer                    db2
&lt;/host&gt;

&lt;host db2&gt;
  ip                      192.168.0.12
  mode                    master
  peer                    db1
&lt;/host&gt;
...
&lt;role writer&gt;
  hosts                   db1, db2
  ips                     192.168.0.100
  mode                    exclusive
&lt;/role&gt;

&lt;role reader&gt;
  hosts                   db1, db2
  ips                     192.168.0.101, 192.168.0.102
  mode                    balanced
&lt;/role&gt;
</pre>
</blockquote>
<p>In the above setup <strong>db1</strong> &amp; <strong>db2</strong> participate in master-master active-passive replication. Whenever you need to write something, you use <strong>192.168.0.100</strong>, which is the virtual IP for the writer role. Whenever you need to read something, you use either <strong>192.168.0.101</strong> or <strong>192.168.0.102</strong>, which are the virtual IPs of the two machines, this time in the reader role. The rationale is that one wishes to distribute reads between the two machines.</p>
<h4>One IP for reader role</h4>
<p>I have a few cases where the above setup is not satisfactory: there is a requirement to know the IP of the passive (read-only) master. Reason? There are queries which we only want to execute on the slave (reporting, long analysis), and only execute on the active master when this isn&#8217;t possible. Sometimes we might even prefer waiting for a slave to come back up rather than execute a query on the master.</p>
<p>This may involve an application level solution, or a connection-pool level solution (&#8220;get me a slave&#8217;s connection, or, if that&#8217;s not possible, get me the master&#8217;s&#8221;).</p>
<p>Anyway, neither <strong>192.168.0.101</strong> nor <strong>192.168.0.102</strong> relates to a particular machine&#8217;s role status. That is, whether or not one of the machines is in <em>writer</em> mode does not affect these virtual IPs.</p>
<p>The solution is a minor change to the configuration file. Real minor:</p>
<blockquote>
<pre>&lt;role reader&gt;
  hosts                   db1, db2
  ips                     192.168.0.101
  mode                    balanced
&lt;/role&gt;
</pre>
</blockquote>
<p>In this new setup the two nodes compete for a single <em>reader</em> role virtual IP. There is no <strong>192.168.0.102</strong> anymore. Although this is not evident from the configuration file, it turns out MMM acts in a smart way; the way you would expect it to run.</p>
<p>There is nothing to suggest in the above that the IPs <strong>192.168.0.100</strong> &amp; <strong>192.168.0.101</strong> will be distributed between the two machines. But you would <em>like</em> them to. And MMM does that. It makes sure that, if possible, one of the machines (say <strong>db1</strong>) gets the <em>writer</em> role, hence <strong>192.168.0.100</strong>, and the other (<strong>db2</strong>) the <em>reader</em> role, hence <strong>192.168.0.101</strong>.</p>
<p>Moreover, it prefers that situation over a current known situation: say <strong>db1</strong> went down. The <em>writer</em> role moves to <strong>db2</strong>. When <strong>db1</strong> is up again, MMM acts smartly: it does <em>not</em> give it back the <em>writer</em> role (since moving the active master around is costly, after all), but <em>does</em> give it the <em>reader</em> role, along with the <strong>192.168.0.101</strong> IP. So it takes care not to leave a server without a role, while preferring to move the <em>writer</em> role as little as possible.</p>
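<p>To verify which node a connection through the reader IP actually landed on, a quick check can be run from the application or the connection pool. This is only a sketch: it assumes the passive master runs with <strong>read_only</strong> enabled and the active master does not, which is not stated above and should be confirmed on your own setup.</p>
<blockquote>
<pre>-- Run over a connection opened to 192.168.0.101 (the reader role IP).
-- Assumption: passive master has read_only = 1, active master has read_only = 0.
SELECT
  @@hostname AS connected_host,
  @@global.read_only AS is_read_only;
</pre>
</blockquote>
<p>Under that assumption, an <strong>is_read_only</strong> value of <strong>0</strong> means the reader IP currently sits on the active master (for example, because the other node is down), and the application can decide whether to run the heavy query anyway or wait.</p>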
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/mmm-for-mysql-single-reader-role/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Verifying GROUP_CONCAT limit without using variables</title>
		<link>http://code.openark.org/blog/mysql/verifying-group_concat-limit-without-using-variables</link>
		<comments>http://code.openark.org/blog/mysql/verifying-group_concat-limit-without-using-variables#comments</comments>
		<pubDate>Thu, 10 Jun 2010 07:16:14 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[INFORMATION_SCHEMA]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2534</guid>
		<description><![CDATA[I have a case where I must know if group_concat_max_len is at its default value (1024), which means there are some operations I cannot work out. I&#8217;ve ranted on this here. Normally, I would simply: SELECT @@group_concat_max_len However, I am using views, where session variables are not allowed. Using a stored function can do the [...]]]></description>
			<content:encoded><![CDATA[<p>I have a case where I must know if <strong>group_concat_max_len</strong> is at its default value (<strong>1024</strong>), which means there are some operations I cannot work out. I&#8217;ve ranted on this <a href="http://code.openark.org/blog/mysql/those-oversized-undersized-variables-defaults">here</a>.</p>
<p>Normally, I would simply:</p>
<blockquote><pre class="brush: sql;">
SELECT @@group_concat_max_len
</pre>
</blockquote>
<p>However, I am using views, where session variables are not allowed. Using a stored function can <a href="http://code.openark.org/blog/mysql/views-better-performance-with-condition-pushdown">do the trick</a>, but I wanted to avoid stored routines. So here&#8217;s a very simple test case: is the current <strong>group_concat_max_len</strong> long enough or not? I&#8217;ll present the long version and the short version.</p>
<h4>The long version</h4>
<blockquote><pre class="brush: sql;">
SELECT
  CHAR_LENGTH(
    GROUP_CONCAT(
      COLLATION_NAME SEPARATOR ''
    )
  )
FROM
  INFORMATION_SCHEMA.COLLATIONS;
</pre>
</blockquote>
<p>If the result is <strong>1024</strong>, we are in bad shape. I happen to know that the total length of collation names is above <strong>1800</strong>, and so it is trimmed down. Another variant of the above query would be:<span id="more-2534"></span></p>
<blockquote><pre class="brush: sql;">
SELECT
  CHAR_LENGTH(
    GROUP_CONCAT(
      COLLATION_NAME SEPARATOR ''
    )
  ) = SUM(CHAR_LENGTH(COLLATION_NAME))
    AS group_concat_max_len_is_long_enough
FROM
  INFORMATION_SCHEMA.COLLATIONS;

+-------------------------------------+
| group_concat_max_len_is_long_enough |
+-------------------------------------+
|                                   0 |
+-------------------------------------+
</pre>
</blockquote>
<p>The <strong>COLLATIONS</strong>, <strong>CHARACTER_SETS</strong> or <strong>COLLATION_CHARACTER_SET_APPLICABILITY</strong> tables provide values which are known to exist (assuming you did not compile MySQL with only particular charsets). It&#8217;s possible to <strong>CONCAT</strong>, <strong>UNION</strong> or <strong>JOIN</strong> columns and tables to test <strong>group_concat_max_len</strong> against lengths greater than <strong>1800</strong> characters. I admit this is becoming ugly, so let&#8217;s move on.</p>
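<p>As a rough sketch of the <strong>JOIN</strong> approach (the table aliases are mine), a cross join with <strong>CHARACTER_SETS</strong> repeats every collation name once per character set, which makes the concatenated text many times longer:</p>
<blockquote><pre class="brush: sql;">
SELECT
  CHAR_LENGTH(
    GROUP_CONCAT(
      c.COLLATION_NAME SEPARATOR ''
    )
  ) = SUM(CHAR_LENGTH(c.COLLATION_NAME))
    AS group_concat_max_len_is_long_enough
FROM
  INFORMATION_SCHEMA.COLLATIONS c,
  INFORMATION_SCHEMA.CHARACTER_SETS s;
</pre>
</blockquote>
<p>The length being tested is then the total length of collation names multiplied by the number of character sets on the server, so the exact threshold depends on the installation.</p>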
<h4>The short version</h4>
<p>Don&#8217;t want to rely on existing tables? Not sure what values to expect? Look at this:</p>
<blockquote><pre class="brush: sql;">
SELECT CHAR_LENGTH(GROUP_CONCAT(REPEAT('0', 1025))) FROM DUAL
</pre>
</blockquote>
<p><strong>GROUP_CONCAT</strong> doesn&#8217;t really care about the number of rows. In the above example, I&#8217;m using a single row (retrieved from the <strong>DUAL</strong> virtual table), making sure it is long enough. Type in any number in place of <strong>1025</strong>, and you have a metric for your <strong>group_concat_max_len</strong>.</p>
<blockquote><pre class="brush: sql;">
SELECT
  CHAR_LENGTH(GROUP_CONCAT(REPEAT('0', 32768))) &gt;= 32768 AS group_concat_max_len_is_long_enough
FROM
  DUAL;
+-------------------------------------+
| group_concat_max_len_is_long_enough |
+-------------------------------------+
|                                   0 |
+-------------------------------------+
</pre>
</blockquote>
<p>The above makes a computation with <strong>REPEAT</strong>. One can replace this with a big constant.</p>
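<p>Since the motivation was views, here is a minimal sketch of wrapping the check in one. The view name is hypothetical, and <strong>32768</strong> is simply the threshold used above:</p>
<blockquote><pre class="brush: sql;">
-- Hypothetical view name; no session variables are referenced
CREATE OR REPLACE VIEW group_concat_check AS
  SELECT
    CHAR_LENGTH(GROUP_CONCAT(REPEAT('0', 32768))) &gt;= 32768
      AS group_concat_max_len_is_long_enough
  FROM
    DUAL;

-- Evaluated with whatever group_concat_max_len the querying session has
SELECT group_concat_max_len_is_long_enough FROM group_concat_check;
</pre>
</blockquote>
<p>Other views can then join against this one and refuse to return rows when the flag is <strong>0</strong>.</p>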
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/verifying-group_concat-limit-without-using-variables/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Those oversized, undersized variables defaults</title>
		<link>http://code.openark.org/blog/mysql/those-oversized-undersized-variables-defaults</link>
		<comments>http://code.openark.org/blog/mysql/those-oversized-undersized-variables-defaults#comments</comments>
		<pubDate>Wed, 09 Jun 2010 04:35:08 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[sql_mode]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1997</guid>
		<description><![CDATA[Some mysqld parameters are far from having reasonable defaults. Most notable are the engine-specific values, and in particular the InnoDB parameters. Some of these variables have different defaults as of MySQL 5.4. innodb_buffer_pool_size, for example, is 128M on 5.4. innodb_log_file_size, however, has changed back and forth, as far as I understand, and is down to [...]]]></description>
			<content:encoded><![CDATA[<p>Some <strong>mysqld</strong> parameters are far from having reasonable defaults. Most notable are the engine-specific values, and in particular the InnoDB parameters.</p>
<p>Some of these variables have different defaults as of MySQL 5.4. <strong>innodb_buffer_pool_size</strong>, for example, is <strong>128M</strong> on 5.4. <strong>innodb_log_file_size</strong>, however, has changed back and forth, as far as I understand, and is down to <strong>5M</strong> again. These settings are still the same on 5.5.</p>
<p>I wish to present some not-so-obvious parameters which, in my opinion, have poor defaults, for reasons I will explain.</p>
<ul>
<li><strong>group_concat_max_len</strong>: This parameter limits the maximum text length of a <strong>GROUP_CONCAT</strong> concatenation result. It defaults to <strong>1024</strong>. I think this is a very low value. I have been using <strong>GROUP_CONCAT</strong> more and more, recently, to solve otherwise difficult problems. And in most cases, <strong>1024</strong> was just too low, resulting in <a href="http://code.openark.org/blog/mysql/but-i-do-want-mysql-to-say-error">silent</a> (<em>Argh!</em>) truncating of the result, thus returning incorrect results. It is interesting to learn that the maximum value for this parameter is limited by <strong>max_allowed_packet</strong>. I would suggest, then, that this parameter should be altogether removed, and have the <strong>max_allowed_packet</strong> limitation as the only limitation. Otherwise, I&#8217;d like it to have a very large default value, on the order of a few MB.</li>
<li><strong>wait_timeout</strong>: Here&#8217;s a parameter whose default value is over permissive. <strong>wait_timeout</strong> enjoys an <strong>8 hour</strong> default. I usually go for <strong>5-10 minutes</strong>. I don&#8217;t see a point in letting idle connections waste resources for 8 hours. Applications which hold up such connections should be aware that they&#8217;re doing something wrong, in the form of a forced disconnection. Connection pools work beautifully with low settings, and can themselves do keepalives, if they choose to.</li>
<li><strong>sql_mode</strong>: I&#8217;ve <a href="http://code.openark.org/blog/mysql/do-we-need-sql_mode">discussed this</a> at length before. My opinion is unchanged.</li>
<li><strong>open_files_limit</strong>: Since connections, threads, table descriptors, table file descriptors (depending on how you use InnoDB) and temporary table files are all files on unix-like systems, and since a higher limit is an inexpensive payment, I think <strong>open_files_limit</strong> should default to a few thousand. Why risk the crash of &#8220;too many open files&#8221;?</li>
</ul>
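<p>For reference, a short sketch of inspecting these values on a running server and changing the dynamic ones. The numbers are illustrative only, not recommendations for every setup:</p>
<blockquote><pre class="brush: sql;">
-- Current global values
SHOW GLOBAL VARIABLES
  WHERE Variable_name IN
    ('group_concat_max_len', 'wait_timeout', 'sql_mode', 'open_files_limit');

-- Dynamic variables: new connections pick these up
SET GLOBAL group_concat_max_len = 1048576;  -- 1MB, illustrative
SET GLOBAL wait_timeout = 600;              -- 10 minutes, illustrative

-- open_files_limit is not dynamic: it must be set in the
-- configuration file and takes effect on server restart.
</pre>
</blockquote>
<p>Values set with <strong>SET GLOBAL</strong> are lost on restart, so they should also be written to the configuration file.</p>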
<p><span id="more-1997"></span>No setting will ever be perfect for everyone, I know. But there are those parameters which you automatically set values for when you do a new install. These should be the focus, and their defaults should change.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/those-oversized-undersized-variables-defaults/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Replication configuration checklist</title>
		<link>http://code.openark.org/blog/mysql/replication-configuration-checklist</link>
		<comments>http://code.openark.org/blog/mysql/replication-configuration-checklist#comments</comments>
		<pubDate>Tue, 18 May 2010 07:27:06 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[Replication]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2357</guid>
		<description><![CDATA[This post lists the essential and optional settings for a replication environment. It does not explain how to create replicating slaves. See How To Setup Replication for that. However, not all configuration options are well understood, and their roles in varying architectures can change. Here are the settings for a basic Master/Slave(s) replication architecture. Essential [...]]]></description>
			<content:encoded><![CDATA[<p>This post lists the essential and optional settings for a replication environment.</p>
<p>It does not explain how to create replicating slaves. See <a href="http://dev.mysql.com/doc/refman/5.1/en/replication-howto.html">How To Setup Replication</a> for that. However, not all configuration options are well understood, and their roles in varying architectures can change.</p>
<p>Here are the settings for a basic Master/Slave(s) replication architecture.</p>
<h4>Essential</h4>
<ul>
<li><strong>log-bin</strong>: enable binary logs on the master. Replication is based on the master logging all modifying queries (<strong>INSERT</strong>/<strong>CREATE</strong>/<strong>ALTER</strong>/<strong>GRANT</strong> etc.), and the slaves being able to replicate them.</li>
<li><strong>server-id</strong>: each machine must have a <em>unique</em> <strong>server-id</strong>. A slave will not replay queries originating from a server with the same <strong>server-id</strong> as its own.</li>
<li><strong>GRANT</strong>: grant a user with <strong>REPLICATION SLAVE</strong>. The host list must include all replication slave hosts.</li>
<li><strong>expire-logs-days</strong>: automatically clean up master&#8217;s binary logs older than given value. By default, binary logs are never removed.</li>
</ul>
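<p>A minimal sketch of the SQL side of the above, in MySQL 5.1 syntax. The account name, password, host pattern and the 7-day retention are placeholders of mine, not values from any particular setup:</p>
<blockquote>
<pre>-- On the master: an account the slaves replicate with (placeholder credentials)
GRANT REPLICATION SLAVE ON *.*
  TO 'repl_user'@'192.168.0.%' IDENTIFIED BY 'repl_password';

-- On the master: confirm binary logging is active and see the current log file
SHOW MASTER STATUS;

-- expire_logs_days is dynamic; also set it in the configuration file
SET GLOBAL expire_logs_days = 7;
</pre>
</blockquote>
<p>An empty result from <strong>SHOW MASTER STATUS</strong> means <strong>log-bin</strong> is not enabled.</p>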
<p>When working with Master/Slaves replication, one should be prepared for master failure and slave promotion to master. It may be desirable to identify a particular slave as primary candidate for promotion.</p>
<p><span id="more-2357"></span>Just setting up <strong>log-bin</strong> will yield warnings in MySQL&#8217;s error log. The binary logs are named, by default, after the host&#8217;s name. If that should change, MySQL will not be able to find the binary logs anymore (it expects a name which the previous logs did not use). It is therefore recommended to use:</p>
<blockquote>
<pre>log-bin=mymachine-bin</pre>
</blockquote>
<p>or</p>
<blockquote>
<pre>log-bin=mysql-bin</pre>
</blockquote>
<h4>Essential/Optional</h4>
<ul>
<li><strong>log-bin</strong>: enable on a slave, so that in case it is promoted to master, the rest of the slaves can replicate using its binary logs. Enabling binary logging cannot be done on a live server: this parameter requires MySQL restart.</li>
<li><strong>GRANT</strong>: include the master&#8217;s host, so that when a slave promotes to master, the master can become a slave and continue replicating.</li>
<li><strong>log-slave-updates</strong>: together with <strong>log-bin</strong>, enable on slave so that master&#8217;s binary logs are propagated and logged by the slave. This is required if the slave takes the role of a master in a chained replication setup.</li>
<li><strong>expire-logs-days</strong>: set this flag on slave as well [tnx Sheeri].</li>
<li><strong>read-only</strong>: set on slave(s). Refuses any modifying query (INSERT, DELETE, ALTER, DROP etc.) for non-<strong>SUPER</strong> privileged users [tnx Ryan].</li>
<li><strong>sync-binlog</strong>: flush binary log to disk per transaction commit. Use this on master for safer replication; however note that increased I/O is expected [tnx Harrison].</li>
</ul>
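<p>Some of these can be checked or changed on a live server. The following is a sketch only; anything set with <strong>SET GLOBAL</strong> should also go into the configuration file so that it survives a restart:</p>
<blockquote>
<pre>-- On a slave: refuse writes from users without the SUPER privilege
SET GLOBAL read_only = 1;

-- On the master: flush the binary log to disk on every commit
SET GLOBAL sync_binlog = 1;

-- Not dynamic: these can only be verified here, and changed via restart
SHOW GLOBAL VARIABLES LIKE 'log_bin';
SHOW GLOBAL VARIABLES LIKE 'log_slave_updates';
</pre>
</blockquote>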
<h4>Extra</h4>
<ul>
<li><strong>report-host</strong>, <strong>report-port</strong>: the host and port identifying the slave when looking at SHOW SLAVE HOSTS on master. Set this up on all hosts. See <a href="http://code.openark.org/blog/mysql/the-importance-of-report_host-report_port">further discussion here</a>.</li>
<li><strong>max-binlog-size</strong>: the maximum size for a binary log / relay log file, after which it is rotated.</li>
</ul>
<h4>Expert</h4>
<ul>
<li><strong>binlog-do-db</strong>, <strong>binlog-ignore-db</strong>, <strong>replicate-do-db</strong>, <strong>&#8230;</strong>: filter queries by either not writing them to binary log, or not reading them from the logs.</li>
</ul>
<p>The reason I list the above as &#8220;Expert&#8221; is not because one must have a super-brain to set them up. That part is easy enough. But they lead to some dangerous situations, sometimes seemingly harmless. It takes great care to keep the application and the developers from creating those situations. See <a href="http://dev.mysql.com/doc/refman/5.1/en/replication-rules.html">documentation here</a>. See also discussion <a href="http://code.openark.org/blog/mysql/quick-reminder-avoid-using-binlog-do-db">here</a> and <a href="http://www.mysqlperformanceblog.com/2009/05/14/why-mysqls-binlog-do-db-option-is-dangerous/">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/replication-configuration-checklist/feed</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>But I DO want MySQL to say &#8220;ERROR&#8221;!</title>
		<link>http://code.openark.org/blog/mysql/but-i-do-want-mysql-to-say-error</link>
		<comments>http://code.openark.org/blog/mysql/but-i-do-want-mysql-to-say-error#comments</comments>
		<pubDate>Fri, 12 Mar 2010 04:53:28 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[Data Types]]></category>
		<category><![CDATA[sql_mode]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2005</guid>
		<description><![CDATA[MySQL is known for its willingness to accept invalid queries, data values. It can silently commit your transaction, truncate your data. Using GROUP_CONCAT with a small group_concat_max_len setting? Your result will be silently truncated (make sure to check the warnings though). Calling CREATE TEMPORARY TABLE? You get silent commit. Issuing a ROLLBACK on non-transactional involved [...]]]></description>
			<content:encoded><![CDATA[<p>MySQL is known for its willingness to accept invalid queries and data values. It can silently commit your transaction or truncate your data.</p>
<ul>
<li>Using <strong>GROUP_CONCAT</strong> with a small <strong>group_concat_max_len</strong> setting? Your result will be silently truncated (make sure to check the warnings though).</li>
<li>Calling <strong>CREATE <span style="text-decoration: line-through;">TEMPORARY</span> TABLE</strong>? You get <a href="http://www.joinfu.com/2010/03/a-follow-up-on-the-sql-puzzle/">silent commit</a>.</li>
<li>Issuing a <strong>ROLLBACK</strong> on non-transactional involved engines? Have a warning; no error.</li>
<li>Using <strong>LOCK IN SHARE MODE</strong> on non transactional tables? Not a problem. Nothing reported.</li>
<li>Adding a <strong>FOREIGN KEY</strong> on a MyISAM table? Good for you; no action actually taken.</li>
<li>Inserting <strong>300</strong> into a <strong>TINYINT UNSIGNED</strong> column in a relaxed <strong>sql_mode</strong>? Give me <strong>255</strong>, I&#8217;ll silently drop the remaining <strong>45</strong>. I owe you.</li>
</ul>
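<p>The last item is easy to reproduce. A minimal sketch, using a throwaway table name of my choosing in the <strong>test</strong> schema:</p>
<blockquote><pre class="brush: sql;">
CREATE TABLE test.tiny_demo (n TINYINT UNSIGNED);

-- Relaxed mode: the value is clipped to 255, and only a warning is raised
SET SESSION sql_mode = '';
INSERT INTO test.tiny_demo VALUES (300);
SHOW WARNINGS;
SELECT n FROM test.tiny_demo;

-- Strict mode: the same statement is rejected with an out of range error
SET SESSION sql_mode = 'STRICT_ALL_TABLES';
INSERT INTO test.tiny_demo VALUES (300);

DROP TABLE test.tiny_demo;
</pre>
</blockquote>
<p>A strict <strong>sql_mode</strong> covers this particular case, but not the others on the list.</p>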
<h4>Warnings and errors</h4>
<p>It would be nice to:<span id="more-2005"></span></p>
<ul>
<li>Have an <strong>auto_propagate_warning_to_error</strong> server variable (global/session/both) which, well, does what it says.</li>
<li>Have an <strong>i_am_really_not_a_dummy</strong> server variable which implies stricter checks for all the above and prevents you from doing <em>anything</em> that may be problematic (or rolls back your transactions on your invalid actions).</li>
</ul>
<p>Connectors may be nice enough to propagate warnings to errors &#8211; that&#8217;s good. But that is not enough, since data is already committed in MySQL.</p>
<p>If I understand correctly, and maybe it&#8217;s just a myth, it all relates to the times when MySQL had an interest in widespread adoption across the internet, in such a way that it would not interfere too much with its users (hence leading to the common myth that &#8220;MySQL just works out of the box and does not require me to configure or understand anything&#8221;).</p>
<p>MySQL is a database system, and is now widespread, and is used by serious companies and products. It is time to stop playing nice with everyone and provide strict integrity; or, be nice to everyone, but allow me to specify what &#8220;nice&#8221; means for me.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/but-i-do-want-mysql-to-say-error/feed</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>Quick reminder: avoid using binlog-do-db</title>
		<link>http://code.openark.org/blog/mysql/quick-reminder-avoid-using-binlog-do-db</link>
		<comments>http://code.openark.org/blog/mysql/quick-reminder-avoid-using-binlog-do-db#comments</comments>
		<pubDate>Tue, 02 Mar 2010 19:03:50 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[Replication]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2077</guid>
		<description><![CDATA[Nothing new about this warning; but it&#8217;s worth repeating: Using binlog-do-db is dangerous to your replication. It means the master will not write to binary logs any statement not in the given database. Ahem. Not exactly. It will not write to binary logs any statement which did not originate from the given database. Which is [...]]]></description>
			<content:encoded><![CDATA[<p>Nothing new about this warning; but it&#8217;s worth repeating:</p>
<p>Using <a href="http://dev.mysql.com/doc/refman/5.1/en/replication-options-binary-log.html#option_mysqld_binlog-do-db"><strong>binlog-do-db</strong></a> is dangerous to your replication. It means the master will not write to binary logs any statement not in the given database.</p>
<p>Ahem. Not exactly. It will not write to binary logs any statement which did not originate from the given database.</p>
<p>Which is why a customer, who was using <strong>Toad for MySQL</strong> as client interface to MySQL, and by default connected to the <strong>mysql</strong> schema, did not see his queries being replicated. In fact, he later on got replication errors. If you do:</p>
<blockquote>
<pre>USE test;
INSERT INTO world.City VALUES (...)</pre>
</blockquote>
<p>Then the statement is assumed to be in the <strong>test</strong> database, not in the <strong>world</strong> database.</p>
<p>Slightly better is using <strong>replicate-do-db</strong> on the slave machines. At least we allow the master to write everything. But still, for the same reasons, slaves may fail to repeat a perfectly valid query, just because it has been issued in the context of the wrong database. <strong>replicate-ignore-db</strong> is somewhat safer yet, but the trap is still there.</p>
<p>My advice is that replication should replicate <em>everything</em>. Make sure you and everyone else you work with understand the implications of <strong>binlog-do-db</strong> and <strong>replicate-do-db</strong> before implementing them.</p>
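<p>To make the trap concrete, here is a sketch of the two cases side by side. It assumes a master configured with <strong>binlog-do-db=world</strong> and statement-based binary logging; the <strong>UPDATE</strong> itself is just an illustration on the <strong>world</strong> sample database:</p>
<blockquote>
<pre>-- Shows the default database of this connection
SELECT DATABASE();

-- Default database is test: the statement is NOT written to the binary log
USE test;
UPDATE world.City SET Population = Population + 1 WHERE ID = 1;

-- Default database is world: the statement is written to the binary log
USE world;
UPDATE City SET Population = Population + 1 WHERE ID = 1;
</pre>
</blockquote>
<p>Both statements change the same row on the master, yet only the second one reaches the slaves.</p>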
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/quick-reminder-avoid-using-binlog-do-db/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>To not yum or to not apt-get?</title>
		<link>http://code.openark.org/blog/mysql/to-not-yum-or-to-not-apt-get</link>
		<comments>http://code.openark.org/blog/mysql/to-not-yum-or-to-not-apt-get#comments</comments>
		<pubDate>Tue, 16 Feb 2010 11:44:25 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[Installation]]></category>
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1776</guid>
		<description><![CDATA[I&#8217;ve written shortly on this before. I like yum; I love apt-get; I prefer not to use them for MySQL installations. I consider a binary tarball to be the best MySQL installation format (source installations being a different case altogether). Why? I use yum and apt-get whenever I can and for almost all needs (sometimes [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve <a href="http://code.openark.org/blog/mysql/manually-installing-multiple-mysql-instances-on-linux-howto">written</a> briefly on this before. I like <strong>yum</strong>; I love <strong>apt-get</strong>; I prefer <em>not</em> to use them for MySQL installations. I consider a binary tarball to be the best MySQL installation format (source installations being a different case altogether).</p>
<h4>Why?</h4>
<p>I use <strong>yum</strong> and <strong>apt-get</strong> whenever I can and for almost all needs (sometimes preferring CPAN for Perl installations). But on a MySQL machine, I avoid doing so. The reason is either dependency hell or dependency mismatch.</p>
<p>Package managers are supposed to solve the dependency hell issue. But package managers will rarely have an up to date MySQL version.</p>
<p>I&#8217;ve had several experiences where a simple <strong>yum</strong> installation re-installed the MySQL version. I&#8217;ve had customers calling me up when, having installed something with <strong>yum</strong>, MySQL would not work anymore.<span id="more-1776"></span></p>
<p><strong>yum install package-which-depends-on-mysql-server</strong> will install MySQL server on your system if it hasn&#8217;t been installed with <strong>yum</strong>. Are you on CentOS <strong>5.0</strong>? You&#8217;ll get MySQL <strong>5.0.22</strong>. Oh, did you already have an <strong>RPM</strong> installation for MySQL <strong>5.0.81</strong>? Sorry &#8211; it&#8217;s just been <em>downgraded</em>, plus <em>it won&#8217;t work</em> anymore since the error messages file has changed since then.</p>
<p>Don&#8217;t press &#8216;<strong>Y</strong>&#8216; too soon!</p>
<p>Things are slightly better with <strong>apt-get</strong>. I&#8217;ve encountered fewer situations where <strong>mysql-server</strong> was on the dependency list. Many times it&#8217;s just the <strong>libmysqlclient</strong> package or the <strong>mysql-common</strong> one.</p>
<p>But wait! Did you install <strong>mysql-common</strong>? Bonus! You get the elusive <strong>/etc/mysql/my.cnf</strong> file created, and there goes your server configuration. Future spawns of the MySQL server / clients will read from the wrong configuration file, and will probably fail to load.</p>
<p>Not to mention that neither will help you with installing multiple instances.</p>
<h4>My argument</h4>
<p>A sys admin recently argued with me that it was wrong of me to have the entire machine set up with <strong>yum</strong>, but have MySQL installed from a binary tarball. He argued that it broke the entire setup. I expressed my opinion: <em>on a MySQL dedicated server, MySQL gets to be prioritized. It&#8217;s special</em>. It is the reason for the existence of the machine. I would imagine that the same would hold for Apache on an Apache dedicated machine, for Sendmail on a Sendmail dedicated machine, etc. As a DBA, I want to have the best control of the MySQL installation; I want to be able to upgrade minor versions quickly: I often find newer versions to solve bugs I was concerned with; I want to be able to install multiple instances; I want to be able to downgrade without having to remove and uninstall the previous version.</p>
<p>I want to have control. World domination aside, that is.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/to-not-yum-or-to-not-apt-get/feed</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>Announcing mycheckpoint: lightweight, SQL oriented monitoring for MySQL</title>
		<link>http://code.openark.org/blog/mysql/announcing-mycheckpoint-lightweight-sql-oriented-monitoring-for-mysql</link>
		<comments>http://code.openark.org/blog/mysql/announcing-mycheckpoint-lightweight-sql-oriented-monitoring-for-mysql#comments</comments>
		<pubDate>Tue, 10 Nov 2009 13:16:59 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Analysis]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[mycheckpoint]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1550</guid>
		<description><![CDATA[I&#8217;m proud to announce mycheckpoint, a monitoring utility for MySQL, with strong emphasis on user accessibility to monitored data. mycheckpoint is a different kind of monitoring tool. It leaves the power in the user&#8217;s hands. Its power is not with script-based calculations of recorded data. It&#8217;s with the creation of a view hierarchy, which allows [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m proud to announce <a href="http://code.openark.org/forge/mycheckpoint">mycheckpoint</a>, a monitoring utility for MySQL, with strong emphasis on user accessibility to monitored data.</p>
<p><em>mycheckpoint</em> is a different kind of monitoring tool. It leaves the power in the user&#8217;s hands. Its power lies not in script-based calculations of recorded data, but in the creation of a view hierarchy, which allows the user to access computed metrics directly.</p>
<p><em>mycheckpoint</em> is needed first, to deploy a monitoring schema. It <em>may</em> be needed next, so as to INSERT recorded data (GLOBAL STATUS, GLOBAL VARIABLES, MASTER STATUS, SLAVE STATUS) &#8212; but this is just a simple INSERT; anyone can do that, even another monitoring tool.</p>
<p>After that, you no longer need it: everything is at your fingertips. Consider:</p>
<blockquote>
<pre><strong>SELECT</strong> innodb_read_hit_percent, DML <strong>FROM</strong> sv_report_chart_hour;</pre>
</blockquote>
<blockquote><p><img class="alignnone" title="Google chart #1" src="http://chart.apis.google.com/chart?cht=lc&amp;chs=400x200&amp;chts=303030,12&amp;chtt=Nov+5,+13:10++-++Nov+6,+10:25+(0+days,+21+hours)&amp;chdl=innodb_read_hit_percent&amp;chdlp=b&amp;chco=ff8c00&amp;chd=s:xz3m3P34z3svvz33xzsvxvvsz11xz344443x443133x414131444344144444o1K44444444664446664636444444z64x3666466666641q6666666666666666666666366668888616686866zMGq66666vhqW46666zqPx44466zljz444434343444444433434334434K434441413344444414444343434443434666666664464636&amp;chxt=x,y&amp;chxr=1,99.66,100.00&amp;chxl=0:||Nov+5,+17:25|Nov+5,+21:40|Nov+6,+01:55|Nov+6,+06:10|&amp;chxs=0,505050,10" alt="" width="400" height="200" /><img class="alignnone" title="Google Chat #2" src="http://chart.apis.google.com/chart?cht=lc&amp;chs=400x200&amp;chts=303030,12&amp;chtt=Oct+26,+19:00++-++Nov+6,+10:00+(10+days,+15+hours)&amp;chdl=com_select_psec|com_insert_psec|com_delete_psec|com_update_psec|com_replace_psec&amp;chdlp=b&amp;chco=ff8c00,4682b4,9acd32,dc143c,9932cc&amp;chd=s:11455ljjkkjnlmnoo268wyy123445njjjkjkllnoorsuvyvxv4533mikkljklmnoqrstuxy001223mojjkjkkllnnpqrrttuvyxyxkghghhhiihijjklmnnoprrssfdeefefgihjjmnoqstwwzx00khiijijilkmopprsuuxx0012khiijhihkjmmnprtwt0z2242mjjkljlknnnnpqrrwwzy1034jijhjijjkkmoqqsswvyx0z11khihjijjkkk,WXYWW4UUWSSRTUWWWjoncZYXZYZXXzUSSSTUVWYWWYbgbYXWWWWWW7aWTTTSTWXXWVVVWWWYXZZXXznTTVTUUVYWVYWUVWVVWVXWU3WWUVVSSSTSTSSTWTUUVUTUUVSUTTTUUVUVVVVVVWXWVXXVVUSVXUSSTTVWVVWVWZXYYbZXXaVVVVUTUUWXVVYZabXaYXXWWaTXZXUTVVVVVVVYZYYYWWWWVaTSRRSSSTVXZaaYWYYbXXWYXbTTUXXUUVVV,JKLKKtJIHHGILJJJJJKJKJKKKKLKKpIHGJGIIJJKJKJJJJJJKKLKKwNIJHGHJJJJLJLJJIKKKKKKKlbIIHHJIKJJKJKKKKKKLLKKMtKHGGHGGLIKMJJMJJJIJJJIIKJHHHIGIJJJJIIJJJIJJKJIJKIIGKHHKLJJKJJJIJJJIJIJKMJJHIHHMMKJJIJIIIHIIIIIIOJIIHIHIJJJJIJIIKIJLKJLKQIHGHGHIKKKKKJJJLKKKKKOKRJJHHHIIKKK,HIIHHJHHHHHHHHIHHHIIIIIIIIIIHJHHHHHHIHIIHHIIIIHHHHIHHIHHHHHGHIIIHHHHHHIIIIIHHJHHHHHHHHHHHHIHHHIHHHHHHIHHHHHHHHHHGGHHHHGGGGHGGIGGGGGHHHHHGHHHHHHHHHHHHIHHHHHHHHHHHHHHHHHHIHHHHJHHHHHHHHHHHHHHHHHHHIIHH
JHHHHHHHHIHHHHHHHIIHHHHHIHHHHHHHHHHHIIIHHIIIIIIHIHHHHHHHHHH,&amp;chxt=x,y&amp;chxr=1,0,142.31&amp;chxl=0:||Oct+28,+22:00|Oct+31,+01:00|Nov+2,+04:00|Nov+4,+07:00|&amp;chxs=0,505050,10" alt="" width="400" height="200" /></p></blockquote>
<p><em>mycheckpoint</em> provides the views which take raw data (just <strong>innodb_buffer_pool_read_requests</strong>, <strong>com_select</strong>, <strong>innodb_buffer_pool_size</strong>, <strong>table_open_cache</strong>, <strong>seconds_behind_master</strong> etc.) and generate Google Charts URLs, HTML reports, human readable reports, or otherwise easily accessible data.</p>
<p><span id="more-1550"></span>Data is provided in different time resolutions:</p>
<ul>
<li>Per sampling</li>
<li>Per hour aggregated data</li>
<li>Per day aggregated data</li>
</ul>
<p>It is thus easy to get a fine grained or a daily overview of your status. In fact, the <em>SQL-generated</em> <a href="http://code.openark.org/forge/wp-content/uploads/2009/11/report.html">HTML report</a> lays them all out together.</p>
<p><em>[Read more on <a href="http://code.openark.org/forge/mycheckpoint/documentation/generating-google-charts">generating Google Charts</a> and <a href="http://code.openark.org/forge/mycheckpoint/documentation/generating-html-reports">HTML reports</a>]</em></p>
<h4>It is more about data accessibility</h4>
<p>Charts are cool to look at, but they are not useful for detailed analysis. The user is free to ask anything of the supporting views:</p>
<p>I want to see the average number of SELECT queries per second for each of the last 5 hours:</p>
<blockquote>
<pre>mysql&gt; SELECT ts, com_select_psec FROM sv_hour ORDER BY id DESC LIMIT 5;
+---------------------+-----------------+
| ts                  | com_select_psec |
+---------------------+-----------------+
| 2009-11-09 11:00:00 |          294.17 |
| 2009-11-09 10:00:00 |          198.37 |
| 2009-11-09 09:00:00 |          151.29 |
| 2009-11-09 08:00:00 |           90.06 |
| 2009-11-09 07:00:00 |           82.98 |
+---------------------+-----------------+</pre>
</blockquote>
<p>Hmm. Seems like too many SELECTs in the last hour.</p>
<p>Unrelated, is the InnoDB buffer pool being utilized well?</p>
<blockquote>
<pre>mysql&gt; SELECT ts, innodb_buffer_pool_used_percent, innodb_read_hit_percent
       FROM sv_report_sample
       ORDER BY id DESC LIMIT 5;
+---------------------+---------------------------------+-------------------------+
| ts                  | innodb_buffer_pool_used_percent | innodb_read_hit_percent |
+---------------------+---------------------------------+-------------------------+
| 2009-11-09 12:35:01 |                           100.0 |                   99.93 |
| 2009-11-09 12:30:01 |                           100.0 |                   99.89 |
| 2009-11-09 12:25:01 |                           100.0 |                   99.60 |
| 2009-11-09 12:20:01 |                           100.0 |                   99.14 |
| 2009-11-09 12:15:01 |                           100.0 |                   98.99 |
+---------------------+---------------------------------+-------------------------+</pre>
</blockquote>
<p>Apparently, <strong>innodb_buffer_pool_size</strong> could use some more memory.</p>
<p>When did we have an excessive amount of writes?</p>
<blockquote>
<pre>mysql&gt; SELECT ts, com_insert_psec
       FROM sv_hour
       WHERE com_insert_psec &gt; (SELECT 2*AVG(com_insert_psec) FROM sv_hour);
+---------------------+-----------------+
| ts                  | com_insert_psec |
+---------------------+-----------------+
| 2009-10-27 00:00:00 |          133.66 |
| 2009-10-28 00:00:00 |          121.79 |
| 2009-10-29 00:00:00 |          138.88 |
| 2009-10-30 00:00:00 |          120.79 |
| 2009-10-31 00:00:00 |          131.78 |
+---------------------+-----------------+</pre>
</blockquote>
<p>Something is going on on those midnights!</p>
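<p>To look closer at one of those midnights, the same view can be narrowed to a time window. A minimal sketch, using only the <strong>sv_hour</strong> columns shown above; the dates are simply picked from the result above:</p>
<blockquote>
<pre>-- Hourly INSERT and SELECT rates around the 2009-10-27 midnight spike
SELECT ts, com_insert_psec, com_select_psec
       FROM sv_hour
       WHERE ts BETWEEN '2009-10-26 21:00:00' AND '2009-10-27 03:00:00'
       ORDER BY ts;</pre>
</blockquote>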
<p><em>[Read more on <a href="http://code.openark.org/forge/mycheckpoint/documentation/querying-for-data">querying for data</a>]</em></p>
<h4>Human reports</h4>
<p>But while we&#8217;re at it: it&#8217;s nice to give the user the ability to ask around, but why not provide some niceties as well? Special views aggregate monitored data to present human readable reports:</p>
<blockquote>
<pre>SELECT report FROM sv_report_human_hour ORDER BY id DESC LIMIT 1,1 \G</pre>
</blockquote>
<blockquote>
<pre>Report period: 2009-11-08 14:00:00 to 2009-11-08 15:00:00. Period is 60 minutes (1.00 hours)
Uptime: 100.0% (Up: 285 days, 07:17:28 hours)

InnoDB:
    innodb_buffer_pool_size: 4718592000 bytes (4500.0MB). Used: 100.0%
    Read hit: 99.75%
    Disk I/O: 83.00 reads/sec  20.33 flushes/sec
    Estimated log written per hour: 797.0MB
    Locks: 0.32/sec  current: 0

MyISAM key cache:
    key_buffer_size: 33554432 bytes (32.0MB). Used: 18.3%
    Read hit: 99.7%  Write hit: 100.0%

DML:
    SELECT:  149.88/sec  34.1%
    INSERT:  55.84/sec  12.7%
    UPDATE:  17.55/sec  4.0%
    DELETE:  20.68/sec  4.7%
    REPLACE: 0.00/sec  0.0%
    SET:     170.05/sec  38.7%
    COMMIT:  0.02/sec  0.0%
    slow:    2.28/sec  0.5% (slow time: 2sec)

Selects:
    Full scan: 8.37/sec  5.6%
    Full join: 0.00/sec  0.0%
    Range:     40.45/sec  27.0%
    Sort merge passes: 0.00/sec

Locks:
    Table locks waited:  0.00/sec  0.0%

Tables:
    Table cache: 2048. Used: 26.5%
    Opened tables: 0.00/sec

Temp tables:
    Max tmp table size:  67108864 bytes (64.0MB)
    Max heap table size: 67108864 bytes (64.0MB)
    Created:             7.15/sec
    Created disk tables: 0.51/sec  7.1%

Connections:
    Max connections: 200. Max used: 245  122.5%
    Connections: 3.31/sec
    Aborted:     0.07/sec  2.1%

Threads:
    Thread cache: 32. Used: 50.0%
    Created: 0.06/sec

Replication:
    Master status file number: 1494, position: 404951764
    Relay log space limit: 10737418240, used: N/A  (N/A%)
    Seconds behind master: N/A
    Estimated time for slave to catch up: N/A seconds (N/A days, N/A hours)  ETA: N/A</pre>
</blockquote>
<p>The above is a <em>SQL-generated</em> report. The view&#8217;s CREATE statement is <em>ugly</em>, trust me! But the user need not be aware of this: it is all generated behind the scenes. Since it is SQL-generated, the report is not actually stored anywhere, and one can generate reports for as long as the data exists. Three-month-old data can still be evaluated and used to produce a fresh report.</p>
<p>The above report resembles the ever-so-useful <a href="http://hackmysql.com/mysqlreport">mysqlreport</a> by <a href="http://hackmysql.com/"><strong>Daniel Nichter</strong></a>. I have drawn many ideas from this tool.</p>
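<p>Since reports are computed on demand, an older period can be requested by changing the offset. A sketch, assuming the view holds one row per hour with no gaps, so that an offset of 24 lands roughly a day back:</p>
<blockquote>
<pre>-- LIMIT offset,count: skip the 24 most recent rows, return the next one
SELECT report FROM sv_report_human_hour ORDER BY id DESC LIMIT 24,1 \G</pre>
</blockquote>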
<p><em>[Read more on <a href="http://code.openark.org/forge/mycheckpoint/documentation/generating-human-reports">generating human readable reports</a>]</em></p>
<h4>Tracking change of parameters</h4>
<p>Since <em>mycheckpoint</em> records server variables, it&#8217;s easy enough to detect a change in a variable. Did you dynamically change a variable and forget to update <strong>my.cnf</strong>? Were you baffled when the server restarted and everything started behaving differently? Just ask away:</p>
<blockquote>
<pre>mysql&gt; SELECT * FROM sv_param_change;
+---------------------+-----------------+-----------+-----------+
| ts                  | variable_name   | old_value | new_value |
+---------------------+-----------------+-----------+-----------+
| 2009-11-04 13:00:01 | max_connections |       500 |       200 |
+---------------------+-----------------+-----------+-----------+</pre>
</blockquote>
<p>Doh! That&#8217;s how we got <strong>122.5%</strong> max used connections!</p>
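<p>The same view can be filtered by variable, and the running value put back. A sketch using only the columns shown above; the value 500 is taken from the <strong>old_value</strong> column, and note that <strong>SET GLOBAL</strong> requires the SUPER privilege and does not survive a restart unless <strong>my.cnf</strong> is updated as well:</p>
<blockquote>
<pre>-- History of changes for a single variable
SELECT ts, old_value, new_value
       FROM sv_param_change
       WHERE variable_name = 'max_connections'
       ORDER BY ts;

-- Restore the previous value on the running server
SET GLOBAL max_connections = 500;</pre>
</blockquote>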
<p><em>[Read more on <a href="http://code.openark.org/forge/mycheckpoint/documentation/detecting-parameters-change">detecting parameters change</a>]</em></p>
<h4>Additional notes</h4>
<p>Just recently, a somewhat similar project, <a href="http://www.pythian.com/news/4703/sar-sql-the-script-formerly-known-as-mysar">sar-sql</a> was announced by <a href="http://mmatemate.blogspot.com/"><strong>Gerry Narvaja</strong></a> (Ex-<a href="http://www.pythian.com/">Pythian</a>). When sar-sql (formerly MySAR) was announced, my own code and ideas were at a late stage. I pondered this, and decided to go on with a separate project. While both make use of the same ideas, the implementations are quite different.</p>
<p>With proper setup, <em>mycheckpoint</em> can be used as an add-on to other monitoring tools. I currently have no plans for doing that, but time will tell.</p>
<p>I believe the ease of access to monitored data is a compelling reason to try out <em>mycheckpoint</em>. Please visit the <a href="http://code.openark.org/forge/mycheckpoint">mycheckpoint home page</a>, read through the <a href="http://code.openark.org/forge/mycheckpoint/documentation">documentation</a>, and take some <a href="http://code.openark.org/forge/mycheckpoint/download">downloads</a> with you!</p>
<p>As always, community feedback is welcome. Feel free to throw in valuable feedback, <a href="http://code.google.com/p/mycheckpoint/issues/list">bug reports</a> or even a couple of tomatoes!</p>
<p><em>mycheckpoint</em> is released under the <a href="http://www.opensource.org/licenses/bsd-license.php">BSD license</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/announcing-mycheckpoint-lightweight-sql-oriented-monitoring-for-mysql/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Character sets: latin1 vs. ascii</title>
		<link>http://code.openark.org/blog/mysql/character-sets-latin1-vs-ascii</link>
		<comments>http://code.openark.org/blog/mysql/character-sets-latin1-vs-ascii#comments</comments>
		<pubDate>Wed, 08 Jul 2009 07:39:02 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[Data Types]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=828</guid>
		<description><![CDATA[Unless specified otherwise, latin1 is the default character set in MySQL. What I usually find in schemas are columns which are either utf8 or latin1. The utf8 columns are those which need to contain multilingual characters (user names, addresses, articles etc.), and the latin1 columns are all the rest (passwords, digests, email addresses, hard-coded values etc.) [...]]]></description>
			<content:encoded><![CDATA[<p>Unless specified otherwise, <strong>latin1</strong> is the default character set in MySQL.</p>
<p>What I usually find in schemas are columns which are either <strong>utf8</strong> or <strong>latin1</strong>. The <strong>utf8</strong> columns are those which need to contain multilingual characters (user names, addresses, articles etc.), and the <strong>latin1</strong> columns are all the rest (passwords, digests, email addresses, hard-coded values etc.)</p>
<p>I find <strong>latin1</strong> to be improper for such purposes and suggest that <strong>ascii</strong> be used instead. The reason is that <strong>latin1</strong> implies European text (with Swedish collation). It is unclear to an outsider, when finding a <strong>latin1</strong> column, whether it should actually contain West European characters, or whether it is just being used for ASCII text, utilizing the fact that a character in <strong>latin1</strong> only requires 1 byte of storage.<span id="more-828"></span></p>
<p>Well, this is what the <strong>ascii</strong> character set is for. When I see an <strong>ascii</strong> column, I know for sure no West European characters are allowed; just the plain old a-zA-Z0-9 etc. It is clearer from the schema&#8217;s definition what the stored values should be.</p>
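<p>As an illustration, here is a sketch of how such a table might be declared. The table and column names are made up for the example; only the <strong>CHARACTER SET</strong> clauses matter:</p>
<blockquote>
<pre>-- Hypothetical table: utf8 where multilingual text is expected, ascii elsewhere
CREATE TABLE user_account (
  id              INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  display_name    VARCHAR(64)  CHARACTER SET utf8  NOT NULL,
  email           VARCHAR(128) CHARACTER SET ascii NOT NULL,
  password_digest CHAR(40)     CHARACTER SET ascii NOT NULL
) ENGINE=InnoDB;

-- Converting an existing latin1 column; the full column definition is restated
ALTER TABLE user_account
  MODIFY email VARCHAR(128) CHARACTER SET ascii NOT NULL;</pre>
</blockquote>
<p>Anyone reading the <strong>SHOW CREATE TABLE</strong> output can now tell which columns are meant to hold plain ASCII only.</p>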
<h4>A note to MySQL</h4>
<p>It has been a long time since the Swedish roots of the company dictated defaults. New instances should default to either <strong>ascii</strong> or <strong>utf8</strong> (the latter being the most common and space-efficient Unicode encoding): character sets that are locale-neutral. Really, how many people realize that when they <strong>ORDER BY</strong> a text column, rows are sorted according to Swedish dictionary ordering?</p>
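<p>To see what a given server actually defaults to, something along these lines should do (a quick sketch; the output depends on the version and on <strong>my.cnf</strong>):</p>
<blockquote>
<pre>-- Server-wide defaults applied to new schemas
SHOW VARIABLES LIKE 'character_set_server';
SHOW VARIABLES LIKE 'collation_server';

-- The default collation of the latin1 character set
SHOW COLLATION WHERE Charset = 'latin1' AND `Default` = 'Yes';</pre>
</blockquote>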
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/character-sets-latin1-vs-ascii/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Reasons to use innodb_file_per_table</title>
		<link>http://code.openark.org/blog/mysql/reasons-to-use-innodb_file_per_table</link>
		<comments>http://code.openark.org/blog/mysql/reasons-to-use-innodb_file_per_table#comments</comments>
		<pubDate>Thu, 21 May 2009 03:40:42 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[mysqldump]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=614</guid>
		<description><![CDATA[When working with InnoDB, you have two ways for managing the tablespace storage: Throw everything in one big file (optionally split). Have one file per table. I will discuss the advantages and disadvantages of the two options, and will try to convince you that innodb_file_per_table is preferable. A single tablespace Having everything in one big file [...]]]></description>
			<content:encoded><![CDATA[<p>When working with InnoDB, you have two ways for managing the tablespace storage:</p>
<ol>
<li>Throw everything in one big file (optionally split).</li>
<li>Have one file per table.</li>
</ol>
<p>I will discuss the advantages and disadvantages of the two options, and will try to convince you that <strong>innodb_file_per_table</strong> is preferable.</p>
<h4>A single tablespace</h4>
<p>Having everything in one big file means all tables and indexes, from <em>all schemas</em>, are &#8216;mixed&#8217; together in that file.</p>
<p>This allows for the following nice property: free space can be shared between different tables and different schemas. Thus, if I purge many rows from my <strong>log</strong> table, the now unused space can be occupied by new rows of any other table.</p>
<p>This same nice property also translates to a not so nice one: data can be greatly fragmented across the tablespace.</p>
<p>An annoying property of InnoDB&#8217;s tablespaces is that they never shrink. So after purging those rows from the <strong>log</strong> table, the tablespace file (usually <strong>ibdata1</strong>) still keeps the same storage. It does not release storage to the file system.</p>
<p>I&#8217;ve seen more than once how certain tables are left unwatched, growing until disk space reaches 90% and SMS notifications start beeping all around.<span id="more-614"></span></p>
<p>There&#8217;s little to do in this case. Well, one can always purge the rows. Sure, the space would be reused by InnoDB. But having a file which consumes some 80-90% of disk space is a performance catastrophe. It means the disk head needs to move large distances, and overall disk performance drops considerably.</p>
<p>The best way to solve this is to set up a new slave (after purging the rows), and dump the data into that slave.</p>
<h4>InnoDB Hot Backup</h4>
<p>The funny thing is, the <strong>ibbackup</strong> utility will copy the tablespace file as it is. If it was 120GB, of which only 30GB are used, you still get 120GB backed up and restored.</p>
<h4>mysqldump, mk-parallel-dump</h4>
<p>mysqldump would be your best choice if you only had the original machine to work with. Assuming you&#8217;re only using InnoDB, a dump with <strong>--single-transaction</strong> will do the job. Or you can utilize <a title="Maatkit: mk-parallel-dump" href="http://www.maatkit.org/">mk-parallel-dump</a> to speed things up (depending on your dump method and accessibility needs, mind the locking).</p>
<h4>innodb_file_per_table</h4>
<p>With this parameter set, a <strong>.ibd</strong> file is created per table. What we get is this:</p>
<ul>
<li>A tablespace is not shared among different tables, and certainly not among different schemas.</li>
<li>Each file is considered a tablespace of its own.</li>
<li>Again, a tablespace never shrinks.</li>
<li>It is possible to regain space per tablespace.</li>
</ul>
<p>Wait. The last two seem conflicting, don&#8217;t they? Let&#8217;s explain.</p>
<p>In our <strong>log</strong> table example, we purge many rows (up to 90GB of data is removed). The <strong>.ibd</strong> file does not shrink. But we <em>can</em> do:</p>
<blockquote>
<pre>ALTER TABLE log ENGINE=InnoDB;</pre>
</blockquote>
<p>What will happen is that a new, temporary file is created, into which the table is rebuilt. Only existing data is added to the new table. Once complete, the original table is removed, and the new table is renamed as the original table.</p>
<p>Sure, this takes a long time, during which writes to the table are blocked (according to the MySQL manual, the original table generally remains readable until the final swap). But still, it allows us to regain disk space.</p>
<p>With the new InnoDB plugin, disk space is also regained when executing a <strong>TRUNCATE TABLE log</strong> statement.</p>
<p>Fragmentation is not as bad as in a single tablespace: the data is limited within the boundaries of a smaller file.</p>
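<p>Whether the option is active can be checked from SQL. Note that, as far as I know, turning it on only affects tables created or rebuilt afterwards; tables that already live in <strong>ibdata1</strong> stay there until rebuilt. A minimal sketch, reusing the <strong>log</strong> table from the example above:</p>
<blockquote>
<pre>-- Is per-table storage enabled on this server?
SHOW VARIABLES LIKE 'innodb_file_per_table';

-- Rebuild an existing table so that it moves into its own .ibd file
ALTER TABLE log ENGINE=InnoDB;</pre>
</blockquote>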
<h4>Monitoring</h4>
<p>One other nice thing about <strong>innodb_file_per_table</strong> is that it is possible to monitor table size on the file system level. You don&#8217;t need access to MySQL, to use SHOW TABLE STATUS or to query the INFORMATION_SCHEMA. You can just look up the top 10 largest files under your MySQL data directory (and subdirectories), and monitor their size. You can see which table grows fastest.</p>
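<p>For comparison, the SQL route looks something like the following sketch. It relies only on the standard <strong>INFORMATION_SCHEMA.TABLES</strong> columns; keep in mind that the sizes InnoDB reports there are approximations, whereas the file system shows what is really on disk:</p>
<blockquote>
<pre>-- Ten largest InnoDB tables, by estimated data + index size
SELECT table_schema, table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
       FROM INFORMATION_SCHEMA.TABLES
       WHERE engine = 'InnoDB'
       ORDER BY data_length + index_length DESC
       LIMIT 10;</pre>
</blockquote>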
<h4>Backup</h4>
<p>Last, it is not yet possible to back up single InnoDB tables by copying the <strong>.ibd</strong> files. But hopefully work will be done in this direction.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/reasons-to-use-innodb_file_per_table/feed</wfw:commentRss>
		<slash:comments>26</slash:comments>
		</item>
	</channel>
</rss>
