<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>code.openark.org &#187; Performance</title>
	<atom:link href="http://code.openark.org/blog/tag/performance/feed" rel="self" type="application/rss+xml" />
	<link>http://code.openark.org/blog</link>
	<description>Blog by Shlomi Noach</description>
	<lastBuildDate>Thu, 09 Sep 2010 16:15:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Views: better performance with condition pushdown</title>
		<link>http://code.openark.org/blog/mysql/views-better-performance-with-condition-pushdown</link>
		<comments>http://code.openark.org/blog/mysql/views-better-performance-with-condition-pushdown#comments</comments>
		<pubDate>Thu, 20 May 2010 05:17:05 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Execution plan]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Stored routines]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1328</guid>
		<description><![CDATA[Justin&#8217;s A workaround for the performance problems of TEMPTABLE views post on mysqlperformanceblog.com reminded me of a solution I once saw on a customer&#8217;s site. The customer was using nested views structure, up to depth of some 8-9 views. There were a lot of aggregations along the way, and even the simplest query resulted with [...]]]></description>
			<content:encoded><![CDATA[<p>Justin&#8217;s <a href="http://www.mysqlperformanceblog.com/2010/05/19/a-workaround-for-the-performance-problems-of-temptable-views/">A workaround for the performance problems of TEMPTABLE views</a> post on <a href="http://www.mysqlperformanceblog.com/">mysqlperformanceblog.com</a> reminded me of a solution I once saw on a customer&#8217;s site.</p>
<p>The customer was using a nested view structure, up to a depth of some 8-9 views. There were a lot of aggregations along the way, and even the simplest query resulted in a LOT of subqueries, temporary tables, and vast amounts of data, even if only to return a couple of rows.</p>
<p>While we worked to solve this, a developer showed me his own trick. His trick is now impossible to implement, but there&#8217;s a hack around this.</p>
<p>Let&#8217;s use the world database to illustrate. Look at the following view definition:<span id="more-1328"></span></p>
<blockquote><pre class="brush: sql;">
CREATE
  ALGORITHM=TEMPTABLE
VIEW country_languages AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON (Country.CODE = CountryLanguage.CountryCode)
  GROUP BY
    Country.CODE;
</pre>
</blockquote>
<p>The view presents a list of spoken languages per country. The execution plan for querying this view looks like this:</p>
<blockquote>
<pre>mysql&gt; EXPLAIN SELECT * FROM country_languages;
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref                               | rows | Extra                                        |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
|  1 | PRIMARY     | &lt;derived2&gt;      | ALL    | NULL          | NULL    | NULL    | NULL                              |  233 |                                              |
|  2 | DERIVED     | CountryLanguage | index  | PRIMARY       | PRIMARY | 33      | NULL                              |  984 | Using index; Using temporary; Using filesort |
|  2 | DERIVED     | Country         | eq_ref | PRIMARY       | PRIMARY | 3       | world.CountryLanguage.CountryCode |    1 |                                              |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
</pre>
</blockquote>
<p>And, even if we only want the row for a single country, we still get the same plan:</p>
<blockquote>
<pre>mysql&gt; EXPLAIN SELECT * FROM country_languages WHERE Code='USA';
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref                               | rows | Extra                                        |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
|  1 | PRIMARY     | &lt;derived2&gt;      | ALL    | NULL          | NULL    | NULL    | NULL                              |  233 | Using where                                  |
|  2 | DERIVED     | CountryLanguage | index  | PRIMARY       | PRIMARY | 33      | NULL                              |  984 | Using index; Using temporary; Using filesort |
|  2 | DERIVED     | Country         | eq_ref | PRIMARY       | PRIMARY | 3       | world.CountryLanguage.CountryCode |    1 |                                              |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
</pre>
</blockquote>
<p>So, we need to scan the entire CountryLanguage and Country tables in order to return results for just one country.</p>
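<p>For comparison, here is a sketch of the query we would like MySQL to execute: the same <strong>SELECT</strong> as in the view definition, with the condition applied by hand before the aggregation takes place. It should only need to touch the rows for the one country (verify with <strong>EXPLAIN</strong> on your own server):</p>
<blockquote><pre class="brush: sql;">
-- The view's SELECT, with the filter pushed down manually
SELECT
  Country.CODE, Country.Name AS country,
  GROUP_CONCAT(CountryLanguage.Language) AS languages
FROM
  world.Country
  JOIN world.CountryLanguage ON (Country.CODE = CountryLanguage.CountryCode)
WHERE
  Country.CODE = 'USA'
GROUP BY
  Country.CODE;
</pre>
</blockquote>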
<h4>A non-working solution</h4>
<p>The solution offered by the developer was this:</p>
<blockquote><pre class="brush: sql;">
CREATE
  ALGORITHM=MERGE
  VIEW country_languages_non_working AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON
      (Country.CODE = CountryLanguage.CountryCode)
  WHERE
    Country.CODE = @country_code
  GROUP BY Country.CODE;
</pre>
</blockquote>
<p>And follow with:</p>
<blockquote>
<pre>mysql&gt; SET @country_code='USA';
Query OK, 0 rows affected (0.00 sec)

mysql&gt; SELECT * FROM country_languages_non_working;
+------+---------------+----------------------------------------------------------------------------------------------------+
| CODE | country       | languages                                                                                          |
+------+---------------+----------------------------------------------------------------------------------------------------+
| USA  | United States | Chinese,English,French,German,Italian,Japanese,Korean,Polish,Portuguese,Spanish,Tagalog,Vietnamese |
+------+---------------+----------------------------------------------------------------------------------------------------+
</pre>
</blockquote>
<p>So, push down a <strong>WHERE</strong> condition into the view&#8217;s definition. The session variable @country_code is used to filter rows. In the above simplified code the value is assumed to be set; tweak it as you see fit (using <strong>IFNULL</strong>, for example, or <strong>OR</strong> conditions) to allow for a full scan in case the variable is undefined.</p>
<p>This doesn&#8217;t work. It used to work a couple of years back, but today you cannot create a view which uses session variables or parameters. It is a restriction imposed by views.</p>
<h4>A workaround</h4>
<p>Justin showed a workaround using an additional table. There is another workaround which does not involve tables, but rather stored routines. Now, this is a patch, and an ugly one. It may not work in future versions of MySQL for all I know. But, here it goes:</p>
<blockquote><pre class="brush: sql;">
DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `get_session_country`() RETURNS CHAR(3)
    NO SQL
    DETERMINISTIC
BEGIN
  RETURN @country_code;
END $$
DELIMITER ;

CREATE
  ALGORITHM=MERGE
  VIEW country_languages_2 AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON
      (Country.CODE = CountryLanguage.CountryCode)
  WHERE
    Country.CODE = get_session_country()
  GROUP BY Country.CODE;
</pre>
</blockquote>
<p>And now:</p>
<blockquote>
<pre>mysql&gt; SET @country_code='USA';
Query OK, 0 rows affected (0.00 sec)

mysql&gt; SELECT * FROM country_languages_2;
+------+---------------+----------------------------------------------------------------------------------------------------+
| CODE | country       | languages                                                                                          |
+------+---------------+----------------------------------------------------------------------------------------------------+
| USA  | United States | Chinese,English,French,German,Italian,Japanese,Korean,Polish,Portuguese,Spanish,Tagalog,Vietnamese |
+------+---------------+----------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql&gt; EXPLAIN SELECT * FROM country_languages_2;
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
|  1 | PRIMARY     | &lt;derived2&gt;      | system | NULL          | NULL    | NULL    | NULL |    1 |                          |
|  2 | DERIVED     | Country         | const  | PRIMARY       | PRIMARY | 3       |      |    1 |                          |
|  2 | DERIVED     | CountryLanguage | ref    | PRIMARY       | PRIMARY | 3       |      |    8 | Using where; Using index |
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
</pre>
</blockquote>
<p>Since views are allowed to call stored routines (Justin used this to call upon <strong>CONNECTION_ID()</strong>), and since stored routines can use session variables, we can take advantage of this and force the view to filter out irrelevant rows before they accumulate into temporary tables and big joins.</p>
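<p>The same function can also be used to tolerate an undefined variable, falling back to a full scan. The following is only a sketch: it assumes the <strong>get_session_country()</strong> function defined above, and the view name <strong>country_languages_3</strong> is made up for illustration. Whether the optimizer still picks the efficient plan when the variable is set is something to check with <strong>EXPLAIN</strong>:</p>
<blockquote><pre class="brush: sql;">
-- Hypothetical variant: no filtering when @country_code is NULL (undefined)
CREATE
  ALGORITHM=MERGE
  VIEW country_languages_3 AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON
      (Country.CODE = CountryLanguage.CountryCode)
  WHERE
    get_session_country() IS NULL
    OR Country.CODE = get_session_country()
  GROUP BY Country.CODE;

-- All countries:
SET @country_code := NULL;
SELECT * FROM country_languages_3;

-- A single country:
SET @country_code := 'USA';
SELECT * FROM country_languages_3;
</pre>
</blockquote>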
<p>Back in the customer&#8217;s office, with their real data and multiple views, we witnessed a reduction of query times from ~30 minutes to a few seconds.</p>
<h4>Another kind of use</h4>
<p>Eventually we worked to make better view definitions and query splitting, resulting in clearer code and fast queries, but this solution plays nicely into another kind of problem:</p>
<p>Can we force different customers to see different parts of a given table? e.g., only those rows that relate to each customer?</p>
<p>There can be many solutions: different tables, multiple views (one per customer), stored procedures, what have you. The above provides one such solution, and I&#8217;ve seen it in use.</p>
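<p>A minimal sketch of how this might look. Everything here is hypothetical: the <strong>orders</strong> table, its columns, the <strong>get_session_customer()</strong> function and the <strong>customer_orders</strong> view are made-up names used only to illustrate the idea:</p>
<blockquote><pre class="brush: sql;">
DELIMITER $$
-- Hypothetical function exposing a session variable to the view
CREATE FUNCTION get_session_customer() RETURNS INT
    NO SQL
    DETERMINISTIC
BEGIN
  RETURN @customer_id;
END $$
DELIMITER ;

-- Hypothetical view: each session only sees rows for its own customer
CREATE VIEW customer_orders AS
  SELECT order_id, customer_id, order_date, total
  FROM orders
  WHERE customer_id = get_session_customer();

-- The application sets the variable once per connection:
SET @customer_id := 17;
SELECT * FROM customer_orders;
</pre>
</blockquote>
<p>Note that any session can set the variable to any value, so this only separates customers if the variable is set by trusted application code and users cannot issue arbitrary statements.</p>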
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/views-better-performance-with-condition-pushdown/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Reducing locks by narrowing primary key</title>
		<link>http://code.openark.org/blog/mysql/reducing-locks-by-narrowing-primary-key</link>
		<comments>http://code.openark.org/blog/mysql/reducing-locks-by-narrowing-primary-key#comments</comments>
		<pubDate>Tue, 04 May 2010 06:46:01 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1269</guid>
		<description><![CDATA[In a period of two weeks, I had two cases with the exact same symptoms. Database users were experiencing low responsiveness. DBAs were seeing locks occurring on seemingly normal tables. In particular, looking at Innotop, it seemed that INSERTs were causing the locks. In both cases, tables were InnoDB. In both cases, there was a [...]]]></description>
			<content:encoded><![CDATA[<p>In a period of two weeks, I had two cases with the exact same symptoms.</p>
<p>Database users were experiencing low responsiveness. DBAs were seeing locks occurring on seemingly normal tables. In particular, looking at Innotop, it seemed that <strong>INSERT</strong>s were causing the locks.</p>
<p>In both cases, tables were InnoDB. In both cases, there was a <strong>PRIMARY KEY</strong> on the combination of all <strong>5</strong> columns. And in both cases, there was no clear explanation as to why the <strong>PRIMARY KEY</strong> was chosen as such.</p>
<h4>Choosing a proper PRIMARY KEY</h4>
<p>Especially with InnoDB, which uses a clustered index structure, the <strong>PRIMARY KEY</strong> is of particular importance. Besides the fact that a bloated <strong>PRIMARY KEY</strong> bloats the entire clustered index and secondary keys (see: <a href="http://code.openark.org/blog/mysql/the-depth-of-an-index-primer">The depth of an index: primer</a>), it is also a source of locks. It&#8217;s true that any <strong>UNIQUE KEY</strong> can serve as a <strong>PRIMARY KEY</strong>. But not all such keys are good candidates.<span id="more-1269"></span></p>
<h4>Reducing the locks</h4>
<p>In both described cases, the solution was to add an <strong>AUTO_INCREMENT</strong> column to serve as the <strong>PRIMARY KEY</strong>, and have that <strong>5</strong> column combination under a secondary <strong>UNIQUE KEY</strong>. The impact was immediate: no further locks on that table were detected, and query responsiveness improved markedly.</p>
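<p>As a rough sketch, the change could look like the following. The table name <strong>event_log</strong> and the column names <strong>col1</strong>..<strong>col5</strong> are hypothetical, standing in for the real 5 column tables; on a large table this rebuild takes time and is best tested first:</p>
<blockquote><pre class="brush: sql;">
-- Hypothetical table: replace the wide 5-column PRIMARY KEY with a narrow one
ALTER TABLE event_log
  DROP PRIMARY KEY,
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT FIRST,
  ADD PRIMARY KEY (id),
  -- uniqueness of the original combination is still enforced
  ADD UNIQUE KEY uniq_five_columns (col1, col2, col3, col4, col5);
</pre>
</blockquote>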
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/reducing-locks-by-narrowing-primary-key/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Misimproving performance problems with INSERT DELAYED</title>
		<link>http://code.openark.org/blog/mysql/misimproving-performance-problems-with-insert-delayed</link>
		<comments>http://code.openark.org/blog/mysql/misimproving-performance-problems-with-insert-delayed#comments</comments>
		<pubDate>Thu, 14 Jan 2010 18:58:36 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1745</guid>
		<description><![CDATA[INSERT DELAYED may come in handy when using MyISAM tables. It may in particular be useful for log tables, where one is required to issue frequent INSERTs on one hand, but does not usually want or need to wait for DB response on the other hand. It may even offer some performance boost, by aggregating [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://dev.mysql.com/doc/refman/5.1/en/insert-delayed.html">INSERT DELAYED</a> may come in handy when using MyISAM tables. It may in particular be useful for log tables, where one is required to issue frequent INSERTs on one hand, but does not usually want or need to wait for DB response on the other hand.</p>
<p>It may even offer some performance boost, by aggregating such frequent INSERTs in a single thread.</p>
<p>But it is <strong>NOT</strong> a performance solution.</p>
<p>That is, in a case I&#8217;ve seen, database performance was poor. INSERTs were taking a very long time. Lots of locks were involved. The solution offered was to change all slow INSERTs to INSERT DELAYED. Voila! All INSERT queries now completed in no time.</p>
<p>But the database performance remained poor. Just as poor as before, with the additional headache: nobody knew what caused the low performance.</p>
<p>Using INSERT DELAYED to improve overall INSERT performance is like sweeping the dust under the carpet. It&#8217;s still there, only you can&#8217;t actually see it. When your queries are slow to return, you know which queries or which parts of your application are the immediate suspects. When everything happens in the background you lose that feeling.</p>
<p>The slow query log, fortunately, still provides the necessary information, and all the other metrics are just as before. Good. But it now takes a deeper level of analysis to find a problem that was previously in plain sight.</p>
<p>So: use INSERT DELAYED carefully; don&#8217;t just throw it at your slow queries like a magic potion.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/misimproving-performance-problems-with-insert-delayed/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>On restoring a single table from mysqldump</title>
		<link>http://code.openark.org/blog/mysql/on-restoring-a-single-table-from-mysqldump</link>
		<comments>http://code.openark.org/blog/mysql/on-restoring-a-single-table-from-mysqldump#comments</comments>
		<pubDate>Tue, 01 Dec 2009 08:25:00 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Backup]]></category>
		<category><![CDATA[Books]]></category>
		<category><![CDATA[mysqldump]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[scripts]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1630</guid>
		<description><![CDATA[Following Restore one table from an ALL database dump and Restore a Single Table From mysqldump, I would like to add my own thoughts and comments on the subject. I also wish to note performance issues with the two suggested solutions, and offer improvements. Problem relevance While the problem is interesting, I just want to [...]]]></description>
			<content:encoded><![CDATA[<p>Following <a href="http://everythingmysql.ning.com/profiles/blogs/restore-one-table-from-an-all">Restore one table from an ALL database dump</a> and <a href="http://gtowey.blogspot.com/2009/11/restore-single-table-from-mysqldump.html">Restore a Single Table From mysqldump</a>, I would like to add my own thoughts and comments on the subject.</p>
<p>I also wish to note performance issues with the two suggested solutions, and offer improvements.</p>
<h4>Problem relevance</h4>
<p>While the problem is interesting, I just want to note that it is relevant only for a specific range of database sizes. Too small &#8211; and it doesn&#8217;t matter how you solve it (e.g. just open vi/emacs and copy+paste). Too big &#8211; and it would not be worthwhile to restore from <em>mysqldump</em> anyway. I would suggest that the problem is interesting in the neighborhood of a few dozen GB worth of data.</p>
<h4>Problem recap</h4>
<p>Given a dump file (generated by mysqldump), how do you restore a single table, without making any changes to other tables?</p>
<p>Let&#8217;s review the two referenced solutions. I&#8217;ll be using the <a href="http://dev.mysql.com/doc/employee/en/employee.html">employees db</a> on <a href="https://launchpad.net/mysql-sandbox">mysql-sandbox</a> for testing. I&#8217;ll choose a very small table to restore: <strong>departments</strong> (only a few rows in this table).</p>
<h4>Security based solution</h4>
<p><a href="http://everythingmysql.ning.com/profiles/blogs/restore-one-table-from-an-all"><strong>Chris</strong></a> offers to create a special purpose account, which will only have write (CREATE, INSERT, etc.) privileges on the particular table to restore. Cool hack! But, I&#8217;m afraid, not too efficient, for two reasons:<span id="more-1630"></span></p>
<ol>
<li>MySQL needs to process all irrelevant queries (ALTER, INSERT, &#8230;) only to disallow them due to access violation errors.</li>
<li>Assuming the restore is from a remote host, we overload the network with all said irrelevant queries.</li>
</ol>
<p>Just how inefficient? Let&#8217;s time it:</p>
<blockquote>
<pre>mysql&gt; grant usage on *.* to 'restoreuser'@'localhost';
mysql&gt; grant select on *.* to 'restoreuser'@'localhost';
mysql&gt; grant all on employees.departments to 'restoreuser'@'localhost';

$ time mysql --user=restoreuser --socket=/tmp/mysql_sandbox21701.sock --force employees &lt; /tmp/employees.sql
...
ERROR 1142 (42000) at line 343: INSERT command denied to user 'restoreuser'@'localhost' for table 'titles'
ERROR 1142 (42000) at line 344: ALTER command denied to user 'restoreuser'@'localhost' for table 'titles'
...
(lots of these messages)
...

real    <strong>0m31.945s</strong>
user    0m6.328s
sys     0m0.508s</pre>
</blockquote>
<p>So, about <strong>30</strong> seconds to restore a 9-row table.</p>
<h4>Text filtering based solution</h4>
<p><a href="http://gtowey.blogspot.com/2009/11/restore-single-table-from-mysqldump.html"><strong>gtowey</strong></a> offers parsing the dump file beforehand:</p>
<ul>
<li>First, parse with <em>grep</em>, to detect the lines where tables are referenced within the dump file.</li>
<li>Second, parse with <em>sed</em>, extracting the relevant lines.</li>
</ul>
<p>Let&#8217;s time this one:</p>
<blockquote>
<pre>$ time grep -n 'Table structure' /tmp/employees.sql
23:-- Table structure for table `departments`
48:-- Table structure for table `dept_emp`
89:-- Table structure for table `dept_manager`
117:-- Table structure for table `employees`
161:-- Table structure for table `salaries`
301:-- Table structure for table `titles`

real    <strong>0m0.397s</strong>
user    0m0.232s
sys     0m0.164s

$ time sed -n 23,48p /tmp/employees.sql | ./use employees

real    <strong>0m0.562s</strong>
user    0m0.380s
sys     0m0.176s</pre>
</blockquote>
<p>Much faster: about <strong>1</strong> second, compared to <strong>30</strong> seconds from above.</p>
<p>Nevertheless, I find two issues here:</p>
<ol>
<li>A correctness problem: this solution somewhat assumes that there&#8217;s only a single table with the desired name. I say &#8220;somewhat&#8221; since it leaves this to the user.</li>
<li>An efficiency problem: it reads the dump file <em>twice</em>. First parsing it with <em>grep</em>, then with <em>sed</em>.</li>
</ol>
<h4>A third solution</h4>
<p><em>sed</em> is much stronger than presented. In fact, the inquiry made by <em>grep</em> in gtowey&#8217;s solution can be easily handled by <em>sed</em>:</p>
<blockquote>
<pre>$ time sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" /tmp/employees.sql | ./use employees

real    <strong>0m0.573s</strong>
user    0m0.416s
sys     0m0.152s</pre>
</blockquote>
<p>So, the <strong>"/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p"</strong> part tells <em>sed</em> to only print those lines starting from the <strong>departments</strong> table structure, and ending in the next table structure (this is for clarity: had <strong>departments</strong> been the last table, there would not be a next table, but we could nevertheless solve this using other anchors).</p>
<p>And, we only do it in <strong>0.57</strong> seconds: about half the time of the previous attempt.</p>
<p>Now, just to be more correct, we only wish to consider the <strong>employees.departments</strong> table. So, <em>assuming</em> there&#8217;s more than one database dumped (and, by consequence, <strong>USE</strong> statements in the dump-file), we use:</p>
<blockquote>
<pre>cat /tmp/employees.sql | sed -n "/^USE \`employees\`/,/^USE \`/p" | sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" | ./use employees</pre>
</blockquote>
<h4>Further notes</h4>
<ul>
<li>All tests used warmed-up caches.</li>
<li>Sharp-eyed readers will notice that <strong>departments</strong> is the first table in the dump file. Would that give an unfair advantage to the parsing-based restore methods? The answer is no. I&#8217;ve created an <strong>xdepartments</strong> table, to be located at the end of the dump. The difference in time is negligible and inconclusive; we&#8217;re still at ~0.58-0.59 seconds. The effect will be more visible on really large dumps; but then, so would the security-based effects.</li>
</ul>
<p>[<strong>UPDATE</strong>: see also following similar post: <a href="http://blog.tsheets.com/2008/tips-tricks/extract-a-single-table-from-a-mysqldump-file.html">Extract a Single Table from a mysqldump File</a>]</p>
<h4>Conclusion</h4>
<p><a href="http://www.amazon.com/Classic-Shell-Scripting-Arnold-Robbins/dp/0596005954/ref=sr_1_1"><img class="alignright" title="classic-shell-scripting" src="http://code.openark.org/blog/wp-content/uploads/2009/12/classic-shell-scripting.png" alt="classic-shell-scripting" width="144" height="189" /></a>It is always best to test on large datasets, to get a feel for performance.</p>
<p>It&#8217;s best to save MySQL the trouble of parsing &amp; ignoring statements. Scripting utilities like <em>sed</em>, <em>awk</em> &amp; <em>grep</em> have been around for ages, and are well optimized. They excel at text processing.</p>
<p>I&#8217;ve used <em>sed</em> many times in transforming dump outputs; for example, in converting MyISAM to InnoDB tables; to convert Antelope InnoDB tables to Barracuda format, etc. grep &amp; awk are also very useful.</p>
<p>May I recommend, at this point, reading <a href="http://www.amazon.com/Classic-Shell-Scripting-Arnold-Robbins/dp/0596005954/ref=sr_1_1">Classic Shell Scripting</a>, a very easy to follow book, which lists the most popular command line utilities like <em>grep</em>, <em>sed</em>, <em>awk</em>, <em>sort</em>, (countless more) and shell scripting in general. While most of these utilities are well known, the book excels in providing surprisingly practical, simple solutions to common tasks.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/on-restoring-a-single-table-from-mysqldump/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Performance analysis with mycheckpoint</title>
		<link>http://code.openark.org/blog/mysql/performance-analysis-with-mycheckpoint</link>
		<comments>http://code.openark.org/blog/mysql/performance-analysis-with-mycheckpoint#comments</comments>
		<pubDate>Thu, 12 Nov 2009 10:47:00 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Analysis]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[mycheckpoint]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1568</guid>
		<description><![CDATA[mycheckpoint (see announcement) allows for both graph presentation and quick SQL access to monitored &#38; analyzed data. I&#8217;d like to show the power of combining them both. InnoDB performance Taking a look at one of the most important InnoDB metrics: the read hit ratio (we could get the same graph by looking at the HTML [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://code.openark.org/forge/mycheckpoint">mycheckpoint</a> (see <a href="http://code.openark.org/blog/mysql/announcing-mycheckpoint-lightweight-sql-oriented-monitoring-for-mysql">announcement</a>) allows for both graph presentation and quick SQL access to monitored &amp; analyzed data. I&#8217;d like to show the power of combining them both.</p>
<h4>InnoDB performance</h4>
<p>Taking a look at one of the most important InnoDB metrics: the read hit ratio (we could get the same graph by looking at the <a href="http://code.openark.org/forge/mycheckpoint/documentation/generating-html-reports">HTML report</a>):</p>
<blockquote>
<pre>SELECT innodb_read_hit_percent FROM sv_report_chart_sample \G
*************************** 1. row ***************************
innodb_read_hit_percent: http://chart.apis.google.com/chart?cht=lc&amp;chs=400x200&amp;chts=303030,12&amp;chtt=Nov+10,+11:40++-++Nov+11,+08:55+(0+days,+21+hours)&amp;chdl=innodb_read_hit_percent&amp;chdlp=b&amp;chco=ff8c00&amp;chd=s:400664366P6674y7176677677u467773y64ux166666764366646y616666666666644444434444s6u4S331444404433341334433646777666666074736777r1777767764776666F667777617777777777777777yaRi776776mlf667676xgx776766rou67767777u37797777x76676776u6A737464y67467761777666643u66446&amp;chxt=x,y&amp;chxr=1,99.60,100.00&amp;chxl=0:||Nov+10,+15:55|Nov+10,+20:10|Nov+11,+00:25|Nov+11,+04:40|&amp;chxs=0,505050,10</pre>
</blockquote>
<blockquote>
<pre><img class="alignnone" title="innodb_read_hit_percent" src="http://chart.apis.google.com/chart?cht=lc&amp;chs=400x200&amp;chts=303030,12&amp;chtt=Nov+10,+11:40++-++Nov+11,+08:55+(0+days,+21+hours)&amp;chdl=innodb_read_hit_percent&amp;chdlp=b&amp;chco=ff8c00&amp;chd=s:400664366P6674y7176677677u467773y64ux166666764366646y616666666666644444434444s6u4S331444404433341334433646777666666074736777r1777767764776666F667777617777777777777777yaRi776776mlf667676xgx776766rou67767777u37797777x76676776u6A737464y67467761777666643u66446&amp;chxt=x,y&amp;chxr=1,99.60,100.00&amp;chxl=0:||Nov+10,+15:55|Nov+10,+20:10|Nov+11,+00:25|Nov+11,+04:40|&amp;chxs=0,505050,10" alt="" width="400" height="200" /></pre>
</blockquote>
<p>We see that read hit is usually high, but occasionally drops low, down to 99.7, or even 99.6. But it seems like most of the time we are above 99.95% read hit ratio. It&#8217;s hard to tell about 99.98%.</p>
<h4>Can we know for sure?</h4>
<p>We can stress our eyes, yet be certain of little. It&#8217;s best if we just query for the metrics! <em>mycheckpoint</em> provides all data, accessible by simple SQL queries:<span id="more-1568"></span></p>
<blockquote>
<pre>SELECT SUM(innodb_read_hit_percent &gt; 99.95)/count(*)
  FROM sv_report_sample;
+-----------------------------------------------+
| SUM(innodb_read_hit_percent &gt; 99.95)/count(*) |
+-----------------------------------------------+
|                                        0.7844 |
+-----------------------------------------------+</pre>
</blockquote>
<p>Yes, most of the time we&#8217;re above 99.95% read hit ratio (about 78% of the samples), but not as often as one might like!</p>
<p>I&#8217;m more interested in seeing how much time my server&#8217;s above 99.98% read hit:</p>
<blockquote>
<pre>SELECT SUM(innodb_read_hit_percent &gt; 99.98)/count(*)
  FROM sv_report_sample;
+-----------------------------------------------+
| SUM(innodb_read_hit_percent &gt; 99.98)/count(*) |
+-----------------------------------------------+
|                                        0.3554 |
+-----------------------------------------------+</pre>
</blockquote>
<p>We can see the server exceeds 99.98% read hit percent only about 35% of the time. Need to work on that!</p>
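<p>The two queries above work because, in MySQL, a comparison evaluates to 1 or 0, so <strong>SUM()</strong> over it counts the matching samples. As a sketch, several thresholds can be checked in a single pass over the same <strong>sv_report_sample</strong> view; the 99.90 threshold and the column aliases are my own additions for illustration:</p>
<blockquote>
<pre>SELECT
  SUM(innodb_read_hit_percent &gt; 99.90)/COUNT(*) AS above_99_90,
  SUM(innodb_read_hit_percent &gt; 99.95)/COUNT(*) AS above_99_95,
  SUM(innodb_read_hit_percent &gt; 99.98)/COUNT(*) AS above_99_98,
  MIN(innodb_read_hit_percent) AS lowest_read_hit
FROM sv_report_sample;</pre>
</blockquote>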
<h4>Disk activity</h4>
<p>Lower read hit percent means higher number of disk reads; that much is obvious. The first two following graphs present this obvious connection. But the third graph tells us another fact: with increased disk I/O, we can expect more (and longer) locks.</p>
<p>Again, this should be very intuitive, when thinking about it this way. The problem sometimes arises when we try to analyze it the other way round: &#8220;Hey! InnoDB has a lot of locks! What are we going to do about it?&#8221;. Many times, people will look for answers in their <em>transactions</em>, their <em>Isolation Level</em>, their <em>LOCK IN SHARE MODE</em> clauses. But the simple answer can be: &#8220;There&#8217;s a lot of I/O, so everything has to wait; therefore we increase the probability for locks; therefore there&#8217;s more locks&#8221;.</p>
<p>The answer, then, is to reduce I/O. The usual stuff: slow queries; indexing; &#8230; and, yes, perhaps transactions or tuning.</p>
<p>The charts below make it quite clear that we have an issue of excessive reads -&gt; less read hit -&gt; increased I/O -&gt; more locks.</p>
<blockquote>
<pre><img class="alignnone" title="DML" src="http://chart.apis.google.com/chart?cht=lc&amp;chs=400x200&amp;chts=303030,12&amp;chtt=Oct+31,+17:00++-++Nov+11,+08:00+(10+days,+15+hours)&amp;chdl=com_select_psec|com_insert_psec|com_delete_psec|com_update_psec|com_replace_psec&amp;chdlp=b&amp;chco=ff8c00,4682b4,9acd32,dc143c,9932cc&amp;chd=s:IJJJJJKHGHGHGHHHHHIIIJJJKKKLKLLIHHHIHIHIIIJJJJJKKKLLLLMIHHHHHHHIIIIIJJKKKLLLLMMIHIIIIIIIIIIJJJJKKLLLLMMIHHHIHIHIIIJJJJJKKLKLLLLIHHHIHIHIIIIJIJJJJKKKKKKHHHHHHHHHHHHIIIIJIJJJJJKHHHHHHHHHHIIJJNKLLKKLLLMSMHHIHHHIOSae9RNPJIIJJJKHGGGHGGHHHHHJJKJLKLLLMKMJHIIIIIII,EEEEEEEFEEEEEEEFEFFFFFFFFFFFFFFEEFFEEEEEFFFFFFFFFFFGFFFGFFFFEEEEFFFFFFGGFGFFFFFGEFFFEEFFFFFFFFFFFFFFFFFGEEEEEEEEFFFGGFFFFGFFFFFGEEEFFEEFFEFFEEFFFFFFFEEFEEEEEEEEEEEEEEFFEEEEFEEGEEEEEEEEEFFFFFFFFFEFFFFHEEEEEEEFFFFFFFFFFEEEEFEHEFEEEEEEEEFFFGFGGFFFFFFIEEEEEEEF,CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDCCCCCCDDCCCCCCCCCCCCCCCDCCCCCCCCCCCCCCCCCCDCCCCECCBCBCCCCCCCCCCDCCCCCDCECCCCCCCCCCCCCCCCCCCDCCCDCCCCBBCCCCDCCDCCCCCCCCCECCCBBCCCCCCCCCCCCCCCCCCFCCBCCCCCCCCCCCCCCCCCCCCFCCCCBCCCCCCCDCCCCDCCDCCGCCCCCCCD,CBCCCBBCBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCBCCCCCCCCCCCCCCBBBCCCCCCBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCBBBBBBCBBBBBBBBCBBCCCCCCCCCCCCCCCCCCCCC,AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA&amp;chxt=x,y&amp;chxr=1,0,680.42&amp;chxl=0:||Nov+2,+20:00|Nov+4,+23:00|Nov+7,+02:00|Nov+9,+05:00|&amp;chxs=0,505050,10" alt="" width="400" height="200" />

<img class="alignnone" title="innodb_read_hit_percent" src="http://chart.apis.google.com/chart?cht=lc&amp;chs=400x200&amp;chts=303030,12&amp;chtt=Oct+31,+17:00++-++Nov+11,+08:00+(10+days,+15+hours)&amp;chdl=innodb_read_hit_percent&amp;chdlp=b&amp;chco=ff8c00&amp;chd=s:8p879mq7z1377377777z788863778839z13773877633697786888969z1379377667275377376672813167266771288716689y759121685885785236675889869232685789w63y69997989999252696878y8698878588886933368587ffpibibaTYRfVAdXjqfdmbYneRhciXYcifb6995802z56377666576877268875913278387&amp;chxt=x,y&amp;chxr=1,99.44,99.99&amp;chxl=0:||Nov+2,+20:00|Nov+4,+23:00|Nov+7,+02:00|Nov+9,+05:00|&amp;chxs=0,505050,10" alt="" width="400" height="200" />

<img class="alignnone" title="innodb_io" src="http://chart.apis.google.com/chart?cht=lc&amp;chs=400x200&amp;chts=303030,12&amp;chtt=Oct+31,+17:00++-++Nov+11,+08:00+(10+days,+15+hours)&amp;chdl=innodb_buffer_pool_reads_psec|innodb_buffer_pool_pages_flushed_psec&amp;chdlp=b&amp;chco=ff8c00,4682b4&amp;chd=s:DYDDBZVEPMJEFKFGGGEOEDDDEJDECBGBONKEFJEFFFIHFCDDCECCCBECSPLECJEFGHFLDGHEDHEEEDHCMJMGFLGHFELMDCDLEFDBRDFBKIKECIDEHEDHJHFEDGCDCDFCKJKFEJFECRGHNFCCBFBECBCCLHKFBGDEDUEGBCCEEHDCDDFBJJJFEIDFwrfpthozmqqcn3g9hYjkbpqdsvhxormohorGCBHDOLNGEHEDFFGIEFDEDKFDDDGCLIJECIDE,EEEEEGEFSWUFFEGHKIHHHGGHJIHHHHGGQbTFFEFFHGGGGFFHHHGHGFFGUdYGDEGFJKIHHHLKJJJIHHHGHZQRGFGHIHIGGGHIIIGHFFFEHYPNCEFEHHIIIKKJIJHHGIFFGbSPFGJIGFGGGEFFEFEFFEEGSIUODCDFHGGFEEGGGGGGGHFFGYPNDFFGJHIJJIJIHHGFFFEHHVSLCDGIHIHGIHGGFGFFFGIJTRSMEEFFGHHIIIHJKLKHIHGHPNNMCFGE&amp;chxt=x,y&amp;chxr=1,0,151.44&amp;chxl=0:||Nov+2,+20:00|Nov+4,+23:00|Nov+7,+02:00|Nov+9,+05:00|&amp;chxs=0,505050,10" alt="" width="400" height="200" />

<img class="alignnone" title="innodb_row_lock_waits_psec" src="http://chart.apis.google.com/chart?cht=lc&amp;chs=400x200&amp;chts=303030,12&amp;chtt=Oct+31,+17:00++-++Nov+11,+08:00+(10+days,+15+hours)&amp;chdl=innodb_row_lock_waits_psec&amp;chdlp=b&amp;chco=ff8c00&amp;chd=s:GWFGGYSHQJKFGJHIHIHNGHGGGKHHFFGGMMJFFKHHHKMIHGIIJGGFGFHGTNOGFJIHGJGKGJHGGGGFHFJFPIKFFJHJIFLKGGFIFGGEVEJGPILGFIGHJJHIJKGGFJGLIGKGJMSGGIGIGVGGQJGHHKHIGHFGLHMHFIFIGQGGIFGJHIEEHFHGKLJHGGGIYgTVXOaXabSUadW9gVfRSeaQbfalXeYcXTiGHIKHKEJEFFFGGGIGGGGHGKGGHGLGPJHGFJEG&amp;chxt=x,y&amp;chxr=1,0,1.42&amp;chxl=0:||Nov+2,+20:00|Nov+4,+23:00|Nov+7,+02:00|Nov+9,+05:00|&amp;chxs=0,505050,10" alt="" width="400" height="200" /></pre>
</blockquote>
<p>By the way, the above resulted from the fact that, due to a problematic query, all slaves stopped replicating. The slaves participated in read-balancing, so when they went stale, all reads were directed at the master (the monitored node).</p>
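<p>The connection between disk reads and lock waits can also be checked in numbers rather than by eye. The following is only a sketch: the two column names are taken from the chart legends above, and it is an assumption that <strong>sv_hour</strong> exposes them the same way it exposes other per-second metrics:</p>
<blockquote>
<pre>-- The ten hours with the most buffer pool disk reads,
-- shown next to the row lock waits measured in the same hour.
-- Assumption: both columns are available in sv_hour.
SELECT id, innodb_buffer_pool_reads_psec, innodb_row_lock_waits_psec
  FROM sv_hour
  ORDER BY innodb_buffer_pool_reads_psec DESC
  LIMIT 10;</pre>
</blockquote>
<p>If the hours with the heaviest reads are also the hours with the most lock waits, the numbers back up what the charts suggest.</p>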
<h4>You have the metrics at your disposal</h4>
<p>Looking at the following chart:</p>
<blockquote>
<pre><img class="alignnone" title="questions" src="http://chart.apis.google.com/chart?cht=lc&amp;chs=400x200&amp;chts=303030,12&amp;chtt=Nov+11,+15:15++-++Nov+12,+12:30+(0+days,+21+hours)&amp;chdl=queries_psec|questions_psec|slow_queries_psec|com_commit_psec|com_set_option_psec&amp;chdlp=b&amp;chco=ff8c00,4682b4,9acd32,dc143c,9932cc&amp;chd=s:pqpviksvvz0vuxxxjpw0mwpkkso1vhuvn0nnrtx2uisrnvoknmusomqvlyymsvpuweqslwumkomutcromzrinukvwcuzotujjto1shrtszqlu849mXenejkaZlZhcYbgciaZZegecUWhZkaYWebfaXVaecdZUZgdbSbccbcTXYeaaTYZfZeVjbnZhRdegcfYorkdmVadqenfcknkoadeuhrjcbptpkhkqkrqfjprrtmllmnqdwsusojoo0qtpwp4,abfjVQebgjaWeedkWRgcelWUcdclYUhddnaVidendUieflaUhcfkdRfefmgSjianfPkcdmfUegamfRmcfmgTgegmhQghgmgWeiepfShfhqjcqzwzfVYdbfbUXbXcYSYaYdVWUZabWTTbXeYUVZYdVTUYabYWUYbbXSWaYaYSSXZZVTVZaXZURZaWbQWZaaYWUWbaZSUadadVUcbbbTWYeabXUUcebUYbabdVUYbbdTWaaccWTddaeTWbdbgXXdci,AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA,AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA,MLOPJHNMNPLKNNNQJHOMNQJJMMMQLIPMNSLJPNNRNJPNNRKJPMNQNHONNRNIPPLROHQNMRNJOOMRNHRMNRNJONOROHOPOROKNPNTOIPOOTOMQUSUNJKNLOLJKMKMKIKLKMJJILLMKIIMKNJJJLKNIJJKKMKJIKLMKHJLKLKIIKKLJIJLKKKJHLLKLHJLKLKJIKLLKIJLMLMKJMMMMIJKMLLKIJMNLJKMLMMJJKLMNIKLLMMKIMMLNIKMMMOKKNMP&amp;chxt=x,y&amp;chxr=1,0,916.47&amp;chxl=0:||Nov+11,+19:30|Nov+11,+23:45|Nov+12,+04:00|Nov+12,+08:15|&amp;chxs=0,505050,10" alt="" width="400" height="200" /></pre>
</blockquote>
<p>It appears that there are no slow queries. But this may be misleading: perhaps there are just a few, which don&#8217;t show due to the chart&#8217;s large scale?</p>
<p>One could argue that this is the chart&#8217;s fault. Perhaps there should be a distinct chart for &#8220;slow queries percent&#8221;. Perhaps I&#8217;ll add one. But we can&#8217;t have special charts for everything. It would be too tiresome to look at hundreds of charts.</p>
<p>Anyway, my point is: let&#8217;s verify just how many slow queries we have:</p>
<blockquote>
<pre>SELECT slow_queries_psec FROM sv_hour ORDER BY id DESC;
+-------------------+
| slow_queries_psec |
+-------------------+
|              3.05 |
|              3.83 |
|              4.39 |
|              4.03 |
|              3.86 |
|              3.56 |
|              3.73 |
|              3.79 |
|              3.58 |
|              3.55 |
...
+-------------------+</pre>
</blockquote>
<p>So, roughly 3 to 4.4 slow queries per second. It doesn&#8217;t look too good in this light. Checking on the percentage of slow queries (of total questions):</p>
<blockquote>
<pre>SELECT ROUND(100*slow_queries_diff/questions_diff, 1) AS slow_queries_percent
  FROM sv_hour ORDER BY id DESC LIMIT 10;</pre>
</blockquote>
<p>Or, since the above calculation is pre-defined in the reports tables:</p>
<blockquote>
<pre>SELECT slow_queries_percent FROM sv_report_hour_recent;
+----------------------+
| slow_queries_percent |
+----------------------+
|                  0.8 |
|                  1.0 |
|                  1.2 |
|                  1.2 |
|                  1.1 |
|                  1.0 |
|                  1.1 |
|                  1.1 |
|                  1.0 |
...
+----------------------+</pre>
</blockquote>
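<p>Building on the same calculation, one could list only the hours that cross some limit. This is a sketch which reuses the <strong>sv_hour</strong> columns from the query above; the 1.0 percent limit is an arbitrary value chosen for illustration:</p>
<blockquote>
<pre>-- Hours in which more than 1% of all questions were slow queries.
SELECT id,
    ROUND(100*slow_queries_diff/questions_diff, 1) AS slow_queries_percent
  FROM sv_hour
  WHERE 100*slow_queries_diff/questions_diff &gt; 1.0
  ORDER BY id DESC;</pre>
</blockquote>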
<h4>Accessible data</h4>
<p>This is what I&#8217;ve been trying to achieve with <em>mycheckpoint</em>. As a DBA, consultant and SQL geek I find that direct SQL access works best for me. It&#8217;s like preferring the command line over GUI tools. Direct SQL gives you so much more control and information.</p>
<p>Charting is important, since it&#8217;s easy to watch and get first impressions, or find extreme changes. But beware of relying on charts all the time. Scale issues, misleading human interpretation, technology limitations &#8211; all these make charts inaccurate.</p>
<p><a href="http://code.openark.org/forge/mycheckpoint">mycheckpoint</a> allows for both methods, and, I believe, intuitively so.</p>
<p>&lt;/propaganda&gt;</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/performance-analysis-with-mycheckpoint/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to calculate a good InnoDB log file size &#8211; recap</title>
		<link>http://code.openark.org/blog/mysql/how-to-calculate-a-good-innodb-log-file-size-recap</link>
		<comments>http://code.openark.org/blog/mysql/how-to-calculate-a-good-innodb-log-file-size-recap#comments</comments>
		<pubDate>Tue, 20 Oct 2009 19:04:40 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[INFORMATION_SCHEMA]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=895</guid>
		<description><![CDATA[Following Baron Schwartz&#8217; post: How to calculate a good InnoDB log file size, which shows how to make an estimate for the InnoDB log file size, and based on SQL: querying for status difference over time, I&#8217;ve written a query to run on MySQL 5.1, which, upon sampling 60 seconds of status, estimates the InnoDB [...]]]></description>
			<content:encoded><![CDATA[<p>Following Baron Schwartz&#8217; post: <a href="http://www.mysqlperformanceblog.com/2008/11/21/how-to-calculate-a-good-innodb-log-file-size/">How to calculate a good InnoDB log file size</a>, which shows how to make an estimate for the InnoDB log file size, and based on <a href="http://code.openark.org/blog/mysql/sql-querying-for-status-difference-over-time">SQL: querying for status difference over time</a>, I&#8217;ve written a query to run on MySQL 5.1, which, upon sampling 60 seconds of status, estimates the InnoDB transaction log bytes that are expected to be written in the period of 1 hour.</p>
<p><em>Recap</em>: this information can be useful if you&#8217;re looking for a good <strong>innodb_log_file_size</strong> value, one that will neither cause too much I/O (smaller values make for more frequent flushes) nor make for too long a recovery time (larger values mean more transactions to recover upon crash).</p>
<p>It is assumed that the 60 seconds period represents an average system load, not some activity spike period. Edit the sleep time and factors as you will to sample longer or shorter periods.<span id="more-895"></span></p>
<blockquote>
<pre><strong>SELECT</strong>
  innodb_os_log_written_per_minute*60
    <strong>AS</strong> estimated_innodb_os_log_written_per_hour,
  CONCAT(ROUND(innodb_os_log_written_per_minute*60/1024/1024, 1), 'MB')
    <strong>AS</strong> estimated_innodb_os_log_written_per_hour_mb
<strong>FROM</strong>
  (<strong>SELECT</strong> <strong>SUM</strong>(value) <strong>AS</strong> innodb_os_log_written_per_minute <strong>FROM</strong> (
    <strong>SELECT</strong> -VARIABLE_VALUE <strong>AS</strong> value
      <strong>FROM</strong> INFORMATION_SCHEMA.GLOBAL_STATUS
      <strong>WHERE</strong> VARIABLE_NAME = 'innodb_os_log_written'
    <strong>UNION ALL</strong>
    <strong>SELECT</strong> SLEEP(60)
      <strong>FROM</strong> DUAL
    <strong>UNION ALL</strong>
    <strong>SELECT</strong> VARIABLE_VALUE
      <strong>FROM</strong> INFORMATION_SCHEMA.GLOBAL_STATUS
      <strong>WHERE</strong> VARIABLE_NAME = 'innodb_os_log_written'
  ) s1
) s2
;</pre>
</blockquote>
<p>Sample output:</p>
<blockquote>
<pre>+------------------------------------------+---------------------------------------------+
| estimated_innodb_os_log_written_per_hour | estimated_innodb_os_log_written_per_hour_mb |
+------------------------------------------+---------------------------------------------+
|                                584171520 | 557.1MB                                     |
+------------------------------------------+---------------------------------------------+</pre>
</blockquote>
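<p>To turn the estimate into an actual setting, here is a rough sketch. It assumes the rule of thumb that the combined log files should hold about one hour of writes, so the hourly estimate is divided by the number of log files in the group:</p>
<blockquote>
<pre>-- 584171520 is the sample estimate above; substitute your own measurement.
-- Assumption: the log files together should hold about one hour of writes.
SELECT ROUND(584171520 / @@global.innodb_log_files_in_group / 1024 / 1024, 1)
  AS suggested_log_file_size_mb;</pre>
</blockquote>
<p>With the default of two log files, the sample above works out to about 278.6MB per file. Treat the number as a starting point, not a verdict.</p>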
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/how-to-calculate-a-good-innodb-log-file-size-recap/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>High Performance MySQL &#8211; a book to re-read</title>
		<link>http://code.openark.org/blog/mysql/high-performance-mysql-a-book-to-re-read</link>
		<comments>http://code.openark.org/blog/mysql/high-performance-mysql-a-book-to-re-read#comments</comments>
		<pubDate>Sun, 27 Sep 2009 07:56:59 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Books]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1346</guid>
		<description><![CDATA[I first read High Performance MySQL, 2nd edition about a year ago, when it first came out. I since re-read a few pages on occasion. In my previous posts I&#8217;ve suggested ways to improve upon the common ranking solution. Very innovative stuff! Or&#8230; so I thought. I happened to browse through the book today, and [...]]]></description>
			<content:encoded><![CDATA[<p>I first read <a href="http://www.amazon.com/High-Performance-MySQL-Optimization-Replication/dp/0596101716">High Performance MySQL, 2nd edition</a> about a year ago, when it first came out. I since re-read a few pages on occasion.</p>
<p>In my previous posts I&#8217;ve suggested ways to improve upon the common ranking solution. Very innovative stuff! Or&#8230; so I thought.</p>
<p>I happened to browse through the book today, and a section on User Variables caught my eye. &#8220;<em>Let&#8217;s see if I can get some insight</em>&#8221;, I thought to myself. Imagine my surprise when I realized almost everything I&#8217;ve suggested is discussed in this modest section, in black and white, sitting on my bookshelf for over a year!</p>
<p>I read it a year ago, forgot all about it, and re-invented stuff already solved and discussed&#8230; Oh, for more brain capacity&#8230;</p>
<p>To be honest, this has happened to me more than once in the past few months; I&#8217;ve gotten into the habit of browsing the web when I&#8217;m looking for answers to my problems; I forget that this book contains the answers to so many common, practical MySQL problems, and does so in a very direct and helpful manner.</p>
<p>So, yet again, thumbs up to <em>High Performance MySQL</em>. Really a must-have book. Get it if you haven&#8217;t already!</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/high-performance-mysql-a-book-to-re-read/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>InnoDB is dead. Long live InnoDB!</title>
		<link>http://code.openark.org/blog/mysql/innodb-is-dead-long-live-innodb</link>
		<comments>http://code.openark.org/blog/mysql/innodb-is-dead-long-live-innodb#comments</comments>
		<pubDate>Thu, 10 Sep 2009 05:04:38 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1271</guid>
		<description><![CDATA[I find myself converting more and more customers&#8217; databases to InnoDB plugin. In one case, it was a last resort: disk space was running out, and plugin&#8217;s compression released 75% space; in another, a slow disk made for IO bottlenecks, and plugin&#8217;s improvements &#38; compression alleviated the problem; in yet another, I used the above [...]]]></description>
			<content:encoded><![CDATA[<p>I find myself converting more and more customers&#8217; databases to InnoDB plugin. In one case, it was a last resort: disk space was running out, and plugin&#8217;s compression released 75% space; in another, a slow disk made for IO bottlenecks, and plugin&#8217;s improvements &amp; compression alleviated the problem; in yet another, I used the above to fight replication lag on a stubborn slave.</p>
<p>In all those cases, I needed to justify the move to &#8220;new technology&#8221;. The questions &#8220;Is it GA? Is it stable?&#8221; are being asked a lot. Well, just a few days ago the MySQL 5.1 distribution started shipping with InnoDB plugin 1.0.4. That gives some weight to the stability question when facing a doubtful customer.</p>
<p>But I realized <em>that wasn&#8217;t the point</em>.</p>
<p><span id="more-1271"></span>Before InnoDB plugin was first announced, little was going on with InnoDB. There were concerns about the slow/nonexistent progress on this important storage engine, essentially the heart of MySQL. Then the plugin was announced, and everyone was happy.</p>
<p>The point being, since then I only saw (or was exposed to, at least) progress on the plugin. The way I understand it, the plugin is the main (and only?) focus of development. And this is the significant thing to consider: if you&#8217;re keeping to &#8220;old InnoDB&#8221;, fine &#8211; but it won&#8217;t get you much farther; you&#8217;re unlikely to see great performance improvements (will 5.4 make a change? An ongoing improvement to InnoDB?). It may eventually become stale.</p>
<p>Converting to InnoDB plugin means you&#8217;re working with the technology at focus. It&#8217;s being tested, benchmarked, forked, improved, talked about, explained. I find this to be a major motive.</p>
<p>So, long live InnoDB Plugin! (At least till next year, that is, when we may all find ourselves migrating to PBXT)</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/innodb-is-dead-long-live-innodb/feed</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>SQL: finding a user&#8217;s country/region based on IP</title>
		<link>http://code.openark.org/blog/mysql/sql-finding-a-users-countryregion-based-on-ip</link>
		<comments>http://code.openark.org/blog/mysql/sql-finding-a-users-countryregion-based-on-ip#comments</comments>
		<pubDate>Tue, 26 May 2009 06:35:17 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=705</guid>
		<description><![CDATA[I&#8217;ve encountered the same problem twice for different customers, so I guess it&#8217;s worth a discussion. A common task for web applications is to find out the country/region of a user, based on her IP address, as can be detected in the HTTP request. Depending on the country of origin, the website can translate dates [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve encountered the same problem twice for different customers, so I guess it&#8217;s worth a discussion.</p>
<p>A common task for web applications is to find out the country/region of a user, based on her IP address, as can be detected in the HTTP request. Depending on the country of origin, the website can translate dates for different time zones, can change locale settings, and, perhaps most commonly, show advertisements in her native language.</p>
<p>To start with, there&#8217;s a table which lists the IP ranges per country/region. Let&#8217;s assume we&#8217;re only dealing with IPv4:</p>
<blockquote>
<pre>CREATE TABLE regions_ip_range (
  regions_ip_range_id INT UNSIGNED AUTO_INCREMENT,
  country VARCHAR(64) CHARSET utf8,
  region VARCHAR(64) CHARSET utf8,
  start_ip INT UNSIGNED,
  end_ip INT UNSIGNED,
  …
  PRIMARY KEY(regions_ip_range_id),
  ...
);</pre>
</blockquote>
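<p>Since the ranges are stored as unsigned integers, a dotted IPv4 address has to be converted before it can be compared. MySQL&#8217;s built-in <strong>INET_ATON()</strong> and <strong>INET_NTOA()</strong> functions do this. The address used here is a documentation address chosen purely for illustration:</p>
<blockquote>
<pre>-- Dotted notation to integer: 192*2^24 + 0*2^16 + 2*2^8 + 17 = 3221226001
SELECT INET_ATON('192.0.2.17');

-- And back from integer to dotted notation:
SELECT INET_NTOA(3221226001);</pre>
</blockquote>
<p>In the queries that follow, <strong>my_ip</strong> stands for such an integer value.</p>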
<p>The table is fixed, and is populated. Now the question arises: how do we query this table, and which indexes should be created?</p>
<h4>The wrong way</h4>
<p>The form I&#8217;ve encountered is as follows: an index is declared on regions_ip_range:</p>
<blockquote>
<pre>KEY ip_range_idx (start_ip, end_ip)</pre>
</blockquote>
<p>And the query goes like this:</p>
<blockquote>
<pre>SELECT * FROM regions_ip_range
WHERE my_ip BETWEEN start_ip AND end_ip</pre>
</blockquote>
<p><span id="more-705"></span>It takes a grasp of indexes to understand that this is wrong. I&#8217;m not saying the results are wrong, just that the query performance is bad. Let&#8217;s rewrite the query to understand why. The following query is the exact equivalent of the above:</p>
<blockquote>
<pre>SELECT * FROM regions_ip_range
WHERE my_ip &gt;= start_ip AND my_ip &lt;= end_ip</pre>
</blockquote>
<p>Can you see the problem?</p>
<p>There&#8217;s a range condition on the first indexed column (<strong>start_ip</strong>). This automatically negates the use of the second column (<strong>end_ip</strong>). Reversing the order won&#8217;t do, since there&#8217;s also a range condition on <strong>end_ip</strong>.</p>
<p>Effectively, if this were the only query we were executing, we would get the same performance had we defined the following index:</p>
<blockquote>
<pre>KEY ip_range_idx (start_ip)</pre>
</blockquote>
<p>Now that doesn&#8217;t look promising. It&#8217;s fair to guess (as happens in reality) that for the vast majority of IP addresses, MySQL would rather perform a full table scan than use the index.</p>
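<p>One way to see what the optimizer decides on your own data is to run <strong>EXPLAIN</strong> on the query. This is a sketch only; the sample address is an assumption, and the plan you get depends on your data and storage engine:</p>
<blockquote>
<pre>-- Check the "type", "key" and "rows" columns of the output:
-- type = ALL with key = NULL means a full table scan was chosen.
EXPLAIN
SELECT * FROM regions_ip_range
WHERE INET_ATON('192.0.2.17') BETWEEN start_ip AND end_ip;</pre>
</blockquote>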
<h4>Another wrong way</h4>
<p>When pointing this out to people, the natural response is: &#8220;OK, then, let&#8217;s index like <em>this</em>:&#8221;</p>
<blockquote>
<pre>KEY start_ip_idx (start_ip)
KEY end_ip_idx (end_ip)</pre>
</blockquote>
<p>Now we have two indexes, one on each address. But that won&#8217;t do at all. Even if we assume MySQL will use both indexes for our query, and do an index_merge, we won&#8217;t have good performance. Consider: you can&#8217;t have both indexes be selective for any given IP. Either the IP is close to the beginning of the global range (in which case the &#8216;<strong>my_ip &lt;= end_ip</strong>&#8217; part is not selective, since nearly every range ends above it) or it is nearer the upper bound (in which case the &#8216;<strong>my_ip &gt;= start_ip</strong>&#8217; part is not selective, since nearly every range starts below it), or is somewhere in the middle, in which case neither is selective.</p>
<p>In fact, I cannot imagine MySQL would choose to use index_merge at all, and so at most one index is used, if not full table scan again.</p>
<h4>A solution</h4>
<p>An important step towards a solution is the realization that the IP ranges are <em>mutually exclusive</em>. No IP can lie in any two ranges, just one (at least, this is the data I&#8217;ve worked with. If you have hierarchical ranges, you&#8217;ll need to make adjustments). This means I don&#8217;t really need to index both columns. One would suffice. Say I was to put the following index:</p>
<blockquote>
<pre>KEY start_ip_idx (start_ip)</pre>
</blockquote>
<p>We&#8217;ve seen that the presented query won&#8217;t run well on this. Can we rewrite the query as well? Sure! Here&#8217;s one that will work:</p>
<blockquote>
<pre>SELECT * FROM regions_ip_range
WHERE start_ip &lt;= my_ip
ORDER BY start_ip DESC LIMIT 1</pre>
</blockquote>
<p>What we&#8217;re asking for, now, is the first range for which our IP is at or above the range&#8217;s start, reading <em>backwards</em>. What the optimizer needs to do is find the first entry for which <strong>start_ip &lt;= my_ip</strong>, using the index, and then&#8230; oh, there&#8217;s no need to go on, as we have <strong>LIMIT 1</strong>.</p>
<p>If this seems confusing, you can do the opposite. Define this key:</p>
<blockquote>
<pre>KEY end_ip_idx (end_ip)</pre>
</blockquote>
<p>And use this query, instead:</p>
<blockquote>
<pre>SELECT * FROM regions_ip_range
WHERE my_ip &lt;= end_ip
ORDER BY end_ip ASC LIMIT 1</pre>
</blockquote>
<p>It&#8217;s interesting that EXPLAIN would still claim it&#8217;s going to scan a large number of rows, since it does not take the <strong>LIMIT 1</strong> into account.</p>
<p>I&#8217;ve <a title="Two storage engines; different plans, Part II" href="http://code.openark.org/blog/mysql/two-storage-engines-different-plans-part-ii">written before</a> about the differences between storage engines in the way they recommend the optimizer to use (or not to use) an index. So you may end up needing a <strong>FORCE INDEX</strong> hint after all.</p>
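<p>For illustration, this is how the first variant might look with an index hint. It assumes the <strong>start_ip_idx</strong> key defined above, and uses a sample address in place of <strong>my_ip</strong>:</p>
<blockquote>
<pre>-- Force the single-column index, then read one row backwards from our IP.
SELECT * FROM regions_ip_range FORCE INDEX (start_ip_idx)
WHERE start_ip &lt;= INET_ATON('192.0.2.17')
ORDER BY start_ip DESC LIMIT 1;</pre>
</blockquote>
<p>Only add the hint if EXPLAIN shows the optimizer is not already picking the index.</p>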
<h4>Assumptions</h4>
<p>I&#8217;ve made a few assumptions here:</p>
<ol>
<li>The ranges listed in the table are covering: they start with 0.0.0.0 and end with 255.255.255.255.</li>
<li>There are no &#8216;holes&#8217; in the table. Meaning there&#8217;s bound to be a range for any given IP.</li>
<li>IP ranges are mutually exclusive (no hierarchical IP ranges)</li>
</ol>
<p>If the first two assumptions are not met, it should be checked, once the query returns, that <strong>my_ip</strong> is indeed between <strong>start_ip</strong> and <strong>end_ip</strong>.</p>
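<p>That check does not have to live in the application. A sketch of doing it in SQL: wrap the lookup in a derived table and keep the candidate row only if it really contains the address. The alias <strong>candidate</strong> and the sample address are made up for illustration:</p>
<blockquote>
<pre>-- The inner query finds the closest range starting at or below our IP;
-- the outer condition discards it if our IP falls into a hole after it.
SELECT * FROM (
  SELECT * FROM regions_ip_range
  WHERE start_ip &lt;= INET_ATON('192.0.2.17')
  ORDER BY start_ip DESC LIMIT 1
) candidate
WHERE INET_ATON('192.0.2.17') &lt;= end_ip;</pre>
</blockquote>
<p>An empty result then means no range covers the address.</p>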
<p>If assumption #3 is not met, the data can be split into two tables: one must hold the mutually exclusive data; the second one may contain whatever data you have, possibly utilizing some hierarchical algorithm such as nested sets etc.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/sql-finding-a-users-countryregion-based-on-ip/feed</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Reasons to use innodb_file_per_table</title>
		<link>http://code.openark.org/blog/mysql/reasons-to-use-innodb_file_per_table</link>
		<comments>http://code.openark.org/blog/mysql/reasons-to-use-innodb_file_per_table#comments</comments>
		<pubDate>Thu, 21 May 2009 03:40:42 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[mysqldump]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=614</guid>
		<description><![CDATA[When working with InnoDB, you have two ways for managing the tablespace storage: Throw everything in one big file (optionally split). Have one file per table. I will discuss the advantages and disadvantages of the two options, and will strive to convince that innodb_file_per_table is preferable. A single tablespace Having everything in one big file [...]]]></description>
			<content:encoded><![CDATA[<p>When working with InnoDB, you have two ways for managing the tablespace storage:</p>
<ol>
<li>Throw everything in one big file (optionally split).</li>
<li>Have one file per table.</li>
</ol>
<p>I will discuss the advantages and disadvantages of the two options, and will strive to convince that <strong>innodb_file_per_table</strong> is preferable.</p>
<h4>A single tablespace</h4>
<p>Having everything in one big file means all tables and indexes, from <em>all schemas</em>, are &#8216;mixed&#8217; together in that file.</p>
<p>This allows for the following nice property: free space can be shared between different tables and different schemas. Thus, if I purge many rows from my <strong>log</strong> table, the now unused space can be occupied by new rows of any other table.</p>
<p>This same nice property also translates to a not so nice one: data can be greatly fragmented across the tablespace.</p>
<p>An annoying property of InnoDB&#8217;s tablespaces is that they never shrink. So after purging those rows from the <strong>log</strong> table, the tablespace file (usually <strong>ibdata1</strong>) still keeps the same storage. It does not release storage to the file system.</p>
<p>I&#8217;ve seen more than once how certain tables are left unwatched, growing until disk space reaches 90% and SMS notifications start beeping all around.<span id="more-614"></span></p>
<p>There&#8217;s little to do in this case. Well, one can always purge the rows. Sure, the space would be reused by InnoDB. But having a file which consumes some 80-90% of disk space is a performance catastrophe. It means the disk head needs to move large distances. Overall disk performance runs very low.</p>
<p>The best way to solve this is to set up a new slave (after purging the rows), and dump the data into that slave.</p>
<h4>InnoDB Hot Backup</h4>
<p>The funny thing is, the <strong>ibbackup</strong> utility will copy the tablespace file as it is. If it was 120GB, of which only 30GB are used, you still get 120GB backed up and restored.</p>
<h4>mysqldump, mk-parallel-dump</h4>
<p>mysqldump would be your best choice if you only had the original machine to work with. Assuming you&#8217;re only using InnoDB, a dump with <strong>--single-transaction</strong> will do the job. Or you can utilize <a title="Maatkit: mk-parallel-dump" href="http://www.maatkit.org/">mk-parallel-dump</a> to speed things up (depending on your dump method and accessibility needs, mind the locking).</p>
<h4>innodb_file_per_table</h4>
<p>With this parameter set, a <strong>.ibd</strong> file is created per table. What we get is this:</p>
<ul>
<li>Tablespace is not shared among different tables, and certainly not among different schemas.</li>
<li>Each file is considered a tablespace of its own.</li>
<li>Again, tablespace never reduces in size.</li>
<li>It is possible to regain space per tablespace.</li>
</ul>
<p>Wait. The last two seem conflicting, don&#8217;t they? Let&#8217;s explain.</p>
<p>In our <strong>log</strong> table example, we purge many rows (up to 90GB of data is removed). The <strong>.ibd</strong> file does not shrink. But we <em>can</em> do:</p>
<blockquote><p>ALTER TABLE log ENGINE=InnoDB</p></blockquote>
<p>What will happen is that a new, temporary file is created, into which the table is rebuilt. Only existing data is added to the new table. Once complete, the original table is removed, and the new table is renamed as the original table.</p>
<p>Sure, this takes a long time, during which the table is completely locked: no writes and no reads allowed. But still &#8211; it allows us to regain disk space.</p>
<p>With the new InnoDB plugin, disk space is also regained when executing a <strong>TRUNCATE TABLE log</strong> statement.</p>
<p>Fragmentation is not as bad as in a single tablespace: the data is limited within the boundaries of a smaller file.</p>
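<p>One thing to keep in mind: a table that was created while the option was off still lives in the shared tablespace, and only moves to its own <strong>.ibd</strong> file when it is rebuilt with the option on. A short sketch of checking the setting first and then rebuilding, using the <strong>log</strong> table from the example:</p>
<blockquote>
<pre>-- Make sure the option is ON before rebuilding;
-- otherwise the rebuilt table stays in the shared tablespace.
SHOW VARIABLES LIKE 'innodb_file_per_table';

-- Rebuild the table; mind the locking described above.
ALTER TABLE log ENGINE=InnoDB;</pre>
</blockquote>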
<h4>Monitoring</h4>
<p>One other nice thing about <strong>innodb_file_per_table</strong> is that it is possible to monitor table size on the file system level. You don&#8217;t need access to MySQL, to use SHOW TABLE STATUS or to query the INFORMATION_SCHEMA. You can just look up the top 10 largest files under your MySQL data directory (and subdirectories), and monitor their size. You can see which table grows fastest.</p>
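<p>For comparison, when you do have access to MySQL, a similar top 10 can be had from within the server. This sketch queries <strong>INFORMATION_SCHEMA.TABLES</strong>; note that for InnoDB the reported lengths are estimates, so they will not match the file sizes exactly:</p>
<blockquote>
<pre>-- The ten largest InnoDB tables by data plus index size, in MB.
SELECT TABLE_SCHEMA, TABLE_NAME,
    ROUND((DATA_LENGTH + INDEX_LENGTH)/1024/1024, 1) AS size_mb
  FROM INFORMATION_SCHEMA.TABLES
  WHERE ENGINE = 'InnoDB'
  ORDER BY DATA_LENGTH + INDEX_LENGTH DESC
  LIMIT 10;</pre>
</blockquote>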
<h4>Backup</h4>
<p>Last, it is not yet possible to back up single InnoDB tables by copying the <strong>.ibd</strong> files. But hopefully work will be done in this direction.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/reasons-to-use-innodb_file_per_table/feed</wfw:commentRss>
		<slash:comments>26</slash:comments>
		</item>
	</channel>
</rss>
