<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>code.openark.org &#187; Performance</title>
	<atom:link href="http://code.openark.org/blog/tag/performance/feed" rel="self" type="application/rss+xml" />
	<link>http://code.openark.org/blog</link>
	<description>Blog by Shlomi Noach</description>
	<lastBuildDate>Wed, 01 Feb 2012 08:19:12 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Self throttling MySQL queries</title>
		<link>http://code.openark.org/blog/mysql/self-throttling-mysql-queries</link>
		<comments>http://code.openark.org/blog/mysql/self-throttling-mysql-queries#comments</comments>
		<pubDate>Tue, 01 Nov 2011 07:55:47 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Hack]]></category>
		<category><![CDATA[MyISAM]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Stored routines]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=4294</guid>
		<description><![CDATA[Recap on the problem: A query takes a long time to complete. During this time it generates a lot of I/O. The query's I/O overloads the database, making other queries run slowly. I introduce the notion of self-throttling queries: queries that go to sleep, by themselves, throughout the runtime. The sleep period means the [...]]]></description>
			<content:encoded><![CDATA[<p>Recap on the problem:</p>
<ul>
<li>A query takes a long time to complete.</li>
<li>During this time it generates a lot of I/O.</li>
<li>The query's I/O overloads the database, making other queries run slowly.</li>
</ul>
<p>I introduce the notion of self-throttling queries: queries that go to sleep, by themselves, throughout the runtime. The sleep period means the query does not perform I/O at that time, which then means other queries can have their chance to execute.</p>
<p>I present two approaches:</p>
<ul>
<li>The naive approach: for every <strong>1,000</strong> rows, the query sleeps for <strong>1</strong> second.</li>
<li>The factor approach: for every <strong>1,000</strong> rows, the query sleeps for the amount of time it took to iterate those <strong>1,000</strong> rows (effectively doubling the total runtime of the query).<span id="more-4294"></span></li>
</ul>
<h4>Sample query</h4>
<p>We use a simple, single-table scan. No aggregates (which complicate the solution considerably).</p>
<blockquote>
<pre>SELECT
  rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental
;</pre>
</blockquote>
<h4>The naive solution</h4>
<p>We need to know each time another <strong>1,000</strong> rows have been processed, so we need to count the rows. We do that by using a counter, as follows:</p>
<blockquote>
<pre>SELECT
  rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days,
  @row_counter := @row_counter + 1
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter
;</pre>
</blockquote>
<p>One thing that bothers me is that I never asked for an additional column. I would like the result set to remain as it was, with the same structure. We also want to sleep for <strong>1</strong> second for each <strong>1,000</strong> rows. So we merge the two together along with one of the existing columns, like this:</p>
<blockquote>
<pre>SELECT
  rental_id +
    IF(
      (@row_counter := @row_counter + 1) % 1000 = 0,
      SLEEP(1), 0
    ) AS rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter
;</pre>
</blockquote>
<p>To remain faithful to <a href="http://code.openark.org/blog/mysql/slides-from-my-talk-programmatic-queries-things-you-can-code-with-sql">my slides</a>, I rewrite as follows, and this is <em>the naive solution</em>:</p>
<blockquote>
<pre>SELECT
  rental_id +
    CASE
      WHEN <strong>(@row_counter := @row_counter + 1) % 1000 = 0</strong> THEN <strong>SLEEP(1)</strong>
      ELSE <strong>0</strong>
    END AS rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter
;</pre>
</blockquote>
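<p>The <strong>CASE</strong> expression always evaluates to <strong>0</strong> (<strong>SLEEP()</strong> returns <strong>0</strong> when it completes without being interrupted), so it does not affect the value of <strong>rental_id</strong>.</p>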
<h4>The factor approach</h4>
<p>In the factor approach we wish to keep a record of the query's execution time for every <strong>1,000</strong> rows. I introduce a nested <strong>CASE</strong> expression which updates the time records. I rely on <strong>SYSDATE()</strong> to return the true current time, and on <strong>NOW()</strong> to return the query's execution start time.</p>
<blockquote>
<pre>SELECT
  rental_id +
    CASE
      WHEN (@row_counter := @row_counter + 1) IS NULL THEN NULL
      WHEN <strong>@row_counter % 1000 = 0</strong> THEN
        CASE
          WHEN (@time_now := <strong>SYSDATE()</strong>) IS NULL THEN NULL
          WHEN (@time_diff := (<strong>TIMESTAMPDIFF(SECOND, @chunk_start_time, @time_now)</strong>)) IS NULL THEN NULL
          WHEN <strong>SLEEP(@time_diff)</strong> IS NULL THEN NULL
          WHEN (@chunk_start_time := <strong>SYSDATE()</strong>) IS NULL THEN NULL
          ELSE 0
        END
      ELSE 0
    END AS rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter,
  (SELECT @chunk_start_time := NOW()) sel_chunk_start_time
;</pre>
</blockquote>
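<p>For readers who find the nested <strong>CASE</strong> hard to follow, the same bookkeeping can be sketched procedurally. The following Java is only an illustration of the logic, not something the query uses: the class name <strong>FactorThrottle</strong>, the <strong>onRow()</strong> method and the injected clock and sleeper are all hypothetical names of mine, and the sketch works in milliseconds where the query above works in whole seconds.</p>

```java
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Illustrative sketch only: mirrors the user variables of the query above.
final class FactorThrottle {
  private final int chunkSize;            // 1,000 in the query
  private final LongSupplier clockMillis; // stands in for SYSDATE()
  private final LongConsumer sleeper;     // stands in for SLEEP()
  private long rowCounter = 0;            // @row_counter
  private long chunkStartTime;            // @chunk_start_time

  FactorThrottle(int chunkSize, LongSupplier clockMillis, LongConsumer sleeper) {
    this.chunkSize = chunkSize;
    this.clockMillis = clockMillis;
    this.sleeper = sleeper;
    this.chunkStartTime = clockMillis.getAsLong(); // like NOW() at query start
  }

  // Call once per row. Returns the time slept, which is 0 for most rows.
  long onRow() {
    rowCounter++;
    if (rowCounter % chunkSize != 0) {
      return 0;
    }
    long timeDiff = clockMillis.getAsLong() - chunkStartTime; // @time_diff
    sleeper.accept(timeDiff);                 // sleep as long as the chunk took
    chunkStartTime = clockMillis.getAsLong(); // the next chunk starts after the sleep
    return timeDiff;
  }

  public static void main(String[] args) {
    FactorThrottle throttle = new FactorThrottle(1000, System::currentTimeMillis, millis -> {
      try {
        Thread.sleep(millis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    long totalSlept = 0;
    for (int row = 0; row < 5000; row++) {
      totalSlept += throttle.onRow();
    }
    System.out.println("slept " + totalSlept + " ms in total");
  }
}
```

<p>Injecting the clock and the sleeper is a design choice made purely so the logic can be exercised without really sleeping. The query itself has no such luxury, which is why it chains everything through <strong>WHEN ... IS NULL</strong> tests.</p>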
<h4>Proof</h4>
<p>How can we prove that the queries do indeed work?</p>
<p>We can see if the total runtime sums up to the number of sleep calls, in seconds; but how do we know that sleeps do occur at the correct times?</p>
<p>A solution I offer is to use a stored routine which logs the exact time (using <strong>SYSDATE()</strong>) and a value per row to a MyISAM table (a non-transactional table). The following constructs are introduced:</p>
<blockquote>
<pre><strong>CREATE TABLE</strong> test.proof(
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  dt DATETIME NOT NULL,
  msg VARCHAR(255)
) <strong>ENGINE=MyISAM</strong>;

DELIMITER $$
<strong>CREATE FUNCTION</strong> test.prove_it(message VARCHAR(255)) RETURNS TINYINT
DETERMINISTIC
MODIFIES SQL DATA
BEGIN
  <strong>INSERT INTO test.proof (dt, msg) VALUES (SYSDATE(), message); RETURN 0;</strong>
END $$
DELIMITER ;</pre>
</blockquote>
<p>The <strong>prove_it()</strong> function records the immediate time and a message into the MyISAM table, which immediately accepts the write, being non-transactional. It returns with <strong>0</strong>, so we will now embed it within the query. Of course, the function itself incurs some overhead, but it will nevertheless convince you that <strong>SLEEP()</strong>s do occur at the right time!</p>
<blockquote>
<pre>SELECT
  rental_id +
    CASE
      WHEN (@row_counter := @row_counter + 1) IS NULL THEN NULL
      WHEN @row_counter % 1000 = 0 THEN
        CASE
          WHEN (@time_now := SYSDATE()) IS NULL THEN NULL
          WHEN (@time_diff := (TIMESTAMPDIFF(SECOND, @chunk_start_time, @time_now))) IS NULL THEN NULL
          WHEN SLEEP(@time_diff)<strong> + test.prove_it(CONCAT('will sleep for ', @time_diff, ' seconds'))</strong> IS NULL THEN NULL
          WHEN (@chunk_start_time := SYSDATE()) IS NULL THEN NULL
          ELSE 0
        END
      ELSE 0
    END AS rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter,
  (SELECT @chunk_start_time := NOW()) sel_chunk_start_time
;

mysql&gt; SELECT * FROM test.proof;
+----+---------------------+--------------------------+
| id | dt                  | msg                      |
+----+---------------------+--------------------------+
|  1 | 2011-11-01 09:22:36 | will sleep for 1 seconds |
|  2 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  3 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  4 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  5 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  6 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  7 | 2011-11-01 09:22:38 | will sleep for 1 seconds |
|  8 | 2011-11-01 09:22:38 | will sleep for 0 seconds |
|  9 | 2011-11-01 09:22:38 | will sleep for 0 seconds |
| 10 | 2011-11-01 09:22:38 | will sleep for 0 seconds |
| 11 | 2011-11-01 09:22:38 | will sleep for 0 seconds |
| 12 | 2011-11-01 09:22:40 | will sleep for 1 seconds |
| 13 | 2011-11-01 09:22:40 | will sleep for 0 seconds |
| 14 | 2011-11-01 09:22:40 | will sleep for 0 seconds |
| 15 | 2011-11-01 09:22:40 | will sleep for 0 seconds |
+----+---------------------+--------------------------+</pre>
</blockquote>
<p>The above query is actually very fast. Try adding <strong>BENCHMARK(1000,ENCODE('hello','goodbye'))</strong> to rental_id so as to make it slower, or just run it on a really large table, and see what happens (this is what I actually used to make the query run for several seconds in the example above).</p>
<p>Observant readers will note that the <strong>"will sleep..."</strong> message actually gets written <em>after</em> the <strong>SLEEP()</strong> call. I leave this as it is.</p>
<p>Another very nice trait of the code is that you don't need sub-second resolution for it to work. If you look at the above, we don't actually go to sleep every <strong>1,000</strong> rows (<strong>1,000</strong> rows pass too quickly in this query; perhaps I should have used <strong>10,000</strong> rows). But we <em>do</em> sleep once a second has <em>elapsed</em>, which means it works correctly <em>on average</em>. Of course, the entire discussion is only of interest when a query executes for a <em>substantial</em> number of seconds, so this is just an anecdote.</p>
<h4>And the winner is...</h4>
<p>Wow, this <a href="http://code.openark.org/blog/mysql/contest-for-glory-write-a-self-throttling-mysql-query">contest</a> was anything but popular. <strong><a href="http://marcalff.blogspot.com/">Marc Alff</a></strong> is the obvious winner: he is the <em>only</em> one to suggest a solution <img src='http://code.openark.org/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>But Marc uses a very nice trick: he reads the <strong>PERFORMANCE_SCHEMA</strong>. Now, I'm not sure how the <strong>PERFORMANCE_SCHEMA</strong> gets updated. I know that the <strong>INFORMATION_SCHEMA.GLOBAL_STATUS</strong> table does not get updated by a query until the query completes (so you cannot expect a change in <strong>innodb_rows_read</strong> throughout the execution of the query). I just didn't test it (homework, anyone?). If it does get updated, then we can throttle the query based on InnoDB page reads using a simple query. Otherwise, an access to <strong>/proc/diskstats</strong> is possible, assuming no <em>apparmor</em> or <em>SELinux</em> are blocking us.</p>
<p>Marc also uses a stored function, which is the <em>clean</em> way of doing it; however, I distrust the overhead incurred by a stored routine and prefer my solution (which is, admittedly, not a pretty SQL sight!).</p>
<p>Happy throttling!</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/self-throttling-mysql-queries/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>On generating unique IDs using LAST_INSERT_ID() and other tools</title>
		<link>http://code.openark.org/blog/mysql/on-generating-unique-ids-using-last_insert_id-and-other-tools</link>
		<comments>http://code.openark.org/blog/mysql/on-generating-unique-ids-using-last_insert_id-and-other-tools#comments</comments>
		<pubDate>Wed, 02 Feb 2011 06:50:02 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Query Cache]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=3283</guid>
		<description><![CDATA[There's a trick for using LAST_INSERT_ID() to generate sequences in MySQL. Quoting from the Manual: Create a table to hold the sequence counter and initialize it: mysql&#62; CREATE TABLE sequence (id INT NOT NULL); mysql&#62; INSERT INTO sequence VALUES (0); Use the table to generate sequence numbers like this: mysql&#62; UPDATE sequence SET id=LAST_INSERT_ID(id+1); mysql&#62; [...]]]></description>
			<content:encoded><![CDATA[<p>There's a <a href="http://dev.mysql.com/doc/refman/5.1/en/information-functions.html#function_last-insert-id">trick</a> for using <strong>LAST_INSERT_ID()</strong> to generate sequences in MySQL. Quoting from the Manual:</p>
<blockquote>
<ol type="1">
<li>Create a table to hold the sequence counter and initialize               it:
<pre>mysql&gt; <strong><code>CREATE TABLE sequence (id INT NOT NULL);</code></strong>
mysql&gt; <strong><code>INSERT INTO sequence VALUES (0);</code></strong>
</pre>
</li>
<li>Use the table to generate sequence numbers like this:
<pre>mysql&gt; <strong><code>UPDATE sequence SET id=LAST_INSERT_ID(id+1);</code></strong>
mysql&gt; <strong><code>SELECT LAST_INSERT_ID();</code></strong>
</pre>
</li>
</ol>
</blockquote>
<p>This trick is asking for trouble.</p>
<h4>Contention</h4>
<p>A customer was using this trick to generate unique session IDs for his JBoss sessions. These IDs would eventually be written back to the database in the form of log events. Business goes well, and one day the customer adds three new JBoss servers (doubling the number of webapps). All of a sudden, nothing works quite as it used to. All kinds of queries take long seconds to complete; load average becomes very high.<span id="more-3283"></span></p>
<p>A short investigation reveals that a very slight load is enough to make sequence-UPDATE queries accumulate. Dozens of them are active at any given time, waiting for long seconds.</p>
<p>Both InnoDB and MyISAM make for poor response times. No wonder! Everyone's contending for one lock.</p>
<h4>Not just one</h4>
<p>Other queries seem to hang as well. Why?</p>
<p>It is easy to forget or let go unnoticed that there are quite a few global locks involved with each query. If query cache is activated, then any query must pass through that cache, holding the query cache mutex. There's a global mutex on MyISAM's key cache. There's one on InnoDB's buffer pool (see multiple buffer pools in InnoDB 5.5), albeit less of an overhead. And there's the table cache.</p>
<p>When the table cache is enabled, any completed query attempts to return file handles to the cache. Any new query attempts to retrieve handles from the cache. While the cache is being written to (extracting, adding), it is locked. When everyone's busy doing the sequence-UPDATE, the table cache lock is being abused. Other queries are unable to find the time to acquire the lock and get on with their business.</p>
<h4>What can be done?</h4>
<p>One could try and increase the <strong>table_open_cache</strong> value. That may help to some extent, and for a limited time. But the more requests are made, the quicker the problem surfaces again. In fact, <em>reducing</em> <strong>table_open_cache</strong> to zero (well, the minimum value is <strong>1</strong>) can have a great impact. If there's nothing to fight for, everyone just gets by on their own.</p>
<p>I know the following is not a scientific explanation, but it hits me as a good comparison: when my daughter brings a friend over, and there's a couple of toys, both are happy. A third friend makes for a fight: <em>"I saw it first! She took it from me! I was holding it!"</em>. Any parent knows the ultimate solution to this kind of fight: take away the toys, and have them find something else to enjoy doing. OK, sorry for this unscientific display, I had to share my daily stress.</p>
<p>When no table cache is available, a query opens the table by itself, and does not attempt to return the file handle to the cache. The file handle is simply destroyed. Now, usually this is not desired. Caching is good. But in our customer's case, the cost of not using a table cache was dwarfed by the cost of having everyone fight for the sequence table. Reducing the table cache brought immediate relief to the database: responsiveness at peak times was still noticeably poor, but far better than with a large table cache.</p>
<h4>Other tools?</h4>
<p>I don't consider the above to be a good solution. It's just a temporary hack.</p>
<p>I actually don't like the <strong>LAST_INSERT_ID()</strong> trick. Moreover, I don't see that it's the database's job to provide unique IDs. Let it do relational stuff. If generating IDs is too intensive, let someone else do it.</p>
<p>NoSQL solutions provide such a service. <a href="http://memcached.org/">Memcached</a>, <a href="http://redis.io/">redis</a>, <a href="http://www.mongodb.org/">MongoDB</a> (and probably more) all provide increment functions. Check them out.</p>
<h4>Application level solutions</h4>
<p>I actually use an application level solution to generate unique IDs. I mean, there's always <strong>GUID()</strong>, but its result is just too long. Take a look at the following Java code:</p>
<blockquote>
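<pre>public class Utils {
  // Last value handed out; only touched inside the synchronized method below.
  private static long lastUniqueNumber = 0;

  public static synchronized long uniqueNumber() {
    long unique = System.currentTimeMillis();
    // Same millisecond as before (or the clock moved back): step past the last value.
    if (unique &lt;= lastUniqueNumber)
      unique = lastUniqueNumber + 1;
    lastUniqueNumber = unique;
    return unique;
  }
}</pre>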
</blockquote>
<p>Within a single Java application the above method returns unique IDs at a sustained rate of up to 1,000 per second on average. It can be called far more often than that in bursts, in which case the returned values temporarily run ahead of the clock.</p>
<p>On consecutive executions of the application on the same machine one would still expect unique values, due to the time-related nature of the values. However, computer time changes. It's possible that <strong>System.currentTimeMillis()</strong> would return a value already used in the past.</p>
<p>And what about two processes running on the same machine at the same time? Or on different machines?</p>
<p>Which is why I use the following combination to generate my unique IDs:</p>
<ul>
<li>Server ID (much like MySQL's server_id parameter). This could be the last byte in the server's IP address, or just 4 or 5 bits if not too many players are expected.</li>
<li>Process ID (plain old <em>pid</em>), which I pass to the Java runtime in the form of a system property. Any two processes running on the same machine are assured to have different IDs. Two consecutively spawned processes will have different IDs. The time it would take to cycle the process IDs is way more than would make for a "time glitch" problem as described above.</li>
<li>Current time in milliseconds.</li>
</ul>
<p>If you have to fit everything within 64 bits (BIGINT) then you'll have to do bit manipulation, and drop some of the MSBs of the milliseconds so as to overwrite them with the server &amp; process IDs.</p>
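<p>To make the bit manipulation concrete, here is a minimal sketch. The bit widths are my own assumption for illustration (5 bits of server ID, 16 bits of process ID, 42 bits of milliseconds, which wrap around roughly every 139 years), chosen so that the sign bit stays clear. The class name <strong>PackedId</strong> and the system property names <strong>app.server_id</strong> and <strong>app.pid</strong> are hypothetical as well.</p>

```java
// Illustrative sketch only: the layout and the names are assumptions, not a standard.
final class PackedId {
  // 5 + 16 + 42 = 63 bits, so the sign bit of the long stays 0.
  static final int SERVER_BITS = 5;
  static final int PID_BITS = 16;
  static final int MILLIS_BITS = 42;

  private static long mask(int bits) {
    return (1L << bits) - 1;
  }

  // Layout, from the most significant bit down: 0 | server | pid | millis
  static long pack(int serverId, int pid, long millis) {
    long server = serverId & mask(SERVER_BITS);
    long process = pid & mask(PID_BITS);
    long time = millis & mask(MILLIS_BITS); // drops the MSBs of the milliseconds
    return (server << (PID_BITS + MILLIS_BITS)) | (process << MILLIS_BITS) | time;
  }

  static int serverOf(long id) {
    return (int) ((id >>> (PID_BITS + MILLIS_BITS)) & mask(SERVER_BITS));
  }

  static int pidOf(long id) {
    return (int) ((id >>> MILLIS_BITS) & mask(PID_BITS));
  }

  static long millisOf(long id) {
    return id & mask(MILLIS_BITS);
  }

  public static void main(String[] args) {
    // Hypothetical property names, e.g. passed as -Dapp.server_id=3 -Dapp.pid=1234
    int serverId = Integer.getInteger("app.server_id", 1);
    int pid = Integer.getInteger("app.pid", 0);
    long id = pack(serverId, pid, System.currentTimeMillis());
    System.out.println(id + " -> server " + serverOf(id) + ", pid " + pidOf(id));
  }
}
```

<p>In practice the millisecond value would come from something like the <strong>uniqueNumber()</strong> method above rather than from the raw clock, so that two calls within the same millisecond still differ. Note that a server ID or pid wider than its field is silently truncated by the masks.</p>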
<p>If you are willing to have your IDs unique in the bounds of a given time frame (so, for example, a month from now you wouldn't mind reusing old IDs), then the problem is significantly easier. You may just use "day of month" and "millis since day start" and save those precious bits.</p>
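<p>A rough sketch of that cheaper time component follows. The day of month needs 5 bits and the milliseconds since day start fit in 27 bits (86,400,000 is less than 2<sup>27</sup>), so the time part takes 32 bits and leaves the other 32 for server &amp; process IDs. The class name <strong>MonthlyId</strong> is hypothetical, and using UTC is my own assumption, made to avoid repeated local times around daylight saving changes.</p>

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoField;

// Illustrative sketch only: a time component that repeats after a month.
final class MonthlyId {
  static final int MILLIS_OF_DAY_BITS = 27; // 86,400,000 < 2^27

  // Layout: day of month (1..31) in the high bits, millis since day start below.
  static long timePart(LocalDateTime t) {
    long dayOfMonth = t.getDayOfMonth();
    long millisOfDay = t.getLong(ChronoField.MILLI_OF_DAY);
    return (dayOfMonth << MILLIS_OF_DAY_BITS) | millisOfDay;
  }

  static int dayOf(long timePart) {
    return (int) (timePart >>> MILLIS_OF_DAY_BITS);
  }

  static long millisOfDayOf(long timePart) {
    return timePart & ((1L << MILLIS_OF_DAY_BITS) - 1);
  }

  public static void main(String[] args) {
    long part = timePart(LocalDateTime.now(ZoneOffset.UTC));
    System.out.println(part + " -> day " + dayOf(part) + ", millis " + millisOfDayOf(part));
  }
}
```

<p>The returned value would then be combined with the server &amp; process IDs in the same shift-and-or fashion as any other layout.</p>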
<h4>Still other?</h4>
<p>Please share your solutions below!</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/on-generating-unique-ids-using-last_insert_id-and-other-tools/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>How often should you use OPTIMIZE TABLE? - followup</title>
		<link>http://code.openark.org/blog/mysql/how-often-should-you-use-optimize-table-followup</link>
		<comments>http://code.openark.org/blog/mysql/how-often-should-you-use-optimize-table-followup#comments</comments>
		<pubDate>Mon, 04 Oct 2010 08:07:45 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2882</guid>
		<description><![CDATA[This post follows up on Baron's How often should you use OPTIMIZE TABLE?. I had the opportunity of doing some massive purging of data from large tables, and was interested to see the impact of the OPTIMIZE operation on table's indexes. I worked on some production data I was authorized to provide as example. The [...]]]></description>
			<content:encoded><![CDATA[<p>This post follows up on Baron's <a href="http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/">How often should you use OPTIMIZE TABLE?</a>. I had the opportunity of doing some massive purging of data from large tables, and was interested to see the impact of the <strong>OPTIMIZE</strong> operation on the table's indexes. I worked on some production data I was authorized to provide as an example.</p>
<h4>The use case</h4>
<p>I'll present a single use case here. The table at hand is a compressed InnoDB table used for logs. I've rewritten some column names for privacy:</p>
<blockquote>
<pre>mysql&gt; show create table logs \G

Create Table: CREATE TABLE `logs` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
 `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 `origin` varchar(64) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
 `message` text NOT NULL,
 `level` tinyint(11) NOT NULL DEFAULT '0',
 `s` char(16) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `s` (`s`),
 KEY `name` (`name`,`ts`),
 KEY `origin` (`origin`,`ts`)
) ENGINE=InnoDB AUTO_INCREMENT=186878729 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8</pre>
</blockquote>
<p>The table had log records starting <strong>2010-08-23</strong> and up till <strong>2010-09-02</strong> noon. Table status:<span id="more-2882"></span></p>
<blockquote>
<pre>mysql&gt; show table status like 'logs'\G
*************************** 1. row ***************************
           Name: logs
         Engine: InnoDB
        Version: 10
     Row_format: Compressed
           Rows: 22433048
 Avg_row_length: 206
    Data_length: 4625285120
Max_data_length: 0
   Index_length: 1437073408
      Data_free: 4194304
 Auto_increment: 186878920
    Create_time: 2010-08-24 18:10:49
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options: row_format=COMPRESSED KEY_BLOCK_SIZE=8
        Comment:</pre>
</blockquote>
<p>(I'm a bit puzzled by the <strong>Create_time</strong>: the table was taken from an LVM snapshot of another server, so it had existed for a very long time before. I'm not sure why the <strong>Create_time</strong> field is as shown here; I assume the MySQL upgrade marked it so, but I had neither the time nor the need to look into it.)</p>
<p>I was using <a href="http://www.percona.com/downloads/Percona-Server-5.1/">Percona-Server-5.1.47-11.2</a>, and so was able to look at the index statistics for that table:</p>
<blockquote>
<pre>mysql&gt; SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';
+--------------+------------+--------------+--------+----------------+------------+------------+
| table_schema | table_name | index_name   | fields | row_per_keys   | index_size | leaf_pages |
+--------------+------------+--------------+--------+----------------+------------+------------+
| newsminer    | logs       | PRIMARY      |      1 | 1              |     282305 |     246856 |
| newsminer    | logs       | s            |      2 | 17, 1          |      38944 |      33923 |
| newsminer    | logs       | name         |      3 | 2492739, 10, 2 |      22432 |      19551 |
| newsminer    | logs       | origin       |      3 | 1303, 4, 1     |      26336 |      22931 |
+--------------+------------+--------------+--------+----------------+------------+------------+</pre>
</blockquote>
<h4>Status after massive purge</h4>
<p>My first requirement was to purge all records up to <strong>2010-09-01 00:00:00</strong>. I did so in small chunks, using <a href="http://code.openark.org/forge/openark-kit">openark kit</a>'s oak-chunk-update (the same can be achieved with <a href="http://www.maatkit.org/">maatkit</a>'s mk-archiver). The process purged <strong>1000</strong> rows at a time, with some sleep in between, and ran for a couple of hours. It may be interesting to note that since ts is in <a href="http://code.openark.org/blog/mysql/monotonic-functions-sql-and-mysql">monotonically ascending</a> values, purging old rows also means purging lower PKs, which means we're trimming the PK tree from the left.</p>
<p>Even while purging took place, I could see the index_size/leaf_pages values dropping, until, finally:</p>
<blockquote>
<pre>mysql&gt; SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';
+--------------+------------+--------------+--------+--------------+------------+------------+
| table_schema | table_name | index_name   | fields | row_per_keys | index_size | leaf_pages |
+--------------+------------+--------------+--------+--------------+------------+------------+
| newsminer    | logs       | PRIMARY      |      1 | 1            |      40961 |      35262 |
| newsminer    | logs       | s            |      2 | 26, 1        |      34440 |       3798 |
| newsminer    | logs       | name         |      3 | 341011, 4, 1 |       4738 |       2774 |
| newsminer    | logs       | origin       |      3 | 341011, 4, 2 |      10178 |       3281 |
+--------------+------------+--------------+--------+--------------+------------+------------+</pre>
</blockquote>
<p>The number of deleted rows was roughly <strong>85%</strong> of total rows, leaving about <strong>15%</strong> of the rows.</p>
<h4>Status after OPTIMIZE TABLE</h4>
<p>Time to see whether <strong>OPTIMIZE</strong> really optimizes! Will it reduce number of leaf pages in PK? In secondary keys?</p>
<blockquote>
<pre>mysql&gt; OPTIMIZE TABLE logs;
...
mysql&gt; SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';
+--------------+------------+--------------+--------+--------------+------------+------------+
| table_schema | table_name | index_name   | fields | row_per_keys | index_size | leaf_pages |
+--------------+------------+--------------+--------+--------------+------------+------------+
| newsminer    | logs       | PRIMARY      |      1 | 1            |      40436 |      35323 |
| newsminer    | logs       | s            |      2 | 16, 1        |       5489 |       4784 |
| newsminer    | logs       | name         |      3 | 335813, 7, 1 |       3178 |       2749 |
| newsminer    | logs       | origin       |      3 | 335813, 5, 2 |       3951 |       3446 |
+--------------+------------+--------------+--------+--------------+------------+------------+
4 rows in set (0.00 sec)</pre>
</blockquote>
<p>The above shows no significant change in the number of index pages for any of the indexes: <strong>leaf_pages</strong> (the major factor here) and the statistics (<strong>row_per_keys</strong>) remain roughly the same, and for the <strong>PRIMARY</strong> key so does <strong>index_size</strong>. The reported <strong>index_size</strong> of the secondary indexes did drop (for example, from <strong>34440</strong> to <strong>5489</strong> for <strong>s</strong>), but the <strong>OPTIMIZE</strong> did not reduce the number of leaf pages. Some <strong>leaf_pages</strong> values have even increased, but by a small enough margin to be considered equal.</p>
<p>Index-wise, the above example does not show an advantage to using <strong>OPTIMIZE</strong>. I confess, I was surprised. And for the better. This indicates InnoDB does a good job of merging index pages after massive purging.</p>
<h4>So, no use for OPTIMIZE?</h4>
<p>Think again: file system-wise, things look different.</p>
<p>Before purging of data:</p>
<blockquote>
<pre>bash:~# ls -l logs.* -h
-rw-r----- 1 mysql mysql 8.6K 2010-08-15 17:40 logs.frm
-rw-r----- 1 mysql mysql 2.9G 2010-09-02 14:01 logs.ibd</pre>
</blockquote>
<p>After purging of data:</p>
<blockquote>
<pre>bash:~# ls -l logs.* -h
-rw-r----- 1 mysql mysql 8.6K 2010-08-15 17:40 logs.frm
-rw-r----- 1 mysql mysql 2.9G 2010-09-02 14:21 logs.ibd</pre>
</blockquote>
<p>Recall that InnoDB never releases tablespace back to the file system!</p>
<p>After <strong>OPTIMIZE</strong> on table:</p>
<blockquote>
<pre>bash:~# ls -l logs.* -h
-rw-rw---- 1 mysql mysql 8.6K 2010-09-02 14:26 logs.frm
-rw-rw---- 1 mysql mysql 428M 2010-09-02 14:43 logs.ibd</pre>
</blockquote>
<p>With <strong>innodb_file_per_table</strong> enabled, an <strong>OPTIMIZE</strong> creates a new tablespace, and the old one gets destroyed. Space goes back to the file system. Don't know about you; I like to have my file system with as much free space as possible.</p>
<h4>Need to verify</h4>
<p>I've tested Percona Server, since this is where I can find <strong>INNODB_INDEX_STATS</strong>. But this raises the following questions:</p>
<ul>
<li>Perhaps the results only apply for Percona Server? (I'm guessing not).</li>
<li>Or only for InnoDB plugin? Does the same hold for "builtin" InnoDB? (dunno)</li>
<li>Only on &gt;= 5.1? (Maybe; 5.0 is becoming rare now anyway)</li>
<li>Only on InnoDB? (Well, of course this test is storage engine dependent!)</li>
</ul>
<h4>Conclusion</h4>
<p>The use case above is a particular example. Other use cases may include tables where deletions often occur in the middle of the table (remember we were trimming the tree from the left side only). Yet others may need to handle <strong>UPDATE</strong>s to indexed columns. I have some more operations to do here, with larger tables (e.g. <strong>40GB</strong> compressed). If anything changes, I'll drop a note.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/how-often-should-you-use-optimize-table-followup/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Views: better performance with condition pushdown</title>
		<link>http://code.openark.org/blog/mysql/views-better-performance-with-condition-pushdown</link>
		<comments>http://code.openark.org/blog/mysql/views-better-performance-with-condition-pushdown#comments</comments>
		<pubDate>Thu, 20 May 2010 05:17:05 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Execution plan]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Stored routines]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1328</guid>
		<description><![CDATA[Justin's A workaround for the performance problems of TEMPTABLE views post on mysqlperformanceblog.com reminded me of a solution I once saw on a customer's site. The customer was using nested views structure, up to depth of some 8-9 views. There were a lot of aggregations along the way, and even the simplest query resulted with [...]]]></description>
			<content:encoded><![CDATA[<p>Justin's <a href="http://www.mysqlperformanceblog.com/2010/05/19/a-workaround-for-the-performance-problems-of-temptable-views/">A workaround for the performance problems of TEMPTABLE views</a> post on <a href="http://www.mysqlperformanceblog.com/">mysqlperformanceblog.com</a> reminded me of a solution I once saw on a customer's site.</p>
<p>The customer was using a nested view structure, up to a depth of some 8-9 views. There were a lot of aggregations along the way, and even the simplest query resulted in a LOT of subqueries, temporary tables, and vast amounts of data, even if only to return a couple of rows.</p>
<p>While we worked to solve this, a developer showed me his own trick. His trick is now impossible to implement, but there's a hack around this.</p>
<p>Let's use the world database to illustrate. Look at the following view definition:<span id="more-1328"></span></p>
<blockquote><pre class="brush: sql; title: ; notranslate">
CREATE
  ALGORITHM=TEMPTABLE
VIEW country_languages AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON (Country.CODE = CountryLanguage.CountryCode)
  GROUP BY
    Country.CODE;
</pre>
</blockquote>
<p>The view presents a list of spoken languages per country. The execution plan for querying this view looks like this:</p>
<blockquote>
<pre>mysql&gt; EXPLAIN SELECT * FROM country_languages;
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref                               | rows | Extra                                        |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
|  1 | PRIMARY     | &lt;derived2&gt;      | ALL    | NULL          | NULL    | NULL    | NULL                              |  233 |                                              |
|  2 | DERIVED     | CountryLanguage | index  | PRIMARY       | PRIMARY | 33      | NULL                              |  984 | Using index; Using temporary; Using filesort |
|  2 | DERIVED     | Country         | eq_ref | PRIMARY       | PRIMARY | 3       | world.CountryLanguage.CountryCode |    1 |                                              |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
</pre>
</blockquote>
<p>And, even if we only want to filter for a single country, we still get the same plan:</p>
<blockquote>
<pre>mysql&gt; EXPLAIN SELECT * FROM country_languages WHERE Code='USA';
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref                               | rows | Extra                                        |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
|  1 | PRIMARY     | &lt;derived2&gt;      | ALL    | NULL          | NULL    | NULL    | NULL                              |  233 | Using where                                  |
|  2 | DERIVED     | CountryLanguage | index  | PRIMARY       | PRIMARY | 33      | NULL                              |  984 | Using index; Using temporary; Using filesort |
|  2 | DERIVED     | Country         | eq_ref | PRIMARY       | PRIMARY | 3       | world.CountryLanguage.CountryCode |    1 |                                              |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
</pre>
</blockquote>
<p>So, we need to scan the entire CountryLanguage and Country tables in order to return results for just one row.</p>
<h4>A non-working solution</h4>
<p>The solution offered by the developer was this:</p>
<blockquote><pre class="brush: sql; title: ; notranslate">
CREATE
  ALGORITHM=MERGE
  VIEW country_languages_non_working AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON
      (Country.CODE = CountryLanguage.CountryCode)
  WHERE
    Country.CODE = @country_code
  GROUP BY Country.CODE;
</pre>
</blockquote>
<p>Followed by:</p>
<blockquote>
<pre>mysql&gt; SET @country_code='USA';
Query OK, 0 rows affected (0.00 sec)

mysql&gt; SELECT * FROM country_languages_non_working;
+------+---------------+----------------------------------------------------------------------------------------------------+
| CODE | country       | languages                                                                                          |
+------+---------------+----------------------------------------------------------------------------------------------------+
| USA  | United States | Chinese,English,French,German,Italian,Japanese,Korean,Polish,Portuguese,Spanish,Tagalog,Vietnamese |
+------+---------------+----------------------------------------------------------------------------------------------------+
</pre>
</blockquote>
<p>So, push down a <strong>WHERE</strong> condition into the view's definition. The session variable @country_code is used to filter rows. In the above simplified code the value is assumed to be set; tweak it as you see fit (using <strong>IFNULL</strong>, for example, or <strong>OR</strong> conditions) to allow for a full scan in case the variable is undefined.</p>
<p>This doesn't work. It used to work a couple of years back, but today you cannot create a view which uses session variables or parameters. It is a restriction imposed on views.</p>
<h4>A workaround</h4>
<p>Justin showed a workaround using an additional table. There is another workaround which does not involve tables, but rather stored routines. Now, this is a patch, and an ugly one. It may not work in future versions of MySQL for all I know. But, here it goes:</p>
<blockquote><pre class="brush: sql; title: ; notranslate">
DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `get_session_country`() RETURNS CHAR(3)
    NO SQL
    DETERMINISTIC
BEGIN
  RETURN @country_code;
END $$
DELIMITER ;

CREATE
  ALGORITHM=MERGE
  VIEW country_languages_2 AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON
      (Country.CODE = CountryLanguage.CountryCode)
  WHERE
    Country.CODE = get_session_country()
  GROUP BY Country.CODE;
</pre>
</blockquote>
<p>And now:</p>
<blockquote>
<pre>mysql&gt; SET @country_code='USA';
Query OK, 0 rows affected (0.00 sec)

mysql&gt; SELECT * FROM country_languages_2;
+------+---------------+----------------------------------------------------------------------------------------------------+
| CODE | country       | languages                                                                                          |
+------+---------------+----------------------------------------------------------------------------------------------------+
| USA  | United States | Chinese,English,French,German,Italian,Japanese,Korean,Polish,Portuguese,Spanish,Tagalog,Vietnamese |
+------+---------------+----------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql&gt; EXPLAIN SELECT * FROM country_languages_2;
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
|  1 | PRIMARY     | &lt;derived2&gt;      | system | NULL          | NULL    | NULL    | NULL |    1 |                          |
|  2 | DERIVED     | Country         | const  | PRIMARY       | PRIMARY | 3       |      |    1 |                          |
|  2 | DERIVED     | CountryLanguage | ref    | PRIMARY       | PRIMARY | 3       |      |    8 | Using where; Using index |
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
</pre>
</blockquote>
<p>Since views are allowed to call stored routines (Justin used this to call upon <strong>CONNECTION_ID()</strong>), and since stored routines can use session variables, we can take advantage and force the view into filtering out irrelevant rows before these accumulate to temporary tables and big joins.</p>
<p>Back in the customer's office, we witnessed, what with their real data and multiple views, a reduction of query times from ~30 minutes to a few seconds.</p>
<h4>Another kind of use</h4>
<p>Eventually we worked to make better view definitions and query splitting, resulting in clearer code and fast queries, but this solution plays nicely into another kind of problem:</p>
<p>Can we force different customers to see different parts of a given table? e.g., only those rows that relate to that particular customer?</p>
<p>There can be many solutions: different tables; multiple views (one per customer), stored procedures, what have you. The above provides a solution, and I've seen it in use.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/views-better-performance-with-condition-pushdown/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Reducing locks by narrowing primary key</title>
		<link>http://code.openark.org/blog/mysql/reducing-locks-by-narrowing-primary-key</link>
		<comments>http://code.openark.org/blog/mysql/reducing-locks-by-narrowing-primary-key#comments</comments>
		<pubDate>Tue, 04 May 2010 06:46:01 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1269</guid>
		<description><![CDATA[In a period of two weeks, I had two cases with the exact same symptoms. Database users were experiencing low responsiveness. DBAs were seeing locks occurring on seemingly normal tables. In particular, looking at Innotop, it seemed that INSERTs were causing the locks. In both cases, tables were InnoDB. In both cases, there was a [...]]]></description>
			<content:encoded><![CDATA[<p>In a period of two weeks, I had two cases with the exact same symptoms.</p>
<p>Database users were experiencing low responsiveness. DBAs were seeing locks occurring on seemingly normal tables. In particular, looking at Innotop, it seemed that <strong>INSERT</strong>s were causing the locks.</p>
<p>In both cases, tables were InnoDB. In both cases, there was a <strong>PRIMARY KEY</strong> on the combination of all <strong>5</strong> columns. And in both cases, there was no clear explanation as to why the <strong>PRIMARY KEY</strong> was chosen as such.</p>
<h4>Choosing a proper PRIMARY KEY</h4>
<p>Especially with InnoDB, which uses clustered index structure, the <strong>PRIMARY KEY</strong> is of particular importance. Besides the fact that a bloated <strong>PRIMARY KEY</strong> bloats the entire clustered index and secondary keys (see: <a href="http://code.openark.org/blog/mysql/the-depth-of-an-index-primer">The depth of an index: primer</a>), it is also a source for locks. It's true that any <strong>UNIQUE KEY</strong> can serve as a <strong>PRIMARY KEY</strong>. But not all such keys are good candidates.<span id="more-1269"></span></p>
<h4>Reducing the locks</h4>
<p>In both described cases, the solution was to add an <strong>AUTO_INCREMENT</strong> column to serve as the <strong>PRIMARY KEY</strong>, and have the <strong>5</strong>-column combination under a secondary <strong>UNIQUE KEY</strong>. The impact was immediate: no further locks on that table were detected, and query responsiveness became very high.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/reducing-locks-by-narrowing-primary-key/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Misimproving performance problems with INSERT DELAYED</title>
		<link>http://code.openark.org/blog/mysql/misimproving-performance-problems-with-insert-delayed</link>
		<comments>http://code.openark.org/blog/mysql/misimproving-performance-problems-with-insert-delayed#comments</comments>
		<pubDate>Thu, 14 Jan 2010 18:58:36 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1745</guid>
		<description><![CDATA[INSERT DELAYED may come in handy when using MyISAM tables. It may in particular be useful for log tables, where one is required to issue frequent INSERTs on one hand, but does not usually want or need to wait for DB response on the other hand. It may even offer some performance boost, by aggregating [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://dev.mysql.com/doc/refman/5.1/en/insert-delayed.html">INSERT DELAYED</a> may come in handy when using MyISAM tables. It may in particular be useful for log tables, where one is required to issue frequent INSERTs on one hand, but does not usually want or need to wait for DB response on the other hand.</p>
<p>It may even offer some performance boost, by aggregating such frequent INSERTs in a single thread.</p>
<p>But it is <strong>NOT</strong> a performance solution.</p>
<p>That is, in a case I've seen, database performance was poor. INSERTs were taking a very long time. Lots of locks were involved. The solution offered was to change all slow INSERTs to INSERT DELAYED. Voila! All INSERT queries now completed in no time.</p>
<p>But the database performance remained poor. Just as poor as before, with an additional headache: nobody knew what caused the low performance.</p>
<p>Using INSERT DELAYED to improve overall INSERT performance is like sweeping the dust under the carpet. It's still there, only you can't actually see it. When your queries are slow to return, you know which queries or which parts of your application are the immediate suspects. When everything happens in the background you lose that feeling.</p>
<p>The slow query log, fortunately, still provides the necessary information, and all the other metrics are just as before. Good. But it now takes a deeper level of analysis to find a problem that was previously in plain sight.</p>
<p>So: use INSERT DELAYED carefully, don't just throw it at your slow queries like a magic potion.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/misimproving-performance-problems-with-insert-delayed/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>On restoring a single table from mysqldump</title>
		<link>http://code.openark.org/blog/mysql/on-restoring-a-single-table-from-mysqldump</link>
		<comments>http://code.openark.org/blog/mysql/on-restoring-a-single-table-from-mysqldump#comments</comments>
		<pubDate>Tue, 01 Dec 2009 08:25:00 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Backup]]></category>
		<category><![CDATA[Books]]></category>
		<category><![CDATA[mysqldump]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[scripts]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1630</guid>
		<description><![CDATA[Following Restore one table from an ALL database dump and Restore a Single Table From mysqldump, I would like to add my own thoughts and comments on the subject. I also wish to note performance issues with the two suggested solutions, and offer improvements. Problem relevance While the problem is interesting, I just want to [...]]]></description>
			<content:encoded><![CDATA[<p>Following <a href="http://everythingmysql.ning.com/profiles/blogs/restore-one-table-from-an-all">Restore one table from an ALL database dump</a> and <a href="http://gtowey.blogspot.com/2009/11/restore-single-table-from-mysqldump.html">Restore a Single Table From mysqldump</a>, I would like to add my own thoughts and comments on the subject.</p>
<p>I also wish to note performance issues with the two suggested solutions, and offer improvements.</p>
<h4>Problem relevance</h4>
<p>While the problem is interesting, I just want to note that it is relevant in very specific database dimensions. Too small - and it doesn't matter how you solve it (e.g. just open vi/emacs and copy+paste). Too big - and it would not be worthwhile to restore from <em>mysqldump</em> anyway. I would suggest that the problem is interesting in the whereabouts of a few dozen GB worth of data.</p>
<h4>Problem recap</h4>
<p>Given a dump file (generated by mysqldump), how do you restore a single table, without making any changes to other tables?</p>
<p>Let's review the two referenced solutions. I'll be using the <a href="http://dev.mysql.com/doc/employee/en/employee.html">employees db</a> on <a href="https://launchpad.net/mysql-sandbox">mysql-sandbox</a> for testing. I'll choose a very small table to restore: <strong>departments</strong> (only a few rows in this table).</p>
<h4>Security based solution</h4>
<p><a href="http://everythingmysql.ning.com/profiles/blogs/restore-one-table-from-an-all"><strong>Chris</strong></a> offers to create a special purpose account, which will only have write (CREATE, INSERT, etc.) privileges on the particular table to restore. Cool hack! But, I'm afraid, not too efficient, for two reasons:<span id="more-1630"></span></p>
<ol>
<li>MySQL needs to process all irrelevant queries (ALTER, INSERT, ...) only to disallow them due to access violation errors.</li>
<li>Assuming restore is from remote host, we overload the network with all said irrelevant queries.</li>
</ol>
<p>Just how inefficient? Let's time it:</p>
<blockquote>
<pre>mysql&gt; grant usage on *.* to 'restoreuser'@'localhost';
mysql&gt; grant select on *.* to 'restoreuser'@'localhost';
mysql&gt; grant all on employees.departments to 'restoreuser'@'localhost';

$ time mysql --user=restoreuser --socket=/tmp/mysql_sandbox21701.sock --force employees &lt; /tmp/employees.sql
...
ERROR 1142 (42000) at line 343: INSERT command denied to user 'restoreuser'@'localhost' for table 'titles'
ERROR 1142 (42000) at line 344: ALTER command denied to user 'restoreuser'@'localhost' for table 'titles'
...
(lots of these messages)
...

real    <strong>0m31.945s</strong>
user    0m6.328s
sys     0m0.508s</pre>
</blockquote>
<p>So, about <strong>30</strong> seconds to restore a 9-row table.</p>
<h4>Text filtering based solution</h4>
<p><a href="http://gtowey.blogspot.com/2009/11/restore-single-table-from-mysqldump.html"><strong>gtowey</strong></a> offers parsing the dump file beforehand:</p>
<ul>
<li>First, parse with <em>grep</em>, to detect rows where tables are referenced within the dump file.</li>
<li>Second, parse with <em>sed</em>, extracting relevant rows.</li>
</ul>
<p>Let's time this one:</p>
<blockquote>
<pre>$ time grep -n 'Table structure' /tmp/employees.sql
23:-- Table structure for table `departments`
48:-- Table structure for table `dept_emp`
89:-- Table structure for table `dept_manager`
117:-- Table structure for table `employees`
161:-- Table structure for table `salaries`
301:-- Table structure for table `titles`

real    <strong>0m0.397s</strong>
user    0m0.232s
sys     0m0.164s

$ time sed -n 23,48p /tmp/employees.sql | ./use employees

real    <strong>0m0.562s</strong>
user    0m0.380s
sys     0m0.176s</pre>
</blockquote>
<p>Much faster: about <strong>1</strong> second, compared to <strong>30</strong> seconds from above.</p>
<p>Nevertheless, I find two issues here:</p>
<ol>
<li>A correctness problem: this solution somewhat assumes that there's only a single table with the desired name. I say "somewhat" since it leaves this for the user.</li>
<li>An efficiency problem: it reads the dump file <em>twice</em>. First parsing it with <em>grep</em>, then with <em>sed</em>.</li>
</ol>
<h4>A third solution</h4>
<p><em>sed</em> is much stronger than presented. In fact, the inquiry made by <em>grep</em> in gtowey's solution can be easily handled by <em>sed</em>:</p>
<blockquote>
<pre>$ time sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" /tmp/employees.sql | ./use employees

real    <strong>0m0.573s</strong>
user    0m0.416s
sys     0m0.152s</pre>
</blockquote>
<p>So, the <strong>"/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p"</strong> part tells <em>sed</em> to only print those rows starting from the <strong>departments</strong> table structure, and ending in the next table structure (this is for clarity: had <strong>departments</strong> been the last table, there would not be a next table, but we could nevertheless solve this using other anchors).</p>
<p>And, we only do it in <strong>0.57</strong> seconds: about half the time of the previous attempt.</p>
<p>Now, just to be more correct, we only wish to consider the <strong>employees.departments</strong> table. So, <em>assuming</em> there's more than one database dumped (and, by consequence, <strong>USE</strong> statements in the dump-file), we use:</p>
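<p>To see what the range expression selects, here is a minimal sketch that runs it against a toy dump. The <em>toy_dump</em> and <em>extract_table</em> function names and the six-line dump contents are made up for illustration; a real mysqldump file has far more in it. Two properties of <em>sed</em> ranges are worth noting: the closing line (the next table's header comment) is printed as well, and if the closing pattern never matches, as with the last table in the file, <em>sed</em> prints through to the end of input.</p>

```shell
#!/bin/sh
# Toy stand-in for a mysqldump file (hypothetical contents, for illustration only).
toy_dump() {
  printf '%s\n' \
    '-- Table structure for table `departments`' \
    'CREATE TABLE `departments` (dept_no CHAR(4));' \
    "INSERT INTO \`departments\` VALUES ('d001');" \
    '-- Table structure for table `dept_emp`' \
    'CREATE TABLE `dept_emp` (emp_no INT);' \
    "INSERT INTO \`dept_emp\` VALUES (10001);"
}

# Print from the header of table $1 up to and including the next table header.
# If no later header exists, the range runs to the end of input.
extract_table() {
  sed -n "/^-- Table structure for table \`$1\`/,/^-- Table structure for table/p"
}

# A table in the middle of the dump: its block plus the following header line.
toy_dump | extract_table departments

# The last table in the dump: everything from its header to end of input.
toy_dump | extract_table dept_emp
```

<p>The first call prints four lines (the departments block plus the next header, which is a harmless SQL comment); the second prints the three remaining lines.</p>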
<blockquote>
<pre>cat /tmp/employees.sql | sed -n "/^USE \`employees\`/,/^USE \`/p" | sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" | ./use employees</pre>
</blockquote>
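<p>The two-stage filter can be checked the same way. The sketch below assumes a dump holding two databases that each contain a <strong>departments</strong> table; the <em>archive</em> database, the row values and the function names are hypothetical. The first <em>sed</em> narrows the stream to the wanted database, the second picks the table within it.</p>

```shell
#!/bin/sh
# Toy multi-database dump (hypothetical contents, for illustration only).
toy_multi_dump() {
  printf '%s\n' \
    'USE `archive`;' \
    '-- Table structure for table `departments`' \
    "INSERT INTO \`departments\` VALUES ('old');" \
    'USE `employees`;' \
    '-- Table structure for table `departments`' \
    "INSERT INTO \`departments\` VALUES ('d001');" \
    '-- Table structure for table `dept_emp`' \
    "INSERT INTO \`dept_emp\` VALUES (10001);"
}

# $1 = database name, $2 = table name.
# Stage 1 keeps lines from "USE `db`" to the next USE statement (or end of input);
# stage 2 keeps the wanted table's block within that slice.
extract_db_table() {
  sed -n "/^USE \`$1\`/,/^USE \`/p" |
    sed -n "/^-- Table structure for table \`$2\`/,/^-- Table structure for table/p"
}

toy_multi_dump | extract_db_table employees departments
```

<p>Only the <strong>employees</strong> copy of the table comes out; the row from the other database is skipped.</p>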
<h4>Further notes</h4>
<ul>
<li>All tests used warmed-up caches.</li>
<li>Sharp-eyed readers will notice that <strong>departments</strong> is the first table in the dump file. Would that give an unfair advantage to the parsing-based restore methods? The answer is no. I've created an <strong>xdepartments</strong> table, to be located at the end of the dump. The difference in time is negligible and inconclusive; we're still at ~0.58-0.59 seconds. The effect will be more visible on really large dumps; but then, so would the security-based effects.</li>
</ul>
<p>[<strong>UPDATE</strong>: see also following similar post: <a href="http://blog.tsheets.com/2008/tips-tricks/extract-a-single-table-from-a-mysqldump-file.html">Extract a Single Table from a mysqldump File</a>]</p>
<h4>Conclusion</h4>
<p><a href="http://www.amazon.com/Classic-Shell-Scripting-Arnold-Robbins/dp/0596005954/ref=sr_1_1"><img class="alignright" title="classic-shell-scripting" src="http://code.openark.org/blog/wp-content/uploads/2009/12/classic-shell-scripting.png" alt="classic-shell-scripting" width="144" height="189" /></a>It is always best to test on large datasets, to get a feel for performance.</p>
<p>It's best to save MySQL the trouble of parsing &amp; ignoring statements. Scripting utilities like <em>sed</em>, <em>awk</em> &amp; <em>grep</em> have been around for ages, and are well optimized. They excel at text processing.</p>
<p>I've used <em>sed</em> many times in transforming dump outputs; for example, in converting MyISAM to InnoDB tables; to convert Antelope InnoDB tables to Barracuda format, etc. grep &amp; awk are also very useful.</p>
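<p>As an illustration of such a transformation, here is a minimal sketch of an engine conversion. It assumes the dump writes the table options on a line beginning with <strong>) ENGINE=</strong>, as in the toy input below; the <em>toy_create</em> and <em>to_innodb</em> names and the table itself are made up. Anchoring the pattern at the start of the line leaves row data untouched, even if it happens to contain the same text.</p>

```shell
#!/bin/sh
# Toy fragment of a dump (hypothetical table, for illustration only).
toy_create() {
  printf '%s\n' \
    'CREATE TABLE `t` (' \
    '  `note` VARCHAR(32) NOT NULL' \
    ') ENGINE=MyISAM DEFAULT CHARSET=utf8;' \
    "INSERT INTO \`t\` VALUES ('ENGINE=MyISAM');"
}

# Rewrite the storage engine only on the line that closes a CREATE TABLE.
to_innodb() {
  sed 's/^) ENGINE=MyISAM/) ENGINE=InnoDB/'
}

toy_create | to_innodb
```

<p>In real use the filter would sit between the dump file and the <em>mysql</em> client, in the same way as the pipelines above.</p>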
<p>May I recommend, at this point, reading <a href="http://www.amazon.com/Classic-Shell-Scripting-Arnold-Robbins/dp/0596005954/ref=sr_1_1">Classic Shell Scripting</a>, a very easy to follow book, which lists the most popular command line utilities like <em>grep</em>, <em>sed</em>, <em>awk</em>, <em>sort</em>, (countless more) and shell scripting in general. While most of these utilities are well known, the book excels in providing surprisingly practical, simple solutions to common tasks.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/on-restoring-a-single-table-from-mysqldump/feed</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Performance analysis with mycheckpoint</title>
		<link>http://code.openark.org/blog/mysql/performance-analysis-with-mycheckpoint</link>
		<comments>http://code.openark.org/blog/mysql/performance-analysis-with-mycheckpoint#comments</comments>
		<pubDate>Thu, 12 Nov 2009 10:47:00 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Analysis]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[mycheckpoint]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1568</guid>
		<description><![CDATA[mycheckpoint (see announcement) allows for both graph presentation and quick SQL access to monitored &#38; analyzed data. I'd like to show the power of combining them both. InnoDB performance Taking a look at one of the most important InnoDB metrics: the read hit ratio (we could get the same graph by looking at the HTML [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://code.openark.org/forge/mycheckpoint">mycheckpoint</a> (see <a href="http://code.openark.org/blog/mysql/announcing-mycheckpoint-lightweight-sql-oriented-monitoring-for-mysql">announcement</a>) allows for both graph presentation and quick SQL access to monitored &amp; analyzed data. I'd like to show the power of combining them both.</p>
<h4>InnoDB performance</h4>
<p>Taking a look at one of the most important InnoDB metrics: the read hit ratio (we could get the same graph by looking at the <a href="http://code.openark.org/forge/mycheckpoint/documentation/generating-html-reports">HTML report</a>):</p>
<blockquote>
<pre>SELECT innodb_read_hit_percent FROM sv_report_chart_sample \G
*************************** 1. row ***************************
innodb_read_hit_percent: http://chart.apis.google.com/chart?cht=lc&amp;chs=400x200&amp;chts=303030,12&amp;chtt=Nov+10,+11:40++-++Nov+11,+08:55+(0+days,+21+hours)&amp;chdl=innodb_read_hit_percent&amp;chdlp=b&amp;chco=ff8c00&amp;chd=s:400664366P6674y7176677677u467773y64ux166666764366646y616666666666644444434444s6u4S331444404433341334433646777666666074736777r1777767764776666F667777617777777777777777yaRi776776mlf667676xgx776766rou67767777u37797777x76676776u6A737464y67467761777666643u66446&amp;chxt=x,y&amp;chxr=1,99.60,100.00&amp;chxl=0:||Nov+10,+15:55|Nov+10,+20:10|Nov+11,+00:25|Nov+11,+04:40|&amp;chxs=0,505050,10</pre>
</blockquote>
<blockquote>
<pre><img class="alignnone" title="innodb_read_hit_percent" src="http://chart.apis.google.com/chart?cht=lc&amp;chs=400x200&amp;chts=303030,12&amp;chtt=Nov+10,+11:40++-++Nov+11,+08:55+(0+days,+21+hours)&amp;chdl=innodb_read_hit_percent&amp;chdlp=b&amp;chco=ff8c00&amp;chd=s:400664366P6674y7176677677u467773y64ux166666764366646y616666666666644444434444s6u4S331444404433341334433646777666666074736777r1777767764776666F667777617777777777777777yaRi776776mlf667676xgx776766rou67767777u37797777x76676776u6A737464y67467761777666643u66446&amp;chxt=x,y&amp;chxr=1,99.60,100.00&amp;chxl=0:||Nov+10,+15:55|Nov+10,+20:10|Nov+11,+00:25|Nov+11,+04:40|&amp;chxs=0,505050,10" alt="" width="400" height="200" /></pre>
</blockquote>
<p>We see that read hit is usually high, but occasionally drops low, down to 99.7, or even 99.6. But it seems like most of the time we are above 99.95% read hit ratio. It's hard to tell about 99.98%.</p>
<h4>Can we know for sure?</h4>
<p>We can stress our eyes, yet be certain of little. It's best if we just query for the metrics! <em>mycheckpoint</em> provides all data, accessible by simple SQL queries:<span id="more-1568"></span></p>
<blockquote>
<pre>SELECT SUM(innodb_read_hit_percent &gt; 99.95)/count(*)
  FROM sv_report_sample;
+-----------------------------------------------+
| SUM(innodb_read_hit_percent &gt; 99.95)/count(*) |
+-----------------------------------------------+
|                                        0.7844 |
+-----------------------------------------------+</pre>
</blockquote>
<p>Yes, most of the time we're above 99.95% read hit ratio: but not too often!</p>
<p>I'm more interested in seeing how much time my server's above 99.98% read hit:</p>
<blockquote>
<pre>SELECT SUM(innodb_read_hit_percent &gt; 99.98)/count(*)
  FROM sv_report_sample;
+-----------------------------------------------+
| SUM(innodb_read_hit_percent &gt; 99.98)/count(*) |
+-----------------------------------------------+
|                                        0.3554 |
+-----------------------------------------------+</pre>
</blockquote>
<p>We can see the server is above 99.98% read hit ratio only about 35% of the time. Need to work on that!</p>
<h4>Disk activity</h4>
<p>Lower read hit percent means higher number of disk reads; that much is obvious. The first two following graphs present this obvious connection. But the third graph tells us another fact: with increased disk I/O, we can expect more (and longer) locks.</p>
<p>Again, this should be very intuitive, when thinking about it this way. The problem sometimes arises when we try to analyze it the other way round: "Hey! InnoDB has a lot of locks! What are we going to do about it?". Many times, people will look for answers in their <em>transactions</em>, their <em>Isolation Level</em>, their <em>LOCK IN SHARE MODE</em> clauses. But the simple answer can be: "There's a lot of I/O, so everything has to wait; therefore we increase the probability for locks; therefore there are more locks".</p>
<p>The answer, then, is to reduce I/O. The usual stuff: slow queries; indexing; ... and, yes, perhaps transactions or tuning.</p>
<p>The charts below make it quite clear that we have an issue of excessive reads -&gt; less read hit -&gt; increased I/O -&gt; more locks.</p>
<blockquote>
<pre><img class="alignnone" title="DML" src="http://chart.apis.google.com/chart?cht=lc&amp;chs=400x200&amp;chts=303030,12&amp;chtt=Oct+31,+17:00++-++Nov+11,+08:00+(10+days,+15+hours)&amp;chdl=com_select_psec|com_insert_psec|com_delete_psec|com_update_psec|com_replace_psec&amp;chdlp=b&amp;chco=ff8c00,4682b4,9acd32,dc143c,9932cc&amp;chd=s:IJJJJJKHGHGHGHHHHHIIIJJJKKKLKLLIHHHIHIHIIIJJJJJKKKLLLLMIHHHHHHHIIIIIJJKKKLLLLMMIHIIIIIIIIIIJJJJKKLLLLMMIHHHIHIHIIIJJJJJKKLKLLLLIHHHIHIHIIIIJIJJJJKKKKKKHHHHHHHHHHHHIIIIJIJJJJJKHHHHHHHHHHIIJJNKLLKKLLLMSMHHIHHHIOSae9RNPJIIJJJKHGGGHGGHHHHHJJKJLKLLLMKMJHIIIIIII,EEEEEEEFEEEEEEEFEFFFFFFFFFFFFFFEEFFEEEEEFFFFFFFFFFFGFFFGFFFFEEEEFFFFFFGGFGFFFFFGEFFFEEFFFFFFFFFFFFFFFFFGEEEEEEEEFFFGGFFFFGFFFFFGEEEFFEEFFEFFEEFFFFFFFEEFEEEEEEEEEEEEEEFFEEEEFEEGEEEEEEEEEFFFFFFFFFEFFFFHEEEEEEEFFFFFFFFFFEEEEFEHEFEEEEEEEEFFFGFGGFFFFFFIEEEEEEEF,CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDCCCCCCDDCCCCCCCCCCCCCCCDCCCCCCCCCCCCCCCCCCDCCCCECCBCBCCCCCCCCCCDCCCCCDCECCCCCCCCCCCCCCCCCCCDCCCDCCCCBBCCCCDCCDCCCCCCCCCECCCBBCCCCCCCCCCCCCCCCCCFCCBCCCCCCCCCCCCCCCCCCCCFCCCCBCCCCCCCDCCCCDCCDCCGCCCCCCCD,CBCCCBBCBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCBCCCCCCCCCCCCCCBBBCCCCCCBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCBBBBBBCBBBBBBBBCBBCCCCCCCCCCCCCCCCCCCCC,AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA&amp;chxt=x,y&amp;chxr=1,0,680.42&amp;chxl=0:||Nov+2,+20:00|Nov+4,+23:00|Nov+7,+02:00|Nov+9,+05:00|&amp;chxs=0,505050,10" alt="" width="400" height="200" />

<img class="alignnone" title="innodb_read_hit_percent" src="http://chart.apis.google.com/chart?cht=lc&amp;chs=400x200&amp;chts=303030,12&amp;chtt=Oct+31,+17:00++-++Nov+11,+08:00+(10+days,+15+hours)&amp;chdl=innodb_read_hit_percent&amp;chdlp=b&amp;chco=ff8c00&amp;chd=s:8p879mq7z1377377777z788863778839z13773877633697786888969z1379377667275377376672813167266771288716689y759121685885785236675889869232685789w63y69997989999252696878y8698878588886933368587ffpibibaTYRfVAdXjqfdmbYneRhciXYcifb6995802z56377666576877268875913278387&amp;chxt=x,y&amp;chxr=1,99.44,99.99&amp;chxl=0:||Nov+2,+20:00|Nov+4,+23:00|Nov+7,+02:00|Nov+9,+05:00|&amp;chxs=0,505050,10" alt="" width="400" height="200" />

<img class="alignnone" title="innodb_io" src="http://chart.apis.google.com/chart?cht=lc&amp;chs=400x200&amp;chts=303030,12&amp;chtt=Oct+31,+17:00++-++Nov+11,+08:00+(10+days,+15+hours)&amp;chdl=innodb_buffer_pool_reads_psec|innodb_buffer_pool_pages_flushed_psec&amp;chdlp=b&amp;chco=ff8c00,4682b4&amp;chd=s:DYDDBZVEPMJEFKFGGGEOEDDDEJDECBGBONKEFJEFFFIHFCDDCECCCBECSPLECJEFGHFLDGHEDHEEEDHCMJMGFLGHFELMDCDLEFDBRDFBKIKECIDEHEDHJHFEDGCDCDFCKJKFEJFECRGHNFCCBFBECBCCLHKFBGDEDUEGBCCEEHDCDDFBJJJFEIDFwrfpthozmqqcn3g9hYjkbpqdsvhxormohorGCBHDOLNGEHEDFFGIEFDEDKFDDDGCLIJECIDE,EEEEEGEFSWUFFEGHKIHHHGGHJIHHHHGGQbTFFEFFHGGGGFFHHHGHGFFGUdYGDEGFJKIHHHLKJJJIHHHGHZQRGFGHIHIGGGHIIIGHFFFEHYPNCEFEHHIIIKKJIJHHGIFFGbSPFGJIGFGGGEFFEFEFFEEGSIUODCDFHGGFEEGGGGGGGHFFGYPNDFFGJHIJJIJIHHGFFFEHHVSLCDGIHIHGIHGGFGFFFGIJTRSMEEFFGHHIIIHJKLKHIHGHPNNMCFGE&amp;chxt=x,y&amp;chxr=1,0,151.44&amp;chxl=0:||Nov+2,+20:00|Nov+4,+23:00|Nov+7,+02:00|Nov+9,+05:00|&amp;chxs=0,505050,10" alt="" width="400" height="200" />

<img class="alignnone" title="innodb_row_lock_waits_psec" src="http://chart.apis.google.com/chart?cht=lc&amp;chs=400x200&amp;chts=303030,12&amp;chtt=Oct+31,+17:00++-++Nov+11,+08:00+(10+days,+15+hours)&amp;chdl=innodb_row_lock_waits_psec&amp;chdlp=b&amp;chco=ff8c00&amp;chd=s:GWFGGYSHQJKFGJHIHIHNGHGGGKHHFFGGMMJFFKHHHKMIHGIIJGGFGFHGTNOGFJIHGJGKGJHGGGGFHFJFPIKFFJHJIFLKGGFIFGGEVEJGPILGFIGHJJHIJKGGFJGLIGKGJMSGGIGIGVGGQJGHHKHIGHFGLHMHFIFIGQGGIFGJHIEEHFHGKLJHGGGIYgTVXOaXabSUadW9gVfRSeaQbfalXeYcXTiGHIKHKEJEFFFGGGIGGGGHGKGGHGLGPJHGFJEG&amp;chxt=x,y&amp;chxr=1,0,1.42&amp;chxl=0:||Nov+2,+20:00|Nov+4,+23:00|Nov+7,+02:00|Nov+9,+05:00|&amp;chxs=0,505050,10" alt="" width="400" height="200" /></pre>
</blockquote>
<p>By the way, the above resulted from a problematic query that caused all slaves to stop replicating. The slaves participated in read-balancing, so when they went stale, all reads were directed at the master (the monitored node).</p>
<h4>You have the metrics at your disposal</h4>
<p>Looking at the following chart:</p>
<blockquote>
<pre><img class="alignnone" title="questions" src="http://chart.apis.google.com/chart?cht=lc&amp;chs=400x200&amp;chts=303030,12&amp;chtt=Nov+11,+15:15++-++Nov+12,+12:30+(0+days,+21+hours)&amp;chdl=queries_psec|questions_psec|slow_queries_psec|com_commit_psec|com_set_option_psec&amp;chdlp=b&amp;chco=ff8c00,4682b4,9acd32,dc143c,9932cc&amp;chd=s:pqpviksvvz0vuxxxjpw0mwpkkso1vhuvn0nnrtx2uisrnvoknmusomqvlyymsvpuweqslwumkomutcromzrinukvwcuzotujjto1shrtszqlu849mXenejkaZlZhcYbgciaZZegecUWhZkaYWebfaXVaecdZUZgdbSbccbcTXYeaaTYZfZeVjbnZhRdegcfYorkdmVadqenfcknkoadeuhrjcbptpkhkqkrqfjprrtmllmnqdwsusojoo0qtpwp4,abfjVQebgjaWeedkWRgcelWUcdclYUhddnaVidendUieflaUhcfkdRfefmgSjianfPkcdmfUegamfRmcfmgTgegmhQghgmgWeiepfShfhqjcqzwzfVYdbfbUXbXcYSYaYdVWUZabWTTbXeYUVZYdVTUYabYWUYbbXSWaYaYSSXZZVTVZaXZURZaWbQWZaaYWUWbaZSUadadVUcbbbTWYeabXUUcebUYbabdVUYbbdTWaaccWTddaeTWbdbgXXdci,AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA,AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA,MLOPJHNMNPLKNNNQJHOMNQJJMMMQLIPMNSLJPNNRNJPNNRKJPMNQNHONNRNIPPLROHQNMRNJOOMRNHRMNRNJONOROHOPOROKNPNTOIPOOTOMQUSUNJKNLOLJKMKMKIKLKMJJILLMKIIMKNJJJLKNIJJKKMKJIKLMKHJLKLKIIKKLJIJLKKKJHLLKLHJLKLKJIKLLKIJLMLMKJMMMMIJKMLLKIJMNLJKMLMMJJKLMNIKLLMMKIMMLNIKMMMOKKNMP&amp;chxt=x,y&amp;chxr=1,0,916.47&amp;chxl=0:||Nov+11,+19:30|Nov+11,+23:45|Nov+12,+04:00|Nov+12,+08:15|&amp;chxs=0,505050,10" alt="" width="400" height="200" /></pre>
</blockquote>
<p>It appears that there are no slow queries. But this may be misleading: perhaps there are just a few, which don't show due to the chart's large scale?</p>
<p>One could argue that this is the chart's fault. Perhaps there should be a distinct chart for "slow queries percent". Perhaps I'll add one. But we can't have special charts for everything. It would be too tiresome to look at hundreds of charts.</p>
<p>Anyway, my point is: let's verify just how many slow queries we have:</p>
<blockquote>
<pre>SELECT slow_queries_psec FROM sv_hour ORDER BY id DESC;
+-------------------+
| slow_queries_psec |
+-------------------+
|              3.05 |
|              3.83 |
|              4.39 |
|              4.03 |
|              3.86 |
|              3.56 |
|              3.73 |
|              3.79 |
|              3.58 |
|              3.55 |
...
+-------------------+</pre>
</blockquote>
<p>So, between 3 and 4.5 slow queries per second. It doesn't look too good in this light. Checking on the percentage of slow queries (of total questions):</p>
<blockquote>
<pre>SELECT ROUND(100*slow_queries_diff/questions_diff, 1) AS slow_queries_percent
  FROM sv_hour ORDER BY id DESC LIMIT 10;</pre>
</blockquote>
<p>Or, since the above calculation is pre-defined in the reports tables:</p>
<blockquote>
<pre>SELECT slow_queries_percent FROM sv_report_hour_recent;
+----------------------+
| slow_queries_percent |
+----------------------+
|                  0.8 |
|                  1.0 |
|                  1.2 |
|                  1.2 |
|                  1.1 |
|                  1.0 |
|                  1.1 |
|                  1.1 |
|                  1.0 |
...
+----------------------+</pre>
</blockquote>
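<p>A rough sanity check on the scale issue, as a minimal sketch: the chart's y axis tops out at 916.47 (the <em>chxr</em> parameter in the chart's URL) and the image is 200 pixels high. Taking 3.5 as a representative <em>slow_queries_psec</em> value (an assumed figure, picked from the range listed above) and assuming, generously, that the entire image height is available for plotting:</p>
<blockquote>
<pre>-- Approximate height, in pixels, of the slow queries line above the x axis.
-- 3.5 is an assumed representative value; 916.47 is the chart's y axis maximum;
-- 200 is the image height, which overstates the actual plotting area.
SELECT ROUND(200 * 3.5 / 916.47, 1) AS approx_line_height_px;</pre>
</blockquote>
<p>This comes to less than a single pixel, so on the chart the line is indistinguishable from zero.</p>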
<h4>Accessible data</h4>
<p>This is what I've been trying to achieve with <em>mycheckpoint</em>. As a DBA, consultant and SQL geek I find that direct SQL access works best for me. It's like preferring a command line interface over GUI tools. Direct SQL gives you so much more control and information.</p>
<p>Charting is important, since it's easy to watch and get first impressions, or find extreme changes. But beware of relying on charts all the time. Scale issues, misleading human interpretation, technology limitations - all these make charts inaccurate.</p>
<p><a href="http://code.openark.org/forge/mycheckpoint">mycheckpoint</a> allows for both methods, and, I believe, intuitively so.</p>
<p>&lt;/propaganda&gt;</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/performance-analysis-with-mycheckpoint/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to calculate a good InnoDB log file size - recap</title>
		<link>http://code.openark.org/blog/mysql/how-to-calculate-a-good-innodb-log-file-size-recap</link>
		<comments>http://code.openark.org/blog/mysql/how-to-calculate-a-good-innodb-log-file-size-recap#comments</comments>
		<pubDate>Tue, 20 Oct 2009 19:04:40 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[INFORMATION_SCHEMA]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=895</guid>
		<description><![CDATA[Following Baron Schwartz' post: How to calculate a good InnoDB log file size, which shows how to make an estimate for the InnoDB log file size, and based on SQL: querying for status difference over time, I've written a query to run on MySQL 5.1, which, upon sampling 60 seconds of status, estimates the InnoDB [...]]]></description>
			<content:encoded><![CDATA[<p>Following Baron Schwartz' post: <a href="http://www.mysqlperformanceblog.com/2008/11/21/how-to-calculate-a-good-innodb-log-file-size/">How to calculate a good InnoDB log file size</a>, which shows how to make an estimate for the InnoDB log file size, and based on <a href="http://code.openark.org/blog/mysql/sql-querying-for-status-difference-over-time">SQL: querying for status difference over time</a>, I've written a query to run on MySQL 5.1, which, upon sampling 60 seconds of status, estimates the InnoDB transaction log bytes that are expected to be written in the period of 1 hour.</p>
<p><em>Recap</em>: this information can be useful if you're looking for a good <strong>innodb_log_file_size</strong> value, one that will neither cause too much I/O (smaller values make for more frequent flushes), nor make for too long a recovery time (larger values mean more transactions to recover upon crash).</p>
<p>It is assumed that the 60 seconds period represents an average system load, not some activity spike period. Edit the sleep time and factors as you will to sample longer or shorter periods.<span id="more-895"></span></p>
<blockquote>
<pre><strong>SELECT</strong>
  innodb_os_log_written_per_minute*60
    <strong>AS</strong> estimated_innodb_os_log_written_per_hour,
  CONCAT(ROUND(innodb_os_log_written_per_minute*60/1024/1024, 1), 'MB')
    <strong>AS</strong> estimated_innodb_os_log_written_per_hour_mb
<strong>FROM</strong>
  (<strong>SELECT</strong> <strong>SUM</strong>(value) <strong>AS</strong> innodb_os_log_written_per_minute <strong>FROM</strong> (
    <strong>SELECT</strong> -VARIABLE_VALUE <strong>AS</strong> value
      <strong>FROM</strong> INFORMATION_SCHEMA.GLOBAL_STATUS
      <strong>WHERE</strong> VARIABLE_NAME = 'innodb_os_log_written'
    <strong>UNION ALL</strong>
    <strong>SELECT</strong> SLEEP(60)
      <strong>FROM</strong> DUAL
    <strong>UNION ALL</strong>
    <strong>SELECT</strong> VARIABLE_VALUE
      <strong>FROM</strong> INFORMATION_SCHEMA.GLOBAL_STATUS
      <strong>WHERE</strong> VARIABLE_NAME = 'innodb_os_log_written'
  ) s1
) s2
;</pre>
</blockquote>
<p>Sample output:</p>
<blockquote>
<pre>+------------------------------------------+---------------------------------------------+
| estimated_innodb_os_log_written_per_hour | estimated_innodb_os_log_written_per_hour_mb |
+------------------------------------------+---------------------------------------------+
|                                584171520 | 557.1MB                                     |
+------------------------------------------+---------------------------------------------+</pre>
</blockquote>
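<p>To turn the estimate into a setting, here is a minimal sketch. It assumes the rule of thumb from the post referenced above (the log files, taken together, should hold roughly one hour of writes) and the default <strong>innodb_log_files_in_group</strong> value of 2. Both are assumptions to adjust for your own setup; the byte count is taken from the sample output:</p>
<blockquote>
<pre>-- 584171520: estimated bytes written per hour, from the sample output above.
-- 2: assumed number of log files (the default innodb_log_files_in_group).
<strong>SELECT</strong>
  ROUND(584171520 / 2 / 1024 / 1024, 1)
    <strong>AS</strong> suggested_log_file_size_mb
;</pre>
</blockquote>
<p>For the sample above this gives about 278.6MB per file, which one would typically round to a convenient nearby value.</p>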
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/how-to-calculate-a-good-innodb-log-file-size-recap/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>High Performance MySQL - a book to re-read</title>
		<link>http://code.openark.org/blog/mysql/high-performance-mysql-a-book-to-re-read</link>
		<comments>http://code.openark.org/blog/mysql/high-performance-mysql-a-book-to-re-read#comments</comments>
		<pubDate>Sun, 27 Sep 2009 07:56:59 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Books]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1346</guid>
		<description><![CDATA[I first read High Performance MySQL, 2nd edition about a year ago, when it first came out. I since re-read a few pages on occasion. In my previous posts I've suggested ways to improve upon the common ranking solution. Very innovative stuff! Or... so I thought. I happened to browse through the book today, and [...]]]></description>
			<content:encoded><![CDATA[<p>I first read <a href="http://www.amazon.com/High-Performance-MySQL-Optimization-Replication/dp/0596101716">High Performance MySQL, 2nd edition</a> about a year ago, when it first came out. I have since re-read a few pages on occasion.</p>
<p>In my previous posts I've suggested ways to improve upon the common ranking solution. Very innovative stuff! Or... so I thought.</p>
<p>I happened to browse through the book today, and a section on User Variables caught my eye. "<em>Let's see if I can get some insight</em>", I thought to myself. Imagine my surprise when I realized almost everything I've suggested is discussed in this modest section, in black and white, sitting on my bookshelf for over a year!</p>
<p>I had read it a year back, forgotten all about it, and re-invented stuff already solved and discussed... Oh, for more brain capacity...</p>
<p>To be honest, this has happened to me more than once in the past few months. I've gotten into the habit of browsing the web when I'm looking for answers to my problems, and I forget that this book contains the answers to so many common, practical MySQL problems, and does so in a very direct and helpful manner.</p>
<p>So, yet again, thumbs up to <em>High Performance MySQL</em>. Really a must-have book. Get it if you haven't already!</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/high-performance-mysql-a-book-to-re-read/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

