<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>code.openark.org &#187; MyISAM</title>
	<atom:link href="http://code.openark.org/blog/tag/myisam/feed" rel="self" type="application/rss+xml" />
	<link>http://code.openark.org/blog</link>
	<description>Blog by Shlomi Noach</description>
	<lastBuildDate>Wed, 01 Feb 2012 08:19:12 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Self throttling MySQL queries</title>
		<link>http://code.openark.org/blog/mysql/self-throttling-mysql-queries</link>
		<comments>http://code.openark.org/blog/mysql/self-throttling-mysql-queries#comments</comments>
		<pubDate>Tue, 01 Nov 2011 07:55:47 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Hack]]></category>
		<category><![CDATA[MyISAM]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Stored routines]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=4294</guid>
		<description><![CDATA[Recap on the problem: A query takes a long time to complete. During this time it generates a lot of I/O. The query's I/O overloads the database, making other queries run slowly. I introduce the notion of self-throttling queries: queries that go to sleep, by themselves, throughout the runtime. The sleep period means the [...]]]></description>
			<content:encoded><![CDATA[<p>Recap on the problem:</p>
<ul>
<li>A query takes a long time to complete.</li>
<li>During this time it generates a lot of I/O.</li>
<li>The query's I/O overloads the database, making other queries run slowly.</li>
</ul>
<p>I introduce the notion of self-throttling queries: queries that go to sleep, by themselves, throughout the runtime. The sleep period means the query does not perform I/O at that time, which then means other queries can have their chance to execute.</p>
<p>I present two approaches:</p>
<ul>
<li>The naive approach: for every <strong>1,000</strong> rows, the query sleeps for <strong>1</strong> second</li>
<li>The factor approach: for every <strong>1,000</strong> rows, the query sleeps for the amount of time it took to iterate those <strong>1,000</strong> rows (effectively doubling the total runtime of the query).<span id="more-4294"></span></li>
</ul>
<h4>Sample query</h4>
<p>We use a simple, single-table scan. No aggregates (which complicate the solution considerably).</p>
<blockquote>
<pre>SELECT
  rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental
;</pre>
</blockquote>
<h4>The naive solution</h4>
<p>We need to know when every <strong>1,000</strong> rows have been processed, so we need to count the rows. We do that by using a counter, as follows:</p>
<blockquote>
<pre>SELECT
  rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days,
  @row_counter := @row_counter + 1
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter
;</pre>
</blockquote>
<p>One thing bothers me: I never asked for an additional column. I would like the result set to keep the same structure. We also want to sleep for <strong>1</strong> second for each <strong>1,000</strong> rows. So we merge the counter and the sleep into one of the existing columns, like this:</p>
<blockquote>
<pre>SELECT
  rental_id +
    IF(
      (@row_counter := @row_counter + 1) % 1000 = 0,
      SLEEP(1), 0
    ) AS rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter
;</pre>
</blockquote>
<p>To remain faithful to <a href="http://code.openark.org/blog/mysql/slides-from-my-talk-programmatic-queries-things-you-can-code-with-sql">my slides</a>, I rewrite as follows, and this is <em>the naive solution</em>:</p>
<blockquote>
<pre>SELECT
  rental_id +
    CASE
      WHEN <strong>(@row_counter := @row_counter + 1) % 1000 = 0</strong> THEN <strong>SLEEP(1)</strong>
      ELSE <strong>0</strong>
    END AS rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter
;</pre>
</blockquote>
<p>The <strong>CASE</strong> expression always evaluates to <strong>0</strong> (<strong>SLEEP()</strong> returns <strong>0</strong> when it completes normally), so it does not affect the value of <strong>rental_id</strong>.</p>
<h4>The factor approach</h4>
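<p>In the factor approach we wish to keep track of the query's execution time, every <strong>1,000</strong> rows. I introduce a nested <strong>CASE</strong> expression which updates the time records. Each inner <strong>WHEN</strong> performs an assignment (or the sleep) as a side effect; since none of these values is ever <strong>NULL</strong>, evaluation falls through to the next <strong>WHEN</strong> and finally to <strong>ELSE 0</strong>. I rely on <strong>SYSDATE()</strong> to return the true time, and on <strong>NOW()</strong> to return the query's execution start time.</p>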
<blockquote>
<pre>SELECT
  rental_id +
    CASE
      WHEN (@row_counter := @row_counter + 1) IS NULL THEN NULL
      WHEN <strong>@row_counter % 1000 = 0</strong> THEN
        CASE
          WHEN (@time_now := <strong>SYSDATE()</strong>) IS NULL THEN NULL
          WHEN (@time_diff := (<strong>TIMESTAMPDIFF(SECOND, @chunk_start_time, @time_now)</strong>)) IS NULL THEN NULL
          WHEN <strong>SLEEP(@time_diff)</strong> IS NULL THEN NULL
          WHEN (@chunk_start_time := <strong>SYSDATE()</strong>) IS NULL THEN NULL
          ELSE 0
        END
      ELSE 0
    END AS rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter,
  (SELECT @chunk_start_time := NOW()) sel_chunk_start_time
;</pre>
</blockquote>
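<p>For readers who find the nested <strong>CASE</strong> hard to follow, here is the same control flow as a minimal Python sketch. It is not part of the original solution: the function name <strong>throttled</strong> and its parameters are made up for illustration, and the clock and sleep functions are passed in only so that the logic can be tested without actually waiting.</p>
<blockquote>
<pre>
```python
import time

def throttled(rows, chunk_size=1000, clock=time.monotonic, sleep=time.sleep):
    """Yield rows, pausing after each full chunk for as long as the chunk took."""
    chunk_start = clock()                    # plays the role of @chunk_start_time
    for count, row in enumerate(rows, start=1):
        yield row                            # the consumer's work counts as chunk time
        if count % chunk_size == 0:
            elapsed = clock() - chunk_start  # plays the role of @time_diff
            sleep(elapsed)                   # factor approach: rest as long as we worked
            chunk_start = clock()            # restart the timer after the sleep
```
</pre>
</blockquote>
<p>Unlike the SQL version, which measures whole seconds with <strong>TIMESTAMPDIFF(SECOND, ...)</strong>, this sketch uses whatever resolution the clock provides.</p>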
<h4>Proof</h4>
<p>How can we prove that the queries do indeed work?</p>
<p>We can see if the total runtime sums up to the number of sleep calls, in seconds; but how do we know that sleeps do occur at the correct times?</p>
<p>A solution I offer is to use a stored routine which logs to a MyISAM table (a non-transactional table) the exact time (using <strong>SYSDATE()</strong>) and value per row. The following constructs are introduced:</p>
<blockquote>
<pre><strong>CREATE TABLE</strong> test.proof(
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  dt DATETIME NOT NULL,
  msg VARCHAR(255)
) <strong>ENGINE=MyISAM</strong>;

DELIMITER $$
<strong>CREATE FUNCTION</strong> test.prove_it(message VARCHAR(255)) RETURNS TINYINT
DETERMINISTIC
MODIFIES SQL DATA
BEGIN
  <strong>INSERT INTO test.proof (dt, msg) VALUES (SYSDATE(), message); RETURN 0;</strong>
END $$
DELIMITER ;</pre>
</blockquote>
<p>The <strong>prove_it()</strong> function records the immediate time and a message into the MyISAM table, which immediately accepts the write, being non-transactional. It returns with <strong>0</strong>, so we will now embed it within the query. Of course, the function itself incurs some overhead, but it will nevertheless convince you that <strong>SLEEP()</strong>s do occur at the right time!</p>
<blockquote>
<pre>SELECT
  rental_id +
    CASE
      WHEN (@row_counter := @row_counter + 1) IS NULL THEN NULL
      WHEN @row_counter % 1000 = 0 THEN
        CASE
          WHEN (@time_now := SYSDATE()) IS NULL THEN NULL
          WHEN (@time_diff := (TIMESTAMPDIFF(SECOND, @chunk_start_time, @time_now))) IS NULL THEN NULL
          WHEN SLEEP(@time_diff)<strong> + test.prove_it(CONCAT('will sleep for ', @time_diff, ' seconds'))</strong> IS NULL THEN NULL
          WHEN (@chunk_start_time := SYSDATE()) IS NULL THEN NULL
          ELSE 0
        END
      ELSE 0
    END AS rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter,
  (SELECT @chunk_start_time := NOW()) sel_chunk_start_time
;

mysql&gt; SELECT * FROM test.proof;
+----+---------------------+--------------------------+
| id | dt                  | msg                      |
+----+---------------------+--------------------------+
|  1 | 2011-11-01 09:22:36 | will sleep for 1 seconds |
|  2 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  3 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  4 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  5 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  6 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  7 | 2011-11-01 09:22:38 | will sleep for 1 seconds |
|  8 | 2011-11-01 09:22:38 | will sleep for 0 seconds |
|  9 | 2011-11-01 09:22:38 | will sleep for 0 seconds |
| 10 | 2011-11-01 09:22:38 | will sleep for 0 seconds |
| 11 | 2011-11-01 09:22:38 | will sleep for 0 seconds |
| 12 | 2011-11-01 09:22:40 | will sleep for 1 seconds |
| 13 | 2011-11-01 09:22:40 | will sleep for 0 seconds |
| 14 | 2011-11-01 09:22:40 | will sleep for 0 seconds |
| 15 | 2011-11-01 09:22:40 | will sleep for 0 seconds |
+----+---------------------+--------------------------+</pre>
</blockquote>
<p>The above query is actually very fast. Try adding <strong>BENCHMARK(1000,ENCODE('hello','goodbye'))</strong> to rental_id so as to make it slower, or just use it on a really large table, see what happens (this is what I actually used to make the query run for several seconds in the example above).</p>
<p>Observant readers will note that the <strong>"will sleep..."</strong> message actually gets written <em>after</em> the <strong>SLEEP()</strong> call. I leave this as it is.</p>
<p>Another very nice trait of the code is that you don't need sub-second resolution for it to work. If you look at the above, we don't actually go to sleep every <strong>1,000</strong> rows (<strong>1,000</strong> is just too quick in the query -- perhaps I should have used <strong>10,000</strong> rows). But we <em>do</em> sleep once a second has <em>elapsed</em>. Which means it works correctly <em>on average</em>. Of course, the entire discussion is only of interest when a query executes for a <em>substantial</em> number of seconds, so this is just an anecdote.</p>
<h4>And the winner is...</h4>
<p>Wow, this <a href="http://code.openark.org/blog/mysql/contest-for-glory-write-a-self-throttling-mysql-query">contest</a> was anything but popular. <strong><a href="http://marcalff.blogspot.com/">Marc Alff</a></strong> is the obvious winner: he is the <em>only</em> one to suggest a solution <img src='http://code.openark.org/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>But Marc uses a very nice trick: he reads the <strong>PERFORMANCE_SCHEMA</strong>. Now, I'm not sure how the <strong>PERFORMANCE_SCHEMA</strong> gets updated. I know that the <strong>INFORMATION_SCHEMA.GLOBAL_STATUS</strong> table does not get updated by a query until the query completes (so you cannot expect a change in <strong>innodb_rows_read</strong> throughout the execution of the query). I just didn't test it (homework, anyone?). If it does get updated, then we can throttle the query based on InnoDB page reads using a simple query. Otherwise, an access to <strong>/proc/diskstats</strong> is possible, assuming no <em>apparmor</em> or <em>SELinux</em> are blocking us.</p>
<p>Marc also uses a stored function, which is the <em>clean</em> way of doing it; however I distrust the overhead incurred by a stored routine and prefer my solution (which is, admittedly, not a pretty SQL sight!).</p>
<p>Happy throttling!</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/self-throttling-mysql-queries/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Limiting table disk quota in MySQL</title>
		<link>http://code.openark.org/blog/mysql/limiting-table-disk-quota-in-mysql</link>
		<comments>http://code.openark.org/blog/mysql/limiting-table-disk-quota-in-mysql#comments</comments>
		<pubDate>Mon, 07 Mar 2011 07:08:21 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[File System]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[MyISAM]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Triggers]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=3359</guid>
		<description><![CDATA[Question asked by a student: is there a way to limit a table's quota on disk? Say, limit a table to 2GB, after which it will refuse to grow? Note that the requirement is that rows are never DELETEd. The table must simply refuse to be updated once it reaches a certain size. There is [...]]]></description>
			<content:encoded><![CDATA[<p>Question asked by a student: is there a way to limit a table's quota on disk? Say, limit a table to 2GB, after which it will refuse to grow? Note that the requirement is that rows are never DELETEd. The table must simply refuse to be updated once it reaches a certain size.</p>
<p>There is no built-in way to limit a table's quota on disk. The first thing to observe is that MySQL itself has nothing to do with this. It is entirely up to the storage engine to provide such functionality. The storage engine is the one to handle data storage: how tables and keys are stored on disk. Just consider the differences between MyISAM's <strong>.MYD</strong> &amp; <strong>.MYI</strong> files, InnoDB's shared tablespace <strong>ibdata1</strong> and InnoDB's file-per-table <strong>.ibd</strong> files.</p>
<p>The only engine I know of that has a quota is the MEMORY engine: it accepts the <strong>max_heap_table_size</strong>, which limits the size of a single table in memory. Hrmmm... In memory...</p>
<h4>Why limit?</h4>
<p>I'm not as yet aware of the specific requirements of said company, but this is not the first time I heard this question.</p>
<p>The fact is: when MySQL runs out of disk space, it goes with a BOOM. It crashes ungracefully, with binary logs and replication going out of sync. To date, in the cases I've seen, InnoDB merely crashes and manages to recover once disk space is freed, but I am not certain this is guaranteed to be the case. Anyone?</p>
<p>And, with MyISAM..., who knows?</p>
<p>Rule #1 of MySQL disk usage: <em>don't run out of disk space.</em></p>
<h4>Workarounds</h4>
<p>I can think of two workarounds, neither of which is pretty. The first involves triggers (actually, a few variations for this one), the second involves privileges.<span id="more-3359"></span></p>
<h4>Triggers</h4>
<p>The following code (first presented in <a title="Permanent Link to Triggers Use Case Compilation, Part II" rel="bookmark" href="http://code.openark.org/blog/mysql/triggers-use-case-compilation-part-ii">Triggers Use Case Compilation, Part II</a>) assumes the DATA_LENGTH and INDEX_LENGTH values in INFORMATION_SCHEMA to be good indicators:</p>
<blockquote>
<pre>DROP TABLE IF EXISTS `world`.`logs`;
CREATE TABLE  `world`.`logs` (
  `logs_id` int(11) NOT NULL auto_increment,
  `ts` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  `message` varchar(255) character set utf8 NOT NULL,
  PRIMARY KEY  (`logs_id`)
) ENGINE=MyISAM;

DELIMITER $$

DROP TRIGGER IF EXISTS logs_bi $$
CREATE TRIGGER logs_bi BEFORE INSERT ON logs
FOR EACH ROW
BEGIN
  SELECT DATA_LENGTH+INDEX_LENGTH FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='world' AND TABLE_NAME='logs' INTO @estimated_table_size;
  IF (@estimated_table_size &gt; 25*1024) THEN
    SELECT 0 FROM `logs table is full` INTO @error;
  END IF;
END $$

DELIMITER ;
</pre>
</blockquote>
<p>Or, you could write your own UDF, e.g. <strong>get_table_file_size(fully_qualified_table_name)</strong> and be more accurate:</p>
<blockquote>
<pre>DELIMITER $$

DROP TRIGGER IF EXISTS logs_bi $$
CREATE TRIGGER logs_bi BEFORE INSERT ON logs
FOR EACH ROW
BEGIN
  SELECT get_table_file_size('world.logs') INTO @table_size;
  IF (@table_size &gt; 25*1024) THEN
    SELECT 0 FROM `logs table is full` INTO @error;
  END IF;
END $$

DELIMITER ;
</pre>
</blockquote>
<p>(The same should be done for <strong>UPDATE</strong> operations.)</p>
<p>In both workarounds above, triggers are pre-defined. But triggers are performance-killers.</p>
<p>How about preventing writing to the table only when it's truly on the edge? A simple shell script, spawned by a cronjob, could do this well: get the file size of a specific table, and test if it's larger than <em>n</em> bytes. If not, the script exits. If the file is indeed too large, the script invokes the following on <em>mysql</em>:</p>
<blockquote>
<pre>DELIMITER $$

DROP TRIGGER IF EXISTS logs_bi $$
CREATE TRIGGER logs_bi BEFORE INSERT ON logs
FOR EACH ROW
BEGIN
  SELECT 0 FROM `logs table is full` INTO @error;
END $$

DELIMITER ;
</pre>
</blockquote>
<p>So, most of the time, there is no trigger. Only when the external script detects that the table is too large does it create a trigger. The trigger has no logic: it simply raises an error (PS, use <strong>SIGNAL</strong> in MySQL <strong>5.5</strong>).</p>
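<p>The size test made by that external script could look something like the following Python sketch. This is only an illustration under my own assumptions: the function name <strong>files_exceed_quota</strong> is hypothetical, and for a MyISAM table you would pass it the paths of the table's <strong>.MYD</strong> and <strong>.MYI</strong> files, wherever your data directory happens to be.</p>
<blockquote>
<pre>
```python
import os

def files_exceed_quota(paths, max_bytes):
    """Return True when the combined size of the existing files is above max_bytes."""
    # Files that do not exist are skipped rather than treated as an error.
    total = sum(os.path.getsize(path) for path in paths if os.path.exists(path))
    return total > max_bytes
```
</pre>
</blockquote>
<p>A cron wrapper would call this with the table's files and the desired limit, and feed the <strong>CREATE TRIGGER</strong> statement to <em>mysql</em> only when it returns true.</p>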
<h4>Privileges</h4>
<p>Another way to work around the problem is to use security features. Instead of creating a trigger on the table, <strong>REVOKE</strong> the <strong>INSERT</strong> &amp; <strong>UPDATE</strong> privileges from the appropriate user on that table.</p>
<p>This may turn out to be a difficult task, since MySQL has no notion of <em>fine-grained changes</em>. That is, suppose we have:</p>
<blockquote>
<pre>GRANT INSERT, UPDATE, DELETE, SELECT ON mydb.* TO 'webuser'@'%.webdomain'</pre>
</blockquote>
<p>If we just do:</p>
<blockquote>
<pre>REVOKE SELECT ON mydb.logs FROM 'webuser'@'%.webdomain'</pre>
</blockquote>
<p>We get:</p>
<blockquote>
<pre>There is no such grant defined for user 'webuser' on host '%.webdomain' on table 'logs'.</pre>
</blockquote>
<p>So this requires setting up privileges on the table level in the first place. Plus note that as long as the grants on the database level do allow for INSERTs, you cannot override it on the table level.</p>
<h4>Other ideas?</h4>
<p>I never actually implemented table disk quota. I'm not sure this is a viable solution; but I haven't heard all the arguments in favor as yet, so I don't want to rule this out.</p>
<p>Please share below if you are using other means of table size control, other than the trivial cleanup of old records.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/limiting-table-disk-quota-in-mysql/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Personal observation: more migrations from MyISAM to InnoDB</title>
		<link>http://code.openark.org/blog/mysql/personal-observation-more-migrations-from-myisam-to-innodb</link>
		<comments>http://code.openark.org/blog/mysql/personal-observation-more-migrations-from-myisam-to-innodb#comments</comments>
		<pubDate>Wed, 16 Jun 2010 16:43:42 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[MyISAM]]></category>
		<category><![CDATA[Opinions]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2517</guid>
		<description><![CDATA[I'm witnessing an increase in the planning, confidence &#38; execution for MyISAM to InnoDB migration. How much can a single consultant observe? I agree Oracle should not go to PR based on my experience. But I find that: More companies are now familiar with InnoDB than there used to be. More companies are interested in migration [...]]]></description>
			<content:encoded><![CDATA[<p>I'm witnessing an increase in the planning, confidence &amp; execution for MyISAM to InnoDB migration.</p>
<p>How much can a single consultant observe? I agree Oracle should not go to PR based on my experience. But I find that:</p>
<ul>
<li>More companies are now familiar with InnoDB than there used to be.</li>
<li>More companies are interested in migration to InnoDB than there used to be.</li>
<li>More companies feel such migration to be safe.</li>
<li>More companies start up with an InnoDB based solution than with a MyISAM based solution.</li>
</ul>
<p>This is the way I see it. No doubt, the Oracle/Sun deal made its impact. The fact that InnoDB is no longer a 3rd party; the fact Oracle invests in InnoDB and no other engine (Falcon is down, no real development on MyISAM); the fact InnoDB is to be the default engine: all these put companies at ease with migration.</p>
<p><span id="more-2517"></span>I am happy with this change. I believe for most installations InnoDB provides a clear advantage over MyISAM (though MyISAM has its uses), and this makes for more robust, correct and manageable MySQL instances; the kind that make a DBA's life easier and quieter. And it is easier to make customers see the advantages.</p>
<p>I am not inclined to say <em>"You should migrate your entire database to InnoDB"</em>. I don't do that a lot. But recently, more customers approach and say <em>"We were thinking about migrating our entire database to InnoDB, what do you think?"</em>. What a change of approach.</p>
<p>And, yes: there are still <em>a lot</em> of companies using MyISAM based databases, who still live happily.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/personal-observation-more-migrations-from-myisam-to-innodb/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>A MyISAM backup is blocking as read-only, including mysqldump backup</title>
		<link>http://code.openark.org/blog/mysql/a-myisam-backup-is-blocking-as-read-only-including-mysqldump-backup</link>
		<comments>http://code.openark.org/blog/mysql/a-myisam-backup-is-blocking-as-read-only-including-mysqldump-backup#comments</comments>
		<pubDate>Tue, 18 May 2010 17:29:05 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Backup]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[MyISAM]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2441</guid>
		<description><![CDATA[Actually this is (almost) all I wanted to say. This is intentionally posted with all related keywords in the title, in the hope that a related search on Google will bring up this post on the first page. I'm just still encountering companies who use MyISAM as their storage engine and are unaware that their nightly backup [...]]]></description>
			<content:encoded><![CDATA[<p>Actually this is (almost) all I wanted to say. This is intentionally posted with all related keywords in the title, in the hope that a related search on Google will bring up this post on the first page.</p>
<p>I'm just still encountering companies who use MyISAM as their storage engine and are <em>unaware</em> that their nightly backup actually blocks their application, basically rendering their product unavailable for long minutes to hours on a nightly basis.</p>
<p>So this is posted as a warning for those who were not aware of this fact.</p>
<p>There is no hot (non-blocking) backup for MyISAM. The closest would be a file system snapshot, but even this requires flushing of tables, which may take a while to complete. If you must have a hot backup, then either use replication - and take the risk of the slave not being in complete sync with the master - or use another storage engine, e.g. InnoDB.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/a-myisam-backup-is-blocking-as-read-only-including-mysqldump-backup/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>The depth of an index: primer</title>
		<link>http://code.openark.org/blog/mysql/the-depth-of-an-index-primer</link>
		<comments>http://code.openark.org/blog/mysql/the-depth-of-an-index-primer#comments</comments>
		<pubDate>Thu, 09 Apr 2009 03:55:08 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Data Types]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[MyISAM]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=545</guid>
		<description><![CDATA[InnoDB and MyISAM use B+ and B trees for indexes (InnoDB also has internal hash index). In both these structures, the depth of the index is an important factor. When looking for an indexed row, a search is made on the index, from root to leaves. Assuming the index is not in memory, the depth [...]]]></description>
			<content:encoded><![CDATA[<p>InnoDB and MyISAM use B+ and B trees for indexes (InnoDB also has an internal hash index).</p>
<p>In both these structures, the depth of the index is an important factor. When looking for an indexed row, a search is made on the index, from root to leaves.</p>
<p>Assuming the index is not in memory, the depth of the index represents the minimal cost (in I/O operations) for an index-based lookup. Of course, most of the time we expect large portions of the indexes to be cached in memory. Even so, the depth of the index is an important factor. The deeper the index is, the worse it performs: there are simply more lookups on index nodes.</p>
<p>What affects the depth of an index?</p>
<p>There are quite a few structural issues, but it boils down to two important factors:</p>
<ol>
<li>The number of rows in the table: obviously, more rows lead to a larger index, and larger indexes grow in depth.</li>
<li>The size of the indexed column(s). An index on an INT column can be expected to be shallower than an index on a CHAR(32) column (on a very small number of rows they may have the same depth, so we'll assume a large number of rows).</li>
</ol>
<p><span id="more-545"></span>Of course, these two factors also affect the total size of the index, hence its disk usage, but I wish to concentrate on the index depth.</p>
<p>Let's emphasize the second factor. It is best to index shorter columns, if that is possible. It is the reason behind using an index on a VARCHAR's prefix (e.g. KEY(email_address(16))). It is also a reason to use INT, instead of BIGINT columns for your primary key, when BIGINT is not required.</p>
<p>The larger the indexed data type is (or the total size of data types for all columns in a combined index), the fewer values can fit in an index node. The fewer values in a node, the more node splits occur and the more nodes are required to build the index. The fewer values in a node, the less <em>wide</em> the index tree is. The less wide an index tree is, and the more nodes it has, the deeper it gets.</p>
<p>So bigger data types lead to deeper trees. Deeper trees lead to more IO operations on lookup.</p>
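<p>To put rough numbers on this, here is a back-of-the-envelope model in Python. It is a deliberately crude sketch, not how either engine actually lays out its pages: the function name <strong>estimated_depth</strong>, the 16KB node size and the 6-byte pointer per entry are assumptions chosen for illustration, and per-record overhead and partially filled nodes are ignored.</p>
<blockquote>
<pre>
```python
def estimated_depth(row_count, key_bytes, pointer_bytes=6, node_bytes=16384):
    """Estimate how many levels a tree needs so that its leaves can address row_count keys."""
    # Fanout: how many (key, pointer) entries fit in one node under this crude model.
    fanout = node_bytes // (key_bytes + pointer_bytes)
    depth = 1
    capacity = fanout              # entries addressable by a tree of the current depth
    while row_count > capacity:
        capacity *= fanout         # each extra level multiplies capacity by the fanout
        depth += 1
    return depth
```
</pre>
</blockquote>
<p>Under these assumptions, 100 million rows give an estimated depth of 3 for a 4-byte INT key and 4 for a 32-byte key, while for 100 rows both come out at depth 1.</p>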
<h4>InnoDB</h4>
<p>On InnoDB there's another issue: all tables are clustered by primary key. Any access to table data requires diving into, or traversing the primary key tree.</p>
<p>On InnoDB, a secondary index (any index which is not the primary key) does not lead to table data. Instead, the "data" in the leaf nodes of a secondary index is the primary key values.</p>
<p>And so, when looking up a value on an InnoDB table using a secondary key, we first search the secondary key to retrieve the primary key value, then go to the primary key tree to retrieve the data.</p>
<p>This means two index lookups, one of which is always the primary key.</p>
<p>On InnoDB, it is therefore particularly important to keep the primary key small. Have small data types. Prefer a SMALLINT to INT, if possible. Prefer an INT to BIGINT, if possible. Prefer an integer value over some VARCHAR text.</p>
<p>With long data types used in an InnoDB primary key, not only is the primary key index bloated (deep), but also every other index gets to be bloated, as the leaf values in all other indexes are those same long data types.</p>
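<p>The two-step lookup can be pictured with a toy Python model. This is only a sketch: the class <strong>ClusteredTable</strong> and its <strong>id</strong> and <strong>email</strong> columns are invented for the illustration, and plain dictionaries stand in for the B+ trees, so it shows the number of index searches, not their depth.</p>
<blockquote>
<pre>
```python
class ClusteredTable:
    """Toy model of a table clustered by primary key, with one secondary index."""

    def __init__(self, rows):
        # Clustered index: primary key value leads to the full row.
        self.primary = {row["id"]: row for row in rows}
        # Secondary index: indexed value leads to the primary key value only.
        self.by_email = {row["email"]: row["id"] for row in rows}
        self.index_searches = 0

    def get_by_id(self, pk):
        self.index_searches += 1      # one search on the primary key tree
        return self.primary[pk]

    def get_by_email(self, email):
        self.index_searches += 1      # first search: the secondary index
        pk = self.by_email[email]
        return self.get_by_id(pk)     # second search: the primary key tree
```
</pre>
</blockquote>
<p>Note that the secondary index stores a copy of every primary key value, which is why a long primary key makes all other indexes larger as well.</p>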
<h4>MyISAM</h4>
<p>MyISAM does not use clustered trees, hence the primary key is just a regular unique key. All indexes are created equal and an index lookup only consists of a single index search. Therefore, two indexes do not affect one another, with the exception that they are competing for the same key cache.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/the-depth-of-an-index-primer/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>LOCK TABLES in MyISAM is NOT a poor man&#039;s transactions substitute</title>
		<link>http://code.openark.org/blog/mysql/lock-tables-in-myisam-is-not-a-poor-mans-tranactions-substitute</link>
		<comments>http://code.openark.org/blog/mysql/lock-tables-in-myisam-is-not-a-poor-mans-tranactions-substitute#comments</comments>
		<pubDate>Wed, 18 Mar 2009 07:37:56 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[MyISAM]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=658</guid>
		<description><![CDATA[I get to hear that a lot: that LOCK TABLES with MyISAM is some sort of replacement for transactions; some model we can work with which gives us 'transactional flavor'. It isn't, and here's why. When we speak of a transactional database/engine, we check out its ACID compliance. Let's break out the ACID and see [...]]]></description>
			<content:encoded><![CDATA[<p>I get to hear that a lot: that LOCK TABLES with MyISAM is some sort of replacement for transactions; some model we can work with which gives us 'transactional flavor'.</p>
<p>It isn't, and here's why.</p>
<p>When we speak of a transactional database/engine, we check out its ACID compliance. Let's break out the ACID and see what LOCK TABLES provides us with:</p>
<ul>
<li><strong>A</strong>: Atomicity. MyISAM does not provide atomicity. If we have LOCK TABLES followed by two statements, then closed by UNLOCK TABLES, then it follows that a crash between the two statements will have the first one applied, the second one not applied. No mechanism ensures an "all or nothing" behavior.</li>
<li><strong>C</strong>: Consistency. An error in a statement would roll back the entire transaction in a transactional database. This won't work on MyISAM: every statement is "committed" immediately.</li>
<li><strong>I</strong>: Isolation. Without LOCK TABLES, working with MyISAM resembles using the <strong>read uncommitted</strong>, or <strong>dirty read</strong> isolation level. With LOCK TABLES - it depends. If you were to use LOCK TABLES ... WRITE on all tables in all statements, you would get the <strong>serializable</strong> isolation level. Actually it would be more than <strong>serializable</strong>. It would be <em>truly serial</em>.</li>
<li><strong>D</strong>: Durability. Did the INSERT succeed? And did the power go down just after? MyISAM provides no guarantees that the data will be there.</li>
</ul>
<p><span id="more-658"></span>So of all ACID properties, the only thing we could get is a <strong>serializable</strong> isolation level, and that, too, only if we used LOCK TABLES ... WRITE  practically everywhere.</p>
<p>Where does the notion come from, then?</p>
<p>There's one thing which LOCK TABLES does help us with: race conditions. It effectively creates a mutex block. The same effect could be achieved when using GET_LOCK() and RELEASE_LOCK(). Perhaps this is the source of confusion.</p>
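<p>The difference between a mutex and a transaction can be shown with a small Python analogy. It is only a sketch with made-up names: a <strong>threading.Lock</strong> stands in for LOCK TABLES ... WRITE, a dictionary stands in for the table, and an exception stands in for the crash.</p>
<blockquote>
<pre>
```python
import threading

table_lock = threading.Lock()         # stands in for LOCK TABLES ... WRITE
accounts = {"alice": 100, "bob": 0}   # stands in for a MyISAM table

def transfer(amount, crash_midway=False):
    """Move amount from alice to bob under the lock, with no rollback whatsoever."""
    with table_lock:                  # mutual exclusion only
        accounts["alice"] -= amount   # first statement: takes effect immediately
        if crash_midway:
            raise RuntimeError("simulated crash between the two statements")
        accounts["bob"] += amount     # second statement: never reached on a crash
```
</pre>
</blockquote>
<p>A failed transfer leaves alice debited and bob not credited: the lock kept other threads out, but nothing undid the first statement.</p>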
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/lock-tables-in-myisam-is-not-a-poor-mans-tranactions-substitute/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>MySQL User Group Meetings in Israel</title>
		<link>http://code.openark.org/blog/mysql/mysql-user-group-meetings-in-israel</link>
		<comments>http://code.openark.org/blog/mysql/mysql-user-group-meetings-in-israel#comments</comments>
		<pubDate>Wed, 11 Mar 2009 05:42:18 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[MyISAM]]></category>
		<category><![CDATA[User Group]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=634</guid>
		<description><![CDATA[This is a short note that the MySQL User Group Meetings in Israel are established (well, re-established after a very long period). Thanks to Eddy Resnick from Sun Microsystems Israel who has set up the meetings. So far, we've had 2 successful meetings, and we intend to have more! First one was in Sun's offices [...]]]></description>
			<content:encoded><![CDATA[<p>This is a short note that the MySQL User Group Meetings in Israel are established (well, re-established after a very long period).</p>
<p>Thanks to Eddy Resnick from Sun Microsystems Israel who has set up the meetings. So far, we've had 2 successful meetings, and we intend to have more! First one was in Sun's offices in Herzlia; second one, held last week, was at <a title="Interbit" href="http://interbit.co.il/">Interbit</a> (a MySQL training center) in Ramat Gan. We hope to hold these meetings on a monthly basis, and the next ones are expected to be held at Interbit.</p>
<p>A new (blessed) law in Israel forbids us from sending invitations for these meetings via email without prior consent of the recipient (this law was passed as a means of stopping spam). We do realize there are many users out there who would be interested in these meetings. For those users: please stay tuned to Interbit's website, where future meetings will be published - or just give them a call!</p>
<p>It was my honor to present a short session, one of three in this last meeting. Other presenters were Erad Deutch, who presented "MySQL Success Stories", and Moshe Kaplan, who presented "Sharding Solutions". I presented "MyISAM &amp; InnoDB Tuning Fundamentals", where I laid down the basics behind parameter tuning for these storage engines.</p>
<p>As per audience request, here's the <a href="http://code.openark.org/blog/wp-content/uploads/2009/03/innodb_myisam_tuning_fundamentals_share.pdf">presentation</a> in PDF format:</p>
<p>I intend to give sessions in future meetings, and have already started working on my next one. So please come, it's a fun way to pass a nice afternoon. See you there!</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/mysql-user-group-meetings-in-israel/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Less known SQL syntax and functions in MySQL</title>
		<link>http://code.openark.org/blog/mysql/less-known-sql-syntax-and-functions-in-mysql</link>
		<comments>http://code.openark.org/blog/mysql/less-known-sql-syntax-and-functions-in-mysql#comments</comments>
		<pubDate>Sun, 23 Nov 2008 05:53:52 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[MyISAM]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=126</guid>
		<description><![CDATA["Standard SQL" is something you read about. All popular databases have a modified version of SQL. Each database adds its own flavor and features to the standard. MySQL is no different.

Some deviations are storage engine dependent. Others are more general. Many, such as INSERT IGNORE, are commonly used. Here's a list of some MySQL deviations from standard SQL, which are not so well known.]]></description>
			<content:encoded><![CDATA[<p>"Standard SQL" is something you read about. All popular databases have a modified version of SQL. Each database adds its own flavor and features to the standard. MySQL is no different.</p>
<p>Some deviations are storage engine dependent. Others are more general. Many, such as <strong><code>INSERT IGNORE</code></strong>, are commonly used. Here's a list of some MySQL deviations from standard SQL, which are not so well known.<span id="more-126"></span></p>
<p>I'll be using MySQL's <a title="MySQL's world database setup" href="http://dev.mysql.com/doc/world-setup/en/world-setup.html">world database</a> for demonstration.</p>
<h4>GROUP_CONCAT</h4>
<p style="padding-left: 30px;">Assume the following query: <strong><code>SELECT CountryCode, COUNT(*) FROM City GROUP BY CountryCode</code></strong>, which selects the number of cities per country, using MySQL's world database. In MySQL it is possible to get a name for one "sample" city per country by selecting a non-aggregated column: <strong><code>SELECT CountryCode, Name, COUNT(*) FROM City GROUP BY CountryCode</code></strong>. This is itself a MySQL extension rather than standard SQL: which city name gets returned is indeterminate, and the query is rejected when the <strong><code>ONLY_FULL_GROUP_BY</code></strong> SQL mode is enabled.</p>
<p style="padding-left: 30px;">But in MySQL it is also possible to get the full list of cities per group: <strong><code>SELECT CountryCode, GROUP_CONCAT(Name), COUNT(*) FROM City GROUP BY CountryCode</code></strong>. This provides a comma-delimited string of all city names per country.</p>
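<p style="padding-left: 30px;">A small variation, as a sketch: <strong><code>GROUP_CONCAT</code></strong> accepts its own <strong><code>ORDER BY</code></strong> and <strong><code>SEPARATOR</code></strong> clauses, so the list can be sorted and delimited as desired (the <code>cities</code> alias and the <code>'; '</code> separator are just illustrative choices):</p>
<p style="padding-left: 30px;"><strong><code>SELECT CountryCode, GROUP_CONCAT(Name ORDER BY Name SEPARATOR '; ') AS cities, COUNT(*) FROM City GROUP BY CountryCode</code></strong></p>
<p style="padding-left: 30px;">Keep in mind that the result is truncated to <strong><code>group_concat_max_len</code></strong> bytes (1024 by default), so long lists may get cut short unless that variable is raised.</p>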
<h4>ORDER BY NULL</h4>
<p style="padding-left: 30px;">If you ran the previous queries, you may have noticed that the results were ordered by CountryCode. MySQL's default behavior when <strong><code>GROUP BY</code></strong> is used, is to order by the grouped column. But this means sorting is required, possibly using merge passes and temporary tables. MySQL accepts the following syntax:</p>
<p style="padding-left: 30px;"><strong><code>SELECT CountryCode, COUNT(*) FROM City GROUP BY CountryCode ORDER BY NULL</code></strong></p>
<p style="padding-left: 30px;">If you <strong><code>EXPLAIN</code></strong> the query, you'll see no "Using filesort". When not using <strong><code>ORDER BY NULL</code></strong>, "Using filesort" appears.</p>
<h4>ALTER TABLE ... ORDER BY</h4>
<p style="padding-left: 30px;">MyISAM tables are not clustered. The table data is independent of indexes. Depending on <strong><code>concurrent_insert</code></strong> settings, new rows are either appended to the end of the table, or fill the space previously occupied by <strong>DELETE</strong>d rows.</p>
<p style="padding-left: 30px;">When you <strong><code>SELECT * FROM Country</code></strong>, the order of rows is as stored on disk. It is possible to do a one-time reordering of rows in a MyISAM table by executing: <strong><code>ALTER TABLE Country ORDER BY Code</code></strong>. This is a lengthy operation (on large tables), which locks the table, so take care when using it. The change does not last for long, either: as you <strong><code>INSERT</code></strong> new rows, the rows get out of order again. But if your table does not get modified, or only gets modified rarely, this is a nice trick to use when order of rows is important, and you don't want to pay the price of sorting per query.</p>
<h4>ROW_COUNT()</h4>
<p style="padding-left: 30px;">Anyone who uses MySQL with a connector (say, Connector/J with JDBC), knows that <strong><code>INSERT</code></strong>, <code><strong>DELETE</strong> </code>and <code><strong>UPDATE</strong> </code>statements return with an integer value: the number of modified rows. In MySQL, the explicit way to get the number of modified rows is to invoke <strong><code>SELECT ROW_COUNT()</code></strong> right after your query. This method is useful if you like to know whether your <strong><code>DELETE</code></strong> did in fact remove rows, or <strong><code>INSERT IGNORE</code></strong> did in fact add a row, etc.</p>
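<p style="padding-left: 30px;">For instance, a minimal sketch (the country code <code>'XXX'</code> is a made-up value, assumed to match no row in the world database):</p>
<p style="padding-left: 30px;"><strong><code>DELETE FROM City WHERE CountryCode = 'XXX';<br />
SELECT ROW_COUNT();</code></strong></p>
<p style="padding-left: 30px;">If the <strong><code>DELETE</code></strong> matched no rows, <strong><code>ROW_COUNT()</code></strong> returns 0. It must be invoked immediately after the statement in question, since it always refers to the last statement executed on the connection.</p>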
<h4>LIMIT</h4>
<p style="padding-left: 30px;">Well, MySQL DBAs are familiar with it. I just thought I'd mention <strong><code>LIMIT</code></strong>, since it's a MySQL deviation. I was surprised to find that out, when an Oracle DBA once asked me how I did paging with results. "You mean like <strong><code>LIMIT 60,10</code></strong>?" I asked, and he replied: "LIMIT??". So, you can <strong><code>LIMIT</code></strong> to limit the number of results, like: <strong><code>SELECT * FROM Country LIMIT 10</code></strong>, to only get first 10 rows, or to do paging like: <strong><code>SELECT * FROM Country LIMIT 60,10</code></strong>, which skips 60 rows, then reads 10.</p>
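<p style="padding-left: 30px;">MySQL also accepts the more explicit <strong><code>LIMIT ... OFFSET ...</code></strong> form. As a sketch, the paging query could be written as follows; the <strong><code>ORDER BY</code></strong> is my own addition, since without it the order of rows, and hence the content of each page, is not guaranteed:</p>
<p style="padding-left: 30px;"><strong><code>SELECT * FROM Country ORDER BY Code LIMIT 10 OFFSET 60</code></strong></p>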
<h4>SQL_CALC_FOUND_ROWS, FOUND_ROWS()</h4>
<p style="padding-left: 30px;">While at it, it may be required to use LIMIT to only return 10 rows, but still ask MySQL how many rows there really were. Do it like this:</p>
<p style="padding-left: 30px;"><strong><code>SELECT SQL_CALC_FOUND_ROWS Code, Name FROM Country LIMIT 10;</code></strong></p>
<p style="padding-left: 30px;"><strong><code>SELECT FOUND_ROWS();</code></strong></p>
<p style="padding-left: 30px;">First query gives the required 10 results. Second query says "239", which is the total rows I would get had I not used <strong><code>LIMIT</code></strong>. Note that a <strong><code>SELECT SQL_CALC_FOUND_ROWS</code></strong> is a "heavy" query, which actually searches through the entire rowset, and then only returns the LIMITed rows. Use with care.</p>
<h4>PROCEDURE ANALYSE</h4>
<p style="padding-left: 30px;">A very nice diagnostic tool, which tells us what data types are proper based on existing data. If we have an <strong><code>INT</code></strong> column, but all values are smaller than 200, <strong><code>PROCEDURE ANALYSE()</code></strong> recommends that we use a <strong><code>TINYINT</code></strong>. Usage: <strong><code>SELECT * FROM Country PROCEDURE ANALYSE(10,10)</code></strong>. Just remember it does not anticipate data growth. It only relies on current data.</p>
<h4>INSERT IGNORE</h4>
<p style="padding-left: 30px;">OK, I said above that it is commonly used, but couldn't help myself, it's just too useful to leave out. <strong><code>INSERT IGNORE INTO City (id, Name) VALUES (1000, 'Te Anau')</code></strong> will silently abort if there's a <strong><code>UNIQUE KEY</code></strong> on `id` and an existing id=1000 value. A normal <strong><code>INSERT</code></strong> will terminate with an error, or raise an Exception in your application's code. It is of particular use when doing an extended INSERT: <strong><code>INSERT IGNORE INTO City (id, Name) VALUES (1000, 'Te Anau'), (9009, 'Wanaka')</code></strong> may have trouble with the first row, but <em>will</em> insert the second row. <strong><code>ROW_COUNT()</code></strong> can tell me how well it went.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/less-known-sql-syntax-and-functions-in-mysql/feed</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Two storage engines; different plans, Part II</title>
		<link>http://code.openark.org/blog/mysql/two-storage-engines-different-plans-part-ii</link>
		<comments>http://code.openark.org/blog/mysql/two-storage-engines-different-plans-part-ii#comments</comments>
		<pubDate>Fri, 07 Nov 2008 17:55:08 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Execution plan]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[MyISAM]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=35</guid>
		<description><![CDATA[In Part I of this article, we have seen how the internal structure of the storage engine's index can affect an execution plan. We've seen that some plans are inherent to the way engines are implemented. We wish to present a second scenario in which execution plans vary for different storage engines. Again, we will [...]]]></description>
			<content:encoded><![CDATA[<p>In <a title=" Two storage engines; different plans, Part I" href="http://code.openark.org/blog/?p=9">Part I</a> of this article, we have seen how the internal structure of the storage engine's index can affect an execution plan. We've seen that some plans are inherent to the way engines are implemented.</p>
<p>We wish to present a second scenario in which execution plans vary for different storage engines. Again, we will consider MyISAM and InnoDB. Again, we will use the world database for testing. This time, we will see how confident the storage engines are in their index search capabilities.</p>
<p>Many newcomers to databases often believe that an index search is always preferable to full table scan. This is not the case. If I were to look for 10 rows in a 1,000,000 rows table, using an indexed column - I could benefit from an index search. However, if I’m looking for 200,000 rows on that table (that’s 20% of the rows) - an index search can actually be much more expensive than a full table scan.<span id="more-35"></span></p>
<p>There are several points to consider here: a full table scan is often close to sequential, whereas an index traversal is not. Not only are the index nodes stored non sequentially, but the links from the index to table data may look like a macaroni plate. Also, the index structure itself is a tree-structure, and it can be shown that the number of pages in the index can be larger than the number of pages in the table. Even for partial index scans, it may be worthwhile to simply scan the table.</p>
<p>The threshold above which table scan is preferred is somewhere between 10% and 30% in common DBMS.</p>
<p>We will consider here a scenario where we index a two-valued column, a simple ‘T’ and ‘F’ enum. “That’s a very poor column to index”, you may say. But what if the ratio between the two values is high? Say, 1000:1? Should there be different search plans for the ‘F’ valued rows and for the ‘T’ valued rows?</p>
<p>Let us duplicate the CountryLanguage table, and make it much larger. We will create a table named “cl”, with some 125K rows.</p>
<p><strong><code>mysql&gt; SHOW CREATE TABLE CountryLanguage \G<br />
*************************** 1. row ***************************<br />
Table: CountryLanguage<br />
Create Table: CREATE TABLE `CountryLanguage` (<br />
`CountryCode` char(3) NOT NULL default '',<br />
`Language` char(30) NOT NULL default '',<br />
`IsOfficial` enum('T','F') NOT NULL default 'F',<br />
`Percentage` float(4,1) NOT NULL default '0.0',<br />
PRIMARY KEY  (`CountryCode`,`Language`)<br />
) ENGINE=MyISAM DEFAULT CHARSET=latin1<br />
1 row in set (0.00 sec)</code></strong></p>
<p><strong><code>mysql&gt; CREATE TABLE cl SELECT * FROM CountryLanguage;<br />
Query OK, 984 rows affected (0.02 sec)<br />
Records: 984  Duplicates: 0  Warnings: 0</code></strong></p>
<p>And now make it very large:</p>
<p><strong><code>mysql&gt; INSERT INTO cl SELECT * FROM cl;<br />
Query OK, 984 rows affected (0.02 sec)<br />
Records: 984  Duplicates: 0  Warnings: 0</code></strong></p>
<p>…</p>
<p><strong><code>mysql&gt; INSERT INTO cl SELECT * FROM cl;<br />
Query OK, 62976 rows affected (0.08 sec)<br />
Records: 62976  Duplicates: 0  Warnings: 0</code></strong></p>
<p><strong><code>mysql&gt; UPDATE cl SET IsOfficial='F';<br />
Query OK, 1265 rows affected (0.23 sec)<br />
Rows matched: 125952  Changed: 1265  Warnings: 0</code></strong></p>
<p><strong><code>mysql&gt; UPDATE cl SET IsOfficial='T' WHERE RAND()&lt;0.001;<br />
Query OK, 148 rows affected (0.20 sec)<br />
Rows matched: 148  Changed: 148  Warnings: 0</code></strong></p>
<p>We have now a large table, where the majority of rows have ‘F’ values for ‘IsOfficial’, and the minority have ‘T’. We shall now add an index on this column, and will then make sure the table is in MyISAM (it may be created with another storage engine, depending on our default engine parameter).</p>
<p><strong><code>mysql&gt; ALTER TABLE cl ADD INDEX (IsOfficial);<br />
Query OK, 125952 rows affected (0.31 sec)<br />
Records: 125952  Duplicates: 0  Warnings: 0</code></strong></p>
<p><strong><code>mysql&gt; ALTER TABLE cl ENGINE=MyISAM;<br />
Query OK, 125952 rows affected (1.21 sec)<br />
Records: 125952  Duplicates: 0  Warnings: 0</code></strong></p>
<p>Now let us compare the search plans for ‘F’ and for ‘T’.</p>
<p><strong><code>mysql&gt; EXPLAIN SELECT * FROM cl WHERE IsOfficial='F' \G<br />
*************************** 1. row ***************************<br />
id: 1<br />
select_type: SIMPLE<br />
table: cl<br />
type: ALL<br />
possible_keys: IsOfficial<br />
key: NULL<br />
key_len: NULL<br />
ref: NULL<br />
rows: 94464<br />
Extra: Using where<br />
1 row in set (0.02 sec)</code></strong></p>
<p><strong><code>mysql&gt; EXPLAIN SELECT * FROM cl WHERE IsOfficial='T' \G<br />
*************************** 1. row ***************************<br />
id: 1<br />
select_type: SIMPLE<br />
table: cl<br />
type: ref<br />
possible_keys: IsOfficial<br />
key: IsOfficial<br />
key_len: 1<br />
ref: const<br />
rows: 138<br />
Extra: Using where<br />
1 row in set (0.00 sec)</code></strong></p>
<p>What MyISAM decided was that an index search on the ‘F’ rows is useless. A table scan was deemed to be preferable. However, for ‘T’ values rows, the index we created was just fine, and would indeed be used.</p>
<p>InnoDB will state differently.</p>
<p><strong><code>mysql&gt; ALTER TABLE cl ENGINE=InnoDB;<br />
Query OK, 125952 rows affected (1.07 sec)<br />
Records: 125952  Duplicates: 0  Warnings: 0</code></strong></p>
<p><strong><code>mysql&gt; EXPLAIN SELECT * FROM cl WHERE IsOfficial='F' \G<br />
*************************** 1. row ***************************<br />
id: 1<br />
select_type: SIMPLE<br />
table: cl<br />
type: ref<br />
possible_keys: IsOfficial<br />
key: IsOfficial<br />
key_len: 1<br />
ref: const<br />
rows: 61667<br />
Extra: Using where<br />
1 row in set (0.00 sec)</code></strong></p>
<p><strong><code>mysql&gt; EXPLAIN SELECT * FROM cl WHERE IsOfficial='T' \G<br />
*************************** 1. row ***************************<br />
id: 1<br />
select_type: SIMPLE<br />
table: cl<br />
type: ref<br />
possible_keys: IsOfficial<br />
key: IsOfficial<br />
key_len: 1<br />
ref: const<br />
rows: 148<br />
Extra: Using where<br />
1 row in set (0.00 sec)</code></strong></p>
<p>On the ‘T’ search, MyISAM and InnoDB agree. But look at the plan for the ‘F’ rows: InnoDB still prefers an index search to table scan, even though it estimates a lookup on 50% of the rows.</p>
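<p>If you want to see what the alternative plan would look like under InnoDB, one sketch is to use an index hint, so that the optimizer disregards the index (it is named IsOfficial, as seen in the "key" column of the plans):</p>
<p><strong><code>mysql&gt; EXPLAIN SELECT * FROM cl IGNORE INDEX (IsOfficial) WHERE IsOfficial='F' \G</code></strong></p>
<p>With the index ignored, the plan should fall back to a full table scan, which makes it possible to time both variants of the query and compare.</p>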
<p>The behavior just exposed is not entirely consistent. InnoDB and MyISAM differ in the way they update the index statistics. While ANALYZE TABLE on MyISAM performs an exhaustive search on index values, InnoDB will only do 10 random test dives and return with a rough calculation. In fact, InnoDB’s estimations can greatly vary from the real value distribution, and successive calls to ANALYZE TABLE can produce varying results.</p>
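<p>To watch this yourself, a simple sketch is to refresh the statistics and then look at the "Cardinality" column reported for the index. On InnoDB, repeating this pair of statements may show different numbers from run to run:</p>
<p><strong><code>mysql&gt; ANALYZE TABLE cl;<br />
mysql&gt; SHOW INDEX FROM cl \G</code></strong></p>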
<p>What has been presented in this part is not a rule to live by. You shouldn’t base your queries or expected behavior on the index distribution or search plan calculated by the storage engine. These may change in time. What’s instructive here is the freedom MySQL gives the storage engines in decision making, and the different actions taken when dealing with different engines.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/two-storage-engines-different-plans-part-ii/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Two storage engines; different plans, Part I</title>
		<link>http://code.openark.org/blog/mysql/two-storage-engines-different-plans-part-i</link>
		<comments>http://code.openark.org/blog/mysql/two-storage-engines-different-plans-part-i#comments</comments>
		<pubDate>Sat, 01 Nov 2008 16:36:29 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Execution plan]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[MyISAM]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=9</guid>
		<description><![CDATA[A popping question is: "Can an execution plan change for different storage engines?" The answer is "Yes". I will present two such cases, where the MySQL optimizer will choose different execution plans, based on our choice of storage engine. We will consider MyISAM and InnoDB, the two most popular engines. The two differ in many [...]]]></description>
			<content:encoded><![CDATA[<p>A question that often pops up is: "Can an execution plan change for different storage engines?"</p>
<p>The answer is "Yes". I will present two such cases, where the MySQL optimizer will choose different execution plans, based on our choice of storage engine.</p>
<p>We will consider MyISAM and InnoDB, the two most popular engines. The two differ in many respects, and in particular, the way they implement indexes and statistics: two major players in the optimizer's point of view.<span id="more-9"></span></p>
<p>Let's start with the famous <em>world </em>database, available from <a title="http://dev.mysql.com/doc/world-setup/en/world-setup.html" href="http://dev.mysql.com/doc/world-setup/en/world-setup.html">dev.mysql.com</a>. All tables in this schema are defined as MyISAM. We will alter them between MyISAM and InnoDB as we go along.</p>
<p>A peek at the Country table reveals:</p>
<p><strong><code>mysql&gt; SHOW CREATE TABLE Country \G<br />
*************************** 1. row ***************************<br />
Table: Country<br />
Create Table: CREATE TABLE `Country` (<br />
`Code` char(3) NOT NULL default '',<br />
`Name` char(52) NOT NULL default '',<br />
`Continent` enum('Asia','Europe','North America','Africa','Oceania','Antarctica','South America') NOT NULL default 'Asia',<br />
...<br />
PRIMARY KEY  (`Code`)<br />
) ENGINE=MyISAM DEFAULT CHARSET=latin1<br />
1 row in set (0.00 sec)</code></strong></p>
<p>To see the first example of execution plan difference, we will add an index on the Country table:</p>
<p><strong><code>ALTER TABLE Country ADD INDEX (Continent);</code></strong></p>
<p>And run the following query to find European country codes:</p>
<p><strong><code>mysql&gt; SELECT Code FROM Country WHERE Continent = 'Europe';<br />
+------+<br />
| Code |<br />
+------+<br />
| NLD  |<br />
| ALB  |<br />
| AND  |<br />
| BEL  |<br />
| BIH  |<br />
| GBR  |<br />
...</code></strong></p>
<p>But how is this query executed?</p>
<p><strong><code>mysql&gt; EXPLAIN SELECT Code FROM Country WHERE Continent = 'Europe'\G<br />
*************************** 1. row ***************************<br />
id: 1<br />
select_type: SIMPLE<br />
table: Country<br />
type: ref<br />
possible_keys: Continent<br />
key: Continent<br />
key_len: 1<br />
ref: const<br />
rows: 37<br />
Extra: Using where<br />
1 row in set (0.00 sec)</code></strong></p>
<p>Simple enough: we asked for European countries only. MySQL has found the index on Continent to be appropriate. However, to get the actual Code, a table row read was necessary.</p>
<p>InnoDB will provide a different plan, though:</p>
<p><strong><code>mysql&gt; ALTER TABLE Country ENGINE=InnoDB;<br />
Query OK, 239 rows affected (0.18 sec)<br />
Records: 239  Duplicates: 0  Warnings: 0</code></strong></p>
<p><strong><code>mysql&gt; EXPLAIN SELECT Code FROM Country WHERE Continent = 'Europe'\G<br />
*************************** 1. row ***************************<br />
id: 1<br />
select_type: SIMPLE<br />
table: Country<br />
type: ref<br />
possible_keys: Continent<br />
key: Continent<br />
key_len: 1<br />
ref: const<br />
rows: 46<br />
Extra: Using where; Using index<br />
1 row in set (0.00 sec)</code></strong></p>
<p>Can you spot the difference? The "Extra" column now indicates "Using index" (The numbers of expected rows also differ, but that's another issue).</p>
<p>The reason for this change lies with the way MyISAM and InnoDB implement indexes. MyISAM takes the approach where the table data resides in its own space (and in fact, its own file), and all indexes refer to rows in that space. MyISAM is using nonclustered indexes.</p>
<p>InnoDB, however, uses a clustered index on the PRIMARY KEY. That is, for every table there is always a PRIMARY KEY index (even if we never defined one), and table data is aggregated within the index structure. And so, to access table rows, one must first traverse the PRIMARY KEY index. This type of index is called a "clustered index". The Code column is the primary key, and therefore the data is clustered on the Code column.</p>
<p>InnoDB's secondary indexes behave altogether differently. A secondary index does not refer to the table rows directly, but instead refers to the PRIMARY KEY value, which relates to those rows. A table lookup using a secondary key involves a search on that key, only to get a PRIMARY KEY value, followed by a search on the clustered index as well. A side effect is that a secondary index includes the values of the PRIMARY KEY. Each secondary index, like the one we created on Continent, is somewhat a compound index, like on (Continent, Code). This is the reason that for our query, a search on the index was enough. There was no need to access table data, since all relevant data could be found within the index.</p>
<p>I say "somewhat", because in contrast with an index on (Continent, Code), the index does not necessarily store the PRIMARY KEY values in any particular order. To prove this, let's ask the following:</p>
<p><strong><code>mysql&gt; EXPLAIN SELECT Code FROM Country WHERE Continent = 'Europe' ORDER BY Code\G<br />
*************************** 1. row ***************************<br />
id: 1<br />
select_type: SIMPLE<br />
table: Country<br />
type: ref<br />
possible_keys: Continent<br />
key: Continent<br />
key_len: 1<br />
ref: const<br />
rows: 46<br />
Extra: Using where; Using index; Using filesort<br />
1 row in set (0.00 sec)</code></strong></p>
<p>There's a "Using filesort" comment in the "Extra" column, which would not be there had we used a compound index on (Continent, Code).</p>
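<p>To try the comparison, here is a sketch which adds an explicit compound index and asks for the plan again. The index name <code>continent_code</code> is my own choice for illustration:</p>
<p><strong><code>mysql&gt; ALTER TABLE Country ADD INDEX continent_code (Continent, Code);<br />
mysql&gt; EXPLAIN SELECT Code FROM Country WHERE Continent = 'Europe' ORDER BY Code\G</code></strong></p>
<p>If the optimizer picks the new index, rows are read already sorted by Code within each Continent value, and no "Using filesort" is expected. Should it still prefer the single column index, a <strong><code>FORCE INDEX (continent_code)</code></strong> hint right after the table name lets you compare the two plans.</p>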
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/two-storage-engines-different-plans-part-i/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

