{"id":35,"date":"2008-11-07T19:55:08","date_gmt":"2008-11-07T17:55:08","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=35"},"modified":"2008-12-13T06:47:21","modified_gmt":"2008-12-13T04:47:21","slug":"two-storage-engines-different-plans-part-ii","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/two-storage-engines-different-plans-part-ii","title":{"rendered":"Two storage engines; different plans, Part II"},"content":{"rendered":"<p>In <a title=\" Two storage engines; different plans, Part I\" href=\"http:\/\/code.openark.org\/blog\/?p=9\">Part I<\/a> of this article, we have seen how the internal structure of the storage engine&#8217;s index can affect an execution plan. We&#8217;ve seen that some plans are inherent to the way engines are implemented.<\/p>\n<p>We wish to present a second scenario in which execution plans vary for different storage engines. Again, we will consider MyISAM and InnoDB. Again, we will use the world database for testing. This time, we will see how confident the storage engines are in their index search capabilities.<\/p>\n<p>Many newcomers to databases often believe that an index search is always preferable to full table scan. This is not the case. If I were to look for 10 rows in a 1,000,000 rows table, using an indexed column &#8211; I could benefit from an index search. However, if I\u2019m looking for 200,000 rows on that table (that\u2019s 20% of the rows) &#8211; an index search can actually be much more expensive than a full table scan.<!--more--><\/p>\n<p>There are several points to consider here: a full table scan is often close to sequential, whereas an index traversal is not. Not only are the index nodes stored non sequentially, but the links from the index to table data may look like a macaroni plate. Also, the index structure itself is a tree-structure, and it can be shown that the number of pages in the index can be larger than the number of pages in the table. Even for partial index scans, it may be worthwhile to simply scan the table.<\/p>\n<p>The threshold above which table scan is preferred is somewhere between 10% and 30% in common DBMS.<\/p>\n<p>We will consider here a scenario where we index a two-valued column, a simple \u2018T\u2019 and \u2018F\u2019 enum. \u201cThat\u2019s a very poor column to index\u201d, you may say. But what if the ratio between the two values is high? Say, 1000:1? Should there be different search plans for the \u2018F\u2019 valued rows and for the \u2018T\u2019 valued rows?<\/p>\n<p>Let us duplicate the CountryLanguage table, and make it much larger. We will create a table named \u201ccl\u201d, with some 125K rows.<\/p>\n<p><strong><code>mysql&gt; SHOW CREATE TABLE CountryLanguage \\G<br \/>\n*************************** 1. row ***************************<br \/>\nTable: CountryLanguage<br \/>\nCreate Table: CREATE TABLE `CountryLanguage` (<br \/>\n`CountryCode` char(3) NOT NULL default '',<br \/>\n`Language` char(30) NOT NULL default '',<br \/>\n`IsOfficial` enum('T','F') NOT NULL default 'F',<br \/>\n`Percentage` float(4,1) NOT NULL default '0.0',<br \/>\nPRIMARY KEY\u00a0 (`CountryCode`,`Language`)<br \/>\n) ENGINE=MyISAM DEFAULT CHARSET=latin1<br \/>\n1 row in set (0.00 sec)<\/code><\/strong><\/p>\n<p><strong><code>mysql&gt; CREATE TABLE cl SELECT * FROM CountryLanguage;<br \/>\nQuery OK, 984 rows affected (0.02 sec)<br \/>\nRecords: 984\u00a0 Duplicates: 0\u00a0 Warnings: 0<\/code><\/strong><\/p>\n<p>And now make it very large:<\/p>\n<p><strong><code>mysql&gt; INSERT INTO cl SELECT * FROM cl;<br \/>\nQuery OK, 984 rows affected (0.02 sec)<br \/>\nRecords: 984\u00a0 Duplicates: 0\u00a0 Warnings: 0<\/code><\/strong><\/p>\n<p>\u2026<\/p>\n<p><strong><code>mysql&gt; INSERT INTO cl SELECT * FROM cl;<br \/>\nQuery OK, 62976 rows affected (0.08 sec)<br \/>\nRecords: 62976\u00a0 Duplicates: 0\u00a0 Warnings: 0<\/code><\/strong><\/p>\n<p><strong><code>mysql&gt; UPDATE cl SET IsOfficial='F';<br \/>\nQuery OK, 1265 rows affected (0.23 sec)<br \/>\nRows matched: 125952\u00a0 Changed: 1265\u00a0 Warnings: 0<\/code><\/strong><\/p>\n<p><strong><code>mysql&gt; UPDATE cl SET IsOfficial='T' WHERE RAND()&lt;0.001;<br \/>\nQuery OK, 148 rows affected (0.20 sec)<br \/>\nRows matched: 148\u00a0 Changed: 148\u00a0 Warnings: 0<\/code><\/strong><\/p>\n<p>We have now a large table, where the majority of rows have \u2018F\u2019 values for \u2018IsOfficial\u2019, and the minority have \u2018T\u2019. We shall now add an index on this column, and will then make sure the table is in MyISAM (it may be created with another storage engine, depending on our default engine parameter).<\/p>\n<p><strong><code>mysql&gt; ALTER TABLE cl ADD INDEX (IsOfficial);<br \/>\nQuery OK, 125952 rows affected (0.31 sec)<br \/>\nRecords: 125952\u00a0 Duplicates: 0\u00a0 Warnings: 0<\/code><\/strong><\/p>\n<p><strong><code>mysql&gt; ALTER TABLE cl ENGINE=MyISAM;<br \/>\nQuery OK, 125952 rows affected (1.21 sec)<br \/>\nRecords: 125952\u00a0 Duplicates: 0\u00a0 Warnings: 0<\/code><\/strong><\/p>\n<p>Now let us compare the search plans for \u2018F\u2019 and for \u2018T\u2019.<\/p>\n<p><strong><code>mysql&gt; EXPLAIN SELECT * FROM cl WHERE IsOfficial='F' \\G<br \/>\n*************************** 1. row ***************************<br \/>\nid: 1<br \/>\nselect_type: SIMPLE<br \/>\ntable: cl<br \/>\ntype: ALL<br \/>\npossible_keys: IsOfficial<br \/>\nkey: NULL<br \/>\nkey_len: NULL<br \/>\nref: NULL<br \/>\nrows: 94464<br \/>\nExtra: Using where<br \/>\n1 row in set (0.02 sec)<\/code><\/strong><\/p>\n<p><strong><code>mysql&gt; EXPLAIN SELECT * FROM cl WHERE IsOfficial='T' \\G<br \/>\n*************************** 1. row ***************************<br \/>\nid: 1<br \/>\nselect_type: SIMPLE<br \/>\ntable: cl<br \/>\ntype: ref<br \/>\npossible_keys: IsOfficial<br \/>\nkey: IsOfficial<br \/>\nkey_len: 1<br \/>\nref: const<br \/>\nrows: 138<br \/>\nExtra: Using where<br \/>\n1 row in set (0.00 sec)<\/code><\/strong><\/p>\n<p>What MyISAM decided was that an index search on the \u2018F\u2019 rows is useless. A table scan was deemed to be preferable. However, for \u2018T\u2019 values rows, the index we created was just fine, and would indeed be used.<\/p>\n<p>InnoDB will state differently.<\/p>\n<p><strong><code>mysql&gt; ALTER TABLE cl ENGINE=InnoDB;<br \/>\nQuery OK, 125952 rows affected (1.07 sec)<br \/>\nRecords: 125952\u00a0 Duplicates: 0\u00a0 Warnings: 0<\/code><\/strong><\/p>\n<p><strong><code>mysql&gt; EXPLAIN SELECT * FROM cl WHERE IsOfficial='F' \\G<br \/>\n*************************** 1. row ***************************<br \/>\nid: 1<br \/>\nselect_type: SIMPLE<br \/>\ntable: cl<br \/>\ntype: ref<br \/>\npossible_keys: IsOfficial<br \/>\nkey: IsOfficial<br \/>\nkey_len: 1<br \/>\nref: const<br \/>\nrows: 61667<br \/>\nExtra: Using where<br \/>\n1 row in set (0.00 sec)<\/code><\/strong><\/p>\n<p><strong><code>mysql&gt; EXPLAIN SELECT * FROM cl WHERE IsOfficial='T' \\G<br \/>\n*************************** 1. row ***************************<br \/>\nid: 1<br \/>\nselect_type: SIMPLE<br \/>\ntable: cl<br \/>\ntype: ref<br \/>\npossible_keys: IsOfficial<br \/>\nkey: IsOfficial<br \/>\nkey_len: 1<br \/>\nref: const<br \/>\nrows: 148<br \/>\nExtra: Using where<br \/>\n1 row in set (0.00 sec)<br \/>\n<\/code><br \/>\n<\/strong>On the \u2018T\u2019 search, MyISAM and InnoDB agree. But look at the plan for the \u2018F\u2019 rows: InnoDB still prefers an index search to table scan, even though it estimates a lookup on 50% of the rows.<\/p>\n<p>The behavior just exposed is not entirely consistent. InnoDB and MyISAM differ in the way they update the index statistics. While ANALYZE TABLE on MyISAM performs an exaustive search on index values, InnoDB will only do 10 random test dives and return with a rough calculation. In fact, InnDB\u2019s estimations can greatly vary from the real values distribution, and successive calls to ANALYZE table can produce varying results.<\/p>\n<p>What has been presented in this part is not a rule to live by. You shouldn\u2019t base your queries or expected behavior on the index distribution or search plan calculated by the storage engine. These may change in time. What\u2019s instructive here is the freedom MySQL gives the storage engines in decision making, and the different actions taken when dealing with different engines.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In Part I of this article, we have seen how the internal structure of the storage engine&#8217;s index can affect an execution plan. We&#8217;ve seen that some plans are inherent to the way engines are implemented. We wish to present a second scenario in which execution plans vary for different storage engines. Again, we will [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[15,26,14,13],"class_list":["post-35","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-execution-plan","tag-indexing","tag-innodb","tag-myisam"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-z","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/35","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=35"}],"version-history":[{"count":27,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/35\/revisions"}],"predecessor-version":[{"id":363,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/35\/revisions\/363"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=35"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=35"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=35"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}