{"id":4561,"date":"2011-12-12T09:35:19","date_gmt":"2011-12-12T07:35:19","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=4561"},"modified":"2011-12-12T15:06:09","modified_gmt":"2011-12-12T13:06:09","slug":"information_schema-optimizations-still-crashing-my-servers","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/information_schema-optimizations-still-crashing-my-servers","title":{"rendered":"INFORMATION_SCHEMA Optimizations: still crashing my servers"},"content":{"rendered":"<p><strong>[Update<\/strong>: need to take more breaks: now<strong> NOT<\/strong> crashing my servers! See clarifications below<strong>]<\/strong><\/p>\n<p><a href=\"http:\/\/dev.mysql.com\/doc\/refman\/5.1\/en\/information-schema-optimization.html\">INFORMATION_SCHEMA Optimizations<\/a> are meant to make your <strong>INFORMATION_SCHEMA<\/strong> queries lighter and safer.<\/p>\n<p>For example, if you&#8217;re going to query the <strong>COLUMNS<\/strong> table for just the columns of a single table, then the following:<\/p>\n<blockquote>\n<pre>SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA='sakila' AND TABLE_NAME='rental'<\/pre>\n<\/blockquote>\n<p>makes for an optimization: specifying a literal on <strong>TABLE_SCHEMA<\/strong> avoid scanning the directories of other schemata. Specifying a literal on <strong>TABLE_NAME<\/strong> avoids checking up on other tables. So it&#8217;s a one-schema-one-table read operation, as opposed to <em>&#8220;first read every single column from all and any single schema and table, then return only those I&#8217;m interested in&#8221;<\/em>.<\/p>\n<p>Here&#8217;s the execution plan for the above query:<\/p>\n<blockquote>\n<pre>*************************** 1. row ***************************\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 id: 1\r\n\u00a0 select_type: SIMPLE\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 table: COLUMNS\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 type: ALL\r\npossible_keys: NULL\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 key: TABLE_SCHEMA,TABLE_NAME\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 key_len: NULL\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ref: NULL\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 rows: NULL\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Extra: Using where; Open_frm_only; Scanned 0 databases<\/pre>\n<\/blockquote>\n<p>What I tried to do is to read the entire <strong>COLUMNS<\/strong> table, one schema at a time, one table at a time. I&#8217;m good with this taking longer time.<\/p>\n<p>I have a production system on which reads from <strong>COLUMNS<\/strong> <em>consistently crash the servers<\/em>. Well, one read at a time can&#8217;t do harm, right?<!--more--><\/p>\n<p><del>Unfortunately, as the title of this posts reveals, even sequential read of <strong>COLUMNS<\/strong> using <strong>INFORMATION_SCHEMA<\/strong> optimization does not help: a minute into the process and the client lost connection. The server crashed.<\/del><\/p>\n<p><del>I was expecting that table locks would be released, buffers released etc. One at a time, there wouldn&#8217;t be a congestion of locks, reads, table cache suffocation etc.<\/del><\/p>\n<p><del>Was actually having high hopes for this to succeed. I have to find a way in which <strong>INFORMATION_SCHEMA<\/strong> tables are not dangerous.<\/del><\/p>\n<p>A few hours later, and I have both conclusions and achievements.<\/p>\n<p>There are indeed memory issues with querying from <strong>INFORMATION_SCHEMA<\/strong> tables. I&#8217;ve found that <strong>VARCHAR(64)<\/strong> columns can consume <strong>64K<\/strong> each: I&#8217;m reading from large tables of more than <strong>1,000<\/strong> columns each, while monitoring MySQL&#8217;s memory consumption. By dividing the increase in memory by the number of rows resulting from a query I sent, and which was for one single columns, I got an almost exact <strong>64K<\/strong> value per row.<\/p>\n<p>So a query on <strong>INFORMATION_SCHEMA<\/strong> consumes much more memory than it should. The good news is that this memory is released once the query terminates. So there is no leak into the session memory.<\/p>\n<p>This is combined with a <em>mistake of mine<\/em> in the way I iterated the tables, such that the problem was amplified: I happened to query much more than I needed, and so got my query&#8217;s memory bloated. That is to say, I used the <strong>INFORMATION_SCHEMA<\/strong> optimizations only partly right, and so got only part of the savings it could offer me.<\/p>\n<p>With better pinpointing I&#8217;m now actually able to read from <strong>COLUMNS,<\/strong> without crashing my servers, <em>consistently<\/em>.<\/p>\n<p>I will further look into the <strong>64K<\/strong> issue. That in itself still drains a lot of memory: on my <a href=\"http:\/\/code.openark.org\/forge\/mycheckpoint\">mycheckpoint<\/a> schema tables a singe table read means &gt; <strong>64MB<\/strong> of query memory down the drain.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>[Update: need to take more breaks: now NOT crashing my servers! See clarifications below] INFORMATION_SCHEMA Optimizations are meant to make your INFORMATION_SCHEMA queries lighter and safer. For example, if you&#8217;re going to query the COLUMNS table for just the columns of a single table, then the following: SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA=&#8217;sakila&#8217; AND TABLE_NAME=&#8217;rental&#8217; [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[24],"class_list":["post-4561","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-information_schema"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1bz","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/4561","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=4561"}],"version-history":[{"count":5,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/4561\/revisions"}],"predecessor-version":[{"id":4565,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/4561\/revisions\/4565"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=4561"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=4561"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=4561"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}