{"id":4761,"date":"2012-03-08T20:31:56","date_gmt":"2012-03-08T18:31:56","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=4761"},"modified":"2012-03-08T20:32:54","modified_gmt":"2012-03-08T18:32:54","slug":"auto-caching-information_schema-tables-seeking-input","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/auto-caching-information_schema-tables-seeking-input","title":{"rendered":"Auto caching INFORMATION_SCHEMA tables: seeking input"},"content":{"rendered":"<h4>The short version<\/h4>\n<p>I have it all working. It&#8217;s kind of magic. But there are issues, and I&#8217;m not sure it should even exist, and am looking for input.<\/p>\n<h4>The long version<\/h4>\n<p>In <a title=\"Link to Auto caching tables\" href=\"http:\/\/code.openark.org\/blog\/mysql\/auto-caching-tables\" rel=\"bookmark\">Auto caching tables<\/a> I presented with a hack which allows getting cached or fresh results via a simple SELECT queries.<\/p>\n<p>The drive for the above hack was <strong>INFORMATION_SCHEMA<\/strong> tables. There are two major problems with <strong>INFORMATION_SCHEMA<\/strong>:<\/p>\n<ol>\n<li>Queries on schema-oriented tables such as <strong>TABLES<\/strong>, <strong>COLUMNS<\/strong>, <strong>STATISTICS<\/strong>, etc. are heavyweight. How heavyweight? Enough to make a lockdown of your database. Enough to crash down your database in some cases.<\/li>\n<li>The data is always generated on-the-fly, as you request it. Query the <strong>COLUMNS<\/strong> table twice, and risk two lockdowns of your database.<\/li>\n<\/ol>\n<p>The auto-cache mechanism solves issue <strong>#2<\/strong>. I have it working, time based. I have an auto-cache table for each of the <strong>INFORMATION_SCHEMA<\/strong> heavyweight tables. Say, every <strong>30<\/strong> minutes the cache is invalidated. Throughout those <strong>30<\/strong> minutes, you get a free pass!<\/p>\n<p>The auto-cache mechanism also paves the road to solving issue <strong>#1<\/strong>: since it works by invoking a stored routine, I have better control of the way I read <strong>INFORMATION_SCHEMA<\/strong>. This, I can take advantage of <a href=\"http:\/\/dev.mysql.com\/doc\/refman\/5.1\/en\/information-schema-optimization.html\">INFORMATION_SCHEMA optimization<\/a>. It&#8217;s tedious, but not complicated.<\/p>\n<p>For example, if I wanted to cache the <strong>TABLES<\/strong> table, I don&#8217;t necessarily read the entire <strong>TABLES<\/strong> data in one read. Instead, I can iterate the schemata, get a list of table names per schema, then read full row data for these, table by table. The result? Many many more <strong>SELECT<\/strong>s, but more optimized, and no one-big-lock-it-all query.<\/p>\n<h4>And the problem is&#8230;<\/h4>\n<p><!--more-->I have two burning problems.<\/p>\n<ol>\n<li><strong>INFORMATION_SCHEMA<\/strong> optimization only works <em>that much<\/em>. It sometimes does not work. In particular, I&#8217;ve noticed that if you have a view which relies on another view (possibly relying on yet another view), things get out of hand. I author a monitoring tool for MySQL called <a href=\"http:\/\/code.openark.org\/forge\/mycheckpoint\/\">mycheckpoint<\/a>. It uses some fancy techniques for generating aggregated data, HTML and charts, by means of nested views. There are a few views there I can never query for in <strong>COLUMNS<\/strong>. It just crashes down my server. Repeatedly. And it&#8217;s a good machine with good configuration. Make that <strong>5<\/strong> machines. They all crash, repeatedly. I just can&#8217;t trust <strong>INFORMATION_SCHEMA<\/strong>!<\/li>\n<li>Replication: any caching table is bound to replicate. Does it make any sense to replicate cache for internal metadata? Does it make sense to query for the cached table on slave, to have it answer for <em>master&#8217;<\/em>s data? With plain old <strong>INFORMATION_SCHEMA<\/strong>, every server is on its own. Caching kinda works against this. Or is it fair enough, since we would usually expect master\/slaves to reflect same schema structure?<\/li>\n<\/ol>\n<p>I would feel much better if I could read <strong>SHOW<\/strong> statements with a <strong>SELECT<\/strong> query. Though I&#8217;ve found this <a href=\"http:\/\/code.openark.org\/blog\/mysql\/reading-results-of-show-statements-on-server-side\">nice hack<\/a>, it can&#8217;t work from a stored function, only via stored procedure. So it can&#8217;t be used from within a <strong>SELECT<\/strong> query. I&#8217;ve been banging my head for months now, I think I gave up on this one.<\/p>\n<p>Any insights are welcome!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The short version I have it all working. It&#8217;s kind of magic. But there are issues, and I&#8217;m not sure it should even exist, and am looking for input. The long version In Auto caching tables I presented with a hack which allows getting cached or fresh results via a simple SELECT queries. The drive [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[74,24,8],"class_list":["post-4761","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-hack","tag-information_schema","tag-replication"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1eN","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/4761","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=4761"}],"version-history":[{"count":7,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/4761\/revisions"}],"predecessor-version":[{"id":4768,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/4761\/revisions\/4768"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=4761"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=4761"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=4761"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}