{"id":4353,"date":"2012-03-06T15:18:36","date_gmt":"2012-03-06T13:18:36","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=4353"},"modified":"2012-03-06T18:39:24","modified_gmt":"2012-03-06T16:39:24","slug":"auto-caching-tables","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/auto-caching-tables","title":{"rendered":"Auto caching tables"},"content":{"rendered":"<p>Is there a way to create a caching table, some sort of a materialized view, such that <em>upon selecting<\/em> from that table, its data is validated\/invalidated?<\/p>\n<p><em>Hint<\/em>: yes.<\/p>\n<p>But to elaborate the point: say I have some table <strong>data_table<\/strong>. Can I rewrite all my queries which access <strong>data_table<\/strong> to read from some <strong>autocache_data_table<\/strong>, but have nothing changed in the query itself? No caveats, no additional <strong>WHERE<\/strong>s, and still have that <strong>autocache_data_table<\/strong> provide with the correct data, dynamically updated by some rule <em>of our choice<\/em>?<\/p>\n<p>And: no <em>crontab<\/em>, no <em>event scheduler<\/em>, and no funny triggers on <strong>data_table<\/strong>? In such way that invalidation\/revalidation occurs <em>upon <strong>SELECT<\/strong><\/em>?<\/p>\n<p>Well, yes.<\/p>\n<p>This post is long, but I suggest you read it through to understand the mechanism, it will be worthwhile.<\/p>\n<h4>Background<\/h4>\n<p>The following derives from my long research on how to provide better, faster and <em>safer<\/em> access to <strong>INFORMATION_SCHEMA<\/strong> tables. It is however not limited to this exact scenario, and in this post I provide with a simple, general purpose example. I&#8217;ll have more to share about <strong>INFORMATION_SCHEMA<\/strong> specific solutions shortly.<\/p>\n<p>I was looking for a server side solution which would not require query changes, apart from directing the query to other tables. Solution has to be supported by all standard MySQL installs; so: no plugins, no special rebuilds.<!--more--><\/p>\n<h4>Sample data<\/h4>\n<p>I&#8217;ll explain by walking through the solution. Let&#8217;s begin with some sample table:<\/p>\n<blockquote>\n<pre>CREATE TABLE sample_data (\r\n\u00a0 id INT UNSIGNED NOT NULL PRIMARY KEY,\r\n\u00a0 dt DATETIME,\r\n\u00a0 msg VARCHAR(128) CHARSET ascii\r\n);\r\n\r\nINSERT INTO sample_data VALUES (1, NOW(), 'sample txt');\r\nINSERT INTO sample_data VALUES (2, NOW(), 'sample txt');\r\nINSERT INTO sample_data VALUES (3, NOW(), 'sample txt');\r\n\r\nSELECT * FROM sample_data;\r\n+----+---------------------+------------+\r\n| id | dt\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | msg\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\r\n+----+---------------------+------------+\r\n|\u00a0 1 | 2011-11-24 11:01:30 | sample txt |\r\n|\u00a0 2 | 2011-11-24 11:01:30 | sample txt |\r\n|\u00a0 3 | 2011-11-24 11:01:30 | sample txt |\r\n+----+---------------------+------------+<\/pre>\n<\/blockquote>\n<p>In this simplistic example, I wish to create a construct which looks exactly like <strong>sample_data<\/strong>, but which caches data according to some heuristic. It will, in fact, cache the entire content of <strong>sample_data<\/strong>.<\/p>\n<p>That much is not a problem: just create another table to cache the data:<\/p>\n<blockquote>\n<pre>CREATE TABLE cache_sample_data LIKE sample_data;<\/pre>\n<\/blockquote>\n<p>The big question is: how do you make the table invalidate itself while <strong>SELECT<\/strong>ing from it?<\/p>\n<p>Here&#8217;s the deal. I&#8217;ll ask for your patience while I draw the outline, and start with failed solutions. By the end, everything will work.<\/p>\n<h4>Failed attempt: purge rows from the table even while reading it<\/h4>\n<p>My idea is to create a stored function which purges the <strong>cache_sample_data<\/strong> table, then fills in with fresh data, according to some heuristic. Something like this:<\/p>\n<blockquote>\n<pre>DELIMITER $$\r\n\r\nCREATE FUNCTION `revalidate_cache_sample_data`() RETURNS tinyint unsigned\r\n\u00a0\u00a0\u00a0 MODIFIES SQL DATA\r\n\u00a0\u00a0\u00a0 DETERMINISTIC\r\n\u00a0\u00a0\u00a0 SQL SECURITY INVOKER\r\nBEGIN\r\n\u00a0 if(rand() &gt; 0.1) then\r\n\u00a0\u00a0\u00a0 return 0; -- simplistic heuristic\r\n\u00a0 end if;\r\n\r\n\u00a0 DELETE FROM cache_sample_data;\r\n\u00a0 INSERT INTO cache_sample_data SELECT * FROM sample_data;\r\n\u00a0 RETURN 0;\r\nEND $$\r\n\r\nDELIMITER ;<\/pre>\n<\/blockquote>\n<p>So the function uses some heuristic. It&#8217;s a funny <strong>RAND()<\/strong> in our case; you will want to check up on time stamps, or some flags, what have you. But this is not the important part here, and I want to keep the focus on the main logic.<\/p>\n<p>Upon deciding the table needs refreshing, the function purges all rows, then copies everything from <strong>sample_data<\/strong>. Sounds fair enough?<\/p>\n<p>Let&#8217;s try and invoke it. Just write some query by hand:<\/p>\n<blockquote>\n<pre>mysql&gt; SELECT revalidate_cache_sample_data();\r\n+--------------------------------+\r\n| revalidate_cache_sample_data() |\r\n+--------------------------------+\r\n|\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <strong>0<\/strong> |\r\n+--------------------------------+\r\n\r\nmysql&gt; SELECT revalidate_cache_sample_data();\r\n+--------------------------------+\r\n| revalidate_cache_sample_data() |\r\n+--------------------------------+\r\n|\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <strong>0<\/strong> |\r\n+--------------------------------+\r\n\r\nmysql&gt; SELECT revalidate_cache_sample_data();\r\n+--------------------------------+\r\n| revalidate_cache_sample_data() |\r\n+--------------------------------+\r\n|\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <strong>1<\/strong> |\r\n+--------------------------------+<\/pre>\n<\/blockquote>\n<p>First two invocations &#8211; nothing. The third one indicated a revalidation of cache data. Let&#8217;s verify:<\/p>\n<blockquote>\n<pre>mysql&gt; SELECT * FROM cache_sample_data;\r\n+----+---------------------+------------+\r\n| id | dt\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | msg\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\r\n+----+---------------------+------------+\r\n|\u00a0 1 | 2011-11-24 11:01:30 | sample txt |\r\n|\u00a0 2 | 2011-11-24 11:01:30 | sample txt |\r\n|\u00a0 3 | 2011-11-24 11:01:30 | sample txt |\r\n+----+---------------------+------------+<\/pre>\n<\/blockquote>\n<p>OK, seems like the function works.<\/p>\n<p>We now gather some courage, and try combining calling to this function even while SELECTing from the cache table, like this:<\/p>\n<blockquote>\n<pre>SELECT\r\n\u00a0 cache_sample_data.*\r\nFROM\r\n\u00a0 cache_sample_data,\r\n\u00a0 (SELECT revalidate_cache_sample_data()) AS select_revalidate\r\n;\r\n+----+---------------------+------------+\r\n| id | dt\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | msg\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\r\n+----+---------------------+------------+\r\n|\u00a0 1 | 2011-11-24 11:01:30 | sample txt |\r\n|\u00a0 2 | 2011-11-24 11:01:30 | sample txt |\r\n|\u00a0 3 | 2011-11-24 11:01:30 | sample txt |\r\n+----+---------------------+------------+<\/pre>\n<\/blockquote>\n<p>To explain what happens in the above query, consider its <a href=\"http:\/\/code.openark.org\/blog\/mysql\/slides-from-my-talk-programmatic-queries-things-you-can-code-with-sql\">programmatic nature<\/a>: we create a derived table, populated by the function&#8217;s result. That means the function is invoked in order to generate the derived table. The derived table itself must be materialized before the query begins execution, and so it is that we first invoke the function, then make the <strong>SELECT<\/strong>.<\/p>\n<p>Don&#8217;t open the champagne yet. While the above paragraph is correct, we are deceived: in this last invocation, the function did <strong>not<\/strong> attempt a revalidation. The <strong>RAND()<\/strong> function just didn&#8217;t provide with the right value.<\/p>\n<p>Let&#8217;s try again:<\/p>\n<blockquote>\n<pre>SELECT\r\n\u00a0 cache_sample_data.*\r\nFROM\r\n\u00a0 cache_sample_data,\r\n\u00a0 (SELECT revalidate_cache_sample_data()) AS select_revalidate\r\n;\r\n<strong>ERROR 1442 (HY000): Can't update table 'cache_sample_data' in stored function\/trigger because it is already used by statement which invoked this stored function\/trigger.<\/strong><\/pre>\n<\/blockquote>\n<p>Aha! Bad news. The MySQL manual says on <a href=\"http:\/\/dev.mysql.com\/doc\/refman\/5.1\/en\/stored-program-restrictions.html\">Restrictions on Stored Programs<\/a>:<\/p>\n<blockquote><p>A stored function or trigger cannot modify a table that is already being used (for reading or writing) by the statement that invoked the function or trigger.<\/p><\/blockquote>\n<h4>Anyone to the rescue?<\/h4>\n<p>I was quite upset. Can we not make this work? At sorrow times like these, one reflects back on words of wiser people. What would <a href=\"http:\/\/rpbouman.blogspot.com\/\">Roland Bouman<\/a> say on this?<\/p>\n<p>Oh, yes; he would say: <em>&#8220;we can use a <strong>FEDERATED<\/strong> table which connect onto itself, thus bypass the above restriction&#8221;<\/em>.<\/p>\n<p>Unfortunately, <strong>FEDERATED<\/strong> is by default disabled nowadays; I cannot rely on its existence. Besides, to use <strong>FEDERATED<\/strong> one has to fill in passwords and stuff. Definitely not an out-of-the-box solution in this case.<\/p>\n<p>Few more days gone by. Decided the problem cannot be solved. And then it hit me.<\/p>\n<h4>MyISAM to the rescue<\/h4>\n<p><em><strong>MyISAM<\/strong><\/em>? Really?<\/p>\n<p>Yes, and not only <strong>MyISAM<\/strong>, but also its cousin: it&#8217;s long abandoned cousin, forgotten once <strong>views<\/strong> and <strong>partitions<\/strong> came into MySQL. <strong><a href=\"http:\/\/dev.mysql.com\/doc\/refman\/5.1\/en\/merge-storage-engine.html\">MERGE<\/a><\/strong>.<\/p>\n<p><strong>MERGE<\/strong> reflects the data contained within <strong>MyISAM<\/strong> tables. Perhaps the most common use for <strong>MERGE<\/strong> is to work out partitioned-like table of records, with <strong>MyISAM<\/strong> table-per month, and an overlooking <strong>MERGE<\/strong> table dynamically adding and removing tables from its view.<\/p>\n<p>But I intend for <strong>MERGE<\/strong> a different use: just be an identical reflection of <strong>cache_sample_data<\/strong>.<\/p>\n<p>So we must work out the following:<\/p>\n<blockquote>\n<pre>ALTER TABLE <strong>cache_sample_data<\/strong> ENGINE=<strong>MyISAM<\/strong>;\r\nCREATE TABLE <strong>cache_sample_data_wrapper<\/strong> LIKE cache_sample_data;\r\nALTER TABLE <strong>cache_sample_data_wrapper<\/strong> ENGINE=<strong>MERGE<\/strong> <strong>UNION=(cache_sample_data)<\/strong>;<\/pre>\n<\/blockquote>\n<p>I just want to verify the new table is setup correctly:<\/p>\n<blockquote>\n<pre>mysql&gt; SELECT * FROM cache_sample_data_wrapper;\r\n+----+---------------------+------------+\r\n| id | dt\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | msg\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\r\n+----+---------------------+------------+\r\n|\u00a0 1 | 2011-11-24 11:01:30 | sample txt |\r\n|\u00a0 2 | 2011-11-24 11:01:30 | sample txt |\r\n|\u00a0 3 | 2011-11-24 11:01:30 | sample txt |\r\n+----+---------------------+------------+<\/pre>\n<\/blockquote>\n<p>Seems fine.<\/p>\n<p>So the next step is what makes the difference: the two tables are <em>not the same<\/em>. One <em>relies on the other<\/em>, but they are distinct. Our function <strong>DELETE<\/strong>s from and <strong>INSERT<\/strong>s to <strong>cached_sample_data<\/strong>, but it does <em>not affect, nor lock<\/em>, <strong>cache_sample_data_wrapper<\/strong>.<\/p>\n<p>We now rewrite our query to read:<\/p>\n<blockquote>\n<pre>SELECT\r\n\u00a0 cache_sample_data_wrapper.*\r\nFROM\r\n\u00a0 <strong>cache_sample_data_wrapper<\/strong>,\r\n\u00a0 (SELECT revalidate_cache_sample_data()) AS select_revalidate\r\n;<\/pre>\n<\/blockquote>\n<p>This query is perfectly valid. It works. To illustrate, I do:<\/p>\n<blockquote>\n<pre>-- Try this a few times till RAND() is lucky:\r\n\r\n<strong>TRUNCATE<\/strong> cache_sample_data;\r\n\r\nSELECT\r\n\u00a0 cache_sample_data_wrapper.*\r\nFROM\r\n\u00a0 cache_sample_data_wrapper,\r\n\u00a0 (SELECT revalidate_cache_sample_data()) AS select_revalidate\r\n;\r\n+----+---------------------+------------+\r\n| id | dt\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | msg\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\r\n+----+---------------------+------------+\r\n|\u00a0 1 | 2011-11-24 11:01:30 | sample txt |\r\n|\u00a0 2 | 2011-11-24 11:01:30 | sample txt |\r\n|\u00a0 3 | 2011-11-24 11:01:30 | sample txt |\r\n+----+---------------------+------------+<\/pre>\n<\/blockquote>\n<p>Whoa! Where did all this data come from? Didn&#8217;t we just <strong>TRUNCATE<\/strong> the table?<\/p>\n<p>The query worked. The function re-populated <strong>cache_sample_data<\/strong>.<\/p>\n<h4>The final touch<\/h4>\n<p>Isn&#8217;t the above query just <em>beautiful<\/em>? I suppose not many will share my opinion. What happened to my declaration that <em>&#8220;the original query need not be changed, apart from querying a different table&#8221;<\/em>?<\/p>\n<p>Yes, indeed. It&#8217;s now time for the final touch. There&#8217;s nothing amazing in this step, but we all know the way it is packaged is what makes the sale. We will now use <em>views<\/em>. We use two of them since a view must not contain a <em>subquery<\/em> in the <strong>FROM<\/strong> clause. Here goes:<\/p>\n<blockquote>\n<pre>CREATE OR REPLACE VIEW <strong>revalidate_cache_sample_data_view<\/strong> AS\r\n\u00a0 SELECT revalidate_cache_sample_data()\r\n;\r\n\r\nCREATE OR REPLACE VIEW <strong>autocache_sample_data<\/strong> AS\r\n\u00a0 SELECT\r\n\u00a0\u00a0\u00a0 cache_sample_data_wrapper.*\r\n\u00a0 FROM\r\n\u00a0\u00a0\u00a0 cache_sample_data_wrapper,\r\n\u00a0\u00a0\u00a0 revalidate_cache_sample_data_view\r\n;<\/pre>\n<\/blockquote>\n<p>And finally, we can make a very simple query like this:<\/p>\n<blockquote>\n<pre>SELECT * FROM <strong>autocache_sample_data<\/strong>;\r\n--\r\n-- <strong><span style=\"color: #ff9900;\">Magic in work now!<\/span><\/strong>\r\n--\r\n+----+---------------------+------------+\r\n| id | dt\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | msg\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\r\n+----+---------------------+------------+\r\n|\u00a0 1 | 2011-11-24 11:01:30 | sample txt |\r\n|\u00a0 2 | 2011-11-24 11:01:30 | sample txt |\r\n|\u00a0 3 | 2011-11-24 11:01:30 | sample txt |\r\n+----+---------------------+------------+<\/pre>\n<\/blockquote>\n<p>Much as we would query the original <strong>sample_data<\/strong> table.<\/p>\n<h4>Summary<\/h4>\n<p>So what have we got? A stored routine, a <strong>MyISAM<\/strong> table, a <strong>MERGE<\/strong> table and two views. Quite a lot of constructs just to cache a table! But a beautiful cache access: <em>plain old SQL queries<\/em>. The flow looks like this:<\/p>\n<blockquote><p><a href=\"http:\/\/code.openark.org\/blog\/wp-content\/uploads\/2011\/11\/autocache_flow.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-4463\" title=\"autocache flow chart\" src=\"http:\/\/code.openark.org\/blog\/wp-content\/uploads\/2011\/11\/autocache_flow.png\" alt=\"\" width=\"835\" height=\"625\" srcset=\"https:\/\/code.openark.org\/blog\/wp-content\/uploads\/2011\/11\/autocache_flow.png 835w, https:\/\/code.openark.org\/blog\/wp-content\/uploads\/2011\/11\/autocache_flow-300x224.png 300w\" sizes=\"auto, (max-width: 835px) 100vw, 835px\" \/><\/a><\/p><\/blockquote>\n<p>Our cache table is a <strong>MyISAM<\/strong> table. It can get corrupted, which is bad. But not completely bad: it&#8217;s nothing more than a cache; we can throw away its entire data, and revalidate. We can actually ask the function to revalidate (say, pass a parameter).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Is there a way to create a caching table, some sort of a materialized view, such that upon selecting from that table, its data is validated\/invalidated? Hint: yes. But to elaborate the point: say I have some table data_table. Can I rewrite all my queries which access data_table to read from some autocache_data_table, but have [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[74,13,21,59,66],"class_list":["post-4353","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-hack","tag-myisam","tag-sql","tag-stored-routines","tag-views"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-18d","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/4353","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=4353"}],"version-history":[{"count":39,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/4353\/revisions"}],"predecessor-version":[{"id":4756,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/4353\/revisions\/4756"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=4353"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=4353"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=4353"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}