{"id":89,"date":"2008-12-15T07:56:14","date_gmt":"2008-12-15T05:56:14","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=89"},"modified":"2008-12-15T08:27:35","modified_gmt":"2008-12-15T06:27:35","slug":"using-memcached-functions-for-mysql-an-automated-alternative-to-query-cache","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/using-memcached-functions-for-mysql-an-automated-alternative-to-query-cache","title":{"rendered":"Using memcached functions for MySQL; an automated alternative to Query Cache"},"content":{"rendered":"<p>There&#8217;s a lot of buzz around memcached. memcached is widely used, and has clients for many programming languages and platforms. <a href=\"http:\/\/tangent.org\/\">TangentOrg<\/a> have developed a memcached client in the form of MySQL UDFs (User Defined Functions).<\/p>\n<p>I wish to discuss the memcached functions for MySQL: if and how they should be used.<\/p>\n<p>Disclaimer: I do not work with memcached functions for MySQL on a production system; all that is written here reflects my opinion on how things should be done.<\/p>\n<p>With memcached functions for MySQL, we can do the following:<\/p>\n<blockquote>\n<pre><strong>SELECT <\/strong>memc_set('mykey', 'The answer is 42');\r\n<strong>SELECT <\/strong>memc_get('mykey');<\/pre>\n<\/blockquote>\n<p>(See my previous post on how to <a title=\"Installing memcached functions for MySQL\" href=\"http:\/\/code.openark.org\/blog\/mysql\/installing-memcached-functions-for-mysql\">install memcached functions for MySQL<\/a>).<\/p>\n<h4>In what scenario should we use these functions?<\/h4>\n<p>I believe memcached is the right tool for the application level. I am less enthusiastic about using it from MySQL. Sure, pushing it down to MySQL centralizes everything. Instead of having all my application code (PHP, Java etc.) access memcached separately, they can all access one single MySQL node, which gets to access memcached. I see two problems with this approach:<!--more--><\/p>\n<ul>\n<li>Doing this adds load on the database. I think the greatest advantage of memcached is that it allows us to alleviate load from the database. By pushing everything into MySQL we counter that benefit. We pay here both for loading the MySQL network and for the CPU consumed by MySQL to do the job. In a distributed application which used memcached, every server gets to take some of the load.<\/li>\n<li>It seems to me as a flawed design. The database should be at an end point, and should not rely on anything except the operating system, file system and network. Sure, there could be applications talking to the database, but the database should be able to work all by itself. By putting memcached <em>behind<\/em> the database, we make the database dependent upon an external application.<\/li>\n<\/ul>\n<h4>How about memcached <em>increments<\/em>?<\/h4>\n<p>memcached provides an increment mechanism, which can be used by MySQL to create distinct PRIMARY KEYs, like sequences in other databases. While this seems attractive, this feature fits most into the second point above: it makes MySQL completely dependant on memcached. So if memcached is down, MySQL is unable to generate keys.<\/p>\n<h4>memcahced invalidation<\/h4>\n<p>I believe a very good use would be to let MySQL invalidate cached data. Not set or get anything, just invalidate. To explain, let&#8217;s compare with MySQL&#8217;s query cache. I&#8217;ll be using <a title=\"MySQL's world database setup\" href=\"http:\/\/dev.mysql.com\/doc\/world-setup\/en\/world-setup.html\">MySQL&#8217;s world database<\/a>.<\/p>\n<p>It is a known issue with the query cache, that if you change (INSERT\/UPDATE\/DELETE) data within a certain table, all queries involved with that table are invalidated. Take a look at the following:<\/p>\n<blockquote>\n<pre><strong>SELECT <\/strong>* <strong>FROM <\/strong>City <strong>WHERE <\/strong>CountryCode='BLZ';\r\n<strong>UPDATE <\/strong>City <strong>SET <\/strong>Population=Population+1 <strong>WHERE <\/strong>CountryCode='CHE';\r\n<strong>SELECT <\/strong>* <strong>FROM <\/strong>City <strong>WHERE <\/strong>CountryCode='BLZ';<\/pre>\n<\/blockquote>\n<p>The UPDATE does not affect the results for the SELECT query. Nevertheless, the second SELECT does not return from the query cache, since it&#8217;s invalidated by the UPDATE.<\/p>\n<p>memcached can be used to solve this problem in a programmatic way. Let&#8217;s look at a short python program:<strong> memcached_test.py<\/strong>. What is does (see blue highlighted rows) is connect to memcached; connect to MySQL, and try to get the results for following from memcached:<\/p>\n<blockquote>\n<pre><strong>SELECT <\/strong>* <strong>FROM <\/strong>City <strong>WHERE <\/strong>CountryCode='BLZ';\r\n<strong>SELECT <\/strong>* <strong>FROM <\/strong>City <strong>WHERE <\/strong>CountryCode='CHE';<\/pre>\n<\/blockquote>\n<p>If these results are in memcached, they are returned immediately. If not, they are retrieved from MySQL, then inserted into memcached. The results for &#8216;CHE&#8217; are under the &#8216;City:CHE&#8217; key, and &#8216;BLZ&#8217; is under &#8216;City:BLZ&#8217;.<\/p>\n<blockquote>\n<pre><strong>import <\/strong>MySQLdb\r\n<strong>import <\/strong>memcache\r\n\r\n<strong>def <\/strong>select_cities_by_country(country_code):\r\n\tkey = <span style=\"color: #993300;\"><strong>\"City:\"<\/strong><\/span>+country_code\r\n\t<span style=\"color: #3366ff;\">cities = memcache_client.get(key)<\/span>\r\n\t<strong>if <\/strong>cities:\r\n\t\tfound_in_memcached = <strong>True<\/strong>\r\n\t<strong>else<\/strong>:\r\n\t\tcursor = conn.cursor()\r\n\t\tcursor.execute(<span style=\"color: #993300;\"><strong>\"\"\"\r\n\t\t\tSELECT Name, CountryCode,\r\n\t\t\tPopulation FROM City\r\n\t\t\tWHERE CountryCode=%s\"\"\"<\/strong><\/span>,\r\n\t\t\t\tcountry_code)\r\n\t\t<span style=\"color: #3366ff;\">cities = cursor.fetchall()<\/span>\r\n\t\t<span style=\"color: #3366ff;\">memcache_client.set(key, cities, 100)<\/span>\r\n\t\tcursor.close()\r\n\t\tfound_in_memcached = <strong>False<\/strong>\r\n\t<strong>for <\/strong>row <strong>in <\/strong>cities:\r\n\t\tprint <span style=\"color: #993300;\"><strong>\"%s, %s: %d\"<\/strong><\/span> % (row[0], row[1], row[2])\r\n\tprint <span style=\"color: #993300;\"><strong>\"%s found in memcached? %s\\n\"<\/strong><\/span> % (\r\n                country_code, found_in_memcached)\r\n\r\nconn = <strong>None<\/strong>\r\n<strong>try<\/strong>:\r\n\t<strong>try<\/strong>:\r\n\t\tconn = MySQLdb.connect(\r\n\t\t\thost=<span style=\"color: #993300;\"><strong>\"localhost\"<\/strong><\/span>,\r\n                        user=<span style=\"color: #993300;\"><strong>\"myuser\"<\/strong><\/span>,\r\n\t\t\tpasswd=<span style=\"color: #993300;\"><strong>\"mypassword\"<\/strong><\/span>,\r\n\t\t\tunix_socket=<span style=\"color: #993300;\"><strong>\"\/tmp\/mysql.sock\"<\/strong><\/span>,\r\n                        db=<span style=\"color: #993300;\"><strong>\"world\"<\/strong><\/span>)\r\n\t\tmemcache_client = memcache.Client([<span style=\"color: #993300;\"><strong>\"127.0.0.1:11211\"<\/strong><\/span>])\r\n\r\n\t\tselect_cities_by_country(<span style=\"color: #993300;\"><strong>\"BLZ\"<\/strong><\/span>);\r\n\t\tselect_cities_by_country(<span style=\"color: #993300;\"><strong>\"CHE\"<\/strong><\/span>);\r\n\t<strong>except <\/strong>Exception, err:\r\n\t\tprint err\r\n<strong>finally<\/strong>:\r\n\t<strong>if <\/strong>conn:\r\n\t\tconn.close()<\/pre>\n<\/blockquote>\n<p>Let&#8217;s run this program. This is a first time run, so obviously nothing is in memcached:<\/p>\n<blockquote>\n<pre><strong>$ python memcached_test.py<\/strong>\r\nBelize City, BLZ: 55810\r\nBelmopan, BLZ: 7105\r\n<strong>BLZ <\/strong>found in memcached? <strong>False<\/strong>\r\n\r\nZurich, CHE: 336800\r\nGeneve, CHE: 173500\r\nBasel, CHE: 166700\r\nBern, CHE: 122700\r\nLausanne, CHE: 114500\r\n<strong>CHE <\/strong>found in memcached? <strong>False<\/strong><\/pre>\n<\/blockquote>\n<p>Immediately executed again, we get results from memcached:<\/p>\n<blockquote>\n<pre><strong>$ python memcached_test.py<\/strong>\r\nBelize City, BLZ: 55810\r\nBelmopan, BLZ: 7105\r\n<strong>BLZ <\/strong>found in memcached? <strong>True<\/strong>\r\n\r\nZurich, CHE: 336800\r\nGeneve, CHE: 173500\r\nBasel, CHE: 166700\r\nBern, CHE: 122700\r\nLausanne, CHE: 114500\r\n<strong>CHE <\/strong>found in memcached? <strong>True<\/strong><\/pre>\n<\/blockquote>\n<p>We are going to execute the following query:<\/p>\n<blockquote>\n<pre><strong>UPDATE <\/strong>City <strong>SET <\/strong>Population=Population+1 <strong>WHERE <\/strong>CountryCode='CHE';<\/pre>\n<\/blockquote>\n<p>But nothing as yet will invalidate our memcached values. Let&#8217;s set up TRIGGERs on the City table:<\/p>\n<blockquote>\n<pre><strong>DELIMITER <\/strong>$$\r\n\r\n<strong>DROP TRIGGER IF EXISTS<\/strong> City_AI $$\r\n<strong>CREATE TRIGGER<\/strong> City_AI <strong>AFTER INSERT ON<\/strong> City\r\n<strong>FOR EACH ROW\r\nBEGIN\r\n  SELECT<\/strong> memc_delete(<strong>CONCAT<\/strong>('City:',<strong>NEW<\/strong>.CountryCode)) <strong>INTO <\/strong>@discard;\r\n<strong>END<\/strong>;\r\n$$\r\n\r\n<strong>DROP TRIGGER IF EXISTS<\/strong> City_AU $$\r\n<strong>CREATE TRIGGER<\/strong> City_AU <strong>AFTER UPDATE ON<\/strong> City\r\n<strong>FOR EACH ROW\r\nBEGIN\r\n  SELECT<\/strong> memc_delete(<strong>CONCAT<\/strong>('City:',<strong>OLD<\/strong>.CountryCode)) <strong>INTO <\/strong>@discard;\r\n  <strong>SELECT <\/strong>memc_delete(<strong>CONCAT<\/strong>('City:',<strong>NEW<\/strong>.CountryCode)) <strong>INTO <\/strong>@discard;\r\n<strong>END<\/strong>;\r\n$$\r\n\r\n<strong>DROP TRIGGER IF EXISTS<\/strong> City_AD $$\r\n<strong>CREATE TRIGGER<\/strong> City_AD <strong>AFTER DELETE ON<\/strong> City\r\n<strong>FOR EACH ROW\r\nBEGIN\r\n  SELECT<\/strong> memc_delete(<strong>CONCAT<\/strong>('City:',<strong>OLD<\/strong>.CountryCode)) <strong>INTO <\/strong>@discard;\r\n<strong>END<\/strong>;\r\n$$\r\n\r\n<strong>DELIMITER <\/strong>;<\/pre>\n<\/blockquote>\n<p>These triggers will cause any change to a city invalidates all cities in the same country. Naive? Far less than MySQL&#8217;s query cache. Let&#8217;s put this to the test:<\/p>\n<blockquote>\n<pre>mysql&gt; <strong>UPDATE <\/strong>City <strong>SET <\/strong>Population=Population+1 <strong>WHERE <\/strong>CountryCode='CHE';\r\nQuery OK, 5 rows affected (0.01 sec)\r\nRows matched: 5  Changed: 5  Warnings: 0<\/pre>\n<\/blockquote>\n<p>And run out python program one last time:<\/p>\n<blockquote>\n<pre><strong>$ python memcached_test.py<\/strong>\r\nBelize City, BLZ: 55810\r\nBelmopan, BLZ: 7105\r\n<strong>BLZ <\/strong>found in memcached? <span style=\"color: #339966;\"><strong>True<\/strong><\/span>\r\n\r\nZurich, CHE: 336801\r\nGeneve, CHE: 173501\r\nBasel, CHE: 166701\r\nBern, CHE: 122701\r\nLausanne, CHE: 114501\r\n<strong>CHE <\/strong>found in memcached? <span style=\"color: #339966;\"><strong>False<\/strong><\/span><\/pre>\n<\/blockquote>\n<p>Right! The &#8216;CHE&#8217; values were invalidated, and could not be found in memcaches. &#8216;BLZ&#8217;, however, wasn&#8217;t disturbed.<\/p>\n<p>We can further improve our invalidation mechanism to check only for changes for desired columns. This will require some more code in our triggers.<\/p>\n<h4>Notes<\/h4>\n<p>The triggers themselves pose a performance penalty on our code. It is assumed that SELECTs are more important here, or else we would not use caching at all. At any case, the example provided here has not been benchmarked, and its value can only be estimated in your real life situation.<\/p>\n<h4>Conclusion<\/h4>\n<p>I believe invalidation is the most interesting part of memcached functions for MySQL. It makes the most sense:<\/p>\n<ul>\n<li>No data passes between MySQL and memcached.<\/li>\n<li>The application isn&#8217;t even aware that MySQL is talking to memcached. MySQL does everything internally using triggers.<\/li>\n<li>MySQL does not depend on memcached. If memcached goes away, the triggers will simply have no effect. It is still possible that due to temporary network failure, an invalidation is skipped. But memcached supports us by adding a timeout for cached values, so we have some kind of &#8220;backup plan&#8221;.<\/li>\n<\/ul>\n<p>Please share below your insights and real life experience with memcached functions for MySQL.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There&#8217;s a lot of buzz around memcached. memcached is widely used, and has clients for many programming languages and platforms. TangentOrg have developed a memcached client in the form of MySQL UDFs (User Defined Functions). I wish to discuss the memcached functions for MySQL: if and how they should be used. Disclaimer: I do not [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[28,31,29,30],"class_list":["post-89","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-memcached","tag-python","tag-query-cache","tag-triggers"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1r","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/89","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=89"}],"version-history":[{"count":13,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/89\/revisions"}],"predecessor-version":[{"id":394,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/89\/revisions\/394"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=89"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=89"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=89"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}