code.openark.org – Blog by Shlomi Noach

On generating unique IDs using LAST_INSERT_ID() and other tools

Wed, 02 Feb 2011 06:50:02 +0000
https://shlomi-noach.github.io/blog/mysql/on-generating-unique-ids-using-last_insert_id-and-other-tools

There’s a trick for using LAST_INSERT_ID() to generate sequences in MySQL. Quoting from the Manual:

  1. Create a table to hold the sequence counter and initialize it:
    mysql> CREATE TABLE sequence (id INT NOT NULL);
    mysql> INSERT INTO sequence VALUES (0);
    
  2. Use the table to generate sequence numbers like this:
    mysql> UPDATE sequence SET id=LAST_INSERT_ID(id+1);
    mysql> SELECT LAST_INSERT_ID();
    

This trick is asking for trouble.

Contention

A customer was using this trick to generate unique session IDs for his JBoss sessions. These IDs would eventually be written back to the database in the form of log events. Business goes well, and one day the customer adds three new JBoss servers (doubling the number of webapps). All of a sudden, nothing works quite as it used to. All kinds of queries take long seconds to complete; load average becomes very high.

A short investigation reveals that a very slight load is enough to make for an accumulation of sequence-UPDATE queries. Dozens of them are active at any given time, waiting for long seconds.

InnoDB and MyISAM alike yield poor response times. No wonder: every query is contending for a single lock on a single row.

Not just one

Other queries seem to hang as well. Why?

It is easy to forget (or let go unnoticed) that quite a few global locks are involved in each query. If the query cache is enabled, any query must pass through that cache, holding the query cache mutex. There’s a global mutex on MyISAM’s key cache. There’s one on InnoDB’s buffer pool (see multiple buffer pools in InnoDB 5.5), albeit less of an overhead. And there’s the table cache.

When the table cache is enabled, any completed query attempts to return file handles to the cache, and any new query attempts to retrieve handles from it. While the cache is being modified (extracting, adding), it is locked. When everyone is busy doing the sequence-UPDATE, the table cache lock is hammered. Other queries can hardly find the time to acquire the lock and get on with their business.

What can be done?

One could try to increase the table_open_cache value. That may help to some extent, and for a limited time; the more requests are made, the sooner the problem surfaces again. Counterintuitively, reducing table_open_cache to its minimum (the minimum value is 1) can have a great impact: if there’s nothing to fight over, everyone just gets by on their own.

I know the following is not a scientific explanation, but it hits me as a good comparison: when my daughter brings a friend over, and there’s a couple of toys, both are happy. A third friend makes for a fight: “I saw it first! She took it from me! I was holding it!”. Any parent knows the ultimate solution to this kind of fight: take away the toys, and have them find something else to enjoy doing. OK, sorry for this unscientific display, I had to share my daily stress.

When no table cache is available, a query opens the table by itself and does not attempt to return the file handle to the cache; the handle is simply destroyed. Usually this is not desired: caching is good. But in our customer’s case, the cost of not using a table cache was dwarfed by the cost of everyone fighting over the sequence table. Reducing the table cache brought immediate relief to the database, with observably poorer responsiveness at peak times, yet way better than with a large table cache.

Other tools?

I don’t consider the above to be a good solution. It’s just a temporary hack.

I actually don’t like the LAST_INSERT_ID() trick. Moreover, I don’t see that it’s the database’s job to provide unique IDs. Let it do relational stuff. If generating IDs is too intensive, let someone else do it.

NoSQL solutions provide such a service. Memcached, redis, MongoDB (and probably more) all provide increment operations. Check them out.
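For a sketch of what such a service looks like, here is the increment idiom with memcached-style add/incr calls (the method names follow python-memcached). The `Client` class below is a tiny in-memory stand-in of my own, so the snippet runs without a server; the key name `seq:session_id` is made up:

```python
class Client:
    """In-memory stand-in for memcache.Client, for illustration only."""
    def __init__(self, servers):
        self._store = {}

    def add(self, key, value):
        # memcached ADD: store only if the key does not exist yet
        if key in self._store:
            return False
        self._store[key] = int(value)
        return True

    def incr(self, key, delta=1):
        # memcached INCR: atomic server-side increment; returns the
        # new value, or None if the key is missing
        if key not in self._store:
            return None
        self._store[key] += delta
        return self._store[key]

mc = Client(["127.0.0.1:11211"])
mc.add("seq:session_id", "0")            # initialize once; no-op if present
session_id = mc.incr("seq:session_id")   # 1, then 2, 3, ... on later calls
```

With a real memcached server the same two calls go over the wire, and the increment is atomic across all clients.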

Application level solutions

I actually use an application-level solution to generate unique IDs. I mean, there’s always UUID(), but its result is just too long. Take a look at the following Java code:

public class Utils {
  private static long lastUniqueNumber = 0;

  // Returns the current time in milliseconds, bumped past the last
  // returned value if called more than once within the same millisecond.
  public static synchronized long uniqueNumber() {
    long unique = System.currentTimeMillis();
    if (unique <= lastUniqueNumber)
      unique = lastUniqueNumber + 1;
    lastUniqueNumber = unique;
    return unique;
  }
}

Within a Java application, the method above returns unique IDs at a sustained average rate of up to 1,000 per second (though it can be called far more than 1,000 times per second; the values then simply run ahead of the clock).
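For illustration, here is the same idea as a Python sketch (my own translation, not code from the post): when called more than once in the same millisecond, the counter runs ahead of the clock, which is exactly why more than 1,000 calls per second still yield unique, increasing values.

```python
import threading
import time

_lock = threading.Lock()
_last_unique = 0

def unique_number():
    """Current time in milliseconds, bumped past the last value on collision."""
    global _last_unique
    with _lock:
        unique = int(time.time() * 1000)
        if unique <= _last_unique:
            # Same millisecond (or clock moved backwards): run ahead
            unique = _last_unique + 1
        _last_unique = unique
        return unique
```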

On consecutive executions of applications on the same machine, one would still expect unique values due to the time-based nature of the values. However, computer clocks change. It’s possible for System.currentTimeMillis() to return a value that was already used in the past.

And what about two processes running on the same machine at the same time? Or on different machines?

Which is why I use the following combination to generate my unique IDs:

  • Server ID (much like MySQL’s server_id parameter). This could be the last byte of the server’s IP address, or just 4 or 5 bits if not too many players are expected.
  • Process ID (plain old pid), which I pass to the Java runtime in the form of system properties. Any two processes running on the same machine are assured to have different IDs; two consecutively spawned processes will have different IDs as well. The time it takes for process IDs to cycle is far longer than any “time glitch” of the kind described above.
  • Current time in milliseconds.

If you have to fit everything within 64 bits (BIGINT), you’ll have to do some bit manipulation and drop some of the most significant bits of the millisecond value to make room for the server and process IDs.
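As a sketch of that bit manipulation (the field widths here are my own illustration, not a prescribed layout): 4 bits of server ID, 16 bits of process ID, and the low 43 bits of the millisecond clock, 63 bits in total, so the result stays positive in a signed BIGINT. 2^43 milliseconds is roughly 278 years, so dropping the clock’s high bits is safe for a long while:

```python
SERVER_BITS, PID_BITS, MILLIS_BITS = 4, 16, 43

def pack_id(server_id, pid, millis):
    # Keep only the low bits of pid and millis (their MSBs are dropped),
    # then lay the three fields side by side.
    assert 0 <= server_id < (1 << SERVER_BITS)
    pid &= (1 << PID_BITS) - 1
    millis &= (1 << MILLIS_BITS) - 1
    return (server_id << (PID_BITS + MILLIS_BITS)) | (pid << MILLIS_BITS) | millis
```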

If you are willing to have your IDs unique only within a given time frame (so that, for example, you wouldn’t mind reusing old IDs a month from now), the problem becomes significantly easier. You may just use “day of month” and “milliseconds since day start” and save those precious bits.
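A quick sketch of that variant (again my own illustration, not the post’s code): the day of month fits in 5 bits, and a day has 86,400,000 milliseconds, which fits in 27 bits (2^27 = 134,217,728), so the whole ID fits in 32 bits:

```python
import datetime

DAY_BITS, MILLIS_BITS = 5, 27

def month_scoped_id(now=None):
    # Day of month (1..31) in the high bits, milliseconds since
    # midnight in the low 27 bits; unique within one month, at most
    # one call per millisecond.
    now = now or datetime.datetime.now()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    millis = int((now - midnight).total_seconds() * 1000)
    return (now.day << MILLIS_BITS) | millis
```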

Still other?

Please share your solutions below!

Using memcached functions for MySQL; an automated alternative to Query Cache

Mon, 15 Dec 2008 05:56:14 +0000
https://shlomi-noach.github.io/blog/mysql/using-memcached-functions-for-mysql-an-automated-alternative-to-query-cache

There’s a lot of buzz around memcached. memcached is widely used, and has clients for many programming languages and platforms. TangentOrg have developed a memcached client in the form of MySQL UDFs (User Defined Functions).

I wish to discuss the memcached functions for MySQL: if and how they should be used.

Disclaimer: I do not work with memcached functions for MySQL on a production system; all that is written here reflects my opinion on how things should be done.

With memcached functions for MySQL, we can do the following:

SELECT memc_set('mykey', 'The answer is 42');
SELECT memc_get('mykey');

(See my previous post on how to install memcached functions for MySQL).

In what scenario should we use these functions?

I believe memcached is the right tool for the application level. I am less enthusiastic about using it from MySQL. Sure, pushing it down to MySQL centralizes everything. Instead of having all my application code (PHP, Java etc.) access memcached separately, they can all access one single MySQL node, which gets to access memcached. I see two problems with this approach:

  • Doing this adds load on the database. I think the greatest advantage of memcached is that it allows us to alleviate load from the database; by pushing everything into MySQL we counter that benefit. We pay both in MySQL network traffic and in CPU consumed by MySQL to do the job. In a distributed application that uses memcached directly, every server takes on some of the load.
  • It seems to me as a flawed design. The database should be at an end point, and should not rely on anything except the operating system, file system and network. Sure, there could be applications talking to the database, but the database should be able to work all by itself. By putting memcached behind the database, we make the database dependent upon an external application.

How about memcached increments?

memcached provides an increment mechanism, which could be used by MySQL to create distinct PRIMARY KEYs, like sequences in other databases. While this seems attractive, this feature fits the second point above most: it makes MySQL completely dependent on memcached. If memcached is down, MySQL is unable to generate keys.

memcached invalidation

I believe a very good use would be to let MySQL invalidate cached data. Not set or get anything, just invalidate. To explain, let’s compare with MySQL’s query cache. I’ll be using MySQL’s world database.

It is a known issue with the query cache that if you change (INSERT/UPDATE/DELETE) data within a certain table, all cached queries involving that table are invalidated. Take a look at the following:

SELECT * FROM City WHERE CountryCode='BLZ';
UPDATE City SET Population=Population+1 WHERE CountryCode='CHE';
SELECT * FROM City WHERE CountryCode='BLZ';

The UPDATE does not affect the results for the SELECT query. Nevertheless, the second SELECT does not return from the query cache, since it’s invalidated by the UPDATE.

memcached can be used to solve this problem in a programmatic way. Let’s look at a short Python program: memcached_test.py. What it does is connect to memcached, connect to MySQL, and try to get from memcached the results of the following queries:

SELECT * FROM City WHERE CountryCode='BLZ';
SELECT * FROM City WHERE CountryCode='CHE';

If these results are in memcached, they are returned immediately. If not, they are retrieved from MySQL, then inserted into memcached. The results for ‘CHE’ are under the ‘City:CHE’ key, and ‘BLZ’ is under ‘City:BLZ’.

import MySQLdb
import memcache

def select_cities_by_country(country_code):
    key = "City:" + country_code
    cities = memcache_client.get(key)
    if cities:
        found_in_memcached = True
    else:
        # Cache miss: read from MySQL, then store the rows in memcached
        cursor = conn.cursor()
        cursor.execute("""
            SELECT Name, CountryCode, Population
            FROM City
            WHERE CountryCode=%s""",
            (country_code,))
        cities = cursor.fetchall()
        memcache_client.set(key, cities, 100)
        cursor.close()
        found_in_memcached = False
    for row in cities:
        print "%s, %s: %d" % (row[0], row[1], row[2])
    print "%s found in memcached? %s\n" % (
        country_code, found_in_memcached)

conn = None
try:
    try:
        conn = MySQLdb.connect(
            host="localhost",
            user="myuser",
            passwd="mypassword",
            unix_socket="/tmp/mysql.sock",
            db="world")
        memcache_client = memcache.Client(["127.0.0.1:11211"])

        select_cities_by_country("BLZ")
        select_cities_by_country("CHE")
    except Exception, err:
        print err
finally:
    if conn:
        conn.close()

Let’s run this program. This is a first time run, so obviously nothing is in memcached:

$ python memcached_test.py
Belize City, BLZ: 55810
Belmopan, BLZ: 7105
BLZ found in memcached? False

Zurich, CHE: 336800
Geneve, CHE: 173500
Basel, CHE: 166700
Bern, CHE: 122700
Lausanne, CHE: 114500
CHE found in memcached? False

Immediately executed again, we get results from memcached:

$ python memcached_test.py
Belize City, BLZ: 55810
Belmopan, BLZ: 7105
BLZ found in memcached? True

Zurich, CHE: 336800
Geneve, CHE: 173500
Basel, CHE: 166700
Bern, CHE: 122700
Lausanne, CHE: 114500
CHE found in memcached? True

We are going to execute the following query:

UPDATE City SET Population=Population+1 WHERE CountryCode='CHE';

But nothing as yet will invalidate our memcached values. Let’s set up TRIGGERs on the City table:

DELIMITER $$

DROP TRIGGER IF EXISTS City_AI $$
CREATE TRIGGER City_AI AFTER INSERT ON City
FOR EACH ROW
BEGIN
  SELECT memc_delete(CONCAT('City:',NEW.CountryCode)) INTO @discard;
END;
$$

DROP TRIGGER IF EXISTS City_AU $$
CREATE TRIGGER City_AU AFTER UPDATE ON City
FOR EACH ROW
BEGIN
  SELECT memc_delete(CONCAT('City:',OLD.CountryCode)) INTO @discard;
  SELECT memc_delete(CONCAT('City:',NEW.CountryCode)) INTO @discard;
END;
$$

DROP TRIGGER IF EXISTS City_AD $$
CREATE TRIGGER City_AD AFTER DELETE ON City
FOR EACH ROW
BEGIN
  SELECT memc_delete(CONCAT('City:',OLD.CountryCode)) INTO @discard;
END;
$$

DELIMITER ;

These triggers cause any change to a city to invalidate the cached entry for all cities in the same country. Naive? Far less so than MySQL’s query cache. Let’s put this to the test:

mysql> UPDATE City SET Population=Population+1 WHERE CountryCode='CHE';
Query OK, 5 rows affected (0.01 sec)
Rows matched: 5  Changed: 5  Warnings: 0

And run our Python program one last time:

$ python memcached_test.py
Belize City, BLZ: 55810
Belmopan, BLZ: 7105
BLZ found in memcached? True

Zurich, CHE: 336801
Geneve, CHE: 173501
Basel, CHE: 166701
Bern, CHE: 122701
Lausanne, CHE: 114501
CHE found in memcached? False

Right! The ‘CHE’ values were invalidated and could not be found in memcached. ‘BLZ’, however, wasn’t disturbed.
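The granularity difference observed above can be captured in a toy model (a pure-Python illustration of mine, not MySQL internals): the query cache drops every cached entry for a written table, while the trigger-plus-memcached scheme drops only the matching key:

```python
class QueryCacheModel:
    """Toy model of MySQL's query cache: any write to a table
    invalidates every cached result involving that table."""
    def __init__(self):
        self.cache = {}  # (table, key) -> cached rows

    def put(self, table, key, rows):
        self.cache[(table, key)] = rows

    def get(self, table, key):
        return self.cache.get((table, key))

    def on_write(self, table, key):
        # The written key is ignored: the whole table's entries go
        for entry in [e for e in self.cache if e[0] == table]:
            del self.cache[entry]

class KeyedCacheModel(QueryCacheModel):
    """Toy model of the trigger + memc_delete scheme: a write
    invalidates only the matching key."""
    def on_write(self, table, key):
        self.cache.pop((table, key), None)
```

Writing to ‘CHE’ empties both entries in the first model, but leaves ‘BLZ’ cached in the second, which is exactly what the test run above showed.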

We can further improve our invalidation mechanism to invalidate only on changes to the columns we care about. This would require some more code in our triggers.

Notes

The triggers themselves pose a performance penalty on our writes. It is assumed that SELECTs dominate here, or else we would not use caching at all. In any case, the example provided here has not been benchmarked; its value can only be estimated in your real-life situation.

Conclusion

I believe invalidation is the most interesting part of memcached functions for MySQL. It makes the most sense:

  • No data passes between MySQL and memcached.
  • The application isn’t even aware that MySQL is talking to memcached. MySQL does everything internally using triggers.
  • MySQL does not depend on memcached. If memcached goes away, the triggers simply have no effect. It is still possible that, due to a temporary network failure, an invalidation is skipped; but memcached supports us with a timeout on cached values, so we have some kind of “backup plan”.

Please share below your insights and real life experience with memcached functions for MySQL.
