Auto caching INFORMATION_SCHEMA tables: seeking input

The short version

I have it all working. It’s kind of magic. But there are issues, I’m not sure it should even exist, and I’m looking for input.

The long version

In Auto caching tables I presented a hack which allows getting cached or fresh results via simple SELECT queries.

The driving force behind the above hack was INFORMATION_SCHEMA tables. There are two major problems with INFORMATION_SCHEMA:

  1. Queries on schema-oriented tables such as TABLES, COLUMNS, STATISTICS, etc. are heavyweight. How heavyweight? Enough to lock down your database. Enough, in some cases, to bring your database down.
  2. The data is always generated on-the-fly, as you request it. Query the COLUMNS table twice, and risk two lockdowns of your database.

The auto-cache mechanism solves issue #2. I have it working, time-based: there is an auto-cache table for each of the heavyweight INFORMATION_SCHEMA tables. Say the cache is invalidated every 30 minutes; throughout those 30 minutes, you get a free pass!
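
To make the idea concrete, here is a rough sketch of the time-based variant. All names (cache_columns, cache_metadata, refresh_cache_columns) are made up for illustration and are not the actual implementation: a cached copy of COLUMNS plus a single bookkeeping row, refreshed only when the copy is older than 30 minutes.

    -- structure-only copy of INFORMATION_SCHEMA.COLUMNS (LIMIT 0 copies no rows)
    CREATE TABLE cache_columns AS
      SELECT * FROM INFORMATION_SCHEMA.COLUMNS LIMIT 0;

    -- bookkeeping: when was each cache last refreshed?
    CREATE TABLE cache_metadata (
      cache_name   VARCHAR(64) PRIMARY KEY,
      refreshed_at DATETIME NOT NULL
    );

    DELIMITER $$
    CREATE PROCEDURE refresh_cache_columns()
    BEGIN
      DECLARE last_refresh DATETIME DEFAULT NULL;
      SELECT refreshed_at INTO last_refresh
        FROM cache_metadata
        WHERE cache_name = 'columns';
      -- refresh only if the cache is missing or older than 30 minutes
      IF last_refresh IS NULL OR last_refresh < NOW() - INTERVAL 30 MINUTE THEN
        TRUNCATE TABLE cache_columns;
        INSERT INTO cache_columns SELECT * FROM INFORMATION_SCHEMA.COLUMNS;
        REPLACE INTO cache_metadata VALUES ('columns', NOW());
      END IF;
    END $$
    DELIMITER ;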

The auto-cache mechanism also paves the way to solving issue #1: since it works by invoking a stored routine, I have better control over the way I read INFORMATION_SCHEMA. Thus, I can take advantage of INFORMATION_SCHEMA optimizations. It’s tedious, but not complicated.

For example, if I want to cache the TABLES table, I don’t necessarily have to read the entire TABLES data in one go. Instead, I can iterate over the schemata, get a list of table names per schema, then read the full row data for these, table by table. The result? Many, many more SELECTs, but better-optimized ones, and no single lock-it-all query.
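
A hypothetical sketch of that loop (schema and table names are just examples, and cache_columns is the made-up cache table from the sketch above): equality conditions on TABLE_SCHEMA and TABLE_NAME let the server open only the one table involved, which is the INFORMATION_SCHEMA optimization at work.

    -- 1. list schemata (cheap)
    SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA;

    -- 2. per schema, table names only; no table needs to be opened for this
    SELECT TABLE_NAME
      FROM INFORMATION_SCHEMA.TABLES
      WHERE TABLE_SCHEMA = 'sakila';

    -- 3. per table, read the full row data into the cache
    INSERT INTO cache_columns
      SELECT * FROM INFORMATION_SCHEMA.COLUMNS
      WHERE TABLE_SCHEMA = 'sakila' AND TABLE_NAME = 'film';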

And the problem is…

I have two burning problems.

  1. INFORMATION_SCHEMA optimization only goes so far; sometimes it does not work at all. In particular, I’ve noticed that if you have a view which relies on another view (possibly relying on yet another view), things get out of hand. I author a monitoring tool for MySQL called mycheckpoint. It uses some fancy techniques for generating aggregated data, HTML and charts by means of nested views. There are a few views there I can never query for in COLUMNS: doing so just crashes my server. Repeatedly. And it’s a good machine with a good configuration. Make that 5 machines. They all crash, repeatedly. I just can’t trust INFORMATION_SCHEMA!
  2. Replication: any caching table is bound to replicate. Does it make any sense to replicate a cache of internal metadata? Does it make sense to query the cache table on a slave and have it answer with the master’s data? With plain old INFORMATION_SCHEMA, every server is on its own; caching kind of works against this. Or is it fair enough, since we would usually expect master and slaves to reflect the same schema structure? (A sketch of how the cache could be kept local follows this list.)
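
For what it’s worth, if the cache were to stay local to each server, one direction would be to keep the cache tables in a dedicated schema and filter that schema out of replication, or to suppress binary logging around the refresh. This is only a sketch: the schema name is hypothetical, the routine is the made-up one from the earlier sketch, and changing sql_log_bin requires the SUPER privilege.

    -- slave-side filtering, in my.cnf (assuming the cache lives in its own schema):
    --   replicate-ignore-db = meta_cache

    -- or: keep the refresh itself out of the binary log (requires SUPER)
    SET SESSION sql_log_bin = 0;
    CALL refresh_cache_columns();
    SET SESSION sql_log_bin = 1;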

I would feel much better if I could read SHOW statements with a SELECT query. Though I’ve found this nice hack, it only works via a stored procedure, not from a stored function, so it can’t be used from within a SELECT query. I’ve been banging my head against this for months now; I think I’ve given up on this one.
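
To illustrate the limitation (a minimal example, not the hack itself): a stored procedure may execute a SHOW statement and return its result set to the client, but a stored function may neither return a result set nor use dynamic SQL, so SHOW cannot be wrapped for use inside a SELECT.

    DELIMITER $$

    -- legal: the result set of SHOW is sent back to the caller
    CREATE PROCEDURE show_global_status()
    BEGIN
      SHOW GLOBAL STATUS;
    END $$

    DELIMITER ;

    -- illegal: a function must return a scalar and cannot produce a result set
    -- (nor use PREPARE/EXECUTE), so the equivalent CREATE FUNCTION is rejected
    -- by the server, and there is nothing to call from within a SELECT.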

Any insights are welcome!

11 thoughts on “Auto caching INFORMATION_SCHEMA tables: seeking input”

  1. @Sheeri,
    With MyISAM tables, there’s the UPDATE_TIME column in TABLES, which reports when the table was last modified. But this does not apply to InnoDB.

    Since I wish to cache all of the data within the TABLES, COLUMNS, etc. tables, I can’t rely on any specific column changing or not changing to detect staleness.
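
    (A hypothetical illustration, with a made-up schema name:)

        SELECT TABLE_SCHEMA, TABLE_NAME, UPDATE_TIME
          FROM INFORMATION_SCHEMA.TABLES
          WHERE TABLE_SCHEMA = 'sakila' AND ENGINE = 'MyISAM';
        -- UPDATE_TIME is NULL for InnoDB tables, so it cannot drive invalidation there.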
