The short version
I have it all working. It’s kind of magic. But there are issues, and I’m not sure it should even exist, and am looking for input.
The long version
In Auto caching tables I presented with a hack which allows getting cached or fresh results via a simple SELECT queries.
The drive for the above hack was INFORMATION_SCHEMA tables. There are two major problems with INFORMATION_SCHEMA:
- Queries on schema-oriented tables such as TABLES, COLUMNS, STATISTICS, etc. are heavyweight. How heavyweight? Enough to make a lockdown of your database. Enough to crash down your database in some cases.
- The data is always generated on-the-fly, as you request it. Query the COLUMNS table twice, and risk two lockdowns of your database.
The auto-cache mechanism solves issue #2. I have it working, time based. I have an auto-cache table for each of the INFORMATION_SCHEMA heavyweight tables. Say, every 30 minutes the cache is invalidated. Throughout those 30 minutes, you get a free pass!
The auto-cache mechanism also paves the road to solving issue #1: since it works by invoking a stored routine, I have better control of the way I read INFORMATION_SCHEMA. This, I can take advantage of INFORMATION_SCHEMA optimization. It’s tedious, but not complicated.
For example, if I wanted to cache the TABLES table, I don’t necessarily read the entire TABLES data in one read. Instead, I can iterate the schemata, get a list of table names per schema, then read full row data for these, table by table. The result? Many many more SELECTs, but more optimized, and no one-big-lock-it-all query.
And the problem is…
I have two burning problems.
- INFORMATION_SCHEMA optimization only works that much. It sometimes does not work. In particular, I’ve noticed that if you have a view which relies on another view (possibly relying on yet another view), things get out of hand. I author a monitoring tool for MySQL called mycheckpoint. It uses some fancy techniques for generating aggregated data, HTML and charts, by means of nested views. There are a few views there I can never query for in COLUMNS. It just crashes down my server. Repeatedly. And it’s a good machine with good configuration. Make that 5 machines. They all crash, repeatedly. I just can’t trust INFORMATION_SCHEMA!
- Replication: any caching table is bound to replicate. Does it make any sense to replicate cache for internal metadata? Does it make sense to query for the cached table on slave, to have it answer for master’s data? With plain old INFORMATION_SCHEMA, every server is on its own. Caching kinda works against this. Or is it fair enough, since we would usually expect master/slaves to reflect same schema structure?
I would feel much better if I could read SHOW statements with a SELECT query. Though I’ve found this nice hack, it can’t work from a stored function, only via stored procedure. So it can’t be used from within a SELECT query. I’ve been banging my head for months now, I think I gave up on this one.
Any insights are welcome!
innodb_stats_on_metadata helps a little bit
http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_stats_on_metadata
http://www.mysqlperformanceblog.com/2011/12/23/solving-information_schema-slowness/
@Rob,
Thanks. Not on VIEWs, unfortunately.
Hi Shlomi,
As I understand it, Oracle/MySQL query the information_schema quite a bit with their Query Analyzer product.
I hope I am directing you to the right person, but maybe http://www.markleith.co.uk can help with that.
Do just getting the VIEWs from the INFORMATION_SCHEMA make your server crash or only in combination with the caching? In other words, are your VIEWs already dangerous for anyone to use (risking runnning an INFORMATION_SCHEMA query on the one day)?
Maybe it would be worth it to work out what exactly in the VIEWs is triggering the crash?