[Update: need to take more breaks: now NOT crashing my servers! See clarifications below]
INFORMATION_SCHEMA Optimizations are meant to make your INFORMATION_SCHEMA queries lighter and safer.
For example, if you’re going to query the COLUMNS table for just the columns of a single table, then the following:
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA='sakila' AND TABLE_NAME='rental'
makes for an optimization: specifying a literal on TABLE_SCHEMA avoids scanning the directories of other schemata, and specifying a literal on TABLE_NAME avoids checking up on other tables. So it’s a one-schema, one-table read operation, as opposed to “first read every single column from every schema and table, then return only those I’m interested in”.
Here’s the execution plan for the above query:
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: COLUMNS
         type: ALL
possible_keys: NULL
          key: TABLE_SCHEMA,TABLE_NAME
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using where; Open_frm_only; Scanned 0 databases
What I tried to do was to read the entire COLUMNS table, one schema at a time, one table at a time. I’m good with this taking a longer time.
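The iteration went roughly along these lines (a sketch of the idea rather than the exact statements I ran; ‘sakila’ and ‘rental’ stand in for whatever schema and table the loop is currently visiting):

-- list all schemata:
SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
-- for each schema found, list its tables, with a literal on TABLE_SCHEMA:
SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='sakila';
-- for each table found, read its columns, with literals on both TABLE_SCHEMA and TABLE_NAME:
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA='sakila' AND TABLE_NAME='rental';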
I have a production system on which reads from COLUMNS consistently crash the servers. Well, one read at a time can’t do any harm, right?
Unfortunately, as the title of this post reveals, even a sequential read of COLUMNS using the INFORMATION_SCHEMA optimizations does not help: a minute into the process, the client lost its connection. The server crashed.
I was expecting that table locks would be released, buffers freed, etc. Going one table at a time, there wouldn’t be a congestion of locks, reads, table cache suffocation, etc.
I was actually having high hopes for this to succeed. I have to find a way in which INFORMATION_SCHEMA tables are not dangerous.
A few hours later, and I have both conclusions and achievements.
There are indeed memory issues with querying INFORMATION_SCHEMA tables. I’ve found that VARCHAR(64) columns can consume 64K each: I’m reading from large tables of more than 1,000 columns each, while monitoring MySQL’s memory consumption. By dividing the increase in memory by the number of rows returned by a query I sent, which was for one single column, I got an almost exact 64K value per row.
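To illustrate the arithmetic (the numbers below are rounded for illustration; ‘my_schema’ and ‘my_wide_table’ are placeholders):

-- a pinpointed read of one wide table's column metadata:
SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA='my_schema' AND TABLE_NAME='my_wide_table';
-- say it returns ~1,000 rows and the server's memory grows by ~64MB while the query runs:
--   ~65,536,000 bytes / ~1,000 rows ≈ 65,536 bytes, i.e. ~64K per row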
So a query on INFORMATION_SCHEMA consumes much more memory than it should. The good news is that this memory is released once the query terminates. So there is no leak into the session memory.
This was combined with a mistake of mine in the way I iterated the tables, which amplified the problem: I happened to query much more than I needed, and so my query’s memory got bloated. That is to say, I used the INFORMATION_SCHEMA optimizations only partly right, and so got only part of the savings they could offer me.
With better pinpointing, I’m now able to consistently read from COLUMNS without crashing my servers.
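To make “partly right” versus pinpointed concrete (not necessarily my exact mistake, but the same kind of over-read):

-- partly right: a literal on TABLE_SCHEMA alone still materializes rows for every table in that schema:
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA='sakila';
-- pinpointed: literals on both TABLE_SCHEMA and TABLE_NAME limit the read to a single table:
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA='sakila' AND TABLE_NAME='rental';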
I will look further into the 64K issue. That in itself still drains a lot of memory: on my mycheckpoint schema tables, a single table read means more than 64MB of query memory down the drain (over 1,000 columns at roughly 64K per row).
The I_S can indeed be a bit dangerous. I’ve seen a number of installations on which the table_(open_)cache was set to the default and there were about 4,000 tables. To find out the amount of used data inside the ibdata1 file (no innodb_file_per_table), a query against I_S.TABLES was run, and this caused serious issues with the production traffic/queries.
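The query was along these lines (a sketch; the exact statement isn’t quoted here):

SELECT SUM(DATA_LENGTH + INDEX_LENGTH) FROM INFORMATION_SCHEMA.TABLES WHERE ENGINE='InnoDB';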
Did you file a bug report for this issue?
@Daniel,
If my memory serves me right, I’ve seen several bug reports on this in the past (about I_S in general, I mean).
I have not filed a bug report as yet.