A set of useful queries on INFORMATION_SCHEMA follows. These queries can be used when approaching a new database, to learn about some of its properties, or they can be regularly used on an existing schema, so as to verify its integrity.
I will present queries for:
- Checking on database engines and size
- Locating duplicate and redundant indexes
- Checking on character sets for columns and tables, looking for variances
- Checking on processes and long queries (only with MySQL 5.1)
Dimensions
The following query returns the total size per engine per database. For example, it is common that in a given database, all tables are InnoDB. But once in a while, and even though default-engine is set to InnoDB, someone creates a MyISAM table. This may break transactional behavior, or may cause a mysqldump --single-transaction to be ineffective.
See aggregated size per schema per engine:
SELECT TABLE_SCHEMA, ENGINE, COUNT(*) AS count_tables,
SUM(DATA_LENGTH+INDEX_LENGTH) AS size,
SUM(INDEX_LENGTH) AS index_size FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA NOT IN ('mysql', 'INFORMATION_SCHEMA')
AND ENGINE IS NOT NULL GROUP BY TABLE_SCHEMA, ENGINE
Result example:
+--------------+--------+--------------+----------+------------+ | TABLE_SCHEMA | ENGINE | count_tables | size | index_size | +--------------+--------+--------------+----------+------------+ | test | InnoDB | 3 | 12140544 | 0 | | world | InnoDB | 1 | 4734976 | 0 | | world | MyISAM | 5 | 10665303 | 4457472 | +--------------+--------+--------------+----------+------------+
I may not have intended to, but it seems I have both MyISAM and InnoDB tables in the world database.
The index_size may be important with MyISAM when estimating the desired key_buffer_size.
See per table size (almost exactly as presented in INFORMATION_SCHEMA):
SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE,
SUM(DATA_LENGTH+INDEX_LENGTH) AS size,
SUM(INDEX_LENGTH) AS index_size FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA NOT IN ('mysql', 'INFORMATION_SCHEMA')
AND ENGINE IS NOT NULL GROUP BY TABLE_SCHEMA, TABLE_NAME
Indexes
We will now turn to check for duplicate or redundant indexes. We begin by presenting the following table:
mysql> show create table City \G
*************************** 1. row ***************************
Table: City
Create Table: CREATE TABLE `City` (
`ID` int(11) NOT NULL auto_increment,
`Name` char(35) character set utf8 NOT NULL default '',
`CountryCode` char(3) NOT NULL default '',
`District` char(20) NOT NULL default '',
`Population` int(11) NOT NULL default '0',
PRIMARY KEY (`ID`),
UNIQUE KEY `ID` (`ID`),
KEY `Population` (`Population`),
KEY `Population_2` (`Population`,`CountryCode`)
) ENGINE=MyISAM AUTO_INCREMENT=4080 DEFAULT CHARSET=latin1
We can see that the Population_2 index covers the Population index, so the latter is redundant and should be removed. We also see that the ID index is redundant, since there is a PRIMARY KEY on ID, which is in itself a unique key. How can we test such cases by querying the INFORMATION_SCHEMA? Turns out we can do that using the STATISTICS table.
[Update: thanks to Roland Bouman’s comments. The following queries only consider BTREE indexes, and do not verify FULLTEXT or HASH indexes]
See if some index is a prefix of another (in which case it is redundant):
SELECT * FROM (
SELECT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,
GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS columns
FROM `information_schema`.`STATISTICS`
WHERE TABLE_SCHEMA NOT IN ('mysql', 'INFORMATION_SCHEMA')
AND NON_UNIQUE = 1 AND INDEX_TYPE='BTREE'
GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) AS i1 INNER JOIN (
SELECT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,
GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS columns
FROM `information_schema`.`STATISTICS`
WHERE INDEX_TYPE='BTREE'
GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) AS i2
USING (TABLE_SCHEMA, TABLE_NAME)
WHERE i1.columns != i2.columns AND LOCATE(CONCAT(i1.columns, ','), i2.columns) = 1
The above query lists pairs of indexes in which one of them is a true prefix of the other. I’m using GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) to aggregate columns per index, by order of appearance in that index.
The query only considers cases when the prefix (the “shorter”) index is non-unique. Else wise there is no redundancy, as the uniqueness of the index imposes a constraint which is not achieved by the “longer” index.
Result example:
+--------------+------------+------------+------------+--------------+------------------------+ | TABLE_SCHEMA | TABLE_NAME | INDEX_NAME | columns | INDEX_NAME | columns | +--------------+------------+------------+------------+--------------+------------------------+ | world | City | Population | Population | Population_2 | Population,CountryCode | +--------------+------------+------------+------------+--------------+------------------------+
See if any two indexes are identical:
SELECT * FROM (
SELECT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,
GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS columns, NON_UNIQUE
FROM `information_schema`.`STATISTICS`
WHERE TABLE_SCHEMA NOT IN ('mysql', 'INFORMATION_SCHEMA')
AND INDEX_TYPE='BTREE'
GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) AS i1 INNER JOIN (
SELECT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,
GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS columns, NON_UNIQUE
FROM `information_schema`.`STATISTICS`
WHERE INDEX_TYPE='BTREE'
GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) AS i2
USING (TABLE_SCHEMA, TABLE_NAME)
WHERE i1.columns = i2.columns AND i1.NON_UNIQUE = i2.NON_UNIQUE
AND i1.INDEX_NAME < i2.INDEX_NAME
The above checks for unique or non-unique indexes alike. It checks for indexes with identical columns list (and in the same order, of course). Any two indexes having the same list of columns imply a redundancy. If both are unique or both are non-unique, either can be removed. If one is unique and the other is not, the non-unique index should be removed.
Result example:
+--------------+------------+------------+---------+------------+------------+---------+------------+ | TABLE_SCHEMA | TABLE_NAME | INDEX_NAME | columns | NON_UNIQUE | INDEX_NAME | columns | NON_UNIQUE | +--------------+------------+------------+---------+------------+------------+---------+------------+ | world | City | PRIMARY | ID | 0 | ID | ID | 0 | +--------------+------------+------------+---------+------------+------------+---------+------------+
You may also wish to take a look at the excellent mk-duplicate-key-checker, a maatkit utility by Baron Schwartz.
Character sets
Show the character sets for all tables:
SELECT TABLE_SCHEMA, TABLE_NAME, CHARACTER_SET_NAME, TABLE_COLLATION
FROM INFORMATION_SCHEMA.TABLES
INNER JOIN INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
ON (TABLES.TABLE_COLLATION = COLLATION_CHARACTER_SET_APPLICABILITY.COLLATION_NAME)
WHERE TABLE_SCHEMA NOT IN ('mysql', 'INFORMATION_SCHEMA')
Surprisingly, the TABLES table does not include the character set for the table, only the collation, so we must join with COLLATION_CHARACTER_SET_APPLICABILITY to get the character set for that collation. Yes, it’s more normalized this way, but INFORMATION_SCHEMA is not too normalized anyway.
See all the textual columns, along with their character sets:
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA NOT IN ('mysql', 'INFORMATION_SCHEMA')
AND CHARACTER_SET_NAME IS NOT NULL
ORDER BY TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
See those columns for which the character set or collation is different from the table’s character set and collation:
SELECT columns.TABLE_SCHEMA, columns.TABLE_NAME, COLUMN_NAME,
CHARACTER_SET_NAME AS column_CHARSET,
COLLATION_NAME AS column_COLLATION,
table_CHARSET, TABLE_COLLATION
FROM (
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA NOT IN ('mysql', 'INFORMATION_SCHEMA')
AND CHARACTER_SET_NAME IS NOT NULL
) AS columns INNER JOIN (
SELECT TABLE_SCHEMA, TABLE_NAME, CHARACTER_SET_NAME AS table_CHARSET, TABLE_COLLATION
FROM INFORMATION_SCHEMA.TABLES
INNER JOIN INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
ON (TABLES.TABLE_COLLATION = COLLATION_CHARACTER_SET_APPLICABILITY.COLLATION_NAME)
) AS tables
ON (columns.TABLE_SCHEMA = tables.TABLE_SCHEMA AND columns.TABLE_NAME = tables.TABLE_NAME)
WHERE (columns.CHARACTER_SET_NAME != table_CHARSET OR columns.COLLATION_NAME != TABLE_COLLATION)
ORDER BY TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
Result example:
+--------------+------------+-------------+----------------+------------------+---------------+-------------------+ | TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | column_CHARSET | column_COLLATION | table_CHARSET | TABLE_COLLATION | +--------------+------------+-------------+----------------+------------------+---------------+-------------------+ | world | City | Name | utf8 | utf8_general_ci | latin1 | latin1_swedish_ci | +--------------+------------+-------------+----------------+------------------+---------------+-------------------
Processes (MySQL 5.1)
With MySQL 5.1 comes a boost to INFORMATION_SCHEMA. Among the new tables we can find the PROCESSLIST table, as well as GLOBAL_VARIABLES and GLOBAL_STATUS. Together with the new Event Scheduler, it seems the sky is the limit.
See which processes are active:
SELECT * FROM information_schema.PROCESSLIST WHERE COMMAND != 'Sleep'
Show slow queries:
SELECT * FROM information_schema.PROCESSLIST WHERE COMMAND != 'Sleep' AND TIME > 4
How many processes per user?
SELECT USER, COUNT(*) FROM information_schema.PROCESSLIST GROUP BY USER
How many processes per host?
SELECT SUBSTR(HOST, 1, LOCATE(':',HOST)-1) AS hostname, COUNT(*)
FROM information_schema.PROCESSLIST GROUP BY hostname
Along with the Event Scheduler, a stored procedure may decide to KILL processes executing for more than 10 minutes, KILL users who have too many connections, perform some logging on connections and more.
Conclusion
I have presented what I think is a set of useful queries. When I approach a new database I use these to get an overall understanding of what’s in it. Finding duplicate indexes can explain a lot about how the designers or developers think the database should behave. Looking at the non-default character sets shows if textual columns have been carefully designed or not. For example, querying for non-default columns characters and getting no results may imply that many textual columns have improper character sets.
Hi! Nice post! Few comments though…. In the table size query you do: WHERE TABLE_SCHEMA NOT IN (‘mysql’, ‘INFORMATION_SCHEMA’) AND ENGINE IS NOT NULL I am assuming you added ENGINE IS NOT NULL to get rid of views. Personally, I feel the WHERE clause would be better if written like so: WHERE TABLE_TYPE = ‘BASE TABLE’ This gets rid of all information schema tables (since those are all ‘SYSTEM VIEW’s). It does leave the tables in the mysql database in the result, but I consider that an advantage. Second, about your redundant indexes query: “See if some index is a… Read more »
Roland, Thanks for your comments, 1. Indeed you are right, I used the “ENGINE IS NOT NULL” to get rid of the views. I was considering reading the TABLE_TYPE, but decided to go for the NOT NULL. There are many column in the INFORMATION_SCHEMA whose values I don’t feel comfortable with (values like ‘YES’ and ‘NO’, instead of 1, 0 etc.) 2. Again, you are right. In this post I did not consider FULLTEXT and HASH keys at all. I don’t get to cross them much; I’ll review the queries. Thanks for the reference to your Redundant Index Finder. I’ll… Read more »
[…] the following post: Useful database analysis queries with INFORMATION_SCHEMA for queries which diagnose your databases character sets. tags: Data Types, SQL, Syntax MySQL | by […]