{"id":188,"date":"2008-11-26T08:47:12","date_gmt":"2008-11-26T06:47:12","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=188"},"modified":"2009-08-26T09:33:06","modified_gmt":"2009-08-26T07:33:06","slug":"useful-database-analysis-queries-with-information_schema","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/useful-database-analysis-queries-with-information_schema","title":{"rendered":"Useful database analysis queries with INFORMATION_SCHEMA"},"content":{"rendered":"<p>A set of useful queries on INFORMATION_SCHEMA follows. These queries can be used when approaching a new database, to learn about some of its properties, or they can be regularly used on an existing schema, so as to verify its integrity.<\/p>\n<p>I will present queries for:<\/p>\n<ul>\n<li>Checking on database engines and size<\/li>\n<li>Locating duplicate and redundant indexes<\/li>\n<li>Checking on character sets for columns and tables, looking for variances<\/li>\n<li>Checking on processes and long queries (only with MySQL 5.1)<!--more--><\/li>\n<\/ul>\n<h4>Dimensions<\/h4>\n<p>The following query returns the total size per engine per database. For example, it is common that in a given database, all tables are InnoDB. But once in a while, and even though default-engine is set to InnoDB, someone creates a MyISAM table. This may break transactional behavior, or may cause a <code>mysqldump --single-transaction<\/code> to be ineffective.<\/p>\n<p><em>See aggregated size per schema per engine:<\/em><\/p>\n<blockquote>\n<pre><strong>SELECT <\/strong>TABLE_SCHEMA, ENGINE, <strong>COUNT<\/strong>(*) <strong>AS <\/strong>count_tables,\r\n  <strong>SUM<\/strong>(DATA_LENGTH+INDEX_LENGTH) <strong>AS <\/strong>size,\r\n<strong>SUM<\/strong>(INDEX_LENGTH) <strong>AS <\/strong>index_size<strong> FROM <\/strong>INFORMATION_SCHEMA.TABLES\r\n<strong>WHERE <\/strong>TABLE_SCHEMA <strong>NOT IN<\/strong> ('mysql', 'INFORMATION_SCHEMA')\r\n  <strong>AND <\/strong>ENGINE <strong>IS NOT NULL GROUP BY<\/strong> TABLE_SCHEMA, ENGINE<\/pre>\n<\/blockquote>\n<p>Result example:<\/p>\n<blockquote>\n<pre>+--------------+--------+--------------+----------+------------+\r\n| TABLE_SCHEMA | ENGINE | count_tables | size\u00a0\u00a0\u00a0\u00a0 | index_size |\r\n+--------------+--------+--------------+----------+------------+\r\n| test\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | InnoDB |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 3 | 12140544 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\r\n| world\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | InnoDB |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\u00a0 4734976 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\r\n| world\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | MyISAM |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 5 | 10665303 |\u00a0\u00a0\u00a0 4457472 |\r\n+--------------+--------+--------------+----------+------------+<\/pre>\n<\/blockquote>\n<p>I may not have intended to, but it seems I have both MyISAM and InnoDB tables in the world database.<\/p>\n<p>The index_size may be important with MyISAM when estimating the desired key_buffer_size.<\/p>\n<p><em>See per table size (almost exactly as presented in INFORMATION_SCHEMA):<\/em><\/p>\n<blockquote>\n<pre>SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE, \r\n SUM(DATA_LENGTH+INDEX_LENGTH) AS size,\r\nSUM(INDEX_LENGTH) AS index_size FROM INFORMATION_SCHEMA.TABLES\r\nWHERE TABLE_SCHEMA NOT IN ('mysql', 'INFORMATION_SCHEMA')\r\n AND ENGINE IS NOT NULL GROUP BY TABLE_SCHEMA, TABLE_NAME<\/pre>\n<\/blockquote>\n<h4>Indexes<\/h4>\n<p>We will now turn to check for duplicate or redundant indexes. We begin by presenting the following table:<\/p>\n<blockquote>\n<pre>mysql&gt; show create table City \\G\r\n*************************** 1. row ***************************\r\n       Table: City\r\nCreate Table: CREATE TABLE `City` (\r\n  `ID` int(11) NOT NULL auto_increment,\r\n  `Name` char(35) character set utf8 NOT NULL default '',\r\n  `CountryCode` char(3) NOT NULL default '',\r\n  `District` char(20) NOT NULL default '',\r\n  `Population` int(11) NOT NULL default '0',\r\n  PRIMARY KEY  (`ID`),\r\n  UNIQUE KEY `ID` (`ID`),\r\n  KEY `Population` (`Population`),\r\n  KEY `Population_2` (`Population`,`CountryCode`)\r\n) ENGINE=MyISAM AUTO_INCREMENT=4080 DEFAULT CHARSET=latin1<\/pre>\n<\/blockquote>\n<p>We can see that the Population_2 index covers the Population index, so the latter is redundant and should be removed. We also see that the ID index is redundant, since there is a PRIMARY KEY on ID, which is in itself a unique key. How can we test such cases by querying the INFORMATION_SCHEMA? Turns out we can do that using the STATISTICS table.<\/p>\n<p>[Update: thanks to Roland Bouman&#8217;s comments. The following queries only consider BTREE indexes, and do not verify FULLTEXT or HASH indexes]<\/p>\n<p><em>See if some index is a prefix of another (in which case it is redundant):<\/em><\/p>\n<blockquote>\n<pre><strong>SELECT <\/strong>* <strong>FROM <\/strong>(\r\n  <strong>SELECT <\/strong>TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,\r\n    <strong>GROUP_CONCAT<\/strong>(COLUMN_NAME <strong>ORDER BY<\/strong> SEQ_IN_INDEX) <strong>AS <\/strong>columns\r\n  <strong>FROM <\/strong>`information_schema`.`STATISTICS`\r\n  <strong>WHERE <\/strong>TABLE_SCHEMA <strong>NOT IN<\/strong> ('mysql', 'INFORMATION_SCHEMA')\r\n    <strong>AND <\/strong>NON_UNIQUE = 1 <strong>AND <\/strong>INDEX_TYPE='BTREE' <strong>\r\n  GROUP BY<\/strong> TABLE_SCHEMA, TABLE_NAME, INDEX_NAME\r\n) <strong>AS <\/strong>i1 <strong>INNER JOIN<\/strong> (\r\n  <strong>SELECT <\/strong>TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,\r\n    <strong>GROUP_CONCAT<\/strong>(COLUMN_NAME <strong>ORDER BY<\/strong> SEQ_IN_INDEX) <strong>AS <\/strong>columns\r\n  <strong>FROM <\/strong>`information_schema`.`STATISTICS`\r\n  <strong>WHERE <\/strong>INDEX_TYPE='BTREE' <strong>\r\n  GROUP BY<\/strong> TABLE_SCHEMA, TABLE_NAME, INDEX_NAME\r\n) <strong>AS <\/strong>i2\r\n<strong>USING <\/strong>(TABLE_SCHEMA, TABLE_NAME)\r\n<strong>WHERE <\/strong>i1.columns != i2.columns <strong>AND LOCATE<\/strong>(<strong>CONCAT<\/strong>(i1.columns, ','), i2.columns) = 1<\/pre>\n<\/blockquote>\n<p>The above query lists pairs of indexes in which one of them is a true prefix of the other. I&#8217;m using <code><strong>GROUP_CONCAT<\/strong>(COLUMN_NAME <strong>ORDER BY<\/strong> SEQ_IN_INDEX)<\/code> to aggregate columns per index, by order of appearance in that index.<br \/>\nThe query only considers cases when the prefix (the &#8220;shorter&#8221;) index is non-unique. Else wise there is no redundancy, as the uniqueness of the index imposes a constraint which is not achieved by the &#8220;longer&#8221; index.<\/p>\n<p>Result example:<\/p>\n<blockquote>\n<pre>+--------------+------------+------------+------------+--------------+------------------------+\r\n| TABLE_SCHEMA | TABLE_NAME | INDEX_NAME | columns\u00a0\u00a0\u00a0 | INDEX_NAME\u00a0\u00a0 | columns\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\r\n+--------------+------------+------------+------------+--------------+------------------------+\r\n| world\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | City\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | Population | Population | Population_2 | Population,CountryCode |\r\n+--------------+------------+------------+------------+--------------+------------------------+<\/pre>\n<\/blockquote>\n<p><em>See if any two indexes are identical:<\/em><\/p>\n<blockquote>\n<pre><strong>SELECT <\/strong>* <strong>FROM <\/strong>(\r\n  <strong>SELECT <\/strong>TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,\r\n    <strong>GROUP_CONCAT<\/strong>(COLUMN_NAME <strong>ORDER BY<\/strong> SEQ_IN_INDEX) <strong>AS <\/strong>columns, NON_UNIQUE\r\n  <strong>FROM <\/strong>`information_schema`.`STATISTICS`\r\n  <strong>WHERE <\/strong>TABLE_SCHEMA <strong>NOT IN <\/strong>('mysql', 'INFORMATION_SCHEMA')\r\n  <strong>AND<\/strong> INDEX_TYPE='BTREE'\r\n  <strong>GROUP BY<\/strong> TABLE_SCHEMA, TABLE_NAME, INDEX_NAME\r\n) <strong>AS <\/strong>i1 <strong>INNER JOIN<\/strong> (\r\n  <strong>SELECT <\/strong>TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,\r\n    <strong>GROUP_CONCAT<\/strong>(COLUMN_NAME <strong>ORDER BY<\/strong> SEQ_IN_INDEX) <strong>AS <\/strong>columns, NON_UNIQUE\r\n  <strong>FROM <\/strong>`information_schema`.`STATISTICS`\r\n  <strong>WHERE <\/strong>INDEX_TYPE='BTREE'\r\n \u00a0<strong>GROUP BY<\/strong> TABLE_SCHEMA, TABLE_NAME, INDEX_NAME\r\n) <strong>AS <\/strong>i2\r\n<strong>USING <\/strong>(TABLE_SCHEMA, TABLE_NAME)\r\n<strong>WHERE <\/strong>i1.columns = i2.columns <strong>AND <\/strong>i1.NON_UNIQUE = i2.NON_UNIQUE\r\n  <strong>AND <\/strong>i1.INDEX_NAME &lt; i2.INDEX_NAME<\/pre>\n<\/blockquote>\n<p>The above checks for unique or non-unique indexes alike. It checks for indexes with identical columns list (and in the same order, of course). Any two indexes having the same list of columns imply a redundancy. If both are unique or both are non-unique, either can be removed. If one is unique and the other is not, the non-unique index should be removed.<\/p>\n<p>Result example:<\/p>\n<blockquote>\n<pre>+--------------+------------+------------+---------+------------+------------+---------+------------+\r\n| TABLE_SCHEMA | TABLE_NAME | INDEX_NAME | columns | NON_UNIQUE | INDEX_NAME | columns | NON_UNIQUE |\r\n+--------------+------------+------------+---------+------------+------------+---------+------------+\r\n| world\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | City\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | PRIMARY\u00a0\u00a0\u00a0 | ID\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 | ID\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | ID\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 |\r\n+--------------+------------+------------+---------+------------+------------+---------+------------+<\/pre>\n<\/blockquote>\n<p>You may also wish to take a look at the excellent mk-duplicate-key-checker, a <a title=\"maatkit\" href=\"http:\/\/www.maatkit.org\/\">maatkit<\/a> utility by <a title=\"Xaprb\" href=\"http:\/\/www.xaprb.com\/blog\/\">Baron Schwartz<\/a>.<\/p>\n<h4>Character sets<\/h4>\n<p><em>Show the character sets for all tables:<\/em><\/p>\n<blockquote>\n<pre><strong>SELECT <\/strong>TABLE_SCHEMA, TABLE_NAME, CHARACTER_SET_NAME, TABLE_COLLATION\r\n<strong>FROM <\/strong>INFORMATION_SCHEMA.TABLES\r\n<strong>INNER JOIN<\/strong> INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY\r\n  <strong>ON <\/strong>(TABLES.TABLE_COLLATION = COLLATION_CHARACTER_SET_APPLICABILITY.COLLATION_NAME)\r\n<strong>WHERE <\/strong>TABLE_SCHEMA <strong>NOT IN<\/strong> ('mysql', 'INFORMATION_SCHEMA')<\/pre>\n<\/blockquote>\n<p>Surprisingly, the TABLES table does not include the character set for the table, only the collation, so we must join with COLLATION_CHARACTER_SET_APPLICABILITY to get the character set for that collation. Yes, it&#8217;s more normalized this way, but INFORMATION_SCHEMA is not too normalized anyway.<\/p>\n<p><em>See all the textual columns, along with their character sets:<\/em><\/p>\n<blockquote>\n<pre><strong>SELECT <\/strong>TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME\r\n<strong>FROM <\/strong>INFORMATION_SCHEMA.COLUMNS\r\n<strong>WHERE <\/strong>TABLE_SCHEMA <strong>NOT IN<\/strong> ('mysql', 'INFORMATION_SCHEMA')\r\n  <strong>AND <\/strong>CHARACTER_SET_NAME <strong>IS NOT NULL<\/strong>\r\n<strong>ORDER BY<\/strong> TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME<\/pre>\n<\/blockquote>\n<p><em>See those columns for which the character set or collation is different from the table&#8217;s character set and collation:<\/em><\/p>\n<blockquote>\n<pre><strong>SELECT <\/strong>columns.TABLE_SCHEMA, columns.TABLE_NAME, COLUMN_NAME,\r\n  CHARACTER_SET_NAME <strong>AS <\/strong>column_CHARSET,\r\n  COLLATION_NAME <strong>AS <\/strong>column_COLLATION,\r\n  table_CHARSET, TABLE_COLLATION\r\n<strong>FROM <\/strong>(\r\n  <strong>SELECT <\/strong>TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME\r\n  <strong>FROM <\/strong>INFORMATION_SCHEMA.COLUMNS\r\n  <strong>WHERE <\/strong>TABLE_SCHEMA <strong>NOT IN<\/strong> ('mysql', 'INFORMATION_SCHEMA')\r\n    <strong>AND <\/strong>CHARACTER_SET_NAME <strong>IS NOT NULL<\/strong>\r\n) <strong>AS <\/strong>columns <strong>INNER JOIN <\/strong>(\r\n  <strong>SELECT <\/strong>TABLE_SCHEMA, TABLE_NAME, CHARACTER_SET_NAME <strong>AS <\/strong>table_CHARSET, TABLE_COLLATION\r\n  <strong>FROM <\/strong>INFORMATION_SCHEMA.TABLES\r\n  <strong>INNER JOIN<\/strong> INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY\r\n    <strong>ON <\/strong>(TABLES.TABLE_COLLATION = COLLATION_CHARACTER_SET_APPLICABILITY.COLLATION_NAME)\r\n) <strong>AS <\/strong>tables\r\n<strong>ON <\/strong>(columns.TABLE_SCHEMA = tables.TABLE_SCHEMA <strong>AND <\/strong>columns.TABLE_NAME = tables.TABLE_NAME)\r\n<strong>WHERE <\/strong>(columns.CHARACTER_SET_NAME != table_CHARSET <strong>OR <\/strong>columns.COLLATION_NAME != TABLE_COLLATION)\r\n<strong>ORDER BY<\/strong> TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME<\/pre>\n<\/blockquote>\n<p>Result example:<\/p>\n<blockquote>\n<pre>+--------------+------------+-------------+----------------+------------------+---------------+-------------------+\r\n| TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | column_CHARSET | column_COLLATION | table_CHARSET | TABLE_COLLATION   |\r\n+--------------+------------+-------------+----------------+------------------+---------------+-------------------+\r\n| world        | City       | Name        | utf8           | utf8_general_ci  | latin1        | latin1_swedish_ci |\r\n+--------------+------------+-------------+----------------+------------------+---------------+-------------------<\/pre>\n<\/blockquote>\n<h4>Processes (MySQL 5.1)<\/h4>\n<p>With MySQL 5.1 comes a boost to INFORMATION_SCHEMA. Among the new tables we can find the PROCESSLIST table, as well as GLOBAL_VARIABLES and GLOBAL_STATUS. Together with the new Event Scheduler, it seems the sky is the limit.<\/p>\n<p><em>See which processes are active:<\/em><\/p>\n<blockquote>\n<pre>SELECT * FROM information_schema.PROCESSLIST WHERE COMMAND != 'Sleep'<\/pre>\n<\/blockquote>\n<p><em>Show slow queries:<\/em><\/p>\n<blockquote>\n<pre>SELECT * FROM information_schema.PROCESSLIST WHERE COMMAND != 'Sleep' AND TIME &gt; 4<\/pre>\n<\/blockquote>\n<p><em>How many processes per user?<\/em><\/p>\n<blockquote>\n<pre>SELECT USER, COUNT(*) FROM information_schema.PROCESSLIST GROUP BY USER<\/pre>\n<\/blockquote>\n<p><em>How many processes per host?<\/em><\/p>\n<blockquote>\n<pre>SELECT SUBSTR(HOST, 1, LOCATE(':',HOST)-1) AS hostname, COUNT(*)\r\nFROM information_schema.PROCESSLIST GROUP BY hostname<\/pre>\n<\/blockquote>\n<p>Along with the Event Scheduler, a stored procedure may decide to KILL processes executing for more than 10 minutes, KILL users who have too many connections, perform some logging on connections and more.<\/p>\n<h4>Conclusion<\/h4>\n<p>I have presented what I think is a set of useful queries. When I approach a new database I use these to get an overall understanding of what&#8217;s in it. Finding duplicate indexes can explain a lot about how the designers or developers <em>think<\/em> the database should behave. Looking at the non-default character sets shows if textual columns have been carefully designed or not. For example, querying for non-default columns characters and getting no results may imply that many textual columns have improper character sets.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A set of useful queries on INFORMATION_SCHEMA follows. These queries can be used when approaching a new database, to learn about some of its properties, or they can be regularly used on an existing schema, so as to verify its integrity.<\/p>\n<p>I will present queries for:<\/p>\n<p>    * Checking on database engines and size<br \/>\n    * Locating duplicate and redundant indexes<br \/>\n    * Checking on character sets for columns and tables, looking for variances<br \/>\n    * Checking on processes and long queries<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[25,26,24,21,20],"class_list":["post-188","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-analysis","tag-indexing","tag-information_schema","tag-sql","tag-syntax"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-32","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/188","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=188"}],"version-history":[{"count":43,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/188\/revisions"}],"predecessor-version":[{"id":258,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/188\/revisions\/258"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=188"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=188"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=188"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}