<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>code.openark.org &#187; INFORMATION_SCHEMA</title>
	<atom:link href="http://code.openark.org/blog/tag/information_schema/feed" rel="self" type="application/rss+xml" />
	<link>http://code.openark.org/blog</link>
	<description>Blog by Shlomi Noach</description>
	<lastBuildDate>Tue, 07 Sep 2010 05:53:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Verifying GROUP_CONCAT limit without using variables</title>
		<link>http://code.openark.org/blog/mysql/verifying-group_concat-limit-without-using-variables</link>
		<comments>http://code.openark.org/blog/mysql/verifying-group_concat-limit-without-using-variables#comments</comments>
		<pubDate>Thu, 10 Jun 2010 07:16:14 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Configuration]]></category>
		<category><![CDATA[INFORMATION_SCHEMA]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2534</guid>
		<description><![CDATA[I have a case where I must know if group_concat_max_len is at its default value (1024), which means there are some operations I cannot perform. I&#8217;ve ranted on this here. Normally, I would simply: SELECT @@group_concat_max_len However, I am using views, where session variables are not allowed. Using a stored function can do the [...]]]></description>
			<content:encoded><![CDATA[<p>I have a case where I must know if <strong>group_concat_max_len</strong> is at its default value (<strong>1024</strong>), which means there are some operations I cannot perform. I&#8217;ve ranted on this <a href="http://code.openark.org/blog/mysql/those-oversized-undersized-variables-defaults">here</a>.</p>
<p>Normally, I would simply:</p>
<blockquote><pre class="brush: sql;">
SELECT @@group_concat_max_len
</pre>
</blockquote>
<p>However, I am using views, where session variables are not allowed. Using a stored function can <a href="http://code.openark.org/blog/mysql/views-better-performance-with-condition-pushdown">do the trick</a>, but I wanted to avoid stored routines. So here&#8217;s a very simple test case: is the current <strong>group_concat_max_len</strong> long enough or not? I&#8217;ll present the long version and the short version.</p>
<h4>The long version</h4>
<blockquote><pre class="brush: sql;">
SELECT
  CHAR_LENGTH(
    GROUP_CONCAT(
      COLLATION_NAME SEPARATOR ''
    )
  )
FROM
  INFORMATION_SCHEMA.COLLATIONS;
</pre>
</blockquote>
<p>If the result is <strong>1024</strong>, we are in bad shape. I happen to know that the total length of collation names is above <strong>1800</strong>, and so the result has been truncated. Another variant of the above query would be:<span id="more-2534"></span></p>
<blockquote><pre class="brush: sql;">
SELECT
  CHAR_LENGTH(
    GROUP_CONCAT(
      COLLATION_NAME SEPARATOR ''
    )
  ) = SUM(CHAR_LENGTH(COLLATION_NAME))
    AS group_concat_max_len_is_long_enough
FROM
  INFORMATION_SCHEMA.COLLATIONS;

+-------------------------------------+
| group_concat_max_len_is_long_enough |
+-------------------------------------+
|                                   0 |
+-------------------------------------+
</pre>
</blockquote>
<p>The <strong>COLLATIONS</strong>, <strong>CHARACTER_SETS</strong> or <strong>COLLATION_CHARACTER_SET_APPLICABILITY</strong> tables provide values that are known to exist (assuming you did not compile MySQL with a restricted set of charsets). It&#8217;s possible to <strong>CONCAT</strong>, <strong>UNION</strong> or <strong>JOIN</strong> columns and tables to test for <strong>group_concat_max_len</strong> values longer than <strong>1800</strong> characters. I admit this is becoming ugly, so let&#8217;s move on.</p>
<h4>The short version</h4>
<p>Don&#8217;t want to rely on existing tables? Not sure what values to expect? Look at this:</p>
<blockquote><pre class="brush: sql;">
SELECT CHAR_LENGTH(GROUP_CONCAT(REPEAT('0', 1025))) FROM DUAL
</pre>
</blockquote>
<p><strong>GROUP_CONCAT</strong> doesn&#8217;t really care about the number of rows. In the above example, I&#8217;m using a single row (retrieved from the <strong>DUAL</strong> virtual table), making sure it is long enough. Type in any number in place of <strong>1025</strong>, and you have a metric for your <strong>group_concat_max_len</strong>.</p>
<blockquote><pre class="brush: sql;">
SELECT
  CHAR_LENGTH(GROUP_CONCAT(REPEAT('0', 32768))) &gt;= 32768 As group_concat_max_len_is_long_enough
FROM
  DUAL;
+-------------------------------------+
| group_concat_max_len_is_long_enough |
+-------------------------------------+
|                                   0 |
+-------------------------------------+
</pre>
</blockquote>
<p>The above builds the test string with <strong>REPEAT</strong> at run time. One can replace it with a sufficiently long string constant.</p>
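<p>Since the motivation was to run this check from within a view, here is a minimal sketch of how the short version might be wrapped in one. The view name <strong>group_concat_status</strong> and the threshold of <strong>1025</strong> are illustrative assumptions, not part of the original setup, and the sketch assumes your MySQL version accepts this <strong>SELECT</strong> in a view definition:</p>
<blockquote><pre class="brush: sql;">
-- Hypothetical view name. Returns 1 when group_concat_max_len
-- is larger than the 1024 default, 0 otherwise.
CREATE OR REPLACE VIEW group_concat_status AS
  SELECT
    CHAR_LENGTH(GROUP_CONCAT(REPEAT('0', 1025))) &gt; 1024
      AS group_concat_max_len_is_long_enough
  FROM
    DUAL;

-- The view is evaluated with the session's current setting
SELECT group_concat_max_len_is_long_enough FROM group_concat_status;
</pre>
</blockquote>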
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/verifying-group_concat-limit-without-using-variables/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>How to calculate a good InnoDB log file size &#8211; recap</title>
		<link>http://code.openark.org/blog/mysql/how-to-calculate-a-good-innodb-log-file-size-recap</link>
		<comments>http://code.openark.org/blog/mysql/how-to-calculate-a-good-innodb-log-file-size-recap#comments</comments>
		<pubDate>Tue, 20 Oct 2009 19:04:40 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[INFORMATION_SCHEMA]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=895</guid>
		<description><![CDATA[Following Baron Schwartz&#8217; post: How to calculate a good InnoDB log file size, which shows how to make an estimate for the InnoDB log file size, and based on SQL: querying for status difference over time, I&#8217;ve written a query to run on MySQL 5.1, which, upon sampling 60 seconds of status, estimates the InnoDB [...]]]></description>
			<content:encoded><![CDATA[<p>Following Baron Schwartz&#8217; post: <a href="http://www.mysqlperformanceblog.com/2008/11/21/how-to-calculate-a-good-innodb-log-file-size/">How to calculate a good InnoDB log file size</a>, which shows how to make an estimate for the InnoDB log file size, and based on <a href="http://code.openark.org/blog/mysql/sql-querying-for-status-difference-over-time">SQL: querying for status difference over time</a>, I&#8217;ve written a query to run on MySQL 5.1, which, upon sampling 60 seconds of status, estimates the InnoDB transaction log bytes that are expected to be written in the period of 1 hour.</p>
<p><em>Recap</em>: this information can be useful if you&#8217;re looking for a good <strong>innodb_log_file_size</strong> value: one that will neither cause too much I/O (smaller values make for more frequent flushes), nor make for too long a recovery time (larger values mean more transactions to recover upon crash).</p>
<p>It is assumed that the 60 seconds period represents an average system load, not some activity spike period. Edit the sleep time and factors as you will to sample longer or shorter periods.<span id="more-895"></span></p>
<blockquote>
<pre><strong>SELECT</strong>
  innodb_os_log_written_per_minute*60
    <strong>AS</strong> estimated_innodb_os_log_written_per_hour,
  CONCAT(ROUND(innodb_os_log_written_per_minute*60/1024/1024, 1), 'MB')
    <strong>AS</strong> estimated_innodb_os_log_written_per_hour_mb
<strong>FROM</strong>
  (<strong>SELECT</strong> <strong>SUM</strong>(value) <strong>AS</strong> innodb_os_log_written_per_minute <strong>FROM</strong> (
    <strong>SELECT</strong> -VARIABLE_VALUE <strong>AS</strong> value
      <strong>FROM</strong> INFORMATION_SCHEMA.GLOBAL_STATUS
      <strong>WHERE</strong> VARIABLE_NAME = 'innodb_os_log_written'
    <strong>UNION ALL</strong>
    <strong>SELECT</strong> SLEEP(60)
      <strong>FROM</strong> DUAL
    <strong>UNION ALL</strong>
    <strong>SELECT</strong> VARIABLE_VALUE
      <strong>FROM</strong> INFORMATION_SCHEMA.GLOBAL_STATUS
      <strong>WHERE</strong> VARIABLE_NAME = 'innodb_os_log_written'
  ) s1
) s2
;</pre>
</blockquote>
<p>Sample output:</p>
<blockquote>
<pre>+------------------------------------------+---------------------------------------------+
| estimated_innodb_os_log_written_per_hour | estimated_innodb_os_log_written_per_hour_mb |
+------------------------------------------+---------------------------------------------+
|                                584171520 | 557.1MB                                     |
+------------------------------------------+---------------------------------------------+</pre>
</blockquote>
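<p>To turn the hourly estimate into a candidate setting, one approach is to divide it by the number of log files. This assumes the rule of thumb, as I understand it from the referenced post, that the log files together should hold roughly an hour of writes. A minimal sketch, plugging in the sample figure above (the alias is only an illustrative name):</p>
<blockquote>
<pre>-- 584171520 is the sample estimated_innodb_os_log_written_per_hour from above;
-- substitute your own measurement.
SELECT
  ROUND(584171520 / @@innodb_log_files_in_group / 1024 / 1024)
    AS suggested_innodb_log_file_size_mb;</pre>
</blockquote>
<p>With the default of two log files this comes to roughly 279MB per file. Treat it as a starting point rather than a definitive value.</p>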
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/how-to-calculate-a-good-innodb-log-file-size-recap/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>SQL: querying for status difference over time</title>
		<link>http://code.openark.org/blog/mysql/sql-querying-for-status-difference-over-time</link>
		<comments>http://code.openark.org/blog/mysql/sql-querying-for-status-difference-over-time#comments</comments>
		<pubDate>Tue, 20 Oct 2009 09:31:43 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[INFORMATION_SCHEMA]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=945</guid>
		<description><![CDATA[The InnoDB plugin has a nice INFORMATION_SCHEMA concept: resetting tables. For example, the INNODB_CMP table lists information about compression operations. A similar table, INNODB_CMP_RESET, provides the same information, but resets the values. The latter can be used to measure, for example, number of compression operations over time. I wish to present a SQL trick which [...]]]></description>
			<content:encoded><![CDATA[<p>The InnoDB plugin has a nice <strong>INFORMATION_SCHEMA</strong> concept: resetting tables. For example, the <strong>INNODB_CMP</strong> table lists information about compression operations. A similar table, <strong>INNODB_CMP_RESET</strong>, provides the same information, but resets the values when read. The latter can be used to measure, for example, the number of compression operations over time.</p>
<p>I wish to present a SQL trick which does the same, without need for resetting tables. Suppose you have some status table, and you wish to measure the change in status per second, per minute etc. The trick is to query for the value twice in the same query, with some pause in between, and make the difference calculation.</p>
<p>For the sake of simplicity, I&#8217;ll demonstrate using 5.1&#8217;s <strong>INFORMATION_SCHEMA.GLOBAL_STATUS</strong>. Please refer to <a href="http://code.openark.org/blog/mysql/information_schema-global_status-watch-out">INFORMATION_SCHEMA.GLOBAL_STATUS: watch out</a> for some discussion on this.</p>
<p><span id="more-945"></span>In our example, we wish to measure the number of questions per second. Getting the number of questions is done with:</p>
<blockquote>
<pre><strong>SELECT</strong> * <strong>FROM</strong> INFORMATION_SCHEMA.GLOBAL_STATUS <strong>WHERE</strong> VARIABLE_NAME = 'questions';
+---------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------+----------------+
| QUESTIONS     | 3619           |
+---------------+----------------+
1 row in set (0.00 sec)</pre>
</blockquote>
<p>Applying the trick, thus solving the problem:</p>
<blockquote>
<pre><strong>SELECT</strong> <strong>SUM</strong>(value) <strong>AS</strong> questions_per_sec <strong>FROM</strong> (
  <strong>SELECT</strong> -VARIABLE_VALUE <strong>AS</strong> value
    <strong>FROM</strong> INFORMATION_SCHEMA.GLOBAL_STATUS
    <strong>WHERE</strong> VARIABLE_NAME = 'questions'
  <strong>UNION</strong> <strong>ALL</strong>
  <strong>SELECT</strong> SLEEP(1)
    <strong>FROM</strong> DUAL
  <strong>UNION</strong> <strong>ALL</strong>
  <strong>SELECT</strong> VARIABLE_VALUE
    <strong>FROM</strong> INFORMATION_SCHEMA.GLOBAL_STATUS
    <strong>WHERE</strong> VARIABLE_NAME = 'questions'
) s1;
+-------------------+
| questions_per_sec |
+-------------------+
|               126 |
+-------------------+
1 row in set (1.01 sec)</pre>
</blockquote>
<p>Make a one minute measurement with <strong>SLEEP(60)</strong>, then divide <strong>SUM</strong> by 60.</p>
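<p>As a sketch, the one minute variant could look like this (the alias <strong>questions_per_sec_avg</strong> is only an illustrative name):</p>
<blockquote>
<pre>-- Sample the counter twice, 60 seconds apart, and average the difference per second
SELECT SUM(value) / 60 AS questions_per_sec_avg FROM (
  SELECT -VARIABLE_VALUE AS value
    FROM INFORMATION_SCHEMA.GLOBAL_STATUS
    WHERE VARIABLE_NAME = 'questions'
  UNION ALL
  SELECT SLEEP(60)
    FROM DUAL
  UNION ALL
  SELECT VARIABLE_VALUE
    FROM INFORMATION_SCHEMA.GLOBAL_STATUS
    WHERE VARIABLE_NAME = 'questions'
) s1;</pre>
</blockquote>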
<h4>Note on transactional tables</h4>
<p>The above trick will not work when reading values from transactional tables with isolation level &gt;= <strong>REPEATABLE-READ</strong>, since, by definition, you must get the same value back while in the same transaction. So this works with MyISAM and MEMORY tables, functions, and other non-transactional data sources.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/sql-querying-for-status-difference-over-time/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>INFORMATION_SCHEMA.GLOBAL_STATUS: watch out</title>
		<link>http://code.openark.org/blog/mysql/information_schema-global_status-watch-out</link>
		<comments>http://code.openark.org/blog/mysql/information_schema-global_status-watch-out#comments</comments>
		<pubDate>Wed, 14 Oct 2009 19:03:32 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[INFORMATION_SCHEMA]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1194</guid>
		<description><![CDATA[MySQL 5.1 boasts some new and useful INFORMATION_SCHEMA tables. Among them is the GLOBAL_STATUS table. At last, it is possible to ask questions like: node1&#62; SELECT * FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'innodb_os_log_written'; +-----------------------+----------------+ &#124; VARIABLE_NAME         &#124; VARIABLE_VALUE &#124; +-----------------------+----------------+ &#124; INNODB_OS_LOG_WRITTEN &#124; 512            &#124; +-----------------------+----------------+ 1 row in set (0.00 sec) node1&#62; SELECT * [...]]]></description>
			<content:encoded><![CDATA[<p>MySQL 5.1 boasts some new and useful <strong>INFORMATION_SCHEMA</strong> tables. Among them is the <strong>GLOBAL_STATUS</strong> table.</p>
<p>At last, it is possible to ask questions like:</p>
<blockquote>
<pre>node1&gt; SELECT * FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'innodb_os_log_written';
+-----------------------+----------------+
| VARIABLE_NAME         | VARIABLE_VALUE |
+-----------------------+----------------+
| INNODB_OS_LOG_WRITTEN | 512            |
+-----------------------+----------------+
1 row in set (0.00 sec)

node1&gt; SELECT * FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'questions';
+---------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------+----------------+
| QUESTIONS     | 28             |
+---------------+----------------+
1 row in set (0.00 sec)</pre>
</blockquote>
<h4>Watch out #1</h4>
<p>As with all <strong>INFORMATION_SCHEMA</strong> tables, to get a single row one needs to materialize the entire table. To ask the above two questions, the table will materialize twice. This means gathering all the information &#8212; twice. To get 20 values, we materialize the table 20 times. It not only takes time, but also increases some of the status variables themselves, like <strong>questions</strong>, <strong>select_scan</strong>, <strong>created_tmp_tables</strong>. Ironically, when we used <strong>SHOW GLOBAL STATUS</strong> and had to parse the results in our application code, we only issued the query once. But with the convenience of <strong>INFORMATION_SCHEMA</strong>, it&#8217;s much easier (and makes more sense!) to query per variable.</p>
<h4><span id="more-1194"></span>Watch out #2</h4>
<p>So if we&#8217;re to access a handful of status variables, and wish to only materialize the table once, what can we do? An easy solution is to create a <strong>MEMORY</strong> table which looks just like <strong>GLOBAL_STATUS</strong>, like this:</p>
<blockquote>
<pre>node1&gt; CREATE TABLE memory_global_status LIKE INFORMATION_SCHEMA.GLOBAL_STATUS;
Query OK, 0 rows affected (0.00 sec)
node1&gt; INSERT INTO memory_global_status SELECT * FROM INFORMATION_SCHEMA.GLOBAL_STATUS;
Query OK, 291 rows affected (0.01 sec)
Records: 291  Duplicates: 0  Warnings: 0</pre>
</blockquote>
<p>We can now query the <strong>memory_global_status</strong> table, having &#8216;frozen&#8217; the status, as many times as we wish, with no real cost.</p>
<p>But let&#8217;s take a look at:</p>
<blockquote>
<pre>node1&gt; SHOW TABLE STATUS LIKE 'memory_global_status'\G
*************************** 1. row ***************************
           Name: memory_global_status
         Engine: MEMORY
        Version: 10
     Row_format: Fixed
           Rows: 291
 Avg_row_length: 3268
    Data_length: 1050624
Max_data_length: 16755036
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: NULL
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)</pre>
</blockquote>
<p>Ouch! How did we get <strong>Avg_row_length: 3268</strong>, and <strong>Data_length: 1050624</strong>? That&#8217;s quite a bit more than we expected. Well, most of the values in <strong>GLOBAL_STATUS</strong> are just integers. But some, just a few, are textual, and so the table definition is:</p>
<blockquote>
<pre>node1&gt; SHOW CREATE TABLE INFORMATION_SCHEMA.GLOBAL_STATUS \G
*************************** 1. row ***************************
       Table: GLOBAL_STATUS
Create Table: CREATE TEMPORARY TABLE `GLOBAL_STATUS` (
  `VARIABLE_NAME` varchar(64) NOT NULL DEFAULT '',
  `VARIABLE_VALUE` varchar(1024) DEFAULT NULL
) ENGINE=MEMORY DEFAULT CHARSET=utf8
1 row in set (0.00 sec)</pre>
</blockquote>
<p>A <strong>MEMORY</strong> table uses the FIXED row format, which means it allocates 64 utf8 characters for <strong>VARIABLE_NAME</strong>, plus 1024 utf8 characters for <strong>VARIABLE_VALUE</strong>, at up to 3 bytes per character plus a length prefix for each column. This makes for: (1+64*3) + (2+1024*3) = 3267; the remaining byte, which brings us to 3268, flags the NULLable value.</p>
<p>I&#8217;m not sure why the table definition is as such. <strong>VARIABLE_NAME</strong> can be safely declared as <strong>ascii</strong>, and, as far as I can see, so can <strong>VARIABLE_VALUE</strong>. There are a few <strong>ON</strong>/<strong>OFF</strong> values (I&#8217;ve expressed my opinion and concerns on these <a href="http://code.openark.org/blog/mysql/variables-ambiguities-in-names-and-values">here</a> and <a href="http://code.openark.org/blog/mysql/more-on-variables-ambiguities">here</a>; why not just use <strong>0</strong>/<strong>1</strong>?). <strong>SSL_CIPHER</strong> seems like the only variable which can get long enough to justify the 1024 characters.</p>
<p>If you don&#8217;t mind truncating those text values, or don&#8217;t care about text values at all (we usually care about the counters), you can disregard them altogether when SELECTing from <strong>GLOBAL_STATUS</strong>. One can also add a <strong>HASH</strong> index on the <strong>VARIABLE_NAME</strong> column to avoid full table scans upon reading each value.</p>
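<p>A minimal sketch of such a slimmed-down snapshot table follows. The table name <strong>memory_global_status_slim</strong> and the 64 character width for values are assumptions chosen for illustration. Longer text values are truncated by <strong>LEFT</strong>, and a non-ASCII value may produce a warning or an error, depending on <strong>sql_mode</strong>:</p>
<blockquote>
<pre>-- Hypothetical narrow copy of GLOBAL_STATUS: ascii columns, short values,
-- and a HASH primary key for lookups by name
CREATE TABLE memory_global_status_slim (
  VARIABLE_NAME varchar(64) CHARACTER SET ascii NOT NULL,
  VARIABLE_VALUE varchar(64) CHARACTER SET ascii DEFAULT NULL,
  PRIMARY KEY USING HASH (VARIABLE_NAME)
) ENGINE=MEMORY;

-- Materialize GLOBAL_STATUS once, truncating long text values
INSERT INTO memory_global_status_slim
  SELECT VARIABLE_NAME, LEFT(VARIABLE_VALUE, 64)
  FROM INFORMATION_SCHEMA.GLOBAL_STATUS;

-- Read as many values as needed from the snapshot
SELECT VARIABLE_VALUE FROM memory_global_status_slim
  WHERE VARIABLE_NAME = 'questions';</pre>
</blockquote>
<p>By the same arithmetic as above, each row should take on the order of 130 bytes rather than 3268.</p>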
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/information_schema-global_status-watch-out/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Useful database analysis queries with INFORMATION_SCHEMA</title>
		<link>http://code.openark.org/blog/mysql/useful-database-analysis-queries-with-information_schema</link>
		<comments>http://code.openark.org/blog/mysql/useful-database-analysis-queries-with-information_schema#comments</comments>
		<pubDate>Wed, 26 Nov 2008 06:47:12 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Analysis]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[INFORMATION_SCHEMA]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=188</guid>
		<description><![CDATA[A set of useful queries on INFORMATION_SCHEMA follows. These queries can be used when approaching a new database, to learn about some of its properties, or they can be regularly used on an existing schema, so as to verify its integrity.

I will present queries for:

    * Checking on database engines and size
    * Locating duplicate and redundant indexes
    * Checking on character sets for columns and tables, looking for variances
    * Checking on processes and long queries
]]></description>
			<content:encoded><![CDATA[<p>A set of useful queries on INFORMATION_SCHEMA follows. These queries can be used when approaching a new database, to learn about some of its properties, or they can be regularly used on an existing schema, so as to verify its integrity.</p>
<p>I will present queries for:</p>
<ul>
<li>Checking on database engines and size</li>
<li>Locating duplicate and redundant indexes</li>
<li>Checking on character sets for columns and tables, looking for variances</li>
<li>Checking on processes and long queries (only with MySQL 5.1)<span id="more-188"></span></li>
</ul>
<h4>Dimensions</h4>
<p>The following query returns the total size per engine per database. For example, it is common that in a given database, all tables are InnoDB. But once in a while, and even though default-storage-engine is set to InnoDB, someone creates a MyISAM table. This may break transactional behavior, or may cause a <code>mysqldump --single-transaction</code> to be ineffective.</p>
<p><em>See aggregated size per schema per engine:</em></p>
<blockquote>
<pre><strong>SELECT </strong>TABLE_SCHEMA, ENGINE, <strong>COUNT</strong>(*) <strong>AS </strong>count_tables,
  <strong>SUM</strong>(DATA_LENGTH+INDEX_LENGTH) <strong>AS </strong>size,
<strong>SUM</strong>(INDEX_LENGTH) <strong>AS </strong>index_size<strong> FROM </strong>INFORMATION_SCHEMA.TABLES
<strong>WHERE </strong>TABLE_SCHEMA <strong>NOT IN</strong> ('mysql', 'INFORMATION_SCHEMA')
  <strong>AND </strong>ENGINE <strong>IS NOT NULL GROUP BY</strong> TABLE_SCHEMA, ENGINE</pre>
</blockquote>
<p>Result example:</p>
<blockquote>
<pre>+--------------+--------+--------------+----------+------------+
| TABLE_SCHEMA | ENGINE | count_tables | size     | index_size |
+--------------+--------+--------------+----------+------------+
| test         | InnoDB |            3 | 12140544 |          0 |
| world        | InnoDB |            1 |  4734976 |          0 |
| world        | MyISAM |            5 | 10665303 |    4457472 |
+--------------+--------+--------------+----------+------------+</pre>
</blockquote>
<p>I may not have intended to, but it seems I have both MyISAM and InnoDB tables in the world database.</p>
<p>The index_size may be important with MyISAM when estimating the desired key_buffer_size.</p>
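<p>As a rough sketch of that estimate, one could compare the total MyISAM index size with the currently configured key buffer. The column aliases are illustrative, and the comparison is only a first approximation, since not all indexes necessarily need to fit in the key buffer at once:</p>
<blockquote>
<pre>-- Total size of MyISAM indexes next to the current key_buffer_size
SELECT
  SUM(INDEX_LENGTH) AS total_myisam_index_size,
  @@key_buffer_size AS current_key_buffer_size
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA NOT IN ('mysql', 'INFORMATION_SCHEMA')
  AND ENGINE = 'MyISAM'</pre>
</blockquote>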
<p><em>See per table size (almost exactly as presented in INFORMATION_SCHEMA):</em></p>
<blockquote>
<pre>SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE,
 DATA_LENGTH+INDEX_LENGTH AS size,
 INDEX_LENGTH AS index_size FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA NOT IN ('mysql', 'INFORMATION_SCHEMA')
 AND ENGINE IS NOT NULL</pre>
</blockquote>
<h4>Indexes</h4>
<p>We will now turn to check for duplicate or redundant indexes. We begin by presenting the following table:</p>
<blockquote>
<pre>mysql&gt; show create table City \G
*************************** 1. row ***************************
       Table: City
Create Table: CREATE TABLE `City` (
  `ID` int(11) NOT NULL auto_increment,
  `Name` char(35) character set utf8 NOT NULL default '',
  `CountryCode` char(3) NOT NULL default '',
  `District` char(20) NOT NULL default '',
  `Population` int(11) NOT NULL default '0',
  PRIMARY KEY  (`ID`),
  UNIQUE KEY `ID` (`ID`),
  KEY `Population` (`Population`),
  KEY `Population_2` (`Population`,`CountryCode`)
) ENGINE=MyISAM AUTO_INCREMENT=4080 DEFAULT CHARSET=latin1</pre>
</blockquote>
<p>We can see that the Population_2 index covers the Population index, so the latter is redundant and should be removed. We also see that the ID index is redundant, since there is a PRIMARY KEY on ID, which is in itself a unique key. How can we test such cases by querying the INFORMATION_SCHEMA? Turns out we can do that using the STATISTICS table.</p>
<p>[Update: thanks to Roland Bouman's comments. The following queries only consider BTREE indexes, and do not verify FULLTEXT or HASH indexes]</p>
<p><em>See if some index is a prefix of another (in which case it is redundant):</em></p>
<blockquote>
<pre><strong>SELECT </strong>* <strong>FROM </strong>(
  <strong>SELECT </strong>TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,
    <strong>GROUP_CONCAT</strong>(COLUMN_NAME <strong>ORDER BY</strong> SEQ_IN_INDEX) <strong>AS </strong>columns
  <strong>FROM </strong>`information_schema`.`STATISTICS`
  <strong>WHERE </strong>TABLE_SCHEMA <strong>NOT IN</strong> ('mysql', 'INFORMATION_SCHEMA')
    <strong>AND </strong>NON_UNIQUE = 1 <strong>AND </strong>INDEX_TYPE='BTREE' <strong>
  GROUP BY</strong> TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) <strong>AS </strong>i1 <strong>INNER JOIN</strong> (
  <strong>SELECT </strong>TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,
    <strong>GROUP_CONCAT</strong>(COLUMN_NAME <strong>ORDER BY</strong> SEQ_IN_INDEX) <strong>AS </strong>columns
  <strong>FROM </strong>`information_schema`.`STATISTICS`
  <strong>WHERE </strong>INDEX_TYPE='BTREE' <strong>
  GROUP BY</strong> TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) <strong>AS </strong>i2
<strong>USING </strong>(TABLE_SCHEMA, TABLE_NAME)
<strong>WHERE </strong>i1.columns != i2.columns <strong>AND LOCATE</strong>(<strong>CONCAT</strong>(i1.columns, ','), i2.columns) = 1</pre>
</blockquote>
<p>The above query lists pairs of indexes in which one of them is a true prefix of the other. I&#8217;m using <code><strong>GROUP_CONCAT</strong>(COLUMN_NAME <strong>ORDER BY</strong> SEQ_IN_INDEX)</code> to aggregate columns per index, by order of appearance in that index.<br />
The query only considers cases where the prefix (the &#8220;shorter&#8221;) index is non-unique. Otherwise there is no redundancy, as the uniqueness of the index imposes a constraint which is not enforced by the &#8220;longer&#8221; index.</p>
<p>Result example:</p>
<blockquote>
<pre>+--------------+------------+------------+------------+--------------+------------------------+
| TABLE_SCHEMA | TABLE_NAME | INDEX_NAME | columns    | INDEX_NAME   | columns                |
+--------------+------------+------------+------------+--------------+------------------------+
| world        | City       | Population | Population | Population_2 | Population,CountryCode |
+--------------+------------+------------+------------+--------------+------------------------+</pre>
</blockquote>
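<p>Given the result above, removing the redundant index on the sample table would look like this (assuming nothing refers to the index by name, for example through index hints):</p>
<blockquote>
<pre>-- Population is a prefix of Population_2, so it can go
ALTER TABLE world.City DROP INDEX Population;</pre>
</blockquote>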
<p><em>See if any two indexes are identical:</em></p>
<blockquote>
<pre><strong>SELECT </strong>* <strong>FROM </strong>(
  <strong>SELECT </strong>TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,
    <strong>GROUP_CONCAT</strong>(COLUMN_NAME <strong>ORDER BY</strong> SEQ_IN_INDEX) <strong>AS </strong>columns, NON_UNIQUE
  <strong>FROM </strong>`information_schema`.`STATISTICS`
  <strong>WHERE </strong>TABLE_SCHEMA <strong>NOT IN </strong>('mysql', 'INFORMATION_SCHEMA')
  <strong>AND</strong> INDEX_TYPE='BTREE'
  <strong>GROUP BY</strong> TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) <strong>AS </strong>i1 <strong>INNER JOIN</strong> (
  <strong>SELECT </strong>TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,
    <strong>GROUP_CONCAT</strong>(COLUMN_NAME <strong>ORDER BY</strong> SEQ_IN_INDEX) <strong>AS </strong>columns, NON_UNIQUE
  <strong>FROM </strong>`information_schema`.`STATISTICS`
  <strong>WHERE </strong>INDEX_TYPE='BTREE'
  <strong>GROUP BY</strong> TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) <strong>AS </strong>i2
<strong>USING </strong>(TABLE_SCHEMA, TABLE_NAME)
<strong>WHERE </strong>i1.columns = i2.columns <strong>AND </strong>i1.NON_UNIQUE = i2.NON_UNIQUE
  <strong>AND </strong>i1.INDEX_NAME &lt; i2.INDEX_NAME</pre>
</blockquote>
<p>The above checks unique and non-unique indexes alike, looking for indexes with identical column lists (and in the same order, of course). Any two indexes having the same list of columns imply a redundancy. If both are unique or both are non-unique, either can be removed. If one is unique and the other is not, the non-unique index should be removed; note, however, that because of the <code>i1.NON_UNIQUE = i2.NON_UNIQUE</code> condition, the query as written only reports pairs of the same kind.</p>
<p>Result example:</p>
<blockquote>
<pre>+--------------+------------+------------+---------+------------+------------+---------+------------+
| TABLE_SCHEMA | TABLE_NAME | INDEX_NAME | columns | NON_UNIQUE | INDEX_NAME | columns | NON_UNIQUE |
+--------------+------------+------------+---------+------------+------------+---------+------------+
| world        | City       | PRIMARY    | ID      |          0 | ID         | ID      |          0 |
+--------------+------------+------------+---------+------------+------------+---------+------------+</pre>
</blockquote>
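<p>The query above joins only indexes of the same uniqueness. A possible variant for the mixed case, where a non-unique index has exactly the same columns as a unique one, is sketched below. The output aliases <strong>redundant_index</strong> and <strong>unique_index</strong> are illustrative names, and the same BTREE-only limitation applies:</p>
<blockquote>
<pre>-- i1: non-unique indexes; i2: unique indexes on the same table
SELECT i1.TABLE_SCHEMA, i1.TABLE_NAME,
  i1.INDEX_NAME AS redundant_index,
  i2.INDEX_NAME AS unique_index,
  i1.columns
FROM (
  SELECT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,
    GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS columns
  FROM `information_schema`.`STATISTICS`
  WHERE TABLE_SCHEMA NOT IN ('mysql', 'INFORMATION_SCHEMA')
    AND NON_UNIQUE = 1 AND INDEX_TYPE='BTREE'
  GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) AS i1 INNER JOIN (
  SELECT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,
    GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS columns
  FROM `information_schema`.`STATISTICS`
  WHERE NON_UNIQUE = 0 AND INDEX_TYPE='BTREE'
  GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) AS i2
USING (TABLE_SCHEMA, TABLE_NAME)
WHERE i1.columns = i2.columns</pre>
</blockquote>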
<p>You may also wish to take a look at the excellent mk-duplicate-key-checker, a <a title="maatkit" href="http://www.maatkit.org/">maatkit</a> utility by <a title="Xaprb" href="http://www.xaprb.com/blog/">Baron Schwartz</a>.</p>
<h4>Character sets</h4>
<p><em>Show the character sets for all tables:</em></p>
<blockquote>
<pre><strong>SELECT </strong>TABLE_SCHEMA, TABLE_NAME, CHARACTER_SET_NAME, TABLE_COLLATION
<strong>FROM </strong>INFORMATION_SCHEMA.TABLES
<strong>INNER JOIN</strong> INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
  <strong>ON </strong>(TABLES.TABLE_COLLATION = COLLATION_CHARACTER_SET_APPLICABILITY.COLLATION_NAME)
<strong>WHERE </strong>TABLE_SCHEMA <strong>NOT IN</strong> ('mysql', 'INFORMATION_SCHEMA')</pre>
</blockquote>
<p>Surprisingly, the TABLES table does not include the character set for the table, only the collation, so we must join with COLLATION_CHARACTER_SET_APPLICABILITY to get the character set for that collation. Yes, it&#8217;s more normalized this way, but INFORMATION_SCHEMA is not too normalized anyway.</p>
<p><em>See all the textual columns, along with their character sets:</em></p>
<blockquote>
<pre><strong>SELECT </strong>TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
<strong>FROM </strong>INFORMATION_SCHEMA.COLUMNS
<strong>WHERE </strong>TABLE_SCHEMA <strong>NOT IN</strong> ('mysql', 'INFORMATION_SCHEMA')
  <strong>AND </strong>CHARACTER_SET_NAME <strong>IS NOT NULL</strong>
<strong>ORDER BY</strong> TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME</pre>
</blockquote>
<p><em>See those columns for which the character set or collation is different from the table&#8217;s character set and collation:</em></p>
<blockquote>
<pre><strong>SELECT </strong>columns.TABLE_SCHEMA, columns.TABLE_NAME, COLUMN_NAME,
  CHARACTER_SET_NAME <strong>AS </strong>column_CHARSET,
  COLLATION_NAME <strong>AS </strong>column_COLLATION,
  table_CHARSET, TABLE_COLLATION
<strong>FROM </strong>(
  <strong>SELECT </strong>TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
  <strong>FROM </strong>INFORMATION_SCHEMA.COLUMNS
  <strong>WHERE </strong>TABLE_SCHEMA <strong>NOT IN</strong> ('mysql', 'INFORMATION_SCHEMA')
    <strong>AND </strong>CHARACTER_SET_NAME <strong>IS NOT NULL</strong>
) <strong>AS </strong>columns <strong>INNER JOIN </strong>(
  <strong>SELECT </strong>TABLE_SCHEMA, TABLE_NAME, CHARACTER_SET_NAME <strong>AS </strong>table_CHARSET, TABLE_COLLATION
  <strong>FROM </strong>INFORMATION_SCHEMA.TABLES
  <strong>INNER JOIN</strong> INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
    <strong>ON </strong>(TABLES.TABLE_COLLATION = COLLATION_CHARACTER_SET_APPLICABILITY.COLLATION_NAME)
) <strong>AS </strong>tables
<strong>ON </strong>(columns.TABLE_SCHEMA = tables.TABLE_SCHEMA <strong>AND </strong>columns.TABLE_NAME = tables.TABLE_NAME)
<strong>WHERE </strong>(columns.CHARACTER_SET_NAME != table_CHARSET <strong>OR </strong>columns.COLLATION_NAME != TABLE_COLLATION)
<strong>ORDER BY</strong> TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME</pre>
</blockquote>
<p>Result example:</p>
<blockquote>
<pre>+--------------+------------+-------------+----------------+------------------+---------------+-------------------+
| TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | column_CHARSET | column_COLLATION | table_CHARSET | TABLE_COLLATION   |
+--------------+------------+-------------+----------------+------------------+---------------+-------------------+
| world        | City       | Name        | utf8           | utf8_general_ci  | latin1        | latin1_swedish_ci |
+--------------+------------+-------------+----------------+------------------+---------------+-------------------+</pre>
</blockquote>
<h4>Processes (MySQL 5.1)</h4>
<p>With MySQL 5.1 comes a boost to INFORMATION_SCHEMA. Among the new tables we can find the PROCESSLIST table, as well as GLOBAL_VARIABLES and GLOBAL_STATUS. Together with the new Event Scheduler, it seems the sky is the limit.</p>
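<p><em>As a small illustration of the new tables, a server variable can be read with a plain query. This is a sketch that assumes the MySQL 5.1 layout of GLOBAL_VARIABLES (columns VARIABLE_NAME and VARIABLE_VALUE); MAX_CONNECTIONS is just an example variable:</em></p>
<blockquote>
<pre>-- Assumes MySQL 5.1, where GLOBAL_VARIABLES exposes VARIABLE_NAME and VARIABLE_VALUE
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM information_schema.GLOBAL_VARIABLES
WHERE VARIABLE_NAME = 'MAX_CONNECTIONS'</pre>
</blockquote>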
<p><em>See which processes are active:</em></p>
<blockquote>
<pre>SELECT * FROM information_schema.PROCESSLIST WHERE COMMAND != 'Sleep'</pre>
</blockquote>
<p><em>Show slow queries (here, anything that has been running for more than 4 seconds):</em></p>
<blockquote>
<pre>SELECT * FROM information_schema.PROCESSLIST WHERE COMMAND != 'Sleep' AND TIME &gt; 4</pre>
</blockquote>
<p><em>How many processes per user?</em></p>
<blockquote>
<pre>SELECT USER, COUNT(*) FROM information_schema.PROCESSLIST GROUP BY USER</pre>
</blockquote>
<p><em>How many processes per host?</em></p>
<blockquote>
<pre>-- SUBSTRING_INDEX keeps the whole HOST value when it has no ':port' suffix (e.g. 'localhost')
SELECT SUBSTRING_INDEX(HOST, ':', 1) AS hostname, COUNT(*)
FROM information_schema.PROCESSLIST GROUP BY hostname</pre>
</blockquote>
<p>Combined with the Event Scheduler, a stored procedure may KILL processes that have been executing for more than 10 minutes, KILL the connections of users who hold too many of them, log connection activity, and more.</p>
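<p><em>A minimal sketch of the first idea: generate, without executing, a KILL statement for every process that has been running for more than 10 minutes (600 seconds). The alias kill_statement is illustrative, and on a real server you would likely also want to exclude replication and other system threads:</em></p>
<blockquote>
<pre>-- Generates, but does not run, one KILL statement per long-running process.
-- kill_statement is an illustrative alias; CONNECTION_ID() excludes our own connection.
SELECT CONCAT('KILL ', ID, ';') AS kill_statement
FROM information_schema.PROCESSLIST
WHERE COMMAND != 'Sleep' AND TIME &gt; 600 AND ID != CONNECTION_ID()</pre>
</blockquote>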
<h4>Conclusion</h4>
<p>I have presented what I think is a set of useful queries. When I approach a new database, I use these to get an overall understanding of what&#8217;s in it. Finding duplicate indexes can explain a lot about how the designers or developers <em>think</em> the database should behave. Looking at the non-default character sets shows whether textual columns have been carefully designed. For example, querying for columns whose character set differs from their table&#8217;s default and getting no results may imply that many textual columns have improper character sets.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/useful-database-analysis-queries-with-information_schema/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
