SQL: good comments conventions

I happened upon a customer who left me in awe and admiration. The reason: excellent comments for their SQL code.

I list four major places where SQL comments are helpful. I’ll use the sakila database; it is only sparsely commented in its original form, so I’ll present it here enhanced with comments, to illustrate.

Table definitions

The CREATE TABLE statement allows for a comment, intended to describe the nature of the table:

CREATE TABLE `film_text` (
 `film_id` smallint(6) NOT NULL,
 `title` varchar(255) NOT NULL,
 `description` text,
 PRIMARY KEY (`film_id`),
 FULLTEXT KEY `idx_title_description` (`title`,`description`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='Reflection of `film`, used for FULLTEXT search.'

It’s too bad the comment’s maximum length is only 60 characters. Still, it’s a very powerful field.

Column definitions

One may comment particular columns:
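The post’s sakila examples follow at the link; as a minimal sketch of the syntax, here is the same film_text table with column-level COMMENTs added (the comment texts are my own, not part of the stock schema):

CREATE TABLE `film_text` (
 `film_id` smallint(6) NOT NULL COMMENT 'Matches `film`.`film_id`',
 `title` varchar(255) NOT NULL COMMENT 'Copied from `film`, for FULLTEXT search',
 `description` text COMMENT 'Copied from `film`, for FULLTEXT search',
 PRIMARY KEY (`film_id`),
 FULLTEXT KEY `idx_title_description` (`title`,`description`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='Reflection of `film`, used for FULLTEXT search.'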

SQL: forcing single row tables integrity

Single row tables are used in various cases: for “preferences” or “settings”, for managing counters (e.g. summary tables), for general-purpose administration tasks (e.g. a heartbeat table), etc.

The problem with single row tables is that, well, they must have a single row. And the question is: how can you force them to have just one row?

The half-baked solution

The common solution is to create a PRIMARY KEY and always use the same value for that key. In addition, using REPLACE or INSERT … ON DUPLICATE KEY UPDATE helps out in updating the row. For example:

CREATE TABLE heartbeat (
 id int NOT NULL PRIMARY KEY,
 ts datetime NOT NULL
 );

The above table definition is taken from mk-heartbeat. It should be noted that mk-heartbeat in itself does not require that the table has a single row, so it is not the target of this post. I’m taking the above table definition as a very simple example.

So, we assume we want this table to have a single row, for whatever reasons we have. We would usually do:

REPLACE INTO heartbeat (id, ts) VALUES (1, NOW());

or

INSERT INTO heartbeat (id, ts) VALUES (1, NOW()) ON DUPLICATE KEY UPDATE ts = NOW();

Why is the above a “half-baked solution”? Because it is up to the application to make sure it reuses the same PRIMARY KEY value. There is nothing in the database to prevent the following:
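(The post’s own demonstration follows at the link; the statement below is my sketch of what it implies.)

INSERT INTO heartbeat (id, ts) VALUES (2, NOW());

The statement succeeds, the table now holds two rows, and any code that assumed a single row is in trouble.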

Personal observation: more migrations from MyISAM to InnoDB

I’m witnessing an increase in the planning, confidence & execution of MyISAM to InnoDB migrations.

How much can a single consultant observe? I agree that Oracle should not base a press release on my experience alone. But I find that:

  • More companies are now familiar with InnoDB than before.
  • More companies are interested in migrating to InnoDB than before.
  • More companies feel such a migration is safe.
  • More companies start out with an InnoDB based solution rather than a MyISAM based solution.

This is the way I see it. No doubt, the Oracle/Sun deal made its impact. The fact that InnoDB is no longer third-party; the fact that Oracle invests in InnoDB and in no other engine (Falcon is gone, and there is no real development on MyISAM); the fact that InnoDB is to be the default engine: all these put companies at ease with migration.


Verifying GROUP_CONCAT limit without using variables

I have a case where I must know whether group_concat_max_len is at its default value (1024), in which case there are some operations I cannot work out. I’ve ranted about this here.

Normally, I would simply:

SELECT @@group_concat_max_len

However, I am using views, where session variables are not allowed. Using a stored function can do the trick, but I wanted to avoid stored routines. So here’s a very simple test case: is the current group_concat_max_len long enough or not? I’ll present the long version and the short version.

The long version

SELECT
  CHAR_LENGTH(
    GROUP_CONCAT(
      COLLATION_NAME SEPARATOR ''
    )
  )
FROM
  INFORMATION_SCHEMA.COLLATIONS;

If the result is 1024, we are in bad shape. I happen to know that the total length of collation names is above 1800 characters, and so the result has been trimmed down. Another variant of the above query would be:
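(The post’s variant follows at the link; here is my own sketch of one possibility, assuming all we need is a yes/no verdict against the 1024 default.)

SELECT
  CHAR_LENGTH(
    GROUP_CONCAT(
      COLLATION_NAME SEPARATOR ''
    )
  ) = 1024 AS result_is_trimmed
FROM
  INFORMATION_SCHEMA.COLLATIONS;

Since the full concatenation is known to exceed 1024 characters, a result of exactly 1024 means GROUP_CONCAT has silently truncated it.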

Those oversized, undersized variable defaults

Some mysqld parameters are far from having reasonable defaults. Most notable are the engine-specific values, and in particular the InnoDB parameters.

Some of these variables have different defaults as of MySQL 5.4. innodb_buffer_pool_size, for example, is 128M on 5.4. innodb_log_file_size, however, has changed back and forth, as far as I understand, and is down to 5M again. These settings are still the same on 5.5.

I wish to present some not-so-obvious parameters which, in my opinion, have poor defaults, for reasons I will explain.

  • group_concat_max_len: This parameter limits the maximum text length of a GROUP_CONCAT concatenation result. It defaults to 1024. I think this is a very low value. I have been using GROUP_CONCAT more and more, recently, to solve otherwise difficult problems. And in most cases, 1024 was just too low, resulting in silent (argh!) truncation of the result, thus returning incorrect results. It is interesting to learn that the maximum value for this parameter is bounded by max_allowed_packet. I would suggest, then, that this parameter be removed altogether, leaving the max_allowed_packet bound as the only limitation. Otherwise, I’d like it to have a very large default value, on the order of a few MB.
  • wait_timeout: Here’s a parameter whose default value is over-permissive. wait_timeout enjoys an 8-hour default. I usually go for 5-10 minutes. I don’t see a point in letting idle connections waste resources for 8 hours. Applications which hold such connections open should learn that they’re doing something wrong, in the form of a forced disconnection. Connection pools work beautifully with low settings, and can themselves issue keepalives, if they choose to.
  • sql_mode: I’ve discussed this at length before. My opinion is unchanged.
  • open_files_limit: Considering that connections, threads, table descriptors, table file descriptors (depending on how you use InnoDB), and temporary tables all consume file descriptors on unix-like systems, and considering that raising this limit is an inexpensive payment, I think open_files_limit should default to a few thousand. Why risk a crash on “too many open files”? (A my.cnf sketch follows this list.)
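To make these suggestions concrete, here is a minimal my.cnf sketch along the lines argued above; the values are illustrative placeholders, not universal recommendations:

[mysqld]
# a few MB rather than the 1024-byte default
group_concat_max_len = 4194304
# minutes, not 8 hours, for idle connections
wait_timeout = 600
# a few thousand descriptors, to avoid "too many open files"
open_files_limit = 8192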


mycheckpoint (rev. 132): custom monitoring, custom charts, process list dump

Revision 132 of mycheckpoint has been released. New and updated in this revision:

  • Custom monitoring: monitoring & charting for user defined queries
  • HTML reports for custom monitoring
  • Process list dump upon alert notifications

Custom monitoring & charts

Custom monitoring allows the user to supply a query, the results of which will be monitored.

That is, mycheckpoint monitors status variables, replication status and OS metrics. But it cannot by itself monitor one’s application. This is why a user may supply a query such as:

SELECT COUNT(*) FROM shopping_cart WHERE is_pending=1

Such a query will tell an online store how many customers are in the midst of shopping. There is no argument that this number is worth monitoring. Given the above query, mycheckpoint will execute it per sample, and store the query’s result along with all sampled data, then aggregated by complex views so as to answer the following (a sketch appears after the list):

  • What was the value per given sample?
  • What is the value difference for each sample?
  • What is the change per second, i.e. the rate?
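As a plain-SQL sketch of how these three questions can be answered from consecutive samples (the custom_samples table and its columns are hypothetical stand-ins, not mycheckpoint’s actual schema):

SELECT
  s1.ts,
  s1.pending_carts AS sample_value,
  s1.pending_carts - s0.pending_carts AS diff,
  (s1.pending_carts - s0.pending_carts)
    / TIMESTAMPDIFF(SECOND, s0.ts, s1.ts) AS change_per_second
FROM
  custom_samples AS s0
  JOIN custom_samples AS s1 ON s1.sample_id = s0.sample_id + 1;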

mycheckpoint goes one step further, and explicitly records another metric:

  • How much time did it take to take that sample?


Choosing MySQL boolean data types

How do you implement True/False columns?

There are many ways to do it, each with its own pros and cons.

ENUM

Create your column as ENUM('F', 'T'), or ENUM('N', 'Y') or ENUM('0', '1').

This is the method used in the mysql system tables (e.g. the mysql.user privileges table). It’s very simple and intuitive. It truly restricts the values to just two options, which serves us well. It’s compact (just one byte).

A couple of disadvantages to this method:

  1. Enums are represented by numerical values (which is good), but start with 1 instead of 0. This means 'F' is 1, and 'T' is 2, and they both translate to True when directly used in a booleanic expression (e.g. IF(val, 'True', 'False') always yields 'True'; see the sketch after this list).
  2. There’s no real convention. Is it 'Y'/'N'? 'T'/'F'? 'P'/'N'? '1'/'0'?
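A quick sketch demonstrating the first pitfall (using a throwaway table):

CREATE TABLE enum_bool_test (val ENUM('F', 'T') NOT NULL);
INSERT INTO enum_bool_test VALUES ('F'), ('T');
-- Both rows yield 'True': 'F' is internally 1, 'T' is internally 2
SELECT val, val+0, IF(val, 'True', 'False') FROM enum_bool_test;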

CHAR(1)

Simple again. Proposed values are, as before, 'F', 'T' etc. This time there’s no way to limit the range of values: you cannot (in MySQL, unless using triggers) prevent an 'X'.

Watch out for the charset! If it’s utf8 you pay with 3 bytes instead of just 1. And this time 'T', 'F', 'Y', 'N' values all evaluate as False in a booleanic expression, since a string with no leading digits converts to 0. It is possible to use the digit characters '0' and '1', which do evaluate correctly, but that defeats the purpose of using CHAR.
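A one-line sketch of that boolean-context behavior:

-- Returns 'no': the string 'T' has no leading digits, so it converts to 0
SELECT IF('T', 'yes', 'no');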

Views: better performance with condition pushdown

Justin’s post A workaround for the performance problems of TEMPTABLE views on mysqlperformanceblog.com reminded me of a solution I once saw at a customer’s site.

The customer was using a nested views structure, up to a depth of some 8-9 views. There were a lot of aggregations along the way, and even the simplest query resulted in a LOT of subqueries, temporary tables, and vast amounts of data, even if it ultimately returned only a couple of rows.

While we worked to solve this, a developer showed me his own trick. His trick is now impossible to implement, but there’s a hack around this.

Let’s use the world database to illustrate. Look at the following view definition:
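(The post’s actual definition follows at the link; as a stand-in, here is a plausible sketch of the kind of view in question, of my own making. Because it aggregates, MySQL processes it with the TEMPTABLE algorithm, which is exactly where condition pushdown is lost.)

CREATE OR REPLACE VIEW country_city_stats AS
  SELECT
    CountryCode,
    COUNT(*) AS num_cities,
    SUM(Population) AS total_population
  FROM City
  GROUP BY CountryCode;

A query like SELECT * FROM country_city_stats WHERE CountryCode = 'FRA' first materializes the aggregation for all countries into a temporary table, and only then filters it.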

A MyISAM backup is blocking as read-only, including mysqldump backup

Actually this is (almost) all I wanted to say. This post is intentionally titled with all related keywords, in the hope that a related Google search will turn up this post on the first page.

I’m just still encountering companies who use MyISAM as their storage engine and are unaware that their nightly backup actually blocks their application, basically rendering their product unavailable for long minutes to hours on a nightly basis.

So this is posted as a warning for those who were not aware of this fact.

There is no hot (non-blocking) backup for MyISAM. The closest would be a file system snapshot, but even this requires flushing the tables, which may take a while to complete. If you must have a hot backup, then either use replication (and take the risk of the slave not being in complete sync with the master) or use another storage engine, e.g. InnoDB.
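To make the mysqldump case concrete, here is a sketch of the two usual modes (paths and option choices are illustrative):

# Typical nightly dump: with MyISAM tables, the global lock makes
# this a blocking, read-only backup for its entire duration.
mysqldump --all-databases --lock-all-tables > /backups/nightly.sql

# Non-blocking for transactional engines such as InnoDB; MyISAM
# tables are still locked, so this does not help them.
mysqldump --all-databases --single-transaction > /backups/nightly.sql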