Verifying GROUP_CONCAT limit without using variables

I have a case where I must know whether group_concat_max_len is at its default value (1024), in which case there are some operations I cannot work out. I’ve ranted on this here.

Normally, I would simply:

SELECT @@group_concat_max_len

However, I am using views, where session variables are not allowed. Using a stored function can do the trick, but I wanted to avoid stored routines. So here’s a very simple test case: is the current group_concat_max_len long enough or not? I’ll present the long version and the short version.

The long version

SELECT
  CHAR_LENGTH(
    GROUP_CONCAT(
      COLLATION_NAME SEPARATOR ''
    )
  )
FROM
  INFORMATION_SCHEMA.COLLATIONS;

If the result is 1024, we are in bad shape. I happen to know that the total length of collation names is above 1800, so a result of exactly 1024 means it was trimmed. Another variant of the above query would be: Continue reading » “Verifying GROUP_CONCAT limit without using variables”
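
Since the motivation here is views, the long version can be wrapped in one. The following is only a sketch, and it is not the variant the full post goes on to show; the view name group_concat_limit_check and the column name above_default are hypothetical.

```sql
-- Hypothetical view: reports 1 when the concatenated collation names
-- come out longer than 1024 characters, i.e. group_concat_max_len
-- has been raised above its default; reports 0 otherwise.
CREATE OR REPLACE VIEW group_concat_limit_check AS
SELECT
  CHAR_LENGTH(
    GROUP_CONCAT(
      COLLATION_NAME SEPARATOR ''
    )
  ) > 1024 AS above_default
FROM
  INFORMATION_SCHEMA.COLLATIONS;

-- No session variable is referenced, so the view definition is allowed.
SELECT above_default FROM group_concat_limit_check;
```

This relies on the same assumption as the query above: that the collation names on the server add up to more than 1024 characters.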

Choosing MySQL boolean data types

How do you implement True/False columns?

There are many ways to do it, each with its own pros and cons.

ENUM

Create your column as ENUM(‘F’, ‘T’), or ENUM(‘N’, ‘Y’) or ENUM(‘0’, ‘1’).

This is the method used in the mysql tables (e.g. mysql.user privileges table). It’s very simple and intuitive. It truly restricts the values to just two options, which serves well. It’s compact (just one byte).

A couple of disadvantages to this method:

  1. Enums are represented by numerical values (which is good), but the numbering starts with 1 instead of 0. This means ‘F’ is 1 and ‘T’ is 2, and both translate to True when used directly in a boolean expression (e.g. IF(val, ‘True’, ‘False’) always yields ‘True’).
  2. There’s no real convention. Is it ‘Y’/‘N’? ‘T’/‘F’? ‘P’/‘N’? ‘1’/‘0’?
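
The first disadvantage is easy to see for yourself. Here is a minimal sketch; the table enum_bool_demo and its column flag are made up for the illustration.

```sql
-- Hypothetical demo table with an ENUM used as a boolean.
CREATE TABLE enum_bool_demo (
  flag ENUM('F', 'T') NOT NULL
);

INSERT INTO enum_bool_demo (flag) VALUES ('F'), ('T');

SELECT
  flag,
  flag + 0                   AS enum_index,       -- internal numeric index of the value
  IF(flag, 'True', 'False')  AS used_as_boolean,  -- evaluates the index, not the label
  IF(flag = 'T', 'True', 'False') AS compared     -- explicit comparison against the label
FROM
  enum_bool_demo;

DROP TABLE enum_bool_demo;
```

Comparing explicitly against the label, as in the last column, avoids relying on the enum index.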

CHAR(1)

Simple again. Proposed values are, as before, ‘F’, ‘T’ etc. This time there’s no way to limit the range of values. You cannot (in MySQL, unless using triggers) prevent an ‘X’.

Watch out for the charset! If it’s utf8 you pay with 3 bytes instead of just 1. And, again, ‘T’, ‘F’, ‘Y’, ‘N’ values all evaluate as True. It is possible to use the zero-valued character, but it defeats the purpose of using CHAR. Continue reading » “Choosing MySQL boolean data types”

Views: better performance with condition pushdown

Justin’s A workaround for the performance problems of TEMPTABLE views post on mysqlperformanceblog.com reminded me of a solution I once saw on a customer’s site.

The customer was using a nested view structure, up to a depth of some 8-9 views. There were a lot of aggregations along the way, and even the simplest query resulted in a LOT of subqueries, temporary tables, and vast amounts of data, even if only to return a couple of rows.

While we worked to solve this, a developer showed me his own trick. His trick is now impossible to implement, but there’s a hack around this.

Let’s use the world database to illustrate. Look at the following view definition: Continue reading » “Views: better performance with condition pushdown”

Discovery of the day: GROUP BY … DESC

I happened on a query where, by mistake, a

SELECT ... ORDER BY x DESC LIMIT 1

was written as

SELECT ... GROUP BY x DESC LIMIT 1

And it took me by surprise to realize GROUP BY x DESC is a valid statement. I looked it up: yep! It’s documented.

In MySQL, GROUP BY results are sorted according to the GROUP BY clause. You can override this by adding ORDER BY NULL (see past post). I wasn’t aware you can actually control the sort order.
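
A short sketch of what this looks like, using the City table from the world sample database (the choice of table and columns is just for illustration). As far as I know this syntax only exists in older MySQL versions and was later removed in 8.0, where an explicit ORDER BY is needed instead.

```sql
-- Groups are returned in descending CountryCode order,
-- so LIMIT 1 picks the last country code alphabetically.
SELECT CountryCode, COUNT(*) AS cities
FROM City
GROUP BY CountryCode DESC
LIMIT 1;

-- Suppress the implicit sort when the order does not matter.
SELECT CountryCode, COUNT(*) AS cities
FROM City
GROUP BY CountryCode
ORDER BY NULL;

-- The portable spelling of the first query.
SELECT CountryCode, COUNT(*) AS cities
FROM City
GROUP BY CountryCode
ORDER BY CountryCode DESC
LIMIT 1;
```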

Proper SQL table alias use conventions

After seeing quite a few SQL statements over the years, something is bugging me: there is no consistent convention as to how to write an SQL query.

I’m going to leave formatting, upper/lower-case issues aside, and discuss a small part of the SQL syntax: table aliases. Looking at three different queries, I will describe what I find to be problematic table alias use.

Using the sakila database, take a look at the following queries: Continue reading » “Proper SQL table alias use conventions”

Tip: faster than TRUNCATE

TRUNCATE is usually a fast operation (much faster than DELETE FROM). But sometimes it just hangs; I’ve had several such unhappy events with InnoDB (Plugin) tables which were extensively written to. The TRUNCATE hung; nothing else would work; minutes passed.

TRUNCATE on tables with no FOREIGN KEYs should act fast: it translates to dropping the table and creating a new one (and it all depends on the MySQL version, see the manual).

What’s faster than TRUNCATE, then? If you have neither triggers nor FOREIGN KEYs, a RENAME TABLE can come to the rescue. Instead of:

TRUNCATE log_table

Do:

CREATE TABLE log_table_new LIKE log_table;
RENAME TABLE log_table TO log_table_old, log_table_new TO log_table;
DROP TABLE log_table_old;

I found this to work well for me. Do note that AUTO_INCREMENT values can be tricky here: the “new” table is created with an AUTO_INCREMENT value which is immediately taken in the “working” table. If you care about not reusing the same AUTO_INCREMENT values, you can: Continue reading » “Tip: faster than TRUNCATE”

Monotonic functions, SQL and MySQL

In mathematics, a monotonic function (or monotone function) is a function which preserves the given order. [Wikipedia]

To be more precise, a function f is monotonic increasing if for every x ≤ y it holds that f(x) ≤ f(y). f is said to be strictly monotonic increasing if for every x < y it holds that f(x) < f(y).

So, if we follow values in some order, we say that f is monotonic increasing if f’s value never decreases (it either increases or stays the same), and we say that f is strictly increasing if f’s value always changes “upwards”.

Monotonic functions play an important role in SQL. To discuss monotonic functions in SQL we must first determine what the order is, and then, what the function is.

Well, they both change according to our point of view. Let’s look at some examples. Take a look at the following table: Continue reading » “Monotonic functions, SQL and MySQL”

Beware of implicit casting

Every so often a query produces a “bad” execution plan. Adding a missing index can often solve the problem. However, not everything can be solved with an index. I wish to highlight the case of an implicit cast, which negates the use of an index in MySQL.

I see this happening a lot on customers’ databases, and this calls for a short introduction.
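
To make the problem concrete before going on, here is a minimal sketch. The table customer_demo and its columns are hypothetical and are not the table used later in the post.

```sql
-- Hypothetical table: a textual column with an index on it.
CREATE TABLE customer_demo (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  phone VARCHAR(20)  NOT NULL,
  KEY idx_phone (phone)
);

-- Numeric literal against a VARCHAR column: every stored value must be
-- cast to a number before comparing, so idx_phone cannot be used for lookup.
EXPLAIN SELECT * FROM customer_demo WHERE phone = 5551234;

-- String literal: no cast on the column, so idx_phone is a candidate.
EXPLAIN SELECT * FROM customer_demo WHERE phone = '5551234';

DROP TABLE customer_demo;
```

Comparing the two EXPLAIN outputs on a populated table should show the difference in the key column.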

MySQL doesn’t support index functions

Let’s assume the following table: Continue reading » “Beware of implicit casting”

Checking for string permutation

A permutation is a change of places. Thus, ‘lolhe’ is a permuted ‘hello’ (commonly referred to as ‘scrambled text’).

I wish to present an SQL solution for checking if two strings are permutations of the same text.

About permutations

So, if ‘lolhe’ is a permutation of ‘hello’, then ‘hello’ is a permutation of ‘lolhe’, as well; and both are permutations of ‘elloh’. The REVERSE() of a text is an example of permutation. Mathematically, string permutation is an equivalence relation, and divides all strings into equivalence classes.

Use cases

  • We may be interested in permutations when a user chooses a password. We may disallow a password which is identical to the login name; but we may also disallow upper-lower-case-only transformations of the text. We may go further and disallow any permutation of the text.
  • On a slightly different scale, the two queries: SELECT * FROM City WHERE id IN (5, 21, 13) and SELECT * FROM City WHERE id IN (13, 5, 21) are identical. Here, the permutation is not with string characters, but with string tokens. While the solution discussed is targeted at string characters, it can be easily converted to work with string tokens.

Checking for permutation

The solution I’m suggesting checks for permutation between 2 strings by permuting both to a third, normal form. The two strings are permutations of each other if both have the same normal form.
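
One possible normal form is the string’s characters in sorted order. The sketch below assumes that choice, which may differ from the one used in the full post; the helper table numbers_demo is hypothetical and only needs to hold the integers from 1 up to the longest string length.

```sql
-- Hypothetical helper table of consecutive integers.
CREATE TABLE numbers_demo (
  n INT UNSIGNED NOT NULL PRIMARY KEY
);
INSERT INTO numbers_demo (n) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);

-- Split each string into single characters (one row per position),
-- then glue the characters back together in sorted order.
SELECT
  words.s,
  GROUP_CONCAT(
    SUBSTRING(words.s, numbers_demo.n, 1)
    ORDER BY SUBSTRING(words.s, numbers_demo.n, 1)
    SEPARATOR ''
  ) AS normal_form
FROM
  (SELECT 'lolhe' AS s UNION ALL SELECT 'hello') AS words
  JOIN numbers_demo ON numbers_demo.n <= CHAR_LENGTH(words.s)
GROUP BY
  words.s;

DROP TABLE numbers_demo;
```

Both rows come out with the normal form ‘ehllo’, so the two strings are permutations of each other. Note that the sort follows the connection’s collation, so case sensitivity depends on it.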

Continue reading » “Checking for string permutation”

Misimproving performance problems with INSERT DELAYED

INSERT DELAYED may come in handy when using MyISAM tables. It may in particular be useful for log tables, where one is required to issue frequent INSERTs on one hand, but does not usually want or need to wait for DB response on the other hand.

It may even offer some performance boost, by aggregating such frequent INSERTs in a single thread.
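
For reference, the statement differs from a regular INSERT only by the DELAYED keyword. A minimal sketch, with a hypothetical MyISAM log table (access_log_demo and its columns are made up); as far as I know, newer MySQL versions accept the keyword but treat the statement as a plain INSERT.

```sql
-- Hypothetical MyISAM log table.
CREATE TABLE access_log_demo (
  logged_at DATETIME     NOT NULL,
  message   VARCHAR(255) NOT NULL
) ENGINE=MyISAM;

-- Returns to the client right away; the row is queued
-- and written later by a background handler thread.
INSERT DELAYED INTO access_log_demo (logged_at, message)
VALUES (NOW(), 'user logged in');

-- The regular form waits until the row is actually written.
INSERT INTO access_log_demo (logged_at, message)
VALUES (NOW(), 'user logged out');
```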

But it is NOT a performance solution.

That is, in a case I’ve seen, database performance was poor. INSERTs were taking a very long time. Lots of locks were involved. The solution offered was to change all slow INSERTs to INSERT DELAYED. Voila! All INSERT queries now completed in no time.

But the database performance remained poor. Just as poor as before, with an additional headache: nobody knew what caused the low performance.

Using INSERT DELAYED to improve overall INSERT performance is like sweeping the dust under the carpet. It’s still there, only you can’t actually see it. When your queries are slow to return, you know which queries or which parts of your application are the immediate suspects. When everything happens in the background you lose that feeling.

The slow query log, fortunately, still provides the necessary information, and all the other metrics are just as before. Good. But it now takes a deeper level of analysis to find a problem that was previously in plain sight.

So: use INSERT DELAYED carefully; don’t just throw it at your slow queries like a magic potion.