Proper SQL table alias use conventions

After seeing quite some SQL statements over the years, something is bugging me: there is no consistent convention as for how to write an SQL query.

I’m going to leave formatting, upper/lower-case issues aside, and discuss a small part of the SQL syntax: table aliases. Looking at three different queries, I will describe what I find to be problematic table alias use.

Using the sakila database, take a look at the following queries: Continue reading » “Proper SQL table alias use conventions”

7 ways to convince MySQL to use the right index

Sometimes MySQL gets it wrong. It doesn’t use the right index.

It happens that MySQL generates a query plan which is really bad (EXPLAIN says it’s going to explore some 10,000,000 rows), when another plan (soon to show how was generated) says: “Sure, I can do that with 100 rows using a key”.

A true story

A customer had issues with his database. Queries were taking 15 minutes to complete, and the db in general was not responsive. Looking at the slow query log, I found the criminal query. Allow me to bring you up to speed:

A table is defined like this:

CREATE TABLE t (
  id INT UNSIGNED AUTO_INCREMENT,
  type INT UNSIGNED,
  level TINYINT unsigned,
  ...
  PRIMARY KEY(id),
  KEY `type` (type)
) ENGINE=InnoDB;

The offending query was this:

SELECT id FROM data
WHERE type=12345 AND level > 3
ORDER BY id

The facts were:

  • `t` has about 10,000,000 rows.
  • The index on `type` is selective: about 100 rows per value on average.
  • The query took a long time to complete.
  • EXPLAIN has shown that MySQL uses the PRIMARY KEY, hence searches 10,000,000 rows, filtered “using where”.
  • The other EXPLAIN has shown that by using the `type` key, only 110 rows are expected, to be filtered “using where”, then sorted “using filesort”

So MySQL acknowledged it was generating the wrong plan. The other plan was better by its own standards.

Solving the problem

Let’s walk through 7 ways to solve the problem, starting with the more aggressive solutions, refining to achieve desired behavior through subtle changes. Continue reading » “7 ways to convince MySQL to use the right index”

REPLACE INTO: think twice

The REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.

Sounds very attractive, and has a nice syntax as well: the same syntax as a normal INSERT INTO’s. It certainly has a nicer syntax than INSERT INTO … ON DUPLICATE KEY UPDATE, and it’s certainly shorter than using a SELECT to see if a row exists, then doing either INSERT or UPDATE.

But weak hearted people as myself should be aware of the following: it is a heavyweight solution. It may be just what you were looking for in terms of ease of use, but the fact is that on duplicate keys, a DELETE and INSERT are performed, and this calls for a closer look. Continue reading » “REPLACE INTO: think twice”

MySQL’s character sets and collations demystified

MySQL’s character sets and collations are often considered as a mystery, and many users either completely disregard them and keep with the defaults, or set everything to UTF8.

This post will attempt to shed some light on the mystery, and provide with some best practices for use with text columns with regard to character sets. Continue reading » “MySQL’s character sets and collations demystified”

Dynamic sequencing with a single query

It is a known trick to use a session variables for dynamically counting/sequencing rows. The way to go is to SET a variable to zero, then use arithmetic within assignment to increment its value for each row in the SELECTed rows.

For example, the following query lists the top 10 populated countries, using MySQL’s world database:

SELECT Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+--------------------+------------+
| Code | Name               | Population |
+------+--------------------+------------+
| CHN  | China              | 1277558000 |
| IND  | India              | 1013662000 |
| USA  | United States      |  278357000 |
| IDN  | Indonesia          |  212107000 |
| BRA  | Brazil             |  170115000 |
| PAK  | Pakistan           |  156483000 |
| RUS  | Russian Federation |  146934000 |
| BGD  | Bangladesh         |  129155000 |
| JPN  | Japan              |  126714000 |
| NGA  | Nigeria            |  111506000 |
+------+--------------------+------------+

The results do not provide any sequence number. Nor does the table have an AUTO_INCREMENT or otherwise unique row number. If I were to rate the countries by population, the common trick is:

SET @rank := 0;
SELECT
  @rank := @rank+1 AS rank,
  Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|    1 | CHN  | China              | 1277558000 |
|    2 | IND  | India              | 1013662000 |
|    3 | USA  | United States      |  278357000 |
|    4 | IDN  | Indonesia          |  212107000 |
|    5 | BRA  | Brazil             |  170115000 |
|    6 | PAK  | Pakistan           |  156483000 |
|    7 | RUS  | Russian Federation |  146934000 |
|    8 | BGD  | Bangladesh         |  129155000 |
|    9 | JPN  | Japan              |  126714000 |
|   10 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+

The first query sets the @rank to zero, so that it is not NULL (since no arithmetic can be done with NULL).  The second query relies on its success.

Can the same be achieved with one query only? That’s more of a problem. Continue reading » “Dynamic sequencing with a single query”

Selecting a specific non aggregated column data in GROUP BY

In a GROUP BY query, MySQL may allow specifying non aggregated columns. For example, using MySQL’s world database, I wish to get the number of countries per continent, along with a “sample” country:

SELECT Continent, COUNT(*), Name
FROM `Country` GROUP BY Continent

+---------------+----------+----------------+
| Continent     | COUNT(*) | Name           |
+---------------+----------+----------------+
| Asia          |       51 | Afghanistan    |
| Europe        |       46 | Albania        |
| North America |       37 | Aruba          |
| Africa        |       58 | Angola         |
| Oceania       |       28 | American Samoa |
| Antarctica    |        5 | Antarctica     |
| South America |       14 | Argentina      |
+---------------+----------+----------------+

What if I want to choose that “sample” country? For example, for each continent, I wish to show the country with the largest population. To simply see the largest population, I would use MAX(Population). But which country is referred? I wish to provide a solution which does not involve sub-queries, HAVING or JOINs. Continue reading » “Selecting a specific non aggregated column data in GROUP BY”

Useful database analysis queries with INFORMATION_SCHEMA

A set of useful queries on INFORMATION_SCHEMA follows. These queries can be used when approaching a new database, to learn about some of its properties, or they can be regularly used on an existing schema, so as to verify its integrity.

I will present queries for:

Less known SQL syntax and functions in MySQL

“Standard SQL” is something you read about. All popular databases have modified version of SQL. Each database adds its own flavor and features to the standard. MySQL is no different.

Some deviations are storage engine dependent. Others are more general. Many, such as INSERT IGNORE, are commonly used. Here’s a list of some MySQL deviations to SQL, which are not so well known. Continue reading » “Less known SQL syntax and functions in MySQL”