Less known SQL syntax and functions in MySQL

“Standard SQL” is something you read about. All popular databases have modified version of SQL. Each database adds its own flavor and features to the standard. MySQL is no different.

Some deviations are storage engine dependent. Others are more general. Many, such as INSERT IGNORE, are commonly used. Here’s a list of some MySQL deviations to SQL, which are not so well known.

I’ll be using MySQL’s world database for demonstration.

GROUP_CONCAT

Assume the following query: SELECT CountryCode, COUNT(*) FROM City GROUP BY CountryCode, which selects the number of cities per country, using MySQL’s world database. It is possible to get a name for one “sample” city per country using standard SQL: SELECT CountryCode, Name, COUNT(*) FROM City GROUP BY CountryCode

But in MySQL it is also possible to get the list of cities per group: SELECT CountryCode, GROUP_CONCAT(Name), COUNT(*) FROM City GROUP BY CountryCode. This will provide with a comma delimited string of all city names per country.

ORDER BY NULL

If you ran the previous queries, you may have noticed that the results were ordered by CountryCode. MySQL’s default behavior when GROUP BY is used, is to order by the grouped column. But this means sorting is required, possibly using merge passes and temporary tables. MySQL accepts the following syntax:

SELECT CountryCode, COUNT(*) FROM City GROUP BY CountryCode ORDER BY NULL

If you EXPLAIN the query, you’ll see no “Using filesort”. When not using ORDER BY NULL, “Using filesort” appears.

ALTER TABLE … ORDER BY

MyISAM tables are not clustered. The table data is independent of indexes. Depending on concurrent_insert settings, new rows are either appended to the end of the table, or fill the space previously occupied by DELETEd rows.

When you SELECT (*) FROM Country, the order of rows is as stored on disk. It is possible to do a one-time reordering of rows in a MyISAM table by executing: ALTER TABLE Country ORDER BY Code. This is a lengthy operation (on large tables), which locks the table, so take care when using it. The change does not last for long, either: as you INSERT new rows, the rows get out of order again. But if your table does not get modified, or only gets modified rarely, this is a nice trick to use when order of rows is important, and you don’t want to pay the price of sorting per query.

ROW_COUNT()

Anyone who uses MySQL with a connector (say, Connector/J with JDBC), knows that INSERT, DELETE and UPDATE statements return with an integer value: the number of modified rows. In MySQL, the explicit way to get the number of modified rows is to invoke SELECT ROW_COUNT() right after your query. This method is useful if you like to know whether your DELETE did in fact remove rows, or INSERT IGNORE did in fact add a row, etc.

LIMIT

Well, MySQL DBAs are familiar with it. I just thought I’d mention LIMIT, since it’s a MySQL deviation. I was surprised to find that out, when an Oracle DBA once asked me how I did paging with results. “You mean like LIMIT 60,10?” I asked, and he replied: “LIMIT??”. So, you can LIMIT to limit the number of results, like: SELECT * FROM Country LIMIT 10, to only get first 10 rows, or to do paging like: SELECT * FROM Country LIMIT 60,10, which skips 60 rows, then reads 10.

SQL_CALC_FOUND_ROWS, FOUND_ROWS()

While at it, it may be required to use LIMIT to only return 10 rows, but still ask MySQL how many rows there really were. Do it like this:

SELECT SQL_CALC_FOUND_ROWS Code, Name FROM Country LIMIT 10;

SELECT FOUND_ROWS();

First query gives the required 10 results. Second query says “239”, which is the total rows I would get had I not used LIMIT. Note that a SELECT SQL_CALC_FOUND_ROWS is a “heavy” query, which actually searches through the entire rowset, and then only returns the LIMITed rows. Use with care.

PROCEDURE ANALYSE

A very nice diagnostic tool, which tells us what data types are proper based on existing data. If we have an INT column, but all values are smaller than 200, PROCEDURE_ANALYSE() recommends that we use a TINYINT. Usage: SELECT * FROM Country PROCEDURE ANALYSE(10,10). Just remember it does not anticipate data growth. It only relies on current data.

INSERT IGNORE

OK, I said above that it is commonly used, but couldn’t help myself, it’s just too useful to leave out. INSERT IGNORE INTO City (id, Name) VALUES (1000, 'Te Anau') will silently abort if there’s a UNIQUE KEY on `id` and an existing id=1000 value. A normal INSERT will terminate with an error, or raise an Exception in your application’s code. It is of particular use when doing an extended INSERT: INSERT IGNORE INTO City (id, Name) VALUES (1000, 'Te Anau'), (9009, 'Wanaka') may have trouble with the first row, but will insert the second row. ROW_COUNT() can tell me how well it went.

15 thoughts on “Less known SQL syntax and functions in MySQL”

shlomi says:

December 17, 2008 at 7:25 am

Hi Jason,

I’ve seen a very bad effect of REPLACE INTO in a production system. The overhead of DELETEing a row, then INSERTing a new one is huge, especially on InnoDB tables. Changing the code to INSERT INTO…ON DUPLICATE KEY reduced the load average on the db machine by factors.
Kannan says:

December 19, 2008 at 8:35 am

Good Post for MySQL.
Tim says:

January 19, 2010 at 9:24 pm

Your Oracle DBA was shocked that you’re using “LIMIT” for pagination because that syntax is a MySQL extension. The standard SQL way would be “LIMIT 10 OFFSET 60”.

MySQL also supports OFFSET, and it makes it clearer which number is which, so I’m not sure why you’d ever use “LIMIT a,b”.
shlomi says:

January 20, 2010 at 8:56 am

@Tim,

There’s some contradiction in your comment, I think. AFAIK, there is no LIMIT in ANSI SQL. Therefore, it is purely a MySQL extension. Thus, there is no “standard SQL way” for doing LIMIT.

Using one syntax over the other is a matter of preference and, in my case, and old habit. I agree using LIMIT with OFFSET is more verbose.

Regards
Pingback: Discovery of the day: GROUP BY DESC | code.openark.org

This site uses Akismet to reduce spam. Learn how your comment data is processed.