“Standard SQL” is something you read about. All popular databases have modified version of SQL. Each database adds its own flavor and features to the standard. MySQL is no different.
Some deviations are storage engine dependent. Others are more general. Many, such as INSERT IGNORE
, are commonly used. Here’s a list of some MySQL deviations to SQL, which are not so well known.
I’ll be using MySQL’s world database for demonstration.
GROUP_CONCAT
Assume the following query: SELECT CountryCode, COUNT(*) FROM City GROUP BY CountryCode
, which selects the number of cities per country, using MySQL’s world database. It is possible to get a name for one “sample” city per country using standard SQL: SELECT CountryCode, Name, COUNT(*) FROM City GROUP BY CountryCode
But in MySQL it is also possible to get the list of cities per group: SELECT CountryCode, GROUP_CONCAT(Name), COUNT(*) FROM City GROUP BY CountryCode
. This will provide with a comma delimited string of all city names per country.
ORDER BY NULL
If you ran the previous queries, you may have noticed that the results were ordered by CountryCode. MySQL’s default behavior when GROUP BY
is used, is to order by the grouped column. But this means sorting is required, possibly using merge passes and temporary tables. MySQL accepts the following syntax:
SELECT CountryCode, COUNT(*) FROM City GROUP BY CountryCode ORDER BY NULL
If you EXPLAIN
the query, you’ll see no “Using filesort”. When not using ORDER BY NULL
, “Using filesort” appears.
ALTER TABLE … ORDER BY
MyISAM tables are not clustered. The table data is independent of indexes. Depending on concurrent_insert
settings, new rows are either appended to the end of the table, or fill the space previously occupied by DELETEd rows.
When you SELECT (*) FROM Country
, the order of rows is as stored on disk. It is possible to do a one-time reordering of rows in a MyISAM table by executing: ALTER TABLE Country ORDER BY Code
. This is a lengthy operation (on large tables), which locks the table, so take care when using it. The change does not last for long, either: as you INSERT
new rows, the rows get out of order again. But if your table does not get modified, or only gets modified rarely, this is a nice trick to use when order of rows is important, and you don’t want to pay the price of sorting per query.
ROW_COUNT()
Anyone who uses MySQL with a connector (say, Connector/J with JDBC), knows that INSERT
, DELETE
and UPDATE
statements return with an integer value: the number of modified rows. In MySQL, the explicit way to get the number of modified rows is to invoke SELECT ROW_COUNT()
right after your query. This method is useful if you like to know whether your DELETE
did in fact remove rows, or INSERT IGNORE
did in fact add a row, etc.
LIMIT
Well, MySQL DBAs are familiar with it. I just thought I’d mention LIMIT
, since it’s a MySQL deviation. I was surprised to find that out, when an Oracle DBA once asked me how I did paging with results. “You mean like LIMIT 60,10
?” I asked, and he replied: “LIMIT??”. So, you can LIMIT
to limit the number of results, like: SELECT * FROM Country LIMIT 10
, to only get first 10 rows, or to do paging like: SELECT * FROM Country LIMIT 60,10
, which skips 60 rows, then reads 10.
SQL_CALC_FOUND_ROWS, FOUND_ROWS()
While at it, it may be required to use LIMIT to only return 10 rows, but still ask MySQL how many rows there really were. Do it like this:
SELECT SQL_CALC_FOUND_ROWS Code, Name FROM Country LIMIT 10;
SELECT FOUND_ROWS();
First query gives the required 10 results. Second query says “239”, which is the total rows I would get had I not used LIMIT
. Note that a SELECT SQL_CALC_FOUND_ROWS
is a “heavy” query, which actually searches through the entire rowset, and then only returns the LIMITed rows. Use with care.
PROCEDURE ANALYSE
A very nice diagnostic tool, which tells us what data types are proper based on existing data. If we have an INT
column, but all values are smaller than 200, PROCEDURE_ANALYSE()
recommends that we use a TINYINT
. Usage: SELECT * FROM Country PROCEDURE ANALYSE(10,10)
. Just remember it does not anticipate data growth. It only relies on current data.
INSERT IGNORE
OK, I said above that it is commonly used, but couldn’t help myself, it’s just too useful to leave out. INSERT IGNORE INTO City (id, Name) VALUES (1000, 'Te Anau')
will silently abort if there’s a UNIQUE KEY
on `id` and an existing id=1000 value. A normal INSERT
will terminate with an error, or raise an Exception in your application’s code. It is of particular use when doing an extended INSERT: INSERT IGNORE INTO City (id, Name) VALUES (1000, 'Te Anau'), (9009, 'Wanaka')
may have trouble with the first row, but will insert the second row. ROW_COUNT()
can tell me how well it went.
The ALTER TABLE … ORDER BY thing is similar to CLUSTER in postgres I think.
http://www.postgresql.org/docs/8.3/interactive/sql-cluster.html
Hi Nils,
CLUSTER in PostgreSQL seems to only work on an index, while
ALTER TABLE…ORDER BY will work on any columns.
there is another feature (some would not use this word) in mysql that is stunning in its implication — “hidden” columns in the GROUP BY
at first, i thought it was an egregious error that mysql would allow a query like this:
SELECT a, b, MAX(c) FROM t GROUP BY a
most databases toss an error about column b in the SELECT clause but not in the GROUP BY
read this article, which completely changed my mind on the issue — http://rpbouman.blogspot.com/2007/05/debunking-group-by-myths.html (also republished on mysql.com)
keep “functional dependency” in mind (from normalization theory) and see if you don’t agree that it can be exceptionally useful
I personally like the REPLACE INTO command… same syntax as INSERT INTO, except if it finds a duplicate key, it deletes it and replaces it….
Similar in spirit to the UPDATE… ON DUPLICATE KEY UPDATE, the difference being – REPLACE INTO will delete and then insert so you get incremented auto numbered IDs