Less known SQL syntax and functions in MySQL

November 23, 2008

"Standard SQL" is something you read about. All popular databases have modified version of SQL. Each database adds its own flavor and features to the standard. MySQL is no different.

Some deviations are storage engine dependent. Others are more general. Many, such as INSERT IGNORE, are commonly used. Here's a list of some MySQL deviations to SQL, which are not so well known.

I'll be using MySQL's world database for demonstration.

GROUP_CONCAT

Assume the following query: SELECT CountryCode, COUNT(*) FROM City GROUP BY CountryCode, which selects the number of cities per country, using MySQL's world database. It is possible to get a name for one "sample" city per country using standard SQL: SELECT CountryCode, Name, COUNT(*) FROM City GROUP BY CountryCode

But in MySQL it is also possible to get the list of cities per group: SELECT CountryCode, GROUP_CONCAT(Name), COUNT(*) FROM City GROUP BY CountryCode. This will provide with a comma delimited string of all city names per country.

ORDER BY NULL

If you ran the previous queries, you may have noticed that the results were ordered by CountryCode. MySQL's default behavior when GROUP BY is used, is to order by the grouped column. But this means sorting is required, possibly using merge passes and temporary tables. MySQL accepts the following syntax:

SELECT CountryCode, COUNT(*) FROM City GROUP BY CountryCode ORDER BY NULL

If you EXPLAIN the query, you'll see no "Using filesort". When not using ORDER BY NULL, "Using filesort" appears.

ALTER TABLE ... ORDER BY

MyISAM tables are not clustered. The table data is independent of indexes. Depending on concurrent_insert settings, new rows are either appended to the end of the table, or fill the space previously occupied by DELETEd rows.

When you SELECT (*) FROM Country, the order of rows is as stored on disk. It is possible to do a one-time reordering of rows in a MyISAM table by executing: ALTER TABLE Country ORDER BY Code. This is a lengthy operation (on large tables), which locks the table, so take care when using it. The change does not last for long, either: as you INSERT new rows, the rows get out of order again. But if your table does not get modified, or only gets modified rarely, this is a nice trick to use when order of rows is important, and you don't want to pay the price of sorting per query.

ROW_COUNT()

Anyone who uses MySQL with a connector (say, Connector/J with JDBC), knows that INSERT, DELETE and UPDATE statements return with an integer value: the number of modified rows. In MySQL, the explicit way to get the number of modified rows is to invoke SELECT ROW_COUNT() right after your query. This method is useful if you like to know whether your DELETE did in fact remove rows, or INSERT IGNORE did in fact add a row, etc.

LIMIT

Well, MySQL DBAs are familiar with it. I just thought I'd mention LIMIT, since it's a MySQL deviation. I was surprised to find that out, when an Oracle DBA once asked me how I did paging with results. "You mean like LIMIT 60,10?" I asked, and he replied: "LIMIT??". So, you can LIMIT to limit the number of results, like: SELECT * FROM Country LIMIT 10, to only get first 10 rows, or to do paging like: SELECT * FROM Country LIMIT 60,10, which skips 60 rows, then reads 10.

SQL_CALC_FOUND_ROWS, FOUND_ROWS()

While at it, it may be required to use LIMIT to only return 10 rows, but still ask MySQL how many rows there really were. Do it like this:

SELECT SQL_CALC_FOUND_ROWS Code, Name FROM Country LIMIT 10;

SELECT FOUND_ROWS();

First query gives the required 10 results. Second query says "239", which is the total rows I would get had I not used LIMIT. Note that a SELECT SQL_CALC_FOUND_ROWS is a "heavy" query, which actually searches through the entire rowset, and then only returns the LIMITed rows. Use with care.

PROCEDURE ANALYSE

A very nice diagnostic tool, which tells us what data types are proper based on existing data. If we have an INT column, but all values are smaller than 200, PROCEDURE_ANALYSE() recommends that we use a TINYINT. Usage: SELECT * FROM Country PROCEDURE ANALYSE(10,10). Just remember it does not anticipate data growth. It only relies on current data.

INSERT IGNORE

OK, I said above that it is commonly used, but couldn't help myself, it's just too useful to leave out. INSERT IGNORE INTO City (id, Name) VALUES (1000, 'Te Anau') will silently abort if there's a UNIQUE KEY on `id` and an existing id=1000 value. A normal INSERT will terminate with an error, or raise an Exception in your application's code. It is of particular use when doing an extended INSERT: INSERT IGNORE INTO City (id, Name) VALUES (1000, 'Te Anau'), (9009, 'Wanaka') may have trouble with the first row, but will insert the second row. ROW_COUNT() can tell me how well it went.

tags: , ,
posted in MySQL by shlomi

« | »

Follow comments via the RSS Feed | Leave a comment | Trackback URL

15 Comments to "Less known SQL syntax and functions in MySQL"

  1. LewisC wrote:

    I wasn't aware of INSERT IGNORE. I like it. PROCEDURE ANALYSE is handy too.

    Nice post.

    LewisC

  2. Olaf van der Spek wrote:

    > It is possible to get a name for one “sample” city per country using standard SQL

    Isn't that a MySQL bug? It's not standard SQL and I think it's disabled by the ONLY_FULL_GROUP_BY SQL MODE.

    > But this means sorting is required,

    That's a weird thing. Why would MySQL do an automatic order by?

  3. shlomi wrote:

    Hi Olaf,
    1. The ANSI sql mode does indeed include ONLY_FULL_GROUP_BY. Apparently (at least as MySQL sees it), Oracle, SQL server, DB2 & PostgreSQL do not include this restriction; see http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html.

    2. I don't know why it is like that. I find no explanation on the official docs. I'll ask around. Meanwhile, see http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html

  4. rudy wrote:

    this blew me away when i saw it --

    INSERT INTO t (c1,c2,c3)
    VALUES (v1,v2,v3),(v1,v2,v3),(v1,v2,v3),(v1,v2,v3)
    ON DUPLICATE KEY UPDATE c1=VALUES(c1),c2=VALUES(c2)

    the ON DUPLICATE KEY UPDATE part is nifty enough, but the VALUES clauses then specify which values in the INSERT statement to use to update the corresponding columns in the existing rows

    coolness squared

  5. shlomi wrote:

    rudy - nice!

  6. Nils wrote:

    The ALTER TABLE ... ORDER BY thing is similar to CLUSTER in postgres I think.

    http://www.postgresql.org/docs/8.3/interactive/sql-cluster.html

  7. shlomi wrote:

    Hi Nils,
    CLUSTER in PostgreSQL seems to only work on an index, while
    ALTER TABLE...ORDER BY will work on any columns.

  8. rudy wrote:

    there is another feature (some would not use this word) in mysql that is stunning in its implication -- "hidden" columns in the GROUP BY

    at first, i thought it was an egregious error that mysql would allow a query like this:

    SELECT a, b, MAX(c) FROM t GROUP BY a

    most databases toss an error about column b in the SELECT clause but not in the GROUP BY

    read this article, which completely changed my mind on the issue -- http://rpbouman.blogspot.com/2007/05/debunking-group-by-myths.html (also republished on mysql.com)

    keep "functional dependency" in mind (from normalization theory) and see if you don't agree that it can be exceptionally useful

  9. A day in the life of… - Once a nomad, always a nomad wrote:

    [...] Bookmarked a link on Delicious. Less known SQL syntax and functions in MySQL | code.openark.org [...]

  10. Jason wrote:

    I personally like the REPLACE INTO command... same syntax as INSERT INTO, except if it finds a duplicate key, it deletes it and replaces it....

    Similar in spirit to the UPDATE... ON DUPLICATE KEY UPDATE, the difference being - REPLACE INTO will delete and then insert so you get incremented auto numbered IDs

  11. shlomi wrote:

    Hi Jason,

    I've seen a very bad effect of REPLACE INTO in a production system. The overhead of DELETEing a row, then INSERTing a new one is huge, especially on InnoDB tables. Changing the code to INSERT INTO...ON DUPLICATE KEY reduced the load average on the db machine by factors.

  12. Kannan wrote:

    Good Post for MySQL.

  13. Tim wrote:

    Your Oracle DBA was shocked that you're using "LIMIT" for pagination because that syntax is a MySQL extension. The standard SQL way would be "LIMIT 10 OFFSET 60".

    MySQL also supports OFFSET, and it makes it clearer which number is which, so I'm not sure why you'd ever use "LIMIT a,b".

  14. shlomi wrote:

    @Tim,

    There's some contradiction in your comment, I think. AFAIK, there is no LIMIT in ANSI SQL. Therefore, it is purely a MySQL extension. Thus, there is no "standard SQL way" for doing LIMIT.

    Using one syntax over the other is a matter of preference and, in my case, and old habit. I agree using LIMIT with OFFSET is more verbose.

    Regards

  15. Discovery of the day: GROUP BY DESC | code.openark.org wrote:

    [...] are sorted according to the group statement. You can override this by adding ORDER BY NULL (see past post). I wasn’t aware you can actually control the sort [...]

Leave Your Comment

 

 
Powered by Wordpress and MySQL. Theme by openark.org