REPLACE INTO: think twice

The REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.

Sounds very attractive, and has a nice syntax as well: the same syntax as a normal INSERT INTO’s. It certainly has a nicer syntax than INSERT INTO … ON DUPLICATE KEY UPDATE, and it’s certainly shorter than using a SELECT to see if a row exists, then doing either INSERT or UPDATE.

But weak hearted people as myself should be aware of the following: it is a heavyweight solution. It may be just what you were looking for in terms of ease of use, but the fact is that on duplicate keys, a DELETE and INSERT are performed, and this calls for a closer look. Continue reading » “REPLACE INTO: think twice”

Using memcached functions for MySQL; an automated alternative to Query Cache

There’s a lot of buzz around memcached. memcached is widely used, and has clients for many programming languages and platforms. TangentOrg have developed a memcached client in the form of MySQL UDFs (User Defined Functions).

I wish to discuss the memcached functions for MySQL: if and how they should be used.

Disclaimer: I do not work with memcached functions for MySQL on a production system; all that is written here reflects my opinion on how things should be done.

With memcached functions for MySQL, we can do the following:

SELECT memc_set('mykey', 'The answer is 42');
SELECT memc_get('mykey');

(See my previous post on how to install memcached functions for MySQL).

In what scenario should we use these functions?

I believe memcached is the right tool for the application level. I am less enthusiastic about using it from MySQL. Sure, pushing it down to MySQL centralizes everything. Instead of having all my application code (PHP, Java etc.) access memcached separately, they can all access one single MySQL node, which gets to access memcached. I see two problems with this approach: Continue reading » “Using memcached functions for MySQL; an automated alternative to Query Cache”

Installing memcached functions for MySQL

tangent.org provide a memcached client for MySQL, in the form of UDFs. The provided set of functions allow for connecting to a memcached server, putting values in the cache, getting values, invalidating, utilizing increments etc.

The code is not (yet?) available in binary format, so the libraries need to be compiled and installed manually. Following is a quick installation HOWTO for Linux users. Continue reading » “Installing memcached functions for MySQL”

MySQL’s character sets and collations demystified

MySQL’s character sets and collations are often considered as a mystery, and many users either completely disregard them and keep with the defaults, or set everything to UTF8.

This post will attempt to shed some light on the mystery, and provide with some best practices for use with text columns with regard to character sets. Continue reading » “MySQL’s character sets and collations demystified”

Dynamic sequencing with a single query

It is a known trick to use a session variables for dynamically counting/sequencing rows. The way to go is to SET a variable to zero, then use arithmetic within assignment to increment its value for each row in the SELECTed rows.

For example, the following query lists the top 10 populated countries, using MySQL’s world database:

SELECT Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+--------------------+------------+
| Code | Name               | Population |
+------+--------------------+------------+
| CHN  | China              | 1277558000 |
| IND  | India              | 1013662000 |
| USA  | United States      |  278357000 |
| IDN  | Indonesia          |  212107000 |
| BRA  | Brazil             |  170115000 |
| PAK  | Pakistan           |  156483000 |
| RUS  | Russian Federation |  146934000 |
| BGD  | Bangladesh         |  129155000 |
| JPN  | Japan              |  126714000 |
| NGA  | Nigeria            |  111506000 |
+------+--------------------+------------+

The results do not provide any sequence number. Nor does the table have an AUTO_INCREMENT or otherwise unique row number. If I were to rate the countries by population, the common trick is:

SET @rank := 0;
SELECT
  @rank := @rank+1 AS rank,
  Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|    1 | CHN  | China              | 1277558000 |
|    2 | IND  | India              | 1013662000 |
|    3 | USA  | United States      |  278357000 |
|    4 | IDN  | Indonesia          |  212107000 |
|    5 | BRA  | Brazil             |  170115000 |
|    6 | PAK  | Pakistan           |  156483000 |
|    7 | RUS  | Russian Federation |  146934000 |
|    8 | BGD  | Bangladesh         |  129155000 |
|    9 | JPN  | Japan              |  126714000 |
|   10 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+

The first query sets the @rank to zero, so that it is not NULL (since no arithmetic can be done with NULL).  The second query relies on its success.

Can the same be achieved with one query only? That’s more of a problem. Continue reading » “Dynamic sequencing with a single query”

Selecting a specific non aggregated column data in GROUP BY

In a GROUP BY query, MySQL may allow specifying non aggregated columns. For example, using MySQL’s world database, I wish to get the number of countries per continent, along with a “sample” country:

SELECT Continent, COUNT(*), Name
FROM `Country` GROUP BY Continent

+---------------+----------+----------------+
| Continent     | COUNT(*) | Name           |
+---------------+----------+----------------+
| Asia          |       51 | Afghanistan    |
| Europe        |       46 | Albania        |
| North America |       37 | Aruba          |
| Africa        |       58 | Angola         |
| Oceania       |       28 | American Samoa |
| Antarctica    |        5 | Antarctica     |
| South America |       14 | Argentina      |
+---------------+----------+----------------+

What if I want to choose that “sample” country? For example, for each continent, I wish to show the country with the largest population. To simply see the largest population, I would use MAX(Population). But which country is referred? I wish to provide a solution which does not involve sub-queries, HAVING or JOINs. Continue reading » “Selecting a specific non aggregated column data in GROUP BY”