Installing memcached functions for MySQL

tangent.org provide a memcached client for MySQL, in the form of UDFs. The provided set of functions allow for connecting to a memcached server, putting values in the cache, getting values, invalidating, utilizing increments etc.

The code is not (yet?) available in binary format, so the libraries need to be compiled and installed manually. Following is a quick installation HOWTO for Linux users. Continue reading » “Installing memcached functions for MySQL”

MySQL’s character sets and collations demystified

MySQL’s character sets and collations are often considered as a mystery, and many users either completely disregard them and keep with the defaults, or set everything to UTF8.

This post will attempt to shed some light on the mystery, and provide with some best practices for use with text columns with regard to character sets. Continue reading » “MySQL’s character sets and collations demystified”

Dynamic sequencing with a single query

It is a known trick to use a session variables for dynamically counting/sequencing rows. The way to go is to SET a variable to zero, then use arithmetic within assignment to increment its value for each row in the SELECTed rows.

For example, the following query lists the top 10 populated countries, using MySQL’s world database:

SELECT Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+--------------------+------------+
| Code | Name               | Population |
+------+--------------------+------------+
| CHN  | China              | 1277558000 |
| IND  | India              | 1013662000 |
| USA  | United States      |  278357000 |
| IDN  | Indonesia          |  212107000 |
| BRA  | Brazil             |  170115000 |
| PAK  | Pakistan           |  156483000 |
| RUS  | Russian Federation |  146934000 |
| BGD  | Bangladesh         |  129155000 |
| JPN  | Japan              |  126714000 |
| NGA  | Nigeria            |  111506000 |
+------+--------------------+------------+

The results do not provide any sequence number. Nor does the table have an AUTO_INCREMENT or otherwise unique row number. If I were to rate the countries by population, the common trick is:

SET @rank := 0;
SELECT
  @rank := @rank+1 AS rank,
  Code, Name, Population
FROM Country ORDER BY Population DESC LIMIT 10;

+------+------+--------------------+------------+
| rank | Code | Name               | Population |
+------+------+--------------------+------------+
|    1 | CHN  | China              | 1277558000 |
|    2 | IND  | India              | 1013662000 |
|    3 | USA  | United States      |  278357000 |
|    4 | IDN  | Indonesia          |  212107000 |
|    5 | BRA  | Brazil             |  170115000 |
|    6 | PAK  | Pakistan           |  156483000 |
|    7 | RUS  | Russian Federation |  146934000 |
|    8 | BGD  | Bangladesh         |  129155000 |
|    9 | JPN  | Japan              |  126714000 |
|   10 | NGA  | Nigeria            |  111506000 |
+------+------+--------------------+------------+

The first query sets the @rank to zero, so that it is not NULL (since no arithmetic can be done with NULL).  The second query relies on its success.

Can the same be achieved with one query only? That’s more of a problem. Continue reading » “Dynamic sequencing with a single query”

Selecting a specific non aggregated column data in GROUP BY

In a GROUP BY query, MySQL may allow specifying non aggregated columns. For example, using MySQL’s world database, I wish to get the number of countries per continent, along with a “sample” country:

SELECT Continent, COUNT(*), Name
FROM `Country` GROUP BY Continent

+---------------+----------+----------------+
| Continent     | COUNT(*) | Name           |
+---------------+----------+----------------+
| Asia          |       51 | Afghanistan    |
| Europe        |       46 | Albania        |
| North America |       37 | Aruba          |
| Africa        |       58 | Angola         |
| Oceania       |       28 | American Samoa |
| Antarctica    |        5 | Antarctica     |
| South America |       14 | Argentina      |
+---------------+----------+----------------+

What if I want to choose that “sample” country? For example, for each continent, I wish to show the country with the largest population. To simply see the largest population, I would use MAX(Population). But which country is referred? I wish to provide a solution which does not involve sub-queries, HAVING or JOINs. Continue reading » “Selecting a specific non aggregated column data in GROUP BY”

Useful database analysis queries with INFORMATION_SCHEMA

A set of useful queries on INFORMATION_SCHEMA follows. These queries can be used when approaching a new database, to learn about some of its properties, or they can be regularly used on an existing schema, so as to verify its integrity.

I will present queries for:

Less known SQL syntax and functions in MySQL

“Standard SQL” is something you read about. All popular databases have modified version of SQL. Each database adds its own flavor and features to the standard. MySQL is no different.

Some deviations are storage engine dependent. Others are more general. Many, such as INSERT IGNORE, are commonly used. Here’s a list of some MySQL deviations to SQL, which are not so well known. Continue reading » “Less known SQL syntax and functions in MySQL”

Common wrong Data Types compilation

During my work with companies using MySQL, I have encountered many issues with regard to schema design, normalization and indexing. Of the most common errors are incorrect data types definition. Many times the database is designed by programmers or otherwise non-expert DBAs. Some companies do not have the time and cannot spare the effort of redesigning and refactoring their databases, and eventually face poor performance issues.

Here’s a compilation of “the right and the wrong” data types. Continue reading » “Common wrong Data Types compilation”

Dangers of skip-grant-tables

When MySQL’s root password is lost and must be reset, there are two popular ways to create a new password. One of the options is far too popular, in my opinion.

The preferred way of setting a root’s password is by using an init-file. The process for doing this is well explained in MySQL’s manual. Using this method requires creating a simple text file, in which the required

GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' IDENTIFY BY '****' WIth GRANT OPTION;

(or, alternatively,  SET PASSWORD ...) statement is written.

An entry must be written to my.cnf, or supplied via command line parameters:

init-file=/tmp/my-init-file.sql

MySQL must then be restarted. Upon restart, and before opening any outside connections, the init-file is executed. Once MySQL is up and running, the init-file entry should be dropped. Continue reading » “Dangers of skip-grant-tables”

Two storage engines; different plans, Part II

In Part I of this article, we have seen how the internal structure of the storage engine’s index can affect an execution plan. We’ve seen that some plans are inherent to the way engines are implemented.

We wish to present a second scenario in which execution plans vary for different storage engines. Again, we will consider MyISAM and InnoDB. Again, we will use the world database for testing. This time, we will see how confident the storage engines are in their index search capabilities.

Many newcomers to databases often believe that an index search is always preferable to full table scan. This is not the case. If I were to look for 10 rows in a 1,000,000 rows table, using an indexed column – I could benefit from an index search. However, if I’m looking for 200,000 rows on that table (that’s 20% of the rows) – an index search can actually be much more expensive than a full table scan. Continue reading » “Two storage engines; different plans, Part II”

Two storage engines; different plans, Part I

A popping question is: “Can an execution plan change for different storage engines?”

The answer is “Yes”. I will present two such cases, where the MySQL optimizer will choose different execution plans, based on our choice of storage engine.

We will consider MyISAM and InnoDB, the two most popular engines. The two differ in many respects, and in particular, the way they implement indexes and statistics: two major players in the optimizer’s point of view. Continue reading » “Two storage engines; different plans, Part I”