Character sets: latin1 vs. ascii

Unless specified otherwise, latin1 is the default character set in MySQL.

What I usually find in schemes are columns which are either utf8 or latin1. The utf8 columns being those which need to contain multilingual characters (user names, addresses, articles etc.), and latin1 column being all the rest (passwords, digests, email addresses, hard-coded values etc.)

I find latin1 to be improper for such purposes and suggest that ascii be used instead. The reason being that latin1 implies a European text (with swedish collation). It is unclear for an outsider, when finding a latin1 column, whether it should actually contain West European characters, or is it just being used for ascii text, utilizing the fact that a character in latin1 only requires 1 byte of storage. Continue reading » “Character sets: latin1 vs. ascii”

New and Noteworthy in openark kit

A new release of openark kit is out. Some interesting additions.changes are:

openark kit is a set of utilities for MySQL, helping in easing out everyday’s work. Let’s look more closely at the changes.

oak-online-alter-table

The utility allows for non-blocking ALTER TABLE operations, under certain limitations. One limitation which has been removed in the current release was the single-column UNIQUE KEY limitation. As of now, a requirement for running oak-online-alter-table is that the altered table has some UNIQUE KEY. It could be numerical, textual, single column, multi-column (compound), anything. Continue reading » “New and Noteworthy in openark kit”

Unwalking a string with GROUP_CONCAT

“Walking a string” is an SQL technique to convert a single value into multiple rows result set. For example, walking the string ‘hello’ results with 5 rows, each of which contains a single character from the text.

I’ll present a brief example of walking a string, and then show how to “unwalk” the string: do the reverse operation.

To walk a string, an integers table is required (or this could be a good use for SeqEngine): Continue reading » “Unwalking a string with GROUP_CONCAT”

Even more on MySQL password security

This post follows Ronald Bradford’s More Basic MySQL Security, and Lenz Grimmer’s Basic MySQL Security: Providing passwords on the command line and More on MySQL password security.

In Ronald’s post I’ve argued that passwords provided on command line are visible in plaintext on “ps aux”. Lenz has argued that this is incorrect, providing the source code to support that. Giuseppe commenting that this has been fixed since 2002. Later on, Lenz shows that passwords are visible in plaintext on OpenSolaris, Solaris and variants of BSD and SysV.

Mental note: old habits die hard; I must remember to revisit issues from time to time.

Centralizing

Back to the question: why use a file to store your password, and not provide it on command line?

Continue reading » “Even more on MySQL password security”

SQL: finding a user’s country/region based on IP

I’ve encountered the same problem twice for different customers, so I guess it’s worth a discussion.

A common task for web applications is to find out the country/region of a user, based on her IP address, as can be detected in the HTTP request. Depending on the country of origin, the website can translate dates for different time zones, can change locale settings, and, perhaps most commonly, show advertisements in her native language.

To start with, there’s a table which lists the IP ranges per country/region. Let’s assume we’re only dealing with IPv4:

CREATE TABLE regions_ip_range (
  regions_ip_range_id INT UNSIGNED AUTO_INCREMENT,
  country VARCHAR(64) CHARSET utf8,
  region VARCHAR(64) CHARSET utf8,
  start_ip INT UNSIGNED,
  end_ip INT UNSIGNED,
  …
  PRIMARY KEY(regions_ip_range_id),
  ...
);

The table is fixed, and is populated. Now the question arises: how do we query this table, and which indexes should be created?

The wrong way

The form I’ve encountered is as follows: an index is declared on regions_ip_range:

KEY ip_range_idx (start_ip, end_ip)

And the query goes like this:

SELECT * FROM regions_ip_range
WHERE my_ip BETWEEN start_ip AND end_ip

Continue reading » “SQL: finding a user’s country/region based on IP”

Reasons to use innodb_file_per_table

When working with InnoDB, you have two ways for managing the tablespace storage:

  1. Throw everything in one big file (optionally split).
  2. Have one file per table.

I will discuss the advantages and disadvantages of the two options, and will strive to convince that innodb_file_per_table is preferable.

A single tablespace

Having everything in one big file means all tables and indexes, from all schemes, are ‘mixed’ together in that file.

This allows for the following nice property: free space can be shared between different tables and different schemes. Thus, if I purge many rows from my log table, the now unused space can be occupied by new rows of any other table.

This same nice property also translates to a not so nice one: data can be greatly fragmented across the tablespace.

An annoying property of InnoDB’s tablespaces is that they never shrink. So after purging those rows from the log table, the tablespace file (usually ibdata1) still keeps the same storage. It does not release storage to the file system.

I’ve seen more than once how certain tables are left unwatched, growing until disk space reaches 90% and SMS notifications start beeping all around. Continue reading » “Reasons to use innodb_file_per_table”

The importance of report_host & report_port

The two server variables, report_host and report_port, are not commonly used. One of the reasons, I suspect, is that they do not appear in any of the standard .cnf files provided with a MySQL installation.

For a replication environment, I find these two variables to be very useful.

Where are these variables used?

Here’s a slightly modified exerpt from a MySQL master node I have, and which has two slaves: Continue reading » “The importance of report_host & report_port”

More on variables ambiguities

Continuing Variables ambiguities in names and values, there are two more issues I wish to present.

First, I’ve shown that variable values may be ambiguous. Thus, 1 and ON are interchangeable. But also 1 and YES.

The important thing to note is that it’s not always like that. You can’t just swap 1 for ON or YES as you will: it depends on the variable. Thus, a specific variable (e.g. query_cache_type) may accept ON for 1 – but will not accept YES. Another may accept YES for 1 – bot not ON.

Confused? Here’s the second issue. This one is really a bug, as I see it. Take a look at the following:

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_file_per_table';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_file_per_table | ON    |
+-----------------------+-------+

The ON value is actually illegal: if you try to set this value in your my.cnf file – you’ll get a non-file-per-table behavior. The real value you should be using in my.cnf is 1 (or no value at all).

And so in this latter example, 1 equals ON, but only one of them is valid in the my.cnf file, and yet only the other one is being reported.

The behavior is documented here. But, if, like myself, you find it undesired, please comment on bug #44509.

Variables ambiguities in names and values

Writing up some scripts, I see more and more ambiguities with regard to global variables.

For one thing, the names ambiguity between the hyphen (‘-‘) and the underscore (‘_’). So wait_timeout and wait-timeout are the same variable.

But just check out the many levels of inconsistency:

  • Command line arguments (e.g. run mysqld with option variables) use the hyphen convention
  • mysql –verbose –help shows the hyphen convention
  • SHOW GLOBAL VARIABLES uses the underscore convention
  • The MySQL supplied sample configuration files use both conventions interchangeably

Enough? Not quite: there are ambiguities in values, as well. For example, you may set query_cache_type to 1 or ON. These are equivalent. That’s very friendly. However: Continue reading » “Variables ambiguities in names and values”