Guide to MySQL installation files

Even for DBAs already familiar with MySQL, the choice of installation methods and the variety of install/package files is overwhelming and confusing.

I’ll make a (very!) brief introduction to the various installation options, concentrating on the Linux operating system, and provide with a simple shopping list.

For illustration, I’ll refer to the MySQL 5.1.52 community edition, the latest at the time of this writing. Downloads are available at http://dev.mysql.com/downloads/mysql/5.1.html.

I will then refer to alternate distributions.

A note for Windows users

You are lucky: your choice is very clear. Download the Windows MSI package. Choose 32 or 64 bit architecture according to your OS. If you know about the other ways to install and setup MySQL under Windows, you don’t need this post.

Linux: repositories

Easiest way to install MySQL on linux would be to use your distro’s repository. Just:

sudo apt-get install mysql-server-5.1

or

sudo yum install MySQL-Server-5.1

Your distro should resolve any package dependencies.

I’m in the opinion that if MySQL is the main application to be used on a server, distribution’s default repositories are not the way to go. Reasons include MySQL outdated version, incapability of installing multiple instance, danger of automatic upgrades or downgrades. Read this post and discussion that follows for more.

Linux: packages

If you’re a RedHat/CentOS/SuSE user, you’re in luck: MySQL provides pre-built RPM packages for your system. I vaguely remember MySQL announcing that Ubuntu is to be supported. That was a couple years ago; there is still no package for Debian/Ubuntu.

Continue reading » “Guide to MySQL installation files”

An argument for using mysqldump

I fully agree with Morgan’s An argument for not using mysqldump. This post does not come to contradict it, but rather shed a positive light on mysqldump.

I usually prefer an LVM snapshot based backup, or using XtraBackup. And, with databases as large as dozens of GB and above, I consider mysqldump to be a poor alternative. Poor in runtime, poor in overhead while taking the backup.

However once in a while I get to be reminded that mysqldump just works.

As a recent example, I had a server which was killed after an ALTER TABLE statement hanged forever (table already ALTERed, but old scheme never dropped). The old table data still hanged around the file system, but was not recognized by InnoDB. Trying out DISCARD TABLESPACE did not do the job, and eventually file was dropped.

So far, reasonable. InnoDB would complain about some table it never recognized in the first place, but all would work. That is, until backup was concerned. With innobackup or XtraBackup the restore would fail on some internal problem. LVM would work, but would only copy+paste the problem: innobackup would never again be able to be used on this database. Continue reading » “An argument for using mysqldump”

mycheckpoint (rev 208): aggregation tables, enhanced charting, RPM distribution

Revision 208 of mycheckpoint, a MySQL monitoring solution, has been released. New and updated in this revision:

  • Aggregation tables: aggregated data makes for fast reports on previously slow queries.
  • Enhanced charting: interactive charts now present time stamps dynamically (see demo); “Zoom in” charts are available (see demo) on mycheckpoint‘s HTTP server.
  • RPM distribution: a “noarch” RPM mycheckpoint build is now available.
  • Initial work on formalizing test environment

mycheckpoint celebrates one year of existence!

Aggregation tables

I really wanted to avoid using these: everything was so more beautiful with one single dataset and dozens of supporting views (OK, the views themselves are hardly “beautiful”).

However it was impossible (for my level of expertise) to optimize query performance what with all those views on per-hour and per-day aggregation. The GROUP BYs and the JOINs did not make it possible for condition pushdown (i.e. using MERGE algorithm) where desired.

As result, mycheckpoint now manages aggregation tables: per-hour and per-day. The impact on sample taking is neglect able (making for two additional fast queries), but the impact on reading aggregated data is overwhelming. Generating a HTML full report could take a few minutes to complete. It now returns in no time. This makes charting more attractive, and allows for enhanced charting, such as zooming in on charts, as described following.

Aggregation tables will automatically be created and retroactively populated upon using revision 208. There’s nothing special to do; be advised that for one single execution of mycheckpoint, many INSERT queries are going to be executed. Shouldn’t take more than a couple minutes on commodity hardware and a few months of history.

It is possible to disable aggregation tables, or make for a complete rebuild of tables; by default, though, aggregation is ON.

Enhanced charting

Two enhancements here: Continue reading » “mycheckpoint (rev 208): aggregation tables, enhanced charting, RPM distribution”

MySQL terminology: processes, threads & connections

There’s some confusion in the MySQL terminology for processes, threads & connections, which I will try to resolve. I can’t explain the full nature of what processes and threads are; please see Wikipedia [1] [2] for that. But here’s some basics with regard to MySQL:

  • MySQL server is a single process application.
  • It is multithreaded.
  • It (usually) acts as a TCP/IP server, accepting connections.
  • Each connection gets a dedicated thread.
  • These threads are sometimes named processes, and sometimes they’re referred to as connections.

The last part is where confusion arises, so let me discuss again the use of threads and connections in MySQL.

Continue reading » “MySQL terminology: processes, threads & connections”

Simple guideline for choosing appropriate InnoDB PRIMARY KEYs

Risking some flames, I’d like to suggest only two options for choosing PRIMARY KEYs for InnoDB tables. I suggest they should cover 99% (throwing numbers around) of cases.

PRIMARY KEY cases

  1. An integer (SMALLINT / INT / BIGINT), possibly AUTO_INCREMENT column.
  2. The combination of two columns on a many-to-many connecting table (e.g. film_actor, which connects films to actors), the two columns being the PRIMARY KEYs of respective data tables. This rule may be extended to 3-way relation tables.

A short recap: an InnoDB must have a PRIMARY KEY. It will pick one if you don’t offer it. It can pick a really bad UNIQUE KEY (e.g. website_url(255)) or make one up using InnoDB internal row ids. If you don’t have a good candidate, an AUTO_INCREMENT PRIMARY KEY is probably the easiest way out.

A 2-column combination for a many-to-many connection table is common and viable. The PRIMARY KEY will not only provide with good join access method, but will also provide with the required UNIQUE constraint.

An integer-based PRIMARY KEY will make for more compact & shallow index tree structures, which leads to less I/O and page reads.

An AUTO_INCREMENT will allow for ascending PRIMARY KEY order of INSERT, which is InnoDB-friendly: index pages will be more utilized, less fragmented. Continue reading » “Simple guideline for choosing appropriate InnoDB PRIMARY KEYs”

Thoughts and ideas for Online Schema Change

Here’s a few thoughts on current status and further possibilities for Facebook’s Online Schema Change (OSC) tool. I’ve had these thoughts for months now, pondering over improving oak-online-alter-table but haven’t got around to implement them nor even write them down. Better late than never.

The tool has some limitations. Some cannot be lifted, some could. Quoting from the announcement and looking at the code, I add a few comments. I conclude with a general opinion on the tool’s abilities.

“The original table must have PK. Otherwise an error is returned.”

This restriction could be lifted: it’s enough that the table has a UNIQUE KEY. My original oak-online-alter-table handled that particular case. As far as I see from their code, the Facebook code would work just as well with any unique key.

However, this restriction is of no real interest. As we’re mostly interested in InnoDB tables, and since any InnoDB table should have a PRIMARY KEY, we shouldn’t care too much.

“No foreign keys should exist. Otherwise an error is returned.”

Tricky stuff. With oak-online-alter-table, changes to the original table were immediately reflected in the ghost table. With InnoDB tables, that meant same transaction. And although I never got to update the text and code, there shouldn’t be a reason for not using child-side foreign keys (the child-side is the table on which the FK constraint is defined).

The Facebook patch works differently: it captures changes and writes them to a delta table,  to be later (asynchronously) analyzed and make for a replay of actions on the ghost table. Continue reading » “Thoughts and ideas for Online Schema Change”

How often should you use OPTIMIZE TABLE? – followup

This post follows up on Baron’s How often should you use OPTIMIZE TABLE?. I had the opportunity of doing some massive purging of data from large tables, and was interested to see the impact of the OPTIMIZE operation on table’s indexes. I worked on some production data I was authorized to provide as example.

The use case

I’ll present a single use case here. The table at hand is a compressed InnoDB table used for logs. I’ve rewritten some column names for privacy:

mysql> show create table logs \G

Create Table: CREATE TABLE `logs` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
 `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 `origin` varchar(64) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
 `message` text NOT NULL,
 `level` tinyint(11) NOT NULL DEFAULT '0',
 `s` char(16) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `s` (`s`),
 KEY `name` (`name`,`ts`),
 KEY `origin` (`origin`,`ts`)
) ENGINE=InnoDB AUTO_INCREMENT=186878729 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8

The table had log records starting 2010-08-23 and up till 2010-09-02 noon. Table status: Continue reading » “How often should you use OPTIMIZE TABLE? – followup”

Oracle week 2010 in Israel: not a single MySQL session

Development, Middleware, [Oracle] Database, BI, ERP, CRM, SCM, EPM, SOA & BPM, Java, Security — all these and more are on the schedule. No MySQL track, not a single MySQL session.

Lack of speakers? Hardly. Lack of public interest? I can’t imagine.

Then what is it? I have no information, and don’t want to throw around suggestions.

Reading the convention’s objectives, I see nothing to suggest why MySQL would not be in. Heck, there’s even two new special seminars: “Family Economy” and “Career planning“.

Sorry, not providing a link this time (besides, it’s all Hebrew). If you’re eager to look it up, google for “Oracle week Israel”.

It’s a shame, and makes it harder to answer the “so what do you think Oracle will do to MySQL?” question I get to be asked ever so often.

openark-kit, Facebook Online Schema Change, and thoughts on open source licenses

MySQL@Facebook team have recently published an Online Schema Change code for non blocking ALTER TABLE operations. Thumbs Up!

The code is derived from oak-online-alter-table, part of openark-kit, a toolkit I’m authoring. Looking at the documentation I can see many ideas were incorporated as well. And of course many things are different, a lot of work has been put to it by MySQL@Facebook.

openark-kit is currently released under the new BSD license, and, as far as I can tell (I’m not a lawyer), Facebook’s work has followed the license to the letter. It is a strange thing to see your code incorporated into another project. While I knew work has begun on the tool by Facebook, I wasn’t in on it except for a few preliminary email exchanges.

And this is the beauty

You release code under open source license, and anyone can pick it up and continue working on it. One doesn’t have to ask or even let you know. Eventually one may release back to the community improved code, more tested (not many comments on oak-online-alter-table in the past 18 months).

It is a beauty, that you can freely use one’s patches, and he can then use yours.

And here is my concern

When I created both openark-kit and mycheckpoint, I licensed them under the BSD license. A very permissive license. Let anyone do what they want with it, I thought. However Facebook’s announcement suddenly hit me: what license would other people use for their derived work?

The OSC has been release under permissive license back to the community (again, I am not a lawyer). But, someone else could have made it less friendly. Perhaps not release the code at all: just sell it, closed-source, embedded in their product. And I found out that I do not want anyone to do whatever they want with my code.

I want all derived work to remain open!

Which is why in next releases of code I’m authoring the license will change to less permissive and more open license, such as GPL or LGPL. (Of course, all code released so far remains under the BSD license).

Tool of the day: autossh

Maybe I’m like an old replication server, lagging way behind, but a couple of weeks ago I found autossh, which is a wrapper around ssh, that keeps reconnecting the session if it breaks.

With public key encryption, I am now able to work out pretty reliable SSH tunneling among servers, which doesn’t break. It seems to be working well during these couple of weeks. And it’s in my favorite distro’s repository 🙂

I suppose use cases are as many as those for SSH or SSH tunneling, and I’m putting it to an interesting use. But I suppose the most obvious use in the MySQL world would be to encrypt client connections over unsafe network, or make the network more reliable, for that matter. Yes, there’s SSL connections, but opening your 3306 port on your firewall? Too risky for my taste.