Thoughts and ideas for Online Schema Change

Here’s a few thoughts on current status and further possibilities for Facebook’s Online Schema Change (OSC) tool. I’ve had these thoughts for months now, pondering over improving oak-online-alter-table but haven’t got around to implement them nor even write them down. Better late than never.

The tool has some limitations. Some cannot be lifted, some could. Quoting from the announcement and looking at the code, I add a few comments. I conclude with a general opinion on the tool’s abilities.

“The original table must have PK. Otherwise an error is returned.”

This restriction could be lifted: it’s enough that the table has a UNIQUE KEY. My original oak-online-alter-table handled that particular case. As far as I see from their code, the Facebook code would work just as well with any unique key.

However, this restriction is of no real interest. As we’re mostly interested in InnoDB tables, and since any InnoDB table should have a PRIMARY KEY, we shouldn’t care too much.

“No foreign keys should exist. Otherwise an error is returned.”

Tricky stuff. With oak-online-alter-table, changes to the original table were immediately reflected in the ghost table. With InnoDB tables, that meant same transaction. And although I never got to update the text and code, there shouldn’t be a reason for not using child-side foreign keys (the child-side is the table on which the FK constraint is defined).

The Facebook patch works differently: it captures changes and writes them to a delta table,  to be later (asynchronously) analyzed and make for a replay of actions on the ghost table. Continue reading » “Thoughts and ideas for Online Schema Change”

How often should you use OPTIMIZE TABLE? – followup

This post follows up on Baron’s How often should you use OPTIMIZE TABLE?. I had the opportunity of doing some massive purging of data from large tables, and was interested to see the impact of the OPTIMIZE operation on table’s indexes. I worked on some production data I was authorized to provide as example.

The use case

I’ll present a single use case here. The table at hand is a compressed InnoDB table used for logs. I’ve rewritten some column names for privacy:

mysql> show create table logs \G

Create Table: CREATE TABLE `logs` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
 `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 `origin` varchar(64) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
 `message` text NOT NULL,
 `level` tinyint(11) NOT NULL DEFAULT '0',
 `s` char(16) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `s` (`s`),
 KEY `name` (`name`,`ts`),
 KEY `origin` (`origin`,`ts`)
) ENGINE=InnoDB AUTO_INCREMENT=186878729 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8

The table had log records starting 2010-08-23 and up till 2010-09-02 noon. Table status: Continue reading » “How often should you use OPTIMIZE TABLE? – followup”

Oracle week 2010 in Israel: not a single MySQL session

Development, Middleware, [Oracle] Database, BI, ERP, CRM, SCM, EPM, SOA & BPM, Java, Security — all these and more are on the schedule. No MySQL track, not a single MySQL session.

Lack of speakers? Hardly. Lack of public interest? I can’t imagine.

Then what is it? I have no information, and don’t want to throw around suggestions.

Reading the convention’s objectives, I see nothing to suggest why MySQL would not be in. Heck, there’s even two new special seminars: “Family Economy” and “Career planning“.

Sorry, not providing a link this time (besides, it’s all Hebrew). If you’re eager to look it up, google for “Oracle week Israel”.

It’s a shame, and makes it harder to answer the “so what do you think Oracle will do to MySQL?” question I get to be asked ever so often.

openark-kit, Facebook Online Schema Change, and thoughts on open source licenses

MySQL@Facebook team have recently published an Online Schema Change code for non blocking ALTER TABLE operations. Thumbs Up!

The code is derived from oak-online-alter-table, part of openark-kit, a toolkit I’m authoring. Looking at the documentation I can see many ideas were incorporated as well. And of course many things are different, a lot of work has been put to it by MySQL@Facebook.

openark-kit is currently released under the new BSD license, and, as far as I can tell (I’m not a lawyer), Facebook’s work has followed the license to the letter. It is a strange thing to see your code incorporated into another project. While I knew work has begun on the tool by Facebook, I wasn’t in on it except for a few preliminary email exchanges.

And this is the beauty

You release code under open source license, and anyone can pick it up and continue working on it. One doesn’t have to ask or even let you know. Eventually one may release back to the community improved code, more tested (not many comments on oak-online-alter-table in the past 18 months).

It is a beauty, that you can freely use one’s patches, and he can then use yours.

And here is my concern

When I created both openark-kit and mycheckpoint, I licensed them under the BSD license. A very permissive license. Let anyone do what they want with it, I thought. However Facebook’s announcement suddenly hit me: what license would other people use for their derived work?

The OSC has been release under permissive license back to the community (again, I am not a lawyer). But, someone else could have made it less friendly. Perhaps not release the code at all: just sell it, closed-source, embedded in their product. And I found out that I do not want anyone to do whatever they want with my code.

I want all derived work to remain open!

Which is why in next releases of code I’m authoring the license will change to less permissive and more open license, such as GPL or LGPL. (Of course, all code released so far remains under the BSD license).

Tool of the day: autossh

Maybe I’m like an old replication server, lagging way behind, but a couple of weeks ago I found autossh, which is a wrapper around ssh, that keeps reconnecting the session if it breaks.

With public key encryption, I am now able to work out pretty reliable SSH tunneling among servers, which doesn’t break. It seems to be working well during these couple of weeks. And it’s in my favorite distro’s repository 🙂

I suppose use cases are as many as those for SSH or SSH tunneling, and I’m putting it to an interesting use. But I suppose the most obvious use in the MySQL world would be to encrypt client connections over unsafe network, or make the network more reliable, for that matter. Yes, there’s SSL connections, but opening your 3306 port on your firewall? Too risky for my taste.

mycheckpoint (rev. 190): HTTP server; interactive charts

Revision 190 of mycheckpoint, a MySQL monitoring solution, has been released. New and updated in this revision:

  • HTTP server: mycheckpoint can now act as a web server. Point your browser and start browsing through HTML reports. See mock up demo.
  • Interactive charts: HTML line charts are now interactive, presenting with accurate data as you move over them. See sample.
  • Enhanced auto-deploy: now auto-recognizing failed upgrades.
  • Reduced footprint: much code taken out of the views, leading to faster loading times.
  • Better configuration file use: now supporting all command line options in config file.
  • Remote host monitoring accessibility: now supporting complete configurable accessibility details.
  • Bug fixes: thanks to the bug reporters!

mycheckpoint is free, simple, easy to use (now easier with HTTP server) and useful. I encourage you to try it out: even compared with other existing and emerging monitoring tools, I believe you will find it a breeze; it’s low impact and lightness appealing; it’s alerts mechanism assuring; its geeky SQL-based nature with ability to drill down to fine details — geeky-kind-of-attractive.

HTTP server

You can now run mycheckpoint in http mode:

bash$ mycheckpoint http

mycheckpoint will listen on port 12306, and will present you with easy browsing through the reports of your mycheckpoint databases. Continue reading » “mycheckpoint (rev. 190): HTTP server; interactive charts”

Sphinx & MySQL: facts and misconceptions

Sphinx search is a full text search engine, commonly used with MySQL.

There are some misconceptions about Sphinx and its usage. Following is a list of some of Sphinx’ properties, hoping to answer some common questions.

  • Sphinx is not part of MySQL/Oracle.
  • It is a standalone server; an external application to MySQL.
  • Actually, it is not MySQL specific. It can work with other RDBMS: PostgreSQL, MS SQL Server.
  • And, although described as “free open-source SQL full-text search engine”, it is not SQL-specific: Sphinx can read documents from XML.
  • It is often described as “full text search for InnoDB”. This description is misleading. Sphinx indexes text; be it from any storage engine or external source. It solves, in a way, the issue of “FULLTEXT is only supported by MyISAM”. Essentially, it provided full-text indexing for InnoDB tables, but in a very different way than the way MyISAM’s FULLTEXT index works.

Sphinx works by reading documents, usually from databases. Considering the case of MySQL, Sphinx issues a SQL query which retrieves relevant data (mostly the text you want to index, but other properties allowed). Continue reading » “Sphinx & MySQL: facts and misconceptions”

mylvmbackup HOWTO: minimal privileges & filesystem copy

This HOWTO discusses two (unrelated) issues with mylvmbackup:

  • The minimal privileges required to take MySQL backups with mylvmbackup.
  • Making (non compressed) file system copy of one’s data files.

Minimal privileges

Some just give mylvmbackup the root account, which is far too permissive. We now consider what the minimal requirements of mylvmbackup are.

The queries mylvmbackup issues are:

  • FLUSH TABLES
  • FLUSH TABLES WITH READ LOCK
  • SHOW MASTER STATUS
  • SHOW SLAVE STATUS
  • UNLOCK TABLES

Both SHOW MASTER STATUS & SHOW SLAVE STATUS require either the SUPER or REPLICATION CLIENT privilege. Since SUPER is more powerful, we choose REPLICATION CLIENT.

The FLUSH TABLES * and UNLOCK TABLES require the RELOAD privilege.

However, we are not done yet. mylvmbackup connects to the mysql database, which means we must also have some privilege there, too. We choose the SELECT privilege.

Continue reading » “mylvmbackup HOWTO: minimal privileges & filesystem copy”

MMM for MySQL single reader role

The standard documentation and tutorials on MMM for MySQL, for master-master replication setup, suggest one Virtual IP for the writer role, and two Virtual IPs for the reader role. It can be desired to only have a single virtual IP for the reader role, as explained below.

The two IPs for the reader role

A simplified excerpt from the mmm_common.conf sample configuration file, as can be found on the project’s site and which is most quoted: Continue reading » “MMM for MySQL single reader role”