Replication configuration checklist

This post lists the essential and optional settings for a replication environment.

It does not explain how to create replicating slaves. See How To Setup Replication for that. However, not all configuration options are well understood, and their roles in varying architectures can change.

Here are the settings for a basic Master/Slave(s) replication architecturee.

Essential

  • log-bin: enable binary logs on the master. Replication is based on the master logging all modifying queries (INSERT/CREATE/ALTER/GRANT etc.), and the slaves being able to replicate them.
  • server-id: each machine must have a unique server-id. A slave will not replay queries originating from a server with the same server-id as its own.
  • GRANT: grant a user with REPLICATION SLAVE. The host list must include all replication slave hosts.
  • expire-logs-days: automatically clean up master’s binary logs older than given value. By default, binary logs are never removed.

When working with Master/Slaves replication, one should be prepared to master failure and slave promotion to master. It may be desirable to identify a particular slave as primary candidate for promotion.

Continue reading » “Replication configuration checklist”

EXPLAIN: missing db info

I’m further developing a general log hook, which can stream queries from the general log.

A particular direction I’m taking is to filter queries by their type of actions. For example, the tool (oak-hook-general-log) can be instructed to only stream out those queries which involve creation of a temporary table; or those which cause for a filesort, or full index scan, etc.

This is done by evaluating of query execution plans on the fly. I suspect the MySQL query analyzer roughly does the same (as a small part of what it does).

It’s almost nothing one cannot do with sed/awk. However, I bumped into a couple of problems:

  1. The general log (and the mysql.general_log table, in  particular) does not indicate the particular database one is using for the query. Since slow log does indicate this data, I filed a bug on this. I currently solve this by crossing connection id with the process list, where the current database is listed. It’s shaky, but mostly works.
  2. Just realized: there’s no DB info in the EXPLAIN output! It’s weird, since I’ve been EXPLAINing queries for years now. But I’ve always had the advantage of knowing the schema used: either because I was manually executing the query on a known schema, or mk-query-digest was kind enough to let me know.

Continue reading » “EXPLAIN: missing db info”

Discovery of the day: GROUP BY … DESC

I happened on a query where, by mistake, an

SELECT ... ORDER BY x DESC LIMIT 1

was written as

SELECT ... GROUP BY x DESC LIMIT 1

And it took me by surprise to realize GROUP BY x DESC is a valid statement. I looked it up: yep! It’s documented.

In MySQL, GROUP BY results are sorted according to the group statement. You can override this by adding ORDER BY NULL (see past post). I wasn’t aware you can actually control the sort order.

Reducing locks by narrowing primary key

In a period of two weeks, I had two cases with the exact same symptoms.

Database users were experiencing low responsiveness. DBAs were seeing locks occurring on seemingly normal tables. In particular, looking at Innotop, it seemed that INSERTs were causing the locks.

In both cases, tables were InnoDB. In both cases, there was a PRIMARY KEY on the combination of all 5 columns. And in both cases, there was no clear explanation as for why the PRIMARY KEY was chosen as such.

Choosing a proper PRIMARY KEY

Especially with InnoDB, which uses clustered index structure, the PRIMARY KEY is of particular importance. Besides the fact that a bloated PRIMARY KEY bloats the entire clustered index and secondary keys (see: The depth of an index: primer), it is also a source for locks. It’s true that any UNIQUE KEY can serve as a PRIMARY KEY. But not all such keys are good candidates. Continue reading » “Reducing locks by narrowing primary key”

Database schema under version control

How many organization use version control for development? Probably almost every single one.

How many store the database schema under version control? Alas, not as many.

Coupling one’s application with table schema is essential. Organization who actively support multiple versions of the product understand that well. Smaller organizations not always have this realization.

How is it done?

Ideally

Ideally one would have:

  • The schema, generated by hand
  • Essential data (INSERT INTO statements for those lookup tables without which you cannot have an application)
  • Sample data: sufficient real-life data on which to act. This would include customers data, logs, etc.

If you can work this way, then creating a staging environment consists of re-creating the entire schema from out schema code, and filling in the essential & sample data.

The thing with this method is that one does not (usually?) apply it on one’s live system. Say we were to add a column. On our live servers we would issue an ALTER TABLE t ADD COLUMN.

But this means we’re using different methods on our staging server and on our production server.

Incremental

Another kind of solution would be to hold:

  • The static schema, as before
  • Essential data, as before
  • Sample data, as before
  • A migration script, which is the concatenation of all ALTERs, CREATEs etc. as of the static schema.

Once in a while one can do a “reset”, and update the static schema with the existing design.

As you go along

This solution simply means “we apply the changes on staging; test + version them; then apply on production”.

A side effect of this solution is that the database generates the schema, but the schema does not generate the database as in previous cases. This makes for an uglier solution, where you first apply the changes to the database, and then, based on what the database report, enter data into the version control.

How to do that? Easiest would be to use mysqldump –routines –no-data. Some further parsing should be done to strip out the AUTO_INCREMENT values, which tend to change, as well as the surrounding variables settings (strip out the character set settings etc.).

Summary

However you do it, make sure you have some kind of version control on your schema. It pays off just as with doing version control for your code. You get to compare, understand the incremental changes, understand the change in design, etc.

Tales of the Trade #3: MySQL vs. NoSQL

Apparently the message from the comic below was not well conceived.

Following the strip I’ve added some spoiler notes.

This is not about bashing NoSQL. Apparently some people feel this way.

This is about:

  • Trying to judge/analyze NoSQL as if it were a single entity, thereby asking “should I use MySQL or NoSQL”?
  • The fact Wikipedia defines NoSQL as a “movement” (who’s the leader?)
  • The fact that people are still arguing on how “MySQL” should be pronounced.

Poll: what (minor) versions of MySQL should be supported by an open source MySQL related project?

I would like to get the community’s opinion about supporting older (minor) versions of MySQL in open source projects.

If I were to develop some open source project, and a bug report came which only applied to MySQL 5.0.51 (but more recent versions worked fine), would I need to fix the code so as to support this older version?

How about supporting 5.0.22 (released almost 4 years ago, with almost 70 revisions since)? Would you expect an open source project to support this MySQL version because, say, this is the default version in your yum repository?

I would like to concentrate on the currently stable MySQL versions: 5.0 and 5.1. Versions 4.x are out of the question for me, and 5.5 is not yet GA.

Sure, it would be great to support everything. But also time and effort consuming. So, I would greatly appreciate your feedback!
Continue reading » “Poll: what (minor) versions of MySQL should be supported by an open source MySQL related project?”

mycheckpoint (Rev. 118): alerts, email notifications and more

Revision 118 of mycheckpoint has been released. New and updated in this revision:

  • Conditional alerts
  • Email notifications
  • Revised HTML reports, including 24/7 reports.
  • Updated documentation

With this new revision mycheckpoint turns into a monitoring solution for MySQL. One can now:

  • Store measure metrics
  • Query for raw, aggregated or digested metrics
  • Generate charts for selected metrics
  • View HTML reports for selecetd metrics
  • Define alerts conditions, query for pending alerts
  • Be notified via email on raised or resolved alerts.

Conditional alerts

mycheckpoint is SQL oriented. As such, it allows for creation of alert conditions, which are nothing more than SQL conditions.

Continue reading » “mycheckpoint (Rev. 118): alerts, email notifications and more”

oak-hook-general-log: streaming general log

I’m seeking input on a new openark kit utility I’ve started to implement.

The tool, oak-hook-general-log, will hook up to a MySQL (>= 5.1) server, and stream the general log into standard output. It looks like this:

bash$ python src/oak/oak-hook-general-log.py --socket=/tmp/mysql.sock --user=root
2010-03-21 10:18:42     root[root] @ localhost []       79      1       Query   SELECT COUNT(*) FROM City
2010-03-21 10:18:48     root[root] @ localhost []       79      1       Query   DELETE FROM City WHERE id=1000
2010-03-21 10:18:54     root[root] @ localhost []       79      1       Query   SHOW PROCESSLIST
2010-03-21 10:19:06     root[root] @ localhost []       79      1       Quit
2010-03-21 10:19:07     root[root] @ localhost []       93      1       Connect root@localhost on
2010-03-21 10:19:07     root[root] @ localhost []       93      1       Query   select @@version_comment limit 1
2010-03-21 10:22:33     root[root] @ localhost []       93      1       Query   SELECT City.Name, Country.Name FROM Country JOIN City ON Country.Capit
2010-03-21 10:22:58     root[root] @ localhost []       93      1       Quit

Since output is written to stdout, one can further:

bash$ python src/oak/oak-hook-general-log.py --socket=/tmp/mysql.sock --user=root | grep Connect
bash$ python src/oak/oak-hook-general-log.py --socket=/tmp/mysql.sock --user=root | grep webuser@webhost

What the tool does is to enable table logs, and periodically rotate the mysql.general_log table, read and dump its content.

Continue reading » “oak-hook-general-log: streaming general log”

But I DO want MySQL to say “ERROR”!

MySQL is known for its willingness to accept invalid queries, data values. It can silently commit your transaction, truncate your data.

  • Using GROUP_CONCAT with a small group_concat_max_len setting? Your result will be silently truncated (make sure to check the warnings though).
  • Calling CREATE TEMPORARY TABLE? You get silent commit.
  • Issuing a ROLLBACK on non-transactional involved engines? Have a warning; no error.
  • Using LOCK IN SHARE MODE on non transactional tables? Not a problem. Nothing reported.
  • Adding a FOREIGN KEY on a MyISAM table? Good for you; no action actually taken.
  • Inserting 300 to a TINYINT column in a relaxed sql_mode? Give me 255, I’ll silently drop the remaining 45. I owe you.

Warnings and errors

It would be nice to: Continue reading » “But I DO want MySQL to say “ERROR”!”