Quick reminder: avoid using binlog-do-db

Nothing new about this warning; but it’s worth repeating:

Using binlog-do-db is dangerous to your replication. It means the master will not write to binary logs any statement not in the given database.

Ahem. Not exactly. It will not write to binary logs any statement which did not originate from the given database.

Which is why a customer, who was using Toad for MySQL as client interface to MySQL, and by default connected to the mysql schema, did not see his queries being replicated. In fact, he later on got replication errors. If you do:

USE test;
INSERT INTO world.City VALUES (...)

Then the statement is assumed to be in the test database, not in the world database.

Slightly better is using replicate-do-db on the slave machines. At least we allow the master to write everything. But still, for the same reasons, slaves may fail to repeat a perfectly valid query, just because it has been issued in the context of the wrong database. replicate-ignore-db is somewhat safer yet, but the trap is still there.

My advice is that replication should replicate everything. Make sure you and everyone else you work with understand the implications of binlog-do-db and replicate-do-db before implementing it.

6 thoughts on “Quick reminder: avoid using binlog-do-db

  1. Shlomi — you’re exactly right, except I wouldn’t say “never use binlog-do-db”. I would say “learn what binlog-do-db does, and know that you lose the ability to restore from backups when you do that, etc”.

    (I do agree that the general case is “binlog everything, because it’s not just for replication, it’s also for incremental backups, and can be useful when trying to figure out if a change was already made”. I just hate saying “always use foo” and “never use bar”….)

    If people really want to replicate only statements that affect the foo database, they can use:

    replicate_wild_do_table=foo.%

    If they want to ignore more than one database, the correct syntax is:
    replicate_wild_do_table=foo.%
    replicate_wild_do_table=bar.%

  2. Except, of course, if you are using row-based binary logging, in which case the filter is on the database of the actual affected table(s). It’s a fun difference in behavior that can make MIXED mode problematic.

  3. I would agree that keeping your replication slaves consistent with the master in smaller environments is significantly simpler then finding weird issues like this.

    My comment would also be just how much overhead does the filtering do, either on the master and the slave. Reducing I/O may be a factor, but under general use by 99% of customers this is not an issue.

    If you are happy to incur the cost of writing to the binary log and replaying the binary log on the slave, but you don’t want the data on the slave, another trick is to change the slave DB objects to the BLACKHOLE storage engine. You can no benefit in all the operations being performed, you simply win on the overall storage in the DB on the slave.
    NOTE of course, you can not actually use this slave for failover, or for reads, however you could never do that when ignoring replication anyway.

  4. @Sheeri,
    Well put. Though I did not say “never” or “always” anywhere in my post 🙂
    I did say “avoid”, and I take note of your point.

    @Todd,
    Thank you! Yes, I did only relate to statement based replication. Thanks for pointing it out.

    @Ronald,
    BLACKHOLE is indeed an option; and I’ve been using it here and there.
    I’m always concerned, though, that someone should do an ALTER TABLE ENGINE=… on the master in order to rebuild it or something, without setting SQL_LOG_BIN to 0.

  5. It is true that the current database issue (that I also mentioned in http://mysqlmusings.blogspot.com/2006/09/replication-and-disappearing.html) is something that has to be remembered, but even more important is understanding that if the statements are not written to the binary log, it is not possible to recover the database in the event of a crash. This means that statements filtered out from the binary log using binlog-* options should *only* be statements that can be lost without risking the integrity of the database.

    For this reason, filtering using replicate-* options is preferable.

    Row-based replication is a better way to actually filter out the real changes, but as Todd pointed out, the behavior is “interesting” in MIXED mode. We are working on trying to figure out a way to handle that safely.

    For more advanced filtering, we are considering turning the pre-alpha http://forge.mysql.com/wiki/ReplicationFeatures/ScriptableReplication into a solid implementation, which will allow quite advanced replication setups indeed.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.