This post continues my impressions from some of the talks I've been to. I'll dedicate this post to a single session.
- BoF: Replication, Lars Thalmann, Mat Keep (Oracle)
Lars Thallman presented the MySQL 5.5 replication features, as well as the expected 5.6 features. Among other features, one could notice parallel replication, binlog checksums, sub-second resolution and more. There was an open discussion about these features, asking for comments; looking for new ideas and suggestion from the audience.
I can't possibly cover it all. I'll note two discussion I participated in, and which have interested me. This also serves for noting down to myself my ideas and thoughts.
As the trend goes, parallel replication will be implemented by having a thread per schema. My own contribution to the discussion is that with such paradigm, the slave may be at any state inconsistent with the master. That is, at any point in time it can be in such state that has never existed in any point in time on the master. This is because it is possible that one or more of the replication threads executes much faster than one or more of the others.
We have discussed the meaning of STOP SLAVE. I have suggested that STOP SLAVE works in such way that it stops in a consistent slave. Lars suggested there could be normal STOP SLAVE and STOP SLAVE CONSISTENT.
It has been noted that we may not care about the slave being inconsistent, as the prerequisite for parallel replication is that the distinct schemata are independent of one another. After some thought, I disagree. Today parallel replication may be implemented on different schemata; tomorrow - based on finer parameters. I think it's best to prepare for the inevitable future. Besides, STOP SLAVE INCONSISTENT is such an easy feature to implement!
I also had a long time feature request. While writing management tools, I realized it is difficult to learn the topology of a replication environment by simple interrogation. If you connect to a master, you can either issue SHOW PROCESSLIST to detect the slaves, or SHOW SLAVE HOSTS.
The former does not tell you the port on which the slaves are listening. The latter requires that you configure report_host and report_port properly.
What I would like to see is a SHOW SLAVES STATUS command (or similar INFORMATION_SCHEMA table) on the master. Such that it lists connected slaves, one per row, with the following details:
- Slave_host (this is easy, already exists in SHOW PROCESSLIST)
- Slave_port (slave should send this kind of information to master upon first connect)
- Slave_master_log_file (last log file requested by slave)
- Slave_master_log_pos (last log pos requested by slave)
- Slave_seconds_behind_master (slave should report this upon each request)
- Slave_server_id (slave should send this upon connect)
By reading Slave_host, Slave_port & Slave_server_id, it makes it easy for us to build replication topology. It's a single operation to detect a master's slaves. It takes recursive steps to build entire topology.
By reading the Slave_master_log_file, Slave_master_log_pos, we can easily determine that binary logs up to that point have already been written to the slave's relay log. This makes it easier to decide how to purge master logs. Slave_seconds_behind_master completes the replication picture.