Note: the method described here is an anti-pattern
This is the fourth in a series of posts reviewing methods for MySQL master discovery: the means by which an application connects to the master of a replication tree. Moreover, the means by which, upon master failover, it identifies and connects to the newly promoted master.
These posts are not concerned with the manner by which replication failure detection and recovery take place. I will share orchestrator-specific configuration/advice, and point out where a cross-DC orchestrator/raft setup plays a part in discovery itself, but for the most part any recovery tool, such as MHA, replication-manager, severalnines or other, is applicable.
We discuss asynchronous (or semi-synchronous) replication, a classic single-master-multiple-replicas setup. A later post will briefly discuss synchronous replication (Galera/XtraDB Cluster/InnoDB Cluster).
Master discovery via Proxy Heuristics
In Proxy Heuristics all clients connect to the master through a proxy. The proxy observes the backend MySQL servers and determines who the master is.
This setup is simple and easy, but it is an anti-pattern. I recommend against using this method, as explained shortly.
Clients are all configured to connect to, say, `cluster1-writer.proxy.example.net:3306`. The proxy intercepts incoming requests either by hostname or by port. It is aware of all (or some of) the MySQL backend servers in that cluster, and routes traffic to the master M.
A simple heuristic I've seen in use: pick the server that reports `read_only=0`.
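For example, ProxySQL can implement exactly this heuristic via its replication hostgroups. The following is a minimal sketch, run against ProxySQL's admin interface; it assumes ProxySQL 2.x, and the hostnames and hostgroup numbers are made up for illustration:

```sql
-- Register the cluster's backends (hostgroup 10 = writer, 20 = readers; numbers are arbitrary).
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES
  (10, 'mysql-cluster1-m.example.net', 3306),
  (20, 'mysql-cluster1-r.example.net', 3306);

-- Have ProxySQL periodically check @@global.read_only on these backends and move
-- whichever server reports read_only=0 into the writer hostgroup.
INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, check_type, comment)
VALUES (10, 20, 'read_only', 'cluster1');

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```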
Let’s take a look at how this works and what can go wrong.
An unplanned failover: illustration #1
Master M has died; the box had a power failure. R gets promoted in its place. Our recovery tool:

- Fails over, but doesn't need to run any hooks.
The proxy:
- Knows about both M and R.
- Notices M fails health checks (`select @@global.read_only` returns an error since the box is down).
- Notices R reports healthy and with `read_only=0`.
- Routes all traffic to R.
Success, we’re happy.
Configuration tip
With an automated failover solution, use `read_only=1` in `my.cnf` at all times. Only the failover solution will set a server to `read_only=0`.

With this configuration, when M restarts, MySQL comes up with `read_only=1`.
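A minimal sketch of the idea (the `SET` statement is what a failover tool would issue; the exact tooling varies):

```sql
-- my.cnf on every server carries read_only = 1, so any restarted server,
-- including a recovered ex-master, comes back read-only.
-- The failover tool is the only actor that ever clears the flag, and only on
-- the newly promoted master:
SET GLOBAL read_only = 0;
```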
An unplanned failover: illustration #2
Master M gets network isolated for 10 seconds, during which time we fail over. R gets promoted. Our tool:

- Fails over, but doesn't need to run any hooks.
The proxy:
- Knows about both M and R.
- Notices M fails health checks (`select @@global.read_only` errors out since the box is network isolated).
- Notices R reports healthy and with `read_only=0`.
- Routes all traffic to R.
- 10 seconds later M comes back to life, claiming `read_only=0`.
- The proxy now sees two servers reporting as healthy and with `read_only=0`.
- The proxy has no context. It does not know why both report the same. It is unaware of failovers. All it sees is what the backend MySQL servers report.
Therein lies the problem: you cannot trust multiple servers (the MySQL backends) to deterministically pick a leader (the master) without them collaborating on some elaborate consensus communication.
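To make the ambiguity concrete, here is what the proxy's health check might see at this point, sketched as the raw queries against the two hypothetical backends:

```sql
-- On R, the promoted master:
SELECT @@global.read_only;
-- returns 0

-- On M, just back from its 10-second network isolation and never demoted:
SELECT @@global.read_only;
-- also returns 0: two backends now look like perfectly valid masters
```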
An unplanned failover: illustration #3
Master M's box is overloaded, rejecting incoming connections with "too many connections" errors. Our tool decides to fail over:

- And doesn't need to run any hooks.
The proxy:
- Notices M fails health checks (`select @@global.read_only` does not respond because of the load; see the sketch after this list).
- Notices R reports healthy and with `read_only=0`.
- Routes all traffic to R.
- Shortly afterwards M recovers (since no more writes are sent its way), claiming `read_only=0`.
- The proxy now sees two servers reporting as healthy and with `read_only=0`.
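The sketch referenced above: while M is saturated, the health check may not even get a connection, let alone an answer. The error text is illustrative of MySQL's behavior when `max_connections` is exhausted:

```sql
-- Health check against M while it is saturated: the monitoring connection
-- itself may be refused before the query ever runs.
SELECT @@global.read_only;
-- ERROR 1040 (HY000): Too many connections

-- The same check against R succeeds:
SELECT @@global.read_only;
-- returns 0
```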
Again, the proxy has no context, and neither do M and R, for that matter. The context (the fact we failed over from M to R) was known to our failover tool, but was lost along the way.
Planned failover illustration
We wish to replace the master for maintenance reasons. We successfully and gracefully promote R.

- M is available and responsive; we set it to `read_only=1`.
- We set R to `read_only=0`.
- All new connections route to R.
- We should also instruct our proxy to kill all previous connections to M.
This works very nicely.
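As a sketch, and leaving the proxy-side connection kill to whatever mechanism your proxy offers, the server-side part of this graceful promotion boils down to two statements issued by the operator or failover tool:

```sql
-- On M, the master being demoted: stop accepting writes.
SET GLOBAL read_only = 1;

-- On R, the newly promoted master: start accepting writes.
SET GLOBAL read_only = 0;
```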
Discussion
There is a substantial risk to this method: failovers tend to correlate with network partitioning or load problems (illustrations #2 and #3), so those scenarios are likely, not contrived.
The root of the problem is that we expect individual servers to resolve conflicts without speaking to each other: we expect the MySQL servers to correctly claim “I’m the master” without context.
We then add to that problem by using the proxy to “pick a side” without giving it any context, either.
Sample orchestrator configuration
By way of discouraging use of this method, I do not present an orchestrator configuration file.
All posts in this series
- MySQL master discovery methods, part 1: DNS
- MySQL master discovery methods, part 2: VIP & DNS
- MySQL master discovery methods, part 3: app & service discovery
- MySQL master discovery methods, part 4: Proxy heuristics
- MySQL master discovery methods, part 5: Service discovery & Proxy
- MySQL master discovery methods, part 6: other methods