I'm setting up HAProxy as a load balancer and failover on a master/master + slave replication setup. I have two xinetd bash scripts listening on ports 9200 and 9201. The one on port 9200 checks the master status and the one on port 9201 checks the slave status and how far behind the master it is.
My HA Proxy config file looks like:
global
log 127.0.0.1 local0 notice
defaults
log global
retries 2
timeout connect 10000
timeout server 28800000
timeout client 28800000
# writes and critical reads go here
# critical reads are the ones where we can't afford any latency at all
listen mariadb-writes
bind 0.0.0.0:3307
mode tcp
option allbackups
option httpchk
balance roundrobin
# 9200 checks the master status
server mariadb1 1.1.1.1:3306 check port 9200 # master1
server mariadb2 2.2.2.2:3306 check port 9200 backup # master2
# heavy reads that can afford some latency
listen mariadb-reads
bind 0.0.0.0:3308
mode tcp
option allbackups
option httpchk
balance roundrobin
# 9201 checks the slave status and seconds behind master
server mariadb1 1.1.1.1:3306 check port 9201
server mariadb2 2.2.2.2:3306 check port 9201
server mariadb3 3.3.3.3:3306 check port 9201
# 9200 on the backups checks the master status
server mariadb1b 1.1.1.1:3306 check port 9200 backup
server mariadb2b 2.2.2.2:3306 check port 9200 backup
The reason I use two scripts is that it's the only way I found to solve a broken-replication problem, but it also creates a new issue. I split the checks into two scripts because checking the slave status on my master-master replication could deactivate one of the masters if the other goes down, since that would break replication. So instead of checking the slave status on my masters, I just write to one of the nodes and keep writing to it as long as it's up. If for some reason my master goes down, the backup master will handle the requests.
The problem I see with that is that if master1 goes down, master2 will receive the writes, and depending on how long master1 stays down, its replication will be far behind when it comes back up; activating it immediately will cause serious data consistency problems until replication has caught up.
I'm thinking of doing two checks in the 9200 master script: first check the slave status and, if it's up, check how many seconds it's behind; but if the slave is down, check the master status instead. In other words, do not return a 503 just because the slave thread is broken, since that can simply mean the second master went down and broke replication. But this has some flaws as well: when master1 comes back up, replication will be broken until MariaDB reconnects to master2, so during this time writes can't be directed to that node. I could configure HAProxy to wait several seconds before activating a node that was down, but that does not seem like the proper solution to me.
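Roughly, the combined check I have in mind for the 9200 script would be something like this (just a sketch; the login credentials, the lag threshold and the HTTP response texts are placeholders for whatever the real xinetd service uses):

#!/bin/bash
# Combined 9200 check: report the node healthy if its slave thread is running
# and not too far behind; otherwise fall back to a plain "is MySQL up" check.
# Placeholder credentials and threshold - adjust to the real environment.
MYSQL="mysql -u haproxy_check -pSECRET -h 127.0.0.1 --batch"
MAX_LAG=30

reply() {
  # xinetd sends whatever we print straight back to HAProxy's httpchk
  echo -en "HTTP/1.1 $1\r\nContent-Type: text/plain\r\nConnection: close\r\n\r\n$2\r\n"
}

SLAVE_STATUS=$($MYSQL -e "SHOW SLAVE STATUS\G" 2>/dev/null)

if echo "$SLAVE_STATUS" | grep -q "Slave_IO_Running: Yes" && \
   echo "$SLAVE_STATUS" | grep -q "Slave_SQL_Running: Yes"; then
  LAG=$(echo "$SLAVE_STATUS" | awk '/Seconds_Behind_Master/ {print $2}')
  if [ "$LAG" != "NULL" ] && [ "$LAG" -le "$MAX_LAG" ]; then
    reply "200 OK" "replication running, ${LAG}s behind"
  else
    reply "503 Service Unavailable" "replication is ${LAG}s behind"
  fi
else
  # Slave thread broken or not configured: only check that MySQL itself answers,
  # so this master is not pulled out just because the other master went down.
  if $MYSQL -e "SELECT 1" >/dev/null 2>&1; then
    reply "200 OK" "master is up (slave thread not running)"
  else
    reply "503 Service Unavailable" "node is down"
  fi
fi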
Basically I'm trying to figure out how to manage the connections if master1 comes back up and HAProxy forwards requests to it while it's still catching up on the downtime, replicating the data from master2. Does anyone know a better approach for this issue?
(This answer does not address the Monitoring question you pose; instead it fast-forwards to the next step - fixing replication slowdowns.)
Do you have multi-threaded replication? If so, what parameters have you set? (Too high may be as bad as too low.)
Do you have the slowlog turned on in the Slave? With a low value for long_query_time and log_slow_slave_statements = ON?
What are the slowest queries? Let's see them, plus SHOW CREATE TABLE and EXPLAIN SELECT ....
That is, speeding up either the SELECTs on the Slave, or the writes replicated to the Slave, may "eliminate" the problem.
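For reference, the settings I'm asking about would look something like this in my.cnf on the Slave (the values are only examples, not recommendations; slave_parallel_threads is the MariaDB 10.0 multi-threaded replication knob):

[mysqld]
# slow log, to see where the Slave spends its time
slow_query_log = ON
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 1
log_slow_slave_statements = ON
# multi-threaded replication (0 = off)
slave_parallel_threads = 4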
Related
I have configured MySQL Master-Slave replication and have automation to perform the failover when the Master goes down; it also takes care of failing back to the Master when it comes back online.
But I am trying to configure HAProxy to always send the requests to the Master, to forward the requests to the Slave server within a few minutes when the Master goes down, and to forward all the requests back to the Master when it comes online again.
Is there any balance config in HAProxy which does this?
I managed to make it work. Here is what I have used in my haproxy config.
listen sql_cluster 10.0.0.4:3307
mode tcp
balance roundrobin
option mysql-check user haproxy_check
server masterdb 10.0.0.5:3306 check inter 30s fall 3 rise 100
server slavedb 10.0.0.6:3306 check backup
So with this config, given the 30s check interval and rise 100, the Master has to pass 100 consecutive checks (roughly 50 minutes) before HAProxy falls the connections back to it. In the meantime, the automation I developed takes care of syncing the Master DB with the Slave once it comes back online, and restarting haproxy then forwards the connections to the Master again.
I currently have haproxy configured to load balance and failover a set of mysql servers. I have a backend configured for write transactions like so:
backend pool_mysql_write
timeout connect 10s
timeout server 1m
mode tcp
option mysql-check user haproxy_check
server primary <primary_node>:3306 check fastinter 1000
server secondary <secondary_node>:3306 check fastinter 1000 backup
The "backup" directive is there so that all writes go to the primary DB node only and will failover to secondary only if primary goes down.
EDIT: The DB nodes are in master-master replication mode.
My question is, what happens to in-flight mysql write queries while haproxy is in the process of failing over to the secondary DB node? For example, if it takes 5 seconds for haproxy to promote the secondary DB node for writing, what happens to all of the write queries that may have been trying to write to the DB within the 5 seconds of failover time?
Do they just disappear? Do they get queued up somewhere so that they could be committed once the secondary DB node gets promoted?
When a new TCP connection to haproxy is initiated (by a client), it in turn opens a new TCP connection to the upstream server. Once that upstream TCP connection is established, haproxy just connects those TCP connections together, relaying communications back and forth.
If either of those TCP connections (i.e. the one it has to the client, or the one it has to the upstream server) drop, haproxy simply drops the other one—in other words, haproxy won't handover existing TCP connections to alternative servers (it just redirects new connections). The counterparty is responsible for deciding what to do next. (For a smarter approach, you'd need a layer-7 proxy like MariaDB MaxScale, which can reroute existing connections).
Typically, if its connection was dropped unexpectedly, the client will attempt to reconnect (whereupon haproxy could end up connecting it with a different upstream server, e.g. because the original one is no longer available).
The question is, what if the client had sent commands to the original server that would have mutated its state?
If the client had received a commit acknowledgement prior to the connection dropping, then the client will understand its writes to have been committed. You must therefore be certain that primary does not acknowledge commits until writes have been replicated to secondary—hence why I asked above how you are performing that replication: at the very least, you want Semisynchronous Replication:
MySQL replication by default is asynchronous. The master writes events to its binary log but does not know whether or when a slave has retrieved and processed them. With asynchronous replication, if the master crashes, transactions that it has committed might not have been transmitted to any slave. Consequently, failover from master to slave in this case may result in failover to a server that is missing transactions relative to the master.
Semisynchronous replication can be used as an alternative to asynchronous replication:
[ deletia ]
While the master is blocking (waiting for acknowledgment from a slave), it does not return to the session that performed the transaction. When the block ends, the master returns to the session, which then can proceed to execute other statements. At this point, the transaction has committed on the master side, and receipt of its events has been acknowledged by at least one slave.
If the client had not received commit acknowledgement prior to the connection dropping, then the client should assume that the writes were not committed and handle accordingly—e.g. reattempt the transaction once a new connection is established.
However, it is possible that primary had in fact committed the writes (and indeed replicated them to secondary) but then failed just before it sent the commit acknowledgement, or it sent the commit acknowledgement but the connection failed before it was received by the client. I don't think there's much one can do about this (comments please?), but the risk is extremely small (since no processing occurs between commits being completed and acknowledgements being sent).
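To make that concrete, a rough sketch of enabling semisynchronous replication with the stock MySQL semisync plugins (assuming Linux plugin file names; the credentials and timeout value are placeholders):

# On primary:
mysql -u root -p -e "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000;"   # ms to wait for an ack before falling back to async

# On secondary:
mysql -u root -p -e "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;"      # restart the I/O thread so it registers as semi-sync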
I have remote host (centos6) with mariadb (10.0.17-MariaDB-log - MariaDB Server) as master:
server-id = 1
log-bin=mysql-bin
binlog_do_db = mydatabase
and local (win8.1) with mariadb (10.0.16-MariaDB-log - mariadb.org binary distribution) as a slave:
server-id = 2
As the initial procedure, I dumped the database on the remote host, imported it on the local host, then executed SHOW MASTER STATUS, got the filename and offset, and ran:
CHANGE MASTER TO MASTER_HOST='$host', MASTER_USER='$user', MASTER_PORT = $port, MASTER_PASSWORD='$pass', MASTER_LOG_FILE='$fname', MASTER_LOG_POS=$pos
STOP SLAVE
START SLAVE
Replication starts. Everything I do with table on master is reflected to slave.
But if the slave goes down, then after it comes back up, no changes made on the master while the slave was offline are reflected to the slave! So it looks like my slave should always be online, but it's a laptop!
However, after the slave is up, realtime replication still works - it doesn't pick up the changes made while it was offline, but if I change the database on the master while the slave is online and started, all changes are perfectly reflected to the slave. Of course I know that replication is statement based, so I get instructions rather than a data diff. But I thought the master remembers what it sent and what it did not, so on the next operation it would just send all non-delivered changes. Am I wrong?
My replication scenario: master server interacts with clients (mobile devices) and they change the database. From time to time I launch my laptop, start replication, get updated database and do some heavy analysis (it's too hard for my 2-core cheap server).
Maybe there is a better method? Is there a way to get "offline changes" like in ICQ messenger? :)
For now I can see only one solution - full db dump, but it is inconvenient, takes too much time and loads master heavily.
While the Slave is not connected to the Master, the Master is writing to its binlog(s). The Slave has remembered where it left off in reading from those binlogs. When the Slave reconnects, it picks up "where it left off", copies the changes from the Master's binlog(s) to the Slave's relay-log(s) and performs them. This "catchup" process will take a little time; how long depends on a lot of factors.
Do SHOW SLAVE STATUS; on the Slave to verify that it is connected and running ("Yes").
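For example, from the command line on the Slave (credentials are placeholders):

mysql -u root -p -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'

Both _Running values should say Yes, and Seconds_Behind_Master shows how much catch-up is left.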
Normally, the version of the Slave should be no older than that of the Master. (I doubt if there is any issue between 10.0.17 vs 16. Nothing in the changelog for 10.0.17 jumps out at me.)
Are you using "parallel replication"? See bug fixed in 10.0.18.
I'm running MySQL as the database on Ubuntu instances. I'm using MySQL Master-Slave replication, where the master's changes will be written to the slave and the slave's changes will not be reflected on the master. That's fine. I'm using an HAProxy load balancer to front the MySQL instances, where all the requests will be sent to the master MySQL instance. If the master MySQL instance is down, the slave MySQL instance will act as master and HAProxy will send all the requests to the slave. Active-Passive scenario.
HAProxy - 192.168.A.ABC
MySQL Master - 192.168.A.ABD
MySQL Slave - 192.168.A.ABE
Let's assume that the MySQL master (192.168.A.ABD) is down. Now HAProxy will send all the requests to the MySQL slave (192.168.A.ABE), which acts as the master MySQL server for the time being.
My questions are:
What happens when original master MySQL instance(192.168.A.ABD) is up?
Will changes written to new MySQL master (192.168.A.ABE) be replicated to original master(192.168.A.ABD) again?
How should I address this scenario?
First of all, I should say that I have never used HAProxy, so I can't comment on that directly.
However, in your current setup the Master (ABD) will be out of sync and won't catch up. You will have to rebuild it using mysqldump or a similar tool.
What you would need is a Master <> Master setup (as opposed to Master > Slave), which enables you to write to either database and have it reflected in the other. This isn't quite as straightforward as it sounds, though.
Assuming you already have your master > slave setup, and they are in sync
On the Master (ABD) you want to add:
auto_increment_increment=2
auto_increment_offset=1
log-slave-updates
On the Slave (ABE) add:
auto_increment_increment=2
auto_increment_offset=2
log-slave-updates
to your my.cnf files. Restart the Database. This will help to prevent Duplicate Key Errors. (n.b. that log-slave-updates isn't strictly required but makes it easier to add another slave in future)
Next you want to tell the Master (ABD) to replicate from the Slave (ABE).
Depending on what version of MySQL you are running and whether you are using GTIDs etc., the exact process differs slightly. But basically you are going to issue a CHANGE MASTER statement on the Master so it replicates from the slave.
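Without GTIDs, the gist of it is roughly the following; the replication user, log file and position are placeholders you would take from SHOW MASTER STATUS on the Slave (ABE):

# On the Slave (ABE): note the current binlog File and Position
mysql -u root -p -e "SHOW MASTER STATUS;"

# On the Master (ABD): point it at ABE using those values, then start its slave thread
mysql -u root -p -e "CHANGE MASTER TO
  MASTER_HOST='192.168.A.ABE',
  MASTER_USER='repl',
  MASTER_PASSWORD='repl_password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4;
START SLAVE;"

# Verify both replication threads are running
mysql -u root -p -e "SHOW SLAVE STATUS\G"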
And away you go. You probably want to avoid writing to both at the same time as that opens up a whole other kettle of fish. But if the Master goes down, you can switch your writes to the slave, and when the master comes back up, it will simply start replicating the missing data.
I am considering your scenario:
Master - 192.168.A.ABD
Slave - 192.168.A.ABE
You cannot directly add the old master back into the system. To add the master back you need to perform the steps below:
1) When the old master is up again, add it as a slave. So now you have:
Master - 192.168.A.ABE
Slave - 192.168.A.ABD
2) Once it has caught up, you can put the current master down, i.e. take 192.168.A.ABE down.
3) Then add that one as a slave in turn. After this you will be back to the original scenario:
Master - 192.168.A.ABD
Slave - 192.168.A.ABE
You can refer to this link:
https://dev.mysql.com/doc/refman/5.5/en/replication-solutions-switch.html
I'm using MySQL version 5.22 for master and slave replication. When I execute the SHOW SLAVE STATUS command, it shows Slave_IO_State as Connecting. How do I solve this problem?
Please help me with the same.
Regards,
Yasar
From here:
Run a "show master status" on the master DB. It will give you the correct values to update your slave with. From your slave status, it looks like your slave has successfully connected to the master and is awaiting log events. To me, this means your slave user has been properly set up, and has the correct access. It really seems like you just need to sync the correct log file position. Careful, because to get a good sync, you should probably stop the master, dump the DB, record the master log file positions, then start the master, import the DB on the slave, and finally start the slave in slave mode using the correct master log file pos. I've done this about 30 times, and if you don't follow those steps almost exactly, you will get a bad sync.
Otherwise, go through this (How to set up replication) again to see if there is some config problem.
If the Slave I/O thread is showing a Connecting to Master status, check these things:
Verify the privileges for the user being used for replication on the master (see the example commands after this list).
Check that the host name of the master is correct and that you are using the correct port to connect to the master. The port used for replication is the same as used for client network communication (the default is 3306). For the host name, ensure that the name resolves to the correct IP address.
Check that networking has not been disabled on the master or slave. Look for the skip-networking option in the configuration file. If present, comment it out or remove it.
If the master has a firewall or IP filtering configuration, ensure that the network port being used for MySQL is not being filtered.
Check that you can reach the master host by using ping or traceroute/tracert.
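For the first two items, something like this run from the slave host will confirm both reachability and the credentials (host, port and user are placeholders):

# From the slave, try logging in to the master with the replication credentials
mysql -h 192.168.1.10 -P 3306 -u repl -p -e "SELECT 1;"

# On the master, check that the replication user has the REPLICATION SLAVE privilege
mysql -u root -p -e "SHOW GRANTS FOR 'repl'@'%';"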