MySQL slave out of sync after crash

We have a "1 master, 1 slave" MySQL setup. We had a sudden power outage that took down the slave. After getting the machine back up, I found that the slave was out of sync with the master:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.1
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-log.001576
Read_Master_Log_Pos: 412565824
Relay_Log_File: mysqld-relay-bin.002671
Relay_Log_Pos: 6930
Relay_Master_Log_File: mysql-log.001573
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table: blah.table2
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Could not execute Update_rows event on table blah.info; Can't find record in 'info', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-log.001573, end_log_pos 689031225
Skip_Counter: 0
Exec_Master_Log_Pos: 689030864
Relay_Log_Space: 2944772417
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table blah.info; Can't find record in 'info', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-log.001573, end_log_pos 689031225
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
We're using a binlog format of "ROW", so when I try to use mysqlbinlog to look at the offending row, I don't see anything of use. I don't want to simply set the skip counter, because I think that would throw my table even further out of sync.
Is there anything I can do on the slave that would essentially "roll back" to a given point in time, where I could then reset the master log number, position, etc.? If not, is there anything at all that I can do to get back in sync?

One can usually recover from small discrepancies using pt-table-checksum and pt-table-sync.
It looks to me like your slave lost its place in the binary log sequence when it crashed. The slave continually writes its last processed binlog event into datadir/relay-log.info, but this file uses buffered writes, so it is susceptible to losing data in a crash.
That's why Percona Server created a crash-resistant replication feature to store the same replica info in an InnoDB table, to recover from this scenario.
MySQL 5.6 has implemented a similar feature: you can set relay_log_info_repository=TABLE so the replica saves its state in a crash-resistant way.
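As a rough illustration of checking for that setting, here is a minimal Python sketch that reads my.cnf-style text and reports the relay_log_info_repository value under [mysqld]. The function name is hypothetical, and it assumes the file is simple enough for Python's configparser (for example, no !include directives before the first section).

```python
import configparser


def relay_info_repository(cnf_text):
    """Return the relay_log_info_repository value under [mysqld], or None."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare flags such as "log-bin"
        strict=False,          # tolerate repeated options
        interpolation=None,    # "%" has no special meaning in my.cnf
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return None
    for key, value in parser.items("mysqld"):
        # MySQL accepts dashes and underscores interchangeably in option names.
        if key.replace("-", "_") == "relay_log_info_repository":
            return value
    return None


sample_cnf = """
[mysqld]
server-id = 2
relay_log_info_repository = TABLE
"""
print(relay_info_repository(sample_cnf))
```

A result of None means the option is not set in the file, in which case the server falls back to its built-in default for that version.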
Re your comment:
Yes, in theory pt-table-sync can fix any amount of replication drift, but it's not necessarily the most efficient way to correct large discrepancies. At some point, it's quicker and more efficient to trash the outdated replica and reinitialize it using a new backup from the master.
Check out "How to setup a slave for replication in 6 simple steps with Percona Xtrabackup".

Issues with MySQL replication on MariaDB

I have been trying to get MySQL replication set up on DigitalOcean with Forge servers and MariaDB.
I keep getting this error when running SHOW SLAVE STATUS\G:
Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it).
This is the tutorial I followed:
https://www.digitalocean.com/community/tutorials/how-to-set-up-master-slave-replication-in-mysql
I've checked the server-id in both my.conf files and the master is set to 1 and the slave 2.
Here's a dump of the full status\g output
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: *****
Master_User: slave_user
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000017
Read_Master_Log_Pos: 642
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 4
Relay_Master_Log_File: mariadb-bin.000017
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 1
Exec_Master_Log_Pos: 642
Relay_Log_Space: 249
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1593
Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it).
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: conservative
Can anyone help?
Check that the config file is being used. It is probably /etc/my.cnf (not my.conf).
Run SHOW VARIABLES LIKE 'server_id'; on both servers.
Check that server_id is in the [mysqld] section of my.cnf.
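The last check can be sketched in Python. This is a minimal sketch, not a definitive tool: the function name is hypothetical, and it assumes the config file parses with Python's configparser. It reports which sections define server-id, so a value placed under the wrong section (which mysqld would ignore) stands out.

```python
import configparser


def sections_defining_server_id(cnf_text):
    """Map each section name that sets server-id/server_id to its value."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # bare flags are legal in my.cnf
        strict=False,          # tolerate repeated options
        interpolation=None,    # treat "%" literally
    )
    parser.read_string(cnf_text)
    found = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            # Dashes and underscores are interchangeable in option names.
            if key.replace("-", "_") == "server_id":
                found[section] = value
    return found


misplaced = """
[client]
server-id = 2

[mysqld]
port = 3306
"""
print(sections_defining_server_id(misplaced))
```

If "mysqld" is missing from the result, the server is not reading a server-id from this file, whatever the file appears to say.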

MySQL replication: failover scenario

I have the following MySQL instances, along with a replication setup:
S1 -----> (M1 <--> M2), where:
M1 - M2 is a multi-master replication setup,
S1 - a slave which replicates the writes which are done at Master M1.
Now, I'm trying to enhance the setup with a channel failover mechanism, where S1 would start replicating from M2, should M1 go down. Currently, the only way of doing this that I see is:
(M1 failure detection mechanism on S1 machine), then:
-> S1 gets the latest timestamp of M1's queries from the local relay log file.
-> M2 searches (bash script using mysqlbinlog utility) for the local binlog file + binlog index which corresponds to S1's latest timestamp
-> S1 can finally do a "STOP SLAVE", "CHANGE MASTER TO master_host=M2... master_log_file=... master_log_pos=...", etc. command to continue replication, but from M2 this time
Is there a better (and less error prone) way of doing this?
Thank you
EDIT: Nowadays, this is much easier to achieve thanks to the unique Xid binlog query tags commonly used by the publicly accessible MySQL clustering solutions.
There is a simpler way to retrieve the binlog and position needed.
Would it make more sense to just use the current binlog and position as M2 knows it? You need to check the slave status on M2.
Example
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.64.51.130
Master_User: replicant
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000463
Read_Master_Log_Pos: 453865699
Relay_Log_File: relay-bin.001226
Relay_Log_Pos: 453865845
Relay_Master_Log_File: mysql-bin.000463
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: search_cache
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 453865699
Relay_Log_Space: 453866038
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 106451130
1 row in set (0.00 sec)
mysql>
For this display, there are five crucial components:
Master_Log_File (Line 6) : Log File on the Master whose position was last read
Read_Master_Log_Pos (Line 7) : Last position read on the Slave from the Master
Relay_Master_Log_File (Line 10) : Log File on the Master whose position was last executed
Exec_Master_Log_Pos (Line 22) : Last position executed on the Slave from the Master
Relay_Log_Space (Line 23) : Sum of bytes from all relay logs
Relay_Log_Space
Please note Relay_Log_Space. Once this number stops incrementing, every SQL statement imported from the Master has been read. Unfortunately, it is possible that the last relay log may be corrupt or simply incomplete because of a sudden failover.
Replication Coordinates
Please also note the Replication Coordinates (Relay_Master_Log_File, Exec_Master_Log_Pos). This is the position you are hunting for. However, like Relay_Log_Space, it may still be incrementing. Once the slave has executed everything it has read, those Replication Coordinates equal the other Replication Coordinates (Master_Log_File, Read_Master_Log_Pos). That's when you know everything is caught up. If the two pairs of Replication Coordinates never meet, then you should rely a little more on Relay_Log_Space and when it stops incrementing.
What about Seconds_Behind_Master?
The reason you cannot use Seconds_Behind_Master is simple. Once a Master goes down hard, all it takes is for just one Replication thread (Slave_IO_Running or Slave_SQL_Running) to become No, and Seconds_Behind_Master turns NULL.
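The comparison of the two pairs of Replication Coordinates described above can be sketched in Python. This is only an illustration: both function names are hypothetical, and it assumes you have captured the vertical SHOW SLAVE STATUS output as text.

```python
def parse_slave_status(text):
    """Turn the 'Key: value' lines of vertical SHOW SLAVE STATUS output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip().startswith("*"):
            continue  # skip row separators and anything that is not "Key: value"
        fields[key.strip()] = value.strip()
    return fields


def sql_thread_caught_up(fields):
    """True when the executed coordinates equal the read coordinates."""
    read = (fields["Master_Log_File"], int(fields["Read_Master_Log_Pos"]))
    executed = (fields["Relay_Master_Log_File"], int(fields["Exec_Master_Log_Pos"]))
    return read == executed


# Values copied from the example display above.
sample = """
Master_Log_File: mysql-bin.000463
Read_Master_Log_Pos: 453865699
Relay_Master_Log_File: mysql-bin.000463
Exec_Master_Log_Pos: 453865699
"""
print(sql_thread_caught_up(parse_slave_status(sample)))
```

The check only says that the SQL thread has executed everything the I/O thread has read. It cannot tell you whether the I/O thread received everything the failed Master wrote.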

MySQL Replication fails with error "Could not parse relay log event entry."

I've searched Google thoroughly for a definitive solution or set of steps to resolve this issue, but there don't seem to be many high-quality results, and I haven't found the question on Stack Overflow. We're trying to set up MySQL replication using one slave. The slave appears to be replicating fine, and then the following error occurs:
Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
In order to benefit the large number of people who will inevitably stumble upon this question from a search, it would be helpful if someone who responds provided an overview of what could be going wrong and what steps to take to resolve this issue, but I will also provide more details below related to my particular situation in hopes that someone can help me solve it.
The dump that we imported into the slave to get it started was created using the following command on the master:
mysqldump --opt --allow-keywords -q -uroot -ppassword dbname > E:\Backups\dbname.sql
The script that performs this backup also logs the master's current binary log position. We then took the following steps to start replication on the slave:
1. STOP SLAVE;
2. DROP DATABASE dbname;
3. SOURCE dbname.sql;
(... waited a few hours for the 10gb dump to import)
4. RESET SLAVE;
5. CHANGE MASTER TO MASTER_HOST='[masterhostname]', MASTER_USER='[slaveusername]', MASTER_PASSWORD='[slaveuserpassword]', MASTER_PORT=[port], MASTER_LOG_FILE='[masterlogfile]', MASTER_LOG_POS=[masterlogposition];
6. START SLAVE;
After about a day of replication working fine, it failed again at 3:43 AM. The first thing that appeared in MySQL's error log was the error above. Then another generic error appeared after with the same timestamp:
Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log '[masterlogfile]' position [masterlogpos]
For more logging information, I had set up a batch script to run "SHOW SLAVE STATUS" and "SHOW FULL PROCESSLIST" every hour. Here are the results before and after the failure:
--Monitoring: 3:00:00.15
Slave Status:
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.xxx.xxx
Master_User: slave_user
Master_Port: xxxx
Connect_Retry: 60
Master_Log_File: mysql-bin.000xxx
Read_Master_Log_Pos: 316611912
Relay_Log_File: dbname-relay-bin.00000x
Relay_Log_Pos: 404287513
Relay_Master_Log_File: mysql-bin.000xxx
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB: dbname
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 316611912
Relay_Log_Space: 404287513
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
*************************** 1. row ***************************
Id: 98
User: system user
Host:
db: NULL
Command: Connect
Time: 60547
State: Waiting for master to send event
Info: NULL
*************************** 2. row ***************************
Id: 99
User: system user
Host:
db: NULL
Command: Connect
Time: 5
State: Has read all relay log; waiting for the slave I/O thread to update it
Info: NULL
*************************** 3. row ***************************
Id: 119
User: root
Host: localhost:xxxx
db: NULL
Command: Query
Time: 0
State: NULL
Info: SHOW FULL PROCESSLIST
--Monitoring: 4:00:02.71
Slave Status:
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.xxx.xxx
Master_User: slave_user
Master_Port: xxxx
Connect_Retry: 60
Master_Log_File: mysql-bin.000xxx
Read_Master_Log_Pos: 324365637
Relay_Log_File: dbname-relay-bin.00000x
Relay_Log_Pos: 410327741
Relay_Master_Log_File: mysql-bin.000xxx
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB: dbname
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Skip_Counter: 0
Exec_Master_Log_Pos: 322652140
Relay_Log_Space: 412041238
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
*************************** 1. row ***************************
Id: 98
User: system user
Host:
db: NULL
Command: Connect
Time: 64149
State: Waiting for master to send event
Info: NULL
*************************** 2. row ***************************
Id: 122
User: root
Host: localhost:3029
db: NULL
Command: Query
Time: 0
State: NULL
Info: SHOW FULL PROCESSLIST
I tried following the instructions from the error and ran mysqlbinlog on the slave's relay log with a start_position thousands of statements before, and stop_position thousands of statements after the point of failure, and redirected the output to a text file. I did not see any corruption errors on the command line or in the log file. This is what the log file said around the point of failure:
...
# at 410327570
#120816 3:43:26 server id 1 log_pos 322651969 Intvar
SET INSERT_ID=3842697;
# at 410327598
#120816 3:43:26 server id 1 log_pos 322651997 Query thread_id=762340 exec_time=0 error_code=0
SET TIMESTAMP=1345113806
insert into LOGTABLENAME (UpdateDate, Description) values (now(), "Invalid floating point operation");
# at 410327741
#120816 3:44:26 server id 1 log_pos 322754486 Intvar
SET INSERT_ID=3842701;
# at 410327769
#120816 3:43:26 server id 1 log_pos 322754514 Query thread_id=762340 exec_time=0 error_code=0
SET TIMESTAMP=1345113866;
insert into LOGTABLENAME (UpdateDate, Description) values (now(), "Invalid floating point operation");
# at 410327912
...
Interesting that it's logging an Invalid floating point operation at that point, but I'm not sure how that could cause replication to break at that position. I ran mysqlbinlog on the master's binary log found in SHOW SLAVE STATUS from above, and did not see any errors on the command line (but did not get a chance to open the 100mb log file that was generated since I didn't want to bog down the production server).
So right now I'm at a loss for what else to try. I'm basically just looking for any insights as to what might be going wrong or any suggestions for what steps to take next. Thanks!
I'm not sure what the root cause may be. But to recover from this situation, you'd want to instruct MySQL to clear out all the relay-bin-logs beyond the following point
Relay_Master_Log_File: mysql-bin.000xxx
Exec_Master_Log_Pos: 322652140
by doing the following:
STOP SLAVE; CHANGE MASTER TO MASTER_LOG_FILE = 'mysql-bin.000xxx', MASTER_LOG_POS = 322652140; START SLAVE;
NOTE: To readers out there, do not confuse Relay_Master_Log_File with Master_Log_File, and do not confuse Exec_Master_Log_Pos with Read_Master_Log_Pos. The Master_Log_File and Read_Master_Log_Pos values reflect a read-ahead strategy: the slave I/O thread downloads binary log events from the master ahead of the point the SQL thread has actually executed locally.
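To make the choice of fields concrete, here is a minimal Python sketch that builds the recovery statement from the executed coordinates and never from the read coordinates. The function name is hypothetical, and the file name mysql-bin.000123 is a made-up placeholder because the real name is elided in the question.

```python
def resume_from_executed_position(fields):
    """Build the recovery statement from the executed coordinates only."""
    log_file = fields["Relay_Master_Log_File"]
    log_pos = int(fields["Exec_Master_Log_Pos"])
    if "'" in log_file or "\\" in log_file:
        raise ValueError("unexpected character in binary log file name")
    return (
        "STOP SLAVE; "
        f"CHANGE MASTER TO MASTER_LOG_FILE = '{log_file}', MASTER_LOG_POS = {log_pos}; "
        "START SLAVE;"
    )


# Positions are from the failed status above; the file name is a placeholder.
status = {
    "Master_Log_File": "mysql-bin.000123",
    "Read_Master_Log_Pos": "324365637",
    "Relay_Master_Log_File": "mysql-bin.000123",
    "Exec_Master_Log_Pos": "322652140",
}
print(resume_from_executed_position(status))
```

Using Read_Master_Log_Pos here instead would skip every event between the two positions, so the sketch deliberately ignores it.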

Replicate tables from different database of same mysql server

I have one server with 2 databases, and I want to replicate several tables from one database to the other. The purpose is for both projects to use the same users table.
Because the other tables use InnoDB with foreign keys to the users table, I chose replication.
To do that, I made the following changes to my.cnf:
master-user=root
server-id = 2
replicate-rewrite-db = dou->jobs
replicate-do-table = jobs.auth\_user
replicate-wild-do-table = jobs.geo\_%
replicate-do-table = jobs.user\_profile
replicate-same-server-id = 1
report-host = master-is-slave
binlog-do-db = dou
log-bin
After syncing the tables from binlog-do-db and starting the slave, the following lines appear in error.log:
111112 15:10:22 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='localhost', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='localhost', master_port='3306', master_log_file='mysql-bin.000074', master_log_pos='106'.
111112 15:10:36 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000074' at position 106, relay log '/var/log/mysql/dell-relay-bin.000001' position: 4
111112 15:10:36 [Note] Slave I/O thread: connected to master 'root@localhost:3306',replication started in log 'mysql-bin.000074' at position 106
Seems that on this step everything is ok, and show slave status shows no errors.
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: localhost
Master_User: root
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000074
Read_Master_Log_Pos: 814
Relay_Log_File: dell-relay-bin.000002
Relay_Log_Pos: 959
Relay_Master_Log_File: mysql-bin.000074
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table: jobs.user\_profile,jobs.auth\_user
Replicate_Ignore_Table:
Replicate_Wild_Do_Table: jobs.geo\_%
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 814
Relay_Log_Space: 1113
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
1 row in set (0.00 sec)
ERROR:
No query specified
The problem is that changes on the master are not applied to the slave, even though the positions in the slave status advance.
Thanks for any help in solving this problem.
I won't dwell on why replicating tables within the same server is a bad idea (it multiplies I/O several times over).
The slave is not updated because server-id is the same for both master and slave. Usually a slave ignores updates that carry the same server-id as its own.
Add replicate-same-server-id to my.cnf (see the replicate-same-server-id documentation).
Out of interest, would a view do what you want?
I really don't think running replication from an instance of MySQL into the same instance is a good idea.
The other option you might want to investigate, if server hardware is a problem, would be to run multiple instances of MySQL on different ports on the same machine, which might help you achieve what you are looking for. This is something I am using in a test environment for simulated master DB failure and slave DB promotion.
There are many situations where you can achieve more optimization this way. For example, DB1 replicates to DB2 on the same server, and DB2 only keeps data for one week; all data older than one week gets deleted on DB2. On a high-traffic DB server that needs to stay as clean as possible, a dual-DB setup on the same server uses fewer resources if what your server does most of the time is read data from the database. I have such a setup, but I use 4 different servers: server 1 keeps 3 days, server 2 keeps 30 days, server 3 keeps 2 months, and server 4 keeps all data from the start (server 4 is mostly used for fetching very old records that are rarely needed). Sorry for my English, but I think I made the point that you can, and should, use DB replication on the same server to reduce memory and CPU usage.

MySQL slave not updating

I have replication set up. Everything looks fine and I have no errors, but the data is not being moved to the slave:
mysql> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: xxxxx
Master_User: xxxxxx
Master_Port: xxxx
Connect_Retry: 30
Master_Log_File: mysql-bin.000006
Read_Master_Log_Pos: 98
Relay_Log_File: xxxxx-relay-bin.002649
Relay_Log_Pos: 235
Relay_Master_Log_File: mysql-bin.000006
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 98
Relay_Log_Space: 235
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
1 row in set (0.00 sec)
Run SHOW MASTER STATUS or SHOW MASTER STATUS\G on the master DB. It will give you the correct values to update your slave with.
From your slave status, it looks like your slave has successfully connected to the master and is awaiting log events. To me, this means your slave user has been properly set up, and has the correct access. It really seems like you just need to sync the correct log file position.
Careful, because to get a good sync, you should probably stop the master, dump the DB, record the master log file positions, then start the master, import the DB on the slave, and finally start the slave in slave mode using the correct master log file pos. I've done this about 30 times, and if you don't follow those steps almost exactly, you will get a bad sync.
There could be a couple of issues:
The master does not know about the slave.
The slave and master are not in sync on the relay log file position.
You have to sync the slave with the master from the point where it stopped updating, then start the slave. It should then work fine.