Replaying Mysql Replication error - mysql

We're running a standard Mysql Master/Slave replication system which has been working nicely for a couple of years. So nicely in fact that we were able to recover from a server outage (which rendered the Master offline) by using the Slave DB (after turning it into a Master).
Anyway, the issue that is causing our problem occurred after we recovered the server and returned the original Master to being a Master and the original slave back to being a Slave.
It was caused by me being an idiot - not by anything else!
Basically a write occurred on the Slave - meaning that a write on the same table on the Master cannot be replicated across due to a Duplicated Primary Key error.
As I said, my fault.
I can delete the record from the Slave that was erroneously written there - thus removing the conflicting ID from the Slave but I want the record from the Master DB.
My question is this: I know that I can 'fix' the replication by skipping over the relevant query from the Master (by setting the SQL_SLAVE_SKIP_COUNTER to 1) - but that will mean I lose the record from the Master.
So, is there a way of 'replaying' the errored replication write? Basically pointing the replication to be one query back?
Or do I have to go through the whole rigmarole of dropping my Slave, repopulating it from my last successful backup and set it to be a Slave again?
Many thanks in advance.

If it's a simple modification or update, why not simply replay it on the master (manually, or by taking it from the slave's binary log if you have one) while the slave's replication threads are stopped? Then point the slave at the new replication coordinates (the file and position reported by SHOW MASTER STATUS; on the master) and restart the slave.

The answer was actually staring me in the face (isn't it always!).
The duplicate Primary key error was caused by me doing an insert on the Slave and an insert to the same table on the master - meaning both new records had the same ID.
Luckily for me, I did not need the insert on the Slave, so I simply deleted it - meaning that there was only one record with the primary key value.
I just needed a way to get the replication to attempt to replicate the record across - and this was easy.
Just restart the Slave!
So:
mysql root:(none)>START SLAVE;
Query OK, 0 rows affected (0.03 sec)
mysql root:(none)> _
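Why a plain START SLAVE is enough can be shown with a toy model. This is only a sketch of the behaviour described above, not MySQL internals: the class, the event list and the row values are all made up for illustration. The point is that the position is not advanced past an event that failed, so once the conflicting row is gone the same event is simply tried again.

```python
# Toy model of a replica's SQL thread; an illustration, not MySQL internals.
class ToyReplica:
    def __init__(self, rows):
        self.rows = dict(rows)   # primary key -> value
        self.position = 0        # index of the next event to apply

    def start_slave(self, relay_log):
        """Apply events from self.position; halt without advancing on a duplicate key."""
        while self.position < len(relay_log):
            key, value = relay_log[self.position]
            if key in self.rows:
                return False         # like error 1062: the thread stops here
            self.rows[key] = value
            self.position += 1       # advance only after a successful apply
        return True


relay_log = [(1, "a"), (2, "from master"), (3, "c")]   # hypothetical inserts
replica = ToyReplica({2: "stray write on slave"})

first_try = replica.start_slave(relay_log)    # halts on the duplicate key 2
stalled_at = replica.position
del replica.rows[2]                           # delete the erroneous slave row
second_try = replica.start_slave(relay_log)   # the same event is retried
```

After the second call the replica holds the master's version of row 2 and has moved on to the end of the log.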

Related

Safely killing a mySQL query that will not respond to KILL PROCESS

Running a regular OPTIMIZE TABLE query on a MYISAM table that usually takes a few minutes has now taken about 3 days. This also happened a week or so ago, but it did complete successfully and I wrongly assumed that the table had been repaired okay, and this time it wouldn't present a problem. Not so...
Last time, the process did not respond to KILL PROCESS and I ended up having to restart mysqld. The error log reported warnings such as:
[Warning] /usr/sbin/mysqld: Forcing close of thread 24974085 user: 'XXXX'
And when mysqld restarted, a number of other tables had been left corrupted and had to be rebuilt. Some of these tables are large, and I want to avoid this happening again. I understood that restarting mysqld should safely close tables, but it didn't seem to.
The web services that use the mySQL server will be put into maintenance mode to prevent new queries hitting the DB. But how can I safely close all open tables given that FLUSH TABLES is being blocked by the OPTIMIZE query?
SHOW OPEN TABLES;
[etc...]
114 rows in set (0.01 sec)
I really don't want 114 tables to have to be repaired. Any advice?
NB: I'm quite aware that the table being optimized will be corrupted and can live with that. I just want to minimize problems with the other tables.
Thanks.
UPDATE:
So FLUSH TABLES accepts a comma-separated list of tables... Maybe it could be as simple as flushing all open tables, but I'm still open to reassurance about this, or any other ideas. Flushing a specific table does remove it from the open tables list so this is looking promising.
http://dev.mysql.com/doc/refman/5.1/en/flush.html
FWIW, this all went very smoothly. Steps followed were something like this. I'm not sure if flushing and closing each table separately was overkill, but no tables crashed and total downtime was about 45 mins, so it was worth it:
Make a full slave backup
Redirect FQDN for the master to a dead IP so all services are down and master receives no new queries
When master is receiving no new queries, flush and close all tables individually
Stop mysql on master
Ensure OPTIMIZE TABLE does not cause problems for slave if it is replicated out
Stop mysql on slave
Copy all blog tables from slave to master
Restart master
Restart slave
Check for any corrupt tables and allow them to be repaired
Map FQDN back to the master internal IP
Sanity check of posts on main blog
Restart replication on slave and masters
Make sure replication progresses correctly (multi-master)
Make a full master backup

MySql Replication - slave lagging behind master

I have master/slave replication set up on my MySQL DB.
My slave DB was down for a few hours and is back up again (the master was up the whole time). When I issue SHOW SLAVE STATUS, I can see that the slave is X seconds behind the master.
The problem is that the slave doesn't seem to catch up with the master; the X seconds behind master doesn't seem to drop...
Any ideas on how I can help the slave catch up?
Here is an idea.
To check that MySQL is fully processing the SQL from the relay logs, try the following:
STOP SLAVE IO_THREAD;
This will stop replication from downloading new entries from the master into its relay logs.
The other thread, known as the SQL thread, will continue processing the SQL statements it downloaded from the master.
When you run SHOW SLAVE STATUS\G, keep your eye on Exec_Master_Log_Pos. Run SHOW SLAVE STATUS\G again. If Exec_Master_Log_Pos does not move after a minute, you can go ahead and run START SLAVE IO_THREAD;. This may reduce Seconds_Behind_Master.
Other than that, there is really nothing you can do except to:
Trust Replication
Monitor Seconds_Behind_Master
Monitor Exec_Master_Log_Pos
Run SHOW PROCESSLIST;, take note of the SQL thread to see if it is processing long running queries.
BTW Keep in mind that when you run SHOW PROCESSLIST; with replication running, there should be two DB Connections whose user name is system user. One of those DB Connections will have the current SQL statement being processed by replication. As long as a different SQL statement is visible each time you run SHOW PROCESSLIST;, you can trust mysql is still replicating properly.
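If you'd rather not eyeball it, the "did Exec_Master_Log_Pos move?" check can be scripted. A minimal sketch, assuming you have captured the vertical-format output of SHOW SLAVE STATUS twice as text; the sample values and helper names below are made up, and only the field names come from the real output. It also compares Relay_Master_Log_File, since the position restarts when the master rotates to a new binary log.

```python
def parse_slave_status(text):
    """Collect the 'Name: value' lines of vertical-format status output into a dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def sql_thread_moved(before, after):
    """True if the executed master coordinates differ between two samples."""
    def coords(status):
        return (status["Relay_Master_Log_File"], int(status["Exec_Master_Log_Pos"]))
    return coords(before) != coords(after)


# Hypothetical samples taken about a minute apart.
sample_1 = """
*************************** 1. row ***************************
        Relay_Master_Log_File: mysql-bin.000004
          Exec_Master_Log_Pos: 60121977
        Seconds_Behind_Master: 5400
"""
sample_2 = sample_1.replace("60121977", "60188012")

moved = sql_thread_moved(parse_slave_status(sample_1), parse_slave_status(sample_2))
idle = sql_thread_moved(parse_slave_status(sample_1), parse_slave_status(sample_1))
```

Here moved is true (the SQL thread is still working through events) and idle is false (nothing was applied between the two samples).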
What binary log format are you using ? Are you using ROW or STATEMENT ?
SHOW GLOBAL VARIABLES LIKE 'binlog_format';
If you are using ROW as the binlog format, make sure that all your tables have a primary or unique key. This query lists the tables that have neither:
SELECT t.table_schema, t.table_name, t.engine
FROM information_schema.tables t
INNER JOIN information_schema.columns c
  ON t.table_schema = c.table_schema
  AND t.table_name = c.table_name
  AND t.table_schema NOT IN ('performance_schema','information_schema','mysql')
GROUP BY t.table_schema, t.table_name, t.engine
HAVING SUM(IF(c.column_key IN ('PRI','UNI'), 1, 0)) = 0;
If you execute, for example, one DELETE statement on the master that removes 1 million records from a table without a PK or unique key, only one full table scan takes place on the master's side. That is not the case on the slave.
When the ROW binlog_format is used, MySQL writes the row changes to the binary log (not the statement itself, as with the STATEMENT binlog_format), and those changes are applied on the slave row by row. That means up to 1 million full table scans take place on the slave to reflect a single DELETE statement on the master, and that is what causes the slave lag.
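To get a feel for the difference, here is a toy count of "rows examined" in the three cases. It is a rough sketch under simplifying assumptions: a table modelled as a Python list, a made-up 2000-row table, and a slave that finds each deleted row by scanning from the start. It is not how the storage engine actually works.

```python
def master_statement_delete(table, predicate):
    """One pass over the table, like a single DELETE ... WHERE without an index."""
    examined = len(table)
    deleted = [row for row in table if predicate(row)]
    return deleted, examined


def replica_apply_without_key(table, deleted_rows):
    """Apply each row image by scanning for a matching row (no PK or unique key)."""
    table = list(table)
    examined = 0
    for image in deleted_rows:
        for index, row in enumerate(table):
            examined += 1
            if row == image:
                del table[index]
                break
    return examined


def replica_apply_with_key(table, deleted_rows):
    """Apply each row image with one keyed lookup (PK or unique key present)."""
    by_key = {row[0]: row for row in table}
    examined = 0
    for image in deleted_rows:
        examined += 1
        del by_key[image[0]]
    return examined


table = [(i, "payload") for i in range(2000)]   # hypothetical table
deleted, master_examined = master_statement_delete(table, lambda row: row[0] >= 1000)
scan_examined = replica_apply_without_key(table, deleted)
keyed_examined = replica_apply_with_key(table, deleted)
```

With these numbers the master examines 2000 rows once, the keyed slave does 1000 lookups, and the keyless slave examines about a million rows for the same 1000-row delete.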
"Seconds behind" isn't a very good tool for finding out how far behind the master you really are. What it says is "the query I just executed was executed X seconds ago on the master". That doesn't mean that you will catch up and be right behind the master the next second.
If your slave is normally not lagging behind and the workload on the master is roughly constant, you will catch up, but it might take some time. It might even take "forever" if the slave is normally just barely keeping up with the master. By default the slave applies changes on a single thread, so it is by design much slower than the master. Also, queries that take a while on the master will block replication while they run on the slave.
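The "it might take forever" part is just arithmetic. A back-of-the-envelope sketch, assuming a constant load on the master and a slave that can apply apply_rate seconds' worth of master work per wall-clock second (both the function name and the figures are made up for illustration):

```python
import math


def catch_up_seconds(backlog_seconds, apply_rate):
    """Wall-clock time to clear the backlog while the master keeps writing.

    apply_rate is how many seconds of master work the slave applies per second.
    At 1.0 or below the slave never gains ground.
    """
    if apply_rate <= 1.0:
        return math.inf
    return backlog_seconds / (apply_rate - 1.0)


three_hours = 3 * 60 * 60
comfortable = catch_up_seconds(three_hours, 1.5)   # slave has 50% headroom
barely = catch_up_seconds(three_hours, 1.0)        # slave only just keeps up
```

With 50% headroom a three-hour backlog takes six hours to clear; with no headroom it never clears.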
Just check if you have same time and timezones on both the servers, i.e., Master as well as Slave.
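If you are using InnoDB tables, check that innodb_flush_log_at_trx_commit is set to a value other than 1 on the slave. With 1, the log is flushed to disk at every commit, which slows down applying the replicated changes.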
http://dev.mysql.com/doc/refman/4.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
We had exactly the same issue after setting up our slave from a recent backup.
We had changed the configuration of our slave to be more crash-safe:
sync_binlog = 1
sync_master_info = 1
relay_log_info_repository = TABLE
relay_log_recovery = 1
I think sync_binlog = 1 in particular causes the problem, as the hardware of this slave is not as fast as the master's. This option forces the slave to sync its binary log to disk at every transaction commit, rather than leaving the flush to the operating system (sync_binlog = 0, the default before MySQL 5.7.7).
After disabling these config options again to their default values I see that the slave is catching up again.
Just to add the findings in my similar case.
There were a few bulk inserts/updates/deletes on temporary tables happening on the master, which took up most of the space in the slave's relay log. And since replication in MySQL 5.5 is single-threaded, the CPU was always at 100% and it took a lot of time to process these records.
All I did was add these lines to the MySQL cnf file:
replicate-ignore-table=<dbname>.<temptablename1>
replicate-ignore-table=<dbname>.<temptablename2>
and everything became smooth again.
In order to figure out which tables take up the most space in the relay log, try the following commands and then look through the output. You may get some hints:
cd /var/lib/mysql
mysqlbinlog relay-bin.000010 > /root/RelayQueries.txt
less /root/RelayQueries.txt
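Reading a big dump by eye gets tedious, so a rough tally of which tables are mentioned most can help. A minimal sketch, assuming the dump contains backtick-quoted `schema`.`table` names; what actually appears depends on the binlog format and the mysqlbinlog options used, and the sample lines and table names below are made up.

```python
import re
from collections import Counter

# Matches backtick-quoted qualified names such as `shop`.`orders`.
QUALIFIED_NAME = re.compile(r"`([^`]+)`\.`([^`]+)`")


def count_table_mentions(lines):
    """Count how often each schema.table name appears in the dump."""
    counts = Counter()
    for line in lines:
        for schema, table in QUALIFIED_NAME.findall(line):
            counts[f"{schema}.{table}"] += 1
    return counts


# Hypothetical lines in the style of a decoded relay log.
sample = [
    "### INSERT INTO `shop`.`tmp_import`",
    "### INSERT INTO `shop`.`tmp_import`",
    "### UPDATE `shop`.`orders`",
]
top = count_table_mentions(sample).most_common()
```

On a real file you would pass the open file object instead of the sample list, e.g. count_table_mentions(open("/root/RelayQueries.txt")).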
If you have multiple schemas, consider using multi-threaded slave replication. This is a relatively new feature (MySQL 5.6 and later).
It can be enabled dynamically without stopping the server. Just stop the slave SQL thread. On MySQL the variable is slave_parallel_workers; slave_parallel_threads is the MariaDB equivalent.
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_workers = 4;
START SLAVE SQL_THREAD;
I had an issue similar to this. Both of my MySQL servers (master and replica) are hosted on AWS EC2. Increasing the EBS disk size (which automatically increased the IOPS) on the MySQL slave server turned out to be the solution for me. Read/write throughput and bandwidth increased, and read/write latency decreased.
Now my replica is catching up to the master, and Seconds_Behind_Master has dropped (it had been increasing from day to day).
So if you have MySQL hosted on EC2, I suggest you try increasing the EBS disk size or its IOPS on the slave.
I know it's been a while since OP asked but it would have helped me to read the following answer.
In /etc/mysql/mysql.cnf :
[mysqld]
disable_log_bin
innodb_flush_log_at_trx_commit=2
innodb_doublewrite = 0
sync_binlog=0
disable_log_bin REALLY did the trick for me.

writes to mysql slave server by mistake

I have mysql replication set up with one master and one slave. Due to a bug in the code, somewhere in the middle the entries started to get written on slave server and it was detected a few days later on.
Now I am thinking of how to switch it correctly without any hassle or minimal down time, what would be the best way to do this? Lets consider only one table...
Solution 1
Simply start writing to master from now on after setting auto_increment to slave's last id. Wondering if it will be troublesome to keep master and slave out of sync.
Solution 2
Clear all the data from master, stop the app from making any more entries refill the data using mysqldump and then switching the app back on with correct config.
stop slave
// load the dump
start slave
Will this stop master from re-attempting to write to slave the same data?
Any help appreciated. Any other solutions also welcomed.
Thanks
Sushil
I think you are on the correct track with solution 2. Simply stopping the slave will not prevent the master from writing to its binary log. So when you start the slave again it will just replicate all the SQL statements from the master.
However, you can use this to your advantage if you have included 'DROP TABLE' before each table creation. This will mean that you have the following:
1) Stop the app from making any more entries in the master table(s)
2) Dump data from slave (ensure that mysqldump includes 'DROP TABLE' before each table import - it should do as it is a default option of mysqldump)
3) Run dump against master
4) Check slave status using SHOW SLAVE STATUS\G. Once Seconds_Behind_Master reaches 0 then you are good to switch on the app again (make sure it is writing to the master!!)
Step 3 will drop and recreate the tables on the master using the data from the slave. This drop and recreate will be replicated on to the slave so you should end up with the two in sync and a correct master slave set up.
Good luck!
I think your best option is to reset the slave/master completely. If the data on the slave is correct, reload the data from it onto the master, then export a new dump from the master and import it into the slave, then execute a new "CHANGE MASTER TO..." command.
I would recommend setting the "read_only" global variable on the slave.
http://dev.mysql.com/doc/refman/5.1/en/replication-options-slave.html#option_mysqld_read-only

MySQL Replication Error(1062)

I am new to MySQL and after a long search I am able to configure master-slave ROW based replication. I thought it would be safe and I would not have to recheck it again and again.
But today, when I ran SHOW SLAVE STATUS; on the slave, I found the following:
could not execute Write_rows event on table mydatabasename.atable; Duplicate entry '174465' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000004, end_log_pos 60121977
Can someone tell me how this can even happen when the master has no such error and the schema on both servers is the same? And how do I fix it to make replication work again, and how do I prevent such a thing in future?
Please also let me know what other surprises I should expect besides this.
Why would it never happen on the master?
The SQL statements are replicated from the master.
If the record already existed on the master, MySQL would have rejected the insert there, so it would never have reached the binary log.
On the slave, however, the event fails and the replication position does not advance to the next event (replication just halts).
Reason?
The record was inserted directly into the slave, without going through replication from the master.
How to fix?
Skip the error on slave, like
SET GLOBAL sql_slave_skip_counter = N;
details - http://dev.mysql.com/doc/refman/5.0/en/set-global-sql-slave-skip-counter.html
Or delete the duplicate record on the slave and start the slave again (letting replication do the insertion).
In the worst case, you have to redo the setup to ensure data integrity on the slave.
How to prevent?
Check at the application level and make sure nothing writes directly to the slave.
This includes how you connect to MySQL from the command prompt.
Split the MySQL users into one that can write and one that can only read.
Your application should use the read user (on master and slave) when it does not need to write,
and the write user (master only) for actions that need to write to the database.
The skip counter is not always a viable solution: you are skipping events, and that may affect later records.
Here are the complete details on why sql_slave_skip_counter is bad.
http://www.mysqlperformanceblog.com/2013/07/23/another-reason-why-sql_slave_skip_counter-is-bad-in-mysql/
You can delete the rows in the slave DB whose ID is greater than or equal to the duplicate's:
DELETE FROM mydatabasename.atable WHERE ID>=174465;
then
START SLAVE;

MySQL Binary Log Replication: Can it be set to ignore errors?

I'm running a master-slave MySQL binary log replication system (phew!) that, for some data, is not in sync (meaning, the master holds more data than the slave). But the slave stops very frequently on the slightest MySQL error, can this be disabled? (perhaps a my.cnf setting for the replicating slave ignore-replicating-errors or some of the sort ;) )
This is what happens, every now and then, when the slave tries to replicate an item that does not exist, the slave just dies. a quick check at SHOW SLAVE STATUS \G; gives
Slave-IO-Running: Yes
Slave-SQL-Running: No
Replicate-Do-DB:
Last-Errno: 1062
Last-Error: Error 'Duplicate entry '15218' for key 1' on query. Default database: 'db'. Query: 'INSERT INTO db.table ( FIELDS ) VALUES ( VALUES )'
which I promptly fix (once I realize that the slave has been stopped) by doing the following:
STOP SLAVE;
RESET SLAVE;
START SLAVE;
... lately this has been getting kind of tiresome, and before I spit out some sort of PHP which does this for me, i was wondering if there's some my.cnf entry which will not kill the slave on the first error.
Cheers,
/mp
stop slave; set global sql_slave_skip_counter=1; start slave;
You can ignore only the current error and continue the replication process.
Yes, with --slave-skip-errors=xxx in my.cnf, where xxx is 'all' or a comma sep list of error codes.
First, do you really want to ignore errors? If you get an error, it is likely that the data is not in sync any more. Perhaps what you want is to drop the slave database and restart the sync process when you get an error.
Second, I think the error you are getting is not when you replicate an item that does not exist (what would that mean anyway?) - it looks like you are replicating an item that already exists in the slave database.
I suspect the problem mainly arises from not starting at a clean data copy. It seems that the master has been copied to the slave; then replication has been turned off (or failed); and then it has started up again, but without giving the slave the chance to catch up with what it missed.
If you ever have a time when the master can be closed for write access long enough to clone the database and import it into the slave, this might get the problems to go away.
Modern mysqldump commands have a couple options to help with setting up consistent replication. Check out --master-data which will put the binary log file and position in the dump and automatically set when loaded into slave. Also --single-transaction will do the dump inside a transaction so that no write lock is needed to do a consistent dump.
If the slave isn't used for any writes other than the replication, the authors of High Performance MySQL recommend setting read_only on the slave server to prevent users from mistakenly changing data on the slave, as this will also create the same errors you experienced.
I think you started replication without syncing the databases first. Sync the databases and then set up replication again. It may also be that both servers are generating the same unique IDs; in that case, try setting the auto-increment offset.
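The auto-increment offset idea relies on MySQL's auto_increment_increment and auto_increment_offset system variables: give every server the same increment and a different offset, and the generated IDs cannot collide. A toy illustration of the resulting sequences (the generator is only a model of the numbering; the two-server setup is an assumption):

```python
from itertools import islice


def auto_increment_ids(offset, increment):
    """Yield offset, offset + increment, offset + 2 * increment, ..."""
    value = offset
    while True:
        yield value
        value += increment


# Two servers, both with increment 2 but different offsets.
server_a = list(islice(auto_increment_ids(offset=1, increment=2), 5))
server_b = list(islice(auto_increment_ids(offset=2, increment=2), 5))
overlap = set(server_a) & set(server_b)
```

One server only ever hands out odd IDs and the other only even ones, so an insert on either side cannot produce the same primary key as an insert on the other.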