MySql Replication - slave lagging behind master - mysql

I have a master/slave replication on my MySql DB.
my slave DB was down for a few hours and is back up again (master was up all the time), when issuing show slave status I can see that the slave is X seconds behind the master.
the problem is that the slave dont seem to catch up with the master, the X seconds behind master dont seem to drop...
any ideas on how I can help the slave catch up?

Here is an idea
In order for you to know that MySQL is fully processing the SQL from the relay logs. Try the following:
STOP SLAVE IO_THREAD;
This will stop replication from downloading new entries from the master into its relay logs.
The other thread, known as the SQL thread, will continue processing the SQL statements it downloaded from the master.
When you run SHOW SLAVE STATUS\G, keep your eye on Exec_Master_Log_Pos. Run SHOW SLAVE STATUS\G again. If Exec_Master_Log_Pos does not move after a minute, you can go ahead run START SLAVE IO_THREAD;. This may reduce the number of Seconds_Behind_Master.
Other than that, there is really nothing you can do except to:
Trust Replication
Monitor Seconds_Behind_Master
Monitor Exec_Master_Log_Pos
Run SHOW PROCESSLIST;, take note of the SQL thread to see if it is processing long running queries.
BTW Keep in mind that when you run SHOW PROCESSLIST; with replication running, there should be two DB Connections whose user name is system user. One of those DB Connections will have the current SQL statement being processed by replication. As long as a different SQL statement is visible each time you run SHOW PROCESSLIST;, you can trust mysql is still replicating properly.

What binary log format are you using ? Are you using ROW or STATEMENT ?
SHOW GLOBAL VARIABLES LIKE 'binlog_format';
If you are using ROW as a binlog format make sure that all your tables has Primary or Unique Key:
SELECT t.table_schema,t.table_name,engine
FROM information_schema.tables t
INNER JOIN information_schema .columns c
on t.table_schema=c.table_schema
and t.table_name=c.table_name
and t.table_schema not in ('performance_schema','information_schema','mysql')
GROUP BY t.table_schema,t.table_name
HAVING sum(if(column_key in ('PRI','UNI'), 1,0)) =0;
If you execute e.g. one delete statement on the master to delete 1 million records on a table without a PK or unique key then only one full table scan will take place on the master's side, which is not the case on the slave.
When ROW binlog_format is being used, MySQL writes the rows changes to the binary logs (not as a statement like STATEMENT binlog_format) and that change will be applied on the slave's side row by row, which means a 1 million full table scan will take place on the slave's to reflect only one delete statement on the master and that is causing slave lagging problem.

"seconds behind" isn't a very good tool to find out how much behind the master you really is. What it says is "the query I just executed was executed X seconds ago on the master". That doesn't mean that you will catch up and be right behind the master the next second.
If your slave is normally not lagging behind and the work load on the master is roughly constant you will catch up, but it might take some time, it might even take "forever" if the slave is normally just barely keeping up with the master. Slaves operate on one single thread so it is by design much slower than the master, also if there are some queries that take a while on the master they will block replication while running on the slave.

Just check if you have same time and timezones on both the servers, i.e., Master as well as Slave.

If you are using INNODB tables, check that you have innodb_flush_log_at_trx_commit to a value different that 0 at SLAVE.
http://dev.mysql.com/doc/refman/4.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit

We had exactly the same issue after setting up our slave from a recent backup.
We had changed the configuration of our slave to be more crash-safe:
sync_binlog = 1
sync_master_info = 1
relay_log_info_repository = TABLE
relay_log_recovery = 1
I think that especially the sync_binlog = 1 causes the problem, as the specs of this slave is not so fast as in the master. This config option forces the slave to store every transaction in the binary lo before they are executed (instead of the default every 10k transactions).
After disabling these config options again to their default values I see that the slave is catching up again.

Just to add the findings in my similar case.
There were few bulk temporary table insert/update/delete were happening in master which occupied most of the space from relay log in slave. And in Mysql 5.5, since being single threaded, CPU was always in 100% and took lot of time to process these records.
All I did was to add these line in mysql cnf file
replicate-ignore-table=<dbname>.<temptablename1>
replicate-ignore-table=<dbname>.<temptablename2>
and everything became smooth again.
Inorder to figure out which tables are taking more space in relay log, try the following command and then open in a text editor. You may get some hints
cd /var/lib/mysql
mysqlbinlog relay-bin.000010 > /root/RelayQueries.txt
less /root/RelayQueries.txt

If u have multiple schema's consider using multi threaded slave replication.This is relatively new feature.
This can be done dynamically without stopping server.Just stop the slave sql thread.
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_threads = 4;
START SLAVE SQL_THREAD;

I have an issue similar to this. and both of my MySQL server hosted on AWS EC2 (master and replication). by increasing EBS disk size (which automatically increased IOPS) for MySQL slave server, its turned out the solution for me. R/W Throughput and bandwidth is increased R/W latency were decreased.
now my MySQL database replication is catching up to the master. and Seconds_Behind_Master was decreased (it was got increased from day to day).
so if you have MySQL hosted on EC2. I suggest you tried to increase EBS disk size or its IOPS on the slave.

I know it's been a while since OP asked but it would have helped me to read the following answer.
In /etc/mysql/mysql.cnf :
[mysql]
disable_log_bin
innodb_flush_log_at_trx_commit=2
innodb_doublewrite = 0
sync_binlog=0
disable_log_bin REALLY carried the trick for me.

Related

MYSQL SLOW replication but suddenly increasing and then again slow in mysql Slave unable

I am facing a major issue in MySQL server from 2 days. My slave server is Seconds behind master by 70000 and its not getting down from 2 days. At night its suddenly increasing but again it in slow mode. Is there any way to Synchronize Master slave replication FAST? What is the problem with? Slave is working its IO and sql running in YES MODE. Please Help me out if there is any way
Is it repeatedly bouncing between 70000 and about 0? If so, that is a mystery that I have seen on and off for more than a decade. Ignore it, it will go away.
If Seconds_behind_master is rising at the rate of 1 second per second, the look at what the Slave is doing. SHOW PROCESSLIST; You will probably find something like ALTER that has been running a long time, tying up replication.
If Seconds_behind_master is getting big, but not going down much, then there are several possible answers.
Is the Slave a "weaker" machine than the Master? Keep in mind that Replication is (depending on the version) only single-threaded. Multiple writes can happen on the Master simultaneously, but then have to be done one at a time on the Slave.
Is the Slave running a big query that is locking what the replication thread would like to get to? Look at the Slave's PROCESSLIST.
Which Engine are you using? VM? Cloud hosted? Performing backups at night?

replication slave has less space

I need to setup the replication
so for that i took backup from the master and imported into slave and now slave has 20 GB free space,
at the time while i am restoring backup master got 5+ GB of data
after that i enabled replication
Now the problem is while data from the relay log is written to slave many new relay logs are generated and i am left with no space on slave....
The relay-log-purge is enabled but the writing process from relay log to database is very slow...
You may have performance benefits by switching binlog format to 'ROW'. In mixed format mysql is writing binlog statement based (the slave has to reparse and plan the strategy again) and only switches to row format in certain cases (http://dev.mysql.com/doc/refman/5.1/en/binary-log-mixed.html).
It can be done on the fly:
SET GLOBAL binlog_format = 'ROW';
With ROW based replication you can offload the slaves which makes them easier to keep up with the master. The only downside of it is you may have larger log files however from this "got 5+ GB of data" I assume you mainly do inserts so the there will be no difference.
Best is to try it and see how it behaves.
Other option until the replication catches up to change the innodb_flush_log_at_trx_commit to 2 on the slave. Only do this temporary.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
just turn off the slave io thread when you get less disk space alert (90% reached) as
mysql> stop slave io_thread;
and start it again after some time as
mysql> start slave io_thread;
on production this is the only best possible option......
Try adding this to your config
relay-log-space-limit=5G

Replaying Mysql Replication error

We're running a standard Mysql Master/Slave replication system which has been working nicely for a couple of years. So nicely in fact that we were able to recover from a server outage (which rendered the Master offline) by using the Slave DB (after turning it into a Master).
Anyway, the issue that is causing our problem occurred after we recovered the server and returned the original Master to being a Master and the original slave back to being a Slave.
It was caused by me being an idiot - not by anything else!
Basically a write occurred on the Slave - meaning that a write on the same table on the Master cannot be replicated across due to a Duplicated Primary Key error.
As I said, my fault.
I can delete the record from the Slave that was erroneously written there - thus removing the conflicting ID from the Slave but I want the record from the Master DB.
My question is this: I know that I can 'fix' the replication by skipping over the relevant query from the Master (by setting the SQL_SLAVE_SKIP_COUNTER to 1) - but that will mean I lose the record from the Master.
So, is there a way of 'replaying' the errored replication write? Basically pointing the replication to be one query back?
Or do I have to go through the whole rigmarole of dropping my Slave, repopulating it from my last successful backup and set it to be a Slave again?
Many thanks in advance.
If it's a simple modification/update/... why don't you simply play it on the master (manually or taking it from the slave binary log if you have one), with slave's slave process off, then set the new replication pointer (file & position) on the slave (show master status; on master), and restart the slave?
The answer was actually staring me in the face (isn't it always!).
The duplicate Primary key error was caused by me doing an insert on the Slave and an insert to the same table on the master - meaning both new records had the same ID.
Luckily for me, I did not need the insert on the Slave, so I simply deleted it - meaning that there was only one record with the primary key value.
I just needed a way to get the replication to attempt to replicate the record across - and this was easy.
Just restart the Slave!
So:
mysql root:(none)>START SLAVE;
Query OK, 0 rows affected (0.03 sec)
mysql root:(none)> _

Replication slave locking

I have a primary mysql server and 2 slaves as backups.
One of the slaves has been outfitted with solid state storage and due to this is used heavily for reporting.
Some of the data generated takes some time (around half an hour to an hour in some cases) and uses and generates allot of data (on the order of a couple gigs, which makes me hesitant to use transactions). The reporting tables are but a small subset of the total database so fully shutting down replication is somewhat out of the question.
The issue at hand is reports generated while the data is being generated are obviously incomplete and wrong.
What would be the best way to lock the tables on both the master and the reporting server?
Would the "LOCK TABLES" statement be replicated to the slaves or would my best course of action be to generate the data in temporary tables and then copy them to the final table in one INSERT ... SELECT statement.
Try the following
You could try the following:
Step 01) On the Master, run this
FLUSH TABLES WITH READ LOCK; SELECT CONNECTION_ID(); SELECT SLEEP(300);
Step 02) SHOW SLAVE STATUS\G on both Slaves (or just the Reporting Slave)
Step 03) Repeat Step 02 Until
Relay_Log_Space stops changing
Relay_Log_Pos stops changing
Seconds_Behind_Master is 0
At this point, since both Slaves have not received any new SQL to process, you have effectively frozen MySQL on the Slaves at the Same Point-In-Time as the Master
Step 04) On the Slaves (or just the Reporting Slave), run STOP SLAVE;
Step 05) On the Master, (if the CONNECTION_ID() return 789) run mysql> KILL 789; in another mysql session.
Step 06) Run your reports
Step 07) Run START SLAVE; on the Slaves (or just the Reporting Slave)
UPDATE 2012-06-05 15:15 EDT
Since this seems a little heavy handed for the sake of a few tables in one particular schema, the simplest thing would just be to run STOP SLAVE; on the Slave you do the Reporting from.

MySQL Binary Log Replication: Can it be set to ignore errors?

I'm running a master-slave MySQL binary log replication system (phew!) that, for some data, is not in sync (meaning, the master holds more data than the slave). But the slave stops very frequently on the slightest MySQL error, can this be disabled? (perhaps a my.cnf setting for the replicating slave ignore-replicating-errors or some of the sort ;) )
This is what happens, every now and then, when the slave tries to replicate an item that does not exist, the slave just dies. a quick check at SHOW SLAVE STATUS \G; gives
Slave-IO-Running: Yes
Slave-SQL-Running: No
Replicate-Do-DB:
Last-Errno: 1062
Last-Error: Error 'Duplicate entry '15218' for key 1' on query. Default database: 'db'. Query: 'INSERT INTO db.table ( FIELDS ) VALUES ( VALUES )'
which I promptly fix (once I realize that the slave has been stopped) by doing the following:
STOP SLAVE;
RESET SLAVE;
START SLAVE;
... lately this has been getting kind of tiresome, and before I spit out some sort of PHP which does this for me, i was wondering if there's some my.cnf entry which will not kill the slave on the first error.
Cheers,
/mp
stop slave; set global sql_slave_skip_counter=1; start slave;
You can ignore only the current error and continue the replication process.
Yes, with --slave-skip-errors=xxx in my.cnf, where xxx is 'all' or a comma sep list of error codes.
First, do you really want to ignore errors? If you get an error, it is likely that the data is not in sync any more. Perhaps what you want is to drop the slave database and restart the sync process when you get an error.
Second, I think the error you are getting is not when you replicate an item that does not exist (what would that mean anyway?) - it looks like you are replicating an item that already exists in the slave database.
I suspect the problem mainly arises from not starting at a clean data copy. It seems that the master has been copied to the slave; then replication has been turned off (or failed); and then it has started up again, but without giving the slave the chance to catch up with what it missed.
If you ever have a time when the master can be closed for write access long enough to clone the database and import it into the slave, this might get the problems to go away.
Modern mysqldump commands have a couple options to help with setting up consistent replication. Check out --master-data which will put the binary log file and position in the dump and automatically set when loaded into slave. Also --single-transaction will do the dump inside a transaction so that no write lock is needed to do a consistent dump.
If the slave isn't used for any writes other than the replication, the authors of High Performance MySQL recommend adding read_only on the slave server to prevent users from mistakenly changing data on the slave as this is will also create the same errors you experienced.
i think you are doing replication with out sync the database first sync the database and try for replication and servers are generating same unique ids and try to set auto incerment offset