I have a replication setup here where data gets replicated from a stationary host to a notebook.
Replication happens in two steps: the copying of the relay files, which is quite fast, and the application of the relay log events to the database, which tends to be slow.
Now my question: Suppose the slave has gotten all data from the master, but the "import process" still runs. Can I safely shut down the slave host and resume the still pending part of the replication without disturbing the process in any way?
So I am connected to the host, issue "stop slave", shut down the notebook, go home, and then issue "start slave" again without having a connection to the host. Can I expect the slave instance to resume the import process?
Your laptop is permanently a Slave to the other machine, correct? You are just breaking the network connection to the Master every night?
There are two threads on the Slave. The I/O thread is responsible for pulling data from the binlog on the Master and writing it into the "relay-log" on the Slave. If (when) the network goes away, this thread repeatedly retries. There are settings that say how frequently to retry and when to eventually give up. Consider tuning them.
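A sketch of the relevant knobs, assuming a reasonably recent MySQL (MASTER_RETRY_COUNT as a CHANGE MASTER option needs 5.6 or later); the values here are only examples:

    -- On the Slave: retry the Master every 60 seconds, give up after 86400 attempts.
    STOP SLAVE;
    CHANGE MASTER TO
        MASTER_CONNECT_RETRY = 60,
        MASTER_RETRY_COUNT = 86400;
    START SLAVE;

    -- How long the I/O thread waits without data before it assumes the link is broken.
    SET GLOBAL slave_net_timeout = 60;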
The SQL thread is responsible for applying whatever is in the relay-log. Effectively, the SQL thread can run all the time. It's quite happy to "do nothing" when there is nothing to do.
The I/O thread creates new relay-log files as needed; the SQL thread deletes each relay-log file as it finishes with it.
I have dealt with dozens of slaves over the years; I don't recall any issue with network or power failures. You are essentially causing at least a network failure every night. If you are also powering down the laptop, do it gracefully. InnoDB (but not MyISAM) recovers nicely from power failures, but don't push your luck.
STOP/START SLAVE seems unnecessary, but won't hurt. Things should "resume" and eventually "catch up".
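The usual way to watch that happen is SHOW SLAVE STATUS; nothing here is specific to your setup:

    SHOW SLAVE STATUS\G
    -- Slave_IO_Running:      Yes  (I/O thread connected, or retrying)
    -- Slave_SQL_Running:     Yes  (SQL thread applying the relay log)
    -- Seconds_Behind_Master: should shrink toward 0 as the Slave catches up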
Your quote talks about the Master purging binlogs. Well, there is an issue here. The Master does not keep track of what Slaves exist, so it can't tell whether your Slave has been disconnected for longer than the Master keeps its binlogs.
See expire_logs_days. Suggest you set that to higher than the number of vacation days you might ever take.
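For example (the 30 is arbitrary; pick something longer than your longest absence):

    -- On the Master, dynamically or in my.cnf:
    SET GLOBAL expire_logs_days = 30;
    SHOW VARIABLES LIKE 'expire_logs_days';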
My experience with Slaves predates GTIDs, Galera, etc.; will you be using such?
I have partially found the answer to my question:
The MySQL documentation says:
If the slave stops before the SQL thread has executed all the fetched statements, the I/O thread has at least fetched everything so that a safe copy of the statements is stored locally in the slave's relay logs, ready for execution the next time that the slave starts. This enables the master server to purge its binary logs sooner because it no longer needs to wait for the slave to fetch their contents.
This indicates that it is perfectly possible to resume the import process (execution of the statements); however, it still remains unclear
if I need to issue "start slave" before the described things happen, and
what happens if the slave cannot reach its master when I do issue "start slave" (see the note below).
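For what it's worth, the two slave threads can also be started separately, which seems relevant to the first point; this is standard MySQL syntax, not something specific to my setup:

    -- Start only the applier; it works purely from the local relay log,
    -- so it does not need a connection to the master.
    START SLAVE SQL_THREAD;

    -- Start the fetcher later, once the master is reachable again.
    START SLAVE IO_THREAD;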
I have an AWS RDS MySQL 5.7 database with MyISAM tables that I would like to migrate to another RDS in a custom VPC, and once migrated, convert those MyISAM tables to InnoDB.
If I understood correctly, the only way to create a correct automatic backup is by using the following procedure (sketched in SQL after the steps below), explained here: "Automated Backups with Unsupported MySQL Storage Engines"
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithAutomatedBackups.html#Overview.BackupDeviceRestrictions
Stop all activity to your MyISAM tables (that is, close all sessions).
You can close all sessions by calling the mysql.rds_kill command for each process that is returned from the SHOW FULL PROCESSLIST command.
Lock and flush each of your MyISAM tables.
Create a snapshot of your DB instance. When the snapshot has completed, release the locks and resume activity on the MyISAM tables.
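A rough SQL sketch of those steps (the table names and the process id are made up; the snapshot itself is taken from the RDS console or API):

    SHOW FULL PROCESSLIST;       -- find the sessions still using the MyISAM tables
    CALL mysql.rds_kill(1234);   -- repeat for each process id returned above

    -- Lock and flush the MyISAM tables.
    FLUSH TABLES my_table1, my_table2 WITH READ LOCK;

    -- ... create the DB snapshot from the console or API while the lock is held ...

    UNLOCK TABLES;               -- release the locks once the snapshot has completed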
Has someone done this procedure before?
How is it that snapshots are being created successfully every night from the current RDS DB instance, even though it contains MyISAM tables?
Thanks!
The problem isn't with snapshot creation. It's what can go wrong when you actually try to use one of the snapshots.
RDS snapshots work by capturing a snapshot of your RDS instance's underlying EBS volume (you can't see this volume, but it's there -- RDS runs on EC2, with "hidden" instances and volumes).
EBS snapshots capture the entire contents of the hard drive exactly as they happened to exist at the moment in time when the snapshot process starts.
What ends up on the snapshot is essentially the same thing that you would have on a MySQL Server if you executed sudo killall -9 mysqld -- it is as if the server had halted everything, immediately, without doing any of the things it normally does to clean up for a graceful shutdown. With RDS, things are not quite that dramatic, because RDS does take some precautions, but fundamentally, this is the nature of what is happening.
When you create an RDS instance from a snapshot, the first thing that happens when the instance starts up is the same thing your hypothetical server would do when you restarted the killed MySQL Server daemon: InnoDB Crash Recovery.
InnoDB Crash Recovery
To recover from a MySQL server crash, the only requirement is to restart the MySQL server. InnoDB automatically checks the logs and performs a roll-forward of the database to the present. InnoDB automatically rolls back uncommitted transactions that were present at the time of the crash.
https://dev.mysql.com/doc/refman/5.7/en/innodb-recovery.html#innodb-crash-recovery
Crash recovery is InnoDB's mechanism for bringing everything back into harmony in its internal data structures and ensuring that all data is intact, exactly as your application left it. It's possible because InnoDB is a transactional storage engine. That means a lot of different things, but what it specifically means in this case is that InnoDB doesn't just change table data when you change a table. It goes through a process that can be simplified to something like this (illustrated below):
store the proposed changes to disk¹
actually make the changes
mark the changes as complete
What this means is that until the changes are finalized, InnoDB can be interrupted and will subsequently be able to pick up where it left off, without corrupting or losing data.
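To make those three steps concrete, here is what they mean for an ordinary transaction (the accounts table is just a made-up example):

    START TRANSACTION;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- the proposed change is logged
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT;                                                    -- only now is the change marked complete

    -- If mysqld dies before the COMMIT is durable, crash recovery rolls the
    -- whole transaction back; if it dies afterwards, the changes survive.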
MyISAM has no such mechanisms. It just writes to the data files directly. Even if a MyISAM table isn't actively being used, it may still need to be repaired when the server comes up, to clean up its structures. In some circumstances, repairing the table can be impossible, and all or part of the data in the table will be lost.
If your MyISAM tables are flushed and locked when the snapshot occurs, they are in a quiescent state on the disk, as though the server had actually been gracefully shut down before the snapshot had occurred, so they will be stable on the snapshot.
But the snapshot process will always appear to succeed, because the snapshot is just duplicating whatever is on the disk, as it appears at the moment in time when the snapshot gets underway.
The problem is that what the snapshot captured may not be usable, and you have no way of knowing whether the snapshot will be fully viable.
¹ Note that the first step, "store the proposed changes to disk", is related to the system variable innodb_flush_log_at_trx_commit, which makes the system slower if set to 1 but is also the safest setting, because your query doesn't actually succeed until that first step is done. A setting of 2 is still reasonably safe, because it still writes the changes but continues without requiring that the operating system confirm that they have actually been written to the hard drive before your query returns success... but in a crash, a transaction your application thinks was committed may or may not have survived.
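For reference, the variable is dynamic, so you can inspect and change it at runtime (which value you choose is the trade-off described above):

    SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
    -- 1 = flush the redo log to disk at every commit (safest, slowest)
    -- 2 = write at every commit, flush to disk about once per second
    SET GLOBAL innodb_flush_log_at_trx_commit = 1;
    -- (on RDS, set it through the DB parameter group instead of SET GLOBAL)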
What will happen if you stop a MySQL slave server for a few hours (say, 8 hours)? Can the master log all the changes it made during those 8 hours, and can you just start the slave server again and wait for it to catch up with the master?
As long as you have space for the log files, you can stop it for as long as you want. In extreme cases, if your data change volume is high enough, your slave won't be able to keep up. But if that's happening, you need to be using different hardware or load balancing the database differently.
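To gauge how much log the downtime will generate, the usual checks on the master look like this (nothing here is specific to an 8-hour window):

    SHOW BINARY LOGS;                         -- file names and sizes; compare the totals a few hours apart
    SHOW VARIABLES LIKE 'expire_logs_days';   -- must comfortably exceed the planned downtime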
The master waits for slave acknowledgment of transaction receipt after the sync. Upon receiving acknowledgment, the master commits the transaction to the storage engine and returns a result to the client. I want to know: when do the slaves commit?
The slaves commit whenever they get around to it, which can be within a matter of milliseconds, but can also take a substantial amount of time if the slave is lagging, because it's overloaded or underpowered, or because a process has obtained locks on the slave that cause the updates to wait for the locks.
Semi-synchronous replication only guarantees that no transaction is lost by making sure that at least one slave has saved -- not executed -- a copy of the transaction.
The data is safe from any single-failure loss once the replication IO_THREAD on any slave has received and written the replication event to its relay log on disk, and confirmed this fact back to the master server.
The slave acknowledges receipt of a transaction's events only after the events have been written to its relay log and flushed to disk.
https://dev.mysql.com/doc/refman/5.7/en/replication-semisync.html
The slave SQL_THREAD reads the events that the IO_THREAD has written to the relay log and applies them to the slave's data set, but this part is a fully-asynchronous process that provides no feedback towards the semi-sync logic.
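If you want to watch this on a 5.7 pair, the stock semi-sync plugins expose it through status variables; a sketch:

    -- On the master:
    INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
    SET GLOBAL rpl_semi_sync_master_enabled = 1;
    SHOW STATUS LIKE 'Rpl_semi_sync_master_yes_tx';   -- commits acknowledged by a semi-sync slave

    -- On a slave (restart the I/O thread afterwards so it registers as semi-sync):
    INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
    SET GLOBAL rpl_semi_sync_slave_enabled = 1;
    STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;
    SHOW STATUS LIKE 'Rpl_semi_sync_slave_status';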
If you need the slaves to be immediately in sync for read queries of data that has just been written to the master, native replication -- whether asynchronous (conventional) or semi-synchronous -- does not provide that. If that is what you require, you need the Galera Replication Provider, which makes replication truly synchronous (and makes all the servers in the cluster into writable masters).
I set up an EC2 instance with MySQL on an EBS volume and set up another instance which acts as a slave for replication. The replication setup went fine. My question is about taking snapshots of these volumes. I noticed that the tables need to be locked for the snapshot process, which may cause inconvenience for the users. So, my idea is to leave the master instance alone and take a snapshot of the instance acting as slave. Is this a good idea? Is there anyone out there with a similar setup who could guide me in the right way?
Also, taking a snapshot of the slave instance would require locking the tables. Would that mean replication will break?
Thanks in advance.
Though it's a good idea to lock the database and freeze the file system when you initiate the snapshot, the actual API call to initiate the snapshot takes a fraction of a second, so your database and file system aren't locked/frozen for long.
That said, there are a couple other considerations you did not mention:
When you attempt to create the lock on the database, it might need to wait for other statements to finish before the lock is granted. During this time, your pending lock might cause further statements to wait until you get and release the lock. This can cause interruptions in the flow of statements on your production database.
After you initiate the creation of the snapshot, your application/database is free to use the file system on the volume, but if you have a lot of writes, you could experience high iowait, sometimes enough to create a noticeable slowdown of your application. The reason for this is that the background snapshot process needs to copy a block to S3 before it will allow a write to that block on the active volume.
I solve the first issue by requesting a lock and timing out if it is not granted quickly. I then wait a bit and keep retrying until I get the lock. Appropriate timeouts and retry delay may vary for different database loads.
I solve the second problem by performing the frequent, consistent snapshots on the slave instead of the master, just as you proposed. I still recommend performing occasional snapshots against the master simply to improve its intrinsic durability (a deep EBS property) but those snapshots do not need to be performed with locking or freezing as you aren't going to use them for backups.
I also recommend the use of a file system that supports flushing and freezing (XFS). Otherwise, you are snapshotting locked tables in MySQL that might not even have all their blocks on the EBS volume yet, or other parts of the file system might be modified and inconsistent in the snapshot.
If you're interested, I've published open source software that performs the best practices I've collected related to creating consistent EBS snapshots with MySQL and XFS (both optional).
http://alestic.com/2009/09/ec2-consistent-snapshot
To answer your last question, locking tables in the master will not break replication. In my snapshot software I also flush the tables with read lock to make sure that everything is on the disk being snapshotted and I add the keyword "LOCAL" so that the flush is not replicated to any potential slaves.
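Putting the MySQL side of that together, it looks roughly like this (the timeout value is only an example, and lock_wait_timeout needs MySQL 5.5+; the snapshot call itself happens outside MySQL):

    SET SESSION lock_wait_timeout = 5;    -- give up quickly if the lock isn't granted; retry after a pause
    FLUSH LOCAL TABLES WITH READ LOCK;    -- flush and lock here only; LOCAL keeps it out of the binlog
    -- ... freeze the file system and initiate the EBS snapshot here ...
    UNLOCK TABLES;                        -- once the snapshot API call has returned and the fs is unfrozen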
You can definitely take a snapshot of the slave.
From your description, it does not seem like the slave is being used operationally.
If this is the case, then the safest method of obtaining a reliable volume snapshot would be to:
Stop the MySQL server on the slave.
Start the snapshot (either through the AWS Console, or by command line).
When the snapshot is complete, restart mysqld on the slave server.
I have two servers configured in a master-master pair using MMM. I recently had an issue where the passive master received a replication error (got a packet bigger than max_allowed_packet) but the slave IO and SQL threads continued running. And seconds_behind_master was still showing as 0 even though the slave was not executing new statements.
I thought this type of error would cause replication to stop (it's done this in the past). Instead replication kept running and our monitors didn't notice the problem. Also the replication errors continually showed up in the mysql error log, instead of "Last_Error" in "show slave status".
We are running version 5.0.33.
Any ideas what happened here? thanks!
For the max allowed packet size, it sounds like your two DBs are not configured identically. At least the network protocol stuff should be identical.
Did you try show slave status on both machines?
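A quick comparison on both machines would show whether the packet limits differ and whether either side has recorded an error (the value below is only an example):

    SHOW VARIABLES LIKE 'max_allowed_packet';       -- should match on both servers
    SHOW SLAVE STATUS\G                             -- check Last_Errno / Last_Error and Seconds_Behind_Master
    SET GLOBAL max_allowed_packet = 64*1024*1024;   -- also put the same value in my.cnf so it survives restarts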
Quiet failure is a terrible situation. I wonder what records did not make it. Do you have a way of finding out?
Are you getting periodic errors in the error log or a flood of identical errors? Is the sequence number incrementing on the passive master?
Jacob