Connection during MySQL import - mysql

What happens to a long lasting query executed from commandline via SSH if the connection to MySQL or SSH is lost?
Context:
We have 2 servers with a very large MySQL database on them. One server acts as the Master, and the other as Slave. During regular maintenance, the replication became corrupt, and we noticed data was missing from the slave, even though it reported Seconds_Behind_Master = 0.
So I am in the process of repairing the replication. I am, as we speak, importing one of two large dumps into the slave. I am connected to MySQL through SSH, and used the MySQL "\. file.sql" command to import the dump.
Right now I am constantly getting results like so "Query OK, 6798 rows affected".
It has been running for probably 30 minutes now. My question and worry is, what happens if I lose connection through SSH while this is running?
I have another, even larger dump to import after this.
Thanks for the answer!
-Steve

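If you lose your connection, all children of your bash process will die, including mysql.
To avoid this problem, run the import inside the screen command, so that the session keeps running on the server and you can reattach to it after reconnecting.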
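For a scripted import, a related idea can be sketched in Python. This is not screen itself but a different technique: the child is started in its own session, with input and output redirected to files, so it has no tie to the SSH terminal. The helper name launch_detached, the user dbuser, the schema mydb and the log file name are all hypothetical, and the sketch assumes credentials come from an option file such as ~/.my.cnf, because a password prompt cannot work without a terminal.

```python
import subprocess


def launch_detached(cmd, stdin_path, log_path):
    """Start cmd in a new session, reading stdin_path and logging to log_path.

    start_new_session=True makes the child call setsid(), so it is not part
    of the login shell's session and has no controlling terminal.
    """
    stdin_file = open(stdin_path, "rb")
    log_file = open(log_path, "ab")
    try:
        return subprocess.Popen(
            cmd,
            stdin=stdin_file,
            stdout=log_file,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )
    finally:
        # The child holds its own copies of these descriptors.
        stdin_file.close()
        log_file.close()


# Hypothetical usage (user and schema names are placeholders):
# launch_detached(["mysql", "-u", "dbuser", "mydb"], "file.sql", "import.log")
```

Unlike screen, this gives you no interactive session to reattach to; progress has to be read from the log file.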

Related

Mysqldump on RDS Read Replica Slave is 50x slower

I created a read-replica of my MySQL database on amazon RDS.
When executing the following command, it is super fast (half a second) on the master, but takes more like 30 seconds on the slave. Super annoying because I wanted to dump off of the slave so that I don't slow down the master.
mysqldump --set-gtid-purged=OFF -h myDomain.com -u dev -pmyPassword mySchema > out.sql
There are three issues to consider.
The most significant is that mysqldump does not perform well when run at a distance from the database, due to limitations in the traditional MySQL client/server wire protocol, which makes no allowance for pipelining a series of commands.
The mysqldump utility uses no magic to generate dump files -- it issues SQL statements to the server, and takes the results of those queries to generate its output.
As a result, every single object (schema, table, view, stored function/procedure, event) in the database requires at least one round trip and sometimes more than one.
For each table, mysqldump first issues SHOW CREATE TABLE t1; followed by SELECT * FROM t1; ... so with a round-trip time of 100 ms, extracting a dump file of 150 tables wastes 150 × 2 × 0.100 = 30 seconds purely on the distance between the machine running mysqldump and the server -- and this is true even if the tables are completely empty.
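The arithmetic above can be written as a small Python helper for trying other table counts and latencies. It is only a rough lower bound under the stated assumption of two round trips per table; the function name is made up for illustration, and real dumps issue further statements.

```python
def wasted_round_trip_seconds(num_tables, rtt_ms, trips_per_table=2):
    """Rough lower bound on seconds lost to network latency alone.

    Assumes trips_per_table sequential round trips per table
    (SHOW CREATE TABLE plus SELECT *), with no pipelining.
    """
    return num_tables * trips_per_table * rtt_ms / 1000.0


if __name__ == "__main__":
    # The example from the text: 150 tables at a 100 ms round-trip time.
    print(wasted_round_trip_seconds(150, 100))
```

With 150 tables and 100 ms this reproduces the 30 seconds figure; at a 1 ms round-trip time the same dump would lose only about 0.3 seconds to latency.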
This is not a recommendation, but you might take a look at mydumper, which claims to be able to create the backup using multiple database connections in parallel. Parallelizing the dump process could help mitigate the cycles wasted as commands pass to the server and return to the client. I don't know the quality of this code base, but something like this could help.
Next, you almost always want to use the --compress option for mysqldump. Contrary to what you might assume, this does not compress the backup file. The generated backup file is identical when this option is used, but when this feature is activated, the server compresses the data it sends to mysqldump on the wire, and mysqldump decompresses the data again before writing it out -- so this option will almost always make for a faster process unless the machine running mysqldump and the database server are connected by a low-latency, high-bandwidth network. Because the generated file is identical, there are no compatibility concerns when using this option.
Finally, there's an issue with newly-created RDS servers that you need to be aware of, so that it doesn't skew your benchmarks. When you create an RDS replica, it is originally seeded with data from a snapshot of the upstream master. This is, behind the scenes, an EBS snapshot of the master's hard drive, and the new database instance is backed by an EBS volume restored from that snapshot. EBS volumes are lazily-loaded from the snapshot, so they have a documented first-touch penalty. This issue could have a substantial impact on the performance of the first complete backup, but should have no meaningful impact after that.

Good idea to use SQS to move thousands of databases?

We want to move from using MySQL on an EC2 instance to RDS and set up replication. Seems like a no-brainer, right? Well, I've got 30,000 databases to move (don't ask). While setting up replication seems to work well, the process of getting the 30,000 databases into RDS is a royal pain; it takes forever and something almost always happens.
The nightly backup takes about two hours. I end up with a multi-GB SQL dump file. When I try to restore it, something almost always goes wrong: the RDS instance wasn't big enough memory-wise and crashed, the localhost ran out of swap space, the network connection went flaky. Whatever! I did get it to restore once; IIRC it took 23 hours (30K MySQL DBs are a ton of file IO).
So today, I decided to use mydumper. It generated 30,000 schema files for the database in about two hours, then suddenly, the source MySQL went into uninterruptible sleep according to top, I lost my client connections, strace showed it was still trying to read files, and the mydumper process crashed. I restarted the whole process and just checked the status; mysqld restarted 2.5 hours into it for some reason.
So here's what I'm thinking and I'd like your input: I write two python scripts: firstScript.py will run mydumper on a single database, update a status table, package up the SQL, put it onto an AWS SQS queue, repeating until no more databases are found; the secondScript.py reads from the queue, runs the SQL and updates the status table, repeating until no more messages are found.
I think this can work. Do you? The main thing I'm not sure of is this: can I simply run multiple secondScript.py by Ctrl-Z-ing them into the background?
Or does someone have a better way of moving 30,000 databases?
I would not use mysqldump or mydumper to make a logical dump. Loading the resulting SQL-format dump takes too long.
Instead, use Percona XtraBackup to make a physical backup of your EC2 instance, and upload the backup to S3. Then restore to the RDS instance from S3, set up replication on the RDS instance to your EC2 instance, and let it catch up.
The feature of restoring a physical MySQL backup to RDS was announced in November 2017.
See also:
https://www.percona.com/blog/2018/04/02/migrate-to-amazon-rds-with-percona-xtrabackup/
https://aws.amazon.com/about-aws/whats-new/2017/11/easily-restore-an-amazon-rds-mysql-database-from-your-mysql-backup/
You should try it out with a smaller instance than your 30k databases just so you get some practice with the steps. See the steps in the Percona blog I linked to above.

Mysql full lock on big import

I have one MySQL server with 5 databases.
I was using the phpMyAdmin CSV import to load a very large amount of data into one table of one database.
I understand that all other operations on this machine may get slower because of the processing involved, but MySQL simply does not respond to any other simultaneous request, even for another table or another database.
Because of this, Apache doesn't answer any request that needs a database connection (it keeps loading forever).
After the import finishes, Apache and MySQL return to working normally. I don't need to restart anything or execute any other command.
My question is: is this behavior normal? Should MySQL stop answering all other requests because of a single giant one?
I'm afraid that if I have a big query running in one database on this server, all my other databases will be locked too and my applications will stop working.

Import large MySQL file without replication lagging

I'm about to import a 5 GB table on the command line:
mysql -u dbuser -p customersdb < transactions.sql
Previously I had imported a 2 GB file and that caused replication to lag for long periods of time. Is there any way to avoid that here? Somehow adding a timeout after every few thousand imports would seem ideal in my mind.
I've tried googling it but it doesn't seem like this use case comes up often.
Edit: Additionally, is there any way to monitor the progress of an import?
The issue causing the lag is that the slave SQL thread is single-threaded by default. All operations - both from your import and from other operations - happen in a single queue.
Starting with MySQL 5.6 you can use multithreading there by setting the slave_parallel_workers option. With MySQL 5.6 this distributes operations from different schemas across the workers; with 5.7 it can also parallelize within a single schema (when slave_parallel_type is set to LOGICAL_CLOCK).
See https://dev.mysql.com/doc/refman/5.6/en/replication-options-slave.html#sysvar_slave_parallel_workers
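The pause-between-batches idea from the question can also be sketched client-side in Python. This is only an illustration under loud assumptions: statements are split naively at lines ending in ";" (which breaks on stored-procedure bodies and custom delimiters), and execute is a hypothetical callable that runs one statement through whatever MySQL driver you use. Note that mysqldump usually writes multi-row INSERT statements, so a "batch" here counts statements, not rows. The running count also gives a crude progress indicator.

```python
import time


def batched_statements(lines, batch_size):
    """Yield lists of up to batch_size SQL statements.

    Naive assumption: a statement ends at a line whose last
    non-whitespace character is ';'.
    """
    batch, current = [], []
    for line in lines:
        current.append(line)
        if line.rstrip().endswith(";"):
            batch.append("".join(current))
            current = []
            if len(batch) >= batch_size:
                yield batch
                batch = []
    if "".join(current).strip():
        batch.append("".join(current))  # trailing statement without ';'
    if batch:
        yield batch


def throttled_import(lines, execute, batch_size=1000, pause_seconds=1.0,
                     sleep=time.sleep):
    """Run statements in batches, pausing after each batch.

    execute is a hypothetical callable taking one SQL string.
    Returns the number of statements sent.
    """
    total = 0
    for batch in batched_statements(lines, batch_size):
        for statement in batch:
            execute(statement)
        total += len(batch)
        print(f"{total} statements sent")  # crude progress report
        sleep(pause_seconds)  # give the replica time to catch up
    return total
```

Passing an open file object as lines streams the dump without loading it all into memory. Whether the pauses actually keep the lag acceptable depends on the workload, so the batch size and pause length would need tuning against Seconds_Behind_Master.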

What when mysqldump becomes utterly slow

Currently my database is almost 20 GB big and still growing.
I'm taking a daily backup with mysqldump and it's getting really slow.
So slow that in the meantime new connections stack up and eventually cause this error:
SQLSTATE[HY000] [1040] Too many connections
(I could increase the number of connections that are accepted, but that won't help because the connections are still just frozen, waiting for the backup to complete, which will lead to timeouts.)
I've been reading up on some options to improve the speed and this is what I've found:
option --quick (Will probably help)
option --single-transaction (Will prevent tables from being locked, but may cause database to become incorrect)
Master-Slave replication (Probably the best thing I could do, one problem, I have only one server available)
The master-slave replication really sounds like the best option, since I can stop the slave from updating, take the backup, and let it resume syncing. The problem is I only have one physical machine to work with.
I know that I can set up multiple mysql instances on this one server. The question is: Is it wise to do so?
The slave is really only used to generate that backup file (which will be copied to a different disk on the network) so that the master can stay live.
If you use only InnoDB, try xtrabackup.
If you use both MyISAM and InnoDB, flush + LVM snapshot + file-level copy might work for you.
A replication slave for backups is indeed a good idea as well. Just remember to periodically check data consistency between the master and the slave.