Mysqldump on RDS Read Replica Slave is 50x slower - mysql

I created a read-replica of my MySQL database on amazon RDS.
When executing the following command, it is super fast (half a second) on the master, but takes more like 30 seconds on the slave. Super annoying because I wanted to dump off of the slave so that I don't slow down the master.
mysqldump --set-gtid-purged=OFF -h myDomain.com -u dev -pmyPassword mySchema > out.sql

There are three issues to consider.
The most significant is that mysqldump does not perform well when run at a distance from the database, due to limitations in the traditional MySQL client/server wire protocol, which makes no allowance for pipelining a series of commands.
The mysqldump utility uses no magic to generate dump files -- it issues SQL statements to the server, and takes the results of those queries to generate its output.
As a result, every single object (schema, table, view, stored function/procedure, event) in the database requires at least one round trip and sometimes more than one.
For each table, mysqldump first issues SHOW CREATE TABLE t1; followed by SELECT * FROM t1; ... so a round trip time of 100 ms would mean that extracting a dump file of 150 tables would mean 150 × 2 × 0.100 = 30 seconds are simply wasted by the distance between the machine running mysqldump and the server -- and this is true even if the tables are completely empty.
This is not a recommendation, but you might take a look at mydumper, which claims to have the ability of creating the backup using multiple database connections, in parallel, and this could help mediate the cycles wasted as commands pass to the server and return to the client, by parallelizing the dump process. I don't know the quality of this code base, but something like this could help.
Next, you almost always want to use the --compress option for mysqldump. Contrary to what you might assume, this does not compress the backup file. The generated backup file is identical when this option is used, but when this feature is activated, the server compresses the data it sends to mysqldump on the wire, and mysqldump decompresses the data again before writing it out -- so this option will almost always make for a faster process unless the machine running mysqldump and the database server are connected by a low-latency, high-bandwidth network. Because the generated file is identical, there are no compatibility concerns when using this option.
Finally, there's an issue with newly-created RDS servers that you need to be aware of, so that it doesn't skew your benchmarks. When you create an RDS replica, it is originally seeded with data from a snapshot of the upstream master. This is, behind the scenes, an EBS snapshot of the master's hard drive, and the new database instance is backed by an EBS volume restored from that snapshot. EBS volumes are lazily-loaded from the snapshot, so they have a documented first-touch penalty. This issue could have a substantial impact on the performance of the first complete backup, but should have no meaningful impact after that.

Related

Good idea to use SQS to move thousands of databases?

We want to move from using MySQL on an EC2 instance to RDS and setup replication. Seems like a no-brainer, right? Well, I've got 30,000 databases to move (don't ask). While setting up replication seems to work well, the process of getting the 30,000 databases into RDS is a royal pain; it takes forever and something almost alway happens.
The nightly backup takes about two hours. I end up with a multi-GB SQL dump file. When I try to restore it, something almost always goes wrong: the RDS instance wasn't big enough memory-wise and crashed, the localhost ran out of swap space, the network connection went flaky. Whatever! I did get it to restore once; IIRC it took 23 hours (30K MySQL DBs are a ton of file IO).
So today, I decided to use mydumper. It generated 30,000 schema files for the database in about two hours, then suddenly, the source MySQL went into uninterruptible sleep according to top, I lost my client connections, strace showed it was still trying to read files, and the mydumper process crashed. I restarted the whole process and just checked the status; mysqld restarted 2.5 hours into it for some reason.
So here's what I'm thinking and I'd like your input: I write two python scripts: firstScript.py will run mydumper on a single database, update a status table, package up the SQL, put it onto an AWS SQS queue, repeating until no more databases are found; the secondScript.py reads from the queue, runs the SQL and updates the status table, repeating until no more messages are found.
I think this can work. Do you? The main thing I'm not sure of is this: can I simply run multiple secondScript.py by Ctrl-Z-ing them into the background?
Or does someone have a better way of moving 30,000 databases?
I would not use mysqldump or mydumper to make a logical dump. Loading the resulting SQL-format dump takes too long.
Instead, use Percona XtraBackup to make a physical backup of your EC2 instance, and upload the backup to S3. Then restore to the RDS instance from S3, setup replication on the RDS instance to your EC2 instance, and let it catch up.
The feature of restoring a physical MySQL backup to RDS was announced in November 2017.
See also:
https://www.percona.com/blog/2018/04/02/migrate-to-amazon-rds-with-percona-xtrabackup/
https://aws.amazon.com/about-aws/whats-new/2017/11/easily-restore-an-amazon-rds-mysql-database-from-your-mysql-backup/
You should try it out with a smaller instance than your 30k databases just so you get some practice with the steps. See the steps in the Percona blog I linked to above.

Moving of large MySQL database from limited resource server

I have a Windows Server with MySQL Database Server installed.
Multiple databases exist among them, database A contains a huge table named 'tlog', size about 220gb.
I would like to move over database A to another server for backup purposes.
I know I can do SQL Dump or use MySQL Workbench/SQLyog to do table copy.
But due to limited disk storage in server (less than 50gb) SQL Dump is not possible.
The server is serving other works so basically the CPU & RAM is limited too. As a result, copy table without used up CPU & RAM is not possible.
Is there any other method that can do the moving of the huge database A over to another server please?
Thanks in advance.
You have a few ways:
Method 1
Dump and compress at the same time: mysqldump ... | gzip > blah.sql.gz
This method is good because chances are your database will be less than 50GB; as the database dump should be in ASCII; you're then compressing it on the fly.
Method 2
You can use slave replication; this method will require a dump of the data.
Method 3
You can also use xtrabackup.
Method 4
You can shutdown the database, and rsync the data directory.
Note: You don't actually have to shutdown the database; you can however do multiple rsyncs; and eventually nothing will change (unlikely if the database is busy; have to do during slow time); which means the database would have sync'd over.
I've had to do this method with fairly large PostgreSQL databases (1TB+). It takes a few rsyncs: but, hey; it's the cost of 0 down time.
Method 5
If you're in a virtual environment you could:
Clone the disk image.
If you're in AWS you could create an AMI.
You could add another disk and just sync locally; then detach the disk, and re-attach to the new VM.
If you're worried about consuming resources during the dump or transfer you can use ionice and renice to limit the priority of the dump/transfer.

Migrating a MySQL server from one box to another

The databases are prohibitively large (> 400MB), so dump > SCP > source is proving to be hours and hours work.
Is there an easier way? Can I connect to the DB directly and import from the new server?
You can simply copy the whole /data folder.
Have a look at High Performance MySQL - transferring large files
Use can use ssh to directly pipe your data over the Internet. First set up SSH keys for password-less login. Next, try something like this:
$ mysqldump -u db_user -p some_database | gzip | ssh someuser#newserver 'gzip -d | mysql -u db_user --password=db_pass some_database'
Notes:
The basic idea is that you are just dumping standard output straight into a command on the other side, which SSH is perfect for.
If you don't need encryption then you can use netcat but it's probably not worth it
The SQL text data goes over the wire compressed!
Obviously, change db_user to user user and some_database to your database. someuser is the (Linux) system user, not the MySQL user.
You will also have to use --password the long way because having mysql prompt you will be a lot of headache.
You could setup a MySQL slave replication and let MySQL copy the data, and then make the slave the new master
400M is really not a large database; transferring it to another machine will only take a few minutes over a 100Mbit network. If you do not have 100M networks between your machines, you are in a big trouble!
If they are running the exact same version of MySQL and have identical (or similar ENOUGH) my.cnf and you just want a copy of the entire data, it is safe to copy the server's entire data directory across (while both instances are stopped, obviously). You'll need to delete the data directory of the target machine first of course, but you probably don't care about that.
Backup/restore is usually slowed down by the restoration having to rebuild the table structure, rather than the file copy. By copying the data files directly, you avoid this (subject to the limitations stated above).
If you are migrating a server:
The dump files can be very large so it is better to compress it before sending or use the -C flag of scp. Our methodology of transfering files is to create a full dump, in which the incremental logs are flushed (use --master-data=2 --flush logs, please check you don't mess any slave hosts if you have them). Then we copy the dump and play it. Afterwards we flush the logs again (mysqladmin flush-logs), take the recent incremental log (which shouldn't be very large) and play only it. Keep doing it until the last incremental log is very small so that you can stop the database on the original machine, copy the last incremental log and then play it - it should take only a few minutes.
If you just want to copy data from one server to another:
mysqldump -C --host=oldhost --user=xxx --database=yyy -p | mysql -C --host=newhost --user=aaa -p
You will need to set the db users correctly and provide access to external hosts.
try importing the dump on the new server using mysql console, not an auxiliar software
I have no experience with doing this with mysql, but to me it seems the bottleneck is transferring the actual data?
4oo MB isnt that much. But if dump -> SCP is slow, i dont think connecting to the db server from the remove box would be any faster?
I'd suggest dumping, compressing, then copying over network or burning to disk and manually transfering the data.
Compressing such a dump will most likely give you quite good compression rate since, most likely , theres a lot of repeptetive data.
If you are only copying all the databases of the server, copy the entire /data directory.
If you are just copying one or more databases and adding them to an existing mysql server:
create the empty database in the new server, set up the permissions for users etc.
copy the folder for the database in /data/databasename to the new server /data/databasename
I like to use BigDump: Staggered Mysql Dump Importer after Exporting my database from the old server.
http://www.ozerov.de/bigdump/
One thing to note though, if you don't set the export options (namely the maximum length of created queries) respective to the load your new server can handle, it'll just fail and you will have to try again with different parameters. Personally, I set mine to about 25,000, but that's just me. Test it out a bit and you'll get the hang of it.

How do I backup a MySQL database?

What do I have to consider when backing up a database with millions of entries? Are there any tools (maybe bundled with the MySQL server) that I could use?
Depending on your requirements, there's several options that I have been using myself:
if you don't need hot backups, take down the db server and back up on the file system level, i. e. using tar, rsync or similar.
if you do need the database server to keep running, you can start out with the mysqlhotcopy tool (a perl script), which locks the tables that are being backed up and allows you to select single tables and databases.
if you want the backup to be portable, you might want to use mysqldump, which creates SQL scripts to recreate the data, but which is slower than mysqlhotcopy
if you have a copy of the db at a certain point in time, you could also just keep the binlogs (starting at that point in time) somewhere safe. This can be very easy to do and doesn't interfere with the server's operation, but might not be the fastest to restore, and you have to make sure you don't miss part of the logs.
Methods I haven't tried, but that make sense to me:
if you have a filesystem like ZFS or are running on LVM, it might be a good idea to do a snapshot of the database by doing a filesystem snapshot, because they are very, very quick. Just remember to ensure a consistent state of your db during the whole operation, e. g. by doing FLUSH TABLES WITH READ LOCK (and of course, don't forget UNLOCK TABLES afterwards)
Additionally:
you can use a master-slave setup to replicate your production server to either a different machine or a second instance on the same machine and do any of the above to the replicated copy, leaving your production machine alone. Instead of running continously, you can also fire up the slave on regular intervals, let it read the binlog, and switch it off again.
I think, MySQL cluster and the enterprise licensed version have more tools, but I have never tried them.
Mysqlhotcopy is badly described - it only works if you use MyISAM, and it's not hot.
The problem with mysqldump is the time it takes to restore the backup (but it can be made hot if you have all InnoDB tables, see --single-transaction).
I recommend using a hot backup tool, like what is available in XtraBackup:
http://www.percona.com/docs/wiki/percona-xtrabackup:start
Watch out if using mysqldump on large tables using the MyISAM storage engine; it blocks selects while the dump is running on each table and this can take down busy sites for 5-10 minutes in some cases.
Using InnoDB, by comparison, you get non-blocking backups because of its row-level locking, so this is not such an issue.
If you need to use MyISAM, a common strategy is to replicate to a second MySQL instance and do the mysqldump against the replicated copy instead.
Use the export tab in phpMyAdmin. phpMyAdmin is the free easy to use web interface for doing MySQL administration.
I think mysqldump is the proper way of doing it.

What's the quickest way to dump & load a MySQL InnoDB database using mysqldump?

I would like to create a copy of a database with approximately 40 InnoDB tables and around 1.5GB of data with mysqldump and MySQL 5.1.
What are the best parameters (ie: --single-transaction) that will result in the quickest dump and load of the data?
As well, when loading the data into the second DB, is it quicker to:
1) pipe the results directly to the second MySQL server instance and use the --compress option
or
2) load it from a text file (ie: mysql < my_sql_dump.sql)
QUICKLY dumping a quiesced database:
Using the "-T " option with mysqldump results in lots of .sql and .txt files in the specified directory. This is ~50% faster for dumping large tables than a single .sql file with INSERT statements (takes 1/3 less wall-clock time).
Additionally, there is a huge benefit when restoring if you can load multiple tables in parallel, and saturate multiple cores. On an 8-core box, this could be as much as an 8X difference in wall-clock time to restore the dump, on top of the efficiency improvements provided by "-T". Because "-T" causes each table to be stored in a separate file, loading them in parallel is easier than splitting apart a massive .sql file.
Taking the strategies above to their logical extreme, one could create a script to dump a database widely in parallel. Well, that's exactly what the Maakit mk-parallel-dump (see http://www.maatkit.org/doc/mk-parallel-dump.html) and mk-parallel-restore tools are; perl scripts that make multiple calls to the underlying mysqldump program. However, when I tried to use these, I had trouble getting the restore to complete without duplicate key errors that didn't occur with vanilla dumps, so keep in mind that your milage may vary.
Dumping data from a LIVE database (w/o service interruption):
The --single-transaction switch is very useful for taking a dump of a live database without having to quiesce it or taking a dump of a slave database without having to stop slaving.
Sadly, -T is not compatible with --single-transaction, so you only get one.
Usually, taking the dump is much faster than restoring it. There is still room for a tool that take the incoming monolithic dump file and breaks it into multiple pieces to be loaded in parallel. To my knowledge, such a tool does not yet exist.
Transferring the dump over the Network is usually a win
To listen for an incoming dump on one host run:
nc -l 7878 > mysql-dump.sql
Then on your DB host, run
mysqldump $OPTS | nc myhost.mydomain.com 7878
This reduces contention for the disk spindles on the master from writing the dump to disk slightly speeding up your dump (assuming the network is fast enough to keep up, a fairly safe assumption for two hosts in the same datacenter). Plus, if you are building out a new slave, this saves the step of having to transfer the dump file after it is finished.
Caveats - obviously, you need to have enough network bandwidth not to slow things down unbearably, and if the TCP session breaks, you have to start all over, but for most dumps this is not a major concern.
Lastly, I want to clear up one point of common confusion.
Despite how often you see these flags in mysqldump examples and tutorials, they are superfluous because they are turned ON by default:
--opt
--add-drop-table
--add-locks
--create-options
--disable-keys
--extended-insert
--lock-tables
--quick
--set-charset.
From http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html:
Use of --opt is the same as specifying --add-drop-table, --add-locks, --create-options, --disable-keys, --extended-insert, --lock-tables, --quick, and --set-charset. All of the options that --opt stands for also are on by default because --opt is on by default.
Of those behaviors, "--quick" is one of the most important (skips caching the entire result set in mysqld before transmitting the first row), and can be with "mysql" (which does NOT turn --quick on by default) to dramatically speed up queries that return a large result set (eg dumping all the rows of a big table).
Pipe it directly to another instance, to avoid disk overhead. Don't bother with --compress unless you're running over a slow network, since on a fast LAN or loopback the network overhead doesn't matter.
i think it will be a lot faster and save you disk space if you tried database replication as opposed to using mysqldump. personally i use sqlyog enterprise for my really heavy lifting but there also a number of other tools that can provide the same services. unless of course you would like to use only mysqldump.
For innodb, --order-by-primary --extended-insert is usually the best combo. If your after every last bit of performance and the target box has many CPU cores, you might want to split the resulting dumpfile and do parallel inserts in many threads, up to innodb_thread_concurrency/2.
Also, tweak the innodb_buffer_pool_size on the target to the max you can afford, and increase innodb_log_file_size to 128 or 256 MB (careful with this, you need to remove the old logfiles before restarting the mysql daemon otherwise it won't restart)
Use mk-parallel-dump tool from Maatkit.
At least that would probably be faster. I'd trust mysqldump more.
How often are you doing this? Is it really an application performance problem? Perhaps you should design a way of doing this which doesn't need to dump the whole data (replication?)
On the other hand, 1.5G is quite a small database so it probably won't be much of a problem.
mydumper is a good choice, with paralel export, even paralell threads per table, and compressed files, see: