Good idea to use SQS to move thousands of databases? - mysql

We want to move from using MySQL on an EC2 instance to RDS and set up replication. Seems like a no-brainer, right? Well, I've got 30,000 databases to move (don't ask). While setting up replication seems to work well, the process of getting the 30,000 databases into RDS is a royal pain; it takes forever and something almost always goes wrong.
The nightly backup takes about two hours. I end up with a multi-GB SQL dump file. When I try to restore it, something almost always goes wrong: the RDS instance wasn't big enough memory-wise and crashed, the localhost ran out of swap space, the network connection went flaky. Whatever! I did get it to restore once; IIRC it took 23 hours (30K MySQL DBs are a ton of file IO).
So today I decided to use mydumper. It generated 30,000 schema files for the databases in about two hours, and then suddenly the source MySQL went into uninterruptible sleep according to top: I lost my client connections, strace showed it was still trying to read files, and the mydumper process crashed. I restarted the whole process and just checked the status; mysqld restarted 2.5 hours into it for some reason.
So here's what I'm thinking, and I'd like your input: I write two Python scripts. firstScript.py runs mydumper on a single database, updates a status table, packages up the SQL, and puts it onto an AWS SQS queue, repeating until no more databases are found; secondScript.py reads from the queue, runs the SQL, and updates the status table, repeating until no more messages are found.
I think this can work. Do you? The main thing I'm not sure of is this: can I simply run multiple copies of secondScript.py by Ctrl-Z-ing them into the background?
Or does someone have a better way of moving 30,000 databases?
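For what it's worth, here's a rough sketch of the two scripts I have in mind (the queue URL, RDS host, and paths are placeholders, and the status-table updates and error handling are left out). Since SQS messages are capped at 256 KB, this version enqueues just the database name and leaves the dump files on storage both scripts can reach, rather than packaging the SQL into the message itself:

# Sketch of firstScript.py / secondScript.py; queue URL, host, and paths are placeholders.
import subprocess
import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/db-migration"   # placeholder
RDS_HOST = "mydb.xxxxxxxx.us-east-1.rds.amazonaws.com"                        # placeholder
BACKUP_ROOT = "/backups"

sqs = boto3.client("sqs")

def enqueue_database(db_name):
    """firstScript.py: dump one database with mydumper and enqueue its name."""
    subprocess.run(
        ["mydumper", "--database", db_name, "--outputdir", f"{BACKUP_ROOT}/{db_name}"],
        check=True,
    )
    # TODO: update the status table here
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=db_name)

def drain_queue():
    """secondScript.py: pull database names off the queue and load them into RDS."""
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
        messages = resp.get("Messages", [])
        if not messages:
            break  # queue drained
        db_name = messages[0]["Body"]
        subprocess.run(
            ["myloader", "--host", RDS_HOST, "--database", db_name,
             "--directory", f"{BACKUP_ROOT}/{db_name}"],
            check=True,
        )
        # TODO: update the status table here
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=messages[0]["ReceiptHandle"])

My understanding is that SQS's visibility timeout should keep multiple copies of secondScript.py from grabbing the same message at the same time (delivery is at-least-once, though, so the load step ought to be safe to repeat).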

I would not use mysqldump or mydumper to make a logical dump. Loading the resulting SQL-format dump takes too long.
Instead, use Percona XtraBackup to make a physical backup of your EC2 instance and upload the backup to S3. Then restore the RDS instance from S3, set up replication on the RDS instance with your EC2 instance as the source, and let it catch up.
The feature of restoring a physical MySQL backup to RDS was announced in November 2017.
See also:
https://www.percona.com/blog/2018/04/02/migrate-to-amazon-rds-with-percona-xtrabackup/
https://aws.amazon.com/about-aws/whats-new/2017/11/easily-restore-an-amazon-rds-mysql-database-from-your-mysql-backup/
You should try it out first with something smaller than your 30,000 databases, just to get some practice with the steps. See the steps in the Percona blog I linked to above.
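Roughly, the backup-and-upload half of that workflow looks like this (a sketch only; the bucket name and paths are placeholders, and xtrabackup is assumed to pick up its MySQL connection settings from the default my.cnf):

# Stream a full physical backup into a single xbstream file, then push it to S3.
# Bucket name and paths are placeholders.
import subprocess
import boto3

BACKUP_FILE = "/backups/full-backup.xbstream"
BUCKET = "my-db-migration-bucket"   # placeholder

with open(BACKUP_FILE, "wb") as out:
    subprocess.run(
        ["xtrabackup", "--backup", "--stream=xbstream", "--target-dir=/tmp"],
        stdout=out,
        check=True,
    )

# The RDS "Restore from S3" workflow can ingest xbstream (as well as tar/gzip) backups.
boto3.client("s3").upload_file(BACKUP_FILE, BUCKET, "full-backup.xbstream")

The restore itself is then started from the RDS console or API ("Restore from S3"), and once the new instance is up you point it at the EC2 source with the mysql.rds_set_external_master and mysql.rds_start_replication stored procedures and let it catch up.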

Related

Efficiently restoring one database to another using AWS RDS

I have a MySQL database called latest, and another database called previous, both running on the same server. Both databases have identical content. Once per day, an application runs that updates latest. Later on, towards the end of the application's execution, a comparison is made between latest and previous for certain data. Any differences that are found will trigger certain actions, e.g. notification emails being sent. After that, a copy of latest is dumped to a file using mysqldump and restored to previous. Both databases are now in sync again and the process repeats the following day.
I would like to migrate the database(s) to AWS RDS. I'm open to using Aurora, but the MySQL engine is fine too. Is there a simpler or more efficient way of performing the restore process so that both databases are in sync using RDS? A way that avoids having to use mysqldump and feeding the result into previous?
I understand that I could create a read replica of an instance running latest to act as previous, but I think that updates the read replica as the source DB is updated (well, asynchronously anyway) which would ruin the possibility of performing a comparison between the two later on.
I don't have any particular problem with using mysqldump for the restore process, but I'm just not sure if I'm missing a trick.
If you don't want a read replica, your mysqldump approach is fine, but you could probably combine it with mysqlimport, as suggested in the MySQL docs:
Copying MySQL Databases to Another Machine
You can also use mysqldump and mysqlimport to transfer the database. For large tables, this is much faster than simply using mysqldump.
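A rough sketch of that flow (database names and the dump directory are placeholders; note that --tab has the server write the data files, so it needs filesystem access to the database server and generally isn't usable against an RDS endpoint itself):

# Sketch of the mysqldump --tab / mysqlimport flow; paths, names, and connection
# options are placeholders or omitted.
import glob
import subprocess

DUMP_DIR = "/tmp/latest_dump"   # must be writable by the MySQL server

# --tab writes one .sql file (schema) and one .txt file (tab-delimited data) per table.
subprocess.run(["mysqldump", f"--tab={DUMP_DIR}", "latest"], check=True)

# Recreate the tables in `previous`, then bulk-load the data files with mysqlimport,
# which is typically much faster than replaying a plain SQL dump.
for schema_file in sorted(glob.glob(f"{DUMP_DIR}/*.sql")):
    with open(schema_file) as f:
        subprocess.run(["mysql", "previous"], stdin=f, check=True)

subprocess.run(["mysqlimport", "previous"] + sorted(glob.glob(f"{DUMP_DIR}/*.txt")), check=True)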

What is an efficient way to maintain a local readonly copy of a live remote MySQL database?

I maintain a server that runs daily cron jobs to aggregate data sources and generate reports, accessible by a private Ruby on Rails application.
One of our data sources is a partial dump of one of our partner's databases. The partner runs an active application and the MySQL DB has hundreds of tables. They have given us read-only access to a relatively underpowered readonly slave of their application DB.
Because of latency issues and performance bottlenecking on their slave DB, we have been maintaining a limited local copy of their DB. We only need about 20 tables for our reports, so I only dump those tables. We also only need the data to a daily granularity, so realtime sync is not a requirement.
For a few months, I had implemented a nightly cron which streamed the dump of the necessary tables into a local production_tmp database. Then, when all tables were imported, I dropped production and renamed production_tmp to production. This was working until the DB grew to over 25GB, and we started running into disk space limitations.
For now, I have removed the redundancy step and am just streaming the dump straight into production on our local server. This feels a bit flimsy to me, and I would like to implement a safer approach. Also, currently doing the full dump/load takes our server over 2 hours, and I'd like to implement an approach that doesn't take as long. The database will only keep growing, so I'd like to implement something future proof.
Any suggestions would be appreciated!
I take it you have never heard of, or considered MySQL Replication?
The idea is that you do your backup & restore once, and then configure the replica to "subscribe" to a continuous stream of changes as they are made on the primary MySQL instance. Any change applied to the primary is applied automatically to the replica within seconds. You don't have to do the backup & restore procedure again, unless the replica gets damaged.
It takes some care to set up and keep working, but it's a much more efficient method of keeping two instances in sync.
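Once the initial backup and restore is done, the replica-side setup boils down to a couple of statements. A minimal sketch using pymysql, just for illustration (host, credentials, and the binlog coordinates recorded during the backup are placeholders; the same statements can be run from the mysql client):

# Replica-side configuration sketch (MySQL 5.6/5.7 syntax); all values are placeholders.
import pymysql

conn = pymysql.connect(host="replica.local", user="root", password="secret")
with conn.cursor() as cur:
    # Point the replica at the primary, using the binlog coordinates noted when
    # the initial backup was taken.
    cur.execute(
        """
        CHANGE MASTER TO
            MASTER_HOST = 'primary.example.com',
            MASTER_USER = 'repl',
            MASTER_PASSWORD = 'secret',
            MASTER_LOG_FILE = 'mysql-bin.000123',
            MASTER_LOG_POS = 4
        """
    )
    cur.execute("START SLAVE")
conn.close()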
@SusannahPotts mentions hot backup and/or incremental backup. You can get both of these features for free, without paying for MySQL Enterprise, by using Percona XtraBackup.
You can also consider using MySQL Transportable Tablespaces.
You'll need filesystem access to run either Percona XtraBackup or MySQL Enterprise Backup. It's not possible to use these physical backup tools for Amazon RDS, for example.
One alternative is to create a replication slave in the same network as the live system, and run Percona XtraBackup on that slave, where you do have filesystem access.
Another option is to stream the binary logs to another host (see https://dev.mysql.com/doc/refman/5.6/en/mysqlbinlog-backup.html) and then transfer them periodically to your local instance and replay them.
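That binlog-streaming option boils down to keeping a mysqlbinlog process running against the primary, for example (host, user, and the starting binlog file are placeholders):

# Continuously pull binlogs from the remote primary into local files.
import subprocess

subprocess.run(
    [
        "mysqlbinlog",
        "--read-from-remote-server",
        "--host=primary.example.com",
        "--user=repl",
        "--password",           # prompt for the password interactively
        "--raw",                # write the binlog files verbatim instead of decoding them
        "--stop-never",         # keep reading as new events arrive
        "mysql-bin.000001",     # first binlog file to fetch; later files follow automatically
    ],
    check=True,
)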
Each of these solutions has pros and cons. It's hard to recommend which solution is best for you, because you aren't sharing full details about your requirements.
This was working until the DB grew to over 25GB, and we started running into disk space limitations.
A few question marks here:
Why don't you just increase the available disk space for your database? 25 GB is nothing when it comes to disk space.
Why don't you modify your script to: download table1, import table1_tmp, drop table1_prod, rename table1_tmp to table1_prod; rinse and repeat (a sketch of this per-table swap follows below).
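A rough sketch of that per-table refresh (host names, database names, credentials, and the table list are placeholders):

# Dump each needed table from the partner's replica, load it into production_tmp,
# then swap it into production one table at a time. All names are placeholders.
import subprocess
import pymysql

TABLES = ["table1", "table2"]   # the ~20 tables needed for the reports
conn = pymysql.connect(host="localhost", user="root", password="secret")

for table in TABLES:
    # Stream the dump of one table straight into the local _tmp database.
    dump = subprocess.Popen(
        ["mysqldump", "--host=partner-replica.example.com", "--user=report_ro",
         "--password=secret", "source_db", table],
        stdout=subprocess.PIPE,
    )
    subprocess.run(["mysql", "production_tmp"], stdin=dump.stdout, check=True)
    dump.wait()

    # Swap the freshly loaded copy into place.
    with conn.cursor() as cur:
        cur.execute(f"DROP TABLE IF EXISTS production.`{table}`")
        cur.execute(f"RENAME TABLE production_tmp.`{table}` TO production.`{table}`")

conn.close()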
Other than that:
Why don't you ask your partner for a system with enough performance to run your reports on? I'm quite sure they would prefer that over having you download sensitive data to your "local site" every day.
Last thought (requires MySQL Enterprise Backup https://www.mysql.de/products/enterprise/backup.html):
Rather than dumping, downloading and importing 25 GB every day:
Create a full backup
Download and import
Use differential or incremental backups from then on.
The next day you download (and import) only the data-delta: https://dev.mysql.com/doc/mysql-enterprise-backup/4.0/en/mysqlbackup.incremental.html
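(The same incremental idea also works with the free Percona XtraBackup mentioned in the other answer; a rough sketch, with paths as placeholders:)

# Incremental backups with Percona XtraBackup (paths are placeholders).
import subprocess

FULL_DIR = "/backups/full"
INC_DIR = "/backups/inc-day1"    # one directory per incremental run

# One-off full base backup.
subprocess.run(["xtrabackup", "--backup", f"--target-dir={FULL_DIR}"], check=True)

# Daily incremental backup containing only the changes since the base
# (point --incremental-basedir at the previous incremental to chain them instead).
subprocess.run(
    ["xtrabackup", "--backup", f"--target-dir={INC_DIR}", f"--incremental-basedir={FULL_DIR}"],
    check=True,
)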

Export huge database from amazon RDS to local mysql

I have a MySQL database on Amazon RDS (about 600 GB of data). I need to move it back home to our local dedicated servers, but I don't know where to start.
Every time I try to start a mysqldump it freezes. Is there a way to move it onto S3, maybe even splitting it into smaller parts before starting the download?
How would you go about migrating a 600GB mysql DB?
Have you tried the innobackupex script? It lets you back up a live database (hot backup) and tar|gzip the final backup, so you get a smaller file. It works only with innodb_file_per_table=1.
If you have downtime to move the database, you can also try optimizing tables to reclaim some space (especially if you have done a lot of deletes).
You can also think about getting rid of some data (logs, archives, etc.) and moving it later.

RHEL5 rSync Mysql server

Objective: to be able to synchronize two Linux servers in real time.
My concern is with using rsync to mirror the MySQL server. The only thing it wasn't able to synchronize is the entries (i.e. data inserted into the database with INSERT queries). How will I be able to solve this?
Things I've done:
scp'd the keys between the two servers so that a password won't be asked for on each transfer
I used
rsync -avc /var/lib/mysql/ root@10.1.99.XXX:/var/lib/mysql/
to sync the database/tables, but wasn't able to sync the entries.
Isubaki,
It's not quite as simple as just using rsync, as mysqld may have the files open at the time you are pushing them across. Linux will do the file copy OK, but with this technique the table stays locked in memory until the database is restarted.
I do have a script that will do the sync part, but it does require a database restart, which may not be what you want (you mention realtime sync)
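For reference, a rough sketch of what such a stop-copy-start script could look like (service name, remote host, and paths are assumptions, and this approach needs downtime rather than giving you realtime sync):

# Both servers are briefly stopped so no files are open mid-write during the copy.
# Service name, remote host, and paths are assumptions.
import subprocess

REMOTE = "root@10.1.99.XXX"   # same placeholder host as above

def run(cmd):
    subprocess.run(cmd, check=True)

# Stop MySQL on both ends.
run(["service", "mysqld", "stop"])
run(["ssh", REMOTE, "service mysqld stop"])

# Copy the whole datadir across.
run(["rsync", "-avc", "/var/lib/mysql/", f"{REMOTE}:/var/lib/mysql/"])

# Bring both servers back up; the remote now serves the copied data.
run(["ssh", REMOTE, "service mysqld start"])
run(["service", "mysqld", "start"])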

Connection during MySQL import

What happens to a long-running query executed from the command line via SSH if the connection to MySQL or SSH is lost?
Context:
We have 2 servers with a very large MySQL database on them. One server acts as the Master, and the other as Slave. During regular maintenance, the replication became corrupt, and we noticed data was missing from the slave, even though it reported Seconds_Behind_Master = 0.
So I am in the process of repairing the replication. I am, as we speak, importing one of two large dumps into the slave. I am connected to MySQL through SSH, and used the MySQL "\. file.sql" command to import the dump.
Right now I am constantly getting results like "Query OK, 6798 rows affected".
It has been running for probably 30 minutes now. My question and worry is, what happens if I lose connection through SSH while this is running?
I have another, even larger dump to import after this.
Thanks for the answer!
-Steve
If you lose your connection, all children of your bash process will die, including the mysql client.
To avoid this problem, use the screen command.