Export huge database from amazon RDS to local mysql - mysql

I have a mysql database on a Amazon RDS (About 600GB of data) I need to move it back home to our local dedicated servers, but I don't know where to start.
Every time I try to init a sqldump it freezes, are there a way to move it on to S3? maybe even splitting it to smaller parts before starting the download?
How would you go about migrating a 600GB mysql DB?

Did you tried to use innobackupex script? It allows to to run living database (hot backup) and tar|gzip final backup thus you can get a smaller file. Works only with file_per_table=1
If you have downtime to move database you can also try to optimize tables to reclaim some space (especially if you did a lot of deletes).
Also you can think about get rid of some data: logs, archives etc and move them later.

Related

mysql master-slave. can i add slave when the master server already has a lot of data

is there a way to replicate mysql while the master server already has a lot of data.I tried the normal way, but I had difficulty getting the MASTER_LOG_POS value. how can the slave server be able to replicate data that previously existed on the master server.
Generally you start with an exact full copy of your existing database. This means creating a real copy of your MySQL data directory (while the server is off), go with a (consistent) snapshot, or use a tool like Percona XtraBackup.
Only after you have 2 identical MySQL servers, you can start replicating. Note that using a tool like mysqldump is not a good idea for consistent snapshots.
If you have a relatively small amount of data you could use mysqldump --master-data=1 --single-transaction. This will create a snapshot with the correct master-binlog and position required. This should not be used for production environments or large amounts of data.

Good idea to use SQS to move thousands of databases?

We want to move from using MySQL on an EC2 instance to RDS and setup replication. Seems like a no-brainer, right? Well, I've got 30,000 databases to move (don't ask). While setting up replication seems to work well, the process of getting the 30,000 databases into RDS is a royal pain; it takes forever and something almost alway happens.
The nightly backup takes about two hours. I end up with a multi-GB SQL dump file. When I try to restore it, something almost always goes wrong: the RDS instance wasn't big enough memory-wise and crashed, the localhost ran out of swap space, the network connection went flaky. Whatever! I did get it to restore once; IIRC it took 23 hours (30K MySQL DBs are a ton of file IO).
So today, I decided to use mydumper. It generated 30,000 schema files for the database in about two hours, then suddenly, the source MySQL went into uninterruptible sleep according to top, I lost my client connections, strace showed it was still trying to read files, and the mydumper process crashed. I restarted the whole process and just checked the status; mysqld restarted 2.5 hours into it for some reason.
So here's what I'm thinking and I'd like your input: I write two python scripts: firstScript.py will run mydumper on a single database, update a status table, package up the SQL, put it onto an AWS SQS queue, repeating until no more databases are found; the secondScript.py reads from the queue, runs the SQL and updates the status table, repeating until no more messages are found.
I think this can work. Do you? The main thing I'm not sure of is this: can I simply run multiple secondScript.py by Ctrl-Z-ing them into the background?
Or does someone have a better way of moving 30,000 databases?
I would not use mysqldump or mydumper to make a logical dump. Loading the resulting SQL-format dump takes too long.
Instead, use Percona XtraBackup to make a physical backup of your EC2 instance, and upload the backup to S3. Then restore to the RDS instance from S3, setup replication on the RDS instance to your EC2 instance, and let it catch up.
The feature of restoring a physical MySQL backup to RDS was announced in November 2017.
See also:
https://www.percona.com/blog/2018/04/02/migrate-to-amazon-rds-with-percona-xtrabackup/
https://aws.amazon.com/about-aws/whats-new/2017/11/easily-restore-an-amazon-rds-mysql-database-from-your-mysql-backup/
You should try it out with a smaller instance than your 30k databases just so you get some practice with the steps. See the steps in the Percona blog I linked to above.

What is an efficient way to maintain a local readonly copy of a live remote MySQL database?

I maintain a server that runs daily cron jobs to aggregate data sources and generate reports, accessible by a private Ruby on Rails application.
One of our data sources is a partial dump of one of our partner's databases. The partner runs an active application and the MySQL DB has hundreds of tables. They have given us read-only access to a relatively underpowered readonly slave of their application DB.
Because of latency issues and performance bottlenecking on their slave DB, we have been maintaining a limited local copy of their DB. We only need about 20 tables for our reports, so I only dump those tables. We also only need the data to a daily granularity, so realtime sync is not a requirement.
For a few months, I had implemented a nightly cron which streamed the dump of the necessary tables into a local production_tmp database. Then, when all tables were imported, I dropped production and renamed production_tmp to production. This was working until the DB grew to over 25GB, and we started running into disk space limitations.
For now, I have removed the redundancy step and am just streaming the dump straight into production on our local server. This feels a bit flimsy to me, and I would like to implement a safer approach. Also, currently doing the full dump/load takes our server over 2 hours, and I'd like to implement an approach that doesn't take as long. The database will only keep growing, so I'd like to implement something future proof.
Any suggestions would be appreciated!
I take it you have never heard of, or considered MySQL Replication?
The idea is that you do your backup & restore once, and then configure the replica to "subscribe" to a continuous stream of changes as they are made on the primary MySQL instance. Any change applied to the primary is applied automatically to the replica within seconds. You don't have to do the backup & restore procedure again, unless the replica gets damaged.
It takes some care to set up and keep working, but it's a much more efficient method of keeping two instances in sync.
#SusannahPotts mentions hot backup and/or incremental backup. You can get both of these features for free, without paying for MySQL Enterprise using Percona XtraBackup.
You can also consider using MySQL Transportable Tablespaces.
You'll need filesystem access to run either Percona XtraBackup or MySQL Enterprise Backup. It's not possible to use these physical backup tools for Amazon RDS, for example.
One alternative is to create a replication slave in the same network as the live system, and run Percona XtraBackup on that slave, where you do have filesystem access.
Another option is to stream the binary logs to another host (see https://dev.mysql.com/doc/refman/5.6/en/mysqlbinlog-backup.html) and then transfer them periodically to your local instance and replay them.
Each of these solutions has pros and cons. It's hard to recommend which solution is best for you, because you aren't sharing full details about your requirements.
This was working until the DB grew to over 25GB, and we started running into disk space limitations.
Some question marks "here":
Why don't you just increase the available Diskspace for your database? 25 GB seems nothing when it comes down to disk-space?
Why don't you modify your script to: download table1, import table1_tmp, drop table1_prod, rename table1_tmp to table1_prod; rinse and repeat.
Other than that:
Why don't you ask your partner for a system with enough performance to run your reports on? I'm quite sure, he would prefer this rather than having YOU download sensitive data every day to your "local site"?
Last thought (requires MySQL Enterprise Backup https://www.mysql.de/products/enterprise/backup.html):
Rather than dumping, downloading and importing 25 GB every day:
Create a full backup
Download and import
Use Differential or incremental backups from now.
The next day you download (and import) only the data-delta: https://dev.mysql.com/doc/mysql-enterprise-backup/4.0/en/mysqlbackup.incremental.html

best tool to take a backup from mysql

I am using mysql workbench for taking a backup/dump of my database hosted on Amazon RDS service. My database is very huge (about 8gib) and taking a 9-10 hours to download it from read-replica, mean while I am not able to see If download process is stuck or running.
Is there any GUI tool available to take a backup fast and can also give details of which process is running like which table is downloading with its row details or percentage of total download. Mysql workbench is a good tool, but It hasn't show all the options given in 'mysqldump' command utility, and It is also very slow. and I also doubt about my data integrity. can someone explain me how it's work specially with data integrity?
Thanks
First of all, your 8GB database is by no means 'huge'. Second, I'm not clear on what you're trying to do? Amazon provides multiple ways for you to have backups.
From: http://aws.amazon.com/rds/faqs/
Q: Do I need to enable backups for my DB Instance or is it done automatically?
By default and at no additional charge, Amazon RDS enables automated backups of your DB Instance with a 1 day retention period.

What are the best practices for Mysql backup

We have one php application and mysql server running on one of our production server.
Mysql server is currently 4GB big with intention to grow up to tens or even up to hundreds of GB.
What am curious to find out is what are the best practices for backup of mysql database in condition that application must be live under any circumstance? What is better, to have mysql replication server on which we will run backup scripts or to run on live server? What is more likely to slow down We have possibility to add additional server(s) if needed. Where do I need to store mysql dumps? Is it suggested to ftp copy mysql backup files to remote server.
What is the best practice to organize web application backup if don't have problem with number of server instances?
MySQL backup methods are documented on MySQL documentation.
The ideal backup solution will be to use MySQL Enterprise Backup. This is a licensed product sold on Oracle store. It is very fast compared to mysqldump.
MySQL Enterprise Backup: A licensed product that performs hot backups
of MySQL databases. It offers the most efficiency and flexibility when
backing up InnoDB tables, but can also back up MyISAM and other kinds
of tables.
If you are looking for a free solution with MySQL community edition, then you can install another replication server and either run mysqldump to take backup or make a raw data backup. During backup on your replication server, your main master database will be running. Since your data is big or will get bigger, it is recommended to backup raw data files. It is basically a process of copying data and log files from disk. Details are explained on MySQL documentation.
For larger databases, where mysqldump would be impractical or
inefficient, you can back up the raw data files instead. Using the raw
data files option also means that you can back up the binary and relay
logs that will enable you to recreate the slave in the event of a
slave failure.
Finally, you should copy backup files to another physical disk on the same to recover from disk failures or to another physical server to easily recover from complete server failures.
Replication is something that protects against hardware errors, for example, a hard disk crashed.
Backup - protects against software errors, for example, due to the human factor, data has been deleted from a table.
It is definitely good practice to combine both of these technologies by running a utility to create a backup on a replica. This not only reduces the load on the product database, but also covers more recovery scenarios.
In case of a hardware error, you can restore the most up-to-date data from the replica, and in cases of data corruption, you can already consider about from the what date to use the backup for recovery. Well, if your both the main server and the replica fail, then the backup will also save you.
What is the best way to make backups?
mysqldump is a good solution for small databases. This is a utility for creating logical backups nad it is included to MySQL Server. At the output, the utility creates a .sql file to recreate the database.
For large databases, it is better to use a physical backup. There are two ways on how to do it.
mysqlbackup is a utility included with MySQL Enterprise Solution. As a result, you get a binary file. Such a backup is created much faster than using mysqldump and is less load on the server.
xtrabackup, from Percona, is a lot like the MySQL Enterprise backup utility, but it's free. A more detailed comparison can be found here.
How often the backups should be made?
The more often you make backups, the better, but you can't make many such backups - since you will run out of space in the backup storage. There are two ways:
Find a compromise between the frequency of backups and the duration of storage.
Use incremental backups. The above utilities support incremental backups, but the management of such backups is more complicated (read more here)
Where the backups should be stored?
Anywhere you prefer, but not in the same place as the MySQL Server. Overall, I think using cloud storage is a good choice. Almost everyone today has a command line interface.
How to automate a backup?
The process of creating regular backups should be automated, and a person should intervene in it only in case of failure. A good backup process should include the following steps:
Creating a backup copy
Compression\Encryption
Uploading to storage
Sending success\fail notification
Removing old backups from the storage (so that it does not overflow)
The simplest script that implements this can be found, for example, here.
Something else?
Yes, the most important thing is not to create a backup and then restore it. Therefore, it is best practice to regularly test the recovery scenarios.
Happy backups!
What is better, to have mysql replication server on which we will run backup scripts or to run on live server
It depends on your db size (and time needed to dump it using mysqldump) and your reliability requirements.
If your db is relatively small and mysqldump dumps it in seconds or in a few minutes then its ok to just run scheduled backups. For most cases it is sufficient to have a daily backup which runs at a time when your app is mostly idle (at night when you clients are sleeping). You can use a nice tool automysqlbackup for that: it cares about the scheduling and backup rotation, all you need to do is to add it as your cron task and set up its config once.
Setting up a replica is only needed if:
Your backup takes long time (dozens of minutes or hours) to complete so you can not just stop your service for that long.
You can not afford loosing any history in case of main db crash. E.g. if you process financial transactions you may want to ensure that nothing will be lost if master db server dies.
In this cases you may want a replica with backups. Though you must understand that adding replication adds a new layer of problems: replicas may go out of sync, silently crash (and you will not notice that as the master and your app is running fine) etc.