I have a 200GB / 400M-row MySQL/InnoDB database - far beyond what's reasonable, as I found out.
One surprising problem is restoring backups. mysqldump generates huge SQL files, and they take about a week to import back into a fresh database (attempts at making it faster - bigger/smaller transactions, turning off keys during import, network compression, etc. - have all failed so far; a MyISAM import seems 2x faster, but then there would be no transactions).
What's worse - and I hope to get some help with this - a network connection which transfers >200GB over a period of a week has a non-trivial chance of breaking, and the SQL import process cannot be resumed in any trivial way.
What would be the best way of dealing with it? Right now, if I notice a broken connection, I manually try to figure out where it stopped by checking the highest primary key of the last imported table, and then I have a Perl script which basically does this:
perl -nle 'BEGIN{open F, "prelude.txt"; @a=<F>; print @a; close F;}; print if $x; $x++ if /INSERT.*last-table-name.*highest-primary-key/'
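The invocation is something like this (a sketch only; the file names are illustrative, and prelude.txt holds the SET statements from the top of the dump):

# Replay the prelude, then everything after the last INSERT that made it
# into the database, piped straight back into mysql.
perl -nle 'BEGIN{open F, "prelude.txt"; @a=<F>; print @a; close F;}; print if $x; $x++ if /INSERT.*last-table-name.*highest-primary-key/' dump.sql | mysql -u root -p dbname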
This really isn't the way to go, so what would be the best way?
Does your MySQL box have enough hard drive space for all the data doubled? Local storage would be best here, but if it's not an option, you could also try some sort of NAS device utilizing iSCSI. It's still happening over the network, but in this case you get more throughput and reliability, because you're only relying on a NAS which has a pretty slim OS and almost never has to be rebooted.
You can't really use mysqldump to back up large databases - 200GB is feasible, but it gets worse and worse as the size grows.
Your best bet is to take a volume snapshot of the database directory and zip that up somehow - that's what we've generally done - or rsync it somewhere else.
If your filesystem or block device does not support snapshots then you're basically in trouble. You can shut the db down to take a backup, but I don't imagine you want to do that.
To restore it, just do the opposite then restart and wait (possibly some time) for innodb recovery to fix things up.
The Maatkit mk-parallel-dump and mk-parallel-restore tools are a bit better than mysqldump, speed-wise - but I'm not 100% convinced of their correctness.
Edit: re-reading the question, I think filesystem snapshot + rsync is probably the best way to go; you can do this without impacting the live system too much (you'll only need to transfer what changed since the last backup), and you can resume the rsync if the connection fails - it'll continue where it left off.
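A minimal sketch of that transfer, assuming the snapshot is mounted at /mnt/snap (paths and hostname are illustrative):

# --partial keeps partially-transferred files so a dropped connection can
# resume where it left off; -z compresses over the wire.
rsync -avz --partial --progress /mnt/snap/mysql/ backuphost:/backups/mysql/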
Do you need everything in the database?
Can you push some of the information to an archive database, and add something into your application that would allow people to view records in the archive?
Obviously this depends a lot upon your application and setup, but it may be a solution? Your DB is probably only going to get bigger...
Related
I am setting up a new server, with WHM/cPanel installed.
It is very important that I can take a full MySQL backup once or twice a day.
Right now the databases are pretty small (20 MB), but volume will increase rapidly as soon as we get more customers.
I know there is a possibility to create a cron job and have the backup emailed.
However, I think that is a poor solution given the future size of these backups.
What is your best advice regarding daily MySQL backups?
You can have a cron job set up to back up your database and dump it to a directory of your choosing:
mysqldump --all-databases --skip-lock-tables | gzip -9 > /your/backup/dir/here/`date +%Y%m%d`_backup.sql.gz
What you do from there is up to you, but I agree that you should not email it due to the size. Perhaps you can use SCP to send it to a different server.
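For example, a crontab entry along these lines (schedule, paths and hostname are illustrative; note that % must be escaped as \% inside crontab) would dump nightly and push the file to another box over SSH:

# m h dom mon dow  command
30 2 * * * mysqldump --all-databases --skip-lock-tables | gzip -9 > /var/backups/mysql/$(date +\%Y\%m\%d)_backup.sql.gz && scp /var/backups/mysql/$(date +\%Y\%m\%d)_backup.sql.gz backupuser@backuphost:/srv/backups/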
A common practice for backups is as follows:
Do a daily backup.
Once the oldest daily backup reaches a week old, start deleting dailies, but keep one backup per week.
Start keeping monthly backups, deleting the weeklies once the oldest one reaches a month old.
You can continue this pattern with yearly backups as well.
The whole point of this is that you should catch any errors in your data before the backups hit the week mark. You shouldn't really need to keep the REALLY old backups for any specific purpose, so that is why you start deleting things.
Unless you have unlimited space on various disks, you would probably do best to follow this practice, and just make sure nothing happens to your data.
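A very rough sketch of the cleanup side, assuming the dumps use the YYYYMMDD naming above (directories and retention windows are illustrative):

# Keep Sunday's dump as the weekly copy, then expire dailies after a week
# and weeklies after roughly a month.
[ "$(date +%u)" = 7 ] && cp /var/backups/mysql/daily/$(date +%Y%m%d)_backup.sql.gz /var/backups/mysql/weekly/
find /var/backups/mysql/daily -name '*_backup.sql.gz' -mtime +7 -delete
find /var/backups/mysql/weekly -name '*_backup.sql.gz' -mtime +31 -delete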
Cheers.
If you think that your data will grow to a considerable size, to where making dumps will cause significant database downtime, you first ought to look into implementing a replicated setup where you can take the dumps off the slave.
You can certainly run a simple mysqldump command from within cron, placing dumps onto a drive on the machine. You can then worry about moving the data off the machine by some other means (you don't want to email it). Just keep in mind that it will lock your tables for as long as the dump takes to complete.
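If all your tables are InnoDB, one way to avoid that locking is to let mysqldump use a consistent snapshot instead (a hedged suggestion - verify the behaviour on your version):

# --single-transaction takes a consistent read snapshot rather than locking
# tables; it only gives consistent results for transactional engines like InnoDB.
mysqldump --single-transaction --all-databases | gzip -9 > /var/backups/mysql/$(date +%Y%m%d)_backup.sql.gz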
If you want to think out of the box, you could just implement a MySQL instance on Amazon RDS. It can do the replication, database snapshots, etc. with basically push-button ease.
We have a dedicated MySQL server, with about 2000 small databases on it. (It's a Drupal multi-site install - each database is one site).
When you load each site for the first time in a while, it can take up to 30s to return the first page. After that, the pages return at an acceptable speed. I've traced this through the stack to MySQL. Also, when you connect with the command line mysql client, connection is fast, then "use dbname" is slow, and then queries are fast.
My hunch is that this is due to the server not being configured correctly, and the unused dbs falling out of a cache, or something like that, but I'm not sure which cache or setting applies in this case.
One thing I have tried is innodb_buffer_pool_size. This was set to the default 8MB. I tried raising it to 512MB (the machine has ~2GB of RAM, and the additional RAM was available), as the reading I did indicated that more should give better performance, but this made the system run slower, so it's back at 8MB now.
Thanks for reading.
With 2000 databases you should adjust the table cache setting. You are certainly getting a lot of misses in this cache.
Try using mysqltuner and/or tuning-primer.sh to get more information on potential issues with your settings.
Drupal also does database-intensive work; check your Drupal installations, as you may be generating a lot (too many) of requests.
About innodb_buffer_pool_size: with a buffer that small (8MB) you are certainly getting a lot of buffer pool misses. The ideal size is when all your data and indexes fit in this buffer, and with 2000 databases... well, 8MB is quite certainly far too small, but it will be hard for you to grow it. Tuning a MySQL server is hard: if MySQL takes too much RAM, your Apache won't get enough RAM.
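As a rough illustration only (the values here are guesses to measure against, not recommendations; the variable is called table_cache on older MySQL versions and table_open_cache on newer ones), the relevant my.cnf knobs look like this:

[mysqld]
# Raise the table cache so tables from rarely-used sites stay open;
# named table_open_cache on MySQL 5.1+, table_cache before that.
table_open_cache        = 4096
# Grow the buffer pool in small steps and measure, rather than jumping
# straight to a large value on a 2GB machine shared with Apache.
innodb_buffer_pool_size = 128M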
Solutions are:
check that you do not make the connections with DNS names but with IP addresses, just in case (setting skip_name_resolve avoids the lookups entirely)
buy more RAM
set MySQL on a separate server
adjust your settings
For Drupal, try to keep sessions not in the database but in memcache (you'll need RAM for that, but it will be better for MySQL); modules for that are available. If you have Drupal 7 you can even try to put some of the cache tables in memcache instead of MySQL (do not do that with big cache tables).
Edit: one last thing, I hope you have not modified Drupal to use persistent database connections; some modules allow that (and old Drupal 5 tries to do it automatically). With 2000 databases you would kill your server. Check the MySQL error log for "too many connections" errors.
Hello Rupertj, as I read it you are using InnoDB tables, right?
InnoDB tables are a bit slower than MyISAM tables, but I don't think that is the major problem here. As you said, you are using Drupal; is that a kind of multi-site setup, like a WordPress network?
If yes, sorry to say it, but with this kind of system, each time you install a plugin or something else, it grows your database in tables and of course in data, and it can turn into something very, very slow. I have experienced this myself, not with Drupal but with a WordPress blog system, and it was a nightmare for me and my friends.
Since then, I have abandoned the project... and my only advice to you is: don't install a lot of plugins in your Drupal system.
I hope this advice helps you, because it helped me a lot with WordPress.
This sounds like a caching issue in Drupal, not MySQL. It seems there are a few very heavy queries, or many, many small ones, or both, that hammer the database server. Once that is done, Drupal caches the results in several caching layers. After that, only one (or very few) queries are needed to build up a page. Slow in the beginning, fast after that.
You will have to profile it to determine what the cause is, but the table cache seems like a likely suspect.
However, you should also be mindful of persistent connections - which should absolutely, definitely, always be turned off (yes, for everyone, not just you). Apache/PHP persistent connections are a pessimisation that you and everyone else can generally do without.
I'm trying to build a dev copy of a production MySQL database by loading one of the backups. How long should it take to do this if the uncompressed dump is ~20G?
This command has been running for something like 24h with 10% CPU load and I'm wondering if it's just slow or if it/I am doing something wrong.
mysql -u root -p < it_mysql_dump.sql
BTW, it's on a beefy desktop dev machine with plenty of RAM, but it might be reading and writing to the same HDD. I think I'm using InnoDB.
Restoring MySQL dumps can take a long time. This is because it really does rebuild the entire tables.
Exactly what you need to do to fix it depends on the engine, but in general I would say do the following:
Zeroth rule: Only use a 64-bit OS.
Make sure that you have enough physical ram to fit the biggest single table into memory; include any overhead for the OS in this calculation (NB: On operating systems that use 4k pages i.e. all of them, the page tables take up a lot of memory themselves on large-memory systems - don't forget this)
Tune innodb_buffer_pool_size so that it is bigger than the largest single table; or, if using MyISAM, tune key_buffer_size so that it is big enough to hold the indexes of the largest table.
Be patient.
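To make those bullet points concrete, here is an illustrative my.cnf fragment for a restore-only box (the values are placeholders to adapt, not recommendations; innodb_flush_log_at_trx_commit=2 trades some crash safety for import speed):

[mysqld]
# InnoDB: buffer pool bigger than the largest table, large redo logs,
# and relaxed log flushing while the bulk load runs.
innodb_buffer_pool_size        = 8G
innodb_log_file_size           = 512M
innodb_flush_log_at_trx_commit = 2
# MyISAM: key buffer big enough for the largest table's indexes.
key_buffer_size                = 2G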
Now, if you are still finding that it is slow having done the above, it may be that your particular database has a very tricky structure to restore.
Personally I've managed to rebuild a server with ~ 2Tb in < 48 hours, but that was a particular case.
Be sure that your development system has production-grade hardware if you intend to load production data into it.
In particular, if you think that you can bulk-load data into tables which don't fit into memory (or at least, mostly into memory), forget it.
If this all seems like too much, remember that you can just use a filesystem or LVM snapshot online with InnoDB, and then just copy the files. With MyISAM it's a bit trickier but can still be done.
Open another terminal, run mysql, and count the rows in some of the tables in your dump (SELECT COUNT(*) FROM table). Compare to the source database. That'll tell you the progress.
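If you can afford to restart the import, another trick is to pipe the dump through pv (assuming the pv utility is installed) so you get a progress bar:

# pv reports throughput and percent complete based on the dump's file size.
pv it_mysql_dump.sql | mysql -u root -p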
I INSERTed about 80GB of data into MySQL over a network in about 14 hours. The dump was one-insert-per-row (slow), with a good bit of overhead, inserting on a server with fast disks.
24 hours is possible if the hardware is old enough, or your import is competing with something else for disk IO and memory.
I just went through the experience of restoring a 51.8GB database from a 36.8GB mysqldump file to create an IMDb database. For me, the restore - which was done not over the network but from a file on the native machine - took a little under 4 hours.
The machine is a quad-core server running Windows Server 2008. People have wondered if there is a way to monitor progress. There actually is: you can watch the restore create the database files by going to the ProgramData directory, finding the MySQL subdirectory, and then finding the subdirectory with your database name.
The files are gradually built up in the directory, and you can watch them grow. No small comfort when you have a production issue and you are wondering if the restore job is hung or just taking a long time.
I am working on a development copy of a MySQL / InnoDB database that is about 5GB.
I need to restore it pretty frequently for testing alter scripts, and using the mysqldump file takes quite a while. The file itself is about 900MB and takes around an hour to load. I've removed the data inserts for unimportant tables, used extended inserts, etc., but it's still pretty slow.
Is there a faster way to do this? I'm thinking of just making a copy of the database files from .../mysql/database-name, plus the ib_logfile* files and ibdata1, and copying them back in when I need to 'reset' the db, but is this viable with InnoDB? Is the ibdata file for one database? I only see one, even though I have multiple InnoDB dbs on this box.
Thanks!
I believe you can just copy the file named for the database (with the server daemon down) and all should be well. It seems like something that a little testing on a sample db should answer quickly, no?
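A rough sketch of that test on a Linux box (paths and service name may differ; note that with InnoDB you must also copy ibdata1 and the ib_logfile* files, since by default data for all databases lives in the shared tablespace):

# Take the cold copy once, after the daemon has fully stopped.
sudo service mysql stop
sudo cp -a /var/lib/mysql /var/lib/mysql.pristine
sudo service mysql start

# Later, to 'reset' the database to that state:
sudo service mysql stop
sudo rsync -a --delete /var/lib/mysql.pristine/ /var/lib/mysql/
sudo service mysql start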
I have no idea if it would be faster -- it might be slower, in fact, depending on how extensively your tests alter the data --, but what if you put all your tests in a transaction, and then rolled it back?
Use an LVM snapshot of the disk. Load your data (lengthy process) and take a snapshot (seconds). Then, whenever you need to go back to the snapshot, play LVM games -- again only seconds to re-load and re-clone any sized disk.
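A hedged sketch of that workflow (the volume group and LV names are illustrative, and lvconvert --merge needs a reasonably recent LVM2; if the origin volume is in use, the merge is deferred until it is next activated, so stop MySQL and unmount first):

# After loading the data, snapshot the volume holding the datadir.
lvcreate --size 5G --snapshot --name mysql-snap /dev/vg0/mysql-data

# To reset: stop MySQL, merge the snapshot back, re-create it, restart.
service mysql stop
lvconvert --merge /dev/vg0/mysql-snap
lvcreate --size 5G --snapshot --name mysql-snap /dev/vg0/mysql-data
service mysql start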
You might be able to do something with http://dev.mysql.com/doc/refman/5.1/en/multiple-tablespaces.html
See the ALTER TABLE xxx IMPORT TABLESPACE option especially.
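A sketch of that approach (it requires innodb_file_per_table to have been enabled when the tables were created; the table name is illustrative):

-- Drop the current tablespace, swap in the saved .ibd file, then re-import.
ALTER TABLE mytable DISCARD TABLESPACE;
-- (copy the saved mytable.ibd into the database directory at this point)
ALTER TABLE mytable IMPORT TABLESPACE;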
For InnoDB backup and restore speed you might want to look at mydumper/myloader - much faster than mysqldump due to their multi-threaded nature: http://vbtechsupport.com/1716/
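Typical usage looks something like this (thread count and directory are illustrative; check the options for your version):

# Dump with 4 parallel threads, then reload with 4 threads.
mydumper --outputdir /backups/dump --threads 4
myloader --directory /backups/dump --threads 4 --overwrite-tables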
What's the fastest way to export/import a mysql database using innodb tables?
I have a production database which I periodically need to download to my development machine to debug customer issues. The way we currently do this is to download our regular database backups, which are generated using "mysqldump -B dbname" and then gzipped. We then import them using "gunzip -c backup.gz | mysql -u root".
From what I can tell from reading "mysqldump --help", mysqldump runs with --opt by default, which looks like it turns on a bunch of the things I can think of that would make imports faster, such as turning off indexes during the import and loading each table's rows as massive multi-row INSERT statements.
Are there better ways to do this, or further optimizations we should be doing?
Note: I mostly want to optimize the time it takes to load the database onto my development machine (a relatively recent MacBook Pro with lots of RAM). Backup time and network transfer time currently aren't big issues.
Update:
To answer some questions posed in the answers:
The production database schema changes up to a couple times a week. We're running rails, so it's relatively easy to run the migrate scripts on stale production data.
We need to put production data into a development environment potentially on a daily or hourly basis. This entirely depends on what a developer is working on. We often have specific customer issues that are the result of some data spread across a number of tables in the db, which needs to be debugged in a development environment.
I honestly don't know how long mysqldump takes. Less than 2 hours, since we currently run it every 2 hours. However, that's not what we're trying to optimize, we want to optimize the import onto the developer workstation.
We don't need the full production database, but it's not totally trivial to separate what we do and don't need (there are a lot of tables with foreign key relationships). This is probably where we'll have to go eventually, but we'd like to avoid it for a bit longer if we can.
It depends on how you define "fastest".
As Joel says, developer time is expensive. Mysqldump works and handles a lot of cases you'd otherwise have to handle yourself or spend time evaluating other products to see if they handle them.
The pertinent questions are:
How often does your production database schema change?
Note: I'm referring to adding, removing or renaming tables, columns, views and the like, i.e. things that will break actual code.
How often do you need to put production data into a development environment?
In my experience, not very often at all. I've generally found that once a month is more than sufficient.
How long does mysqldump take?
If it's less than 8 hours it can be done overnight as a cron job. Problem solved.
Do you need all the data?
Another way to optimize this is to simply get a relevant subset of data. Of course this requires a custom script to be written to get a subset of entities and all relevant related entities but will yield the quickest end result. The script will also need to be maintained through schema changes so this is a time-consuming approach that should be used as an absolute last resort. Production samples should be large enough to include a sufficiently broad sample of data and identify any potential performance problems.
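If you do go down that road, mysqldump's --where option is one building block for such a script (the table names and condition here are hypothetical):

# Dump only the rows for one customer from a couple of related tables;
# a real script would walk the foreign keys to pull in everything needed.
mysqldump dbname orders --where="customer_id=1234" > subset.sql
mysqldump dbname order_items --where="order_id IN (SELECT id FROM orders WHERE customer_id=1234)" >> subset.sql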
Conclusion
Basically, just use mysqldump until you absolutely can't. Spending time on another solution is time not spent developing.
Consider using replication. That would allow you to update your copy in real time, and MySQL replication allows for catching up even if you have to shut down the slave. You could also use a parallel MySQL instance on your normal server that replicates the data to MyISAM tables, which support online backup. MySQL allows for this as long as the tables have the same definition.
Another option that might be worth looking into is XtraBackup from renowned MySQL performance specialists Percona. It's an online backup solution for InnoDB. I haven't looked at it myself, though, so I won't vouch for its stability or that it's even a workable solution for your problem.