WHM / cPanel: mySQL back-up - mysql

I am setting a new server, with WHM / cPanel installed.
It is very important that I can take a full mySQL back-up once or twice a day.
Right now the databases are pretty small (20 mb), but volume will increase rapidly as soon s we get more customers.
I now there is a possibility to create a cron job and have the back-up emailed.
However, I think it is a shitty solution due to the future size of these back-up's.
What are your best advices regarding daily mySQL back-ups?

You can have a cron job set up to back up your database and dump it to a directory of your choosing:
mysqldump --all-databases --skip-lock-tables | gzip -9 > /your/backup/dir/here/`date +%Y%m%d`_backup.sql.gz
What you do from there is up to you, but I agree that you should not email it due to the size. Perhaps you can use SCP to send it to a different server.

A common practice for backups is as follows
Do a daily backup.
Once the first daily reaches a week old, start deleting things, but keep the one that hit a week old.
Start keeping monthly backups, deleting all weekly once the first one hits a month old.
You can continue to do this with yearly as well.
The whole point of this is that you should catch any errors in information before your backups hit the week mark. You shouldn't really need to keep the REALLY old backups for any specific purpose, so that is why you start deleting things.
Unless you have unlimited space on various disks, you would probably be best to follow this practice, and just make sure nothing happens to your data.
Cheers.

If you think that your data will grow to considerable size to where it will cause significant database downtime to make dumps, you first ought to look into implementing a replicated set up to where you can take the dumps off the slave.
You can certainly run a simple mysqldump command from within cron, placing dumps into a drive on the machine. You can then worry about moving the data off the machine by some other means (you don't want to email it). Just keep in mind that it will lock your database up briefly for as long as your data dump takes to complete.
If you want to think out of the box, you could just implement a MySQL instance on Amazon RDS. It can do the replication, database snapshots, etc. with basically push-button ease.

Related

Should I use mysql to keep logs, or just dump to a text file

I am creating a site which will make lots of searches and I need to log data about every search that is made for later analysis.
I anticipate ultimately having load distributed between a number of servers, then each month I will download and import all logs into a single mysql database at my end for analysis.
At the moment I've been looking at setting every server up as a mysql 'master' which will live update the slave analysis server and essentially also act as a backup.
However I'm aiming for efficiency. Obviously the benefits of mysql replication are that I always have logs centrally available and don't have to import and reset log files on each server every month.
How much more efficient would it be to log in a plaintext file and just dump this logfile every month and import into mysql centrally? Is a plaintext dump much, if any, more efficient/faster than mysql?
Thanks for your thoughts!
Databases are strong for doing more than inserts. They are strong for locking mechanisms, transaction management, fast searches, connections pooling, and the list goes on.
On the other hand, if all you need to do in general is writing a chunk of data to the disk, a database would be a huge overhead.
Given the above, and since you only want to write stuff all month long, I would recommend you use logs, and once a month - take the logs, merge them together and analyze them. You could then decide if you want to merge all of them into a database (if it makes sense and gives you some added value), or you just want to merge the text together.
BTW, you may want to save the INSERT statements into this log, and then use it as a script to load everything into the database. Give it a thought :-)

SQL Server 2008 backup best practices

Without not reason, I lose all my data in my database. Fortunately this was just test data, but this made me to think what will happen if this was done with a production db.
Ultimately, every developer got a db problem and want to rollback the db. We don't do things to protect the db, as we think its a DBA work, but then we got into trouble...
What are your backup best practices?
Since all the developers are also the DBAs where I work, we're collectively responsible for our backup strategy as well - if you care about the data, make sure you're at least informed about how the backups work, even if you're not part of the actual decisions.
The VERY first thing I do (before I even have any databases set up) is set up nightly maintenance plans that include a full backup, and direct those backups to a central network share on a different computer (our NAS). At the very least, for the love of your job, don't put the backups on the same physical storage that your database files sit on. What good are backups if you lose them at the same time you lose the disk?
We don't do point-in-time restores, so we don't do log backups (all our databases are set to Simple recovery mode), but if you want logs backed up, make sure you include those as well, as an acceptable interval as well.
On a side note, SQL 2008 supports compressed backups, which speeds up backup time considerably and makes the files much, much smaller - I can't think of an instance where you wouldn't want to use this option. I'd love to hear one, though, and I'm willing to reconsider!
Here are some points from my own experience:
Store your backups and SQL Server database files on the different physical storages. Otherwise, if your physical storage failed you will lose both backups and database files.
Create your own SQL Server database backup schedule.
Test your SQL Server backup. If you never tested your backups I have a doubt that you will be able to restore your database if the failure occurs. Time to time you need to have the practice to restore your backup on a test server.
Test your recovery strategy - here is another tip. If the failure occurs how much time do you need to restore your database to the working state?
Backup SQL Server's system databases.
And this isn't a whole list, you can find more tips in my article https://sqlbak.com/blog/backup-and-recovery-best-practices/
Choosing the right backup strategy is one of the most important factors a DBA should consider at the point of developing the DB.
However the backup strategy you choose depends on a number of factors:
How frequent is transaction carried out on the DB: are there thousands of transactions going on every minute or maybe few transactions per day? For very busy database, i would say, take full nightly backups and transaction log backups every 10 mins or even less.
How critical the data content is: may it be employee payroll data? Then you'll have no acceptable excuse or you may have a few angry faces around your car when you want to drive home! For very critical database, take nightly backups possibly in two locations and transaction log backups every 5 mins. (Also think of implementing mirroring).
Location of your backup location: if your backup location is close to the DB location, then you can afford to take more frequent backup than when it is located several hops away with not excellent bandwidth between.
But in all, i would say, schedule a nighly backup every night, and then transaction log backups at intervals.

Reliable backups for huge mysql databases?

I have 200GB / 400Mrows mysql/innodb database - far beyond what's reasonable as I found out.
One surprising problem is restoring backups. mysqldump generates huge sql files, and they take about a week to import back into a fresh database (attempts at making it faster like bigger/smaller transactions, turning off keys during import etc., network compression etc. failed so far, myisam import seems 2x faster but then there would be no transactions).
What's worse - and I hope to get some help with this - a network connection which transfers >200GB over a time period of a week has a non-trivial chance of breaking, and sql import process cannot be continued in any non-trivial way.
What would be the best way of dealing with it? Right now if I notice a broken connection I manually try to figure out when it ended by checking highest primary key of the last imported table, and then have a perlscript which basically does this:
perl -nle 'BEGIN{open F, "prelude.txt"; #a=<F>; print #a; close F;}; print if $x; $x++ if /INSERT.*last-table-name.*highest-primary-key/'
This really isn't the way to go, so what would be the best way?
Does your MySQL box have enough hard drive space for all the data doubled? Local storage would be best here, but if it's not an option, you could also try some sort of NAS device utilizing iSCSI. It's still happening over the network, but in this case you get more throughput and reliability, because you're only relying on a NAS which has a pretty slim OS and almost never has to be rebooted.
You can't use mysqldump to backup large databases - 200G is feasible but bigger ones it gets worse and worse.
Your best bet is to take a volume snapshot of the database directory and zip that up somehow - that's what we've generally done - or rsync it somewhere else.
If your filesystem or block device does not support snapshots then you're basically in trouble. You can shut the db down to take a backup, but I don't imagine you want to do that.
To restore it, just do the opposite then restart and wait (possibly some time) for innodb recovery to fix things up.
The maatkit mk-parallel-dump and restore tools are a bit better than mysqldump, speed-wise - but I'm not 100% convinced of their correctness
Edit: re-reading the question, I think filesystem snapshot + rsync is probably the best way to go; you can do this without impacting the live system (you'll only need to transfer what changed since the last backup) too much and you can resume the rsync if the connection fails, and it'll continue where it left off.
Do you need everything in the database?
Can you push some of the information to an archive database and add something into your application that would allow people to view records in the archive,
Obviously this depends a lot upon your application and set up, but it may be a solution? Your DB is probably only going to get bigger....

What's the fastest way to import a large mysql database backup?

What's the fastest way to export/import a mysql database using innodb tables?
I have a production database which I periodically need to download to my development machine to debug customer issues. The way we currently do this is to download our regular database backups, which are generated using "mysql -B dbname" and then gzipped. We then import them using "gunzip -c backup.gz | mysql -u root".
From what I can tell from reading "mysqldump --help", mysqldump runs wtih --opt by default, which looks like it turns on a bunch of the things that I can think of that would make imports faster, such as turning off indexes and importing tables as one massive import statement.
Are there better ways to do this, or further optimizations we should be doing?
Note: I mostly want to optimize the time it takes to load the database onto my development machine (a relatively recent macbook pro, with lots of ram). Backup time and network transfer time currently aren't big issues.
Update:
To answer some questions posed in the answers:
The production database schema changes up to a couple times a week. We're running rails, so it's relatively easy to run the migrate scripts on stale production data.
We need to put production data into a development environment potentially on a daily or hourly basis. This entirely depends on what a developer is working on. We often have specific customer issues that are the result of some data spread across a number of tables in the db, which needs to be debugged in a development environment.
I honestly don't know how long mysqldump takes. Less than 2 hours, since we currently run it every 2 hours. However, that's not what we're trying to optimize, we want to optimize the import onto the developer workstation.
We don't need the full production database, but it's not totally trivial to separate what we do and don't need (there are a lot of tables with foreign key relationships). This is probably where we'll have to go eventually, but we'd like to avoid it for a bit longer if we can.
It depends on how you define "fastest".
As Joel says, developer time is expensive. Mysqldump works and handles a lot of cases you'd otherwise have to handle yourself or spend time evaluating other products to see if they handle them.
The pertinent questions are:
How often does your production database schema change?
Note: I'm referring to adding, removing or renaming tables, columns, views and the like ie things that will break actual code.
How often do you need to put production data into a development environment?
In my experience, not very often at all. I've generally found that once a month is more than sufficient.
How long does mysqldump take?
If it's less than 8 hours it can be done overnight as a cron job. Problem solved.
Do you need all the data?
Another way to optimize this is to simply get a relevant subset of data. Of course this requires a custom script to be written to get a subset of entities and all relevant related entities but will yield the quickest end result. The script will also need to be maintained through schema changes so this is a time-consuming approach that should be used as an absolute last resort. Production samples should be large enough to include a sufficiently broad sample of data and identify any potential performance problems.
Conclusion
Basically, just use mysqldump until you absolutely can't. Spending time on another solution is time not spent developing.
Consider using replication. That would allow you to update your copy in real time, and MySQL replication allows for catching up even if you have to shut down the slave. You could also use a parallell MySQL instance on your normal server that replicates the data to a MyISAM table, which supports online backup. MySQL allows for this as long as the tables have the same definition.
Another option that might be worth looking into is XtraBackup from renowned MySQL performance specialists Percona. It's an online backup solution for InnoDB. Haven't looked at it myself, though, so I won't vouch for it's stability or that it's even a workable solution for your problem.

Full complete MySQL database replication? Ideas? What do people do?

Currently I have two Linux servers running MySQL, one sitting on a rack right next to me under a 10 Mbit/s upload pipe (main server) and another some couple of miles away on a 3 Mbit/s upload pipe (mirror).
I want to be able to replicate data on both servers continuously, but have run into several roadblocks. One of them being, under MySQL master/slave configurations, every now and then, some statements drop (!), meaning; some people logging on to the mirror URL don't see data that I know is on the main server and vice versa. Let's say this happens on a meaningful block of data once every month, so I can live with it and assume it's a "lost packet" issue (i.e., god knows, but we'll compensate).
The other most important (and annoying) recurring issue is that, when for some reason we do a major upload or update (or reboot) on one end and have to sever the link, then LOAD DATA FROM MASTER doesn't work and I have to manually dump on one end and upload on the other, quite a task nowadays moving some .5 TB worth of data.
Is there software for this? I know MySQL (the "corporation") offers this as a VERY expensive service (full database replication). What do people out there do? The way it's structured, we run an automatic failover where if one server is not up, then the main URL just resolves to the other server.
We at Percona offer free tools to detect discrepancies between master and server, and to get them back in sync by re-applying minimal changes.
pt-table-checksum
pt-table-sync
GoldenGate is a very good solution, but probably as expensive as the MySQL replicator.
It basically tails the journal, and applies changes based on what's committed. They support bi-directional replication (a hard task), and replication between heterogenous systems.
Since they work by processing the journal file, they can do large-scale distributed replication without affecting performance on the source machine(s).
I have never seen dropped statements but there is a bug where network problems could cause relay log corruption. Make sure you dont run mysql without this fix.
Documented in the 5.0.56, 5.1.24, and 6.0.5 changelogs as follows:
Network timeouts between the master and the slave could result
in corruption of the relay log.
http://bugs.mysql.com/bug.php?id=26489