What is the best strategy to back up 2 TB of MySQL data, and how often should the backup be scheduled?
I am currently using replication as a backup strategy, but I know that's not good practice.
Please note: I am new to MySQL servers, so this question may sound very basic to experienced users, but I am trying to learn.
Thanks.
Size matters mostly in that every operation takes longer; there's no getting around that. Otherwise, a lot of backup strategy remains the same.
First off, replication is not a backup. It's for availability and scalability. Replication (even with a delayed slave apply) is at best a single rolling snapshot: once a bad UPDATE/DELETE/TRUNCATE is replicated, the data is gone.
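If you do keep a replica as part of your safety net, a delayed replica at least gives you a window in which to catch a bad statement before it is applied. A minimal sketch, assuming MySQL 5.6+ and an already-running replica (the host and the one-hour delay are just examples):

```bash
# On the replica: hold back the SQL thread so events apply no sooner than
# one hour after the master executed them (MASTER_DELAY needs MySQL 5.6+).
mysql -u root -p -e "
  STOP SLAVE SQL_THREAD;
  CHANGE MASTER TO MASTER_DELAY = 3600;
  START SLAVE SQL_THREAD;
"
```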
Your "best strategy" depends on several factors:
- Recovery Time Objective (how fast you need to restore).
- Recovery Point Objective (to what point in time to restore).
- Many small databases? One 2 TB database?
- How much money you have to spend on resources.
- Whether you are held to regulatory requirements to be able to restore data for 1, 3, 7+ years.
A physical backup using Percona XtraBackup can take a point-in-time snapshot of all databases on your server (the caveat being non-transactional tables, e.g. those using the MyISAM engine).
A logical backup with mysqldump may be faster to take, smaller, and compress better, but on restore it has to rebuild indexes, so restoring may take longer.
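For reference, a rough sketch of what each approach looks like on the command line (paths, user names and passwords are placeholders; check the flags against the versions you run):

```bash
# Physical backup with Percona XtraBackup (2.4+ command names):
xtrabackup --backup --user=backup --password='***' --target-dir=/backups/full-$(date +%F)
xtrabackup --prepare --target-dir=/backups/full-$(date +%F)   # make the copy consistent/restorable

# Logical backup with mysqldump (one consistent snapshot for InnoDB tables):
mysqldump --single-transaction --routines --triggers --all-databases \
  | gzip > /backups/logical-$(date +%F).sql.gz
```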
So... in a perfect situation, take regular physical and logical backups. Take continuous backups of the binary logs (https://www.percona.com/blog/2012/01/18/backing-up-binary-log-files-with-mysqlbinlog/). As long as your slave is up to date, you can take your backups there, so as not to impact your master. To determine your backup frequency, restore a backup and time how long it takes to apply one week of binary logs. Did you meet your Recovery Time Objective? If not, more frequent backups are needed.
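The continuous binary log backup described in that Percona post boils down to streaming the logs with mysqlbinlog. A sketch, assuming MySQL 5.6+ and a user with the REPLICATION SLAVE privilege (host, user and starting log file are placeholders):

```bash
# Stream binary logs from the server to the backup host; restart this (e.g. from a
# supervisor or cron check) if the connection drops.
cd /backups/binlogs
mysqlbinlog --read-from-remote-server --host=db-master --user=binlog_backup --password \
            --raw --stop-never mysql-bin.000001
# Point-in-time recovery = restore the last full backup, then replay these files
# with mysqlbinlog ... | mysql up to just before the bad statement.
```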
Also, hang out at https://dba.stackexchange.com to get some more insight into these operational challenges of owning a database :)
Related
Let's say that I have a 100 GB RDS database. On Monday I created a manual snapshot of that database. On Friday I did the same.
I understand that the first snapshot will have the same size as the original DB. What about the second one? Will it also contain all the data, or will it only contain the changes made since Monday?
In other words: are manual snapshots more expensive than automated ones (which I believe are stored incrementally)?
TL;DR: Use automated backups. Only use manual snapshots for long-term retention.
What follows is unfortunately not exact science, only speculation.
RDS backups are AWS secret sauce. However, automated backups seem to leverage MySQL binlogs (hence why binlogs can't be disabled) or Postgres WALs, and possibly LVM snapshots or other technology for other RDS engine types, but we'll never know for sure. Rumour also has it that they are compressed (why wouldn't you, if CPU is cheap but storage isn't?).
Generally, manual snapshots will be more expensive, because they don't have a previous point in time to compare against, so they are probably fulls (an EBS-style "safe" snapshot whose files are then probably shipped to S3 for cold storage). They also cost more simply because you necessarily use more storage on top of the automated backups. However, if you turn off automated backups to manage snapshots manually, you will probably end up paying more (can't confirm) unless they compress well. Even if you could save a few dollars with manual snapshots, the time spent managing them is probably not worth the effort or the risk, unless you plan a weekly/monthly/yearly rotation, which will inevitably cost you more in the long run anyway. FWIW, what we do with our RDS instances is: 7 days of automated backups plus a Lambda managing weeklies/monthlies/yearlies (with automatic clean-up/rotation). Yes, it costs a fair bit of extra $$ on top.
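As a rough illustration of what that weekly/monthly rotation amounts to, here are the AWS CLI equivalents of the calls such a Lambda would make through the SDK (the instance name, snapshot naming scheme and retention are made up for the example):

```bash
# Take a weekly manual snapshot of a hypothetical instance "prod-db".
aws rds create-db-snapshot \
    --db-instance-identifier prod-db \
    --db-snapshot-identifier prod-db-weekly-$(date +%Y%m%d)

# List existing manual snapshots for that instance; a rotation job would loop over this
# and call `aws rds delete-db-snapshot --db-snapshot-identifier <name>` for expired ones.
aws rds describe-db-snapshots \
    --db-instance-identifier prod-db \
    --snapshot-type manual \
    --query 'DBSnapshots[].DBSnapshotIdentifier'
```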
As you know, backups are "free" as long as the total amount of backups and snapshots is less than or equal to the total amount of provisioned RDS storage (all DBs combined); see the RDS pricing documentation. Unfortunately, there is no way to see how much storage your RDS snapshots/backups are using, and therefore how far you are from paying something.
I hope the above (speculation) will somehow comfort you in your thoughts.
I have an application that requires a master catalogue of about 30 tables to be copied out to many (100+) slave copies of the application. Slaves may be in their own DB instance, or there may be multiple slaves in a single DB instance. Any changes to the master catalogue need to be copied out to the slaves within a reasonable time, around 5 minutes. Our infrastructure is all AWS EC2 and we use MySQL. Master and slaves will all reside within a single AWS region.
I had planned to use master-slave replication, but I see reports of MySQL replication sometimes being unreliable, and I am not sure whether this is due to failings inherent in particular implementations or failings in MySQL itself. We need a highly automated and reliable system, and it may be that we have to develop monitoring scripts that allow a slave to continuously monitor its catalogue relative to the master.
Any observations?
When I was taking dance lessons before my wedding, the instructor said, "You don't have to do every step perfectly, you just have to learn to recover gracefully when missteps happen. If you can do that quickly, with a smile on your face, no one will notice."
If you have 100+ replicas, expect that you will be reinitializing replicas frequently, probably at least one or two every day. This is normal.
All software has bugs. Expecting anything different is, frankly, naive. Don't expect software to be flawless and continue operating 24/7 indefinitely without errors, because you will be disappointed. You should not seek a perfect solution, you should think like a dancer and recover gracefully.
MySQL replication is reasonably stable, and no less so than other solutions. But there are a variety of failures that can happen without it being MySQL's fault:
- Binlogs can develop corrupted packets in transit due to network glitches. MySQL 5.6 introduced binlog checksums to detect this.
- The master instance can crash and fail to write an event to the binlog. sync_binlog can help ensure all transactions are written to the binlog on commit (though with some overhead per transaction).
- Replica data can fall out of sync due to non-deterministic SQL statements, packet corruption, log corruption on disk, or a user changing data directly on a replica. Percona's pt-table-checksum can detect this, and pt-table-sync can correct the errors (see the sketch after this list). Using binlog_format=ROW reduces the chance of non-deterministic changes. Setting the replicas read-only helps, and don't give users the SUPER privilege.
- Resources can run out. For example, you could fill up the disk on the master or the replica.
- Replicas can fall behind if they can't keep up with the changes on the master. Make sure your replica instances are not under-powered. Use binlog_format=ROW. Write fewer changes to an individual MySQL master. MySQL 5.6 introduced multi-threaded replicas, but so far I've seen some cases where this is still a bit buggy, so test carefully.
- Replicas can be offline for an extended time, and when they come back online, some of the master's binlogs may have been expired, so the replica can't replay a continuous stream of events from where it left off. In that case, you should trash the replica and reinitialize it.
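To detect and repair the kind of replica drift described above, the Percona Toolkit invocations look roughly like this (hosts, user and password are placeholders; stick with --print until you trust the output):

```bash
# Run against the master: checksums each table in chunks and replicates the results,
# so any differences become visible on the replicas.
pt-table-checksum --replicate=percona.checksums h=master-host,u=checksum_user,p='***'

# Run for a replica that showed differences: print the statements that would bring it
# back in sync with its master (replace --print with --execute to apply them).
pt-table-sync --print --replicate percona.checksums --sync-to-master \
    h=replica-host,u=checksum_user,p='***'
```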
Bugs happen in any software project, and MySQL's replication has had its share. You should keep reading the MySQL release notes, and be prepared to upgrade to take advantage of bug fixes.
Managing a big collection of database servers in continuous operation takes a significant amount of full-time work, no matter what brand of database you use. But data has become the lifeblood of most businesses, so it's necessary to manage this resource. MySQL is no better and no worse than any other brand of database, and if anyone tells you something different, they're selling something.
P.S.: I'd like to hear why you think you need 100+ replicas in a single AWS region, because that is probably overkill by an order of magnitude for any goal of high availability or scaling.
We have to design an SQL Server 2008 R2 database storing many varbinary blobs.
Each blob will be around 40 KB, and there will be around 700,000 additional entries a day.
The estimated maximum size of the database is 25 TB (after 30 months).
The blobs will never change. They will only be stored and retrieved.
The blobs will be either deleted the same day they are added, or only during cleanup after 30 months. In between there will be no change.
Of course we will need table partitioning, but the general question is, what do we need to consider during implementation for a functioning backup (to tape) and restore strategy?
Thanks for any recommendations!
Take a look at "piecemeal backup and restore" - you will find it very useful for your scenario, which would benefit from different backup schedules for different filegroups/partitions. Here are a couple of articles to get you started:
http://msdn.microsoft.com/en-us/library/ms177425(v=sql.120).aspx
http://msdn.microsoft.com/en-us/library/dn387567(v=sql.120).aspx
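As a very rough sketch of what filegroup-level backups look like (the database and filegroup names are hypothetical; your partitioning scheme dictates the real ones), the idea is to back up the active filegroups frequently and each closed, read-only filegroup once:

```bash
sqlcmd -S myserver -E <<'SQL'
-- Frequent backup of the filegroups that still receive writes.
BACKUP DATABASE BlobStore
  FILEGROUP = 'PRIMARY', FILEGROUP = 'FG_CURRENT'
TO DISK = 'X:\backups\BlobStore_active.bak'
WITH COMPRESSION;
GO
-- One-off backup of a closed, read-only archive filegroup.
BACKUP DATABASE BlobStore
  FILEGROUP = 'FG_2014_ARCHIVE'
TO DISK = 'X:\backups\BlobStore_FG_2014.bak'
WITH COMPRESSION;
GO
SQL
```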
I have had the pleasure in the past of working with several very large databases, the largest environment I have worked with being in the 5+ TB range. Going even larger than that, I am sure that you will encounter some unique challenges that I may not have faced.
What I can say for sure is that any backup strategy you implement is going to take a while, so you should plan to have at least one day a week devoted to backups and maintenance, during which the database, while available, should not be expected to perform at its usual level.
Second, I have found the following MVP article to be extremely useful in planning backups taken through the native MSSQL backup operations. There are some large-database-specific options for the backup command that can help reduce your backup duration. While these increase throughput, you can expect some performance impact. Specifically, the options that had the greatest impact in my testing are BUFFERCOUNT, BLOCKSIZE, and MAXTRANSFERSIZE.
http://henkvandervalk.com/how-to-increase-sql-database-full-backup-speed-using-compression-and-solid-state-disks
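For illustration, those options go in the WITH clause of the native backup command; the values below are only a starting point to test against your own storage (the database name and paths are placeholders):

```bash
sqlcmd -S myserver -E <<'SQL'
BACKUP DATABASE BlobStore
TO DISK = 'X:\backups\BlobStore_full_1.bak',
   DISK = 'X:\backups\BlobStore_full_2.bak'  -- striping across files also raises throughput
WITH COMPRESSION,
     BUFFERCOUNT = 50,
     BLOCKSIZE = 65536,
     MAXTRANSFERSIZE = 4194304;
GO
SQL
```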
Additionally, assuming your data is stored on a SAN, you may wish, as an alternative, to investigate SAN-level tools for your backup strategy. Some SAN vendors provide software that integrates with SQL Server to perform SAN-style snapshot backups while still integrating with the engine to handle things like marking backup dates and forwarding LSN values.
Based on your statement that the majority of the data will not change over time, including differential backups seems like a very useful option, allowing you to reduce the number of transaction log backups that would have to be restored in a recovery scenario.
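A differential backup is the same command with one extra option; again, the names are placeholders:

```bash
sqlcmd -S myserver -E <<'SQL'
-- Backs up only the extents changed since the last full backup.
BACKUP DATABASE BlobStore
TO DISK = 'X:\backups\BlobStore_diff.bak'
WITH DIFFERENTIAL, COMPRESSION;
GO
SQL
```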
Please feel free to get in touch with me directly if you would like to discuss further.
We have to demonstrate the advantages of using replication. We have two computers, linked by TeamViewer, so we can show our class exactly what we are doing.
Is it possible to show a difference in performance? (E.g. how long it takes to execute certain queries?)
What sort of queries should we test? (In other words, where is the difference between using and not using replication the biggest?)
How should we fill our database? How much data should be there?
Thanks a lot!
I guess the answer to the above questions depends on factors such as which storage engine you are using, the size of the database, and your chosen replication architecture.
I don't think replication will have much of an impact on query execution for a simple master -> slave architecture. If, however, you have an architecture with two masters, one handling writes and replicating to another that exclusively handles reads, which in turn replicates to a slave that handles backups, then you are far more likely to be able to present some of the more positive scenarios. Read up on locks and storage engines, as this might influence your choices.
One simple way to show how replication can be a positive is to demonstrate a simple backup strategy. For example, taking hourly backups on the master itself can bring the underlying application to a complete halt for the duration of the backup (taking backups using mysqldump can lock the tables so that no read/write operations can occur), whereas replicating to a slave and then taking backups there negates this effect.
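A minimal way to make that visible in class (host names and credentials are placeholders; --lock-all-tables is used deliberately so the stall on the master is obvious):

```bash
# Against the master: the global lock makes a concurrent write workload stall visibly.
mysqldump --host=master-host -u demo -p --lock-all-tables --all-databases > /tmp/dump_from_master.sql

# Against the slave: the master keeps serving reads and writes while the dump runs.
mysqldump --host=slave-host -u demo -p --lock-all-tables --all-databases > /tmp/dump_from_slave.sql
```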
If you want to show detailed statistics, it's probably better to look into some benchmarking/profiling tools (sysbench, mysqlslap, sql-bench, to name a few). This can become quite complex, though.
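As a starting point, mysqlslap (shipped with MySQL) can generate a repeatable read-heavy load that you can point first at the master and then at the read slave and compare the timings (the flags below are standard, but tune the numbers for your data set):

```bash
mysqlslap --host=slave-host -u demo -p \
  --auto-generate-sql --auto-generate-sql-load-type=read \
  --concurrency=25 --iterations=5 --number-of-queries=1000
```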
Also might be worth looking at the Percona Toolkit and the Percona monitoring plugins here: http://www.percona.com/software/
Replication has several advantages:
- Robustness is increased with a master/slave setup. In the event of problems with the master, you can switch to the slave as a backup.
- Better response time for clients can be achieved by splitting the load for processing client queries between the master and slave servers.
- You can perform database backups using a slave server without disturbing the master.
Using replication is always a safe thing to do; you should always be replicating your production server, because in case of failure it will be a great help.
You can show the Seconds_Behind_Master value when demonstrating replication performance; it gives an indication of how "late" the slave is. This value should not be more than 600-800 seconds, but network latency does matter here.
Make sure that the master and slave servers are configured correctly first.
You can stop the slave, let the master take some updates/inserts (bulk inserts), and then start the slave again: you will see a large Seconds_Behind_Master value that keeps decreasing until it reaches 0.
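A minimal sketch of that demonstration (the demo schema/table and the hosts are hypothetical; run each command on the host indicated):

```bash
# On the slave: pause replication.
mysql -h slave-host -u demo -p -e "STOP SLAVE;"

# On the master: generate a burst of writes into a throwaway table.
mysql -h master-host -u demo -p -e "
  CREATE TABLE IF NOT EXISTS demo.t (id INT AUTO_INCREMENT PRIMARY KEY, payload CHAR(100));
  INSERT INTO demo.t (payload)
    SELECT REPEAT('x', 100) FROM information_schema.columns a, information_schema.columns b
    LIMIT 50000;
"

# On the slave: resume and watch Seconds_Behind_Master shrink back to 0.
mysql -h slave-host -u demo -p -e "START SLAVE; SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master
```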
There is a tool called MONyog (MySQL Monitor and Advisor) which shows replication status in real time.
Also, whether to use statement-based or row-based replication is explained here:
http://dev.mysql.com/doc/refman/5.1/en/replication-sbr-rbr.html
For no apparent reason, I lost all the data in my database. Fortunately it was just test data, but it made me think about what would happen if this were to happen to a production DB.
Ultimately, every developer runs into a DB problem and wants to roll the DB back. We don't do much to protect the DB, as we think that's the DBA's job, but then we get into trouble...
What are your backup best practices?
Since all the developers are also the DBAs where I work, we're collectively responsible for our backup strategy as well - if you care about the data, make sure you're at least informed about how the backups work, even if you're not part of the actual decisions.
The VERY first thing I do (before I even have any databases set up) is set up nightly maintenance plans that include a full backup, and direct those backups to a central network share on a different computer (our NAS). At the very least, for the love of your job, don't put the backups on the same physical storage that your database files sit on. What good are backups if you lose them at the same time you lose the disk?
We don't do point-in-time restores, so we don't do log backups (all our databases are set to the Simple recovery model), but if you want logs backed up, make sure you include those as well, at an acceptable interval.
On a side note, SQL 2008 supports compressed backups, which speeds up backup time considerably and makes the files much, much smaller - I can't think of an instance where you wouldn't want to use this option. I'd love to hear one, though, and I'm willing to reconsider!
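The nightly full backup in such a maintenance plan boils down to something like this (server, share, and database names are placeholders; COMPRESSION needs SQL 2008 Enterprise, or Standard and up from 2008 R2):

```bash
sqlcmd -S myserver -E <<'SQL'
BACKUP DATABASE AppDb
TO DISK = '\\nas\sqlbackups\AppDb_full.bak'
WITH COMPRESSION, INIT, CHECKSUM;
GO
SQL
```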
Here are some points from my own experience:
- Store your backups and SQL Server database files on different physical storage. Otherwise, if your physical storage fails, you will lose both the backups and the database files.
- Create your own SQL Server database backup schedule.
- Test your SQL Server backups. If you have never tested your backups, I doubt you will be able to restore your database when a failure occurs. From time to time, practice restoring a backup on a test server.
- Test your recovery strategy - here is another tip: if a failure occurs, how much time do you need to restore your database to a working state?
- Back up SQL Server's system databases.
And this isn't the whole list; you can find more tips in my article: https://sqlbak.com/blog/backup-and-recovery-best-practices/
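To make the "test your backups" point concrete, a restore drill boils down to something like this (server, paths, and the logical file names in the MOVE clauses are placeholders you'd read from RESTORE FILELISTONLY):

```bash
sqlcmd -S testserver -E <<'SQL'
-- Quick integrity check of the backup file (this alone does not prove the data is usable).
RESTORE VERIFYONLY FROM DISK = '\\nas\sqlbackups\AppDb_full.bak' WITH CHECKSUM;
GO
-- The real test: restore under a different name on a test server and time it.
RESTORE DATABASE AppDb_restore_test
FROM DISK = '\\nas\sqlbackups\AppDb_full.bak'
WITH MOVE 'AppDb'     TO 'D:\data\AppDb_restore_test.mdf',
     MOVE 'AppDb_log' TO 'D:\data\AppDb_restore_test.ldf',
     STATS = 10;
GO
SQL
```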
Choosing the right backup strategy is one of the most important things a DBA should consider when setting up a database.
However, the backup strategy you choose depends on a number of factors:
- How frequently transactions are carried out on the DB: are there thousands of transactions every minute, or maybe a few transactions per day? For a very busy database, I would say take full nightly backups and transaction log backups every 10 minutes or even more often.
- How critical the data is: is it employee payroll data? Then you'll have no acceptable excuse, or you may find a few angry faces around your car when you want to drive home! For a very critical database, take nightly backups, possibly to two locations, and transaction log backups every 5 minutes. (Also think about implementing mirroring.)
- The location of your backups: if the backup location is close to the DB, you can afford to take more frequent backups than when it is several hops away with less than excellent bandwidth in between.
But all in all, I would say schedule a full backup every night, and then transaction log backups at regular intervals.
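In SQL Server terms, that schedule is just two jobs, typically run from SQL Server Agent; the database name and paths below are placeholders, and the log backup requires the FULL recovery model:

```bash
# Nightly full backup:
sqlcmd -S myserver -E <<'SQL'
BACKUP DATABASE PayrollDb TO DISK = 'X:\backups\PayrollDb_full.bak' WITH COMPRESSION, INIT;
GO
SQL

# Every 5-10 minutes:
sqlcmd -S myserver -E <<'SQL'
BACKUP LOG PayrollDb TO DISK = 'X:\backups\PayrollDb_log.trn' WITH COMPRESSION;
GO
SQL
```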