We have to design an SQL Server 2008 R2 database storing many varbinary blobs.
Each blob will be around 40 KB, and there will be around 700,000 additional entries a day.
The estimated maximum size of the database is 25 TB (after 30 months).
The blobs will never change. They will only be stored and retrieved.
The blobs will either be deleted the same day they are added, or only during cleanup after 30 months. In between there will be no changes.
Of course we will need table partitioning, but the general question is: what do we need to consider during implementation for a working backup (to tape) and restore strategy?
Thanks for any recommendations!
Take a look at "piecemeal backup and restore" - you will find it very useful for your scenario, which would benefit from different backup schedules for different filegroups/partitions. Here are a couple of articles to get you started (a short sketch of the syntax follows the links):
http://msdn.microsoft.com/en-us/library/ms177425(v=sql.120).aspx
http://msdn.microsoft.com/en-us/library/dn387567(v=sql.120).aspx
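To make that concrete, here is a rough sketch of what filegroup-level backups could look like once the partitions are mapped to filegroups; the database and filegroup names (BlobStore, FG_2015_Q1) are made up for illustration:

    -- Back up the active read/write filegroups frequently (current partitions).
    BACKUP DATABASE BlobStore
       READ_WRITE_FILEGROUPS
       TO DISK = N'X:\Backup\BlobStore_ReadWrite.bak';

    -- Back up an older filegroup once, after it has been marked read-only;
    -- it never needs to be backed up again until it is dropped at cleanup time.
    ALTER DATABASE BlobStore MODIFY FILEGROUP FG_2015_Q1 READ_ONLY;
    BACKUP DATABASE BlobStore
       FILEGROUP = N'FG_2015_Q1'
       TO DISK = N'X:\Backup\BlobStore_FG_2015_Q1.bak';

With 30 months of mostly static data, that keeps the regular tape job down to the small read/write portion instead of the full 25 TB.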
I have had the pleasure of working with several very large databases in the past, the largest environment being in the 5+ TB range. Going even larger than that, I am sure you will encounter some unique challenges that I have not faced.
What I can say for sure is that any backup strategy you implement is going to take a while, so you should plan to have at least one day a week devoted to backups and maintenance, during which the database, while available, should not be expected to perform at its usual levels.
Second, I have found the following MVP article to be extremely useful in planning backups taken through the native MSSQL backup operations. There are some large-database-specific options for the BACKUP command which can help reduce your backup duration; while they increase throughput, expect some performance impact while the backup runs. The options that had the greatest impact in my testing are BUFFERCOUNT, BLOCKSIZE, and MAXTRANSFERSIZE (a sketch follows the link).
http://henkvandervalk.com/how-to-increase-sql-database-full-backup-speed-using-compression-and-solid-state-disks
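As a rough, untested illustration of those options (the numbers are placeholders to be tuned against your own hardware, not recommendations):

    BACKUP DATABASE BlobStore
    TO DISK = N'X:\Backup\BlobStore_Full_1.bak',
       DISK = N'Y:\Backup\BlobStore_Full_2.bak'   -- striping across devices also helps throughput
    WITH COMPRESSION,
         BUFFERCOUNT = 64,
         BLOCKSIZE = 65536,
         MAXTRANSFERSIZE = 4194304;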
Additionally, assuming your data is stored on a SAN, you may wish to investigate SAN-level tools as an alternative in your backup strategy. Some SAN vendors provide software which integrates with SQL Server to perform SAN-style snapshot backups while still integrating with the engine to handle things like marking backup dates and forwarding LSN values.
Based on your statement that the majority of the data will not change over time, including differential backups seems like a very useful option for you, since it reduces the number of transaction logs which would have to be restored in a recovery scenario.
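For example (hypothetical names; the full backup might be weekly and the differential nightly):

    -- Weekly full backup
    BACKUP DATABASE BlobStore TO DISK = N'X:\Backup\BlobStore_Full.bak';

    -- Nightly differential: only extents changed since the last full backup,
    -- which stays small when most blobs never change
    BACKUP DATABASE BlobStore TO DISK = N'X:\Backup\BlobStore_Diff.bak' WITH DIFFERENTIAL;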
Please feel free to get in touch with me directly if you would like to discuss further.
We have started a new project using MySQL, Spring Boot, and AngularJS. Initially, we did not realize our DB was going to handle large amounts of data.
The number of tables will not be large (<130); only 10 to 20 tables will hold most of the data, and those are the ones that see almost all of the inserts, reads, and updates.
The data in those 10 tables is estimated to grow by about 1,200,000 records a month, and we should not delete that data, so that we can produce various reports.
There needs to be a (read-only) replicated database as a backup/failover, and maybe for offloading reports at peak times.
I don't have first-hand experience with databases that large, so I'm asking those who do: which DB is the best choice in this situation? We have completed 100% of the coding and development, and only now realize this. I have doubts about whether MySQL can handle this much data. I know that Oracle is the safe bet, but I am interested in whether MySQL with a similar setup can cope. We are not bound to MySQL; I am OK with any DB, and based on your feedback I can take a call.
An open-source DB is preferable, but it's not mandatory; we can go for a paid DB as well.
Handling Large Data
MySQL is more than capable of handling such loads. In fact, it is capable of handling much much more load than what you are talking about. You just have to create the right kind of tables. You can do that by choosing
the correct storage engine for your use-case
the correct character set
the optimal data type for your column
the right indexing strategy - creating indexes thoughtfully
the right partitioning strategy (if the data in the table exceeds tens of millions of records)
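On the partitioning point, here is a minimal sketch (table and column names are invented) of monthly range partitioning; old months can later be archived or dropped as whole partitions instead of row-by-row deletes:

    CREATE TABLE orders (
      id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
      created_at  DATETIME NOT NULL,
      customer_id INT UNSIGNED NOT NULL,
      amount      DECIMAL(10,2) NOT NULL,
      PRIMARY KEY (id, created_at),        -- the partition column must be part of every unique key
      KEY idx_customer (customer_id)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
    PARTITION BY RANGE (TO_DAYS(created_at)) (
      PARTITION p201801 VALUES LESS THAN (TO_DAYS('2018-02-01')),
      PARTITION p201802 VALUES LESS THAN (TO_DAYS('2018-03-01')),
      PARTITION pmax    VALUES LESS THAN MAXVALUE
    );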
EDIT: You've also got to choose the right data modelling and normalization strategy for your use-case. Most OLTP applications require some level of normalization. But if you want to do analytics and aggregations over heavy tables, you should either have a data warehouse, or have highly denormalized tables to avoid joins, and/or use a column-oriented database to support such queries.
MySQL is open-source and has a very strong community support so you will find a lot of literature around any issue that you face. You can also find all the filed bugs (resolved and unresolved) here.
As far as the number of tables is concerned, there's really no practical cap: MySQL permits 4 billion tables if you're using InnoDB as the engine.
A lot of very big companies with scale use MySQL in some capacity. Facebook is one of them.
Native JSON Support
With the growing popularity of JSON as the de facto data exchange format across the internet, MySQL has also provided native JSON support in 5.7, so now you can store and query JSON from your APIs, if required.
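A small sketch of what that looks like (the table and fields are made up):

    -- Hypothetical table with a native JSON column (MySQL 5.7+).
    CREATE TABLE api_events (
      id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      payload    JSON NOT NULL,
      created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
    ) ENGINE=InnoDB;

    INSERT INTO api_events (payload)
    VALUES ('{"type": "order_created", "order_id": 42, "amount": 19.99}');

    -- Extract fields from the JSON document in queries.
    SELECT JSON_EXTRACT(payload, '$.order_id') AS order_id,
           payload->>'$.type' AS event_type
    FROM api_events
    WHERE payload->>'$.type' = 'order_created';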
HA and Replication
MySQL replication works! Earlier, MySQL supported only binlog-coordinate (file and position) based replication, but now it supports GTID replication, which makes it easier to maintain and fix replication issues. There are also third-party replicators available on the market; for instance, Continuent's Tungsten is a replicator written in Java and a replacement for native replication, and it comes with a lot of configuration options which are not available with native MySQL replication.
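For illustration, this is roughly what pointing a replica at a primary looks like with GTID auto-positioning, assuming gtid_mode and enforce_gtid_consistency are already ON on both servers (the host and credentials below are placeholders):

    -- On the replica:
    CHANGE MASTER TO
        MASTER_HOST = 'primary.example.com',
        MASTER_USER = 'repl',
        MASTER_PASSWORD = '***',
        MASTER_AUTO_POSITION = 1;   -- use GTIDs instead of binlog file/position coordinates
    START SLAVE;
    -- Then check Retrieved_Gtid_Set / Executed_Gtid_Set and any replication errors:
    SHOW SLAVE STATUS\G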
I agree with MontyPython: MySQL can do it, and the design is critical. Fortunately MySQL allows you to be flexible over time as needed.
I've had history tables used in daily reporting that grew to over a billion records in plain MySQL, with no problems.
I've also used MySQL MERGE tables to divide up tables with big-ish rows (100 KB+) to speed things up, basically keeping the individual underlying table files under 30 GB each. However, that solution increases the open file count (in the system) per client, which might be a bigger deal on a clustered system; that one was not.
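For anyone unfamiliar with the approach, this is roughly what it looks like; the table names are invented, and the underlying tables must be MyISAM with identical column and index definitions:

    CREATE TABLE history_2016 (
      id        BIGINT UNSIGNED NOT NULL,
      logged_at DATETIME NOT NULL,
      payload   MEDIUMBLOB,
      KEY idx_logged (logged_at)
    ) ENGINE=MyISAM;

    CREATE TABLE history_2017 LIKE history_2016;

    -- One MERGE table exposes all the underlying tables as a single table.
    CREATE TABLE history_all (
      id        BIGINT UNSIGNED NOT NULL,
      logged_at DATETIME NOT NULL,
      payload   MEDIUMBLOB,
      KEY idx_logged (logged_at)
    ) ENGINE=MERGE UNION=(history_2016, history_2017) INSERT_METHOD=LAST;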
That said, I like to give Honorable Mention to:
MariaDB - MySQL but with contributions from Facebook, Alibaba, Google, and more.
I've moved most of my MySQL community edition projects over to MariaDB and have been very happy. It's an almost transparent upgrade.
They offer an interesting enterprise Big Data Analytics package (MariaDB AX), but with your current requirements it's probably overkill, and the standard community edition will fulfill your needs.
For example, here's an informative tutorial on how to set up a scalable cluster (Galera) and add MaxScale for High Availability:
https://mariadb.com/resources/blog/getting-started-mariadb-galera-and-mariadb-maxscale-centos
Another interesting option is Vitess, developed at YouTube, which allows for sharded MySQL through a (mostly) driver-based solution. It solves the problem of needing access to huge amounts of data while always yielding good performance. As such, it goes beyond high availability and focuses on a solution wherein no single query (e.g. a report against millions of rows of historical data) can negatively impact the other queries that need to be performed.
What is the best strategy to back up 2 TB of MySQL data, and how often should one schedule the backups?
I am using replication as a backup strategy, but I know that's not good practice.
Please note: I am new to MySQL servers, and this question may sound very basic and unsuitable to some experienced users. But I am trying to learn.
Thanks.
Size matters mostly in that all operations take longer. There's no getting around that. Otherwise, a lot of the backup strategy remains the same.
First off, Replication is not a backup. It's for availability and scalability. Replication (with a delayed slave apply) is at best a single snapshot. Once a bad update/delete/truncate is replicated, the data is gone.
Your "best strategy" depends on several factors:
- Recovery Time Objective (how fast you need to restore).
- Recovery Point Objective (to what point in time to restore).
- Many small databases? One 2 TB database?
- How much money you have to spend on resources.
- Are you held to regulatory requirements to be able to restore data for 1, 3, 7 years, etc.?
A physical backup using Percona XtraBackup will be able to take a point-in-time snapshot of all databases on your server (the caveat being non-transactional tables using the MyISAM engine).
A logical backup with mysqldump may be faster to take, be smaller, and compress better, but on restore it needs to rebuild indexes, so it may take longer.
So... in a perfect situation, take regular physical and logical backups. Take continuous backups of the binary logs (https://www.percona.com/blog/2012/01/18/backing-up-binary-log-files-with-mysqlbinlog/). As long as your slave is up to date, you can take your backups there, so as not to impact your master. To determine your backup frequency, restore a backup and time how long it takes to apply one week of logs. Did you meet your Recovery Time Objective? If not, more frequent backups are needed.
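One small sanity check before relying on binary logs for point-in-time recovery: confirm they are actually enabled and see which files need to be shipped off the host (a sketch, run on the server being backed up):

    SHOW VARIABLES LIKE 'log_bin';   -- should be ON
    SHOW MASTER STATUS;              -- current binlog file and position
    SHOW BINARY LOGS;                -- the files that must be copied off-host continuously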
Also, hang out at https://dba.stackexchange.com to get some more insight into these operational challenges of owning a database :)
I have a MySQL database running on a dedicated Ubuntu server with 2 GB RAM and a 500 GB hard drive. I would appreciate it if anyone could help with fine-tuning the database to increase performance. The enhancements need to impact the CRUD tasks of the database, including the performance of procedure calls and scheduled events.
I have searched the web regarding this and found various mechanisms, tools, and so on to do the job. But I need to know the proper way of improving the performance (e.g. the execution time of an SQL query) of the MySQL database itself, without using any 3rd-party tools or software. The database configuration I have is listed below.
MySQL version: 5.5
Used storage engine: MyISAM
Operating system: Ubuntu 12
Hard disk capacity: 500GB
RAM: 2GB
Other: The database consists of Tables, Indexes, Stored Procedures, Scheduled Events and Views
You have said nothing about the specifics of your data, its distribution, the type of workload you use, the ratio of reads to writes, the variety of your queries, the complexity of your queries, and so on. This is a vital part of the tuning process for one simple reason:
Tuning is specific to your data and your workload.
The guys who make database platforms such as MySQL pay a lot of attention to making sure the default settings are good enough for the majority of users. If there was some easy route to improving the performance of a database, they'd already have done it at the factory.
The guys who make the third party tools, on the other hand, write code that reads your data and your logs to find out information about your tables, their contents, and your queries, and that code makes best-guess estimates about tuning based on your data and your workload. They're not perfect, but they sure beat having to do that stuff manually if you don't know how to.
Think of tuning a database like tuning a guitar: You start with an idea of what you want (Standard tuning? Drop D? DADGAD?) and then you make small adjustments to one string at a time, measuring it against your desired result. Once you've achieved the best possible result for that string, you move onto the next one and make small changes there etc. When you get to the final string, you might have adjusted the balance of the whole guitar so you might have to revisit the settings from the beginning to make tiny incremental changes until the whole lot is singing perfectly.
Read http://dev.mysql.com/doc/refman/5.5/en/server-parameters.html to get started on the most important "strings" to tune in MySQL 5.5. There are lots, but none of them are particularly difficult on their own.
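As a purely illustrative sketch for a MyISAM-heavy server like yours (the values are placeholders, not recommendations, and should be sized against your 2 GB of RAM and measured against your workload):

    SET GLOBAL key_buffer_size     = 512 * 1024 * 1024;  -- MyISAM index cache, usually the big lever
    SET GLOBAL tmp_table_size      = 64 * 1024 * 1024;   -- in-memory temp tables before spilling to disk
    SET GLOBAL max_heap_table_size = 64 * 1024 * 1024;   -- raise together with tmp_table_size
    -- Persist the same settings in my.cnf so they survive a server restart.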
As a tangent, tuning your server away from the defaults might give you a 5-10% boost in performance. You'd be much better off spending your time looking at your database design, data types, and the indexes you're using. You can often make 50%-100% improvements in performance by doing that sort of thing.
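A typical example of that kind of win, with hypothetical table and column names:

    -- Check how a frequent query is executed...
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42 AND status = 'open';
    -- ...and if the plan shows type=ALL (a full table scan), a composite index usually fixes it:
    ALTER TABLE orders ADD INDEX idx_customer_status (customer_id, status);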
You should find http://www.mysqlcalculator.com/ helpful for starters.
This will show you some critical general defaults and allow you to
enter your own values as displayed by
SHOW GLOBAL VARIABLES
to calculate MySQL maximum memory usage.
This will only scratch the surface - and will be enlightening.
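If you want to pull just the values the calculator asks for, something like this works (a sketch; the list covers the usual per-server and per-connection buffers):

    SHOW GLOBAL VARIABLES WHERE Variable_name IN
      ('key_buffer_size', 'query_cache_size', 'innodb_buffer_pool_size',
       'innodb_log_buffer_size', 'max_connections', 'sort_buffer_size',
       'read_buffer_size', 'read_rnd_buffer_size', 'join_buffer_size',
       'thread_stack', 'tmp_table_size');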
There is NO simple answer.
I'm studying up on the future of the database I maintain. Right now we have one database server running MySQL with InnoDB and MyISAM tables. I'm watching the metrics closely and I can see that this will not be sustainable forever. Where does one go next? I have reviewed solutions like Cassandra, but I want to stick with an SQL approach, so I'm not sure about that. I have also reviewed NDB Cluster and federated database solutions, but I've noticed no one has anything good to say about those. Basically, I'm looking for advice on intermediate solutions. We do not yet need a vast multi-node array spanning tens of DB servers, but one server is about to reach its limit. I don't want to just throw another server on the pile without making sure that the DB architecture at hand benefits from the extra power. What do you suggest for when it is time to move beyond a single server, and how should we manage this transition? Thank you to anyone who can help.
Edit to better explain: At present, we have about a hundred tables. We run many join operations to gather the data the end user needs to see, such that most of our queries join at least two tables to complete any operation. The data set is not too big yet, only a few hundred megs, but the data is accessed in such a way that each table gets a few writes every day, the heaviest of which sees about a thousand writes a day. We probably have a few hundred thousand reads a day too, so reads outnumber writes about 9 to 1.
First Solutions:
Indices go a LONG way
Use profiling software to find your slow queries and optimize them (MySQL's built-in slow query log can do this too; see the sketch below)
Depending on your hosting company you can usually update the RAM/CPU of the server
Second Solutions:
Split your reads and your writes into two databases. (I don't know if you're using PHP or not but PHP has a plugin that will automatically split them for you without having to change any of your code http://php.net/manual/en/mysqlnd-ms.rwsplit.php)
Use software like memcache to store database information that is frequently queried but not frequently updated
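For the profiling point above, the built-in slow query log is often enough to find the worst offenders before reaching for external tools; a minimal sketch (the log path is illustrative):

    SET GLOBAL slow_query_log      = 'ON';
    SET GLOBAL long_query_time     = 1;        -- seconds; log anything slower than this
    SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';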
For no apparent reason, I lost all the data in my database. Fortunately it was just test data, but it made me think about what would happen if this were a production DB.
Ultimately, every developer hits a DB problem and wants to roll the DB back. We don't do much to protect the DB, because we think of that as the DBA's job, but then we get into trouble...
What are your backup best practices?
Since all the developers are also the DBAs where I work, we're collectively responsible for our backup strategy as well - if you care about the data, make sure you're at least informed about how the backups work, even if you're not part of the actual decisions.
The VERY first thing I do (before I even have any databases set up) is set up nightly maintenance plans that include a full backup, and direct those backups to a central network share on a different computer (our NAS). At the very least, for the love of your job, don't put the backups on the same physical storage that your database files sit on. What good are backups if you lose them at the same time you lose the disk?
We don't do point-in-time restores, so we don't do log backups (all our databases are set to the Simple recovery model), but if you want logs backed up, make sure you include those as well, at an acceptable interval.
On a side note, SQL 2008 supports compressed backups, which speeds up backup time considerably and makes the files much, much smaller - I can't think of an instance where you wouldn't want to use this option. I'd love to hear one, though, and I'm willing to reconsider!
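For reference, enabling it is just one extra option on the backup command (the database name and path are made up):

    BACKUP DATABASE SalesDb
    TO DISK = N'\\nas01\sqlbackups\SalesDb_Full.bak'
    WITH COMPRESSION, CHECKSUM, INIT;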
Here are some points from my own experience:
Store your backups and SQL Server database files on different physical storage. Otherwise, if your physical storage fails, you will lose both the backups and the database files.
Create your own SQL Server database backup schedule.
Test your SQL Server backups. If you never test your backups, I doubt that you will be able to restore your database when a failure occurs. From time to time you need to practice restoring your backups on a test server (see the sketch below).
Test your recovery strategy - here is another tip: if a failure occurs, how much time do you need to restore your database to a working state?
Backup SQL Server's system databases.
And this isn't the whole list; you can find more tips in my article https://sqlbak.com/blog/backup-and-recovery-best-practices/
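On the "test your backups" point, a minimal sketch of what that practice run can look like (all names and paths are placeholders, including the logical file names in the MOVE clauses):

    -- Quick integrity check of the backup file itself...
    RESTORE VERIFYONLY FROM DISK = N'\\nas01\sqlbackups\SalesDb_Full.bak';

    -- ...and, better, an actual restore onto a test server under another name.
    -- Check the logical file names first with RESTORE FILELISTONLY.
    RESTORE DATABASE SalesDb_RestoreTest
    FROM DISK = N'\\nas01\sqlbackups\SalesDb_Full.bak'
    WITH MOVE 'SalesDb'     TO 'D:\TestRestore\SalesDb.mdf',
         MOVE 'SalesDb_log' TO 'D:\TestRestore\SalesDb_log.ldf',
         RECOVERY;
    DBCC CHECKDB (SalesDb_RestoreTest);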
Choosing the right backup strategy is one of the most important factors a DBA should consider when designing the DB.
However the backup strategy you choose depends on a number of factors:
How frequently are transactions carried out on the DB: are there thousands of transactions going on every minute, or maybe a few transactions per day? For a very busy database, I would say take full nightly backups and transaction log backups every 10 minutes or even less.
How critical the data content is: might it be employee payroll data? Then you'll have no acceptable excuse, or you may have a few angry faces around your car when you want to drive home! For a very critical database, take nightly backups, possibly to two locations, and transaction log backups every 5 minutes. (Also think about implementing mirroring.)
Location of your backup: if your backup location is close to the DB location, then you can afford to take more frequent backups than when it is several hops away without excellent bandwidth in between.
But all in all, I would say schedule a full backup every night, and then transaction log backups at intervals.
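Expressed as plain T-SQL (in practice these would run as SQL Server Agent jobs; the names and paths are examples, and log backups require the FULL recovery model):

    -- Nightly full backup:
    BACKUP DATABASE PayrollDb
    TO DISK = N'\\backupserver\sql\PayrollDb_Full.bak'
    WITH COMPRESSION, INIT;

    -- Transaction log backup, scheduled every 5-10 minutes between the fulls:
    BACKUP LOG PayrollDb
    TO DISK = N'\\backupserver\sql\PayrollDb_Log.trn'
    WITH COMPRESSION;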