This is a bit of a general question, but I recently started paying for a web host for my website, and I noticed that the MySQL servers are shared (I checked phpMyAdmin: it has sent 9 GB of data over 15k requests in the past hour alone, and has been running for 39 days without a restart) and that I only have admin control over my own databases. So I started wondering:
How big can a MySQL database become before having issues? (Say, long delays, errors, crashes, etc.) Does anyone have experience with that?
Mind you, mine runs on MySQL 5.7.
The load you describe should be easy for MySQL on a reasonably powerful server. But it depends on what the queries are, and how well you have optimized.
At my last DBA job, we "gently encouraged" the developers (i.e. alerted them with PagerDuty) if the database size grew over 1 TB, or if a single table grew over 512 GB, or queries per second rose over 50k. That's when things started to go south for a typical application workload, if the database and queries were not designed by developers who were especially mindful about optimization. Also, the servers were pretty powerful hardware, circa 2020 (48-core Xeon, 256 GB RAM, 3 TB NVMe storage, typically with 2 or 3 physical devices in a RAID).
With more care, it's possible to increase the scale of MySQL a lot higher. With variants like PlanetScale, it can support even more. Cf. https://planetscale.com/blog/one-million-queries-per-second-with-mysql
On the other hand, I saw a photo of one of our NVMe drives that literally melted from serving 6k writes per second on a database with half a TB of data. So it depends on the types of queries (or perhaps it was just a faulty NVMe drive).
The only answer that could be useful to you is:
You must load-test your queries on your server.
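A minimal sketch of such a load-test harness in Python, so you can see the shape of it. The `run_query` stub and its 5 ms sleep are placeholders: you would swap in a real query executed through a MySQL driver such as PyMySQL, and raise the concurrency and query counts to realistic levels.

```python
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

def run_query():
    """Placeholder for a real query. Replace the sleep with, e.g.,
    cursor.execute(...) through your MySQL driver (PyMySQL etc.)."""
    time.sleep(0.005)  # pretend the query takes ~5 ms

def timed(_):
    start = time.perf_counter()
    run_query()
    return time.perf_counter() - start

def load_test(concurrency=8, total_queries=200):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, range(total_queries)))
    elapsed = time.perf_counter() - start
    return {
        "p50_ms": statistics.median(latencies) * 1000,
        "p95_ms": latencies[int(len(latencies) * 0.95)] * 1000,
        "qps": total_queries / elapsed,
    }

if __name__ == "__main__":
    print(load_test())
```

Watch the p95 latency as you turn up the concurrency; the point where it starts climbing sharply is your server's practical limit for that query mix.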
Also, besides query performance (the only thing most developers care about), you also have to consider database operations. I supported some databases that grew without bound, despite my team's urging the developers to split them up. The result was it would take more than 24 hours to make a backup, which is a problem if there's a policy requirement that backups must be made daily. Also if they wanted to alter a table (e.g. add a column), it could take up to 4 weeks to run.
Related
We have to design an SQL Server 2008 R2 database storing many varbinary blobs.
Each blob will be around 40 KB, and there will be around 700,000 new entries a day.
The estimated maximum size of the database is 25 TB (30 months of data).
The blobs will never change. They will only be stored and retrieved.
The blobs will be either deleted the same day they are added, or only during cleanup after 30 months. In between there will be no change.
Of course we will need table partitioning, but the general question is, what do we need to consider during implementation for a functioning backup (to tape) and restore strategy?
Thanks for any recommendations!
Take a look at the "piecemeal backup and restore" - you will find it very useful for your scenario, which would benefit from different backup schedules for different filegroups/partitions. Here are a couple of articles to get you started:
http://msdn.microsoft.com/en-us/library/ms177425(v=sql.120).aspx
http://msdn.microsoft.com/en-us/library/dn387567(v=sql.120).aspx
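To make the idea concrete, a piecemeal filegroup backup might look roughly like this (database, filegroup and path names here are made up for illustration):

```sql
-- Back up the read-write filegroups frequently...
BACKUP DATABASE BlobStore
    READ_WRITE_FILEGROUPS
    TO DISK = 'X:\backup\blobstore_rw.bak';

-- ...and each read-only partition filegroup once, after it is filled
-- and marked read-only. It never needs backing up again.
BACKUP DATABASE BlobStore
    FILEGROUP = 'FG_2013_01'
    TO DISK = 'X:\backup\blobstore_2013_01.bak';
```

Since your blobs never change after insertion, the bulk of those 25 TB can live in read-only filegroups that are each backed up exactly once.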
I have had the pleasure in the past of working with several very large databases, the largest environment I have worked with being in the 5+ TB range. Going even larger than that, I am sure that you will encounter some unique challenges that I may not have faced.
What I can say for sure is that any backup strategy you implement is going to take a while, so you should plan to have at least one day a week devoted to backups and maintenance, during which the database, while available, should not be expected to perform at the same levels.
Second, I have found the following MVP article to be extremely useful in planning backups taken through the native MSSQL backup operations. There are some large-database-specific options for the BACKUP command which could assist in reducing your backup duration. While these increase throughput, you can expect a performance impact. Specifically, the options that had the greatest impact in my testing were BUFFERCOUNT, BLOCKSIZE, and MAXTRANSFERSIZE.
http://henkvandervalk.com/how-to-increase-sql-database-full-backup-speed-using-compression-and-solid-state-disks
Additionally, assuming your data is stored on a SAN, you may wish as an alternative to investigate the use of SAN level tools in your backup strategy. Some SAN vendors provide software which integrates with SQL Server to perform SAN style snapshot backups while still integrating with the engine to handle things like marking backup dates and forwarding LSN values.
Based on your statement that the majority of the data will not change over time, including differential backups seems like a very useful option, allowing you to reduce the number of transaction logs which would have to be restored in a recovery scenario.
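A differential schedule for a mostly-static database could be as simple as this sketch (database name and paths are illustrative):

```sql
-- Weekly full backup
BACKUP DATABASE BlobStore
    TO DISK = 'X:\backup\blobstore_full.bak';

-- Nightly differential: captures only the extents changed
-- since the last full backup, so it stays small when the
-- data is mostly read-only.
BACKUP DATABASE BlobStore
    TO DISK = 'X:\backup\blobstore_diff.bak'
    WITH DIFFERENTIAL;
```

A restore is then the last full backup plus the most recent differential, plus any transaction logs taken after it.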
Please feel free to get in touch with me directly if you would like to discuss further.
I have been tasked with investigating reasons why our internal web application is hitting performance problems.
The web application itself is part written in PHP and part written in Perl, and we have a MySQL database which is where I believe the source of performance hit is occurring.
We have about 400 users of the system, of which, most are spread across different timezones, so generally there are only ever a max of 30 users online at any one time. The performance problems have crept up on us, particularly over the past year as the database keeps growing.
The system is running on a single 32-bit Debian server - 6 GB of RAM, with 8 x 2.4 GHz Intel CPUs. This is probably not hefty enough for the job in hand. However, even at times when I am the only user online, page loading time can still be slow.
I'm trying to determine whether we need to scale up or scale out. Firstly, I'd like to know how well our hardware is coping with the demands placed upon it; and secondly, whether it might be worth scaling out and creating some replication slaves to balance the load.
There are a lot of tools available on the internet - probably a bit too many to investigate. Can anyone recommend any tools that provide profiling/performance monitoring that may help me on my quest?
Many thanks,
ns
Your slow-down seems to be related to the data and not to the number of concurrent users.
Properly indexed queries tend to scale logarithmically with the amount of data - i.e. doubling the data increases the query time by some constant C; doubling it again adds the same C; doubling again, the same C, and so on. Before you know it, you have humongous amounts of data, yet your queries are just a little slower.
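You can see this constant-per-doubling behaviour by modelling an indexed lookup as a binary search over n rows, where the cost is roughly log2(n) node visits:

```python
import math

def lookup_cost(rows):
    """Rough model: a B-tree/binary-search lookup touches ~log2(n) nodes."""
    return math.log2(rows)

for rows in (1_000_000, 2_000_000, 4_000_000, 8_000_000):
    print(f"{rows:>9} rows -> cost {lookup_cost(rows):.2f}")
```

Each doubling of the row count adds exactly one unit of cost in this model; that unit is the constant C described above.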
If the slow-down wasn't as gradual in your case (i.e. it was linear to the amount of data, or worse), this might be an indication of badly optimized queries. Throwing more iron at the problem will postpone it, but unless you have unlimited budget, you'll have to actually solve the root cause at some point:
1. Measure the query performance on the actual data to identify slow queries.
2. Examine the execution plans for possible improvements.
3. If necessary, learn about indexing, clustering, covering and other performance techniques.
4. And finally, apply that knowledge to the queries you identified in steps (1) and (2).
If nothing else helps, think about your data model. Sometimes, a "perfectly" normalized model is not the best-performing one, so a little judicious denormalization might be warranted.
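As a sketch of steps (1) and (2), here is the scan-vs-index effect demonstrated with SQLite (bundled with Python, standing in for MySQL, where you would run EXPLAIN instead; the table and index names are invented):

```python
import sqlite3

# SQLite as a stand-in; on MySQL the equivalent step is running
# EXPLAIN against the slow queries you identified.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(i, i % 100) for i in range(1000)])

def plan(sql):
    """Return the query plan as a single string."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql)
    return " ".join(r[3] for r in rows)

query = "SELECT * FROM orders WHERE customer_id = 42"
before = plan(query)   # full table scan
con.execute("CREATE INDEX idx_customer ON orders(customer_id)")
after = plan(query)    # now an index search

print("before:", before)
print("after: ", after)
```

The plan flips from a full scan of the table to a search using the new index, which is exactly the change that turns linear slow-down back into logarithmic.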
The easy (lazy) way if you have budget is just to throw some more iron at it.
A better way, before deciding where or how to scale, would be to identify the bottlenecks. Is every page load slow, or just particular pages? If it is just a few pages, then invest in a profiler (for PHP, both Xdebug and the Zend Debugger can do profiling). I would also (if you haven't) invest in a test system that is as similar as possible to the live system, to run diagnostics on.
You could also look at gathering some stats, both at the server level with a program such as sar (from the sysstat package) and at the DB level (have you got the slow query log running?).
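If you do enable the slow query log, a my.cnf fragment along these lines does it (the path and threshold are placeholders to adjust; on very old MySQL versions the option was spelled differently):

```ini
[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time     = 1   # seconds; log anything slower than this
```

Then a tool like mysqldumpslow can summarize the log and tell you which few queries account for most of the pain.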
The MySQL performance of running Magento in this situation under one MySQL installation is giving me a headache. I wonder whether it is feasible to set up an individual MySQL instance for each website, so that updates to the catalog can occur concurrently across all websites.
It can certainly be made to work within a cluster, if you queue your updates and plan ahead for it. But it won't be cheap, and I'd guess you'll need a MySQL instance for every 30 to 50 websites. It's worth looking into MySQL sharding for heavily used tables, and into ways to run all of this in RAM to dramatically reduce the resources needed.
And for such a task you have to be a living-and-breathing InnoDB person.
So I've got a MySQL database for a web community that is a potential stats goldmine. Currently I'm serving stats built via all sorts of nasty queries on my well-normalized database. I've run into the "patience limit" for such queries on my shared hosting, and would like to move to data warehousing and a daily cron job, thereby sacrificing instant updates for a 100-fold increase in statistical depth.
I've just started reading about data warehouses, and particularly the star schema, and it all seems pretty straight-forward.
My question essentially is - should I toss all that crap into a new database, or just pile the tables into my existing MySQL database? The current database has 47 tables, the largest of which has 30k records. I realize this is paltry compared to your average enterprise application, but your average enterprise application does not (I hope!) run on shared-hosting!
So, keeping my hardware limits in mind, which method would be better?
I really don't know much about this at all, but I assume reading Table A, calculating, then updating Table B is a lot easier in the same database than across databases, correct?
Should I even care how many tables my DB has?
If you just need to improve performance, you should just create a set of pre-cooked reporting tables. Low effort and big performance gains. With the data volume you described, this won't even have a noticeable impact on the users of your web community.
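A sketch of such a pre-cooked reporting table, using SQLite (bundled with Python) as a stand-in for MySQL; the table and column names are invented, and the daily cron job would simply re-run the rebuild script:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE posts (user_id INTEGER, words INTEGER)")
con.executemany("INSERT INTO posts VALUES (?, ?)",
                [(1, 100), (1, 250), (2, 50)])

# Nightly cron job: rebuild the pre-computed stats table in one pass,
# so the live pages only ever read this small table instead of
# running the nasty aggregate queries themselves.
con.executescript("""
    DROP TABLE IF EXISTS user_stats;
    CREATE TABLE user_stats AS
        SELECT user_id,
               COUNT(*)   AS post_count,
               SUM(words) AS total_words
        FROM posts
        GROUP BY user_id;
""")

print(con.execute("SELECT * FROM user_stats ORDER BY user_id").fetchall())
# → [(1, 2, 350), (2, 1, 50)]
```

At 30k rows in your biggest table, a rebuild like this should finish in well under a second even on shared hosting.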
The different database approach has several benefits (see below) but I don't think you will gain any of them as you are on a shared database host.
You can support different SLAs for the DW and the web site
The DW and web databases can have different configurations
The DW database is basically read-only for a large portion of the day
The DW and web databases can have different release cycles (this is big)
Typical DW queries (large amounts of data) don't kill the cache for the web DB.
The number of tables in a particular database does not usually become a problem until you have thousands (or tens of thousands) of tables, and these problems usually come into play due to filesystem limits related to the maximum number of files in a directory.
You don't say what storage engine you are using. In general, you want the indexes in your database to fit into memory for good insert/update/delete performance, so the size of your key buffer or buffer pool must be large enough to hold the "hot" part of the index.
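For reference, the two knobs in question can be set in my.cnf like this (the sizes are placeholders to be tuned against your actual index sizes and available RAM):

```ini
[mysqld]
# InnoDB: the buffer pool caches both data and index pages;
# size it to hold at least the "hot" indexes.
innodb_buffer_pool_size = 4G

# MyISAM: the key buffer caches index blocks only.
key_buffer_size = 512M
```

You can compare these against the index sizes reported by SHOW TABLE STATUS to judge whether your hot indexes actually fit.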
How do I know when a project is just too big for MySQL, and that I should use something with a better reputation for scalability?
Is there a max database size for MySQL before degradation of performance occurs? What factors contribute to MySQL not being a viable option compared to a commercial DBMS like Oracle or SQL Server?
Google uses MySQL. Is your project bigger than Google?
Smart-alec comments aside, MySQL is a professional level database application. If your application puts a strain on MySQL, I bet it'll do the same to just about any other database.
If you are looking for a couple of examples:
Facebook moved to Cassandra only after it was storing over 7 Terabytes of inbox data. (Source: Lakshman, Malik: Cassandra - A Decentralized Structured Storage System.) (... Even though they were having quite a few issues at that stage.)
Wikipedia also handles hundreds of Gigabytes of text data in MySQL.
I work for a very large Internet company. MySQL can scale very, very large with very good performance, with a couple of caveats.
One problem you might run into is that an index greater than 4 gigabytes can't go into memory. I spent a lot of time once trying to improve MySQL's full-text performance by fiddling with index parameters, but you can't get around the fundamental problem: if your query hits disk for an index, it gets slow.
You might find some helper applications that can help solve your problem. For the full-text problem, there is Sphinx: http://www.sphinxsearch.com/
Jeremy Zawodny, who now works at Craigslist, has a blog on which he occasionally discusses the performance of large databases: http://blog.zawodny.com/
In summary, your project probably isn't too big for MySQL. It may be too big for some of the ways that you've used MySQL before, and you may need to adapt them.
Mostly it is table size.
I am assuming here that you will use the Oracle InnoDB plugin for MySQL as your engine. If you do not, that probably means you're using a commercial engine such as InfiniDB, Infobright or Tokutek, in which case your questions should be sent to them.
InnoDB gets a bit nasty with very large tables. You are advised to partition your tables if at all possible with very large instances. Essentially, if your (frequently used) indexes don't all fit into ram, inserts will be very slow as they need to touch a lot of pages not in ram. This cannot be worked around.
You can use the MySQL 5.1 partitioning feature if it does what you want, or partition your tables at the application level if it does not. If you can get your tables' indexes to fit in ram, and only load one table at a time, then you're on a winner.
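A MySQL 5.1 range-partitioning declaration looks like this sketch (table and column names are invented; note that the partitioning column must be part of every unique key, including the primary key):

```sql
CREATE TABLE events (
    id         BIGINT NOT NULL,
    created_at DATE   NOT NULL,
    payload    BLOB,
    PRIMARY KEY (id, created_at)  -- partition column must appear here
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2009 VALUES LESS THAN (2010),
    PARTITION p2010 VALUES LESS THAN (2011),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```

Each partition is effectively its own table with its own index trees, so the "indexes must fit in RAM" rule only has to hold for the partitions you are actively loading.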
You can use the plugin's compression to make your RAM go a bit further (as the pages are compressed in RAM as well as on disk), but it cannot beat the fundamental limitation.
If your table's indexes don't all fit in RAM (or at least mostly - if you have a few indexes which are NULL in 99.99% of cases, you might get away without those), insert speed will suck.
Database size is not a major issue, provided your tables individually fit in RAM while you're doing bulk loading (and, of course, you only load one at a time).
These limitations really happen with most row-based databases. If you need more, consider a column database.
Infobright and Infinidb both use a mysql-based core and are column based engines which can handle very large tables.
Tokutek is quite interesting too - you may want to contact them for an evaluation.
When you evaluate the engine's suitability, be sure to load it with very large data on production-grade hardware. There's no point in testing it with a (e.g.) 10 GB database; that won't prove anything.
MySQL is a commercial DBMS; you just have the option to get the support/monitoring offered by Oracle or Microsoft, or to use community support and community-provided monitoring software.
Things you should look at are not only size but operations. Critical are also:
Scenarios for backup and restore?
Maintenance. Example: SQL Server Enterprise can rebuild an index WHILE THE OLD ONE IS AVAILABLE - transparently. This means no downtime for an index rebuild.
Availability (basically you do not want to have to restore a 5000 GB database if a server dies) - mirroring preferred; replication "sucks" (technically).
Whatever you go for, be careful with Oracle RAC (their cluster) - it is known to be "problematic" (to put it nicely). SQL Server is known to be a lot cheaper and to scale a lot worse (no "RAC" option), but basically to work without making admins want to commit suicide every hour (the "RAC" option seems to do that). Scalability that is "a lot worse" is still good enough for the TerraServer (http://msdn.microsoft.com/en-us/library/aa226316(SQL.70).aspx)
There were some questions here recently from people having problems rebuilding indexes on a 10 GB database or something.
So much for my 2 cents. I am sure some MySQL specialists will jump in on issues there.