What happens if the database size limit is exceeded? - MySQL

I was just wondering: if a MySQL database exceeds its size limit, what will happen to my app running on it?
My hosting only allows 1 GB of space per database. I know that's a lot, but what if I build an app where people discuss things, and after many years the database exceeds the limit?
What would I do then? And approximately how much text data can be stored in 1 GB?
Also, can I have two databases backing one application? For example, one database contains usernames, profiles and that sort of thing, and the other contains questions and answers. Would that slow down the process of getting everything?
Update: can I set up MySQL on my own server and overcome the size limitation that way?
Thanks.

There will be no speed disadvantage from splitting your tables across two databases (assuming both databases are on the same MySQL server), but if the data are logically part of the same application then it is more sensible they be grouped together.
When you want to refer to a table in another database, you also have to qualify it with the appropriate database name, which you could see as an inefficiency.
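For example (the database and table names here are made up), a cross-database query just prefixes each table with its database name:

    -- Hypothetical layout: profiles live in users_db, questions in content_db.
    SELECT u.username, q.title
    FROM users_db.profiles AS u
    JOIN content_db.questions AS q ON q.author_id = u.id;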
My guess is that whether you approach 1GB with two databases or with one, it's not going to make a difference in how your host treats you (it shouldn't make a difference to MySQL, after all). I suggest you not worry about it unless you're going to be generating data like nobody's business, in which case you'll need a more capable host anyway.
If you figure out years down the line that you're coming to the limit, you can make a decision then whether to dump some of your older data or move to a host that permits you to store more data.
I don't think your application would stop working immediately when you hit 1GB. I think it more likely that your host would start writing you emails telling you off and suggesting you upgrade packages, or something.

Most of this is specific to your host. 1 GB is roughly a billion bytes (one letter is usually one byte). Having two databases will not slow anything down, as long as they're both on the same host and they're properly set up.
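To put the 1 GB figure in perspective, as a rough estimate: an average English word runs about 5 letters plus a space, i.e. roughly 6 bytes in a single-byte character set, so 1 GB holds on the order of 150 million words of plain text before you account for row, index and multi-byte (UTF-8) overhead.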


MySQL database size growing over 4 terabytes? Azure supports up to 4

A couple of days ago, I was asked for help with this particular situation. A MySQL database set up on Azure is reaching 4 terabytes in size. I have set up databases before and developed against them, but I'm not really a DBA.
The problem, according to them, is that Azure's size limit is 4 terabytes (it will double that limit in a couple of months, but it won't keep growing like that). I talked to them about archiving some of the data, but apparently they need all 10 years' worth. They don't want to move away from Azure or use something other than MySQL. One thing they pointed out to me was that one table in particular is almost 2 terabytes in size.
Unfortunately, I haven't been given access to the database yet, but I just wanted to ask about my options in a situation like this. I looked into this a bit and saw things like MySQL sharding. Is this the only option? Can it be done on Azure (I saw sharding articles for SQL Server on Azure, but not for MySQL)? Can I partition some tables into another MySQL database, for example?
I guess I'm just looking for advice on how to move forward with this. Any link on something like this is appreciated.
Thank you
Simple Answer
4TB is not the MySQL limit, so Azure is limiting you. Switch to another service.
Future problems
But... 4TB is rife with issues, especially for a non-DBA.
Over-sized datatypes (wasting disk space)
Lack of normalization (wasting disk space)
Need for summary tables (if it is a Data Warehouse)
Sub-optimal indexes (performance)
Ingestion speed (bog down on loading of fresh data)
Query speed
Partitioning to aid in purging (if you will eventually purge 'old' data)
Sharding (This is as big a discussion as all the others combined)
All of these can be tackled, but we need to see the schema, the queries, the dataflow, the overall architecture, etc.
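One read-only starting point (nothing here depends on their schema) is to see where the 4 TB actually lives, biggest tables first:

    -- Largest tables and their data/index split.
    SELECT table_schema,
           table_name,
           ROUND(data_length  / POW(1024, 3), 1) AS data_gb,
           ROUND(index_length / POW(1024, 3), 1) AS index_gb
    FROM information_schema.tables
    ORDER BY data_length + index_length DESC
    LIMIT 20;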

MySQL server very high load

I run a website with ~500 real-time visitors, ~50k daily visitors and ~1.3 million total users. I host the site on AWS, where I use several instances of different kinds. When I started the website, the different instances cost roughly the same. As the website gained users, the RDS instance (MySQL DB) CPU constantly kept hitting the roof, and I had to upgrade it several times; it now accounts for the main part of the performance and of the monthly cost (around 95% of ~$2.8k/month). I currently use a database server with 16 vCPUs and 64 GiB of RAM, and I also use Multi-AZ deployment to protect against failures. I wonder if it is normal for the database to be that expensive, or if I have done something terribly wrong?
Database Info
At the moment my database has 40 tables; most of them have ~100k rows, some have ~2 million, and one has 30 million.
I have a system that archives rows older than 21 days, once they are not needed anymore.
Website Info
The website mainly uses PHP, but also some Node.js and Python.
Most of the website's functions work like this (a rough SQL sketch follows the list):
Start transaction
Insert row
Get last inserted id (lastrowid)
Do some calculations
Update the inserted row
Update the user
Commit transaction
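Roughly, in SQL terms (the table and column names below are made up for illustration):

    START TRANSACTION;

    INSERT INTO events (user_id, payload) VALUES (42, '...');
    SET @row_id = LAST_INSERT_ID();

    -- ... application-side calculations happen here ...

    UPDATE events SET result = 17 WHERE id = @row_id;
    UPDATE users SET last_event_id = @row_id WHERE id = 42;

    COMMIT;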
I also run around 100 bots which poll the database at 10-30 second intervals; they also insert into/update the database sometimes.
Extra
I have done several things to try to lower the load on the database, such as enabling the database cache, using a Redis cache for some queries, trying to remove very slow queries, and upgrading the storage type to "Provisioned IOPS SSD". But nothing seems to help.
I have also changed some of the MySQL settings parameters; among other things, query_cache_size is set to 2.56 GB.
I have thought about creating a MySQL cluster of several smaller instances, but I don't know if this would help, and I also don't know how well that works with transactions.
If you need any more information, please ask; any help on this issue is greatly appreciated!
In my experience, as soon as you ask the question "how can I scale up performance?" you know you have outgrown RDS (edit: I admit my experience that leads me to this opinion may be outdated).
It sounds like your query load is pretty write-heavy. Lots of inserts and updates. You should increase the innodb_log_file_size if you can on your version of RDS. Otherwise you may have to abandon RDS and move to an EC2 instance where you can tune MySQL more easily.
I would also disable the MySQL query cache. On every insert/update, MySQL has to scan the query cache to see if there are any cached results that need to be purged. This is a waste of time if you have a write-heavy workload. Increasing your query cache to 2.56 GB makes it even worse! Set the cache size to 0 and the cache type to 0.
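On RDS these are controlled through the DB parameter group rather than at runtime, but on a self-managed MySQL 5.x server (where the query cache still exists) the target values would look like this:

    SET GLOBAL query_cache_size = 0;
    SET GLOBAL query_cache_type = 0;   -- 0 = OFF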
I have no idea what queries you run, or how well you have optimized them. MySQL's optimizer is limited, so it's frequently the case that you can get huge benefits from redesigning SQL queries. That is, changing the query syntax, as well as adding the right indexes.
You should do a query audit to find out which queries are accounting for your high load. A great free tool to do this is https://www.percona.com/doc/percona-toolkit/2.2/pt-query-digest.html, which can give you a report based on your slow query log. Download the RDS slow query log with the http://docs.aws.amazon.com/cli/latest/reference/rds/download-db-log-file-portion.html CLI command.
Set your long_query_time=0, let it run for a while to collect information, then change long_query_time back to the value you normally use. It's important to collect all queries in this log, because you might find that 75% of your load is from queries under 2 seconds, but they are run so frequently that it's a burden on the server.
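For example (the value 10 at the end is only a placeholder for whatever threshold you normally run with; on RDS this is again a parameter-group change rather than SET GLOBAL):

    SET GLOBAL long_query_time = 0;   -- log every query for a while
    -- ... collect traffic, then restore your usual threshold:
    SET GLOBAL long_query_time = 10;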
After you know which queries are accounting for the load, you can make some informed strategy about how to address them:
Query optimization or redesign
More caching in the application
Scale out to more instances
I think the answer is "you're doing something wrong". It is very unlikely you have reached an RDS limitation, although you may be hitting limits on some parts of it.
Start by enabling detailed monitoring. This will give you some OS-level information which should help determine what your limiting factor really is. Look at your slow query logs and database stats - you may have some queries that are causing problems.
Once you understand the problem - which could be bad queries, I/O limits, or something else - then you can address them. RDS allows you to create multiple read replicas, so you can move some of your read load to slaves.
You could also move to Aurora, which should give you better I/O performance. Or use PIOPS (or allocate more disk, which should increase performance). You are using SSD storage, right?
One other suggestion: if your calculations (step 4 above) take a significant amount of time, you might want to look at breaking the work into two or more transactions.
A query_cache_size of more than 50M is bad news. You are writing often -- many times per second per table? That means the QC needs to be scanned many times/second to purge the entries for the table that changed. This is a big load on the system when the QC is 2.5GB!
query_cache_type should be DEMAND if you can justify it being on at all. And in that case, pepper the SELECTs with SQL_CACHE and SQL_NO_CACHE.
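Something like this, with made-up table names:

    -- With query_cache_type = DEMAND, only queries that ask for it are cached.
    SELECT SQL_CACHE value FROM site_settings WHERE name = 'theme';   -- rarely-changing data
    SELECT SQL_NO_CACHE * FROM events WHERE user_id = 42;             -- hot, write-heavy table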
Since you have the slowlog turned on, look at the output with pt-query-digest. What are the first couple of queries?
Since your typical operation involves writing, I don't see an advantage of using readonly Slaves.
Are the bots running at random times? Or do they all start at the same time? (The latter could cause terrible spikes in CPU, etc.)
How are you "archiving" "old" records? It might be best to use PARTITIONing and "transportable tablespaces". Use PARTITION BY RANGE and 21 partitions (plus a couple of extras).
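A sketch of what that could look like (hypothetical table; note the partitioning column must be part of every unique key, which may force schema changes):

    ALTER TABLE events
    PARTITION BY RANGE (TO_DAYS(created_at)) (
        PARTITION p20171101 VALUES LESS THAN (TO_DAYS('2017-11-02')),
        PARTITION p20171102 VALUES LESS THAN (TO_DAYS('2017-11-03')),
        -- ... roughly 21 daily partitions plus a couple of spares ...
        PARTITION pfuture VALUES LESS THAN MAXVALUE
    );

    -- Purging then becomes an almost instant metadata operation:
    ALTER TABLE events DROP PARTITION p20171101;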
Your typical transaction seems to work with one row. Can it be modified to work with 10 or 100 all at once? (More than 100 is probably not cost-effective.) SQL is much more efficient in doing lots of rows at once versus lots of queries of one row each. Show us the SQL; we can dig into the details.
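For example (made-up table again), batching the inserts:

    -- One statement and one round trip for a whole batch, instead of one INSERT per row:
    INSERT INTO events (user_id, payload) VALUES
        (1, '...'),
        (2, '...'),
        (3, '...');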
It seems strange to insert a new row, then update it, all in one transaction. Can't you completely compute it before doing the insert? Hanging onto the inserted_id for so long probably interferes with others doing the same thing. What is the value of innodb_autoinc_lock_mode?
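For reference, the current value can be checked with:

    SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';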
Do the "users" interact with each other? If so, in what way?

When should I care about data modeling?

Over the last month I've done basically the impossible: I have a Debian server on an Intel Celeron 2.5 GHz / 512 MB RAM / >40 GB IDE hard drive with MySQL running smoothly. I managed to connect using MySQL Workbench, and then I realized that I hadn't stopped to think about the database model.
My current database is an Access 97 Database with 2 gigantic tables:
Tbl_Swift - 13 fields and one of them is a 'memo' field with a full page of information.
Tbl_Contr - 20 fields where FOUR of them are 'memo' fields with pages of information.
It's not like the database is heavy or slow on Access, but I wanted to make it available to most users... then I realized that I should optimize my database, but here's the problem:
WHY?
Will it make that much of a difference? I'll have fewer than 5 users connected to this database and NONE of them will have 'write' privileges; they'll just run some standard queries. The database itself is rather small: it's under 600 MB and ~90K records.
So, should I really stop to think about making it more 'optimized'?
"When I say OPTMIZE I mean when people say I should have a lot of tables with little information"
What you are talking about is normalization, and recently there was a thread about normalization vs. performance here: Denormalization: How much is too much?
And yes, I believe that you should think about normalization before that DB gets too big.
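As a rough illustration of the kind of splitting that usually comes up with tables like these (the table and column names below are hypothetical), the page-sized memo fields can be moved into their own 1:1 tables so the narrow columns that queries actually filter on stay compact:

    -- Sketch only: pull the large memo out of the wide table into a child table.
    CREATE TABLE tbl_swift_memo (
        swift_id INT PRIMARY KEY,
        memo     MEDIUMTEXT,
        FOREIGN KEY (swift_id) REFERENCES tbl_swift (id)
    );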

MySQL NDBCLUSTER: is it good for large scale solutions?

One question about NDBCLUSTER.
I inherited the development of a web site based on an NDBCLUSTER 5.1 solution (LAMP platform).
Unfortunately, whoever designed the original solution didn't realize that this database engine has strong limits. For one, the maximum number of fields a table can have is 128. The former programmer designed tables with 369 fields in a single row, one for each day of the year plus some key fields (he originally worked with the MyISAM engine). OK, it has to be refactored anyway, I know.
What is more, the engine needs a lot of tuning: the maximum number of attributes for a table (which defaults to 1000, a bit too few) and many other parameters, the misinterpretation or underestimation of which can lead to serious problems once your database is in production and you're forced to change something.
There's also the fact that disk storage for NDBCLUSTER tables is somewhat unpredictable if not precisely configured: even when specified in CREATE TABLE statements, the engine seems to prefer keeping data in memory (which explains the speed), but that can be a pain if your table on node 1 suddenly collapses (as it did during testing). All table data was lost on all nodes and the table was corrupted, after only 1000 records.
We were on a server with 8 GB of RAM, and the table had just 27 fields.
Please note that no ndb_mgm node-shutdown operation was run that could have compromised the table data. It simply fell over, full stop. Our provider didn't understand why.
So the question is: would you recommend NDBCLUSTER as a stable solution for a large scale web service database?
We're talking about a database which should contain several millions of records, thousands of tables and thousands of catalogues.
If not, which database would you recommend as the best for building a nation-scale web service?
Thanks in advance.
I have had a terrible experience with NDBCLUSTER. It's a good replacement for memcached with range invalidation, nothing more. The stability and configurability just aren't there with this solution. You cannot force all processes to listen on specific ports, backup was working but I had to edit the backup files in vim to restore the database, etc.

Best approach to relating databases or tables?

What I have:
A MySQL database running on Ubuntu that maintains a large table of articles (similar to WordPress).
I need to relate a given article to another set of data. This set of data will be fairly large.
There may be various sets of data that will be related.
The query:
Is it better to contain these various large sets of data within the same database as the articles, which will have a lot of traffic on it?
or
Is it better to create different databases (on the same server) that relate by a primary key to the main database with the articles?
Put them all in the same DB initially, until you find that there is a performance issue. Much easier than prematurely optimising.
Modern RDBMS are very good at optimising data access.
If you need to connect frequently and read both sets of records, you should put them in the same database. The server then won't have to run permission checks twice, once for each of your databases.
If you have serious traffic, you should consider using a persistent connection for that query.
If you don't need to read them together frequently, consider putting them on different machines, so the high traffic on the bigger database won't cause slowdowns on the other.
Running different databases on the same server gives you all the problems of a distributed architecture without any of the benefits of scaling out. One database per server is the way to go.
When you say 'same database' and 'different databases related' don't you mean 'same table' vs 'different tables'?
If that's the question, I'd say:
One table for articles.
If these 'other sets of data' all have the same structure, put them all in the same table; if not, one table per kind of data.
Everything in the same database.
If you grow big enough for database size to become a performance issue (after many millions of records and lots of queries a second), consider table partitioning, or maybe replacing the biggest table with a key/value store (CouchDB, MongoDB, Redis, Tokyo Cabinet, etc.), which can be a little faster than MySQL but a lot easier to distribute for performance.