When should I care about data modeling? - mysql

Over the last month I've done basically the impossible: I have a Debian server on an Intel Celeron 2.5 GHz / 512 MB RAM / 40+ GB IDE hard drive with MySQL running smoothly. I managed to connect using MySQL Workbench, and then I realized that I hadn't stopped to think about the database model.
My current database is an Access 97 database with two gigantic tables:
Tbl_Swift - 13 fields, one of which is a 'memo' field holding a full page of information.
Tbl_Contr - 20 fields, FOUR of which are 'memo' fields with pages of information.
It's not like the database is heavy or slow in Access, but I wanted to make it available to most users... then I realized that I should optimize my database, but here's the problem:
WHY?
Will it make that much of a difference? I'll have fewer than 5 users connected to this database and NONE of them will have write privileges; they'll just run some standard queries. The database itself is rather small: under 600 MB and ~90K records.
So, should I really stop to think about making it more 'optimized'?

"When I say OPTMIZE I mean when people say I should have a lot of tables with little information"
What you are talking about is normalization, and recently there was a thread about normalization vs. performance here: Denormalization: How much is too much?
And yes, I believe that you should think about normalization before that DB gets too big.
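To give a rough idea of what that could look like here, the repeated 'memo'-style text could move into its own table keyed by the parent record. A minimal sketch, with hypothetical table and column names (not taken from your Access file):

    CREATE TABLE contract (
        contract_id  INT AUTO_INCREMENT PRIMARY KEY,
        swift_code   VARCHAR(11),
        signed_on    DATE
        -- ...other short, fixed-meaning columns
    );

    -- The four 'memo' fields become rows here instead of wide columns.
    CREATE TABLE contract_note (
        contract_id  INT NOT NULL,
        note_type    VARCHAR(30) NOT NULL,  -- e.g. 'terms', 'history'
        body         TEXT,
        PRIMARY KEY (contract_id, note_type),
        FOREIGN KEY (contract_id) REFERENCES contract (contract_id)
    );

Your read-only users then query the narrow table and only pull in the TEXT data when they actually need it.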

Related

MySQL database size growing over 4 terabytes? Azure supports up to 4 TB

A couple of days ago, I was asked for help in this particular situation. A MySQL database set up on Azure is reaching 4 terabytes in size. I have set up databases before and developed for them, but I'm not really a DBA.
The problem, according to them, is that Azure's size limit is 4 terabytes (the database will double in size over the next couple of months, but luckily it won't keep growing like that). I talked to them about archiving some of the data, but apparently they need all 10 years' worth of data. They don't want to move away from Azure or use something other than MySQL. One thing they pointed out to me was that one table in particular is almost 2 terabytes in size.
Unfortunately, I haven't been given access to the database yet, but I just wanted to ask about my options in a situation like this. I looked into this a bit and saw things like MySQL sharding. Is this the only option? Can it be done on Azure? (I saw sharding articles for SQL Server on Azure but not for MySQL.) Can I partition some tables into another MySQL database, for example?
I guess I'm just looking for advice on how to move forward with this. Any link on something like this is appreciated.
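For when I do get access, I'm planning to start by checking which tables actually take the space, with something like this against information_schema (a minimal sketch, no assumptions about the schema):

    SELECT table_schema,
           table_name,
           ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
    FROM information_schema.TABLES
    ORDER BY data_length + index_length DESC
    LIMIT 10;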
Thank you
Simple Answer
4TB is not the MySQL limit, so Azure is limiting you. Switch to another service.
Future problems
But... 4 TB is rife with issues, especially for a non-DBA:
Over-sized datatypes (wasting disk space)
Lack of normalization (wasting disk space)
Need for summary tables (if it is a Data Warehouse)
Sub-optimal indexes (performance)
Ingestion speed (bogging down when loading fresh data)
Query speed
Partitioning to aid in purging (if you will eventually purge 'old' data; see the sketch below)
Sharding (This is as big a discussion as all the others combined)
All of these can be tackled, but we need to see the schema, the queries, the dataflow, the overall architecture, etc.
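For the partitioning point in the list above, here is a minimal sketch of RANGE partitioning by year (the table and column names are hypothetical); it turns purging a whole year into a quick metadata operation instead of a huge DELETE:

    -- Note: the partitioning column must be part of every unique key,
    -- including the primary key.
    ALTER TABLE measurements
        PARTITION BY RANGE (YEAR(recorded_at)) (
            PARTITION p2015 VALUES LESS THAN (2016),
            PARTITION p2016 VALUES LESS THAN (2017),
            PARTITION p2017 VALUES LESS THAN (2018),
            PARTITION pmax  VALUES LESS THAN MAXVALUE
        );

    -- Dropping a whole year is then nearly instant:
    ALTER TABLE measurements DROP PARTITION p2015;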

How can I list data of more than 2 GB in MySQL?

I have nearly 2.54 GB of database data, with tens of millions of rows.
I have optimized my MySQL queries as well as I can, but it still takes 10 to 12 seconds to get data. So can anyone help me with what I should do now?
There are several things you could do:
If it's feasible, optimize your database by choosing the data sizes and types which fit best;
Add indexes to the most searched columns in your queries (see the sketch after this list);
Choose the right configuration parameters for your database. You should use MySQLTuner-perl and/or the database configuration wizard from Percona (free registration required). Remember, tuning MySQL is a trial-and-error process; there is no "right" configuration, only one that works better for you. For instance, you might find that you get better performance with a large query cache, or with the query cache disabled altogether;
You could move your database to an SSD to reduce disk access times.
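For the indexing point, here is a minimal sketch of the usual workflow (the table and column names are hypothetical): check the slow query with EXPLAIN, then index the columns it filters and sorts on.

    -- A full table scan shows up as type: ALL with no key used.
    EXPLAIN
    SELECT id, title, price
    FROM listings
    WHERE city = 'Berlin' AND status = 'active'
    ORDER BY price;

    -- A composite index covering the WHERE columns (and the ORDER BY column)
    -- usually turns that scan into an index range read.
    ALTER TABLE listings
        ADD INDEX idx_city_status_price (city, status, price);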

Project specific: PostgreSQL or MySQL? Need advice

I am making a web app using the Django framework and would like some opinion regarding which database to use.
PostgreSQL works very nicely with Django and I think (please do correct me if I'm wrong) MySQL requires a bit more time and effort to work with Django.
A single table in my database will have around 60 million entries; each request does one read and one write, but sometimes requires a second read, which I guess makes it somewhat more read-heavy.
Total expected DB size: around 10 tables, each with ~50 million entries.
My question is: will PostgreSQL suffice for such a large number of entries while handling both reads and writes, or should I switch to MySQL, because I've heard MySQL is more advantageous for read-heavy workloads?
Thank You.
Both MySQL and PostgreSQL are free to download and install. Install them, tune the servers for the expected load, insert 100 million rows of random data, and take some measurements.
PostgreSQL, when configured correctly for your hardware, will perform fine. (PostgreSQL's default settings are very conservative.) Its query optimizer and indexing options are far superior to MySQL's.
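As a minimal sketch of that kind of measurement on the PostgreSQL side (the table, sizes and query are made up; generate_series produces the test data, and you would scale the row count up once it looks sane):

    CREATE TABLE bench (id bigint PRIMARY KEY, payload text, created timestamptz);

    INSERT INTO bench
    SELECT g, md5(g::text), now() - (g || ' seconds')::interval
    FROM generate_series(1, 10000000) AS g;

    CREATE INDEX ON bench (created);

    -- Time a query shaped like your real workload.
    EXPLAIN ANALYZE
    SELECT count(*) FROM bench WHERE created > now() - interval '1 day';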

What if the database size limit is exceeded?

I was just wondering: if a MySQL database's size limit is exceeded, what will happen to my app running on it?
My hosting only allows 1 GB of space per database. I know that's a lot, but what if I make an app where people discuss something, and after many years the database exceeds the limit?
Then what will I do? And approximately how much text data can be stored in 1 GB?
And can I have two databases running one application? Like one database containing usernames, profiles and that sort of stuff, and the other containing questions and answers? And will that slow down the process of getting everything?
Update: can I set up MySQL on my own server and overcome the size limitation?
Thanks.
There will be no speed disadvantage from splitting your tables across two databases (assuming both databases are on the same MySQL server), but if the data are logically part of the same application then it is more sensible they be grouped together.
When you want to refer to a table in another database, you also have to qualify it with the appropriate database name, which you could see as an inefficiency.
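For example, a query joining tables in two databases on the same server just uses qualified names (the database and table names here are made up):

    SELECT u.username, q.title
    FROM users_db.profiles AS u
    JOIN content_db.questions AS q ON q.author_id = u.id;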
My guess is that whether you approach 1 GB with two databases or with one, it's not going to make a difference in how your host treats you (it shouldn't make a difference to MySQL, after all). I suggest you not worry about it unless you're going to be generating data like nobody's business, and in that case you'll need a more dedicated host.
If you figure out years down the line that you're coming to the limit, you can make a decision then whether to dump some of your older data or move to a host that permits you to store more data.
I don't think your application would stop working immediately when you hit 1GB. I think it more likely that your host would start writing you emails telling you off and suggesting you upgrade packages, or something.
Most of this is specific to your host. 1 GB is ~1 billion bytes (one letter usually = one byte). Having two databases will not slow anything down, so long as they're both on the same host and they're properly set up.
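As a rough back-of-the-envelope figure (assuming an average post is around 1 KB of plain text, which is a guess rather than a measurement): 1 GB / 1 KB per post ≈ 1,000,000 posts, before counting indexes and per-row overhead.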

MySQL NDBCLUSTER: is it good for large scale solutions?

One question about NDBCLUSTER.
I inherited the development of a web site based on an NDBCLUSTER 5.1 solution (LAMP platform).
Unfortunately, whoever designed the former solution didn't realize that this database engine has strong limits. For one, the maximum number of fields a table can have is 128. The former programmer designed tables with 369 fields in a single row, one for each day of the year plus some key fields (he originally worked with the MyISAM engine). OK, it must be refactored anyway, I know.
What's more, the engine needs a lot of tuning: the maximum number of attributes for a table (which defaults to 1000, a bit too few) and many other parameters, the misinterpretation or underestimation of which can lead to serious problems once your database is in production and you're forced to change something.
There's also the fact that disk storage for NDBCLUSTER tables is somewhat unpredictable if not precisely configured: even if specified in CREATE TABLE statements, the engine seems to prefer keeping data in memory (which explains the speed), but that can be a pain if your table on node 1 suddenly collapses (as it did during testing). All table data was lost on all nodes and the table was corrupted after only 1000 records.
We were on a server with 8 GB of RAM, and the table had just 27 fields.
Please note that no ndb_mgm node-shutdown operation was run that could have compromised the table data. It simply fell over, full stop. Our provider didn't understand why.
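As an aside, from what I've read, NDB disk storage has to be set up explicitly through a logfile group and tablespace before STORAGE DISK in CREATE TABLE has any effect, and indexed columns stay in memory regardless. A minimal sketch of that setup (the names and sizes are made up):

    CREATE LOGFILE GROUP lg_1
        ADD UNDOFILE 'undo_1.log'
        INITIAL_SIZE 128M
        ENGINE NDBCLUSTER;

    CREATE TABLESPACE ts_1
        ADD DATAFILE 'data_1.dat'
        USE LOGFILE GROUP lg_1
        INITIAL_SIZE 512M
        ENGINE NDBCLUSTER;

    -- Only non-indexed columns actually live on disk.
    CREATE TABLE readings (
        id INT NOT NULL PRIMARY KEY,
        payload VARCHAR(255)
    )
        TABLESPACE ts_1
        STORAGE DISK
        ENGINE NDBCLUSTER;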
So the question is: would you recommend NDBCLUSTER as a stable solution for a large scale web service database?
We're talking about a database which should contain several million records, thousands of tables and thousands of catalogues.
If not, which database would you recommend as the best for building a nation-scale web service?
Thanks in advance.
I have had a terrible experience with NDBCLUSTER. It's a good replacement for memcached with range invalidation, nothing more. The stability and configurability just aren't there with this solution. You cannot force all the processes to listen on specific ports; backups worked, but I had to edit the bkp files in vim to restore the database, etc.