Choosing Appropriate Database [closed] - mysql

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I'm working on a website that needs to store a large amount of data in a single table: over 100K entries per month, retained for a minimum of 5 years. That comes to approximately 100K × 60 months = 6 million entries.
My question is: which DBMS best handles this kind of data? MySQL, Oracle, or PostgreSQL?

First of all, 6M records is not very much, so these days it should not be a problem for any mainstream DBMS. However, I see two aspects:
1) Space assessment - approximate how much space will be needed. For this, you can insert into a table several records similar to yours and extrapolate to 6M records. E.g. (I used SQL Server, but the same approach works for any other DBMS such as MySQL):
A record looks like this (4 integers and a varchar):
103 1033 15 0 The %S_MSG that starts with '%.*ls' is too long. Maximum length is %d.
I have inserted about 1M rows in a table and space usage returns something like:
rows reserved
1008656 268232 KB
So, it will be about 1.5GB for 6M rows.
2) Usage assessment - already covered by chanaka wije. If you do only SELECTs or INSERTs, no special features are required (like support for many transactions per unit of time).
Also, in order to improve SELECT performance, you should take a look at partitioning (by time, in your case) - see here, here or here.
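As a sketch of time-based partitioning in MySQL (table and column names here are hypothetical, not from the question):

```sql
-- Hypothetical schema: monthly RANGE partitions on the entry date.
-- The partitioning column must be part of every unique key, hence the
-- composite primary key.
CREATE TABLE entries (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    created_at DATE         NOT NULL,
    payload    VARCHAR(255) NOT NULL,
    PRIMARY KEY (id, created_at)
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(created_at) * 100 + MONTH(created_at)) (
    PARTITION p201801 VALUES LESS THAN (201802),
    PARTITION p201802 VALUES LESS THAN (201803),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);
```

A SELECT restricted to one month then touches only that month's partition (partition pruning), and expired months can be removed cheaply with ALTER TABLE ... DROP PARTITION instead of a bulk DELETE.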

It depends on how the table is used: insert-only, or frequent SELECTs as well. I'm using a MySQL table to store web page views (4 million records per month), and every 6 months I trim old rows; no issues so far. If you want to run SELECT queries, choose the right storage engine: InnoDB has row-level locking, while MyISAM has table-level locking.
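The storage engine is chosen per table at creation time. A minimal sketch (hypothetical table name) using InnoDB for its row-level locking:

```sql
-- Row-level locking: concurrent INSERTs and SELECTs don't block the whole table.
CREATE TABLE page_views (
    id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    url       VARCHAR(255) NOT NULL,
    viewed_at DATETIME     NOT NULL,
    KEY idx_viewed_at (viewed_at)
) ENGINE=InnoDB;

-- An existing table can be converted with:
-- ALTER TABLE page_views ENGINE=InnoDB;
```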

This is a good question. Apart from what has been suggested here, I think one issue to be considered would be how you connect to the database. Oracle itself can only scale well if you are using a connection pool (a limited, fixed number of connections). If you are connecting all the time, fetching some data and disconnecting, don't use Oracle. Seriously, go for MySQL.
And if your application is very simple, consider the least expensive option. Don't throw Oracle at it just because it is "the best out there".

Related

I am using MySQL as my database and I have 1 table containing 60 columns. Is it good to have a table with so many columns? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 11 months ago.
I'm building a training-point management feature, and I need to save all those points in the database so they can be displayed when needed. I created a table with 60 columns for that function. Is that good? Or can anyone suggest another way to handle it?
It is unusual but not impossible for a table to have that many columns, however...
It suggests that your schema might not be normalized. If that is the case, then you will run into problems designing queries and/or making efficient use of the available resources.
Depending on how often each row is updated, the table could become fragmented. MySQL, like most DBMSs, does not simply add up the sizes of all the attributes in the relation to work out the size to allocate for the record (although this is an option with C-ISAM). It rounds that figure up so that there is some space for the data to grow, but at some point the data could become larger than the space available. At that point the record must be migrated elsewhere, which leads to fragmentation in the data.
Your queries are going to be very difficult to read/maintain. You may fall into the trap of writing "SELECT * ...", which means that the DBMS needs to read the entire record into memory in order to resolve the query. This does not make for efficient use of your memory.
We can't tell you whether what you have done is correct, nor whether you should be doing it differently, without a detailed understanding of the underlying data.
I've worked with many tables that had dozens of columns. It's usually not a problem.
In relational database theory, there is no limit to the number of columns in a table, as long as it's finite. If you need 60 attributes and they are all properly attributes of the candidate key in that table, then it's appropriate to make 60 columns.
It is possible that some of your 60 columns are not proper attributes of the table, and need to be split into multiple tables for the sake of normalization. But you haven't described enough about your specific table or its columns, so we can't offer opinions on that.
There's a practical limit in MySQL for how many columns it supports in a given table, but this is a limit of the implementation (i.e. MySQL internal code), not of the theoretical data model. The actual maximum number of columns in a table is a bit tricky to define, since it depends on the specific table. But it's almost always greater than 60. Read this blog about Understanding the Maximum Number of Columns in a MySQL Table for details.

MySQL One table vs many tables (same data) [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I'm building a website to monitor a bunch of IoT devices, e.g. the online/offline status of each device and some device-specific information it may report back (IP address, temperature, etc.; this will vary). FYI, these devices report back to my site via a processor/computer that polls them and then reports in (a maximum of 255 devices, but in most cases between 10 and 100 devices).
To date, my approach had been that for each processor I would create a new table in which just that processor's devices would reside. However, in discussions with a colleague, he suggested this might not be the best way to go, as it isn't particularly efficient and could be problematic later on, e.g. if you wanted to add another column, you'd have to add it to possibly 50+ different processor tables.
Instead, because all these tables would have the same structure (an identical set of columns; only the number of devices, i.e. rows, would vary), would one big table with all these rows be a better way to go?
I know that in MySQL terms "scanning" is an expensive operation, and with one big table I would argue there would be more scanning, as I would have to take one big data set each time and filter it down into a view, e.g. by processor or location against 5000+ rows, versus lots of smaller tables of 100 rows. Also, the data in this table would be written to a lot, e.g. each time a device goes offline the offline flag is updated, so I'm not sure whether that makes it more suitable to many small tables versus one large table.
I appreciate there are many different ways of approaching this; I just don't want to go down one rabbit hole and regret it later on. The front end will be PHP, if that counts for anything.
Your friend is correct. Creating many tables to store very similar data would be a waste of configuration time and an inefficient way to store this information. Instead, creating a table that has columns which can differentiate your machines from each other (ID of machine, type, whatever), as well as columns for the information that all machines will be reporting (temperature, IP, etc), you will have a much more organized database and it will be much simpler when you want to update your table later on.
SQL is very well-optimized for search queries, and unless you're storing millions of rows, I think you'll be just fine in terms of performance.
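A minimal sketch of the single-table layout (all names hypothetical): one row per device, with the processor as part of the key instead of part of a table name:

```sql
CREATE TABLE devices (
    processor_id SMALLINT UNSIGNED NOT NULL,
    device_id    TINYINT  UNSIGNED NOT NULL,  -- up to 255 devices per processor
    ip_address   VARCHAR(45),                 -- long enough for IPv6
    temperature  DECIMAL(5,2),
    is_online    TINYINT(1) NOT NULL DEFAULT 0,
    last_seen    TIMESTAMP NULL,
    PRIMARY KEY (processor_id, device_id)
) ENGINE=InnoDB;

-- "All devices for processor 42" is an indexed lookup, not a scan:
-- SELECT * FROM devices WHERE processor_id = 42;
```

Adding a new column later is then a single ALTER TABLE, instead of one per processor table.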

MySQL is painfully slow with large data [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I have a MySQL database with around 30 GB of data. Some of the tables contain over 40 million rows. I am using InnoDB. Just running "SELECT COUNT(*) FROM table_name" on my local PC takes around 5 minutes, so I think joining these tables is impossible for me. I would like to ask whether there is anything I could do to improve the performance, or do I need to switch to another database? I have never encountered such large data in a DB. Please help.
I have run mysql instances with over 100 million entries, and delivering over 30 million queries per day. So it can be done.
The problems you are experiencing will occur with any other database system if similarly configured.
I can only give you a few tips, if this is mission critical consider hiring a professional to tweak your system.
Basics that you need to look at;
This size database is best run on a dedicated server with SSD disks, and at least 2 cores;
You're going to need a lot of RAM in your server: at least your total database size + 20% for other system resources;
Make sure mysql has been configured with enough memory, 80% of your total RAM. The primary setting that does this will be innodb_buffer_pool_size;
Optimize your queries, and index where needed - this is a fine art but can drastically improve performance, learn to use EXPLAIN ... on your queries.
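A sketch of that workflow (the query, table, and index names are hypothetical):

```sql
-- 1. Ask MySQL how it plans to execute the query.
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;

-- 2. If the plan shows type=ALL (a full table scan), add an index
--    on the filtered column and re-check the plan.
ALTER TABLE orders ADD INDEX idx_customer (customer_id);
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
```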
MySQL InnoDB tables do not keep a count of rows, thus SELECT COUNT(*) can be slow. It's not an indication of how other queries might perform, but it is an indication of how slow a full table scan might be. Five minutes is really bad for just 40 million rows and might indicate a serious problem with your database or disk.
Here is a performance blog on the subject. Also see this related answer.
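If an approximate row count is enough, the table statistics can be read instead of scanning the table (the schema and table names below are placeholders):

```sql
-- TABLE_ROWS is only an estimate for InnoDB, but it returns instantly.
SELECT TABLE_ROWS
FROM   information_schema.TABLES
WHERE  TABLE_SCHEMA = 'your_db'
  AND  TABLE_NAME   = 'table_name';
```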
I have encountered the large data size problem before, and I hope my experience is useful to you.
First, you need to create indexes for your table; which kind of index to use depends on your query logic.
After indexing, if the query is still slow, you'd better divide the data into a hierarchy, for example: source tables, intermediate tables, and report tables. The report tables store only final data, so queries against them will be fast; create indexes for them as well.
Third, try something like MemSQL if the above cannot meet your requirements.
Besides, learn some commands like:
SET profiling = 1;
-- run some slow queries
SHOW PROFILES;
SHOW PROFILE FOR QUERY N;

MySQL Table Size Limit and Performance [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
In a MySQL database, I have one table that has 330 columns, each column is either a float or integer value. The entries are indexed by a millisecond time stamp column. Over the life of the application there is expected to be on the order of 100 - 200 million entries. This table is completely independent and has no relations to other tables. The only queries are ones filtering by the time stamp index.
Assuming I have a modern intel server with 6 cores and 32GB of ram and enough disk storage for the data, will any size limits be hit or will performance significantly degrade?
If there will be problems, what should be done to mitigate the problems.
I know similar questions have been asked, but the answer always seems to be it depends. Hopefully I've provided enough information so that a definitive answer can be determined.
Wow such huge hardware for such a small dataset!
You will not have any insurmountable problems with this dataset.
330 columns * 8 bytes = 2640 bytes (maximum) per row
2640 bytes * 200 million rows = 491GB
It's big, but not huge. It really depends what you're going to do with the data. If you are 'appending' to the data, never updating or inserting (in your case inserting earlier timestamps) then that eliminates two potential causes for concern.
If you are querying on the timestamp index, are you going to be using a RANGE or a specific timestamp?
Querying over ranges will be fine - make your timestamp the clustered index column. Since you are performing some earlier inserts, this can cause your table to fragment but that won't be a really big problem - if it does you can defragment the table.
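In MySQL/InnoDB the primary key is the clustered index, so making the millisecond timestamp the primary key keeps rows stored in time order and range scans sequential. A sketch with hypothetical names, assuming one row per timestamp:

```sql
CREATE TABLE readings (
    ts_ms  BIGINT NOT NULL,   -- millisecond timestamp
    value1 FLOAT,
    value2 INT,
    -- ... remaining measurement columns ...
    PRIMARY KEY (ts_ms)       -- clustered: rows stored in timestamp order
) ENGINE=InnoDB;

-- A range query reads one contiguous slice of the clustered index:
-- SELECT * FROM readings
-- WHERE ts_ms BETWEEN 1600000000000 AND 1600003600000;
```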
A big choice for performance is InnoDB or MyISAM - InnoDB has full transactional capability - but do you need this? It involves twice as many writes, but for a single table with no referential integrity, you're probably going to be ok - more here.

Proper database practices [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I'm a bit new to structuring databases, and I was wondering: say I have 38 different pieces of data that I want to have per record. Is it better to break that up into a couple of different tables, or can I just keep it all in one table?
In this case I have a table of energy usage data for accounts. I have monthly usage, monthly demand, and demand percentage, plus 2 identifying keys for each, which comes out to 38 pieces of data for each record.
So is it good practice to break it up, or should I just leave it all as one table? Also, are there any effects on the efficiency of queries once this database accumulates a couple thousand records at its peak?
Edit: I'm using Hibernate to query, I'm not sure if that would have any effect on the efficiency depending on how I end up breaking this data up.
First, check the normal forms:
1) Wiki
2) A Simple Guide to Five Normal Forms in Relational Database Theory
Second, aggregated data like "monthly sales" or "daily clicks" typically goes into separate tables. This is motivated not only by the normal forms, but also by the implementation of the database.
For example, MySQL offers the Archive storage engine which is designed for that.
If you're looking at the current month's data, it may appear in the same table or be stored in a cache. The per-month data in a separate table can be computed on the 1st day of the month.
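A sketch of that split, with hypothetical table and column names: detail rows in one table, and per-month totals computed once into another:

```sql
-- Detail table: one row per usage event.
CREATE TABLE usage_events (
    account_id INT  NOT NULL,
    used_at    DATE NOT NULL,
    kwh        DECIMAL(10,3) NOT NULL,
    KEY idx_account_date (account_id, used_at)
) ENGINE=InnoDB;

-- Aggregate table, refreshed on the 1st of each month for the month just ended.
CREATE TABLE monthly_usage (
    account_id INT  NOT NULL,
    month      DATE NOT NULL,        -- first day of the month
    total_kwh  DECIMAL(12,3) NOT NULL,
    PRIMARY KEY (account_id, month)
) ENGINE=InnoDB;

INSERT INTO monthly_usage (account_id, month, total_kwh)
SELECT account_id, DATE_FORMAT(used_at, '%Y-%m-01'), SUM(kwh)
FROM   usage_events
WHERE  used_at >= DATE_FORMAT(CURDATE() - INTERVAL 1 MONTH, '%Y-%m-01')
  AND  used_at <  DATE_FORMAT(CURDATE(), '%Y-%m-01')
GROUP BY account_id, DATE_FORMAT(used_at, '%Y-%m-01');
```

Reports then query the small monthly_usage table instead of aggregating the detail rows every time.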
When you read a record, do you often use all the data? Or do you have different sections or views (loaded separately) to show energy usage data, monthly statistics, and so on?
How many records do you plan to have in this table? If they grow dramatically and continually, would it be possible to create tables with a suffix to group them by period (month, half year, year, ...)?