When is it useful to store aggregated data in SQL [closed] - mysql

I am working on a project where I need to calculate some average values based on users' interaction with a site.
The number of records whose overall average needs to be calculated can range from a few to thousands.
My question is: at what threshold would it be wise to store the aggregated data in a separate table, updated through a stored procedure every time a new record is generated, instead of just calculating it every time it is needed?
Thanks in advance.

Don't do it until you start having performance problems caused by the time it takes to aggregate your data.
Then do it.
If discovering this bottleneck in production is unacceptable, then run the system in a test environment that accurately matches your production environment and load in test data that accurately matches production data. If you hit a performance bottleneck in that environment that is caused by aggregation time, then do it.

You need to weigh the need for current data against the need for fast queries. If you absolutely need current data, then you have to live with longer query times. If you absolutely need your answers as quickly as possible, then you will have to deal with slightly stale data.
You can time your queries, time the insertion into a separate table, and evaluate which best fits your needs.
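For reference, here is a minimal sketch of the aggregate-table approach the question describes, using a trigger instead of a stored procedure; all table and column names are hypothetical:

```sql
-- Hypothetical schema: raw interactions plus a running aggregate per user.
CREATE TABLE interactions (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,
    rating  DECIMAL(5,2) NOT NULL
);

CREATE TABLE interaction_stats (
    user_id      INT PRIMARY KEY,
    rating_sum   DECIMAL(12,2) NOT NULL DEFAULT 0,
    rating_count INT NOT NULL DEFAULT 0
    -- the average is rating_sum / rating_count, derived at read time
);

-- Keep the running totals up to date as each record arrives.
DELIMITER //
CREATE TRIGGER interactions_after_insert
AFTER INSERT ON interactions
FOR EACH ROW
BEGIN
    INSERT INTO interaction_stats (user_id, rating_sum, rating_count)
    VALUES (NEW.user_id, NEW.rating, 1)
    ON DUPLICATE KEY UPDATE
        rating_sum   = rating_sum + NEW.rating,
        rating_count = rating_count + 1;
END//
DELIMITER ;
```

Storing the sum and count rather than the average itself keeps the incremental update exact; the average falls out as rating_sum / rating_count whenever you read it.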

Related

Stored/Virtual Generated Column- Pros/Cons/Best Practices? [closed]

I've read the MySQL documentation on them, but I'm still not clear on the benefits of stored/virtual generated columns. What are the pros and cons compared with storing the same data in an ordinary column and indexing that? And in what situations is using a generated column more efficient or otherwise better?
Thank you!
A good reason to use a stored generated column is when the expression is costly enough that you want to calculate it only when you insert/update the row. A virtual generated column must recalculate the expression every time you run a query that reads that column.
The manual confirms this:
Stored generated columns can be used as a materialized cache for complicated conditions that are costly to calculate on the fly.
Besides that, some uses of generated columns require the column to be stored; they don't work with virtual generated columns. For example, you need a stored generated column if you want to create a foreign key or a FULLTEXT index on that generated column.
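A small illustration of the difference, with hypothetical table and column names; the expression here stands in for something genuinely expensive:

```sql
CREATE TABLE orders (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    price    DECIMAL(10,2) NOT NULL,
    quantity INT NOT NULL,
    -- VIRTUAL: recomputed every time the column is read
    total_v  DECIMAL(12,2) AS (price * quantity) VIRTUAL,
    -- STORED: computed once on INSERT/UPDATE and written to disk
    total_s  DECIMAL(12,2) AS (price * quantity) STORED
);

-- Secondary indexes work on both kinds, but foreign keys and
-- FULLTEXT indexes require the STORED variant.
CREATE INDEX idx_total ON orders (total_s);
```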

Is it good to keep multi-valued attributes in a table existing in a 24/7 running service? [closed]

I am working on a project where I use a table with a multi-valued attribute holding 5-10 values. Is it fine to keep multi-valued attributes, or should I normalize the table into normal forms?
My concern is that normalizing unnecessarily increases the number of rows: if an attribute has 10 values, each row is replaced by 10 new rows, which might increase query running time.
Can anyone give suggestions on this?
First normal form requires that each attribute be atomic.
I would say that the answer to this question hinges on the “atomic”: it is too narrow to define it as “indivisible”, because then no string would be atomic, as it can be split into letters.
I prefer to define it as “a single unit as far as the database is concerned”. So if this array (or whatever it is) is stored and retrieved in its entirety by the application, and its elements are never accessed inside the database, it is atomic in this sense, and there is nothing wrong with the design.
If, however, you plan to use elements of that attribute in WHERE conditions, if you want to modify individual elements with UPDATE statements or (worst of all) if you want the elements to satisfy constraints or refer to other tables, your design is almost certainly wrong. Experience shows that normalization leads to simpler and faster queries in that case.
Don't try to get away with a few large table rows. Databases are optimized for dealing with many small table rows.
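As a concrete sketch with hypothetical names, normalizing a multi-valued attribute simply means moving it to a child table with a foreign key:

```sql
-- Instead of one column holding e.g. 'red,green,blue' ...
CREATE TABLE products (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

-- ... each element becomes one small row in a child table.
CREATE TABLE product_colors (
    product_id INT NOT NULL,
    color      VARCHAR(30) NOT NULL,
    PRIMARY KEY (product_id, color),
    FOREIGN KEY (product_id) REFERENCES products (id)
);

-- Index the element values so conditions on them are cheap.
CREATE INDEX idx_color ON product_colors (color);

-- Element-level conditions become plain indexed lookups:
SELECT p.*
FROM products p
JOIN product_colors c ON c.product_id = p.id
WHERE c.color = 'red';
```

The extra rows are cheap: with idx_color in place, the WHERE condition above is an index lookup rather than a scan over packed multi-valued strings.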

Are there pitfalls to using a db table with 400 columns that maps to an ActiveRecord model? [closed]

I have to work with a legacy MySQL database that, for reasons outside of my control, cannot be normalised.
The db consists of one table with 400 columns of various types.
The table has 2,000 rows, and grows by 300 per week
Basic calculations like averages and counts will be carried out
The data are graphed across varying time series, and presented on a dashboard (built in Rails)
I can change the type of database (to PostgreSQL or MongoDB), but I cannot alter the structure of the table.
New data will be uploaded via CSV file
No data validation is required
There are no joins
I've worked with Rails for some time, but I've never created a model with more than 15 or 20 columns. My concerns are:
Performance implications, if any, of an ActiveRecord model with 400 attributes.
Would using the JSON data type in PostgreSQL, or using MongoDB, be a better fit, given that the data is not relational?
I'm under the impression that SQL databases excel at calculations, and I'm concerned that using JSON would add performance or complexity overhead when making calculations on the data.
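To make that concern concrete, here is what the same average looks like against a plain column versus a PostgreSQL jsonb document (table and field names are hypothetical); the jsonb version has to extract and cast every value before aggregating:

```sql
-- Plain column: the database aggregates native numerics directly.
SELECT AVG(reading_42) FROM measurements;

-- jsonb document: every row's value is extracted as text and cast,
-- adding per-row overhead to each calculation.
SELECT AVG((payload ->> 'reading_42')::numeric) FROM measurements;
```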

Proper database practices [closed]

I'm a bit new to structuring databases, and I was wondering: say I have 38 different pieces of data that I want to have per record. Is it better to break that up into a couple of different tables, or can I just keep it all in one table?
In this case I have a table of energy usage data for accounts: monthly usage, monthly demand, and demand percentage, plus two identifying keys for each, which comes out to 38 pieces of data for each record.
So is it good practice to break it up, or should I just leave it all as one table? Also, are there any effects on query efficiency once this database accumulates a couple of thousand records at its peak?
Edit: I'm using Hibernate to query; I'm not sure whether that has any effect on efficiency depending on how I end up breaking this data up.
First, check the normal forms:
1) Wiki
2) A Simple Guide to Five Normal Forms in Relational Database Theory
Second, aggregated data like "monthly sales" or "daily clicks" typically go into separate tables. This is motivated not only by the normal forms but also by the implementation of the database.
For example, MySQL offers the Archive storage engine which is designed for that.
If you're watching the current month's data, it may live in the same table or be kept in a cache. The per-month data in a separate table can be computed on the first day of each month, as in the sketch below.
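A minimal sketch of that monthly roll-up, assuming hypothetical usage_readings and usage_monthly tables:

```sql
CREATE TABLE usage_monthly (
    account_id  INT NOT NULL,
    month_start DATE NOT NULL,
    total_usage DECIMAL(14,2) NOT NULL,
    PRIMARY KEY (account_id, month_start)
);

-- Run on the 1st of each month (e.g. from a cron job or MySQL event)
-- to materialize the month that just ended.
INSERT INTO usage_monthly (account_id, month_start, total_usage)
SELECT account_id,
       DATE_FORMAT(reading_date, '%Y-%m-01'),
       SUM(usage_kwh)
FROM usage_readings
WHERE reading_date >= DATE_FORMAT(CURDATE() - INTERVAL 1 MONTH, '%Y-%m-01')
  AND reading_date <  DATE_FORMAT(CURDATE(), '%Y-%m-01')
GROUP BY account_id, DATE_FORMAT(reading_date, '%Y-%m-01');
```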
When you read a record, do you usually use all the data, or do you have different sections or views (loaded separately) showing energy usage data, monthly statistics, and so on?
How many records do you plan to have in this table? If they grow dramatically and continually, would it be possible to create tables with a suffix, grouping them by period (month, half year, year, ...)?

How to Manage very Large MySql Database [closed]

I have a copy of the PAF (UK postcode) database. It is currently stored in a MySQL database, and I use it on my site to pre-fill address details. However, the database is huge (28,000,000+ records) and it is very slow to search.
Any ideas how I could split the DB to improve performance?
Thanks for the help guys!
That is not a large database, not even a large table. Set appropriate indexes on the table and you will get good performance.
There are several ideas you could try:
create indexes, meaningful ones of course (see the sketch below)
review your schema, and avoid oversized data types (INT, BIGINT, TEXT, etc.) where a smaller type would do
optimize your queries so they use the indexes; the EXPLAIN statement can show whether they do
split your table into multiple smaller tables, for example based on zones: north, east, south, west, etc.
If your table doesn't see many INSERTs or UPDATEs, which I assume it might not, being a postcode table, the query cache can be a big help for faster queries.
You will need to play around and see which option works best for you, but I think the first two should be enough.
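A minimal sketch of the indexing and EXPLAIN suggestions, assuming a hypothetical addresses table keyed by postcode:

```sql
-- An index on the column you search by turns a full scan of
-- 28 million rows into a handful of B-tree lookups.
CREATE INDEX idx_postcode ON addresses (postcode);

-- Verify the index is actually used: the "key" column of the
-- EXPLAIN output should show idx_postcode, not NULL.
EXPLAIN SELECT * FROM addresses WHERE postcode = 'SW1A 1AA';
```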
Hope it helps!