Working out users points - update vs select - mysql

I have users who earn points by taking part in various activities on the website, and they can then spend these points on whatever they like. The way I have it set up at the minute is with two tables:
tbl_users_achievements and tbl_users_purchased_items
I have these two tables to track what the users have done and what they have bought (Obviously!)
But instead of having a column in my user table called 'user_points', I have decided to display their points by doing a SELECT on all achievements and getting a sum of the points they have earned. I then do another SELECT to find how many points they have spent.
I thought it might have been better to have a column to store their points and do an UPDATE on that column whenever the user buys or wins something, but that means more places to manage: I have to insert a new row for the transaction and then update the user's column. If I use a query to work out total won minus total spent, I only have to insert the row and do no update. The problem then comes down to the performance of running that calculation in a query.
So which solution would you go with and why?
Have a column to store their points and do an update
Use a query to work out the users points they can spend and have no column

Your current model is logically the right one - a key aspect for RDBMS normalization is not to repeat any information, and keeping an explicit "this customer has x points" column repeats data.
The benefits of this are obvious - you have less data manipulation code to write, and don't have to worry about what happens when you insert the transaction but can't update the users table.
The downsides are that you're running additional queries every time you show the customer profile; this can create a performance problem. The traditional response to that performance problem is to de-normalize, for instance by keeping a calculated total against the user table.
Only do that if that's absolutely, provably necessary.
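To make the "calculate it when you need it" option concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module so it can be run as-is, with SQLite standing in for MySQL. Only the two table names come from the question; the column names (user_id, points) and the points_balance helper are assumptions for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here. The column names (user_id, points)
# are assumptions; only the table names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_users_achievements (user_id INTEGER, points INTEGER);
    CREATE TABLE tbl_users_purchased_items (user_id INTEGER, points INTEGER);
    INSERT INTO tbl_users_achievements VALUES (1, 50), (1, 30), (2, 10);
    INSERT INTO tbl_users_purchased_items VALUES (1, 20);
""")

# Balance = points earned minus points spent, computed at read time.
# COALESCE turns the NULL from an empty SUM into 0.
BALANCE_SQL = """
    SELECT
      (SELECT COALESCE(SUM(points), 0)
         FROM tbl_users_achievements WHERE user_id = ?)
    - (SELECT COALESCE(SUM(points), 0)
         FROM tbl_users_purchased_items WHERE user_id = ?)
"""

def points_balance(user_id):
    """Hypothetical helper: spendable points for one user."""
    return conn.execute(BALANCE_SQL, (user_id, user_id)).fetchone()[0]
```

With an index on user_id in both tables, each sum only has to read that user's rows, which is usually what keeps this approach cheap.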

Myself, I would put the user points into a separate table with the user ID as its primary key and store them there, doing updates to increment or decrement the total as achievements are attained or points are spent.
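If you do go the stored-total route, the main risk is the transaction row and the total drifting apart, so both writes belong in one transaction. A rough sketch of that, again with sqlite3 standing in for MySQL; the tbl_users_points table, its columns, and the award_points helper are made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL. tbl_users_points and all column names
# are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_users_achievements (user_id INTEGER, points INTEGER);
    CREATE TABLE tbl_users_points (
        user_id INTEGER PRIMARY KEY,
        points  INTEGER NOT NULL
    );
    INSERT INTO tbl_users_points VALUES (1, 0);
""")

def award_points(user_id, points):
    """Hypothetical helper: log the achievement and bump the stored total."""
    # "with conn" commits both statements together, or rolls both back
    # if either one raises.
    with conn:
        conn.execute(
            "INSERT INTO tbl_users_achievements VALUES (?, ?)",
            (user_id, points),
        )
        conn.execute(
            "UPDATE tbl_users_points SET points = points + ? WHERE user_id = ?",
            (points, user_id),
        )

award_points(1, 50)
award_points(1, 30)
stored = conn.execute(
    "SELECT points FROM tbl_users_points WHERE user_id = 1"
).fetchone()[0]
logged = conn.execute(
    "SELECT SUM(points) FROM tbl_users_achievements WHERE user_id = 1"
).fetchone()[0]
```

Incrementing with points = points + ? (rather than reading the value and writing it back) lets the database apply the change atomically.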

Related

Should I use more columns or more rows?

I need to create a table where each user (approx 60 atm) would have a defined task for each day. Right now the database has one column for each user with the task name in it (which is bad in my opinion, as each new user would require changing the schema of the table) and a "date" column.
A solution would be to have a "user" column and add a "task" column but that would mean there would be 60 (number of current users) rows per day.
I don't really know what's the best situation in this case.
Should I use more columns or more rows?
They're two completely different things, so this comparison doesn't make much sense...
Right now the database have one column for each user
Bad idea. Full stop. A user is a record of data, not a structural element of the database itself. For example, a table of users might contain columns like Username, Email, RegistrationDate, etc. It would not be a single row of data in which you add a column for each new user.
This would be a nightmare to maintain, would render things like Foreign Keys useless (and, honestly, render the entire concept of a relational database useless), would reach resource limits very quickly, etc.
Each record of information is a row, not a column (or table). In this case, each row in your table is a "User Task". It defines (or has a Foreign Key to) a User and defines (or has a Foreign Key to) a Task.
but that would mean there would be 60 (number of current users) rows per day
If the number of records in the table starts to become a problem, you can start looking into things like sharding and partitioning, archiving old data, etc. You've got time though, because "dozens of records per day" is sustainable for thousands of years. (And by then I imagine the hardware will be at least twice as good as it is today.)
Right now the database have one column for each user with the task name in it (which is bad in my opinion as each new user would need to change the scheme of the table)
You're right, this is very bad. Using one column for user, one for the task and one for the date, will be much better.
60 rows per day is not much. This means 21,900 rows per year and 219,000 rows in ten years. MySQL is able to handle millions of rows in a table.
If you have two indexes, one for user and one for the date, searching for data will be fast enough.
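As a rough illustration of the row-per-task layout, here is a sketch using Python's sqlite3 module (SQLite standing in for MySQL). The table name user_tasks and its column names are assumptions. It also varies the indexing slightly: a composite primary key on (user_id, task_date) covers lookups by user, and a separate index covers lookups by date.

```python
import sqlite3

# SQLite stands in for MySQL. user_tasks and its columns are
# assumed names, not taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_tasks (
        user_id   INTEGER NOT NULL,
        task_date TEXT    NOT NULL,          -- ISO date, e.g. '2024-01-15'
        task      TEXT    NOT NULL,
        PRIMARY KEY (user_id, task_date)     -- one task per user per day
    );
    CREATE INDEX idx_user_tasks_date ON user_tasks (task_date);
""")

# Adding a user means adding rows; the schema never changes.
rows = [(u, "2024-01-15", f"task for user {u}") for u in range(1, 61)]
conn.executemany("INSERT INTO user_tasks VALUES (?, ?, ?)", rows)

# All tasks for one day (served by the date index).
per_day = conn.execute(
    "SELECT COUNT(*) FROM user_tasks WHERE task_date = ?", ("2024-01-15",)
).fetchone()[0]

# One user's task for one day (served by the primary key).
one_user = conn.execute(
    "SELECT task FROM user_tasks WHERE user_id = ? AND task_date = ?",
    (7, "2024-01-15"),
).fetchone()[0]
```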
Knowing nothing else about your database or schema, why not create a dimension table to store your users and fact table to track your task details?
That way you can more easily add new users and the tasks table would continue to grow as new facts are added. It would also be very easy to denormalize this model for query and/or reporting purposes.
Adding columns is a nuisance and can be slow. Instead have a table with columns (user, task, etc)
Even "60 rows per second" is not a problem. 600/second might be.
See the tag [pivot-table] for how to turn rows into columns for output display.

I came up with this SQL structure to allow rolling back and auditing user information, will this be adequate?

So, I came up with an idea to store my user information and the updates they make to their own profiles in a way that it is always possible to rollback (as an option to give to the user, for auditing and support purposes, etc.) while at the same time improving (?) the security and prevent malicious activity.
My idea is to store the user's info in rows but never allow the API backend to delete or update those rows, only to insert new ones that should be marked as the "current" data row. I created a graphical explanation:
Schema image
The potential issue I see with this model is that users may update their information too frequently, bloating the database (1 million users with an average of 5 updates per user makes 5 million entries). To handle this, I came up with the idea of using partitioning to set apart the rows with "false" in the "current" column, where they should not harm performance and can wait to be cleaned up periodically.
Am I right to choose this model? Is there any other way to do such a thing?
I'd also use a second table user_settings_history.
When a setting is created, INSERT it in the user_settings_history table, along with a timestamp of when it was created. Then also UPDATE the same settings in the user_settings table. There will be one row per user in user_settings, and it will always be the current settings.
So the user_settings would always have the current settings, and the history table would have all prior sets of settings, associated with the date they were created.
This simplifies your queries against the user_settings table. You don't have to modify your queries to filter for the current flag column you described. You just know that the way your app works, the values in user_settings are defined as current.
If you're concerned about the user_settings_history table getting too large, the timestamp column makes it fairly easy to periodically DELETE rows over 180 days old, or whatever number of days seems appropriate to you.
By the way, 5 million rows isn't so large for a MySQL database. You'd want your queries to use an index where appropriate, but the size alone isn't a disadvantage.
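Here is a small sketch of the two-table approach, runnable with Python's sqlite3 module (SQLite standing in for MySQL). The table names come from the answer above; the columns (a single email setting, a created_at timestamp stored as text) and both helper functions are assumptions. REPLACE INTO stands in for the UPDATE step so that a user's first save works too.

```python
import sqlite3

# SQLite stands in for MySQL. The columns and helper names are
# assumptions; only the two table names come from the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_settings (
        user_id INTEGER PRIMARY KEY,   -- exactly one current row per user
        email   TEXT
    );
    CREATE TABLE user_settings_history (
        user_id    INTEGER,
        email      TEXT,
        created_at TEXT                -- ISO timestamp of this version
    );
""")

def save_settings(user_id, email, created_at):
    """Append to the history and overwrite the current row, atomically."""
    with conn:
        conn.execute(
            "INSERT INTO user_settings_history VALUES (?, ?, ?)",
            (user_id, email, created_at),
        )
        conn.execute("REPLACE INTO user_settings VALUES (?, ?)", (user_id, email))

def purge_history(cutoff):
    """Delete history rows older than the cutoff timestamp."""
    with conn:
        conn.execute(
            "DELETE FROM user_settings_history WHERE created_at < ?", (cutoff,)
        )

save_settings(1, "old@example.com", "2023-01-01")
save_settings(1, "new@example.com", "2024-06-01")
current = conn.execute(
    "SELECT email FROM user_settings WHERE user_id = 1"
).fetchone()[0]
purge_history("2024-01-01")
history_left = conn.execute(
    "SELECT COUNT(*) FROM user_settings_history"
).fetchone()[0]
```

Reads of the current settings never need a "current" filter, and a rollback is just a save_settings call with values taken from a history row.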

Using a MySQL View to Retrieve Running Total

We're building an e-commerce system and we need some help in deciding on the best way to determine how much stock is available per product.
Say we have the tables "products", "products_in", and "products_out". "products_in" records all our transactions that increase the quantities of our products (e.g. when we buy the products from our wholesale suppliers). While "products_out" records all our transactions that decrease the quantities of our products (e.g. when our customers buy the products).
In our apps, retrieving the quantities available for our products is more common than writing/updating records in the "products_in" and "products_out" tables. Given this, will the use of a MySQL view that depends on "products_in" and "products_out" and computes the available stock be more efficient than computing it on the fly every time we query it? Will the value on the view be recomputed every time there's a new record in "products_in" or "products_out"? Or will the view recompute the value every time we query it (which can be quite expensive in our case)?
will the use of a MySQL view that depends on "products_in" and "products_out" and computes the available stock be more efficient than computing it on the fly every time we query it? Will the value on the view be recomputed every time there's a new record in "products_in" or "products_out"? Or will the view recompute the value every time we query it (which can be quite expensive in our case)?
Let's think of the db steps in each case:
Case 1: You compute available_stock every time a product comes in or goes out, and store it in, say, the product table.
If a product comes in, you run an INSERT on the product_in table; if a product goes out, you run an INSERT on the product_out table.
In either case, you also run an UPDATE on the available_stock column of product. (Assume here that if 10 products come in or 10 products go out, 10 individual queries will be fired.) Expensive?
Case 2: You compute available_stock in a view every time and do not store it in the database.
You fetch records from the product_in and product_out tables (only for the few products for which you want available_stock), do some math, and display the estimated stock. Expensive?
I personally would go with case 2, because it involves fewer db transactions overall than case 1, which involves tons of transactions to keep the stock in sync.
Footnote: As an aside, I'd say that if you are a hardcore 'Object Oriented Programmer', then your db mappings definitely violate the fundamentals. Products_in and Products_out are both the same kind of entity (object), recording inventory/stock transactions (just as Father and Mother entities are both Persons), so you should encapsulate them in one general table, ProductInOutData.
In ProductInOutData, you can then add an enum having either in value or out value. Having both in and out records in one table will not only improve the readability and accessibility but also will help in easy calculation of the products coming in or going out making the case 2 more lightweight.
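On the question of when a view is recomputed: an ordinary MySQL view is a stored query, not a stored result, so it is evaluated each time you select from it. The sketch below shows that behaviour together with the single-table layout suggested above, using Python's sqlite3 module (SQLite standing in for MySQL; its views behave the same way in this respect). The column names, the product_stock view, and the available_stock helper are assumptions.

```python
import sqlite3

# SQLite stands in for MySQL. Column names and the view name are
# assumptions; ProductInOutData is the table name suggested above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductInOutData (
        product_id INTEGER NOT NULL,
        direction  TEXT    NOT NULL CHECK (direction IN ('in', 'out')),
        quantity   INTEGER NOT NULL
    );

    -- Stock = everything that came in minus everything that went out.
    CREATE VIEW product_stock AS
    SELECT product_id,
           SUM(CASE WHEN direction = 'in' THEN quantity ELSE -quantity END)
               AS available_stock
    FROM ProductInOutData
    GROUP BY product_id;

    INSERT INTO ProductInOutData VALUES (1, 'in', 100), (1, 'out', 30);
""")

def available_stock(product_id):
    """Hypothetical helper: read one product's stock through the view."""
    row = conn.execute(
        "SELECT available_stock FROM product_stock WHERE product_id = ?",
        (product_id,),
    ).fetchone()
    return row[0] if row else 0

before = available_stock(1)
conn.execute("INSERT INTO ProductInOutData VALUES (1, 'out', 20)")
after = available_stock(1)   # the view reflects the new row immediately
```

So the view buys you a simpler query, not a cached value; whether the per-query cost matters depends on how many movement rows each product has and on having an index on product_id.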

Design for 'Total' field in a database

I am trying to find an optimal solution for my Database (MySQL), but I'm stuck over the decision whether or not to store a Total column.
This is the simplified version of my database :
I have a Team table, a Game table and a 'Score' table. Game will have {teamId, scoreId,...} while Score table will have {scoreId, Score,...} (Here ... indicates other columns in the tables).
On the home page I need to show the list of Teams with their scores. Over time the number of Teams will grow to 100s while the list of Score(s) will grow to 100000s. Which is the preferred way:
Should I sum up the scores and show along with teams every time the page is requested. (I don't want to cache because the scores will keep changing) OR
Should I have a total_score field in the Team table where I update the total_score of a team every time a new score is added to the Scores table for that group?
Which of the two is a better option or is there any other better way?
I use two guidelines when deciding to store a calculated value. In the best of all worlds, both of these statements will be true.
1) The value must be computationally expensive.
2) The value must have a low probability of changing.
If the cost of calculating the value is very high, but it changes daily, I might consider making a nightly job that updates the value.
Start without the total column and only add it if you start having performance issues.
Calculating sum at request time is better for accuracy but worse for efficiency.
Caching total in a field (dramatically) improves performance of certain queries, but increases code complexity or may show stale data (if you update cached value not at the same time, but via cron job).
It's up to you! :)
I agree that computed values should not be used except for special situations such as month end snapshots of databases.
I would simply create a view with one column in the view equal to your computed total column. Then you can query the view instead of the base tables.
It depends on how often your scores get updated and what exactly the "score" means.
Case1: Score is a LIVE score
If the "score" is a live score, like runs scored in cricket or baseball or the score of a volleyball or table tennis match, then I really don't understand the need to show the "sum" of the "running" scores. However, this may be a requirement in some cases, such as showing the total runs scored by a team so far plus the runs scored in the ongoing (live) game.
In this case I suggest another option, which is a combination of your 1st and 2nd options.
A total_score column in the team table would work well with a slight change to your data model:
Add a new column in the scores table called LIVE, which will be 0 for a finished match and 1 for a live match (and optionally -1 to indicate that the match is about to start but the scores won't get updated).
Now union the two tables, something like:
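-- UNION ALL (not UNION) so a live sum equal to the stored total is not dropped as a duplicate
select team_id, sum(total_score) as total_score from (
select team_id, total_score from team
union all
select team_id, sum(score) as total_score from scores where live = 1 group by team_id) subquery
group by team_id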
Case2: Score is just a RESULT
Well, just query the db directly (your 1st option), because the result will be updated only after the game ends, and the update will in fact be a new entry in the score table.
If my assumption is correct, the scores get updated only after a game is finished. Moreover, updates are even less frequent when you consider how many games a single team plays.
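One detail of the Case 1 query (stored total plus live scores) is worth checking before relying on it: plain UNION removes duplicate rows, so a team whose live sum happens to equal its stored total would lose one of the two rows. The sketch below compares UNION with UNION ALL on made-up data, using Python's sqlite3 module (SQLite standing in for MySQL). The table layout follows the query in the answer; the sample values are invented.

```python
import sqlite3

# SQLite stands in for MySQL. The sample data is invented; team 1's
# live scores (4 + 6) deliberately equal its stored total (10).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (team_id INTEGER PRIMARY KEY, total_score INTEGER);
    CREATE TABLE scores (team_id INTEGER, score INTEGER, live INTEGER);
    INSERT INTO team VALUES (1, 10), (2, 40);
    INSERT INTO scores VALUES (1, 4, 1), (1, 6, 1), (2, 5, 1), (2, 40, 0);
""")

QUERY = """
    SELECT team_id, SUM(total_score) FROM (
        SELECT team_id, total_score FROM team
        {op}
        SELECT team_id, SUM(score) AS total_score FROM scores
        WHERE live = 1 GROUP BY team_id
    ) AS subquery
    GROUP BY team_id ORDER BY team_id
"""

# UNION ALL keeps both rows for team 1: 10 stored + 10 live = 20.
with_union_all = conn.execute(QUERY.format(op="UNION ALL")).fetchall()

# UNION collapses the two identical (1, 10) rows into one: 10.
with_union = conn.execute(QUERY.format(op="UNION")).fetchall()
```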

MySQL - Calculating Values vs Storing in Fields

I apologize if this has been asked before, but I'm pretty new to this and unable to find an answer that addresses the situation I'm faced with.
I'm trying to put together a database to run behind our company website. The database will store information on customer invoices and payments. I'm trying to figure out if I should create a field for the invoice balance, or if I should just have it calculated when the customer account is accessed? I don't want to create redundant data, and don't want to have the chance that somehow the field wouldn't be updated, and would thus be incorrect...but I also don't want to create too large a burden on the server - especially if we pull up an overview of customer accounts - which would need to then calculate the balance of every account. Right now we are starting from scratch, so I want to set it up right!
We are anticipating having a couple hundred customer accounts by the end of the year, but will most likely be up to a couple thousand by the end of next year. (Average number of invoices per customer would be roughly 2-3 per year.)
There are probably other things to consider as well. For example, what if your invoice consists of IDs of products in another table... and the prices of those other products change? When you go to generate the invoice, you'll have the wrong total in there for what the guy actually paid 6 months ago. So if it's a situation like that, you'll probably want to store the total on the invoice. And I wouldn't worry too much about doing a little math if you go the other route; it's not likely to be a huge bottleneck.
Yes, remember that items/goods could and will change their prices over time. You need to have the invoice balance as of the day of the purchase. Calculating the balance on the fly could lead to wrong balances later on.
Invoice balance is essential data to store, however I think you meant account balance since you referred to that later.
Storing the account balance would be denormalizing it, and that's not how accounting databases are typically designed. Always calculate account balance from invoices minus payments. Denormalizing is almost always a bad idea, and if you need to optimize in the future, there are other places to cache data that are more efficient than the database.
In your use case, a query like that on a few thousand rows would be negligible anyway, so don't optimize before you have to.
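For a sense of what the overview query could look like, here is a sketch that stores each invoice's total (fixed at the time of purchase) and derives every account balance as invoices minus payments. It runs on Python's sqlite3 module, with SQLite standing in for MySQL; the table names, column names, and amounts in cents are all assumptions.

```python
import sqlite3

# SQLite stands in for MySQL. Tables, columns, and sample amounts
# are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        total_cents INTEGER NOT NULL     -- stored as of the purchase date
    );
    CREATE TABLE payments (
        payment_id   INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL,
        amount_cents INTEGER NOT NULL
    );
    INSERT INTO invoices VALUES (1, 1, 12000), (2, 1, 8000), (3, 2, 4500);
    INSERT INTO payments VALUES (1, 1, 12000), (2, 1, 3000);
""")

# Each table is summed per customer before joining; joining the raw
# tables directly would multiply invoice rows by payment rows.
OVERVIEW_SQL = """
    SELECT i.customer_id, i.invoiced - COALESCE(p.paid, 0) AS balance_cents
    FROM (SELECT customer_id, SUM(total_cents) AS invoiced
            FROM invoices GROUP BY customer_id) AS i
    LEFT JOIN (SELECT customer_id, SUM(amount_cents) AS paid
                 FROM payments GROUP BY customer_id) AS p
      ON p.customer_id = i.customer_id
    ORDER BY i.customer_id
"""
balances = dict(conn.execute(OVERVIEW_SQL).fetchall())
```

At a few thousand customers with 2-3 invoices a year each, a query of this shape reads only a few thousand rows per overview page.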