MySQL - Calculating Values vs Storing in Fields - mysql

I apologize if this has been asked before, but I'm pretty new to this and unable to find an answer that addresses the situation I'm faced with.
I'm trying to put together a database to run behind our company website. The database will store information on customer invoices and payments. I'm trying to figure out whether I should create a field for the invoice balance or just have it calculated when the customer account is accessed. I don't want to create redundant data, and I don't want to risk the field somehow not being updated and therefore being incorrect, but I also don't want to put too large a burden on the server, especially if we pull up an overview of customer accounts, which would then need to calculate the balance of every account. Right now we are starting from scratch, so I want to set it up right!
We are anticipating having a couple hundred customer accounts by the end of the year, but will most likely be up to a couple thousand by the end of next year. (Average number of invoices per customer would be roughly 2-3 per year.)

There are probably other things to consider as well. For example, what if your invoice consists of IDs of products in another table, and the prices of those products change? When you go to regenerate the invoice, you'll have the wrong total in there for what the customer actually paid 6 months ago. So if it's a situation like that, you'll probably want to store the total on the invoice. And I wouldn't worry too much about doing a little math if you go the other route; it's not likely to be a huge bottleneck.

Yes, remember that items/goods could and will change their prices over time. You need to have the invoice balance as of the day of the purchase. Calculating the balance on the fly could lead to wrong balances later on.

Invoice balance is essential data to store, however I think you meant account balance since you referred to that later.
Storing the account balance would be denormalizing it, and that's not how accounting databases are typically designed. Always calculate account balance from invoices minus payments. Denormalizing is almost always a bad idea, and if you need to optimize in the future, there are other places to cache data that are more efficient than the database.
In your use case, a query like that on a few thousand rows would be negligible anyway, so don't optimize before you have to.
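A minimal sketch of the advice above, using Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names (invoices, payments, total, amount) are assumptions for illustration and do not come from the question. Each invoice stores its total as fixed on the invoice date, while the account balance is never stored and is computed as invoices minus payments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    total INTEGER NOT NULL          -- in cents, fixed on the invoice date
);
CREATE TABLE payments (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    amount INTEGER NOT NULL         -- in cents
);
INSERT INTO invoices (customer_id, total) VALUES (1, 12000), (1, 8000), (2, 5000);
INSERT INTO payments (customer_id, amount) VALUES (1, 12000), (1, 3000);
""")

def account_balances(conn):
    """Return {customer_id: balance_in_cents}, computed on the fly."""
    rows = conn.execute("""
        SELECT i.customer_id,
               i.invoiced - COALESCE(p.paid, 0) AS balance
        FROM (SELECT customer_id, SUM(total) AS invoiced
              FROM invoices GROUP BY customer_id) AS i
        LEFT JOIN (SELECT customer_id, SUM(amount) AS paid
                   FROM payments GROUP BY customer_id) AS p
               ON p.customer_id = i.customer_id
        ORDER BY i.customer_id
    """).fetchall()
    return dict(rows)

balances = account_balances(conn)
```

Each side is aggregated in its own subquery before the join, so a customer with several invoices and several payments is not counted more than once.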

Related

How to store recent usage frequency in MySQL

I'm working on the Product Catalog module of an Invoicing application.
When the user creates a new invoice the product name field should be an autocomplete field which shows the most recently used products from the product catalog.
How can I store this "usage recency/frequency" in the database?
I'm thinking about adding a new field recency which would be increased by 1 every time the product was used, and decreased by 1/(count of all products) when another product is used. Then I would use this recency field for ordering, but it doesn't seem to me the best solution.
Can you tell me what the best practice is for this kind of problem?
Solution for the recency calculation:
Create a new column in the products table, named last_used_on for example. Its data type should be TIMESTAMP (the MySQL representation for the Unix-time).
Advantages:
Timestamps contain both date and time parts.
They make very precise calculations and comparisons on dates and times possible.
They let you format the saved values in the date-time format of your choice.
You can convert from any date-time format into a timestamp.
For your autocomplete fields, they let you filter the product list as you wish: for example, display all products used since [date-time], fetch all products used between [date-time-1] and [date-time-2], or get the products used only on Mondays, at 1:37:12 PM, in the last two years, two months and three days (that is how flexible timestamps are).
Solution for the usage rate calculation:
Well, actually, you are not speaking about a frequency calculation, but about a rate - even though one can argue that frequency is a rate, too.
Frequency implies using the time as the reference unit and it's measured in Hertz (Hz = [1/second]). For example, let's say you want to query how many times a product was used in the last year.
A rate, on the other hand, is a comparison, a relation between two related units. Like for example the exchange rate USD/EUR - they are both currencies. If the comparison takes place between two terms of the same type, then the result is a number without measurement units: a percentage. Like: 50 apples / 273 apples = 0.1832 = 18.32%
That said, I suppose you tried to calculate the usage rate: the number of usages of a product in relation with the number of usages of all products. Like, for a product: usage rate of the product = 17 usages of the product / 112 total usages = 0.1517... = 15.17%. And in the autocomplete you'd want to display the products with a usage rate bigger than a given percentage (like 9% for example).
This is easy to implement. In the products table add a column usages of type int or bigint and simply increment its value each time a product is used. And then, when you want to fetch the most used products, just apply a filter like in this sql statement:
SELECT
    id,
    name,
    (usages * 100) / (SELECT SUM(usages) AS total_usages FROM products) AS usage_rate
FROM products
GROUP BY id
HAVING usage_rate > 9
ORDER BY usage_rate DESC;
In the end, recency, frequency and rate are three different things.
Good luck.
To allow for future flexibility, I'd suggest the following additional (*) table to store the entire history of product usage by all users:
Name: product_usage
Columns:
id - internal surrogate auto-incrementing primary key
product_id (int) - foreign key to product identifier
user_id (int) - foreign key to user identifier
timestamp (datetime) - date/time the product was used
This would allow the query to be fine tuned as necessary. E.g. you may decide to only order by past usage for the logged in user. Or perhaps total usage within a particular timeframe would be more relevant. Such a table may also have a dual purpose of auditing - e.g. to report on the most popular or unpopular products amongst all users.
(*) assuming something similar doesn't already exist in your database schema
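A rough sketch of how such a history table could feed the autocomplete, using Python's built-in sqlite3 module in place of MySQL. The column name used_at (instead of timestamp), the sample rows and the helper recent_products are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    product_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    used_at TEXT NOT NULL           -- date/time the product was used
);
""")
conn.executemany(
    "INSERT INTO product_usage (product_id, user_id, used_at) VALUES (?, ?, ?)",
    [
        (10, 1, "2024-01-05 09:00:00"),
        (20, 1, "2024-02-01 10:00:00"),
        (10, 2, "2024-03-01 11:00:00"),
        (30, 1, "2024-03-02 12:00:00"),
        (20, 1, "2024-03-03 13:00:00"),
    ],
)

def recent_products(conn, user_id, since, limit=10):
    """Products one user has used since a given date/time, most recent first."""
    return conn.execute("""
        SELECT product_id, COUNT(*) AS uses, MAX(used_at) AS last_used
        FROM product_usage
        WHERE user_id = ? AND used_at >= ?
        GROUP BY product_id
        ORDER BY last_used DESC, uses DESC
        LIMIT ?
    """, (user_id, since, limit)).fetchall()

suggestions = recent_products(conn, user_id=1, since="2024-02-01 00:00:00")
```

Dropping the user_id condition gives the ranking across all users, and changing the ORDER BY to uses DESC ranks by frequency within the time window instead of by recency.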
Your problem is related to many other web-scale search applications, such as showing spelling corrections, related searches, or "trending" topics. You recognized correctly that both recency and frequency are important criteria in determining "popular" suggestions. In practice, it is desirable to compromise between the two: recency alone will suffer from random fluctuations, but you also don't want to use only frequency, since some products might have been purchased a lot in the past while their popularity is now declining (or they might have gone out of stock or been replaced by successor models).
A very simple but effective implementation that is typically used in these scenarios is exponential smoothing. First of all, most of the time it suffices to update popularities at fixed intervals (say, once each day). Set a decay parameter α (say, 0.95) that tells you how much yesterday's orders count compared to today's. Similarly, orders from two days ago will be worth α·α ≈ 0.9 times as much as today's, and so on. To choose this parameter, note that the value decays to one half after log(0.5)/log(α) days (about 14 days for α = 0.95).
The implementation only requires a single additional field per product, orders_decayed. Then, all you have to do is update this value each night with the total daily orders:
orders_decayed = α * orders_decayed + (1-α) * orders_today
You can sort your applicable suggestions according to this value.
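The nightly update can be sketched in a few lines of Python. The function names and the simulated order history below are assumptions for illustration; only the update formula and the half-life expression come from the answer above.

```python
import math

ALPHA = 0.95  # decay parameter, the example value used in the answer

def update_decayed(orders_decayed, orders_today, alpha=ALPHA):
    """One nightly step of exponential smoothing."""
    return alpha * orders_decayed + (1 - alpha) * orders_today

def half_life_days(alpha=ALPHA):
    """Number of days after which an old contribution has decayed to one half."""
    return math.log(0.5) / math.log(alpha)

# Hypothetical product: ordered 10 times a day for 30 days, then not at all.
value = 0.0
for _ in range(30):
    value = update_decayed(value, 10)
peak = value
for _ in range(14):
    value = update_decayed(value, 0)
```

After 14 days without orders the stored value has dropped to just under half of its peak, which matches the half-life estimate for α = 0.95.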
To have an individual user experience, you should not rely on a field in the product table, but rather on the history of the user.
The occurrences of the product in past invoices created by the user would be a good starting point. The advantage is that you don't need to add fields or tables for this functionality. You simply rely on data that is already present anyway.
Since it is an auto-complete field, maybe past usage is not really relevant. Display n search results as the user types. If you feel that results are better if you include recency in the calculation of the order, go with it.
Now, the implementation may differ depending on how and when products should be displayed, and on whether the usage frequency has to be user-specific or application-wide (overall). In both cases, I would suggest having a history table, which you can later use for other analysis as well.
You could design your history table with at least the columns below:
Id | ProductId | LastUsed (timestamp) | UserId
You can then create a view that queries this table for a specific time range (for example, product frequency over the last week, last month or last year) and gives you the most sold products for that range.
The same view can be used for a user-specific frequency by adding a condition that filters by UserId.
I'm thinking about adding a new field recency which would be increased
by 1 every time the product was used, and decreased by 1/(count of all
products), when an other product is used. Then use this recency field
for ordering, but it doesn't seem to me the best solution.
Yes, it is not good practice to add a column for this and update it every time. Imagine that this is a long-awaited product that people love to buy. If 1000 people or more request it at the same time, every request updates the same row. To maintain concurrency, the database has to lock that row for each update, which is definitely going to hurt your database and application performance. Instead, you can simply insert a new row.
The other possible solution is to use your existing invoice table, as it will definitely have all the product and user information, and create a view on it to get the frequently used products as I mentioned above.
Please note that this is just another option to achieve what you are expecting. Personally, I would recommend having a history table instead.
The scenario
When the user creates a new invoice the product name field should be an autocomplete field which shows the most recently used products from the product catalogue.
your suggested solution
How can I store this "usage recency/frequency" in the database?
If it is a web application, don't store it in a database on your server, because each user makes different choices.
Store it in the user's browser, as a cookie or in localStorage, because it will improve the user experience.
If you still want to store it in a MySQL table, do the following:
Create a column recency, as described in the question.
Each time the item is used, increase the count by 1, as described in the question.
Don't decrease it when other items are used.
To get the most used item, run this query (products stands for your product table):
SELECT * FROM products WHERE recency = (SELECT MAX(recency) FROM products);
Side note
Use the database only if you want to show the most used products independently of the user.
As you aren't certain which measure to choose, and this is really a user experience problem, I advise you to offer a number of measures and let the user choose the one he or she prefers. For example, the set of available measures could include the most popular products of the last week, last month, last 3 months, last year, and overall. For the sake of performance, I'd prefer to store those statistics in a separate table that is refreshed by a scheduled job running, for example, every 3 hours.

Organizing monthly account extracts for personal use MS Access

I'm a bit of a newbie with databases and database design, but I'm hoping someone can point me in the right direction. I currently have 14 monthly loan extracts, each of which contain all accounts, their status, balance and customer contact info as-of month end. Not knowing what to do, I imported each of the monthly files into Access with each table acting more like a tab from an Excel workbook. Laugh away - I now know that's not how it's supposed to work.
I've done my homework and I understand how to split up part of my data into Customer and Account tables, but what do I do with the account balances? My thought is to create a Balances table, create a relationship to the Accounts table and create columns for each month. This seems logical, but is it the best way?
99% of my analysis involves trend reporting and other ad hoc tasks - tracking the total balances by product type over time given other criteria, such as credit score or age. My intended use is to create queries to select the data I need and connect to it via Get & Transform in Excel for final manipulation and report writing.
This also begs the question "how normalized should my new database be?" Each monthly extract is cumulative, so a good 75% of my data is redundant contact info already, but how normalized should I go?
Sorry for ranting, but if anyone has experience in setting up their own historical database or could point me in a direction that will get me on track, I would appreciate it.
Best practice for transactional systems is close to what you expect:
1. Create a Customer table
2. Create an Account table
3. Create an Account Balance table
4. Create relationships from the Account to Customer, and from the Account Balance to the Account table.
You can create a column for each month, provided you have Year as part of the key of the Account Balance table. Even better would be to have the key for the Account Balance be Account ID and Date.
However, since you are performing analytics over the data, a de-normalized approach is not only acceptable -- it is preferable. So yes, you can (and perhaps should, based upon your use cases) put all the data into one big flat table and then compile your analytics.
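As a rough illustration of the normalized layout with a balance table keyed by account ID and date, here is a sketch using Python's built-in sqlite3 module rather than Access. The table names, column names and sample figures are assumptions. It stores one row per account per month-end instead of one column per month, and runs a trend query of total balance by product type over time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    product_type TEXT NOT NULL
);
CREATE TABLE account_balance (
    account_id INTEGER NOT NULL REFERENCES account(account_id),
    as_of_date TEXT NOT NULL,       -- month-end date of the extract
    balance INTEGER NOT NULL,       -- in cents
    PRIMARY KEY (account_id, as_of_date)
);
INSERT INTO account VALUES (1, 'auto'), (2, 'auto'), (3, 'mortgage');
INSERT INTO account_balance VALUES
    (1, '2024-01-31', 100000), (2, '2024-01-31', 50000), (3, '2024-01-31', 900000),
    (1, '2024-02-29',  90000), (2, '2024-02-29', 45000), (3, '2024-02-29', 895000);
""")

# Total balance by product type for each month-end.
trend = conn.execute("""
    SELECT b.as_of_date, a.product_type, SUM(b.balance) AS total_balance
    FROM account_balance AS b
    JOIN account AS a ON a.account_id = b.account_id
    GROUP BY b.as_of_date, a.product_type
    ORDER BY b.as_of_date, a.product_type
""").fetchall()
```

With this shape, loading a fifteenth monthly extract only adds rows, and the trend query does not need to change.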

Money expiration tracking

I am working on a money expiration tracking problem at the moment (originally it is not money, but I have used it as a more convenient example).
A user can earn money from a platform for some mysterious reason and spend it on buying stuff (products, gifts etc.).
I am looking for an algorithm (an SQL query in the best case) to find the current balance of a user.
The events of spending and earning money are stored in different database (MySQL) tables (let's say user_earned and user_spent). So in the normal case, I would simply sum the user's totals from user_earned and subtract the money spent (the total of user_spent).
BUT! There is a condition: earned money expires after 2 years if it is not used.
That means that if a user has not used the money, or has used just a part of it, the rest will expire. When a user spends money, it is taken from the oldest earning record that has not expired, so that the balance (bonus) is calculated in the user's favor.
These are 5 scenarios with events in time, to have a better understanding on the case:
Both tables (user_earned and user_spent) have timestamps for date tracking.
I did something similar in one of my projects.
Looks like you need an additional table spends_covering with columns
spend_id, earned_id, sum
So for each spends record you need to insert one or more rows into spends_covering to mark the money as 'used'.
Then the balance would be just the sum of the unused amounts of all earnings that are less than 2 years old.
select sum(sub.earned_sum - sub.spent_sum) as balance
from
    -- one row per earning inside the two-year window
    (select e.sum as earned_sum,
            coalesce(sum(sc.sum), 0) as spent_sum  -- 0 if nothing was spent from this earning
     from earned e
     left join spends_covering sc on e.earned_id = sc.earned_id
     where e.date BETWEEN ...
     group by e.earned_id, e.sum
     having earned_sum > spent_sum) sub
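The step of filling the covering table (spending from the oldest earning that has not yet expired) could look roughly like the Python sketch below. The tuple layouts, the function names and the 730-day approximation of "two years" are assumptions made for illustration, not part of the answer above.

```python
from datetime import date

EXPIRY_DAYS = 730  # "two years", approximated as 730 days (assumption)

def allocate_spends(earnings, spends):
    """earnings: [(earned_id, date, amount)], spends: [(spend_id, date, amount)].

    Returns the covering rows [(spend_id, earned_id, amount)] and the
    amount left on each earning, spending from the oldest usable earning first.
    """
    remaining = {earned_id: amount for earned_id, _, amount in earnings}
    oldest_first = sorted(earnings, key=lambda e: e[1])
    covering = []
    for spend_id, spent_on, amount in sorted(spends, key=lambda s: s[1]):
        left = amount
        for earned_id, earned_on, _ in oldest_first:
            if left == 0:
                break
            age = (spent_on - earned_on).days
            if age < 0 or age >= EXPIRY_DAYS or remaining[earned_id] == 0:
                continue  # not earned yet, already expired, or used up
            used = min(left, remaining[earned_id])
            remaining[earned_id] -= used
            covering.append((spend_id, earned_id, used))
            left -= used
        if left:
            raise ValueError(f"spend {spend_id} exceeds the available balance")
    return covering, remaining

def balance(earnings, remaining, today):
    """Sum of the unused amounts of earnings that have not expired by today."""
    return sum(remaining[earned_id] for earned_id, earned_on, _ in earnings
               if 0 <= (today - earned_on).days < EXPIRY_DAYS)

earnings = [(1, date(2022, 1, 10), 100), (2, date(2023, 6, 1), 50)]
spends = [(1, date(2023, 7, 1), 30)]
covering, remaining = allocate_spends(earnings, spends)
current = balance(earnings, remaining, date(2024, 3, 1))
```

In this example the spend of 30 is taken from the older earning, whose remaining 70 then expires in January 2024, so only the 50 from the newer earning counts towards the balance on 2024-03-01.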
It may be worth it to have two tables -- one (or more) with all the historical details, one with just the current balances for each 'user'. Be sure to use transactions to keep the two in sync.

Working out users points - update vs select

I have users who earn points by taking part in various activities on the website, and the users can then spend these points on whatever they like. The way I have it set up at the minute is with two tables:
tbl_users_achievements and tbl_users_purchased_items
I have these two tables to track what the users have done and what they have bought (Obviously!)
But instead of having a column in my users table called 'user_points', I have decided to display their points by doing a SELECT on all achievements and getting the sum of the points they have earned; I then do another SELECT on how many points they have spent.
I thought it might have been better to have a column to store their points and do an UPDATE on that column when they buy or win something, but that seemed like multiple areas I would have to manage: I would have to insert a new row for the transaction and then update the user's column, whereas if I use a query to work out total won minus spent, I only have to insert the row and do no update. But the problem then comes down to the performance of running the calculation in the query.
So which solution would you go with and why?
Have a column to store their points and do an update
Use a query to work out the users points they can spend and have no column
Your current model is logically the right one - a key aspect for RDBMS normalization is not to repeat any information, and keeping an explicit "this customer has x points" column repeats data.
The benefits of this are obvious - you have less data manipulation code to write, and don't have to worry about what happens when you insert the transaction but can't update the users table.
The downsides are that you're running additional queries every time you show the customer profile; this can create a performance problem. The traditional response to that performance problem is to de-normalize, for instance by keeping a calculated total against the user table.
Only do that if that's absolutely, provably necessary.
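For reference, the normalized approach amounts to a query along these lines, sketched with Python's built-in sqlite3 module in place of MySQL. The table names come from the question; the columns user_id, points and cost are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_users_achievements (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    points INTEGER NOT NULL         -- points earned by this achievement
);
CREATE TABLE tbl_users_purchased_items (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    cost INTEGER NOT NULL           -- points spent on this item
);
INSERT INTO tbl_users_achievements (user_id, points) VALUES (1, 50), (1, 25), (2, 10);
INSERT INTO tbl_users_purchased_items (user_id, cost) VALUES (1, 30);
""")

def spendable_points(conn, user_id):
    """Points earned minus points spent, computed at read time."""
    earned, spent = conn.execute("""
        SELECT
          (SELECT COALESCE(SUM(points), 0)
             FROM tbl_users_achievements WHERE user_id = :uid),
          (SELECT COALESCE(SUM(cost), 0)
             FROM tbl_users_purchased_items WHERE user_id = :uid)
    """, {"uid": user_id}).fetchone()
    return earned - spent
```

With an index on user_id in both tables, this stays a pair of small range scans per user, which is one reason to measure before adding a stored total.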
Myself, I would put the user points into a separate table keyed by user ID (or whatever identifies the user), store them there, and run updates to increment or decrement them as achievements are attained or points are spent.

Decoupling MySQL data versus ease of use

Assume a simple database for hotel reservations with three tables.
Table 1: Reservations
This table contains a check-in and check-out date as well as a reference to one or more rooms and a coupon if applicable.
Table 2: Rooms
This table holds the data of all the hotel rooms with prices per night and number of beds.
Table 3: Coupons
This table holds the data of all the coupons.
Option #1:
If you want to get an overview of the reservations for a particular month with the total cost of each reservation, you'd have to fetch the reservations, the rooms for each reservation, and the coupon (if one is present).
With this data, you can then calculate the total amount for the reservation.
Option #2:
However, there is also another option, which is to store the total cost and discount in the reservation table so that it is much easier to fetch these calculations. The downside is that your data becomes much more dependent and much less flexible to work with. What I mean is that you have to manually update the total cost and discount of the reservation table every time you change a room or a coupon that is linked to a reservation.
What is generally recommended: performance (option #2) or data independence (option #1)?
UPDATE:
It is a MySQL database with over 500 000 rows (reservations) at this point, but is growing rapidly. I want to optimize database performance at an early stage to make sure that the UX remains fast and responsive.
Let me start to answer this with a story. (Somewhat simplified.)
2011-01-01 I reserve a room for two nights, 2011-03-01 and 2011-03-02. You don't tell me which room I'll get. (Because you don't know yet which room I'll get.) You tell me it will cost $40 per night. I have no coupons. You enter my reservation into your computer, even though you're already fully reserved for both those nights. In fact, you already have one person on the waiting list for both those nights. (Overbooking is a normal thing, not an abnormal thing.)
2011-01-15 You raise the rates for every room by $5.
2011-02-01 I call again to make sure you still have my reservation. You confirm that I have a reservation for two nights, 2011-03-01 and 2011-03-02, at $40. (Not $45, your current rate. That wasn't our deal. Our deal was $40 a night.)
2011-02-12 One person calls and cancels their reservation for 2011-03-01 and 2011-03-02. You still don't yet have a room you know for certain that I'll be able to check in to. The other person from the waiting list now has a room; I'm still on the waiting list.
2011-02-15 One person calls and cancels their reservation for 2011-03-01 and 2011-03-02. Now I have a room.
2011-03-01 I check in with a coupon.
You can store the "current" or "default" price with each room, or with each class of rooms, but you need to store the price we agreed to with my reservation.
Reservations don't reserve rooms; they reserve potential rooms. You don't know who will leave early, who will leave late, who will cancel, and so on. (Based on my experience, once in a while a room will be sealed with crime scene tape. You don't know how long that will last, either.)
You can have more reservations than room-nights.
Coupons can presumably appear at any time before check out.
If you want to get an overview of the reservations for a particular
month with the total cost of each reservation, you'd have to fetch the
reservations, the rooms for each reservation, and the coupon (if one
is present).
I don't think so. The price you agreed to should be in the reservation itself. Specific rooms can't reasonably be assigned until the last minute. If there's one coupon per reservation, that might need to be stored with the reservation, too.
The only reporting problem is in making sure your reports clearly report how much expected revenue should be ignored due to overbooking.
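A small sketch of storing the agreed price with the reservation, using Python's built-in sqlite3 module as a stand-in for MySQL. The room_classes table, the agreed_rate column and the book helper are assumptions for illustration; the dates and rates follow the story above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE room_classes (
    id INTEGER PRIMARY KEY,
    current_rate INTEGER NOT NULL   -- today's price per night, in dollars
);
CREATE TABLE reservations (
    id INTEGER PRIMARY KEY,
    room_class_id INTEGER NOT NULL REFERENCES room_classes(id),
    check_in TEXT NOT NULL,
    check_out TEXT NOT NULL,
    agreed_rate INTEGER NOT NULL    -- price per night agreed at booking time
);
INSERT INTO room_classes VALUES (1, 40);
""")

def book(conn, room_class_id, check_in, check_out):
    """Create a reservation and copy the current rate into it."""
    cur = conn.execute("""
        INSERT INTO reservations (room_class_id, check_in, check_out, agreed_rate)
        SELECT id, ?, ?, current_rate FROM room_classes WHERE id = ?
    """, (check_in, check_out, room_class_id))
    return cur.lastrowid

res_id = book(conn, 1, "2011-03-01", "2011-03-03")   # nights of 03-01 and 03-02

# The hotel later raises every rate by $5; the reservation is unaffected.
conn.execute("UPDATE room_classes SET current_rate = current_rate + 5")

# julianday() is SQLite-specific; in MySQL, DATEDIFF(check_out, check_in) plays this role.
total = conn.execute("""
    SELECT agreed_rate * CAST(julianday(check_out) - julianday(check_in) AS INTEGER)
    FROM reservations WHERE id = ?
""", (res_id,)).fetchone()[0]
```

Because the rate is copied at booking time, the monthly overview only has to read the reservations table, and a later rate change cannot alter what the guest was promised.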
The answer depends on the size of your database. For a small database option #1 is better, but for a huge database option #2 is better. So if you could say how many rows you have in each table and which database you use (Oracle, SQL Server etc.), you would get a more precise answer.
You can add a table that holds the rooms' historical prices and the reason for each change.
Table 2 then only records the latest price.