Is there a best practice for storing data for a database object (model) that will change or be deleted in the future (Django)? - mysql

I am building an order management system for an online store and would like to store information about the Product being ordered.
If I use a Foreign Key relationship to the Product, when someone changes the price, brand, supplier etc. of the Product or deletes it, the Order will be affected as well. I want the order management system to be able to display the state of the Product when it was ordered even if it is altered or deleted from the database afterwards.
I have thought about it long and hard and have come up with ideas such as storing a JSON string representation of the object; creating a duplicate Product whose foreign key I then use for the Order etc. However, I was wondering if there is a best practice or what other people use to handle this kind of situation in commercial software?
PS: I also have other slightly more complex situations, for instance, I would like the data for a User object attached to the Order to change as the User changes but then never get deleted when the User is deleted. An answer to the above question would definitely give me a good starting point.

This price-change problem is commonly handled in RDBMS (SQL) commerce applications by doing two things:
1) Inserting rows into an order_detail table when an order is placed. Each row of that table contains the particulars of the item as sold: item_id, item_count, unit_price, total_price, unit_weight, total_weight, tax_status, and so forth. So the app captures what actually was sold, and at what price. A later price change doesn't mess up sales records. You really have to do this.
2) Keeping a price table containing item_id, price, start_date, end_date. You retrieve the current price with something like this (both tables are sketched below):
SELECT item.item_id, price.price
FROM item
JOIN price ON item.item_id = price.item_id
AND price.start_date <= NOW()
AND (price.end_date > NOW() OR price.end_date IS NULL)
This approach allows you to keep track of historical prices, and also to set up future price changes. But you still copy the price into the order_detail table.
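A rough MySQL sketch of the two tables described above; the column names follow the answer, while the types and keys are illustrative assumptions:
CREATE TABLE order_detail (
  order_id     BIGINT NOT NULL,
  item_id      BIGINT NOT NULL,
  item_count   INT NOT NULL,
  unit_price   DECIMAL(10,2) NOT NULL,   -- price as sold, copied at order time
  total_price  DECIMAL(10,2) NOT NULL,
  unit_weight  DECIMAL(10,3),
  total_weight DECIMAL(10,3),
  tax_status   VARCHAR(20),
  PRIMARY KEY (order_id, item_id)
);

CREATE TABLE price (
  item_id    BIGINT NOT NULL,
  price      DECIMAL(10,2) NOT NULL,
  start_date DATETIME NOT NULL,
  end_date   DATETIME NULL,              -- NULL means the price is still in effect
  PRIMARY KEY (item_id, start_date)
);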
The point is: once you've accepted an order, its details cannot change in the future. You copy the actual customer data (name, shipping address, etc.) from your current customer table into a separate order table when you accept the order, and (as mentioned above) you copy the details of each item into an order_detail table.
Your auditors will hate you if you don't do this. Ask me how I know that sometime.

I would recommend adding attributes to the Order model and copying the data you need into them when the Order is saved, and then implementing a historical data table where you store JSONField snapshots (or some other serialized version) of the Product whenever it is created or updated; that way people can refer to the historical data table if need be. This is more efficient than storing a full-fledged representation of the Product on the Order object, because the time taken to create the historical record is essentially charged to the admin creating or updating the Product rather than to the customer placing the Order. You can even create the historical records in the background using threads or a task queue once you get to that level.
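Here is a minimal MySQL-level sketch of such a history table; the actual Django model would generate something similar. The names are illustrative, and the JSON column assumes MySQL 5.7+:
CREATE TABLE product_history (
  id         BIGINT AUTO_INCREMENT PRIMARY KEY,
  product_id BIGINT NOT NULL,
  snapshot   JSON NOT NULL,       -- serialized state of the Product at save time
  saved_at   DATETIME NOT NULL
);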

While it is hard to answer your question without seeing at least your models.py, I would suggest archiving the records. You can add a boolean field called historical which defaults to False. When an order is made, set the historical value of the previous order (or orders) to True in your viewset or view function.
Here, historical=True means the record is archived. You can then filter on this historical column to decide what to display and when. Sorry, this is just a high-level outline.
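A rough SQL-level sketch of that flag; in Django this would be a BooleanField with default=False, and the table/column names and the example customer_id here are illustrative:
ALTER TABLE orders ADD COLUMN historical BOOLEAN NOT NULL DEFAULT FALSE;

-- When a new order supersedes older ones, archive the customer's previous orders:
UPDATE orders SET historical = TRUE WHERE customer_id = 42 AND historical = FALSE;

-- Display only the current (non-archived) orders:
SELECT * FROM orders WHERE historical = FALSE;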

Record Master values in MySQL database?

I’m trying to figure out a good way of handling this situation strictly using MySQL. (using generated columns, views, logic statements, procedures, etc.)
I’ve simplified this for the example. Let’s say I have a table storing cost of production information for particular products in particular years in particular factories.
Some of these costs are specific for the product. (plastic, molding cost, packaging, labour, etc.)
And some of these costs are fairly generic; I may want to assign a specific value to them, but for many of them most of the time, I’ll just want them to refer to a particular value for the factory in that year. I’ll refer to these as “Master” values. (Such as overhead costs, so things like interest costs, electricity, heat, property taxes, admin labour, etc.)
Then if I update my Master values, the costs on these will automatically be adjusted; and they could be different for each year and factory. (So I can’t just use default values.)
So my columns might be:
And here’s the logic of how that would be defined:
$MValue(var) = var WHERE product_id = M (Master) AND year = year AND factory_id = factory_id;
Then essentially, if I wanted to use a unique cost for that product, I could put the cost amount in the field, but if I wanted it to use the Master value (shown on row 4, designated by M), then I could insert $MValue(column_id) in the field.
Any thoughts on how this could be accomplished with MySQL?
I should add that I’m already using generated (calculated) columns and views on these fields, which is why I’m looking for a strictly MySQL solution.
I suggest storing the derived costs as NULL in your product rows, and then defining a view that joins to the master row:
CREATE VIEW finalcosts AS
SELECT p.cost_id, p.product_id, p.factory_id, p.year, p.plastic_cost, p.molding_cost,
  COALESCE(p.interest_cost, m.interest_cost) AS interest_cost,
  COALESCE(p.tax_cost, m.tax_cost) AS tax_cost
FROM costs AS p
JOIN costs AS m
  ON m.product_id = 'M'            -- the "Master" row
  AND m.year = p.year
  AND m.factory_id = p.factory_id  -- master values vary per factory and year
WHERE p.product_id <> 'M';         -- optionally hide the master rows themselves
There's no way to use a default or a generated column to retrieve data from a different row. Those expressions must only reference values within the same row.
P.S.: Regarding the terminology of "master" values: I'm more accustomed to the terms "direct costs" and "indirect costs." Direct costs are those that are easily attributed to the per-unit cost of a product; indirect costs are like your master costs - they're attributed to the business as a whole, and they usually don't scale per unit produced.

How to store recent usage frequency in MySQL

I'm working on the Product Catalog module of an Invoicing application.
When the user creates a new invoice the product name field should be an autocomplete field which shows the most recently used products from the product catalog.
How can I store this "usage recency/frequency" in the database?
I'm thinking about adding a new field recency, which would be increased by 1 every time the product is used and decreased by 1/(count of all products) when another product is used, and then using this recency field for ordering - but it doesn't seem like the best solution to me.
Can you help me with the best practice for this kind of problem?
Solution for the recency calculation:
Create a new column in the products table, named last_used_on for example. Its data type should be TIMESTAMP (the MySQL representation of Unix time).
Advantages:
Timestamps contain both date and time parts.
They make very precise calculations and comparisons of dates and times possible.
They let you format the saved values in the date-time format of your choice.
You can convert from any date-time format into a timestamp.
In regard to your autocomplete fields, they allow you to filter the products list as you wish. For example, to display all products used since [date-time], to fetch all products used between [date-time-1] and [date-time-2], or to get the products used only on Mondays, at 1:37:12 PM, in the last two years, two months and three days (that's how flexible timestamps are).
Resources:
Unix-Time
The DATE, DATETIME, and TIMESTAMP Types
How should unix timestamps be stored in int columns?
How to convert human date to unix timestamp in Mysql?
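A minimal MySQL sketch of the last_used_on approach described above; table and column names are illustrative:
ALTER TABLE products ADD COLUMN last_used_on TIMESTAMP NULL;

-- Whenever a product is put on an invoice:
UPDATE products SET last_used_on = NOW() WHERE id = 42;

-- Autocomplete suggestions, most recently used first (? is a bound parameter for the typed prefix):
SELECT id, name
FROM products
WHERE name LIKE CONCAT(?, '%')
ORDER BY last_used_on DESC
LIMIT 10;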
Solution for the usage rate calculation:
Well, actually, you are not speaking about a frequency calculation, but about a rate - even though one can argue that frequency is a rate, too.
Frequency implies using the time as the reference unit and it's measured in Hertz (Hz = [1/second]). For example, let's say you want to query how many times a product was used in the last year.
A rate, on the other hand, is a comparison, a relation between two related units. Like for example the exchange rate USD/EUR - they are both currencies. If the comparison takes place between two terms of the same type, then the result is a number without measurement units: a percentage. Like: 50 apples / 273 apples = 0.1832 = 18.32%
That said, I suppose you tried to calculate the usage rate: the number of usages of a product in relation with the number of usages of all products. Like, for a product: usage rate of the product = 17 usages of the product / 112 total usages = 0.1517... = 15.17%. And in the autocomplete you'd want to display the products with a usage rate bigger than a given percentage (like 9% for example).
This is easy to implement. In the products table add a column usages of type int or bigint and simply increment its value each time a product is used. And then, when you want to fetch the most used products, just apply a filter like in this sql statement:
SELECT
id,
name,
(usages*100) / (SELECT sum(usages) as total_usages FROM products) as usage_rate
FROM products
GROUP BY id
HAVING usage_rate > 9
ORDER BY usage_rate DESC;
In the end, recency, frequency and rate are three different things.
Good luck.
To allow for future flexibility, I'd suggest the following additional (*) table to store the entire history of product usage by all users:
Name: product_usage
Columns:
id - internal surrogate auto-incrementing primary key
product_id (int) - foreign key to product identifier
user_id (int) - foreign key to user identifier
timestamp (datetime) - date/time the product was used
This would allow the query to be fine-tuned as necessary. E.g. you may decide to only order by past usage for the logged-in user. Or perhaps total usage within a particular timeframe would be more relevant. Such a table may also serve a dual purpose of auditing - e.g. to report on the most popular or unpopular products amongst all users.
(*) assuming something similar doesn't already exist in your database schema
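A CREATE TABLE sketch of that product_usage table; the types follow the column list above, and the foreign-key targets (products, users) are assumed:
CREATE TABLE product_usage (
  id         BIGINT AUTO_INCREMENT PRIMARY KEY,
  product_id INT NOT NULL,
  user_id    INT NOT NULL,
  used_at    DATETIME NOT NULL,   -- the "timestamp" column from the list above
  FOREIGN KEY (product_id) REFERENCES products (id),
  FOREIGN KEY (user_id)    REFERENCES users (id)
);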
Your problem is related to many other web-scale search applications, such as showing spell corrections, related searches, or "trending" topics. You recognized correctly that both recency and frequency are important criteria in determining "popular" suggestions. In practice, it is desirable to compromise between the two: recency alone will suffer from random fluctuations, but you also don't want to use only frequency, since some products might have been purchased a lot in the past but their popularity is declining (or they might have gone out of stock or been replaced by successor models).
A very simple but effective implementation that is typically used in these scenarios is exponential smoothing. First of all, most of the time it suffices to update popularities at fixed intervals (say, once each day). Set a decay parameter α (say, 0.95) that tells you how much yesterday's orders count compared to today's. Similarly, orders from two days ago will be worth α*α ≈ 0.9 times as much as today's, and so on. To estimate this parameter, note that the value decays to one half after log(0.5)/log(α) days (about 14 days for α = 0.95).
The implementation only requires a single additional field per product, orders_decayed. Then all you have to do is update this value each night with the total daily orders:
orders_decayed = α * orders_decayed + (1 - α) * orders_today
You can sort your applicable suggestions according to this value.
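A hedged sketch of that nightly job in plain MySQL, assuming an orders table with product_id and created_at columns and α = 0.95; adjust the names to your schema:
-- Decay yesterday's score and mix in today's order count.
UPDATE products AS p
LEFT JOIN (
    SELECT product_id, COUNT(*) AS orders_today
    FROM orders
    WHERE created_at >= CURDATE()
    GROUP BY product_id
) AS o ON o.product_id = p.id
SET p.orders_decayed = 0.95 * p.orders_decayed
                     + 0.05 * COALESCE(o.orders_today, 0);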
To have an individual user experience, you should not rely on a field in the product table, but rather on the history of the user.
The occurrences of the product in past invoices created by the user would be a good starting point. The advantage is that you don't need to add fields or tables for this functionality. You simply rely on data that is already present anyway.
Since it is an auto-complete field, maybe past usage is not really relevant. Display n search results as the user types. If you feel the results are better when you include recency in the ordering, go with it.
Now, the implementation may differ depending on how and when the product should be displayed, and on whether it has to be user-specific usage frequency or application-wide (overall). But in both cases, I would suggest having a history table, which you can later use for other analysis as well.
You could design your history table with at least the columns below:
Id | ProductId | LastUsed (timestamp) | UserId
Now you can create a view which queries this table for a specific time range (something like product frequency over the last week, last month or last year) and gives you the highest-selling products for that range.
The same view can be used for user-specific frequency by adding a condition to filter by UserId.
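A rough sketch of the history table and such a view; the names are illustrative and the 30-day window is just an example:
CREATE TABLE product_usage_history (
  id         BIGINT AUTO_INCREMENT PRIMARY KEY,
  product_id INT NOT NULL,
  user_id    INT NOT NULL,
  last_used  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE VIEW product_frequency_30d AS
SELECT product_id, COUNT(*) AS usages
FROM product_usage_history
WHERE last_used >= NOW() - INTERVAL 30 DAY
GROUP BY product_id;

-- Highest-selling products in the last 30 days:
SELECT * FROM product_frequency_30d ORDER BY usages DESC LIMIT 10;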
I'm thinking about adding a new field recency, which would be increased by 1 every time the product is used and decreased by 1/(count of all products) when another product is used, and then using this recency field for ordering - but it doesn't seem like the best solution to me.
Yes, it is not good practice to add a column for this and update it every time. Imagine this product is highly anticipated and people love to buy it. Now 1000 people, maybe more, request it at the same time, and for every request you are going to update the same row. To maintain concurrency the database has to lock that specific row and update it for each request, which is definitely going to hurt your database and application performance; instead you can simply insert a new row.
The other possible solution is to use your existing invoice table, since it will already have all the product- and user-specific information, and create a view to get the frequently used products as I mentioned above.
Please note that this is another option to achieve what you are expecting, but I would personally recommend having the history table instead.
The scenario
When the user creates a new invoice the product name field should be an autocomplete field which shows the most recently used products from the product catalogue.
your suggested solution
How can I store this "usage recency/frequency" in the database?
If it is a web application, don't store it in a database on your server. Each user has different choices.
Store it in the user's browser as a cookie or in localStorage, because it will improve the user experience.
If you still want to store it in a MySQL table, do the following:
Create a column recency, as said in the question.
Each time the item is used, increase the count by 1, as said in the question.
Don't decrease it when other items get used.
To get the most-used item, query:
SELECT * FROM products WHERE recency = (SELECT MAX(recency) FROM products);
Side note
Use the database only if you want to show the most-used products independently of the user.
As you aren't certain which measure to choose, and it's rather a user-experience-related problem, I'd advise supporting a number of measures and giving the user the option to choose the one he/she prefers. For example, the set of available measures could include the most popular products of the last week, last month, last 3 months, last year, and overall. For the sake of performance I'd prefer to store those statistics in a separate table which is refreshed by a scheduled job running, say, every 3 hours.

Disregard changes to a product description when retrieving order records

The title is somewhat hard to understand, so here is the explanation:
I am building a system that deals with retail transactions, meaning purchases. I have a database with products, where each product has an ID that is also known to the POS system. When a customer makes a purchase, the data is sent to the back-end for parsing and is saved. Now everything is fine and dandy, until there are changes to a product's name, since my client wants to see the name of the product as it was when it was purchased.
How do I save this data, while also keeping a nice, normal-formed database?
Solutions I could think of are:
De-normalization, where we correlate the incoming data with the info we have in the database and then save only the final text values, not IDs.
Versioning, where we keep multiple versions of every product and save each transaction with the ID of the product version that was current when it came in. The problem with this one is that as our retail store chain grows, and more and more changes happen to the products, the complexity of the whole system will greatly increase.
Any thoughts on this?
This is called a slowly changing dimension.
Either solution that you mention works. My preference is the second, versioning. I would have a product table that has an effdate and enddate on the record. You can easily find the current record (where enddate is null) or the record at any point in time.
The first method always strikes me as more "quick-and-dirty", but it also works. It just gets cumbersome when you have more fields and more objects you are trying to track. It does, in general though, win on performance.
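A rough sketch of the versioned product table described in this answer; effdate/enddate follow the wording above, while the remaining names and the example values are illustrative:
CREATE TABLE product_version (
  product_id INT NOT NULL,
  effdate    DATETIME NOT NULL,
  enddate    DATETIME NULL,           -- NULL = current version
  name       VARCHAR(255) NOT NULL,
  price      DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (product_id, effdate)
);

-- Current version of a product:
SELECT * FROM product_version WHERE product_id = 42 AND enddate IS NULL;

-- Version that was in effect when a given transaction happened:
SELECT *
FROM product_version
WHERE product_id = 42
  AND effdate <= '2024-03-11 10:00:00'
  AND (enddate > '2024-03-11 10:00:00' OR enddate IS NULL);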
If the name has to be the name as it was originally, the easiest, simplest and most reliable way to do that is to save the name of the product in the invoice line item record.
You should still link to the product with a ProductID, of course.
If you want to keep a history of name changes, you can do that in a separate table if you wish:
ProductNameID
ProductID
Date
Description
And store a ProductNameID with the invoice line item.
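A short sketch of this approach: the name is copied onto the line item, and the optional name-history table mirrors the columns listed above (types are assumed):
CREATE TABLE invoice_line_item (
  line_item_id BIGINT AUTO_INCREMENT PRIMARY KEY,
  invoice_id   BIGINT NOT NULL,
  product_id   INT NOT NULL,
  product_name VARCHAR(255) NOT NULL,   -- name as it was at purchase time
  quantity     INT NOT NULL,
  unit_price   DECIMAL(10,2) NOT NULL
);

CREATE TABLE product_name_history (
  product_name_id INT AUTO_INCREMENT PRIMARY KEY,
  product_id      INT NOT NULL,
  changed_on      DATETIME NOT NULL,    -- the "Date" column from the list above
  description     VARCHAR(255) NOT NULL
);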

Database Historization

We have a requirement in our application where we need to store references for later access.
Example: a user commits an invoice at a point in time, and all references that the invoice contains (customer address, calculated amount of money, product descriptions) and its calculations should be preserved over time.
We need to hold the references somehow, but what if, e.g., the product name changes? So somehow we need to copy everything so it is documented for later and not affected by future changes. Even when products are deleted, they need to be reviewable later as long as the invoice is stored.
What is the best practice here regarding database design? And what is the most flexible approach, e.g. when the user wants to edit his invoice later and restore it from the DB?
Thank you!
Here is one way to do it:
Essentially, we never modify or delete the existing data. We "modify" it by creating a new version. We "delete" it by setting the DELETED flag.
For example:
If a product's price changes, we insert a new row into PRODUCT_VERSION; old orders stay connected to the old PRODUCT_VERSION and therefore the old price.
When a buyer changes their address, we simply insert a new row into CUSTOMER_VERSION and link new orders to it, while keeping the old orders linked to the old version.
If a product is deleted, we don't really delete it - we simply set the PRODUCT.DELETED flag, so all the orders historically made for that product stay in the database.
If a customer is deleted (e.g. because (s)he requested to be unregistered), we set the CUSTOMER.DELETED flag.
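A minimal MySQL sketch of such a versioned schema; the table and column names are inferred from the description above and the identifying (composite-key) style mentioned below:
CREATE TABLE product (
  product_id INT AUTO_INCREMENT PRIMARY KEY,
  deleted    BOOLEAN NOT NULL DEFAULT FALSE
);

CREATE TABLE product_version (
  product_id INT NOT NULL,
  version_no INT NOT NULL,
  name       VARCHAR(255) NOT NULL,
  price      DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (product_id, version_no),
  FOREIGN KEY (product_id) REFERENCES product (product_id)
);

-- Each order line points at the exact product version it was sold under.
CREATE TABLE order_line (
  order_id   INT NOT NULL,
  product_id INT NOT NULL,
  version_no INT NOT NULL,
  quantity   INT NOT NULL,
  PRIMARY KEY (order_id, product_id),
  FOREIGN KEY (product_id, version_no) REFERENCES product_version (product_id, version_no)
);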
Caveats:
If the product name needs to be unique, that can't be enforced declaratively in the model above. You'll either need to "promote" the NAME from PRODUCT_VERSION to PRODUCT, make it a key there and give up the ability to "evolve" a product's name, or enforce uniqueness on only the latest PRODUCT_VERSION (probably through triggers).
There is a potential problem with the customer's privacy. If a customer is deleted from the system, it may be desirable to physically remove their data from the database, and just setting CUSTOMER.DELETED won't do that. If that's a concern, either blank out the privacy-sensitive data in all the customer's versions, or alternatively disconnect existing orders from the real customer and reconnect them to a special "anonymous" customer, then physically delete all the customer versions.
This model uses a lot of identifying relationships. This leads to "fat" foreign keys and could be a bit of a storage problem, since MySQL doesn't support leading-edge index compression (unlike, say, Oracle); but on the other hand InnoDB always clusters the data on the PK, and this clustering can be beneficial for performance. Also, fewer JOINs are needed.
An equivalent model can also be built with non-identifying relationships and surrogate keys.
You could add a column in the product table indicating whether or not it is being sold. Then when the product is "deleted" you just set the flag so that it is no longer available as a new product, but you retain the data for future lookups.
To deal with name changes, you should be using ID's to refer to products rather than using the name directly.
You've opened up an eternal debate between the purist and practical approach.
From a normalization standpoint of your database, you "should" keep all the relevant data. In other words, say a product name changes: save the date of the change so that you can go back in time and rebuild your invoice with that product name, and all other data, as it existed that day.
A "de"normalized approach is to view that invoice as a "moment in time", recording in the relevant tables the data as it actually was that day. This approach lets you pull up that invoice without any dependencies at all, but you could never recreate that invoice from scratch.
The problem you're facing is, as I'm sure you know, a result of Database Normalization. One of the approaches to resolving it can be taken from Business Intelligence techniques - archiving the data in a de-normalized state in a Data Warehouse.
Normalized data:
Orders table: OrderId, CustomerId
Customers table: CustomerId, Firstname, etc.
Items table: ItemId, ItemName, ItemPrice
OrderDetails table: ItemDetailId, OrderId, ItemId, ItemQty, etc.
When queried and stored de-normalized, the data warehouse table looks like:
OrderId, CustomerId, CustomerName, CustomerAddress, (other Customer fields), ItemDetailId, ItemId, ItemName, ItemPrice, (other OrderDetail and Item fields)
Typically, there is either some sort of scheduled job that pulls data from the normalized tables into the data warehouse on a scheduled basis, OR, if your design allows, it could be done when an order reaches a certain status (such as shipped). It could also be that the records are stored at each change of status (with a field called OrderStatus tracking the current status), so the fully de-normalized data is available for each step of the order/fulfillment process. When and how to archive the data into the warehouse will vary based on your needs.
There is a lot of overhead involved in the above, but the other common approach I'm aware of carries even MORE overhead.
The other approach would be to make the tables read-only. If a customer wants to change their address, you don't edit their existing address, you insert a new record.
So if my address is AddressId 12 when I first order on your site in January, then I move on July 4, I get a new AddressId tied to my account. (Say AddressId 123123, because your site is very successful and has attracted a ton of customers.)
Orders I placed before July 4 would have AddressId 12 associated with them, and orders placed on or after July 4 have AddressId 123123.
Repeat that pattern with every table that needs to retain historical data.
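A rough sketch of this append-only pattern for addresses; table and column names are illustrative:
CREATE TABLE customer_address (
  address_id  INT AUTO_INCREMENT PRIMARY KEY,
  customer_id INT NOT NULL,
  street      VARCHAR(255) NOT NULL,
  city        VARCHAR(100) NOT NULL,
  created_at  DATETIME NOT NULL
);

-- Orders keep the address_id that was current when they were placed;
-- a move inserts a new customer_address row instead of updating the old one.
CREATE TABLE orders (
  order_id    INT AUTO_INCREMENT PRIMARY KEY,
  customer_id INT NOT NULL,
  address_id  INT NOT NULL,
  FOREIGN KEY (address_id) REFERENCES customer_address (address_id)
);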
I do have a third approach, but searching it is difficult. I use this in one app only, and it actually works out pretty well in this single instance, which had some pretty specific business needs for reconstructing the data exactly as it was at a specific point in time. I wouldn't use it unless I had similar business needs.
At a specific status, serialize the data into an XML document, or some other document format you can use to reconstruct the data. This allows you to save the data as it was at the time it was serialized, retaining the original table structure and relations.
When you have time-sensitive data, you use things like the product and Customer tables as lookup tables and store the information directly in your Orders/orderdetails tables.
So the order table might contain the customer name and address, and the details would contain all relevant information about the product, especially the price (you never want to rely on the product table for price information beyond the initial lookup at the time of the order).
This is NOT denormalizing; the data changes over time, but you need the historical value, so you must store it at the time the record is created or you will lose data integrity. You don't want your financial reports to suddenly indicate you sold 30% more last year because you have price updates. That's not what you sold.

What is the best way to store a historical price list in a MySQL table?

Basically, my question is this - I have a list of prices, some of which are historical (i.e. I want to be able to search that product X was $0.99 on March 11, $1.99 on April 1, etc...). What is the best way to store this information?
I assumed I would probably have a Product table that has a foreign key to a price table. I initially thought that storing the current price would probably be the best bet, but I think I want to be able to store historical price data, so would the better route be to store the price list in a table like the following:
CREATE TABLE prices (
id BIGINT auto_increment not null,
primary key (id),
price DECIMAL(4,2) not null,
effectiveStartDate DATETIME NOT NULL,
effectiveEndDate DATETIME
);
I'm at a bit of a loss here. I'd like to be able to search products efficiently and see how the price of that product changed over time. How can I efficiently associate a set of these prices with a product? I guess what I am asking is, 'What would be the best way to index this in order to be able to provide an efficient search for queries that span a specific set of dates?'
Separate the need for historical data from the need for current price. This means:
1) Keep the current price in the products table.
2) When the price changes, insert the new price into the history table with only the start date. You don't really need the end date because you can derive it from the start date of the following price row. (You can still put it in; it makes querying easier.)
Also remember that your order history provides another kind of history, the actual purchases at a given price over time.
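A hedged sketch of this split, assuming a products table with a BIGINT id; the names and types are illustrative. The composite primary key on (product_id, start_date) also gives you an index suited to the date-range lookups asked about above:
ALTER TABLE products ADD COLUMN current_price DECIMAL(10,2) NOT NULL DEFAULT 0;

CREATE TABLE price_history (
  product_id BIGINT NOT NULL,
  price      DECIMAL(10,2) NOT NULL,
  start_date DATETIME NOT NULL,
  PRIMARY KEY (product_id, start_date),
  FOREIGN KEY (product_id) REFERENCES products (id)
);

-- On a price change, do both in one transaction:
UPDATE products SET current_price = 1.99 WHERE id = 42;
INSERT INTO price_history (product_id, price, start_date) VALUES (42, 1.99, NOW());

-- How the price of product 42 changed over time:
SELECT price, start_date
FROM price_history
WHERE product_id = 42
ORDER BY start_date;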
First, make sure that you really need to do this. Are you storing orders in the same database? If so, you can always view historical price trends by examining the price of the item in orders over time. This will also allow you to make correlations between price changes and changes in ordering patterns; the only case it wouldn't address is if a price change resulted in no orders being placed.
That being said, if you want an independent record of price changes, what you've presented is good. The only thing I would recommend is eliminating the end date; unless you plan on having a gap in time where the product has no price or overlapping prices, start date is sufficient and will make your logic easier.
The end date may be viable for a more complex system where you can plan product prices (e.g. various seasonal promotions) ahead of time. (Oh, this is BS, I should have thought more about it... OK, you need the end date only if you plan multiple prices for a product at the same time, differentiated by something else. Still, it's often convenient to have it inside the current record rather than looking at the previous/next one.)
Actually, in most complex systems it is not uncommon to have several current prices differentiated only by "dimensions" (i.e. some kind of attribute which may then be decided by the actual shipping location or the customer's country, etc.).
I would also think twice about your platform/language/framework/style of work before you omit the surrogate "id" primary key in favor of a [product_id, starting_date, ...] composite PK. The latter is the somewhat more logical choice (at least I personally prefer it), but it may backfire sometimes, for example if your DB library has only a limited ability to work with composite primary keys.