I have a process in which I need to keep the history of a database record's information, but the user needs to be able to change it at any time they please.
Scenario:
Seller creates an item with price of $5 and name of "foo"
Buyer buys item, an order is created linking to that item id
A while later, seller updates item name to "foobar" and item price to $6
Buyer views order history. The item name should be "foo" and price should be $5 since that's what they bought it at, but they are "foobar" and $6, respectively
This happens because when the seller updates the item, they are updating the same item the order is related to.
I thought of 3 possible solutions to this problem, and I would like to get your thoughts on which one you think is best (maybe from your prior experience), or a better solution I have not yet thought of. This is my first time dealing with this situation, so not sure how best to proceed without needing a refactor later.
My solutions:
Make the item name and price immutable.
Bad UX, because now the user has to delete the item and recreate it if they want to make a modification
Requires some kind of deleted_at column in case user wants to delete the item after it has been purchased so that I can still keep it for referencing later to grab history data
Create a second table for history purposes
Not horrible, but requires a second table with a different name, not a big fan of the idea
Would have to run queries potentially twice to check both tables for similar data, as opposed to just querying one table
Create two records in the same table, and mark a boolean flag or some other flag to differentiate from historical/current records
I like this one the best, but not sure if the boolean flag may have any negative performance implications
I've encountered this issue too, particularly in product catalogs where the price changes frequently. Or the price may be on sale or discounted for a specific customer for some reason.
The only solution I've found is to copy the relevant product details to the customer's order record at the time they buy the product. In your example, at least the product name and the product price would be copied.
This might seem like it goes against the philosophy of "don't store redundant data" but it's not redundant—it's a fact that the customer bought the product for some specific price on a specific date, and that is still a useful fact forever, even if the current price for that product changes.
There should still be a link to the original product table, so managers can track how many orders included each product, for example. But the current price in the product table does not affect the record of each customer's order.
You might also need to create a product history table, to keep a record of all the times the price or name was changed. But that's for historical record-keeping only, it wouldn't affect typical queries during shopping or buying activities.
In this design:
Product table always stores the current price.
When a customer buys a product, they copy the current price into their own order record.
When a manager changes a price, the app creates a new record in the ProductHistory table.
The most recent record for each product in the ProductHistory table matches the current price for the same product.
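The design above could be sketched roughly as follows. This is only a minimal illustration using Python's built-in sqlite3 module; the table names, the column names (such as name_at_purchase) and the choice to store prices as integer cents are my own assumptions, not something the design requires.

```python
import sqlite3

# Hypothetical schema; every table and column name here is an assumption.
SCHEMA = """
CREATE TABLE product (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    price_cents INTEGER NOT NULL          -- always the current price
);
CREATE TABLE product_history (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id),
    name TEXT NOT NULL,
    price_cents INTEGER NOT NULL,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE order_item (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id),  -- link kept for reporting
    name_at_purchase TEXT NOT NULL,
    price_cents_at_purchase INTEGER NOT NULL
);
"""


def save_product(conn, name, price_cents, product_id=None):
    """Create or update a product and log the new state in product_history."""
    if product_id is None:
        product_id = conn.execute(
            "INSERT INTO product (name, price_cents) VALUES (?, ?)",
            (name, price_cents),
        ).lastrowid
    else:
        conn.execute(
            "UPDATE product SET name = ?, price_cents = ? WHERE id = ?",
            (name, price_cents, product_id),
        )
    conn.execute(
        "INSERT INTO product_history (product_id, name, price_cents) VALUES (?, ?, ?)",
        (product_id, name, price_cents),
    )
    return product_id


def buy(conn, product_id):
    """Copy the current name and price into the order row at purchase time."""
    name, price_cents = conn.execute(
        "SELECT name, price_cents FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    return conn.execute(
        "INSERT INTO order_item (product_id, name_at_purchase, price_cents_at_purchase) "
        "VALUES (?, ?, ?)",
        (product_id, name, price_cents),
    ).lastrowid
```

With this layout, a later save_product call changes product and appends to product_history, while rows already in order_item keep the name and price that were copied in by buy.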
I am building an order management system for an online store and would like to store information about the Product being ordered.
If I use a Foreign Key relationship to the Product, when someone changes the price, brand, supplier etc. of the Product or deletes it, the Order will be affected as well. I want the order management system to be able to display the state of the Product when it was ordered even if it is altered or deleted from the database afterwards.
I have thought about it long and hard and have come up with ideas such as storing a JSON string representation of the object; creating a duplicate Product whose foreign key I then use for the Order etc. However, I was wondering if there is a best practice or what other people use to handle this kind of situation in commercial software?
PS: I also have other slightly more complex situations, for instance, I would like the data for a User object attached to the Order to change as the User changes but then never get deleted when the User is deleted. An answer to the above question would definitely give me a good starting point.
This price-change problem is commonly handled in RDBMS (SQL) commerce applications by doing two things.
inserting rows into an order_detail table when an order is placed. Each row of that table contains the particulars of the item as sold: item_id, item_count, unit_price, total_price, unit_weight, total_weight, tax_status, and so forth. So, the app captures what actually was sold, and at what price. A later price change doesn't mess up sales records. You really have to do this.
a price table containing item_id, price, start_date, end_date. You retrieve the current price something like this:
SELECT item.item_id, price.price
FROM item
JOIN price ON item.item_id = price.item_id
AND price.start_date <= NOW()
AND (price.end_date > NOW() OR price.end_date IS NULL)
This approach allows you to keep track of historical prices, and also to set up future price changes. But you still copy the price into the order_detail table.
The point is: once you've accepted an order, its details cannot change in the future. You copy the actual customer data (name, shipping address, etc) into a separate order table from your current customer table when you accept the order, and (as mentioned above) the details of each item into an order_detail table.
Your auditors will hate you if you don't do this. Ask me how I know that sometime.
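To make the date-ranged price table from this answer concrete, here is a small sketch using Python's built-in sqlite3 module. The helper name price_at, the integer-cents prices and the ISO 8601 text dates are assumptions of mine; the lookup takes the timestamp as a parameter instead of calling NOW(), so past and scheduled future prices can be queried the same way.

```python
import sqlite3

# Hypothetical schema modelled on the price table described in the answer.
SCHEMA = """
CREATE TABLE item (item_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE price (
    item_id INTEGER NOT NULL REFERENCES item(item_id),
    price_cents INTEGER NOT NULL,
    start_date TEXT NOT NULL,   -- ISO 8601 text sorts chronologically
    end_date TEXT               -- NULL means the range is open-ended
);
"""


def price_at(conn, item_id, when):
    """Return the price in effect at `when` (ISO date string), or None."""
    row = conn.execute(
        """
        SELECT price_cents
        FROM price
        WHERE item_id = ?
          AND start_date <= ?
          AND (end_date > ? OR end_date IS NULL)
        """,
        (item_id, when, when),
    ).fetchone()
    return row[0] if row else None
```

A future price change is then just a row whose start_date lies ahead of today, with the previous row's end_date set to the same value. The price returned here would still be copied into order_detail when the order is accepted.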
I would recommend adding attributes to the Order model and copying the data you need into them, one by one, when you save the model. Then implement a historical data table where you store JSONFields (or some other serialized version) of the Product etc. whenever it is created or updated; that way people can refer to the historical data table if need be. This is more efficient than storing the full-fledged representation of the Product in the Order object, because the time taken to create the historical data is essentially charged to the admin creating the Product rather than to the customer creating the Order. You can even create the historical data objects in the background, using threads etc., when you get to those advanced levels.
While it is hard answering your question without seeing your models.py at least, I will suggest archiving the results. You can add a boolean field called historical which defaults to False. When an order is made you need to set the previous order's (or orders') historical value to True in your view set or function.
Here, historical=True means the record is being archived. You can filter on this historical column to display what you want when. Sorry this is just a high-level outline.
I have three tables: item, sold, and invoice. The item table tracks individual item prices and descriptions, the sold table tracks the items belonging to a particular invoice, and the invoice table tracks the date and other information. Invoice has a one-to-many relationship with sold and sold has a one-to-many relationship with items.
Here are the relevant columns in my tables: invoice(invoiceID, total) sold(soldID, invoiceID, itemID) item(itemID, description, price)
I currently have no way to track the total price for the invoice without manually summing the items. The total column in invoice must be manually inserted.
I'm looking to create a trigger that finds all the rows in sold that have a matching foreign key for invoiceID, then adds the prices of the relevant items and outputs that to invoice.
If this is not possible, I could also add a new price column to sold, then use multiple triggers to eventually work my way up to invoice.
I'm not in a position where I can make any significant changes to the structure of the database, so if possible it'd be best to keep the fields and relationships as is.
If anyone has any input on how to create this trigger it'll be greatly appreciated.
#stickybit is exactly right. You don't want redundant data. You should not put any sort of derived total in an invoice record. Instead you should get it with a query or a view.
It's sometimes hard for people new to using SQL to believe, but queries and views are typically just as quick for retrieving aggregates like totals.
Views and tables look precisely the same from the perspective of application software. So make a view of your invoices that shows the totals. Something like this.
CREATE OR REPLACE VIEW invoice_with_totals AS
SELECT sold.invoiceID, SUM(item.price) AS total
FROM sold
JOIN item ON sold.itemID = item.itemID
GROUP BY sold.invoiceID
Believe it or not, doing things this way with views will save you (or the people who will use your application) all kinds of troubleshooting craziness in the future. If you do things this way, there's simply no possibility of your totals disagreeing with the details in your invoices. If you use a trigger, that's not guaranteed. And, you know, Murphy's law, big customer, incorrect invoice, you get the picture ....
I have hard-won experience backing up my suggestion. I suspect #stickybit does too. Please consider it.
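For anyone who wants to try the view approach without touching a real database, here is a rough sketch using Python's built-in sqlite3 module. It is an illustration under assumptions: SQLite has no CREATE OR REPLACE VIEW, so plain CREATE VIEW is used, prices are integer cents, and the helpers add_sold and invoice_total are names I made up.

```python
import sqlite3

# Tables follow the question's columns; the stored invoice.total is left out
# on purpose because the view derives it.
SCHEMA = """
CREATE TABLE invoice (invoiceID INTEGER PRIMARY KEY);
CREATE TABLE item (itemID INTEGER PRIMARY KEY, description TEXT, price INTEGER);
CREATE TABLE sold (
    soldID INTEGER PRIMARY KEY,
    invoiceID INTEGER REFERENCES invoice(invoiceID),
    itemID INTEGER REFERENCES item(itemID)
);
CREATE VIEW invoice_with_totals AS
SELECT sold.invoiceID AS invoiceID, SUM(item.price) AS total
FROM sold
JOIN item ON sold.itemID = item.itemID
GROUP BY sold.invoiceID;
"""


def add_sold(conn, invoice_id, item_id):
    """Attach one item to an invoice; no trigger or manual total is needed."""
    conn.execute(
        "INSERT INTO sold (invoiceID, itemID) VALUES (?, ?)", (invoice_id, item_id)
    )


def invoice_total(conn, invoice_id):
    """Read the derived total; invoices without sold rows count as 0."""
    row = conn.execute(
        "SELECT total FROM invoice_with_totals WHERE invoiceID = ?", (invoice_id,)
    ).fetchone()
    return row[0] if row else 0
```

Each call to add_sold is reflected the next time invoice_total is read, because the total is computed from the detail rows at query time.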
The title is somewhat hard to understand, so here is the explanation:
I am building a system that deals with retail transactions, meaning purchases. I have a database with products, where each product has an ID that is also known to the POS system. When a customer makes a purchase, the data is sent to the back-end for parsing and is saved. Now everything is fine and dandy until there are changes to a product's name, since my client wants to see the name of the product as it was when purchased.
How do I save this data, while also keeping a nice, normal-formed database?
Solutions I could think of are:
De-normalization, where we correlate the incoming data with the info we have in the database, and then save only the final text values, not id's.
Versioning, where we keep multiple versions of every product and save each transaction with the id of the product's version at the time it came in. The problem with this one is that as our retail store chain grows, and more and more changes happen to the products, the complexity of the whole product will greatly increase.
Any thoughts on this?
This is called a slowly changing dimension.
Either solution that you mention works. My preference is the second, versioning. I would have a product table that has an effdate and enddate on the record. You can easily find the current record (where enddate is null) or the record at any point in time.
The first method always strikes me as more "quick-and-dirty", but it also works. It just gets cumbersome when you have more fields and more objects you are trying to track. It does, in general though, win on performance.
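As a rough sketch of the versioning option with effdate and enddate, here is one way the write path could look with Python's built-in sqlite3 module. The version_id surrogate key, the helper names set_name and name_as_of, and the ISO text dates are assumptions for illustration only.

```python
import sqlite3

# Hypothetical versioned product table; one row per version of a product.
SCHEMA = """
CREATE TABLE product (
    version_id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    effdate TEXT NOT NULL,
    enddate TEXT              -- NULL marks the current version
);
"""


def set_name(conn, product_id, name, on_date):
    """Close the current version (if any) and open a new one, atomically."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE product SET enddate = ? WHERE product_id = ? AND enddate IS NULL",
            (on_date, product_id),
        )
        conn.execute(
            "INSERT INTO product (product_id, name, effdate) VALUES (?, ?, ?)",
            (product_id, name, on_date),
        )


def name_as_of(conn, product_id, on_date):
    """Return the product name that was in effect on `on_date`, or None."""
    row = conn.execute(
        """
        SELECT name FROM product
        WHERE product_id = ?
          AND effdate <= ?
          AND (enddate IS NULL OR enddate > ?)
        """,
        (product_id, on_date, on_date),
    ).fetchone()
    return row[0] if row else None
```

A transaction can then store either the version_id or just its own date, and the name at that time can be recovered from the matching version row.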
If the name has to be the name as it was originally, the easiest, simplest and most reliable way to do that is to save the name of the product in the invoice line item record.
You should still link to the product with a ProductID, of course.
If you want to keep a history of name changes, you can do that in a separate table if you wish:
ProductNameID
ProductID
Date
Description
And store a ProductNameID with the invoice line item.
We have a requirement in our application where we need to store references for later access.
Example: A user can commit an invoice at some point, and all the references this invoice contains (customer address, calculated amount of money, product descriptions), along with its calculations, should be stored over time.
We need to hold the references somehow, but what if, e.g., the product name changes? So somehow we need to copy everything so it's documented for later and not affected by future changes. Even when products are deleted, they need to be reviewable later once the invoice is stored.
What is the best practice here regarding database design? And what is the most flexible approach, e.g. when the user wants to edit their invoice later and restore it from the db?
Thank you!
Here is one way to do it:
Essentially, we never modify or delete the existing data. We "modify" it by creating a new version. We "delete" it by setting the DELETED flag.
For example:
If product changes the price, we insert a new row into PRODUCT_VERSION while old orders are kept connected to the old PRODUCT_VERSION and the old price.
When buyer changes the address, we simply insert a new row in CUSTOMER_VERSION and link new orders to that, while keeping the old orders linked to the old version.
If product is deleted, we don't really delete it - we simply set the PRODUCT.DELETED flag, so all the orders historically made for that product stay in the database.
If customer is deleted (e.g. because (s)he requested to be unregistered), set the CUSTOMER.DELETED flag.
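The behaviour described in these examples might look roughly like this for the product side, using Python's built-in sqlite3 module. The diagram's exact columns are not reproduced here: VERSION_NO, PRICE_CENTS, the ORDER_ITEM table and all helper names are assumptions, chosen only to show an identifying (composite) key with insert-only versions and a DELETED flag.

```python
import sqlite3

# Hypothetical tables; only PRODUCT, PRODUCT_VERSION and DELETED come from the text.
SCHEMA = """
CREATE TABLE PRODUCT (
    PRODUCT_ID INTEGER PRIMARY KEY,
    DELETED INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE PRODUCT_VERSION (
    PRODUCT_ID INTEGER NOT NULL REFERENCES PRODUCT(PRODUCT_ID),
    VERSION_NO INTEGER NOT NULL,
    NAME TEXT NOT NULL,
    PRICE_CENTS INTEGER NOT NULL,
    PRIMARY KEY (PRODUCT_ID, VERSION_NO)
);
CREATE TABLE ORDER_ITEM (
    ORDER_ID INTEGER NOT NULL,
    PRODUCT_ID INTEGER NOT NULL,
    VERSION_NO INTEGER NOT NULL,
    PRIMARY KEY (ORDER_ID, PRODUCT_ID),
    FOREIGN KEY (PRODUCT_ID, VERSION_NO)
        REFERENCES PRODUCT_VERSION(PRODUCT_ID, VERSION_NO)
);
"""


def add_version(conn, product_id, name, price_cents):
    """'Modify' a product by inserting its next version; returns the number."""
    (next_no,) = conn.execute(
        "SELECT COALESCE(MAX(VERSION_NO), 0) + 1 FROM PRODUCT_VERSION WHERE PRODUCT_ID = ?",
        (product_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO PRODUCT_VERSION VALUES (?, ?, ?, ?)",
        (product_id, next_no, name, price_cents),
    )
    return next_no


def order_latest(conn, order_id, product_id):
    """Link a new order line to the latest version of a non-deleted product."""
    (latest,) = conn.execute(
        """
        SELECT MAX(v.VERSION_NO)
        FROM PRODUCT_VERSION v
        JOIN PRODUCT p ON p.PRODUCT_ID = v.PRODUCT_ID
        WHERE p.PRODUCT_ID = ? AND p.DELETED = 0
        """,
        (product_id,),
    ).fetchone()
    if latest is None:
        raise ValueError("product is deleted or has no versions")
    conn.execute("INSERT INTO ORDER_ITEM VALUES (?, ?, ?)", (order_id, product_id, latest))


def delete_product(conn, product_id):
    """'Delete' a product by flagging it; versions and old orders stay intact."""
    conn.execute("UPDATE PRODUCT SET DELETED = 1 WHERE PRODUCT_ID = ?", (product_id,))


def ordered_as(conn, order_id, product_id):
    """Return (NAME, PRICE_CENTS) of the version the order line points at."""
    return conn.execute(
        """
        SELECT v.NAME, v.PRICE_CENTS
        FROM ORDER_ITEM o
        JOIN PRODUCT_VERSION v
          ON v.PRODUCT_ID = o.PRODUCT_ID AND v.VERSION_NO = o.VERSION_NO
        WHERE o.ORDER_ID = ? AND o.PRODUCT_ID = ?
        """,
        (order_id, product_id),
    ).fetchone()
```

The customer side (CUSTOMER_VERSION and CUSTOMER.DELETED) would follow the same pattern.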
Caveats:
If the product name needs to be unique, that can't be enforced declaratively in the model above. You'll either need to "promote" the NAME from PRODUCT_VERSION to PRODUCT, make it a key there and give up the ability to "evolve" the product's name, or enforce uniqueness on only the latest PRODUCT_VERSION (probably through triggers).
There is a potential problem with the customer's privacy. If a customer is deleted from the system, it may be desirable to physically remove its data from the database and just setting CUSTOMER.DELETED won't do that. If that's a concern, either blank-out the privacy-sensitive data in all the customer's versions, or alternatively disconnect existing orders from the real customer and reconnect them to a special "anonymous" customer, then physically delete all the customer versions.
This model uses a lot of identifying relationships. This leads to "fat" foreign keys and could be a bit of a storage problem since MySQL doesn't support leading-edge index compression (unlike, say, Oracle), but on the other hand InnoDB always clusters the data on PK and this clustering can be beneficial for performance. Also, JOINs are less necessary.
Equivalent model with non-identifying relationships and surrogate keys would look like this:
You could add a column in the product table indicating whether or not it is being sold. Then when the product is "deleted" you just set the flag so that it is no longer available as a new product, but you retain the data for future lookups.
To deal with name changes, you should be using ID's to refer to products rather than using the name directly.
You've opened up an eternal debate between the purist and practical approach.
From a normalization standpoint of your database, you "should" keep all the relevant data. In other words, say a product name changes, save the date of the change so that you could go back in time and rebuild your invoice with that product name, and all other data as it existed that day.
A "de"normalized approach is to view that invoice as a "moment in time", recording in the relevant tables data as it actually was that day. This approach lets you pull up that invoice without any dependencies at all, but you could never recreate that invoice from scratch.
The problem you're facing is, as I'm sure you know, a result of Database Normalization. One of the approaches to resolve this can be taken from Business Intelligence techniques: archiving the data in a de-normalized state in a Data Warehouse.
Normalized data:
Orders table
OrderId
CustomerId
Customers Table
CustomerId
Firstname
etc
Items table
ItemId
Itemname
ItemPrice
OrderDetails Table
ItemDetailId
OrderId
ItemId
ItemQty
etc
When queried and stored de-normalized, the data warehouse table looks like
OrderId
CustomerId
CustomerName
CustomerAddress
(other Customer Fields)
ItemDetailId
ItemId
ItemName
ItemPrice
(Other OrderDetail and Item Fields)
Typically, there is some sort of scheduled job that pulls data from the normalized tables into the Data Warehouse, or, if your design allows, it could be done when an order reaches a certain status (such as shipped). It could be that the records are stored at each change of status (with a field called OrderStatus tracking the current status), so the fully de-normalized data is available for each step of the order/fulfillment process. When and how to archive the data into the warehouse will vary based on your needs.
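As an illustration of the archiving step, here is a minimal sketch with Python's built-in sqlite3 module that flattens one order into a de-normalized table with a single INSERT ... SELECT. The OrderArchive table name, the archive_order helper and the integer-cents prices are assumptions of mine; a real warehouse load would typically be a separate scheduled job.

```python
import sqlite3

# Normalized tables follow the listing above; OrderArchive is a hypothetical
# stand-in for the data warehouse table.
SCHEMA = """
CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, Firstname TEXT);
CREATE TABLE Items (ItemId INTEGER PRIMARY KEY, Itemname TEXT, ItemPrice INTEGER);
CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, CustomerId INTEGER);
CREATE TABLE OrderDetails (
    ItemDetailId INTEGER PRIMARY KEY,
    OrderId INTEGER,
    ItemId INTEGER,
    ItemQty INTEGER
);
CREATE TABLE OrderArchive (
    OrderId INTEGER,
    CustomerId INTEGER,
    CustomerName TEXT,
    ItemDetailId INTEGER,
    ItemId INTEGER,
    ItemName TEXT,
    ItemPrice INTEGER,
    ItemQty INTEGER
);
"""


def archive_order(conn, order_id):
    """Copy the order, as it looks right now, into the flat archive table."""
    conn.execute(
        """
        INSERT INTO OrderArchive
            (OrderId, CustomerId, CustomerName, ItemDetailId,
             ItemId, ItemName, ItemPrice, ItemQty)
        SELECT o.OrderId, c.CustomerId, c.Firstname, d.ItemDetailId,
               i.ItemId, i.Itemname, i.ItemPrice, d.ItemQty
        FROM Orders o
        JOIN Customers c ON c.CustomerId = o.CustomerId
        JOIN OrderDetails d ON d.OrderId = o.OrderId
        JOIN Items i ON i.ItemId = d.ItemId
        WHERE o.OrderId = ?
        """,
        (order_id,),
    )
```

Calling archive_order when the order reaches the chosen status freezes the customer and item values in OrderArchive, so later edits to Customers or Items do not alter the archived rows.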
There is a lot of overhead involved in the above, but the other common approach I'm aware of carries even MORE overhead.
The other approach would be to make the tables read-only. If a customer wants to change their address, you don't edit their existing address, you insert a new record.
So if my address is AddressId 12 when I first order on your site in January, then I move on July 4, I get a new AddressId tied to my account. (Say AddressId 123123 because your site is very successful and has attracted a ton of customers.)
Orders I placed before July 4 would have AddressId 12 associated with them, and orders placed on or after July 4 have AddressId 123123.
Repeat that pattern with every table that needs to retain historical data.
I do have a third approach, but searching it is difficult. I use this in one app only, and it actually works out pretty well in this single instance, which had some pretty specific business needs for reconstructing the data exactly as it was at a specific point in time. I wouldn't use it unless I had similar business needs.
At a specific status, serialize the data into an XML document, or some other document you can use to reconstruct the data. This allows you to save the data as it was at the time it was serialized, retaining the original table structure and relations.
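To give an idea of the serialization approach, here is a small sketch in Python. It uses JSON from the standard library instead of XML purely for brevity, and the Order and OrderLine classes with their fields are hypothetical; the app mentioned above may look quite different.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


# Hypothetical in-memory shapes for an order; field names are assumptions.
@dataclass
class OrderLine:
    item_id: int
    name: str
    price_cents: int
    qty: int


@dataclass
class Order:
    order_id: int
    customer_name: str
    lines: List[OrderLine]


def snapshot(order: Order) -> str:
    """Serialize the whole order, nested lines included, into one document."""
    return json.dumps(asdict(order), sort_keys=True)


def restore(document: str) -> Order:
    """Rebuild the order exactly as it was when the snapshot was taken."""
    data = json.loads(document)
    lines = [OrderLine(**line) for line in data["lines"]]
    return Order(data["order_id"], data["customer_name"], lines)
```

The resulting string can be stored in a single text column next to the order. As noted above, the trade-off is that searching inside such documents is harder than querying ordinary columns.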
When you have time-sensitive data, you use things like the product and Customer tables as lookup tables and store the information directly in your Orders/orderdetails tables.
So the order table might contain the customer name and address, and the details would contain all relevant information about the product, especially the price (you never want to rely on the product table for price information beyond the initial lookup at the time of the order).
This is NOT denormalizing. The data changes over time but you need the historical value, so you must store it at the time the record is created or you will lose data integrity. You don't want your financial reports to suddenly indicate you sold 30% more last year because you have price updates. That's not what you sold.
Suppose I have these tables:
Products(product_id, name, price, size)
shopping_cart(cart_id, item_id, user_id, quantity)
order(order_id, user_id, totalprice, date)
orderHistory(user_id, item_id, date, order_id)
I am confused about how I should store shopping history. If I store item_id, some product might later be deleted; what should I display in the history then?
The price, size, or other dimensions of a product may also change over time, but I don't want those changes to show up in the history.
So how should I design the database?
For the product deletion issue, try including something like an "Active" (boolean) field in the product table. This way you don't need to physically delete products, just deactivate them. Then you can build your code so that inactive products don't show in the catalogue, but they are still available in your database to show in the order history section.
I'm guessing you're trying to create something like an "OrderLine" table with your OrderHistory table. You should only need to link this to the product and order header (Order) tables; you don't need to link it to users, as the order header table is already linked to a user. If you add some additional fields like "quantity" and "price" to the OrderLine table, then you can create a snapshot when the order is placed, and insert the price (at the time it was ordered) and the quantity ordered into your order history table. This way, if the product price is changed over time, the information in the OrderLine table remains the same and you still have the original price.
You could still build some entities to save product history (price etc.) if you wanted to hold this to show price trends, but in terms of maintaining your actual order information it's not necessary.
This approach means your shopping cart table could be used as a "work in progress" repository where you are only storing current carts, and once the order is completed the cart is emptied and the data inserted into your order header and order line tables.
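The cart-to-order step could be sketched like this with Python's built-in sqlite3 module. Treat it as one possible shape under assumptions: the table is called orders because ORDER is a reserved word, prices are integer cents, and order_line, checkout and order_history are illustrative names.

```python
import sqlite3

# Hypothetical schema based on the question's tables plus the suggested
# "active" flag and an order line table that snapshots price and quantity.
SCHEMA = """
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    price INTEGER NOT NULL,
    active INTEGER NOT NULL DEFAULT 1   -- 0 hides it from the catalogue
);
CREATE TABLE shopping_cart (
    cart_id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES products(product_id),
    user_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    totalprice INTEGER NOT NULL,
    date TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE order_line (
    order_id INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    quantity INTEGER NOT NULL,
    price INTEGER NOT NULL              -- unit price at the time of ordering
);
"""


def checkout(conn, user_id):
    """Turn the user's cart into an order header plus lines, then empty it."""
    with conn:  # one transaction: all steps succeed or none do
        order_id = conn.execute(
            "INSERT INTO orders (user_id, totalprice) VALUES (?, 0)", (user_id,)
        ).lastrowid
        conn.execute(
            """
            INSERT INTO order_line (order_id, product_id, quantity, price)
            SELECT ?, c.item_id, c.quantity, p.price
            FROM shopping_cart c
            JOIN products p ON p.product_id = c.item_id
            WHERE c.user_id = ?
            """,
            (order_id, user_id),
        )
        conn.execute(
            """
            UPDATE orders
            SET totalprice = (SELECT COALESCE(SUM(quantity * price), 0)
                              FROM order_line WHERE order_id = ?)
            WHERE order_id = ?
            """,
            (order_id, order_id),
        )
        conn.execute("DELETE FROM shopping_cart WHERE user_id = ?", (user_id,))
    return order_id


def order_history(conn, order_id):
    """List (name, quantity, price paid) per line, inactive products included."""
    return conn.execute(
        """
        SELECT p.name, l.quantity, l.price
        FROM order_line l
        JOIN products p ON p.product_id = l.product_id
        WHERE l.order_id = ?
        ORDER BY l.product_id
        """,
        (order_id,),
    ).fetchall()
```

Note that only the price and quantity are snapshotted here, as described above, so the history still shows the product's current name. If the name at purchase time matters too, it would need its own column in order_line.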
This doesn't cover everything, but hopefully gives you some ideas on approaches you could take in regards to your questions.
I face the same problem right now, and I am solving it basically by duplicating the relevant data into a secondary table, that the order history models can look at. They will never change, and never be deleted.
This way, if prices change or titles change, you'll have a snapshot in time of the order.
Another way would be to create versioned products, and store the specific version id. When the product changes, the displayed version updates to the newest product ID.
You are duplicating history either way.
In my opinion, you should have a table between order and product in which you store the order and product information. You can use the order history table for this purpose. Just store the information as it was at the time of shopping in that table. I think that is good practice.
Even if the actual values change in the product table, you don't need to change the values in the order history table. Only touch that table when the user actually buys something; otherwise leave it alone.
I also suggest you create a customer table, store the customer information there, and use the customer id in the order history table as well. It will help you classify the history by customer.