Database Historization - MySQL

We have a requirement in our application where we need to store references for later access.
Example: A user can commit an invoice at a given time, and all references (customer address, calculated amount of money, product descriptions) which this invoice contains, along with the calculations, should be stored over time.
We need to hold the references somehow, but what if, e.g., the product name changes? So somehow we need to copy everything so it's documented for later and not affected by future changes. Even when products are deleted, they need to be reviewable later when the stored invoice is looked up.
What is the best practice here regarding database design? And what is the most flexible approach, e.g. when the user wants to edit his invoice later and restore it from the db?
Thank you!

Here is one way to do it:
Essentially, we never modify or delete the existing data. We "modify" it by creating a new version. We "delete" it by setting the DELETED flag.
For example:
If a product's price changes, we insert a new row into PRODUCT_VERSION, while old orders stay connected to the old PRODUCT_VERSION and the old price.
When a buyer changes their address, we simply insert a new row into CUSTOMER_VERSION and link new orders to that, while keeping the old orders linked to the old version.
If a product is deleted, we don't really delete it - we simply set the PRODUCT.DELETED flag, so all the orders historically made for that product stay in the database.
If a customer is deleted (e.g. because (s)he requested to be unregistered), set the CUSTOMER.DELETED flag.
Caveats:
If the product name needs to be unique, that can't be enforced declaratively in the model above. You'll either need to "promote" NAME from PRODUCT_VERSION to PRODUCT, make it a key there and give up the ability to "evolve" the product's name, or enforce uniqueness on only the latest PRODUCT_VERSION (probably through triggers).
There is a potential problem with the customer's privacy. If a customer is deleted from the system, it may be desirable to physically remove its data from the database and just setting CUSTOMER.DELETED won't do that. If that's a concern, either blank-out the privacy-sensitive data in all the customer's versions, or alternatively disconnect existing orders from the real customer and reconnect them to a special "anonymous" customer, then physically delete all the customer versions.
This model uses a lot of identifying relationships. This leads to "fat" foreign keys and could be a bit of a storage problem since MySQL doesn't support leading-edge index compression (unlike, say, Oracle), but on the other hand InnoDB always clusters the data on the PK, and this clustering can be beneficial for performance. Also, fewer JOINs are needed.
Equivalent model with non-identifying relationships and surrogate keys would look like this:
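A minimal MySQL sketch of one way such a surrogate-key, versioned model could look (table and column names are illustrative assumptions, not the original diagram):

-- Illustrative only: the stable identity lives in product, each change adds a product_version row.
CREATE TABLE product (
    product_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    deleted    TINYINT(1) NOT NULL DEFAULT 0
);

CREATE TABLE product_version (
    product_version_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    product_id         INT UNSIGNED NOT NULL,
    name               VARCHAR(100) NOT NULL,
    price              DECIMAL(10,2) NOT NULL,
    valid_from         DATETIME NOT NULL,
    FOREIGN KEY (product_id) REFERENCES product (product_id)
);

-- CUSTOMER / CUSTOMER_VERSION would follow the same pattern.
-- Order lines reference the version that was current when the order was placed:
CREATE TABLE order_item (
    order_id           INT UNSIGNED NOT NULL,
    product_version_id INT UNSIGNED NOT NULL,
    quantity           INT NOT NULL,
    PRIMARY KEY (order_id, product_version_id),
    FOREIGN KEY (product_version_id) REFERENCES product_version (product_version_id)
);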

You could add a column in the product table indicating whether or not it is being sold. Then when the product is "deleted" you just set the flag so that it is no longer available as a new product, but you retain the data for future lookups.
To deal with name changes, you should be using IDs to refer to products rather than using the name directly.
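For example, a minimal sketch of such a flag (the column name is an assumption):

ALTER TABLE product ADD COLUMN is_active TINYINT(1) NOT NULL DEFAULT 1;

-- "Deleting" a product just hides it from new orders while keeping the row:
UPDATE product SET is_active = 0 WHERE product_id = 42;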

You've opened up an eternal debate between the purist and practical approach.
From a normalization standpoint, you "should" keep all the relevant data. In other words, say a product name changes: save the date of the change so that you can go back in time and rebuild your invoice with that product name, and all other data as it existed that day.
A "de"normalized approach is to view that invoice as a "moment in time", recording in the relevant tables data as it actually was that day. This approach lets you pull up that invoice without any dependancies at all, but you could never recreate that invoice from scratch.

The problem you're facing is, as I'm sure you know, a result of Database Normalization. One of the approaches to resolve this can be taken from Business Intelligence techniques - archiving the data in a de-normalized state in a Data Warehouse.
Normalized data:

Orders table: OrderId, CustomerId
Customers table: CustomerId, Firstname, etc.
Items table: ItemId, ItemName, ItemPrice
OrderDetails table: ItemDetailId, OrderId, ItemId, ItemQty, etc.

When queried and stored de-normalized, the data warehouse table looks like:

OrderId, CustomerId, CustomerName, CustomerAddress, (other Customer fields), ItemDetailId, ItemId, ItemName, ItemPrice, (other OrderDetail and Item fields)
Typically, there is either some sort of scheduled job that pulls data from the normalized tables into the Data Warehouse on a scheduled basis, OR, if your design allows, it could be done when an order reaches a certain status (such as Shipped). It could be that the records are stored at each change of status (with a field called OrderStatus tracking the current status), so the fully de-normalized data is available for each step of the order/fulfillment process. When and how to archive the data into the warehouse will vary based on your needs.
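A minimal sketch of such an archiving step, assuming the normalized tables above plus a few columns (Lastname, Address, OrderStatus) and a warehouse table named OrderArchive that are not spelled out in the original:

INSERT INTO OrderArchive
    (OrderId, CustomerId, CustomerName, CustomerAddress,
     ItemDetailId, ItemId, ItemName, ItemPrice, ItemQty, ArchivedAt)
SELECT o.OrderId,
       c.CustomerId,
       CONCAT(c.Firstname, ' ', c.Lastname),
       c.Address,
       od.ItemDetailId,
       i.ItemId,
       i.ItemName,
       i.ItemPrice,
       od.ItemQty,
       NOW()
FROM Orders o
JOIN Customers c     ON c.CustomerId = o.CustomerId
JOIN OrderDetails od ON od.OrderId   = o.OrderId
JOIN Items i         ON i.ItemId     = od.ItemId
WHERE o.OrderStatus = 'Shipped';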
There is a lot of overhead involved in the above, but the other common approach I'm aware of carries even MORE overhead.
The other approach would be to make the tables read-only. If a customer wants to change their address, you don't edit their existing address, you insert a new record.
So if my address is AddressId 12 when I first order on your site in January, then I move on July 4, I get a new AddressId tied to my account. (Say AddressId 123123, because your site is very successful and has attracted a ton of customers.)
Orders I placed before July 4 would have AddressId 12 associated with them, and orders placed on or after July 4 have AddressId 123123.
Repeat that pattern with every table that needs to retain historical data.
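A sketch of that pattern at the SQL level (table and column names are assumptions):

-- The customer moves: insert a new address row instead of updating the old one.
INSERT INTO Address (CustomerId, Street, City)
VALUES (42, '10 New Street', 'Springfield');   -- gets, say, AddressId 123123

-- New orders reference the new AddressId; orders placed before the move keep AddressId 12.
INSERT INTO Orders (CustomerId, AddressId, OrderDate)
VALUES (42, 123123, NOW());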
I do have a third approach, but searching it is difficult. I use this in one app only, and it actually works out pretty well in this single instance, which had some pretty specific business needs for reconstructing the data exactly as it was at a specific point in time. I wouldn't use it unless I had similar business needs.
At a specific status, serialize the data into an XML document, or some other document you can use to reconstruct the data. This allows you to save the data as it was at the time it was serialized, retaining the original table structure and relations.

When you have time-sensitive data, you use things like the Product and Customer tables as lookup tables and store the information directly in your Orders/OrderDetails tables.
So the order table might contain the customer name and address, and the details would contain all relevant information about the product, including especially price (you never want to rely on the product table for price information beyond the initial lookup at the time of the order).
This is NOT denormalizing: the data changes over time but you need the historical value, so you must store it at the time the record is created or you will lose data integrity. You don't want your financial reports to suddenly indicate you sold 30% more last year because you have price updates. That's not what you sold.

Related

Is there a best practice for storing data for a database object (model) that will change or be deleted in the future (Django)?

I am building an order management system for an online store and would like to store information about the Product being ordered.
If I use a Foreign Key relationship to the Product, when someone changes the price, brand, supplier etc. of the Product or deletes it, the Order will be affected as well. I want the order management system to be able to display the state of the Product when it was ordered even if it is altered or deleted from the database afterwards.
I have thought about it long and hard and have come up with ideas such as storing a JSON string representation of the object; creating a duplicate Product whose foreign key I then use for the Order etc. However, I was wondering if there is a best practice or what other people use to handle this kind of situation in commercial software?
PS: I also have other slightly more complex situations, for instance, I would like the data for a User object attached to the Order to change as the User changes but then never get deleted when the User is deleted. An answer to the above question would definitely give me a good starting point.
This price-change problem is commonly handled in RDBMS (SQL) commerce applications by doing two things.
inserting rows into an order_detail table when an order is placed. Each row of that table contains the particulars of the item as sold: item_id, item_count, unit_price, total_price, unit_weight, total_weight, tax_status, and so forth. So, the app captures what actually was sold, and at what price. A later price change doesn't mess up sales records. You really have to do this.
a price table containing item_id, price, start_date, end_date. You retrieve the current price something like this:
SELECT item.item, price.price
FROM item
JOIN price ON item.item = price.item
AND price.start_date <= NOW()
AND (price.end_date > NOW() OR price.end_date IS NULL)
This approach allows you to keep track of historical prices, and also to set up future price changes. But you still copy the price into the order_detail table.
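A sketch of that copy at order time (column names are assumptions consistent with the description above):

-- Copy the currently effective price into order_detail so later price changes can't touch it.
INSERT INTO order_detail (order_id, item_id, item_count, unit_price, total_price)
SELECT 1001, i.item_id, 3, p.price, 3 * p.price
FROM item i
JOIN price p ON p.item_id = i.item_id
           AND p.start_date <= NOW()
           AND (p.end_date > NOW() OR p.end_date IS NULL)
WHERE i.item_id = 55;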
The point is: once you've accepted an order, its details cannot change in the future. You copy the actual customer data (name, shipping address, etc) into a separate order table from your current customer table when you accept the order, and (as mentioned above) the details of each item into an order_detail table.
Your auditors will hate you if you don't do this. Ask me how I know that sometime.
I would recommend creating attributes for the Order model and extracting the data you need one by one into those attributes while you are saving the model, and then implementing a historical data table where you store JSONFields or some other version of the Product etc. when it is created or updated; that way people can refer to the historical data table if need be. This is more efficient than storing the full-fledged representation of the Product in the Order object, as the time taken to create the historical data is essentially charged to the admin creating the Product rather than the customer creating the Order. You can even create historical data objects in the background using threads etc. when you get to those advanced levels.
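At the database level, such a historical data table could be as simple as this sketch (names and the MySQL 5.7+ JSON type are assumptions):

CREATE TABLE product_history (
    product_history_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    product_id         INT UNSIGNED NOT NULL,
    snapshot           JSON NOT NULL,       -- serialized state of the Product at save time
    recorded_at        DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);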
While it is hard to answer your question without seeing your models.py at least, I will suggest archiving the results. You can add a boolean field called historical which defaults to False. When an order is made, you need to set the previous order's (or orders') historical value to True in your viewset or view function.
Here, historical=True means the record is being archived. You can filter on this historical column to display what you want, when you want. Sorry, this is just a high-level outline.
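In SQL terms, the flag and the archiving step might look like this (a sketch; table and column names are assumptions):

ALTER TABLE orders ADD COLUMN historical TINYINT(1) NOT NULL DEFAULT 0;

-- When a new order is saved, archive the customer's previous ones:
UPDATE orders SET historical = 1
WHERE customer_id = 42 AND historical = 0;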

How to handle changes in a relationship which would have an impact if there was an update [duplicate]

What is the best-practice for maintaining the integrity of linked data entities on update?
My scenario
I have two entities, "Client" and "Invoice" [Client is a definition entity and Invoice is a transaction entity].
After issuing many invoices to the client, it happens that the client information needs to be changed, e.g. "his billing address/location changed or business name ... etc".
It's normal that the users must be able to update the client information to keep the integrity of the data in the system.
In the invoice ("transaction" entity) I don't store just the client id but also all the client information related to the invoice, like "client name, address, contact", and that's a well-known approach for storing data in transaction entities.
If the user creates a new invoice, the new client information is stored in the invoice record along with the same client id (very obvious!).
My Questions
Is it okay to bind the "client" data entity from different locations for the insert and the update? [Explanation: if I follow the approach from steps 1-4, I have to bind the client entity from the client table when creating a new invoice, but when updating/printing the invoice I have to bind the client entity from the invoice table, otherwise the data won't be consistent or retain integrity... So how can I keep the data integrity without creating spaghetti code in the DAL to handle these custom data-binding requirements?]
I have used a system that saved all previous versions of an entity's data before the update ("keeping history of all versions"). If I want to use the same method to avoid the custom binding, how can I do this in terms of database design (using MySQL)? [Explanation: some invoices are created with version 1.0 of the client, then the client info is updated and its version becomes 1.1, and new invoices are created with the latest version... So is it good to follow this methodology, and how should I design my entities/tables to fulfil the requirements of entity versioning and binding?]
Please provide any book or reference that can point me in the right direction.
Thanks,
What you need to do is leave the table the way it is. You are correct, you should be storing the customer information in the invoice for a history of where the items were shipped to. When it changes, you should NOT update this information except for any invoices which have not yet been shipped. To maintain this type of information, you need a trigger on the customer table that looks for invoices that have not been shipped and updates those addresses automatically.
If you want to save historical versions of the client information, the correct process is to create an audit table and populate it through a trigger.
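A minimal MySQL sketch of such an audit table and trigger (names and columns are assumptions):

CREATE TABLE customer_audit (
    audit_id    BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    customer_id INT UNSIGNED NOT NULL,
    name        VARCHAR(100),
    address     VARCHAR(255),
    changed_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);

DELIMITER //
CREATE TRIGGER customer_before_update
BEFORE UPDATE ON customer
FOR EACH ROW
BEGIN
    -- Keep the old version of the row before it is overwritten.
    INSERT INTO customer_audit (customer_id, name, address)
    VALUES (OLD.customer_id, OLD.name, OLD.address);
END//
DELIMITER ;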
Data integrity in this case is simply through a foreign key to the customer id. The id itself should not ever change or be allowed to be changed by the user, and should be a surrogate number such as an integer. Because you should not be changing the address information in the actual invoice (unless it has not been shipped, in which case you had better change it or the product will be shipped to the wrong place), this is sufficient to maintain data integrity. This also allows you to see where the stuff was actually shipped but still look up the current info about the client through the use of the foreign key.
If you have clients that change (companies bought by other companies), you can either run a process on the server to update the customer id of old records or create a table structure that shows which client ids belong to a current parent id. The first is easier to do if you aren't talking about changing millions of records.
"This is a business case where data mnust be denormalized to preserve historical records of what was shipped where. His design is not incorrect."
Sorry for adding this as a new response, but the "add comment" button still doesn't show.
"His design" is indeed not incorrect ... because it is normalized !!!
It is normalized because it is not at all times true that the address corresponding to an invoice functionally depends on the customer ID exclusively.
So: normalization, yes, I do think so. Not that normalization is the only issue involved here.
I'm not completely clear on what you are getting at, but I think you want to read up on normalization, available in many books on relational databases and SQL. I think what you will end up with is two tables connected by a foreign key, but perhaps some soul-searching per previous sentence will help you clarify your thoughts.

Access: Entering multiple subform values with one entry in the form

I've been using Access to create simple databases for a while with great success, but have run into a problem I can't find an answer to.
We ship individually serialized units to various end-users, and occasionally to resellers that stock them for end-users. I must keep track of which serial numbers end up with each end-user.
The first database I created to handle this recorded company information in one table using their account number as primary key, order information in a second table using the order number as the primary key and linked via the company name, and unit information in a third table with the serial number as the primary key and linked via the order number.
This worked very well until I had to account for these stock orders with a reseller. As it was structured, every unit was linked to one company via the sales order. The issue is that I may ship 20 units on one order to Company A, who then sells 5 to Company B and 3 to Company C.
I realized I needed to link the company name directly to the units, not the orders and have fixed that.
My issue now is simplicity in entering information in the form. My previous database involved the employee in our shipping department merely entering the sales order, selecting the customer name from a drop down menu, then scanning the serial numbers in a subform. This was to ensure simplicity and try to eliminate human error. He had only three things to input, and most of the input was done by scanning barcodes.
As it is structured now, the employees out in shipping would have to populate the company name for every record in the subform with the serial number, and that complicates things in a way that is unacceptable. At the point of shipping, the company name will always be the same for every unit in the subform.
So.
How would I go about creating a form where the company name is entered once in the form, and automatically populates itself for every record in the subform? The caveat here is that I must also be able to go back occasionally and change the company name of individual units in an order without necessarily affecting the rest of the order. I suppose it starts out as a one-to-many relationship that then must be able to change.
I hope that makes sense.
I have looked for answers using various approaches with auto-fill and relationships and not preserving data integrity, but I feel the answer is just beyond my reach.
The only solution I can think of is to create another field in the unit table for the end-user, and perhaps write a formula that sets this default value to the company name from the order that shipped it. This seems unnecessarily complicated and redundant; there has to be a better way.

Disregard changes to a product description when retrieving order records

The title is somewhat hard to understand, so here is the explanation:
I am building a system that deals with retail transactions, meaning purchases. I have a database with products, where each product has an ID that is also known to the POS system. When a customer makes a purchase, the data is sent to the back-end for parsing and is saved. Now everything is fine and dandy until there are changes to a product's name, since my client wants to see the name of the product as it was when it was purchased.
How do I save this data, while also keeping a nice, normal-formed database?
Solutions I could think of are:
De-normalization, where we correlate the incoming data with the info we have in the database, and then save only the final text values, not IDs.
Versioning, where we keep multiple versions of every product, and save the transactions with the id of the product's version as it was when the transaction came in. The problem with this one is that as our retail store chain grows, and there are more and more changes happening to the products, the complexity of the whole product will greatly increase.
Any thoughts on this?
This is called a slowly changing dimension.
Either solution that you mention works. My preference is the second, versioning. I would have a product table that has an effdate and enddate on the record. You can easily find the current record (where enddate is null) or the record at any point in time.
The first method always strikes me as more "quick-and-dirty", but it also works. It just gets cumbersome when you have more fields and more objects you are trying to track. It does, in general though, win on performance.
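A sketch of the versioned product table described above and a point-in-time lookup (column names beyond effdate/enddate are assumptions):

CREATE TABLE product (
    product_id INT UNSIGNED NOT NULL,
    name       VARCHAR(100) NOT NULL,
    effdate    DATETIME NOT NULL,
    enddate    DATETIME NULL,
    PRIMARY KEY (product_id, effdate)
);

-- Current record: WHERE enddate IS NULL.
-- Record as it was at the time of a given purchase:
SELECT name
FROM product
WHERE product_id = 42
  AND effdate <= '2024-03-01 12:00:00'
  AND (enddate IS NULL OR enddate > '2024-03-01 12:00:00');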
If the name has to be the name as it was originally, the easiest, simplest and most reliable way to do that is to save the name of the product in the invoice line item record.
You should still link to the product with a ProductID, of course.
If you want to keep a history of name changes, you can do that in a separate table if you wish:
ProductNameID
ProductID
Date
Description
And store a ProductNameID with the invoice line item.
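A sketch of that history table in MySQL (column types are assumptions):

CREATE TABLE ProductName (
    ProductNameID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    ProductID     INT UNSIGNED NOT NULL,
    `Date`        DATETIME NOT NULL,
    Description   VARCHAR(255) NOT NULL
);

-- The invoice line item then carries ProductNameID in addition to ProductID.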
