Hypothetically, if I were working with a system that keeps track of products and order information across multiple tables (orders, order_details, products):
orders
id INT(11)
shipping_name VARCHAR(255)
shipping_street VARCHAR(255)
shipping_city VARCHAR(255)
[etc]
order_details
id INT(11)
order_id INT(11)
product_id INT(11)
products
id INT(11)
name VARCHAR(255)
description VARCHAR(255)
price DECIMAL(8,2)
The structure is very simple: an order has multiple order_details rows, and each order_details row references one product.
The problem is that when someone edits a product, those edits change the data shown for previous orders. If an employee goes back and looks at that information later on, they may not see the same information that the customer received at the time the order was placed.
What would be best practice?
Should I add a 'display_item' field to the products table, and on edit/delete set display to 0 and add the edited product as a new row?
Should I duplicate the name, description, and price in order_details?
I think this is one of those cases where database normalization "breaks".
Some possible solutions:
Keep a copy of the product attributes for each order. This is expensive in terms of storage, but it makes it easier to track down the product data stored in the order.
Keep a log of attribute changes. Product attributes can change over time, so a log which stores the modification date helps you filter the product attributes down to the moment the order was made.
Proposal for option 1
Create a copy of the products table and add a one-to-one relation between each order_details row and its copied product row.
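As a rough sketch of option 1 (the table and column names below are made up for illustration, and I'm assuming MySQL like the rest of your schema): copy the product's name, description and price into a snapshot row at purchase time, and point order_details at the snapshot instead of at the live product.

-- Hypothetical snapshot table; column types copied from your products table
CREATE TABLE order_product_snapshots (
  id INT(11) NOT NULL AUTO_INCREMENT,
  product_id INT(11) NOT NULL,        -- the live product it was copied from
  name VARCHAR(255) NOT NULL,
  description VARCHAR(255),
  price DECIMAL(8,2) NOT NULL,
  PRIMARY KEY (id)
);

-- order_details then references the snapshot (one-to-one per detail row)
ALTER TABLE order_details ADD COLUMN product_snapshot_id INT(11) NOT NULL;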
Proposal for option 2
Split the products table in two: product_general_info and product_attributes. Product general info is meant to be stable through time (a product's general info will not change), as any modification to the data in this table will propagate to the whole orders set. Product attributes must have a date or timestamp value to define when the attributes changed. Then you can query the database and return the last record that is before or on the order date.
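A minimal sketch of option 2, again with hypothetical names (product_attributes, valid_from) and assuming your orders table carries an order_date column: the attributes the customer saw are the last attribute row on or before the order date.

CREATE TABLE product_attributes (
  id INT(11) NOT NULL AUTO_INCREMENT,
  product_id INT(11) NOT NULL,
  name VARCHAR(255) NOT NULL,
  description VARCHAR(255),
  price DECIMAL(8,2) NOT NULL,
  valid_from DATETIME NOT NULL,       -- when these attribute values took effect
  PRIMARY KEY (id),
  KEY idx_product_valid (product_id, valid_from)
);

-- Attributes of product 42 as they were when order 1001 was placed (literal ids are examples)
SELECT pa.*
FROM product_attributes pa
JOIN orders o ON o.id = 1001
WHERE pa.product_id = 42
  AND pa.valid_from <= o.order_date
ORDER BY pa.valid_from DESC
LIMIT 1;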
In my project, a guest user can make donations to an organisation in two ways.
1. The organisation has invited donations for a purpose 'x', with a fixed amount to be paid, say Rs.150. In this case a list of donations exists; the user can choose one or more donations and make a payment.
2. A user can pay any amount as a donation to the organisation.
I want to maintain payment records for both fixed and raw donations to the organisation.
fixed_donations table
---------------
id pk
organisation_id fk
donation_name varchar(250)
description text
price decimal(10,2)
payments table
--------------
id pk
payment_id int(10)
donation_type enum('fixed','raw')
organisation_id fk
fixed_donation_id fk
amount decimal(10,2)
name varchar(50)
email varchar(50)
contact_number int(12)
date datetime
Is there a need to keep a separate payments table for raw donations and fixed donations, or is there a better way to include both donation payments within one table?
If I understand your question correctly, a payment can be either for a known donation request, or with no donation request.
This is an example of a known problem with the relational model - it's hard to accommodate inheritance. There are lots of questions on Stack Overflow on that topic.
In your case, I would imagine that the "payments" table has several important uses in the database model - you'll want to work out total payments, payments in a period, find payments from a donor. So I would keep it pretty much as you have it, using the "single table inheritance" model. The alternatives are likely to be much harder to work with.
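As a sketch of that single-table model, keeping only the columns relevant to the question and assuming the fixed-donations table is called fixed_donations with primary key id (names/types from your description, the NULL rule is my assumption):

CREATE TABLE payments (
  id INT(10) NOT NULL AUTO_INCREMENT,
  organisation_id INT(10) NOT NULL,
  donation_type ENUM('fixed','raw') NOT NULL,
  fixed_donation_id INT(10) NULL,     -- NULL when donation_type = 'raw'
  amount DECIMAL(10,2) NOT NULL,
  date DATETIME NOT NULL,
  PRIMARY KEY (id),
  FOREIGN KEY (fixed_donation_id) REFERENCES fixed_donations(id)
);

-- Totals work the same way for both donation types
SELECT organisation_id, SUM(amount) AS total_paid
FROM payments
GROUP BY organisation_id;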
My client has given me about 14k URLs of various products and he wants me to store all the price changes of each product per day. I think it will require an immense amount of DB storage and a lot of optimization. I've never done this before. I'm using a MySQL DB. Should I store all these price changes per product in a JSON column or as separate rows? Looking for tips regarding this. Thanks!
JSON columns are not as efficient as normal SQL columns and should be reserved for when you're not sure what data you're going to have. You're pretty sure what data you're going to have.
This is a pretty straightforward two table schema. One table for the product, and one for its price changes.
create table product (
id integer primary key auto_increment,
name varchar(255),
url varchar(255) unique,
-- ...any other information about the product you might want to store...
index(url)
);
By giving it a primary key it shields you from the URL changing, and it reduces the amount that must be stored in tables that refer to it. They only have to store the integer primary key, not the whole URL. The URL is indexed for faster searches.
Now that you have a product table other tables can refer to it. Like a table of price changes.
create table product_price_changes (
product_id integer not null,
price numeric(9,2) not null,
change_time datetime not null,
index(change_time),
foreign key (product_id) references product(id)
);
This table stores when the price for a product changes, and what that price is. This is how you attach lists of data to things in SQL. The change_time is indexed for faster searches.
A simple join lets you efficiently see all the changes to a particular product in order.
select price, change_time
from product_price_changes ppc
join product prod on ppc.product_id = prod.id
where prod.url = ?
order by change_time
We are coding an MIS for customers. The price of a particular product changes often, and customers need to maintain the price and the date period during which the price is effective. There is a table named PRODUCT_PRICE to maintain the price, whose DDL is shown below.
CREATE TABLE `PRODUCT_PRICE` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
`product_id` bigint(20) NOT NULL DEFAULT 0 COMMENT 'product id',
`price` bigint(20) NOT NULL DEFAULT 0 COMMENT 'price value in cent',
`start_date` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00' COMMENT 'when this price takes effect',
`del_flag` tinyint(1) unsigned NOT NULL DEFAULT '0' COMMENT 'mark if logically deleted',
`status` tinyint(1) unsigned NOT NULL DEFAULT '0' COMMENT '0 new, 1 wait for audit, 2 accepted, 3 rejected',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='product price table';
Note that there is no end_date in the DDL, because the end date is implicitly the day before the start_date of the next record, or open-ended if there is no next record.
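For example (the id and date literals below are just placeholders), the price effective on a given day is the latest accepted, non-deleted row whose start_date is not after that day:

SELECT price
FROM PRODUCT_PRICE
WHERE product_id = 42
  AND status = 2          -- 2 = accepted
  AND del_flag = 0        -- not logically deleted
  AND start_date <= '2024-01-15 00:00:00'
ORDER BY start_date DESC
LIMIT 1;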
In the system, every time someone creates or edits the price of a product, the information is sent to an admin for review and audit. The change does not take effect unless the admin accepts it. A PRODUCT_PRICE record that has passed review and audit can be edited again.
The problem is that in our old design, the existing PRODUCT_PRICE record is changed immediately, with its status field flipped back to 0 to wait for review and audit. That is not what customers want.
The new requirement is that people can still view the price records that passed review even after someone edits them, until the new price or start_date is accepted by the admin.
How should we refactor the old table design so that a change to a record does not take effect until the admin accepts it?
After discussion, we figured out 2 solutions.
Every time someone changes a PRODUCT_PRICE record, a new row containing the new information is created and sent for audit. After it is accepted, the old one is deleted. Hence, a new column reference_id needs to be added to the table to mark which old record is being changed.
Create a new table PRODUCT_PRICE_TEMP to store all new price records waiting for audit. Once accepted, update the corresponding old record in PRODUCT_PRICE. This solution also needs a reference_id column to refer to a row in PRODUCT_PRICE, but we don't need to delete records in PRODUCT_PRICE to change a value (just update it).
Is there a better design for our new requirement?
Your first solution is almost it.
Try to look at the problem from a bird's-eye view, abstracted from the DB model. Each time the price changes, the system creates a new instance of a product price item. It is a change from the user's perspective; for the system it is a new entity. It seems like you have already figured that out.
More importantly, let me warn you against one part: "the old one would be deleted". I believe the assumption behind this is that a product price's start date can be in the past. That is very dangerous and will cause terrible complications later. Changing a price in the past must be a rare exception, not a normal situation.
Of course it still happens, because people make mistakes. For this critical (and rare!) situation you need to give one or two senior people (or admin/support) the ability to change a price in the past without any approvals, probably directly in the DB. There is just no time for an approval flow; the company is losing money every second. Usually the same person will later take care of all the consequences of the wrong pricing on customers and bills. Even in this situation you should not edit a row in PRODUCT_PRICE: mark the old one as invalid and create a new row starting in the past.
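To make that concrete, here is a sketch of your first solution without a physical delete. reference_id is the proposed new column, the literal ids and values are placeholders, and the status/del_flag meanings are taken from the comments in your DDL.

ALTER TABLE PRODUCT_PRICE
  ADD COLUMN reference_id BIGINT(20) UNSIGNED NULL
  COMMENT 'id of the accepted row this pending row supersedes';

-- Editing an accepted price: insert a pending copy, leave the old row untouched
INSERT INTO PRODUCT_PRICE (product_id, price, start_date, status, reference_id)
VALUES (42, 1999, '2024-02-01 00:00:00', 1, 17);   -- 1 = wait for audit

-- Admin accepts pending row 99: logically retire the superseded row, activate the new one
START TRANSACTION;
UPDATE PRODUCT_PRICE SET del_flag = 1 WHERE id = 17;
UPDATE PRODUCT_PRICE SET status = 2 WHERE id = 99;
COMMIT;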
I think I know what the problem is, but I need some help going in the correct direction.
I've got a 'product' table, and I've also got several temp_product tables for various suppliers.
The goal here is to update several fields in the products table from the appropriate temp_product table. My product suppliers give me a CSV file with all of their inventory data. Originally I was just looping through the CSV and updating line by line, but this takes forever, so now I load it into a temporary table using LOAD DATA LOCAL INFILE.
The problem I am having is that the UPDATE queries take forever to run, and most of the time MySQL just completely crashes. I am hoping I can show you my table structure and somebody can help me figure out what kind of key/index setup would work best?
I've tried 2 different update query variations, but neither one is working.
UPDATE product AS p, temp_product AS t
SET p.quantity = t.quantity,
p.ean = t.inventory,
p.cost = t.price,
p.date_modified = NOW()
WHERE p.sku = t.sku
-AND-
UPDATE temp_product AS t
INNER JOIN product AS p ON p.sku = t.sku
SET p.quantity = t.quantity,
p.ean = t.inventory,
p.cost = t.price,
p.date_modified = NOW()
Here is the structure to my tables in question:
temp_product
sku varchar(15) PRI
status varchar(2)
statusid int(11)
quantity int(11)
inventory varchar(15)
ETA varchar(25)
ETA_Note varchar(255)
price double(10,2)
product
product_id int(11) PRI
model varchar(64)
sku varchar(64)
upc varchar(50)
ean varchar(50)
mpn varchar(64)
location varchar(128)
quantity int(4)
price decimal(15,4)
cost decimal(15,4)
status tinyint(1)
date_added datetime
date_modified datetime
I have a feeling I could get this to work correctly if I had keys/indices set up correctly. The only thing I have set up now is the Primary Key, but those don't match up across all the tables. I'm pretty new to all this, so any help would be appreciated.
To make things even more complicated, I'm not sure if some of my suppliers use the same SKUs, so I would like to update the product table WHERE sku = sku and location = 'suppliername'.
Thanks for the help!
EDIT: Slimmed down the problem a little bit, originally had a product and supplier_product table to update, once I get the product table working I can probably take it from there.
First of all, could you run SHOW CREATE TABLE product; and SHOW CREATE TABLE temp_product; and paste the results? Also, how large exactly is your product table? (SELECT COUNT(1) FROM product can help.)
Regarding the keys: you at least need to add a sku key to your product table.
If sku is supposed to be a unique field, then you can do it with the following command:
ALTER TABLE product ADD UNIQUE KEY sku(sku);
If sku is NOT a unique field, then you can still add it as a key like that:
ALTER TABLE product ADD KEY sku(sku);
but in that case, this means that for one record with a particular sku from the temp_product table, you will update more than one record in your product table.
Regarding the table size: even if the table is large (say several million rows) but it's OK to run queries that take a long time (for example, if you are the only one using this database), then after you have added the key, either of the variants should in principle work and take less time than it does now. Otherwise, you would be better off doing the update in batches (e.g. 100, 500 or 1000 records at a time), preferably with a script that waits a little between updates. This is especially recommended if your database is a master that replicates to slaves.
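For illustration, a sketch of the join update restricted to one supplier and run in primary-key batches. The 'suppliername' literal and the id range are placeholders, and I'm assuming product.location holds the supplier name, as you described.

UPDATE product AS p
INNER JOIN temp_product AS t ON p.sku = t.sku
SET p.quantity = t.quantity,
    p.ean = t.inventory,
    p.cost = t.price,
    p.date_modified = NOW()
WHERE p.location = 'suppliername'          -- only this supplier's products
  AND p.product_id BETWEEN 1 AND 1000;     -- next batch: 1001-2000, and so on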
We have a shopping cart as pictured below. The setup works well, except for one fatal flaw: if you place an order, the order is linked to a product, so if I update the product after you have purchased it, there is no way for me to show you what the product looked like when you bought it (including the price). This means we need versioning.
My plan at present is to, when a new product, or variant is created, or an existing one is edited, create a duplicate of the product or variant in the database. When a purchase is made, link the order to the version, not the product.
This seems rather simple, except that from what I can see the only things we don't need to version are the categories (as no one cares what categories it was in). So we need to version:
Products
Variants
The key -> value pairs of attributes for each version
The images
My current thinking is as follows.
Note: when a product is created, a default variant is created as well; this default variant cannot be removed.
When a product is created
Insert the product into the products table.
Create the default variant
Duplicate the product into the products_versions table
Replace current id column with a product_id column
Add id column
Duplicate the variant into the variants_versions table
Replace current id column with variant_id column
Add id column
Replace product_id column with product_version_id column
When a product is edited
Update the product into the products table.
Duplicate the product into the products_versions table
Replace current id column with a product_id column
Add id column
Duplicate all product variants into the variants_versions table
Replace current id column with variant_id column
Add id column
Replace product_id column with product_version_id column
Duplicate all variant_image_links into the variant_Image_link_version table
Replace current variant_id column with variant_version_id column
When a variant is added
Add the variant into the variants table.
Duplicate the product into the products_versions table
Replace current id column with a product_id column
Add id column
Duplicate all product variants into the variants_versions table
Replace current id column with variant_id column
Add id column
Replace product_id column with product_version_id column
When a variant is edited
Update the variant in the variants table.
Duplicate the product into the products_versions table
Replace current id column with a product_id column
Add id column
Duplicate all product variants into the variants_versions table
Replace current id column with variant_id column
Add id column
Replace product_id column with product_version_id column
Duplicate all variant_image_links into the variant_Image_link_version table
Replace current variant_id column with variant_version_id column
So that is what the final structure looks like.
Now this all seems great, except it seems like a heck of a lot of duplicated data; e.g. if we update a product we duplicate the variants even though they would not have been updated since they were inserted. It also seems like a lot of work.
Is there a better way of doing this?
You can do what ERP (and also possibly Payroll) systems do: Add a Start and End Date/Time. So...
The variant and prices match with their product based on the common dates.
All queries default to running on the current date, and the joins between each table also need to take into account the overlapping/intersecting date ranges: parent_start_date <= child_start_date AND parent_end_date >= child_end_date (see the sketch after this list).
You would end up with duplicated rows for each price change or variant, but you then don't need to update as many records (like variant ids) when the product price changes.
You need to ensure valid dates are used. PS: use your system's max date for the end datetime of the most current/recent record.
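For example, a sketch under assumed table and column names (product, variant, start_date, end_date; none of these are from your actual schema), fetching the product and variant rows in effect right now:

SELECT p.id, p.name, v.id AS variant_id, v.price
FROM product p
JOIN variant v
  ON v.product_id = p.id
 AND p.start_date <= v.start_date      -- variant range sits inside the product range
 AND p.end_date   >= v.end_date
WHERE NOW() BETWEEN p.start_date AND p.end_date
  AND NOW() BETWEEN v.start_date AND v.end_date;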
Btw, some related questions along the same line:
Ways to implement data versioning in MongoDB
Ways to implement data versioning in PostgreSQL
Ways to implement data versioning in Cassandra
Row versioning in MySQL
Another approach to this would be to never edit or remove your data, only create new data. In SQL terms, the only operations you ever run on your tables are INSERTs and SELECTs.
To accomplish what you want, each table would need the following columns:
version_id - this would be your primary key
id - this would be the thing that holds versions of your object together (e.g. to find all versions of a product, SELECT * FROM products WHERE id = ?)
creation_date
is_active - you're not deleting anything, so you need a flag to (logically) get rid of data
With this, here's what your products table would look like:
CREATE TABLE products (
version_id CHAR(8) NOT NULL PRIMARY KEY,
id INTEGER NOT NULL,
creation_date TIMESTAMP NOT NULL DEFAULT NOW(),
is_active BOOLEAN DEFAULT true,
name VARCHAR(1024) NOT NULL,
price INTEGER NOT NULL
);
CREATE TABLE variants (
version_id CHAR(8) NOT NULL PRIMARY KEY,
id INTEGER NOT NULL,
creation_date TIMESTAMP NOT NULL DEFAULT NOW(),
is_active BOOLEAN DEFAULT true,
product_version_id CHAR(8) NOT NULL,
price INTEGER NOT NULL,
override_price INTEGER NOT NULL,
FOREIGN KEY (product_version_id) REFERENCES products(version_id)
);
Now, to insert into either table
Generate a unique version_id (there are several strategies for this; one is to use a database sequence, or for MySQL use an AUTO_INCREMENT).
Generate an id. This id is consistent for all versions of a product.
To update a row in a table, one must insert the entire graph; e.g. to update a product, one must insert a new product and new variants. (There is a lot of room for optimization here, but it's easiest to start with the un-optimized solution.)
For example, to update a product
Generate a unique version_id
Use the same id
Insert new product variants. The variants will be the same as the ones linked to the previous version of the product that you're "updating", except the product_version_id will be different.
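As a sketch of those steps (the version_id values, product id 42 and the column values are all made up; in practice each new row gets its own freshly generated version_id):

-- "Update" product 42: insert a new product version...
INSERT INTO products (version_id, id, name, price)
VALUES ('prod0002', 42, 'New product name', 1099);

-- ...and re-insert each of its variants, pointed at the new product version
INSERT INTO variants (version_id, id, product_version_id, price, override_price)
VALUES ('var00002', 7, 'prod0002', 1099, 999);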
This principle can extend to all your tables.
To find the most recent version of a product, you need to use the creation_date column to get the product that was most recently created.
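And the read side, a sketch of fetching the latest active version of the same hypothetical product 42:

SELECT *
FROM products
WHERE id = 42
  AND is_active = true
ORDER BY creation_date DESC
LIMIT 1;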
This model will use more space, but I think this may be a fair trade-off given its simplicity: there are only INSERTs and SELECTs, and data is never mutated.