Should order_products table be denormalized? - mysql

The order_products table holds the products that customers have bought, one record per purchased product.
It also has two fields, product_name and price, which duplicate data from the products table.
Is it worth normalizing the order_products table and creating a history (audit) table for product name and price? Then I wouldn't need product_name and price in the order_products table anymore?

I assume you need to store product name and price at the time of the order. Both will change in the course of time. If that happens a lot, your current approach may be good enough.
I would consider a normalized approach, especially if you have many rows in order_products per (product name, price). Have an additional table that stores the volatile states of a product every time they change. It could be called product_history, like you already hinted. Just save the date (or timestamp) with every new state. Have a foreign key link to the table product to preserve referential integrity. Like this:
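create table product_history
(product_id integer
,valid_from date -- or timestamp
,product_name varchar(255) -- MySQL requires a length; 255 is an assumed value
,price decimal(10,2) -- assumed precision; a bare decimal keeps no decimal places in MySQL
,PRIMARY KEY (product_id, valid_from)
,FOREIGN KEY (product_id) REFERENCES product(product_id)
ON DELETE CASCADE
ON UPDATE CASCADE);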
A fast query to look up the applicable volatile attributes:
SELECT *
FROM product_history
WHERE product_id = $my_product_id
AND valid_from <= $my_date
ORDER BY valid_from DESC
LIMIT 1;
You definitely need an index on (product_id, valid_from) to speed up this query. The primary key in my example will probably do.
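To see the lookup in action, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL, purely so it runs without a server. The sample rows, the column types and the helper name price_at are assumptions made for illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY);
CREATE TABLE product_history (
    product_id   INTEGER NOT NULL,
    valid_from   TEXT    NOT NULL,  -- ISO dates, so string order equals date order
    product_name TEXT    NOT NULL,
    price        NUMERIC NOT NULL,
    PRIMARY KEY (product_id, valid_from),
    FOREIGN KEY (product_id) REFERENCES product(product_id)
        ON DELETE CASCADE ON UPDATE CASCADE
);
INSERT INTO product VALUES (1);
""")

# Invented sample data: the product was renamed and repriced on 2019-06-01.
conn.executemany(
    "INSERT INTO product_history VALUES (?, ?, ?, ?)",
    [(1, "2019-01-01", "Widget", 9.99),
     (1, "2019-06-01", "Widget Pro", 12.49)],
)

def price_at(product_id, on_date):
    """Return (product_name, price) valid on on_date, or None if no state exists yet."""
    return conn.execute(
        """SELECT product_name, price
           FROM product_history
           WHERE product_id = ? AND valid_from <= ?
           ORDER BY valid_from DESC
           LIMIT 1""",
        (product_id, on_date),
    ).fetchone()
```

An order row then only needs product_id and the order date; calling price_at(1, "2019-03-15") picks the state that was valid on that day.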

That depends. What is the purpose of that table?
In general, tables like that can be used for statistical analysis of market trends, so it's important to have both product_name and price: the product price today may be different from what it was one month ago, but you may want to know at which prices products were most bought.
However, if the price is in that table because it is part of the product's primary key, then that is just bad practice and the key should be reduced.

It's not possible to make this judgement knowing just the database structure. It depends on how you use your database (ie. inserts, selects, updates and deletes... And how frequently?).
At one end, if your solution is a reporting solution on a read-only database, you should keep those duplicates! But if, at the other end, your solution is a logging solution that only logs information and never retrieves it, I'd go for the normalized model you're suggesting.
Fully normalized databases are not optimized for performance. You often have to denormalize your database design.
Very often a model that has a certain degree of redundant data is the fastest one. When denormalizing you just have to keep a steady eye on the balance between faster queries and slower insertions/updates!
Check these answers and maybe you'll find further help making your decision: When to Denormalize a Database Design

Yes, that's a good idea, but a better idea is to create one field in the order_products table and dump all your order info there after serializing it. With this approach you don't have to create 2 new tables (maybe more if you want to do the same for gift coupon info, shipping info, etc.).
The rationale behind the approach is that order_products rows are placed orders, which means they are "published records". Published records don't change much and shouldn't be modified. And these records should be kept for future audits.
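A rough sketch of that serialized-field idea, in Python with the standard-library sqlite3 and json modules (SQLite stands in for MySQL here). The column name snapshot, the helper names and the sample product are all hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_products (
        order_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        quantity   INTEGER NOT NULL,
        snapshot   TEXT    NOT NULL  -- serialized product details at order time
    )
""")

def place_order(order_id, product, quantity):
    """Freeze the product's current name and price into the order row."""
    snapshot = json.dumps({"product_name": product["name"], "price": product["price"]})
    conn.execute(
        "INSERT INTO order_products VALUES (?, ?, ?, ?)",
        (order_id, product["id"], quantity, snapshot),
    )

def read_snapshot(order_id):
    """Return the frozen product details for an order, or None if it does not exist."""
    row = conn.execute(
        "SELECT snapshot FROM order_products WHERE order_id = ?", (order_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None

widget = {"id": 7, "name": "Widget", "price": "9.99"}  # invented sample product
place_order(1, widget, 2)
widget["price"] = "12.49"  # a later price change does not touch the stored order
```

The trade-off is that values inside the serialized field are harder to filter, index or aggregate in SQL than ordinary columns.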

Related

Database design: Value(s) per user per day

I'm setting up a system where for every user (1000+), I want to add a set of values every single day.
Hypothetically:
A system where I can log when Alice and Bob woke up and what they had for dinner on August 1st, 2019 or 2024.
Any suggestions on how to best structure the database tables?
A person table with a primary person ID?
rows: n
A date table with a primary date ID?
rows: m
And a personDate table with the person ID and date ID as foreign keys?
rows: n x m
I don't think you need a date table unless you want to use it to make specific queries easier, such as a left join against the dates to see which days are missing events. Nevertheless, I would stick to DATE or DATETIME as the field and avoid making a separate surrogate foreign key. It won't save any space, will potentially perform worse, and will be more difficult for the developer to use.
This seems simple and fine to me. I wouldn't worry too much about the performance based upon the number of elements alone. You can insert a billion records with no problem and that implies a very large site.
Just don't insert records if the event didn't happen. In other words you want your database to grow in relation to the real usage. Avoid growth based upon phantom events and you should be okay.
person
    person_id
action
    action_id
personAction
    person_id
    action_id
    action_datetime
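The three tables above could be sketched like this, using Python's standard-library sqlite3 as a stand-in for MySQL. The name and label columns, the index, the sample rows and the helper events_on are assumptions added for illustration.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE action (action_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE personAction (
    person_id       INTEGER NOT NULL REFERENCES person(person_id),
    action_id       INTEGER NOT NULL REFERENCES action(action_id),
    action_datetime TEXT    NOT NULL  -- plain datetime value, no separate date table
);
CREATE INDEX idx_person_time ON personAction (person_id, action_datetime);

INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO action VALUES (1, 'woke up'), (2, 'had dinner');
-- Only events that really happened get a row.
INSERT INTO personAction VALUES
    (1, 1, '2019-08-01 06:30:00'),
    (1, 2, '2019-08-01 19:00:00'),
    (2, 1, '2019-08-01 08:15:00');
""")

def events_on(name, day):
    """List (label, datetime) for one person on one ISO day, oldest first."""
    start = date.fromisoformat(day)
    end = start + timedelta(days=1)
    return conn.execute(
        """SELECT a.label, pa.action_datetime
           FROM personAction pa
           JOIN person p ON p.person_id = pa.person_id
           JOIN action a ON a.action_id = pa.action_id
           WHERE p.name = ?
             AND pa.action_datetime >= ? AND pa.action_datetime < ?
           ORDER BY pa.action_datetime""",
        (name, start.isoformat(), end.isoformat()),
    ).fetchall()
```

Filtering on a datetime range rather than on a date() expression lets the (person_id, action_datetime) index do the work.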

What's the best way to store all these data in db?

My client has given me about 14k URLs of various products and he wants me to store all the price changes of each product per day. I think it will require an immense amount of DB storage and a lot of optimization. I've never done this before. I'm using MySQL. Should I store all these price changes per product in a JSON column or as separate rows? Looking for tips regarding this. Thanks!
JSON columns are not as efficient as normal SQL columns and should be reserved for when you're not sure what data you're going to have. You're pretty sure what data you're going to have.
This is a pretty straightforward two table schema. One table for the product, and one for its price changes.
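create table product (
id integer primary key auto_increment,
name varchar(255), -- MySQL requires a length; 255 is an assumed value
-- ...any other information about the product you might want to store...
url varchar(255) unique -- the unique constraint already creates an index on url
);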
By giving it a primary key it shields you from the URL changing, and it reduces the amount that must be stored in tables that refer to it. They only have to store the integer primary key, not the whole URL. The URL is indexed for faster searches.
Now that you have a product table other tables can refer to it. Like a table of price changes.
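create table product_price_changes (
product_id integer not null,
price numeric(9,2) not null,
change_time datetime not null,
index(change_time),
-- MySQL parses but ignores an inline "references" on a column, so declare the key explicitly
foreign key (product_id) references product(id)
);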
This table stores when the price for a product changes, and what that price is. This is how you attach lists of data to things in SQL. The change_time is indexed for faster searches.
A simple join lets you efficiently see all the changes to a particular product in order.
select price, change_time
from product_price_changes ppc
join product prod on ppc.product_id = prod.id
where prod.url = ?
order by change_time
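To keep storage down, a scraper can write a row only when the price actually differs from the last one recorded. Below is a minimal sketch of that idea in Python with the standard-library sqlite3 module standing in for MySQL. Storing prices as integer cents, the composite index, the example URL and the helper names record_price and price_history are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    url  TEXT UNIQUE
);
CREATE TABLE product_price_changes (
    product_id  INTEGER NOT NULL REFERENCES product(id),
    price_cents INTEGER NOT NULL,  -- assumption: integer cents instead of numeric(9,2)
    change_time TEXT    NOT NULL
);
CREATE INDEX idx_changes ON product_price_changes (product_id, change_time);
""")

def record_price(url, price_cents, seen_at):
    """Insert a change row only if the price differs from the latest one. Returns True if stored."""
    conn.execute("INSERT OR IGNORE INTO product (url) VALUES (?)", (url,))
    (product_id,) = conn.execute(
        "SELECT id FROM product WHERE url = ?", (url,)
    ).fetchone()
    last = conn.execute(
        """SELECT price_cents FROM product_price_changes
           WHERE product_id = ? ORDER BY change_time DESC LIMIT 1""",
        (product_id,),
    ).fetchone()
    if last is not None and last[0] == price_cents:
        return False  # unchanged price, nothing to store
    conn.execute(
        "INSERT INTO product_price_changes VALUES (?, ?, ?)",
        (product_id, price_cents, seen_at),
    )
    return True

def price_history(url):
    """All recorded changes for a URL, oldest first."""
    return conn.execute(
        """SELECT ppc.price_cents, ppc.change_time
           FROM product_price_changes ppc
           JOIN product prod ON ppc.product_id = prod.id
           WHERE prod.url = ?
           ORDER BY ppc.change_time""",
        (url,),
    ).fetchall()
```

With 14k products checked daily, this stores at most 14k rows per day, and usually far fewer, since most prices do not change every day.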

Sql creating table without using foreign key

Consider that we have one SQL table, customers.
Now suppose we have a second table with two columns, customer_name and orders_name. One customer may have multiple orders (a one-to-many relationship), so we use customer_name as the foreign key. But if one customer has 100 orders, we have to write the same customer_name 100 times, which is a waste of memory.
customer_name,customer_orders table is
So I was thinking: can't we just make one table per customer, named after the customer? For example, for a customer named bill we could create a table called bill's orders and write all his orders in it. Then we are not using any foreign key.
bill's orders table is
We can create more such tables for other customers. But how would we delete the table when we delete that customer_name from the main table? Any idea?
You solve the issue of wasted space by using surrogate keys. Instead of copying a huge alphanumeric field (names) to child tables, you would create an ID of sorts using a more compact data type (byteint, smallint, int, etc.). In the approach you propose where you create a separate table for each customer, you will run into the following issues:
cannot run aggregates across customers, i.e., you cannot simply do a sum, avg, min, etc. for sets of customers slicing the data different ways
SQL will be far more complex with each extra customer added to the queries
your data dictionary is going to grow huge and at some point you will incur major performance issues that are not easy to fix
The point of using a relational database is to allow for users to dynamically slice and dice the data. The method that you are proposing would not be useful for querying.
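As a small illustration of the surrogate-key layout, here is a sketch in Python using the standard-library sqlite3 module in place of a full SQL server. The table and column names follow the question where possible; customer_id, order_id, the sample rows and the helper names are assumptions. The ON DELETE CASCADE clause also covers the question about removing a customer's orders together with the customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY,  -- compact surrogate key
    customer_name TEXT NOT NULL         -- the name is stored exactly once
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
        REFERENCES customers(customer_id) ON DELETE CASCADE,
    order_name  TEXT NOT NULL
);
INSERT INTO customers VALUES (1, 'bill'), (2, 'anna');
INSERT INTO orders (customer_id, order_name)
    VALUES (1, 'laptop'), (1, 'mouse'), (2, 'desk');
""")

def orders_per_customer():
    """One aggregate over all customers, which per-customer tables cannot do simply."""
    return conn.execute(
        """SELECT c.customer_name, COUNT(o.order_id)
           FROM customers c
           LEFT JOIN orders o ON o.customer_id = c.customer_id
           GROUP BY c.customer_id, c.customer_name
           ORDER BY c.customer_name"""
    ).fetchall()

def delete_customer(name):
    """Deleting the customer row removes that customer's orders via the cascade."""
    conn.execute("DELETE FROM customers WHERE customer_name = ?", (name,))
```

Each order row repeats only a small integer, and no table ever has to be created or dropped when customers come and go.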

Beginner Database architecture

I am converting a spreadsheet to a database, but how do I accommodate multiple values for a field?
This is a database tracking orders with factories.
Import PO# is the unique key. Sometimes one order will have 0, 1, 2, 3, 4 or more customers requiring that we place their price tickets on the product in the factory. Every order is different. What's the proper way to accommodate multiple values in one field?
Generally, having multiple values in a field is bad database design. Maybe a one to many relationship will work in this scenario.
So you will have an Order table with PO# as the primary key,
Then you will have an OrderDetails table with the PO# as a foreign key, i.e. it will not be designated as a primary key.
For each row in the Order table you will have a unique PO# that will not repeat across rows.
In the OrderDetails table you will have a customer per row and, because the PO# is not a primary key, it can repeat across rows. This will allow you to designate multiple customers per order. Therefore each row will have its own PriceTicketsOrdered field so you can know per customer what the price is.
Note that each customer can repeat across rows in the OrderDetails table as long as it's for a different PO# and/or product.
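A minimal sketch of that one-to-many layout, written in Python with the standard-library sqlite3 module so it runs anywhere. The table names orders and order_details (ORDER is a reserved word in SQL), the factory column, the sample PO numbers and the helper customers_for are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    po_number TEXT PRIMARY KEY,  -- the Import PO#, unique per order
    factory   TEXT NOT NULL
);
CREATE TABLE order_details (
    po_number             TEXT    NOT NULL REFERENCES orders(po_number),
    customer              TEXT    NOT NULL,
    price_tickets_ordered INTEGER NOT NULL
);
INSERT INTO orders VALUES ('PO-1001', 'Factory A'), ('PO-1002', 'Factory B');
-- PO-1001 has two customers wanting price tickets, PO-1002 has none.
INSERT INTO order_details VALUES
    ('PO-1001', 'Acme',   500),
    ('PO-1001', 'Globex', 200);
""")

def customers_for(po_number):
    """Return (customer, price_tickets_ordered) rows for one order; may be empty."""
    return conn.execute(
        """SELECT customer, price_tickets_ordered
           FROM order_details
           WHERE po_number = ?
           ORDER BY customer""",
        (po_number,),
    ).fetchall()
```

An order with zero customers simply has no rows in order_details, and an order with five customers has five, so no field ever holds more than one value.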
This is the best I can tell you based on the clarity of your question.
Personally, I normally spend time designing my database on paper or using some drawing software like Visio before I start implementing it in a specific product like MySQL or PostgreSQL.
Reading up on ER diagrams (Entity Relationship diagrams) might help you.
You should also read up on database normalization, probably before anything else.
here is a link that might help:
http://code.tutsplus.com/articles/sql-for-beginners-part-3-database-relationships--net-8561

Table with Non Identifying Keys

My company was using this model to manage the inventory
Model 1 http://img534.imageshack.us/img534/6024/modeltest2.jpg
But I was having problems because this month we bought some plastic bags with a different price and expiration date for the same warehouse.
So now i changed the model to this.
Model 1 http://img16.imageshack.us/img16/8416/modeltest.jpg
My question is whether this is OK. It is working, but it is the first time I have created a table with no primary key.
Example of Data:
PRODUCT      WAREHOUSE  Quantity  Price  Expiration_Date
PLASTIC BAG  NEW YORK   20        1.20$  12-10-2013
PLASTIC BAG  NEW YORK   130       1.50$  21-12-2015
Thanks
Basically it's OK. In sales and warehouse systems you can't save all products with different expiration dates in the Product table, because there would be a lot of records for each product. Usually you save them in an "Item_Ledger_Entry" table that holds all sales and purchase transactions.
You are using the same product, just with a different expiration date. I don't think you need a primary key at all in the Product has warehouse table.
One problem with this, specific to MySQL and InnoDB storage, is that InnoDB will silently create an extra 6-byte integer internally to serve as a surrogate primary key. Also, queries against any InnoDB table are more efficient if you can do them via the primary key. So it's to your advantage to define a primary key (or unique key) if possible.
If the combination of columns Product_ID, Warehouse_ID isn't sufficient to uniquely identify every row, then you could add a third column to distinguish between duplicates. For example, Stock_ID or something.
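Here is a rough sketch of the Stock_ID idea in Python, with the standard-library sqlite3 module standing in for MySQL/InnoDB. The table name stock, the integer-cents price column, the numeric IDs and the helper total_quantity are assumptions. The two rows mirror the sample data from the question, reading its dates as day-month-year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (
    stock_id        INTEGER PRIMARY KEY,  -- surrogate key that tells the lots apart
    product_id      INTEGER NOT NULL,
    warehouse_id    INTEGER NOT NULL,
    quantity        INTEGER NOT NULL,
    price_cents     INTEGER NOT NULL,     -- assumption: prices kept as integer cents
    expiration_date TEXT    NOT NULL      -- ISO date
);
CREATE INDEX idx_stock_lookup ON stock (product_id, warehouse_id);

-- Same product (1 = plastic bag) in the same warehouse (1 = New York), two lots.
INSERT INTO stock (product_id, warehouse_id, quantity, price_cents, expiration_date)
VALUES (1, 1, 20,  120, '2013-10-12'),
       (1, 1, 130, 150, '2015-12-21');
""")

def total_quantity(product_id, warehouse_id):
    """Sum the quantity over every lot of a product in a warehouse."""
    (total,) = conn.execute(
        """SELECT COALESCE(SUM(quantity), 0)
           FROM stock
           WHERE product_id = ? AND warehouse_id = ?""",
        (product_id, warehouse_id),
    ).fetchone()
    return total
```

Every lot keeps its own price and expiration date, each row is addressable by stock_id, and the per-warehouse total is still one query away.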