Data redundancy for order

Data redundancy for order - mysql

Table order_detail:
order_no | product_id
Table product_detail:
prod_id | prod_name | prod_size | prod_type
order_detail .product_id references to product_detail.prod_id
I heard data redundancy is a bad idea, so I inner join the tables to display the complete order details. But the problem is the data inside product_detail can be edit or delete by admin, which means when I inner join the tables, it might return null. Should I store something like JSON example: {size:23,type:MZ} in the order_detail to avoid the data 'loss'?

You will need to break down your table structure.
Having to store data in JSON format in your order_detail table compromises normalization on your database(which is not desirable).
Make product attributes as individual entities.
Product_detail
id | name | some_other_descriptive_columns | deleted_at
Product_Type:
id | type
Product_size:
id | size
Product_type_mapping:(pivot table which signifies many-to-many relationship between products and it's types)
id | product_id | product_type_id | deleted_at
Product_size_mapping:(pivot table which signifies many-to-many relationship between products and it's sizes)
id | product_id | product_size_id | deleted_at
As you might have already noticed, we have an additional column called deleted_at whose datatype will be a timestamp and which is nullable by default in all the tables shown above.
When the admin edits(could also be removal of certain sizes or types) or deletes a product, all we do is make an entry of a timestamp in deleted_at column. In other words, we do a soft delete.
So, when the admin operates on a product data set, fetch all the details whose deleted_at column IS NULL. When doing an inner join to fetch order details, it would not hinder the process of losing any data even if admin deletes a product after a few days after an order was made, because we would fetch all details regardless of what we have in our deleted_at column.

I suggest that you consider using a schema like the one in the diagram. (The arrowheads are the "many" end of the PK-FK relationships.)
This approach maintains referential integrity and avoids the use of nulls by recording deletions in a separate table.
You might also want to add a "ProductDataChangedOnDate" feature that keeps a record of the changes made by that pesky admin person :-)

Related

Why do many to many relation use all the key from previous table?

While using mysql workbench and for designing database using designer the relation tool uses a third table to form a many to many relation between 2 tables.
I have 3 tables
TABLE1
TABLE2
TABLE3
TABLE2 has foregin key from primary key of TABLE1,having a many to one relation
TABLE2 and TABLE3 are related using a many to many relation,
as soon as I create the relation
a new table TABLE3_has_TABLE2 is created with all the key from TABLE2(primarykey of table2 & foreign key of table1) and TABLE3 (primary key of table3).
Now,
why is there foreign key of table1.?
Even if i remove I will be able to query data from table1 and table3 using table2 as intermediate, so is it good to have this kind of relation or avoided?
For Example in below diagram
This is a geographical distribution of location, on right side it shows the hirarchy.
Now,
Table1(Zone) is the primary table i.e Zone
Table2(state) is related to table1 using zone_id
Table3(division) is related to table2(state) using state_id & zone_id of table1(zone)
Question: Should this zone_id column be in the table3 or not?
similarly table4 contains all the previous key columns of table3.

Strictly from a denormalization point-of-view, the DIVISION.STATE_ZONE_ID isn't required.
Since you can get the ZONE_ID from the DIVISION by joining STATE on the state_id.
And it's the same with the division_state_state_id & division_state_zone_id in DISTRICT.
Having the division_division_id is enough to join DIVISION, then STATE, then ZONE.
However, what if you would remove those 'extra' fields?
Then a SQL always needs to go through that cascade of joined tables to get the ZONE.zone_name.
So there's an advantage that by having those 'extra' fields, it becomes possible to JOIN directly to the ZONE table. Which can simplify/speed up certain popular queries.
The disadvantage is that it becomes harder to assure referential integrity.
Because for example, you could assign a different zone_id to a DIVISION.state_zone_id than the STATE.zone_id you can get via DIVISION.state_state_id.

It is best practice in relational models to avoid many-to-many relationships. Workbench usually compensates for user trying to do that as you have seen.
Let us use an example (or check the tl;dr), where there are two identified entities; buyers and hardware items. Some people buy 1 item, others buy more than one. The thing is, that same item can be bought by many people. So the buyer table has Mr. A buying nails. Simple enough to record in one row. But lo' and behold, he ups and gets another item! How do we show that he buys another item?
One way is by adding another attribute to the table (say "item_number_two"). But then he gets another! We can't keep going adding attributes like that. Databases were designed more for vertical addition of records, rather than horizontal addition of attributes (to give a visual picture). There is a longer explanation but you should read up, or probably might figure it out after reading this.
Another way is to re-enter a record for Mr. A and then put the ID of another item in that column, showing that he bought two items (not really "he" from a database stand-point, it's two different people!).
A better method would be to create a table that consists of the unique identifiers found in the original tables (just one per table may be necessary). This is called an intermediary table. The original tables themselves do not have foreign keys from the other table.
This is where the concept of a composite key comes in. It means that two or more candidate keys are used to uniquely identify a record rather than just one. This is how it works:
Person Table:
| person_ID | person_Name |
| P0001 | Mr. A |
| P0002 | Mr. B |
| P0003 | Mr. C |
| P0004 | Mr. D |
Cat Table
| item_ID | cat_Name |
| I0001 | Nails |
| I0002 | Screws |
| I0003 | Hammers |
| I0004 | Power-Saw |
Intermediary table
| person_ID | item_ID |
| P0001 | I0001 |
| P0001 | I0002 |
| P0001 | I0003 | //Shows that person 1 bought more than one item
| P0002 | I0004 |
| P0002 | I0001 | //Shows that an item has been bought by more that one person
So this new table matches a record of one table(through the use of a primary key) to a record of another. The only thing that will ever be repeated is one of the two ID's. A unique record is made as long as no two combinations are repeated.
tl;dr - Having tables mapped in a many to many relationship inevitably wastes space in the DB when entering records, as new records of the same data have to be made to show a small difference (adding no real value in proportion to the space). Another issue is that it causes more calculations than necessary when a query is made, wasting time and space. Or the results returned may just be plain wrong...
EDIT:
If you have tables A and B having a many-to-many relationship, do the following as an alternative. Create a table C. Take the primary keys from table A and B and place them in tables C. In table C they both exist as primary and foreign keys. This would mean the following relationship is created.
| Table A |-----------<| Table C |>------------|Table B|
Table A and B are linked through C.
Sample query:
SELECT C.itemID FROM A, C WHERE A.personID = P0001 AND A.personID = C.personID;
This query will return all ID's of the items bought by the person with an ID of P0001. Records must match the condition of having a personID of P0001, but the record selected must have that matching ID in Table C (the intermediary table). An extended query could be to take the item names from the Table B. Each attribute in C has a recorded value that corresponds to a value of a key in either Table A or B, meaning that a query can be run to pull other info, where the value in Table C is = to the values in Table A/B (depending on which one you want).

"Cascade" foreign keys through multiple tables?

Suppose I have 3 database tables: Countries, Provinces and Cities.
Countries has id (PK) and name.
Provinces has id (PK), name and country_id (FK).
Cities has id (PK), name and province_id (FK).
My question is: would be good to have country_id too as FK in the Cities table? I mean, this Country_id would be Provinces table's FK (Pronvices_Countries_id), not Countries PK directly. My mate says it is better for performance. But having FK's of all previous tables may be tedious when you have a lot of tables. For example, having 8 tables in relation, the last one could have 8+ FK's instead of the last table PK as FK.
Countries table:
+----+-----------+
| id | name |
+----+-----------+
| 1 | France |
+----+-----------+
Provinces table:
+----+-----------+------------+
| id | name | country_id |
+----+-----------+------------+
| 1 | Languedoc | 1 |
+----+-----------+------------+
Cities table:
+----+-----------+-------------+---------------------+
| id | name | province_id | province_country_id |
+----+-----------+-------------+---------------------+
| 1 | Toulouse | 1 | 1 |
+----+-----------+-------------+---------------------+
Can I have an explanation about this?
EDIT: Maybe the answer can be about identifying and non-identifying relationships? (I don't know.)

You should keep the structure as simple as possible until it actually affects performance.
A big effort is made at the engine level to make use of indexes when making queries, so a best bet would be to create indexes including all the fields you'll filter by.
Otherwise, countries, provinces and cities look like small tables, so the total performance impact wouldn't be noticeable anyway.

This makes your inserts and updates more complicated and updates should definitely be handled with a trigger on all the associated tables. That way if the country of the province changes, the country of cities in that province would also change automatically. You would also have to be careful that the source of the countryid value is consistent between the insert to province and the insert to city. You really have to be careful to maintain data integrity.
However there is a business case where it makes sense to do this.
That is:
when the id is unlikely to change frequently (so little extra work normally on update would happen)
when a significant portion of the queries against the child tables do
not need any information from the intermediate tables
and preferably when the descriptor information in the parent table is
not usually needed.
For instance, we do this for clientid as the client never changes in our system. almost all queries of subordinate tables need to be filtered by client but the client name is rarely needed in the end query. By cutting out several layers of joins to things we don't need in most queries, denormalizing clientid made sense.
However, in your case, while the first condition is likely met, are you really going to filter by country and not need province or city? I would find this a less likely scenario. Of course I am not familiar with how your data is used; it could be I am mistaken in this. I think in this case teh risk to data integrity woudl be higher than teh gain by doing this.

Should I use a single table for many categorized rows?

I want to implement some user event tracking in my website for statistics etc.
I thought about creating a table called tracking_events that will contain the following fields:
| id (int, primart) |
| event_type (int) |
| user_id (int) |
| date_happened (timestamp)|
this table will contain a large amount of rows (let's assume at least every page view is a tracked event and there are 1,000 daily visitors to the site).
Is it a good practice to create this table with the event_type field to differentiate between essentially different, yet identically structured rows?
or will it be a better idea to make a separate table for each type? e.g.:
table pageview_events
| id (int, primart) |
| user_id (int) |
| date_happened (timestamp)|
table share_events
| id (int, primart) |
| user_id (int) |
| date_happened (timestamp)|
and so on for 5-10 tables.
(the main concern is performance when selecting rows WHERE event_type = ...)
Thanks.

It really depends. If you need to have them separated, because you will only be querying them separately, then splitting them into two tables should be fine. That saves you from having to store an extra discriminator column.
BUT... if you need to query these sets together, as if they were a single table, it would be much easier to have them stored together, with a discriminator column.
As far as WHERE event_type=, if there are only two distinct values, with a pretty even distribution, then an index on just that column isn't going to help much. Including that column as the leading column in a multicolumn index(es) is probably the way to go, if a large number of your queries will include an equality predicate on that column.
Obviously, if these tables are going to be "large", then you'll want them indexed appropriately for your queries.

Does it make sense to have three primary keys, two of which are foreign keys, in one table?

I've created a database with three tables in it:
Restaurant
restaurant_id (autoincrement, PK)
Owner
owner_id (autoincrement, PK)
restaurant_id (FK to Restaurant)
Deal
deal_id (autoincrement)
owner_id (FK to Owner)
restaurant_id (FK to Restaurant)
(PK: deal_id, owner_id, restaurant_id)
There can be many owners for each restaurant. I chose two foreign keys for Deal so I can reference the deal by either the owner or the restaurant. The deal table would have three primary keys, two being foreign keys. And it would have two one-to-many relationships pointing to it. All of my foreign keys are primary keys and I don't know if I'll regret doing it like this later on down the road. Does this design make sense, and seem good for what I'm trying to achieve?
Edit: What I really need to be able to accomplish here is when a owner is logged in and viewing their account, I want them to be able to see and edit all the deals that are associated with that particular restaurant. And because there can be more that one owner per restaurant, I need to be able to perform a query something like: select *from deals where restaurant_id = restaurant_id. In other words, if I'm an owner and I'm logged in, I need to be able to make query: get all of the deal that are related to not just me, the owner, but to all of the owners associated with this restaurant.

You're having some trouble with terminology.
A table can only ever have a one primary key. It is not possible to create a table with two different primary keys. You can create a table with two different unique indexes (which are much like a primary key) but only one primary key can exist.
What you're asking about is whether you should have a composite or compound primary key; a primary key using more than one column.
Your design is okay, but as written you probably have no need for the column deal_id. It seems to me that restaurant_id and owner_id together are enough to uniquely identify a row in Deal. (This may not be true if one owner can have two different ownership stakes in a single restaurant as the result of recapitalization or buying out another owner, but you don't mention anything like that in your problem statement).
In this case, deal_id is largely wasted storage. There might be an argument to be made for using the deal_id column if you have many tables that have foreign keys pointing to Deal, or if you have instances in which you want to display to the user Deals for multiple restaurants and owners at the same time.
If one of those arguments sways you to adopt the deal_id column, then it, and only it, should be the primary key. There would be nothing added by including the other two columns since the autoincrement value itself would be unique.

If u have a unique field, this should be the PK, that would be the incremented field.
In this specific case it gives u nothing at all to add more fields to this key, it actually somewhat impacts performance (don't ask me how much, u bench it).

if you want to create 2 foreign keys in the deal table which are the restaurant and the owner the logic is something like a table could exist in the deal even without an owner or an owner could exist in the deal even without identifying the table on it but you could still identify the table because it's being used as a foreign key on the owner table, but if your going to put values on each columns that you defined as foreign key then I think it's going to be redundant cause I'm not sure how you would use the deal table later on but by it's name I think it speaks like it would be used to identify if a restaurant table is being reserved or not by a customer and to see how you have designed your database you could already identify the table which they have reserved even without specifying the table as foreign key in the deal table cause by the use of the owner table you would able to identify which table they have reserved already since you use it as foreign key on the owner table you just really have to be wise on defining relationships between your tables and avoid redundancy as much as possible. :)

I think it is not best.
First of all, the Deal table PK should be the deal_id. There is no reason to add additional columns to it--and if you did want to refer to the deal_id in another table, you'd have to include the restaurant_id and owner_id which is not good. Whether deal_id should also be the clustered index (a.k.a. index organized on this column) depends on the data access pattern. Will your database be full of data_id values most often used for lookup, or will you primarily be looking deals up by owner_id or restaurant_id?
Also, using two separate FKs way the you have described it (as far as I can tell!) would allow a deal to have an owner and restaurant combination that are not a valid (combining an owner that does not belong to that restaurant). In the Deal table, instead of one FK to Owner and one FK to Restaurant, if you must have both columns, there should be a composite FK to only the Owner table on (OwnerID, RestaurantID) with a corresponding unique key in the Owner table to allow this link up.
However, with such a simple table structure I don't really see the problem in leaving RestaurantID out of the Deal table, since the OwnerID always fully implies the RestaurantID. Obviously your deals cannot be linked only with the restaurant, because that would imply a 1:M relationship on Deal:Owner. The cost of searching based on Restaurant through the Owner table shouldn't really be that bad.

Its not wrong, it works. But, its not recommended.
Autoincrement Primary Keys works without Foreign Keys (or Master Keys)
In some databases, you cannot use several fields as a single primary key.
Compound Primary Keys or Compose Primary Keys are more difficult to handle in a query.
Compound Primary Key Query Example:
SELECT
D.*
FROM
Restaurant AS R,
Owner AS O,
Deal AS D
WHERE
(1=1) AND
(D.RestaurantKey = D.RestaurantKey) AND
(D.OwnerKey = D.OwnerKey)
Versus
Single Primary Key Query Example:
SELECT
D.*
FROM
Restaurant AS R,
Owner AS O,
Deal AS D
WHERE
(D.OwnerKey = O.OwnerKey)
Sometimes, you have to change the value of foreign key of a record, to another record. For Example, your customers already order, the deal record is registered, and they decide to change from one restaurant table to another. So, the data must be updated, in the "Owner", and "Deal" tables.
+-----------+-------------+
| OwnerKey | OwnerName |
+-----------+-------------+
| 1 | Anne Smith |
+-----------+-------------+
| 2 | John Connor |
+-----------+-------------+
| 3 | Mike Doe |
+-----------+-------------+
+-----------+-------------+-------------+
| OwnerKey | DealKey | Food |
+-----------+-------------+-------------+
| 1 | 1 | Hamburguer |
+-----------+-------------+-------------+
| 2 | 2 | Hot-Dog |
+-----------+-------------+-------------+
| 3 | 3 | Hamburguer |
+-----------+-------------+-------------+
| 1 | 3 | Soda |
+-----------+-------------+-------------+
| 2 | 1 | Apple Pie |
+-----------+-------------+-------------+
| 3 | 3 | Chips |
+-----------+-------------+-------------+
If you use compound primary keys, you have to create a new record for "Owner", and new records for "Deals", copy the other fields, and delete the previous records.
If you use single keys, you just have to change the foreign key of Table, without inserting or deleting new records.
Cheers.

Database table design - are my fields correct?

I need to sell items on my fictitious website and as such have come up with a couple of tables and was wondering if anyone could let me know if this is plausible and if not where i might be able to change things?
I am thinking along the lines of;
Products table : ID, Name, Cost, mediaType(FK)
Media: Id, Name(book, cd, dvd etc)
What is confusing me is that a user might have / own many products, but how would you store an array of product id's in a single column?
Thanks

You could something like store a JSON array in a text or varchar field and let the application handle parsing it.
MySQL doesn't have a native array type, unlike say PostgreSQL, but in general I find if you're trying to store an array you're probably doing something wrong. Of course every rule has its exceptions.
What your probably want is a user table and then a table that correlates products to users. If a product is only going to relate to one user then you can add a user ID column to your Products table. If not, then you'll want another lookup table which handles the many to many relationship. It would look something like this:
------------------------
| user_id | product_id |
------------------------
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 3 | 1 |
| 3 | 5 |
------------------------

I think one way of storing all the products which user has in one column is to store it as a string where product ids are separated by some delimiters like comma. Though this is not the way you want to solve. The best way to solve this problem would be to have a seperate user table and than have a user product table where you associate userid with product id. You could than simple use a simple query to get list of all the products owned by a particular userid

As a starting point, try to think of the system in terms of the major parts - you would have a 'warehouse', so you need a table to list the products you have, and you are going to possibly have users who register their details with you for regular visits - so an account per user. You would generally hold all details of a single product in the same row of the same table (unless you have a really complex product to detail, but not likely). If you're going to keep track of products bought per user account, there's always the option of keeping the order history as a delimited list in a large text field. For example: date,id,id,id,id;date,id,id. Or you could simply refer to order numbers and have a separate table for orders placed [by any customer].

What is confusing me is that a user might have / own many products, but how would you store an array of product id's in a single column?
This is called a "many-to-many" relationship. In essence you would have a table for users, a table for products, and a table to map them like this:
[table] Users
- id
- name
[table] Products
- id
- name
- price
[table] Users_Products
- user_id
- product_id
Then when you want to know what products a user has, you could perform a query like:
SELECT product_id FROM Users_Products WHERE user_id=23;
Of course, user id 23 is fictituous for examples sake. The resulting recordset would contain the id's of all the products the user owns.

You wouldn't store an array of things into a single column. In fact you usually wouldn't store them in separate columns either.
You need to step away from design for a bit and go investigate third normal form. That should be you starting point and, in the vast majority of cases, your ending point for designing database schemas.
The correct way of handling variable size "arrays" is with two tables with a many to one relationship, something like:
Users
User ID (primary key)
Name
Other user info
Objects:
Object Id (primary key)
User id (foreign key, references Users(User id)
Other object info
That's the simplest form where one object is tied to a specific user, but a specific user may have any number of objects.
Where an object can be "owned" by multiple users (I say an object meaning (for example) the book "Death of a Salesman", but obviously each user has their own copy of an object), you use a joining table:
Users
User ID (primary key)
Name
Other user info
Objects:
Object Id (primary key)
User id (foreign key, references Users(User id))
Other object info
UserObjects:
User id (foreign key, references Users(User id))
Object id (foreign key, references Objects(Object id))
Count
primary key (User id, Object id)
Similarly, you can handle one or more by adding an object id to the Users table.
But, until you've nutted out the simplest form and understand 3NF, they won't generally matter to you.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008