Refactoring an old products table with many different features - MySQL

I need to refactor an old MySQL database of products, which is divided into tables that store the characteristics of different types of products.
That is to say, not all products have the same number of characteristics, and some of them influence the price of the product while others do not. So it isn't clear to me whether I need a pivot table to manage the relations between products and their features.
Think of a hardware store that stocks everything from screws to tools, along with materials such as iron and wood.
For example, under the name "screw" we have different sizes and types (for wood, for metal, millimeter thread or self-drilling). Each size and type combination determines a price, but the color of the screws may not affect the price.
So I was thinking of the following table structure:
products:
id | name_id | feature_id | price
product_names:
id | name
product_features:
id | name | description
But it is not clear to me how to deal with the situation where a product may have more than one feature and not all of them are relevant to its price.
Thank you for any suggestion.
Based on the exchange of opinions, I arrived at this concept of a diagram. I would like to know if you see it as appropriate.

Related

Bill of Materials: One table for everything, or a table for each sub-level?

I am working with a client in manufacturing whose products are configurations of the same bunch of parts. I am creating a database that holds all valid products and their Bill of Materials. I need help on deciding a Bill Of Material schedule to implement.
The obvious solution is a many-to-many relationship with a junction table:
Table 1: Products
Table 2: Parts
Junction Table: products, parts, part quantities
However, there are multiple levels in my client's product;
-Assembly
-Sub-Assembly
-Component
-Part
and items from lower levels are allowed to be associated with any upper level item;
Assembly |Sub-assembly
Assembly |Component
Assembly |Part
Sub-Assembly |Component
Sub-Assembly |Part
Component |Part
and I suspect the client will want to add more levels in the future when new product lines are added.
Correct me if I am wrong, but I believe the above relation schedule would demand a growing integer sequence of junction tables and queries (0+1+1+2+3...) to display and export the full Bill of Materials which may eventually affect performance.
Someone suggested to put everything in one table:
Table 1: Assemblies, sub-assemblies, components, parts, etc...
Junction table: Children and Parents
This only requires one junction table to create infinite levels of many-to-many relationships. I don't know if I trust this solution, but I can't think of any issues other than that it sounds disorganized and that an item could accidentally be made its own parent, creating an infinite loop.
I lack the experience to determine whether either or neither of these models will work for my client. I am sketching these models in MS Access, but I am open to moving this project to a more powerful platform if necessary. Any input is appreciated. Thank you.
-M
What you are describing is a hierarchy. As such it should take the form:
part_hierarchy:
part_id | parent_part_id | other | attributes | of | this | relationship
So part_id 1 may have a parent part_id 10 "component", which may in turn have a parent_part_id (when looked up itself in this table) of 12 "Assembly". It would look like:
part_id | parent_part_id
1 | 10
10 | 12
and parts table:
part_id | description
1 | widget
10 | widget component
12 | aircraft carrier
That's a little simplified since it doesn't take into account your product/part relationship, but it will all fit together using this methodology.
Nice and simple. Now it doesn't matter how deep the hierarchy goes. It's still just two columns (and any extra columns needed for attributes of this relationship, like create_date, last_changed_by_user, etc.).
I would suggest something more powerful than Access though, since it lacks the ability to pick apart a hierarchy using a recursive CTE, something that comes with SQL Server, Postgres, Oracle, and the like.
I would 100% avoid any schema that requires you to add more fields or tables as the hierarchy becomes deeper and more complex. That is a path that leads towards pain and regret.
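To make the recursive CTE idea concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a server database (SQLite also supports WITH RECURSIVE). The table and column names follow the example above; the CHECK constraint and the descendants helper are my own assumptions, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts (
        part_id     INTEGER PRIMARY KEY,
        description TEXT NOT NULL
    );
    CREATE TABLE part_hierarchy (
        part_id        INTEGER NOT NULL REFERENCES parts(part_id),
        parent_part_id INTEGER NOT NULL REFERENCES parts(part_id),
        PRIMARY KEY (part_id, parent_part_id),
        CHECK (part_id <> parent_part_id)  -- assumed guard: no part is its own parent
    );
    INSERT INTO parts VALUES
        (1, 'widget'), (10, 'widget component'), (12, 'aircraft carrier');
    INSERT INTO part_hierarchy VALUES (1, 10), (10, 12);
""")


def descendants(conn, root_id):
    """Return (part_id, description, depth) for every part below root_id."""
    return conn.execute("""
        WITH RECURSIVE tree(part_id, depth) AS (
            -- anchor: direct children of the root
            SELECT part_id, 1 FROM part_hierarchy WHERE parent_part_id = ?
            UNION ALL
            -- recursive step: children of parts already found
            SELECT h.part_id, tree.depth + 1
            FROM part_hierarchy h
            JOIN tree ON h.parent_part_id = tree.part_id
        )
        SELECT tree.part_id, p.description, tree.depth
        FROM tree
        JOIN parts p ON p.part_id = tree.part_id
        ORDER BY tree.depth, tree.part_id
    """, (root_id,)).fetchall()


bom = descendants(conn, 12)
for part_id, description, depth in bom:
    print("  " * depth + f"{part_id}: {description}")
```

The CHECK only rules out a part being its own direct parent. Longer cycles (A under B under A) would still need to be prevented in application code or with a trigger, otherwise the recursive query would not terminate.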
Since the level of nesting is arbitrary, use one table with a self-referencing parent_id foreign key.
While this is technically correct, navigating it requires a recursive query, which some databases don't support (MySQL, for example, only gained recursive CTEs in version 8.0). A simple and effective alternative is to store a "path" to each component, which looks like a path in a file system.
For example, say part id 1 is a top level part that has a child whose id is 2, and part id 2 has a child part with id 3, the paths would be:
id parent_id path
1 null /1
2 1 /1/2
3 2 /1/2/3
Doing this means finding the tree of subparts for any part is simply:
select b.id, b.path
from parts a
join parts b on b.path like concat(a.path, '/%')  -- the '/' stops /1 from also matching /10
where a.id = ?
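As a rough, runnable sketch of the path approach, the following uses Python's sqlite3 module. The add_part and subparts helpers are hypothetical names, and SQLite's || operator stands in for MySQL's concat(). One detail worth noting: the prefix is matched with a trailing '/' so that the subtree of /1 does not pick up an unrelated part such as /10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE parts (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parts(id),
        path      TEXT NOT NULL
    )
""")


def add_part(conn, part_id, parent_id=None):
    """Insert a part, deriving its path from the parent's path."""
    parent_path = ""
    if parent_id is not None:
        parent_path = conn.execute(
            "SELECT path FROM parts WHERE id = ?", (parent_id,)
        ).fetchone()[0]
    conn.execute(
        "INSERT INTO parts (id, parent_id, path) VALUES (?, ?, ?)",
        (part_id, parent_id, f"{parent_path}/{part_id}"),
    )


def subparts(conn, part_id):
    """Ids of all parts below part_id, found with a single prefix match."""
    rows = conn.execute("""
        SELECT b.id
        FROM parts a
        JOIN parts b ON b.path LIKE a.path || '/%'
        WHERE a.id = ?
        ORDER BY b.path
    """, (part_id,)).fetchall()
    return [row[0] for row in rows]


# Two separate trees: 1 > 2 > 3 and 10 > 11.
for part_id, parent_id in [(1, None), (2, 1), (3, 2), (10, None), (11, 10)]:
    add_part(conn, part_id, parent_id)

under_1 = subparts(conn, 1)
under_10 = subparts(conn, 10)
print(under_1, under_10)
```

The trade-off is that moving a part to a new parent means rewriting the path of every part beneath it.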

Database design for ecommerce site with many product categories

I have a requirement to design a database for an ecommerce app with a vast range of product categories, from pins to planes. Products have different kinds of features. For example, a mobile phone has specific features like memory, camera megapixels, screen size etc., whilst a house has land size, number of storeys and rooms, garage size etc. Such specific features go on and on, as many as we have products. Whilst all products have some common features, most features are very different and specific to each product. So designing the database has gotten a bit confusing. I'm doing this for the first time.
My query is about database design. Here is what I'm planning to do:
Create a master table with all fields, that tells if a field is common or specific and map them with respective category of the product. All products will have "common" fields but "specific" will be shown only for one category.
table: ALL_COLUMNS
columns:
id,
name,
type(common or specific),
category(phone, car, laptop etc.)
Fetch respective fields from all_columns table while showing the fields on the front.
Store the user data in another table along with mapped fields
table: ALL_USER_DATA
columns:
id,
columnid,
value
I don't know what the right way is or how established apps and sites do it. So I'd appreciate it if someone could tell me whether this is the right database architecture for an ecommerce app with a highly comprehensive and sparse set of categories and features.
Thank you all.
There are many possible answers to this question - see the "related" questions alongside this one.
The design for your ALL_USER_DATA table is commonly known as "entity/attribute/value" (EAV). It's widely considered horrible (search SO for why) - it's theoretically flexible, but imagine finding "airplanes made by Boeing with a wingspan of at least 20 metres suitable for pilots with a new qualification" - your queries become almost unintelligible really fast.
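To illustrate how quickly EAV queries grow, here is a small sketch using Python's sqlite3 module. It follows the ALL_COLUMNS / ALL_USER_DATA layout from the question, with one assumption: a product_id column is added to ALL_USER_DATA, since without it the values cannot be tied to a product. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_columns (
        id INTEGER PRIMARY KEY, name TEXT, type TEXT, category TEXT
    );
    -- product_id is an assumed extra column (not in the original design).
    CREATE TABLE all_user_data (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL,
        columnid   INTEGER NOT NULL REFERENCES all_columns(id),
        value      TEXT
    );
    INSERT INTO all_columns VALUES
        (1, 'manufacturer', 'specific', 'airplane'),
        (2, 'wingspan_m',   'specific', 'airplane');
    INSERT INTO all_user_data (product_id, columnid, value) VALUES
        (100, 1, 'Boeing'), (100, 2, '35.8'),
        (101, 1, 'Boeing'), (101, 2, '11.0'),
        (102, 1, 'Cessna'), (102, 2, '25.0');
""")

# Each attribute in the filter costs two more joins, and numeric
# comparisons need a cast because every value is stored as text.
boeing_wide = [row[0] for row in conn.execute("""
    SELECT m.product_id
    FROM all_user_data m
    JOIN all_columns mc ON mc.id = m.columnid AND mc.name = 'manufacturer'
    JOIN all_user_data w ON w.product_id = m.product_id
    JOIN all_columns wc ON wc.id = w.columnid AND wc.name = 'wingspan_m'
    WHERE m.value = 'Boeing'
      AND CAST(w.value AS REAL) >= 20
    ORDER BY m.product_id
""")]
print(boeing_wide)
```

That is only two of the conditions in the example above; adding the pilot qualification would mean yet another pair of joins.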
The alternative is to create a schema that can store polymorphic data types - again, look on Stack Overflow for how that might work.
The simple answer is that the relational model is not a good fit for this - you don't want to make a schema change for each new product type your store uses, and you don't want to have hundreds of different tables/columns.
My recommendation is to store the core, common information, and all the relationships in SQL, and to store the extended information as XML or JSON. MySQL is pretty good at querying JSON, and it's a native data type.
Your data model would be something like:
Categories
---------
category_id
parent_category_id
name
Products
--------
product_id
price
valid_for_sale
added_date
extended_properties (JSON/XML)
Category_products
-----------------
category_id
product_id
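A minimal sketch of this model, using Python's sqlite3 module in place of MySQL so it runs anywhere. It assumes the SQLite build includes the JSON functions (most current builds do); in MySQL the equivalent would be JSON_EXTRACT or the -> operator on a JSON column. The property names (memory_gb, storeys and so on) and the sample rows are invented for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (
        category_id        INTEGER PRIMARY KEY,
        parent_category_id INTEGER REFERENCES categories(category_id),
        name               TEXT NOT NULL
    );
    CREATE TABLE products (
        product_id          INTEGER PRIMARY KEY,
        price               NUMERIC NOT NULL,
        valid_for_sale      INTEGER NOT NULL DEFAULT 1,
        added_date          TEXT NOT NULL,
        extended_properties TEXT NOT NULL  -- JSON document stored as text
    );
    CREATE TABLE category_products (
        category_id INTEGER NOT NULL REFERENCES categories(category_id),
        product_id  INTEGER NOT NULL REFERENCES products(product_id),
        PRIMARY KEY (category_id, product_id)
    );
    INSERT INTO categories VALUES (1, NULL, 'phone'), (2, NULL, 'house');
""")

conn.executemany(
    "INSERT INTO products VALUES (?, ?, 1, ?, ?)",
    [
        (1, 699, "2024-01-01", json.dumps({"memory_gb": 128, "screen_in": 6.1})),
        (2, 399, "2024-01-02", json.dumps({"memory_gb": 64, "screen_in": 5.8})),
        (3, 250000, "2024-01-03", json.dumps({"storeys": 2, "rooms": 5})),
    ],
)
conn.executemany(
    "INSERT INTO category_products VALUES (?, ?)", [(1, 1), (1, 2), (2, 3)]
)

# Common fields are ordinary columns; category-specific ones live in the JSON.
big_phones = [row[0] for row in conn.execute("""
    SELECT p.product_id
    FROM products p
    JOIN category_products cp ON cp.product_id = p.product_id
    JOIN categories c ON c.category_id = cp.category_id
    WHERE c.name = 'phone'
      AND json_extract(p.extended_properties, '$.memory_gb') >= 128
    ORDER BY p.product_id
""")]
print(big_phones)
```

Adding a new product type then needs no schema change, only new keys in extended_properties.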

multiple foreign key ERD

I have a question about using the same FK more than once in a schema. Here is the question:
|=======================================|
| Book |
|=======================================|
| Book_ID (PK)| Cover_Paper | Page_Paper|
|-------------|-------------|-----------|
|====================================|
| Paper |
|====================================|
| Paper_ID (PK)| Paper_Type | weight |
|--------------|------------|--------|
Let's say I have different types of paper with different weights, used to print the cover and the pages.
So I need to put Paper_ID as an FK into the Book table. The problem is that it seems wrong to have a different column name for the FK. But if I change the table to use the same column name, it looks weird.
|==========================================|
| Book |
|==========================================|
| Book_ID (PK)| Paper_ID(FK) | Paper_ID(FK)|
|-------------|--------------|-------------|
Any help on this problem??
It's not wrong to have column names that differ from the domain name of the column. In fact, it is often necessary.
The alternative - having two columns with the same name - is bad. How would you know which column indicated cover paper and which page paper? By position? This ties the meaning of the content to the physical representation of the data. What happens if I select Book_ID and just one of the Paper_ID columns? One wouldn't know, without additional external information, what the data means. Rather, that additional information should be part of the representation, so that it's as self-descriptive as possible.
In relations where each role is filled by a unique domain, it's easy enough to just use the name of the domain as the name of the role without confusion. If a book consisted of a single type of paper, talking about the book's paper makes sense. Same for a bicycle's seat and a person's nose.
However, when a relation has more than one of the same kind of thing, we need to indicate each thing's role. Distinguishing Cover_Paper and Page_Paper like you did is the right way to do it. (It's too bad SQL DBMSs don't have separate role and domain names for each column, but I digress.)
You could call them Cover_Paper_ID and Page_Paper_ID; it's sort of an industry convention to attach ID to surrogate identifier columns, though I think it reads better without. In other relations, it's often sufficient to write just the role without the domain - e.g. in a Marriage we might have columns for Husband and Wife, instead of writing Husband_Person and Wife_Person.
Both Edgar Codd (author of A Relational Model of Data for Large Shared Data Banks) and Peter Chen (author of The Entity-Relationship Model - Toward a Unified View of Data) discuss roles in their papers. I highly recommend studying both, especially since very few online resources ever mention the topic.
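As a small sketch of role-named foreign keys, the following uses Python's sqlite3 module. Both columns reference the same Paper table, and the query joins Paper twice under different aliases. The sample rows and weight values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE paper (
        paper_id   INTEGER PRIMARY KEY,
        paper_type TEXT NOT NULL,
        weight     INTEGER NOT NULL
    );
    CREATE TABLE book (
        book_id     INTEGER PRIMARY KEY,
        cover_paper INTEGER NOT NULL REFERENCES paper(paper_id),  -- role: cover
        page_paper  INTEGER NOT NULL REFERENCES paper(paper_id)   -- role: pages
    );
    INSERT INTO paper VALUES (1, 'glossy card', 300), (2, 'offset', 80);
    INSERT INTO book VALUES (1, 1, 2);
""")

# Join the same table once per role; the aliases carry the role names.
book_row = conn.execute("""
    SELECT b.book_id,
           cover.paper_type, cover.weight,
           page.paper_type,  page.weight
    FROM book b
    JOIN paper AS cover ON cover.paper_id = b.cover_paper
    JOIN paper AS page  ON page.paper_id  = b.page_paper
    WHERE b.book_id = 1
""").fetchone()
print(book_row)
```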

Need help designing ERD for food bank

This is my first project outside of school so I'm rusty and lacking practice.
I want to create a database but I'm not sure if I'm doing ok so far. I need a little help with the reports table.
The situation:
The food bank has multiple agencies who hand out food
Each agency has to submit a report with how many families/people served from a long list of zip codes.
I thought of putting a fk on Report to Zips table. But will that make selecting multiple zips impossible?
Maybe I'm way off base. Does someone have a suggestion for me?
A better name for this table might be FoodService or something. I imagine the kind of reports you really want to end up with are not just a single row in this table, so naming it Report is a bit confusing.
In any case, each Report is the unique combination of {Agency ID, ZIP code, Date} and of course Agency ID is a foreign key. The other columns in this table would be number of families and people served, as you've indicated. This means you'll have rows for each agency-ZIP-date combination like this:
Agency | ZIP | Date | FamiliesServed | PeopleServed
Agency A | 12345 | Jan-12 | 100 | 245
Agency A | 12340 | Jan-12 | 20 | 31
Agency B | 12345 | Jan-12 | 80 | 178
Agency B | 12340 | Jan-12 | 0 | 0
Are these totals also broken down by "program"? If so, program needs to be part of the primary key for this table. Otherwise, program doesn't belong here.
Finally, unless you're going to start storing data about the ZIP codes themselves, you don't need a table for ZIP codes.
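The table described above could be sketched like this with Python's sqlite3 module. The table and column names (agency, food_service, report_date) are assumptions, and the date is kept as the plain label from the example rows; a real schema would likely use a proper date type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE agency (
        agency_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE food_service (
        agency_id       INTEGER NOT NULL REFERENCES agency(agency_id),
        zip             TEXT    NOT NULL,  -- text keeps leading zeros
        report_date     TEXT    NOT NULL,
        families_served INTEGER NOT NULL,
        people_served   INTEGER NOT NULL,
        PRIMARY KEY (agency_id, zip, report_date)
    );
    INSERT INTO agency VALUES (1, 'Agency A'), (2, 'Agency B');
    INSERT INTO food_service VALUES
        (1, '12345', 'Jan-12', 100, 245),
        (1, '12340', 'Jan-12', 20, 31),
        (2, '12345', 'Jan-12', 80, 178),
        (2, '12340', 'Jan-12', 0, 0);
""")

# The actual "report" is a query over these rows, e.g. totals per ZIP code.
totals_by_zip = conn.execute("""
    SELECT zip, SUM(families_served), SUM(people_served)
    FROM food_service
    WHERE report_date = 'Jan-12'
    GROUP BY zip
    ORDER BY zip
""").fetchall()
print(totals_by_zip)
```

Because the ZIP code is just a column in the key, an agency can report on as many ZIP codes as it likes by adding one row per ZIP.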
Usually having orphan tables like "Food" is a sign something's missing. If there's that much data involved, you'd think it would link in to the order model in some capacity, or at the very least you'd have some kind of indication as to which agency stocks which kind of food.
What's curiously absent is how data like "families-served" is computed from this schema. There doesn't seem to be a source for this information, not even a "family served" record, or a spot for daily or weekly summaries to be put in and totalled.
A "Zips" table is only relevant if there is additional data that might be linked in by zip code. If you have a lat/long database or demographic data this would make sense. Having an actual foreign key is somewhat heavy handed, though. What if you don't know the zip? What if, for whatever reason, the zip is outside of the USA? How will you handle five and nine digit zip codes?
Since zips are not created by the user, the zips table is mostly auxiliary information that may or may not be referenced. This is a good candidate for an isolated "reference" table.
Remember that the structure of a diagram like this is largely influenced by the front-end of the application. If users are adding orders for food items, that translates into relationships between all three things. If agencies are producing reports based on daily activity logs, then once again you need relationships between those three entities.
The front end is usually based on use-cases, so be sure you're accommodating all of those that are relevant.

Find out the Product Sold

I have a query regarding a table structure. We are using a single SQL Server 2008 database for two online selling websites, i.e. the products which the two websites use are the same, but the descriptions of the products are different. For example, we sell a "Toy" of the same price and model on both websites but with a different description. At present I use two different ids for the websites, say id "1" for website 1 and id "2" for website 2. I have also populated the Product table with different ids for the same product, along with the description and website id.
Now the problem is that I need to find out how many "Toy" items have been sold on both websites together.
Can anyone help me out? Should I introduce a separate table structure to relate the product ids?
It seems to me your choices are:
-make a minimal change to achieve your purpose
-produce a properly normalised design
A minimal change might be the table you propose.
A normalised design might be:
product
    code
    standard_description
    standard_price
website
    code
    description
website_product
    website_code
    product_code
    description
    price
order
    id
    website_code
    ...
order_line
    order_number
    line_number
    product_code
    quantity
    ...
That way the same product has the same code on both websites, but you can have differing descriptions (and prices if necessary).
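Here is a rough sketch of how the normalised design answers the original question, using Python's sqlite3 module rather than SQL Server. The order table is named orders here because ORDER is a reserved word, the elided columns are left out, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        code TEXT PRIMARY KEY, standard_description TEXT, standard_price NUMERIC
    );
    CREATE TABLE website (code TEXT PRIMARY KEY, description TEXT);
    CREATE TABLE website_product (
        website_code TEXT NOT NULL REFERENCES website(code),
        product_code TEXT NOT NULL REFERENCES product(code),
        description  TEXT,
        price        NUMERIC,
        PRIMARY KEY (website_code, product_code)
    );
    CREATE TABLE orders (
        id           INTEGER PRIMARY KEY,
        website_code TEXT NOT NULL REFERENCES website(code)
    );
    CREATE TABLE order_line (
        order_number INTEGER NOT NULL REFERENCES orders(id),
        line_number  INTEGER NOT NULL,
        product_code TEXT    NOT NULL REFERENCES product(code),
        quantity     INTEGER NOT NULL,
        PRIMARY KEY (order_number, line_number)
    );
    INSERT INTO product VALUES ('TOY', 'Toy', 10);
    INSERT INTO website VALUES ('W1', 'Website 1'), ('W2', 'Website 2');
    INSERT INTO website_product VALUES
        ('W1', 'TOY', 'Description shown on website 1', 10),
        ('W2', 'TOY', 'Description shown on website 2', 10);
    INSERT INTO orders VALUES (1, 'W1'), (2, 'W2'), (3, 'W2');
    INSERT INTO order_line VALUES
        (1, 1, 'TOY', 2), (2, 1, 'TOY', 3), (3, 1, 'TOY', 1);
""")

# One product code across both sites, so the combined total is a plain SUM.
total_sold = conn.execute(
    "SELECT SUM(quantity) FROM order_line WHERE product_code = ?", ("TOY",)
).fetchone()[0]

# The per-website breakdown is still available through the order header.
sold_by_site = conn.execute("""
    SELECT o.website_code, SUM(l.quantity)
    FROM order_line l
    JOIN orders o ON o.id = l.order_number
    WHERE l.product_code = ?
    GROUP BY o.website_code
    ORDER BY o.website_code
""", ("TOY",)).fetchall()
print(total_sold, sold_by_site)
```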