Is this a good db architecture and if not what can i change about it - mysql

I found this in an old post and i'm thinking of using it for a project, but i don't know if i should change it or leave it, here's what i want to change:
remove product options and add product id and option group id to options
remove order details and have it's info in orders
is what i'm doing bad? also if you could be kind to tell me some of the best practices for something like this i would greatly appreciate it.
thanks for your time.

Both tables are there for a good reason.
productoptions is a mapping table between options and products: this is a many-to-many relationship, where a given product may have multiple options, and an option may be used by multiple products. If you remove this table, you end up redondantly adding the optionName to each and every row in productoptions that relate to the same option, which is inefficient, and might break data integrity (how do you ensure that a given option always has the same name?).
As for order_details: this is a many to one relationship towards orders. An order may have mutliple details line, each referring to a different product. Removing this table means losing this possibility.

Related

Relationship database design - object specific many to many, do I solve with self join table or new table

Being new to relational database design, I am trying to clarify one piece of information to properly design this database. Although I am using Filemaker as the platform, I believe this is a universal question.
Using the logic of ideally having all one to many relationships, and using separate tables or join tables to solve these.
I have a database with multiple products, made by multiple brands, in multiple product categories. I also want this to be as scale-able as possible when it comes to reporting, being able to slice and dice the data in as many ways as possible since the needs of the users are constantly changing.
So when I ask the question "Does each Brand have multiple products" I get a yes, and "Does each product have multiple brands" the answer is no. So this is a one to many relationship, but it also seems that a self-join table might give me everything that I need.
This methodology also seems to go down a rabbit hole for other "product related" information such as product category, each product is tied to one product category, but only one product category is related to a product.
So I see 2 possibilities, make three tables and join them with primary and foreign keys, one for Brand, one for Product Category, and one for Products.
Or the second possibility is to create one table that has the brand and product category and product info all in one table (since they are all product related) and simply do self-joins and other query based tables to give me the future reporting requirements that will be changing over time.
I am looking for input from experiences that might point me in the right direction.
Thanks in advance!
Could you ever want to store additional information about a brand (company URL, phone number, etc.) or about a product category (description, etc.)?
If the answer is yes, you definitely want to use three tables. If you don't, you'll be repeating all that information for every single item that belongs to the same brand or same category.
If the answer is no, there is still an advantage to using three tables - it will prevent typos or other spelling inconsistencies from getting into your database. For example, it would prevent you from writing a brand as "Coca Cola" for some items and as "Coca-Cola" for other items. These inconsistencies get harder and harder to find and correct as your database grows. By having each brand only listed once in it's own table, it will always be written the same way.
The disadvantage of multiple tables is the SQL for your queries is more complicated. There's definitely a tradeoff, but when in doubt, normalize into multiple tables. You'll learn when it's better to de-normalize with more experience.
I am not sure where do you see a room for a self-join here. It seems to me you are saying: I have a table of products; each product has one brand and one (?) category. If that's the case then you need either three tables:
Brands -< Products >- Categories
or - in Filemaker only - you can replace either or both the Brands and the Categories tables with a value list (assuming you won't be renaming brands/categories and at the expense of some reporting capabilities). So really it depends on what type of information you want to get out in the end.
If you truly want your solution to be scalable you need to parse and partition your data now. Otherwise you will be faced with the re-structuring of the solution down the road when the solution grows in size. You will also be faced with parsing and relocating the data to new tables. Since you've also included the SQL and MySQL tags if you plan on connecting Filemaker to an external data source then you will definitely need to up your game structurally.
Building everything in one table is essentially using Filemaker to do Excel work and it won't cut it if you are connecting to SQL, MySQL, etc.
Self join tables are a great tool. However, they should really only be used for calculating small data points and should not be used as pivot points or foundations for your reporting features. It can grow out of control as time goes on and you need to keep your backend clean.
Use summary and sub-summary reporting features to slice product based data.
For retail and general product management solutions, whether it's Filemaker/SQL/or whatever the "Brand" or "Vendor" is it's own table. Then you would have a "Products" table (the match key being the "Brand ID").
The "Product Category" field should be a field in the "Products" table. You can manage the category values by building a standard value list or building a value list based on a "Product Category" table. The second scenario is better for long term administration.

Avoiding conditional joins by model design

I'm designing my db and am trying to figure out the best way to avoid conditional joins in the future. I've read articles that show the conditional joins and it is definitely something I want to avoid if possible.
I have a CHECK table, and CHECK will store some data (amount, date, etc). I also have 3 'other' tables, VENDOR, VENDOR_DEPT, VENDOR_ACCOUNT of which VENDOR_ACCOUNT has a fk to VENDOR_DEPT, and VENDOR_DEPT has an fk to VENDOR.
My issue is this: how do I design my model so that CHECK can be either assigned to VENDOR, VENDOR_DEPT or VENDOR_ACCOUNT without having vend_id, vendacct_id and venddept_id in my CHECK table or having a VENDOR_CHECK table that has columns check_id ,vendor_level, join_id.... (hopefully you get the picture)
Is there a cleaner way? BTW, I'm using MYSQL but I'd like the solution to work on other platforms as well.
Since I'm at the model design phase, I'm open to all suggestions including redesigning these tables of course :)
You are trying to implement a "one-of" relationship in SQL. This type of relationship can be a bit challenging. I suppose the "relational" way to solve it is to say that you really have a CHECK_ENTITY, and this entity can be one of the three types. That seems unnecessarily cumbersome, however.
One suggestion is to have the three different columns in the table. My guess is that you will often want to be using vend_id for reporting purposes. Simply populate the appropriate ones for a given CHECK.
Yes, your data is then denormalized, because vend_id would be both in the CHECK table and in the VEND_ACCT table. If accounts and departments change, then this captures the relationship at the time of the CHECK, which may be what you want.
An alternative option is to have a dummy account that means "the entire vendor". Then just use this value to mean the entire vendor. Similarly, you would need accounts for each department.
This approach requires some discipline. It is tempting to set up a vendor hierarchy, with any possible depth, with a link back the parent (accounts --> departments --> vendors, so why not generalize it?). SQL is even worse at hierarchical queries than at one-of relationships. By "worse", I mean that the methods for handling such queries are very database-dependent.

Implementing Comments and Likes in database

I'm a software developer. I love to code, but I hate databases... Currently, I'm creating a website on which a user will be allowed to mark an entity as liked (like in FB), tag it and comment.
I get stuck on database tables design for handling this functionality. Solution is trivial, if we can do this only for one type of thing (eg. photos). But I need to enable this for 5 different things (for now, but I also assume that this number can grow, as the whole service grows).
I found some similar questions here, but none of them have a satisfying answer, so I'm asking this question again.
The question is, how to properly, efficiently and elastically design the database, so that it can store comments for different tables, likes for different tables and tags for them. Some design pattern as answer will be best ;)
Detailed description:
I have a table User with some user data, and 3 more tables: Photo with photographs, Articles with articles, Places with places. I want to enable any logged user to:
comment on any of those 3 tables
mark any of them as liked
tag any of them with some tag
I also want to count the number of likes for every element and the number of times that particular tag was used.
1st approach:
a) For tags, I will create a table Tag [TagId, tagName, tagCounter], then I will create many-to-many relationships tables for: Photo_has_tags, Place_has_tag, Article_has_tag.
b) The same counts for comments.
c) I will create a table LikedPhotos [idUser, idPhoto], LikedArticles[idUser, idArticle], LikedPlace [idUser, idPlace]. Number of likes will be calculated by queries (which, I assume is bad). And...
I really don't like this design for the last part, it smells badly for me ;)
2nd approach:
I will create a table ElementType [idType, TypeName == some table name] which will be populated by the administrator (me) with the names of tables that can be liked, commented or tagged. Then I will create tables:
a) LikedElement [idLike, idUser, idElementType, idLikedElement] and the same for Comments and Tags with the proper columns for each. Now, when I want to make a photo liked I will insert:
typeId = SELECT id FROM ElementType WHERE TypeName == 'Photo'
INSERT (user id, typeId, photoId)
and for places:
typeId = SELECT id FROM ElementType WHERE TypeName == 'Place'
INSERT (user id, typeId, placeId)
and so on... I think that the second approach is better, but I also feel like something is missing in this design as well...
At last, I also wonder which the best place to store counter for how many times the element was liked is. I can think of only two ways:
in element (Photo/Article/Place) table
by select count().
I hope that my explanation of the issue is more thorough now.
The most extensible solution is to have just one "base" table (connected to "likes", tags and comments), and "inherit" all other tables from it. Adding a new kind of entity involves just adding a new "inherited" table - it then automatically plugs into the whole like/tag/comment machinery.
Entity-relationship term for this is "category" (see the ERwin Methods Guide, section: "Subtype Relationships"). The category symbol is:
Assuming a user can like multiple entities, a same tag can be used for more than one entity but a comment is entity-specific, your model could look like this:
BTW, there are roughly 3 ways to implement the "ER category":
All types in one table.
All concrete types in separate tables.
All concrete and abstract types in separate tables.
Unless you have very stringent performance requirements, the third approach is probably the best (meaning the physical tables match 1:1 the entities in the diagram above).
Since you "hate" databases, why are you trying to implement one? Instead, solicit help from someone who loves and breathes this stuff.
Otherwise, learn to love your database. A well designed database simplifies programming, engineering the site, and smooths its continuing operation. Even an experienced d/b designer will not have complete and perfect foresight: some schema changes down the road will be needed as usage patterns emerge or requirements change.
If this is a one man project, program the database interface into simple operations using stored procedures: add_user, update_user, add_comment, add_like, upload_photo, list_comments, etc. Do not embed the schema into even one line of code. In this manner, the database schema can be changed without affecting any code: only the stored procedures should know about the schema.
You may have to refactor the schema several times. This is normal. Don't worry about getting it perfect the first time. Just make it functional enough to prototype an initial design. If you have the luxury of time, use it some, and then delete the schema and do it again. It is always better the second time.
This is a general idea
please donĀ“t pay much attention to the field names styling, but more to the relation and structure
This pseudocode will get all the comments of photo with ID 5
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "comment"
This pseudocode will get all the likes or users who liked photo with ID 5
(you may use count() to just get the amount of likes)
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "like"
as far as i understand. several tables are required. There is a many to many relation between them.
Table which stores the user data such as name, surname, birth date with a identity field.
Table which stores data types. these types may be photos, shares, links. each type must has a unique table. therefore, there is a relation between their individual tables and this table.
each different data type has its table. for example, status updates, photos, links.
the last table is for many to many relation storing an id, user id, data type and data id.
Look at the access patterns you are going to need. Do any of them seem to made particularly difficult or inefficient my one design choice or the other?
If not favour the one that requires the fewer tables
In this case:
Add Comment: you either pick a particular many/many table or insert into a common table with a known specific identifier for what is being liked, I think client code will be slightly simpler in your second case.
Find comments for item: here it seems using a common table is slightly easier - we just have a single query parameterised by type of entity
Find comments by a person about one kind of thing: simple query in either case
Find all comments by a person about all things: this seems little gnarly either way.
I think your "discriminated" approach, option 2, yields simpler queries in some cases and doesn't seem much worse in the others so I'd go with it.
Consider using table per entity for comments and etc. More tables - better sharding and scaling. It's not a problem to control many similar tables for all frameworks I know.
One day you'll need to optimize reads from such structure. You can easily create agragating tables over base ones and lose a bit on writes.
One big table with dictionary may become uncontrollable one day.
Definitely go with the second approach where you have one table and store the element type for each row, it will give you a lot more flexibility. Basically when something can logically be done with fewer tables it is almost always better to go with fewer tables. One advantage that comes to my mind right now about your particular case, consider you want to delete all liked elements of a certain user, with your first approach you need to issue one query for each element type but with the second approach it can be done with only one query or consider when you want to add a new element type, with the first approach it involves creating a new table for each new type but with the second approach you shouldn't do anything...

MySQL: how to do row-level security (like Oracle's Virtual Private Database)?

Say that I have vendors selling various products. So, at a basic level, I will have the following tables: vendor, product, vendor_product.
If vendor-1 adds Widget 1 to the product table, I want only vendor-1 to see that information (because that information is "owned" by vendor-1). Same goes for vendor-2. Say vendor-2 adds Widget 2, only vendor-2 should see that information.
If vendor-1 tries to add Widget 2, which was already entered by vendor-2, a duplicate entry for Widget 2 should not be made in the product table. This means that, somehow, I need to know that vendor-2 now also "owns" Widget 2.
A problem with having multiple "owners" of a piece of information is how to deal owners editing/deleting the data. Perhaps vendor-1 no longer wants Widget 2 to be available to him/her, but that doesn't necessarily apply for vendor-2.
Finally, I want the ability to flag(?) certain records as "yes, I have reviewed this data and it is correct" such that it then becomes available to all the vendors. Say I flag Widget 1 as good data, that product should now be seen by all vendors.
It seems that the solution is row level security. The problem is that I'm not too familiar with its concepts or how to implement it in MySQL. Any help is highly appreciated. Thanks.
NOTE: this problem is somewhat discussed here: Database Design: use composite key as FK, flag data for sharing?. When I asked the question, I wasn't sure how to phrase the question very well. Hopefully, I explained my problem better this time.
Mysql doesn't natively support row level security on tables. However, you can sort of implement it with views. So, just create a view on your table that exposes only the rows you want a given client to see. Then, only provide that client access to those views, and not the underlying tables.
See http://www.sqlmaestro.com/resources/all/row_level_security_mysql/
You already suggested a vendor, product and vendor_product mapping table. You want vendors to share the same product if they both want to use it, but you don't want duplicate products. Right?
If so, then define a unique index/constraint on the natural key that identifies a product (product name?).
If a vendor adds a product, and it doesn't exist, insert it into the product table, and map it to that vendor via the vendor_product table.
If the product already exists, but is mapped to another vendor, do not insert anything into the product table, and add another mapping row mapping the new vendor to the existing product (so that now the product is mapped to two vendors).
Finally, when a vendor removes a product, instead of actually removing it, just delete the vendor_product reference mapping the two. Finally, if no other vendors are still referencing a product, you can remove the product. Alternatively, you could run a script periodically that deletes all products that no longer have vendors referencing them.
Finally, have a flag on the product table that says that you've reviewed the product, and then use something like this to query for products viewable by a given vendor (we'll say vendor id 7):
select product.*
from product
left join vendor_map
on vendor_map.product_id = product.product_id
where vendor_map.vendor_id = 7
or product.reviewed = 1;
Finally, if a product is owned by multiple vendors, then you can either disallow edits or perhaps "split" the single product into a new unique product when one of the owning vendors tries to edit it, and allow them to edit their own copy of the product. They would likely need to modify the product name though, unless you come up with some other natural key to base your unique constraint on.
This sounds to me that you want to normalize your data. What you have is a 1 (product) to many (vendors) relationship. That the relationship is 1:1 for most cases and only 1:n for some doesn't really matter I would say - in general terms it's still 1:n and therefor you should design your database this way. The basic layout would probably be this:
Vendor Table
VendorId VendorName OtherVendorRelatedInformation
WidgetTable
WidgetId WidgetName WidgetFlag CreatorVendor OtherWidgetInformation
WidgetOwnerships
VendorId WidgetId OwnershipStatus OtherInformation
Update: The question of who is allowed to do what is a business problem so you need to have all the rules laid out. In the above structure you can flag which vendor created the widget. And in the ownership you can flag what the status of the ownership is, for example
CreatorFullOwnership
SharedOwnership
...
You would have to make up the flags based on your business rules and then design the business logic and data access part accordingly.

More rows or more tables in a db design?

I'm pretty sure I already know the answer, but would like some confirmation...
We received 220 text files of providers. Each file is a different category of provider. In total there are 3.2 million records.
My inclination is to create a category table and a provider table that links to category by an ID, then index any other columns that may be searched on like state, or even last name.
The other option is to have one table per category, but I think other than the smaller row size there are a lot of disadvantages to this approach.
It's a PHP/MySQL implementation.
Anyone think the separate table option is better for any reason?
Thanks,
D.
Go with two table approach -- categories and providers.
This will enable you to
easily adding new categories
easily reverse search Categories based on a column such as state of provider.
It make sense from data-structure point of view as well. One type of data in one table.
I agree with your original thought, and with Nishant's answer. In addition to his points, it also normalizes the data, and allows easy updates if a category changes names for some reason.