1 table or 2 tables? - mysql

I have a table called UserComments. It contains 3 columns: id, user_id, and comment_id. I query this table 2 separate ways: 1 by user id and 1 by comment id. Both of these fields are indexed.
I want to add an additional column, tags.
I will only need this column when querying by comment id.
Does it make more sense to add the column to the existing table (and not return it, to avoid data transfer)?
OR
Create a new table and perform the join when necessary?
Why is one better than the other?

You should use a separate table for the specific purpose of tags.
Let's take this Stack Overflow question as an example. You have created a question with 3 tags. This means that ONE comment has THREE tags, or in other words, a one-to-many relationship.
The proper way to model one-to-many is with a separate table. Now, let's look at the differences.
One Table:
You will have one table. You will not be able to model a one-to-many relationship, so you will have to invent your own way of storing multiple tags, such as a CSV list of tags.
example:
id, user_id, comment_id, tags
'2', '276', '2738', 'mysql,sql,sql-server'
Can you see how this is getting confusing already? You will need to write your own code to parse the CSV. Now imagine you wanted to search by tag. Oh man... what a nightmare that will become, and how slow, if you use a SQL REGEXP or LIKE (neither can use an ordinary index when the tag sits in the middle of the string).
On the other hand, a two-table design would have a second table:
comment_id, tag
123, mysql
123, sql
123, sql-server
Grab all entries with comment_id 123 and you have your list. And if you want to search by tag, it's EASY.
My guess is you already have a separate table somewhere else for users, and you grab all of a user's comments using this comment table. You did that inherently because users and comments are a one-to-many relationship. Same concept here.
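Here is a small sketch of the difference, using Python's built-in sqlite3 module as a stand-in for MySQL. The table names comments_csv and comment_tags and the sample rows are hypothetical. A LIKE search for 'sql' against the CSV column also matches 'mysql' and 'postgresql', while the two-table design answers with an exact match on an indexed column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One-table design: tags packed into a comma-separated string (hypothetical data).
cur.execute("CREATE TABLE comments_csv (comment_id INTEGER PRIMARY KEY, tags TEXT)")
cur.executemany(
    "INSERT INTO comments_csv VALUES (?, ?)",
    [(123, "mysql,sql,sql-server"), (456, "mysql"), (789, "postgresql")],
)

# Two-table design: one row per (comment, tag) pair.
cur.execute(
    "CREATE TABLE comment_tags ("
    " comment_id INTEGER NOT NULL,"
    " tag TEXT NOT NULL,"
    " PRIMARY KEY (comment_id, tag))"
)
cur.execute("CREATE INDEX idx_comment_tags_tag ON comment_tags (tag)")
cur.executemany(
    "INSERT INTO comment_tags VALUES (?, ?)",
    [(123, "mysql"), (123, "sql"), (123, "sql-server"), (456, "mysql"), (789, "postgresql")],
)

def csv_search(tag):
    # Substring match: cannot tell 'sql' apart from 'mysql' or 'postgresql'.
    rows = cur.execute(
        "SELECT comment_id FROM comments_csv WHERE tags LIKE ? ORDER BY comment_id",
        (f"%{tag}%",),
    )
    return [r[0] for r in rows]

def table_search(tag):
    # Exact match on an indexed column.
    rows = cur.execute(
        "SELECT comment_id FROM comment_tags WHERE tag = ? ORDER BY comment_id",
        (tag,),
    )
    return [r[0] for r in rows]

def tags_for(comment_id):
    # "Grab all entries with 123 and you have your list."
    rows = cur.execute(
        "SELECT tag FROM comment_tags WHERE comment_id = ? ORDER BY tag",
        (comment_id,),
    )
    return [r[0] for r in rows]

print(csv_search("sql"))    # includes false positives 456 and 789
print(table_search("sql"))  # only 123
print(tags_for(123))
```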

Adding as answer because consensus agrees:
Generally speaking, more tables is better. The reason is that you want to avoid redundant data. Your User table should be on its own. Your comments table should have its own ID and a field for the UserID; join on that. Anything else you need that is neither a comment nor a user should get its own table following the same scheme.
From this you will have the benefit of having your Users sitting on their own, and be able to easily join each user to an indefinite number of comments with no redundancy.

I would do something like this: create a table just for tags, rather than having a column containing n copies of, say, the 'sql-server' tag, when you can relate it to a Tag table instead. So sql-server gets an id of 1. Storing the int 1 instead of the varchar 'sql-server' takes less space and is easy to expand on.
Comment
CommentID
..etc
UserComment
UserCommentID
CommentID
UserID
CommentTag
CommentTagID
UserCommentID
TagID
Tag
TagID
Description
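A minimal sketch of that four-table layout, again with sqlite3 standing in for MySQL. Table and column names follow the listing above; the Body column, the sample ids, and the tag names are made up for illustration, and UserID would normally reference a User table that is omitted here. Each tag name is stored once in Tag, and CommentTag holds only integer ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE Comment (CommentID INTEGER PRIMARY KEY, Body TEXT);
CREATE TABLE UserComment (
    UserCommentID INTEGER PRIMARY KEY,
    CommentID INTEGER NOT NULL REFERENCES Comment (CommentID),
    UserID INTEGER NOT NULL            -- would reference a User table
);
CREATE TABLE Tag (
    TagID INTEGER PRIMARY KEY,
    Description TEXT NOT NULL UNIQUE   -- each tag name stored once
);
CREATE TABLE CommentTag (
    CommentTagID INTEGER PRIMARY KEY,
    UserCommentID INTEGER NOT NULL REFERENCES UserComment (UserCommentID),
    TagID INTEGER NOT NULL REFERENCES Tag (TagID),
    UNIQUE (UserCommentID, TagID)      -- no duplicate tag on one comment
);

INSERT INTO Tag VALUES (1, 'sql-server'), (2, 'mysql'), (3, 'sql');
INSERT INTO Comment VALUES (2738, 'example comment');
INSERT INTO UserComment VALUES (2, 2738, 276);
INSERT INTO CommentTag (UserCommentID, TagID) VALUES (2, 1), (2, 2), (2, 3);
""")

def tags_for_comment(comment_id):
    # Walk UserComment -> CommentTag -> Tag to turn integer ids into names.
    rows = conn.execute(
        """
        SELECT t.Description
        FROM UserComment uc
        JOIN CommentTag ct ON ct.UserCommentID = uc.UserCommentID
        JOIN Tag t ON t.TagID = ct.TagID
        WHERE uc.CommentID = ?
        ORDER BY t.Description
        """,
        (comment_id,),
    )
    return [row[0] for row in rows]

print(tags_for_comment(2738))
```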

Related

Is it good to have a table with more rows or more tables with less rows in a database?

I am building a database for my application using MySQL. It contains 2 tables: one holds user details and the other holds all users' activities (say posts, comments, etc.). I have 2 approaches for this:
Group all users activities under one table(say useractivities).
Maintain specific activities table for each user(say user1activity,user2activity,...).
With approach 1, queries get slower as the number of users grows.
With approach 2, it eats up database space. Which design has lower time and space complexity?
For easier database maintenance, go with the first approach: it lets you normalize the data easily and is the right way to manage the database structure. Take care of the following points:
Index the user_id field properly so that join queries return results quickly.
If one table grows to a large number of records, you can create another table, such as user_activities_archive, to store old activities, and periodically move old records from user_activities to user_activities_archive.
Instead of a single user_activities table, you can create separate tables such as user_posts and user_comments to split the data further and give each table its own structure; for example, the comment table can hold a replyto_id while the user_posts table might have a title field.
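A sketch of the first two points, using sqlite3 as a stand-in (MySQL's EXPLAIN plays the role of SQLite's EXPLAIN QUERY PLAN). The kind and created_at columns, the cutoff date, and the function names are assumptions for illustration. The archive move copies and deletes inside one transaction so that a failure leaves both tables unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_activities (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    kind TEXT NOT NULL,
    created_at TEXT NOT NULL      -- ISO-8601 date string
);
CREATE INDEX idx_user_activities_user_id ON user_activities (user_id);
CREATE TABLE user_activities_archive (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    kind TEXT NOT NULL,
    created_at TEXT NOT NULL
);
""")
conn.executemany(
    "INSERT INTO user_activities (user_id, kind, created_at) VALUES (?, ?, ?)",
    [(1, "post", "2020-01-05"), (1, "comment", "2023-06-01"), (2, "post", "2023-07-15")],
)
conn.commit()

def query_plan(user_id):
    # Ask SQLite how it would run the lookup; column 3 holds the plan text.
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM user_activities WHERE user_id = ?",
        (user_id,),
    )
    return " ".join(r[3] for r in rows)

def archive_before(cutoff):
    # Copy old rows, then delete them; `with conn` commits both or rolls back both.
    with conn:
        conn.execute(
            "INSERT INTO user_activities_archive "
            "SELECT * FROM user_activities WHERE created_at < ?",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM user_activities WHERE created_at < ?", (cutoff,)
        )
    return cur.rowcount

print(query_plan(1))  # should mention idx_user_activities_user_id
moved = archive_before("2023-01-01")
print(moved)
```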
The second approach, creating a table for each user, has many limitations:
Joining with other tables becomes very hard.
You can't easily fetch the activity records of all users.
The number of tables grows with the user base of your application.
You may run into limits on the number of tables in the database.
Editing, updating, or deleting user records becomes more complex.
If a user is not active (just registered), their separate table is useless.
As juergen d mentioned in the comment, approach 2 should not be used.
However, I would consider splitting useractivities into different tables if the possible user activities differ from each other, to avoid unnecessary columns.
Example: a comment table with information about who made the comment (a foreign key to the user table), the comment itself, and a foreign key to the user activity on which the comment was made.
The comment column in the above table does not make sense for, say, just a like of a post, so I would create a different table for likes.
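A sketch of that split, with sqlite3 in place of MySQL: comments carry text and an optional replyto_id, while likes get a narrow table with no text column at all. The table names user_posts, user_comments, user_likes and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE user_posts (
    post_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users (user_id),
    title TEXT NOT NULL
);
-- Who commented, on which post, optionally in reply to which comment.
CREATE TABLE user_comments (
    comment_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users (user_id),
    post_id INTEGER NOT NULL REFERENCES user_posts (post_id),
    replyto_id INTEGER REFERENCES user_comments (comment_id),
    body TEXT NOT NULL
);
-- A like has no text, so it gets its own narrow table.
CREATE TABLE user_likes (
    user_id INTEGER NOT NULL REFERENCES users (user_id),
    post_id INTEGER NOT NULL REFERENCES user_posts (post_id),
    PRIMARY KEY (user_id, post_id)   -- at most one like per user per post
);

INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
INSERT INTO user_posts VALUES (10, 1, 'Hello');
INSERT INTO user_comments VALUES (100, 2, 10, NULL, 'Nice post');
INSERT INTO user_comments VALUES (101, 1, 10, 100, 'Thanks');
INSERT INTO user_likes VALUES (2, 10);
""")

def activity_counts(user_id):
    # Each activity type is counted from its own table.
    comments = conn.execute(
        "SELECT COUNT(*) FROM user_comments WHERE user_id = ?", (user_id,)
    ).fetchone()[0]
    likes = conn.execute(
        "SELECT COUNT(*) FROM user_likes WHERE user_id = ?", (user_id,)
    ).fetchone()[0]
    return comments, likes

print(activity_counts(2))
```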

Database Design: Multiple join tables or one table with identifying 'table' column

I'm designing a database where any content can be tagged and I'm likely to want to be able to select all content with a specific tag.
I'm struggling with the following two options and would appreciate some advice. If there's a better way please let me know.
Option A
Multiple 'many to many' join tables.
tag:
id
tag
media:
id
title
src
creation
media_tags:
id
media_id
tag_id
article:
id
title
content
creation
article_tags:
id
article_id
tag_id
Option B
A single 'tag reference' table, which uses a 'table' column to identify which table to join to.
tag:
id
tag
tag_reference:
id
row_id
tag_id
table
media:
id
title
src
creation
article:
id
title
content
creation
From a maintenance point of view option B seems favorable, but considering the SQL query to select all content, I don't think it's possible without multiple queries.
When using Option B, you can't set up foreign keys to the other tables. Thus I would go with Option A and one table for each m:n relation.
"From a maintenance point of view option B" is a nightmare. What happens if you delete an article? All the rows with that row_id will persist in the tag_reference table, and you will always need to clean up those entries manually.
Option B contains a multivalued dependency, and as such is a breach of 4th normal form. I much prefer Option A.
Actually, it depends on each SQL developer. But I prefer Option A, since you can easily tell that a certain column in a table is a foreign key (assuming it's true) to the other table.
Option B is somewhat bad design because storing table names in a column is a bad idea. You will end up writing more IF or CASE logic.
The second option pretty much prevents you from using JOINs for efficient SQL, forcing you into slow multiple selects.
So I would say the first option is far preferable.
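A sketch of Option A in sqlite3 (standing in for MySQL), covering two concerns raised in this thread. First, the asker's worry: all content with a given tag can still come back from one statement by combining the per-type join tables with UNION ALL. Second, real foreign keys with ON DELETE CASCADE clean up join rows when an article is deleted, while the Option B table keeps an orphaned row. The sample data is made up, and the Option B column is named tbl here because TABLE is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT NOT NULL UNIQUE);
CREATE TABLE media (id INTEGER PRIMARY KEY, title TEXT, src TEXT, creation TEXT);
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, content TEXT, creation TEXT);

-- Option A: one join table per content type, with real foreign keys.
CREATE TABLE media_tags (
    id INTEGER PRIMARY KEY,
    media_id INTEGER NOT NULL REFERENCES media (id) ON DELETE CASCADE,
    tag_id INTEGER NOT NULL REFERENCES tag (id)
);
CREATE TABLE article_tags (
    id INTEGER PRIMARY KEY,
    article_id INTEGER NOT NULL REFERENCES article (id) ON DELETE CASCADE,
    tag_id INTEGER NOT NULL REFERENCES tag (id)
);

-- Option B, for comparison: row_id cannot be declared as a foreign key.
CREATE TABLE tag_reference (
    id INTEGER PRIMARY KEY,
    row_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL REFERENCES tag (id),
    tbl TEXT NOT NULL
);

INSERT INTO tag VALUES (1, 'sql'), (2, 'design');
INSERT INTO media VALUES (1, 'Schema diagram', 'schema.png', '2024-01-02');
INSERT INTO article VALUES (1, 'Normal forms', 'text', '2024-01-01'),
                           (2, 'Color theory', 'text', '2024-01-03');
INSERT INTO media_tags (media_id, tag_id) VALUES (1, 1);
INSERT INTO article_tags (article_id, tag_id) VALUES (1, 1), (2, 2);
INSERT INTO tag_reference (row_id, tag_id, tbl) VALUES (2, 2, 'article');
""")

# Option A still answers "all content with tag X" in a single statement.
ALL_CONTENT_WITH_TAG = """
SELECT 'media' AS kind, m.id AS id, m.title AS title, m.creation AS creation
FROM media m
JOIN media_tags mt ON mt.media_id = m.id
JOIN tag t ON t.id = mt.tag_id
WHERE t.tag = :tag
UNION ALL
SELECT 'article', a.id, a.title, a.creation
FROM article a
JOIN article_tags art ON art.article_id = a.id
JOIN tag t ON t.id = art.tag_id
WHERE t.tag = :tag
ORDER BY creation
"""

def content_with_tag(tag):
    return conn.execute(ALL_CONTENT_WITH_TAG, {"tag": tag}).fetchall()

def count_rows(table):
    # Table names here are trusted constants, never user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

print(content_with_tag("sql"))

# Deleting an article cascades into article_tags but not into tag_reference.
with conn:
    conn.execute("DELETE FROM article WHERE id = 2")
print(count_rows("article_tags"), count_rows("tag_reference"))
```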

How to structure "categories" data in the database?

I have a website for which I am building "categories", which would work pretty much like the tags of Stack Overflow.
What I am confused about is how best to structure the tables for this sort of thing. For example, I know I'd need a table for the actual categories, with the name, who made it, what date it was made, etc.
What I am not sure about is how to store it in the database when a record gets n different categories. Should I have the record_ids in the item table to which the categories belong, and just comma-separate the ids? Or should I have a separate table with something like item_categories with item_id, category_id, etc...and just join that table and the item table, and the categories table when getting the category?
The latter seems slow because of the join, but more organized and clean.
Or is there another way to structure this that I have not thought of? What is a good way to go about structuring this sort of data?
Make three tables: one for the pages, one for the categories (along with meta information, etc.), and one to bind them together. That last table only needs a pageid and a categoryid to link records from both tables together.
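A sketch of those three tables in sqlite3 (standing in for MySQL). The names page, category, page_category, the meta columns, and the sample rows are hypothetical. The composite primary key on the binding table stops the same page from being linked to the same category twice, and a second index covers lookups by category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE page (pageid INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE category (
    categoryid INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    created_by TEXT,   -- meta information about the category
    created_at TEXT
);
-- The binding table holds nothing but the two ids.
CREATE TABLE page_category (
    pageid INTEGER NOT NULL REFERENCES page (pageid),
    categoryid INTEGER NOT NULL REFERENCES category (categoryid),
    PRIMARY KEY (pageid, categoryid)
);
-- The composite key serves lookups by pageid; this index serves lookups by category.
CREATE INDEX idx_page_category_categoryid ON page_category (categoryid);

INSERT INTO page VALUES (1, 'Home'), (2, 'About');
INSERT INTO category VALUES (1, 'news', 'admin', '2024-01-01'),
                            (2, 'help', 'admin', '2024-01-01');
INSERT INTO page_category VALUES (1, 1), (1, 2), (2, 2);
""")

def pages_in(category_name):
    rows = conn.execute(
        """
        SELECT p.title
        FROM page p
        JOIN page_category pc ON pc.pageid = p.pageid
        JOIN category c ON c.categoryid = pc.categoryid
        WHERE c.name = ?
        ORDER BY p.title
        """,
        (category_name,),
    )
    return [row[0] for row in rows]

print(pages_in("help"))
```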
Don't ever store comma separated values in a database, if you need to use those in joins or searches.
You should use a separate table like you say. It's called normalization.
If you're using a column with comma separated values think of the performance when accessing the values versus doing a join. You will have to split every value and then do a comparison to see if there's a match.
Or should I have a separate table with something like item_categories with item_id, category_id, etc...and just join that table and the item table, and the categories table when getting the category?
Yes, this. That's a classic M:N relationship in SQL.

Shared Primary Key

I would guess this is a semi-common question but I can't find it in the list of past questions. I have a set of tables for products which need to share a primary key index. Assume something like the following:
product1_table:
id,
name,
category,
...other fields
product2_table:
id,
name,
category,
...other fields
product_to_category_table:
product_id,
category_id
Clearly it would be useful to have a shared index between the two product tables. Note, the idea of keeping them separate is because they have largely different sets of fields beyond the basics, however they share a common categorization.
UPDATE:
A lot of people have suggested table inheritance (or gen-spec). This is an option I'm aware of but given in other database systems I could share a sequence between tables I was hoping MySQL had a similar solution. I shall assume it doesn't based on the responses. I guess I'll have to go with table inheritance... Thank you all.
It's not really common, no. There is no native way to share a primary key. What I might do in your situation is this:
product_table
id
name
category
general_fields...
product_type1_table:
id
product_id
product_type1_fields...
product_type2_table:
id
product_id
product_type2_fields...
product_to_category_table:
product_id
category_id
That is, there is one master product table that has entries for all products and the fields that generalize across the types, plus type-specific tables with foreign keys into the master product table, which hold the type-specific data.
A better design is to put the common columns in one products table, and the special columns in two separate tables. Use the product_id as the primary key in all three tables, but in the two special tables it is, in addition, a foreign key back to the main products table.
This simplifies the basic product search for ids and names by category.
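A sketch of that layout in sqlite3 (standing in for MySQL). Only products generates ids; each subtype table reuses product_id as both its primary key and a foreign key. The subtype columns weight_kg and license_key and the helper functions are hypothetical. The search by category touches only the common table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,   -- the only place ids are generated
    name TEXT NOT NULL,
    category TEXT NOT NULL
);
-- Subtype tables: product_id is the primary key AND a foreign key.
CREATE TABLE product_type1 (
    product_id INTEGER PRIMARY KEY REFERENCES products (product_id),
    weight_kg REAL
);
CREATE TABLE product_type2 (
    product_id INTEGER PRIMARY KEY REFERENCES products (product_id),
    license_key TEXT
);
""")

def add_type1(name, category, weight_kg):
    with conn:  # both inserts commit together or not at all
        pid = conn.execute(
            "INSERT INTO products (name, category) VALUES (?, ?)", (name, category)
        ).lastrowid
        conn.execute("INSERT INTO product_type1 VALUES (?, ?)", (pid, weight_kg))
    return pid

def add_type2(name, category, license_key):
    with conn:
        pid = conn.execute(
            "INSERT INTO products (name, category) VALUES (?, ?)", (name, category)
        ).lastrowid
        conn.execute("INSERT INTO product_type2 VALUES (?, ?)", (pid, license_key))
    return pid

def ids_and_names(category):
    # The basic search never touches the subtype tables.
    return conn.execute(
        "SELECT product_id, name FROM products WHERE category = ? ORDER BY product_id",
        (category,),
    ).fetchall()

desk = add_type1("Desk", "office", 20.5)
editor = add_type2("Editor", "office", "ABC-123")
print(ids_and_names("office"))
```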
Note also that your design allows each product to be in one category at most.
It seems you are looking for table inheritance.
You could use a common table product with attributes common to both product1 and product2, plus a type attribute which could be either "product2" or "product1".
Then tables product1 and product2 would have all their specific attributes and a reference to the parent table product.
product:
id,
name,
category,
type
product1_table:
id,
#product_id,
product1_specific_fields
product2_table:
id,
#product_id,
product2_specific_fields
First let me state that I agree with everything that Chaos, Larry and Phil have said.
But if you insist on another way...
There are two reasons for your shared PK: one, uniqueness across the two tables, and two, complete referential integrity.
I'm not sure exactly what "sequence" features AUTO_INCREMENT columns support. It seems there is a system setting (auto_increment_increment) that defines the increment-by value, but nothing per column.
What I would do in Oracle is just share the same sequence between the two tables. Another technique would be to set a STEP value of 2 in the auto_increment and start one at 1 and the other at 2. Either way, you're generating unique values between them.
You could create a third table that has nothing but the PK Column. This column could also provide the Autonumbering if there's no way of creating a skipping autonumber within one server. Then on each of your data tables you'd add CRUD triggers. An insert into either data table would first initiate an insert into the pseudo index table (and return the ID for use in the local table). Likewise a delete from the local table would initiate a delete from the pseudo index table. Any children tables which need to point to a parent point to this pseudo index table.
Note this will need to be a per-row trigger and will slow down CRUD on these tables. But tables like "product" tend NOT to have a very high rate of DML in the first place. Anyone who complains about the "performance impact" is not considering scale.
Please note, this is provided as a functioning alternative and not my recommendation as the best way
You can't "share" a primary key.
Without knowing all the details, my best advice is to combine the tables into a single product table. Having optional fields that are populated for some products and not others is not necessarily a bad design.
Another option is to have a sort of inheritance model, where you have a single product table, and then two product "subtype" tables, which reference the main product table and have their own specialized set of fields. Querying this model is more painful than a single table IMHO, which is why I see it as the less desirable option.
Your explanation is a little vague, but from my basic understanding I would be tempted to do this:
The product table contains common fields
product
-------
product_id
name
...
The product_extra1 table and the product_extra2 table contain different fields.
These tables have a one-to-one relationship with product, enforced between product.product_id and product_extra1.product_id, etc. Enforce the one-to-one relationship by making product_id in the foreign key tables (product_extra1, etc.) unique with a unique constraint.
You will need to decide on the business rules for how this data is populated.
product_extra1
---------------
product_id
extra_field1
extra_field2
....
product_extra2
---------------
product_id
different_extra_field1
different_extra_field2
....
Based on what you have above, the product_category table is an intersecting table (1-to-many / many-to-1), which would imply that each product can be related to many categories.
This can now stay the same.
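A sketch of the unique constraint doing the one-to-one job, in sqlite3 (standing in for MySQL). The sample values and the add_extra1 helper are hypothetical; a second product_extra1 row for the same product is rejected, and so is a row for a product that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_extra1 (
    -- UNIQUE turns this foreign key into a one-to-one link.
    product_id INTEGER NOT NULL UNIQUE REFERENCES product (product_id),
    extra_field1 TEXT,
    extra_field2 TEXT
);
INSERT INTO product VALUES (1, 'Widget');
INSERT INTO product_extra1 VALUES (1, 'a', 'b');
""")

def add_extra1(product_id, field1, field2):
    # Returns True if the row was stored, False if a constraint rejected it.
    try:
        with conn:
            conn.execute(
                "INSERT INTO product_extra1 VALUES (?, ?, ?)",
                (product_id, field1, field2),
            )
    except sqlite3.IntegrityError:
        return False
    return True

print(add_extra1(1, "c", "d"))  # a second extra row for product 1
```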
This is yet another case of gen-spec.
See previous discussion

Different database tables joining on single table

So imagine you have multiple tables in your database, each with its own structure and its own PRIMARY KEY.
Now you want a Favorites table so that users can add items as favorites. Since there are multiple tables, the first thing that comes to mind is to create one Favorites table per table:
Say you have a table called Posts with PRIMARY KEY (post_id) and you create a Post_Favorites with PRIMARY KEY (user_id, post_id)
This would probably be the simplest solution, but could it be possible to have one Favorites table joining across multiple tables?
I've thought of the following as a possible solution:
Create a new table called Master with primary key (master_id). Add insert triggers on all tables in your database that generate a new master_id and write it alongside the row in your table. We also record in the Master table where each master_id has been used (that is, in which table).
Now you can have one Favorites table with PRIMARY KEY (user_id, master_id)
You can select from the Favorites table and join with each individual table on master_id to get the favorites per table. But would it be possible to get all the favorites with one query (or, if not a query, a stored procedure)?
Do you think that this is a stupid approach? Since you will perform one query per table, what are you gaining by having a single table?
What are your thoughts on the matter?
One way would be to sub-type all possible tables to a generic super-type (Entity) and then link user preferences to that super-type.
I think you're on the right track, but a table-based inheritance approach would be great here:
Create a table master_ids, with just one column: an int-identity primary key field called master_id.
On your other tables, (users as an example), change the user_id column from being an int-identity primary key to being just an int primary key. Next, make user_id a foreign key to master_ids.master_id.
This largely preserves data integrity. The only place you can trip up is if you have a master_id = 1 along with both a user_id = 1 and a post_id = 1. For a given master_id, you should have only one entry across all tables; in this scenario you have no way of knowing whether master_id 1 refers to the user or to the post. One way to make sure this doesn't happen is to add a second column, type_id, to the master_ids table. Type_id 1 can refer to users, type_id 2 can refer to posts, etc. Then you are pretty much good.
Code "gymnastics" may be a bit necessary for inserts. If you're using a good ORM, it shouldn't be a problem. If not, stored procs for inserts are the way to go. But you're having your cake and eating it too.
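A sketch of the master_ids idea with the type_id column, in sqlite3 (standing in for MySQL). Instead of triggers or stored procedures, small Python helpers insert the master row first and reuse its id; the helper names, the types lookup table, and the sample data are assumptions. In SQLite an INTEGER PRIMARY KEY still auto-assigns when given NULL, so the helpers always pass the id explicitly.

```python
import sqlite3

USER_TYPE, POST_TYPE = 1, 2  # hypothetical type_id values

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE types (type_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
INSERT INTO types VALUES (1, 'user'), (2, 'post');

-- The only auto-numbered column; type_id records which table owns the id.
CREATE TABLE master_ids (
    master_id INTEGER PRIMARY KEY AUTOINCREMENT,
    type_id INTEGER NOT NULL REFERENCES types (type_id)
);
-- user_id and post_id are plain keys that point back at master_ids.
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY REFERENCES master_ids (master_id),
    name TEXT NOT NULL
);
CREATE TABLE posts (
    post_id INTEGER PRIMARY KEY REFERENCES master_ids (master_id),
    title TEXT NOT NULL
);
CREATE TABLE favorites (
    user_id INTEGER NOT NULL REFERENCES users (user_id),
    master_id INTEGER NOT NULL REFERENCES master_ids (master_id),
    PRIMARY KEY (user_id, master_id)
);
""")

def _insert_with_master_id(type_id, sql, values):
    # Plays the role of an insert stored procedure: master row first,
    # then the real row reusing that id, committed together.
    with conn:
        new_id = conn.execute(
            "INSERT INTO master_ids (type_id) VALUES (?)", (type_id,)
        ).lastrowid
        conn.execute(sql, (new_id, *values))
    return new_id

def add_user(name):
    return _insert_with_master_id(USER_TYPE, "INSERT INTO users VALUES (?, ?)", (name,))

def add_post(title):
    return _insert_with_master_id(POST_TYPE, "INSERT INTO posts VALUES (?, ?)", (title,))

def favorites_of(user_id):
    # One query over every favorited thing, labelled by type.
    return conn.execute(
        """
        SELECT f.master_id, t.name
        FROM favorites f
        JOIN master_ids m ON m.master_id = f.master_id
        JOIN types t ON t.type_id = m.type_id
        WHERE f.user_id = ?
        ORDER BY f.master_id
        """,
        (user_id,),
    ).fetchall()

ann = add_user("ann")
bob = add_user("bob")
hello = add_post("Hello")
with conn:
    conn.executemany("INSERT INTO favorites VALUES (?, ?)", [(ann, bob), (ann, hello)])
print(favorites_of(ann))
```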
I'm not sure I really understand the alternative you propose.
But in general, when given the choice of 1) "more tables" or 2) "a mega-table supported by a bunch of fancy code work" ..your interests are best served by more tables without the code gymnastics.
A red flag was "Add triggers on all tables in your database": each trigger firing is a performance hit of its own.
The database designers have built in all kinds of technology to optimize tables/indexes, much of it behind the scenes without you knowing it. Just sit back and enjoy the ride.
Try Database Answers for inspiration (no affiliation with me).
An alternative to your approach might be to make the favorites table user_id, object_id, object_type. When inserting into the favorites table, just insert the type of the favorite as well. However, I don't see a simple query working with either your approach or mine. One way to go about it might be to use UNION to get one combined result set, and then identify what type each record is based on its type. Another thing you can do is turn the UNION query into a MySQL VIEW and simply query that VIEW.
The benefit of using a single table for favorites is simplicity, which some might consider against the database normalization rules. But on the upside, you don't have to create so many favorites tables, and you can add anything to favorites easily by just coming up with a new object_type identifier.
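A sketch of the single favorites table with a VIEW over the favoritable tables, in sqlite3 (standing in for MySQL). It uses UNION ALL rather than UNION because rows from different object types can never be duplicates, so there is nothing to deduplicate. The posts and photos tables and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE photos (photo_id INTEGER PRIMARY KEY, caption TEXT NOT NULL);

-- object_id has no foreign key: object_type says which table it points into.
CREATE TABLE favorites (
    user_id INTEGER NOT NULL,
    object_id INTEGER NOT NULL,
    object_type TEXT NOT NULL,
    PRIMARY KEY (user_id, object_type, object_id)
);

-- One VIEW that reshapes every favoritable table into common columns.
CREATE VIEW favorite_items AS
    SELECT f.user_id AS user_id, f.object_type AS object_type,
           p.post_id AS object_id, p.title AS label
    FROM favorites f
    JOIN posts p ON f.object_type = 'post' AND p.post_id = f.object_id
    UNION ALL
    SELECT f.user_id, f.object_type, ph.photo_id, ph.caption
    FROM favorites f
    JOIN photos ph ON f.object_type = 'photo' AND ph.photo_id = f.object_id;

INSERT INTO posts VALUES (1, 'First post');
INSERT INTO photos VALUES (1, 'Sunset');
INSERT INTO favorites VALUES (7, 1, 'post'), (7, 1, 'photo');
""")

def favorites_of(user_id):
    return conn.execute(
        "SELECT object_type, object_id, label FROM favorite_items "
        "WHERE user_id = ? ORDER BY object_type",
        (user_id,),
    ).fetchall()

print(favorites_of(7))
```

Since object_id cannot carry a foreign key, nothing stops a favorite from pointing at a deleted row; the inner joins in the view simply drop such rows from the result.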
It sounds like you have an is-a type relationship that needs to be modeled. All of the items that can be favourited are a type of "item". It sounds like you are on the right track, but I wouldn't use triggers. What could be the right answer if I have understood correctly, is to pull all the common fields into a single table called items (master is a poor name, master of what?), this should include all the common data that would be needed when you need a users favourite items, I'd expect this to include fields like item_id (primary key), item_type and human_readable_name and maybe some metadata about when the item was created, modified etc. Each of your specific item types would have its own table containing data specific to that item type with an item_id field that has a foreign key relationship to the item table. Then you'd wrap each item type in its own insertion, update and selection SPs (i.e. InsertItemCheese, UpdateItemMonkey, SelectItemCarKeys). The favourites table would then work as you describe, but you only need to select from the item table. If your app needs the specific data for each item type, it would have to be queried for each item (caching is your friend here).
If MySQL supports SPs with multiple result sets you could write one that outputs all the items as a result set, then a result set for each item type if you need all the specific item data in one go. For most cases I would not expect you to need all the data all the time.
Keep in mind that not EVERY use of a PK column needs a constraint. For example a logging table. Even though a logging table has a copy of the PK column from the table being logged, you can't build a constraint.
What would be the worst possible case? You insert a record for Oprah's TV show into the favorites table, and then next year you delete the Oprah Show from the list of TV shows but don't delete that ID from the Favorites table. Will that break anything? Probably not. When you join favorites to TV shows, that record will fall out of the result set.
There are a couple of ways to share values for PK's. Oracle has the advantage of sequences. If you don't have those you can add a "Step" to your Autonumber fields. There's always a risk though.
Say you think you'll never have more than 10 tables of "things which could be favored". Then start your PKs at 0 for the first table, 1 for the second, 2 for the third, and so on, with every table incrementing by 10. That guarantees that all the values will be unique across those 10 tables. The risk is that a future requirement will add an 11th table. You can always 'pad' your guesstimate.
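The arithmetic behind that scheme, as a small Python sketch (the function names are hypothetical). Table k hands out k, k + 10, k + 20, and so on, so id % 10 recovers the table, and an 11th table would collide with the first. In MySQL the corresponding server variables are auto_increment_increment and auto_increment_offset, where the offset starts at 1 rather than 0.

```python
STEP = 10  # room for up to 10 tables sharing one id space

def ids_for_table(table_index, count, step=STEP):
    """First `count` ids for table number `table_index` (0-based):
    the table starts at its index and increments by `step`."""
    return [table_index + step * n for n in range(count)]

def table_of(some_id, step=STEP):
    # The remainder identifies which table issued the id.
    return some_id % step

# Ten tables, five ids each: no value appears twice.
all_ids = [i for k in range(10) for i in ids_for_table(k, 5)]
print(len(all_ids), len(set(all_ids)))

# An 11th table (index 10) starts at 10 and collides with table 0.
print(sorted(set(ids_for_table(10, 3)) & set(ids_for_table(0, 3))))
```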