Which structure is better?
table1
postid category1 category2 category3
2 a b d
3 a c null
or
post table
postid
2
3
category_option table
category option
category1 a
category2 b
category3 c
category4 d
option_post table
post option
2 a
2 b
2 d
3 a
3 c
It seems that building queries for the first structure is easier than for the second.
These two structures model different things. The first one rigidly allows only (up to) 3 categories (and differentiates between categories by position), while the second one can model any number of categories (which are not distinguished by position). Which one is better really depends on what you are trying to accomplish...
On purely technical level, the second one might require a JOIN for some queries where the first one could satisfy the query from the single (and only) table. Whether this is a problem or not, again, depends on circumstances...
Depends on requirements...
Do you anticipate increasing the number of options over time?
Your first option is by far the easier to code; the second is a much more modular and scalable design.
It depends greatly on the nature of the categories. If the list is fixed and unlikely to grow, then the first structure works just fine and can be easier to work with. If the list of categories is likely to grow, then the second option will grow better.
It also matters if the category values are sparse. If most of the categories will not have values, then the second approach will take up much less space. If every item will have values in every category, this is not an issue.
It is important in this case to understand what "likely" means. It doesn't mean that you the designer don't think it will grow. It means that the list of categories is well-understood and mature, and so unlikely to grow. I kept looking for examples, but none come to mind.
There are good reasons to select the first, but do so with care - switching to the second option in a production system will be a nightmare.
The second one is better. The first is a violation of First Normal Form:
http://en.wikipedia.org/wiki/First_normal_form#Repeating_groups_across_columns
The second one is better. It's the typical many-to-many case with a join table.
If you do it the first way, what are you going to do when new categories 4, 5, 6, 7, 8... come along? Add new columns to the table?
Also, I don't know whether you have a requirement like "how many posts have category option 'c'?"
With the second structure such statistics are easy, but with the first one...
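To make the statistics point concrete, here is a minimal sketch using Python's built-in sqlite3 module and the sample rows from the question. The column names post_id and option_name are assumptions chosen for illustration (the question calls them post and option); nothing here is specific to a particular database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Structure 1: one column per category slot
    CREATE TABLE table1 (
        postid INTEGER PRIMARY KEY,
        category1 TEXT, category2 TEXT, category3 TEXT
    );
    INSERT INTO table1 VALUES (2, 'a', 'b', 'd'), (3, 'a', 'c', NULL);

    -- Structure 2: one row per (post, option) pair
    CREATE TABLE option_post (
        post_id INTEGER NOT NULL,
        option_name TEXT NOT NULL,
        PRIMARY KEY (post_id, option_name)
    );
    INSERT INTO option_post VALUES (2, 'a'), (2, 'b'), (2, 'd'), (3, 'a'), (3, 'c');
""")

# Structure 1: every category column has to be listed by hand,
# and the query changes whenever a column is added.
wide_count = conn.execute(
    "SELECT COUNT(*) FROM table1 "
    "WHERE category1 = ? OR category2 = ? OR category3 = ?",
    ("c", "c", "c"),
).fetchone()[0]

# Structure 2: a single predicate, however many categories exist.
narrow_count = conn.execute(
    "SELECT COUNT(*) FROM option_post WHERE option_name = ?", ("c",)
).fetchone()[0]

# Structure 2: counts for every option come from one GROUP BY.
per_option = dict(conn.execute(
    "SELECT option_name, COUNT(*) FROM option_post GROUP BY option_name"
).fetchall())
```

Both counting queries return the same number for 'c', but only the second one keeps working unchanged when a fourth or fifth category appears.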
category_option must have a category id so that option_post can act as a junction table; otherwise there is no point in creating the second table structure.
There are two things you can achieve with that structure:
1) Make a one-to-many relationship from post to category (this means you can add more categories in the future if needed).
2) NULL values are automatically avoided, so there is no more handling of NULLs in the table or in SQL queries.
Hope this helps.
Using MySQL, I have a table of users, a table of matches (updated with the actual results) and a table called users_picks (at first it's always going to be 10 football matches per gameweek per league because there's only one league as of now, but more leagues will come along eventually, and some of them only have 8 matches per gameweek).
In the users_picks table, should I store each 'pick' (by pick I mean both 'hometeam score' and 'awayteam score') in a different row, or have all 10 picks in one single row? Both with an FK for user and gameweek. All picks in one row would mean I had columns with appended numbers like this:
Option 1: [pick_id, user_id, league_id, gameweek_id, match1_hometeam_score, match1_awayteam_score, match2_hometeam_score, match2_awayteam_score ... etc]
and that option doesn't quite fill me with joy, and looks a bit stupid. Especially since there's going to be lots of potential NULLs in the db. The second option would mean eventually millions of rows. But would look like this:
Option 2: [pick_id, user_id, league_id, gameweek_id, match_id, hometeam_score, awayteam_score]
What's the best practice? And would it be a PITA to do all sorts of statistics using the second option? eg. Calculating how many matches a user has hit correctly in a specific round, how many alltime correct hits etc.
If I'm not making much sense, I'll try to elaborate. I just want my table design to be good from the start, so I won't have a huge headache in a couple of months.
Thanks in advance.
The second choice is much better than the first. This is called database normalisation and makes querying easier, not harder. I would suggest reading the linked article, and the related descriptions of the various "normal forms", and aiming for a 3rd Normal Form data structure as a minimum.
To see the flaw in your first option, imagine that a new league with 11 matches per gameweek is added later. Or 400.
You should read up about database normalization.
When you have a 1:n relation, like in your case one team having many matches, you would create two tables. One table "teams" and a second table "matches" where each row includes the ID of the team which played the match.
In the same manner you should also have separate tables for users, picks and leagues.
Option two is better, provided you INDEX your table properly, since (as you indicate) it will grow quite large. The pick_id is the primary key, but also create an INDEX on the user_id field, as likely the most common query will be
SELECT * FROM `users_picks` WHERE `user_id`=?;
to get all the picks for a given user.
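As for whether statistics are painful with option 2: they are usually a join plus a GROUP BY. The sketch below uses Python's sqlite3 module rather than MySQL so that it runs anywhere; the columns of the matches table and the index name are assumptions, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed shape of the matches table: only the actual result matters here.
    CREATE TABLE matches (
        match_id       INTEGER PRIMARY KEY,
        hometeam_score INTEGER,
        awayteam_score INTEGER
    );
    -- Option 2 from the question: one row per pick.
    CREATE TABLE users_picks (
        pick_id        INTEGER PRIMARY KEY,
        user_id        INTEGER NOT NULL,
        league_id      INTEGER NOT NULL,
        gameweek_id    INTEGER NOT NULL,
        match_id       INTEGER NOT NULL REFERENCES matches (match_id),
        hometeam_score INTEGER NOT NULL,
        awayteam_score INTEGER NOT NULL
    );
    CREATE INDEX idx_users_picks_user_id ON users_picks (user_id);

    INSERT INTO matches VALUES (1, 2, 1), (2, 0, 0), (3, 3, 1);
    INSERT INTO users_picks
        (user_id, league_id, gameweek_id, match_id, hometeam_score, awayteam_score)
    VALUES (7, 1, 1, 1, 2, 1),  -- exact hit
           (7, 1, 1, 2, 1, 0),  -- miss
           (7, 1, 2, 3, 3, 1),  -- exact hit
           (8, 1, 1, 1, 0, 0);  -- miss
""")

# A pick counts as correct when both scores equal the actual result.
CORRECT_PICKS = """
    FROM users_picks p
    JOIN matches m ON m.match_id = p.match_id
    WHERE p.user_id = ?
      AND p.hometeam_score = m.hometeam_score
      AND p.awayteam_score = m.awayteam_score
"""

def hits_per_gameweek(conn, user_id):
    """Return [(gameweek_id, correct_picks), ...] for one user."""
    sql = ("SELECT p.gameweek_id, COUNT(*)" + CORRECT_PICKS +
           " GROUP BY p.gameweek_id ORDER BY p.gameweek_id")
    return conn.execute(sql, (user_id,)).fetchall()

def all_time_hits(conn, user_id):
    """Return the total number of correct picks for one user."""
    return conn.execute("SELECT COUNT(*)" + CORRECT_PICKS, (user_id,)).fetchone()[0]

per_week = hits_per_gameweek(conn, 7)
total = all_time_hits(conn, 7)
```

With option 1 the same comparison would have to be repeated for every matchN column pair, and rewritten whenever a league with a different number of matches is added.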
This is a tough design question for an application I'm working on. I have 2 different items in my app that will both use comments, but I can't decide how to design my database.
There are 2 possibilities here. The first is a different comment table for every table that needs comments (normalized way):
movies -> movie_comments
articles -> article_comments
The second way I was thinking of was to use a generic comments table and then have a many-to-many relationship for the comment and movie|article relations. E.g.:
comments
comments_movies (movie_id, comment_id)
comments_articles (article_id, comment_id)
In your opinion, which method would be best, and can you give a good reason so I can decide?
I personally opt for the second solution:
comments
comments_movies (movie_id, comment_id)
comments_articles (article_id, comment_id)
It is much simpler to maintain only one table for the logical Comment model. For example, when you want to add a feature to comments you do it just once, and counting the comments for a specific user is much easier because they are all in one table.
Of course someone else could list the advantages of keeping comments in multiple tables, but you asked for opinions, so here is mine :)
Keeping them separate has the benefit of supporting change without impacting the comments for the other entity (movie vs articles). Assuming there are differences in attributes for a comment against an article vs. a movie. Otherwise...
I suppose there could be a need for displaying a comment with an article and a movie. But the consolidation would also support if you want to provide comment functionality for other entities in the future.
The answer depends on what you need currently, and a best guess of what you want to do in the future. More details help us to know what to suggest.
There is no "best" method, because it is a straight-forward Normalisation question: the proposal is either correctly Normalised or it is not.
Actually, the first option is not Normalised, the Normalisation is not complete. You have identical repeating groups of columns in two tables which have not been identified and grouped into a single table.
The second option is Normalised. You have identified that, and placed them in a single table.
At the logical level, then, you have a many-to-many relation (not a table) between Movie and Comment, and between Article and Comment. End of story at the logical level.
At the physical level, where n::n relations are implemented as Associative tables, you have CommentMovie and CommentArticle.
As the Db expands and grows, life is simple, because:
any new column that is 1::1 with Movie.PK is placed in Movie
any new column that is 1::1 with Article.PK is placed in Article
any new column that is 1::1 with Comment.PK is placed in Comment
any new column that is 1::1 with CommentArticle.PK (the relation; PK is as shown (ArticleId, CommentId) ) is placed in CommentArticle. This (adding attributes to an n::n relation) will now cause the table to show up on the Logical model.
any new column that is 1::1 with CommentMovie.PK (the relation; PK is as shown (MovieId, CommentId) ) is placed in CommentMovie. This (adding attributes to an n::n relation) will now cause the table to show up on the Logical model.
I would suggest your second choice:
movies -> movie_comments -> comments
articles -> article_comments -> comments
One comments table, two pivot tables(many to many).
This keeps all the comment data in one table and links it only loosely to movies and articles. Where you can get away with joins, I usually recommend them for things that don't need to scale, because joins can cost performance and become a nightmare in some cases. For your case, though, this would be the best approach.
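A minimal sketch of the single comments table with two junction tables, again using Python's sqlite3 module. The title, user_id and body columns and the sample rows are assumptions added for illustration; only the table names and key columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies   (movie_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- One table holds every comment, whatever it is attached to.
    CREATE TABLE comments (
        comment_id INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        body       TEXT NOT NULL
    );

    -- Junction tables carry only the relationship.
    CREATE TABLE comments_movies (
        movie_id   INTEGER NOT NULL REFERENCES movies (movie_id),
        comment_id INTEGER NOT NULL REFERENCES comments (comment_id),
        PRIMARY KEY (movie_id, comment_id)
    );
    CREATE TABLE comments_articles (
        article_id INTEGER NOT NULL REFERENCES articles (article_id),
        comment_id INTEGER NOT NULL REFERENCES comments (comment_id),
        PRIMARY KEY (article_id, comment_id)
    );

    INSERT INTO movies   VALUES (1, 'Alien');
    INSERT INTO articles VALUES (1, 'Indexing basics');
    INSERT INTO comments VALUES (1, 10, 'Great film'),
                                (2, 11, 'Scary'),
                                (3, 10, 'Useful read');
    INSERT INTO comments_movies   VALUES (1, 1), (1, 2);
    INSERT INTO comments_articles VALUES (1, 3);
""")

# Comments for one movie: a single join through the junction table.
movie_comments = [row[0] for row in conn.execute(
    "SELECT c.body FROM comments c "
    "JOIN comments_movies cm ON cm.comment_id = c.comment_id "
    "WHERE cm.movie_id = ? ORDER BY c.comment_id",
    (1,),
)]

# Comments per user across movies and articles: no UNION needed,
# because every comment lives in the same table.
comments_by_user = dict(conn.execute(
    "SELECT user_id, COUNT(*) FROM comments GROUP BY user_id"
).fetchall())
```

With separate movie_comments and article_comments tables, the per-user count would need a UNION ALL over both tables, and over every further table added later.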
comment_table
-------------
comment_id (int)
object_id (int)
comment (varchar(max))
type (int)
--------------
object_id refers to the commented object, such as a movie or an article.
type = 1: the comment was made on a movie
type = 2: the comment was made on an article
You can design your tables like this.
I am currently writing a webapp in Rails where users can mark items as favorites and also block them. I came up with two ways and wondered which one is the more common/better way.
1. Separate join tables
Would it be wise to have 2 tables for this? Like:
users_favorites
- user_id
- item_id
users_blocked
- user_id
- item_id
2. single table
users_marks (or so)
- users_id
- item_id
- type (["fav", "blk"])
Both ways seem to have advantages. Which one would you use and why?
The second one has at least the advantage (if the primary key is users_id + item_id) to make sure that no user will have an item both as favorited and blocked.
I suppose I would go with the second solution, especially since the two tables in the first solution would have the same structure, which seems strange. It also lets you keep all the information in the same place, which might help in some cases (reporting, for instance).
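The primary-key argument can be checked directly. This is a small sketch with Python's sqlite3 module; the CHECK constraint on type is an addition of mine, not something the question specified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users_marks (
        users_id INTEGER NOT NULL,
        item_id  INTEGER NOT NULL,
        type     TEXT    NOT NULL CHECK (type IN ('fav', 'blk')),
        PRIMARY KEY (users_id, item_id)  -- at most one mark per user and item
    )
""")
conn.execute("INSERT INTO users_marks VALUES (1, 42, 'fav')")

# Trying to block an item that is already a favorite violates the primary key.
try:
    conn.execute("INSERT INTO users_marks VALUES (1, 42, 'blk')")
    conflict_rejected = False
except sqlite3.IntegrityError:
    conflict_rejected = True

# Changing the mark is an UPDATE of the existing row instead.
conn.execute(
    "UPDATE users_marks SET type = 'blk' WHERE users_id = ? AND item_id = ?",
    (1, 42),
)
marks = conn.execute("SELECT users_id, item_id, type FROM users_marks").fetchall()
```

With two separate tables the same guarantee would need application code or a trigger, since a primary key cannot span tables.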
I would go with #2.
It leaves all the appropriate data in a single table.
Otherwise you might have to resort to a union or distinct joins to get a full list of details.
It's just a different status of an item, so #2 will do the job. What would you do if it would be colors? Two different tables? I don't think so ;)
Edit: You might want the status in a different table, linked with a foreign key, but that's up to you. It depends on how many different statuses you expect to have. Just these two, or many others as well?
Let's say I'm making a program for an English class. I'd like to store data in this way:
ID Object
0 Present Tense
1 1st person singular
2 To Be
3 I am
How can I retrieve the value for ID 3 based on IDs 0-2? The only thing I can think of is:
ID Object FromIDs
3 I am 0,1,2
The problem with this is that I'd have to do a fulltext index and I think this table is going to get pretty large. I don't want separate tables for different types of objects, if possible, because I don't know what I'll end up doing with these objects and I want as much flexibility as possible. I could have a second table relating IDs to each other, but I've only done that successfully when relating a column from one table to a column in another.
Thanks in advance!
You need to break the data into different tables. Have a table that stores the "tense" and another that stores the type "1st person singular".
Can you explain your problem a little more? From what you have, I'm not sure whether you're trying to go down the path of Entity-Attribute-Value or, more likely, whether a relational database is simply not a good fit for your problem; you may need to use some sort of tree data structure. If you update, I can try to provide a better answer.
What I've decided to do is a combination of what was suggested and what I originally thought. I'm going to have a master list with IDs that are auto-incremented and copied to other tables. That way, I have properties of different parts of speech separated, but still have everything relating to everything else.
This is really not a good fit for a relational database. Sorry, you're trying to drive a nail using a screwdriver.
When you have no distinction between an attribute type and a value, you're modeling semantic data. The open standard for this type of data modeling is RDF.
My solution (if you really don't want to break up the table):
**ParentChildTable**
ParentID ChildID
0 3
1 3
2 3
But then in one table you now have:
- types of tense
- types of person (1st, 3rd, ...)
- values
So I think it would be better to split. Right now I can see 3 tables (values, tensetypes, personetypes) plus a relationship table (value-value for tense/person).
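With a parent/child table like the one above, "retrieve the value for ID 3 based on IDs 0-2" becomes a GROUP BY ... HAVING query (sometimes called relational division) instead of a full-text search over a comma-separated column. A minimal sketch with Python's sqlite3 module follows; the table and column names are assumptions, and rows 4 and 5 are invented so that there is a second child to tell apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, object TEXT NOT NULL);
    INSERT INTO objects VALUES
        (0, 'Present Tense'), (1, '1st person singular'),
        (2, 'To Be'),         (3, 'I am'),
        (4, '3rd person singular'), (5, 'He is');  -- hypothetical extra rows

    CREATE TABLE parent_child (
        parent_id INTEGER NOT NULL REFERENCES objects (id),
        child_id  INTEGER NOT NULL REFERENCES objects (id),
        PRIMARY KEY (parent_id, child_id)
    );
    INSERT INTO parent_child VALUES (0, 3), (1, 3), (2, 3),
                                    (0, 5), (4, 5), (2, 5);
""")

def find_children(conn, parent_ids):
    """Return the objects that have every one of parent_ids as a parent."""
    wanted = sorted(set(parent_ids))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT o.object
        FROM parent_child pc
        JOIN objects o ON o.id = pc.child_id
        WHERE pc.parent_id IN ({placeholders})
        GROUP BY pc.child_id, o.object
        HAVING COUNT(DISTINCT pc.parent_id) = ?
        ORDER BY pc.child_id
    """
    return [row[0] for row in conn.execute(sql, [*wanted, len(wanted)])]

first_person = find_children(conn, [0, 1, 2])
third_person = find_children(conn, [0, 4, 2])
```

The primary key on (parent_id, child_id) doubles as the index this lookup needs, so no full-text index is involved.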
I have the following parent <-> child datamodel:
(almost every line is a table, indented means child-of)
consumerGoods
    food
        meat
            item
        fruit
            item
        vegetable
            item
The child-items of meat, fruit and vegetables are in the same table (named items) because they have identical attributes. In the items table I have fields that describes the parent and the parentId.
So an item record could be:
id:1
parentType:meat
parentId:4
price:3.25
expDate:2009-12-31
description:bacon
I'm now building a full text MySQL search for the contents of the description field in "items", but I also want each result to have the information of its parent table, so a "bacon-item" has the data that's in its parent record. I also want each returned result to have data that is in the parent food record and the parent consumerGoods record.
I've got the following query now, but I don't know how to join based on the value of a field in a record, or if that's even possible.
SELECT *
FROM item
WHERE MATCH (description) AGAINST ('searchKey')
One way to do this is to run multiple queries for each matching "item" record, but if I had a lot of results that would be a lot of queries and would also slow down any filtering I'd want to do for facet-based searching. Another option is to make a new table that contains all the parent item info for each item record and search through that, but then I'd have to constantly update that table whenever I add item records, which is redundant and quite some work.
I'd like to hear it if I'm thinking in the right direction, or if I'm totally misguided. Any suggestions welcome.
As a general rule of thumb your database structure should contain data, but should not itself be data. A sign that you're breaking this is when you feel that you have to join to a different table based on the data you're reading from some other table. At that point you need to back up and consider your overall data model because odds are very good that you're doing something not quite right.
You could join against a subquery containing the union of all parent types:
select *
from item
left join (
    select 'meat' as type, id, Redness, '' as Ripeness from meat
    union all
    select 'fruit' as type, id, -1 as Redness, Ripeness from fruit
    union all
    select 'vegetable' as type, id, -1 as Redness, Ripeness from vegetable
) parent on parent.type = item.parentType
        and parent.id = item.parentId  -- assumes each parent table has an id column
But if you can, redesign the database. Instead of the complex model, change it to one table of Items and one table of Categories. The categories should contain one row for meat, one for fruit, and one for vegetables.
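A rough sketch of that redesign, using Python's sqlite3 module. The column names are assumptions based on the item record in the question, the banana row is invented, and LIKE stands in for MySQL's MATCH ... AGAINST full-text search so that the example runs on plain SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per category instead of one table per category.
    CREATE TABLE categories (
        category_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL UNIQUE
    );
    CREATE TABLE items (
        id          INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES categories (category_id),
        price       REAL,
        exp_date    TEXT,
        description TEXT
    );
    INSERT INTO categories VALUES (1, 'meat'), (2, 'fruit'), (3, 'vegetable');
    INSERT INTO items VALUES (1, 1, 3.25, '2009-12-31', 'bacon'),
                             (2, 2, 1.10, '2009-11-15', 'banana');
""")

def search_items(conn, term):
    """Return (description, price, category name) for matching items."""
    return conn.execute(
        "SELECT i.description, i.price, c.name "
        "FROM items i "
        "JOIN categories c ON c.category_id = i.category_id "
        "WHERE i.description LIKE ? "
        "ORDER BY i.id",
        (f"%{term}%",),
    ).fetchall()

bacon_hits = search_items(conn, "bacon")
```

The join target is now a fixed table and an ordinary foreign key, so the parent data arrives in the same query as the search result and no per-row follow-up queries are needed.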
Since your example is contrived, it's difficult to know what the actual information requirements are in your case. Damir's diagram shows you the correct way to model PKs and FKs when you have a super-type sub-type relationships.
This situation is one case of a pattern called "generalization-specialization". Almost any treatment of object modeling will deal with generalization-specialization, although it may use different terminology. However, if you want to find articles that help you build a relational database that uses specialization-generalization, search for "generalization specialization relational modeling".
The best of the articles will start by teaching you the same concept that Damir's response illustrated for you. From there, you will learn how to create queries and views that can search for either all kinds of items, or for particular kinds of items, if you know what you are searching for.
A sample view follows:
create view FruitItems as
select
    c.ConsumerGoodsID,
    Price,
    Description,
    ConsumerGoodType,
    ExpiryDate,
    FoodType,
    IsTropic
from
    ConsumerGoods c
    INNER JOIN Food f on f.ConsumerGoodsID = c.ConsumerGoodsID
    INNER JOIN Fruit fr on fr.ConsumerGoodsID = c.ConsumerGoodsID
Similarly, you could create views for VegetableItems, MeatItems, and HouseSupplyItems, and even one large view, namely Items, that's the union of each of the specialized views.
In the Items view IsTropic would be true for all tropical fruits, false for all non tropical fruits, and null for Meats, Vegetables, and HouseSupplies. I'm not going to show you the entire Item view for a contrived case, but you get the idea. Especially if you read the best of the articles on relational modeling of this pattern.
The Items view might be a little slow, but it could come in handy when you really don't know any better way to search. And if you search for IsTropic = True, you'll automatically exclude all the Meats, Vegetables, and HouseSupplies.
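For readers who want to try the idea, here is a cut-down sketch of such an Items view using Python's sqlite3 module. It is not the full model: the Meat table, the reduced column lists and the sample rows are all assumptions, and only two specialized views are combined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Super-type, intermediate type and two subtypes sharing one key.
    CREATE TABLE ConsumerGoods (ConsumerGoodsID INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE Food  (ConsumerGoodsID INTEGER PRIMARY KEY
                            REFERENCES ConsumerGoods (ConsumerGoodsID),
                        ExpiryDate TEXT);
    CREATE TABLE Fruit (ConsumerGoodsID INTEGER PRIMARY KEY
                            REFERENCES Food (ConsumerGoodsID),
                        IsTropic INTEGER NOT NULL);
    CREATE TABLE Meat  (ConsumerGoodsID INTEGER PRIMARY KEY
                            REFERENCES Food (ConsumerGoodsID));

    INSERT INTO ConsumerGoods VALUES (1, 'bacon'), (2, 'banana'), (3, 'apple');
    INSERT INTO Food  VALUES (1, '2009-12-31'), (2, '2009-11-15'), (3, '2009-11-20');
    INSERT INTO Meat  VALUES (1);
    INSERT INTO Fruit VALUES (2, 1), (3, 0);

    CREATE VIEW FruitItems AS
        SELECT c.ConsumerGoodsID, c.Description, f.ExpiryDate, fr.IsTropic
        FROM ConsumerGoods c
        JOIN Food  f  ON f.ConsumerGoodsID  = c.ConsumerGoodsID
        JOIN Fruit fr ON fr.ConsumerGoodsID = c.ConsumerGoodsID;

    -- Meat has no IsTropic attribute, so the view supplies NULL for it.
    CREATE VIEW MeatItems AS
        SELECT c.ConsumerGoodsID, c.Description, f.ExpiryDate, NULL AS IsTropic
        FROM ConsumerGoods c
        JOIN Food f ON f.ConsumerGoodsID = c.ConsumerGoodsID
        JOIN Meat m ON m.ConsumerGoodsID = c.ConsumerGoodsID;

    -- The general view is the union of the specialized ones.
    CREATE VIEW Items AS
        SELECT * FROM FruitItems
        UNION ALL
        SELECT * FROM MeatItems;
""")

all_items = conn.execute(
    "SELECT Description, IsTropic FROM Items ORDER BY ConsumerGoodsID"
).fetchall()

# NULL never equals 1, so meat rows drop out of this search automatically.
tropical = [row[0] for row in conn.execute(
    "SELECT Description FROM Items WHERE IsTropic = 1"
)]
```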
As @Andomar suggested, the design is a bit off; having "multiple parent tables" does not map to the DB foreign key concept. Here is one possible suggestion. This one uses two levels of super-type/subtype relationships. The super-type table contains the columns common to all subtypes (categories), while each subtype table contains the columns specific to that category only.