How to properly design a simple favorites and blocked table? - mysql

i am currently writing a webapp in rails where users can mark items as favorites and also block them. I came up two ways and wondered which one is more common/better way.
1. Separate join tables
Would it be wise to have 2 tables for this? Like:
users_favorites
- user_id
- item_id
users_blocked
- user_id
- item_id
2. single table
users_marks (or so)
- users_id
- item_id
- type (["fav", "blk"])
Both ways seem to have advantages. Which one would you use and why?

The second one has at least the advantage (if the primary key is users_id + item_id) to make sure that no user will have an item both as favorited and blocked.
I suppose I would got with that second solution -- especially considering the two tables, in the first solution, would have the same structure, which seems strange ; and it also allows you to have all the information in the same place, which might help, in some cases (reporting, for instance ? ).

I would go with #2.
It leaves all the appropriate data in a single table.
Otherwise you might have to resort to a union or distinct joins to get a full list of details.

It's just a different status of an item, so #2 will do the job. What would you do if it would be colors? Two different tables? I don't think so ;)
Edit: You might want the status in a different table and link it with a foreign key, but that's up to you. It depend on how many different status you expect to have. Just these two or many others as well?

Related

How to set up relational database tables for this many-to-many relationship?

I have a type of data called a chain. Each chain is made up of a specific sequence of another type of data called a step. So a chain is ultimately made up of multiple steps in a specific order. I'm trying to figure out the best way to set this up in MySQL that will allow me to do the following:
Look up all steps in a chain, and get them in the right order
Look up all chains that contain a step
I'm currently considering the following table set up as the appropriate solution:
TABLE chains
id date_created
TABLE steps
id description
TABLE chains_steps (this would be used for joins)
chain_id step_id step_position
In the table chains_steps, the step_position column would be used to order the steps in a chain correctly. It seems unusual for a JOIN table to contain its own distinct piece of data, such as step_position in this case. But maybe it's not unusual at all and I'm just inexperienced/paranoid.
I don't have much experience in all this so I wanted to get some feedback. Are the three tables I suggested the correct way to do this? Are there any viable alternatives and if so, what are the advantages/drawback?
You're doing it right.
Consider a database containing the Employees and Projects tables, and how you'd want to link them in a many-to-many fashion. You'd probably come up with an Assignments table (or Project_Employees in some naming conventions).
At some point you'd decide you want not only to store each project assignment, but you'd also want to store when the assignment started, and when it finished. The natural place to put that is in the assignment itself; it doesn't make sense to store it either with the project or with the employee.
In further designs you might even find it necessary to store further information about the assignment, for example in an employee review process you may wish to store feedback related to their performance in that project, so you'd make the assignment the "one" end of a relationship with a Review table, which would relate back to Assignments with a FK on assignment_id.
So in short, it's perfectly normal to have a junction table that has its own data.
That looks fine, and it's not unusual for the join table to contain a position/rank field.
Look up all steps in a chain, and get them in the right order
SELECT * FROM chains_steps
LEFT JOIN steps ON steps.id = chains_steps.step_id
WHERE chains_steps.chain_id = ?
ORDER BY chains_steps.step_position ASC
Look up all chains that contain a step
SELECT DISTINCT chain_id FROM chains_steps
LEFT JOIN chains ON chains.id = chains_steps.chain_id
I think that the plan you've outlined is the correct approach. Don't worry too much about the presence of step_position on your mapping table. After all the step_position is a bit of data that is directly related to a step in the context of a chain. So the chains_steps table is the right place for it IMHO.
Some things to think about:
Foreign keys - use 'em!
Unique key on the chains_steps table - can a step be present in more than one position in a single chain? What about in different chains?
Good luck!

How to efficiently design MySQL database for my particular case

I am developing a forum in PHP MySQL. I want to make my forum as efficient as I can.
I have made these two tables
tbl_threads
tbl_comments
Now, the problems is that there is a like and dislike button under the each comment. I have to store the user_name which has clicked the Like or Dislike Button with the comment_id. I have made a column user_likes and a column user_dislikes in tbl_comments to store the comma separated user_names. But on this forum, I have read that this is not an efficient way. I have been advised to create a third table to store the Likes and Dislikes and to comply my database design with 1NF.
But the problem is, If I make a third table tbl_user_opinion and make two fields like this
1. comment_id
2. type (like or dislike)
So, will I have to run as many sql queries as there are comments on my page to get the like and dislike data for each comment. Will it not inefficient. I think there is some confusion on my part here. Can some one clarify this.
You have a Relational Scheme like this:
There are two ways to solve this. The first one, the "clean" one is to build your "like" table, and do "count(*)'s" on the appropriate column.
The second one would be to store in each comment a counter, indicating how many up's and down's have been there.
If you want to check, if a specific user has voted on the comment, you only have to check one entry, wich you can easily handle as own query and merge them two outside of your database (for this use a query resulting in comment_id and kind of the vote the user has done in a specific thread.)
Your approach with a comma-seperated-list is not quite performant, due you cannot parse it without higher intelligence, or a huge amount of parsing strings. If you have a database - use it!
("One Information - One Dataset"!)
The comma-separate list violates the principle of atomicity, and therefore the 1NF. You'll have hard time maintaining referential integrity and, for the most part, querying as well.
Here is one way to do it in a normalized fashion:
This is very clustering-friendly: it groups up-votes belonging to the same comment physically close together (ditto for down-votes), making the following query rather efficient:
SELECT
COMMENT.COMMENT_ID,
<other COMMENT fields>,
COUNT(DISTINCT UP_VOTE.USER_ID) - COUNT(DISTINCT DOWN_VOTE.USER_ID) SCORE
FROM COMMENT
LEFT JOIN UP_VOTE
ON COMMENT.COMMENT_ID = UP_VOTE.COMMENT_ID
LEFT JOIN DOWN_VOTE
ON COMMENT.COMMENT_ID = DOWN_VOTE.COMMENT_ID
WHERE
COMMENT.COMMENT_ID = <whatever>
GROUP BY
COMMENT.COMMENT_ID,
<other COMMENT fields>;
[SQL Fiddle]
Please measure on realistic amounts of data if that works fast enough for you. If not, then denormalize the model and cache the total score in the COMMENT table, and keep it current it through triggers every time a new row is inserted to or deleted from *_VOTE tables.
If you also need to get which comments a particular user voted on, you'll need indexes on *_VOTE {USER_ID, COMMENT_ID}, i.e. the reverse of the primary/clustering key above.1
1 This is one of the reasons why I didn't go with just one VOTE table containing an additional field that can be either 1 (for up-vote) or -1 (for down-vote): it's less efficient to cover with secondary indexes.

MySQL table design, one row or more pr user?

Using MySQL I have table of users, a table of matches (Updated with the actual result) and a table called users_picks (at first it's always going to be 10 football matches pr. gameweek pr. league because there's only one league as of now, but more leagues will come along eventually, and some of them only have 8 matches pr. gameweek).
In the users_picks table should i store each 'pick' (by pick I mean both 'hometeam score' and 'awayteam score') in a different row, or have all 10 picks in one single row? Both with a FK for user and gameweek. All picks in one row would mean I had columns with appended numbers like this:
Option 1: [pick_id, user_id, league_id, gameweek_id, match1_hometeam_score, match1_awayteam_score, match2_hometeam_score, match2_awayteam_score ... etc]
and that option doesn't quite fill me with joy, and looks a bit stupid. Especially since there's going to be lots of potential NULLs in the db. The second option would mean eventually millions of rows. But would look like this:
Option 2: [pick_id, user_id, league_id, gameweek_id, match_id, hometeam_score, awayteam_score]
What's the best practice? And would it be a PITA to do all sorts of statistics using the second option? eg. Calculating how many matches a user has hit correctly in a specific round, how many alltime correct hits etc.
If I'm not making much sense, I'll try to elaborate anything. I just wan't my table design to be good from the start, so I won't have a huge headache in a couple of months.
Thanks in advance.
The second choice is much better than the first. This is called database normalisation and makes querying easier, not harder. I would suggest reading the linked article, and the related descriptions of the various "normal forms", and aiming for a 3rd Normal Form data structure as a minimum.
To see the flaw in your first option, imagine if there were to be included later a new league with 11 matches. Or 400.
You should read up about database normalization.
When you have a 1:n relation, like in your case one team having many matches, you would create two tables. One table "teams" and a second table "matches" where each row includes the ID of the team which played the match.
In the same manner you should also have separate tables for users, picks and leagues.
Option two is better, provided you INDEX your table properly, since (as you indicate) it will grow quite large. The pick_id is the primary key, but also create an INDEX on the user_id field, as likely the most common query will be
SELECT * FROM `users_pics` WHERE `user_id`=?;
to get all the picks for a given user.

Keep first of duplicate records and delete the rest

This question does pretty much what I want to accomplish, but my table is more complicated and does not have a primary key. I also don't quite understand the top answer, what the t1 and t2 mean. If this answer can be applicable to me, would appreciate if someone explain the code.
I have several months' tables that contain info on clients and the policies they hold. Every client has a unique policy ID, but they can have multiple policies, resulting in multiple records under the same policy ID. The duplicate records can be completely different or exactly the same in every field.
For my purposes, I want to keep only one record for each policy ID. Ideally the record kept is the one with the highest Age, but does not need to if it's too complicated. Note there may be more than one record with the age that is the max for that particular Policy ID, then it doesn't matter which one of those we keep.
I do not plan on creating a primary key because there are some cases when I will be keeping two records under the same policy ID, which I will make the modification to the code myself. I also don't want to create another table because I am working with 10+ tables. Someone suggested using first(), but I'm not sure how to incorporate it into a query.
Please let me know if you need any additional information, and thank you for your help in advance!
=========UPDATE #1
Okay, looks like my question was a bit unrealistic, so I will add an autonumber primary key. How will I proceed with that?
Something on these lines:
DELETE Policies.*
FROM Policies
WHERE Policies.ID Not In (
SELECT TOP 1 id
FROM policies p
WHERE p.policyid = policies.policyid
ORDER BY createdate DESC, id )

MySQL: Table structure for a user's "views"

I've got a question to which I've had opposing pieces of advice, would appreciate additional views.
My site has users, each with a user_id. These users can view products, and I need to keep track of the unique instances of users viewing specific products. To record a view in a separate views table, I've currently got two options:
OPTION 1:
view_id (INT,PK) | user_id (INT,FK) | product_id (INT,FK) | view_date
... and create a unique constraint over the two middle columns for easy updating with ON DUPLICATE KEY. If the same view already exists, I just update view_date. If not, I write a new row.
OPTION 2:
user_product (VARCHAR20,PK) | view_date
... merge the two ids into a VARCHAR with a separator in the middle, and use the primary key column for easy updating with ON DUPLICATE KEY in the same way as above.
The structure should accommodate up to approx. million unique views. Any thoughts on which option might be better or worse, and why? Big thanks in advance.
EDIT:
Thanks for the answers, seems like there's a consensus. Was leaning to the same side but just needed the reassurance.
I like the first option better - in general, its good to maintain as much atomicity as possible. If you ever want to query for all of a user's views, or something like that, it would be more difficult to do after merging two columns into one (you would need to use LIKE with a wildcard match, which will never be as fast as an indexed single-valued column). You also lose the ability to index on different fields.
Also, there is no reason why you couldnt have a primary or unique key that involved multiple columns, so I see no advantage to option 2. To perform your update, just use REPLACE (documentation) instead of INSERT - this will allow you to easily maintain your invariant of having only one row per user/product combination.
I think that the first option is your better choice. Later down the line I think it will make querying for different things a bit easier. Queries will likely be faster as well since there won't be string manipulation involved. Further, you can have a primary key over multiple columns if you need.
Definitely go for the first option. The second option will mean many queries from hell if you need to make reports to look for particular groups of users (get me all users that often view product X and product Y so we can offer them a discount), same for looking for specific groups of products (which products are often viewed by the same users, so we can launch a discount promotion)
I understand that it is not a requirement to remember all individual views. But I would certainly capture the number of times they visited the product - this is almost free, as you can keep a running total (insert 1 , on duplicate key update view_count = view_count + 1)