MySQL: How can I avoid saving the same data twice?

I made a voting system that records each vote as a row in a vote table, like this:
post_id | user_id | votes | date
Then I also add the vote to a user-meta table, which has this structure:
user_id | meta_key | meta_value
In this table, given a user_id and the meta_key "votes", I can save or get an array of post_ids as the meta_value.
I save each vote in two tables for these reasons:
1. Why use the Vote table?
I need to calculate vote_total and vote_popularity across all posts; this is not easy to do if I save each vote only as an element of an array in the user's meta table.
2. Why use the user meta table?
I need to check whether a user has already voted for a post. The user meta is cached once set, so I only need to check in_array($post_id, $my_voted_posts). Without the user-meta table, I would have to query the vote table for every post the user visits.
Is there a better way?
My voting system works fine, but I feel this is not an efficient way of doing it. I would like to consult you professionals to learn how I can improve it for better performance. Thanks!

To start, I think checking the vote table is fine if you have an efficient index. You can use a composite index on (post_id, user_id), which will make a query like select * from votes where post_id = X and user_id = Y fairly performant.
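A minimal sketch of that composite-index check, assuming Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. Making the index UNIQUE is an assumption added here, not part of the answer; it also stops the same user from voting twice on one post, which addresses the "same data twice" concern at the table level.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; columns mirror the question's
# "post_id | user_id | votes | date" layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE votes (
           post_id INTEGER NOT NULL,
           user_id INTEGER NOT NULL,
           votes   INTEGER NOT NULL,
           date    TEXT    NOT NULL
       )"""
)
# Composite index on both filter columns; UNIQUE is an added assumption
# that also rejects a second vote by the same user on the same post.
conn.execute("CREATE UNIQUE INDEX idx_votes_post_user ON votes (post_id, user_id)")

conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?, ?)",
    [(10, 1, 1, "2020-01-01"), (10, 2, 1, "2020-01-02"), (11, 1, -1, "2020-01-03")],
)
conn.commit()


def has_voted(post_id: int, user_id: int) -> bool:
    """True if this user already has a vote row for this post (one index seek)."""
    (found,) = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM votes WHERE post_id = ? AND user_id = ?)",
        (post_id, user_id),
    ).fetchone()
    return bool(found)


def vote_total(post_id: int) -> int:
    """The same table still serves the per-post totals."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(votes), 0) FROM votes WHERE post_id = ?", (post_id,)
    ).fetchone()
    return total
```

With the sample rows, has_voted(10, 1) is True, has_voted(11, 2) is False, and vote_total(10) is 2.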
There will come a point where you will need to denormalize your data for speed, but I don't think it should live in the same layer as your normalized data. Perhaps you could use Redis or Memcached for the user metadata?
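One way to keep the per-user list out of the normalized tables is a cache-aside layer. The sketch below is hypothetical: a plain dict stands in for Redis or Memcached, SQLite (via Python's sqlite3) stands in for MySQL, and SQLite's INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE. The votes table stays the source of truth; the cache is filled on a miss and updated only after a successful insert.

```python
import sqlite3

# Hypothetical cache-aside layer: a dict stands in for Redis/Memcached,
# mapping user_id -> set of post_ids that user has voted on.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE votes (post_id INTEGER NOT NULL, user_id INTEGER NOT NULL,"
    " PRIMARY KEY (post_id, user_id))"
)
# The primary key leads with post_id, so add an index for per-user reads.
conn.execute("CREATE INDEX idx_votes_user ON votes (user_id)")
voted_cache: dict[int, set[int]] = {}


def voted_posts(user_id: int) -> set[int]:
    """Return the user's voted post_ids, loading them from the table on a miss."""
    if user_id not in voted_cache:
        rows = conn.execute("SELECT post_id FROM votes WHERE user_id = ?", (user_id,))
        voted_cache[user_id] = {post_id for (post_id,) in rows}
    return voted_cache[user_id]


def record_vote(post_id: int, user_id: int) -> bool:
    """Insert the vote (ignored if it exists) and keep any cached set in sync."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO votes (post_id, user_id) VALUES (?, ?)",
        (post_id, user_id),
    )
    conn.commit()
    inserted = cur.rowcount == 1  # 0 when the row already existed
    if inserted and user_id in voted_cache:
        voted_cache[user_id].add(post_id)
    return inserted
```

The per-page check then becomes post_id in voted_posts(user_id), the equivalent of the question's in_array() call, without storing the array in a second table.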

Related

Liked Posts Design Specifics

So I've found through my own research that the best way to design a structure for liking posts is a table like the following. Let's say that, like on Reddit, a post can be upvoted, downvoted, or not voted on at all.
The table would then have three columns: [username, post, liked].
Liked could be a boolean, with 1 indicating liked and 0 indicating disliked.
Then, to find a post's like count, I would run SELECT COUNT(*) FROM likes WHERE post=12341 AND liked=1, for example, then do the same for liked=0 (disliked), and do the arithmetic server-side along with a controversy percentage.
I have a few concerns. First, what would be the appropriate way to find out whether a user liked a post? Should I try to select the liked boolean value and either retrieve it or catch an error? Or should I first check whether the record exists, and then do another select to find out the value? What if I want to check whether a user liked multiple posts at once?
Secondly, would this table not need a primary key? Since no two rows will have the same post and username, should I use a compound primary key?
For performance you will want to alter your database plans:
User Likes Post table
Fields:
Liked should be a boolean, you are right. You can transform this to -1/+1 in your code. You will cache the numeric totals elsewhere.
Username should be UserID. You want only numeric values in this table for speed.
Post should be PostID for the same reason.
You also want a numeric primary key because they're easier to search against, and to perform sub-selects with.
And create a unique index on (UserID, PostID), because this table is mainly an index built for speed.
So did a user vote on a post?
select id
from user_likes_post
where userID = 123 and postID = 456;
Did the user like the post?
select id
from user_likes_post
where userID = 123 and postID = 456 and liked = true;
You don't need to worry about errors: you'll either get a row or you won't, so you might as well go straight to the value you're after:
select liked from user_likes_post where userID = 123 and postID = 456;
Get all the posts they liked:
select postID
from user_likes_post
where userID = 123 and liked = true;
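The table and the three queries above can be tried end to end with this sketch. It uses SQLite through Python's sqlite3 module instead of MySQL (an assumption for portability), stores liked as 0/1 since SQLite has no separate boolean type, and the sample rows are made up.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_likes_post (
        id     INTEGER PRIMARY KEY,          -- numeric surrogate key
        userID INTEGER NOT NULL,
        postID INTEGER NOT NULL,
        liked  INTEGER NOT NULL CHECK (liked IN (0, 1))
    );
    -- one row per (user, post); also serves the per-user lookups below
    CREATE UNIQUE INDEX ux_user_post ON user_likes_post (userID, postID);
    """
)
conn.executemany(
    "INSERT INTO user_likes_post (userID, postID, liked) VALUES (?, ?, ?)",
    [(123, 456, 1), (123, 789, 0), (123, 111, 1), (7, 456, 0)],
)
conn.commit()


def vote_of(user_id: int, post_id: int) -> Optional[bool]:
    """True = liked, False = disliked, None = no row (user has not voted)."""
    row = conn.execute(
        "SELECT liked FROM user_likes_post WHERE userID = ? AND postID = ?",
        (user_id, post_id),
    ).fetchone()
    return None if row is None else bool(row[0])


def liked_posts(user_id: int) -> list[int]:
    """All posts this user liked, in postID order."""
    rows = conn.execute(
        "SELECT postID FROM user_likes_post"
        " WHERE userID = ? AND liked = 1 ORDER BY postID",
        (user_id,),
    )
    return [post_id for (post_id,) in rows]
```

A missing row is simply None, so no error handling is needed for the "not voted" case.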
Post Score table
PostID
TotalLikes
TotalDislikes
Score
This second table will be dumped and refreshed every n minutes by calculating from the first table. It is your cached aggregate score, and it is what you'll actually load for all users visiting that post. Adjust the frequency of this dump-and-repopulate schedule as you see fit: for a small hobby or student project, every 30 seconds to 2 minutes; for bigger sites, every 10 or 15 minutes. For an even bigger site like Reddit, you'd want a more complex schema that lets busier parts of the site refresh faster.
-- computes the Post Score values for one post (postID = 123)
select sum(liked = true)  as totalLikes,
       sum(liked = false) as totalDislikes,
       count(*)           as totalVotes,
       sum(liked = true) / count(*) as score  -- NULL when the post has no votes
from user_likes_post
where postID = 123;
(You can simulate an immediate update using the user's localStorage: client-side JavaScript can show a bump up or down on the posts that user has voted on.)
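A sketch of the dump-and-refresh step, assuming SQLite via Python's sqlite3 in place of MySQL and a table named post_score for the Post Score table. Scheduling (cron or similar) is left out; refresh_scores() would be called every n minutes. Running the DELETE and the INSERT in one transaction should keep readers on other connections from seeing a half-rebuilt table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_likes_post (userID INTEGER, postID INTEGER, liked INTEGER,
                                  PRIMARY KEY (userID, postID));
    CREATE TABLE post_score (postID INTEGER PRIMARY KEY, TotalLikes INTEGER,
                             TotalDislikes INTEGER, Score REAL);
    """
)


def refresh_scores() -> None:
    """Rebuild the cached aggregate table from user_likes_post in one pass."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM post_score")
        conn.execute(
            """
            INSERT INTO post_score (postID, TotalLikes, TotalDislikes, Score)
            SELECT postID,
                   SUM(liked = 1),
                   SUM(liked = 0),
                   -- totalLikes / totalVotes; COUNT(*) >= 1 for every group
                   1.0 * SUM(liked = 1) / COUNT(*)
            FROM user_likes_post
            GROUP BY postID
            """
        )


# Made-up sample votes: post 10 gets 2 likes and 1 dislike, post 20 one dislike.
conn.executemany(
    "INSERT INTO user_likes_post VALUES (?, ?, ?)",
    [(1, 10, 1), (2, 10, 1), (3, 10, 0), (1, 20, 0)],
)
conn.commit()
refresh_scores()
refresh_scores()  # running it again simply rebuilds the same rows
```

After the refresh, post 10 has TotalLikes 2, TotalDislikes 1 and a Score of about 0.67; post 20 has a Score of 0.0.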
Given your suggested 3-column table and the selects you suggest, be sure to have
PRIMARY KEY(username, post) -- helps with "did user like a post"
INDEX(post, liked) -- for that COUNT
When checking whether a user liked a post, either do a LEFT JOIN so that you get one of three values: 1 = liked, 0 = disliked, or NULL = not voted; or use EXISTS( SELECT ... ).
Tables need PKs.
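To make the three-state LEFT JOIN concrete, and to cover the "multiple posts at once" question, here is a sketch in SQLite via Python's sqlite3 (standing in for MySQL). The posts table and its columns are assumptions. The username condition sits in the ON clause rather than in WHERE, so posts the user never voted on still appear, with NULL.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (post INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE likes (
        username TEXT    NOT NULL,
        post     INTEGER NOT NULL,
        liked    INTEGER NOT NULL,
        PRIMARY KEY (username, post)          -- "did this user like this post?"
    );
    CREATE INDEX idx_likes_post_liked ON likes (post, liked);  -- for the COUNT(*)
    INSERT INTO posts VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO likes VALUES ('greg', 1, 1), ('greg', 2, 0), ('john', 1, 1);
    """
)


def reactions(username: str, post_ids: list[int]) -> dict[int, Optional[int]]:
    """Map each requested post to 1 (liked), 0 (disliked) or None (not voted)."""
    placeholders = ", ".join("?" for _ in post_ids)
    rows = conn.execute(
        f"""SELECT p.post, l.liked
            FROM posts AS p
            LEFT JOIN likes AS l
                   ON l.post = p.post AND l.username = ?  -- in ON, not WHERE
            WHERE p.post IN ({placeholders})""",
        [username, *post_ids],
    )
    return dict(rows.fetchall())


def like_count(post: int) -> int:
    """The COUNT that INDEX(post, liked) is meant to serve."""
    return conn.execute(
        "SELECT COUNT(*) FROM likes WHERE post = ? AND liked = 1", (post,)
    ).fetchone()[0]
```

For example, reactions("greg", [1, 2, 3]) returns {1: 1, 2: 0, 3: None} in a single query.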
I agree with Rick James that the likes table should be uniquely indexed by the (username, post) pair.
Also, I advise you to allow a bit of redundancy and keep a like_counter in the posts table. It will significantly reduce the load of your regular queries.
Increase or decrease the counter right after successfully adding the like/dislike record.
All in all:
To get posts with likes: a plain select from posts, with no need for joins or aggregate sub-queries.
To like/dislike: (1) insert into likes; on success, (2) update posts.like_counter. The unique index prevents duplicates.
To find out whether a user has already liked the post: select from likes by the username+post pair. The index makes this fast.
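A sketch of that like/dislike flow, assuming SQLite via Python's sqlite3 instead of MySQL. Here a like adds 1 and a dislike subtracts 1 from like_counter, which is one reading of "increase or decrease the counter". Both statements run in one transaction, so a duplicate rejected by the unique index leaves the counter unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (post INTEGER PRIMARY KEY,
                        like_counter INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE likes (username TEXT NOT NULL, post INTEGER NOT NULL,
                        liked INTEGER NOT NULL, UNIQUE (username, post));
    INSERT INTO posts (post) VALUES (1), (2);
    """
)


def add_reaction(username: str, post: int, liked: bool) -> bool:
    """Record a like (+1) or dislike (-1); return False for a duplicate."""
    try:
        with conn:  # both statements commit together or roll back together
            conn.execute(
                "INSERT INTO likes (username, post, liked) VALUES (?, ?, ?)",
                (username, post, int(liked)),
            )
            conn.execute(
                "UPDATE posts SET like_counter = like_counter + ? WHERE post = ?",
                (1 if liked else -1, post),
            )
    except sqlite3.IntegrityError:
        return False  # (username, post) already exists; nothing was changed
    return True


def like_counter(post: int) -> int:
    """The denormalized value a plain select from posts would return."""
    return conn.execute(
        "SELECT like_counter FROM posts WHERE post = ?", (post,)
    ).fetchone()[0]
```

Reading a post's score is then a plain select from posts with no aggregate sub-query.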
My initial thought was that the problem is that a boolean type is not rich enough to express the possible reactions to a post. So instead of a boolean, you would need an enum with the possible states Liked, Disliked, and a third, default state of Un-reacted.
Now, however, it seems you can make do with a boolean after all, because you do not need to record the Un-reacted state. A lack of reaction means that you do not add an entry to the table.
What would be the appropriate way to find out if a user liked a post?
SELECT Liked
FROM Likes
WHERE Likes.PostId = 1234
AND Likes.UserName = 'UniqueUserName';
If the user has not interacted with the post, there will be no results. Otherwise, you get 1 if liked and 0 if disliked.
What if I want to check if a user liked multiple posts at once?
I think for that you need to store a timestamp too. You can then use the timestamp to see whether there are multiple liked posts within a short duration.
You could employ k-means clustering to find out whether there are any "clusters" of likes. The complete explanation is too long to add here.
Would this table not need a primary key?
Of course it would. But Like is a weak entity that depends on Post, so its key must include the PK of Post, which I assume is the field post. Combined with username, that gives the PK, because the (post, username) pair is unique for each user's reaction.
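The weak-entity point can be expressed directly in the schema. This sketch uses SQLite via Python's sqlite3 as a stand-in for MySQL (with InnoDB, MySQL enforces foreign keys without the PRAGMA). The composite key rejects a second reaction by the same user, and the foreign key rejects reactions to posts that do not exist; ON DELETE CASCADE, an added assumption, removes a post's likes when the post is deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE posts (post INTEGER PRIMARY KEY);
    CREATE TABLE likes (
        post     INTEGER NOT NULL REFERENCES posts (post) ON DELETE CASCADE,
        username TEXT    NOT NULL,
        liked    INTEGER NOT NULL,
        PRIMARY KEY (post, username)   -- owner's key + discriminator
    );
    INSERT INTO posts VALUES (1234);
    """
)


def try_like(post: int, username: str, liked: bool) -> bool:
    """Insert a reaction; False if it duplicates the key or the post is missing."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO likes VALUES (?, ?, ?)", (post, username, int(liked))
            )
        return True
    except sqlite3.IntegrityError:
        return False
```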

MySQL - Best performance between 2 solutions

I need some advice about MySQL.
I have a user table that has id, nickname, numDVD, money, and a DVD table that has idDVD, idUser, LinkPath, counter.
I believe I could have at most 20 users, and each user has about 30 DVDs.
So when I insert a DVD, I should have idDVD (auto-increment), idUser (the same idUser as in the User table), LinkPath (a generic string), and counter, a number from 1 to 30 that is unique per user (it depends on how many DVDs that user has).
The problem is handling the last column, "counter", because I would like to select, for example, 2 or 3 random DVDs (from 1 to 30) that have the same idUser.
So I was wondering whether this is the best solution in my case and whether it is hard to handle (I have never used MySQL), OR whether it's better to create 20 tables (1 for each user) that contain the ID, DVDname, etc.
Thanks
Don't create 20 tables! That would be overkill, and what if you needed to add more users in the future? It would be practically impossible to maintain and update reliably.
A better way would be like:
Table users
-> idUser
-> other user specific data
Table dvd
-> idDvd
-> DVDname
-> LinkPath
-> other dvd specific data (no user data here)
Table usersDvds
-> idUser
-> idDvd
This way, it's no problem if two or more users have the same DVD, as it's just another entry in the usersDvds table: the idDvd value would be the same, but idUser would be different. And to count how many DVDs a user has, just do SELECT count(*) FROM usersDvds WHERE idUser = 1
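A sketch of this three-table layout, including the asker's "2 or 3 random DVDs" query, using SQLite via Python's sqlite3 rather than MySQL. The sample rows are made up, and ORDER BY RANDOM() is SQLite's spelling; MySQL uses ORDER BY RAND(). With about 30 DVDs per user, sorting one user's rows randomly is cheap, and no counter column is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (idUser INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE dvd (idDvd INTEGER PRIMARY KEY, DVDname TEXT, LinkPath TEXT);
    CREATE TABLE usersDvds (
        idUser INTEGER NOT NULL REFERENCES users (idUser),
        idDvd  INTEGER NOT NULL REFERENCES dvd (idDvd),
        PRIMARY KEY (idUser, idDvd)   -- a user owns a given DVD at most once
    );
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO dvd VALUES (1, 'A', '/a'), (2, 'B', '/b'), (3, 'C', '/c'), (4, 'D', '/d');
    INSERT INTO usersDvds VALUES (1, 1), (1, 2), (1, 3), (2, 1);
    """
)


def dvd_count(id_user: int) -> int:
    """How many DVDs a user owns."""
    return conn.execute(
        "SELECT COUNT(*) FROM usersDvds WHERE idUser = ?", (id_user,)
    ).fetchone()[0]


def random_dvds(id_user: int, n: int) -> list[str]:
    """Pick up to n of the user's DVDs at random."""
    rows = conn.execute(
        """SELECT d.DVDname
           FROM usersDvds AS ud JOIN dvd AS d ON d.idDvd = ud.idDvd
           WHERE ud.idUser = ?
           ORDER BY RANDOM()
           LIMIT ?""",
        (id_user, n),
    )
    return [name for (name,) in rows]
```

Note that DVD 'A' is shared by both users through usersDvds without being duplicated in dvd.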
You don't need a table per user, and doing so would make the subsequent SQL programming basically impossible. However, with these data volumes, practically nothing you do is going to cause or relieve bottlenecks. Very probably the entire database will fit into memory, so access via any schema will be practically instantaneous.
If I understand your requirements correctly, you should be able to accomplish that by creating a compound index that lets you select efficiently.
If too much data is being handled in that table, it would help to clear out some historical data.
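If you keep the asker's single DVD table instead, the compound index could be UNIQUE (idUser, counter), which also enforces "counter is unique per user". The sketch below, in SQLite via Python's sqlite3 as a stand-in for MySQL, picks random counter values in Python and fetches those rows through the index. It assumes each user's counters run contiguously from 1, an invariant the original design would have to maintain; the sample rows are made up.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE DVD (
           idDVD    INTEGER PRIMARY KEY,            -- auto-assigned
           idUser   INTEGER NOT NULL,
           LinkPath TEXT    NOT NULL,
           counter  INTEGER NOT NULL CHECK (counter BETWEEN 1 AND 30),
           UNIQUE (idUser, counter)                 -- the compound index
       )"""
)
conn.executemany(
    "INSERT INTO DVD (idUser, LinkPath, counter) VALUES (?, ?, ?)",
    [(7, f"/dvd/{c}", c) for c in range(1, 31)],
)
conn.commit()


def pick_dvds(id_user: int, k: int) -> list[str]:
    """Return up to k random LinkPaths for one user via the (idUser, counter) index."""
    (max_counter,) = conn.execute(
        "SELECT COALESCE(MAX(counter), 0) FROM DVD WHERE idUser = ?", (id_user,)
    ).fetchone()
    # Assumes the user's counters are contiguous 1..max_counter.
    counters = random.sample(range(1, max_counter + 1), min(k, max_counter))
    if not counters:
        return []
    placeholders = ", ".join("?" for _ in counters)
    rows = conn.execute(
        f"SELECT LinkPath FROM DVD WHERE idUser = ? AND counter IN ({placeholders})",
        [id_user, *counters],
    )
    return [path for (path,) in rows]
```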

What is a good way to design a table with relationship between users?

I'm currently working on a social networking site. (Yeah, I know, there's a whole bunch of them. I'm not trying to make Facebook all over)
I was wondering if anyone could tell me if my way of thinking is way off, or if it is the way it is actually done.
I want a user to be able to have friends. And, for that, I'm thinking that I should have one usertable like so:
USER
uId
userName
email
etc..
This should probably be a 1:N relationship, so I'm thinking that a table "contacts" should hold a list of users and their friends like so:
CONTACTS
uId (From USER)
FriendId (From USER table)
Friendship type ENUM[Active, Inactive, Pending]
Would it be an effective solution to sort this table on uId, so that a query result would look something similar to this:
uID | friendId
1 | 2
1 | 6
1 | 97
75 | 1
75 | 34
etc
Or are there any different solutions to this?
If you are simply looking to select a specific user's set of friends, the query will be straightforward and you won't have to worry about performance.
For example, if you are looking to return the IDs of UID 8's friends, you can just do something like:
SELECT FriendId FROM Friends WHERE UID = 8;
In your case, since the UID column is not unique, make sure to have an index on this column to allow quick lookups.
You might also want to think about what other data you will need about the user's friends. For example, it's probably not useful to just grab the FriendIds; you probably want names, etc. So your query will likely look more like:
Select FriendId, Users.name FROM Friends JOIN Users ON Users.uid=Friends.FriendId WHERE Friends.UID=8;
Again, having the proper columns indexed is key for optimized lookups, especially once your table size gets big.
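A sketch of that lookup with its index, using SQLite via Python's sqlite3 instead of MySQL. Table and column names follow the answer's query; the sample rows are made up. A composite primary key on (UID, FriendId) is used as the index here (an assumption): its leading column serves WHERE UID = ? lookups, and it also blocks duplicate friendships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Users (uid INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Friends (
        UID      INTEGER NOT NULL REFERENCES Users (uid),
        FriendId INTEGER NOT NULL REFERENCES Users (uid),
        PRIMARY KEY (UID, FriendId)   -- leading UID column serves WHERE UID = ?
    );
    INSERT INTO Users VALUES (1, 'ann'), (2, 'bob'), (6, 'cy'), (8, 'dee');
    INSERT INTO Friends VALUES (8, 1), (8, 6), (1, 2);
    """
)


def friends_of(uid: int) -> list[tuple[int, str]]:
    """(FriendId, name) pairs for one user, joined to Users for the names."""
    return conn.execute(
        """SELECT f.FriendId, u.name
           FROM Friends AS f JOIN Users AS u ON u.uid = f.FriendId
           WHERE f.UID = ?
           ORDER BY f.FriendId""",
        (uid,),
    ).fetchall()
```

Here friends_of(8) returns [(1, 'ann'), (6, 'cy')].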
Also, since adding friends is likely very uncommon compared to the number of lookup queries you run, be sure to choose a database engine that provides the fastest lookup speed. In this case MyISAM is probably your best bet. MyISAM uses table-level locking for inserts (i.e., slower inserts), but lookups are quick.
Good luck!
I think the best way is, without a doubt, to create a table like the one you proposed. This will let you manage friends better and run queries for friends on this table; this would be the best solution.

A table of friends - store user ids or usernames?

I have a pretty typical user table setup for my web application:
user_id | username
--------------------
0       | greg
1       | john
...     | ...
Both fields are indexed and unique, so I can look up a user by id or username quickly.
I want to keep a friends table, and am not sure whether to store the user_id values or usernames in that table:
user_id_1 | user_id_2
--------------------------
or
username_1 | username_2
--------------------------
I am going to want to get a list of friends for a user, so it would be convenient to immediately have the usernames in the friends table instead of doing a join on the users table:
select * from friends where username_1 = 'greg';
If I'm storing user ids, I need to do a join then to get the usernames - will that be costly?:
select u.username
from friends f
join users u on u.user_id = f.user_id_2
where f.user_id_1 = x;
Using user ids allows me to let users change usernames flexibly, but I'm not letting them do that anyway. Any advice would be great.
Thanks
A join on the IDs won't be too bad. The ID may be smaller to store on disk. Also, I would imagine a list of friends would have something other than just user names, in which case, you have to join no matter what.
Well, as you said, using IDs means you can change a username without having to deal with cascading effects. In most cases, primary-key/unique and foreign-key indexes will make joins very fast, but you may have a point for huge tables (for which you will eventually need some kind of external index or other tool anyway).
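A sketch of the ID-based design, using SQLite via Python's sqlite3 rather than MySQL. The names greg and john come from the question; the third user is made up. Because friends stores only IDs, renaming john is a single UPDATE on users, and the friend list picks up the new name on the next read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE);
    CREATE TABLE friends (
        user_id_1 INTEGER NOT NULL REFERENCES users (user_id),
        user_id_2 INTEGER NOT NULL REFERENCES users (user_id),
        PRIMARY KEY (user_id_1, user_id_2)
    );
    INSERT INTO users VALUES (0, 'greg'), (1, 'john'), (2, 'mary');
    INSERT INTO friends VALUES (0, 1), (0, 2);
    """
)


def friend_names(username: str) -> list[str]:
    """Friends of `username`, resolved through the users table by ID."""
    rows = conn.execute(
        """SELECT u2.username
           FROM users AS u1
           JOIN friends AS f ON f.user_id_1 = u1.user_id
           JOIN users AS u2 ON u2.user_id = f.user_id_2
           WHERE u1.username = ?
           ORDER BY u2.username""",
        (username,),
    )
    return [name for (name,) in rows]


before = friend_names("greg")
with conn:  # rename touches only the users row; friends is unchanged
    conn.execute("UPDATE users SET username = ? WHERE user_id = ?", ("johnny", 1))
after = friend_names("greg")
```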
The ID will be smaller if you use numeric values, and index searches will be faster. See the data types chapter of the MySQL 5.0 reference manual.
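Also, I don't know how you are using indexes, but I'd recommend adding an auto-increment field. Assuming the table has no primary key yet, you can add an integer auto-increment column like this (MySQL requires an AUTO_INCREMENT column to be indexed, so it is declared as the primary key):
ALTER TABLE `Database`.`tableName` ADD COLUMN `indexName` INT NOT NULL AUTO_INCREMENT PRIMARY KEY;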

Building an activity feed

I want to create some kind of 'activity feed'. For example, there are 1000 users in the database in total, of which 100 are in the contact list of user X. User X is concerned with those 100 users only, and if any of them posts a note (or, in general, takes an action), he wants to get that update on his page. For this purpose, do I need to make a database table like:
id user_id note_id
This table will also contain users who are not relevant to user X, so I will make a query like:
select user_id from activity_table which exists in contact list of user X
Is my approach correct regarding this matter (for example database table design and query)?
Is there any better approach?
If I understand you correctly, I think you need a relation table where you store the user_id of the user who is interested and the user_id of the user they are interested in.
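A sketch of such a relation table and the feed query, using SQLite via Python's sqlite3 in place of MySQL. Only the activity table's columns (id, user_id, note_id) come from the question; the contacts table, its column names and the sample rows are assumptions. Joining from contacts filters out activity by users outside X's contact list in the query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contacts (
        user_id    INTEGER NOT NULL,   -- the user reading the feed (user X)
        contact_id INTEGER NOT NULL,   -- someone in X's contact list
        PRIMARY KEY (user_id, contact_id)
    );
    CREATE TABLE activity (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,      -- who took the action
        note_id INTEGER NOT NULL
    );
    CREATE INDEX idx_activity_user ON activity (user_id, id);
    INSERT INTO contacts VALUES (1, 2), (1, 3);
    INSERT INTO activity (user_id, note_id) VALUES (2, 100), (4, 101), (3, 102), (2, 103);
    """
)


def feed(user_id: int, limit: int = 20) -> list[tuple[int, int]]:
    """Newest activity by user_id's contacts, as (author, note_id) pairs."""
    return conn.execute(
        """SELECT a.user_id, a.note_id
           FROM contacts AS c
           JOIN activity AS a ON a.user_id = c.contact_id
           WHERE c.user_id = ?
           ORDER BY a.id DESC
           LIMIT ?""",
        (user_id, limit),
    ).fetchall()
```

With the sample rows, feed(1) returns [(2, 103), (3, 102), (2, 100)]; note 101 is skipped because user 4 is not in user 1's contacts.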