Suggestions for revisions to my social network "Updates" table? - mysql

As I continue to work on my social networking site (which I'll probably never finish), I've decided I probably should revise my "Updates" table. If you think of this like Facebook, the Updates table stores stories for the newsfeed, such as User_123 changed his status, or SomeOtherUser added a new photo/video, or YetAnotherUser joined a group.
My current table structure is as follows:
UPDATES
PK Update_ID
Type
Update_Content
FK Photo_ID
FK Video_ID
FK Owner_ID
FK Group_Wall_ID
FK Friend_Wall_ID
Upvotes
Downvotes
Timestamp
As a note, Type refers to the kind of update it is (1 is a status update, 2 is a user joined a group, 3 is a new photo, etc...) and Update_Content is the status text, or a message like "User_123 joined a group"
Right now the way I have it, when a user posts an update to their own "wall", Group_Wall_ID and Friend_Wall_ID are 0 by default. Whereas if that user posts an update to a Group, Group_Wall_ID has a value and Friend_Wall_ID doesn't.
Also, if the update is only a status update, Photo_ID and Video_ID are 0 by default. However, if the update is a new photo, Photo_ID would have a value that corresponds with a PK in the Photos table.
I feel like the structure of this table is pretty inefficient and can use some revisions. Can anyone suggest any revisions to make this table better? Any feedback would be great! Thanks and Happy Holidays!

I don't think this application is a good fit for MySQL. With MySQL you are pulling all the resources together on every read. It also doesn't seem that a feed will need to span very far back chronologically.
I think a better solution is to push an activity to the appropriate feed on write. So if you post a video, it gets appended to all of your friends' news feeds. You could limit each feed to 100 items to keep the lists smaller.
I think using Redis would be more appropriate. You could have a list for each user's activity feed: LPUSH user_id 'John just added a video'
This solution requires you to have a lot of memory though, and also it may be problematic if a user deletes something from their feed.
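As a rough illustration of the push-on-write idea above, here is a minimal in-memory sketch. It does not talk to Redis: a capped deque per user stands in for a Redis list, with appendleft playing the role of LPUSH and maxlen playing the role of trimming the list to 100 items. The names friends, feeds, add_friendship, publish and read_feed are made up for the example.

```python
from collections import defaultdict, deque

FEED_LIMIT = 100  # per-feed cap suggested in the answer above

# Hypothetical stand-ins: a friends lookup and one capped list per user.
friends = defaultdict(set)
feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))


def add_friendship(a, b):
    """Record a two-way friendship."""
    friends[a].add(b)
    friends[b].add(a)


def publish(author_id, message):
    """Push the activity onto every friend's feed at write time."""
    for friend_id in friends[author_id]:
        # appendleft mirrors LPUSH; once the deque is full, maxlen
        # silently drops the oldest entry from the other end.
        feeds[friend_id].appendleft(message)


def read_feed(user_id, count=10):
    """Reading a feed is a slice of one list, with no joins."""
    return list(feeds[user_id])[:count]
```

Note that the cost moves from reads to writes: a user with many friends triggers one push per friend, and removing a deleted item means visiting each of those lists again.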

Related

Storing count of records in SQL table

Let's say I have a table of posts, and each post has the index of the topic it belongs to. I also have a table of topics with an integer field representing the number of posts in the topic. When I create a new post, I increase this value by 1, and when I delete a post, I decrease it by 1.
I do this so that I don't have to query the database each time I need the number of posts in a topic.
But I heard that this approach may not be safe and that the actual number of posts in the table may not match the stored value.
Is there any definite information about how safe it is?
Without transactions, the primary issue is timing. Consider a delete and two users:
Time  User 1             User 2
1     count = count - 1
2     update finishes    How many posts?
3     delete post        Count returned
4     delete finishes
Remember that actions such as updates and deletes take a finite amount of time -- even if they take effect all at once. Because of this, User 2 will get the wrong number of posts. This is a race condition; and it may or may not be an issue in your application.
Transactions fix this particular problem, by ensuring that resetting the count and deleting the post both take effect "at the same time".
A secondary issue is data quality. Your data consistency checks are outside the database. Someone can come directly into the database and say "Oh, these posts from user X should be removed". That user might then delete those posts en masse -- but "forget" or not know to change the associated values.
This can be a big issue. Triggers should solve this problem.
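A minimal sketch of the trigger approach, using Python's built-in sqlite3 module as a stand-in for MySQL (MySQL trigger syntax differs slightly, for example it requires FOR EACH ROW). The table and trigger names are assumptions made for the example. Because the trigger runs as part of the statement that inserts or deletes the post, the counter and the rows change together, even for a bulk delete issued directly against the database.

```python
import sqlite3


def make_counter_db():
    """Create hypothetical topics/posts tables with counter-maintaining triggers."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE topics (
            id         INTEGER PRIMARY KEY,
            post_count INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE posts (
            id       INTEGER PRIMARY KEY,
            topic_id INTEGER NOT NULL REFERENCES topics(id)
        );
        -- Fires once per inserted row.
        CREATE TRIGGER posts_after_insert AFTER INSERT ON posts
        BEGIN
            UPDATE topics SET post_count = post_count + 1 WHERE id = NEW.topic_id;
        END;
        -- Fires once per deleted row, including rows removed by a bulk DELETE.
        CREATE TRIGGER posts_after_delete AFTER DELETE ON posts
        BEGIN
            UPDATE topics SET post_count = post_count - 1 WHERE id = OLD.topic_id;
        END;
    """)
    return db


def stored_count(db, topic_id):
    """Read the cached counter."""
    row = db.execute("SELECT post_count FROM topics WHERE id = ?", (topic_id,)).fetchone()
    return row[0]


def actual_count(db, topic_id):
    """Count the rows directly, for comparison with the cached counter."""
    row = db.execute("SELECT COUNT(*) FROM posts WHERE topic_id = ?", (topic_id,)).fetchone()
    return row[0]
```

Comparing stored_count with actual_count from time to time is a cheap way to check that the cached value has not drifted.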

What is an efficient way of relating a likes table to a posts, comments and replies table in MySQL?

I am building a simple social networking website (a personal project of mine to help me understand back-end programming more) and as of the moment I am stuck on how I should tackle the above problem.
Right now I have a table for users, posts, comments, replies, post_likes, comment_likes and reply_likes.
As of the moment my system works as follows:
A user creates a post which will then be inserted to the posts table along with that user's id
Whenever a user comments on said post, a row is inserted into the comments table along with the user's id and the post's id
Whenever a user replies to a comment, it is inserted into the replies table together with that user's id as well as the comment's id in which the reply was made
Enter my likes tables which is structured as so...
post_likes
post_id
user_id
like_state
comment_likes
comment_id
user_id
like_state
reply_likes
reply_id
user_id
like_state
You can probably already tell where I am going with this, but each time a user likes a certain post, comment or reply it gets inserted into its respective like table along with that user's id and a like_state to prevent them from liking again.
This all works fine but I am clearly repeating myself which I know is taboo in the programming world. Which leads us to my question, what exactly can I do to remedy this? Although I came up with an idea, I just can't quite figure out how I can structure my question well enough to be able to get any good results from Google (I am not a native English speaker)
PS the solution I came up with is simply creating just one likes table and each row could either have just a post_id (if the user liked a post), a comment_id (if the user liked a comment) or a reply_id (if the user liked a reply), is that possible?
I think your current tables are good. Your posts, comments, and replies each have a one-to-many relationship with likes, so the likes should go in separate tables, which is exactly what you are doing. If you want to combine the likes into one table, you will need an extra column to track what each like belongs to, and you will no longer be able to set a foreign key constraint on that table. So, IMO, you are good at this point.
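To show what the per-entity tables buy you, here is a small sketch of post_likes with real foreign keys, written with Python's sqlite3 module as a stand-in for MySQL. It assumes a simplified schema: like_state is left out and the existence of a row is what records the like, and the composite primary key is an addition that blocks duplicate likes. The helper names are hypothetical.

```python
import sqlite3


def make_likes_db():
    """Create hypothetical users/posts/post_likes tables with enforced foreign keys."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite needs this; MySQL/InnoDB enforces FKs by default
    db.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY);
        CREATE TABLE posts (
            id      INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id)
        );
        CREATE TABLE post_likes (
            post_id INTEGER NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
            user_id INTEGER NOT NULL REFERENCES users(id),
            PRIMARY KEY (post_id, user_id)  -- one like per user per post
        );
    """)
    return db


def like_post(db, post_id, user_id):
    """Return True if the like was stored, False if the database rejected it."""
    try:
        db.execute(
            "INSERT INTO post_likes (post_id, user_id) VALUES (?, ?)",
            (post_id, user_id),
        )
        return True
    except sqlite3.IntegrityError:
        # Duplicate like, or a like pointing at a post/user that does not exist.
        return False
```

With a single combined likes table keyed by a type column, the database could not reject a like for a missing post or clean up likes when a post is deleted; that checking would move into application code.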

Storing like count for a post in MySQL

Is it a good idea to store like count in the following format?
like table:
u_id | post_id | user_id
And count(u_id) of a post?
What if there were thousands of likes for each post? The like table is going to be filled with billions of rows after a few months.
What are other efficient ways to do so?
In short: yes, it is OK to store a row for each like any user gives to any post.
But I'd like to break this down into several questions:
Q. Is there another way to get count(u_id), or more precisely the result of:
SELECT COUNT(u_id) FROM likes WHERE post_id = ?
A. Why not? You can store the count in your posts table and increment or decrement it every time a user likes or unlikes the post. You can set up a trigger (or a stored procedure) to automate this. Then, to get the counter, you just need:
SELECT counter FROM posts WHERE post_id = ?
If you like the previous Q/A and think it is a good idea, I have a follow-up question:
Q. Why do we need the likes table then?
A. That depends on your application design and requirements. Judging by the columns you posted (u_id, post_id, user_id; I would even add a timestamp column), your requirement is to record which user liked which post. That means you can tell whether a user has already liked a post and reject repeated likes. If you don't care about repeated likes, a historical timeline, or stats, you can drop the likes table.
The last question I see here:
Q. The like table is going to be filled with billions of rows after a few months, isn't it?
A. I wish you that success, but IMHO you are 99% wrong. To reach just 1M records you need 1000 active users (a very good number for a personal startup; are you building the whole app with no architect or designer involved?), and EVERY one of those users has to like EVERY one of 1000 posts, if you have that many.
My point is: fortunately, you have plenty of time before your database becomes big enough to hurt your application. Until the table reaches 10-20M records, you need not worry about size or performance.

How to use MYSQL to track user likes

For websites like Digg, how can you use MySQL to track when someone likes an article?
It seems simple enough to just keep track of the total number of likes. The part I don't understand is how to
1. keep users from voting on something more than once and
2. allow users to click on their profile to see the stories they have liked.
Would you have a column in the table containing the story info where you just add comma-separated user names? You could keep track of who has liked a story, but the data would get huge, especially for websites like Digg that have 100,000 users or more. And how would you allow the user to see all the stories they have liked?
Thank you.
You would need a row for each like. Don't use comma-separated lists.
how to 1. keep users from voting on something more than once
Create a unique index on articleid, userid.
And how would you allow the user to see all the stories they have liked?
SELECT articleid FROM likes WHERE userid = 42
but the data would get huge
Yes, it could get huge. Most websites will easily be able to cope with just a single database. Very large websites will need to use a cluster to store data on several machines. The data needs to be partitioned so that the application knows on which server to find the data.
Social networks these days are modeled like a graph data structure,
where every entity (people, photos, videos, status updates, comments, etc.) is a node of the graph, and likes and unlikes are connections between two nodes.
Ideally you would have a table for likes where you would just add a row per like,
storing who liked and what was liked in columns, along with other info.
Complex social networks do more than just this.
You can store the likes in a separate table called story_likes with two columns: story_id and user_id.
1) Put a constraint in the database that the combination of these must be unique. That way a user can like a story only once.
2) You can pull the stories that the user likes from this table and fetch the other story details using the story id. 100,000 rows is not that big for a MySQL database.
You can also allow your users to dislike a story by adding a column state ENUM('LIKED', 'DISLIKED').
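The points above can be sketched roughly as follows, using Python's sqlite3 module as a stand-in for MySQL. SQLite has no ENUM type, so a CHECK constraint approximates it here; the index name and the helper functions vote and liked_stories are made up for the example.

```python
import sqlite3


def make_story_db():
    """Create a hypothetical story_likes table with one row per (story, user)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE story_likes (
            story_id INTEGER NOT NULL,
            user_id  INTEGER NOT NULL,
            -- MySQL would use ENUM('LIKED', 'DISLIKED') here.
            state    TEXT NOT NULL CHECK (state IN ('LIKED', 'DISLIKED')),
            UNIQUE (story_id, user_id)
        );
        -- Supports the "stories this user liked" lookup on the profile page.
        CREATE INDEX story_likes_by_user ON story_likes (user_id);
    """)
    return db


def vote(db, story_id, user_id, state="LIKED"):
    """Return True if the vote was recorded, False if this user already voted."""
    try:
        db.execute(
            "INSERT INTO story_likes (story_id, user_id, state) VALUES (?, ?, ?)",
            (story_id, user_id, state),
        )
        return True
    except sqlite3.IntegrityError:
        return False  # the unique constraint rejected a second vote


def liked_stories(db, user_id):
    """List the ids of the stories this user has liked."""
    rows = db.execute(
        "SELECT story_id FROM story_likes "
        "WHERE user_id = ? AND state = 'LIKED' ORDER BY story_id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

Letting the database reject the duplicate avoids a separate "has this user voted?" query before each insert, which would also be open to a race between the check and the insert.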

Where to store users visited pages?

I have a project where I have posts, for example.
The task is this: I must show the user the posts he visited last.
This is my solution: every time a user visits a topic that is new to him, I create a new record in the visits table.
The visits table has the following structure: id, user_id, post_id, last_visit.
Now my visits table has ~14,000,000 records and it's still growing every day.
Maybe my solution isn't optimal and there is another way to store user visits?
It's important to save every visit as a standalone record, because I also have a feature that selects and uses users' visits. And I can't purge this table, because the data could be needed a month or a year later. How could I optimize this situation?
Nope, you don't really have much choice other than to store your visit data in a table with columns for (at a bare minimum) user id, post id, and timestamp if you need to track the last time that each user visited each post.
I question whether you need an id field in that table, rather than using a composite key on (user_id, post_id), but I'd expect that to have a minor effect, provided that you already have a unique index on (user_id, post_id). (If you don't have an index on that pair of fields, adding one should improve query performance considerably and making it a unique index or composite key will protect against accidentally inserting duplicate records.)
If performance is still an issue despite proper indexing, you should be able to improve it a bit by segmenting the table into a collection of smaller tables, but segment it by user_id or post_id (rather than by date as previous answers have suggested). If you break it up by user or post id, then you will still be able to determine whether a given user has previously viewed a given post and, if so, on what date with only a single query. If you segment it by date, then that information will be spread across all tables and, in the worst-case scenario of a user who has never previously viewed a post (which I expect to be fairly common), you'll need to separately query each and every table before having a definitive answer.
As for whether to segment it by user id or by post id, that depends on whether you will more often be looking for all posts viewed by a user (segment by user_id to get them all in one query) or all users who have viewed a post (segment by post_id).
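A minimal sketch of the composite-key layout described in this answer, using Python's sqlite3 module as a stand-in for MySQL. The surrogate id column is dropped and (user_id, post_id) becomes the primary key; the helper names are assumptions. SQLite's INSERT OR REPLACE is used for the "insert or refresh" step, and in MySQL the usual counterpart would be INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3


def make_visits_db():
    """Create a hypothetical visits table keyed by (user_id, post_id)."""
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE visits (
            user_id    INTEGER NOT NULL,
            post_id    INTEGER NOT NULL,
            last_visit TEXT    NOT NULL,  -- 'YYYY-MM-DD HH:MM:SS' sorts chronologically
            PRIMARY KEY (user_id, post_id)
        )
    """)
    return db


def record_visit(db, user_id, post_id, when):
    """Insert the visit, or refresh last_visit if the pair already exists."""
    db.execute(
        "INSERT OR REPLACE INTO visits (user_id, post_id, last_visit) VALUES (?, ?, ?)",
        (user_id, post_id, when),
    )


def last_posts(db, user_id, limit=10):
    """Return the user's most recently visited posts, newest first."""
    rows = db.execute(
        "SELECT post_id, last_visit FROM visits "
        "WHERE user_id = ? ORDER BY last_visit DESC LIMIT ?",
        (user_id, limit),
    )
    return rows.fetchall()
```

The key doubles as the index for "has this user seen this post?" lookups, so no separate unique index is needed.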
If it doesn't need to be long lasting, you could store it in session instead. If it does, you could either break the records apart by table, like say 1 per month, or you could only store the last 5-10 pages visited, and delete old ones as new ones come in. You could also change it to pages visited today, this week, etc.
If you do need all 14 million records, I would create another historical table to archive the visits that are not the most relevant for the day-to-day site operation.
At the end of the month (or week, or quarter, etc.) have some scheduled logic archive records older than a certain cutoff point to the historical table, reducing the number of records in the "live" table. This should help increase query speed on the "live" table, since it would hold fewer records.
If you do need to query all of the data, you can use both tables and have all of the data available to you.
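The archiving step could look roughly like this, again with Python's sqlite3 module standing in for MySQL. The visits_archive table name, the cutoff format, and the helper functions are assumptions for illustration. The copy and the delete run in one transaction so a failure cannot lose or duplicate rows.

```python
import sqlite3


def make_archive_db():
    """Create a hypothetical live visits table and a matching archive table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE visits (
            user_id    INTEGER NOT NULL,
            post_id    INTEGER NOT NULL,
            last_visit TEXT    NOT NULL,
            PRIMARY KEY (user_id, post_id)
        );
        -- History log: no uniqueness, the same pair may be archived more than once.
        CREATE TABLE visits_archive (
            user_id    INTEGER NOT NULL,
            post_id    INTEGER NOT NULL,
            last_visit TEXT    NOT NULL
        );
    """)
    return db


def archive_before(db, cutoff):
    """Move visits older than cutoff to the archive; return how many were moved."""
    with db:  # commits on success, rolls back if either statement fails
        db.execute(
            "INSERT INTO visits_archive (user_id, post_id, last_visit) "
            "SELECT user_id, post_id, last_visit FROM visits WHERE last_visit < ?",
            (cutoff,),
        )
        cur = db.execute("DELETE FROM visits WHERE last_visit < ?", (cutoff,))
    return cur.rowcount


def all_visits(db, user_id):
    """Query live and archived visits together, newest first."""
    rows = db.execute(
        "SELECT post_id, last_visit FROM visits WHERE user_id = ? "
        "UNION ALL "
        "SELECT post_id, last_visit FROM visits_archive WHERE user_id = ? "
        "ORDER BY last_visit DESC",
        (user_id, user_id),
    )
    return rows.fetchall()
```

A scheduled job would call archive_before with the chosen cutoff; day-to-day queries touch only visits, while reports that need everything use the combined query.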
You could delete the ones you don't need. If you only want to show the last 10 visited posts, then:
DELETE FROM visits WHERE user_id = ? AND id NOT IN (SELECT id FROM (SELECT id FROM visits WHERE user_id = ? ORDER BY last_visit DESC LIMIT 10) AS recent);
(I think that's the best way to do that query; can any MySQL guru tell me otherwise? The inner SELECT is wrapped in a derived table because MySQL rejects LIMIT directly inside an IN subquery and does not let a subquery read the table being deleted from. You can ORDER BY in DELETE, but its LIMIT only takes one parameter, so you can't do LIMIT 10, 100 there.)
Run it after inserting/updating each new row, or every few days if you like.
Having a structure like (id, user_id, post_id, last_visit) for your visits table makes it appear as though you are saving all posts, not just the last post per topic. Don't you need a topic ID in there somewhere so that you can determine what their last post PER TOPIC was, and so you know which row to replace when they post in the same topic more than once?
Store the post_ids in $_SESSION, and then, using MySQL IN with one SELECT query, you will be able to show the user's visited posts. Those ids will be destroyed after the member closes the browser, but this is much faster than using the database.
Edit: sorry, I didn't notice that you must store those records in the database and use them months later. In that case I have no idea how to optimize it, but with 14 million records you should definitely use indexes.