Database schema for "don't show me again what I saw before" in MySQL

I'm making a website that shows you images. Its special feature is "don't show me again what I saw before": once you see an image, it goes into your "archive" category. There will be many images and categories, so I need a database schema that performs well.
When you click an image, it opens in a lightbox, and the lightbox code sends an Ajax request that marks the image as archived just for you.
Will the database schema below perform well for about 5,000 images and 20,000 users?
users
    user_id
    user_email
pictures
    picture_id
    picture_url
    tags
archived
    user_id
    picture_id
For each user, the site shows every image in this schema except the ones that user has already archived.

This is a difficult question to answer without knowing all the details. You mention how many users and images there will be, but how many images will each user have, on average, in their archived list? If that number is small, the archived table won't come anywhere near 100M rows (5,000 images × 20,000 users, the worst case).
100M rows should not be a problem by itself, as the database can handle this. The concern may (or will) be with the way you are going to want to query the data. Something like:
SELECT
    *
FROM
    pictures
WHERE
    picture_id NOT IN
    (
        SELECT picture_id FROM archived WHERE user_id = [userIdParameter]
    )
That will likely not perform very well with 100M rows.
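The same "everything except what this user archived" lookup can also be written as an anti-join with NOT EXISTS. The sketch below is only an illustration, run against SQLite through Python as a stand-in for MySQL; the composite primary key on archived and the helper name unseen_pictures are assumptions, not part of the original schema. Whether this form is faster than NOT IN on a given MySQL version is something to measure, not assume.

```python
import sqlite3


def unseen_pictures(conn, user_id):
    """Return ids of pictures the given user has not archived yet."""
    rows = conn.execute(
        """
        SELECT p.picture_id
        FROM pictures p
        WHERE NOT EXISTS (
            SELECT 1
            FROM archived a
            WHERE a.user_id = ? AND a.picture_id = p.picture_id
        )
        ORDER BY p.picture_id
        """,
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pictures (picture_id INTEGER PRIMARY KEY, picture_url TEXT);
    -- Assumed key: (user_id, picture_id) lets the subquery use an index lookup.
    CREATE TABLE archived (
        user_id INTEGER NOT NULL,
        picture_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, picture_id)
    );
    INSERT INTO pictures VALUES (1, 'a.jpg'), (2, 'b.jpg'), (3, 'c.jpg');
    INSERT INTO archived VALUES (10, 2);
    """
)

print(unseen_pictures(conn, 10))  # user 10 has archived picture 2 only
```

With an index that starts with user_id, the subquery only has to touch that one user's archived rows rather than the whole table.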
Another option would be to cross join users and pictures so that the archived table always contains a Cartesian product. So the table would be:
archived
    user_id
    picture_id
    visited
Then you could query like so:
SELECT
    p.*
FROM
    pictures p
    INNER JOIN archived a ON p.picture_id = a.picture_id
WHERE
    a.user_id = [userIdParameter]
    AND a.visited = [false]
This should perform acceptably with proper indexing, but would present the problem of having to make sure rows are created in the archived table any time a user or picture is added to the system. It also means you would always have a number of rows equal to pictures * users (100M in your example). That may not be desirable in your case.
Bottom line, you are going to have to create some test data that approximates your expected volume and do some performance testing that approximates your load. If you think this is the critical potential performance bottleneck for your system, it will be worth the time investment.

I used the "NOT IN" solution for a while, and then performance problems started, because I don't have a server strong enough to run that query over a lot of data.
So I found the best-performing answer: "Collection Shuffle".
I shuffle the collection using the user ID as the seed and save only the index of the last image the user saw. When the user comes back, I look up where that user's index left off and show the next image in their collection.
This is really lightweight and exactly the solution I needed. Thanks, everyone :)
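A minimal sketch of how such a seeded shuffle might look, assuming the picture ids fit in memory. The function names and the convention that a new user starts at index -1 are hypothetical; the post does not show its actual code.

```python
import random


def shuffled_ids(picture_ids, user_id):
    """Return the picture ids in an order that is fixed for this user."""
    order = sorted(picture_ids)
    # Seeding with the user id makes the shuffle repeatable per user.
    random.Random(user_id).shuffle(order)
    return order


def next_picture(picture_ids, user_id, last_index):
    """Return (picture_id, new_index), or (None, last_index) when nothing is left.

    last_index is the only per-user state that has to be stored;
    -1 is assumed to mean "has not seen anything yet".
    """
    order = shuffled_ids(picture_ids, user_id)
    new_index = last_index + 1
    if new_index >= len(order):
        return None, last_index
    return order[new_index], new_index


pictures = range(1, 11)
picture, index = next_picture(pictures, user_id=42, last_index=-1)
print(picture, index)
```

One caveat with this sketch: the order is only stable while the set of pictures stays the same. If pictures are added or removed, the shuffle changes and the stored index no longer points at the same position, so new pictures would need separate handling (for example, appending them after the shuffled part).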

Related

Storing like count for a post in MySQL

Is it a good idea to store like count in the following format?
like table:
u_id | post_id | user_id
And then use COUNT(u_id) to get the like count of a post?
What if there were thousands of likes for each post? The like table is going to be filled with billions of rows after a few months.
What are other efficient ways to do so?
In short: yes, it is OK to store a row for each like any user gives to any post.
But I want to break this down into several questions:
Q. Is there another way than COUNT(u_id)? Or, more precisely, than:
SELECT COUNT(u_id) FROM likes WHERE post_id = ?
A. Why not? You can store the count in your posts table and increment or decrement it every time a user likes or unlikes the post. You can set up a trigger (or a stored procedure) to automate this. Then, to get the counter, you just need:
SELECT counter FROM posts WHERE post_id = ?
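A rough sketch of the trigger-maintained counter, run against SQLite through Python so that it is self-contained. MySQL trigger syntax differs slightly (it requires FOR EACH ROW, for example). The (post_id, user_id) primary key, the trigger names, and the helper functions are assumptions for illustration, and the u_id column from the question is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (
        post_id INTEGER PRIMARY KEY,
        counter INTEGER NOT NULL DEFAULT 0
    );
    -- Assumed key: one like per user per post, which blocks multi-likes.
    CREATE TABLE likes (
        post_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        PRIMARY KEY (post_id, user_id)
    );
    CREATE TRIGGER likes_after_insert AFTER INSERT ON likes
    BEGIN
        UPDATE posts SET counter = counter + 1 WHERE post_id = NEW.post_id;
    END;
    CREATE TRIGGER likes_after_delete AFTER DELETE ON likes
    BEGIN
        UPDATE posts SET counter = counter - 1 WHERE post_id = OLD.post_id;
    END;
    INSERT INTO posts (post_id) VALUES (1);
    """
)


def like(conn, post_id, user_id):
    # A repeated like is ignored, so the insert trigger does not fire again.
    conn.execute(
        "INSERT OR IGNORE INTO likes (post_id, user_id) VALUES (?, ?)",
        (post_id, user_id),
    )


def unlike(conn, post_id, user_id):
    conn.execute(
        "DELETE FROM likes WHERE post_id = ? AND user_id = ?",
        (post_id, user_id),
    )


def like_count(conn, post_id):
    row = conn.execute(
        "SELECT counter FROM posts WHERE post_id = ?", (post_id,)
    ).fetchone()
    return row[0]
```

Reading the count is then a single-row lookup on posts, while the likes table still records who liked what.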
If you like the previous Q/A and think it is a good idea, I have a follow-up question:
Q. Why do we need the likes table then?
A. That depends on your application design and requirements. Judging by the columns you posted (u_id, post_id, user_id; I would even add a timestamp column), your requirement is to store which user liked which post. That means you can tell whether a user has already liked a post and refuse multiple likes. If you don't care about multiple likes, a historical timeline, or stats, you can drop the likes table.
The last question I see here:
Q. The like table is going to be filled with billions of rows after a few months, isn't it?
A. I wish you that success, but IMHO you are 99% wrong. To reach just 1M records you need 1,000 active users (a very good number for a personal startup; are you building the whole app with no architect or designer involved?), and EVERY one of those users has to like EVERY one of 1,000 posts, if you have that many.
My point here is: fortunately, you have plenty of time before your database becomes big enough to hurt your application. Until your table reaches 10-20M records, you don't need to worry about size and performance.

member action table data model suggestion

I'm trying to add an action table, but I'm currently unsure how to approach the problem.
Before I go into more detail:
We have members who can perform different actions on our website:
add an image
update an image
rate an image
post a comment on image
add a blog post
update a blog post
comment on a blog post
etc, etc
The action table lets our users "watch" other members' activities by adding them to their watch list.
I have created a table called member_actions with the following columns:
[UserID] [actionDate] [actionType] [refID]
[refID] can be a reference to the image ID in the DB, a blog post ID, or the id column of another actionable table (e.g. event)
[actionType] is an Enum column with action names such as (imgAdd,imgUpdate,blogAdd,blogUpdate, etc...)
[actionDate] decides which records get deleted every 90 days, so we won't be keeping the actions forever
The current MySQL query I came up with is:
SELECT act.*,
    img.Title, img.FileName, img.Rating, img.isSafe, img.allowComment AS allowimgComment,
    blog.postTitle, blog.firstImageSRC AS blogImg, blog.allowComments AS allowBlogComment,
    event.Subject, event.image AS eventImg, event.stimgs, event.ends,
    imgrate.Rating
FROM member_action act
LEFT JOIN member_img img ON (act.actionType="imgAdd" OR act.actionType="imgUpdate")
    AND img.imgID=act.refID AND img.isActive AND img.isReady
LEFT JOIN member_blogpost blog ON (act.actionType="blogAdd" OR act.actionType="blogUpdate")
    AND blog.id=act.refID AND blog.isPublished AND blog.isPublic
LEFT JOIN member_event event ON (act.actionType="eventAdd" OR act.actionType="eventUpdate")
    AND event.id=act.refID AND event.isPublished
LEFT JOIN img_rating imgrate ON act.actionType="imgRate"
    AND imgrate.UserID=act.UserID AND imgrate.imgID=act.refID
LEFT JOIN member_favorite imgfav ON act.actionType="imgFavorite"
    AND imgfav.UserID=act.UserID AND imgfav.imgID=act.refID
LEFT JOIN img_comment imgcomm ON (act.actionType="imgComment" OR act.actionType="imgCommentReply")
    AND imgcomm.imgID=act.refID
LEFT JOIN blogpost_comment blogcomm ON (act.actionType="blogComment" OR act.actionType="blogCommentReply")
    AND blogcomm.blogPostID=act.refID
ORDER BY act.actionDate DESC
LIMIT XXXXX,20
OK, so basically, given that I'll be deleting actions older than 90 days every week or so, would it make sense to go with this query for displaying the member action history?
Or should I add a new text column to the member_actions table called [actionData], where I can store a few details in JSON or XML format for fast querying of the member_action table?
It adds to the table size but reduces query complexity, and the table will be purged of old entries periodically.
The assumption is that we'll eventually have no more than a few 100k members, so I'm concerned about the size of the member_action table with its text [actionData] column that will contain some specific details.
I'm leaning towards the [actionData] model, but any recommendations or considerations will be appreciated.
Another consideration is that the table entries for img or blog could get deleted, so I could have an action but no referenced record. This certainly adds to the problem.
Thanks in advance.
Because you are dealing with user interface issues, performance is key. All the joins will take time, even with indexes. And querying the database is likely to lock records in all the tables (or indexes), which can slow down inserts.
So, I lean towards denormalizing the data, by maintaining the text in the record.
However, a key consideration is whether the text can be updated after the fact. That is, you will load the data when it is created. Can it then change? The problem of maintaining the data in light of changes (which could involve triggers and stored procedures) could introduce a lot of additional complexity.
If the data is static, this is not an issue. As for table size, I don't think you should worry about that too much. Databases are designed to manage memory. The database keeps the table in a page cache, which should contain the pages for currently active members. You can always increase memory size, especially for 100,000 users, which is well within the realm of today's servers.
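A minimal sketch of the denormalized [actionData] approach, using SQLite through Python as a stand-in for MySQL. The column names follow the question; the keys stored inside the JSON, the helper names, and the choice of ISO-formatted date strings are assumptions for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE member_actions (
        UserID     INTEGER NOT NULL,
        actionDate TEXT    NOT NULL,  -- ISO date strings sort chronologically
        actionType TEXT    NOT NULL,
        refID      INTEGER NOT NULL,
        actionData TEXT    NOT NULL   -- JSON snapshot taken when the action happens
    )
    """
)


def record_action(conn, user_id, action_date, action_type, ref_id, details):
    """Store the action together with the few details needed to display it."""
    conn.execute(
        "INSERT INTO member_actions VALUES (?, ?, ?, ?, ?)",
        (user_id, action_date, action_type, ref_id, json.dumps(details)),
    )


def recent_actions(conn, user_id, limit=20):
    """Read a member's history from one table, with no joins."""
    rows = conn.execute(
        """
        SELECT actionDate, actionType, refID, actionData
        FROM member_actions
        WHERE UserID = ?
        ORDER BY actionDate DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
    return [(d, t, r, json.loads(data)) for d, t, r, data in rows]


def purge_old(conn, cutoff_date):
    """Delete actions older than the cutoff (the periodic 90-day cleanup)."""
    conn.execute("DELETE FROM member_actions WHERE actionDate < ?", (cutoff_date,))
```

Because the details are a snapshot, they keep displaying even if the referenced image or blog post is deleted later, but they also go stale if that record is edited, which is the update concern raised above.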
I'd be wary of this approach - as you add kinds of actions that you want to monitor the join is going to keep growing (and the sparse extra columns in the select statement as well).
I don't think it would be that scary to have a couple of extra columns in this table - and this query sounds like it would be running fairly frequently, so making it efficient seems like it would be a good idea.

Suggestions for revisions to my social network "Updates" table?

As I continue to work on my social networking site (which I'll probably never finish), I've decided I probably should revise my "Updates" table. If you think of this like Facebook, the Updates table stores stories for the newsfeed, such as User_123 changed his status, or SomeOtherUser added a new photo/video, or YetAnotherUser joined a group.
My current table structure is as follows:
UPDATES
    PK Update_ID
       Type
       Update_Content
    FK Photo_ID
    FK Video_ID
    FK Owner_ID
    FK Group_Wall_ID
    FK Friend_Wall_ID
       Upvotes
       Downvotes
       Timestamp
As a note, Type refers to the kind of update it is (1 is a status update, 2 is a user joined a group, 3 is a new photo, etc...) and Update_Content is the status text, or a message like "User_123 joined a group"
Right now the way I have it, when a user posts an update to their own "wall", Group_Wall_ID and Friend_Wall_ID are 0 by default. Whereas if that user posts an update to a Group, Group_Wall_ID has a value and Friend_Wall_ID doesn't.
Also, if the update is only a status update, Photo_ID and Video_ID are 0 by default. However, if the update is a new photo, Photo_ID would have a value that corresponds with a PK in the Photos table.
I feel like the structure of this table is pretty inefficient and can use some revisions. Can anyone suggest any revisions to make this table better? Any feedback would be great! Thanks and Happy Holidays!
I don't think this application is a good fit for MySQL. With MySQL you are pulling all the resources together on every read. It also doesn't seem that a feed will need to span very far back chronologically.
I think a better solution is to push an activity to the appropriate feed on write. So if you post a video, it gets appended to all of your friends' news feeds. You could limit each feed to 100 items to keep the lists smaller.
I think using Redis would be more appropriate. You could have a list for each user's activity feed: LPUSH user_id 'John just added a video'
This solution requires you to have a lot of memory though, and also it may be problematic if a user deletes something from their feed.
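The push-on-write idea can be sketched in plain Python, with an in-memory deque standing in for each user's Redis list. This is only an illustration: the function names are hypothetical, and in Redis the same effect would typically come from an LPUSH followed by an LTRIM to cap the list length.

```python
from collections import defaultdict, deque

FEED_LIMIT = 100  # cap per feed, as suggested above

# One capped list per user; a deque with maxlen drops the oldest item
# from the far end when a new item is pushed onto a full feed.
feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))


def publish(friend_ids, message):
    """Fan out on write: push the activity onto every friend's feed."""
    for friend_id in friend_ids:
        feeds[friend_id].appendleft(message)


def read_feed(user_id, count=20):
    """Reading is a slice of one list, newest first, with no joins."""
    return list(feeds[user_id])[:count]


publish([2, 3], "John just added a video")
print(read_feed(2))
```

The trade-off is the one noted above: every write is repeated once per friend, and removing an item later means finding it in every feed it was copied to.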

Database model for individual users seeing notice

I am looking for the best way to set up the MySQL database for my app.
My app works like a noticeboard with two sections, "New Notices" and "Seen Notices".
Now when a user has viewed a notice, they click a button and it moves from New to Seen. But ONLY for this person.
Each person will have all of the notices viewable - but not necessarily in the same sections - as users will view them at different times and check them off as seen at different times.
My guess is to have one table, "Notices", for all notices, and a separate table called "Seen" with the columns "UserID" and "noticeID". This means that for each notice the app will need to consult the "Seen" table to find out which section it should be shown in. Is this ideal, or is there another way?
Having a table with NoticeID and UserID is correct, I'd also add viewed date.
You can use 3 tables
Users
Notices
SeenNotices (maybe not the best name)
In the SeenNotices table, have three columns: UserID, NoticesID, HaveSeen. The HaveSeen column will tell you whether the user has seen the notice.
The way you are thinking should work, although over time you'll end up with a very big 'Seen' table, which is not scalable. An easy alternative is to use an 'Unseen' table instead. This way the table gets smaller as people view the notices, and you can also delete very old entries (old notices may no longer be relevant, so it doesn't matter if they are no longer shown as unseen to the user).
Using the 'unseen' table your query will look like this:
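SELECT n.notice_id, n.notice_msg, IF(u.user_id IS NULL, 'seen', 'new') AS status
FROM notice n
LEFT JOIN unseen u ON (u.user_id = $user_id AND n.notice_id = u.notice_id);
-- No WHERE on user_id: it would drop the rows with no match in unseen,
-- which are exactly the notices this user has already seen.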
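To make the 'unseen' idea concrete, here is a small runnable sketch using SQLite through Python in place of MySQL. The helper names are assumptions, and CASE is used where MySQL would allow IF(). A row in unseen means the notice is still new for that user; marking it as seen deletes the row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE notice (notice_id INTEGER PRIMARY KEY, notice_msg TEXT NOT NULL);
    CREATE TABLE unseen (
        user_id   INTEGER NOT NULL,
        notice_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, notice_id)
    );
    """
)


def add_notice(conn, notice_id, msg, user_ids):
    """Create a notice and mark it as unseen for every given user."""
    conn.execute("INSERT INTO notice VALUES (?, ?)", (notice_id, msg))
    conn.executemany(
        "INSERT INTO unseen VALUES (?, ?)", [(u, notice_id) for u in user_ids]
    )


def mark_seen(conn, user_id, notice_id):
    """Moving a notice to 'seen' just removes the user's unseen row."""
    conn.execute(
        "DELETE FROM unseen WHERE user_id = ? AND notice_id = ?",
        (user_id, notice_id),
    )


def notices_for(conn, user_id):
    """List every notice with its status for one user."""
    return conn.execute(
        """
        SELECT n.notice_id,
               CASE WHEN u.user_id IS NULL THEN 'seen' ELSE 'new' END AS status
        FROM notice n
        LEFT JOIN unseen u ON u.user_id = ? AND u.notice_id = n.notice_id
        ORDER BY n.notice_id
        """,
        (user_id,),
    ).fetchall()
```

One cost of this design is that adding a notice writes one row per user, and users who sign up later need unseen rows created for whichever existing notices they should still see as new.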

MYSQL Database Schema Question

I need opinions on the best way to go about creating a table or collection of tables to handle this unique problem. Basically, I'm designing this site with business profiles. The profile table contains all your usual things such as name, uniqueID, address, etc. Now, the whole idea of the site is that it's going to be collecting a small string of informative text. I want to allow the clients to store one per date, up to 30 days in advance. The program is only going to show the information from the current date onward, with expired dates not being shown.
The only way I can really see this being done is a table consisting of the uniqueID, date, and the informative block of text, but this creates pretty extensive queries. Eventually this table is going to be at least 20 times larger than the table of businesses in the first place as these businesses are going to be able to post up to 30 items in this table using their uniqueID.
Now, imagine the search page brings up a list of businesses in the area. It then has to query the new table for all of those IDs to get the block of information I want to show based on the date. I'm pretty sure it would be a rather intensive couple of queries just to show a rather simple block of text, but I imagine this is how status updates work for social networking sites in general? Does Facebook store updates in a table of updates tied to a user's ID number, or have they come up with a better way?
I'm just trying to gain a little more insight into DB design, so throw out any ideas you might have.
The only way I can really see this being done is a table consisting of the uniqueID, date, and the informative block of text...
Assuming you mean the profile uniqueID, and not a unique ID for the text table, you're correct.
As pascal said in his comment, you'd need a primary index (a composite primary key) on uniqueID and date. That way, a person could only enter one row of text for a given date.
If you want to retrieve the next text row for a person, your SQL query would have the following clauses:
WHERE UNIQUE_ID = PROFILE.UNIQUE_ID
AND DATE >= CURRENT_DATE
ORDER BY DATE
LIMIT 1
Since you have an index on uniqueID and date, this should be a fast query.
If you want to retrieve the next 5 texts for a particular person, you'd just have to make one change:
WHERE UNIQUE_ID = PROFILE.UNIQUE_ID
AND DATE >= CURRENT_DATE
ORDER BY DATE
LIMIT 5
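As a rough, runnable illustration of the composite key and the "next N texts" lookup, the sketch below uses SQLite through Python in place of MySQL. The table name profile_text, its column names, and the helper function are assumptions; dates are stored as ISO strings so that they compare and sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE profile_text (
        unique_id INTEGER NOT NULL,  -- the profile's uniqueID
        text_date TEXT    NOT NULL,  -- ISO date, e.g. '2024-05-01'
        body      TEXT    NOT NULL,
        PRIMARY KEY (unique_id, text_date)  -- one text per profile per date
    )
    """
)


def upcoming_texts(conn, unique_id, today, limit=1):
    """Return the next `limit` texts for a profile, from `today` onward."""
    return conn.execute(
        """
        SELECT text_date, body
        FROM profile_text
        WHERE unique_id = ? AND text_date >= ?
        ORDER BY text_date
        LIMIT ?
        """,
        (unique_id, today, limit),
    ).fetchall()


conn.executemany(
    "INSERT INTO profile_text VALUES (?, ?, ?)",
    [
        (1, "2024-04-30", "expired special"),
        (1, "2024-05-01", "today's special"),
        (1, "2024-05-02", "tomorrow's special"),
    ],
)
print(upcoming_texts(conn, 1, "2024-05-01", limit=5))
```

The primary key doubles as the index for the lookup: rows for one profile are found by unique_id and are already ordered by date, so expired rows are skipped without scanning them.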