Check if user has viewed something [closed] - cosmos

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
[edit] Removed it, because it is not worth it.

The exact field types will vary some depending on the database you're using, but here's the general technique:
You need a users table with unique IDs:
CREATE TABLE users (
user_id INTEGER PRIMARY KEY,
email VARCHAR(50) NULL,
password VARCHAR(32) NULL
);
And a table for your news items:
CREATE TABLE articles (
article_id INTEGER PRIMARY KEY,
title VARCHAR(50) NULL,
pubdate TIMESTAMP,
body TEXT -- or BLOB/CLOB, whatever your database supports
);
And finally a table that indicates which users have read which articles:
CREATE TABLE users_articles (
article_id INTEGER,
user_id INTEGER,
read_date TIMESTAMP
);
The users_articles table should probably be indexed by article_id, depending on the queries you use and how your database chooses to optimize those queries.
Now, to get all of the articles from the last 7 days that user_id 999 has not yet read, your query would look something like this:
SELECT a.title, a.pubdate, a.body
FROM articles a
WHERE a.pubdate > DATE_SUB(NOW(), INTERVAL 7 DAY) -- MySQL syntax
AND NOT EXISTS (
SELECT *
FROM users_articles ua
WHERE ua.article_id = a.article_id
AND ua.user_id = 999
)
Other formulations of this query are possible, of course. And the interval syntax will vary from one database to the next. But that's the gist of it.
Whenever a user reads an article, you can insert/update the users_articles table with the user_id and article_id and the current timestamp. As a side-effect, this also gives you the information about what articles the user has read most recently.
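The technique above can be sketched end to end with Python's built-in sqlite3 module. This is only an illustration under assumptions that are not in the answer: SQLite date functions stand in for the MySQL interval syntax, a composite primary key on (user_id, article_id) is added so that re-reading an article updates the existing row, and the helper names mark_read and unread_recent are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (
        article_id INTEGER PRIMARY KEY,
        title      TEXT,
        pubdate    TEXT
    );
    CREATE TABLE users_articles (
        article_id INTEGER,
        user_id    INTEGER,
        read_date  TEXT,
        PRIMARY KEY (user_id, article_id)  -- assumption: one row per user and article
    );
    INSERT INTO articles VALUES (1, 'Old news',   datetime('now', '-30 days'));
    INSERT INTO articles VALUES (2, 'Fresh news', datetime('now', '-1 day'));
    INSERT INTO articles VALUES (3, 'Newer news', datetime('now', '-2 hours'));
""")


def mark_read(user_id, article_id):
    """Record that a user read an article, refreshing read_date on a re-read."""
    conn.execute(
        "INSERT OR REPLACE INTO users_articles (article_id, user_id, read_date) "
        "VALUES (?, ?, datetime('now'))",
        (article_id, user_id),
    )


def unread_recent(user_id):
    """Return ids of articles from the last 7 days that the user has not read."""
    rows = conn.execute(
        """
        SELECT a.article_id
        FROM articles a
        WHERE a.pubdate > datetime('now', '-7 days')
          AND NOT EXISTS (
              SELECT 1
              FROM users_articles ua
              WHERE ua.article_id = a.article_id
                AND ua.user_id = ?
          )
        ORDER BY a.article_id
        """,
        (user_id,),
    ).fetchall()
    return [r[0] for r in rows]


mark_read(999, 2)
mark_read(999, 2)  # a second read replaces the row instead of adding one
print(unread_recent(999))
```

Article 1 falls outside the 7-day window and article 2 has been read, so only article 3 is reported for user 999.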

I suggest creating a new table that stores the relation between the article and the user. The table would look something like this:
newsId | userId | ip | date ...

You've got a users table. And you've got a news table.
You just need something like a user_has_read table...
id
time
user_id
news_id
That way you can add an entry to this table when a user first views something, tying them to the news item. Then look up their user_id in this table to see if they've been here before.
You can now also have a "Recently Viewed" section with links to the last 10 things they've read for ease of navigation.
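A rough sketch of both lookups, using Python's sqlite3 module. The column names follow the list above; the integer time values, the sample data, and the function names has_viewed and recently_viewed are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_has_read (
        id      INTEGER PRIMARY KEY,
        time    INTEGER,   -- assumption: a plain integer timestamp
        user_id INTEGER,
        news_id INTEGER
    )
""")

# Sample data: user 1 views news items 1..12 in order, then item 3 again.
views = [(t, 1, t) for t in range(1, 13)] + [(13, 1, 3)]
conn.executemany(
    "INSERT INTO user_has_read (time, user_id, news_id) VALUES (?, ?, ?)", views
)


def has_viewed(user_id, news_id):
    """True if the user has at least one view recorded for this news item."""
    row = conn.execute(
        "SELECT 1 FROM user_has_read WHERE user_id = ? AND news_id = ? LIMIT 1",
        (user_id, news_id),
    ).fetchone()
    return row is not None


def recently_viewed(user_id, limit=10):
    """Distinct news ids the user viewed, most recent view first."""
    rows = conn.execute(
        """
        SELECT news_id
        FROM user_has_read
        WHERE user_id = ?
        GROUP BY news_id
        ORDER BY MAX(time) DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
    return [r[0] for r in rows]


print(has_viewed(1, 5), has_viewed(2, 5))
print(recently_viewed(1))
```

Grouping by news_id keeps an item that was viewed several times from filling more than one slot in the "Recently Viewed" list.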

You could have a boolean flag to determine if something was read or not. When the user gets to whatever you want him/her to read, you can go to the database and set that flag to true, showing that that record has already been seen by a specific user.

Related

Loading time when counting numbers of followers

Say I want to get all the followers of a certain piece of content in my project. Here is my database:
table
contents
users
Now, every time I want to get a content's number of followers, I use this table, called content-followers, which connects contents to users.
table
contents
users
content-followers <
columns
user_id
content_id
My concern is that this query, which counts the followers of a content, will run alongside my other queries, and I understand it may slow down the SQL processing.
Every time people visit the content I'll have to show that count, but (as I imagine it) the count will scan the entire table.
Is there a simpler way? Like counting only once in a while and saving the result to the contents table?
I've had no proper database lessons, so thanks in advance for your help!
CREATE TABLE ContentFollowers (
user_id ...,
content_id ...,
PRIMARY KEY(user_id, content_id),
INDEX(content_id, user_id)
) ENGINE=InnoDB;
SELECT ...,
( SELECT COUNT(*) FROM ContentFollowers
WHERE content_id = c.id
) AS follower_count
FROM Contents AS c
JOIN Users AS u ON ...
WHERE ...
The COUNT(*) will efficiently use INDEX(content_id, user_id) on ContentFollowers. The added time taken will be a few milliseconds, even with many millions of users and contents.
So "... counting only once ..." should be unnecessary (and a hassle). If you want to discuss further, please provide the SHOW CREATE TABLE for each relevant table and your tentative SELECT (which will have more than what I specified).
Is it possible for a "user" to "follow" a "content" more than once? This is a potential hack to mess up your numbers, but I think what I say here avoids that possibility. (A PRIMARY KEY includes a uniqueness constraint.) Without this, a user could repeatedly click on [Follow] to inflate the number of 'followers'.
In what you have specified so far, I don't see the need for a TRIGGER. Furthermore, a Trigger would reopen the possibility of the above 'hack'.
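To see the uniqueness point in action, here is a small sketch using Python's sqlite3 module. SQLite stands in for MySQL/InnoDB, the INTEGER column types fill in what the answer leaves elided, and the helper names follow and follower_count are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ContentFollowers (
        user_id    INTEGER NOT NULL,
        content_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, content_id)
    );
    -- Secondary index so counting followers of one content is an index range scan.
    CREATE INDEX idx_content_user ON ContentFollowers (content_id, user_id);
""")


def follow(user_id, content_id):
    """Add a follower; a repeated click hits the primary key and is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO ContentFollowers (user_id, content_id) VALUES (?, ?)",
        (user_id, content_id),
    )


def follower_count(content_id):
    """Count followers of one content on demand, with no cached counter."""
    return conn.execute(
        "SELECT COUNT(*) FROM ContentFollowers WHERE content_id = ?",
        (content_id,),
    ).fetchone()[0]


for _ in range(5):
    follow(1, 42)   # user 1 clicks [Follow] five times on content 42
follow(2, 42)
follow(2, 7)

print(follower_count(42), follower_count(7))
```

However many times user 1 clicks, content 42 ends up with two followers, because the primary key allows only one row per (user_id, content_id) pair.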

Liked Posts Design Specifics

So I've found through researching myself that the best way I can design a structure for liking posts is by having a database like the following. Let's say like Reddit, a post can be upvoted, downvoted, or not voted on at all.
The table would then have three columns: [username, post, liked].
Liked could be some kind of boolean, 1 indicating liked, and 0 indicating disliked.
Then to find a post like amount, I would do SELECT COUNT(*) FROM likes WHERE post=12341 AND liked=1 for example, then do the same for liked=0(disliked), and do the addition server side along with controversy percentage.
So I have a few concerns. First off, what would be the appropriate way to find out if a user liked a post? Would I try to select the liked boolean value, and either retrieve it or catch an error? Or would I first check if the record exists, and then do another select to find out the value? What if I want to check if a user liked multiple posts at once?
Secondly, would this table not need a primary key? Because no row will have the same post and username, should I use a compound primary key?
For performance you will want to alter your database plans:
User Likes Post table
Fields:
Liked should be a boolean, you are right. You can transform this to -1/+1 in your code. You will cache the numeric totals elsewhere.
Username should be UserID. You want only numeric values in this table for speed.
Post should be PostID for the same reason.
You also want a numeric primary key because they're easier to search against, and to perform sub-selects with.
And create a unique index on (UserID, PostID), because this table is mainly an index built for speed.
So did a user vote on a post?
select id
from user_likes_post
where userID = 123 and postID = 456;
Did the user like the post?
select id
from user_likes_post
where userID = 123 and postID = 456 and liked = true;
You don't need to worry about errors, you'll either get results or you won't, so you might as well go straight to the value you're after:
select liked from user_likes_post where userID = 123 and postID = 456;
Get all the posts they liked:
select postID
from user_likes_post
where userID = 123 and liked = true;
Post Score table
PostID
TotalLikes
TotalDislikes
Score
This second table will be dumped and refreshed every n minutes by calculating on the first table. This second table is your cached aggregate score that you'll actually load for all users visiting that post. Adjust the frequency of this repeat dump-and-repopulate schedule however you see fit. For a small hobby or student project, just do it every 30 seconds or 2 minutes; bigger sites, every 10 or 15 minutes. For an even bigger site like reddit, you'd want to make the schema more complex to allow busier parts of the site to have faster refresh.
// this is not exact code, just an outline
totalLikes =
select count(*)
from user_likes_post
where postID=123 and liked=true
totalDislikes =
select count(*)
from user_likes_post
where postID=123 and liked=false
totalVotes = totalLikes + totalDislikes
// guard against division by zero for posts with no votes yet
score = totalVotes > 0 ? totalLikes / totalVotes : 0;
(You can simulate an update by involving the user's localStorage -- client-side Javascript showing a bump-up or down on the posts that user has voted on.)
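The dump-and-repopulate step could look roughly like the following, sketched with Python's sqlite3 module. The snake_case table name post_score, the 0/1 encoding of liked, and the function name refresh_post_scores are assumptions for the example; a real job would be run by whatever scheduler you use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_likes_post (
        id     INTEGER PRIMARY KEY,
        userID INTEGER NOT NULL,
        postID INTEGER NOT NULL,
        liked  INTEGER NOT NULL,          -- 1 = like, 0 = dislike
        UNIQUE (userID, postID)
    );
    CREATE TABLE post_score (
        postID        INTEGER PRIMARY KEY,
        totalLikes    INTEGER,
        totalDislikes INTEGER,
        score         REAL
    );
    INSERT INTO user_likes_post (userID, postID, liked) VALUES
        (1, 456, 1), (2, 456, 1), (3, 456, 0), (1, 789, 0);
""")


def refresh_post_scores():
    """Rebuild the cached aggregates from the votes table in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("DELETE FROM post_score")
        conn.execute("""
            INSERT INTO post_score (postID, totalLikes, totalDislikes, score)
            SELECT postID,
                   SUM(liked = 1),
                   SUM(liked = 0),
                   1.0 * SUM(liked = 1) / COUNT(*)  -- every group has >= 1 vote
            FROM user_likes_post
            GROUP BY postID
        """)


refresh_post_scores()
for row in conn.execute("SELECT * FROM post_score ORDER BY postID"):
    print(row)
```

Because the aggregates come from a GROUP BY, a post with no votes simply has no row in post_score, which also sidesteps the division by zero.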
Given your suggested 3-column table and the selects you suggest, be sure to have
PRIMARY KEY(username, post) -- helps with "did user like a post"
INDEX(post, liked) -- for that COUNT
When checking whether a user liked a post, you can do a LEFT JOIN so that you get one of three things: 1=liked, 0=disliked, or NULL=not voted. Alternatively, you could use EXISTS( SELECT .. ).
Tables need PKs.
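The LEFT JOIN variant also covers the "multiple posts at once" case. A minimal sketch with Python's sqlite3 module, assuming a posts table with a post_id column (not described in the question) and a hypothetical helper named vote_states:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY);  -- hypothetical posts table
    CREATE TABLE likes (
        username TEXT    NOT NULL,
        post     INTEGER NOT NULL,
        liked    INTEGER NOT NULL,      -- 1 = liked, 0 = disliked
        PRIMARY KEY (username, post)
    );
    CREATE INDEX idx_post_liked ON likes (post, liked);
    INSERT INTO posts VALUES (1), (2), (3);
    INSERT INTO likes VALUES ('alice', 1, 1), ('alice', 2, 0);
""")


def vote_states(username, post_ids):
    """Map each requested post id to 1 (liked), 0 (disliked) or None (no vote)."""
    placeholders = ", ".join("?" for _ in post_ids)
    rows = conn.execute(
        f"""
        SELECT p.post_id, l.liked
        FROM posts p
        LEFT JOIN likes l
               ON l.post = p.post_id
              AND l.username = ?
        WHERE p.post_id IN ({placeholders})
        """,
        [username, *post_ids],
    ).fetchall()
    return dict(rows)


print(vote_states("alice", [1, 2, 3]))
```

One round trip returns all three states, so there is no need to probe for existence first and fetch the value second.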
I agree with Rick James that the likes table should be uniquely indexed by the (username, post) pair.
I also advise you to allow a bit of redundancy and keep a like_counter in the posts table. It will significantly reduce the load on regular queries.
Increase or decrease the counter right after successfully adding the like/dislike record.
All in all,
to get posts with likes: a plain select of posts
no need to add joins and aggregate sub-queries.
to like/dislike: (1) insert into likes, on success (2) update posts.like_counter.
the unique index prevents duplication.
to find out whether a user has already liked the post: select from likes by the username+post pair.
the index makes this fast.
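The insert-then-increment sequence might be sketched like this with Python's sqlite3 module. Wrapping both statements in one transaction is an addition of this sketch, not something the answer spells out, and the column layout and the function name like are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        post_id      INTEGER PRIMARY KEY,
        like_counter INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE likes (
        username TEXT    NOT NULL,
        post     INTEGER NOT NULL,
        PRIMARY KEY (username, post)    -- the unique index that blocks duplicates
    );
    INSERT INTO posts (post_id) VALUES (1);
""")


def like(username, post_id):
    """Insert the like and bump the counter; return False on a duplicate like."""
    try:
        with conn:  # one transaction: both statements commit, or neither does
            conn.execute(
                "INSERT INTO likes (username, post) VALUES (?, ?)",
                (username, post_id),
            )
            conn.execute(
                "UPDATE posts SET like_counter = like_counter + 1 WHERE post_id = ?",
                (post_id,),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the unique index rejected a repeated like


results = [like("alice", 1), like("alice", 1), like("bob", 1)]
counter = conn.execute(
    "SELECT like_counter FROM posts WHERE post_id = 1"
).fetchone()[0]
print(results, counter)
```

The second like from alice is rejected by the unique index before the counter is touched, so the counter stays in step with the rows in likes.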
My initial thought was that the problem is because boolean type is not rich enough to express the possible reactions to a post. So instead of boolean, you needed an enum with possible states of Liked, Disliked, and the third and the default state of Un-reacted.
Now, however, it seems you can make do with a boolean after all, because you do not need to record the Un-reacted state. A lack of reaction means that you do not add an entry to the table.
What would be the appropriate way to find out if a user liked a post?
SELECT Liked
FROM Likes
WHERE Likes.PostId = 1234
AND Likes.UserName = 'UniqueUserName';
If the post was not interacted with by the user, there would be no results. Otherwise, 1 if liked and 0 if disliked.
What if I want to check if a user liked multiple posts at once?
I think for that you need to store a timestamp too. You can then use that timestamp to see if there are multiple liked posts within a short duration.
You could employ k-means clustering to figure out whether there are any "clusters" of likes. The complete explanation is too big to add here.
Would this table not need a primary key?
Of course it would. But Like is a weak entity depending upon the Post. So it would require the PK of Post, which is the field post (I assume). Combined with username we would have the PK because (post, username) would be unique for user's reaction.

Storing duplicate fields: good or bad [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
Let's say a user has posts table like this:
Post with id=1 is the first post that a user has posted. Post with an id=2 – is the edit that was made to the post, with id=3 – latest current version of the post.
post_param_a cannot be changed throughout versions, as well as user_id – they always stay the same since the first version. So we could store it like this:
So the question is: would it be better to store it the second way, with no duplication? This way, to get a current version of user's post we'd have to join the first version and check its user_id all the time. Or is it okay to store duplicate fields in this case?
P.S. I'm asking because we want to avoid duplication and accidental changes of values that must not change across versions, so we want to store them all in one place.
Take the entity Post and look at the simple tuple:
ID  User_ID  Post_Param_A  Comment
1   69       foo           This is a post
This is perfectly normalized. However, the post may undergo editing and you want to track the changes made. So you add another field to track the changes. Instead of an incremental value, however, it would make more sense to add a datetime field.
ID  EffDate       User_ID  Post_Param_A  Comment
1   1/1/16 12:00  69       foo           This is a post
This has two advantages: 1) if you track the changes, you will want to know anyway when this version was saved and 2) you don't have to find the largest incremental value for the post to find out what value to save with each new version. Just save the current date and time.
However, with either an incremental value or date, there is a problem. In the simple row, each field has a functional dependency on the PK. In the version row, User_ID and Post_Param_A maintain their dependency on the PK but Comment is now dependent on the PK and EffDate.
The tuple is no longer in 2NF.
So the solution is a simple matter of normalizing it:
ID  User_ID  Post_Param_A
1   69       foo
ID  EffDate       Comment
1   1/1/16 12:00  This is a post
1   1/1/17 12:00  An edit was made
1   1/1/17 15:00  The last and current version (so far)
with (ID, EffDate) the composite PK in the new table.
The query to read the latest post is a bit complicated:
select p.ID, v.EffDate, p.User_ID, p.Post_Param_A, v.Comment
from Posts p
join PostVersions v
on v.ID = p.ID
and v.EffDate = (
select Max( v1.EffDate )
from PostVersions v1
where v1.ID = p.ID
and v1.EffDate <= today )
and p.ID = 1;
This is not really as complicated as it looks and it is impressively fast. The really neat feature is -- if you replace "today" with, say, 1/1/17 13:00, the result will be the second version. So you can query the present or the past using the same query.
Another neat feature is achieved by creating a view from the "today" query with the last line ("and p.ID = 1") removed. This view will expose the latest version of all posts. Create triggers on the view and this allows the apps that are only interested in the current version to do their work without consideration of the underlying structure.
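The "as of" behaviour can be tried out with Python's sqlite3 module. This is a sketch under assumptions: the dates are stored as ISO-8601 text (reading 1/1/16 as 2016-01-01), the "today" placeholder becomes a bound parameter, and the function name version_as_of is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Posts (
        ID           INTEGER PRIMARY KEY,
        User_ID      INTEGER,
        Post_Param_A TEXT
    );
    CREATE TABLE PostVersions (
        ID      INTEGER REFERENCES Posts (ID),
        EffDate TEXT,                    -- ISO-8601 text sorts chronologically
        Comment TEXT,
        PRIMARY KEY (ID, EffDate)
    );
    INSERT INTO Posts VALUES (1, 69, 'foo');
    INSERT INTO PostVersions VALUES
        (1, '2016-01-01 12:00', 'This is a post'),
        (1, '2017-01-01 12:00', 'An edit was made'),
        (1, '2017-01-01 15:00', 'The last and current version (so far)');
""")


def version_as_of(post_id, as_of):
    """Return the comment of the version in effect at as_of, or None."""
    row = conn.execute(
        """
        SELECT p.ID, v.EffDate, p.User_ID, p.Post_Param_A, v.Comment
        FROM Posts p
        JOIN PostVersions v
          ON v.ID = p.ID
         AND v.EffDate = (
             SELECT MAX(v1.EffDate)
             FROM PostVersions v1
             WHERE v1.ID = p.ID
               AND v1.EffDate <= ?
         )
        WHERE p.ID = ?
        """,
        (as_of, post_id),
    ).fetchone()
    return row[4] if row else None


print(version_as_of(1, "2017-01-01 13:00"))
print(version_as_of(1, "9999-12-31 00:00"))
```

Passing a far-future date plays the role of "today" and returns the current version, while 2017-01-01 13:00 returns the second version, as described above.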
You could have a separate table where you store the post_param_a for each post_id, then you wouldn't need to have NULL values or duplicate values.
The 1st solution is better because user_id is aligned with the post_id and avoid various interpretations.
This way, to get a current version of user's post we'd have to join the first version and check its user_id all the time.
Have you thought about adding a timestamp field, so that you can always get the latest version of a post?
In the 2nd solution, NULL could become ambiguous as the data grows. Querying will also be harder: every SQL statement has to be designed with the NULL cases and their specific meanings in mind.
A 3rd solution would be to normalize your table into 2 separate ones, e.g. post and post_history. As you mentioned in the question, post_param_a and user_id cannot be changed across versions; they always stay the same as in the first version. In this case:
In table post, you can store the information about the post that is permanent (won't be changed): id, param_a, user_id, created_at ...
In table post_history, you can store the information that belongs to each version / modification: version_id, comment, modified_at ... And you can add an FK constraint on the second table so that post_history.post_id references post.id.

Fill foreign key column with values from different table in mysql [closed]

Closed 9 years ago.
I think I'm a bit in over my head here - really appreciate any help! :)
I have two tables in two mysql-databases:
Database A.Table A
id (int)
name (varchar)
Database B.Table B
id (int)
name (varchar)
foreign key1 (int)
foreign key2 (int)
I would like to make a MySQL query like SELECT * FROM, so that the result will still be correct, even if some extra columns are added later. However, I would like the foreign keys in Table B to be replaced with the corresponding varchar name from table A. Both foreign keys point to the same table, but may differ, because they represent where a person is, and where he belongs.
I've tried this:
SELECT * FROM tblA RIGHT JOIN tblB ON tblA.id = tblB.foreignkey1
But it adds another column to the result set, which is not what I'm trying to achieve.
When querying tblB like so: SELECT * FROM tblB (add the magic I'm looking for here), the end result should be something like this:
*--------*----------*--------------------------*---------------------------*
|tblB.id | tblB.name| foreign value1 from tbl.A | foreign value2 from tbl.A|
*--------*----------*--------------------------*---------------------------*
You could do things like this:
SELECT tblB.id, tblB.name, tblA.* -- all fields in tblA
FROM tblA RIGHT JOIN tblB ON tblA.id = tblB.foreignkey1
... but this is pretty much all I can think of. Unfortunately, you must either use SELECT * or SELECT [each and every column], there is no alternative.
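If listing the columns explicitly is acceptable, both foreign keys can be resolved by joining tblA twice under different aliases. A sketch with Python's sqlite3 module, assuming both tables live in one database (in MySQL you would qualify them with their database names) and using made-up sample rows and output column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblA (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tblB (
        id          INTEGER PRIMARY KEY,
        name        TEXT,
        foreignkey1 INTEGER REFERENCES tblA (id),
        foreignkey2 INTEGER REFERENCES tblA (id)
    );
    INSERT INTO tblA VALUES (1, 'Office'), (2, 'Home');
    INSERT INTO tblB VALUES (10, 'Ann', 1, 2), (11, 'Bob', 2, NULL);
""")

# Join tblA once per foreign key; the aliases a1 and a2 keep the two lookups apart.
rows = conn.execute("""
    SELECT b.id,
           b.name,
           a1.name AS value1,   -- name behind foreignkey1
           a2.name AS value2    -- name behind foreignkey2
    FROM tblB b
    LEFT JOIN tblA a1 ON a1.id = b.foreignkey1
    LEFT JOIN tblA a2 ON a2.id = b.foreignkey2
    ORDER BY b.id
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOINs keep rows of tblB whose foreign key is NULL. The trade-off noted above still applies: columns added to tblB later will not appear until the select list is updated.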

Like and Dislike System for Posts

I want to include a like/dislike system similar to Facebook and so far, I have set the like/dislike columns as a 'text' type. This is so that I can add the id for the user(s) who liked/disliked a post. Would that be the best way of doing it? Also, in addition to the question above, how would I stop a user pressing the like and dislike button again? Since, once a user has liked a post, it should display an unlike/undislike option? A concept/idea would be great of how to do this.
While it's hard to make armchair decisions, here are my ideas:
First, you could have a 'likes' integer column for each post. When a user clicks up and down, have that number increment or decrement. This offers no protection against users clicking multiple times, but it's easy and fast.
Another way would be to have a 'Like' table, with columns post_id, user_id, and score. score can have two values: '1' or '-1'. All 3 columns are integers. When the user clicks 'like', you do an INSERT/UPDATE command on the row with user_id & post_id matching.
Then, to see the final score for a post, you do a SELECT SUM(score) FROM that_table WHERE post_id = ?.
With this second method, if you wanted to see the name of the most recent clicker, you could add a timestamp column and search for the most recent entry.
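The second method could be sketched as follows with Python's sqlite3 module. The table name post_likes, the composite primary key (which is what makes the INSERT/UPDATE step hit a single row), and the helper names vote and post_score are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE post_likes (
        post_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        score   INTEGER NOT NULL CHECK (score IN (1, -1)),
        PRIMARY KEY (post_id, user_id)   -- assumption: one vote per user and post
    )
""")


def vote(post_id, user_id, score):
    """Insert the user's vote, or overwrite their earlier vote on the same post."""
    conn.execute(
        "INSERT OR REPLACE INTO post_likes (post_id, user_id, score) VALUES (?, ?, ?)",
        (post_id, user_id, score),
    )


def post_score(post_id):
    """Sum of all votes for a post; 0 when nobody has voted yet."""
    return conn.execute(
        "SELECT COALESCE(SUM(score), 0) FROM post_likes WHERE post_id = ?",
        (post_id,),
    ).fetchone()[0]


vote(1, 10, 1)
vote(1, 10, 1)    # a repeated like replaces the row, so it counts once
vote(1, 11, -1)
vote(1, 12, 1)
before = post_score(1)
vote(1, 11, 1)    # user 11 switches from dislike to like
after = post_score(1)
print(before, after)
```

Unlike the plain counter column, repeated clicks cannot inflate the total here, since each user owns at most one row per post.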
I would create a table with the following structure:
Table: Likes
PostId bigint
UserId bigint
Like bit(true, false)
Then set PostId and UserId together as the primary key. This will prevent the database from inserting multiple likes/unlikes for the same post.
In your code, check to see if the user/post combination exists and then toggle the bit value if it does or set the bit value to true if it does not.
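That check-then-toggle logic might look like this, sketched with Python's sqlite3 module. The column is named Liked rather than Like because LIKE is an SQL keyword, the bit is stored as 0/1, and the function name toggle_like is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Likes (
        PostId INTEGER NOT NULL,
        UserId INTEGER NOT NULL,
        Liked  INTEGER NOT NULL,         -- stands in for the bit column
        PRIMARY KEY (PostId, UserId)
    )
""")


def toggle_like(post_id, user_id):
    """Flip the bit if the row exists, otherwise insert it as liked; return the new value."""
    with conn:  # keep the check and the write in one transaction
        row = conn.execute(
            "SELECT Liked FROM Likes WHERE PostId = ? AND UserId = ?",
            (post_id, user_id),
        ).fetchone()
        if row is None:
            new_value = 1
            conn.execute(
                "INSERT INTO Likes (PostId, UserId, Liked) VALUES (?, ?, ?)",
                (post_id, user_id, new_value),
            )
        else:
            new_value = 0 if row[0] else 1
            conn.execute(
                "UPDATE Likes SET Liked = ? WHERE PostId = ? AND UserId = ?",
                (new_value, post_id, user_id),
            )
    return new_value


states = [toggle_like(5, 1), toggle_like(5, 1), toggle_like(5, 1)]
print(states)
```

Three clicks by the same user leave a single row whose bit ends up set, which is also what the page can use to decide whether to show "like" or "unlike".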