MySQL Loading Rows via String's Contents - mysql

I'm making a "Like" button for a portfolio website I'm working on and I've gotten stumped by some of my own code!
I have two MySQL tables:
img_all: contains all images on the server (each image has a 6-digit INT id)
login: contains account information for all users of the website (each user has a 6-digit INT id as well as a VARCHAR column labeled "likes")
The way my system is laid out, when a user "Likes" a picture, I save that picture's id to a column on the login table:
UPDATE login SET likes = CONCAT(likes,':$img_id:') WHERE user_key = $user_id;
and when they unlike a picture:
UPDATE login SET likes = REPLACE(likes,':$img_id:','') WHERE user_key = $user_id;
This will output strings in the likes column similar to this:
:456093:475829:203944:789203:
My problem starts here. I'm making a page that allows users to view all the pictures that they've liked (let's call this file "Likes.php").
However, the list of liked pictures is saved in the login table, while the actual picture information is saved in img_all.
How then do I take the list from my login table and translate it to select those images from img_all? I was thinking of using a mixture of:
SELECT user_key FROM login WHERE likes LIKE '%:$img_id:%';
and
while();
I also thought of a SQL query. I know it won't work. However, hopefully, it will also help relay what I'm trying to accomplish!
SELECT * FROM
img_all WHERE id =
SELECT likes FROM login
WHERE likes LIKE '%:$img_id:%' AND user_key = '$user_id';

You are almost there. Instead of using a subquery, you can turn your query into a JOIN, like this:
SELECT i.*
FROM img_all i
INNER JOIN login l ON l.likes LIKE CONCAT('%:', i.id, ':%')
WHERE l.user_key = ?
While this might solve your problem, please be aware that storing a list of values in a single column is almost always a sign of poor design.
Accessing and modifying the data requires string manipulation, which is awkward in SQL, error-prone, and quite inefficient (a LIKE pattern with a leading wildcard cannot use an index). Also, as commented by Bill Karwin, the number of likes that you can store for a single user is limited by the maximum size of the string column.
As commented by tim, you should use a separate table to store the likes, with foreign keys to the login and img_all tables.
CREATE TABLE likes (
like_id INT NOT NULL AUTO_INCREMENT,
user_id INT NOT NULL,
img_id INT NOT NULL,
PRIMARY KEY (like_id),
FOREIGN KEY fk_likes_login(user_id) REFERENCES login(user_key),
FOREIGN KEY fk_likes_img(img_id) REFERENCES img_all(id)
);
NB: the auto-incremented primary key is not strictly necessary, and could be replaced by a composite unique index on the two foreign keys, which would also prevent a user from liking the same image twice.
Then you can retrieve all images that a user liked with a simple JOIN query:
SELECT i.*
FROM likes l
INNER JOIN img_all i ON i.id = l.img_id
WHERE l.user_id = ?
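To make the separate-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows, the `title` column, and the helper names `liked_images` and `unlike` are assumptions for illustration only, and this variant uses the composite key instead of the auto-increment id.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; sample data and the title column are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE img_all (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE likes (
        user_id INTEGER NOT NULL,
        img_id  INTEGER NOT NULL REFERENCES img_all(id),
        PRIMARY KEY (user_id, img_id)   -- one like per user per image
    );
    INSERT INTO img_all VALUES (456093, 'sunset'), (475829, 'harbor'), (203944, 'forest');
    INSERT INTO likes VALUES (1, 456093), (1, 203944), (2, 475829);
""")

def liked_images(conn, user_id):
    """Return (id, title) for every image the given user has liked."""
    return conn.execute(
        """SELECT i.id, i.title
           FROM likes l
           INNER JOIN img_all i ON i.id = l.img_id
           WHERE l.user_id = ?
           ORDER BY i.id""",
        (user_id,),
    ).fetchall()

def unlike(conn, user_id, img_id):
    """Unliking becomes a plain DELETE instead of a string REPLACE."""
    conn.execute(
        "DELETE FROM likes WHERE user_id = ? AND img_id = ?", (user_id, img_id)
    )
```

With this layout, liking and unliking are single-row INSERT and DELETE statements, and the id is bound as a parameter rather than interpolated into the SQL string.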

Related

Loading time when counting numbers of followers

Say I want to get all the followers of a certain piece of content in my project. Here is my db:
table
contents
users
Now, every time I want to get a content's number of followers, I use this table, called content-followers, which connects contents with users.
table
contents
users
content-followers <
columns
user_id
content_id
My concern is that this count query will run alongside the other queries on the page, and I understand that may slow the SQL down.
Every time people visit the content, I have to show that count, but the count (as I imagine it) will scan the entire table just to count.
Is there another way to make it simpler? Like counting only once in a while and saving the result to the contents table?
I have had no proper database lessons, so thanks in advance for your help!
CREATE TABLE ContentFollowers (
user_id ...,
content_id ...,
PRIMARY KEY(user_id, content_id),
INDEX(content_id, user_id)
) ENGINE=InnoDB;
SELECT ...,
( SELECT COUNT(*) FROM ContentFollowers
WHERE content_id = c.id
) AS follower_count
FROM Contents AS c
JOIN Users AS u ON ...
WHERE ...
The COUNT(*) will efficiently use INDEX(content_id, user_id) on ContentFollowers. The added time taken will be a few milliseconds, even with many millions of users and contents.
So "... counting only once ..." should be unnecessary (and a hassle). If you want to discuss further, please provide the SHOW CREATE TABLE for each relevant table and your tentative SELECT (which will have more than what I specified).
Is it possible for a "user" to "follow" a "content" more than once? That would be a potential hack to mess up your numbers, but the schema above avoids that possibility, because a PRIMARY KEY includes a uniqueness constraint. Without it, a user could repeatedly click on [Follow] to inflate the number of 'followers'.
In what you have specified so far, I don't see the need for a TRIGGER. Furthermore, a Trigger would reopen the possibility of the above 'hack'.
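As a rough illustration of the points above, the following sketch uses Python's sqlite3 module in place of MySQL. The helper names `follow` and `follower_count`, the index name, and the sample ids are assumptions, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ContentFollowers (
        user_id    INTEGER NOT NULL,
        content_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, content_id)
    );
    -- Secondary index so that counting by content_id does not scan the table.
    CREATE INDEX idx_content_user ON ContentFollowers (content_id, user_id);
""")

def follow(conn, user_id, content_id):
    """Record a follow; a repeated click is silently ignored by the primary key."""
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    conn.execute(
        "INSERT OR IGNORE INTO ContentFollowers VALUES (?, ?)",
        (user_id, content_id),
    )

def follower_count(conn, content_id):
    """Count followers of one content item on demand, with no cached column."""
    return conn.execute(
        "SELECT COUNT(*) FROM ContentFollowers WHERE content_id = ?",
        (content_id,),
    ).fetchone()[0]

# User 7 clicks [Follow] three times, user 8 once.
for _ in range(3):
    follow(conn, 7, 100)
follow(conn, 8, 100)
```

Because the uniqueness constraint lives in the table itself, the count stays honest no matter how often the application calls `follow`.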

Liked Posts Design Specifics

So I've found through researching myself that the best way I can design a structure for liking posts is by having a database like the following. Let's say like Reddit, a post can be upvoted, downvoted, or not voted on at all.
The database would then have three columns, [username, post, liked].
Liked could be some kind of boolean, 1 indicating liked, and 0 indicating disliked.
Then to find a post like amount, I would do SELECT COUNT(*) FROM likes WHERE post=12341 AND liked=1 for example, then do the same for liked=0(disliked), and do the addition server side along with controversy percentage.
So I have a few concerns. First off, what would be the appropriate way to find out if a user liked a post? Would I try to select the liked boolean value, and either retrieve it or catch an error? Or would I first check if the record exists, and then do another select to find out the value? What if I want to check if a user liked multiple posts at once?
Secondly, would this table not need a primary key? Because no row will have the same post and username, should I use a compound primary key?
For performance you will want to alter your database plans:
User Likes Post table
Fields:
Liked should be a boolean, you are right. You can transform this to -1/+1 in your code. You will cache the numeric totals elsewhere.
Username should be UserID. You want only numeric values in this table for speed.
Post should be PostID for the same reason.
You also want a numeric primary key because they're easier to search against, and to perform sub-selects with.
And create a unique index on (UserID, PostID), because this table is mainly an index built for speed.
So did a user vote on a post?
select id
from user_likes_post
where userID = 123 and postID = 456;
Did the user like the post?
select id
from user_likes_post
where userID = 123 and postID = 456 and liked = true;
You don't need to worry about errors. You'll either get results or you won't, so you might as well go straight to the value you're after:
select liked from user_likes_post where userID=123 and postID=456
Get all the posts they liked:
select postID
from user_likes_post
where userID = 123 and liked = true;
Post Score table
PostID
TotalLikes
TotalDislikes
Score
This second table will be dumped and refreshed every n minutes by calculating on the first table. This second table is your cached aggregate score that you'll actually load for all users visiting that post. Adjust the frequency of this repeat dump-and-repopulate schedule however you see fit. For a small hobby or student project, just do it every 30 seconds or 2 minutes; bigger sites, every 10 or 15 minutes. For an even bigger site like reddit, you'd want to make the schema more complex to allow busier parts of the site to have faster refresh.
// this is not exact code, just an outline
totalLikes =
select count(*)
from user_likes_post
where postID=123 and liked=true
totalDislikes =
select count(*)
from user_likes_post
where postID=123 and liked=false
totalVotes = totalLikes + totalDislikes
score = totalLikes / totalVotes;
(You can simulate an update by involving the user's localStorage -- client-side Javascript showing a bump-up or down on the posts that user has voted on.)
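The outline above could be turned into something runnable along these lines. This is only a sketch: it uses Python's sqlite3 module instead of MySQL, the function name `post_score` and the sample votes are made up, and it fetches both totals in one query while guarding against a post with no votes (which the outline would divide by zero on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_likes_post (
        userID INTEGER NOT NULL,
        postID INTEGER NOT NULL,
        liked  INTEGER NOT NULL,          -- 1 = like, 0 = dislike
        PRIMARY KEY (userID, postID)
    );
    INSERT INTO user_likes_post VALUES (1, 123, 1), (2, 123, 1), (3, 123, 0), (4, 123, 1);
""")

def post_score(conn, post_id):
    """Return (total_likes, total_dislikes, score) for one post."""
    likes, dislikes = conn.execute(
        # A comparison evaluates to 1 or 0, so SUM counts the matching rows.
        # COALESCE turns the NULL that SUM gives for zero rows into 0.
        """SELECT COALESCE(SUM(liked = 1), 0), COALESCE(SUM(liked = 0), 0)
           FROM user_likes_post
           WHERE postID = ?""",
        (post_id,),
    ).fetchone()
    total = likes + dislikes
    score = likes / total if total else None  # no votes yet: no score
    return likes, dislikes, score
```

A scheduled job could call something like this for each post and write the result into the cached score table.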
Given your suggested 3-column table and the selects you suggest, be sure to have
PRIMARY KEY(username, post) -- helps with "did user like a post"
INDEX(post, liked) -- for that COUNT
When checking whether a user liked a post, you can either do a LEFT JOIN, which gives you one of three things (1 = liked, 0 = disliked, or NULL = not voted), or use EXISTS( SELECT ... ).
Tables need PKs.
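Here is one way the LEFT JOIN idea might look for several posts at once, sketched with Python's sqlite3 module rather than MySQL. The `posts` table, the function name `vote_states`, and the sample rows are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE likes (
        username TEXT    NOT NULL,
        post     INTEGER NOT NULL,
        liked    INTEGER NOT NULL,        -- 1 = liked, 0 = disliked
        PRIMARY KEY (username, post)
    );
    INSERT INTO posts VALUES (1), (2), (3);
    INSERT INTO likes VALUES ('alice', 1, 1), ('alice', 2, 0);
""")

def vote_states(conn, username, post_ids):
    """Map each requested post id to 1 (liked), 0 (disliked) or None (not voted)."""
    if not post_ids:
        return {}
    placeholders = ",".join("?" * len(post_ids))
    rows = conn.execute(
        # The username test sits in the ON clause so that posts without a vote
        # still come back, with NULL in the liked column.
        f"""SELECT p.id, l.liked
            FROM posts p
            LEFT JOIN likes l ON l.post = p.id AND l.username = ?
            WHERE p.id IN ({placeholders})""",
        [username, *post_ids],
    ).fetchall()
    return dict(rows)
```

One round trip answers "did this user vote on each of these posts, and how", with no error handling needed for the not-voted case.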
I agree with Rick James that the likes table should be uniquely indexed by the (username, post) pair.
I also advise you to allow a bit of redundancy and keep a like_counter in the posts table. It will significantly reduce the load on regular queries.
Increase or decrease the counter right after successfully adding the like/dislike record.
All in all:
to get posts with likes: a plain select of posts
no need to add joins or aggregate sub-queries.
to like/dislike: (1) insert into likes, on success (2) update posts.like_counter.
the unique index prevents duplication.
to find out whether a user has already liked the post: select from likes by the username+post pair.
the index makes this fast.
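The insert-then-update flow might be sketched like this, using Python's sqlite3 module as a stand-in for MySQL. The function name `like_post` and the exact table layout are assumptions; the point is that both statements run in one transaction, so the counter cannot drift from the likes table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id           INTEGER PRIMARY KEY,
        like_counter INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE likes (
        username TEXT    NOT NULL,
        post     INTEGER NOT NULL,
        PRIMARY KEY (username, post)      -- blocks duplicate likes
    );
    INSERT INTO posts (id) VALUES (1);
""")

def like_post(conn, username, post_id):
    """Insert the like and bump the counter together; return False on a duplicate."""
    try:
        with conn:  # commits if both statements succeed, rolls back otherwise
            conn.execute("INSERT INTO likes VALUES (?, ?)", (username, post_id))
            conn.execute(
                "UPDATE posts SET like_counter = like_counter + 1 WHERE id = ?",
                (post_id,),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the pair already exists, so the counter is left alone
```

A second click by the same user fails on the primary key before the UPDATE runs, so the cached counter only moves when a new row was really added.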
My initial thought was that the problem arises because a boolean type is not rich enough to express the possible reactions to a post. So instead of a boolean, you would need an enum with the possible states Liked, Disliked, and a third, default state of Un-reacted.
It now seems, however, that a boolean is enough, because you do not need to record the Un-reacted state. A lack of reaction simply means there is no entry in the table.
What would be the appropriate way to find out if a user liked a post?
SELECT Liked
FROM Likes
WHERE Likes.PostId = 1234
AND Likes.UserName = 'UniqueUserName';
If the post was not interacted with by the user, there would be no results. Otherwise, 1 if liked and 0 if disliked.
What if I want to check if a user liked multiple posts at once?
I think for that you need to store a timestamp too. You can then use the timestamp to see whether there are multiple liked posts within a short duration.
You could employ k-means clustering to figure out whether there are any "clusters" of likes. The complete explanation is too big to add here.
Would this table not need a primary key?
Of course it would. But Like is a weak entity depending upon the Post. So it would require the PK of Post, which is the field post (I assume). Combined with username we would have the PK because (post, username) would be unique for user's reaction.

MySQL - get rows with same ID but different specific values on another column

I know the title isn't very helpful, part of the reason I'm having trouble figuring this out on my own is I can't figure out how to word it, so I can't google it.
Anyway, I'm making a Netflix style website with movies and TV shows ripped from my DVD collection. It's a LAMP stack running off my Raspberry Pi. I want to have the option to search by genre by selecting genres from a bunch of check boxes. I want it to work so that if I check "horror" and "comedy", the search results only return movies/TV shows that have BOTH those genres, not either/or.
So I have a couple MySQL tables, THR_MOVIE, THR_SHOW, and THR_GENRE. The structures of THR_MOVIE and THR_SHOW aren't really important for this question, just know that each movie/TV show takes up just one row and has a unique ID. Here's the structure of THR_GENRE:
CREATE TABLE `THR_GENRE` (
`media_id` INT(7) UNSIGNED NOT NULL,
`genre` VARCHAR(255) NOT NULL,
`media_type` ENUM('movie','show') NOT NULL,
`date_added` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`media_id`, `genre`, `media_type`))
Obviously if media_type is 'movie', then media_id is a THR_MOVIE ID and not a THR_SHOW ID. Some example data from the genres table might look like this:
row 1: media_id=12, genre='horror', media_type='movie'
row 2: media_id=12, genre='comedy', media_type='movie'
So how would this query work? I need to get the movie data, so I need to join THR_MOVIE with THR_GENRE to get the movies, then do the same with THR_SHOW and THR_GENRE to get the TV shows.
You might use left outer join:
select
*
from
THR_GENRE g
left outer join
THR_MOVIE m
on m.id = g.media_id
and g.media_type = 'movie'
left outer join
THR_SHOW s
on s.id = g.media_id
and g.media_type = 'show'
The query will return columns for both the movie and show tables. Code that consumes the result will need to check the media_type value and use the movie table columns if the type is 'movie', or the show table columns if the type is 'show'.
Thanks for the response, but I just figured it out myself.
select media_id
from THR_GENRE
where genre in ('horror', 'comedy')
and media_type='movie'
group by media_id
having count(*)=2
Obviously my PHP script will have to create the query string with count(*)=X, where X is some variable holding the number of genres searched for.
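Building that query for an arbitrary number of checked genres could look roughly like the sketch below. It is written in Python with sqlite3 rather than PHP and MySQL, the function name `media_with_all_genres` and the extra sample rows are made up, and only the placeholders are generated dynamically so the genre values themselves stay bound as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE THR_GENRE (
        media_id   INTEGER NOT NULL,
        genre      TEXT    NOT NULL,
        media_type TEXT    NOT NULL,
        PRIMARY KEY (media_id, genre, media_type)
    );
    INSERT INTO THR_GENRE VALUES
        (12, 'horror', 'movie'), (12, 'comedy', 'movie'),
        (13, 'horror', 'movie'), (14, 'comedy', 'movie'),
        (12, 'horror', 'show');
""")

def media_with_all_genres(conn, genres, media_type):
    """Return ids of media of one type that carry every requested genre."""
    genres = sorted(set(genres))          # duplicates would break the count
    if not genres:
        return []
    placeholders = ",".join("?" * len(genres))
    rows = conn.execute(
        f"""SELECT media_id
            FROM THR_GENRE
            WHERE genre IN ({placeholders}) AND media_type = ?
            GROUP BY media_id
            HAVING COUNT(*) = ?
            ORDER BY media_id""",
        [*genres, media_type, len(genres)],
    ).fetchall()
    return [row[0] for row in rows]
```

The HAVING count is reliable here because the primary key guarantees at most one row per (media_id, genre, media_type).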

Mysql setup for multiple users with large number of individual options

I'm building a study tool and I'm not sure of the best way to go about structuring my database.
Basically, I have a simple but big table with around 50,000 pieces of information in it.
info (50'000 rows)
id
info_text
user
id
name
email
password
etc
What I want is for the students to be able to mark each item as studied or to be studied (basically on and off), so that they can tick off each item when they have revised it.
I want to build the tool to cope with thousands of users, and was wondering what the most efficient/easiest way of setting up the database and associated queries would be.
At the moment I would lean towards just having one huge table with a two-column primary key (the user id and the id of the info they had studied), and then doing some sort of JOIN so I could pull back only the items that they had left to study.
user_info
user_id
info_id
Thanks in advance
Here is one way to model this situation: a table of users, an ITEM table, and a STUDIED table in the middle that links the two.
The table in the middle has a composite primary key on USER_ID and ITEM_ID, so a combination of the two must be unique, even though individually they don't have to be.
A user (with given USER_ID) has studied a particular item (with given ITEM_ID) only if there is a corresponding row in the STUDIED table (with these same USER_ID and ITEM_ID values).
Conversely, the user has not studied the item, if and only if the corresponding row in STUDIED is missing. To pull all items a given user hasn't studied, you can do something like this:
SELECT * FROM ITEM
WHERE NOT EXISTS (
SELECT * FROM STUDIED
WHERE
USER_ID = <given_user_id>
AND ITEM.ITEM_ID = STUDIED.ITEM_ID
)
Or, alternatively:
SELECT ITEM.*
FROM ITEM LEFT JOIN STUDIED
ON ITEM.ITEM_ID = STUDIED.ITEM_ID AND STUDIED.USER_ID = <given_user_id>
WHERE STUDIED.ITEM_ID IS NULL
The good thing about this design is that you don't need to care about STUDIED table in advance. When adding a new user or item, just leave the STUDIED alone - you'll gradually fill it later as users progress with their studies.
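Both forms of the "not yet studied" query can be tried out with a small sketch like the one below, which uses Python's sqlite3 module instead of MySQL. The sample rows and the function names `unstudied_not_exists` and `unstudied_left_join` are assumptions for the example. Note that in the LEFT JOIN form the user filter belongs in the ON clause; in the WHERE clause it would discard exactly the unmatched rows being looked for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ITEM (ITEM_ID INTEGER PRIMARY KEY, INFO_TEXT TEXT);
    CREATE TABLE STUDIED (
        USER_ID INTEGER NOT NULL,
        ITEM_ID INTEGER NOT NULL REFERENCES ITEM(ITEM_ID),
        PRIMARY KEY (USER_ID, ITEM_ID)
    );
    INSERT INTO ITEM VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO STUDIED VALUES (10, 1), (10, 3), (20, 2);
""")

def unstudied_not_exists(conn, user_id):
    """Items with no STUDIED row for this user, via NOT EXISTS."""
    rows = conn.execute(
        """SELECT ITEM_ID FROM ITEM
           WHERE NOT EXISTS (
               SELECT 1 FROM STUDIED
               WHERE STUDIED.USER_ID = ?
                 AND STUDIED.ITEM_ID = ITEM.ITEM_ID
           )
           ORDER BY ITEM_ID""",
        (user_id,),
    ).fetchall()
    return [r[0] for r in rows]

def unstudied_left_join(conn, user_id):
    """Same result via an anti-join: keep rows where the join found nothing."""
    rows = conn.execute(
        """SELECT ITEM.ITEM_ID
           FROM ITEM
           LEFT JOIN STUDIED
             ON STUDIED.ITEM_ID = ITEM.ITEM_ID AND STUDIED.USER_ID = ?
           WHERE STUDIED.ITEM_ID IS NULL
           ORDER BY ITEM.ITEM_ID""",
        (user_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

A brand-new user has no STUDIED rows at all, so both queries simply return every item for them.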
I would do something like this:
1) A users table with a uid primary key
2) An enrolled table (this table shows all courses that have enrolled students) with a primary key of (uid, cid)
3) An items (info) table holding all items to study, with a primary key of itemid
Then in the enrolled table just have one attribute, a binary flag: 1 means it has been studied and 0 means they still need to study it.

Storing Friends in Database for Social Network

For storing friend relationships in a social network, is it better to have another table with columns relationship_id, user1_id, user2_id, time_created, pending, or should the confirmed friends' user_ids be serialized/imploded into a single long string and stored alongside the other user details like user_id, name, dateofbirth, address, with a limit of something like 5000 friends, similar to Facebook?
Are there any better methods? The first method will create a huge table! The second one has one column with a really long string...
On the profile page of each user, all his friends need to be retrieved from the database to show something like 30 friends, similar to Facebook, so I think the first method of using a separate table will cause a huge number of database queries?
The most proper way to do this would be to have the table of Members (obviously), and a second table of Friend relationships.
You should never ever store foreign keys in a string like that. What's the point? You can't join on them, sort on them, group on them, or any other things that justify having a relational database in the first place.
If we assume that the Member table looks like this:
MemberID int Primary Key
Name varchar(100) Not null
--etc
Then your Friendship table should look like this:
Member1ID int Foreign Key -> Member.MemberID
Member2ID int Foreign Key -> Member.MemberID
Created datetime Not Null
--etc
Then, you can join the tables together to pull a list of friends
SELECT m.*
FROM Member m
INNER JOIN Friendship f ON f.Member2ID = m.MemberID
WHERE f.Member1ID = @MemberID
(This is specifically SQL Server syntax, but I think it's pretty close to MySQL. The @MemberID is a parameter.)
This is always going to be faster than splitting a string and making 30 extra SQL queries to pull the relevant data.
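For the profile-page case from the question, a single joined query is enough to fetch the friends to display. The sketch below uses Python's sqlite3 module rather than SQL Server or MySQL; the function name `friends_of`, the sample members, and the choice to store one row per direction of each friendship are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Friendship (
        Member1ID INTEGER NOT NULL REFERENCES Member(MemberID),
        Member2ID INTEGER NOT NULL REFERENCES Member(MemberID),
        Created   TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (Member1ID, Member2ID)
    );
    INSERT INTO Member VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    -- Assumption: each friendship is stored once per direction.
    INSERT INTO Friendship (Member1ID, Member2ID) VALUES (1, 2), (2, 1), (1, 3), (3, 1);
""")

def friends_of(conn, member_id, limit=30):
    """Fetch up to `limit` friends of a member with one joined query."""
    return conn.execute(
        """SELECT m.MemberID, m.Name
           FROM Friendship f
           INNER JOIN Member m ON m.MemberID = f.Member2ID
           WHERE f.Member1ID = ?
           ORDER BY m.Name
           LIMIT ?""",
        (member_id, limit),
    ).fetchall()
```

The primary key on (Member1ID, Member2ID) doubles as the index for the WHERE clause, so the lookup touches only that member's rows regardless of how large the table grows.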
Separate table as in method 1.
Method 2 is bad because you would have to unserialize it each time and won't be able to do JOINs on it; plus, UPDATEs will be a nightmare if a user changes his name, email or other properties.
Sure, the table will be huge, but you can index it on Member1_id, set the foreign key back to your user table, have static row sizes, and maybe even limit the number of friends a single user can have. I don't think it will be an issue with MySQL if you do it right, even if you hit a few million rows in your relationship table.