MySQL Join Query - joining tables into themselves many times - mysql

I have 4 queries I need to execute in order to suggest items to users based on items they've already expressed an interest in:
Select 5 random items the user already likes
SELECT item_id
FROM user_items
WHERE user_id = :user_person
ORDER BY RAND()
LIMIT 5
Select 50 people who like the same items
SELECT user_id
FROM user_items
WHERE user_id != :user_person
AND item_id = :selected_item_list
LIMIT 50
SELECT all items that the original user likes
SELECT item_id
FROM user_items
WHERE user_id = :user_person
SELECT 5 items the user doesn't already like to suggest to the user
SELECT item_id
FROM user_items
WHERE user_id = :user_id_list
AND item_id != :item_id_list
LIMIT 5
What I would like to know is: how would I execute this as one query?
There are a few reasons for me wanting to do this:
at the moment, I have to execute the 'select 50 people' query 5 times and pick the top 50 people from it
I then have to execute the 'select 5 items' query 50 * (number of items the initial user likes) times
Once the query has been executed, I intend to store the query result in a cookie (if the user gives consent to me using cookies, otherwise they don't get the 'item suggestion' at all) with the key being a hash of the query, meaning it will only fire once a day / once a week (that's why I return 5 suggestions and select a key at random to display)
Basically, if anybody knows how to write these queries as one query, could you show me and explain what is going on in the query?

This will select all items you need:
SELECT DISTINCT ui_items.item_id
FROM user_items AS ui_own
JOIN user_items AS ui_others ON ui_own.item_id = ui_others.item_id
JOIN user_items AS ui_items ON ui_others.user_id = ui_items.user_id
WHERE ui_own.user_id = :user_person
AND ui_others.user_id <> :user_person
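AND NOT EXISTS ( -- skip every item the user already likes, not only the one matched in the join
SELECT 1
FROM user_items AS ui_liked
WHERE ui_liked.user_id = ui_own.user_id
AND ui_liked.item_id = ui_items.item_id
)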
(Please check that the results match your version exactly; I only tested it on a very small fake data set.)
Next, cache this list and show 5 items from it at random in your application code, because ORDER BY RAND() is VERY inefficient (it makes the query non-deterministic, so its result cannot be cached).
EDIT: Added the DISTINCT to not show duplicate rows.
You can also return the most popular suggestions in descending order of popularity by removing DISTINCT and adding the following to the end of the query:
GROUP BY ui_items.item_id
ORDER BY COUNT(*) DESC
LIMIT 20
This returns the 20 most popular items.
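To see how the three-way self-join behaves, here is a minimal sketch that runs the idea against an in-memory SQLite database from Python. SQLite stands in for MySQL here, the sample rows are made up, and the `fans` column and `ui_liked` alias are illustrative names. It counts distinct like-minded users per item rather than `COUNT(*)`, and filters out everything the user already likes with a correlated `NOT EXISTS`; both are assumptions about what is wanted, not the only option.

```python
import sqlite3

# Hypothetical sample data: (user_id, item_id) pairs.
ROWS = [
    (1, 10), (1, 11),            # user 1 is the person we suggest items for
    (2, 10), (2, 11), (2, 12),   # user 2 shares items 10 and 11, also likes 12
    (3, 11), (3, 12), (3, 13),   # user 3 shares item 11, also likes 12 and 13
    (4, 14),                     # user 4 shares nothing with user 1
]

SUGGEST_SQL = """
SELECT ui_items.item_id, COUNT(DISTINCT ui_items.user_id) AS fans
FROM user_items AS ui_own
JOIN user_items AS ui_others ON ui_own.item_id = ui_others.item_id
JOIN user_items AS ui_items  ON ui_others.user_id = ui_items.user_id
WHERE ui_own.user_id = :user_person
  AND ui_others.user_id <> :user_person
  AND NOT EXISTS (               -- drop items the user already likes
        SELECT 1
        FROM user_items AS ui_liked
        WHERE ui_liked.user_id = ui_own.user_id
          AND ui_liked.item_id = ui_items.item_id)
GROUP BY ui_items.item_id
ORDER BY fans DESC, ui_items.item_id
LIMIT 20
"""


def suggest(conn, user_person):
    """Return (item_id, fans) rows, most popular first."""
    return conn.execute(SUGGEST_SQL, {"user_person": user_person}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_items (user_id INTEGER, item_id INTEGER)")
conn.executemany("INSERT INTO user_items VALUES (?, ?)", ROWS)

suggestions = suggest(conn, 1)
print(suggestions)  # [(12, 2), (13, 1)]
```

Item 12 comes first because two users who share an item with user 1 like it; items 10 and 11 never appear because user 1 already likes them, and item 14 never appears because user 4 has nothing in common with user 1.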

Related

How can I get an even distribution using WHERE id IN(1,2,3,4)

I have a query that is pulling users who liked a specific object from a users table. Ratings are stored in a ratings table. The query I have come up with so far looks like this:
SELECT users.id, users.name, users.image
FROM users
LEFT JOIN ratings ON ratings.userid = users.id
WHERE ratings.rating > 0
AND ratings.objectId IN (1,2,3,4)
I want to be able to put a LIMIT on this query, to avoid returning all the results, when I only need 3 or so results for each ID. If I just put a LIMIT 12 for example, I might get 8 records with one id, and 1 or 2 each for the others - i.e. an uneven distribution across the IDs.
Is there a way to write this query so as to guarantee that (assuming an object has been "liked" at least three times), I get three results for each of the ids in the list?
Numbering the rows within each objectId using user variables, and then filtering that result to keep only rows 1 to 3, should work:
SET @last_objectId = 0;
SET @count_objectId = 0;
SELECT id, name, image FROM (
SELECT
users.id,
users.name,
users.image,
@count_objectId := IF(@last_objectId = ratings.objectId, @count_objectId, 0) + 1 AS rating_row_number,
@last_objectId := ratings.objectId
FROM users
LEFT JOIN ratings ON (ratings.userid = users.id)
WHERE
ratings.rating > 0 AND
ratings.objectId IN (1,2,3,4)
ORDER BY ratings.objectId
) AS subquery WHERE rating_row_number <= 3;
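On MySQL 8.0 or later, the same "top N per group" idea can be written with the `ROW_NUMBER()` window function instead of user variables. The sketch below is only an illustration: it runs against in-memory SQLite (3.25 or newer, which also has window functions) rather than MySQL, the sample data is invented, and the `rn` and `ranked` names are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, image TEXT);
CREATE TABLE ratings (userid INTEGER, objectId INTEGER, rating INTEGER);
""")

# Hypothetical data: six users; object 1 is liked by five of them,
# object 2 by two, and object 3 only has a rating of 0 (not a like).
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(i, f"user{i}", f"user{i}.png") for i in range(1, 7)],
)
ratings = [(u, 1, 1) for u in range(1, 6)] + [(1, 2, 1), (2, 2, 1), (3, 3, 0)]
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", ratings)

TOP3_SQL = """
SELECT objectId, id, name, image
FROM (
    SELECT r.objectId, u.id, u.name, u.image,
           -- number the rows 1, 2, 3, ... separately for every objectId
           ROW_NUMBER() OVER (PARTITION BY r.objectId ORDER BY u.id) AS rn
    FROM users AS u
    JOIN ratings AS r ON r.userid = u.id
    WHERE r.rating > 0
      AND r.objectId IN (1, 2, 3, 4)
) AS ranked
WHERE rn <= 3
ORDER BY objectId, id
"""

rows = conn.execute(TOP3_SQL).fetchall()
pairs = [(object_id, user_id) for object_id, user_id, _name, _image in rows]
print(pairs)  # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
```

Object 1 is capped at three users even though five liked it, object 2 returns both of its likes, and objects 3 and 4 return nothing because they have no positive ratings.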

MySQL nested query counting

A bit of background info: this is an application that allows users to create challenges and then vote on those challenges (a bog-standard userX-vs-userY type application).
The end goal here is to get a list of 5 users sorted by the number of challenges they have won, to create a type of leaderboard. A challenge is won by a user if its status = expired and the user has > 50 votes for that challenge (challenges expire after 100 votes in total).
I'll simplify things a bit here, but essentially there are three tables:
users
id
username
...
challenges
id
issued_to
issued_by
status
challenges_votes
id
challenge_id
user_id
voted_for
So far I have an inner query which looks like:
SELECT `challenges`.`id`
FROM `challenges_votes`
LEFT JOIN `challenges` ON (`challenges`.`id` = `challenges_votes`.`challenge_id`)
WHERE `voted_for` = 1
AND `challenges`.`status` = 'expired'
GROUP BY `challenges`.`id`
HAVING COUNT(`challenges_votes`.`id`) > 50
Which in this example would return challenge IDs that have expired and where the user with ID 1 has > 50 votes for.
What I need to do is count the number of rows returned here, apply it to each user from the users table, order this by the number of rows returned and limit it to 5.
To this end I have the following query:
SELECT `users`.`id`, `users`.`username`, COUNT(*) AS challenges_won
FROM (
SELECT `challenges`.`id`
FROM `challenges_votes`
LEFT JOIN `challenges` ON (`challenges`.`id` = `challenges_votes`.`challenge_id`)
WHERE `voted_for` = 1
GROUP BY `challenges`.`id`
HAVING COUNT(`challenges_votes`.`id`) > 0
) AS challenges_won, `users`
GROUP BY `users`.`id`
ORDER BY challenges_won
LIMIT 5
Which is kinda getting there but of course the voted_for user ID here is always 1. Is this even the right way to go about this type of query? Can anyone shed any light on how I should be doing it?
Thanks!
I guess the following script will solve your problem:
-- get the number of challenges won by each user and return the top 5
SELECT usr.id, usr.username, COUNT(*) AS challenges_won
FROM users usr
JOIN (
SELECT vot.challenge_id, vot.voted_for
FROM challenges_votes vot
WHERE vot.challenge_id IN ( -- is this check really necessary?
SELECT cha.id -- if any user is voted 51 he wins, so
FROM challenges cha -- why wait another 49 votes that won't
WHERE cha.status = 'expired' -- change the result?
) --
GROUP BY vot.challenge_id, vot.voted_for -- count votes per candidate, not per challenge
HAVING COUNT(*) > 50
) aux ON (aux.voted_for = usr.id)
GROUP BY usr.id, usr.username
ORDER BY challenges_won DESC LIMIT 5;
One suggestion about the condition for closing a challenge: if a user has already won once they reach 51 votes, why wait for another 49 votes that cannot change the result? If this constraint can be dropped, you no longer have to check the challenges table, which may improve query performance. It may also make it worse; you can only tell by testing against your actual database.
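As a rough check of the leaderboard logic, the sketch below builds the three tables in an in-memory SQLite database (standing in for MySQL) and runs a variant of the query that groups votes by both challenge and candidate, joining `challenges` directly instead of using `IN`. The usernames, vote counts, the dummy voter id 99 and the `add_votes` helper are all invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE challenges (id INTEGER PRIMARY KEY, issued_to INTEGER,
                         issued_by INTEGER, status TEXT);
CREATE TABLE challenges_votes (id INTEGER PRIMARY KEY, challenge_id INTEGER,
                               user_id INTEGER, voted_for INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO challenges VALUES (?, ?, ?, ?)",
                 [(1, 1, 2, "expired"), (2, 1, 3, "expired"),
                  (3, 2, 3, "expired"), (4, 1, 2, "open")])


def add_votes(challenge_id, voted_for, count):
    """Insert `count` votes; the voter id is irrelevant here, so 99 is reused."""
    conn.executemany(
        "INSERT INTO challenges_votes (challenge_id, user_id, voted_for) "
        "VALUES (?, ?, ?)",
        [(challenge_id, 99, voted_for)] * count,
    )


add_votes(1, 1, 60); add_votes(1, 2, 40)   # alice wins challenge 1
add_votes(2, 1, 55); add_votes(2, 3, 45)   # alice wins challenge 2
add_votes(3, 2, 51); add_votes(3, 3, 49)   # bob wins challenge 3 with 51 votes
add_votes(4, 1, 70)                        # challenge 4 is still open: ignored

LEADERBOARD_SQL = """
SELECT usr.id, usr.username, COUNT(*) AS challenges_won
FROM users AS usr
JOIN (
    -- one row per (challenge, candidate) that collected more than 50 votes
    SELECT vot.challenge_id, vot.voted_for
    FROM challenges_votes AS vot
    JOIN challenges AS cha ON cha.id = vot.challenge_id
    WHERE cha.status = 'expired'
    GROUP BY vot.challenge_id, vot.voted_for
    HAVING COUNT(*) > 50
) AS won ON won.voted_for = usr.id
GROUP BY usr.id, usr.username
ORDER BY challenges_won DESC
LIMIT 5
"""

leaderboard = conn.execute(LEADERBOARD_SQL).fetchall()
print(leaderboard)  # [(1, 'alice', 2), (2, 'bob', 1)]
```

Users with no wins (carol here) are left out because of the inner join; switching to a `LEFT JOIN` and counting `won.challenge_id` would list them with zero.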

How to use MYSQL LIMIT per person

So I have a scenario where there are 1-8 people, and for each person I need to query up to 3 things they "liked". I have the query set up as
SELECT liked FROM likeTable WHERE uid IN (uid1,uid2,uid3,uid4) LIMIT 12
but obviously this can potentially stop when I have 12 "likes" for uid1, leaving the rest at 0. I read about a possible solution using UNION ALL, for example...
(SELECT liked FROM likeTable WHERE uid = uid1 LIMIT 3)
UNION ALL
(SELECT liked FROM likeTable WHERE uid = uid2 LIMIT 3)
UNION ALL
(SELECT liked FROM likeTable WHERE uid = uid3 LIMIT 3)
UNION ALL
(SELECT liked FROM likeTable WHERE uid = uid4 LIMIT 3)
I would be able to achieve this by building the SQL query string in PHP with a for loop, but is this an efficient way of querying my data?
Note: I don't really care about the order in which the "liked" values are retrieved, although it would be nice if I could add an ORDER BY likeID DESC, which is my auto-incrementing column.
Thanks!
For a small number of categories (users), say not hundreds, just use the UNION solution.
As for your question of whether building the SQL query string in PHP with a for loop is an efficient way of querying your data:
Definitely yes, unless you query hundreds of users in one go. For 1-8 people, it is a perfectly good solution!
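The loop that assembles the UNION ALL string might look like the sketch below. It is written in Python rather than PHP purely for illustration; the function name `build_likes_query` is made up, the extra `uid` column is added so each row can be attributed to a person, and the `%s` placeholders assume a driver that uses that parameter style (as the common Python MySQL drivers do). The user ids are passed as parameters instead of being pasted into the string.

```python
def build_likes_query(uids, per_user=3):
    """Return (sql, params) fetching up to `per_user` newest likes per uid."""
    if not uids:
        raise ValueError("at least one uid is required")
    # The limit is forced to an int and inlined; only the uids are parameters.
    branch = (
        "(SELECT uid, liked FROM likeTable WHERE uid = %s "
        f"ORDER BY likeID DESC LIMIT {int(per_user)})"
    )
    sql = "\nUNION ALL\n".join([branch] * len(uids))
    return sql, list(uids)


sql, params = build_likes_query([11, 12, 13])
print(sql)
print(params)
```

The returned pair would then be handed to the driver's execute call, for example `cursor.execute(sql, params)`. Each branch carries its own ORDER BY and LIMIT inside parentheses, so every person gets at most three rows regardless of how many likes the others have.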

mysql intersection, comparison, opposite of UNION?

I'm trying to compare two sets of results, and I'm having a hard time understanding how subqueries work and whether they are efficient. I'm not going to explain all my tables, but just imagine I have a pair of arrays... I could do it in PHP, but I wonder if I can do it in MySQL directly...
This is my query to check how many items user 1 has in lists he owns:
SELECT DISTINCT *
FROM list_tb
INNER JOIN item_to_list_tb
ON list_tb.list_id = item_to_list_tb.list_id
WHERE list_tb.user_id = 1
ORDER BY item_to_list_tb.item_id DESC
this is my query to check how many items user 2 has in lists he owns
SELECT DISTINCT *
FROM list_tb
INNER JOIN item_to_list_tb
ON list_tb.list_id = item_to_list_tb.list_id
WHERE list_tb.user_id = 2
ORDER BY item_to_list_tb.item_id DESC
Now the problem is that I would like to intersect those results to check how many item_id values they have in common...
thanks!!!
Unfortunately, MySQL did not support the INTERSECT operator until version 8.0.31. However, one way to accomplish that goal on any version is to exclude List_Tb.User_Id from your Select and Group By and then count the distinct User_Id values:
Select ... -- everything except List_Tb.User_Id
From List_Tb
Inner Join Item_To_List_Tb
On List_Tb.List_Id = Item_To_List_Tb.List_Id
Where List_Tb.User_Id In(1,2)
Group By ... -- everything except List_Tb.User_Id
Having Count( Distinct List_Tb.User_Id ) = 2
Order By item_to_list_tb.item_id Desc
Obviously you would replace the ellipses with the actual columns you want to return and on which you wish to group.
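With the ellipses filled in for the simplest case (returning only `item_id`), the GROUP BY / HAVING pattern can be tried out as below. This is a sketch against in-memory SQLite rather than MySQL, with invented list and item ids; the real tables presumably have more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE list_tb (list_id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE item_to_list_tb (list_id INTEGER, item_id INTEGER);

-- user 1 owns lists 1 and 2, user 2 owns list 3
INSERT INTO list_tb VALUES (1, 1), (2, 1), (3, 2);

INSERT INTO item_to_list_tb VALUES
    (1, 100), (1, 101),
    (2, 101), (2, 102),
    (3, 101), (3, 102), (3, 103);
""")

COMMON_SQL = """
SELECT item_to_list_tb.item_id
FROM list_tb
INNER JOIN item_to_list_tb
        ON list_tb.list_id = item_to_list_tb.list_id
WHERE list_tb.user_id IN (1, 2)
GROUP BY item_to_list_tb.item_id
-- an item is shared only if both users contributed at least one row
HAVING COUNT(DISTINCT list_tb.user_id) = 2
ORDER BY item_to_list_tb.item_id DESC
"""

common = [item_id for (item_id,) in conn.execute(COMMON_SQL)]
print(common)       # [102, 101]
print(len(common))  # 2 items in common
```

`COUNT(DISTINCT ...)` matters here: item 101 sits in two of user 1's lists, and a plain `COUNT(*)` would count it three times without saying anything about how many different users have it.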

Combining queries in MySQL

I know you can combine multiple table-selects using a Join statement but is there a way to combine these two queries into one?
SELECT Statistics.StatisticID
FROM Statistics
ORDER BY `SubmittedTime` DESC
LIMIT 0, 10
And this one?
SELECT COUNT(Votes.StatisticID)
FROM Votes
WHERE Votes.StatisticID = ?
(fluff removed)
At the moment I've achieved something that nearly works.
SELECT Statistics.StatisticID, COUNT(Score.StatisticID) AS Votes
FROM `Statistics`
LEFT JOIN `Votes` AS `Score` ON `Statistics`.`StatisticID` = `Score`.`StatisticID`
ORDER BY `SubmittedTime` DESC
LIMIT 0, 10
The Votes table is a simple StatID, UserID joiner. In my test case it contains 3 rows, two with StatID 5 - 1 with StatID 2.
My query will work if I add a WHERE clause, for example WHERE StatisticID = 5 and correctly return 2 votes. However if I remove the WHERE clause I still get one row returned and 3 votes.
Is it possible to combine these queries or will I have to run the second for each result (obviously not preferable)?
Assuming that you want to count the number of votes per statistic:
SELECT StatisticID, COUNT(*) AS CountVotes
FROM `Votes`
GROUP BY StatisticID
I'm not seeing the reason why the tables have to be joined.
[EDIT] Ah...I see you want to order by submittedtime of the statistics table.
SELECT Votes.StatisticID, COUNT(*) AS CountVotes
FROM `Votes` JOIN Statistics ON Votes.StatisticID = Statistics.StatisticID
GROUP BY Statistics.StatisticID
ORDER BY Statistics.SubmittedTime
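The single row with 3 votes in the question comes from using COUNT without GROUP BY, which collapses the whole result into one group. A variant that keeps the asker's LEFT JOIN, so that statistics without any votes still show up with a count of 0, might look like the sketch below. It runs on in-memory SQLite rather than MySQL; the timestamps, the extra statistic 7 and the `VoteCount` alias are made up, while the three vote rows mirror the test case described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Statistics (StatisticID INTEGER PRIMARY KEY, SubmittedTime TEXT);
CREATE TABLE Votes (StatisticID INTEGER, UserID INTEGER);

INSERT INTO Statistics VALUES
    (2, '2024-01-01 10:00:00'),
    (5, '2024-01-02 10:00:00'),
    (7, '2024-01-03 10:00:00');   -- hypothetical statistic with no votes

-- two votes for statistic 5, one for statistic 2, as in the question
INSERT INTO Votes VALUES (5, 1), (5, 2), (2, 1);
""")

COUNT_SQL = """
SELECT s.StatisticID, COUNT(v.StatisticID) AS VoteCount
FROM Statistics AS s
LEFT JOIN Votes AS v ON v.StatisticID = s.StatisticID
GROUP BY s.StatisticID          -- one result row per statistic
ORDER BY s.SubmittedTime DESC
LIMIT 10
"""

rows = conn.execute(COUNT_SQL).fetchall()
print(rows)  # [(7, 0), (5, 2), (2, 1)]
```

Counting `v.StatisticID` instead of `*` is what yields 0 for statistic 7: the LEFT JOIN produces one row with NULL vote columns for it, and COUNT skips NULLs.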