Say I have a table of ratings:
create table ratings (
user_id int unsigned not null,
post_id int unsigned not null,
rating set('like', 'dislike') not null,
primary key (user_id, post_id)
);
And a given user with id 1, how can I select the user with the most likes in common? And the user with the most dislikes in common? And the user with the most ratings (likes or dislikes) in common? I guess the queries would be very similar, but I can't figure any of them out yet. I'll update with any progress I make.
Any help is appreciated, thanks!
select
r1.user_id as user1
,r2.user_id as user2
,r1.rating as rating
,count(*) as num_matching_ratings
from
ratings r1
inner join ratings r2
on r1.post_id = r2.post_id
and r1.rating = r2.rating
and r1.user_id <> r2.user_id -- don't want to count
-- matches with self
where
r1.user_id = 1 -- change this to any user, or use a
-- variable to increase reusability
and r1.rating = 'like' -- set this to 'dislike' to get common dislikes
group by
r1.user_id
,r2.user_id
,r1.rating
having
count(*) > 1 -- show only those with more than 1 in common
order by
count(*) desc
/* limit 1 -- uncomment to show just the top match */
By joining the table to itself, we can count the number of occurrences where the second user has given a post the same rating. This query returns the matches ordered from the most in common to the least. If you uncomment the "limit 1" statement, it will only return the match with the most in common.
Give this a try:
select r2.user_id from (
select post_id, rating from ratings,
(select @userId := 2) init
where user_id = @userId
) as r1
join ratings r2
on r1.post_id = r2.post_id and r1.rating = r2.rating
where r2.user_id != @userId and r2.rating = 'like'
group by r2.user_id
order by count(*) desc
limit 1
It should work for likes and dislikes by changing the string. To change the user, just modify the variable assignment.
The following should work for both dislikes and likes in common (just by removing the filtering condition):
select r2.user_id from (
select post_id, rating from ratings,
(select @userId := 2) init
where user_id = @userId
) as r1
join ratings r2
on r1.post_id = r2.post_id and r1.rating = r2.rating
where r2.user_id != @userId
group by r2.user_id
order by count(*) desc
limit 1
Pardon my syntax, I don't write raw SQL very often. You can consider this pseudocode.
First, I'd get the rows where user_id is 1:
view1 = select * from ratings where user_id = 1
Then I'd join it with ratings, skipping user 1's own rows:
view2 = select ratings.* from view1 join ratings on view1.rating = ratings.rating and view1.post_id = ratings.post_id where ratings.user_id <> 1
Then I'd aggregate by count:
view3 = select user_id, count(*) from view2 group by user_id
And then I'd get the max of that.
Now, that's only an algorithmic overview of what my first thoughts would be. I don't think it would be particularly efficient, and you probably wouldn't use that syntax.
Building on Chris's and Mostacho's answers, I made the following query. I'm not 100% sure that it works every time, but I haven't found a flaw yet.
select r2.user_id
from ratings r1
join ratings r2
on r1.user_id <> r2.user_id
and r1.post_id = r2.post_id
and r1.rating = r2.rating
where r1.user_id = 1
and r1.rating = 'like'
group by r2.user_id
order by count(r2.user_id) desc
limit 1
This query returns the id of the user with the most likes in common with user 1. To fetch the user with the most ratings in common, just remove and r1.rating = 'like' from the where clause.
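For anyone who wants to try the self-join without a MySQL server, here is a minimal sketch in Python using the standard sqlite3 module. It is an illustration under assumptions, not the exact query from any answer: SQLite has no set type, so rating is stored as text with a check constraint, the helper name most_in_common and the sample rows are made up, and ties are broken by the lowest user id.

```python
import sqlite3

def most_in_common(conn, user_id, rating=None):
    """Return (other_user_id, shared_count) for the best match, or None.

    rating may be 'like', 'dislike', or None to count any matching rating.
    most_in_common is a hypothetical helper, not part of the original answers.
    """
    sql = """
        SELECT r2.user_id, COUNT(*) AS shared
        FROM ratings r1
        JOIN ratings r2
          ON r1.post_id = r2.post_id
         AND r1.rating = r2.rating
         AND r1.user_id <> r2.user_id          -- never match a user with themselves
        WHERE r1.user_id = :uid
          AND (:rating IS NULL OR r1.rating = :rating)
        GROUP BY r2.user_id
        ORDER BY shared DESC, r2.user_id ASC   -- assumed tie-break: lowest id wins
        LIMIT 1
    """
    return conn.execute(sql, {"uid": user_id, "rating": rating}).fetchone()

conn = sqlite3.connect(":memory:")
# SQLite stand-in for the MySQL table: TEXT plus CHECK replaces set('like', 'dislike').
conn.execute("""
    CREATE TABLE ratings (
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        rating  TEXT NOT NULL CHECK (rating IN ('like', 'dislike')),
        PRIMARY KEY (user_id, post_id)
    )
""")
# Made-up sample data.
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", [
    (1, 10, "like"), (1, 11, "like"), (1, 12, "dislike"), (1, 13, "dislike"),
    (2, 10, "like"), (2, 11, "like"), (2, 12, "like"),
    (3, 10, "like"), (3, 12, "dislike"), (3, 13, "dislike"),
])

print(most_in_common(conn, 1, "like"))
print(most_in_common(conn, 1, "dislike"))
print(most_in_common(conn, 1))
```

With this sample data, user 2 shares the most likes with user 1, while user 3 shares the most dislikes and the most ratings overall, so the three questions can have different answers.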
Related
I have a MySQL query which I want to execute to see who is the employee with the best skill X in a company I work for. To do this I randomly pick a company from my cv_profile (skill_cv_test) and find all users who work there for the same employer. And then I randomly choose a skill I have.
The result should either be zero or a list.
But when testing with phpMyAdmin I get results where I don't see any rows, even though the status says there is at least one row.
Here's an example of the message I get: https://imgur.com/bVMH716
I have been trying different structures, even "walling" the query with another query, different joins.
SELECT
DISTINCT(sv.usr_id),
u.first_name AS fn,
u.last_name AS ln,
c.name AS company,
s.name AS skill
FROM
(
SELECT
MAX(last_change) as date,
id,
usr_id,
skill_id
FROM skill_valuations
GROUP BY usr_id, skill_id
ORDER BY date
) sv
LEFT JOIN skill_valuations skv ON skv.last_change = sv.date
INNER JOIN
(
SELECT
DISTINCT(skct.comp_id),
skct.usr_id AS usr_id,
skct.category
FROM skill_cv_test skct
WHERE skct.end_date IS NULL AND skct.comp_id IN (SELECT comp_id FROM (SELECT comp_id FROM skill_cv_test WHERE usr_id = 1 ORDER BY RAND() LIMIT 1) x)
) uqv ON uqv.usr_id = sv.usr_id
INNER JOIN
(
SELECT skill_id
FROM usr_skills
WHERE usr_id = $uid
ORDER BY RAND()
LIMIT 1
) usq ON usq.skill_id = sv.skill_id
LEFT JOIN companies c ON c.id = uqv.comp_id
LEFT JOIN skills s ON s.id = sv.skill_id
LEFT JOIN users u ON u.id = sv.usr_id
As mentioned before, I expect either no results or a result of at least one row.
In my MySQL database I have these tables:
I want to select count of users who only own birds and no other pet.
So far I've come up with this:
SELECT COUNT(DISTINCT(user_id)) FROM users_pets_map WHERE pet_id IN (SELECT id FROM pets WHERE animal = 'bird')
but it doesn't satisfy the requirement of not owning other animals.
You can use aggregation:
select m.user_id, count(*)
from users_pets_map m inner join
pets p
on p.id = m.pet_id
group by m.user_id
having sum( p.animal <> 'bird' ) = 0;
Alternatively, you can do:
select m.user_id, count(*)
from users_pets_map m inner join
pets p
on p.id = m.pet_id
group by m.user_id
having min(p.animal) = max(p.animal) and min(p.animal) = 'bird';
EDIT: If you only want the count of users, you can do:
select count(distinct m.user_id)
from users_pets_map m
-- 3 is assumed to be the id of the only 'bird' row in pets
where not exists (select 1 from users_pets_map m1 where m1.user_id = m.user_id and m1.pet_id <> 3);
You can modify your query as below:
SELECT COUNT(DISTINCT(user_id)) FROM users_pets_map WHERE pet_id IN (SELECT id
FROM pets WHERE animal = 'bird') AND user_id NOT IN (SELECT user_id FROM
users_pets_map WHERE pet_id IN (SELECT id FROM pets WHERE animal <> 'bird'))
The innermost sub-query fetches the pet ids that are not birds, and the sub-query around it fetches the users who own any of those pets. Combined with your current query through NOT IN, the result is the count of users who own a bird and no other animal. This is not the most efficient solution, but it is one of many that work and it is easy to understand.
You can use GROUP BY and HAVING:
SELECT COUNT(*) FROM (
SELECT m.user_id FROM users_pets_map m
JOIN pets p ON p.id = m.pet_id
GROUP BY m.user_id
HAVING COUNT(DISTINCT p.animal) = 1 AND MIN(p.animal) = 'bird'
) bird_only
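To check the "only birds" logic on a small data set, the sketch below runs the aggregation approach against an in-memory SQLite database from Python. The table definitions are assumptions, since the original tables were not shown: pets(id, animal) and users_pets_map(user_id, pet_id). The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and made-up rows; the question's real tables are not shown.
conn.executescript("""
    CREATE TABLE pets (id INTEGER PRIMARY KEY, animal TEXT NOT NULL);
    CREATE TABLE users_pets_map (user_id INTEGER NOT NULL, pet_id INTEGER NOT NULL);
    INSERT INTO pets VALUES (1, 'dog'), (2, 'cat'), (3, 'bird'), (4, 'bird');
    INSERT INTO users_pets_map VALUES
        (1, 3),          -- user 1: one bird
        (2, 3), (2, 4),  -- user 2: two birds
        (3, 3), (3, 1),  -- user 3: a bird and a dog
        (4, 2);          -- user 4: a cat only
""")

# Inner query keeps users with zero non-bird pets; outer query counts them.
BIRD_ONLY_COUNT = """
    SELECT COUNT(*) FROM (
        SELECT m.user_id
        FROM users_pets_map m
        JOIN pets p ON p.id = m.pet_id
        GROUP BY m.user_id
        HAVING SUM(p.animal <> 'bird') = 0
    ) AS bird_only
"""
bird_only_users = conn.execute(BIRD_ONLY_COUNT).fetchone()[0]
print(bird_only_users)
```

Users 1 and 2 qualify here. A user with no pets never appears in users_pets_map, so the inner join already excludes them.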
One table is Users with id and email columns.
Another table is Payments with id, created_at, user_id and foo columns.
User has many Payments.
I need a query that returns each user's email, his last payment date and this last payment's foo value. How do I do that? What I have now is:
SELECT users.email, MAX(payments.created_at), payments.foo
FROM users
JOIN payments ON payments.user_id = users.id
GROUP BY users.id
This is wrong, because foo value does not necessarily belong to user's most recent payment.
Try this:
select users.email, payments.foo, payments.created_at
from users
left join (
select a.* from payments a
inner join (
-- latest payment date per user
select user_id, max(created_at) as created_at
from payments
group by user_id
) b on a.user_id = b.user_id and a.created_at = b.created_at
) payments on users.id = payments.user_id
If a user has no payment yet, foo and created_at will be NULL. If you want to exclude users who have no payments, use INNER JOIN instead.
One approach would be to use a MySQL version of rank over partition and then select only those rows with rank = 1:
select tt.email, tt.created_at, tt.foo from (
select t.*,
-- the alias is rnk because RANK is a reserved word in MySQL 8
case when @cur_id = t.id then @r := @r + 1 else @r := 1 end as rnk,
@cur_id := t.id as cur_id
from (
SELECT users.id, users.email, payments.created_at, payments.foo
FROM users
JOIN payments ON payments.user_id = users.id
order by users.id asc, payments.created_at desc
) t
JOIN (select @cur_id := -1, @r := 0) r
) tt
where tt.rnk = 1;
This would save hitting the payments table twice. Could be slower though. Depends on your data!
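A third way to express "latest payment per user" is a correlated subquery on the join condition. This is a different technique from the two answers above, shown here only as a sketch: it runs on SQLite through Python's sqlite3 module, stores created_at as ISO date text, and uses made-up rows. If a user has two payments with the same latest created_at, both rows come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column types and made-up rows; only the column names come from the question.
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE payments (
        id INTEGER PRIMARY KEY,
        created_at TEXT NOT NULL,
        user_id INTEGER NOT NULL,
        foo TEXT
    );
    INSERT INTO users VALUES
        (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
    INSERT INTO payments VALUES
        (1, '2020-01-01', 1, 'old'),
        (2, '2020-03-01', 1, 'new'),
        (3, '2020-02-01', 2, 'only');
""")

# For each user, join only the payment whose date equals that user's latest date.
LATEST_PAYMENT = """
    SELECT u.email, p.created_at, p.foo
    FROM users u
    LEFT JOIN payments p
      ON p.user_id = u.id
     AND p.created_at = (SELECT MAX(p2.created_at)
                         FROM payments p2
                         WHERE p2.user_id = u.id)
    ORDER BY u.id
"""
latest = conn.execute(LATEST_PAYMENT).fetchall()
for row in latest:
    print(row)
```

The LEFT JOIN keeps users without payments, with NULL (None in Python) for created_at and foo.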
I have a SQL query which does most of what I need it to do but I'm running into a problem.
There are 3 tables in total. entries, entry_meta and votes.
I need to get an entire row from entries when competition_id = 420 in the entry_meta table and the ID either doesn't exist in votes or it does exist but the user_id column value isn't 1.
Here's the query I'm using:
SELECT entries.* FROM entries
INNER JOIN entry_meta ON (entries.ID = entry_meta.entry_id)
WHERE 1=1
AND ( ( entry_meta.meta_key = 'competition_id' AND CAST(entry_meta.meta_value AS CHAR) = '420') )
GROUP BY entries.ID
ORDER BY entries.submission_date DESC
LIMIT 0, 25;
The votes table has 4 columns. vote_id, entry_id, user_id, value.
One option I was thinking of was to SELECT entry_id FROM votes WHERE user_id = 1 and include it in an AND clause in my query. Is this acceptable/efficient?
E.g.
AND entries.ID NOT IN (SELECT entry_id FROM votes WHERE user_id = 1)
A left join with an appropriate where clause might be useful:
SELECT
entries.*
FROM
entries
INNER JOIN entry_meta ON (entries.ID = entry_meta.entry_id)
LEFT JOIN votes ON entries.ID = votes.entry_id
AND votes.user_id = 1 -- only look for votes cast by user 1
WHERE 1=1
AND (
entry_meta.meta_key = 'competition_id'
AND CAST(entry_meta.meta_value AS CHAR) = '420'
AND votes.entry_id IS NULL -- keep only entries user 1 has not voted on
)
GROUP BY entries.ID
ORDER BY entries.submission_date DESC
Here's an implementation of Andrew's suggestion to use exists / not exists.
select
e.*
from
entries e
join entry_meta em on e.ID = em.entry_id
where
em.meta_key = 'competition_id'
and cast(em.meta_value as char) = '420'
and (
not exists (
select 1
from votes v
where
v.entry_id = e.ID
)
or exists (
select 1
from votes v
where
v.entry_id = e.ID
and v.user_id != 1
)
)
group by e.ID
order by e.submission_date desc
limit 0, 25;
Note: it's generally not a good idea to put a function inside a where clause (for performance reasons), but since you're also joining on IDs you should be OK.
Also, the left join suggestion by Barranka may cause the query to return more rows than you are expecting (assuming that there is a 1:many relationship between entries and votes).
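The requirement can be read two ways, and they give different results once an entry has votes from user 1 and from someone else. The sketch below compares both readings on SQLite via Python's sqlite3 module. The schema types, the helper entry_ids, and the sample rows are assumptions made for illustration; meta_value is stored as text, so the cast is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column types and made-up rows.
conn.executescript("""
    CREATE TABLE entries (ID INTEGER PRIMARY KEY, submission_date TEXT NOT NULL);
    CREATE TABLE entry_meta (entry_id INTEGER, meta_key TEXT, meta_value TEXT);
    CREATE TABLE votes (vote_id INTEGER PRIMARY KEY, entry_id INTEGER,
                        user_id INTEGER, value INTEGER);
    INSERT INTO entries VALUES
        (1, '2021-01-01'), (2, '2021-01-02'), (3, '2021-01-03'), (4, '2021-01-04');
    INSERT INTO entry_meta VALUES
        (1, 'competition_id', '420'), (2, 'competition_id', '420'),
        (3, 'competition_id', '420'), (4, 'competition_id', '7');
    INSERT INTO votes (entry_id, user_id, value) VALUES
        (2, 2, 1),             -- entry 2: voted on by user 2 only
        (3, 1, 1), (3, 2, 1);  -- entry 3: voted on by users 1 and 2
""")

BASE = """
    SELECT e.ID
    FROM entries e
    JOIN entry_meta em ON em.entry_id = e.ID
    WHERE em.meta_key = 'competition_id' AND em.meta_value = '420'
      AND {vote_filter}
    ORDER BY e.submission_date DESC
"""
# Reading 1: the given user has not voted on the entry (same as the NOT IN idea).
NOT_VOTED_BY_USER = (
    "NOT EXISTS (SELECT 1 FROM votes v WHERE v.entry_id = e.ID AND v.user_id = ?)"
)
# Reading 2: no votes at all, or at least one vote by somebody else.
LITERAL = """(NOT EXISTS (SELECT 1 FROM votes v WHERE v.entry_id = e.ID)
              OR EXISTS (SELECT 1 FROM votes v
                         WHERE v.entry_id = e.ID AND v.user_id <> ?))"""

def entry_ids(vote_filter, user_id):
    """Hypothetical helper: run the base query with one of the two filters."""
    sql = BASE.format(vote_filter=vote_filter)  # only constant SQL is formatted in
    return [row[0] for row in conn.execute(sql, (user_id,))]

not_voted = entry_ids(NOT_VOTED_BY_USER, 1)
literal = entry_ids(LITERAL, 1)
print(not_voted, literal)
```

Entry 3 is the difference: user 1 has voted on it, but so has user 2, so the second reading still returns it. If the goal is "entries user 1 has not voted on yet", the first filter is probably the one to use.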
LEFT JOIN
(
SELECT user_id, review, COUNT(user_id) totalCount
FROM reviews
GROUP BY user_id
) b ON b.user_id = u.user_id
I am trying to fit WHERE LENGTH(review) > 100 in this somewhere, but everywhere I put it, it gives me problems.
The sub-query above counts all reviews per user_id. I simply want to add one more condition: only count reviews longer than 100 characters.
On a side note, I've seen the function CHAR_LENGTH, but I'm not sure if that is what I need either.
EDIT:
Here is complete query working perfectly as expected for my needs:
static public $top_users = "
SELECT u.username, u.score,
(COALESCE(a.totalCount, 0) * 4) +
(COALESCE(b.totalCount, 0) * 5) +
(COALESCE(c.totalCount, 0) * 1) +
(COALESCE(d.totalCount, 0) * 2) +
(COALESCE(u.friend_points, 0)) AS totalScore
FROM users u
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM items
GROUP BY user_id
) a ON a.user_id= u.user_id
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM reviews
GROUP BY user_id
) b ON b.user_id= u.user_id
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM ratings
GROUP BY user_id
) c ON c.user_id = u.user_id
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM comments
GROUP BY user_id
) d ON d.user_id = u.user_id
ORDER BY totalScore DESC LIMIT 25;";
LENGTH() returns the length of the string measured in bytes. You probably want CHAR_LENGTH() as it will give you the actual characters.
SELECT user_id, COUNT(user_id) totalCount
FROM reviews
WHERE CHAR_LENGTH(review) > 100
GROUP BY user_id
You're also not using GROUP BY correctly: your sub-query selects review without grouping by it or aggregating it.
See the documentation
The query that you want is:
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount,
sum(case when length(review) > 100 then 1 else 0 end
) as NumLongReviews
FROM reviews
GROUP BY user_id
) b ON b.user_id = u.user_id
This counts both all reviews and the "long" reviews. The second count is done with a CASE expression nested in a sum() function.
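The conditional count can be checked on a tiny data set. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module, the users and reviews tables are reduced to the columns needed, and the rows are made up. In SQLite, LENGTH() on a text value counts characters, which corresponds to CHAR_LENGTH() in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, assumed schema with made-up rows.
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE reviews (user_id INTEGER NOT NULL, review TEXT NOT NULL);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
""")
conn.executemany("INSERT INTO reviews VALUES (?, ?)", [
    (1, "x" * 101),  # 101 characters: counts as long
    (1, "short"),
    (2, "y" * 100),  # exactly 100 characters: not longer than 100
])

# One pass over reviews yields both the total and the long-review count per user.
REVIEW_COUNTS = """
    SELECT u.username,
           COALESCE(b.totalCount, 0)     AS totalCount,
           COALESCE(b.numLongReviews, 0) AS numLongReviews
    FROM users u
    LEFT JOIN (
        SELECT user_id,
               COUNT(*) AS totalCount,
               SUM(CASE WHEN LENGTH(review) > 100 THEN 1 ELSE 0 END) AS numLongReviews
        FROM reviews
        GROUP BY user_id
    ) b ON b.user_id = u.user_id
    ORDER BY u.user_id
"""
review_counts = conn.execute(REVIEW_COUNTS).fetchall()
for row in review_counts:
    print(row)
```

Compared with a WHERE filter inside the sub-query, this keeps totalCount available, so the scoring formula can weight all reviews and long reviews separately if needed.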