Find users with most number of common pages they liked - mysql

I am trying to find pairs of users who liked the same pages and list the ones who have the most common page likes at the top.
For simplicity I am considering the following table schema
Likes (LikeID, UserID)
LikeDetail (LikeDetailID, LikeID, PageID)
I am trying to find pairs of users with most number of common page likes ordered descending. E.g User1 and User2 have liked 3 pages in common.
I would like the result set of the query to be
UserID1 UserID2 NoOfCommonLikes
2 3 10
4 3 8
1 5 4
I am guessing it needs aggregation, a join, and aliases; however, I needed to alias the same table twice using AS, which did not work for me.
Any tip would be appreciated in MySQL, or SQL Server.

In SQL Server and MySQL 8+, you can use a CTE that JOINs the Likes and LikeDetail tables, self-JOIN it where PageID is the same but UserID differs, and then group on the two UserID values:
WITH CTE AS
(SELECT l.UserId, d.PageID
FROM Likes l
JOIN LikeDetail d ON d.LikeID = l.likeID)
SELECT l1.UserId AS UserID1, l2.UserID AS UserID2, COUNT(*) AS NoOfCommonLikes
FROM CTE l1
JOIN CTE l2 ON l2.PageID = l1.PageID AND l2.UserID < l1.UserID
GROUP BY l1.UserID, l2.UserID
ORDER BY COUNT(*) DESC
In versions of MySQL prior to 8.0, CTEs are not available, so you need to repeat the CTE definition as a derived table on both sides of the JOIN to achieve the same result:
SELECT l1.UserId AS UserID1, l2.UserID AS UserID2, COUNT(*) AS NoOfCommonLikes
FROM (SELECT l.UserId, d.PageID
FROM Likes l
JOIN LikeDetail d ON d.LikeID = l.likeID) l1
JOIN (SELECT l.UserId, d.PageID
FROM Likes l
JOIN LikeDetail d ON d.LikeID = l.likeID) l2 ON l2.PageID = l1.PageID AND l2.UserID < l1.UserID
GROUP BY l1.UserID, l2.UserID
ORDER BY COUNT(*) DESC
Note that we use < in the UserID comparison rather than != to avoid getting duplicate rows (e.g. for (UserID1, UserID2) = (1, 2) and (UserID1, UserID2) = (2, 1)).
I've made a small demo on dbfiddle which demonstrates the queries.
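For readers without a MySQL or SQL Server instance at hand, the CTE query can be tried with a minimal sketch like the one below. It uses Python's built-in sqlite3 module as a stand-in engine, and the sample rows are hypothetical (one Likes row per user, no page liked twice by the same user); the secondary sort keys are added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Likes (LikeID INTEGER PRIMARY KEY, UserID INTEGER);
    CREATE TABLE LikeDetail (LikeDetailID INTEGER PRIMARY KEY, LikeID INTEGER, PageID INTEGER);
    -- Hypothetical sample data: one Likes row per user.
    INSERT INTO Likes VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO LikeDetail (LikeID, PageID) VALUES
        (1, 10), (1, 20), (1, 30),   -- user 1 likes pages 10, 20, 30
        (2, 10), (2, 20), (2, 30),   -- user 2 likes pages 10, 20, 30
        (3, 10), (3, 40);            -- user 3 likes pages 10, 40
""")

query = """
WITH CTE AS (
    SELECT l.UserID, d.PageID
    FROM Likes l
    JOIN LikeDetail d ON d.LikeID = l.LikeID
)
SELECT l1.UserID AS UserID1, l2.UserID AS UserID2, COUNT(*) AS NoOfCommonLikes
FROM CTE l1
JOIN CTE l2 ON l2.PageID = l1.PageID AND l2.UserID < l1.UserID
GROUP BY l1.UserID, l2.UserID
ORDER BY NoOfCommonLikes DESC, UserID1, UserID2
"""
common_likes = conn.execute(query).fetchall()
for row in common_likes:
    print(row)
```

If the same user could like the same page more than once, COUNT(DISTINCT l1.PageID) would be the safer aggregate.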

Related

Query to select random values with inner join on three tables

I have a database with three tables,
person: id, bio, name
book: id, id_person, title, info
file: id, id_book, location
Other information: Book is about ~50,000 rows, File is about ~ 300,000 rows.
What I'm trying to do is to select 12 different authors and select just one book and from that book select location from the table file.
What I tried is the following:
SELECT DISTINCT(`person`.`id`), `person`.`name`, `book`.`id`, `book`.`title`, `book`.`info`, `file`.`location`
FROM `person`
INNER JOIN `book`
ON `book`.`id_person` = `person`.`id`
INNER JOIN `file`
ON `file`.`id_book` = `book`.`id`
LIMIT 12
I have learned that DISTINCT does not work the way one might expect. Or am I missing something? The above code returns several books from the same author before moving on to the next one, which is NOT what I want. I want 1 book from each of 12 different authors.
What would be the correct way to retrieve this information from the database? Also, I want to retrieve 12 random people, not people that are stored in consecutive order in the database. I could not formulate any query with rand() since I couldn't even get different authors.
I use MariaDB, and I would appreciate any help, especially help that allows me to do this with great performance.
In MySQL, you can do this, in practice, using GROUP BY:
SELECT p.`id`, p.`name`, b.`id`, b.`title`, b.`info`, f.`location`
FROM `person` p INNER JOIN
`book` b
ON b.`id_person` = p.`id` INNER JOIN
`file` f
ON f.id_book = b.id
GROUP BY p.id
ORDER BY rand()
LIMIT 12;
However, this is not guaranteed to return the non-id values from the same row (although it does in practice), and it only runs when the ONLY_FULL_GROUP_BY SQL mode is disabled (MySQL enables that mode by default from 5.7.5). And, although the authors are random, the books and locations are not.
A query that does this consistently is a bit more complicated:
SELECT p.`id`, p.`name`, b.`id`, b.`title`, b.`info`,
(SELECT f.location
FROM file f
WHERE f.id_book = b.id
ORDER BY rand()
LIMIT 1
) as location
FROM (SELECT p.*,
(SELECT b.id
FROM book b
WHERE b.id_person = p.id
ORDER BY rand()
LIMIT 1
) as book_id
FROM person p
ORDER BY rand()
LIMIT 12
) p INNER JOIN
book b
ON b.id = p.book_id ;
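The shape of the consistent query can be checked with a small sketch such as the following. It runs on Python's built-in sqlite3 module rather than MariaDB, so rand() becomes SQLite's random(); the table contents are hypothetical (20 authors, 3 books each, 2 files per book), and every author is assumed to have at least one book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, id_person INTEGER, title TEXT);
    CREATE TABLE file (id INTEGER PRIMARY KEY, id_book INTEGER, location TEXT);
""")
# Hypothetical data: 20 authors, 3 books each, 2 files per book.
for person_id in range(1, 21):
    conn.execute("INSERT INTO person VALUES (?, ?)", (person_id, f"author {person_id}"))
    for n in range(3):
        book_id = person_id * 10 + n
        conn.execute("INSERT INTO book VALUES (?, ?, ?)",
                     (book_id, person_id, f"book {book_id}"))
        for m in range(2):
            conn.execute("INSERT INTO file (id_book, location) VALUES (?, ?)",
                         (book_id, f"/files/{book_id}/{m}"))

query = """
SELECT p.id, p.name, b.id, b.title,
       (SELECT f.location FROM file f
        WHERE f.id_book = b.id
        ORDER BY random() LIMIT 1) AS location
FROM (SELECT person.*,
             (SELECT book.id FROM book
              WHERE book.id_person = person.id
              ORDER BY random() LIMIT 1) AS book_id
      FROM person
      ORDER BY random()
      LIMIT 12) p
JOIN book b ON b.id = p.book_id
"""
# Each row: (author id, author name, book id, book title, file location)
picks = conn.execute(query).fetchall()
for row in picks:
    print(row)
```

The rows differ from run to run, but each run should yield 12 distinct authors, one of that author's books, and one file belonging to that book.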

MySQL "Distinct" join super slow

I have the following query which gives me the right results. But it's super slow.
What makes it slow is the
AND a.id IN (SELECT id FROM st_address GROUP BY element_id)
part. The query should show from which countries we get how many orders.
A person can have multiple addresses, but in this case, we only want one.
Otherwise the order will be counted multiple times. Maybe there is a better way to achieve this? A distinct join on the person or something?
SELECT cou.title_en, COUNT(co.id), SUM(co.price) AS amount
FROM customer_order co
JOIN st_person p ON (co.person_id = p.id)
JOIN st_address a ON (co.person_id = a.element_id AND a.element_type_id = 1)
JOIN st_country cou ON (a.country_id = cou.id)
WHERE order_status_id != 7 AND a.id IN (SELECT id FROM st_address GROUP BY element_id)
GROUP BY cou.id
Have you tried to replace the IN with an EXISTS?
AND EXISTS (SELECT 1 FROM st_address b WHERE a.id = b.id)
The EXISTS part should stop the subquery as soon as the first row matching the condition is found. I have read conflicting comments on whether this actually happens, though, so you might add a LIMIT 1 in there to see if you get any gain.
I found a faster solution. The trick is a join with a sub query:
JOIN (SELECT element_id, country_id, id FROM st_address WHERE element_type_id = 1 GROUP BY element_id) AS a ON (o.person_id = a.element_id)
This is the complete query:
SELECT cou.title_en, COUNT(o.id), SUM(o.price) AS amount
FROM customer_order o
JOIN (SELECT element_id, country_id, id FROM st_address WHERE element_type_id = 1 GROUP BY element_id) AS a ON (o.person_id = a.element_id)
JOIN st_country cou ON (a.country_id = cou.id)
WHERE o.order_status_id != 7
GROUP BY cou.id
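To see why the extra addresses inflate the totals and how a one-row-per-person derived table avoids it, here is a minimal sketch on Python's built-in sqlite3 module with hypothetical rows. Unlike the query above, it picks the address explicitly with MIN(id) instead of selecting non-grouped columns, which is an assumption about which address should win; any deterministic rule would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_order (id INTEGER PRIMARY KEY, person_id INTEGER,
                                 price REAL, order_status_id INTEGER);
    CREATE TABLE st_address (id INTEGER PRIMARY KEY, element_id INTEGER,
                             element_type_id INTEGER, country_id INTEGER);
    CREATE TABLE st_country (id INTEGER PRIMARY KEY, title_en TEXT);
    INSERT INTO st_country VALUES (1, 'Germany'), (2, 'France');
    -- Person 1 has two addresses, person 2 has one.
    INSERT INTO st_address VALUES (1, 1, 1, 1), (2, 1, 1, 1), (3, 2, 1, 2);
    INSERT INTO customer_order VALUES (1, 1, 10.0, 1), (2, 1, 20.0, 1), (3, 2, 5.0, 1);
""")

# Joining every address: each order of person 1 is counted twice.
naive = conn.execute("""
    SELECT cou.title_en, COUNT(o.id), SUM(o.price)
    FROM customer_order o
    JOIN st_address a ON a.element_id = o.person_id AND a.element_type_id = 1
    JOIN st_country cou ON cou.id = a.country_id
    WHERE o.order_status_id != 7
    GROUP BY cou.id, cou.title_en
    ORDER BY cou.id
""").fetchall()

# Joining exactly one address per person (the one with the lowest id).
deduplicated = conn.execute("""
    SELECT cou.title_en, COUNT(o.id), SUM(o.price)
    FROM customer_order o
    JOIN (SELECT element_id, MIN(id) AS id
          FROM st_address
          WHERE element_type_id = 1
          GROUP BY element_id) first_addr ON first_addr.element_id = o.person_id
    JOIN st_address a ON a.id = first_addr.id
    JOIN st_country cou ON cou.id = a.country_id
    WHERE o.order_status_id != 7
    GROUP BY cou.id, cou.title_en
    ORDER BY cou.id
""").fetchall()

print(naive)
print(deduplicated)
```

Because the derived table is computed once rather than per outer row, this form also tends to avoid the repeated work of the IN subquery, though actual timings depend on the engine and indexes.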

MySQL - COUNT and retrieve n rows from a subquery

Context:
I have an app that shows posts and comments on the home page.
My intention is to limit the number of posts shown (ie, 10 posts) and...
Limit the number of comments shown per post (ie, 2 comments).
Show the total number of comments in the front end (ie, "read all 10 comments")
MySQL:
(SELECT *
FROM (SELECT *
FROM post
ORDER BY post_timestamp DESC
LIMIT 0, 10) AS p
JOIN user_profiles
ON user_id = p.post_author_id
LEFT JOIN (SELECT *
FROM data
JOIN pts
ON pts_id = pts_id_fk) AS d
ON d.data_id = p.data_id_fk
LEFT JOIN (SELECT *
FROM comment
JOIN user_profiles
ON user_id = comment_author_id
ORDER BY comment_id ASC) AS c
ON p.post_id = c.post_id_fk))
I've failed to insert LIMIT and COUNT in this code to get what I want - any suggestions? - will be glad to post more info if needed.
If I'm understanding you correctly, you want no more than 10 posts (and 2 comments) to come back for each unique user in the returned result set.
This is very easy in SQL Server / Oracle / PostgreSQL using ROW_NUMBER() with a PARTITION BY clause.
Unfortunately, MySQL had no such function before version 8.0, which added window functions. A similar question has been asked here:
ROW_NUMBER() in MySQL
I'm sorry I can't offer a more specific solution for MySQL. It is definitely worth researching "row number partition by" equivalents for MySQL.
The essence of what this does:
You choose a set of columns that make up a unique set, say user id for example's sake (this is the "partition"). A "row number" column is then added to each row; it counts up within the partition and starts over when the partition changes.
This should illustrate:
user_id row_number
1 1
1 2
1 3
2 1
2 2
You can then add an outer query that says: select where row_number <= 10, which can be used in your case to limit the result to no more than 10 posts. The maximum row_number within a partition then gives the total needed for the "read all 10 comments" part.
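The partition idea described above can be sketched as follows. This is a minimal illustration on Python's built-in sqlite3 module (window functions need SQLite 3.25 or later; MySQL 8.0+ accepts the same syntax), with a hypothetical comment table partitioned by post rather than by user, and a COUNT(*) window used for the total instead of the maximum row number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comment (comment_id INTEGER PRIMARY KEY, post_id_fk INTEGER, body TEXT);
    -- Hypothetical data: post 100 has three comments, post 200 has one.
    INSERT INTO comment VALUES
        (1, 100, 'a'), (2, 100, 'b'), (3, 100, 'c'),
        (4, 200, 'd');
""")

query = """
SELECT post_id_fk, comment_id, total_comments
FROM (SELECT post_id_fk, comment_id,
             -- numbering restarts at 1 for every post
             ROW_NUMBER() OVER (PARTITION BY post_id_fk ORDER BY comment_id) AS rn,
             -- total per post, available on every row of that post
             COUNT(*) OVER (PARTITION BY post_id_fk) AS total_comments
      FROM comment) ranked
WHERE rn <= 2
ORDER BY post_id_fk, comment_id
"""
limited = conn.execute(query).fetchall()
for row in limited:
    print(row)
```

Each post keeps at most two comments, while total_comments still reports how many exist in all.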
Good luck!
This is the skeleton of the query you're looking for:
select * from (
select p1.id from posts p1
join posts p2 on p1.id <= p2.id
group by p1.id
having count(*) <= 3
order by p1.post_timestamp desc
) p left join (
select c1.id, c2.post_id from comments c1
join comments c2 on c1.id <= c2.id and c1.post_id = c2.post_id
group by c1.id
having count(*) <= 2
order by c1.comment_timestamp desc
) c
on p.id = c.post_id
It will get only the top 3 posts and join that result with the top 2 comments of each post, ordered by descending timestamp. Note that the ranking itself compares ids, so this assumes ids increase with time. Just change the column names and it will work :)
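The self-join ranking used in the skeleton can be tried in isolation with a sketch like this one, run on Python's built-in sqlite3 module with a hypothetical comments table. Each comment is joined to the comments of the same post with an equal or higher id, so COUNT(*) is its rank from the newest end; the sketch assumes ids increase with time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER);
    INSERT INTO comments VALUES (1, 100), (2, 100), (3, 100),
                                (4, 200), (5, 200), (6, 300);
""")

query = """
SELECT c1.post_id, c1.id
FROM comments c1
JOIN comments c2 ON c2.post_id = c1.post_id AND c1.id <= c2.id
GROUP BY c1.post_id, c1.id
HAVING COUNT(*) <= 2          -- keep only the two highest ids per post
ORDER BY c1.post_id, c1.id DESC
"""
newest_two = conn.execute(query).fetchall()
print(newest_two)
```

The join compares every comment with every later comment of the same post, so the cost grows quadratically with the number of comments per post.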

How can I use MySQL to COUNT with a LEFT JOIN?

How can I use MySQL to count with a LEFT JOIN?
I have two tables. Sometimes the Ratings table does not have ratings for a photo, so I thought a LEFT JOIN is needed, but I also have a COUNT statement.
Photos
id name src
1 car bmw.jpg
2 bike baracuda.jpg
Loves (picid is foreign key with photos id)
id picid ratersip
4 1 81.0.0.0
6 1 84.0.0.0
7 2 81.0.0.0
Here the user can only rate one image with their IP.
I want to combine the two tables in order of the highest rating. New table
Combined
id name src picid
1 car bmw.jpg 1
2 bike baracuda.jpg 2
(bmw is highest rated)
My MySQL code:
SELECT * FROM photos
LEFT JOIN ON photos.id=loves.picid
ORDER BY COUNT (picid);
My PHP Code: (UPDATED AND ADDED - Working Example...)
$sqlcount = "SELECT p . *
FROM `pics` p
LEFT JOIN (
SELECT `loves`.`picid`, count( 1 ) AS piccount
FROM `loves`
GROUP BY `loves`.`picid`
)l ON p.`id` = l.`picid`
ORDER BY coalesce( l.piccount, 0 ) DESC";
$pics = mysql_query($sqlcount);
MySQL allows you to group by just the id column:
select
p.*
from
photos p
left join loves l on
p.id = l.picid
group by
p.id
order by
count(l.picid) desc
That being said, GROUP BY can be slow in MySQL, so you can try putting the loves count in a subquery in your join to optimize it:
select
p.*
from
photos p
left join (select picid, count(1) as piccount from loves group by picid) l on
p.id = l.picid
order by
coalesce(l.piccount, 0) desc
I don't have a MySQL instance to test out which is faster, so test them both.
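The two approaches can be compared on the question's sample data with a minimal sketch like the following, run on Python's built-in sqlite3 module. The third photo is a hypothetical addition with no loves, included to show why the count should be taken over l.picid (NULL for unmatched rows) rather than with COUNT(*), which counts the unmatched row itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photos (id INTEGER PRIMARY KEY, name TEXT, src TEXT);
    CREATE TABLE loves (id INTEGER PRIMARY KEY, picid INTEGER, ratersip TEXT);
    INSERT INTO photos VALUES (1, 'car', 'bmw.jpg'), (2, 'bike', 'baracuda.jpg'),
                              (3, 'boat', 'boat.jpg');  -- hypothetical photo with no loves
    INSERT INTO loves VALUES (4, 1, '81.0.0.0'), (6, 1, '84.0.0.0'), (7, 2, '81.0.0.0');
""")

# Variant 1: LEFT JOIN then GROUP BY; COUNT(l.picid) skips NULLs, COUNT(*) does not.
grouped = conn.execute("""
    SELECT p.id, p.name, COUNT(l.picid) AS love_count, COUNT(*) AS row_count
    FROM photos p
    LEFT JOIN loves l ON l.picid = p.id
    GROUP BY p.id, p.name
    ORDER BY love_count DESC, p.id
""").fetchall()

# Variant 2: aggregate loves first, then LEFT JOIN the per-photo totals.
pre_aggregated = conn.execute("""
    SELECT p.id, p.name, COALESCE(l.piccount, 0) AS love_count
    FROM photos p
    LEFT JOIN (SELECT picid, COUNT(*) AS piccount FROM loves GROUP BY picid) l
           ON l.picid = p.id
    ORDER BY love_count DESC, p.id
""").fetchall()

print(grouped)
print(pre_aggregated)
```

Both variants rank the photos identically; only row_count reports 1 for the photo that nobody rated.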
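You need to use a subquery:
SELECT id, name, src FROM (
-- count(loves.picid) ignores the NULLs of unrated photos, so they get 0 instead of 1
SELECT photos.id, photos.name, photos.src, count(loves.picid) as the_count
FROM photos
LEFT JOIN loves ON photos.id = loves.picid
GROUP BY photos.id
) t
ORDER BY the_count DESC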
select
p.ID,
p.name,
p.src,
PreSum.LoveCount
from
Photos p
left join ( select L.picid,
count(*) as LoveCount
from
Loves L
group by
L.PicID ) PreSum
on p.id = PreSum.PicID
order by
PreSum.LoveCount DESC
I believe you just need to join the data and do a count(*) in your select. Make sure you specify which table you want to use for ambiguous columns. Also, don't forget to add a GROUP BY clause when you do a count(*). Here is an example query that I run on MS SQL.
Select CmsAgentInfo.LOGID, LOGNAME, hCmsAgent.SOURCEID, count(*) as COUNT from hCmsAgent
LEFT JOIN CmsAgentInfo on hCmsAgent.logid=CmsAgentInfo.logid
where SPLIT = '990'
GROUP BY CmsAgentInfo.LOGID, LOGNAME, hCmsAgent.SOURCEID
The example results from this will be something like this.
77615 SMITH, JANE 1 36
29422 DOE, JOHN 1 648
Hope that helps. Good Luck.

SELECTING from a previous record and using two primary ORDER BY(s)

users:
user_id user_name
---------------------
1 User A
2 User B
tracking:
user_id track
---------------------
1 no
2 no
applications:
user_id date_of_application date_ended grade status
---------------------------------------------------------------
1 2011-01-01 2011-02-28 1.0 Ended
1 2011-02-02 2011-03-28 1.0 Ended
1 2011-03-03 2011-04-28 (1.5) Ended
2 2011-01-01 2011-02-20 2.0 Ended
2 2011-02-02 2011-03-11 2.5 Ended
2 2011-03-03 2011-04-28 (1.0) Ended
1 2011-05-10 - - Pending
2 2011-05-15 - - Pending
note that the table can contain multiple records of the same user as long as all its previous applications have ended (status = ended)
user_id is not unique (applies to the applications table only)
date is in yyyy-mm-dd format
date_ended and grade are only updated the instant the application has ended
also, I understand that it probably is recommended for 'status' to have its own table, however I would prefer that the above tables are taken as is (minus the typos and significant errors of course)
What I want to accomplish here is to retrieve all rows WHERE status is 'Pending', with the grade column of each retrieved row set to that user's latest grade, i.e. the grade from the row with the latest date_ended where status is 'Ended' (shown in parentheses above).
Also, I would need to have the first 10 rows of the result to be ORDERed BY grade ASC. And have the succeeding rows after that (11th row up to the final row) to be ORDERed BY date_of_application ASC.
Clearly SQL queries aren't my strongest area, so I'm not sure if it's better (or only possible) to perform those ORDER BY(s) using 2 or more queries. I would, however, prefer this to be done using a single query.
The desired result:
user_id user_name date_of_application grade status track
--------------------------------------------------------------------
1 User A 2011-05-10 (1.5) Pending no
2 User B 2011-05-15 (1.0) Pending no
Working code I have so far on my end [minus the possible typos], (and listed are additions to be applied):
latest grade
ORDER BY grade (first 10), ORDER BY date_of_application (11th up to last row)
Query:
SELECT users.user_name,
t.track,
a.user_id,
a.date_of_application,
a.status,
(SELECT ae.grade
FROM applications AS ae
WHERE ae.status = 'Ended'
AND ae.user_id = a.user_id
LIMIT 1) AS grade
FROM users
JOIN applications AS a ON users.user_id = a.user_id
JOIN tracking AS t ON users.user_id = t.user_id
WHERE a.status = 'Pending'
ORDER BY grade ASC
You are probably trying to do too much in one query here.
Anyway, if you want something to hurt your eyes:
select a.* from
(
SELECT u.user_name,
a.user_id,
a.date_of_application,
td.grade,
a.status,
t.track
FROM users u
JOIN applications AS a ON u.user_id = a.user_id
JOIN tracking AS t ON u.user_id = t.user_id
LEFT OUTER JOIN
(
select ap.user_id,ap.grade
from applications ap
inner join
(select a.user_id,max(date_ended) as max_ended_date
from applications a
where a.status = 'Ended'
group by a.user_id
) md on md.user_id = ap.user_id and ap.date_ended = md.max_ended_date
) as td on u.user_id = td.user_id
WHERE a.status = 'Pending'
ORDER BY cast(replace(replace(td.grade,'(',''),')','') as decimal(12,2)),u.user_id ASC
LIMIT 10
) a
WHERE grade is not null
UNION ALL
select b.* from
(
SELECT u.user_name,
u.user_id,
a2.date_of_application,
td.grade,
ifnull(a2.status,'No applications yet') as status,
t2.track
FROM users u
LEFT OUTER JOIN (select user_id,date_of_application,status from applications where status = 'Pending') AS a2 ON u.user_id = a2.user_id
JOIN tracking AS t2 ON u.user_id = t2.user_id
LEFT OUTER JOIN
(
select ap.user_id,ap.grade
from applications ap
inner join
(select a.user_id,max(date_ended) as max_ended_date
from applications a
where a.status = 'Ended'
group by a.user_id
) md on md.user_id = ap.user_id and ap.date_ended = md.max_ended_date
) as td on u.user_id = td.user_id
where u.user_id not in (
select t1.user_id
from (
select ap1.user_id,ap1.grade
from applications ap1
inner join
(select a1.user_id,max(date_ended) as max_ended_date
from applications a1
where a1.status = 'Ended'
group by a1.user_id
) md1 on md1.user_id = ap1.user_id and ap1.date_ended = md1.max_ended_date
order by cast(replace(replace(ap1.grade,'(',''),')','') as decimal(12,2)),md1.user_id asc
limit 10
) as t1
)
ORDER BY status desc,a2.date_of_application ASC
) b;
This does make the following assumptions:
There is always only one row for each user_id in the users and tracking tables
EDIT
To explain this query a bit:
Inline view aliased a (aka 'The Top Half') brings back a list of the top 10 users according to their most recent 'ended' grade ascending. Note the following part of the query that strips any brackets from the grade, converts the resulting number to a decimal to 2 decimal places and orders them ascending by grade and then, in case of equal grade scores, by user_id:
ORDER BY cast(replace(replace(td.grade,'(',''),')','') as decimal(12,2)),u.user_id ASC
Inline view b is pretty much the same as inline view a, except that it excludes users that would appear in The Top Half and orders the results by status DESC (to move those users with no applications to the bottom of the list) and date of application ASC.
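The core building block of both inline views, the join against MAX(date_ended) that finds each user's latest ended grade, can be tried on its own with the sketch below. It runs on Python's built-in sqlite3 module with the question's sample rows; storing the grade as text with literal parentheses is an assumption made here so that the bracket-stripping expression has something to do, and CAST ... AS REAL stands in for MySQL's decimal(12,2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applications (user_id INTEGER, date_of_application TEXT,
                               date_ended TEXT, grade TEXT, status TEXT);
    INSERT INTO applications VALUES
        (1, '2011-01-01', '2011-02-28', '1.0',   'Ended'),
        (1, '2011-02-02', '2011-03-28', '1.0',   'Ended'),
        (1, '2011-03-03', '2011-04-28', '(1.5)', 'Ended'),
        (2, '2011-01-01', '2011-02-20', '2.0',   'Ended'),
        (2, '2011-02-02', '2011-03-11', '2.5',   'Ended'),
        (2, '2011-03-03', '2011-04-28', '(1.0)', 'Ended'),
        (1, '2011-05-10', NULL, NULL, 'Pending'),
        (2, '2011-05-15', NULL, NULL, 'Pending');
""")

query = """
SELECT pend.user_id, pend.date_of_application,
       CAST(REPLACE(REPLACE(latest.grade, '(', ''), ')', '') AS REAL) AS latest_grade
FROM applications pend
LEFT JOIN (SELECT ap.user_id, ap.grade
           FROM applications ap
           JOIN (SELECT user_id, MAX(date_ended) AS max_ended_date
                 FROM applications
                 WHERE status = 'Ended'
                 GROUP BY user_id) md
             ON md.user_id = ap.user_id AND ap.date_ended = md.max_ended_date) latest
       ON latest.user_id = pend.user_id
WHERE pend.status = 'Pending'
ORDER BY latest_grade, pend.user_id
"""
pending_with_grade = conn.execute(query).fetchall()
for row in pending_with_grade:
    print(row)
```

User 2 sorts first because the latest ended grade, 1.0, is lower than user 1's 1.5.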
This should work well for you... To clarify what is going on, you have to start at the inner-most part of the query. For each user, find the highest "Pending" date (since, as you stated, there would only be one) and the last "Ended" class date, grouping by user. This guarantees one record per user with both dates calculated up front as a PreQuery.
Next, do a self-join back to the applications table TWICE... once by the user and last ended date, then by the user and last pending date. Because these are LEFT JOINs, a person with only an application and no ended class will be included... likewise, a person with only a completed class and no pending application will be included too.
Pull the respective columns from those aliased references to get the grade. While we're at it, using SQL variables together with this query's ORDER BY Grade DESC numbers the rows from 1 to n by grade, without respect to the application date.
Finally, take this entire result set and do a special order by... If the user's rank is less than 11, order by that rank. Otherwise, let everyone else share the same value of 11 for the first part of the ORDER BY... After that, order by the application date.
Small chunks, each relying on the previous set. And this one shouldn't make your head hurt, nor does it require any unions:
select
QryRank.*
from
( select
PreQuery.User_ID,
usr.user_Name,
trk.Track,
PreQuery.LastEnded,
appEnd.Grade,
PreQuery.LastPend as Date_Of_Application,
@Rank := @Rank + 1 as UserRank
from
( select
app.user_id,
max( if( app.status = "Ended", date_ended, null ) ) as LastEnded,
max( if( app.status = "Pending", app.date_of_application, null )) LastPend
from
Applications app
group by
app.user_id ) PreQuery
LEFT JOIN Applications appEnd
on PreQuery.User_ID = appEnd.User_ID
AND PreQuery.LastEnded = appEnd.date_ended
LEFT JOIN Applications appPend
on PreQuery.User_ID = appPend.User_ID
AND PreQuery.LastPend = appPend.date_of_application
join Users usr
on PreQuery.user_id = usr.user_id
join Tracking trk
on PreQuery.user_id = trk.user_id,
( select @Rank := 0 ) sqlvars
order by
appEnd.Grade DESC ) QryRank
order by
if( QryRank.UserRank < 11, QryRank.UserRank, 11 ),
QryRank.Date_Of_Application
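The two-tier ordering at the end of this query can be illustrated in isolation. The sketch below runs on Python's built-in sqlite3 module, which has no user variables, so ROW_NUMBER() is swapped in for the @Rank counter (SQLite 3.25 or later); it ranks by grade ascending as the question asks, uses a hypothetical pending table, and lowers the cutoff from 10 to 3 to keep the sample data short.

```python
import sqlite3

TOP_N = 3  # the question uses 10; a smaller cutoff keeps the sample data short

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pending (user_id INTEGER PRIMARY KEY, grade REAL,
                          date_of_application TEXT);
    -- Hypothetical rows: one pending application per user, with the latest grade.
    INSERT INTO pending VALUES
        (1, 2.5, '2011-05-06'),
        (2, 1.0, '2011-05-09'),
        (3, 3.0, '2011-05-02'),
        (4, 1.5, '2011-05-08'),
        (5, 2.0, '2011-05-07'),
        (6, 4.0, '2011-05-03');
""")

query = """
SELECT user_id, grade, date_of_application
FROM (SELECT p.*, ROW_NUMBER() OVER (ORDER BY grade, user_id) AS user_rank
      FROM pending p) ranked
-- Rows inside the cutoff sort by their rank; all others share one sort value
-- and therefore fall back to the application date.
ORDER BY CASE WHEN user_rank <= :top_n THEN user_rank ELSE :top_n + 1 END,
         date_of_application
"""
ordered_ids = [row[0] for row in conn.execute(query, {"top_n": TOP_N})]
print(ordered_ids)
```

Users 2, 4 and 5 come first in grade order, followed by users 3, 6 and 1 in order of application date.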