Check for match in other column - mysql

I am trying to build an SQL query that will produce these results:
| Category Title | Subcategory Of |
|----------------|----------------|
| Category 1     |                |
| Category 2     |                |
| Category 3     |                |
| Category 4     |                |
| Category 5     |                |
| Category 6     | Category 4     |
| Category 7     | Category 5     |
This is what my database looks like:
CREATE TABLE `categories` (
`category_id` int(4) NOT NULL AUTO_INCREMENT,
`subcategory_id` int(4) NOT NULL,
`category_title` longtext COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`category_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `categories` (`category_id`, `subcategory_id`, `category_title`) VALUES
(1, 0, 'Category 1'),
(2, 0, 'Category 2'),
(3, 0, 'Category 3'),
(4, 0, 'Category 4'),
(5, 0, 'Category 5'),
(6, 4, 'Category 6'),
(7, 5, 'Category 7');
I thought you would use JOIN, but I couldn't work out what kind of query to write, since as far as I knew JOIN was for joining two tables, not two columns. I'm new to these advanced queries (I'm comfortable with INSERT, UPDATE, DELETE, etc., though). Any help is appreciated.
This is what I tried, though it doesn't really make sense:
SELECT * FROM categories RIGHT JOIN categories ON subcategory_id = category_id

It's called a self-join. You include the table name twice in the query, giving it two different aliases, and then it works just like a normal join:
SELECT
C1.category_title AS category_title,
C2.category_title AS subcategory_of
FROM categories C1
LEFT JOIN categories C2
ON C1.subcategory_id = C2.category_id
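To try the self-join without a MySQL server, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite stands in for MySQL here (an assumption; the SQL used is common to both), the function name category_parents is illustrative, and an ORDER BY is added so the row order is predictable.

```python
import sqlite3

def category_parents():
    """Run the self-join against an in-memory copy of `categories` and
    return (category_title, subcategory_of) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE categories ("
        " category_id INTEGER PRIMARY KEY,"
        " subcategory_id INTEGER NOT NULL,"
        " category_title TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO categories VALUES (?, ?, ?)",
        [(1, 0, "Category 1"), (2, 0, "Category 2"), (3, 0, "Category 3"),
         (4, 0, "Category 4"), (5, 0, "Category 5"),
         (6, 4, "Category 6"), (7, 5, "Category 7")],
    )
    # The same table appears twice: C1 is the child row, C2 its parent.
    # LEFT JOIN keeps top-level rows, whose subcategory_id = 0 matches nothing.
    rows = conn.execute(
        "SELECT C1.category_title AS category_title,"
        "       C2.category_title AS subcategory_of "
        "FROM categories C1 "
        "LEFT JOIN categories C2 ON C1.subcategory_id = C2.category_id "
        "ORDER BY C1.category_id"
    ).fetchall()
    conn.close()
    return rows

for title, parent in category_parents():
    print(title, "|", parent or "")
```

Top-level categories come back with None (SQL NULL) as their parent because no row has category_id = 0; wrapping the second column in COALESCE(C2.category_title, '') would produce an empty string in SQL itself.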

as far as I knew JOIN was for joining two tables, not two columns
A better way to think about JOIN is that it defines how the columns in your query relate to one another.
There is no restriction that the joined columns be in different tables. The only issue is how to refer to them, which you do using aliases, as described in a previous answer. Even when joining different tables, the query is usually easier to read if you use table aliases.
Aliases are also useful when you need to join two (or more) tables with identical column names.


Using the count function on third table in two table select statement in MariaDB

I just spent a few hours reading through the MariaDB docs and various questions here trying to figure out a SQL statement that does what I want. I'm definitely not an expert. Eventually I did get the result I expected, but I have no idea why it works. I want to be sure I am actually getting the result I want and that it isn't just working for the few test cases I have thrown at it.
I have three tables: guestbook, users, and user_likes. I am trying to write a SQL statement that returns the user name and first name from users; the post content, post date, and post id from guestbook; and a third column, likes, which is the total number of times that post id from guestbook appears in the user_likes table. It should only return posts of type 'standard' and should order the rows by ascending post date.
Sample data:
CREATE TABLE users
(`user_id` int, `user_first` varchar(6), `user_last` varchar(7),
`user_email` varchar(26), `user_uname` varchar(6))
;
INSERT INTO users
(`user_id`, `user_first`, `user_last`, `user_email`, `user_uname`)
VALUES
(0, 'Bob', 'Abc', 'email#example.com', 'user1'),
(13, 'Larry', 'Abc', 'email#example.com', 'user2'),
(15, 'Noel', 'Abc', 'email#example.com', 'user3'),
(16, 'Kate', 'Abc', 'email#example.com', 'user4'),
(17, 'Walter', 'Sobchak', 'walter.sobchak#shabbus.com', 'Walter'),
(18, 'Jae', 'Abc', 'email#example.com', 'user5')
;
CREATE TABLE user_likes
(`user_id` int, `post_id` int, `like_id` int)
;
INSERT INTO user_likes
(`user_id`, `post_id`, `like_id`)
VALUES
(0, 23, 1),
(0, 41, 2),
(13, 23, 7)
;
CREATE TABLE guestbook
(`post_id` int, `user_id` int, `post_date` datetime,
`post_content` varchar(27), `post_type` varchar(8),
`post_level` int, `post_parent` varchar(4))
;
INSERT INTO guestbook
(`post_id`, `user_id`, `post_date`, `post_content`,
`post_type`, `post_level`, `post_parent`)
VALUES
(2, 0, '2018-12-15 20:32:40', 'test1', 'testing', 0, NULL),
(8, 0, '2018-12-16 14:06:40', 'test2', 'testing', 0, NULL),
(9, 13, '2018-12-16 15:47:55', 'test4', 'testing', 0, NULL),
(23, 0, '2018-12-25 17:59:46', 'Merry Christmas!', 'standard', 0, NULL),
(39, 16, '2018-12-26 00:28:04', 'Hello!', 'standard', 0, NULL),
(40, 15, '2019-01-27 00:46:12', 'Hello 2', 'standard', 0, NULL),
(41, 18, '2019-02-25 00:44:35', 'What are you doing?', 'standard', 0, NULL)
;
I tried a whole bunch of convoluted statements involving count and couldn't get what I wanted. Through what seems like dumb luck I stumbled into creating this statement which appears to be giving me what I want.
SELECT
u.user_uname, u.user_first, g.post_id, g.post_date,
g.post_content, count(user_likes.post_id) AS likes
FROM
users AS u, guestbook AS g
LEFT JOIN
user_likes on g.post_id=user_likes.post_id
WHERE
u.user_id=g.user_id AND g.post_type='standard'
GROUP BY
g.post_id
ORDER BY
g.post_date ASC;
Question:
Why does this count function appear to work?
The count function that I was able to get working is this one, but it only works for hard-coded post_id values.
SELECT COUNT(CASE post_id WHEN 23 THEN 1 ELSE null END) FROM user_likes;
When I try to match the post_id from the guestbook table by changing it to this, I get an incorrect value that appears to count the whole user_likes table.
SELECT COUNT(case when guestbook.post_id=user_likes.post_id then 1 else null end) FROM guestbook, user_likes;
Adding a GROUP BY guestbook.post_id to the end gets me closer, but now I need to figure out how to combine that with my original select statement.
+----------------------------------------------------------------------------+
| COUNT(case when guestbook.post_id=user_likes.post_id then 1 else null end) |
+----------------------------------------------------------------------------+
| 0 |
| 0 |
| 0 |
| 2 |
| 0 |
| 0 |
| 1 |
+----------------------------------------------------------------------------+
This is the output I want, which I am getting. I just don't trust that my statement is reliable or correct.
+------------+------------+---------+---------------------+---------------------+-------+
| user_uname | user_first | post_id | post_date | post_content | likes |
+------------+------------+---------+---------------------+---------------------+-------+
| user1 | Bob | 23 | 2018-12-25 17:59:46 | Merry Christmas! | 2 |
| user4 | Kate | 39 | 2018-12-26 00:28:04 | Hello! | 0 |
| user3 | Noel | 40 | 2019-01-27 00:46:12 | Hello 2 | 0 |
| user5 | Jae | 41 | 2019-02-25 00:44:35 | What are you doing? | 1 |
+------------+------------+---------+---------------------+---------------------+-------+
Fiddle of statement working: http://sqlfiddle.com/#!9/968656/1/0
JOIN + COUNT -- A query first combines the tables as directed by the JOIN and ON clauses. The result is put (at least logically) into a temporary table. Often this temp table has many more rows than any of the tables being JOINed.
Then the COUNT(..) is performed. It is counting the number of rows in that temp table. Maybe that count is exactly what you want, maybe it is a hugely inflated number.
count(user_likes.post_id) has the additional hiccup of not counting any rows where user_likes.post_id IS NULL. Often that is irrelevant, in which case you should simply say COUNT(*). In your query, though, it is exactly why the numbers come out right: with the LEFT JOIN, a post with no likes still produces one joined row, with user_likes.post_id set to NULL, so COUNT(user_likes.post_id) returns 0 for it, whereas COUNT(*) would return 1.
Please don't use the comma-list form for joining. Always use FROM a JOIN b ON ..., where the ON clause says how tables a and b are related. If there is also some filtering, put that into the WHERE clause.
If the COUNT is too big, put aside the query you have developed and start over with a query that does exactly one thing: compute the count. This query will probably use fewer tables.
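A small sketch of that difference, using Python's sqlite3 as a stand-in for MariaDB and a trimmed copy of the question's tables (only the columns the query touches; like_counts is an illustrative name). Both counts run over the same LEFT JOIN, and only the unliked posts differ.

```python
import sqlite3

def like_counts():
    """Return (post_id, COUNT(ul.post_id), COUNT(*)) per 'standard' post."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE guestbook (post_id INTEGER, post_type TEXT)")
    conn.execute("CREATE TABLE user_likes "
                 "(user_id INTEGER, post_id INTEGER, like_id INTEGER)")
    conn.executemany("INSERT INTO guestbook VALUES (?, ?)",
                     [(9, "testing"), (23, "standard"), (39, "standard"),
                      (40, "standard"), (41, "standard")])
    conn.executemany("INSERT INTO user_likes VALUES (?, ?, ?)",
                     [(0, 23, 1), (0, 41, 2), (13, 23, 7)])
    rows = conn.execute(
        "SELECT g.post_id,"
        "       COUNT(ul.post_id) AS likes_col,"   # skips the NULL-filled row
        "       COUNT(*) AS likes_star "           # counts every joined row
        "FROM guestbook AS g "
        "LEFT JOIN user_likes AS ul ON g.post_id = ul.post_id "
        "WHERE g.post_type = 'standard' "
        "GROUP BY g.post_id ORDER BY g.post_id"
    ).fetchall()
    conn.close()
    return rows

for row in like_counts():
    print(row)
```

Posts 39 and 40 have no likes, so COUNT(*) reports 1 for them while COUNT(ul.post_id) reports the correct 0.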
Then build on that to get any other data you need. It may look something like
SELECT ...
FROM ( SELECT foo, COUNT(*) AS ct FROM t1 GROUP BY foo ) AS sub1
JOIN t2 ON t2.foo = sub1.foo
JOIN t3 ON ...
WHERE ...
Get that initial query that gets the right COUNT. Then, if needed, come back for more help.
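Applied to the question's data, the count-first approach might look like the sketch below, using Python's sqlite3 as a stand-in for MariaDB (posts_with_likes is an illustrative name, and only the needed columns are kept). One deliberate deviation from the template: the derived table is LEFT JOINed and wrapped in COALESCE, because it has no row at all for an unliked post and a plain JOIN would drop that post.

```python
import sqlite3

USERS = [(0, "Bob", "user1"), (13, "Larry", "user2"), (15, "Noel", "user3"),
         (16, "Kate", "user4"), (17, "Walter", "Walter"), (18, "Jae", "user5")]
GUESTBOOK = [
    (2, 0, "2018-12-15 20:32:40", "test1", "testing"),
    (8, 0, "2018-12-16 14:06:40", "test2", "testing"),
    (9, 13, "2018-12-16 15:47:55", "test4", "testing"),
    (23, 0, "2018-12-25 17:59:46", "Merry Christmas!", "standard"),
    (39, 16, "2018-12-26 00:28:04", "Hello!", "standard"),
    (40, 15, "2019-01-27 00:46:12", "Hello 2", "standard"),
    (41, 18, "2019-02-25 00:44:35", "What are you doing?", "standard"),
]
LIKES = [(0, 23, 1), (0, 41, 2), (13, 23, 7)]

def posts_with_likes():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users "
                 "(user_id INTEGER, user_first TEXT, user_uname TEXT)")
    conn.execute("CREATE TABLE guestbook (post_id INTEGER, user_id INTEGER, "
                 "post_date TEXT, post_content TEXT, post_type TEXT)")
    conn.execute("CREATE TABLE user_likes "
                 "(user_id INTEGER, post_id INTEGER, like_id INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", USERS)
    conn.executemany("INSERT INTO guestbook VALUES (?, ?, ?, ?, ?)", GUESTBOOK)
    conn.executemany("INSERT INTO user_likes VALUES (?, ?, ?)", LIKES)
    rows = conn.execute("""
        SELECT u.user_uname, u.user_first, g.post_id, g.post_date,
               g.post_content, COALESCE(sub1.ct, 0) AS likes
        FROM guestbook AS g
        JOIN users AS u ON u.user_id = g.user_id
        -- The count on its own: one row per post that has any likes.
        LEFT JOIN (SELECT post_id, COUNT(*) AS ct
                   FROM user_likes
                   GROUP BY post_id) AS sub1
               ON sub1.post_id = g.post_id
        WHERE g.post_type = 'standard'
        ORDER BY g.post_date ASC
    """).fetchall()
    conn.close()
    return rows

for row in posts_with_likes():
    print(row)
```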
As tried by Bryan
OK, I made a few changes.
SELECT u.user_uname, u.user_first,
g2.post_id, g2.post_content, g2.post_date,
sub.likes
FROM
(
SELECT g.post_id,
SUM(g.post_id = ul.post_id) AS likes
FROM guestbook AS g
JOIN user_likes AS ul
WHERE g.post_type = 'standard'
GROUP BY g.post_id -- one row, and one likes total, per post
) AS sub
JOIN guestbook AS g2 ON sub.post_id = g2.post_id
JOIN users AS u ON u.user_id = g2.user_id;
Indexes:
guestbook: (post_type, post_id) -- for derived table
guestbook: (post_id) -- for outer SELECT
users: (user_id)
user_likes: (post_id)
Notes:
ORDER BY removed since it was useless in context.
COUNT..CASE changed to shorter SUM.
JOIN ON used
Since there is only one value coming from the derived table, this might work equally well:
SELECT u.user_uname, u.user_first,
g.post_id, g.post_content, g.post_date,
( SELECT COUNT(*)
FROM user_likes AS ul
WHERE g.post_id = ul.post_id
) AS likes
FROM guestbook AS g
JOIN users AS u USING(user_id)
WHERE g.post_type = 'standard';
This involved lots of changes; see if it looks 'right'. It is now a lot simpler.
Indexes are same as above.
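A sketch of the correlated-subquery version run against the question's sample data, using Python's sqlite3 as a stand-in for MariaDB (posts_with_like_counts is an illustrative name, and only the needed columns are kept). The ORDER BY g.post_date from the question is added back so the rows come out in the requested order.

```python
import sqlite3

USERS = [(0, "Bob", "user1"), (13, "Larry", "user2"), (15, "Noel", "user3"),
         (16, "Kate", "user4"), (17, "Walter", "Walter"), (18, "Jae", "user5")]
GUESTBOOK = [
    (2, 0, "2018-12-15 20:32:40", "test1", "testing"),
    (8, 0, "2018-12-16 14:06:40", "test2", "testing"),
    (9, 13, "2018-12-16 15:47:55", "test4", "testing"),
    (23, 0, "2018-12-25 17:59:46", "Merry Christmas!", "standard"),
    (39, 16, "2018-12-26 00:28:04", "Hello!", "standard"),
    (40, 15, "2019-01-27 00:46:12", "Hello 2", "standard"),
    (41, 18, "2019-02-25 00:44:35", "What are you doing?", "standard"),
]
LIKES = [(0, 23, 1), (0, 41, 2), (13, 23, 7)]

def posts_with_like_counts():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users "
                 "(user_id INTEGER, user_first TEXT, user_uname TEXT)")
    conn.execute("CREATE TABLE guestbook (post_id INTEGER, user_id INTEGER, "
                 "post_date TEXT, post_content TEXT, post_type TEXT)")
    conn.execute("CREATE TABLE user_likes "
                 "(user_id INTEGER, post_id INTEGER, like_id INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", USERS)
    conn.executemany("INSERT INTO guestbook VALUES (?, ?, ?, ?, ?)", GUESTBOOK)
    conn.executemany("INSERT INTO user_likes VALUES (?, ?, ?)", LIKES)
    rows = conn.execute("""
        SELECT u.user_uname, u.user_first, g.post_id, g.post_date,
               g.post_content,
               -- Evaluated once per outer row; returns 0 when nothing matches.
               (SELECT COUNT(*) FROM user_likes AS ul
                WHERE ul.post_id = g.post_id) AS likes
        FROM guestbook AS g
        JOIN users AS u USING (user_id)
        WHERE g.post_type = 'standard'
        ORDER BY g.post_date
    """).fetchall()
    conn.close()
    return rows

for row in posts_with_like_counts():
    print(row)
```

Unlike the LEFT JOIN plus COUNT(column) version, no NULL handling is needed here: a scalar COUNT(*) subquery over zero rows simply yields 0.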

I have to filter records by retrieving from SQL table. I want the values in the same field in my table column in sql to not show again

I am not able to do this, please help. In the data below there are two values in Speciality for program id 1. Is there a way to filter so that a value is not shown again in the filtered results (i.e. Free lunch here)? When I retrieve the data from the database for filtering, I get checkboxes like these:
a Free meal, Free lunch
b Free lunch
c Free Dinner
I want option a to show only Free meal.
INSERT INTO `programs` (`ProgramID`, `UserID`,`Speciality`) VALUES
(1, 'huy45', 'Free meal, Free lunch'),
(2, 'ga32','Free lunch'),
(3, 'sharvar3','Free Dinner');
There is repeated information in your table, and you don't want that. DRY (Don't Repeat Yourself)!
I would use another table to store the specialities, such as:
Speciality
id | name
----+-------------
1 | Free meal
2 | Free lunch
3 | Free dinner
Then you can easily use a foreign key to refer to these specialities from your programs table.
Next, you don't want to store serialized information (several values packed into one field). This goes against the purpose of using an RDBMS.
I would structure the programs table like this:
ProgramID | UserID | SpecialityID
-----------+------------+--------------
1 | 'huy45' | 1
1 | 'huy45' | 2
2 | 'ga32' | 2
3 | 'sharvar3' | 3
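If existing rows already hold comma-separated values, they have to be split once to populate the two tables. A minimal sketch in Python, assuming values are separated by commas and that names differing only in case ('Free Dinner' vs 'Free dinner') mean the same speciality; the function normalize is hypothetical, not part of any library.

```python
def normalize(programs):
    """Split serialized Speciality strings into (id, name) lookup rows plus
    one (ProgramID, UserID, SpecialityID) row per program/speciality pair."""
    ids = {}      # lower-cased name -> (id, first spelling seen)
    seen = set()  # (ProgramID, SpecialityID) pairs already emitted
    program_rows = []
    for program_id, user_id, serialized in programs:
        for raw in serialized.split(","):
            name = raw.strip()
            if not name:
                continue  # tolerate stray commas such as "Free meal,"
            key = name.lower()
            if key not in ids:
                ids[key] = (len(ids) + 1, name)
            spec_id = ids[key][0]
            if (program_id, spec_id) not in seen:
                seen.add((program_id, spec_id))
                program_rows.append((program_id, user_id, spec_id))
    speciality_rows = list(ids.values())  # dicts keep insertion order
    return speciality_rows, program_rows

specialities, rows = normalize([
    (1, "huy45", "Free meal, Free lunch"),
    (2, "ga32", "Free lunch"),
    (3, "sharvar3", "Free Dinner"),
])
print(specialities)
print(rows)
```

The two returned lists can then be inserted (for example with a DB-API executemany call) into Speciality and the restructured programs table.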
To retrieve the ProgramID, UserID and Speciality for the Speciality name 'Free lunch', you can then use this query:
SELECT p.`ProgramID`,
p.`UserID`,
s.`name` AS "Speciality Name"
FROM `programs` p
INNER JOIN `Speciality` s
ON p.SpecialityID = s.id
WHERE s.`name` = 'Free lunch';
Schema (MySQL v5.7)
CREATE TABLE Speciality (
`id` INTEGER,
`name` VARCHAR(11)
);
INSERT INTO Speciality
(`id`, `name`)
VALUES
(1, 'Free meal'),
(2, 'Free lunch'),
(3, 'Free dinner');
CREATE TABLE programs (
`ProgramID` INTEGER,
`UserID` VARCHAR(10),
`SpecialityID` INTEGER
);
INSERT INTO programs
(`ProgramID`, `UserID`, `SpecialityID`)
VALUES
(1, 'huy45', 1),
(1, 'huy45', 2),
(2, 'ga32', 2),
(3, 'sharvar3', 3);
Query #1
SELECT p.`ProgramID`,
p.`UserID`,
s.`name` AS "Speciality Name"
FROM `programs` p
INNER JOIN `Speciality` s
ON p.SpecialityID = s.id
WHERE s.`name` = 'Free lunch';
| ProgramID | UserID | Speciality Name |
| --------- | ------ | --------------- |
| 1 | huy45 | Free lunch |
| 2 | ga32 | Free lunch |
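The same schema and query can be reproduced with Python's sqlite3 module as a stand-in for MySQL 5.7 (an assumption; the SQL used is common to both). The speciality name is passed as a bound parameter rather than spliced into the string, an ORDER BY makes the output order fixed, and programs_with_speciality is an illustrative name.

```python
import sqlite3

def programs_with_speciality(name):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Speciality (id INTEGER PRIMARY KEY, name VARCHAR(11))")
    conn.execute("CREATE TABLE programs (ProgramID INTEGER, UserID VARCHAR(10), "
                 "SpecialityID INTEGER REFERENCES Speciality(id))")
    conn.executemany("INSERT INTO Speciality VALUES (?, ?)",
                     [(1, "Free meal"), (2, "Free lunch"), (3, "Free dinner")])
    # One row per (program, speciality) pair instead of a serialized list.
    conn.executemany("INSERT INTO programs VALUES (?, ?, ?)",
                     [(1, "huy45", 1), (1, "huy45", 2),
                      (2, "ga32", 2), (3, "sharvar3", 3)])
    rows = conn.execute(
        'SELECT p.ProgramID, p.UserID, s.name AS "Speciality Name" '
        "FROM programs p "
        "INNER JOIN Speciality s ON p.SpecialityID = s.id "
        "WHERE s.name = ? "
        "ORDER BY p.ProgramID",
        (name,),
    ).fetchall()
    conn.close()
    return rows

print(programs_with_speciality("Free lunch"))
print(programs_with_speciality("Free meal"))
```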

LIMIT number of rows in a JOIN between MySQL tables

What I have
I have the following two tables in a MySQL database (version 5.6.35).
CREATE TABLE `Runs` (
`Name` varchar(200) NOT NULL,
`Run` varchar(200) NOT NULL,
`Points` int(11) NOT NULL
) DEFAULT CHARSET=latin1;
INSERT INTO `Runs` (`Name`, `Run`, `Points`) VALUES
('John', 'A08', 12),
('John', 'A09', 3),
('John', 'A01', 15),
('Kate', 'A02', 92),
('Kate', 'A03', 1),
('Kate', 'A04', 33),
('Peter', 'A05', 8),
('Peter', 'A06', 14),
('Peter', 'A07', 5);
CREATE TABLE `Users` (
`Name` varchar(500) NOT NULL,
`NumberOfRun` int(11) NOT NULL
) DEFAULT CHARSET=latin1;
INSERT INTO `Users` (`Name`, `NumberOfRun`) VALUES
('John', 2),
('Kate', 1),
('Peter', 3);
ALTER TABLE `Runs`
ADD PRIMARY KEY (`Run`);
What is my target
John has Users.NumberOfRun=2, so I want to extract his top 2 records (by Points) from the Runs table
Kate has Users.NumberOfRun=1, so I want to extract her top record from the Runs table
Peter has Users.NumberOfRun=3, so I want to extract his top 3 records from the Runs table
I would like to get the following result:
+-------+-----+--------+
| Name | Run | Points |
+-------+-----+--------+
| John | A01 | 15 |
| John | A08 | 12 |
| Kate | A02 | 92 |
| Peter | A06 | 14 |
| Peter | A05 | 8 |
| Peter | A07 | 5 |
+-------+-----+--------+
What I have tried
First of all, if this were SQL Server, I would apply ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) AS [rn] to the Runs table and then JOIN the result with the Users table on [rn] <= Users.NumberOfRun.
I have read this document, but it seems that window functions (PARTITION BY) are only available in MySQL from version 8.0, and I am using version 5.6.
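For comparison, on MySQL 8.0 or later the ROW_NUMBER approach described above could look like this sketch. It uses Python's sqlite3 as a stand-in and assumes the bundled SQLite is 3.25 or newer (the first release with window functions); top_runs_window is an illustrative name.

```python
import sqlite3

RUNS = [("John", "A08", 12), ("John", "A09", 3), ("John", "A01", 15),
        ("Kate", "A02", 92), ("Kate", "A03", 1), ("Kate", "A04", 33),
        ("Peter", "A05", 8), ("Peter", "A06", 14), ("Peter", "A07", 5)]
USERS = [("John", 2), ("Kate", 1), ("Peter", 3)]

def top_runs_window():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Runs (Name TEXT NOT NULL, Run TEXT PRIMARY KEY, "
                 "Points INTEGER NOT NULL)")
    conn.execute("CREATE TABLE Users (Name TEXT NOT NULL, NumberOfRun INTEGER NOT NULL)")
    conn.executemany("INSERT INTO Runs VALUES (?, ?, ?)", RUNS)
    conn.executemany("INSERT INTO Users VALUES (?, ?)", USERS)
    rows = conn.execute("""
        SELECT r.Name, r.Run, r.Points
        FROM (SELECT Name, Run, Points,
                     -- 1 for each user's best run, 2 for the next, ...
                     ROW_NUMBER() OVER (PARTITION BY Name
                                        ORDER BY Points DESC) AS rn
              FROM Runs) AS r
        JOIN Users AS u ON u.Name = r.Name
        WHERE r.rn <= u.NumberOfRun
        ORDER BY r.Name, r.Points DESC
    """).fetchall()
    conn.close()
    return rows

for row in top_runs_window():
    print(row)
```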
Finally, I have tried this query, based on this Stackoverflow answer:
SELECT t0.Name,t0.Run
FROM Runs AS t0
LEFT JOIN Runs AS t1 ON t0.Name=t1.Name AND t0.Run=t1.Run AND t1.Points>t0.Points
WHERE t1.Points IS NULL;
but it doesn't give me the row number, which is essential for me to make the JOIN described above.
SQL Fiddle to this example.
A combination of 'group_concat' and 'find_in_set', followed by the filtering using the position returned by 'find_in_set' will do the job for you.
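GROUP_CONCAT first builds, for each Name, a comma-separated list of runs sorted in descending order of points. (The list is truncated at group_concat_max_len, which is 1024 bytes by default; that is plenty here but worth knowing for long lists.)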
GROUP_CONCAT(Run ORDER BY Points DESC)
FIND_IN_SET then returns the 1-based position of each run in that list, and the BETWEEN condition keeps only the first Users.NumberOfRun positions:
FIND_IN_SET(Run, grouped_run) BETWEEN 1 AND Users.NumberOfRun
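To see why the filter works, here is a pure-Python model of the two steps (not MySQL itself): grouped_runs mimics GROUP_CONCAT(Run ORDER BY Points DESC) per Name, and find_in_set models MySQL's FIND_IN_SET for non-NULL arguments (1-based position, 0 when absent). Both helper names are hypothetical.

```python
RUNS = [("John", "A08", 12), ("John", "A09", 3), ("John", "A01", 15),
        ("Kate", "A02", 92), ("Kate", "A03", 1), ("Kate", "A04", 33),
        ("Peter", "A05", 8), ("Peter", "A06", 14), ("Peter", "A07", 5)]
NUMBER_OF_RUN = {"John": 2, "Kate": 1, "Peter": 3}

def find_in_set(needle, haystack):
    """1-based position of needle in a comma-separated list, 0 if absent."""
    items = haystack.split(",") if haystack else []
    return items.index(needle) + 1 if needle in items else 0

def grouped_runs(runs):
    """Per Name, the Runs joined by commas in descending Points order
    (ties broken by Run, descending)."""
    by_name = {}
    for name, run, points in runs:
        by_name.setdefault(name, []).append((points, run))
    return {name: ",".join(run for _, run in sorted(pairs, reverse=True))
            for name, pairs in by_name.items()}

grouped = grouped_runs(RUNS)
# Equivalent of: WHERE FIND_IN_SET(Run, grouped_run) BETWEEN 1 AND NumberOfRun
kept = [(name, run, points) for name, run, points in RUNS
        if 1 <= find_in_set(run, grouped[name]) <= NUMBER_OF_RUN[name]]
print(grouped)
print(kept)
```

For John the list is "A01,A08,A09", so A01 and A08 sit at positions 1 and 2 and pass the filter, while A09 at position 3 does not.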
The below query should work for you.
SELECT
Runs.*
FROM
Runs
INNER JOIN (
SELECT
Name, GROUP_CONCAT(Run ORDER BY Points DESC) grouped_run
FROM
Runs
GROUP BY Name
) group_max ON Runs.Name = group_max.Name
INNER JOIN Users ON Users.Name = Runs.Name
WHERE FIND_IN_SET(Run, grouped_run) BETWEEN 1 AND Users.NumberOfRun
ORDER BY
Runs.Name Asc, Runs.Points DESC;
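On MySQL 5.6 there is also a purely relational alternative, swapped in here in place of the answer's string functions: count how many runs of the same Name score higher, and keep a row while that count is below NumberOfRun. A sketch using Python's sqlite3 as a stand-in (top_runs_correlated is an illustrative name). Tied Points share a position, so a user could get more than NumberOfRun rows when ties occur.

```python
import sqlite3

RUNS = [("John", "A08", 12), ("John", "A09", 3), ("John", "A01", 15),
        ("Kate", "A02", 92), ("Kate", "A03", 1), ("Kate", "A04", 33),
        ("Peter", "A05", 8), ("Peter", "A06", 14), ("Peter", "A07", 5)]
USERS = [("John", 2), ("Kate", 1), ("Peter", 3)]

def top_runs_correlated():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Runs (Name TEXT NOT NULL, Run TEXT PRIMARY KEY, "
                 "Points INTEGER NOT NULL)")
    conn.execute("CREATE TABLE Users (Name TEXT NOT NULL, NumberOfRun INTEGER NOT NULL)")
    conn.executemany("INSERT INTO Runs VALUES (?, ?, ?)", RUNS)
    conn.executemany("INSERT INTO Users VALUES (?, ?)", USERS)
    rows = conn.execute("""
        SELECT r.Name, r.Run, r.Points
        FROM Runs AS r
        JOIN Users AS u ON u.Name = r.Name
        -- Runs of the same user that beat this one = its 0-based position.
        WHERE (SELECT COUNT(*) FROM Runs AS r2
               WHERE r2.Name = r.Name AND r2.Points > r.Points) < u.NumberOfRun
        ORDER BY r.Name, r.Points DESC
    """).fetchall()
    conn.close()
    return rows

for row in top_runs_correlated():
    print(row)
```

An index on Runs (Name, Points) would help the correlated subquery on larger tables.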

JOIN query in MySQL produces wrong result

I have two tables, complaints and complaints_reply, in my MySQL database. Users can add complaints, which are stored in complaints; replies to complaints are stored in the complaints_reply table. I am trying to JOIN the contents of both tables on a specific condition. Before I describe what I am trying to get and the problem I faced, I will explain the structure of these two tables.
NB: The person who adds a complaint is the complaint owner, and a person who adds a complaint reply is a complaint replier. The complaint owner can also add replies, so he can be either the complaint owner or a complaint replier. The two tables have a one-to-many relationship: a complaint can have more than one reply. member_id in the complaints table represents the complaint owner, and mem_id in complaints_reply represents the complaint replier.
DESIRED OUTPUT:
Join the two tables and fetch values so that each complaint and its reply are shown as a single result set. But the condition is kinda tricky: for each complaint in complaints, the last added reply from complaints_reply should be fetched, considering only replies where the complaint replier is not the complaint owner. I use posted_date and posted_time from complaints_reply to find the last added reply for a complaint, and that complaint replier has to be shown in the result set.
So, from the sample data the tables contain now, the output that I should get is:
+------+---------+----------+-------------+-------------------+
| id | title |member_id |last_replier |last_posted_dt |
+------+---------+----------+-------------+-------------------+
| 1 | x | 1000 |2002 | 2015-05-2610:11:17|
| 2 | y | 1001 |1000 | 2015-05-2710:06:16|
+------+---------+----------+-------------+-------------------+
But what I got is:
+------+---------+----------+-------------+-------------------+
| id | title |member_id |last_replier |last_posted_dt |
+------+---------+----------+-------------+-------------------+
| 1 | x | 1000 |1001 | 2015-05-2610:11:17|
| 2 | y | 1001 |2000 | 2015-05-2710:06:16|
+------+---------+----------+-------------+-------------------+
The date is correct, but the returned complaint replier last_replier is wrong.
This is my query.
SELECT com.id,
com.title,
com.member_id,
last_comp_reply.last_replier,
last_comp_reply.last_posted_dt
FROM complaints com
LEFT JOIN
(SELECT c.id AS complaint_id,
c.member_id AS parent_mem_id,
cr.mem_id AS last_replier,
max(cr.posted_dt) AS last_posted_dt
FROM
(SELECT cr.complaint_id,cr.mem_id,c.id,c.member_id,(CONCAT(cr.posted_date,cr.posted_time)) AS posted_dt
FROM complaints_reply cr,
complaints c
WHERE cr.complaint_id=c.id
AND cr.mem_id!=c.member_id
GROUP BY cr.complaint_id,
cr.mem_id,
posted_dt)cr,
complaints c
WHERE cr.complaint_id=c.id
GROUP BY cr.complaint_id,
c.id,
c.member_id) AS last_comp_reply ON com.id=last_comp_reply.complaint_id
Table structure for table complaints
CREATE TABLE IF NOT EXISTS `complaints` (
`id` int(11) NOT NULL,
`title` varchar(500) NOT NULL,
`member_id` int(11) NOT NULL,
`posted_date` date NOT NULL,
`posted_time` time NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;
Indexes for table complaints
ALTER TABLE `complaints`
ADD PRIMARY KEY (`id`);
AUTO_INCREMENT for table complaints
ALTER TABLE `complaints`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT,AUTO_INCREMENT=3;
Dumping data for table complaints
INSERT INTO `complaints` (`id`, `title`, `member_id`, `posted_date`, `posted_time`) VALUES
(1, 'x', 1000, '2015-05-05', '02:06:15'),
(2, 'y', 1001, '2015-05-14', '02:08:10');
Table structure for table complaints_reply
CREATE TABLE IF NOT EXISTS `complaints_reply` (
`id` int(11) NOT NULL,
`complaint_id` int(11) NOT NULL,
`comments` text NOT NULL,
`mem_id` int(11) NOT NULL,
`posted_date` date NOT NULL,
`posted_time` time NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=10 ;
Indexes for table complaints_reply
ALTER TABLE `complaints_reply`
ADD PRIMARY KEY (`id`);
AUTO_INCREMENT for table complaints_reply
ALTER TABLE `complaints_reply`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT,AUTO_INCREMENT=10;
Dumping data for table complaints_reply
INSERT INTO `complaints_reply` (`id`, `complaint_id`, `comments`, `mem_id`, `posted_date`, `posted_time`) VALUES
(1, 1, 'reply1', 2000, '2015-05-08', '02:07:08'),
(2, 1, 'reply2', 2001, '2015-05-06', '06:05:08'),
(3, 1, 'reply3', 1000, '2015-05-14', '02:12:13'),
(4, 2, 'hola', 1000, '2015-05-27', '10:06:16'),
(5, 2, 'hello', 2000, '2015-05-04', '03:09:09'),
(6, 2, 'gracias', 1001, '2015-05-31', '06:12:18'),
(7, 1, 'reply4', 1001, '2015-01-04', '04:08:12'),
(8, 2, 'puta', 1001, '2015-06-13', '06:12:18'),
(9, 1, 'reply5', 1000, '2015-06-01', '04:08:12'),
(10, 1, 'reply next', 2002, '2015-05-26', '10:11:17');
P.S.
To give an idea of what my query does, here is the subquery that combines the tables and applies the condition that the complaint owner must not be the complaint replier:
SELECT cr.complaint_id,
cr.mem_id,
c.id,
c.member_id,
(CONCAT(cr.posted_date,cr.posted_time)) AS posted_dt
FROM complaints_reply cr,
complaints c
WHERE cr.complaint_id=c.id
AND cr.mem_id!=c.member_id
GROUP BY cr.complaint_id,
cr.mem_id,
posted_dt
And the result for this is:
+--------------+---------+----------+-------------+-------------------+
| complaint_id | mem_id | id |member_id | posted_dt |
+--------------+---------+----------+-------------+-------------------+
| 1 | 1001 | 1 |1000 | 2015-01-0404:08:12|
| 1 | 2000 | 1 |1000 | 2015-05-0802:07:08|
| 1 | 2001 | 1 |1000 | 2015-05-0606:05:08|
| 1 | 2002 | 1 |1000 | 2015-05-2610:11:17|
| 2 | 1000 | 2 |1001 | 2015-05-2710:06:16|
| 2 | 2000 | 2 |1001 | 2015-05-0403:09:09|
+--------------+---------+----------+-------------+-------------------+
member_id here represents complaint owner and mem_id represents complaint replier
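The inner query applies the condition correctly: the complaint replies added by the complaint owner are not fetched in this table. So far so good. Everything after this goes haywire, and I don't know where I made the mistake. Is there any alternative way to get the result from here?
This query gives the result. The subquery finds, for each complaint, the latest timestamp among replies not written by the complaint owner; joining that back to complaints_reply on the complaint and timestamp then picks out the replier who posted it. (Your original query selected cr.mem_id without grouping by it, so MySQL was free to return any replier from each group, which is why last_replier came out wrong.)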
SELECT com.id AS complaint_id,
com.member_id AS parent_mem_id,
crep.mem_id AS last_replier,
crl.last_posted_dt
FROM complaints com
LEFT JOIN complaints_reply crep ON com.id=crep.complaint_id
JOIN
(SELECT cr.complaint_id,
max(CONCAT(cr.posted_date,'_',cr.posted_time)) AS last_posted_dt
FROM complaints_reply cr,
complaints c
WHERE cr.complaint_id=c.id
AND cr.mem_id!=c.member_id
GROUP BY cr.complaint_id)crl ON CONCAT(crep.posted_date,'_',crep.posted_time)=crl.last_posted_dt
AND crep.complaint_id=crl.complaint_id
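A sketch of the same two-step idea (latest qualifying timestamp per complaint, then join back to find who posted it), run through Python's sqlite3 as a stand-in for MySQL. The date and time are joined with SQLite's || operator instead of CONCAT, the comments column is omitted because the query does not use it, and the extra crep.mem_id != com.member_id condition is an addition that stops an owner's reply posted at exactly the same timestamp from matching. latest_non_owner_replies is an illustrative name.

```python
import sqlite3

COMPLAINTS = [(1, "x", 1000, "2015-05-05", "02:06:15"),
              (2, "y", 1001, "2015-05-14", "02:08:10")]
# (id, complaint_id, mem_id, posted_date, posted_time)
REPLIES = [(1, 1, 2000, "2015-05-08", "02:07:08"), (2, 1, 2001, "2015-05-06", "06:05:08"),
           (3, 1, 1000, "2015-05-14", "02:12:13"), (4, 2, 1000, "2015-05-27", "10:06:16"),
           (5, 2, 2000, "2015-05-04", "03:09:09"), (6, 2, 1001, "2015-05-31", "06:12:18"),
           (7, 1, 1001, "2015-01-04", "04:08:12"), (8, 2, 1001, "2015-06-13", "06:12:18"),
           (9, 1, 1000, "2015-06-01", "04:08:12"), (10, 1, 2002, "2015-05-26", "10:11:17")]

def latest_non_owner_replies():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE complaints (id INTEGER PRIMARY KEY, title TEXT, "
                 "member_id INTEGER, posted_date TEXT, posted_time TEXT)")
    conn.execute("CREATE TABLE complaints_reply (id INTEGER PRIMARY KEY, "
                 "complaint_id INTEGER, mem_id INTEGER, posted_date TEXT, posted_time TEXT)")
    conn.executemany("INSERT INTO complaints VALUES (?, ?, ?, ?, ?)", COMPLAINTS)
    conn.executemany("INSERT INTO complaints_reply VALUES (?, ?, ?, ?, ?)", REPLIES)
    rows = conn.execute("""
        SELECT com.id AS complaint_id, com.member_id AS parent_mem_id,
               crep.mem_id AS last_replier, crl.last_posted_dt
        FROM complaints AS com
        JOIN complaints_reply AS crep ON crep.complaint_id = com.id
        -- Step 1: latest non-owner reply time per complaint. Zero-padded
        -- 'YYYY-MM-DD HH:MM:SS' strings sort in chronological order.
        JOIN (SELECT cr.complaint_id,
                     MAX(cr.posted_date || ' ' || cr.posted_time) AS last_posted_dt
              FROM complaints_reply AS cr
              JOIN complaints AS c ON cr.complaint_id = c.id
              WHERE cr.mem_id != c.member_id
              GROUP BY cr.complaint_id) AS crl
          ON crl.complaint_id = crep.complaint_id
         -- Step 2: the reply posted at that time identifies the replier.
         AND crl.last_posted_dt = crep.posted_date || ' ' || crep.posted_time
        WHERE crep.mem_id != com.member_id
        ORDER BY com.id
    """).fetchall()
    conn.close()
    return rows

for row in latest_non_owner_replies():
    print(row)
```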

Collaborative filtering in MySQL?

I'm trying to develop a site that recommends items (e.g. books) to users based on their preferences. So far, I've read O'Reilly's "Collective Intelligence" and numerous other online articles. They all, however, seem to deal with single instances of recommendation, for example if you like book A then you might like book B.
What I'm trying to do is to create a set of 'preference-nodes' for each user on my site. Let's say a user likes books A, B and C. Then, when they add book D, I don't want the system to recommend other books based solely on other users' experience with book D. I want the system to look up similar 'preference-nodes' and recommend books based on that.
Here's an example of 4 nodes:
User1: 'book A'->'book B'->'book C'
User2: 'book A'->'book B'->'book C'->'book D'
User3: 'book X'->'book Y'->'book C'->'book Z'
User4: 'book W'->'book Q'->'book C'->'book Z'
So a recommendation system, as described in the material I've read, would recommend book Z to User1, because there are two people who like Z in conjunction with liking C (i.e. Z weighs more than D), even though a user with a similar 'preference-node', User2, would be more qualified to recommend book D because he has a more similar interest pattern.
So do any of you have experience with this sort of thing? Are there things I should read, or do any open-source systems exist for this?
Thanks for your time!
Small edit: I think last.fm's algorithm does exactly what I want my system to do: it uses people's preference-trees to recommend music more personally, instead of just saying "you might like B because you liked A".
Create a table and insert the test data:
CREATE TABLE `ub` (
`user_id` int(11) NOT NULL,
`book_id` varchar(10) NOT NULL,
PRIMARY KEY (`user_id`,`book_id`),
UNIQUE KEY `book_id` (`book_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
insert into ub values (1, 'A'), (1, 'B'), (1, 'C');
insert into ub values (2, 'A'), (2, 'B'), (2, 'C'), (2,'D');
insert into ub values (3, 'X'), (3, 'Y'), (3, 'C'), (3,'Z');
insert into ub values (4, 'W'), (4, 'Q'), (4, 'C'), (4,'Z');
Join the test data onto itself by book_id, and create a temporary table to hold each user_id and the number of books it has in common with the target user_id:
create temporary table ub_rank as
select similar.user_id,count(*) `rank`
from ub target
join ub similar on target.book_id= similar.book_id and target.user_id != similar.user_id
where target.user_id = 1
group by similar.user_id;
select * from ub_rank;
+---------+------+
| user_id | rank |
+---------+------+
| 2 | 3 |
| 3 | 1 |
| 4 | 1 |
+---------+------+
3 rows in set (0.00 sec)
We can see that user_id 2 has 3 books in common with user_id 1, but user_id 3 and user_id 4 have only 1 each.
Next, select all the books that the users in the temporary table have that do not match the target user_id's books, and arrange these by rank. Note that the same book might appear in different users' lists, so we sum the rankings for each book so that common books get a higher ranking.
select similar.book_id, sum(ub_rank.rank) total_rank
from ub_rank
join ub similar on ub_rank.user_id = similar.user_id
left join ub target on target.user_id = 1 and target.book_id = similar.book_id
where target.book_id is null
group by similar.book_id
order by total_rank desc;
+---------+------------+
| book_id | total_rank |
+---------+------------+
| D | 3 |
| Z | 2 |
| X | 1 |
| Y | 1 |
| Q | 1 |
| W | 1 |
+---------+------------+
6 rows in set (0.00 sec)
Book Z appeared in two user lists, and so was ranked above X,Y,Q,W which only appeared in one user's list. Book D did best because it appeared in user_id 2's list, which had 3 items in common with target user_id 1.
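Both steps can be run end to end with Python's sqlite3 module as a stand-in for MySQL (an assumption; recommend and LIKES are illustrative names). Two small deviations: the temporary table is created first and filled with INSERT ... SELECT so the target user can be a bound parameter, and book_id is used as a tie-breaker in ORDER BY so equally ranked books come out in a fixed order. The rank column is quoted because RANK is a reserved word in MySQL 8.0.

```python
import sqlite3

# The answer's test data: each user's liked books, one letter per book.
LIKES = {1: "ABC", 2: "ABCD", 3: "XYCZ", 4: "WQCZ"}

def recommend(target_user):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ub (user_id INTEGER NOT NULL, book_id TEXT NOT NULL, "
                 "PRIMARY KEY (user_id, book_id))")
    conn.executemany("INSERT INTO ub VALUES (?, ?)",
                     [(u, b) for u, books in LIKES.items() for b in books])
    # Step 1: number of books each other user shares with the target.
    conn.execute('CREATE TEMP TABLE ub_rank (user_id INTEGER, "rank" INTEGER)')
    conn.execute(
        "INSERT INTO ub_rank "
        "SELECT similar.user_id, COUNT(*) "
        "FROM ub AS target "
        "JOIN ub AS similar ON target.book_id = similar.book_id "
        " AND target.user_id != similar.user_id "
        "WHERE target.user_id = ? "
        "GROUP BY similar.user_id", (target_user,))
    ranks = dict(conn.execute('SELECT user_id, "rank" FROM ub_rank').fetchall())
    # Step 2: books the target lacks, scored by the summed rank of their owners.
    rows = conn.execute(
        'SELECT similar.book_id, SUM(ub_rank."rank") AS total_rank '
        "FROM ub_rank "
        "JOIN ub AS similar ON ub_rank.user_id = similar.user_id "
        "LEFT JOIN ub AS target "
        "  ON target.user_id = ? AND target.book_id = similar.book_id "
        "WHERE target.book_id IS NULL "
        "GROUP BY similar.book_id "
        "ORDER BY total_rank DESC, similar.book_id", (target_user,)).fetchall()
    conn.close()
    return ranks, rows

ranks, suggestions = recommend(1)
print(ranks)
print(suggestions)
```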