Collaborative filtering in MySQL?

I'm trying to develop a site that recommends items (e.g. books) to users based on their preferences. So far, I've read O'Reilly's "Programming Collective Intelligence" and numerous other online articles. They all, however, seem to deal with single instances of recommendation: for example, if you like book A then you might like book B.
What I'm trying to do is create a set of 'preference-nodes' for each user on my site. Let's say a user likes books A, B and C. Then, when they add book D, I don't want the system to recommend other books based solely on other users' experience with book D. I want the system to look up similar 'preference-nodes' and recommend books based on that.
Here's an example of 4 nodes:
User1: 'book A'->'book B'->'book C'
User2: 'book A'->'book B'->'book C'->'book D'
User3: 'book X'->'book Y'->'book C'->'book Z'
User4: 'book W'->'book Q'->'book C'->'book Z'
So a recommendation system, as described in the material I've read, would recommend book Z to User1, because two people recommend Z in conjunction with liking C (i.e. Z weighs more than D), even though a user with a similar 'preference-node', User2, would be more qualified to recommend book D because he has a more similar interest pattern.
So does anyone have experience with this sort of thing? Is there anything I should read, or are there any open-source systems for this?
Thanks for your time!
Small edit: I think last.fm's algorithm does exactly what I want my system to do: it uses people's preference trees to recommend music more personally, instead of just saying "you might like B because you liked A".
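To make the difference concrete, here is a hypothetical Python sketch (the `likes` dictionary and the `recommend()` function are made up for illustration). It treats each user's preference-node as an unordered set, ignoring the order implied by the arrows, scores every other user by how many books they share with the target, and lets each of them vote for the target's unseen books with that score. Shared-book count is only one possible similarity measure.

```python
from collections import Counter

# The four example users from the question (books each user likes).
likes = {
    "User1": ["A", "B", "C"],
    "User2": ["A", "B", "C", "D"],
    "User3": ["X", "Y", "C", "Z"],
    "User4": ["W", "Q", "C", "Z"],
}


def recommend(target: str, likes: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Return (book, score) pairs, best first, for books `target` hasn't liked."""
    mine = set(likes[target])
    scores = Counter()
    for user, books in likes.items():
        if user == target:
            continue
        overlap = len(mine & set(books))  # similarity = number of shared books
        if overlap == 0:
            continue
        for book in set(books) - mine:
            scores[book] += overlap  # each vote is weighted by similarity
    return scores.most_common()


print(recommend("User1", likes))
```

With the question's four users, D (3 points, all from User2) ranks ahead of Z (2 points, one each from User3 and User4); the remaining books tie at 1.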

Create a table and insert the test data:
CREATE TABLE `ub` (
`user_id` int(11) NOT NULL,
`book_id` varchar(10) NOT NULL,
PRIMARY KEY (`user_id`,`book_id`),
UNIQUE KEY `book_id` (`book_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
insert into ub values (1, 'A'), (1, 'B'), (1, 'C');
insert into ub values (2, 'A'), (2, 'B'), (2, 'C'), (2,'D');
insert into ub values (3, 'X'), (3, 'Y'), (3, 'C'), (3,'Z');
insert into ub values (4, 'W'), (4, 'Q'), (4, 'C'), (4,'Z');
Join the test data onto itself by book_id, and create a temporary table to hold each user_id and the number of books it has in common with the target user_id:
create temporary table ub_rank as
select similar.user_id,count(*) `rank`
from ub target
join ub similar on target.book_id= similar.book_id and target.user_id != similar.user_id
where target.user_id = 1
group by similar.user_id;
select * from ub_rank;
+---------+------+
| user_id | rank |
+---------+------+
| 2 | 3 |
| 3 | 1 |
| 4 | 1 |
+---------+------+
3 rows in set (0.00 sec)
We can see that user_id 2 has 3 books in common with user_id 1, but user_id 3 and user_id 4 only have 1 each.
Next, select all the books that the users in the temporary table have that do not match the target user_id's books, and arrange these by rank. Note that the same book might appear in several users' lists, so we sum the rankings for each book so that common books get a higher ranking.
select similar.book_id, sum(ub_rank.rank) total_rank
from ub_rank
join ub similar on ub_rank.user_id = similar.user_id
left join ub target on target.user_id = 1 and target.book_id = similar.book_id
where target.book_id is null
group by similar.book_id
order by total_rank desc;
+---------+------------+
| book_id | total_rank |
+---------+------------+
| D | 3 |
| Z | 2 |
| X | 1 |
| Y | 1 |
| Q | 1 |
| W | 1 |
+---------+------------+
6 rows in set (0.00 sec)
Book Z appeared in two user lists, and so was ranked above X,Y,Q,W which only appeared in one user's list. Book D did best because it appeared in user_id 2's list, which had 3 items in common with target user_id 1.
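The two steps above can be checked end to end with Python's built-in `sqlite3` module. This is a sketch that assumes SQLite is an acceptable stand-in for MySQL: the temporary table is replaced by a `WITH` clause (MySQL 8.0 supports this too), the per-user score column is called `score` rather than `rank`, and ties are broken alphabetically so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ub (
    user_id INTEGER NOT NULL,
    book_id TEXT NOT NULL,
    PRIMARY KEY (user_id, book_id)
);
INSERT INTO ub VALUES (1,'A'),(1,'B'),(1,'C');
INSERT INTO ub VALUES (2,'A'),(2,'B'),(2,'C'),(2,'D');
INSERT INTO ub VALUES (3,'X'),(3,'Y'),(3,'C'),(3,'Z');
INSERT INTO ub VALUES (4,'W'),(4,'Q'),(4,'C'),(4,'Z');
""")

QUERY = """
WITH ub_rank AS (               -- step 1: books each user shares with the target
    SELECT similar.user_id, COUNT(*) AS score
    FROM ub AS target
    JOIN ub AS similar
      ON target.book_id = similar.book_id
     AND target.user_id != similar.user_id
    WHERE target.user_id = :target
    GROUP BY similar.user_id
)
SELECT similar.book_id, SUM(ub_rank.score) AS total_rank   -- step 2
FROM ub_rank
JOIN ub AS similar ON ub_rank.user_id = similar.user_id
LEFT JOIN ub AS target
       ON target.user_id = :target AND target.book_id = similar.book_id
WHERE target.book_id IS NULL    -- skip books the target already has
GROUP BY similar.book_id
ORDER BY total_rank DESC, similar.book_id
"""

rows = conn.execute(QUERY, {"target": 1}).fetchall()
for book, total in rows:
    print(book, total)
```

The totals match the MySQL output above: D scores 3, Z scores 2, and Q, W, X and Y score 1 each.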

Related

Using the count function on third table in two table select statement in MariaDB

I just spent a few hours reading through the MariaDB docs and various questions here trying to figure out a SQL statement that did what I want. I'm definitely not an expert... eventually I did get the result I expected, but I have no idea why it works. I want to be sure I am actually getting the result I want, and it isn't just working for the few test cases I have thrown at it.
I have three tables guestbook, users, and user_likes. I am trying to write a SQL statement that will return the user name and first name from users, post content, post date, post id from guestbook, and a third column likes which is the total number of times that post id from guestbook appears in the user_likes table. It should only return posts which are of type standard and should order the rows by ascending post date.
Sample data:
CREATE TABLE users
(`user_id` int, `user_first` varchar(6), `user_last` varchar(7),
`user_email` varchar(26), `user_uname` varchar(6))
;
INSERT INTO users
(`user_id`, `user_first`, `user_last`, `user_email`, `user_uname`)
VALUES
(0, 'Bob', 'Abc', 'email#example.com', 'user1'),
(13, 'Larry', 'Abc', 'email#example.com', 'user2'),
(15, 'Noel', 'Abc', 'email#example.com', 'user3'),
(16, 'Kate', 'Abc', 'email#example.com', 'user4'),
(17, 'Walter', 'Sobchak', 'walter.sobchak#shabbus.com', 'Walter'),
(18, 'Jae', 'Abc', 'email#example.com', 'user5')
;
CREATE TABLE user_likes
(`user_id` int, `post_id` int, `like_id` int)
;
INSERT INTO user_likes
(`user_id`, `post_id`, `like_id`)
VALUES
(0, 23, 1),
(0, 41, 2),
(13, 23, 7)
;
CREATE TABLE guestbook
(`post_id` int, `user_id` int, `post_date` datetime,
`post_content` varchar(27), `post_type` varchar(8),
`post_level` int, `post_parent` varchar(4))
;
INSERT INTO guestbook
(`post_id`, `user_id`, `post_date`, `post_content`,
`post_type`, `post_level`, `post_parent`)
VALUES
(2, 0, '2018-12-15 20:32:40', 'test1', 'testing', 0, NULL),
(8, 0, '2018-12-16 14:06:40', 'test2', 'testing', 0, NULL),
(9, 13, '2018-12-16 15:47:55', 'test4', 'testing', 0, NULL),
(23, 0, '2018-12-25 17:59:46', 'Merry Christmas!', 'standard', 0, NULL),
(39, 16, '2018-12-26 00:28:04', 'Hello!', 'standard', 0, NULL),
(40, 15, '2019-01-27 00:46:12', 'Hello 2', 'standard', 0, NULL),
(41, 18, '2019-02-25 00:44:35', 'What are you doing?', 'standard', 0, NULL)
;
I tried a whole bunch of convoluted statements involving count and couldn't get what I wanted. Through what seems like dumb luck I stumbled into creating this statement which appears to be giving me what I want.
SELECT
u.user_uname, u.user_first, g.post_id, g.post_date,
g.post_content, count(user_likes.post_id) AS likes
FROM
users AS u, guestbook AS g
LEFT JOIN
user_likes on g.post_id=user_likes.post_id
WHERE
u.user_id=g.user_id AND g.post_type='standard'
GROUP BY
g.post_id
ORDER BY
g.post_date ASC;
Question:
Why does this count function appear to work?
The count function that I was able to get working is this, but it only works for hard coded post_id values.
SELECT COUNT(CASE post_id WHEN 23 THEN 1 ELSE null END) FROM user_likes;
When I try to match the post_id from guestbook table by changing to this I get an incorrect value which appears to be the whole table of user_likes.
SELECT COUNT(case when guestbook.post_id=user_likes.post_id then 1 else null end) FROM guestbook, user_likes;
Adding a GROUP BY guestbook.post_id to the end gets me closer, but now I need to figure out how to combine that with my original select statement.
+----------------------------------------------------------------------------+
| COUNT(case when guestbook.post_id=user_likes.post_id then 1 else null end) |
+----------------------------------------------------------------------------+
| 0 |
| 0 |
| 0 |
| 2 |
| 0 |
| 0 |
| 1 |
+----------------------------------------------------------------------------+
This is the output I want, which I am getting. I just don't trust that my statement is reliable or correct.
+------------+------------+---------+---------------------+---------------------+-------+
| user_uname | user_first | post_id | post_date | post_content | likes |
+------------+------------+---------+---------------------+---------------------+-------+
| user1 | Bob | 23 | 2018-12-25 17:59:46 | Merry Christmas! | 2 |
| user4 | Kate | 39 | 2018-12-26 00:28:04 | Hello! | 0 |
| user3 | Noel | 40 | 2019-01-27 00:46:12 | Hello 2 | 0 |
| user5 | Jae | 41 | 2019-02-25 00:44:35 | What are you doing? | 1 |
+------------+------------+---------+---------------------+---------------------+-------+
Fiddle of statement working: http://sqlfiddle.com/#!9/968656/1/0
JOIN + COUNT -- A query first combines the tables as directed by the JOIN and ON clauses. The result is put (at least logically) into a temporary table. Often this temp table has many more rows than any of the tables being JOINed.
Then the COUNT(..) is performed. It is counting the number of rows in that temp table. Maybe that count is exactly what you want, maybe it is a hugely inflated number.
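count(user_likes.post_id) has the additional hiccup of not counting any rows where user_likes.post_id IS NULL. That is often irrelevant, in which case you should simply say COUNT(*). In this query, however, it is exactly why the result looks right: the LEFT JOIN gives a post with no likes one row whose user_likes.post_id is NULL, so COUNT(user_likes.post_id) returns 0 for it, whereas COUNT(*) would return 1.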
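A small sketch of the difference between COUNT(user_likes.post_id) and COUNT(*) after a LEFT JOIN, using Python's `sqlite3` as a stand-in for MariaDB. For brevity the tables are trimmed to the columns the query needs (an assumption; the rows come from the question's sample data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE guestbook (post_id INTEGER, post_type TEXT);
CREATE TABLE user_likes (user_id INTEGER, post_id INTEGER, like_id INTEGER);
INSERT INTO guestbook VALUES
  (23,'standard'),(39,'standard'),(40,'standard'),(41,'standard');
INSERT INTO user_likes VALUES (0,23,1),(0,41,2),(13,23,7);
""")

rows = conn.execute("""
SELECT g.post_id,
       COUNT(ul.post_id) AS likes_col,   -- skips the NULL row the LEFT JOIN adds
       COUNT(*)          AS likes_star   -- counts every joined row
FROM guestbook AS g
LEFT JOIN user_likes AS ul ON ul.post_id = g.post_id
WHERE g.post_type = 'standard'
GROUP BY g.post_id
ORDER BY g.post_id
""").fetchall()

for post_id, likes_col, likes_star in rows:
    print(post_id, likes_col, likes_star)
```

Posts 39 and 40 have no likes: the column form reports 0, while COUNT(*) reports 1 for the single NULL-padded row.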
Please don't use the commalist form for joining. Always use FROM a JOIN b ON ... where the ON clause says how tables a and b are related. If there is also some filtering, put that into the WHERE clause.
If the COUNT is too big, put aside the query you have developed and start over to develop a query that does exactly one thing -- compute the COUNT. This query will probably use fewer tables.
Then build on that to get any other data you need. It may look something like
SELECT ...
FROM ( SELECT foo, COUNT(*) AS ct FROM t1 GROUP BY foo ) AS sub1
JOIN t2 ON t2.foo = sub1.foo
JOIN t3 ON ...
WHERE ...
Get that initial query that gets the right COUNT. Then, if needed, come back for more help.
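Applied to the question's data, the "count first, then join" approach might look like the following sketch. It runs in Python's `sqlite3` as a stand-in for MariaDB, with tables trimmed to the columns used. The derived table `sub` computes only the like counts; the LEFT JOIN plus `COALESCE` (a choice made here, not part of the skeleton above) keeps posts with no likes at 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER, user_first TEXT, user_uname TEXT);
INSERT INTO users VALUES
  (0,'Bob','user1'),(13,'Larry','user2'),(15,'Noel','user3'),
  (16,'Kate','user4'),(17,'Walter','Walter'),(18,'Jae','user5');
CREATE TABLE user_likes (user_id INTEGER, post_id INTEGER, like_id INTEGER);
INSERT INTO user_likes VALUES (0,23,1),(0,41,2),(13,23,7);
CREATE TABLE guestbook (post_id INTEGER, user_id INTEGER, post_date TEXT,
                        post_content TEXT, post_type TEXT);
INSERT INTO guestbook VALUES
  (2,0,'2018-12-15 20:32:40','test1','testing'),
  (8,0,'2018-12-16 14:06:40','test2','testing'),
  (9,13,'2018-12-16 15:47:55','test4','testing'),
  (23,0,'2018-12-25 17:59:46','Merry Christmas!','standard'),
  (39,16,'2018-12-26 00:28:04','Hello!','standard'),
  (40,15,'2019-01-27 00:46:12','Hello 2','standard'),
  (41,18,'2019-02-25 00:44:35','What are you doing?','standard');
""")

rows = conn.execute("""
SELECT u.user_uname, u.user_first, g.post_id, g.post_date, g.post_content,
       COALESCE(sub.ct, 0) AS likes              -- posts with no likes get 0
FROM guestbook AS g
JOIN users AS u ON u.user_id = g.user_id
LEFT JOIN (
    SELECT post_id, COUNT(*) AS ct                -- step 1: just the counts
    FROM user_likes
    GROUP BY post_id
) AS sub ON sub.post_id = g.post_id
WHERE g.post_type = 'standard'
ORDER BY g.post_date ASC
""").fetchall()

for row in rows:
    print(row)
```

The four rows match the desired output in the question, ordered by ascending post date.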
As tried by Bryan
OK, I made a few changes.
SELECT u.user_uname, u.user_first,
g2.post_id, g2.post_content, g2.post_date,
sub.likes
FROM
(
SELECT g.post_id,
SUM(g.post_id = ul.post_id) AS likes
FROM guestbook AS g
JOIN user_likes AS ul
WHERE g.post_type = 'standard'
GROUP BY g.post_id
) AS sub
JOIN guestbook AS g2 ON sub.post_id = g2.post_id
JOIN users AS u ON u.user_id = g2.user_id;
Indexes:
guestbook: (post_type, post_id) -- for derived table
guestbook: (post_id) -- for outer SELECT
users: (user_id)
user_likes: (post_id)
Notes:
ORDER BY removed since it was useless in context.
COUNT..CASE changed to shorter SUM.
JOIN ON used
Since there is only one value coming from the derived table, this might work equally well:
SELECT u.user_uname, u.user_first,
g.post_id, g.post_content, g.post_date,
( SELECT COUNT(*)
FROM user_likes AS ul
WHERE g.post_id = ul.post_id
) AS likes
FROM guestbook AS g
JOIN users AS u USING(user_id)
WHERE g.post_type = 'standard';
This involved lots of changes; see if it looks 'right'. It is now a lot simpler.
Indexes are same as above.

I have to filter records by retrieving from SQL table. I want the values in the same field in my table column in sql to not show again

I am not able to do this, please help. In the data below there are two values in Speciality for program ID 1. Is there a way to filter so that a value is not shown again in the filtered results (i.e. Free lunch here)? While filtering, I get checkboxes like the following when I retrieve from the database:
a Free meal, Free lunch
b Free lunch
c Free Dinner
I want a to only show Free meal
INSERT INTO `programs` (`ProgramID`, `UserID`,`Speciality`) VALUES
(1, 'huy45', 'Free meal, Free lunch'),
(2, 'ga32','Free lunch'),
(3, 'sharvar3','Free Dinner');
There is repeated information in your table, and you don't want that (DRY: Don't Repeat Yourself).
I would use another table to store the speciality, such as :
Speciality
id | name
----+-------------
1 | Free meal
2 | Free lunch
3 | Free dinner
Then you can easily use a foreign key to reference this information from your programs table.
Next, you don't want to store serialized information (several values packed into one column). This goes against the purpose of using an RDBMS.
I would structure the table programs like this :
ProgramID | UserID | SpecialityID
-----------+------------+--------------
1 | 'huy45' | 1
1 | 'huy45' | 2
2 | 'ga32' | 2
3 | 'sharvar3' | 3
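If existing rows are already stored in the comma-separated form, a one-off conversion can produce the rows shown above. A hypothetical Python sketch (the variable names are made up; matching names case-insensitively, so that "Free Dinner" and "Free dinner" count as one speciality, is an assumption):

```python
# Rows as they are currently stored, with comma-separated specialities.
old_programs = [
    (1, "huy45", "Free meal, Free lunch"),
    (2, "ga32", "Free lunch"),
    (3, "sharvar3", "Free Dinner"),
]

speciality_ids: dict[str, int] = {}              # lower-cased name -> id
speciality_rows: list[tuple[int, str]] = []       # rows for the Speciality table
program_rows: list[tuple[int, str, int]] = []     # rows for the new programs table

for program_id, user_id, specialities in old_programs:
    for raw in specialities.split(","):
        name = raw.strip()          # drop the space after each comma
        key = name.lower()          # case-insensitive match (assumption)
        if key not in speciality_ids:
            speciality_ids[key] = len(speciality_ids) + 1
            speciality_rows.append((speciality_ids[key], name))
        program_rows.append((program_id, user_id, speciality_ids[key]))

print(speciality_rows)
print(program_rows)
```

Each speciality keeps the spelling it first appeared with, so the third one comes out as "Free Dinner"; adjust it if you want the capitalization used in the table above.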
To retrieve the ProgramID, UserID and Speciality for the Speciality name 'Free lunch', you can then use this query:
SELECT p.`ProgramID`,
p.`UserID`,
s.`name` AS "Speciality Name"
FROM `programs` p
INNER JOIN `Speciality` s
ON p.SpecialityID = s.id
WHERE s.`name` = 'Free lunch';
Schema (MySQL v5.7)
CREATE TABLE Speciality (
`id` INTEGER,
`name` VARCHAR(11)
);
INSERT INTO Speciality
(`id`, `name`)
VALUES
(1, 'Free meal'),
(2, 'Free lunch'),
(3, 'Free dinner');
CREATE TABLE programs (
`ProgramID` INTEGER,
`UserID` VARCHAR(10),
`SpecialityID` INTEGER
);
INSERT INTO programs
(`ProgramID`, `UserID`, `SpecialityID`)
VALUES
(1, 'huy45', 1),
(1, 'huy45', 2),
(2, 'ga32', 2),
(3, 'sharvar3', 3);
Query #1
SELECT p.`ProgramID`,
p.`UserID`,
s.`name` AS "Speciality Name"
FROM `programs` p
INNER JOIN `Speciality` s
ON p.SpecialityID = s.id
WHERE s.`name` = 'Free lunch';
| ProgramID | UserID | Speciality Name |
| --------- | ------ | --------------- |
| 1 | huy45 | Free lunch |
| 2 | ga32 | Free lunch |

INNER JOIN not working for me

I'm currently working on a database for my Magic: The Gathering playgroup which keeps track of decks and, more specifically, which decks win against how many others and so on.
The table "Wins" looks like the following:
PNr (Playernumber which is primary key in the table players)
DNr (Decknumber which is primary key in the table decks)
Date (combined primary key with MNr)
MNr (Matchnumber of the day)
Pl (Amount of Players in the game)
Loc (Location)
Code (the shortcuts of all the players in the game, e.g. AMT for the players Alex, Martin and Tobias; see below)
The table Players is pretty easy:
PNr
Pname (player's name)
SC (player's shortcut)
Now I want to make a query that provides a table of the expected win rate (1/4 in a 4-player game, 1/5 in a 5-player game, etc.) and the actual number of wins for each player (and later on the expected and actual win rate, but I think I can work that out on my own once I get this baby to work).
So far I've come up with something like this:
SELECT a.'Player',a.'ExpectedWinrate',b.'Wins'
FROM(
SELECT
ROUND(((SUM(1/Pl))/Count(*))*100, 1) as 'ExpectedWinrate',
Players.Pname as 'Player'
FROM
Wins, Players
WHERE Code LIKE CONCAT('%', Players.SC, '%')
GROUP BY Players.Pname) a
INNER JOIN
(SELECT
Count(*) as 'Wins',
Players.Pname as 'Players'
FROM Players, Wins
WHERE Players.PNr = Wins.PNr
GROUP BY Players.Pname
ORDER BY Count(*) desc) b ON 'Players' = 'Player';
The problem that I've run into is that I need the Count(*) for two different things in one query so I had to make two independent ones and join them, but I don't know how to "name" them (in this case I tried with "a" and "b") in order to use expressions like a.'Player', a.'ExpectedWinrate', etc.
Can anyone help a MYSQL newb?^^
greetzSP
EDIT: added example tables...
CREATE TABLE Players
(
PNr int primary key,
Pname varchar(20),
SC varchar(1)
);
INSERT INTO Players
(PNr, Pname, SC)
VALUES
(1, 'Tobias', 'T'),
(2, 'Alex', 'A'),
(3, 'Martin', 'M'),
(4, 'Maria', 'R');
CREATE TABLE Wins
(
PNr int,
DNr int,
Pl int,
Code varchar(10)
);
INSERT INTO Wins
(PNr, DNr, Pl, Code)
VALUES
(1, 13, 3, 'ATM'),
(4, 1, 4, 'RTMA'),
(3, 20, 3, 'RTM');
Wins: (leaving out columns that don't matter in this query)
| PNR | DNR | PL | CODE |
|-----|-----|----|------|
| 1 | 13 | 3 | ATM |
| 4 | 1 | 4 | RTMA |
| 3 | 20 | 3 | RTM |
Players:
| PNR | PNAME | SC |
|-----|--------|----|
| 1 | Tobias | T |
| 2 | Alex | A |
| 3 | Martin | M |
| 4 | Maria | R |
SELECT a.Player ,a.ExpectedWinrate ,b.Wins
FROM(
SELECT
ROUND(((SUM(1/w.Pl))/Count(*))*100, 1) as 'ExpectedWinrate'
,p.Pname as 'Player'
FROM Wins w inner join Players p
on w.Code LIKE CONCAT('%', p.SC, '%')
GROUP BY p.Pname
) a
inner join
(
SELECT
Count(*) as 'Wins'
,p.Pname as 'Players'
FROM Players p inner join Wins w
on p.PNr = w.PNr
GROUP BY p.Pname
-- ORDER BY Count(*) desc
) b
ON a.Player = b.Players
I have tested it on SQL Server; try it on MySQL.
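For a quick check without a MySQL server, here is the same query run through Python's `sqlite3`. Three adaptations are assumptions made for SQLite: `||` replaces `CONCAT`, `1.0 / w.Pl` forces non-integer division (SQLite truncates `1/3` to 0, while MySQL's `/` does not), and the aliases are unquoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Players (PNr INTEGER PRIMARY KEY, Pname TEXT, SC TEXT);
INSERT INTO Players VALUES
  (1,'Tobias','T'),(2,'Alex','A'),(3,'Martin','M'),(4,'Maria','R');
CREATE TABLE Wins (PNr INTEGER, DNr INTEGER, Pl INTEGER, Code TEXT);
INSERT INTO Wins VALUES (1,13,3,'ATM'),(4,1,4,'RTMA'),(3,20,3,'RTM');
""")

rows = conn.execute("""
SELECT a.Player, a.ExpectedWinrate, b.Wins
FROM (
    -- expected win rate: average of 1/Pl over every game the player was in
    SELECT p.Pname AS Player,
           ROUND(SUM(1.0 / w.Pl) / COUNT(*) * 100, 1) AS ExpectedWinrate
    FROM Wins AS w
    JOIN Players AS p ON w.Code LIKE '%' || p.SC || '%'
    GROUP BY p.Pname
) AS a
JOIN (
    -- actual wins: rows in Wins whose PNr is this player
    SELECT p.Pname AS Players, COUNT(*) AS Wins
    FROM Players AS p
    JOIN Wins AS w ON p.PNr = w.PNr
    GROUP BY p.Pname
) AS b ON a.Player = b.Players
ORDER BY a.Player
""").fetchall()

for row in rows:
    print(row)
```

Alex does not appear in the result: he has no wins, so subquery b has no row for him and the INNER JOIN drops him. Joining a to b with a LEFT JOIN and selecting COALESCE(b.Wins, 0) would keep him with 0 wins.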

Columns with multiple values

I have one table called Employee that contains information like the following:
ID Name Skills
1 xyz java,php,dotnet
2 abc ruby,java,python
The Skills column stores comma-separated values; there can be one or more.
I want to design a query based on an OR operation. When a user searches for java, the database should display both employees, xyz and abc.
I have tried this query but no result comes out:
SELECT m
FROM Employee m
Where m.Skills LIKE '%JAVA% MS PAINT%'
Any Suggestion?
Ideally you should not store the data in a comma-separated list. You should create a join table between the employees and the skills:
CREATE TABLE employees (`e_id` int, `e_name` varchar(3));
INSERT INTO employees (`e_id`, `e_name`)
VALUES
(1, 'xyz'),
(2, 'abc');
CREATE TABLE skills (`s_id` int, `s_name` varchar(6));
INSERT INTO skills (`s_id`, `s_name`)
VALUES
(1, 'java'),
(2, 'php'),
(3, 'dotnet'),
(4, 'ruby'),
(5, 'python');
CREATE TABLE employees_skills (`e_id` int, `s_id` int);
INSERT INTO employees_skills
(`e_id`, `s_id`)
VALUES
(1, 1),
(1, 2),
(1, 3),
(2, 4),
(2, 1),
(2, 5);
Then when you want to select from the tables you will use:
select *
from employees e
inner join employees_skills es
on e.e_id = es.e_id
inner join skills s
on es.s_id = s.s_id
where s.s_name in ('java', 'ruby')
Or you can use the OR clause:
select *
from employees e
inner join employees_skills es
on e.e_id = es.e_id
inner join skills s
on es.s_id = s.s_id
where s.s_name = 'java'
or s.s_name = 'ruby'
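A runnable sketch of the join-table approach using Python's `sqlite3` as a stand-in for MySQL. The helper `employees_with_any()` is hypothetical. It builds the `IN (...)` list from bound parameters and uses `DISTINCT`, because an employee who has several of the requested skills would otherwise appear once per matching skill.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (e_id INTEGER, e_name TEXT);
INSERT INTO employees VALUES (1,'xyz'),(2,'abc');
CREATE TABLE skills (s_id INTEGER, s_name TEXT);
INSERT INTO skills VALUES
  (1,'java'),(2,'php'),(3,'dotnet'),(4,'ruby'),(5,'python');
CREATE TABLE employees_skills (e_id INTEGER, s_id INTEGER);
INSERT INTO employees_skills VALUES (1,1),(1,2),(1,3),(2,4),(2,1),(2,5);
""")


def employees_with_any(skill_names: list[str]) -> list[str]:
    """Names of employees having at least one of the given skills (OR search)."""
    placeholders = ", ".join("?" for _ in skill_names)  # one ? per skill
    sql = f"""
        SELECT DISTINCT e.e_name
        FROM employees AS e
        JOIN employees_skills AS es ON e.e_id = es.e_id
        JOIN skills AS s ON es.s_id = s.s_id
        WHERE s.s_name IN ({placeholders})
        ORDER BY e.e_name
    """
    return [name for (name,) in conn.execute(sql, skill_names)]


print(employees_with_any(["java"]))
print(employees_with_any(["java", "ruby"]))
```

Searching for java returns both xyz and abc, which is what the question asks for; adding ruby to the list does not duplicate abc.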
Using LIKE is not a good solution: it causes a full table scan and a slow query.
Create a new table with a catalog of skills.
Create a user_skills table that links users to skills.
You should set up your tables like this:
Employee:
ID | Name
---+------
1 | xyz
2 | abc
Skill:
ID | Name
---+------
1 | java
2 | php
3 | dotnet
4 | ruby
5 | python
EmployeeSkills:
ID | EmployeeID | SkillID
---+------------+----------
1 | 1 | 1
2 | 1 | 2
3 | 1 | 3
4 | 2 | 4
5 | 2 | 1
6 | 2 | 5
The query to find employees with the java skill would look like this:
SELECT
E.Name
FROM
Employee AS E
INNER JOIN
EmployeeSkills AS ES
ON
ES.EmployeeID = E.ID
INNER JOIN
Skill AS S
ON
ES.SkillID = S.ID
WHERE
S.Name = 'java'
SELECT Name FROM Employee WHERE Skills LIKE '%java%' should do (note that this also matches skills such as javascript).

Check for match in other column

I am trying to write an SQL query that will provide these results:
| Category Title | Subcategory Of |
-----------------------------------
| Category 1 | |
| Category 2 | |
| Category 3 | |
| Category 4 | |
| Category 5 | |
| Category 6 | Category 4 |
| Category 7 | Category 5 |
This is what my database looks like:
CREATE TABLE `categories` (
`category_id` int(4) NOT NULL AUTO_INCREMENT,
`subcategory_id` int(4) NOT NULL,
`category_title` longtext COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`category_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `categories` (`category_id`, `subcategory_id`, `category_title`) VALUES
(1, 0, 'Category 1'),
(2, 0, 'Category 2'),
(3, 0, 'Category 3'),
(4, 0, 'Category 4'),
(5, 0, 'Category 5'),
(6, 4, 'Category 6'),
(7, 5, 'Category 7');
I thought that you would use JOIN, but I wasn't able to mentally think of what kind of query to run, since as far as I knew JOIN was for joining two tables, not two columns. I'm new to these advanced queries (I'm good with INSERT, UPDATE, DELETE, etc. though). Any help is appreciated.
This is what I was trying, which makes no sense really.
SELECT * FROM categories RIGHT JOIN categories ON subcategory_id = category_id
It's called a self-join. You include the table name twice in the query, giving it two different aliases, and then it's just like a normal join:
SELECT
C1.category_title AS category_title,
C2.category_title AS subcategory_of
FROM categories C1
LEFT JOIN categories C2
ON C1.subcategory_id = C2.category_id
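The self-join can be tried against the question's data with Python's `sqlite3` as a stand-in for MySQL (`INTEGER PRIMARY KEY` replaces `AUTO_INCREMENT`, and the MyISAM and collation options are dropped). Top-level categories have `subcategory_id = 0`, which matches no row, so the LEFT JOIN returns NULL (`None` in Python) for them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (
    category_id INTEGER PRIMARY KEY,
    subcategory_id INTEGER NOT NULL,
    category_title TEXT NOT NULL
);
INSERT INTO categories VALUES
  (1,0,'Category 1'),(2,0,'Category 2'),(3,0,'Category 3'),
  (4,0,'Category 4'),(5,0,'Category 5'),(6,4,'Category 6'),(7,5,'Category 7');
""")

rows = conn.execute("""
SELECT C1.category_title AS category_title,
       C2.category_title AS subcategory_of
FROM categories AS C1                      -- the category itself
LEFT JOIN categories AS C2                 -- the same table again, as its parent
       ON C1.subcategory_id = C2.category_id
ORDER BY C1.category_id
""").fetchall()

for title, parent in rows:
    print(f"{title:<12} | {parent or ''}")
```

Printing `parent or ''` turns those NULLs into the blank cells of the desired output.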
as far as I knew JOIN was for joining two tables, not two columns
A better way to think about JOIN is that it defines the relationship in your query between columns.
There is no restriction that the columns being joined be in different tables. The only issue is how to refer to them, which you do using aliases, as described by a previous answer. Even when joining different tables the query is, usually, easier to read if you use aliases for the table names.
Aliases are also useful when you need to join two (or more) tables with identical column names.