Entry with most matching relations - mysql

I'm trying to create a little "recommended" feature based on the posts with the most matching tags.
I have a layout like this:
Posts
id
---
1
2
3
4
post_tags
post_id | tag_id
---------+---------
1 | 1
1 | 2
2 | 2
2 | 3
2 | 4
3 | 1
3 | 2
3 | 4
4 | 5
tags
id
----
1
2
3
4
5
So if I retrieve recommendations for the post with id 1, the list should be:
3 (2/2 matches)
2 (1/2 matches)
4 (0/2 matches)
My Query so far looks like this:
SELECT DISTINCT
p.id,
p.title,
count(*) as cnt
FROM
posts p
INNER JOIN posts_tags pt ON pt.post_id= p.id
INNER JOIN tags t ON pt.tag_id = t.id
WHERE
t.id IN (
SELECT
pt.tag_id
FROM
posts_tags pt
WHERE
pt.post_id = '30213'
)
GROUP BY
t.name
order by count(*) desc
LIMIT 0, 4
I know DISTINCT isn't working because of the count, but I wanted to see just what it counted, so the result looks like this:
4 Foo 4881
4 Foo 2560
11 Bar 2094
12 Baz 1998
So what happened? It counted the occurrences of each tag in general. So apparently the first tag associated with "Post 1" has 4881 associations, and the query then pulls the first entry that matches... the one with the lowest id.
I see the problem but I can't solve it.

Your group by makes no sense. You want to aggregate by the post not the tag:
SELECT p.id, p.title, count(*) as cnt
FROM posts p INNER JOIN
posts_tags pt
ON pt.post_id = p.id
WHERE pt.tag_id IN (SELECT pt2.tag_id
FROM posts_tags pt2
WHERE pt2.post_id = 30213
)
GROUP BY p.id, p.title
ORDER BY count(*) desc
LIMIT 0, 4;
This will not return posts with 0 matching tags. If that is important, you need to use a LEFT JOIN instead of WHERE ... IN (...).
Also:
SELECT DISTINCT is almost never used with GROUP BY. It is hard (but not impossible) to come up with a use-case for it.
You don't need the tags table, so I removed it.
Don't use single quotes around numbers. I am guessing that post_id is really a number.
The fix is in the GROUP BY.
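The LEFT JOIN variant mentioned above can be sketched against the question's sample data. This is a minimal sketch, not the answerer's exact query: it uses Python's built-in sqlite3 module as a stand-in for MySQL, creates posts without a title column, excludes the reference post itself from its own recommendations, and assumes each (post_id, tag_id) pair appears only once. The function name recommend is made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE posts_tags (post_id INTEGER, tag_id INTEGER);
    INSERT INTO posts VALUES (1), (2), (3), (4);
    INSERT INTO posts_tags VALUES
        (1, 1), (1, 2), (2, 2), (2, 3), (2, 4), (3, 1), (3, 2), (3, 4), (4, 5);
""")

def recommend(post_id):
    """Rank all other posts by the number of tags shared with post_id."""
    # Both joins are LEFT JOINs, so posts without any shared tag survive.
    # COUNT(ref.tag_id) ignores NULLs, so it counts only the matched tags.
    sql = """
        SELECT p.id, COUNT(ref.tag_id) AS cnt
        FROM posts p
        LEFT JOIN posts_tags pt ON pt.post_id = p.id
        LEFT JOIN posts_tags ref ON ref.tag_id = pt.tag_id AND ref.post_id = ?
        WHERE p.id <> ?
        GROUP BY p.id
        ORDER BY cnt DESC, p.id
    """
    return conn.execute(sql, (post_id, post_id)).fetchall()

recommendations = recommend(1)
print(recommendations)
```

For post 1 this yields post 3 with 2 matches, post 2 with 1 match and post 4 with 0 matches, which is the order the question asks for.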

Related

SQL select posts without specific tag

Straight to the point.
I have three tables POSTS, TAGS, POST_TAGS
POSTS { p_id, title }
TAGS { t_id, name }
POST_TAGS { p_id, t_id }
One post can have multiple tags, and I want to select all posts that don't have a specific tag. For example, take this demo data:
POSTS
p_id | title
1 MyPost 1
2 MyPost 2
3 MyPost 3
TAGS
t_id | name
1 red
2 green
POST_TAGS
p_id | t_id
1 1
2 1
2 2
3 2
Now I want to see all POSTS that do not have the TAG 'green'. My current SQL query looks like this:
SELECT DISTINCT
p.p_id, p.title
FROM
POSTS as p,
POST_TAGS as pt
WHERE
pt.p_id = p.p_id AND pt.t_id != 2
but this returns the following:
RESULT
p_id | title
1 MyPost 1
2 MyPost 2
'MyPost 2' is listed because it also has the TAG 'red'.
Desired result is:
RESULT
p_id | title
1 MyPost 1
EDIT:
Thanks to all of you. I accepted GarethD's answer because NOT EXISTS is more self-explanatory. NOT IN works but is not NULL-safe (even though I wasn't asking for that; thanks to Nico Haase as well).
GermanC's solution is also correct and works, but isn't as self-explanatory as the selected answer. Thanks to you too.
You can do this using NOT EXISTS:
SELECT p.p_id, p.title
FROM POSTS AS p
WHERE NOT EXISTS
( SELECT 1
FROM POST_TAGS AS pt
WHERE pt.p_id = p.p_id
AND pt.t_id = 2
);
You can explicitly join looking for the green tag, and show those posts where the join was not successful:
SELECT
p.p_id, p.title
FROM
POSTS as p
LEFT OUTER JOIN
POST_TAGS as pt on pt.p_id = p.p_id AND pt.t_id = 2
WHERE
pt.p_id is null
This will do the job, as it searches for all posts which are tagged with 2 in an inner query and excludes them in the outer one:
SELECT p.p_id, p.title
FROM POSTS AS p
WHERE p.p_id NOT IN (
SELECT DISTINCT
p2.p_id
FROM
POSTS AS p2,
POST_TAGS AS pt
WHERE
pt.p_id = p2.p_id AND pt.t_id = 2
)
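The NULL-safety remark in the question's edit can be illustrated with a small sketch. It loads the demo data into an in-memory SQLite database (a stand-in for MySQL, which follows the same three-valued logic here) and compares NOT EXISTS with a NOT IN that reads directly from POST_TAGS. The row with a NULL p_id is a hypothetical bad record added only to show the difference.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE POSTS (p_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE POST_TAGS (p_id INTEGER, t_id INTEGER);
    INSERT INTO POSTS VALUES (1, 'MyPost 1'), (2, 'MyPost 2'), (3, 'MyPost 3');
    INSERT INTO POST_TAGS VALUES (1, 1), (2, 1), (2, 2), (3, 2);
""")

NOT_EXISTS = """
    SELECT p.p_id, p.title FROM POSTS AS p
    WHERE NOT EXISTS (SELECT 1 FROM POST_TAGS AS pt
                      WHERE pt.p_id = p.p_id AND pt.t_id = 2)
    ORDER BY p.p_id
"""
NOT_IN = """
    SELECT p.p_id, p.title FROM POSTS AS p
    WHERE p.p_id NOT IN (SELECT pt.p_id FROM POST_TAGS AS pt WHERE pt.t_id = 2)
    ORDER BY p.p_id
"""

# With clean data both queries agree.
before = (conn.execute(NOT_EXISTS).fetchall(), conn.execute(NOT_IN).fetchall())

# Hypothetical bad row: a 'green' link whose p_id is NULL.
conn.execute("INSERT INTO POST_TAGS VALUES (NULL, 2)")

# "1 NOT IN (2, 3, NULL)" evaluates to NULL, so NOT IN now filters out every post.
after = (conn.execute(NOT_EXISTS).fetchall(), conn.execute(NOT_IN).fetchall())

print(before)
print(after)
```

Before the NULL row is inserted, both queries return only 'MyPost 1'. Afterwards NOT EXISTS still returns it, while NOT IN returns no rows at all.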

MySQL - JOIN multiple rows to single row multiple times

I am trying to join multiple rows of information into a single row, but the results seem to multiply every time there are more rows in one of the joins.
My tables structure is as follows:
news
id | title | public
------------------------
1 | Test | 0
news_groups_map
id | news_id | members_group_id
------------------------------------
1 | 1 | 5
2 | 2 | 6
members_groups_map
id | member_id | group_id
------------------------------
1 | 750 | 5
2 | 750 | 6
The query I've got so far is:
SELECT
n.title,
n.public,
CAST(GROUP_CONCAT(ngm.members_group_id) AS CHAR(1000)) AS news_groups,
CAST(GROUP_CONCAT(member_groups.group_id) AS CHAR(1000)) AS user_groups
FROM news n
LEFT JOIN news_groups_map ngm ON n.id = ngm.news_id
JOIN (
SELECT group_id
FROM members_groups_map
WHERE member_id = 750
) member_groups
WHERE n.public = 0
GROUP BY n.id
However, the result is as follows:
title | public | news_groups | user_groups
-------------------------------------------------
Test | 0 | 5,6,5,6 | 6,6,5,5
As you can see, news_groups and user_groups contain duplicates, so if a news article is in 3 groups, the user_groups values are multiplied as well and show something like 5,6,6,6,5,5.
How can I group those groups, so that they are only displayed once?
The ultimate goal here is to compare news_groups and user_groups. If at least one group matches (meaning the user has sufficient permissions), a boolean true should be returned, and false otherwise. I don't know how to do that either, but I thought I should sort out the grouping first, because as the number of groups grows, a lot of duplicate data would be selected unnecessarily.
Thanks!
The simplest method is to use distinct:
SELECT n.title, n.public,
GROUP_CONCAT(DISTINCT ngm.members_group_id) AS news_groups,
GROUP_CONCAT(DISTINCT mg.group_id) AS user_groups
FROM news n LEFT JOIN
news_groups_map ngm
ON n.id = ngm.news_id CROSS JOIN
(SELECT group_id
FROM members_groups_map
WHERE member_id = 750
) mg
WHERE n.public = 0
GROUP BY n.id;
This query doesn't actually make sense. First, the subquery is not needed:
SELECT n.title, n.public,
GROUP_CONCAT(DISTINCT ngm.members_group_id) AS news_groups,
GROUP_CONCAT(DISTINCT mg.group_id) AS user_groups
FROM news n LEFT JOIN
news_groups_map ngm
ON n.id = ngm.news_id CROSS JOIN
members_groups_map mg
ON mg.member_id = 750
WHERE n.public = 0
GROUP BY n.id;
Second, the CROSS JOIN (or equivalently, JOIN without an ON clause) doesn't make sense. Normally, I would expect a join condition to one of the other tables.
Use DISTINCT in the GROUP_CONCAT
...
CAST(GROUP_CONCAT(DISTINCT ngm.members_group_id) AS CHAR(1000)) AS news_groups,
CAST(GROUP_CONCAT(DISTINCT member_groups.group_id) AS CHAR(1000)) AS user_groups
...
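The question's ultimate goal, a true/false flag telling whether the member shares at least one group with a news item, does not need the concatenated lists at all. A minimal sketch using EXISTS follows. It is not taken from any of the answers; it runs on an in-memory SQLite database as a stand-in for MySQL, and the names visible_flags and has_access, as well as member 751, are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT, public INTEGER);
    CREATE TABLE news_groups_map (
        id INTEGER PRIMARY KEY, news_id INTEGER, members_group_id INTEGER);
    CREATE TABLE members_groups_map (
        id INTEGER PRIMARY KEY, member_id INTEGER, group_id INTEGER);
    INSERT INTO news VALUES (1, 'Test', 0);
    INSERT INTO news_groups_map VALUES (1, 1, 5), (2, 2, 6);
    INSERT INTO members_groups_map VALUES (1, 750, 5), (2, 750, 6);
""")

def visible_flags(member_id):
    """Return (title, has_access) for every non-public news item."""
    # EXISTS yields 1 when the member belongs to at least one of the
    # groups mapped to the news item, and 0 otherwise.
    sql = """
        SELECT n.title,
               EXISTS (SELECT 1
                       FROM news_groups_map ngm
                       JOIN members_groups_map mgm
                         ON mgm.group_id = ngm.members_group_id
                       WHERE ngm.news_id = n.id
                         AND mgm.member_id = ?) AS has_access
        FROM news n
        WHERE n.public = 0
    """
    return conn.execute(sql, (member_id,)).fetchall()

print(visible_flags(750))
print(visible_flags(751))  # 751 is a hypothetical member with no groups
```

Because the subquery is correlated on the news id, each news item produces exactly one row, so the multiplication seen with the joins cannot occur.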

MYSQL Optimising JOINED UNION SELECT query

I've rewritten this query a few times now (it's Monday) in an attempt to find the most efficient way of getting the data I require; however, I'm not sure I'm even approaching it correctly at the moment.
To summarise the problem:
Users have two sets of tags (key_terms, project_terms), and each set has its own link table between the users and tags tables.
I would like to pull out any users that have specified tags in either table. Ideally it'd also include the 'most relevant' tag for that user, but let's put that aside for now.
users
| id | name |
| 1 | dayjo |
| 2 | stackoverflow |
tags
| id | tag |
| 1 | tag1 |
| 2 | tag2 |
user_key_term
| user_id | tag_id |
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
project_key_term
| user_id | tag_id |
| 1 | 3 |
| 2 | 3 |
What I want to be able to query on is the named tags, i.e., if I search for "tag1" both users should be returned, but if I search for "tag2" only User 1 should be returned.
My Solutions
I tried selecting users and joining tags twice (once for each link table). This seemed to work OK, but I wasn't sure it was the best way, and I couldn't figure out how to get the most relevant tag.
SELECT t1.tag, t2.tag as most_relevant_tag, users.* FROM users
LEFT JOIN user_key_term ON user_key_term.user_id = users.id
LEFT JOIN tags t1 ON user_key_term.tag_id = t1.id
LEFT JOIN project_key_term ON project_key_term.user_id = users.id
LEFT JOIN tags t2 ON project_key_term.tag_id = t2.id
WHERE t1.tag IN ('tag1','tag2') OR t2.tag IN ('tag1','tag2')
GROUP BY users.id;
My next attempt was a UNION select, but this one feels dirty:
SELECT users.* FROM
`users`
INNER JOIN (
SELECT project_key_term.user_id, tags.id, tags.tag FROM project_key_term
JOIN tags ON tags.id = project_key_term.tag_id AND tags.tag IN ('tag1')
UNION ALL
SELECT user_key_term.user_id,tags.id, tags.tag FROM user_key_term
JOIN tags ON tags.id = user_key_term.tag_id AND tags.tag IN ('tag1')
) tags ON tags.user_id = users.id
WHERE tags.tag IN ('tag1')
GROUP BY users.id;
I've tried running EXPLAIN on both queries to see which is best, but it doesn't reveal anything particularly useful to me, especially because at the moment there's not a lot of data in the tables; there will potentially be hundreds or thousands of tags.
Any help on the 'correct' or best practice way to do this sort of query would be great!
The union query can be simplified to:
SELECT users.*
FROM users
INNER JOIN (
SELECT user_id,tag_id
FROM project_key_term
UNION ALL
SELECT user_id,tag_id
FROM user_key_term
) alltags ON alltags.user_id = users.id
INNER JOIN tags t on t.id = alltags.tag_id
where t.tag IN ('tag1')
Edit: Getting the most relevant tags
SELECT score, t.tag, users.*
FROM users
INNER JOIN (select user_id, tag_id, count(*) as score
from (SELECT user_id,tag_id
FROM project_key_term
UNION ALL
SELECT user_id,tag_id
FROM user_key_term
) alltags
group by user_id,tag_id) tagcounts ON tagcounts.user_id = users.id
INNER JOIN tags t on t.id = tagcounts.tag_id
where t.tag IN ('tag1','tag2','tag3')
ORDER BY score DESC
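The simplified UNION ALL query can be tried against the question's sample data. One caveat: it returns a user once per matching tag row, so a user holding several of the searched tags appears several times. The sketch below adds DISTINCT to guard against that, which is an assumption about the desired output rather than part of the answer. It runs on an in-memory SQLite database as a stand-in for MySQL, and the helper name users_with_tags is made up.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE user_key_term (user_id INTEGER, tag_id INTEGER);
    CREATE TABLE project_key_term (user_id INTEGER, tag_id INTEGER);
    INSERT INTO users VALUES (1, 'dayjo'), (2, 'stackoverflow');
    INSERT INTO tags VALUES (1, 'tag1'), (2, 'tag2');
    INSERT INTO user_key_term VALUES (1, 1), (1, 2), (2, 1);
    INSERT INTO project_key_term VALUES (1, 3), (2, 3);
""")

def users_with_tags(tag_names):
    """Return each user that has any of tag_names in either link table."""
    # One "?" placeholder per tag keeps the values out of the SQL text.
    placeholders = ", ".join("?" for _ in tag_names)
    sql = f"""
        SELECT DISTINCT u.id, u.name
        FROM users u
        JOIN (SELECT user_id, tag_id FROM project_key_term
              UNION ALL
              SELECT user_id, tag_id FROM user_key_term
             ) alltags ON alltags.user_id = u.id
        JOIN tags t ON t.id = alltags.tag_id
        WHERE t.tag IN ({placeholders})
        ORDER BY u.id
    """
    return conn.execute(sql, list(tag_names)).fetchall()

print(users_with_tags(["tag1"]))
print(users_with_tags(["tag2"]))
print(users_with_tags(["tag1", "tag2"]))
```

Searching for tag1 returns both users and searching for tag2 returns only dayjo, as the question expects. Searching for both tags still lists dayjo once, thanks to DISTINCT.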

MySQL query filter only rows containing certain value, after join

I'm trying to get something done, basically it comes down to this:
I want to retrieve all products and show all categories each product is in. But then I want to keep only the products which exist in category x or y.
So, this is my query:
SELECT p.id, p.name,GROUP_CONCAT(distinct(pc.category_id) SEPARATOR ", ") as category
FROM products p
LEFT JOIN product_category pc ON p.id = pc.productid
GROUP BY p.id;
This works great, I get result like this:
p.id | p.name | category
10 | example| 15,16,17
11 | example| 15,20
12 | example| 39,40
Obviously '15,16,17' are the categories the product is in. However, now I want to filter the result set to only the products in category 15 or 16. So the result set I want to get is:
p.id | p.name | category
10 | example| 15,16,17
11 | example| 15,20
So, what I tried is adding a WHERE to my MySQL statement like this:
WHERE category IN (15,16)
This works as far as the filtering goes, but the problem is that in the result set I don't see which other categories the product is also in. So the result I see is:
p.id | p.name | category
10 | example| 15,16
11 | example| 15
Note the difference from the desired result: I just see the filtered categories and not all the categories.
I do get why this behaves as it does, since obviously the 'category' column in my result set is based on the values after filtering. However, I don't know how to work around this or if what I want is even possible.
PS: this query will run on huge databases so the faster the query, the better.
One (admittedly weird) option is to use find_in_set on the aggregate result:
SELECT p.id,
p.name,
GROUP_CONCAT(DISTINCT(pc.category_id) SEPARATOR ", ") AS category
FROM products p
LEFT JOIN product_category pc ON p.id = pc.productid
GROUP BY p.id
HAVING FIND_IN_SET('15', GROUP_CONCAT(pc.category_id)) > 0 OR
FIND_IN_SET('16', GROUP_CONCAT(pc.category_id)) > 0
Try this untested query:
SELECT p.id, p.name, GROUP_CONCAT(DISTINCT pc.category_id SEPARATOR ", ") AS category
FROM products p
LEFT JOIN product_category pc ON p.id = pc.productid
WHERE p.id IN (
SELECT pc2.productid
FROM product_category pc2
WHERE pc2.category_id IN (15,16)
GROUP BY pc2.productid
)
GROUP BY p.id
I was wondering how I would solve this task:
SET @needle = '15,16';
SELECT p.id, p.name, GROUP_CONCAT(pc.category_id SEPARATOR ', ') AS 'category' FROM products p
LEFT JOIN product_category pc ON p.id = pc.productid GROUP BY p.id
HAVING REGEXP_INSTR(GROUP_CONCAT(pc.category_id), CONCAT('([[:<:]])(',REPLACE(@needle,',','|'),')([[:>:]])'));
The groups before passing the HAVING clause:
p.id | p.name | category
10 | example| 15,16,17
11 | example| 15,20
12 | example| 39,40
Then we check the groups with the HAVING clause and REGEXP:
HAVING REGEXP_INSTR(t3.category, CONCAT('[[:<:]](15|16)[[:>:]]'))
/* REGEX: START_WORD_BOUNDARY(15 OR 16)END_WORD_BOUNDARY */
And here it is:
p.id | p.name | category
10 | example| 15,16,17
11 | example| 15,20
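Another way to filter groups after aggregation, without string matching on the concatenated list, is conditional aggregation in HAVING. The sketch below is an alternative technique and not one of the answers above. It runs on an in-memory SQLite database as a stand-in for MySQL; in both systems the comparison category_id IN (15, 16) evaluates to 1 or 0, so summing it counts the matching rows per product.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_category (productid INTEGER, category_id INTEGER);
    INSERT INTO products VALUES (10, 'example'), (11, 'example'), (12, 'example');
    INSERT INTO product_category VALUES
        (10, 15), (10, 16), (10, 17), (11, 15), (11, 20), (12, 39), (12, 40);
""")

# HAVING runs after grouping, so GROUP_CONCAT still sees every category of
# a product, while SUM(...) > 0 keeps only products with a wanted category.
sql = """
    SELECT p.id, p.name, GROUP_CONCAT(pc.category_id) AS category
    FROM products p
    LEFT JOIN product_category pc ON p.id = pc.productid
    GROUP BY p.id
    HAVING SUM(pc.category_id IN (15, 16)) > 0
    ORDER BY p.id
"""
rows = conn.execute(sql).fetchall()

# The concatenation order is not guaranteed, so compare sorted category lists.
result = {pid: sorted(int(c) for c in cats.split(","))
          for pid, _name, cats in rows}
print(result)
```

Products 10 and 11 are kept with their full category lists, and product 12 is dropped. On large tables it may still be worth comparing this against the IN-subquery form with EXPLAIN.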

Using a second table as a lookup?

I have Three tables,
Posts,
Tags,
Posts_Tags_Link
Posts has:
id, content
Tags has: id, tag
Posts_Tags_Link has: post_id, tag_id
Basically if a tag is linked to a post then an entry is created in Posts_Tags_Link as this is a many-many relationship.
Anyway, I want to do some searches and return all rows from Posts that are linked to a particular keyword.
E.g., if I have:
Posts:
id | content
1 | some stuff
2 | more stuff
3 | stuff again
Tags:
id | tag
1 | first
2 | second
3 | third
4 | fourth
Posts_Tags_Link
post_id | tag_id
1 | 1
1 | 2
2 | 2
3 | 3
3 | 4
and I search for second, I want to return:
id | content
1 | some stuff
2 | more stuff
I assume I should use a join for this.
Would I just join my posts table to the link table on the post_id column, and join the link table to the tags table on the tag_id column?
I believe that is right, but if I only want the rows that match the search (using LIKE rather than a plain WHERE comparison), would I use LIKE, or would one of the different join types work?
I want a search for sec to return the same result as a search for second, so I believe I have to do this using LIKE?
You should join the three tables, since you want to search across them. For example:
SELECT a.*
FROM Posts a
INNER JOIN Posts_Tags_Link b
ON a.id = b.post_id
INNER JOIN Tags c
ON b.tag_id = c.id
WHERE a.content LIKE '%keyword%' OR -- build your conditions here
c.tag LIKE '%keyword%'
Try to use the following query.
SELECT p.id, p.content FROM
Posts_Tags_Link ptl
INNER JOIN Posts p ON p.id = ptl.post_id
INNER JOIN Tags t ON t.id = ptl.tag_id
WHERE t.tag = 'second'
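To make a search for sec behave like a search for second, the equality test can be swapped for a prefix LIKE. A minimal sketch on the question's sample data follows. It uses an in-memory SQLite database as a stand-in for MySQL, and the helper name search_posts is made up. DISTINCT is added so a post is listed once even if several of its tags match. Any % or _ characters in the search term are not escaped here and would act as wildcards.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Posts (id INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE Tags (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE Posts_Tags_Link (post_id INTEGER, tag_id INTEGER);
    INSERT INTO Posts VALUES (1, 'some stuff'), (2, 'more stuff'), (3, 'stuff again');
    INSERT INTO Tags VALUES (1, 'first'), (2, 'second'), (3, 'third'), (4, 'fourth');
    INSERT INTO Posts_Tags_Link VALUES (1, 1), (1, 2), (2, 2), (3, 3), (3, 4);
""")

def search_posts(term):
    """Return posts linked to any tag that starts with term."""
    sql = """
        SELECT DISTINCT p.id, p.content
        FROM Posts p
        JOIN Posts_Tags_Link ptl ON ptl.post_id = p.id
        JOIN Tags t ON t.id = ptl.tag_id
        WHERE t.tag LIKE ?
        ORDER BY p.id
    """
    # The trailing % turns the bound parameter into a prefix match.
    return conn.execute(sql, (term + "%",)).fetchall()

print(search_posts("sec"))
print(search_posts("second"))
```

Both calls return posts 1 and 2. A leading % (as in '%keyword%') would also match inside the tag, but usually prevents an index on the tag column from being used.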