I've rewritten this query a few times now (it's Monday) in an attempt to find the most efficient way of getting the data I need; however, I'm not sure I'm even approaching it correctly at the moment.
To summarise the problem:
Users have two sets of tags (key terms and project terms), and each set has its own link table between the users and tags tables.
I would like to pull out any users that have the specified tags in either table. Ideally the result would also include the 'most relevant' tag for each user, but let's put that aside for now.
users
| id | name |
| 1 | dayjo |
| 2 | stackoverflow |
tags
| id | tag |
| 1 | tag1 |
| 2 | tag2 |
user_key_term
| user_id | tag_id |
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
project_key_term
| user_id | tag_id |
| 1 | 3 |
| 2 | 3 |
What I want to be able to query on is the tag names, i.e. if I search for "tag1", both users should be returned; however, if I search for "tag2", only user 1 should be returned.
My Solutions
I tried selecting users and joining tags twice (once for each link table). This seemed to work OK, but I wasn't sure it was the best way, and I couldn't figure out how to get the most relevant tag.
SELECT t1.tag, t2.tag as most_relevant_tag, users.* FROM users
LEFT JOIN user_key_term ON user_key_term.user_id = users.id
LEFT JOIN tags t1 ON user_key_term.tag_id = t1.id
LEFT JOIN project_key_term ON project_key_term.user_id = users.id
LEFT JOIN tags t2 ON project_key_term.tag_id = t2.id
WHERE t1.tag IN ('tag1','tag2') OR t2.tag IN ('tag1','tag2')
GROUP BY users.id;
My next attempt was a UNION select, but this one feels dirty:
SELECT users.* FROM
`users`
INNER JOIN (
SELECT project_key_term.user_id, tags.id, tags.tag FROM project_key_term
JOIN tags ON tags.id = project_key_term.tag_id AND tags.tag IN ('tag1')
UNION ALL
SELECT user_key_term.user_id,tags.id, tags.tag FROM user_key_term
JOIN tags ON tags.id = user_key_term.tag_id AND tags.tag IN ('tag1')
) tags ON tags.user_id = users.id
WHERE tags.tag IN ('tag1')
GROUP BY users.id;
I've tried running EXPLAIN on both queries to see which is better, but it doesn't reveal anything particularly useful to me, especially because there isn't much data in the tables yet; there will potentially be hundreds or thousands of tags.
Any help on the 'correct' or best-practice way to do this sort of query would be great!
The union query can be simplified to:
SELECT DISTINCT users.*
FROM users
INNER JOIN (
SELECT user_id,tag_id
FROM project_key_term
UNION ALL
SELECT user_id,tag_id
FROM user_key_term
) alltags ON alltags.user_id = users.id
INNER JOIN tags t on t.id = alltags.tag_id
where t.tag IN ('tag1')
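The simplified UNION ALL query can be checked against the sample data from the question. This sketch assumes Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL; the helper name users_with_tags is hypothetical. DISTINCT is included because a user who has a matching tag in both link tables (or matches several searched tags) would otherwise be returned more than once.

```python
import sqlite3

# Sample data from the question (tag 3 in project_key_term has no row in tags).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE user_key_term (user_id INTEGER, tag_id INTEGER);
CREATE TABLE project_key_term (user_id INTEGER, tag_id INTEGER);
INSERT INTO users VALUES (1, 'dayjo'), (2, 'stackoverflow');
INSERT INTO tags VALUES (1, 'tag1'), (2, 'tag2');
INSERT INTO user_key_term VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO project_key_term VALUES (1, 3), (2, 3);
""")

def users_with_tags(tag_names):
    # Stack both link tables, join to tags once, then filter by tag name.
    # DISTINCT collapses users that match through more than one link row.
    placeholders = ",".join("?" for _ in tag_names)
    sql = f"""
        SELECT DISTINCT users.id, users.name
        FROM users
        JOIN (SELECT user_id, tag_id FROM project_key_term
              UNION ALL
              SELECT user_id, tag_id FROM user_key_term) alltags
          ON alltags.user_id = users.id
        JOIN tags t ON t.id = alltags.tag_id
        WHERE t.tag IN ({placeholders})
        ORDER BY users.id
    """
    return conn.execute(sql, list(tag_names)).fetchall()

print(users_with_tags(["tag1"]))  # [(1, 'dayjo'), (2, 'stackoverflow')]
print(users_with_tags(["tag2"]))  # [(1, 'dayjo')]
```

Binding the tag names as parameters rather than splicing them into the string keeps the IN list safe for user-supplied search terms.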
Edit: Getting the most relevant tags
SELECT score, t.tag, users.*
FROM users
INNER JOIN (select user_id, tag_id, count(*) as score
from (SELECT user_id,tag_id
FROM project_key_term
UNION ALL
SELECT user_id,tag_id
FROM user_key_term
) alltags
group by user_id,tag_id) tagcounts ON tagcounts.user_id = users.id
INNER JOIN tags t on t.id = tagcounts.tag_id
where t.tag IN ('tag1','tag2','tag3')
ORDER BY score DESC
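The score query above returns one row per matching (user, tag) pair. To keep only the single most relevant tag per user, one option is to rank the pairs with a window function. This is a sketch using Python's sqlite3 module with SQLite (window functions need SQLite 3.25+, or MySQL 8.0+ on the MySQL side); the extra project_key_term row (1, 1) is hypothetical and only exists so that one score is higher than 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE user_key_term (user_id INTEGER, tag_id INTEGER);
CREATE TABLE project_key_term (user_id INTEGER, tag_id INTEGER);
INSERT INTO users VALUES (1, 'dayjo'), (2, 'stackoverflow');
INSERT INTO tags VALUES (1, 'tag1'), (2, 'tag2');
INSERT INTO user_key_term VALUES (1, 1), (1, 2), (2, 1);
-- (1, 1) is a hypothetical extra row: user 1 now has tag1 in both sets.
INSERT INTO project_key_term VALUES (1, 3), (2, 3), (1, 1);
""")

# Score = number of link rows per (user, tag), as in the answer;
# ROW_NUMBER() then keeps only the highest-scoring matching tag per user.
SQL = """
WITH tagcounts AS (
    SELECT user_id, tag_id, COUNT(*) AS score
    FROM (SELECT user_id, tag_id FROM project_key_term
          UNION ALL
          SELECT user_id, tag_id FROM user_key_term) alltags
    GROUP BY user_id, tag_id
),
ranked AS (
    SELECT tc.user_id, t.tag, tc.score,
           ROW_NUMBER() OVER (PARTITION BY tc.user_id
                              ORDER BY tc.score DESC, t.tag) AS rn
    FROM tagcounts tc
    JOIN tags t ON t.id = tc.tag_id
    WHERE t.tag IN ('tag1', 'tag2')
)
SELECT u.id, u.name, r.tag AS most_relevant_tag, r.score
FROM ranked r
JOIN users u ON u.id = r.user_id
WHERE r.rn = 1
ORDER BY u.id
"""
rows = conn.execute(SQL).fetchall()
print(rows)  # [(1, 'dayjo', 'tag1', 2), (2, 'stackoverflow', 'tag1', 1)]
```

Ties on score are broken alphabetically by tag here; that tie-break is an arbitrary choice, not something the question specifies.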
Related
I have a situation where I want to join multiple SQL tables, get back one row per record in the base table, and GROUP_CONCAT the data from the other tables together using | as the separator. Unfortunately, with the query I'm currently using, I'm getting unwanted duplicates in the GROUP_CONCAT data and I don't know how to fix it.
I have the following basic DB structure:
things
id | name
1 | Some Thing
2 | Some Other Thing
items
id | name
1 | Blob
2 | Starfish
3 | Wrench
4 | Stereo
users
id | name
1 | Alice
2 | Bill
3 | Charlie
4 | Daisy
things_items
thing_id | item_id
1 | 1
1 | 2
2 | 3
2 | 4
things_users
thing_id | user_id
1 | 1
1 | 2
1 | 3
2 | 4
And I would ideally like to write a query that gets back the following for the Some Thing row in the things table:
Some Thing | Blob|Starfish | Alice|Bill|Charlie
However, what I'm getting back is the following:
Some Thing | Blob|Blob|Blob|Starfish|Starfish|Starfish | Alice|Alice|Bill|Bill|Charlie|Charlie
And this is the query I'm using:
SELECT things.name,
GROUP_CONCAT(items.name SEPARATOR '|'),
GROUP_CONCAT(users.name SEPARATOR '|')
FROM things
JOIN things_items ON things.id = things_items.thing_id
JOIN items ON things_items.item_id = items.id
JOIN things_users ON things.id = things_users.thing_id
JOIN users ON things_users.user_id = users.id
GROUP BY things.id;
How should I change the query to get the data back the way I'd like to and avoid the multiplying of the GROUP_CONCAT data? Thank you.
You are concatenating along two separate dimensions. The simplest solution is DISTINCT:
SELECT t.name,
GROUP_CONCAT(DISTINCT i.name SEPARATOR '|'),
GROUP_CONCAT(DISTINCT u.name SEPARATOR '|')
FROM things t JOIN
things_items ti
ON t.id = ti.thing_id JOIN
items i
ON ti.item_id = i.id JOIN
things_users tu
ON t.id = tu.thing_id JOIN
users u
ON tu.user_id = u.id
GROUP BY t.id;
Note that the above filters out things that have no items or no users.
The above will work fine if there are a handful of items and users for each thing. As the numbers grow, the performance gets worse because it generates a Cartesian product for each thing.
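The multiplication is easy to see by counting the joined rows per thing before any GROUP_CONCAT runs. This sketch loads the question's data into an in-memory SQLite database through Python's sqlite3 module; each thing yields (number of items) × (number of users) rows, which is why every item name repeats once per user and vice versa.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE things (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE things_items (thing_id INTEGER, item_id INTEGER);
CREATE TABLE things_users (thing_id INTEGER, user_id INTEGER);
INSERT INTO things VALUES (1, 'Some Thing'), (2, 'Some Other Thing');
INSERT INTO items VALUES (1, 'Blob'), (2, 'Starfish'), (3, 'Wrench'), (4, 'Stereo');
INSERT INTO users VALUES (1, 'Alice'), (2, 'Bill'), (3, 'Charlie'), (4, 'Daisy');
INSERT INTO things_items VALUES (1, 1), (1, 2), (2, 3), (2, 4);
INSERT INTO things_users VALUES (1, 1), (1, 2), (1, 3), (2, 4);
""")

# Both link tables join to things independently, so every item row of a
# thing is paired with every user row of the same thing.
fanout = conn.execute("""
    SELECT t.id, COUNT(*) AS joined_rows
    FROM things t
    JOIN things_items ti ON ti.thing_id = t.id
    JOIN things_users tu ON tu.thing_id = t.id
    GROUP BY t.id
    ORDER BY t.id
""").fetchall()
print(fanout)  # [(1, 6), (2, 2)]
```

"Some Thing" has 2 items and 3 users, hence 6 joined rows: each item appears 3 times and each user twice, matching the duplicated output in the question.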
That can be solved by aggregating before joining:
SELECT t.name, i.items, u.users
FROM things t JOIN
(SELECT ti.thing_id, GROUP_CONCAT(i.name SEPARATOR '|') AS items
FROM things_items ti JOIN
items i
ON ti.item_id = i.id
GROUP BY ti.thing_id
) i
ON t.id = i.thing_id JOIN
(SELECT tu.thing_id, GROUP_CONCAT(u.name SEPARATOR '|') AS users
FROM things_users tu JOIN
users u
ON tu.user_id = u.id
GROUP BY tu.thing_id
) u
ON t.id = u.thing_id;
You can replace the outer JOINs with LEFT JOINs if you want all things, even those with no items or no users.
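Here is the aggregate-before-join version with the outer joins switched to LEFT JOIN, run against the question's data. It is a sketch using Python's sqlite3 module: SQLite writes the separator as a second argument, group_concat(x, '|'), instead of MySQL's SEPARATOR keyword, and does not guarantee the order of the concatenated values. The third thing, 'Empty Thing', is a hypothetical row added only to show what the LEFT JOINs keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE things (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE things_items (thing_id INTEGER, item_id INTEGER);
CREATE TABLE things_users (thing_id INTEGER, user_id INTEGER);
INSERT INTO things VALUES (1, 'Some Thing'), (2, 'Some Other Thing');
INSERT INTO items VALUES (1, 'Blob'), (2, 'Starfish'), (3, 'Wrench'), (4, 'Stereo');
INSERT INTO users VALUES (1, 'Alice'), (2, 'Bill'), (3, 'Charlie'), (4, 'Daisy');
INSERT INTO things_items VALUES (1, 1), (1, 2), (2, 3), (2, 4);
INSERT INTO things_users VALUES (1, 1), (1, 2), (1, 3), (2, 4);
-- Hypothetical thing with no items or users, to show the LEFT JOINs.
INSERT INTO things VALUES (3, 'Empty Thing');
""")

# Each derived table collapses its link table to one row per thing
# before the join, so there is no items x users fan-out and no DISTINCT.
SQL = """
SELECT t.name, i.items, u.users
FROM things t
LEFT JOIN (SELECT ti.thing_id, group_concat(i.name, '|') AS items
           FROM things_items ti JOIN items i ON ti.item_id = i.id
           GROUP BY ti.thing_id) i ON i.thing_id = t.id
LEFT JOIN (SELECT tu.thing_id, group_concat(u.name, '|') AS users
           FROM things_users tu JOIN users u ON tu.user_id = u.id
           GROUP BY tu.thing_id) u ON u.thing_id = t.id
ORDER BY t.id
"""
rows = conn.execute(SQL).fetchall()
for row in rows:
    print(row)
```

Things without items or users come back with NULL (None in Python) in those columns; wrap them in COALESCE if an empty string is preferred.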
I'm trying to create a little "recommended" feature based on the posts with the most matching tags.
I have a layout like this:
Posts
id
---
1
2
3
4
post_tags
post_id | tag_id
---------+---------
1 | 1
1 | 2
2 | 2
2 | 3
2 | 4
3 | 1
3 | 2
3 | 4
4 | 5
tags
id
----
1
2
3
4
5
So if I retrieve recommendations for the post with id 1, the list should be:
3 (2/2 matches)
2 (1/2 matches)
4 (0/2 matches)
My query so far looks like this:
SELECT DISTINCT
p.id,
p.title,
count(*) as cnt
FROM
posts p
INNER JOIN posts_tags pt ON pt.post_id= p.id
INNER JOIN tags t ON pt.tag_id = t.id
WHERE
t.id IN (
SELECT
pt.tag_id
FROM
posts_tags pt
WHERE
pt.post_id = '30213'
)
GROUP BY
t. NAME
order by count(*) desc
LIMIT 0, 4
I know DISTINCT isn't working because of the count, but I wanted to see exactly what it counted, so the result looks like this:
4 Foo 4881
4 Foo 2560
11 Bar 2094
12 Baz 1998
So what happened? It counted the occurrences of the tag in general. Apparently the first tag associated with "Post 1" is linked 4881 times, and the query then pulls the first entry that matches... the one with the lowest id.
I see the problem but I can't solve it.
Your GROUP BY makes no sense. You want to aggregate by the post, not the tag:
SELECT p.id, p.title, count(*) as cnt
FROM posts p INNER JOIN
posts_tags pt
ON pt.post_id = p.id
WHERE pt.tag_id IN (SELECT pt2.tag_id
FROM posts_tags pt2
WHERE pt2.post_id = 30213
)
GROUP BY p.id, p.title
ORDER BY count(*) desc
LIMIT 0, 4;
This will not return posts with 0 matches. If that is important, you need to use a LEFT JOIN instead of WHERE ... IN (...).
Also:
SELECT DISTINCT is almost never used with GROUP BY. It is hard (but not impossible) to come up with a use-case for it.
You don't need the tags table, so I removed it.
Don't use single quotes around numbers. I am guessing that post_id is really a number.
The fix is in the GROUP BY.
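The LEFT JOIN variant mentioned above, which also returns posts with 0 matching tags, could look like this. It is a sketch using Python's sqlite3 module and the question's sample data (with the link table named posts_tags, as in the query). Excluding the source post itself with p.id <> ? is an addition that matches the expected list in the question; the helper name recommend is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY);
CREATE TABLE posts_tags (post_id INTEGER, tag_id INTEGER);
INSERT INTO posts VALUES (1), (2), (3), (4);
INSERT INTO posts_tags VALUES
  (1, 1), (1, 2), (2, 2), (2, 3), (2, 4), (3, 1), (3, 2), (3, 4), (4, 5);
""")

def recommend(post_id, limit=4):
    # LEFT JOINs keep every candidate post. The source post's tags are
    # matched in the ON clause, so a non-matching tag yields NULL there
    # and COUNT(src.tag_id) skips it, leaving 0 for posts with no overlap.
    sql = """
        SELECT p.id, COUNT(src.tag_id) AS matches
        FROM posts p
        LEFT JOIN posts_tags pt ON pt.post_id = p.id
        LEFT JOIN posts_tags src
               ON src.tag_id = pt.tag_id AND src.post_id = ?
        WHERE p.id <> ?
        GROUP BY p.id
        ORDER BY matches DESC, p.id
        LIMIT ?
    """
    return conn.execute(sql, (post_id, post_id, limit)).fetchall()

print(recommend(1))  # [(3, 2), (2, 1), (4, 0)]
```

Putting src.post_id = ? in the ON clause rather than the WHERE clause is what preserves the zero rows; moving it to WHERE would turn the LEFT JOIN back into an inner join.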
I am trying to join multiple rows of information into a single row, but the results seem to multiply every time one of the joins returns more rows.
My tables structure is as follows:
news
id | title | public
------------------------
1 | Test | 0
news_groups_map
id | news_id | members_group_id
------------------------------------
1 | 1 | 5
2 | 2 | 6
members_groups_map
id | member_id | group_id
------------------------------
1 | 750 | 5
2 | 750 | 6
The query I've got so far is:
SELECT
n.title,
n.public,
CAST(GROUP_CONCAT(ngm.members_group_id) AS CHAR(1000)) AS news_groups,
CAST(GROUP_CONCAT(member_groups.group_id) AS CHAR(1000)) AS user_groups
FROM news n
LEFT JOIN news_groups_map ngm ON n.id = ngm.news_id
JOIN (
SELECT group_id
FROM members_groups_map
WHERE member_id = 750
) member_groups
WHERE n.public = 0
GROUP BY n.id
However, the result is as follows:
title | public | news_groups | user_groups
-------------------------------------------------
Test | 0 | 5,6,5,6 | 6,6,5,5
As you can see, the news_groups and user_groups values are duplicated, so if a news article is in 3 groups, the user_groups list is multiplied as well and shows something like 5,6,6,6,5,5.
How can I group those groups, so that they are only displayed once?
The ultimate goal here is to compare news_groups and user_groups: if at least one group matches (meaning the user has sufficient permissions), the query should return true, and false otherwise. I don't know how to do that either, but I thought I should sort out the grouping first, because once the number of groups grows, a lot of unnecessary duplicate data will be selected.
Thanks!
The simplest method is to use distinct:
SELECT n.title, n.public,
GROUP_CONCAT(DISTINCT ngm.members_group_id) AS news_groups,
GROUP_CONCAT(DISTINCT mg.group_id) AS user_groups
FROM news n LEFT JOIN
news_groups_map ngm
ON n.id = ngm.news_id CROSS JOIN
(SELECT group_id
FROM members_groups_map
WHERE member_id = 750
) mg
WHERE n.public = 0
GROUP BY n.id;
This query doesn't actually make sense. First, the subquery is not needed:
SELECT n.title, n.public,
GROUP_CONCAT(DISTINCT ngm.members_group_id) AS news_groups,
GROUP_CONCAT(DISTINCT mg.group_id) AS user_groups
FROM news n LEFT JOIN
news_groups_map ngm
ON n.id = ngm.news_id JOIN
members_groups_map mg
ON mg.member_id = 750
WHERE n.public = 0
GROUP BY n.id;
Second, the CROSS JOIN (or equivalently, JOIN without an ON clause) doesn't make sense. Normally, I would expect a join condition to one of the other tables.
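For the stated end goal (true if the member shares at least one group with the article), it may be simpler to skip the concatenated lists entirely and ask the database directly with EXISTS. This is a sketch using Python's sqlite3 module and the question's sample rows; member id 999 is hypothetical and simply has no group memberships. Both SQLite and MySQL return EXISTS as 1 or 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT, public INTEGER);
CREATE TABLE news_groups_map (id INTEGER PRIMARY KEY, news_id INTEGER,
                              members_group_id INTEGER);
CREATE TABLE members_groups_map (id INTEGER PRIMARY KEY, member_id INTEGER,
                                 group_id INTEGER);
INSERT INTO news VALUES (1, 'Test', 0);
INSERT INTO news_groups_map VALUES (1, 1, 5), (2, 2, 6);
INSERT INTO members_groups_map VALUES (1, 750, 5), (2, 750, 6);
""")

# Join the article's groups to the member's groups on the group id;
# any surviving row means at least one group is shared.
SQL = """
SELECT n.id, n.title,
       EXISTS (SELECT 1
               FROM news_groups_map ngm
               JOIN members_groups_map mgm
                 ON mgm.group_id = ngm.members_group_id
               WHERE ngm.news_id = n.id
                 AND mgm.member_id = ?) AS has_access
FROM news n
WHERE n.public = 0
ORDER BY n.id
"""
print(conn.execute(SQL, (750,)).fetchall())  # [(1, 'Test', 1)]
print(conn.execute(SQL, (999,)).fetchall())  # [(1, 'Test', 0)]
```

Here the join condition links the two mapping tables through the group id, which is the relationship the CROSS JOIN was missing.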
Use DISTINCT in the GROUP_CONCAT:
...
CAST(GROUP_CONCAT(DISTINCT ngm.members_group_id) AS CHAR(1000)) AS news_groups,
CAST(GROUP_CONCAT(DISTINCT member_groups.group_id) AS CHAR(1000)) AS user_groups
...
I am currently trying to write a general query that returns the contents of one table and a joined table, plus the count of matching rows from a third table.
My description might seem abstract, so I'll try to visualize it:
Tables:
posts
| ID | title | description | creator_id |
| 1 | Title1 | Descr1 | 1 |
| 2 | Title2 | Descr2 | 1 |
users
| ID | name | avatar |
| 1 | User1 | PATH |
interactions
| ID | type | target_id | identifier |
| 1 | view | 1 | IP |
| 2 | view | 1 | IP |
Now what I am looking for is an output like this:
| ID | title | description | name | avatar | view_count |
| 1 | Title1 | Descr1 | User1 | PATH | 2 |
| 2 | Title2 | Descr2 | User1 | PATH | 0 |
My current query looks like the following:
SELECT
posts.id, posts.title, posts.description,
users.name, users.avatar,
COUNT(interactions.id) AS view_count
FROM
posts
LEFT JOIN
users
ON
posts.creator_id = users.id
LEFT JOIN
interactions
ON
posts.id = interactions.target_id
But it only prints out the post that has interactions, like this:
| ID | title | description | name | avatar | view_count |
| 1 | Title1 | Descr1 | User1 | PATH | 2 |
How do I need to alter the query to also get the other rows, which don't have any interactions yet?
Thank you for your help!
You can simply use a subquery on the third table to count the entries:
SELECT
posts.id, posts.title, posts.description,
users.name, users.avatar,
(SELECT COUNT(*) FROM interactions i WHERE i.target_id = posts.id) AS view_count
FROM
posts
LEFT JOIN
users
ON
posts.creator_id = users.id
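This can also be better for performance, since there is no GROUP BY and no join that multiplies rows (the correlated subquery relies on an index on interactions.target_id to stay fast).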
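The correlated-subquery approach can be checked against the question's data. This sketch uses Python's sqlite3 module with an in-memory SQLite database; the index name interactions_target is hypothetical and is there because each per-post count looks up interactions by target_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, description TEXT,
                    creator_id INTEGER);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, avatar TEXT);
CREATE TABLE interactions (id INTEGER PRIMARY KEY, type TEXT,
                           target_id INTEGER, identifier TEXT);
-- Index on the correlation column so each per-post COUNT is a cheap lookup.
CREATE INDEX interactions_target ON interactions (target_id);
INSERT INTO posts VALUES (1, 'Title1', 'Descr1', 1), (2, 'Title2', 'Descr2', 1);
INSERT INTO users VALUES (1, 'User1', 'PATH');
INSERT INTO interactions VALUES (1, 'view', 1, 'IP'), (2, 'view', 1, 'IP');
""")

# The subquery runs once per post row, so posts with no interactions
# still appear, with COUNT(*) = 0.
SQL = """
SELECT posts.id, posts.title, posts.description,
       users.name, users.avatar,
       (SELECT COUNT(*) FROM interactions i
        WHERE i.target_id = posts.id) AS view_count
FROM posts
LEFT JOIN users ON posts.creator_id = users.id
ORDER BY posts.id
"""
rows = conn.execute(SQL).fetchall()
for row in rows:
    print(row)
# (1, 'Title1', 'Descr1', 'User1', 'PATH', 2)
# (2, 'Title2', 'Descr2', 'User1', 'PATH', 0)
```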
Try this:
SELECT P.ID
, P.title
, P.description
, U.name
, U.avatar
, COUNT(I.ID) AS view_count
FROM posts P
LEFT JOIN users U ON U.ID = P.creator_id
LEFT JOIN interactions I ON I.target_id = P.ID
GROUP BY P.ID
It seems like you missed the GROUP BY clause. Without this, when you use an aggregate function like COUNT, the documentation says:
there is a single group and it is indeterminate which name value to choose for the group
That's why your query only returned 1 row.
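The effect of the missing GROUP BY can be reproduced with a cut-down version of the tables. This sketch uses Python's sqlite3 module; SQLite, like MySQL without ONLY_FULL_GROUP_BY, accepts a non-aggregated column next to COUNT and collapses everything into one group, whereas recent MySQL versions with ONLY_FULL_GROUP_BY enabled reject the ungrouped query instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY);
CREATE TABLE interactions (id INTEGER PRIMARY KEY, target_id INTEGER);
INSERT INTO posts VALUES (1), (2);
INSERT INTO interactions VALUES (1, 1), (2, 1);
""")

# No GROUP BY: all joined rows form a single group, so one row comes back
# and posts.id in it is whichever row the engine happens to pick.
ungrouped = conn.execute("""
    SELECT posts.id, COUNT(interactions.id) AS view_count
    FROM posts
    LEFT JOIN interactions ON posts.id = interactions.target_id
""").fetchall()

# GROUP BY posts.id: one group per post; the LEFT JOIN row for post 2
# has a NULL interactions.id, which COUNT ignores, giving 0.
grouped = conn.execute("""
    SELECT posts.id, COUNT(interactions.id) AS view_count
    FROM posts
    LEFT JOIN interactions ON posts.id = interactions.target_id
    GROUP BY posts.id
    ORDER BY posts.id
""").fetchall()

print(len(ungrouped), ungrouped[0][1])  # 1 2
print(grouped)  # [(1, 2), (2, 0)]
```

COUNT(column) already returns 0 for an empty group, which is why wrapping it in IFNULL is not needed.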
Try this:
select posts.id, posts.title, posts.description, users.name, users.avatar, coalesce(t3.view_count, 0) as view_count
from posts
left join users on posts.creator_id = users.id
left join (
select target_id, count(1) as view_count from interactions group by target_id
) t3 on posts.id = t3.target_id
I have three tables: Posts, Tags and Posts_Tags_Link.
Posts has: id, content
Tags has: id, tag
Posts_Tags_Link has: post_id, tag_id
Basically, if a tag is linked to a post, an entry is created in Posts_Tags_Link, as this is a many-to-many relationship.
Anyway, I want to do some searches and return all rows from Posts that are linked to a particular keyword.
E.g. if I have the following data:
Posts:
id | content
1 | some stuff
2 | more stuff
3 | stuff again
Tags:
id | tag
1 | first
2 | second
3 | third
4 | fourth
Posts_Tags_Link
post_id | tag_id
1 | 1
1 | 2
2 | 2
3 | 3
3 | 4
and I search for "second", I want to return:
id | content
1 | some stuff
2 | more stuff
I assume I need to use a join for this.
Would I just join my Posts table to the link table on post_id, and join the link table to the Tags table on the tag_id column?
I believe that is right, but if I only want the rows that match the search (a partial match rather than an exact equality in the WHERE clause), would I use LIKE, or would one of the different join types work?
I want a search for "sec" to give the same result as a search for "second", so I believe I have to do this using LIKE?
You should join the three tables, since you want to search across them. For example:
SELECT DISTINCT a.*
FROM Posts a
INNER JOIN Posts_Tags_Link b
ON a.id = b.post_id
INNER JOIN Tags c
ON b.tag_id = c.id
WHERE a.content LIKE '%keyword%' OR -- build your conditions here
c.tag LIKE '%keyword%'
Try the following query:
SELECT p.id, p.content FROM
Posts_Tags_Link ptl
INNER JOIN Posts p ON p.id = ptl.post_id
INNER JOIN Tags t ON t.id = ptl.tag_id
WHERE t.tag = 'second'
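To make "sec" find the same posts as "second", the equality can be swapped for a prefix LIKE. This sketch uses Python's sqlite3 module with the question's sample data; SQLite concatenates with ||, so in MySQL the pattern would be written as CONCAT(?, '%'). The helper name search_posts is hypothetical, and note that % or _ typed by a user would act as wildcards unless escaped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (id INTEGER PRIMARY KEY, content TEXT);
CREATE TABLE Tags (id INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE Posts_Tags_Link (post_id INTEGER, tag_id INTEGER);
INSERT INTO Posts VALUES (1, 'some stuff'), (2, 'more stuff'), (3, 'stuff again');
INSERT INTO Tags VALUES (1, 'first'), (2, 'second'), (3, 'third'), (4, 'fourth');
INSERT INTO Posts_Tags_Link VALUES (1, 1), (1, 2), (2, 2), (3, 3), (3, 4);
""")

def search_posts(prefix):
    # Prefix match: the search text followed by the % wildcard.
    # DISTINCT stops a post from appearing once per matching tag.
    sql = """
        SELECT DISTINCT p.id, p.content
        FROM Posts_Tags_Link ptl
        JOIN Posts p ON p.id = ptl.post_id
        JOIN Tags t ON t.id = ptl.tag_id
        WHERE t.tag LIKE ? || '%'
        ORDER BY p.id
    """
    return conn.execute(sql, (prefix,)).fetchall()

print(search_posts("sec"))     # [(1, 'some stuff'), (2, 'more stuff')]
print(search_posts("second"))  # [(1, 'some stuff'), (2, 'more stuff')]
```

A prefix pattern such as 'sec%' can still use an index on Tags.tag in many engines, while a leading wildcard ('%sec%') generally cannot.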