Assuming my database has 3 tables:
Documents
Tags
Documents_Tags (the join table)
I know how to find the tags assigned to a document using a LEFT JOIN, but I'm having trouble finding the tags that are not assigned.
SELECT * FROM `documents_tags`
LEFT JOIN `tags` ON `tags`.`id` = `documents_tags`.`tag_id`
WHERE `document_id` = 111;
I've tried different joins, but I keep getting only one record. I thought there was a way to join all the tags and then limit the results to where the document is null?
EDIT: In the above example I need to find all tags not assigned to document 111.
SELECT Tags.name -- or whatever it is
FROM
tags
LEFT JOIN documents_tags dt ON (tags.id = dt.tag_id AND dt.document_id = 111)
WHERE dt.id IS NULL
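To see why the document filter has to sit in the ON clause rather than the WHERE clause, here is a minimal sketch that runs the query above against made-up data. It assumes an in-memory SQLite database as a stand-in for MySQL, and the tag names and the surrogate id column on documents_tags are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; schema and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE documents_tags (
        id INTEGER PRIMARY KEY, document_id INTEGER, tag_id INTEGER);
    INSERT INTO documents VALUES (111), (222);
    INSERT INTO tags VALUES (1, 'red'), (2, 'green'), (3, 'blue');
    -- document 111 has only 'red'; document 222 has 'green' and 'red'
    INSERT INTO documents_tags (document_id, tag_id)
        VALUES (111, 1), (222, 2), (222, 1);
""")

# Every tag is kept by the LEFT JOIN; tags with no row for document 111
# get NULLs on the dt side, and the WHERE clause keeps only those.
unassigned = [row[0] for row in conn.execute("""
    SELECT tags.name
    FROM tags
    LEFT JOIN documents_tags dt
           ON tags.id = dt.tag_id AND dt.document_id = ?
    WHERE dt.id IS NULL
    ORDER BY tags.name
""", (111,))]
print(unassigned)  # ['blue', 'green']
```

'green' is reported even though document 222 uses it, because the join only considers rows for document 111.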
Related
I have an m:n relationship of images and tags in my database, which uses a crosstable to model this.
Table imgs contains much more information than just img_id, but that is all that's required to uniquely identify an image.
I want to find every img_id which has both tagA and tagB (and tagC and so on; I'll build this string, so it won't really matter whether it's two or ten tags).
Now, where I'm stuck: of course, you first join imgs with img_tags with tags and add a WHERE clause for the tags:
SELECT *
FROM imgs
INNER JOIN img_tags ON imgs.img_id = img_tags.img_id
INNER JOIN tags ON img_tags.tag_id = tags.tag_id
WHERE tag = 'tagA' OR tag = 'tagB';
and then you'll get rows with identical imgs information, only differing in tag and tag_id. Now I should be able to count those, targeting only those which appear in the same amount as tags were supplied (Count(*) = n), and then using group by to aggregate them? But I can't quite figure it out.
In case it is relevant: you might assume the fields in img_tags are both foreign keys referencing the other tables, but that is not the case; they are not linked in any way.
You can use aggregation like this:
SELECT i.*
FROM imgs i JOIN
img_tags it
ON i.img_id = it.img_id JOIN
tags t
ON it.tag_id = t.tag_id
WHERE tag IN ('tagA', 'tagB')
GROUP BY i.img_id
HAVING COUNT(*) = 2;
Aggregating by i.img_id is safe -- and supported by the SQL standard -- assuming that img_id is the primary key in the table.
Here is one approach using a correlated subquery:
SELECT i.*
FROM imgs i
WHERE (
SELECT COUNT(*)
FROM img_tags it
INNER JOIN tags t ON it.tag_id = t.tag_id
WHERE i.img_id = it.img_id AND t.tag IN('tagA', 'tagB')
) = 2
This assumes no duplicate tags in your data structure. Otherwise, you can use COUNT(DISTINCT t.tag) instead of COUNT(*).
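The difference between the two counts only shows up when img_tags holds the same (image, tag) pair more than once. A small sketch of that case, assuming in-memory SQLite in place of MySQL and invented rows (the aggregation form is used for both counts):

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE imgs (img_id INTEGER PRIMARY KEY);
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE img_tags (img_id INTEGER, tag_id INTEGER);
    INSERT INTO imgs VALUES (1), (2), (3);
    INSERT INTO tags VALUES (10, 'tagA'), (20, 'tagB');
    -- img 1 has both tags; img 2 has tagA twice; img 3 has only tagB
    INSERT INTO img_tags VALUES (1, 10), (1, 20), (2, 10), (2, 10), (3, 20);
""")

QUERY = """
    SELECT i.img_id
    FROM imgs i
    JOIN img_tags it ON i.img_id = it.img_id
    JOIN tags t ON it.tag_id = t.tag_id
    WHERE t.tag IN ('tagA', 'tagB')
    GROUP BY i.img_id
    HAVING {count_expr} = 2
    ORDER BY i.img_id
"""

def matching(count_expr):
    """Run the query with the given counting expression in HAVING."""
    return [r[0] for r in conn.execute(QUERY.format(count_expr=count_expr))]

with_star = matching("COUNT(*)")                   # counts joined rows
with_distinct = matching("COUNT(DISTINCT t.tag)")  # counts different tags
print(with_star, with_distinct)  # [1, 2] [1]
```

Image 2 slips through COUNT(*) because its duplicated tagA row is counted twice; COUNT(DISTINCT t.tag) rejects it.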
You can also use aggregation:
SELECT i.img_id
FROM imgs i
INNER JOIN img_tags it ON i.img_id = it.img_id
INNER JOIN tags t ON it.tag_id = t.tag_id
WHERE t.tag IN('tagA', 'tagB')
GROUP BY i.img_id
HAVING COUNT(*) = 2
If there are not many tags involved, I would just use exists (or not exists if you want to exclude some tags) for this
select *
from imgs
where
exists(select 1 from img_tags it where it.tag_id=(select tag_id from tags where tag='tagA') and it.img_id=imgs.img_id)
and exists(select 1 from img_tags it where it.tag_id=(select tag_id from tags where tag='tagB') and it.img_id=imgs.img_id);
especially if you end up wanting to do more complicated boolean expressions e.g. (A and (B or not C)).
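As a rough illustration of the (A and (B or not C)) case, the sketch below composes the same EXISTS fragment three times. It assumes in-memory SQLite rather than MySQL; the HAS_TAG helper string and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE imgs (img_id INTEGER PRIMARY KEY);
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE img_tags (img_id INTEGER, tag_id INTEGER);
    INSERT INTO imgs VALUES (1), (2), (3), (4), (5);
    INSERT INTO tags VALUES (10, 'tagA'), (20, 'tagB'), (30, 'tagC');
    -- img 1: A,B / img 2: A,C / img 3: A / img 4: B / img 5: A,B,C
    INSERT INTO img_tags VALUES
        (1, 10), (1, 20), (2, 10), (2, 30), (3, 10), (4, 20),
        (5, 10), (5, 20), (5, 30);
""")

# Hypothetical helper: one EXISTS test per tag, with the tag name bound
# as a parameter. Each copy is its own subquery, so the alias can repeat.
HAS_TAG = """EXISTS (SELECT 1 FROM img_tags it
                     WHERE it.tag_id = (SELECT tag_id FROM tags WHERE tag = ?)
                       AND it.img_id = imgs.img_id)"""

# tagA AND (tagB OR NOT tagC); parameters bind in order of appearance.
sql = f"""
    SELECT img_id
    FROM imgs
    WHERE {HAS_TAG}
      AND ({HAS_TAG} OR NOT {HAS_TAG})
    ORDER BY img_id
"""
result = [r[0] for r in conn.execute(sql, ("tagA", "tagB", "tagC"))]
print(result)  # [1, 3, 5]
```

Image 2 is dropped because it has tagC without tagB, and image 4 because it lacks tagA.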
I have an SQL setup akin to the following:
ARTICLES
id (PK)
name
TAGS
id (PK)
tag
...and a third table logging associations between the two, since there can be multiple tags to each article:
ARTICLE_TAG_ASSOCS
id (PK)
article_id (FK)
tag_id (FK)
Via this question I managed to construct a query that would find articles that were tagged with at least one of a number of tags, e.g.
SELECT articles.*
FROM articles
JOIN article_tag_assocs ata ON articles.id = ata.article_id
JOIN tags ON tags.id = ata.tag_id
WHERE tags.tag = 'budgie' OR tags.tag = 'parrot';
Question: How can I alter the above to find articles that match ALL tags, i.e. both 'budgie' and 'parrot', not just one?
Clearly modifying the logic to
WHERE tags.tag = 'budgie' && tags.tag = 'parrot';
...is logically flawed, since MySQL is considering each tag in isolation, one at a time, but hopefully you get what I mean.
There are several workable approaches.
One approach is to perform separate JOIN operations for each tag. For example:
SELECT articles.*
FROM articles
JOIN article_tag_assocs ata
ON ata.article_id = articles.id
JOIN tags ta
ON ta.id = ata.tag_id
AND ta.tag = 'budgie'
JOIN article_tag_assocs atb
ON atb.article_id = articles.id
JOIN tags tb
ON tb.id = atb.tag_id
AND tb.tag = 'parrot'
Note that this can return "duplicate" rows if a given article is associated with the same tag value more than once. (Adding the DISTINCT keyword or a GROUP BY clause are ways to eliminate the duplicates.)
Another approach, if we are guaranteed that a given article has no duplicate tag values, is to use an inline view to get the list of article_id that are associated with both tags, and then JOIN that set to the articles table. For example:
SELECT a.*
FROM ( SELECT ata.article_id
FROM article_tag_assocs ata
JOIN tags t
ON t.id = ata.tag_id
WHERE t.tag IN ('budgie','parrot')
GROUP BY ata.article_id
HAVING COUNT(1) = 2
) s
JOIN articles a
ON a.id = s.article_id
Note that the literal "2" in the HAVING clause matches the number of values in the predicate on the tag column. The inline view (aliased as s) returns a distinct list of article_id, and we can join that to the articles table.
This approach is useful if you wanted to match, for example, at least three out of four tags. We could use lines like this in the inline view query.
WHERE t.tag IN ('fee','fi','fo','fum')
HAVING COUNT(1) >= 3
Then, any article that matched at least three of those four tags would be returned.
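For what it's worth, here is a small runnable sketch of the "at least three out of four" variant. It assumes in-memory SQLite instead of MySQL, and the article names and tag assignments are invented for the demonstration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE article_tag_assocs (
        id INTEGER PRIMARY KEY, article_id INTEGER, tag_id INTEGER);
    INSERT INTO articles VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO tags VALUES (1, 'fee'), (2, 'fi'), (3, 'fo'), (4, 'fum');
    -- article one: fee, fi, fo; article two: fee, fum; article three: all four
    INSERT INTO article_tag_assocs (article_id, tag_id) VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 4),
        (3, 1), (3, 2), (3, 3), (3, 4);
""")

# The inline view yields one row per article with >= 3 matching tags;
# the outer join then fetches the article columns.
names = [r[0] for r in conn.execute("""
    SELECT a.name
    FROM (SELECT ata.article_id
          FROM article_tag_assocs ata
          JOIN tags t ON t.id = ata.tag_id
          WHERE t.tag IN ('fee', 'fi', 'fo', 'fum')
          GROUP BY ata.article_id
          HAVING COUNT(1) >= 3) s
    JOIN articles a ON a.id = s.article_id
    ORDER BY a.id
""")]
print(names)  # ['one', 'three']
```

As noted above, this relies on an article never carrying the same tag twice; otherwise COUNT(DISTINCT t.tag) would be the safer expression.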
These aren't the only ways to return the specified result; there are several other approaches.
As Roland's answer pointed out, you can also do something like this:
FROM articles a
WHERE a.id IN ( <select article id values related to tag 'parrot'> )
AND a.id IN ( <select article id values related to tag 'budgie'> )
You could also use an EXISTS clause with a correlated subquery, though this approach doesn't usually perform as well with large sets, due to the number of executions of the subquery:
FROM articles a
WHERE EXISTS ( SELECT 1
FROM article_tag_assocs s1
JOIN tags t1 ON t1.id = s1.tag_id AND t1.tag = 'budgie'
WHERE s1.article_id = a.id
)
AND EXISTS ( SELECT 1
FROM article_tag_assocs s2
JOIN tags t2 ON t2.id = s2.tag_id AND t2.tag = 'parrot'
WHERE s2.article_id = a.id
)
NOTE: in this case, it is possible to reuse the same table aliases within each subquery, because it doesn't lead to ambiguity. I still prefer distinct aliases, because the table aliases show up in the EXPLAIN output, and distinct aliases make it easier to match the rows in the EXPLAIN output to the references in the query.
What about this?
Will this give bad performance like EXISTS for large data sets?
This query checks which rows of the 'a1' table have some specified tags and do not have some other specified tags.
SELECT * FROM a1 WHERE a1.id IN
(SELECT taggables.taggable_id FROM taggables WHERE taggables.taggable_type = 'a1' AND taggables.tag_id IN (1))
AND a1.id NOT IN
(SELECT taggables.taggable_id FROM taggables WHERE taggables.taggable_type = 'a1' AND taggables.tag_id IN (2))
ORDER BY a1.file_count DESC LIMIT 0, 5
So I have three tables. One is posts, with columns id, title, content, timestamp. Another is tags, with columns id, tag. The third, posttags, describes the one-to-many relation between posts and tags, with columns postid, tagid.
Now instead of having columns like hidden,featured etc in the table posts to describe whether a post should be visible to all or should be displayed on a special featured page, I thought why not use tags to save time. So what I decided is that all posts that have a tag #featured will be featured and all posts with tag #hidden will be hidden.
Implementing first one was easy as I could use a join query and in my where clause I could mention WHERE tag='featured' and this would get all the featured posts for me.
But take an example of a post tagged #sports and #hidden if I were to use the query
SELECT * FROM posts
INNER JOIN posttags ON posttags.postid = posts.id
INNER JOIN tags ON posttags.tagid = tags.id
WHERE tag !='hidden'
but that'd still return the post tagged hidden, since it's also tagged sports.
PS my question is different from this question : Select a post that does not have a particular tag since it uses tagid directly and I'm unable to achieve same result using double join to check against tag name instead of tagid. And also I wish to retrieve the other tags of the post in same query which is not possible using the method in that question's answers
Group the tags by post, then use the HAVING clause to filter the groups for those that do not contain a 'hidden' tag. Because of MySQL's implicit type conversion and lack of genuine boolean types, one can do:
SELECT posts.*
FROM posts
JOIN posttags ON posttags.postid = posts.id
JOIN tags ON posttags.tagid = tags.id
GROUP BY posts.id
HAVING NOT SUM(tag='hidden')
You can do this with a NOT EXISTS subquery:
SELECT p.*, t.* -- what columns you need
FROM posts AS p
INNER JOIN posttags AS pt
ON pt.postid = p.id
INNER JOIN tags AS t
ON pt.tagid = t.id
WHERE NOT EXISTS
( SELECT *
FROM posttags AS pt_no
INNER JOIN tags AS t_no
ON pt_no.tagid = t_no.id
WHERE t_no.tag = 'hidden'
AND pt_no.postid = p.id
) ;
or the equivalent LEFT JOIN / IS NULL:
SELECT p.*, t.*
FROM posts AS p
LEFT JOIN posttags AS pt_no
INNER JOIN tags AS t_no
ON t_no.tag = 'hidden'
AND pt_no.tagid = t_no.id
ON pt_no.postid = p.id
INNER JOIN posttags AS pt
ON pt.postid = p.id
INNER JOIN tags AS t
ON pt.tagid = t.id
WHERE pt_no.postid IS NULL ;
This type of query is called an anti-semijoin, or just an anti-join. It's slightly more complex in your case because the condition (tag='hidden') is in a 3rd table.
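To make the difference from the plain tag != 'hidden' filter concrete, here is a sketch that runs both against the same made-up rows. It assumes in-memory SQLite in place of MySQL; the post titles and tags are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE posttags (postid INTEGER, tagid INTEGER);
    INSERT INTO posts VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO tags VALUES
        (1, 'sports'), (2, 'hidden'), (3, 'featured'), (4, 'news');
    -- post A: sports + hidden; post B: sports + featured; post C: news
    INSERT INTO posttags VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 4);
""")

JOINED = """
    SELECT p.title, t.tag
    FROM posts p
    JOIN posttags pt ON pt.postid = p.id
    JOIN tags t ON pt.tagid = t.id
    WHERE {condition}
    ORDER BY p.id, t.tag
"""

# Row-level filter: only removes the 'hidden' row, not the hidden post.
naive = conn.execute(JOINED.format(condition="t.tag != 'hidden'")).fetchall()

# Anti-join: removes every row of any post that has a 'hidden' tag.
anti_join = conn.execute(JOINED.format(condition="""
    NOT EXISTS (SELECT 1
                FROM posttags pt_no
                JOIN tags t_no ON pt_no.tagid = t_no.id
                WHERE t_no.tag = 'hidden' AND pt_no.postid = p.id)
""")).fetchall()

print(naive)      # post A still appears via its 'sports' row
print(anti_join)  # post A is gone; B and C keep all their tags
```

The anti-join version still returns one row per remaining (post, tag) pair, so the other tags of each visible post come back in the same query.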
I have a situation where I need to match objects against multiple tags simultaneously, so that the result set is "narrowed down" to objects matching all tags. I've found the following MySQL query for this:
SELECT *
FROM OBJECTS o
JOIN OBJECTSTAGS ot ON ot.object_id = o.id
JOIN TAGS t ON t.id = ot.tag_id
WHERE t.name IN ('tag1','tag2')
GROUP BY o.id
HAVING COUNT(DISTINCT t.name) = 2
... where 2 is the number of tags being matched. It works fine.
However, I need the query to return a count of the objects instead of the objects themselves. This query seems to confuse itself if I add COUNT(*) to the SELECT. I'm hesitant to return just the ids for example and do a PHP count of them because they could add up to a very large number. I would therefore like MySQL to return the count.
Could anyone suggest a good way to do this? Breaking it into two queries would be acceptable.
Thanks in advance.
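Use the following. Note that, as written, the IN predicate inside EXISTS counts objects that have at least one of the listed tags:
SELECT COUNT(*) AS numObjects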
FROM OBJECTS o
WHERE EXISTS (SELECT NULL
FROM OBJECTSTAGS ot
JOIN TAGS t ON t.id = ot.tag_id
AND t.name IN ('tag1','tag2')
WHERE ot.object_id = o.id)
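Since the question asks for the number of objects matching all of the tags, one possible alternative (not part of the answer above) is to wrap the GROUP BY / HAVING query from the question in a derived table and count its rows. The sketch below compares the two counts on invented data, assuming in-memory SQLite in place of MySQL; the alias matched is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE objectstags (object_id INTEGER, tag_id INTEGER);
    INSERT INTO objects VALUES (1), (2), (3), (4);
    INSERT INTO tags VALUES (1, 'tag1'), (2, 'tag2'), (3, 'tag3');
    -- object 1: tag1 + tag2; object 2: tag1; object 3: tag2 + tag3; object 4: none
    INSERT INTO objectstags VALUES (1, 1), (1, 2), (2, 1), (3, 2), (3, 3);
""")

# EXISTS with IN: true as soon as ONE of the listed tags is present.
count_any = conn.execute("""
    SELECT COUNT(*)
    FROM objects o
    WHERE EXISTS (SELECT NULL
                  FROM objectstags ot
                  JOIN tags t ON t.id = ot.tag_id
                             AND t.name IN ('tag1', 'tag2')
                  WHERE ot.object_id = o.id)
""").fetchone()[0]

# Derived table: one row per object that has BOTH tags, then count rows.
count_all = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT ot.object_id
          FROM objectstags ot
          JOIN tags t ON t.id = ot.tag_id
          WHERE t.name IN ('tag1', 'tag2')
          GROUP BY ot.object_id
          HAVING COUNT(DISTINCT t.name) = 2) AS matched
""").fetchone()[0]

print(count_any, count_all)  # 3 1
```

Only object 1 carries both tags, so the derived-table count is 1, while the EXISTS form also counts objects 2 and 3.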
I am trying to query for Objects that match ALL of a given set of Tags.
Basically I want users to be able to add on more and more Tags to filter or "narrow down" their search results, kind of like newegg.com does.
My table structure is a table of Objects, a table of Tags, and a MANY:MANY relation table ObjectsTags. So I have a JOIN query like so:
SELECT * FROM Objects
LEFT OUTER JOIN ObjectsTags ON (Objects.id=ObjectsTags.object_id)
LEFT OUTER JOIN Tags ON (Tags.id=ObjectsTags.tag_id)
I tried using an IN clause/condition, like this:
SELECT * FROM Objects
LEFT OUTER JOIN ObjectsTags ON (Objects.id=ObjectsTags.object_id)
LEFT OUTER JOIN Tags ON (Tags.id=ObjectsTags.tag_id)
WHERE Tags.name IN ('tag1','tag2')
GROUP BY Objects.id
But I learned that this simulates a series of ORs, so the more tags you add to the query the MORE results you get, instead of the result set narrowing down like I was hoping.
I also tried doing multiple LIKE WHERE conditions, ANDed together:
SELECT * FROM Objects
LEFT OUTER JOIN ObjectsTags ON (Objects.id=ObjectsTags.object_id)
LEFT OUTER JOIN Tags ON (Tags.id=ObjectsTags.tag_id)
WHERE Tags.name LIKE 'tag1'
AND Tags.name LIKE 'tag2'
GROUP BY Objects.id
But this returns no results, since when the results are grouped together the OUTER JOINed Tags.name column just contains 'tag1', and not also 'tag2'. The result row where 'tag2' matched is "hidden" by the GROUPing.
How can I match ALL of the tags to get the "narrow down" or "drill down" effect that I am after? Thanks.
Use:
SELECT *
FROM OBJECTS o
JOIN OBJECTSTAGS ot ON ot.object_id = o.id
JOIN TAGS t ON t.id = ot.tag_id
WHERE t.name IN ('tag1','tag2')
GROUP BY o.id
HAVING COUNT(DISTINCT t.name) = 2
You were missing the HAVING clause.
There's no need to LEFT JOIN if you want only rows where both tags exist.
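Because the number in the HAVING clause has to track the number of tags, it can help to build both from the same list. Below is a minimal sketch of that idea using bound parameters, assuming in-memory SQLite in place of MySQL; the function name objects_with_all_tags and the sample rows are hypothetical.

```python
import sqlite3

def objects_with_all_tags(conn, tag_names):
    """Return ids of objects that carry every tag in tag_names."""
    wanted = sorted(set(tag_names))   # ignore repeated filter values
    if not wanted:                    # no filter: every object matches
        return [r[0] for r in conn.execute("SELECT id FROM objects ORDER BY id")]
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT o.id
        FROM objects o
        JOIN objectstags ot ON ot.object_id = o.id
        JOIN tags t ON t.id = ot.tag_id
        WHERE t.name IN ({placeholders})
        GROUP BY o.id
        HAVING COUNT(DISTINCT t.name) = ?
        ORDER BY o.id
    """
    # The last parameter is the tag count, so it always matches the IN list.
    return [r[0] for r in conn.execute(sql, (*wanted, len(wanted)))]

# In-memory SQLite stands in for MySQL here; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE objectstags (object_id INTEGER, tag_id INTEGER);
    INSERT INTO objects VALUES (1), (2), (3);
    INSERT INTO tags VALUES (1, 'tag1'), (2, 'tag2'), (3, 'tag3');
    -- object 1: tag1, tag2, tag3; object 2: tag1, tag2; object 3: tag1
    INSERT INTO objectstags VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1);
""")

one = objects_with_all_tags(conn, ["tag1"])
two = objects_with_all_tags(conn, ["tag1", "tag2"])
three = objects_with_all_tags(conn, ["tag1", "tag2", "tag3"])
print(one, two, three)  # [1, 2, 3] [1, 2] [1]
```

Each added tag shrinks the result, which is the "narrow down" behaviour the question was after.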