Select only entities with n relations from an n:m relationship - mysql

I have an m:n relationship of images and tags in my database, which uses a crosstable to model this.
Table imgs contains much more information than just img_id, but that is all that's required to uniquely identify an image.
I want to find every img_id that has both tagA and tagB (and tagC, and so on; I'll build the query string dynamically, so it doesn't really matter whether it's two tags or ten).
Here's where I'm stuck. First, of course, you join imgs with img_tags and tags, and add a WHERE clause for the tags:
SELECT *
FROM imgs
INNER JOIN img_tags ON imgs.img_id = img_tags.img_id
INNER JOIN tags ON img_tags.tag_id = tags.tag_id
WHERE tag = 'tagA' OR tag = 'tagB';
Then you get rows with identical imgs information that differ only in tag and tag_id. Now I should be able to count those rows, keep only the images that appear as many times as tags were supplied (COUNT(*) = n), and use GROUP BY to aggregate them? But I can't quite figure it out.
In case it is relevant: you might assume that both fields in img_tags are foreign keys referencing the other tables. However, that is not the case; they are not linked in any way.

You can use aggregation like this:
SELECT i.*
FROM imgs i JOIN
img_tags it
ON i.img_id = it.img_id JOIN
tags t
ON it.tag_id = t.tag_id
WHERE tag IN ('tagA', 'tagB')
GROUP BY i.img_id
HAVING COUNT(*) = 2;
Aggregating by i.img_id is safe -- and supported by the SQL standard -- assuming that img_id is the primary key in the table.
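To try the GROUP BY / HAVING pattern without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite instead of MySQL and the sample rows are assumptions. The helper `images_with_all_tags` is hypothetical. It builds one `?` placeholder per tag, so the HAVING count always equals the number of tags supplied, which covers the "two or ten tags" case from the question. It counts `DISTINCT t.tag` rather than `*` so duplicate link rows cannot cause false matches.

```python
import sqlite3

def make_db():
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE imgs (img_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT UNIQUE);
        CREATE TABLE img_tags (img_id INTEGER, tag_id INTEGER);
        INSERT INTO imgs VALUES (1, 'A and B'), (2, 'only A'), (3, 'A, B and C');
        INSERT INTO tags VALUES (10, 'tagA'), (20, 'tagB'), (30, 'tagC');
        INSERT INTO img_tags VALUES (1, 10), (1, 20), (2, 10),
                                    (3, 10), (3, 20), (3, 30);
    """)
    return conn

def images_with_all_tags(conn, wanted):
    """Return sorted img_ids tagged with every tag in `wanted` (relational division)."""
    wanted = sorted(set(wanted))                 # de-duplicate so n is correct
    placeholders = ", ".join("?" * len(wanted))  # "?, ?, ..." one per tag
    sql = f"""
        SELECT i.img_id
        FROM imgs i
        JOIN img_tags it ON i.img_id = it.img_id
        JOIN tags t ON it.tag_id = t.tag_id
        WHERE t.tag IN ({placeholders})
        GROUP BY i.img_id
        HAVING COUNT(DISTINCT t.tag) = ?
        ORDER BY i.img_id
    """
    rows = conn.execute(sql, [*wanted, len(wanted)]).fetchall()
    return [r[0] for r in rows]

conn = make_db()
print(images_with_all_tags(conn, ["tagA", "tagB"]))          # [1, 3]
print(images_with_all_tags(conn, ["tagA", "tagB", "tagC"]))  # [3]
```

Image 3 is returned for tagA + tagB even though it also has tagC. The WHERE clause discards the extra tag before counting, so this query means "has at least these tags", not "has exactly these tags".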

Here is one approach using a correlated subquery:
SELECT i.*
FROM imgs i
WHERE (
SELECT COUNT(*)
FROM img_tags it
INNER JOIN tags t ON it.tag_id = t.tag_id
WHERE i.img_id = it.img_id AND t.tag IN('tagA', 'tagB')
) = 2
This assumes no duplicate tags in your data structure. Otherwise, you can use COUNT(DISTINCT t.tag) instead of COUNT(*).
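The duplicate-tag caveat is easy to see with a small experiment. The following sketch runs the correlated subquery above in SQLite (a stand-in for MySQL) against made-up data in which image 2 has the same tagA link stored twice. The helper `matching` is hypothetical and just swaps in the COUNT expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE imgs (img_id INTEGER PRIMARY KEY);
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE img_tags (img_id INTEGER, tag_id INTEGER);
    INSERT INTO imgs VALUES (1), (2);
    INSERT INTO tags VALUES (10, 'tagA'), (20, 'tagB');
    -- image 1 has tagA and tagB; image 2 has tagA twice (a duplicate link row)
    INSERT INTO img_tags VALUES (1, 10), (1, 20), (2, 10), (2, 10);
""")

def matching(count_expr):
    """Run the correlated-subquery filter with the given COUNT expression."""
    sql = f"""
        SELECT i.img_id
        FROM imgs i
        WHERE (SELECT {count_expr}
               FROM img_tags it
               JOIN tags t ON it.tag_id = t.tag_id
               WHERE it.img_id = i.img_id AND t.tag IN ('tagA', 'tagB')) = 2
        ORDER BY i.img_id
    """
    return [r[0] for r in conn.execute(sql)]

print(matching("COUNT(*)"))               # [1, 2]: image 2 is a false positive
print(matching("COUNT(DISTINCT t.tag)"))  # [1]
```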
You can also use aggregation:
SELECT i.img_id
FROM imgs i
INNER JOIN img_tags it ON i.img_id = it.img_id
INNER JOIN tags t ON it.tag_id = t.tag_id
WHERE t.tag IN('tagA', 'tagB')
GROUP BY i.img_id
HAVING COUNT(*) = 2;

If there are not many tags involved, I would just use EXISTS (or NOT EXISTS if you want to exclude some tags) for this:
select *
from imgs
where
exists(select 1 from img_tags it where it.tag_id=(select tag_id from tags where tag='tagA') and it.img_id=imgs.img_id)
and exists(select 1 from img_tags it where it.tag_id=(select tag_id from tags where tag='tagB') and it.img_id=imgs.img_id);
This is especially handy if you end up wanting more complicated boolean expressions, e.g. (A AND (B OR NOT C)).
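A minimal sketch of that (A AND (B OR NOT C)) case, run in SQLite as a stand-in for MySQL with made-up rows. Each tag test is one EXISTS predicate. Unlike the answer's scalar subquery on `tags`, this version joins `tags` inside the EXISTS, and the `HAS_TAG` string is a hypothetical helper. Tag names are bound as parameters in the order the predicates appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE imgs (img_id INTEGER PRIMARY KEY);
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT UNIQUE);
    CREATE TABLE img_tags (img_id INTEGER, tag_id INTEGER);
    INSERT INTO imgs VALUES (1), (2), (3), (4);
    INSERT INTO tags VALUES (10, 'tagA'), (20, 'tagB'), (30, 'tagC');
    INSERT INTO img_tags VALUES
        (1, 10), (1, 20),   -- A and B
        (2, 10), (2, 30),   -- A and C, no B
        (3, 10),            -- A only
        (4, 20);            -- B only
""")

# One correlated EXISTS per tag test; the tag name is a bound parameter.
HAS_TAG = ("EXISTS (SELECT 1 FROM img_tags it JOIN tags t ON t.tag_id = it.tag_id "
           "WHERE it.img_id = imgs.img_id AND t.tag = ?)")

# A AND (B OR NOT C): parameters are consumed left to right.
sql = (f"SELECT img_id FROM imgs WHERE {HAS_TAG} AND ({HAS_TAG} OR NOT {HAS_TAG}) "
       "ORDER BY img_id")
result = [r[0] for r in conn.execute(sql, ("tagA", "tagB", "tagC"))]
print(result)  # [1, 3]
```

Image 3 qualifies because it has A and lacks C. Image 2 fails because it has C but not B.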

Related

Select on join table with exact number of items

I have two tables
tracks
tags
One track has many tags.
I want a list of tracks that have both of two tags, for example tag_id 1 and tag_id 2.
SELECT * FROM tracks
LEFT JOIN tags ON tracks.tag_id = tags.id
WHERE tags.id in (1,2)
GROUP BY tracks.id
HAVING count(tags.id) = 2
The problem is that if a track has tags 1 and 3, it will be listed.
Any help, please?
Add DISTINCT to the COUNT:
SELECT tracks.id FROM tracks
LEFT JOIN tags ON tracks.tag_id = tags.id
WHERE tags.id in (1,2)
GROUP BY tracks.id
HAVING count(Distinct tags.id) = 2
You can change the LEFT JOIN to an INNER JOIN, since your WHERE clause implicitly turns it into one anyway.
Your code should do what you want it to. I would write it as:
SELECT tracks.id
FROM tracks INNER JOIN
tags
ON tracks.tag_id = tags.id
WHERE tags.id in (1, 2)
GROUP BY tracks.id
HAVING count(tags.id) = 2;
Note:
The LEFT JOIN is turned into an INNER JOIN by the WHERE clause. You might as well be specific.
If you have duplicates in tracks, then you want to use COUNT(DISTINCT) rather than COUNT().
Because your original query selects non-aggregated columns (SELECT *), you might get unexpected values in the other columns.
Actually, this can be further simplified to:
SELECT t.id
FROM tracks t
WHERE t.tag_id in (1, 2)
GROUP BY t.id
HAVING count(t.id) = 2;
The JOIN is not needed at all, because you have the information you need in tracks.tag_id.
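Here is a quick check of the simplified query, using SQLite as a stand-in for MySQL. The layout is an assumption inferred from the question: one tracks row per (track, tag) pair, with a made-up duplicate row for track 3. The hypothetical helper `tracks_with_both` swaps in the HAVING expression. The DISTINCT must apply to `tag_id`, because `t.id` is the same for every row in a group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical layout implied by the question: one row per (track, tag)
    CREATE TABLE tracks (id INTEGER, tag_id INTEGER);
    INSERT INTO tracks VALUES
        (1, 1), (1, 2),   -- tags 1 and 2
        (2, 1), (2, 3),   -- tags 1 and 3
        (3, 1), (3, 1);   -- tag 1 stored twice
""")

def tracks_with_both(count_expr):
    """Run the simplified single-table query with the given HAVING expression."""
    sql = f"""
        SELECT t.id FROM tracks t
        WHERE t.tag_id IN (1, 2)
        GROUP BY t.id
        HAVING {count_expr} = 2
        ORDER BY t.id
    """
    return [r[0] for r in conn.execute(sql)]

print(tracks_with_both("COUNT(t.id)"))               # [1, 3]
print(tracks_with_both("COUNT(DISTINCT t.tag_id)"))  # [1]
```

Track 2 (tags 1 and 3) is excluded either way. Only the duplicated row makes plain COUNT misfire.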

MySQL - match all tags rather than any

I have an SQL setup akin to the following:
ARTICLES
id (PK)
name
TAGS
id (PK)
tag
...and a third table logging associations between the two, since there can be multiple tags to each article:
ARTICLE_TAG_ASSOCS
id (PK)
article_id (FK)
tag_id (FK)
Via this question I managed to construct a query that would find articles that were tagged with at least one of a number of tags, e.g.
SELECT articles.*
FROM articles
JOIN article_tag_assocs ata ON articles.id = ata.article_id
JOIN tags ON tags.id = ata.tag_id
WHERE tags.tag = 'budgie' OR tags.tag = 'parrot';
Question: How can I alter the above to find articles that match ALL tags, i.e. both 'budgie' and 'parrot', not just one?
Clearly modifying the logic to
WHERE tags.tag = 'budgie' && tags.tag = 'parrot';
...is logically flawed, since MySQL evaluates the WHERE clause against each row in isolation, and no single row can have both tags, but hopefully you get what I mean.
There are several workable approaches.
One approach is to perform separate JOIN operations for each tag. For example:
SELECT articles.*
FROM articles
JOIN article_tag_assocs ata
ON ata.article_id = articles.id
JOIN tags ta
ON ta.id = ata.tag_id
AND ta.tag = 'budgie'
JOIN article_tag_assocs atb
ON atb.article_id = articles.id
JOIN tags tb
ON tb.id = atb.tag_id
AND tb.tag = 'parrot'
Note that this can return "duplicate" rows if a given article is associated with the same tag value more than once. (Adding the DISTINCT keyword or a GROUP BY clause are ways to eliminate the duplicates.)
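To see the join-per-tag approach and its duplicate rows side by side, here is a sketch in SQLite (a stand-in for MySQL). The data is made up: article 3 has its 'budgie' link stored twice. The query text follows the answer, selecting only `articles.id` to keep the output short. The `{distinct}` slot is a hypothetical device for toggling DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE article_tag_assocs (id INTEGER PRIMARY KEY,
                                     article_id INTEGER, tag_id INTEGER);
    INSERT INTO articles VALUES (1, 'both'), (2, 'budgie only'), (3, 'both, budgie twice');
    INSERT INTO tags VALUES (1, 'budgie'), (2, 'parrot');
    INSERT INTO article_tag_assocs (article_id, tag_id) VALUES
        (1, 1), (1, 2), (2, 1), (3, 1), (3, 1), (3, 2);
""")

# One pair of joins per required tag; every pair must find a match.
JOIN_PER_TAG = """
    SELECT {distinct} articles.id
    FROM articles
    JOIN article_tag_assocs ata ON ata.article_id = articles.id
    JOIN tags ta ON ta.id = ata.tag_id AND ta.tag = 'budgie'
    JOIN article_tag_assocs atb ON atb.article_id = articles.id
    JOIN tags tb ON tb.id = atb.tag_id AND tb.tag = 'parrot'
    ORDER BY articles.id
"""

plain = [r[0] for r in conn.execute(JOIN_PER_TAG.format(distinct=""))]
deduped = [r[0] for r in conn.execute(JOIN_PER_TAG.format(distinct="DISTINCT"))]
print(plain)    # [1, 3, 3]: article 3 appears once per budgie link
print(deduped)  # [1, 3]
```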
Another approach, if we are guaranteed that a given article has no duplicate tag values, is to use an inline view to get the list of article_id that are associated with both tags, and then JOIN that set to the articles table. For example:
SELECT a.*
FROM ( SELECT ata.article_id
FROM article_tag_assocs ata
JOIN tags t
ON t.id = ata.tag_id
WHERE t.tag IN ('budgie','parrot')
GROUP BY ata.article_id
HAVING COUNT(1) = 2
) s
JOIN articles a
ON a.id = s.article_id
Note that the literal "2" in the HAVING clause matches the number of values in the predicate on the tag column. The inline view (aliased as s) returns a distinct list of article_id, and we can join that to the articles table.
This approach is also useful if you want to match, for example, at least three out of four tags. We could use lines like these in the inline view query:
WHERE t.tag IN ('fee','fi','fo','fum')
HAVING COUNT(1) >= 3
Then, any article that matched at least three of those four tags would be returned.
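The "at least k of n tags" variant of the inline view, sketched in SQLite as a stand-in for MySQL with made-up data. The helper `at_least` is hypothetical. Like the answer, it assumes no duplicate (article, tag) links, since it counts rows with `COUNT(1)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE article_tag_assocs (id INTEGER PRIMARY KEY,
                                     article_id INTEGER, tag_id INTEGER);
    INSERT INTO articles VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO tags VALUES (1, 'fee'), (2, 'fi'), (3, 'fo'), (4, 'fum');
    INSERT INTO article_tag_assocs (article_id, tag_id) VALUES
        (1, 1), (1, 2), (1, 3), (1, 4),   -- all four
        (2, 1), (2, 2), (2, 4),           -- three of four
        (3, 1), (3, 2);                   -- two of four
""")

def at_least(k, wanted):
    """Articles carrying at least k of the tags in `wanted` (no duplicate links assumed)."""
    placeholders = ", ".join("?" * len(wanted))
    sql = f"""
        SELECT a.id
        FROM (SELECT ata.article_id
              FROM article_tag_assocs ata
              JOIN tags t ON t.id = ata.tag_id
              WHERE t.tag IN ({placeholders})
              GROUP BY ata.article_id
              HAVING COUNT(1) >= ?) s
        JOIN articles a ON a.id = s.article_id
        ORDER BY a.id
    """
    return [r[0] for r in conn.execute(sql, [*wanted, k])]

four = ["fee", "fi", "fo", "fum"]
print(at_least(3, four))  # [1, 2]
print(at_least(4, four))  # [1]
```

Setting k equal to the number of tags reproduces the "match all" query with `= 2` from above.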
These aren't the only ways to return the specified result; there are several other approaches.
As Roland's answer pointed out, you can also do something like this:
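SELECT a.*
FROM articles a
WHERE a.id IN ( SELECT ata.article_id FROM article_tag_assocs ata JOIN tags t ON t.id = ata.tag_id WHERE t.tag = 'parrot' )
AND a.id IN ( SELECT ata.article_id FROM article_tag_assocs ata JOIN tags t ON t.id = ata.tag_id WHERE t.tag = 'budgie' );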
You could also use an EXISTS clause with a correlated subquery, though this approach doesn't usually perform as well with large sets, due to the number of times the subquery is executed:
SELECT a.*
FROM articles a
WHERE EXISTS ( SELECT 1
FROM article_tag_assocs s1
JOIN tags t1 ON t1.id = s1.tag_id AND t1.tag = 'budgie'
WHERE s1.article_id = a.id
)
AND EXISTS ( SELECT 1
FROM article_tag_assocs s2
JOIN tags t2 ON t2.id = s2.tag_id AND t2.tag = 'parrot'
WHERE s2.article_id = a.id
);
NOTE: In this case, it is possible to reuse the same table aliases within each subquery, because it doesn't lead to ambiguity. I still prefer distinct aliases, though, because table aliases show up in the EXPLAIN output, and distinct aliases make it easier to match the rows in the EXPLAIN output to the references in the query.
What about this?
Will this give bad performance like EXISTS for large data sets?
This query checks which rows of the a1 table have some specified tags and do not have some other specified tags:
SELECT * FROM a1 WHERE a1.id IN
(SELECT taggables.taggable_id FROM taggables WHERE taggables.taggable_type = 'a1' AND taggables.tag_id IN (1))
AND a1.id NOT IN
(SELECT taggables.taggable_id FROM taggables WHERE taggables.taggable_type = 'a1' AND taggables.tag_id IN (2))
ORDER BY a1.file_count DESC LIMIT 0, 5
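Here is a small correctness check of the IN / NOT IN include-exclude query, run in SQLite as a stand-in for MySQL with made-up rows. It does not answer the performance question, which depends on indexes and the optimizer. One caveat worth knowing: if the NOT IN subquery can return a NULL, NOT IN yields no rows at all, so `taggable_id` is declared NOT NULL here as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a1 (id INTEGER PRIMARY KEY, file_count INTEGER);
    CREATE TABLE taggables (taggable_id INTEGER NOT NULL,
                            taggable_type TEXT, tag_id INTEGER);
    INSERT INTO a1 VALUES (1, 5), (2, 9), (3, 7);
    INSERT INTO taggables VALUES
        (1, 'a1', 1),                 -- tag 1 only
        (2, 'a1', 1), (2, 'a1', 2),   -- tags 1 and 2, so excluded
        (3, 'a1', 1),                 -- tag 1 only
        (2, 'other', 1);              -- different taggable_type, ignored
""")

sql = """
    SELECT a1.id FROM a1
    WHERE a1.id IN (SELECT taggable_id FROM taggables
                    WHERE taggable_type = 'a1' AND tag_id IN (1))
      AND a1.id NOT IN (SELECT taggable_id FROM taggables
                        WHERE taggable_type = 'a1' AND tag_id IN (2))
    ORDER BY a1.file_count DESC
    LIMIT 0, 5
"""
ids = [r[0] for r in conn.execute(sql)]
print(ids)  # [3, 1]: highest file_count first
```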

Select documents that are not tagged

Assuming my database has 3 tables:
Documents
Tags
Documents_Tags (the join table)
I know how to find the tags assigned to a document using a LEFT JOIN, but I'm having trouble finding the tags that are not assigned.
SELECT * FROM `documents_tags`
LEFT JOIN `tags` ON `tags`.`id` = `documents_tags`.`tag_id`
WHERE `document_id` = 111;
I've tried different joins, but I keep getting only one record. I thought there was a way to join all the tags and then limit the results to where the document is null?
EDIT: In the above example I need to find all tags not assigned to document 111.
SELECT tags.name -- or whatever the column is called
FROM tags
LEFT JOIN documents_tags dt
ON (tags.id = dt.tag_id AND dt.document_id = 111)
WHERE dt.tag_id IS NULL;
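This is the classic LEFT JOIN anti-join. Here is a sketch in SQLite (a stand-in for MySQL) with made-up tags and assignments. The helper `tags_missing_from` is hypothetical. The key detail is that the document filter sits in the ON clause. If it were moved to WHERE, the NULL-extended rows for unassigned tags would be filtered out, and the query would return nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE documents_tags (document_id INTEGER, tag_id INTEGER);
    INSERT INTO tags VALUES (1, 'red'), (2, 'green'), (3, 'blue');
    INSERT INTO documents_tags VALUES (111, 1), (111, 3), (222, 2);
""")

def tags_missing_from(document_id):
    """Names of tags NOT assigned to the given document, in tag id order."""
    sql = """
        SELECT tags.name
        FROM tags
        LEFT JOIN documents_tags dt
               ON tags.id = dt.tag_id AND dt.document_id = ?  -- filter in ON, not WHERE
        WHERE dt.tag_id IS NULL                               -- keep only unmatched tags
        ORDER BY tags.id
    """
    return [r[0] for r in conn.execute(sql, (document_id,))]

print(tags_missing_from(111))  # ['green']
print(tags_missing_from(222))  # ['red', 'blue']
```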

MySQL - Select by some "tags", include all tags

I've successfully managed to fetch articles filtering by matching tags in an AND manner.
This is my current code:
SELECT *
FROM articles a
JOIN article_tags a_t ON a_t.article_id = a.id
LEFT JOIN tags t ON t.id = a_t.tag_id
WHERE t.caption IN ('fire', 'water', 'earth')
GROUP BY a.id
HAVING COUNT(DISTINCT t.caption) = 3
Where:
articles is the table of articles I want to fetch, with id, title, etc.
tags is the list of tags, with id and caption
article_tags is a relationship table, with article_id and tag_id
Now the problem is that, after matching, I want to retrieve all the tags that each article has. Even though the articles are matched by the same 3 tags, one may have 5 tags and another 4, and I want them all included in each row: something like "tag,tag,tag", or whatever I can parse, in a "tags" column.
Any ideas? I can't find a way around it...
You need to join your query as a subquery with a query that returns all the tags and combines them with GROUP_CONCAT().
select a.*, GROUP_CONCAT(DISTINCT t.caption) tags
from (select distinct a.*
from articles a
JOIN article_tags a_t on a_t.article_id = a.id
JOIN tags t on t.id = a_t.tag_id
WHERE t.caption IN ('fire', 'water', 'earth')
GROUP BY a.id
HAVING COUNT(DISTINCT t.caption) = 3) a
JOIN article_tags a_t on a_t.article_id = a.id
JOIN tags t on t.id = a_t.tag_id
GROUP BY a.id
BTW, there's no reason to use LEFT JOIN in your query, because you only care about rows with matches in tags.
I also wonder about the need for DISTINCT in the COUNT() -- do you really allow multiple tag IDs with the same caption?
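The filter-then-rejoin shape of this answer can be tried in SQLite, which also has GROUP_CONCAT. SQLite as the stand-in and the sample rows are assumptions. The inner query selects `a.id, a.title` instead of `DISTINCT a.*` so that both levels group on explicit columns. GROUP_CONCAT's order is not guaranteed here, so the result is sorted after splitting. In MySQL you could write `GROUP_CONCAT(DISTINCT t.caption ORDER BY t.caption)` instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, caption TEXT);
    CREATE TABLE article_tags (article_id INTEGER, tag_id INTEGER);
    INSERT INTO articles VALUES (1, 'elements'), (2, 'partial');
    INSERT INTO tags VALUES (1, 'fire'), (2, 'water'), (3, 'earth'),
                            (4, 'air'), (5, 'metal');
    INSERT INTO article_tags VALUES
        (1, 1), (1, 2), (1, 3), (1, 4), (1, 5),   -- all three, plus two more
        (2, 1), (2, 2);                           -- missing 'earth'
""")

sql = """
    SELECT a.id, a.title, GROUP_CONCAT(DISTINCT t.caption) AS tags
    FROM (SELECT a.id, a.title                 -- step 1: articles matching all 3
          FROM articles a
          JOIN article_tags a_t ON a_t.article_id = a.id
          JOIN tags t ON t.id = a_t.tag_id
          WHERE t.caption IN ('fire', 'water', 'earth')
          GROUP BY a.id, a.title
          HAVING COUNT(DISTINCT t.caption) = 3) a
    JOIN article_tags a_t ON a_t.article_id = a.id   -- step 2: rejoin ALL their tags
    JOIN tags t ON t.id = a_t.tag_id
    GROUP BY a.id, a.title
"""
rows = conn.execute(sql).fetchall()
for article_id, title, tag_list in rows:
    print(article_id, title, sorted(tag_list.split(",")))
# 1 elements ['air', 'earth', 'fire', 'metal', 'water']
```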

Mysql inner joins manage count() behaviour

I have the following query, which uses the reference tables tags_titles and tags_blogs to compare against the table that holds the tags, called tags. The tags themselves are held in the column t.label.
My problem is that COUNT() sometimes returns an inflated total_matches, usually when the tag is referenced in both tags_titles and tags_blogs. Is there any way to make the inner joins mutually exclusive, or some other solution, so that the count of actual matches against t.label is accurate?
SELECT b.blog_id AS id, b.title AS title, b.body AS body, COUNT(t.label) AS total_matches, b.creation_time AS creation_time, '1' AS type
FROM tags AS t
INNER JOIN tags_titles AS tt
ON tt.tag_id = t.tag_id
INNER JOIN tags_blogs AS tb
ON tb.tag_id = t.tag_id
INNER JOIN blogs AS b
ON tt.blog_id=b.blog_id OR tb.blog_id=b.blog_id
WHERE t.label IN ($in) AND b.title IS NOT NULL
GROUP BY id, title, body, creation_time, type
Your problem is that there is a tags list for titles and a tags list for blogs, and you are getting a Cartesian product of these tags for each blog.
The simple solution to your problem is to use count(distinct):
SELECT b.blog_id AS id, b.title AS title, b.body AS body, COUNT(distinct t.label) AS total_matches,
b.creation_time AS creation_time, '1' AS type
FROM tags AS t
INNER JOIN tags_titles AS tt
ON tt.tag_id = t.tag_id
INNER JOIN tags_blogs AS tb
ON tb.tag_id = t.tag_id
INNER JOIN blogs AS b
ON tt.blog_id=b.blog_id OR tb.blog_id=b.blog_id
WHERE t.label IN ($in) AND b.title IS NOT NULL
GROUP BY id, title, body, creation_time, type;
In more complicated scenarios, you sometimes need to aggregate along the separate dimensions independently before the joins.
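This sketch reproduces the inflation in SQLite (a stand-in for MySQL) with made-up data. It also surfaces a second effect of the two INNER JOINs: a tag present in only one of tags_titles or tags_blogs produces no rows at all. One way to combine the two link dimensions before joining, swapped in here and not taken from the answer, is to UNION the link tables first. UNION removes duplicate (tag_id, blog_id) pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE blogs (blog_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags_titles (tag_id INTEGER, blog_id INTEGER);
    CREATE TABLE tags_blogs (tag_id INTEGER, blog_id INTEGER);
    INSERT INTO tags VALUES (1, 'sql'), (2, 'mysql'), (3, 'joins');
    INSERT INTO blogs VALUES (100, 'Post A'), (200, 'Post B');
    INSERT INTO tags_titles VALUES (1, 100), (1, 200), (3, 100);  -- 'joins' only in titles
    INSERT INTO tags_blogs VALUES (1, 100), (2, 100);             -- 'mysql' only in bodies
""")

def run(sql):
    return conn.execute(sql).fetchall()

# Shape of the query from the question, reduced to blog_id and the count.
ORIGINAL = """
    SELECT b.blog_id, {count} AS total_matches
    FROM tags AS t
    JOIN tags_titles AS tt ON tt.tag_id = t.tag_id
    JOIN tags_blogs AS tb ON tb.tag_id = t.tag_id
    JOIN blogs AS b ON tt.blog_id = b.blog_id OR tb.blog_id = b.blog_id
    WHERE t.label IN ('sql', 'mysql', 'joins') AND b.title IS NOT NULL
    GROUP BY b.blog_id ORDER BY b.blog_id
"""

# Alternative: merge both link tables first, then join and count once.
UNION_FIRST = """
    SELECT b.blog_id, COUNT(DISTINCT t.label) AS total_matches
    FROM (SELECT tag_id, blog_id FROM tags_titles
          UNION
          SELECT tag_id, blog_id FROM tags_blogs) links
    JOIN tags AS t ON t.tag_id = links.tag_id
    JOIN blogs AS b ON b.blog_id = links.blog_id
    WHERE t.label IN ('sql', 'mysql', 'joins') AND b.title IS NOT NULL
    GROUP BY b.blog_id ORDER BY b.blog_id
"""

print(run(ORIGINAL.format(count="COUNT(t.label)")))           # [(100, 2), (200, 1)]
print(run(ORIGINAL.format(count="COUNT(DISTINCT t.label)")))  # [(100, 1), (200, 1)]
print(run(UNION_FIRST))                                       # [(100, 3), (200, 1)]
```

Blog 100 really matches three labels ('sql' in both places, 'joins' in the title, 'mysql' in the body). COUNT(DISTINCT) removes the inflation, but only the union-first version also sees the tags that appear in a single link table.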
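You have another problem: t.label IN ($in). If $in holds the whole list as a single comma-separated string (for example, bound as one parameter), IN compares t.label against that entire string as one value, so it doesn't work. Instead, you can use: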
find_in_set(t.label, $in) > 0;
Or do a direct substitution of the list in SQL. The former method does not use indexes for the filtering. The latter will (if an appropriate one is available).
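A sketch of the difference, using Python's sqlite3 module as a stand-in for a MySQL client with made-up labels. SQLite has no FIND_IN_SET, so only the "direct substitution" route is shown. It is done by generating one `?` placeholder per value, which keeps the values parameterized rather than pasting them into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, label TEXT);
    INSERT INTO tags VALUES (1, 'sql'), (2, 'mysql'), (3, 'joins');
""")

labels = ["sql", "joins"]

# Binding the comma-separated string as ONE value compares label to 'sql,joins'.
one_value = conn.execute(
    "SELECT tag_id FROM tags WHERE label IN (?)", (",".join(labels),)
).fetchall()

# One placeholder per value: a real IN list that can use an index on label.
placeholders = ", ".join("?" * len(labels))
expanded = conn.execute(
    f"SELECT tag_id FROM tags WHERE label IN ({placeholders}) ORDER BY tag_id", labels
).fetchall()

print(one_value)  # []
print(expanded)   # [(1,), (3,)]
```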