I have the following query that uses the reference tables tags_titles and tags_blogs to compare against tags, the table that holds the tags. The tags themselves are stored in the column t.label.
My problem is that COUNT() sometimes returns an excessive total_matches, usually when the same tag is referenced in both tags_titles and tags_blogs. Is there any way to make the inner joins mutually exclusive, or another solution, so that the count of actual matches on the column t.label is accurate?
SELECT b.blog_id AS id, b.title AS title, b.body AS body, COUNT(t.label) AS total_matches, b.creation_time AS creation_time, '1' AS type
FROM tags AS t
INNER JOIN tags_titles AS tt
ON tt.tag_id = t.tag_id
INNER JOIN tags_blogs AS tb
ON tb.tag_id = t.tag_id
INNER JOIN blogs AS b
ON tt.blog_id=b.blog_id OR tb.blog_id=b.blog_id
WHERE t.label IN ($in) AND b.title IS NOT NULL
GROUP BY id, title, body, creation_time, type
Your problem is that there is a tags list for titles and a tags list for blogs, and you are getting a Cartesian product of these tags for each blog.
The simple solution to your problem is to use count(distinct):
SELECT b.blog_id AS id, b.title AS title, b.body AS body, COUNT(distinct t.label) AS total_matches,
b.creation_time AS creation_time, '1' AS type
FROM tags AS t
INNER JOIN tags_titles AS tt
ON tt.tag_id = t.tag_id
INNER JOIN tags_blogs AS tb
ON tb.tag_id = t.tag_id
INNER JOIN blogs AS b
ON tt.blog_id=b.blog_id OR tb.blog_id=b.blog_id
WHERE t.label IN ($in) AND b.title IS NOT NULL
GROUP BY id, title, body, creation_time, type;
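To see the effect on a small data set, here is a minimal sketch using Python's sqlite3 module with an in-memory database. The rows are made up for illustration, and the label list is written out literally instead of coming from $in. It runs the same join once with COUNT(t.label) and once with COUNT(DISTINCT t.label).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blogs (blog_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE tags_titles (tag_id INTEGER, blog_id INTEGER);
CREATE TABLE tags_blogs (tag_id INTEGER, blog_id INTEGER);
-- Hypothetical sample data: 'planes' is linked to blog 1 through both tables
-- and to blog 2 through tags_titles only.
INSERT INTO blogs VALUES (1, 'Boats and planes'), (2, 'Planes only');
INSERT INTO tags VALUES (1, 'boats'), (2, 'planes');
INSERT INTO tags_titles VALUES (1, 1), (2, 1), (2, 2);
INSERT INTO tags_blogs VALUES (1, 1), (2, 1);
""")

# Same joins as in the question; only the COUNT expression varies.
QUERY = """
SELECT b.blog_id, COUNT({expr}) AS total_matches
FROM tags AS t
INNER JOIN tags_titles AS tt ON tt.tag_id = t.tag_id
INNER JOIN tags_blogs AS tb ON tb.tag_id = t.tag_id
INNER JOIN blogs AS b ON tt.blog_id = b.blog_id OR tb.blog_id = b.blog_id
WHERE t.label IN ('boats', 'planes')
GROUP BY b.blog_id
"""

plain = dict(conn.execute(QUERY.format(expr="t.label")).fetchall())
distinct = dict(conn.execute(QUERY.format(expr="DISTINCT t.label")).fetchall())

print(plain)     # blog 1 is counted three times although only two labels match
print(distinct)  # each label is counted once per blog
```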
In more complicated scenarios, you sometimes need to aggregate along the separate dimensions independently before the joins.
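One way to read "aggregate before the joins" is to count title matches and body matches in separate subqueries and only then attach them to blogs. The sketch below assumes the same table layout as the question, with made-up rows, and runs in SQLite through Python; the column names title_matches and body_matches are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blogs (blog_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE tags_titles (tag_id INTEGER, blog_id INTEGER);
CREATE TABLE tags_blogs (tag_id INTEGER, blog_id INTEGER);
INSERT INTO blogs VALUES (1, 'Boats and planes'), (2, 'Planes only');
INSERT INTO tags VALUES (1, 'boats'), (2, 'planes');
INSERT INTO tags_titles VALUES (1, 1), (2, 1), (2, 2);
INSERT INTO tags_blogs VALUES (1, 1), (2, 1);
""")

# Each subquery yields at most one row per blog, so joining both of them
# to blogs cannot multiply rows.
QUERY = """
SELECT b.blog_id,
       COALESCE(tt.n, 0) AS title_matches,
       COALESCE(tb.n, 0) AS body_matches
FROM blogs AS b
LEFT JOIN (
    SELECT x.blog_id, COUNT(DISTINCT t.label) AS n
    FROM tags_titles AS x
    JOIN tags AS t ON t.tag_id = x.tag_id
    WHERE t.label IN ('boats', 'planes')
    GROUP BY x.blog_id
) AS tt ON tt.blog_id = b.blog_id
LEFT JOIN (
    SELECT x.blog_id, COUNT(DISTINCT t.label) AS n
    FROM tags_blogs AS x
    JOIN tags AS t ON t.tag_id = x.tag_id
    WHERE t.label IN ('boats', 'planes')
    GROUP BY x.blog_id
) AS tb ON tb.blog_id = b.blog_id
ORDER BY b.blog_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Keeping the two counts separate also makes it possible to weight title matches differently from body matches later on.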
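You have another potential problem, which is t.label in ($in). If $in reaches the database as a single string parameter (for example 'boats,planes'), IN compares the label against that one string rather than against a list of values. Instead, you can use: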
find_in_set(t.label, $in) > 0;
Or do a direct substitution of the list in SQL. The former method does not use indexes for the filtering. The latter will (if an appropriate one is available).
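The difference can be shown with a short sketch, assuming the list arrives as bound parameters. It uses Python's sqlite3 module, so find_in_set (a MySQL function) is not available here; the example only contrasts a single bound string with one placeholder per value. The table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, label TEXT);
INSERT INTO tags VALUES (1, 'boats'), (2, 'planes'), (3, 'trains');
""")

labels = ["boats", "planes"]

# A single bound parameter is one value: every label is compared against
# the whole string 'boats,planes', so nothing matches.
as_one_string = conn.execute(
    "SELECT COUNT(*) FROM tags WHERE label IN (?)", (",".join(labels),)
).fetchone()[0]

# One placeholder per value gives IN a real list while still using
# bound parameters.
placeholders = ", ".join("?" for _ in labels)
as_list = conn.execute(
    f"SELECT COUNT(*) FROM tags WHERE label IN ({placeholders})", labels
).fetchone()[0]

print(as_one_string, as_list)
```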
Related
I have an m:n relationship of images and tags in my database, which uses a crosstable to model this.
Table imgs contains much more information than just img_id, but that is all that's required to uniquely identify an image.
I want to find every img_id which has both tagA and tagB (and tagC and so on; I'll build this string, so it won't really matter whether it's two or ten tags).
Where I'm stuck is this: of course you first join imgs with img_tags and tags, and add a WHERE clause for the tags:
SELECT *
FROM imgs
INNER JOIN img_tags ON imgs.img_id = img_tags.img_id
INNER JOIN tags ON img_tags.tag_id = tags.tag_id
WHERE tag = 'tagA' OR tag = 'tagB';
and then you'll get rows with identical imgs information, only differing in tag and tag_id. Now I should be able to count those, targeting only those which appear in the same amount as tags were supplied (Count(*) = n), and then using group by to aggregate them? But I can't quite figure it out.
In case it might be of relevance, you might assume the fields in img_tags are both foreign keys referencing the other tables, however that is not the case, they are not linked in any way.
You can use aggregation like this:
SELECT i.*
FROM imgs i JOIN
img_tags it
ON i.img_id = it.img_id JOIN
tags t
ON it.tag_id = t.tag_id
WHERE tag IN ('tagA', 'tagB')
GROUP BY i.img_id
HAVING COUNT(*) = 2;
Aggregating by i.img_id is safe -- and supported by the SQL standard -- assuming that img_id is the primary key in the table.
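Since the question mentions building the tag list dynamically, here is a minimal sketch of that idea in Python with sqlite3. The helper name images_with_all_tags and the sample rows are hypothetical. It generates one placeholder per tag and compares the group count against the number of requested tags, assuming no duplicate rows in img_tags.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE imgs (img_id INTEGER PRIMARY KEY);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE img_tags (img_id INTEGER, tag_id INTEGER);
INSERT INTO imgs VALUES (1), (2), (3);
INSERT INTO tags VALUES (1, 'tagA'), (2, 'tagB'), (3, 'tagC');
INSERT INTO img_tags VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 2), (3, 3);
""")


def images_with_all_tags(conn, wanted):
    """Return the img_ids that carry every tag in `wanted` (hypothetical helper)."""
    wanted = sorted(set(wanted))  # a repeated tag must not inflate the target count
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT i.img_id
        FROM imgs AS i
        JOIN img_tags AS it ON it.img_id = i.img_id
        JOIN tags AS t ON t.tag_id = it.tag_id
        WHERE t.tag IN ({placeholders})
        GROUP BY i.img_id
        HAVING COUNT(*) = ?
        ORDER BY i.img_id
    """
    return [row[0] for row in conn.execute(sql, [*wanted, len(wanted)])]


both = images_with_all_tags(conn, ["tagA", "tagB"])
all_three = images_with_all_tags(conn, ["tagA", "tagB", "tagC"])
print(both, all_three)
```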
Here is one approach using a correlated subquery:
SELECT i.*
FROM imgs i
WHERE (
SELECT COUNT(*)
FROM img_tags it
INNER JOIN tags t ON it.tag_id = t.tag_id
WHERE i.img_id = it.img_id AND t.tag IN('tagA', 'tagB')
) = 2
This assumes no duplicate tags in your data structure. Otherwise, you can use COUNT(DISTINCT t.tag) instead of COUNT(*).
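To illustrate why duplicates matter, the sketch below (Python with sqlite3, made-up rows) links image 2 to tagA twice and never to tagB. With COUNT(*) the image is reported as a match; with COUNT(DISTINCT t.tag) it is not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE imgs (img_id INTEGER PRIMARY KEY);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE img_tags (img_id INTEGER, tag_id INTEGER);
INSERT INTO imgs VALUES (1), (2);
INSERT INTO tags VALUES (1, 'tagA'), (2, 'tagB');
-- Image 1 has tagA and tagB; image 2 has tagA twice (a duplicate link).
INSERT INTO img_tags VALUES (1, 1), (1, 2), (2, 1), (2, 1);
""")

QUERY = """
SELECT i.img_id
FROM imgs AS i
WHERE (
    SELECT COUNT({expr})
    FROM img_tags AS it
    JOIN tags AS t ON t.tag_id = it.tag_id
    WHERE it.img_id = i.img_id AND t.tag IN ('tagA', 'tagB')
) = 2
ORDER BY i.img_id
"""

with_star = [r[0] for r in conn.execute(QUERY.format(expr="*"))]
with_distinct = [r[0] for r in conn.execute(QUERY.format(expr="DISTINCT t.tag"))]

print(with_star)      # image 2 slips through because its duplicate row is counted
print(with_distinct)  # only image 1 really has both tags
```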
You can also use aggregation:
SELECT i.img_id
FROM imgs i
INNER JOIN img_tags it ON i.img_id = it.img_id
INNER JOIN tags t ON it.tag_id = t.tag_id
WHERE t.tag IN('tagA', 'tagB')
GROUP BY i.img_id
HAVING COUNT(*) = 2
If there are not many tags involved, I would just use exists (or not exists if you want to exclude some tags) for this
select *
from imgs
where
exists(select 1 from img_tags it where it.tag_id=(select tag_id from tags where tag='tagA') and it.img_id=imgs.img_id)
and exists(select 1 from img_tags it where it.tag_id=(select tag_id from tags where tag='tagB') and it.img_id=imgs.img_id);
especially if you end up wanting to do more complicated boolean expressions e.g. (A and (B or not C)).
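As a sketch of the (A and (B or not C)) case, the following Python/sqlite3 example reuses one EXISTS fragment three times. The fragment joins through tags by name instead of using a scalar subquery for tag_id, which is a small variation on the query above; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE imgs (img_id INTEGER PRIMARY KEY);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE img_tags (img_id INTEGER, tag_id INTEGER);
INSERT INTO imgs VALUES (1), (2), (3), (4);
INSERT INTO tags VALUES (1, 'tagA'), (2, 'tagB'), (3, 'tagC');
-- img 1: A, B, C   img 2: A, C   img 3: A   img 4: B
INSERT INTO img_tags VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (4, 2);
""")

# One reusable predicate: "this image has the tag bound to the placeholder".
HAS = (
    "EXISTS (SELECT 1 FROM img_tags it JOIN tags t ON t.tag_id = it.tag_id "
    "WHERE it.img_id = imgs.img_id AND t.tag = ?)"
)

# tagA AND (tagB OR NOT tagC); placeholders are bound in order of appearance.
sql = f"SELECT img_id FROM imgs WHERE {HAS} AND ({HAS} OR NOT {HAS}) ORDER BY img_id"
matches = [r[0] for r in conn.execute(sql, ("tagA", "tagB", "tagC"))]

print(matches)
```

Image 2 is excluded because it has tagC without tagB, and image 4 because it lacks tagA.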
I have two tables
tracks
tags
One track has many tags.
I want a list of the tracks that have both of two tags, for example tag_id 1 and tag_id 2.
SELECT * FROM tracks
LEFT JOIN tags ON tracks.tag_id = tags.id
WHERE tags.id in (1,2)
GROUP BY tracks.id
HAVING count(tags.id) = 2
The problem is that a track that has tags 1 and 3 will also be listed.
Any help, please?
Add distinct to count
SELECT tracks.id FROM tracks
LEFT JOIN tags ON tracks.tag_id = tags.id
WHERE tags.id in (1,2)
GROUP BY tracks.id
HAVING count(Distinct tags.id) = 2
You can change the LEFT JOIN to INNER JOIN, since the WHERE clause implicitly converts it to one anyway.
Your code should do what you want it to. I would write it as:
SELECT tracks.id
FROM tracks INNER JOIN
tags
ON tracks.tag_id = tags.id
WHERE tags.id in (1, 2)
GROUP BY tracks.id
HAVING count(tags.id) = 2;
Note:
The LEFT JOIN is turned into an INNER JOIN by the WHERE clause. You might as well be specific.
If you have duplicates in track, then you want to use COUNT(DISTINCT) rather than COUNT().
Because you are returning non-aggregated columns, you might get unexpected results in other columns.
Actually, this can be further simplified to:
SELECT t.id
FROM tracks t
WHERE t.tag_id in (1, 2)
GROUP BY t.id
HAVING count(t.id) = 2;
The JOIN is not needed at all, because you have the information you need in tracks.tag_id.
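A quick check of the simplified query, as a sketch in Python with sqlite3. It assumes tracks holds one row per (id, tag_id) pair, which is what the join condition in the question implies; the rows are made up. Track 11 has tags 1 and 3 and is not returned, while the duplicated row for track 12 shows why COUNT(DISTINCT) is the safer choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed layout: one row per track/tag pair.
CREATE TABLE tracks (id INTEGER, tag_id INTEGER);
INSERT INTO tracks VALUES
    (10, 1), (10, 2),   -- has both tags
    (11, 1), (11, 3),   -- has tag 1 and tag 3 only
    (12, 1), (12, 1);   -- tag 1 stored twice (duplicate row)
""")

QUERY = """
SELECT t.id
FROM tracks AS t
WHERE t.tag_id IN (1, 2)
GROUP BY t.id
HAVING COUNT({expr}) = 2
ORDER BY t.id
"""

plain = [r[0] for r in conn.execute(QUERY.format(expr="*"))]
distinct = [r[0] for r in conn.execute(QUERY.format(expr="DISTINCT t.tag_id"))]

print(plain)     # track 12 appears only because of its duplicate row
print(distinct)  # only track 10 has both tag 1 and tag 2
```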
I've successfully managed to fetch articles filtering by matching tags in an AND manner.
This is my current code:
SELECT *
FROM articles a
JOIN article_tags a_t ON a_t.article_id = a.id
LEFT JOIN tags t ON t.id = a_t.tag_id
WHERE t.caption IN ('fire', 'water', 'earth')
HAVING COUNT(DISTINCT t.caption) = 3
Where:
articles are the articles I want to fetch, with id, title, etc…
tags are the list of tags, with id and caption
article_tags a relationship table, with article_id and tag_id
Now the problem is that, after matching, I want to retrieve all the tags that each article has. Even if the articles are matched by the same 3 tags, one may have 5 tags and another 4, and I want them all included in each row, as something like "tag,tag,tag" or whatever I can parse, in some "tags" column.
Any ideas? I can't find a way around it...
You need to join your query as a subquery with a query that returns all the tags and combines them with GROUP_CONCAT().
select a.*, GROUP_CONCAT(DISTINCT t.caption) tags
from (select distinct a.*
from articles a
JOIN article_tags a_t on a_t.article_id = a.id
JOIN tags t on t.id = a_t.tag_id
WHERE t.caption IN ('fire', 'water', 'earth')
GROUP BY a.id
HAVING COUNT(DISTINCT t.caption) = 3) a
JOIN article_tags a_t on a_t.article_id = a.id
JOIN tags t on t.id = a_t.tag_id
GROUP BY a.id
BTW, there's no reason to use LEFT JOIN in your query, because you only care about rows with matches in tags.
I also wonder about the need for DISTINCT in the COUNT() -- do you really allow multiple tag IDs with the same caption?
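Here is a small runnable sketch of the same two-step idea, using Python's sqlite3 module (SQLite also provides group_concat). The inner query picks the articles that have all three captions, and the outer query joins back to collect every tag of those articles. The sample rows are made up, and only a.id is selected to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, caption TEXT);
CREATE TABLE article_tags (article_id INTEGER, tag_id INTEGER);
INSERT INTO articles VALUES (1, 'Elements'), (2, 'Partial');
INSERT INTO tags VALUES (1, 'fire'), (2, 'water'), (3, 'earth'), (4, 'wind'), (5, 'metal');
-- Article 1 has fire, water, earth, wind; article 2 has fire, water, metal.
INSERT INTO article_tags VALUES (1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 5);
""")

QUERY = """
SELECT a.id, GROUP_CONCAT(t.caption) AS tags
FROM (
    SELECT a.id
    FROM articles AS a
    JOIN article_tags AS a_t ON a_t.article_id = a.id
    JOIN tags AS t ON t.id = a_t.tag_id
    WHERE t.caption IN ('fire', 'water', 'earth')
    GROUP BY a.id
    HAVING COUNT(DISTINCT t.caption) = 3
) AS a
JOIN article_tags AS a_t ON a_t.article_id = a.id
JOIN tags AS t ON t.id = a_t.tag_id
GROUP BY a.id
"""

# The concatenation order is not guaranteed, so split and sort before comparing.
result = {
    article_id: sorted(tag_list.split(","))
    for article_id, tag_list in conn.execute(QUERY)
}
print(result)
```

The outer join has no caption filter, which is why the extra tag 'wind' shows up for article 1.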
I'm using the following query to search for blogs that contain certain words in their titles. Each word is recorded as a unique entry in the table tags and then linked to an actual blog in the table tags_titles. t.label is where the actual tag words are stored.
For some reason this query does not produce any results, unless I input a number, in which case it returns all the blogs without filtering. How can I get this to work?
SELECT tt.blog_id, b.title, COUNT(*) AS total_matches
FROM tags_titles AS tt
INNER JOIN tags AS t
ON tt.tag_id = t.tag_id
LEFT JOIN blogs AS b
ON tt.blog_id=b.blog_id
WHERE t.label IN ('boats','planes')
GROUP BY tt.blog_id
ORDER BY total_matches DESC
I think you want a right join rather than a left join and to fix some other details in the query:
SELECT b.blog_id, b.title, COUNT(t.label) AS total_matches
FROM tags_titles tt INNER JOIN
tags t
ON tt.tag_id = t.tag_id RIGHT JOIN
blogs b
ON tt.blog_id=b.blog_id and
t.label IN ('boat','plane')
GROUP BY b.blog_id
ORDER BY total_matches DESC;
You are asking for something at the blog level. However, the join as written keeps all the tags rather than all the blogs. Once the join keeps every blog, COUNT(t.label) counts only the matching tags and returns 0 for a blog with none (COUNT(*) would never return 0 in this case, because an unmatched blog still produces one row, with NULLs in the tag columns).
If you want at least one match, then include having total_matches > 0.
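The difference between COUNT(t.label) and COUNT(*) can be checked with the sketch below, in Python with sqlite3. Because older SQLite versions do not support RIGHT JOIN, the query is rewritten as a LEFT JOIN starting from blogs with the tag match in a derived table (aliased m here); that rewrite and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blogs (blog_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE tags_titles (tag_id INTEGER, blog_id INTEGER);
INSERT INTO blogs VALUES (1, 'Boats and planes'), (2, 'Cars'), (3, 'Planes');
INSERT INTO tags VALUES (1, 'boats'), (2, 'planes'), (3, 'cars');
INSERT INTO tags_titles VALUES (1, 1), (2, 1), (3, 2), (2, 3);
""")

QUERY = """
SELECT b.blog_id,
       COUNT(m.label) AS total_matches,  -- ignores NULLs, so 0 for unmatched blogs
       COUNT(*)       AS row_count       -- counts the NULL-padded row as well
FROM blogs AS b
LEFT JOIN (
    SELECT tt.blog_id, t.label
    FROM tags_titles AS tt
    JOIN tags AS t ON t.tag_id = tt.tag_id
    WHERE t.label IN ('boats', 'planes')
) AS m ON m.blog_id = b.blog_id
GROUP BY b.blog_id
ORDER BY total_matches DESC, b.blog_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Blog 2 has no matching tag, so its total_matches is 0 while its row_count is still 1.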
So I have three tables. One is posts, with columns id, title, content, timestamp. Another is tags, with columns id, tag. The third, posttags, describes the one-to-many relation between posts and tags, with columns postid, tagid.
Now instead of having columns like hidden,featured etc in the table posts to describe whether a post should be visible to all or should be displayed on a special featured page, I thought why not use tags to save time. So what I decided is that all posts that have a tag #featured will be featured and all posts with tag #hidden will be hidden.
Implementing first one was easy as I could use a join query and in my where clause I could mention WHERE tag='featured' and this would get all the featured posts for me.
But take the example of a post tagged #sports and #hidden. Suppose I were to use the query
SELECT * FROM posts
INNER JOIN posttags ON posttags.postid = posts.id
INNER JOIN tags ON posttags.tagid = tags.id
WHERE tag !='hidden'
That would still return the post tagged hidden, since it is also tagged sports.
PS my question is different from this question : Select a post that does not have a particular tag since it uses tagid directly and I'm unable to achieve same result using double join to check against tag name instead of tagid. And also I wish to retrieve the other tags of the post in same query which is not possible using the method in that question's answers
Group the tags by post, then use the HAVING clause to filter the groups for those that do not contain a 'hidden' tag. Because of MySQL's implicit type conversion and lack of genuine boolean types, one can do:
SELECT posts.*
FROM posts
JOIN posttags ON posttags.postid = posts.id
JOIN tags ON posttags.tagid = tags.id
GROUP BY posts.id
HAVING NOT SUM(tag='hidden')
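The same trick happens to work in SQLite, where a comparison also evaluates to 0 or 1, so it can be tried out with Python's sqlite3 module. The sketch below uses made-up rows and also shows a CASE-based variant that does not rely on implicit boolean-to-integer conversion; both are assumptions for illustration, not a statement about every database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE posttags (postid INTEGER, tagid INTEGER);
INSERT INTO posts VALUES (1, 'Secret match'), (2, 'Public match'), (3, 'Headline');
INSERT INTO tags VALUES (1, 'sports'), (2, 'hidden'), (3, 'featured'), (4, 'news');
-- Post 1: sports + hidden, post 2: sports, post 3: featured + news.
INSERT INTO posttags VALUES (1, 1), (1, 2), (2, 1), (3, 3), (3, 4);
""")

BASE = """
SELECT posts.id
FROM posts
JOIN posttags ON posttags.postid = posts.id
JOIN tags ON posttags.tagid = tags.id
GROUP BY posts.id
HAVING {condition}
ORDER BY posts.id
"""

# Relies on the comparison evaluating to 0 or 1.
implicit = [r[0] for r in conn.execute(
    BASE.format(condition="NOT SUM(tags.tag = 'hidden')"))]

# Spells the same test out with CASE.
explicit = [r[0] for r in conn.execute(
    BASE.format(condition="SUM(CASE WHEN tags.tag = 'hidden' THEN 1 ELSE 0 END) = 0"))]

print(implicit, explicit)
```

Note that a post with no tags at all is dropped by the inner joins in either variant.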
You can do this with a NOT EXISTS subquery:
SELECT p.*, t.* -- what columns you need
FROM posts AS p
INNER JOIN posttags AS pt
ON pt.postid = p.id
INNER JOIN tags AS t
ON pt.tagid = t.id
WHERE NOT EXISTS
( SELECT *
FROM posttags AS pt_no
INNER JOIN tags AS t_no
ON pt_no.tagid = t_no.id
WHERE t_no.tag = 'hidden'
AND pt_no.postid = p.id
) ;
or the equivalent LEFT JOIN / IS NULL:
SELECT p.*, t.*
FROM posts AS p
LEFT JOIN posttags AS pt_no
INNER JOIN tags AS t_no
ON t_no.tag = 'hidden'
AND pt_no.tagid = t_no.id
ON pt_no.postid = p.id
INNER JOIN posttags AS pt
ON pt.postid = p.id
INNER JOIN tags AS t
ON pt.tagid = t.id
WHERE pt_no.postid IS NULL ;
This type of query is called an anti-semijoin or just an anti-join. It's slightly more complex in your case because the condition (tag='hidden') is in a third table.
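A runnable sketch of the NOT EXISTS form, using Python's sqlite3 module and made-up rows. It returns one row per remaining (post, tag) pair, so the other tags of each visible post come back in the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE posttags (postid INTEGER, tagid INTEGER);
INSERT INTO posts VALUES (1, 'Secret match'), (2, 'Public match'), (3, 'Headline');
INSERT INTO tags VALUES (1, 'sports'), (2, 'hidden'), (3, 'featured'), (4, 'news');
-- Post 1: sports + hidden, post 2: sports, post 3: featured + news.
INSERT INTO posttags VALUES (1, 1), (1, 2), (2, 1), (3, 3), (3, 4);
""")

QUERY = """
SELECT p.id, t.tag
FROM posts AS p
INNER JOIN posttags AS pt ON pt.postid = p.id
INNER JOIN tags AS t ON pt.tagid = t.id
WHERE NOT EXISTS (
    -- Anti-join: reject the post if any of its tags is 'hidden'.
    SELECT 1
    FROM posttags AS pt_no
    INNER JOIN tags AS t_no ON pt_no.tagid = t_no.id
    WHERE t_no.tag = 'hidden'
      AND pt_no.postid = p.id
)
ORDER BY p.id, t.tag
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Post 1 disappears entirely, including its 'sports' row, because the subquery is evaluated per post rather than per tag row.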