Been trying to reintroduce myself to SQL through some practice questions I've developed for myself, but struggling to find a better way of approaching the following problem:
playlists
id title
1 Title1
2 Title2
playlist_clips
id playlist_id clip_id
1 1 3
2 2 1
playlist_tags
playlist_id tag_id
1 1
1 2
2 2
Clips and Tags are two entirely separate tables, and I am using the playlist_tags and playlist_clips to connect them to the playlists table, to represent the two-way one-to-many relationships.
I wanted to select all the playlists that have a given title, and have ALL of the tags provided in the query (in this example [1, 2]), not just "at least one of them".
This is what I've come up with:
select p_clips.* from
(
select p.id, p.title, count(pc.id) as number_of_clips
from playlists p
left join playlist_clips pc on p.id = pc.playlist_id
where p.title like "Test1"
group by id
) as p_clips
inner join
(
select *
from playlists p
left join playlist_tags pt on p.id = pt.playlist_id
where pt.tag_id in (1, 2)
group by id
having count(*) = 2
) as p_tags
on p_clips.id = p_tags.id
Whilst my testing shows that this works, it doesn't look particularly elegant, and I assume it's not terribly efficient performance-wise either. (I've removed irrelevant parameters from the code for this example, such as select parameters.)
What would be a cleaner way of approaching this, or at the least, a more optimized approach?
Expected Result:
id title
260 Title1
EDIT: I apologize for my initial confusing post, I've tried to clean up my tables and the information they contain.
I wanted to select all the playlists that have a given title, and have ALL of the tags provided in the query (in this example [1, 2]), not just "at least one of them".
You don't need the clips table at all. You don't need left joins or the playlists table in the subquery.
That suggests:
select p.*
from playlists p join
(select pt.playlist_id
from playlist_tags pt
where pt.tag_id in (1, 2)
group by pt.playlist_id
having count(*) = 2
) pt
on p.id = pt.playlist_id
where p.title like 'Test1';
You could phrase this without a subquery as well:
select p.*
from playlists p join
playlist_tags pt
on p.id = pt.playlist_id
where p.title like 'Test1' and
pt.tag_id in (1, 2)
group by p.id
having count(*) = 2
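For anyone who wants to try the grouped version locally, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for MySQL, and the helper name playlists_with_all_tags and the third playlist row are made up for illustration. It also swaps in COUNT(DISTINCT ...) and an exact title match as a precaution, which is an assumption on my part rather than something the question requires.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; playlist 3 is a made-up extra row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE playlists (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE playlist_tags (playlist_id INTEGER, tag_id INTEGER);
    INSERT INTO playlists VALUES (1, 'Title1'), (2, 'Title2'), (3, 'Title1');
    INSERT INTO playlist_tags VALUES (1, 1), (1, 2), (2, 2), (3, 1);
""")

def playlists_with_all_tags(conn, title, tag_ids):
    """Return playlists with the given title that carry every tag in tag_ids."""
    wanted = sorted(set(tag_ids))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT p.id, p.title
        FROM playlists p
        JOIN playlist_tags pt ON pt.playlist_id = p.id
        WHERE p.title = ? AND pt.tag_id IN ({placeholders})
        GROUP BY p.id, p.title
        HAVING COUNT(DISTINCT pt.tag_id) = ?
        ORDER BY p.id
    """
    # The expected count is derived from the tag list, not hard-coded.
    return conn.execute(sql, [title, *wanted, len(wanted)]).fetchall()

all_tags = playlists_with_all_tags(conn, "Title1", [1, 2])
one_tag = playlists_with_all_tags(conn, "Title1", [1])
print(all_tags, one_tag)
```

Deriving the HAVING count from the length of the tag list keeps the query correct when the number of requested tags changes.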
Related
Given these tables:
id_article | title
1 | super article
2 | another article
id_tag | title
1 | great
2 | awesome
id_relation | id_article | id_tag
1 | 1 | 1
2 | 1 | 2
3 | 2 | 1
I'd like to be able to select all articles that are "great" AND "awesome" (eventually I'll probably have to implement OR too).
If I select from articles and join the relation table on id_article, I obviously can't match two different values of id_tag on the same row. The only lead I had was concatenating the IDs and testing them as a string, but that seems so lame; there has to be a prettier solution.
Oh and if it matters, I use a MySQL server.
EDIT: for ByWaleed, the typical sql select that would surely fail that I cited in my original question:
SELECT
a.id_article,
a.title
FROM articles a, relations r
WHERE
r.id_article = a.id_article and r.id_tag = 1 and r.id_tag = 2
This wouldn't work because r.id_tag obviously can't be both 1 and 2 on the same row. I doubt w3schools has an article on that. My search on Google didn't yield any results, probably because I searched with the wrong keywords.
If you do all the joins as normal, then aggregate the rows to one group by article, then you can assert that they must have at least two different tags.
(Having already filtered to great and/or awesome, that means they have both.)
SELECT
a.id_article,
a.title
FROM
articles a
INNER JOIN
relations r
ON r.id_article = a.id_article
INNER JOIN
tags t
ON t.id_tag = r.id_tag
WHERE
t.title IN ('great', 'awesome')
GROUP BY
a.id_article,
a.title
HAVING
COUNT(DISTINCT t.id_tag) = 2
(The DISTINCT is to avoid the possibility of one article having 'great' twice, for example.)
To do OR, you just remove the HAVING clause.
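To see why the DISTINCT matters, here is a small sketch in Python with an in-memory SQLite database standing in for MySQL. The table names follow the query above; relation row 4 is a made-up duplicate that tags article 2 as 'great' twice, and the helper ids is just for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id_article INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (id_tag INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE relations (id_relation INTEGER PRIMARY KEY,
                            id_article INTEGER, id_tag INTEGER);
    INSERT INTO articles VALUES (1, 'super article'), (2, 'another article');
    INSERT INTO tags VALUES (1, 'great'), (2, 'awesome');
    -- Row 4 is a made-up duplicate: article 2 tagged 'great' twice.
    INSERT INTO relations VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 1);
""")

BASE = """
    SELECT a.id_article
    FROM articles a
    JOIN relations r ON r.id_article = a.id_article
    JOIN tags t ON t.id_tag = r.id_tag
    WHERE t.title IN ('great', 'awesome')
    GROUP BY a.id_article
"""

def ids(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

any_tag = ids(BASE)                                           # OR: no HAVING
both_distinct = ids(BASE + " HAVING COUNT(DISTINCT t.id_tag) = 2")
both_naive = ids(BASE + " HAVING COUNT(*) = 2")               # fooled by the duplicate
print(any_tag, both_distinct, both_naive)
```

With the duplicate row present, COUNT(*) wrongly lets article 2 through, while COUNT(DISTINCT ...) returns only article 1.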
One approach is to aggregate by article, and then assert that the article has both the "great" and "awesome" tags:
SELECT
a.id_article,
a.title
FROM articles a
INNER JOIN relations r
ON a.id_article = r.id_article
INNER JOIN tags t
ON r.id_tag = t.id_tag
WHERE
t.title IN ('great', 'awesome')
GROUP BY
a.id_article,
a.title
HAVING
MIN(t.title) <> MAX(t.title);
The logic here is that we first limit the records for each article to only those with one of the two target tags. Then we assert, in the HAVING clause, that both tags appear. I use a MIN/MAX trick here: if the min and max differ, there must be two distinct tags.
Step 1: Use a temp table to get all articles with their tag titles.
Step 2: If an article occurs multiple times in your temp table, that means it has both great and awesome as tags.
Try:
CREATE TEMPORARY TABLE MyTempTable (
select t1.id_article, t2.title
from table1 t1
inner join table3 t3 on t3.id_article = t1.id_article
inner join table2 t2 on t2.id_tag = t3.id_tag
);
select m.id_article
from MyTempTable m
group by m.id_article
having count(*)>1
Edit: This solution assumes there are two possible tags, great and awesome. If more, please add a "where" clause to the select query for creating the temp table like where t2.title in ('great','awesome')
I want to retrieve variants of the same article which are part of the same group. The DBMS is MySQL 5.7.
There are 2 tables:
articles
articles_group
Table articles has the fields:
article_id | title
1 first product
2 second prod
3 3rd prod
4 4th example
Table articles_groups:
group_id | article_id
1 1
1 2
1 3
2 4
In this example I would like to retrieve all other articles which are in the same group as article 1. So that would be article 2 and 3.
My best shot, but somehow pretty complex:
SELECT
art.article_id,
model
FROM
articles art
INNER JOIN
articles_group art_g ON art.article_id = art_g.article_id
WHERE art_g.group_id = (
SELECT ag.group_id
FROM articles a
INNER JOIN articles_group ag ON a.article_id = ag.article_id
WHERE a.article_id = 1
)
How can I retrieve all other articles which belong to the same group as a given article in an easy way? I can still change the schema if there is a better setup.
Let's say your given article is 1. To get all articles in the same group as the given article, use a subquery to get the group_id of the given article, then use the outer query to get every article in that group.
SELECT a.article_id, a.title
FROM articles a
JOIN articles_groups g ON a.article_id = g.article_id
WHERE g.group_id = (
SELECT g.group_id
FROM articles a
JOIN articles_groups g ON a.article_id = g.article_id
WHERE a.article_id = 1
)
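If you only want the other articles in the group, a self-join on the group table is one alternative. Below is a minimal sketch in Python with an in-memory SQLite database standing in for MySQL. The function name siblings and the aliases mine and other are made up for illustration, and it assumes the link table is called articles_groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE articles_groups (group_id INTEGER, article_id INTEGER);
    INSERT INTO articles VALUES (1, 'first product'), (2, 'second prod'),
                                (3, '3rd prod'), (4, '4th example');
    INSERT INTO articles_groups VALUES (1, 1), (1, 2), (1, 3), (2, 4);
""")

def siblings(conn, article_id):
    """Return the other articles that share a group with article_id."""
    sql = """
        SELECT a.article_id, a.title
        FROM articles_groups mine
        JOIN articles_groups other
          ON other.group_id = mine.group_id
         AND other.article_id <> mine.article_id   -- leave out the article itself
        JOIN articles a ON a.article_id = other.article_id
        WHERE mine.article_id = ?
        ORDER BY a.article_id
    """
    return conn.execute(sql, (article_id,)).fetchall()

siblings_of_1 = siblings(conn, 1)
siblings_of_4 = siblings(conn, 4)
print(siblings_of_1, siblings_of_4)
```

Unlike the scalar subquery, the self-join does not fail if an article ever ends up in more than one group.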
I have 3 tables: tags, products and relation table between them.
Relation table looks for example like this:
tagId | ProductId
1 | 1
2 | 1
2 | 9
The user can pick two options "All of these" or "One of these".
So if the user picks "All of these", it means that the product must have exactly the tags the user chose.
So if the user picks the tags with id 1 and 2, it should select only the product with id 1, because this product has exactly the tags the user chose. (Likewise, if the user picks only the tag with id 2, it should select only the product with id 9.)
So the product has to have all the tags the user chose (no more, no less).
SQL that I already have for Any/One of these:
SELECT DISTINCT s.SKU
FROM SKUToEAN as s
LEFT JOIN ProductDetails as p ON s.ProductDetailID=p.id
JOIN ProductTagRelation as ptr ON (ptr.productId=p.id and ptr.tagId IN(Ids of selected tags))
Example behavior:
TagId = 1 it should select => None
TagId = 2 it should select => 9
TagId = 1,2 it should select = 1,9
So probably I need two queries. One for any/one of these ( I already have this one ) and the second for all of these.
With PHP I decide which query to use.
You can GROUP BY the ProductID and filter with conditional aggregation inside the HAVING clause. MySQL automatically casts boolean values to 0/1 when they are used in a numeric context. So, for a specific tagID value to be present for a ProductID, its SUM(tagId = ..) should be at least 1.
All of these:
SELECT ptr.productId, s.SKU
FROM SKUToEAN AS s
LEFT JOIN ProductDetails AS p
ON p.id = s.ProductDetailID
JOIN ProductTagRelation AS ptr
ON ptr.productId = p.id
GROUP BY ptr.productId, s.SKU
HAVING SUM(ptr.tagID = 1) AND -- 1 should be there
SUM(ptr.tagID = 2) AND -- 2 should be there
NOT SUM(ptr.tagID NOT IN (1,2)) -- other than 1,2 should not be there
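The same "no more, no less" test can be built for an arbitrary list of tags. Here is a minimal sketch in Python with an in-memory SQLite database standing in for MySQL. Only the relation table from the question is modelled, the helper name products_with_exactly is made up, and CASE expressions replace MySQL's boolean-to-integer shortcut so the SQL stays portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductTagRelation (tagId INTEGER, productId INTEGER);
    INSERT INTO ProductTagRelation VALUES (1, 1), (2, 1), (2, 9);
""")

def products_with_exactly(conn, tag_ids):
    """Return ids of products whose tag set equals tag_ids exactly."""
    wanted = sorted(set(tag_ids))
    marks = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT productId
        FROM ProductTagRelation
        GROUP BY productId
        HAVING COUNT(DISTINCT tagId) = ?                 -- as many tags as requested
           AND SUM(CASE WHEN tagId NOT IN ({marks})
                        THEN 1 ELSE 0 END) = 0           -- and none outside the list
        ORDER BY productId
    """
    return [row[0] for row in conn.execute(sql, [len(wanted), *wanted])]

exact_1_2 = products_with_exactly(conn, [1, 2])
exact_2 = products_with_exactly(conn, [2])
exact_1 = products_with_exactly(conn, [1])
print(exact_1_2, exact_2, exact_1)
```

If a product has exactly as many distinct tags as were requested and none of them fall outside the list, its tag set must equal the requested one.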
Is this what you are looking for (for the "all" condition)?
select products.id
from products
inner join <table> on products.id = <table>.productId
group by products.id
having group_concat(<table>.tagId order by <table>.tagId separator ',') = '1,2';
I thought a query like this would be pretty easy because of the nature of relational databases but it seems to be giving me a fit. I also searched around but found nothing that really helped. Here's the situation:
Let's say I have a simple relationship for products and product tags. This is a one-to-many relationship, so we could have the following:
productid | tag
========================
1 | Car
1 | Black
1 | Ford
2 | Car
2 | Red
2 | Ford
3 | Car
3 | Black
3 | Lexus
4 | Motorcycle
4 | Black
5 | Skateboard
5 | Black
6 | Skateboard
6 | Green
What's the most efficient way to query for all (Ford OR Black OR Skateboard) AND NOT (Motorcycles OR Green)? Another query I'm going to need to do is something like all (Car) or (Skateboard) or (Green AND Motorcycle) or (Red AND Motorcycle).
There are about 150k records in the products table and 600k records in the tags tables, so the query is going to need to be as efficient as possible. Here's one query that I've been messing around with (example #1), but it seems to be taking about 4 seconds or so. Any help would be much appreciated.
SELECT p.productid
FROM products p
JOIN producttags tag1 USING (productid)
WHERE p.active = 1
AND tag1.tag IN ( 'Ford', 'Black', 'Skateboard' )
AND p.productid NOT IN (SELECT productid
FROM producttags
WHERE tag IN ( 'Motorcycle', 'Green' ));
Update
The quickest query I've found so far is something like this. It's taking 100-200ms but it seems pretty inflexible and ugly. Basically I'm grabbing all products that match Ford, Black, or Skateboard. Then I'm concatenating all of the tags for those matched products into a colon-separated string and removing all products that match on :Green: or :Motorcycle:. Any thoughts?
SELECT p.productid,
Concat(':', Group_concat(alltags.tag SEPARATOR ':'), ':') AS taglist
FROM products p
JOIN producttags tag1 USING (productid)
JOIN producttags alltags USING (productid)
WHERE p.active = 1
AND tag1.tag IN ( 'Ford', 'Black', 'Skateboard' )
GROUP BY tag1.productid
HAVING ( taglist NOT LIKE '%:Motorcycle:%'
AND taglist NOT LIKE '%:Green:%' );
I'd write the exclusion join with no subqueries:
SELECT p.productid
FROM products p
INNER JOIN producttags AS t ON p.productid = t.productid
LEFT OUTER JOIN producttags AS x ON p.productid = x.productid
AND x.tag IN ('Motorcycle', 'Green')
WHERE p.active = 1
AND t.tag IN ( 'Ford', 'Black', 'Skateboard' )
AND x.productid IS NULL;
Make sure you have an index on products over the two columns (active, productid) in that order.
You should also have an index on producttags over the two columns (productid, tag) in that order.
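Here is a small runnable sketch of the exclusion join in Python, with an in-memory SQLite database standing in for MySQL. It loads the sample rows from the question, assumes all six products are active, and uses made-up index names. It also shows that the join returns one row per matching tag, so a DISTINCT may be wanted.

```python
import sqlite3

TAGS = [(1, "Car"), (1, "Black"), (1, "Ford"), (2, "Car"), (2, "Red"), (2, "Ford"),
        (3, "Car"), (3, "Black"), (3, "Lexus"), (4, "Motorcycle"), (4, "Black"),
        (5, "Skateboard"), (5, "Black"), (6, "Skateboard"), (6, "Green")]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (productid INTEGER PRIMARY KEY, active INTEGER);
    CREATE TABLE producttags (productid INTEGER, tag TEXT);
    -- Index names are hypothetical; the column order follows the advice above.
    CREATE INDEX idx_products_active ON products (active, productid);
    CREATE INDEX idx_producttags_product_tag ON producttags (productid, tag);
""")
# Assumption for the demo: every product is active.
conn.executemany("INSERT INTO products VALUES (?, 1)", [(i,) for i in range(1, 7)])
conn.executemany("INSERT INTO producttags VALUES (?, ?)", TAGS)

QUERY = """
    SELECT {select} p.productid
    FROM products p
    JOIN producttags t ON t.productid = p.productid
    LEFT JOIN producttags x ON x.productid = p.productid
                           AND x.tag IN ('Motorcycle', 'Green')
    WHERE p.active = 1
      AND t.tag IN ('Ford', 'Black', 'Skateboard')
      AND x.productid IS NULL        -- keep only products with no excluded tag
    ORDER BY p.productid
"""
raw_ids = [r[0] for r in conn.execute(QUERY.format(select=""))]
unique_ids = [r[0] for r in conn.execute(QUERY.format(select="DISTINCT"))]
print(raw_ids, unique_ids)
```

Products 1 and 5 each match two of the wanted tags, which is why they appear twice without DISTINCT.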
Another query I'm going to need to do is something like all (Car) or (Skateboard) or (Green AND Motorcycle) or (Red AND Motorcycle).
Sometimes these complex conditions are hard for the MySQL optimizer. One common workaround is to use UNION to combine simpler queries:
SELECT p.productid
FROM products p
INNER JOIN producttags AS t1 ON p.productid = t1.productid
WHERE p.active = 1
AND t1.tag IN ('Car', 'Skateboard')
UNION ALL
SELECT p.productid
FROM products p
INNER JOIN producttags AS t1 ON p.productid = t1.productid
INNER JOIN producttags AS t2 ON p.productid = t2.productid
WHERE p.active = 1
AND t1.tag IN ('Motorcycle')
AND t2.tag IN ('Green', 'Red');
PS: Your tagging table is not an Entity-Attribute-Value table.
I would get all the unique ID matches and the unique IDs to filter out, then LEFT JOIN those lists (as per tigeryan) and filter out any IDs that match. The query should also be easier to read and modify by keeping all the queries separate. It should be fairly quick also, although it may not look like it.
SELECT * FROM products p
WHERE
p.active=1 AND
productid IN (
SELECT matches.productid FROM (
SELECT DISTINCT productid FROM producttags
WHERE tag IN ('Ford','Black','Skateboard')
) AS matches
LEFT JOIN (
SELECT DISTINCT productid FROM producttags
WHERE tag IN ('Motorcycle','Green')
) AS filter ON filter.productid=matches.productid
WHERE filter.productid IS NULL
)
Sometimes a JOIN is faster than an IN, depending on how MySQL optimizes the query:
SELECT p.* FROM (
SELECT matches.productid FROM (
SELECT DISTINCT productid FROM producttags
WHERE tag IN ('Ford','Black','Skateboard')
) AS matches
LEFT JOIN (
SELECT DISTINCT productid FROM producttags
WHERE tag IN ('Motorcycle','Green')
) AS filter ON filter.productid=matches.productid
WHERE filter.productid IS NULL
) AS idfilter
JOIN products p ON p.productid=idfilter.productid AND p.active=1
The second query should force the join order since the internal selects have to be done first.
I would usually attack this by trying to eliminate records in the from...
select p.productid
from products p
join producttags tag1
on p.productid = tag1.productid and tag1.tag IN ('Ford','Black','Skateboard')
left join producttags excluded
on p.productid = excluded.productid and excluded.tag IN ('Motorcycle','Green')
where excluded.productid is null and p.active = 1
What about this one:
SELECT DISTINCT p.productid FROM products AS p
JOIN producttags AS included ON (
included.productid = p.productid
AND included.tag IN ('Ford', 'Black', 'Skateboard')
)
WHERE p.active = 1
AND p.productid NOT IN (
SELECT DISTINCT productid FROM producttags
WHERE tag IN ('Motorcycle', 'Green')
)
Alternative to the CONCAT/LIKE solution:
SELECT p.productid
FROM products p
JOIN producttags USING (productid)
WHERE p.active = 1
GROUP BY p.productid
-- Both tag tests live in HAVING so the aggregate still sees the excluded tags
HAVING SUM(IF(tag IN ('Ford', 'Black', 'Skateboard'), 1, 0)) > 0
AND SUM(IF(tag IN ('Motorcycle','Green'), 1, 0)) = 0;
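One thing to watch with this pattern: a tag filter in WHERE runs before grouping, so HAVING only sees the rows that survived it. The sketch below compares both placements in Python, with an in-memory SQLite database standing in for MySQL. It uses the sample rows from the question, writes CASE instead of MySQL's IF(), and leaves out the products table for brevity; the helper ids is made up for the demo.

```python
import sqlite3

TAGS = [(1, "Car"), (1, "Black"), (1, "Ford"), (2, "Car"), (2, "Red"), (2, "Ford"),
        (3, "Car"), (3, "Black"), (3, "Lexus"), (4, "Motorcycle"), (4, "Black"),
        (5, "Skateboard"), (5, "Black"), (6, "Skateboard"), (6, "Green")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE producttags (productid INTEGER, tag TEXT)")
conn.executemany("INSERT INTO producttags VALUES (?, ?)", TAGS)

def ids(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# Wanted tags filtered in WHERE: the excluded tags are gone before HAVING runs,
# so the exclusion test never fires.
filter_in_where = ids("""
    SELECT productid FROM producttags
    WHERE tag IN ('Ford', 'Black', 'Skateboard')
    GROUP BY productid
    HAVING SUM(CASE WHEN tag IN ('Motorcycle', 'Green') THEN 1 ELSE 0 END) = 0
""")

# Both tests in HAVING: every tag of the product is visible to the aggregates.
filter_in_having = ids("""
    SELECT productid FROM producttags
    GROUP BY productid
    HAVING SUM(CASE WHEN tag IN ('Ford', 'Black', 'Skateboard') THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN tag IN ('Motorcycle', 'Green') THEN 1 ELSE 0 END) = 0
""")
print(filter_in_where, filter_in_having)
```

Only the second form drops products 4 and 6, at the cost of aggregating every tag row rather than just the matching ones.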
I have a table "articles" with columns and data:
article_id title body
1 This is the title This is the body text
2 Another title Another body text
Another table "category" with columns and data:
category_id category
1 localnews
2 visible
3 first10
And a table "categories" with columns and data:
categories_id article_id category_id
1 1 1
2 1 2
3 1 3
4 2 1
5 2 3
I want to SELECT the row(s) WHERE categories.category_id = 1 AND =2 AND =3
I'm using:
SELECT articles.article_id, articles.title, articles.body,
categories.article_id, categories.category_id
FROM articles, categories
WHERE articles.article_id = categories.article_id
AND categories.category_id = 1
AND categories.category_id = 2
AND categories.category_id = 3
but it doesn't work. Obviously MySQL needs another syntax.
Can someone help?
Thanks
SELECT
Articles.article_id,
COUNT( CategoryArticles.article_id ) AS total
FROM CategoryArticles
LEFT JOIN Articles USING (article_id)
WHERE
CategoryArticles.category_id IN (1,2,3)
GROUP BY CategoryArticles.article_id
HAVING total = 3
I used slightly different table names because in your example the distinction between category and categories is hard to notice.
A column of a row cannot be 1, 2 and 3 at the same time, which is what AND stipulates. Use OR in your WHERE condition. Better yet, for readability, you can use IN:
SELECT ...
WHERE `categories`.`category_id` IN(1,2,3)
In addition to the commonly used IN() and using a HAVING count, I would be interested in the performance difference by doing a multiple-join as follows...
SELECT STRAIGHT_JOIN
articles.article_id,
articles.title,
articles.body
FROM
categories c1
JOIN articles
on c1.article_id = articles.article_id
JOIN categories c2
on c1.article_id = c2.article_id
AND c2.category_id = 2
JOIN categories c3
on c1.article_id = c3.article_id
AND c3.category_id = 3
WHERE
c1.Category_ID = 1
Yes, this may look obscure, but let's think about it. The first instance of categories in the FROM clause, the one filtered in the WHERE clause, should be whichever category has the fewest rows. Take your categories of Local News, Visible and First 10: Local News would probably have the most entries, while Visible and First 10 would have fewer. Whichever of those has the smallest number of records is the category to use in the WHERE clause.
So, say you have 100,000 articles, and 90,000 are in Local News, 45,000 in Visible, and 12,000 in First 10. By starting your query with only the 12,000, you eliminate most of the data up front.
Then join to the articles table, and to categories again as aliases c2 and c3 for the other conditions. If matching rows are found, the article is kept; if not, it's excluded.
Again, I'm curious about the performance impact. I would also have a compound index on the categories table on (article_id, category_id).
The value cannot be all three values simultaneously, so you'd better use an IN clause in your WHERE to define which ones you want returned. Given you've already got a join condition there, you'd want to move that to an ON clause as well, i.e.:
SELECT articles.article_id, articles.title, articles.body, categories.article_id, categories.category_id
FROM articles
INNER JOIN categories ON articles.article_id = categories.article_id
WHERE categories.category_id IN ( 1, 2, 3 )
Of course, you can go to the next step and do:
SELECT articles.article_id, articles.title, articles.body, category.category
FROM articles
INNER JOIN categories ON articles.article_id = categories.article_id
INNER JOIN category ON categories.category_id = category.category_id
WHERE categories.category_id IN ( 1, 2, 3 )
If instead you wanted to show only articles that appear in all three categories, then you could take an approach like:
SELECT articles.article_id, articles.title, articles.body
FROM articles
INNER JOIN categories AS c1
ON articles.article_id = c1.article_id
AND c1.category_id = 1
INNER JOIN categories AS c2
ON articles.article_id = c2.article_id
AND c2.category_id = 2
INNER JOIN categories AS c3
ON articles.article_id = c3.article_id
AND c3.category_id = 3
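To check that the multi-join form and the grouped count form agree on the sample data, here is a minimal sketch in Python with an in-memory SQLite database standing in for MySQL. The index name and the helper ids are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, title TEXT, body TEXT);
    CREATE TABLE categories (categories_id INTEGER PRIMARY KEY,
                             article_id INTEGER, category_id INTEGER);
    -- Hypothetical name for the compound index suggested in an earlier answer.
    CREATE INDEX idx_categories_article_category
        ON categories (article_id, category_id);
    INSERT INTO articles VALUES (1, 'This is the title', 'This is the body text'),
                                (2, 'Another title', 'Another body text');
    INSERT INTO categories VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 1), (5, 2, 3);
""")

def ids(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# One join per required category: a row survives only if all three exist.
by_joins = ids("""
    SELECT a.article_id
    FROM articles a
    JOIN categories c1 ON c1.article_id = a.article_id AND c1.category_id = 1
    JOIN categories c2 ON c2.article_id = a.article_id AND c2.category_id = 2
    JOIN categories c3 ON c3.article_id = a.article_id AND c3.category_id = 3
""")

# Grouped form: count the distinct matching categories per article.
by_count = ids("""
    SELECT article_id FROM categories
    WHERE category_id IN (1, 2, 3)
    GROUP BY article_id
    HAVING COUNT(DISTINCT category_id) = 3
""")

# Plain IN returns articles that are in any of the categories.
in_any = ids("SELECT DISTINCT article_id FROM categories WHERE category_id IN (1, 2, 3)")
print(by_joins, by_count, in_any)
```

Which form is faster on real data depends on the row counts and indexes, so it is worth comparing the EXPLAIN output in MySQL rather than trusting this toy example.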