I thought a query like this would be pretty easy because of the nature of relational databases but it seems to be giving me a fit. I also searched around but found nothing that really helped. Here's the situation:
Let's say I have a simple relationship for products and product tags. This is a one-to-many relationship, so we could have the following:
productid | tag
========================
1 | Car
1 | Black
1 | Ford
2 | Car
2 | Red
2 | Ford
3 | Car
3 | Black
3 | Lexus
4 | Motorcycle
4 | Black
5 | Skateboard
5 | Black
6 | Skateboard
6 | Green
What's the most efficient way to query for all (Ford OR Black OR Skateboard) AND NOT (Motorcycle OR Green)? Another query I'm going to need to do is something like all (Car) or (Skateboard) or (Green AND Motorcycle) or (Red AND Motorcycle).
There are about 150k records in the products table and 600k records in the tags table, so the query is going to need to be as efficient as possible. Here's one query that I've been messing around with (example #1), but it seems to be taking about 4 seconds or so. Any help would be much appreciated.
SELECT p.productid
FROM products p
JOIN producttags tag1 USING (productid)
WHERE p.active = 1
AND tag1.tag IN ( 'Ford', 'Black', 'Skateboard' )
AND p.productid NOT IN (SELECT productid
FROM producttags
WHERE tag IN ( 'Motorcycle', 'Green' ));
Update
The quickest query I've found so far is something like this. It's taking 100-200ms but it seems pretty inflexible and ugly. Basically I'm grabbing all products that match Ford, Black, or Skateboard. Then I'm concatenating all of the tags for those matched products into a colon-separated string and removing all products whose string contains :Green: or :Motorcycle:. Any thoughts?
SELECT p.productid,
Concat(':', Group_concat(alltags.tag SEPARATOR ':'), ':') AS taglist
FROM products p
JOIN producttags tag1 USING (productid)
JOIN producttags alltags USING (productid)
WHERE p.active = 1
AND tag1.tag IN ( 'Ford', 'Black', 'Skateboard' )
GROUP BY tag1.productid
HAVING ( taglist NOT LIKE '%:Motorcycle:%'
AND taglist NOT LIKE '%:Green:%' );
I'd write the exclusion join with no subqueries:
SELECT p.productid
FROM products p
INNER JOIN producttags AS t ON p.productid = t.productid
LEFT OUTER JOIN producttags AS x ON p.productid = x.productid
AND x.tag IN ('Motorcycle', 'Green')
WHERE p.active = 1
AND t.tag IN ( 'Ford', 'Black', 'Skateboard' )
AND x.productid IS NULL;
Make sure you have an index on products over the two columns (active, productid) in that order.
You should also have an index on producttags over the two columns (productid, tag) in that order.
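The exclusion join can be checked against the sample data from the question. This is a minimal sketch that assumes an in-memory SQLite database (via Python's sqlite3) as a stand-in for MySQL, so it only illustrates the logic and says nothing about MySQL timings. The index names are hypothetical, and DISTINCT is an addition here, because a product such as 1 matches both Black and Ford and would otherwise be listed twice.

```python
import sqlite3

# Sample tags copied from the question.
TAGS = [
    (1, "Car"), (1, "Black"), (1, "Ford"),
    (2, "Car"), (2, "Red"), (2, "Ford"),
    (3, "Car"), (3, "Black"), (3, "Lexus"),
    (4, "Motorcycle"), (4, "Black"),
    (5, "Skateboard"), (5, "Black"),
    (6, "Skateboard"), (6, "Green"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (productid INTEGER PRIMARY KEY, active INTEGER)")
conn.execute("CREATE TABLE producttags (productid INTEGER, tag TEXT)")
conn.executemany("INSERT INTO products VALUES (?, 1)", [(i,) for i in range(1, 7)])
conn.executemany("INSERT INTO producttags VALUES (?, ?)", TAGS)

# The two composite indexes suggested above (index names are made up).
conn.execute("CREATE INDEX idx_products_active ON products (active, productid)")
conn.execute("CREATE INDEX idx_tags_product_tag ON producttags (productid, tag)")

rows = conn.execute("""
    SELECT DISTINCT p.productid
    FROM products p
    INNER JOIN producttags AS t ON p.productid = t.productid
    LEFT OUTER JOIN producttags AS x ON p.productid = x.productid
        AND x.tag IN ('Motorcycle', 'Green')
    WHERE p.active = 1
      AND t.tag IN ('Ford', 'Black', 'Skateboard')
      AND x.productid IS NULL   -- keep only products with no excluded tag
    ORDER BY p.productid
""").fetchall()

exclusion_join_ids = [r[0] for r in rows]
print(exclusion_join_ids)  # [1, 2, 3, 5]
```

Products 4 and 6 drop out because each has a row in x (Motorcycle and Green respectively), so x.productid is not NULL for them.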
Another query I'm going to need to do is something like all (Car) or (Skateboard) or (Green AND Motorcycle) or (Red AND Motorcycle).
Sometimes these complex conditions are hard for the MySQL optimizer. One common workaround is to use UNION to combine simpler queries:
SELECT p.productid
FROM products p
INNER JOIN producttags AS t1 ON p.productid = t1.productid
WHERE p.active = 1
AND t1.tag IN ('Car', 'Skateboard')
UNION ALL
SELECT p.productid
FROM products p
INNER JOIN producttags AS t1 ON p.productid = t1.productid
INNER JOIN producttags AS t2 ON p.productid = t2.productid
WHERE p.active = 1
AND t1.tag IN ('Motorcycle')
AND t2.tag IN ('Green', 'Red');
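A small check of the two-branch approach, again as a sketch that assumes SQLite (via Python's sqlite3) in place of MySQL. Product 7 is hypothetical and exists only so the (Red AND Motorcycle) branch matches something. The sketch swaps UNION ALL for UNION so that a product satisfying both branches would be listed once; with UNION ALL it would appear twice.

```python
import sqlite3

# Sample tags from the question, plus hypothetical product 7.
TAGS = [
    (1, "Car"), (1, "Black"), (1, "Ford"),
    (2, "Car"), (2, "Red"), (2, "Ford"),
    (3, "Car"), (3, "Black"), (3, "Lexus"),
    (4, "Motorcycle"), (4, "Black"),
    (5, "Skateboard"), (5, "Black"),
    (6, "Skateboard"), (6, "Green"),
    (7, "Motorcycle"), (7, "Red"),  # hypothetical, not in the question
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (productid INTEGER PRIMARY KEY, active INTEGER)")
conn.execute("CREATE TABLE producttags (productid INTEGER, tag TEXT)")
conn.executemany("INSERT INTO products VALUES (?, 1)", [(i,) for i in range(1, 8)])
conn.executemany("INSERT INTO producttags VALUES (?, ?)", TAGS)

rows = conn.execute("""
    -- Branch 1: (Car) OR (Skateboard)
    SELECT p.productid
    FROM products p
    INNER JOIN producttags AS t1 ON p.productid = t1.productid
    WHERE p.active = 1
      AND t1.tag IN ('Car', 'Skateboard')
    UNION
    -- Branch 2: Motorcycle AND (Green OR Red), one self-join per required tag
    SELECT p.productid
    FROM products p
    INNER JOIN producttags AS t1 ON p.productid = t1.productid
    INNER JOIN producttags AS t2 ON p.productid = t2.productid
    WHERE p.active = 1
      AND t1.tag IN ('Motorcycle')
      AND t2.tag IN ('Green', 'Red')
""").fetchall()

union_ids = sorted(r[0] for r in rows)
print(union_ids)  # [1, 2, 3, 5, 6, 7]
```

Product 4 has Motorcycle but neither Green nor Red, so the second branch correctly skips it.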
PS: Your tagging table is not an Entity-Attribute-Value table.
I would get all the unique ID matches and the unique IDs to filter out, then LEFT JOIN those lists (as per tigeryan) and filter out any IDs that match. The query should also be easier to read and modify by keeping all the queries separate. It should be fairly quick also, although it may not look like it.
SELECT * FROM products p
WHERE
p.active=1 AND
productid IN (
SELECT matches.productid FROM (
SELECT DISTINCT productid FROM producttags
WHERE tag IN ('Ford','Black','Skateboard')
) AS matches
LEFT JOIN (
SELECT DISTINCT productid FROM producttags
WHERE tag IN ('Motorcycle','Green')
) AS filter ON filter.productid=matches.productid
WHERE filter.productid IS NULL
)
Sometimes a JOIN is faster than an IN, depending on how MySQL optimizes the query:
SELECT p.* FROM (
SELECT matches.productid FROM (
SELECT DISTINCT productid FROM producttags
WHERE tag IN ('Ford','Black','Skateboard')
) AS matches
LEFT JOIN (
SELECT DISTINCT productid FROM producttags
WHERE tag IN ('Motorcycle','Green')
) AS filter ON filter.productid=matches.productid
WHERE filter.productid IS NULL
) AS idfilter
JOIN products p ON p.productid=idfilter.productid AND p.active=1
The second query should force the join order since the internal selects have to be done first.
I would usually attack this by trying to eliminate records in the FROM clause:
select p.productid
from products p
left join producttags tag1
on p.productid = tag1.productid and tag1.tag NOT IN ('Motorcycle','Green')
where tag1.tag IN ('Ford','Black','Skateboard') and p.active = 1
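One thing worth checking with this form: the NOT IN condition in the ON clause drops individual tag rows, not whole products. The sketch below runs the query against the question's sample data, assuming SQLite (via Python's sqlite3) behaves like MySQL for this join. In that run, products 4 and 6 still come back through their Black and Skateboard rows.

```python
import sqlite3

# Sample tags copied from the question.
TAGS = [
    (1, "Car"), (1, "Black"), (1, "Ford"),
    (2, "Car"), (2, "Red"), (2, "Ford"),
    (3, "Car"), (3, "Black"), (3, "Lexus"),
    (4, "Motorcycle"), (4, "Black"),
    (5, "Skateboard"), (5, "Black"),
    (6, "Skateboard"), (6, "Green"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (productid INTEGER PRIMARY KEY, active INTEGER)")
conn.execute("CREATE TABLE producttags (productid INTEGER, tag TEXT)")
conn.executemany("INSERT INTO products VALUES (?, 1)", [(i,) for i in range(1, 7)])
conn.executemany("INSERT INTO producttags VALUES (?, ?)", TAGS)

rows = conn.execute("""
    SELECT p.productid
    FROM products p
    LEFT JOIN producttags tag1
      ON p.productid = tag1.productid
     AND tag1.tag NOT IN ('Motorcycle', 'Green')  -- filters rows, not products
    WHERE tag1.tag IN ('Ford', 'Black', 'Skateboard')
      AND p.active = 1
""").fetchall()

# Deduplicate: a product can match on more than one wanted tag.
row_filter_ids = sorted({r[0] for r in rows})
print(row_filter_ids)  # [1, 2, 3, 4, 5, 6]
```

Excluding whole products needs a test over all of a product's tags, for example an anti-join (LEFT JOIN ... IS NULL), NOT EXISTS, or a HAVING clause.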
What about this one:
SELECT DISTINCT p.productid FROM products AS p
JOIN producttags AS included ON (
included.productid = p.productid
AND included.tag IN ('Ford', 'Black', 'Skateboard')
)
WHERE active = 1
AND p.productid NOT IN (
SELECT DISTINCT productid FROM producttags
WHERE tag IN ('Motorcycle', 'Green')
)
Alternative to the CONCAT/LIKE solution:
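SELECT p.productid
FROM products p
JOIN producttags USING (productid)
WHERE p.active = 1
-- let the unwanted tags through as well, so HAVING can count them
AND tag IN ('Ford', 'Black', 'Skateboard', 'Motorcycle', 'Green')
GROUP BY p.productid
HAVING SUM(IF(tag IN ('Ford', 'Black', 'Skateboard'), 1, 0)) > 0
AND SUM(IF(tag IN ('Motorcycle', 'Green'), 1, 0)) = 0;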
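The conditional-aggregation idea can be tried on the sample data. This sketch assumes SQLite (via Python's sqlite3) rather than MySQL, so it uses SUM(condition) instead of SUM(IF(...)); MySQL accepts both spellings. Both the wanted and the unwanted tags are allowed through the WHERE clause here, since HAVING can only count rows that survive the WHERE filter.

```python
import sqlite3

# Sample tags copied from the question.
TAGS = [
    (1, "Car"), (1, "Black"), (1, "Ford"),
    (2, "Car"), (2, "Red"), (2, "Ford"),
    (3, "Car"), (3, "Black"), (3, "Lexus"),
    (4, "Motorcycle"), (4, "Black"),
    (5, "Skateboard"), (5, "Black"),
    (6, "Skateboard"), (6, "Green"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (productid INTEGER PRIMARY KEY, active INTEGER)")
conn.execute("CREATE TABLE producttags (productid INTEGER, tag TEXT)")
conn.executemany("INSERT INTO products VALUES (?, 1)", [(i,) for i in range(1, 7)])
conn.executemany("INSERT INTO producttags VALUES (?, ?)", TAGS)

rows = conn.execute("""
    SELECT p.productid
    FROM products p
    JOIN producttags USING (productid)
    WHERE p.active = 1
      AND tag IN ('Ford', 'Black', 'Skateboard', 'Motorcycle', 'Green')
    GROUP BY p.productid
    HAVING SUM(tag IN ('Ford', 'Black', 'Skateboard')) > 0   -- at least one wanted tag
       AND SUM(tag IN ('Motorcycle', 'Green')) = 0           -- no unwanted tag
    ORDER BY p.productid
""").fetchall()

aggregated_ids = [r[0] for r in rows]
print(aggregated_ids)  # [1, 2, 3, 5]
```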
Related
I have two tables, Products and ProductTags (I also have table Tags which is not part of the problem)
My query is
SELECT
Products.id
FROM
Products JOIN ProductTags ON Products.id = ProductTags.product_id
WHERE ProductTags.tag_id = 10 and ProductTags.tag_id <> 20
Table ProductTags is one-to-many connection between product_id and tag_id, if it's called like that. Let's say that whole ProductTags table is:
product_id | tag_id
========================
777 | 10
777 | 20
888 | 10
888 | 30
what I get as an output from my query is 777 and 888. But I want to exclude products which have tag #20.
In the real query I also join with other tables (and I need access to other fields of the Products table), so I CAN'T get the proper result using only the ProductTags table! I am aware that I could do it like
SELECT product_id FROM ProductTags WHERE tag_id = 10
but this WON'T be a proper solution!
One way is to use aggregation.
SELECT product_id id
FROM ProductTags
WHERE tag_id IN (10, 20)
GROUP BY product_id
HAVING max(tag_id) = 10;
Another one uses NOT EXISTS and a correlated subquery.
SELECT pt1.product_id
FROM ProductTags pt1
WHERE pt1.tag_id = 10
AND NOT EXISTS (SELECT *
FROM ProductTags pt2
WHERE pt2.product_id = pt1.product_id
AND pt2.tag_id = 20);
Note that the join to Products isn't needed if you only want the product IDs, unless there's no proper foreign key constraint on ProductTags.product_id.
SELECT Products.id
FROM Products
JOIN ProductTags ON Products.id = ProductTags.product_id
GROUP BY Products.id
HAVING SUM(ProductTags.tag_id = 10) > 0 -- at least one
AND SUM(ProductTags.tag_id = 20) = 0 -- none
This form allows any amount of simple or complex conditions.
For example:
-- strictly one row with tag_id = 30
AND SUM(ProductTags.tag_id = 30) = 1
-- at least one tag_id 40 or 50
AND SUM(ProductTags.tag_id IN (40, 50)) > 0
-- ... and so on
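To see the HAVING form and the NOT EXISTS form side by side, here is a minimal sketch on the four ProductTags rows from the question. It assumes SQLite (via Python's sqlite3) in place of MySQL; both engines evaluate a comparison such as tag_id = 10 to 0 or 1, which is what makes SUM(condition) work as a counter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE ProductTags (product_id INTEGER, tag_id INTEGER)")
conn.executemany("INSERT INTO Products VALUES (?)", [(777,), (888,)])
conn.executemany(
    "INSERT INTO ProductTags VALUES (?, ?)",
    [(777, 10), (777, 20), (888, 10), (888, 30)],
)

# Grouped form: count matching rows per product inside HAVING.
having_ids = [r[0] for r in conn.execute("""
    SELECT Products.id
    FROM Products
    JOIN ProductTags ON Products.id = ProductTags.product_id
    GROUP BY Products.id
    HAVING SUM(ProductTags.tag_id = 10) > 0   -- at least one
       AND SUM(ProductTags.tag_id = 20) = 0   -- none
    ORDER BY Products.id
""")]

# Correlated-subquery form: one EXISTS per required or forbidden tag.
not_exists_ids = [r[0] for r in conn.execute("""
    SELECT p.id
    FROM Products p
    WHERE EXISTS (SELECT 1 FROM ProductTags t
                  WHERE t.product_id = p.id AND t.tag_id = 10)
      AND NOT EXISTS (SELECT 1 FROM ProductTags t
                      WHERE t.product_id = p.id AND t.tag_id = 20)
    ORDER BY p.id
""")]

print(having_ids, not_exists_ids)  # [888] [888]
```

Both forms keep Products in the FROM clause, so other Products columns and further joins remain available, which is what the question asks for.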
Is there a more efficient way to filter on a joined table as in the following example? Or is this a fine approach? This query returns the desired results, but I am an amateur at MySQL.
I have indexes on products.id, product_details.product_id and product_details.value
SELECT p.id
FROM products p
LEFT JOIN product_details d
ON d.product_id = p.id
WHERE d.value = 1
OR p.id = 4
Simplified structure as follows:
products table
product_id (PRIMARY KEY) | name
--------------------------------
1 | Shirt
2 | Shoes
3 | Dress
4 | A product with no corresponding details row
product_details table
product_id (PRIMARY KEY) | value
---------------------------------
1 | 1
2 | 23
3 | 32
This is your query:
SELECT products.id
FROM products LEFT JOIN
product_details
ON product_details.product_id = products.id
WHERE product_details.value = 1 OR products.id = 4;
This is not a bad practice. I do think the query is easier to follow using EXISTS:
SELECT p.id
FROM products p
WHERE p.id = 4 OR
EXISTS (SELECT 1
FROM product_details pd
WHERE pd.product_id = p.id AND pd.value = 1
);
In addition EXISTS makes it clear that you don't want to return duplicates if there are duplicate matching rows in product_details.
If performance is your main consideration, then EXISTS is probably your best choice, with an index on product_details(product_id, value).
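The duplicate-row point can be illustrated with a small sketch. It assumes SQLite (via Python's sqlite3) instead of MySQL, and it deliberately drops the primary key on product_details and adds a second, hypothetical (1, 1) row, since the question's own table cannot contain duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
# No primary key here on purpose, so that duplicate detail rows are possible.
conn.execute("CREATE TABLE product_details (product_id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, "Shirt"), (2, "Shoes"), (3, "Dress"), (4, "No details row")],
)
conn.executemany(
    "INSERT INTO product_details VALUES (?, ?)",
    [(1, 1), (1, 1), (2, 23), (3, 32)],  # the second (1, 1) is hypothetical
)

left_join_ids = sorted(r[0] for r in conn.execute("""
    SELECT p.id
    FROM products p
    LEFT JOIN product_details d ON d.product_id = p.id
    WHERE d.value = 1 OR p.id = 4
"""))

exists_ids = sorted(r[0] for r in conn.execute("""
    SELECT p.id
    FROM products p
    WHERE p.id = 4
       OR EXISTS (SELECT 1
                  FROM product_details pd
                  WHERE pd.product_id = p.id AND pd.value = 1)
"""))

print(left_join_ids)  # [1, 1, 4]  one row per matching detail row
print(exists_ids)     # [1, 4]     one row per product
```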
Couple of notes:
As a rule of thumb, a UNION ALL statement often performs better than an OR that spans columns from different tables, since each branch can then use its own index. It also makes the query easier to read.
Filtering an outer-joined table in the WHERE clause can get you into trouble: the WHERE predicate is applied after the LEFT OUTER JOIN, so it can discard the NULL-extended rows that the outer join was meant to keep.
It seems like you always want to pull back any record that has products.id = 4, and also any product that has a product_details.value = 1. These look like two separate queries to me, and splitting them would probably make the query easier to maintain in the future.
SELECT
p.id
FROM
products p
WHERE
p.id = 4
UNION ALL
SELECT
p.id
FROM
product_details pd
JOIN
products p
ON
p.id = pd.product_id
WHERE
pd.value = 1
Source: https://bertwagner.com/posts/or-vs-union-all-is-one-better-for-performance/
Been trying to reintroduce myself to SQL through some practice questions I've developed for myself, but struggling to find a better way of approaching the following problem:
playlists
id title
1 Title1
2 Title2
playlist_clips
id playlist_id clip_id
1 Title1 3
2 Title2 1
playlist_tags
playlist_id tag_id
1 1
1 2
2 2
Clips and Tags are two entirely separate tables, and I am using the playlist_tags and playlist_clips to connect them to the playlists table, to represent the two-way one-to-many relationships.
I wanted to select all the playlists that have a given title, and have ALL of the tags provided in the query (in this example [1, 2]), not just "at least one of them".
This is what I've come up with:
select p_clips.* from
(
select p.id, p.title, count(pc.id) as number_of_clips
from playlists p
left join playlist_clips pc on p.id = pc.playlist_id
where p.title like "Test1"
group by id
) as p_clips
inner join
(
select *
from playlists p
left join playlist_tags pt on p.id = pt.playlist_id
where pt.tag_id in (1, 2)
group by id
having count(*) = 2
) as p_tags
on p_clips.id = p_tags.id
Whilst, from my testing I've found this to work, it doesn't look particularly elegant, and I also assume it's not terribly efficient performance-wise. (I've removed irrelevant parameters from the code for this example, such as select parameters.)
What would be a cleaner way of approaching this, or at the least, a more optimized approach?
Expected Result:
id title
260 Title1
EDIT: I apologize for my initial confusing post, I've tried to clean up my tables and the information they contain.
I wanted to select all the playlists that have a given title, and have ALL of the tags provided in the query (in this example [1, 2]), not just "at least one of them".
You don't need the clips table at all. You don't need left joins or the playlists table in the subquery.
That suggests:
select p.*
from playlists p join
(select pt.playlist_id
from playlist_tags pt
where pt.tag_id in (1, 2)
group by pt.playlist_id
having count(*) = 2
) pt
on p.id = pt.playlist_id
where p.title like 'Test1';
You could phrase this without a subquery as well:
select p.*
from playlists p join
playlist_tags pt
on p.id = pt.playlist_id
where p.title like 'Test1' and
pt.tag_id in (1, 2)
group by p.id
having count(*) = 2
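The "has ALL of the given tags" pattern generalizes to any number of tags by comparing the count of matched tags with the length of the list. The sketch below assumes SQLite (via Python's sqlite3) in place of MySQL; the helper name playlists_with_all_tags is hypothetical, and COUNT(DISTINCT ...) is used instead of COUNT(*) as a guard in case playlist_tags ever holds a duplicate pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE playlists (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE playlist_tags (playlist_id INTEGER, tag_id INTEGER)")
conn.executemany("INSERT INTO playlists VALUES (?, ?)", [(1, "Title1"), (2, "Title2")])
conn.executemany("INSERT INTO playlist_tags VALUES (?, ?)", [(1, 1), (1, 2), (2, 2)])


def playlists_with_all_tags(conn, title, tag_ids):
    """Return (id, title) rows for playlists matching title that carry every tag."""
    wanted = sorted(set(tag_ids))
    placeholders = ", ".join("?" for _ in wanted)  # one bound parameter per tag
    sql = f"""
        SELECT p.id, p.title
        FROM playlists p
        JOIN playlist_tags pt ON p.id = pt.playlist_id
        WHERE p.title LIKE ?
          AND pt.tag_id IN ({placeholders})
        GROUP BY p.id, p.title
        HAVING COUNT(DISTINCT pt.tag_id) = ?   -- every wanted tag was found
        ORDER BY p.id
    """
    return conn.execute(sql, [title, *wanted, len(wanted)]).fetchall()


both_tags = playlists_with_all_tags(conn, "Title1", [1, 2])
missing_tag = playlists_with_all_tags(conn, "Title2", [1, 2])
single_tag = playlists_with_all_tags(conn, "Title2", [2])
print(both_tags, missing_tag, single_tag)
```

Playlist 2 only has tag 2, so it is returned for [2] but not for [1, 2].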
For the sake of clarity I will rename the tables so the question is a bit clearer for everybody, and explain what I want to achieve:
There is an input form with options that return category IDs. I want to find every 'Product' that has one or more categories, all of which are inside the array passed from the form.
Products table
ID Title
1 Pizza
2 Ice Cream
Categories table
ID Title
1 Baked food
2 Hot food
ProductsCategories table
ID ProductId CategoryId
1 1 1
2 1 2
So if I pass [1,2], the query should return the Product with id 1, since all of its ProductsCategories are inside the requested array; but if I pass only 1 or only 2, the query should return no results.
Currently I have the following query, which works, but for some reason if I create a second Product and give it a ProductCategory with the same CategoryId as the first product, the query returns null...
SELECT products.*
FROM products
JOIN products_categories
ON products_categories.product_id= products.id
WHERE products_categories.category_id IN (1, 2)
HAVING COUNT(*) = (select count(*) from products_categories pc
WHERE pc.product_id = products.id)
All help is deeply appreciated! Cheers!
To match all values in the IN clause, you also need to know the number of categories passed, and use that number in the HAVING clause:
SELECT
p.*,
GROUP_CONCAT(c.title) AS categories
FROM
Products p
INNER JOIN ProductsCategories pc ON pc.productId = p.ID
INNER JOIN Categories c ON c.ID = pc.categoryId
WHERE
pc.categoryId IN (1,2)
GROUP BY
p.id
HAVING
COUNT(DISTINCT pc.categoryId) = 2 -- this is # of unique categories in IN clause
So in case IN (1,2) result is:
+----+-------+---------------------+
| id | title | categories |
+----+-------+---------------------+
| 1 | Pizza | Baked Food,Hot Food |
+----+-------+---------------------+
1 row in set
In case IN (1,3) result is Empty set (no results).
@mitkosoft, thanks for your answer, but sadly the query is not producing the needed results. If the product's categories are only partially in the passed categories, the product is still returned. Additionally, I might not know how many parameters are sent by the form.
Luckily I managed to create the query that does the trick and works perfectly fine (at least so far)
SELECT products.*,
COUNT(*) as resultsCount,
(SELECT COUNT(*) FROM products_categories pc WHERE pc.product_id = products.id) as categoriesCount
FROM products
JOIN products_categories AS productsCategories
ON productsCategories.product_id= products.id
WHERE productsCategories.category_id IN (7, 15, 8, 1, 50)
GROUP BY products.id
HAVING resultsCount = categoriesCount
ORDER BY amount DESC #optional
That way the query is flexible and gives me exactly what I needed! - Only those products that have all their categories inside the search parameters(not partially).
Cheers! :)
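The "all of the product's categories are inside the passed list" check can be sketched as follows, assuming SQLite (via Python's sqlite3) in place of MySQL. The helper name products_fully_covered is hypothetical, and so are the two category rows for product 2, which are added only to reproduce the case of a second product sharing a category with the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE products_categories (product_id INTEGER, category_id INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, "Pizza"), (2, "Ice Cream")])
conn.executemany(
    "INSERT INTO products_categories VALUES (?, ?)",
    [(1, 1), (1, 2),    # from the question
     (2, 1), (2, 3)],   # hypothetical rows for the second product
)


def products_fully_covered(conn, category_ids):
    """Return products whose categories all appear in category_ids."""
    wanted = sorted(set(category_ids))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT p.id, p.title
        FROM products p
        JOIN products_categories pc ON pc.product_id = p.id
        WHERE pc.category_id IN ({placeholders})
        GROUP BY p.id, p.title
        -- matched categories must equal the product's total category count
        HAVING COUNT(*) = (SELECT COUNT(*)
                           FROM products_categories pc2
                           WHERE pc2.product_id = p.id)
        ORDER BY p.id
    """
    return conn.execute(sql, wanted).fetchall()


covered_12 = products_fully_covered(conn, [1, 2])
covered_1 = products_fully_covered(conn, [1])
covered_123 = products_fully_covered(conn, [1, 2, 3])
print(covered_12, covered_1, covered_123)
```

The GROUP BY is what makes the per-product comparison work; without it the aggregate runs over the whole joined result at once.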
I have 2 tables in my database:
Products:
--------------------------------------------------
| id | product_name | manufacturer |
--------------------------------------------------
Products_photos:
-----------------------------------------------
| id | product_id | image_name |
-----------------------------------------------
I want to select all Products where the Products_photos count is greater than 0.
How can I do that?
#Edit:
I don't want to add results from Products_photos to my output. I only want to show entries from Products that have any images. Sorry for my English :)
Thanks for help
I think the joining solutions already offered are the best bet, in terms of query efficiency. But for clarity - in terms of expressing exactly what you ask for - I would choose an approach like this:
select * from products p
where exists (select * from products_photos pp where pp.product_id = p.id)
SELECT p.id, p.product_name, p.manufacturer
FROM Products p
INNER JOIN Products_photos i on i.product_id = p.id
You can do:
Select P.id, P.product_name, P.manufacturer
from Products P
INNER JOIN Products_photos Pp on P.Id = Pp.product_id
An INNER JOIN only returns rows where the join is possible, which means every returned product has at least one row in the Products_photos table.
SELECT P.* FROM Products AS P INNER JOIN Products_photos AS PP ON P.id=PP.product_id
Another method, less efficient but maybe easier for you to understand, would be
SELECT P.* FROM Products AS P
WHERE P.id IN (SELECT DISTINCT product_id FROM Products_photos)
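One difference between the approaches above is how they treat a product with several photos. The sketch below uses made-up rows (the question gives no data) and assumes SQLite (via Python's sqlite3) in place of MySQL: the plain INNER JOIN returns one row per photo, while EXISTS and IN return each product once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (id INTEGER PRIMARY KEY, product_name TEXT, manufacturer TEXT)"
)
conn.execute(
    "CREATE TABLE Products_photos (id INTEGER PRIMARY KEY, product_id INTEGER, image_name TEXT)"
)
# Hypothetical rows: product 1 has two photos, product 2 has one, product 3 has none.
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [(1, "Lamp", "Acme"), (2, "Chair", "Acme"), (3, "Desk", "Acme")],
)
conn.executemany(
    "INSERT INTO Products_photos VALUES (?, ?, ?)",
    [(1, 1, "lamp-front.jpg"), (2, 1, "lamp-side.jpg"), (3, 2, "chair.jpg")],
)

join_ids = sorted(r[0] for r in conn.execute("""
    SELECT P.id
    FROM Products P
    INNER JOIN Products_photos PP ON P.id = PP.product_id
"""))

exists_ids = sorted(r[0] for r in conn.execute("""
    SELECT P.id
    FROM Products P
    WHERE EXISTS (SELECT 1 FROM Products_photos PP WHERE PP.product_id = P.id)
"""))

in_ids = sorted(r[0] for r in conn.execute("""
    SELECT P.id
    FROM Products P
    WHERE P.id IN (SELECT product_id FROM Products_photos)
"""))

print(join_ids)    # [1, 1, 2]
print(exists_ids)  # [1, 2]
print(in_ids)      # [1, 2]
```

Adding DISTINCT to the join form removes the repeated product, at the cost of an extra deduplication step.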