How to select a distinct column value matching multiple criteria - mysql

I have a table containing attributes with the following structure:
id: bigint unsigned autoincrement
product_id: bigint foreign key
attribute_id: bigint foreign key
value: varchar(100)
I can query a single criterion like this:
SELECT DISTINCT product_id FROM product_attributes WHERE attribute_id = ? AND value = ?
However, I need to find products that match several such criteria at once, and I would like to avoid multiple database queries for performance reasons. Simply adding more criteria with AND won't work, since they all apply to the same columns of a single row. What I am after is something like:
SELECT DISTINCT product_id FROM product_attributes WHERE attribute_id = 1 AND value = 'Blue'
INTERSECT
SELECT DISTINCT product_id FROM product_attributes WHERE attribute_id = 2 AND value = '36'
INTERSECT
SELECT DISTINCT product_id FROM product_attributes WHERE attribute_id = 3 AND value = 'slim'
I have read about the INTERSECT statement, which seems like it might work, but I've also read that MySQL doesn't support it. A search through the MySQL 8 documentation produced no relevant result, and the query above, which I assume is correct, produces an error on MySQL.
I've also read that something similar could be achieved with an inner join, but all the examples I've found involve multiple tables. There might also be a better or simpler way to write the query that hasn't occurred to me. Or perhaps it's actually better to just send multiple queries and calculate the intersection outside of MySQL (though I would be very surprised). I would greatly appreciate help from anyone who has done something similar.

You need to use aggregation: for each product, count the rows that match any of the conditions, and assert that the count equals the number of conditions (this assumes a product stores each attribute/value pair at most once):
SELECT product_id
FROM product_attributes
WHERE (attribute_id, value) IN ((1, 'Blue'), (2, '36'), (3, 'slim'))
GROUP BY product_id
HAVING COUNT(*) = 3
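The counting idea can be tried out in a minimal sketch. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, and the sample rows and the helper name products_matching are made up for illustration. It spells the conditions out with OR rather than the tuple form of IN, and it counts distinct attribute ids, which assumes each attribute appears only once in the criteria.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_attributes ("
    "id INTEGER PRIMARY KEY, product_id INTEGER, attribute_id INTEGER, value TEXT)"
)
rows = [
    (10, 1, "Blue"), (10, 2, "36"), (10, 3, "slim"),  # matches all three
    (11, 1, "Blue"), (11, 2, "36"), (11, 3, "wide"),  # wrong cut
    (12, 1, "Blue"),                                  # only one attribute
]
conn.executemany(
    "INSERT INTO product_attributes (product_id, attribute_id, value) VALUES (?, ?, ?)",
    rows,
)


def products_matching(conn, criteria):
    """Return product ids that have every (attribute_id, value) pair in criteria."""
    if not criteria:
        raise ValueError("at least one (attribute_id, value) pair is required")
    # One parenthesised condition per pair, combined with OR.
    pairs = " OR ".join(["(attribute_id = ? AND value = ?)"] * len(criteria))
    sql = (
        "SELECT product_id FROM product_attributes "
        f"WHERE {pairs} "
        "GROUP BY product_id "
        # COUNT(DISTINCT ...) keeps the check correct if a pair is stored twice.
        "HAVING COUNT(DISTINCT attribute_id) = ? "
        "ORDER BY product_id"
    )
    params = [x for pair in criteria for x in pair] + [len(criteria)]
    return [r[0] for r in conn.execute(sql, params)]


matches = products_matching(conn, [(1, "Blue"), (2, "36"), (3, "slim")])
print(matches)
```

Because the number of conditions is passed as a parameter, the same single query serves any number of criteria.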

This is the key / value store problem.
It's a slight pain in the neck to do what you want. Use JOIN operations to pivot the values into a row. Like this.
SELECT p.product_id,
color.value AS color,
size.value AS size,
cut.value AS cut
FROM ( SELECT DISTINCT product_id FROM product_attributes ) p
LEFT JOIN product_attributes color ON color.product_id = p.product_id
AND color.attribute_id = 1
LEFT JOIN product_attributes size ON size.product_id = p.product_id
AND size.attribute_id = 2
LEFT JOIN product_attributes cut ON cut.product_id = p.product_id
AND cut.attribute_id = 3
This generates a resultset with one row per product/color/size/cut combination.
Then you can filter that resultset like this:
SELECT *
FROM (
SELECT p.product_id,
color.value AS color,
size.value AS size,
cut.value AS cut
FROM ( SELECT DISTINCT product_id FROM product_attributes ) p
LEFT JOIN product_attributes color ON color.product_id = p.product_id
AND color.attribute_id = 1
LEFT JOIN product_attributes size ON size.product_id = p.product_id
AND size.attribute_id = 2
LEFT JOIN product_attributes cut ON cut.product_id = p.product_id
AND cut.attribute_id = 3
) combinations
WHERE color='Blue' AND size='36' AND cut='slim'
MySQL's query planner is smart enough that this doesn't run as slowly as you might guess, given the proper indexes.
The subquery in the FROM clause generates a comprehensive list of product ids from your product_attributes table, to which the specific attributes are then joined. If you have some other table for products, use that instead of the SELECT DISTINCT....
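To see what the pivot produces before and after filtering, here is a small sketch. It runs the same joins through Python's sqlite3 module against an in-memory SQLite database, used here only as a stand-in for MySQL; the sample rows are invented.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; the rows below are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_attributes (product_id INTEGER, attribute_id INTEGER, value TEXT);
INSERT INTO product_attributes VALUES
  (10, 1, 'Blue'), (10, 2, '36'), (10, 3, 'slim'),
  (11, 1, 'Blue'), (11, 2, '36'),
  (12, 1, 'Red'),  (12, 2, '36'), (12, 3, 'slim');
""")

# One LEFT JOIN per attribute turns the key/value rows into columns.
PIVOT = """
SELECT p.product_id, color.value AS color, size.value AS size, cut.value AS cut
FROM (SELECT DISTINCT product_id FROM product_attributes) p
LEFT JOIN product_attributes color
       ON color.product_id = p.product_id AND color.attribute_id = 1
LEFT JOIN product_attributes size
       ON size.product_id = p.product_id AND size.attribute_id = 2
LEFT JOIN product_attributes cut
       ON cut.product_id = p.product_id AND cut.attribute_id = 3
"""

# Unfiltered pivot: a missing attribute shows up as NULL (None in Python).
pivoted = conn.execute(PIVOT + " ORDER BY p.product_id").fetchall()

# Filtering the pivoted rows keeps only products matching all three values.
filtered = conn.execute(
    f"SELECT product_id FROM ({PIVOT}) combinations "
    "WHERE color = ? AND size = ? AND cut = ?",
    ("Blue", "36", "slim"),
).fetchall()

print(pivoted)
print(filtered)
```

Product 11 has no cut attribute, so its cut column is NULL and the filter drops it.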

Related

Search for product ids where an attribute is not present

I am using opencart for an online store and I have a SQL structure like this:
(image from phpmyadmin)
I am trying to cross match product ids with attribute ids.
I need to find products that don't have a particular attribute_id (attribute_id 17 to be more precise).
I tried sorting and exporting in various formats without success.
I am not good with mysql syntax but I am sure there has to be a way to achieve this result.
Also tried using this code:
SELECT product_id FROM oc_product_attribute WHERE NOT EXISTS (SELECT * FROM oc_product_attribute WHERE attribute_id = 17)
(oc_product_attribute is the table name)
...but it didn't output any results.
Please help me understand how I can find the product IDs that don't have attribute ID 17.
Thanks!
You should have a product table (in your case probably oc_product). Use it to avoid multiple checks. Also there might be a product which has no attributes. And you would miss that product in the result, if you only use the attributes table.
There are two common ways to achieve your goal. One is using a LEFT JOIN:
select p.*
from oc_product p
left join oc_product_attribute a
on a.product_id = p.product_id
and a.attribute_id = 17
where a.product_id is null
It's important that the condition a.attribute_id = 17 is in the ON clause. If you use it in the WHERE clause, the LEFT JOIN would be converted to an INNER JOIN, and you would get an empty result.
The other way is to use a correlated NOT EXISTS subquery:
select p.*
from oc_product p
where not exists (
select *
from oc_product_attribute a
where a.product_id = p.product_id
and a.attribute_id = 17
)
Note the (correlation) condition a.product_id = p.product_id. If you miss it (like in your attempt) the subquery will always find a row, and NOT EXISTS will always return FALSE.
Both approaches have similar performance.
If you only need the product ids you can replace p.* with p.product_id.
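A quick way to convince yourself of the difference between the correlated and uncorrelated forms is to run them side by side. The sketch below uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL; the three sample products and the helper ids are made up.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE oc_product (product_id INTEGER PRIMARY KEY);
CREATE TABLE oc_product_attribute (product_id INTEGER, attribute_id INTEGER);
INSERT INTO oc_product VALUES (1), (2), (3);
-- product 1 has attribute 17, product 2 does not, product 3 has no attributes
INSERT INTO oc_product_attribute VALUES (1, 17), (1, 20), (2, 20);
""")


def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


left_join = ids("""
    SELECT p.product_id
    FROM oc_product p
    LEFT JOIN oc_product_attribute a
           ON a.product_id = p.product_id AND a.attribute_id = 17
    WHERE a.product_id IS NULL
    ORDER BY p.product_id
""")

not_exists = ids("""
    SELECT p.product_id
    FROM oc_product p
    WHERE NOT EXISTS (
        SELECT 1 FROM oc_product_attribute a
        WHERE a.product_id = p.product_id AND a.attribute_id = 17
    )
    ORDER BY p.product_id
""")

# The uncorrelated attempt: the subquery finds product 1's row for every
# outer row, so NOT EXISTS is false everywhere and nothing is returned.
uncorrelated = ids("""
    SELECT product_id FROM oc_product_attribute
    WHERE NOT EXISTS (SELECT * FROM oc_product_attribute WHERE attribute_id = 17)
""")

print(left_join, not_exists, uncorrelated)
```

Both correct forms also return product 3, which has no attribute rows at all; that is the case a query over the attribute table alone would miss.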
Your current approach is on the right track, but you need to correlate the exists subquery with the outer query:
SELECT DISTINCT o1.product_id
FROM oc_product_attribute o1
WHERE NOT EXISTS (SELECT 1 FROM oc_product_attribute o2
WHERE o1.product_id = o2.product_id AND o2.attribute_id = 17);
We could also use an aggregation approach here:
SELECT product_id
FROM oc_product_attribute
GROUP BY product_id
HAVING SUM(attribute_id = 17) = 0;
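One detail worth checking in an aggregation filter like this is which aggregate wraps the comparison. A comparison evaluates to 0 or 1, and COUNT counts every non-NULL value, including 0, so only SUM (or COUNT over a CASE expression that yields NULL for non-matches) counts the matching rows. The sketch below checks this with Python's sqlite3 module and an in-memory SQLite database standing in for MySQL; the rows are made up.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE oc_product_attribute (product_id INTEGER, attribute_id INTEGER);
INSERT INTO oc_product_attribute VALUES (1, 17), (1, 20), (2, 20);
""")


def ids(having):
    """Group by product and return the ids that pass the given HAVING clause."""
    sql = (
        "SELECT product_id FROM oc_product_attribute "
        f"GROUP BY product_id HAVING {having} ORDER BY product_id"
    )
    return [row[0] for row in conn.execute(sql)]


# COUNT sees a non-NULL 0 or 1 for every row, so it is never 0 for a group.
with_count = ids("COUNT(attribute_id = 17) = 0")

# SUM adds up the 0/1 results, so it is 0 exactly when no row has attribute 17.
with_sum = ids("SUM(attribute_id = 17) = 0")

# Portable spelling: CASE yields NULL for non-matching rows, which COUNT skips.
with_case = ids("COUNT(CASE WHEN attribute_id = 17 THEN 1 END) = 0")

print(with_count, with_sum, with_case)
```

Note that any query over the attribute table alone cannot report products that have no attribute rows at all.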

SQL query with JOIN

I'm creating a product filter for e-commerce store. I have a product table, characteristics table and a table in which I store product_id, characteristic_id and a single filter value.
shop_products - id, name
shop_characteristics - id, values (json)
shop_values - product_id, characteristic_id, value
I can build a query to get all the products by a single value like this:
SELECT `p`.* FROM `shop_products` `p`
LEFT JOIN `shop_values` `fv` ON `p`.`id` = `fv`.`product_id`
WHERE ((`fv`.`characteristic_id`=3) AND (`fv`.`value`='outdoor'))
It works fine. I can also modify this query to get all the products matching multiple values that belong to the same characteristics group (that is, have an identical characteristic_id), like this:
SELECT `p`.* FROM `shop_products` `p`
LEFT JOIN `shop_values` `fv` ON `p`.`id` = `fv`.`product_id`
WHERE ((`fv`.`characteristic_id`=3) AND (`fv`.`value`='outdoor'))
OR ((`fv`.`characteristic_id`=3) AND (`fv`.`value`='indoor'))
But when I try to create a query for multiple conditions with different characteristic_id values, I get nothing:
SELECT `p`.* FROM `shop_products` `p`
LEFT JOIN `shop_values` `fv` ON `p`.`id` = `fv`.`product_id`
WHERE ((`fv`.`characteristic_id`=3) AND (`fv`.`value`='outdoor'))
AND ((`fv`.`characteristic_id`=5) AND (`fv`.`value`='white'))
My guess is that it does not work because I am using the AND operator incorrectly: no single record in the shop_values table has both characteristic_id 3 and 5.
So my question is how to combine or modify my query to get all related products. Or is it a flaw to store data like this, and do I need a different kind of shop_values table?
Use aggregation. You can also use tuples with the in clause. So:
SELECT p.*
FROM shop_products p JOIN
shop_values v
ON p.id = v.product_id
WHERE (v.characteristic_id, v.value) IN ( (3, 'outdoor'), (5, 'white'))
GROUP BY p.id
HAVING COUNT(DISTINCT v.characteristic_id) = 2;
Notes:
Unnecessarily escaping column and table aliases (with backticks) just makes the query harder to write and to read.
In general, using SELECT p.* and GROUP BY p.id is really, really bad form. The one exception is when you are grouping by a unique or primary key. This latter form is actually supported in the ANSI standard.
A LEFT JOIN is not needed. You need to find matches between the tables for the logic to work.
The use of AND and OR is fine for the WHERE clause. MySQL happens to support tuples with IN, which somewhat simplifies the logic.
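The following sketch contrasts the row-by-row AND filter from the question with the grouped version. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, invented sample products, and OR-combined conditions in place of the tuple form of IN.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shop_products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE shop_values (product_id INTEGER, characteristic_id INTEGER, value TEXT);
INSERT INTO shop_products VALUES (1, 'Lamp A'), (2, 'Lamp B'), (3, 'Lamp C');
INSERT INTO shop_values VALUES
  (1, 3, 'outdoor'), (1, 5, 'white'),
  (2, 3, 'outdoor'), (2, 5, 'black'),
  (3, 3, 'indoor'),  (3, 5, 'white');
""")

# WHERE is evaluated one joined row at a time, and no single shop_values row
# can have characteristic_id 3 and 5 at once, so this returns nothing.
single_row_and = conn.execute("""
    SELECT p.id, p.name
    FROM shop_products p
    JOIN shop_values v ON p.id = v.product_id
    WHERE (v.characteristic_id = 3 AND v.value = 'outdoor')
      AND (v.characteristic_id = 5 AND v.value = 'white')
""").fetchall()

# Keep rows matching either condition, then require both per product.
grouped = conn.execute("""
    SELECT p.id, p.name
    FROM shop_products p
    JOIN shop_values v ON p.id = v.product_id
    WHERE (v.characteristic_id = 3 AND v.value = 'outdoor')
       OR (v.characteristic_id = 5 AND v.value = 'white')
    GROUP BY p.id
    HAVING COUNT(DISTINCT v.characteristic_id) = 2
""").fetchall()

print(single_row_and)
print(grouped)
```

Counting distinct characteristic ids, rather than rows, also lets you accept several values for one characteristic (for example outdoor or indoor) without changing the number in the HAVING clause.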

SQL subquery going haywire

I am trying to fetch some combined result from two separate individual tables.
The transaction_fact table has around 3.6 million rows and translation_table has around 300000 rows.
Now I want a sum of amount for all transactions, grouped by location and by product within that location. But since the fact table has only the location id and product id, and I would like the names in the result, I am using subqueries.
My query is as follows:
SELECT
( SELECT translation
FROM translation_table
WHERE dim_name LIKE 'location_dim'
AND lang_id LIKE 'es'
AND dim_id LIKE CAST(o.loc_id AS CHAR(50))
AND field_name LIKE 'city') AS Location
, ( SELECT product_name
FROM prod_dim
WHERE prod_id = o.prod_id) AS Product
, SUM(amount)
FROM transaction_fact o
GROUP
BY loc_id
, prod_id
ORDER
BY loc_id
, prod_id;
But this query does not return anything; it just keeps on processing.
I waited for about an hour and a half, but still got no result.
Please tell me what might be going wrong.
Joining the tables should eliminate the need for subqueries and give some performance boost. If not you may need to provide more details on the table structure before we can help. Something like this should get you started:
SELECT t.translation AS Location, p.product_name AS Product, SUM(o.amount) AS Total
FROM transaction_fact o
INNER JOIN translation_table t ON CAST(o.loc_id AS char(50)) = t.dim_id
INNER JOIN prod_dim p ON p.prod_id = o.prod_id
WHERE t.dim_name = 'location_dim'
AND t.lang_id = 'es'
AND t.field_name = 'city'
GROUP BY o.loc_id, o.prod_id, t.translation, p.product_name
ORDER BY o.loc_id, o.prod_id;
Notes: I've changed the LIKEs to =, as LIKE is for when you want to match on a pattern that includes wildcards.
The CAST that is used in the join to translation_table is not ideal. If you could do away with that you'd get better performance.

MySQL left join multiple column pairs

Given the following table (products_filter):
How can I do a SELECT ... FROM products LEFT JOIN products_filter ... in such a way that it only returns products which have ALL the specified (filter_id,filter_value) pairs.
Example: for (filter_id, filter_value) = (1,1),(3,0) it should only return the product with id 90001, because it matches both values.
If the set of filter pairs is restricted to a definite number, the following query should work.
SELECT a.product_id
FROM products a
INNER JOIN (
-- keep only rows that match one of the requested pairs
SELECT product_id
FROM products_filter
WHERE (filter_id, filter_value) IN ((1, 1), (3, 0))
GROUP BY product_id
-- the count must equal the number of pairs listed above
HAVING COUNT(*) = 2
) b ON a.product_id = b.product_id
As you only said you wanted the PRODUCTS values having the desired filter attributes... I've limited results to just product.*
The below query uses an inline view with the count of distinct filters by product ID. The outer where clause then uses the distinct count (in case duplicate filters could exist for a product) of the filter_IDs.
The number in the outer WHERE clause (2 here) must always match the number of filter pairs in the WHERE clause of the inline view.
Your sample data indicated that the requested pairs could be a subset of all of a product's filters, so this ensures that each requested filter pair exists for the desired product.
SELECT p.*
FROM products p
LEFT JOIN (SELECT product_ID, count(Distinct filter_ID) cnt
FROM products_Filter
WHERE (Filter_ID = 1 and filter_value = 1)
or (Filter_ID = 3 and filter_value = 0)
GROUP BY Product_ID) pf
on P.Product_ID = PF.Product_ID
WHERE pf.cnt = 2

mysql aggregate functions in query with two joins gives unexpected results

Given the following (very simplified) mysql table structure:
products
id
product_categories
id
product_id
status (integer)
product_tags
id
product_id
some_other_numeric_value
I am trying to find every product that has an association to a certain product_tag and that has a relation to at least one category whose status attribute is 1.
I tried the following query:
SELECT *
FROM `product` p
JOIN `product_categories` pc
ON p.`product_id` = pc.`product_id`
JOIN `product_tags` pt
ON p.`product_id` = pt.`product_id`
WHERE pt.`some_value` = 'some comparison value'
GROUP BY p.`product_id`
HAVING SUM( pc.`status` ) > 0
ORDER BY SUM( pt.`some_other_numeric_value` ) DESC
Now my problem is: The SUM(pt.some_other_numeric_value) returns unexpected values.
I realized that if the product in question has more than one relation to the product_categories table, then every relation to the product_tags table is counted as many times as there are relations to the product_categories table!
For example: If product with id=1 has a relation to product_categories with ids = 2, 3 and 4, and a relation with the product_tags with ids 5 and 6 - then if I insert a GROUP_CONCAT(pt.id), then it does give 5,6,5,6,5,6 instead of the expected 5,6.
At first I suspected it was a problem with the join type (left join, right join, inner join, and so on), so I tried every join type that I know of, but to no avail. I also tried to include more id fields in the GROUP BY clause, but this didn't solve the problem either.
Can somebody explain to me what is actually going wrong here?
You join a "main" (product) table to two tables (tags and categories) via 1:n relationships, so this is expected: you are creating a mini Cartesian product. For products that have both more than one associated tag and more than one associated category, each tag row is repeated once per category row in the result set. If you then GROUP BY, the aggregate functions give wrong results.
One way to avoid this is to remove one of the two joins, which is a valid strategy if you don't need results from that table. Say you don't need anything in the SELECT list from the product_categories table. Then you can use a semi-join (the EXISTS subquery) to that table:
SELECT p.*,
SUM( pt.`some_other_numeric_value` )
FROM `product` p
JOIN `product_tags` pt
ON p.`product_id` = pt.`product_id`
WHERE pt.`some_value` = 'some comparison value'
AND EXISTS
( SELECT *
FROM product_categories pc
WHERE pc.product_id = p.product_id
AND pc.status = 1
)
GROUP BY p.`product_id`
ORDER BY SUM( pt.`some_other_numeric_value` ) DESC ;
Another way to circumvent this problem is - after the GROUP BY MainTable.pk - to use DISTINCT inside the COUNT() or GROUP_CONCAT() aggregate functions. This works but you can't use it with SUM(). So, it's not useful in your specific query.
A third option - which works always - is to first group by the two (or more) side tables and then join to the main table. Something like this in your case:
SELECT p.* ,
COALESCE(pt.sum_other_values, 0) AS sum_other_values,
COALESCE(pt.cnt, 0) AS tags_count,
COALESCE(pc.cnt, 0) AS categories_count,
COALESCE(category_titles, '') AS category_titles
FROM `product` p
JOIN
( SELECT product_id
, COUNT(*) AS cnt
, GROUP_CONCAT(title) AS category_titles
FROM `product_categories` pc
WHERE status = 1
GROUP BY product_id
) AS pc
ON p.`product_id` = pc.`product_id`
JOIN
( SELECT product_id
, COUNT(*) AS cnt
, SUM(some_other_numeric_value) AS sum_other_values
FROM `product_tags` pt
WHERE some_value = 'some comparison value'
GROUP BY product_id
) AS pt
ON p.`product_id` = pt.`product_id`
ORDER BY sum_other_values DESC ;
The COALESCE() calls are not strictly needed there; they are included in case you change the inner joins to LEFT outer joins.
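The row multiplication and the effect of grouping the side tables first can be reproduced with the ids from the question (categories 2, 3, 4 and tags 5, 6 for product 1). This is a minimal sketch using Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL; the numeric values 10 and 20 are made up, and the tag filter is left out to keep it short.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; the values 10 and 20 are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY);
CREATE TABLE product_categories (id INTEGER PRIMARY KEY, product_id INTEGER, status INTEGER);
CREATE TABLE product_tags (id INTEGER PRIMARY KEY, product_id INTEGER,
                           some_other_numeric_value INTEGER);
INSERT INTO product VALUES (1);
INSERT INTO product_categories VALUES (2, 1, 1), (3, 1, 1), (4, 1, 0);
INSERT INTO product_tags VALUES (5, 1, 10), (6, 1, 20);
""")

# Joining both 1:n tables yields 3 x 2 = 6 rows, so each tag is summed 3 times.
naive = conn.execute("""
    SELECT p.product_id, SUM(pt.some_other_numeric_value)
    FROM product p
    JOIN product_categories pc ON p.product_id = pc.product_id
    JOIN product_tags pt ON p.product_id = pt.product_id
    GROUP BY p.product_id
""").fetchall()

# Aggregating each side table first leaves at most one row per product on
# each side, so the joins can no longer multiply rows.
pre_aggregated = conn.execute("""
    SELECT p.product_id, pt.sum_other_values, pc.cnt
    FROM product p
    JOIN (SELECT product_id, COUNT(*) AS cnt
          FROM product_categories
          WHERE status = 1
          GROUP BY product_id) pc ON p.product_id = pc.product_id
    JOIN (SELECT product_id, SUM(some_other_numeric_value) AS sum_other_values
          FROM product_tags
          GROUP BY product_id) pt ON p.product_id = pt.product_id
""").fetchall()

print(naive)
print(pre_aggregated)
```

The naive sum is three times the real total because product 1 has three category rows; the pre-aggregated query reports the real total together with the number of active categories.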
Ordering by the aggregate directly is allowed in MySQL, but you can also alias the sum in the SELECT list and order by the alias.
For example:
SELECT * ,SUM( pt.`some_other_numeric_value` ) as sumvalues
FROM `product` p
JOIN `product_categories` pc
ON p.`product_id` = pc.`product_id`
JOIN `product_tags` pt
ON p.`product_id` = pt.`product_id`
WHERE pt.`some_value` = 'some comparison value'
GROUP BY p.`product_id`
HAVING SUM( pc.`status` ) > 0
ORDER BY sumvalues DESC