Anti Join with group/conditions - mysql

Note: I have simplified the question, since both it and the answer had become, I believe, more complex than intended.
I want to do an anti-join that has a condition beyond simply not existing in the other table.
Table: Product / Manufacturer
Widget / Acme
Paddle / Acme
Ball / Acme
Gas / Exxon
Pump / Exxon
Table: Customer / Product
Karen / Ball
Bob / Paddle
Karen / Gas
Bob / Pump
A "normal" anti-join would find out which products have not been ordered via
Select T1.Product from `Product / Manufacturer` as T1
Left Join `Customer / Product` as T2
On T2.Product = T1.Product
Where T2.Product is NULL
However what I am looking for is which customers didn't order which products, in essence:
Select Products from `Product / Manufacturer`
where Manufacturer = 'Acme' that do not exist in `Customer / Product`
where Customer = 'Karen'
and
Select Products from `Product / Manufacturer`
where Manufacturer = 'Exxon' that do not exist in `Customer / Product`
where Customer = 'Karen'
and
Select Products from `Product / Manufacturer`
where Manufacturer = 'Acme' that do not exist in `Customer / Product`
where Customer = 'Bob'
and
Select Products from `Product / Manufacturer`
where Manufacturer = 'Exxon' that do not exist in `Customer / Product`
where Customer = 'Bob'
But as one query since there are 100s of "Customers" and 100s of Manufacturers.

If you want to exclude every product from a manufacturer that has no product in any order...
Then you want to include only products from certain manufacturers...
Which manufacturers have had a product appear in an order?
SELECT r.manufacturer
FROM product r
JOIN orders s
ON s.product = r.product
GROUP BY r.manufacturer
You can wrap that query in parens and include it as an inline view ...
SELECT p.*
FROM ( SELECT r.manufacturer
FROM product r
JOIN orders s
ON s.product = r.product
GROUP BY r.manufacturer
) q
JOIN product p
ON p.manufacturer = q.manufacturer
LEFT
JOIN orders o
ON o.product = p.Product
WHERE o.product IS NULL
There are other query patterns that will return an equivalent result.
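As a rough check of the "equivalent result" claim, here is a minimal sketch that runs the LEFT JOIN ... IS NULL pattern next to a NOT EXISTS version on the sample rows from the question. Assumptions: an in-memory SQLite database stands in for MySQL (only portable SQL is used), and the table and column names (product, orders, customer) are made up to match the aliases in the queries above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product TEXT, manufacturer TEXT);
    CREATE TABLE orders  (customer TEXT, product TEXT);
    INSERT INTO product VALUES
        ('Widget', 'Acme'), ('Paddle', 'Acme'), ('Ball', 'Acme'),
        ('Gas', 'Exxon'), ('Pump', 'Exxon');
    INSERT INTO orders VALUES
        ('Karen', 'Ball'), ('Bob', 'Paddle'), ('Karen', 'Gas'), ('Bob', 'Pump');
""")

# Pattern 1: inline view of manufacturers with orders, then LEFT JOIN ... IS NULL
left_join_rows = conn.execute("""
    SELECT p.product, p.manufacturer
      FROM ( SELECT r.manufacturer
               FROM product r
               JOIN orders s ON s.product = r.product
              GROUP BY r.manufacturer ) q
      JOIN product p ON p.manufacturer = q.manufacturer
      LEFT JOIN orders o ON o.product = p.product
     WHERE o.product IS NULL
     ORDER BY p.product
""").fetchall()

# Pattern 2: the same conditions written with EXISTS / NOT EXISTS
not_exists_rows = conn.execute("""
    SELECT p.product, p.manufacturer
      FROM product p
     WHERE EXISTS ( SELECT 1
                      FROM product r
                      JOIN orders s ON s.product = r.product
                     WHERE r.manufacturer = p.manufacturer )
       AND NOT EXISTS ( SELECT 1 FROM orders o WHERE o.product = p.product )
     ORDER BY p.product
""").fetchall()

print(left_join_rows)
print(not_exists_rows)
```

On this sample data both patterns return only the Widget row, since it is the one product nobody ordered.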
FOLLOWUP
NOTE: The "breakdown by gender/hour" part wasn't made clear in the original specification.
The query pattern is very much the same. Use an inline view query to return a distinct list of manufacturers for each gender/hour.
Then join that set to the product table, to get every product from those manufacturers. That will include products that were ordered, as well as products that weren't ordered.
Then apply the anti-join pattern, to exclude the products that were ordered by gender/hour.
SELECT q.gender
, q.hour
, p.manufacturer
, p.product
FROM ( SELECT s.gender
, s.hour
, r.manufacturer
FROM orders s
JOIN product r
ON r.product = s.product
GROUP
BY s.gender
, s.hour
, r.manufacturer
) q
JOIN product p
ON p.manufacturer = q.manufacturer
LEFT
JOIN orders o
ON o.gender = q.gender
AND o.hour = q.hour
AND o.product = p.product
WHERE o.product IS NULL
If that's not clear, consider that the following query returns an equivalent set. The inline view query t returns the set of all products from a manufacturer, by gender/hour.
This query is somewhat less efficient (at least in MySQL) due to the additional inline view. And while longer, it may be more understandable, since the view query t makes explicit the set of all possible rows that could be returned: every product by manufacturer/gender/hour. (To see that set, the view query t can be pulled out and run separately.)
In the outermost query, t is referenced as if it were a table. If t were replaced by a simple table reference, the query would just be a simple anti-join: all rows from t, excluding rows that have a match.
SELECT t.gender
, t.hour
, t.manufacturer
, t.product
FROM (
SELECT q.gender
, q.hour
, q.manufacturer
, p.product
FROM ( SELECT s.gender
, s.hour
, r.manufacturer
FROM orders s
JOIN product r
ON r.product = s.product
GROUP
BY s.gender
, s.hour
, r.manufacturer
) q
JOIN product p
ON p.manufacturer = q.manufacturer
) t
LEFT
JOIN orders o
ON o.gender = t.gender
AND o.hour = t.hour
AND o.product = t.product
WHERE o.product IS NULL
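The same pattern can be tried on the simplified sample data from the question, with "customer" taking the place of gender/hour. This is only a sketch under assumptions: SQLite stands in for MySQL, the table and column names are invented to match the aliases above, and (following the pattern above) a manufacturer is only considered for a customer who ordered at least one of its products.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product TEXT, manufacturer TEXT);
    CREATE TABLE orders  (customer TEXT, product TEXT);
    INSERT INTO product VALUES
        ('Widget', 'Acme'), ('Paddle', 'Acme'), ('Ball', 'Acme'),
        ('Gas', 'Exxon'), ('Pump', 'Exxon');
    INSERT INTO orders VALUES
        ('Karen', 'Ball'), ('Bob', 'Paddle'), ('Karen', 'Gas'), ('Bob', 'Pump');
""")

missing = conn.execute("""
    SELECT q.customer, p.manufacturer, p.product
      FROM ( -- distinct manufacturers per customer
             SELECT s.customer, r.manufacturer
               FROM orders s
               JOIN product r ON r.product = s.product
              GROUP BY s.customer, r.manufacturer ) q
      JOIN product p                 -- every product of those manufacturers
        ON p.manufacturer = q.manufacturer
      LEFT JOIN orders o             -- anti-join: drop what the customer ordered
        ON o.customer = q.customer
       AND o.product  = p.product
     WHERE o.product IS NULL
     ORDER BY q.customer, p.manufacturer, p.product
""").fetchall()

for row in missing:
    print(row)
```

For the sample rows this lists Ball, Widget and Gas for Bob, and Paddle, Widget and Pump for Karen.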
I recommend you get the set of rows returned first, before you futz with adding a GROUP BY and a GROUP_CONCAT aggregate to collapse the rows.
If you want to group multiple values of "hour" into just "am" or "pm", you can use an expression (in place of "hour") that returns "am" or "pm". (Think in terms of that expression being another column in the table; but instead of referencing a column in the table, you use an expression that derives the value from other columns in the table.)
IF(x.hour<12,'am','pm')
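A small sketch of grouping by such a derived expression follows. It is run against SQLite, so it uses the portable CASE form rather than MySQL's IF(); the orders table, its columns, and the sample rows are assumptions made up for the illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the table and its rows are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (gender TEXT, hour INTEGER, product TEXT);
    INSERT INTO orders VALUES
        ('F', 9, 'Ball'), ('F', 11, 'Gas'), ('M', 15, 'Pump');
""")

# CASE WHEN ... END plays the role of IF(x.hour<12,'am','pm').
# The expression is used both as an output column and as a grouping key.
rows = conn.execute("""
    SELECT s.gender,
           CASE WHEN s.hour < 12 THEN 'am' ELSE 'pm' END AS ampm,
           COUNT(*) AS order_count
      FROM orders s
     GROUP BY s.gender,
              CASE WHEN s.hour < 12 THEN 'am' ELSE 'pm' END
     ORDER BY s.gender, ampm
""").fetchall()

print(rows)
```

The two morning orders collapse into a single 'am' row, which is the effect of replacing "hour" with the expression.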

Related

How do MySQL aggregate sum function with two different data tables?

SELECT
category_id,
product_size,
category_name,
SUM(product_quantity) AS total_quantity
FROM tbl_categories_quantity -- (table-1)
INNER JOIN tbl_categories USING (category_id)
GROUP BY category_id,product_size
The above query works on a single table. I want to add the query below (on a second table), but the combination does not work:
SELECT
category_id,
product_size,
SUM(product_sell) AS total_sell
FROM tbl_product_sell -- (table-2)
GROUP BY category_id,product_size;

The first subquery retrieves the total quantity per category and product size, and the second retrieves the total sales per category and product size. The two subqueries are then combined with a LEFT JOIN, because a category/size may have no sales yet. COALESCE() replaces the resulting NULL with 0 (zero). If only a specific category or product size is required, add a WHERE clause to both subqueries. Because category_id is unique, MAX(category_name) can be used; otherwise category_name would have to be added to the GROUP BY clause. The available quantity is the total quantity minus the total sales.
-- MySQL
SELECT t.category_name category
, t.product_size
, t.product_quantity
, COALESCE(p.total_sell, 0) product_sell
, (t.product_quantity - COALESCE(p.total_sell, 0)) available_in_stock
FROM (SELECT tc.category_id
, tcq.product_size
, MAX(tc.category_name) category_name
, SUM(tcq.product_quantity) product_quantity
FROM tbl_categories tc
INNER JOIN tbl_categories_quantity tcq
ON tc.category_id = tcq.category_id
GROUP BY tc.category_id
, tcq.product_size) t
LEFT JOIN (SELECT category_id
, product_size
, SUM(product_sell) total_sell
FROM tbl_stock_sell
GROUP BY category_id
, product_size) p
ON t.category_id = p.category_id
AND t.product_size = p.product_size
See the demo at https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=b8c54aa656d9dc930fcb7a93d2bc0960
N.B.: Table name or column name may vary based on your DB.
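For readers who want to try the aggregate-then-join idea locally, here is a minimal sketch. It assumes SQLite in place of MySQL, and the sample rows (one 'Shirt' category with sizes M and L) are invented for the illustration; the table names follow the query above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all sample rows are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_categories (category_id INTEGER, category_name TEXT);
    CREATE TABLE tbl_categories_quantity
        (category_id INTEGER, product_size TEXT, product_quantity INTEGER);
    CREATE TABLE tbl_stock_sell
        (category_id INTEGER, product_size TEXT, product_sell INTEGER);
    INSERT INTO tbl_categories VALUES (1, 'Shirt');
    INSERT INTO tbl_categories_quantity VALUES (1, 'M', 10), (1, 'M', 5), (1, 'L', 8);
    INSERT INTO tbl_stock_sell VALUES (1, 'M', 4), (1, 'M', 3);
""")

# Each side is summed on its own before the join, so no row is counted twice.
rows = conn.execute("""
    SELECT t.category_name, t.product_size, t.product_quantity,
           COALESCE(p.total_sell, 0) AS product_sell,
           t.product_quantity - COALESCE(p.total_sell, 0) AS available_in_stock
      FROM ( SELECT tc.category_id, tcq.product_size,
                    MAX(tc.category_name)     AS category_name,
                    SUM(tcq.product_quantity) AS product_quantity
               FROM tbl_categories tc
               JOIN tbl_categories_quantity tcq
                 ON tcq.category_id = tc.category_id
              GROUP BY tc.category_id, tcq.product_size ) t
      LEFT JOIN ( SELECT category_id, product_size,
                         SUM(product_sell) AS total_sell
                    FROM tbl_stock_sell
                   GROUP BY category_id, product_size ) p
        ON p.category_id  = t.category_id
       AND p.product_size = t.product_size
     ORDER BY t.category_name, t.product_size
""").fetchall()

for row in rows:
    print(row)
```

Size L has no sales, so COALESCE turns its missing total into 0 and the full quantity stays available.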

Product groups in US that has no sale in SQL

I have written the following two queries for the requirement below. Please let me know which method is correct, or whether both are wrong. Thanks a lot.
There were two tables:
'Orders' with order_id (PK), item_id, quantity, order_date [Transactional Table]
'Catalog' with item_id, product_group, location [Dimension Table]
They asked me to write SQL code that returns the US product groups that have no sale in any unit (i.e. every item_id in an individual product group has no sale).
1st Method:
with cte as
(
select c.*,o.order_id,
case when o.order_id is not null then 1 else 0 end sale_ind
from Catalog c
left join Orders o
on c.item_id = o.item_id
and c.location = 'US'
)
select product_group
from cte
group by product_group having sum(sale_ind) = 0
2nd Method:
select c.*
from Catalog c
where c.location='US'
and item_id not in (
select item_id
from Orders)

They asked to write a SQL code that will return the product groups of US that has no sale in any unit(i.e all the item id from an individual product group has no sale).
I would tend to go with not exists for this:
select distinct c.product_group
from catalog c
where c.location = 'US' and
not exists (select 1
from orders o
where o.item_id = c.item_id
);
That said, of your two queries, only the first returns product groups. The second returns every matching catalog record, not product groups. I would also discourage you from ever using NOT IN with a subquery: if the subquery ever returns a NULL item_id, no rows are returned at all.
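The NOT IN pitfall is easy to reproduce. In this sketch, SQLite stands in for the asker's database, and the sample rows (including an order whose item_id is NULL) are assumptions made up to trigger the behaviour.

```python
import sqlite3

# In-memory SQLite as a stand-in; all sample rows are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE catalog (item_id INTEGER, product_group TEXT, location TEXT);
    CREATE TABLE orders  (order_id INTEGER, item_id INTEGER);
    INSERT INTO catalog VALUES (1, 'Toys', 'US'), (2, 'Toys', 'US'), (3, 'Books', 'US');
    -- order 101 has a NULL item_id
    INSERT INTO orders VALUES (100, 1), (101, NULL);
""")

# NOT IN: comparing against the NULL yields "unknown", so every row is rejected.
not_in_rows = conn.execute("""
    SELECT DISTINCT c.product_group
      FROM catalog c
     WHERE c.location = 'US'
       AND c.item_id NOT IN (SELECT item_id FROM orders)
     ORDER BY c.product_group
""").fetchall()

# NOT EXISTS: the NULL row simply never matches, so unsold items are found.
not_exists_rows = conn.execute("""
    SELECT DISTINCT c.product_group
      FROM catalog c
     WHERE c.location = 'US'
       AND NOT EXISTS (SELECT 1 FROM orders o WHERE o.item_id = c.item_id)
     ORDER BY c.product_group
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```

With the NULL present, the NOT IN version returns nothing, while NOT EXISTS still reports the groups that contain unsold items.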

SELECT DISTINCT c.product_group
FROM Catalog c
LEFT OUTER JOIN Orders o
on c.item_id = o.item_id
WHERE c.location='US'
AND o.item_id is null
A LEFT JOIN is used because you want catalog records (left side) even if there are no order records (right side). The second part of the WHERE clause filters out the rows where there are orders.
You can't use an inner join, as that would return only Catalog records that have corresponding orders, which is not what you want.
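One thing worth checking is how the requirement is read. If "no sale" must hold for every item in a product group, an item-level anti-join with DISTINCT also reports groups where only some items are unsold. The sketch below contrasts that with a group-level check using GROUP BY and HAVING. SQLite stands in for the real database, and the sample rows are assumptions.

```python
import sqlite3

# In-memory SQLite as a stand-in; all sample rows are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE catalog (item_id INTEGER, product_group TEXT, location TEXT);
    CREATE TABLE orders  (order_id INTEGER, item_id INTEGER);
    INSERT INTO catalog VALUES
        (1, 'Toys', 'US'), (2, 'Toys', 'US'), (3, 'Books', 'US'), (4, 'Games', 'UK');
    -- only item 1 (a Toys item) has ever been sold
    INSERT INTO orders VALUES (100, 1);
""")

# Item level: groups that contain at least one unsold US item.
item_level = conn.execute("""
    SELECT DISTINCT c.product_group
      FROM catalog c
      LEFT JOIN orders o ON o.item_id = c.item_id
     WHERE c.location = 'US'
       AND o.item_id IS NULL
     ORDER BY c.product_group
""").fetchall()

# Group level: groups in which no US item has any order at all.
# COUNT(o.item_id) counts only the rows where the LEFT JOIN found an order.
group_level = conn.execute("""
    SELECT c.product_group
      FROM catalog c
      LEFT JOIN orders o ON o.item_id = c.item_id
     WHERE c.location = 'US'
     GROUP BY c.product_group
    HAVING COUNT(o.item_id) = 0
     ORDER BY c.product_group
""").fetchall()

print(item_level)
print(group_level)
```

Toys appears in the item-level result because item 2 is unsold, but drops out of the group-level result because item 1 has a sale.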

duplicate data being return with varied # of results in cross join

I created a database for a clothing brand. The product data consists of 3 tables: products, product_photos, inventory.
The issue I'm having pertains to the number of results returned from the product_photos and inventory tables. Say there should be 4 results from the inventory table (e.g. size: s, m, l, xl and quantity: 20, 20, 20, 20) and 1 result from the product_photos table (e.g. photos: url). With my current query, the product_photos data is duplicated 4 times (e.g. photos: url, url, url, url).
My current query looks like this:
SELECT
products.id,
products.type,
products.collection,
products.title,
products.price,
products.thumbnail,
GROUP_CONCAT(product_photos.id) AS photoId,
GROUP_CONCAT(product_photos.photo) AS photos,
GROUP_CONCAT(inventory.size) AS size,
GROUP_CONCAT(inventory.quantity) as quantity
FROM `products`
RIGHT JOIN
`product_photos` ON products.id = product_photos.product_id
RIGHT JOIN `inventory` ON products.id = inventory.product_id
WHERE
products.id = ?
GROUP BY
products.id
I have played around with changing the RIGHT JOINs to INNER JOINs and LEFT JOINs, but RIGHT JOIN seemed to be the right choice for what I'm trying to do.
Here is some sample data:
product:
id: 1
product_photo:
id: 1,
product_id: 1,
photo: url
inventory:
id: 1,
product_id: 1,
size: s,
quantity: 20
id: 2,
product_id: 1,
size: l,
quantity: 14

The reason for the many results is that both the product_photos and the inventory table can have multiple records for the same product_id, so the join produces every pairwise combination of records from the two tables (a Cartesian product).
You can solve this by first selecting all records from both these tables with a union, and then joining that result with the products table:
SELECT products.id,
products.type,
products.collection,
products.title,
products.price,
products.thumbnail,
GROUP_CONCAT(photoId) AS photoId,
GROUP_CONCAT(photos) AS photos,
GROUP_CONCAT(size) AS size,
GROUP_CONCAT(quantity) as quantity
FROM products
LEFT JOIN (
SELECT product_id,
id AS photoId,
photo AS photos,
null AS size,
null AS quantity
FROM product_photos
UNION ALL
SELECT product_id,
null,
null,
size,
quantity
FROM inventory
) combi
ON products.id = combi.product_id
WHERE products.id = ?
GROUP BY products.id

This problem would normally be handled by aggregating along each dimension independently:
SELECT p.*,
pp.photoIds, pp.photos,
i.sizes, i.quantitys
FROM products p LEFT JOIN
(SELECT pp.product_id,
GROUP_CONCAT(pp.id) AS photoIds,
GROUP_CONCAT(pp.photo) AS photos
FROM product_photos pp
GROUP BY pp.product_id
) pp
ON p.id = pp.product_id LEFT JOIN
(SELECT i.product_id,
GROUP_CONCAT(i.size) AS sizes,
GROUP_CONCAT(i.quantity) as quantitys
FROM inventory i
GROUP BY i.product_id
) i
ON p.id = i.product_id
WHERE p.id = ?;
Notes:
You want LEFT JOIN from the products table, not RIGHT JOIN. Presumably, you want information about a product if it is available, not only if it has inventory.
You should include an ORDER BY so the GROUP_CONCAT() values are guaranteed to be in the same order. That is, the first id matches the first image, for instance.
Table aliases make the query easier to write and to read.
The above should be more efficient (and simpler to code and follow) than using UNION or UNION ALL. That is, two aggregations on "n" rows should be more efficient than one aggregation on "2 * n" rows. In more sophisticated databases, this formulation also gives the optimizer more information.
Because you only want this for one product, it is actually more efficient to filter before the aggregation:
SELECT p.*,
pp.photoIds, pp.photos,
i.sizes, i.quantitys
FROM products p LEFT JOIN
(SELECT pp.product_id,
GROUP_CONCAT(pp.id) AS photoIds,
GROUP_CONCAT(pp.photo) AS photos
FROM product_photos pp
WHERE pp.product_id = ?
GROUP BY pp.product_id
) pp
ON p.id = pp.product_id LEFT JOIN
(SELECT i.product_id,
GROUP_CONCAT(i.size) AS sizes,
GROUP_CONCAT(i.quantity) as quantitys
FROM inventory i
WHERE i.product_id = ?
GROUP BY i.product_id
) i
ON p.id = i.product_id
WHERE p.id = ?;
In this case, you need to pass the parameter three times.
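To see the duplication and the fix side by side, here is a minimal sketch using the sample data from the question (one photo, two inventory rows). SQLite stands in for MySQL, and the title column and its value are assumptions. Only order-independent facts are printed, since the order inside GROUP_CONCAT is not guaranteed without an explicit ordering.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the title column is assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER, title TEXT);
    CREATE TABLE product_photos (id INTEGER, product_id INTEGER, photo TEXT);
    CREATE TABLE inventory (id INTEGER, product_id INTEGER, size TEXT, quantity INTEGER);
    INSERT INTO products VALUES (1, 'T-shirt');
    INSERT INTO product_photos VALUES (1, 1, 'url');
    INSERT INTO inventory VALUES (1, 1, 's', 20), (2, 1, 'l', 14);
""")

# Joining both child tables directly pairs the photo with each inventory row.
naive = conn.execute("""
    SELECT GROUP_CONCAT(pp.photo), GROUP_CONCAT(i.size)
      FROM products p
      LEFT JOIN product_photos pp ON pp.product_id = p.id
      LEFT JOIN inventory i       ON i.product_id  = p.id
     WHERE p.id = ?
     GROUP BY p.id
""", (1,)).fetchone()

# Aggregating each child table on its own first avoids the pairing.
fixed = conn.execute("""
    SELECT pp.photos, i.sizes
      FROM products p
      LEFT JOIN ( SELECT product_id, GROUP_CONCAT(photo) AS photos
                    FROM product_photos
                   WHERE product_id = ?
                   GROUP BY product_id ) pp ON pp.product_id = p.id
      LEFT JOIN ( SELECT product_id, GROUP_CONCAT(size) AS sizes
                    FROM inventory
                   WHERE product_id = ?
                   GROUP BY product_id ) i  ON i.product_id = p.id
     WHERE p.id = ?
""", (1, 1, 1)).fetchone()

print(naive[0])   # the photo repeated once per inventory row
print(fixed[0])   # the photo listed once
print(sorted(fixed[1].split(",")))
```

The parameter is bound three times in the second query, once per subquery and once for the outer filter.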

mysql aggregate functions in query with two joins gives unexpected results

Given the following (very simplified) mysql table structure:
products
id
product_categories
id
product_id
status (integer)
product_tags
id
product_id
some_other_numeric_value
I am trying to find every product that is associated with a certain product_tag and that is related to at least one category whose status attribute is 1.
I tried the following query:
SELECT *
FROM `product` p
JOIN `product_categories` pc
ON p.`product_id` = pc.`product_id`
JOIN `product_tags` pt
ON p.`product_id` = pt.`product_id`
WHERE pt.`some_value` = 'some comparison value'
GROUP BY p.`product_id`
HAVING SUM( pc.`status` ) > 0
ORDER BY SUM( pt.`some_other_numeric_value` ) DESC
Now my problem is that SUM(pt.some_other_numeric_value) returns unexpected values.
I realized that if the product in question has more than one relation to the product_categories table, then every relation to the product_tags table is counted as many times as there are relations to the product_categories table!
For example: if the product with id=1 has relations to product_categories with ids 2, 3 and 4, and relations to product_tags with ids 5 and 6, then inserting a GROUP_CONCAT(pt.id) gives 5,6,5,6,5,6 instead of the expected 5,6.
At first I suspected it was a problem with the join type (left join, right join, inner join, and so on), so I tried every join type that I know of, but to no avail. I also tried to include more id fields in the GROUP BY clause, but this didn't solve the problem either.
Can somebody explain to me what is actually going wrong here?

You join a "main" (product) table to two tables (tags and categories) via 1:n relationships, so this is expected: you are creating a mini Cartesian product. For products that have more than one associated tag and more than one associated category, multiple rows are created in the result set. If you then GROUP BY, the aggregate functions return wrong results.
One way to avoid this is to remove one of the two joins, which is a valid strategy if you don't need results from that table. Say you don't need anything in the SELECT list from the product_categories table. Then you can use a semi-join (the EXISTS subquery) to that table:
SELECT p.*,
SUM( pt.`some_other_numeric_value` )
FROM `product` p
JOIN `product_tags` pt
ON p.`product_id` = pt.`product_id`
WHERE pt.`some_value` = 'some comparison value'
AND EXISTS
( SELECT *
FROM product_categories pc
WHERE pc.product_id = p.product_id
AND pc.status = 1
)
GROUP BY p.`product_id`
ORDER BY SUM( pt.`some_other_numeric_value` ) DESC ;
Another way to circumvent this problem is, after the GROUP BY MainTable.pk, to use DISTINCT inside the COUNT() or GROUP_CONCAT() aggregate functions. This works, but it is not safe with SUM(): SUM(DISTINCT ...) would also collapse legitimately repeated values. So it's not useful in your specific query.
A third option, which always works, is to first group each of the two (or more) side tables and then join to the main table. Something like this in your case:
SELECT p.* ,
COALESCE(pt.sum_other_values, 0) AS sum_other_values,
COALESCE(pt.cnt, 0) AS tags_count,
COALESCE(pc.cnt, 0) AS categories_count,
COALESCE(category_titles, '') AS category_titles
FROM `product` p
JOIN
( SELECT product_id
, COUNT(*) AS cnt
, GROUP_CONCAT(title) AS category_titles
FROM `product_categories` pc
WHERE status = 1
GROUP BY product_id
) AS pc
ON p.`product_id` = pc.`product_id`
JOIN
( SELECT product_id
, COUNT(*) AS cnt
, SUM(some_other_numeric_value) AS sum_other_values
FROM `product_tags` pt
WHERE some_value = 'some comparison value'
GROUP BY product_id
) AS pt
ON p.`product_id` = pt.`product_id`
ORDER BY sum_other_values DESC ;
The COALESCE() calls are not strictly needed there; they are included just in case you change the inner joins to LEFT outer joins.
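The inflated SUM can be reproduced with the numbers from the question's example (three categories, two tags). This is a sketch under assumptions: SQLite stands in for MySQL, and the tag values 10 and 20 and the comparison value 'x' are invented.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; tag values and 'x' are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER);
    CREATE TABLE product_categories (id INTEGER, product_id INTEGER, status INTEGER);
    CREATE TABLE product_tags
        (id INTEGER, product_id INTEGER, some_value TEXT, some_other_numeric_value INTEGER);
    INSERT INTO product VALUES (1);
    INSERT INTO product_categories VALUES (2, 1, 1), (3, 1, 1), (4, 1, 1);
    INSERT INTO product_tags VALUES (5, 1, 'x', 10), (6, 1, 'x', 20);
""")

# Two 1:n joins: each tag row is repeated once per category row.
naive_sum = conn.execute("""
    SELECT SUM(pt.some_other_numeric_value)
      FROM product p
      JOIN product_categories pc ON pc.product_id = p.product_id
      JOIN product_tags pt       ON pt.product_id = p.product_id
     WHERE pt.some_value = 'x'
     GROUP BY p.product_id
    HAVING SUM(pc.status) > 0
""").fetchone()[0]

# Group each side table first, then join the one-row-per-product results.
grouped_sum = conn.execute("""
    SELECT pt.sum_other_values
      FROM product p
      JOIN ( SELECT product_id, COUNT(*) AS cnt
               FROM product_categories
              WHERE status = 1
              GROUP BY product_id ) pc ON pc.product_id = p.product_id
      JOIN ( SELECT product_id, SUM(some_other_numeric_value) AS sum_other_values
               FROM product_tags
              WHERE some_value = 'x'
              GROUP BY product_id ) pt ON pt.product_id = p.product_id
""").fetchone()[0]

print(naive_sum, grouped_sum)
```

The direct double join triples the tag total, while the pre-grouped version returns the plain sum of the two tag values.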

MySQL does accept an aggregate such as SUM() in ORDER BY, but you can also give the sum an alias and order by that alias,
like this:
SELECT * ,SUM( pt.`some_other_numeric_value` ) as sumvalues
FROM `product` p
JOIN `product_categories` pc
ON p.`product_id` = pc.`product_id`
JOIN `product_tags` pt
ON p.`product_id` = pt.`product_id`
WHERE pt.`some_value` = 'some comparison value'
GROUP BY p.`product_id`
HAVING SUM( pc.`status` ) > 0
ORDER BY sumvalues DESC

select one to one relationship in one to many table structure

I have three MySQL tables:
tbl_part - Contains a list of parts with a part_id
tbl_product - Contains a list of products with a product_id
tbl_part_to_product - Contains one to many relationships between parts and products (part_id & product_id)
I'm trying to do two things:-
Select all products that only have one part.
Find all products that have a specific part as their only part.

SELECT
*
FROM
tbl_part part
INNER JOIN tbl_part_to_product p2p ON part.part_id = p2p.part_id
INNER JOIN tbl_product prod ON p2p.product_id =prod.product_id
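GROUP BY prod.product_id
HAVING COUNT(*) = 1
AND MAX(part.name) = 'whatever'
Filtering on part.name in a WHERE clause would drop the product's other parts before they are counted, so the name check belongs in HAVING. To select all products that only have one part, just remove the MAX(part.name) condition.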
If you don't want to join to the parts table:
SELECT
*
FROM
tbl_product prod
INNER JOIN tbl_part_to_product p2p ON p2p.product_id =prod.product_id
GROUP BY prod.product_id
HAVING COUNT(*) = 1

Let's address both questions:
Select all products that only have one part.
SELECT product.*
FROM
tbl_product product
INNER JOIN tbl_part_to_product ptop ON ptop.product_id = product.product_id
GROUP BY product.product_id
HAVING COUNT(ptop.part_id) = 1
Find all products that have a specific part as their only part.
SELECT product.*
FROM
tbl_product product
INNER JOIN tbl_part_to_product ptop ON ptop.product_id = product.product_id
INNER JOIN tbl_part part ON ptop.part_id = part.part_id
GROUP BY product.product_id
-- check the name after grouping; a WHERE filter would hide the product's other parts
HAVING COUNT(ptop.part_id) = 1
AND MAX(part.part_name) = 'some name'
If you get grouping errors from either query, you need to add all the tbl_product fields to the GROUP BY.
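One subtlety worth checking with data: a WHERE filter on the part name is applied before grouping, so COUNT only sees the rows for that part. The sketch below compares the two placements of the name check. SQLite stands in for MySQL, and the sample parts ('bolt', 'nut') and product ids are assumptions.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all sample rows are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_part (part_id INTEGER, part_name TEXT);
    CREATE TABLE tbl_product (product_id INTEGER);
    CREATE TABLE tbl_part_to_product (part_id INTEGER, product_id INTEGER);
    INSERT INTO tbl_part VALUES (1, 'bolt'), (2, 'nut');
    INSERT INTO tbl_product VALUES (10), (20);
    -- product 10 has only the bolt; product 20 has a bolt and a nut
    INSERT INTO tbl_part_to_product VALUES (1, 10), (1, 20), (2, 20);
""")

# Name check in WHERE: the nut row is removed before COUNT runs.
where_rows = conn.execute("""
    SELECT product.product_id
      FROM tbl_product product
      JOIN tbl_part_to_product ptop ON ptop.product_id = product.product_id
      JOIN tbl_part part            ON part.part_id    = ptop.part_id
     WHERE part.part_name = 'bolt'
     GROUP BY product.product_id
    HAVING COUNT(ptop.part_id) = 1
     ORDER BY product.product_id
""").fetchall()

# Name check in HAVING: all parts are counted, then the single part is tested.
having_rows = conn.execute("""
    SELECT product.product_id
      FROM tbl_product product
      JOIN tbl_part_to_product ptop ON ptop.product_id = product.product_id
      JOIN tbl_part part            ON part.part_id    = ptop.part_id
     GROUP BY product.product_id
    HAVING COUNT(ptop.part_id) = 1
       AND MAX(part.part_name) = 'bolt'
     ORDER BY product.product_id
""").fetchall()

print(where_rows)
print(having_rows)
```

Only the HAVING placement excludes product 20, which has the bolt but also a second part.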
