Two left joins give me untrue data (double data?) with MySQL

This is my query:
SELECT `products`.*,
       SUM(orders.total_count) AS revenue,
       SUM(orders.quantity) AS qty,
       ROUND(AVG(product_reviews.stars)) AS avg_stars
FROM `products`
LEFT JOIN `orders`
       ON (`products`.`id` = `orders`.`product_id`)
      AND (`orders`.`status` = 'delivered' OR `orders`.`status` = 'new')
LEFT JOIN product_reviews
       ON (products.id = product_reviews.product_id)
GROUP BY `products`.`ID`
ORDER BY products.ID DESC
LIMIT 10
OFFSET 0
With the second left join in place, the revenue and qty values from the orders table are wrong: they come out far too high, as if rows were being counted more than once.
From another question I got the hint that I am getting a semi-cartesian product, so two reviews for a product double the quantities, and I believe this is my problem.
How can this be solved?

The problem is that the product_reviews and orders tables can each have more than one row per product id. One way to fix this is to aggregate each table in its own subquery before joining:
SELECT `products`.*,
       o.revenue,
       o.qty,
       ROUND(pr.avg_stars) AS avg_stars
FROM `products`
LEFT JOIN
(
    SELECT `product_id`,
           SUM(total_count) AS revenue,
           SUM(quantity) AS qty
    FROM `orders`
    WHERE `status` IN ('delivered', 'new')
    GROUP BY `product_id`
) o
    ON `products`.`id` = o.`product_id`
LEFT JOIN
(
    SELECT product_id, AVG(stars) AS avg_stars
    FROM product_reviews
    GROUP BY product_id
) pr
    ON (products.id = pr.product_id)
ORDER BY products.ID DESC
LIMIT 10
OFFSET 0

It's not easy to solve this without seeing your table schemas.
I would suggest you look at your aggregations and GROUP BY clauses first, then at your column default values and how you handle empty values, and also at DISTINCT inside the aggregate functions.
If all else fails, an optimized solution is not vital, and your data volumes are low, use a subselect for each table whose values you need. Within a subselect on a single table the row scope is much narrower, and it will yield the correct result.
I would suggest that you supply your table schemas here.

One approach to avoid that problem is to use a correlated subquery in the SELECT list, rather than a left join:
SELECT p.*
     , SUM(o.total_count) AS revenue
     , SUM(o.quantity) AS qty
     , ( SELECT ROUND(AVG(r.stars))
           FROM `product_reviews` r
          WHERE r.product_id = p.id
       ) AS avg_stars
  FROM `products` p
  LEFT
  JOIN `orders` o
    ON o.product_id = p.id
   AND o.status IN ('delivered','new')
 GROUP BY p.id
 ORDER BY p.id DESC
 LIMIT 10
OFFSET 0
This isn't the only approach, and it's not necessarily the best approach, especially with large sets. But given that the subquery will run a maximum of 10 times (given the LIMIT clause), performance should be reasonable, given an appropriate index on product_reviews(product_id, stars).
If you were returning all product ids, or a significant percentage of them, then using an inline view might give better performance (avoiding the nested loops execution of the correlated subquery in the select list):
SELECT p.*
     , SUM(o.total_count) AS revenue
     , SUM(o.quantity) AS qty
     , s.avg_stars
  FROM `products` p
  LEFT
  JOIN `orders` o
    ON o.product_id = p.id
   AND o.status IN ('delivered','new')
  LEFT
  JOIN ( SELECT ROUND(AVG(r.stars)) AS avg_stars
              , r.product_id
           FROM `product_reviews` r
          GROUP BY r.product_id
       ) s
    ON s.product_id = p.id
 GROUP BY p.id
 ORDER BY p.id DESC
 LIMIT 10
OFFSET 0
Just to be clear: the issue with the original query is that every order for a product is getting matched to every review for the product.
I apologize if my use of the term "semi-cartesian" was misleading or confusing.
The idea that I meant to convey by that was that you had two distinct sets (the set of orders for a product, and the set of reviews for a product), and that your query was generating a "cross product" of those two distinct sets, basically "matching" every order to every review (for a particular product).
For example, given three rows in reviews for product_id 101, and two rows in orders for product_id 101, e.g.:
REVIEWS
pid  stars  text
---  -----  ---------------
101  4.5    woo hoo perfect
101  3      ehh
101  1      totally sucked
ORDERS
pid  date  qty
---  ----  ---
101  1/13  100
101  1/22    7
Your original query is essentially forming a result set with six rows in it, each row from order being matched to all three rows from reviews:
id   date  qty  stars  text
---  ----  ---  -----  ---------------
101  1/13  100  4.5    woo hoo perfect
101  1/13  100  3      ehh
101  1/13  100  1      totally sucked
101  1/22    7  4.5    woo hoo perfect
101  1/22    7  3      ehh
101  1/22    7  1      totally sucked
Then, when the SUM aggregate on qty gets applied, the values returned are way bigger than you expect.
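The row multiplication described above can be reproduced in a few lines. This is a minimal sketch, not the asker's setup: it uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL, keeps only the columns needed for the sums (the date and text columns are left out), and loads exactly the two orders and three reviews from the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption made for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (product_id INTEGER, quantity INTEGER);
    CREATE TABLE product_reviews (product_id INTEGER, stars REAL);
    INSERT INTO products VALUES (101);
    INSERT INTO orders VALUES (101, 100), (101, 7);
    INSERT INTO product_reviews VALUES (101, 4.5), (101, 3), (101, 1);
""")

# Both child tables joined directly: 2 orders x 3 reviews = 6 rows get summed.
naive = conn.execute("""
    SELECT p.id, SUM(o.quantity) AS qty
    FROM products p
    LEFT JOIN orders o ON o.product_id = p.id
    LEFT JOIN product_reviews r ON r.product_id = p.id
    GROUP BY p.id
""").fetchone()

# Each child table aggregated first, so the join sees one row per product.
fixed = conn.execute("""
    SELECT p.id, o.qty, pr.avg_stars
    FROM products p
    LEFT JOIN (SELECT product_id, SUM(quantity) AS qty
               FROM orders GROUP BY product_id) o ON o.product_id = p.id
    LEFT JOIN (SELECT product_id, AVG(stars) AS avg_stars
               FROM product_reviews GROUP BY product_id) pr ON pr.product_id = p.id
""").fetchone()

print(naive[1])  # 321, i.e. (100 + 7) counted once per review
print(fixed[1])  # 107
```

The inflated total is exactly the true total times the number of reviews, which is a handy fingerprint when diagnosing this kind of query.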

Related

Find out extra quantity available in a table

I have two tables, let's say OrderPlaced and OrderDelivered.
The OrderPlaced table looks like this:
A single order can contain multiple products (identified by sku in the table), and each product can have a quantity greater than one.
The OrderDelivered table looks like this:
So technically 3 products have not been delivered: Orderid 1000 - product S101, Orderid 1001 - product S102 (3 ordered, but only 2 delivered) and Orderid 1002 - product S100.
I am trying to write a SQL query that gives me the OrderId and sku of the products that have not been delivered. So far I have written:
select OrderPlaced.orderid, OrderPlaced.sku
from OrderPlaced
left join OrderDelivered
    on OrderPlaced.Orderid = OrderDelivered.orderid
   and OrderPlaced.sku = OrderDelivered.sku
where OrderDelivered.sku is NULL;
This is giving me Orderid 1000 - product S101 and Orderid 1002 - product S100, but Orderid 1001 - product S102 is missing. I understand I have to do a check on qty as well, but couldn't think how to do that. I would really appreciate it if someone can help me with that part.

Add up the deliveries per order and sku and then outer join the delivered quantities to the order table so you can compare the quantities.
select
    p.orderid,
    p.sku,
    p.qty as ordered,
    coalesce(d.sum_qty, 0) as delivered
from orderplaced p
left join
(
    select orderid, sku, sum(qty) as sum_qty
    from orderdelivered
    group by orderid, sku
) d on d.orderid = p.orderid and d.sku = p.sku
where p.qty > coalesce(d.sum_qty, 0)
order by p.orderid, p.sku;
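To see this query pick up the partially delivered line as well, here is a small sketch run against SQLite through Python's sqlite3 module instead of MySQL. The question's table contents are not shown, so the rows below are invented: they were chosen only so that the three cases named in the question (1000/S101, 1001/S102 with 2 of 3 delivered, 1002/S100) come out as undelivered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderplaced (orderid INTEGER, sku TEXT, qty INTEGER);
    CREATE TABLE orderdelivered (orderid INTEGER, sku TEXT, qty INTEGER);
    -- Hypothetical rows; the real tables from the question are not available.
    INSERT INTO orderplaced VALUES
        (1000, 'S100', 1), (1000, 'S101', 2), (1001, 'S102', 3), (1002, 'S100', 1);
    -- S102 is delivered in two separate rows of 1 each.
    INSERT INTO orderdelivered VALUES
        (1000, 'S100', 1), (1001, 'S102', 1), (1001, 'S102', 1);
""")

# Sum deliveries per order and sku, then compare against the ordered quantity.
undelivered = conn.execute("""
    SELECT p.orderid, p.sku, p.qty AS ordered, COALESCE(d.sum_qty, 0) AS delivered
    FROM orderplaced p
    LEFT JOIN (
        SELECT orderid, sku, SUM(qty) AS sum_qty
        FROM orderdelivered
        GROUP BY orderid, sku
    ) d ON d.orderid = p.orderid AND d.sku = p.sku
    WHERE p.qty > COALESCE(d.sum_qty, 0)
    ORDER BY p.orderid, p.sku
""").fetchall()

for row in undelivered:
    print(row)
# (1000, 'S101', 2, 0)
# (1001, 'S102', 3, 2)
# (1002, 'S100', 1, 0)
```

The fully delivered line (1000, S100) drops out, while the partially delivered one stays in with its delivered total.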

Your query works for items that have not been delivered at all; that is what WHERE OrderDelivered.sku IS NULL checks. But fewer items can also be delivered than were ordered, and, importantly, several delivery records can refer to the same order and sku (for example two rows with a qty of 1 each).
In this case you need to sum all the deliveries per placed order id, sku and quantity (the GROUP BY clause in the query below) and check whether that sum (or 0 if nothing is found) differs from the placed quantity (the HAVING clause). You could use a query such as:
SELECT OrderPlaced.orderid, OrderPlaced.sku,
       OrderPlaced.qty - COALESCE(SUM(OrderDelivered.qty), 0) AS qty_missing,
       CASE
           WHEN SUM(OrderDelivered.qty) IS NULL
           THEN 'Yes'
           ELSE 'No'
       END AS is_missing_completely
FROM OrderPlaced
LEFT
JOIN OrderDelivered
  ON OrderPlaced.Orderid = OrderDelivered.orderid
 AND OrderPlaced.sku = OrderDelivered.sku
GROUP BY OrderPlaced.orderid, OrderPlaced.sku, OrderPlaced.qty
HAVING OrderPlaced.qty != COALESCE(SUM(OrderDelivered.qty), 0)

I would create two aggregated representations of your ordered and delivered products, and then outer join them to get the differences. If you are using MySQL 8 you can write these as CTEs; otherwise use two equivalent subqueries:
with op as (
    select OrderId, Sku, Sum(qty) Qty
    from OrderPlaced
    group by OrderId, Sku
), od as (
    select OrderId, Sku, Sum(qty) Qty
    from OrderDelivered
    group by OrderId, Sku
)
select op.OrderId, op.Sku, op.Qty - Coalesce(od.qty,0) notDelivered
from op
left join od on od.orderid = op.orderid and od.sku = op.sku
where op.Qty - Coalesce(od.qty,0)>0;

Calculate Count of items summed up in a Group By Query

I have two tables:
Orders
======
id  total_price  created_on
1   100          2021-01-22
2   200          2021-01-23
Items
=====
id  order_id
11  1
12  1
13  2
I want to create a query to get revenue by date. For this I'm going to sum the total price of the orders, grouped by date. Along with revenue, I also want the total number of orders and items for that date. Here's the query I wrote:
SELECT
    count(orders.id) as orders,
    sum(orders.total_price) as billing,
    DATE(CREATED_ON) as created_on
FROM
    orders
WHERE orders.deleted_on IS NULL
group by Date(orders.created_on);
Now I found 2 problems:
1. The count of orders is coming out incorrect. I'm not sure what I'm doing wrong here.
2. How can I calculate the count of items in the same query as well?
I'm learning SQL and this seems a bit difficult to get my head around. Thanks for your help.

Since Items.order_id is a foreign key to Orders.id, we need to join both tables first.
SELECT COUNT(order_id) AS orders,
       SUM(total_price) AS billing,
       Orders.created_on AS created_on
FROM Orders,
     (SELECT order_id FROM Items) AS new
WHERE Orders.id = new.order_id
GROUP BY created_on;

This is tricky, because when you combine the items you might multiply the revenue. One method is to aggregate the items before joining to orders:
SELECT DATE(o.created_on) AS created_on_date,
       COUNT(*) AS num_orders,
       SUM(i.num_items) AS num_items,
       SUM(o.total_price) AS billing
FROM orders o LEFT JOIN
     (SELECT i.order_id, COUNT(*) AS num_items
      FROM items i
      GROUP BY i.order_id
     ) i
     ON i.order_id = o.id
WHERE o.deleted_on IS NULL
GROUP BY DATE(o.created_on);
Note: This uses a LEFT JOIN because you have not specified that all orders have items. If all do then an INNER JOIN would suffice.
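The difference between joining items directly and aggregating them first can be checked with the two orders and three items from the question. This is a sketch under a couple of assumptions: SQLite (via Python's sqlite3) replaces MySQL, and the deleted_on column, which the question's query references but its sample table does not show, is added here and left NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- deleted_on is assumed from the question's WHERE clause.
    CREATE TABLE orders (id INTEGER, total_price INTEGER,
                         created_on TEXT, deleted_on TEXT);
    CREATE TABLE items (id INTEGER, order_id INTEGER);
    INSERT INTO orders VALUES (1, 100, '2021-01-22', NULL),
                              (2, 200, '2021-01-23', NULL);
    INSERT INTO items VALUES (11, 1), (12, 1), (13, 2);
""")

# Direct join: order 1 appears once per item, so its price is summed twice.
naive = conn.execute("""
    SELECT DATE(o.created_on) AS created_on_date,
           COUNT(o.id) AS num_orders,
           SUM(o.total_price) AS billing
    FROM orders o
    JOIN items i ON i.order_id = o.id
    WHERE o.deleted_on IS NULL
    GROUP BY DATE(o.created_on)
    ORDER BY created_on_date
""").fetchall()

# Items counted per order first, then joined: one row per order.
fixed = conn.execute("""
    SELECT DATE(o.created_on) AS created_on_date,
           COUNT(*) AS num_orders,
           SUM(i.num_items) AS num_items,
           SUM(o.total_price) AS billing
    FROM orders o
    LEFT JOIN (SELECT order_id, COUNT(*) AS num_items
               FROM items GROUP BY order_id) i ON i.order_id = o.id
    WHERE o.deleted_on IS NULL
    GROUP BY DATE(o.created_on)
    ORDER BY created_on_date
""").fetchall()

print(naive)  # [('2021-01-22', 2, 200), ('2021-01-23', 1, 200)]
print(fixed)  # [('2021-01-22', 1, 2, 100), ('2021-01-23', 1, 1, 200)]
```

In the direct join, 2021-01-22 reports two orders and a billing of 200 even though only one order of 100 was placed that day.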

Optimize select product details each customer purchased from sales tables

I have 4 tables, Customers, Products, Sales & Sale_Items. I pull data from it using the query below.
SELECT (
         SELECT c.name
         FROM Customers c
         WHERE s.customer_id = c.id
       ) customer
     , (
         SELECT GROUP_CONCAT(description)
         FROM (
                SELECT si.id
                     , si.sale_id
                     , CONCAT("x", si.Qty, " ", p.name, " ", (si.total)) description
                FROM Sale_Items si
                LEFT JOIN Products p ON p.id = si.product_id
              ) p
         WHERE s.id = Sale_ID
         GROUP BY Sale_ID
       ) detail
     , s.total
FROM Sales s
The query produces the result, but it is slow with just 2000 records (it takes 114 seconds to finish).
Customer  Product          Total
--------------------------------
James     x1 ItemA 10.00   75.00
          x3 ItemB 15.00
          x1 ItemC 20.00
Mark      x2 ItemA 10.00   50.00
          x2 ItemB 15.00
Bisi      x1 ItemC 20.00   30.00
          x2 ItemA 10.00
How can I make this faster?
An attempt has been made here:
https://www.db-fiddle.com/f/pkL2HtsT659EXgRSevFSAm/4

If we want to stick with correlated subqueries, we can eliminate the inline view p.
That view is going to be materialized for every row retrieved from Sales. The predicate in the WHERE clause of the outer query doesn't get "pushed" down into the view. So the materialized view (or "derived table" in MySQL parlance) is going to be a full set, and from that, we're going to pick out just a few rows. And we're going to repeat that for each row from Sales.
Unwinding that derived table should give us some performance benefit. This would be a reasonable approach for a small number of rows returned from Sales, with suitable indexes defined. That is, if we were limiting the number of rows examined by the outer query with a WHERE clause. With a large number of rows, those correlated subqueries are going to drag down performance.
SELECT ( SELECT c.name
           FROM Customers c
          WHERE c.id = s.customer_id
       ) AS customer
     , ( SELECT GROUP_CONCAT(CONCAT('x',si.Qty,' ',p.name,' ',si.total) ORDER BY p.name SEPARATOR '\r\n')
           FROM Sale_Items si
           LEFT
           JOIN Products p
             ON p.id = si.product_id
          WHERE si.sale_id = s.id
       ) AS detail
     , s.total
  FROM Sales s
 WHERE ...
 ORDER
    BY ...
If the query is returning all rows from Sales and we are doing the whole bloomin' set, then I'd tend to avoid the correlated subqueries. (That's because the subqueries get executed for each and every row returned by the outer query. Those subqueries are going to eat our lunch, in terms of performance, with a large number of rows returned.)
Assuming id is unique in customers, we're usually much better off with a join operation:
SELECT c.name AS customer
     , d.detail
     , s.total
  FROM Sales s
  LEFT
  JOIN Customers c
    ON c.id = s.customer_id
  LEFT
  JOIN ( SELECT si.sale_id
              , GROUP_CONCAT(CONCAT('x',si.Qty,' ',p.name,' ',si.total) ORDER BY p.name SEPARATOR '\r\n') AS detail
           FROM Sale_Items si
           LEFT
           JOIN Products p
             ON p.id = si.product_id
          GROUP
             BY si.sale_id
       ) d
    ON d.sale_id = s.id
 ORDER
    BY ...
The inline view d is going to be expensive with large sets; but at least we're only doing that query one time, materializing the results into a "derived table". Then the outer query can run, and retrieve rows from the derived table.
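The shape of that last query (one grouped derived table, joined once) can be tried out on a toy data set. This sketch is only an approximation of the MySQL version: it runs on SQLite through Python's sqlite3, so CONCAT is replaced by the || operator, printf formats the amount, and group_concat takes the separator as a second argument with no ORDER BY inside the aggregate. The column layout of the four tables is assumed from the queries above, and the rows loosely mirror the James and Mark lines of the sample output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table layouts are assumptions inferred from the queries in the thread.
    CREATE TABLE Customers (id INTEGER, name TEXT);
    CREATE TABLE Products (id INTEGER, name TEXT);
    CREATE TABLE Sales (id INTEGER, customer_id INTEGER, total REAL);
    CREATE TABLE Sale_Items (id INTEGER, sale_id INTEGER, product_id INTEGER,
                             qty INTEGER, total REAL);
    INSERT INTO Customers VALUES (1, 'James'), (2, 'Mark');
    INSERT INTO Products VALUES (1, 'ItemA'), (2, 'ItemB'), (3, 'ItemC');
    INSERT INTO Sales VALUES (1, 1, 75.0), (2, 2, 50.0);
    INSERT INTO Sale_Items VALUES
        (1, 1, 1, 1, 10.0), (2, 1, 2, 3, 15.0), (3, 1, 3, 1, 20.0),
        (4, 2, 1, 2, 10.0), (5, 2, 2, 2, 15.0);
""")

# The derived table d is built once for all sales, then joined by sale_id.
rows = conn.execute("""
    SELECT c.name AS customer, d.detail, s.total
    FROM Sales s
    LEFT JOIN Customers c ON c.id = s.customer_id
    LEFT JOIN (
        SELECT si.sale_id,
               group_concat('x' || si.qty || ' ' || p.name || ' '
                            || printf('%.2f', si.total), char(10)) AS detail
        FROM Sale_Items si
        LEFT JOIN Products p ON p.id = si.product_id
        GROUP BY si.sale_id
    ) d ON d.sale_id = s.id
    ORDER BY s.id
""").fetchall()

for customer, detail, total in rows:
    # Lines inside detail are sorted here because SQLite does not promise an order.
    print(customer, sorted(detail.split("\n")), total)
```

Each sale comes back as a single row whose detail column holds one line per item, which is the layout the question was after.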

Retrieve most popular products in a category sql statement

I'm trying to write an all-encompassing SQL statement to retrieve the most popular items from my shop based on category ID. In English: for a given categoryID, sort the items by most sold.
I've managed to query tblOrderContents and group + sort the results with most popular at the top:
SELECT productID, count(productID)
FROM tblOrderContents oc
GROUP BY productID
ORDER BY count(productID) DESC
which produces:
ID COUNT
16 419
12 52
34 38
33 33
But I'm struggling to figure out how to retrieve this for only a specific category. I'm thinking it will require a left join along the following lines, but this doesn't work:
SELECT productID from tblProdCat
LEFT JOIN (SELECT STATEMENT DETAILED ABOVE)
WHERE categoryID = '7'
I hope this makes sense. I've been thinking about it so long I'm sure I've missed something obvious. Any advice would be great.
Thanks

Is productID in tblProdCat? If so, the following statement should work (the columns are qualified with the table alias, because productID exists in both tables):
SELECT oc.productID, COUNT(oc.productID)
FROM tblOrderContents oc
INNER JOIN tblProdCat pc
        ON oc.productID = pc.productID
       AND pc.categoryID = '7'
GROUP BY oc.productID
ORDER BY COUNT(oc.productID) DESC
Explanation: this inner join returns only rows that meet both criteria (categoryID = '7' and matching productIDs). A row that does not satisfy these criteria, in either table, is not returned.
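A quick way to convince yourself that the join filters by category is to run it on a handful of rows. The sketch below uses SQLite through Python's sqlite3 rather than the asker's database, and both the table layouts and the rows are made up for illustration: products 16 and 34 are placed in category 7 and product 12 in category 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layouts and rows; only the table and column names
    -- come from the question.
    CREATE TABLE tblOrderContents (productID INTEGER);
    CREATE TABLE tblProdCat (productID INTEGER, categoryID INTEGER);
    INSERT INTO tblOrderContents VALUES (16), (16), (16), (12), (34), (34);
    INSERT INTO tblProdCat VALUES (16, 7), (34, 7), (12, 3);
""")

# Count order lines per product, keeping only products in the given category.
popular = conn.execute("""
    SELECT oc.productID, COUNT(oc.productID) AS times_sold
    FROM tblOrderContents oc
    INNER JOIN tblProdCat pc
            ON oc.productID = pc.productID
           AND pc.categoryID = ?
    GROUP BY oc.productID
    ORDER BY times_sold DESC
""", (7,)).fetchall()

print(popular)  # [(16, 3), (34, 2)]
```

Product 12 is sold once but belongs to another category, so it does not appear. One thing to watch for: if a product can be listed in tblProdCat more than once for the same category, its count would be multiplied by the number of matching rows.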

getting stocks on hand from two dependent tables and join the result to another table

I have a problem regarding the stated title. I want to get the stocks_on_hand from these two tables:
stocks_added
product_id  quantity_added
ANK001      50
stocks_released
product_id  quantity_released
ANK001      20
After getting the stocks_on_hand (the result of the two tables), I want to join it to the products table:
product_id  product_name  price
ANK0001     ANKLET        200

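Use joins and sum(), summing each stock table in its own subquery before joining:
select
    p.product_id, p.product_name, p.price,
    coalesce(a.qty_added, 0) - coalesce(r.qty_released, 0) as stocks_on_hand
from products p
left join (
    select product_id, sum(quantity_added) as qty_added
    from stocks_added
    group by product_id
) a on a.product_id = p.product_id
left join (
    select product_id, sum(quantity_released) as qty_released
    from stocks_released
    group by product_id
) r on r.product_id = p.product_id
Using outer (i.e. left) joins means you'll get a row for every product whether or not there are rows in the stock movement tables. Using coalesce() means a default value of zero is used when there are no rows in a stock table. Summing each stock table separately matters once a product has several rows in both tables: joined directly, every added row would be paired with every released row and both sums would be inflated.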
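A stock query of this kind is worth testing with more than one movement row per product, since that is where direct joins and pre-aggregated joins diverge. The sketch below does that on SQLite through Python's sqlite3, as a stand-in for the asker's database. The extra rows (a second addition of 30 and a second release of 10) are invented, and the product id is written as ANK001 in all three tables on the assumption that the ANK0001 in the question is a typo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT, product_name TEXT, price INTEGER);
    CREATE TABLE stocks_added (product_id TEXT, quantity_added INTEGER);
    CREATE TABLE stocks_released (product_id TEXT, quantity_released INTEGER);
    INSERT INTO products VALUES ('ANK001', 'ANKLET', 200);
    -- Second row in each table is hypothetical, added to expose row multiplication.
    INSERT INTO stocks_added VALUES ('ANK001', 50), ('ANK001', 30);
    INSERT INTO stocks_released VALUES ('ANK001', 20), ('ANK001', 10);
""")

# Direct joins: 2 added rows x 2 released rows = 4 rows, so each sum doubles.
naive = conn.execute("""
    SELECT COALESCE(SUM(a.quantity_added), 0)
         - COALESCE(SUM(r.quantity_released), 0)
    FROM products p
    LEFT JOIN stocks_added a ON a.product_id = p.product_id
    LEFT JOIN stocks_released r ON r.product_id = p.product_id
    GROUP BY p.product_id
""").fetchone()[0]

# Each movement table summed per product before the join.
fixed = conn.execute("""
    SELECT COALESCE(a.qty_added, 0) - COALESCE(r.qty_released, 0)
    FROM products p
    LEFT JOIN (SELECT product_id, SUM(quantity_added) AS qty_added
               FROM stocks_added GROUP BY product_id) a
           ON a.product_id = p.product_id
    LEFT JOIN (SELECT product_id, SUM(quantity_released) AS qty_released
               FROM stocks_released GROUP BY product_id) r
           ON r.product_id = p.product_id
""").fetchone()[0]

print(naive)  # 100, i.e. 160 - 60
print(fixed)  # 50, i.e. 80 - 30
```

With a single row in each movement table the two queries agree, which is why the problem tends to go unnoticed on small test data.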
