Converting Multiple subqueries with GROUP BY to JOIN - mysql

I'm working on a simple ordering system in MySQL and I came across this snag that I'm hoping some SQL genius can help me out with.
I have a table for Orders, Payments (with a foreign key reference to the Order table), and OrderItems (also, with a foreign key reference to the Order table) and what I would like to do is get the total outstanding balance (Total and Paid) for the Order with a single query. My initial thought was to do something simple like this:
SELECT Order.*, SUM(OrderItem.Amount) AS Total, SUM(Payment.Amount) AS Paid
FROM Order
JOIN OrderItem ON OrderItem.OrderId = Order.OrderId
JOIN Payment ON Payment.OrderId = Order.OrderId
GROUP BY Order.OrderId
However, if there are multiple Payments or multiple OrderItems, it messes up Total or Paid, respectively (eg. One OrderItem record with an amount of 100 along with two Payment Records will produce a Total of 200).
In order to overcome this, I can use some subqueries in the following way:
SELECT Order.OrderId, OrderItemGrouped.Total, PaymentGrouped.Paid
FROM Order
JOIN (
SELECT OrderItem.OrderId, SUM(OrderItem.Amount) AS Total
FROM OrderItem
GROUP BY OrderItem.OrderId
) OrderItemGrouped ON OrderItemGrouped.OrderId = Order.OrderId
JOIN (
SELECT Payment.OrderId, SUM(Payment.Amount) AS Paid
FROM Payment
GROUP BY Payment.OrderId
) PaymentGrouped ON PaymentGrouped.OrderId = Order.OrderId
As you can imagine (and as an EXPLAIN on this query will show), this is not exactly an optimal query so, I'm wondering, is there any way to convert these two subqueries with GROUP BY statements into JOINs?

The following is likely to be faster with the right indexes:
select o.OrderId,
(select sum(oi.Amount)
from OrderItem oi
where oi.OrderId = o.OrderId
) as Total,
(select sum(p.Amount)
from Payment p
where oi.OrderId = o.OrderId
) as Paid
from Order o;
The right indexes are OrderItem(OrderId, Amount) and Payment(OrderId, Amount).
I don't like writing aggregation queries this way, but it can sometimes help performance in MySQL.

Some answers have already suggested using a correlated subquery, but have not really offered an explanation as to why. MySQL does not materialise correlated subqueries, but it will materialise a derived table. That is to say with a simplified version of your query as it is now:
SELECT Order.OrderId, OrderItemGrouped.Total
FROM Order
JOIN (
SELECT OrderItem.OrderId, SUM(OrderItem.Amount) AS Total
FROM OrderItem
GROUP BY OrderItem.OrderId
) OrderItemGrouped ON OrderItemGrouped.OrderId = Order.OrderId;
At the start of execution MySQL will put the results of your subquery into a temporary table, and hash this table on OrderId for faster lookups, whereas if you run:
SELECT Order.OrderId,
( SELECT SUM(OrderItem.Amount)
FROM OrderItem
WHERE OrderItem.OrderId = OrderId
) AS Total
FROM Order;
The subquery will be executed once for each row in Order. If you add something like WHERE Order.OrderId = 1, it is obviously not efficient to aggregate the entire OrderItem table, hash the result to only lookup one value, but if you are returning all orders then the inital cost of creating the hash table will make up for itself it not having to execute the subquery for every row in the Order table.
If you are selecting a lot of rows and feel the materialisation will be of benefit, you can simplifiy your JOIN query as follows:
SELECT Order.OrderId, SUM(OrderItem.Amount) AS Total, PaymentGrouped.Paid
FROM Order
INNER JOIN OrderItem
ON OrderItem.OrderID = Order.OrderID
INNER JOIN
( SELECT Payment.OrderId, SUM(Payment.Amount) AS Paid
FROM Payment
GROUP BY Payment.OrderId
) PaymentGrouped
ON PaymentGrouped.OrderId = Order.OrderId;
GROUP BY Order.OrderId, PaymentGrouped.Paid;
Then you only have one derived table.

What about something like this:
SELECT Order.OrderId, (
SELECT SUM(OrderItem.Amount)
FROM OrderItem as OrderItemGrouped
where
OrderItemGrouped.OrderId = Order.OrderId
), AS Total,
(
SELECT SUM(Payment.Amount)
FROM Payment as PaymentGrouped
where
PaymentGrouped.OrderId = Order.OrderId
) as Paid
FROM Order
PS: You win again #Gordon xD

Select o.orderid, i.total, s.paid
From orders o
Left join (select orderid, sum(amount)
From orderitem) i
On i.orderid = o.orderid
Ieft join (select orderid, sum(amount)
From payments) s
On s.orderid = o.orderid

Related

SQL How to select original(distinct) values from table without using distinct, group by and over keywords?

Currently I'm studying and I received task to write query (join 4 tables: people, goods, orders and order details). So main table Order_details has two columns: Order_id and Good_id, in order to make possible to have many goods in one order (e.g. order with id = 1 has 3 rows in Order_details table but has different goods primary keys in each row).
So the problem is that I don't know any other possible methods(besides using group by, distinct or over()) to receive only one row of each order in Order_details table (like I would get by using for example Distinct keyword). I'm receiving completely same rows of each order (with same Order_id and Good_id) but i don't know how to get only one row of each order.
Here's my query(so basically i need to select sum of all goods in order but i don't think that it really matters in my problem) and scheme (if it'll help)
By the way I'm working with MYSQL.
SELECT
Order_details.Order_id,
Orders.Date, People.First_name,
People.Surname,
(
SELECT SUM(Goods.Price * Order_details.Quantity)
FROM Order_details, Goods
WHERE Order_details.Good_id = Goods.Good_id
AND Order_details.Order_id = Orders.Order_id
) AS Total_price
FROM Order_details, Goods, Orders, People
WHERE Order_details.Order_id = Orders.Order_id
AND Order_details.Good_id = Goods.Good_id
AND Order_details.Order_id = Orders.Order_id
AND Orders.Person_id = People.Person_id
ORDER BY Order_id ASC;
I have tried several methods, but still cant figure it out. Maybe somehow it is possible with subquery? But i'm not sure...
(I have tried method with UNION but it's not the key as well)
Remove the Goods and Order_details tables from the FROM clause and the corresponding conditions in the WHERE clause. You are not selecting anything from it anyway, except the SUM in the subselect. The Order_id can be selected from the Orders table. The join is just causing multiple rows per order.
Also please don't join with comma. Use the JOIN .. ON syntax. This makes it easier to see if the join conditions are reasonable.
SELECT
Orders.Order_id
Orders.Date,
People.First_name,
People.Surname,
(
SELECT SUM(Goods.Price * Order_details.Quantity)
FROM Order_details
JOIN Goods ON Order_details.Good_id = Goods.Good_id
WHERE Order_details.Order_id = Orders.Order_id
) AS Total_price
FROM Orders
JOIN People ON Orders.Person_id = People.Person_id
ORDER BY Orders.Order_id ASC;
you can use row_number() for this kind of thing it will assign a row number based on your criteria and then you can just pick the rows where the value is 1.
with t as (SELECT
Order_details.Order_id,
Orders.Date, People.First_name,
People.Surname,
row_number() over (
partition by order_id, good_id
order by order_id, good_id) rn,
(
SELECT SUM(Goods.Price * Order_details.Quantity)
FROM Order_details, Goods
WHERE Order_details.Good_id = Goods.Good_id
AND Order_details.Order_id = Orders.Order_id
) AS Total_price
FROM Order_details, Goods, Orders, People
WHERE Order_details.Order_id = Orders.Order_id
AND Order_details.Good_id = Goods.Good_id
AND Order_details.Order_id = Orders.Order_id
AND Orders.Person_id = People.Person_id
ORDER BY Order_id ASC)
select * from t where rn = 1

Mysql subquery slow

I have two tables that track sales:
orders order_line_items
------ ----------------
id id
customer_id order_id
created_datetime item_id
quantity
was_paid_for
An order can have many order_line_items. For some orders, all of the line items have been paid for. For other orders, they have not all been paid for.
I am trying to fetch a list of all the orders for a specific customer, and indicate if the order was fully paid for, or not. I have it working with this query:
SELECT o.id,
(SELECT count(*) from order_line_items WHERE order_id = o.id AND was_paid_for = 0) = 0 as isFullyPaid
FROM orders o
WHERE o.customer_id = 12345
However some customers have 1000+ orders and the query takes 70 seconds to run (this is a simplified example, the real one joins in five other tables).
Is indexes the only way to speed this up? Thanks.
You could try adding the following index to the order_line_items:
CREATE INDEX idx ON order_line_items(order_id, was_paid_for);
That being said, you also could try using the following join version of your query:
SELECT o.id, COALESCE(oli.cnt, 0) = 0 AS isFullyPaid
FROM orders o
LEFT JOIN
(
SELECT order_id, COUNT(*) AS cnt
FROM order_line_items
WHERE was_paid_for = 0
GROUP BY order_id
) oli
ON oli.order_id = o.id
WHERE o.customer_id = 12345;
The same index suggestion applied to the above join query.

More than 1 rows returned from SELECT inside SELECT

I'm trying to create a query to find what is the total amount owed by each customer to the company. It is the GROUP BY customerNumber in the sub query that is creating the problem.
SELECT customerName,
customers.customerNumber,
SUM(quantityOrdered * priceEach) - ( SELECT SUM(amount) AS MoneyPayed FROM payments GROUP BY customerNumber ) AS AmountOwed
FROM payments
INNER JOIN customers ON payments.customerNumber = customers.customerNumber
INNER JOIN orders ON customers.customerNumber = orders.customerNumber
INNER JOIN orderdetails ON orders.orderNumber = orderdetails.orderNumber
GROUP BY customerNumber;
The tables I'm trying to link are payments and orderdetails.
When I get rid of the GROUP BY I get results in negatives as the total SUM of amount is subtracted from each row of SUM(quantityOrdered * priceEach).
How can I change this so that I can return multiple rows from payments to subtract from SUM(quantityOrdered * priceEach) from the order details table.
Link to DB as StackOverflow doesn't allow me to post images
Thanks for help, sorry if format is bad, this is my first post.
You will need a couple of subqueries to meet your requirement. Let us break it down.
First, you need the total value of orders from each customer. You're very close to the correct query for that. It should be
SELECT orders.customerNumber,
SUM(orderdetails.quantityOrdered * orderdetails.priceEach) owed
FROM orders
JOIN orderdetails ON orders.orderNumber = orderdetails.orderNumber
GROUP BY orders.customerNumber
This subquery's result set gives customerNumber and owed, the amount owed. Notice that orders::orderdetails is a one::many relationship, so we're sure we're counting each detail just once, so the SUMs will be correct.
Next we need the amount paid by each customer. This subquery is fairly simple.
SELECT customerNumber,
SUM(amount) paid
FROM payments
GROUP BY customerNumber
Now for the operation you're missing in your question: we need to join these two subqueries to your customers table.
SELECT customers.customerName, customers.customerNumber
owed.owed - paid.paid balance
FROM customers
LEFT JOIN (
SELECT orders.customerNumber,
SUM(orderdetails.quantityOrdered * orderdetails.priceEach) owed
FROM orders
JOIN orderdetails ON orders.orderNumber = orderdetails.orderNumber
GROUP BY orders.customerNumber
) paid ON customers.customerNumber = paid.customerNumber
LEFT JOIN (
SELECT customerNumber,
SUM(amount) paid
FROM payments
GROUP BY customerNumber
) owed ON customers.customerNumber = owed.customerNumber
See how this works? We join a table and two subqueries. Each subquery has either zero or one rows for each row in the table, so we need not use SUMs or GROUP BY in the outer query.
There's only one complication left: what if a customer has never paid anything? Then the value of paid.paid will be NULL after the LEFT JOIN operation. That will force the value of owed - paid to be NULL. So we need more smarts in the SELECT statement to yield correct sums.
SELECT customers.customerName, customers.customerNumber
COALESCE(owed.owed,0) - COALESCE(paid.paid,0) balance
...
COALESCE(a,b) is equivalent to if a is not null then a else b.
Pro tip In queries or subqueries with JOIN operations, always mention table.column instead of just column. The next person to work on your query will thank you.

SQL retrieving filtered value in subquery

in this cust_id is a foreign key and ords returns the number of orders for every customers
SELECT cust_name, (
SELECT COUNT(*)
FROM Orders
WHERE Orders.cust_id = Customers.cust_id
) AS ords
FROM Customers
The output is correct but i want to filter it to retrieve only the customers with less than a given amount of orders, i don't know how to filter the subquery ords, i tried WHERE ords < 2 at the end of the code but it doesn't work and i've tried adding AND COUNT(*)<2 after the cust_id comparison but it doesn't work. I am using MySQL
Use the HAVING clause (and use a join instead of a subquery).....
SELECT Customers.cust_id, Customers.cust_name, COUNT(*) ords
FROM Orders, Customers
WHERE Orders.cust_id = Customers.cust_id
GROUP BY 1,2
HAVING COUNT(*)<2
If you want to include people with zero orders you change the join to an outer join.
There is no need for a correlated subquery here, because it calculates the value for each row which doesn't give a "good" performance. A better approach would be to use a regular query with joins, group by and having clause to apply your condition to groups.
Since your condition is to return only customers that have less than 2 orders, left join instead of inner join would be appropriate. It would return customers that have no orders as well (with 0 count).
select
cust_name, count(*)
from
customers c
left join orders o on c.cust_id = o.cust_id
group by cust_name
having count(*) < 2

SQL max of a count involving 3 tables

I have one table Customers with CustomerID and PhoneNumber, the second table is Orders that has CustomerId and OrderNumber and a third table OrderDetails that has OrderNumber, PriceOfOneUnit and UnitsOrdered. I need to find the PhoneNumber for the customer who placed the largest order (PriceOfOneUnit * UnitsOrdered). Doing a count(PriceOfOneUnit*UnitsOrdered) as A1 and then `Group By CustomerId Order By A1 DESC LIMIT 1 is evidently not working after joining the 3 tables. Can anyone help.
If we take you at your word, and what you want is the biggest single line-item rather than the largest order, you can find the largest line-item and then find the order to which it belongs and then the customer who placed that order. You can use a dummy aggregate function to pull back the order id from orderDetails.
EDIT:
OK, for someone just starting out, I think it can be clearer to think in terms of Venn diagrams and use what are called Inline Views and subqueries:
select customername, phone
from customer
inner join
(
select o.id, customerid from orders o
inner join
(
select od.orderid from orderdetail od
where (od.qty * od.itemprice) =
(
select max(od.qty * od.itemprice)
from orderdetail as od
)
) as biggestorder
on o.id = biggestorder.orderid
) as X
on customer.id = X.customerid
Each of the queries inside parentheses returns a set that can be joined/intersected with other sets.
Give this a try,
SELECT cus.CustomerId, cus.PhoneNumber
FROM Customers cus
INNER JOIN Orders a
ON cus.CustomerId = a.CustomerId
INNER JOIN OrderDetails b
On a.OrderNumber = b.OrderNumber
GROUP BY cus.CustomerId, cus.PhoneNumber
HAVING SUM(b.PriceOfOneUnit * b.UnitsOrdered) =
(
SELECT SUM(b.PriceOfOneUnit * b.UnitsOrdered) totalOrdersAmount
FROM Orders aa
INNER JOIN OrderDetails bb
On aa.OrderNumber = bb.OrderNumber
GROUP BY aa.CustomerId
ORDER BY totalOrdersAmount DESC
LIMIT 1
)