Double data amount when joining multiple tables - mysql

Link to all my database tables in excel format:
Database
This is the assignment I was tasked with:
Find total deposit, total withdraw, and total difference for each customer.
What I did was
select c.customerid, customername, sum(d.depositamount) as 'Total Deposit', sum(w.withdrawalamount) as 'Total Withdraw', sum(d.depositamount) - sum(w.withdrawalamount) as 'differences'
from customers c left outer join deposits d on c.customerid = d.customerid
left outer join withdrawals w on c.CustomerID = w.CustomerID
group by c.CustomerID
order by c.CustomerID;
The result
My problem is that the "Total Deposit" and "Total Withdraw" values are doubled. Because both columns are doubled, the difference is doubled as well. I know I could divide every column by 2 to work around this, but I would like to know the proper way to do it.
My question is how do I join multiple tables in such a way that the data will not be doubled?
(For example, "James Carlton Brokeridge" is supposed to have 450, 380, and 70 respectively.)

You are getting incorrect values because every time a customer makes more than one deposit or withdrawal, that causes the rows in the other (withdrawal/deposit) table to be replicated when the two tables are joined. To work around this, do the sums in subqueries:
select c.customerid,
c.customername,
d.totaldeposit as 'Total Deposit',
w.totalwithdrawal as 'Total Withdraw',
d.totaldeposit - w.totalwithdrawal as 'differences'
from customers c
left outer join (
select customerid, sum(depositamount) as totaldeposit
from deposits
group by customerid
) d on c.customerid = d.customerid
left outer join (
select customerid, sum(withdrawalamount) as totalwithdrawal
from withdrawals
group by customerid
) w on c.customerid = w.customerid
order by c.customerid
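To see the effect concretely, here is a minimal sketch that runs both queries against an in-memory SQLite database from Python. The rows are hypothetical sample data, not the asker's tables, and the COALESCE around the difference is an addition so that customers with no deposits or no withdrawals still get a number instead of NULL.

```python
import sqlite3

# Hypothetical sample data: one customer with two deposits and two withdrawals.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers   (customerid INTEGER PRIMARY KEY, customername TEXT);
    CREATE TABLE deposits    (customerid INTEGER, depositamount INTEGER);
    CREATE TABLE withdrawals (customerid INTEGER, withdrawalamount INTEGER);
    INSERT INTO customers   VALUES (1, 'Test Customer');
    INSERT INTO deposits    VALUES (1, 100), (1, 200);
    INSERT INTO withdrawals VALUES (1, 50), (1, 30);
""")

# Joining both child tables directly pairs every deposit with every withdrawal
# (2 x 2 = 4 rows), so each amount is counted twice.
naive = conn.execute("""
    SELECT SUM(d.depositamount), SUM(w.withdrawalamount)
    FROM customers c
    LEFT JOIN deposits d    ON c.customerid = d.customerid
    LEFT JOIN withdrawals w ON c.customerid = w.customerid
    GROUP BY c.customerid
""").fetchone()

# Aggregating each child table first leaves one row per customer on each side.
fixed = conn.execute("""
    SELECT d.totaldeposit, w.totalwithdrawal,
           COALESCE(d.totaldeposit, 0) - COALESCE(w.totalwithdrawal, 0)
    FROM customers c
    LEFT JOIN (SELECT customerid, SUM(depositamount) AS totaldeposit
               FROM deposits GROUP BY customerid) d ON c.customerid = d.customerid
    LEFT JOIN (SELECT customerid, SUM(withdrawalamount) AS totalwithdrawal
               FROM withdrawals GROUP BY customerid) w ON c.customerid = w.customerid
""").fetchone()

print(naive)  # (600, 160)
print(fixed)  # (300, 80, 220)
```

With two rows in each child table the naive totals happen to be exactly double; with three deposits and two withdrawals the deposits would be doubled and the withdrawals tripled, which is why dividing by 2 is not a general fix.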

Related

Inner Join for 3 tables with SUM of two columns in SQL Query?

I have the following three tables:
I have the following query to join the above three tables:
SELECT
customer.customer_id,
customer.name,
SUM(sales.total),
sales.created_at,
SUM(sales_payments.amount)
FROM
sales INNER JOIN customer ON customer.customer_id = sales.customer_id
INNER JOIN sales_payments ON sales.customer_id = sales_payments.customer_id
WHERE sales.created_at ='2020-04-03'
GROUP By customer.name
The result of the above query is given below.
The sum of sales.total is double the actual sum of the sales.total column, which has two rows. I need the actual SUM of that column, without the rows being counted twice. Thank you in advance for your help.
PROBLEM
The problem here is that there are consecutive inner joins and the number of rows fetched by the second inner join is not restricted. Because the join between the sales and sales_payments tables has no condition on sales_payments_id, one row in the sales table (for customer_id 2, in this case) is mapped to 2 rows in the sales_payments table. This causes the same sales values to be counted more than once.
In other words, the mapping for customer_id 2 between the 3 tables is 1:1:2 rather than 1:1:1.
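The 1:1:2 mapping can be made visible by counting the joined rows. The sketch below uses Python's sqlite3 module with hypothetical rows (one sale and two payments for customer 2); the amounts and the customer name are made up for illustration.

```python
import sqlite3

# Hypothetical rows: customer 2 has one sale and two payments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer       (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales          (customer_id INTEGER, total INTEGER, created_at TEXT);
    CREATE TABLE sales_payments (customer_id INTEGER, amount INTEGER);
    INSERT INTO customer       VALUES (2, 'Customer Two');
    INSERT INTO sales          VALUES (2, 500, '2020-04-03');
    INSERT INTO sales_payments VALUES (2, 200), (2, 300);
""")

# Same join shape as in the question, but counting rows next to the sum.
joined_rows, summed_total = conn.execute("""
    SELECT COUNT(*), SUM(sales.total)
    FROM sales
    INNER JOIN customer ON customer.customer_id = sales.customer_id
    INNER JOIN sales_payments ON sales.customer_id = sales_payments.customer_id
    WHERE sales.created_at = '2020-04-03'
""").fetchone()

print(joined_rows)   # 2: the single sale appears once per payment
print(summed_total)  # 1000, although the only sale is 500
```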
SOLUTION
Solution 1 : As mentioned by Gordon, you could first aggregate the amount values of the sales_payments table and then aggregate the values in sales table.
Solution 2 : Alternatively (IMHO a better approach), you could add a foreign key between the sales and sales_payments tables. For example, the sales_payments_id column of the sales_payments table can be introduced in the sales table as well. This would make the join between these tables straightforward and reduce overhead while querying data.
The query would then look like:
SELECT c.customer_id,
c.name,
SUM(s.total),
s.created_at,
SUM(sp.amount)
FROM customer c
INNER JOIN sales s
ON c.customer_id = s.customer_id
INNER JOIN sales_payments sp
ON c.customer_id = sp.customer_id
AND s.sales_payments_id = sp.sales_payments_id
WHERE s.created_at = '2020-04-03'
GROUP BY c.customer_id,
c.name,
s.created_at;
Hope that helps!
You have multiple rows for sales_payments and sales per customer. You need to pre-aggregate to get the right value:
SELECT c.customer_id, c.name, s.created_at, s.total, sp.amount
FROM customer c JOIN
(SELECT s.customer_id, s.created_at, SUM(s.total) as total
FROM sales s
WHERE s.created_at ='2020-04-03'
GROUP BY s.customer_id, s.created_at
) s
ON c.customer_id = s.customer_id JOIN
(SELECT sp.customer_id, SUM(sp.amount) as amount
FROM sales_payments sp
GROUP BY sp.customer_id
) sp
ON s.customer_id = sp.customer_id

MySQL SUM function in multiple joins

Hi, this is my case. I have these tables:
Customer {id,name}
Charges {id,amount,customer_id}
Taxes {id,amount,charge_id}
I want to SUM the amounts of charges and taxes, grouped by customer id. Here is my query:
SELECT SUM(ch.amount),SUM(t.amount)
FROM Customer c
LEFT JOIN Charges ch ON ch.customer_id = c.id
LEFT JOIN Taxes t ON t.charge_id = ch.id
GROUP BY c.id;
If a customer has 1 charge and that charge has 2 taxes, the SUM function counts the charge amount twice. For example, instead of showing me $10 it shows $20.
I know how to fix this with subqueries, but I want to know whether there is any way to get the correct value without them. What could I modify in the query above to fix that?
Thanks !
UPDATED ANSWER WITHOUT SUBQUERIES
-- Relies on the evaluation order of user variables, which MySQL does not guarantee.
SELECT
SUM(CASE WHEN @ch_id != ch.id
THEN ch.amount END) AS ch_amount,
SUM(t.amount) AS t_sum,
c.*,
@ch_id := ch.id
FROM
Customer c
LEFT JOIN charges ch ON c.id = ch.customer_id
LEFT JOIN taxes t ON ch.id = t.charge_id
GROUP BY c.id;
You want to know if you can do this without subqueries. No, you can't.
If a row in Charges has more than one corresponding row in Taxes, you can't simply join the tables without duplicating Charges rows. Then, as you have discovered, when you sum them up, you'll get multiple copies.
You need a way to get a virtual table (a subquery) with one row for each Charge.
SELECT ch.customer_id,
ch.amount amount,
tx.tax tax
FROM Charges ch
LEFT JOIN (
SELECT SUM(amount) tax,
charge_id
FROM Taxes
GROUP BY charge_id
) tx ON ch.id = tx.charge_id
You can then join that subquery to your Customer table to summarize sales by customer.
This is a pain because of the multiple hierarchies. I would suggest:
SELECT c.id, ch.charge_amount, ch.taxes_amount
FROM Customer c LEFT JOIN
(SELECT ch.customer_id, SUM(ch.amount) as charge_amount,
SUM(t.taxes_amount) as taxes_amount
FROM Charges ch LEFT JOIN
(SELECT t.charge_id, SUM(t.amount) as taxes_amount
FROM taxes t
GROUP BY t.charge_id
) t
ON t.charge_id = ch.id
GROUP BY ch.customer_id
) ch
ON ch.customer_id = c.id;
You are not going to be able to fix this without subqueries of one form or another, if there are multiple charges for a customer or multiple taxes on a charge.
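As a rough check of the $10 versus $20 case from the question, the following sketch runs the original join and a pre-aggregated variant in SQLite via Python. The data is hypothetical (one charge of 10 with two taxes of 1 and 2), and the alias tx and column tax are just illustrative names.

```python
import sqlite3

# Hypothetical data: one customer, one charge of 10, two taxes on that charge.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Charges  (id INTEGER PRIMARY KEY, amount INTEGER, customer_id INTEGER);
    CREATE TABLE Taxes    (id INTEGER PRIMARY KEY, amount INTEGER, charge_id INTEGER);
    INSERT INTO Customer VALUES (1, 'Test Customer');
    INSERT INTO Charges  VALUES (1, 10, 1);
    INSERT INTO Taxes    VALUES (1, 1, 1), (2, 2, 1);
""")

# The charge row is repeated once per tax row, so its amount is summed twice.
naive = conn.execute("""
    SELECT SUM(ch.amount), SUM(t.amount)
    FROM Customer c
    LEFT JOIN Charges ch ON ch.customer_id = c.id
    LEFT JOIN Taxes t ON t.charge_id = ch.id
    GROUP BY c.id
""").fetchone()

# Collapsing Taxes to one row per charge first keeps each charge row unique.
fixed = conn.execute("""
    SELECT SUM(ch.amount), SUM(tx.tax)
    FROM Customer c
    LEFT JOIN Charges ch ON ch.customer_id = c.id
    LEFT JOIN (SELECT charge_id, SUM(amount) AS tax
               FROM Taxes GROUP BY charge_id) tx ON tx.charge_id = ch.id
    GROUP BY c.id
""").fetchone()

print(naive)  # (20, 3)
print(fixed)  # (10, 3)
```

Note that the tax total is correct in both versions; only the parent table's amounts are inflated, because only the parent rows are repeated by the join.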

Mysql Query: LEFT JOIN List all orders and "balance due" for a customer, if no orders, just list customer. 3 Tables

I have a customers table, orders table, and payments table.
From these I am trying to get one table that has for each record: customer#, order#, Balance due. But if a customer has no order then I just want to list the customer# with NULL values for the other fields.
Using a LEFT JOIN I was able to do this with just the customer# and order# but I cannot figure out how to get the balance due in there as well. I am fairly new to Mysql and tried to search the answer but was unable.
Here is what works for just the customer# and order#:
SELECT
customers.cust_num,
orders.order_id
FROM customers
LEFT JOIN orders ON customers.cust_num = orders.cust_num
But I am trying to incorporate "orders.invoice_amount - SUM(payments.amount) AS balance_due" as another column where the payments table is related to the orders table by a field called "order_id" in both.
Perhaps something like:
SELECT orders.invoice_amount - SUM(payments.amount) AS balance_due FROM payments, orders WHERE payments.order_id = orders.order_id
Any idea of how I could go about doing that or pointers in the right direction?
Do an additional LEFT JOIN, but to a subquery on payments grouped by order. If an order has a matching record in the pre-aggregated result, the total paid is subtracted; otherwise it is treated as 0. I also switched to table aliases for readability, which helps when table names get long or when you have to join to the same table more than once in a query.
SELECT
c.cust_num,
coalesce( o.order_id, 0 ) as Order_ID,
coalesce( o.invoice_amount, 0 ) as InvoiceAmount,
coalesce( PrePaid.TotalPaid, 0 ) as TotalPaid,
coalesce( o.invoice_amount - coalesce( PrePaid.TotalPaid, 0 ), 0) as BalanceDue
FROM
customers c
LEFT JOIN orders o
ON c.cust_num = o.cust_num
LEFT JOIN
( select
p.order_id,
sum( p.amount ) as totalPaid
from
payments p
group by
p.order_id ) as PrePaid
on o.order_id = PrePaid.order_id
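Here is a small sketch of the same idea, run against SQLite from Python with hypothetical rows. It differs from the query above in one respect: it only wraps the paid total in COALESCE, so a customer without orders comes back with NULL order and balance columns, which is what the question asked for. The alias p and column total_paid are illustrative names.

```python
import sqlite3

# Hypothetical data: customer 1 has two orders (one partly paid), customer 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_num INTEGER PRIMARY KEY);
    CREATE TABLE orders    (order_id INTEGER PRIMARY KEY, cust_num INTEGER, invoice_amount INTEGER);
    CREATE TABLE payments  (order_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders    VALUES (10, 1, 100), (11, 1, 40);
    INSERT INTO payments  VALUES (10, 30), (10, 20);
""")

# Payments are summed per order before the join, so each order stays one row.
rows = conn.execute("""
    SELECT c.cust_num,
           o.order_id,
           o.invoice_amount - COALESCE(p.total_paid, 0) AS balance_due
    FROM customers c
    LEFT JOIN orders o ON c.cust_num = o.cust_num
    LEFT JOIN (SELECT order_id, SUM(amount) AS total_paid
               FROM payments GROUP BY order_id) p ON o.order_id = p.order_id
    ORDER BY c.cust_num, o.order_id
""").fetchall()

for row in rows:
    print(row)
# (1, 10, 50)
# (1, 11, 40)
# (2, None, None)
```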
You can try something like:
SELECT
customers.cust_num,
orders.order_id,
orders.invoice_amount - SUM(P.amount)
FROM customers
LEFT JOIN orders ON customers.cust_num = orders.cust_num
INNER JOIN payments P ON P.order_id = orders.order_id
GROUP BY customers.cust_num, orders.order_id, orders.invoice_amount

Error 1111 (HY000) - Invalid use of group function

I'm getting a problem when trying to run this query:
Select
c.cname as custName,
count(distinct o.orderID) as No_of_orders,
avg(count(distinct o.orderID)) as avg_order_amt
From Customer c
Inner Join Order_ o
On o.customerID = c.customerID
Group by cname;
This is an error message: #1111 (HY000) - Invalid use of group function
I just want to select each customer, find how many orders each customer has, and average the total number of orders for each customer. I think it might have a problem with too many aggregates in query.
The issue is that you need to have two separate groupings if you want to calculate the average over a count, so this expression isn't valid:
avg(count(distinct o.orderID))
Now it's hard to understand what exactly you mean, but it sounds as if you just want to use avg(o.amount) instead.
[edit] I see your addition now, so while the error is still the same, the solution will be slightly more complex. The last value you need, the average number of orders per customer, is not a value to calculate per customer. You'd need analytic functions to do that, but that might be quite tricky in MySQL. I'd recommend writing a separate query for that; otherwise you would have a very complex query that returns the same number for each row anyway.
select c.cname, o.customerID, count(*), avg(order_total)
from Order_ o join Customer c using (customerID)
group by 1,2
This will calculate the number of orders and average order total (substitute the real column name for order_total) for each customer.
To get "how many orders each customer has" and "average the total number of orders":
SELECT
c1.cname AS custName,
c1.No_of_orders,
c2.avg_order_amt
FROM (
SELECT
c.customerID,
c.cname,
COUNT(DISTINCT o.orderID) AS No_of_orders
FROM
Customer c
JOIN Order_ o ON o.customerID = c.customerID
GROUP BY c.customerID, c.cname
) c1
CROSS JOIN (SELECT AVG(No_of_orders) AS avg_order_amt FROM (
SELECT
c.customerID,
COUNT(DISTINCT o.orderID) AS No_of_orders
FROM
Customer c
JOIN Order_ o ON o.customerID = c.customerID
GROUP BY c.customerID
) counts) c2
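The two-level aggregation can be tried out with a minimal sketch like the one below, using SQLite from Python. The customers, the order rows, and the aliases per_cust, counts, and overall are hypothetical; the point is only that the inner query produces one count per customer and the outer AVG runs over those counts.

```python
import sqlite3

# Hypothetical data: Ann has three orders, Bob has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (customerID INTEGER PRIMARY KEY, cname TEXT);
    CREATE TABLE Order_   (orderID INTEGER PRIMARY KEY, customerID INTEGER);
    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Order_   VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

# First level: COUNT per customer. Second level: AVG over those counts.
# The single-row average is cross-joined onto every per-customer row.
rows = conn.execute("""
    SELECT per_cust.cname, per_cust.no_of_orders, overall.avg_orders
    FROM (SELECT c.customerID, c.cname, COUNT(o.orderID) AS no_of_orders
          FROM Customer c
          JOIN Order_ o ON o.customerID = c.customerID
          GROUP BY c.customerID, c.cname) per_cust
    CROSS JOIN (SELECT AVG(n) AS avg_orders
                FROM (SELECT COUNT(*) AS n
                      FROM Order_ GROUP BY customerID) counts) overall
    ORDER BY per_cust.cname
""").fetchall()

print(rows)  # [('Ann', 3, 2.0), ('Bob', 1, 2.0)]
```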

MySQL query with JOINS and GROUP BY

I'm building a MySQL query but I can't seem to get it right.
I have four tables:
- customers
- orders
- sales_rates
- purchase_rates
There is a 1:n relation 'customernr' between customers and orders.
There is a 1:n relation 'ordernr' between orders and sales_rates.
There is a 1:n relation 'ordernr' between orders and purchase_rates.
What I would like to do is produce an output of all customers with their total purchase and sales amounts.
So far I have the following query.
SELECT c.customernr, c.customer_name, SUM(sr.sales_price) AS sales_price, SUM(pr.purchase_price) AS purchase_price
FROM orders o, customers c, sales_rates sr, purchase_rates pr
WHERE o.customernr = c.customernr
AND o.ordernr = sr.ordernr
AND o.ordernr = pr.ordernr
GROUP BY c.customer_name
The result of the sales_price and purchase_price is far too high. I seem to be getting double counts. What am I doing wrong? Is it possible to perform this in a single query?
Thanks for your response!
It seems that the problem is that when you join the orders table to the sales_rates and purchase_rates tables, you get the Cartesian product of those two tables for each order, i.e. each row in one of them is repeated once for each corresponding row in the other. The following query should solve this by summing the rates for each order before joining sales_rates and purchase_rates to the other tables:
SELECT c.customernr, c.customer_name,
SUM(sr.sales_price) AS sales_price,
SUM(pr.purchase_price) AS purchase_price
FROM customers c
INNER JOIN orders o
ON o.customernr = c.customernr
LEFT JOIN (SELECT ordernr, SUM(sales_price) AS sales_price
FROM sales_rates
GROUP BY ordernr) sr
ON sr.ordernr = o.ordernr
LEFT JOIN (SELECT ordernr, SUM(purchase_price) AS purchase_price
FROM purchase_rates
GROUP BY ordernr) pr
ON pr.ordernr = o.ordernr
GROUP BY c.customernr, c.customer_name;
It doesn't look like you are grouping by the customer key. Try grouping by c.customernr, or whichever column identifies the customer.