MySQL: How to sum distinct rows in complex joined query

I have a MySQL question I cannot solve myself (for the first time).
I have a query-with-parameters database plus PHP program that, together, generate extensive MySQL queries to run.
The problem is actually a simple one: that of correct summation. I need to SUM distinct rows (not values) within a complex, multi-joined query, and I cannot get it to work.
Do not ask why I work with the data structure below - I am working with data that is supplied to me and it needs to be this way. (The tables represent existing invoices.)
I will try to reproduce the situation very simplified here.
TABLE INVOICE
=============
Inv.Nr (ID)   Other Data
------------------------
#1            Stuff
#2            Stuff
#3            More Stuff

TABLE INVOICE LINE
==================
ID   Inv.Nr   QUANTITY   ArticleID   UNIT PRICE
-----------------------------------------------
1    #1       1          5           € 2.50
2    #1       1          109         € 4.00
3    #2       4          77          € 5.00
4    #2       10         91          € 6.00

TABLE INVOICE LINE VAT
======================
ID   LINE-ID   AMOUNT    VATP   VAT
--------------------------------------
1    1         € 2.00    25%    € 0.50
2    2         € 2.00    25%    € 0.50
3    2         € 1.42    6%     € 0.08
4    3         €18.87    6%     € 1.23
5    4         €16.00    25%    € 4.00
6    4         €37.74    6%     € 2.26
As you can see, some articles have two VAT rates, because they consist of several elements that carry different VAT rates (e.g. a book with a CD).
Now, the queries are very long: many more tables are joined, and they can have dynamic WHERE and GROUP BY clauses. So a query might look somewhat like this (again, much simplified):
SELECT `Inv.Nr`, ArticleID, SUM(Quantity), SUM(Amount), SUM(VAT)
FROM ((((`Invoice` INNER JOIN `Invoice Line`
ON `Invoice`.`Inv.Nr`=`Invoice Line`.`Inv.Nr`)
INNER JOIN `Invoice Line VAT`
ON `Invoice Line`.ID = `Invoice Line VAT`.`Line-ID`)
INNER JOIN `More Stuff`
ON .... )
INNER JOIN ....
ON ..... )
WHERE ....
GROUP BY .....
HAVING .....
The INNER JOINs defined by ... are many to 1, so Invoice Line VAT is on the many-side of both its JOIN relations.
The WHERE, GROUP BY and HAVING are semi-dynamically created in PHP code.
My problem is that I cannot get a proper SUM(Amount) and SUM(Quantity) at the same time, since the Quantity is added multiple times if there are multiple VAT rates for one invoice line.
SUM(DISTINCT Quantity) obviously doesn't work, since I need distinct rows, not values.
I cannot really create a subquery that either calculates the number of VAT rates (and divides the SUM(Quantity)), or calculates the Amount, since the subquery needs the same WHERE/HAVING parameters as the main query to work properly, and those are semi-dynamic (the queries are in a database and contain parameters that are filled in following the user's commands). Well, to be fair, I could do it, but it would leave the query-database and the php software extremely complicated, and I don't want to use a very complex solution for such a very simple problem, especially since someone else will have to maintain it in the future.
So how do I:
SUM the quantity only on distinct rows, or
COUNT the number of VAT rates per line, given the WHERE/HAVING (so without a subquery)?
I could add extra fields to the tables to help with this problem, but that possibility didn't help me - yet. For instance: storing the number of VAT rates doesn't help, since in the WHERE there may be a selection on VAT rate.
I hope it is something VERY simple that I overlooked, but I have been searching for hours now to no avail...
If anyone can help me that would be great! Thanks in advance!
EDIT: I found a solution, but I am not very pleased with it. I have to split up the WHERE, and SUM SUMs and repeat columns... It is UGLY and badly maintainable.
It is as follows:
SELECT `Inv.Nr`, ArticleID, SUM(Quantity), SUM(Amount), SUM(VAT)
FROM ((`Invoice` INNER JOIN `Invoice Line`
ON `Invoice`.`Inv.Nr`=`Invoice Line`.`Inv.Nr`)
INNER JOIN
(SELECT SUM(Amount) AS Amount, SUM(VAT) AS VAT, `Line-ID`
FROM ((`Invoice Line VAT`
INNER JOIN `More Stuff`
ON .... )
INNER JOIN ....
ON ..... )
WHERE some-where-stuff
GROUP BY `Line-ID`) x
ON `Invoice Line`.ID = x.`Line-ID`)
WHERE other-where-stuff
GROUP BY .....
HAVING .....
I hope someone got a more elegant, simpler solution!

In an update to the question, I answered the question myself. I said that I hoped for a less ugly and badly maintainable solution than:
SELECT `Inv.Nr`, ArticleID, SUM(Quantity), SUM(Amount), SUM(VAT)
FROM ((`Invoice` INNER JOIN `Invoice Line`
ON `Invoice`.`Inv.Nr`=`Invoice Line`.`Inv.Nr`)
INNER JOIN
(SELECT SUM(Amount) AS Amount, SUM(VAT) AS VAT, `Line-ID`
FROM ((`Invoice Line VAT`
INNER JOIN `More Stuff`
ON .... )
INNER JOIN ....
ON ..... )
WHERE some-where-stuff
GROUP BY `Line-ID`) x
ON `Invoice Line`.ID = x.`Line-ID`)
WHERE other-where-stuff
GROUP BY .....
HAVING .....
It turns out that, now that I am working with this solution and rephrasing all my queries based on it, it is not so humongous and ugly after all. It works quite well, and much better than the other solutions and workarounds I have tried. Since I believe there is no other solution than the one I wrote, I am closing this question by confirming that the query cited above is the right answer.
It turns out that using the correct SQL code instead of workarounds is the right way to go, even when it looks too complicated at first. And since there is nothing like SUM(DISTINCT ...) that works on distinct records instead of values, the code above is the correct code in this case.
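To see why pre-aggregating the VAT rows helps, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The table and column names (invoice_line, invoice_line_vat, amount_cents, vat_cents) are simplified assumptions, not the real schema, and amounts are stored in cents to avoid rounding noise. The data is the sample data from the question.

```python
import sqlite3

# In-memory database; SQLite stands in for MySQL here (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice_line (
    id INTEGER PRIMARY KEY, inv_nr INTEGER, quantity INTEGER, article_id INTEGER);
CREATE TABLE invoice_line_vat (
    id INTEGER PRIMARY KEY, line_id INTEGER, amount_cents INTEGER, vat_cents INTEGER);
INSERT INTO invoice_line VALUES (1,1,1,5),(2,1,1,109),(3,2,4,77),(4,2,10,91);
INSERT INTO invoice_line_vat VALUES
    (1,1,200,50),(2,2,200,50),(3,2,142,8),
    (4,3,1887,123),(5,4,1600,400),(6,4,3774,226);
""")

# Naive join: a line with two VAT rows appears twice,
# so its quantity is counted twice.
naive = conn.execute("""
SELECT l.inv_nr, SUM(l.quantity), SUM(v.amount_cents)
FROM invoice_line AS l
JOIN invoice_line_vat AS v ON v.line_id = l.id
GROUP BY l.inv_nr
ORDER BY l.inv_nr
""").fetchall()

# Pre-aggregated join: the derived table x has exactly one row per line,
# so the join no longer multiplies the invoice lines.
fixed = conn.execute("""
SELECT l.inv_nr, SUM(l.quantity), SUM(x.amount_cents)
FROM invoice_line AS l
JOIN (SELECT line_id,
             SUM(amount_cents) AS amount_cents,
             SUM(vat_cents)    AS vat_cents
      FROM invoice_line_vat
      GROUP BY line_id) AS x ON x.line_id = l.id
GROUP BY l.inv_nr
ORDER BY l.inv_nr
""").fetchall()

print("naive:", naive)
print("fixed:", fixed)
```

With this data the naive query reports quantities 3 and 24 for the two invoices, while the pre-aggregated query reports 2 and 14; the amount totals are identical in both.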

Related

Left Join and Sum

I have two tables, one that lists grants/loans and one that lists individual expenditures. They share an ID column as each expenditure is assigned to a specific grant or loan. I'm trying to use LEFT JOIN to sum the expenditures for all the loans combined, but not the grants.
Here's where I'm at:
SELECT SUM(expenses.total_amt) AS total
FROM expenses WHERE loans_grants.grant_loan_type = 'Loan'
LEFT JOIN loans_grants
ON expenses.grant_loan_id = loans_grants.internal_id;
Any tips much appreciated!
Edit: thanks all, and apologies for the half baked question, it was late and I was in the weeds.
Here's the basic structures:
expenses:
expenses table structure
loans_grants:
loans_grants table structure
I've updated the code based on @jwood74's answer to this:
SELECT l.internal_id, SUM(e.total_amt) amount
FROM loans_grants l
LEFT JOIN expenses e ON e.grant_loan_id = l.internal_id
WHERE grant_loan_type = 'Loan'
group by l.internal_id
which produces this:
internal id   amount
1             3234
4             null
5             7625
7             null
9             null
Please excuse my noviceness, but I'm trying to sum up all expenses for loans, so I'd like to return 3234 + 7625, rather than summing expenses for each loan separately. Thanks for your help!
If you are looking for a SINGLE ROW RETURNED, you do not need to do a group by anything... just the SUM() of what you are looking for.
Second, do not post pictures of your sample data and table structures. Edit your original post and type the values in, even if you copy/paste the data and format it for readability (via Ctrl+K, or the curly brackets {} icon above post editing header area).
In this case, your tables
Loan_Grants table
Internal_id   grant_loan_type
1             Loan
2             Grant
3             Grant
4             Loan
5             Loan
6             Grant
7             Loan
8             Grant
9             Loan

Expenses Table
total_amt   grant_loan_id
2000        1
245         5
4500        5
2200        5
445         5
185         5
1234        1
50          5
Starting with your Loan_Grants table filtered on just your 'Loan' records
select
      sum( e.total_amt ) totalExpenses
   from
      loans_grants lg
         JOIN expenses e
            on lg.internal_id = e.grant_loan_id
   where
      lg.grant_loan_type = 'Loan'
You don't want a left-join unless you explicitly want to see ALL "Loan" entries, even if they have no expenses recorded yet. A regular (inner) JOIN means there MUST be a matching record in the expenses table. Again, it depends on your needs: if you have 10,000 loans and only 247 of them have expenses, do you want to see all 10,000, or just the 247 and their totals? Since you are summarizing to a single return record, JOIN is your best choice here.
For future, ALWAYS try to apply a table.column or alias.column to all your fields so anyone assisting does not have to guess which table the column comes from.
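As a quick check of the single-row total, here is a minimal sketch that loads the sample rows typed out above into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for the asker's database, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loans_grants (internal_id INTEGER PRIMARY KEY, grant_loan_type TEXT);
CREATE TABLE expenses (total_amt INTEGER, grant_loan_id INTEGER);
INSERT INTO loans_grants VALUES
    (1,'Loan'),(2,'Grant'),(3,'Grant'),(4,'Loan'),(5,'Loan'),
    (6,'Grant'),(7,'Loan'),(8,'Grant'),(9,'Loan');
INSERT INTO expenses VALUES
    (2000,1),(245,5),(4500,5),(2200,5),(445,5),(185,5),(1234,1),(50,5);
""")

# No GROUP BY: one row with the total across all loans.
total = conn.execute("""
SELECT SUM(e.total_amt)
FROM loans_grants AS lg
JOIN expenses AS e ON lg.internal_id = e.grant_loan_id
WHERE lg.grant_loan_type = 'Loan'
""").fetchone()[0]

# For comparison: LEFT JOIN plus GROUP BY gives one row per loan,
# with NULL (None) for loans that have no expenses.
per_loan = conn.execute("""
SELECT lg.internal_id, SUM(e.total_amt)
FROM loans_grants AS lg
LEFT JOIN expenses AS e ON lg.internal_id = e.grant_loan_id
WHERE lg.grant_loan_type = 'Loan'
GROUP BY lg.internal_id
ORDER BY lg.internal_id
""").fetchall()

print(total)
print(per_loan)
```

The single total is 3234 + 7625 = 10859, which is what the asker wanted instead of the per-loan rows.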
Without knowing the exact format of the two tables, it's a bit hard. But here would be the general idea-
select
l.id,
sum(e.amount) amount
from loans_grants l
left join expenses e on e.grant_loan_id = l.internal_id
where grant_loan_type = 'Loan'
group by l.id

How to query scalable prices in MySQL

Dear stack overflow community,
This is my first post so please bear with me :)
I need to solve a SQL problem for a friend of mine.
He is running a web shop and wants to create a finance report.
The application he is using provides such functionality through an interface where MySQL queries can be executed.
I already created most of the report with my (limited) SQL knowledge, however I am struggling to solve the last problem.
The goal of the report is to UNION and JOIN several tables to get an overview of all commissions, invoices and proposals with their respective articles and prices.
So what I did so far:
I did a union of commissions, invoices and proposals (let's call them receipts) and joined them with articles and prices.
That worked very well.
However, here is my problem:
An article could have multiple prices depending on the date of the respective receipt.
So I end up with more rows in my result than there should be.
There is a "valid_until" field within the prices table, which I have to use for the filter ... but how?
Example:
receipt_id   receipt_date   article_id   article_price   valid_until   price_id
209986-1     2020-09-10     2925         13              2020-12-06    1
209986-1     2020-09-10     2931         13              2020-09-09    2
209986-1     2020-09-10     2937         12,6            2020-09-12    3
209986-1     2020-09-10     2980         12,32           0000-00-00    4
In this case, only price_id 3 is valid as the receipt_date is "2020-09-10".
My Query (with limited SQL knowledge):
SELECT *
FROM (SELECT * FROM commissions UNION ALL SELECT * FROM invoices UNION ALL SELECT * FROM proposals) AS receipt
LEFT JOIN article ON receipt.article = article.id
LEFT JOIN prices ON article.id = prices.artikel
WHERE receipt.date <= IF(prices.valid_until = '0000-00-00', Date('3000-01-01'), prices.valid_until)
With that query I still get 3 results (price_id 4, 3 and 1).
I managed to identify the valid price using DATEDIFF(), ORDER BY and LIMIT, however MySQL does not allow LIMIT in sub-queries :(
Any help would be much appreciated.
KR,
Wlad

Total amount of sales done for each product using SQL

Here is the structure of 1st Table called Product.
PRODID   PDESC   PRICE   CATEGORY   DISCOUNT
101      BALL    10      SPORTS     5
102      SHIRT   20      APPAREL    10
Here is the structure of 2nd table called SaleDetail.
SALEID   PRODID   QUANTITY
1001     101      5
1001     101      2
1002     102      10
1002     102      5
I am trying to get the total sales amount for each product by joining the 2 tables. Here is the SQL I tried, but it's not giving the correct result.
select a.prodid,
(sum((price - discount))),
sum(quantity),
(sum((price - discount))) * sum(quantity)
from product a
join saledetail b on a.prodid = b.prodid
group by a.prodid
2nd column of the query is giving incorrect final price. Please help me correct this SQL.
Please find an indicative answer to your question in the fiddle.
A problem stems from the aggregation of the difference of price. In case that the same product has two different prices, then these prices would be aggregated to one.
Moreover, you multiply the sums of the prices and quantities, while you need to perform the calculation on every row. Look at the answer by @DanteTheSmith.
You might consider using the SaleDetail table on the left side of your query.
SELECT SD.PRODID,
P.Price-P.Discount AS Final_Price,
SUM(SD.QUANTITY) AS Amount_Sold,
SUM((P.Price-P.Discount)*SD.QUANTITY) AS Sales_Amount
FROM SaleDetail AS SD
JOIN Product AS P
ON SD.PRODID = P.PRODID
GROUP BY SD.PRODID, P.Price-P.Discount
It would help if you built the example in SQL fiddle or gave the creates for the tables, but if I have to guess your problem is:
(sum((price - discount))) * sum(quantity)
needs to be:
sum((price - discount) * quantity)
(price - discount) * quantity is the function you wanna apply PER ROW of the joined table then you wanna add all those up with SUM() when grouping by prodid.
Furthermore, you can notice that (price - discount) needs to be done ONLY ONCE PER ROW so a quicker version would be to do:
(price-discount) * sum(quantity)
That would give you the total money earned for that product across all the sales you made, and I am guessing this is what you want?
I just noticed you have a problem with the 2nd column, dunno if that has been the question all along:
(sum((price - discount)))
Why are you summing? Do you want the money earned per product per unit of the product? Well guess what, your price is the same all the time, same as your discount so you can simply go with:
(price-discount) as PPP
NOTE: This assumes the discount is a numeric amount (not a percentage) and applies to all your sales, and that the price never changes, which is not realistic.
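The difference between multiplying two sums and summing the per-row product can be checked with a minimal sketch. It uses Python's sqlite3 module as a stand-in for the asker's database and the sample rows from the question; the lowercase table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    prodid INTEGER PRIMARY KEY, pdesc TEXT, price INTEGER,
    category TEXT, discount INTEGER);
CREATE TABLE saledetail (saleid INTEGER, prodid INTEGER, quantity INTEGER);
INSERT INTO product VALUES (101,'BALL',10,'SPORTS',5),(102,'SHIRT',20,'APPAREL',10);
INSERT INTO saledetail VALUES (1001,101,5),(1001,101,2),(1002,102,10),(1002,102,5);
""")

# Original attempt: the unit price is summed once per joined sale row,
# then multiplied by the summed quantity, which inflates the result.
wrong = conn.execute("""
SELECT p.prodid, SUM(p.price - p.discount) * SUM(s.quantity)
FROM product AS p
JOIN saledetail AS s ON p.prodid = s.prodid
GROUP BY p.prodid
ORDER BY p.prodid
""").fetchall()

# Fix: compute (price - discount) * quantity per row, then sum.
right = conn.execute("""
SELECT p.prodid, SUM((p.price - p.discount) * s.quantity)
FROM product AS p
JOIN saledetail AS s ON p.prodid = s.prodid
GROUP BY p.prodid
ORDER BY p.prodid
""").fetchall()

print("wrong:", wrong)
print("right:", right)
```

For product 101 the original expression yields (5 + 5) * 7 = 70, while the per-row version yields 5 * 5 + 5 * 2 = 35.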

Moving average query MS Access

I am trying to calculate the moving average of my data. I have googled and found many examples on this site and others but am still stumped. I need to calculate the average of the previous 5 flows for the selected record, for the specific product.
My Table looks like the following:
TMDT                     Prod   Flow
8/21/2017 12:01:00 AM    A      100
8/20/2017 11:30:45 PM    A      150
8/20/2017 10:00:15 PM    A      200
8/19/2017 5:00:00 AM     B      600
8/17/2017 12:00:00 AM    A      300
8/16/2017 11:00:00 AM    A      200
8/15/2017 10:00:31 AM    A      50
I have been trying the following query:
SELECT b.TMDT, b.Flow, (SELECT AVG(Flow) as MovingAVG
FROM(SELECT TOP 5 *
FROM [mytable] a
WHERE Prod="A" AND [a.TMDT]< b.TMDT
ORDER BY a.TMDT DESC))
FROM mytable AS b;
When I try to run this query I get an input prompt for b.TMDT. Why is b.TMDT not being pulled from mytable?
Should I be using a different method altogether to calculate my moving averages?
I would like to add that I started with another method that works but is extremely slow. It runs fast enough for tables with 100 records or less. However, if the table has more than 100 records it feels like the query comes to a screeching halt.
Original method below.
I created two queries for each product code (There are 15 products): Q_ProdA_Rank and Q_ProdA_MovAvg
Q_ProdA_Rank (T_ProdA is a table with Product A's information):
SELECT a.TMDT, a.Flow, (Select count(*) from [T_ProdA]
where TMDT<=a.TMDT) AS Rank
FROM [T_ProdA] AS a
ORDER BY a.TMDT DESC;
Q_ProdA_MovAvg
SELECT b.TMDT, b.Flow, Round((Select sum(Flow) from [Q_PRodA_Rank] where
Rank between b.Rank-1 and (b.Rank-5))/IIf([Rank]<5,Rank-1,5),0) AS
MovingAvg
FROM [Q_ProdA_Rank] AS b;
The problem is that you're using a nested subquery, and as far as I know (can't find the right site for the documentation at the moment), variable scope in subqueries is limited to the direct parent of the subquery. This means that for your nested query, b.TMDT is outside of the variable scope.
Edit: As this is an interesting problem, and a properly-asked question, here is the full SQL answer. It's somewhat more complex than your attempt, but should run more efficiently.
It contains a nested subquery that first lists the 5 previous flows per TMDT and Prod, then averages them, and then joins the result back to the actual query.
SELECT A.TMDT, A.Prod, B.MovingAverage
FROM MyTable AS A LEFT JOIN (
SELECT JoinKeys.TMDT, JoinKeys.Prod, Avg(Top5.Flow) As MovingAverage
FROM (
SELECT JoinKeys.TMDT, JoinKeys.Prod, Top5.Flow
FROM MyTable As JoinKeys INNER JOIN MyTable AS Top5 ON JoinKeys.Prod = Top5.Prod
WHERE Top5.TMDT In (
SELECT TOP 5 A.TMDT FROM MyTable As A WHERE JoinKeys.Prod = A.Prod AND A.TMDT < JoinKeys.TMDT ORDER BY A.TMDT DESC
)
)
GROUP BY JoinKeys.TMDT, JoinKeys.Prod
) AS B
ON A.Prod = B.JoinKeys.Prod AND A.TMDT = B.JoinKeys.TMDT
While in my previous version I advocated a VBA approach, this is probably more efficient, only more difficult to write and adjust.
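For readers who want to check the intended numbers, here is a minimal sketch of the same "average of the previous 5 flows per product" logic. Note the assumptions: it runs on SQLite through Python's sqlite3 module rather than on Access, it uses LIMIT instead of TOP, and it stores TMDT as ISO-formatted text so that string comparison matches chronological order. Unlike Access as described above, SQLite lets the innermost subquery refer to the outer alias b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (TMDT TEXT, Prod TEXT, Flow INTEGER);
INSERT INTO mytable VALUES
    ('2017-08-21 00:01:00','A',100),
    ('2017-08-20 23:30:45','A',150),
    ('2017-08-20 22:00:15','A',200),
    ('2017-08-19 05:00:00','B',600),
    ('2017-08-17 00:00:00','A',300),
    ('2017-08-16 11:00:00','A',200),
    ('2017-08-15 10:00:31','A',50);
""")

# For each row b: average the Flow of the (at most) 5 most recent
# earlier rows of the same product. Rows with no predecessor get NULL.
rows = conn.execute("""
SELECT b.TMDT, b.Prod, b.Flow,
       (SELECT AVG(a.Flow)
          FROM mytable AS a
         WHERE a.Prod = b.Prod
           AND a.TMDT IN (SELECT c.TMDT
                            FROM mytable AS c
                           WHERE c.Prod = b.Prod AND c.TMDT < b.TMDT
                           ORDER BY c.TMDT DESC
                           LIMIT 5)) AS MovingAvg
  FROM mytable AS b
 ORDER BY b.TMDT DESC
""").fetchall()

# Map each timestamp to its moving average for easy lookup.
moving_avg = {tmdt: avg for tmdt, _prod, _flow, avg in rows}
for row in rows:
    print(row)
```

For the newest product A row, the five previous flows are 150, 200, 300, 200 and 50, so the moving average is 180.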

MySQL summing totals by state by year

I have been playing around with this for what seems like hours and I can't get the results I want. Here is the query I am having trouble with:
SELECT year.year, dstate,
(SELECT sum(amount) FROM gift
WHERE year.year = gift.year
AND gift.donorno = donor.donorno)
FROM donor, gift, year
WHERE year.year = gift.year
AND gift.donorno = donor.donorno;
This seems redundant. Anyway, I am trying to display the total donations (gift.amount) for each state by year.
ex.
1999 GA 500 (donorno 1 from GA donated 200 and donorno 2 from GA donated 300)
1999 FL 400
2000 GA 600
2000 FL 500
...
To clarify donors can be from the same state but I am trying to total the gift amounts for that state for the year it is donated.
Any advice is appreciated. I feel like the answer is right in front of me.
Here is a picture of tables for reference:
This is a very simple join & aggregation problem.
SELECT y.year, d.dstate, SUM(g.amount) AS total
FROM gift AS g
INNER JOIN year AS y ON y.year=g.year
INNER JOIN donor AS d ON d.donorno=g.donorno
GROUP BY y.year, d.dstate
You don't need the sub-query in your SELECT clause in order to get the total amount. You can sum it by grouping. (I think the GROUP BY clause is what you're missing. I recommend reading up on it.) What you've done is called a correlated sub-query and it is going to be very slow over large data sets because it has to be calculated row-by-row instead of as a set operation.
Also, please don't use the old style comma join syntax. Instead use the explicit join syntax as shown above. It is much clearer and will help avoid accidental Cartesian products.
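The grouping approach can be tried out with a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL; the table layouts are assumptions inferred from the column names in the question (the original picture is not available), and the rows are made up to match the example totals given there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE year  (year INTEGER PRIMARY KEY);
CREATE TABLE donor (donorno INTEGER PRIMARY KEY, dstate TEXT);
CREATE TABLE gift  (donorno INTEGER, year INTEGER, amount INTEGER);
INSERT INTO year  VALUES (1999),(2000);
INSERT INTO donor VALUES (1,'GA'),(2,'GA'),(3,'FL');
INSERT INTO gift  VALUES
    (1,1999,200),(2,1999,300),(3,1999,400),
    (1,2000,600),(3,2000,500);
""")

# Explicit joins plus GROUP BY: one row per (year, state) with the total.
totals = conn.execute("""
SELECT y.year, d.dstate, SUM(g.amount) AS total
FROM gift AS g
INNER JOIN year  AS y ON y.year = g.year
INNER JOIN donor AS d ON d.donorno = g.donorno
GROUP BY y.year, d.dstate
ORDER BY y.year, d.dstate
""").fetchall()

for row in totals:
    print(row)
```

Donors 1 and 2 are both from GA, so their 1999 gifts of 200 and 300 collapse into a single GA row of 500.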