SQL: Less invasive way to find difference between two distinct columns - mysql

I am specifically targeting SQL Server, but this information would be helpful if it exists for MySQL also. I have found how to create a column that calculates the difference between columns of two consecutive rows, but what I am interested in is how do I calculate the difference between one column in two distinctly identifiable rows?
I know of one way to do it, but it requires deep query nesting and repeating the same base queries. Suppose I have a large query joining multiple tables that ultimately boils down to this:
SELECT
projectStatus AS [Project Status],
COUNT(projectStatus) AS [# of Projects]
FROM
projectList
GROUP BY projectStatus
And let's say it gives the following result:
Project Status | # of Projects
------------------------------
Delayed | 167
Delayed Known | 83
On Time | 92
Ahead | 86
What I would like to do is append a row that calculates the difference between the # of Projects value of rows Delayed and Delayed Known, then omits the Delayed row, like so:
Project Status | # of Projects
------------------------------
Delayed | 167
Delayed Known | 83
On Time | 92
Ahead | 86
Delayed Unknown| 84
Based on the first query, the way I have figured out how to do this looks like:
SELECT
projectStatus AS [Project Status],
COUNT(projectStatus) AS [# of Projects]
FROM
projectList
GROUP BY projectStatus
UNION
SELECT
'Delayed Unknown' AS [Project Status],
SUM([Sum Val]) AS [# of Projects]
FROM (
SELECT
[# of Projects] *
CASE
WHEN [Project Status] = 'Delayed' THEN 1
WHEN [Project Status] = 'Delayed Known' THEN -1
ELSE 0
END AS [Sum Val]
FROM (
SELECT
projectStatus AS [Project Status],
COUNT(projectStatus) AS [# of Projects]
FROM
projectList
WHERE
projectStatus IN ('Delayed', 'Delayed Known')
GROUP BY projectStatus
) AS queryC
) AS queryB
Keep in mind that the inner query that this is based on is simplified for this post, but it actually is a larger query that is composed of its own UNION. Therefore, this gets ugly very quickly and approaches hard-to-maintain status.
The target for this query is to act as a dataset for SQL Server Reporting Services, so my constraint is to do this all in one query (unless it is possible to use temp tables within SSRS datasets). So, in one query, is there a less invasive way to do this calculation?

Please try this; maybe you are looking for something like the following.
;with cte as(
select Projectstatus, count(*) as [# of Projects]
from #Table
where projectStatus IN ('Delayed', 'Delayed Known')
group by ProjectStatus
), cte2 as(
select
(Select [# of Projects] From cte Where Projectstatus = 'Delayed')
-(Select [# of Projects] From cte Where Projectstatus = 'Delayed Known') as DelayedUnknownProjects
)
select Projectstatus, [# of Projects]
From cte
UNION ALL
SELECT 'Delayed Unknown' as Projectstatus, DelayedUnknownProjects as [# of Projects]
From cte2
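As a hedged alternative, the difference can also be computed in a single pass over the grouped counts with conditional aggregation, so the base query is written only once. The sketch below runs the idea in SQLite through Python's sqlite3 module purely so it is easy to try; the CTE name counts and the column alias n are made up for illustration, and the row counts are the ones from the question. The WITH ... UNION ALL shape should carry over to SQL Server, but that is an assumption to verify there.

```python
import sqlite3

# In-memory stand-in for projectList, filled with the counts from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projectList (projectStatus TEXT)")
rows = ([("Delayed",)] * 167 + [("Delayed Known",)] * 83
        + [("On Time",)] * 92 + [("Ahead",)] * 86)
conn.executemany("INSERT INTO projectList VALUES (?)", rows)

query = """
WITH counts AS (                      -- the base query, written once
    SELECT projectStatus, COUNT(*) AS n
    FROM projectList
    GROUP BY projectStatus
)
SELECT projectStatus, n FROM counts
UNION ALL
SELECT 'Delayed Unknown',
       SUM(CASE projectStatus         -- +n for Delayed, -n for Delayed Known
               WHEN 'Delayed' THEN n
               WHEN 'Delayed Known' THEN -n
               ELSE 0
           END)
FROM counts
"""
result = dict(conn.execute(query).fetchall())
print(result)
```

Because the CTE holds the whole base query, a larger UNION-based query would only need to be pasted into counts once.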

Related

MySQL - Column to Row

In the original problem, we have a table that stores the date and win/loss information for each game played by a team.
(image: the matches table)
We can use the following SQL statements to get information about the number of games won and lost for each day.
SELECT match_date AS match_date,
SUM(IF(result = 'win',1,0)) AS win,
SUM(IF(result = 'lose',1,0)) AS lose
FROM matches
GROUP BY match_date;
We store the query results in the matches_2 table.
(image: the matches_2 table)
My question is, how can we get the matches table based on the matches_2 table with a query?
In the simpler case, we can achieve the task of 'column to row' using union/union all. But that doesn't seem to work in this problem.
All relevant sql code can be found in the following fiddle:
https://dbfiddle.uk/rM-4Y_YN
You can use a recursive CTE for this, reading from the aggregated matches_2 table:
WITH RECURSIVE wins (mdate, w) AS
(
SELECT match_date as mdate, 1
FROM matches_2
WHERE win>0
UNION ALL
SELECT match_date, w + 1 FROM matches_2
JOIN wins on match_date=mdate
WHERE w < win
),
losses (mdate, l) AS
(
SELECT match_date as mdate, 1
FROM matches_2
WHERE lose>0
UNION ALL
SELECT match_date, l + 1 FROM matches_2
JOIN losses on match_date=mdate
WHERE l < lose
)
SELECT mdate as match_date, 'lose' as result FROM losses
UNION ALL
SELECT mdate as match_date, 'win' as result FROM wins
See a db-fiddle
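To make the expansion step concrete, here is a small sketch of the same idea: each (date, count) pair seeds one row, and the recursive member keeps emitting rows until the counter reaches the stored count. It runs in SQLite through Python's sqlite3 module rather than MySQL, and the two sample dates and their counts are invented for illustration; only the matches_2 column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches_2 (match_date TEXT, win INTEGER, lose INTEGER)"
)
# Made-up sample data: 2 wins and 1 loss on the first day, 2 losses on the second.
conn.executemany(
    "INSERT INTO matches_2 VALUES (?, ?, ?)",
    [("2022-09-01", 2, 1), ("2022-09-02", 0, 2)],
)

query = """
WITH RECURSIVE expanded (match_date, result, n, total) AS (
    -- seed rows: one per date and outcome that occurred at least once
    SELECT match_date, 'win', 1, win FROM matches_2 WHERE win > 0
    UNION ALL
    SELECT match_date, 'lose', 1, lose FROM matches_2 WHERE lose > 0
    UNION ALL
    -- recursive step: repeat the row until n reaches the stored count
    SELECT match_date, result, n + 1, total FROM expanded WHERE n < total
)
SELECT match_date, result FROM expanded ORDER BY match_date, result
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Folding both outcomes into one CTE is a design choice of this sketch; the two-CTE form above does the same work with one counter per outcome.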

SQL crosstab query to see the sum of some transactions

I have an SQL database table containing money transfer transactions. I have 4 columns: TransferID, Payer, Payee, Amount
Let's say this is my database table:
(source: pbrd.co)
I can create a crosstab query to see how much money each guy sent to each of his buddies. The result will look something like this:
(source: pbrd.co)
However what I want to see is the balance of transactions between each two guys. For example if Peter sent $13 to John and John sent $2 to Peter then I want to see $11 (and $-11) as the summarized result of their transactions instead of $13 and $2. The result should look something like this:
(source: pbrd.co)
What query could do the trick?
As mentioned, the dynamic pivot crosstab query using the TRANSFORM clause is an MS Access SQL feature that is unavailable in other RDBMSs. Since the OP has linked MySQL backend tables to an MS Access frontend app, a frontend crosstab query can be run on the MySQL data.
For specific needs, consider a source query that joins aggregate queries of matching Payer and Payee. Then run a crosstab on source query:
Source Query
SELECT m1.Payer, m1.Payee, (m1.SumAmount - m2.SumAmount) As NetTransfer
FROM
(SELECT t.Payer, t.Payee, Sum(t.Amount) AS SumAmount
FROM Transfers t
GROUP BY t.Payer, t.Payee) m1
INNER JOIN
(SELECT t.Payer, t.Payee, Sum(t.Amount) AS SumAmount
FROM Transfers t
GROUP BY t.Payer, t.Payee) m2
ON m1.Payer = m2.Payee AND m1.Payee = m2.Payer
-- Payer Payee NetTransfer
-- John Fred 2
-- Fred John -2
-- Peter John 11
-- John Peter -11
Crosstab Query (syntax is only valid in MS Access):
TRANSFORM Sum(q.NetTransfer) AS SumOfNetTransfer
SELECT q.Payer
FROM SourceQueryQ q
GROUP BY q.Payer
PIVOT q.Payee;
-- Payer   Fred   John   Peter
-- Fred           -2
-- John    2             -11
-- Peter          11
Of course, the first query can also be nested as a derived table in the crosstab:
Combined Query
TRANSFORM Sum(q.NetTransfer) AS SumOfNetTransfer
SELECT q.Payer
FROM
(SELECT m1.Payer, m1.Payee, (m1.SumAmount - m2.SumAmount) As NetTransfer
FROM
(SELECT t.Payer, t.Payee, Sum(t.Amount) AS SumAmount
FROM Transfers t
GROUP BY t.Payer, t.Payee) m1
INNER JOIN
(SELECT t.Payer, t.Payee, Sum(t.Amount) AS SumAmount
FROM Transfers t
GROUP BY t.Payer, t.Payee) m2
ON m1.Payer = m2.Payee AND m1.Payee = m2.Payer) q
GROUP BY q.Payer
PIVOT q.Payee;
NOTE: As in any Access query, the crosstab is limited to 255 columns, so if data contains more than 254 distinct Payers/Payees, use the PIVOT...IN clause to define columns:
PIVOT q.Payee IN ('Fred', 'John', 'Peter')
A pivot query should do the trick:
SELECT Payer,
SUM(CASE WHEN Payee = 'Peter' THEN Amount END) AS Peter,
SUM(CASE WHEN Payee = 'John' THEN Amount END) AS John,
SUM(CASE WHEN Payee = 'Fred' THEN Amount END) AS Fred
FROM yourTable
GROUP BY Payer
Depending on the database you are using, you might be able to take advantage of some built in pivot capability. Also, formatting the output as a currency, or with dashes for no amount, would be database specific.
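The CASE pivot above shows gross amounts; to get the net balance the question asks for, one option is to first list every transfer twice, once as sent and once negated from the receiver's side, and then pivot the signed amounts. The sketch below tries this in SQLite through Python's sqlite3 module. The three sample transfers are assumptions (the original table was an image), chosen so that Peter to John is 13 and John to Peter is 2 as in the question; the names signed, person, other and delta are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transfers (TransferID INTEGER, Payer TEXT, Payee TEXT, Amount INTEGER)"
)
# Assumed sample data; the real table was only shown as an image.
conn.executemany(
    "INSERT INTO Transfers VALUES (?, ?, ?, ?)",
    [(1, "Peter", "John", 13), (2, "John", "Peter", 2), (3, "John", "Fred", 2)],
)

query = """
WITH signed AS (
    -- money sent counts as positive for the payer ...
    SELECT Payer AS person, Payee AS other, Amount AS delta FROM Transfers
    UNION ALL
    -- ... and as negative for the payee
    SELECT Payee, Payer, -Amount FROM Transfers
)
SELECT person,
       SUM(CASE WHEN other = 'Fred'  THEN delta END) AS Fred,
       SUM(CASE WHEN other = 'John'  THEN delta END) AS John,
       SUM(CASE WHEN other = 'Peter' THEN delta END) AS Peter
FROM signed
GROUP BY person
ORDER BY person
"""
balances = conn.execute(query).fetchall()
for row in balances:
    print(row)
```

Unlike an inner join of the two aggregate queries, this form also keeps pairs where money only ever flowed in one direction.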

Aggregating data to get a running total month on month

I have a table which holds the data below.
The issue I'm having is that I need a running total for each month. I've managed to create this in an Excel sheet pretty easily, but when I try anything in SQL the result varies.
The image below shows the sum of each paid amount by month, then a running total with each month added onto it. I've edited the Excel sheet to show both the formula and the result of the formula. It also shows the result I get from SQL 2008 when using the subquery further down (example only).
***UPDATE - The result set I'm trying to achieve, as in the Excel document, is for example: month 117 + month 118 gives the month 118 TotalToDate, then month 118 + 119 gives the month 119 total to date.
Not sure how else to explain this.
( select sum(paid) from #tmp005 t2 where t2.[monthid] <=
t5.[monthid] ) as paid
I really feel that this is less complicated than I think!
As I understand it, you are trying to get a running total month by month; the CTE below should do what you want.
--create table #temp (M_ID Int, Paid Float)
--Insert Into #temp VALUES (116, '50.00'), (117, '50.00'),(117, '5.00'),(117, '20.00'),(117, '10.00'),(117, '75.40'),(118, '125.00'),(118, '200.00'),(118, '5.00')
;WITH y AS
(
SELECT M_ID, paid, rn = ROW_NUMBER() OVER (ORDER BY M_ID)
FROM #temp
), x AS
(
SELECT M_ID, rn, paid, rt = paid
FROM y
WHERE rn = 1
UNION ALL
SELECT y.M_ID, y.rn, y.paid, x.rt + y.paid
FROM x INNER JOIN y
ON y.rn = x.rn + 1
)
SELECT M_ID, MAX(rt) as RunningTotal
FROM x
Group By M_ID
OPTION (MAXRECURSION 10000);
It is based on the first three M_ID values of your sample data; just swap #temp for your own table. I didn't know whether the table has another unique identifier, which is why I used ROW_NUMBER(), but this should order the rows correctly by the M_ID field.
I guess that you are storing the month in a separate table and using M_ID to reference it. So, to get the sum for each month, do this:
SELECT [M_ID]
,sum([Paid])
FROM #tmp005
GROUP BY [M_ID]
I think I'd use a correlated sub query:-
select r.m_id,
(
select sum(csq.paid)
from #tmp005 csq
where csq.m_id<=r.m_id
)
from (
select distinct m_id
from #tmp005
) r
Hopefully you can figure out how to apply it to your circumstance/schema.
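For comparison, on engines that support ordered window aggregates the same running total can be written with SUM() OVER (ORDER BY ...). SQL Server gained this in 2012, so it would not help on 2008, where the correlated subquery or recursive CTE above remain the options. The sketch below runs in SQLite (3.25 or later) through Python's sqlite3 module, using the sample values from the first answer; the table name payments and the column aliases are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (m_id INTEGER, paid REAL)")
conn.executemany("INSERT INTO payments VALUES (?, ?)", [
    (116, 50.00), (117, 50.00), (117, 5.00), (117, 20.00), (117, 10.00),
    (117, 75.40), (118, 125.00), (118, 200.00), (118, 5.00),
])

query = """
WITH monthly AS (                 -- one row per month with that month's sum
    SELECT m_id, SUM(paid) AS month_total
    FROM payments
    GROUP BY m_id
)
SELECT m_id,
       month_total,
       SUM(month_total) OVER (ORDER BY m_id) AS total_to_date
FROM monthly
ORDER BY m_id
"""
# Round to cents so floating-point noise does not clutter the output.
running = [
    (m_id, round(month_total, 2), round(total_to_date, 2))
    for m_id, month_total, total_to_date in conn.execute(query)
]
for row in running:
    print(row)
```

Aggregating per month first keeps the window function working on one row per month, which mirrors the month total plus total-to-date layout of the spreadsheet.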

How can I write a query that aggregate a single row with latest date among multiple set of rows?

I have a MySQL table where there are many rows for each person, and I want to write a query which aggregates rows with a special constraint (one row per person).
For example, let's say the table consists of the following data.
name   date                  reason
---------------------------------------
John   2013-04-01 14:00:00   Vacation
John   2013-03-31 18:00:00   Sick
Ted    2012-05-06 20:00:00   Sick
Ted    2012-02-20 01:00:00   Vacation
John   2011-12-21 00:00:00   Sick
Bob    2011-04-02 20:00:00   Sick
I want to see the distribution of 'reason' column. If I just write a query like below
select reason, count(*) as count from table group by reason
then I will be able to see number of reasons for this table overall.
reason count
------------------
Sick 4
Vacation 2
However, I am only interested in single reason from each person. The reason that should be counted should be from a row with latest date from the person's records. For example, John's latest reason would be Vacation while Ted's latest reason would be Sick. And Bob's latest reason (and the only reason) is Sick.
The expected result for that query should be like below. (Sum of count will be 3 because there are only 3 people)
reason count
-----------------
Sick 2
Vacation 1
Is it possible to write a query such that single latest reason will be counted when I want to see distribution(count) of reasons?
Here are some facts about the table.
The table has tens of millions of rows
Most of the time, each person has one reason.
Some people have multiple reasons, but 99.99% of people have fewer than 5 reasons.
There are about 30 different reasons while there are millions of distinct names.
The table is partitioned based on date range.
SELECT T.REASON, COUNT(*)
FROM
(
SELECT PERSON, MAX(DATE) AS MAX_DATE
FROM TABLE-NAME
GROUP BY PERSON
) A, TABLE-NAME T
WHERE T.PERSON = A.PERSON AND T.DATE = A.MAX_DATE
GROUP BY T.REASON
Try this
select reason, count(*) from
(select reason from table where (name, date) in
(select name, max(date) from table group by name)) t
group by reason
In MySQL versions before 8.0, this kind of query is not very efficient because window functions (PARTITION BY), as found in SQL Server or Oracle, are not available.
You can still emulate it with a subquery that retrieves the rows matching the condition you need, here the maximum date:
SELECT t.reason, COUNT(1)
FROM
(
SELECT name, MAX(adate) AS maxDate
FROM aTable
GROUP BY name
) maxDateRows
INNER JOIN aTable t ON maxDateRows.name = t.name
AND maxDateRows.maxDate = t.adate
GROUP BY t.reason
You can see a sample here.
Test this query on your samples, but I'm afraid that it will be slow as hell.
For your information, you can do the same thing in a more elegant and much much faster way in SQL Server :
SELECT reason, COUNT(1)
FROM
(
SELECT name
, reason
, RANK() OVER(PARTITION BY name ORDER BY adate DESC) as Rank
FROM #aTable
) AS rankTable
WHERE Rank = 1
GROUP BY reason
The sample is here
If you are really stuck with MySQL, and the first query is too slow, then you can split the problem.
Do a first query creating a table:
CREATE TABLE maxDateRows AS
SELECT name, MAX(adate) AS maxDate
FROM aTable
GROUP BY name
Then create an index on both name and maxDate.
Finally, get the results :
SELECT t.reason, COUNT(1)
FROM maxDateRows m
INNER JOIN aTable t ON m.name = t.name
AND m.maxDate = t.adate
GROUP BY t.reason
This query seems to give the result you are looking for:
select
reason,
count(*)
from (select * from tablename group by name) abc
group by
reason
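It is quite fast and simple, but it relies on MySQL's non-standard GROUP BY extension: the row returned for each name is not guaranteed to be the one with the latest date, and the query is rejected when ONLY_FULL_GROUP_BY is enabled. You can view the SQL Fiddle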
Apologies if this answer duplicates an existing one. Maybe I'm suffering from some form of aphasia, but I cannot see it...
SELECT x.reason
, COUNT(*)
FROM absentism x
JOIN
( SELECT name,MAX(date) max_date FROM absentism GROUP BY name) y
ON y.name = x.name
AND y.max_date = x.date
GROUP
BY reason;
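For engines with window functions (MySQL 8.0 and later included), the "latest row per person, then count by reason" logic can be sketched with ROW_NUMBER(). The example below runs in SQLite (3.25 or later) through Python's sqlite3 module on the six sample rows from the question; the table name absences and the alias rn are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE absences (name TEXT, date TEXT, reason TEXT)")
conn.executemany("INSERT INTO absences VALUES (?, ?, ?)", [
    ("John", "2013-04-01 14:00:00", "Vacation"),
    ("John", "2013-03-31 18:00:00", "Sick"),
    ("Ted",  "2012-05-06 20:00:00", "Sick"),
    ("Ted",  "2012-02-20 01:00:00", "Vacation"),
    ("John", "2011-12-21 00:00:00", "Sick"),
    ("Bob",  "2011-04-02 20:00:00", "Sick"),
])

query = """
SELECT reason, COUNT(*) AS cnt
FROM (
    -- number each person's rows from newest to oldest
    SELECT reason,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY date DESC) AS rn
    FROM absences
) latest
WHERE rn = 1                      -- keep only the newest row per person
GROUP BY reason
ORDER BY reason
"""
distribution = conn.execute(query).fetchall()
print(distribution)
```

ROW_NUMBER() always keeps exactly one row per person, whereas RANK() or a join on MAX(date) would count a person twice if two of their rows shared the same latest timestamp.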

How do I efficiently create logical subsets of data in a many-to-many mapping table?

I have a many-to-many relationship between invoices and credit card transactions, which I'm trying to map sums of together. The best way to think of the problem is to imagine TransactionInvoiceMap as a bipartite graph. For each connected subgraph, find the total of all invoices and the total of all transactions within that subgraph. In my query, I want to return the values computed for each of these subgraphs along with the transaction ids they're associated with. Totals for related transactions should be identical.
More explicitly, given the following transactions/invoices
Table: TransactionInvoiceMap
TransactionID InvoiceID
1 1
2 2
3 2
3 3
Table: Transactions
TransactionID Amount
1 $100
2 $75
3 $75
Table: Invoices
InvoiceID Amount
1 $100
2 $100
3 $50
my desired output is
TransactionID TotalAsscTransactions TotalAsscInvoiced
1 $100 $100
2 $150 $150
3 $150 $150
Note that invoices 2 and 3 and transactions 2 and 3 are part of a logical group.
Here's a solution (simplified, names changed) that apparently works, but is very slow. I'm having a hard time figuring out how to optimize this, but I think it would involve eliminating the subqueries into TransactionInvoiceGrouping. Feel free to suggest something radically different.
with TransactionInvoiceGrouping as (
select
-- Need an identifier for each logical group of transactions/invoices, use
-- one of the transaction ids for this.
m.TransactionID,
m.InvoiceID,
min(m.TransactionID) over (partition by m.InvoiceID) as GroupingID
from TransactionInvoiceMap m
)
select distinct
g.TransactionID,
istat.InvoiceSum as TotalAsscInvoiced,
tstat.TransactionSum as TotalAsscTransactions
from TransactionInvoiceGrouping g
cross apply (
select sum(ii.Amount) as InvoiceSum
from (select distinct InvoiceID, GroupingID from TransactionInvoiceGrouping) ig
inner join Invoices ii on ig.InvoiceID = ii.InvoiceID
where ig.GroupingID = g.GroupingID
) as istat
cross apply (
select sum(it.Amount) as TransactionSum
from (select distinct TransactionID, GroupingID from TransactionInvoiceGrouping) ig
left join Transactions it on ig.TransactionID = it.TransactionID
where ig.GroupingID = g.GroupingID
having sum(it.Amount) > 0
) as tstat
I've implemented the solution in a recursive CTE:
;with TranGroup as (
select TransactionID
, InvoiceID as NextInvoice
, TransactionID as RelatedTransaction
, cast(TransactionID as varchar(8000)) as TransactionChain
from TransactionInvoiceMap
union all
select g.TransactionID
, m1.InvoiceID
, m.TransactionID
, g.TransactionChain + ',' + cast(m.TransactionID as varchar(11))
from TranGroup g
join TransactionInvoiceMap m on g.NextInvoice = m.InvoiceID
join TransactionInvoiceMap m1 on m.TransactionID = m1.TransactionID
where ',' + g.TransactionChain + ',' not like '%,' + cast(m.TransactionID as varchar(11)) + ',%'
)
, RelatedTrans as (
select distinct TransactionID, RelatedTransaction
from TranGroup
)
, RelatedInv as (
select distinct TransactionID, NextInvoice as RelatedInvoice
from TranGroup
)
select TransactionID
, (
select sum(Amount)
from Transactions
where TransactionID in (
select RelatedTransaction
from RelatedTrans
where TransactionID = t.TransactionID
)
) as TotalAsscTransactions
, (
select sum(Amount)
from Invoices
where InvoiceID in (
select RelatedInvoice
from RelatedInv
where TransactionID = t.TransactionID
)
) as TotalAsscInvoiced
from Transactions t
There is probably some room for optimization (including object naming on my part!) but I believe I have at least a correct solution which will gather all possible Transaction-Invoice relations to include in the calculations.
I was unable to get the existing solutions on this page to give the OP's desired output, and they got uglier as I added more test data. I'm not sure if the OP's posted "slow" solution is correct as stated. It's very possible that I'm misinterpreting the question.
Additional info:
I've often seen that recursive queries can be slow when working with large sets of data. Perhaps that can be the subject of another SO question. If that's the case, things to try on the SQL side might be to limit the range (add where clauses), index base tables, select the CTE into a temp table first, index that temp table, think of a better stop condition for the CTE...but profile first, of course.
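As a cross-check for any of the SQL solutions, the grouping the question describes (connected subgraphs of a bipartite graph) can be computed outside the database with a union-find structure. This is only a sketch in Python using the sample rows from the question, not a replacement for the queries; the helper names find and union and the ("T", id) / ("I", id) node labels are made up.

```python
from collections import defaultdict

# Sample data from the question.
mapping = [(1, 1), (2, 2), (3, 2), (3, 3)]   # (TransactionID, InvoiceID)
transactions = {1: 100, 2: 75, 3: 75}        # TransactionID -> Amount
invoices = {1: 100, 2: 100, 3: 50}           # InvoiceID -> Amount

parent = {}

def find(node):
    """Return the representative of node's group, registering new nodes."""
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]  # path halving
        node = parent[node]
    return node

def union(a, b):
    """Merge the groups containing a and b."""
    root_a, root_b = find(a), find(b)
    if root_a != root_b:
        parent[root_a] = root_b

# Every mapping row links one transaction node to one invoice node.
for t_id, i_id in mapping:
    union(("T", t_id), ("I", i_id))

# Sum amounts per connected group.
tx_total = defaultdict(int)
inv_total = defaultdict(int)
for t_id, amount in transactions.items():
    tx_total[find(("T", t_id))] += amount
for i_id, amount in invoices.items():
    inv_total[find(("I", i_id))] += amount

# TransactionID -> (TotalAsscTransactions, TotalAsscInvoiced)
report = {
    t_id: (tx_total[find(("T", t_id))], inv_total[find(("T", t_id))])
    for t_id in sorted(transactions)
}
print(report)
```

Comparing this report with a query's output on a small extract is a cheap way to confirm the query handles chains of shared invoices correctly before tuning it for speed.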
If I have understood the question correctly, you are trying to find the minimum transaction ID for each invoice; I have used a ranking function to do that.
WITH TransactionInvoiceGrouping AS (
SELECT
-- Need an identifier for each logical group of transactions/invoices, use
-- one of the transaction ids for this.
m.TransactionID,
m.InvoiceID,
ROW_NUMBER() OVER (PARTITION BY m.InvoiceID ORDER BY m.TransactionID ) AS recno
FROM TransactionInvoiceMap m
)
SELECT
g.TransactionID,
istat.InvoiceSum AS TotalAsscInvoiced,
tstat.TransactionSum AS TotalAsscTransactions
FROM TransactionInvoiceGrouping g
CROSS APPLY(
SELECT SUM(ii.Amount) AS InvoiceSum
FROM TransactionInvoiceGrouping ig
inner JOIN Invoices ii ON ig.InvoiceID = ii.InvoiceID
WHERE ig.TransactionID = g.TransactionID
AND ig.recno = 1
) AS istat
CROSS APPLY(
SELECT sum(it.Amount) AS TransactionSum
FROM TransactionInvoiceGrouping ig
LEFT JOIN transactions it ON ig.TransactionID = it.TransactionID
WHERE ig.TransactionID = g.TransactionID
AND ig.recno = 1
HAVING SUM(it.Amount) > 0
) AS tstat
WHERE g.recno = 1