More complex cumulative sum column in MySQL

I read Create a Cumulative Sum Column in MySQL, and tried to adapt it to what I'm doing, but I can't seem to get it right.
The table:
Id (primary key)
AcctId
CommodId
Date
PnL
There is a unique index which contains AcctId, CommodId, Date. I want to get a cumulative total grouped by date.
This query
select c.date
, c.pnl
,(@cum := @cum + c.pnl) as "cum_pnl"
from commoddata c join (select @cum := 0) r
where
c.acctid = 2
and
c.date >= "2011-01-01"
and
c.date <= "2011-01-31"
order by c.date
will correctly calculate the running total for all records, showing data in the format
date pnl cum_pnl
======== ====== =======
2011-01-01 1 1
2011-01-01 1 2
2011-01-01 1 3
2011-01-01 1 4
2011-01-02 1 5
2011-01-02 1 6
...
(there can be many records per date). What I want is
date cum_pnl
======== =======
2011-01-01 4
2011-01-02 6
...
But nothing I've tried works. TIA.

Alternatively, I think you can replace every pnl with SUM(pnl) and let @cum run across those sums. I think it would look like this:
select c.date
,SUM(c.pnl)
,(@cum := @cum + SUM(c.pnl)) as "cum_pnl"
from commoddata c join (select @cum := 0) r
where
c.acctid = 2 and c.date >= "2011-01-01" and c.date <= "2011-01-31"
GROUP BY c.date
order by c.date
I'm just trying to figure out if SQL will give you grief over selecting cum_pnl when it is not a group by expression... maybe you can try grouping by it as well?
EDIT: New idea. If you're not averse to nested queries, replace commoddata with a grouped, summed subquery:
select c.date
,c.pnl
,(@cum := @cum + c.pnl) as "cum_pnl"
from
(SELECT date, sum(pnl) as pnl FROM commoddata WHERE [conditions] GROUP BY date) c
join (select @cum := 0) r
order by c.date

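Probably not the best way, but you can SELECT Date, MAX(Cum_PnL) FROM (existing_query_here) AS t GROUP BY Date. Note that this only returns the end-of-day running total if PnL is never negative; otherwise the maximum within a day is not necessarily its last value.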
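On MySQL 8.0 or later, a window function over the per-date sums avoids user variables altogether. Here is a minimal sketch of that idea, run through Python's sqlite3 module purely as a self-contained stand-in for MySQL (that substitution is my assumption; it needs SQLite 3.25+ for window functions). The sample rows are made up to match the output shown in the question:

```python
import sqlite3

# SQLite stands in for MySQL 8.0+; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE commoddata (acctid INTEGER, commodid INTEGER, date TEXT, pnl INTEGER)"
)
conn.executemany(
    "INSERT INTO commoddata VALUES (?, ?, ?, ?)",
    [
        (2, 1, "2011-01-01", 1), (2, 2, "2011-01-01", 1),
        (2, 3, "2011-01-01", 1), (2, 4, "2011-01-01", 1),
        (2, 1, "2011-01-02", 1), (2, 2, "2011-01-02", 1),
        (3, 1, "2011-01-01", 50),  # different account, filtered out below
    ],
)

query = """
SELECT date,
       pnl,
       SUM(pnl) OVER (ORDER BY date) AS cum_pnl   -- running total over the daily sums
FROM (
    SELECT date, SUM(pnl) AS pnl                  -- one row per date
    FROM commoddata
    WHERE acctid = 2
      AND date BETWEEN '2011-01-01' AND '2011-01-31'
    GROUP BY date
) AS daily
ORDER BY date
"""
daily_totals = conn.execute(query).fetchall()
for row in daily_totals:
    print(row)
```

The inner query is the same grouped subquery suggested above; only the running total is computed differently.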

Related

mysql finding the sum of subgroup maximums

If I have the following table in MySQL:
date type amount
2017-12-01 3 2
2018-01-01 1 100
2018-02-01 1 50
2018-03-01 2 2000
2018-04-01 2 4000
2018-05-01 3 2
2018-06-01 3 1
...is there a way to find the sum of the amounts corresponding to the latest dates of each type? There are guaranteed to be no duplicate dates for any given type.
The answer I'd be looking to get from the data above could be broken down like this:
The latest date for type 1 is 2018-02-01, where the amount is 50;
The latest date for type 2 is 2018-04-01, where the amount is 4000;
The latest date for type 3 is 2018-06-01, where the amount is 1;
50 + 4000 + 1 = 4051
Is there a way to arrive directly at 4051 in a single query? This is for a Django project using MySQL if that makes a difference; I wasn't able to find an ORM-related solution either, so figured a raw SQL query might be a better place to start.
Thanks!
Not sure about Django, but in raw SQL you could use a self join to pick the latest row for each type, and then aggregate those rows to get the sum of their amounts:
select sum(a.amount)
from your_table a
left join your_table b on a.type = b.type
and a.date < b.date
where b.type is null
Or
select sum(a.amount)
from your_table a
join (
select type, max(date) max_date
from your_table
group by type
) b on a.type = b.type
and a.date = b.max_date
Or by using a correlated subquery
select sum(a.amount)
from your_table a
where a.date = (
select max(date)
from your_table
where type = a.type
)
For MySQL 8 you can use window functions to get the desired result:
select sum(amount)
from (select *, row_number() over (partition by type order by date desc) as seq
from your_table
) t
where seq = 1;
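As a quick sanity check, the first three queries can be run against the sample rows from the question. This is only a sketch: it uses Python's sqlite3 module as a stand-in for MySQL (my assumption, so that it is self-contained), and the dictionary keys are just labels I made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (date TEXT, type INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        ("2017-12-01", 3, 2), ("2018-01-01", 1, 100), ("2018-02-01", 1, 50),
        ("2018-03-01", 2, 2000), ("2018-04-01", 2, 4000),
        ("2018-05-01", 3, 2), ("2018-06-01", 3, 1),
    ],
)

queries = {
    # keep rows for which no later row of the same type exists
    "anti_join": """
        SELECT SUM(a.amount)
        FROM your_table a
        LEFT JOIN your_table b ON a.type = b.type AND a.date < b.date
        WHERE b.type IS NULL
    """,
    # join against the latest date per type
    "max_join": """
        SELECT SUM(a.amount)
        FROM your_table a
        JOIN (SELECT type, MAX(date) AS max_date FROM your_table GROUP BY type) b
          ON a.type = b.type AND a.date = b.max_date
    """,
    # correlated subquery per row
    "correlated": """
        SELECT SUM(a.amount)
        FROM your_table a
        WHERE a.date = (SELECT MAX(date) FROM your_table WHERE type = a.type)
    """,
}
totals = {label: conn.execute(sql).fetchone()[0] for label, sql in queries.items()}
print(totals)
```

All three should agree with the 50 + 4000 + 1 breakdown given in the question.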

Select top x records for each team

How do I select the x most recent records per team, Home and Away?
So the below gets me the most recent games for Swansea both home and away, how do I get it for all teams?
select d.date, d.hometeam, d.awayteam
from dump d
where
d.hometeam = 'Swansea'
or d.awayteam ='Swansea'
order by STR_TO_DATE(date, '%d/%m/%Y') desc limit 6
For an example of the data I have: I'm using the CSV data provided at football-data.co.uk: http://www.football-data.co.uk/mmz4281/1415/E0.csv
I'm using MySQL; however, if there is a function or stored procedure that you find ideal for this purpose, I can use SQL Server.
Edit: Expected Output
X | Date | Home Team | Away Team
----------------------------------------
Swansea| 23/03/15|Swansea |Arsenal
----------------------------------------
Swansea| 14/03/15|Man City |Swansea
----------------------------------------
Man Utd| 14/03/15|Man Utd |Man City
----------------------------------------
Man Utd| 14/03/15|Man Utd |Liverpool
The left column is the team in question. The table above shows 2 games per team; I'm trying to get 6 per team. If you have any suggestions on how to present it better, I'm open to them.
You just need to GROUP BY team and date:
SELECT d.team, d.date, d.hometeam, d.awayteam
FROM dump d
GROUP BY team, date
ORDER BY STR_TO_DATE(date, '%d/%m/%Y');
You want the 6 most recent games for every team separately.
There are 2 things to take care of.
First, your schema has no dedicated team column, so you have to build the list of all teams from the HomeTeam and AwayTeam columns.
Second, you want the 6 most recent games for each team, which means ranking rows within each team group. MySQL before 8.0 doesn't support ranking functions, but user variables can be used as an alternative.
Based on my analysis, here is the query. Please try it.
SELECT
r.homeTeamOrAwayTeam AS team
, r.date
, r.hometeam
, r.awayteam
-- , r.game_rank
FROM (
SELECT
d.date,
d.hometeam,
d.awayteam,
subQuery.homeTeamOrAwayTeam,
-- restart the counter whenever the team changes (RANK is a reserved word in MySQL 8, hence game_rank)
CASE WHEN @runningElement = subQuery.homeTeamOrAwayTeam THEN @groupRank := (@groupRank + 1)
ELSE @groupRank := 1
END AS game_rank
-- remember the team of the current row for the next comparison
, @runningElement := subQuery.homeTeamOrAwayTeam AS runningElement
FROM
dump d
JOIN (
-- to get all the hometeam & awayteam in one column
SELECT d.hometeam AS homeTeamOrAwayTeam FROM dump AS d
UNION
SELECT d.awayteam AS homeTeamOrAwayTeam FROM dump AS d
) AS subQuery
ON d.hometeam = subQuery.homeTeamOrAwayTeam OR d.awayteam = subQuery.homeTeamOrAwayTeam,
-- for ranking purpose
(SELECT @groupRank := 1) a,
(SELECT @runningElement := '') b
ORDER BY
subQuery.homeTeamOrAwayTeam,
-- newest first, so ranks 1..6 are the most recent games
STR_TO_DATE(d.date, '%d/%m/%Y') DESC
) as r
-- set your criteria (e.g. if want to get only 6 results per team)
WHERE r.game_rank between 1 and 6
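If MySQL 8.0+ is available, ROW_NUMBER() can replace the user-variable ranking. Below is a rough sketch of the same two steps (one row per team per game, then rank within each team). It runs on Python's sqlite3 module as a stand-in for MySQL, which is my assumption, and the four fixtures are invented. It also assumes the dates are stored in ISO format, so no STR_TO_DATE is needed; with the dd/mm/yyyy strings from the CSV you would still have to convert them in the ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dump (date TEXT, hometeam TEXT, awayteam TEXT)")
conn.executemany(
    "INSERT INTO dump VALUES (?, ?, ?)",
    [
        # invented fixtures, ISO dates so plain string ordering is chronological
        ("2015-03-01", "Swansea", "Arsenal"),
        ("2015-03-08", "Man City", "Swansea"),
        ("2015-03-15", "Arsenal", "Man City"),
        ("2015-03-22", "Swansea", "Man City"),
    ],
)

games_per_team = 2  # the question asks for 6; 2 keeps the sample small

query = """
SELECT team, date, hometeam, awayteam
FROM (
    SELECT team, date, hometeam, awayteam,
           ROW_NUMBER() OVER (PARTITION BY team ORDER BY date DESC) AS rn
    FROM (
        -- each game appears once for the home team and once for the away team
        SELECT hometeam AS team, date, hometeam, awayteam FROM dump
        UNION ALL
        SELECT awayteam AS team, date, hometeam, awayteam FROM dump
    ) AS per_team
) AS ranked
WHERE rn <= ?
ORDER BY team, date DESC
"""
recent_games = conn.execute(query, (games_per_team,)).fetchall()
for row in recent_games:
    print(row)
```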

query to sum a column until get defined value

I've tried some other topics on this but couldn't find answers that meet my requirement, so I'm posting a new question. Sorry about this.
I'm trying to write a MySQL query that sums a column until the total reaches a defined value.
From my table 'purchase', for each 'sid', starting from the last row, I need the sum of 'pqty' until the total reaches a value that will come from a string (for testing I've hard-coded a value).
Let me illustrate with rows from my table.
The rows for 'sid=1' in 'purchase' look like this:
date pqty prate pamt
2014/04/29 5 38000 190000
2014/05/04 1 38000 38000
2014/05/13 20 35000 700000
2014/05/19 1 38000 38000
From these rows, starting from the last one, I want to sum pqty until it reaches 19 (for now). That is achieved by adding the last 2 rows. The sum should stop there and return the values, or the sums, of 'pqty', 'prate' and 'pamt'. To achieve this I tried the following, based on an example found on this forum.
SELECT date, pqty, @total := @total + pqty AS total
FROM (purchase, (select @total :=0) t)
WHERE @total<19 AND sid = $sid ORDER BY date DESC
But it's not working for me. Please guide me through this, and suggest something else if this is not a good technique for my purpose.
Thanks in advance.
Not 100% certain, but I think both of these work...
SELECT x.*, SUM(y.pqty) FROM purchase x
JOIN purchase y
ON y.date >= x.date
GROUP
BY x.date
HAVING 19 BETWEEN SUM(y.pqty)-x.pqty AND SUM(y.pqty)
OR 19 >= SUM(y.pqty);
SELECT a.*
FROM
( SELECT x.*, @i := @i+pqty i
FROM purchase x
, (SELECT @i:= 0) var
ORDER
BY x.date DESC
) a
WHERE 19 BETWEEN a.i-a.pqty AND a.i
OR 19 >= a.i;
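With MySQL 8.0+ the running total can also be written as a window function. Here is a minimal sketch of that variant, using Python's sqlite3 as a stand-in for MySQL (my assumption, to keep it self-contained) and the four sid=1 rows from the question. A row is kept while the total of the rows before it is still below the target, so the row that crosses the target is included:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase (sid INTEGER, date TEXT, pqty INTEGER, prate INTEGER, pamt INTEGER)"
)
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2014-04-29", 5, 38000, 190000),
        (1, "2014-05-04", 1, 38000, 38000),
        (1, "2014-05-13", 20, 35000, 700000),
        (1, "2014-05-19", 1, 38000, 38000),
    ],
)

query = """
SELECT date, pqty, prate, pamt, running_qty
FROM (
    SELECT date, pqty, prate, pamt,
           -- running total, newest row first
           SUM(pqty) OVER (ORDER BY date DESC
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_qty
    FROM purchase
    WHERE sid = ?
) AS t
WHERE running_qty - pqty < ?   -- total before this row is still below the target
ORDER BY date DESC
"""
picked = conn.execute(query, (1, 19)).fetchall()
for row in picked:
    print(row)

total_qty = sum(row[1] for row in picked)
total_amt = sum(row[3] for row in picked)
print(total_qty, total_amt)
```

For a target of 19 this keeps the last two rows, matching what the question describes.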

What's the most efficient way to generate this report?

Given a table (daily_sales) with say 100k rows of the following data/columns:
id rep sales date
1 a 123 12/15/2011
2 b 153 12/15/2011
3 a 11 12/14/2011
4 a 300 12/13/2011
5 a 120 12/12/2011
6 b 161 11/15/2011
7 a 3 11/14/2011
8 c 13 11/14/2011
9 c 44 11/13/2011
What would be the most efficient way to write a report (completely in SQL) showing the two most recent entries (rep, sales, date) for each name, so the output would be:
a 123 12/15/2011
a 11 12/14/2011
b 153 12/15/2011
b 161 11/15/2011
c 13 11/14/2011
c 44 11/13/2011
Thanks!
FYI, your example uses mostly reserved words, which makes it horrid for us to program against. If you've got the real table columns, give those to us. This is postgres:
select name,value, max(date)
from the_table_name_you_neglect_to_give_us
group by 1,2
That'll give you a list of first name,value,max(date)...though I gotta ask why give us a column called value if it doesn't change in the example?
Let's say you do have an id column...we'll be consistent with your scheme and call it 'ID'...
select b.id from
(select name,value, max(date) date
from the_table_name_you_neglect_to_give_us
group by 1,2) a
inner join the_table_name_you_neglect_to_give_us b on a.name=b.name and a.value=b.value and a.date = b.date
This gives a list of all IDs that are the max...put it together:
select name,value, max(date)
from the_table_name_you_neglect_to_give_us
group by 1,2
union all
select name,value, max(date)
from the_table_name_you_neglect_to_give_us
where id not in
(select b.id from
(select name,value, max(date) date
from the_table_name_you_neglect_to_give_us
group by 1,2) a
inner join the_table_name_you_neglect_to_give_us b on a.name=b.name and a.value=b.value and a.date = b.date)
Hoping my syntax is right...should be close at any rate. I'd put a bracket around that entire thing then select * from (above query) order by name...gives you the order you want.
For MySQL, as explained in @Quassnoi's blog, add an index on (name, date) and use this:
SELECT t.*
FROM (
SELECT name,
COALESCE(
(
SELECT date
FROM tableX ti
WHERE ti.name = dto.name
ORDER BY
ti.name, ti.date DESC
LIMIT 1
OFFSET 1 -- this is set to 2-1
), CAST('1000-01-01' AS DATE)) AS mdate
FROM (
SELECT DISTINCT name
FROM tableX dt
) dto
) tg
, tableX t
WHERE t.name >= tg.name
AND t.name <= tg.name
AND t.date >= tg.mdate
If I understand what you mean, then this MIGHT be helpful:
SELECT main.name, main.value, main.date
FROM tablename AS main
LEFT OUTER JOIN tablename AS ctr
ON main.name = ctr.name
AND main.date <= ctr.date
GROUP BY main.name, main.date
HAVING COUNT(*) <= 2
ORDER BY main.name ASC, main.date DESC
I know the SQL is shorter than the other posts, but just give it a try first..
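For databases with window functions (Postgres, MySQL 8.0+), ROW_NUMBER() is probably the most direct way to say "two most recent per rep". Here is a small sketch on the sample data from the question, run through Python's sqlite3 as a stand-in (my assumption, to keep it runnable anywhere; it needs SQLite 3.25+). I've also assumed the dates are stored in ISO format rather than the mm/dd/yyyy shown in the question, so that they sort correctly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (id INTEGER, rep TEXT, sales INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?, ?)",
    [
        (1, "a", 123, "2011-12-15"), (2, "b", 153, "2011-12-15"),
        (3, "a", 11, "2011-12-14"), (4, "a", 300, "2011-12-13"),
        (5, "a", 120, "2011-12-12"), (6, "b", 161, "2011-11-15"),
        (7, "a", 3, "2011-11-14"), (8, "c", 13, "2011-11-14"),
        (9, "c", 44, "2011-11-13"),
    ],
)

query = """
SELECT rep, sales, date
FROM (
    SELECT rep, sales, date,
           ROW_NUMBER() OVER (PARTITION BY rep ORDER BY date DESC) AS rn
    FROM daily_sales
) AS ranked
WHERE rn <= 2            -- two most recent entries per rep
ORDER BY rep, date DESC
"""
report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

An index on (rep, date) should help on a 100k-row table, though I haven't measured it.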

Get last record in each group and SUM some of them

I have a problem with a MySQL query that should take the last record in each group and sum a field, all in one query. I have a table:
name date interested made_call
andrew.h 2011-02-04 10 10
andrew.h 2011-02-11 20 10
andrew.h 2011-02-13 2 10
sasha.g 2011-02-11 5 20
sasha.g 2011-02-12 5 1
I need to sum the made_call column grouped by name, and return interested from the last record of each group.
Here is what I want to get as a result:
name date interested made_call
andrew.h 2011-02-13 2 30
sasha.g 2011-02-12 5 21
i tried to get result with this query
SELECT a.name,a.date,a.interested,sum(made_call) as made_call
FROM `resultboard` a
WHERE a.attendence = 1
AND NOT EXISTS (select 1 from resultboard where name = a.name
and id > a.id and attendence = 1)
GROUP BY name
but in result i got
andrew.h 2011-02-13 2 10
sasha.g 2011-02-12 5 1
So the query did not sum; it just returned the last record from each group.
Please help.
That may be a little slow if the table is very big, but it will get the wanted result:
SELECT a.name, t.date, a.interested, t.calls
FROM resultboard a
JOIN (SELECT name, MAX(date) AS date, SUM(made_call) AS calls FROM resultboard
GROUP BY name) AS t
ON a.name = t.name AND a.date = t.date
Your WHERE clause is eliminating all but the last row from consideration as part of the sum.
In some other DB's you could use the LAST aggregate function. MySQL doesn't have that, but you can emulate it like so for your case:
SELECT
a.name,
SUBSTRING_INDEX(
GROUP_CONCAT(CAST(a.date AS CHAR) ORDER BY date desc),
',', 1
) AS date,
SUBSTRING_INDEX(
GROUP_CONCAT(CAST(a.interested AS CHAR) ORDER BY date desc),
',', 1
) AS interested,
sum(made_call) as made_call
FROM `resultboard` a
WHERE a.attendence = 1
GROUP BY name
It might not be fast on large data sets, but it should at least do the job if my research is correct. I haven't tested this, so YMMV.
I think the WITH ROLLUP modifier for GROUP BY may help you: http://dev.mysql.com/doc/refman/5.0/en/group-by-modifiers.html
Edit: I got it wrong, there is no need for WITH ROLLUP:
SELECT r.name,
MAX(r.date) as date,
(SELECT r2.interested FROM resultboard r2 WHERE r2.name = r.name ORDER BY r2.date DESC LIMIT 1) as interested,
SUM(made_call) as made_call
FROM resultboard r
GROUP BY name;
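To check the join-against-aggregate approach on the sample rows from the question, here is a small sketch using Python's sqlite3 as a stand-in for MySQL (my assumption, so it runs without a server). The id and attendence columns are left out because the sample table in the question doesn't show them; add the attendence filter to both the inner and outer query if you need it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resultboard (name TEXT, date TEXT, interested INTEGER, made_call INTEGER)"
)
conn.executemany(
    "INSERT INTO resultboard VALUES (?, ?, ?, ?)",
    [
        ("andrew.h", "2011-02-04", 10, 10),
        ("andrew.h", "2011-02-11", 20, 10),
        ("andrew.h", "2011-02-13", 2, 10),
        ("sasha.g", "2011-02-11", 5, 20),
        ("sasha.g", "2011-02-12", 5, 1),
    ],
)

query = """
SELECT a.name, t.last_date, a.interested, t.calls
FROM resultboard a
JOIN (
    -- one row per name: latest date and total calls
    SELECT name, MAX(date) AS last_date, SUM(made_call) AS calls
    FROM resultboard
    GROUP BY name
) AS t
  ON a.name = t.name AND a.date = t.last_date
ORDER BY a.name
"""
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)
```

This relies on each name having at most one row per date; with duplicates on the latest date you would get one output row per duplicate.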