Slamdata: how to find cumulative sum based on the date - mysql

i have used the query function in SlamData.
My code:
SELECT
DATE_PART("year",thedate) AS year,DATE_PART("month",thedate) AS month,
SUM(runningPnL) AS PnL
FROM "/Mickey/testdb/sampledata3" AS c
GROUP BY DATE_PART("year", thedate) ,DATE_PART("month", thedate)
order by DATE_PART("year", thedate) ,DATE_PART("month", thedate)
The extract of my table:
PnL month year
-1651.8752 1 2001
17180.4776 2 2001
48207.54560000001 3 2001
Now, how can i find the cumulative sum of the PnL?
eg.-1651.8752 for the first month
15528.6024 for the second month
Thank you very much >.<

I am generating sample data same as you for cumulative sum. Hope from this you get some idea.
Create table tempData
(
pnl float,
[month] int,
[year] int
)
Go
insert into tempData values ( -1651.8752, 1,2001)
insert into tempData values ( 17180.4776, 2,2001)
insert into tempData values ( 48207.54560000001, 3,2001)
Select * , (SELECT SUM(Alias.pnl)
FROM tempData As Alias
WHERE Alias.[Month] <= tempData.[Month]
) As CumulativSUM
FROm tempData
ORDER BY tempData.[MOnth]

done
my code is
SELECT a1.year, a1.month, a1.PnL, a1.PnL/(SUM(a2.PnL)+125000) as Running_Total
FROM/Mickey/testdb/sampledata6as a1,/Mickey/testdb/sampledata6as a2
WHERE (a1.month > a2.month And a1.year=a2.year) or (a1.year>a2.year)
GROUP BY a1.year, a1.month,a1.PnL
ORDER BY a1.year,a1.month ASC;

Related

SQL - Error when inserting into table from union all

I am trying to create a new table from one table using different values. But for some reason I am getting a
'Operand type clash: date is incompatible with float' error on line 1
insert into mvp(Acco, Bal, BalDate)
Select Acco,
l_Date,
sum(amount) over (partition by Acco order by l_Date asc) as amount
from (
Select Acco, l_Date, sum(amount) amount
from (
Select Acco, cbs as amount
, '2020-06-21' as l_Date
from homeworktable
union all
Select Acco, abs as amount
, '2020-06-21' as l_Date
from homeworktable
) a
group by Acco, l_Date
) a
I don't understand why there would be a problem on line 1. Any help ?
you have inverted the columns in the insert:
l_Date ==> Bal
amount ==> BalDate
the insert should be: insert into mvp(Acco, BalDate, Bal)....

How to calculate 3 month active users

I have a table that contains an orderId, a timestamp and a customerId, like this:
DROP TABLE IF EXISTS testdata;
CREATE TABLE testdata (
`orderId` int,
`createdOn` datetime(6),
`customerId` int,
PRIMARY KEY (`orderId`)
);
INSERT INTO testdata (orderId, createdOn, customerId) VALUES
('1000001','2020-01-01 17:08:41.460000','101'),
('1000002','2020-01-02 18:01:00.180000','102'),
('1000003','2020-01-03 12:26:02.460000','103'),
('1000004','2020-01-04 13:32:42.610000','104'),
('1000005','2020-01-05 20:21:28.540000','101'),
('1000006','2020-01-06 11:54:20.530000','102'),
('1000007','2020-02-01 20:54:42.470000','102'),
('1000008','2020-02-02 10:21:29.470000','102'),
('1000009','2020-02-03 16:22:23.880000','102'),
('1000010','2020-02-04 16:22:23.880000','103'),
('1000011','2020-02-05 17:08:41.460000','103'),
('1000012','2020-02-06 18:01:00.180000','103'),
('1000013','2020-03-01 12:26:02.460000','102'),
('1000014','2020-03-02 13:32:42.610000','102'),
('1000015','2020-03-03 20:21:28.540000','103'),
('1000016','2020-03-04 11:54:20.530000','103'),
('1000017','2020-03-05 20:54:42.470000','104'),
('1000018','2020-03-06 10:21:29.470000','104'),
('1000019','2020-04-01 16:22:23.880000','103'),
('1000020','2020-04-02 16:22:23.880000','103'),
('1000021','2020-04-03 17:08:41.460000','103'),
('1000022','2020-04-04 18:01:00.180000','104'),
('1000023','2020-04-05 12:26:02.460000','104'),
('1000024','2020-04-06 13:32:42.610000','104'),
('1000025','2020-05-01 20:21:28.540000','103'),
('1000026','2020-05-02 11:54:20.530000','103'),
('1000027','2020-05-03 20:54:42.470000','104'),
('1000028','2020-05-04 10:21:29.470000','104'),
('1000029','2020-05-05 16:22:23.880000','105'),
('1000030','2020-05-06 16:22:23.880000','105'),
('1000031','2020-05-01 20:21:28.540000','104'),
('1000032','2020-05-02 11:54:20.530000','104'),
('1000033','2020-05-03 20:54:42.470000','104'),
('1000034','2020-05-04 10:21:29.470000','105'),
('1000035','2020-05-05 16:22:23.880000','105'),
('1000036','2020-05-06 16:22:23.880000','105')
;
Now I want to calculate for each month the number of customers that have been active (i.e., have an order) within the last 3 months (i.e., current month or the preceding two months).
I manage to calculate the active users for the current month, like this:
SELECT
EXTRACT(YEAR_MONTH FROM createdOn) AS order_createdOn_ym
,COUNT(DISTINCT customerId) AS mau
FROM testdata
GROUP BY order_createdOn_ym
ORDER BY order_createdOn_ym asc
;
(Fiddle over here.)
However, I'm completely stumped as to how you can approach calculating the 3-months-active users.
Any help is greatly appreciated!
Here is one option:
select c.createdmonth, count(distinct customerid) as mau
from (
select distinct date_format(createdon, '%Y-%m-01') as createdmonth
from testdata
) c
left join testdata t
on t.createdon >= c.createdmonth - interval 2 month
and t.createdon < c.createdmonth + interval 1 month
group by c.createdmonth
The idea is to enumerate the distinct months, then bring the table with a left join that recovers the last 2 month and the current month. You can then aggregate and count the number of distinct customers per group.
Thanks to #GMB for providing the solution. Purely as a matter of taste I prefer to have the month interval the following way though:
SELECT date_format(c.end_of_createdOn_month, '%Y-%m') as order_month,
count(distinct customerid) as mau_3m
FROM (
select distinct LAST_DAY(createdOn) as end_of_createdOn_month
from testdata
) c
LEFT JOIN testdata t
ON t.createdon >= (c.end_of_createdOn_month - interval 3 month)
AND t.createdon <= c.end_of_createdOn_month
GROUP BY c.end_of_createdOn_month;

Minimum number of Meeting Rooms required to Accomodate all Meetings in MySQL

I have the following columns in a table called meetings: meeting_id - int, start_time - time, end_time - time. Assuming that this table has data for one calendar day only, how many minimum number of rooms do I need to accomodate all the meetings. Room size/number of people attending the meetings don't matter.
Here's the solution:
select * from
(select t.start_time,
t.end_time,
count(*) - 1 overlapping_meetings,
count(*) minimum_rooms_required,
group_concat(distinct concat(y.start_time,' to ',t.end_time)
separator ' // ') meeting_details from
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') t left join
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') y
on t.start_time between y.start_time and y.end_time
group by start_time, end_time) z;
My question - is there anything wrong with this answer? Even if there's nothing wrong with this, can someone share a better answer?
Let's say you have a table called 'meeting' like this -
Then You can use this query to get the minimum number of meeting Rooms required to accommodate all Meetings.
select max(minimum_rooms_required)
from (select count(*) minimum_rooms_required
from meetings t
left join meetings y on t.start_time >= y.start_time and t.start_time < y.end_time group by t.id
) z;
This looks clearer and simple and works fine.
Meetings can "overlap". So, GROUP BY start_time, end_time can't figure this out.
Not every algorithm can be done in SQL. Or, at least, it may be grossly inefficient.
I would use a real programming language for the computation, leaving the database for what it is good at -- being a data repository.
Build a array of 1440 (minutes in a day) entries; initialize to 0.
Foreach meeting:
Foreach minute in the meeting (excluding last minute):
increment element in array.
Find the largest element in the array -- the number of rooms needed.
CREATE TABLE [dbo].[Meetings](
[id] [int] NOT NULL,
[Starttime] [time](7) NOT NULL,
[EndTime] [time](7) NOT NULL) ON [PRIMARY] )GO
sample data set:
INSERT INTO Meetings VALUES (1,'8:00','09:00')
INSERT INTO Meetings VALUES (2,'8:00','10:00')
INSERT INTO Meetings VALUES (3,'10:00','11:00')
INSERT INTO Meetings VALUES (4,'11:00','12:00')
INSERT INTO Meetings VALUES (5,'11:00','13:00')
INSERT INTO Meetings VALUES (6,'13:00','14:00')
INSERT INTO Meetings VALUES (7,'13:00','15:00')
To Find Minimum number of rooms required run the below query:
create table #TempMeeting
(
id int,Starttime time,EndTime time,MeetingRoomNo int,Rownumber int
)
insert into #TempMeeting select id, Starttime,EndTime,0 as MeetingRoomNo,ROW_NUMBER()
over (order by starttime asc) as Rownumber from Meetings
declare #RowCounter int
select top 1 #RowCounter=Rownumber from #TempMeeting order by Rownumber
WHILE #RowCounter<=(Select count(*) from #TempMeeting)
BEGIN
update #TempMeeting set MeetingRoomNo=1
where Rownumber=(select top 1 Rownumber from #TempMeeting where
Rownumber>#RowCounter and Starttime>=(select top 1 EndTime from #TempMeeting
where Rownumber=#RowCounter)and MeetingRoomNo=0)set #RowCounter=#RowCounter+1
END
select count(*) from #TempMeeting where MeetingRoomNo=0
Consider a table meetings with columns id, start_time and end_time. Then the following query should give correct answer.
with mod_meetings as (select id, to_timestamp(start_time, 'HH24:MI')::TIME as start_time,
to_timestamp(end_time, 'HH24:MI')::TIME as end_time from meetings)
select CASE when max(a_cnt)>1 then max(a_cnt)+1
when max(a_cnt)=1 and max(b_cnt)=1 then 2 else 1 end as rooms
from
(select count(*) as a_cnt, a.id, count(b.id) as b_cnt from mod_meetings a left join mod_meetings b
on a.start_time>b.start_time and a.start_time<b.end_time group by a.id) join_table;
Sample DATA:
DROP TABLE IF EXISTS meeting;
CREATE TABLE "meeting" (
"meeting_id" INTEGER NOT NULL UNIQUE,
"start_time" TEXT NOT NULL,
"end_time" TEXT NOT NULL,
PRIMARY KEY("meeting_id")
);
INSERT INTO meeting values (1,'08:00','14:00');
INSERT INTO meeting values (2,'09:00','10:30');
INSERT INTO meeting values (3,'11:00','12:00');
INSERT INTO meeting values (4,'12:00','13:00');
INSERT INTO meeting values (5,'10:15','11:00');
INSERT INTO meeting values (6,'12:00','13:00');
INSERT INTO meeting values (7,'10:00','10:30');
INSERT INTO meeting values (8,'11:00','13:00');
INSERT INTO meeting values (9,'11:00','14:00');
INSERT INTO meeting values (10,'12:00','14:00');
INSERT INTO meeting values (11,'10:00','14:00');
INSERT INTO meeting values (12,'12:00','14:00');
INSERT INTO meeting values (13,'10:00','14:00');
INSERT INTO meeting values (14,'13:00','14:00');
Solution:
DROP VIEW IF EXISTS Final;
CREATE VIEW Final AS SELECT time, group_concat(event), sum(num) num from (
select start_time time, 's' event, 1 num from meeting
union all
select end_time time, 'e' event, -1 num from meeting)
group by 1
order by 1;
select max(room) AS Min_Rooms_Required FROM (
select
a.time,
sum(b.num) as room
from
Final a
, Final b
where a.time >= b.time
group by a.time
order by a.time
);
Here's the explanation to gashu's nicely working code (or otherwise a non-code explanation of how to solve it with any language).
Firstly, if the variable 'minimum_rooms_required' would be renamed to 'overlap' it would make the whole thing much easier to understand. Because for each of the start or end times we want to know the numbers of overlapping ongoing meetings. When we found the maximum, this means there's no way of getting around with less than the overlapping amount, because well they overlap.
By the way, I think there might be a mistake in the code. It should check for t.start_time or t.end_time between y.start_time and y.end_time. Counterexample: meeting 1 starts at 8:00, ends at 11:00 and meeting 2 starts at 10:00, ends at 12:00.
(I'd post it as a comment to the gashu's answerbut I don't have enough reputation)
I'd go for Lead() analytic function
select
sum(needs_room_ind) as min_rooms
from (
select
id,
start_time,
end_time,
case when lead(start_time,1) over (order by start_time asc) between start_time
and end_time then 1 else 0 end as needs_room_ind
from
meetings
) a
IMO, I wanna to take the difference between how many meeting are started and ended at the same time when each meeting_id is started (assuming meeting starts and ends on time)
my code was just like this :
with alpha as
(
select a.meeting_id,a.start_time,
count(distinct b.meeting_id) ttl_meeting_start_before,
count(distinct c.meeting_id) ttl_meeting_end_before
from meeting a
left join
(
select meeting_id,start_time from meeting
) b
on a.start_time > b.start_time
left join
(
select meeting_id,end_time from meeting
) c
on a.start_time > c.end_time
group by a.meeting_id,a.start_time
)
select max(ttl_meeting_start_before-ttl_meeting_end_before) max_meeting_room
from alpha

Calculate Date by number of working days from a certain Startdate

I've got two tables, a project table and a calendar table. The first containts a startdate and days required. The calendar table contains the usual date information, like date, dayofweek, and a column is workingday, which shows if the day is a saturday, sunday, or bank holiday (value = 0) or a regular workday (value = 1).
For a certain report I need write a stored procedure that calculates the predicted enddate by adding the number of estimated workddays needed.
Example:
**Projects**
Name Start_Planned Work_days_Required
Project A 02.05.2016 6
Calendar (04.05 is a bank holdiday)
Day Weekday Workingday
01.05.2016 7 0
02.05.2016 1 1
03.05.2016 2 1
04.05.2016 3 0
05.05.2016 4 1
06.05.2016 5 1
07.05.2016 6 0
08.05.2016 7 0
09.05.2016 1 1
10.05.2016 2 1
Let's say, the estimated number of days required is given as 6 (which leads to the predicted enddate of 10.05.2016). Is it possible to join the tables in a way, which allows me to put something like
select date as enddate_predicted
from calendar
join projects
where number_of_days = 6
I would post some more code, but I'm quite stuck on how where to start.
Thanks!
You could get all working days after your first date, then apply ROW_NUMBER() to get the number of days for each date:
SELECT Date, DayNum = ROW_NUMBER() OVER(ORDER BY Date)
FROM Calendar
WHERE IsWorkingDay = 1
AND Date >= #StartPlanned
Then it would just be a case of filtering for the 6th day:
DECLARE #StartPlanned DATE = '20160502',
#Days INT = 6;
SELECT Date
FROM ( SELECT Date, DayNum = ROW_NUMBER() OVER(ORDER BY Date)
FROM Calendar
WHERE WorkingDay = 1
AND Date >= #StartPlanned
) AS c
WHERE c.DayNum = #Days;
It's not part of the question, but for future proofing this is easier to acheive in SQL Server 2012+ with OFFSET/FETCH
DECLARE #StartPlanned DATE = '20160502',
#Days INT = 6;
SELECT Date
FROM dbo.Calendar
WHERE Date >= #StartPlanned
AND WorkingDay = 1
ORDER BY Date
OFFSET (#Days - 1) ROWS FETCH NEXT 1 ROWS ONLY
ADDENDUM
I missed the part earlier about having another table, and the comment about putting it into a cursor has prompted me to amend my answer. I would add a new column to your calendar table called WorkingDayRank:
ALTER TABLE dbo.Calendar ADD WorkingDayRank INT NULL;
GO
UPDATE c
SET WorkingDayRank = wdr
FROM ( SELECT Date, wdr = ROW_NUMBER() OVER(ORDER BY Date)
FROM dbo.Calendar
WHERE WorkingDay = 1
) AS c;
This can be done on the fly, but you will get better performance with it stored as a value, then your query becomes:
SELECT p.Name,
p.Start_Planned,
p.Work_days_Required,
EndDate = c2.Date
FROM Projects AS P
INNER JOIN dbo.Calendar AS c1
ON c1.Date = p.Start_Planned
INNER JOIN dbo.Calendar AS c2
ON c2.WorkingDayRank = c1.WorkingDayRank + p.Work_days_Required - 1;
This simply gets the working day rank of your start date, and finds the number of days ahead specified by the project by joining on WorkingDayRank (-1 because you want the end date inclusive of the range)
This will fail, if you ever plan to start your project on a non working day though, so a more robust solution might be:
SELECT p.Name,
p.Start_Planned,
p.Work_days_Required,
EndDate = c2.Date
FROM Projects AS P
CROSS APPLY
( SELECT TOP 1 c1.Date, c1.WorkingDayRank
FROM dbo.Calendar AS c1
WHERE c1.Date >= p.Start_Planned
AND c1.WorkingDay = 1
ORDER BY c1.Date
) AS c1
INNER JOIN dbo.Calendar AS c2
ON c2.WorkingDayRank = c1.WorkingDayRank + p.Work_days_Required - 1;
This uses CROSS APPLY to get the next working day on or after your project start date, then applies the same join as before.
This query returns a table with a predicted enddate for each project
select name,min(day) as predicted_enddate from (
select c.day,p.name from dbo.Calendar c
join dbo.Calendar c2 on c.day>=c2.day
join dbo.Projects p on p.start_planned<=c.day and p.start_planned<=c2.day
group by c.day,p.work_days_required,p.name
having sum(c2.workingday)=p.work_days_required
) a
group by name
--This gives me info about all projects
select p.projectname,p.Start_Planned ,c.date,
from calendar c
join
projects o
on c.date=dateadd(days,p.Work_days_Required,p.Start_Planned)
and c.isworkingday=1
now you can use CTE like below or wrap this in a procedure
;with cte
as
(
Select
p.projectnam
p.Start_Planned ,
c.date,datediff(days,p.Start_Planned,c.date) as nooffdays
from calendar c
join
projects o
on c.date=dateadd(days,p.Work_days_Required,p.Start_Planned)
and c.isworkingday=1
)
select * from cte where nooffdays=6
use below logic
CREATE TABLE #proj(Name varchar(50),Start_Planned date,
Work_days_Required int)
insert into #proj
values('Project A','02.05.2016',6)
CReATE TABLE #Calendar(Day date,Weekday int,Workingday bit)
insert into #Calendar
values('01.05.2016',7,0),
('02.05.2016',1,1),
('03.05.2016',2,1),
('04.05.2016',3,0),
('05.05.2016',4,1),
('06.05.2016',5,1),
('07.05.2016',6,0),
('08.05.2016',7,0),
('09.05.2016',1,1),
('10.05.2016',2,1)
DECLARE #req_day int = 3
DECLARE #date date = '02.05.2016'
--SELECT #req_day = Work_days_Required FROM #proj where Start_Planned = #date
select *,row_number() over(order by [day] desc) as cnt
from #Calendar
where Workingday = 1
and [Day] > #date
SELECT *
FROM
(
select *,row_number() over(order by [day] desc) as cnt
from #Calendar
where Workingday = 1
and [Day] > #date
)a
where cnt = #req_day

MySQL - Group By Year using 2 columns & Count

I have a table like this:
ID_____StartDate_____EndDate
----------------------------
1______05/01/2012___02/03/2013
2______06/30/2013___07/12/2013
3______02/17/2010___02/17/2013
4______12/10/2012___11/16/2013
I'm trying to get a count of the ID's that were active during each year. If the ID was active for multiple years, it would be counted multiple times. I don't want to "hardcode" years into my query because the data is over many many multiple years. (i.e. can't use CASE YEAR(StartDate) WHEN x then y or IF...
Desired Result from the table above:
YEAR_____COUNT
2010_____1
2011_____1
2012_____3
2013_____4
I've tried:
SELECT COUNT(ID)
FROM table
WHERE (DATE_FORMAT(StartDate, '%Y-%m') BETWEEN '2013-01' AND '2013-12'
OR DATE_FORMAT(EndDate, '%Y-%m') BETWEEN '2013-01' AND '2013-12')
of course this only is for the year 2013. I also tried:
SELECT YEAR(StartDate) AS 'Start Year', YEAR(EndDate) AS 'End Year', COUNT(id)
FROM table
WHERE StartDate IS NOT NULL
GROUP BY YEAR(StartDate);
though this gave me just those that started in a given year.
Assuming that there is an auxiliary table that contains consecutive numbers from 1 .. to X (where X must be grather than possible number of years in the table):
create table series( x int primary key auto_increment );
insert into series( x )
select null from information_schema.tables;
then the query might look like:
SELECT years.year, count(*)
FROM (
SELECT mm.min_year + s.x - 1 as year
FROM (
SELECT min( year( start_date )) min_year,
max( year( end_date )) max_year
FROM tab
) mm
JOIN series s
ON s.x <= mm.max_year - mm.min_year + 1
GROUP BY mm.min_year + s.x - 1
) years
JOIN tab
ON years.year between year( tab.start_date )
and year( tab.end_date )
GROUP BY years.year
;
see a demo: http://www.sqlfiddle.com/#!2/f49ab/14