Never decreasing cumulative figure - mysql

CREATE TABLE `test` (
`UniqueID` INT(11) NOT NULL AUTO_INCREMENT,
`Date` date,
`Entry` VARCHAR(20),
PRIMARY KEY (`UniqueID`)
);
INSERT INTO `test` (Date,Entry) VALUES
('2015-09-01','text1'),
('2015-09-01','text1'),
('2015-09-01','text1'),
('2015-09-02','text2'),
('2015-09-02','text2'),
('2015-09-02','text2'),
('2015-09-02','text2'),
('2015-09-03','text3'),
('2015-09-03','text3'),
('2015-09-03','text3'),
('2015-09-04','text4'),
('2015-09-04','text4'),
('2015-09-04','text4'),
('2015-09-04','text4'),
('2015-09-04','text4'),
('2015-09-04','text4');
SET @total := 0;
SET @prevCount := 0;
SELECT
@total := IF(@prevCount <= COUNT(Entry), @total + (COUNT(Entry) - @prevCount), @total) AS total,
@prevCount := COUNT(Entry) AS dayTotal,
`Entry`,
`Date`
FROM test
GROUP BY `Date`
ORDER BY `Date` ASC
| total | dayTotal | Entry | Date |
|-------|----------|-------|-----------------------------|
| 3 | 3 | text1 | September, 01 2015 00:00:00 |
| 4 | 4 | text2 | September, 02 2015 00:00:00 |
| 3 | 3 | text3 | September, 03 2015 00:00:00 |
| 6 | 6 | text4 | September, 04 2015 00:00:00 |
fiddle of same: http://sqlfiddle.com/#!9/d9031/2
I need the total figure to never decrease because it is a cumulative figure over time.
My problem seems to be that MySQL doesn't keep @prevCount from one row to the next, so I can't use it to calculate the total.
What I expect to see is that total will show
3
4
4
7
Note that the 7 is correct because it is the 4 plus the 3 new entries on the 4th.

Doing calculations with variables is tricky. With group by, you need to use a subquery.
Your logic doesn't make full sense to me. The closest reasonable thing I can think of is a cumulative max:
SELECT @max := if(@max > dayTotal, @max, dayTotal)
FROM (SELECT `Date`, COUNT(*) as dayTotal
FROM test
GROUP BY `Date`
) t CROSS JOIN
(SELECT @max := 0) params
ORDER BY `Date` ASC;
Note: I removed Entry because it is not in the GROUP BY.
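One possible reading of the expected sequence (3, 4, 4, 7) is that total should grow only by the positive day-over-day change in dayTotal. Below is a minimal sketch of that reading, using window functions (LAG and a running SUM) instead of user variables. It runs on Python's built-in sqlite3 module purely so that it is self-contained; the CTE names daily and deltas are made up for illustration, and the SQL is expected, but not verified here, to carry over to MySQL 8+ with little change.

```python
import sqlite3

# In-memory stand-in for the `test` table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (UniqueID INTEGER PRIMARY KEY AUTOINCREMENT, Date TEXT, Entry TEXT)"
)
rows = (
    [("2015-09-01", "text1")] * 3
    + [("2015-09-02", "text2")] * 4
    + [("2015-09-03", "text3")] * 3
    + [("2015-09-04", "text4")] * 6
)
conn.executemany("INSERT INTO test (Date, Entry) VALUES (?, ?)", rows)

query = """
WITH daily AS (
    -- one row per day with that day's entry count
    SELECT Date, COUNT(*) AS dayTotal
    FROM test
    GROUP BY Date
),
deltas AS (
    -- change versus the previous day (the first day is compared with 0)
    SELECT Date, dayTotal,
           dayTotal - LAG(dayTotal, 1, 0) OVER (ORDER BY Date) AS delta
    FROM daily
)
SELECT Date, dayTotal,
       -- running sum of the increases only, so the total never decreases
       SUM(CASE WHEN delta > 0 THEN delta ELSE 0 END) OVER (ORDER BY Date) AS total
FROM deltas
ORDER BY Date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With the sample data from the question, the total column comes out as 3, 4, 4, 7, whereas a cumulative max would end at 6.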

Related

Calculating average time between dates in SQL

Using MySQL, I'm trying to figure out how to answer the question: What is the average number of months between users creating their Nth project?
Expected result:
| project count | Average # months |
| 1 | 0 | # On average, it took 0 months to create the first project (nothing to compare to)
| 2 | 12 | # On average, it takes a user 12 months to create their second project
| 3 | 3 | # On average, it takes a user 3 months to create their third project
My MySQL table represents projects created by users. The table can be summarized as:
| user_id | project created at |
|---------|--------------------|
| 1 | Jan 1, 2020 1:00 pm|
| 1 | Feb 2, 2020 3:45 am|
| 1 | Nov 6, 2020 0:01 am|
| 1 | Mar 4, 2021 5:01 pm|
|------------------------------|
| 2 | Another timestamp |
| 2 | Another timestamp |
| 2 | Another timestamp |
| 2 | Another timestamp |
| 2 | Another timestamp |
| 2 | Another timestamp |
|------------------------------|
| ... | Another timestamp |
| ... | Another timestamp |
Some users will have one project while some may have hundreds.
Edit: Current Implementation
with
paid_self_serve_projects_presentation as (
select
`Paid Projects`.owner_email
`Owner Email`,
row_number() over (partition by `Paid Projects`.owner_uuid order by created_at)
`Project Count`,
day(`Paid Projects`.created_at)
`Created Day`,
month(`Paid Projects`.created_at)
`Created Month`,
year(`Paid Projects`.created_at)
`Created Year`,
`Paid Projects`.created_at
`Created`
from self_service_paid_projects as `Paid Projects`
order by `Paid Projects`.owner_uuid, `Paid Projects`.created_at
)
select `Projects`.* from paid_self_serve_projects_presentation as `Projects`
You can use window functions. I am thinking row_number() to enumerate the projects of each user ordered by creation date, and lag() to get the date when the previous project was created:
select rn, avg(datediff(created_at, lag_created_at)) avg_diff_days
from (
select t.*,
row_number() over(partition by user_id order by created_at) rn,
lag(created_at, 1, created_at) over(partition by user_id order by created_at) lag_created_at
from mytable t
) t
group by rn
This gives you the average difference in days, which is somewhat more accurate than months. If you really want months, then use timestampdiff(month, lag_created_at, created_at) instead of datediff(), but be aware that the function returns an integer value, hence there is a loss of precision.
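To see what the row_number() plus lag() approach returns, here is a small runnable sketch. It uses Python's built-in sqlite3 module so that it is self-contained, which means julianday() subtraction stands in for MySQL's datediff(); the sample rows and the subquery alias numbered are hypothetical and not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (user_id INTEGER, created_at TEXT)")
# Hypothetical sample data: user 1 has three projects, user 2 has two.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        (1, "2020-01-01"), (1, "2020-01-11"), (1, "2020-01-31"),
        (2, "2020-02-01"), (2, "2020-02-21"),
    ],
)

query = """
SELECT rn,
       -- julianday() difference is the SQLite stand-in for DATEDIFF()
       AVG(julianday(created_at) - julianday(lag_created_at)) AS avg_diff_days
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS rn,
           -- default to the row's own date so the first project has a gap of 0
           LAG(created_at, 1, created_at)
               OVER (PARTITION BY user_id ORDER BY created_at) AS lag_created_at
    FROM mytable AS t
) AS numbered
GROUP BY rn
ORDER BY rn
"""
avg_gaps = conn.execute(query).fetchall()
for rn, avg_days in avg_gaps:
    print(rn, avg_days)
```

For this made-up data the average gaps are 0 days for the first project, 15 days for the second (gaps of 10 and 20 days), and 20 days for the third.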

Get the count() where created_date is cumulative and date based

I'm aware that there are several answers on SO about cumulative totals. I have experimented and have not found a solution to my problem.
Here is a sqlfiddle.
We have a contacts table with two fields, eid and create_time:
eid create_time
991772 April, 21 2016 11:34:21
989628 April, 17 2016 02:19:57
985557 April, 04 2016 09:56:39
981920 March, 30 2016 11:03:12
981111 March, 30 2016 09:36:48
I would like to select the number of new contacts in each month along with the size of our contacts database at the end of each month. New contacts by year and month is simple enough. For the size of the contacts table at the end of each month, I did some research and found what looked to be a straightforward method:
set @csum = 0;
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts,
(@csum + count(c.eid)) as cumulative_contacts
from
contacts c
group by
yr,
mth
That runs but gives me unexpected results.
If I run:
select count(*) from contacts where date(create_time) < current_date
I get the total number of records in the table, 146.
I therefore expected the final row in my query using @csum to have 146 for April 2016. It has only 3?
What my goal is for field cumulative_contacts:
For the record with e.g. January 2016.
select count(*) from contacts where date(create_time) < '2016-02-01';
And the record for February would have:
select count(*) from contacts where date(create_time) < '2016-03-01';
And so on
Try this, a bit of modification from your sql;)
CREATE TABLE IF NOT EXISTS `contacts` (
`eid` char(50) DEFAULT NULL,
`create_time` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
INSERT INTO `contacts` (`eid`, `create_time`) VALUES
('991772', '2016-04-21 11:34:21'),
('989628', '2016-04-17 02:19:57'),
('985557', '2016-04-04 09:56:39'),
('981920', '2016-03-30 11:03:12'),
('981111', '2016-03-30 09:36:48');
SET @csum = 0;
SELECT t.*, @csum:=(@csum + new_contacts) AS cumulative_contacts
FROM (
SELECT YEAR(c.create_time) AS yr, MONTH(c.create_time) AS mth, COUNT(c.eid) AS new_contacts
FROM contacts c
GROUP BY yr, mth) t
The output is:
| yr | mth | new_contacts | cumulative_contacts |
|------|-----|--------------|---------------------|
| 2016 | 3 | 2 | 2 |
| 2016 | 4 | 3 | 5 |
This sql will get the cumulative sum and is pretty efficient. It numbers each row first and then uses that as the cumulative sum.
SELECT s1.yr, s1.mth, s1.new_contacts, s2.cumulative_contacts
FROM
(SELECT
YEAR(create_time) AS yr,
MONTH(create_time) AS mth,
COUNT(eid) AS new_contacts,
MAX(eid) AS max_eid
FROM
contacts
GROUP BY
yr,
mth
ORDER BY create_time) s1 INNER JOIN
(SELECT eid, (@sum:=@sum+1) AS cumulative_contacts
FROM
contacts INNER JOIN
(SELECT @sum := 0) r
ORDER BY create_time) s2 ON max_eid=s2.eid;
--Result sample--
| yr | mth | new_contacts | cumulative_contacts |
|------|-----|--------------|---------------------|
| 2016 | 1 | 4 | 132 |
| 2016 | 2 | 4 | 136 |
| 2016 | 3 | 7 | 143 |
| 2016 | 4 | 3 | 146 |
Try this: fiddle
Here you have a "greater than or equal" join, so each group "contains" all previous values. The "times 12" part converts the whole comparison to months. I offer this solution because it is not MySQL-dependent (it can be implemented on many other DBs with minimal or no changes).
select dates.yr, dates.mth, dates.new_contacts, sum(NC.new_contacts) as cumulative_new_contacts
from (
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts
from
contacts c
group by
year(c.create_time),
month(c.create_time)
) as dates
left join
(
select
year(c.create_time) as yr,
month(c.create_time) as mth,
count(c.eid) as new_contacts
from
contacts c
group by
year(c.create_time),
month(c.create_time)
) as NC
on dates.yr*12+dates.mth >= NC.yr*12+NC.mth
group by
dates.yr,
dates.mth,
dates.new_contacts -- not needed by MySql, present here for other DBs compatibility
order by 1,2
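Another portable option is a running SUM() window function over the monthly counts. The sketch below is an illustration rather than one of the posted answers. It runs on Python's built-in sqlite3 module so it is self-contained, so strftime('%Y-%m', ...) stands in for MySQL's YEAR()/MONTH(), and the alias monthly is made up; the five rows are the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (eid TEXT, create_time TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [
        ("991772", "2016-04-21 11:34:21"),
        ("989628", "2016-04-17 02:19:57"),
        ("985557", "2016-04-04 09:56:39"),
        ("981920", "2016-03-30 11:03:12"),
        ("981111", "2016-03-30 09:36:48"),
    ],
)

query = """
SELECT ym,
       new_contacts,
       -- running total of the monthly counts, ordered by year-month
       SUM(new_contacts) OVER (ORDER BY ym) AS cumulative_contacts
FROM (
    -- aggregate first, then apply the window over the grouped rows
    SELECT strftime('%Y-%m', create_time) AS ym, COUNT(eid) AS new_contacts
    FROM contacts
    GROUP BY ym
) AS monthly
ORDER BY ym
"""
monthly_totals = conn.execute(query).fetchall()
for row in monthly_totals:
    print(row)
```

On the five sample rows this gives 2 new contacts and a cumulative 2 for March 2016, then 3 new contacts and a cumulative 5 for April 2016. Aggregating in the inner query first is what keeps the running total from being evaluated in an undefined order.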

MySQL calculate moving average of N rows

I'm trying to calculate the moving average of N rows, for all rows in a single query. In the example case, I am attempting to calculate the moving average of 50 rows.
SELECT
h1.date,
h1.security_id,
( SELECT
AVG(last50.close)
FROM (
SELECT h.close
FROM history as h
WHERE h.date <= h1.date AND h.security_id = h1.security_id
ORDER BY h.date DESC
LIMIT 50
) as last50
) as avg50
FROM history as h1
However, MySQL gives me an error when running this query:
Unknown column 'h1.date' in 'where clause'
I'm trying this method because the other solutions listed don't really seem to work for my use case. There are solutions for a moving average of N days, but since all dates are not accounted for in my data set, I need the average of N rows.
This solution, shown below, doesn't work because AVG (also SUM and COUNT) doesn't account for LIMIT:
SELECT
t1.data_date,
( SELECT SUM(t2.price) / COUNT(t2.price)
FROM t as t2
WHERE t2.data_date <= t1.data_date
ORDER BY t2.data_date DESC
LIMIT 5
) AS 'five_row_moving_average_price'
FROM t AS t1
ORDER BY t1.data_date;
This question looks promising, but is somewhat indecipherable to me.
Any suggestions? Here's an SQLFiddle to play around in.
plan
self join history on last 50 days
take average grouping by date and security id ( of current )
query
select curr.date, curr.security_id, avg(prev.close)
from history curr
inner join history prev
on prev.`date` between date_sub(curr.`date`, interval 49 day) and curr.`date`
and curr.security_id = prev.security_id
group by 1, 2
order by 2, 1
;
output
+---------------------------+-------------+--------------------+
| date | security_id | avg(prev.close) |
+---------------------------+-------------+--------------------+
| January, 04 2016 00:00:00 | 1 | 10.770000457763672 |
| January, 05 2016 00:00:00 | 1 | 10.800000190734863 |
| January, 06 2016 00:00:00 | 1 | 10.673333485921225 |
| January, 07 2016 00:00:00 | 1 | 10.59250020980835 |
| January, 08 2016 00:00:00 | 1 | 10.432000160217285 |
| January, 11 2016 00:00:00 | 1 | 10.40166680018107 |
| January, 12 2016 00:00:00 | 1 | 10.344285828726631 |
| January, 13 2016 00:00:00 | 1 | 10.297500133514404 |
| January, 14 2016 00:00:00 | 1 | 10.2877779006958 |
| January, 04 2016 00:00:00 | 2 | 56.15999984741211 |
| January, 05 2016 00:00:00 | 2 | 56.18499946594238 |
| .. | .. | .. |
+---------------------------+-------------+--------------------+
sqlfiddle
reference
sql rolling averages
modified to use last 50 rows
select
rnk_curr.`date`, rnk_curr.security_id, avg(rnk_prev50.close)
from
(
select `date`, security_id,
@row_num := if(@lag = security_id, @row_num + 1,
if(@lag := security_id, 1, 1)) as row_num
from history
cross join ( select @row_num := 1, @lag := null ) params
order by security_id, `date`
) rnk_curr
inner join
(
select date, security_id, close,
@row_num := if(@lag = security_id, @row_num + 1,
if(@lag := security_id, 1, 1)) as row_num
from history
cross join ( select @row_num := 1, @lag := null ) params
order by security_id, `date`
) rnk_prev50
on rnk_curr.security_id = rnk_prev50.security_id
and rnk_prev50.row_num between rnk_curr.row_num - 49 and rnk_curr.row_num
group by 1,2
order by 2,1
;
sqlfiddle
note
the if function is to force the correct order of evaluation of variables.
In MySQL 8, a window function frame can be used to obtain the averages.
SELECT date, security_id, AVG(close) OVER (PARTITION BY security_id ORDER BY date ROWS 49 PRECEDING) as ma
FROM history
ORDER BY date DESC
This calculates the average of the current row and 49 preceding rows.
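A small runnable sketch of the same frame idea, shrunk to a 3-row window so the numbers are easy to check by hand. It runs on Python's built-in sqlite3 module so that it is self-contained; the closing prices are hypothetical and the column alias ma3 is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (date TEXT, security_id INTEGER, close REAL)")
# Hypothetical prices; note that the dates need not be consecutive.
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [
        ("2016-01-04", 1, 10.0),
        ("2016-01-05", 1, 20.0),
        ("2016-01-08", 1, 30.0),
        ("2016-01-11", 1, 40.0),
        ("2016-01-04", 2, 100.0),
        ("2016-01-05", 2, 200.0),
    ],
)

query = """
SELECT date, security_id,
       -- average of the current row and the 2 rows before it, per security
       AVG(close) OVER (
           PARTITION BY security_id
           ORDER BY date
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS ma3
FROM history
ORDER BY security_id, date
"""
moving_avg = conn.execute(query).fetchall()
for row in moving_avg:
    print(row)
```

For security 1 the averages are 10, 15, 20, 30: the first two rows average over fewer than three rows because the frame is simply truncated at the start of the partition. Gaps in the dates make no difference, since the frame counts rows rather than days.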

Obtain a list of month and records from the current date(month) in two different years

I have such a set of records in a table in a mysql database;
Date | Number_of_leaves
10th-December-2015 | 10 leaves
6th-August-2015 | 10 leaves
15th-September-2015 | 14 leaves
15th-January-2016: | 100 leaves
7th-November-2015: | 4 leaves
9th-October -2015: | 200 leaves
How can I return a list of months and their records for just the past 4 months from Jan-2016 backwards? In other words, I need a result for the past 4 months including the current one, like this:
January 2016 | 100 leaves
December 2015 | 10 leaves
November 2015 | 4 leaves
October 2015 | 200 leaves
The above is the kind of result which shows the month and the corresponding year with the number of leaves collected in that month and corresponding year
Schema
create table xyz
( id int auto_increment primary key,
theDate date not null,
leaves int not null
);
-- truncate table xyz;
insert xyz(theDate,leaves) values
('2016-04-10',444510),
('2016-02-10',55510),
('2015-12-10',10),
('2015-08-06',10),
('2015-09-15',14),
('2016-01-15',100),
('2015-11-07',4),
('2015-10-09',200);
Query 1
select month(theDate) as m,
year(theDate) as y,
sum(leaves) as leaves
from xyz
where theDate<='2016-02-01'
group by month(theDate),year(theDate)
order by y desc, m desc
limit 4;
or
Query 2
select concat(monthname(theDate),' ',year(theDate)) as 'Month/Year',
sum(leaves) as leaves
from xyz
where theDate<='2016-02-01'
group by month(theDate),year(theDate)
order by year(theDate) desc, month(theDate) desc
limit 4;
+---------------+--------+
| Month/Year | leaves |
+---------------+--------+
| January 2016 | 100 |
| December 2015 | 10 |
| November 2015 | 4 |
| October 2015 | 200 |
+---------------+--------+
Here op is your table name. First use str_to_date to convert the string to a date; the nested IFNULL calls are needed because your date formats differ from row to row.
select * FROM (
SELECT *,
IFNULL(
IFNULL(
str_to_date(Date,'%D-%b-%Y'),str_to_date(Date,'%d-%M-%Y')) ,
str_to_date(Date,'%D-%M-%Y')
)
f_date
FROM `op`
order by number_of_leaves DESC,f_date ASC
) tab
group by month(tab.f_date) LIMIT 5
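Once the dates are stored as real DATE values (as in the xyz schema above), the "last four months up to January 2016" result can be sketched as a single grouped query. This version runs on Python's built-in sqlite3 module so it is self-contained, so strftime('%Y-%m', ...) stands in for MySQL's year()/month(); the cut-off date is hard-coded as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xyz (id INTEGER PRIMARY KEY AUTOINCREMENT, theDate TEXT NOT NULL, leaves INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO xyz (theDate, leaves) VALUES (?, ?)",
    [
        ("2016-04-10", 444510), ("2016-02-10", 55510), ("2015-12-10", 10),
        ("2015-08-06", 10), ("2015-09-15", 14), ("2016-01-15", 100),
        ("2015-11-07", 4), ("2015-10-09", 200),
    ],
)

query = """
SELECT strftime('%Y-%m', theDate) AS ym, SUM(leaves) AS leaves
FROM xyz
WHERE theDate < '2016-02-01'   -- everything up to the end of January 2016
GROUP BY ym
ORDER BY ym DESC               -- newest month first
LIMIT 4
"""
last_four = conn.execute(query).fetchall()
for row in last_four:
    print(row)
```

Ordering by the grouped year-month key, rather than by the raw date column, keeps the sort well defined for each group.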

Null Values during Query

I have the below table, pretty simple.
==========================================================================
attendanceID | agentID | incurredDate | points | comment
==========================================================================
10 | vimunson | 2013-07-22 | 2 | Some Text
11 | vimunson | 2013-07-29 | 2 | Some Text
12 | vimunson | 2013-12-06 | 1 | Some Text
This is the query:
SELECT
attendanceID,
agentID,
incurredDate,
leadDate,
points,
@1F:=IF(incurredDate <= curdate() - 90
AND leadDate = NULL,
points - 1,
IF(DATEDIFF(leadDate, incurredDate) > 90,
points - 1,
points)) AS '1stFallOff',
@2F:=IF(incurredDate <= curdate() - 180
AND leadDate = NULL,
points - 2,
IF(DATEDIFF(leadDate, incurredDate) > 180,
points - 2,
@1F)) AS '2ndFallOff',
IF(@total < 0, 0, @total:=@total + @2F) AS Total,
comment,
linked
FROM
(SELECT
attendanceID,
mo.agentID,
@r AS leadDate,
(@r:=incurredDate) AS incurredDate,
comment,
points,
linked
FROM
(SELECT
m . *
FROM
(SELECT @_date = NULL, @total:=0) varaible, attendance m
ORDER by agentID , incurredDate desc) mo
where
agentID = 'vimunson'
AND (case
WHEN @_date is NULL or @_date <> incurredDate THEN @r:=NULL
ELSE NULL
END IS NULL)
AND (@_date:=incurredDate) IS NOT NULL) T
ORDER BY agentID , incurredDate
When I run the query it returns the below:
========================================================================================================================================
attendanceID | agentID | incurredDate | leadDate | points | 1stFallOff | 2ndFallOff | Total | comment
========================================================================================================================================
10 | vimunson | 2013-07-22 | NULL | 2 | 2 | 2 | 2 | Some Text
11 | vimunson | 2013-07-29 | NULL | 2 | 2 | 2 | 4 | Some Text
12 | vimunson | 2013-12-06 | NULL | 1 | 2 | 1 | 5 | Some Text
I cannot figure out why the leadDate column is NULL. I have narrowed it down to the user session: if I run the query again in the same session, it comes back with what I want.
The way variables @r and @_date are passed around relies on a specific order in which certain parts of the query are evaluated. That's a risky assumption to make in a query language that is declarative rather than imperative. The more sophisticated a query optimizer is, the more unpredictable the behaviour of this query will be. A 'simple' engine might follow your intentions, another engine might adapt its behaviour as you go, for example because it uses temporary indexes to improve query performance.
In situations where you need to pass values from one row to another, it would be better to use a cursor.
http://dev.mysql.com/doc/refman/5.0/en/cursors.html
EDIT: sample code below.
I focused on column 'leadDate'; implementation of the falloff and total columns should be similar.
CREATE PROCEDURE MyProc()
BEGIN
DECLARE done int DEFAULT FALSE;
DECLARE currentAttendanceID int;
DECLARE currentAgentID, previousAgentID varchar(8);
DECLARE currentIncurredDate date;
DECLARE currentLeadDate date;
DECLARE currentPoints int;
DECLARE currentComment varchar(9);
DECLARE myCursor CURSOR FOR
SELECT attendanceID, agentID, incurredDate, points, comment
FROM attendance
ORDER BY agentID, incurredDate DESC;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
CREATE TEMPORARY TABLE myTemp (
attendanceID int,
agentID varchar(8),
incurredDate date,
leadDate date,
points int,
comment varchar(9)
) ENGINE=MEMORY;
OPEN myCursor;
SET previousAgentID := NULL;
read_loop: LOOP
FETCH myCursor INTO currentAttendanceID, currentAgentID, currentIncurredDate, currentPoints, currentComment;
IF done THEN
LEAVE read_loop;
END IF;
IF previousAgentID IS NULL OR previousAgentID <> currentAgentID THEN
SET currentLeadDate := NULL;
SET previousAgentID := currentAgentID;
END IF;
INSERT INTO myTemp VALUES (currentAttendanceID, currentAgentID, currentIncurredDate, currentLeadDate, currentPoints, currentComment);
SET currentLeadDate := currentIncurredDate;
END LOOP;
CLOSE myCursor;
SELECT *
FROM myTemp
ORDER BY agentID, incurredDate;
DROP TABLE myTemp;
END
FYC: http://sqlfiddle.com/#!2/910a3/1/0
Try this:
SELECT a.attendanceID, a.agentID, a.incurredDate, a.leadDate, a.points,
@1F:= (CASE WHEN incurredDate <= CURDATE()-90 AND leadDate=NULL THEN points-1
WHEN DATEDIFF(leadDate, incurredDate) > 90 THEN points-1
ELSE points
END) AS '1stFallOff',
@2F:= (CASE WHEN incurredDate <= CURDATE()-180 AND leadDate=NULL THEN points-2
WHEN DATEDIFF(leadDate, incurredDate) > 180 THEN points-2
ELSE @1F
END) AS '2ndFallOff',
(@Total:=@Total + a.points) AS Total, a.comment
FROM (SELECT a.attendanceID, a.agentID, a.incurredDate, b.incurredDate leadDate, a.points, a.comment
FROM attendance a
LEFT JOIN attendance b ON a.agentID = b.agentID AND a.incurredDate < b.incurredDate
GROUP BY a.agentID, a.incurredDate
) AS A, (SELECT @Total:=0, @1F:=0, @2F:=0) B;
Check the SQL FIDDLE DEMO
OUTPUT
| ATTENDANCEID | AGENTID | INCURREDDATE | LEADDATE | POINTS | 1STFALLOFF | 2NDFALLOFF | TOTAL | COMMENT |
|--------------|----------|---------------------------------|---------------------------------|--------|------------|------------|-------|-----------|
| 10 | vimunson | July, 22 2013 00:00:00+0000 | July, 29 2013 00:00:00+0000 | 2 | 2 | 2 | 2 | Some Text |
| 11 | vimunson | July, 29 2013 00:00:00+0000 | December, 06 2013 00:00:00+0000 | 2 | 1 | 1 | 4 | Some Text |
| 12 | vimunson | December, 06 2013 00:00:00+0000 | (null) | 1 | 1 | 1 | 5 | Some Text |
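The leadDate column in the output above (the next incurredDate for the same agent) can also be produced without session variables or a self-join, using the LEAD() window function. This is a minimal sketch, not one of the posted answers; it runs on Python's built-in sqlite3 module so that it is self-contained, and LEAD() is also available in MySQL 8+. Only the leadDate column is covered; the falloff and total logic is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendance ("
    "attendanceID INTEGER, agentID TEXT, incurredDate TEXT, points INTEGER, comment TEXT)"
)
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?, ?, ?, ?)",
    [
        (10, "vimunson", "2013-07-22", 2, "Some Text"),
        (11, "vimunson", "2013-07-29", 2, "Some Text"),
        (12, "vimunson", "2013-12-06", 1, "Some Text"),
    ],
)

query = """
SELECT attendanceID, agentID, incurredDate,
       -- next incurredDate for the same agent; NULL on the agent's last row
       LEAD(incurredDate) OVER (PARTITION BY agentID ORDER BY incurredDate) AS leadDate,
       points
FROM attendance
ORDER BY agentID, incurredDate
"""
with_lead = conn.execute(query).fetchall()
for row in with_lead:
    print(row)
```

Because the window's ORDER BY is part of the query itself, the result does not depend on the order in which the engine happens to evaluate rows, or on values left over from an earlier statement in the same session.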
After reviewing multiple results I was able to come up with something that is what I expect. I have entered more data and the below syntax. I liked the idea of the cursor but it was not ideal for my use, so I did not use it. I did not want to use CASE or any JOINS since they can be complex.
http://sqlfiddle.com/#!2/2fb86/1
SELECT
attendanceID,
agentID,
incurredDate,
@ld:=(select
incurredDate
from
attendance
where
incurredDate > a.incurredDate
and agentID = a.agentID
order by incurredDate
limit 1) leadDate,
points,
@1F:=IF(incurredDate <= DATE_SUB(curdate(),
INTERVAL IF(incurredDate < '2013-12-02', 90, 60) DAY)
AND @ld <=> NULL,
points - 1,
IF(DATEDIFF(COALESCE(@ld, '1900-01-01'),
incurredDate) > IF(incurredDate < '2013-12-02', 90, 60),
points - 1,
points)) AS '1stFallOff',
@2F:=IF(incurredDate <= DATE_SUB(curdate(),
INTERVAL IF(incurredDate < '2013-12-02',
180,
120) DAY)
AND getLeadDate(incurredDate, agentID) <=> NULL,
points - 1,
IF(DATEDIFF(COALESCE(@ld, '1900-01-01'),
incurredDate) > IF(incurredDate < '2013-12-02',
180,
120),
points - 2,
@1F)) AS '2ndFallOff',
IF((@total + @2F) < 0,
0,
IF(DATE_ADD(incurredDate, INTERVAL 365 DAY) <= CURDATE(),
@total:=0,
@total:=@total + @2F)) AS Total,
comment,
linked,
DATE_ADD(incurredDate, INTERVAL 365 DAY) AS 'fallOffDate'
FROM
(SELECT @total:=0) v,
attendance a
WHERE
agentID = 'vimunson'
GROUP BY agentID , incurredDate