Using MySQL, I'm trying to get the number of active users I have in any given month. I have a table with ActivationDate and TerminationDate columns, and if the month being counted is after the ActivationDate and TerminationDate is null, then the user is active and should be counted. I would like to summarize these amounts by month. I'm thinking I could just sum each side and calculate the total but breaking that down won't give me a running total. I've tried with window functions, but I don't have enough experience with them to know exactly what I'm doing wrong and I'm not certain how to ask the right question.
So for instance, if I have the following data...
UserId ActivationDate TerminationDate
1 2020-01-01 null
2 2020-01-15 null
3 2020-01-20 2020-01-30
4 2020-02-01 null
5 2020-02-14 2020-02-27
6 2020-02-15 2020-02-28
7 2020-03-02 null
8 2020-03-05 null
9 2020-03-20 2020-03-21
I would like my results to be similar to:
2020-01 2 (there are 2 active users, since one signed up but cancelled before the end of the month)
2020-02 3 (2 from the previous month, plus 1 that signed up this month and is still active)
2020-03 5 (3 from previous, 2 new, 1 cancellation)
You can unpivot, then aggregate and sum. In MySQL 8.0.14 or higher, you can use a lateral join:
select date_format(x.dt, '%Y-%m-01') as dt_month,
sum(sum(cnt)) over(order by date_format(x.dt, '%Y-%m-01')) as cnt_active_users
from mytable t
cross join lateral (
select t.activationdate as dt, 1 as cnt
union all select t.terminationdate, -1
) x
where x.dt is not null
group by dt_month
order by dt_month
In earlier 8.x versions:
select date_format(x.dt, '%Y-%m-01') as dt_month,
sum(sum(cnt)) over(order by date_format(x.dt, '%Y-%m-01')) as cnt_active_users
from (
select activationdate as dt, 1 as cnt from mytable
union all select terminationdate, -1 from mytable
) x
where x.dt is not null
group by dt_month
order by dt_month
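As a cross-check of the unpivot-then-running-sum idea, here is a minimal Python sketch that applies the same +1/-1 logic to the sample rows from the question. It is only an illustration of the technique, not a replacement for the query; the variable names (`rows`, `delta`, `running`) are made up for this example.

```python
from collections import Counter
from itertools import accumulate

# Sample rows from the question: (UserId, ActivationDate, TerminationDate or None)
rows = [
    (1, "2020-01-01", None),
    (2, "2020-01-15", None),
    (3, "2020-01-20", "2020-01-30"),
    (4, "2020-02-01", None),
    (5, "2020-02-14", "2020-02-27"),
    (6, "2020-02-15", "2020-02-28"),
    (7, "2020-03-02", None),
    (8, "2020-03-05", None),
    (9, "2020-03-20", "2020-03-21"),
]

# Unpivot: an activation adds +1 to its month, a termination adds -1 to its month.
delta = Counter()
for _user_id, activated, terminated in rows:
    delta[activated[:7]] += 1
    if terminated is not None:
        delta[terminated[:7]] -= 1

# Running total over the months in chronological order.
months = sorted(delta)
running = dict(zip(months, accumulate(delta[m] for m in months)))
print(running)  # {'2020-01': 2, '2020-02': 3, '2020-03': 5}
```

Like the query, this only produces rows for months that contain at least one activation or termination.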
You don't say what version of MySQL. If you're using 8.0, this should work:
create table userdates (
UserId int not null,
ActivationDate date not null,
TerminationDate date null
);
insert into userdates (UserId, ActivationDate, TerminationDate)
values
(1, cast("2020-01-01" as date), null )
, (2, cast("2020-01-15" as date), null )
, (3, cast("2020-01-20" as date), cast("2020-01-30" as date))
, (4, cast("2020-02-01" as date), null )
, (5, cast("2020-02-14" as date), cast("2020-02-27" as date))
, (6, cast("2020-02-15" as date), cast("2020-02-28" as date))
, (7, cast("2020-03-02" as date), null )
, (8, cast("2020-03-05" as date), null )
, (9, cast("2020-03-20" as date), cast("2020-03-21" as date))
, (10, cast("2020-07-20" as date), null)
, (11, cast("2019-09-12" as date), cast("2019-09-14" as date));
WITH RECURSIVE d (dt)
AS (
SELECT cast("2019-01-01" as date)
UNION ALL
SELECT date_add(dt, interval 1 month)
FROM d
WHERE dt < cast("2020-12-01" as date)
)
select d.dt
, count(distinct ud.UserId) as UserCount
from userdates ud
right outer join d on d.dt >= date_format(ud.ActivationDate, '%Y-%m-01')
and (d.dt <= ud.TerminationDate or ud.TerminationDate is null)
group by d.dt;
Related
I want to retrieve the sum of transactions for every date from the last 7 days from my MySQL database, but some dates don't have any transactions. How do I return a 0 for those days?
Here is the SQL query I've worked on and tried, but it only returns rows for the days that do have a value.
SELECT COUNT(transaction_id) AS orders, SUM(amount) AS sales, CAST(time AS DATE) AS time FROM tbltransactions WHERE time BETWEEN CAST(? AS DATE) AND CAST(? AS DATE) GROUP BY CAST(time AS DATE) ORDER BY time ASC
Try to generate the dates first, then join your data table:
SELECT COUNT(transaction_id) AS orders
, SUM(amount) AS sales
, CAST(dates.time AS DATE) AS time
FROM
(
SELECT DATE_SUB(CURDATE(), INTERVAL 7 DAY) + INTERVAL num DAY AS time
FROM
(
SELECT 1 AS num UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
) n
) dates
LEFT JOIN tbltransactions
ON dates.time = tbltransactions.time
GROUP BY CAST(dates.time AS DATE)
ORDER BY dates.time ASC
Here is the test data:
CREATE TABLE tbltransactions (
time DATE,
transaction_id INT,
amount DECIMAL(10,2)
);
INSERT INTO tbltransactions (time, transaction_id, amount)
VALUES
('2023-01-20', 1, 100.00),
('2023-01-21', 2, 200.00),
('2023-01-27', 3, 300.00),
('2023-01-29', 4, 400.00),
('2023-01-29', 5, 500.00);
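The "generate the dates first, then LEFT JOIN" idea can be tried end to end with a small Python sketch using the standard `sqlite3` module. Assumptions: SQLite stands in for MySQL (so the date functions differ), a recursive CTE replaces the hard-coded 1..7 list, a fixed `last_day` replaces `CURDATE()` so the output is reproducible, and `COALESCE` is added so empty days show 0 sales instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE tbltransactions ("time" TEXT, transaction_id INTEGER, amount REAL)'
)
conn.executemany(
    "INSERT INTO tbltransactions VALUES (?, ?, ?)",
    [
        ("2023-01-20", 1, 100.0),
        ("2023-01-21", 2, 200.0),
        ("2023-01-27", 3, 300.0),
        ("2023-01-29", 4, 400.0),
        ("2023-01-29", 5, 500.0),
    ],
)

# Assumption: a fixed end date instead of CURDATE(), for a reproducible result.
last_day = "2023-01-29"

query = """
WITH RECURSIVE days(d) AS (
    SELECT date(:last_day, '-6 days')          -- first of the 7 days
    UNION ALL
    SELECT date(d, '+1 day') FROM days WHERE d < :last_day
)
SELECT days.d,
       COUNT(t.transaction_id)    AS orders,   -- 0 when no row matched
       COALESCE(SUM(t.amount), 0) AS sales     -- 0 instead of NULL
FROM days
LEFT JOIN tbltransactions AS t ON t."time" = days.d
GROUP BY days.d
ORDER BY days.d
"""
result = conn.execute(query, {"last_day": last_day}).fetchall()
for row in result:
    print(row)
```

Every one of the seven days appears in the output, with zero counts on the days that have no transactions.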
I have been trying to solve the below problem
https://www.hackerrank.com/challenges/15-days-of-learning-sql/problem?isFullScreen=true
but I am stuck on finding the count of hacker_ids who have made submissions on every date, starting from the given start date. Below is my attempt. The max_submissions CTE, which gives the hacker with the maximum number of submissions per date (lowest id when several hackers tie), comes out correct. In the final query, however, I cannot get the proper counts: it returns 35 for every date, which is the number of hackers with submissions on every single day. So the only part I cannot get right is the second column of the output, the unique hackers count. I either get 35 for all dates, or other values that differ from the expected output even though the logic seems correct.
with max_submissions
as
(
Select t.submission_date,t.hacker_id,t.cnt,h.name From
(Select * from
(Select submission_date, hacker_id, cnt, dense_rank() over (partition by submission_date order by cnt desc,hacker_id asc) as rn
from
(Select
submission_date, hacker_id, count(submission_id) cnt
from
submissions
where submission_date between '2016-03-01' and '2016-03-15'
group by submission_date, hacker_id
)
)where rn =1
) t join
hackers h on t.hacker_id=h.hacker_id
),
t1
as
(
select hacker_id
from
(
Select
hacker_id, lead(submission_date) over ( order by hacker_id,submission_date)
-submission_date cnt
from
submissions
where submission_date between '2016-03-01' and '2016-03-15'
order by hacker_id asc, submission_date asc)
group by hacker_id having sum(case when cnt=1 then 1 else 0 end) =14)
select s.submission_date,count( t1.hacker_id)
from submissions s
join
t1 on
s.hacker_id=t1.hacker_id
group by s.submission_date;
This should give you the correct result:
WITH calendar (day) AS (
-- Generate a calendar so we don't need to assume that there will always be a submission
-- every day.
SELECT DATE '2016-03-01' + LEVEL - 1 AS day
FROM DUAL
CONNECT BY LEVEL <= 15
),
daily_hacker_submissions (submission_date, hacker_id, num_submissions) AS (
-- Find the number of submissions for hackers on each day.
SELECT c.day,
hacker_id,
COUNT(*) AS num_submissions
FROM calendar c
LEFT OUTER JOIN submissions s
ON (
-- Don't assume dates are always midnight.
c.day <= s.submission_date
AND s.submission_date < c.day + 1
)
GROUP BY
c.day,
s.hacker_id
),
daily_submissions (submission_date, num_hackers, hacker_id ) AS (
-- Find the number of hackers on each day and the hacker with the greatest number of
-- submissions and the least hacker id.
SELECT submission_date,
COUNT(DISTINCT hacker_id),
MIN(hacker_id) KEEP (DENSE_RANK LAST ORDER BY num_submissions)
FROM daily_hacker_submissions
GROUP BY
submission_date
)
-- Include the hacker's name
SELECT d.submission_date,
d.num_hackers,
d.hacker_id,
h.name
FROM daily_submissions d
LEFT OUTER JOIN hackers h
ON (d.hacker_id = h.hacker_id)
Which, for the sample data:
CREATE TABLE submissions (submission_date, submission_id, hacker_id, score) AS
SELECT DATE '2016-03-01', 1, 1, 80 FROM DUAL UNION ALL
SELECT DATE '2016-03-01', 2, 1, 90 FROM DUAL UNION ALL
SELECT DATE '2016-03-01', 3, 1, 100 FROM DUAL UNION ALL
SELECT DATE '2016-03-01', 4, 2, 90 FROM DUAL UNION ALL
SELECT DATE '2016-03-01', 5, 2, 100 FROM DUAL UNION ALL
SELECT DATE '2016-03-02', 6, 1, 100 FROM DUAL UNION ALL
SELECT DATE '2016-03-02', 7, 2, 90 FROM DUAL UNION ALL
SELECT DATE '2016-03-02', 8, 2, 100 FROM DUAL UNION ALL
SELECT DATE '2016-03-02', 9, 3, 80 FROM DUAL UNION ALL
SELECT DATE '2016-03-02', 10, 3, 100 FROM DUAL;
CREATE TABLE hackers (hacker_id, name) AS
SELECT 1, 'Alice' FROM DUAL UNION ALL
SELECT 2, 'Betty' FROM DUAL UNION ALL
SELECT 3, 'Carol' FROM DUAL;
Outputs:
SUBMISSION_DATE | NUM_HACKERS | HACKER_ID | NAME
2016-03-01 00:00:00 | 2 | 1 | Alice
2016-03-02 00:00:00 | 3 | 2 | Betty
2016-03-03 00:00:00 | 0 | null | null
... | ... | ... | ...
db<>fiddle here
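If the second column is meant to count only hackers who have submitted on every day so far, as the question describes, one way to reason about it is a running set intersection. The Python sketch below uses the (date, hacker_id) pairs from the sample data above. It is an illustration under one assumption: every calendar day in the range has at least one submission, otherwise a missing day would have to reset the running set to empty.

```python
from collections import defaultdict

# (submission_date, hacker_id) pairs from the sample submissions table
submissions = [
    ("2016-03-01", 1), ("2016-03-01", 1), ("2016-03-01", 1),
    ("2016-03-01", 2), ("2016-03-01", 2),
    ("2016-03-02", 1), ("2016-03-02", 2), ("2016-03-02", 2),
    ("2016-03-02", 3), ("2016-03-02", 3),
]

# Distinct hackers per day.
hackers_by_day = defaultdict(set)
for day, hacker_id in submissions:
    hackers_by_day[day].add(hacker_id)

# Keep only hackers seen on every day up to and including the current one.
streak_counts = {}
still_in_streak = None
for day in sorted(hackers_by_day):
    today = hackers_by_day[day]
    still_in_streak = today if still_in_streak is None else still_in_streak & today
    streak_counts[day] = len(still_in_streak)

print(streak_counts)  # {'2016-03-01': 2, '2016-03-02': 2}
```

On 2016-03-02 this gives 2 rather than 3, because hacker 3 did not submit on the first day.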
I have a database table (raw_data) where there are multiple rows. I am looking to count the number of rows between a given time interval (9:25:00 and 9:29:59) by grouping the rows if the time difference between each row is less than or equal to 2 seconds.
For example:
EventId Date Time
1 2019/10/16 9:27:08
2 2019/10/16 9:27:11
3 2019/10/16 9:27:37
4 2019/10/16 9:27:40
5 2019/10/16 9:27:45
6 2019/10/16 9:27:45
7 2019/10/16 9:27:45
8 2019/10/16 9:27:57
The data in this snippet should yield a count of 6 (rows that are within 2 seconds of each other are counted once). I.e. if a row is within 2 seconds of the next row, chances are it's the same event, so the rows are grouped together.
Much appreciated
Have attempted queries like:
(found at: MySQL grouping results by time periods)
SELECT count(*)
FROM
(
SELECT a.starttime AS ThisTimeStamp, MIN(b.starttime) AS NextTimeStamp
FROM raw_data a
INNER JOIN raw_data b
ON a.starttime < b.starttime
and a.startdate = b.startdate
where a.startdate ='2019-10-16'
and a.starttime >= '09:27:00' and a.starttime < '09:28:00'
and b.startdate ='2019-10-16'
and b.starttime >= '09:27:00' and b.starttime < '09:28:00'
GROUP BY a.starttime
) Sub1
WHERE Sub1.ThisTimeStamp < (Sub1.NextTimeStamp - 2)
I purposely hard-coded the dates and times and compared the results manually, but the result always ends up being different from the manual count.
With this table.
CREATE TABLE table2
(`EventId` int, `Date` date, `Time` time)
;
INSERT INTO table2
(`EventId`, `Date`, `Time`)
VALUES
(1, '2019-10-16', '9:27:08'),
(2, '2019-10-16', '9:27:11'),
(3, '2019-10-16', '9:27:37'),
(4, '2019-10-16', '9:27:40'),
(5, '2019-10-16', '9:27:45'),
(6, '2019-10-16', '9:27:45'),
(7, '2019-10-16', '9:27:45'),
(8, '2019-10-16', '9:27:57')
;
And this select statement
SELECT
EventId,`Date`, `Time`
FROM
(Select
EventId,`Date`, `Time`
,if (TIMESTAMPDIFF(SECOND,@date_time,STR_TO_DATE(CONCAT(`Date`, ' ', `Time`), '%Y-%m-%d %H:%i:%s') ) > 2
,1,-1) inintrvall
,@date_time := STR_TO_DATE(CONCAT(`Date`, ' ', `Time`), '%Y-%m-%d %H:%i:%s')
From table2,
(SELECT @date_time:= (SELECT min(STR_TO_DATE(CONCAT(`Date`, ' ', SUBTIME( `Time`, "5")), '%Y-%m-%d %H:%i:%s') )
FROM table2)) ti
order by `Date` ASC, `Time` ASC) t1
WHERE inintrvall = 1
order by `Date` ASC, `Time` ASC;
You will get 6 rows
EventId Date Time
1 2019-10-16 09:27:08
2 2019-10-16 09:27:11
3 2019-10-16 09:27:37
4 2019-10-16 09:27:40
5 2019-10-16 09:27:45
8 2019-10-16 09:27:57
GROUP BY will not work on time intervals, hence this little algorithm.
For every row, I check whether the prior row's datetime is more than 2 seconds older.
If so, the row is marked with 1, and if not, with -1.
The ugly part is building the actual datetime so the time difference is calculated correctly, for example when a new day begins.
For these purposes it would be better to store the value directly as a timestamp or datetime.
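The same row-by-row comparison can be sketched outside the database. The Python function below (a hypothetical helper named `count_bursts`, not part of the answer's SQL) sorts the timestamps and counts a row only when it is more than 2 seconds after the previous one, with the first row always counted.

```python
from datetime import datetime, timedelta

# Timestamps from the question's sample data
stamps = [
    datetime(2019, 10, 16, 9, 27, 8),
    datetime(2019, 10, 16, 9, 27, 11),
    datetime(2019, 10, 16, 9, 27, 37),
    datetime(2019, 10, 16, 9, 27, 40),
    datetime(2019, 10, 16, 9, 27, 45),
    datetime(2019, 10, 16, 9, 27, 45),
    datetime(2019, 10, 16, 9, 27, 45),
    datetime(2019, 10, 16, 9, 27, 57),
]

def count_bursts(timestamps, max_gap=timedelta(seconds=2)):
    """Count groups of rows; a new group starts when the gap exceeds max_gap."""
    count = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > max_gap:
            count += 1
        previous = ts
    return count

print(count_bursts(stamps))  # 6
```

Because full datetimes are compared, the midnight rollover mentioned above needs no special handling here.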
I have a question that makes me feel silly!
I have to do some stats on the use of my apps.
I have a table called customer_point:
id int(11) auto_increment
id_customer int(11)
type_point int(11)
date timestamp CURRENT_TIMESTAMP
I want to make this request for the entire month (with a row for each night ;) ) :
SELECT COUNT( id_customer ) , type_point, date(date)
FROM customer_point
WHERE date BETWEEN "2014-06-01 20:00:00" AND "2014-06-02 10:00:00"
GROUP BY type_point, date;
I'm nearly sure that I'm missing a crucial point, but I can't find which one.
Thank you very much for reading me !
Bye,
edit :
Sample :
INSERT INTO `customer_point` ( `id` , `id_customer` , `type_point`, `date` )
VALUES ( '', '15', '1', '2014-06-01 22:50:00'), ( '', '15', '1', '2014-06-01 23:52:00'), ( '', '15', '1', '2014-06-02 9:50:00'), ( '', '15', '1', '2014-06-30 22:50:00'), ( '', '15', '1', '2014-06-30 23:52:00'), ( '', '15', '1', '2014-07-01 02:50:00'), ( '', '15', '1', '2014-07-01 09:50:00');
result :
1, 3, 2014-06-01
1, 4, 2014-06-30
I hope this will help everybody understand my problem :/
If you just want counts of the actual data, check that the date is within the range you are interested in and that the time is at night (i.e., later than 8pm or earlier than 10am, it would seem from your SQL):-
SELECT type_point, date(customer_point.date) AS aDate, COUNT( id_customer )
FROM customer_point
WHERE DATE(customer_point.date) BETWEEN "2014-06-01" AND "2014-06-30"
AND (TIME(customer_point.date) >= '20:00:00' OR TIME(customer_point.date) <= '10:00:00')
GROUP BY type_point, aDate;
To get a row per day, irrespective of whether there is any data that day (i.e., a count of zero if there is no data), you need to generate a list of dates and then LEFT JOIN your data to it.
Something like this:-
SELECT sub0.aDate, type_point, COUNT( id_customer )
FROM
(
SELECT DATE_ADD('2014-06-01', INTERVAL units.i + tens.i * 10 DAY) AS aDate
FROM
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3) tens
) sub0
LEFT OUTER JOIN customer_point
ON sub0.aDate = date(customer_point.date)
WHERE sub0.aDate BETWEEN "2014-06-01" AND "2014-06-30"
GROUP BY sub0.aDate, type_point;
You would also probably need to generate a list of type_point values.
EDIT - to go with the updated question, can you just subtract 10 hours from the date / time. So 10am on the 1st July becomes midnight on the 30th June?
SELECT type_point, date(DATE_ADD(customer_point.date, INTERVAL -10 HOUR)) AS aDate, COUNT( id_customer )
FROM customer_point
WHERE DATE(DATE_ADD(customer_point.date, INTERVAL -10 HOUR)) BETWEEN "2014-06-01" AND "2014-06-30"
AND (TIME(customer_point.date) >= '20:00:00' OR TIME(customer_point.date) <= '10:00:00')
GROUP BY type_point, aDate;
SQL fiddle:-
http://www.sqlfiddle.com/#!2/ddc95/2
The issue with this is whether items from before 10am on the 1st of June count as dates for May or for June?
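To see what the 10-hour shift does to the sample rows from the question, here is a small Python sketch. It mirrors the night-time filter and the shift, but it is only an illustration: the names are made up, and the June date-range filter is left out for brevity.

```python
from collections import Counter
from datetime import datetime, time, timedelta

# (id_customer, type_point, date) rows from the question's sample INSERT
rows = [
    (15, 1, "2014-06-01 22:50:00"),
    (15, 1, "2014-06-01 23:52:00"),
    (15, 1, "2014-06-02 09:50:00"),
    (15, 1, "2014-06-30 22:50:00"),
    (15, 1, "2014-06-30 23:52:00"),
    (15, 1, "2014-07-01 02:50:00"),
    (15, 1, "2014-07-01 09:50:00"),
]

counts = Counter()
for _id_customer, type_point, stamp in rows:
    ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    # Night-time filter: 20:00 or later, or 10:00 or earlier.
    is_night = ts.time() >= time(20, 0) or ts.time() <= time(10, 0)
    if is_night:
        # Shifting back 10 hours assigns the morning hours to the previous date.
        night = (ts - timedelta(hours=10)).date().isoformat()
        counts[(type_point, night)] += 1

print(dict(counts))  # {(1, '2014-06-01'): 3, (1, '2014-06-30'): 4}
```

This matches the result the question asks for: 3 rows for the night of 2014-06-01 and 4 for the night of 2014-06-30.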
Using mysql you even could do
WHERE date LIKE "2014-06-%"
Edit: You need to start exactly at 20:00, and then you have to take into account the first day of the next month until 22:00...
OK, then just subtract those 20 hours from the date:
SELECT DATE_SUB(column, INTERVAL 20 HOUR)....
Finally:
SELECT COUNT( id_customer ) , type_point, DATE(DATE_SUB(date, INTERVAL 20 HOUR)) as mydate
FROM customer_point
WHERE DATE_SUB(date, INTERVAL 20 HOUR) LIKE "2014-06-%"
GROUP BY type_point, mydate;
I want to find out a user's availability from a database table:
primary id | UserId | startdate | enddate
1 | 42 | 2014-05-18 09:00 | 2014-05-18 10:00
2 | 42 | 2014-05-18 11:00 | 2014-05-18 12:00
3 | 42 | 2014-05-18 14:00 | 2014-05-18 16:00
4 | 42 | 2014-05-18 18:00 | 2014-05-18 19:00
Let's say the data above is the user's busy time. I want to find the free time blocks (gaps) in the table between a given start time and end time:
BETWEEN 2014-05-18 11:00 AND 2014-05-18 19:00;
Let me add the table schema here to avoid confusion:
Create Table availability (
pid int not null,
userId int not null,
StartDate datetime,
EndDate datetime
);
Insert Into availability values
(1, 42, '2013-10-18 09:00', '2013-10-18 10:00'),
(2, 42, '2013-10-18 11:00', '2013-10-18 12:00'),
(3, 42, '2013-10-18 14:00', '2013-11-18 16:00'),
(4, 42, '2013-10-18 18:00', '2013-11-18 19:00');
REQUIREMENT:
I wanted to find out free gap records like:
'2013-10-27 10:00' to '2013-10-28 11:00' - User is available for 1 hour and
'2013-10-27 12:00' to '2013-10-28 14:00' - User is available for 2 hours and
available start time is '2013-10-27 10:00' and '2013-10-27 12:00' respectively.
Here you go
SELECT t1.userId,
t1.enddate, MIN(t2.startdate),
MIN(TIMESTAMPDIFF(HOUR, t1.enddate, t2.startdate))
FROM user t1
JOIN user t2 ON t1.UserId=t2.UserId
AND t2.startdate > t1.enddate AND t2.pid > t1.pid
WHERE
t1.endDate >= '2013-10-18 09:00'
AND t2.startDate <= '2013-11-18 19:00'
GROUP BY t1.UserId, t1.endDate
http://sqlfiddle.com/#!2/50d693/1
Using your data, the easiest way is to list the hours when someone is free. The following gets a list of hours when someone is available:
select (StartTime + interval n.n hour) as FreeHour
from (select cast('2014-05-18 11:00' as datetime) as StartTime,
cast('2014-05-18 19:00' as datetime) as EndTime
) var join
(select 0 as n union all select 1 union all select 2 union all select 3 union all select 4 union all
select 5 union all select 6 union all select 7 union all select 8 union all select 9
) n
on StartTime + interval n.n hour <= EndTime
where not exists (select 1
from availability a
where StartTime + interval n.n hour < a.EndDate and
StartTime + interval n.n hour >= a.StartDate
);
EDIT:
The general solution to your problem requires denormalizing the data. The basic query is:
select thedate, sum(isstart) as isstart, @cum := @cum + sum(isstart) as cum
from ((select StartDate as thedate, 1 as isstart
from availability a
where userid = 42
) union all
(select EndDate as thedate, -1 as isstart
from availability a
where userid = 42
) union all
(select cast('2014-05-18 11:00' as datetime), 0 as isstart
) union all
(select cast('2014-05-18 19:00' as datetime), 0 as isstart
)
) t cross join
(select @cum := 0) var
group by thedate
order by thedate
You then just choose the values where cum = 0. The challenge is getting the next date from this list. In MySQL that is a real pain, because you cannot use a CTE or view or window function, so you have to repeat the query. This is why I think the first approach is probably better for your situation.
The core query can be this. You can dress it up as you like, but I'd handle all that stuff in the presentation layer...
SELECT a.enddate 'Available From'
, MIN(b.startdate) 'To'
FROM user a
JOIN user b
ON b.userid = a.userid
AND b.startdate > a.enddate
GROUP
BY a.enddate
HAVING a.enddate < MIN(b.startdate)
For times outside the 'booked' range, you have to extend this a little with a UNION, or again handle the logic at the application level.
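If the logic is handled at the application level, the gap search is a single pass over the busy intervals sorted by start time. The Python sketch below is one way to do it, using the busy times and the 11:00 to 19:00 window from the question; `free_gaps` is a hypothetical helper name, and the intervals are assumed to belong to a single user.

```python
from datetime import datetime

def free_gaps(busy, window_start, window_end):
    """Return (start, end) free blocks inside the window, given busy intervals."""
    gaps = []
    cursor = window_start  # earliest moment not yet known to be busy
    for start, end in sorted(busy):
        if end <= cursor:
            continue       # interval lies entirely before the cursor
        if start >= window_end:
            break          # interval lies entirely after the window
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < window_end:
        gaps.append((cursor, window_end))  # free time after the last booking
    return gaps

day = (2014, 5, 18)
busy = [
    (datetime(*day, 9), datetime(*day, 10)),
    (datetime(*day, 11), datetime(*day, 12)),
    (datetime(*day, 14), datetime(*day, 16)),
    (datetime(*day, 18), datetime(*day, 19)),
]
gaps = free_gaps(busy, datetime(*day, 11), datetime(*day, 19))
for start, end in gaps:
    print(start, "to", end)
# 2014-05-18 12:00:00 to 2014-05-18 14:00:00
# 2014-05-18 16:00:00 to 2014-05-18 18:00:00
```

Tracking a cursor instead of pairing each end date with the next start date also covers overlapping bookings and free time at the edges of the window.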