Could anybody give me an idea or hint how you could check for X consecutive days in a database table (MySQL) where logins (user id, timestamp) are stored?
Stackoverflow does it (e.g. badges like Enthusiast - if you log in for 30 consecutive days or so...). What functions would you have to use or what is the idea of how to do it?
Something like SELECT 1 FROM login_dates WHERE ...?
You can accomplish this using a shifted self-outer-join in conjunction with a variable. See this solution:
SELECT IF(COUNT(1) > 0, 1, 0) AS has_consec
FROM
(
SELECT *
FROM
(
SELECT IF(b.login_date IS NULL, @val:=@val+1, @val) AS consec_set
FROM tbl a
CROSS JOIN (SELECT @val:=0) var_init
LEFT JOIN tbl b ON
a.user_id = b.user_id AND
a.login_date = b.login_date + INTERVAL 1 DAY
WHERE a.user_id = 1
) a
GROUP BY a.consec_set
HAVING COUNT(1) >= 30
) a
This will return either 1 or 0, depending on whether the user has logged in on 30 or more consecutive days at any time in the past.
The brunt of this query is really in the first subselect. Let's take a closer look so we can better understand how this works:
With the following example data set:
CREATE TABLE tbl (
user_id INT,
login_date DATE
);
INSERT INTO tbl VALUES
(1, '2012-04-01'), (2, '2012-04-02'),
(1, '2012-04-25'), (2, '2012-04-03'),
(1, '2012-05-03'), (2, '2012-04-04'),
(1, '2012-05-04'), (2, '2012-05-04'),
(1, '2012-05-05'), (2, '2012-05-06'),
(1, '2012-05-06'), (2, '2012-05-08'),
(1, '2012-05-07'), (2, '2012-05-09'),
(1, '2012-05-09'), (2, '2012-05-11'),
(1, '2012-05-10'), (2, '2012-05-17'),
(1, '2012-05-11'), (2, '2012-05-18'),
(1, '2012-05-12'), (2, '2012-05-19'),
(1, '2012-05-16'), (2, '2012-05-20'),
(1, '2012-05-19'), (2, '2012-05-21'),
(1, '2012-05-20'), (2, '2012-05-22'),
(1, '2012-05-21'), (2, '2012-05-25'),
(1, '2012-05-22'), (2, '2012-05-26'),
(1, '2012-05-25'), (2, '2012-05-27'),
(2, '2012-05-28'),
(2, '2012-05-29'),
(2, '2012-05-30'),
(2, '2012-05-31'),
(2, '2012-06-01'),
(2, '2012-06-02');
This query:
SELECT a.*, b.*, IF(b.login_date IS NULL, @val:=@val+1, @val) AS consec_set
FROM tbl a
CROSS JOIN (SELECT @val:=0) var_init
LEFT JOIN tbl b ON
a.user_id = b.user_id AND
a.login_date = b.login_date + INTERVAL 1 DAY
WHERE a.user_id = 1
Will produce:
As you can see, what we are doing is shifting the joined table by +1 day. For each day that is not consecutive with the prior day, a NULL value is generated by the LEFT JOIN.
Now that we know where the non-consecutive days are, we can use a variable to differentiate each set of consecutive days by detecting whether or not the shifted table's rows are NULL. If they are NULL, the days are not consecutive, so just increment the variable. If they are NOT NULL, then don't increment the variable:
After we've differentiated each set of consecutive days with the incrementing variable, it's then just a simple matter of grouping by each "set" (as defined in the consec_set column) and using HAVING to filter out any set that has less than the specified consecutive days (30 in your example):
Then finally, we wrap THAT query and simply count the number of sets that had 30 or more consecutive days. If there was one or more of these sets, then return 1, otherwise return 0.
See a SQLFiddle step-by-step demo
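The grouping idea described above (start a new set whenever the previous day is missing) can also be sketched outside the database. This is only an illustration in Python, not the query itself; the function names `longest_streak` and `has_consecutive_days` are made up for this example, and the dates are the `user_id = 1` rows from the sample data.

```python
from datetime import date, timedelta

def longest_streak(login_dates):
    """Return the length of the longest run of consecutive calendar days."""
    days = sorted(set(login_dates))  # de-duplicate, then order chronologically
    best = current = 0
    previous = None
    for day in days:
        # A day continues the current set only if the previous day is present,
        # which mirrors a non-NULL match from the shifted LEFT JOIN.
        if previous is not None and day - previous == timedelta(days=1):
            current += 1
        else:
            current = 1  # gap found: start a new set
        best = max(best, current)
        previous = day
    return best

def has_consecutive_days(login_dates, required):
    """Hypothetical helper: True if any run is at least `required` days long."""
    return longest_streak(login_dates) >= required

# Logins of user_id = 1 from the example data set.
user1 = [date(2012, 4, 1), date(2012, 4, 25)] + [
    date(2012, 5, d)
    for d in (3, 4, 5, 6, 7, 9, 10, 11, 12, 16, 19, 20, 21, 22, 25)
]
```

For this data the longest set is 2012-05-03 through 2012-05-07, so a 30-day check fails while a 5-day check passes.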
You can add X days to a login date and check whether the number of distinct login dates in that range is equal to X:
At least once every day of those 30 days:
SELECT distinct 1
FROM
login_dates l1
inner join
login_dates l2
on l1.user = l2.user and
DATE(l2.timestamp) between DATE(l1.timestamp) and
date_add( DATE(l1.timestamp), Interval X - 1 day )
where l1.user = some_user
group by
DATE(l1.timestamp)
having
count( distinct DATE(l2.timestamp) ) = X
(You didn't mention any performance requirements ... ;) )
* Edited * The query for only the last X days (at least one login on every one of those days):
SELECT distinct 1
FROM
login_dates l1
where l1.user = some_user
and l1.timestamp > date_add( CURDATE() , Interval -X day )
group by
l1.user
having
count( distinct DATE(l1.timestamp) ) = X
That's a hard problem to solve with SQL alone.
The core of the problem is that you need to compare dynamic result sets to each other in one query. For example, you need to get all the logins/session IDs for one DATE, then JOIN them with the logins for the adjacent DATE (which you could compute with DATE_ADD). You could repeat this for N consecutive dates. Any rows that are left belong to sessions that were logged in on every day of that period.
Assume the following table:
sessionid int, created date
This query returns all the sessionids that have rows for each of the last two days:
select t1.sessionid from logins t1
join logins t2 on t1.sessionid=t2.sessionid
where t1.created = DATE(date_sub(now(), interval 2 day))
AND t2.created = DATE(date_sub(now(), interval 1 day));
As you can see, the SQL will get gnarly for 30 days. Have a script generate it. :-D
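As a rough illustration of "have a script generate it", the self-join can be built in a loop. This is a sketch only: `build_consecutive_days_query` is a hypothetical helper, and it assumes the `logins(sessionid, created)` table described above.

```python
def build_consecutive_days_query(days, table="logins"):
    """Build a self-join matching sessions that have a row on each of the last `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    joins = []
    conditions = []
    for i in range(1, days + 1):
        alias = f"t{i}"
        if i > 1:
            joins.append(f"join {table} {alias} on t1.sessionid = {alias}.sessionid")
        # t1 looks `days` days back, the last alias looks 1 day back.
        offset = days - i + 1
        conditions.append(
            f"{alias}.created = DATE(date_sub(now(), interval {offset} day))"
        )
    lines = [f"select t1.sessionid from {table} t1"] + joins
    lines.append("where " + "\n  and ".join(conditions))
    return "\n".join(lines) + ";"

sql_2 = build_consecutive_days_query(2)
sql_30 = build_consecutive_days_query(30)
```

With `days=2` this reproduces the shape of the two-day query; with `days=30` it emits 29 joins, which shows how quickly the approach grows.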
This further assumes that every day, the login table is updated with the session.
I don't know if this actually solves your problem, but I hope I have helped frame the problem.
Good luck.
Wouldn't it be simpler to have an extra column consecutive_days in the login_dates table, with a default value of 1? It would indicate the length of the run of consecutive dates ending on that day.
You create an insert after trigger on login_dates that checks whether there is an entry for the previous day.
If there is none, the field keeps the default value 1, meaning that a new sequence starts on that date.
If there is an entry for the previous day, you change the consecutive_days value from the default 1 to one greater than that of the previous day.
Ex:
| date | consecutive_days |
|------------|------------------|
| 2013-11-13 | 5 |
| 2013-11-14 | 6 |
| 2013-11-16 | 1 |
| 2013-11-17 | 2 |
| 2013-11-18 | 3 |
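The bookkeeping such a trigger would do can be sketched in a few lines. This is an in-memory illustration, not trigger code; `record_login` is a made-up name, and the sketch assumes logins arrive in chronological order (a late insert for an earlier day would not update the days after it).

```python
from datetime import date, timedelta

def record_login(consecutive_days, day):
    """Store the streak length ending on `day`, as the proposed trigger would."""
    if day in consecutive_days:
        return consecutive_days[day]  # a repeat login on the same day changes nothing
    previous = consecutive_days.get(day - timedelta(days=1))
    # The default 1 starts a new sequence; otherwise extend the previous day's value.
    consecutive_days[day] = 1 if previous is None else previous + 1
    return consecutive_days[day]

streaks = {}
for d in (9, 10, 11, 12, 13, 14, 16, 17, 18):
    record_login(streaks, date(2013, 11, d))
```

Checking for X consecutive days then reduces to looking for any stored value of at least X.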
I have to make some SQL query.
I'll only put here tables and results I need - I am sure this is the best way for a clear explanation (at the bottom of the question I provided SQL queries for database filling).
short description:
TASK: After full join concatenation I receive a result where (for example) tableA.point column (that is used in the SELECT statement) in some cells returns NULL. In these cases, I need to change tableA.point column to the tableB.point (from the joined table).
So, tables:
(Columns point + date are composite key.)
outcome_o:
income_o:
An example of the result I need (as you can see, I need a combined table with both the out and inc columns in each row):
My attempt:
SELECT outcome_o.point,
outcome_o.date,
inc,
out
FROM income_o
FULL JOIN outcome_o ON income_o.point = outcome_o.point AND income_o.date = outcome_o.date
The result is what I need, except for the NULLs in the point and date columns:
I tried to avoid this with CASE statement:
SELECT
CASE outcome_o.point
WHEN NULL
THEN income_o.point
ELSE outcome_o.point
END as point,
....
But this does not work as I imagined (all cells in the point column became NULL).
Could anyone help me with this? I know I have to use JOIN, CASE (CASE is mandatory) and possibly UNION.
Thanks
Tables creation:
CREATE TABLE income(
point INT,
date VARCHAR(60),
inc FLOAT
);
CREATE TABLE outcome(
point INT,
date VARCHAR(60),
ou_t FLOAT
);
INSERT INTO income VALUES
(1, '2001-03-22', 15000.0000),
(1, '2001-03-23', 15000.0000),
(1, '2001-03-24', 3400.0000),
(1, '2001-04-13', 5000.0000),
(1, '2001-05-11', 4500.0000),
(2, '2001-03-22', 10000.0000),
(2, '2001-03-24', 1500.0000),
(3, '2001-09-13', 11500.0000),
(3, '2001-10-02', 18000.0000);
INSERT INTO outcome VALUES
(1, '2001-03-14 00:00:00.000', 15348.0000),
(1, '2001-03-24 00:00:00.000', 3663.0000),
(1, '2001-03-26 00:00:00.000', 1221.0000),
(1, '2001-03-28 00:00:00.000', 2075.0000),
(1, '2001-03-29 00:00:00.000', 2004.0000),
(1, '2001-04-11 00:00:00.000', 3195.0400),
(1, '2001-04-13 00:00:00.000', 4490.0000),
(1, '2001-04-27 00:00:00.000', 3110.0000),
(1, '2001-05-11 00:00:00.000', 2530.0000),
(2, '2001-03-22 00:00:00.000', 1440.0000),
(2, '2001-03-29 00:00:00.000', 7848.0000),
(2, '2001-04-02 00:00:00.000', 2040.0000),
(3, '2001-09-13 00:00:00.000', 1500.0000),
(3, '2001-09-14 00:00:00.000', 2300.0000),
(3, '2002-09-16 00:00:00.000', 2150.0000);
The first step is to create a date range reference table. To do that, we can use Common Table Expression (cte):
WITH RECURSIVE cte AS (
SELECT Min(mndate) mindt, MAX(mxdate) maxdt
FROM (SELECT MIN(date) AS mndate, MAX(date) AS mxdate
FROM outcome
UNION
SELECT MIN(date), MAX(date)
FROM income) v
UNION
SELECT mindt + INTERVAL 1 DAY, maxdt
FROM cte
WHERE mindt + INTERVAL 1 DAY <= maxdt)
SELECT mindt
FROM cte
Here I'm generating the date range dynamically, based on the minimum and maximum date values from both of your tables. This is particularly useful when you don't want to keep changing the date range by hand, but if you don't mind doing that, you can generate the dates more simply, like so:
WITH RECURSIVE cte AS (
SELECT '2001-03-14 00:00:00' dt
UNION
SELECT dt + INTERVAL 1 DAY
FROM cte
WHERE dt + INTERVAL 1 DAY <= '2002-09-16')
SELECT dt
FROM cte
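To try the fixed-range idea without a MySQL server, here is a small sketch that runs an equivalent recursive CTE in SQLite through Python's `sqlite3` module. Note the dialect difference: SQLite uses `date(dt, '+1 day')` where MySQL uses `dt + INTERVAL 1 DAY`, and the short end date is chosen only to keep the demo small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
dates = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE cte(dt) AS (
            SELECT '2001-03-14'
            UNION ALL
            SELECT date(dt, '+1 day')   -- SQLite's form of dt + INTERVAL 1 DAY
            FROM cte
            WHERE dt < '2001-03-20'     -- short end date, chosen only for the demo
        )
        SELECT dt FROM cte
        """
    )
]
```

The list holds one ISO date string per day from 2001-03-14 through 2001-03-20.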
From here, I'll do a CROSS JOIN to get the distinct point value from both tables:
...
CROSS JOIN (SELECT DISTINCT point FROM outcome
UNION
SELECT DISTINCT point FROM income) p
Now we have a reference table with all the point and date range. Let's wrap those in another cte.
WITH RECURSIVE cte AS (
SELECT Min(mndate) mindt, MAX(mxdate) maxdt
FROM (SELECT MIN(date) AS mndate, MAX(date) AS mxdate
FROM outcome
UNION
SELECT MIN(date), MAX(date)
FROM income) v
UNION
SELECT mindt + INTERVAL 1 DAY, maxdt
FROM cte
WHERE mindt + INTERVAL 1 DAY <= maxdt),
cte2 AS (
SELECT point, mindt
FROM cte
CROSS JOIN (SELECT DISTINCT point FROM outcome
UNION
SELECT DISTINCT point FROM income) p)
SELECT *
FROM cte2;
Next step is taking your current query attempt and LEFT JOIN it to the reference table:
WITH RECURSIVE cte AS (
SELECT Min(mndate) mindt, MAX(mxdate) maxdt
FROM (SELECT MIN(date) AS mndate, MAX(date) AS mxdate
FROM outcome
UNION
SELECT MIN(date), MAX(date)
FROM income) v
UNION
SELECT mindt + INTERVAL 1 DAY, maxdt
FROM cte
WHERE mindt + INTERVAL 1 DAY <= maxdt),
cte2 AS (
SELECT point, CAST(mindt AS DATE) AS rdate
FROM cte
CROSS JOIN (SELECT DISTINCT point FROM outcome
UNION
SELECT DISTINCT point FROM income) p)
SELECT *
FROM cte2
LEFT JOIN outcome
ON cte2.point=outcome.point
AND cte2.rdate=outcome.date
LEFT JOIN income
ON cte2.point=income.point
AND cte2.rdate=income.date
/*added conditions*/
WHERE cte2.point=1
AND COALESCE(outcome.date, income.date) IS NOT NULL
/*****/
ORDER BY cte2.rdate;
I noticed that your date column uses the VARCHAR() datatype instead of DATE or DATETIME, which is why my initial test returned only one result. However, I noticed that comparing a YYYY-MM-DD value against your table's date value does return the other results, which is why I used CAST(mindt AS DATE) AS rdate in cte2. I do recommend that you change the date column to a standard MySQL date type, though.
You will probably find the query a bit long, but if you have a table where you store dates (a so-called calendar table), the query becomes much shorter, perhaps like this:
SELECT *
FROM calendar
LEFT JOIN outcome
ON calendar.point=outcome.point
AND calendar.rdate=outcome.date
LEFT JOIN income
ON calendar.point=income.point
AND calendar.rdate=income.date
/*added conditions*/
WHERE calendar.point=1
AND COALESCE(outcome.date, income.date) IS NOT NULL
/*****/
ORDER BY calendar.rdate;
Demo fiddle
It turns out I was using the wrong syntax. As I found out, a column can be chosen conditionally within the SELECT list:
Correct CASE expression:
(
CASE
WHEN outcome_o.point IS NULL
THEN income_o.point
ELSE outcome_o.point
END
) as point,
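Here the query selects the joined table's column whenever the main table's column is NULL. The earlier form, CASE outcome_o.point WHEN NULL, never matches, because it compares with = and a comparison with NULL is never true.
Full query (returns exactly the result I need):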
SELECT
(
CASE
WHEN outcome_o.point IS NULL
THEN income_o.point
ELSE outcome_o.point
END
) as point,
(
CASE
WHEN outcome_o.date IS NULL
THEN income_o.date
ELSE outcome_o.date
END
) as date,
inc,
out
FROM income_o
FULL JOIN outcome_o ON income_o.point = outcome_o.point AND income_o.date = outcome_o.date
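The difference between the two CASE forms can be checked in isolation. The sketch below uses SQLite through Python's `sqlite3` module purely as a convenient test bed (the question's database may differ); `a` stands in for `outcome_o.point` and `b` for `income_o.point`. `COALESCE` is shown as a shorter spelling of the same fallback, which is an addition not used in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# a plays the role of outcome_o.point (NULL after the join), b of income_o.point.
simple_case, searched_case, coalesced = conn.execute(
    """
    SELECT
        CASE a WHEN NULL THEN b ELSE a END,      -- compares a = NULL, never true
        CASE WHEN a IS NULL THEN b ELSE a END,   -- explicit NULL test
        COALESCE(a, b)                           -- shorthand for the same fallback
    FROM (SELECT NULL AS a, 2 AS b)
    """
).fetchone()
```

The first expression falls through to its ELSE branch and stays NULL, while the other two return the value from `b`.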
I have Week table -
Week, Friday_dates, Start_date, End_date
1, 2021-07-16, 2021-07-12, 2021-07-18
2, 2021-07-23, 2021-07-19, 2021-07-25
and so on..
I have another table with list of details by date
Date, To_do_list
2022-01-02, Text
2022-01-03, Call
2022-01-03, Text
2022-01-04, Call
2022-01-05, Call
I want to count the number of to_do_list entries that say Call, and join this table with the week table so that each date is matched to the week whose start_date and end_date contain it.
Can anyone help me please?
So given your data (I am using a CTE so I don't have to create the tables)
WITH Weeks(Week, Friday_dates, Start_date, End_date) as (
SELECT * FROM VALUES
(1, '2021-07-16'::date, '2021-07-12'::date, '2021-07-18'::date),
(2, '2021-07-23'::date, '2021-07-19'::date, '2021-07-25'::date)
), Activity(Date, To_do_list) as (
SELECT * FROM VALUES
('2022-01-02'::date, 'Text'),
('2022-01-03'::date, 'Call'),
('2022-01-03'::date, 'Text'),
('2022-01-04'::date, 'Call'),
('2022-01-05'::date, 'Call')
)
To join the activity to the weeks and limit the count to just the calls, counted per week, you would write:
SELECT w.Friday_dates
,count(*) as call_count
FROM Activity AS a
JOIN Weeks AS w
on a.DATE between w.Start_date and w.End_date
WHERE a.To_do_list = 'Call'
GROUP BY 1
ORDER BY 1;
which gives no results as none of your weeks overlap the activity time ranges.
So if we change the data to be overlapping:
WITH Weeks(Week, Friday_dates, Start_date, End_date) as (
SELECT * FROM VALUES
(1, '2021-07-16'::date, '2021-07-12'::date, '2021-07-18'::date),
(2, '2021-07-23'::date, '2021-07-19'::date, '2021-07-25'::date)
), Activity(Date, To_do_list) as (
SELECT * FROM VALUES
('2021-07-18'::date, 'Text'),
('2021-07-18'::date, 'Call'),
('2021-07-18'::date, 'Text'),
('2021-07-19'::date, 'Call'),
('2021-07-20'::date, 'Call')
)
we now get:
| FRIDAY_DATES | CALL_COUNT |
|--------------|------------|
| 2021-07-16 | 1 |
| 2021-07-23 | 2 |
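For anyone without access to the original database, the same range join can be reproduced in SQLite via Python's `sqlite3` module. This is a sketch under assumptions: the tables are recreated as plain text columns (ISO date strings compare correctly as text), and the lowercase table and column names are chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE weeks (week INTEGER, friday_dates TEXT, start_date TEXT, end_date TEXT);
    CREATE TABLE activity (date TEXT, to_do_list TEXT);
    INSERT INTO weeks VALUES
        (1, '2021-07-16', '2021-07-12', '2021-07-18'),
        (2, '2021-07-23', '2021-07-19', '2021-07-25');
    INSERT INTO activity VALUES
        ('2021-07-18', 'Text'), ('2021-07-18', 'Call'), ('2021-07-18', 'Text'),
        ('2021-07-19', 'Call'), ('2021-07-20', 'Call');
    """
)
# Match each activity row to the week containing its date, then count calls per week.
call_counts = conn.execute(
    """
    SELECT w.friday_dates, COUNT(*) AS call_count
    FROM activity AS a
    JOIN weeks AS w ON a.date BETWEEN w.start_date AND w.end_date
    WHERE a.to_do_list = 'Call'
    GROUP BY w.friday_dates
    ORDER BY w.friday_dates
    """
).fetchall()
```

`call_counts` holds one `(friday_dates, call_count)` pair per week that has at least one call.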
But if we take your SQL from the comment and make it valid:
select * from Weeks a
left join (
select Date,
count(*) from Activity
where To_do_list like '%Call%'
group by DATE
) b
on b.DATE between a.Start_date and a.End_date
ORDER BY 1;
we get..
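| WEEK | FRIDAY_DATES | START_DATE | END_DATE | DATE | COUNT(*) |
|------|--------------|------------|------------|------------|----------|
| 1 | 2021-07-16 | 2021-07-12 | 2021-07-18 | 2021-07-18 | 1 |
| 2 | 2021-07-23 | 2021-07-19 | 2021-07-25 | 2021-07-19 | 1 |
| 2 | 2021-07-23 | 2021-07-19 | 2021-07-25 | 2021-07-20 | 1 |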
I have the following columns in a table called meetings: meeting_id - int, start_time - time, end_time - time. Assuming that this table has data for one calendar day only, what is the minimum number of rooms I need to accommodate all the meetings? Room size and the number of people attending the meetings don't matter.
Here's the solution:
select * from
(select t.start_time,
t.end_time,
count(*) - 1 overlapping_meetings,
count(*) minimum_rooms_required,
group_concat(distinct concat(y.start_time,' to ',t.end_time)
separator ' // ') meeting_details from
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') t left join
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') y
on t.start_time between y.start_time and y.end_time
group by start_time, end_time) z;
My question - is there anything wrong with this answer? Even if there's nothing wrong with this, can someone share a better answer?
Let's say you have a table called 'meetings' (with columns id, start_time and end_time) like this -
Then you can use this query to get the minimum number of meeting rooms required to accommodate all meetings.
select max(minimum_rooms_required)
from (select count(*) minimum_rooms_required
from meetings t
left join meetings y on t.start_time >= y.start_time and t.start_time < y.end_time
group by t.id
) z;
This is clearer and simpler, and it works fine.
Meetings can "overlap". So, GROUP BY start_time, end_time can't figure this out.
Not every algorithm can be done in SQL. Or, at least, it may be grossly inefficient.
I would use a real programming language for the computation, leaving the database for what it is good at -- being a data repository.
Build an array of 1440 entries (one per minute of the day); initialize each to 0.
Foreach meeting:
    Foreach minute in the meeting (excluding the last minute):
        increment that minute's element in the array.
Find the largest element in the array -- that is the number of rooms needed.
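The steps above can be sketched in Python as follows. The helper names `to_minutes` and `min_rooms` are made up for this example, and the meeting times are the six sample meetings from the question.

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def min_rooms(meetings):
    """Count, minute by minute, how many meetings are running; return the peak."""
    busy = [0] * 1440  # one counter per minute of the day
    for start, end in meetings:
        # range() excludes the end minute, so back-to-back meetings share a room.
        for minute in range(to_minutes(start), to_minutes(end)):
            busy[minute] += 1
    return max(busy)

meetings = [
    ("08:00", "09:15"), ("13:20", "15:20"), ("10:00", "14:00"),
    ("13:55", "16:25"), ("14:00", "17:45"), ("14:05", "17:45"),
]
rooms = min_rooms(meetings)
```

For this data the peak is four simultaneous meetings, reached at 14:05, just after the 10:00 to 14:00 meeting has ended.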
CREATE TABLE [dbo].[Meetings](
[id] [int] NOT NULL,
[Starttime] [time](7) NOT NULL,
[EndTime] [time](7) NOT NULL
) ON [PRIMARY]
GO
sample data set:
INSERT INTO Meetings VALUES (1,'8:00','09:00')
INSERT INTO Meetings VALUES (2,'8:00','10:00')
INSERT INTO Meetings VALUES (3,'10:00','11:00')
INSERT INTO Meetings VALUES (4,'11:00','12:00')
INSERT INTO Meetings VALUES (5,'11:00','13:00')
INSERT INTO Meetings VALUES (6,'13:00','14:00')
INSERT INTO Meetings VALUES (7,'13:00','15:00')
To Find Minimum number of rooms required run the below query:
create table #TempMeeting
(
id int,Starttime time,EndTime time,MeetingRoomNo int,Rownumber int
)
insert into #TempMeeting select id, Starttime,EndTime,0 as MeetingRoomNo,ROW_NUMBER()
over (order by starttime asc) as Rownumber from Meetings
declare @RowCounter int
select top 1 @RowCounter=Rownumber from #TempMeeting order by Rownumber
WHILE @RowCounter<=(Select count(*) from #TempMeeting)
BEGIN
update #TempMeeting set MeetingRoomNo=1
where Rownumber=(select top 1 Rownumber from #TempMeeting where
Rownumber>@RowCounter and Starttime>=(select top 1 EndTime from #TempMeeting
where Rownumber=@RowCounter) and MeetingRoomNo=0)
set @RowCounter=@RowCounter+1
END
select count(*) from #TempMeeting where MeetingRoomNo=0
Consider a table meetings with columns id, start_time and end_time. Then the following query should give the correct answer.
with mod_meetings as (select id, to_timestamp(start_time, 'HH24:MI')::TIME as start_time,
to_timestamp(end_time, 'HH24:MI')::TIME as end_time from meetings)
select CASE when max(a_cnt)>1 then max(a_cnt)+1
when max(a_cnt)=1 and max(b_cnt)=1 then 2 else 1 end as rooms
from
(select count(*) as a_cnt, a.id, count(b.id) as b_cnt from mod_meetings a left join mod_meetings b
on a.start_time>b.start_time and a.start_time<b.end_time group by a.id) join_table;
Sample DATA:
DROP TABLE IF EXISTS meeting;
CREATE TABLE "meeting" (
"meeting_id" INTEGER NOT NULL UNIQUE,
"start_time" TEXT NOT NULL,
"end_time" TEXT NOT NULL,
PRIMARY KEY("meeting_id")
);
INSERT INTO meeting values (1,'08:00','14:00');
INSERT INTO meeting values (2,'09:00','10:30');
INSERT INTO meeting values (3,'11:00','12:00');
INSERT INTO meeting values (4,'12:00','13:00');
INSERT INTO meeting values (5,'10:15','11:00');
INSERT INTO meeting values (6,'12:00','13:00');
INSERT INTO meeting values (7,'10:00','10:30');
INSERT INTO meeting values (8,'11:00','13:00');
INSERT INTO meeting values (9,'11:00','14:00');
INSERT INTO meeting values (10,'12:00','14:00');
INSERT INTO meeting values (11,'10:00','14:00');
INSERT INTO meeting values (12,'12:00','14:00');
INSERT INTO meeting values (13,'10:00','14:00');
INSERT INTO meeting values (14,'13:00','14:00');
Solution:
DROP VIEW IF EXISTS Final;
CREATE VIEW Final AS SELECT time, group_concat(event), sum(num) num from (
select start_time time, 's' event, 1 num from meeting
union all
select end_time time, 'e' event, -1 num from meeting)
group by 1
order by 1;
select max(room) AS Min_Rooms_Required FROM (
select
a.time,
sum(b.num) as room
from
Final a
, Final b
where a.time >= b.time
group by a.time
order by a.time
);
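The view-plus-running-sum approach above is an event sweep: +1 at every start, -1 at every end, then a cumulative sum over the sorted times. Here is a minimal Python sketch of the same idea, using the sample data from this answer; `min_rooms_sweep` is a hypothetical name.

```python
from collections import defaultdict

def min_rooms_sweep(meetings):
    """Net the +1/-1 events per time, then track the running total's peak."""
    delta = defaultdict(int)
    for start, end in meetings:
        delta[start] += 1  # a meeting begins
        delta[end] -= 1    # a meeting ends
    running = best = 0
    for time in sorted(delta):  # zero-padded 'HH:MM' strings sort chronologically
        running += delta[time]
        best = max(best, running)
    return best

sample = [
    ("08:00", "14:00"), ("09:00", "10:30"), ("11:00", "12:00"), ("12:00", "13:00"),
    ("10:15", "11:00"), ("12:00", "13:00"), ("10:00", "10:30"), ("11:00", "13:00"),
    ("11:00", "14:00"), ("12:00", "14:00"), ("10:00", "14:00"), ("12:00", "14:00"),
    ("10:00", "14:00"), ("13:00", "14:00"),
]
rooms_needed = min_rooms_sweep(sample)
```

Because starts and ends at the same time are netted before the running total is updated, a meeting ending at 12:00 frees its room for one starting at 12:00, just as `sum(num)` does in the view.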
Here's the explanation to gashu's nicely working code (or otherwise a non-code explanation of how to solve it with any language).
Firstly, if the variable 'minimum_rooms_required' were renamed to 'overlap', the whole thing would be much easier to understand, because for each of the start or end times we want to know the number of overlapping ongoing meetings. Once we have found the maximum, there is no way of getting by with fewer rooms than that, because those meetings overlap.
By the way, I think there might be a mistake in the code. It should check for t.start_time or t.end_time between y.start_time and y.end_time. Counterexample: meeting 1 starts at 8:00, ends at 11:00 and meeting 2 starts at 10:00, ends at 12:00.
(I'd post this as a comment on gashu's answer, but I don't have enough reputation.)
I'd go for Lead() analytic function
select
sum(needs_room_ind) as min_rooms
from (
select
id,
start_time,
end_time,
case when lead(start_time,1) over (order by start_time asc) between start_time
and end_time then 1 else 0 end as needs_room_ind
from
meetings
) a
IMO, I would take, at the moment each meeting_id starts, the difference between how many meetings have started and how many have ended before it (assuming meetings start and end on time).
My code looks like this:
with alpha as
(
select a.meeting_id,a.start_time,
count(distinct b.meeting_id) ttl_meeting_start_before,
count(distinct c.meeting_id) ttl_meeting_end_before
from meeting a
left join
(
select meeting_id,start_time from meeting
) b
on a.start_time > b.start_time
left join
(
select meeting_id,end_time from meeting
) c
on a.start_time > c.end_time
group by a.meeting_id,a.start_time
)
select max(ttl_meeting_start_before-ttl_meeting_end_before) max_meeting_room
from alpha
I got a 2 column table with transactions where time of change (unix_time) and change value is stored.
create table transactions (
changed int(11),
points int(11)
);
insert into transactions values (UNIX_TIMESTAMP('2014-03-27 03:00:00'), +100);
insert into transactions values (UNIX_TIMESTAMP('2014-05-02 03:00:00'), +100);
insert into transactions values (UNIX_TIMESTAMP('2015-01-01 03:00:00'), -100);
insert into transactions values (UNIX_TIMESTAMP('2015-05-01 03:00:00'), +150);
To get the current balance you need to sum all values, and to get a balance from the past you need to sum only the values whose change time is earlier than the requested date, like:
select
sum(case when changed < unix_timestamp('2013-12-01') then
points
else
0
end) as cash_balance_2013_11,
...
so each month needs its own piece of SQL code. I would like SQL code that gives me the balances for all months (e.g. from a fixed date until now).
EDIT:
HERE IS SQL FIDDLE
Can you just group by and order by month?
UPDATE: to get running totals you have to join the individual months to a set of totals-by-month, matching on "less than or equal to":-
select
m.single_month
, sum(monthly_totals.total_points) as running_total_by_month
from
(
select
sum(points) as total_points
, month_of_change
from
(
select
points
, DATE_FORMAT(FROM_UNIXTIME(t.time_of_change), '%Y-%m') as month_of_change -- assumes unix_time; year-month so that years stay separate
from mytable t
) x
group by month_of_change
) monthly_totals
inner join
(
select distinct DATE_FORMAT(FROM_UNIXTIME(t.time_of_change), '%Y-%m') as single_month
from mytable t
) m
on monthly_totals.month_of_change <= m.single_month
group by m.single_month
(N.B: not tested)
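The running-total idea can also be checked outside the database. Below is a small Python sketch using the four sample transactions from the question. It makes two assumptions that are not in the original: timestamps are interpreted in UTC (MySQL's UNIX_TIMESTAMP uses the session time zone), and months without any transaction are filled in with the carried-over balance. `month_end_balances` and `ts` are names invented for this example.

```python
from datetime import datetime, timezone

def month_end_balances(transactions):
    """Map (year, month) -> balance at the end of that month, with no gaps."""
    if not transactions:
        return {}
    change_by_month = {}
    for changed, points in transactions:
        moment = datetime.fromtimestamp(changed, tz=timezone.utc)  # UTC is an assumption
        key = (moment.year, moment.month)
        change_by_month[key] = change_by_month.get(key, 0) + points
    year, month = min(change_by_month)
    last = max(change_by_month)
    balances, running = {}, 0
    while (year, month) <= last:
        running += change_by_month.get((year, month), 0)
        balances[(year, month)] = running
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return balances

def ts(year, month, day, hour=3):
    """Hypothetical helper: build a unix timestamp for a UTC date and hour."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp())

sample = [
    (ts(2014, 3, 27), 100), (ts(2014, 5, 2), 100),
    (ts(2015, 1, 1), -100), (ts(2015, 5, 1), 150),
]
balances = month_end_balances(sample)
```

Keying on `(year, month)` rather than the month number alone keeps March 2014 and March 2015 apart, which matters as soon as the data spans more than one year.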
I have the following MySQL table to log the registration status changes of pupils:
CREATE TABLE `pupil_registration_statuses` (
`status_id` INT(11) NOT NULL AUTO_INCREMENT,
`status_pupil_id` INT(10) UNSIGNED NOT NULL,
`status_status_id` INT(10) UNSIGNED NOT NULL,
`status_effectivedate` DATE NOT NULL,
PRIMARY KEY (`status_id`),
INDEX `status_pupil_id` (`status_pupil_id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
Example data:
INSERT INTO `pupil_registration_statuses` (`status_id`, `status_pupil_id`, `status_status_id`, `status_effectivedate`) VALUES
(1, 123, 1, '2013-05-06'),
(2, 123, 2, '2014-03-15'),
(3, 123, 5, '2013-03-15'),
(4, 123, 6, '2013-05-06'),
(5, 234, 2, '2013-02-02'),
(6, 234, 4, '2013-04-17'),
(7, 345, 2, '2014-02-01'),
(8, 345, 3, '2013-06-01');
Statuses can be inserted after the fact, so the sequence of dates does not necessarily follow the sequence of IDs.
For example: status_id 1 might have a date of 2013-05-06, but status_id 3 might have a date of 2013-03-15.
status_id values are, however, sequential within any particular date. Thus if a pupil's registration status changes multiple times on one day, the last row will reflect their status for that date.
It is necessary to find out a particular student's registration status on a particular date. The following query works for an individual pupil:
SELECT *
FROM pupil_registration_statuses
WHERE status_pupil_id = 123
AND status_effectivedate <= '2013-05-06'
ORDER BY status_effectivedate DESC, status_id DESC
LIMIT 1;
This returns the expected row of status_id = 4
However, I now need to issue a (single) query to return the status for all pupils on a particular date.
The following query is proposed, but doesn't obey the "last status_id in a day" requirement:
SELECT *
FROM pupil_registration_statuses prs
INNER JOIN (SELECT status_pupil_id, MAX(status_effectivedate) last_date
FROM pupil_registration_statuses
WHERE status_effectivedate <= '2013-05-06'
GROUP BY status_pupil_id) qprs ON prs.status_pupil_id = qprs.status_pupil_id AND prs.status_effectivedate = qprs.last_date;
This query, however, returns 2 rows for pupil 123.
EDIT
To clarify, if the input is the date '2013-05-06', I expect to get the rows 4 and 6 from the query.
http://sqlfiddle.com/#!2/68ee6/2
Is this what you're after?
SELECT a.*
FROM pupil_registration_statuses a
JOIN
( SELECT prs.status_pupil_id
, MAX(prs.status_id) max_status_id
FROM pupil_registration_statuses prs
JOIN
( SELECT status_pupil_id
, MAX(status_effectivedate) last_date
FROM pupil_registration_statuses
WHERE status_effectivedate <= '2013-05-06'
GROUP
BY status_pupil_id
) qprs
ON prs.status_pupil_id = qprs.status_pupil_id
AND prs.status_effectivedate = qprs.last_date
GROUP
BY prs.status_pupil_id
) b
ON b.max_status_id = a.status_id;
http://sqlfiddle.com/#!2/68ee6/7
(Incidentally, there's an ugly and undocumented hack for this kind of problem which goes something like this:
SELECT x.* FROM (SELECT * FROM pupil_registration_statuses WHERE status_effectivedate <= '2013-05-06' ORDER BY status_pupil_id, status_effectivedate DESC, status_id DESC)x GROUP BY status_pupil_id;
...but I didn't tell you that! ;) )
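To check the "latest date, then highest status_id on that date" rule against the sample data, here is a sketch that loads the rows into SQLite through Python's `sqlite3` module and runs a nested-join query of the same shape. It is an illustration under assumptions: SQLite stands in for MySQL, the dates are stored as ISO text, and `statuses_on` is a made-up helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE pupil_registration_statuses (
        status_id INTEGER PRIMARY KEY,
        status_pupil_id INTEGER NOT NULL,
        status_status_id INTEGER NOT NULL,
        status_effectivedate TEXT NOT NULL)"""
)
conn.executemany(
    "INSERT INTO pupil_registration_statuses VALUES (?, ?, ?, ?)",
    [
        (1, 123, 1, "2013-05-06"), (2, 123, 2, "2014-03-15"),
        (3, 123, 5, "2013-03-15"), (4, 123, 6, "2013-05-06"),
        (5, 234, 2, "2013-02-02"), (6, 234, 4, "2013-04-17"),
        (7, 345, 2, "2014-02-01"), (8, 345, 3, "2013-06-01"),
    ],
)

def statuses_on(as_of):
    """Return the status_id in force for each pupil on the given ISO date."""
    rows = conn.execute(
        """
        SELECT a.status_id
        FROM pupil_registration_statuses a
        JOIN (
            -- highest status_id among each pupil's rows on their latest date
            SELECT prs.status_pupil_id, MAX(prs.status_id) AS max_status_id
            FROM pupil_registration_statuses prs
            JOIN (
                -- latest effective date per pupil, on or before the cut-off
                SELECT status_pupil_id, MAX(status_effectivedate) AS last_date
                FROM pupil_registration_statuses
                WHERE status_effectivedate <= ?
                GROUP BY status_pupil_id
            ) q ON prs.status_pupil_id = q.status_pupil_id
               AND prs.status_effectivedate = q.last_date
            GROUP BY prs.status_pupil_id
        ) b ON b.max_status_id = a.status_id
        ORDER BY a.status_id
        """,
        (as_of,),
    ).fetchall()
    return [row[0] for row in rows]

result = statuses_on("2013-05-06")
```

For '2013-05-06' this yields rows 4 and 6, matching the expectation stated in the question; pupil 345 has no status on or before that date and is therefore absent.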
If I understood right, you want to...
1) Get 1 row per person.
2) Get the status changes from the specific day you manually input.
3) Get the last status changes from within the specific day.
If that's right, you need the query you already have, ordered by date and then by id, just with a distinct per pupil. Note that DISTINCT ON is PostgreSQL syntax and is not available in MySQL; it also requires the ORDER BY to start with the DISTINCT ON column.
SELECT DISTINCT ON (status_pupil_id) *
FROM pupil_registration_statuses
WHERE status_effectivedate <= '2013-05-06'
ORDER BY status_pupil_id, status_effectivedate DESC, status_id DESC
I have changed the WHERE clause; please try it.
SELECT *
FROM pupil_registration_statuses prs
INNER JOIN (SELECT status_pupil_id, MAX(status_effectivedate) last_date
FROM pupil_registration_statuses
WHERE Datediff(status_effectivedate, '2013-05-06') <= 0
GROUP BY status_pupil_id) qprs ON prs.status_pupil_id = qprs.status_pupil_id AND prs.status_effectivedate = qprs.last_date;
EDIT
Try this
SELECT *
FROM
(
select status_pupil_id,max(status_id) as status_id from pupil_registration_statuses innr
-- where Datediff(status_effectivedate, '2013-05-06') <= 0
group by status_pupil_id
)as ca
inner join pupil_registration_statuses prs on prs.status_id = ca.status_id
where Datediff(prs.status_effectivedate, '2013-05-06') <= 0