Related
I have to write an SQL query.
I'll only include the tables and results I need here - I am sure this is the best way to explain clearly (at the bottom of the question I provide the SQL statements for filling the database).
Short description:
TASK: After a FULL JOIN I get a result where (for example) the tableA.point column (which is used in the SELECT statement) is NULL in some rows. In those cases I need to replace tableA.point with tableB.point (from the joined table).
So, tables:
(Columns point + date form a composite key.)
outcome_o:
income_o:
An example of the result I need (as you can see, I need a combined table with both the out and inc columns in each row):
My attempt:
SELECT outcome_o.point,
outcome_o.date,
inc,
out
FROM income_o
FULL JOIN outcome_o ON income_o.point = outcome_o.point AND income_o.date = outcome_o.date
The result is almost what I need, except for NULLs in some of the point and date columns:
I tried to avoid this with a CASE statement:
SELECT
CASE outcome_o.point
WHEN NULL
THEN income_o.point
ELSE outcome_o.point
END as point,
....
But this does not work as I imagined (all cells in the point column became NULL).
Could anyone help me with this? I know I have to use JOIN, CASE (CASE is mandatory) and possibly UNION.
Thanks
Table creation:
CREATE TABLE income(
point INT,
date VARCHAR(60),
inc FLOAT
)
CREATE TABLE outcome(
point INT,
date VARCHAR(60),
ou_t FLOAT
)
INSERT INTO income VALUES
(1, '2001-03-22', 15000.0000),
(1, '2001-03-23', 15000.0000),
(1, '2001-03-24', 3400.0000),
(1, '2001-04-13', 5000.0000),
(1, '2001-05-11', 4500.0000),
(2, '2001-03-22', 10000.0000),
(2, '2001-03-24', 1500.0000),
(3, '2001-09-13', 11500.0000),
(3, '2001-10-02', 18000.0000);
INSERT INTO outcome VALUES
(1, '2001-03-14 00:00:00.000', 15348.0000),
(1, '2001-03-24 00:00:00.000', 3663.0000),
(1, '2001-03-26 00:00:00.000', 1221.0000),
(1, '2001-03-28 00:00:00.000', 2075.0000),
(1, '2001-03-29 00:00:00.000', 2004.0000),
(1, '2001-04-11 00:00:00.000', 3195.0400),
(1, '2001-04-13 00:00:00.000', 4490.0000),
(1, '2001-04-27 00:00:00.000', 3110.0000),
(1, '2001-05-11 00:00:00.000', 2530.0000),
(2, '2001-03-22 00:00:00.000', 1440.0000),
(2, '2001-03-29 00:00:00.000', 7848.0000),
(2, '2001-04-02 00:00:00.000', 2040.0000),
(3, '2001-09-13 00:00:00.000', 1500.0000),
(3, '2001-09-14 00:00:00.000', 2300.0000),
(3, '2002-09-16 00:00:00.000', 2150.0000);
The first step is to create a date range reference table. To do that, we can use a Common Table Expression (CTE):
WITH RECURSIVE cte AS (
SELECT Min(mndate) mindt, MAX(mxdate) maxdt
FROM (SELECT MIN(date) AS mndate, MAX(date) AS mxdate
FROM outcome
UNION
SELECT MIN(date), MAX(date)
FROM income) v
UNION
SELECT mindt + INTERVAL 1 DAY, maxdt
FROM cte
WHERE mindt + INTERVAL 1 DAY <= maxdt)
SELECT mindt
FROM cte
Here I'm trying to generate a dynamic date range based on the minimum & maximum date values from both of your tables. This is particularly useful when you don't want to keep changing the date range, but if you don't mind doing that, you can just generate the dates more simply, like so:
WITH RECURSIVE cte AS (
SELECT '2001-03-14 00:00:00' dt
UNION
SELECT dt + INTERVAL 1 DAY
FROM cte
WHERE dt + INTERVAL 1 DAY <= '2002-09-16')
SELECT dt
FROM cte
From here, I'll do a CROSS JOIN to get the distinct point values from both tables:
...
CROSS JOIN (SELECT DISTINCT point FROM outcome
UNION
SELECT DISTINCT point FROM income) p
Now we have a reference table with all the points and the full date range. Let's wrap those in another CTE:
WITH RECURSIVE cte AS (
SELECT Min(mndate) mindt, MAX(mxdate) maxdt
FROM (SELECT MIN(date) AS mndate, MAX(date) AS mxdate
FROM outcome
UNION
SELECT MIN(date), MAX(date)
FROM income) v
UNION
SELECT mindt + INTERVAL 1 DAY, maxdt
FROM cte
WHERE mindt + INTERVAL 1 DAY <= maxdt),
cte2 AS (
SELECT point, mindt
FROM cte
CROSS JOIN (SELECT DISTINCT point FROM outcome
UNION
SELECT DISTINCT point FROM income) p)
SELECT *
FROM cte2;
The next step is to take your current query attempt and LEFT JOIN it to the reference table:
WITH RECURSIVE cte AS (
SELECT Min(mndate) mindt, MAX(mxdate) maxdt
FROM (SELECT MIN(date) AS mndate, MAX(date) AS mxdate
FROM outcome
UNION
SELECT MIN(date), MAX(date)
FROM income) v
UNION
SELECT mindt + INTERVAL 1 DAY, maxdt
FROM cte
WHERE mindt + INTERVAL 1 DAY <= maxdt),
cte2 AS (
SELECT point, CAST(mindt AS DATE) AS rdate
FROM cte
CROSS JOIN (SELECT DISTINCT point FROM outcome
UNION
SELECT DISTINCT point FROM income) p)
SELECT *
FROM cte2
LEFT JOIN outcome
ON cte2.point=outcome.point
AND cte2.rdate=outcome.date
LEFT JOIN income
ON cte2.point=income.point
AND cte2.rdate=income.date
/*added conditions*/
WHERE cte2.point=1
AND COALESCE(outcome.date, income.date) IS NOT NULL
/*****/
ORDER BY cte2.rdate;
I noticed that your date column uses the VARCHAR() datatype instead of DATE or DATETIME, which is why my initial test returned only one result. However, I noticed that if I compare a YYYY-MM-DD value against your table's date values, it returns the other results, which is why I did CAST(mindt AS DATE) AS rdate in cte2. I do recommend that you change the date column to a standard MySQL date type, though.
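For example, a minimal migration sketch (my suggestion only, assuming every stored value parses cleanly as a date; verify on a copy of the data first):
ALTER TABLE income ADD COLUMN date_d DATE;
ALTER TABLE outcome ADD COLUMN date_d DATE;
-- sketch only: populate the new columns, then check the results
UPDATE income SET date_d = CAST(date AS DATE);
UPDATE outcome SET date_d = CAST(date AS DATE);
-- once verified, drop the old date column and rename date_d to date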
You probably find the query a bit too long, but if you have a table where you store dates (a so-called calendar table), the query becomes much shorter, perhaps like this (a sketch for populating such a calendar table follows the demo link):
SELECT *
FROM calendar
LEFT JOIN outcome
ON calendar.point=outcome.point
AND calendar.rdate=outcome.date
LEFT JOIN income
ON calendar.point=income.point
AND calendar.rdate=income.date
/*added conditions*/
WHERE calendar.point=1
AND COALESCE(outcome.date, income.date) IS NOT NULL
/*****/
ORDER BY calendar.rdate;
Demo fiddle
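If you don't already have such a calendar table, one way to create and populate it is to reuse the same CTEs from above (a sketch, assuming MySQL 8+; the table and column names calendar, point and rdate match the query above):
CREATE TABLE calendar (point INT, rdate DATE);
INSERT INTO calendar (point, rdate)
WITH RECURSIVE cte AS (
SELECT MIN(mndate) mindt, MAX(mxdate) maxdt
FROM (SELECT MIN(date) AS mndate, MAX(date) AS mxdate
FROM outcome
UNION
SELECT MIN(date), MAX(date)
FROM income) v
UNION
SELECT mindt + INTERVAL 1 DAY, maxdt
FROM cte
WHERE mindt + INTERVAL 1 DAY <= maxdt),
cte2 AS (
SELECT point, CAST(mindt AS DATE) AS rdate
FROM cte
CROSS JOIN (SELECT DISTINCT point FROM outcome
UNION
SELECT DISTINCT point FROM income) p)
SELECT point, rdate FROM cte2;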
It seems I was using the wrong syntax. As I found out, this kind of conditional column selection is possible in the SELECT clause.
Correct CASE statement:
(
CASE
WHEN outcome_o.point IS NULL
THEN income_o.point
ELSE outcome_o.point
END
) as point,
In this case the query selects the joined table's column whenever the main table's column is NULL.
Full query (returns exactly the result I need):
SELECT
(
CASE
WHEN outcome_o.point IS NULL
THEN income_o.point
ELSE outcome_o.point
END
) as point,
(
CASE
WHEN outcome_o.date IS NULL
THEN income_o.date
ELSE outcome_o.date
END
) as date,
inc,
out
FROM income_o
FULL JOIN outcome_o ON income_o.point = outcome_o.point AND income_o.date = outcome_o.date
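As a side note, the same NULL fallback can be written more compactly with COALESCE, which should be equivalent to the CASE version above (a sketch, not tested against the original data):
SELECT COALESCE(outcome_o.point, income_o.point) AS point,
COALESCE(outcome_o.date, income_o.date) AS date,
inc,
out
FROM income_o
FULL JOIN outcome_o ON income_o.point = outcome_o.point AND income_o.date = outcome_o.date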
I want to apply a condition to a column selection while performing the SELECT statement.
I want to take the average of TOTAL_TIMEONSITE, rename it, and average only the values from Jun '20, Jul '20 and Aug '20 for each visitor.
Also, the range of the whole query must be the month of Aug '20 only, so I want to constrain TOTAL_TIMEONSITE so that it averages the values for Jun '20, Jul '20 and Aug '20 per visitor.
select FULLVISITORID AS VISITOR_ID,
VISITID AS VISIT_ID,
VISITSTARTTIME_TS,
USER_ACCOUNT_TYPE,
(select AVG(TOTAL_TIMEONSITE) AS AVG_TOTAL_TIME_ON_SITE_LAST_3M FROM "ACRO_DEV"."GA"."GA_MAIN" WHERE
(cast((visitstarttime_ts) as DATE) >= to_date('2020-06-01 00:00:00.000') and CAST((visitstarttime_ts) AS DATE) <= to_date('2020-08-31 23:59:00.000'))
GROUP BY TOTAL_TIMEONSITE),
CHANNELGROUPING,
GEONETWORK_CONTINENT
from "ACRO_DEV"."GA"."GA_MAIN"
where (FULLVISITORID) in (select distinct (FULLVISITORID) from "ACRO_DEV"."GA"."GA_MAIN" where user_account_type in ('anonymous', 'registered')
and (cast((visitstarttime_ts) as DATE) >= to_date('2020-08-01 00:00:00.000') and CAST((visitstarttime_ts) AS DATE) <= to_date('2020-08-31 23:59:00.000')));
The issue is that it gives me the text of the subquery for TOTAL_TIMEONSITE as the resulting column name, and the values in that column are all the same, but I want the value to be specific to each visitor.
So for Snowflake:
I am going to assume visitstarttime_ts is a timestamp, thus
cast(visitstarttime_ts as DATE) is the same as visitstarttime_ts::date:
select to_timestamp('2020-08-31 23:59:00') as ts
,cast((ts) as DATE) as date_a
,ts::date as date_b;
gives:
| TS                      | DATE_A     | DATE_B     |
|-------------------------|------------|------------|
| 2020-08-31 23:59:00.000 | 2020-08-31 | 2020-08-31 |
and thus the date range check can also be simpler:
select to_timestamp('2020-08-31 13:59:00') as ts
,cast((ts) as DATE) as date_a
,ts::date as date_b
,date_a >= to_date('2020-08-01 00:00:00.000') and date_a <= to_date('2020-08-31 23:59:00.000') as comp_a
,date_b >= to_date('2020-08-01 00:00:00.000') and date_b <= to_date('2020-08-31 23:59:00.000') as comp_b
,date_b >= '2020-08-01'::date and date_a <= '2020-08-31 23:59:00.000'::date as comp_c
,date_b between '2020-08-01'::date and '2020-08-31 23:59:00.000'::date as comp_d
gives:
| TS                      | DATE_A     | DATE_B     | COMP_A | COMP_B | COMP_C | COMP_D |
|-------------------------|------------|------------|--------|--------|--------|--------|
| 2020-08-31 13:59:00.000 | 2020-08-31 | 2020-08-31 | TRUE   | TRUE   | TRUE   | TRUE   |
Anyway, if I understand what you want, I would write it using CTEs to make it more readable (to me):
with distinct_aug_ids as (
SELECT DISTINCT
fullvisitorid
FROM acro_dev.ga.ga_main
WHERE user_account_type IN ('anonymous', 'registered')
AND visitstarttime_ts::date BETWEEN '2020-08-01'::date AND '2020-08-31'::date
), three_month_avg as (
SELECT
fullvisitorid
,AVG(total_timeonsite) AS avg_total_time_on_site_last_3m
FROM acro_dev.ga.ga_main
WHERE visitstarttime_ts::DATE BETWEEN to_date('2020-06-01 00:00:00.000') AND to_date('2020-08-31 23:59:00.000')
GROUP BY 1
)
select
m.fullvisitorid as visitor_id,
m.visitid as visit_id,
m.visitstarttime_ts,
m.user_account_type,
tma.avg_total_time_on_site_last_3m,
m.channelgrouping,
m.geonetwork_continent
FROM acro_dev.ga.ga_main as m
JOIN distinct_aug_ids AS dai
ON m.fullvisitorid = dai.fullvisitorid
JOIN three_month_avg AS tma
ON m.fullvisitorid = tma.fullvisitorid
;
But if you want those to be sub-selects, they are the same:
select
m.fullvisitorid as visitor_id,
m.visitid as visit_id,
m.visitstarttime_ts,
m.user_account_type,
tma.avg_total_time_on_site_last_3m,
m.channelgrouping,
m.geonetwork_continent
FROM acro_dev.ga.ga_main as m
JOIN (
SELECT DISTINCT
fullvisitorid
FROM acro_dev.ga.ga_main
WHERE user_account_type IN ('anonymous', 'registered')
AND visitstarttime_ts::date BETWEEN '2020-08-01'::date AND '2020-08-31'::date
) AS dai
ON m.fullvisitorid = dai.fullvisitorid
JOIN (
SELECT
fullvisitorid
,AVG(total_timeonsite) AS avg_total_time_on_site_last_3m
FROM acro_dev.ga.ga_main
WHERE visitstarttime_ts::DATE BETWEEN to_date('2020-06-01 00:00:00.000') AND to_date('2020-08-31 23:59:00.000')
GROUP BY 1
)AS tma
ON m.fullvisitorid = tma.fullvisitorid
;
I am trying to track a player's rank change on the leaderboard by month and year. Because some players play no games during certain periods, their rank may be lower during that period.
A simplified version of the table can be created like this:
create table rating
(player_id Integer(20) ,
game_id integer(20),
start_date_time date,
rating int (10)
)
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (1, 1,'2019-01-02',1250);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (1, 2,'2019-01-03',2230);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (1, 3,'2019-02-04',3362);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (1, 4,'2019-02-05',1578);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (2, 5,'2019-01-03',2269);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (2, 6,'2019-01-05',3641);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (2, 7,'2019-02-07',1548);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (2, 8,'2019-02-09',1100);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (3, 9,'2019-01-03',4690);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (3, 10,'2019-01-05',3258);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (3, 11,'2019-01-07',1520);
INSERT INTO rating (player_id, game_id,start_date_time,rating) VALUES (3, 12,'2019-01-09',3652);
The query I used is as follows:
select q1.rating_rank, q1.rating, q1.month,q1.year from (
SELECT player_id,month(start_date_time) as month, year(start_date_time) as year, round(avg(rating),2) as rating, count(*) as games_played,
rank() over(
partition by year(start_date_time),month(start_date_time)
order by avg(rating) desc ) as rating_rank
FROM rating
group by player_id,month(start_date_time), year(start_date_time)
having rating is not null) as q1
where player_id=1
The result I got is:
rating_rank rating month year
3 1740.00 1 2019
1 2470.00 2 2019
But the third player (id=3) is clearly the best of them; because he didn't play in February, the first player can be ranked no. 1.
In this situation I still want the third player to be no. 1 on the leaderboard. How should I fix this?
I am thinking maybe I can use a period of 15 days before and 15 days after the date instead of the exact month, but I'm not sure how exactly that can be done.
Thank you.
It took some time, but I think you can accomplish this with the query below.
First you have to generate all combinations of the possible dates and players.
This will give the dates:
WITH recursive mnths as (
select date_add(min(start_date_time),interval -DAY(min(start_date_time))+1 DAY) as mnth ,
date_add(max(start_date_time),interval -DAY(max(start_date_time))+1 DAY) as maxmnth from rating
UNION ALL -- start date begining of next month
SELECT DATE_ADD(mnth, INTERVAL +1 MONTH) , maxmnth
FROM mnths WHERE
mnth < maxmnth
)
And this will combine them with all the players:
select * from mnths
cross join (Select distinct player_id from rating) as P
Then, besides the calculation involved, you also need to get the rating value up to the current period. This is done by this subquery:
(
SELECT
round(avg(rating),2) as rating
from
rating
where start_date_time < mnths.mnth and player_id = P.player_id
group by player_id,month(start_date_time), year(start_date_time)
having round(avg(rating),2) is not null
order by year(start_date_time) desc, month(start_date_time) desc
limit 1 ) as PrevRating,
That will allow you to rank using not only the current rating but also the previous one when the current one does not exist:
order by CASE WHEN AUX.rating IS NULL THEN case WHEN AUX.PrevRating IS NULL THEN 0 ELSE AUX.PrevRating END ELSE AUX.rating END desc
Putting it all together, you'll end up with this:
WITH recursive mnths as (
select date_add(min(start_date_time),interval -DAY(min(start_date_time))+1 DAY) as mnth ,
date_add(max(start_date_time),interval -DAY(max(start_date_time))+1 DAY) as maxmnth from rating
UNION ALL -- start date begining of next month
SELECT DATE_ADD(mnth, INTERVAL +1 MONTH) , maxmnth
FROM mnths WHERE
mnth < maxmnth
)
select q1.rating_rank, q1.rating, q1.month,q1.year from (
select AUX.player_id, AUX.month, AUX.year, CASE WHEN AUX.rating IS NULL THEN case WHEN AUX.PrevRating IS NULL THEN 0 ELSE AUX.PrevRating END ELSE AUX.rating END as rating,
AUX.games_played,
rank() over(
partition by AUX.month, AUX.year
order by CASE WHEN AUX.rating IS NULL THEN case WHEN AUX.PrevRating IS NULL THEN 0 ELSE AUX.PrevRating END ELSE AUX.rating END desc ) as rating_rank
FROM(
select
P.player_id,
MONTH(mnths.mnth) as month,
YEAR(mnths.mnth) as year,
(
SELECT
round(avg(rating),2) as rating
from
rating
where start_date_time < mnths.mnth and player_id = P.player_id
group by player_id,month(start_date_time), year(start_date_time)
having round(avg(rating),2) is not null
order by year(start_date_time) desc, month(start_date_time) desc
limit 1 ) as PrevRating,
V.rating rating,
case when V.games_played IS NULL THEN 0 ELSE V.games_played END as games_played
from mnths
cross join (Select distinct player_id from rating) as P
LEFT JOIN
(SELECT
player_id,
month(start_date_time) as month,
year(start_date_time) as year,
round(avg(rating),2) as rating,
count(*) as games_played
from
rating as R
group by player_id,month(start_date_time), year(start_date_time)
having rating is not null
) V On YEAR(mnths.mnth) = V.year and MONTH(mnths.mnth) = V.month and P.player_id = V.player_id
) as AUX
) as q1
where q1.player_id=1
You can see the result here.
I have the following columns in a table called meetings: meeting_id - int, start_time - time, end_time - time. Assuming that this table has data for one calendar day only, what is the minimum number of rooms needed to accommodate all the meetings? Room size and the number of people attending don't matter.
Here's the solution:
select * from
(select t.start_time,
t.end_time,
count(*) - 1 overlapping_meetings,
count(*) minimum_rooms_required,
group_concat(distinct concat(y.start_time,' to ',t.end_time)
separator ' // ') meeting_details from
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') t left join
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') y
on t.start_time between y.start_time and y.end_time
group by start_time, end_time) z;
My question - is there anything wrong with this answer? Even if there's nothing wrong with this, can someone share a better answer?
Let's say you have a table called meetings with an id, a start_time and an end_time.
Then you can use this query to get the minimum number of meeting rooms required to accommodate all meetings:
select max(minimum_rooms_required)
from (select count(*) minimum_rooms_required
from meetings t
left join meetings y on t.start_time >= y.start_time and t.start_time < y.end_time group by t.id
) z;
This looks clearer and simpler, and it works fine.
Meetings can "overlap". So, GROUP BY start_time, end_time can't figure this out.
Not every algorithm can be done in SQL. Or, at least, it may be grossly inefficient.
I would use a real programming language for the computation, leaving the database for what it is good at -- being a data repository.
Build an array of 1440 entries (one per minute of the day); initialize all of them to 0.
For each meeting:
    For each minute in the meeting (excluding the last minute):
        increment that minute's element in the array.
The largest element in the array is the number of rooms needed.
CREATE TABLE [dbo].[Meetings](
[id] [int] NOT NULL,
[Starttime] [time](7) NOT NULL,
[EndTime] [time](7) NOT NULL
) ON [PRIMARY]
GO
sample data set:
INSERT INTO Meetings VALUES (1,'8:00','09:00')
INSERT INTO Meetings VALUES (2,'8:00','10:00')
INSERT INTO Meetings VALUES (3,'10:00','11:00')
INSERT INTO Meetings VALUES (4,'11:00','12:00')
INSERT INTO Meetings VALUES (5,'11:00','13:00')
INSERT INTO Meetings VALUES (6,'13:00','14:00')
INSERT INTO Meetings VALUES (7,'13:00','15:00')
To find the minimum number of rooms required, run the query below:
create table #TempMeeting
(
id int,Starttime time,EndTime time,MeetingRoomNo int,Rownumber int
)
insert into #TempMeeting select id, Starttime,EndTime,0 as MeetingRoomNo,ROW_NUMBER()
over (order by starttime asc) as Rownumber from Meetings
declare @RowCounter int
select top 1 @RowCounter=Rownumber from #TempMeeting order by Rownumber
WHILE @RowCounter<=(Select count(*) from #TempMeeting)
BEGIN
update #TempMeeting set MeetingRoomNo=1
where Rownumber=(select top 1 Rownumber from #TempMeeting where
Rownumber>@RowCounter and Starttime>=(select top 1 EndTime from #TempMeeting
where Rownumber=@RowCounter) and MeetingRoomNo=0)
set @RowCounter=@RowCounter+1
END
select count(*) from #TempMeeting where MeetingRoomNo=0
Consider a table meetings with columns id, start_time and end_time. Then the following query should give the correct answer:
with mod_meetings as (select id, to_timestamp(start_time, 'HH24:MI')::TIME as start_time,
to_timestamp(end_time, 'HH24:MI')::TIME as end_time from meetings)
select CASE when max(a_cnt)>1 then max(a_cnt)+1
when max(a_cnt)=1 and max(b_cnt)=1 then 2 else 1 end as rooms
from
(select count(*) as a_cnt, a.id, count(b.id) as b_cnt from mod_meetings a left join mod_meetings b
on a.start_time>b.start_time and a.start_time<b.end_time group by a.id) join_table;
Sample DATA:
DROP TABLE IF EXISTS meeting;
CREATE TABLE "meeting" (
"meeting_id" INTEGER NOT NULL UNIQUE,
"start_time" TEXT NOT NULL,
"end_time" TEXT NOT NULL,
PRIMARY KEY("meeting_id")
);
INSERT INTO meeting values (1,'08:00','14:00');
INSERT INTO meeting values (2,'09:00','10:30');
INSERT INTO meeting values (3,'11:00','12:00');
INSERT INTO meeting values (4,'12:00','13:00');
INSERT INTO meeting values (5,'10:15','11:00');
INSERT INTO meeting values (6,'12:00','13:00');
INSERT INTO meeting values (7,'10:00','10:30');
INSERT INTO meeting values (8,'11:00','13:00');
INSERT INTO meeting values (9,'11:00','14:00');
INSERT INTO meeting values (10,'12:00','14:00');
INSERT INTO meeting values (11,'10:00','14:00');
INSERT INTO meeting values (12,'12:00','14:00');
INSERT INTO meeting values (13,'10:00','14:00');
INSERT INTO meeting values (14,'13:00','14:00');
Solution:
DROP VIEW IF EXISTS Final;
CREATE VIEW Final AS SELECT time, group_concat(event), sum(num) num from (
select start_time time, 's' event, 1 num from meeting
union all
select end_time time, 'e' event, -1 num from meeting)
group by 1
order by 1;
select max(room) AS Min_Rooms_Required FROM (
select
a.time,
sum(b.num) as room
from
Final a
, Final b
where a.time >= b.time
group by a.time
order by a.time
);
Here's an explanation of gashu's nicely working code (or, if you prefer, a non-code explanation of how to solve it in any language).
Firstly, if the column 'minimum_rooms_required' were renamed to 'overlap', the whole thing would be much easier to understand, because for each start or end time we want to know the number of overlapping ongoing meetings. Once we have found the maximum, there is no way of getting by with fewer rooms than that amount, because, well, the meetings overlap.
By the way, I think there might be a mistake in the code. It should check for t.start_time or t.end_time between y.start_time and y.end_time. Counterexample: meeting 1 starts at 8:00, ends at 11:00 and meeting 2 starts at 10:00, ends at 12:00.
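In gashu's join, the suggested condition would look something like this (a sketch of the suggestion only, not tested):
on t.start_time between y.start_time and y.end_time
or t.end_time between y.start_time and y.end_time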
(I'd post this as a comment on gashu's answer, but I don't have enough reputation.)
I'd go for the LEAD() analytic function:
select
sum(needs_room_ind) as min_rooms
from (
select
id,
start_time,
end_time,
case when lead(start_time,1) over (order by start_time asc) between start_time
and end_time then 1 else 0 end as needs_room_ind
from
meetings
) a
IMO, I want to take the difference between how many meetings have started and how many have ended at the moment each meeting_id starts (assuming meetings start and end on time).
My code was just like this:
with alpha as
(
select a.meeting_id,a.start_time,
count(distinct b.meeting_id) ttl_meeting_start_before,
count(distinct c.meeting_id) ttl_meeting_end_before
from meeting a
left join
(
select meeting_id,start_time from meeting
) b
on a.start_time > b.start_time
left join
(
select meeting_id,end_time from meeting
) c
on a.start_time > c.end_time
group by a.meeting_id,a.start_time
)
select max(ttl_meeting_start_before-ttl_meeting_end_before) max_meeting_room
from alpha
Could anybody give me an idea or hint on how to check for X consecutive days in a database table (MySQL) where logins (user id, timestamp) are stored?
Stack Overflow does it (e.g. badges like Enthusiast, awarded if you log in for 30 consecutive days or so). What functions would you have to use, or what is the general idea of how to do it?
Something like SELECT 1 FROM login_dates WHERE ...?
You can accomplish this using a shifted self-outer-join in conjunction with a variable. See this solution:
SELECT IF(COUNT(1) > 0, 1, 0) AS has_consec
FROM
(
SELECT *
FROM
(
SELECT IF(b.login_date IS NULL, #val:=#val+1, #val) AS consec_set
FROM tbl a
CROSS JOIN (SELECT #val:=0) var_init
LEFT JOIN tbl b ON
a.user_id = b.user_id AND
a.login_date = b.login_date + INTERVAL 1 DAY
WHERE a.user_id = 1
) a
GROUP BY a.consec_set
HAVING COUNT(1) >= 30
) a
This will return either a 1 or a 0 based on whether a user has logged in for 30 consecutive days or more at ANY TIME in the past.
The brunt of this query is really in the first subselect. Let's take a closer look so we can better understand how this works:
With the following example data set:
CREATE TABLE tbl (
user_id INT,
login_date DATE
);
INSERT INTO tbl VALUES
(1, '2012-04-01'), (2, '2012-04-02'),
(1, '2012-04-25'), (2, '2012-04-03'),
(1, '2012-05-03'), (2, '2012-04-04'),
(1, '2012-05-04'), (2, '2012-05-04'),
(1, '2012-05-05'), (2, '2012-05-06'),
(1, '2012-05-06'), (2, '2012-05-08'),
(1, '2012-05-07'), (2, '2012-05-09'),
(1, '2012-05-09'), (2, '2012-05-11'),
(1, '2012-05-10'), (2, '2012-05-17'),
(1, '2012-05-11'), (2, '2012-05-18'),
(1, '2012-05-12'), (2, '2012-05-19'),
(1, '2012-05-16'), (2, '2012-05-20'),
(1, '2012-05-19'), (2, '2012-05-21'),
(1, '2012-05-20'), (2, '2012-05-22'),
(1, '2012-05-21'), (2, '2012-05-25'),
(1, '2012-05-22'), (2, '2012-05-26'),
(1, '2012-05-25'), (2, '2012-05-27'),
(2, '2012-05-28'),
(2, '2012-05-29'),
(2, '2012-05-30'),
(2, '2012-05-31'),
(2, '2012-06-01'),
(2, '2012-06-02');
This query:
SELECT a.*, b.*, IF(b.login_date IS NULL, #val:=#val+1, #val) AS consec_set
FROM tbl a
CROSS JOIN (SELECT #val:=0) var_init
LEFT JOIN tbl b ON
a.user_id = b.user_id AND
a.login_date = b.login_date + INTERVAL 1 DAY
WHERE a.user_id = 1
Running it will produce one row per login of user 1, with the shifted row (or NULLs) alongside it. As you can see, what we are doing is shifting the joined table by +1 day; for each day that is not consecutive with the prior day, a NULL value is generated by the LEFT JOIN.
Now that we know where the non-consecutive days are, we can use a variable to differentiate each set of consecutive days by detecting whether or not the shifted table's row is NULL. If it is NULL, the days are not consecutive, so we increment the variable; if it is NOT NULL, we don't.
After we've differentiated each set of consecutive days with the incrementing variable, it's then just a simple matter of grouping by each "set" (as defined in the consec_set column) and using HAVING to filter out any set that has fewer than the specified number of consecutive days (30 in your example); see the sketch below.
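Taken on its own, that grouping step (extracted from the full query above, with the count exposed for illustration) would look like this:
SELECT a.consec_set, COUNT(1) AS consec_days
FROM
(
SELECT IF(b.login_date IS NULL, @val:=@val+1, @val) AS consec_set
FROM tbl a
CROSS JOIN (SELECT @val:=0) var_init
LEFT JOIN tbl b ON
a.user_id = b.user_id AND
a.login_date = b.login_date + INTERVAL 1 DAY
WHERE a.user_id = 1
) a
GROUP BY a.consec_set
HAVING COUNT(1) >= 30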
Then finally, we wrap THAT query and simply count the number of sets that had 30 or more consecutive days. If there was one or more of these sets, then return 1, otherwise return 0.
See a SQLFiddle step-by-step demo
You can add X days to the timestamp date and check whether the count of distinct dates in this date range equals X.
At least once every day of those 30 days:
SELECT distinct 1
FROM
login_dates l1
inner join
login_dates l2
on l1.user = l2.user and
l2.timestamp between l1.timestamp and
date_add( l1.timestamp, Interval X day )
where l1.user = some_user
group by
DATE(l1.timestamp)
having
count( distinct DATE(l2.timestamp) ) = X
(You don't speak about performance requirements ... ;) )
* Edited * The query for only the last X days (at least once every day of those 30 days):
SELECT distinct 1
FROM
login_dates l1
where l1.user = some_user
and l1.timestamp > date_add( CURDATE() , Interval -X day )
group by
l1.user
having
count( distinct DATE(l1.timestamp) ) = X
That's a hard problem to solve with SQL alone.
The core of the problem is that you need to compare dynamic result sets to each other in one query. For example, you need to get all the logins/session IDs for one date, then JOIN or UNION them with the grouping of logins from an adjacent date (which you could use DATE_ADD to determine). You could do this for N consecutive dates. If you have any rows left, then those sessions have logged in over that whole period.
Assume the following table:
sessionid int, created date
This query returns all the sessionids that have rows for the last two days:
select t1.sessionid from logins t1
join logins t2 on t1.sessionid=t2.sessionid
where t1.created = DATE(date_sub(now(), interval 2 day))
AND t2.created = DATE(date_sub(now(), interval 1 day));
As you can see, the SQL will get gnarly for 30 days. Have a script generate it. :-D
This further assumes that every day, the login table is updated with the session.
I don't know if this actually solves your problem, but I hope I have helped frame the problem.
Good luck.
Wouldn't it be simpler to have an extra column consecutive_days in the login_dates table with a default value of 1? This would indicate the length of the run of consecutive dates ending on that day.
You create an insert trigger on login_dates in which you check whether there is an entry for the previous day.
If there is none, the field keeps the default value 1, meaning that a new sequence starts on that date.
If there is an entry for the previous day, then you change the consecutive_days value from the default 1 to be 1 greater than that of the previous day (a sketch of such a trigger follows the example below).
Ex:
| date | consecutive_days |
|------------|------------------|
| 2013-11-13 | 5 |
| 2013-11-14 | 6 |
| 2013-11-16 | 1 |
| 2013-11-17 | 2 |
| 2013-11-18 | 3 |
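A minimal sketch of such a trigger, assuming MySQL and a hypothetical schema login_dates(user_id INT, login_date DATE, consecutive_days INT NOT NULL DEFAULT 1). It uses BEFORE INSERT rather than AFTER INSERT because only a BEFORE trigger can set a value on the row being inserted:
DELIMITER //
CREATE TRIGGER login_dates_consecutive
BEFORE INSERT ON login_dates
FOR EACH ROW
BEGIN
  DECLARE prev_days INT;
  -- look for the same user's row for the previous day
  SELECT consecutive_days INTO prev_days
  FROM login_dates
  WHERE user_id = NEW.user_id
    AND login_date = NEW.login_date - INTERVAL 1 DAY
  LIMIT 1;
  -- if found, continue the run; otherwise keep the default of 1
  IF prev_days IS NOT NULL THEN
    SET NEW.consecutive_days = prev_days + 1;
  END IF;
END//
DELIMITER ;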