I have a table that contains an orderId, a timestamp and a customerId, like this:
DROP TABLE IF EXISTS testdata;
CREATE TABLE testdata (
`orderId` int,
`createdOn` datetime(6),
`customerId` int,
PRIMARY KEY (`orderId`)
);
INSERT INTO testdata (orderId, createdOn, customerId) VALUES
('1000001','2020-01-01 17:08:41.460000','101'),
('1000002','2020-01-02 18:01:00.180000','102'),
('1000003','2020-01-03 12:26:02.460000','103'),
('1000004','2020-01-04 13:32:42.610000','104'),
('1000005','2020-01-05 20:21:28.540000','101'),
('1000006','2020-01-06 11:54:20.530000','102'),
('1000007','2020-02-01 20:54:42.470000','102'),
('1000008','2020-02-02 10:21:29.470000','102'),
('1000009','2020-02-03 16:22:23.880000','102'),
('1000010','2020-02-04 16:22:23.880000','103'),
('1000011','2020-02-05 17:08:41.460000','103'),
('1000012','2020-02-06 18:01:00.180000','103'),
('1000013','2020-03-01 12:26:02.460000','102'),
('1000014','2020-03-02 13:32:42.610000','102'),
('1000015','2020-03-03 20:21:28.540000','103'),
('1000016','2020-03-04 11:54:20.530000','103'),
('1000017','2020-03-05 20:54:42.470000','104'),
('1000018','2020-03-06 10:21:29.470000','104'),
('1000019','2020-04-01 16:22:23.880000','103'),
('1000020','2020-04-02 16:22:23.880000','103'),
('1000021','2020-04-03 17:08:41.460000','103'),
('1000022','2020-04-04 18:01:00.180000','104'),
('1000023','2020-04-05 12:26:02.460000','104'),
('1000024','2020-04-06 13:32:42.610000','104'),
('1000025','2020-05-01 20:21:28.540000','103'),
('1000026','2020-05-02 11:54:20.530000','103'),
('1000027','2020-05-03 20:54:42.470000','104'),
('1000028','2020-05-04 10:21:29.470000','104'),
('1000029','2020-05-05 16:22:23.880000','105'),
('1000030','2020-05-06 16:22:23.880000','105'),
('1000031','2020-05-01 20:21:28.540000','104'),
('1000032','2020-05-02 11:54:20.530000','104'),
('1000033','2020-05-03 20:54:42.470000','104'),
('1000034','2020-05-04 10:21:29.470000','105'),
('1000035','2020-05-05 16:22:23.880000','105'),
('1000036','2020-05-06 16:22:23.880000','105')
;
Now I want to calculate for each month the number of customers that have been active (i.e., have an order) within the last 3 months (i.e., current month or the preceding two months).
I manage to calculate the active users for the current month, like this:
SELECT
EXTRACT(YEAR_MONTH FROM createdOn) AS order_createdOn_ym
,COUNT(DISTINCT customerId) AS mau
FROM testdata
GROUP BY order_createdOn_ym
ORDER BY order_createdOn_ym asc
;
(Fiddle over here.)
However, I'm completely stumped as to how you can approach calculating the 3-months-active users.
Any help is greatly appreciated!
Here is one option:
select c.createdmonth, count(distinct customerid) as mau
from (
select distinct date_format(createdon, '%Y-%m-01') as createdmonth
from testdata
) c
left join testdata t
on t.createdon >= c.createdmonth - interval 2 month
and t.createdon < c.createdmonth + interval 1 month
group by c.createdmonth
The idea is to enumerate the distinct months, then bring in the table with a left join that picks up the current month and the two preceding months. You can then aggregate and count the number of distinct customers per month.
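As a cross-check of that logic outside the database, here is a minimal Python sketch of the same rolling window. The order list is a hand-picked subset of the question's sample data, and the helper names (month_index, active_customers_3m) are made up for illustration.

```python
from datetime import date

# Illustrative (order_date, customer_id) pairs; a subset of the question's sample data.
orders = [
    (date(2020, 1, 1), 101),
    (date(2020, 1, 2), 102),
    (date(2020, 1, 3), 103),
    (date(2020, 2, 1), 102),
    (date(2020, 2, 4), 103),
    (date(2020, 3, 5), 104),
    (date(2020, 4, 1), 103),
    (date(2020, 4, 4), 104),
]


def month_index(d):
    """Map a date to a running month number so that months can be subtracted."""
    return d.year * 12 + (d.month - 1)


def active_customers_3m(orders):
    """For each month that has orders, count the distinct customers with an
    order in that month or in the two preceding months."""
    months = sorted({month_index(d) for d, _ in orders})
    result = {}
    for m in months:
        customers = {c for d, c in orders if m - 2 <= month_index(d) <= m}
        year, month0 = divmod(m, 12)
        result[f"{year}-{month0 + 1:02d}"] = len(customers)
    return result


mau_3m = active_customers_3m(orders)
print(mau_3m)
```

Customer 101 only ordered in January, so it still counts for March but drops out of the April window.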
Thanks to @GMB for providing the solution. Purely as a matter of taste, though, I prefer to express the month interval the following way:
SELECT date_format(c.end_of_createdOn_month, '%Y-%m') as order_month,
count(distinct customerid) as mau_3m
FROM (
select distinct LAST_DAY(createdOn) as end_of_createdOn_month
from testdata
) c
LEFT JOIN testdata t
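ON t.createdon >= LAST_DAY(c.end_of_createdOn_month - interval 3 month) + interval 1 day -- first day of the month two months back
AND t.createdon < c.end_of_createdOn_month + interval 1 day -- LAST_DAY() is midnight, so extend to the end of that day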
GROUP BY c.end_of_createdOn_month;
I have a database which looks like this:
https://i.imgur.com/dwaqUF2.png
I want to select all rooms which don't have any reservation for a given time period.
Here is what I tried:
SELECT idRoom, type, beds FROM Room r
INNER JOIN Reservation_has_Room has on r.idRoom = has.Room_idRoom
INNER JOIN Rezervation re on has.Reservation_has_Room = re.idReservation
WHERE (re.checkIn<'2019-06-04'
AND re.checkOut<'2019-06-01')
or
(re.checkIn>'2019-06-04'
AND re.CheckOut>'2019-06-01');
But this script returns a room every time it finds a reservation which is not overlapping the date.
edit:
I think I might be misunderstood.
This is the case:
In Room I have these records:
idRoom, type, beds
'1', 'Standard', '1'
'2', 'Standard', '1'
Reservation:
idReservation, checkIn, checkOut
'1', '2019-05-22', '2019-06-03'
'2', '2021-05-22', '2021-06-03'
'3', '2022-05-22', '2022-06-03'
Reservation_has_Room
Reservation_idReservation, Room_idRoom
'1', '1'
'2', '1'
'3', '1'
So, as you can see, room nr 1 has 3 reservations. One overlaps the period and the rest do not. In this case, the script from @AlexYes returns this:
idRoom, type, beds
'1', 'Standard', '1'
'1', 'Standard', '1'
'2', 'Standard', '1'
So it returns room number one, even two times.
My expected result is :
idRoom, type, beds
'2', 'Standard', '1'
So only room nr 2 is available.
I would use not exists:
SELECT r.*
FROM Room r
WHERE NOT EXISTS (SELECT 1
FROM Reservation_has_Room rhr INNER JOIN
Rezervation re
ON rhr.Reservation_has_Room = re.idReservation
WHERE r.idRoom = rhr.Room_idRoom AND
re.checkIn <= '2019-06-04' AND
re.checkOut >= '2019-06-01'
);
The logic for the overlap is simple: if someone checks in on or before the last date and checks out on or after the first date, then there is an overlap and the room is not available.
This accounts for all the possible ways the intervals could overlap. Note that this includes all four days; you might or might not want to include the first and last days depending on your rules for check in/check out.
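To see the rule at work on the sample rows from the question's edit, here is a small Python sketch. The data structures and function names are only illustrative; the overlap test itself is the same pair of comparisons as in the query.

```python
from datetime import date


def overlaps(check_in, check_out, period_start, period_end):
    """Closed-interval overlap test: the stay starts on or before the period
    ends and ends on or after the period starts."""
    return check_in <= period_end and check_out >= period_start


# Sample data from the question's edit.
rooms = [1, 2]
reservations = {
    1: (date(2019, 5, 22), date(2019, 6, 3)),
    2: (date(2021, 5, 22), date(2021, 6, 3)),
    3: (date(2022, 5, 22), date(2022, 6, 3)),
}
reservation_has_room = [(1, 1), (2, 1), (3, 1)]  # (reservation id, room id)


def available_rooms(period_start, period_end):
    """Rooms for which no linked reservation overlaps the period (the NOT EXISTS idea)."""
    booked = {
        room_id
        for reservation_id, room_id in reservation_has_room
        if overlaps(*reservations[reservation_id], period_start, period_end)
    }
    return [room_id for room_id in rooms if room_id not in booked]


free = available_rooms(date(2019, 6, 1), date(2019, 6, 4))
print(free)
```

Room 1 is excluded once, no matter how many of its reservations miss the period, which is the behaviour the question asks for.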
This is the overlapping range problem, and your WHERE clause should be:
WHERE
re.checkIn <= '2019-06-04' AND
re.checkOut >= '2019-06-01'
Your full query:
SELECT
idRoom,
type,
beds
FROM Room r
INNER JOIN Reservation_has_Room has
ON r.idRoom = has.Room_idRoom
INNER JOIN Rezervation re
ON has.Reservation_has_Room = re.idReservation
WHERE
re.checkIn <= '2019-06-04' AND
re.checkOut >= '2019-06-01';
I am pretty sure that you need only 2 tables here: the properties of rooms and the properties of reservations. If a room is in the reservations table, then it obviously has a reservation. So:
room_id start_date end_date
1 s_d1 e_d1
1 s_d2 e_d2
2 s_d3 e_d3
You want to find the room_ids that are not booked during the customer's date range (normally until 12 pm):
cust_s_d NOT BETWEEN s_d and e_d or
cust_e_d NOT BETWEEN s_d and e_d
select room_id from reservations
where
cust_s_d NOT BETWEEN s_d and e_d
AND
cust_e_d NOT BETWEEN s_d and e_d;
Join the result with the rooms' properties and that's it.
Yes, it does, because you're doing an inner join and then filtering the reservations. What you need is to left join the rooms to the reservations that fall in the time period, and then keep only the rooms that have no reservation joined:
SELECT idRoom, type, beds
FROM Room r
LEFT JOIN ( -- subquery that returns reservation business entities filtered by your time period
SELECT *
FROM Reservation_has_Room has
JOIN Rezervation re
ON has.Reservation_has_Room = re.idReservation
WHERE (re.checkIn <= '2019-06-01' AND re.checkOut >= '2019-06-04') -- (1)
OR re.checkIn between '2019-06-01' and '2019-06-04' -- (2)
OR re.checkOut between '2019-06-01' and '2019-06-04' -- (3)
) rr
ON r.idRoom = rr.Room_idRoom
WHERE rr.checkIn is null -- keep only the rooms that have no matching reservation
The reservations are first filtered by these 3 cases:
1) fully overlapping reservations
2) reservations that start somewhere in the middle of the period and end some time after
3) reservations that start some time before and end somewhere in the middle of the period
The condition suggested by Tim might also work. But the main point is to switch to a LEFT JOIN with the range filter applied on the joined side, and then filter with WHERE rr.checkIn IS NULL.
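Whether the three-case filter and the shorter two-comparison condition really select the same reservations can be checked by brute force. The following Python sketch does that over small integer "days"; it assumes every interval is well formed (check-in not after check-out, period start not after period end), and the function names are invented for the example.

```python
from itertools import product


def three_case(check_in, check_out, start, end):
    """The three OR-ed cases from the subquery above."""
    return (
        (check_in <= start and check_out >= end)  # (1) covers the whole period
        or start <= check_in <= end               # (2) starts inside the period
        or start <= check_out <= end              # (3) ends inside the period
    )


def two_comparisons(check_in, check_out, start, end):
    """The shorter overlap test: starts before the period ends, ends after it starts."""
    return check_in <= end and check_out >= start


# Days are plain integers here; only well-formed intervals are compared.
days = range(8)
mismatches = [
    (ci, co, s, e)
    for ci, co, s, e in product(days, repeat=4)
    if ci <= co and s <= e
    and three_case(ci, co, s, e) != two_comparisons(ci, co, s, e)
]
print(len(mismatches))
```

Under that assumption the list of mismatches is empty, so the choice between the two forms is a matter of readability.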
To make your solution more dynamic, you would use subqueries to specify the time bounds just once and reuse these values in the filter:
WITH
start_date as (select '2019-06-01'::date)
,end_date as (select '2019-06-04'::date)
and have this as a join condition:
(
re.checkIn <= (select * from start_date) AND re.checkOut >= (select * from end_date) -- (1)
or re.checkIn between (select * from start_date) and (select * from end_date) -- (2)
or re.checkOut between (select * from start_date) and (select * from end_date) -- (3)
)
I have the following columns in a table called meetings: meeting_id - int, start_time - time, end_time - time. Assuming that this table has data for one calendar day only, what is the minimum number of rooms I need to accommodate all the meetings? Room size and the number of people attending the meetings don't matter.
Here's the solution:
select * from
(select t.start_time,
t.end_time,
count(*) - 1 overlapping_meetings,
count(*) minimum_rooms_required,
group_concat(distinct concat(y.start_time,' to ',t.end_time)
separator ' // ') meeting_details from
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') t left join
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') y
on t.start_time between y.start_time and y.end_time
group by start_time, end_time) z;
My question - is there anything wrong with this answer? Even if there's nothing wrong with this, can someone share a better answer?
Let's say you have a table called 'meetings' with the columns id, start_time and end_time.
Then you can use this query to get the minimum number of meeting rooms required to accommodate all meetings.
select max(minimum_rooms_required)
from (select count(*) minimum_rooms_required
from meetings t
left join meetings y on t.start_time >= y.start_time and t.start_time < y.end_time group by t.id
) z;
This looks clearer and simple and works fine.
Meetings can "overlap". So, GROUP BY start_time, end_time can't figure this out.
Not every algorithm can be done in SQL. Or, at least, it may be grossly inefficient.
I would use a real programming language for the computation, leaving the database for what it is good at -- being a data repository.
Build an array of 1440 (minutes in a day) entries; initialize to 0.
Foreach meeting:
Foreach minute in the meeting (excluding last minute):
increment element in array.
Find the largest element in the array -- the number of rooms needed.
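The steps above can be sketched in Python as follows. The meeting list reuses the six sample meetings from the question; the helper names and the "HH:MM" string format are assumptions made for the example.

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def min_rooms_by_minute(meetings):
    """Count the meetings in progress during every minute of the day and
    return the busiest minute's count."""
    per_minute = [0] * 1440  # one counter per minute of the day
    for start, end in meetings:
        # range() stops before the end minute, so a meeting that ends at
        # 14:00 does not clash with one that starts at 14:00.
        for minute in range(to_minutes(start), to_minutes(end)):
            per_minute[minute] += 1
    return max(per_minute)


# The six sample meetings from the question.
sample_meetings = [
    ("08:00", "09:15"),
    ("13:20", "15:20"),
    ("10:00", "14:00"),
    ("13:55", "16:25"),
    ("14:00", "17:45"),
    ("14:05", "17:45"),
]
rooms_needed = min_rooms_by_minute(sample_meetings)
print(rooms_needed)
```

From 14:05 to 15:20 meetings 2, 4, 5 and 6 all run at once, which is where the peak of four comes from.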
CREATE TABLE [dbo].[Meetings](
[id] [int] NOT NULL,
[Starttime] [time](7) NOT NULL,
[EndTime] [time](7) NOT NULL
) ON [PRIMARY]
GO
sample data set:
INSERT INTO Meetings VALUES (1,'8:00','09:00')
INSERT INTO Meetings VALUES (2,'8:00','10:00')
INSERT INTO Meetings VALUES (3,'10:00','11:00')
INSERT INTO Meetings VALUES (4,'11:00','12:00')
INSERT INTO Meetings VALUES (5,'11:00','13:00')
INSERT INTO Meetings VALUES (6,'13:00','14:00')
INSERT INTO Meetings VALUES (7,'13:00','15:00')
To find the minimum number of rooms required, run the query below:
create table #TempMeeting
(
id int,Starttime time,EndTime time,MeetingRoomNo int,Rownumber int
)
insert into #TempMeeting select id, Starttime,EndTime,0 as MeetingRoomNo,ROW_NUMBER()
over (order by starttime asc) as Rownumber from Meetings
-- loop counter: walks the meetings in start-time order
declare @RowCounter int
select top 1 @RowCounter=Rownumber from #TempMeeting order by Rownumber
WHILE @RowCounter<=(Select count(*) from #TempMeeting)
BEGIN
-- mark the next unmarked meeting that starts at or after the current meeting's end
update #TempMeeting set MeetingRoomNo=1
where Rownumber=(select top 1 Rownumber from #TempMeeting where
Rownumber>@RowCounter and Starttime>=(select top 1 EndTime from #TempMeeting
where Rownumber=@RowCounter) and MeetingRoomNo=0)
set @RowCounter=@RowCounter+1
END
select count(*) from #TempMeeting where MeetingRoomNo=0
Consider a table meetings with columns id, start_time and end_time. Then the following query should give correct answer.
with mod_meetings as (select id, to_timestamp(start_time, 'HH24:MI')::TIME as start_time,
to_timestamp(end_time, 'HH24:MI')::TIME as end_time from meetings)
select CASE when max(a_cnt)>1 then max(a_cnt)+1
when max(a_cnt)=1 and max(b_cnt)=1 then 2 else 1 end as rooms
from
(select count(*) as a_cnt, a.id, count(b.id) as b_cnt from mod_meetings a left join mod_meetings b
on a.start_time>b.start_time and a.start_time<b.end_time group by a.id) join_table;
Sample DATA:
DROP TABLE IF EXISTS meeting;
CREATE TABLE "meeting" (
"meeting_id" INTEGER NOT NULL UNIQUE,
"start_time" TEXT NOT NULL,
"end_time" TEXT NOT NULL,
PRIMARY KEY("meeting_id")
);
INSERT INTO meeting values (1,'08:00','14:00');
INSERT INTO meeting values (2,'09:00','10:30');
INSERT INTO meeting values (3,'11:00','12:00');
INSERT INTO meeting values (4,'12:00','13:00');
INSERT INTO meeting values (5,'10:15','11:00');
INSERT INTO meeting values (6,'12:00','13:00');
INSERT INTO meeting values (7,'10:00','10:30');
INSERT INTO meeting values (8,'11:00','13:00');
INSERT INTO meeting values (9,'11:00','14:00');
INSERT INTO meeting values (10,'12:00','14:00');
INSERT INTO meeting values (11,'10:00','14:00');
INSERT INTO meeting values (12,'12:00','14:00');
INSERT INTO meeting values (13,'10:00','14:00');
INSERT INTO meeting values (14,'13:00','14:00');
Solution:
DROP VIEW IF EXISTS Final;
CREATE VIEW Final AS SELECT time, group_concat(event), sum(num) num from (
select start_time time, 's' event, 1 num from meeting
union all
select end_time time, 'e' event, -1 num from meeting)
group by 1
order by 1;
select max(room) AS Min_Rooms_Required FROM (
select
a.time,
sum(b.num) as room
from
Final a
, Final b
where a.time >= b.time
group by a.time
order by a.time
);
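The view above turns every meeting into a +1 event at its start and a -1 event at its end, and the outer query takes the maximum of the running total. Here is the same idea as a Python sketch, run on the sample data of this answer; the function name is made up, and the zero-padded 'HH:MM' strings are assumed so that plain string sorting is chronological.

```python
def min_rooms_sweep(meetings):
    """Sweep over start (+1) and end (-1) events in time order and return
    the largest running total."""
    events = []
    for start, end in meetings:
        events.append((start, 1))
        events.append((end, -1))
    # Tuples sort by time first; at equal times -1 sorts before +1, so a room
    # freed at 12:00 can be reused by a meeting that starts at 12:00.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# (start_time, end_time) for meeting_id 1..14 from the sample data above.
sample = [
    ("08:00", "14:00"), ("09:00", "10:30"), ("11:00", "12:00"),
    ("12:00", "13:00"), ("10:15", "11:00"), ("12:00", "13:00"),
    ("10:00", "10:30"), ("11:00", "13:00"), ("11:00", "14:00"),
    ("12:00", "14:00"), ("10:00", "14:00"), ("12:00", "14:00"),
    ("10:00", "14:00"), ("13:00", "14:00"),
]
min_rooms_required = min_rooms_sweep(sample)
print(min_rooms_required)
```

Between 12:00 and 13:00 nine meetings are in progress (ids 1, 4, 6, 8, 9, 10, 11, 12 and 13), which is the peak for this data.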
Here's an explanation of gashu's nicely working code (or, if you prefer, a non-code explanation of how to solve it in any language).
First, renaming the variable 'minimum_rooms_required' to 'overlap' would make the whole thing much easier to understand, because for each start or end time we want to know the number of overlapping ongoing meetings. The maximum of those counts is the answer: there is no way to get by with fewer rooms than the number of meetings that overlap at the same moment.
By the way, I think there might be a mistake in the code. It should check for t.start_time or t.end_time between y.start_time and y.end_time. Counterexample: meeting 1 starts at 8:00, ends at 11:00 and meeting 2 starts at 10:00, ends at 12:00.
(I'd post this as a comment on gashu's answer, but I don't have enough reputation.)
I'd go for Lead() analytic function
select
sum(needs_room_ind) as min_rooms
from (
select
id,
start_time,
end_time,
case when lead(start_time,1) over (order by start_time asc) between start_time
and end_time then 1 else 0 end as needs_room_ind
from
meetings
) a
My approach is to look at the moment each meeting starts and take the difference between the number of meetings that started before it and the number that ended before it (assuming meetings start and end on time).
My code looks like this:
with alpha as
(
select a.meeting_id,a.start_time,
count(distinct b.meeting_id) ttl_meeting_start_before,
count(distinct c.meeting_id) ttl_meeting_end_before
from meeting a
left join
(
select meeting_id,start_time from meeting
) b
on a.start_time > b.start_time
left join
(
select meeting_id,end_time from meeting
) c
on a.start_time > c.end_time
group by a.meeting_id,a.start_time
)
select max(ttl_meeting_start_before-ttl_meeting_end_before) max_meeting_room
from alpha
I have a single table with rows like this: (Date, Score, Name)
The Date field has two possible dates, and it's possible that a Name value will appear under only one date (if that name was recently added or removed).
I'm looking to get a table with rows like this: (Delta, Name), where delta is the score change for each name between the earlier and later dates. In addition, only a negative change interests me, so if Delta>=0, it shouldn't appear in the output table at all.
My main challenge is calculating the Delta field.
As stated in the title, it should be an SQL query.
Thanks in advance for any help!
I assumed that each name can have its own start/end dates. The query can be simplified significantly if there are only two possible dates for the entire table.
I tried this out in SQL Fiddle here
SELECT (score_end - score_start) delta, name_start
FROM
( SELECT date date_start, score score_start, name name_start
FROM t t
WHERE NOT EXISTS
( SELECT 1
FROM t x
WHERE x.date < t.date
AND x.name = t.name
)
) AS start_date_t
JOIN
( SELECT date date_end, score score_end, name name_end
FROM t t
WHERE NOT EXISTS
( SELECT 1
FROM t x
WHERE x.date > t.date
AND x.name = t.name
)
) end_date_t ON start_date_t.name_start = end_date_t.name_end
WHERE score_end-score_start < 0
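For comparison, the same per-name logic (earliest row, latest row, keep only negative differences) can be written as a short Python sketch. The rows and names below are invented for illustration and are not taken from the question.

```python
from datetime import date

# Hypothetical (date, score, name) rows.
rows = [
    (date(2023, 1, 1), 50, "alice"),
    (date(2023, 2, 1), 40, "alice"),
    (date(2023, 1, 1), 30, "bob"),
    (date(2023, 2, 1), 35, "bob"),
    (date(2023, 2, 1), 20, "carol"),  # appears under one date only
]


def negative_deltas(rows):
    """Score at the latest date minus score at the earliest date, per name,
    keeping only the names whose score went down."""
    first, last = {}, {}
    for day, score, name in rows:
        if name not in first or day < first[name][0]:
            first[name] = (day, score)
        if name not in last or day > last[name][0]:
            last[name] = (day, score)
    deltas = {name: last[name][1] - first[name][1] for name in first}
    return {name: delta for name, delta in deltas.items() if delta < 0}


drops = negative_deltas(rows)
print(drops)
```

A name that appears under a single date gets a delta of 0 and is filtered out, which matches what the query does.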
Let's say you have a table with date_value and sum_value.
Then it should be something like this:
select t.date_value,sum_value,
sum_value - COALESCE((
select top 1 sum_value
from tmp_num
where date_value > t.date_value
order by date_value
),0) as sum_change
from tmp_num as t
order by t.date_value
The following uses a "trick" in MySQL that I don't really like using, because it turns the score into a string and then back into a number. But, it is an easy way to get what you want:
select t.name, (lastscore - firstscore) as diff
from (select t.name,
substring_index(group_concat(score order by date asc), ',', 1) as firstscore,
substring_index(group_concat(score order by date desc), ',', 1) as lastscore
from table t
group by t.name
) t
where lastscore - firstscore < 0;
If MySQL supported window functions, such tricks wouldn't be necessary.
I have a MySQL table with the structure:
beverages_log(id, users_id, beverages_id, timestamp)
I'm trying to compute the maximum streak of consecutive days during which a user (with id 1) logs a beverage (with id 1) at least 5 times each day. I'm pretty sure that this can be done using views as follows:
CREATE or REPLACE VIEW daycounts AS
SELECT count(*) AS n, DATE(timestamp) AS d FROM beverages_log
WHERE users_id = '1' AND beverages_id = 1 GROUP BY d;
CREATE or REPLACE VIEW t AS SELECT * FROM daycounts WHERE n >= 5;
SELECT MAX(streak) AS current FROM ( SELECT DATEDIFF(MIN(c.d), a.d)+1 AS streak
FROM t AS a LEFT JOIN t AS b ON a.d = ADDDATE(b.d,1)
LEFT JOIN t AS c ON a.d <= c.d
LEFT JOIN t AS d ON c.d = ADDDATE(d.d,-1)
WHERE b.d IS NULL AND c.d IS NOT NULL AND d.d IS NULL GROUP BY a.d) allstreaks;
However, repeatedly creating views for different users every time I run this check seems pretty inefficient. Is there a way in MySQL to perform this computation in a single query, without creating views or repeatedly calling the same subqueries a bunch of times?
This solution seems to perform quite well as long as there is a composite index on users_id and beverages_id -
SELECT *
FROM (
SELECT t.*, IF(@prev + INTERVAL 1 DAY = t.d, @c := @c + 1, @c := 1) AS streak, @prev := t.d
FROM (
SELECT DATE(timestamp) AS d, COUNT(*) AS n
FROM beverages_log
WHERE users_id = 1
AND beverages_id = 1
GROUP BY DATE(timestamp)
HAVING COUNT(*) >= 5
) AS t
INNER JOIN (SELECT @prev := NULL, @c := 1) AS vars
) AS t
ORDER BY streak DESC LIMIT 1;
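The counting logic behind that query (keep the days with at least 5 entries, then extend a counter while each day directly follows the previous one) can be illustrated with a Python sketch. The timestamps are generated by a made-up helper and stand in for one user's rows for one beverage.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def longest_streak(timestamps, min_per_day=5):
    """Longest run of consecutive calendar days that each have at least
    min_per_day timestamps."""
    per_day = Counter(ts.date() for ts in timestamps)
    qualifying = sorted(day for day, n in per_day.items() if n >= min_per_day)
    best = current = 0
    previous = None
    for day in qualifying:
        if previous is not None and day - previous == timedelta(days=1):
            current += 1  # the day directly follows the previous qualifying day
        else:
            current = 1   # a gap (or the first day) starts a new streak
        best = max(best, current)
        previous = day
    return best


def make_logs(day, count):
    """Hypothetical helper: `count` log timestamps on `day`, one per hour from 08:00."""
    return [datetime(day.year, day.month, day.day, 8 + i) for i in range(count)]


logs = (
    make_logs(date(2024, 3, 1), 5)
    + make_logs(date(2024, 3, 2), 6)
    + make_logs(date(2024, 3, 3), 4)  # below the threshold, breaks the streak
    + make_logs(date(2024, 3, 4), 5)
    + make_logs(date(2024, 3, 5), 5)
    + make_logs(date(2024, 3, 6), 7)
)
streak = longest_streak(logs)
print(streak)
```

The explicit sort is what guarantees that the days are visited in date order before the counter is applied.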
Why not include user_id in the daycounts view and group by user_id and date?
Also include user_id in view t.
Then, when you are querying against t, add the user_id to the WHERE clause.
That way you don't have to recreate your views for every single user; you just need to remember to include the user_id in your WHERE clause.
That's a little tricky. I'd start with a view to summarize events by day:
CREATE VIEW BView AS
SELECT UserID, BevID, CAST(EventDateTime AS DATE) AS EventDate, COUNT(*) AS NumEvents
FROM beverages_log
GROUP BY UserID, BevID, CAST(EventDateTime AS DATE)
I'd then use a Dates table (just a table with one row per day; very handy to have) to examine all possible date ranges and throw out any with a gap. This will probably be slow as hell, but it's a start:
SELECT
UserID, BevID, MAX(StreakLength) AS StreakLength
FROM
(
SELECT
B1.UserID, B1.BevID, B1.EventDate AS StreakStart, DATEDIFF(DD, StartDate.Date, EndDate.Date) AS StreakLength
FROM
BView AS B1
INNER JOIN Dates AS StartDate ON B1.EventDate = StartDate.Date
INNER JOIN Dates AS EndDate ON EndDate.Date > StartDate.Date
WHERE
B1.NumEvents >= 5
-- Exclude this potential streak if there's a day with no activity
AND NOT EXISTS (SELECT * FROM Dates AS MissedDay WHERE MissedDay.Date > StartDate.Date AND MissedDay.Date <= EndDate.Date AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND MissedDay.Date = B2.EventDate))
-- Exclude this potential streak if there's a day with less than five events
AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND B2.EventDate > StartDate.Date AND B2.EventDate <= EndDate.Date AND B2.NumEvents < 5)
) AS X
GROUP BY
UserID, BevID