Minimum number of Meeting Rooms required to Accommodate all Meetings in MySQL - mysql

I have the following columns in a table called meetings: meeting_id - int, start_time - time, end_time - time. Assuming that this table has data for one calendar day only, what is the minimum number of rooms I need to accommodate all the meetings? Room size and the number of people attending the meetings don't matter.
Here's the solution:
select * from
(select t.start_time,
t.end_time,
count(*) - 1 overlapping_meetings,
count(*) minimum_rooms_required,
group_concat(distinct concat(y.start_time,' to ',t.end_time)
separator ' // ') meeting_details from
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') t left join
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') y
on t.start_time between y.start_time and y.end_time
group by start_time, end_time) z;
My question - is there anything wrong with this answer? Even if there's nothing wrong with this, can someone share a better answer?

Let's say you have a table called 'meetings' like this -
Then you can use this query to get the minimum number of meeting rooms required to accommodate all meetings.
select max(minimum_rooms_required)
from (select count(*) minimum_rooms_required
from meetings t
left join meetings y on t.start_time >= y.start_time and t.start_time < y.end_time group by t.id
) z;
This looks clearer and simpler, and works fine.

Meetings can "overlap". So, GROUP BY start_time, end_time can't figure this out.
Not every algorithm can be done in SQL. Or, at least, it may be grossly inefficient.
I would use a real programming language for the computation, leaving the database for what it is good at -- being a data repository.
Build an array of 1440 entries (one per minute of the day), initialized to 0.
For each meeting:
    For each minute in the meeting (excluding the last minute):
        increment that element of the array.
Find the largest element in the array -- that is the number of rooms needed.
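The steps above could be sketched in Python roughly like this. It is only an illustration: the function names and the 'HH:MM' string format are my assumptions, and the sample rows are the ones from the question.

```python
from typing import Iterable, Tuple

MINUTES_PER_DAY = 24 * 60  # 1440


def to_minutes(hhmm: str) -> int:
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def min_rooms(meetings: Iterable[Tuple[str, str]]) -> int:
    """Return the peak number of simultaneous meetings in one day."""
    occupancy = [0] * MINUTES_PER_DAY
    for start, end in meetings:
        # The end minute is excluded, so back-to-back meetings can share a room.
        for minute in range(to_minutes(start), to_minutes(end)):
            occupancy[minute] += 1
    return max(occupancy)


# Sample rows taken from the question.
sample = [
    ("08:00", "09:15"),
    ("13:20", "15:20"),
    ("10:00", "14:00"),
    ("13:55", "16:25"),
    ("14:00", "17:45"),
    ("14:05", "17:45"),
]
print(min_rooms(sample))  # prints 4
```

The peak of 4 is reached from 14:05 onwards, when meetings 2, 4, 5 and 6 are all running.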

CREATE TABLE [dbo].[Meetings](
[id] [int] NOT NULL,
[Starttime] [time](7) NOT NULL,
[EndTime] [time](7) NOT NULL) ON [PRIMARY]
GO
Sample data set:
INSERT INTO Meetings VALUES (1,'8:00','09:00')
INSERT INTO Meetings VALUES (2,'8:00','10:00')
INSERT INTO Meetings VALUES (3,'10:00','11:00')
INSERT INTO Meetings VALUES (4,'11:00','12:00')
INSERT INTO Meetings VALUES (5,'11:00','13:00')
INSERT INTO Meetings VALUES (6,'13:00','14:00')
INSERT INTO Meetings VALUES (7,'13:00','15:00')
To find the minimum number of rooms required, run the query below:
create table #TempMeeting
(
id int,Starttime time,EndTime time,MeetingRoomNo int,Rownumber int
)
insert into #TempMeeting select id, Starttime,EndTime,0 as MeetingRoomNo,ROW_NUMBER()
over (order by starttime asc) as Rownumber from Meetings
declare @RowCounter int
select top 1 @RowCounter=Rownumber from #TempMeeting order by Rownumber
WHILE @RowCounter<=(Select count(*) from #TempMeeting)
BEGIN
update #TempMeeting set MeetingRoomNo=1
where Rownumber=(select top 1 Rownumber from #TempMeeting where
Rownumber>@RowCounter and Starttime>=(select top 1 EndTime from #TempMeeting
where Rownumber=@RowCounter) and MeetingRoomNo=0)
set @RowCounter=@RowCounter+1
END
select count(*) from #TempMeeting where MeetingRoomNo=0

Consider a table meetings with columns id, start_time and end_time. Then the following query should give the correct answer.
with mod_meetings as (select id, to_timestamp(start_time, 'HH24:MI')::TIME as start_time,
to_timestamp(end_time, 'HH24:MI')::TIME as end_time from meetings)
select CASE when max(a_cnt)>1 then max(a_cnt)+1
when max(a_cnt)=1 and max(b_cnt)=1 then 2 else 1 end as rooms
from
(select count(*) as a_cnt, a.id, count(b.id) as b_cnt from mod_meetings a left join mod_meetings b
on a.start_time>b.start_time and a.start_time<b.end_time group by a.id) join_table;

Sample DATA:
DROP TABLE IF EXISTS meeting;
CREATE TABLE "meeting" (
"meeting_id" INTEGER NOT NULL UNIQUE,
"start_time" TEXT NOT NULL,
"end_time" TEXT NOT NULL,
PRIMARY KEY("meeting_id")
);
INSERT INTO meeting values (1,'08:00','14:00');
INSERT INTO meeting values (2,'09:00','10:30');
INSERT INTO meeting values (3,'11:00','12:00');
INSERT INTO meeting values (4,'12:00','13:00');
INSERT INTO meeting values (5,'10:15','11:00');
INSERT INTO meeting values (6,'12:00','13:00');
INSERT INTO meeting values (7,'10:00','10:30');
INSERT INTO meeting values (8,'11:00','13:00');
INSERT INTO meeting values (9,'11:00','14:00');
INSERT INTO meeting values (10,'12:00','14:00');
INSERT INTO meeting values (11,'10:00','14:00');
INSERT INTO meeting values (12,'12:00','14:00');
INSERT INTO meeting values (13,'10:00','14:00');
INSERT INTO meeting values (14,'13:00','14:00');
Solution:
DROP VIEW IF EXISTS Final;
CREATE VIEW Final AS SELECT time, group_concat(event), sum(num) num from (
select start_time time, 's' event, 1 num from meeting
union all
select end_time time, 'e' event, -1 num from meeting)
group by 1
order by 1;
select max(room) AS Min_Rooms_Required FROM (
select
a.time,
sum(b.num) as room
from
Final a
, Final b
where a.time >= b.time
group by a.time
order by a.time
);
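For readers who want to check the result outside the database, the same event idea (+1 at each start, -1 at each end, then a running sum ordered by time) could be sketched in Python as below. The function name is mine, and the code assumes zero-padded 'HH:MM' strings so that plain string sorting matches time order, as with the TEXT columns above.

```python
from collections import defaultdict
from itertools import accumulate


def min_rooms_sweep(meetings):
    """Peak of the running sum of +1 (start) and -1 (end) events."""
    delta = defaultdict(int)
    for start, end in meetings:
        delta[start] += 1
        delta[end] -= 1
    # Events at the same time are netted first, like the GROUP BY in the view.
    running = accumulate(delta[t] for t in sorted(delta))
    return max(running, default=0)


# The 14 sample rows from the INSERT statements above.
meeting_rows = [
    ("08:00", "14:00"), ("09:00", "10:30"), ("11:00", "12:00"),
    ("12:00", "13:00"), ("10:15", "11:00"), ("12:00", "13:00"),
    ("10:00", "10:30"), ("11:00", "13:00"), ("11:00", "14:00"),
    ("12:00", "14:00"), ("10:00", "14:00"), ("12:00", "14:00"),
    ("10:00", "14:00"), ("13:00", "14:00"),
]
print(min_rooms_sweep(meeting_rows))  # prints 9
```

Because the events at one timestamp are summed before the running total is taken, a meeting that ends at 12:00 frees its room for one that starts at 12:00.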

Here's an explanation of gashu's nicely working code (or, otherwise, a non-code explanation of how to solve it in any language).
Firstly, renaming the variable 'minimum_rooms_required' to 'overlap' would make the whole thing much easier to understand, because for each start or end time we want to know the number of overlapping ongoing meetings. Once we have found the maximum, there is no way of getting by with fewer rooms than that, because those meetings overlap.
By the way, I think there might be a mistake in the code. It should check whether t.start_time or t.end_time is between y.start_time and y.end_time. Counterexample: meeting 1 starts at 8:00 and ends at 11:00, and meeting 2 starts at 10:00 and ends at 12:00.
(I'd post this as a comment on gashu's answer, but I don't have enough reputation.)

I'd go for the LEAD() analytic function:
select
sum(needs_room_ind) as min_rooms
from (
select
id,
start_time,
end_time,
case when lead(start_time,1) over (order by start_time asc) between start_time
and end_time then 1 else 0 end as needs_room_ind
from
meetings
) a

IMO, I would take the difference between how many meetings have started and how many have ended by the time each meeting_id starts (assuming meetings start and end on time).
My code looks like this:
with alpha as
(
select a.meeting_id,a.start_time,
count(distinct b.meeting_id) ttl_meeting_start_before,
count(distinct c.meeting_id) ttl_meeting_end_before
from meeting a
left join
(
select meeting_id,start_time from meeting
) b
on a.start_time > b.start_time
left join
(
select meeting_id,end_time from meeting
) c
on a.start_time > c.end_time
group by a.meeting_id,a.start_time
)
select max(ttl_meeting_start_before-ttl_meeting_end_before) max_meeting_room
from alpha

Related

How to determine columns dynamically for the SELECT query in MySQL with CASE statement. OR: How to replace columns dynamically in the SELECT query

I have to make some SQL query.
I'll only put here tables and results I need - I am sure this is the best way for a clear explanation (at the bottom of the question I provided SQL queries for database filling).
short description:
TASK: After full join concatenation I receive a result where (for example) tableA.point column (that is used in the SELECT statement) in some cells returns NULL. In these cases, I need to change tableA.point column to the tableB.point (from the joined table).
So, tables:
(Columns point + date are composite key.)
outcome_o:
income_o:
An example of the result I need (as you can see, I need a concatenated table with both the out and inc columns in each row):
My attempt:
SELECT outcome_o.point,
outcome_o.date,
inc,
out
FROM income_o
FULL JOIN outcome_o ON income_o.point = outcome_o.point AND income_o.date = outcome_o.date
The result is what I need, except for the NULLs in the point and date columns:
I tried to avoid this with CASE statement:
SELECT
CASE outcome_o.point
WHEN NULL
THEN income_o.point
ELSE outcome_o.point
END as point,
....
But this does not work as I imagined (all cells in the point column became NULL).
Could anyone help me with this? I know I have to use JOIN, CASE (CASE is mandatory) and possibly UNION.
Thanks
Tables creation:
CREATE TABLE income(
point INT,
date VARCHAR(60),
inc FLOAT
);
CREATE TABLE outcome(
point INT,
date VARCHAR(60),
ou_t FLOAT
);
INSERT INTO income VALUES
(1, '2001-03-22', 15000.0000),
(1, '2001-03-23', 15000.0000),
(1, '2001-03-24', 3400.0000),
(1, '2001-04-13', 5000.0000),
(1, '2001-05-11', 4500.0000),
(2, '2001-03-22', 10000.0000),
(2, '2001-03-24', 1500.0000),
(3, '2001-09-13', 11500.0000),
(3, '2001-10-02', 18000.0000);
INSERT INTO outcome VALUES
(1, '2001-03-14 00:00:00.000', 15348.0000),
(1, '2001-03-24 00:00:00.000', 3663.0000),
(1, '2001-03-26 00:00:00.000', 1221.0000),
(1, '2001-03-28 00:00:00.000', 2075.0000),
(1, '2001-03-29 00:00:00.000', 2004.0000),
(1, '2001-04-11 00:00:00.000', 3195.0400),
(1, '2001-04-13 00:00:00.000', 4490.0000),
(1, '2001-04-27 00:00:00.000', 3110.0000),
(1, '2001-05-11 00:00:00.000', 2530.0000),
(2, '2001-03-22 00:00:00.000', 1440.0000),
(2, '2001-03-29 00:00:00.000', 7848.0000),
(2, '2001-04-02 00:00:00.000', 2040.0000),
(3, '2001-09-13 00:00:00.000', 1500.0000),
(3, '2001-09-14 00:00:00.000', 2300.0000),
(3, '2002-09-16 00:00:00.000', 2150.0000);
The first step is to create a date range reference table. To do that, we can use Common Table Expression (cte):
WITH RECURSIVE cte AS (
SELECT Min(mndate) mindt, MAX(mxdate) maxdt
FROM (SELECT MIN(date) AS mndate, MAX(date) AS mxdate
FROM outcome
UNION
SELECT MIN(date), MAX(date)
FROM income) v
UNION
SELECT mindt + INTERVAL 1 DAY, maxdt
FROM cte
WHERE mindt + INTERVAL 1 DAY <= maxdt)
SELECT mindt
FROM cte
Here I'm trying to generate a dynamic date range based on the minimum and maximum date values from both of your tables. This is particularly useful when you don't want to keep changing the date range, but if you don't mind doing so, you can simply generate it like this:
WITH RECURSIVE cte AS (
SELECT '2001-03-14 00:00:00' dt
UNION
SELECT dt + INTERVAL 1 DAY
FROM cte
WHERE dt + INTERVAL 1 DAY <= '2002-09-16')
SELECT dt
FROM cte
From here, I'll do a CROSS JOIN to get the distinct point value from both tables:
...
CROSS JOIN (SELECT DISTINCT point FROM outcome
UNION
SELECT DISTINCT point FROM income) p
Now we have a reference table with all the point and date range. Let's wrap those in another cte.
WITH RECURSIVE cte AS (
SELECT Min(mndate) mindt, MAX(mxdate) maxdt
FROM (SELECT MIN(date) AS mndate, MAX(date) AS mxdate
FROM outcome
UNION
SELECT MIN(date), MAX(date)
FROM income) v
UNION
SELECT mindt + INTERVAL 1 DAY, maxdt
FROM cte
WHERE mindt + INTERVAL 1 DAY <= maxdt),
cte2 AS (
SELECT point, mindt
FROM cte
CROSS JOIN (SELECT DISTINCT point FROM outcome
UNION
SELECT DISTINCT point FROM income) p)
SELECT *
FROM cte2;
Next step is taking your current query attempt and LEFT JOIN it to the reference table:
WITH RECURSIVE cte AS (
SELECT Min(mndate) mindt, MAX(mxdate) maxdt
FROM (SELECT MIN(date) AS mndate, MAX(date) AS mxdate
FROM outcome
UNION
SELECT MIN(date), MAX(date)
FROM income) v
UNION
SELECT mindt + INTERVAL 1 DAY, maxdt
FROM cte
WHERE mindt + INTERVAL 1 DAY <= maxdt),
cte2 AS (
SELECT point, CAST(mindt AS DATE) AS rdate
FROM cte
CROSS JOIN (SELECT DISTINCT point FROM outcome
UNION
SELECT DISTINCT point FROM income) p)
SELECT *
FROM cte2
LEFT JOIN outcome
ON cte2.point=outcome.point
AND cte2.rdate=outcome.date
LEFT JOIN income
ON cte2.point=income.point
AND cte2.rdate=income.date
/*added conditions*/
WHERE cte2.point=1
AND COALESCE(outcome.date, income.date) IS NOT NULL
/*****/
ORDER BY cte2.rdate;
I noticed that your date column uses the VARCHAR() datatype instead of DATE or DATETIME, which is why my initial test returned only one result. However, I did notice that if I compare a YYYY-MM-DD value against your table's date value, it returns the other results, which is why I used CAST(mindt AS DATE) AS rdate in cte2. I do recommend that you change the date column to a standard MySQL date type, though.
You will probably find the query a bit too long, but if you have a table where you store dates (a calendar table, as we call it), the query will be much shorter, perhaps like this:
SELECT *
FROM calendar
LEFT JOIN outcome
ON calendar.point=outcome.point
AND calendar.rdate=outcome.date
LEFT JOIN income
ON calendar.point=income.point
AND calendar.rdate=income.date
/*added conditions*/
WHERE calendar.point=1
AND COALESCE(outcome.date, income.date) IS NOT NULL
/*****/
ORDER BY calendar.rdate;
Demo fiddle
It seems I was using the wrong syntax. As I found out, dynamic column selection is possible in the SELECT query.
Correct CASE statement:
(
CASE
WHEN outcome_o.point IS NULL
THEN income_o.point
ELSE outcome_o.point
END
) as point,
In this case the query selects the joined table's column whenever the main table's column is NULL.
Full query (returns exactly the result I need):
SELECT
(
CASE
WHEN outcome_o.point IS NULL
THEN income_o.point
ELSE outcome_o.point
END
) as point,
(
CASE
WHEN outcome_o.date IS NULL
THEN income_o.date
ELSE outcome_o.date
END
) as date,
inc,
out
FROM income_o
FULL JOIN outcome_o ON income_o.point = outcome_o.point AND income_o.date = outcome_o.date
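The difference between the two CASE forms can be checked with a few lines of Python. This is only a sketch: it uses the built-in sqlite3 module as a stand-in for the real database, and the one-row table `joined` with its column names is made up for the demonstration. A simple `CASE x WHEN NULL` compares with `=`, and `x = NULL` is never true, so that branch is never taken; `IS NULL` (or COALESCE) is what detects the missing value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical one-row stand-in for a FULL JOIN result where the outcome
# side is missing and only the income side has a point value.
conn.execute("CREATE TABLE joined (out_point INTEGER, inc_point INTEGER)")
conn.execute("INSERT INTO joined VALUES (NULL, 2)")

simple, searched, coalesced = conn.execute(
    """
    SELECT
        CASE out_point WHEN NULL THEN inc_point ELSE out_point END,
        CASE WHEN out_point IS NULL THEN inc_point ELSE out_point END,
        COALESCE(out_point, inc_point)
    FROM joined
    """
).fetchone()

# The simple CASE falls through to ELSE and returns the NULL out_point.
print(simple, searched, coalesced)  # prints: None 2 2
```

COALESCE(outcome_o.point, income_o.point) gives the same result as the searched CASE and is shorter, if CASE is not mandatory for you.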

How to calculate 3 month active users

I have a table that contains an orderId, a timestamp and a customerId, like this:
DROP TABLE IF EXISTS testdata;
CREATE TABLE testdata (
`orderId` int,
`createdOn` datetime(6),
`customerId` int,
PRIMARY KEY (`orderId`)
);
INSERT INTO testdata (orderId, createdOn, customerId) VALUES
('1000001','2020-01-01 17:08:41.460000','101'),
('1000002','2020-01-02 18:01:00.180000','102'),
('1000003','2020-01-03 12:26:02.460000','103'),
('1000004','2020-01-04 13:32:42.610000','104'),
('1000005','2020-01-05 20:21:28.540000','101'),
('1000006','2020-01-06 11:54:20.530000','102'),
('1000007','2020-02-01 20:54:42.470000','102'),
('1000008','2020-02-02 10:21:29.470000','102'),
('1000009','2020-02-03 16:22:23.880000','102'),
('1000010','2020-02-04 16:22:23.880000','103'),
('1000011','2020-02-05 17:08:41.460000','103'),
('1000012','2020-02-06 18:01:00.180000','103'),
('1000013','2020-03-01 12:26:02.460000','102'),
('1000014','2020-03-02 13:32:42.610000','102'),
('1000015','2020-03-03 20:21:28.540000','103'),
('1000016','2020-03-04 11:54:20.530000','103'),
('1000017','2020-03-05 20:54:42.470000','104'),
('1000018','2020-03-06 10:21:29.470000','104'),
('1000019','2020-04-01 16:22:23.880000','103'),
('1000020','2020-04-02 16:22:23.880000','103'),
('1000021','2020-04-03 17:08:41.460000','103'),
('1000022','2020-04-04 18:01:00.180000','104'),
('1000023','2020-04-05 12:26:02.460000','104'),
('1000024','2020-04-06 13:32:42.610000','104'),
('1000025','2020-05-01 20:21:28.540000','103'),
('1000026','2020-05-02 11:54:20.530000','103'),
('1000027','2020-05-03 20:54:42.470000','104'),
('1000028','2020-05-04 10:21:29.470000','104'),
('1000029','2020-05-05 16:22:23.880000','105'),
('1000030','2020-05-06 16:22:23.880000','105'),
('1000031','2020-05-01 20:21:28.540000','104'),
('1000032','2020-05-02 11:54:20.530000','104'),
('1000033','2020-05-03 20:54:42.470000','104'),
('1000034','2020-05-04 10:21:29.470000','105'),
('1000035','2020-05-05 16:22:23.880000','105'),
('1000036','2020-05-06 16:22:23.880000','105')
;
Now I want to calculate for each month the number of customers that have been active (i.e., have an order) within the last 3 months (i.e., current month or the preceding two months).
I managed to calculate the active users for the current month, like this:
SELECT
EXTRACT(YEAR_MONTH FROM createdOn) AS order_createdOn_ym
,COUNT(DISTINCT customerId) AS mau
FROM testdata
GROUP BY order_createdOn_ym
ORDER BY order_createdOn_ym asc
;
(Fiddle over here.)
However, I'm completely stumped as to how you can approach calculating the 3-months-active users.
Any help is greatly appreciated!
Here is one option:
select c.createdmonth, count(distinct customerid) as mau
from (
select distinct date_format(createdon, '%Y-%m-01') as createdmonth
from testdata
) c
left join testdata t
on t.createdon >= c.createdmonth - interval 2 month
and t.createdon < c.createdmonth + interval 1 month
group by c.createdmonth
The idea is to enumerate the distinct months, then bring the table with a left join that recovers the last 2 month and the current month. You can then aggregate and count the number of distinct customers per group.
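To sanity-check the numbers, the same rolling-window logic could be written in Python along these lines. This is a sketch, not a replacement for the query: the function names are mine, and the order list is a condensed version of the question's data (one order per customer per month is enough for a distinct count).

```python
from datetime import date


def month_index(d: date) -> int:
    """Map a date to a consecutive month number so windows are easy to build."""
    return d.year * 12 + (d.month - 1)


def three_month_active(orders):
    """orders: iterable of (order_date, customer_id) pairs."""
    customers_by_month = {}
    for order_date, customer_id in orders:
        customers_by_month.setdefault(month_index(order_date), set()).add(customer_id)

    result = {}
    # Like the query, only months that have at least one order are reported.
    for m in sorted(customers_by_month):
        active = set()
        for k in (m - 2, m - 1, m):  # two preceding months plus the current one
            active |= customers_by_month.get(k, set())
        result[f"{m // 12:04d}-{m % 12 + 1:02d}"] = len(active)
    return result


orders = [
    (date(2020, 1, 1), 101), (date(2020, 1, 2), 102),
    (date(2020, 1, 3), 103), (date(2020, 1, 4), 104),
    (date(2020, 2, 1), 102), (date(2020, 2, 4), 103),
    (date(2020, 3, 1), 102), (date(2020, 3, 3), 103), (date(2020, 3, 5), 104),
    (date(2020, 4, 1), 103), (date(2020, 4, 4), 104),
    (date(2020, 5, 1), 103), (date(2020, 5, 3), 104), (date(2020, 5, 5), 105),
]
print(three_month_active(orders))
```

For this data the April window (February to April) contains customers 102, 103 and 104, so it reports 3, while every other month reports 4.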
Thanks to @GMB for providing the solution. Purely as a matter of taste, though, I prefer to express the month interval the following way:
SELECT date_format(c.end_of_createdOn_month, '%Y-%m') as order_month,
count(distinct customerid) as mau_3m
FROM (
select distinct LAST_DAY(createdOn) as end_of_createdOn_month
from testdata
) c
LEFT JOIN testdata t
ON t.createdon >= (c.end_of_createdOn_month - interval 3 month)
AND t.createdon <= c.end_of_createdOn_month
GROUP BY c.end_of_createdOn_month;

Get transactions balance for each month

I have a two-column table of transactions that stores the time of change (unix_time) and the change value.
create table transactions (
changed int(11),
points int(11)
);
insert into transactions values (UNIX_TIMESTAMP('2014-03-27 03:00:00'), +100);
insert into transactions values (UNIX_TIMESTAMP('2014-05-02 03:00:00'), +100);
insert into transactions values (UNIX_TIMESTAMP('2015-01-01 03:00:00'), -100);
insert into transactions values (UNIX_TIMESTAMP('2015-05-01 03:00:00'), +150);
To get the current balance you need to sum all values, and to get a balance from the past you need to sum only the values whose change time is less than the requested time, like:
select
sum(case when changed < unix_timestamp('2013-12-01') then
points
else
0
end) as cash_balance_2013_11,
...
so each month needs its own separate SQL code. I would like SQL code that gives me the balances for all months (e.g. from a fixed date until now).
EDIT:
HERE IS SQL FIDDLE
Can you just group by and order by month?
UPDATE: to get running totals you have to join the individual months to a set of totals-by-month, matching on "less than or equal to":-
select
m.single_month
, sum(monthly_totals.total_points) as running_total_by_month
from
(
select
sum(points) as total_points
, month_of_change
from
(
select
points
, MONTH(FROM_UNIXTIME(t.time_of_change)) as month_of_change -- assumes unix_time
from mytable t
) x
group by month_of_change
) monthly_totals
inner join
(
select distinct MONTH(FROM_UNIXTIME(t.time_of_change)) as single_month
from mytable t
) m
on monthly_totals.month_of_change <= m.single_month
group by m.single_month
(N.B: not tested)
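As a cross-check for whatever SQL you end up with, the running balance per month could be computed in Python roughly as follows. This is a sketch under a few assumptions of mine: the function name is invented, dates are used instead of unix timestamps to keep time zones out of it, and months are keyed by (year, month) rather than month alone because the sample data spans two years.

```python
from datetime import date
from itertools import accumulate


def monthly_balances(transactions):
    """transactions: iterable of (date, points).

    Returns [('YYYY-MM', balance at the end of that month), ...] for every
    month from the first to the last transaction, including empty months.
    """
    totals = {}
    for day, points in transactions:
        key = (day.year, day.month)
        totals[key] = totals.get(key, 0) + points
    if not totals:
        return []

    # Enumerate every month in the range so that gaps carry the balance forward.
    months = []
    year, month = min(totals)
    last = max(totals)
    while (year, month) <= last:
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

    running = accumulate(totals.get(m, 0) for m in months)
    return [(f"{y:04d}-{m:02d}", balance) for (y, m), balance in zip(months, running)]


# The four rows from the question.
rows = [
    (date(2014, 3, 27), 100),
    (date(2014, 5, 2), 100),
    (date(2015, 1, 1), -100),
    (date(2015, 5, 1), 150),
]
for label, balance in monthly_balances(rows):
    print(label, balance)
```

With the question's data this starts at 100 for 2014-03, stays at 200 from 2014-05 to 2014-12, drops to 100 in 2015-01 and ends at 250 in 2015-05.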

Deleting records based on a group by in MYSQL

I'm having trouble coming up with a query which is going to allow me to keep only the most recent order from a user (maybe a better way to say this is delete all old orders):
CREATE TABLE orders(id integer, created_at datetime, user_id integer, label nvarchar(25));
INSERT INTO orders values(1, now(), 1, 'FRED FIRST');
INSERT INTO orders values(2, DATE_ADD(now(), INTERVAL 1 DAY), 1, 'FRED SECOND');
INSERT INTO orders values(3, DATE_ADD(now(), INTERVAL 2 DAY), 1, 'FRED THIRD');
INSERT INTO orders values(4, DATE_ADD(now(), INTERVAL 1 DAY), 3, 'BARNEY FIRST');
SELECT * FROM orders;
'1','2014-03-07 08:39:36','1','FRED FIRST'
'2','2014-03-08 08:39:36','1','FRED SECOND'
'3','2014-03-09 08:39:36','1','FRED THIRD'
'4','2014-03-08 08:39:36','3','BARNEY FIRST'
I would like to run a query which would leave me with FRED's THIRD order and BARNEY's FIRST order. FRED FIRST and FRED SECOND should be deleted because they are not the latest order from FRED.
Any thoughts about how I might be able to do this with a single query?
EDIT: After posting this, I found something that works (it does what I'm looking to do)-- but it seems a bit messy:
DELETE old_orders
FROM orders old_orders
left outer join(
SELECT MAX(created_at) as created_at, user_id
FROM orders
GROUP BY user_id) new_orders
ON new_orders.user_id = old_orders.user_id and new_orders.created_at = old_orders.created_at
WHERE new_orders.user_id is null;
Use a nested query, like this:
DELETE FROM orders
WHERE id NOT IN (
SELECT id FROM (
select id from orders o JOIN (
select user_id, max(created_at) t from orders group by user_id
) o1 ON o.user_id = o1.user_id AND o.created_at = o1.t
) AS tmp
)
Working Fiddle: http://sqlfiddle.com/#!2/56d913/1
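If you want to try the keep-the-newest-per-user delete without a MySQL server, here is a small sketch that runs it through Python's sqlite3 module. SQLite is only a stand-in here (my assumption for illustration), and the extra `SELECT id FROM (...) AS tmp` wrapper is dropped because, as far as I know, it is only there to get around MySQL refusing to select from the table being deleted from. The rows are the ones printed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, created_at TEXT, user_id INTEGER, label TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2014-03-07 08:39:36", 1, "FRED FIRST"),
        (2, "2014-03-08 08:39:36", 1, "FRED SECOND"),
        (3, "2014-03-09 08:39:36", 1, "FRED THIRD"),
        (4, "2014-03-08 08:39:36", 3, "BARNEY FIRST"),
    ],
)

# Keep only the rows whose created_at equals the newest created_at of their user.
conn.execute(
    """
    DELETE FROM orders
    WHERE id NOT IN (
        SELECT o.id
        FROM orders o
        JOIN (
            SELECT user_id, MAX(created_at) AS t
            FROM orders
            GROUP BY user_id
        ) o1 ON o.user_id = o1.user_id AND o.created_at = o1.t
    )
    """
)

remaining = [row[0] for row in conn.execute("SELECT label FROM orders ORDER BY id")]
print(remaining)  # prints ['FRED THIRD', 'BARNEY FIRST']
```

Note that if a user has two orders with exactly the same newest created_at, both are kept.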
One way you might achieve this is to set a flag on the row indicating that it is the most recent order. When a new order is placed, you would clear the flag on the customer's other orders and set it on the row that you're inserting. Then your DELETE query could just delete all orders that don't have that flag set.

How to generate data in MySQL?

Here is my SQL:
SELECT
COUNT(id),
CONCAT(YEAR(created_at), '-', MONTH(created_at), '-', DAY(created_at))
FROM my_table
GROUP BY YEAR(created_at), MONTH(created_at), DAY(created_at)
I want a row to show up even for days where there was no ID created. Right now I'm missing a ton of dates for days where there was no activity.
Any thoughts on how to change this query to do that?
SQL is notoriously bad at returning data that is not in the database. You can find the beginning and ending values for gaps of dates, but getting all the dates is hard.
The solution is to create a calendar table with one record for each date and OUTER JOIN it to your query.
Here is an example assuming that created_at is type DATE:
SELECT calendar_date, COUNT(`id`)
FROM calendar LEFT OUTER JOIN my_table ON calendar.calendar_date = my_table.created_at
GROUP BY calendar_date
(I'm guessing that created_at is really DATETIME, so you'll have to do a bit more gymnastics to JOIN the tables).
General idea
There are two main approaches to generating data in MySQL. One is to generate the data on the fly when running the query, and the other is to keep it in the database and use it when necessary. Of course, the second one would be faster than the first if you're going to run your query frequently. However, the second one requires a table in the database whose only purpose is to generate the missing data. It also requires you to have sufficient privileges to create that table.
Dynamic data generation
This approach involves making UNIONs to generate a fake table that can be used to join the actual table with. The awful and repetitive query is:
select aDate from (
select @maxDate - interval (a.a+(10*b.a)+(100*c.a)+(1000*d.a)) day aDate from
(select 0 as a union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9) a, /*10 day range*/
(select 0 as a union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9) b, /*100 day range*/
(select 0 as a union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9) c, /*1000 day range*/
(select 0 as a union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9) d, /*10000 day range*/
(select @minDate := '2001-01-01', @maxDate := '2002-02-02') e
) f
where aDate between @minDate and @maxDate
Anyway, it is simpler than it seems. It makes cartesian products of derived tables with 10 numeric values each, so the result will have 10^X rows, where X is the number of derived tables in the query. In this example there is a 10000-day range, so you would be able to represent periods of over 27 years. If you need more, add another derived table (another block of UNIONs) to the query and update the interval, and if you don't need so many you can remove derived tables or individual values from them. Just to clarify, you can fine-tune the date period by applying a filter in the WHERE clause on the @minDate and @maxDate variables (but don't use a longer period than the one you created with the cartesian products).
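If the cross join of digit tables is hard to picture, this Python sketch mimics it step by step. It is only an illustration with names of my own choosing: `itertools.product` plays the role of the cartesian product, each tuple of digits becomes a day offset, and the final filter corresponds to the WHERE clause.

```python
from datetime import date, timedelta
from itertools import product


def date_range_from_digits(min_date, max_date, digits=4):
    """Cross join `digits` tables holding 0..9, combine each row into an
    offset between 0 and 10**digits - 1, subtract it from max_date, then
    keep only the dates inside [min_date, max_date]."""
    days = []
    for combo in product(range(10), repeat=digits):
        offset = sum(d * 10 ** i for i, d in enumerate(combo))
        candidate = max_date - timedelta(days=offset)
        if min_date <= candidate <= max_date:
            days.append(candidate)
    return sorted(days)


full = date_range_from_digits(date(2001, 1, 1), date(2002, 2, 2))
print(len(full), full[0], full[-1])

# With only two digit tables there are just 100 offsets, so a longer
# period is silently cut short.
short = date_range_from_digits(date(2001, 1, 1), date(2002, 2, 2), digits=2)
print(len(short), short[0])
```

The second call shows why the period must not be longer than what the cartesian product can cover: the result simply stops 99 days before the maximum date.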
Static data generation
This solution will require you to generate a table in your database. The approach is similar to the previous one. You'll have to first insert data into that table: a range of integers ranging from 1 to X where X is the maximum needed range. Again, if you are unsure just insert 100000 values and you'll be able to create day ranges for over 273 years. So, once you've got the integer sequence, you can transform it into a date range like this:
select '2012-01-01' + interval value - 1 day aDay from seq
having aDay <= '2012-01-05'
This assumes a table named seq with a column named value. The start date goes at the top and the end date at the bottom.
Turning this into something useful
Ok, now we have our date periods generated but we're still missing a way to query data and display the missing values as an actual 0. This is where left join comes to the rescue. To make sure we're all on the same page, a left join is similar to an inner join but with only one difference: it will preserve all records from the left table of the join, regardless of whether there is a matching record on the table of the right. In other words, an inner join will remove all non-matched rows on the join while the left join will keep the ones on the left table and, for the records on the left that have no matching record on the right table, the left join will fill that "space" with a null value.
So we should join our domain table (the one that has "missing" data) with our newly generated table putting the latter on the left part of the join and the former on the right, so that all elements are considered, regardless of their presence in the domain table.
For example, if we had a table domainTable with fields ID, birthDate and we would like to see a count of all the birthDate in the first 5 days of 2012 per day and if the count is 0 to show that value, then this query could be run:
select allDays.aDay, count(dt.id) from (
select '2012-01-01' + interval value - 1 day aDay from seq
having aDay <= '2012-01-05'
) allDays
left join domainTable dt on allDays.aDay = dt.birthDate
group by allDays.aDay
This generates a derived table with all the required days (notice I'm using the static data generation) and performs a left join against our domain table, so all days will be displayed, regardless of whether they have matching values in our domain table. Also note that the count should be done on a field from the domain table, which will be null for unmatched days, since nulls are not counted.
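The effect of the left join (every generated day appears, with 0 where nothing matches) can be sketched in Python like this. The function name and the three birth dates are made up for the example; they are not data from the question.

```python
from collections import Counter
from datetime import date, timedelta


def counts_per_day(birth_dates, first_day, last_day):
    """Return (day, count) for every day in the range, including days with no rows."""
    counts = Counter(birth_dates)  # plays the role of GROUP BY on the domain table
    result = []
    day = first_day
    while day <= last_day:
        # Like the left join: the generated day is always kept, and a day
        # with no match gets a count of 0 instead of disappearing.
        result.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return result


# Hypothetical domain data: two births on Jan 1 and one on Jan 3.
births = [date(2012, 1, 1), date(2012, 1, 1), date(2012, 1, 3)]
for day, count in counts_per_day(births, date(2012, 1, 1), date(2012, 1, 5)):
    print(day, count)
```

This prints five rows, one per day, with counts 2, 0, 1, 0 and 0.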
Notes to be considered
1) The queries can be used to query other intervals (months, years) performing small changes to the code
2) Instead of hardcoding the dates you can query for min and max values from the domain tables like this:
select (select min(aDate) from domainTable) + interval value - 1 day aDay
from seq
having aDay <= (select max(aDate) from domainTable)
This would avoid generating more records than necessary.
Actually answering your question
I think you should have already figured out how to do what you want. Anyway, here are the steps so that others can benefit from them too. Firstly, create the integer table. Secondly, run this query:
select allDays.aDay, count(mt.id) aCount from (
select (select date(min(created_at)) from my_table) + interval value - 1 day aDay
from seq s
having aDay <= (select date(max(created_at)) from my_table)
) allDays
left join my_table mt on allDays.aDay = date(mt.created_at)
group by allDays.aDay
I guess created_at is a datetime and that's why you're concatenating that way. However, that happens to be the way MySQL natively stores dates, so I'm just grouping by the date field but casting the created_at to an actual date datatype. You can play with it using this fiddle.
And here is the solution generating data dynamically:
select allDays.aDay, count(mt.id) aCount from (
select @maxDate - interval a.a day aDay from
(select 0 as a union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9) a, /*10 day range*/
(select @minDate := (select date(min(created_at)) from my_table),
@maxDate := (select date(max(created_at)) from my_table)) e
where @maxDate - interval a.a day between @minDate and @maxDate
) allDays
left join my_table mt on allDays.aDay = date(mt.created_at)
group by allDays.aDay
As you can see, the skeleton of the query is the same as the previous one. The only thing that changes is how the derived table allDays is generated. The way the derived table is generated is also slightly different from the one I showed before, because in the example fiddle I only needed a 10-day range. As you can see, it is more readable than adding a 1000-day range. Here is the fiddle for the dynamic solution so that you can play with it too.
Hope this helps!
The way to do it in one query:
SELECT COUNT(my_table.id) AS total,
CONCAT(YEAR(dates.ddate), '-', MONTH(dates.ddate), '-', DAY(dates.ddate))
FROM (
-- Creates "on the fly" 65536 days beginning from 2000-01-01 (179 years)
SELECT DATE_ADD("2000-01-01", INTERVAL (b1.b + b2.b + b3.b + b4.b + b5.b + b6.b + b7.b + b8.b + b9.b + b10.b + b11.b + b12.b + b13.b + b14.b + b15.b + b16.b) DAY) AS ddate FROM
(SELECT 0 AS b UNION SELECT 1) b1,
(SELECT 0 AS b UNION SELECT 2) b2,
(SELECT 0 AS b UNION SELECT 4) b3,
(SELECT 0 AS b UNION SELECT 8) b4,
(SELECT 0 AS b UNION SELECT 16) b5,
(SELECT 0 AS b UNION SELECT 32) b6,
(SELECT 0 AS b UNION SELECT 64) b7,
(SELECT 0 AS b UNION SELECT 128) b8,
(SELECT 0 AS b UNION SELECT 256) b9,
(SELECT 0 AS b UNION SELECT 512) b10,
(SELECT 0 AS b UNION SELECT 1024) b11,
(SELECT 0 AS b UNION SELECT 2048) b12,
(SELECT 0 AS b UNION SELECT 4096) b13,
(SELECT 0 AS b UNION SELECT 8192) b14,
(SELECT 0 AS b UNION SELECT 16384) b15,
(SELECT 0 AS b UNION SELECT 32768) b16
) dates
LEFT JOIN my_table ON dates.ddate = my_table.created_at
GROUP BY dates.ddate
ORDER BY dates.ddate
The following code is only needed if you want to test and don't have the "my_table" from the question:
create table `my_table` (
`id` int (11),
`created_at` date
);
insert into `my_table` (`id`, `created_at`) values('1','2000-01-01');
insert into `my_table` (`id`, `created_at`) values('2','2000-01-01');
insert into `my_table` (`id`, `created_at`) values('3','2000-01-01');
insert into `my_table` (`id`, `created_at`) values('4','2001-01-01');
insert into `my_table` (`id`, `created_at`) values('5','2100-06-06');
Testbed:
create table testbed (id integer, created_at date);
insert into testbed values
(1, '2012-04-01'),
(1, '2012-04-30'),
(2, '2012-04-02'),
(3, '2012-04-03'),
(3, '2012-04-04'),
(4, '2012-04-04');
I also use any_table, which I created artificially like this:
create table any_table (id integer);
insert into any_table values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
insert into any_table select * from any_table; -- repeat this insert 7-8 times
You can use any table in your database that is expected to have more rows than the max(created_at) - min(created_at) range, at least 365 to cover a year.
Query:
SELECT concat(year(dr._date),'-',month(dr._date),'-',day(dr._date)),
-- or, instead of concat(), simply: dr._date
count(id)
FROM (
SELECT date_add(r.mindt, INTERVAL @dist day) _date,
@dist := @dist + 1 AS days_away
FROM any_table t
JOIN (SELECT min(created_at) mindt,
max(created_at) maxdt,
@dist := 0
FROM testbed) r
WHERE date_add(r.mindt, INTERVAL @dist day) <= r.maxdt) dr
LEFT JOIN testbed tb ON dr._date = tb.created_at
GROUP BY dr._date;