MySQL 5.5 - count open items per day with condition

I have been working on something that is easy in Excel but that I cannot do in MySQL. This is a follow-up to the question below, with new values and new requirements:
MySQL 5.5 - count open items per day
So, again, I have the same table in Excel and I want to reproduce its Count_open column in MySQL.
Excel's formula is =COUNTIFS($A$2:$A$30000,"<="&E2,$B$2:$B$30000,">="&E2)
So, in my T1 table I have two dates, open and close, and I want to calculate how many items were open on each date.
Previously I used a temp table for the last 7 days, but this time I need to stick to the T1 table.
To create the T1 table, I use the following code:
CREATE TABLE T1
(
ID int (10),
Open_Date date,
Close_Date date);
insert into T1 values (1, '2018-12-17', '2018-12-18');
insert into T1 values (2, '2018-12-18', '2018-12-18');
insert into T1 values (3, '2018-12-18', '2018-12-18');
insert into T1 values (4, '2018-12-19', '2018-12-20');
insert into T1 values (5, '2018-12-19', '2018-12-21');
insert into T1 values (6, '2018-12-20', '2018-12-22');
insert into T1 values (7, '2018-12-20', '2018-12-22');
insert into T1 values (8, '2018-12-21', '2018-12-25');
insert into T1 values (9, '2018-12-22', '2018-12-26');
insert into T1 values (10, '2018-12-23', '2018-12-27');
So far I have tried the code below, but it does not give the correct results.
SELECT T1.Open_Date, count(*) FROM T1
WHERE
T1.Open_Date>='2018-12-01' and T1.Close_Date <='2019-03-17'
GROUP BY T1.Open_Date;
I am lost at the moment and your help is much needed!

The difference between Excel and a database is that in Excel you first generated the dates manually. You could do that in MySQL too, and write one query for every date. That is basically what your Excel sheet does.
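For a single date, and assuming column A holds Open_Date and column B holds Close_Date, one of those per-date queries is a literal translation of the COUNTIFS (2018-12-20 is just an example date):

```sql
-- COUNTIFS(A <= date, B >= date) for one date
SELECT COUNT(*) AS Count_open
FROM T1
WHERE Open_Date  <= '2018-12-20'
  AND Close_Date >= '2018-12-20';
```

With the sample rows above this should return 4 (IDs 4, 5, 6 and 7).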
But luckily MySQL isn't Excel, so we can automate that. First we must generate a range of dates. There is a big thread about that here: generate days from date range.
Then we just have to group the valid dates and voila:
Select a.Date, Count(t.ID)
from (
select curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a) ) DAY as Date
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
) a, T1 t
where a.Date between '2018-12-01' and '2019-03-17'
and a.Date between t.Open_Date and t.Close_Date
group by a.Date
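Because the query above joins T1 with an inner join, dates on which nothing was open are dropped. If you also want those dates with a count of 0, a sketch (still MySQL 5.5 compatible, no CTEs) is to LEFT JOIN the generated dates instead. The window 2018-12-15 to 2018-12-31 is only an example, and the two digit tables cover at most 100 days:

```sql
SELECT d.Date, COUNT(t.ID) AS Count_open   -- COUNT(t.ID) ignores the NULLs of unmatched days
FROM (
    -- start date + 0..99 days
    SELECT DATE('2018-12-15') + INTERVAL (a.n + 10 * b.n) DAY AS Date
    FROM (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
          UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
    CROSS JOIN
         (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
          UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
) d
LEFT JOIN T1 t
       ON d.Date BETWEEN t.Open_Date AND t.Close_Date
WHERE d.Date <= '2018-12-31'
GROUP BY d.Date
ORDER BY d.Date;
```

With the sample data, 2018-12-15 and 2018-12-16 (before the first item opens) should come back with 0 instead of being missing.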

Related

Get number of Mondays in a date range in MySQL

I want to get the number of Mondays in a date range in MySQL. I ran this code, but it returns 0:
select count(*) from tarif where weekday(`end_tarif`<= '2019-02-21'AND `start_tarif`>='2019-02-05') = 0;
my table:
CREATE TABLE `tarif` (
`tarif_id` int(11) NOT NULL AUTO_INCREMENT,
`start_tarif` date NOT NULL,
`end_tarif` date NOT NULL,
`day_tarif` varchar(50) NOT NULL,
PRIMARY KEY (`tarif_id`)
);
INSERT INTO `tarif` VALUES (1, '2019-02-01', '2019-02-10', '10'),
(2, '2019-02-11', '2019-02-20', '20'),
(3, '2019-02-21', '2019-02-28', '10'),
(4, '2019-03-01', '2019-02-10', '15');
You can solve this with a calendar table, for example like this:
1. create a table with calendar data
-- create the table "calendar"
CREATE TABLE `calendar` (
`dateValue` DATE
);
-- insert the days to the table "calendar"
INSERT INTO calendar
SELECT adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) gen_date from
(select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
(select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4
HAVING gen_date BETWEEN '2019-01-01' AND '2019-12-31'
You can find the script to generate the calendar data on StackOverflow:
How to populate a table with a range of dates?
2. create the table with your data (with monday tarif)
-- create the table "tarif"
CREATE TABLE tarif (
tarif_id INT(11) NOT NULL AUTO_INCREMENT,
start_tarif DATE NOT NULL,
end_tarif DATE NOT NULL,
day_tarif VARCHAR(50) NOT NULL,
monday_tarif VARCHAR(50) NOT NULL,
PRIMARY KEY (tarif_id)
);
-- insert the tarif information
INSERT INTO tarif VALUES
(1, '2019-02-01', '2019-02-10', '10', '5'),
(2, '2019-02-11', '2019-02-20', '20', '5'),
(3, '2019-02-21', '2019-02-28', '10', '5'),
(4, '2019-03-01', '2019-02-10', '15', '5');
Note: To create a useful example, I added the column monday_tarif and inserted the value 5 for every date range.
3. get the result
Now you can get all days of the required range (between 2019-02-05 and 2019-02-21) from the calendar table. With a LEFT JOIN you attach your tarif table to every day of the date range.
With a CASE WHEN and the condition DAYOFWEEK = 2 or DAYNAME = 'Monday' you can check if the current date is a Monday or not, to get the correct tarif value of the day.
SELECT SUM(CASE WHEN DAYOFWEEK(cal.dateValue) = 2 THEN tarif.monday_tarif ELSE tarif.day_tarif END) AS sumWithMondayTarif
FROM calendar cal
LEFT JOIN tarif ON cal.dateValue BETWEEN start_tarif AND end_tarif
WHERE cal.dateValue BETWEEN '2019-02-05' AND '2019-02-21';
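If all you need is the number of Mondays in the range (what the question originally asked), the calendar table from step 1 is enough on its own. A sketch:

```sql
-- DAYOFWEEK(): 1 = Sunday, 2 = Monday, ..., 7 = Saturday
SELECT COUNT(*) AS mondays
FROM calendar
WHERE dateValue BETWEEN '2019-02-05' AND '2019-02-21'
  AND DAYOFWEEK(dateValue) = 2;
```

This should return 2 (2019-02-11 and 2019-02-18).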
You can also generate the calendar in a subquery instead of using a table:
SELECT SUM(CASE WHEN DAYOFWEEK(cal.dateValue) = 2 THEN tarif.monday_tarif ELSE tarif.day_tarif END) AS sumWithMondayTarif FROM (
SELECT adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) dateValue FROM
(select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
(select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4
HAVING dateValue BETWEEN '2019-02-05' AND '2019-02-21'
) cal LEFT JOIN tarif ON cal.dateValue BETWEEN start_tarif AND end_tarif
demo on dbfiddle.uk
The query below counts Sundays. You can modify it to suit your requirements:
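-- WEEKDAY(): 0 = Monday ... 6 = Sunday; the 6 in both IF()s selects Sunday.
-- DATEDIFF() gives whole days, unlike a UNIX_TIMESTAMP() difference, which can
-- be off by an hour across a daylight-saving change.
select ROUND((
DATEDIFF(`end_tarif`, `start_tarif`)
-7+WEEKDAY(`start_tarif`)-WEEKDAY(`end_tarif`)
)/7)
+ if(WEEKDAY(`start_tarif`) <= 6, 1, 0)
+ if(WEEKDAY(`end_tarif`) >= 6, 1, 0) as Sunday
from tarif
where `end_tarif`<= '2019-02-21' AND `start_tarif`>='2019-02-05' ;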
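To adapt the formula to another weekday, change the 6 in both IF() calls to that day's WEEKDAY() value (0 for Monday). A sketch for Mondays between two literal dates; the @d1/@d2 variables are only there for illustration:

```sql
SET @d1 = '2019-02-05', @d2 = '2019-02-21';
SELECT ROUND((DATEDIFF(@d2, @d1) - 7 + WEEKDAY(@d1) - WEEKDAY(@d2)) / 7)
     + IF(WEEKDAY(@d1) <= 0, 1, 0)   -- 0 = Monday
     + IF(WEEKDAY(@d2) >= 0, 1, 0)   -- always 1 for Monday, as the first IF is for Sunday
       AS mondays;
```

For this range it should return 2, matching the calendar-table approach.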

how to get series of overlapping events in MySQL

I have a table with overlapping time periods. I would like to group the overlapping time events which are continuous (i.e. not separated with time gaps).
ID StartDate EndDate
1 2013-01-30 2013-01-31
2 2013-01-31 2013-01-31
3 2013-01-29 2013-01-31
4 2013-01-25 2013-01-28
5 2013-01-29 2013-01-30
6 2013-02-01 2013-02-01
7 2013-01-31 2013-02-02
8 2013-02-04 2013-02-05
9 2013-02-05 2013-02-06
10 2013-02-08 2013-02-09
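Days from 2013-01-24 to 2013-02-09 (== marks the days each event covers):
ID 24 25 26 27 28 29 30 31 01 02 03 04 05 06 07 08 09
1  .. .. .. .. .. .. ===== .. .. .. .. .. .. .. .. ..
2  .. .. .. .. .. .. .. == .. .. .. .. .. .. .. .. ..
3  .. .. .. .. .. ======== .. .. .. .. .. .. .. .. ..
4  .. =========== .. .. .. .. .. .. .. .. .. .. .. ..
5  .. .. .. .. .. ===== .. .. .. .. .. .. .. .. .. ..
6  .. .. .. .. .. .. .. .. == .. .. .. .. .. .. .. ..
7  .. .. .. .. .. .. .. ======== .. .. .. .. .. .. ..
8  .. .. .. .. .. .. .. .. .. .. .. ===== .. .. .. ..
9  .. .. .. .. .. .. .. .. .. .. .. .. ===== .. .. ..
10 .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. =====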
As a result I would like to have following four time groups:
group 1 (IDs: 1, 2, 3, 5, 6, 7)
group 2 (ID: 4)
group 3 (IDs: 8, 9)
group 4 (ID: 10)
Is there an easy way to do this in SQL? Here is the SQL to create my example table:
DROP TABLE IF EXISTS tb_data_log;
CREATE TABLE tb_data_log (
`event_id` int(10) unsigned NOT NULL,
`startdate` date DEFAULT NULL,
`enddate` date DEFAULT NULL
);
INSERT INTO tb_data_log VALUES (1, '2013-01-30', '2013-01-31');
INSERT INTO tb_data_log VALUES (2, '2013-01-31', '2013-01-31');
INSERT INTO tb_data_log VALUES (3, '2013-01-29', '2013-01-31');
INSERT INTO tb_data_log VALUES (4, '2013-01-25', '2013-01-28');
INSERT INTO tb_data_log VALUES (5, '2013-01-29', '2013-01-30');
INSERT INTO tb_data_log VALUES (6, '2013-02-01', '2013-02-01');
INSERT INTO tb_data_log VALUES (7, '2013-01-31', '2013-02-02');
INSERT INTO tb_data_log VALUES (8, '2013-02-04', '2013-02-05');
INSERT INTO tb_data_log VALUES (9, '2013-02-05', '2013-02-06');
INSERT INTO tb_data_log VALUES (10, '2013-02-08', '2013-02-09');
EDIT #1:
It looks like the problem is a bit hard to understand, so here is the desired output:
GroupID StartDate EndDate Overlapped Id
1 2013-01-29 2013-02-02 1, 2, 3, 5, 6, 7
2 2013-01-25 2013-01-28 4
3 2013-02-04 2013-02-06 8,9
4 2013-02-08 2013-02-09 10
Here is a solution. It should work and uses no stored procedure:
select per_start,per_end,group_concat(contained.event_id) from tb_data_log contained,(
select distinct start.startdate as per_start,
finish.enddate as per_end
from tb_data_log start join tb_data_log finish
on start.startdate <= finish.enddate -- first find all possible periods
where not exists (-- make sure there are two events in the period which do not overlap and between whom there is no event.
select * from tb_data_log a, tb_data_log b where
a.enddate < b.startdate and
a.startdate>=start.startdate and
b.enddate<=finish.enddate and not exists
(
select * from tb_data_log inside where
inside.event_id <> a.event_id
and inside.event_id<> b.event_id
and inside.enddate > a.enddate and inside.startdate < b.startdate
)
)
and not exists (-- make sure there is no longer period
select * from tb_data_log later where later.startdate<=finish.enddate and later.enddate >finish.enddate
)
and not exists (-- make sure there is no longer period
select * from tb_data_log earlier where earlier.startdate<start.startdate and earlier.enddate >=start.startdate
)
) periods where contained.enddate<=per_end and contained.startdate>=per_start
group by per_start,per_end
The idea is to first find all possible periods by joining the table with itself. Then, for each period P, make sure that there is no pair of events A, B such that A ends before B starts (no overlap), both are contained in P, and no other event lies between them. Finally, make sure that P is the longest possible period, i.e. no event extends past either end of it.
Here is the previous solution I posted. It is worse, but I'm keeping it for reference.
It is probably not very efficient. I used the selected answer from here:
How to get list of dates between two dates in mysql select query
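Note that this query will stop working for dates more than 99,999 days after 1970-01-01 (roughly the year 2243), because that is all the day generator below can produce.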
select per_start,per_end,group_concat(contained.event_id) from tb_data_log contained,(
select distinct start.startdate as per_start,
finish.enddate as per_end
from tb_data_log start, tb_data_log finish
where start.startdate <= finish.enddate -- first find all possible periods
and not exists (-- make sure there are no two consecutive days that are not contained in some event period.
select * from
(select adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) day1, adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0+1) day2 from
(select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
(select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) v
where day1 between start.startdate and finish.enddate and day2 between start.startdate and finish.enddate
and not exists (
select * from tb_data_log where tb_data_log.startdate <= cast(day1 as date) and tb_data_log.enddate >= cast(day2 as date)
)
)
and not exists (-- make sure there is no longer period
select * from tb_data_log later where later.startdate<=finish.enddate and later.enddate >finish.enddate
)
and not exists (-- make sure there is no longer period
select * from tb_data_log earlier where earlier.startdate<start.startdate and earlier.enddate >=start.startdate
)
) periods where contained.enddate<=per_end and contained.startdate>=per_start
group by per_start,per_end
The idea is to first find all possible periods by joining the table with itself. Then, for each period, make sure that there is no pair of consecutive days inside it that is not covered by some event in the table. Finally, make sure that it is the longest possible period.
I think the performance of this query can be somewhat improved.
Something close to the answer (only approximately right):
select
tmp.group_id, group_concat(tmp.id)
from
(select
a.event_id as 'group_id', b.event_id as 'id'
from
tb_data_log a
LEFT join tb_data_log b ON (a.startdate BETWEEN b.startdate AND b.enddate)
or (a.enddate BETWEEN b.startdate AND b.enddate)) as tmp
group by group_id
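Another approach that works on MySQL 5.x without stored procedures is the classic "gaps and islands" pass: sort the events by start date, keep a running maximum end date in a user variable, and start a new group whenever an event starts after that maximum. This is a sketch against tb_data_log; it relies on user variables being evaluated row by row, in the derived table's order and left to right in the select list, which MySQL does not formally guarantee (assigning variables inside SELECT is also deprecated in MySQL 8.0), so test it on your version:

```sql
SELECT grp AS GroupID,
       MIN(startdate) AS StartDate,
       MAX(enddate)   AS EndDate,
       GROUP_CONCAT(event_id ORDER BY event_id) AS OverlappedIds
FROM (
    SELECT d.event_id, d.startdate, d.enddate,
           -- new group if this event starts after every earlier event has ended
           @grp    := IF(d.startdate > @maxend, @grp + 1, @grp) AS grp,
           -- running maximum end date of the current group
           @maxend := IF(d.startdate > @maxend, d.enddate, GREATEST(@maxend, d.enddate)) AS maxend
    FROM (SELECT event_id, startdate, enddate
          FROM tb_data_log
          ORDER BY startdate, enddate) d
    CROSS JOIN (SELECT @grp := 0, @maxend := '1000-01-01') init
) t
GROUP BY grp
ORDER BY grp;
```

An event that starts on the day another one ends (8 and 9) joins its group, while one that starts the day after (4 ending 01-28, then 3 and 5 starting 01-29) opens a new group, which matches the desired output. The groups are numbered in start-date order, so ID 4 comes out as group 1 rather than group 2.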

how can we insert an array with last_insert_id into a table?

I have a stored procedure in mysql with this syntax:
insert into tbl1 (p1, p2) values (p2, p3);
set inserted_id = last_insert_id();
insert into tbl2 (id, image) values (inserted_id, 'list_of_image');
Now I don't know how to split this image list (which doesn't have a fixed count) for this insert.
Any ideas?
Assuming that 'list_of_image' is a string that contains comma-delimited values, you can do the following in pure SQL:
DELIMITER $$
CREATE PROCEDURE sp_insert_images(IN p3 VARCHAR(64), IN p4 VARCHAR(64), IN images VARCHAR(512))
BEGIN
DECLARE new_id INT;
INSERT INTO Table1 (p1, p2)
VALUES (p3, p4);
-- remember the generated id before running the next INSERT
SET new_id = LAST_INSERT_ID();
-- the n-th image is the last element of the first n comma-separated elements;
-- TRIM() removes the blank that follows each comma in 'image1, image2, ...'
INSERT INTO Table2 (id, image)
SELECT new_id, TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(i.images, ',', n.n), ',', -1)) image
FROM
(
SELECT images images
) i CROSS JOIN
(
-- numbers 1..100
SELECT a.N + b.N * 10 + 1 n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
ORDER BY n
) n
-- one row per element: number of commas + 1
WHERE n.n <= 1 + (LENGTH(i.images) - LENGTH(REPLACE(i.images, ',', '')));
END$$
DELIMITER ;
Call your SP:
CALL sp_insert_images('Some value1', 'Some value2', 'image1, image2, image3');
Here is a SQLFiddle demo.
Note:
The example query will split up to 100 comma-separated values. If you need more or fewer, you can adjust the limit by editing the inner subquery.
You might consider creating a permanent tally (numbers) table and using it instead of the inner select (with the alias n) that produces a sequence of numbers on the fly.
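A sketch of that suggestion, assuming a hypothetical permanent table named numbers that holds 1..100; the @images value is just the example string from the CALL above:

```sql
-- hypothetical tally table with the integers 1..100
CREATE TABLE numbers (n INT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO numbers (n)
SELECT a.N + b.N * 10 + 1
FROM (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
CROSS JOIN
     (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b;

-- split a comma-separated list with it
SET @images = 'image1, image2, image3';
SELECT n.n,
       TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(@images, ',', n.n), ',', -1)) AS image
FROM numbers n
WHERE n.n <= 1 + LENGTH(@images) - LENGTH(REPLACE(@images, ',', ''))
ORDER BY n.n;
```

The SELECT should return three rows: image1, image2 and image3. Inside the procedure you would use the same FROM numbers n in place of the generated n subquery.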

SELECT de-normalized columns into separate records?

I am playing around with SQL a little so that I am not completely ignorant about it if I am ever asked about it in a job interview. My friend was recently asked the following question at an interview and couldn't answer it, and a colleague at work who knows SQL decently didn't know either. Can you answer this problem and then explain how it works?
*The problem*
Database normalization (or lack of normalization) often presents a challenge for developers.
Consider a database table of employees that contains three fields:
EmployeeID
EmployeeName
EmailAddresses
Every employee, identified by a unique EmployeeID, may have one or more comma-separated @rockauto.com email address(es) in the EmailAddresses field.
The database table is defined below:
CREATE TABLE Employees
(
EmployeeID int UNSIGNED NOT NULL,
EmployeeName varchar(50) NOT NULL,
EmailAddresses varchar(40) NOT NULL,
PRIMARY KEY(EmployeeID)
);
For testing purposes, here is some sample data:
INSERT INTO Employees (EmployeeID, EmployeeName, EmailAddresses) VALUES
('1', 'Bill', 'bill@companyx.com'),
('2', 'Fred', 'fred@companyx.com,freddie@companyx.com'),
('3', 'Fred', 'fredsmith@companyx.com'),
('4', 'Joe', 'joe@companyx.com,joe_smith@companyx.com');
Your task is to write a single MySQL SELECT query that will show the following output for the sample data above:
Employee EmailAddress
Bill bill@companyx.com
Fred (2) fred@companyx.com
Fred (2) freddie@companyx.com
Fred (3) fredsmith@companyx.com
Joe joe@companyx.com
Joe joe_smith@companyx.com
Please note that because there is more than one person with the same name (in this case, "Fred"), the EmployeeID is included in parentheses.
Your query must be written in MySQL version 5.1.41-compatible syntax. You should assume that the ordering is accomplished using standard database ascending ordering: "ORDER BY EmployeeID ASC"
For this problem, you need to submit a single SQL SELECT query. Your query should be able to process a table of 1000 records in a reasonable amount of time.
This works only if no employee has more than 10000 emails... is that acceptable?
select
if(t1.c > 1, concat(e.employeename, ' (', e.employeeid, ')'), e.employeename) as Employee,
replace(substring(substring_index(e.EmailAddresses, ',', n.row), length(substring_index(e.EmailAddresses, ',', n.row - 1)) + 1), ',', '') EmailAddress
from
(select employeename, count(*) as c from Employees group by employeename) as t1,
(select EmployeeID, length(EmailAddresses) - length(replace(EmailAddresses,',','')) + 1 as emails from Employees) as t2,
(SELECT @row := @row + 1 as row FROM
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) x,
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) x2,
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) x3,
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) x4,
(SELECT @row:=0) as ff) as n,
Employees e
where
e.employeename = t1.employeename and
e.employeeid = t2.employeeid and
n.row <= t2.emails
order by e.employeeid;
EDIT:
With fewer useless numbers generated (this version handles at most 4 addresses per employee):
select
if(t1.c > 1, concat(e.EmployeeName, ' (', e.EmployeeID, ')'), e.EmployeeName) as Employee,
replace(substring(substring_index(e.EmailAddresses, ',', n.row), length(substring_index(e.EmailAddresses, ',', n.row - 1)) + 1), ',', '') as EmailAddress
from
(select EmployeeName, count(*) as c from Employees group by EmployeeName) as t1,
(select EmployeeID, length(EmailAddresses) - length(replace(EmailAddresses,',','')) + 1 as emails from Employees) as t2,
(select `1` as row from (select 1 union all select 2 union all select 3 union all select 4) x) as n,
Employees e
where
e.EmployeeName = t1.EmployeeName and
e.EmployeeID = t2.EmployeeID and
n.row <= t2.emails
order by e.EmployeeID;
And what did we learn? Poor database design results in awful queries. And you can do things with SQL that are probably supported only because people design databases poorly... :)
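For comparison, here is a sketch of the same idea written with explicit JOINs and the nested SUBSTRING_INDEX() trick instead of SUBSTRING()/REPLACE(). Like the EDIT above, it only handles up to 4 addresses per employee (extend the numbers list if you need more), and it sticks to MySQL 5.1-era syntax:

```sql
SELECT IF(t1.c > 1, CONCAT(e.EmployeeName, ' (', e.EmployeeID, ')'), e.EmployeeName) AS Employee,
       -- n-th address = last element of the first n comma-separated elements
       SUBSTRING_INDEX(SUBSTRING_INDEX(e.EmailAddresses, ',', n.n), ',', -1) AS EmailAddress
FROM Employees e
JOIN (SELECT EmployeeName, COUNT(*) AS c
      FROM Employees
      GROUP BY EmployeeName) t1
  ON t1.EmployeeName = e.EmployeeName
JOIN (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) n
  -- number of addresses = number of commas + 1
  ON n.n <= 1 + LENGTH(e.EmailAddresses) - LENGTH(REPLACE(e.EmailAddresses, ',', ''))
ORDER BY e.EmployeeID, n.n;
```

On the sample data this should produce the six rows of the expected output, in the same order.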

How to generate data in MySQL?

Here is my SQL:
SELECT
COUNT(id),
CONCAT(YEAR(created_at), '-', MONTH(created_at), '-', DAY(created_at))
FROM my_table
GROUP BY YEAR(created_at), MONTH(created_at), DAY(created_at)
I want a row to show up even for days where there was no ID created. Right now I'm missing a ton of dates for days where there was no activity.
Any thoughts on how to change this query to do that?
SQL is notoriously bad at returning data that is not in the database. You can find the beginning and ending values for gaps of dates, but getting all the dates is hard.
The solution is to create a calendar table with one record for each date and OUTER JOIN it to your query.
Here is an example assuming that created_at is type DATE:
SELECT calendar_date, COUNT(`id`)
FROM calendar LEFT OUTER JOIN my_table ON calendar.calendar_date = my_table.created_at
GROUP BY calendar_date
(I'm guessing that created_at is really DATETIME, so you'll have to do a bit more gymnastics to JOIN the tables).
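One way to do those gymnastics, sketched under the assumptions that the calendar table has one DATE row per day in calendar_date and that created_at is DATETIME: join each day to the half-open range [day, next day) instead of wrapping created_at in DATE(), which keeps the condition usable by an index on created_at. The April 2012 window is only an example:

```sql
SELECT c.calendar_date, COUNT(t.id) AS cnt
FROM calendar c
LEFT OUTER JOIN my_table t
       ON t.created_at >= c.calendar_date                    -- from midnight of that day
      AND t.created_at <  c.calendar_date + INTERVAL 1 DAY   -- up to, not including, the next midnight
WHERE c.calendar_date BETWEEN '2012-04-01' AND '2012-04-30'
GROUP BY c.calendar_date
ORDER BY c.calendar_date;
```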
General idea
There are two main approaches to generating data in MySQL. One is to generate the data on the fly when running the query; the other is to store it in the database and use it when necessary. Of course, the second one will be faster if you are going to run your query frequently. However, it requires a table in the database whose only purpose is to generate the missing data, and you need enough privileges to create that table.
Dynamic data generation
This approach involves making UNIONs to generate a fake table that can be used to join the actual table with. The awful and repetitive query is:
select aDate from (
select @maxDate - interval (a.a+(10*b.a)+(100*c.a)+(1000*d.a)) day aDate from
(select 0 as a union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9) a, /*10 day range*/
(select 0 as a union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9) b, /*100 day range*/
(select 0 as a union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9) c, /*1000 day range*/
(select 0 as a union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9) d, /*10000 day range*/
(select @minDate := '2001-01-01', @maxDate := '2002-02-02') e
) f
where aDate between @minDate and @maxDate
Anyway, it is simpler than it seems. It makes Cartesian products of derived tables with 10 numeric values each, so the result has 10^X rows, where X is the number of those derived tables in the query. In this example there is a 10000-day range, so you can represent periods of over 27 years. If you need more, add another derived table to the query and extend the interval expression; if you don't need that many, you can remove derived tables or individual values from them. To fine-tune the date period, filter with a WHERE clause on the @minDate and @maxDate variables (but don't use a longer period than the one the Cartesian products can generate).
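To see the row-count arithmetic in isolation, here is a minimal check with just two digit tables: every (units, tens) pair appears exactly once, so you get 10^2 rows numbered 0 to 99. The aliases follow the query above; nothing else is assumed:

```sql
SELECT COUNT(*)            AS n_rows,    -- 10 * 10 = 100
       MIN(a.a + 10 * b.a) AS lowest,    -- 0
       MAX(a.a + 10 * b.a) AS highest    -- 99
FROM (select 0 as a union all select 1 union all select 2 union all select 3
      union all select 4 union all select 5 union all select 6 union all
      select 7 union all select 8 union all select 9) a
CROSS JOIN
     (select 0 as a union all select 1 union all select 2 union all select 3
      union all select 4 union all select 5 union all select 6 union all
      select 7 union all select 8 union all select 9) b;
```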
Static data generation
This solution will require you to generate a table in your database. The approach is similar to the previous one. You'll have to first insert data into that table: a range of integers ranging from 1 to X where X is the maximum needed range. Again, if you are unsure just insert 100000 values and you'll be able to create day ranges for over 273 years. So, once you've got the integer sequence, you can transform it into a date range like this:
select '2012-01-01' + interval value - 1 day aDay from seq
having aDay <= '2012-01-05'
This assumes a table named seq with a column named value. The start date goes in the SELECT line and the end date in the HAVING clause.
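A sketch of filling such a table, assuming the seq/value names used here and reusing the digit cross join from the dynamic approach. Four digit tables give the integers 1 to 10000 (about 27 years of days); add a fifth one if you want the 100000 values suggested above:

```sql
-- seq holds the integers 1..10000
CREATE TABLE seq (value INT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO seq (value)
SELECT 1 + a.d + 10 * b.d + 100 * c.d + 1000 * e.d
FROM (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
CROSS JOIN
     (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
CROSS JOIN
     (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
CROSS JOIN
     (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) e;
```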
Turning this into something useful
OK, now we have our date periods generated, but we're still missing a way to query data and display the missing values as an actual 0. This is where a left join comes to the rescue. To make sure we're all on the same page, a left join is similar to an inner join with only one difference: it preserves all records from the left table of the join, regardless of whether there is a matching record in the right table. In other words, an inner join removes all non-matched rows, while a left join keeps the rows of the left table and, for those that have no matching record in the right table, fills that "space" with NULL values.
So we should join our domain table (the one that has "missing" data) with our newly generated table putting the latter on the left part of the join and the former on the right, so that all elements are considered, regardless of their presence in the domain table.
For example, suppose we had a table domainTable with fields ID and birthDate, and we wanted a per-day count of birthDate values for the first 5 days of 2012, showing 0 for days with no rows. Then this query could be run:
select allDays.aDay, count(dt.id) from (
select '2012-01-01' + interval value - 1 day aDay from seq
having aDay <= '2012-01-05'
) allDays
left join domainTable dt on allDays.aDay = dt.birthDate
group by allDays.aDay
This generates a derived table with all the required days (notice I'm using the static data generation) and performs a left join against our domain table, so all days will be displayed, regardless of whether they have matching values in the domain table. Also note that the count must be done on a column of the domain table: for days without a match that column is NULL, and COUNT(column) ignores NULLs, so those days show 0 (COUNT(*) would show 1).
Notes to be considered
1) The queries can be used for other intervals (months, years) with small changes to the code
2) Instead of hardcoding the dates you can query for the min and max values from the domain table like this:
select (select min(birthDate) from domainTable) + interval value - 1 day aDay
from seq
having aDay <= (select max(birthDate) from domainTable)
This would avoid generating more records than necessary.
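For note 1, a sketch of the month variant using the same seq table; only the INTERVAL unit and the bounds change (the 2012 dates are just an example):

```sql
-- one row per month: 2012-01-01, 2012-02-01, ..., 2012-12-01
SELECT '2012-01-01' + INTERVAL value - 1 MONTH AS aMonth
FROM seq
HAVING aMonth <= '2012-12-01';
```

To count rows per month you would then left join the domain table on the month of its date, for example on a range from aMonth up to aMonth + INTERVAL 1 MONTH.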
Actually answering your question
I think you should have already figured out how to do what you want. Anyway, here are the steps so that others can benefit from them too. Firstly, create the integer table. Secondly, run this query:
select allDays.aDay, count(mt.id) aCount from (
select (select date(min(created_at)) from my_table) + interval value - 1 day aDay
from seq s
having aDay <= (select date(max(created_at)) from my_table)
) allDays
left join my_table mt on allDays.aDay = date(mt.created_at)
group by allDays.aDay
I guess created_at is a datetime, and that's why you're concatenating that way. However, YYYY-MM-DD is already the format MySQL uses for DATE values, so I'm just grouping by the generated date and converting created_at to an actual DATE with date(). You can play with it using this fiddle.
And here is the solution generating data dynamically:
select allDays.aDay, count(mt.id) aCount from (
select @maxDate - interval a.a day aDay from
(select 0 as a union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9) a, /*10 day range*/
(select @minDate := (select date(min(created_at)) from my_table),
@maxDate := (select date(max(created_at)) from my_table)) e
where @maxDate - interval a.a day between @minDate and @maxDate
) allDays
left join my_table mt on allDays.aDay = date(mt.created_at)
group by allDays.aDay
As you can see, the skeleton of the query is the same as the previous one. The only thing that changes is how the derived table allDays is generated. It is also slightly different from the dynamic generator I showed before: in the example fiddle I only needed a 10-day range, so a single digit table is enough (and more readable than a 1000-day range), but it only covers the last 10 days up to @maxDate; add more digit tables for longer spans. Here is the fiddle for the dynamic solution so that you can play with it too.
Hope this helps!
Here is a way to do it in one query:
SELECT COUNT(my_table.id) AS total,
CONCAT(YEAR(dates.ddate), '-', MONTH(dates.ddate), '-', DAY(dates.ddate))
FROM (
-- Creates "on the fly" 65536 days beginning from 2000-01-01 (179 years)
SELECT DATE_ADD("2000-01-01", INTERVAL (b1.b + b2.b + b3.b + b4.b + b5.b + b6.b + b7.b + b8.b + b9.b + b10.b + b11.b + b12.b + b13.b + b14.b + b15.b + b16.b) DAY) AS ddate FROM
(SELECT 0 AS b UNION SELECT 1) b1,
(SELECT 0 AS b UNION SELECT 2) b2,
(SELECT 0 AS b UNION SELECT 4) b3,
(SELECT 0 AS b UNION SELECT 8) b4,
(SELECT 0 AS b UNION SELECT 16) b5,
(SELECT 0 AS b UNION SELECT 32) b6,
(SELECT 0 AS b UNION SELECT 64) b7,
(SELECT 0 AS b UNION SELECT 128) b8,
(SELECT 0 AS b UNION SELECT 256) b9,
(SELECT 0 AS b UNION SELECT 512) b10,
(SELECT 0 AS b UNION SELECT 1024) b11,
(SELECT 0 AS b UNION SELECT 2048) b12,
(SELECT 0 AS b UNION SELECT 4096) b13,
(SELECT 0 AS b UNION SELECT 8192) b14,
(SELECT 0 AS b UNION SELECT 16384) b15,
(SELECT 0 AS b UNION SELECT 32768) b16
) dates
LEFT JOIN my_table ON dates.ddate = my_table.created_at
GROUP BY dates.ddate
ORDER BY dates.ddate
The following code is only needed if you want to test the query and don't have the my_table table from the question:
create table `my_table` (
`id` int (11),
`created_at` date
);
insert into `my_table` (`id`, `created_at`) values('1','2000-01-01');
insert into `my_table` (`id`, `created_at`) values('2','2000-01-01');
insert into `my_table` (`id`, `created_at`) values('3','2000-01-01');
insert into `my_table` (`id`, `created_at`) values('4','2001-01-01');
insert into `my_table` (`id`, `created_at`) values('5','2100-06-06');
Testbed:
create table testbed (id integer, created_at date);
insert into testbed values
(1, '2012-04-01'),
(1, '2012-04-30'),
(2, '2012-04-02'),
(3, '2012-04-03'),
(3, '2012-04-04'),
(4, '2012-04-04');
I also use any_table, which I created artificially like this:
create table any_table (id integer);
insert into any_table values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
insert into any_table select * from any_table; -- repeat this insert 7-8 times
You can use any table in your database that is expected to have more rows than the number of days between min(created_at) and max(created_at); for example, at least 365 rows to cover a year.
Query:
SELECT concat(year(dr._date),'-',month(dr._date),'-',day(dr._date)),
-- or, instead of concat(), simply: dr._date
count(id)
FROM (
SELECT date_add(r.mindt, INTERVAL @dist day) _date,
@dist := @dist + 1 AS days_away
FROM any_table t
JOIN (SELECT min(created_at) mindt,
max(created_at) maxdt,
@dist := 0
FROM testbed) r
WHERE date_add(r.mindt, INTERVAL @dist day) <= r.maxdt) dr
LEFT JOIN testbed tb ON dr._date = tb.created_at
GROUP BY dr._date;