I'm building a quick csv from a mysql table with a query like:
select DATE(date),count(date) from table group by DATE(date) order by date asc;
and just dumping them to a file in perl over a:
while(my($date,$sum) = $sth->fetchrow) {
print CSV "$date,$sum\n"
}
There are date gaps in the data, though:
| 2008-08-05 | 4 |
| 2008-08-07 | 23 |
I would like to pad the data to fill in the missing days with zero-count entries to end up with:
| 2008-08-05 | 4 |
| 2008-08-06 | 0 |
| 2008-08-07 | 23 |
I slapped together a really awkward (and almost certainly buggy) workaround with an array of days-per-month and some math, but there has to be something more straightforward either on the mysql or perl side.
Any genius ideas/slaps in the face for why me am being so dumb?
I ended up going with a stored procedure which generated a temp table for the date range in question for a couple of reasons:
I know the date range I'll be looking for every time
The server in question unfortunately was not one that I can install perl modules on atm, and the state of it was decrepit enough that it didn't have anything remotely Date::-y installed
The perl Date/DateTime-iterating answers were also very good, I wish I could select multiple answers!
When you need something like that on server side, you usually create a table which contains all possible dates between two points in time, and then left join this table with query results. Something like this:
create procedure sp1(d1 date, d2 date)
declare d datetime;
create temporary table foo (d date not null);
set d = d1
while d <= d2 do
insert into foo (d) values (d)
set d = date_add(d, interval 1 day)
end while
select foo.d, count(date)
from foo left join table on foo.d = table.date
group by foo.d order by foo.d asc;
drop temporary table foo;
end procedure
In this particular case it would be better to put a little check on the client side, if current date is not previos+1, put some addition strings.
When I had to deal with this problem, to fill in missing dates I actually created a reference table that just contained all dates I'm interested in and joined the data table on the date field. It's crude, but it works.
SELECT DATE(r.date),count(d.date)
FROM dates AS r
LEFT JOIN table AS d ON d.date = r.date
GROUP BY DATE(r.date)
ORDER BY r.date ASC;
As for output, I'd just use SELECT INTO OUTFILE instead of generating the CSV by hand. Leaves us free from worrying about escaping special characters as well.
not dumb, this isn't something that MySQL does, inserting the empty date values. I do this in perl with a two-step process. First, load all of the data from the query into a hash organised by date. Then, I create a Date::EzDate object and increment it by day, so...
my $current_date = Date::EzDate->new();
$current_date->{'default'} = '{YEAR}-{MONTH NUMBER BASE 1}-{DAY OF MONTH}';
while ($current_date <= $final_date)
{
print "$current_date\t|\t%hash_o_data{$current_date}"; # EzDate provides for automatic stringification in the format specfied in 'default'
$current_date++;
}
where final date is another EzDate object or a string containing the end of your date range.
EzDate isn't on CPAN right now, but you can probably find another perl mod that will do date compares and provide a date incrementor.
You could use a DateTime object:
use DateTime;
my $dt;
while ( my ($date, $sum) = $sth->fetchrow ) {
if (defined $dt) {
print CSV $dt->ymd . ",0\n" while $dt->add(days => 1)->ymd lt $date;
}
else {
my ($y, $m, $d) = split /-/, $date;
$dt = DateTime->new(year => $y, month => $m, day => $d);
}
print CSV, "$date,$sum\n";
}
What the above code does is it keeps the last printed date stored in a
DateTime object $dt, and when the current date is more than one day
in the future, it increments $dt by one day (and prints it a line to
CSV) until it is the same as the current date.
This way you don't need extra tables, and don't need to fetch all your
rows in advance.
I hope you will figure out the rest.
select * from (
select date_add('2003-01-01 00:00:00.000', INTERVAL n5.num*10000+n4.num*1000+n3.num*100+n2.num*10+n1.num DAY ) as date from
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n1,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n2,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n3,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n4,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n5
) a
where date >'2011-01-02 00:00:00.000' and date < NOW()
order by date
With
select n3.num*100+n2.num*10+n1.num as date
you will get a column with numbers from 0 to max(n3)*100+max(n2)*10+max(n1)
Since here we have max n3 as 3, SELECT will return 399, plus 0 -> 400 records (dates in calendar).
You can tune your dynamic calendar by limiting it, for example, from min(date) you have to now().
Since you don't know where the gaps are, and yet you want all the values (presumably) from the first date in your list to the last one, do something like:
use DateTime;
use DateTime::Format::Strptime;
my #row = $sth->fetchrow;
my $countdate = strptime("%Y-%m-%d", $firstrow[0]);
my $thisdate = strptime("%Y-%m-%d", $firstrow[0]);
while ($countdate) {
# keep looping countdate until it hits the next db row date
if(DateTime->compare($countdate, $thisdate) == -1) {
# counter not reached next date yet
print CSV $countdate->ymd . ",0\n";
$countdate = $countdate->add( days => 1 );
$next;
}
# countdate is equal to next row's date, so print that instead
print CSV $thisdate->ymd . ",$row[1]\n";
# increase both
#row = $sth->fetchrow;
$thisdate = strptime("%Y-%m-%d", $firstrow[0]);
$countdate = $countdate->add( days => 1 );
}
Hmm, that turned out to be more complicated than I thought it would be.. I hope it makes sense!
I think the simplest general solution to the problem would be to create an Ordinal table with the highest number of rows that you need (in your case 31*3 = 93).
CREATE TABLE IF NOT EXISTS `Ordinal` (
`n` int(10) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (`n`)
);
INSERT INTO `Ordinal` (`n`)
VALUES (NULL), (NULL), (NULL); #etc
Next, do a LEFT JOIN from Ordinal onto your data. Here's a simple case, getting every day in the last week:
SELECT CURDATE() - INTERVAL `n` DAY AS `day`
FROM `Ordinal` WHERE `n` <= 7
ORDER BY `n` ASC
The two things you would need to change about this are the starting point and the interval. I have used SET #var = 'value' syntax for clarity.
SET #end = CURDATE() - INTERVAL DAY(CURDATE()) DAY;
SET #begin = #end - INTERVAL 3 MONTH;
SET #period = DATEDIFF(#end, #begin);
SELECT #begin + INTERVAL (`n` + 1) DAY AS `date`
FROM `Ordinal` WHERE `n` < #period
ORDER BY `n` ASC;
So the final code would look something like this, if you were joining to get the number of messages per day over the last three months:
SELECT COUNT(`msg`.`id`) AS `message_count`, `ord`.`date` FROM (
SELECT ((CURDATE() - INTERVAL DAY(CURDATE()) DAY) - INTERVAL 3 MONTH) + INTERVAL (`n` + 1) DAY AS `date`
FROM `Ordinal`
WHERE `n` < (DATEDIFF((CURDATE() - INTERVAL DAY(CURDATE()) DAY), ((CURDATE() - INTERVAL DAY(CURDATE()) DAY) - INTERVAL 3 MONTH)))
ORDER BY `n` ASC
) AS `ord`
LEFT JOIN `Message` AS `msg`
ON `ord`.`date` = `msg`.`date`
GROUP BY `ord`.`date`
Tips and Comments:
Probably the hardest part of your query was determining the number of days to use when limiting Ordinal. By comparison, transforming that integer sequence into dates was easy.
You can use Ordinal for all of your uninterrupted-sequence needs. Just make sure it contains more rows than your longest sequence.
You can use multiple queries on Ordinal for multiple sequences, for example listing every weekday (1-5) for the past seven (1-7) weeks.
You could make it faster by storing dates in your Ordinal table, but it would be less flexible. This way you only need one Ordinal table, no matter how many times you use it. Still, if the speed is worth it, try the INSERT INTO ... SELECT syntax.
Use some Perl module to do date calculations, like recommended DateTime or Time::Piece (core from 5.10). Just increment date and print date and 0 until date will match current.
I don't know if this would work, but how about if you created a new table which contained all the possible dates (that might be the problem with this idea, if the range of dates is going to change unpredictably...) and then do a left join on the two tables? I guess it's a crazy solution if there are a vast number of possible dates, or no way to predict the first and last date, but if the range of dates is either fixed or easy to work out, then this might work.
Related
I'm building a quick csv from a mysql table with a query like:
select DATE(date),count(date) from table group by DATE(date) order by date asc;
and just dumping them to a file in perl over a:
while(my($date,$sum) = $sth->fetchrow) {
print CSV "$date,$sum\n"
}
There are date gaps in the data, though:
| 2008-08-05 | 4 |
| 2008-08-07 | 23 |
I would like to pad the data to fill in the missing days with zero-count entries to end up with:
| 2008-08-05 | 4 |
| 2008-08-06 | 0 |
| 2008-08-07 | 23 |
I slapped together a really awkward (and almost certainly buggy) workaround with an array of days-per-month and some math, but there has to be something more straightforward either on the mysql or perl side.
Any genius ideas/slaps in the face for why me am being so dumb?
I ended up going with a stored procedure which generated a temp table for the date range in question for a couple of reasons:
I know the date range I'll be looking for every time
The server in question unfortunately was not one that I can install perl modules on atm, and the state of it was decrepit enough that it didn't have anything remotely Date::-y installed
The perl Date/DateTime-iterating answers were also very good, I wish I could select multiple answers!
When you need something like that on server side, you usually create a table which contains all possible dates between two points in time, and then left join this table with query results. Something like this:
create procedure sp1(d1 date, d2 date)
declare d datetime;
create temporary table foo (d date not null);
set d = d1
while d <= d2 do
insert into foo (d) values (d)
set d = date_add(d, interval 1 day)
end while
select foo.d, count(date)
from foo left join table on foo.d = table.date
group by foo.d order by foo.d asc;
drop temporary table foo;
end procedure
In this particular case it would be better to put a little check on the client side, if current date is not previos+1, put some addition strings.
When I had to deal with this problem, to fill in missing dates I actually created a reference table that just contained all dates I'm interested in and joined the data table on the date field. It's crude, but it works.
SELECT DATE(r.date),count(d.date)
FROM dates AS r
LEFT JOIN table AS d ON d.date = r.date
GROUP BY DATE(r.date)
ORDER BY r.date ASC;
As for output, I'd just use SELECT INTO OUTFILE instead of generating the CSV by hand. Leaves us free from worrying about escaping special characters as well.
not dumb, this isn't something that MySQL does, inserting the empty date values. I do this in perl with a two-step process. First, load all of the data from the query into a hash organised by date. Then, I create a Date::EzDate object and increment it by day, so...
my $current_date = Date::EzDate->new();
$current_date->{'default'} = '{YEAR}-{MONTH NUMBER BASE 1}-{DAY OF MONTH}';
while ($current_date <= $final_date)
{
print "$current_date\t|\t%hash_o_data{$current_date}"; # EzDate provides for automatic stringification in the format specfied in 'default'
$current_date++;
}
where final date is another EzDate object or a string containing the end of your date range.
EzDate isn't on CPAN right now, but you can probably find another perl mod that will do date compares and provide a date incrementor.
You could use a DateTime object:
use DateTime;
my $dt;
while ( my ($date, $sum) = $sth->fetchrow ) {
if (defined $dt) {
print CSV $dt->ymd . ",0\n" while $dt->add(days => 1)->ymd lt $date;
}
else {
my ($y, $m, $d) = split /-/, $date;
$dt = DateTime->new(year => $y, month => $m, day => $d);
}
print CSV, "$date,$sum\n";
}
What the above code does is it keeps the last printed date stored in a
DateTime object $dt, and when the current date is more than one day
in the future, it increments $dt by one day (and prints it a line to
CSV) until it is the same as the current date.
This way you don't need extra tables, and don't need to fetch all your
rows in advance.
I hope you will figure out the rest.
select * from (
select date_add('2003-01-01 00:00:00.000', INTERVAL n5.num*10000+n4.num*1000+n3.num*100+n2.num*10+n1.num DAY ) as date from
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n1,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n2,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n3,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n4,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n5
) a
where date >'2011-01-02 00:00:00.000' and date < NOW()
order by date
With
select n3.num*100+n2.num*10+n1.num as date
you will get a column with numbers from 0 to max(n3)*100+max(n2)*10+max(n1)
Since here we have max n3 as 3, SELECT will return 399, plus 0 -> 400 records (dates in calendar).
You can tune your dynamic calendar by limiting it, for example, from min(date) you have to now().
Since you don't know where the gaps are, and yet you want all the values (presumably) from the first date in your list to the last one, do something like:
use DateTime;
use DateTime::Format::Strptime;
my #row = $sth->fetchrow;
my $countdate = strptime("%Y-%m-%d", $firstrow[0]);
my $thisdate = strptime("%Y-%m-%d", $firstrow[0]);
while ($countdate) {
# keep looping countdate until it hits the next db row date
if(DateTime->compare($countdate, $thisdate) == -1) {
# counter not reached next date yet
print CSV $countdate->ymd . ",0\n";
$countdate = $countdate->add( days => 1 );
$next;
}
# countdate is equal to next row's date, so print that instead
print CSV $thisdate->ymd . ",$row[1]\n";
# increase both
#row = $sth->fetchrow;
$thisdate = strptime("%Y-%m-%d", $firstrow[0]);
$countdate = $countdate->add( days => 1 );
}
Hmm, that turned out to be more complicated than I thought it would be.. I hope it makes sense!
I think the simplest general solution to the problem would be to create an Ordinal table with the highest number of rows that you need (in your case 31*3 = 93).
CREATE TABLE IF NOT EXISTS `Ordinal` (
`n` int(10) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (`n`)
);
INSERT INTO `Ordinal` (`n`)
VALUES (NULL), (NULL), (NULL); #etc
Next, do a LEFT JOIN from Ordinal onto your data. Here's a simple case, getting every day in the last week:
SELECT CURDATE() - INTERVAL `n` DAY AS `day`
FROM `Ordinal` WHERE `n` <= 7
ORDER BY `n` ASC
The two things you would need to change about this are the starting point and the interval. I have used SET #var = 'value' syntax for clarity.
SET #end = CURDATE() - INTERVAL DAY(CURDATE()) DAY;
SET #begin = #end - INTERVAL 3 MONTH;
SET #period = DATEDIFF(#end, #begin);
SELECT #begin + INTERVAL (`n` + 1) DAY AS `date`
FROM `Ordinal` WHERE `n` < #period
ORDER BY `n` ASC;
So the final code would look something like this, if you were joining to get the number of messages per day over the last three months:
SELECT COUNT(`msg`.`id`) AS `message_count`, `ord`.`date` FROM (
SELECT ((CURDATE() - INTERVAL DAY(CURDATE()) DAY) - INTERVAL 3 MONTH) + INTERVAL (`n` + 1) DAY AS `date`
FROM `Ordinal`
WHERE `n` < (DATEDIFF((CURDATE() - INTERVAL DAY(CURDATE()) DAY), ((CURDATE() - INTERVAL DAY(CURDATE()) DAY) - INTERVAL 3 MONTH)))
ORDER BY `n` ASC
) AS `ord`
LEFT JOIN `Message` AS `msg`
ON `ord`.`date` = `msg`.`date`
GROUP BY `ord`.`date`
Tips and Comments:
Probably the hardest part of your query was determining the number of days to use when limiting Ordinal. By comparison, transforming that integer sequence into dates was easy.
You can use Ordinal for all of your uninterrupted-sequence needs. Just make sure it contains more rows than your longest sequence.
You can use multiple queries on Ordinal for multiple sequences, for example listing every weekday (1-5) for the past seven (1-7) weeks.
You could make it faster by storing dates in your Ordinal table, but it would be less flexible. This way you only need one Ordinal table, no matter how many times you use it. Still, if the speed is worth it, try the INSERT INTO ... SELECT syntax.
Use some Perl module to do date calculations, like recommended DateTime or Time::Piece (core from 5.10). Just increment date and print date and 0 until date will match current.
I don't know if this would work, but how about if you created a new table which contained all the possible dates (that might be the problem with this idea, if the range of dates is going to change unpredictably...) and then do a left join on the two tables? I guess it's a crazy solution if there are a vast number of possible dates, or no way to predict the first and last date, but if the range of dates is either fixed or easy to work out, then this might work.
I'm trying to write a query that aggregates data from a table.
Essentially I have a long list of devices that have been inventoried and eventually installed over the last couple of years.
I want to find the average amount of time between when the device was received and when it was installed, and then have that data sorted by the month the device was installed. BUT in each month's row, I also want to include the data from the previous months.
So essentially what I want to see is: (sorry for terrible formatting)
MonthInstalled | TimeToInstall | Total#Devices
-----------------+---------------+----------------------------
Jan | 10 Days | 5
Feb(=Jan+Feb) | 15 Days | 18 (5 in Jan + 13 in Feb)
Mar(=Jan+Feb+Mar)| 13 Days | 25 (5 + 13 + 7)
...
The query I currently have written looks like this:
INSERT INTO DevicesInstall
SELECT ROUND(AVG(DATEDIFF(dvc.dt_install , dvc.dt_receive)), 1) AS 'Install',
COUNT(dvc.dvc_model) AS 'Total Devices',
MAX(dvc.dt_install) AS 'Date',
loc.loc_campus AS 'Campus'
FROM dvc_info dvc, location loc
WHERE dvc.dvc_loc_bin = loc.loc_bin
AND dvc.dt_install < '20160201'
;
Although this is functional, I have to iterate this for each month manually, so it is not scale-able. Is there a way to condense this at all?
We can return the dates using an inline view (derived table), and then join to the dvc_info table, so we can get the "cumulative" results.
To get the results for:
Jan
Jan+Feb
Jan+Feb+Mar
We need to return three copies of the rows for Jan, and two copies of the rows for Feb, and then collapse the those rows into an appropriate group.
The loc_campus is being included in the SELECT list... not clear why that is needed. If we want results "by campus", then we need to include that expression in the GROUP BY clause. Otherwise, the value returned for that non-aggregate is indeterminate... we will get a value for some row "in the group", but it could be any row.
Something like this:
SELECT d.dt AS `before_date`
, loc.loc_campus AS `Campus`
, ROUND(AVG(DATEDIFF(dvc.dt_install,dvc.dt_receive)),1) AS `Install`
, COUNT(dvc.dvc_model) AS `Total Devices`
, MAX(dvc.dt_install) AS `latest_dt_install`
FROM ( SELECT '2016-01-01' + INTERVAL 1 MONTH AS dt
UNION ALL SELECT '2016-01-01' + INTERVAL 2 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 3 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 4 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 5 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 6 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 7 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 8 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 9 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 10 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 11 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 12 MONTH
) d
CROSS
JOIN location loc
LEFT
JOIN dvc_info dvc
ON dvc.dvc_loc_bin = loc.loc_bin
AND dvc.dt_install < d.dt
GROUP
BY d.dt
, loc.loc_campus
ORDER
BY d.dt
, loc.loc_campus
Note that the value returned for d.dt will be the "up until" date. We're going to get '2016-02-01' returned for the January results. If we want to return a value of January date, we can use an expression in the SELECT list...
SELECT DATE_FORMAT(d.dt + INTERVAL -1 MONTH,'%Y-%m') AS `month`
Lots of options on query alternatives.
But it looks like the "big hump" is that to get cumulative results, we need to return multiple copies of the dvc_info rows, so the rows can be collapsed into each "grouping".
I recommend working on just the SELECT first. And get that tested working, before monkeying around to turn it into an INSERT ... SELECT.
FOLLOWUP
We can use any query as an inline view (derived table d) that returns a set of dates we want.
e.g.
FROM ( SELECT DATE_FORMAT(m.install_dt,'%Y-%m-01') + INTERVAL 1 MONTH AS dt
FROM dvc_install m
WHERE m.install_dt >= '2016-01-01'
GROUP BY DATE_FORMAT(m.install_dt,'%Y-%m-01') + INTERVAL 1 MONTH
) d
Note that with this approach, if there are no install_dt in February, we won't get back a row for February. Using the static UNION ALL SELECT approach allows us to get back "zero" counts, i.e. to return rows for months where there isn't an install_dt in that month. (But that's the answer to a different question... how do I get back a "zero" count for February when there aren't any rows for Februrary?)
Alternatively, if we have a calendar table e.g. cal that contains a list of the dates we want, we could just reference the table in place of the inline view, or the inline view query could get rows from that.
FROM ( SELECT cal.dt
FROM cal cal
WHERE cal.dt >= '2016-01-01'
AND cal.dt <= NOW()
AND DATE_FORMAT(cal.dt,'%d') = '01'
) d
I'm building a quick csv from a mysql table with a query like:
select DATE(date),count(date) from table group by DATE(date) order by date asc;
and just dumping them to a file in perl over a:
while(my($date,$sum) = $sth->fetchrow) {
print CSV "$date,$sum\n"
}
There are date gaps in the data, though:
| 2008-08-05 | 4 |
| 2008-08-07 | 23 |
I would like to pad the data to fill in the missing days with zero-count entries to end up with:
| 2008-08-05 | 4 |
| 2008-08-06 | 0 |
| 2008-08-07 | 23 |
I slapped together a really awkward (and almost certainly buggy) workaround with an array of days-per-month and some math, but there has to be something more straightforward either on the mysql or perl side.
Any genius ideas/slaps in the face for why me am being so dumb?
I ended up going with a stored procedure which generated a temp table for the date range in question for a couple of reasons:
I know the date range I'll be looking for every time
The server in question unfortunately was not one that I can install perl modules on atm, and the state of it was decrepit enough that it didn't have anything remotely Date::-y installed
The perl Date/DateTime-iterating answers were also very good, I wish I could select multiple answers!
When you need something like that on server side, you usually create a table which contains all possible dates between two points in time, and then left join this table with query results. Something like this:
create procedure sp1(d1 date, d2 date)
declare d datetime;
create temporary table foo (d date not null);
set d = d1
while d <= d2 do
insert into foo (d) values (d)
set d = date_add(d, interval 1 day)
end while
select foo.d, count(date)
from foo left join table on foo.d = table.date
group by foo.d order by foo.d asc;
drop temporary table foo;
end procedure
In this particular case it would be better to put a little check on the client side, if current date is not previos+1, put some addition strings.
When I had to deal with this problem, to fill in missing dates I actually created a reference table that just contained all dates I'm interested in and joined the data table on the date field. It's crude, but it works.
SELECT DATE(r.date),count(d.date)
FROM dates AS r
LEFT JOIN table AS d ON d.date = r.date
GROUP BY DATE(r.date)
ORDER BY r.date ASC;
As for output, I'd just use SELECT INTO OUTFILE instead of generating the CSV by hand. Leaves us free from worrying about escaping special characters as well.
not dumb, this isn't something that MySQL does, inserting the empty date values. I do this in perl with a two-step process. First, load all of the data from the query into a hash organised by date. Then, I create a Date::EzDate object and increment it by day, so...
my $current_date = Date::EzDate->new();
$current_date->{'default'} = '{YEAR}-{MONTH NUMBER BASE 1}-{DAY OF MONTH}';
while ($current_date <= $final_date)
{
print "$current_date\t|\t%hash_o_data{$current_date}"; # EzDate provides for automatic stringification in the format specfied in 'default'
$current_date++;
}
where final date is another EzDate object or a string containing the end of your date range.
EzDate isn't on CPAN right now, but you can probably find another perl mod that will do date compares and provide a date incrementor.
You could use a DateTime object:
use DateTime;
my $dt;
while ( my ($date, $sum) = $sth->fetchrow ) {
if (defined $dt) {
print CSV $dt->ymd . ",0\n" while $dt->add(days => 1)->ymd lt $date;
}
else {
my ($y, $m, $d) = split /-/, $date;
$dt = DateTime->new(year => $y, month => $m, day => $d);
}
print CSV, "$date,$sum\n";
}
What the above code does is it keeps the last printed date stored in a
DateTime object $dt, and when the current date is more than one day
in the future, it increments $dt by one day (and prints it a line to
CSV) until it is the same as the current date.
This way you don't need extra tables, and don't need to fetch all your
rows in advance.
I hope you will figure out the rest.
select * from (
select date_add('2003-01-01 00:00:00.000', INTERVAL n5.num*10000+n4.num*1000+n3.num*100+n2.num*10+n1.num DAY ) as date from
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n1,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n2,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n3,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n4,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n5
) a
where date >'2011-01-02 00:00:00.000' and date < NOW()
order by date
With
select n3.num*100+n2.num*10+n1.num as date
you will get a column with numbers from 0 to max(n3)*100+max(n2)*10+max(n1)
Since here we have max n3 as 3, SELECT will return 399, plus 0 -> 400 records (dates in calendar).
You can tune your dynamic calendar by limiting it, for example, from min(date) you have to now().
Since you don't know where the gaps are, and yet you want all the values (presumably) from the first date in your list to the last one, do something like:
use DateTime;
use DateTime::Format::Strptime;
my #row = $sth->fetchrow;
my $countdate = strptime("%Y-%m-%d", $firstrow[0]);
my $thisdate = strptime("%Y-%m-%d", $firstrow[0]);
while ($countdate) {
# keep looping countdate until it hits the next db row date
if(DateTime->compare($countdate, $thisdate) == -1) {
# counter not reached next date yet
print CSV $countdate->ymd . ",0\n";
$countdate = $countdate->add( days => 1 );
$next;
}
# countdate is equal to next row's date, so print that instead
print CSV $thisdate->ymd . ",$row[1]\n";
# increase both
#row = $sth->fetchrow;
$thisdate = strptime("%Y-%m-%d", $firstrow[0]);
$countdate = $countdate->add( days => 1 );
}
Hmm, that turned out to be more complicated than I thought it would be.. I hope it makes sense!
I think the simplest general solution to the problem would be to create an Ordinal table with the highest number of rows that you need (in your case 31*3 = 93).
CREATE TABLE IF NOT EXISTS `Ordinal` (
`n` int(10) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (`n`)
);
INSERT INTO `Ordinal` (`n`)
VALUES (NULL), (NULL), (NULL); #etc
Next, do a LEFT JOIN from Ordinal onto your data. Here's a simple case, getting every day in the last week:
SELECT CURDATE() - INTERVAL `n` DAY AS `day`
FROM `Ordinal` WHERE `n` <= 7
ORDER BY `n` ASC
The two things you would need to change about this are the starting point and the interval. I have used SET #var = 'value' syntax for clarity.
SET #end = CURDATE() - INTERVAL DAY(CURDATE()) DAY;
SET #begin = #end - INTERVAL 3 MONTH;
SET #period = DATEDIFF(#end, #begin);
SELECT #begin + INTERVAL (`n` + 1) DAY AS `date`
FROM `Ordinal` WHERE `n` < #period
ORDER BY `n` ASC;
So the final code would look something like this, if you were joining to get the number of messages per day over the last three months:
SELECT COUNT(`msg`.`id`) AS `message_count`, `ord`.`date` FROM (
SELECT ((CURDATE() - INTERVAL DAY(CURDATE()) DAY) - INTERVAL 3 MONTH) + INTERVAL (`n` + 1) DAY AS `date`
FROM `Ordinal`
WHERE `n` < (DATEDIFF((CURDATE() - INTERVAL DAY(CURDATE()) DAY), ((CURDATE() - INTERVAL DAY(CURDATE()) DAY) - INTERVAL 3 MONTH)))
ORDER BY `n` ASC
) AS `ord`
LEFT JOIN `Message` AS `msg`
ON `ord`.`date` = `msg`.`date`
GROUP BY `ord`.`date`
Tips and Comments:
Probably the hardest part of your query was determining the number of days to use when limiting Ordinal. By comparison, transforming that integer sequence into dates was easy.
You can use Ordinal for all of your uninterrupted-sequence needs. Just make sure it contains more rows than your longest sequence.
You can use multiple queries on Ordinal for multiple sequences, for example listing every weekday (1-5) for the past seven (1-7) weeks.
You could make it faster by storing dates in your Ordinal table, but it would be less flexible. This way you only need one Ordinal table, no matter how many times you use it. Still, if the speed is worth it, try the INSERT INTO ... SELECT syntax.
Use some Perl module to do date calculations, like recommended DateTime or Time::Piece (core from 5.10). Just increment date and print date and 0 until date will match current.
I don't know if this would work, but how about if you created a new table which contained all the possible dates (that might be the problem with this idea, if the range of dates is going to change unpredictably...) and then do a left join on the two tables? I guess it's a crazy solution if there are a vast number of possible dates, or no way to predict the first and last date, but if the range of dates is either fixed or easy to work out, then this might work.
I'm building a quick csv from a mysql table with a query like:
select DATE(date),count(date) from table group by DATE(date) order by date asc;
and just dumping them to a file in perl over a:
while(my($date,$sum) = $sth->fetchrow) {
print CSV "$date,$sum\n"
}
There are date gaps in the data, though:
| 2008-08-05 | 4 |
| 2008-08-07 | 23 |
I would like to pad the data to fill in the missing days with zero-count entries to end up with:
| 2008-08-05 | 4 |
| 2008-08-06 | 0 |
| 2008-08-07 | 23 |
I slapped together a really awkward (and almost certainly buggy) workaround with an array of days-per-month and some math, but there has to be something more straightforward either on the mysql or perl side.
Any genius ideas/slaps in the face for why me am being so dumb?
I ended up going with a stored procedure which generated a temp table for the date range in question for a couple of reasons:
I know the date range I'll be looking for every time
The server in question unfortunately was not one that I can install perl modules on atm, and the state of it was decrepit enough that it didn't have anything remotely Date::-y installed
The perl Date/DateTime-iterating answers were also very good, I wish I could select multiple answers!
When you need something like that on server side, you usually create a table which contains all possible dates between two points in time, and then left join this table with query results. Something like this:
create procedure sp1(d1 date, d2 date)
declare d datetime;
create temporary table foo (d date not null);
set d = d1
while d <= d2 do
insert into foo (d) values (d)
set d = date_add(d, interval 1 day)
end while
select foo.d, count(date)
from foo left join table on foo.d = table.date
group by foo.d order by foo.d asc;
drop temporary table foo;
end procedure
In this particular case it would be better to put a little check on the client side, if current date is not previos+1, put some addition strings.
When I had to deal with this problem, to fill in missing dates I actually created a reference table that just contained all dates I'm interested in and joined the data table on the date field. It's crude, but it works.
SELECT DATE(r.date),count(d.date)
FROM dates AS r
LEFT JOIN table AS d ON d.date = r.date
GROUP BY DATE(r.date)
ORDER BY r.date ASC;
As for output, I'd just use SELECT INTO OUTFILE instead of generating the CSV by hand. Leaves us free from worrying about escaping special characters as well.
not dumb, this isn't something that MySQL does, inserting the empty date values. I do this in perl with a two-step process. First, load all of the data from the query into a hash organised by date. Then, I create a Date::EzDate object and increment it by day, so...
my $current_date = Date::EzDate->new();
$current_date->{'default'} = '{YEAR}-{MONTH NUMBER BASE 1}-{DAY OF MONTH}';
while ($current_date <= $final_date)
{
print "$current_date\t|\t%hash_o_data{$current_date}"; # EzDate provides for automatic stringification in the format specfied in 'default'
$current_date++;
}
where final date is another EzDate object or a string containing the end of your date range.
EzDate isn't on CPAN right now, but you can probably find another perl mod that will do date compares and provide a date incrementor.
You could use a DateTime object:
use DateTime;
my $dt;
while ( my ($date, $sum) = $sth->fetchrow ) {
if (defined $dt) {
print CSV $dt->ymd . ",0\n" while $dt->add(days => 1)->ymd lt $date;
}
else {
my ($y, $m, $d) = split /-/, $date;
$dt = DateTime->new(year => $y, month => $m, day => $d);
}
print CSV, "$date,$sum\n";
}
What the above code does is it keeps the last printed date stored in a
DateTime object $dt, and when the current date is more than one day
in the future, it increments $dt by one day (and prints it a line to
CSV) until it is the same as the current date.
This way you don't need extra tables, and don't need to fetch all your
rows in advance.
I hope you will figure out the rest.
select * from (
select date_add('2003-01-01 00:00:00.000', INTERVAL n5.num*10000+n4.num*1000+n3.num*100+n2.num*10+n1.num DAY ) as date from
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n1,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n2,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n3,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n4,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n5
) a
where date >'2011-01-02 00:00:00.000' and date < NOW()
order by date
With
select n3.num*100+n2.num*10+n1.num as date
you will get a column with numbers from 0 to max(n3)*100+max(n2)*10+max(n1)
Since here we have max n3 as 3, SELECT will return 399, plus 0 -> 400 records (dates in calendar).
You can tune your dynamic calendar by limiting it, for example, from min(date) you have to now().
Since you don't know where the gaps are, and yet you want all the values (presumably) from the first date in your list to the last one, do something like:
use DateTime;
use DateTime::Format::Strptime;
my #row = $sth->fetchrow;
my $countdate = strptime("%Y-%m-%d", $firstrow[0]);
my $thisdate = strptime("%Y-%m-%d", $firstrow[0]);
while ($countdate) {
# keep looping countdate until it hits the next db row date
if(DateTime->compare($countdate, $thisdate) == -1) {
# counter not reached next date yet
print CSV $countdate->ymd . ",0\n";
$countdate = $countdate->add( days => 1 );
$next;
}
# countdate is equal to next row's date, so print that instead
print CSV $thisdate->ymd . ",$row[1]\n";
# increase both
#row = $sth->fetchrow;
$thisdate = strptime("%Y-%m-%d", $firstrow[0]);
$countdate = $countdate->add( days => 1 );
}
Hmm, that turned out to be more complicated than I thought it would be.. I hope it makes sense!
I think the simplest general solution to the problem would be to create an Ordinal table with the highest number of rows that you need (in your case 31*3 = 93).
CREATE TABLE IF NOT EXISTS `Ordinal` (
`n` int(10) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (`n`)
);
INSERT INTO `Ordinal` (`n`)
VALUES (NULL), (NULL), (NULL); #etc
Next, do a LEFT JOIN from Ordinal onto your data. Here's a simple case, getting every day in the last week:
SELECT CURDATE() - INTERVAL `n` DAY AS `day`
FROM `Ordinal` WHERE `n` <= 7
ORDER BY `n` ASC
The two things you would need to change about this are the starting point and the interval. I have used SET #var = 'value' syntax for clarity.
SET #end = CURDATE() - INTERVAL DAY(CURDATE()) DAY;
SET #begin = #end - INTERVAL 3 MONTH;
SET #period = DATEDIFF(#end, #begin);
SELECT #begin + INTERVAL (`n` + 1) DAY AS `date`
FROM `Ordinal` WHERE `n` < #period
ORDER BY `n` ASC;
So the final code would look something like this, if you were joining to get the number of messages per day over the last three months:
SELECT COUNT(`msg`.`id`) AS `message_count`, `ord`.`date` FROM (
SELECT ((CURDATE() - INTERVAL DAY(CURDATE()) DAY) - INTERVAL 3 MONTH) + INTERVAL (`n` + 1) DAY AS `date`
FROM `Ordinal`
WHERE `n` < (DATEDIFF((CURDATE() - INTERVAL DAY(CURDATE()) DAY), ((CURDATE() - INTERVAL DAY(CURDATE()) DAY) - INTERVAL 3 MONTH)))
ORDER BY `n` ASC
) AS `ord`
LEFT JOIN `Message` AS `msg`
ON `ord`.`date` = `msg`.`date`
GROUP BY `ord`.`date`
Tips and Comments:
Probably the hardest part of your query was determining the number of days to use when limiting Ordinal. By comparison, transforming that integer sequence into dates was easy.
You can use Ordinal for all of your uninterrupted-sequence needs. Just make sure it contains more rows than your longest sequence.
You can use multiple queries on Ordinal for multiple sequences, for example listing every weekday (1-5) for the past seven (1-7) weeks.
You could make it faster by storing dates in your Ordinal table, but it would be less flexible. This way you only need one Ordinal table, no matter how many times you use it. Still, if the speed is worth it, try the INSERT INTO ... SELECT syntax.
Use some Perl module to do date calculations, like recommended DateTime or Time::Piece (core from 5.10). Just increment date and print date and 0 until date will match current.
I don't know if this would work, but how about if you created a new table which contained all the possible dates (that might be the problem with this idea, if the range of dates is going to change unpredictably...) and then do a left join on the two tables? I guess it's a crazy solution if there are a vast number of possible dates, or no way to predict the first and last date, but if the range of dates is either fixed or easy to work out, then this might work.
I'm building a quick csv from a mysql table with a query like:
select DATE(date),count(date) from table group by DATE(date) order by date asc;
and just dumping them to a file in perl over a:
while(my($date,$sum) = $sth->fetchrow) {
print CSV "$date,$sum\n"
}
There are date gaps in the data, though:
| 2008-08-05 | 4 |
| 2008-08-07 | 23 |
I would like to pad the data to fill in the missing days with zero-count entries to end up with:
| 2008-08-05 | 4 |
| 2008-08-06 | 0 |
| 2008-08-07 | 23 |
I slapped together a really awkward (and almost certainly buggy) workaround with an array of days-per-month and some math, but there has to be something more straightforward either on the mysql or perl side.
Any genius ideas/slaps in the face for why me am being so dumb?
I ended up going with a stored procedure which generated a temp table for the date range in question for a couple of reasons:
I know the date range I'll be looking for every time
The server in question unfortunately was not one that I can install perl modules on atm, and the state of it was decrepit enough that it didn't have anything remotely Date::-y installed
The perl Date/DateTime-iterating answers were also very good, I wish I could select multiple answers!
When you need something like that on server side, you usually create a table which contains all possible dates between two points in time, and then left join this table with query results. Something like this:
create procedure sp1(d1 date, d2 date)
declare d datetime;
create temporary table foo (d date not null);
set d = d1
while d <= d2 do
insert into foo (d) values (d)
set d = date_add(d, interval 1 day)
end while
select foo.d, count(date)
from foo left join table on foo.d = table.date
group by foo.d order by foo.d asc;
drop temporary table foo;
end procedure
In this particular case it would be better to put a little check on the client side, if current date is not previos+1, put some addition strings.
When I had to deal with this problem, to fill in missing dates I actually created a reference table that just contained all dates I'm interested in and joined the data table on the date field. It's crude, but it works.
SELECT DATE(r.date),count(d.date)
FROM dates AS r
LEFT JOIN table AS d ON d.date = r.date
GROUP BY DATE(r.date)
ORDER BY r.date ASC;
As for output, I'd just use SELECT INTO OUTFILE instead of generating the CSV by hand. Leaves us free from worrying about escaping special characters as well.
not dumb, this isn't something that MySQL does, inserting the empty date values. I do this in perl with a two-step process. First, load all of the data from the query into a hash organised by date. Then, I create a Date::EzDate object and increment it by day, so...
my $current_date = Date::EzDate->new();
$current_date->{'default'} = '{YEAR}-{MONTH NUMBER BASE 1}-{DAY OF MONTH}';
while ($current_date <= $final_date)
{
print "$current_date\t|\t%hash_o_data{$current_date}"; # EzDate provides for automatic stringification in the format specfied in 'default'
$current_date++;
}
where final date is another EzDate object or a string containing the end of your date range.
EzDate isn't on CPAN right now, but you can probably find another perl mod that will do date compares and provide a date incrementor.
You could use a DateTime object:
use DateTime;
my $dt;
while ( my ($date, $sum) = $sth->fetchrow ) {
if (defined $dt) {
print CSV $dt->ymd . ",0\n" while $dt->add(days => 1)->ymd lt $date;
}
else {
my ($y, $m, $d) = split /-/, $date;
$dt = DateTime->new(year => $y, month => $m, day => $d);
}
print CSV, "$date,$sum\n";
}
What the above code does is it keeps the last printed date stored in a
DateTime object $dt, and when the current date is more than one day
in the future, it increments $dt by one day (and prints it a line to
CSV) until it is the same as the current date.
This way you don't need extra tables, and don't need to fetch all your
rows in advance.
I hope you will figure out the rest.
select * from (
select date_add('2003-01-01 00:00:00.000', INTERVAL n5.num*10000+n4.num*1000+n3.num*100+n2.num*10+n1.num DAY ) as date from
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n1,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n2,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n3,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n4,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n5
) a
where date >'2011-01-02 00:00:00.000' and date < NOW()
order by date
With
select n3.num*100+n2.num*10+n1.num as date
you will get a column with numbers from 0 to max(n3)*100+max(n2)*10+max(n1)
Since here we have max n3 as 3, SELECT will return 399, plus 0 -> 400 records (dates in calendar).
You can tune your dynamic calendar by limiting it, for example, from min(date) you have to now().
Since you don't know where the gaps are, and yet you want all the values (presumably) from the first date in your list to the last one, do something like:
use DateTime;
use DateTime::Format::Strptime;
my #row = $sth->fetchrow;
my $countdate = strptime("%Y-%m-%d", $firstrow[0]);
my $thisdate = strptime("%Y-%m-%d", $firstrow[0]);
while ($countdate) {
# keep looping countdate until it hits the next db row date
if(DateTime->compare($countdate, $thisdate) == -1) {
# counter not reached next date yet
print CSV $countdate->ymd . ",0\n";
$countdate = $countdate->add( days => 1 );
$next;
}
# countdate is equal to next row's date, so print that instead
print CSV $thisdate->ymd . ",$row[1]\n";
# increase both
#row = $sth->fetchrow;
$thisdate = strptime("%Y-%m-%d", $firstrow[0]);
$countdate = $countdate->add( days => 1 );
}
Hmm, that turned out to be more complicated than I thought it would be.. I hope it makes sense!
I think the simplest general solution to the problem would be to create an Ordinal table with the highest number of rows that you need (in your case 31*3 = 93).
CREATE TABLE IF NOT EXISTS `Ordinal` (
`n` int(10) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (`n`)
);
INSERT INTO `Ordinal` (`n`)
VALUES (NULL), (NULL), (NULL); #etc
Next, do a LEFT JOIN from Ordinal onto your data. Here's a simple case, getting every day in the last week:
SELECT CURDATE() - INTERVAL `n` DAY AS `day`
FROM `Ordinal` WHERE `n` <= 7
ORDER BY `n` ASC
The two things you would need to change about this are the starting point and the interval. I have used SET #var = 'value' syntax for clarity.
SET #end = CURDATE() - INTERVAL DAY(CURDATE()) DAY;
SET #begin = #end - INTERVAL 3 MONTH;
SET #period = DATEDIFF(#end, #begin);
SELECT #begin + INTERVAL (`n` + 1) DAY AS `date`
FROM `Ordinal` WHERE `n` < #period
ORDER BY `n` ASC;
So the final code would look something like this, if you were joining to get the number of messages per day over the last three months:
SELECT COUNT(`msg`.`id`) AS `message_count`, `ord`.`date` FROM (
SELECT ((CURDATE() - INTERVAL DAY(CURDATE()) DAY) - INTERVAL 3 MONTH) + INTERVAL (`n` + 1) DAY AS `date`
FROM `Ordinal`
WHERE `n` < (DATEDIFF((CURDATE() - INTERVAL DAY(CURDATE()) DAY), ((CURDATE() - INTERVAL DAY(CURDATE()) DAY) - INTERVAL 3 MONTH)))
ORDER BY `n` ASC
) AS `ord`
LEFT JOIN `Message` AS `msg`
ON `ord`.`date` = `msg`.`date`
GROUP BY `ord`.`date`
Tips and Comments:
Probably the hardest part of your query was determining the number of days to use when limiting Ordinal. By comparison, transforming that integer sequence into dates was easy.
You can use Ordinal for all of your uninterrupted-sequence needs. Just make sure it contains more rows than your longest sequence.
You can use multiple queries on Ordinal for multiple sequences, for example listing every weekday (1-5) for the past seven (1-7) weeks.
You could make it faster by storing dates in your Ordinal table, but it would be less flexible. This way you only need one Ordinal table, no matter how many times you use it. Still, if the speed is worth it, try the INSERT INTO ... SELECT syntax.
Use some Perl module to do date calculations, like recommended DateTime or Time::Piece (core from 5.10). Just increment date and print date and 0 until date will match current.
I don't know if this would work, but how about if you created a new table which contained all the possible dates (that might be the problem with this idea, if the range of dates is going to change unpredictably...) and then do a left join on the two tables? I guess it's a crazy solution if there are a vast number of possible dates, or no way to predict the first and last date, but if the range of dates is either fixed or easy to work out, then this might work.