Grouping by X days - mysql

I have a database that shows me different stats about different campaigns; each row has a timestamp value named "date".
I wrote the code for choosing and summarizing a range of dates, for example: 21-24/07/2010.
Now I need to add an option to choose a range of dates, but also to group the stats for each X days.
Let's say the user chooses to see stats for the whole month: 01/07-31/07. I would like to present him with the stats grouped by X days, let's say 3, so he will see the stats for 01-03/07, 04-06/07, 07-09/07 and so on...
I almost managed it using this code:
SELECT t1.camp_id,from_days( floor( to_days( date ) /3 ) *3 ) AS 'first_date'
FROM facebook_stats t1
INNER JOIN facebook_to_campaigns t2 ON t1.camp_id = t2.facebook_camp_id
WHERE date
BETWEEN 20100717000000
AND 20100724235959
GROUP BY from_days( floor( to_days( date ) /3 ) *3 ) , t2.camp_id
It does group the rows by 3 days, but for some reason the groups start from 16/07 rather than 17/07, and then go on 3 days at a time from there.
I would love to hear a fix for the code I gave, or a better solution you have in mind.

TO_DAYS returns the number of days since year 0. When you divide it by 3 and apply FLOOR, only the quotient is kept and the remainder is thrown away. E.g. if TO_DAYS returns 5, FLOOR(5/3) is 1, and multiplying back by 3 gives day 3, so day 5 ends up in the bucket that starts on day 3.
TO_DAYS(20100717000000) must leave a remainder of 1: TO_DAYS(20100716000000) is exactly divisible by 3, but the 17th is not. That is why your buckets start on the 16th.
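A small Python sketch (an emulation of the arithmetic, not MySQL itself) makes the remainder visible. It assumes TO_DAYS(d) equals Python's `d.toordinal() + 365` for modern Gregorian dates, which agrees with TO_DAYS('2007-10-07') = 733321; `to_days`, `from_days` and `epoch_bucket` are illustrative helpers, not MySQL functions.

```python
from datetime import date

def to_days(d: date) -> int:
    # Assumption: MySQL's TO_DAYS is Python's ordinal shifted by 365
    # (FROM_DAYS(366) is 0001-01-01); valid for modern Gregorian dates.
    return d.toordinal() + 365

def from_days(n: int) -> date:
    # Inverse of to_days above.
    return date.fromordinal(n - 365)

def epoch_bucket(d: date, n: int) -> date:
    # Equivalent of FROM_DAYS(FLOOR(TO_DAYS(d) / n) * n): the buckets are
    # aligned to day 0, not to the start of the user's chosen range.
    return from_days((to_days(d) // n) * n)

if __name__ == "__main__":
    d = date(2010, 7, 17)
    print(to_days(d), to_days(d) % 3)  # 734335 1
    print(epoch_bucket(d, 3))          # 2010-07-16
```

Because 734335 leaves a remainder of 1, the 17th is rounded down to the 16th, and every later bucket inherits that offset.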
You could try this query:
-- MySQL user variables (DECLARE is only allowed inside stored programs)
SET @startDate = '2010-07-17 00:00:00';
SET @endDate = '2010-07-24 23:59:59';
SET @groupByInterval = 3;
SELECT
t1.camp_id,
-- bucket start = start date + whole multiples of the interval since the start date
FROM_DAYS(
TO_DAYS(@startDate)
+ FLOOR((TO_DAYS(date) - TO_DAYS(@startDate)) / @groupByInterval) * @groupByInterval
) AS first_date
FROM facebook_stats t1
INNER JOIN facebook_to_campaigns t2 ON t1.camp_id = t2.facebook_camp_id
WHERE date
BETWEEN @startDate
AND @endDate
GROUP BY first_date, t2.camp_id;
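The start-anchored version can be sketched in Python too (again an illustration, not MySQL). `range_bucket` is a hypothetical helper: it takes the offset in days from the chosen start date, rounds it down to a multiple of the interval and adds it back, which is what TO_DAYS(start) + FLOOR((TO_DAYS(date) - TO_DAYS(start)) / n) * n computes.

```python
from datetime import date, timedelta

def range_bucket(d: date, start: date, n: int) -> date:
    # Offset from the user's start date, rounded down to a multiple of n,
    # then added back: buckets are aligned to `start`, not to day 0.
    offset = (d - start).days
    return start + timedelta(days=(offset // n) * n)

if __name__ == "__main__":
    start = date(2010, 7, 17)
    for i in range(8):  # 17/07 .. 24/07
        d = start + timedelta(days=i)
        print(d, "->", range_bucket(d, start, 3))
```

With a 3-day interval the days 17-19, 20-22 and 23-24 fall into buckets starting on the 17th, 20th and 23rd.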

Related

Only count working days in a DATEDIFF (MySQL)

So, next problem :'), I have the following query that @MatBailie provided to me here (thanks again!):
SELECT
taskname,
employee,
SUM(
DATEDIFF(
LEAST( enddate, '2023-12-31'),
GREATEST(startdate, '2023-01-01')
)
+1
) AS total_days
FROM
schedule
WHERE
startDate <= '2023-12-31'
AND
endDate >= '2023-01-01'
GROUP BY
employee,
taskname
This query will tell me how many days a certain employee has spent on a certain task in a given period of time, and it works great!
The next thing I would like to do, however, is to subtract non-working days from the SUM of DATEDIFFs for some of the tasks (e.g. when the task has "count_non_working_days = 0" in a reference table called 'activities').
For example, my schedule also keeps track of the number of days off every employee has taken (days off are also scheduled as tasks). But of course, days off that fall on a weekend or on a holiday should not be counted towards the total days off a person has taken in a year. (Note that I did consider scheduling days off only on weekdays/non-holidays, but this is not a practical option in the scheduling software I use: employees request leave from date A to date B, and the request is approved or denied as-is. They don't make 3 leave requests excluding the weekends if they want to go on vacation for 3 weeks, if you get my drift.)
So, if an employee goes on vacation for 10 days, this is counted as 10 days off, but the vacation may include 1 or 2 weekends, so the number of days the employee has taken off should be 6, 7 or 8, not 10. Furthermore, if it includes a holiday such as Easter Monday (I have all the dates of my holidays in a PHP array), that should also be subtracted.
I have tried the solutions mentioned here, but I couldn't get them to work because (a) they are for SQL Server, (b) they don't allow passing in an array of holidays, and (c) they don't allow toggling the subtraction on and off depending on the event type.
Here's my attempt at explaining what I'm trying to do in pseudo-SQL:
SELECT
taskname,
employee,
IF( activities.count_non_working_days=1,
-- Just count the days that fall in the current year:
SUM(
DATEDIFF(
LEAST( enddate, '2023-12-31'),
GREATEST( startdate, '2023-01-01')
)
+ 1
) AS total_days,
-- Subtract the amount of saturdays, sundays and holidays:
SUM(
DATEDIFF(
LEAST( enddate, '2023-12-31'),
GREATEST( startdate, '2023-01-01')
)
- [some way of getting the amount of saturdays, sundays and holidays that fall within this date range]
+ 1
) AS total_days
)
FROM
schedule
LEFT JOIN
activities
ON activity.name = schedule.name
WHERE
startDate <= '2023-12-31'
AND
endDate >= '2023-01-01'
GROUP BY
employee,
taskname
I know the query above is probably faulty on so many levels, but I hope it clarifies what I'm trying to do.
Thanks once more for all the help!
Edit: basically I need something like this, but in MySQL and preferably with a toggle that turns the subtraction on or off depending on the task type.
Edit 2: To clarify: my schedule table holds ALL activities, including holidays. For example, some records may include:
| employee | taskname | startDate | endDate |
| Mr. Anderson | Programming | 2023-01-02 | 2023-01-06 |
| Mr. Anderson | Programming | 2023-01-09 | 2023-01-14 |
| Mr. Anderson | Vacation | 2023-01-14 | 2023-01-31 |
In another table, Programming is defined as "count_non_working_days=1", because working in the weekends should count, while Vacation is defined as "count_non_working_days=0", because taking a day off on the weekend should not count towards your total amount of days taken off.
The totals for this month should therefore state that:
Mr. Anderson has done Programming for 11 days (of which 1 was on a Saturday)
Mr. Anderson has taken 12 days off (because the 3 weekends, 6 days in total, in this period don't count as days off).
Create a calendar table with every date of interest (so, something like 2000-01-01 to 2099-01-01), and include columns such as is_working_day, which can be set to TRUE/FALSE or 1/0. Then you can update that column as necessary, and join on that table in your query to get the working dates that the employee has booked off.
In short, you count the relevant dates, rather than deducting the irrelevant dates.
SELECT
s.employee,
s.taskname,
COUNT(*) AS total_days
FROM
(
schedule AS s
INNER JOIN
activities AS a
ON a.taskname = s.taskname
)
INNER JOIN
calendar AS c
ON c.calendar_date >= s.startDate
AND c.calendar_date <= s.endDate
AND c.is_working_day >= 1 - a.count_non_working_days
WHERE
c.calendar_date >= '2023-01-01'
AND c.calendar_date <= '2023-12-31'
GROUP BY
s.employee,
s.taskname
Your calendar table can then also include flags such as is_weekend, is_bank_holiday, is_fubar, is_amazing, etc, and the is_working_day can be a computed column from those inputs.
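To make the "count the relevant dates" idea concrete, here is a rough in-memory Python sketch of the calendar-table join. The dict stands in for the calendar table, the `holidays` set stands in for the PHP holiday array, and `build_calendar` / `total_days` are hypothetical names. With the question's sample rows it gives the expected 11 Programming days and 12 Vacation days.

```python
from datetime import date, timedelta

def build_calendar(first: date, last: date, holidays: set) -> dict:
    # Stand-in for the calendar table: one entry per date with
    # is_working_day = 0 for Saturdays, Sundays and listed holidays, else 1.
    cal = {}
    d = first
    while d <= last:
        is_weekend = d.weekday() >= 5  # Monday=0 ... Saturday=5, Sunday=6
        cal[d] = 0 if (is_weekend or d in holidays) else 1
        d += timedelta(days=1)
    return cal

def total_days(schedule, count_non_working, cal, period_start, period_end):
    # Counts relevant calendar dates per (employee, task) instead of
    # subtracting irrelevant ones; a date is kept when
    # is_working_day >= 1 - count_non_working_days, as in the join condition.
    # The calendar must cover the whole period.
    totals = {}
    for employee, task, start, end in schedule:
        flag = count_non_working[task]
        d = max(start, period_start)
        while d <= min(end, period_end):
            if cal[d] >= 1 - flag:
                key = (employee, task)
                totals[key] = totals.get(key, 0) + 1
            d += timedelta(days=1)
    return totals

if __name__ == "__main__":
    schedule = [
        ("Mr. Anderson", "Programming", date(2023, 1, 2), date(2023, 1, 6)),
        ("Mr. Anderson", "Programming", date(2023, 1, 9), date(2023, 1, 14)),
        ("Mr. Anderson", "Vacation", date(2023, 1, 14), date(2023, 1, 31)),
    ]
    flags = {"Programming": 1, "Vacation": 0}  # count_non_working_days per task
    cal = build_calendar(date(2023, 1, 1), date(2023, 12, 31), holidays=set())
    print(total_days(schedule, flags, cal, date(2023, 1, 1), date(2023, 12, 31)))
```

Adding a weekday holiday to the set (for example an Easter Monday) lowers the Vacation total without touching the Programming total, because the toggle only filters tasks whose flag is 0.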
Note on the is_working_day filter:
WHERE
( count_non_working_days = 1 AND is_working_day IN (0, 1) )
OR
( count_non_working_days = 0 AND is_working_day IN ( 1) )
-- change to (1 - count_non_working_days)
WHERE
( (1 - count_non_working_days) = 0 AND is_working_day IN (0, 1) )
OR
( (1 - count_non_working_days) = 1 AND is_working_day IN ( 1) )
-- simplify
WHERE
( (1 - count_non_working_days) <= is_working_day )
OR
( (1 - count_non_working_days) <= is_working_day )
-- simplify
WHERE
( (1 - count_non_working_days) <= is_working_day )
Demo: https://dbfiddle.uk/YAmpLmVE
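The simplification of the filter can be double-checked by brute force over all four combinations of the two 0/1 flags. A minimal Python sketch; the function names are made up for illustration:

```python
from itertools import product

def original_filter(count_non_working_days: int, is_working_day: int) -> bool:
    # The two-branch filter written out explicitly.
    return (count_non_working_days == 1 and is_working_day in (0, 1)) or \
           (count_non_working_days == 0 and is_working_day in (1,))

def simplified_filter(count_non_working_days: int, is_working_day: int) -> bool:
    # The simplified single comparison.
    return (1 - count_non_working_days) <= is_working_day

if __name__ == "__main__":
    for c, w in product((0, 1), repeat=2):
        print(c, w, original_filter(c, w), simplified_filter(c, w))
```

Both versions reject only the case count_non_working_days = 0 with is_working_day = 0.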
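This calculates the number of weekend days between two given dates (both inclusive); it may help you:
SELECT
(WEEK('2022-12-31') - WEEK('2022-01-01')) * 2 -- each Sunday after the start date adds a Saturday+Sunday pair
+ (CASE WHEN WEEKDAY('2022-01-01') = 6 THEN 1 ELSE 0 END) -- the start date is a Sunday
+ (CASE WHEN WEEKDAY('2022-12-31') = 5 THEN 1 ELSE 0 END) -- the end date is a Saturday
AS weekend_days;
This relies on WEEK() in mode 0 (the default, with weeks starting on Sunday), and WEEK() restarts every year, so both dates must be in the same calendar year. You will also have to subtract the holidays that fall within this date range.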
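To sanity-check a closed-form weekend count like this, you can compare it with brute force. Below is a rough Python sketch (not MySQL): `week_mode0` is a hypothetical helper emulating WEEK() in mode 0 (Sunday-based weeks, counting the Sundays from January 1 through the date), and Python's `weekday()` uses the same Monday=0 ... Sunday=6 numbering as MySQL's WEEKDAY(). The closed form being checked is (WEEK(end) - WEEK(start)) * 2 + [start is a Sunday] + [end is a Saturday], within a single year.

```python
from datetime import date, timedelta

def week_mode0(d: date) -> int:
    # Emulates MySQL WEEK(d) in mode 0: weeks start on Sunday and the result
    # is the number of Sundays from January 1 up to and including d (0..53).
    jan1 = date(d.year, 1, 1)
    first_sunday = jan1 + timedelta(days=(6 - jan1.weekday()) % 7)  # weekday(): Mon=0 .. Sun=6
    if d < first_sunday:
        return 0
    return (d - first_sunday).days // 7 + 1

def weekend_days_formula(start: date, end: date) -> int:
    # Every Sunday after `start` (up to `end`) brings a Saturday+Sunday pair,
    # plus one extra day if the range starts on a Sunday or ends on a Saturday.
    if start.year != end.year or start > end:
        raise ValueError("start and end must be ordered and in the same year")
    return ((week_mode0(end) - week_mode0(start)) * 2
            + (start.weekday() == 6)   # start is a Sunday (WEEKDAY() = 6)
            + (end.weekday() == 5))    # end is a Saturday (WEEKDAY() = 5)

def weekend_days_brute(start: date, end: date) -> int:
    # Reference answer: walk every day and count Saturdays and Sundays.
    return sum(1 for i in range((end - start).days + 1)
               if (start + timedelta(days=i)).weekday() >= 5)

if __name__ == "__main__":
    print(weekend_days_formula(date(2022, 1, 1), date(2022, 12, 31)))  # 105
    print(weekend_days_brute(date(2022, 1, 1), date(2022, 12, 31)))    # 105
```

2022 starts and ends on a Saturday, so it has 53 Saturdays and 52 Sundays, which is where the 105 comes from.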

get zero value in sql [duplicate]

I'm building a quick csv from a mysql table with a query like:
select DATE(date),count(date) from table group by DATE(date) order by date asc;
and just dumping the rows to a file in Perl with a loop like:
while(my($date,$sum) = $sth->fetchrow) {
print CSV "$date,$sum\n"
}
There are date gaps in the data, though:
| 2008-08-05 | 4 |
| 2008-08-07 | 23 |
I would like to pad the data to fill in the missing days with zero-count entries to end up with:
| 2008-08-05 | 4 |
| 2008-08-06 | 0 |
| 2008-08-07 | 23 |
I slapped together a really awkward (and almost certainly buggy) workaround with an array of days-per-month and some math, but there has to be something more straightforward either on the mysql or perl side.
Any genius ideas/slaps in the face for why me am being so dumb?
I ended up going with a stored procedure which generated a temp table for the date range in question for a couple of reasons:
I know the date range I'll be looking for every time
The server in question unfortunately was not one that I can install perl modules on atm, and the state of it was decrepit enough that it didn't have anything remotely Date::-y installed
The perl Date/DateTime-iterating answers were also very good, I wish I could select multiple answers!
When you need something like that on server side, you usually create a table which contains all possible dates between two points in time, and then left join this table with query results. Something like this:
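DELIMITER //
CREATE PROCEDURE sp1(d1 DATE, d2 DATE)
BEGIN
DECLARE cur_d DATE;
CREATE TEMPORARY TABLE foo (d DATE NOT NULL);
SET cur_d = d1;
-- fill foo with every date from d1 to d2
WHILE cur_d <= d2 DO
INSERT INTO foo (d) VALUES (cur_d);
SET cur_d = DATE_ADD(cur_d, INTERVAL 1 DAY);
END WHILE;
-- the LEFT JOIN keeps dates with no matching rows; COUNT of the NULL column gives 0
SELECT foo.d, COUNT(t.date)
FROM foo LEFT JOIN `table` AS t ON foo.d = DATE(t.date)
GROUP BY foo.d ORDER BY foo.d ASC;
DROP TEMPORARY TABLE foo;
END //
DELIMITER ;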
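For a quick experiment without a MySQL server, the same calendar-plus-LEFT-JOIN idea can be tried with Python's built-in sqlite3 module. Note that this swaps the stored procedure's WHILE loop for a recursive CTE (SQLite syntax; MySQL 8.0 and later also support WITH RECURSIVE), and the `events` table with a `ts` column is a hypothetical stand-in for the question's table:

```python
import sqlite3

def daily_counts(conn, first_day: str, last_day: str):
    # A recursive CTE generates one row per day (the role of the temporary
    # "foo" table); the LEFT JOIN keeps days with no events, counted as 0.
    sql = """
        WITH RECURSIVE days(d) AS (
            SELECT date(?)
            UNION ALL
            SELECT date(d, '+1 day') FROM days WHERE d < date(?)
        )
        SELECT days.d, COUNT(e.ts)
        FROM days LEFT JOIN events AS e ON date(e.ts) = days.d
        GROUP BY days.d
        ORDER BY days.d
    """
    return conn.execute(sql, (first_day, last_day)).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (ts TEXT)")  # hypothetical data table
    conn.executemany("INSERT INTO events VALUES (?)",
                     [("2008-08-05 10:00:00",)] * 4 + [("2008-08-07 12:00:00",)] * 23)
    for day, n in daily_counts(conn, "2008-08-05", "2008-08-07"):
        print(f"{day},{n}")
```

With the question's sample data this prints the padded rows 2008-08-05,4 then 2008-08-06,0 then 2008-08-07,23.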
In this particular case, it might be better to add a little check on the client side: if the current date is not the previous date + 1, print some additional lines.
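The client-side check can be sketched in Python as well (the question uses Perl; this only illustrates the logic). `fill_gaps` is a hypothetical helper that assumes the rows arrive sorted by date, as the query's ORDER BY guarantees:

```python
from datetime import date, timedelta

def fill_gaps(rows):
    # rows: (date, count) pairs sorted by date, e.g. from
    # "SELECT DATE(date), COUNT(date) ... GROUP BY DATE(date) ORDER BY date".
    # Whenever the next date is not previous + 1 day, emit zero rows in between.
    out = []
    prev = None
    for d, n in rows:
        if prev is not None:
            gap = prev + timedelta(days=1)
            while gap < d:
                out.append((gap, 0))
                gap += timedelta(days=1)
        out.append((d, n))
        prev = d
    return out

if __name__ == "__main__":
    for d, n in fill_gaps([(date(2008, 8, 5), 4), (date(2008, 8, 7), 23)]):
        print(f"{d.isoformat()},{n}")
```

This needs no extra table and streams the rows once, at the cost of doing the padding outside the database.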
When I had to deal with this problem, to fill in missing dates I actually created a reference table that just contained all dates I'm interested in and joined the data table on the date field. It's crude, but it works.
SELECT DATE(r.date),count(d.date)
FROM dates AS r
LEFT JOIN table AS d ON d.date = r.date
GROUP BY DATE(r.date)
ORDER BY r.date ASC;
As for output, I'd just use SELECT INTO OUTFILE instead of generating the CSV by hand. Leaves us free from worrying about escaping special characters as well.
Not dumb: inserting the empty date values just isn't something MySQL does. I do this in Perl with a two-step process. First, load all of the data from the query into a hash keyed by date. Then, create a Date::EzDate object and increment it by day, so...
my $current_date = Date::EzDate->new();
$current_date->{'default'} = '{YEAR}-{MONTH NUMBER BASE 1}-{DAY OF MONTH}';
while ($current_date <= $final_date)
{
print "$current_date\t|\t", ($hash_o_data{$current_date} // 0), "\n"; # EzDate provides automatic stringification in the format specified in 'default'; missing dates print 0
$current_date++;
}
where final date is another EzDate object or a string containing the end of your date range.
EzDate isn't on CPAN right now, but you can probably find another Perl module that will do date comparisons and provide a date incrementer.
You could use a DateTime object:
use DateTime;
my $dt;
while ( my ($date, $sum) = $sth->fetchrow ) {
if (defined $dt) {
print CSV $dt->ymd . ",0\n" while $dt->add(days => 1)->ymd lt $date;
}
else {
my ($y, $m, $d) = split /-/, $date;
$dt = DateTime->new(year => $y, month => $m, day => $d);
}
print CSV "$date,$sum\n";
}
What the above code does is keep the last printed date in a DateTime object, $dt, and when the current date is more than one day in the future, it increments $dt by one day (printing a zero line to CSV each time) until it is the same as the current date.
This way you don't need extra tables, and you don't need to fetch all your rows in advance.
I hope you will figure out the rest.
select * from (
select date_add('2003-01-01 00:00:00.000', INTERVAL n5.num*10000+n4.num*1000+n3.num*100+n2.num*10+n1.num DAY ) as date from
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n1,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n2,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n3,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n4,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n5
) a
where date >'2011-01-02 00:00:00.000' and date < NOW()
order by date
With
select n3.num*100+n2.num*10+n1.num as date
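you will get a column with numbers from 0 to max(n3)*100+max(n2)*10+max(n1).
For example, if n3 only went up to 3, the SELECT would return the numbers 0 to 399, i.e. 400 records (dates in the calendar). The query above uses five digit tables with 0-9 each, so it yields 100,000 dates starting from 2003-01-01.
You can tune your dynamic calendar by limiting it, for example, to the range from the min(date) in your data to now().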
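The digit cross join can be modelled in Python to see how many numbers (and therefore dates) it yields. In this sketch, `digit_sequence` and `dates_from` are hypothetical helpers; each entry of `max_digits` is the highest digit allowed in one position, least significant first:

```python
from datetime import date, timedelta
from itertools import product

def digit_sequence(max_digits):
    # e.g. [9, 9, 3] mirrors n1 and n2 in 0..9 and n3 in 0..3; every
    # combination of digits maps to n1 + n2*10 + n3*100 + ...
    ranges = [range(m + 1) for m in max_digits]
    return sorted(sum(digit * 10 ** pos for pos, digit in enumerate(combo))
                  for combo in product(*ranges))

def dates_from(start: date, numbers):
    # Equivalent of DATE_ADD(start, INTERVAL n DAY) for each generated number.
    return [start + timedelta(days=n) for n in numbers]

if __name__ == "__main__":
    nums = digit_sequence([9, 9, 3])
    print(len(nums), nums[0], nums[-1])  # 400 0 399
    print(dates_from(date(2003, 1, 1), nums[:3]))
```

The cross join always produces a contiguous run because every combination of digits appears exactly once.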
Since you don't know where the gaps are, and yet you want all the values (presumably) from the first date in your list to the last one, do something like:
use DateTime;
use DateTime::Format::Strptime qw(strptime);
my @row = $sth->fetchrow;
# countdate walks forward one day at a time, starting at the first row's date
my $countdate = strptime("%Y-%m-%d", $row[0]);
while (@row) {
# date of the current DB row
my $thisdate = strptime("%Y-%m-%d", $row[0]);
# keep looping countdate until it hits the next db row date
if (DateTime->compare($countdate, $thisdate) == -1) {
# counter not reached next date yet: print a zero line for this day
print CSV $countdate->ymd . ",0\n";
$countdate->add( days => 1 );
next;
}
# countdate is equal to this row's date, so print the row instead
print CSV $thisdate->ymd . ",$row[1]\n";
# increase both: fetch the next row and advance countdate
@row = $sth->fetchrow;
$countdate->add( days => 1 );
}
Hmm, that turned out to be more complicated than I thought it would be.. I hope it makes sense!
I think the simplest general solution to the problem would be to create an Ordinal table with the highest number of rows that you need (in your case 31*3 = 93).
CREATE TABLE IF NOT EXISTS `Ordinal` (
`n` int(10) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (`n`)
);
INSERT INTO `Ordinal` (`n`)
VALUES (NULL), (NULL), (NULL); #etc
Next, do a LEFT JOIN from Ordinal onto your data. Here's a simple case, getting every day in the last week:
SELECT CURDATE() - INTERVAL `n` DAY AS `day`
FROM `Ordinal` WHERE `n` <= 7
ORDER BY `n` ASC
The two things you would need to change about this are the starting point and the interval. I have used SET @var = 'value' syntax for clarity.
SET @end = CURDATE() - INTERVAL DAY(CURDATE()) DAY;
SET @begin = @end - INTERVAL 3 MONTH;
SET @period = DATEDIFF(@end, @begin);
SELECT @begin + INTERVAL (`n` + 1) DAY AS `date`
FROM `Ordinal` WHERE `n` < @period
ORDER BY `n` ASC;
So the final code would look something like this, if you were joining to get the number of messages per day over the last three months:
SELECT COUNT(`msg`.`id`) AS `message_count`, `ord`.`date` FROM (
SELECT ((CURDATE() - INTERVAL DAY(CURDATE()) DAY) - INTERVAL 3 MONTH) + INTERVAL (`n` + 1) DAY AS `date`
FROM `Ordinal`
WHERE `n` < (DATEDIFF((CURDATE() - INTERVAL DAY(CURDATE()) DAY), ((CURDATE() - INTERVAL DAY(CURDATE()) DAY) - INTERVAL 3 MONTH)))
ORDER BY `n` ASC
) AS `ord`
LEFT JOIN `Message` AS `msg`
ON `ord`.`date` = `msg`.`date`
GROUP BY `ord`.`date`
Tips and Comments:
Probably the hardest part of your query was determining the number of days to use when limiting Ordinal. By comparison, transforming that integer sequence into dates was easy.
You can use Ordinal for all of your uninterrupted-sequence needs. Just make sure it contains more rows than your longest sequence.
You can use multiple queries on Ordinal for multiple sequences, for example listing every weekday (1-5) for the past seven (1-7) weeks.
You could make it faster by storing dates in your Ordinal table, but it would be less flexible. This way you only need one Ordinal table, no matter how many times you use it. Still, if the speed is worth it, try the INSERT INTO ... SELECT syntax.
Use a Perl module for the date calculations, like the recommended DateTime, or Time::Piece (core since Perl 5.10). Just increment the date and print it with a 0 until it matches the current row's date.
I don't know if this would work, but how about creating a new table which contains all the possible dates (that might be the problem with this idea, if the range of dates is going to change unpredictably...) and then doing a left join on the two tables? I guess it's a crazy solution if there are a vast number of possible dates, or no way to predict the first and last date, but if the range of dates is either fixed or easy to work out, then this might work.

Is there a MySQL Statement for this or are multiple statements needed?

I have a table with MLSNumber, ListingContractDate, CloseDate.
I want to summarize the activity grouped by month, starting with the current month and going back to January 2000.
I have this statement which summarizes the ListingContractDate by month.
SELECT COUNT(MLSNumber) AS NewListings, DATE_FORMAT(ListingContractDate,'%M %Y')
FROM Listings
WHERE Neighbourhood = 'Beachside'
AND ListingContractDate >= '2000-01-01'
GROUP BY YEAR(ListingContractDate), MONTH(ListingContractDate)
ORDER BY ListingContractDate DESC
The two problems with this statement are: if nothing is found in a specific month, it skips that month, whereas I need a 0 returned so no months are missing; and I am not sure how to get the same count on the CloseDate field, or whether I just have to run a second query and match the two results up by month and year using PHP.
An exceptionally useful item to have is a "tally table", which simply consists of a set of integers. I used a script found HERE to generate such a table.
With that table I can now LEFT JOIN the time related data to it as shown below:
set @startdt := '2000-01-01';
SELECT COUNT(MLSNumber) AS NewListings, DATE_FORMAT(T.Mnth,'%M %Y')
FROM (
select
tally.id
, date_add( @startdt, INTERVAL (tally.id - 1) MONTH ) as Mnth
, date_add( @startdt, INTERVAL tally.id MONTH ) as NextMnth
from tally
where tally.id <= (
select period_diff(date_format(now(), '%Y%m'), date_format(@startdt, '%Y%m')) + 1
)
) T
LEFT JOIN Temp On Temp.ListingContractDate >= T.Mnth and Temp.ListingContractDate < T.NextMnth
GROUP BY YEAR(T.Mnth), MONTH(T.Mnth)
ORDER BY T.Mnth DESC
Logic:
define a starting date
calculate the number of months from that date until now (using PERIOD_DIFF + 1)
choose that number of records from the tally table
create period start and end dates (T.Mnth & T.NextMnth)
LEFT JOIN the actual data to the tally table using
Temp.ListingContractDate >= T.Mnth and Temp.ListingContractDate < T.NextMnth
group and count the data
see this sqlfiddle
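The month-generation logic can be sketched in Python (illustration only, not MySQL). `month_starts`, `next_month` and `count_per_month` are hypothetical helpers, and the start date is assumed to be the first day of a month, as '2000-01-01' is:

```python
from datetime import date

def month_starts(start: date, today: date):
    # PERIOD_DIFF(YYYYMM(today), YYYYMM(start)) + 1 rows from the tally
    # table, each turned into a month start date (Mnth).
    months = (today.year - start.year) * 12 + (today.month - start.month) + 1
    out = []
    for i in range(months):  # i plays the role of tally.id - 1
        y, m = divmod(start.month - 1 + i, 12)
        out.append(date(start.year + y, m + 1, 1))
    return out

def next_month(d: date) -> date:
    # NextMnth: first day of the following month.
    y, m = divmod(d.month, 12)
    return date(d.year + y, m + 1, 1)

def count_per_month(listing_dates, start: date, today: date):
    # Half-open ranges Mnth <= d < NextMnth, so months with no rows get 0.
    return [(mn, sum(1 for d in listing_dates if mn <= d < next_month(mn)))
            for mn in month_starts(start, today)]

if __name__ == "__main__":
    data = [date(2000, 1, 5), date(2000, 1, 20), date(2000, 3, 1)]
    for mn, n in count_per_month(data, date(2000, 1, 1), date(2000, 3, 15)):
        print(mn.strftime("%B %Y"), n)
```

Because every month comes from the tally side of the join, February shows up with a 0 instead of being skipped.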

Find number of "active" rows each month for multiple months in one query

I have a mySQL database with each row containing an activate and a deactivate date. This refers to the period of time when the object the row represents was active.
| activate | deactivate | id |
| 2015-03-01 | 2015-05-10 | 1 |
| 2013-02-04 | 2014-08-23 | 2 |
I want to find the number of rows that were active at any time during each month. Ex.
Jan: 4
Feb: 2
Mar: 1
etc...
I figured out how to do this for a single month, but I'm struggling with how to do it for all 12 months in a year in a single query. The reason I would like it in a single query is for performance, as information is used immediately and caching wouldn't make sense in this scenario. Here's the code I have for a month at a time. It checks if the activate date comes before the end of the month in question and that the deactivate date was not before the beginning of the period in question.
SELECT * from tblName WHERE activate <= DATE_SUB(NOW(), INTERVAL 1 MONTH)
AND deactivate >= DATE_SUB(NOW(), INTERVAL 2 MONTH)
If anybody has any idea how to change this and do grouping such that I can do this for an indefinite number of months I'd appreciate it. I'm at a loss as to how to group.
If you have a table of months that you care about, you can do:
select m.*,
(select count(*)
from table t
where t.activate_date <= m.month_end and
t.deactivate_date >= m.month_start
) as Actives
from months m;
If you don't have such a table handy, you can create one on the fly:
select m.*,
(select count(*)
from table t
where t.activate_date <= m.month_end and
t.deactivate_date >= m.month_start
) as Actives
from (select date('2015-01-01') as month_start, date('2015-01-31') as month_end union all
select date('2015-02-01') as month_start, date('2015-02-28') as month_end union all
select date('2015-03-01') as month_start, date('2015-03-31') as month_end union all
select date('2015-04-01') as month_start, date('2015-04-30') as month_end
) m;
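The correlated-subquery test can be checked in Python against the two sample rows from the question. This is a sketch; `active_per_month` is a hypothetical helper, and the month boundaries come from the standard `calendar` module:

```python
import calendar
from datetime import date

def active_per_month(rows, year):
    # rows: (activate, deactivate) date pairs. A row counts for a month when
    # it starts on/before the month's last day and ends on/after its first
    # day, the same test as the correlated subquery.
    result = {}
    for month in range(1, 13):
        month_start = date(year, month, 1)
        month_end = date(year, month, calendar.monthrange(year, month)[1])
        result[month] = sum(1 for a, d in rows if a <= month_end and d >= month_start)
    return result

if __name__ == "__main__":
    rows = [(date(2015, 3, 1), date(2015, 5, 10)), (date(2013, 2, 4), date(2014, 8, 23))]
    print(active_per_month(rows, 2015))
```

Only March, April and May 2015 count row 1; row 2 ended in 2014.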
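EDIT:
A potentially faster way is to calculate a cumulative sum of activations (+1) and deactivations (-1) and then take the maximum per month. Note that this gives the peak number of rows active at the same time within each month, and only for months that contain an activation or deactivation, which can differ from the number of rows active at any point during the month.
select year(d), month(d), max(cumes)
from (select d, (@s := @s + inc) as cumes
from (select activate_date as d, 1 as inc from table t union all
select deactivate_date, -1 as inc from table t
) t cross join
(select @s := 0) param
order by d
) s
group by year(d), month(d);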
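The running-sum variant can also be sketched in Python, which shows how a per-month peak differs from "active at any time": with the question's two sample rows, May 2015 reports 0 even though row 1 was active until 2015-05-10, and April 2015 does not appear at all. `peak_active_per_month` is a hypothetical helper, and the same-day tie-break is an assumption of this sketch; the SQL only orders by date.

```python
from datetime import date

def peak_active_per_month(rows):
    # rows: (activate, deactivate) date pairs.
    # +1 at each activation, -1 at each deactivation, like the UNION ALL.
    events = [(a, 1) for a, _ in rows] + [(d, -1) for _, d in rows]
    # Assumed tie-break: on the same day, activations are applied first.
    events.sort(key=lambda e: (e[0], -e[1]))
    running = 0  # plays the role of @s
    peak = {}
    for day, inc in events:
        running += inc
        key = (day.year, day.month)
        peak[key] = max(peak.get(key, running), running)
    return peak

if __name__ == "__main__":
    rows = [(date(2015, 3, 1), date(2015, 5, 10)), (date(2013, 2, 4), date(2014, 8, 23))]
    print(peak_active_per_month(rows))
```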

MySQL query to count items by week for the current 52-weeks?

I have a query that I'd like to change so that it gives me the counts for the current 52 weeks. This query makes use of a calendar table I've made which contains a list of dates in a fixed range. The query as it stands is selecting max and min dates and not necessarily the last 52 weeks.
I'm wondering how to keep my calendar table current such that I can get the last 52 weeks (i.e., from right now to one year ago). Or is there another way to make the query independent of a calendar table?
Here's the query:
SELECT calendar.datefield AS date, IFNULL(SUM(purchaseyesno),0) AS item_sales
FROM items_purchased join items on items_purchased.item_id=items.item_id
RIGHT JOIN calendar ON (DATE(items_purchased.purchase_date) = calendar.datefield)
WHERE (calendar.datefield BETWEEN (SELECT MIN(DATE(purchase_date))
FROM items_purchased) AND (SELECT MAX(DATE(purchase_date)) FROM items_purchased))
GROUP BY week(date)
thoughts?
Some people dislike this approach but I tend to use a dummy table that contains values from 0 - 1000 and then use a derived table to produce the ranges that are needed -
CREATE TABLE dummy (`num` INT NOT NULL);
INSERT INTO dummy VALUES (0), (1), (2), (3), (4), (5), .... (999), (1000);
If you have a table with an auto-incrementing id and plenty of rows you could generate it from that -
CREATE TABLE `dummy`
SELECT id AS `num` FROM `some_table` WHERE `id` <= 1000;
Just remember to insert the 0 value.
SELECT CURRENT_DATE - INTERVAL num DAY
FROM dummy
WHERE num < 365
So, applying this approach to your query you could do something like this -
SELECT WEEK(calendar.datefield) AS `week`, IFNULL(SUM(purchaseyesno),0) AS item_sales
FROM items_purchased join items on items_purchased.item_id=items.item_id
RIGHT JOIN (
SELECT (CURRENT_DATE - INTERVAL num DAY) AS datefield
FROM dummy
WHERE num < 365
) AS calendar ON (DATE(items_purchased.purchase_date) = calendar.datefield)
WHERE calendar.datefield >= (CURRENT_DATE - INTERVAL 1 YEAR)
GROUP BY week(datefield) -- shouldn't this be datefield instead of date?
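A Python sketch of the same "generate the last 365 dates, then group" idea. `weekly_sales` is a hypothetical helper; it uses ISO weeks (Monday start) via `date.isocalendar()`, unlike WEEK()'s default Sunday start, and keys on (year, week) because a 365-day window can contain the same week number in two different years:

```python
from datetime import date, timedelta

def weekly_sales(purchases, today: date):
    # purchases: {date: amount}. Build the last 365 days (like
    # CURRENT_DATE - INTERVAL num DAY for num < 365) and sum per week;
    # days without purchases contribute 0, like IFNULL(..., 0).
    totals = {}
    for num in range(365):
        d = today - timedelta(days=num)
        key = tuple(d.isocalendar()[:2])  # (ISO year, ISO week)
        totals[key] = totals.get(key, 0) + purchases.get(d, 0)
    return dict(sorted(totals.items()))

if __name__ == "__main__":
    sales = {date(2024, 5, 13): 2, date(2024, 5, 15): 3, date(2023, 5, 17): 1}
    for (year, week), total in weekly_sales(sales, date(2024, 5, 15)).items():
        print(year, week, total)
```

In this example 2023-05-17 and 2024-05-13 both fall in ISO week 20, so grouping on the week number alone would merge a week from last year with a week from this year.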
I too typically "simulate" a table on the fly by using @sql variables and joining to ANY table in your system that has AT LEAST as many rows as the weeks you want. NOTE... when dealing with dates, I typically like to use only the date part, which implies 12:00:00 am. Also, by advancing the start date by 7 days for the "EndOfWeek", you can apply a BETWEEN clause for records within a given time period... such as your weekly needs.
I've applied this in a sample that joins each purchase to its week based on the purchase date... Since your
select
DynamicCalendar.StartOfWeek,
COALESCE( SUM( IP.PurchaseYesNo ), 0 ) as Item_Sales
from
( select
@weekNum := @weekNum +1 as WeekNum,
@startDate as StartOfWeek,
@startDate := date_add( @startDate, interval 1 week ) EndOfWeek
from
( select @weekNum := 0,
@startDate := date(date_sub(now(), interval 1 year ))) sqlv,
AnyTableThatHasAtLeast52Records
limit
52 ) DynamicCalendar
LEFT JOIN items_purchased IP
on IP.Purchase_Date between DynamicCalendar.StartOfWeek
AND DynamicCalendar.EndOfWeek
group by
DynamicCalendar.StartOfWeek
This is under the premise that your "PurchaseYesNo" value is directly in your purchased table. If so, there's no need to join to the ITEMS table. If the field IS in the items table, then I would just tack on a LEFT JOIN to your items table and get the value from that.
Beyond this query, you could use the DynamicCalendar derived table in MANY other situations.
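The dynamic calendar can be sketched in Python too. `one_year_back`, `dynamic_calendar` and `sales_per_week` are hypothetical helpers. One deliberate difference from the SQL: purchases are matched with a half-open range, StartOfWeek <= date < EndOfWeek, because an inclusive BETWEEN also matches a purchase dated exactly on EndOfWeek, which is the next row's StartOfWeek:

```python
from datetime import date, timedelta

def one_year_back(d: date) -> date:
    # Like DATE_SUB(d, INTERVAL 1 YEAR); Feb 29 becomes Feb 28 of the previous year.
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)

def dynamic_calendar(today: date, weeks: int = 52):
    # (WeekNum, StartOfWeek, EndOfWeek) rows: start one year back and
    # advance the start by one week per row, as @startDate does.
    start = one_year_back(today)
    return [(i + 1, start + timedelta(weeks=i), start + timedelta(weeks=i + 1))
            for i in range(weeks)]

def sales_per_week(purchases, today: date):
    # purchases: (purchase_date, purchase_yes_no) pairs; weeks without
    # purchases get 0, like COALESCE(SUM(...), 0).
    return [(start, sum(v for d, v in purchases if start <= d < end))
            for _, start, end in dynamic_calendar(today)]

if __name__ == "__main__":
    today = date(2024, 5, 15)
    for start, total in sales_per_week([(date(2023, 5, 22), 1)], today)[:3]:
        print(start, total)
```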