Padding MYSQL data with missing dates when comparing year over year stats?

I have a table that tracks emails sent. It is pretty simple.
ID | DATETIME | E-MAIL | SUBJECT | MESSAGE
I have been collecting data for several years. Some days I don't have any entries in the table.
query1:
SELECT COUNT(ID) FROM emails
WHERE DATE(datetime) >= 'XXXX-XX-XX'
AND DATE(datetime) <= 'ZZZZ-ZZ-ZZ'
GROUP BY DATE(datetime)
I then use some PHP to shift both XXXX-XX-XX and ZZZZ-ZZ-ZZ back one year and run the second query, which is otherwise the same as the first...
query2:
SELECT COUNT(ID) FROM emails
WHERE DATE(datetime) >= 'XXXX-XX-XX'
AND DATE(datetime) <= 'ZZZZ-ZZ-ZZ'
GROUP BY DATE(datetime)
I am using a charting package to compare how many emails I got for a date range and then I overlay how many emails I got for the same range only one year prior. This is two queries right now and I chart the results.
The issue is where mysql does not have any emails for 2011 for a day in question, but has a few in 2012 for the same day.
Combining the results and graphing them skews the results since I am missing a date and a 0 value for last year for that day, effectively making all my values no longer match up.
2011-03-01 10 2012-03-01 4
2011-03-02 4 2012-03-02 2
2011-03-03 6 2012-03-04 1 <---- see where the two queries end up diverging? (I had nothing logged for 2012-03-03, so naturally it was not in the results.)
Is there a way I can get mysql to output the data I need including dates where value appear in one year but not another OR if no values appear in either year (still need date and 0) so my chart works?
I cannot seem to figure out how to do this...
Thanks!

There are a few different ways to get the results for a contiguous set of dates. My favourite one is to create the full set that is required using a dummy table or an existing contiguous set of ids from an auto-increment primary key. Something like this -
SELECT '2011-01-01' + INTERVAL (id -1) DAY
FROM dummy
WHERE id BETWEEN 1 AND 365
This will return a full set of days for 2011 which can then be LEFT JOINed to your emails table to get the counts -
SELECT `dates`.`date`, COUNT(emails.id)
FROM (
SELECT '2011-01-01' + INTERVAL (id - 1) DAY AS `date`, '2011-01-01 23:59:59' + INTERVAL (id - 1) DAY AS `end_of_day`
FROM dummy
WHERE id BETWEEN 1 AND 365
) `dates`
LEFT JOIN emails
ON `emails`.`datetime` BETWEEN `dates`.`date` AND `dates`.`end_of_day`
GROUP BY `dates`.`date`
To populate your dummy / seq table you can insert the first ten values manually and then use INSERT ... SELECT to add the rest -
CREATE TABLE dummy (id INTEGER NOT NULL PRIMARY KEY);
INSERT INTO dummy VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
SET @tmp := (SELECT MAX(id) FROM dummy) + 1;
INSERT INTO dummy
SELECT @tmp + id
FROM dummy;
You need to execute the SET query before each run of the INSERT ... SELECT query.
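If you would rather keep the two queries as they are, the same padding idea can be applied on the application side before charting. This is only a minimal sketch in Python (the question uses PHP, so treat it as an illustration of the logic): the helper name `pad_days` is made up, and the dicts stand in for the rows returned by query1 and query2, assuming each query also selects `DATE(datetime)`.

```python
from datetime import date, timedelta

def pad_days(counts, start, end):
    """Return one (day, count) pair per day from start to end inclusive.

    Days missing from `counts` get a count of 0.
    """
    total_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(total_days))
    return [(day, counts.get(day, 0)) for day in days]

# Sample figures from the question; in practice these dicts would be built
# from the rows returned by query1 and query2.
last_year = {date(2011, 3, 1): 10, date(2011, 3, 2): 4, date(2011, 3, 3): 6}
this_year = {date(2012, 3, 1): 4, date(2012, 3, 2): 2, date(2012, 3, 4): 1}

previous = pad_days(last_year, date(2011, 3, 1), date(2011, 3, 4))
current = pad_days(this_year, date(2012, 3, 1), date(2012, 3, 4))

# Both series now have four points, so position i in one lines up with
# position i in the other.
for (old_day, old_count), (new_day, new_count) in zip(previous, current):
    print(old_day, old_count, new_day, new_count)
```

If the range crosses 29 February, the two years differ in length by one day, so you would have to decide how to line that day up.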

Related

How to return zero values if nothing was written in time interval?

I am using Graph Reports with the select below. The MySQL database only holds active records, so if nothing was recorded between hour X and hour Y, the select returns nothing for that period. In my case I need the select to return zero values for PayPal even when there was no activity in the database. I do not understand how to use UNION, or how to rewrite the select, to get zero values when nothing was recorded in a time interval. Could you please help?
select STR_TO_DATE ( DATE_FORMAT(`acctstarttime`,'%y-%m-%d %H'),'%y-%m-%d %H')
as '#date', count(*) as `Active Paid Accounts`
from radacct_history where `paymentmethod` = 'PayPal'
group by DATE_FORMAT(`#date`,'%y-%m-%d %H')
When I run the select the output is:
Current Output
But if there are no values between 2016-07-27 07:00:00 and 2016-07-28 11:00:00, I need it to show zero active accounts for every hour, like this:
Needed output with no values every hour
I wrote the select below, but it does not add a zero value for every hour as I need. There is still a big gap between 12 Sep and 13 Sep, where there should be a zero value for every hour.
(select STR_TO_DATE ( DATE_FORMAT(`acctstarttime`,'%y-%m-%d %H'),'%y-%m-%d %H')
as '#date', count(`paymentmethod`) as `Active Paid Accounts`
from radacct_history where `paymentmethod` <> 'PayPal'
group by DATE_FORMAT(`#date`,'%y-%m-%d %H'))
union ALL
(select STR_TO_DATE ( DATE_FORMAT(`acctstarttime`,'%y-%m-%d %H'),'%y-%m-%d %H')
as '#date', 0 as `Active Paid Accounts`
from radacct_history where `paymentmethod` <> 'PayPal'
group by DATE_FORMAT(`#date`,'%y-%m-%d %H')) ;

I guess you want to return 0 if there are no matching rows in MySQL. Here is an example:
(SELECT Col1,Col2,Col3 FROM ExampleTable WHERE ID='1234')
UNION (SELECT 'Def Val' AS Col1,'none' AS Col2,'' AS Col3) LIMIT 1;
Update: judging by the output you provided, you are trying to retrieve rows that are not present in the table. In that case you have to maintain a date table that supplies the dates missing from your data. It is a little tricky; please refer to: SQL query that returns all dates not used in a table

You need an artificial table with all necessary time intervals. E.g. if you need daily data create a table and add all day dates e.g. start from 1970 till 2100.
Then you can use that table and LEFT JOIN your radacct_history to it. That way you get one group per desired interval (the GROUP BY should be based on the intervals table).

Get stats for each day in a month without ignoring days with no data

I want to get stats for each day in a given month. However, if a day has no rows in the table, it doesn't show up in the results. How can I include days with no data, and show all days until the current date?
This is the query I have now:
SELECT DATE_FORMAT(FROM_UNIXTIME(timestamp), '%d'), COUNT(*)
FROM data
WHERE EXTRACT(MONTH FROM FROM_UNIXTIME(timestamp)) = 6
GROUP BY EXTRACT(DAY FROM FROM_UNIXTIME(timestamp))
So if I have
Row 1 | 01-06
Row 2 | 02-06
Row 3 | 03-06
Row 4 | 05-06
Row 5 | 05-06
(I changed the timestamp values to day/month dates just to explain.)
It should output
01 | 1
02 | 1
03 | 1
04 | 0
05 | 2
06 | 0
...Instead of ignoring day 4 and today (day 6).

You will need a calendar table to do something in the form
SELECT c.`date`, COUNT(d.`date`)
FROM Input_Calendar c
LEFT JOIN Data d ON c.`date` = d.`date`
GROUP BY c.`date`
I keep a full copy of a calendar table in my database and used a WHILE loop to fill it but you can populate one on the fly for use based on the different solutions out there like http://crazycoders.net/2012/03/using-a-calendar-table-in-mysql/

In MySQL, you can use MySQL variables (act like in-line programming values). You set and can manipulate as needed.
select
dayofmonth( DynamicCalendar.CalendarDay ) as `Day`,
count( d.timestamp ) as Entries
from
( select
@startDate := date_add( @startDate, interval 1 day ) CalendarDay
from
( select @startDate := '2013-05-31' ) sqlvars,
AnyTableThatHasAsManyDaysYouExpectToReport
limit
6 ) DynamicCalendar
LEFT JOIN data d
on DynamicCalendar.CalendarDay = date( from_unixtime( d.timestamp ))
group by
DynamicCalendar.CalendarDay
In the above sample, the inner query can join against as the name implies "Any Table" in your database that has at least X number of records you are trying to generate for... in this case, you are dealing with only the current month of June and only need 6 records worth... But if you wanted to do an entire year, just make sure the "Any Table" has 365 records(or more).
The inner query will start by setting the "@startDate" to the day BEFORE June 1st (May 31). Then, by just having the other table, will result in every record joined to this variable (creates a simulated for/next loop) via a limit of 6 records (days you are generating the report for). So now, as the records are being queried, the Start Date keeps adding 1 day... first record results in June 1st, next record June 2nd, etc.
So now, you have a simulated calendar with 6 records dated from June 1 to June 6. Take that and join to your "data" table and you are already qualifying your dates via the join and get only those dates of activity. I'm joining on the DATE() of the from unix time since you care about anything that happened on June 1, and June 1 @ 12:00:00AM is different than June 1 @ 8:45am, so matching on the date only portion, they should remain in proper grouping.
You could expand this answer by changing the inner '2013-05-31' to some MySQL Date function to get the last day of the prior month, and the limit based on whatever day in the current month you are doing so these are not hard-coded.
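If the query is built from application code, those two values can also be computed there and substituted into the SQL. A minimal sketch in Python, with a made-up helper name `calendar_params`; the fixed date is only there to reproduce the June example above:

```python
from datetime import date, timedelta

def calendar_params(today):
    """Return (start_date, row_limit) for a month-to-date report.

    start_date is the last day of the previous month, i.e. the value the
    user variable is seeded with; row_limit is the number of days to generate.
    """
    first_of_month = today.replace(day=1)
    start_date = first_of_month - timedelta(days=1)
    row_limit = today.day
    return start_date, row_limit

# Reproduces the hard-coded values used in the query above.
start_date, row_limit = calendar_params(date(2013, 6, 6))
print(start_date.isoformat(), row_limit)  # prints "2013-05-31 6"
```

In real use you would pass `date.today()` instead of the fixed date.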

Create a Time dimension. This is a standard OLAP reporting trick. You don't need a cube in order to do OLAP tricks, though. Simply find a script on the internet to generate a Calendar table and join to that table.
Also, I think your query is missing a WHERE clause.
Other useful tricks include creating a "Tally" table that is a list of numbers from 1 to N where N is usually the max of the bigint on your database management system.
No code provided here, as I am not a MySQL guru.
Pseudo-code is:
Select * from TimeDimension left join Data on timedimension.date = data.date

MySQL: Find Missing Dates Between a Date Range

I need some help with a mysql query. I've got db table that has data from Jan 1, 2011 thru April 30, 2011. There should be a record for each date. I need to find out whether any date is missing from the table.
So for example, let's say that Feb 2, 2011 has no data. How do I find that date?
I've got the dates stored in a column called reportdatetime. The dates are stored in the format: 2011-05-10 0:00:00, which is May 10, 2011 12:00:00 am.
Any suggestions?

This is a second answer, I'll post it separately.
SELECT DATE(r1.reportdate) + INTERVAL 1 DAY AS missing_date
FROM Reports r1
LEFT OUTER JOIN Reports r2 ON DATE(r1.reportdate) = DATE(r2.reportdate) - INTERVAL 1 DAY
WHERE r1.reportdate BETWEEN '2011-01-01' AND '2011-04-30' AND r2.reportdate IS NULL;
This is a self-join that reports a date such that no row exists with the date following.
This will find the first day in a gap, but if there are runs of multiple days missing it won't report all the dates in the gap.
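To make that limitation concrete, here is a small Python model of what the self-join reports, next to the full list of missing days. It is only an illustration with made-up dates and helper names, not a replacement for the query.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def first_day_of_each_gap(present, start, end):
    """Mimic the self-join: report d + 1 for every present d whose next day is absent."""
    return sorted(d + ONE_DAY for d in present
                  if start <= d <= end and d + ONE_DAY not in present)

def all_missing_days(present, start, end):
    """Every day in [start, end] that has no row at all."""
    total_days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(total_days)
            if start + timedelta(days=i) not in present]

# Hypothetical data: Jan 3 and Jan 4 are both missing.
present = {date(2011, 1, 1), date(2011, 1, 2), date(2011, 1, 5), date(2011, 1, 6)}
start, end = date(2011, 1, 1), date(2011, 1, 6)

reported = first_day_of_each_gap(present, start, end)
missing = all_missing_days(present, start, end)
print(reported)  # Jan 3 and Jan 7
print(missing)   # Jan 3 and Jan 4
```

Jan 4 never shows up in the first list, and the day after the last row (Jan 7) does, even though it is outside the range being checked.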

CREATE TABLE Days (day DATE PRIMARY KEY);
Fill Days with all the days you're looking for.
mysql> INSERT INTO Days VALUES ('2011-01-01');
mysql> SET @offset := 1;
mysql> INSERT INTO Days SELECT day + INTERVAL @offset DAY FROM Days; SET @offset := @offset * 2;
Then up-arrow and repeat the INSERT as many times as needed. It doubles the number of rows each time, so you can get four months' worth of rows in seven INSERTs.
Do an exclusion join to find the dates for which there is no match in your reports table:
SELECT d.day FROM Days d
LEFT OUTER JOIN Reports r ON d.day = DATE(r.reportdatetime)
WHERE d.day BETWEEN '2011-01-01' AND '2011-04-30'
AND r.reportdatetime IS NULL;
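The "seven INSERTs" figure is easy to check by simulating the doubling outside the database. A short Python sketch that mirrors the INSERT ... SELECT and the offset update (the variable names are just for illustration):

```python
from datetime import date, timedelta

days = {date(2011, 1, 1)}      # the single seed row
offset = 1                     # mirrors the @offset user variable
inserts = 0
target = date(2011, 4, 30)

# Each pass mirrors one INSERT ... SELECT followed by doubling the offset.
while max(days) < target:
    days |= {d + timedelta(days=offset) for d in days}
    offset *= 2
    inserts += 1

print(inserts, len(days), max(days))  # prints "7 128 2011-05-08"
```

Jan 1 through Apr 30, 2011 is 120 days, so six passes (64 rows) fall short and the seventh (128 rows) overshoots slightly into May, which the BETWEEN filter in the exclusion join takes care of.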

It could be done with a more complicated single query, but I'll show a pseudo code with temp table just for illustration:
Get all dates for which we have records:
CREATE TEMP TABLE AllUsedDates
SELECT DISTINCT reportdatetime
INTO AllUsedDates;
Now add May 1st so we track 04-30:
INSERT INTO AllUsedDates VALUES ('2011-05-01')
If there's no "next day", we found a gap:
SELECT A.NEXT_DAY
FROM
(SELECT reportdatetime AS TODAY, DATEADD(reportdatetime, 1) AS NEXT_DAY FROM AllUsedDates) AS A
WHERE
(A.NEXT_DAY NOT IN (SELECT reportdatetime FROM AllUsedDates)
AND
A.TODAY <> '2011-05-01') --exclude the last day

If you mean reportdatetime has the entry of "Feb 2, 2011" but other fields associated to that date are not present like below table snap
reportdate col1 col2
5/10/2011 abc xyz
2/2/2011
1/1/2011 bnv oda
then this query works fine
select reportdate from dtdiff where reportdate not in (select df1.reportdate from dtdiff df1, dtdiff df2 where df1.col1 = df2.col1)

Try this
SELECT DATE(t1.datefield) + INTERVAL 1 DAY AS missing_date FROM table t1 LEFT OUTER JOIN table t2 ON DATE(t1.datefield) = DATE(t2.datefield) - INTERVAL 1 DAY WHERE DATE(t1.datefield) BETWEEN '2020-01-01' AND '2020-01-31' AND DATE(t2.datefield) IS NULL;
If you want to get missing dates in a datetime field use this.
SELECT CAST(t1.datetime_field as DATE) + INTERVAL 1 DAY AS missing_date FROM table t1 LEFT OUTER JOIN table t2 ON CAST(t1.datetime_field as DATE) = CAST(t2.datetime_field as DATE) - INTERVAL 1 DAY WHERE CAST(t1.datetime_field as DATE) BETWEEN '2020-01-01' AND '2020-07-31' AND CAST(t2.datetime_field as DATE) IS NULL;

The solutions above seem to work, but they are EXTREMELY slow, at least in my database (possibly taking hours; I gave up after 30 minutes).
This query takes less than a second in the same database (of course you need to repeat it manually a dozen times, and possibly change the function names, to find the actual dates). pvm = my datetime, WEATHER = my table.
mysql> select year(pvm) as _year,count(distinct(date(pvm))) as _days from WEATHER where year(pvm)>=2000 and month(pvm)=1 group by _year order by _year asc;
--ako

Group by day and still show days without rows?

I have a log table with a date field called logTime. I need to show the number of rows within a date range and the number of records per day. The issue is that I still want to show days that do not have records.
Is it possible to do this only with SQL?
Example:
SELECT logTime, COUNT(*) FROM logs WHERE logTime >= '2011-02-01' AND logTime <= '2011-02-04' GROUP BY DATE(logTime);
It returns something like this:
+---------------------+----------+
| logTime | COUNT(*) |
+---------------------+----------+
| 2011-02-01 | 2 |
| 2011-02-02 | 1 |
| 2011-02-04 | 5 |
+---------------------+----------+
3 rows in set (0,00 sec)
I would like to show the day 2011-02-03 too.

MySQL will not invent rows for you, so if the data is not there, they will naturally not be shown.
You can create a calendar table, and join in that,
create table calendar (
day date primary key
);
Fill this table with dates (easy with a stored procedure, or just some general scripting), up to around 2038; something else will likely break before that becomes a problem.
Your query then becomes e.g.
SELECT cal.day, COUNT(l.logTime)
FROM calendar cal left join logs l on cal.day = DATE(l.logTime)
WHERE day >= '2011-02-01' AND day <= '2011-02-04' GROUP BY day;
Now, you could extend the calendar table with other columns that tells you the month,year, week etc. so you can easily produce statistics for other time units. (and purists might argue the calendar table would have an id integer primary key that the logs table references instead of a date)

In order to accomplish this, you need to have a table (or derived table) which contains the dates that you can then join from, using a LEFT JOIN.
SQL operates on the concept of mathematical sets, and if you don't have a set of data, there is nothing to SELECT.
If you want more details, please comment accordingly.

I'm not sure this is a problem that should be solved by SQL. As others have shown, this requires maintaining a second table that contains all of the individual dates of a given time span, which must be updated every time that time span grows (which presumably is "always" if that time span ends at the current time).
Instead, you should use application code to inspect the results of the query and inject dates as necessary. It's completely dynamic and requires no intermediate table. Since you specified no language, here's pseudo code:
EXECUTE QUERY `SELECT logTime, COUNT(*) FROM logs WHERE logTime >= '2011-02-01' AND logTime <= '2011-02-04' GROUP BY DATE(logTime);`
FOREACH row IN query result
WHILE (date in next row) - (date in this row) > 1 day THEN
CREATE new row with date = `date in this row + 1 day`, count = `0`
INSERT new row IN query result AFTER this row
ADVANCE LOOP INDEX TO new row (`this row` is now the `new row`)
END WHILE
END FOREACH
Or something like that
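For what it's worth, the pseudo code above could look roughly like this in Python. It is a sketch under a few assumptions: the rows arrive as (date, count) pairs sorted by date, and the function name `inject_missing_days` is made up. Like the pseudo code, it only fills gaps between rows that were returned; it does not pad before the first row or after the last one.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def inject_missing_days(rows):
    """Yield the (day, count) rows in order, adding (day, 0) for skipped days."""
    previous_day = None
    for day, count in rows:
        if previous_day is not None:
            gap_day = previous_day + ONE_DAY
            # Emit a zero row for every day between the previous row and this one.
            while gap_day < day:
                yield gap_day, 0
                gap_day += ONE_DAY
        yield day, count
        previous_day = day

# The result set from the question, as it would come back from the query.
rows = [(date(2011, 2, 1), 2), (date(2011, 2, 2), 1), (date(2011, 2, 4), 5)]
filled = list(inject_missing_days(rows))
for day, count in filled:
    print(day, count)
```

Writing it as a generator means no row has to be inserted into the middle of the result list, which sidesteps the loop-index juggling in the pseudo code.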

DECLARE @TOTALCount INT
DECLARE @FromDate DateTime = GetDate() - 5
DECLARE @ToDate DateTime = GetDate()
SET @FromDate = DATEADD(DAY,-1,@FromDate)
Select @TOTALCount= DATEDIFF(DD,@FromDate,@ToDate);
WITH d AS
(
SELECT top (@TOTALCount) AllDays = DATEADD(DAY, ROW_NUMBER()
OVER (ORDER BY object_id), REPLACE(@FromDate,'-',''))
FROM sys.all_objects
)
SELECT AllDays From d

SQL query that returns all dates not used in a table

So let's say I have some records that look like:
2011-01-01 Cat
2011-01-02 Dog
2011-01-04 Horse
2011-01-06 Lion
How can I construct a query that will return 2011-01-03 and 2011-01-05, ie the unused dates. I postdate blogs into the future and I want a query that will show me the days I don't have anything posted yet. It would look from the current date to 2 weeks into the future.
Update:
I am not too excited about building a permanent table of dates. After thinking about it though it seems like the solution might be to make a small stored procedure that creates a temp table. Something like:
CREATE PROCEDURE MISSING_DATES()
BEGIN
CREATE TEMPORARY TABLE DATES (FUTURE DATETIME NULL);
INSERT INTO DATES (FUTURE) VALUES (CURDATE());
INSERT INTO DATES (FUTURE) VALUES (ADDDATE(CURDATE(), INTERVAL 1 DAY));
...
INSERT INTO DATES (FUTURE) VALUES (ADDDATE(CURDATE(), INTERVAL 14 DAY));
SELECT FUTURE FROM DATES WHERE FUTURE NOT IN (SELECT POSTDATE FROM POSTS);
DROP TEMPORARY TABLE DATES;
END
I guess it just isn't possible to select the absence of data.

You're right — SQL does not make it easy to identify missing data. The usual technique is to join your sequence (with gaps) against a complete sequence, and select those elements in the latter sequence without a corresponding partner in your data.
So, @BenHoffstein's suggestion to maintain a permanent date table is a good one.
Short of that, you can dynamically create that date range with an integers table. Assuming the integers table has a column i with numbers at least 0 – 13, and that your table has its date column named datestamp:
SELECT candidate_date AS missing
FROM (SELECT CURRENT_DATE + INTERVAL i DAY AS candidate_date
FROM integers
WHERE i < 14) AS next_two_weeks
LEFT JOIN my_table ON candidate_date = datestamp
WHERE datestamp is NULL;

One solution would be to create a separate table with one column to hold all dates from now until eternity (or whenever you expect to stop blogging). For example:
CREATE TABLE Dates (dt DATE);
INSERT INTO Dates VALUES ('2011-01-01');
INSERT INTO Dates VALUES ('2011-01-02');
...etc...
INSERT INTO Dates VALUES ('2099-12-31');
Once this reference table is set up, you can simply outer join to determine the unused dates like so:
SELECT d.dt
FROM Dates d LEFT JOIN Blogs b ON d.dt = b.dt
WHERE b.dt IS NULL
If you want to limit the search to two weeks in the future, you could add this to the WHERE clause:
AND d.dt BETWEEN CURDATE() AND ADDDATE(CURDATE(), INTERVAL 14 DAY)

The way to extract rows from the mysql database is via SELECT. Thus you cannot select rows that do not exist.
What I would do is fill my blog table with all possible dates (for a year, then repeat the process)
create table blog (
thedate date not null,
thetext text null,
primary key (thedate));
doing a loop to create all dates entries for 2011 (using a program, eg $mydate is the date you want to insert)
insert IGNORE into blog (thedate,thetext) values ($mydate, null);
(the IGNORE keyword to not create an error (thedate is a primary key) if thedate exists already).
Then you insert the values normally
insert into blog (thedate,thetext) values ($mydate, "newtext")
on duplicate key update thetext="newtext";
Finally to select empty entries, you just have to
select thedate from blog where thetext is null;

You're probably not going to like this:
select '2011-01-03', count(*) from TABLE where postdate='2011-01-03'
having count(*)=0 union
select '2011-01-04', count(*) from TABLE where postdate='2011-01-04'
having count(*)=0 union
select '2011-01-05', count(*) from TABLE where postdate='2011-01-05'
having count(*)=0 union
... repeat for 2 weeks
OR
create a table with all days in 2011, then do a left join, like
select a.days_2011
from all_days_2011 a
left join TABLE on a.days_2011=TABLE.postdate
where a.days_2011 between date(now()) and date(date_add(now(), interval 2 week))
and TABLE.postdate is null;
