How to find missing data rows using SQL? - mysql

My problem:
I got a MySQL database that stores a great amount of meteorological data in chronological order (New data are inserted every 10 min). Unfortunately there have been several blackouts and hence certain rows are missing. I recently managed to obtain certain backup-files from the weather station and now I want to use these to fill in the missing data.
The DB is structured like this (example):
date*             the data
2/10/2009 10:00   ...
2/10/2009 10:10   ...
(missing data!)
2/10/2009 10:40   ...
2/10/2009 10:50   ...
2/10/2009 11:00   ...
...
* = DATETIME type, primary key
My idea:
Since the backup and the database are located on different computers and traffic is quite slow, I thought of creating a MySQL query that, when run, will return a list of all missing dates in a specified range of time. I could then extract these dates from the backup and insert them into the database.
The question:
How do I write such a query? I don't have permission to create any auxiliary tables. Is it possible to formulate a "virtual table" of all required dates in the specified interval and then use it in a JOIN? Or are there entirely different approaches to solving my problem?
Edit:
Yes, the timestamps are consistently in the form shown above (always 10 minutes), except that some are just missing.
Okay, what about temporary tables? Is there an elegant way of populating them with the time range automatically? What if two scripts try to run simultaneously; does this cause problems with the table?

select t1.ts as hival, t2.ts as loval
from metdata t1, metdata t2
where t2.ts = (select max(ts) from metdata t3
               where t3.ts < t1.ts)         -- t2 is the reading immediately before t1
and timediff(t1.ts, t2.ts) <> '00:10:00'     -- keep only pairs that are not 10 minutes apart
This query will return pairs you can use to select the missing data. The missing data will have a timestamp strictly between loval and hival for each pair returned by the query.
EDIT - thx for checking, Craig
EDIT2 :
Getting the missing timestamps: this SQL gets a bit harder to read, so I'll break it up. First, we need a way to calculate a series of timestamp values between a given low value and a high value in 10-minute intervals. When you can't create tables, one way of doing this is based on the following SQL, which produces a result set of all the digits from 0 to 9.
select d1.* from
(select 1 as digit
union select 2
union select 3
union select 4
union select 5
union select 6
union select 7
union select 8
union select 9
union select 0
) as d1
...now, combining this table with copies of itself a couple of times lets us dynamically generate a list of a specified length:
select curdate() +
INTERVAL (d1.digit * 100 + d2.digit * 10 + d3.digit) * 10 MINUTE
as date
from (select 1 as digit
union select 2
union select 3
union select 4
union select 5
union select 6
union select 7
union select 8
union select 9
union select 0
) as d1
join
(select 1 as digit
union select 2
union select 3
union select 4
union select 5
union select 6
union select 7
union select 8
union select 9
union select 0
) as d2
join
(select 1 as digit
union select 2
union select 3
union select 4
union select 5
union select 6
union select 7
union select 8
union select 9
union select 0
) as d3
where (d1.digit * 100 + d2.digit * 10 + d3.digit) between 1 and 42
order by 1
... now this piece of SQL is getting close to what we need. It has 2 input variables:
- a starting timestamp (I used curdate() in the example); and
- a number of iterations: the WHERE clause specifies 42 iterations in the example, and with 3 digit tables the maximum is 1000 values (offsets 0 to 999).
... which means we can use the original SQL to drive the example above and generate a series of timestamps for each hival/loval pair. Bear with me, this SQL is a bit long now...
select daterange.loval + INTERVAL (d1.digit * 100 + d2.digit * 10 + d3.digit) * 10 MINUTE as date
from
(select t1.ts as hival, t2.ts as loval
from metdata t1, metdata t2
where t2.ts = (select max(ts) from metdata t3
where t3.ts < t1.ts)
and not timediff(t1.ts, t2.ts) = '00:10:00'
) as daterange
join
(select 1 as digit
union select 2
union select 3
union select 4
union select 5
union select 6
union select 7
union select 8
union select 9
union select 0
) as d1
join
(select 1 as digit
union select 2
union select 3
union select 4
union select 5
union select 6
union select 7
union select 8
union select 9
union select 0
) as d2
join
(select 1 as digit
union select 2
union select 3
union select 4
union select 5
union select 6
union select 7
union select 8
union select 9
union select 0
) as d3
where (d1.digit * 100 + d2.digit * 10 + d3.digit) between 1 and
round((time_to_sec(timediff(hival, loval))-600) /600)
order by 1
...now there's a bit of epic sql
NOTE: using the digits table 3 times covers gaps of up to 999 missing readings, i.e. a gap of just under 7 days.

If you can create a temporary table, you can solve the problem with a JOIN
CREATE TEMPORARY TABLE DateRange
(theDate DATETIME);
Populate the table with all 10-minute intervals between your dates, then use the following:
SELECT theDate
FROM DateRange dr
LEFT JOIN Meteorological mm on mm.date = dr.theDate
WHERE mm.date IS NULL
The result will be all of the date/times that do not have entries in your weather table.
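On the follow-up question of populating a temporary table automatically: a minimal sketch that fills DateRange with an INSERT ... SELECT over the same digit tables used in the first answer. It assumes you have the CREATE TEMPORARY TABLES privilege (the question says auxiliary tables are not allowed, so this may not hold), the bounds in @range_start and @range_end are hypothetical, and the column is DATETIME because a DATE column would drop the time of day. A TEMPORARY table is visible only to the session that created it and is dropped when that session ends, so two scripts running at the same time each work with their own DateRange.

```sql
-- Hypothetical bounds: replace with the interval you want to check.
SET @range_start = '2009-10-02 10:00:00';
SET @range_end   = '2009-10-02 12:00:00';

-- DATETIME (not DATE) so each 10-minute slot keeps its time of day.
CREATE TEMPORARY TABLE DateRange (theDate DATETIME NOT NULL PRIMARY KEY);

-- Offsets 0..999 from three digit tables; each offset is one 10-minute step.
INSERT INTO DateRange (theDate)
SELECT CAST(@range_start AS DATETIME)
       + INTERVAL (d1.digit * 100 + d2.digit * 10 + d3.digit) * 10 MINUTE
FROM (SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS d1
CROSS JOIN
     (SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS d2
CROSS JOIN
     (SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS d3
-- Stop at the end of the requested range.
WHERE CAST(@range_start AS DATETIME)
      + INTERVAL (d1.digit * 100 + d2.digit * 10 + d3.digit) * 10 MINUTE
      <= CAST(@range_end AS DATETIME);
```

Three digit tables give 1000 slots, so the range must span less than about 7 days; a fourth digit table extends that. After the insert, the LEFT JOIN query above returns the slots that have no row in the weather table.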
If you need to quickly find days with missing data, you can use
select Date(mm.Date),144-count(*) as TotMissing
from Meteorological mm
group by Date(mm.Date)
having count(*) < 144
This assumes 24 hours a day and 6 entries per hour (hence 144 rows). Days with no rows at all will not appear in this result.

Create a temporary table (JOIN), or take all the dates and query them locally, where you have free rein (loop/hash).
For the JOIN, your generated reference of all dates is your base table and your data is your joined table. Seek out pairs where the joined data does not exist and select the generated date.
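If the server runs MySQL 8.0 or later (an assumption; the question does not say), the generated reference of all dates can be built inline with a recursive common table expression instead of a temporary table, which also avoids needing any CREATE privilege. This is a swapped-in technique rather than the one described above; the table metdata and column ts are borrowed from the first answer, and the range bounds are hypothetical.

```sql
-- Requires MySQL 8.0+ for WITH RECURSIVE.
-- The default cte_max_recursion_depth (1000) limits this to about 1000 slots.
WITH RECURSIVE expected (ts) AS (
    SELECT CAST('2009-10-02 10:00:00' AS DATETIME)            -- first expected slot
    UNION ALL
    SELECT ts + INTERVAL 10 MINUTE                            -- next 10-minute slot
    FROM expected
    WHERE ts + INTERVAL 10 MINUTE <= CAST('2009-10-02 12:00:00' AS DATETIME)
)
SELECT e.ts AS missing_ts
FROM expected e                        -- base table: the generated slots
LEFT JOIN metdata m ON m.ts = e.ts     -- joined table: the real data
WHERE m.ts IS NULL                     -- keep slots with no matching row
ORDER BY e.ts;
```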

As a quick solution written for SQL Server: check for dates that do not have a successor at date + interval. MySQL has an equivalent DATE_ADD function, so you can adapt something like this. This will show you the ranges where you have missing data.
DECLARE @TABLE TABLE(
DateValue DATETIME
)
INSERT INTO @TABLE SELECT '10 Feb 2009 10:00:00'
INSERT INTO @TABLE SELECT '10 Feb 2009 10:10:00'
INSERT INTO @TABLE SELECT '10 Feb 2009 10:40:00'
INSERT INTO @TABLE SELECT '10 Feb 2009 10:50:00'
INSERT INTO @TABLE SELECT '10 Feb 2009 11:00:00'
SELECT *
FROM @TABLE currentVal
-- rows with no successor 10 minutes later (except the last row)
WHERE ((SELECT * FROM @TABLE nextVal WHERE DATEADD(mi,10,currentVal.DateValue) = nextVal.DateValue) IS NULL AND currentVal.DateValue != (SELECT MAX(DateValue) FROM @TABLE))
-- rows with no predecessor 10 minutes earlier (except the first row)
OR ((SELECT * FROM @TABLE prevVal WHERE DATEADD(mi,-10,currentVal.DateValue) = prevVal.DateValue) IS NULL AND currentVal.DateValue != (SELECT MIN(DateValue) FROM @TABLE))

Note: uses MSSQL syntax. I think MySQL uses DATE_ADD(T1.date, INTERVAL 10 MINUTE) instead of DATEADD, but I haven't tested this.
You can get the missing timestamps with two self-joins:
SELECT T1.[date] AS DateFrom, MIN(T3.[date]) AS DateTo
FROM [test].[dbo].[WeatherData] T1
LEFT JOIN [test].[dbo].[WeatherData] T2 ON DATEADD(MINUTE, 10, T1.date) = T2.date
LEFT JOIN [test].[dbo].[WeatherData] T3 ON T3.date > T1.Date
WHERE T2.[value] IS NULL
GROUP BY T1.[date]
If you have a lot of data, you might want to restrict the range to one month at a time to avoid heavy load on your server, as this operation could be quite intensive.
The results will be something like this:
DateFrom DateTo
2009-10-02 10:10:00.000 2009-10-02 10:40:00.000
2009-10-02 11:00:00.000 NULL
The last row represents all data from the last timestamp into the future.
You can then use another join to get the rows from the other database that have a timestamp in between any of these intervals.
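Following the note that MySQL uses DATE_ADD (or the equivalent + INTERVAL syntax) instead of DATEADD, a MySQL translation of the self-join might look like this. It is an untested sketch: WeatherData and its `date` column come from the answer above, T2 is tested on its primary-key column instead of [value], and the one-month window is a hypothetical example of restricting the range.

```sql
SELECT T1.`date` AS DateFrom, MIN(T3.`date`) AS DateTo
FROM WeatherData T1
-- T2: the row exactly 10 minutes later, if it exists
LEFT JOIN WeatherData T2 ON T2.`date` = T1.`date` + INTERVAL 10 MINUTE
-- T3: every later row; MIN() picks the first one after the gap
LEFT JOIN WeatherData T3 ON T3.`date` > T1.`date`
WHERE T2.`date` IS NULL               -- no successor 10 minutes later: a gap starts here
  AND T1.`date` >= '2009-10-01'       -- hypothetical one-month window
  AND T1.`date` <  '2009-11-01'
GROUP BY T1.`date`
ORDER BY T1.`date`;
```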

This solution uses sub-queries, and there is no need for any explicit temporary tables. I've assumed your backup data is in another database on the other machine; if not, you'd only need to go up to step 2 to get the result set you need, and write your program to update the main database accordingly.
The idea is to start out by producing a 'compact' result-set summarising the gap-list. I.e. the following data:
MeasureDate
2009-12-06 13:00:00
2009-12-06 13:10:00
--missing data
2009-12-06 13:30:00
--missing data
2009-12-06 14:10:00
2009-12-06 14:20:00
2009-12-06 14:30:00
--missing data
2009-12-06 15:00:00
Would be transformed into the following where actual gaps are strictly between (i.e. exclusive of) the endpoints:
GapStart GapEnd
2009-12-06 13:10:00 2009-12-06 13:30:00
2009-12-06 13:30:00 2009-12-06 14:10:00
2009-12-06 14:30:00 2009-12-06 15:00:00
2009-12-06 15:00:00 NULL
The solution query is built up as follows:
1. Obtain all MeasureDates that don't have an entry 10 minutes later, as each of these is the start of a gap. NOTE: The last entry will be included even though it is not strictly a gap, but this won't have any adverse effects.
2. Augment the above by adding the end of the gap, using the first MeasureDate after the start of the gap.
NOTE: The gap list is compact, and unless you have an exceptionally high prevalence of fragmented gaps, passing that result set to the backup machine should not consume much bandwidth.
3. Use an INNER JOIN with inequalities to identify any missing data that may be available in the backup. (Run tests and checks to verify the integrity of your backup data.)
4. Assuming your backup data is sound and won't produce anomalous, unfounded spikes in your measurements, INSERT the data into your main database.
The following query should be tested (preferably adjusted to run on the backup server for performance reasons).
/* TiC Copyright
This query is writtend (sic) by me, and cannot be used without
expressed (sic) written permission. (lol) */
/*Step 3*/
SELECT gap.GapStart, gap.GapEnd,
rem.MeasureDate, rem.Col1, ...
FROM (
/*Step 2*/
SELECT gs.GapStart, (
SELECT MIN(wd.MeasureDate)
FROM WeatherData wd
WHERE wd.MeasureDate > gs.GapStart
) AS GapEnd
FROM (
/*Step 1*/
SELECT wd.MeasureDate AS GapStart
FROM WeatherData wd
WHERE NOT EXISTS (
SELECT *
FROM WeatherData nxt
WHERE nxt.MeasureDate = DATE_ADD(wd.MeasureDate, INTERVAL 10 MINUTE) /* MySQL; in SQL Server: DATEADD(mi, 10, wd.MeasureDate) */
)
) gs
) gap
INNER JOIN RemoteWeatherData rem ON
rem.MeasureDate > gap.GapStart
AND rem.MeasureDate < gap.GapEnd
The insert...
INSERT INTO WeatherData (MeasureDate, Col1, ...)
SELECT /*gap.GapStart, gap.GapEnd,*/
rem.MeasureDate, rem.Col1, ...
...

Do a self join and then calculate the max values that are smaller and have a difference larger than your interval.
In Oracle I'd do it like this (with ts being the timestamp column):
Select t1.ts, max(t2.ts)
FROM atable t1 join atable t2 on t1.ts > t2.ts
GROUP BY t1.ts
HAVING (t1.ts - max(t2.ts))*24*60 > 10
There are probably better ways to handle the difference calculation in MySQL, but I hope the idea comes across.
This query gives you the timestamps directly after and before each outage, and you can build from there.
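In MySQL, the difference can be computed with TIMESTAMPDIFF(unit, start, end), which returns end minus start in the given unit. A sketch of the same self-join, keeping the hypothetical atable/ts names from the Oracle version:

```sql
SELECT t1.ts      AS after_gap,                 -- first reading after the outage
       MAX(t2.ts) AS before_gap                 -- last reading before the outage
FROM atable t1
JOIN atable t2 ON t2.ts < t1.ts                 -- pair each row with all earlier rows
GROUP BY t1.ts
HAVING TIMESTAMPDIFF(MINUTE, MAX(t2.ts), t1.ts) > 10;  -- more than one interval apart
```

Joining every row to all earlier rows grows quadratically with the table size, so on a large table it is worth limiting t1 and t2 to a date range.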

Related

Calculating Value Between Date Ranges

So I am a bit stumped: I know how to do this theoretically, but I am having trouble executing it in practice.
Basically I have a table and a revisions table. The table reflects the status as of now and the revisions table reflects the past status of the table.
id3, id2, id1, title, timestamp, status,
56456 229299 4775 x name 1432866912 0
56456 232054 123859 x name 1434000054 1
56456 235578 16623 x name 1435213281 1
56456 237496 139811 x name 1464765447 1
56456 381557 0 x name 1487642800 1
56456 616934 186319 x name 1496103368 1
56456 668046 246292 x name 1505386262 1
56456 766390 246292 x name 1523273582 1
Basically what I want is to look at the historical live/offline status of all entries in the table. So I know the current status is live, and I know the dates the entry was offline/live as well.
What I want to do is calculate the live or offline dates between the timestamps.
The dates between 1 -> 0 are live dates. The dates between 1 -> 1 are live dates. The dates between 0 -> 1 are offline dates, and the dates between 0 -> 0 are offline dates.
So ideally my data would have a live/offline status for each day between these status changes.
I.e.
the output would display the dates between timestamps 1432866912 and 1434000054 with the status Offline.
I tried searching but didn't see anything relevant.
EDIT:
@RaymondNijland The first row has a Unix timestamp for the date May 28, 2015, and the second row a timestamp for June 11, 2015. The first row is offline and the second row is live.
So I basically want my data to look like this
Date Status
May 28, 2015 Offline
May 29, 2015 Offline
May 30, 2015 Offline
....
....
June 9, 2015 Offline
June 10, 2015 Offline
June 11, 2015 Live
June 12, 2015 Live
I need to do it this way because our database doesn't store the data on a daily basis, but only when a change is made to the data.
Use CASE to determine whether it is Offline or Live, and the DATE_FORMAT function to display the timestamp as a date. See this demo: http://sqlfiddle.com/#!9/88281f/13
select DATE_FORMAT(FROM_UNIXTIME(`timestamp`), '%b %e, %Y') AS `date` ,
case when status=0 then 'Offline' else 'Live' end as `status`
from yourTbl
order by `timestamp`
You can't retrieve records that aren't in your table. In such cases, you must generate the date sequence and then perform cross-checks, left joins, etc.
Have a look at
Generating Date sequence
The code below generates a list of dates using the min and max dates from your revisions table. It then cross-checks each generated date against the revisions table and picks the last status seen on or before that date.
Assuming your table is called revisions, with status and timestamp fields,
the following SQL code should work for you:
Fiddle here
select
TDates.genDate, -- generated date sequence
(select case when R.status = 0 then 'Offline' else 'Live' end  -- alias must match "revisions R": aliases are case-sensitive on Unix
from revisions R
WHERE date(from_unixtime(R.Timestamp)) <= TDates.genDate
order by R.timestamp desc
limit 1
) as genStatus
from
(
SELECT *
FROM
(select adddate(T4.minDate, t3*1000 + t2*100 + t1*10 + t0) genDate, T4.minDate, T4.maxDate from
(select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
-- using your table get the min date and max date we are going to check. So we generate just the required dates.
(select date(from_unixtime(min(R.timestamp))) as minDate, date(from_unixtime(max(R.timestamp))) as maxDate from revisions as R ) T4
) T
where T.genDate <= T.maxDate
) As TDates
order by TDates.genDate
This is just a concept; suggestions for improving performance are welcome.

MySQL query to return dates of every Sunday between a given date and now (not relying on data in tables)

I need a query that will return the date of every Monday between two dates. It is intended to be the basis of another query that counts weekly transactions (Monday to Sunday), and it should be able to represent weeks without any transactions.
This means it can't rely on the existing data in the transactions table, because weeks without any transactions are not represented there.
For example:
for the 3 weeks starting on Monday, July 21, 2014, I need a query that returns the following:
2014-07-21
2014-07-28
2014-08-04
Assuming my transactions were logged on the following dates:
2014-07-22
2014-07-23
2014-07-25
2014-08-05
I want to write a query that returns the aggregated number of transactions per week:
2014-07-21 => 3
2014-07-28 => 0
2014-08-04 => 1
And that's why I can't rely on the data itself, and need a query to generate every Monday between two given dates. Any suggestions?
OK so, if you only need the next 3 weeks, and we don't worry about what day of the week it is exactly, and you don't want to make another table, this query will solve your problem.
SET @startdate = '2014-07-21';
SET @seconddate = DATE_ADD(@startdate, INTERVAL 7 DAY);
SET @thirddate = DATE_ADD(@seconddate, INTERVAL 7 DAY);
-- half-open ranges (>= start AND < next start) so a boundary Monday is counted only once
SELECT count(id), @startdate as week FROM transactions
WHERE transdate >= @startdate AND transdate < @seconddate
UNION ALL
SELECT count(id), @seconddate as week FROM transactions
WHERE transdate >= @seconddate AND transdate < @thirddate
UNION ALL
SELECT count(id), @thirddate as week FROM transactions
WHERE transdate >= @thirddate AND transdate < DATE_ADD(@thirddate, INTERVAL 7 DAY)
If you use another table, you can get it to work with this query (which also has the advantage of being more adjustable and probably faster):
SELECT count(transactions.id), weekstarts.start
FROM weekstarts LEFT JOIN transactions ON
transactions.transdate >= weekstarts.start
AND transactions.transdate < DATE_ADD(weekstarts.start, INTERVAL 7 DAY)
WHERE weekstarts.start BETWEEN '2014-07-21' AND DATE_ADD('2014-07-21', INTERVAL 14 DAY)
GROUP BY weekstarts.start
So, I got part of this to work. It does not populate empty weeks, but it gets the week each occurrence falls in, so you can group by it.
I would not recommend doing all of this in MySQL; you should do it in another programming language.
SETUP:
create table time_date (id int, date_part date);
insert into time_date values
(1, '2014-07-22'),
(2, '2014-07-23'),
(3, '2014-07-25'),
(4, '2014-08-05');
QUERY:
SELECT
FROM_DAYS(TO_DAYS(date_part) - MOD(TO_DAYS(date_part) -2, 7)) as 'Week'
, COUNT(*) as 'Num Per Week'
FROM time_date
GROUP BY FROM_DAYS(TO_DAYS(date_part) - MOD(TO_DAYS(date_part) -2, 7)) ;
OUTPUT:
Week Num Per Week
2014-07-21 3
2014-08-04 1
From here you should do the rest in another programming language: build a list of weeks, compare it with this result, and add the week with a 0 for any week not returned by this query.
DEMO
Found it!
Inspired by a reply on this thread, the following query returns (out of thin air!) every Monday between 2 dates:
select *
from (
select date_add('2010-01-01', INTERVAL n4.num*1000+n3.num*100+n2.num*10+n1.num DAY ) as date
from (select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n1,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n2,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n3,
(select 0 as num
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9) n4
) a
where date >= '2014-07-21' and date < NOW()
and weekday(date) = 0
order by date
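To get the per-week totals from the question, including weeks with zero transactions, the generated Mondays can be LEFT JOINed to the transactions. A minimal sketch, assuming a table transactions(id, transdate) as in the earlier answer and the question's three weeks starting on Monday 2014-07-21; here the list is built directly in whole weeks instead of filtering days with weekday(date) = 0.

```sql
SELECT w.week_start,
       COUNT(t.id) AS num_transactions                  -- 0 when no transaction matched
FROM (
    -- Mondays: the start date plus 0..99 whole weeks
    SELECT DATE_ADD(DATE '2014-07-21', INTERVAL (n2.num * 10 + n1.num) WEEK) AS week_start
    FROM (SELECT 0 AS num UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
          UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) n1
    CROSS JOIN
         (SELECT 0 AS num UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
          UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) n2
) w
LEFT JOIN transactions t
       ON t.transdate >= w.week_start                   -- from Monday...
      AND t.transdate <  w.week_start + INTERVAL 7 DAY  -- ...up to, not including, the next Monday
WHERE w.week_start <= DATE '2014-08-04'                 -- last week of interest
GROUP BY w.week_start
ORDER BY w.week_start;
```

With the sample transaction dates from the question (2014-07-22, 2014-07-23, 2014-07-25, 2014-08-05), this should return 3 for 2014-07-21, 0 for 2014-07-28 and 1 for 2014-08-04.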
The query below gives you the weekly totals (weeks without transactions will not appear):
create table transactions
( id integer,
trans_date date
);
select DATE_ADD(trans_date, INTERVAL(-WEEKDAY(trans_date)) DAY)
Week_start_date,count(*) total_transactions from transactions
group by 1;

How to find daily average over a time period in mysql?

I have a table with two columns:
MARKS
CREAT_TS
I want the daily average of marks between two dates (e.g. startDate and endDate).
I've made the following query:
select SUM(MARKS)/ COUNT(date(CREAT_TS)) AS DAILY_AVG_MARKS,
date(CREAT_TS) AS DATE
from TABLENAME
group by date(CREAT_TS)
With this query I can get the daily average only if there's a row in the database for the date. But my requirement is that even if there's no row, I want to show 0 for that date.
I mean I want the query to return X rows if there are X days between (startDate, endDate)
Can anyone help me. :(
You need to create a set of integers that you can add to the dates. The following will give you an idea:
select dates.thedate, coalesce(avg(t.MARKS), 0) as DAILY_AVG_MARKS
from (select startdate + interval num day as thedate
      from (select d1.d + 10 * d2.d + 100*d3.d as num
            from (select 0 as d union select 1 union select 2 union select 3 union select 4 union
                  select 5 union select 6 union select 7 union select 8 union select 9
                 ) d1 cross join
                 (select 0 as d union select 1 union select 2 union select 3 union select 4 union
                  select 5 union select 6 union select 7 union select 8 union select 9
                 ) d2 cross join
                 (select 0 as d union select 1 union select 2 union select 3 union select 4 union
                  select 5 union select 6 union select 7 union select 8 union select 9
                 ) d3
           ) n cross join
           (select XXX as startdate, YYY as enddate) const
      where startdate + interval num day <= enddate   -- date arithmetic, not startdate + num
     ) dates left outer join                          -- every derived table needs an alias in MySQL
     tablename t
     on date(t.CREAT_TS) = dates.thedate
group by dates.thedate
All the complication is in creating a set of sequential dates for the report. If you have a numbers table or a calendar table, then the SQL looks much simpler.
How does this work? The first big subquery has two parts. The first just generates the numbers from 0 to 999 by cross joining the digits 0-9 and doing some arithmetic. The second joins this to the two dates, startdate and enddate; you need to put the correct values in for XXX and YYY. With this table, you have all the dates between the two values. If you need more than 999 days, just add another cross join.
This is then left joined to your data table. The result is that all dates appear in the GROUP BY.
In terms of reporting, there are advantages and disadvantages to doing this in the presentation layer. Basically, the advantage to doing it in SQL is that the report layer is simpler. The advantage to doing it in the reporting layer is that the SQL is simpler. It is hard for an outsider to make that judgement.
My suggestion would be to create a numbers table that you can just use in reports like this. Then the query will look simpler and you won't have to change the reporting layer.
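A minimal sketch of that suggestion, assuming you are allowed to create a permanent table. The table name numbers and the bounds in @start_date and @end_date are hypothetical; MARKS, CREAT_TS and tablename come from the question. The table is filled once with 0..999 using the same digits trick, after which the date list is a simple range over it.

```sql
-- Hypothetical permanent helper table: one row per integer 0..999.
CREATE TABLE numbers (n INT NOT NULL PRIMARY KEY);

INSERT INTO numbers (n)
SELECT d1.d + 10 * d2.d + 100 * d3.d
FROM (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d1
CROSS JOIN
     (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d2
CROSS JOIN
     (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d3;

-- Hypothetical report bounds.
SET @start_date = '2014-01-01';
SET @end_date   = '2014-01-31';

-- One row per day in the range; days without rows get 0.
SELECT d.thedate, COALESCE(AVG(t.MARKS), 0) AS DAILY_AVG_MARKS
FROM (
    SELECT CAST(@start_date AS DATE) + INTERVAL n DAY AS thedate
    FROM numbers
    WHERE n <= DATEDIFF(@end_date, @start_date)
) d
LEFT JOIN tablename t ON DATE(t.CREAT_TS) = d.thedate
GROUP BY d.thedate
ORDER BY d.thedate;
```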

mysql skips certain months

Been trying to sort this one out for a while. I'd really appreciate any help.
I've got this table where I'm getting 2 columns with date and int values respectively. The problem is that mysql skips the date values wherever the int value is null.
Here is the SQL statement:
SELECT DATE_FORMAT(sales_date_sold, '%b \'%y')
AS sale_date, sales_amount_sold
AS sale_amt
FROM yearly_sales
WHERE sales_date_sold BETWEEN DATE_SUB(SYSDATE(), INTERVAL 2 YEAR) AND SYSDATE()
GROUP BY YEAR(sales_date_sold), MONTH(sales_date_sold)
ORDER BY YEAR(sales_date_sold), MONTH(sales_date_sold) ASC;
There aren't any values for Feb 2011, so that month gets skipped, along with a few others. COALESCE and IFNULL don't help either.
You need a row source that provides values for all of the months in the dimension, and then left join your yearly_sales table to that.
Since you are doing a GROUP BY, you most likely want an aggregate on your measure (sales_amount_sold), or you don't want a GROUP BY at all. (The query in your question returns a value of sales_amount_sold from only one row in a given month. That may be what you want, but it's a very odd result set to return.)
One approach is to have a calendar_month table that contains DATE values for all of the months you want returned. (There are other ways to generate this; see existing answers to questions elsewhere on Stack Overflow.)
SELECT m.month AS sale_date
, IFNULL(SUM(s.sales_amount_sold),0) AS sale_amt
FROM calendar_month m
LEFT
JOIN yearly_sales s
ON s.sales_date_sold >= m.month
AND s.sales_date_sold < DATE_ADD(m.month,INTERVAL 1 MONTH)
WHERE m.month BETWEEN DATE_SUB(SYSDATE(), INTERVAL 2 YEAR) AND SYSDATE()
GROUP BY m.month
ORDER BY m.month
This query returns a slightly different result: you only get rows for whole months, rather than the partial month your original query includes, because the WHERE clause now compares the first day of each month with a point exactly two years before the current date and time, rather than with the first of the month two years ago.
A calendar_month table is not strictly required; it could be replaced with a query that returns the row source. In that case, the predicate on the month value could be moved from the outer query into the subquery.
Addendum: if you use a calendar_month table as a rowsource, you'll need to populate it with every possible "month" value you want to return.
CREATE TABLE calendar_month
(`month` DATE NOT NULL PRIMARY KEY COMMENT 'one row for first day of each month');
INSERT INTO calendar_month(`month`) VALUES ('2011-01-01'),('2011-02-01'),('2011-03-01')
As an alternative, you can specify a dynamically generated row source as an inline view, rather than a table reference. (You could use a similar query to quickly populate a calendar_month table.)
You can wrap this query in parentheses and use it in place of calendar_month in the FROM clause of the previous query, keeping the alias m.
SELECT DATE_ADD('1990-01-01',INTERVAL 1000*thousands.digit + 100*hundreds.digit + 10*tens.digit + ones.digit MONTH) AS `month`
FROM ( SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 ) ones
JOIN ( SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 ) tens
JOIN ( SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 ) hundreds
JOIN ( SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 ) thousands
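Putting the two pieces together, a sketch of the report with the generated months as an inline view might look like this. It is a variation rather than a verbatim paste: it starts from the first day of the month two years ago (so that month is included in full) and needs only two digit tables for the roughly 25 months involved. Table and column names are the asker's.

```sql
SELECT m.month AS sale_date
     , IFNULL(SUM(s.sales_amount_sold), 0) AS sale_amt     -- 0 for months with no sales
FROM (
    -- First day of each month, starting with the month two years ago (offsets 0..99).
    SELECT DATE_ADD(CAST(DATE_FORMAT(SYSDATE() - INTERVAL 2 YEAR, '%Y-%m-01') AS DATE),
                    INTERVAL (10 * tens.digit + ones.digit) MONTH) AS `month`
    FROM ( SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
           UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 ) ones
    CROSS JOIN
         ( SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
           UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 ) tens
) m
LEFT JOIN yearly_sales s
       ON s.sales_date_sold >= m.month
      AND s.sales_date_sold <  m.month + INTERVAL 1 MONTH
WHERE m.month <= SYSDATE()                                 -- drop generated months in the future
GROUP BY m.month
ORDER BY m.month;
```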
The problem is not that the value is NULL; the problem is that you are selecting data from your database, and if you don't have data for a specific month, MySQL has no way of selecting data that is not there.
The only way to solve this completely in MySQL has already been answered in a very similar question.
I have had this problem before with timestamps. The solution I used was to create a reference table with all of your months. This could be a table with just the numbers 1-12 (12 rows) or you could go one step further and put the month names. Then you can left join your yearly_sales table to the 1_through_12 table to get every month.
Why don't you just use 0 instead of NULL?

MySQL: Is it possible to 'fill' a SELECT with values without a table?

I need to display the total of 'orders' for each year and month. For some months there is no data, but I DO want to display those months (with a total value of zero). I could make a helper table 'months' with 12 records for each year, but is there a way to get a range of months without introducing a new table?
Something like:
SELECT [all year-month combinations between january 2000 and march 2011]
FROM DUAL AS years_months
Does anybody have an idea how to do this? Can you use SELECT with some kind of formula, to 'create' data on the fly?!
UPDATE:
Found this myself:
generate days from date range
The accepted answer in this question is kind of what I'm looking for. Maybe not the easiest method, but it does what I want: fill a select with data, based on a formula....
To 'create' a table on the fly with all months of the last 10 years:
SELECT CONCAT(MONTHNAME(datetime), ' ' , YEAR(datetime)) AS YearMonth,
MONTH(datetime) AS Month,
YEAR(datetime) AS Year
FROM (
select (curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) MONTH) as datetime
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
where (a.a + (10 * b.a) + (100 * c.a)) < 120  -- offsets 0..119; LIMIT without ORDER BY would pick arbitrary rows
) AS t
ORDER BY datetime ASC
I must admit, this is VERY exotic, but it DOES work...
I can use this select to join it with my 'orders'-table and get the totals for each month, even when there is no data in a certain month.
But using a 'numbers' or 'calendar' table is probably the best option, so I'm going to use that.
If at all possible, try to stay away from generating data on the fly. It makes very simple queries ridiculously complex, but above all, it confuses the optimizer to no end.
If you need a series of integers, use a static table of integers. If you need a series of dates, months or whatever, use a calendar table. Unless you are dealing with some truly extraordinary requirements, a static table is the way to go.
I gave an example of how to create a table of numbers and a minimal calendar table (only dates) in this answer.
If you have those tables in place, it becomes easy to solve your query.
1. Aggregate the order data to MONTH.
2. Right join it to the table of months (or to the distinct months from the table of dates).
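A sketch of those two steps, assuming hypothetical tables months(month_start DATE) with one row per first day of a month, and orders(id, order_date); the date range matches the question's January 2000 to March 2011. The months table is written first with a LEFT JOIN, which is equivalent to the RIGHT JOIN described.

```sql
SELECT m.month_start,
       COALESCE(o.total_orders, 0) AS total_orders          -- 0 for months without orders
FROM months m
LEFT JOIN (
    -- Step 1: aggregate the orders to the first day of their month
    SELECT CAST(DATE_FORMAT(order_date, '%Y-%m-01') AS DATE) AS month_start,
           COUNT(*) AS total_orders
    FROM orders
    GROUP BY CAST(DATE_FORMAT(order_date, '%Y-%m-01') AS DATE)
) o ON o.month_start = m.month_start                          -- Step 2: join to every month
WHERE m.month_start BETWEEN '2000-01-01' AND '2011-03-01'
ORDER BY m.month_start;
```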
You could try something like this
select * from
(select 2000 as year union select 2001 union select 2002 union select 2003
 union select 2004 union select 2005 union select 2006 union select 2007
 union select 2008 union select 2009 union select 2010 union select 2011
) as years,
(select 1 as month union select 2 union select 3 union select 4
 union select 5 union select 6 union select 7 union select 8
 union select 9 union select 10 union select 11 union select 12
)
AS months
WHERE year between 2000 AND 2010 OR (year = 2011 AND month < 4)
ORDER by year, month
You could just fill in the missing months after you've done your query in your application logic.
You should most definitely do this in your application rather than in the DB layer. Simply create an array of dates for the time range, and merge the actual data with the empty dates you pre-created. See this answer to a similar question.
I use the following query to generate the months in a given interval. In my case it generates a list of months starting from May 2013 until now.
SELECT date_format(@dt := DATE_ADD(@dt, INTERVAL 1 MONTH), '%M %Y') date_string,
@dt as date_full
FROM (SELECT @dt := DATE_SUB(CAST(DATE_FORMAT('2013-05-01', '%Y-%m-01') AS DATE),
INTERVAL 1 MONTH)) vars,
your_tables
WHERE @dt < NOW()
The catch is that it must be joined with a table containing enough rows to supply the number of months you expect. E.g. if you need to generate all the months in a particular year, you need a table with at least 12 rows.
For me it is fairly straightforward: I joined it with my configuration table, which has around 370 rows, so it can generate the months in a year, or the days in a year if I need them. Switching from months to days is easy, as I only need to change the interval from MONTH to DAY.
If you're using PostgreSQL, you can combine both date_trunc and generate_series to do some very fun grouping and series generation.
For example, you could use this to generate a table of all dates in the last year:
SELECT current_date - s.a as date
FROM generate_series(0,365,1) as s(a);
Then, you could use date_trunc to grab the months and group by that date_trunc'ed field:
SELECT date(date_trunc('month', series.date)) as month, COUNT(*) as days
FROM (SELECT current_date - s.a as date
FROM generate_series(0,365,1) as s(a)) series
GROUP BY month;
Create a table (e.g. tblMonths) that includes all 12 months and use a LEFT JOIN (or RIGHT JOIN) on it and your partial source data.
Check out the reference and this tutorial for how this works.
I would do something like this:
SELECT YEAR(o.DateOrdered) AS OrderYear, MONTH(o.DateOrdered) AS OrderMonth, COUNT(o.OrderID) AS NumOrders
FROM Orders o
WHERE YEAR(o.DateOrdered) >= 2000
GROUP BY YEAR(o.DateOrdered), MONTH(o.DateOrdered)
This will give you the number of orders for each year and month that has orders.
Then in your application, simply assign a zero to the months for which no data was returned.
I hope this helps.
Querying static data in MySQL.
Since MySQL 8.0.19 you can select static data from a hard-coded list, without any table, using a VALUES table value constructor:
SELECT *
FROM (
values row('Hamza','23'), row('Ali', '24')
) t1 (name, age);
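The same table value constructor can stand in for a small months table. A sketch assuming MySQL 8.0.19 or later and a hypothetical orders(id, order_date) table, with the months listed explicitly:

```sql
SELECT v.month_start,
       COUNT(o.id) AS total_orders                     -- 0 for months without orders
FROM (
    VALUES ROW(DATE '2011-01-01'), ROW(DATE '2011-02-01'), ROW(DATE '2011-03-01')
) AS v (month_start)
LEFT JOIN orders o
       ON o.order_date >= v.month_start
      AND o.order_date <  v.month_start + INTERVAL 1 MONTH
GROUP BY v.month_start
ORDER BY v.month_start;
```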