How do I fill gaps in weekly data?

I have a table with 4 fields (id, Year, Week, Totals).
I need a query, probably using a join, that fills in zero values for the missing year/week combinations.
In my example I need zero-valued rows for weeks 3 and 4 of 2013:
Rec Id, Year, Week, Totals
1, '2012', '52', '23'
2, '2013', '1' , '9'
3, '2013', '2' , '4'
Missing record from DB -> null, '2013', '3' , '0'
Missing record from DB -> null, '2013', '4' , '0'
4, '2013', '5' , '5'
5, '2013', '6' , '6'
6, '2013', '7' , '5'

That was a fun one! OK, here we go. First off, I'll give you the simple version, which relies on a couple of assumptions:
You have at least one entry in your table already for each year.
Every week number appears in your table at least once, in any year. I.e., this query returns all numbers from 1 to 52:
SELECT DISTINCT week FROM your_table
Given those constraints, this query should do what you want:
INSERT INTO your_table (id, year, week, totals)
SELECT null, y, w, 0 FROM (
SELECT DISTINCT week w FROM your_table
) weeks
CROSS JOIN
(
SELECT DISTINCT year y FROM your_table
) years
WHERE
(y > (select min(year) from your_table) OR w > (select min(week) from your_table where `year`=y))
AND
(y < (select max(year) from your_table) OR w < (select max(week) from your_table where `year`=y))
AND
NOT EXISTS (select year, week from your_table where `year`=y AND `week`=w)
If condition 2 might not be satisfied (that is, if some weeks are missing in every year), you can replace this line
SELECT DISTINCT week w FROM your_table
with
SELECT
(TWO_1.SeqValue + TWO_2.SeqValue + TWO_4.SeqValue + TWO_8.SeqValue + TWO_16.SeqValue + TWO_32.SeqValue) w
FROM
(SELECT 0 SeqValue UNION ALL SELECT 1 SeqValue) TWO_1
CROSS JOIN (SELECT 0 SeqValue UNION ALL SELECT 2 SeqValue) TWO_2
CROSS JOIN (SELECT 0 SeqValue UNION ALL SELECT 4 SeqValue) TWO_4
CROSS JOIN (SELECT 0 SeqValue UNION ALL SELECT 8 SeqValue) TWO_8
CROSS JOIN (SELECT 0 SeqValue UNION ALL SELECT 16 SeqValue) TWO_16
CROSS JOIN (SELECT 0 SeqValue UNION ALL SELECT 32 SeqValue) TWO_32
HAVING w >= 1 AND w <= 52
Giving this more general case:
INSERT INTO your_table (id, year, week, totals)
SELECT null, y, w, 0 FROM (
SELECT
(TWO_1.SeqValue + TWO_2.SeqValue + TWO_4.SeqValue + TWO_8.SeqValue + TWO_16.SeqValue + TWO_32.SeqValue) w
FROM
(SELECT 0 SeqValue UNION ALL SELECT 1 SeqValue) TWO_1
CROSS JOIN (SELECT 0 SeqValue UNION ALL SELECT 2 SeqValue) TWO_2
CROSS JOIN (SELECT 0 SeqValue UNION ALL SELECT 4 SeqValue) TWO_4
CROSS JOIN (SELECT 0 SeqValue UNION ALL SELECT 8 SeqValue) TWO_8
CROSS JOIN (SELECT 0 SeqValue UNION ALL SELECT 16 SeqValue) TWO_16
CROSS JOIN (SELECT 0 SeqValue UNION ALL SELECT 32 SeqValue) TWO_32
HAVING w >= 1 AND w <= 52
) weeks
CROSS JOIN
(
SELECT DISTINCT year y FROM your_table
) years
WHERE
(y > (select min(year) from your_table) OR w > (select min(week) from your_table where `year`=y))
AND
(y < (select max(year) from your_table) OR w < (select max(week) from your_table where `year`=y))
AND
NOT EXISTS (select year, week from your_table where `year`=y AND `week`=w)
(You can use a similar technique to generate the list of years if condition 1 isn't satisfied, but I'm guessing you don't have entire year-long holes.)
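For anyone wondering why the TWO_1 ... TWO_32 cross join yields every week number exactly once: each two-row table contributes either 0 or one power of two, so the 64 combinations sum to each integer from 0 to 63 once, and the HAVING clause keeps 1 to 52. Here is a minimal sketch of that arithmetic in Python (the names BIT_TABLES and week_numbers are made up for illustration, they are not part of the query):

```python
from itertools import product

# Each pair mirrors one derived table, e.g. (SELECT 0 UNION ALL SELECT 1) for TWO_1.
BIT_TABLES = [(0, 1), (0, 2), (0, 4), (0, 8), (0, 16), (0, 32)]


def week_numbers(low=1, high=52):
    """Cross join the six two-row tables, sum each combination, keep low..high."""
    sums = (sum(combo) for combo in product(*BIT_TABLES))  # 2**6 = 64 sums
    return sorted(w for w in sums if low <= w <= high)     # plays the role of HAVING


weeks = week_numbers()
all_sums = sorted(sum(combo) for combo in product(*BIT_TABLES))
```

Because every sum is distinct, no DISTINCT is needed on the generated list.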
Finally, this could be simplified a bit if you have a unique index on year and week. If you do not yet have such an index, you could create it like so:
ALTER TABLE `your_table` ADD CONSTRAINT date UNIQUE (
`year`,
`week`
)
and if you want, you could remove it when you're done, like so:
ALTER TABLE `your_table` DROP INDEX date;
In that case, you can change INSERT INTO to INSERT IGNORE INTO and remove the final part of the where clause:
AND
NOT EXISTS (select year, week from your_table where `year`=y AND `week`=w)
because the INSERT IGNORE will skip any rows for which that unique year/week combination already exists.
Kudos to this answer for the range-generating code: https://stackoverflow.com/a/8349837/160565
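If you want to try the general query without touching a real MySQL table, the following is a rough sketch that runs it against the sample rows from the question using Python's built-in sqlite3 module. SQLite is only a stand-in here, so a few things are assumptions that differ from the MySQL version: the week filter is written as a WHERE condition instead of HAVING, the id column is left to auto-assign, and the short aliases b1 ... b32 are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER PRIMARY KEY, year INTEGER, week INTEGER, totals INTEGER)"
)
conn.executemany(
    "INSERT INTO your_table (year, week, totals) VALUES (?, ?, ?)",
    [(2012, 52, 23), (2013, 1, 9), (2013, 2, 4), (2013, 5, 5), (2013, 6, 6), (2013, 7, 5)],
)

# Same shape as the general-case query: generated weeks x known years,
# restricted to the span between the first and last existing entry,
# minus the year/week pairs that already exist.
FILL_GAPS = """
INSERT INTO your_table (year, week, totals)
SELECT y, w, 0
FROM (SELECT b1.v + b2.v + b4.v + b8.v + b16.v + b32.v AS w
      FROM (SELECT 0 AS v UNION ALL SELECT 1) b1
      CROSS JOIN (SELECT 0 AS v UNION ALL SELECT 2) b2
      CROSS JOIN (SELECT 0 AS v UNION ALL SELECT 4) b4
      CROSS JOIN (SELECT 0 AS v UNION ALL SELECT 8) b8
      CROSS JOIN (SELECT 0 AS v UNION ALL SELECT 16) b16
      CROSS JOIN (SELECT 0 AS v UNION ALL SELECT 32) b32) weeks
CROSS JOIN (SELECT DISTINCT year AS y FROM your_table) years
WHERE w BETWEEN 1 AND 52
  AND (y > (SELECT MIN(year) FROM your_table)
       OR w > (SELECT MIN(week) FROM your_table WHERE year = y))
  AND (y < (SELECT MAX(year) FROM your_table)
       OR w < (SELECT MAX(week) FROM your_table WHERE year = y))
  AND NOT EXISTS (SELECT 1 FROM your_table t WHERE t.year = y AND t.week = w)
"""
conn.execute(FILL_GAPS)

rows = conn.execute(
    "SELECT year, week, totals FROM your_table ORDER BY year, week"
).fetchall()
```

With the question's data, the only rows added are the zero rows for weeks 3 and 4 of 2013; nothing is added before 2012 week 52 or after 2013 week 7.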

Related

How to fill missing values in aggregate-by-time function

I have a query (from this question) which groups values into 5-minute periods and calculates min/avg/max:
SELECT (FLOOR(clock / 300) * 300) as period_start,
MIN(value), AVG(value), MAX(value)
FROM data
WHERE clock BETWEEN 1200000000 AND 1200001200
GROUP BY FLOOR(clock / 300);
However, due to missing values, some five-minute periods are skipped, making the timeline inconsistent. How to make it so that in the absence of data for a certain period, the value of max / avg / min becomes 0, instead of being skipped?
For example:
If I have timestamp - value
1200000001 - 100
1200000002 - 300
1200000301 - 100
1200000601 - 300
I want to get this: (select min/avg/max, time between 1200000000 and 1200001200)
1200000000 - 100/200/300
1200000300 - 100/100/100
1200000600 - 300/300/300
1200000900 - 0/0/0
Instead of this: (time between 1200000000 and 1200001200)
1200000000 - 100/200/300
1200000300 - 100/100/100
1200000600 - 300/300/300
1200000900 - THIS LINE WILL NOT BE, I will only get 3 lines above. No data between 1200000900 and 1200001200 for calculation.
My Answer:
First generate a table covering the required time range, then left join the ordinary GROUP BY query to it. Something like this:
select * from
(select UNIX_TIMESTAMP(gen_date) as unix_date from
(select adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) gen_date from
(select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
(select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) v
where gen_date between '2017-01-01' and '2017-12-31') date_range_table
left join (
SELECT (FLOOR(clock / 300) * 300) as period_start,
MIN(value), AVG(value), MAX(value)
FROM data
WHERE clock BETWEEN 1483218000 AND 1514667600
GROUP BY FLOOR(clock / 300)) data_table
on date_range_table.unix_date = data_table.period_start;

Use recursive CTE (available in MariaDB starting from 10.2.2) and generate base calendar table:
WITH RECURSIVE
cte AS ( SELECT @timestart timestart, @timestart + 300 timeend
UNION ALL
SELECT timestart + 300, timeend + 300 FROM cte WHERE timeend < @timeend)
SELECT cte.timestart,
COALESCE(MIN(value), 0) min_value,
COALESCE(AVG(value), 0) avg_value,
COALESCE(MAX(value), 0) max_value
FROM cte
LEFT JOIN example ON example.clock >= cte.timestart
AND example.clock < cte.timeend
GROUP BY cte.timestart;
https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=f5c41b7596d56f1d7babe075f19302ec
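To see the recursive calendar idea on the exact sample from the question, here is a small sketch using Python's sqlite3 module, which also supports WITH RECURSIVE. It is an approximation of the MariaDB query, not a copy: the user variables are replaced by bound parameters (:start, :end, :step), the table is named data as in the question, and bucket_stats is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (clock INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1200000001, 100), (1200000002, 300), (1200000301, 100), (1200000601, 300)],
)


def bucket_stats(conn, start, end, step=300):
    """Return (period_start, min, avg, max) for every period, 0 when empty."""
    sql = """
    WITH RECURSIVE buckets(timestart) AS (
        SELECT :start
        UNION ALL
        SELECT timestart + :step FROM buckets WHERE timestart + :step < :end
    )
    SELECT b.timestart,
           COALESCE(MIN(d.value), 0),
           COALESCE(AVG(d.value), 0),
           COALESCE(MAX(d.value), 0)
    FROM buckets b
    LEFT JOIN data d
           ON d.clock >= b.timestart AND d.clock < b.timestart + :step
    GROUP BY b.timestart
    ORDER BY b.timestart
    """
    return conn.execute(sql, {"start": start, "end": end, "step": step}).fetchall()


stats = bucket_stats(conn, 1200000000, 1200001200)
```

The range conditions sit in the ON clause of the LEFT JOIN, which is what lets the empty 1200000900 period come back as 0/0/0 instead of disappearing.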

I am not very sure but here's a link which can solve your problem
https://www.sqlservercurry.com/2009/06/find-missing-identity-numbers-in-sql.html

You can try this one;
with seq as (
select
(step-1)* 300 + (select (FLOOR(min(clock) / 300) * 300) from data) as step
from
(select row_number() over() as step from data) tmp
where
tmp.step-1 < (select(max(clock)-min(clock))/ 300 from data))
SELECT seq.step as period_start, MIN(value), AVG(value), MAX(value)
FROM seq left join data on (seq.step=(FLOOR(clock / 300) * 300))
-- filter in the ON clause so that empty periods survive the left join
AND clock BETWEEN 1622667600 AND 1625259600
GROUP BY period_start

An alternative is to first generate a table covering the required time range, and then left join the ordinary GROUP BY query to it.

Show even null values in case query

How would I show even null values in this query:
select
case floor(reading_winddirection / 45)
when 0 then 'N'
when 1 then 'NE'
when 2 then 'E'
when 3 then 'SE'
when 4 then 'S'
when 5 then 'SW'
when 6 then 'W'
when 7 then 'NW'
end windgroup,
count(*) cnt,
round(100 * count(*) / sum(count(*)) over()) percentage
from simulation_readings
group by windgroup
Currently the query returns, e.g.:
N 66 66
E 2 2
SE 1 1
SW 1 1
But I want it to return all cases, even those with no values, and set them to 0.

I think that you want a left join with a fixed list of values:
select
d.windgroup,
count(s.reading_winddirection) cnt,
coalesce(round(
100 * count(s.reading_winddirection)
/ nullif(sum(count(s.reading_winddirection)) over(), 0)
), 0) percentage
from (
select 0 n, 'N' windgroup
union all select 1, 'NE'
union all select 2, 'E'
union all select 3, 'SE'
union all select 4, 'S'
union all select 5, 'SW'
union all select 6, 'W'
union all select 7, 'NW'
) d
left join simulation_readings s
on floor(s.reading_winddirection / 45) = d.n
group by d.windgroup
In MySQL < 8.0, this would look like:
select
d.windgroup,
coalesce(c.cnt, 0) cnt,
coalesce(round(100 * c.cnt / nullif(t.total, 0)), 0) percentage
from (
select 0 n, 'N' windgroup
union all select 1, 'NE'
union all select 2, 'E'
union all select 3, 'SE'
union all select 4, 'S'
union all select 5, 'SW'
union all select 6, 'W'
union all select 7, 'NW'
) d
cross join (select count(*) total from simulation_readings) t
left join (
select floor(reading_winddirection / 45) n, count(*) cnt
from simulation_readings
group by n
) c on c.n = d.n
group by d.windgroup
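As a quick way to check the fixed-list left join, here is a sketch run through Python's sqlite3 module with four made-up readings (10, 20, 100 and 200 degrees). It follows the shape of the pre-8.0 variant, with the total taken from a cross-joined subquery, but it is an adaptation under stated assumptions: SQLite integer division stands in for floor(), and 100.0 is used to force a non-integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE simulation_readings (reading_winddirection INTEGER)")
# Invented sample: two readings fall in N, one in E, one in S.
conn.executemany(
    "INSERT INTO simulation_readings VALUES (?)", [(10,), (20,), (100,), (200,)]
)

WIND_GROUPS = """
SELECT d.windgroup,
       COUNT(s.reading_winddirection) AS cnt,
       COALESCE(ROUND(100.0 * COUNT(s.reading_winddirection)
                      / NULLIF(t.total, 0)), 0) AS percentage
FROM (SELECT 0 AS n, 'N' AS windgroup
      UNION ALL SELECT 1, 'NE'
      UNION ALL SELECT 2, 'E'
      UNION ALL SELECT 3, 'SE'
      UNION ALL SELECT 4, 'S'
      UNION ALL SELECT 5, 'SW'
      UNION ALL SELECT 6, 'W'
      UNION ALL SELECT 7, 'NW') d
CROSS JOIN (SELECT COUNT(*) AS total FROM simulation_readings) t
LEFT JOIN simulation_readings s
       ON s.reading_winddirection / 45 = d.n   -- integer division acts as floor here
GROUP BY d.n, d.windgroup, t.total
ORDER BY d.n
"""
groups = conn.execute(WIND_GROUPS).fetchall()
```

Counting s.reading_winddirection instead of * is what makes the unmatched directions come out as 0 and not 1.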

Mysql Left outer join in my Query for optimize

I have two database tables called "tablestr" and "restbooking":
tablestr:
str_id is the primary key
restbooking:
bookingsection_id is a foreign key
In the booking table I store multiple str_id values, comma-separated, in bookingsection_id. My query is:
SELECT `str_id` FROM (`rest_tablestr`) WHERE str_id NOT IN (
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.bookingsection_id, ",", n.n), ",", -1) value FROM rest_restaurantbooking t
CROSS JOIN (
SELECT a.N + b.N * 10 + 1 n FROM (
SELECT 0 AS N
UNION ALL
SELECT 1
UNION ALL
SELECT 2
UNION ALL
SELECT 3
UNION ALL
SELECT 4
UNION ALL
SELECT 5
UNION ALL
SELECT 6
UNION ALL
SELECT 7
UNION ALL
SELECT 8
UNION ALL
SELECT 9) a , (
SELECT 0 AS N
UNION ALL
SELECT 1
UNION ALL
SELECT 2
UNION ALL
SELECT 3
UNION ALL
SELECT 4
UNION ALL
SELECT 5
UNION ALL
SELECT 6
UNION ALL
SELECT 7
UNION ALL
SELECT 8
UNION ALL SELECT 9)
b ORDER BY n ) n
WHERE n.n <= 1 + (LENGTH(t.bookingsection_id) -
LENGTH(REPLACE(t.bookingsection_id, ",", ""))) AND
t.res_id = 21 AND
t.booking_status not in ("cancelled","departed","noshow") AND
((t.bookingstart_time <= "2015-06-12 19:45:00" AND t.bookingend_time >= "2015-06-12 22:15:00") OR
(t.bookingend_time >= "2015-06-12 19:45:00" AND t.bookingend_time <= "2015-06-12 22:15:00") OR
(t.bookingstart_time >= "2015-06-12 19:45:00" AND t.bookingstart_time <= "2015-06-12 22:15:00") OR
(t.bookingstart_time >= "2015-06-12 19:45:00" AND t.bookingend_time <= "2015-06-12 22:15:00")) ) AND
`res_id` = '21' AND
`area_id` = '28' AND
`wait_table` = 'no' AND
`availability` = 'yes';
Result Set:
Can anybody help me rewrite this query with a left outer join, or otherwise optimize it?

MySQL infinite loop in a subquery?

The following query probably results in an infinite loop:
SELECT
*,
(SELECT
t2.`value`
FROM
`table` t2
WHERE
t2.`variable` = 'xxx'
AND t2.`read` = (SELECT
MAX(t1.`read`)
FROM
`table` t1
WHERE
t1.`variable` = 'xxx'
AND UNIX_TIMESTAMP(t1.`read`) < (1401801648 - n.integers)
)
)
FROM
(SELECT
@N:=@N + 1 AS integers
FROM
mysql.help_relation, (SELECT @N:=0) dum
LIMIT 48) n
I need a result with 48 rows for 48 different time ranges (In this example 1401801648 minus {1..48}). Each row should contain a value depending on the current time range. The query on the bottom is for these 48 ranges.
The query in the middle is needed to find the date for the newest entry which is older than the calculated timestamp (1401801648 - n.integers). The upper query tells me the value of the row with the date from the query in the middle.
When the "n.integers" is replaced by a number everything works fine.
Without the subquery (t2) the query is not in a loop(?):
SELECT
*,
(SELECT
MAX(t1.`read`)
FROM
`table` t1
WHERE
t1.`variable` = 'xxx'
AND UNIX_TIMESTAMP(t1.`read`) < (1401801648 - n.integers)
)
FROM
(SELECT
@N:=@N + 1 AS integers
FROM
mysql.help_relation, (SELECT @N:=0) dum
LIMIT 48) AS n

An alternative method avoiding using variables:-
SELECT sub2.a_cnt, t2.`value`
FROM `table` t2
INNER JOIN
(
SELECT sub1.a_timestamp, sub1.a_cnt, t1.`variable`, MAX(t1.`read`) AS max_timestamp
FROM
(
SELECT (units.i + 10 * tens.i) AS a_cnt, (1401801648 - (units.i + 10 * tens.i)) AS a_timestamp
FROM
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
WHERE units.i + 10 * tens.i BETWEEN 1 AND 48
) sub1
INNER JOIN `table` t1
ON UNIX_TIMESTAMP(t1.`read`) < sub1.a_timestamp
WHERE t1.`variable` = 'xxx'
GROUP BY sub1.a_timestamp, sub1.a_cnt, t1.`variable`
) sub2
ON t2.`read` = sub2.max_timestamp
AND t2.`variable` = sub2.`variable`
This uses a set of unioned constant queries to generate the numbers 0 to 9, cross joins that against another copy of itself and does a minor calculation to get all the numbers from 0 to 99, with a WHERE clause to narrow it down to the range 1 to 48, and uses this to calculate the timestamps required.
This is then joined against your table to get the max read date for each timestamp / generated number.
The results of this are then joined back against your table to get the other details from that row (in this case your value column).
Not tested it but hopefully it gives you an idea.
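To make the three steps above concrete (generate 1 to 48, find the newest read before each cutoff, join back for the value), here is a small sketch using Python's sqlite3 module. Everything in it is an assumption for illustration: the table is called readings, the timestamp column is read_ts and already holds integers (SQLite has no UNIX_TIMESTAMP), the base time is 1000 instead of 1401801648, and the digits come from a VALUES list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (variable TEXT, read_ts INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("xxx", 990, "a"), ("xxx", 996, "b"), ("xxx", 999, "c"), ("yyy", 998, "z")],
)

LATEST_BEFORE = """
WITH digits(i) AS (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9))
SELECT sub2.n, t2.value
FROM readings t2
INNER JOIN (
    -- newest read_ts strictly before each cutoff (:base - n)
    SELECT nums.n AS n, t1.variable AS variable, MAX(t1.read_ts) AS max_ts
    FROM (SELECT units.i + 10 * tens.i AS n
          FROM digits units CROSS JOIN digits tens
          WHERE units.i + 10 * tens.i BETWEEN 1 AND 48) nums
    INNER JOIN readings t1 ON t1.read_ts < :base - nums.n
    WHERE t1.variable = 'xxx'
    GROUP BY nums.n, t1.variable
) sub2 ON t2.read_ts = sub2.max_ts AND t2.variable = sub2.variable
ORDER BY sub2.n
"""
latest = dict(conn.execute(LATEST_BEFORE, {"base": 1000}).fetchall())
```

Note that offsets with no earlier reading at all (here n of 10 and above) drop out because of the inner joins; a LEFT JOIN from the number list would be needed to keep all 48 rows.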

Insert N rows with date interval

I need to insert rows into a database, where every row is the same except a date column which should have its date incremented by 1 week for each new row. So, basically this:
for(n = 0; n<X; n++)
insert into events (date, title) values (start_date + 7*n, 'static title');
Any MySQL trick that can be used to do this?

You can use:
SELECT
'static title' AS title,
DATE_ADD(@start_date, INTERVAL @i:=@i+1 WEEK) AS result_date
FROM
(SELECT
(two_1.id + two_2.id + two_4.id +
two_8.id + two_16.id) AS id
FROM
(SELECT 0 AS id UNION ALL SELECT 1 AS id) AS two_1
CROSS JOIN (SELECT 0 id UNION ALL SELECT 2 id) AS two_2
CROSS JOIN (SELECT 0 id UNION ALL SELECT 4 id) AS two_4
CROSS JOIN (SELECT 0 id UNION ALL SELECT 8 id) AS two_8
CROSS JOIN (SELECT 0 id UNION ALL SELECT 16 id) AS two_16
) AS sequence
CROSS JOIN
-- use @i:=0 instead to skip the current week
(SELECT @i:=-1, @start_date:=CURDATE()) AS init
WHERE
sequence.id<10;
That will produce N rows (here N=10). To insert the rows, just use the INSERT ... SELECT syntax. In this sample start_date is set to CURDATE(), but you can easily adjust that in the query, of course.
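For completeness, here is a sketch of the INSERT ... SELECT form, run through Python's sqlite3 module so it can be tried anywhere. It is a variation on the query above and not the same statement: SQLite has no user variables, so the week offset is computed directly from the generated id, the start date and N are bound parameters (:start_date and :n, both my own names), and SQLite's date() modifier replaces DATE_ADD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (date TEXT, title TEXT)")

# Generate ids 0..15 from four two-row tables, keep the first :n,
# and shift the start date by 7 days per id.
INSERT_WEEKLY = """
INSERT INTO events (date, title)
SELECT date(:start_date, '+' || (7 * seq.id) || ' days'), 'static title'
FROM (SELECT two_1.id + two_2.id + two_4.id + two_8.id AS id
      FROM (SELECT 0 AS id UNION ALL SELECT 1) AS two_1
      CROSS JOIN (SELECT 0 AS id UNION ALL SELECT 2) AS two_2
      CROSS JOIN (SELECT 0 AS id UNION ALL SELECT 4) AS two_4
      CROSS JOIN (SELECT 0 AS id UNION ALL SELECT 8) AS two_8) AS seq
WHERE seq.id < :n
"""
conn.execute(INSERT_WEEKLY, {"start_date": "2024-01-01", "n": 10})

dates = [row[0] for row in conn.execute("SELECT date FROM events ORDER BY date")]
```

Deriving the offset from the id avoids depending on the order in which rows are evaluated, which the variable-based version implicitly relies on.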
