Repeat rows based on range of dates in two columns - mysql

I have a table with following columns:
ID startdate enddate
I want the rows of this table to be repeated as many times as the difference between startdate and enddate along with a column which gives all the dates between these two days for each id in the table. So, my new table should be like this:
ID Date
A startdate
A startdate +1 day
A startdate +2 days (till enddate)
B startdate
B startdate + 1 day ....
Please note that I have different start and end dates for different IDs.
I tried the answer for the following question, but this doesn't work:
Mysql select multiple rows based on one row related date range

Here's one approach.
This uses an inline view (aliased as i) to generate integer values from 0 to 999. That view is joined to your table to generate up to 1000 date values per row, starting from startdate and running up to enddate.
The inline view i can be easily extended to generate 10,000 or 100,000 rows, following the same pattern.
This assumes that the startdate and enddate columns are datatype DATE (or DATETIME, TIMESTAMP, or another datatype that can be implicitly converted to valid DATE values).
SELECT t.id
, t.startdate + INTERVAL i.i DAY AS `Date`
FROM ( SELECT d3.n*100 + d2.n*10 + d1.n AS i
FROM ( SELECT 0 AS n
UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) d1
CROSS
JOIN ( SELECT 0 AS n
UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) d2
CROSS
JOIN ( SELECT 0 AS n
UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) d3
) i
JOIN mytable t
ON i.i <= DATEDIFF(t.enddate,t.startdate)
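The same digit cross-join idea can be tried out without a MySQL server. The following is only a sketch, assuming Python's built-in sqlite3 module as a stand-in: SQLite has no DATEDIFF or INTERVAL, so julianday() and date(..., '+N days') are substituted, the sample rows are made up, and only two digit tables (0 to 99) are used to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id TEXT, startdate TEXT, enddate TEXT);
    INSERT INTO mytable VALUES ('A', '2021-01-30', '2021-02-02'),
                               ('B', '2021-03-01', '2021-03-01');
""")

# One derived table holding the digits 0-9; cross-joining it with itself
# yields 0..99. Add d3, d4, ... in the same way for larger ranges.
DIGITS = "(SELECT 0 AS n " + " ".join(f"UNION ALL SELECT {k}" for k in range(1, 10)) + ")"

query = f"""
    SELECT t.id, date(t.startdate, '+' || i.i || ' days') AS "Date"
    FROM (SELECT d2.n * 10 + d1.n AS i FROM {DIGITS} d1 CROSS JOIN {DIGITS} d2) i
    JOIN mytable t
      ON i.i <= julianday(t.enddate) - julianday(t.startdate)
    ORDER BY t.id, i.i
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Row A spans four days and row B a single day, so the join keeps offsets 0 to 3 for A and only offset 0 for B.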

You need a numbers table: create a temporary table or dummy table that contains the numbers 0 to X (X being the maximum possible difference between the two dates; starting at 0 keeps startdate itself in the output).
Then join to that table using a date comparison.
I'm afraid I use SQL Server, so I'm not sure whether the date functions work the same way in MySQL (DATEADD here would be DATE_ADD or + INTERVAL there), but you should get the idea.
SELECT
DateTable.Id,
DATEADD(dd, NumbersTable.Number, DateTable.StartDate)
FROM
DateTable
INNER JOIN
NumbersTable
ON
DATEADD(dd, NumbersTable.Number, DateTable.StartDate) <= DateTable.EndDate
ORDER BY
DateTable.Id,
DATEADD(dd, NumbersTable.Number, DateTable.StartDate)

I know it's very late to answer, but here is one more answer, using a recursive CTE:
with recursive cte ( id, startdate) as
(
select id,startdate from test t1
union all
select t2.id,(c.startdate + interval '1 day')::date
from test t2
join cte c on c.id=t2.id and (c.startdate + interval '1 day')::date<=t2.enddate
)
select id,startdate as date from cte
order by id, startdate
It's PostgreSQL-specific, but it should work in other relational databases with small changes to the date functions.
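As a rough illustration of how little has to change, here is a sketch of the same recursion run through Python's sqlite3 module. SQLite is only a stand-in here: its date(d, '+1 day') replaces the PostgreSQL interval arithmetic, the end date is carried along in the CTE instead of re-joining the table, and the sample rows are invented. MySQL has supported WITH RECURSIVE since 8.0, where the step would presumably be written as d + INTERVAL 1 DAY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (id TEXT, startdate TEXT, enddate TEXT);
    INSERT INTO test VALUES ('A', '2021-01-30', '2021-02-01'),
                            ('B', '2021-03-01', '2021-03-02');
""")

rows = conn.execute("""
    WITH RECURSIVE cte (id, d, enddate) AS (
        -- anchor: one row per id, starting at startdate
        SELECT id, startdate, enddate FROM test
        UNION ALL
        -- step: advance one day until enddate is reached
        SELECT id, date(d, '+1 day'), enddate
        FROM cte
        WHERE date(d, '+1 day') <= enddate
    )
    SELECT id, d FROM cte ORDER BY id, d
""").fetchall()
for row in rows:
    print(row)
```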

Related

Mysql slow joining small (20k) rows table to large (5mil) rows table, with indexes

I have two tables, devices (20k) rows and device_logins (5 mil rows), each device_logins row has a device_id with a foreign key and index linking to devices.
I'm trying to create a list of every week in which a device was used, and from which applications, using MySQL. The query takes roughly 3 seconds to execute, which adds up quickly in the application, and from what I've read this is not enough data to justify taking that long.
The schema is:
[devices]
id int unsigned
user_id int unsigned; foreign to users; index
hardware_type varchar
os_type varchar
os_version varchar
first_use datetime
last_use datetime
deleted_at datetime null
[device_logins]
id int unsigned
user_id int unsigned; foreign to users; index
device_id int unsigned; foreign to devices; index
application string
login_date datetime
The query is:
SELECT GROUP_CONCAT(DISTINCT appLoginToDeviceInRange.application SEPARATOR ', ') AS dataSource,
weekList.weekStartDate AS date,
MIN(devicesInRange.id) AS eventId
FROM (
SELECT DATE_FORMAT(date, '%Y-%u') AS week,
DATE_FORMAT(date - INTERVAL WEEKDAY(date) DAY, '%Y-%m-%d') AS weekStartDate,
DATE_FORMAT((date - INTERVAL WEEKDAY(date) DAY) + INTERVAL 6 DAY, '%Y-%m-%d') AS weekEndDate
FROM (
SELECT '2021-07-11' - INTERVAL (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a)) DAY AS DATE
FROM (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS a
CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS b
CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS c
CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS d ORDER BY DATE DESC) AS `dateList`
GROUP BY `week`
ORDER BY `weekStartDate` DESC
) AS `weekList`
INNER JOIN (
SELECT `devices`.*
FROM devices
INNER JOIN `users`
ON `users`.`id` = `devices`.`user_id`
AND `users`.`id` IN (13368)
WHERE `hardware_type` = 'MOBILE'
) AS `devicesInRange`
ON `devicesInRange`.`first_use` <= `weekList`.`weekEndDate` AND `devicesInRange`.`last_use` >= `weekList`.`weekStartDate` AND
(`devicesInRange`.`deleted_at` IS NULL OR `devicesInRange`.`deleted_at` >= `weekList`.`weekStartDate`)
INNER JOIN (
SELECT DISTINCT `device_id`, `application`, YEARWEEK(login_date) AS loginWeek
FROM `device_logins`
WHERE `device_id` IS NOT NULL
) AS `appLoginToDeviceInRange`
ON `appLoginToDeviceInRange`.`device_id` = `devicesInRange`.`id`
AND `appLoginToDeviceInRange`.`loginWeek` >= YEARWEEK(weekList.weekStartDate)
AND `appLoginToDeviceInRange`.`loginWeek` <= YEARWEEK(weekList.weekEndDate)
WHERE `weekList`.`weekStartDate` < '2021-07-16 15:02:09.176280'
GROUP BY `os_type`, `week`
ORDER BY `weekList`.`weekStartDate` DESC, `os_type` DESC
LIMIT 20
Removing the join to my table with 5 million rows makes it take ~80 ms, as you'd expect. I've run mysqltuner and configured accordingly.
Pre-build dateList, reaching far into the future; have the PRIMARY KEY on that date column; JOIN with a WHERE limiting to the desired range of days (or weeks).
Build and maintain a "Summary" table that contains each day's counts. Then use that for tallying up over the desired days (or weeks). More: http://mysql.rjweb.org/doc.php/summarytables
If you might ever want info for "days", then use "DAY" in the above suggestions, else use "WEEK". Even reading from a daily summary to get weekly amounts will be reasonably fast. (And much faster than reading from the raw data, like you are doing now.)
If '2021-07-16 15:02:09.176280' came from NOW(6), you may as well simply say NOW(6) or, probably, CURDATE(). (No speed improvement, just clarity.)
JOIN (SELECT ...) ON ... JOIN (SELECT ...) ON ... is often inefficient; ponder whether some other formulation would work.
Perhaps you want FROM weekList LEFT JOIN ... to provide zeros if a week has no data? (What you have will, I think, simply leave out such weeks.) (Not for performance, but for 'correctness'.)
Please qualify hardware_type, os_type, etc with the table they are in. It makes it easier to know what is going on, especially when trying to determine optimal indexes. (An INDEX cannot span two tables.)
Please provide SHOW CREATE TABLE so I can help with indexes.
GROUP BY x ORDER BY y requires an extra temp table and sort -- when x and y are different. Your
GROUP BY `os_type`, `week`
ORDER BY `weekList`.`weekStartDate` DESC, `os_type` DESC
could probably be turned into
GROUP BY `weekList`.`weekStartDate`, `os_type`
ORDER BY `weekList`.`weekStartDate` DESC, `os_type` DESC
Fix those things up; then come back for more advice if it is still "too slow". (There will be so many changes that it would be best to start another Question.)
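The summary-table suggestion above could look roughly like the following. This is a sketch only, run against SQLite through Python's sqlite3 module rather than MySQL; the table name login_summary, its columns and the sample rows are all hypothetical, and the week calculation uses SQLite's 'weekday' modifier where MySQL would use something like WEEKDAY().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device_logins (device_id INTEGER, application TEXT, login_date TEXT);
    INSERT INTO device_logins VALUES
        (1, 'web',    '2021-07-05 09:00:00'),
        (1, 'web',    '2021-07-05 17:30:00'),
        (1, 'mobile', '2021-07-07 08:15:00'),
        (2, 'web',    '2021-07-13 12:00:00');

    -- Hypothetical summary table: one row per device, application and day.
    CREATE TABLE login_summary (
        device_id INTEGER, application TEXT, day TEXT, logins INTEGER,
        PRIMARY KEY (device_id, application, day)
    );
    INSERT INTO login_summary
    SELECT device_id, application, date(login_date), COUNT(*)
    FROM device_logins
    GROUP BY device_id, application, date(login_date);
""")

# Weekly report read from the small summary table instead of the raw rows.
# date(day, 'weekday 0', '-6 days') is the Monday of the week containing day.
weekly = conn.execute("""
    SELECT device_id,
           date(day, 'weekday 0', '-6 days') AS week_start,
           GROUP_CONCAT(DISTINCT application) AS apps,
           SUM(logins) AS logins
    FROM login_summary
    GROUP BY device_id, week_start
    ORDER BY device_id, week_start
""").fetchall()
for row in weekly:
    print(row)
```

In a real system the summary rows would be maintained incrementally (for example by a nightly job), so the weekly report never has to scan the 5 million raw rows.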

Can a subquery inside a SQL update fetch rows which have just been updated?

I have a collection of publications and want to assign each one a release date that is unique per author. To do this, I take every date from the publication's date until yesterday and subtract the dates already taken by that author.
The problem with this update is that each record depends on what was assigned to the previous one. E.g. if a feature is already assigned to April 2nd, new features for that day are pushed to the 3rd or beyond. But if there are two unassigned features on April 2nd, they are both assigned to the same day.
UPDATE publications pub
SET pub.release_date = (
SELECT all.Dates
FROM ( # This generates all dates between publication date until yesterday
SELECT curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a) ) DAY as Dates
FROM (SELECT 0 as a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as a
CROSS JOIN (SELECT 0 as a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as b
CROSS JOIN (SELECT 0 as a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as c
CROSS JOIN (SELECT 0 as a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as d
) all
WHERE all.Dates > DATE(pub.date)
AND all.Dates < curdate()
AND all.Dates NOT IN ( # Already taken dates for this author
SELECT DISTINCT(DATE(taken.release_date))
FROM (SELECT * FROM publications) as taken
WHERE taken.author_id = pub.author_id
AND taken.release_date IS NOT NULL
)
ORDER BY all.Dates ASC
limit 1
)
WHERE pub.release_date is null
AND pub.type = 'feature';
I read that the way SQL works (simplifying here) is by fetching a dataset into a buffer, altering it and then storing it. I guess MySQL does something similar. The mismatch seems to happen because the subquery reads from the original dataset, not from the data buffer that is being updated.
MySQL doesn't allow PostgreSQL update syntax:
UPDATE ...
SET ...
FROM <-
WHERE ...;
Can a subquery inside a SQL update fetch rows which have just been updated?
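The dependency described in the question (each assignment has to see the ones made just before it) is easiest to see when the rows are processed one at a time. The sketch below does that in plain Python rather than in a single UPDATE; the function name, the dictionary keys and the sample data are hypothetical, and the type = 'feature' filter is left out for brevity.

```python
from datetime import date, timedelta

def assign_release_dates(publications, today):
    """Give each unassigned publication the earliest free date after its own
    date and before `today`, keeping release dates unique per author."""
    taken = {}  # author_id -> set of release dates already used
    for pub in publications:
        if pub["release_date"] is not None:
            taken.setdefault(pub["author_id"], set()).add(pub["release_date"])

    for pub in publications:
        if pub["release_date"] is not None:
            continue
        used = taken.setdefault(pub["author_id"], set())
        candidate = pub["date"] + timedelta(days=1)
        while candidate < today and candidate in used:
            candidate += timedelta(days=1)
        if candidate < today:
            pub["release_date"] = candidate
            used.add(candidate)  # visible to the next row of the same author
    return publications

pubs = [
    {"author_id": 1, "date": date(2024, 4, 1), "release_date": date(2024, 4, 2)},
    {"author_id": 1, "date": date(2024, 4, 1), "release_date": None},
    {"author_id": 1, "date": date(2024, 4, 1), "release_date": None},
]
assign_release_dates(pubs, today=date(2024, 4, 10))
print([p["release_date"].isoformat() for p in pubs])
```

Because the set of used dates is updated after every assignment, the two unassigned rows end up on different days, which is the behaviour the single-statement UPDATE fails to give.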

Optimizing a Select with Subquery that is loading VERY slow

I have a SELECT that is a little bit tricky, as I try to display data that has to be calculated on the fly.
The data is logged from a SmartHome system and displayed in the visualization solution Grafana.
So I have to handle all of this in MySQL and can't really edit the data or the frontend to do some of this work.
The diagram should show the average temperature per day for a time range that can be selected in the UI.
The data in MySQL is a table like that:
DEVICE | READING | VALUE | TIMESTAMP
-----------------------------------------------------------------------------
Thermometer | temperature | 20.0 | 2107.10.12 00:12:59
Thermometer | temperature | 20.2 | 2107.10.12 00:24:12
...
The request first creates a virtual table (one that is not in the database) with a timestamp for every full hour over about 10 years.
This runs very quickly and doesn't seem to be the reason for my slow fetches.
After that I strip down the virtual table to the values within the visible time range of my diagram.
For each of these full-hour timestamps I have to run a sub-select to get the last temperature value that was logged before the full hour.
These values are then grouped by day and the average is calculated.
That way I get the average over 24 values, one for each full hour from 00:00 to 23:00.
According to various weather sites, this is how the official average temperature is normally calculated.
Here is the Select Statement:
SELECT
filtered.hour as time,
AVG((SELECT VALUE
FROM history
WHERE READING="temperature" AND DEVICE="Thermometer" AND TIMESTAMP <= filtered.hour
ORDER BY TIMESTAMP DESC
LIMIT 1
)) as value
FROM (
SELECT calculated.hour as hour FROM (
SELECT DATE_ADD(DATE_SUB(DATE($__timeTo()), INTERVAL 10 YEAR), INTERVAL t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i HOUR) as hour
FROM (SELECT 0 as i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
(SELECT 0 as i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
(SELECT 0 as i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
(SELECT 0 as i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
(SELECT 0 as i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
) calculated
WHERE calculated.hour >= $__timeFrom() AND calculated.hour <= $__timeTo()
) filtered
GROUP BY DATE(filtered.hour)
For a timespan of a week it already takes about 5-10 seconds for the diagram to show up. For a month you're close to half a minute.
All my other (simple fetches without calculations) diagrams are loading in about or less than a second.
As I'm a complete MySQL noob and have just started to build some SELECTs for my smart home, I don't really know how this can be improved.
Any ideas from the pros? :)
Unless I'm overlooking something really obvious, and provided it doesn't matter how many readings the daily average is calculated from, you could really simplify your query and get rid of the subqueries. This should also give you a boost in speed.
SELECT DATE(`TIMESTAMP`) AS `date`, AVG(`VALUE`) AS `value` FROM `history` WHERE `READING`='temperature' AND `DEVICE`='Thermometer' AND DATE(`TIMESTAMP`) BETWEEN 'date1' AND 'date2' GROUP BY DATE(`TIMESTAMP`)
Just replace date1 & date2 with the values you want, for example 2017-10-15.
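To check what such a grouped query returns, here is a small sketch using Python's sqlite3 module in place of MySQL, with invented sample readings. Two things differ from the query above and are assumptions of this sketch: the range filter is written against the bare TIMESTAMP column with a half-open interval (a form that generally lets an index on that column be used), and the alias avg_value is made up. Note that this averages the raw readings, not the hourly-sampled values the question describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (DEVICE TEXT, READING TEXT, VALUE REAL, TIMESTAMP TEXT);
    INSERT INTO history VALUES
        ('Thermometer', 'temperature', 20.0, '2017-10-12 00:12:59'),
        ('Thermometer', 'temperature', 22.0, '2017-10-12 12:24:12'),
        ('Thermometer', 'temperature', 18.0, '2017-10-13 06:00:00'),
        ('Thermometer', 'humidity',    55.0, '2017-10-13 06:00:00');
""")

daily = conn.execute("""
    SELECT date(TIMESTAMP) AS day, AVG(VALUE) AS avg_value
    FROM history
    WHERE READING = 'temperature' AND DEVICE = 'Thermometer'
      AND TIMESTAMP >= ? AND TIMESTAMP < ?   -- half-open range on the raw column
    GROUP BY day
    ORDER BY day
""", ("2017-10-12", "2017-10-14")).fetchall()
for row in daily:
    print(row)
```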

Select all records in a date range even if no records present

I have a database of employees and their attendance. I have to create the date range reports of attendance.
Attendance Database table
I am using below query
SELECT a.* FROM attendance a WHERE a.user_id=10 AND (a.date BETWEEN '2017-01-06' AND '2017-01-10')
Result:-
I also want a record for every date in the given range. Some dates have no record in the database, and for those dates I want NULL values shown, as in the image below.
Try this. I am not able to run it but the idea is to generate a date range based on this answer.
Use this date range as derived table d.
Left join d to attendance, adding your condition to the join, so you get all the columns for matching data and NULL where nothing matches.
This will give you the rest of the data, except the employee id. I suggest you hardcode it in the select query.
select * from
(select a.Date
from
(
select curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) DAY as Date
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
) a
where a.Date between '2017-01-06' AND '2017-01-10'
) d
left join
attendance a
on d.date = a.date
and a.user_id=10
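Where the user filter goes matters for a left join: in the join condition it keeps the dates that have no attendance row, whereas in a WHERE clause after the join it discards them again. The following sketch shows the join-condition form using Python's sqlite3 module as a stand-in for MySQL. The status column and the sample rows are hypothetical, and the date range is produced with a recursive CTE instead of the digit cross join, simply to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attendance (user_id INTEGER, date TEXT, status TEXT);
    INSERT INTO attendance VALUES
        (10, '2017-01-06', 'present'),
        (10, '2017-01-09', 'present'),
        (11, '2017-01-07', 'present');
""")

rows = conn.execute("""
    WITH RECURSIVE d (day) AS (
        SELECT '2017-01-06'
        UNION ALL
        SELECT date(day, '+1 day') FROM d WHERE day < '2017-01-10'
    )
    SELECT d.day, a.user_id, a.status
    FROM d
    LEFT JOIN attendance a
           ON a.date = d.day
          AND a.user_id = 10      -- filter inside ON keeps the unmatched days
    ORDER BY d.day
""").fetchall()
for row in rows:
    print(row)
```

Days on which user 10 has no record come back with None (NULL) in the attendance columns, and the row belonging to user 11 does not leak into the result.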

Show average values for large amount of data

I have a MySQL database named WindData which looks like this:
timestamp
temperature
windspeed
winddirection
I add a new row every 2-3 minutes so over a year time there will be lots of rows.
Now I want to present the data as a chart within a certain timeframe, (4 days ago, last month, last 6 month, 2011-2012...). Say that I want to display how the temperature has varied for the past year, using Google Charts to display this. Then Google chart has a maximum limit of the amount of datapoints that you may use.
I would then like a SQL query where I specify the timerange (2012-01-01 -- 2013-10-10) that gives me
A fixed number of rows (for example 200)
Every row contains the average and max value over that interval.
An ascii art example:
...............1..............2...............+..............199..............200
Where . is one row in my table, and the numbers represent average and maxvalue of the previous dots.
Some pseudocode that might show what I am trying to accomplish is:
SELECT AVG(temperature)
FROM WindData
WHERE timestamp > '2012-01-01' AND timestamp < '2013-10-10'
This would just give me one result where I get the average value of the whole timerange.
So maybe there is a way to create one more SQL statement which runs the above sql statement 200 times with different time-range.
SELECT AVG(temperature)
FROM WindData
WHERE timestamp > '2012-01-01' AND timestamp < '2012-02-01'
SELECT AVG(temperature)
FROM WindData
WHERE timestamp > '2012-02-01' AND timestamp < '2012-03-01'
SELECT AVG(temperature)
FROM WindData
WHERE timestamp > '2012-03-01' AND timestamp < '2012-04-01'
SELECT AVG(temperature)
FROM WindData
WHERE timestamp > '2012-04-01' AND timestamp < '2012-05-01'
...and so on.
If anyone is interested, I will use the help here to present better diagrams on www.surfvind.se, which displays weather data from a homebuilt weather station.
You can get a fixed number of rows using something like this:-
SELECT units.i + tens.i * 10 + hundreds.i * 100 AS aNumber
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 ) hundreds
You can use something like this to get the different ranges, and join it against your data to get the number of values within each range.
EDIT - To go with the details you have added:-
SELECT Sub1.aDate, AVG(temperature)
FROM
(
SELECT DATE_ADD('2012-01-01', INTERVAL units.i + tens.i * 10 + hundreds.i * 100 DAY) AS aDate
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) hundreds
) Sub1
LEFT OUTER JOIN
WindData
ON Sub1.aDate = DATE(WindData.`timestamp`)
GROUP BY Sub1.aDate
This is getting a range of 1000 days starting from 2012-01-01 (you can easily limit that range in the subselect if you want), and matching that against the temp values for a day and getting the average group by date.
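If the goal is a fixed number of rows for any time range (for example 200), rather than one row per day, the bucket can also be computed from the timestamp itself. The sketch below is one possible way to do that, not the approach from the answer above: it uses Python's sqlite3 module as a stand-in for MySQL, the function name bucketed and the sample rows are made up, and strftime('%s', ...) plays the role that UNIX_TIMESTAMP() would presumably play in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WindData (timestamp TEXT, temperature REAL)")
conn.executemany("INSERT INTO WindData VALUES (?, ?)", [
    ("2012-01-01 00:00:00", 1.0),
    ("2012-01-01 06:00:00", 3.0),
    ("2012-01-01 12:00:00", 5.0),
    ("2012-01-01 18:00:00", 9.0),
])

def bucketed(conn, start, end, buckets):
    """Average and max temperature in `buckets` equal-width slices of [start, end)."""
    return conn.execute("""
        SELECT CAST((strftime('%s', timestamp) - strftime('%s', :start)) * :n
                    / (strftime('%s', :end) - strftime('%s', :start)) AS INTEGER) AS bucket,
               AVG(temperature), MAX(temperature)
        FROM WindData
        WHERE timestamp >= :start AND timestamp < :end
        GROUP BY bucket
        ORDER BY bucket
    """, {"start": start, "end": end, "n": buckets}).fetchall()

result = bucketed(conn, "2012-01-01 00:00:00", "2012-01-02 00:00:00", 2)
for row in result:
    print(row)
```

Buckets that contain no rows are simply absent from the result; joining against a generated list of bucket numbers, as in the queries above, would fill those gaps with NULLs.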