Show average values for large amount of data

I have a MySQL table named WindData with the following columns:
timestamp
temperature
windspeed
winddirection
I add a new row every 2-3 minutes, so over a year there will be a lot of rows.
Now I want to present the data as a chart for a given timeframe (4 days ago, last month, last 6 months, 2011-2012, ...). Say I want to display how the temperature has varied over the past year using Google Charts. Google Charts limits the number of data points a chart may use.
I would therefore like a SQL query where I specify the time range (2012-01-01 to 2013-10-10) and that gives me:
A fixed number of rows (for example 200)
Each row contains the average and max value over its interval.
An ascii art example:
...............1..............2...............+..............199..............200
Here each . is one row in my table, and each number represents the average and max value of the preceding dots.
Some pseudocode that might show what I am trying to accomplish:
SELECT AVG(temperature)
FROM WindData
WHERE timestamp > '2012-01-01' AND timestamp < '2013-10-10';
This would just give me one result: the average value over the whole time range.
So maybe there is a way to write one more SQL statement that runs the statement above 200 times, each with a different time range:
SELECT AVG(temperature)
FROM WindData
WHERE timestamp > '2012-01-01' AND timestamp < '2012-02-01';
SELECT AVG(temperature)
FROM WindData
WHERE timestamp > '2012-02-01' AND timestamp < '2012-03-01';
SELECT AVG(temperature)
FROM WindData
WHERE timestamp > '2012-03-01' AND timestamp < '2012-04-01';
SELECT AVG(temperature)
FROM WindData
WHERE timestamp > '2012-04-01' AND timestamp < '2012-05-01';
...and so on.
If anyone is interested, I will use the help here to present better diagrams on www.surfvind.se, which displays weather data from a homebuilt weather station.

You can get a fixed number of rows using something like this:-
SELECT units.i + tens.i * 10 + hundreds.i * 100 AS aNumber
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 ) hundreds
You can use something like this to generate the different ranges, and join it against your data to get the values that fall within each range.
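As a quick check of the digit cross-join idea, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. SQLite stands in for MySQL only so that the example is runnable anywhere; the helper name `generate_numbers` is made up for illustration.

```python
import sqlite3

# Ten single-digit rows (0..9), reused for the units and tens places.
DIGITS = " UNION ALL ".join(f"SELECT {d} AS i" for d in range(10))

QUERY = f"""
SELECT units.i + tens.i * 10 + hundreds.i * 100 AS aNumber
FROM ({DIGITS}) AS units
CROSS JOIN ({DIGITS}) AS tens
CROSS JOIN (SELECT 0 AS i UNION ALL SELECT 1 AS i) AS hundreds
ORDER BY aNumber
"""


def generate_numbers():
    """Return the integers produced by the cross join, in ascending order."""
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


numbers = generate_numbers()
print(len(numbers), numbers[0], numbers[-1])  # 200 0 199
```

Each digit table contributes one decimal place, so 10 x 10 x 2 combinations give exactly 200 distinct values.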
EDIT - To go with the details you have added:-
SELECT Sub1.aDate, AVG(temperature)
FROM
(
SELECT DATE_ADD('2012-01-01', INTERVAL units.i + tens.i * 10 + hundreds.i * 100 DAY) AS aDate
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) hundreds
) Sub1
LEFT OUTER JOIN
WindData
ON Sub1.aDate = DATE(WindData.`timestamp`)
GROUP BY Sub1.aDate
This generates a range of 1000 days starting from 2012-01-01 (you can easily limit that range in the subselect if you want), matches each day against the temperature readings for that day, and returns the average grouped by date.
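The query above produces one row per day. If what you need is a fixed number of rows regardless of the length of the time range, a different technique is to compute a bucket number directly from each timestamp and group by it. The sketch below is not the query from this answer; it assumes timestamps are stored as integer epoch seconds in a hypothetical column `ts`, and it uses SQLite from Python so it can be run as is. In MySQL the same arithmetic could probably be written with `UNIX_TIMESTAMP()` and `FLOOR()`.

```python
import sqlite3

# Bucket index = floor((ts - t0) * n / (t1 - t0)), which is in 0..n-1 for t0 <= ts < t1.
BUCKET_QUERY = """
SELECT CAST((ts - :t0) * :n / (:t1 - :t0) AS INTEGER) AS bucket,
       AVG(temperature) AS avg_temp,
       MAX(temperature) AS max_temp
FROM WindData
WHERE ts >= :t0 AND ts < :t1
GROUP BY bucket
ORDER BY bucket
"""


def bucket_stats(conn, t0, t1, n):
    """Return (bucket, average, maximum) rows for n equal-width time buckets."""
    return conn.execute(BUCKET_QUERY, {"t0": t0, "t1": t1, "n": n}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WindData (ts INTEGER, temperature REAL)")
# Made-up sample data: one reading every 100 seconds, temperature 0.0 .. 9.0.
conn.executemany(
    "INSERT INTO WindData VALUES (?, ?)",
    [(i * 100, float(i)) for i in range(10)],
)

stats = bucket_stats(conn, t0=0, t1=1000, n=4)
for row in stats:
    print(row)
conn.close()
```

Buckets that contain no readings simply do not appear in the result, so the query returns at most n rows rather than exactly n.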

Related

Combining mySQL-Subqueries, dynamic BETWEENs?

I am working with MySQL and am trying to combine two Subqueries.
It is about time and excluding timespans.
The first (working) query fetches every valid day between two dates that is neither a Saturday nor a Sunday (workdays):
SELECT * FROM
(SELECT adddate('1970-01-01',t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date from
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4) v
WHERE selected_date BETWEEN '2021-04-01' and '2021-07-15'
AND weekday(selected_date) <> 5
AND weekday(selected_date) <> 6
The second query fetches the start and end dates of vacations from a table (stored as UNIX timestamps) and formats them the same way the first query returns its dates, as two separate values (from, to):
SELECT date_format(FROM_UNIXTIME(`from`),'%Y-%m-%d') AS `from`, date_format(FROM_UNIXTIME(`to`),'%Y-%m-%d') AS `to` FROM vacation
I am trying to remove from the result of my first query every date that falls within one of the 'from'/'to' timespans returned by my second query.
What gets me is that I don't know how to set multiple BETWEENs dynamically to filter out those ranges. The second query returns multiple 'from' and 'to' values, which I want to use as a "NOT BETWEEN" filter for my first query.
I hope this makes sense.
I am grateful for any answer that pushes me in the right direction.
Thanks in advance
Felix

Use LEFT JOIN to join the two queries, and then a NULL check to exclude the matched rows.
SELECT t1.*
FROM (first query) AS t1
LEFT JOIN (second query) AS t2 ON t1.selected_date BETWEEN t2.from AND t2.to
WHERE t2.from IS NULL
It would also be better if the second query returned DATETIME values rather than formatted dates, so remove the calls to DATE_FORMAT().
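A minimal sketch of this LEFT JOIN plus NULL check, run against an in-memory SQLite database from Python. The table and column names (`days`, `from_date`, `to_date`) and the sample dates are made up for illustration, and the dates are stored as ISO strings so that BETWEEN compares them correctly.

```python
import sqlite3
from datetime import date, timedelta


def days_outside_vacations(days, vacations):
    """Return the days that do not fall inside any (from, to) vacation range."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE days (selected_date TEXT)")
        conn.execute("CREATE TABLE vacation (from_date TEXT, to_date TEXT)")
        conn.executemany("INSERT INTO days VALUES (?)", [(d,) for d in days])
        conn.executemany("INSERT INTO vacation VALUES (?, ?)", vacations)
        rows = conn.execute(
            """
            SELECT d.selected_date
            FROM days AS d
            LEFT JOIN vacation AS v
              ON d.selected_date BETWEEN v.from_date AND v.to_date
            WHERE v.from_date IS NULL   -- keep only days with no matching range
            ORDER BY d.selected_date
            """
        ).fetchall()
        return [r[0] for r in rows]
    finally:
        conn.close()


days = [(date(2021, 4, 1) + timedelta(days=i)).isoformat() for i in range(7)]
vacations = [("2021-04-02", "2021-04-03"), ("2021-04-06", "2021-04-06")]
remaining = days_outside_vacations(days, vacations)
print(remaining)
```

Because the join condition is evaluated per vacation row, any number of ranges is handled without building the BETWEEN clauses dynamically.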

Can a subquery inside a SQL update fetch rows which have just been updated?

I have a collection of publications and want to assign each one a release date that is unique per author. To do this, I take all the dates from the publication's date until yesterday and subtract the dates already taken for that author.
The problem with this update is that each record depends on the assignment made for the previous one. E.g., if there is already a feature assigned to April 2nd, new features for that day will be pushed to the 3rd or beyond. But if there are two unassigned features on April 2nd, they will both be assigned to the same day.
UPDATE publications pub
SET pub.release_date = (
SELECT all_dates.Dates
FROM ( # This generates all dates from the publication date until yesterday
SELECT curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a) ) DAY as Dates
FROM (SELECT 0 as a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as a
CROSS JOIN (SELECT 0 as a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as b
CROSS JOIN (SELECT 0 as a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as c
CROSS JOIN (SELECT 0 as a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) as d
) all_dates # renamed from "all", which is a reserved word in MySQL
WHERE all_dates.Dates > DATE(pub.date)
AND all_dates.Dates < curdate()
AND all_dates.Dates NOT IN ( # Already taken dates for this author
SELECT DISTINCT(DATE(taken.release_date))
FROM (SELECT * FROM publications) as taken
WHERE taken.author_id = pub.author_id
AND taken.release_date IS NOT NULL
)
ORDER BY all_dates.Dates ASC
limit 1
)
WHERE pub.release_date is null
AND pub.type = 'feature';
I read that the way SQL works (simplifying here) is by fetching a dataset into a buffer, altering it and then storing it. I guess MySQL does something similar. The mismatch seems to happen because the subquery does not read from the buffer that is being updated but from the original dataset.
MySQL doesn't allow PostgreSQL update syntax:
UPDATE ...
SET ...
FROM <-
WHERE ...;
Can a subquery inside a SQL update fetch rows which have just been updated?

Optimizing a Select with Subquery that is loading VERY slow

I have a SELECT that is a little bit tricky, as I try to display data that has to be calculated on the fly.
The data is logged from a SmartHome system and displayed in the visualization solution Grafana.
So I have to handle all of this in MySQL and can't really edit the data or the frontend to do some of this work.
The diagram should show the average temperature per day for a time range that can be selected in the UI.
The data in MySQL is a table like that:
DEVICE | READING | VALUE | TIMESTAMP
-----------------------------------------------------------------------------
Thermometer | temperature | 20.0 | 2017.10.12 00:12:59
Thermometer | temperature | 20.2 | 2017.10.12 00:24:12
...
The request first creates a virtual table (one that is not in the database) with a timestamp for every full hour over about 10 years.
This runs very quickly and doesn't seem to be the reason for my slow fetches.
After that I strip the virtual table down to the values within the visible time range of my diagram.
For each of these full-hour timestamps I have to run a sub-select to get the last temperature value that was logged before the full hour.
These values are then grouped by day and the average is calculated.
That way I get the average over 24 values, one for each full hour from 00:00 to 23:00.
According to various weather sites, this is how the official average temperature is normally calculated.
Here is the Select Statement:
SELECT
filtered.hour as time,
AVG((SELECT VALUE
FROM history
WHERE READING="temperature" AND DEVICE="Thermometer" AND TIMESTAMP <= filtered.hour
ORDER BY TIMESTAMP DESC
LIMIT 1
)) as value
FROM (
SELECT calculated.hour as hour FROM (
SELECT DATE_ADD(DATE_SUB(DATE($__timeTo()), INTERVAL 10 YEAR), INTERVAL t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i HOUR) as hour
FROM (SELECT 0 as i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
(SELECT 0 as i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
(SELECT 0 as i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
(SELECT 0 as i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
(SELECT 0 as i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
) calculated
WHERE calculated.hour >= $__timeFrom() AND calculated.hour <= $__timeTo()
) filtered
GROUP BY DATE(filtered.hour)
For a timespan of a week it already takes about 5-10 seconds for the diagram to show up. For a month you're close to half a minute.
All my other (simple fetches without calculations) diagrams are loading in about or less than a second.
As I'm a complete MySQL noob and have just started to build some SELECTs for my smart home, I don't really know how this can be improved.
Any ideas from the pros? :)

Unless I'm overlooking something really obvious, and provided it doesn't matter how many readings the daily average is calculated from, you could simplify your query considerably and get rid of the subqueries. This should also give you a boost in speed.
SELECT DATE(`TIMESTAMP`) AS `date`, AVG(`VALUE`) AS `value` FROM `history` WHERE `READING`='temperature' AND `DEVICE`='Thermometer' AND DATE(`TIMESTAMP`) BETWEEN 'date1' AND 'date2' GROUP BY DATE(`TIMESTAMP`)
Just replace date1 & date2 with the values you want, for example 2017-10-15.
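A minimal sketch of the per-day grouping, run against an in-memory SQLite database from Python so it can be tried without a MySQL server. The column name `ts` and the sample readings are made up for illustration. Note that this averages every logged reading of a day, not the 24 hourly samples described in the question.

```python
import sqlite3

DAILY_AVG = """
SELECT DATE(ts) AS day, AVG(value) AS avg_value
FROM history
WHERE reading = 'temperature' AND device = 'Thermometer'
  AND DATE(ts) BETWEEN ? AND ?
GROUP BY DATE(ts)          -- one output row per calendar day
ORDER BY day
"""


def daily_average(conn, first_day, last_day):
    """Return (day, average temperature) rows for the inclusive day range."""
    return conn.execute(DAILY_AVG, (first_day, last_day)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (device TEXT, reading TEXT, value REAL, ts TEXT)")
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [
        ("Thermometer", "temperature", 20.0, "2017-10-12 00:12:59"),
        ("Thermometer", "temperature", 21.0, "2017-10-12 00:24:12"),
        ("Thermometer", "humidity", 55.0, "2017-10-12 01:00:00"),     # other reading, ignored
        ("Thermometer", "temperature", 18.0, "2017-10-13 06:00:00"),
        ("Thermometer", "temperature", 19.0, "2017-10-13 12:00:00"),
        ("Thermometer", "temperature", 20.0, "2017-10-13 18:00:00"),
        ("Thermometer", "temperature", 30.0, "2017-10-20 12:00:00"),  # outside the range
    ],
)

averages = daily_average(conn, "2017-10-12", "2017-10-13")
print(averages)
conn.close()
```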

Select all records in a date range even if no records present

I have a database of employees and their attendance. I have to create the date range reports of attendance.
Attendance Database table
I am using below query
SELECT a.* FROM attendance a WHERE a.user_id=10 AND (a.date BETWEEN '2017-01-06' AND '2017-01-10')
Result:-
I also want a row for every date in the given range. Some dates have no record in the database, and for those dates I want the row to show null values, as shown in the image below.

Try this. I am not able to run it but the idea is to generate a date range based on this answer.
Use this date range as derived table d.
Do a d left join attendance adding your condition so you will get all the columns for matching data and null for not matching data.
This will give you rest of the data, except employee id. I suggest you can hardcode it in select query.
select * from
(select a.Date
from
(
select curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) DAY as Date
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
) a
where a.Date between '2017-01-06' AND '2017-01-10'
) d
left join
attendance a
on d.date = a.date
and a.user_id = 10
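Where the user filter goes matters for a LEFT JOIN. The sketch below compares the two placements on an in-memory SQLite database from Python; the table layout (`dates.day`, `attendance.status`) and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dates (day TEXT)")
conn.execute("CREATE TABLE attendance (user_id INTEGER, day TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO dates VALUES (?)",
    [(f"2017-01-{d:02d}",) for d in range(6, 11)],
)
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?, ?)",
    [
        (10, "2017-01-06", "present"),
        (11, "2017-01-07", "present"),  # a different user
        (10, "2017-01-08", "present"),
    ],
)

# Filter inside ON: unmatched dates survive with NULL attendance columns.
filter_in_on = conn.execute(
    """
    SELECT d.day, a.status
    FROM dates AS d
    LEFT JOIN attendance AS a ON d.day = a.day AND a.user_id = ?
    ORDER BY d.day
    """,
    (10,),
).fetchall()

# Filter inside WHERE: rows where a.user_id is NULL are discarded again.
filter_in_where = conn.execute(
    """
    SELECT d.day, a.status
    FROM dates AS d
    LEFT JOIN attendance AS a ON d.day = a.day
    WHERE a.user_id = ?
    ORDER BY d.day
    """,
    (10,),
).fetchall()
conn.close()

print(filter_in_on)
print(filter_in_where)
```

With the filter in the ON clause all five dates come back, three of them with NULL status; with the filter in WHERE only the two dates that have a record for user 10 remain.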

Repeat rows based on range of dates in two columns

I have a table with following columns:
ID startdate enddate
I want the rows of this table to be repeated as many times as the difference between startdate and enddate along with a column which gives all the dates between these two days for each id in the table. So, my new table should be like this:
ID Date
A startdate
A startdate +1 day
A startdate +2 days (till enddate)
B startdate
B startdate + 1 day ....
Please note that I have different start and end dates for different IDs.
I tried the answer for the following question, but this doesn't work:
Mysql select multiple rows based on one row related date range

Here's one approach.
This uses an inline view (aliased as i) to generate integer values from 0 to 999. That view is joined to your table to generate up to 1000 date values per row, from startdate up to enddate.
The inline view i can easily be extended to generate 10,000 or 100,000 rows, following the same pattern.
This assumes that the startdate and enddate columns are of datatype DATE (or DATETIME, TIMESTAMP, or another datatype that can be implicitly converted to valid DATE values).
SELECT t.id
, t.startdate + INTERVAL i.i DAY AS `Date`
FROM ( SELECT d3.n*100 + d2.n*10 + d1.n AS i
FROM ( SELECT 0 AS n
UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) d1
CROSS
JOIN ( SELECT 0 AS n
UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) d2
CROSS
JOIN ( SELECT 0 AS n
UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) d3
) i
JOIN mytable t
ON i.i <= DATEDIFF(t.enddate,t.startdate)
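The same join-against-integers idea can be tried out on an in-memory SQLite database from Python. This is only a sketch: SQLite has no DATEDIFF, so the difference is computed with julianday(), the integers come from a plain helper table named `numbers`, and the sample rows are made up for illustration.

```python
import sqlite3

EXPAND_QUERY = """
SELECT t.id, DATE(t.startdate, '+' || n.i || ' days') AS day
FROM numbers AS n
JOIN mytable AS t
  ON n.i <= julianday(t.enddate) - julianday(t.startdate)
ORDER BY t.id, n.i
"""


def expand_ranges(rows, max_days=1000):
    """Return one (id, day) pair for every day from startdate to enddate inclusive."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE numbers (i INTEGER)")
        conn.execute("CREATE TABLE mytable (id TEXT, startdate TEXT, enddate TEXT)")
        # Integers 0 .. max_days - 1, playing the role of the inline view i.
        conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(max_days)])
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return conn.execute(EXPAND_QUERY).fetchall()
    finally:
        conn.close()


expanded = expand_ranges(
    [
        ("A", "2020-01-30", "2020-02-01"),
        ("B", "2020-03-05", "2020-03-05"),
    ]
)
for row in expanded:
    print(row)
```

A range longer than `max_days` would be cut off silently, which is the same limitation the 0 to 999 inline view has.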

You need a numbers table... create a temporary table or dummy table that contains the numbers 1 to X (X being the maximum possible difference between the two dates)
Then join to that table using a date diff
I'm afraid I'm SQL Server and so not sure if the datediff functions work the same way in mysql, but you should get the idea.
SELECT
DateTable.Id,
DATEADD(dd, NumbersTable.Number, DateTable.StartDate)
FROM
DateTable
INNER JOIN
NumbersTable
ON
DATEADD(dd, NumbersTable.Number, DateTable.StartDate) <= DateTable.EndDate
ORDER BY
DateTable.Id,
DATEADD(dd, NumbersTable.Number, DateTable.StartDate)

I know it's very late to answer, but here is one more answer, using a recursive CTE:
with recursive cte ( id, startdate) as
(
select id,startdate from test t1
union all
select t2.id,(c.startdate + interval '1 day')::date
from test t2
join cte c on c.id=t2.id and (c.startdate + interval '1 day')::date<=t2.enddate
)
select id,startdate as date from cte
order by id, startdate
It's PostgreSQL-specific, but it should work in other relational databases with small changes to the date functions.
