MySQL - Average values of data for every 3 hours?

I currently have two tables in MySQL that are collecting data. One collects every 3 hours and the other every 15 minutes. I am trying to build a forecast from this data using machine learning techniques, which require the data to be at the same time interval.
My question is: how can I get the average value for every 3 hours from both tables into a new table that organises them by date and hour? Something of this format:
Date / Hour / Table 1 3-hour average / Table 2 3-hour average
Table 1 (every 3 hours):
datetime(timestamp)/level(double)
Table 2 (every 15 minutes):
datetime(timestamp)/value(double)
I have managed to create a table that gives me the average per day, but I'd prefer something more accurate!
This is the statement I've already got for daily averages:
SELECT DAY(`datetime`),
AVG(`level`) AS `Table 1 Average` ,
(SELECT AVG(`value`) AS `Table 2 Daily Average`
FROM `table2`
WHERE DAY(`datetime`) = DAY(table1.datetime)
GROUP BY DAY(`datetime`))
FROM `table1` GROUP BY DAY(`datetime`);
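To go from daily to 3-hour averages, one option is to change the grouping key. Use the calendar date plus the hour rounded down to a multiple of 3, aggregate each table separately, and then join the two sets of buckets. This is a sketch under stated assumptions, not one of the answers. It runs on Python's built-in sqlite3 module as a stand-in for MySQL, renames the datetime column to ts, and uses made-up sample rows. It also groups by the full date rather than DAY(), because DAY() returns only the day of the month and would merge, for example, 1 January with 1 February.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical stand-ins for the question's tables (column "datetime" renamed ts).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ts TEXT, level REAL);  -- one row every 3 hours
    CREATE TABLE table2 (ts TEXT, value REAL);  -- one row every 15 minutes
""")
start = datetime(2015, 1, 1)
fmt = "%Y-%m-%d %H:%M:%S"
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(start.strftime(fmt), 1.0),
                  ((start + timedelta(hours=3)).strftime(fmt), 2.0)])
# 24 readings from 00:00 to 05:45; reading k is taken at minute 15 * k.
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [((start + timedelta(minutes=15 * k)).strftime(fmt), float(k))
                  for k in range(24)])

# Bucket key: calendar date plus the hour rounded down to 0, 3, 6, ..., 21.
BUCKETS = """
SELECT a.d, a.hr, a.avg_level, b.avg_value
FROM (SELECT date(ts) AS d,
             CAST(strftime('%H', ts) AS INTEGER) / 3 * 3 AS hr,
             AVG(level) AS avg_level
      FROM table1 GROUP BY d, hr) AS a
LEFT JOIN
     (SELECT date(ts) AS d,
             CAST(strftime('%H', ts) AS INTEGER) / 3 * 3 AS hr,
             AVG(value) AS avg_value
      FROM table2 GROUP BY d, hr) AS b
  ON a.d = b.d AND a.hr = b.hr
ORDER BY a.d, a.hr
"""
rows = conn.execute(BUCKETS).fetchall()
print(rows)
```

In MySQL the same key would be something like DATE(`datetime`) together with FLOOR(HOUR(`datetime`) / 3) * 3. The LEFT JOIN keeps every Table 1 bucket even when Table 2 has no readings in it.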
Thanks!

You really just need to change the subquery. I think this may do what you want. For each Table 1 row, it averages the Table 2 values from the preceding 3 hours:
SELECT t1.*,
       (SELECT AVG(t2.`value`)
        FROM `table2` t2
        WHERE t2.`datetime` <= t1.`datetime` AND
              t2.`datetime` >= DATE_SUB(t1.`datetime`, INTERVAL 3 HOUR)
       ) AS t2_moving_average
FROM `table1` t1;
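The corrected subquery can be tried out with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL, so SQLite's datetime(ts, '-3 hours') replaces DATE_SUB(..., INTERVAL 3 HOUR). The ts column name and the sample readings are hypothetical. There is one deliberate difference from the answer: the lower bound is strict (>). As a result, a 15-minute reading that falls exactly 3 hours before a Table 1 row is counted in only one window instead of two.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ts TEXT, level REAL);
    CREATE TABLE table2 (ts TEXT, value REAL);
""")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [("2015-01-01 03:00:00", 1.0), ("2015-01-01 06:00:00", 2.0)])
start = datetime(2015, 1, 1)
fmt = "%Y-%m-%d %H:%M:%S"
# 15-minute readings from 00:15 to 06:00; reading k is taken at minute 15 * k.
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [((start + timedelta(minutes=15 * k)).strftime(fmt), float(k))
                  for k in range(1, 25)])

# For each table1 row, average the table2 readings in the 3 hours up to it.
# The strict lower bound keeps boundary readings out of the earlier window.
MOVING_AVG = """
SELECT t1.ts, t1.level,
       (SELECT AVG(t2.value)
        FROM table2 AS t2
        WHERE t2.ts <= t1.ts
          AND t2.ts > datetime(t1.ts, '-3 hours')) AS t2_moving_average
FROM table1 AS t1
ORDER BY t1.ts
"""
rows = conn.execute(MOVING_AVG).fetchall()
print(rows)
```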

Related

SQL Query to get distinct values from a table and the difference between ordered rows

I have a real time data table with time stamps for different data points
Time_stamp, UID, Parameter1, Parameter2, ....
I have 400 UIDs so each time_stamp is repeated 400 times
I want to write a query that uses this table to check whether the real-time data flow into the SQL database is working as expected: a new timestamp should be available every 5 minutes.
For this, what I usually do is query the DISTINCT values of time_stamp in the table, order them descending, do a visual inspection, and copy them to Excel to calculate the difference in minutes between subsequent distinct time_stamps.
Any difference over 5 minutes means I have a problem. I am trying to figure out how to do something similar in SQL, maybe getting a table that looks like this. I tried to use LEAD and DISTINCT together but could not write the code myself; I'm just getting started with SQL.
Time_stamp, LEAD over last timestamp
Thank you for your help
You can use the LAG analytic function as follows:
select t.* from
(select t.*,
        lag(Time_stamp) over (order by Time_stamp) as lg_ts
 from your_Table t) t
where timestampdiff(minute, lg_ts, Time_stamp) > 5
Or you can use NOT EXISTS as follows:
select t.*
from your_table t
where not exists
  (select 1 from your_table tt
   where tt.Time_stamp < t.Time_stamp
     and timestampdiff(minute, tt.Time_stamp, t.Time_stamp) <= 5)
and t.Time_stamp <> (select min(tt.Time_stamp) from your_table tt)
lead() or lag() is the right approach (depending on whether you want to see the row at the start or end of the gap).
For the time comparison, I recommend direct comparisons:
select t.*
from (select t.*,
             lead(Time_stamp) over (partition by uid order by Time_stamp) as next_time_stamp
      from t
     ) t
where next_time_stamp > time_stamp + interval 5 minute;
Note: a gap of exactly 5 minutes seems unlikely. You might want a fudge factor such as:
where next_time_stamp > time_stamp + interval (5*60 + 10) second;
In MySQL, timestampdiff() returns the number of whole units between the two values and drops any remainder. So the difference in minutes between 00:00:00 and 00:05:59 is 5, and the difference between 00:00:59 and 00:01:02 is 0.
So a difference of "5 minutes" could really be anything from 5 minutes up to 5 minutes and 59 seconds, and timestampdiff(minute, ...) > 5 only fires once the gap reaches a full 6 minutes.
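The LAG idea can be checked on a toy table. This sketch assumes SQLite 3.25 or later (needed for window functions), accessed through Python's sqlite3 module. The readings table, its columns and the timestamps are hypothetical. The query first collapses the many-rows-per-timestamp layout to distinct timestamps. It then compares each timestamp with the previous one in whole epoch seconds, which avoids the truncation of a minute-based difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, uid INTEGER)")
stamps = ["2021-01-01 00:00:00", "2021-01-01 00:05:00", "2021-01-01 00:10:00",
          "2021-01-01 00:20:00", "2021-01-01 00:25:00"]  # 00:15 is missing
# Each timestamp is repeated once per UID, as in the question (3 UIDs here).
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(ts, uid) for ts in stamps for uid in (1, 2, 3)])

# Distinct timestamps first, then LAG() to fetch the previous one;
# strftime('%s', ...) converts each timestamp to whole epoch seconds.
GAPS = """
SELECT prev_ts, ts, gap_s
FROM (SELECT ts, prev_ts,
             CAST(strftime('%s', ts) AS INTEGER)
               - CAST(strftime('%s', prev_ts) AS INTEGER) AS gap_s
      FROM (SELECT ts, LAG(ts) OVER (ORDER BY ts) AS prev_ts
            FROM (SELECT DISTINCT ts FROM readings) AS d) AS l) AS g
WHERE gap_s > ?
"""
gaps = conn.execute(GAPS, (5 * 60,)).fetchall()
print(gaps)
```

Passing 5 * 60 + 10 instead of 5 * 60 gives the 10-second fudge factor mentioned in the answer. The first timestamp has no predecessor, so its gap is NULL and it never appears in the output.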

Selecting first value of every minute in table

I've been trying to work this one out for a while now; maybe my problem is coming up with the correct search query. I'm not sure.
Anyway, the problem I'm having is that I have a table of data that has a new row added every second (imagine the structure {id, timestamp(datetime), value}). I would like to do a single query for MySQL to go through the table and output only the first value of each minute.
I thought about doing this with multiple queries with LIMIT and datetime >= (beginning of minute) but with the volume of data I'm collecting that is a lot of queries so it would be nicer to produce the data in a single query.
Sample data:
id datetime value
1 2015-01-01 00:00:00 128
2 2015-01-01 00:00:01 127
3 2015-01-01 00:00:04 129
4 2015-01-01 00:00:05 127
...
67 2015-01-01 00:00:59 112
68 2015-01-01 00:01:12 108
69 2015-01-01 00:01:13 109
Where I would want the result to select the rows:
1 2015-01-01 00:00:00 128
68 2015-01-01 00:01:12 108
Any ideas?
Thanks!
EDIT: Forgot to add: although the data arrives every second, it is not reliably recorded on the first second of every minute; it may start at :30 or :01 rather than :00 seconds past the minute.
EDIT 2: A nice-to-have (definitely not required for answer) would be a query that is flexible to also take an arbitrary number of minutes (rather than one row each minute)
SELECT t2.* FROM
( SELECT MIN(`datetime`) AS dt
FROM tbl
GROUP BY DATE_FORMAT(`datetime`,'%Y-%m-%d %H:%i')
) t1
JOIN tbl t2 ON t1.dt = t2.`datetime`
Or
SELECT *
FROM tbl
WHERE dt IN ( SELECT MIN(dt) AS dt
FROM tbl
GROUP BY DATE_FORMAT(dt,'%Y-%m-%d %H:%i'))
SELECT t1.*
FROM tbl t1
LEFT JOIN (
SELECT MIN(dt) AS dt
FROM tbl
GROUP BY DATE_FORMAT(dt,'%Y-%m-%d %H:%i')
) t2 ON t1.dt = t2.dt
WHERE t2.dt IS NOT NULL
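Here is a sketch of the MIN()-per-minute join that also covers the second edit (an arbitrary number of minutes). It runs on Python's sqlite3 module rather than MySQL, and the helper first_per_bucket is hypothetical. Instead of formatting the timestamp to the minute, it integer-divides the Unix epoch seconds by the bucket width. This means N-minute buckets are aligned to multiples of N minutes since 1970-01-01 UTC. In MySQL, something like UNIX_TIMESTAMP(dt) DIV (60 * N) could play the same role, though UNIX_TIMESTAMP() depends on the session time zone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, dt TEXT, value INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (1, "2015-01-01 00:00:00", 128), (2, "2015-01-01 00:00:01", 127),
    (3, "2015-01-01 00:00:04", 129), (4, "2015-01-01 00:00:05", 127),
    (67, "2015-01-01 00:00:59", 112), (68, "2015-01-01 00:01:12", 108),
    (69, "2015-01-01 00:01:13", 109),
])

def first_per_bucket(conn, minutes):
    """Return the earliest row in each `minutes`-wide bucket (hypothetical helper)."""
    # Epoch seconds integer-divided by the bucket width give a bucket number;
    # with minutes=1 this matches grouping on '%Y-%m-%d %H:%M'.
    sql = """
        SELECT t.id, t.dt, t.value
        FROM tbl AS t
        JOIN (SELECT MIN(dt) AS dt
              FROM tbl
              GROUP BY CAST(strftime('%s', dt) AS INTEGER) / (60 * ?)) AS m
          ON t.dt = m.dt
        ORDER BY t.dt
    """
    return conn.execute(sql, (minutes,)).fetchall()

print(first_per_bucket(conn, 1))
print(first_per_bucket(conn, 2))
```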
In MS SQL Server I would use CROSS APPLY, but as far as I know MySQL doesn't have it, so we can emulate it.
Make sure that you have an index on your datetime column.
Create a table of numbers, or in your case a table of minutes. If you have a table of numbers starting from 1 it is trivial to turn it into minutes in the necessary range.
SELECT
tbl.ID
,tbl.`dt`
,tbl.value
FROM
(
SELECT
MinuteValue
, (
SELECT tbl.id
FROM tbl
WHERE tbl.`dt` >= Minutes.MinuteValue
ORDER BY tbl.`dt`
LIMIT 1
) AS ID
FROM Minutes
) AS IDs
INNER JOIN tbl ON tbl.ID = IDs.ID
For each minute, find the first row whose timestamp is greater than or equal to the start of that minute. In MySQL, a nested SELECT in the column list can return only one column, not the full row. So the query first builds a derived table with two columns (the minute and the id from the original table), then looks up the full rows in the original table by those ids.
SQL Fiddle
I've created a table of Minutes in the SQL Fiddle with the necessary values to make example simple. In real life you would have a more generic table.
Here is SQL Fiddle that uses a table of numbers, just for illustration.
In any case, you do need to know in advance somehow the range of dates/numbers you are interested in.
It is trivial to make it work for any interval of minutes. If you need results every 5 minutes, just generate a table of minutes that has values not every 1 minute, but every 5 minutes. The main query would remain the same.
It may be more efficient, because here you don't join the big table to itself and you don't make calculations on the datetime column, so the server should be able to use the index on it.
The example that I made assumes that each minute has at least one row in the big table. If some minutes might have no data at all, you'd need to add an extra check in the WHERE clause to make sure that the row found is still within that minute.
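The minutes-table approach, including the extra check from the last paragraph, might look like the sketch below. It uses Python's sqlite3 module in place of MySQL, and the minutes table and sample rows are hypothetical. The upper bound t.dt < datetime(minute_value, '+1 minute') is the extra check. Without it, the empty minute 00:02 would pick up the 00:03:05 row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (id INTEGER PRIMARY KEY, dt TEXT, value INTEGER);
    CREATE INDEX tbl_dt ON tbl (dt);            -- the index the answer recommends
    CREATE TABLE minutes (minute_value TEXT PRIMARY KEY);
""")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (1, "2015-01-01 00:00:00", 128), (2, "2015-01-01 00:00:01", 127),
    (68, "2015-01-01 00:01:12", 108), (69, "2015-01-01 00:01:13", 109),
    (70, "2015-01-01 00:03:05", 111),  # no rows at all during 00:02
])
conn.executemany("INSERT INTO minutes VALUES (?)",
                 [(f"2015-01-01 00:0{m}:00",) for m in range(4)])

# For each minute, find the id of the first row at or after its start,
# but only if that row is still inside the same minute (the extra check).
FIRST_IN_MINUTE = """
SELECT tbl.id, tbl.dt, tbl.value
FROM (SELECT m.minute_value,
             (SELECT t.id FROM tbl AS t
              WHERE t.dt >= m.minute_value
                AND t.dt < datetime(m.minute_value, '+1 minute')
              ORDER BY t.dt
              LIMIT 1) AS id
      FROM minutes AS m) AS ids
JOIN tbl ON tbl.id = ids.id
ORDER BY tbl.dt
"""
rows = conn.execute(FIRST_IN_MINUTE).fetchall()
print(rows)
```

The empty minute yields a NULL id, and the inner join drops it.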
select * from table where timestamp LIKE "%-%-% %:%:00" could work.
This is similar to this question: Stack Overflow Date SQL Query Question
Edit: This probably would work better:
select *, date_format(timestamp, '%Y-%m-%d %H:%i') as the_minute, count(*)
from table
group by the_minute
order by the_minute
Similar to this question here: mysql select date format
I'm not really sure, but you could try this:
SELECT MIN(timestamp) FROM table WHERE YEAR(timestamp)=2015 GROUP BY DATE(timestamp), HOUR(timestamp), MINUTE(timestamp)

DB, how to select data based on time and particular interval

I have a table in my database, and my program inserts data into that table every 10 minutes.
The table has a field recording the insert date and time.
Now I want to retrieve that data, but I don't want hundreds of rows to come out.
I want to get 1 record for every half hour based on the insert timestamp (so fewer than 50 in total per day).
That 1 record can be either a random pick or the average of each interval.
Sorry for the ambiguity; I just want to figure out how to select by interval.
Let's say:
Table name: network_speed
ID     Speed   Insert_time
1      10      10:02am
2      12      10:12am
...
123    17      9:23am
I want all of them, but the output must be the average of each half-hour interval's records.
How can I write a query to achieve this?
Here is a query that calculates half-hour intervals on a specific day (2013-09-04).
SELECT ID, Speed, Insert_time,
-- minutes since midnight divided by 30 and rounded down: half-hour number 0..47
FLOOR(TIMESTAMPDIFF(MINUTE, '2013-09-04', Insert_time)/30) AS 'interval'
FROM network_speed
WHERE DATE(Insert_time) = '2013-09-04';
Use that in a nested query to get stats on the records in the intervals.
SELECT IT.interval, COUNT(ID), MIN(Insert_time), MAX(Insert_time), AVG(Speed)
FROM
(SELECT ID, Speed, Insert_time,
FLOOR(TIMESTAMPDIFF(MINUTE, '2013-09-04', Insert_time)/30) AS 'interval'
FROM network_speed
WHERE DATE(Insert_time) = '2013-09-04') AS IT
GROUP BY IT.interval;
Here it is used to get the first record in each interval.
SELECT NS.*
FROM
(SELECT IT.interval, MIN(ID) AS 'first_id'
FROM
(SELECT ID, Speed, Insert_time,
FLOOR(TIMESTAMPDIFF(MINUTE, '2013-09-04', Insert_time)/30) AS 'interval'
FROM network_speed
WHERE DATE(Insert_time) = '2013-09-04') AS IT
GROUP BY IT.interval) AS MI,
network_speed AS NS
WHERE MI.first_id = NS.ID;
Hope that helps.
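Here is a sketch of the half-hour grouping, run through Python's sqlite3 module as a stand-in for MySQL. SQLite has no TIMESTAMPDIFF, so the minutes since midnight are computed from epoch seconds. The sample rows are hypothetical. Integer division by 30 numbers the half hours of the day from 0 to 47, so half_hour 20 covers 10:00 to 10:29.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE network_speed "
             "(ID INTEGER PRIMARY KEY, Speed INTEGER, Insert_time TEXT)")
conn.executemany("INSERT INTO network_speed VALUES (?, ?, ?)", [
    (1, 10, "2013-09-04 10:02:00"), (2, 12, "2013-09-04 10:12:00"),
    (3, 14, "2013-09-04 10:22:00"), (4, 20, "2013-09-04 10:32:00"),
    (5, 22, "2013-09-04 10:42:00"),
    (6, 99, "2013-09-05 10:02:00"),  # a different day, filtered out below
])

# (seconds since midnight) / 60 / 30 with integer division = half-hour number.
HALF_HOURS = """
SELECT half_hour, COUNT(ID), AVG(Speed)
FROM (SELECT ID, Speed,
             (CAST(strftime('%s', Insert_time) AS INTEGER)
              - CAST(strftime('%s', ?) AS INTEGER)) / 60 / 30 AS half_hour
      FROM network_speed
      WHERE date(Insert_time) = ?) AS it
GROUP BY half_hour
ORDER BY half_hour
"""
day = "2013-09-04"
rows = conn.execute(HALF_HOURS, (day, day)).fetchall()
print(rows)
```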
Is this what you need?
SELECT HOUR(ts) as hr, fld1, fld2 from tbl group by hr
This query extracts only the hour from the timestamp and then groups the result by that hour, so you get one row for each hour.

Mysql maximum rows in a variable timeframe

I'm making a fitness logbook where indoor rowers can log their results.
To make it interesting and motivating, I'm implementing an achievement system.
I'd like to have an achievement that someone earns if they row more than 90 times within 24 weeks.
Does anybody have some hints on how I can implement this in MySQL?
The MySQL table for the logbook is pretty straightforward: id, userid, date (timestamp), etc. (the rest is omitted because it doesn't really matter).
The gist is that the first row date and the last one can't be more than 24 weeks apart.
I assume from your application that you want the most recent 24 weeks.
In MySQL, you can do this as:
select lb.userid
from logbook lb
where datediff(now(), lb.date) <= 7*24
group by userid
having count(*) >= 90
If you need it for an arbitrary 24-week period, can you modify the question?
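For an arbitrary 24-week period (the question's "first and last row can't be more than 24 weeks apart"), one option is a self-join. It treats every session as the possible start of a window and counts the same user's sessions in the following 168 days. This is a sketch that does not come from the answers. It uses Python's sqlite3 module instead of MySQL, renames the date column to row_date, and lowers the threshold from 90 to 3 to keep the sample small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logbook "
             "(id INTEGER PRIMARY KEY, userid INTEGER, row_date TEXT)")
conn.executemany("INSERT INTO logbook (userid, row_date) VALUES (?, ?)", [
    (1, "2020-01-01"), (1, "2020-02-01"), (1, "2020-03-01"),  # 3 sessions in 60 days
    (2, "2020-01-01"), (2, "2020-05-01"), (2, "2020-09-01"),  # spread over 8 months
])

# Every session a starts a candidate window [a.row_date, a.row_date + 168 days);
# count that user's sessions b inside it and keep users reaching the threshold.
ACHIEVERS = """
SELECT DISTINCT a.userid
FROM logbook AS a
JOIN logbook AS b
  ON b.userid = a.userid
 AND b.row_date >= a.row_date
 AND b.row_date < date(a.row_date, '+168 days')
GROUP BY a.id
HAVING COUNT(*) >= ?
ORDER BY a.userid
"""
# The question's threshold is 90; 3 keeps the sample small.
achievers = [uid for (uid,) in conn.execute(ACHIEVERS, (3,))]
print(achievers)
```

In MySQL, the window bound would read b.`date` < a.`date` + INTERVAL 24 WEEK. The self-join grows with the number of sessions per user, so an index on (userid, date) is likely to matter on a large logbook.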
Just run an SQL query that counts the number of rows a user has between now and 24 weeks ago. This is a pretty straightforward query.
Look at using something with datediff in mysql to get the difference between now and 24 weeks ago.
After you have a script set up to do this, set up a cron job to run either every day or every week and do some automation on this.
I think you should create a table achievers which you populate with the achievers of each day.
You can set up a recurring (daily, right before midnight) event in which you run queries like these:
delete from achievers;
insert into achievers (userid)
select userid
from logbook
where `date` < current_timestamp
  and `date` > current_timestamp - interval 24 week
group by userid
having count(*) >= 90;
For events in mysql: http://dev.mysql.com/doc/refman/5.1/en/events-overview.html
This query will give you the list of users with at least 90 sessions in the last 24 weeks (168 days):
select userid, count(id) from `table` where `date` BETWEEN DATE_SUB(CURDATE(), INTERVAL 168 DAY) AND CURDATE() group by userid having count(id) >= 90

Padding MYSQL data with missing dates when comparing year over year stats?

I have a table that tracks emails sent. It is pretty simple.
ID | DATETIME | E-MAIL | SUBJECT | MESSAGE
I have been collecting data for several years. Some days I don't have any entries in the table.
query1:
SELECT COUNT(ID) FROM emails
WHERE DATE(datetime) >= 'XXXX-XX-XX'
AND DATE(datetime) <= 'ZZZZ-ZZ-ZZ'
GROUP BY DATE(datetime)
I then use some PHP to get the dates one year prior for both XXXX and ZZZZ and run the second query, which is the same as the first...
query2:
SELECT COUNT(ID) from emails
WHERE DATE(datetime) >= 'XXXX-XX-XX'
AND DATE(datetime) <= 'ZZZZ-ZZ-ZZ'
GROUP BY DATE(datetime)
I am using a charting package to compare how many emails I got for a date range and then I overlay how many emails I got for the same range only one year prior. This is two queries right now and I chart the results.
The issue is where mysql does not have any emails for 2011 for a day in question, but has a few in 2012 for the same day.
Combining the results and graphing them skews the results since I am missing a date and a 0 value for last year for that day, effectively making all my values no longer match up.
2011-03-01  10    2012-03-01  4
2011-03-02  4     2012-03-02  2
2011-03-03  6     2012-03-04  1   <---- see where the two queries end up diverging? (I had nothing logged for 2012-03-03, so naturally it was not in the results.)
Is there a way I can get mysql to output the data I need including dates where value appear in one year but not another OR if no values appear in either year (still need date and 0) so my chart works?
I cannot seem to figure out how to do this...
Thanks!
There are a few different ways to get the results for a contiguous set of dates. My favourite is to create the full set required using a dummy table, or an existing contiguous set of ids from an auto-increment primary key. Something like this:
SELECT '2011-01-01' + INTERVAL (id -1) DAY
FROM dummy
WHERE id BETWEEN 1 AND 365
This will return a full set of days for 2011 which can then be LEFT JOINed to your emails table to get the counts -
SELECT `dates`.`date`, COUNT(emails.id)
FROM (
SELECT '2011-01-01' + INTERVAL (id - 1) DAY AS `date`, '2011-01-01 23:59:59' + INTERVAL (id - 1) DAY AS `end_of_day`
FROM dummy
WHERE id BETWEEN 1 AND 365
) `dates`
LEFT JOIN emails
ON `emails`.`datetime` BETWEEN `dates`.`date` AND `dates`.`end_of_day`
GROUP BY `dates`.`date`
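The calendar-padding idea can be checked with a small sketch that uses Python's sqlite3 module in place of MySQL. The emails column name sent_at and the sample rows are hypothetical. The sketch reproduces the question's 2012 range, where 2012-03-03 has no emails. For the join, it uses a half-open range [day, next day) instead of ending each day at '23:59:59'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dummy (id INTEGER NOT NULL PRIMARY KEY);
    CREATE TABLE emails (id INTEGER PRIMARY KEY, sent_at TEXT);
""")
conn.executemany("INSERT INTO dummy VALUES (?)", [(i,) for i in range(1, 11)])
conn.executemany("INSERT INTO emails (sent_at) VALUES (?)", [
    ("2012-03-01 09:00:00",), ("2012-03-01 10:00:00",), ("2012-03-01 11:00:00",),
    ("2012-03-01 12:00:00",), ("2012-03-02 08:00:00",), ("2012-03-02 17:30:00",),
    ("2012-03-04 13:15:00",),  # nothing on 2012-03-03
])

# One row per day built from the dummy ids, then a LEFT JOIN so that days
# without emails still appear with COUNT(e.id) = 0.
PADDED = """
SELECT d.day, COUNT(e.id)
FROM (SELECT date(?, '+' || (id - 1) || ' days') AS day
      FROM dummy
      WHERE id BETWEEN 1 AND ?) AS d
LEFT JOIN emails AS e
  ON e.sent_at >= d.day AND e.sent_at < date(d.day, '+1 day')
GROUP BY d.day
ORDER BY d.day
"""
rows = conn.execute(PADDED, ("2012-03-01", 4)).fetchall()
print(rows)
```

Running it once for the current range and once for the same range a year earlier gives two series with exactly one row per day. The two series then line up position by position in the chart.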
To populate your dummy / seq table, you can insert the first few values (0 to 10) manually and then use INSERT ... SELECT to add the rest. Each run of the INSERT ... SELECT doubles the number of rows:
CREATE TABLE dummy (id INTEGER NOT NULL PRIMARY KEY);
INSERT INTO dummy VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
SET @tmp := (SELECT MAX(id) FROM dummy) + 1;
INSERT INTO dummy
SELECT @tmp + id
FROM dummy;
You need to execute the SET query before each run of the INSERT ... SELECT query.