mysql group based on time intervals

I want to query a table which is like
Table structure for table archive
|------
|Column|Type|Null|Default
|------
|//**id**//|int(11)|No|
|datetime|timestamp|No|CURRENT_TIMESTAMP
|gatewayid|int(11)|No|
|RSSI|float|No|
|distance|float|No|
|beaconid|int(11)|No|
== Dumping data for table archive
|1|2017-08-22 12:14:19|1|-65|36|1
|2|2017-08-22 12:14:19|2|-60|30|1
|3|2017-08-22 12:14:19|3|-60|30|1
|4|2017-08-22 12:14:19|1|-52|63|2
|5|2017-08-22 12:14:19|2|-36|33|2
|6|2017-08-22 12:14:19|3|-65|33|2
|7|2017-08-22 12:14:19|1|-69|66|3
|8|2017-08-22 12:14:19|2|-65|33|3
|9|2017-08-22 12:14:19|3|-66|33|3
|10|2017-08-22 12:16:09|1|-65|36|1
|11|2017-08-22 12:16:09|2|-60|30|1
|12|2017-08-22 12:16:09|3|-60|30|1
|13|2017-08-22 12:16:09|1|-52|63|2
|14|2017-08-22 12:16:09|2|-36|33|2
|15|2017-08-22 12:16:09|3|-65|33|2
|16|2017-08-22 12:16:09|1|-69|66|3
|17|2017-08-22 12:16:09|2|-65|33|3
|18|2017-08-22 12:16:09|3|-66|33|3
|19|2017-08-22 12:32:05|1|-65|36|1
|20|2017-08-22 12:32:05|2|-60|30|1
|21|2017-08-22 12:32:05|3|-60|30|1
|22|2017-08-22 12:32:05|1|-52|63|2
|23|2017-08-22 12:32:05|2|-36|33|2
|24|2017-08-22 12:32:05|3|-65|33|2
|25|2017-08-22 12:32:05|1|-69|66|3
I want to average RSSI values based on these rules
- group based on gatewayid, beaconid and datetime
- the group by datetime should be in every 5 minutes, for example
In other words, I want the average RSSI for rows that share the same beaconid and gatewayid and were added within the same 5-minute interval.
I have written this query
select DATE_ADD( '1900-01-01T00:00:00',INTERVAL 15+TIMESTAMPDIFF(minute, '1900-01-01T00:00:00', datetime) minute),
(sum(RSSI)/count(*)) as mean_rssi,
(sum(distance)/count(*)) as mean_distance,
beaconid,
gatewayid
from archive
GROUP by DATE_ADD( '1900-01-01T00:00:00',INTERVAL 15+TIMESTAMPDIFF(minute, '1900-01-01T00:00:00', datetime) minute),
beaconid,
gatewayid
Here is sqlfiddle for create statement
This query returns all rows unchanged. What am I doing wrong?
thanks

Your query appears to be correct; it's just that your data is unique on interval/beaconid/gatewayid, so it happens to return 'all' rows...
If I understand your data you probably want to group by 5 min intervals, and to get a value that is the same for every five min interval you may opt to just divide the timestamp by 300 seconds - something like:
select min(`datetime`) as `start`,
(sum(RSSI)/count(*)) as mean_rssi,
(sum(distance)/count(*)) as mean_distance,
beaconid,
gatewayid
from archive
group by FLOOR(UNIX_TIMESTAMP(`datetime`) / (60*5)),
beaconid,
gatewayid
also use sqlfiddle or rextester it is much easier to help you if you use those...
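To see what dividing the timestamp by 300 seconds and flooring does, here is a minimal Python sketch of the same bucketing. The helper names (`bucket_of`, `mean_rssi`) are made up for illustration, timestamps are treated as UTC, and the 12:12:00 row is a hypothetical addition; the other three rows are taken from the question's data for gateway 1 / beacon 1.

```python
from collections import defaultdict
from datetime import datetime, timezone

BUCKET_SECONDS = 5 * 60  # width of one bucket: five minutes


def bucket_of(ts: datetime) -> int:
    """Number of whole five-minute periods since the Unix epoch (ts treated as UTC)."""
    return int(ts.replace(tzinfo=timezone.utc).timestamp()) // BUCKET_SECONDS


def mean_rssi(rows):
    """Average RSSI per (bucket, gatewayid, beaconid).

    rows: iterable of (datetime, gatewayid, beaconid, rssi) tuples.
    """
    groups = defaultdict(list)
    for ts, gatewayid, beaconid, rssi in rows:
        groups[(bucket_of(ts), gatewayid, beaconid)].append(rssi)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


rows = [
    (datetime(2017, 8, 22, 12, 12, 0), 1, 1, -55.0),   # hypothetical extra row
    (datetime(2017, 8, 22, 12, 14, 19), 1, 1, -65.0),  # from the question
    (datetime(2017, 8, 22, 12, 16, 9), 1, 1, -65.0),   # from the question
    (datetime(2017, 8, 22, 12, 32, 5), 1, 1, -65.0),   # from the question
]
result = mean_rssi(rows)
```

Note that the buckets are aligned to the clock (12:10, 12:15, 12:20, ...), so 12:14:19 and 12:16:09 land in different buckets even though they are less than five minutes apart.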

Related

SQL Query to get distinct values from a table and the difference between ordered rows

I have a real time data table with time stamps for different data points
Time_stamp, UID, Parameter1, Parameter2, ....
I have 400 UIDs so each time_stamp is repeated 400 times
I want to write a query that uses this table to check if the real time data flow to the SQL database is working as expected - new timestamp every 5 minute should be available
For this what I usually do is query the DISTINCT values of time_stamp in the table and order descending - do a visual inspection and copy to excel to calculate the difference in minutes between subsequent distinct time_stamp
Any difference over 5 min means I have a problem. I am trying to figure out how I can do something similar in SQL, maybe get a table that looks like this. I tried to use LEAD and DISTINCT together but could not write the code myself; I'm just getting started with SQL
Time_stamp, LEAD over last timestamp
Thank you for your help
You can use lag analytical function as follows:
select t.* from
(select t.*,
lag(Time_stamp) over (order by Time_stamp) as lg_ts
from your_table t) t
where timestampdiff(minute, lg_ts, Time_stamp) > 5
Or you can also use the not exists as follows:
select t.*
from your_table t
where not exists
(select 1 from your_table tt
where tt.Time_stamp < t.Time_stamp
and timestampdiff(minute, tt.Time_stamp, t.Time_stamp) <= 5)
and t.Time_stamp <> (select min(tt.Time_stamp) from your_table tt)
lead() or lag() is the right approach (depending on whether you want to see the row at the start or end of the gap).
For the time comparison, I recommend direct comparisons:
select t.*
from (select t.*,
lead(Time_stamp) over (partition by uid order by Time_stamp) as next_timestamp
from t
) t
where next_timestamp > time_stamp + interval 5 minute;
Note: exactly 5 minutes seems unlikely. You might want a fudge factor such as:
where next_timestamp > time_stamp + interval 5*60 + 10 second;
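Be careful with timestampdiff(): it returns whole units and discards the remainder. In MySQL it counts complete minutes, so the difference in minutes between 00:00:59 and 00:01:02 is 0, and so is the difference between 00:00:00 and 00:00:59. (Functions that count "boundaries" crossed, such as SQL Server's DATEDIFF, return 1 for the first pair.)
So, a reported difference of "5 minutes" could really be anything from exactly 5 minutes to 5 minutes and 59 seconds.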
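The same check can be sketched outside the database. The Python below is only an illustration of the idea of comparing each distinct timestamp with the next one directly, with a small fudge factor; `find_gaps` and the sample timestamps are hypothetical and not taken from the question.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps,
              expected=timedelta(minutes=5),
              slack=timedelta(seconds=10)):
    """Return (previous, next) pairs of distinct timestamps that are
    further apart than expected + slack."""
    distinct = sorted(set(timestamps))  # same role as SELECT DISTINCT ... ORDER BY
    return [(a, b) for a, b in zip(distinct, distinct[1:])
            if b - a > expected + slack]


# Hypothetical load times; each one would repeat once per UID in the real table.
stamps = [
    datetime(2021, 1, 1, 0, 0, 0),
    datetime(2021, 1, 1, 0, 0, 0),
    datetime(2021, 1, 1, 0, 5, 2),   # two seconds late, within the slack
    datetime(2021, 1, 1, 0, 15, 0),  # the load around 00:10 is missing
]
gaps = find_gaps(stamps)
```

Here `gaps` contains a single pair, 00:05:02 to 00:15:00. Comparing the actual difference against a threshold keeps the seconds that a whole-minute difference would throw away.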

How to generate faster mysql query with 1.6M rows

I have a table that has 1.6M rows. Whenever I use the query below, I get an average of 7.5 seconds.
select * from table
where pid = 170
and cdate between '2017-01-01 0:00:00' and '2017-12-31 23:59:59';
I tried adding a LIMIT 1000 or 10000, or changing the date range to filter for 1 month, but it still takes an average of 7.5s. I tried adding a composite index on pid and cdate, but that made it 1 second slower.
Here is the INDEX list
https://gist.github.com/primerg/3e2470fcd9b21a748af84746554309bc
Can I still make it faster? Is this an acceptable performance considering the amount of data?
Looks like the index is missing. Create this index and see if it helps.
CREATE INDEX cid_date_index ON table_name (pid, cdate);
And also modify your query to below.
select * from table
where pid = 170
and cdate between CAST('2017-01-01 0:00:00' AS DATETIME) and CAST('2017-12-31 23:59:59' AS DATETIME);
Please provide SHOW CREATE TABLE clicks.
How many rows are returned? If it is 100K rows, the effort to shovel that many rows is significant. And what will you do with that many rows? If you then summarize them, consider summarizing in SQL!
Do have cdate as DATETIME.
Do you use id for anything? Perhaps this would be better:
PRIMARY KEY (pid, cdate, id) -- to get benefit from clustering
INDEX(id) -- if still needed (and to keep AUTO_INCREMENT happy)
This smells like Data Warehousing. DW benefits significantly from building and maintaining Summary table(s), such as one that has the daily click count (etc), from which you could very rapidly sum up 365 counts to get the answer.
CAST is unnecessary. Furthermore 0:00:00 is optional -- it can be included or excluded for either DATE or DATETIME. I prefer
cdate >= '2017-01-01'
AND cdate < '2017-01-01' + INTERVAL 1 YEAR
to avoid leap year, midnight, date arithmetic, etc.
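A small Python sketch of why the half-open range (>= start, < next start) is easier to get right than BETWEEN with a hand-written 23:59:59 end point. The function names are hypothetical, and the edge case assumes the column can store fractional seconds.

```python
from datetime import datetime


def in_year_half_open(cdate: datetime, year: int) -> bool:
    """cdate >= Jan 1 of year AND cdate < Jan 1 of the following year."""
    return datetime(year, 1, 1) <= cdate < datetime(year + 1, 1, 1)


def in_year_between(cdate: datetime, year: int) -> bool:
    """BETWEEN 'year-01-01 00:00:00' AND 'year-12-31 23:59:59' (inclusive)."""
    return datetime(year, 1, 1) <= cdate <= datetime(year, 12, 31, 23, 59, 59)


# A click recorded half a second before midnight on New Year's Eve.
edge = datetime(2017, 12, 31, 23, 59, 59, 500000)

half_open_hit = in_year_half_open(edge, 2017)  # True: still inside 2017
between_hit = in_year_between(edge, 2017)      # False: falls after 23:59:59
```

The half-open form also never needs to know how many days the month or year has, since the end point is simply the start of the next period.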

selecting all records between a certain period in mysql

I have the following table,for example:
id....name....fromtime....totime
1.....a.......00:00.......00:09:59
2.....a.......00:10.......00:19:59
3.....a.......00:20.......00:29:59
4.....a.......00:30.......00:39:59
5.....a.......00:40.......00:49:59
I want to retrieve all records that belong to a (not a problem) and are between 00:05 and 00:25 that is, for the above example, are rows 1 to 3 leaving 4 and 5 out.
How can I accomplish that using mysql?
Thanks
Look into using the 'between' clause with a new column that combines fromtime and totime.
For instance, SELECT id, fromtime-totime As "elapsed_time" FROM your_table WHERE elapsed_time BETWEEN 5 AND 25. Or something like that. The query will vary depending on the types of your columns.
See more here:
http://www.tutorialspoint.com/mysql/mysql-between-clause.htm
http://www.geeksengine.com/database/basic-select/arithmetic-operations.php
AND FromTime >= subtime( TIME( $time ) , '00:10:00' ) - to also get the first row
AND ToTime <= addtime( TIME( $time ) , '00:20:00' ) - 00:20:00 is an example, of course
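One way to read the requirement is as an interval-overlap test: a row qualifies when its own [fromtime, totime] range overlaps the requested window. The Python below is a sketch of that reading using the question's sample rows; `overlaps` is a hypothetical helper, and the times are interpreted as HH:MM[:SS].

```python
from datetime import time


def overlaps(row_from: time, row_to: time, win_from: time, win_to: time) -> bool:
    """Two closed ranges overlap when each one starts before the other ends."""
    return row_from <= win_to and row_to >= win_from


# (id, name, fromtime, totime) as listed in the question
rows = [
    (1, "a", time(0, 0), time(0, 9, 59)),
    (2, "a", time(0, 10), time(0, 19, 59)),
    (3, "a", time(0, 20), time(0, 29, 59)),
    (4, "a", time(0, 30), time(0, 39, 59)),
    (5, "a", time(0, 40), time(0, 49, 59)),
]

window_from, window_to = time(0, 5), time(0, 25)
matching_ids = [
    row_id
    for row_id, name, row_from, row_to in rows
    if name == "a" and overlaps(row_from, row_to, window_from, window_to)
]
```

Under that reading `matching_ids` is `[1, 2, 3]`, which matches the rows the question expects. The equivalent WHERE clause would compare fromtime against the end of the window and totime against its start.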

average rows in a column that are between 5 minutes

I would like to ask how I can take the average of rows in a column that fall within the same 5 minutes.
To be more precise, I have a table like this
id-----link_id---------date---------------------speed
0---------123------(24/4/2014 12:03:34)----------45
1---------123------(24/4/2014 12:04:34)----------43
2---------127------(24/4/2014 12:04:37)----------50
3---------123------(28/4/2014 12:03:34)----------60
I would like to create a new table that holds the average speed for rows that have the same link_id and fall within the same 5 minutes.
In the case mentioned above, only the first two rows meet the requirements,
and I want a new table like this
id-----link_id---------date---------------------speed
0---------123------(24/4/2014 12:00:00)----------44
2---------127------(24/4/2014 12:00:00)----------50
3---------123------(28/4/2014 12:00:00)----------60
Which query do I have to use to create a new table with those requirements?
thank you in advance
It is not clear what you mean by 'average of speed for rows that ... are between five minutes.' So I will guess.
I guess you want to compute the averages for each distinct five minute interval. For example, you want averages of all items with timestamps from 2014-04-24 12:00:00 to 2014-04-24 12:04:59, then another average for items with timestamps from 2014-04-24 12:05:00 to 2014-04-24 12:09:59, and so forth.
To do this, you need to start with an expression that will take any DATETIME value and round it down to the beginning of its five-minute interval. How do you get that?
First, this expression will round down a timestamp to the beginning of the minute in which it occurs:
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00')
This expression gives the number of minutes past the hour, modulo 5.
MINUTE(`date`)%5
So, this expression gives you the rounded-down DATETIME you need:
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE
Great. Now we need to use that in an aggregate query to compute the average speeds.
SELECT link_id,
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE AS five_min,
AVG(speed) AS avg_speed
FROM mytable
GROUP BY link_id,
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE
ORDER BY link_id,
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE
This will do the trick you need done. There will be one row for each distinct link_id and five-minute interval of time. The time interval will be named by giving the time at which it begins. Each row will contain the average speed for observations in that time interval.
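As a cross-check of the rounding logic, here is a small Python sketch that applies the same two steps (truncate to the minute, then subtract the minutes modulo 5) to the four rows from the question. `floor_to_five_minutes` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def floor_to_five_minutes(ts: datetime) -> datetime:
    """Truncate to the minute, then step back (minute % 5) minutes."""
    return ts.replace(second=0, microsecond=0) - timedelta(minutes=ts.minute % 5)


# (link_id, date, speed) rows from the question
rows = [
    (123, datetime(2014, 4, 24, 12, 3, 34), 45),
    (123, datetime(2014, 4, 24, 12, 4, 34), 43),
    (127, datetime(2014, 4, 24, 12, 4, 37), 50),
    (123, datetime(2014, 4, 28, 12, 3, 34), 60),
]

groups = defaultdict(list)
for link_id, ts, speed in rows:
    groups[(link_id, floor_to_five_minutes(ts))].append(speed)

# One entry per (link_id, five-minute interval), ordered like the ORDER BY above.
averages = {key: sum(v) / len(v) for key, v in sorted(groups.items())}
```

For the sample data this gives 44.0 for link 123 at 2014-04-24 12:00, 60.0 for link 123 at 2014-04-28 12:00, and 50.0 for link 127 at 2014-04-24 12:00, which matches the table the question asks for.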
It's helpful when creating your specification for this kind of query to think very carefully about what you want each row of your result set to contain. If you do that, you will probably find that your query flows naturally from your specification.
Here's a more extensive writeup on how to do this sort of thing.
http://www.plumislandmedia.net/mysql/sql-reporting-time-intervals/

Select rows that are less than 5 minutes old using DATE_SUB

I have a table that is getting hundreds of requests per minute. The issue that I'm having is that I need a way to select only the rows that have been inserted in the past 5 minutes. I am trying this:
SELECT count(id) as count, field1, field2
FROM table
WHERE timestamp > DATE_SUB(NOW(), INTERVAL 5 MINUTE)
ORDER BY timestamp DESC
My issue is that it returns 70k+ results and counting. I am not sure what it is that I am doing wrong, but I would love to get some help on this. In addition, if there were a way to group them by minute to have it look like:
| count | field1 | field2 |
----------------------------
I'd love the help and direction on this, so please let me know your thoughts.
You don't really need DATE_ADD/DATE_SUB, date arithmetic is much simpler:
SELECT COUNT(id), DATE_FORMAT(`timestamp`, '%Y-%m-%d %H:%i')
FROM `table`
WHERE `timestamp` >= CURRENT_TIMESTAMP - INTERVAL 5 MINUTE
GROUP BY 2
ORDER BY 2
The following seems like it would work which is mighty close to what you had:
SELECT
MINUTE(date_field) as `minute`,
count(id) as count
FROM `table`
WHERE date_field > date_sub(now(), interval 5 minute)
GROUP BY MINUTE(date_field)
ORDER BY MINUTE(date_field);
Note the added column to show the minute and the GROUP BY clause that gathers up the results into the corresponding minute. Imagine that you had 5 little buckets labeled with the last 5 minutes. Now imagine you tossed each row into the bucket for its minute, so every row that was 4 minutes old lands in the same bucket. count() will then count the number of entries found in each bucket. That's a quick visualization of how GROUP BY works. http://www.tizag.com/mysqlTutorial/mysqlgroupby.php seems to be a decent writeup on GROUP BY if you need more info.
If you run that and the number of entries in each minute seems too high, you'll want to do some troubleshooting. Try replacing COUNT(id) with MAX(date_field) and MIN(date_field) so you can get an idea what kind of dates it is capturing. If MIN() and MAX() are inside the range, you may have more data written to your database than you realize.
You might also double check that you don't have dates in the future as they would all be > now(). The MIN()/MAX() checks mentioned above should identify that too if it's a problem.
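The bucket picture and the troubleshooting checks can be sketched in a few lines of Python. This is an illustration only: `per_minute_counts`, the fixed `now` value, and the sample timestamps are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def per_minute_counts(timestamps, now, window=timedelta(minutes=5)):
    """Keep rows newer than now - window and count them per calendar minute."""
    cutoff = now - window
    recent = [ts for ts in timestamps if ts > cutoff]  # same test as the WHERE clause
    counts = Counter(ts.replace(second=0, microsecond=0) for ts in recent)
    return dict(sorted(counts.items())), recent


now = datetime(2020, 1, 1, 12, 0, 30)
stamps = [
    datetime(2020, 1, 1, 11, 50, 0),   # older than five minutes, filtered out
    datetime(2020, 1, 1, 11, 58, 10),
    datetime(2020, 1, 1, 11, 58, 40),
    datetime(2020, 1, 1, 11, 59, 5),
    datetime(2020, 1, 2, 0, 0, 0),     # in the future, still passes "> cutoff"
]

counts, recent = per_minute_counts(stamps, now)

# The MIN()/MAX() sanity check: anything later than now is suspicious.
oldest, newest = min(recent), max(recent)
future_rows = [ts for ts in recent if ts > now]
```

With this sample, the 11:58 bucket holds 2 rows, the 11:59 bucket holds 1, and the future-dated row shows up both in its own bucket and in `future_rows`. Bucketing on the full date and minute, rather than on the minute number alone, keeps rows from different hours apart.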