DB, how to select data based on time and particular interval - mysql

I have a table in my database, my program will insert data to that table in every 10 mins.
The table has a field recording the insert date and time.
Now I want to retrieve those data, but I don't want hundreds of data comes out.
I want to get 1 records from every half hour based on insert time stamp (so less than 50 in total of a day).
For that 1 record, it can be either random pick or average from each interval.
Sorry for the ambiguit, cuz I just wanna figure out the way to select from intervals
Let say,
Table name: network_speed
----------------------------------
ID. ....... Speed ......... Insert_time
1 ....... 10 ......... 10:02am......
2 ....... 12 ......... 10:12am......
...
...
...
123 ....... 17 ........ 9:23am........
To get them all but out put must be average of each half hour record
How can I write a query to achieve this?

Here is a query that calculates half hour intervals on a specific day ( 2013-09-04).
SELECT ID, Speed, Insert_time,
ROUND(TIMESTAMPDIFF(MINUTE, '2013-09-04', Insert_time)/48) AS 'interval'
FROM network_speed
WHERE DATE(Insert_time) = '2013-09-04';
Use that in a nested query to get stats on the records in the intervals.
SELECT IT.interval, COUNT(ID), MIN(Insert_time), MAX(Insert_time), AVG(Speed)
FROM
(SELECT ID, Speed, Insert_time,
ROUND(TIMESTAMPDIFF(MINUTE, '2013-09-04', Insert_time)/48) AS 'interval'
FROM network_speed
WHERE DATE(Insert_time) = '2013-09-04') AS IT
GROUP BY IT.interval;
Here it is used to get the first record in each interval.
SELECT NS.*
FROM
(SELECT IT.interval, MIN(ID) AS 'first_id'
FROM
(SELECT ID, Speed, Insert_time,
ROUND(TIMESTAMPDIFF(MINUTE, '2013-09-04', Insert_time)/48) AS 'interval'
FROM network_speed
WHERE DATE(Insert_time) = '2013-09-04') AS IT
GROUP BY IT.interval) AS MI,
network_speed AS NS
WHERE MI.first_id = NS.ID;
Hope that helps.

Is this what you need?
SELECT HOUR(ts) as hr, fld1, fld2 from tbl group by hr
This query selects only hour from the timestamp and then groups the result based on the hour field so you get 1 row for each hour

Related

SQL Query to get distinct values from a table and the difference between ordered rows

I have a real time data table with time stamps for different data points
Time_stamp, UID, Parameter1, Parameter2, ....
I have 400 UIDs so each time_stamp is repeated 400 times
I want to write a query that uses this table to check if the real time data flow to the SQL database is working as expected - new timestamp every 5 minute should be available
For this what I usually do is query the DISTINCT values of time_stamp in the table and order descending - do a visual inspection and copy to excel to calculate the difference in minutes between subsequent distinct time_stamp
Any difference over 5 min means I have a problem. I am trying to figure out how I can do something similar in SQL, maybe get a table that looks like this. Tried to use LEAD and DISTINCT together but could not write the code myself, im just getting started on SQL
Time_stamp, LEAD over last timestamp
Thank you for your help
You can use lag analytical function as follows:
select t.* from
(select t.*
lag(Time_stamp) over (order by Time_stamp) as lg_ts
from your_Table t)
where timestampdiff('minute',lg_ts,Time_stamp) > 5
Or you can also use the not exists as follows:
select t.*
from your_table t
where not exists
(select 1 from your_table tt
where timestampdiff('minute',tt.Time_stamp,t.Time_stamp) <= 5)
and t.Time_stamp <> (select min(tt.Time_stamp) from your_table tt)
lead() or lag() is the right approach (depending on whether you want to see the row at the start or end of the gap).
For the time comparison, I recommend direct comparisons:
select t.*
from (select t.*
lead(Time_stamp) over (partition by uid order by Time_stamp) as next_time_stamp
from t
) t
where next_timestamp > time_stamp + interval 5 minute;
Note: exactly 5 minutes seems unlikely. You might want a fudge factor such as:
where next_timestamp > time_stamp + interval 5*60 + 10 second;
timestampdiff() counts the number of "boundaries" between two values. So, the difference in minutes between 00:00:59 and 00:01:02 is 1. And the difference between 00:00:00 and 00:00:59 is 0.
So, a difference of "5 minutes" could really be 4 minutes and 1 second or could be 5 minutes and 59 seconds.

How to get average difference of timestamp in hive

I have this table below which contains two column
hive> select * from hivetable;
a 2016-09-16T03:01:12.367782Z
b 2016-09-16T03:01:12.300514Z
c 2016-09-16T03:01:12.241532Z
a 2016-09-16T03:01:12.138016Z
c 2016-09-16T03:01:12.136986Z
b 2016-09-16T03:01:10.512201Z
c 2016-09-16T03:01:12.235671Z
Time taken: 0.457 seconds, Fetched: 7 row(s)
and now I want to find the unique value from first column and the timestamp difference or I should say average timestamp difference in case there are more than 2 records as in case of c. so in my case the output should be like
a 1 day 5 hr 30 min 20 sec
b 5 sec
c 30 minutes
Note: it is just a sample output and not the actual output
Is it possible to get this output or any similar one in hive?
You just need to use a window function to select the previous row in the grouping. I don't believe it can be compressed into just one query.
select
id,
avg(DATEDIFF(time, prev_time)) as avg_time_diff_days
from (
select id,
time,
LAG(time, 1, 0) OVER (PARTITION BY id, time ORDER BY time ASC)) as prev_time
from table
) intervals
group by id;

How to return zero values if nothing was written in time interval?

I am using the Graph Reports for the select below. The MySQL database only has the active records in the database, so if no records are in the database from X hours till Y hours that select does not return anything. So in my case, I need that select return Paypal zero values as well even the no activity was in the database. And I do not understand how to use the UNION function or re-create select in order to get the zero values if nothing was recorded in the database in time interval. Could you please help?
select STR_TO_DATE ( DATE_FORMAT(`acctstarttime`,'%y-%m-%d %H'),'%y-%m-%d %H')
as '#date', count(*) as `Active Paid Accounts`
from radacct_history where `paymentmethod` = 'PayPal'
group by DATE_FORMAT(`#date`,'%y-%m-%d %H')
When I run the select the output is:
Current Output
But I need if there are no values between 2016-07-27 07:00:00 and 2016-07-28 11:00:00, then in every hour it should show zero active accounts Like that:
Needed output with no values every hour
I have created such select below , but it not put to every hour the zero value like i need. showing the big gap between the 12 Sep and 13 Sep anyway, but there should be the zero values every hour
(select STR_TO_DATE ( DATE_FORMAT(acctstarttime,'%y-%m-%d %H'),'%y-%m-%d %H')
as '#date', count(paymentmethod) as Active Paid Accounts
from radacct_history where paymentmethod <> 'PayPal'
group by DATE_FORMAT(#date,'%y-%m-%d %H'))
union ALL
(select STR_TO_DATE ( DATE_FORMAT(acctstarttime,'%y-%m-%d %H'),'%y-%m-%d %H')
as '#date', 0 as Active Paid Accounts
from radacct_history where paymentmethod <> 'PayPal'
group by DATE_FORMAT(#date,'%y-%m-%d %H')) ;
I guess, you want to return 0 if there is no matching rows in MySQL. Here is an example:
(SELECT Col1,Col2,Col3 FROM ExampleTable WHERE ID='1234')
UNION (SELECT 'Def Val' AS Col1,'none' AS Col2,'' AS Col3) LIMIT 1;
Updated the post: You are trying to retrieve data that aren't present in the table, I guess in reference to the output provided. So in this case, you have to maintain a date table to show the date that aren't in the table. Please refer to this and it's little bit tricky - SQL query that returns all dates not used in a table
You need an artificial table with all necessary time intervals. E.g. if you need daily data create a table and add all day dates e.g. start from 1970 till 2100.
Then you can use the table and LEFT JOIN your radacct_history. So for each desired interval you will have group item (group by should be based on the intervals table.

Selecting first value of every minute in table

I've been trying to work this one out for a while now, maybe my problem is coming up with the correct search query. I'm not sure.
Anyway, the problem I'm having is that I have a table of data that has a new row added every second (imagine the structure {id, timestamp(datetime), value}). I would like to do a single query for MySQL to go through the table and output only the first value of each minute.
I thought about doing this with multiple queries with LIMIT and datetime >= (beginning of minute) but with the volume of data I'm collecting that is a lot of queries so it would be nicer to produce the data in a single query.
Sample data:
id datetime value
1 2015-01-01 00:00:00 128
2 2015-01-01 00:00:01 127
3 2015-01-01 00:00:04 129
4 2015-01-01 00:00:05 127
...
67 2015-01-01 00:00:59 112
68 2015-01-01 00:01:12 108
69 2015-01-01 00:01:13 109
Where I would want the result to select the rows:
1 2015-01-01 00:00:00 128
68 2015-01-01 00:01:12 108
Any ideas?
Thanks!
EDIT: Forgot to add, the data, whilst every second, is not reliably on the first second of every minute - it may be :30 or :01 rather than :00 seconds past the minute
EDIT 2: A nice-to-have (definitely not required for answer) would be a query that is flexible to also take an arbitrary number of minutes (rather than one row each minute)
SELECT t2.* FROM
( SELECT MIN(`datetime`) AS dt
FROM tbl
GROUP BY DATE_FORMAT(`datetime`,'%Y-%m-%d %H:%i')
) t1
JOIN tbl t2 ON t1.dt = t2.`datetime`
SQLFiddle
Or
SELECT *
FROM tbl
WHERE dt IN ( SELECT MIN(dt) AS dt
FROM tbl
GROUP BY DATE_FORMAT(dt,'%Y-%m-%d %H:%i'))
SQLFiddle
SELECT t1.*
FROM tbl t1
LEFT JOIN (
SELECT MIN(dt) AS dt
FROM tbl
GROUP BY DATE_FORMAT(dt,'%Y-%m-%d %H:%i')
) t2 ON t1.dt = t2.dt
WHERE t2.dt IS NOT NULL
SQLFiddle
In MS SQL Server I would use CROSS APPLY, but as far as I know MySQL doesn't have it, so we can emulate it.
Make sure that you have an index on your datetime column.
Create a table of numbers, or in your case a table of minutes. If you have a table of numbers starting from 1 it is trivial to turn it into minutes in the necessary range.
SELECT
tbl.ID
,tbl.`dt`
,tbl.value
FROM
(
SELECT
MinuteValue
, (
SELECT tbl.id
FROM tbl
WHERE tbl.`dt` >= Minutes.MinuteValue
ORDER BY tbl.`dt`
LIMIT 1
) AS ID
FROM Minutes
) AS IDs
INNER JOIN tbl ON tbl.ID = IDs.ID
For each minute find one row that has timestamp greater than the minute. I don't know how to return the full row, rather than one column in MySQL in the nested SELECT, so at first I'm making a temp table with two columns: Minute and id from the original table and then explicitly look up rows from original table knowing their IDs.
SQL Fiddle
I've created a table of Minutes in the SQL Fiddle with the necessary values to make example simple. In real life you would have a more generic table.
Here is SQL Fiddle that uses a table of numbers, just for illustration.
In any case, you do need to know in advance somehow the range of dates/numbers you are interested in.
It is trivial to make it work for any interval of minutes. If you need results every 5 minutes, just generate a table of minutes that has values not every 1 minute, but every 5 minutes. The main query would remain the same.
It may be more efficient, because here you don't join the big table to itself and you don't make calculations on the datetime column, so the server should be able to use the index on it.
The example that I made assumes that for each minute there is at least one row in the big table. If it is possible that there are some minutes that don't have any data at all you'd need to add extra check in the WHERE clause to make sure that the found row is still within that minute.
select * from table where timestamp LIKE "%-%-% %:%:00" could work.
This is similar to this question: Stack Overflow Date SQL Query Question
Edit: This probably would work better:
`select , date_format(timestamp, '%Y-%m-%d %H:%i') as the_minute, count()
from table
group by the_minute
order by the_minute
Similar to this question here: mysql select date format
i'm not really sure, but you could try this:
SELECT MIN(timestamp) FROM table WHERE YEAR(timestamp)=2015 GROUP BY DATE(timestamp), HOUR(timestamp), MINUTE(timestamp)

Time Frequency of Inserts

I have data being inserted into a mysql table of the form
id companyid date
how would I find the time average frequency of inserts by companyid.
Some companies send data daily, some weekly, some every 10 days, etc.
would like a result of the form
companyid average frequency of inserts
2 every 5 days
3 every 10 days
4 every 2 days
One definition of average would be the difference between the maximum and minimum values divided by one less than the count. Something like this might be what you are looking for:
select companyid,
(case when max(date) <> min(date())
then datediff(max(date), min(date)) / (count(*) - 1)
end) as average_frequency
from table t
group by companyid;