I have a table that is getting hundreds of requests per minute. The issue that I'm having is that I need a way to select only the rows that have been inserted in the past 5 minutes. I am trying this:
SELECT count(id) as count, field1, field2
FROM table
WHERE timestamp > DATE_SUB(NOW(), INTERVAL 5 MINUTE)
ORDER BY timestamp DESC
My issue is that it returns 70k+ results and counting. I am not sure what it is that I am doing wrong, but I would love to get some help on this. In addition, if there were a way to group them by minute to have it look like:
| count | field1 | field2 |
----------------------------
I'd love the help and direction on this, so please let me know your thoughts.
You don't really need DATE_ADD()/DATE_SUB(); plain date arithmetic is simpler:
SELECT COUNT(id), DATE_FORMAT(`timestamp`, '%Y-%m-%d %H:%i')
FROM `table`
WHERE `timestamp` >= CURRENT_TIMESTAMP - INTERVAL 5 MINUTE
GROUP BY 2
ORDER BY 2
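If you also want field1 and field2 in the output, as in your example layout, they need to appear in both the select list and the GROUP BY so that each row is one minute/field1/field2 combination. A minimal sketch along the same lines (assuming field1 and field2 are plain columns on the table):
SELECT COUNT(id) AS `count`,
       field1,
       field2,
       DATE_FORMAT(`timestamp`, '%Y-%m-%d %H:%i') AS `minute`
FROM `table`
WHERE `timestamp` >= CURRENT_TIMESTAMP - INTERVAL 5 MINUTE
GROUP BY field1, field2, `minute`
ORDER BY `minute` DESC;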
The following seems like it would work, and it is mighty close to what you had:
SELECT
MINUTE(date_field) as `minute`,
count(id) as count
FROM table
WHERE date_field > date_sub(now(), interval 5 minute)
GROUP BY MINUTE(date_field)
ORDER BY MINUTE(date_field);
Note the added column to show the minute and the GROUP BY clause that gathers the results into the corresponding minute. Imagine that you had 5 little buckets labeled with the last 5 minutes, and that you tossed each row into the bucket for the minute it falls in. count() then counts the number of entries in each bucket. That's a quick visualization of how GROUP BY works. http://www.tizag.com/mysqlTutorial/mysqlgroupby.php seems to be a decent writeup on GROUP BY if you need more info.
If you run that and the number of entries in each minute still seems too high, you'll want to do some troubleshooting. Try replacing COUNT(id) with MAX(date_field) and MIN(date_field) so you can get an idea of what dates it is capturing. If MIN() and MAX() are both inside the range, you may simply have more data being written to your database than you realize.
You might also double-check that you don't have dates in the future, since they would all be > NOW(). The MIN()/MAX() checks mentioned above should identify that too if it's a problem.
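As a concrete starting point for that troubleshooting, a quick diagnostic along those lines (same hypothetical table and date_field column as above):
SELECT MIN(date_field), MAX(date_field), COUNT(id)
FROM `table`
WHERE date_field > DATE_SUB(NOW(), INTERVAL 5 MINUTE);
If MAX() is ahead of NOW() you have future-dated rows; if both bounds sit inside the last five minutes, the row count is probably real.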
Related
I have a real-time data table with timestamps for different data points:
Time_stamp, UID, Parameter1, Parameter2, ....
I have 400 UIDs so each time_stamp is repeated 400 times
I want to write a query that uses this table to check whether the real-time data flow into the SQL database is working as expected: a new timestamp should be available every 5 minutes.
For this, what I usually do is query the DISTINCT values of time_stamp in the table, order them descending, do a visual inspection, and copy them to Excel to calculate the difference in minutes between subsequent distinct time_stamps.
Any difference over 5 minutes means I have a problem. I am trying to figure out how I can do something similar in SQL, maybe getting a table that looks like the one below. I tried to use LEAD and DISTINCT together but could not write the code myself; I'm just getting started with SQL.
Time_stamp, LEAD over last timestamp
Thank you for your help
You can use the LAG() analytic function as follows:
select t.*
from (select t.*,
             lag(Time_stamp) over (order by Time_stamp) as lg_ts
      from your_table t) t
where timestampdiff(MINUTE, lg_ts, Time_stamp) > 5
Or you can use NOT EXISTS as follows:
select t.*
from your_table t
where not exists
      (select 1 from your_table tt
       where tt.Time_stamp < t.Time_stamp
         and timestampdiff(MINUTE, tt.Time_stamp, t.Time_stamp) <= 5)
  and t.Time_stamp <> (select min(tt.Time_stamp) from your_table tt)
lead() or lag() is the right approach (depending on whether you want to see the row at the start or end of the gap).
For the time comparison, I recommend direct comparisons:
select t.*
from (select t.*,
             lead(Time_stamp) over (partition by uid order by Time_stamp) as next_time_stamp
      from t
     ) t
where next_time_stamp > Time_stamp + interval 5 minute;
Note: exactly 5 minutes seems unlikely. You might want a fudge factor such as:
where next_time_stamp > Time_stamp + interval (5*60 + 10) second;
Also be careful with timestampdiff() semantics. In some databases it counts the number of unit "boundaries" crossed between two values, so that the difference in minutes between 00:00:59 and 00:01:02 is 1, while the difference between 00:00:00 and 00:00:59 is 0. MySQL's TIMESTAMPDIFF() instead truncates to whole elapsed units.
Either way, a reported difference of "5 minutes" does not mean exactly five minutes elapsed: depending on the semantics it can be anywhere from just over 4 minutes to just under 6, which is another reason to prefer the direct comparison above.
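A quick way to see what MySQL itself does (the values shown are what MySQL returns; it truncates rather than counting boundaries):
SELECT TIMESTAMPDIFF(MINUTE, '2020-01-01 00:00:59', '2020-01-01 00:01:02');  -- 0: only 3 seconds elapsed
SELECT TIMESTAMPDIFF(MINUTE, '2020-01-01 00:00:00', '2020-01-01 00:05:59');  -- 5: nearly 6 minutes elapsed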
I have a table that has a column called score and another called date_time.
I am trying to find out, for each 5 minute increment, how many rows are above a certain score. I want to ignore the date portion completely and base this only on the time.
This is kind of like a stats program displaying your peak hours, with the only difference being that I want it as detailed as 5 minute segments.
I am still fairly new at MySQL and Google seems to be my best companion.
What I have found so far is:
SELECT id, score, date_time, COUNT(id)
FROM data
WHERE score >= 500
GROUP BY TIME(date_time) DIV 300;
Would this work, or is there a better way to do this?
I don't think your query would work. You need to do a bit more work to get the time rounded to 5 minute intervals. Something like:
SELECT SEC_TO_TIME(FLOOR(TIME_TO_SEC(time(date_time))/300)*300) as time5, COUNT(id)
FROM data
WHERE score >= 500
GROUP BY SEC_TO_TIME(FLOOR(TIME_TO_SEC(time(date_time))/300)*300)
ORDER BY time5;
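To see what the rounding expression does, a couple of quick checks (the timestamps are just examples):
SELECT SEC_TO_TIME(FLOOR(TIME_TO_SEC(TIME('2020-01-01 12:03:34'))/300)*300);  -- 12:00:00
SELECT SEC_TO_TIME(FLOOR(TIME_TO_SEC(TIME('2020-01-01 12:07:10'))/300)*300);  -- 12:05:00
TIME_TO_SEC() converts the time of day to seconds, dividing by 300 and flooring snaps it to the start of its 5-minute bucket, and SEC_TO_TIME() turns it back into a readable time.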
I would like to ask how I can take the average over rows in a column that fall within the same 5 minutes.
To be more precise, I have a table like this:
id-----link_id---------date---------------------speed
0---------123------(24/4/2014 12:03:34)----------45
1---------123------(24/4/2014 12:04:34)----------43
2---------127------(24/4/2014 12:04:37)----------50
3---------123------(28/4/2014 12:03:34)----------60
I would like to create a new table that has the average speed for rows that have the same link_id and are within 5 minutes of each other.
In the case mentioned above, only the first two rows meet the requirements,
and I want a new table like this:
id-----link_id---------date---------------------speed
0---------123------(24/4/2014 12:00:00)----------44
2---------127------(24/4/2014 12:00:00)----------50
3---------123------(28/4/2014 12:00:00)----------60
What query do I have to use to create a new table with those requirements?
Thank you in advance.
It is not clear what you mean by 'average of speed for rows that ... are between five minutes.' So I will guess.
I guess you want to compute the averages for each distinct five-minute interval. For example, you want averages of all items with timestamps from 2014-04-24 12:00:00 to 2014-04-24 12:04:59, then another average for items with timestamps from 2014-04-24 12:05:00 to 2014-04-24 12:09:59, and so forth.
To do this, you need to start with an expression that will take any DATETIME value and round it down to the beginning of its five-minute interval. How do you get that?
First, this expression will round down a timestamp to the beginning of the minute in which it occurs:
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00')
This expression gives the number of minutes past the hour, modulo 5.
MINUTE(`date`)%5
So, this expression gives you the rounded-down DATETIME you need:
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE
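For example, applied to the first timestamp from your sample data, this rounds 12:03:34 down to the 12:00:00 bucket (a check you can run directly):
SELECT DATE_FORMAT('2014-04-24 12:03:34','%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE('2014-04-24 12:03:34')%5) MINUTE;
-- 2014-04-24 12:00:00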
Great. Now we need to use that in an aggregate query to compute the average speeds.
SELECT link_id,
       DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE AS five_min,
       AVG(speed) AS avg_speed
FROM mytable
GROUP BY link_id,
         DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE
ORDER BY link_id,
         DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE
This will do the trick you need done. There will be one row for each distinct link_id and five-minute interval of time. The time interval will be named by giving the time at which it begins. Each row will contain the average speed for observations in that time interval.
It's helpful when creating your specification for this kind of query to think very carefully about what you want each row of your result set to contain. If you do that, you will probably find that your query flows naturally from your specification.
Here's a more extensive writeup on how to do this sort of thing.
http://www.plumislandmedia.net/mysql/sql-reporting-time-intervals/
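If you actually need the result materialized as a new table, as the question asks, you can wrap the query above in CREATE TABLE ... AS SELECT. A minimal sketch (the table name link_speed_5min is just an example; MySQL lets you reference the five_min alias in the GROUP BY):
CREATE TABLE link_speed_5min AS
SELECT link_id,
       DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE AS five_min,
       AVG(speed) AS avg_speed
FROM mytable
GROUP BY link_id, five_min;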
First of all, sorry for the title, but I have no idea how to describe this:
I'm saving sessions in my table and I would like to get the count of sessions per hour to know how many sessions were active over the day. The sessions are specified by two timestamps: start and end.
Hopefully you can help me.
Here we go:
http://sqlfiddle.com/#!2/bfb62/2/0
While I'm still not sure how you'd like to compare the start and end dates, it looks like you could come up with your desired results using COUNT, YEAR, MONTH, DAY, and HOUR.
Possibly something similar to this:
SELECT COUNT(ID), YEAR(Start), MONTH(Start), DAY(Start), HOUR(Start)
FROM Sessions
GROUP BY YEAR(Start), MONTH(Start), DAY(Start), HOUR(Start)
And the SQL Fiddle.
What you want to do is rather hard in MySQL. You can, however, get an approximation without too much difficulty. The following counts up users who start and stop within one day:
select date(s.start), hours.hour,
       sum(case when hours.hour between hour(s.start) and hour(s.end) then 1 else 0
           end) as GoodEstimate
from sessions s cross join
     (select 0 as hour union all
      select 1 union all
      . . .
      select 23
     ) hours
group by date(s.start), hours.hour
When a user spans multiple days, the query is harder. Here is one approach that assumes at least one session starts during every hour:
select thehour, count(*)
from (select distinct date(start), hour(start),
             cast(date(start) as datetime) + interval hour(start) hour as thehour
      from sessions
     ) dh left outer join
     sessions s
     on s.start <= thehour + interval 1 hour and
        s.end >= thehour
group by thehour
Note: these are untested so might have syntax errors.
OK, this is another problem where the index table comes to the rescue.
An index table is something that everyone should have in their toolkit, preferably in the master database. It is a table with a single INT PRIMARY KEY column, id, containing sequential numbers from 0 to n, where n is big enough for whatever you need: 100,000 is good, 1,000,000 is better. You only need to create this table once, and once you do you will find it has all kinds of applications.
For your problem you need to consider each hour and, if I understand it correctly, count every session that started before the end of that hour and hasn't ended before the hour starts.
Here is the SQL fiddle for the solution.
What it does is use a known sequential number from the indextable (only 0 to 100 for this fiddle - just over 4 days - you can see why you need a big n) to link with your data at the top and bottom of the hour.
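In case the fiddle is not available, here is a minimal sketch of the idea (the names indextable, sessions, start, end, and the '2014-01-01' base date are assumptions for illustration, not the fiddle's exact code):
-- Create the index table once and fill it with 0..n sequential integers
-- (via a script, a stored-procedure loop, or repeated INSERT ... SELECT doubling).
CREATE TABLE indextable (id INT NOT NULL PRIMARY KEY);
-- One row per hour; count the sessions active at any point during that hour.
SELECT '2014-01-01' + INTERVAL i.id HOUR AS hour_start,
       COUNT(s.id) AS active_sessions
FROM indextable i
LEFT JOIN sessions s
  ON  s.start <  '2014-01-01' + INTERVAL (i.id + 1) HOUR  -- started before the hour ends
  AND s.end   >= '2014-01-01' + INTERVAL i.id HOUR        -- not ended before the hour starts
WHERE i.id < 101                                          -- ids 0 to 100, as in the fiddle
GROUP BY hour_start
ORDER BY hour_start;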
select (SELECT power FROM newdb.newmeter where date(dt)=curdate() order by dt desc limit 1)
-(select Power from newdb.newmeter where date(dt)=(select date(subdate(now(), interval weekday(now()) day))) limit 0,1) as difference;
The above query is part of my program; it gives me the difference between the data stored on day 1 of the week and the current day of the week. The queries individually work fine, as below:
SELECT power FROM newdb.newmeter where date(dt)=curdate() order by dt desc limit 1;
Result: 941690 (as of the current time)
select Power from newdb.newmeter where date(dt)=(select date(subdate(now(), interval weekday(now()) day))) limit 0,1;
Result: 93242.4 (at the start of the week, or of the day for today, since it's Monday)
But as soon as I run the difference query, which is just the difference between the two above, the result is: 848447.8515625
This seems really strange and I don't understand what's wrong with it. Please help.
You don't order by dt in your second query, the one looking for the power at the start of the week. That means you are selecting an undefined record that happens to have a matching date. For a simple select the table is usually returned in insert order, but that can change whenever the optimizer thinks it can run the query faster using some other order. Basically, if you don't define an order, you've told the database you don't care about order.
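A sketch of that second query with an explicit order, so it deterministically picks the earliest reading of that day:
select Power
from newdb.newmeter
where date(dt) = (select date(subdate(now(), interval weekday(now()) day)))
order by dt asc
limit 1;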
How many decimal places do you want in the answer? Use DECIMAL(7,1) for Power (I assume you want one decimal place) instead and see what you get.
I tend to avoid floats/doubles as the approximate values can get you into trouble.
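If you do switch the column type, a minimal sketch (assuming Power is currently FLOAT or DOUBLE; widen the precision if your readings need more digits):
ALTER TABLE newdb.newmeter MODIFY Power DECIMAL(7,1);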