Grouping by time ignoring the Date portion - mysql

I have a table that has a column called scores and another one called date_time.
I am trying to find out, for each 5 minute time increment, how many rows I have that are above a certain score. I want to ignore the date portion completely and just base this off of time.
This is kind of like in a stats program where they display your peak hours, with the only difference that I want to go as detailed as 5 minute time segments.
I am still fairly new at MySQL and Google seems to be my best companion.
What I have found so far is:
SELECT id, score, date_time, COUNT(id)
FROM data
WHERE score >= 500
GROUP BY TIME(date_time) DIV 300;
Would this work, or is there a better way to do this?

I don't think your query would work. You need to do a bit more work to get the time rounded to 5 minute intervals. Something like:
SELECT SEC_TO_TIME(FLOOR(TIME_TO_SEC(time(date_time))/300)*300) as time5, COUNT(id)
FROM data
WHERE score >= 500
GROUP BY SEC_TO_TIME(FLOOR(TIME_TO_SEC(time(date_time))/300)*300)
ORDER BY time5;
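To see why the rounding step matters, here is a minimal Python sketch (not MySQL) of the same arithmetic. The function names and sample times are made up for illustration. The second helper assumes that a TIME value used in a numeric context is read as the number HHMMSS, which is my understanding of how MySQL evaluates `TIME(date_time) DIV 300`.

```python
from datetime import time

def bucket_5min(t: time) -> time:
    """Round a time of day down to the start of its 5-minute interval."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second   # like TIME_TO_SEC
    floored = (seconds // 300) * 300                     # like FLOOR(x / 300) * 300
    # Convert back to a time of day, like SEC_TO_TIME.
    return time(floored // 3600, (floored % 3600) // 60, floored % 60)

def hhmmss_div_300(t: time) -> int:
    """Hypothetical helper: divide the HHMMSS number by 300 instead."""
    return (t.hour * 10000 + t.minute * 100 + t.second) // 300

a, b = time(12, 4, 59), time(12, 5, 0)
print(bucket_5min(a), bucket_5min(b))        # two different 5-minute buckets
print(hhmmss_div_300(a), hhmmss_div_300(b))  # the same number for both times
```

With seconds as the unit, 12:04:59 and 12:05:00 land in different buckets, as they should. Dividing the HHMMSS number puts them in the same group, which is why the original GROUP BY would not give clean 5 minute segments.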

Related

What is the difference between the two SQL queries below

select
substr(insert_date, 1, 14),
device, count(1)
from
abc.xyztable
where
insert_date >= DATE_SUB(NOW(), INTERVAL 10 DAY)
group by
device, substr(insert_date, 1, 14) ;
Then I am trying to get the average of the row counts which I got above.
SELECT
date, device, AVG(count)
FROM
(SELECT
substr(insert_date, 1, 14) AS date,
device,
COUNT(1) AS count
FROM
abc.xyztable
WHERE
insert_date >= DATE_SUB(NOW(), INTERVAL 10 DAY)
GROUP BY
device, substr(insert_date, 1, 14)) a
GROUP BY
device, date;
I found that both queries return the same results. I tried this with the last 10 days of data.
My purpose is to get the average row count for the last 10 days, based on the counts from the first query above.
I'm not entirely sure what you're asking. The "difference" between the two queries is that the first one is valid but the second does not appear to be, as per HoneyBadger's comment. They also seem to be trying to achieve two different goals.
However, I think what you are trying to do is produce a query based on the data from the first query, which returns the date, device, and an average of the count column. If so, I believe the following query would calculate this:
WITH dataset AS (
SELECT substr(insert_date, 1, 14) AS theDate, device, count(*) AS theCount
FROM abc.xyztable
WHERE insert_date >= DATE_SUB(NOW(), INTERVAL 10 DAY)
GROUP BY device, substr(insert_date, 1, 14)
)
SELECT theDate, device,
(SELECT ROUND(AVG(CAST(theCount AS FLOAT)), 2) FROM dataset) AS Average
FROM dataset
GROUP BY theDate, device
I have referenced the accepted answers of this question to calculate the average: How to calculate average of a column and then include it in a select query in oracle?
And this question to tidy up the query: Formatting Clear and readable SQL queries
Without having a sample of your data, or any proper context, I can't see how this would be especially useful, so if it was not what you were looking for, please edit your question and clarify exactly what you need.
EDIT: Based on the extra information you have provided, I've made a tweak to my solution to increase the precision of the average column. It now calculates the average to two decimal places. You have stated that this returns the same result as your original query, but the two queries are not calculating the same thing. If the count column is consistently the same number with little variation, the AVG function will round this, which in turn could produce results which look the same, especially if you only compare a small sample, so I have amended my answer to demonstrate this. Again, we'd all be able to help you much more easily if you would provide more information, such as a sample of your data.
If you want an average you need to change the last GROUP BY
to get an average per device
GROUP BY device;
to get an average per date
GROUP BY date;
or remove it completely to get an average for all rows in the sub-query
Update
Below is a full example for getting the average per device
SELECT device, avg(count)
FROM (SELECT substr(insert_date,1,14) as date, device, count(1) as count
FROM abc.xyztable
WHERE insert_date >=DATE_SUB(NOW(), INTERVAL 10 DAY)
GROUP BY device,substr(insert_date,1,14)) a
GROUP BY device;
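As a rough illustration of why the outer GROUP BY matters, here is a small Python sketch of the same two-stage aggregation. The sample rows are hypothetical, and `stamp[:14]` stands in for `substr(insert_date, 1, 14)`.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical rows: (insert_date as 'YYYY-MM-DD HH:MM:SS', device)
rows = [
    ("2024-01-01 10:05:00", "A"),
    ("2024-01-01 10:40:00", "A"),
    ("2024-01-01 11:15:00", "A"),
    ("2024-01-01 10:20:00", "B"),
]

# Stage 1: count rows per (device, first 14 characters of insert_date).
counts = Counter((device, stamp[:14]) for stamp, device in rows)

# Stage 2: average those counts per device only.
per_device = defaultdict(list)
for (device, _bucket), n in counts.items():
    per_device[device].append(n)
averages = {device: mean(ns) for device, ns in per_device.items()}

print(averages)
```

If stage 2 grouped by both device and bucket again, every group would hold exactly one count, so the "average" would just repeat that count. That is why the two queries in the question return the same numbers.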

SQL query to find the most common index of a minimal field [duplicate]

I have a table that tracks the activity in several websites. Each row is of the following form: (Date, Hour, Website, Hits)
The Hour field is a number between 0 and 23 and represents an entire hour (for example, 22 is for any hits between 22:00 and 22:59).
I want to find the overall slowest hour for each website, meaning the output should be something like (Website, Hour).
In order to do that, I was thinking I should have a nested query to find the minimum hits for each website on each day, and then count the values of Hour (again, for each website on each day), and see which value is the maximal.
I'm still new to SQL so I'm having difficulties using the min() function properly, to find the minimal value only for a specific date and website. Then I have the same problem with using count() for a specific website.
I'm also curious if I can get not just the most common slowest hour, but maybe the 3 slowest, but at least to me it seems like it's really complicating the problem.
For the first nested query, I considered something like this:
SELECT DISTINCT Date Date_t, Website Website_t, Hour,
(SELECT min(Hits) from HITS_TABLE WHERE Date=Date_t and Website=Website_t) as MinHits
FROM HITS_TABLE
Not only does this take an abnormally long time to calculate, it also gives me multiple entries of (Date_t, Website_t, Hour, min(Hits)) for each value of Hour, so I take it that I'm not doing it in the smartest or most efficient way.
Thanks in advance for any help!
You can get the minimum hour using a trick in MySQL:
select website, substring_index(group_concat(hour order by hits), ',', 1) as minhour
from table t
group by website;
For each website, this constructs a comma-delimited list of hours, ordered by the number of hits. The function substring_index() then returns the first element of that list, which is the hour with the fewest hits.
This is something of a hack. In most other databases, you would use window/analytic functions, but these are not available in MySQL before version 8.0.
EDIT:
You can do this in standard SQL as well:
select t.*
from table t
where not exists (select 1
from table t2
where t2.website = t.website and
t2.hits < t.hits
);
This is interpreted as: "Get me all rows from the table where there are no other rows for the same website with a lower number of hits." This is a roundabout way of saying: "Get me the row with the minimum hits for each website." Note that this will return multiple rows when there are ties.
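The idea of picking, per website, the hour of the row with the fewest hits can be sketched in plain Python as follows. The rows and website names are hypothetical, and ties are broken here by taking the smaller hour, which is a choice of this sketch rather than something the SQL guarantees.

```python
# Hypothetical rows: (date, hour, website, hits)
rows = [
    ("2024-01-01", 3, "a.example", 5),
    ("2024-01-01", 4, "a.example", 9),
    ("2024-01-02", 3, "a.example", 7),
    ("2024-01-01", 22, "b.example", 2),
    ("2024-01-01", 23, "b.example", 1),
]

slowest = {}
for _date, hour, website, hits in rows:
    # Keep the (hits, hour) pair with the fewest hits seen so far for this website.
    if website not in slowest or (hits, hour) < slowest[website]:
        slowest[website] = (hits, hour)

min_hour = {website: hour for website, (_hits, hour) in slowest.items()}
print(min_hour)
```

Sorting each website's rows by hits and taking the first three instead of only the first would give the three slowest hours.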

MySQL order by count for last hour recursive

I have a SQL question. First of all I'd like to know is it even possible with just SQL, and if not does anyone know a good workaround.
We are building a site, where users can vote for videos.
The users can vote by SMS or directly on site after Facebook authentication.
We have to make a top list of all videos, and calculate the "position" on the list for each video.
So far, we have done that with a simple subquery, something like this:
SELECT v.video_id AS id,
(SELECT (COUNT(*)+1) FROM videos AS v2
WHERE (v2.SMS_votes + v2.facebook_votes) > (v.SMS_votes + v.facebook_votes)) AS total_position
FROM videos AS v
SMS_votes and facebook_votes are aggregated fields. There are separate tables for each kind of votes, with records for each vote, including the time the vote has been set.
This works fine, the positions are calculated... if 2 or more videos have the same number of votes, they "share" the position.
Unfortunately there can be no position sharing, and we have to resolve it by the following rules:
if 2 videos have the same number of votes, the one with more SMS votes has the advantage
if they also have the same number of SMS votes, the one which has more SMS votes in the last hour has the advantage
if they also have the same number of SMS votes in the last hour, they are compared by the hour before, and recursively like that, until there is a difference between the two
Is it possible to do this kind of recursive ordering only in SQL, or do we have to resolve this manually in code? All ideas are welcomed. Just to note, performance is important here, because the top list is used all over the site.
I don't think it's feasible to perform this kind of ordering with a recursive calculation (which is potentially unbounded), but if you're willing to limit the amount of time you look back, there are ways it could be done.
Here's one possibility.
SELECT video_id,
SMS_votes + facebook_votes AS total_votes,
SMS_votes,
COUNT(CASE WHEN time > NOW() - INTERVAL 1 HOUR THEN 1 END) AS h1,
COUNT(CASE WHEN time > NOW() - INTERVAL 2 HOUR THEN 1 END) AS h2,
COUNT(CASE WHEN time > NOW() - INTERVAL 3 HOUR THEN 1 END) AS h3
FROM videos
JOIN SMS_votes USING(video_id)
GROUP BY video_id
ORDER BY total_votes DESC, SMS_votes DESC, h1 DESC, h2 DESC, h3 DESC;
This assumes you have a table called SMS_votes tracking each vote, with a video_id field and a time field.
For each video, it calculates the total votes, the SMS votes, the SMS votes in the past hour, the past two hours, and the past three hours. It then does an ORDER BY on all those values to get the correct position.
It's fairly easy to extend this to include a wider range of hours, but you might also want to consider using an increasing time range as you go back in time. For example, you first look at votes in the past hour, then the past day, then the past week, etc. I suspect that would lower your chance of videos having the same votes without having to add as many extra calculations.
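If you end up resolving positions in code instead, the same multi-column ordering is just a tuple comparison. Here is a minimal Python sketch, assuming you have already fetched the per-video aggregates; the dictionaries and their values are invented for the example.

```python
# Hypothetical per-video aggregates; h1, h2, h3 are SMS votes in the
# past one, two and three hours.
videos = [
    {"video_id": 1, "total_votes": 10, "sms_votes": 4, "h1": 1, "h2": 3, "h3": 3},
    {"video_id": 2, "total_votes": 10, "sms_votes": 4, "h1": 1, "h2": 2, "h3": 4},
    {"video_id": 3, "total_votes": 12, "sms_votes": 1, "h1": 0, "h2": 0, "h3": 0},
    {"video_id": 4, "total_votes": 10, "sms_votes": 6, "h1": 0, "h2": 0, "h3": 0},
]

def sort_key(v):
    # Tuples compare element by element, so later fields only break ties in earlier ones.
    return (v["total_votes"], v["sms_votes"], v["h1"], v["h2"], v["h3"])

ranked = sorted(videos, key=sort_key, reverse=True)
positions = {v["video_id"]: pos for pos, v in enumerate(ranked, start=1)}
print(positions)
```

Videos 1 and 2 tie on total votes, SMS votes and the last hour, so the two-hour window decides between them. Two videos that tie on every field would still need one more tie-breaker, such as video_id.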
SQL Fiddle example

subtract the data for every 5 minutes between two particular times

I have a problem with MySQL. I need to subtract the data between two particular times, for every 5 minutes, and then average it over those 5 minutes.
What I am doing now is:
select (avg(columnname)),convert((min(datetime) div 500)*500, datetime) + INTERVAL 5 minute as endOfInterval
from Databasename.Tablename
where datetime BETWEEN '2012-09-12 10:50:00' AND '2012-09-12 14:50:00'
group by datetime div 500;
It is the cumulative average.
Suppose I get 500 at 11 o'clock and 700 at 11:05; the average I need is (700-500)/5 = 40.
But now I am getting (500+700)/5 = 240.
I don't need the cumulative average.
Kindly help me.
For the kind of average you're talking about, you don't want to aggregate multiple rows using a GROUP BY clause. Instead, you want to compute your result using exactly two different rows from the same table. This calls for a self-join:
SELECT (b.columnname - a.columnname)/5, a.datetime, b.datetime
FROM Database.Tablename a, Database.Tablename b
WHERE b.datetime = a.datetime + INTERVAL 5 MINUTE
AND a.datetime BETWEEN '2012-09-12 10:50:00' AND '2012-09-12 14:45:00'
a and b refer to two different rows of the same table. The WHERE clause ensures that they are exactly 5 minutes apart.
If there is no second row matching that temporal distance, no resulting row will be included in the query result. If your table doesn't have data points exactly every five minutes, but you have to search for the suitable partner instead, then things become much more difficult. This answer might perhaps be adjusted for that use case. Or you might implement this at the application level, instead of on the database server.
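If you do go the application-level route, the pairing logic might look something like this Python sketch. It assumes the readings are already loaded into a dictionary keyed by timestamp; the third reading is invented to show a second interval.

```python
from datetime import datetime, timedelta

# Hypothetical cumulative readings keyed by timestamp.
readings = {
    datetime(2012, 9, 12, 11, 0): 500,
    datetime(2012, 9, 12, 11, 5): 700,
    datetime(2012, 9, 12, 11, 10): 750,
}

step = timedelta(minutes=5)
rates = []
for start, value in sorted(readings.items()):
    end = start + step
    # Same role as the join condition b.datetime = a.datetime + INTERVAL 5 MINUTE.
    if end in readings:
        rates.append((start, end, (readings[end] - value) / 5))

for start, end, rate in rates:
    print(start, end, rate)
```

As with the self-join, a reading with no partner exactly five minutes later (here the one at 11:10) produces no output row.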

Select rows that are less than 5 minutes old using DATE_SUB

I have a table that is getting hundreds of requests per minute. The issue that I'm having is that I need a way to select only the rows that have been inserted in the past 5 minutes. I am trying this:
SELECT count(id) as count, field1, field2
FROM table
WHERE timestamp > DATE_SUB(NOW(), INTERVAL 5 MINUTE)
ORDER BY timestamp DESC
My issue is that it returns 70k+ results and counting. I am not sure what it is that I am doing wrong, but I would love to get some help on this. In addition, if there were a way to group them by minute to have it look like:
| count | field1 | field2 |
----------------------------
I'd love the help and direction on this, so please let me know your thoughts.
You don't really need DATE_ADD/DATE_SUB; plain date arithmetic is much simpler:
SELECT COUNT(id), DATE_FORMAT(`timestamp`, '%Y-%m-%d %H:%i')
FROM `table`
WHERE `timestamp` >= CURRENT_TIMESTAMP - INTERVAL 5 MINUTE
GROUP BY 2
ORDER BY 2
The following seems like it would work which is mighty close to what you had:
SELECT
MINUTE(date_field) as `minute`,
count(id) as count
FROM table
WHERE date_field > date_sub(now(), interval 5 minute)
GROUP BY MINUTE(date_field)
ORDER BY MINUTE(date_field);
Note the added column to show the minute and the GROUP BY clause that gathers up the results into the corresponding minute. Imagine that you had 5 little buckets labeled with the last 5 minutes. Now imagine you tossed each row into the bucket for its minute, so a row that was 4 minutes old goes into the bucket labeled 4 minutes ago. count() will then count the number of entries found in each bucket. That's a quick visualization of how GROUP BY works. http://www.tizag.com/mysqlTutorial/mysqlgroupby.php seems to be a decent writeup on GROUP BY if you need more info.
If you run that and the number of entries in each minute seems too high, you'll want to do some troubleshooting. Try replacing COUNT(id) with MAX(date_field) and MIN(date_field) so you can get an idea what kind of dates it is capturing. If MIN() and MAX() are inside the range, you may have more data written to your database than you realize.
You might also double check that you don't have dates in the future as they would all be > now(). The MIN()/MAX() checks mentioned above should identify that too if it's a problem.
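The bucket picture can be tried out in a few lines of Python. This is only a sketch with made-up timestamps and a fixed "now". Unlike the queries above, it also applies an upper bound so that future-dated rows are dropped, which is an addition of mine.

```python
from collections import Counter
from datetime import datetime, timedelta

now = datetime(2024, 1, 1, 12, 0, 30)  # fixed "now" so the example is reproducible

# Hypothetical insert timestamps.
stamps = [
    datetime(2024, 1, 1, 11, 50, 0),   # older than 5 minutes, filtered out
    datetime(2024, 1, 1, 11, 56, 10),
    datetime(2024, 1, 1, 11, 56, 45),
    datetime(2024, 1, 1, 12, 0, 5),
    datetime(2024, 1, 1, 12, 3, 0),    # in the future, filtered out by the upper bound
]

cutoff = now - timedelta(minutes=5)
recent = [s for s in stamps if cutoff < s <= now]

# Truncate each timestamp to the minute and count rows per bucket.
per_minute = Counter(s.replace(second=0, microsecond=0) for s in recent)

for minute, count in sorted(per_minute.items()):
    print(minute.strftime("%Y-%m-%d %H:%M"), count)
```

Truncating the full timestamp, as DATE_FORMAT does in the first answer, keeps buckets distinct across hour boundaries, whereas grouping on the minute number alone would merge 11:56 with 12:56 if the time filter ever let both through.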