MySQL count rows in multiple date ranges? - mysql

I want to count the rows in several date ranges (e.g. last hour, today, this week, last 30 days) from a given table.
I need to know how many entries fall in each of these time/date periods to be able to tell whether a given user has reached the limit for each of these ranges. For instance, a user can have at most 300 entries in one month, but there is also a limit for each period (hourly/daily/weekly/monthly).
So far I'm trying a subquery approach using a SELECT CASE similar to this one: group by range in mysql
What would be the best way of doing this?

In MySQL you could use a series of COUNT functions with IF expressions so that only the rows in the required date range are counted, like so:
SELECT COUNT(IF(date >= DATE_SUB(NOW(), INTERVAL 1 HOUR), 1, null)) AS hourHits,
and so on
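For illustration, a fuller version of that idea might look like the sketch below. The table name entries, the user_id column and the value 42 are assumptions made up for the example, and "this week" is interpreted here as the last 7 days, which may not be what the asker wants.

```sql
-- Sketch only: `entries`, `user_id` and 42 are hypothetical names/values.
-- COUNT() ignores NULL, so each IF() counts only rows inside its range.
SELECT
    COUNT(IF(`date` >= NOW() - INTERVAL 1 HOUR, 1, NULL)) AS hourHits,
    COUNT(IF(`date` >= CURDATE(), 1, NULL))               AS dayHits,
    COUNT(IF(`date` >= NOW() - INTERVAL 7 DAY, 1, NULL))  AS weekHits,
    COUNT(*)                                              AS monthHits
FROM entries
WHERE user_id = 42
  AND `date` >= NOW() - INTERVAL 30 DAY;  -- widest range goes in WHERE
```

Putting the widest range in the WHERE clause lets the server skip older rows entirely, so the IF() expressions only run against the last 30 days of data.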

Related

Select where last activity 3 months ago

I have a table of cellular invoices. The relevant columns are Cellular_Account_id (INT), billing_end_date (DATE), and data_usage_GB.
There is a separate row for each account every month. I'm trying to get a list of accounts that have had no data usage for each of the past three months.
I'm pretty new to databases in general, so I'm not really even sure what syntax I should be searching for, or what approach I should be taking.
I can, of course, select WHERE data_usage_GB = 0.000 AND MONTH(billing_end_date) = month(current_date()) -1 but that only gives me the info in 1 month's range. I'm not sure how to group together the results where data_usage_GB = 0.000 for each of the last three months.
I'd group by the account, get the maximum date for each and then filter them using a having clause:
SELECT cellular_account_id
FROM invoices
GROUP BY cellular_account_id
HAVING MAX(billing_end_date) < DATE_SUB(CURRENT_DATE, INTERVAL 3 MONTH)
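Note that the query above filters on the most recent invoice date, whatever the usage on it was. If the goal is accounts whose usage was zero on every invoice in the last three months, one possible variation is sketched below. It assumes data_usage_GB is never negative and does not check that an account actually has an invoice for each of the three months.

```sql
-- Sketch only: assumes data_usage_GB >= 0 on every row.
-- Restrict to recent invoices, then keep accounts whose total usage is zero.
SELECT cellular_account_id
FROM invoices
WHERE billing_end_date >= DATE_SUB(CURRENT_DATE, INTERVAL 3 MONTH)
GROUP BY cellular_account_id
HAVING SUM(data_usage_GB) = 0;
```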

MySQL: Select count of returning vs. new rows in MySQL with period

I have a database table in MySQL, which consists of the following fields:
id
user_id
timestamp
The table is a simple log of visitors. I am trying to get the following numbers in one query:
Distinct user_id's for a specific time period (30 days)
The number of these user_id's that already exist in the table, regardless of time period
I have been able to do it within the period with this simple query:
SELECT
COUNT(DISTINCT user_id) AS 'count_distinct',
COUNT(user_id) AS 'count_all'
FROM
table
WHERE
timestamp BETWEEN CURDATE() - INTERVAL 30 DAY AND CURDATE();
Running this query gives me the count of distinct user_id's and the count of all user_id's within the time period. I can then do the math myself to get the count of new vs. returning visitors for that period. What I am trying to figure out is how many of the distinct user_id's that visited within the last 30 days have also visited at any previous point in time.
I hope you can help me solve this.
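One possible approach, sketched under assumptions: the table is called visits here (a made-up name), and a "returning" visitor is taken to mean a user whose first recorded visit is older than the 30-day window. Unlike the BETWEEN in the query above, this version has no upper bound, so it also counts visits made today.

```sql
-- Sketch only: `visits` is a hypothetical table name.
-- The inner query keeps users seen in the last 30 days and finds
-- each one's first-ever visit; the outer query counts them.
SELECT COUNT(*) AS count_distinct,
       SUM(first_seen < CURDATE() - INTERVAL 30 DAY) AS count_returning
FROM (
    SELECT user_id, MIN(`timestamp`) AS first_seen
    FROM visits
    GROUP BY user_id
    HAVING MAX(`timestamp`) >= CURDATE() - INTERVAL 30 DAY
) AS per_user;
```

The comparison inside SUM() evaluates to 1 or 0 in MySQL, so the sum is the number of users whose first visit predates the window.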

SQL: Reuse function result in query without using sub-query

In a MySQL DB table that stores sale orders, I have a LastReviewed column that holds the last date and time when the sale order was modified (type timestamp, default value CURRENT_TIMESTAMP). I'd like to plot the number of sales that were modified each day, for the last 90 days, for a particular user.
I'm trying to craft a SELECT that returns the number of days since LastReviewed date, and how many records fall within that range. Below is my query, which works just fine:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND DATEDIFF(CURDATE(),LastReviewed)<=90
GROUP BY days
ORDER BY days ASC
Notice that I am computing the DATEDIFF() as well as CURDATE() multiple times for each record. This seems really inefficient, so I'd like to know how I can reuse the results of the previous computation. The first thing I tried was:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND days<=90
GROUP BY days
ORDER BY days ASC
Error: Unknown column 'days' in 'where clause'. So I started to look around the net. Based on another discussion (Can I reuse a calculated field in a SELECT query?), I next tried the following:
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days, COUNT(*) AS number FROM sales
WHERE UserID=123 AND (SELECT days)<=90
GROUP BY days
ORDER BY days ASC
Error: Unknown column 'days' in 'field list'. I also tried the following:
SELECT @days := DATEDIFF(CURDATE(), LastReviewed) AS days,
COUNT(*) AS number FROM sales
WHERE UserID=123 AND @days <=90
GROUP BY days
ORDER BY days ASC
The query returns zero results, so @days<=90 seems to evaluate to false, even though if I put it in the SELECT clause and remove the WHERE clause, I can see some results with @days values below 90.
I've gotten things to work by using a sub-query:
SELECT * FROM (
SELECT DATEDIFF(CURDATE(),LastReviewed) AS days,
COUNT(*) AS number FROM sales
WHERE UserID=123
GROUP BY days
) AS t
WHERE days<=90
ORDER BY days ASC
However, I don't know whether it's the most efficient way. Not to mention that even this solution computes CURDATE() once per record even though its value will be the same from the start to the end of the query. Isn't that wasteful? Am I overthinking this? Help would be welcome.
Note: Mods, should this be on CodeReview? I posted here because the code I'm trying to use doesn't actually work.
There are actually two problems with your question.
First, you're overlooking the fact that WHERE precedes SELECT. When the server evaluates WHERE <expression>, it then already knows the value of the calculations done to evaluate <expression> and can use those for SELECT.
Worse than that, though, you should almost never write a query that uses a column as an argument to a function, since that usually requires the server to evaluate the expression for each row.
Instead, you should use this:
WHERE LastReviewed >= DATE_SUB(CURDATE(), INTERVAL 90 DAY)
The optimizer will see this and get all excited, because DATE_SUB(CURDATE(), INTERVAL 90 DAY) can be resolved to a constant, which can be used on one side of a >= comparison, which means that if an index exists with LastReviewed as the leftmost relevant column, then the server can immediately eliminate all of the rows with LastReviewed < that constant value, using the index.
Then DATEDIFF(CURDATE(), LastReviewed) AS days (still needed for SELECT) will only be evaluated against the rows we already know we want.
Add a single index on (UserID, LastReviewed) and the server will be able to pinpoint exactly the relevant rows extremely quickly.
Builtin functions are much less costly than, say, fetching rows.
You could get a lot more performance improvement with the following 'composite' index:
INDEX(UserID, LastReviewed)
and change to
WHERE UserID=123
AND LastReviewed >= CURRENT_DATE() - INTERVAL 90 DAY
Your formulation is 'hiding' LastReviewed in a function call, which prevents the index from being used for that condition.
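Put together, the index and the rewritten query might look like the sketch below. The index name idx_user_reviewed is made up for the example; everything else comes from the question.

```sql
-- Sketch only: the index name is hypothetical.
ALTER TABLE sales ADD INDEX idx_user_reviewed (UserID, LastReviewed);

-- LastReviewed is compared directly against a constant, so the
-- composite index can narrow down the rows before DATEDIFF() runs.
SELECT DATEDIFF(CURDATE(), LastReviewed) AS days,
       COUNT(*) AS number
FROM sales
WHERE UserID = 123
  AND LastReviewed >= CURRENT_DATE() - INTERVAL 90 DAY
GROUP BY days
ORDER BY days ASC;
```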
If you are still not satisfied with that improvement, then consider a nightly query that computes yesterday's statistics and puts them in a "Summary table". From there, the SELECT you mentioned can run even faster.
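A rough sketch of what such a summary table could look like follows. The table name sales_daily_summary, its column names and the INT type for UserID are all assumptions. Since LastReviewed changes whenever an order is modified again, rows summarized earlier can become stale, so treat this as an outline of the idea only.

```sql
-- Sketch only: table name, column names and types are hypothetical.
CREATE TABLE sales_daily_summary (
    UserID      INT  NOT NULL,
    review_date DATE NOT NULL,
    number      INT  NOT NULL,
    PRIMARY KEY (UserID, review_date)
);

-- Nightly job: summarize yesterday's rows (midnight to midnight).
INSERT INTO sales_daily_summary (UserID, review_date, number)
SELECT UserID, DATE(LastReviewed), COUNT(*)
FROM sales
WHERE LastReviewed >= CURRENT_DATE() - INTERVAL 1 DAY
  AND LastReviewed <  CURRENT_DATE()
GROUP BY UserID, DATE(LastReviewed);
```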

average rows in a column that are between 5 minutes

I would like to ask how I can take the average of the rows in a column that fall within the same 5 minutes.
To be more precise, I have a table like this:
id  link_id  date                 speed
0   123      24/4/2014 12:03:34   45
1   123      24/4/2014 12:04:34   43
2   127      24/4/2014 12:04:37   50
3   123      28/4/2014 12:03:34   60
I would like to create a new table that holds the average speed for rows that have the same link_id and fall within the same 5 minutes.
In the example above, only the first two rows meet these requirements,
and I want a new table like this:
id  link_id  date                 speed
0   123      24/4/2014 12:00:00   44
2   127      24/4/2014 12:00:00   50
3   123      28/4/2014 12:00:00   60
Which query do I have to use to create a new table that meets those requirements?
Thank you in advance.
It is not clear what you mean by 'average of speed for rows that ... are between five minutes.' So I will guess.
I guess you want to compute the averages for each distinct five-minute interval. For example, you want averages of all items with timestamps from 2014-04-24 12:00:00 to 2014-04-24 12:04:59, then another average for items with timestamps from 2014-04-24 12:05:00 to 2014-04-24 12:09:59, and so forth.
To do this, you need to start with an expression that will take any DATETIME value and round it down to the beginning of its five-minute interval. How do you get that?
First, this expression will round down a timestamp to the beginning of the minute in which it occurs:
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00')
This expression gives the number of minutes past the hour, modulo 5.
MINUTE(`date`)%5
So, this expression gives you the rounded-down DATETIME you need:
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE
Great. Now we need to use that in an aggregate query to compute the average speeds.
SELECT link_id,
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE AS five_min,
AVG(speed) AS avg_speed
FROM mytable
GROUP BY link_id,
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE
ORDER BY link_id,
DATE_FORMAT(`date`,'%Y-%m-%d %H:%i:00') - INTERVAL (MINUTE(`date`)%5) MINUTE
This will do the trick you need done. There will be one row for each distinct link_id and five-minute interval of time. The time interval will be named by giving the time at which it begins. Each row will contain the average speed for observations in that time interval.
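Since the question asks for a new table, the same aggregate could be wrapped in CREATE TABLE ... AS SELECT, roughly as sketched here. The table name link_speed_5min is made up for the example, and the sketch relies on MySQL accepting a column alias in GROUP BY.

```sql
-- Sketch only: `link_speed_5min` is a hypothetical table name.
-- Stores one row per link_id and five-minute interval.
CREATE TABLE link_speed_5min AS
SELECT link_id,
       DATE_FORMAT(`date`, '%Y-%m-%d %H:%i:00')
         - INTERVAL (MINUTE(`date`) % 5) MINUTE AS five_min,
       AVG(speed) AS avg_speed
FROM mytable
GROUP BY link_id, five_min;
```

With the sample rows from the question, the two readings for link 123 at 12:03:34 and 12:04:34 on 24/4/2014 both round down to 12:00:00 and average to 44.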
It's helpful when creating your specification for this kind of query to think very carefully about what you want each row of your result set to contain. If you do that, you will probably find that your query flows naturally from your specification.
Here's a more extensive writeup on how to do this sort of thing.
http://www.plumislandmedia.net/mysql/sql-reporting-time-intervals/

Retrieve rows grouped by hour with MySQL

I have a table containing access logs. I want to know how many accesses to resource_id '123' occurred in each hour of a 24-hour day.
My first thought for retrieving this info is just looping through each hour and querying the table in each loop with something like... and time like '$hour:%', given that the time field holds data in the format 15:47:55.
Is there a way I can group by the hours and retrieve each hour and the number of rows within each hour in a single query?
Database is MySQL, language is PHP.
SELECT HOUR(MyDatetimeColumn) AS h, COUNT(*)
FROM MyTable
GROUP BY h;
You can use the function HOUR to get the hour out of the time. Then you should be able to group by that.
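Applied to the question's case, that might look like the sketch below. The table name access_log and the column name `time` are assumptions; resource_id and the value '123' come from the question.

```sql
-- Sketch only: `access_log` and `time` are hypothetical names.
-- One row per hour that has at least one access to the resource.
SELECT HOUR(`time`) AS h, COUNT(*) AS accesses
FROM access_log
WHERE resource_id = '123'
GROUP BY h
ORDER BY h;
```

Hours with no accesses do not appear in the result, so the PHP side would need to fill in zeros for any missing hours.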