Rolling 30 day uniques in sql - mysql

Suppose you have a table of the form:
create table user_activity (
user_id int not null,
activity_date timestamp not null,
...);
It's easy enough to select the number of unique user_id's in the past 30 days.
select count(distinct user_id) from user_activity where activity_date > now() - interval 30 day;
But how can you select the number of unique user_ids in the prior 30 days for each of the past 30 days? E.g. uniques for 0-30 days ago, 1-31 days ago, 2-32 days ago and so on to 30-60 days ago.
The database engine is MySQL, if it matters.

You could try using a sub query:
SELECT DISTINCT `activity_date` as `day`, (
SELECT count(DISTINCT `user_id`) FROM `user_activity` WHERE `activity_date` = `day`
) as `num_uniques`
FROM `user_activity`
WHERE `activity_date` > NOW() - INTERVAL 30 day;
This should give you the number of unique users for each day. However, I haven't tested this since I don't have the DB to work with.

I haven't tried this in MySQL, but hopefully the syntax is right. If not, maybe it will point you in the right direction. First, I often employ a Numbers table. It can be a physical table simply made up of numbers or it can be a generated/virtual/temporary table.
SELECT
N.number,
COUNT(DISTINCT UA.user_id)
FROM
Numbers N
INNER JOIN user_activity UA ON
UA.activity_date > NOW() - INTERVAL (30 + N.number) DAY AND
UA.activity_date <= NOW() - INTERVAL N.number DAY
WHERE
N.number BETWEEN 0 AND 30
GROUP BY
N.number
I'm not familiar with the whole INTERVAL syntax, so if I got that wrong, please let me know and I'll try to correct it.
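To make the window logic of the Numbers-table join easier to check, here is a minimal Python sketch of the same idea, not MySQL code. It assumes the rows are already in memory as (user_id, activity_date) pairs; the function name rolling_uniques and the sample data are made up for illustration. For each offset n it counts distinct users in the window (now - (30 + n) days, now - n days], which mirrors the two join conditions.

```python
from datetime import datetime, timedelta

def rolling_uniques(rows, now, window_days=30, offsets=range(0, 31)):
    """Count distinct user_ids per shifted window.

    rows: iterable of (user_id, activity_date) tuples (hypothetical sample data).
    For offset n the window is (now - (window_days + n) days, now - n days].
    """
    result = {}
    for n in offsets:
        upper = now - timedelta(days=n)                 # NOW() - INTERVAL n DAY
        lower = now - timedelta(days=window_days + n)   # NOW() - INTERVAL (30 + n) DAY
        result[n] = len({uid for uid, ts in rows if lower < ts <= upper})
    return result

now = datetime(2024, 3, 1, 12, 0, 0)  # hypothetical "now"
rows = [
    (1, now - timedelta(days=1)),
    (1, now - timedelta(days=2)),
    (2, now - timedelta(days=29, hours=12)),
    (3, now - timedelta(days=45)),
]
counts = rolling_uniques(rows, now)
for n in (0, 3, 16, 30):
    print(n, counts[n])
```

Each offset produces one row, just as GROUP BY N.number does, and a user who is active several times inside one window is still counted once.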

If you take the day number of today's date and mod it by 30, you get the offset of the current day. Add that offset to the day number of each date and divide the result by 30, discarding the remainder; this gives you the group of days each row belongs to. Then group your results by this number. So in code something like this:
select count(distinct user_id), (to_days(activity_date) + (to_days(now()) % 30)) div 30 as period
from user_activity
group by (to_days(activity_date) + (to_days(now()) % 30)) div 30
I will leave calculating the reverse numbering of period up to you (hint: take the period number for the current date as "max" and subtract period above and add 1.)
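As a rough illustration of what this grouping does, the Python sketch below applies the same formula to plain dates. It is an assumption-laden model, not MySQL: date.toordinal() stands in for to_days() (the absolute numbers differ, but only differences matter here), and the function name period and the sample date are hypothetical. It shows that the formula yields consecutive, non-overlapping 30-day buckets rather than overlapping rolling windows.

```python
from datetime import date, timedelta

def period(activity_date, today):
    """Bucket number for a date, following the formula in the query above."""
    offset = today.toordinal() % 30                     # to_days(now()) % 30
    return (activity_date.toordinal() + offset) // 30   # integer division by 30

today = date(2024, 3, 1)  # hypothetical "now"
days = [today - timedelta(days=n) for n in range(60)]
buckets = sorted({period(d, today) for d in days})
print(buckets)
```

Dates exactly 30 days apart always land in adjacent buckets, so every user_id is counted once per fixed 30-day block.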

Related

Select last_update lower than date less X days from another table

I'll try to explain this as well as I can. We have a table of products, and every product has a last_update datetime field. I also have a table with a list of suppliers. For each supplier I need to take its day count, subtract it from the current date, and check whether the product's last update is earlier than the resulting date.
So far I have this, which has proven not to work:
update products_ean p
join astra_settings_automation a on a.id_supplier = p.id_supplier
set p.active = 0
where p.last_update < now() - INTERVAL 5 DAY and p.last_update < DATE_SUB(
CURDATE(), INTERVAL (
select Days_Until_Clean_Stock
from astra_settings_automation
where astra_settings_automation.id_supplier=p.id_supplier
) DAY
);
How can I solve this? All the "old" ones need to be deactivated.
Thanks in advance.

mysql optimize query where date hour

Hi all, I have a pretty awful query that needs optimizing.
I need to select all records where the created date matches NOW minus 35 days, down to the hour; the minutes and seconds can be anything.
Any optimisation tips are welcome!
So I have this query here. It's ugly, but it works:
SELECT * FROM outbound_email
oe
INNER JOIN (SELECT `issue_id` FROM `issues` WHERE 1 ORDER BY year DESC, NUM DESC LIMIT 0,5) as issues
ON oe.issue_id = issues.issue_id
WHERE
year(created) = year( DATE_SUB(NOW(), INTERVAL 35 DAY) ) AND
month(created) = month( DATE_SUB(NOW(), INTERVAL 35 DAY) ) AND
day(created) = day( DATE_SUB(NOW(), INTERVAL 35 DAY) ) AND
hour(created) = hour( DATE_SUB(NOW(), INTERVAL 35 DAY) )
AND campaign_id IN (SELECT id FROM campaigns WHERE initial = 1)
I assume the field "created" is a datetime field and is from the issues table? Since you don't need anything else on the issues and campaign table, then you can do the following:
SELECT e.* FROM outbound_email e
JOIN issues i ON e.issue_id = i.issue_id
JOIN campaigns c ON c.id = i.campaign_id
WHERE i.created < DATE_SUB(NOW(), INTERVAL 35 DAY)
AND c.initial = 1
There's no need to separate the datetime field into years, months...etc.
You seem to be saying you want to select all rows from a table where the time they were created falls in the same hour as the current one, 35 days ago:
SELECT * FROM table WHERE created BETWEEN
DATE_ADD(CURDATE(), INTERVAL (HOUR(now()) - 840) HOUR) AND
DATE_ADD(CURDATE(), INTERVAL (HOUR(now()) - 839) HOUR)
Why does it work? CURDATE() gives us today at midnight. We add to this the current hour of the day (e.g. suppose it's now 5pm: HOUR(NOW()) gives us 17), but we also subtract 840, because 35 days * 24 hours a day = 840 hours. DATE_ADD will hence add 17 - 840 = -823 hours to the current date, i.e. 5pm 35 days ago.
We make the search a range to get all the records from that hour. The simplest way to specify an hour later is to subtract 839 hours instead of 840.
Technically this query will also return records that are bang on 6pm (but not a second later) 35 days ago too, because BETWEEN is inclusive (BETWEEN 1 AND 10 will return 10 also).
If this is a problem, replace the BETWEEN with created >= blah AND created < blahblah.
I haven't put the rest of your query in, for reasons of clarity.
As a side note, the way you did it wasn't bad. You could have simplified things by not having the year/month/day parts, just dropping the time part of the date with date(created) = date_sub(curdate(), interval 35 day), which is the year, month and day combined as a date, with no time element. BUT it is generally best to leave table data alone rather than format or convert it just to match a query. If you convert table data, then indexes on that column can generally no longer be used. If you go the extra mile to get your query parameters into the format of the column, and don't convert the table data, then indexes on the column can be used.
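The arithmetic behind the range can be checked with a small Python sketch. This is only a model of the idea, not the MySQL query: the function names hour_window and in_window and the sample timestamp are hypothetical. It computes the boundaries once, up front, and then compares the raw created value against them with a half-open range, which is the same "prepare the parameters, leave the column alone" approach described above.

```python
from datetime import datetime, timedelta

def hour_window(now, days_back=35):
    """Return (start, end) for the current clock hour, days_back days ago."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)  # like CURDATE()
    # e.g. at 5pm: 17 - 35 * 24 = 17 - 840 = -823 hours from today's midnight
    start = midnight + timedelta(hours=now.hour - days_back * 24)
    end = start + timedelta(hours=1)
    return start, end

def in_window(created, start, end):
    # Half-open comparison: created >= start AND created < end
    return start <= created < end

now = datetime(2024, 3, 10, 17, 42, 5)  # hypothetical "now"
start, end = hour_window(now)
print(start, end)
```

Using >= and < instead of BETWEEN excludes a record stamped exactly on the following hour.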

MySql - get days remaining

Users can sign up for a premium listing for a specified number of days, e.g. 30 days.
tblPremiumListings
user_id   days   created_date
-----------------------------
1         30     2013-05-21
2         60     2013-06-21
3         120    2012-06-21
How would I select records where there are still days remaining on a premium listing?
SELECT *
FROM tblPremiumListings
WHERE created_date + INTERVAL `days` DAY >= CURDATE()
It's easiest to read with INTERVAL
select *
from tblPremiumListings
where created_date + interval days day >= now();
But I would also change the table: instead of storing created_date and days, store end_date. That way the query is
select *
from tblPremiumListings
where end_date >= now();
The benefit of doing it like this is that you can put an index on end_date and quickly find all ended premium listings. With your original table you'll always have to do a full table scan to find the records with expired listings.
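To see what the suggested end_date would hold for the sample rows in the question, here is a short Python sketch. It is an illustration only: the helper names end_date and active_listings and the value of today are assumptions, and the rows are the three listed in the question.

```python
from datetime import date, timedelta

# (user_id, days, created_date) rows from the question
listings = [
    (1, 30, date(2013, 5, 21)),
    (2, 60, date(2013, 6, 21)),
    (3, 120, date(2012, 6, 21)),
]

def end_date(days, created):
    """created_date + INTERVAL days DAY"""
    return created + timedelta(days=days)

def active_listings(rows, today):
    """Listings whose end date has not passed yet (end_date >= today)."""
    return [
        (uid, end_date(days, created))
        for uid, days, created in rows
        if end_date(days, created) >= today
    ]

today = date(2013, 6, 25)  # hypothetical current date
active = active_listings(listings, today)
print(active)
```

Storing the computed end date once, at insert time, is what lets the database compare an indexed column directly against the current date.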
SELECT * FROM
tblPremiumListings
WHERE (DATEDIFF(NOW(), created_date) - days) <= 0
See if it solves your problem
One way to get the result:
SELECT t.user_id
, t.days
, t.created_date
FROM tblPremiumListings t
WHERE t.created_date + INTERVAL t.days DAY > DATE(NOW())
You may want a >= comparison operator (instead of >) depending on how you define days remaining.
NOTE: the access plan for this query will be full scan of all rows, since MySQL won't be able to do a range scan on an index. For large sets, having a column that can be indexed to satisfy the query may improve performance, e.g.
WHERE t.expire_date > DATE(NOW())
Try this one...
SELECT * FROM
tblPremiumListings
WHERE DATE_ADD(created_date, INTERVAL days DAY) > DATE(NOW())

Average posts per hour on MySQL?

I have a number of posts saved in an InnoDB table on MySQL. The table has the columns "id", "date", "user", "content". I wanted to make some statistics graphs, so I ended up using the following query to get the number of posts per hour for yesterday:
SELECT HOUR(FROM_UNIXTIME(`date`)) AS `hour`, COUNT(date) from fb_posts
WHERE DATE(FROM_UNIXTIME(`date`)) = CURDATE() - INTERVAL 1 DAY GROUP BY hour
This outputs the following data:
I can edit this query to get any day I want. But what I want now is the AVERAGE of each hour of every day, so that if on Day 1 at 00 hours I have 20 posts and on Day 2 at 00 hours I have 40, I want the output to be "30". I'd like to be able to pick date periods as well if it's possible.
Thanks in advance!
You can use a sub-query to group the data by day/hour, then take the average by hour across the sub-query.
Here's an example to give you the average count by hour for the past 7 days:
select the_hour,avg(the_count)
from
(
select date(from_unixtime(`date`)) as the_day,
hour(from_unixtime(`date`)) as the_hour,
count(*) as the_count
from fb_posts
where `date` >= unix_timestamp(current_date() - interval 7 day)
and `date` < unix_timestamp(current_date())
group by the_day,the_hour
) s
group by the_hour
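The two-step aggregation (count per day and hour first, then average per hour) can be sketched in Python as follows. This is a model of the logic rather than MySQL code: the function name average_per_hour and the sample timestamps are made up, and UTC is assumed for the conversion, whereas FROM_UNIXTIME uses the session time zone.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

def average_per_hour(timestamps):
    """Average number of posts per hour of day, across the days seen."""
    # Step 1: count posts per (day, hour), like the inner sub-query.
    per_day_hour = Counter()
    for ts in timestamps:
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)  # assumption: UTC
        per_day_hour[(dt.date(), dt.hour)] += 1
    # Step 2: average those counts per hour, like the outer query.
    by_hour = defaultdict(list)
    for (_day, hour), count in per_day_hour.items():
        by_hour[hour].append(count)
    return {hour: sum(counts) / len(counts) for hour, counts in by_hour.items()}

base = int(datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp())  # hypothetical day 1
day1_midnight = [base + i for i in range(20)]            # 20 posts, day 1, hour 00
day2_midnight = [base + 86400 + i for i in range(40)]    # 40 posts, day 2, hour 00
day1_five_am = [base + 5 * 3600 + i for i in range(10)]  # 10 posts, day 1, hour 05
averages = average_per_hour(day1_midnight + day2_midnight + day1_five_am)
print(averages)
```

Note that a day with no posts in a given hour contributes no row, so it does not pull the average for that hour down; the same holds for the grouped sub-query.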
Aggregate the information by date and hour, and then take the average by hour:
select hour, avg(numposts)
from (SELECT date(FROM_UNIXTIME(`date`)) as day, HOUR(FROM_UNIXTIME(`date`)) AS `hour`,
count(*) as numposts
from fb_posts
WHERE DATE(FROM_UNIXTIME(`date`)) between <date1> and <date2>
GROUP BY date(FROM_UNIXTIME(`date`)), hour
) d
group by hour
order by 1
By the way, I prefer including the explicit order by, since most databases do not order the results of a group by. MySQL before version 8.0 happens to be one database that does; from 8.0 on it no longer sorts implicitly.
SELECT
HOUR(FROM_UNIXTIME(`date`)) AS `hour`
, COUNT(`id`) / COUNT(DISTINCT TO_DAYS(FROM_UNIXTIME(`date`))) AS avgHourlyPostCount
FROM fb_posts
WHERE `date` > UNIX_TIMESTAMP('2012-01-01') -- your optional date criteria
GROUP BY hour
This gives you a count of all the posts, divided by the number of days, by hour.

sql query date filtering the result

I've created a database to store statistics from a website, such as traffic. I'm trying to query the database for the number of unique IP addresses that have been captured over the last 30 days:
SELECT COUNT(*) totalA FROM statistics GROUP by ipAddress
AND DATE_SUB(CURDATE( ) ,INTERVAL 30 DAY) LIMIT 0, 30
However, the query just returns the number of entries in the table. I've used the same query minus the date filter and gotten the correct result, so it's just the date filtering that's messed up.
Any help would be appreciated, thanks.
It's because you GROUP BY ipAddress AND DATE_SUB(CURDATE( ), INTERVAL 30 DAY), and this expression yields the logical AND of the two operands, which has only 2 possible values. Still, you don't care about the distinction and request just count(*).
What you probably want is:
SELECT ipAddress, DATE_SUB(CURDATE( ), INTERVAL 30 DAY), COUNT(*) AS totalA
FROM statistics
GROUP by ipAddress, DATE_SUB(CURDATE( ), INTERVAL 30 DAY)
LIMIT 0, 30
I think this is what you are looking for. Basically I just put your date code in a WHERE clause and compared it to the date column in your table. Obviously you will need to change the "entryDate" to whatever the name of your date column is.
SELECT COUNT(*) AS totalA
FROM statistics
WHERE entryDate >= DATE(CURDATE()-INTERVAL 30 DAY)
GROUP BY ipAddress
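For comparison, here is a small Python model of the filter-then-group logic. It is a sketch under assumptions: the column name entryDate is the placeholder used above, and the function name recent_rows and the sample rows are invented. It shows that grouping by address yields one count per IP, while the number of unique addresses is the number of groups.

```python
from collections import Counter
from datetime import date, timedelta

def recent_rows(rows, today, days=30):
    """Keep rows with entryDate >= today - days (the WHERE clause)."""
    cutoff = today - timedelta(days=days)  # CURDATE() - INTERVAL 30 DAY
    return [(ip, d) for ip, d in rows if d >= cutoff]

today = date(2024, 3, 1)  # hypothetical current date
rows = [  # hypothetical (ipAddress, entryDate) rows
    ("10.0.0.1", date(2024, 2, 28)),
    ("10.0.0.1", date(2024, 2, 10)),
    ("10.0.0.2", date(2024, 2, 1)),
    ("10.0.0.3", date(2023, 12, 25)),
]
recent = recent_rows(rows, today)
hits_per_ip = Counter(ip for ip, _ in recent)  # one entry per IP, like GROUP BY
unique_ips = len(hits_per_ip)                  # number of distinct addresses
print(hits_per_ip, unique_ips)
```

If only the single number of unique addresses is needed, counting distinct values after the date filter gives it directly.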