Hi I have a table that logs user events.
date | event | userId
2016/09/29 | A | 10
2016/09/29 | A | 3
2016/09/29 | A | 2
2016/09/28 | A | 2
2016/09/28 | B | 2
2016/09/27 | A | 1
2016/09/27 | A | 1
2016/09/27 | B | 1
I need to count for each day of the current month, the number of userId that were never seen before (to simplify, the number of new users).
I have come up with the following query for the current day.
SELECT DATE(date) AS time, COUNT(DISTINCT userId) AS count FROM event
WHERE DATE(date) = date(now())
AND userId NOT IN (
SELECT DISTINCT userId FROM event WHERE DATE(date) <= DATE_SUB(now(), INTERVAL 1 DAY)
AND userId IS NOT NULL
)
But I am not able to find an efficient query to get the same result for each day of the current month.
You can try this:
SELECT myDate, COUNT(userId) AS user_count
FROM (
SELECT userId, DATE(MIN(date)) AS myDate
FROM event
GROUP BY userId
) AS ch
GROUP BY myDate
select fd date, count(1) newUserCount from
(select min(date) fd ,userId from event where date between {monthStart} and {monthEnd}
group by userId )t group by fd
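This finds each user's first visit date within the month; the number of users per first-visit date is then that day's new-user count. Note that a user who was already active before {monthStart} is still counted on their first visit of the month, so this matches the question only if "new" means new within the month.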
Hope this can help
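A minimal sketch of the first-seen-date approach from the answers above, run against the sample rows from the question. It uses Python's built-in sqlite3 module instead of MySQL so that it runs standalone. The ISO date strings, the month bounds, and the extra 2016-08-15 row for user 3 are assumptions added for illustration. Because the first-seen date is computed over the whole history before the month filter is applied, user 3 is not counted as new in September.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (date TEXT, event TEXT, userId INTEGER)")
rows = [
    ("2016-09-29", "A", 10), ("2016-09-29", "A", 3), ("2016-09-29", "A", 2),
    ("2016-09-28", "A", 2), ("2016-09-28", "B", 2),
    ("2016-09-27", "A", 1), ("2016-09-27", "A", 1), ("2016-09-27", "B", 1),
    ("2016-08-15", "A", 3),  # hypothetical earlier visit, not in the question
]
conn.executemany("INSERT INTO event VALUES (?, ?, ?)", rows)

# Inner query: first day each user was ever seen (all history).
# Outer query: keep only first days inside the month, then count per day.
query = """
SELECT first_day, COUNT(*) AS new_users
FROM (
    SELECT userId, MIN(date) AS first_day
    FROM event
    WHERE userId IS NOT NULL
    GROUP BY userId
) AS firsts
WHERE first_day >= ? AND first_day < ?
GROUP BY first_day
ORDER BY first_day
"""
new_users_per_day = conn.execute(query, ("2016-09-01", "2016-10-01")).fetchall()
print(new_users_per_day)
```

Days without any new user do not appear in the result; joining against a list of calendar days would be needed to show them with a count of 0.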
Related
I have a table like this :
-------------------------------
| id | valid_until | username|
-------------------------------
| 1 | 2020-01-01 | user1 |
-------------------------------
| 1 | 2020-01-01 | user2 |
-------------------------------
| 1 | 2020-01-02 | user3 |
-------------------------------
| 1 | 2020-01-02 | user4 |
-------------------------------
| 1 | 2020-01-03 | user5 |
-------------------------------
| 1 | 2020-01-03 | user6 |
-------------------------------
| 1 | 2020-01-03 | user7 |
-------------------------------
This is the user subscription table; valid_until says when the subscription ends.
I want to know the number of active subscriptions on each day of a range, for example 2020-01-01 to 2020-01-03. Here is my query:
SELECT
valid_until qva,
(SELECT
COUNT(iod.id) AS ct
FROM
`order_detail` iod
WHERE
valid_until > qva)
FROM
`order_detail`
WHERE
valid_until >= '2020-01-01'
AND valid_until <= '2020-01-03'
GROUP BY qva
But this query is too slow (response time: 230 sec). What is the problem with my query?
Your query is invalid according to the SQL standard. You have GROUP BY qva (and using the alias name here is usually not allowed, but in MySQL and MariaDB it is), but you don't apply an aggregate function on the subquery result. MySQL is known for violating the standard here and they silently apply ANY_VALUE on such unaggregated expressions.
This makes the query slow. For each order in the date range the count is evaluated, only to pick one of those counts per date at the very end.
So, let's build the query up from scratch. You want one result row for each day in the given date range. That is:
select distinct valid_until
from order_detail
where valid_until between date '2020-01-01' and date '2020-01-03';
Then, for each of these dates you want to get the count. You can do this in a subquery as in your original query. I suppose it must be >= instead of > though, as an order valid until a day is still valid that day (or at least this is what I'd expect).
SELECT
dates.day,
(
select count(*)
from order_detail od
where od.valid_until >= dates.day
) as ct
FROM
(
select distinct valid_until as day
from order_detail
where valid_until between date '2020-01-01' and date '2020-01-03'
) dates
order by dates.day;
(This assumes that valid_until is a mere date. If it's a datetime, you'll have to adjust this query a little.)
UPDATE:
As ysth told you in the request comments, your query will get you only days in the given range that exist as valid_until in your table. So does mine. If you wanted the three days regardless, you'd have to replace the subquery and make it independent from the table, e.g.
FROM
(
select date '2020-01-01' as day union all
select date '2020-01-02' as day union all
select date '2020-01-03' as day
) dates
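A minimal sketch of the fixed-calendar variant, loaded with the sample rows from the question. It runs on Python's built-in sqlite3 module rather than MySQL, so the date literals are plain ISO strings; that substitution is an assumption made only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_detail (id INTEGER, valid_until TEXT, username TEXT)"
)
conn.executemany(
    "INSERT INTO order_detail VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "user1"), (1, "2020-01-01", "user2"),
        (1, "2020-01-02", "user3"), (1, "2020-01-02", "user4"),
        (1, "2020-01-03", "user5"), (1, "2020-01-03", "user6"),
        (1, "2020-01-03", "user7"),
    ],
)

# One row per calendar day, independent of what is stored in the table.
# A subscription counts as active on a day if it is valid until that day
# or later.
query = """
SELECT dates.day,
       (SELECT COUNT(*)
        FROM order_detail od
        WHERE od.valid_until >= dates.day) AS ct
FROM (
    SELECT '2020-01-01' AS day UNION ALL
    SELECT '2020-01-02' UNION ALL
    SELECT '2020-01-03'
) AS dates
ORDER BY dates.day
"""
active_per_day = conn.execute(query).fetchall()
print(active_per_day)
```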
I think you just want a cumulative count of valid_until from the latest date to the earliest:
select valid_until, count(*) day_count,
sum(count(*)) over (order by valid_until desc) as count_as_of_day
from order_detail od
group by valid_until;
If you want this for a range of dates, then use a subquery:
select od.*
from (select valid_until, count(*) day_count,
sum(count(*)) over (order by valid_until desc) as count_as_of_day
from order_detail od
group by valid_until
) od
where valid_until >= :from and valid_until <= :to
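To see what the descending cumulative sum computes, here is a small sketch in plain Python over the valid_until values from the question. The variable names and the range bounds are illustrative assumptions. The range filter is applied after the running total, as in the query above, so subscriptions ending after the range still count.

```python
from collections import Counter
from itertools import accumulate

valid_until = [
    "2020-01-01", "2020-01-01",
    "2020-01-02", "2020-01-02",
    "2020-01-03", "2020-01-03", "2020-01-03",
]

# Equivalent of: count(*) ... group by valid_until
day_count = Counter(valid_until)

# Equivalent of: sum(count(*)) over (order by valid_until desc)
days_desc = sorted(day_count, reverse=True)
running_totals = accumulate(day_count[day] for day in days_desc)
count_as_of_day = dict(zip(days_desc, running_totals))

# Filter to the requested range only after accumulating.
date_from, date_to = "2020-01-01", "2020-01-02"
in_range = {
    day: total
    for day, total in count_as_of_day.items()
    if date_from <= day <= date_to
}
print(count_as_of_day)
print(in_range)
```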
I'm trying to create a query with conditional logic where I only calculate revenue for the most recent records by each month using a datetime column (start_date), but only if there are multiple records in that month from the same account_id.
Here's a basic example of the schema after I join two tables (full schema in sqlfiddle link).
| account_id | plan_id | start_date | plan_interval | price |
|------------|---------|----------------------|---------------|-------|
| 1 | 1 | 2018-01-03T14:52:13Z | month | 39 |
| 1 | 3 | 2018-02-07T11:10:17Z | year | 999 |
| 1 | 2 | 2018-02-07T11:11:17Z | month | 99 |
In the above example, I would only like to include rows 1 and 3 in my output, as it's the one record from account_id 1 in January and the most recent of two records for account_id 1 in February.
SELECT
MONTH(start_date) AS month,
SUM(CASE WHEN plan_interval = 'month'
THEN price * .01
ELSE (price * .01)/12 END) AS mrr
FROM subscriptions
JOIN plans
ON plans.id = subscriptions.plan_id
WHERE Year(start_date) = 2018 AND
CASE WHEN (account_id = account_id
AND MONTH(start_date) = MONTH(start_date))
THEN (SELECT MAX(start_date) FROM subscriptions)
ELSE (SELECT start_date FROM subscriptions)
END
GROUP BY month
ORDER BY month ASC;
The case statement in the subquery above does not seem to work in doing this. It returns the data without filtering out records when the first condition is met.
Here is an example: sqlfiddle
This query returns the rows that you are asking for in the question:
SELECT s.*, p.plan_interval, p.price,
(CASE WHEN p.plan_interval = 'month'
THEN p.price * 0.01
ELSE (p.price * 0.01)/12
END) AS mrr
FROM subscriptions s JOIN
plans p
ON p.id = s.plan_id
WHERE YEAR(s.start_date) = 2018 AND
s.start_date = (SELECT MAX(s2.start_date)
FROM subscriptions s2
WHERE s2.account_id = s.account_id AND
EXTRACT(YEAR_MONTH FROM s2.start_date) = EXTRACT(YEAR_MONTH FROM s.start_date)
)
ORDER BY s.start_date ASC;
This uses a subquery to get the most recent record for a subscription for each month.
You can then aggregate this however you wish.
Notes about the query:
Table aliases make the query easier to write and to read.
The subquery uses the handy YEAR_MONTH option of EXTRACT(), so it handles both years and months.
For numeric constants between -1 and 1, I always prepend a 0, so 0.12 rather than .12. I find that this makes the decimal point more obvious.
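As one possible way to aggregate the filtered rows into MRR per month, here is a sketch that runs on Python's built-in sqlite3 module. Several things are assumptions: SQLite's strftime('%Y-%m', ...) stands in for MySQL's EXTRACT(YEAR_MONTH FROM ...), prices are stored in cents (as the 3900.00 style values elsewhere in this thread suggest), and the three subscription rows are the ones from the question's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plans (id INTEGER PRIMARY KEY, plan_interval TEXT, price INTEGER);
CREATE TABLE subscriptions (account_id INTEGER, plan_id INTEGER, start_date TEXT);
INSERT INTO plans VALUES (1, 'month', 3900), (2, 'month', 9900), (3, 'year', 99900);
INSERT INTO subscriptions VALUES
    (1, 1, '2018-01-03 14:52:13'),
    (1, 3, '2018-02-07 11:10:17'),
    (1, 2, '2018-02-07 11:11:17');
""")

# Keep only the latest subscription per account and calendar month,
# then sum the monthly revenue (yearly plans are spread over 12 months).
query = """
SELECT strftime('%Y-%m', s.start_date) AS ym,
       SUM(CASE WHEN p.plan_interval = 'month'
                THEN p.price / 100.0
                ELSE p.price / 100.0 / 12
           END) AS mrr
FROM subscriptions s
JOIN plans p ON p.id = s.plan_id
WHERE s.start_date >= '2018-01-01' AND s.start_date < '2019-01-01'
  AND s.start_date = (
      SELECT MAX(s2.start_date)
      FROM subscriptions s2
      WHERE s2.account_id = s.account_id
        AND strftime('%Y-%m', s2.start_date) = strftime('%Y-%m', s.start_date)
  )
GROUP BY strftime('%Y-%m', s.start_date)
ORDER BY ym
"""
mrr_by_month = conn.execute(query).fetchall()
print(mrr_by_month)
```

The yearly plan from February is dropped because the monthly plan started one minute later in the same month for the same account.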
First work out the last entry per account and month (subquery a), join it back to subscriptions to get the plan_id, and then join plans to get the plan details.
SELECT S.ACCOUNT_id,s.plan_id,s.start_date,p.Price,p.plan_interval,
case when p.plan_interval = 'month' then p.price * .01 else p.price * .01 /12 end as rev
from subscriptions s
join (select s.account_id,month(s.start_date), max(s.start_date) start_date
from subscriptions s
group by account_id,month(start_date)) a on a.account_id = s.account_id and a.start_date = s.start_date
join plans p on p.id = s.plan_id;
+------------+---------+---------------------+----------+---------------+--------------+
| ACCOUNT_id | plan_id | start_date | Price | plan_interval | rev |
+------------+---------+---------------------+----------+---------------+--------------+
| 1 | 1 | 2018-01-03 14:52:13 | 3900.00 | month | 39.00000000 |
| 1 | 2 | 2018-02-07 11:11:17 | 9900.00 | month | 99.00000000 |
| 2 | 3 | 2018-01-03 17:40:05 | 99900.00 | year | 83.25000000 |
+------------+---------+---------------------+----------+---------------+--------------+
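In your case, the WHERE clause does not filter anything. Inside the CASE expression each column is compared with itself, so the first branch is always taken, and the date it returns is non-zero, which MySQL treats as true for every row.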
CASE WHEN (account_id = account_id
AND MONTH(start_date) = MONTH(start_date))
THEN (SELECT MAX(start_date) FROM subscriptions)
ELSE (SELECT start_date FROM subscriptions)
END
Another approach to what you are building would involve using a subquery to order the rows the way you want within the groups.
SELECT
account_id,
month,
CASE WHEN plan_interval = 'month'
THEN price * .01
ELSE (price * .01)/12
END AS mrr
FROM (
SELECT *, MONTH(start_date) AS month
FROM subscriptions
INNER JOIN plans ON plans.id = subscriptions.plan_id
ORDER BY account_id, start_date DESC
) sq
GROUP BY account_id, month
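This relies on MySQL picking the first row returned by the subquery for each group when non-aggregated columns are selected alongside GROUP BY. That behavior is not guaranteed: the server is free to choose any row from the group, and with ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5) the query is rejected.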
I have a table with the following data (merely an example, actual table has 600,000 rows) (aid = access id [primary key] and id = user id [foreign key]):
aid | id | date
332 | 1 | 2016-12-15
331 | 4 | 2016-12-15
330 | 3 | 2016-12-15
329 | 1 | 2016-12-14
328 | 1 | 2016-12-14
327 | 2 | 2016-12-14
326 | 3 | 2016-12-13
325 | 2 | 2016-12-13
324 | 1 | 2016-12-13
323 | 1 | 2016-12-12
322 | 3 | 2016-12-12
321 | 1 | 2016-12-12
Each id is a user's primary key, and every time they access something in my system I log them in this table (with their id and the date in the format shown). A user can be logged multiple times a day.
I'm looking to: return the total number of times the thing has been accessed in a day and return the total number of NEW users who have accessed the thing in a day, for the last 8 days (something will always be logged each day, so using "LIMIT 8" is fine for getting only the last 8 days).
My SQL currently looks like:
SELECT COUNT(id), COUNT(distinct id), date
FROM table
GROUP BY date
ORDER BY date DESC
LIMIT 8;
That SQL does the first part correctly, but I can't figure out how to get it to return the number of users who have never accessed the thing until that day.
The desired results are shown below; the one "newuser" is the user with id "4", who has never accessed the thing before:
COUNT(id) | newusers | date
3 | 1 | 2016-12-15
3 | 0 | 2016-12-14
3 | 0 | 2016-12-13
3 | 0 | 2016-12-12
Sorry if I didn't explain this clear enough.
To get new users you want the first day an id appeared:
select id, min(date)
from t
group by id;
The rest is just a join and group by:
select d.date, cnt, count(dd.id) as newusers
from (select date, count(*) as cnt
from t
group by date
) d left join
(select id, min(date) as mindate
from t
group by id
) dd
on d.date = dd.mindate
group by d.date, d.cnt
order by d.date desc
limit 8;
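A runnable sketch of the join-and-group approach above, using Python's built-in sqlite3 module and the twelve sample rows from the question. The table name access_log is a hypothetical stand-in, since table is a reserved word. Because the sample is only an excerpt, ids that first appear on the earliest days show up as new there, which differs from the desired output in the question where those users had older history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (aid INTEGER PRIMARY KEY, id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?, ?)",
    [
        (332, 1, "2016-12-15"), (331, 4, "2016-12-15"), (330, 3, "2016-12-15"),
        (329, 1, "2016-12-14"), (328, 1, "2016-12-14"), (327, 2, "2016-12-14"),
        (326, 3, "2016-12-13"), (325, 2, "2016-12-13"), (324, 1, "2016-12-13"),
        (323, 1, "2016-12-12"), (322, 3, "2016-12-12"), (321, 1, "2016-12-12"),
    ],
)

# d: total accesses per day; f: first day each id was ever seen.
# The LEFT JOIN keeps days on which no new user appeared (count 0).
query = """
SELECT d.date, d.cnt, COUNT(f.id) AS newusers
FROM (SELECT date, COUNT(*) AS cnt
      FROM access_log
      GROUP BY date) AS d
LEFT JOIN (SELECT id, MIN(date) AS mindate
           FROM access_log
           GROUP BY id) AS f
  ON f.mindate = d.date
GROUP BY d.date, d.cnt
ORDER BY d.date DESC
LIMIT 8
"""
daily_stats = conn.execute(query).fetchall()
print(daily_stats)
```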
To get the number of new users you need to compare them to a set of ids over the past 8 days
My MySQL is a bit rusty, so you might have to correct the syntax.
SELECT COUNT(id)
FROM table
WHERE id NOT IN (
SELECT DISTINCT id
FROM table
WHERE date BETWEEN DATE(DATE_SUB(NOW(), INTERVAL 8 DAY)) AND DATE(DATE_SUB(NOW(), INTERVAL 1 DAY))
)
I'll leave it as a task for you to combine it with your other query ;)
Hi, if the date column in your database is a DATE, DATETIME, or similar type, you can do something like this.
To get all users who accessed something in the last 8 days:
Select id, date from table
where date BETWEEN DATE_ADD(NOW(), INTERVAL -9 DAY) AND NOW()
I think you can do whatever grouping you want on that.
To get new users, you can use either a self join or a subselect.
Self join:
select t.id, t.date from table as t
LEFT join table as t2
ON t.id = t2.id
AND t.date BETWEEN DATE_ADD(NOW(), INTERVAL -1 DAY) AND NOW()
AND t2.date NOT BETWEEN DATE_ADD(NOW(), INTERVAL -9 DAY) AND NOW()
WHERE t2.id IS NULL
I used a LEFT JOIN to match all accesses from users and then excluded those rows in the WHERE clause. However, self joins are slow, and even slower with a LEFT JOIN.
Subselect:
select id, date from table
where date BETWEEN DATE_ADD(NOW(), INTERVAL -1 DAY) AND NOW()
AND id NOT IN (
SELECT id FROM table
WHERE date BETWEEN DATE_ADD(NOW(), INTERVAL -2 DAY) AND DATE_ADD(NOW(), INTERVAL -1 DAY)
)
I know those BETWEENs with DATE_ADD are not exactly nice looking, but I hope they help you more than grouping by dates.
I would suggest storing the date with a time for more information, but that is entirely up to the meaning of your data.
I have a table that registers user logs every minute, along with other activities, using a DATETIME for each user_id.
This is sample data from my table:
id | user_id | log_datetime
------------------------------------------
1 | 1 | 2016-09-25 13:01:08
2 | 1 | 2016-09-25 13:04:08
3 | 1 | 2016-09-25 13:07:08
4 | 1 | 2016-09-25 13:10:08
5 | 2 | 2016-09-25 13:11:08
6 | 1 | 2016-09-25 13:13:08
7 | 2 | 2016-09-25 13:13:09
8 | 2 | 2016-09-25 13:14:10
I would like to calculate the total active time on the system
UPDATE: Expected Output
For example, the total available time for user_id 1 should be 00:12:00.
Since his hours and seconds are the same, I just subtract each log from the next one and then sum all the differences.
Simply put, I want to loop through the data from the last record to the first record within my range.
This is a simple formula that I hope makes my question clear:
SUM((T< n > - T< n-1 >) + (T< n-1 > - T< n-2 >) ... + (T< n-x > - T< n-first >))
Since the hours and seconds for user_id 1 are the same, I only calculate the minutes:
(13-10)+(10-7)+(7-4)+(4-1) = 12
user_id | total_hours
---------------------------------
1 | 00:12:00
2 | 00:03:02
I wrote this code:
SET @start_date = '2016-09-25';
SET @start_time = '13:00:00';
SET @end_date = '2016-09-25';
SET @end_time = '13:15:00';
SELECT
`ul1`.`user_id`, SEC_TO_TIME(SUM(TIME_TO_SEC(`ul1`.`log_datetime`))) AS total_hours
FROM
`users_logs` AS `ul1`
JOIN `users_logs` AS `ul2`
ON `ul1`.`id` = `ul2`.`id`
WHERE
`ul1`.`log_datetime` >= CONCAT(@start_date, ' ', @start_time)
AND
`ul2`.`log_datetime` <= CONCAT(@end_date, ' ', @end_time)
GROUP BY `ul1`.`user_id`
But this code sums all the times instead of getting the differences. This is its output:
user_id | total_hours
---------------------------------
1 | 65:35:40
2 | 39:38:25
How can I calculate the sum of all the datetime differences? I then want to display the active hours per 12-hour period, (00:00:00 - 11:59:59) and (12:00:00 - 23:59:59), within the DateTime period selected at the beginning of the code.
So the output would look like this (just a dummy example, not from the given data):
user_id | total_hours | 00_12_am | 12_00_pm |
-------------------------------------------------------
1 | 10:10:40 | 02:05:20 | 08:05:20 |
2 | 04:10:20 | 01:05:10 | 03:05:30 |
Thank you
So you log every minute and if a user is available there is a log entry.
Then count the logs per user, so you have the number of total minutes.
select user_id, count(*) as total_minutes
from users_logs
group by user_id;
If you want them displayed as time use sec_to_time:
select user_id, sec_to_time(count(*) * 60) as total_hours
from users_logs
group by user_id;
As to conditional aggregation:
select
user_id,
count(*) as total_minutes,
count(case when hour(log_datetime) < 12 then 1 end) as total_minutes_am,
count(case when hour(log_datetime) >= 12 then 1 end) as total_minutes_pm
from users_logs
group by user_id;
UPDATE: In order to count each minute just once count distinct minutes, i.e. DATE_FORMAT(log_datetime, '%Y-%m-%d %H:%i'). This can be done with COUNT(DISTINCT ...) or with a subquery getting distinct values.
The complete query:
select
user_id,
count(*) as total_minutes,
count(case when log_hour < 12 then 1 end) as total_minutes_am,
count(case when log_hour >= 12 then 1 end) as total_minutes_pm
from
(
select distinct
user_id,
date_format(log_datetime, '%Y-%m-%d %H:%i') as log_moment,
hour(log_datetime) as log_hour
from users_logs
) log
group by user_id;
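For comparison, the formula in the question telescopes: the sum of consecutive differences equals the last timestamp minus the first one. Here is a sketch of that reading, using Python's built-in sqlite3 module and the sample rows from the question. The helper as_hms and the range bounds are illustrative assumptions, and this treats every gap between logs as active time, which may not hold for long gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users_logs (id INTEGER PRIMARY KEY, user_id INTEGER, log_datetime TEXT)"
)
conn.executemany(
    "INSERT INTO users_logs VALUES (?, ?, ?)",
    [
        (1, 1, "2016-09-25 13:01:08"), (2, 1, "2016-09-25 13:04:08"),
        (3, 1, "2016-09-25 13:07:08"), (4, 1, "2016-09-25 13:10:08"),
        (5, 2, "2016-09-25 13:11:08"), (6, 1, "2016-09-25 13:13:08"),
        (7, 2, "2016-09-25 13:13:09"), (8, 2, "2016-09-25 13:14:10"),
    ],
)

# strftime('%s', ...) yields Unix seconds as text, so cast before subtracting.
query = """
SELECT user_id,
       CAST(strftime('%s', MAX(log_datetime)) AS INTEGER)
     - CAST(strftime('%s', MIN(log_datetime)) AS INTEGER) AS active_seconds
FROM users_logs
WHERE log_datetime BETWEEN ? AND ?
GROUP BY user_id
ORDER BY user_id
"""


def as_hms(seconds):
    """Format a number of seconds as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


bounds = ("2016-09-25 13:00:00", "2016-09-25 13:15:00")
total_hours = {
    user_id: as_hms(seconds)
    for user_id, seconds in conn.execute(query, bounds)
}
print(total_hours)
```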
The table structure is: user_id, Date (I'm used to working with timestamps).
for example
user id | Date (TS)
A | '2014-08-10 14:02:53'
A | '2014-08-12 14:03:25'
A | '2014-08-13 14:04:47'
B | '2014-08-13 04:04:47'
...
and for the next week I have
user id | Date (TS)
A | '2014-08-17 09:02:53'
B | '2014-08-17 10:04:47'
B | '2014-08-18 10:04:47'
A | '2014-08-19 10:04:22'
C | '2014-08-19 11:04:47'
...
and for today I have
user id | Date (TS)
A | '2015-05-27 09:02:53'
B | '2015-05-27 10:04:47'
C | '2015-05-27 10:04:22'
D | '2015-05-27 17:04:47'
I need a single query that finds, for each day, the number of new users and the number of "returned" users, i.e. users who were already active on an earlier day, counted from the very beginning of their activity.
Expected results :
date | New user | returned User
2014-08-10 | 1 | 0
2014-08-11 | 0 | 0
2014-08-12 | 0 | 1 (A was active on 08/11)
2014-08-13 | 1 | 1 (A was active on 08/12 & 08/11)
...
2014-08-17 | 0 | 2 (A & B were already active )
2014-08-18 | 0 | 1
2014-08-19 | 1 | 1
...
2015-05-27 | 1 | 3 (D is a new user)
After a long search on Stack Overflow I found some material provided by https://meta.stackoverflow.com/users/107744/spencer7593 here: Weekly Active Users for each day from log, but I didn't succeed in changing his query to output my expected results.
Thanks for your help
Assuming you have a date table somewhere (and using T-SQL syntax because I know it better...), the key is to calculate the min date for each user separately, calculate the total number of users on each day, and then declare a returning user to be a user who wasn't new:
SELECT DateTable.Date, NewUsers, NumUsers - NewUsers AS ReturningUsers
FROM
DateTable
LEFT JOIN
(
SELECT MinDate, COUNT(user_id) AS NewUsers
FROM (
SELECT user_id, min(CAST(date AS Date)) as MinDate
FROM Table
GROUP BY user_id
) A
GROUP BY MinDate
) B ON DateTable.Date = B.MinDate
LEFT JOIN
(
SELECT CAST(date AS Date) AS Date, COUNT(DISTINCT user_id) AS NumUsers
FROM Table
GROUP BY CAST(date AS Date)
) C ON DateTable.Date = C.Date
Thanks to Stephen, I made a small fix to his query. It works well, even if it's a bit time consuming on a large database:
SELECT
DATE(Stats.Created),
NewUsers,
NumUsers - NewUsers AS ReturningUsers
FROM
Stats
LEFT JOIN
(
SELECT
MinDate,
COUNT(user_id) AS NewUsers
FROM (
SELECT
user_id,
MIN(DATE(Created)) as MinDate
FROM Stats
GROUP BY user_id
) A
GROUP BY MinDate
) B
ON DATE(Stats.Created) = B.MinDate
LEFT JOIN
(
SELECT
DATE(Created) AS Date,
COUNT(DISTINCT user_id) AS NumUsers
FROM Stats
GROUP BY DATE(Created)
) C
ON DATE(Stats.Created) = C.Date
GROUP BY DATE(Stats.Created)
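A sketch of the same idea driven by a separate calendar table, so that days without any activity still appear. It runs on Python's built-in sqlite3 module with the August 2014 sample rows from the question. The calendar table, its day column, and the lowercase stats and created names are assumptions made for this example.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (user_id TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?)",
    [
        ("A", "2014-08-10 14:02:53"), ("A", "2014-08-12 14:03:25"),
        ("A", "2014-08-13 14:04:47"), ("B", "2014-08-13 04:04:47"),
        ("A", "2014-08-17 09:02:53"), ("B", "2014-08-17 10:04:47"),
        ("B", "2014-08-18 10:04:47"), ("A", "2014-08-19 10:04:22"),
        ("C", "2014-08-19 11:04:47"),
    ],
)

# Hypothetical calendar table: one row per day, 2014-08-10 to 2014-08-19.
conn.execute("CREATE TABLE calendar (day TEXT PRIMARY KEY)")
start = date(2014, 8, 10)
conn.executemany(
    "INSERT INTO calendar VALUES (?)",
    [((start + timedelta(days=i)).isoformat(),) for i in range(10)],
)

# n: users whose first-ever day is that day; a: distinct users active that day.
# Returning users are the active users who are not new.
query = """
SELECT c.day,
       COALESCE(n.new_users, 0) AS new_users,
       COALESCE(a.active_users, 0) - COALESCE(n.new_users, 0) AS returning_users
FROM calendar c
LEFT JOIN (
    SELECT first_day, COUNT(*) AS new_users
    FROM (SELECT user_id, MIN(DATE(created)) AS first_day
          FROM stats
          GROUP BY user_id) AS firsts
    GROUP BY first_day
) AS n ON n.first_day = c.day
LEFT JOIN (
    SELECT DATE(created) AS day, COUNT(DISTINCT user_id) AS active_users
    FROM stats
    GROUP BY DATE(created)
) AS a ON a.day = c.day
ORDER BY c.day
"""
per_day = {day: (new, returning) for day, new, returning in conn.execute(query)}
print(per_day)
```

Driving the query from the calendar table avoids the final GROUP BY over every row of the log table, since each day appears exactly once.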