I have a table where I track how long each user watched films on each day.
Now I would like to calculate a unique view count based on date.
So the conditions are:
For each user, the view count is at most 1.
A view counts (= 1) once the user's SUM(duration) >= 120.
The view's date is fixed at the day on which SUM(duration) reaches 120.
The issue is getting the correct date row. For example, row1.duration + row2.duration >= 120, so the view count of 1 should be applied to 2021-10-16.
| id | user_id | duration | created_at | film_id |
+----+---------+----------+------------+---------+
| 1 | 1 | 80 | 2021-10-15 | 1 |
| 2 | 1 | 70 | 2021-10-16 | 1 |
| 3 | 1 | 200 | 2021-10-17 | 2 |
| 4 | 2 | 50 | 2021-10-18 | 1 |
| 5 | 2 | 90 | 2021-10-18 | 1 |
| 6 | 3 | 140 | 2021-10-18 | 2 |
| 7 | 4 | 10 | 2021-10-19 | 3 |
Expected result:
| cnt | created_at |
+-------+------------+
| 0 | 2021-10-15 |
| 1 | 2021-10-16 |
| 0 | 2021-10-17 |
| 2 | 2021-10-18 |
| 0 | 2021-10-19 |
This is what I tried, but it chooses the first date and omits dates with a 0 count.
SELECT count(*) AS cnt,
created_at
FROM
(SELECT user_id,
sum(duration) AS total,
created_at
FROM watch_time
GROUP BY user_id) AS t
WHERE t.total >= 120
GROUP BY created_at;
Is it possible to do this in SQL, or should it be done at the application level?
Thanks in advance!
Update:
Version: AWS RDS MySQL 5.7.33
But I'm OK with switching to Postgres if that helps.
Even a way to get MIN(date) together with all dates (including those with 0 views) would be much appreciated. Is there something better than this?
SELECT IFNULL(cnt, 0) as cnt,
t3.created_at
FROM
(SELECT count(*) AS cnt,
created_at
FROM
(SELECT user_id,
sum(duration) AS total,
created_at
FROM watch_time
GROUP BY user_id) AS t
WHERE t.total >= 120
GROUP BY created_at) AS t2
RIGHT JOIN
(SELECT distinct(created_at)
FROM watch_time) AS t3
ON t2.created_at = t3.created_at;
which returns:
| cnt | created_at |
+-------+------------+
| 1 | 2021-10-15 |
| 0 | 2021-10-16 |
| 0 | 2021-10-17 |
| 2 | 2021-10-18 |
| 0 | 2021-10-19 |
But I'm not sure whether the date (2021-10-15) is picked arbitrarily or is always the lowest date.
Update 2:
Is it possible to include film_id as well, i.e. treat each (user_id, film_id) pair as a unique view instead of grouping only by user_id?
So in this case:
row1 & row2 both have user_id: 1 and film_id: 1, and the sum of their durations is >= 120, so together they are 1 view, dated 2021-10-16.
But row3 has user_id: 1 and film_id: 2, and with duration >= 120 it is also 1 view, dated 2021-10-17.
Expected result:
| cnt | created_at |
+-------+------------+
| 0 | 2021-10-15 |
| 1 | 2021-10-16 |
| 1 | 2021-10-17 |
| 2 | 2021-10-18 |
| 0 | 2021-10-19 |
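The counting rule from the question can be sketched in plain Python to check both expected results. This is a minimal sketch, not a SQL solution: `views_per_date` and its `key` argument are hypothetical names, and rows on the same date are processed in id order (an assumption; for a per-date count the order within a day does not change the result).

```python
from collections import defaultdict

# Sample rows from the question: (id, user_id, duration, created_at, film_id)
ROWS = [
    (1, 1, 80, "2021-10-15", 1),
    (2, 1, 70, "2021-10-16", 1),
    (3, 1, 200, "2021-10-17", 2),
    (4, 2, 50, "2021-10-18", 1),
    (5, 2, 90, "2021-10-18", 1),
    (6, 3, 140, "2021-10-18", 2),
    (7, 4, 10, "2021-10-19", 3),
]

THRESHOLD = 120


def views_per_date(rows, key):
    """Count one view per key on the date its running duration first reaches THRESHOLD.

    `key` (hypothetical parameter) selects the grouping columns from a row:
    user_id only, or (user_id, film_id) for the Update 2 variant.
    """
    running = defaultdict(int)  # running SUM(duration) per key
    counted = set()             # keys that already produced their single view
    counts = {}                 # created_at -> number of views
    # Process rows in date order (id breaks ties) so the sum is cumulative.
    for row in sorted(rows, key=lambda r: (r[3], r[0])):
        _id, _user, duration, created_at, _film = row
        counts.setdefault(created_at, 0)  # every date appears, even with 0 views
        k = key(row)
        if k in counted:
            continue
        running[k] += duration
        if running[k] >= THRESHOLD:
            counted.add(k)
            counts[created_at] += 1
    return dict(sorted(counts.items()))


by_user = views_per_date(ROWS, key=lambda r: r[1])
by_user_film = views_per_date(ROWS, key=lambda r: (r[1], r[4]))
print(by_user)
print(by_user_film)
```

Passing a different `key` is the only change needed to switch from "one view per user" to "one view per user and film".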
Using MySQL user variables you can implement your counting logic: the query orders the table rows by user_id and created_at and then accumulates the duration row by row.
http://sqlfiddle.com/#!9/569088/14
SELECT created_at, SUM(CASE WHEN duration >= 120 THEN 1 ELSE 0 END) counts
FROM (
SELECT user_id, created_at,
CASE WHEN @UID != user_id THEN @SUM_TIME := 0 WHEN @SUM_TIME >= 120 AND @DT != created_at THEN @SUM_TIME := 0 - duration ELSE 0 END SX,
@SUM_TIME := @SUM_TIME + duration AS duration,
@UID := user_id,
@DT := created_at
FROM watch_time
JOIN ( SELECT @SUM_TIME :=0, @DT := NOW(), @UID := '' ) t
ORDER BY user_id, created_at
) f
GROUP BY created_at
I think I misunderstood the requirement in my first attempt.
Second attempt
MySQL >= 8.0 (or PostgreSQL) using window functions
I know you are working with MySQL 5.7; I add an answer for it next.
I am not sure I understand your requirement correctly. Do you want the cumulative watch time per user, counting one view on the day a user's total first exceeds 119 minutes?
First, I compute the cumulative sum per user ordered by date (cte). In cte1, a CASE expression sets the view column to 1 on the first row where a user reaches 120 minutes. Finally, I group by created_at (the date) and COUNT() the ones in the view column:
WITH cte AS (SELECT *, SUM(duration) OVER (PARTITION BY user_id ORDER BY created_at ASC, film_id) as cum_duration
FROM watch_time),
cte1 AS (SELECT *, CASE WHEN cum_duration >= 120 AND COALESCE(LAG(cum_duration) OVER (PARTITION BY user_id ORDER BY created_at ASC), 0) < 120 THEN 1 END AS view
FROM cte)
SELECT created_at, COUNT(view) AS cnt
FROM cte1
GROUP BY created_at;
| created_at | cnt |
+------------+-----+
| 2021-10-15 | 0   |
| 2021-10-16 | 1   |
| 2021-10-17 | 0   |
| 2021-10-18 | 2   |
| 2021-10-19 | 0   |
MySQL 5.7
I compute each user's cumulative duration per date with a self-join and keep the rows where it is >= 120, then group by user_id to get MIN(created_at). Finally, I group by min_created_at and count the records.
SELECT min_created_at AS date, count(*) AS cnt
FROM (SELECT user_id, MIN(created_at) AS min_created_at
FROM (SELECT wt1.user_id, wt1.created_at, SUM(wt2.duration) AS cum_duration
FROM (SELECT user_id, created_at, SUM(duration) AS duration FROM watch_time GROUP BY user_id, created_at) wt1
INNER JOIN (SELECT user_id, created_at, SUM(duration) AS duration FROM watch_time GROUP BY user_id, created_at) wt2 ON wt1.user_id = wt2.user_id AND wt1.created_at >= wt2.created_at
GROUP BY wt1.user_id, wt1.created_at
HAVING SUM(wt2.duration) >= 120) AS sq
GROUP BY user_id) AS sq2
GROUP BY min_created_at;
| date       | cnt |
+------------+-----+
| 2021-10-16 | 1   |
| 2021-10-18 | 2   |
To also get the remaining dates with a count of 0, RIGHT JOIN this query with the original table grouped by created_at.
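A runnable sketch of that combination, assuming Python's built-in `sqlite3` module as a stand-in for MySQL 5.7 (the query only uses constructs both engines accept). It LEFT JOINs the answer's query to the distinct dates, which has the same effect as the RIGHT JOIN described above, and uses COALESCE in place of IFNULL:

```python
import sqlite3

# In-memory SQLite stands in for MySQL 5.7 so the query can run anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE watch_time (id INTEGER, user_id INTEGER, duration INTEGER,"
    " created_at TEXT, film_id INTEGER)"
)
conn.executemany(
    "INSERT INTO watch_time VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 80, "2021-10-15", 1),
        (2, 1, 70, "2021-10-16", 1),
        (3, 1, 200, "2021-10-17", 2),
        (4, 2, 50, "2021-10-18", 1),
        (5, 2, 90, "2021-10-18", 1),
        (6, 3, 140, "2021-10-18", 2),
        (7, 4, 10, "2021-10-19", 3),
    ],
)

# The MySQL 5.7 query from the answer, LEFT JOINed to every distinct date so
# dates without a view come back with cnt = 0.
QUERY = """
SELECT d.created_at, COALESCE(v.cnt, 0) AS cnt
FROM (SELECT DISTINCT created_at FROM watch_time) AS d
LEFT JOIN (
    SELECT min_created_at, COUNT(*) AS cnt
    FROM (SELECT user_id, MIN(created_at) AS min_created_at
          FROM (SELECT wt1.user_id, wt1.created_at, SUM(wt2.duration) AS cum_duration
                FROM (SELECT user_id, created_at, SUM(duration) AS duration
                      FROM watch_time GROUP BY user_id, created_at) AS wt1
                INNER JOIN (SELECT user_id, created_at, SUM(duration) AS duration
                            FROM watch_time GROUP BY user_id, created_at) AS wt2
                  ON wt1.user_id = wt2.user_id AND wt1.created_at >= wt2.created_at
                GROUP BY wt1.user_id, wt1.created_at
                HAVING SUM(wt2.duration) >= 120) AS sq
          GROUP BY user_id) AS sq2
    GROUP BY min_created_at
) AS v ON v.min_created_at = d.created_at
ORDER BY d.created_at
"""

rows = conn.execute(QUERY).fetchall()
for created_at, cnt in rows:
    print(created_at, cnt)
```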
First attempt
I understood that you want to count one view each time a user reaches 120 minutes within a single day.
First, I get the total watch time per user and date (subquery sq). Then a CASE expression yields 1 whenever a user's total for a date exceeds 119 minutes, and I group by created_at (the date) and COUNT() those ones:
SELECT created_at, COUNT(CASE WHEN total_duration >= 120 THEN 1 END) cnt
FROM (SELECT created_at, user_id, SUM(duration) AS total_duration
FROM watch_time
GROUP BY created_at, user_id) AS sq
GROUP BY created_at;
Output (with sample data from the question):
| created_at | cnt |
+------------+-----+
| 2021-10-15 | 0   |
| 2021-10-16 | 0   |
| 2021-10-17 | 1   |
| 2021-10-18 | 2   |
| 2021-10-19 | 0   |
Using MariaDB, I am trying to get a monthly total of items that were created minus items that were deleted that month, for each month. If no items were deleted, the total should be just the number of items that were created that month. If more items were deleted than created, the total should be a negative number.
The table has a created_at column, which is never null, and a deleted_at column, which is set once the item has been 'deleted'.
To illustrate, the (simplified) schema is like this:
TABLE Items:
+----------------------------------------------------------------------------+
| idItem | created_at | deleted_at |
+----------------------------------------------------------------------------+
| 1 | 2020-03-20T04:28:41.000+00:00 | 2021-07-27T02:36:05.000+00:00 |
| 2 | 2020-03-20T04:28:41.000+00:00 | 2021-07-27T02:36:05.000+00:00 |
| 3 | 2021-03-02T21:39:10.000+00:00 | ∅ |
| 4 | 2021-03-05T21:13:13.000+00:00 | ∅ |
| 5 | 2021-06-08T13:49:11.000+00:00 | 2021-07-27T02:36:05.000+00:00 |
| 6 | 2021-07-13T02:36:05.000+00:00 | ∅ |
| 7 | 2021-09-17T21:12:13.000+00:00 | ∅ |
+----------------------------------------------------------------------------+
The information I need is the running monthly total of items that have not been deleted, like so:
+-----------------------------------+
| total_existing | during_month |
+-----------------------------------+
| 2 | 2020-03 | -- two were added
+-----------------------------------+
| 4 | 2021-03 | -- another two were created
+-----------------------------------+
| 5 | 2021-06 | -- another was added
+-----------------------------------+
| 3 | 2021-07 | -- three deleted, one added
+-----------------------------------+
| 4 | 2021-09 | -- one added
+-----------------------------------+
Ultimately, I need to display the total for each month.
I've tried this but it's not right.
SELECT
count(created.idItem) AS monthly_created_count,
count(deleted.idItem) AS monthly_deleted_count,
count(created.idItem) - count(deleted.idItem) as total,
DATE_FORMAT(created.created_at, '%Y-%m') as created_month ,
DATE_FORMAT(deleted.deleted_at, '%Y-%m') as deleted_month
FROM
Item created
LEFT JOIN
Item deleted
ON
DATE_FORMAT(deleted.deleted_at, '%Y-%m') = DATE_FORMAT(created.created_at, '%Y-%m')
GROUP BY DATE_FORMAT(created.created_at, '%Y-%m'), DATE_FORMAT(deleted.deleted_at, '%Y-%m')
I keep thinking I'm so close, but when we look at the rows where the deleted_at dates are set, it's obvious I'm off the mark.
If you're looking for a cumulative total of rows created/deleted, one approach is to COUNT the records created and deleted per month separately, combine the counts with UNION ALL, and then sum them per month:
SELECT t.YearMonth
, SUM(t.TotalCreated) - SUM(t.TotalDeleted) AS TotalExisting
FROM (
SELECT DATE_FORMAT(created_at, '%Y-%m') AS YearMonth
, COUNT(*) AS TotalCreated
, 0 AS TotalDeleted
FROM Item
GROUP BY DATE_FORMAT(created_at, '%Y-%m')
UNION ALL
SELECT DATE_FORMAT(deleted_at, '%Y-%m') AS YearMonth
, 0 AS TotalCreated
, COUNT(*) AS TotalDeleted
FROM Item
WHERE deleted_at IS NOT NULL
GROUP BY DATE_FORMAT(deleted_at, '%Y-%m')
) t
GROUP BY t.YearMonth
ORDER BY t.YearMonth
Results:
YearMonth | TotalExisting
:-------- | ------------:
2020-03 | 2
2021-03 | 2
2021-06 | 1
2021-07 | -2
2021-09 | 1
Then wrap those statements in a CTE and use a Window Function to calculate the rolling total:
WITH cte AS (
SELECT t.YearMonth
, SUM(t.TotalCreated) - SUM(t.TotalDeleted) AS TotalExisting
FROM (
SELECT DATE_FORMAT(created_at, '%Y-%m') AS YearMonth
, COUNT(*) AS TotalCreated
, 0 AS TotalDeleted
FROM Item
GROUP BY DATE_FORMAT(created_at, '%Y-%m')
UNION ALL
SELECT DATE_FORMAT(deleted_at, '%Y-%m') AS YearMonth
, 0 AS TotalCreated
, COUNT(*) AS TotalDeleted
FROM Item
WHERE deleted_at IS NOT NULL
GROUP BY DATE_FORMAT(deleted_at, '%Y-%m')
) t
GROUP BY t.YearMonth
ORDER BY t.YearMonth
)
SELECT YearMonth, SUM(TotalExisting) OVER (ORDER BY YearMonth) AS TotalExisting
FROM cte;
Final Results:
YearMonth | TotalExisting
:-------- | ------------:
2020-03 | 2
2021-03 | 4
2021-06 | 5
2021-07 | 3
2021-09 | 4
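For a quick check of the arithmetic without MariaDB, the same UNION ALL query can be run in SQLite through Python's `sqlite3`, with the running total computed by `itertools.accumulate` in place of the window function. This is a sketch under assumptions: `substr(x, 1, 7)` replaces `DATE_FORMAT(x, '%Y-%m')` because the sample timestamps are ISO-8601 strings.

```python
import sqlite3
from itertools import accumulate

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Item (idItem INTEGER, created_at TEXT, deleted_at TEXT)")
conn.executemany(
    "INSERT INTO Item VALUES (?, ?, ?)",
    [
        (1, "2020-03-20T04:28:41.000+00:00", "2021-07-27T02:36:05.000+00:00"),
        (2, "2020-03-20T04:28:41.000+00:00", "2021-07-27T02:36:05.000+00:00"),
        (3, "2021-03-02T21:39:10.000+00:00", None),
        (4, "2021-03-05T21:13:13.000+00:00", None),
        (5, "2021-06-08T13:49:11.000+00:00", "2021-07-27T02:36:05.000+00:00"),
        (6, "2021-07-13T02:36:05.000+00:00", None),
        (7, "2021-09-17T21:12:13.000+00:00", None),
    ],
)

# Same UNION ALL shape as the answer; substr(x, 1, 7) stands in for
# DATE_FORMAT(x, '%Y-%m') on these ISO-8601 strings.
NET_PER_MONTH = """
SELECT t.YearMonth, SUM(t.TotalCreated) - SUM(t.TotalDeleted) AS Net
FROM (
    SELECT substr(created_at, 1, 7) AS YearMonth, COUNT(*) AS TotalCreated, 0 AS TotalDeleted
    FROM Item GROUP BY substr(created_at, 1, 7)
    UNION ALL
    SELECT substr(deleted_at, 1, 7), 0, COUNT(*)
    FROM Item WHERE deleted_at IS NOT NULL GROUP BY substr(deleted_at, 1, 7)
) AS t
GROUP BY t.YearMonth
ORDER BY t.YearMonth
"""

rows = conn.execute(NET_PER_MONTH).fetchall()
months = [month for month, _ in rows]
# Running total: the role SUM(...) OVER (ORDER BY YearMonth) plays in the CTE version.
running = list(accumulate(net for _, net in rows))
totals = dict(zip(months, running))
print(totals)
```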
Here is my current query:
SELECT DATEDIFF(created_at, '2020-07-01') DIV 6 period,
user_id FROM transactions
WHERE DATE(created_at) >= '2020-07-01'
GROUP BY user_id, DATEDIFF(created_at, '2020-07-01') DIV 6
ORDER BY period
It returns the users who had at least one transaction in each period (one period = 6 days). Here is a simplified version of the current output:
// res_table
+--------+---------+
| period | user_id |
+--------+---------+
| 0 | 1111 |
| 0 | 2222 |
| 0 | 3333 |
| 1 | 7777 |
| 1 | 1111 |
| 2 | 2222 |
| 2 | 1111 |
| 2 | 8888 |
| 2 | 3333 |
+--------+---------+
Now I need to know, for each period, how many of its users had at least one transaction again in each later period (in marketing terms, I'm trying to chart the retention rate as a cohort chart). So the calculation has to pair each period with each later period (a Cartesian product, like a self-join).
Here is the expected result:
+---------+---------+------------+
| periodX | periodY | percentage |
+---------+---------+------------+
| 0 | 0 | 100% | -- it means 3 users exist in period 0 and logically all of them exist in period 0. So 3/3=100%
| 0 | 1 | 33% | -- It means 3 users exist in period 0, and just 1 of them exist in period 1. So 1/3=33%
| 0 | 2 | 66% | -- It means 3 users exist in period 0, and just 2 of them exist in period 2. So 2/3=66%
| 1 | 1 | 100% | -- it means 1 user (only #7777; #1111 is ignored because it already appeared in a previous period) exists in period 1 and logically it exists in period 1. So 1/1=100%
| 1 | 2 | 0% |
| 2 | 2 | 100% |
+---------+---------+------------+
Is it possible to do this in pure MySQL?
You can use window functions:
SELECT first_period, period, COUNT(*),
COUNT(*) / MAX(COUNT(*)) OVER (PARTITION BY first_period) as ratio -- cohort size = count in first_period, the largest in the partition
FROM (SELECT DATEDIFF(created_at, '2020-07-01') DIV 6 period,
user_id,
MIN(MIN(DATEDIFF(created_at, '2020-07-01') DIV 6)) OVER (PARTITION BY user_id) as first_period
FROM transactions
WHERE DATE(created_at) >= '2020-07-01'
GROUP BY user_id, DATEDIFF(created_at, '2020-07-01') DIV 6
) u
GROUP BY first_period, period
ORDER BY first_period, period;
This does not include missing periods. That is a little trickier, because you need to enumerate all of them:
with periods as (
select 0 as period union all
select 1 as period union all
select 2 as period
)
select p1.period, p2.period, COUNT(u.user_id)
from periods p1 join
periods p2
on p1.period <= p2.period left join
(SELECT DATEDIFF(created_at, '2020-07-01') DIV 6 period,
user_id,
MIN(MIN(DATEDIFF(created_at, '2020-07-01') DIV 6)) OVER (PARTITION BY user_id) as first_period
FROM transactions
WHERE DATE(created_at) >= '2020-07-01'
GROUP BY user_id, DATEDIFF(created_at, '2020-07-01') DIV 6
) u
ON p1.period = u.first_period AND p2.period = u.period
GROUP BY p1.period, p2.period;
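The cohort logic can be sketched in Python to make the percentages concrete: take each user's first period, count distinct users per (first period, period) pair, and divide by the cohort size, i.e. the count in the first period. `retention` is a hypothetical helper working on the simplified `res_table` rows from the question. With those rows exactly as listed, users 1111, 2222 and 3333 all reappear in period 2, so (0, 2) comes out as 100% rather than the 66% shown in the expected result.

```python
from collections import defaultdict

# (period, user_id) pairs from the simplified res_table output in the question.
RES_TABLE = [
    (0, 1111), (0, 2222), (0, 3333),
    (1, 7777), (1, 1111),
    (2, 2222), (2, 1111), (2, 8888), (2, 3333),
]


def retention(pairs):
    """Return {(first_period, period): share of the cohort active in period}."""
    first = {}
    for period, user in pairs:
        first[user] = min(period, first.get(user, period))
    active = defaultdict(set)  # (cohort, period) -> users active in that period
    for period, user in pairs:
        active[(first[user], period)].add(user)
    periods = sorted({p for p, _ in pairs})
    result = {}
    for p1 in periods:
        cohort_size = len(active[(p1, p1)])  # every member is active in its first period
        if cohort_size == 0:
            continue  # no user started in this period
        for p2 in periods:
            if p2 >= p1:  # same p1 <= p2 pairing as the second query
                result[(p1, p2)] = len(active[(p1, p2)]) / cohort_size
    return result


shares = retention(RES_TABLE)
for (p1, p2), share in sorted(shares.items()):
    print(p1, p2, f"{share:.0%}")
```

Like the second query, it also emits pairs with no returning users, such as (1, 2) at 0%.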
I am running a MariaDB database (version 10.1.39-MariaDB, mariadb.org binary).
I have the following table:
| id | date | product_name | close |
|----|---------------------|--------------|-------|
| 1 | 2019-08-07 00:00:00 | Product 1 | 806 |
| 2 | 2019-08-06 00:00:00 | Product 1 | 982 |
| 3 | 2019-08-05 00:00:00 | Product 1 | 64 |
| 4 | 2019-08-07 00:00:00 | Product 2 | 874 |
| 5 | 2019-08-06 00:00:00 | Product 2 | 739 |
| 6 | 2019-08-05 00:00:00 | Product 2 | 555 |
| 7 | 2019-08-07 00:00:00 | Product 3 | 762 |
| 8 | 2019-08-06 00:00:00 | Product 3 | 955 |
| 9 | 2019-08-05 00:00:00 | Product 3 | 573 |
I want to get the following output:
| id | date | product_name | close | daily_return |
|----|---------------------|--------------|-------|--------------|
| 4 | 2019-08-07 00:00:00 | Product 2 | 874 | 0,182679296 |
| 1 | 2019-08-07 00:00:00 | Product 1 | 806 | -0,179226069 |
Basically, I want to get the top 2 products with the highest return, where the return for each product is calculated as (close_currentDay - close_previousDay) / close_previousDay.
I tried the following:
SELECT
*,
(
CLOSE -(
SELECT
(t2.close)
FROM
prices t2
WHERE
t2.date < t1.date
ORDER BY
t2.date
DESC
LIMIT 1
)
) /(
SELECT
(t2.close)
FROM
prices t2
WHERE
t2.date < t1.date
ORDER BY
t2.date
DESC
LIMIT 1
) AS daily_return
FROM
prices t1
WHERE DATE >= DATE(NOW()) - INTERVAL 1 DAY
Which gives me the return for each product_name.
How can I keep only the latest date for each product_name and sort by the highest daily_return?
Problem statement: find the top 2 products with the highest returns on the latest date, i.e. the max date in the table.
Solution:
If you have an index on the date field, this should be very fast: the query scans the table only once, and the date filter lets MySQL use the index to process only rows in the given date range.
A user-defined variable @old_close is used to compute the return. Note that the rows must be sorted by product and date for this to work.
SELECT *
FROM (
SELECT
prices.*,
CAST((`close` - @old_close) / @old_close AS DECIMAL(20, 10)) AS daily_return, -- @old_close still holds the previous row's close here
@old_close := `close` -- store this row's close for the next row
FROM prices
INNER JOIN (
SELECT
DATE(MAX(`date`)) - INTERVAL 1 DAY AS date_from, -- if the previous day may be missing, widen this to 2 or 3 days
@old_close := 0 AS o_c
FROM prices
) AS t ON prices.date >= t.date_from
ORDER BY product_name, `date` ASC
) AS tt
ORDER BY `date` DESC, daily_return DESC
LIMIT 2;
Another version, which doesn't depend on the date filter:
SELECT *
FROM (
SELECT
prices.*,
CAST((`close` - @old_close) / @old_close AS DECIMAL(20, 10)) AS daily_return, -- @old_close still holds the previous row's close here
@old_close := `close` -- store this row's close for the next row
FROM prices,
(SELECT @old_close := 0 AS o_c) AS t
ORDER BY product_name, `date` ASC
) AS tt
ORDER BY `date` DESC, daily_return DESC
LIMIT 2
You can do it with a self join:
select
p.*,
cast((p.close - pp.close) / pp.close as decimal(20, 10)) as daily_return
from prices p left join prices pp
on p.product_name = pp.product_name
and pp.date = date_add(p.date, interval -1 day)
order by p.date desc, daily_return desc, p.product_name
limit 2
Results:
| id | date | product_name | close | daily_return |
| --- | ------------------- | ------------ | ----- | ------------ |
| 4 | 2019-08-07 00:00:00 | Product 2 | 874 | 0.182679296 |
| 1 | 2019-08-07 00:00:00 | Product 1 | 806 | -0.179226069 |
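The self-join can be tried out on the sample data with SQLite via Python's `sqlite3` (an assumption for portability; on MariaDB keep the `date_add` form above). Here `datetime(p.date, '-1 day')` plays the role of `date_add(p.date, interval -1 day)`, and the `* 1.0` forces real division instead of integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (id INTEGER, date TEXT, product_name TEXT, close INTEGER)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [
        (1, "2019-08-07 00:00:00", "Product 1", 806),
        (2, "2019-08-06 00:00:00", "Product 1", 982),
        (3, "2019-08-05 00:00:00", "Product 1", 64),
        (4, "2019-08-07 00:00:00", "Product 2", 874),
        (5, "2019-08-06 00:00:00", "Product 2", 739),
        (6, "2019-08-05 00:00:00", "Product 2", 555),
        (7, "2019-08-07 00:00:00", "Product 3", 762),
        (8, "2019-08-06 00:00:00", "Product 3", 955),
        (9, "2019-08-05 00:00:00", "Product 3", 573),
    ],
)

# Each row is joined to the same product's row from the previous day.
QUERY = """
SELECT p.id, p.product_name,
       (p.close - pp.close) * 1.0 / pp.close AS daily_return
FROM prices p
LEFT JOIN prices pp
  ON p.product_name = pp.product_name
 AND pp.date = datetime(p.date, '-1 day')
ORDER BY p.date DESC, daily_return DESC, p.product_name
LIMIT 2
"""

top2 = conn.execute(QUERY).fetchall()
for row in top2:
    print(row)
```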
I have the following ranking table:
CREATE TABLE IF NOT EXISTS ranking(
user_id int(11) unsigned NOT NULL,
create_date date NOT NULL,
score int(8),
PRIMARY KEY (user_id, create_date)
);
I want to get each user's maximum number of consecutive days on which the score is greater than or equal to 10. For example, if the table contains the following entries, the output (user, max_number) is as listed below. How can I write this query in MySQL?
user_id | create_date | score
1 | 2017-03-08 | 40
1 | 2017-03-07 | 50
1 | 2017-03-06 | 60
1 | 2017-03-05 | 0
1 | 2017-03-04 | 70
1 | 2017-03-03 | 80
1 | 2017-03-02 | 0
2 | 2017-03-10 | 20
2 | 2017-03-09 | 30
2 | 2017-03-08 | 40
2 | 2017-03-07 | 50
2 | 2017-03-06 | 0
2 | 2017-03-05 | 60
2 | 2017-03-04 | 70
Output:
user_id | max_number
1 | 3
2 | 4
You can use user variables for this task:
select user_id, max(cnt) max_cnt
from (
select user_id, date_group, count(*) cnt
from (
select t.*, date_sub(create_date, interval(@rn := @rn + 1) day) date_group
from ranking t, (select @rn := 0) x
where score >= 10
order by user_id, create_date
) t
group by user_id, date_group
) t
group by user_id;
Produces:
user_id max_cnt
1 3
2 4
How it works:
We generate a sequence number in order of user_id and create_date (both ascending), then subtract that many days from create_date. Within a run of consecutive dates the result is constant, so it forms a group (date_group); the outer queries count each group and take the maximum per user.
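The same gaps-and-islands idea, sketched in Python on the question's rows (`max_streaks` is a hypothetical helper). It numbers each user's qualifying days separately; the SQL version keeps one counter across all users, which gives the same groups because date_group is always combined with user_id.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# (user_id, create_date, score) rows from the question.
RANKING = [
    (1, "2017-03-08", 40), (1, "2017-03-07", 50), (1, "2017-03-06", 60),
    (1, "2017-03-05", 0), (1, "2017-03-04", 70), (1, "2017-03-03", 80),
    (1, "2017-03-02", 0),
    (2, "2017-03-10", 20), (2, "2017-03-09", 30), (2, "2017-03-08", 40),
    (2, "2017-03-07", 50), (2, "2017-03-06", 0), (2, "2017-03-05", 60),
    (2, "2017-03-04", 70),
]


def max_streaks(rows, min_score=10):
    """Longest run of consecutive qualifying days per user (gaps and islands)."""
    qualifying = defaultdict(list)
    for user_id, day, score in rows:
        if score >= min_score:  # WHERE score >= 10
            qualifying[user_id].append(date.fromisoformat(day))
    result = {}
    for user_id, days in qualifying.items():
        days.sort()  # ORDER BY create_date within the user
        # create_date minus its row number is constant inside a run of
        # consecutive days, so it works as the date_group key.
        groups = Counter(d - timedelta(days=rn) for rn, d in enumerate(days, start=1))
        result[user_id] = max(groups.values())  # max(cnt) per user
    return result


print(max_streaks(RANKING))
```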
I'm using two tables in the database. These tables look like this:
Table A:
id | date
----------------------
12001 | 2011-01-01
13567 | 2011-01-04
13567 | 2011-01-04
11546 | 2011-01-07
13567 | 2011-01-07
18000 | 2011-01-08
Table B:
user | date | amount
----------------------------------
15467 | 2011-01-04 | 140
14568 | 2011-01-04 | 120
14563 | 2011-01-05 | 140
12341 | 2011-01-07 | 140
18000 | 2011-01-08 | 120
I need a query that joins these two tables.
The first query should return, grouped by date, the total number of rows (users) in table A and the number of unique users in table A. That query looks like:
SELECT COUNT(DISTINCT id) AS uniq, COUNT(*) AS total, DATE_FORMAT(date, '%Y-%m-%d') AS date FROM A GROUP BY date
From the second table I need the sum of the amounts grouped by dates.
That query looks like:
SELECT SUM(amount) AS total_amount FROM B GROUP BY DATE_FORMAT( date, '%Y-%m-%d' )
I want to merge these two queries into one on the "date" column, so that I get the following result:
date | unique | total | amount
-----------------------------------------------
2011-01-01 | 1 | 1 | 0
2011-01-04 | 1 | 2 | 260
2011-01-05 | 0 | 0 | 140
2011-01-07 | 2 | 2 | 140
2011-01-08 | 1 | 1 | 120
How can I do that using one query?
Thanks for all suggestions.
select date_format(a.date, '%Y-%m-%d') as date, a.uniq, a.total, ifnull(b.amount, 0) as amount
from (
select count(distinct id) as uniq, count(*) as total, date
from tablea
group by date
) a
left join (
select sum(amount) as amount, date
from tableb
group by date
) b on a.date = b.date
order by a.date
I assume that the date field is of type DATETIME. It's better to format output fields only in the final result set (the date field in this case).
Your queries are fine; all they need is a join.
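Note that the LEFT JOIN above only keeps dates that exist in table A, so 2011-01-05 (present only in table B) is missing from its result. MySQL has no FULL OUTER JOIN; one common workaround is to build the list of dates from both tables with UNION and LEFT JOIN both aggregates onto it. A sketch using SQLite through Python's `sqlite3` as a stand-in, assuming the dates are stored as plain 'YYYY-MM-DD' values (so no DATE_FORMAT is needed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (id INTEGER, date TEXT)")
conn.execute('CREATE TABLE tableb ("user" INTEGER, date TEXT, amount INTEGER)')
conn.executemany("INSERT INTO tablea VALUES (?, ?)", [
    (12001, "2011-01-01"), (13567, "2011-01-04"), (13567, "2011-01-04"),
    (11546, "2011-01-07"), (13567, "2011-01-07"), (18000, "2011-01-08"),
])
conn.executemany("INSERT INTO tableb VALUES (?, ?, ?)", [
    (15467, "2011-01-04", 140), (14568, "2011-01-04", 120), (14563, "2011-01-05", 140),
    (12341, "2011-01-07", 140), (18000, "2011-01-08", 120),
])

# Collect dates from BOTH tables first (UNION removes duplicates), then attach
# each aggregate with a LEFT JOIN; missing aggregates become 0.
QUERY = """
SELECT d.date,
       COALESCE(a.uniq, 0)   AS uniq,
       COALESCE(a.total, 0)  AS total,
       COALESCE(b.amount, 0) AS amount
FROM (SELECT date FROM tablea UNION SELECT date FROM tableb) AS d
LEFT JOIN (SELECT date, COUNT(DISTINCT id) AS uniq, COUNT(*) AS total
           FROM tablea GROUP BY date) AS a ON a.date = d.date
LEFT JOIN (SELECT date, SUM(amount) AS amount
           FROM tableb GROUP BY date) AS b ON b.date = d.date
ORDER BY d.date
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```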