I have the following MySQL table:
+----+---------+----------------+------------+
| id | user_id | employment_type| date |
+----+---------+----------------+------------+
| 1 | 9 | full-time | 2013-01-01 |
| 2 | 9 | half-time | 2013-05-10 |
| 3 | 9 | full-time | 2013-12-01 |
| 4 | 248 | intern | 2015-01-01 |
| 5 | 248 | full-time | 2018-10-10 |
| 6 | 58 | half-time | 2020-10-10 |
| 7 | 248 | NULL | 2021-01-01 |
+----+---------+----------------+------------+
I want to query, for example, which employees were full-time employed on 2014-01-01.
Which SQL query do I need to get the correct result?
In this case, the result should be the employee with user_id=9.
Is this table structured properly to make such a query possible?
If your version of MySQL is 8.0+, you can do it with the FIRST_VALUE() window function:
SELECT DISTINCT user_id
FROM (
SELECT user_id,
FIRST_VALUE(employment_type) OVER (PARTITION BY user_id ORDER BY date DESC) last_type
FROM tablename
WHERE date <= '2014-01-01'
) t
WHERE last_type = 'full-time'
For earlier versions of MySQL, you can do it with NOT EXISTS:
SELECT t1.user_id
FROM tablename t1
WHERE t1.date <= '2014-01-01' AND t1.employment_type = 'full-time'
AND NOT EXISTS (
SELECT 1
FROM tablename t2
WHERE t2.user_id = t1.user_id AND t2.date BETWEEN t1.date AND '2014-01-01'
AND COALESCE(t2.employment_type, '') <> t1.employment_type
)
See the demo.
Results:
| user_id |
| ------- |
| 9 |
You want the most recent record on or before that date. I would use row_number():
select t.*
from (select t.*,
row_number() over (partition by user_id order by date desc) as seqnum
from t
where date <= '2014-01-01'
) t
where seqnum = 1 and employment_type = 'full-time';
A fun method that just uses group by is:
select t.user_id
from t
where t.date <= '2014-01-01'
group by t.user_id
having max(date) = max(case when employment_type = 'full-time' then date end);
This checks that the maximum date -- before the cutoff -- is the same as the maximum date for 'full-time'.
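The GROUP BY variant can be checked against the sample data from the question. The following is only a minimal sketch, not part of the original answers: it uses Python's built-in sqlite3 module with an in-memory database standing in for MySQL, and the table name employment and the helper full_time_on are made up for the illustration.

```python
import sqlite3

# Sample rows copied from the question: (id, user_id, employment_type, date)
ROWS = [
    (1, 9, "full-time", "2013-01-01"),
    (2, 9, "half-time", "2013-05-10"),
    (3, 9, "full-time", "2013-12-01"),
    (4, 248, "intern", "2015-01-01"),
    (5, 248, "full-time", "2018-10-10"),
    (6, 58, "half-time", "2020-10-10"),
    (7, 248, None, "2021-01-01"),
]

QUERY = """
SELECT user_id
FROM employment
WHERE date <= ?
GROUP BY user_id
HAVING MAX(date) = MAX(CASE WHEN employment_type = 'full-time' THEN date END)
ORDER BY user_id
"""


def full_time_on(cutoff):
    """Return the user_ids whose latest record on or before `cutoff` is full-time."""
    conn = sqlite3.connect(":memory:")  # SQLite is an assumption; the question uses MySQL
    conn.execute(
        "CREATE TABLE employment "
        "(id INTEGER, user_id INTEGER, employment_type TEXT, date TEXT)"
    )
    conn.executemany("INSERT INTO employment VALUES (?, ?, ?, ?)", ROWS)
    result = [row[0] for row in conn.execute(QUERY, (cutoff,))]
    conn.close()
    return result


print(full_time_on("2014-01-01"))  # [9]
```

Dates are stored as ISO strings here, so plain string comparison orders them correctly. A user whose latest record has a NULL or non-full-time type drops out because the two maxima differ (or the second one is NULL).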
I have a table where I track the duration of watched films by a user for each day.
Now I would like to calculate a unique view count based on date.
So the conditions are:
For each user max view count is 1
View = 1 if one user's SUM(duration) >= 120
Date should be fixed once SUM(duration) reaches 120
The issue is getting the correct date row. For example, row1.duration + row2.duration >= 120, so view count = 1 should be assigned to 2021-10-16.
| id | user_id | duration | created_at | film_id |
+----+---------+----------+------------+---------+
| 1 | 1 | 80 | 2021-10-15 | 1 |
| 2 | 1 | 70 | 2021-10-16 | 1 |
| 3 | 1 | 200 | 2021-10-17 | 2 |
| 4 | 2 | 50 | 2021-10-18 | 1 |
| 5 | 2 | 90 | 2021-10-18 | 1 |
| 6 | 3 | 140 | 2021-10-18 | 2 |
| 7 | 4 | 10 | 2021-10-19 | 3 |
Expected result:
| cnt | created_at |
+-------+------------+
| 0 | 2021-10-15 |
| 1 | 2021-10-16 |
| 0 | 2021-10-17 |
| 2 | 2021-10-18 |
| 0 | 2021-10-19 |
This is what I tried, but it chooses the first date and ignores dates with a count of 0.
Here is the fiddle with populated data
SELECT count(*) AS cnt,
created_at
FROM
(SELECT user_id,
sum(duration) AS total,
created_at
FROM watch_time
GROUP BY user_id) AS t
WHERE t.total >= 120
GROUP BY created_at;
Is there any chance of making this work in SQL, or should it be done at the application level?
Thanks in advance!
Update:
Version: AWS RDS MySQL 5.7.33
But I'm ok to switch to Postgres if that can help.
I would also appreciate a way to get MIN(date) while keeping all dates (including those with 0 views),
something better than this one:
SELECT IFNULL(cnt, 0) as cnt,
t3.created_at
FROM
(SELECT count(*) AS cnt,
created_at
FROM
(SELECT user_id,
sum(duration) AS total,
created_at
FROM watch_time
GROUP BY user_id) AS t
WHERE t.total >= 120
GROUP BY created_at) AS t2
RIGHT JOIN
(SELECT distinct(created_at)
FROM watch_time) AS t3
ON t2.created_at = t3.created_at;
which returns:
| cnt | created_at |
+-------+------------+
| 1 | 2021-10-15 |
| 0 | 2021-10-16 |
| 0 | 2021-10-17 |
| 2 | 2021-10-18 |
| 0 | 2021-10-19 |
But I'm not sure whether the date (2021-10-15) is picked arbitrarily or whether it is always the lowest date.
Update 2:
Is it possible to include the film_id as well, treating each (user_id, film_id) pair as a unique view instead of grouping only by user_id?
So in this case:
Rows 1 and 2 both have user_id 1 and film_id 1, and together they count as 1 view because the sum of their durations is >= 120, so the date in this case is 2021-10-16.
Row 3 has user_id 1 and film_id 2, and with a duration >= 120 it also counts as 1 view, dated 2021-10-17.
| id | user_id | duration | created_at | film_id |
+----+---------+----------+------------+---------+
| 1 | 1 | 80 | 2021-10-15 | 1 |
| 2 | 1 | 70 | 2021-10-16 | 1 |
| 3 | 1 | 200 | 2021-10-17 | 2 |
| 4 | 2 | 50 | 2021-10-18 | 1 |
| 5 | 2 | 90 | 2021-10-18 | 1 |
| 6 | 3 | 140 | 2021-10-18 | 2 |
| 7 | 4 | 10 | 2021-10-19 | 3 |
Expected result:
| cnt | created_at |
+-------+------------+
| 0 | 2021-10-15 |
| 1 | 2021-10-16 |
| 1 | 2021-10-17 |
| 2 | 2021-10-18 |
| 0 | 2021-10-19 |
This count logic can be implemented with MySQL user variables: the query orders the rows by user_id and created_at and accumulates the sum row by row.
http://sqlfiddle.com/#!9/569088/14
SELECT created_at, SUM(CASE WHEN duration >= 120 THEN 1 ELSE 0 END) counts
FROM (
SELECT user_id, created_at,
CASE WHEN @UID != user_id THEN @SUM_TIME := 0 WHEN @SUM_TIME >= 120 AND @DT != created_at THEN @SUM_TIME := 0 - duration ELSE 0 END SX,
@SUM_TIME := @SUM_TIME + duration AS duration,
@UID := user_id,
@DT := created_at
FROM watch_time
JOIN ( SELECT @SUM_TIME :=0, @DT := NOW(), @UID := '' ) t
ORDER BY user_id, created_at
) f
GROUP BY created_at
I think I misunderstood the requirement in my first attempt.
Second attempt
MySQL >= 8.0 (or PostgreSQL), using window functions
I know you are working with MySQL 5.7; I add an answer for it below.
I am not sure I understand your requirement correctly. Do you want the cumulative watch time per user, counting one view on the day a user first reaches 120 minutes?
First, I compute the cumulative sum per user, ordered by date (subquery cte). In subquery cte1, a CASE expression sets the view column to 1 the first time a user reaches 120 minutes. Finally, I group by created_at (date) and count the ones in the view column:
WITH cte AS (SELECT *, SUM(duration) OVER (PARTITION BY user_id ORDER BY created_at ASC, film_id) as cum_duration
FROM watch_time),
cte1 AS (SELECT *, CASE WHEN cum_duration >= 120 AND COALESCE(LAG(cum_duration) OVER (PARTITION BY user_id ORDER BY created_at ASC), 0) < 120 THEN 1 END AS view
FROM cte)
SELECT created_at, COUNT(view) AS cnt
FROM cte1
GROUP BY created_at;
| created_at | cnt |
+------------+-----+
| 2021-10-15 | 0 |
| 2021-10-16 | 1 |
| 2021-10-17 | 0 |
| 2021-10-18 | 2 |
| 2021-10-19 | 0 |
MySQL 5.7
I compute the cumulative sum for each user and keep the rows whose cumulative duration is >= 120, then I group by user_id and take MIN(created_at). Finally, I group by min_created_at and count the records.
SELECT min_created_at AS date, count(*) AS cnt
FROM (SELECT user_id, MIN(created_at) AS min_created_at
FROM (SELECT wt1.user_id, wt1.created_at, SUM(wt2.duration) AS cum_duration
FROM (SELECT user_id, created_at, SUM(duration) AS duration FROM watch_time GROUP BY user_id, created_at) wt1
INNER JOIN (SELECT user_id, created_at, SUM(duration) AS duration FROM watch_time GROUP BY user_id, created_at) wt2 ON wt1.user_id = wt2.user_id AND wt1.created_at >= wt2.created_at
GROUP BY wt1.user_id, wt1.created_at
HAVING SUM(wt2.duration) >= 120) AS sq
GROUP BY user_id) AS sq2
GROUP BY min_created_at;
| date | cnt |
+------------+-----+
| 2021-10-16 | 1 |
| 2021-10-18 | 2 |
You can JOIN my query (RIGHT JOIN) with the original table (GROUP BY created_at) to get the rest of the dates with count equal to 0.
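The join that brings back the zero-count dates could look roughly like the sketch below. It is an illustration rather than the answerer's code: it runs on Python's built-in sqlite3 module (an assumption, chosen so the example is self-contained), it starts from the list of distinct dates and uses a LEFT JOIN instead of a RIGHT JOIN, and the helper name unique_views_per_day is hypothetical. It avoids window functions and CTEs, in keeping with the MySQL 5.7 constraint.

```python
import sqlite3

# Sample rows from the question: (id, user_id, duration, created_at, film_id)
ROWS = [
    (1, 1, 80, "2021-10-15", 1),
    (2, 1, 70, "2021-10-16", 1),
    (3, 1, 200, "2021-10-17", 2),
    (4, 2, 50, "2021-10-18", 1),
    (5, 2, 90, "2021-10-18", 1),
    (6, 3, 140, "2021-10-18", 2),
    (7, 4, 10, "2021-10-19", 3),
]

QUERY = """
SELECT d.created_at, COUNT(v.user_id) AS cnt
FROM (SELECT DISTINCT created_at FROM watch_time) AS d
LEFT JOIN (
    -- first date on which each user's running total reaches 120
    SELECT user_id, MIN(created_at) AS first_reached
    FROM (
        SELECT wt1.user_id AS user_id, wt1.created_at AS created_at
        FROM (SELECT user_id, created_at, SUM(duration) AS duration
              FROM watch_time GROUP BY user_id, created_at) AS wt1
        JOIN (SELECT user_id, created_at, SUM(duration) AS duration
              FROM watch_time GROUP BY user_id, created_at) AS wt2
          ON wt2.user_id = wt1.user_id AND wt2.created_at <= wt1.created_at
        GROUP BY wt1.user_id, wt1.created_at
        HAVING SUM(wt2.duration) >= 120
    ) AS reached
    GROUP BY user_id
) AS v ON v.first_reached = d.created_at
GROUP BY d.created_at
ORDER BY d.created_at
"""


def unique_views_per_day():
    """Return (created_at, cnt) for every date, including dates with no views."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE watch_time (id INTEGER, user_id INTEGER, "
        "duration INTEGER, created_at TEXT, film_id INTEGER)"
    )
    conn.executemany("INSERT INTO watch_time VALUES (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for created_at, cnt in unique_views_per_day():
    print(created_at, cnt)
```

COUNT(v.user_id) counts only matched rows, so dates on which no user first crosses the threshold come back with 0. The per-day pre-aggregation matters: joining the raw rows would double-count users with several rows on the same date.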
First attempt
I understood that you wanted to count one view each time a user reaches 120 minutes within a single day.
First, I get the total watch time per user and date (subquery sq). Then a CASE expression yields 1 whenever a user's total for a date is at least 120 minutes, and I group by created_at (date) and count those ones:
SELECT created_at, COUNT(CASE WHEN total_duration >= 120 THEN 1 END) cnt
FROM (SELECT created_at, user_id, SUM(duration) AS total_duration
FROM watch_time
GROUP BY created_at, user_id) AS sq
GROUP BY created_at;
Output (with sample data from the question):
| created_at | cnt |
+------------+-----+
| 2021-10-15 | 0 |
| 2021-10-16 | 0 |
| 2021-10-17 | 1 |
| 2021-10-18 | 2 |
| 2021-10-19 | 0 |
table: t
+--------------+-----------+-----------+
| Id | price | Date |
+--------------+-----------+-----------+
| 1 | 30 | 2021-05-09|
| 1 | 24 | 2021-04-26|
| 1 | 33 | 2021-04-13|
| 2 | 36 | 2021-04-18|
| 3 | 15 | 2021-04-04|
| 3 | 33 | 2021-05-06|
| 4 | 46 | 2021-02-16|
+--------------+-----------+-----------+
I want to select the rows where Id is 1, 2, or 4 and get at most 2 rows for each Id, in descending date order.
+--------------+-----------+-----------+
| Id | price | Date |
+--------------+-----------+-----------+
| 1 | 30 | 2021-05-09|
| 1 | 24 | 2021-04-26|
| 2 | 36 | 2021-04-18|
| 4 | 46 | 2021-02-16|
+--------------+-----------+-----------+
Something like:
Select * from t where Id IN (1, 2, 4) order by Date desc limit 2;
But this limits the overall result fetched, not the number of rows per Id.
Use row_number():
select id, price, date
from (select t.*,
row_number() over (partition by id order by date desc) as seqnum
from t
where id in (1, 2, 4)
) t
where seqnum <= 2;
Probably the most efficient method is a correlated subquery:
select t.*
from t
where t.id in (1, 2, 4) and
t.date >= coalesce( (select t2.date
from t t2
where t2.id = t.id
order by t2.date desc
limit 1,1
), t.date
);
For performance, you want an index on (id, date). Also, this can return duplicates if there are multiple rows for a given id on the same date.
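As a quick check of the correlated-subquery approach, here is a small sketch run against the sample table. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, writes the offset as LIMIT 1 OFFSET 1 (accepted by both engines), and the helper name top_two_per_id is made up for the example.

```python
import sqlite3

# Sample rows from the question: (Id, price, Date)
ROWS = [
    (1, 30, "2021-05-09"),
    (1, 24, "2021-04-26"),
    (1, 33, "2021-04-13"),
    (2, 36, "2021-04-18"),
    (3, 15, "2021-04-04"),
    (3, 33, "2021-05-06"),
    (4, 46, "2021-02-16"),
]

QUERY = """
SELECT t.id, t.price, t.date
FROM t
WHERE t.id IN (1, 2, 4)
  AND t.date >= COALESCE(
        -- second-latest date for this id; NULL when the id has a single row
        (SELECT t2.date
         FROM t AS t2
         WHERE t2.id = t.id
         ORDER BY t2.date DESC
         LIMIT 1 OFFSET 1),
        t.date)
ORDER BY t.id, t.date DESC
"""


def top_two_per_id():
    """Return at most the two most recent rows for ids 1, 2 and 4."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, price INTEGER, date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in top_two_per_id():
    print(row)
```

For an id with only one row the subquery returns NULL, and COALESCE falls back to the row's own date so that the row is kept.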
Here is a db<>fiddle.
I want to calculate count of order status changes within different states.
My Orderstatus table:
| id |ordr_id| status |
|----|-------|------------|
| 1 | 1 | pending |
| 2 | 1 | processing |
| 3 | 1 | complete |
| 4 | 2 | pending |
| 5 | 2 | cancelled |
| 6 | 3 | processing |
| 7 | 3 | complete |
| 8 | 4 | pending |
| 9 | 4 | processing |
Output I want:
| state | count |
|----------------------|-------|
| pending->processing | 2 |
| processing->complete | 2 |
| pending->cancelled | 1 |
Currently I fetch the results with SELECT order_id, GROUP_CONCAT(status) AS track FROM table GROUP BY order_id and then process the data in PHP to get the output. Is it possible to do this in the query itself?
Use lag():
select prev_status, status, count(*)
from (select t.*,
lag(status) over (partition by order_id order by id) as prev_status
from t
) t
group by prev_status, status;
LAG() is available in MySQL starting with version 8.
Note that you can filter out the first status for each order by putting where prev_status is not null in the outer query.
Your version is not quite correct, because it does not enforce the ordering. It should be:
SELECT order_id,
GROUP_CONCAT(status ORDER BY id) as track
EDIT:
In earlier versions of MySQL, you can use a correlated subquery:
select prev_status, status, count(*)
from (select t.*,
(select t2.status
from t t2
where t2.order_id = t.order_id and t2.id < t.id
order by t2.id desc
limit 1
) as prev_status
from t
) t
group by prev_status, status;
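The correlated-subquery version can be tried on the sample data with a short sketch like the one below. The assumptions are mine: Python's built-in sqlite3 module instead of MySQL, a table called order_status, the column name ordr_id as printed in the question's table, and a hypothetical helper count_transitions. The first status of each order is filtered out with prev_status IS NOT NULL.

```python
import sqlite3

# Sample rows from the question: (id, ordr_id, status)
ROWS = [
    (1, 1, "pending"),
    (2, 1, "processing"),
    (3, 1, "complete"),
    (4, 2, "pending"),
    (5, 2, "cancelled"),
    (6, 3, "processing"),
    (7, 3, "complete"),
    (8, 4, "pending"),
    (9, 4, "processing"),
]

QUERY = """
SELECT prev_status, status, COUNT(*) AS cnt
FROM (
    SELECT s.status AS status,
           -- status of the closest earlier row of the same order
           (SELECT s2.status
            FROM order_status AS s2
            WHERE s2.ordr_id = s.ordr_id AND s2.id < s.id
            ORDER BY s2.id DESC
            LIMIT 1) AS prev_status
    FROM order_status AS s
) AS x
WHERE prev_status IS NOT NULL
GROUP BY prev_status, status
ORDER BY prev_status, status
"""


def count_transitions():
    """Return a dict mapping 'from->to' to the number of such status changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE order_status (id INTEGER, ordr_id INTEGER, status TEXT)")
    conn.executemany("INSERT INTO order_status VALUES (?, ?, ?)", ROWS)
    result = {f"{prev}->{cur}": cnt for prev, cur, cnt in conn.execute(QUERY)}
    conn.close()
    return result


print(count_transitions())
```

The two status columns are joined into a label in Python rather than in SQL only to keep the query free of engine-specific string functions; in MySQL, CONCAT(prev_status, '->', status) would do the same inside the query.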
If the id column reflects the order of the records, you can use a self join to get what you need:
SELECT CONCAT(A.status, '->', B.status) AS state, COUNT(*) AS cnt
FROM OrderStatus A
INNER JOIN OrderStatus B
ON A.id = B.id - 1
AND A.ordr_id = B.ordr_id
WHERE B.status IS NOT NULL
GROUP BY CONCAT(A.status, '->', B.status)
With a join of the 3 status change types to the grouping of the table that you already did:
select c.changetype, count(*) counter
from (
select 'pending->processing' changetype union all
select 'processing->complete' union all
select 'pending->cancelled'
) c inner join (
select
group_concat(status order by id separator '->') changestatus
from tablename
group by ordr_id
) t on concat('->', t.changestatus, '->') like concat('%->', changetype, '->%')
group by c.changetype
See the demo.
Results:
> changetype | counter
> :------------------- | ------:
> pending->cancelled | 1
> pending->processing | 2
> processing->complete | 2
...or just a simple join...
SELECT CONCAT(a.status,'->',b.status) action
, COUNT(*) total
FROM my_table a
JOIN my_table b
ON b.ordr_id = a.ordr_id
AND b.id = a.id + 1
GROUP
BY action;
+----------------------+-------+
| action | total |
+----------------------+-------+
| pending->cancelled | 1 |
| pending->processing | 2 |
| processing->complete | 2 |
+----------------------+-------+
Note that this relies on the fact that ids are contiguous.
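To see why contiguity matters, the sketch below compares the id + 1 join with a "has an earlier row in the same order" check on data where two orders are interleaved. Everything here is illustrative: it uses Python's built-in sqlite3 module, a hypothetical table order_status, made-up interleaved rows, and helper names of my own.

```python
import sqlite3

# Rows from the question, where each order's ids happen to be contiguous
CONTIGUOUS = [
    (1, 1, "pending"), (2, 1, "processing"), (3, 1, "complete"),
    (4, 2, "pending"), (5, 2, "cancelled"),
    (6, 3, "processing"), (7, 3, "complete"),
    (8, 4, "pending"), (9, 4, "processing"),
]

# Hypothetical data: two orders whose rows were inserted alternately
INTERLEAVED = [
    (1, 1, "pending"),
    (2, 2, "pending"),
    (3, 1, "processing"),
    (4, 2, "cancelled"),
]

ADJACENT_JOIN = """
SELECT COUNT(*)
FROM order_status AS a
JOIN order_status AS b
  ON b.ordr_id = a.ordr_id AND b.id = a.id + 1
"""

HAS_PREDECESSOR = """
SELECT COUNT(*)
FROM order_status AS s
WHERE EXISTS (SELECT 1
              FROM order_status AS s2
              WHERE s2.ordr_id = s.ordr_id AND s2.id < s.id)
"""


def _count(query, rows):
    """Load `rows` into a fresh in-memory table and return the query's single count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE order_status (id INTEGER, ordr_id INTEGER, status TEXT)")
    conn.executemany("INSERT INTO order_status VALUES (?, ?, ?)", rows)
    (count,) = conn.execute(query).fetchone()
    conn.close()
    return count


def adjacent_join_transitions(rows):
    return _count(ADJACENT_JOIN, rows)


def predecessor_transitions(rows):
    return _count(HAS_PREDECESSOR, rows)


print(adjacent_join_transitions(CONTIGUOUS), predecessor_transitions(CONTIGUOUS))
print(adjacent_join_transitions(INTERLEAVED), predecessor_transitions(INTERLEAVED))
```

On the question's data both counts agree, but on the interleaved rows the id + 1 join finds no transitions at all, while every non-first row of an order still has a predecessor.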
I would like to select the "top most" entry for each row with a duplicated column value.
Performing the following query -
SELECT *
FROM shop
ORDER BY shop.start_date DESC, shop.created_date DESC;
I get the result set -
+--------+---------+------------+--------------+
| row_id | shop_id | start_date | created_date |
+--------+---------+------------+--------------+
| 1 | 1 | 2017-02-01 | 2017-01-01 |
| 2 | 1 | 2017-01-01 | 2017-02-01 |
| 3 | 2 | 2017-01-01 | 2017-07-01 |
| 4 | 2 | 2017-01-01 | 2017-01-01 |
+--------+---------+------------+--------------+
Can I modify the SELECT so that I only get back the "top rows" for each unique shop_id (in this case, row_ids 1 and 3)? There can be 1..n rows with the same shop_id.
Similarly, if my query above returned the following order, I'd want to SELECT only row_ids 1 and 4, since those would be the "top most" entries for each shop_id.
+--------+---------+------------+--------------+
| row_id | shop_id | start_date | created_date |
+--------+---------+------------+--------------+
| 1 | 1 | 2017-02-01 | 2017-01-01 |
| 2 | 1 | 2017-01-01 | 2017-02-01 |
| 4 | 2 | 2017-01-01 | 2017-07-01 |
| 3 | 2 | 2017-01-01 | 2017-01-01 |
+--------+---------+------------+--------------+
You can do this by using a subquery:
select s.*
from shop s
where s.row_id = (
select row_id
from shop
where shop_id = s.shop_id
order by start_date desc, created_date desc
limit 1
)
Note that this query example assumes row_id is unique for each shop_id.
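The subquery approach can be exercised on both result sets from the question with a sketch along these lines. It assumes Python's built-in sqlite3 module in place of MySQL, and the helper name top_row_ids is hypothetical.

```python
import sqlite3

# (row_id, shop_id, start_date, created_date), as in the first result set
FIRST = [
    (1, 1, "2017-02-01", "2017-01-01"),
    (2, 1, "2017-01-01", "2017-02-01"),
    (3, 2, "2017-01-01", "2017-07-01"),
    (4, 2, "2017-01-01", "2017-01-01"),
]

# Second result set: row_ids 3 and 4 have swapped created_date values
SECOND = [
    (1, 1, "2017-02-01", "2017-01-01"),
    (2, 1, "2017-01-01", "2017-02-01"),
    (4, 2, "2017-01-01", "2017-07-01"),
    (3, 2, "2017-01-01", "2017-01-01"),
]

QUERY = """
SELECT s.row_id
FROM shop AS s
WHERE s.row_id = (
    -- the row that sorts first within this shop_id
    SELECT s2.row_id
    FROM shop AS s2
    WHERE s2.shop_id = s.shop_id
    ORDER BY s2.start_date DESC, s2.created_date DESC
    LIMIT 1
)
ORDER BY s.shop_id
"""


def top_row_ids(rows):
    """Return the row_id of the top-most row for each shop_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shop "
        "(row_id INTEGER, shop_id INTEGER, start_date TEXT, created_date TEXT)"
    )
    conn.executemany("INSERT INTO shop VALUES (?, ?, ?, ?)", rows)
    result = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return result


print(top_row_ids(FIRST))   # [1, 3]
print(top_row_ids(SECOND))  # [1, 4]
```

Because the ordering inside the subquery is by the dates rather than by row_id, the second data set correctly yields row 4 for shop 2.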
Demonstration
Or like this:
select t.*
from shop t
join (
select t2.shop_id, t2.start_date, max(t2.created_date) as created_date
from shop t2
join (
select max(start_date) as start_date, shop_id
from shop
group by shop_id
) t3 on t3.shop_id = t2.shop_id and t3.start_date = t2.start_date
group by t2.shop_id, t2.start_date
) t1 on t1.shop_id = t.shop_id and t.start_date = t1.start_date and t.created_date = t1.created_date
Note that if there can be records with the same start_date and created_date for the same shop_id, you would need another GROUP BY t.shop_id, t.start_date, t.created_date in the outer query, selecting min(row_id) along with the grouped columns.
Demonstration
Try joining to a subquery which finds the "top" rows for each shop_id:
SELECT t1.*
FROM shop t1
INNER JOIN
(
SELECT shop_id, MIN(row_id) AS min_id
FROM shop
GROUP BY shop_id
) t2
ON t1.shop_id = t2.shop_id AND
t1.row_id = t2.min_id
ORDER BY
t1.start_date DESC,
t1.created_date DESC;
Demo
I have a table like this:
+----+---------+---------------------+
| id | user_id | start_date |
+----+---------+---------------------+
| 1 | 1 | 2014-02-01 00:00:00 |
| 2 | 1 | 2014-01-01 00:00:00 |
| 3 | 2 | 2014-01-01 00:00:00 |
| 4 | 2 | 2014-01-01 00:00:00 |
| 5 | 3 | 2015-01-01 00:00:00 |
+----+---------+---------------------+
How can I select all rows that, for each user, have:
a start date before NOW(), and
the maximum start_date?
For the sample rows, the output should be:
+----+---------+---------------------+
| id | user_id | start_date |
+----+---------+---------------------+
| 1 | 1 | 2014-02-01 00:00:00 | // this is a single maximum date within that user
| 3 | 2 | 2014-01-01 00:00:00 | // these two share maximum start date
| 4 | 2 | 2014-01-01 00:00:00 |
+----+---------+---------------------+
What I have so far is something like this:
SELECT t.* FROM ticket t
JOIN (
SELECT start_date, MAX(start_date) FROM ticket /* GROUP BY user_id */
) highest
ON t.start_date = highest.start_date
WHERE t.start_date <= NOW();
But this doesn't work as desired. Am I on the right track?
You're on the right track, sort of.
In your derived table, you need to get the max date for each user id, so:
SELECT user_id,
MAX(start_date) as MaxDate
FROM ticket
GROUP BY user_id
Then you can join to that on start date and user id:
SELECT t.* FROM ticket t
JOIN (
SELECT user_id,
MAX(start_date) as MaxDate
FROM ticket
GROUP BY user_id
) highest
ON t.start_date = highest.maxdate
and t.user_id = highest.user_id
WHERE t.start_date <= NOW();
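One detail worth checking is where the cutoff is applied. The sketch below is my own variation, not the answerer's exact query: it runs on Python's built-in sqlite3 module, replaces NOW() with an explicit cutoff parameter so the result is reproducible, and applies the cutoff inside the derived table. That way a user whose newest row lies after the cutoff is still matched on their latest earlier row, whereas filtering only after the join would drop such a user. The helper name latest_tickets is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, user_id, start_date)
ROWS = [
    (1, 1, "2014-02-01 00:00:00"),
    (2, 1, "2014-01-01 00:00:00"),
    (3, 2, "2014-01-01 00:00:00"),
    (4, 2, "2014-01-01 00:00:00"),
    (5, 3, "2015-01-01 00:00:00"),
]

QUERY = """
SELECT t.id
FROM ticket AS t
JOIN (
    -- latest start_date per user, considering only rows up to the cutoff
    SELECT user_id, MAX(start_date) AS max_date
    FROM ticket
    WHERE start_date <= ?
    GROUP BY user_id
) AS highest
  ON t.user_id = highest.user_id AND t.start_date = highest.max_date
ORDER BY t.id
"""


def latest_tickets(cutoff):
    """Return ids of each user's rows with the greatest start_date <= cutoff."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ticket (id INTEGER, user_id INTEGER, start_date TEXT)")
    conn.executemany("INSERT INTO ticket VALUES (?, ?, ?)", ROWS)
    result = [row[0] for row in conn.execute(QUERY, (cutoff,))]
    conn.close()
    return result


print(latest_tickets("2014-06-01 00:00:00"))  # [1, 3, 4]
print(latest_tickets("2014-01-15 00:00:00"))  # [2, 3, 4]
```

With the mid-2014 cutoff the output matches the expected rows from the question (ids 1, 3 and 4), and ties such as ids 3 and 4 are both returned.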
SQL Fiddle
Try this:
SELECT T.* FROM ticket AS T
JOIN (SELECT
User_Id
,MAX(Start_Date) AS Start_Date
FROM ticket
WHERE Start_Date <= NOW()
GROUP BY User_Id) AS Grouped ON T.User_Id = Grouped.User_Id AND T.Start_Date = Grouped.Start_Date
ORDER BY T.Id