Here is a table of users, carts, and times.
A user can have multiple carts. All anonymous users have userId = 1000; every identified user has an ID other than 1000.
All cartIds are unique.
+------------+-------------+----------------------+
| userId | cartId | time |
+------------+-------------+----------------------+
| 7650 | 231 | 2014-08-27 13:41:02 |
+------------+-------------+----------------------+
| 7632 | 221 | 2014-08-27 13:42:02 |
+------------+-------------+----------------------+
| 7650 | 289 | 2014-08-27 14:13:02 |
+------------+-------------+----------------------+
| 1000 | 321 | 2014-08-27 14:41:02 |
+------------+-------------+----------------------+
| 7650 | 500 | 2014-08-27 17:41:02 |
+------------+-------------+----------------------+
I am interested in calculating the number of distinct identified users by the hour of the day.
I tried the following, but when I group by HOUR(time) it cannot keep track of the IDs that already appeared in earlier hours:
COUNT( distinct (case when userId <> 1000 then userId end)) as numSELFIDUsers
For the output, I want something like:
+------------+-------------+----------------------+
| Date | HourOfDay | numSELFIDUsers |
+------------+-------------+----------------------+
| 2014-08-27 | 13 | 2 |
+------------+-------------+----------------------+
| 2014-08-27 | 14 | 0 |
+------------+-------------+----------------------+
| 2014-08-27 | 17 | 0 |
+------------+-------------+----------------------+
Please let me know if there are any questions.
Thanks in advance for the help.
I think you want something like this:
select date(time), hour(time),
COUNT(distinct case when userId <> 1000 then userId end) as numSELFIDUsers
from usercarts
where date(time) = '2014-08-27'
group by date(time), hour(time)
order by 1, 2;
This looks similar to what you have in the query. I'm not sure why your version wouldn't work.
EDIT:
You also seem to want times with 0 counts. This is a bit more challenging, but you can do it like this:
select d.dt, h.hr, COUNT(distinct case when uc.userId <> 1000 then uc.userId end) as numSELFIDUsers
from (select distinct date(time) dt from usercarts where date(time) IN YOUR RANGE) d cross join
(select 0 as hr union all select 1 union all select 2 union all select 3 union all . . .
select 23
) h left join
usercarts uc
on date(uc.time) = d.dt and hour(uc.time) = h.hr
group by d.dt, h.hr
order by 1, 2;
The . . . is where you put in the rest of the numbers, from 4 to 22.
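To sanity-check the zero-filled query, here is a minimal Python sketch that runs an equivalent query on the question's sample rows in an in-memory SQLite database. SQLite is only a stand-in for MySQL, so the dialect differs: `strftime('%H', ...)` plays the role of `HOUR()`, and a recursive CTE (a swapped-in technique) generates the hours 0 to 23 instead of the `UNION ALL` list. The names `ROWS`, `QUERY` and `hourly_identified_users` are illustrative.

```python
import sqlite3

# Sample rows from the question: (userId, cartId, time).
ROWS = [
    (7650, 231, "2014-08-27 13:41:02"),
    (7632, 221, "2014-08-27 13:42:02"),
    (7650, 289, "2014-08-27 14:13:02"),
    (1000, 321, "2014-08-27 14:41:02"),
    (7650, 500, "2014-08-27 17:41:02"),
]

# SQLite dialect: strftime('%H', ...) replaces MySQL's HOUR(), and a
# recursive CTE generates hours 0..23 instead of the UNION ALL list.
QUERY = """
WITH RECURSIVE hours(hr) AS (
    SELECT 0 UNION ALL SELECT hr + 1 FROM hours WHERE hr < 23
),
days AS (SELECT DISTINCT date(time) AS dt FROM usercarts)
SELECT d.dt, h.hr,
       COUNT(DISTINCT CASE WHEN uc.userId <> 1000 THEN uc.userId END) AS numSELFIDUsers
FROM days AS d
CROSS JOIN hours AS h
LEFT JOIN usercarts AS uc                  -- LEFT JOIN keeps hours with no carts
       ON date(uc.time) = d.dt
      AND CAST(strftime('%H', uc.time) AS INTEGER) = h.hr
GROUP BY d.dt, h.hr
ORDER BY d.dt, h.hr
"""


def hourly_identified_users(rows=ROWS):
    """Return (date, hour, distinct identified users) for all 24 hours of each day."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE usercarts (userId INTEGER, cartId INTEGER, time TEXT)")
    con.executemany("INSERT INTO usercarts VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for dt, hr, n in hourly_identified_users():
        print(dt, hr, n)
```

With this data, hour 13 comes out as 2 and hours 14 and 17 as 1, because each hour counts its own distinct identified users. The 0s the question expects for hours 14 and 17 only appear when users are counted in the hour they first show up, as in EDIT II.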
EDIT II:
I suspect that you are actually looking for the first time a user appears. If so, try this:
select date(firsttime), hour(firsttime), count(*) as NumFirstUsers
from (select userId, min(time) as firsttime
from usercarts
where userid <> 1000
group by userId
) u
group by date(firsttime), hour(firsttime)
order by 1, 2;
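A sketch, again in Python with SQLite standing in for MySQL, that combines the first-appearance idea with zero-filling: the (date, hour) pairs come from the data itself (every hour that has at least one cart), and each identified user is counted only in the hour of their first cart. With the question's sample rows this reproduces the desired output. `first_time_users_by_hour` is a hypothetical helper name, and `strftime('%H', ...)` replaces MySQL's `HOUR()`.

```python
import sqlite3

# Sample rows from the question: (userId, cartId, time).
ROWS = [
    (7650, 231, "2014-08-27 13:41:02"),
    (7632, 221, "2014-08-27 13:42:02"),
    (7650, 289, "2014-08-27 14:13:02"),
    (1000, 321, "2014-08-27 14:41:02"),
    (7650, 500, "2014-08-27 17:41:02"),
]

QUERY = """
SELECT h.dt, h.hr, COUNT(f.userId) AS numSELFIDUsers
FROM (SELECT DISTINCT date(time) AS dt,               -- every (date, hour) in the data
                      CAST(strftime('%H', time) AS INTEGER) AS hr
      FROM usercarts) AS h
LEFT JOIN (SELECT userId, MIN(time) AS firsttime      -- first cart of each identified user
           FROM usercarts
           WHERE userId <> 1000
           GROUP BY userId) AS f
       ON date(f.firsttime) = h.dt
      AND CAST(strftime('%H', f.firsttime) AS INTEGER) = h.hr
GROUP BY h.dt, h.hr
ORDER BY h.dt, h.hr
"""


def first_time_users_by_hour(rows=ROWS):
    """Return (date, hour, identified users first seen in that hour)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE usercarts (userId INTEGER, cartId INTEGER, time TEXT)")
    con.executemany("INSERT INTO usercarts VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in first_time_users_by_hour():
        print(*row)
```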
Related
I have a table where I track the duration of watched films by a user for each day.
Now I would like to calculate a unique view count based on date.
So the conditions are:
For each user max view count is 1
View = 1 if one user's SUM(duration) >= 120
Date should be fixed once SUM(duration) reaches 120
The problem is getting the correct date. For example, row1.duration + row2.duration >= 120, so the view (count = 1) should be attributed to 2021-10-16.
| id | user_id | duration | created_at | film_id |
+----+---------+----------+------------+---------+
| 1 | 1 | 80 | 2021-10-15 | 1 |
| 2 | 1 | 70 | 2021-10-16 | 1 |
| 3 | 1 | 200 | 2021-10-17 | 2 |
| 4 | 2 | 50 | 2021-10-18 | 1 |
| 5 | 2 | 90 | 2021-10-18 | 1 |
| 6 | 3 | 140 | 2021-10-18 | 2 |
| 7 | 4 | 10 | 2021-10-19 | 3 |
Expected result:
| cnt | created_at |
+-------+------------+
| 0 | 2021-10-15 |
| 1 | 2021-10-16 |
| 0 | 2021-10-17 |
| 2 | 2021-10-18 |
| 0 | 2021-10-19 |
This is what I tried, but it picks the first date and omits dates with a 0 count.
Here is the fiddle with populated data
SELECT count(*) AS cnt,
created_at
FROM
(SELECT user_id,
sum(duration) AS total,
created_at
FROM watch_time
GROUP BY user_id) AS t
WHERE t.total >= 120
GROUP BY created_at;
Is there any way to make this work in SQL, or should it be done at the application level?
Thanks in advance!
Update:
Version: AWS RDS MySQL 5.7.33
But I'm OK switching to Postgres if that helps.
I would also appreciate a way to get MIN(date) while still listing all dates (including those with 0 views), something better than this:
SELECT IFNULL(cnt, 0) as cnt,
t3.created_at
FROM
(SELECT count(*) AS cnt,
created_at
FROM
(SELECT user_id,
sum(duration) AS total,
created_at
FROM watch_time
GROUP BY user_id) AS t
WHERE t.total >= 120
GROUP BY created_at) AS t2
RIGHT JOIN
(SELECT distinct(created_at)
FROM watch_time) AS t3
ON t2.created_at = t3.created_at;
which returns:
| cnt | created_at |
+-------+------------+
| 1 | 2021-10-15 |
| 0 | 2021-10-16 |
| 0 | 2021-10-17 |
| 2 | 2021-10-18 |
| 0 | 2021-10-19 |
But I'm not sure whether the date (2021-10-15) is picked arbitrarily or is always the lowest date.
Update 2:
Is it possible to include film_id as well, treating each (user_id, film_id) pair as a unique view instead of grouping only by user_id?
So in this case:
row1 and row2 both have user_id: 1 and film_id: 1, and together they count as 1 view because the sum of their durations is >= 120, so the date in this case is 2021-10-16.
But row3 has user_id: 1 and film_id: 2, and since its duration is >= 120 it is also 1 view, dated 2021-10-17.
| id | user_id | duration | created_at | film_id |
+----+---------+----------+------------+---------+
| 1 | 1 | 80 | 2021-10-15 | 1 |
| 2 | 1 | 70 | 2021-10-16 | 1 |
| 3 | 1 | 200 | 2021-10-17 | 2 |
| 4 | 2 | 50 | 2021-10-18 | 1 |
| 5 | 2 | 90 | 2021-10-18 | 1 |
| 6 | 3 | 140 | 2021-10-18 | 2 |
| 7 | 4 | 10 | 2021-10-19 | 3 |
Expected result:
| cnt | created_at |
+-------+------------+
| 0 | 2021-10-15 |
| 1 | 2021-10-16 |
| 1 | 2021-10-17 |
| 2 | 2021-10-18 |
| 0 | 2021-10-19 |
Using MySQL user variables, you can implement your counting logic: the query orders the table rows by user_id and created_at and accumulates the duration row by row.
http://sqlfiddle.com/#!9/569088/14
SELECT created_at, SUM(CASE WHEN duration >= 120 THEN 1 ELSE 0 END) counts
FROM (
SELECT user_id, created_at,
CASE WHEN @UID != user_id THEN @SUM_TIME := 0 WHEN @SUM_TIME >= 120 AND @DT != created_at THEN @SUM_TIME := 0 - duration ELSE 0 END SX,
@SUM_TIME := @SUM_TIME + duration AS duration,
@UID := user_id,
@DT := created_at
FROM watch_time
JOIN ( SELECT @SUM_TIME :=0, @DT := NOW(), @UID := '' ) t
ORDER BY user_id, created_at
) f
GROUP BY created_at
I think I misunderstood the requirement in my first attempt.
Second attempt
MySQL >= 8.0 (or PostgreSQL), using window functions
I know you are working with MySQL 5.7; I add an answer for it below.
I am not sure I understand your requirement correctly. Do you want the cumulative watch time per user, counting one view on the day a user's total first exceeds 119 minutes?
First, I compute the cumulative sum per user, ordered by date (subquery cte). In subquery cte1, a CASE expression sets view_flag to 1 the first time a user reaches 120 minutes (the name view_flag is used because VIEW is a reserved word in MySQL). Finally, I group by created_at (date) and count the 1s in view_flag:
WITH cte AS (SELECT *, SUM(duration) OVER (PARTITION BY user_id ORDER BY created_at ASC, film_id) as cum_duration
FROM watch_time),
cte1 AS (SELECT *, CASE WHEN cum_duration >= 120 AND COALESCE(LAG(cum_duration) OVER (PARTITION BY user_id ORDER BY created_at ASC, film_id), 0) < 120 THEN 1 END AS view_flag
FROM cte)
SELECT created_at, COUNT(view_flag) AS cnt
FROM cte1
GROUP BY created_at
ORDER BY created_at;
| created_at | cnt |
+------------+-----+
| 2021-10-15 | 0 |
| 2021-10-16 | 1 |
| 2021-10-17 | 0 |
| 2021-10-18 | 2 |
| 2021-10-19 | 0 |
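Update 2 of the question asks to count each (user_id, film_id) pair as its own view, which the query above does not do, since it partitions by user_id only. A hedged sketch of that variant, run in SQLite from Python (window functions need SQLite 3.25 or later; the SQL itself should also be accepted by MySQL 8.0 and PostgreSQL): it partitions by both columns and, instead of LAG, flags the row whose running total reaches 120 while the running total before it (cum_duration - duration) was still below 120. Using id as a tie-breaker within a day is an assumption to make the order deterministic, and `views_per_day` is an illustrative name.

```python
import sqlite3

# Sample rows from the question: (id, user_id, duration, created_at, film_id).
ROWS = [
    (1, 1, 80, "2021-10-15", 1),
    (2, 1, 70, "2021-10-16", 1),
    (3, 1, 200, "2021-10-17", 2),
    (4, 2, 50, "2021-10-18", 1),
    (5, 2, 90, "2021-10-18", 1),
    (6, 3, 140, "2021-10-18", 2),
    (7, 4, 10, "2021-10-19", 3),
]

QUERY = """
WITH running AS (
    SELECT created_at, duration,
           -- running total per (user, film); id breaks ties within a day
           SUM(duration) OVER (PARTITION BY user_id, film_id
                               ORDER BY created_at, id
                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_duration
    FROM watch_time
)
SELECT created_at,
       -- 1 only on the row where the running total first reaches 120
       SUM(CASE WHEN cum_duration >= 120 AND cum_duration - duration < 120
                THEN 1 ELSE 0 END) AS cnt
FROM running
GROUP BY created_at
ORDER BY created_at
"""


def views_per_day(rows=ROWS):
    """Return (created_at, number of (user, film) views completed that day)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE watch_time (id INTEGER PRIMARY KEY, user_id INTEGER, "
                "duration INTEGER, created_at TEXT, film_id INTEGER)")
    con.executemany("INSERT INTO watch_time VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in views_per_day():
        print(*row)
```

On the sample data this gives 0, 1, 1, 2, 0 for 2021-10-15 through 2021-10-19, matching the expected result in Update 2. Because every row is kept until the final GROUP BY, dates on which nobody crossed 120 still appear with 0.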
MySQL 5.7
I get the cumulative sum for each user and filter cumulative duration >= 120, then I group by user_id and get MIN(created_at). Finally I group by min_created_at and count records.
SELECT min_created_at AS date, count(*) AS cnt
FROM (SELECT user_id, MIN(created_at) AS min_created_at
FROM (SELECT wt1.user_id, wt1.created_at, SUM(wt2.duration) AS cum_duration
FROM (SELECT user_id, created_at, SUM(duration) AS duration FROM watch_time GROUP BY user_id, created_at) wt1
INNER JOIN (SELECT user_id, created_at, SUM(duration) AS duration FROM watch_time GROUP BY user_id, created_at) wt2 ON wt1.user_id = wt2.user_id AND wt1.created_at >= wt2.created_at
GROUP BY wt1.user_id, wt1.created_at
HAVING SUM(wt2.duration) >= 120) AS sq
GROUP BY user_id) AS sq2
GROUP BY min_created_at;
| date | cnt |
+------------+-----+
| 2021-10-16 | 1 |
| 2021-10-18 | 2 |
You can RIGHT JOIN my query with the distinct dates of the original table (GROUP BY created_at) to get the remaining dates with a count of 0.
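One way to do that join, sketched in Python against SQLite as a stand-in for MySQL 5.7 (no window functions or CTEs are used): the derived table `f` is the inner part of the 5.7 query above, and it is outer-joined to the distinct dates of `watch_time`. It is written as a LEFT JOIN with the two sides swapped, which is equivalent to the RIGHT JOIN described and also runs on SQLite versions older than 3.39, where RIGHT JOIN is not available. `daily_first_views` is a hypothetical name.

```python
import sqlite3

# Sample rows from the question: (id, user_id, duration, created_at, film_id).
ROWS = [
    (1, 1, 80, "2021-10-15", 1),
    (2, 1, 70, "2021-10-16", 1),
    (3, 1, 200, "2021-10-17", 2),
    (4, 2, 50, "2021-10-18", 1),
    (5, 2, 90, "2021-10-18", 1),
    (6, 3, 140, "2021-10-18", 2),
    (7, 4, 10, "2021-10-19", 3),
]

QUERY = """
SELECT d.created_at, COUNT(f.user_id) AS cnt
FROM (SELECT DISTINCT created_at FROM watch_time) AS d      -- every date in the table
LEFT JOIN (
    -- first date on which each user's running total reaches 120
    SELECT user_id, MIN(created_at) AS min_created_at
    FROM (SELECT wt1.user_id, wt1.created_at
          FROM (SELECT user_id, created_at, SUM(duration) AS duration
                FROM watch_time GROUP BY user_id, created_at) AS wt1
          JOIN (SELECT user_id, created_at, SUM(duration) AS duration
                FROM watch_time GROUP BY user_id, created_at) AS wt2
            ON wt1.user_id = wt2.user_id AND wt1.created_at >= wt2.created_at
          GROUP BY wt1.user_id, wt1.created_at
          HAVING SUM(wt2.duration) >= 120) AS sq
    GROUP BY user_id
) AS f ON f.min_created_at = d.created_at
GROUP BY d.created_at
ORDER BY d.created_at
"""


def daily_first_views(rows=ROWS):
    """Return (created_at, users whose total first reached 120 that day)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE watch_time (id INTEGER PRIMARY KEY, user_id INTEGER, "
                "duration INTEGER, created_at TEXT, film_id INTEGER)")
    con.executemany("INSERT INTO watch_time VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in daily_first_views():
        print(*row)
```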
First attempt
I understood that you want to count one view each time a user reaches 120 minutes within a single day.
First, I get the total watch time per user and date (subquery sq). Then, with a CASE expression, I mark each user and date whose total exceeds 119 minutes, group by created_at (date), and count the marked rows:
SELECT created_at, COUNT(CASE WHEN total_duration >= 120 THEN 1 END) cnt
FROM (SELECT created_at, user_id, SUM(duration) AS total_duration
FROM watch_time
GROUP BY created_at, user_id) AS sq
GROUP BY created_at;
Output (with sample data from the question):
| created_at | cnt |
+------------+-----+
| 2021-10-15 | 0 |
| 2021-10-16 | 0 |
| 2021-10-17 | 1 |
| 2021-10-18 | 2 |
| 2021-10-19 | 0 |
Suppose that we have a table to log users' weights and other info as follows:
health_indexes
id | user_id | weight | created_at
---+---------+--------+-----------
1 | 50 | 100 | 2020-01-01
2 | 50 | 98 | 2020-01-05
3 | 50 | 98.5 | 2020-01-10
4 | 50 | 92 | 2020-01-15
5 | 50 | 80 | 2020-01-20
.
.
.
10 | 100 | 130 | 2018-01-01
11 | 100 | 149999 | 2018-01-05
12 | 100 | 159999 | 2018-01-10
13 | 100 | 120 | 2018-01-15
.
.
.
20 | 200 | 87 | 2020-02-01
.
.
.
30 | 300 | 140 | 2020-01-01
I can get the following table, but I'm looking for a better way:
user_id | first_weight | first_created_at | last_weight | last_created_at
--------+--------------+------------------+-------------+----------------
50 | 100 | 2020-01-01 | 80 | 2020-01-20
100 | 130 | 2018-01-01 | 120 | 2018-01-15
Query:
select u.id user_id,
(select weight from health_indexes where user_id = u.id order by created_at limit 1) first_weight,
(select created_at from health_indexes where user_id = u.id order by created_at limit 1) first_created_at,
(select weight from health_indexes where user_id = u.id order by created_at desc limit 1) last_weight,
(select created_at from health_indexes where user_id = u.id order by created_at desc limit 1) last_created_at
from users u
group by u.id
having first_weight > last_weight
order by (first_weight - last_weight) desc
limit 50;
I'm looking for a way to JOIN twice on health_indexes to get the same result. Any ideas?
If you are using MySQL 8.0, you can do this with window functions only, without any join. This is one of the rare cases where DISTINCT can be usefully combined with window functions:
select distinct
user_id,
first_value(weight) over(partition by user_id order by created_at) first_weight,
min(created_at) over(partition by user_id) first_created_at,
first_value(weight) over(partition by user_id order by created_at desc) last_weight,
max(created_at) over(partition by user_id) last_created_at
from health_indexes
In earlier versions, one option uses joins and filtering:
select
hi1.user_id,
hi1.weight first_weight,
hi1.created_at first_created_at,
hi2.weight last_weight,
hi2.created_at last_created_at
from health_indexes hi1
inner join health_indexes hi2 on hi2.user_id = hi1.user_id
where
hi1.created_at = (select min(hi3.created_at) from health_indexes hi3 where hi3.user_id = hi1.user_id)
and hi2.created_at = (select max(hi3.created_at) from health_indexes hi3 where hi3.user_id = hi2.user_id)
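A minimal sketch of the join-based version on the rows visible in the question (the elided "." rows are left out), using SQLite from Python as a stand-in for MySQL 5.7. It also applies the question's own filter and ordering (first_weight > last_weight, largest loss first, LIMIT 50) so the result can be compared with the expected table. The helper name `weight_loss_report` is illustrative. Note that if a user has two rows sharing the earliest or latest created_at, this join returns more than one row for that user.

```python
import sqlite3

# Only the rows shown in the question: (id, user_id, weight, created_at).
ROWS = [
    (1, 50, 100, "2020-01-01"),
    (2, 50, 98, "2020-01-05"),
    (3, 50, 98.5, "2020-01-10"),
    (4, 50, 92, "2020-01-15"),
    (5, 50, 80, "2020-01-20"),
    (10, 100, 130, "2018-01-01"),
    (11, 100, 149999, "2018-01-05"),
    (12, 100, 159999, "2018-01-10"),
    (13, 100, 120, "2018-01-15"),
    (20, 200, 87, "2020-02-01"),
    (30, 300, 140, "2020-01-01"),
]

QUERY = """
SELECT hi1.user_id,
       hi1.weight     AS first_weight,
       hi1.created_at AS first_created_at,
       hi2.weight     AS last_weight,
       hi2.created_at AS last_created_at
FROM health_indexes AS hi1
JOIN health_indexes AS hi2 ON hi2.user_id = hi1.user_id
WHERE hi1.created_at = (SELECT MIN(hi3.created_at) FROM health_indexes AS hi3
                        WHERE hi3.user_id = hi1.user_id)   -- hi1: first row per user
  AND hi2.created_at = (SELECT MAX(hi3.created_at) FROM health_indexes AS hi3
                        WHERE hi3.user_id = hi2.user_id)   -- hi2: last row per user
  AND hi1.weight > hi2.weight          -- the question's HAVING first_weight > last_weight
ORDER BY hi1.weight - hi2.weight DESC
LIMIT 50
"""


def weight_loss_report(rows=ROWS):
    """Return first/last weight per user for users who lost weight."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE health_indexes (id INTEGER PRIMARY KEY, user_id INTEGER, "
                "weight REAL, created_at TEXT)")
    con.executemany("INSERT INTO health_indexes VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in weight_loss_report():
        print(*row)
```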
I have a lot of coupons. For each day, I would like to find out how many coupons from each campaign have been received by users. But I cannot figure out how to refer to the current row's assigned_date in each subquery:
SELECT count(id) as number_of_coupons,
DATE_FORMAT(assigned_date, '%d-%m-%Y') as date,
(SELECT COUNT(id) FROM coupon WHERE campaign_id = 1 AND assigned_date=THIS MUST BE SOMETHING) as campaign_1,
(SELECT COUNT(id) FROM coupon WHERE campaign_id = 2 AND assigned_date=THIS MUST BE SOMETHING) as campaign_2
FROM coupon
GROUP BY DATE_FORMAT(assigned_date, '%d-%m-%Y')
order by STR_TO_DATE(date, '%d-%m-%Y') DESC
The result should look something like this. How can I achieve it?
+-------------------+------------+-------------+-----------+
| number of coupons | date | campaign_1 | campaign_2 |
+-------------------+------------+-------------+-----------+
| 156 | 12-10-2019 | 6980 | 100 |
| 177 | 11-10-2019 | 6980 | 100 |
| 44 | 10-10-2019 | 6980 | 100 |
| 94 | 09-10-2019 | 6980 | 100 |
| 93 | 08-10-2019 | 6980 | 100 |
+-------------------+------------+-------------+-----------+
Not knowing what your data structure looks like, I can only speculate on what the solution should be. However, my guess is that a query such as the following is what you want:
SELECT COUNT(DISTINCT cv.id) as number_of_coupons,
DATE_FORMAT(cv.assigned_date, '%d-%m-%Y') as date,
SUM(CASE WHEN c.campaign_id = 1 THEN 1 ELSE 0 END) as campaign_1,
SUM(CASE WHEN c.campaign_id = 2 THEN 1 ELSE 0 END) as campaign_2
FROM coupon_vault cv LEFT JOIN
coupons c
ON cv.coupon_id = c.coupon_id
GROUP BY DATE_FORMAT(cv.assigned_date, '%d-%m-%Y')
ORDER BY MIN(cv.assigned_date);
It is quite possible that the COUNT(DISTINCT) is unnecessary and COUNT() would suffice.
Both Postgres and MySQL (your original tag) have reasonable alternatives for the SUM(CASE . . .). However, you have not specified your database, so I'm sticking with the code that works in both databases.
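To show the conditional-aggregation idea against the single `coupon` table from the question (columns id, campaign_id and assigned_date, as used in the question's query), here is a small Python sketch using SQLite. The sample rows are made up for illustration, and `strftime` stands in for MySQL's `DATE_FORMAT`. Grouping and ordering by the date itself, rather than by the formatted dd-mm-yyyy string, keeps the days in chronological order. `coupons_per_day` is a hypothetical name.

```python
import sqlite3

# Hypothetical sample rows (id, campaign_id, assigned_date); not from the question.
ROWS = [
    (1, 1, "2019-10-12 09:15:00"),
    (2, 1, "2019-10-12 10:30:00"),
    (3, 2, "2019-10-12 11:45:00"),
    (4, 2, "2019-10-11 08:00:00"),
    (5, 1, "2019-10-11 12:00:00"),
    (6, 3, "2019-10-11 13:00:00"),  # campaign 3: counted in the total only
]

QUERY = """
SELECT COUNT(id) AS number_of_coupons,
       strftime('%d-%m-%Y', date(assigned_date)) AS date,   -- DATE_FORMAT in MySQL
       SUM(CASE WHEN campaign_id = 1 THEN 1 ELSE 0 END) AS campaign_1,
       SUM(CASE WHEN campaign_id = 2 THEN 1 ELSE 0 END) AS campaign_2
FROM coupon
GROUP BY date(assigned_date)
ORDER BY date(assigned_date) DESC
"""


def coupons_per_day(rows=ROWS):
    """Return (total, dd-mm-yyyy, campaign 1 count, campaign 2 count) per day."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE coupon (id INTEGER PRIMARY KEY, campaign_id INTEGER, "
                "assigned_date TEXT)")
    con.executemany("INSERT INTO coupon VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in coupons_per_day():
        print(*row)
```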
I have the following SQL table that keeps track of a user's score at a particular timepoint. A user can have multiple scores per day.
+-------+------------+-------+-----+
| user | date | score | ... |
+-------+------------+-------+-----+
| bob | 2014-04-19 | 100 | ... |
| mary | 2014-04-19 | 100 | ... |
| alice | 2014-04-20 | 100 | ... |
| bob | 2014-04-20 | 110 | ... |
| bob | 2014-04-20 | 125 | ... |
| mary | 2014-04-20 | 105 | ... |
| bob | 2014-04-21 | 115 | ... |
+-------+------------+-------+-----+
Given a particular user (let's say bob), how would I generate a report of that user's scores, using only the highest submitted score per day? (Getting the specific row with the highest score is important as well, not just the highest score.)
SELECT * FROM `user_score` WHERE `user` = 'bob' GROUP BY `date`
is the base query that I'm building off of. It results in the following result set:
+-------+------------+-------+-----+
| user | date | score | ... |
+-------+------------+-------+-----+
| bob | 2014-04-19 | 100 | ... |
| bob | 2014-04-20 | 110 | ... |
| bob | 2014-04-21 | 115 | ... |
+-------+------------+-------+-----+
bob's higher score of 125 from 2014-04-20 is missing. I tried rectifying that with MAX(score)
SELECT *, MAX(score) FROM `user_score` WHERE `user` = 'bob' GROUP BY `date`
This returns the highest score for each day, but not the row that has that score, and the other column values on that row are important:
+-------+------------+-------+-----+------------+
| user | date | score | ... | max(score) |
+-------+------------+-------+-----+------------+
| bob | 2014-04-19 | 100 | ... | 100 |
| bob | 2014-04-20 | 110 | ... | 125 |
| bob | 2014-04-21 | 115 | ... | 115 |
+-------+------------+-------+-----+------------+
Lastly, I tried
SELECT *, MAX(score) FROM `user_score` WHERE `user` = 'bob' AND score = MAX(score) GROUP BY `date`
But that results in an invalid use of GROUP BY.
The question "Selecting a row with specific value from a group?" is on the right track with what I am trying to accomplish, but I don't know the specific score to filter by.
EDIT:
SQLFiddle: http://sqlfiddle.com/#!2/ee6a2
If you want all the fields, the easiest (and often fastest) way in MySQL is to use NOT EXISTS:
SELECT *
FROM `user_score` us
WHERE `user` = 'bob' AND
NOT EXISTS (SELECT 1
FROM user_score us2
WHERE us2.`user` = us.`user` AND
us2.date = us.date AND
us2.score > us.score
);
This may seem like a strange approach, and I'll admit that it is. What it does is simple: "Get me all rows for Bob from user_score where there is no higher score for Bob on the same date." That is equivalent to getting the row with the maximum score for each date. With an index on user_score(user, date, score), this is probably the most efficient way to do what you want.
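A small Python sketch of the NOT EXISTS query on the sample rows from the question, using SQLite, which (like MySQL) accepts backtick-quoted identifiers. The `id` column is an assumption standing in for the "..." columns so you can see which row is returned, and the index covers the correlated columns (`user`, `date`, score). `best_daily_scores` is an illustrative name.

```python
import sqlite3

# Sample rows from the question; `id` is a hypothetical extra column.
ROWS = [
    (1, "bob", "2014-04-19", 100),
    (2, "mary", "2014-04-19", 100),
    (3, "alice", "2014-04-20", 100),
    (4, "bob", "2014-04-20", 110),
    (5, "bob", "2014-04-20", 125),
    (6, "mary", "2014-04-20", 105),
    (7, "bob", "2014-04-21", 115),
]

QUERY = """
SELECT id, `user`, `date`, score
FROM user_score AS us
WHERE us.`user` = ?
  AND NOT EXISTS (SELECT 1                    -- no row for the same user and day...
                  FROM user_score AS us2
                  WHERE us2.`user` = us.`user`
                    AND us2.`date` = us.`date`
                    AND us2.score > us.score) -- ...has a higher score
ORDER BY `date`
"""


def best_daily_scores(user, rows=ROWS):
    """Return the highest-scoring row(s) per day for one user."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE user_score (id INTEGER PRIMARY KEY, `user` TEXT, "
                "`date` TEXT, score INTEGER)")
    con.execute("CREATE INDEX user_date_score ON user_score(`user`, `date`, score)")
    con.executemany("INSERT INTO user_score VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY, (user,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in best_daily_scores("bob"):
        print(*row)
```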
You can use a JOIN:
SELECT a.*
FROM `user_score` as a
INNER JOIN (SELECT `user`, `date`, MAX(score) MaxScore
FROM `user_score`
GROUP BY `user`, `date`) as b
ON a.`user` = b.`user`
AND a.`date` = b.`date`
AND a.score = b.MaxScore
WHERE a.`user` = 'bob'
One option is to use an inline view and a JOIN operation. If there is more than one row with the "high score" value for a given day, this query will return all the rows. (If (user,date,score) is unique, then this isn't a problem.)
For example:
SELECT t.user
, t.date
, t.score
, t.`...`
FROM user_score t
JOIN ( SELECT d.user
, d.date
, MAX(d.score) AS score
FROM user_score d
WHERE d.user = 'bob'
GROUP BY d.user, d.date
) s
ON s.user = t.user
AND s.date = t.date
AND s.score = t.score
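To illustrate the tie behaviour described above, this sketch (Python with SQLite as a stand-in for MySQL) runs the inline-view query on the question's rows plus one hypothetical extra row that ties bob's 115 on 2014-04-21. Both tied rows come back for that day. The `id` column, the extra row and the helper `top_rows_with_ties` are assumptions for illustration.

```python
import sqlite3

# Sample rows from the question plus one hypothetical tie (id 8).
ROWS = [
    (1, "bob", "2014-04-19", 100),
    (2, "mary", "2014-04-19", 100),
    (3, "alice", "2014-04-20", 100),
    (4, "bob", "2014-04-20", 110),
    (5, "bob", "2014-04-20", 125),
    (6, "mary", "2014-04-20", 105),
    (7, "bob", "2014-04-21", 115),
    (8, "bob", "2014-04-21", 115),  # hypothetical tie, not in the question
]

QUERY = """
SELECT t.id, t.`user`, t.`date`, t.score
FROM user_score AS t
JOIN (SELECT d.`user`, d.`date`, MAX(d.score) AS score   -- best score per user and day
      FROM user_score AS d
      WHERE d.`user` = ?
      GROUP BY d.`user`, d.`date`) AS s
  ON s.`user` = t.`user`
 AND s.`date` = t.`date`
 AND s.score = t.score                                    -- every row matching it, ties included
ORDER BY t.`date`, t.id
"""


def top_rows_with_ties(user, rows=ROWS):
    """Return all rows that equal the user's best score of their day."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE user_score (id INTEGER PRIMARY KEY, `user` TEXT, "
                "`date` TEXT, score INTEGER)")
    con.executemany("INSERT INTO user_score VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY, (user,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in top_rows_with_ties("bob"):
        print(*row)
```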
I have the following tables:
chats:
+--------+-------------+--------------+--------------+---------------------+
| chatID | firstPerson | secondPerson | chatAccepted | creationDate |
+--------+-------------+--------------+--------------+---------------------+
| 1 | 59 | 52 | 1 | 31-01-2014 09:32:37 |
| 2 | 59 | 12 | 0 | 28-01-2014 11:07:25 |
| 3 | 34 | 59 | 1 | 28-01-2014 08:50:48 |
| 4 | 78 | 59 | 1 | 26-01-2014 03:58:12 |
+--------+-------------+--------------+--------------+---------------------+
messages:
+-----------+-------------+--------+----------+------------+---------------------+
| messageID | messageText | chatID | senderID | receiverID | creationDate |
+-----------+-------------+--------+----------+------------+---------------------+
| 1 | Lorum... | 1 | 59 | 52 | 31-01-2014 09:32:37 |
| 2 | Ipsum... | 1 | 52 | 59 | 28-01-2014 11:07:25 |
| 3 | Dollar... | 3 | 34 | 59 | 28-01-2014 08:50:48 |
| 4 | Sit... | 3 | 59 | 34 | 31-01-2014 11:09:48 |
+-----------+-------------+--------+----------+------------+---------------------+
What I'm trying to get is the chatID of every chat where (firstPerson = 59 or secondPerson = 59) and chatAccepted = 1. The part I can't figure out: I want the result ordered by which chat has the latest message.
I tried a lot of different things, one was:
"SELECT chats.chatID, chats.firstPerson, chats.secondPerson, str_to_date(messages.creationDate,'%d-%m-%Y %H:%i:%s') AS cdate
FROM chats
INNER JOIN messages
ON chats.chatID=messages.chatID
WHERE chats.chatAccepted = 1 AND messages.receiverID = 59
UNION SELECT chats.chatID, chats.firstPerson, chats.secondPerson, str_to_date(messages.creationDate,'%d-%m-%Y %H:%i:%s') AS cdate
FROM chats
INNER JOIN messages
ON chats.chatID=messages.chatID
WHERE chats.chatAccepted = 1 AND messages.senderID = 59
ORDER BY cdate desc"
This works like a charm, except when there are no messages yet: then it returns no record for that chat. But I need to know whether the chat is accepted; otherwise the users can never even start chatting.
Any help would be very welcome.
If you need any more information, please let me know!
UPDATE: at a minimum, the result I want in this case is:
+--------+
| chatID |
+--------+
| 3 |
| 1 |
| 4 |
+--------+
This is because chatID 3 has the latest message linked to it. chatID 4 has no messages yet, but it is an accepted chat, so it should be in the result.
SELECT c.chatID, c.firstPerson, c.secondPerson, str_to_date(m.creationDate,'%d-%m-%Y %H:%i:%s') AS cdate
FROM chats c left join messages m on c.chatID=m.chatID
WHERE c.chatAccepted = 1 AND
59 in (c.firstPerson, c.secondPerson)
ORDER BY cdate desc
To also keep chats with no messages, you need a LEFT JOIN. Then you have to decide where chats without messages should appear in the order. Suppose you want to use the chat creation date: use the COALESCE function to take the chat creation date where there are no messages (the message creation date is NULL).
This way, a chat with several messages produces several rows, so you also have to use the aggregate function MAX on cdate to keep only the latest entry per chat. Because the dates are stored as dd-mm-yyyy strings, convert them with str_to_date before taking MAX; otherwise MAX compares them as text, starting with the day:
SELECT c.chatID,
c.firstPerson,
c.secondPerson,
MAX(str_to_date(COALESCE(m.creationDate, c.creationDate),'%d-%m-%Y %H:%i:%s')) AS cdate
FROM chats AS c LEFT OUTER JOIN
messages AS m ON c.chatID = m.chatID
WHERE c.chatAccepted = 1 AND (c.firstPerson=59 OR c.secondPerson=59)
GROUP BY c.chatID, c.firstPerson, c.secondPerson
ORDER BY cdate DESC
If you prefer not to nest the conversion inside the aggregate, you can convert the dates in a derived table first and aggregate in the outer query. Keep the ORDER BY in the outer query, since MySQL does not guarantee the row order of a derived table:
SELECT chatID,
firstPerson,
secondPerson,
MAX(msgDate) AS cdate
FROM (SELECT c.chatID,
c.firstPerson,
c.secondPerson,
str_to_date(COALESCE(m.creationDate, c.creationDate),'%d-%m-%Y %H:%i:%s') AS msgDate
FROM chats AS c LEFT OUTER JOIN
messages AS m ON c.chatID = m.chatID
WHERE c.chatAccepted = 1 AND (c.firstPerson=59 OR c.secondPerson=59)) AS devTbl
GROUP BY chatID, firstPerson, secondPerson
ORDER BY cdate DESC
I hope this helps.
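A minimal sketch in Python with SQLite of the aggregated LEFT JOIN, using the sample rows from the question. SQLite has no `str_to_date`, so the hypothetical helper `iso` builds an SQL expression that rewrites 'dd-mm-yyyy hh:mm:ss' into 'yyyy-mm-dd hh:mm:ss', which sorts correctly as text; in MySQL you would use `str_to_date` instead. The result order is 3, 1, 4, as the question expects. `latest_chats` is an illustrative name.

```python
import sqlite3

# Sample rows from the question.
CHATS = [  # (chatID, firstPerson, secondPerson, chatAccepted, creationDate)
    (1, 59, 52, 1, "31-01-2014 09:32:37"),
    (2, 59, 12, 0, "28-01-2014 11:07:25"),
    (3, 34, 59, 1, "28-01-2014 08:50:48"),
    (4, 78, 59, 1, "26-01-2014 03:58:12"),
]
MESSAGES = [  # (messageID, messageText, chatID, senderID, receiverID, creationDate)
    (1, "Lorum...", 1, 59, 52, "31-01-2014 09:32:37"),
    (2, "Ipsum...", 1, 52, 59, "28-01-2014 11:07:25"),
    (3, "Dollar...", 3, 34, 59, "28-01-2014 08:50:48"),
    (4, "Sit...", 3, 59, 34, "31-01-2014 11:09:48"),
]


def iso(col):
    """SQL expression turning 'dd-mm-yyyy hh:mm:ss' in `col` into 'yyyy-mm-dd hh:mm:ss'.

    Stand-in for MySQL's str_to_date; evaluates to NULL when `col` is NULL.
    """
    return (f"substr({col}, 7, 4) || '-' || substr({col}, 4, 2) || '-' || "
            f"substr({col}, 1, 2) || substr({col}, 11)")


QUERY = f"""
SELECT c.chatID,
       MAX(COALESCE({iso('m.creationDate')}, {iso('c.creationDate')})) AS cdate
FROM chats AS c
LEFT JOIN messages AS m ON m.chatID = c.chatID   -- keeps chats with no messages
WHERE c.chatAccepted = 1 AND 59 IN (c.firstPerson, c.secondPerson)
GROUP BY c.chatID
ORDER BY cdate DESC
"""


def latest_chats():
    """Return (chatID, latest activity) for user 59's accepted chats, newest first."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE chats (chatID INTEGER, firstPerson INTEGER, "
                "secondPerson INTEGER, chatAccepted INTEGER, creationDate TEXT)")
    con.execute("CREATE TABLE messages (messageID INTEGER, messageText TEXT, chatID INTEGER, "
                "senderID INTEGER, receiverID INTEGER, creationDate TEXT)")
    con.executemany("INSERT INTO chats VALUES (?, ?, ?, ?, ?)", CHATS)
    con.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?)", MESSAGES)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in latest_chats():
        print(*row)
```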