I've got the following table/data (example)
Users
user_id | email
1 | asd#asd.com
2 | asd2#asd.com
3 | asd3#asd.com
4 | asd4#asd.com
5 | asd5#asd.com
Scheduled_Jobs
job_id | user_id | date
1 | 1 | 05/09/2019
2 | 1 | 05/10/2019
3 | 1 | 05/11/2019
4 | 1 | 05/12/2019
5 | 2 | 07/10/2019
6 | 2 | 07/11/2019
7 | 2 | 07/12/2019
8 | 3 | 11/07/2019
9 | 4 | 13/10/2019
10 | 4 | 13/11/2019
11 | 5 | 10/10/2019
12 | 5 | 10/11/2019
13 | 5 | 10/12/2019
Last_Update
update_id | job_id
1 | 1
2 | 2
3 | 3
4 | 5
5 | 9
6 | 11
When a user is created, a list of scheduled jobs is created too. When a user completes a job, the Last_Update table is updated.
I'm trying to show how many users have unfinished jobs, grouped by how long those jobs are overdue. For example, 1-30 days delay: x users, 31-60 days delay: y users, etc.
Based on the example above here would be the expected result:
Number of users with no delayed jobs: 2 (users 1 & 4)
1-30 days delay: 2 (users 2 & 5)
31-60 days delay: 0
Over 60 days delay: 1 (user 3)
Currently I'm only able to show the users that have no delayed jobs:
SELECT u.user_id
FROM users u
LEFT JOIN (
SELECT j.user_id AS completed
FROM jobs j
LEFT JOIN last_update lu
ON lu.job_id = j.job_id
WHERE j.job_date <= CURDATE()
AND lu.update_id IS NULL
) AS cj
ON u.user_id = cj.completed
WHERE cj.completed IS NULL
You can first join the three tables, aggregate by user_id and compute, for each user:
how many unfinished jobs they have
how many unfinished jobs they have within the last 30 days
how many unfinished jobs they have within the last 31-60 days
Then, you can add another level of aggregation and count how many users meet each criterion.
Query:
select
sum(cnt_jobs_unfinished = 0) cnt_users_no_unfinished_jobs,
sum(cnt_jobs_unfinished_30d > 0) cnt_users_unfinished_30d,
sum(cnt_jobs_unfinished_31_60d > 0) cnt_users_unfinished_31_60d
from (
select
u.user_id,
sum(l.job_id is null) cnt_jobs_unfinished,
sum(
l.job_id is null
and j.date >= curdate() - interval 30 day
) cnt_jobs_unfinished_30d,
sum(
l.job_id is null
and j.date < curdate() - interval 30 day
and j.date >= curdate() - interval 60 day
) cnt_jobs_unfinished_31_60d
from users u
inner join scheduled_jobs j
on j.date <= curdate()
and j.user_id = u.user_id
left join last_update l
on l.job_id = j.job_id
group by u.user_id
) t
Demo on DB Fiddle
cnt_users_no_unfinished_jobs | cnt_users_unfinished_30d | cnt_users_unfinished_31_60d
---------------------------: | -----------------------: | --------------------------:
2 | 2 | 1
Note: I had to modify your sample data so that job 8, for user 3, has a date within 31-60 days, as this was not the case in your original data.
You can run the subquery independently to see what it returns:
user_id | cnt_jobs_unfinished | cnt_jobs_unfinished_30d | cnt_jobs_unfinished_31_60d
------: | ------------------: | ----------------------: | -------------------------:
1 | 0 | 0 | 0
2 | 1 | 1 | 0
3 | 1 | 0 | 1
4 | 0 | 0 | 0
5 | 1 | 1 | 0
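The four buckets asked for in the question (no delay, 1-30, 31-60, over 60 days) can also be derived from the delay of each user's oldest unfinished job. The following is only a minimal sketch, not the query from the demo: it uses Python's sqlite3 module instead of MySQL, assumes the dates are stored in ISO format in a column called job_date, and takes the reference date as a parameter (the function name and the date 2019-11-12 are assumptions chosen so that the question's expected result applies to the original sample data).

```python
import sqlite3

def count_users_by_delay(today):
    """Return (no_delay, d1_30, d31_60, over_60) user counts as of `today` (ISO date)."""
    con = sqlite3.connect(":memory:")
    # Sample data from the question, with dates rewritten from dd/mm/yyyy to ISO.
    con.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY);
        CREATE TABLE scheduled_jobs (job_id INTEGER PRIMARY KEY, user_id INTEGER, job_date TEXT);
        CREATE TABLE last_update (update_id INTEGER PRIMARY KEY, job_id INTEGER);
        INSERT INTO users VALUES (1),(2),(3),(4),(5);
        INSERT INTO scheduled_jobs VALUES
          (1,1,'2019-09-05'),(2,1,'2019-10-05'),(3,1,'2019-11-05'),(4,1,'2019-12-05'),
          (5,2,'2019-10-07'),(6,2,'2019-11-07'),(7,2,'2019-12-07'),
          (8,3,'2019-07-11'),
          (9,4,'2019-10-13'),(10,4,'2019-11-13'),
          (11,5,'2019-10-10'),(12,5,'2019-11-10'),(13,5,'2019-12-10');
        INSERT INTO last_update VALUES (1,1),(2,2),(3,3),(4,5),(5,9),(6,11);
    """)
    row = con.execute("""
        SELECT
          COALESCE(SUM(max_delay IS NULL), 0)           AS no_delay,
          COALESCE(SUM(max_delay BETWEEN 1 AND 30), 0)  AS d1_30,
          COALESCE(SUM(max_delay BETWEEN 31 AND 60), 0) AS d31_60,
          COALESCE(SUM(max_delay > 60), 0)              AS over_60
        FROM (
          -- one row per user: days since the oldest overdue job without an update
          SELECT u.user_id,
                 MAX(CASE WHEN l.job_id IS NULL
                          THEN CAST(julianday(:today) - julianday(j.job_date) AS INTEGER)
                     END) AS max_delay
          FROM users u
          LEFT JOIN scheduled_jobs j
                 ON j.user_id = u.user_id AND j.job_date < :today
          LEFT JOIN last_update l
                 ON l.job_id = j.job_id
          GROUP BY u.user_id
        ) t
    """, {"today": today}).fetchone()
    con.close()
    return row

print(count_users_by_delay("2019-11-12"))
```

Because each user lands in exactly one bucket (that of their oldest unfinished job), the four counts always add up to the number of users.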
Related
I have a logs table that records when a user has opened (status 1) or closed (status 2) a registration. My problem is that I need to get the users and registration_ids that have an open status but no closed status.
Logs Table
id | registration_id | user_id | status | created_at
1 | 1 | 1 | 1 | 2021-02-22 8:00:00
2 | 1 | 1 | 2 | 2021-02-22 8:30:00
3 | 2 | 1 | 1 | 2021-02-22 8:30:00
4 | 2 | 1 | 2 | 2021-02-22 9:00:00
5 | 3 | 1 | 1 | 2021-02-22 9:00:00
6 | 4 | 2 | 1 | 2021-02-22 8:00:00
7 | 4 | 2 | 2 | 2021-02-22 8:30:00
Expected Output
id | registration_id | user_id | status | created_at
5 | 3 | 1 | 1 | 2021-02-22 9:00:00
This is because registration_id = 3 with user_id = 1 doesn't have a closed status. Also, there are a lot of logs between open and closed; I just simplified it in my question. So if you're planning to just check whether the count per registration_id equals 1, that doesn't work.
What I've tried is subtracting the open created_at from the closed created_at: if the total is less than or equal to 0, there is no closed status. I know there must be a better way to get what I want, because my current query is very slow.
SELECT
user_id,
registration_id,
date,
SUM(timestampdiff(minute, openTime, closedTime)) AS total
FROM (
SELECT
user_id,
date(created_at) as `date`,
created_at as openTime,
registration_id,
coalesce(
(
SELECT created_at FROM logs t2
WHERE t1.registration_id = t2.registration_id
AND t1.created_at < t2.created_at
AND t1.user_id = t2.user_id
AND status = 2
ORDER BY t1.created_at
LIMIT 1
),
date_add(t1.created_at, interval -1 minute)
) AS closedTime
FROM logs t1
WHERE status = 1
) a
GROUP BY a.user_id, registration_id
HAVING total <= 0;
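One common way to express "has an open row but no closed row" is an anti-join with NOT EXISTS, which avoids computing time differences at all. The following is a minimal sketch using Python's sqlite3 module and the sample rows above; the function name is hypothetical, and whether this is faster on the real table depends on its indexes (an index covering registration_id, user_id and status would be the natural candidate).

```python
import sqlite3

def open_without_close():
    """Return (id, registration_id, user_id) of open rows that have no closed row."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE logs (
          id INTEGER PRIMARY KEY, registration_id INTEGER,
          user_id INTEGER, status INTEGER, created_at TEXT);
        INSERT INTO logs VALUES
          (1,1,1,1,'2021-02-22 08:00:00'),
          (2,1,1,2,'2021-02-22 08:30:00'),
          (3,2,1,1,'2021-02-22 08:30:00'),
          (4,2,1,2,'2021-02-22 09:00:00'),
          (5,3,1,1,'2021-02-22 09:00:00'),
          (6,4,2,1,'2021-02-22 08:00:00'),
          (7,4,2,2,'2021-02-22 08:30:00');
    """)
    rows = con.execute("""
        SELECT o.id, o.registration_id, o.user_id
        FROM logs o
        WHERE o.status = 1
          AND NOT EXISTS (            -- no closing row for the same registration and user
                SELECT 1
                FROM logs c
                WHERE c.registration_id = o.registration_id
                  AND c.user_id = o.user_id
                  AND c.status = 2)
        ORDER BY o.id
    """).fetchall()
    con.close()
    return rows

print(open_without_close())
```

The same SELECT should be valid in MySQL as written, since it only uses a correlated NOT EXISTS subquery.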
I have a table with user ids and login dates.
id | customer | timestamp
1 | 1 | 2017-02-10 11:30:28
2 | 1 | 2017-02-11 11:30:28
3 | 2 | 2017-02-12 11:30:28
4 | 3 | 2017-02-13 11:30:28
5 | 1 | 2017-02-14 11:30:28
Now I want to get the length of the longest continuous streak of logins per customer.
I got to the point where the difference is calculated correctly for one customer.
SELECT a.id aId,
b.id bId,
a.customer,
a.timestamp aTime,
b.timestamp bTime,
DATEDIFF(b.timestamp, a.timestamp) as diff
FROM logins a
INNER JOIN logins b
ON a.customer = b.customer AND a.id < b.id
WHERE b.customer = 7
GROUP BY a.id
How can I do this for the whole table and count the following logins with a difference under 1 day?
The wanted result for this example should be:
customer | days of continuous login
1 | 2
2 | 1
3 | 1
You can do this with a LEFT JOIN
Query
SELECT
logins.customer
, COUNT(*) as "longest continuous streak of logins"
FROM (
SELECT
login1.*
FROM
login login1
LEFT JOIN
login login2
ON
login1.timestamp < login2.timestamp
AND
# Only join if the date difference is at most 1 day
DATEDIFF(login2.timestamp, login1.timestamp) <= 1
WHERE
login2.id IS NOT NULL
AND
login2.customer IS NOT NULL
AND
login2.timestamp IS NOT NULL
ORDER BY
login1.customer
)
AS logins
GROUP BY
logins.customer
Result
| customer | longest continuous streak of logins |
|----------|-------------------------------------|
| 1 | 2 |
| 2 | 1 |
| 3 | 1 |
see demo http://www.sqlfiddle.com/#!9/ad581/17
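Note that the join above does not compare login1.customer with login2.customer, so it may be worth checking it against data where customers' logins interleave differently. A different way to look at the problem is the "gaps and islands" technique: number each customer's distinct login days, subtract that number from the day itself, and consecutive days end up with the same key. The following is a minimal sketch of that idea using Python's sqlite3 module and the sample rows from the question; it is not the query from the demo, and the function name is hypothetical.

```python
import sqlite3

def longest_streaks():
    """Return [(customer, longest run of consecutive login days)]."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE logins (id INTEGER PRIMARY KEY, customer INTEGER, timestamp TEXT);
        INSERT INTO logins VALUES
          (1,1,'2017-02-10 11:30:28'),
          (2,1,'2017-02-11 11:30:28'),
          (3,2,'2017-02-12 11:30:28'),
          (4,3,'2017-02-13 11:30:28'),
          (5,1,'2017-02-14 11:30:28');
    """)
    rows = con.execute("""
        WITH days AS (          -- one row per customer and calendar day
          SELECT DISTINCT customer, date(timestamp) AS d FROM logins
        ),
        keyed AS (              -- day number minus rank is constant within a streak
          SELECT cur.customer,
                 CAST(julianday(cur.d) AS INTEGER)
                   - (SELECT COUNT(*) FROM days p
                      WHERE p.customer = cur.customer AND p.d <= cur.d) AS grp
          FROM days cur
        ),
        streaks AS (
          SELECT customer, grp, COUNT(*) AS len
          FROM keyed
          GROUP BY customer, grp
        )
        SELECT customer, MAX(len) FROM streaks GROUP BY customer ORDER BY customer
    """).fetchall()
    con.close()
    return rows

print(longest_streaks())
```

On MySQL 8 or later the correlated COUNT(*) could be replaced by ROW_NUMBER() OVER (PARTITION BY customer ORDER BY d); the subquery form is used here only so the sketch does not depend on window-function support.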
Here is the schema:
Customer (Customer_ID, Name, Address, Phone),
Porder (Customer_ID, Pizza_ID, Quantity, Order_Date),
Pizza (Pizza_ID, Name, Price).
I want to get all customers that ordered a pizza in the last 30 days (based on Order_Date), and the customer who spent the most money in the last 30 days. Can these be combined into one query?
Here is what I am trying. I am not sure about DATEDIFF or how the query would calculate the total money.
SELECT customer.customer_ID, customer.name FROM customer
JOIN porder ON customer.customer_ID = porder.customer_ID
GROUP BY customer.customer_ID, customer.name
WHERE DATEDIFF(porder.porder_date,getdate()) between 0 and 30
Who spent the most money last 30 days?
SELECT porder.customer_ID, porder.pizza_id, porder.quantity FROM order
JOIN pizza ON porder.pizza_ID = pizza.pizza_ID
GROUP BY porder.customer_ID
WHERE MAX((porder.quantity * pizza.price)) && DATEDIFF(porder.porder_date,getdate()) between 0 and 30
Remember that functions are black boxes to the query optimizer, so it is better to make the query fit the index, and not the other way around.
WHERE DATEDIFF(porder.order_date, getdate()) between 0 and 30
can be rewritten so that the query can use a plain index on order_date:
WHERE porder.order_date >= CURRENT_DATE - INTERVAL 30 DAY
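The effect can be observed in a query plan. The following is a minimal sketch using Python's sqlite3 module rather than MySQL, so the plan wording is SQLite's and only illustrates the general principle; the index name, the literal dates and the function name are assumptions made for the example.

```python
import sqlite3

def query_plans():
    """Return SQLite's plan text for a function-wrapped and a bare date predicate."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE porder (customer_id INTEGER, pizza_id INTEGER,
                             quantity INTEGER, order_date TEXT);
        CREATE INDEX idx_porder_order_date ON porder (order_date);
    """)

    def plan(sql):
        # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
        rows = con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        return " / ".join(r[-1] for r in rows)

    # Column wrapped in a function: the index on order_date cannot be used for a range seek.
    wrapped = plan("SELECT * FROM porder "
                   "WHERE julianday('2024-01-31') - julianday(order_date) BETWEEN 0 AND 30")
    # Bare column compared with a constant: the index can be used.
    bare = plan("SELECT * FROM porder WHERE order_date >= '2024-01-01'")
    con.close()
    return wrapped, bare

for text in query_plans():
    print(text)
```

In MySQL the equivalent check would be to run EXPLAIN on both forms of the WHERE clause and compare the key column of the output.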
Who spent the most money in the last 30 days
SELECT
o.customer_id, SUM(p.price * o.quantity) AS total_spent
FROM
porder o -- the table is named Porder in the schema; ORDER is a reserved word
INNER JOIN pizza p
ON o.pizza_id = p.pizza_id
WHERE
o.order_date >= CURRENT_DATE - INTERVAL 30 DAY
GROUP BY o.customer_id
ORDER BY total_spent DESC
LIMIT 1
Something to think about once you've sorted out your tables, and separated order details from orders.
SELECT * FROM ints;
+---+
| i |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
SELECT x.*
, IF(x.i = y.maxi,1,0) is_biggest
FROM ints x
LEFT
JOIN (SELECT MAX(i) maxi FROM ints) y
ON y.maxi = x.i;
+---+------------+
| i | is_biggest |
+---+------------+
| 0 | 0 |
| 1 | 0 |
| 2 | 0 |
| 3 | 0 |
| 4 | 0 |
| 5 | 0 |
| 6 | 0 |
| 7 | 0 |
| 8 | 0 |
| 9 | 1 |
+---+------------+
I have a query that shows how many messages have been sent through my system in the last year, grouped by month. It works perfectly!
The result looks like this:
+------+-------+--------+--------+--------+
| Year | Month | Type 1 | Type 2 | Type 3 |
+------+-------+--------+--------+--------+
| 2013 | 10 | 0 | 2 | 3 |
| 2013 | 11 | 4 | 21 | 56 |
| 2013 | 12 | 1 | 10 | 16 |
| 2014 | 1 | 2 | 10 | 52 |
| 2014 | 2 | 1 | 62 | 118 |
+------+-------+--------+--------+--------+
(Types 1, 2 and 3 are simply different types of users; ignore this.)
However, I'd like to avoid counting the same receiver (msg_receiver) more than once per month in the result set.
So if users 44 and 39 each send a message to user 70 in December, user 70 should only be counted ONCE for December. Currently, he is counted twice.
Below is my query:
SELECT
Year(m.msg_date) as year,
Month(m.msg_date) as month,
sum(u.type = '1') as type_1,
Sum(u.type = '2') as type_2,
sum(u.type = '7') as type_3
FROM
messages m
INNER JOIN
users u ON u.user_id = m.msg_sender
WHERE
m.msg_date >= CURDATE() - INTERVAL 1 YEAR
AND month(msg_date) != month(curdate())
GROUP BY
Month(m.msg_date) -- , m.msg_receiver (this does not work, it will no longer group by each month/year).
ORDER BY
msg_date
The logical answer to this would, in my opinion, be to first group by month, then by user_id (or vice versa). But if I do this, the results look strange. See:
Using GROUP BY Month(m.msg_date), u.user_id
+------+-------+--------+--------+--------+
| Year | Month | Type 1 | Type 2 | Type 3 |
+------+-------+--------+--------+--------+
| 2013 | 10 | 0 | 1 | 0 |
| 2013 | 10 | 0 | 0 | 1 |
| 2013 | 10 | 0 | 0 | 1 |
| 2013 | 10 | 0 | 1 | 0 |
| 2013 | 10 | 0 | 0 | 1 |
| 2013 | 11 | 0 | 0 | 19 |
| 2013 | 11 | 0 | 1 | 0 |
| 2013 | 11 | 0 | 1 | 0 |
| 2013 | 11 | 0 | 1 | 0 |
| 2013 | 11 | 0 | 1 | 0 |
| 2013 | 11 | 2 | 0 | 0 |
| 2013 | 11 | 0 | 0 | 11 |
+------+-------+--------+--------+--------+
It does not group by months anymore, as it should.
Any ideas?
EDIT
Just to clarify exactly what I want to achieve, as people have been a bit confused. Imagine this scenario:
It is December 2013.
USER 1 has written 5 messages to USER 2 (this should count as 1 in December)
USER 4 has written 1 message to USER 4 (this should count as 1 in December)
USER 3 has written 2 messages, to USER 4 and USER 2 (this should count as 2 in December).
The total for the month would then be 4, because there have been 4 conversations.
Does that make sense? I often find myself struggling to express myself correctly and understandably.
You can use COUNT(DISTINCT ...) to only count each msg_receiver once per type:
SELECT
Year(m.msg_date) as year,
Month(m.msg_date) as month,
COUNT(DISTINCT CASE WHEN u.type = '1' THEN m.msg_receiver END) as type_1,
COUNT(DISTINCT CASE WHEN u.type = '2' THEN m.msg_receiver END) as type_2,
COUNT(DISTINCT CASE WHEN u.type = '7' THEN m.msg_receiver END) as type_3
FROM
messages m
INNER JOIN
users u ON u.user_id = m.msg_sender
WHERE
m.msg_date >= CURDATE() - INTERVAL 1 YEAR
AND month(msg_date) != month(curdate())
GROUP BY
Year(m.msg_date), Month(m.msg_date)
ORDER BY
msg_date
N.B. I have added Year(m.msg_date) to your GROUP BY to ensure the results are deterministic.
If the same user receives messages from two different users that have two different types, they will still be counted in both types. If this is not the intended result, you would need to come up with some logic as to which type they should be counted in (min, max, mode, median, etc.).
If, for example, you wanted the minimum user type, you could use:
SELECT
m.year,
m.month,
sum(m.type = '1') as type_1,
Sum(m.type = '2') as type_2,
sum(m.type = '7') as type_3
FROM (
SELECT
Year(m.msg_date) as year,
Month(m.msg_date) as month,
m.msg_receiver,
MIN(u.type) AS type
FROM
messages m
INNER JOIN
users u ON u.user_id = m.msg_sender
WHERE
m.msg_date >= CURDATE() - INTERVAL 1 YEAR
AND month(msg_date) != month(curdate())
GROUP BY
Year(m.msg_date), Month(m.msg_date), m.msg_receiver
) m
GROUP BY
m.Year, m.Month
ORDER BY
m.year, m.month;
EDIT
In response to your updated question: in its current form my first answer would count your example as only 3 conversations, not 4, as there were only 3 unique recipients. What you really need is to count distinct combinations of sender and receiver, i.e. count(distinct m.msg_sender, m.msg_receiver). MySQL accepts several expressions inside COUNT(DISTINCT ...), but many other databases do not; a portable way to achieve essentially the same thing is to concatenate the two fields, as long as they are separated by a character (or characters) that cannot appear in either, e.g.
SELECT
Year(m.msg_date) as year,
Month(m.msg_date) as month,
COUNT(DISTINCT CASE WHEN u.type = '1' THEN CONCAT(m.msg_sender, '|', m.msg_receiver) END) as type_1,
COUNT(DISTINCT CASE WHEN u.type = '2' THEN CONCAT(m.msg_sender, '|', m.msg_receiver) END) as type_2,
COUNT(DISTINCT CASE WHEN u.type = '7' THEN CONCAT(m.msg_sender, '|', m.msg_receiver) END) as type_3
FROM
messages m
INNER JOIN
users u ON u.user_id = m.msg_sender
WHERE
m.msg_date >= CURDATE() - INTERVAL 1 YEAR
AND month(msg_date) != month(curdate())
GROUP BY
Year(m.msg_date), Month(m.msg_date)
ORDER BY
msg_date
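To see the concatenation trick on the scenario from the question's edit, here is a minimal sketch using Python's sqlite3 module. It leaves out the users table and the per-type split, and the message dates and the function name are made up for the example; SQLite's || operator plays the role of MySQL's CONCAT.

```python
import sqlite3

def count_conversations():
    """Count distinct (sender, receiver) pairs in December 2013."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE messages (msg_sender INTEGER, msg_receiver INTEGER, msg_date TEXT)")
    rows = (
        [(1, 2, "2013-12-%02d" % d) for d in range(1, 6)]   # user 1 -> user 2, five messages
        + [(4, 4, "2013-12-10")]                            # user 4 -> user 4, one message
        + [(3, 4, "2013-12-11"), (3, 2, "2013-12-12")]      # user 3 -> users 4 and 2
    )
    con.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
    (pairs,) = con.execute("""
        SELECT COUNT(DISTINCT msg_sender || '|' || msg_receiver)
        FROM messages
        WHERE msg_date >= '2013-12-01' AND msg_date < '2014-01-01'
    """).fetchone()
    con.close()
    return pairs

print(count_conversations())
```

The separator matters: without it, sender 1 with receiver 23 and sender 12 with receiver 3 would both produce the string 123 and be counted as one pair.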
You haven't posted a data structure, but it appears that you want to change the INNER JOIN to:
INNER JOIN
users u ON u.user_id = m.msg_receiver
MySQL Query:
SELECT c.day,
COUNT(site_id)
FROM calendar c
LEFT JOIN
(
SELECT *
FROM visitors
WHERE site_id = 16
) d ON DAYOFMONTH(d.created) = c.day
WHERE c.day BETWEEN DAYOFMONTH('2012-10-01') AND DAYOFMONTH('2012-10-31')
GROUP BY c.day
ORDER BY c.day
My Tables
Calendar
id | day
---------
1 | 1
2 | 2
3 | 3
...
31 | 31
Visitors
id | site_id | created
-----------------------------------
1 | 16 | 2012-10-18 11:14:39
2 | 16 | 2012-10-18 11:15:17
3 | 11 | 2012-10-18 11:49:14
4 | 11 | 2012-10-18 11:49:43
5 | 16 | 2012-10-19 11:54:37
6 | 1 | 2012-10-19 05:56:31
7 | 2 | 2012-10-19 05:57:56
I used the above query to retrieve the daily number of visits to a site. The query solved my question here.
Results:
day | COUNT(*)
-------------
1 | 0
2 | 0
3 | 0
....
18 | 2
19 | 1
...
31 | 0
Now, however, I am having problems getting a UNIX_TIMESTAMP for each day, which I need for graphing purposes.
How do I derive it from c.day in the query?
Edited:
SELECT
UNIX_TIMESTAMP('2012-10-01' + INTERVAL c.day - 1 DAY) unix_ts_day,
COUNT(v.site_id)
FROM
calendar c
LEFT JOIN (
SELECT * FROM visitors
WHERE site_id = 16 AND DATE(created) BETWEEN '2012-10-01' AND '2012-10-31'
) v
ON DAYOFMONTH(v.created) = c.day
GROUP BY
unix_ts_day
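The idea of the query above is to turn the day number back into a full date (first day of the month plus day - 1 days) and convert that date to a timestamp. Here is a minimal sketch of the same idea using Python's sqlite3 module and the sample rows from the question. The function name and parameters are hypothetical, and one difference is worth keeping in mind: SQLite's strftime('%s', ...) works in UTC, whereas MySQL's UNIX_TIMESTAMP interprets the date in the session time zone.

```python
import sqlite3

def daily_visits(site_id=16, month_start="2012-10-01"):
    """Return [(unix_ts_of_day, visits)] for every calendar day of the month."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE calendar (id INTEGER PRIMARY KEY, day INTEGER);
        CREATE TABLE visitors (id INTEGER PRIMARY KEY, site_id INTEGER, created TEXT);
        INSERT INTO visitors VALUES
          (1,16,'2012-10-18 11:14:39'),
          (2,16,'2012-10-18 11:15:17'),
          (3,11,'2012-10-18 11:49:14'),
          (4,11,'2012-10-18 11:49:43'),
          (5,16,'2012-10-19 11:54:37'),
          (6,1,'2012-10-19 05:56:31'),
          (7,2,'2012-10-19 05:57:56');
    """)
    con.executemany("INSERT INTO calendar (id, day) VALUES (?, ?)",
                    [(d, d) for d in range(1, 32)])
    rows = con.execute("""
        SELECT
          -- first day of the month + (day - 1) days, as seconds since the epoch (UTC)
          CAST(strftime('%s', date(:start, '+' || (c.day - 1) || ' days')) AS INTEGER)
            AS unix_ts_day,
          COUNT(v.id) AS visits
        FROM calendar c
        LEFT JOIN visitors v
               ON v.site_id = :site
              AND date(v.created) BETWEEN :start AND date(:start, '+1 month', '-1 day')
              AND CAST(strftime('%d', v.created) AS INTEGER) = c.day
        GROUP BY c.day
        ORDER BY c.day
    """, {"start": month_start, "site": site_id}).fetchall()
    con.close()
    return rows

for ts, visits in daily_visits():
    print(ts, visits)
```

Days without visits are kept with a count of 0 because the filter on site_id and date sits in the LEFT JOIN condition rather than in a WHERE clause.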