I'm trying to delete all records older than one week while keeping at least one for each user.
Example:
| ID | user | date | other columns...
| 1 | 1234 | -2 days | ...
| 2 | 1234 | -3 days | ...
| 3 | 1234 | -8 days | ...
| 4 | 5678 | -9 days | ...
| 5 | 5678 | -10 days | ...
Should become
| ID | user | date | other columns...
| 1 | 1234 | -2 days | ...
| 2 | 1234 | -3 days | ...
| 4 | 5678 | -9 days | ... // Keeping the most recent record for this user
So far I've made this, but it uses CASE to set OFFSET, so it doesn't work:
DELETE FROM transactions WHERE ID < (
SELECT ID FROM (
SELECT ID FROM transactions t WHERE
DATE(date) <= DATE_SUB(CURDATE(), INTERVAL 7 DAY) AND
user = transactions.user
ORDER BY ID DESC
LIMIT 1 OFFSET CASE WHEN EXISTS (
SELECT ID FROM transactions x WHERE
DATE(date) > DATE_SUB(CURDATE(), INTERVAL 7 DAY) AND
user = transactions.user
) THEN 0 ELSE 1 END
)
)
So the question is: how to fix the code above?
P.S.: I'm relatively new to anything beyond the most basic operations in SQL.
By grouping the transactions by user, you can determine those that you wish to preserve:
SELECT user, MAX(date) date
FROM transactions
GROUP BY user
You can then make an outer join between these results and your original table using the multiple-table DELETE syntax in order to delete only the desired records:
DELETE transactions
FROM transactions NATURAL LEFT JOIN (
SELECT user, MAX(date) date
FROM transactions
GROUP BY user
) t
WHERE date < CURRENT_DATE - INTERVAL 7 DAY
AND t.date IS NULL
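The same rule (delete rows older than a week unless they are the user's most recent row) can be tried out in a small, hedged sketch. It assumes SQLite via Python's sqlite3 module as a stand-in for MySQL, so it uses a correlated subquery instead of the multiple-table DELETE syntax; the function names and the fixed "today" are made up for illustration.

```python
import sqlite3
from datetime import date, timedelta


def prune_old_transactions(conn, today):
    """Delete rows older than 7 days, except each user's most recent row."""
    cutoff = (today - timedelta(days=7)).isoformat()
    # A row is deleted only if it is older than the cutoff AND some newer
    # row exists for the same user (i.e. it is not that user's latest row).
    conn.execute(
        """
        DELETE FROM transactions
        WHERE date < ?
          AND date < (SELECT MAX(t2.date)
                      FROM transactions t2
                      WHERE t2.user = transactions.user)
        """,
        (cutoff,),
    )


def demo():
    today = date(2024, 1, 15)  # hypothetical fixed "today" for reproducibility
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (ID INTEGER PRIMARY KEY, user INTEGER, date TEXT)"
    )
    # (ID, user, age in days), mirroring the example table in the question
    rows = [(1, 1234, 2), (2, 1234, 3), (3, 1234, 8), (4, 5678, 9), (5, 5678, 10)]
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [(i, u, (today - timedelta(days=d)).isoformat()) for i, u, d in rows],
    )
    prune_old_transactions(conn, today)
    return [r[0] for r in conn.execute("SELECT ID FROM transactions ORDER BY ID")]


print(demo())
```

If two rows of one user share the same latest date, both are kept, which is also how the join on MAX(date) behaves.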
Try:
DELETE FROM transactions tt WHERE tt.id NOT IN (
SELECT ID FROM transactions t WHERE
DATE(t.date) <= DATE_SUB(CURDATE(), INTERVAL 7 DAY) AND
t.user = tt.user
ORDER BY t.ID DESC limit 1
)
I have a MySQL database that keeps track of all the users' daily total scores (and some other similar score/count metrics such as "badgesEarned"; I am only including 2 of the 5 fields I need to track here). It only has data for the days on which a user was active (earning score points or badges), so the db won't have data for every date.
Here's a toy example:
Example Database Table: "User"
Now my goal is to get the change in score over the last 7 days for each user (I also need to do the last 30 days and 365 days, but let's stick to just 7 for this example). Since the db table stores a snapshot of total scores for all active days for each user, I wrote a SQL query that finds the two appropriate rows/snapshots and gets the difference in score/badges between them. These 2 rows would be the current date row (or, if that doesn't exist, the row just prior to it) vs. the (current_date - 7)th row (or, if that doesn't exist, the row just prior to it).
To make matters worse, I also have to keep track of the "ranks" of each player via the dense_rank() SQL method and add that in as a column in the final result table.
There are 2 ways so far that I can achieve this using 2 different SQL queries.
My main question is - is one of these "better" in terms of performance/good practice/efficiency than the other? Or are they both horrendous and I have completely gone down the wrong route to begin with and totally missed a more efficient approach? I am not great with SQL stuff, so apologies in advance if the question and code examples are horrifying:
First Approach:
Use multiple nested subqueries only (no join).
SELECT *, dense_rank() OVER (ORDER BY t3.score DESC) AS ranking
FROM
(
SELECT t1.userId,
(SELECT t2.score
FROM User t2
WHERE t2.date <= CURDATE() AND t2.userId=t1.userId
ORDER BY t2.date DESC LIMIT 1)
-
(SELECT t2.score
FROM User t2
WHERE t2.date <= DATE_ADD(CURDATE(), INTERVAL - 7 DAY) AND t2.userId=t1.userId
ORDER BY t2.date DESC LIMIT 1) as score,
(SELECT t2.badgesEarned
FROM User t2
WHERE t2.date <= CURDATE() AND t2.userId=t1.userId
ORDER BY t2.date DESC LIMIT 1)
-
(SELECT t2.badgesEarned
FROM User t2
WHERE t2.date <= DATE_ADD(CURDATE(), INTERVAL - 7 DAY) AND t2.userId=t1.userId
ORDER BY t2.date DESC LIMIT 1) as badgesEarned
FROM User t1
GROUP BY t1.userId) t3
Second Approach:
Get 2 separate tables for each date point, then do Inner Join to subtract relevant columns.
SELECT *, dense_rank() OVER (ORDER BY T0.score_delta DESC) AS ranking
FROM
(SELECT T1.userId,
(T1.score - T2.score) AS score_delta,
(T1.badgesEarned - T2.badgesEarned) AS badgesEarned_delta
FROM
(select *
from (
select *, row_number() over (partition by userId order by date desc) as ranking
from User
where date<=date_add(CURDATE(),interval -7 day)
) t
where t.ranking = 1) as T2
INNER JOIN
(select *
from (
select *, row_number() over (partition by userId order by date desc) as ranking
from User
where date<=CURDATE()
) t
where t.ranking = 1) as T1
on T1.userId= T2.userId ) T0
Side-question: One of my colleagues was suggesting that I handle the column subtractions in the code itself - like, I would call the database twice, get the two tables (one for CURDATE() and another for CURDATE-7), and then loop through all the User objects and subtract the relevant fields to construct my final result list. I'm not sure if that would be the better approach, so should I be doing that instead of handling it all through the SQL way?
Here's the SQLfiddle of the db if you want to play around with dummy data: http://sqlfiddle.com/#!9/86c58f0/1
Also, the above two code segments run just fine on my MySQL 8.0 workbench with no errors.
I'm not quite getting your expected results. But could you not just work with window functions, in conjunction with the RANGE clause?
I'm just creating the central backbone table, and it will then be up to you to subtract whatever you need to subtract from each other, and finally to dense_rank() what you need to dense_rank(). Basically, I think you need to put a final select, containing DENSE_RANK() , to select from my with_a_week_before in-line table.
WITH
-- your input
usr(userid,dt,score,badgesearned) AS (
SELECT 1234,DATE '2020-08-06', 100, 10
UNION ALL SELECT 1234,DATE '2020-08-07', 120, 12
UNION ALL SELECT 1234,DATE '2020-08-08', 130, 13
UNION ALL SELECT 1234,DATE '2020-08-12', 140, 14
UNION ALL SELECT 1234,DATE '2020-08-14', 150, 15
UNION ALL SELECT 100,DATE '2020-08-05', 100, 10
UNION ALL SELECT 100,DATE '2020-08-10', 100, 10
UNION ALL SELECT 100,DATE '2020-08-14', 200, 10
UNION ALL SELECT 1,DATE '2020-08-05', 140, 14
UNION ALL SELECT 1,DATE '2020-08-08', 145, 14
UNION ALL SELECT 1,DATE '2020-08-12', 150, 15
)
,
with_a_week_before AS (
SELECT
*
, FIRST_VALUE(score) OVER(
PARTITION BY userid ORDER BY dt
RANGE BETWEEN INTERVAL '7 DAYS' PRECEDING AND CURRENT ROW
) AS score_a_week
, FIRST_VALUE(badgesearned) OVER(
PARTITION BY userid ORDER BY dt
RANGE BETWEEN INTERVAL '7 DAYS' PRECEDING AND CURRENT ROW
) AS badgesearned_a_week
, FIRST_VALUE(dt) OVER( -- check the date of the previous row
PARTITION BY userid ORDER BY dt
RANGE BETWEEN INTERVAL '7 DAYS' PRECEDING AND CURRENT ROW
) AS dt_a_week
FROM usr
)
SELECT * FROM with_a_week_before ORDER BY userid
-- out userid | dt | score | badgesearned | score_a_week | badgesearned_a_week | dt_a_week
-- out --------+------------+-------+--------------+--------------+---------------------+------------
-- out 1 | 2020-08-05 | 140 | 14 | 140 | 14 | 2020-08-05
-- out 1 | 2020-08-08 | 145 | 14 | 140 | 14 | 2020-08-05
-- out 1 | 2020-08-12 | 150 | 15 | 140 | 14 | 2020-08-05
-- out 100 | 2020-08-05 | 100 | 10 | 100 | 10 | 2020-08-05
-- out 100 | 2020-08-10 | 100 | 10 | 100 | 10 | 2020-08-05
-- out 100 | 2020-08-14 | 200 | 10 | 100 | 10 | 2020-08-10
-- out 1234 | 2020-08-06 | 100 | 10 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-07 | 120 | 12 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-08 | 130 | 13 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-12 | 140 | 14 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-14 | 150 | 15 | 120 | 12 | 2020-08-07
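For the remaining subtract-and-rank step, here is a hedged sketch of how it could look if done in application code, as the colleague in the question suggested. It follows the question's rule (take the latest snapshot on or before each date), not the FIRST_VALUE window above, so the two can pick different "week before" rows. The function names are hypothetical and the rows are copied from the sample input.

```python
from datetime import date, timedelta

# (userid, dt, score) snapshots copied from the sample input above
ROWS = [
    (1234, date(2020, 8, 6), 100), (1234, date(2020, 8, 7), 120),
    (1234, date(2020, 8, 8), 130), (1234, date(2020, 8, 12), 140),
    (1234, date(2020, 8, 14), 150),
    (100, date(2020, 8, 5), 100), (100, date(2020, 8, 10), 100),
    (100, date(2020, 8, 14), 200),
    (1, date(2020, 8, 5), 140), (1, date(2020, 8, 8), 145),
    (1, date(2020, 8, 12), 150),
]


def latest_on_or_before(rows, userid, day):
    """Score of the user's most recent snapshot dated <= day, or None."""
    candidates = [(dt, score) for uid, dt, score in rows if uid == userid and dt <= day]
    return max(candidates)[1] if candidates else None


def weekly_deltas(rows, today):
    """Map userid -> score change between today's and the week-old snapshot."""
    week_ago = today - timedelta(days=7)
    deltas = {}
    for userid in {uid for uid, _, _ in rows}:
        now_score = latest_on_or_before(rows, userid, today)
        old_score = latest_on_or_before(rows, userid, week_ago)
        # Users without both snapshots are skipped, like an INNER JOIN would do.
        if now_score is not None and old_score is not None:
            deltas[userid] = now_score - old_score
    return deltas


def dense_rank(deltas):
    """Rank users by delta, descending; ties share a rank and no rank is skipped."""
    ordered = sorted(set(deltas.values()), reverse=True)
    rank_of = {value: i + 1 for i, value in enumerate(ordered)}
    return {uid: rank_of[d] for uid, d in deltas.items()}


deltas = weekly_deltas(ROWS, date(2020, 8, 14))
print(deltas, dense_rank(deltas))
```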
I have the following table
Date | amount 1 |
-----------|-------------|
2020-01-01 | 100 |
2020-01-02 | 120 |
2020-01-03 | 150 |
What I'm trying to get is the previous day's amount written on the following day's row:
Date | amount 1 | amount 2 |
-----------|-------------|----------|
2020-01-01 | 100 | 0 |
2020-01-02 | 120 | 100 |
2020-01-03 | 150 | 120 |
I can get yesterday but don't know how to do it for all rows.
Thanks,
You can use the following approach.
select
test.date1,
test.amount1,
ifnull(yesterday_test.amount1, 0) as amount2
from test
left join test yesterday_test on
date_sub(yesterday_test.date1, interval -1 day ) = test.date1
order by test.date1 asc
;
In this query we join the table to itself on the date, shifted by one day.
Use lag():
select date, amount,
lag(amount, 1, 0) over (order by date) as amount_prev
from t;
In MySQL < 8.0, where window functions are not available, one option is a correlated subquery:
select
date,
amount,
(
select t1.amount
from mytable t1
where t1.date < t.date
order by t1.date desc limit 1
) prev_amount
from mytable t
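As a quick way to check the correlated-subquery variant against the sample rows from the question, here is a hedged sketch that runs it on SQLite through Python's sqlite3 module (an assumption; the answers target MySQL). The COALESCE wrapper, which turns the missing first value into 0, and the helper names are additions for illustration.

```python
import sqlite3


def build_db():
    """Create an in-memory table holding the three sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (date TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?)",
        [("2020-01-01", 100), ("2020-01-02", 120), ("2020-01-03", 150)],
    )
    return conn


def with_previous_amount(conn):
    """Return (date, amount, prev_amount); prev_amount is 0 for the first row."""
    return conn.execute(
        """
        SELECT t.date,
               t.amount,
               COALESCE((SELECT t1.amount
                         FROM mytable t1
                         WHERE t1.date < t.date
                         ORDER BY t1.date DESC
                         LIMIT 1), 0) AS prev_amount
        FROM mytable t
        ORDER BY t.date
        """
    ).fetchall()


for row in with_previous_amount(build_db()):
    print(row)
```

Unlike the one-day self join, this picks the nearest earlier row, so it still returns a value when a calendar day is missing from the table.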
I have two tables, transaction and plan. I get the subscribed users' data by joining these two tables, and at the same time I want to add the plan_validity column as an interval to the subscription date, to check whether the validity period has ended. But when I add the interval I get the error: unexpected date_interval near '))'. How do I add the interval to a user's subscription date?
transaction table
id   user_id   plan_id   subscription_date
1    1         1         2019-06-08
2    2         3         2019-07-05
plan table
id plan_validity
1 3 month
2 6 month
3 9 month
mysql query:
select t.*,t.subscription_date,DATE_ADD(t.subscription_date, INTERVAL p.plan_validity)
from transaction t inner join plan p on t.plan_id=p.id where t.user_id=1
Try with a CASE expression that checks the type of interval that you want to add:
select t.*,
case
when p.plan_validity like '%month'
then date_add(t.subscription_date, interval p.plan_validity month)
when p.plan_validity like '%year'
then date_add(t.subscription_date, interval p.plan_validity year)
end result
from transaction t inner join plan p
on t.plan_id = p.id
where t.user_id = 1
See the demo.
Results:
| id | user_id | plan_id | subscription_date | result |
| --- | ------- | ------- | ------------------- | -----------|
| 1 | 1 | 1 | 2019-06-08 | 2019-09-08 |
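The idea behind the CASE expression (read the unit from the validity text, then add that many months or years) can also be sketched outside SQL. This is a minimal illustration under the assumption that plan_validity always looks like "<number> month" or "<number> year"; the function name is hypothetical.

```python
import calendar
from datetime import date


def add_validity(start, validity):
    """Add a validity string such as '3 month' or '1 year' to a date."""
    count_text, unit = validity.split()
    count = int(count_text)
    unit = unit.lower().rstrip("s")  # accept 'month' and 'months'
    if unit == "year":
        months = count * 12
    elif unit == "month":
        months = count
    else:
        raise ValueError(f"unsupported unit: {unit!r}")
    # Work in "months since year 0" so year rollover is handled by divmod.
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day to the length of the target month (e.g. Jan 31 -> Feb 28).
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


print(add_validity(date(2019, 6, 8), "3 month"))
```

The expiry date can then be compared with the current date to decide whether the plan is still valid.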
I really don't know how to find an answer for my question, so I'm asking you.
Here is the table I have :
+----+------------+------------+-------------+
| id | start_date | end_date | id_person |
+----+------------+------------+-------------+
| 1 | 2017-10-01 | 2017-12-01 | 1 |
| 2 | 2017-07-01 | 2017-09-01 | 1 |
| 3 | 2016-01-01 | 2016-02-01 | 1 |
| 4 | 2016-05-01 | 2016-06-01 | 2 |
| 5 | 2016-01-01 | 2016-02-01 | 2 |
+----+------------+------------+-------------+
And here is the query I tried to use :
SELECT * FROM table
WHERE ((start_date < NOW() AND end_date > NOW())
OR start_date > NOW()
OR end_date < NOW())
GROUP BY `id_person`
The result I was expecting was this one :
+----+------------+------------+-------------+
| id | start_date | end_date | id_person |
+----+------------+------------+-------------+
| 2 | 2017-07-01 | 2017-09-01 | 1 | // matches first condition
| 4 | 2016-05-01 | 2016-06-01 | 2 | // matches 3rd condition and has the most recent start_date
+----+------------+------------+-------------+
If you didn't get what I did wrong yet, I'm going to tell you.
Here, I was trying to show a single row per person but I wanted this row to match the first condition it finds and not the others, I don't want the row to just be ordered by start_date. It is like a custom order where I want the first row for each person.
The problem is that this query doesn't work since the GROUP BY statement doesn't apply conditions first. (even if it did, I'm not sure the condition would only select one row)
I really don't know how I can achieve that and I don't even know if it is possible, I hope someone can lead me towards any solution.
Thanks for reading this, I'll answer as fast as I can to give you more information.
Here's one idea...
SELECT m.*
FROM my_table m
JOIN
( SELECT x.id_person
, MAX(x.start_date) start_date
FROM my_table x
JOIN
( SELECT id_person
, MIN(CASE WHEN NOW() BETWEEN start_date AND end_date THEN 'A' WHEN start_date > NOW() THEN 'B' WHEN end_date < NOW() THEN 'C' END) rule
FROM my_table
GROUP
BY id_person
) y
ON y.id_person = x.id_person
AND y.rule = CASE WHEN NOW() BETWEEN start_date AND end_date THEN 'A' WHEN start_date > NOW() THEN 'B' WHEN end_date < NOW() THEN 'C' END
GROUP
BY id_person
) n
ON n.id_person = m.id_person
AND n.start_date = m.start_date;
+----+------------+------------+-----------+
| id | start_date | end_date | id_person |
+----+------------+------------+-----------+
| 2 | 2017-07-01 | 2017-09-01 | 1 |
| 4 | 2016-05-01 | 2016-06-01 | 2 |
+----+------------+------------+-----------+
If you are happy to write the rules directly in SQL rather than as WHERE conditions, you can ask the database for what you want more directly.
This means taking a step back to see what the rules you want are. It looks like you want to prioritise the entries by closest date, showing first current, then future, then historical. It also looks like end_date >= start_date, which means you only need to look at end_date to find what you are looking for.
MySQL can answer the question by abusing its GROUP BY functionality (until recent versions).
SELECT t.* FROM
(
SELECT t.*
FROM table t
ORDER BY SIGN(t.end_date - NOW()),ABS(t.end_date-NOW())
) t
GROUP BY t.id_person
A standard sql method that will also play better with indexes would be to look for end dates before and after today separately.
SELECT t.*
FROM table t
JOIN (
SELECT DISTINCT t.id_person
,COALESCE(first_not_ended.end_date,last_ended.end_date) AS end_date
FROM table t
LEFT JOIN (
SELECT t.id_person,MIN(end_date) AS end_date
FROM table t
WHERE t.end_date > NOW()
GROUP by t.id_person
) first_not_ended
ON t.id_person=first_not_ended.id_person
LEFT JOIN (
SELECT t.id_person,MAX(end_date) AS end_date
FROM table t
WHERE t.end_date < NOW()
GROUP by t.id_person
) last_ended
ON t.id_person=last_ended.id_person
) closest
ON t.id_person=closest.id_person
AND t.end_date=closest.end_date
Hi, I would like to compute, for each row, the sum of chat_duration over the 8 hours before that row.
I have :
agent text
start_time datetime
end_time datetime
chat_duration bigint
and I need to store the calculation result in past8_hours_chat_duration.
So when I have:
+----+--------+------------+----------+---------------+---------------------------+
| id | agent | start_time | end_time | chat_duration | past8_hours_chat_duration |
+----+--------+------------+----------+---------------+---------------------------+
| 1 | agent1 | 00.00.00 | 00.01.00 | 60 | |
| 2 | agent2 | 00.00.00 | 00.01.00 | 60 | |
| 3 | agent1 | 00.02.00 | 00.04.00 | 120 | |
| 4 | agent1 | 08.02.00 | 08.03.00 | 60 | |
+----+--------+------------+----------+---------------+---------------------------+
I'll try to explain as much as possible.
For each row I need the sum of chat_duration over the past 8 hours for the same agent.
In other words: sum the other rows (not the current row itself) whose start_time is after (currentData.start_time minus 8 hours) and not after currentData.start_time.
For id 1, there is no other session for agent1 whose start_time is after 00.00.00 (the current start_time) minus 8 hours, so the total is 0.
For id 2, there is also no other session for agent2 whose start_time is after 00.00.00 (the current start_time) minus 8 hours, so the total is 0.
For id 3, the start_time of id 1 is > 00.02.00 (current) - 8 hours, so the total is 60.
For id 4, the start_time of id 1 is < 08.02.00 (current) - 8 hours and the start_time of id 3 is > 08.02.00 (current) - 8 hours, so the total is 120 (from id 3).
I'm using MySQL.
At first I tried:
UPDATE chats AS c
JOIN ( SELECT agent,
SUM(chat_duration) AS sum_duration
FROM abc
GROUP BY agent
) AS c2
ON c2.agent = c.agent
SET c.past8_hours_chat_duration = c2.sum_duration
WHERE c.id < 10;
But that gives the sum of all of the agent's durations. How should I find the sum of only the past 8 hours of chat data?
Thank you,
You can do this in a query using a correlated subquery:
select c.*,
(select sum(c2.chat_duration)
from chats c2
where c2.agent = c.agent and
c2.id <> c.id and
c2.start_time > c.start_time - interval 8 hour and
c2.start_time <= c.start_time
) as past8_hours_chat_duration
from chats c;
In MySQL, integrating this into an update is tricky, because you can only refer to the table being updated in the join clause. So:
update chats c join
(select c.id,
(select sum(c2.chat_duration)
from chats c2
where c2.agent = c.agent and
c2.id <> c.id and
c2.start_time > c.start_time - interval 8 hour and
c2.start_time <= c.start_time
) as past8_hours_chat_duration
from chats c
) cc
on c.id = cc.id
set c.past8_hours_chat_duration = coalesce(cc.past8_hours_chat_duration, 0);
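To experiment with the 8-hour window, here is a hedged sketch that runs an equivalent update on SQLite through Python's sqlite3 module (an assumption; the answer targets MySQL, where the join form is needed). It excludes the current row by id, as the question asks. The sample timestamps are illustrative; row 4 starts at 08:01:00 so that row 3 falls clearly inside its window.

```python
import sqlite3


def build_chats_db():
    """Create an in-memory chats table with four illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE chats (
            id INTEGER PRIMARY KEY,
            agent TEXT,
            start_time TEXT,          -- 'YYYY-MM-DD HH:MM:SS'
            chat_duration INTEGER,
            past8_hours_chat_duration INTEGER
        )
        """
    )
    conn.executemany(
        "INSERT INTO chats (id, agent, start_time, chat_duration) VALUES (?, ?, ?, ?)",
        [
            (1, "agent1", "2024-01-01 00:00:00", 60),
            (2, "agent2", "2024-01-01 00:00:00", 60),
            (3, "agent1", "2024-01-01 00:02:00", 120),
            (4, "agent1", "2024-01-01 08:01:00", 60),
        ],
    )
    return conn


def fill_past8_hours(conn):
    """Store, per row, the summed duration of the agent's other chats that
    started within the 8 hours up to and including this row's start_time."""
    conn.execute(
        """
        UPDATE chats
        SET past8_hours_chat_duration = COALESCE((
            SELECT SUM(c2.chat_duration)
            FROM chats c2
            WHERE c2.agent = chats.agent
              AND c2.id <> chats.id
              AND c2.start_time > datetime(chats.start_time, '-8 hours')
              AND c2.start_time <= chats.start_time
        ), 0)
        """
    )


def past8_totals(conn):
    """Return the stored totals ordered by id."""
    return [
        r[0]
        for r in conn.execute("SELECT past8_hours_chat_duration FROM chats ORDER BY id")
    ]


conn = build_chats_db()
fill_past8_hours(conn)
print(past8_totals(conn))
```

Because the lower bound is strict, a chat that started exactly 8 hours earlier is not counted; switch to >= if it should be.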