I have the following table
Date | amount 1 |
-----------|-------------|
2020-01-01 | 100 |
2020-01-02 | 120 |
2020-01-03 | 150 |
What I am trying to get is the previous day's amount written on each following day's row:
Date | amount 1 | amount 2 |
-----------|-------------|----------|
2020-01-01 | 100 | 0 |
2020-01-02 | 120 | 100 |
2020-01-03 | 150 | 120 |
I can get yesterday's value, but I don't know how to do it for all rows.
Thanks,
You can use the following approach.
select
test.date1,
test.amount1,
ifnull(yesterday_test.amount1, 0) as amount2
from test
left join test yesterday_test on
date_add(yesterday_test.date1, interval 1 day) = test.date1
order by test.date1 asc
;
This query joins the table to itself on the date, shifted by one day.
Use lag():
select date, amount,
lag(amount, 1, 0) over (order by date) as amount_prev
from t;
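A minimal way to try the lag() query on the sample data is sketched below. It uses Python's built-in sqlite3 module as a stand-in for MySQL 8.0 (an assumption; SQLite 3.25 or later is needed for window functions), and the table name t with columns date and amount follows the query above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL 8.0 (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (date TEXT PRIMARY KEY, amount INTEGER);
    INSERT INTO t VALUES
        ('2020-01-01', 100),
        ('2020-01-02', 120),
        ('2020-01-03', 150);
""")

# LAG(amount, 1, 0): value from one row back in date order, 0 when there is none.
rows = conn.execute("""
    SELECT date, amount,
           LAG(amount, 1, 0) OVER (ORDER BY date) AS amount_prev
    FROM t
    ORDER BY date
""").fetchall()

for row in rows:
    print(row)
```

Note that lag() looks at the previous row, not the previous calendar day, so the two only coincide when there is one row per day with no gaps.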
In MySQL < 8.0, where window functions are not available, one option is a correlated subquery:
select
date,
amount,
(
select t1.amount
from mytable t1
where t1.date < t.date
order by t1.date desc limit 1
) prev_amount
from mytable t
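The correlated subquery can be exercised the same way. The sketch below again runs on SQLite through Python's sqlite3 module rather than on MySQL (an assumption), uses hypothetical sample rows with a gap in the dates, and wraps the subquery in COALESCE so that the first row shows 0 instead of NULL, which is what the question asked for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (date TEXT PRIMARY KEY, amount INTEGER);
    INSERT INTO mytable VALUES
        ('2020-01-01', 100),
        ('2020-01-02', 120),
        ('2020-01-05', 150);  -- hypothetical gap: no rows for 01-03 and 01-04
""")

# For each row, fetch the amount of the latest earlier row; default to 0.
rows = conn.execute("""
    SELECT t.date,
           t.amount,
           COALESCE((SELECT t1.amount
                     FROM mytable t1
                     WHERE t1.date < t.date
                     ORDER BY t1.date DESC
                     LIMIT 1), 0) AS prev_amount
    FROM mytable t
    ORDER BY t.date
""").fetchall()

for row in rows:
    print(row)
```

Because the subquery picks the most recent earlier row, the row for 2020-01-05 gets the amount from 2020-01-02 rather than NULL.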
Related
I have a MySQL database that keeps track of all the users' daily total scores (and some other similar score/count metrics such as "badgesEarned"; I am only including 2 of the 5 fields I need to track here). It only has data for the days on which a user was active (earning score points or badges), so the db won't have data for every date.
Here's a toy example:
Example Database Table: "User"
Now my goal is to get the change in score over the last 7 days for each user (I also need to do the last 30 and 365 days, but let's stick to 7 for this example). Since the db table stores a snapshot of total scores for all active days for each user, I wrote a SQL query that finds the two appropriate rows/snapshots and gets the difference in score/badges between them. These 2 rows would be the current date row (or, if that doesn't exist, the row just prior to it) and the (current_date - 7)th row (or, if that doesn't exist, the row just prior to it).
To make matters worse, I also have to keep track of the "ranks" of each player via the dense_rank() SQL method and add that in as a column in the final result table.
There are 2 ways so far that I can achieve this using 2 different SQL queries.
My main question is - is one of these "better" in terms of performance/good practice/efficiency than the other? Or are they both horrendous and I have completely gone down the wrong route to begin with and totally missed a more efficient approach? I am not great with SQL stuff, so apologies in advance if the question and code examples are horrifying:
First Approach:
Use multiple nested subqueries only (no join).
SELECT *, dense_rank() OVER (ORDER BY t3.score DESC) AS ranking
FROM
(
SELECT t1.userId,
(SELECT t2.score
FROM User t2
WHERE t2.date <= CURDATE() AND t2.userId=t1.userId
ORDER BY t2.date DESC LIMIT 1)
-
(SELECT t2.score
FROM User t2
WHERE t2.date <= DATE_ADD(CURDATE(), INTERVAL - 7 DAY) AND t2.userId=t1.userId
ORDER BY t2.date DESC LIMIT 1) as score,
(SELECT t2.badgesEarned
FROM User t2
WHERE t2.date <= CURDATE() AND t2.userId=t1.userId
ORDER BY t2.date DESC LIMIT 1)
-
(SELECT t2.badgesEarned
FROM User t2
WHERE t2.date <= DATE_ADD(CURDATE(), INTERVAL - 7 DAY) AND t2.userId=t1.userId
ORDER BY t2.date DESC LIMIT 1) as badgesEarned
FROM User t1
GROUP BY t1.userId) t3
Second Approach:
Get 2 separate tables for each date point, then do Inner Join to subtract relevant columns.
SELECT *, dense_rank() OVER (ORDER BY T0.score_delta DESC) AS ranking
FROM
(SELECT T1.userId,
(T1.score - T2.score) AS score_delta,
(T1.badgesEarned - T2.badgesEarned) AS badges_delta
FROM
(select *
from (
select *, row_number() over (partition by userId order by date desc) as ranking
from User
where date<=date_add(CURDATE(),interval -7 day)
) t
where t.ranking = 1) as T2
INNER JOIN
(select *
from (
select *, row_number() over (partition by userId order by date desc) as ranking
from User
where date<=CURDATE()
) t
where t.ranking = 1) as T1
on T1.userId= T2.userId ) T0
Side-question: One of my colleagues was suggesting that I handle the column subtractions in the code itself - like, I would call the database twice, get the two tables (one for CURDATE() and another for CURDATE-7), and then loop through all the User objects and subtract the relevant fields to construct my final result list. I'm not sure if that would be the better approach, so should I be doing that instead of handling it all through the SQL way?
Here's the SQLfiddle of the db if you want to play around with dummy data: http://sqlfiddle.com/#!9/86c58f0/1
Also, the above two code segments run just fine on my MySQL 8.0 workbench with no errors.
I'm not quite getting your expected results. But could you not just work with window functions, in conjunction with the RANGE clause?
I'm just creating the central backbone table, and it will then be up to you to subtract whatever you need to subtract from each other, and finally to dense_rank() what you need to dense_rank(). Basically, I think you need to put a final select, containing DENSE_RANK() , to select from my with_a_week_before in-line table.
WITH
-- your input
usr(userid,dt,score,badgesearned) AS (
SELECT 1234,DATE '2020-08-06', 100, 10
UNION ALL SELECT 1234,DATE '2020-08-07', 120, 12
UNION ALL SELECT 1234,DATE '2020-08-08', 130, 13
UNION ALL SELECT 1234,DATE '2020-08-12', 140, 14
UNION ALL SELECT 1234,DATE '2020-08-14', 150, 15
UNION ALL SELECT 100,DATE '2020-08-05', 100, 10
UNION ALL SELECT 100,DATE '2020-08-10', 100, 10
UNION ALL SELECT 100,DATE '2020-08-14', 200, 10
UNION ALL SELECT 1,DATE '2020-08-05', 140, 14
UNION ALL SELECT 1,DATE '2020-08-08', 145, 14
UNION ALL SELECT 1,DATE '2020-08-12', 150, 15
)
,
with_a_week_before AS (
SELECT
*
, FIRST_VALUE(score) OVER(
PARTITION BY userid ORDER BY dt
RANGE BETWEEN INTERVAL '7 DAYS' PRECEDING AND CURRENT ROW
) AS score_a_week
, FIRST_VALUE(badgesearned) OVER(
PARTITION BY userid ORDER BY dt
RANGE BETWEEN INTERVAL '7 DAYS' PRECEDING AND CURRENT ROW
) AS badgesearned_a_week
, FIRST_VALUE(dt) OVER( -- check the date of the previous row
PARTITION BY userid ORDER BY dt
RANGE BETWEEN INTERVAL '7 DAYS' PRECEDING AND CURRENT ROW
) AS dt_a_week
FROM usr
)
SELECT * FROM with_a_week_before ORDER BY userid
-- out userid | dt | score | badgesearned | score_a_week | badgesearned_a_week | dt_a_week
-- out --------+------------+-------+--------------+--------------+---------------------+------------
-- out 1 | 2020-08-05 | 140 | 14 | 140 | 14 | 2020-08-05
-- out 1 | 2020-08-08 | 145 | 14 | 140 | 14 | 2020-08-05
-- out 1 | 2020-08-12 | 150 | 15 | 140 | 14 | 2020-08-05
-- out 100 | 2020-08-05 | 100 | 10 | 100 | 10 | 2020-08-05
-- out 100 | 2020-08-10 | 100 | 10 | 100 | 10 | 2020-08-05
-- out 100 | 2020-08-14 | 200 | 10 | 100 | 10 | 2020-08-10
-- out 1234 | 2020-08-06 | 100 | 10 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-07 | 120 | 12 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-08 | 130 | 13 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-12 | 140 | 14 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-14 | 150 | 15 | 120 | 12 | 2020-08-07
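The final select mentioned above could look roughly like the sketch below. It is only an illustration under several assumptions: it runs on SQLite via Python's sqlite3 module (3.28 or later for RANGE frames with an offset), it orders by julianday(dt) because SQLite has no INTERVAL syntax, it takes each user's latest row as the "current" snapshot, and the names score_delta, badges_delta and ranking are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usr (userid INTEGER, dt TEXT, score INTEGER, badgesearned INTEGER);
    INSERT INTO usr VALUES
        (1234, '2020-08-06', 100, 10), (1234, '2020-08-07', 120, 12),
        (1234, '2020-08-08', 130, 13), (1234, '2020-08-12', 140, 14),
        (1234, '2020-08-14', 150, 15),
        (100,  '2020-08-05', 100, 10), (100,  '2020-08-10', 100, 10),
        (100,  '2020-08-14', 200, 10),
        (1,    '2020-08-05', 140, 14), (1,    '2020-08-08', 145, 14),
        (1,    '2020-08-12', 150, 15);
""")

ranked = conn.execute("""
    WITH with_a_week_before AS (
        SELECT userid, dt, score, badgesearned,
               -- oldest snapshot within the 7 days up to this row
               FIRST_VALUE(score) OVER (
                   PARTITION BY userid ORDER BY julianday(dt)
                   RANGE BETWEEN 7 PRECEDING AND CURRENT ROW) AS score_a_week,
               FIRST_VALUE(badgesearned) OVER (
                   PARTITION BY userid ORDER BY julianday(dt)
                   RANGE BETWEEN 7 PRECEDING AND CURRENT ROW) AS badgesearned_a_week,
               -- rn = 1 marks each user's latest snapshot
               ROW_NUMBER() OVER (PARTITION BY userid ORDER BY dt DESC) AS rn
        FROM usr
    )
    SELECT userid,
           score - score_a_week AS score_delta,
           badgesearned - badgesearned_a_week AS badges_delta,
           DENSE_RANK() OVER (ORDER BY score - score_a_week DESC) AS ranking
    FROM with_a_week_before
    WHERE rn = 1
    ORDER BY ranking, userid
""").fetchall()

for row in ranked:
    print(row)
```

This follows the window semantics of the answer (the oldest row inside the 7-day frame), which is not quite the same as the question's rule of falling back to the row just before the 7-day mark, so the deltas can differ between the two.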
I have a table that logs weather data variables by datetime like this:
|------------------|------------| ----
| LogDateTime | Temp | ...
+------------------|------------| ----
| 2020-01-01 00:00 | 20.1 | ...
| 2020-01-01 00:05 | 20.1 | ...
| 2020-01-01 00:10 | 19.9 | ...
| 2020-01-01 00:15 | 19.8 | ...
---------------------------------------
From that table I want to return the earliest time of the maximum temperature for each day like this (just the time portion of the datetime value):
|------------|---------|---------|
| LogDate    | LogTime | MaxTemp |
+------------|---------|---------|
| 2020-01-01 | 14:00   | 24.5    |
| 2020-01-02 | 15:12   | 23.2    |
| 2020-01-03 | 10:12   | 25.1    |
| 2020-01-04 | 12:14   | 28.8    |
----------------------------------
The query I have to return this so far is the below, but it returns the earliest temperature for each day instead of the earliest occurrence of the maximum temperature for each day
SELECT TIME(a.LogDateTime), a.Temp
FROM Monthly a
INNER JOIN (
SELECT TIME(LogDateTime), LogDateTime, MAX(Temp) Temp
FROM Monthly
GROUP BY LogDateTime
) b ON a.LogDateTime = b.LogDateTime AND a.Temp= b.Temp
GROUP BY DATE(a.LogDateTime)
I then want to use that query to update a table of one row per day that summarises the minimum and maximum values with a query something like this but update the time rather than the actual maximum temperature:
UPDATE Dayfile AS d
JOIN (
SELECT DATE(LogDateTime) AS date, MAX(Temp) AS Temps
FROM Monthly
GROUP BY date
) AS m ON DATE(d.LogDate) = m.date
SET d.MaxTemp = m.Temps
Your version of MariaDB supports window functions, so use ROW_NUMBER():
select LogDateTime, Temp
from (
select *,
row_number() over (partition by date(LogDateTime) order by Temp desc, LogDateTime) rn
from Monthly
) t
where t.rn = 1
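To see how the tie-break in the ORDER BY picks the earliest occurrence of the daily maximum, here is a small sketch. It runs on SQLite through Python's sqlite3 module instead of MariaDB (an assumption), uses hypothetical readings in which 24.5 occurs twice on the same day, and formats the time with strftime, which is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Monthly (LogDateTime TEXT PRIMARY KEY, Temp REAL);
    INSERT INTO Monthly VALUES
        ('2020-01-01 00:00', 20.1),
        ('2020-01-01 14:00', 24.5),  -- first occurrence of the day's maximum
        ('2020-01-01 15:00', 24.5),  -- same maximum again, later in the day
        ('2020-01-02 09:00', 21.0),
        ('2020-01-02 15:12', 23.2);
""")

# Highest Temp first, earliest LogDateTime first among equal temperatures.
rows = conn.execute("""
    SELECT date(LogDateTime) AS LogDate,
           strftime('%H:%M', LogDateTime) AS LogTime,
           Temp AS MaxTemp
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY date(LogDateTime)
                                  ORDER BY Temp DESC, LogDateTime) AS rn
        FROM Monthly
    ) t
    WHERE t.rn = 1
    ORDER BY LogDate
""").fetchall()

for row in rows:
    print(row)
```

In MariaDB the time portion would come from TIME(LogDateTime), as in the question's own query.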
Use it to update Dayfile like this:
update Dayfile d
inner join (
select LogDateTime, Temp
from (
select *,
row_number() over (partition by date(LogDateTime) order by Temp desc, LogDateTime) rn
from Monthly
) t
where t.rn = 1
) m on date(d.LogDate) = date(m.LogDateTime)
set d.MaxTemp = m.Temp
I have a table like this:
+----+---------+------------+
| id | price | date |
+----+---------+------------+
| 1 | 340 | 2018-09-02 |
| 2 | 325 | 2018-09-05 |
| 3 | 358 | 2018-09-08 |
+----+---------+------------+
And I need to make a view which has a row for every day. Something like this:
+----+---------+------------+
| id | price | date |
+----+---------+------------+
| 1 | 340 | 2018-09-02 |
| 1 | 340 | 2018-09-03 |
| 1 | 340 | 2018-09-04 |
| 2 | 325 | 2018-09-05 |
| 2 | 325 | 2018-09-06 |
| 2 | 325 | 2018-09-07 |
| 3 | 358 | 2018-09-08 |
+----+---------+------------+
I can do that in PHP with a foreach loop and a temp variable that holds the previous price until there is a new date.
But I need to make a view, so I have to do it in pure SQL. Any idea how I can do that?
You could use a recursive CTE to generate the records in the "gaps". To keep the recursion from "filling" an infinite gap after the last date, first get the maximum date in the source data and make sure the recursion never goes past that date.
I have called your table tbl:
with recursive cte as (
select id,
price,
date,
(select max(date) date from tbl) mx
from tbl
union all
select cte.id,
cte.price,
date_add(cte.date, interval 1 day),
cte.mx
from cte
left join tbl
on tbl.date = date_add(cte.date, interval 1 day)
where tbl.id is null
and cte.date <> cte.mx
)
select id,
price,
date
from cte
order by 3;
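The recursion can be traced on the sample data with the sketch below. It runs on SQLite through Python's sqlite3 module rather than MySQL 8 (an assumption), so date_add(..., interval 1 day) is replaced by SQLite's date(..., '+1 day'); the table name tbl follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (id INTEGER PRIMARY KEY, price INTEGER, date TEXT);
    INSERT INTO tbl VALUES
        (1, 340, '2018-09-02'),
        (2, 325, '2018-09-05'),
        (3, 358, '2018-09-08');
""")

rows = conn.execute("""
    WITH RECURSIVE cte(id, price, date, mx) AS (
        -- anchor: every real row, plus the overall maximum date
        SELECT id, price, date, (SELECT MAX(date) FROM tbl)
        FROM tbl
        UNION ALL
        -- step: carry id and price forward one day while that day has no real row
        SELECT cte.id, cte.price, date(cte.date, '+1 day'), cte.mx
        FROM cte
        LEFT JOIN tbl ON tbl.date = date(cte.date, '+1 day')
        WHERE tbl.id IS NULL
          AND cte.date <> cte.mx
    )
    SELECT id, price, date
    FROM cte
    ORDER BY date
""").fetchall()

for row in rows:
    print(row)
```

Each chain stops either when the next day already exists in tbl or when the row sits on the maximum date, so the result has exactly one row per day between the first and last dates.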
Here is an approach which should work without analytic functions. This answer uses a calendar table join approach. The first CTE below is the base table on which the rest of the query is based. We use a correlated subquery to find the most recent date earlier than the current date in the CTE which has a non NULL price. This is the basis for finding out what the id and price values should be for those dates coming in from the calendar table which do not appear in the original data set.
WITH cte AS (
SELECT cal.date, t.price, t.id
FROM
(
SELECT '2018-09-02' AS date UNION ALL
SELECT '2018-09-03' UNION ALL
SELECT '2018-09-04' UNION ALL
SELECT '2018-09-05' UNION ALL
SELECT '2018-09-06' UNION ALL
SELECT '2018-09-07' UNION ALL
SELECT '2018-09-08'
) cal
LEFT JOIN yourTable t
ON cal.date = t.date
),
cte2 AS (
SELECT
t1.date,
t1.price,
t1.id,
(SELECT MAX(t2.date) FROM cte t2
WHERE t2.date <= t1.date AND t2.price IS NOT NULL) AS nearest_date
FROM cte t1
)
SELECT
(SELECT t2.id FROM yourTable t2 WHERE t2.date = t1.nearest_date) id,
(SELECT t2.price FROM yourTable t2 WHERE t2.date = t1.nearest_date) price,
t1.date
FROM cte2 t1
ORDER BY
t1.date;
Note: To make this work on MySQL versions earlier than 8.0, you would need to inline the CTEs above. That results in verbose code, but it should still work.
Since you are using MariaDB, it is rather trivial:
MariaDB [test]> SELECT '2019-01-01' + INTERVAL seq-1 DAY FROM seq_1_to_31;
+-----------------------------------+
| '2019-01-01' + INTERVAL seq-1 DAY |
+-----------------------------------+
| 2019-01-01 |
| 2019-01-02 |
| 2019-01-03 |
| 2019-01-04 |
| 2019-01-05 |
| 2019-01-06 |
(etc)
There are variations on this wherein you generate a large range of dates, but then use a WHERE to chop to what you need. And use LEFT JOIN with the sequence 'derived table' on the 'left'.
Use something like the above as a derived table in your query.
I'm trying to delete all records older than one week while keeping at least one for each user.
Example:
| ID | user | date | other columns...
| 1 | 1234 | -2 days | ...
| 2 | 1234 | -3 days | ...
| 3 | 1234 | -8 days | ...
| 4 | 5678 | -9 days | ...
| 5 | 5678 | -10 days | ...
Should become
| ID | user | date | other columns...
| 1 | 1234 | -2 days | ...
| 2 | 1234 | -3 days | ...
| 4 | 5678 | -9 days | ... // Keeping the most recent record for this user
So far I've made this, but it uses CASE to set OFFSET, so it doesn't work:
DELETE FROM transactions WHERE ID < (
SELECT ID FROM (
SELECT ID FROM transactions t WHERE
DATE(date) <= DATE_SUB(CURDATE(), INTERVAL 7 DAY) AND
user = transactions.user
ORDER BY ID DESC
LIMIT 1 OFFSET CASE WHEN EXISTS (
SELECT ID FROM transactions x WHERE
DATE(date) > DATE_SUB(CURDATE(), INTERVAL 7 DAY) AND
user = transactions.user
) THEN 0 ELSE 1 END
)
)
So the question is: how to fix the code above?
P.S.: I'm relatively new to anything except most basic operations in SQL
By grouping the transactions by user, you can determine those that you wish to preserve:
SELECT user, MAX(date) date
FROM transactions
GROUP BY user
You can then make an outer join between these results and your original table using the multiple-table DELETE syntax in order to delete only the desired records:
DELETE transactions
FROM transactions NATURAL LEFT JOIN (
SELECT user, MAX(date) date
FROM transactions
GROUP BY user
) t
WHERE date < CURRENT_DATE - INTERVAL 7 DAY
AND t.date IS NULL
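The keep-the-latest-row rule can be checked on the question's example with the sketch below. It is not the multiple-table DELETE above: it runs on SQLite through Python's sqlite3 module (an assumption), where that syntax does not exist, so the same rule is written as "old, and a newer row exists for the same user". The fixed TODAY value and the concrete dates are hypothetical stand-ins for the relative dates in the question.

```python
import sqlite3

TODAY = "2024-01-15"  # hypothetical fixed "today" so the result is reproducible

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (ID INTEGER PRIMARY KEY, user INTEGER, date TEXT);
    INSERT INTO transactions VALUES
        (1, 1234, '2024-01-13'),  -- -2 days
        (2, 1234, '2024-01-12'),  -- -3 days
        (3, 1234, '2024-01-07'),  -- -8 days
        (4, 5678, '2024-01-06'),  -- -9 days
        (5, 5678, '2024-01-05');  -- -10 days
""")

# Delete rows older than a week, unless they are the user's most recent row.
conn.execute("""
    DELETE FROM transactions
    WHERE date < date(?, '-7 days')
      AND EXISTS (SELECT 1
                  FROM transactions newer
                  WHERE newer.user = transactions.user
                    AND newer.date > transactions.date)
""", (TODAY,))

remaining = [r[0] for r in conn.execute("SELECT ID FROM transactions ORDER BY ID")]
print(remaining)
```

MySQL rejects a subquery on the table being deleted from (error 1093), which is one reason the answer uses a join against a derived table instead.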
try
DELETE FROM transactions tt WHERE tt.id NOT IN (
SELECT ID FROM transactions t WHERE
DATE(t.date) <= DATE_SUB(CURDATE(), INTERVAL 7 DAY) AND
t.user = tt.user
ORDER BY t.ID DESC limit 1
)
I am struggling to get a result from MySQL in the following way. I have 10 records in a MySQL table with date and unit fields. I need to get the units used on each date.
The table structure is as follows; each record holds a running total, i.e. today's units added to the previous total:
Date Units
---------- ---------
10/10/2012 101
11/10/2012 111
12/10/2012 121
13/10/2012 140
14/10/2012 150
15/10/2012 155
16/10/2012 170
17/10/2012 180
18/10/2012 185
19/10/2012 200
Desired output will be :
Date Units
---------- ---------
10/10/2012 101
11/10/2012 10
12/10/2012 10
13/10/2012 19
14/10/2012 10
15/10/2012 5
16/10/2012 15
17/10/2012 10
18/10/2012 5
19/10/2012 15
Any help will be appreciated. Thanks
There's a couple of ways to get the resultset. If you can live with an extra column in the resultset, and the order of the columns, then something like this is a workable approach.
using user variables
SELECT d.Date
, IF(@prev_units IS NULL
,@diff := 0
,@diff := d.units - @prev_units
) AS `Units_used`
, @prev_units := d.units AS `Units`
FROM ( SELECT @prev_units := NULL ) i
JOIN (
SELECT t.Date, t.Units
FROM mytable t
ORDER BY t.Date, t.Units
) d
This returns the specified resultset, but it includes the Units column as well. It's possible to have that column filtered out, but it's more expensive, because of the way MySQL processes an inline view (MySQL calls it a "derived table")
To remove that extra column, you can wrap that in another query...
SELECT f.Date
, f.Units_used
FROM (
query from above goes here
) f
ORDER BY f.Date
but again, removing that column comes with the extra cost of materializing that result set a second time.
using a semi-join
If you are guaranteed to have a single row for each Date value (stored either as a DATE, or as a DATETIME with the time component set to a constant such as midnight) and no gaps in the Date values, then another query that will return the specified result set is:
SELECT t.Date
, t.Units - s.Units AS Units_Used
FROM mytable t
LEFT
JOIN mytable s
ON s.Date = t.Date + INTERVAL -1 DAY
ORDER BY t.Date
If there's a missing Date value (a gap) such that there is no matching previous row, then Units_used will have a NULL value.
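The effect of a gap on the self-join is easy to reproduce. The sketch below runs on SQLite through Python's sqlite3 module rather than MySQL (an assumption), so t.Date + INTERVAL -1 DAY becomes date(t.date, '-1 day'), and it uses three hypothetical rows with 2012-10-12 missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (date TEXT PRIMARY KEY, units INTEGER);
    INSERT INTO mytable VALUES
        ('2012-10-10', 101),
        ('2012-10-11', 111),
        ('2012-10-13', 140);  -- hypothetical gap: 2012-10-12 is missing
""")

# Join each row to the row dated exactly one day earlier.
rows = conn.execute("""
    SELECT t.date,
           t.units - s.units AS units_used
    FROM mytable t
    LEFT JOIN mytable s
           ON s.date = date(t.date, '-1 day')
    ORDER BY t.date
""").fetchall()

for row in rows:
    print(row)
```

Both the first row and the row after the gap come back with NULL (None in Python), because no row exists for the day before them.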
using a correlated subquery
If you don't have a guarantee of no "missing dates", but you have a guarantee that there is no more than one row for a particular Date, then another approach (usually more expensive in terms of performance) is to use a correlated subquery:
SELECT t.Date
, ( t.Units - (SELECT s.Units
FROM mytable s
WHERE s.Date < t.Date
ORDER BY s.Date DESC
LIMIT 1)
) AS Units_used
FROM mytable t
ORDER BY t.Date, t.Units
spencer7593's solution will be faster, but you can also do something like this...
SELECT * FROM rolling;
+----+-------+
| id | units |
+----+-------+
| 1 | 101 |
| 2 | 111 |
| 3 | 121 |
| 4 | 140 |
| 5 | 150 |
| 6 | 155 |
| 7 | 170 |
| 8 | 180 |
| 9 | 185 |
| 10 | 200 |
+----+-------+
SELECT a.id,COALESCE(a.units - b.units,a.units) units
FROM
( SELECT x.*
, COUNT(*) rnk -- RANK is a reserved word in MySQL 8.0, so the alias is rnk
FROM rolling x
JOIN rolling y
ON y.id <= x.id
GROUP
BY x.id
) a
LEFT
JOIN
( SELECT x.*
, COUNT(*) rnk
FROM rolling x
JOIN rolling y
ON y.id <= x.id
GROUP
BY x.id
) b
ON b.rnk = a.rnk - 1;
+----+-------+
| id | units |
+----+-------+
| 1 | 101 |
| 2 | 10 |
| 3 | 10 |
| 4 | 19 |
| 5 | 10 |
| 6 | 5 |
| 7 | 15 |
| 8 | 10 |
| 9 | 5 |
| 10 | 15 |
+----+-------+
This should give the desired result. I don't know how your table is called so I named it "tbltest".
Naming a column date is generally a bad idea, as the word also refers to other things (functions, data types, ...), so I renamed it "fdate". Using uppercase characters in field names or table names is also a bad idea, as it makes your statements less portable between databases (some databases are case sensitive and some are not).
SELECT
A.fdate,
A.units - coalesce(B.units, 0) AS units
FROM
tbltest A left join tbltest B ON A.fdate = B.fdate + INTERVAL 1 DAY