I have a table which contains an amount per month and a previous month amount.
Each month I need to carry the last previous month amount if it does not exist.
To explain slightly better (and with examples) I might have the following data;
Month,Amount,Previous
2019-01-01,100,0
2019-02-01,100,100
2019-03-01,100,null
2019-04-01,100,null
2019-05-01,100,200
2019-06-01,100,null
So I want to carry the 100 to March and April and then the 200 to June so it looks like this;
Month,Amount,Previous
2019-01-01,100,0
2019-02-01,100,100
2019-03-01,100,100
2019-04-01,100,100
2019-05-01,100,200
2019-06-01,100,200
I'm just hitting blanks, I know there is a way but the mind simply isn't putting it together.
I think it's going to involve LEFT JOIN on the same table and getting a MIN month value with the amount where the date is greater than the last month value but is not greater than the next.
Or it's going to be doing a subquery in a WHERE clause and a LEFT JOIN.
So far I've managed the below but it duplicates the May and June rows for each previous value (100 and 200).
SELECT
*
FROM
table1 t1
LEFT JOIN
table1 t2 ON
t1.month > t2.month
Month,Amount,Previous
2019-01-01,100,0
2019-02-01,100,100
2019-03-01,100,100
2019-04-01,100,100
2019-05-01,100,200
2019-05-01,100,100
2019-06-01,100,200
2019-06-01,100,100
drop table if exists t;
create table t
(Month date,Amount int,Previous int);
insert into t values
('2019-01-01',100,0),
('2019-02-01',100,100),
('2019-03-01',100,null),
('2019-04-01',100,null),
('2019-05-01',100,200),
('2019-06-01',100,null);
select t.*,
case when previous is null then
(select previous from t t1 where t1.month < t.month and t1.previous is not null order by month desc limit 1)
else previous
end as previousdownfilled
from t;
+------------+--------+----------+--------------------+
| Month | Amount | Previous | previousdownfilled |
+------------+--------+----------+--------------------+
| 2019-01-01 | 100 | 0 | 0 |
| 2019-02-01 | 100 | 100 | 100 |
| 2019-03-01 | 100 | NULL | 100 |
| 2019-04-01 | 100 | NULL | 100 |
| 2019-05-01 | 100 | 200 | 200 |
| 2019-06-01 | 100 | NULL | 200 |
+------------+--------+----------+--------------------+
6 rows in set (0.00 sec)
where the case statement checks if things need to be done and the correlated cub query does it. But I suspect this work should be done in the code which creates this table.
You can use a Correlated Subquery here. In the subquery, we will utilize ORDER BY with LIMIT to get the immediate previous amount value.
SELECT
t1.month,
t1.amount,
(SELECT t2.amount FROM table1 t2
WHERE t2.month < t1.month
ORDER BY t2.month DESC LIMIT 1) AS previous
FROM
table1 t1
In MySL 8+, you can use window functions. Unfortunately, MySQL has not (yet) implemented the simplest approach -- lag() with the ignore nulls option.
But you can still do this pretty simply:
select month, amount,
max(previous) over (partition by grp) as previous
from (select t.*, count(previous) over (order by month) as grp
from t
) t;
Related
I have a MYSQL database that keeps track of all the users' daily total scores (and some other similar score/count type metrics like "badgesEarned", I am only including 2 fields here out of the 5 I need to track). It only has data for the days in which a user was active (earning score points or badges). So the db wont have data for every date there is.
Here's a toy example:
Example Database Table: "User"
Now my goal is to get the last 7 days change in score for each user (I also need to do last 30 days and 365 day but let's stick to just 7 for this example). Since the db table stores a snapshot of total scores for all active days for each user, I wrote a SQL query that finds the two appropriate rows/snapshots and gets the difference in score/badges between them. These 2 rows would be the current date row (or if that doesnt exist, use the row just prior to it) vs the (current_date - 7)th row (or if that doesnt exist, use the row just prior to it).
To make matters worse, I also have to keep track of the "ranks" of each player via the dense_rank() SQL method and add that in as a column in the final result table.
There are 2 ways so far that I can achieve this using 2 different SQL queries.
My main question is - is one of these "better" in terms of performance/good practice/efficiency than the other? Or are they both horrendous and I have completely gone down the wrong route to begin with and totally missed a more efficient approach? I am not great with SQL stuff, so apologies in advance if the question and code examples are horrifying:
First Approach:
Use multiple nested subqueries only (no join).
SELECT *, dense_rank() OVER (ORDER BY t3.score DESC) AS ranking
FROM
(
SELECT t1.userId,
(SELECT t2.score
FROM User t2
WHERE t2.date <= CURDATE() AND t2.userId=t1.userId
ORDER BY t2.date DESC LIMIT 1)
-
(SELECT t2.score
FROM User t2
WHERE t2.date <= DATE_ADD(CURDATE(), INTERVAL - 7 DAY) AND t2.userId=t1.userId
ORDER BY t2.date DESC LIMIT 1) as score,
(SELECT t2.badgesEarned
FROM User t2
WHERE t2.date <= CURDATE() AND t2.userId=t1.userId
ORDER BY t2.date DESC LIMIT 1)
-
(SELECT t2.badgesEarned
FROM User t2
WHERE t2.date <= DATE_ADD(CURDATE(), INTERVAL - 7 DAY) AND t2.userId=t1.userId
ORDER BY t2.date DESC LIMIT 1) as badgesEarned
FROM User t1
GROUP BY t1.userId) t3
Second Approach:
Get 2 separate tables for each date point, then do Inner Join to subtract relevant columns.
SELECT *, dense_rank() OVER (ORDER BY T0.score_delta DESC) AS ranking
FROM
(SELECT T1.userId,
(T1.score - T2.score),
(T1.badgesEarned - T2.badgesEarned)
FROM
(select *
from (
select *, row_number() over (partition by userId order by date desc) as ranking
from User
where date<=date_add(CURDATE(),interval -7 day)
) t
where t.ranking = 1) as T2
INNER JOIN
(select *
from (
select *, row_number() over (partition by userId order by date desc) as ranking
from User
where date<=CURDATE()
) t
where t.ranking = 1) as T1
on T1.userId= T2.userId ) T0
Side-question: One of my colleagues was suggesting that I handle the column subtractions in the code itself - like, I would call the database twice, get the two tables (one for CURDATE() and another for CURDATE-7), and then loop through all the User objects and subtract the relevant fields to construct my final result list. I'm not sure if that would be the better approach, so should I be doing that instead of handling it all through the SQL way?
Here's the SQLfiddle of the db if you want to play around with dummy data: http://sqlfiddle.com/#!9/86c58f0/1
Also, the above two code segments run just fine on my MySQL 8.0 workbench with no errors.
I'm not quite getting your expected results. But could you not just work with window functions, in conjunction with the RANGE clause?
I'm just creating the central backbone table, and it will then be up to you to subtract whatever you need to subtract from each other, and finally to dense_rank() what you need to dense_rank(). Basically, I think you need to put a final select, containing DENSE_RANK() , to select from my with_a_week_before in-line table.
WITH
-- your input
usr(userid,dt,score,badgesearned) AS (
SELECT 1234,DATE '2020-08-06', 100, 10
UNION ALL SELECT 1234,DATE '2020-08-07', 120, 12
UNION ALL SELECT 1234,DATE '2020-08-08', 130, 13
UNION ALL SELECT 1234,DATE '2020-08-12', 140, 14
UNION ALL SELECT 1234,DATE '2020-08-14', 150, 15
UNION ALL SELECT 100,DATE '2020-08-05', 100, 10
UNION ALL SELECT 100,DATE '2020-08-10', 100, 10
UNION ALL SELECT 100,DATE '2020-08-14', 200, 10
UNION ALL SELECT 1,DATE '2020-08-05', 140, 14
UNION ALL SELECT 1,DATE '2020-08-08', 145, 14
UNION ALL SELECT 1,DATE '2020-08-12', 150, 15
)
,
with_a_week_before AS (
SELECT
*
, FIRST_VALUE(score) OVER(
PARTITION BY userid ORDER BY dt
RANGE BETWEEN INTERVAL '7 DAYS' PRECEDING AND CURRENT ROW
) AS score_a_week
, FIRST_VALUE(badgesearned) OVER(
PARTITION BY userid ORDER BY dt
RANGE BETWEEN INTERVAL '7 DAYS' PRECEDING AND CURRENT ROW
) AS badgesearned_a_week
, FIRST_VALUE(dt) OVER( -- check the date of the previous row
PARTITION BY userid ORDER BY dt
RANGE BETWEEN INTERVAL '7 DAYS' PRECEDING AND CURRENT ROW
) AS dt_a_week
FROM usr
)
SELECT * FROM with_a_week_before ORDER BY userid
-- out userid | dt | score | badgesearned | score_a_week | badgesearned_a_week | dt_a_week
-- out --------+------------+-------+--------------+--------------+---------------------+------------
-- out 1 | 2020-08-05 | 140 | 14 | 140 | 14 | 2020-08-05
-- out 1 | 2020-08-08 | 145 | 14 | 140 | 14 | 2020-08-05
-- out 1 | 2020-08-12 | 150 | 15 | 140 | 14 | 2020-08-05
-- out 100 | 2020-08-05 | 100 | 10 | 100 | 10 | 2020-08-05
-- out 100 | 2020-08-10 | 100 | 10 | 100 | 10 | 2020-08-05
-- out 100 | 2020-08-14 | 200 | 10 | 100 | 10 | 2020-08-10
-- out 1234 | 2020-08-06 | 100 | 10 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-07 | 120 | 12 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-08 | 130 | 13 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-12 | 140 | 14 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-14 | 150 | 15 | 120 | 12 | 2020-08-07
I need help to resolve the next case.
The data which users want to see is accessible by pagination requests and later these requests are stored in the database in the next form:
+----+---------+-------+--------+
| id | user id | first | amount |
+----+---------+-------+--------+
| 1 | 1 | 0 | 5 |
| 2 | 1 | 10 | 10 |
| 3 | 1 | 10 | 5 |
| 4 | 1 | 15 | 10 |
| 5 | 2 | 0 | 10 |
| 6 | 2 | 0 | 5 |
| 7 | 2 | 10 | 5 |
+----+---------+-------+--------+
The table is ordered by user id asc, first asc, amount desc.
The task is to write the SQL statement which calculate what total unique amount of data the user has seen.
For the first user total amount must be 20, since the request with id=1 returned first 5 items, with id=2 returned another 10 items. Request with id=3 returns data already 'seen' by request with id=2. Request with id=4 intersects with id=2, but still returns 5 'unseen' pieces of data.
For the second user total amount must be 15.
As a result of SQL statement, I should get the next output:
+---------+-------+
| user id | total |
+---------+-------+
| 1 | 20 |
+---------+-------+
| 2 | 15 |
+---------+-------+
I am using MySQL 5.7, so window functions are not available for me. I stuck with this task for a day already and still cannot get the desired output. If it is not possible with this setup, I will end up calculating the results in the application code. I would appreciate any suggestions or help with resolving this task, thank you!
This is a type of gaps and islands problem. In this case, use a cumulative max to determine if one request intersects with a previous request. If not, that is the beginning of an "island" of adjacent requests. A cumulative sum of the beginnings assigns an "island", then an aggregation counts each island.
So, the islands look like this:
select userid, min(first), max(first + amount) as last
from (select t.*,
sum(case when prev_last >= first then 0 else 1 end) over
(partition by userid order by first) as grp
from (select t.*,
max(first + amount) over (partition by userid order by first range between unbounded preceding and 1 preceding) as prev_last
from t
) t
) t
group by userid, grp;
You then want this summed by userid, so that is one more level of aggregation:
with islands as (
select userid, min(first) as first, max(first + amount) as last
from (select t.*,
sum(case when prev_last >= first then 0 else 1 end) over
(partition by userid order by first) as grp
from (select t.*,
max(first + amount) over (partition by userid order by first range between unbounded preceding and 1 preceding) as prev_last
from t
) t
) t
group by userid, grp
)
select userid, sum(last - first) as total
from islands
group by userid;
Here is a db<>fiddle.
This logic is similar to Gordon's, but runs on older releases of MySQL, too.
select userid
-- overall length minus gaps
,max(maxlast)-min(minfirst) + sum(gaplen) as total
from
(
select userid
,prevlast
,min(first) as minfirst -- first of group
,max(last) as maxlast -- last of group
-- if there was a gap, calculate length of gap
,min(case when prevlast < first then prevlast - first else 0 end) as gaplen
from
(
select t.*
,first + amount as last -- last value in range
,( -- maximum end of all previous rows
select max(first + amount)
from t as t2
where t2.userid = t.userid
and t2.first < t.first
) as prevlast
from t
) as dt
group by userid, prevlast
) as dt
group by userid
order by userid
See fiddle
I have a table like this:
+----+---------+------------+
| id | price | date |
+----+---------+------------+
| 1 | 340 | 2018-09-02 |
| 2 | 325 | 2018-09-05 |
| 3 | 358 | 2018-09-08 |
+----+---------+------------+
And I need to make a view which has a row for every day. Something like this:
+----+---------+------------+
| id | price | date |
+----+---------+------------+
| 1 | 340 | 2018-09-02 |
| 1 | 340 | 2018-09-03 |
| 1 | 340 | 2018-09-04 |
| 2 | 325 | 2018-09-05 |
| 2 | 325 | 2018-09-06 |
| 2 | 325 | 2018-09-07 |
| 3 | 358 | 2018-09-08 |
+----+---------+------------+
I can do that using PHP with a loop (foreach) and making a temp variable which holds the previous price til there is a new date.
But I need to make a view ... So I should do that using pure-SQL .. Any idea how can I do that?
You could use a recursive CTE to generate the records in the "gaps". To avoid that an infinite gap after the last date is "filled", first get the maximum date in the source data and make sure not to bypass that date in the recursion.
I have called your table tbl:
with recursive cte as (
select id,
price,
date,
(select max(date) date from tbl) mx
from tbl
union all
select cte.id,
cte.price,
date_add(cte.date, interval 1 day),
cte.mx
from cte
left join tbl
on tbl.date = date_add(cte.date, interval 1 day)
where tbl.id is null
and cte.date <> cte.mx
)
select id,
price,
date
from cte
order by 3;
demo with mysql 8
Here is an approach which should work without analytic functions. This answer uses a calendar table join approach. The first CTE below is the base table on which the rest of the query is based. We use a correlated subquery to find the most recent date earlier than the current date in the CTE which has a non NULL price. This is the basis for finding out what the id and price values should be for those dates coming in from the calendar table which do not appear in the original data set.
WITH cte AS (
SELECT cal.date, t.price, t.id
FROM
(
SELECT '2018-09-02' AS date UNION ALL
SELECT '2018-09-03' UNION ALL
SELECT '2018-09-04' UNION ALL
SELECT '2018-09-05' UNION ALL
SELECT '2018-09-06' UNION ALL
SELECT '2018-09-07' UNION ALL
SELECT '2018-09-08'
) cal
LEFT JOIN yourTable t
ON cal.date = t.date
),
cte2 AS (
SELECT
t1.date,
t1.price,
t1.id,
(SELECT MAX(t2.date) FROM cte t2
WHERE t2.date <= t1.date AND t2.price IS NOT NULL) AS nearest_date
FROM cte t1
)
SELECT
(SELECT t2.id FROM yourTable t2 WHERE t2.date = t1.nearest_date) id,
(SELECT t2.price FROM yourTable t2 WHERE t2.date = t1.nearest_date) price,
t1.date
FROM cte2 t1
ORDER BY
t1.date;
Demo
Note: To make this work on MySQL versions earlier than 8+, you would need to inline the CTEs above. It would result in verbose code, but, it should still work.
Since you are using MariaDB, it is rather trivial:
MariaDB [test]> SELECT '2019-01-01' + INTERVAL seq-1 DAY FROM seq_1_to_31;
+-----------------------------------+
| '2019-01-01' + INTERVAL seq-1 DAY |
+-----------------------------------+
| 2019-01-01 |
| 2019-01-02 |
| 2019-01-03 |
| 2019-01-04 |
| 2019-01-05 |
| 2019-01-06 |
(etc)
There are variations on this wherein you generate a large range of dates, but then use a WHERE to chop to what you need. And use LEFT JOIN with the sequence 'derived table' on the 'left'.
Use something like the above as a derived table in your query.
Say I have a dataset of :
|dateid | value |
|20150101 | 1 |
|20150102 | 2 |
|20150103 | 3.1 |
|20150104 | 4.3 |
|20150105 | 3.1 |
|20150106 | 1 |
|20150107 | 1 |
|20150108 | 1 |
|.... | |
|.... | ... |
|20151001 | 10.3|
I want to query the average of every past 7 days based on a date range.
say for dateid from 20150707 and 20150730, when I select row of 20150707, I also need the average value between 20150701 and 20150707( (1+2+3.1+4.3+1+1+1+1)/7) as well as the value for 20150707(1) like:
select dateid, value , avg(value) as avg_past_7 from mytable where dateid between 20150707 and 20150730GROUP BY every past_7days.
And when the records are less than 7 rows to count, the avg remains null.
That means if I only have records from 20150707-20150730 in the table, the past_7_day avg for 20150707/8/9/10/11/12 remains null.
Correlated sub-select:
select dateid, value, (select avg(value) from mytable t2
where t2.dateid between (DATE_SUB(date(t1.dateid),INTERVAL 6 day)+0)
and t1.dateid) as avg_past_7
from mytable t1
where dateid between 20150101 and 20150201 order by dateid;
Use Date_SUB With Interval of 7 Days
I solve the problem by :
select t1.dateid, t1.value, if(count(1)>=7,avg(t2.value),null)
from mytable t1 , mytable t2
where t2.dateid between DATE_SUB(date(t1.dateid),INTERVAL 6 day)+0 and t1.dateid and
t1.dateid between 20150105 and 20150201
group by t1.dateid ,t1.value
order by dateid;
I am struggling in to get result from mysql in the following way. I have 10 records in mysql db table having date and unit fields. I need to get used units on every date.
Table structure as follows, adding today unit with past previous unit in every record:
Date Units
---------- ---------
10/10/2012 101
11/10/2012 111
12/10/2012 121
13/10/2012 140
14/10/2012 150
15/10/2012 155
16/10/2012 170
17/10/2012 180
18/10/2012 185
19/10/2012 200
Desired output will be :
Date Units
---------- ---------
10/10/2012 101
11/10/2012 10
12/10/2012 10
13/10/2012 19
14/10/2012 10
15/10/2012 5
16/10/2012 15
17/10/2012 10
18/10/2012 5
19/10/2012 15
Any help will be appreciated. Thanks
There's a couple of ways to get the resultset. If you can live with an extra column in the resultset, and the order of the columns, then something like this is a workable approach.
using user variables
SELECT d.Date
, IF(#prev_units IS NULL
,#diff := 0
,#diff := d.units - #prev_units
) AS `Units_used`
, #prev_units := d.units AS `Units`
FROM ( SELECT #prev_units := NULL ) i
JOIN (
SELECT t.Date, t.Units
FROM mytable t
ORDER BY t.Date, t.Units
) d
This returns the specified resultset, but it includes the Units column as well. It's possible to have that column filtered out, but it's more expensive, because of the way MySQL processes an inline view (MySQL calls it a "derived table")
To remove that extra column, you can wrap that in another query...
SELECT f.Date
, f.Units_used
FROM (
query from above goes here
) f
ORDER BY f.Date
but again, removing that column comes with the extra cost of materializing that result set a second time.
using a semi-join
If you are guaranteed to have a single row for each Date value, either stored as a DATE, or as a DATETIME with the timecomponent set to a constant, such as midnight, and no gaps in the Date value, and Date is defined as DATE or DATETIME datatype, then another query that will return the specifid result set:
SELECT t.Date
, t.Units - s.Units AS Units_Used
FROM mytable t
LEFT
JOIN mytable s
ON s.Date = t.Date + INTERVAL -1 DAY
ORDER BY t.Date
If there's a missing Date value (a gap) such that there is no matching previous row, then Units_used will have a NULL value.
using a correlated subquery
If you don't have a guarantee of no "missing dates", but you have a guarantee that there is no more than one row for a particular Date, then another approach (usually more expensive in terms of performance) is to use a correlated subquery:
SELECT t.Date
, ( t.Units - (SELECT s.Units
FROM mytable s
WHERE s.Date < t.Date
ORDER BY s.Date DESC
LIMIT 1)
) AS Units_used
FROM mytable t
ORDER BY t.Date, t.Units
spencer7593's solution will be faster, but you can also do something like this...
SELECT * FROM rolling;
+----+-------+
| id | units |
+----+-------+
| 1 | 101 |
| 2 | 111 |
| 3 | 121 |
| 4 | 140 |
| 5 | 150 |
| 6 | 155 |
| 7 | 170 |
| 8 | 180 |
| 9 | 185 |
| 10 | 200 |
+----+-------+
SELECT a.id,COALESCE(a.units - b.units,a.units) units
FROM
( SELECT x.*
, COUNT(*) rank
FROM rolling x
JOIN rolling y
ON y.id <= x.id
GROUP
BY x.id
) a
LEFT
JOIN
( SELECT x.*
, COUNT(*) rank
FROM rolling x
JOIN rolling y
ON y.id <= x.id
GROUP
BY x.id
) b
ON b.rank= a.rank -1;
+----+-------+
| id | units |
+----+-------+
| 1 | 101 |
| 2 | 10 |
| 3 | 10 |
| 4 | 19 |
| 5 | 10 |
| 6 | 5 |
| 7 | 15 |
| 8 | 10 |
| 9 | 5 |
| 10 | 15 |
+----+-------+
This should give the desired result. I don't know how your table is called so I named it "tbltest".
Naming a table date is generally a bad idea as it also refers to other things (functions, data types,...) so I renamed it "fdate". Using uppercase characters in field names or tablenames is also a bad idea as it makes your statements less database independent (some databases are case sensitive and some are not).
SELECT
A.fdate,
A.units - coalesce(B.units, 0) AS units
FROM
tbltest A left join tbltest B ON A.fdate = B.fdate + INTERVAL 1 DAY