Update by LEAD/LAG or Recursive CTE? - mysql

There is a table holding dates and account balances.
However, the balance is not available for some dates.
Assume the balance does not change on dates where it is unavailable.
How can I fill in the balance for all dates?
Here is an example:
Table D contains all valid dates.
2000-01-01
2000-01-02
2000-01-03
2000-01-04
2000-01-05
2000-01-06
2000-01-07
2000-01-08
2000-01-09
Table A contains date and account balance.
2000-01-02 $100
2000-01-05 $200
2000-01-09 $700
Ultimately, I want to generate a table like this:
2000-01-01 null
2000-01-02 $100
2000-01-03 $100
2000-01-04 $100
2000-01-05 $200
2000-01-06 $200
2000-01-07 $200
2000-01-08 $200
2000-01-09 $700
I have thought about the following:
LEAD and LAG,
Recursive CTE
However, it seems that neither of them is suitable for this scenario.

SQL Server does not support the IGNORE NULLS option for LAG() or LAST_VALUE(). That is actually the simplest method.
Instead, you can use APPLY:
select d.*, a.balance
from dates d outer apply
(select top (1) a.*
from a
where a.date <= d.date
order by a.date desc
) a;
Or the equivalent using a correlated subquery:
select d.*,
(select top (1) a.balance
from a
where a.date <= d.date
order by a.date desc
) as balance
from dates d;
This works in both MySQL and SQL Server, with the caveat that MySQL needs LIMIT 1 after the ORDER BY in place of TOP (1).
That said, if you had a large amount of data (which is unlikely at the granularity of "date"), then two steps of window functions are probably the better solution:
select ad.date,
max(ad.balance) over (partition by grp) as balance
from (select d.date, a.balance,
count(a.date) over (order by d.date) as grp
from dates d left join
a
on d.date = a.date
) ad;
The subquery assigns a "group" to each balance value and the following dates. This "group" is then used to assign the balance in the outer query.
This version will work in both MySQL 8+ and SQL Server.
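The two-step window approach above can be tried out with a minimal sketch. It assumes an in-memory SQLite database (via Python's sqlite3 module, with a SQLite build that supports window functions) standing in for MySQL 8+ or SQL Server; the table names dates and a follow the answer, and the data is the sample from the question.

```python
import sqlite3

# In-memory SQLite is an assumption of this sketch; it stands in for MySQL 8+/SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dates (date TEXT PRIMARY KEY);
    CREATE TABLE a (date TEXT PRIMARY KEY, balance INTEGER);
    INSERT INTO dates VALUES
        ('2000-01-01'), ('2000-01-02'), ('2000-01-03'),
        ('2000-01-04'), ('2000-01-05'), ('2000-01-06'),
        ('2000-01-07'), ('2000-01-08'), ('2000-01-09');
    INSERT INTO a VALUES
        ('2000-01-02', 100), ('2000-01-05', 200), ('2000-01-09', 700);
""")

# Step 1 (inner query): the running COUNT of non-NULL balances gives every
# known balance and the empty dates that follow it the same group number.
# Step 2 (outer query): MAX over each group spreads the single known balance.
filled = conn.execute("""
    SELECT ad.date,
           MAX(ad.balance) OVER (PARTITION BY ad.grp) AS balance
    FROM (
        SELECT d.date, a.balance,
               COUNT(a.date) OVER (ORDER BY d.date) AS grp
        FROM dates d
        LEFT JOIN a ON d.date = a.date
    ) ad
    ORDER BY ad.date
""").fetchall()

for row in filled:
    print(row)
```

Dates before the first known balance fall into group 0, which has no balance at all, so they stay NULL as the question requires.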

One way to achieve this is to use FIRST_VALUE over groups built from a running total. I am using SQL Server:
with cte as
(
select '2000-01-01' as Datenew union all
select '2000-01-02' as Datenew union all
select '2000-01-03' as Datenew union all
select '2000-01-04' as Datenew union all
select '2000-01-05' as Datenew union all
select '2000-01-06' as Datenew union all
select '2000-01-07' as Datenew union all
select '2000-01-08' as Datenew union all
select '2000-01-09' as Datenew ), cte2 as (
select '2000-01-02' as DateSal, '100' as Salary union all
select '2000-01-05' as DateSal, '200' as Salary union all
select '2000-01-09' as DateSal, '700' as Salary )
select datenew, Salary = FIRST_VALUE(salary) over (partition by ranking order by datenew) from (
select datenew, salary ,
sum(case when DateSal is not null then 1 end) over (order by datenew) ranking
from cte c
left join cte2 c2 on c.Datenew = c2.DateSal ) tst
order by datenew
--Sum creates running total to create a grouping and first value ensures that we are getting the same value for the given group.

ANSI SQL.
Table_D
-------
dd(field name)
-------
2000-01-01
2000-01-02
2000-01-03
2000-01-04
2000-01-05
2000-01-06
2000-01-07
2000-01-08
2000-01-09
Table_A
-------
dd(field name) cost(field name)
-------
2000-01-02 $100
2000-01-05 $200
2000-01-09 $700
select a.dd
, (
case when a.cost is null then min(a.cost) OVER (partition by a.cost_group ORDER BY a.dd) else a.cost end
) as cost
from (
select a.dd, b.cost
, count(b.cost) over (order by a.dd) as cost_group
from Table_D a
left join Table_A b on (b.dd = a.dd)
) a

We can use count() over () window function to create different partitions and then min() over () function to spread minimum value to that partition.
First, I create a table variable to hold the OP's data:
declare @xyz table (dt date, amount int)
insert into @xyz values
('2000-01-02','100'),
('2000-01-05','200'),
('2000-01-09','700');
Second, I fetch the max date from the table above.
declare @maxDT date = (select cast(max(dt) as date) from @xyz);
Finally, the first CTE is a recursive CTE that generates dates from 2000-01-01 to the max date stored in the variable above. The second CTE creates the partitions.
;with cte as (
select cast('2000-01-01' as date) as dt
union all
select dateadd(day,1,cte.dt) from cte where cte.dt < @maxDT
), cte2 as (
select cte.dt, x.amount, x.dt as dt2, count(x.dt) over (order by cte.dt) as ranking
from cte
left join @xyz x on x.dt = cte.dt
)
select dt, min(amount) over (partition by ranking) as amount
from cte2;

Related

Get sum of previous records in query and add or subtract the following results

Case:
I select a start date and an end date, and the query should return the movements of all products in that date range. If a product has movements before the start date (records in the table), I want to include their sum as an opening balance (prevData).
For example, if the first movement in the range is an exit of 5 and the second is an income of 2,
the first row would show (prevData - 5) and the second row (prevData - 5 + 2), giving a cumulative balance.
prevData would be calculated as the sum of the earlier records with the same product id. I wrote the query below, but if a product has 10 movements, the subquery runs 10 times. And how would I get the sum for a different product_id?
SELECT
ik.id,
ik.quantity,
ik.date,
ik.product_id,
@balance = (SELECT SUM(quantity) FROM table_kardex WHERE product_id = ik.product_id AND id < ik.id)
from table_kardex ik
where ik.date between '2021-11-01' and '2021-11-15'
order by ik.product_id,ik.id asc
I hope I have made myself clear; I will watch for any questions.
#table_kardex
id|quantity|date|product_id
1 8 2020-10-12 2
2 15 2020-10-12 1
3 5 2021-11-01 1
4 10 2021-11-01 2
5 -2 2021-11-02 1
6 -4 2021-11-02 2
#result
id|quantity|date|product_id|saldo
3 5 2021-11-01 1 20 (15+5)
5 -2 2021-11-02 1 18 (15+5-2)
4 10 2021-11-01 2 18 (8+10)
6 -4 2021-11-02 2 14 (8+10-4)
I am using MySQL 5.7.
If you're using MySQL 8+, then analytic functions can be used here:
WITH cte AS (
-- Sum over every movement up to the end date so that earlier rows count toward the balance
SELECT *,
SUM(quantity) OVER (PARTITION BY product_id ORDER BY date, id) saldo
FROM table_kardex
WHERE date <= '2021-11-15'
)
SELECT id, quantity, date, product_id, saldo
FROM cte
WHERE date >= '2021-11-01'
ORDER BY product_id, date;
MySQL 5.7
Try this:
SELECT *
FROM (
SELECT product_id,
t1.`date`,
SUM(t2.quantity) - t1.quantity cumulative_quantity_before,
SUM(t2.quantity) cumulative_quantity_after
FROM table_kardex t1
JOIN table_kardex t2 USING (product_id)
WHERE t1.`date` >= t2.`date`
AND t1.`date` <= @period_end
GROUP BY product_id, t1.`date`, t1.quantity
) prepare_data
WHERE `date` >= @period_start;
The easiest solution is to use the window function SUM OVER to get the running total. In a second step, restrict the result to the start date you want:
SELECT id, quantity, date, product_id, balance
FROM
(
SELECT
id,
quantity,
date,
product_id,
SUM(quantity) OVER (PARTITION BY product_id ORDER BY id) AS balance
from table_kardex ik
where date < DATE '2021-11-16'
) cumulated
WHERE date >= DATE '2021-11-01'
ORDER BY product_id, id;
UPDATE: You have changed your request to mention that you are using an old MySQL version (5.7). This doesn't support window functions. In that case use your original query. If I am not mistaken, though, @balance = (...) is invalid syntax for MySQL. And according to your explanation you want id <= ik.id, not id < ik.id:
SELECT
ik.id,
ik.quantity,
ik.date,
ik.product_id,
(
SELECT SUM(quantity)
FROM table_kardex
WHERE product_id = ik.product_id AND id <= ik.id
) AS balance
FROM table_kardex ik
WHERE ik.date >= DATE '2021-11-01' AND ik.date < DATE '2021-11-16'
ORDER BY ik.product_id, ik.id;
The appropriate indexes for this query are:
create index idx1 on table_kardex (date, product_id, id);
create index idx2 on table_kardex (product_id, id, quantity);
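As a quick check of the correlated-subquery version, here is a minimal sketch that runs it against the sample rows from the question. It assumes an in-memory SQLite database (Python's sqlite3 module) in place of MySQL 5.7, and plain string literals instead of DATE '...' literals; the query itself uses no window functions.

```python
import sqlite3

# SQLite stands in for MySQL 5.7 here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_kardex (
        id INTEGER PRIMARY KEY,
        quantity INTEGER,
        date TEXT,
        product_id INTEGER
    );
    INSERT INTO table_kardex VALUES
        (1,  8, '2020-10-12', 2),
        (2, 15, '2020-10-12', 1),
        (3,  5, '2021-11-01', 1),
        (4, 10, '2021-11-01', 2),
        (5, -2, '2021-11-02', 1),
        (6, -4, '2021-11-02', 2);
""")

# The subquery sums every movement of the same product up to and including
# the current row, so movements before the date range count as opening balance.
rows = conn.execute("""
    SELECT ik.id, ik.quantity, ik.date, ik.product_id,
           (SELECT SUM(quantity)
            FROM table_kardex
            WHERE product_id = ik.product_id AND id <= ik.id) AS balance
    FROM table_kardex ik
    WHERE ik.date >= '2021-11-01' AND ik.date < '2021-11-16'
    ORDER BY ik.product_id, ik.id
""").fetchall()

for row in rows:
    print(row)
```

The date filter applies only to the outer query, which is why rows 1 and 2 are missing from the output yet still contribute to the balances.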

SQL - need a query for the last_date for each user, max_date, min_date

The original table looks like:
id date name
----------------------
11 01-2021 'aaa'
11 03-2020 'bbb'
11 01-2019 'ccc'
11 12-2017 'ddd'
12 02-2011 'kkk'
12 05-2015 'lll'
12 12-2020 'mmm'
the expected output:
id min_date max_date name
---------------------------------
11 12-2017 01-2021 'aaa'
12 02-2011 12-2020 'mmm'
I need to have, min, max dates and the name that corresponds to the max_date.
I know a way to get the min and max dates and, separately, how to get the name corresponding to the max_date (using ROW_NUMBER() OVER(PARTITION BY...)), but cannot figure out how to combine the two.
One option is to use ROW_NUMBER along with pivoting logic to select the name corresponding to the max date for each id:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) rn
FROM yourTable
)
SELECT
id,
MIN(date) AS min_date,
MAX(date) AS max_date,
MAX(CASE WHEN rn = 1 THEN name END) AS name
FROM cte
GROUP BY
id;
Note that your current date column appears to be text. Don't store your dates as text, instead use a proper date column.
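The ROW_NUMBER plus conditional aggregation idea can be sketched as follows. This assumes an in-memory SQLite database (Python's sqlite3 module) and, as a deliberate change from the question's data, stores the dates as 'YYYY-MM' strings so that MIN, MAX and ORDER BY compare them chronologically; yourTable is the placeholder name used in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourTable (id INTEGER, date TEXT, name TEXT);
    -- Dates rewritten from MM-YYYY to YYYY-MM (assumption) so text order = time order.
    INSERT INTO yourTable VALUES
        (11, '2021-01', 'aaa'),
        (11, '2020-03', 'bbb'),
        (11, '2019-01', 'ccc'),
        (11, '2017-12', 'ddd'),
        (12, '2011-02', 'kkk'),
        (12, '2015-05', 'lll'),
        (12, '2020-12', 'mmm');
""")

# rn = 1 marks the latest row per id; the CASE keeps only that row's name,
# and MAX() collapses the remaining NULLs away during aggregation.
summary = conn.execute("""
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS rn
        FROM yourTable
    )
    SELECT id,
           MIN(date) AS min_date,
           MAX(date) AS max_date,
           MAX(CASE WHEN rn = 1 THEN name END) AS name
    FROM cte
    GROUP BY id
    ORDER BY id
""").fetchall()

for row in summary:
    print(row)
```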
The query below should work:
SELECT t1.id, t2.min_date, t2.max_date, t1.name
FROM tbl1 t1
INNER JOIN
(SELECT id,
min(date) AS min_date,
max(date) AS max_date
FROM tbl1
GROUP BY id) t2 ON t1.id = t2.id AND t1.date = t2.max_date

How do I select SQL data in buckets when data doesn't exist for one bucket?

I'm trying to get a complete set of buckets for a given dataset, even if no records exist for some buckets.
For example, I want to display totals by day of week, with zero total for days with no records.
SELECT
WEEKDAY(transaction_date) AS day_of_week,
SUM(sales) AS total_sales
FROM table1
GROUP BY day_of_week
If I have sales every day, I'll get 7 rows in my result representing total sales on days 0-6.
If I don't have sales on Day 2, I get no result for Day 2.
What's the most efficient way to force a zero value for day 2?
Should I join to a temporary table or array of defined buckets? ['0','1','2','3','4','5','6']
Or is it better to insert zeros outside of MySQL, after I've done the query?
I am using MySQL, but this is a general SQL question.
In MySQL, you could simply use a derived table of numbers from 0 to 6 (the range WEEKDAY() returns), left join it with the table, then aggregate:
select d.day_of_week, sum(sales) AS total_sales
from (
select 0 day_of_week union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
) d
left join table1 t1 on weekday(t1.transaction_date) = d.day_of_week
group by day_of_week
MySQL 8.0.19 and later support the values row(...) syntax, which shortens the query:
select d.day_of_week, sum(sales) AS total_sales
from (values row(0), row(1), row(2), row(3), row(4), row(5), row(6)) d(day_of_week)
left join table1 t1 on weekday(t1.transaction_date) = d.day_of_week
group by day_of_week
Basically you want the total to be 0 when the sum for a bucket is actually NULL. SUM returns NULL when a bucket has no matching rows, and an aggregate cannot be nested inside another aggregate, so use COALESCE to force the zero:
COALESCE(SUM(sales), 0)
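Putting the bucket table and COALESCE together, a minimal sketch could look like this. It assumes an in-memory SQLite database (Python's sqlite3 module); since SQLite has no WEEKDAY(), the expression (strftime('%w', ...) + 6) % 7 is used to mimic MySQL's numbering (0 = Monday), and the sales rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (transaction_date TEXT, sales INTEGER);
    -- Hypothetical rows: two Mondays, one Tuesday, one Thursday.
    INSERT INTO table1 VALUES
        ('2020-03-02', 10),
        ('2020-03-03', 5),
        ('2020-03-05', 7),
        ('2020-03-09', 3);
""")

# d lists all seven buckets; the LEFT JOIN keeps buckets without sales,
# and COALESCE turns their NULL sum into 0.
totals = conn.execute("""
    WITH d(day_of_week) AS (VALUES (0), (1), (2), (3), (4), (5), (6))
    SELECT d.day_of_week,
           COALESCE(SUM(t1.sales), 0) AS total_sales
    FROM d
    LEFT JOIN table1 t1
      ON (CAST(strftime('%w', t1.transaction_date) AS INTEGER) + 6) % 7
         = d.day_of_week
    GROUP BY d.day_of_week
    ORDER BY d.day_of_week
""").fetchall()

for row in totals:
    print(row)
```

Doing the zero-fill in SQL keeps the result shape fixed at seven rows, so the calling code does not need to know which buckets were empty.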
First, you need a calendar table, or you can create a calendar subset on the fly. I am not sure of the MySQL syntax, but here is what it would look like in SQL Server.
DECLARE
@FromDate DATE
, @ToDate DATE
-- set these variables to appropriate values
SET @FromDate = '2020-03-01';
SET @ToDate = '2020-03-31';
;WITH cteCalendar (MyDate) AS
(
SELECT CONVERT(DATE, @FromDate) AS MyDate
UNION ALL
SELECT DATEADD(DAY, 1, MyDate)
FROM cteCalendar
WHERE DATEADD(DAY, 1, MyDate) <= @ToDate
)
-- WEEKDAY() is MySQL-only; SQL Server uses DATEPART(WEEKDAY, ...), numbered 1-7 according to SET DATEFIRST
SELECT DATEPART(WEEKDAY, cte.MyDate) AS day_of_week,
SUM(sales) AS total_sales
FROM cteCalendar cte
LEFT JOIN table1 t1 ON cte.MyDate = t1.transaction_date
GROUP BY DATEPART(WEEKDAY, cte.MyDate)

How to create SQL based on complex rule?

I have 3 columns (id, date, amount) and am trying to calculate a 4th column (calculated_column).
How can I write an SQL query that does the following?
The calculation should look at the ID (e.g. 1) and sum all rows with the same ID within that month: for the first occurrence (1-Sep) the result is 5, and for the second occurrence it is 5+6=11, i.e. all amounts from the beginning of that month up to and including that row.
For the next month (Oct), it finds the first occurrence of id=1 and stores 3 in calculated_column; for the second occurrence of id=1 in Oct, it sums from the beginning of that month for the same id (3+2=5).
Assuming I've understood correctly, I would suggest a correlated subquery such as:
select t.*,
(
select sum(u.amount) from table1 u
where
u.id = t.id and
date_format(u.date, '%Y-%m') = date_format(t.date, '%Y-%m') and u.date <= t.date
) as calculated_column
from table1 t
(Change the table name table1 to suit your data)
In Oracle and MySQL 8+, you can use window functions. The corresponding date arithmetic varies, but here is the idea:
select t.*,
(case when date = max(date) over (partition by to_char(date, 'YYYY-MM')) and
id = 1
then sum(amount) over (partition by to_char(date, 'YYYY-MM'))
end) as calculated_column
from t;
The outer case is simply to put the value on the appropriate row of the result set. The code would be simpler if all rows in the month had the same value.
Here is a solution for Oracle. Since you did not give the table name, I named it my_table; change it to the real name.
select
t1.id,
t1.date,
t1.amount,
decode(t1.id, 1, sum(nvl(t2.amount, 0)), null) calculated_column
from my_table t1
left join my_table t2
on trunc(t2.date, 'month') = trunc(t1.date, 'month')
and t1.id = 1
group by t1.id, t1.date, t1.amount
If your version supports window functions (e.g. MySQL 8 and later):
# MySQL 8+
select
t.*
, sum(amount) over (partition by id, date_format(date, '%Y-%m-01') order by date) as calculated_column
from t
;
-- Oracle
select
t.*
, sum(amount) over (partition by id, trunc(date, 'MM') order by date) as calculated_column
from t
;
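The per-id, per-month running total can be sketched as below. This assumes an in-memory SQLite database (Python's sqlite3 module), with strftime('%Y-%m', date) taking the place of date_format or trunc; the table name t follows the answer, while the year, the exact days and the id = 2 row are made-up values chosen to reproduce the 5, 11, 3, 5 sequence described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, date TEXT, amount INTEGER);
    -- Hypothetical rows matching the amounts described in the question.
    INSERT INTO t VALUES
        (1, '2021-09-01', 5),
        (1, '2021-09-15', 6),
        (2, '2021-09-10', 4),
        (1, '2021-10-03', 3),
        (1, '2021-10-20', 2);
""")

# Partitioning by id and month restarts the running sum each month;
# ORDER BY date makes it cumulative within the partition.
running = conn.execute("""
    SELECT t.id, t.date, t.amount,
           SUM(t.amount) OVER (
               PARTITION BY t.id, strftime('%Y-%m', t.date)
               ORDER BY t.date
           ) AS calculated_column
    FROM t
    ORDER BY t.id, t.date
""").fetchall()

for row in running:
    print(row)
```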

MySQL - get min/max of consecutive events in a series of rows

I have a table that looks like this:
http://sqlfiddle.com/#!9/152d2/1/0
CREATE TABLE Table1 (
id int,
value decimal(10,5),
dt datetime,
threshold_id int
);
Current Query:
SELECT sensors_id, DATE_FORMAT(datetime, '%Y-%m-%d'), MIN(value), MAX(value)
FROM Readings
WHERE datetime < "2015-11-18 00:00:00"
AND datetime > "2015-10-18 00:00:00"
AND sensors_id = 9
GROUP BY DATE_FORMAT(datetime, '%Y-%m-%d')
ORDER BY datetime DESC
What I'm trying to do is to return the min/max value in each group, where threshold_id IS NOT NULL. Therefore, the example should return something like:
min_value | max_value | start_date | end_date
9 | 10.5 | 2015-07-29 10:52:31 | 2015-07-29 10:57:31
8.5 | 9.5 | 2015-07-29 11:03:31 | 2015-07-29 11:05:31
I can't work out how to do this grouping. I need to return the min/max for each group of consecutive rows where the threshold_id IS NOT NULL.
Use user variables to compare each row's value with the previous row's and increment a counter that you can group by. Tested on my machine.
SELECT MIN(value),MAX(value),MIN(dt),MAX(dt)
FROM (
SELECT id,value,dt,
CASE WHEN COALESCE(threshold_id,'')=@last_ci THEN @n ELSE @n:=@n+1 END AS g,
@last_ci := COALESCE(threshold_id,'') As th
FROM
Table1, (SELECT @n:=0) r
ORDER BY
id
) s
WHERE th!=''
GROUP BY
g
For MySQL 8 this could be rewritten as below. Use a CTE to compute two row-number sequences and GROUP BY the difference between them.
WITH cte as (
SELECT *,
ROW_NUMBER() OVER (ORDER BY id)as rn,
ROW_NUMBER() OVER (PARTITION BY threshold_id ORDER BY id)as rnn
FROM Table1
ORDER BY id
)
SELECT MIN(value),MAX(value),MIN(dt),MAX(dt) FROM cte WHERE threshold_id IS NOT NULL GROUP BY rn-rnn
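To see why the difference of the two row numbers identifies each run of consecutive rows, here is a minimal sketch. It assumes an in-memory SQLite database (Python's sqlite3 module) in place of MySQL 8, and the rows are hypothetical: the fiddle data is not reproduced here, so they were invented to yield the two groups from the expected output, with a single threshold_id value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER, value REAL, dt TEXT, threshold_id INTEGER);
    -- Hypothetical readings: two runs with a threshold, separated by NULL rows.
    INSERT INTO Table1 VALUES
        (1,  5.0, '2015-07-29 10:50:31', NULL),
        (2,  9.0, '2015-07-29 10:52:31', 1),
        (3, 10.5, '2015-07-29 10:55:31', 1),
        (4,  9.5, '2015-07-29 10:57:31', 1),
        (5,  6.0, '2015-07-29 11:00:31', NULL),
        (6,  8.5, '2015-07-29 11:03:31', 1),
        (7,  9.5, '2015-07-29 11:05:31', 1),
        (8,  4.0, '2015-07-29 11:08:31', NULL);
""")

# rn counts all rows, rnn counts rows per threshold_id. Inside one unbroken
# run both advance together, so rn - rnn stays constant; a gap bumps only rn.
islands = conn.execute("""
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (ORDER BY id) AS rn,
               ROW_NUMBER() OVER (PARTITION BY threshold_id ORDER BY id) AS rnn
        FROM Table1
    )
    SELECT MIN(value), MAX(value), MIN(dt), MAX(dt)
    FROM cte
    WHERE threshold_id IS NOT NULL
    GROUP BY rn - rnn
    ORDER BY MIN(dt)
""").fetchall()

for row in islands:
    print(row)
```

If the data contained several different threshold_id values, grouping by threshold_id as well as rn - rnn would be a safer choice, since two different ids could otherwise share the same difference.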
Your sample data only includes a single day's worth, so you only get a single row back (assuming you want to group by day):
SELECT DAYOFYEAR(dt) `day`, MIN(`value`) min_value, MAX(`value`) max_value
FROM Table1
GROUP BY `day`
ORDER BY `day` ASC