Gaps and islands with DATE format - MySQL

Let's say I have a table like this:
ma_id  act_date
------------------
1      2023-01-01
1      2023-01-02
1      2023-01-03
1      2023-01-05
1      2023-01-06
2      2023-02-08
2      2023-02-09
I have read a lot of guides but couldn't find what I was looking for...
I want a result like this, with a new row starting each time a date is missing:
ma_id  start_date  end_date
------------------------------
1      2023-01-01  2023-01-03
1      2023-01-05  2023-01-06
2      2023-02-08  2023-02-09
Here is a query I have tried, but it was designed for DATETIME values and not for plain DATE values:
select *, min(act_date), max(act_date)
from (
    select t.*,
           sum(case when prev_act_date >= act_date then 0 else 1 end)
               over (partition by ma_id, date_format(act_date, '%d-%m-%Y') order by act_date) grp
    from (
        select t.*,
               lag(act_date)
                   over (partition by ma_id, date_format(act_date, '%d-%m-%Y') order by act_date) prev_act_date
        from XXXX.XXXX t
        where t.ma_id in (1, 2)
    ) t
) t
group by ma_id, date_format(act_date, '%d-%m-%Y'), grp
order by min(act_date)

Assuming MySQL 8+:
WITH cte AS (
    SELECT *, DATEDIFF(act_date,
                       LAG(act_date, 1, act_date - INTERVAL 1 DAY) OVER
                           (PARTITION BY ma_id ORDER BY act_date)) - 1 AS diff
    FROM yourTable
),
cte2 AS (
    SELECT *, SUM(diff) OVER (PARTITION BY ma_id ORDER BY act_date) AS grp
    FROM cte
)
SELECT ma_id, MIN(act_date) AS start_date, MAX(act_date) AS end_date
FROM cte2
GROUP BY ma_id, grp
ORDER BY 1, 2;
The first CTE computes, for each row, the number of days since the previous row of the same ma_id, minus one, so consecutive dates give 0 and a gap gives a positive number. We use the three-argument form of the LAG() window function, whose default value (act_date - INTERVAL 1 DAY) ensures that the first record in each partition also has a date difference of zero. The second CTE takes a running sum of this date difference to form a pseudo group. Note that the group number only changes when two adjacent dates are more than one day apart. Finally, we aggregate by ma_id and pseudo group to find the start and end dates.
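To make the diff and grp columns concrete, here is a small Python trace of the same logic on the sample rows from the question. It is only a sketch of what the two CTEs compute, not a replacement for the SQL; the variable names (labelled, islands, result) are made up for this illustration.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (ma_id, act_date), sorted by ma_id, act_date.
rows = [
    (1, date(2023, 1, 1)), (1, date(2023, 1, 2)), (1, date(2023, 1, 3)),
    (1, date(2023, 1, 5)), (1, date(2023, 1, 6)),
    (2, date(2023, 2, 8)), (2, date(2023, 2, 9)),
]

labelled = []  # (ma_id, act_date, diff, grp)
for ma_id, part in groupby(rows, key=lambda r: r[0]):  # PARTITION BY ma_id
    prev, grp = None, 0
    for _, act_date in part:                           # ORDER BY act_date
        # LAG(...) with the "one day earlier" default gives diff = 0 on the first row.
        diff = 0 if prev is None else (act_date - prev).days - 1
        grp += diff                                    # running SUM(diff)
        labelled.append((ma_id, act_date, diff, grp))
        prev = act_date

# GROUP BY ma_id, grp with MIN(act_date) / MAX(act_date)
islands = {}
for ma_id, act_date, _, grp in labelled:
    start, end = islands.get((ma_id, grp), (act_date, act_date))
    islands[(ma_id, grp)] = (min(start, act_date), max(end, act_date))

result = sorted((k[0], s.isoformat(), e.isoformat()) for k, (s, e) in islands.items())
for row in result:
    print(row)
```

The only row with a non-zero diff is 2023-01-05 (two days after 2023-01-03), which is exactly where the second island of ma_id 1 starts.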

I will try to explain step by step (using the row_number() function):
1. Add a fixed date column to each row, like '19700101':
select yt.ma_id, yt.act_date, '19700101'
from yourTable yt;
2. Add a column diff holding the number of days from '19700101' to act_date:
select yt.ma_id, yt.act_date, '19700101',
       datediff(date(yt.act_date), '19700101') as diff
from yourTable yt;
3. Add a column row_num using row_number() over (partition by yt.ma_id order by date(yt.act_date)):
select yt.ma_id, yt.act_date, '19700101',
       datediff(date(yt.act_date), '19700101') as diff,
       row_number() over (partition by yt.ma_id order by date(yt.act_date)) as row_num
from yourTable yt;
4. Calculate diff - row_num. The column we are interested in is diff_minus_row_num: it stays constant while the dates are consecutive and changes after every gap:
select yt.ma_id, yt.act_date, '19700101',
       datediff(date(yt.act_date), '19700101') as diff,
       row_number() over (partition by yt.ma_id order by date(yt.act_date)) as row_num,
       datediff(date(yt.act_date), '19700101')
         - row_number() over (partition by yt.ma_id order by date(yt.act_date)) as diff_minus_row_num
from yourTable yt;
5. Group by ma_id and diff_minus_row_num, and use min and max to get the results. ma_id has to be part of the group by; otherwise islands of different ma_id values that happen to share the same diff_minus_row_num would be merged:
select outside.ma_id,
       min(outside.act_date) as start_date,
       max(outside.act_date) as end_date
from (
    select yt.ma_id, yt.act_date, '19700101',
           datediff(date(yt.act_date), '19700101') as diff,
           row_number() over (partition by yt.ma_id order by date(yt.act_date)) as row_num,
           datediff(date(yt.act_date), '19700101')
             - row_number() over (partition by yt.ma_id order by date(yt.act_date)) as diff_minus_row_num
    from yourTable yt
) outside
group by outside.ma_id, outside.diff_minus_row_num
order by 1, 2;
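The five steps can also be traced outside the database. The Python sketch below mimics them on the question's sample rows, assuming the rows are already sorted by ma_id and act_date; EPOCH, islands and result are names invented for this illustration.

```python
from datetime import date
from itertools import groupby

EPOCH = date(1970, 1, 1)  # the fixed anchor date from step 1

# Sample rows from the question: (ma_id, act_date), sorted by ma_id, act_date.
rows = [
    (1, date(2023, 1, 1)), (1, date(2023, 1, 2)), (1, date(2023, 1, 3)),
    (1, date(2023, 1, 5)), (1, date(2023, 1, 6)),
    (2, date(2023, 2, 8)), (2, date(2023, 2, 9)),
]

islands = {}  # (ma_id, diff_minus_row_num) -> [start_date, end_date]
for ma_id, part in groupby(rows, key=lambda r: r[0]):        # partition by ma_id
    for row_num, (_, act_date) in enumerate(part, start=1):  # row_number()
        diff = (act_date - EPOCH).days                       # step 2: days since the anchor
        key = (ma_id, diff - row_num)                        # step 4, grouped together with ma_id
        span = islands.setdefault(key, [act_date, act_date])
        span[1] = act_date  # rows arrive in date order, so the last one seen is the max

result = [(k[0], s.isoformat(), e.isoformat()) for k, (s, e) in sorted(islands.items())]
for row in result:
    print(row)
```

Within a run of consecutive dates both diff and row_num grow by one per row, so their difference does not move; a missing date makes diff jump ahead of row_num and produces a new key.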

Related

How to get the value of the last record in each group?

I have a table like this:
Table "wallet"
amount   balance  timestamp
--------------------------------------
1000     1000     2023-01-25 21:41:39
-1000    0        2023-01-25 21:41:40
200000   200000   2023-01-25 22:30:10
10000    210000   2023-01-26 08:12:05
5000     215000   2023-01-26 09:10:12
And here is the expected result: (one row per day)
min_balance  last_balance  date
-------------------------------------
0            200000        2023-01-25
210000       215000        2023-01-26
Here is my current query:
SELECT MIN(balance) min_balance,
DATE(timestamp) date
FROM wallet
GROUP BY date
How can I add last_balance? Sadly, there is nothing like LAST(balance) in MySQL. By "last" I mean the balance of the row with the latest timestamp on that day.
With MIN() and FIRST_VALUE() window functions:
SELECT DISTINCT
MIN(balance) OVER (PARTITION BY DATE(timestamp)) AS min_balance,
FIRST_VALUE(balance) OVER (PARTITION BY DATE(timestamp) ORDER BY timestamp DESC) AS last_balance,
DATE(timestamp) AS date
FROM wallet;
If you are running MySQL 8 or later, then we can use ROW_NUMBER() here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY DATE(timestamp) ORDER BY balance) rn_min,
ROW_NUMBER() OVER (PARTITION BY DATE(timestamp) ORDER BY timestamp DESC) rn_last
FROM yourTable
)
SELECT
DATE(timestamp) AS date,
MAX(CASE WHEN rn_min = 1 THEN balance END) AS min_balance,
MAX(CASE WHEN rn_last = 1 THEN balance END) AS last_balance
FROM cte
GROUP BY 1
ORDER BY 1;
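As a sanity check of what "min per day" and "last per day" should return for the sample wallet rows, here is a minimal Python sketch. It is not how MySQL evaluates the queries; it only reproduces the expected result, and per_day and result are illustrative names.

```python
# Sample rows from the question: (amount, balance, timestamp).
wallet = [
    (1000, 1000, "2023-01-25 21:41:39"),
    (-1000, 0, "2023-01-25 21:41:40"),
    (200000, 200000, "2023-01-25 22:30:10"),
    (10000, 210000, "2023-01-26 08:12:05"),
    (5000, 215000, "2023-01-26 09:10:12"),
]

per_day = {}  # date -> (min_balance, last_balance)
# ISO-formatted timestamps sort chronologically as plain strings.
for _, balance, ts in sorted(wallet, key=lambda r: r[2]):
    day = ts[:10]  # same role as DATE(timestamp)
    lowest, _ = per_day.get(day, (balance, balance))
    # Walking in timestamp order means the last write per day is the latest row.
    per_day[day] = (min(lowest, balance), balance)

result = [(lo, last, day) for day, (lo, last) in sorted(per_day.items())]
for row in result:
    print(row)
```

This matches the expected output in the question: the minimum and the last balance of a day are different rows in general, which is why a single GROUP BY with MIN() is not enough.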

How can I calculate the subtraction between rows for multiple dates and display the percentage?

I have the following information, obtained using GROUP BY and some calculations.
I'm trying to calculate the maximum difference in value between several dates. In this example there are 3 dates (2022, 2021, 2020); the oldest date should give 0 because there is no earlier date to subtract.
After finding the maximum difference from the previous year, the percentage must be calculated as well.
After running the query that calculates the maximum difference between date rows, the final result should be this:
Demo with 4 dates: https://dbfiddle.uk/KF-d2KpR?hide=4
The following query displays the result without the percentage:
WITH cte1 AS (
SELECT
a.date_rehearsal,
a.col1_val,
ROW_NUMBER() OVER (PARTITION BY a.date_rehearsal ORDER BY a.date_rehearsal DESC) AS rn
FROM demo a
),
cte2 AS (
SELECT
b.date_rehearsal,
b.col1_val - COALESCE(LEAD(b.col1_val) OVER (PARTITION BY b.rn
ORDER BY b.date_rehearsal DESC), b.col1_val) AS diff
FROM cte1 b)
SELECT
c.date_rehearsal AS 'Dates',
MAX(c.diff) as 'max_col1_val_difference'
FROM cte2 c
GROUP BY c.date_rehearsal
ORDER BY c.date_rehearsal DESC
Can you please help me change this query so that it also displays the percentage?
Thanks in advance.
You can use a CTE to get the ROW_NUMBER per date, then select the MAX difference, grouped by date, from a subquery that takes the current row's col1_val and subtracts the following row's value from it, using LEAD partitioned by the ROW_NUMBER from the CTE. If there is no following row, LEAD returns NULL and COALESCE substitutes the current row's own col1_val, so the difference is zero for the earliest year in your table (in your case, 2020). The percentage is that difference divided by the current row's col1_val, multiplied by 100.
WITH cte AS (
SELECT
a.date_rehearsal,
a.col1_val,
ROW_NUMBER() OVER (PARTITION BY a.date_rehearsal ORDER BY a.date_rehearsal DESC) AS rn
FROM demo a
)
SELECT
c.date_rehearsal AS 'Dates',
MAX(c.diff) as 'max_col1_val_difference',
ROUND(MAX(c.diffPercent),2) as 'max_col1_val_percent',
CONCAT(MAX(c.diff), ' (', ROUND(MAX(c.diffPercent),2), '%)') as 'max_dif_with_percentage'
FROM (
SELECT
b.date_rehearsal,
b.col1_val - COALESCE(LEAD(b.col1_val) OVER (PARTITION BY b.rn ORDER BY b.date_rehearsal DESC), b.col1_val) AS diff,
(((b.col1_val - COALESCE(LEAD(b.col1_val) OVER (PARTITION BY b.rn ORDER BY b.date_rehearsal DESC), b.col1_val))/b.col1_val)*100) AS diffPercent
FROM cte b) c
GROUP BY c.date_rehearsal
ORDER BY c.date_rehearsal DESC
Result:
Dates       max_col1_val_difference  max_col1_val_percent  max_dif_with_percentage
-----------------------------------------------------------------------------------
2022-07-01  6                        5.08                  6 (5.08%)
2021-07-01  10                       10.00                 10 (10.00%)
2020-07-01  0                        0.00                  0 (0.00%)
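The LEAD plus COALESCE step is easy to misread, so here is a small Python sketch of it for a single row-number slot. The col1_val figures are hypothetical (the original data is not shown in the post), and series and out are names made up for this example.

```python
# Hypothetical (date_rehearsal, col1_val) rows for one rn slot, newest first,
# i.e. already in "ORDER BY date_rehearsal DESC" order.
series = [("2022-07-01", 118), ("2021-07-01", 112), ("2020-07-01", 100)]

out = []
for i, (day, val) in enumerate(series):
    following = series[i + 1][1] if i + 1 < len(series) else None  # LEAD(col1_val)
    older = val if following is None else following                # COALESCE(LEAD(...), col1_val)
    diff = val - older
    percent = round(diff / val * 100, 2)  # relative to the current row's col1_val
    out.append((day, diff, percent))

for row in out:
    print(row)
```

Note that the percentage is taken relative to the current row's value, as in the query; dividing by the older value instead would be a different (and equally defensible) definition.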

SQL - need a query for the last_date for each user, max_date, min_date

The original table looks like:
id  date     name
----------------------
11  01-2021  'aaa'
11  03-2020  'bbb'
11  01-2019  'ccc'
11  12-2017  'ddd'
12  02-2011  'kkk'
12  05-2015  'lll'
12  12-2020  'mmm'
The expected output:
id  min_date  max_date  name
---------------------------------
11  12-2017   01-2021   'aaa'
12  02-2011   12-2020   'mmm'
I need the min and max dates, and the name that corresponds to the max_date.
I know a way to get the min and max dates, and separately how to get the name corresponding to the max_date (using ROW_NUMBER() OVER(PARTITION BY...)), but I cannot figure out how to combine the two.
One option is to use ROW_NUMBER along with pivoting logic to select the name corresponding to the max date for each id:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) rn
FROM yourTable
)
SELECT
id,
MIN(date) AS min_date,
MAX(date) AS max_date,
MAX(CASE WHEN rn = 1 THEN name END) AS name
FROM cte
GROUP BY
id;
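Note that your current date column appears to be text. Don't store your dates as text; use a proper date column instead. With MM-YYYY strings, MIN(), MAX() and ORDER BY compare the values character by character, so '12-2017' sorts after '01-2021', and the query above only returns the expected rows once the column holds real dates (or is converted on the fly, e.g. with STR_TO_DATE(CONCAT('01-', date), '%d-%m-%Y')).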
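To see why text dates are a problem here, the following Python sketch compares the MM-YYYY strings of id 11 as text and as parsed dates. Python's string comparison stands in for a plain text comparison in the database, which is an assumption about the column's collation; the variable names are illustrative.

```python
from datetime import datetime

# The date values of id 11 from the question, stored as MM-YYYY text.
dates = ["01-2021", "03-2020", "01-2019", "12-2017"]

# Comparing as text looks at the month digits first.
text_min, text_max = min(dates), max(dates)

# Comparing as real dates (day defaults to the 1st of the month).
parsed = [datetime.strptime(d, "%m-%Y") for d in dates]
real_min = min(parsed).strftime("%m-%Y")
real_max = max(parsed).strftime("%m-%Y")

print("as text :", text_min, text_max)
print("as dates:", real_min, real_max)
```

As text the "maximum" is 12-2017, which is actually the oldest row, so the name picked for rn = 1 would be wrong as well.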
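The query below should work. It joins each id's min and max dates back to the row holding the max date; the join has to match on id as well as on the date, otherwise a row of another id with the same date would also match:
SELECT t1.id, t2.min_date, t2.max_date, t1.name
FROM tbl1 t1
INNER JOIN
  (SELECT id,
          min(date) AS min_date,
          max(date) AS max_date
   FROM tbl1
   GROUP BY id) t2 ON t1.id = t2.id AND t1.date = t2.max_date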

MySQL: How to find the maximum length of an uninterrupted sequence of certain values?

Given a table:
date value
02.10.2019 1
03.10.2019 2
04.10.2019 2
05.10.2019 -1
06.10.2019 1
07.10.2019 1
08.10.2019 2
09.10.2019 2
10.10.2019 -1
11.10.2019 2
12.10.2019 1
How to find the maximum length of an uninterrupted sequence of positive values (4 in that example)?
This is a gaps-and-islands problem. One simple method is the difference of row numbers to identify the islands:
select min(date), max(date), count(*) as length
from (select t.*,
             row_number() over (order by date) as seqnum_1,
             row_number() over (partition by sign(value) order by date) as seqnum_2
      from t
     ) t
where value > 0  -- keep only the islands of positive values
group by sign(value), (seqnum_1 - seqnum_2)
order by count(*) desc
limit 1;
This is a little hard to explain. I find that if you stare at the results of the subquery, you will see how the difference identifies the groups.
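If staring at the subquery output is not an option, this Python sketch prints the same columns for the sample values (in date order) so the pattern is visible. It mirrors the two row_number() calls by hand; sign, counters and lengths are names made up for this illustration.

```python
from collections import Counter

# The value column from the question, in date order (02.10.2019 .. 12.10.2019).
values = [1, 2, 2, -1, 1, 1, 2, 2, -1, 2, 1]

def sign(v):
    return (v > 0) - (v < 0)

counters = Counter()  # running row number per sign, i.e. "partition by sign(value)"
rows = []
for seqnum_1, v in enumerate(values, start=1):  # row_number() over (order by date)
    s = sign(v)
    counters[s] += 1
    seqnum_2 = counters[s]
    rows.append((v, s, seqnum_1, seqnum_2, seqnum_1 - seqnum_2))
    print(v, s, seqnum_1, seqnum_2, seqnum_1 - seqnum_2)

# group by sign(value), (seqnum_1 - seqnum_2) and count the rows per island
lengths = Counter((s, d) for _, s, _, _, d in rows)
longest_positive = max(n for (s, _), n in lengths.items() if s > 0)
print(longest_positive)
```

Inside an island both row numbers advance together, so their difference stays fixed; every interrupting row of the other sign advances only seqnum_1 and therefore shifts the difference for the next island.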
Assuming there are no gaps in the dates, another method finds the next non-positive number (if any) and counts the days up to it. The later date goes first in datediff() so that the count is positive, and when no non-positive row follows, the day after the last date is used as the end:
select t.*,
       datediff(coalesce(next_end_date, max_date + interval 1 day), date) as num
from (select t.*,
             min(case when value <= 0 then date end) over (order by date desc) as next_end_date,
             max(date) over () as max_date
      from t
     ) t
where value > 0
order by num desc
limit 1;

How do I find the 2nd most recent order of a customer?

I have the customer's first order date and last order date by doing MIN and MAX on the created_at field (grouping by email), but I also need to get the customer's 2nd most recent order date (the order date right before the last order date).
SELECT
customer_email,
COUNT(entity_id) AS NumberOfOrders,
MIN(CONVERT_TZ(created_at,'UTC','US/Mountain')) AS 'FirstOrder',
MAX(CONVERT_TZ(created_at,'UTC','US/Mountain')) AS 'MostRecentOrder',
SUM(grand_total) AS TotalRevenue,
SUM(discount_amount) AS TotalDiscount
FROM sales_flat_order
WHERE
customer_email IS NOT NULL
AND store_id = 1
GROUP BY customer_email
LIMIT 500000
Use the window function ROW_NUMBER() (available in MySQL 8.0 and later):
SELECT *
FROM (
SELECT
t.*,
ROW_NUMBER() OVER(PARTITION BY customer_email ORDER BY created_at) rn_asc,
ROW_NUMBER() OVER(PARTITION BY customer_email ORDER BY created_at DESC) rn_desc
FROM sales_flat_order
WHERE customer_email IS NOT NULL AND store_id = 1
) x
WHERE rn_asc = 1 OR rn_desc <= 2
This will get you the earliest and the two latest orders placed by each customer.
Note: it is unclear what the timezone conversions are intended for. I left them out, since they do not affect the sorting order; feel free to add them back as per your use case.
If you want a single record per customer, along with their total count of orders and the dates of their first, last, and last but one order, then you can aggregate in the outer query:
SELECT
    customer_email,
    NumberOfOrders,
    MAX(CASE WHEN rn_asc = 1 THEN created_at END) FirstOrder,
    MAX(CASE WHEN rn_desc = 1 THEN created_at END) MostRecentOrder,
    MAX(CASE WHEN rn_desc = 2 THEN created_at END) MostRecentButOneOrder,
    TotalRevenue,
    TotalDiscount
FROM (
    SELECT
        customer_email,
        created_at,
        COUNT(*) OVER(PARTITION BY customer_email) NumberOfOrders,
        SUM(grand_total) OVER(PARTITION BY customer_email) TotalRevenue,
        SUM(discount_amount) OVER(PARTITION BY customer_email) TotalDiscount,
        ROW_NUMBER() OVER(PARTITION BY customer_email ORDER BY created_at) rn_asc,
        ROW_NUMBER() OVER(PARTITION BY customer_email ORDER BY created_at DESC) rn_desc
    FROM sales_flat_order
    WHERE customer_email IS NOT NULL AND store_id = 1
) x
-- The window aggregates are constant per customer; listing them here keeps the
-- query valid when ONLY_FULL_GROUP_BY is enabled.
GROUP BY customer_email, NumberOfOrders, TotalRevenue, TotalDiscount
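For a quick check of what the pivot should return, here is a minimal Python sketch of the same idea: rank each customer's orders from newest to oldest and pick positions 1 and 2. The e-mail addresses and timestamps are made-up sample data, and orders, by_customer and summary are illustrative names.

```python
from collections import defaultdict

# Hypothetical (customer_email, created_at) rows.
orders = [
    ("ann@example.com", "2023-01-05 10:00:00"),
    ("ann@example.com", "2023-03-01 09:30:00"),
    ("ann@example.com", "2023-02-10 18:45:00"),
    ("bob@example.com", "2023-04-02 12:00:00"),
]

by_customer = defaultdict(list)  # PARTITION BY customer_email
for email, created_at in orders:
    by_customer[email].append(created_at)

summary = {}
for email, stamps in by_customer.items():
    stamps.sort(reverse=True)  # ORDER BY created_at DESC, so index 0 is rn_desc = 1
    summary[email] = {
        "NumberOfOrders": len(stamps),
        "FirstOrder": stamps[-1],                                         # rn_asc = 1
        "MostRecentOrder": stamps[0],                                     # rn_desc = 1
        "MostRecentButOneOrder": stamps[1] if len(stamps) > 1 else None,  # rn_desc = 2
    }

for email, info in summary.items():
    print(email, info)
```

A customer with a single order has no row with rn_desc = 2, so MAX(CASE WHEN rn_desc = 2 ...) yields NULL in the SQL, just as the sketch yields None.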