Finding the latest record in each window - MariaDB/MySQL

In MariaDB 10.3, how do I find the latest (based on timestamp) row for each window (or partition; I am not entirely clear on the terminology here)?
Consider the following table with data:
ItemID | Itemname | Value | Timestamp
1 | A | 22 | 2021-12-22 20:01:00
1 | A | 2 | 2021-12-22 15:09:44
1 | A | 3 | 2021-12-22 14:39:49
2 | B | 54 | 2021-12-22 12:46:37
2 | B | 23 | 2021-12-22 12:17:52
2 | B | 43 | 2021-12-22 11:19:11
1 | A | 23 | 2021-12-22 04:00:58
1 | A | 53 | 2021-12-22 03:00:58
3 | C | 21 | 2021-12-21 04:00:58
2 | B | 74 | 2021-12-21 04:06:58
2 | B | 36 | 2021-12-21 04:06:09
1 | A | 34 | 2021-12-21 03:08:09
Desired output
ItemID | ItemName | Value | Timestamp
1 | A | 22 | 2021-12-22 20:01:00
2 | B | 54 | 2021-12-22 12:46:37
1 | A | 23 | 2021-12-22 04:00:58
3 | C | 21 | 2021-12-21 04:00:58
2 | B | 74 | 2021-12-21 04:06:58
1 | A | 34 | 2021-12-21 03:08:09

The following query generates the expected result:
WITH ordered AS (
SELECT
*,
LAG(`ItemID`) OVER (ORDER BY `Timestamp` DESC) AS LastItem
FROM dataset
)
SELECT `ItemID`, `ItemName`, `Value`, `Timestamp`
FROM ordered
WHERE `ItemID` <> `LastItem` OR `LastItem` IS NULL
ORDER BY `Timestamp` DESC
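To see why the LAG() comparison keeps only the first row of each consecutive run, here is a minimal sketch that runs the same query against the sample data. It uses Python's built-in sqlite3 module as a stand-in for MariaDB (an assumption made only so the example is self-contained; it needs SQLite 3.25 or newer for window functions), and the helper name latest_per_run is hypothetical.

```python
import sqlite3

# Sample rows from the question: (ItemID, ItemName, Value, Timestamp).
ROWS = [
    (1, "A", 22, "2021-12-22 20:01:00"),
    (1, "A", 2, "2021-12-22 15:09:44"),
    (1, "A", 3, "2021-12-22 14:39:49"),
    (2, "B", 54, "2021-12-22 12:46:37"),
    (2, "B", 23, "2021-12-22 12:17:52"),
    (2, "B", 43, "2021-12-22 11:19:11"),
    (1, "A", 23, "2021-12-22 04:00:58"),
    (1, "A", 53, "2021-12-22 03:00:58"),
    (3, "C", 21, "2021-12-21 04:00:58"),
    (2, "B", 74, "2021-12-21 04:06:58"),
    (2, "B", 36, "2021-12-21 04:06:09"),
    (1, "A", 34, "2021-12-21 03:08:09"),
]

# Same logic as the answer: a row is kept when the previous row
# (in descending timestamp order) belongs to a different item.
QUERY = """
WITH ordered AS (
    SELECT *, LAG(ItemID) OVER (ORDER BY Timestamp DESC) AS LastItem
    FROM dataset
)
SELECT ItemID, ItemName, Value, Timestamp
FROM ordered
WHERE ItemID <> LastItem OR LastItem IS NULL
ORDER BY Timestamp DESC
"""


def latest_per_run(rows):
    """Load the rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset "
        "(ItemID INTEGER, ItemName TEXT, Value INTEGER, Timestamp TEXT)"
    )
    conn.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = latest_per_run(ROWS)
for row in result:
    print(row)
```

Note that the rows come back strictly in descending timestamp order, so the 2021-12-21 04:06:58 row for item 2 is listed before the 04:00:58 row for item 3.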

Related

Use date (-1 day) on next row to be the end date for current row

I am trying to self join in my current script to find the next row, subtract 1 day from its start date, and put the result in the end date column of the current row, but I seem to be going wrong somewhere.
SELECT
BCG.BudgetId
,B.CustomerId
,CAST(BCG.StartOfPeriod AS DATE) AS StartOfPeriod
,BCG2.EndOfPeriod
,ROUND(SUM(BCG.Charge),2) AS ExpenditureBudget
,ROUND(SUM(BCG.Consumption),2) AS ConsumptionBudget
,ROW_NUMBER() OVER (PARTITION BY BCG.BudgetId ORDER BY BCG.StartOfPeriod ASC) AS rowNum
,B.Status
FROM Budgets_BudgetCalcGroup BCG
INNER JOIN Budgets_Budget B ON B.Id = BCG.BudgetId
LEFT JOIN Budgets_BudgetCalcGroup BCG2 ON
BCG2.EndOfPeriod = (SELECT MIN(StartOfPeriod)-1
FROM Budgets_BudgetCalcGroup AS t3
WHERE t3.StartOfPeriod > t1.StartOfPeriod
)
WHERE B.Status = 2
GROUP BY BCG.BudgetId,StartOfPeriod
Error Received:
Unknown Column BCG2.EndOfPeriod in field list
Expected Output:
254 41 2018-09-01 2018-09-30 29017.8 542331.59 1 2
254 41 2018-10-01 2018-10-31 27858.82 575545.97 2 2
254 41 2018-11-01 2018-11-30 28927.71 576106.15 3 2
254 41 2018-12-01 NULL 34639.71 613779.57 4 2
I found an alternative to the self join that uses the LEAD() function:
DATE_ADD(CAST(LEAD(BCG.StartOfPeriod, 1) OVER (PARTITION BY BCG.BudgetId ORDER BY BCG.StartOfPeriod) AS DATE),INTERVAL -1 DAY) AS EndOfPeriod
Output:
254 41 2018-09-01 2018-09-30 29017.8 542331.59 1
254 41 2018-10-01 2018-10-31 27858.82 575545.97 2
254 41 2018-11-01 2018-11-30 28927.71 576106.15 3
254 41 2018-12-01 2018-12-31 34639.71 613779.57 4
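As a rough illustration of what the LEAD() expression computes, the sketch below pairs each start date with the next start date in the same budget and subtracts one day. It is plain Python rather than SQL, the function name add_end_of_period is hypothetical, and the input is reduced to (BudgetId, StartOfPeriod) pairs taken from the expected output. The last period of a budget has no following row, so its end date is None here, just as LEAD() returns NULL when no default is supplied.

```python
from datetime import date, timedelta
from itertools import groupby


def add_end_of_period(rows):
    """rows: iterable of (budget_id, start_date).

    Returns (budget_id, start_date, end_date) tuples where end_date is
    the day before the next start date within the same budget.
    """
    out = []
    ordered = sorted(rows)  # by budget, then by start date
    for budget_id, group in groupby(ordered, key=lambda r: r[0]):
        starts = [start for _, start in group]
        # Pair every start with its successor; the last one gets None.
        for current, following in zip(starts, starts[1:] + [None]):
            end = None
            if following is not None:
                end = following - timedelta(days=1)
            out.append((budget_id, current, end))
    return out


periods = add_end_of_period([
    (254, date(2018, 9, 1)),
    (254, date(2018, 10, 1)),
    (254, date(2018, 11, 1)),
    (254, date(2018, 12, 1)),
])
for period in periods:
    print(period)
```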

Find rows where ID matches and date is within X days

I'm somewhat new to SQL and I'm running into a bit of an issue with a project. I have a table like this:
ID | subscription_ID | renewal_date
1 | 11 | 2022-01-01 00:00:00
2 | 11 | 2022-01-02 00:00:00
3 | 12 | 2022-01-01 00:00:00
4 | 12 | 2022-01-01 12:00:00
5 | 13 | 2022-01-01 12:00:00
6 | 13 | 2022-01-03 12:00:00
My goal is to return rows where the subscription_ID matches and the start_date values are no more than a certain number of days apart (hours would work as well). For instance, I'd like rows where the subscription_ID matches and the start_date values are at most 1 day apart, so that my results from the table above would be:
ID | subscription_ID | renewal_date
1 | 11 | 2022-01-01 00:00:00
2 | 11 | 2022-01-02 00:00:00
3 | 12 | 2022-01-01 00:00:00
4 | 12 | 2022-01-01 12:00:00
Any assistance would be greatly appreciated--thanks!
If I understand correctly, you may be looking for something like this:
select t.*
from test_tbl t
join (
    SELECT subscription_id,
           MAX(diff) max_diff
    FROM (
        SELECT x.subscription_id,
               DATEDIFF(MIN(y.start_date), x.start_date) diff
        FROM test_tbl x
        JOIN test_tbl y ON y.subscription_id = x.subscription_id
                       AND y.start_date > x.start_date
        GROUP BY x.subscription_id, x.start_date
    ) z
    GROUP BY subscription_id
) as t1 on t.subscription_id = t1.subscription_id
where t1.max_diff <= 1;
Result:
id subscription_id start_date
1 11 2022-01-01 00:00:00
2 11 2022-01-02 00:00:00
3 12 2022-01-01 00:00:00
4 12 2022-01-01 12:00:00
The subquery returns:
subscription_id max_diff
11 1
12 0
13 2
which is used in the WHERE condition.
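The same idea, spelled out procedurally: for each subscription, sort the dates, measure the gap between each row and the next one, and keep the subscription only if the largest gap is within the limit. This is a minimal Python sketch, not the answer's SQL; the function name rows_within is hypothetical, and it compares exact time differences, whereas DATEDIFF() counts calendar days.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def rows_within(rows, max_gap=timedelta(days=1)):
    """rows: iterable of (id, subscription_id, start_date).

    Keeps all rows of a subscription when every gap between consecutive
    dates in that subscription is at most max_gap.
    """
    by_subscription = defaultdict(list)
    for row in rows:
        by_subscription[row[1]].append(row)

    kept = []
    for sub_rows in by_subscription.values():
        sub_rows.sort(key=lambda r: r[2])
        gaps = [b[2] - a[2] for a, b in zip(sub_rows, sub_rows[1:])]
        # A subscription with a single row has no gaps and is skipped,
        # which mirrors the inner join in the SQL version.
        if gaps and max(gaps) <= max_gap:
            kept.extend(sub_rows)
    return sorted(kept)


kept = rows_within([
    (1, 11, datetime(2022, 1, 1, 0, 0)),
    (2, 11, datetime(2022, 1, 2, 0, 0)),
    (3, 12, datetime(2022, 1, 1, 0, 0)),
    (4, 12, datetime(2022, 1, 1, 12, 0)),
    (5, 13, datetime(2022, 1, 1, 12, 0)),
    (6, 13, datetime(2022, 1, 3, 12, 0)),
])
for row in kept:
    print(row)
```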

MySQL: select sum of column having specific values

What I'm trying to do is select a specific number of tickets (max 2) for every person whose tickets sum to less than 3, and the valid field has to be != 'e'.
I have this table:
ID | id_person | nr_tickets | valid
1 | 220 | 1 | s
2 | 220 | 1 | s
3 | 330 | 2 | s
4 | 330 | 1 | e
5 | 331 | 1 | s
6 | 220 | 2 | s
7 | 441 | 1 | s
8 | 442 | 2 | s
9 | 443 | 1 | s
10 | 444 | 1 | s
11 | 445 | 2 | s
Here is what I did:
SELECT m.id, m.id_person, m.nr_tickets, m.valid
FROM table m
JOIN table m1 ON m1.id <= m.id
WHERE m.nr_tickets > 0
GROUP BY m.id
HAVING SUM(case when m.valid != 'e' then m1.nr_tickets end) <= 10
This query gives me
ID | id_person | nr_tickets | valid
1 | 220 | 1 | s
2 | 220 | 1 | s
3 | 330 | 2 | s
5 | 331 | 1 | s
6 | 220 | 2 | s
7 | 441 | 1 | s
8 | 442 | 2 | s
As you can see, the query is almost right; the problem is that person 220 in the results has a ticket total greater than 2.
What I'm trying to achieve is to skip ID 6 and get ID 9 instead.
select `id`,`id_person`, sum(`nr_tickets`) as `nr_tickets`, `valid`
from `test`
group by `id_person`
having sum(`nr_tickets`) < 3 and `valid`!="e"
Output:
id id_person nr_tickets valid
5 331 1 s
7 441 1 s
8 442 2 s
9 443 1 s
10 444 1 s
11 445 2 s

Still show the proper set of time even if there's no entry for that time

I have this query, which averages the values and groups them into 15-minute intervals from 12 AM to 11:45 PM.
SELECT FROM_UNIXTIME(t_stamp/1000, '%m/%d/%Y %l:%i %p') as t_stamp,
ROUND(AVG(CASE WHEN id = '001' THEN value END),2) Value1,
ROUND(AVG(CASE WHEN id = '002' THEN value END),2) Value2,
ROUND(AVG(CASE WHEN id = '003' THEN value END),2) Value3
FROM table1
WHERE tagid IN ("001", "002", "003") and
date(from_unixtime(t_stamp/1000)) BETWEEN "2014-05-01" AND "2014-05-01"
GROUP BY DATE(from_unixtime(t_stamp/1000)), HOUR(from_unixtime(t_stamp/1000)), MINUTE(from_unixtime(t_stamp/1000)) DIV 15
The output looks like this
t_stamp | Value1 | Value2 | Value3
05/01/2014 12:00 AM | 199 | 99 | 100
05/01/2014 12:15 AM | 299 | 19 | 140
05/01/2014 12:30 AM | 399 | 59 | 106
05/01/2014 12:45 AM | 499 | 59 | 112
.
.
.
05/01/2014 11:00 PM | 149 | 199 | 100
05/01/2014 11:15 PM | 599 | 93 | 123
05/01/2014 11:30 PM | 129 | 56 | 150
05/01/2014 11:45 PM | 109 | 60 | 134
It works fine, but I've noticed that sometimes, if there's no entry for a time such as 12:30, instead of showing
t_stamp | Value1 | Value2 | Value3
05/01/2014 12:00 AM | 199 | 99 | 100
05/01/2014 12:15 AM | 299 | 19 | 140
05/01/2014 12:30 AM | Null | Null | Null
05/01/2014 12:45 AM | 499 | 59 | 112
It will show the set of time like this:
t_stamp | Value1 | Value2 | Value3
05/01/2014 12:00 AM | 199 | 99 | 100
05/01/2014 12:15 AM | 299 | 19 | 140
05/01/2014 12:33 AM | 122 | 141 | 234
05/01/2014 12:45 AM | 499 | 59 | 112
What I would like is that when there's no entry for a 15-minute group, the query still shows the proper time slot and just shows null in the value columns. The output I would like is this:
t_stamp | Value1 | Value2 | Value3
05/01/2014 12:00 AM | 199 | 99 | 100
05/01/2014 12:15 AM | 299 | 19 | 140
05/01/2014 12:30 AM | Null | Null | Null
05/01/2014 12:45 AM | 499 | 59 | 112
How can I do this?
Thank You.
You need a table that's a source of cardinal numbers as a start for this. For the moment let's assume it exists, and it's called cardinal.
Then, you need to create a query (a virtual table) that will return rows with timestamps every fifteen minutes, starting with the earliest relevant timestamp and ending with the latest. Here's how to do that for your query.
SELECT '2014-05-01' + INTERVAL (cardinal.n * 15) MINUTE as t_stamp
FROM cardinal
WHERE cardinal.n < 24*4
Then you need to LEFT JOIN your existing data to that virtual table, as follows:
SELECT DATE_FORMAT(slots.t_stamp, '%m/%d/%Y %l:%i %p') AS t_stamp,
ROUND(AVG(CASE WHEN t.id = '001' THEN t.value END),2) Value1,
ROUND(AVG(CASE WHEN t.id = '002' THEN t.value END),2) Value2,
ROUND(AVG(CASE WHEN t.id = '003' THEN t.value END),2) Value3
FROM (
SELECT '2014-05-01' + INTERVAL (cardinal.n * 15) MINUTE AS t_stamp
FROM cardinal
WHERE cardinal.n < 24*4
) AS slots
LEFT JOIN table1 AS t
ON t.tagid IN ('001', '002', '003')
AND FROM_UNIXTIME(t.t_stamp/1000) >= slots.t_stamp
AND FROM_UNIXTIME(t.t_stamp/1000) < slots.t_stamp + INTERVAL 15 MINUTE
GROUP BY slots.t_stamp
ORDER BY slots.t_stamp
Notice that the list of time slots is on the left side of the LEFT JOIN and the tagid filter is in the ON clause rather than in a WHERE clause. That makes sure the slots with no matching rows still get included in the result set, with NULL values.
Now, where does this magical cardinal table come from?
You can generate it as two views, like this. This particular pair of views generates numbers from 0 to 99,999, which is more than enough for the quarter hours in a year.
CREATE OR REPLACE VIEW cardinal10 AS
SELECT 0 AS N UNION
SELECT 1 AS N UNION
SELECT 2 AS N UNION
SELECT 3 AS N UNION
SELECT 4 AS N UNION
SELECT 5 AS N UNION
SELECT 6 AS N UNION
SELECT 7 AS N UNION
SELECT 8 AS N UNION
SELECT 9 AS N;
CREATE OR REPLACE VIEW cardinal AS
SELECT A.N + 10*(B.N + 10*(C.N + 10*(D.N + 10*(E.N)))) AS N
FROM cardinal10 A,cardinal10 B,cardinal10 C,
cardinal10 D,cardinal10 E;
Here's a writeup on the topic.
http://www.plumislandmedia.net/mysql/filling-missing-data-sequences-cardinal-integers/
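The principle behind the cardinal table (generate every time slot first, then attach whatever data exists) can be sketched outside SQL as well. The following Python example is only an illustration under stated assumptions: the function name quarter_hour_averages is hypothetical, readings are simplified to (datetime, value) pairs for a single tag, and slots without readings get None, which plays the role of NULL.

```python
from datetime import datetime, timedelta
from statistics import mean


def quarter_hour_averages(day, readings):
    """day: datetime at midnight; readings: iterable of (datetime, value).

    Returns one (slot_start, average_or_None) pair for each of the 96
    quarter-hour slots of that day, in chronological order.
    """
    step = timedelta(minutes=15)
    # The "cardinal" part: every slot exists up front, data or not.
    buckets = {day + i * step: [] for i in range(24 * 4)}
    for ts, value in readings:
        # Floor the timestamp to the start of its 15-minute slot.
        slot = ts.replace(minute=ts.minute - ts.minute % 15,
                          second=0, microsecond=0)
        if slot in buckets:  # ignore readings from other days
            buckets[slot].append(value)
    return [(slot, round(mean(values), 2) if values else None)
            for slot, values in buckets.items()]


slots = quarter_hour_averages(
    datetime(2014, 5, 1),
    [
        (datetime(2014, 5, 1, 0, 3), 100),
        (datetime(2014, 5, 1, 0, 10), 200),
        (datetime(2014, 5, 1, 0, 33), 122),
    ],
)
for slot, average in slots[:4]:
    print(slot.strftime("%m/%d/%Y %I:%M %p"), average)
```

The reading at 12:33 AM is reported under the 12:30 AM slot, and the empty 12:15 AM slot still appears with no value.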

How to select all records where a field is of a certain value until a record shows up that has a different value?

Let's say that we have a table with COLUMN1 and COLUMN 2. Here's a sample of the records:
COLUMN 1 | COLUMN 2
124 | 12
124 | 11
124 | 10
124 | 9
26 | 8
65 | 7
65 | 6
65 | 5
65 | 4
23 | 3
124 | 2
124 | 1
124 | 0
There is absolutely no pattern to this, but what I'd like to do is get:
COUNT(*) | COLUMN 1 | Smallest Column 2
4 | 124 | 9
1 | 26 | 8
4 | 65 | 4
1 | 23 | 3
3 | 124 | 0
So far, I've been doing this with PHP, but I'd like to find a way to do this in MySQL, as I'm sure it'd be a lot more efficient. The problem is, I can't even think of where to start with this. A regular GROUP BY COLUMN 1 wouldn't work because I want two results for 124, since it appears in two different instances. I've been fiddling around for hours and looking into the documentation and Google, but I haven't been able to find anything yet, and I was wondering if any of you would be able to point me in the right direction. Is this even possible with MySQL?
Well, it took a bit of fiddling, but here it is!
This assumes you have an id column in your table that you order by to get a consistent ordering (if you don't have an id column, order by timestamp or whatever in the inner query).
set @prev := '', @low := 0, @cnt := 0, @grp := 0;
select cnt, column1, low
from (
select
column2,
@low := if(@prev = column1, least(column2, @low), column2) low,
@cnt := if(@prev = column1, @cnt + 1, 1) cnt,
@grp := if(@prev = column1, @grp, @grp + 1) grp,
@prev := column1 column1
from (select column1, column2 from so9091342 order by id) x
order by grp, cnt desc) y
group by grp;
Here's the sql needed to set up a table for testing:
create table so9091342 (id int primary key auto_increment, column1 int, column2 int);
insert into so9091342 (column1, column2) values (124,12),(124,11),(124,10),(124,9),(26,8),(65,7),(65,6),(65,5),(65,4),(23,3),(124,2),(124,1),(124,0);
Output of above query:
+------+---------+------+
| cnt | column1 | low |
+------+---------+------+
| 4 | 124 | 9 |
| 1 | 26 | 8 |
| 4 | 65 | 4 |
| 1 | 23 | 3 |
| 3 | 124 | 0 |
+------+---------+------+
p.s. I named the table so9091342 because this is SO question ID #9091342.
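For comparison with the PHP approach mentioned in the question, here is what the variables in the query above track, written as a short Python sketch: walk the rows in order, start a new group whenever column1 changes, and keep a count and a running minimum per group. It uses itertools.groupby, which groups consecutive equal keys only; the function name runs is hypothetical.

```python
from itertools import groupby


def runs(rows):
    """rows: (column1, column2) pairs in table order.

    Returns (count, column1, smallest column2) for each consecutive run
    of equal column1 values.
    """
    out = []
    # groupby only merges adjacent rows, so 124 can appear in two runs.
    for key, group in groupby(rows, key=lambda r: r[0]):
        column2_values = [c2 for _, c2 in group]
        out.append((len(column2_values), key, min(column2_values)))
    return out


summary = runs([
    (124, 12), (124, 11), (124, 10), (124, 9), (26, 8),
    (65, 7), (65, 6), (65, 5), (65, 4), (23, 3),
    (124, 2), (124, 1), (124, 0),
])
for line in summary:
    print(line)
```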
Interesting question. I know Oracle much better than MySQL so I was able to get it working in Oracle. Might be a better way but this is what I came up with.
select count(col1) as cnt, col1, min(col2) as smallestCol2
from (
select col1, col2, col2-rnk as rnk
from
(
select col1, col2, RANK() OVER (PARTITION by col1 order by col2 asc) as rnk
from tmp_tbl
)
)
group by col1,rnk
order by min(col2) desc
I'm not quite sure how rank and partition work in MySQL but this might be helpful:
Rank function in MySQL
EDIT: To clarify what is going on in my query:
The innermost query ranks the rows within each column 1 value by column 2 (RNK). The result of the innermost query is:
COL1 COL2 RNK
23 3 1
26 8 1
65 4 1
65 5 2
65 6 3
65 7 4
124 0 1
124 1 2
124 2 3
124 9 4
124 10 5
124 11 6
124 12 7
Subtracting the rank from column 2 gives a value that stays constant within each run of consecutive column 2 values, so it identifies each grouping of column 1 values. The result of the second nested query is:
COL1 COL2 RNK
23 3 2
26 8 7
65 4 3
65 5 3
65 6 3
65 7 3
124 0 -1
124 1 -1
124 2 -1
124 9 5
124 10 5
124 11 5
124 12 5
Then you can group on column 1 and that unique value. The final result:
CNT COL1 SMALLESTCOL2
4 124 9
1 26 8
4 65 4
1 23 3
3 124 0
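Window functions are available in MySQL 8.0 and later and in MariaDB 10.2 and later, so the rank-difference query should carry over with minor changes (derived tables need an alias there). The sketch below runs it through Python's sqlite3 module as a stand-in, which is an assumption made only to keep the example self-contained (SQLite 3.25 or newer is required); the table name tmp_tbl comes from the answer, while the alias ranked and the function name rank_difference_groups are hypothetical. Keep in mind that the trick relies on column 2 increasing by exactly 1 within a run, as it does in the sample data.

```python
import sqlite3

DATA = [
    (124, 12), (124, 11), (124, 10), (124, 9), (26, 8),
    (65, 7), (65, 6), (65, 5), (65, 4), (23, 3),
    (124, 2), (124, 1), (124, 0),
]

# col2 minus its rank within col1 is constant inside each run, so
# grouping by (col1, grp) separates the two runs of 124.
QUERY = """
SELECT COUNT(*) AS cnt, col1, MIN(col2) AS smallest_col2
FROM (
    SELECT col1, col2,
           col2 - RANK() OVER (PARTITION BY col1 ORDER BY col2) AS grp
    FROM tmp_tbl
) AS ranked
GROUP BY col1, grp
ORDER BY MIN(col2) DESC
"""


def rank_difference_groups(rows):
    """Load the rows into an in-memory table and run the grouping query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tmp_tbl (col1 INTEGER, col2 INTEGER)")
    conn.executemany("INSERT INTO tmp_tbl VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


groups = rank_difference_groups(DATA)
for group in groups:
    print(group)
```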