sql exclude rows based on first occurrence of data and conditions

I have created a dataset with the following columns, holding transactions for 2 customers:
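Cust_No | Transaction_date | amount | credit_debit | running_total | row_num
------: | :--------------- | -----: | :----------- | ------------: | ------:
1 | 5/27/2022 | 800 | D | -200 | 1
1 | 5/26/2022 | 300 | D | 600 | 2
1 | 5/22/2022 | 800 | C | 900 | 3
1 | 5/20/2022 | 100 | C | 100 | 4
9 | 5/16/2022 | 500 | D | -300 | 1
9 | 5/14/2022 | 300 | D | 200 | 2
9 | 5/6/2022 | 200 | C | 500 | 3
9 | 5/5/2022 | 500 | D | 300 | 4
9 | 5/2/2022 | 300 | D | 800 | 5
9 | 5/2/2022 | 500 | C | 1100 | 6
9 | 5/1/2022 | 500 | C | 600 | 7
9 | 5/1/2022 | 100 | C | 100 | 8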
The result I am looking for is:
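Cust_No | Transaction_date | amount | credit_debit | running_total | row_num
------: | :--------------- | -----: | :----------- | ------------: | ------:
1 | 5/27/2022 | 800 | D | -200 | 1
1 | 5/26/2022 | 300 | D | 600 | 2
1 | 5/22/2022 | 800 | C | 900 | 3
9 | 5/16/2022 | 500 | D | -300 | 1
9 | 5/14/2022 | 300 | D | 200 | 2
9 | 5/6/2022 | 200 | C | 500 | 3
9 | 5/5/2022 | 500 | D | 300 | 4
9 | 5/2/2022 | 300 | D | 800 | 5
9 | 5/2/2022 | 500 | C | 1100 | 6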
I sorted the dataset by transaction date, latest first, for each customer.
For each customer, we take the amount of the latest transaction, find the most recent credit (C) with that same amount, and exclude all the rows after it (that is, every older row).
In the example above, customer 9's latest transaction is a debit of 500, so we look for the most recent credit of 500 (row_num 6) and exclude all the rows after it (row_num 7 and 8).
Progress made so far:
I calculated the running total using this logic:
sum (case when credit_debit ='C' then amount else -1*amount end) over (partition by cust_no order by transaction_date desc ) as running_total
I also got the data using lead with offsets 1, 2, 3, 4, 5, but this is not efficient, and there could be many rows before the first credit whose amount matches the 1st row:
case when lead(amount, 1) over(partition by cust_no order by transaction_date desc) = amount then amount else null end as lead1

Not sure which DBMS this is for, but in Postgres it needs a lateral join.
The query takes the most recent transaction (the row with rn = 1), matches its amount to the nearest earlier credit transaction of the same amount, and uses the rn of that credit row as the upper boundary of the row numbers to return:
with CTE as (
    select
        Cust_No, Transaction_date, amount, credit_debit, running_total
        , row_number() over(partition by cust_no order by transaction_date DESC) as rn
    from mytable
)
, RANGE as (
    select *
    from CTE
    left join lateral (
        select c.rn as ignore_after
        from CTE as c
        where CTE.Cust_No = c.Cust_No
          and CTE.amount = c.amount
          and c.credit_debit = 'C'
          and CTE.rn = 1
        order by c.rn ASC
        limit 1
    ) oa on true
    where CTE.rn = 1
)
select
    CTE.*
from CTE
inner join RANGE on CTE.rn between RANGE.rn and RANGE.ignore_after
    and CTE.cust_no = RANGE.cust_no
Cust_No | Transaction_date | amount | credit_debit | running_total | rn
------: | :--------------- | -----: | :----------- | ------------: | -:
1 | 2022-05-27 | 800 | D | -200 | 1
1 | 2022-05-26 | 300 | D | 600 | 2
1 | 2022-05-22 | 800 | C | 900 | 3
9 | 2022-05-16 | 500 | D | -300 | 1
9 | 2022-05-14 | 300 | D | 200 | 2
9 | 2022-05-06 | 200 | C | 500 | 3
9 | 2022-05-05 | 500 | D | 300 | 4
9 | 2022-05-02 | 300 | D | 800 | 5
9 | 2022-05-02 | 500 | C | 1100 | 6
For Postgres, see the db<>fiddle demo.
NB: SQL Server can do the same with "outer apply" in place of the lateral join; I have also put that version in a separate db<>fiddle.
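
For engines without lateral joins, the same boundary idea (find the nearest credit that matches the latest amount, then keep the rows up to it) can be written with a grouped MIN instead. The following is only a sketch, run through Python's sqlite3 module so it is self-contained. It assumes the row_num column from the question is stored in the table (two rows share a date, so ordering by date alone would be ambiguous), and the CTE names latest and boundary are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (cust_no INT, transaction_date TEXT, "
    "amount INT, credit_debit TEXT, row_num INT)"
)
# Sample rows from the question; row_num is taken as given there.
rows = [
    (1, "2022-05-27", 800, "D", 1), (1, "2022-05-26", 300, "D", 2),
    (1, "2022-05-22", 800, "C", 3), (1, "2022-05-20", 100, "C", 4),
    (9, "2022-05-16", 500, "D", 1), (9, "2022-05-14", 300, "D", 2),
    (9, "2022-05-06", 200, "C", 3), (9, "2022-05-05", 500, "D", 4),
    (9, "2022-05-02", 300, "D", 5), (9, "2022-05-02", 500, "C", 6),
    (9, "2022-05-01", 500, "C", 7), (9, "2022-05-01", 100, "C", 8),
]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)

query = """
WITH latest AS (            -- amount of each customer's latest transaction
    SELECT cust_no, amount AS latest_amount
    FROM mytable
    WHERE row_num = 1
),
boundary AS (               -- nearest credit with that same amount
    SELECT t.cust_no, MIN(t.row_num) AS ignore_after
    FROM mytable AS t
    JOIN latest AS l ON l.cust_no = t.cust_no
    WHERE t.credit_debit = 'C' AND t.amount = l.latest_amount
    GROUP BY t.cust_no
)
SELECT t.cust_no, t.transaction_date, t.amount, t.credit_debit, t.row_num
FROM mytable AS t
JOIN boundary AS b ON b.cust_no = t.cust_no
WHERE t.row_num <= b.ignore_after
ORDER BY t.cust_no, t.row_num
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

As in the lateral-join version, a customer with no matching credit drops out of the result entirely, because the join to boundary is an inner join.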

Related

mySQL how to select some data in the same field from limit date

I got the data from my table with the query
SELECT dt, place
FROM horseri
WHERE horseid = 'C299'
AND dt < '20200715'
ORDER BY dt DESC
It returns the rows below, where dt is the date and place is the finishing place:
dt | place
----------------------
2020-07-12 | 8
2020-06-07 | 2
2020-05-17 | 3
2020-04-12 | 9
2020-03-29 | 12
2020-03-01 | 3
2020-02-16 | 4
2020-01-27 | 5
2019-12-18 | 3
2019-11-23 | 10
2019-10-30 | 2
2019-10-01 | 9
2019-09-08 | 2
2019-07-14 | 7
2019-07-01 | 13
2019-06-16 | 7
2019-05-18 | 8
2019-03-31 | 13
2019-03-17 | 12
How can I get only the rows with a top-3 place, looking at just the 10 most recent dates?
My expected output is:
dt | place
----------------------
2020-06-07 | 2
2020-05-17 | 3
2020-03-01 | 3
2019-12-18 | 3

Use a subquery to get the most recent 10 dates. Then select the top 3 places from that.
SELECT dt, place
FROM (
    SELECT dt, place
    FROM horseri
    WHERE horseid = 'C299'
    ORDER BY dt DESC
    LIMIT 10
) AS x
WHERE place <= 3

A more modern way of writing Barmar's answer (assuming it is what the OP wants here) is to use ROW_NUMBER, which needs MySQL 8.0 or later:
SELECT dt, place
FROM
(
SELECT *, ROW_NUMBER() OVER (ORDER BY dt DESC) rn
FROM horseri
WHERE horseid = 'C299'
) t
WHERE rn <= 10 AND place <= 3;
To isolate individual places, just change the outer WHERE clause. For example, for second place finishers in the most recent 10 dates, use:
WHERE rn <= 10 AND place = 2
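
The order of the two filters matters: the 10 most recent dates have to be picked first, and only then filtered by place. A small sketch with Python's sqlite3 module shows the difference on the sample data from the question. It assumes an SQLite build with window functions (3.25 or later), stores dt as ISO text, and writes the cut-off date as '2020-07-15' for that reason; the variable names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE horseri (horseid TEXT, dt TEXT, place INT)")
races = [
    ("2020-07-12", 8), ("2020-06-07", 2), ("2020-05-17", 3), ("2020-04-12", 9),
    ("2020-03-29", 12), ("2020-03-01", 3), ("2020-02-16", 4), ("2020-01-27", 5),
    ("2019-12-18", 3), ("2019-11-23", 10), ("2019-10-30", 2), ("2019-10-01", 9),
    ("2019-09-08", 2), ("2019-07-14", 7), ("2019-07-01", 13), ("2019-06-16", 7),
    ("2019-05-18", 8), ("2019-03-31", 13), ("2019-03-17", 12),
]
conn.executemany("INSERT INTO horseri VALUES ('C299', ?, ?)", races)

# Number the rows by date first, then filter on both rn and place.
windowed = conn.execute("""
    SELECT dt, place
    FROM (
        SELECT dt, place, ROW_NUMBER() OVER (ORDER BY dt DESC) AS rn
        FROM horseri
        WHERE horseid = 'C299' AND dt < '2020-07-15'
    ) AS t
    WHERE rn <= 10 AND place <= 3
    ORDER BY dt DESC
""").fetchall()

# For contrast: filtering on place before LIMIT reaches further back in time.
naive = conn.execute("""
    SELECT dt, place
    FROM horseri
    WHERE horseid = 'C299' AND dt < '2020-07-15' AND place <= 3
    ORDER BY dt DESC
    LIMIT 10
""").fetchall()

print(windowed)    # the four rows from the expected output
print(len(naive))  # 6: also picks up 2019-10-30 and 2019-09-08
```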

how to select the row where sum reach 1000?

id | amount
1 | 96
2 | 0.63
3 | 351.03
4 | 736
5 | 53
6 | 39
7 | 105
8 | 91
I want to get the row at which the running sum(amount) reaches 1000.
Please note: only the row that triggers the 1000, not the rows before or after it.

This query should do what (I think) you want:
select id, (select sum(amount)
            from table1 t1
            where t1.id <= table1.id) as total
from table1
having total >= 1000
order by id
limit 1
For your sample table, it gives
id total
4 1183.66
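
On engines with window functions, the running total can be computed once with SUM() OVER instead of a correlated subquery per row. This is a sketch of that alternative, not the query from the answer above; it runs through Python's sqlite3 module and assumes SQLite 3.25 or later (MySQL would need 8.0 or later for the same SQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, 96), (2, 0.63), (3, 351.03), (4, 736), (5, 53), (6, 39), (7, 105), (8, 91)],
)

query = """
SELECT id, total
FROM (
    -- running sum in id order
    SELECT id, SUM(amount) OVER (ORDER BY id) AS total
    FROM table1
) AS running
WHERE total >= 1000
ORDER BY id       -- first row that crosses the threshold
LIMIT 1
"""
row = conn.execute(query).fetchone()
print(row[0], round(row[1], 2))  # 4 1183.66
```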

Format SQL table removing 0's and grouping them based on years

I have a table which I want to format. The query being used for the table is
select x.year,
round(avg(case when c.mark >= 50 and x.term = 'S1' then 1 else 0
end)::numeric,2) as s1_pass_rate,
round(avg(case when c.mark >= 50 and x.term = 'S2' then 1 else 0
end)::numeric,2) as s2_pass_rate
from course_enrolments c join
courses s
on c.course = s.id
join semesters x on s.semester = x.id
where s.subject in (select id from subjects where name = 'COMP SYS') and
c.mark IS NOT NULL
group by x.year, x.term;
It generates the following table:
year | s1_pass_rate | s2_pass_rate
------+--------------+--------------
2003 | 1.00 | 0.00
2003 | 0.00 | 1.00
2004 | 1.00 | 0.00
2004 | 0.00 | 0.85
2005 | 1.00 | 0.00
2005 | 0.00 | 1.00
2006 | 1.00 | 0.00
2006 | 0.00 | 1.00
I want to format it to:
year | s1_pass_rate | s2_pass_rate
------+--------------+--------------
03 | 1.00 | 1.00
04 | 1.00 | 0.85
05 | 1.00 | 1.00
06 | 1.00 | 1.00
Not sure how to group the years like that and remove values with 0.00. Please help me with this. Thanks

You can try grouping by the year alone (not the term), and then take the sum of each of the two pass rate columns. The zeros contribute nothing to that sum, leaving you with the non-zero values you want. Because your query uses Postgres casts, the two-digit year is taken with RIGHT on a text cast. Try the following query:
SELECT RIGHT(t.year::text, 2) AS year,
SUM(t.s1_pass_rate) AS s1_pass_rate,
SUM(t.s2_pass_rate) AS s2_pass_rate
FROM
(
SELECT x.year,
ROUND(AVG(CASE WHEN c.mark >= 50 AND x.term = 'S1'
THEN 1 ELSE 0 END)::NUMERIC, 2) AS s1_pass_rate,
ROUND(AVG(CASE WHEN c.mark >= 50 AND x.term = 'S2'
THEN 1 ELSE 0 END)::NUMERIC, 2) AS s2_pass_rate
FROM course_enrolments c
INNER JOIN courses s
ON c.course = s.id
INNER JOIN semesters x
ON s.semester = x.id
WHERE s.subject in (SELECT id FROM subjects WHERE name = 'COMP SYS') AND
c.mark IS NOT NULL
GROUP BY x.year, x.term
) t
GROUP BY t.year
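
A different technique from the summation above is to aggregate once, grouped by year only, and let the CASE return NULL (no ELSE branch) for rows of the other term, since AVG ignores NULLs. The sketch below uses Python's sqlite3 module. The flattened table enrolments(year, term, mark) and its marks are invented for illustration and stand in for the three joined tables; using mark >= 50 directly as 0/1 relies on SQLite's integer booleans, so Postgres would need an explicit CASE or FILTER instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table and marks, for illustration only.
conn.execute("CREATE TABLE enrolments (year INT, term TEXT, mark INT)")
conn.executemany(
    "INSERT INTO enrolments VALUES (?, ?, ?)",
    [
        (2003, "S1", 60), (2003, "S1", 75), (2003, "S2", 55),
        (2004, "S1", 80),
        (2004, "S2", 90), (2004, "S2", 40), (2004, "S2", 65), (2004, "S2", 70),
    ],
)

query = """
SELECT substr(CAST(year AS TEXT), -2) AS yr,
       -- rows from the other term yield NULL and are skipped by AVG
       ROUND(AVG(CASE WHEN term = 'S1' THEN mark >= 50 END), 2) AS s1_pass_rate,
       ROUND(AVG(CASE WHEN term = 'S2' THEN mark >= 50 END), 2) AS s2_pass_rate
FROM enrolments
WHERE mark IS NOT NULL
GROUP BY year
ORDER BY year
"""
rates = conn.execute(query).fetchall()
print(rates)  # [('03', 1.0, 1.0), ('04', 1.0, 0.75)]
```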

Select latest data per group from joined tables

I have two tables like this:
survey:
survey_id | store_code | timestamp
product_stock:
survey_id | product_code | production_month | value
How can I get the latest value for each combination of store_code, product_code, and production_month, where "latest" is decided by the survey timestamp?
For example, if I have
survey_id | store_code | timestamp
1 | store_1 | 2015-04-20
2 | store_1 | 2015-04-22
3 | store_2 | 2015-04-21
4 | store_2 | 2015-04-22
survey_id | product_code | production_month | value
1 | product_1 | 2 | 15
2 | product_1 | 2 | 10
1 | product_1 | 3 | 20
1 | product_2 | 2 | 12
3 | product_2 | 2 | 23
4 | product_2 | 2 | 17
It should return a result like this:
survey_id | store_code | time_stamp | product_code | production_month | value
2 | store_1 | 2015-04-22 | product_1 | 2 | 10
1 | store_1 | 2015-04-20 | product_1 | 3 | 20
1 | store_1 | 2015-04-20 | product_2 | 2 | 12
4 | store_2 | 2015-04-22 | product_2 | 2 | 17
It also needs to be as fast as possible, since the database is quite large.

UPDATED - please run query again
Here is my answer:
SELECT s.survey_id, s.store_code, s.timestamp, ps.product_code, ps.production_month, ps.value
FROM survey s
INNER JOIN product_stock ps
ON s.survey_id = ps.survey_id
WHERE s.timestamp = (SELECT MAX(s2.timestamp)
FROM survey s2
INNER JOIN product_stock ps2
ON s2.survey_id = ps2.survey_id
WHERE s2.store_code = s.store_code
AND ps2.product_code = ps.product_code
AND ps2.production_month = ps.production_month);
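
Another common way to pick the latest row per group is ROW_NUMBER partitioned by the grouping columns. The sketch below is an alternative to the query above, run through Python's sqlite3 module on the sample data from the question. It assumes window-function support (SQLite 3.25 or later, MySQL 8.0 or later), and the alias ranked is illustrative; whether it is faster on a large table would depend on the indexes available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE survey (survey_id INT, store_code TEXT, timestamp TEXT);
CREATE TABLE product_stock (
    survey_id INT, product_code TEXT, production_month INT, value INT
);
INSERT INTO survey VALUES
    (1, 'store_1', '2015-04-20'), (2, 'store_1', '2015-04-22'),
    (3, 'store_2', '2015-04-21'), (4, 'store_2', '2015-04-22');
INSERT INTO product_stock VALUES
    (1, 'product_1', 2, 15), (2, 'product_1', 2, 10), (1, 'product_1', 3, 20),
    (1, 'product_2', 2, 12), (3, 'product_2', 2, 23), (4, 'product_2', 2, 17);
""")

query = """
SELECT survey_id, store_code, timestamp, product_code, production_month, value
FROM (
    SELECT s.survey_id, s.store_code, s.timestamp,
           ps.product_code, ps.production_month, ps.value,
           -- rn = 1 marks the newest survey within each group
           ROW_NUMBER() OVER (
               PARTITION BY s.store_code, ps.product_code, ps.production_month
               ORDER BY s.timestamp DESC
           ) AS rn
    FROM survey AS s
    JOIN product_stock AS ps ON ps.survey_id = s.survey_id
) AS ranked
WHERE rn = 1
ORDER BY store_code, product_code, production_month
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```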

Getting Max date from multiple table after INNER JOIN

I have the following two tables:
table 1)
ID | HOTEL ID | NAME
1 | 100 | xyz
2 | 101 | pqr
3 | 102 | abc
table 2)
ID | BOOKING ID | DEPARTURE DATE | AMOUNT
1 | 1 | 2013-04-12 | 100
2 | 1 | 2013-04-14 | 120
3 | 1 | 2013-04-9 | 90
4 | 2 | 2013-04-14 | 100
5 | 2 | 2013-04-18 | 150
6 | 3 | 2013-04-12 | 100
I want a result in MySQL that, for each BOOKING ID, takes the row from table 2 with the MAX DEPARTURE DATE:
ID | BOOKING ID | DEPARTURE DATE | AMOUNT
2 | 1 | 2013-04-14 | 120
5 | 2 | 2013-04-18 | 150
6 | 3 | 2013-04-12 | 100

SELECT b.ID,
b.BookingID,
a.Name,
b.departureDate,
b.Amount
FROM Table1 a
INNER JOIN Table2 b
ON a.ID = b.BookingID
INNER JOIN
(
SELECT BookingID, MAX(DepartureDate) Max_Date
FROM Table2
GROUP BY BookingID
) c ON b.BookingID = c.BookingID AND
b.DepartureDate = c.Max_date

Well,
SELECT * FROM `table2` ORDER BY `DEPARTURE_DATE` DESC LIMIT 0,1
should help
