Let's say you have a user table that has at least the date the user signed up and an id.
Now let's say you have a separate table that tracks an action like a payment that can happen at any point in the user's lifetime. (Say like an in-app purchase.) In that table we track the userId, payment date, and an id for the payment.
So we have something that looks like this to get our schema set up:
CREATE TABLE users (
UserId INT,
AddedDate DATETIME
);
CREATE TABLE payments (
PaymentId INT,
UserId INT,
PaymentDate Datetime
);
Now you want a table that shows weekly cohorts. A table that looks something like this:
Week size w1 w2 w3 w4 w5 w6 w7
2017-08-28 1 0 0 0 1 0 0 0
2017-09-04 3 1 0 2 0 1 1 2
2017-09-11 2 0 0 1 0 0 0 1
2017-09-18 6 3 1 4 3 1 1 2
2017-09-25 2 1 1 1 0 1 2 0
2017-10-02 7 5 2 3 4 3 1 0
2017-10-09 7 4 5 1 2 5 0 0
2017-10-16 2 1 2 1 1 0 0 0
2017-10-23 7 5 4 4 3 0 0 0
2017-10-30 8 8 7 0 0 0 0 0
2017-11-06 5 5 2 0 0 0 0 0
So the first column has the week, the second has number of people that signed up that week. Say we look at week 2017-09-18. 6 people signed up that week. The 3 under the w1 column means that 3 people out of that 6 made a purchase the week they signed up. The 1 under w2 means 1 person out of that 6 made a purchase the second week they were signed up, and so on.
What query would I use to get a table that looks like that?
This query is modified from the one I wrote here: Cohort analysis in SQL
Here's the final query:
SELECT
STR_TO_DATE(CONCAT(tb.cohort, ' Monday'), '%X-%V %W') as date,
size,
w1,
w2,
w3,
w4,
w5,
w6,
w7
FROM (
SELECT u.cohort,
IFNULL(SUM(s.Offset = 0), 0) w1,
IFNULL(SUM(s.Offset = 1), 0) w2,
IFNULL(SUM(s.Offset = 2), 0) w3,
IFNULL(SUM(s.Offset = 3), 0) w4,
IFNULL(SUM(s.Offset = 4), 0) w5,
IFNULL(SUM(s.Offset = 5), 0) w6,
IFNULL(SUM(s.Offset = 6), 0) w7
FROM (
SELECT
UserId,
DATE_FORMAT(AddedDate, "%Y-%u") AS cohort
FROM users
) as u
LEFT JOIN (
SELECT DISTINCT
payments.UserId,
FLOOR(DATEDIFF(payments.PaymentDate, users.AddedDate)/7) AS Offset
FROM payments
LEFT JOIN users ON (users.UserId = payments.UserId)
) as s ON s.UserId = u.UserId
GROUP BY u.cohort
) as tb
LEFT JOIN (
SELECT DATE_FORMAT(AddedDate, "%Y-%u") dt, COUNT(*) size FROM users GROUP BY dt
) size ON tb.cohort = size.dt
So the core of this is we grab the users and the date they signed up and format the date by year-week number, since we are doing a weekly cohort.
SELECT
UserId,
DATE_FORMAT(AddedDate, "%Y-%u") AS cohort
FROM users
Since we want to group by the cohort we have to put this in a subquery in the FROM part of the query.
Then we want to join the payment information to the users.
SELECT DISTINCT
payments.UserId,
FLOOR(DATEDIFF(payments.PaymentDate, users.AddedDate)/7) AS Offset
FROM payments
LEFT JOIN users ON (users.UserId = payments.UserId)
This will get unique weekly payment events per user by the number of weeks they have been a user. We use DISTINCT because if a user made 2 purchases in one week, we don't want to count that as two users.
We don't just use the payments table, because some users may sign up and not have payments. So we select from the users table and join on the payments table.
You then group by the week - u.cohort. Then you aggregate on the week numbers to find out how many people made payments the weeks after they signed up.
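To make the offset and DISTINCT logic concrete, here is a minimal Python sketch of the same steps. The sample rows and the helper name cohort_week are made up for illustration; this is not the query itself, just the same arithmetic (days since signup, integer-divided by 7, deduplicated per user).

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical sample rows standing in for the users and payments tables.
users = {1: date(2017, 9, 18), 2: date(2017, 9, 20), 3: date(2017, 9, 26)}
payments = [
    (1, date(2017, 9, 19)),  # user 1, first week after signup
    (1, date(2017, 9, 20)),  # same user, same week: must count once
    (2, date(2017, 9, 28)),  # user 2, second week after signup
    (3, date(2017, 9, 26)),  # user 3, first week after signup
]

def cohort_week(signup):
    """Monday of the week the user signed up in."""
    return signup - timedelta(days=signup.weekday())

# Equivalent of SELECT DISTINCT UserId, FLOOR(DATEDIFF(...)/7) AS Offset.
active = {(uid, (paid - users[uid]).days // 7) for uid, paid in payments}

# Cohort size: number of signups per week.
sizes = defaultdict(int)
for signup in users.values():
    sizes[cohort_week(signup)] += 1

# w1..w7 per cohort: how many distinct users paid at each offset.
counts = defaultdict(lambda: [0] * 7)
for uid, offset in active:
    if 0 <= offset < 7:
        counts[cohort_week(users[uid])][offset] += 1

for week in sorted(sizes):
    print(week, sizes[week], counts[week])
```

Using a set for the (user, offset) pairs plays the role of DISTINCT, so two purchases by the same user in the same week are counted once.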
The version of MySQL I used had sql_mode set to only_full_group_by. So to get the cohort size I put the bulk of the query in a subquery so I could join on the users to get the size of the cohort.
Further considerations:
Filtering by weeks is simple: tb.cohort > start date and tb.cohort < end date, where start and end date are formatted with "%Y-%u". To make the query more efficient you'll probably also want to filter out payment events that fall outside the date range, so you're not joining on data you don't need.
You may want to consider using a calendar table to cover cases where there are no user sign-ups during a week.
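As a rough illustration of the calendar idea, the sketch below generates every week start in a range and fills in a size of 0 where no one signed up. The cohort sizes and the helper week_starts are hypothetical; in SQL the equivalent would be a LEFT JOIN from the calendar table to the cohort result.

```python
from datetime import date, timedelta

def week_starts(first, last):
    """Yield every Monday from the week containing `first` to the week containing `last`."""
    current = first - timedelta(days=first.weekday())
    end = last - timedelta(days=last.weekday())
    while current <= end:
        yield current
        current += timedelta(weeks=1)

# Hypothetical cohort sizes keyed by week start; the week of 2017-09-11 has no signups.
sizes = {date(2017, 9, 4): 3, date(2017, 9, 18): 6}

calendar = list(week_starts(date(2017, 9, 4), date(2017, 9, 24)))

# "LEFT JOIN" the calendar to the sizes, defaulting missing weeks to 0.
filled = [(week, sizes.get(week, 0)) for week in calendar]
```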
Here's a fiddle with everything working: http://sqlfiddle.com/#!9/172dbe/1
To group by months instead of weeks, you need to base Offset on the month:
MONTH(payments.PaymentDate) AS Offset
Also select the cohort by month:
DATE_FORMAT(AddedDate, "%Y-%m") AS cohort_month
And add
ORDER BY tb.cohort_month ASC
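Note that MONTH(payments.PaymentDate) gives the calendar month number rather than the number of months since signup. If what you want is a month offset relative to the signup month (analogous to the weekly Offset), one possible way to compute it is sketched below. The function name month_offset is made up for illustration, and this is a different calculation from the snippet above.

```python
from datetime import date

def month_offset(signup, paid):
    """Whole calendar months between the signup month and the payment month."""
    return (paid.year - signup.year) * 12 + (paid.month - signup.month)

# A payment in the signup month has offset 0; crossing a year boundary still works.
same_month = month_offset(date(2017, 11, 20), date(2017, 11, 30))
across_year = month_offset(date(2017, 11, 20), date(2018, 2, 3))
```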
Related
I am trying to write a complex MySQL query involving 2 tables, auction and revenue. What I need is:
From the auction table, take location and postal code on the basis of user, cat_id, cat and spent, and join with the revenue table (which has the revenue column) so that, for a given cat_id, cat and date, I can figure out the returns that each 'postal' is generating.
Complexities:
User is unique key here
The auction table has a 'spent' column, but it is populated only when the 'event' column is 'Show'; those rows do have a 'cat' entry. 'cat_id' is populated for any event except 'Show'. So I need to map cat_id through 'cat' to the 'Show' rows to get the spend for that cat_id.
The dates have to be compared so that, when joining the tables, the timestamps match within plus or minus 10 minutes. Right now my query uses a 24-hour window.
Aggregate on postal, in descending order, so the postal giving the highest returns comes first.
**Auction Table**
dt user cat_id cat location postal event spent
2020-11-01 22:12:25 1 0 A US X12 Show 2
2020-11-01 22:12:25 1 0 A US X12 Show 2 (duplicate also in table)
2020-11-01 22:12:25 1 6 A US X12 Mid null
2020-11-01 22:13:20 2 0 B UK L23 Show 2
2020-11-01 22:15:24 2 3 B UK L23 End null
**Revenue table**
dt user cat_id revenue
2020-11-01 22:14:45 1 6 null
2020-11-01 22:13:20 2 3 3
Want to create final table(by aggregating on revenue for each 'postal' area):
location postal spend revenue returns
US X12 2 0 0
UK L23 2 3 3/2=1.5
I have written a query but am unable to figure out a solution for the complexities mentioned above:
Select s.location, s.postal, s.spend, e.revenue
From revenue e JOIN
auction s
on e.user = s.user
where s.event in ('Mid','End','Show') and
TO_DATE(CAST(UNIX_TIMESTAMP(e.dt, 'y-M-d') AS TIMESTAMP)) = TO_DATE(CAST(UNIX_TIMESTAMP(s.dt, 'y-M-d') AS TIMESTAMP)) and
s.cat_id in ('3') and
s.cat = 'B'
Any suggestion will be helpful
This answers the question for MySQL, which is the original tag on the question as well as mentioned in the question.
If I understand correctly, your issue is "joining" within a time frame. You can do what you want using a correlated subquery. Then the rest is aggregation, which I think is:
select location, postal, max(spent) as spend, max(revenue) as revenue
from (select a.*,
(select sum(r.revenue)
from revenue r
where r.user = a.user and
r.dt >= a.dt - interval 10 minute and
r.dt <= a.dt + interval 10 minute
) as revenue
from auction a
where a.event in ('Mid', 'End', 'Show') and
a.cat_id in (3) and
a.cat = 'B'
) a
group by location, postal;
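To show what the correlated subquery is doing, here is a small Python sketch of the plus/minus 10 minute match using the user 2 rows from the question. The function name revenue_near and the tuple layout are assumptions made for the example.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)

# Rows copied from the sample tables: auction as (dt, user, location, postal),
# revenue as (dt, user, revenue).
auction = [
    (datetime(2020, 11, 1, 22, 13, 20), 2, "UK", "L23"),
    (datetime(2020, 11, 1, 22, 15, 24), 2, "UK", "L23"),
]
revenue = [
    (datetime(2020, 11, 1, 22, 14, 45), 1, None),
    (datetime(2020, 11, 1, 22, 13, 20), 2, 3),
]

def revenue_near(user, dt):
    """Sum non-null revenue for `user` within +/- WINDOW of `dt`.

    Returns None when there is no non-null match, like SQL SUM over no rows.
    """
    amounts = [amt for r_dt, r_user, amt in revenue
               if r_user == user and abs(r_dt - dt) <= WINDOW and amt is not None]
    return sum(amounts) if amounts else None

# One result per auction row, as the correlated subquery would produce.
matched = [(loc, postal, revenue_near(user, dt)) for dt, user, loc, postal in auction]
```

Both auction rows for user 2 fall within 10 minutes of the single revenue row, which is why the outer query aggregates with max() rather than sum().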
So I have this query for a management platform. It selects users and some of their data, with some filters (such as the group they are in), along with the amount they produced (in $) last month, taken from the last existing record of last month using MAX(created_date). The platform shows how much they produced this month and the previous one (us.amount_produced and up.amount_produced last_month_amount).
The problem is that it doesn't select users that are new (that haven't produced any amount last month), and I need those to be returned too.
Any help is appreciated, thanks
(I was thinking about doing a JOIN or even two queries, but I'm not sure about the best approach)
Note from the examples below that user #3 didn't have any logs in the User_Performance table before February; he was created in February. So the query below won't return him (I need it to return him).
User table structure:
Users
id email login amount_produced created_date
---------------------------------------------
1 foo#bar.com foo 1000 2019-12-20 22:30:01
2 jack#gmail.com jack 0 2019-12-20 22:30:01
3 john#gmail.com john 2000 2020-02-01 00:00:01
User_Group_Config table structure:
User_Group_Config
user_id group_id
---------------------------------------------
1 4
2 1
3 4
User_Performance table structure (this is a log table into which a job inserts data every hour, calculating users' productivity and logging it):
User_Performance
user_id amount_produced created_date
---------------------------------------------
1 500 2020-01-31 22:30:01
2 0 2020-01-31 22:30:01
1 500 2020-01-31 23:30:01
2 0 2020-01-31 23:30:01
1 1000 2020-02-01 00:30:01
2 0 2020-02-01 00:30:01
3 0 2020-02-01 00:30:01
SELECT
us.id,
us.email,
us.login,
ugc.group_id,
up.user_id,
up.amount_produced last_month_amount
FROM
db.User_Performance AS up,
db.User_Group_Config ugc,
db.User AS us
WHERE
created_date IN (SELECT
MAX(created_date)
FROM
User_Performance
WHERE
/* Here it filters only users that have data last month, I need these AND the ones that have no data to return zero here or null or undefined at this row)*/
MONTH(created_date) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH)
GROUP BY user_id)
AND ugc.group_id = 4
AND up.user_id = ugc.user_id
AND us.id = up.user_id;
Desired results (note that user #2 wasn't selected since his group_id is #1):
Results
(current month) (previous month)
id email login amount_produced last_month_amount
---------------------------------------------
1 foo#bar.com foo 1000 500
3 john#gmail.com john 0 null or 0
Test
SELECT
us.id,
us.contact_phone,
us.email,
us.first_name,
us.last_name,
us.login,
ugc.group_id,
us.create_date,
us.expire_date,
us.profile_photo,
us.dashboard_enabled,
us.general_rating,
us.rework_rating,
us.amount_produced,
us.amount_spent,
up.user_id,
up.amount_produced last_month_amount
FROM db.User_Performance AS up
LEFT JOIN db.User_Group_Config ugc ON up.user_id = ugc.user_id AND ugc.group_id = 4
LEFT JOIN db.User us ON us.id = up.user_id
WHERE
up.created_date IN (SELECT
MAX(created_date)
FROM
User_Performance
WHERE
/* Here it filters only users that have data last month, I need these AND the ones that have no data to return zero here or null or undefined at this row)*/
MONTH(created_date) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH)
GROUP BY user_id);
Solved using this, with subquery and JOIN (not the best solution, but a solution):
SELECT
us.id,
us.email,
us.login,
ugc.group_id,
us.amount_produced,
(
SELECT
perf.amount_produced
FROM
User_Performance perf
WHERE
perf.user_id = us.id AND
perf.created_date BETWEEN DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01 00:00:00') and CONCAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), " 23:59:59")
ORDER BY
perf.created_date DESC
LIMIT 1
) as amount_produced_last_month
FROM
User AS us
INNER JOIN
User_Group_Config ugc ON ugc.user_id = us.id
WHERE
ugc.group_id = 4;
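For anyone who wants to check the logic outside the database, here is a minimal Python sketch of the same idea: start from the users in the group and look up the latest log row from the previous calendar month, giving None when there is none. The "today" value and the helper name last_month_amount are assumptions for the example.

```python
from datetime import datetime

# Rows from the User_Performance sample: (user_id, amount_produced, created_date).
performance = [
    (1, 500, datetime(2020, 1, 31, 22, 30, 1)),
    (1, 500, datetime(2020, 1, 31, 23, 30, 1)),
    (1, 1000, datetime(2020, 2, 1, 0, 30, 1)),
    (3, 0, datetime(2020, 2, 1, 0, 30, 1)),
]
group_4_users = [1, 3]  # users whose group_id is 4 in the sample

def last_month_amount(user_id, today):
    """Latest amount logged in the calendar month before `today`, or None."""
    if today.month > 1:
        year, month = today.year, today.month - 1
    else:
        year, month = today.year - 1, 12
    rows = [(created, amount) for uid, amount, created in performance
            if uid == user_id and (created.year, created.month) == (year, month)]
    # max() on (created, amount) picks the most recent row, like ORDER BY ... DESC LIMIT 1.
    return max(rows)[1] if rows else None

today = datetime(2020, 2, 15)  # hypothetical "current date"
result = {uid: last_month_amount(uid, today) for uid in group_4_users}
```

Because the lookup is driven by the user list rather than by the log table, user #3 still appears in the result with no value for last month.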
I want to query how many cars are overbooked for each date (given a date range) and each car type.
SQL Fiddle Schema
The required output for date range 2019-01-01 to 2019-01-06 is following.
date ,car , available
2019-01-01,ECONOMY,-1 (cars available - reservation i.e 2 - 3 = -1)
2019-01-02,ECONOMY, 0 (cars available - reservation i.e 0 - 0 = 0)
2019-01-03,ECONOMY, 0 (cars available - reservation i.e 0 - 0 = 0)
2019-01-04,ECONOMY, 0 (cars available - reservation i.e 0 - 0 = 0)
2019-01-05,ECONOMY, 2 (cars available - reservation i.e 2 - 0 = 2)
2019-01-06,ECONOMY, 2 (cars available - reservation i.e 2 - 0 = 2)
Explanation:
For date 2019-01-01 we have two cars available and the reservation count is 3, so we have overbooked 1 car, i.e. -1
For date 2019-01-02 we have zero cars available as the two cars available are gone on date 2019-01-01 and reservation count is zero so 0-0 = 0
For date 2019-01-05 we have two cars available as the two cars gone on 2019-01-01 are available now so 2 - 0 = 2
I am using MySQL 8
The answer to this question has several parts. First, you need to construct a list of dates. You can do that using a recursive CTE.
Second, you want to generate all combinations of dates and cars. Then count the number of reservations on each day, and subtract this number from the count:
with recursive dates as (
select date('2019-01-01') as dte
union all
select dte + interval 1 day
from dates
where dte < '2019-01-06'
)
select c.name, d.dte,
( c.cnt - count(r.id) ) as available
from dates d cross join
cars c left join
reservations r
on c.name = r.car_type and
d.dte >= r.date_out and
d.dte <= r.date_in
group by c.name, d.dte, c.cnt;
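The same three steps (generate the dates, combine them with the cars, count overlapping reservations) can be sketched in Python as a sanity check. The data here is hypothetical, since the fiddle's rows are not reproduced in this post, and date_range is a made-up helper.

```python
from datetime import date, timedelta

# Hypothetical data: car type -> number of cars, and reservations as
# (car_type, date_out, date_in).
cars = {"ECONOMY": 2}
reservations = [
    ("ECONOMY", date(2019, 1, 1), date(2019, 1, 1)),
    ("ECONOMY", date(2019, 1, 1), date(2019, 1, 2)),
    ("ECONOMY", date(2019, 1, 1), date(2019, 1, 2)),
]

def date_range(start, end):
    """Every date from start to end inclusive, like the recursive CTE."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]

availability = {}
for day in date_range(date(2019, 1, 1), date(2019, 1, 3)):
    for car_type, total in cars.items():  # the CROSS JOIN of dates and cars
        booked = sum(1 for kind, out, back in reservations
                     if kind == car_type and out <= day <= back)
        availability[(day, car_type)] = total - booked
```

A negative value means that day is overbooked for that car type.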
There is a table likes:
like_user_id | like_post_id | like_date
----------------------------------------
1 | 2 | 1399274149
5 | 2 | 1399271149
....
1 | 3 | 1399270129
I need to make one SELECT query that counts records for a specific like_post_id, grouped by periods of 1 day, 7 days, 1 month and 1 year.
The result must be like:
period | total
---------------
1_day | 2
7_days | 31
1_month | 87
1 year | 141
Is it possible?
Thank you.
I have created a query using Oracle syntax; please change it according to your DB.
select '1_Day' as period , count(*) as Total
from likes
where like_date>(sysdate-1)
union
select '7_days' , count(*)
from likes
where like_date>(sysdate-7)
union
select '1_month' , count(*)
from likes
where like_date>(sysdate-30)
union
select '1 year' , count(*)
from likes
where like_date>(sysdate-365)
The idea here is to use one subquery per period and apply that period's filter in its WHERE clause.
This code shows how to build a cross-tab style query that you will likely need. This aggregates by like_post_id and you may want to put restrictions on it. Further, in terms of last month I don't know whether you mean month to date, last 30 days or last calendar month so I've left that to you.
SELECT
like_post_id,
-- cross-tab example, rinse and repeat as required
-- aside of date logic, the SUM(CASE logic is designed to be ANSI compliant but you could use IF instead of CASE
SUM(CASE WHEN FROM_UNIXTIME(like_date)>=DATE_SUB(CURRENT_DATE(), interval 1 day) THEN 1 ELSE 0 END) as 1_day,
...
FROM likes
-- to restrict the number of rows considered
WHERE FROM_UNIXTIME(like_date)>=DATE_SUB(CURRENT_DATE(), interval 1 year)
GROUP BY like_post_id
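Here is a small Python sketch of the conditional-count idea, working directly on Unix timestamps like the like_date column. It assumes trailing windows and treats a month as 30 days and a year as 365 days (as the Oracle answer above does); the function name count_by_period and the sample timestamps are made up.

```python
DAY = 86400  # seconds in a day

# Window length in seconds for each period (30-day month and 365-day year are assumptions).
PERIODS = {"1_day": DAY, "7_days": 7 * DAY, "1_month": 30 * DAY, "1_year": 365 * DAY}

def count_by_period(like_dates, now):
    """Count likes whose Unix timestamp falls inside each trailing window."""
    totals = dict.fromkeys(PERIODS, 0)
    for ts in like_dates:
        age = now - ts
        for name, seconds in PERIODS.items():
            # Same role as SUM(CASE WHEN ... THEN 1 ELSE 0 END) per column.
            if 0 <= age <= seconds:
                totals[name] += 1
    return totals

now = 1399274149 + 3600  # one hour after the newest sample like
totals = count_by_period([1399274149, 1399271149, 1399270129 - 3 * DAY], now)
```

The windows are cumulative here, so a like from an hour ago is counted in all four periods.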
To be flexible, simply make a table time_intervals which holds from_length and to_length in seconds:
CREATE TABLE time_intervals
( id int(11) not null auto_increment primary key,
name varchar(255),
from_seconds int,
to_seconds int
);
The select is then quite straightforward:
select like_post_id, ti.name as interval_name, count(likes.like_post_id) as cnt_likes
from time_intervals ti
left /* or inner */ join likes on likes.like_post_id = 175
and likes.like_date between unix_timestamp(now()) - ti.to_seconds and unix_timestamp(now()) - ti.from_seconds
group by ti.id
With a left join you always get all intervals (even when holes exist); with an inner join you get only the intervals that have likes.
So you change only table time_intervals and can get what you want. The "175" stands for the post you want, and of course you can change to where ... in () if you want.
Here is an alternative using CROSS JOIN. First, the time difference is calculated using the TIMESTAMPDIFF function and the appropriate parameter (DAY/WEEK/MONTH/YEAR). Then, if the counts are equal to 1, then the value is added up. Finally, the CROSS JOIN is made with an inline view containing the names of the periods.
SELECT
periods.period,
CASE periods.period
WHEN '1_day' THEN totals.1_day
WHEN '7_days' THEN totals.7_days
WHEN '1_month' THEN totals.1_month
WHEN '1_year' THEN totals.1_year
END total
FROM
(
SELECT
SUM(CASE days WHEN 2 THEN 1 ELSE 0 END) 1_day,
SUM(CASE weeks WHEN 1 THEN 1 ELSE 0 END) 7_days,
SUM(CASE months WHEN 1 THEN 1 ELSE 0 END) 1_month,
SUM(CASE years WHEN 1 THEN 1 ELSE 0 END) 1_year
FROM
(
SELECT
TIMESTAMPDIFF(YEAR, FROM_UNIXTIME(like_date), NOW()) years,
TIMESTAMPDIFF(MONTH, FROM_UNIXTIME(like_date), NOW()) months,
TIMESTAMPDIFF(WEEK, FROM_UNIXTIME(like_date), NOW()) weeks,
TIMESTAMPDIFF(DAY, FROM_UNIXTIME(like_date), NOW()) days
FROM likes
) counts
) totals
CROSS JOIN
(
SELECT
'1_day' period
UNION ALL
SELECT
'7_days'
UNION ALL
SELECT
'1_month'
UNION ALL
SELECT
'1_year'
) periods
I have a record set for inspections of many pieces of equipment. The four cols of interest are equip_id, month, year, myData.
My requirement is to have EXACTLY ONE record per month for each piece of equipment.
I have a query that makes the data unique over equip_id, month, year. So there is no more than one record for each month/year for a piece of equipment. But now I need to simulate data for the missing month. I want to simply go back in time to get the last piece of my data.
So that may seem confusing, so I'll show by example.
Given this sample data:
equip_id month year myData
-----------------------------
1 1 2010 500
1 2 2010 600
1 5 2010 800
2 2 2010 300
2 4 2010 400
2 6 2010 500
I want this output:
equip_id month year myData
-----------------------------
1 1 2010 500
1 2 2010 600
1 3 2010 600
1 4 2010 600
1 5 2010 800
2 2 2010 300
2 3 2010 300
2 4 2010 400
2 5 2010 400
2 6 2010 500
Notice that I'm filling in missing data with the data from the month (or two months, etc.) before. Also note that if the first record for equip 2 is in 2/2010, then I don't need a record for 1/2010, even though I have one for equip 1.
I just need exactly one record for each month/year for each piece of equipment. So if the record does not exist I just want to go back in time and grab the data for that record.
Thanks!
By no means perfect:
SELECT equip_id, month, mydata
FROM (
SELECT equip_id, month, mydata FROM equip
UNION ALL
SELECT EquipNum.equip_id, EquipNum.Num,
(SELECT Top 1 mydata
FROM equip
WHERE equip.month<EquipNum.Num And equip.equip_id=equipnum.equip_id
ORDER BY equip.month desc) AS Data
FROM
(SELECT e.equip_id, n.Num
FROM
(SELECT DISTINCT equip_id FROM equip) AS e,
Numbers AS n) AS EquipNum
LEFT JOIN equip
ON (EquipNum.Num = equip.month)
AND (EquipNum.equip_id = equip.equip_id)
WHERE EquipNum.Num<DMax("month","equip")
AND
(SELECT top 1 mydata
FROM equip
WHERE equip.month<EquipNum.Num And equip.equip_id=equipnum.equip_id
ORDER BY equip.month desc) Is Not Null
AND equip.equip_id Is Null AND equip.Month Is Null) AS x
ORDER BY equip_id, month
For this to work you need a Numbers table, in this case it needs only hold integers from 1 to 12. The numbers table I used is called Numbers and the field is called Num.
EDIT re years comment
SELECT equip_id, year, month, mydata
FROM (
SELECT equip_id, year, month, mydata FROM equip
UNION ALL
SELECT en.equip_id, en.year, en.Num, (SELECT Top 1 mydata
FROM equip e
WHERE e.month<en.Num And e.year=en.year And e.equip_id=en.equip_id
ORDER BY e.month desc) AS Data
FROM (SELECT e.equip_id, n.Num, y.year
FROM
(SELECT DISTINCT equip_id FROM equip) AS e,
Numbers AS n,
(SELECT DISTINCT year FROM equip) AS y) AS en
LEFT JOIN equip AS e ON en.equip_id = e.equip_id
AND en.year = e.year
AND en.Num = e.month
WHERE en.Num<DMax("month","equip") AND
(SELECT Top 1 mydata
FROM equip e
WHERE e.month<en.Num And e.year=en.year And e.equip_id=en.equip_id
ORDER BY e.month desc) Is Not Null
AND e.equip_id Is Null
AND e.Month Is Null) AS x
ORDER BY equip_id, year, month
I've adjusted to account for year and month. The primary principles remain the same as in the original queries, which used just the month. However, when applying month and year together, you need to test the SET of YEAR + MONTH. For example, what happens with Nov/2009 followed by a jump to Feb/2010? You can't rely on just a month being less than another; you have to compare the "set". So I've applied year * 12 + month. Simply adding the two gives a false ordering: Nov/2009 = 2009 + 11 = 2020, while Feb/2010 = 2010 + 2 = 2012. But 2009 * 12 + 11 = 24108 + 11 = 24119, compared to 2010 * 12 + 2 = 24120 + 2 = 24122, which retains the proper sequence per year/month combination. The rest of the principles apply. One addition: I created a table to represent the span of years to consider (table C_Years, column year, with values 2009, 2010, 2011). For my testing, I added a sample Equip_ID = 1 entry for Nov-2009 and an Equip_ID = 2 entry for Feb-2011, and the roll-over works properly too.
SELECT
PYML.Equip_ID,
PYML.Year,
PYML.Mth,
P1.MyData
FROM
( SELECT
PAll.Equip_ID,
PAll.Year,
PAll.Mth,
( SELECT MAX( P1.Year*12+P1.Mth )
FROM C_Preset P1
WHERE PAll.Equip_ID = P1.Equip_ID
AND P1.Year*12+P1.Mth <= PAll.CurYrMth) as MaxYrMth
FROM
( SELECT
PYM1.Equip_ID,
Y1.Year,
M1.Mth,
Y1.Year*12+M1.Mth as CurYrMth
FROM
( SELECT p.equip_id,
MIN( p.year*12+p.mth ) as MinYrMth,
MAX( p.year*12+p.mth ) as MaxYrMth
FROM
C_Preset p
group by
1
) PYM1,
C_Years Y1,
C_Months M1
WHERE
Y1.Year*12+M1.Mth >= PYM1.MinYrMth
AND Y1.Year*12+M1.Mth <= PYM1.MaxYrMth
) PAll
) PYML,
C_Preset P1
WHERE
PYML.Equip_ID = P1.Equip_ID
AND PYML.MaxYrMth = P1.Year*12+P1.Mth
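The year * 12 + month trick is easy to check in isolation. The sketch below is only an illustration; the helper names year_month_key and key_to_year_month are made up, and the second one is just the inverse mapping in case you need to get back to a year and month.

```python
def year_month_key(year, month):
    """Single sortable number for a (year, month) pair."""
    return year * 12 + month

def key_to_year_month(key):
    """Inverse of year_month_key for months 1..12."""
    year, month_index = divmod(key - 1, 12)
    return year, month_index + 1

nov_2009 = year_month_key(2009, 11)
feb_2010 = year_month_key(2010, 2)

# The combined key keeps Nov/2009 before Feb/2010, unlike plain year + month.
ordered_correctly = nov_2009 < feb_2010
naive_is_wrong = (2009 + 11) > (2010 + 2)
```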
If this is going to be a repetitive thing/report, I would just create a temporary table with 12 months, then use that as the primary table and do a LEFT OUTER JOIN to the rest of your data. This way, you know you'll always get every month, and where a valid join to the "other side" is identified, you'll get that data too. Ooops... missed your point about filling in missing elements from the last element... Thinking...
The following works... and I'll describe the elements to what is going on. First, I created a temp table "C_Months" with a column Mth (month) with numbers 1-12. I used "Mth" as an abbreviation of Month to not cause possible conflict with POSSIBLE reserved word MONTH. Additionally, in my query, the table reference "C_Preset" is the prepared set of data you mentioned you already have of distinct elements.
SELECT
LVM.Equip_ID,
LVM.Mth,
P1.Year,
P1.MyData
FROM
( SELECT
JEM.Equip_ID,
JEM.Mth,
( SELECT MAX( P.Mth )
FROM C_Preset P
WHERE P.Equip_ID = JEM.Equip_ID
AND P.Mth <= JEM.Mth ) as MaxMth
FROM
( SELECT distinct
p.equip_id,
c.mth
FROM
C_months c,
C_Preset p
group by
1, 2
HAVING
c.mth >= MIN( p.Mth )
and c.mth <= MAX( p.Mth )
ORDER BY
1, 2 ) JEM
) LVM,
C_Preset P1
WHERE
LVM.Equip_ID = P1.Equip_ID
AND LVM.MaxMth = P1.Mth
ORDER BY
1, 2
The innermost query is a query of the available months (C_Months) associated with a given equipment ID. In your example, equipment ID 1 had values of 1, 2, 5, so this would return 1, 2, 3, 4, 5. Equipment ID 2 started with 2 and ended with 6, so it would return 2, 3, 4, 5, 6. Hence the aliased reference JEM (Just Equipment Months).
Then, the field selection for MaxMth (Maximum month)... This is the TRICKY ONE
( SELECT MAX( P.Mth )
FROM C_Preset P
WHERE P.Equip_ID = JEM.Equip_ID
AND P.Mth <= JEM.Mth ) as MaxMth
In other words, I want the maximum month AVAILABLE for the given equipment that is AT OR LESS than the month in question (from JEM), detecting the highest "valid" equipment item/month within the qualified list. The result of this would be...
Equip_ID Mth MaxMth
1 1 1
1 2 2
1 3 2
1 4 2
1 5 5
2 2 2
2 3 2
2 4 4
2 5 4
2 6 6
So, for your example of ID = 1, you had months 1, 2, 5 (3 and 4 were missing), so the last valid month that 3 and 4 would refer to is sequence #2's month. Likewise for ID = 2, you had months 2, 4 and 6... Here, 3 would refer back to 2, 5 would refer back to 4.
The rest is the easy part. Now, we join your LVM (Last Valid Month) result as shown above to your original C_Preset (less records). But since we now have the last valid month to directly associate to an existing record in the C_Preset, we join by equipment id and the MaxMth column, and NOT THE ACTUAL month.
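If it helps to see the whole thing end to end, here is a minimal Python sketch of the same "last valid month" idea applied to the sample data from the question (single year, so only the month is used). The function name fill_months is made up for illustration.

```python
# (equip_id, month) -> myData, from the sample data in the question (year 2010).
records = {(1, 1): 500, (1, 2): 600, (1, 5): 800,
           (2, 2): 300, (2, 4): 400, (2, 6): 500}

def fill_months(records):
    """Emit every month between each equipment's first and last record,
    reusing the value from the latest recorded month at or before it (MaxMth)."""
    filled = []
    for equip_id in sorted({e for e, _ in records}):
        months = sorted(m for e, m in records if e == equip_id)
        for mth in range(months[0], months[-1] + 1):
            # Highest recorded month that is <= the month in question.
            max_mth = max(m for m in months if m <= mth)
            filled.append((equip_id, mth, records[(equip_id, max_mth)]))
    return filled

filled = fill_months(records)
```

The range for each equipment starts at its own first month, so equipment 2 gets no row for month 1.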
Hope this helps... Again, you'll probably have to change my "mth" column references to "month" to match your format.