Use the result of a sub-query outside of the sub-query - mysql

I have a table structured like this.
User_id
Subscription_type
timestamp
100
PAYING
2/10/2021
99
TRIAL
2/10/2021
100
TRIAL
15/9/2021
I want my output to be the same, with an additional column pulling the trial start date when the subscriber converts to a paying subscription.
User_id
Subscription_type
timestamp
Trial_Start_date
100
PAYING
2/10/2021
15/9/2021
99
TRIAL
2/10/2021
100
TRIAL
2/10/2021
At the moment, I have this query:
SELECT *,
CASE WHEN
(SELECT `subscription_type` FROM subscription_event se1
WHERE se1.`timestamp` < se.`timestamp` AND se1.user_id = se.user_id
ORDER BY user_id DESC LIMIT 1) = 'TRIAL'
then se1.`timestamp` else 0 end as "Converted_from_TRIAL"
FROM subscription_event se
I have an error message with se1.timestamp not been defined. I understand why, but I cannot see a workaround.
Any pointer?

If you need to get two values out of the subquery, you have to join with it, not use it as an expression.
SELECT se.*,
MAX(se1.timestamp) AS Converted_from_TRIAL
FROM subscription_event AS se
LEFT JOIN subscription_event AS se1 ON se.user_id = se1.user_id AND se1.timestamp < se.timestamp AND se1.subscription_type = 'TRIAL'
GROUP BY se.user_id, se.subscription_type, se.timestamp

Thanks a lot!
For some reasons I needed to declare explicitely in SELECT the variables used in the GROUP BY . Not sure why ( I am using MySQL5.7 so maybe it is linked with that).
In any case, this is the working query.
SELECT se.user_id, se.subscription_type, se.timestamp,
MAX(se1.timestamp) AS Converted_from_TRIAL
FROM subscription_event AS se
LEFT JOIN subscription_event AS se1 ON se.user_id = se1.user_id AND se1.timestamp < se.timestamp AND se1.subscription_type = 'TRIAL'
GROUP BY se.user_id, se.subscription_type, se.timestamp

Related

Optimizing Parameterized MySQL Queries

I have a query that has a number of parameters which if I run from in MySQLWorkbench takes around a second to run.
If I take this query and get rid of the parameters and instead substitute the values into the query then it takes about 22 seconds to run, same as If I convert this query to a parameterized stored procedure and run it (it then takes about 22 seconds).
I've enabled profiling on MySQL and I can see a few things there. For example, it shows the number of rows examined and there's an order of difference (20,000 to 400,000) which I assume is the reason for the 20x increase in processing time.
The other difference in the profile is that the parameterized query sent from MySQLWorkbench still has the parameters in (e.g. where limit < #lim) while the sproc the values have been set (where limit < 300).
I've tried this a number of different ways, I'm using JetBrains's DataGrip (as well as MySQLWorkbench) and that works like MySQLWorkbench (sends through the # parameters), I've tried executing the queries and the sproc from MySQLWorkbench, DataGrip, Java (JDBC) and .Net. I've also tried prepared statements in Java but I can't get anywhere near the performance of sending the 'raw' SQL to MySQL.
I feel like I'm missing something obvious here but I don't know what it is.
The query is relatively complex, it has a CTE a couple of sub-selects and a couple of joins, but as I said it runs quickly straight from MySQL.
My main question is why the query is 20x faster in one format than another.
Does the way the query is sent to MySQL have anything to do with this (the '#' values sent through and can I replicate this in a stored procedure?
Updated 1st Jan
Thanks for the comments, I didn't post the query originally as I'm more interested in the general concepts around the use of variables/parameters and how I could take advantage of that (or not)
Here is the original query:
with tmp_bat as (select bd.MatchId,
bd.matchtype,
bd.playerid,
bd.teamid,
bd.opponentsid,
bd.inningsnumber,
bd.dismissal,
bd.dismissaltype,
bd.bowlerid,
bd.fielderid,
bd.score,
bd.position,
bd.notout,
bd.balls,
bd.minutes,
bd.fours,
bd.sixes,
bd.hundred,
bd.fifty,
bd.duck,
bd.captain,
bd.wicketkeeper,
m.hometeamid,
m.awayteamid,
m.matchdesignator,
m.matchtitle,
m.location,
m.tossteamid,
m.resultstring,
m.whowonid,
m.howmuch,
m.victorytype,
m.duration,
m.ballsperover,
m.daynight,
m.LocationId
from (select *
from battingdetails
where matchid in
(select id
from matches
where id in (select matchid from battingdetails)
and matchtype = #match_type
)) as bd
join matches m on m.id = bd.matchid
join extramatchdetails emd1
on emd1.MatchId = m.Id
and emd1.TeamId = bd.TeamId
join extramatchdetails emd2
on emd2.MatchId = m.Id
and emd2.TeamId = bd.TeamId
)
select players.fullname name,
teams.teams team,
'' opponents,
players.sortnamepart,
innings.matches,
innings.innings,
innings.notouts,
innings.runs,
HS.score highestscore,
HS.NotOut,
CAST(TRUNCATE(innings.runs / (CAST((Innings.Innings - innings.notOuts) AS DECIMAL)),
2) AS DECIMAL(7, 2)) 'Avg',
innings.hundreds,
innings.fifties,
innings.ducks,
innings.fours,
innings.sixes,
innings.balls,
CONCAT(grounds.CountryName, ' - ', grounds.KnownAs) Ground,
'' Year,
'' CountryName
from (select count(case when inningsnumber = 1 then 1 end) matches,
count(case when dismissaltype != 11 and dismissaltype != 14 then 1 end) innings,
LocationId,
playerid,
MatchType,
SUM(score) runs,
SUM(notout) notouts,
SUM(hundred) Hundreds,
SUM(fifty) Fifties,
SUM(duck) Ducks,
SUM(fours) Fours,
SUM(sixes) Sixes,
SUM(balls) Balls
from tmp_bat
group by MatchType, playerid, LocationId) as innings
JOIN players ON players.id = innings.playerid
join grounds on Grounds.GroundId = LocationId and grounds.MatchType = innings.MatchType
join
(select pt.playerid, t.matchtype, GROUP_CONCAT(t.name SEPARATOR ', ') as teams
from playersteams pt
join teams t on pt.teamid = t.id
group by pt.playerid, t.matchtype)
as teams on teams.playerid = innings.playerid and teams.matchtype = innings.MatchType
JOIN
(SELECT playerid,
LocationId,
MAX(Score) Score,
MAX(NotOut) NotOut
FROM (SELECT battingdetails.playerid,
battingdetails.score,
battingdetails.notout,
battingdetails.LocationId
FROM tmp_bat as battingdetails
JOIN (SELECT battingdetails.playerid,
battingdetails.LocationId,
MAX(battingdetails.Score) AS score
FROM tmp_bat as battingdetails
GROUP BY battingdetails.playerid,
battingdetails.LocationId,
battingdetails.playerid) AS maxscore
ON battingdetails.score = maxscore.score
AND battingdetails.playerid = maxscore.playerid
AND battingdetails.LocationId = maxscore.LocationId ) AS internal
GROUP BY internal.playerid, internal.LocationId) AS HS
ON HS.playerid = innings.playerid and hs.LocationId = innings.LocationId
where innings.runs >= #runs_limit
order by runs desc, KnownAs, SortNamePart
limit 0, 300;
Wherever you see '#match_type' then I substitute that for a value ('t'). This query takes ~1.1 secs to run. The query with the hard coded values rather than the variables down to ~3.5 secs (see the other note below). The EXPLAIN for this query gives this:
1,PRIMARY,<derived7>,,ALL,,,,,219291,100,Using temporary; Using filesort
1,PRIMARY,players,,eq_ref,PRIMARY,PRIMARY,4,teams.playerid,1,100,
1,PRIMARY,<derived2>,,ref,<auto_key3>,<auto_key3>,26,"teams.playerid,teams.matchtype",11,100,Using where
1,PRIMARY,grounds,,ref,GroundId,GroundId,4,innings.LocationId,1,10,Using where
1,PRIMARY,<derived8>,,ref,<auto_key0>,<auto_key0>,8,"teams.playerid,innings.LocationId",169,100,
8,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
8,DERIVED,<derived14>,,ref,<auto_key0>,<auto_key0>,13,"battingdetails.PlayerId,battingdetails.LocationId,battingdetails.Score",10,100,Using index
14,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
7,DERIVED,t,,ALL,PRIMARY,,,,3323,100,Using temporary; Using filesort
7,DERIVED,pt,,ref,TeamId,TeamId,4,t.Id,65,100,
2,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
3,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where
3,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
3,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
3,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
3,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
3,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
and the EXPLAIN for the query with the hardcoded values looks like this:
1,PRIMARY,<derived8>,,ALL,,,,,20097,100,Using temporary; Using filesort
1,PRIMARY,players,,eq_ref,PRIMARY,PRIMARY,4,HS.PlayerId,1,100,
1,PRIMARY,grounds,,ref,GroundId,GroundId,4,HS.LocationId,1,100,Using where
1,PRIMARY,<derived2>,,ref,<auto_key0>,<auto_key0>,30,"HS.LocationId,HS.PlayerId,grounds.MatchType",17,100,Using where
1,PRIMARY,<derived7>,,ref,<auto_key0>,<auto_key0>,46,"HS.PlayerId,innings.MatchType",10,100,Using where
8,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
8,DERIVED,m,,eq_ref,"PRIMARY,LocationId",PRIMARY,4,matches.Id,1,100,
8,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
8,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
8,DERIVED,<derived14>,,ref,<auto_key2>,<auto_key2>,4,m.LocationId,17,100,
8,DERIVED,battingdetails,,ref,"PlayerId,TeamId,Score,MatchId,match_team",MatchId,8,"matches.Id,maxscore.PlayerId",1,3.56,Using where
8,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
14,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
14,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
14,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
14,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
14,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
14,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
7,DERIVED,t,,ALL,PRIMARY,,,,3323,100,Using temporary; Using filesort
7,DERIVED,pt,,ref,TeamId,TeamId,4,t.Id,65,100,
2,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
2,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
2,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
2,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
2,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
2,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
Pointers as to ways to improve my SQL are always welcome (I'm definitely not a database person), but I''d still like to understand whether I can use the SQL with the variables from code and why that improves the performance by so much
Update 2 1st Jan
AAArrrggghhh. My machine rebooted overnight and now the queries are generally running much quicker. It's still 1 sec vs 3 secs but the 20 times slowdown does seem to have disappeared
In your WITH construct, are you overthinking your select in ( select in ( select in ))) ... overstating what could just be simplified to the with Innings I have in my solution.
Also, you were joining to the extraMatchDetails TWICE, but joined on the same conditions on match and team, but never utliized either of those tables in the "WITH CTE" rendering that component useless, doesn't it? However, the MATCH table has homeTeamID and AwayTeamID which is what I THINK your actual intent was
Also, your WITH CTE is pulling many columns not needed or used in subsequent return such as Captain, WicketKeeper.
So, I have restructured... pre-query the batting details once up front and summarized, then you should be able to join off that.
Hopefully this MIGHT be a better fit, function and performance for your needs.
with innings as
(
select
bd.matchId,
bd.matchtype,
bd.playerid,
m.locationId,
count(case when bd.inningsnumber = 1 then 1 end) matches,
count(case when bd.dismissaltype in ( 11, 14 ) then 0 else 1 end) innings,
SUM(bd.score) runs,
SUM(bd.notout) notouts,
SUM(bd.hundred) Hundreds,
SUM(bd.fifty) Fifties,
SUM(bd.duck) Ducks,
SUM(bd.fours) Fours,
SUM(bd.sixes) Sixes,
SUM(bd.balls) Balls
from
battingDetails bd
join Match m
on bd.MatchID = m.MatchID
where
matchtype = #match_type
group by
bd.matchId,
bd.matchType,
bd.playerid,
m.locationId
)
select
p.fullname playerFullName,
p.sortnamepart,
CONCAT(g.CountryName, ' - ', g.KnownAs) Ground,
t.team,
i.matches,
i.innings,
i.runs,
i.notouts,
i.hundreds,
i.fifties,
i.ducks,
i.fours,
i.sixes,
i.balls,
CAST( TRUNCATE( i.runs / (CAST((i.Innings - i.notOuts) AS DECIMAL)), 2) AS DECIMAL(7, 2)) 'Avg',
hs.maxScore,
hs.maxNotOut,
'' opponents,
'' Year,
'' CountryName
from
innings i
JOIN players p
ON i.playerid = p.id
join grounds g
on i.locationId = g.GroundId
and i.matchType = g.matchType
join
(select
pt.playerid,
t.matchtype,
GROUP_CONCAT(t.name SEPARATOR ', ') team
from
playersteams pt
join teams t
on pt.teamid = t.id
group by
pt.playerid,
t.matchtype) as t
on i.playerid = t.playerid
and i.MatchType = t.matchtype
join
( select
i2.playerid,
i2.locationid,
max( i2.score ) maxScore,
max( i2.notOut ) maxNotOut
from
innings i2
group by
i2.playerid,
i2.LocationId ) HS
on i.playerid = HS.playerid
AND i.locationid = HS.locationid
FROM
where
i.runs >= #runs_limit
order by
i.runs desc,
g.KnownAs,
p.SortNamePart
limit
0, 300;
Now, I know that you stated that after the server reboot, performance is better, but really, what you DO have appears to really have overbloated queries.
Not sure this is the correct answer but I thought I'd post this in case other people have the same issue.
The issue seems to be the use of CTEs in a stored procedure. I have a query that creates a CTE and then uses that CTE 8 times. If I run this query using interpolated variables it takes about 0.8 sec, if I turn it into a stored procedure and use the stored procedure parameters then it takes about to a minute (between 45 and 63 seconds) to run!
I've found a couple of ways of fixing this, one is to use multiple temporary tables (8 in this case) as MySQL cannot re-use a temp table in a query. This gets the query time right down but just doesn't fell like a maintainable or scalable solution. The other fix is to leave the variables in place and assign them from the stored procedure parameters, this also has no real performance issues. So my sproc looks like this:
create procedure bowling_individual_career_records_by_year_for_team_vs_opponent(IN team_id INT,
IN opponents_id INT)
begin
set #team_id = team_id;
set #opponents_id = opponents_id;
# use these variables in the SQL below
...
end
Not sure this is the best solution but it works for me and keeps the structure of the SQL the same as it was previously.

My SUM with cases seems to be repeating twice

I have some camp management software that registers users for a camp.
I am trying to get how much a user owes on their account based on how much a camp costs and whether they are using the bus, and whether or not they sign up for the horse option. (These all cost extra).
I originally was grouping by registration_ids which a camper can have multiple of if they sign up for a camp. But when I put this in I get this:
https://imgur.com/i63Bnsu
This is my sql:
SELECT srbc_campers.camper_id,
/*Calculate how much the user owes*/
SUM(
srbc_camps.cost + (CASE WHEN srbc_registration.horse_opt = 1 THEN srbc_camps.horse_opt_cost
ELSE 0
END)
+
(CASE WHEN srbc_registration.busride = 'to' THEN 35
WHEN srbc_registration.busride = 'from' THEN 35
WHEN srbc_registration.busride = 'both' THEN 60
ELSE 0
END)
- IF(srbc_registration.discount IS NULL,0,srbc_registration.discount)
- IF(srbc_registration.scholarship_amt IS NULL,0,srbc_registration.scholarship_amt)
) AS owe
FROM (
srbc_registration INNER JOIN srbc_camps ON srbc_registration.camp_id=srbc_camps.camp_id)
INNER JOIN srbc_payments ON srbc_registration.registration_id = srbc_payments.registration_id)
INNER JOIN srbc_campers ON srbc_campers.camper_id=srbc_registration.camper_id)
WHERE NOT srbc_payments.payment_type='Store'
GROUP BY srbc_campers.camper_id
This seems to be affected by how many payments they have made in their account. It multiplies the amount they owe times how many individual payments were made toward that camp. I can't figure out how to stop this.
For instance in picture above^
We have camper_id #4 and they owe 678.
I expect camper_id #4 to owe 339. They have made 2 payments on their account in srbc_payments.
Haven't been using sql for that long, so any suggestions for a better way I am open too!
You are not selecting anything from srbc_payments, just checking for registration_id in srbc_payments. Or did you forget to subtract payments from srbc_payments? You can replace the inner join with:
where srbc_registration.registration_id in
(
select t1.registration_id from srbc_payments t1
where t1.registration_id = srbc_registration.registration_id
and t1.payment_type <> 'Store'
)
This is what I ended up getting to work how I wanted it too:
SELECT owedTble.registration_id,owe
FROM (SELECT registration_id,
SUM(
srbc_camps.cost + (CASE WHEN srbc_registration.horse_opt = 1 THEN srbc_camps.horse_opt_cost
ELSE 0
END)
+
(CASE WHEN srbc_registration.busride = 'to' THEN 35
WHEN srbc_registration.busride = 'from' THEN 35
WHEN srbc_registration.busride = 'both' THEN 60
ELSE 0
END)
- IF(srbc_registration.discount IS NULL,0,srbc_registration.discount)
- IF(srbc_registration.scholarship_amt IS NULL,0,srbc_registration.scholarship_amt)
) AS owe
FROM srbc_camps INNER JOIN srbc_registration ON srbc_camps.camp_id=srbc_registration.camp_id
GROUP BY srbc_registration.registration_id
) as owedTble
I kind of understand what I did here. I ended up trying different things from this answer: My SUM with cases seems to be repeating twice
Thanks for the helpful comments from #nick and #a_horse_with_no_name

msAccess query with Sum and extra criteria

I need to get all of the Costs values for a Dog in a specific month. When I use this code with Access it says the join operation is not supported. Is there a better way to accomplish this in MS Access? I need all of the dog names to come back even if they don't have a cost associated with them for a specific month
Select Dog.DogName, Dog.DogOwner, Sum(Costs.CostAmount)
From
(Dog Left join Costs on Dog.DogName = Costs.DogName and Costs.CostMonth = 10)
Group by Dog.DogName, Dog.OwnerName
Try this:
Select
Dog.DogName, Dog.DogOwner, Sum(Costs.CostAmount) As TotalAmount
From
Dog
Left join
Costs
On
(Dog.DogName = Costs.DogName)
Where
Costs.CostMonth <= Month(Date())
Or
Costs.CostMonth Is Null
Group by
Dog.DogName, Dog.OwnerName
SELECT Dogs.DogName
, Dogs.OwnerName
, (
SELECT SUM(Costs.CostAmountAmount)
FROM Costs
WHERE Dogs.DogName = Costs.DogName AND
Costs.CostMonth =NumMonth
)
FROM Dogs;

MySQL Ignoring Outliers

I have to present some data to work colleagues and i am having issues analysing it in MySQL.
I have 1 table called 'payments'. Each payment has columns for:
Client (our client e.g. a bank)
Amount_gbp (the GBP equivalent of the value of the transaction)
Currency
Origin_country
Client_type (individual or company)
I have written pretty simple queries like:
SELECT
AVG(amount_GBP),
COUNT(client) AS '#Of Results'
FROM payments
WHERE client_type = 'individual'
AND amount_gbp IS NOT NULL
AND currency = 'TRY'
AND country_origin = 'GB'
AND date_time BETWEEN '2017/1/1' AND '2017/9/1'
But what i really need to do is eliminate outliers from the average AND/OR only include results within a number of Standard Deviations from the Mean.
For example, ignore the top/bottom 10 results of 2% of results etc.
AND/OR ignore any results that fall outside of 2 STDEVs from the Mean
Can anyone help?
--- EDITED ANSWER -- TRY AND LET ME KNOW ---
Your best best is to create a TEMPORARY table with the avg and std_dev values and compare against them. Let me know if that is not feasible:
CREATE TEMPORARY TABLE payment_stats AS
SELECT
AVG(p.amount_gbp) as avg_gbp,
STDDEV(amount_gbp) as std_gbp,
(SELECT MIN(srt.amount_gbp) as max_gbp
FROM (SELECT amount_gbp
FROM payments
<... repeat where no p. ...>
ORDER BY amount_gbp DESC
LIMIT <top_numbers to ignore>
) srt
) max_g,
(SELECT MAX(srt.amount_gbp) as min_gbp
FROM (SELECT amount_gbp
FROM payments
<... repeat where no p. ...>
ORDER BY amount_gbp ASC
LIMIT <top_numbers to ignore>
) srt
) min_g
FROM payments
WHERE client_type = 'individual'
AND amount_gbp IS NOT NULL
AND currency = 'TRY'
AND country_origin = 'GB'
AND date_time BETWEEN '2017/1/1' AND '2017/9/1';
You can then compare against the temp table
SELECT
AVG(p.amount_gbp) as avg_gbp,
COUNT(p.client) AS '#Of Results'
FROM payments p
WHERE
p.amount_gbp >= (SELECT (avg_gbp - std_gbp*2)
FROM payment_stats)
AND p.amount_gbp <= (SELECT (avg_gbp + std_gbp*2)
FROM payment_stats)
AND p.amount_gbp > (SELECT min_g FROM payment_stats)
AND p.amount_gbp < (SELECT max_g FROM payment_stats)
AND p.client_type = 'individual'
AND p.amount_gbp IS NOT NULL
AND p.currency = 'TRY'
AND p.country_origin = 'GB'
AND p.date_time BETWEEN '2017/1/1' AND '2017/9/1';
-- Later on
DROP TEMPORARY TABLE payment_stats;
Notice I had to repeat the WHERE condition. Also change *2 to whatever <factor> to what you need!
Still Phew!
Each compare will check a different stat
Let me know if this is better

Getting the latest 'role-switch' timestamp from a messages table

Problem
I am looking at trying to get the lowest timestamp (earliest) after the 'side' has changed in a ticket conversation, to see how long it has been since the first reply to the latest message.
Example:
A (10:00) : Hello
A (10:05) : How are you?
B (10:06) : I'm fine, thank you
B (10:08) : How about you?
A (10:10) : I'm fine too, thank you <------
A (10:15) : I have to go now, see you around!
Now what I am looking for is the timestamp of the message indicated by the arrow. The first message after the 'side' of the conversation changed, in this case from user to support.
Example data from table "messages":
mid conv_id uid created_at message type
2750 1 3941 1341470051 Hello support
3615 1 3941 1342186946 How are you? support
4964 1 2210 1343588022 I'm fine, thank you user
4965 1 2210 1343588129 How about you? user
5704 1 3941 1344258743 I'm fine too, thank you support
5706 1 3941 1344258943 I have to go now, see you around! support
What I have tried so far:
select
n.nid AS `node_id`,
(
SELECT m_inner.created_at
FROM messages m_inner
WHERE m_inner.mid = messages.mid AND
CASE
WHEN MAX(m_support.created_at) < MAX(m_user.created_at) THEN -- latest reply from user
m_support.created_at
ELSE
m_user.created_at
END <= m_inner.created_at
ORDER BY messages.created_at ASC
LIMIT 0,1
) AS `latest_role_switch_timestamp`
from
node n
left join messages m on n.nid = messages.nid
left join messages m_user on n.nid = m_user.nid and m_user.type = 'user'
left join messages m_support on n.nid = m_support.nid and m_support.type = 'support'
GROUP BY messages.type, messages.nid
ORDER BY messages.nid, messages.created_at DESC
Preferred result:
node_id latest_role_switch_timestamp
1 1344258743
But this has not yielded any results for the subquery. Am I looking in the right direction or should I try something else? I don't know if this would be possible in mysql.
Also this uses a subquery, which, for performance reasons, is not ideal, considering this query will probably be used in overviews, meaning it would have to run that subquery for every message in the overview.
If you require any more information, please tell me, as I am at my wit's end
Join the table to a max-date summary of itself to get the messages of the last block, then use mysql's special group-by support to pick the first row from those for each conversation:
select * from (
select * from (
select m.*
from messages m
join (
select conv_id, type, max(created_at) last_created
from messages
group by 1,2) x
on x.conv_id = m.conv_id
and x.type != m.type
and x.last_created < m.created_at) y
order by created_at) z
group by conv_id
This returns the whole row that was the first message of the last block.
See SQLFiddle.
Performance will be pretty good, because there are no correlated subqueries.