Use the result of a sub-query outside of the sub-query - mysql
I have a table structured like this.
User_id  Subscription_type  timestamp
100      PAYING             2/10/2021
99       TRIAL              2/10/2021
100      TRIAL              15/9/2021
I want my output to be the same, with an additional column pulling the trial start date when the subscriber converts to a paying subscription.
User_id  Subscription_type  timestamp  Trial_Start_date
100      PAYING             2/10/2021  15/9/2021
99       TRIAL              2/10/2021
100      TRIAL              2/10/2021
At the moment, I have this query:
SELECT *,
CASE WHEN
(SELECT `subscription_type` FROM subscription_event se1
WHERE se1.`timestamp` < se.`timestamp` AND se1.user_id = se.user_id
ORDER BY user_id DESC LIMIT 1) = 'TRIAL'
then se1.`timestamp` else 0 end as "Converted_from_TRIAL"
FROM subscription_event se
I get an error saying that se1.timestamp is not defined. I understand why, but I cannot see a workaround.
Any pointers?
If you need to get two values out of the subquery, you have to join with it, not use it as an expression.
SELECT se.*,
MAX(se1.timestamp) AS Converted_from_TRIAL
FROM subscription_event AS se
LEFT JOIN subscription_event AS se1 ON se.user_id = se1.user_id AND se1.timestamp < se.timestamp AND se1.subscription_type = 'TRIAL'
GROUP BY se.user_id, se.subscription_type, se.timestamp
Thanks a lot!
For some reason I needed to explicitly list in the SELECT the columns used in the GROUP BY. Not sure why (I am using MySQL 5.7, so maybe it is linked to that).
In any case, this is the working query.
SELECT se.user_id, se.subscription_type, se.timestamp,
MAX(se1.timestamp) AS Converted_from_TRIAL
FROM subscription_event AS se
LEFT JOIN subscription_event AS se1 ON se.user_id = se1.user_id AND se1.timestamp < se.timestamp AND se1.subscription_type = 'TRIAL'
GROUP BY se.user_id, se.subscription_type, se.timestamp
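To see the self-join at work, here is a minimal sketch that runs the working query against the three sample rows. It is not the original setup: it assumes SQLite (via Python's built-in sqlite3) as a stand-in for MySQL, and it stores the dates as ISO strings (an assumption) so that the `<` comparison orders them correctly.

```python
import sqlite3

# Hypothetical in-memory copy of the question's table; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscription_event "
    "(user_id INTEGER, subscription_type TEXT, timestamp TEXT)"
)
# Dates rewritten as ISO strings (assumption) so string comparison matches date order.
conn.executemany(
    "INSERT INTO subscription_event VALUES (?, ?, ?)",
    [
        (100, "PAYING", "2021-10-02"),
        (99, "TRIAL", "2021-10-02"),
        (100, "TRIAL", "2021-09-15"),
    ],
)

# Same shape as the working query: each row is joined to the earlier TRIAL rows
# of the same user, and MAX() picks the most recent of them (NULL if none).
rows = conn.execute(
    """
    SELECT se.user_id, se.subscription_type, se.timestamp,
           MAX(se1.timestamp) AS Converted_from_TRIAL
    FROM subscription_event AS se
    LEFT JOIN subscription_event AS se1
           ON se.user_id = se1.user_id
          AND se1.timestamp < se.timestamp
          AND se1.subscription_type = 'TRIAL'
    GROUP BY se.user_id, se.subscription_type, se.timestamp
    ORDER BY se.user_id DESC, se.timestamp DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Only the PAYING row of user 100 has an earlier TRIAL row to join to, so it is the only row with a non-NULL Converted_from_TRIAL.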
Related
Optimizing Parameterized MySQL Queries
I have a query with a number of parameters which, if I run it from MySQL Workbench, takes around a second to run. If I take this query, get rid of the parameters and substitute the values into the query instead, it takes about 22 seconds to run, the same as if I convert the query to a parameterized stored procedure and run that. I've enabled profiling on MySQL and I can see a few things there. For example, it shows the number of rows examined, and there is an order of magnitude of difference (20,000 vs 400,000), which I assume is the reason for the 20x increase in processing time. The other difference in the profile is that the parameterized query sent from MySQL Workbench still has the parameters in it (e.g. where limit < @lim), while in the sproc the values have been set (where limit < 300). I've tried this a number of different ways. I'm using JetBrains's DataGrip (as well as MySQL Workbench), and that works like MySQL Workbench (it sends through the @ parameters). I've tried executing the queries and the sproc from MySQL Workbench, DataGrip, Java (JDBC) and .Net. I've also tried prepared statements in Java, but I can't get anywhere near the performance of sending the 'raw' SQL to MySQL. I feel like I'm missing something obvious here but I don't know what it is. The query is relatively complex: it has a CTE, a couple of sub-selects and a couple of joins, but as I said it runs quickly straight from MySQL. My main question is why the query is 20x faster in one format than another. Does the way the query is sent to MySQL have anything to do with this (the '@' values sent through), and can I replicate this in a stored procedure?
Updated 1st Jan
Thanks for the comments. I didn't post the query originally as I'm more interested in the general concepts around the use of variables/parameters and how I could take advantage of that (or not). Here is the original query:
with tmp_bat as (
    select bd.MatchId, bd.matchtype, bd.playerid, bd.teamid, bd.opponentsid,
           bd.inningsnumber, bd.dismissal, bd.dismissaltype, bd.bowlerid, bd.fielderid,
           bd.score, bd.position, bd.notout, bd.balls, bd.minutes, bd.fours, bd.sixes,
           bd.hundred, bd.fifty, bd.duck, bd.captain, bd.wicketkeeper,
           m.hometeamid, m.awayteamid, m.matchdesignator, m.matchtitle, m.location,
           m.tossteamid, m.resultstring, m.whowonid, m.howmuch, m.victorytype,
           m.duration, m.ballsperover, m.daynight, m.LocationId
    from (select *
          from battingdetails
          where matchid in (select id
                            from matches
                            where id in (select matchid from battingdetails)
                              and matchtype = @match_type)) as bd
    join matches m on m.id = bd.matchid
    join extramatchdetails emd1 on emd1.MatchId = m.Id and emd1.TeamId = bd.TeamId
    join extramatchdetails emd2 on emd2.MatchId = m.Id and emd2.TeamId = bd.TeamId
)
select players.fullname name, teams.teams team, '' opponents, players.sortnamepart,
       innings.matches, innings.innings, innings.notouts, innings.runs,
       HS.score highestscore, HS.NotOut,
       CAST(TRUNCATE(innings.runs / (CAST((Innings.Innings - innings.notOuts) AS DECIMAL)), 2) AS DECIMAL(7, 2)) 'Avg',
       innings.hundreds, innings.fifties, innings.ducks, innings.fours, innings.sixes, innings.balls,
       CONCAT(grounds.CountryName, ' - ', grounds.KnownAs) Ground, '' Year, '' CountryName
from (select count(case when inningsnumber = 1 then 1 end) matches,
             count(case when dismissaltype != 11 and dismissaltype != 14 then 1 end) innings,
             LocationId, playerid, MatchType,
             SUM(score) runs, SUM(notout) notouts, SUM(hundred) Hundreds, SUM(fifty) Fifties,
             SUM(duck) Ducks, SUM(fours) Fours, SUM(sixes) Sixes, SUM(balls) Balls
      from tmp_bat
      group by MatchType, playerid, LocationId) as innings
JOIN players ON players.id = innings.playerid
join grounds on Grounds.GroundId = LocationId and grounds.MatchType = innings.MatchType
join (select pt.playerid, t.matchtype, GROUP_CONCAT(t.name SEPARATOR ', ') as teams
      from playersteams pt
      join teams t on pt.teamid = t.id
      group by pt.playerid, t.matchtype) as teams
  on teams.playerid = innings.playerid and teams.matchtype = innings.MatchType
JOIN (SELECT playerid, LocationId, MAX(Score) Score, MAX(NotOut) NotOut
      FROM (SELECT battingdetails.playerid, battingdetails.score, battingdetails.notout, battingdetails.LocationId
            FROM tmp_bat as battingdetails
            JOIN (SELECT battingdetails.playerid, battingdetails.LocationId, MAX(battingdetails.Score) AS score
                  FROM tmp_bat as battingdetails
                  GROUP BY battingdetails.playerid, battingdetails.LocationId, battingdetails.playerid) AS maxscore
              ON battingdetails.score = maxscore.score
             AND battingdetails.playerid = maxscore.playerid
             AND battingdetails.LocationId = maxscore.LocationId) AS internal
      GROUP BY internal.playerid, internal.LocationId) AS HS
  ON HS.playerid = innings.playerid and hs.LocationId = innings.LocationId
where innings.runs >= @runs_limit
order by runs desc, KnownAs, SortNamePart
limit 0, 300;
Wherever you see '@match_type', I substitute a value ('t'). This query takes ~1.1 secs to run. The query with the hard-coded values rather than the variables is down to ~3.5 secs (see the other note below).
The EXPLAIN for this query gives this:
1,PRIMARY,<derived7>,,ALL,,,,,219291,100,Using temporary; Using filesort
1,PRIMARY,players,,eq_ref,PRIMARY,PRIMARY,4,teams.playerid,1,100,
1,PRIMARY,<derived2>,,ref,<auto_key3>,<auto_key3>,26,"teams.playerid,teams.matchtype",11,100,Using where
1,PRIMARY,grounds,,ref,GroundId,GroundId,4,innings.LocationId,1,10,Using where
1,PRIMARY,<derived8>,,ref,<auto_key0>,<auto_key0>,8,"teams.playerid,innings.LocationId",169,100,
8,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
8,DERIVED,<derived14>,,ref,<auto_key0>,<auto_key0>,13,"battingdetails.PlayerId,battingdetails.LocationId,battingdetails.Score",10,100,Using index
14,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
7,DERIVED,t,,ALL,PRIMARY,,,,3323,100,Using temporary; Using filesort
7,DERIVED,pt,,ref,TeamId,TeamId,4,t.Id,65,100,
2,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
3,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where
3,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
3,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
3,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
3,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
3,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
and the EXPLAIN for the query with the hardcoded values looks like this:
1,PRIMARY,<derived8>,,ALL,,,,,20097,100,Using temporary; Using filesort
1,PRIMARY,players,,eq_ref,PRIMARY,PRIMARY,4,HS.PlayerId,1,100,
1,PRIMARY,grounds,,ref,GroundId,GroundId,4,HS.LocationId,1,100,Using where
1,PRIMARY,<derived2>,,ref,<auto_key0>,<auto_key0>,30,"HS.LocationId,HS.PlayerId,grounds.MatchType",17,100,Using where
1,PRIMARY,<derived7>,,ref,<auto_key0>,<auto_key0>,46,"HS.PlayerId,innings.MatchType",10,100,Using where
8,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
8,DERIVED,m,,eq_ref,"PRIMARY,LocationId",PRIMARY,4,matches.Id,1,100,
8,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
8,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
8,DERIVED,<derived14>,,ref,<auto_key2>,<auto_key2>,4,m.LocationId,17,100,
8,DERIVED,battingdetails,,ref,"PlayerId,TeamId,Score,MatchId,match_team",MatchId,8,"matches.Id,maxscore.PlayerId",1,3.56,Using where
8,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
14,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
14,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
14,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
14,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
14,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
14,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
7,DERIVED,t,,ALL,PRIMARY,,,,3323,100,Using temporary; Using filesort
7,DERIVED,pt,,ref,TeamId,TeamId,4,t.Id,65,100,
2,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
2,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
2,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
2,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
2,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
2,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
Pointers as to ways to improve my SQL are always welcome (I'm definitely not a database person), but I'd still like to understand whether I can use the SQL with the variables from code, and why that improves the performance by so much.
Update 2, 1st Jan
AAArrrggghhh. My machine rebooted overnight and now the queries are generally running much quicker. It's still 1 sec vs 3 secs, but the 20-times slowdown does seem to have disappeared.
In your WITH construct, are you overthinking your select in (select in (select in ...))? It could be simplified to the WITH innings I have in my solution. Also, you were joining to extramatchdetails TWICE, on the same match and team conditions, but never utilized either of those joins in the WITH CTE, which renders that component useless, doesn't it? However, the matches table has hometeamid and awayteamid, which is what I THINK your actual intent was. Also, your WITH CTE is pulling many columns not needed or used in the subsequent result, such as captain and wicketkeeper. So, I have restructured: pre-query the batting details once up front, summarized, and then you should be able to join off that. Hopefully this MIGHT be a better fit, function and performance for your needs.
with innings as (
    select bd.matchId, bd.matchtype, bd.playerid, m.locationId,
           count(case when bd.inningsnumber = 1 then 1 end) matches,
           count(case when bd.dismissaltype in ( 11, 14 ) then null else 1 end) innings,
           SUM(bd.score) runs, SUM(bd.notout) notouts, SUM(bd.hundred) Hundreds,
           SUM(bd.fifty) Fifties, SUM(bd.duck) Ducks, SUM(bd.fours) Fours,
           SUM(bd.sixes) Sixes, SUM(bd.balls) Balls
    from battingDetails bd
    join matches m on bd.MatchId = m.Id
    where matchtype = @match_type
    group by bd.matchId, bd.matchType, bd.playerid, m.locationId
)
select p.fullname playerFullName, p.sortnamepart,
       CONCAT(g.CountryName, ' - ', g.KnownAs) Ground,
       t.team, i.matches, i.innings, i.runs, i.notouts, i.hundreds, i.fifties,
       i.ducks, i.fours, i.sixes, i.balls,
       CAST( TRUNCATE( i.runs / (CAST((i.Innings - i.notOuts) AS DECIMAL)), 2) AS DECIMAL(7, 2)) 'Avg',
       hs.maxScore, hs.maxNotOut, '' opponents, '' Year, '' CountryName
from innings i
JOIN players p ON i.playerid = p.id
join grounds g on i.locationId = g.GroundId and i.matchType = g.matchType
join (select pt.playerid, t.matchtype, GROUP_CONCAT(t.name SEPARATOR ', ') team
      from playersteams pt
      join teams t on pt.teamid = t.id
      group by pt.playerid, t.matchtype) as t
  on i.playerid = t.playerid and i.MatchType = t.matchtype
join (select i2.playerid, i2.locationid, max( i2.score ) maxScore, max( i2.notOut ) maxNotOut
      from innings i2
      group by i2.playerid, i2.LocationId) HS
  on i.playerid = HS.playerid AND i.locationid = HS.locationid
where i.runs >= @runs_limit
order by i.runs desc, g.KnownAs, p.SortNamePart
limit 0, 300;
Now, I know that you stated that performance is better after the server reboot, but really, what you DO have appears to be overbloated queries.
Not sure this is the correct answer, but I thought I'd post this in case other people have the same issue. The issue seems to be the use of CTEs in a stored procedure. I have a query that creates a CTE and then uses that CTE 8 times. If I run this query using interpolated variables it takes about 0.8 sec; if I turn it into a stored procedure and use the stored procedure parameters, it takes about a minute (between 45 and 63 seconds) to run! I've found a couple of ways of fixing this. One is to use multiple temporary tables (8 in this case), as MySQL cannot re-use a temp table in a query. This gets the query time right down but just doesn't feel like a maintainable or scalable solution. The other fix is to leave the variables in place and assign them from the stored procedure parameters; this also has no real performance issues. So my sproc looks like this:
create procedure bowling_individual_career_records_by_year_for_team_vs_opponent(IN team_id INT, IN opponents_id INT)
begin
    set @team_id = team_id;
    set @opponents_id = opponents_id;
    # use these variables in the SQL below
    ...
end
Not sure this is the best solution, but it works for me and keeps the structure of the SQL the same as it was previously.
My SUM with cases seems to be repeating twice
I have some camp management software that registers users for a camp. I am trying to get how much a user owes on their account based on how much a camp costs, whether they are using the bus, and whether or not they sign up for the horse option (these all cost extra). I originally was grouping by registration_id, of which a camper can have several if they sign up for camps. But when I put this in I get this: https://imgur.com/i63Bnsu
This is my SQL:
SELECT srbc_campers.camper_id,
       /* Calculate how much the user owes */
       SUM(srbc_camps.cost
           + (CASE WHEN srbc_registration.horse_opt = 1 THEN srbc_camps.horse_opt_cost ELSE 0 END)
           + (CASE WHEN srbc_registration.busride = 'to' THEN 35
                   WHEN srbc_registration.busride = 'from' THEN 35
                   WHEN srbc_registration.busride = 'both' THEN 60
                   ELSE 0 END)
           - IF(srbc_registration.discount IS NULL, 0, srbc_registration.discount)
           - IF(srbc_registration.scholarship_amt IS NULL, 0, srbc_registration.scholarship_amt)
       ) AS owe
FROM (((srbc_registration
        INNER JOIN srbc_camps ON srbc_registration.camp_id = srbc_camps.camp_id)
       INNER JOIN srbc_payments ON srbc_registration.registration_id = srbc_payments.registration_id)
      INNER JOIN srbc_campers ON srbc_campers.camper_id = srbc_registration.camper_id)
WHERE NOT srbc_payments.payment_type = 'Store'
GROUP BY srbc_campers.camper_id
This seems to be affected by how many payments they have made on their account. It multiplies the amount they owe by the number of individual payments made toward that camp. I can't figure out how to stop this. For instance, in the picture above we have camper_id #4 and they owe 678. I expect camper_id #4 to owe 339. They have made 2 payments on their account in srbc_payments. I haven't been using SQL for that long, so I am open to any suggestions for a better way!
You are not selecting anything from srbc_payments, just checking for registration_id in srbc_payments. Or did you forget to subtract payments from srbc_payments? You can replace the inner join with:
where srbc_registration.registration_id in (
    select t1.registration_id
    from srbc_payments t1
    where t1.registration_id = srbc_registration.registration_id
      and t1.payment_type <> 'Store'
)
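The row multiplication behind the doubled total can be reproduced on a tiny data set. The sketch below is an illustration only: it uses SQLite through Python's sqlite3 rather than MySQL, the two tables and their columns are simplified, hypothetical stand-ins for srbc_registration and srbc_payments, and it uses EXISTS as an equivalent form of the IN filter suggested above.

```python
import sqlite3

# Simplified, hypothetical stand-ins for srbc_registration and srbc_payments.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE registration (registration_id INTEGER, camper_id INTEGER, cost INTEGER);
    CREATE TABLE payments (payment_id INTEGER, registration_id INTEGER, payment_type TEXT);
    INSERT INTO registration VALUES (1, 4, 339);
    INSERT INTO payments VALUES (10, 1, 'Card'), (11, 1, 'Cash');
    """
)

# Joining to payments yields one row per payment, so the cost is summed twice.
joined = conn.execute(
    """
    SELECT r.camper_id, SUM(r.cost)
    FROM registration r
    JOIN payments p ON p.registration_id = r.registration_id
    WHERE p.payment_type <> 'Store'
    GROUP BY r.camper_id
    """
).fetchall()

# Filtering with EXISTS keeps exactly one row per registration.
filtered = conn.execute(
    """
    SELECT r.camper_id, SUM(r.cost)
    FROM registration r
    WHERE EXISTS (SELECT 1
                  FROM payments p
                  WHERE p.registration_id = r.registration_id
                    AND p.payment_type <> 'Store')
    GROUP BY r.camper_id
    """
).fetchall()

print(joined)    # total inflated by the number of payments
print(filtered)  # one cost per registration
```

With two payments on one registration, the joined version reports 678 for camper 4 and the filtered version reports 339, matching the numbers in the question.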
This is what I ended up getting to work how I wanted it to:
SELECT owedTble.registration_id, owe
FROM (SELECT registration_id,
             SUM(srbc_camps.cost
                 + (CASE WHEN srbc_registration.horse_opt = 1 THEN srbc_camps.horse_opt_cost ELSE 0 END)
                 + (CASE WHEN srbc_registration.busride = 'to' THEN 35
                         WHEN srbc_registration.busride = 'from' THEN 35
                         WHEN srbc_registration.busride = 'both' THEN 60
                         ELSE 0 END)
                 - IF(srbc_registration.discount IS NULL, 0, srbc_registration.discount)
                 - IF(srbc_registration.scholarship_amt IS NULL, 0, srbc_registration.scholarship_amt)
             ) AS owe
      FROM srbc_camps
      INNER JOIN srbc_registration ON srbc_camps.camp_id = srbc_registration.camp_id
      GROUP BY srbc_registration.registration_id
     ) as owedTble
I kind of understand what I did here. I ended up trying different things from this answer: My SUM with cases seems to be repeating twice
Thanks for the helpful comments from @nick and @a_horse_with_no_name
msAccess query with Sum and extra criteria
I need to get all of the Costs values for a Dog in a specific month. When I use this code with Access it says the join operation is not supported. Is there a better way to accomplish this in MS Access? I need all of the dog names to come back even if they don't have a cost associated with them for a specific month.
Select Dog.DogName, Dog.DogOwner, Sum(Costs.CostAmount)
From (Dog Left join Costs on Dog.DogName = Costs.DogName and Costs.CostMonth = 10)
Group by Dog.DogName, Dog.OwnerName
Try this:
Select Dog.DogName, Dog.DogOwner, Sum(Costs.CostAmount) As TotalAmount
From Dog Left join Costs On (Dog.DogName = Costs.DogName)
Where Costs.CostMonth <= Month(Date()) Or Costs.CostMonth Is Null
Group by Dog.DogName, Dog.OwnerName
SELECT Dogs.DogName,
       Dogs.OwnerName,
       ( SELECT SUM(Costs.CostAmount)
         FROM Costs
         WHERE Dogs.DogName = Costs.DogName
           AND Costs.CostMonth = NumMonth )
FROM Dogs;
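The correlated-subquery form can be checked on a small data set. This is only a sketch under assumptions: it runs on SQLite through Python's sqlite3 instead of Access, the sample dogs and costs are invented, and the column names Dog.OwnerName and Costs.CostAmount are guesses at the intended schema. For contrast, it also runs a LEFT JOIN whose month test sits in the WHERE clause, to show which dogs that variant drops.

```python
import sqlite3

# Invented sample data; table and column names are assumptions about the schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Dog (DogName TEXT, OwnerName TEXT);
    CREATE TABLE Costs (DogName TEXT, CostMonth INTEGER, CostAmount REAL);
    INSERT INTO Dog VALUES ('Rex', 'Ann'), ('Fido', 'Bob'), ('Spot', 'Cy');
    INSERT INTO Costs VALUES ('Rex', 10, 20.0), ('Rex', 10, 5.0),
                             ('Rex', 9, 7.0), ('Fido', 9, 12.0);
    """
)

# Correlated subquery: every dog comes back, with NULL when the month has no costs.
subquery_rows = conn.execute(
    """
    SELECT d.DogName, d.OwnerName,
           (SELECT SUM(c.CostAmount)
            FROM Costs c
            WHERE c.DogName = d.DogName AND c.CostMonth = ?) AS TotalAmount
    FROM Dog d
    ORDER BY d.DogName
    """,
    (10,),
).fetchall()

# Month test in WHERE: a dog whose only costs fall in other months disappears.
where_rows = conn.execute(
    """
    SELECT d.DogName, SUM(c.CostAmount) AS TotalAmount
    FROM Dog d
    LEFT JOIN Costs c ON d.DogName = c.DogName
    WHERE c.CostMonth = ? OR c.CostMonth IS NULL
    GROUP BY d.DogName
    ORDER BY d.DogName
    """,
    (10,),
).fetchall()

print(subquery_rows)
print(where_rows)
```

Fido has costs only in month 9, so the WHERE-filtered join loses that row, while the subquery version still lists Fido with a NULL total.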
MySQL Ignoring Outliers
I have to present some data to work colleagues and I am having issues analysing it in MySQL. I have 1 table called 'payments'. Each payment has columns for:
Client (our client, e.g. a bank)
Amount_gbp (the GBP equivalent of the value of the transaction)
Currency
Origin_country
Client_type (individual or company)
I have written pretty simple queries like:
SELECT AVG(amount_GBP), COUNT(client) AS '#Of Results'
FROM payments
WHERE client_type = 'individual'
  AND amount_gbp IS NOT NULL
  AND currency = 'TRY'
  AND country_origin = 'GB'
  AND date_time BETWEEN '2017/1/1' AND '2017/9/1'
But what I really need to do is eliminate outliers from the average AND/OR only include results within a number of standard deviations from the mean. For example, ignore the top/bottom 10 results or 2% of results, etc., AND/OR ignore any results that fall outside of 2 STDEVs from the mean. Can anyone help?
--- EDITED ANSWER -- TRY AND LET ME KNOW ---
Your best bet is to create a TEMPORARY table with the avg and std_dev values and compare against them. Let me know if that is not feasible:
CREATE TEMPORARY TABLE payment_stats AS
SELECT AVG(p.amount_gbp) as avg_gbp,
       STDDEV(amount_gbp) as std_gbp,
       (SELECT MIN(srt.amount_gbp) as max_gbp
        FROM (SELECT amount_gbp
              FROM payments
              <... repeat where no p. ...>
              ORDER BY amount_gbp DESC
              LIMIT <top_numbers to ignore> ) srt ) max_g,
       (SELECT MAX(srt.amount_gbp) as min_gbp
        FROM (SELECT amount_gbp
              FROM payments
              <... repeat where no p. ...>
              ORDER BY amount_gbp ASC
              LIMIT <top_numbers to ignore> ) srt ) min_g
FROM payments p
WHERE client_type = 'individual'
  AND amount_gbp IS NOT NULL
  AND currency = 'TRY'
  AND country_origin = 'GB'
  AND date_time BETWEEN '2017/1/1' AND '2017/9/1';
You can then compare against the temp table:
SELECT AVG(p.amount_gbp) as avg_gbp, COUNT(p.client) AS '#Of Results'
FROM payments p
WHERE p.amount_gbp >= (SELECT (avg_gbp - std_gbp*2) FROM payment_stats)
  AND p.amount_gbp <= (SELECT (avg_gbp + std_gbp*2) FROM payment_stats)
  AND p.amount_gbp > (SELECT min_g FROM payment_stats)
  AND p.amount_gbp < (SELECT max_g FROM payment_stats)
  AND p.client_type = 'individual'
  AND p.amount_gbp IS NOT NULL
  AND p.currency = 'TRY'
  AND p.country_origin = 'GB'
  AND p.date_time BETWEEN '2017/1/1' AND '2017/9/1';
-- Later on
DROP TEMPORARY TABLE payment_stats;
Notice I had to repeat the WHERE condition. Also change *2 to whatever <factor> you need. Still, phew! Each comparison checks a different stat. Let me know if this is better.
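The 2-standard-deviation filter can be tried on a handful of numbers. The sketch below is a simplification, not the answer's exact query: it assumes SQLite through Python's sqlite3 (which has no STDDEV), so it derives the population variance from AVG(x*x) - AVG(x)^2 and compares squared deviations against 4 times the variance, which is the same test as "within 2 standard deviations". The sample amounts are invented and the other WHERE filters are left out.

```python
import sqlite3

# Invented sample amounts with one obvious outlier (1000).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (amount_gbp INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?)",
    [(v,) for v in [10, 12, 11, 13, 9, 11, 10, 12, 1000]],
)

# Plain average, outlier included.
(avg_all,) = conn.execute("SELECT AVG(amount_gbp) FROM payments").fetchone()

# Keep rows with (x - mean)^2 <= 4 * variance, i.e. |x - mean| <= 2 std devs.
# Population variance is computed as AVG(x*x) - AVG(x)^2 because SQLite lacks STDDEV.
avg_trimmed, n_kept = conn.execute(
    """
    WITH stats AS (
        SELECT AVG(amount_gbp) AS mean,
               AVG(amount_gbp * amount_gbp) - AVG(amount_gbp) * AVG(amount_gbp) AS variance
        FROM payments
    )
    SELECT AVG(p.amount_gbp), COUNT(*)
    FROM payments p
    CROSS JOIN stats s
    WHERE (p.amount_gbp - s.mean) * (p.amount_gbp - s.mean) <= 4 * s.variance
    """
).fetchone()

print(avg_all, avg_trimmed, n_kept)
```

On MySQL the stats subquery could use STDDEV directly, as in the answer; the squared comparison here is only a workaround for the stand-in database.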
Getting the latest 'role-switch' timestamp from a messages table
Problem
I am trying to get the lowest (earliest) timestamp after the 'side' has changed in a ticket conversation, to see how long it has been since the first reply to the latest message. Example:
A (10:00) : Hello
A (10:05) : How are you?
B (10:06) : I'm fine, thank you
B (10:08) : How about you?
A (10:10) : I'm fine too, thank you <------
A (10:15) : I have to go now, see you around!
What I am looking for is the timestamp of the message indicated by the arrow: the first message after the 'side' of the conversation changed, in this case from user to support.
Example data from table "messages":
mid   conv_id  uid   created_at  message                            type
2750  1        3941  1341470051  Hello                              support
3615  1        3941  1342186946  How are you?                       support
4964  1        2210  1343588022  I'm fine, thank you                user
4965  1        2210  1343588129  How about you?                     user
5704  1        3941  1344258743  I'm fine too, thank you            support
5706  1        3941  1344258943  I have to go now, see you around!  support
What I have tried so far:
select n.nid AS `node_id`,
       ( SELECT m_inner.created_at
         FROM messages m_inner
         WHERE m_inner.mid = messages.mid
           AND CASE WHEN MAX(m_support.created_at) < MAX(m_user.created_at)
                    THEN -- latest reply from user
                         m_support.created_at
                    ELSE m_user.created_at
               END <= m_inner.created_at
         ORDER BY messages.created_at ASC
         LIMIT 0,1 ) AS `latest_role_switch_timestamp`
from node n
left join messages m on n.nid = messages.nid
left join messages m_user on n.nid = m_user.nid and m_user.type = 'user'
left join messages m_support on n.nid = m_support.nid and m_support.type = 'support'
GROUP BY messages.type, messages.nid
ORDER BY messages.nid, messages.created_at DESC
Preferred result:
node_id  latest_role_switch_timestamp
1        1344258743
But this has not yielded any results for the subquery. Am I looking in the right direction or should I try something else? I don't know if this would be possible in MySQL.
Also, this uses a subquery, which is not ideal for performance reasons: the query will probably be used in overviews, meaning it would have to run that subquery for every message in the overview. If you require any more information, please tell me, as I am at my wit's end.
Join the table to a max-date summary of itself to get the messages of the last block, then use MySQL's special group-by support to pick the first row from those for each conversation:
select * from (
    select * from (
        select m.*
        from messages m
        join (select conv_id, type, max(created_at) last_created
              from messages
              group by 1,2) x
          on x.conv_id = m.conv_id
         and x.type != m.type
         and x.last_created < m.created_at) y
    order by created_at) z
group by conv_id
This returns the whole row that was the first message of the last block. See SQLFiddle. Performance will be pretty good, because there are no correlated subqueries.
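The max-date self-join can be exercised on the sample rows from the question. The sketch below is a variation, not the answer's exact query: it assumes SQLite through Python's sqlite3, and instead of relying on MySQL's non-standard GROUP BY to return a whole row, it takes MIN(created_at) per conversation, which yields only the timestamp. It also assumes each conversation contains exactly the two types shown ('user' and 'support').

```python
import sqlite3

# Sample rows copied from the question's "messages" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages "
    "(mid INTEGER, conv_id INTEGER, uid INTEGER, created_at INTEGER, message TEXT, type TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?)",
    [
        (2750, 1, 3941, 1341470051, "Hello", "support"),
        (3615, 1, 3941, 1342186946, "How are you?", "support"),
        (4964, 1, 2210, 1343588022, "I'm fine, thank you", "user"),
        (4965, 1, 2210, 1343588129, "How about you?", "user"),
        (5704, 1, 3941, 1344258743, "I'm fine too, thank you", "support"),
        (5706, 1, 3941, 1344258943, "I have to go now, see you around!", "support"),
    ],
)

# x holds the last timestamp per (conversation, type). A message belongs to the
# final block when it is newer than the last message of the other type; the
# earliest such message is the role switch.
rows = conn.execute(
    """
    SELECT m.conv_id, MIN(m.created_at) AS latest_role_switch_timestamp
    FROM messages m
    JOIN (SELECT conv_id, type, MAX(created_at) AS last_created
          FROM messages
          GROUP BY conv_id, type) x
      ON x.conv_id = m.conv_id
     AND x.type <> m.type
     AND x.last_created < m.created_at
    GROUP BY m.conv_id
    """
).fetchall()

print(rows)
```

For the sample conversation this yields the timestamp of message 5704, the row marked with the arrow in the question. A conversation in which only one side has written produces no row, since there is no switch to report.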