Mysql: Order by max N values from subquery - mysql

I'm about to throw in the towel with this.
Preface: I want to make this work with any N, but for the sake of simplicity, I'll set N to be 3.
I've got a query (MySQL, specifically) that needs to pull in data from a table and sort based on top 3 values from that table and after that fallback to other sort criteria.
So basically I've got something like this:
SELECT tbl.id
FROM
tbl1 AS maintable
LEFT JOIN
tbl2 AS othertable
ON
maintable.id = othertable.id
ORDER BY
othertable.timestamp DESC,
maintable.timestamp DESC
Which is all basic textbook stuff. But the issue is I need the first ORDER BY clause to only get the three biggest values in othertable.timestamp and then fallback on maintable.timestamp.
Also, doing a LIMIT 3 subquery to othertable and join it is a no go as this needs to work with an arbitrary number of WHERE conditions applied to maintable.
I was almost able to make it work with a user variable based approach like this, but it fails since it doesn't take into account ordering, so it'll take the FIRST three othertable values it finds:
ORDER BY
(
IF(othertable.timestamp IS NULL, 0,
IF(
(#rank:=#rank+1) > 3, null, othertable.timestamp
)
)
) DESC
(with a #rank:=0 preceding the statement)
So... any tips on this? I'm losing my mind with the problem. Another parameter I have for this is that since I'm only altering an existing (vastly complicated) query, I can't do a wrapping outer query. Also, as noted, I'm on MySQL so any solutions using the ROW_NUMBER function are unfortunately out of reach.
Thanks to all in advance.
EDIT. Here's some sample data with timestamps dumbed down to simpler integers to illustrate what I need:
maintable
id timestamp
1 100
2 200
3 300
4 400
5 500
6 600
othertable
id timestamp
4 250
5 350
3 550
1 700
=>
1
3
5
6
4
2
And if for whatever reason we add WHERE NOT maintable.id = 5 to the query, here's what we should get:
1
3
4
6
2
...because now 4 is among the top 3 values in othertable referring to this set.
So as you see, the row with id 4 from othertable is not included in the ordering as it's the fourth in descending order of timestamp values, thus it falls back into getting ordered by the basic timestamp.
The real world need for this is this: I've got content in "maintable" and "othertable" is basically a marker for featured content with a timestamp of "featured date". I've got a view where I'm supposed to float the last 3 featured items to the top and the rest of the list just be a reverse chronologic list.

Maybe something like this.
SELECT
id
FROM
(SELECT
tbl.id,
CASE WHEN othertable.timestamp IS NULL THEN
0
ELSE
#i := #i + 1
END AS num,
othertable.timestamp as othertimestamp,
maintable.timestamp as maintimestamp
FROM
tbl1 AS maintable
CROSS JOIN (select #i := 0) i
LEFT JOIN tbl2 AS othertable
ON maintable.id = othertable.id
ORDER BY
othertable.timestamp DESC) t
ORDER BY
CASE WHEN num > 0 AND num <= 3 THEN
othertimestamp
ELSE
maintimestamp
END DESC

Modified answer:
select ilv.* from
(select sq.*, #i:=#i+1 rn from
(select #i := 0) i
CROSS JOIN
(select m.*, o.id o_id, o.timestamp o_t
from maintable m
left join othertable o
on m.id = o.id
where 1=1
order by o.timestamp desc) sq
) ilv
order by case when o_t is not null and rn <=3 then rn else 4 end,
timestamp desc
SQLFiddle here.
Amend where 1=1 condition inside subquery sq to match required complex selection conditions, and add appropriate limit criteria after the final order by for paging requirements.

Can you use a union query as below?
(SELECT id,timestamp,1 AS isFeatured FROM tbl2 ORDER BY timestamp DESC LIMIT 3)
UNION ALL
(SELECT id,timestamp,2 AS isFeatured FROM tbl1 WHERE NOT id in (SELECT id from tbl2 ORDER BY timestamp DESC LIMIT 3))
ORDER BY isFeatured,timestamp DESC
This might be somewhat redundant, but it is semantically closer to the question you are asking. This would also allow you to parameterize the number of featured results you want to return.

Related

Get subtraction done through mysql

I have below mentioned table called myData1
ID Value
1 150
2 120
3 100
I could get the last two values using below query:
SELECT value from myData1 order by ID desc limit 2;
I need to get the subtraction total of these two values (in this result set the result should be 100-120==> -20
Appreciate if someone can help to get this result
Approach 1
Use Correlated Subquery to get the first and second last value as two separate columns.
You can then use the result-set as Derived Table, to compute the difference.
Try (DB Fiddle DEMO #1):
SELECT dt.last_value - dt.second_last_value
FROM
(
SELECT
t1.Value as last_value,
(SELECT t2.Value
FROM myData1 AS t2
WHERE t2.ID < t1.ID
ORDER BY t2.ID DESC LIMIT 1) AS second_last_value
FROM myData1 AS t1
ORDER BY t1.ID DESC LIMIT 1
) AS dt
Approach 2
Break into two different Select Queries; Use Limit with Offset. For last item, use a factor of 1. For second last, use the factor of -1.
Combine these results using UNION ALL into a Derived Table.
Eventually, sum the values using respective factors.
You can do the following (DB Fiddle DEMO #2):
SELECT SUM(dt.factor * dt.Value)
FROM
(
(SELECT Value, 1 AS factor
FROM myData1
ORDER BY ID DESC LIMIT 0,1)
UNION ALL
(SELECT Value, -1 AS factor
FROM myData1
ORDER BY ID DESC LIMIT 1,1)
) AS dt

MySQL grouping with detail

I have a table that looks like this...
user_id, match_id, points_won
1 14 10
1 8 12
1 12 80
2 8 10
3 14 20
3 2 25
I want to write a MYSQL script that pulls back the most points a user has won in a single match and includes the match_id in the results - in other words...
user_id, match_id, max_points_won
1 12 80
2 8 10
3 2 25
Of course if I didn't need the match_id I could just do...
select user_id, max(points_won)
from table
group by user_id
But as soon as I add match_id to the "select" and "group by" I have a row for every match, and if I only add the match_id to the "select" (and not the "group by") then it won't correctly relate to the points_won.
Ideally I don't want to do the following either because it doesn't feel particularly safe (e.g. if the user has won the same amount of points on multiple matches)...
SELECT t.user_id, max(t.points_won) max_points_won
, (select t2.match_id
from table t2
where t2.user_id = t.user_id
and t2.points_won = max_points_won) as 'match_of_points_maximum'
FROM table t
GROUP BY t.user_id
Are there any more elegant options for this problem?
This is harder than it needs to be in MySQL. One method is a bit of a hack but it works in most circumstances. That is the group_concat()/substring_index() trick:
select user_id, max(points_won),
substring_index(group_concat(match_id order by points_won desc), ',', 1)
from table
group by user_id;
The group_concat() concatenates together all the match_ids, ordered by the points descending. The substring_index() then takes the first one.
Two important caveats:
The resulting expression has a type of string, regardless of the internal type.
The group_concat() uses an internal buffer, whose length -- by default -- is 1,024 characters. This default length can be changed.
You can use the query:
select user_id, max(points_won)
from table
group by user_id
as a derived table. Joining this to the original table gets you what you want:
select t1.user_id, t1.match_id, t2.max_points_won
from table as t1
join (
select user_id, max(points_won) as max_points_won
from table
group by user_id
) as t2 on t1.user_id = t2.user_id and t1.points_won = t2.max_points_won
I think you can optimize your query by add limit 1 in the inner query.
SELECT t.user_id, max(t.points_won) max_points_won
, (select t2.match_id
from table t2
where t2.user_id = t.user_id
and t2.points_won = max_points_won limit 1) as 'match_of_points_maximum'
FROM table t
GROUP BY t.user_id
EDIT : only for postgresql, sql-server, oracle
You could use row_number :
SELECT USER_ID, MATCH_ID, POINTS_WON
FROM
(
SELECT user_id, match_id, points_won, row_number() over (partition by user_id order by points_won desc) rn
from table
) q
where q.rn = 1
For a similar function, have a look at Gordon Linoff's answer or at this article.
In your example, you partition your set of result per user then you order by points_won desc to obtain highest winning point first.

Fetch only 2 results, one for each different value of a concrete field

In a MYSQL table with those 5 fields: id, user_id, date, type, uid where type can be 1 or 2, I'm looking for a single query where I can fetch 2 results, one for type=1 and another one for type=2 based on date field.
Right now i have the following query which only gives me the last uid without taking care of the type field.
SELECT t.uid
FROM table AS t
WHERE t.user_id = 666
ORDER BY t.date
DESC LIMIT 1
Does anyone know how should modify this query so i can get the last uid for type=1 and the last one for type=2 based on date field? I would like to keep a a single query
Union all is probably the simplest method:
(select t.*
from t
where t.user_id = 666 and t.type = 1
order by date desc
limit 1
) union all
(select t.*
from t
where t.user_id = 666 and t.type = 2
order by date desc
limit 1
)
Finally i updated the query following this "paradigm":
http://dev.mysql.com/doc/refman/5.7/en/example-maximum-column-group-row.html
http://jan.kneschke.de/projects/mysql/groupwise-max/
This is how the query ended up:
SELECT s1.type, s1.uid
FROM t AS s1
LEFT JOIN t AS s2 ON s1.type = s2.type AND s1.date < s2.date
WHERE s2.date IS NULL;
Here's a visual example: http://hastebin.com/ibinidasuw.vhdl
Credits are for snoyes from #sql on Freenode. :)

MySQL : Group By Clause Not Using Index when used with Case

Im using MySQL
I cant change the DB structure, so thats not an option sadly
THE ISSUE:
When i use GROUP BY with CASE (as need in my situation), MYSQL uses
file_sort and the delay is humongous (approx 2-3minutes):
http://sqlfiddle.com/#!9/f97d8/11/0
But when i dont use CASE just GROUP BY group_id , MYSQL easily uses
index and result is fast:
http://sqlfiddle.com/#!9/f97d8/12/0
Scenerio: DETAILED
Table msgs, containing records of sent messages, with fields:
id,
user_id, (the guy who sent the message)
type, (0=> means it's group msg. All the msgs sent under this are marked by group_id. So lets say group_id = 5 sent 5 msgs, the table will have 5 records with group_id =5 and type=0. For type>0, the group_id will be NULL, coz all other types have no group_id as they are individual msgs sent to single recipient)
group_id (if type=0, will contain group_id, else NULL)
Table contains approx 10 million records for user id 50001 and with different types (i.e group as well as individual msgs)
Now the QUERY:
SELECT
msgs.*
FROM
msgs
INNER JOIN accounts
ON (
msgs.user_id = accounts.id
)
WHERE 1
AND msgs.user_id IN (50111)
AND msgs.type IN (0, 1, 5, 7)
GROUP BY CASE `msgs`.`type` WHEN 0 THEN `msgs`.`group_id` ELSE `msgs`.`id` END
ORDER BY `msgs`.`group_id` DESC
LIMIT 100
I HAVE to get summary in a single QUERY,
so msgs sent to group lets say 5 (have 5 records in this table) will be shown as 1 record for summary (i may show COUNT later, but thats not an issue).
The individual msgs have NULL as group_id, so i cant just put 'GROUP BY group_id ' coz that will Group all individual msgs to single record which is not acceptable.
Sample output can be something like:
id owner_id, type group_id COUNT
1 50001 0 2 5
1 50001 1 NULL 1
1 50001 4 NULL 1
1 50001 0 7 5
1 50001 5 NULL 1
1 50001 5 NULL 1
1 50001 5 NULL 1
1 50001 0 10 5
Now the problem is that the GROUP condition after using CASE (which i currently think that i have to because i only need to group by group_id if type=0) is causing alot of delay coz it's not using indexes which it does if i dont use CASE (like just group by group_id ). Please view SQLFiddles above to see the explain results
Can anyone plz give an advice how to get it optimized
UPDATE
I tried a workaround , that does somehow works out (drops INITIAL queries to 1sec). Using union, what it does is, to minimize the resultset by union that forces SQL to write on disk for filesort (due to huge resultset), limit the resultset of group msgs, and individual msgs (view query below)
-- first part of union retrieves group msgs (that have type 0 and needs to be grouped by group_id). Applies the limit to captivate the out of control result set
-- The second query retrieves individual msgs, (those with type !=0, grouped by msgs.id - not necessary but just to be save from duplicate entries due to joins). Applies the limit to captivate the out of control result set
-- JOins the two to retrieve the desired resultset
Here's the query:
SELECT
*
FROM
(
(
SELECT
msgs.id as reference_id, user_id, type, group_id
FROM
msgs
INNER JOIN accounts
ON (msgs.user_id = accounts.id)
WHERE 1
AND accounts.id IN (50111 ) AND type = 0
GROUP BY msgs.group_id
ORDER BY msgs.id DESC
LIMIT 40
)
UNION
ALL
(
SELECT
msgs.id as reference_id, user_id, type, group_id
FROM
msgs
INNER JOIN accounts
ON (
msgs.user_id = accounts.id
)
WHERE 1
AND msgs.type != 0
AND accounts.id IN (50111)
GROUP BY msgs.id
ORDER BY msgs.id
LIMIT 40
)
) AS temp
ORDER BY reference_id
LIMIT 20,20
But has alot of caveats,
-I need to handle the limit in inner queries as well. Lets say 20recs per page, and im on page 4. For inner queries , i need to apply limit 0,80, since im not sure which of the two parts had how many records in the previous 3 pages. So, as the records per page and number of pages grow, my query grows heavier. Lets say 1k rec per page, and im on page 100 , or 1K, the load gets heavier and time exponentially increases
I need to handle ordering in inner queries and then apply on the resultset prepared by union , conditions need to be applied on both inner queries seperately(but not much of an issue)
-Cant use calc_found_rows, so will need to get count using queries seperately
The main issue is the first one. The higher i go with the pagination , the heavier it gets
Would this run faster?
SELECT id, user_id, type, group_id
FROM
( SELECT id, user_id, type, group_id, IFNULL(group_id, id) AS foo
FROM msgs
WHERE user_id IN (50111)
AND type IN (0, 1, 5, 7)
)
GROUP BY foo
ORDER BY `group_id` DESC
LIMIT 100
It needs INDEX(user_id, type).
Does this give the 'correct' answer?
SELECT DISTINCT *
FROM msgs
WHERE user_id IN (50111)
AND type IN (0, 1, 5, 7)
GROUP BY IFNULL(group_id, id)
ORDER BY `group_id` DESC
LIMIT 100
(It needs the same index)

mysql limit inside group?

I want to limit the size of records inside a group, and here is my trial, how to do it right?
mysql> select * from accounts limit 5 group by type;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual
that corresponds to your MySQL server version for the
right syntax to use near 'group by type' at line 1
The point of an aggregate function (and the GROUP BY it requires) is to turn many rows into one row. So if you really just want the top 5 savings accounts and the top 5 chequing accounts and the top 5 USD accounts etc., what you need is more like this:
criteria: top 5 of particular account type by account_balance
SELECT account_type, account_balance FROM accounts WHERE account_type='savings'
ORDER BY account_balance DESC LIMIT 5
UNION
SELECT account_type, account_balance FROM accounts WHERE account_type='chequing'
ORDER BY account_balance DESC LIMIT 5
UNION
SELECT account_type, account_balance FROM accounts WHERE account_type='USD'
ORDER BY account_balance DESC LIMIT 5;
It's not pretty, but if you construct the SQL with a script then subbing in the account_types and concatenating together a query is straightforward.
I've had some luck with using numbered rows:
set #type = '';
set #num = 0;
select
items.*,
#num := if(#type = item_type, #num + 1, 1) as dummy_1,
#type := item_type as dummy_2,
#num as row_number
from items
group by
item_type,
row_number
having row_number < 3;
This will give you 2 results per item_type. (One gotcha: make sure you re-run the first two set statements otherwise your row numbers will steadily get higher and higher and the row_number < 3 restriction won't work.
I pieced this together from a couple of posts which have been linked in other answers on SO.
It appears you want to limit the number of rows returned within each group of your overall result set... this is difficult to do in a way that scales well. One technique is to perform N joins on the same table with the conditions such that the only rows that match are the top/bottom N that you want.
this page may offer some additional insight into your solution... although returning the top 5 in each group is going to get ugly fast.
Try placing the LIMIT clause after the GROUP BY clause.
EDIT: Try this:
SELECT *
FROM accounts a1
WHERE 5 >
(
SELECT COUNT(*)
FROM accounts a2
WHERE a2.type = a1.type
AND a2.balance > a1.balance
)
This returns at most 5 accounts of each type with the biggest balances.
Group by is used for aggregate functions (sums, averages...)
Is allows you to split the aggregate result into groups. You have not used one of these functions.
I am not sure you can use a limit in the group by. You can probably use it if your group by is a sub select that returns one row/value. For example:
select * from foo order by (select foo2.id from foo2 limit 1)
I am just guessing this would work.
This will probably do the trick, although if type isn't indexed, it'll be sloooowwww. And even with one, it's not especially fast:
SELECT a.*
FROM accounts a
LEFT JOIN accounts a2 ON (a2.type = a.type AND a2.id < a.id)
WHERE count(a2.id) < 5
GROUP BY a.id;
A better bet would be to just order the list by type and then use a loop at the business layer to remove the rows you don't want.
#dnagirl's answer almost has it, but for some reason, my version of MySQL only returns the first LIMIT'd set. To get around that, I put each statement into a subquery
SELECT * FROM (
SELECT account_type, account_balance FROM accounts WHERE account_type='savings'
ORDER BY account_balance DESC LIMIT 5
) as a
UNION
SELECT* FROM (
SELECT account_type, account_balance FROM accounts WHERE account_type='chequing'
ORDER BY account_balance DESC LIMIT 5
) as b
UNION
SELECT * FROM (
SELECT account_type, account_balance FROM accounts WHERE account_type='USD'
ORDER BY account_balance DESC LIMIT 5
) as c
This gave me back each set's results in the final result set. Otherwise, I would have only gotten the first 5 from the first query and nothing else - not sure if it's just some MySQL funk with my version