Selecting one row from a group in SQL with multiple selectors - mysql

(This question is specific to MySQL 5.6, which does not include CTEs)
Let's say I have a table like this (actual table is made from a subquery with several joins and is a fair bit more complex):
ID NAME POWER ACCESS_LEVEL DEPTH
1 Odin poetry 1 0
1 Isis song 2 1
2 Enki water 1 0
2 Zeus storms 2 2
2 Thor hammer 2 3
I want to first group them up by ID (it's actually a double grouping by PRINCIPAL_TYPE, PRINCIPAL_ID if that matters), then select one row from each group, preferring the row with the highest ACCESS_LEVEL, and among rows with the same access, choosing the one with the lowest depth. No two rows will have the same (ID, DEPTH) so we don't need to worry beyond that point.
For example with the above table, we select:
ID NAME POWER ACCESS_LEVEL DEPTH
1 Isis song 2 1
2 Zeus storms 2 2
The groups are (Odin, Isis) and (Enki, Thor, Zeus). In the first group, we prefer Isis over Odin because Isis has a higher ACCESS_LEVEL. In the second group, we take Zeus because Zeus and Thor have higher ACCESS_LEVELs than Enki, and between those two, Zeus has the lower depth.
Worst case Ontario, I can do it at the application level, but doing it at the database level allows for using LIMIT and ORDER BY to do paging, instead of fetching the whole result set.

Here is one method that uses a correlated subquery:
select t.*
from t
where (access_level, depth) = (select access_level, depth
from t t2
where t2.id = t.id
order by access_level desc, depth asc
limit 1
);
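To check the correlated subquery against the sample rows, here is a minimal sketch that runs it in an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL 5.6 (an assumption), and the table name t follows the answer above.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption); rows mirror the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INT, name TEXT, power TEXT, access_level INT, depth INT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Odin", "poetry", 1, 0),
        (1, "Isis", "song", 2, 1),
        (2, "Enki", "water", 1, 0),
        (2, "Zeus", "storms", 2, 2),
        (2, "Thor", "hammer", 2, 3),
    ],
)

# For each row, look up the best (access_level, depth) pair of its group
# and keep the row only if it carries exactly that pair.
rows = conn.execute(
    """
    SELECT t.*
    FROM t
    WHERE (t.access_level, t.depth) = (SELECT t2.access_level, t2.depth
                                       FROM t t2
                                       WHERE t2.id = t.id
                                       ORDER BY t2.access_level DESC, t2.depth ASC
                                       LIMIT 1)
    ORDER BY t.id
    """
).fetchall()

for row in rows:
    print(row)
```

With the sample data this keeps the Isis row for ID 1 and the Zeus row for ID 2, matching the expected result in the question.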

You may need a derived table built from grouped subqueries:
select m2.*
from my_table m2
inner join (
select m.id, t.level, min(m.depth) depth
from my_table m
inner join (
select id, max(access_level) level
from my_table
group by id ) t on t.id = m.id and t.level = m.access_level
group by m.id, t.level ) t2 on t2.id = m2.id
and t2.depth = m2.depth and t2.level = m2.access_level

This may work.
select *
from your_table t1
where (t1.access_level, t1.depth) = (
select t2.access_level, min(t2.depth)
from your_table t2
where (t2.access_level, t2.id) = (
select max(t3.access_level), t3.id
from your_table t3
where t3.id = t1.id
)
)
However, I feel that Gordon's solution will probably lead to a better query plan.

Related

MYSQL JOIN sorted join, lost

I have two tables:
tbl1:
id | name | tid
1 | some text | 1
tbl2:
tid | level | related_id
1 | 1 | 4
1 | 2 | 5
1 | 3 | 6
I want to join tbl1 to tbl2 on tbl1.tid = tbl2.tid, but I only want one row joined from tbl2, chosen by level. For example, I want the row with the lowest level (here the level 1 row) to be joined.
Joined table:
id | name | tid | level | related_id
1 | some text | 1 | 1 | 4
Is it possible to achieve this?
Try the following CTE, noting that row_number() function is supported only with MySQL v8.0 and higher.
with cte as
(select
tbl1.id, tbl1.name, tbl2.tid, tbl2.level_, tbl2.related_id
, row_number() over (partition by tbl1.id order by level_) as rn
from tbl1 inner join tbl2
on tbl1.tid=tbl2.tid)
select cte.id, cte.name, cte.tid, cte.level_, cte.related_id
from cte where rn=1
See the result from db-fiddle.
In the comments, I suggested using min(level) over (partition by tid) in a where clause, but this will not give the required results unless the level field is unique, which I guess is not the case.
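On MySQL versions without window functions, one alternative (not taken from the answer above) is to join tbl2 to a derived table that holds the lowest level per tid. A minimal sketch, run against an in-memory SQLite database from Python as a stand-in for MySQL (an assumption), using the sample rows and the column name level from the question:

```python
import sqlite3

# SQLite is only a stand-in for MySQL; schema and rows mirror the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl1 (id INT, name TEXT, tid INT);
    CREATE TABLE tbl2 (tid INT, level INT, related_id INT);
    INSERT INTO tbl1 VALUES (1, 'some text', 1);
    INSERT INTO tbl2 VALUES (1, 1, 4), (1, 2, 5), (1, 3, 6);
    """
)

# The derived table m holds the lowest level per tid; joining tbl2 back to it
# keeps only the tbl2 row(s) that carry that lowest level.
joined = conn.execute(
    """
    SELECT tbl1.id, tbl1.name, tbl2.tid, tbl2.level, tbl2.related_id
    FROM tbl1
    JOIN tbl2 ON tbl2.tid = tbl1.tid
    JOIN (SELECT tid, MIN(level) AS min_level
          FROM tbl2
          GROUP BY tid) m
      ON m.tid = tbl2.tid AND m.min_level = tbl2.level
    """
).fetchall()

print(joined)
```

If two tbl2 rows share the lowest level for the same tid, both are returned, so a further tiebreaker would be needed in that case.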

select only when different value

I have a table with the columns name and price. I don't really know how or why, but in my MySQL database there are a few rows that are exact duplicates of the previous row.
How can I select all records, but show only one of them when a record duplicates the row right before or after it?
For example, I have these records:
id | name | price
1 | book | 5
2 | lamp | 7
3 | lamp | 7
4 | book | 5
5 | book | 5
The result I want is:
id | name | price
1 | book | 5
2 | lamp | 7
4 | book | 5
If you want to exclude rows that match the previous name, there are several ways like the following.
Case 1:
If you use MySQL8, you can use the LAG function.
SELECT t1.id,t1.name,t1.price FROM (
SELECT t2.id,t2.name,t2.price,
LAG(t2.name) OVER(ORDER BY t2.id) prev
FROM mytable t2
) t1
WHERE t1.prev IS NULL OR t1.name<>t1.prev
ORDER BY 1
Case 2:
If the ids are continuous without any gaps, you will get the expected result by joining each row to the row with the previous id and keeping the rows whose name has no match there.
SELECT t1.id,t1.name,t1.price FROM mytable t1
LEFT JOIN mytable t2
ON t1.name=t2.name AND
t2.id=t1.id-1
WHERE t2.id IS NULL
ORDER BY 1
Case 3:
If the ids are not continuous, you can join on the maximum id that is lower than the current id instead.
SELECT t1.id,t1.name,t1.price FROM mytable t1
LEFT JOIN mytable t2
ON t1.name=t2.name AND
t2.id=(SELECT MAX(t3.id) FROM mytable t3 WHERE t3.id<t1.id)
WHERE t2.id IS NULL
ORDER BY 1
DB Fiddle
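A minimal sketch of the previous-row comparison for ids with gaps, run in an in-memory SQLite database from Python (SQLite is a stand-in for MySQL here, which is an assumption). The names and prices are the ones from the question, but the ids are changed to 1, 3, 4, 8, 9 (hypothetical values) so that the gaps are actually exercised:

```python
import sqlite3

# SQLite stands in for MySQL (an assumption). The ids are hypothetical and
# deliberately non-continuous; names and prices come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INT, name TEXT, price INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "book", 5), (3, "lamp", 7), (4, "lamp", 7), (8, "book", 5), (9, "book", 5)],
)

# t2 is the row with the largest id below t1.id, joined only if it has the
# same name. Rows without such a match start a new run and are kept.
kept = conn.execute(
    """
    SELECT t1.id, t1.name, t1.price
    FROM mytable t1
    LEFT JOIN mytable t2
      ON t2.name = t1.name
     AND t2.id = (SELECT MAX(t3.id) FROM mytable t3 WHERE t3.id < t1.id)
    WHERE t2.id IS NULL
    ORDER BY t1.id
    """
).fetchall()

print(kept)
```

Only the first row of each run of equal names survives; the very first row is kept because it has no predecessor to match.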
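SELECT DISTINCT is not an option here, as the id column is always unique. If you want to collapse every repeated (name, price) pair rather than only adjacent duplicates, this will work; note that for the sample data it returns only ids 1 and 2:
select min(id), name, price from table_name group by name, price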

SQL: SUM selected Rows [duplicate]

How can you select the top n max values from a table?
For a table like this:
column1 column2
1 foo
2 foo
3 foo
4 foo
5 bar
6 bar
7 bar
8 bar
For n=2, the result needs to be:
3
4
7
8
The approach below selects only the max value for each group.
SELECT max(column1) FROM table GROUP BY column2
Returns:
4
8
For n=2 you could use:
SELECT max(column1) m
FROM table t
GROUP BY column2
UNION
SELECT max(column1) m
FROM table t
WHERE column1 NOT IN (SELECT max(t2.column1)
FROM table t2
WHERE t2.column2 = t.column2)
GROUP BY column2
For any n, you could use the approaches described here to simulate rank over partition.
EDIT:
Actually this article will give you exactly what you need.
Basically it is something like this
SELECT t.*
FROM
(SELECT grouper,
(SELECT val
FROM table li
WHERE li.grouper = dlo.grouper
ORDER BY
li.grouper, li.val DESC
LIMIT 2,1) AS mid
FROM
(
SELECT DISTINCT grouper
FROM table
) dlo
) lo, table t
WHERE t.grouper = lo.grouper
AND t.val > lo.mid
Replace grouper with the name of the column you want to group by and val with the name of the column that hold the values.
To work out how exactly it functions go step-by-step from the most inner query and run them.
Also, there is a slight simplification: the subquery that finds mid can return NULL if a category does not have enough values, so it should be wrapped in COALESCE with some constant that makes sense in the comparison (in your case it would be the MIN of the domain of val; in the article it is the MAX).
EDIT2:
I forgot to mention that it is the LIMIT 2,1 that determines the n (LIMIT n,1).
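The threshold idea, including the COALESCE fallback, can be sketched as follows. This is a simplified variant that puts the correlated subquery directly in the WHERE clause instead of a derived table, run in an in-memory SQLite database from Python as a stand-in for MySQL (an assumption). The table name tab, the extra baz row, and the sentinel -1 (which assumes all values are non-negative) are hypothetical additions for illustration.

```python
import sqlite3

# SQLite stands in for MySQL (an assumption). The 'baz' row is a hypothetical
# addition to show what happens when a group has fewer than n values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (column1 INT, column2 TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [(1, "foo"), (2, "foo"), (3, "foo"), (4, "foo"),
     (5, "bar"), (6, "bar"), (7, "bar"), (8, "bar"),
     (9, "baz")],
)

n = 2
# The subquery picks the (n+1)-th largest value of the row's group as a
# threshold. COALESCE replaces a missing threshold (group too small) with -1,
# so every row of such a group passes the comparison.
rows = conn.execute(
    """
    SELECT t.column1
    FROM tab t
    WHERE t.column1 > COALESCE(
        (SELECT li.column1
         FROM tab li
         WHERE li.column2 = t.column2
         ORDER BY li.column1 DESC
         LIMIT 1 OFFSET ?), -1)
    ORDER BY t.column1
    """,
    (n,),
).fetchall()
top_n = [r[0] for r in rows]

print(top_n)
```

LIMIT 1 OFFSET n is the same request as the LIMIT n,1 form used above. Without the COALESCE, the single baz row would be dropped because a comparison with NULL is never true.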
If you are using MySQL, why don't you use the LIMIT functionality?
Sort the records in descending order and limit the result to the top n, i.e.:
SELECT yourColumnName FROM yourTableName
ORDER BY Id desc
LIMIT 0,3
Starting with MySQL 8.0 (and in MariaDB), window functions are supported; they are designed for this kind of operation:
SELECT *
FROM (SELECT *,ROW_NUMBER() OVER(PARTITION BY column2 ORDER BY column1 DESC) AS r
FROM tab) s
WHERE r <= 2
ORDER BY column2 DESC, r DESC;
DB-Fiddle.com Demo
This is how I'm getting the N max rows per group in MySQL
SELECT co.id, co.person, co.country
FROM person co
WHERE (
SELECT COUNT(*)
FROM person ci
WHERE co.country = ci.country AND co.id < ci.id
) < 1
;
How it works:
The subquery is correlated with the outer query and reads the same table again.
Groups are formed by co.country = ci.country.
The number of rows per group is controlled by ) < 1; for 3 rows per group use ) < 3.
Whether you get the max or the min depends on the comparison:
co.id < ci.id - max
co.id > ci.id - min
Full example here:
mysql select n max values per group/
mysql select max and return multiple values
Note: Keep in mind that additional constraints like gender = 0 must be applied in both places. So if you want to get males only, you should apply the constraint to both the inner and the outer select.
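Applied to the column1/column2 table from the question, the counting approach can be sketched like this. It runs in an in-memory SQLite database from Python as a stand-in for MySQL (an assumption), and the table name tab is a placeholder.

```python
import sqlite3

# SQLite stands in for MySQL (an assumption); rows mirror the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (column1 INT, column2 TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [(1, "foo"), (2, "foo"), (3, "foo"), (4, "foo"),
     (5, "bar"), (6, "bar"), (7, "bar"), (8, "bar")],
)

n = 2
# For each row, count how many rows of the same group have a larger value.
# A row belongs to the top n of its group if fewer than n rows beat it.
rows = conn.execute(
    """
    SELECT co.column1
    FROM tab co
    WHERE (
        SELECT COUNT(*)
        FROM tab ci
        WHERE ci.column2 = co.column2 AND co.column1 < ci.column1
    ) < ?
    ORDER BY co.column1
    """,
    (n,),
).fetchall()
top_n = [r[0] for r in rows]

print(top_n)
```

Flipping the comparison to co.column1 > ci.column1 would select the n smallest values per group instead.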

Greatest-n-per-group within a second group

I've got a database where each entry is an edge with a source tag, a relationship and a weight. I want to perform a query where given a source tag, I get the top n edges by weight with that source tag per relationship.
For example, given the entries
Id Source Relationship End Weight
-----------------------------------------
1 cat isA feline 56
2 cat isA animal 12
3 cat isA pet 37
4 cat desires food 5
5 cat desires play 88
6 dog isA canine 72
If I queried using "cat" as a source and n=2, the result should be
Id Source Relationship End Weight
-----------------------------------------
1 cat isA feline 56
3 cat isA pet 37
4 cat desires food 5
5 cat desires play 88
I've tried several different approaches based on other questions.
The most successful so far is based on How to SELECT the newest four items per category?
SELECT *
FROM tablename t1
JOIN tablename t2 ON (t1.relationship = t2.relationship)
LEFT OUTER JOIN tablename t3
ON (t1.relationship = t3.relationship AND t2.weight < t3.weight)
WHERE t1.source = "cat"
AND t3.relationship IS NULL
ORDER BY t2.weight DESC;
However, this returns all the edges with source="cat" in sorted order. If I try to add LIMIT, I get the edges with the top weights not by group.
The other thing that I have tried is
SELECT *
FROM tablename t1
WHERE t1.source="cat"
AND (
SELECT COUNT(*)
FROM tablename t2
WHERE t1.relationship = t2.relationship
AND t1.weight <= t2.weight
) <= 2;
This returns
Id Source Relationship End Weight
-----------------------------------------
1 cat isA feline 56
4 cat desires food 5
5 cat desires play 88
Edge 3 is missing because edge 6 has a higher weight for the isA relationship than edge 3 and is therefore counted by the subquery, even though edge 6 itself is excluded from the results because its source is "dog".
I am very new to databases, so if I need to take a completely different approach, let me know. I'm not afraid of starting over.
Doing this with the correlated subquery is indeed inefficient, because MySQL has to run the subquery for every row of the outer query, just to decide if the row in the outer query meets the conditions. That's a lot of overhead.
Here's a method using no subquery:
SELECT t1.*
FROM tablename t1
JOIN tablename t2 ON t1.source = t2.source and t1.relationship = t2.relationship
AND t1.weight <= t2.weight
WHERE t1.source = 'cat'
GROUP BY t1.id
HAVING COUNT(*) <= 2;
And here's a method using neither subquery, nor joins/group by:
SELECT *
FROM (
SELECT tablename.*, IF(@r = relationship, @n:=@n+1, @n:=1) AS _n,
@r:=relationship AS _r
FROM (SELECT @r:=null, @n:=1) _init, tablename
WHERE source = 'cat'
ORDER BY relationship, weight DESC
) AS _t
WHERE _n <= 2;
These solutions also need some tiebreaker in case there are multiple rows with the same top weights. But that applies to all the solutions.
The simpler solution, which wouldn't require special gymnastics or tiebreakers, is to use SQL window functions like ROW_NUMBER() OVER (PARTITION BY relationship), but MySQL does not support these before version 8.0.
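The join and GROUP BY method can be checked against the sample edges with a short sketch. It runs in an in-memory SQLite database from Python as a stand-in for MySQL (an assumption); the column name end_node is a hypothetical rename of End, which is a keyword in SQLite.

```python
import sqlite3

# SQLite stands in for MySQL (an assumption). end_node is a hypothetical
# column name replacing "End"; the rows mirror the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename "
    "(id INT, source TEXT, relationship TEXT, end_node TEXT, weight INT)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?, ?)",
    [
        (1, "cat", "isA", "feline", 56),
        (2, "cat", "isA", "animal", 12),
        (3, "cat", "isA", "pet", 37),
        (4, "cat", "desires", "food", 5),
        (5, "cat", "desires", "play", 88),
        (6, "dog", "isA", "canine", 72),
    ],
)

n = 2
# Each edge is joined to the edges of the same source and relationship that
# weigh at least as much (itself included). At most n such edges means the
# edge is among the top n of its group.
rows = conn.execute(
    """
    SELECT t1.id
    FROM tablename t1
    JOIN tablename t2
      ON t1.source = t2.source
     AND t1.relationship = t2.relationship
     AND t1.weight <= t2.weight
    WHERE t1.source = 'cat'
    GROUP BY t1.id
    HAVING COUNT(*) <= ?
    ORDER BY t1.id
    """,
    (n,),
).fetchall()
top_ids = [r[0] for r in rows]

print(top_ids)
```

Because the join also matches on source, the dog edge no longer pushes edge 3 out of the isA group.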
It won't be too efficient, but MySQL allows you to do something like this:
SELECT t1.*
FROM
tablename t1 INNER JOIN (
SELECT SUBSTRING_INDEX(
GROUP_CONCAT(Id ORDER BY Weight DESC),
',',
2) top_2
FROM tablename
WHERE Source='cat'
GROUP BY Relationship) t2
ON FIND_IN_SET(t1.id, t2.top_2);
Please see fiddle here.

Fetch 2nd Highest value from MySQL DB with GROUP BY

I have a table tbl_patient and I want to fetch the last 2 visits of each patient in order to compare whether the patient's condition is improving or degrading.
tbl_patient
id | patient_ID | visit_ID | patient_result
1 | 1 | 1 | 5
2 | 2 | 1 | 6
3 | 2 | 3 | 7
4 | 1 | 2 | 3
5 | 2 | 3 | 2
6 | 1 | 3 | 9
I tried the query below to fetch the last visit of each patient:
SELECT MAX(id), patient_result FROM `tbl_patient` GROUP BY `patient_ID`
Now I want to fetch the 2nd-to-last visit of each patient with the query below, but it gives me an error
(#1242 - Subquery returns more than 1 row)
SELECT id, patient_result FROM `tbl_patient` WHERE id <(SELECT MAX(id) FROM `tbl_patient` GROUP BY `patient_ID`) GROUP BY `patient_ID`
Where am I going wrong?
select p1.patient_id, p2.maxid id1, max(p1.id) id2
from tbl_patient p1
join (select patient_id, max(id) maxid
from tbl_patient
group by patient_id) p2
on p1.patient_id = p2.patient_id and p1.id < p2.maxid
group by p1.patient_id
id1 is the ID of the last visit, id2 is the ID of the 2nd to last visit.
Your first query doesn't get the last visits, since it gives results 5 and 6 instead of 2 and 9.
You can try this query:
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
union
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
where id not in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
GROUP BY patient_ID)
order by 1,2
SELECT id, patient_result FROM `tbl_patient` t1
JOIN (SELECT MAX(id) as max, patient_ID FROM `tbl_patient` GROUP BY `patient_ID`) t2
ON t1.patient_ID = t2.patient_ID
WHERE id <max GROUP BY t1.`patient_ID`
There are a couple of approaches to getting the specified resultset returned in a single SQL statement.
Unfortunately, most of those approaches yield rather unwieldy statements.
The more elegant-looking statements tend to come with poor (or unbearable) performance when dealing with large sets, and the statements that tend to perform better look less elegant.
Three of the most common approaches make use of:
correlated subquery
inequality join (nearly a Cartesian product)
two passes over the data
Here's an approach that uses two passes over the data, using MySQL user variables, which basically emulates the analytic RANK() OVER(PARTITION ...) function available in other DBMS:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM (
SELECT p.id
, p.patient_id
, p.visit_id
, p.patient_result
, @rn := if(@prev_patient_id = patient_id, @rn + 1, 1) AS rn
, @prev_patient_id := patient_id AS prev_patient_id
FROM tbl_patient p
JOIN (SELECT @rn := 0, @prev_patient_id := NULL) i
ORDER BY p.patient_id DESC, p.id DESC
) t
WHERE t.rn <= 2
Note that this involves an inline view, which means there's going to be a pass over all the data in the table to create a "derived table". Then, the outer query will run against the derived table. So, this is essentially two passes over the data.
This query can be tweaked a bit to improve performance, by eliminating the duplicated value of the patient_id column returned by the inline view. But I show it as above, so we can better understand what is happening.
This approach can be rather expensive on large sets, but is generally MUCH more efficient than some of the other approaches.
Note also that this query will return a row for a patient_id even if only one id value exists for that patient; it does not restrict the result to patients that have at least two rows.
It's also possible to get an equivalent resultset with a correlated subquery:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
WHERE ( SELECT COUNT(1) AS cnt
FROM tbl_patient p
WHERE p.patient_id = t.patient_id
AND p.id >= t.id
) <= 2
ORDER BY t.patient_id ASC, t.id ASC
Note that this is making use of a "dependent subquery", which basically means that for each row returned from t, MySQL is effectively running another query against the database. So, this will tend to be very expensive (in terms of elapsed time) on large sets.
As another approach, if there are relatively few id values for each patient, you might be able to get by with an inequality join:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
LEFT
JOIN tbl_patient p
ON p.patient_id = t.patient_id
AND t.id < p.id
GROUP
BY t.id
, t.patient_id
, t.visit_id
, t.patient_result
-- COUNT(p.id) skips the NULL row that the LEFT JOIN yields for the latest visit
HAVING COUNT(p.id) < 2
Note that this will create a nearly Cartesian product for each patient. For a limited number of id values for each patient, this won't be too bad. But if a patient has hundreds of id values, the intermediate result can be huge, on the order of O(n**2).
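A minimal sketch of the inequality join on the sample rows from the question, run in an in-memory SQLite database from Python as a stand-in for MySQL (an assumption). It counts only matched rows with COUNT(p.id), so the unmatched LEFT JOIN row of each patient's latest visit counts as zero later visits.

```python
import sqlite3

# SQLite stands in for MySQL (an assumption); rows mirror the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_patient "
    "(id INT, patient_id INT, visit_id INT, patient_result INT)"
)
conn.executemany(
    "INSERT INTO tbl_patient VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7),
     (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9)],
)

# Each row t is joined to the later rows p of the same patient. A row is one
# of the last two visits if fewer than two later rows exist for that patient.
rows = conn.execute(
    """
    SELECT t.id, t.patient_id, t.patient_result
    FROM tbl_patient t
    LEFT JOIN tbl_patient p
      ON p.patient_id = t.patient_id
     AND t.id < p.id
    GROUP BY t.id, t.patient_id, t.patient_result
    HAVING COUNT(p.id) < 2
    ORDER BY t.patient_id, t.id
    """
).fetchall()

for row in rows:
    print(row)
```

For patient 1 this keeps ids 4 and 6, and for patient 2 ids 3 and 5, which are the two highest ids of each patient.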
Try this..
SELECT id, patient_result FROM tbl_patient AS tp WHERE id < ((SELECT MAX(id) FROM tbl_patient AS tp_max WHERE tp_max.patient_ID = tp.patient_ID) - 1) GROUP BY patient_ID
Why not use simply...
GROUP BY `patient_ID` DESC LIMIT 2
... and do the rest in the next step?