MySQL - joining a table to itself / sub queries - mysql

I'm asking for help on an assessment question I recently got wrong. I've tried a number of solutions and think I roughly know what I'm trying to do, but I can't figure out the syntax.
I have a table that looks like the below but with more records.
MyTable
ID Name DivisionID ManagerID Salary
123 John Smith 100 789 40000
456 Harold Johnson 101 null 60000
789 Vicky Brown 100 null 80000
and have to select the row of the person with the 3rd highest salary, which I had no problem with. However, I also need to return, instead of ManagerID, the Manager Name, which needs to be looked up from the same table.
I've tried the following solution which seems to be a bit inelegant and has to have the same query hard-coded within it, so not ideal for scaling or general use:
SELECT
table.ID,
Name,
DivisionID,
(SELECT
Name FROM table WHERE id=(
SELECT ManagerID FROM table ORDER BY Salary DESC LIMIT 2,1)
) AS ManagerName,
Salary
FROM table
ORDER BY Salary DESC LIMIT 2,1;
I think there may be some way of doing this with subqueries, e.g. first selecting a separate table within the query of just manager id and name, and then selecting from this - but I just can't seem to get the syntax right or get my head around it. I think it might also be possible with table aliases where I select two different results from the same table under different aliases and then join the two, but again just can't figure out how to do this. Below is what I've tried to do with aliases
SELECT
a.ID,
a.Name,
a.DivisionID,
b.Name AS ManagerName
a.Salary
FROM table a
INNER JOIN table b ON a.ManagerID=b.ID
ORDER BY Salary DESC LIMIT 2,1;

First of all, when asked to return the nth greatest/least value, you must ask back what to do in case of ties. They want the person with third highest salary, so with salaries 1000, 1000, 900, 900, 800, 800, 700, 600, 500, I'd suppose you want to return the persons that earn 800, because that is the third highest salary. If you just order the persons by salary, skip two and take the third, then you pick one of the persons with a salary of 900 arbitrarily, and 900 is not even the third highest, but the second highest salary.
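In order to get the manager, simply join the table again. You should use an outer join for the case that an employee with the third highest salary has no manager (for instance because they are a manager themselves and their ManagerID is null); an inner join would drop that row.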
The straight-forward solution is to rank the rows with DENSE_RANK:
select *
from
(
select t.*, dense_rank() over (order by salary desc) as rnk
from mytable t
) employee
left join mytable manager on manager.id = employee.managerid
where employee.rnk = 3;
MySQL has supported DENSE_RANK since version 8. In older versions you must look up the same table again. Select the distinct salaries and use your limit/offset clause on those salaries.
select *
from mytable employee
left join mytable manager on manager.id = employee.managerid
where employee.salary =
(
select distinct salary
from mytable
order by salary desc
limit 2, 1
);
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=6b17e369fcd4f99ddc6c268de15f08a1
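To make the tie handling concrete, here is a small sketch that runs the distinct-salary variant against a throwaway table. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the names and salaries are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, managerid INTEGER, salary INTEGER);
-- Hypothetical rows: two people tie on the highest and on the third highest salary.
INSERT INTO mytable VALUES
  (1, 'Ann', NULL, 1000),
  (2, 'Bob', 1,    1000),
  (3, 'Cid', 1,    900),
  (4, 'Dee', NULL, 800),
  (5, 'Eve', 3,    800),
  (6, 'Fay', 3,    700);
""")

# Third highest DISTINCT salary, then every employee earning it.
# LIMIT 1 OFFSET 2 is the same as MySQL's LIMIT 2, 1.
third_highest = conn.execute("""
SELECT employee.name, employee.salary, manager.name AS manager_name
FROM mytable employee
LEFT JOIN mytable manager ON manager.id = employee.managerid
WHERE employee.salary = (
  SELECT DISTINCT salary FROM mytable ORDER BY salary DESC LIMIT 1 OFFSET 2
)
ORDER BY employee.id
""").fetchall()

# Naive version: skip two rows, take the third. It lands on the 900 earner.
naive = conn.execute("""
SELECT name, salary FROM mytable ORDER BY salary DESC, id LIMIT 1 OFFSET 2
""").fetchall()

print(third_highest)  # [('Dee', 800, None), ('Eve', 800, 'Cid')]
print(naive)          # [('Cid', 900)]
```

Dee has no manager, so only the LEFT JOIN keeps her row, with the manager name coming back as NULL (None in Python).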

Make a subquery from your code for the third-highest salary and self-join the table to it.
This works also in MySQL 5.7 and earlier
CREATE TABLE MyTable (
`ID` INTEGER,
`Name` VARCHAR(14),
`DivisionID` INTEGER,
`ManagerID` INTEGER,
`Salary` INTEGER
);
INSERT INTO MyTable
(`ID`, `Name`, `DivisionID`, `ManagerID`, `Salary`)
VALUES
(123, 'John Smith', 100, 789, 40000),
(456, 'Harold Johnson', 101, NULL, 60000),
(789, 'Vicky Brown', 100, NULL, 80000);
SELECT m1.ID, m1.Name, m1.DivisionID, m2.Name, m1.Salary
FROM (SELECT `ID`, `Name`, `DivisionID`, `ManagerID`, `Salary` FROM MyTable ORDER BY Salary DESC LIMIT 2,1) m1
JOIN MyTable m2 ON m1.ManagerID = m2.ID;
ID | Name | DivisionID | Name | Salary
--: | :--------- | ---------: | :---------- | -----:
123 | John Smith | 100 | Vicky Brown | 40000

Related

In SQL, How to output "NULL" instead of " There are no results to be displayed" when there's no value to be exported

Using MySQL v8.0 right now.
The question is:
Write an SQL query to report the id and the salary of the second highest salary from the
Employee table. If there is no second highest salary, the query should
report null.
My dummy data is:
Create table If Not Exists Employee (id int, salary int);
insert into Employee (id, salary) values
(1, 100);
My ideal output is like this:
+------+--------+
| id | salary |
+------+--------+
| NULL | NULL |
+------+--------+
I used DENSE_RANK as a more straightforward way for me to solve this question:
WITH sub AS (SELECT id,
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS num
FROM Employee )
SELECT id, salary
FROM sub
WHERE num = 2
But I have a problem exporting NULL when there's no second highest salary. I tried IFNULL, but it didn't work. I guess it's because the output is not actually null but just empty.
Thank you in advance.
WITH sub AS (
SELECT id,
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS num
FROM Employee
)
SELECT id, salary
FROM sub
WHERE num = 2
UNION ALL
SELECT NULL, NULL
WHERE 0 = ( SELECT COUNT(*)
FROM sub
WHERE num = 2 );
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=31f5afb0e7e5dce9c2c128ccc49a6f42
Just making your query a subquery and left joining from a single-row producing subquery seems to me the simplest approach:
select id, salary
from (select null) at_least_one_row
left join (
select id, salary
from (
select id, salary, dense_rank() over (order by salary desc) as num
from Employee
) ranked_employees
where num = 2
) second_highest_salary on true
(I usually prefer a subquery to a cte that's only used once; I find that obfuscatory.)
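The left-join-from-one-row idea can be tried out with a short sketch. It uses Python's built-in sqlite3 module instead of MySQL (an assumption for self-containment; it needs an SQLite build with window functions, 3.25 or newer), writes ON 1 = 1 in place of ON true, and the helper name second_highest is made up for this example.

```python
import sqlite3

# The single-row derived table guarantees one output row; the LEFT JOIN
# fills id and salary with NULL when no employee has dense rank 2.
QUERY = """
SELECT id, salary
FROM (SELECT NULL AS placeholder) at_least_one_row
LEFT JOIN (
  SELECT id, salary
  FROM (
    SELECT id, salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS num
    FROM Employee
  ) ranked_employees
  WHERE num = 2
) second_highest_salary ON 1 = 1
"""

def second_highest(rows):
    """Load (id, salary) rows into a fresh in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (id INTEGER, salary INTEGER)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

only_one = second_highest([(1, 100)])
with_second = second_highest([(1, 100), (2, 200), (3, 300)])

print(only_one)     # [(None, None)]
print(with_second)  # [(2, 200)]
```

With a single employee the query still returns one row, and both columns are NULL (shown as None by Python).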

Selecting a subset of rows with MySql: conditionally limiting number of entries selected

This is a follow-up question to my previous query. I hope that posting a new question is appropriate in the circumstances: Selecting a subset of rows from a PHP table
I have an sql table that looks like this (for example):
id seller price amount
1 tom 350 500
2 tom 350 750
3 tom 350 750
4 tom 370 850
5 jerry 500 1000
I want to select one row per seller: in particular, for each seller I want the row with the cheapest price, and the largest amount at that price. In the example above, I want rows 2 and 5 (or 3 and 5, I don't care which of 2 and 3 I get as long as I only get one of them).
I am using this:
dbquery("SELECT a.* FROM $marketdb a
INNER JOIN
(
SELECT seller, MAX(amount) amount
FROM $marketdb
WHERE price=$minprice
GROUP BY seller
) b ON a.seller = b.seller AND
a.amount = b.amount;");
But this is giving me rows 2,3 and 5, and I only want one of rows 2 and 3.
I also have a nagging suspicion that this might not always return the minimum price rows either. My tests so far have been confused by the fact that I am getting more than one row with the same amount entered for a given seller.
If someone could point out my error I would be most appreciative.
Thanks!
EDIT: my apologies, I did not ask what I meant to ask. I would like rows returned from the global min price, max 1 per seller, not the min price for each seller. This would be only row 2 or 3 above. Sorry!
Since you want a single row per seller, try adding another GROUP BY on seller to the final query, like this:
SELECT a.* FROM $marketdb a
INNER JOIN
(
SELECT seller, MAX(amount) amount
FROM $marketdb
WHERE price=$minprice
GROUP BY seller
)
b ON a.seller = b.seller AND
a.amount = b.amount group by a.seller;
Test this SQL fiddle:
http://sqlfiddle.com/#!2/7de03/2/0
CREATE TABLE `sellers` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`seller` VARCHAR(16) NOT NULL,
`price` FLOAT NOT NULL,
`amount` INT UNSIGNED NOT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `sellers` VALUES (1, 'tom', 350, 500);
INSERT INTO `sellers` VALUES (2, 'tom', 350, 750);
INSERT INTO `sellers` VALUES (3, 'tom', 350, 750);
INSERT INTO `sellers` VALUES (4, 'tom', 350, 850);
INSERT INTO `sellers` VALUES (5, 'jerry', 500, 600);
INSERT INTO `sellers` VALUES (6, 'jerry', 500, 1000);
INSERT INTO `sellers` VALUES (7, 'jerry', 500, 800);
SELECT * FROM
(SELECT DISTINCT * FROM sellers ORDER BY price ASC, amount DESC) t0
GROUP BY seller;
Kind of... works :)
There's an ugly hack at the end of this answer, but if you don't care which row is returned then I guess it saves some typing. Although, if you really don't care which row is returned, that tends to point to a more fundamental flaw in your schema design!
SELECT x.*
FROM market x
JOIN
( SELECT seller,MIN(price) min_price FROM market GROUP BY seller) y
ON y.seller = x.seller
AND y.min_price = x.price
JOIN
( SELECT seller,price,MAX(amount) max_amount FROM market GROUP BY seller,price) z
ON z.seller = y.seller
AND y.min_price = z.price
AND z.max_amount = x.amount
GROUP
BY x.seller;
Another method, which I dislike but which is popular with others here, goes something like this...
SELECT x.*
FROM
( SELECT *
FROM market
ORDER
BY seller
, price
, amount DESC
, id
) x
GROUP
BY seller;
You may need to GROUP BY the seller column outside of the join. Also, your WHERE clause compares price to one fixed value with =, rather than using <=.
Query:
SELECT s.*
FROM sellers s
WHERE s.id = (SELECT s2.id
FROM sellers s2
WHERE s2.seller = s.seller
ORDER BY s2.price ASC, s2.amount DESC
LIMIT 1)
Result:
| ID | SELLER | PRICE | AMOUNT |
--------------------------------
| 2 | tom | 350 | 750 |
| 5 | jerry | 500 | 1000 |
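As a rough check of the correlated-subquery approach, the sketch below loads the five rows from the question into an in-memory SQLite database via Python's sqlite3 module (a stand-in for MySQL, assumed here only to keep the example runnable). It adds id as a final tiebreaker, which is my addition so that the choice between rows 2 and 3 is deterministic, and it includes a second query for the "global minimum price" variant described in the question's EDIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sellers (id INTEGER PRIMARY KEY, seller TEXT, price INTEGER, amount INTEGER);
INSERT INTO sellers VALUES
  (1, 'tom',   350, 500),
  (2, 'tom',   350, 750),
  (3, 'tom',   350, 750),
  (4, 'tom',   370, 850),
  (5, 'jerry', 500, 1000);
""")

# One row per seller: cheapest price first, then largest amount, then lowest id.
per_seller = conn.execute("""
SELECT s.id, s.seller, s.price, s.amount
FROM sellers s
WHERE s.id = (SELECT s2.id
              FROM sellers s2
              WHERE s2.seller = s.seller
              ORDER BY s2.price ASC, s2.amount DESC, s2.id ASC
              LIMIT 1)
ORDER BY s.id
""").fetchall()

# Variant from the EDIT: only rows at the global minimum price, at most one per seller.
global_min = conn.execute("""
SELECT s.id, s.seller, s.price, s.amount
FROM sellers s
WHERE s.id = (SELECT s2.id
              FROM sellers s2
              WHERE s2.seller = s.seller
                AND s2.price = (SELECT MIN(price) FROM sellers)
              ORDER BY s2.amount DESC, s2.id ASC
              LIMIT 1)
ORDER BY s.id
""").fetchall()

print(per_seller)  # [(2, 'tom', 350, 750), (5, 'jerry', 500, 1000)]
print(global_min)  # [(2, 'tom', 350, 750)]
```

Because the subquery picks exactly one id per seller, no outer GROUP BY is needed and every returned column comes from the same row.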

Fetch 2nd Highest value from MySQL DB with GROUP BY

I have a table tbl_patient and I want to fetch the last 2 visits of each patient in order to compare whether the patient's condition is improving or degrading.
tbl_patient
id | patient_ID | visit_ID | patient_result
1 | 1 | 1 | 5
2 | 2 | 1 | 6
3 | 2 | 3 | 7
4 | 1 | 2 | 3
5 | 2 | 3 | 2
6 | 1 | 3 | 9
I tried the query below to fetch the last visit of each patient:
SELECT MAX(id), patient_result FROM `tbl_patient` GROUP BY `patient_ID`
Now I want to fetch the 2nd last visit of each patient, but the query below gives me an error
(#1242 - Subquery returns more than 1 row)
SELECT id, patient_result FROM `tbl_patient` WHERE id <(SELECT MAX(id) FROM `tbl_patient` GROUP BY `patient_ID`) GROUP BY `patient_ID`
Where am I going wrong?
select p1.patient_id, p2.maxid id1, max(p1.id) id2
from tbl_patient p1
join (select patient_id, max(id) maxid
from tbl_patient
group by patient_id) p2
on p1.patient_id = p2.patient_id and p1.id < p2.maxid
group by p1.patient_id
id1 is the ID of the last visit, id2 is the ID of the 2nd to last visit.
Your first query doesn't get the last visits, since it gives results 5 and 6 instead of 2 and 9.
You can try this query:
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
union
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
where id not in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
GROUP BY patient_ID)
order by 1,2
SELECT id, patient_result FROM `tbl_patient` t1
JOIN (SELECT MAX(id) as max, patient_ID FROM `tbl_patient` GROUP BY `patient_ID`) t2
ON t1.patient_ID = t2.patient_ID
WHERE id <max GROUP BY t1.`patient_ID`
There are a couple of approaches to getting the specified resultset returned in a single SQL statement.
Unfortunately, most of those approaches yield rather unwieldy statements.
The more elegant looking statements tend to come with poor (or unbearable) performance when dealing with large sets. And the statements that tend to have better performance are less elegant looking.
Three of the most common approaches make use of:
correlated subquery
inequality join (nearly a Cartesian product)
two passes over the data
Here's an approach that uses two passes over the data, using MySQL user variables, which basically emulates the analytic RANK() OVER(PARTITION ...) function available in other DBMS:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM (
SELECT p.id
, p.patient_id
, p.visit_id
, p.patient_result
, @rn := if(@prev_patient_id = patient_id, @rn + 1, 1) AS rn
, @prev_patient_id := patient_id AS prev_patient_id
FROM tbl_patient p
JOIN (SELECT @rn := 0, @prev_patient_id := NULL) i
ORDER BY p.patient_id DESC, p.id DESC
) t
WHERE t.rn <= 2
Note that this involves an inline view, which means there's going to be a pass over all the data in the table to create a "derived table". Then, the outer query will run against the derived table. So, this is essentially two passes over the data.
This query can be tweaked a bit to improve performance, by eliminating the duplicated value of the patient_id column returned by the inline view. But I show it as above, so we can better understand what is happening.
This approach can be rather expensive on large sets, but is generally MUCH more efficient than some of the other approaches.
Note also that this query will return a row for a patient_id even if only one id value exists for that patient; it does not restrict the return to just those patients that have at least two rows.
It's also possible to get an equivalent resultset with a correlated subquery:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
WHERE ( SELECT COUNT(1) AS cnt
FROM tbl_patient p
WHERE p.patient_id = t.patient_id
AND p.id >= t.id
) <= 2
ORDER BY t.patient_id ASC, t.id ASC
Note that this is making use of a "dependent subquery", which basically means that for each row returned from t, MySQL is effectively running another query against the database. So, this will tend to be very expensive (in terms of elapsed time) on large sets.
As another approach, if there are relatively few id values for each patient, you might be able to get by with an inequality join:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
LEFT
JOIN tbl_patient p
ON p.patient_id = t.patient_id
AND t.id < p.id
GROUP
BY t.id
, t.patient_id
, t.visit_id
, t.patient_result
HAVING COUNT(p.id) < 2
Note that this will create a nearly Cartesian product for each patient. For a limited number of id values for each patient, this won't be too bad. But if a patient has hundreds of id values, the intermediate result can be huge, on the order of O(n^2).
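MySQL 8.0 added window functions, so the ranking that the user variables emulate can also be written directly with ROW_NUMBER(). The sketch below is only an illustration: it runs on SQLite through Python's sqlite3 module (assumed as a stand-in for MySQL, and requiring SQLite 3.25 or newer), uses the six rows from the question, and checks that the windowed query and the correlated-subquery version agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_patient (
  id INTEGER PRIMARY KEY, patient_id INTEGER, visit_id INTEGER, patient_result INTEGER
);
INSERT INTO tbl_patient VALUES
  (1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7),
  (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9);
""")

# Number each patient's rows from newest to oldest id and keep the first two.
windowed = conn.execute("""
SELECT id, patient_id, patient_result
FROM (
  SELECT p.*, ROW_NUMBER() OVER (PARTITION BY patient_id ORDER BY id DESC) AS rn
  FROM tbl_patient p
) ranked
WHERE rn <= 2
ORDER BY patient_id, id DESC
""").fetchall()

# Correlated subquery: a row qualifies when at most two rows of the same
# patient have an id greater than or equal to its own.
correlated = conn.execute("""
SELECT t.id, t.patient_id, t.patient_result
FROM tbl_patient t
WHERE (SELECT COUNT(*)
       FROM tbl_patient p
       WHERE p.patient_id = t.patient_id
         AND p.id >= t.id) <= 2
ORDER BY t.patient_id, t.id DESC
""").fetchall()

print(windowed)  # [(6, 1, 9), (4, 1, 3), (5, 2, 2), (3, 2, 7)]
```

For each patient the first row is the latest visit and the second row is the one before it, which is what the comparison of results needs.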
Try this..
SELECT id, patient_result FROM tbl_patient AS tp WHERE id < ((SELECT MAX(id) FROM tbl_patient AS tp_max WHERE tp_max.patient_ID = tp.patient_ID) - 1) GROUP BY patient_ID
Why not use simply...
GROUP BY `patient_ID` DESC LIMIT 2
... and do the rest in the next step?

Finding a users maximum score and the associated details

I have a table in which users store scores and other information about said score (for example notes on score, or time taken etc). I want a MySQL query that finds each user's personal best score and its associated notes, time, etc.
What I have tried to use is something like this:
SELECT *, MAX(score) FROM table GROUP BY (user)
The problem with this is that whilst you can extract the user's personal best from that query [MAX(score)], the returned notes and times etc. are not associated with the maximum score, but with a different score (specifically the one contained in *). Is there a way I can write a query that selects what I want? Or will I have to do it manually in PHP?
I'm assuming that you only want one result per player, even if they have scored the same maximum score more than once. I am also assuming that you want each player's first time that they got their personal best in the case that there are repeats.
There are a few ways of doing this. Here's a way that is MySQL specific:
SELECT user, scoredate, score, notes FROM (
SELECT *, @prev <> user AS is_best, @prev := user
FROM table1, (SELECT @prev := -1) AS vars
ORDER BY user, score DESC, scoredate
) AS T1
WHERE is_best
Here's a more general way that uses ordinary SQL:
SELECT T3.* FROM table1 AS T3
JOIN (
SELECT T1.user, T1.score, MIN(scoredate) AS scoredate
FROM table1 AS T1
JOIN (SELECT user, MAX(score) AS score FROM table1 GROUP BY user) AS T2
ON T1.user = T2.user AND T1.score = T2.score
GROUP BY T1.user
) AS T4
ON T3.user = T4.user AND T3.score = T4.score AND T3.scoredate = T4.scoredate
Result:
1, '2010-01-01 17:00:00', 50, 'Much better'
2, '2010-01-01 14:00:00', 100, 'Perfect score'
Test data I used to test this:
CREATE TABLE table1 (user INT NOT NULL, scoredate DATETIME NOT NULL, score INT NOT NULL, notes NVARCHAR(100) NOT NULL);
INSERT INTO table1 (user, scoredate, score, notes) VALUES
(1, '2010-01-01 12:00:00', 10, 'First attempt'),
(1, '2010-01-01 17:00:00', 50, 'Much better'),
(1, '2010-01-01 22:00:00', 30, 'Time for bed'),
(2, '2010-01-01 14:00:00', 100, 'Perfect score'),
(2, '2010-01-01 16:00:00', 100, 'This is too easy');
You can join with a sub query, as in the following example:
SELECT t.*,
sub_t.max_score
FROM table t
JOIN (SELECT MAX(score) as max_score,
user
FROM table
GROUP BY user) sub_t ON (sub_t.user = t.user AND
sub_t.max_score = t.score);
The above query can be explained as follows. It starts with:
SELECT t.* FROM table t;
... This by itself will obviously list all the contents of the table. The goal is to keep only the rows that represent a maximum score of a particular user. Therefore if we had the data below:
+------------------------+
| user | score | notes |
+------+-------+---------+
| 1 | 10 | note a |
| 1 | 15 | note b |
| 1 | 20 | note c |
| 2 | 8 | note d |
| 2 | 12 | note e |
| 2 | 5 | note f |
+------+-------+---------+
...We would have wanted to keep just the "note c" and "note e" rows.
To find the rows that we want to keep, we can simply use:
SELECT MAX(score), user FROM table GROUP BY user;
Note that we cannot get the notes attribute from the above query, because as you had already noticed, you would not get the expected results for fields not aggregated with an aggregate function, like MAX() or not part of the GROUP BY clause. For further reading on this topic, you may want to check:
Debunking GROUP BY Myths
How does MySQL decide which id to return in group by clause?
Why does MySql allow “group by” queries WITHOUT aggregate functions?
Now we only need to keep the rows from the first query that match the second query. We can do this with an INNER JOIN:
...
JOIN (SELECT MAX(score) as max_score,
user
FROM table
GROUP BY user) sub_t ON (sub_t.user = t.user AND
sub_t.max_score = t.score);
The sub query is given the name sub_t. It is the set of all the users with the personal best score. The ON clause of the JOIN applies the restriction to the relevant fields. Remember that we only want to keep rows that are part of this subquery.
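The join described above can be tried end to end with the six sample rows. This is a minimal sketch using Python's sqlite3 module in place of MySQL (an assumption for self-containment); the table name scores is hypothetical, chosen because table itself is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (user INTEGER, score INTEGER, notes TEXT);
INSERT INTO scores VALUES
  (1, 10, 'note a'), (1, 15, 'note b'), (1, 20, 'note c'),
  (2, 8,  'note d'), (2, 12, 'note e'), (2, 5,  'note f');
""")

# sub_t holds one (user, max_score) pair per user; the join keeps only the
# rows of scores whose score equals that user's maximum.
best = conn.execute("""
SELECT t.user, t.score, t.notes
FROM scores t
JOIN (SELECT user, MAX(score) AS max_score
      FROM scores
      GROUP BY user) sub_t
  ON sub_t.user = t.user
 AND sub_t.max_score = t.score
ORDER BY t.user
""").fetchall()

print(best)  # [(1, 20, 'note c'), (2, 12, 'note e')]
```

If a user reached the same best score twice, this join would return both rows, so a further tiebreaker (for example the earliest date) is needed when exactly one row per user is required.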
SELECT *
FROM table t
ORDER BY t.score DESC
GROUP BY t.user
LIMIT 1
Side note: It is better to specify the fields than use SELECT *

What is the SQL query for finding the name of manager who supervises maximum number of employees?

person_id | manager_id | name |
| | |
-------------------------------
Query to find name of manager who supervises maximum number of employees?
Added: This is the only table. Yes self-referencing. DB is mysql. Recursive queries will also do.
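This query returns the manager_id of the manager with the maximal number of employees, together with that number.
The trick is in the HAVING clause, which allows aggregates and counts over multiple rows. Aggregates cannot be nested, so the maximum count is computed in a subquery.
SELECT manager_id, count(*)
FROM table
WHERE manager_id IS NOT NULL
GROUP BY manager_id
HAVING count(*) = (
SELECT max(c)
FROM (SELECT count(*) AS c FROM table WHERE manager_id IS NOT NULL GROUP BY manager_id) counts);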
You can read more in the short but informative w3schools.com HAVING clause tutorial.
If the manager_id references a person id in the same table, Svinto's answer might be more suitable.
SELECT name
FROM table
WHERE person_id = (
SELECT manager_id
FROM table
WHERE manager_id IS NOT NULL
GROUP BY manager_id
ORDER BY count(*) DESC
LIMIT 1)
It's not entirely clear to me what you want, so if this isn't what you want please clarify your question.
This query returns just one of the managers if there is a tie:
SELECT T2.name FROM (
SELECT manager_id
FROM table1
WHERE manager_id IS NOT NULL
GROUP BY manager_id
ORDER BY count(*) DESC
LIMIT 1
) AS T1
JOIN table1 AS T2
ON T1.manager_id = T2.person_id
Result of query:
Bar
Here's a query that fetches all managers with the tied maximum count in the case that there is a tie:
SELECT name FROM (
SELECT manager_id, COUNT(*) AS C
FROM table1
WHERE manager_id IS NOT NULL
GROUP BY manager_id) AS Counts
JOIN (
SELECT COUNT(*) AS C
FROM table1
WHERE manager_id IS NOT NULL
GROUP BY manager_id
ORDER BY COUNT(*) DESC
LIMIT 1
) AS MaxCount
ON Counts.C = MaxCount.C
JOIN table1
ON Counts.manager_id = table1.person_id
Result of the second query:
Foo
Bar
Here's my test data:
CREATE TABLE Table1 (person_id int NOT NULL, manager_id nvarchar(100) NULL, name nvarchar(100) NOT NULL);
INSERT INTO Table1 (person_id, manager_id, name) VALUES
(1, NULL, 'Foo'),
(2, '1', 'Bar'),
(3, '1', 'Baz'),
(4, '2', 'Qux'),
(5, '2', 'Quux'),
(6, '3', 'Corge');
Assuming manager_id references person_id and the table is named table_name:
SELECT name FROM (
SELECT manager_id
FROM table_name
GROUP BY manager_id
ORDER BY COUNT(*) DESC
LIMIT 1
) t
INNER JOIN table_name ON t.manager_id = table_name.person_id
edit:
Removed HAVING MAX COUNT, added ORDER BY COUNT DESC LIMIT 1 in subquery
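For comparison, here is a small runnable sketch of the "count per manager, then keep the maximum" idea, including ties. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption for self-containment), the table name person, and the six test rows from the answer above, with manager_id stored as an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, manager_id INTEGER, name TEXT NOT NULL);
INSERT INTO person VALUES
  (1, NULL, 'Foo'),
  (2, 1, 'Bar'),
  (3, 1, 'Baz'),
  (4, 2, 'Qux'),
  (5, 2, 'Quux'),
  (6, 3, 'Corge');
""")

# counts: number of direct reports per manager_id.
# The WHERE clause keeps every manager whose count equals the largest count,
# so tied managers are all returned.
rows = conn.execute("""
SELECT p.name
FROM person p
JOIN (SELECT manager_id, COUNT(*) AS c
      FROM person
      WHERE manager_id IS NOT NULL
      GROUP BY manager_id) counts
  ON counts.manager_id = p.person_id
WHERE counts.c = (SELECT MAX(c)
                  FROM (SELECT COUNT(*) AS c
                        FROM person
                        WHERE manager_id IS NOT NULL
                        GROUP BY manager_id) per_manager)
ORDER BY p.person_id
""").fetchall()

top_managers = [name for (name,) in rows]
print(top_managers)  # ['Foo', 'Bar']
```

Foo and Bar each supervise two people, so both are returned; replacing the WHERE clause with ORDER BY counts.c DESC LIMIT 1 would return just one of them.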