MYSQL query to find the all employees with nth highest salary - mysql

The two tables are salary_employee and employee
employee_salary
salary_id emp_id salary
Employee
emp_id | first_name | last_name | gender | email | mobile | dept_id | is_active
Query to get the all employees who have nth highest salary where n =1,2,3,... any integer
SELECT a.salary, b.first_name
FROM employee_salary a
JOIN employee b
ON a.emp_id = b.emp_id
WHERE a.salary = (
SELECT salary
FROM employee_salary
GROUP BY salary
DESC
LIMIT 1 OFFSET N-1
)
My Questions:
1) Is there any better and optimized way we can query this,
2) Is using LIMIT an good option
3) We have more options to calculate the nth highest salary, which is the best and what to follow and when?
One option using :
SELECT *
FROM employee_salary t1
WHERE ( N ) = ( SELECT COUNT( t2.salary )
FROM employee_salary t2
WHERE t2.salary >= t1.salary
)
Using Rank Method
SELECT salary
FROM
(
SELECT #rn := #rn + 1 rn,
a.salary
FROM tableName a, (SELECT #rn := 0) b
GROUP BY salary DESC
) sub
WHERE sub.rn = N

You have asked what seems like a reasonable question. There are different ways of doing things in SQL and sometimes some methods are better than others. The ranking problem is just one of many, many examples. The "answer" to your question is that, in general, order by is going to perform better than group by in MySQL. Although even that depends on the particular data and what you consider to be "better".
The specific issues with the question are that you have three different queries that return three different things.
The first returns all employees with a "dense rank" that is the same. That terminology is use purposely because it corresponds to the ANSI dense_rank() function which MySQL does not support. So, if your salaries are 100, 100, and 10, it will return two rows with a ranking of 1 and one with a ranking of 2.
The second returns different results if there are ties. If the salaries are 100, 100, 10, this version will return no rows with a ranking of 1, two rows with a ranking of 2, and one row with a ranking of 3.
The third returns an entirely different result set, which is just the salaries and the ranking of the salaries.
My comment was directed at trying the queries on your data. In fact, you should decide what you actually want, both from a functional and a performance perspective.

LIMIT requires the SQL to skim through all records between 0 and N and therefore requires increasing time the further back in your ranking you want to look. However, IMO that problem cannot be solved better.
As Gordon Linoff suggested: Run your option against your data set, using the commonly used ranks (which ranks are queried often, which are not? The result might be fast on rank 1 but terrible on rank 100).
Execute and analyze the Query Execution Plan and create indexes accordingly (for example on the salary column) and retest your queries.
Other options:
Option 4:
You could build a ranking table whichs serves as cache. The execution plan of your Limit-Query shows (see sqlfiddle here), that mysql already does create a temporary table to solve the query.
Pros: Easy and fast
Cons: Forces you to regenerate the ranking table each time the data changes
Option 5:
You could reconsider how you define "ranks".
If we have the following salaries:
100'000
100'000
80'000
Is the employee Nr 3 considered to be of rank 3 or 2?
Are 1 and 2 on the same rank (rank 1), but 3 is on rank 3?
If you define rank = order, you can greatly simplify the query to
SELECT a.salary, b.first_name
FROM employee_salary a, employee b
WHERE a.emp_id = b.emp_id
order by salary desc
LIMIT 1 OFFSET 4
demo: http://sqlfiddle.com/#!2/e7321d/1/0

try this,
SELECT * FROM one as A WHERE ( n ) = ( SELECT COUNT(DISTINCT(b.salary)) FROM one as B WHERE
B.salary >= A.salary )

Suppose emp_salary table have the below records:
And you want to select all employees with nth (N=1,2,3 etc.) highest/lowest (only change >(for highest), < (for lowest) operator according to your needs) salary, use the below sql:
SELECT DISTINCT(a.salary),
a.id,
a.name
FROM emp_salary a
WHERE N = (SELECT COUNT( DISTINCT(b.salary)) FROM emp_salary b
WHERE b.salary >= a.salary
);
For example, if you want to select all employees with 2nd highest salary, use below sql:
SELECT DISTINCT(a.salary),
a.id,
a.name
FROM emp_salary a
WHERE 2 = (SELECT COUNT( DISTINCT(b.salary)) FROM emp_salary b
WHERE b.salary >= a.salary
);
But if you want to display only second highest salary(only single record), use the below sql:
SELECT DISTINCT(a.salary),
a.id,
a.name
FROM emp_salary a
WHERE 2 = (SELECT COUNT( DISTINCT(b.salary)) FROM emp_salary b
WHERE b.salary >= a.salary
) limit 1;

Related

How does these 2 sql queries work - to find two minimum and maximum salaries

Can anyone explain how does this two queries work ?
Q) Write a query to retrieve two minimum and maximum salaries from the EmployeePosition table.
To retrieve two minimum salaries, you can write a query as below:
A)To retrieve two minimum salaries, you can write a query as below:
SELECT DISTINCT Salary
FROM EmployeePosition E1
WHERE 2 >= (SELECT COUNT(DISTINCT Salary )
FROM EmployeePosition E2
WHERE E1.Salary >= E2.Salary
) ORDER BY E1.Salary DESC;
To retrieve two maximum salaries, you can write a query as below:
SELECT DISTINCT Salary
FROM EmployeePosition E1
WHERE 2 >= (SELECT COUNT(DISTINCT Salary)
FROM EmployeePosition E2
WHERE E1.Salary <= E2.Salary
)
ORDER BY E1.Salary DESC;
Reference table
is there any alternative SQL query to get the same result?
The question is different from what you have asked
Q21. Write a query to find the Nth highest salary from the table
without using TOP/limit keyword.
That is the second highest salary and it can be done by using row_number supported on MySQL 8.x
WITH max_salary AS
(
SELECT *,
DENSE_RANK() OVER (ORDER BY Salary Desc) AS Rnk
FROM EmployeePosition
)
SELECT max_salary.*
FROM max_salary
WHERE Rnk=2;
MySQL DENSE_RANK Function assigns a rank to each row within a partition or result set (in your case it is a result set) with no gaps in ranking values.
Meaning the same salary will have the same rank.
For example using the data on the linked question:
create table EmployeePosition (
EmpID int,
EmpPosition varchar(25),
DateOfJoining date ,
Salary int );
insert into EmployeePosition values
(1,'Manager','2022-05-01',500000),
(2,'Executive','2022-05-02',75000),
(3,'Manager','2022-05-01',90000),
(2,'Lead','2022-05-02',85000),
(1,'Executive','2022-05-01',300000),
(3,'Manager','2022-05-01',500000);
SELECT *,
DENSE_RANK() OVER (ORDER BY Salary Desc) AS Rnk
FROM EmployeePosition
Result:
EmpID EmpPosition DateOfJoining Salary Rnk
1 Manager 2022-05-01 500000 1
3 Manager 2022-05-01 500000 1
1 Executive 2022-05-01 300000 2
3 Manager 2022-05-01 90000 3
2 Lead 2022-05-02 85000 4
2 Executive 2022-05-02 75000 5
As you can see each Salary is assigned a rank you have two 500000 salary with rank 1 , so the second highest value is 300000 which is filtered on the WHERE Rnk=2;.
The above main query could be written differently:
select EmpID,EmpPosition,DateOfJoining,Salary
from ( SELECT *,
DENSE_RANK() OVER (ORDER BY Salary Desc) AS Rnk
FROM EmployeePosition
) as tbl
WHERE Rnk=2;
https://dbfiddle.uk/Meh2AloO
Can you please explain the sql queries in the question?
Let's explain below example
SELECT DISTINCT Salary
FROM EmployeePosition E1
WHERE 2 >= ( SELECT COUNT(DISTINCT Salary )
FROM EmployeePosition E2
WHERE E1.Salary <= E2.Salary
)
ORDER BY E1.Salary DESC;
This is known as Correlated subqueries, which are the one in which inner query or subquery reference outer query. Outer query needs to be executed before inner query.
For each record processed by outer query, inner query will be executed and will return how many records has records has salary less than the current salary. If you are looking for second highest salary then your query will stop as soon as inner query will return 2.
I think your answer would be:
By the inner selection, I mean:
(SELECT COUNT(DISTINCT Salary)
FROM EmployeePosition E2
WHERE E1.Salary <= E2.Salary
)
The system checks all the states which each salary is greater/less than how many salaries of the rest of the list(Salary list). In another word the system calculates it for all the rows and for each row it returns the number which says that this salary value is greater/less than how many salaries.
Imagine we have distinguish values in rows and they are sorted descending, so when it checks for the highest salary it returns the total row number as a result because it calculates that there are total row number states which the salary is less and equal to the highest salary.
In the same way, For the lowest salary we will only have 1 state which the salary is equal to itself.
So when the system checks for this logic:
WHERE 2 >= (SELECT COUNT(DISTINCT Salary)
FROM EmployeePosition E2
WHERE E1.Salary <= E2.Salary
)
it looks up for the situations that the salary is greater/less and equal to 2 other salaries and by the outer selection it returns the value of salaries.
I think that is really time consuming especially when dealing with large databases. As an alternative you can use this code:
SELECT
Salary, Rank
FROM(
SELECT
Salary,
Rank= ROW_NUMBER() OVER(ORDER BY(Salary) DESC)
FROM EmployeePosition
) X
WHERE Rank<=2

Understanding some SQL subquery examples

I've been self-learning SQL from resources on the internet. I have two SQL queries that I would like to understand.
Write an SQL query to fetch three max salaries from a table.
SELECT distinct Salary from worker a WHERE 3 >= (SELECT count(distinct Salary) from worker b WHERE a.Salary <= b.Salary) order by a.Salary desc;
Write an SQL query to fetch three min salaries from a table.
SELECT distinct Salary from worker a WHERE 3 >= (SELECT count(distinct Salary) from worker b WHERE a.Salary >= b.Salary) order by a.Salary desc;
As you can see these two are similar. The part I don't particularly understand is this:
(a.Salary >= b.Salary) or (a.Salary <= b.Salary)
I don't understand its logic here. What is it doing here?
Table:
In your query for each given salary it is calculated how many salaries are more/less than current. If count is more than 3, then the current one shouldn't be shown. The conditions (a.Salary >= b.Salary) are responsible for counting salaries more/less then the current one.
But to get 3 min/max salareis I would do thise:
SELECT slary
FROM worker
ORDER BY salary DESC
LIMIT 3;
For min:
SELECT slary
FROM worker
ORDER BY salary ASC
LIMIT 3;
If one worker can have more than one record and you need to show the max total, this would be:
SELECT SUM(slary) total_salary
FROM worker
GROUP BY worker_id
ORDER BY salary DESC
LIMIT 3

Query for getting top 5 candidate in every group in single table

I have a table in which student marks in each subject and i have to get query in such a way that i will able to get all top 5 student in every subject who secure highest marks.
Here is a sample table:
My expected output look somthing like :
Top five student in PCM, ART, PCB on the basis of students marks,And also if two or more student secure same than those record also need to be in list with single query.
Original Answer
Technically, what you want to accomplish is not possible using a single SQL query. Had you only wanted one student per subject you could have achieved that using GROUP BY, but in your case it won't work.
The only way I can think of to get 5 students for each subject would be to write x queries, one for each subject and use UNION to glue them together. Such query will return a maximum of 5x rows.
Since you want to get the top 5 students based on the mark, you will have to use an ORDER BY clause, which, in combination with the UNION clauses will cause an error. To avoid that, you will have to use subqueries, so that UNION and ORDER BY clauses are not on the same level.
Query:
-- Select the 5 students with the highest mark in the `PCM` subject.
(
SELECT *
FROM student
WHERE subject = 'PCM'
ORDER BY studentMarks DESC
LIMIT 5
)
UNION
(
SELECT *
FROM student
WHERE subject = 'PCB'
ORDER BY studentMarks DESC
LIMIT 5
)
UNION
(
SELECT *
FROM student
WHERE subject = 'ART'
ORDER BY studentMarks DESC
LIMIT 5
);
Check out this SQLFiddle to evaluate the result yourself.
Updated Answer
This update aims to allow getting more than 5 students in the scenario that many students share the same grade in a particular subject.
Instead of using LIMIT 5 to get the top 5 rows, we use LIMIT 4,1 to get the fifth highest grade and use that to get all students that have a grade more or equal to that in a given subject. Though, if there are < 5 students in a subject LIMIT 4,1 will return NULL. In that case, we want essentially every student, so we use the minimum grade.
To achieve what is described above, you will need to use the following piece of code x times, as many as the subjects you have and join them together using UNION. As can be easily understood, this solution can be used for a small handful of different subjects or the query's extent will become unmaintainable.
Code:
-- Select the students with the top 5 highest marks in the `x` subject.
SELECT *
FROM student
WHERE studentMarks >= (
-- If there are less than 5 students in the subject return them all.
IFNULL (
(
-- Get the fifth highest grade.
SELECT studentMarks
FROM student
WHERE subject = 'x'
ORDER BY studentMarks DESC
LIMIT 4,1
), (
-- Get the lowest grade.
SELECT MIN(studentMarks)
FROM student
WHERE subject = 'x'
)
)
) AND subject = 'x';
Check out this SQLFiddle to evaluate the result yourself.
Alternative:
After some research I found an alternative, simpler query that will yield the same result as the one presented above based on the data you have provided without the need of "hardcoding" every subject in its own query.
In the following solution, we define a couple of variables that help us control the data:
one to cache the subject of the previous row and
one to save an incremental value that differentiates the rows having the same subject.
Query:
-- Select the students having the top 5 marks in each subject.
SELECT studentID, studentName, studentMarks, subject FROM
(
-- Use an incremented value to differentiate rows with the same subject.
SELECT *, (#n := if(#s = subject, #n +1, 1)) as n, #s:= subject
FROM student
CROSS JOIN (SELECT #n := 0, #s:= NULL) AS b
) AS a
WHERE n <= 5
ORDER BY subject, studentMarks DESC;
Check out this SQLFiddle to evaluate the result yourself.
Ideas were taken by the following threads:
Get top n records for each group of grouped results
How to SELECT the newest four items per category?
Select X items from every type
Getting the latest n records for each group
Below query produces almost what I desired, may this query helps others in future.
SELECT a.studentId, a.studentName, a.StudentMarks,a.subject FROM testquery AS a WHERE
(SELECT COUNT(*) FROM testquery AS b
WHERE b.subject = a.subject AND b.StudentMarks >= a.StudentMarks) <= 2
ORDER BY a.subject ASC, a.StudentMarks DESC

MySQL grouping with detail

I have a table that looks like this...
user_id, match_id, points_won
1 14 10
1 8 12
1 12 80
2 8 10
3 14 20
3 2 25
I want to write a MYSQL script that pulls back the most points a user has won in a single match and includes the match_id in the results - in other words...
user_id, match_id, max_points_won
1 12 80
2 8 10
3 2 25
Of course if I didn't need the match_id I could just do...
select user_id, max(points_won)
from table
group by user_id
But as soon as I add match_id to the "select" and "group by" I have a row for every match, and if I only add the match_id to the "select" (and not the "group by") then it won't correctly relate to the points_won.
Ideally I don't want to do the following either because it doesn't feel particularly safe (e.g. if the user has won the same amount of points on multiple matches)...
SELECT t.user_id, max(t.points_won) max_points_won
, (select t2.match_id
from table t2
where t2.user_id = t.user_id
and t2.points_won = max_points_won) as 'match_of_points_maximum'
FROM table t
GROUP BY t.user_id
Are there any more elegant options for this problem?
This is harder than it needs to be in MySQL. One method is a bit of a hack but it works in most circumstances. That is the group_concat()/substring_index() trick:
select user_id, max(points_won),
substring_index(group_concat(match_id order by points_won desc), ',', 1)
from table
group by user_id;
The group_concat() concatenates together all the match_ids, ordered by the points descending. The substring_index() then takes the first one.
Two important caveats:
The resulting expression has a type of string, regardless of the internal type.
The group_concat() uses an internal buffer, whose length -- by default -- is 1,024 characters. This default length can be changed.
You can use the query:
select user_id, max(points_won)
from table
group by user_id
as a derived table. Joining this to the original table gets you what you want:
select t1.user_id, t1.match_id, t2.max_points_won
from table as t1
join (
select user_id, max(points_won) as max_points_won
from table
group by user_id
) as t2 on t1.user_id = t2.user_id and t1.points_won = t2.max_points_won
I think you can optimize your query by add limit 1 in the inner query.
SELECT t.user_id, max(t.points_won) max_points_won
, (select t2.match_id
from table t2
where t2.user_id = t.user_id
and t2.points_won = max_points_won limit 1) as 'match_of_points_maximum'
FROM table t
GROUP BY t.user_id
EDIT : only for postgresql, sql-server, oracle
You could use row_number :
SELECT USER_ID, MATCH_ID, POINTS_WON
FROM
(
SELECT user_id, match_id, points_won, row_number() over (partition by user_id order by points_won desc) rn
from table
) q
where q.rn = 1
For a similar function, have a look at Gordon Linoff's answer or at this article.
In your example, you partition your set of result per user then you order by points_won desc to obtain highest winning point first.

SQL query to return the three highest values for one column "grouped" by another column

Let's say I have a table like this:
Player Score
A 5
B 4
A 3
B 2
A 1
B 1
A 2
B 3
A 4
B 5
I need an SQL query that will return the three highest scores per player in descending order "grouped" by player i.e.
Player Score
A 5
A 4
A 3
B 5
B 4
B 3
Very grateful for any pointers.
This is old-fashioned (read: basic sql) way of producing top-n per group. You might join the table to itself on group condition (here it is player) and pick records with higher score on right side; if there are three or less such records, the row is one of top n rows per group.
select player.player, player.score
from Player
left join Player p2
on p2.player = player.player
and p2.score > player.score
group by player.player, player.score
having count(distinct p2.score) < 3
order by 1, 2 desc
Alternative version you might check, using not exists:
select player, score
from player
where not exists
(
select p2.player
from Player p2
where p2.player = player.player
and p2.score > player.score
group by p2.player
having count(distinct p2.score) > 3
)
order by 1, 2 desc
This two versions differ in presentation of ties - while first one returns one row (by nature of group by) and needs to be joined back to original table to show all records, second one works directly from original table showing all data and ties at once.
You can find Demo at Sql Fiddle.
in SQL server:
select p.player, p.score
from PS p
where p.score in (select top 3 score from PS
where player = p.player order by score desc)
order by p.player asc, p.score desc
in MySql:
select p.player, p.score
from PS p
where p.score in (select score from PS
where player = p.player order by score desc limit 3)
order by p.player asc, p.score desc
I think what you are looking for can be found here:
http://www.sql-ex.ru/help/select16.php
Basically, the best solution uses the RANK function. Here is the example code from the site:
SELECT maker, model, type FROM
(
SELECT maker, model, type, RANK() OVER(PARTITION BY type ORDER BY model) num
FROM Product
) X
WHERE num <= 3
You would just need to modify the Partition By section to order by your score in descending order.
EDIT
Based upon the information that you will be using MySQL, you will need to make some modifications to the above query (which works with Microsoft SQL). You need to replace the RANK function with your own RANK implementation. It isn't that hard. Complete instructions can be found here:
http://thinkdiff.net/mysql/how-to-get-rank-using-mysql-query/
That will show you how to implement a counter that can give you a rank number.
Depending on what DBMS you use, you may be able to use row_number in some form
In SQL Server 2008 you can use
create table #player
( Player char, Score int )
insert into #player (Player, Score) Values
('A',5),('B',4),('A',3),('B',2),('A',1),('B',1),('A',2),('B',3),('A',4),('B',5)
select * from #player
select Player, Score from
(
select *, ROW_NUMBER() over(partition by Player order by Score desc) as rowNo
from #player
) as tmp
where tmp.rowNo <= 3
drop table #player