mysql select more than one result using group by - mysql

A table 'employees' with
id integer,
name string,
office integer,
income decimal
How can I select, in one query, the 3 highest incomes for each office (if possible)?
SELECT id,name
FROM employees
GROUP BY office
ORDER BY income DESC
This returns only one employee per office.

Here is a way to do this with a small workaround; check this SQLFiddle for the tested query.
select * from employees as e1 where 3 > (select count(*) from employees as e2 where e2.office = e1.office and e2.income > e1.income);

To get the 3 highest incomes per office, you can use a rank query with user-defined variables. Other RDBMSs make this kind of result easier to achieve with window functions, but MySQL (before 8.0) does not have them. The query below gives you the 3 employees with the highest income per office:
SELECT t.id,t.name
FROM (
SELECT *,
@r:= CASE WHEN @g = office THEN @r + 1 ELSE 1 END row_num,
@g:= office
FROM employees
CROSS JOIN(SELECT @g:=NULL,@r:=0) a
ORDER BY office,income DESC
) t
WHERE t.row_num <=3
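For comparison, here is a minimal sketch of the window-function approach mentioned above, run through Python's sqlite3 module so it can be tried without a MySQL server. The sample rows are made up for illustration, and it assumes the bundled SQLite is 3.25 or newer (the first version with window functions). The same ROW_NUMBER() query should also work on MySQL 8.0+.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER, name TEXT, office INTEGER, income INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", 1, 5000), (2, "Bob", 1, 4000), (3, "Cid", 1, 3000),
        (4, "Dee", 1, 2000), (5, "Eve", 2, 6000), (6, "Fay", 2, 1000),
    ],
)

# Number the rows of each office from highest to lowest income,
# then keep only the first three rows per office.
top3 = conn.execute(
    """
    SELECT id, name, office
    FROM (
        SELECT e.*,
               ROW_NUMBER() OVER (PARTITION BY office ORDER BY income DESC) AS row_num
        FROM employees AS e
    ) AS t
    WHERE row_num <= 3
    ORDER BY office, row_num
    """
).fetchall()

for row in top3:
    print(row)
```

Note that ROW_NUMBER() breaks income ties arbitrarily; if tied employees should all be kept, RANK() or DENSE_RANK() may be a better fit.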

Select the highest income for each office:
SELECT office, max(income)
FROM employees
GROUP BY office
Get the employees with these incomes:
SELECT e.id, e.name, e.office
FROM employees as e
INNER JOIN (SELECT office, max(income) as income FROM employees GROUP BY office) as mx ON e.office = mx.office and e.income = mx.income
Edit:
I copied the SQLFiddle of Ajeesh to test the queries: http://sqlfiddle.com/#!2/de4de1/14/0

Related

SQL Consecutive Monthly Purchases

I'm having great difficulty writing this query and cannot find any answers online which could be applied to my problem.
I have a couple of tables which look similar to the one below. Each purchase date corresponds to an item purchased.
Cust_ID | Purchase_Date
123 | 08/01/2022
123 | 08/20/2022
123 | 09/05/2022
123 | 10/08/2022
123 | 12/25/2022
123 | 01/26/2023
The result I am looking for should contain the customers ID, a range of the purchases, the number of consecutive months they had made a purchase (regardless of which day they purchased), and a count of how many purchases they had made in the time frame. The result should look something like the below for my example.
Cust_ID | Min Purchase Date | Max Purchase Date | Consecutive Months | No. Items Purchased
123 | 08/01/2022 | 10/08/2022 | 3 | 4
123 | 12/25/2022 | 01/26/2023 | 2 | 2
I have tried using CTEs with queries similar to
WITH CTE as
(
SELECT
PaymentDate PD,
CustomerID CustID,
DATEADD(m, -ROW_NUMBER() OVER (PARTITION BY c.CustomerID ORDER BY
DATEPART(m,PaymentDate)), PaymentDate) as TempCol1
FROM customers as c
LEFT JOIN payments as p on c.customerid = p.customerid
GROUP BY c.CustomerID, p.PaymentDate
)
SELECT
CustID,
MIN(PD) AS MinPaymentDate,
MAX(PD) AS MaxPaymentDate,
COUNT(*) as ConsecutiveMonths
FROM CTE
GROUP BY CustID, TempCol1
However, the above failed to properly count consecutive months. When the payment dates matched a month apart (e.g. 1/1/22 - 2/1/22), the query properly counts the consecutive months. However, if the dates do not match from month to month (e.g. 1/5/22 - 2/15/22), the count breaks.
Any guidance/help would be much appreciated!
This is just a small enhancement on the answer already given by ahmed. If your date range for this query is more than a year, then year(M.Purchase_Date) + month(M.Purchase_Date) will be 2024 for both 2022-02-01 and 2023-01-01 as YEAR() and MONTH() both return integer values. This will return incorrect count of consecutive months. You can change this to use CONCAT() or FORMAT(). Also, the COUNT(*) for ItemsPurchased should be counting the right hand side of the join, as it is a LEFT JOIN.
WITH consecutive_months AS
(
SELECT *,
DATEADD(
month,
-DENSE_RANK() OVER (
PARTITION BY CustomerID
ORDER BY YEAR(PaymentDate), MONTH(PaymentDate)
),
PaymentDate
) AS grp_date
FROM payments
)
SELECT
C.CustomerID AS CustID,
MIN(M.PaymentDate) AS MinPaymentDate,
MAX(M.PaymentDate) AS MaxPaymentDate,
COUNT(DISTINCT FORMAT(M.PaymentDate, 'yyyyMM')) AS ConsecutiveMonths,
COUNT(M.CustomerID) AS ItemsPurchased
FROM customers C
LEFT JOIN consecutive_months M
ON C.CustomerID = M.CustomerID
GROUP BY C.CustomerID, YEAR(M.grp_date), MONTH(M.grp_date)
Here's a db<>fiddle
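The collision between year + month keys described in this answer is plain integer arithmetic, so it can be checked quickly. A small sketch in Python; the year * 100 + month key is just one way to build a non-colliding key (it is the numeric form of a 'yyyyMM' string) and is my assumption, not something the answer prescribes.

```python
from datetime import date

a = date(2022, 2, 1)
b = date(2023, 1, 1)

# Adding year and month as integers maps two different months to one key.
sum_key_a = a.year + a.month  # 2022 + 2
sum_key_b = b.year + b.month  # 2023 + 1

# A composite key such as year * 100 + month keeps the months distinct.
composite_a = a.year * 100 + a.month
composite_b = b.year * 100 + b.month

print(sum_key_a, sum_key_b)      # 2024 2024
print(composite_a, composite_b)  # 202202 202301
```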
You need to use the dense_rank function instead of the row_number, this will give the same rank for the same months and avoid breaking the grouping column. Also, you need to aggregate for 'year-month' of the grouping date column.
with consecutive_months as
(
select *,
Purchase_Date - interval
dense_rank() over (partition by Cust_ID order by year(Purchase_Date), month(Purchase_Date))
month as grp_date
from payments
)
select C.Cust_ID,
min(M.Purchase_Date) as MinPurchaseDate,
max(M.Purchase_Date) as MaxPurchaseDate,
count(distinct year(M.Purchase_Date), month(M.Purchase_Date)) as ConsecutiveMonthsNo,
count(M.Cust_ID) as ItemsPurchased
from customers C left join consecutive_months M
on C.Cust_ID = M.Cust_ID
group by C.Cust_ID, year(M.grp_date), month(M.grp_date)
See demo on MySQL
You tagged your question with MySQL, but the query you posted appears to use SQL Server syntax. For SQL Server, just use dateadd(month, -dense_rank() over (partition by Cust_ID order by year(Purchase_Date), month(Purchase_Date)), Purchase_Date).
See demo on SQL Server.
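To see why subtracting a dense rank groups consecutive months, here is a rough sketch using the sample rows from the question, run through Python's sqlite3 module (assuming SQLite 3.25+ for window functions). It is a variation on the answers above, not the same query: instead of subtracting months from a date, it subtracts the rank from an integer month index (year * 12 + month), which is an assumption of mine to keep the arithmetic portable. Dates are stored in ISO format so strftime() can parse them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (Cust_ID INTEGER, Purchase_Date TEXT)")
# Sample rows from the question, rewritten as ISO dates.
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [(123, d) for d in ("2022-08-01", "2022-08-20", "2022-09-05",
                        "2022-10-08", "2022-12-25", "2023-01-26")],
)

islands = conn.execute(
    """
    WITH numbered AS (
        SELECT Cust_ID, Purchase_Date,
               -- running month index: consecutive months differ by exactly 1
               CAST(strftime('%Y', Purchase_Date) AS INTEGER) * 12
                   + CAST(strftime('%m', Purchase_Date) AS INTEGER) AS month_idx
        FROM payments
    ),
    grouped AS (
        SELECT *,
               -- month index minus dense rank is constant within a consecutive run
               month_idx - DENSE_RANK() OVER (
                   PARTITION BY Cust_ID ORDER BY month_idx
               ) AS grp
        FROM numbered
    )
    SELECT Cust_ID, MIN(Purchase_Date), MAX(Purchase_Date),
           COUNT(DISTINCT month_idx) AS consecutive_months,
           COUNT(*) AS items_purchased
    FROM grouped
    GROUP BY Cust_ID, grp
    ORDER BY MIN(Purchase_Date)
    """
).fetchall()

for row in islands:
    print(row)
```

Two purchases in the same month share a dense rank, so they land in the same group; a skipped month (November here) shifts the difference and starts a new group.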

Display column values and their count on SQL

I just want to ask you please this question on SQL.
Let's consider this EMPLOYEE table :
Employee Department
A 10
A 10
A 11
A 12
B 13
B 13
What I want to display is for each employee, all distinct departments (without duplicates) AND the total number of those distinct departments. So, something like this :
Employee Department total_dept
A 10 3
A 11 3
A 12 3
B 13 1
If possible, I would even prefer something like these :
Employee Department total_dept
A 10 3
A 11 null
A 12 null
B 13 1
I have a very big table (with many columns and many data) so I thought this can be an "optimisation", no ? I mean, there is no need to store the total_dept in all rows. Just put it once it's sufficient. No problem if after this I left the column empty. But I don't know if it's possible to do such thing in SQL.
So, how can I fix this please ? I tried but it seems impossible to combine count(column) with the same column...
Thank you in advance
This might be what you are looking for
SELECT
emp,
dept,
(select count(distinct dept) from TB as tbi where tb.emp = tbi.emp ) x
FROM TB
group by emp, dept;
MySQL 8.0 supports windowed COUNT:
SELECT *,COUNT(*) OVER (PARTITION BY Employee) AS total_dept
FROM (SELECT DISTINCT * FROM Employees) e
db<>fiddle demo
You could even get the second result set (though I recommend leaving presentation matters to the application layer):
SELECT *, CASE WHEN ROW_NUMBER() OVER(PARTITION BY Employee ORDER BY Department) = 1
THEN COUNT(*) OVER (PARTITION BY Employee) END AS total_dept
FROM (SELECT DISTINCT * FROM Employees) e
ORDER BY Employee, Department;
db<>fiddle demo
For the 2nd version:
SELECT
DISTINCT e.Employee, e.Department,
CASE
WHEN e.Department =
(SELECT MIN(Department) FROM Employees WHERE Employees.Employee = e.Employee)
THEN
(SELECT COUNT(DISTINCT Department) FROM Employees WHERE Employees.Employee = e.Employee)
END AS total_dept
FROM Employees e
ORDER BY e.Employee, e.Department;
See the demo
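As a quick check of the windowed answers against the table from the question, here is a minimal sketch using Python's sqlite3 module (assuming SQLite 3.25+ for window functions). It produces both requested outputs: the total repeated on every row, and the total shown only on the first row of each employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Employee TEXT, Department INTEGER)")
# Rows from the question, duplicates included.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("A", 10), ("A", 10), ("A", 11), ("A", 12), ("B", 13), ("B", 13)],
)

# Variant 1: de-duplicate first, then count the remaining rows per employee.
repeated = conn.execute(
    """
    SELECT Employee, Department,
           COUNT(*) OVER (PARTITION BY Employee) AS total_dept
    FROM (SELECT DISTINCT Employee, Department FROM Employees) AS e
    ORDER BY Employee, Department
    """
).fetchall()

# Variant 2: same count, but only on the first department of each employee.
first_only = conn.execute(
    """
    SELECT Employee, Department,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY Employee ORDER BY Department) = 1
                THEN COUNT(*) OVER (PARTITION BY Employee) END AS total_dept
    FROM (SELECT DISTINCT Employee, Department FROM Employees) AS e
    ORDER BY Employee, Department
    """
).fetchall()

print(repeated)
print(first_only)
```

SQL NULL comes back as None in Python, so the second result has None in the rows where the question shows null.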

Grouping all in one tuple in SQL

I have a table EMP with employee ids and their hire year. I have to get the number of employees hired in, let's say, the years 2002 and 2000. The output table should also contain the number of employees hired over the whole time.
So the last is easy. I just have to write:
SELECT COUNT(id) AS GLOBELAMOUNT FROM EMP;
But how do I count the amount of hired employees in 2002?
I could write the following:
SELECT COUNT(id) AS HIREDIN2002 FROM EMP WHERE YEAR = 2002;
But how do I combine this in one tuple with the data above?
Maybe I should group the data by Hireyear first and then count it? But can not really imagine how I count the data for several years.
Hope u guys can help me.
Cheers,
Andrej
Use conditional aggregation, e.g.:
SELECT COUNT(id) AS GLOBELAMOUNT,
COUNT(CASE WHEN YEAR=2000 THEN 1 END) AS HIREDIN2000,
COUNT(CASE WHEN YEAR=2002 THEN 1 END) AS HIREDIN2002
FROM EMP;
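Conditional aggregation works because COUNT() skips NULLs: a CASE with no ELSE yields NULL for every non-matching row. A small sketch with Python's sqlite3 module; the hire years below are invented sample data, and the column is named YEAR as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (id INTEGER, YEAR INTEGER)")
# Hypothetical hire years, one row per employee.
years = [2000, 2000, 2001, 2002, 2002, 2002, 2005]
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?)",
    [(i, y) for i, y in enumerate(years, start=1)],
)

# One pass over the table, one output row with all three counts.
counts = conn.execute(
    """
    SELECT COUNT(id) AS GLOBELAMOUNT,
           COUNT(CASE WHEN YEAR = 2000 THEN 1 END) AS HIREDIN2000,
           COUNT(CASE WHEN YEAR = 2002 THEN 1 END) AS HIREDIN2002
    FROM EMP
    """
).fetchone()

print(counts)  # (7, 2, 3)
```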
In Microsoft SQL Server (Transact-SQL) at least, you can use a windowed aggregate function like this:
Select Distinct
Year
,count(Id) over (Partition by Year) as CountHiredInYear
,count(Id) over () as CountTotalHires
From EMP
This gives something like:
Year | CountHiredInYear | CountTotalHires
2005 | 3 | 12
2006 | 4 | 12
2007 | 5 | 12
Another SQL Server specific approach is the With Rollup keyword.
Select Year
,count(Id) as CountHires
From Emp
Group by Year
With Rollup
This adds a summary line for each level of grouping, with the total value for that set of rows. So here, you'd get an extra row where Year was NULL, with the value 12.
You could use two (or more) inline queries:
SELECT
(SELECT COUNT(id) FROM EMP) AS GLOBELAMOUNT,
(SELECT COUNT(id) FROM EMP WHERE YEAR = 2002) AS HIREDIN2002
or a CROSS JOIN:
SELECT GLOBELAMOUNT, HIREDIN2002
FROM
(SELECT COUNT(id) AS GLOBELAMOUNT FROM EMP) g CROSS JOIN
(SELECT COUNT(id) AS HIREDIN2002 FROM EMP WHERE YEAR = 2002) h

MYSQL query to find the all employees with nth highest salary

The two tables are employee_salary and employee:
employee_salary
salary_id | emp_id | salary
employee
emp_id | first_name | last_name | gender | email | mobile | dept_id | is_active
Query to get the all employees who have nth highest salary where n =1,2,3,... any integer
SELECT a.salary, b.first_name
FROM employee_salary a
JOIN employee b
ON a.emp_id = b.emp_id
WHERE a.salary = (
SELECT salary
FROM employee_salary
GROUP BY salary
DESC
LIMIT 1 OFFSET N-1
)
My Questions:
1) Is there a better, more optimized way to query this?
2) Is using LIMIT a good option?
3) There are several ways to calculate the nth highest salary; which is the best, and which should be used when?
One option, using a correlated subquery:
SELECT *
FROM employee_salary t1
WHERE ( N ) = ( SELECT COUNT( t2.salary )
FROM employee_salary t2
WHERE t2.salary >= t1.salary
)
Using Rank Method
SELECT salary
FROM
(
SELECT @rn := @rn + 1 rn,
a.salary
FROM tableName a, (SELECT @rn := 0) b
GROUP BY salary DESC
) sub
WHERE sub.rn = N
You have asked what seems like a reasonable question. There are different ways of doing things in SQL and sometimes some methods are better than others. The ranking problem is just one of many, many examples. The "answer" to your question is that, in general, order by is going to perform better than group by in MySQL. Although even that depends on the particular data and what you consider to be "better".
The specific issues with the question are that you have three different queries that return three different things.
The first returns all employees with a "dense rank" that is the same. That terminology is used purposely because it corresponds to the ANSI dense_rank() function which MySQL does not support. So, if your salaries are 100, 100, and 10, it will return two rows with a ranking of 1 and one with a ranking of 2.
The second returns different results if there are ties. If the salaries are 100, 100, 10, this version will return no rows with a ranking of 1, two rows with a ranking of 2, and one row with a ranking of 3.
The third returns an entirely different result set, which is just the salaries and the ranking of the salaries.
My comment was directed at trying the queries on your data. In fact, you should decide what you actually want, both from a functional and a performance perspective.
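The difference in how ties are handled can be reproduced with the salaries 100, 100, 10 from the explanation above. This is only a sketch using Python's sqlite3 module: the COUNT(DISTINCT ...) subquery stands in for the first query's dense ranking (it is not the original LIMIT/OFFSET form), while the plain COUNT(...) subquery is the second query's ranking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_salary (emp_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee_salary VALUES (?, ?)",
    [(1, 100), (2, 100), (3, 10)],
)

ranks = conn.execute(
    """
    SELECT t1.emp_id, t1.salary,
           -- distinct salaries at or above this one: ties share a rank, no gaps
           (SELECT COUNT(DISTINCT t2.salary)
              FROM employee_salary t2
             WHERE t2.salary >= t1.salary) AS dense_rank_like,
           -- rows at or above this one: tied rows all get the higher number
           (SELECT COUNT(t2.salary)
              FROM employee_salary t2
             WHERE t2.salary >= t1.salary) AS count_rank
    FROM employee_salary t1
    ORDER BY t1.emp_id
    """
).fetchall()

for row in ranks:
    print(row)
```

With the count-based ranking nobody has rank 1, so asking for N = 1 returns no rows whenever the top salary is tied.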
LIMIT requires the SQL to skim through all records between 0 and N and therefore requires increasing time the further back in your ranking you want to look. However, IMO that problem cannot be solved better.
As Gordon Linoff suggested: Run your option against your data set, using the commonly used ranks (which ranks are queried often, which are not? The result might be fast on rank 1 but terrible on rank 100).
Execute and analyze the Query Execution Plan and create indexes accordingly (for example on the salary column) and retest your queries.
Other options:
Option 4:
You could build a ranking table which serves as a cache. The execution plan of your LIMIT query shows (see sqlfiddle here) that MySQL already creates a temporary table to solve the query.
Pros: Easy and fast
Cons: Forces you to regenerate the ranking table each time the data changes
Option 5:
You could reconsider how you define "ranks".
If we have the following salaries:
100'000
100'000
80'000
Is the employee Nr 3 considered to be of rank 3 or 2?
Are 1 and 2 on the same rank (rank 1), but 3 is on rank 3?
If you define rank = order, you can greatly simplify the query to
SELECT a.salary, b.first_name
FROM employee_salary a, employee b
WHERE a.emp_id = b.emp_id
order by salary desc
LIMIT 1 OFFSET 4
demo: http://sqlfiddle.com/#!2/e7321d/1/0
try this,
SELECT * FROM one AS a WHERE ( n ) = ( SELECT COUNT(DISTINCT(b.salary)) FROM one AS b WHERE
b.salary >= a.salary )
Suppose the emp_salary table has the below records:
To select all employees with the Nth (N = 1, 2, 3, etc.) highest salary, use the SQL below; for the Nth lowest salary, change the >= operator to <=:
SELECT DISTINCT(a.salary),
a.id,
a.name
FROM emp_salary a
WHERE N = (SELECT COUNT( DISTINCT(b.salary)) FROM emp_salary b
WHERE b.salary >= a.salary
);
For example, if you want to select all employees with 2nd highest salary, use below sql:
SELECT DISTINCT(a.salary),
a.id,
a.name
FROM emp_salary a
WHERE 2 = (SELECT COUNT( DISTINCT(b.salary)) FROM emp_salary b
WHERE b.salary >= a.salary
);
But if you want to display only second highest salary(only single record), use the below sql:
SELECT DISTINCT(a.salary),
a.id,
a.name
FROM emp_salary a
WHERE 2 = (SELECT COUNT( DISTINCT(b.salary)) FROM emp_salary b
WHERE b.salary >= a.salary
) limit 1;

Combining Two Select Sum Statements Into One

I have two statements within my table which work fine individually like this:
SELECT fee_earner, (SUM(fe_fees)) AS Total
FROM fees
GROUP BY fee_earner
order by Total desc;
SELECT supervisor, (SUM(sv_fees)) AS Total
FROM fees
GROUP BY supervisor
order by Total desc;
But there are some cases where the fee_earner and supervisor fields contain the same person. Is there a way to combine these two statements into one to get the overall totals?
You can use union all for this:
SELECT person, sum(fe_fees) as fe_fees, sum(sv_fees) as sv_fees,
(sum(fe_fees) + sum(sv_fees)) as total
FROM ((select fee_earner as person, fe_fees as fe_fees, 0 as sv_fees, 'earner' as which
from fees
) union all
(select supervisor as person, 0 as fe_fees, sv_fees as sv_fees, 'supervisor' as which
from fees
)
) t
GROUP BY person
order by Total desc;
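To illustrate what the UNION ALL version computes, here is a small sketch with Python's sqlite3 module. The rows are invented, and the fe_total / sv_total aliases are my own naming to avoid reusing the column names; the third row has the same person as both fee earner and supervisor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fees (fee_earner TEXT, fe_fees INTEGER, supervisor TEXT, sv_fees INTEGER)"
)
# Hypothetical rows.
conn.executemany(
    "INSERT INTO fees VALUES (?, ?, ?, ?)",
    [("alice", 100, "bob", 10), ("bob", 200, "alice", 20), ("alice", 50, "alice", 5)],
)

# Stack the two roles into one (person, fe_fees, sv_fees) list, then sum per person.
totals = conn.execute(
    """
    SELECT person,
           SUM(fe_fees) AS fe_total,
           SUM(sv_fees) AS sv_total,
           SUM(fe_fees) + SUM(sv_fees) AS total
    FROM (
        SELECT fee_earner AS person, fe_fees, 0 AS sv_fees FROM fees
        UNION ALL
        SELECT supervisor AS person, 0 AS fe_fees, sv_fees FROM fees
    ) AS t
    GROUP BY person
    ORDER BY total DESC
    """
).fetchall()

for row in totals:
    print(row)
```

UNION ALL (rather than UNION) matters here: UNION would drop duplicate rows and could silently lose fees when two entries happen to be identical.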
select
fee_earner, SUM(fe_fees) as total, SUM(sv_fees) as total2,
SUM(fe_fees) + SUM(sv_fees) as wholeTotal
from
fees
group by
fee_earner, supervisor
order by
Total desc;