I just want to ask you please this question on SQL.
Let's consider this EMPLOYEE table :
Employee Department
A 10
A 10
A 11
A 12
B 13
B 13
What I want to display is for each employee, all distinct departments (without duplicates) AND the total number of those distinct departments. So, something like this :
Employee Department total_dept
A 10 3
A 11 3
A 12 3
B 13 1
If possible, I would even prefer something like these :
Employee Department total_dept
A 10 3
A 11 null
A 12 null
B 13 1
I have a very big table (with many columns and many data) so I thought this can be an "optimisation", no ? I mean, there is no need to store the total_dept in all rows. Just put it once it's sufficient. No problem if after this I left the column empty. But I don't know if it's possible to do such thing in SQL.
So, how can I fix this please ? I tried but it seems impossible to combine count(column) with the same column...
Thank you in advance
This might be what you are looking for
SELECT
emp,
dept,
(select count(distinct tbi.dept) from TB as tbi where tbi.emp = tb.emp) as total_dept
FROM TB as tb
group by emp, dept;
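As a quick check of the correlated-subquery approach, here is a minimal sketch that runs it on the sample data from the question. It assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for MySQL; the function name is made up for illustration.

```python
import sqlite3

def distinct_depts_with_total():
    """Return (emp, dept, total_dept) rows for the question's sample data."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.execute("CREATE TABLE TB (emp TEXT, dept INTEGER)")
    conn.executemany(
        "INSERT INTO TB VALUES (?, ?)",
        [("A", 10), ("A", 10), ("A", 11), ("A", 12), ("B", 13), ("B", 13)],
    )
    rows = conn.execute(
        """
        SELECT tb.emp,
               tb.dept,
               -- correlated subquery: distinct departments of this employee
               (SELECT COUNT(DISTINCT tbi.dept)
                  FROM TB AS tbi
                 WHERE tbi.emp = tb.emp) AS total_dept
          FROM TB AS tb
         GROUP BY tb.emp, tb.dept      -- removes the duplicate rows
         ORDER BY tb.emp, tb.dept
        """
    ).fetchall()
    conn.close()
    return rows
```

The GROUP BY collapses the duplicate (emp, dept) pairs, while the subquery is evaluated per employee, so every row of one employee carries the same total.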
MySQL 8.0 supports windowed COUNT:
SELECT *,COUNT(*) OVER (PARTITION BY Employee) AS total_dept
FROM (SELECT DISTINCT * FROM Employees) e
You could even produce the second result set (though I recommend leaving presentation matters to the application layer):
SELECT *, CASE WHEN ROW_NUMBER() OVER(PARTITION BY Employee ORDER BY Department) = 1
THEN COUNT(*) OVER (PARTITION BY Employee) END AS total_dept
FROM (SELECT DISTINCT * FROM Employees) e
ORDER BY Employee, Department;
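To see what this window-function query returns, here is a small sketch that runs it on the question's sample rows. It assumes SQLite 3.25 or later (accessed through Python's sqlite3 module) as a stand-in for MySQL 8.0; the function name is hypothetical.

```python
import sqlite3

def total_on_first_row():
    """Return distinct (Employee, Department) rows; total_dept only on each employee's first row."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in; needs window-function support
    conn.execute("CREATE TABLE Employees (Employee TEXT, Department INTEGER)")
    conn.executemany(
        "INSERT INTO Employees VALUES (?, ?)",
        [("A", 10), ("A", 10), ("A", 11), ("A", 12), ("B", 13), ("B", 13)],
    )
    rows = conn.execute(
        """
        SELECT Employee,
               Department,
               CASE WHEN ROW_NUMBER() OVER (PARTITION BY Employee
                                            ORDER BY Department) = 1
                    THEN COUNT(*) OVER (PARTITION BY Employee)
               END AS total_dept          -- NULL on every row but the first
          FROM (SELECT DISTINCT Employee, Department FROM Employees) AS e
         ORDER BY Employee, Department
        """
    ).fetchall()
    conn.close()
    return rows
```

The inner SELECT DISTINCT removes duplicates first, so COUNT(*) over the partition is already the number of distinct departments.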
For the 2nd version:
SELECT
DISTINCT e.Employee, e.Department,
CASE
WHEN e.Department =
(SELECT MIN(Department) FROM Employees WHERE Employees.Employee = e.Employee)
THEN
(SELECT COUNT(DISTINCT Department) FROM Employees WHERE Employees.Employee = e.Employee)
END AS total_dept
FROM Employees e
ORDER BY e.Employee, e.Department;
Let's consider this example :
Employee Function Start_dept End_dept
A dev 10 13
A dev 11 12
A test 9 9
A dev 13 11
What I want to select is employee, their function and the distinct departments in BOTH "start" and "end" department. It will give this result :
Employee Function count_distinct_dept
A dev 4
A test 1
For the dev A, we have only 4 distinct departments (10, 11, 12 and 13) because we shouldn't count duplicate values in the 2 columns (start and end).
How can I do this ? (I'm using mySQL).
Is it possible to do this on one request without any JOIN or any UNION ? Or is it obligatory to use one of them ? Since I am using a huge database (with more than 3 billions lines), I am not sure if a join or union request will be optimal...
Use a union all and aggregation:
select Employee, Function, count(distinct dept)
from ((select Employee, Function, Start_dept as dept
from e
) union all
(select Employee, Function, End_dept
from e
)
) e
group by Employee, Function;
If you want performance, I would suggest starting with two indexes on (Employee, Function, Start_Dept) and (Employee, Function, End_Dept). Then:
select Employee, Function, count(distinct dept)
from ((select distinct Employee, Function, Start_dept as dept
from e
) union all
(select distinct Employee, Function, End_dept
from e
)
) e
group by Employee, Function;
The subqueries should be scanning the index rather than the overall table. You will still need to do the final GROUP BY. I am guessing that COUNT(DISTINCT) is a better approach than UNION in the subquery, but you could test that.
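The UNION ALL plus COUNT(DISTINCT) idea can be tried on the four sample rows from the question. This is only a sketch under assumptions: it uses an in-memory SQLite database through Python's sqlite3 module rather than MySQL, drops the parentheses around the UNION branches (SQLite does not accept them there), and the function name is invented.

```python
import sqlite3

def count_distinct_depts():
    """Count distinct departments across Start_dept and End_dept per (Employee, Function)."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.execute(
        "CREATE TABLE e (Employee TEXT, Function TEXT, Start_dept INTEGER, End_dept INTEGER)"
    )
    conn.executemany(
        "INSERT INTO e VALUES (?, ?, ?, ?)",
        [("A", "dev", 10, 13), ("A", "dev", 11, 12), ("A", "test", 9, 9), ("A", "dev", 13, 11)],
    )
    rows = conn.execute(
        """
        SELECT Employee, Function, COUNT(DISTINCT dept) AS count_distinct_dept
          FROM (SELECT Employee, Function, Start_dept AS dept FROM e
                UNION ALL                     -- stack both columns into one
                SELECT Employee, Function, End_dept FROM e) AS u
         GROUP BY Employee, Function
         ORDER BY Employee, Function
        """
    ).fetchall()
    conn.close()
    return rows
```

For the dev rows the stacked values are 10, 11, 13, 13, 12 and 11, which collapse to four distinct departments.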
Is there an efficient way to find missing data not just in one sequence, but many sequences?
This is probably unavoidably O(N**2), so efficient here is defined as relatively few queries using MySQL
Let's say I have a table of temporary employees and their starting and ending months.
employees | start_month | end_month
------------------------------------
Jane 2017-05 2017-07
Bob 2017-10 2017-12
And there is a related table of monthly payments to those employees
employee | paid_month
---------------------
Jane 2017-05
Jane 2017-07
Bob 2017-11
Bob 2017-12
Now, it's clear that we're missing a month for Jane (2017-06) and one for Bob too (2017-10).
Is there a way to somehow find the gaps in their payment record, without lots of trips back and forth?
In the case where there's just one sequence to check, some people generate a temporary table of valid values, and then LEFT JOIN to find the gaps. But here we have different sequences for each employee.
One possibility is that we could do an aggregate query to find the COUNT() of paid_months for each employee, and then check it versus the expected delta of months. Unfortunately the data here is a bit dirty so we actually have payment dates that could be before or after that employee start or end date. But we're verifying that the official sequence definitely has payments.
Form a Cartesian product of employees and months, then left join the actual data to that, then the missing data is revealed when there is no matched payment to the Cartesian product.
You need a list of every month. This might come from a "calendar table" you already have, or it might be possible to use a subquery (if every month is represented in the source data).
e.g.
select
m.paid_month, e.employee
from (select distinct paid_month from payments) m
cross join (select employee from employees) e
left join payments p on m.paid_month = p.paid_month and e.employee = p.employee
where p.employee is null
The subquery m can be substituted by the calendar table or some other technique for generating a series of months. e.g.
select
DATE_FORMAT(m1, '%Y-%m')
from (
select
'2017-01-01'+ INTERVAL m MONTH as m1
from (
select @rownum:=@rownum+1 as m
from (select 1 union select 2 union select 3 union select 4) t1
cross join (select 1 union select 2 union select 3 union select 4) t2
## cross join (select 1 union select 2 union select 3 union select 4) t3
## cross join (select 1 union select 2 union select 3 union select 4) t4
cross join (select @rownum:=-1) t0
) d1
) d2
where m1 < '2018-01-01'
order by m1
The subquery e could contain other logic (e.g. to determine which employees are still currently employed, or that are "temporary employees")
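As a variation on the approach above, the product of employees and months can be limited to each employee's own start/end range, which is what the question asks for. The sketch below is an assumption-laden illustration: it uses SQLite through Python's sqlite3 module instead of MySQL, a hypothetical calendar table named months, and relies on 'YYYY-MM' strings sorting in date order.

```python
import sqlite3

def missing_payments():
    """Return (employee, month) pairs inside the employment range that have no payment."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.executescript(
        """
        CREATE TABLE employees (employee TEXT, start_month TEXT, end_month TEXT);
        CREATE TABLE payments  (employee TEXT, paid_month TEXT);
        CREATE TABLE months    (month TEXT);  -- hypothetical calendar table
        INSERT INTO employees VALUES ('Jane', '2017-05', '2017-07'),
                                     ('Bob',  '2017-10', '2017-12');
        INSERT INTO payments VALUES ('Jane', '2017-05'), ('Jane', '2017-07'),
                                    ('Bob',  '2017-11'), ('Bob',  '2017-12');
        """
    )
    # Fill the calendar table with every month of 2017.
    conn.executemany(
        "INSERT INTO months VALUES (?)", [(f"2017-{m:02d}",) for m in range(1, 13)]
    )
    rows = conn.execute(
        """
        SELECT e.employee, m.month
          FROM employees AS e
          JOIN months AS m                      -- expected months per employee
            ON m.month BETWEEN e.start_month AND e.end_month
          LEFT JOIN payments AS p
            ON p.employee = e.employee AND p.paid_month = m.month
         WHERE p.employee IS NULL               -- no matching payment
         ORDER BY e.employee, m.month
        """
    ).fetchall()
    conn.close()
    return rows
```

Payments dated outside the official range simply never match an expected month, so dirty data of that kind does not affect the result.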
First, generate all the months between start_date and end_date for each employee, then left outer join that list to the payments table on the paid month and keep only the non-matching months (where the payment's employee name is null). The query below uses Oracle syntax (CONNECT BY, add_months):
select e.employee, e.yearmonth as missing_paid_month from (
with t as (
select e.employee, to_date(e.start_date, 'YYYY-MM') as start_date, to_date(e.end_date, 'YYYY-MM') as end_date from employees e
)
select distinct t.employee,
to_char(add_months(trunc(start_date,'MM'),level - 1),'YYYY-MM') yearmonth
from t
connect by trunc(end_date,'mm') >= add_months(trunc(start_date,'mm'),level - 1)
order by t.employee, yearmonth
) e
left outer join payments p
on p.paid_month = e.yearmonth and p.employee = e.employee
where p.employee is null
output
EMPLOYEE MISSING_PAID_MONTH
Bob 2017-10
Jane 2017-06
SQL Fiddle http://sqlfiddle.com/#!4/2b2857/35
I have a table EMP with employee ids and their hire year. I have to get the number of employees hired in, let's say, the years 2002 and 2000. The output table should also contain the number of employees hired over the whole time.
So the last is easy. I just have to write:
SELECT COUNT(id) AS GLOBELAMOUNT FROM EMP;
But how do I count the amount of hired employees in 2002?
I could write the following:
SELECT COUNT(id) AS HIREDIN2002 FROM EMP WHERE YEAR = 2002;
But how do I combine this in one tuple with the data above?
Maybe I should group the data by Hireyear first and then count it? But can not really imagine how I count the data for several years.
Hope u guys can help me.
Cheers,
Andrej
Use conditional aggregation, e.g.:
SELECT COUNT(id) AS GLOBELAMOUNT,
COUNT(CASE WHEN YEAR=2000 THEN 1 END) AS HIREDIN2000,
COUNT(CASE WHEN YEAR=2002 THEN 1 END) AS HIREDIN2002
FROM EMP;
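Here is a minimal sketch of the conditional aggregation on made-up data. It assumes an in-memory SQLite database (Python's sqlite3 module) in place of the asker's DBMS; the hire years and the function name are invented for illustration.

```python
import sqlite3

def hire_counts():
    """Return (total, hired in 2000, hired in 2002) as a single row."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in for the real database
    conn.execute("CREATE TABLE EMP (id INTEGER, YEAR INTEGER)")
    # Hypothetical data: seven employees with their hire years.
    years = [2000, 2000, 2001, 2002, 2002, 2002, 2005]
    conn.executemany(
        "INSERT INTO EMP VALUES (?, ?)", list(enumerate(years, start=1))
    )
    row = conn.execute(
        """
        SELECT COUNT(id) AS GLOBELAMOUNT,
               -- CASE yields NULL for other years, and COUNT skips NULLs
               COUNT(CASE WHEN YEAR = 2000 THEN 1 END) AS HIREDIN2000,
               COUNT(CASE WHEN YEAR = 2002 THEN 1 END) AS HIREDIN2002
          FROM EMP
        """
    ).fetchone()
    conn.close()
    return row
```

The table is scanned once, and each CASE expression acts as a per-column filter.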
In Microsoft SQL Server (Transact-SQL) at least, you can use a windowed aggregate function like this:
Select Distinct
Year
,count(Id) over (Partition by Year) as CountHiredInYear
,count(Id) over () as CountTotalHires
From EMP
This gives something like:
Year | CountHiredInYear | CountTotalHires
2005 | 3 | 12
2006 | 4 | 12
2007 | 5 | 12
Another SQL Server specific approach is the With Rollup keyword.
Select Year
,count(Id) as CountHires
From Emp
Group by Year
With Rollup
This adds a summary line for each level of grouping, with the total value for that set of rows. So here, you'd get an extra row where Year was NULL, with the value 12.
You could use two (or more) inline queries:
SELECT
(SELECT COUNT(id) FROM EMP) AS GLOBELAMOUNT,
(SELECT COUNT(id) FROM EMP WHERE YEAR = 2002) AS HIREDIN2002
or a CROSS JOIN:
SELECT GLOBELAMOUNT, HIREDIN2002
FROM
(SELECT COUNT(id) AS GLOBELAMOUNT FROM EMP) g CROSS JOIN
(SELECT COUNT(id) AS HIREDIN2002 FROM EMP WHERE YEAR = 2002) h
A table 'employees' with
id integer,
name string,
office integer,
income decimal
How can I select, in one query, the 3 highest incomes for each office (if possible)?
SELECT id,name
FROM employees
GROUP BY office
ORDER BY income DESC
will return only one guy/office
Here is one way to do this with a small workaround: keep each employee who has fewer than 3 colleagues in the same office with a higher income. Check this SQLFiddle for the tested query.
select * from employees as e1 where 3 > (select count(*) from employees as e2 where e2.office = e1.office and e2.income > e1.income);
To get the 3 highest incomes per office, you can use a rank query with user-defined variables. In other RDBMSs it is easier to achieve this kind of result with window functions, but MySQL before 8.0 does not have them. The query below gives the 3 employees with the highest income per office:
SELECT t.id,t.name
FROM (
SELECT *,
@r:= CASE WHEN @g = office THEN @r + 1 ELSE 1 END row_num,
@g:= office
FROM employees
CROSS JOIN(SELECT @g:=NULL,@r:=0) a
ORDER BY office,income DESC
) t
WHERE t.row_num <=3
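Where window functions are available (MySQL 8.0 or later), the same per-office numbering can be written with ROW_NUMBER() instead of user-defined variables. The sketch below swaps in that technique and runs it on SQLite 3.25+ through Python's sqlite3 module as a stand-in; the sample rows and the function name are hypothetical.

```python
import sqlite3

def top3_per_office():
    """Return up to three highest-paid employees per office."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in; needs window-function support
    conn.execute(
        "CREATE TABLE employees (id INTEGER, name TEXT, office INTEGER, income NUMERIC)"
    )
    # Hypothetical data: four people in office 1, two in office 2.
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?, ?)",
        [
            (1, "Ann", 1, 5000), (2, "Ben", 1, 4000), (3, "Cy", 1, 3000),
            (4, "Dee", 1, 2000), (5, "Eve", 2, 6000), (6, "Fay", 2, 1000),
        ],
    )
    rows = conn.execute(
        """
        SELECT id, name, office, income
          FROM (SELECT e.*,
                       -- restart numbering at 1 for every office
                       ROW_NUMBER() OVER (PARTITION BY office
                                          ORDER BY income DESC) AS row_num
                  FROM employees AS e) AS t
         WHERE row_num <= 3
         ORDER BY office, income DESC
        """
    ).fetchall()
    conn.close()
    return rows
```

ROW_NUMBER() breaks ties arbitrarily; RANK() or DENSE_RANK() would be the choice if tied incomes should share a position.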
Select the highest income for each office:
SELECT office, max(income)
FROM employees
GROUP BY office
Get the employees with these incomes:
SELECT e.id, e.name, e.office
FROM employees as e
INNER JOIN (SELECT office, max(income) as income FROM employees GROUP BY office) as mx ON e.office = mx.office and e.income = mx.income
Edit:
I copied the SQLFiddle of Ajeesh to test the queries: http://sqlfiddle.com/#!2/de4de1/14/0
The two tables are employee_salary and employee:
employee_salary
salary_id | emp_id | salary
employee
emp_id | first_name | last_name | gender | email | mobile | dept_id | is_active
Query to get all employees who have the nth highest salary, where n = 1, 2, 3, ... (any integer):
SELECT a.salary, b.first_name
FROM employee_salary a
JOIN employee b
ON a.emp_id = b.emp_id
WHERE a.salary = (
SELECT salary
FROM employee_salary
GROUP BY salary
DESC
LIMIT 1 OFFSET N-1
)
My Questions:
1) Is there a better, more optimized way to query this?
2) Is using LIMIT a good option?
3) There are more options to calculate the nth highest salary; which is the best, and which should be used when?
One option:
SELECT *
FROM employee_salary t1
WHERE ( N ) = ( SELECT COUNT( t2.salary )
FROM employee_salary t2
WHERE t2.salary >= t1.salary
)
Using Rank Method
SELECT salary
FROM
(
SELECT @rn := @rn + 1 rn,
a.salary
FROM tableName a, (SELECT @rn := 0) b
GROUP BY salary DESC
) sub
WHERE sub.rn = N
You have asked what seems like a reasonable question. There are different ways of doing things in SQL and sometimes some methods are better than others. The ranking problem is just one of many, many examples. The "answer" to your question is that, in general, order by is going to perform better than group by in MySQL. Although even that depends on the particular data and what you consider to be "better".
The specific issues with the question are that you have three different queries that return three different things.
The first returns all employees with a "dense rank" that is the same. That terminology is used purposely because it corresponds to the ANSI dense_rank() function, which MySQL did not support before version 8.0. So, if your salaries are 100, 100, and 10, it will return two rows with a ranking of 1 and one with a ranking of 2.
The second returns different results if there are ties. If the salaries are 100, 100, 10, this version will return no rows with a ranking of 1, two rows with a ranking of 2, and one row with a ranking of 3.
The third returns an entirely different result set, which is just the salaries and the ranking of the salaries.
My comment was directed at trying the queries on your data. In fact, you should decide what you actually want, both from a functional and a performance perspective.
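The tie behaviour described above can be checked on the 100, 100, 10 example. This is a rough sketch assuming an in-memory SQLite database (Python's sqlite3 module) rather than MySQL; the column aliases and function name are invented, and only the salary column matters here.

```python
import sqlite3

def compare_rankings():
    """Return (salary, count_rank, distinct_rank) for salaries 100, 100, 10."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.execute(
        "CREATE TABLE employee_salary (salary_id INTEGER, emp_id INTEGER, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employee_salary VALUES (?, ?, ?)",
        [(1, 1, 100), (2, 2, 100), (3, 3, 10)],
    )
    rows = conn.execute(
        """
        SELECT t1.salary,
               -- counts every row at or above this salary (ties inflate it)
               (SELECT COUNT(t2.salary) FROM employee_salary t2
                 WHERE t2.salary >= t1.salary) AS count_rank,
               -- counts distinct salaries at or above this one (dense rank)
               (SELECT COUNT(DISTINCT t2.salary) FROM employee_salary t2
                 WHERE t2.salary >= t1.salary) AS distinct_rank
          FROM employee_salary t1
         ORDER BY t1.salary DESC, t1.salary_id
        """
    ).fetchall()
    conn.close()
    return rows
```

With the plain COUNT no row gets rank 1, so asking for N = 1 returns nothing, whereas the COUNT(DISTINCT) form puts both 100s at rank 1.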
LIMIT requires the SQL to skim through all records between 0 and N and therefore requires increasing time the further back in your ranking you want to look. However, IMO that problem cannot be solved better.
As Gordon Linoff suggested: Run your option against your data set, using the commonly used ranks (which ranks are queried often, which are not? The result might be fast on rank 1 but terrible on rank 100).
Execute and analyze the Query Execution Plan and create indexes accordingly (for example on the salary column) and retest your queries.
Other options:
Option 4:
You could build a ranking table which serves as a cache. The execution plan of your LIMIT query shows (see sqlfiddle here) that MySQL already creates a temporary table to solve the query.
Pros: Easy and fast
Cons: Forces you to regenerate the ranking table each time the data changes
Option 5:
You could reconsider how you define "ranks".
If we have the following salaries:
100'000
100'000
80'000
Is the employee Nr 3 considered to be of rank 3 or 2?
Are 1 and 2 on the same rank (rank 1), but 3 is on rank 3?
If you define rank = order, you can greatly simplify the query to
SELECT a.salary, b.first_name
FROM employee_salary a, employee b
WHERE a.emp_id = b.emp_id
order by salary desc
LIMIT 1 OFFSET 4
demo: http://sqlfiddle.com/#!2/e7321d/1/0
try this,
SELECT * FROM one AS A
WHERE ( n ) = ( SELECT COUNT(DISTINCT(B.salary)) FROM one AS B WHERE B.salary >= A.salary )
Suppose the emp_salary table has the below records:
If you want to select all employees with the nth (N = 1, 2, 3, etc.) highest or lowest salary, use the SQL below (use >= for the highest and <= for the lowest, according to your needs):
SELECT DISTINCT(a.salary),
a.id,
a.name
FROM emp_salary a
WHERE N = (SELECT COUNT( DISTINCT(b.salary)) FROM emp_salary b
WHERE b.salary >= a.salary
);
For example, if you want to select all employees with 2nd highest salary, use below sql:
SELECT DISTINCT(a.salary),
a.id,
a.name
FROM emp_salary a
WHERE 2 = (SELECT COUNT( DISTINCT(b.salary)) FROM emp_salary b
WHERE b.salary >= a.salary
);
But if you want to display only second highest salary(only single record), use the below sql:
SELECT DISTINCT(a.salary),
a.id,
a.name
FROM emp_salary a
WHERE 2 = (SELECT COUNT( DISTINCT(b.salary)) FROM emp_salary b
WHERE b.salary >= a.salary
) limit 1;
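To make the COUNT(DISTINCT) approach concrete, here is a small sketch with N passed as a parameter. The assumptions are that SQLite (through Python's sqlite3 module) stands in for MySQL and that the four rows below, like the function name, are made up for illustration.

```python
import sqlite3

def employees_with_nth_highest(n):
    """Return (salary, id, name) for every employee holding the n-th highest distinct salary."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.execute("CREATE TABLE emp_salary (id INTEGER, name TEXT, salary INTEGER)")
    # Hypothetical records: two employees share the second-highest salary.
    conn.executemany(
        "INSERT INTO emp_salary VALUES (?, ?, ?)",
        [(1, "Ann", 300), (2, "Ben", 200), (3, "Cy", 200), (4, "Dee", 100)],
    )
    rows = conn.execute(
        """
        SELECT a.salary, a.id, a.name
          FROM emp_salary a
         WHERE ? = (SELECT COUNT(DISTINCT b.salary)   -- distinct salaries >= this one
                      FROM emp_salary b
                     WHERE b.salary >= a.salary)
         ORDER BY a.id
        """,
        (n,),
    ).fetchall()
    conn.close()
    return rows
```

Calling employees_with_nth_highest(2) returns both employees earning 200, and an N larger than the number of distinct salaries yields an empty list.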