Average of sum of date difference.. MySQL - mysql

So I'm trying to find the average time it takes for an employee to get a salary raise from the time of hiring.
I've tried a few things, but I'm not getting it right.
Here's what my database looks like:
https://dev.mysql.com/doc/employee/en/sakila-structure.html
This is what I've tried:
SELECT AVG(SUM(datediff(hire_date, min((SELECT from_date FROM salaries
WHERE from_date > hire_date AND
(SELECT salary FROM salaries
WHERE from_date = hire_date) <
(SELECT salary FROM salaries
WHERE from_date > hire_date))))))
FROM employees;
Any help would be greatly appreciated. Logically this should be correct (maybe not); I'm probably just messing up the syntax somehow.
Thanks!

This is not an easy query, but it is probably part of a course, so working it out yourself will teach you a lot of SQL.
In real system development, when the SQL gets too heavy or complex, it is sometimes easier to run several queries, use a stored procedure, or do the work in the programming language.
That said, I believe this would do the trick:
select avg(md) from (
select emp_no, min(days) as md from
(select s1.emp_no as emp_no,
s1.from_date as start,
datediff(s2.from_date,s1.from_date) as days
from employees e inner join salaries s1 on
e.emp_no=s1.emp_no
inner join salaries s2 on s1.emp_no=s2.emp_no
where
s2.from_date <> s1.from_date and
s1.from_date < s2.from_date and
s1.from_date = e.hire_date
) t group by emp_no ) tt
The idea is to first make an expensive JOIN to find all date differences (from_date of s2 minus from_date of s1), but only where s1.from_date equals hire_date and the two dates differ (so the difference is never 0).
The second inner select takes the minimum value for each employee, which corresponds to the first salary change after hiring.
The outer select makes the average.
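The steps described above can be tried on a toy data set. This is only a minimal sketch: it uses Python's sqlite3 module instead of MySQL, so `julianday()` subtraction stands in for `DATEDIFF()`, and the table contents are invented for illustration. The join and the per-employee minimum are merged into one inner query.

```python
import sqlite3

def avg_days_to_second_salary(conn):
    """Average days from the hire-date salary row to the next salary row."""
    sql = """
        SELECT AVG(md) FROM (
            -- per employee: smallest gap between the hire-date row and a later row
            SELECT s1.emp_no,
                   MIN(julianday(s2.from_date) - julianday(s1.from_date)) AS md
            FROM employees e
            JOIN salaries s1 ON s1.emp_no = e.emp_no
                            AND s1.from_date = e.hire_date
            JOIN salaries s2 ON s2.emp_no = s1.emp_no
                            AND s2.from_date > s1.from_date
            GROUP BY s1.emp_no
        ) AS per_employee
    """
    return conn.execute(sql).fetchone()[0]

# Invented sample data (assumption, not the real employees database).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_no INTEGER PRIMARY KEY, hire_date TEXT);
    CREATE TABLE salaries (emp_no INTEGER, salary INTEGER, from_date TEXT);
    INSERT INTO employees VALUES (1, '2000-01-01'), (2, '2000-01-01');
    INSERT INTO salaries VALUES
        (1, 100, '2000-01-01'), (1, 110, '2000-01-11'), (1, 120, '2000-03-01'),
        (2, 100, '2000-01-01'), (2, 105, '2000-01-31');
""")
average_days = avg_days_to_second_salary(conn)
print(average_days)
```

Employee 1 waits 10 days and employee 2 waits 30 days, so the sample average is 20 days. Note that, like the query above, this does not check that the later salary is actually higher.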

This is how I would approach this to make things that you are trying to achieve explicit:
SELECT AVG(datediff(second_salary.from_date, e.hire_date))
FROM
employees e INNER JOIN
salaries first_salary ON e.emp_no = first_salary.emp_no AND
first_salary.from_date = e.hire_date INNER JOIN
salaries second_salary ON e.emp_no = second_salary.emp_no AND
second_salary.from_date > e.hire_date AND
second_salary.salary > first_salary.salary AND
NOT EXISTS (SELECT * FROM salaries s
WHERE s.emp_no = e.emp_no AND
s.from_date > e.hire_date AND
s.from_date < second_salary.from_date AND
s.salary > first_salary.salary)
;
This type of analysis requires a lot of profiling and data quality validation though. I would not trust date conditions too much for this type of data.
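One example of the kind of validation meant here is checking whether every employee really has a salary row that starts on the hire date, since the query silently drops those who do not. The sketch below is an assumption-laden illustration: it runs on SQLite through Python's sqlite3 module, and the sample rows are invented.

```python
import sqlite3

def employees_without_hire_date_salary(conn):
    """Return emp_no values that have no salary row starting on hire_date."""
    sql = """
        SELECT e.emp_no
        FROM employees e
        WHERE NOT EXISTS (
            SELECT 1 FROM salaries s
            WHERE s.emp_no = e.emp_no AND s.from_date = e.hire_date
        )
        ORDER BY e.emp_no
    """
    return [row[0] for row in conn.execute(sql)]

# Invented sample data: employee 2 is paid from the day after hiring,
# employee 3 has no salary rows at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_no INTEGER PRIMARY KEY, hire_date TEXT);
    CREATE TABLE salaries (emp_no INTEGER, salary INTEGER, from_date TEXT);
    INSERT INTO employees VALUES
        (1, '2000-01-01'), (2, '2000-01-01'), (3, '2000-01-01');
    INSERT INTO salaries VALUES
        (1, 100, '2000-01-01'), (2, 100, '2000-01-02');
""")
missing = employees_without_hire_date_salary(conn)
print(missing)
```

If this list is not empty on the real data, the average above is computed over only a subset of employees.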

Related

Reduce number of subqueries

I've 2 tables emp and expenditure.
Emp:
ID, NAME
Expenditure:
ID, EMP_ID, AMOUNT
Each emp has a limit of 100 that he/she can spend. We want to check which emp has expenditure > 100.
Output attributes needed: Emp name, exp id, amount
My query:
SELECT E.NAME,
EXP.ID,
EXP.AMOUNT
FROM EMP E
INNER JOIN expenditure EXP ON E.ID = EXP.EMP_ID
WHERE E.ID in
(SELECT EMP_ID
FROM
(SELECT EMP_ID,
SUM(AMOUNT) AS TOTAL
FROM expenditure
GROUP BY EMP_ID
HAVING SUM(AMOUNT) > 100.00
ORDER BY TOTAL DESC) SUBQ)
ORDER BY EXP.AMOUNT desc;
Is it possible to optimize this?
Just like code, SQL queries can be written in many different ways. Run an execution plan on your SQL to see how it performs.
Below is a more concise version, although it may not be any better optimised than your current code. Use execution plans to analyse performance.
SELECT E.NAME,
E.ID, -- THIS IS EMPLOYEE ID NOT EXPENDITURE ID
EXP.EMP_SPENT
FROM EMP E
JOIN (SELECT EMP_ID, sum(AMOUNT) as EMP_SPENT FROM expenditure GROUP BY EMP_ID) EXP
ON E.ID = EXP.EMP_ID
WHERE EXP.EMP_SPENT > 100;
Additionally...
I noticed that the question is a bit confusing. The original query returns every expense line of each employee whose total expenses are above 100. The text, however, asks "which employee has gone above 100". These are two different questions. My answer above is for the latter, "who has gone above". It will NOT list all expenses of employees who have spent over 100, only the employees who have gone above 100 and their total expenditure.
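The difference between the two readings is easy to see on a tiny data set. This is a minimal sketch using Python's sqlite3 module rather than the asker's database; the rows and the alias `x` are invented for illustration.

```python
import sqlite3

# Invented sample data: Ann spends 120 in total, Bob spends 40.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE expenditure (id INTEGER PRIMARY KEY, emp_id INTEGER, amount REAL);
    INSERT INTO emp VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO expenditure VALUES (10, 1, 70), (11, 1, 50), (12, 2, 40);
""")

# Reading 1: which employees went above 100, with their totals.
totals = conn.execute("""
    SELECT e.name, e.id, SUM(x.amount) AS emp_spent
    FROM emp e JOIN expenditure x ON x.emp_id = e.id
    GROUP BY e.id, e.name
    HAVING SUM(x.amount) > 100
""").fetchall()

# Reading 2: every expense line of employees whose total is above 100.
lines = conn.execute("""
    SELECT e.name, x.id, x.amount
    FROM emp e JOIN expenditure x ON x.emp_id = e.id
    WHERE e.id IN (SELECT emp_id FROM expenditure
                   GROUP BY emp_id HAVING SUM(amount) > 100)
    ORDER BY x.amount DESC
""").fetchall()

print(totals)
print(lines)
```

The first query yields one row per employee; the second yields one row per expense, which is what the requested output attributes (emp name, exp id, amount) suggest.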
You can use a simple aggregation with HAVING clause such as
SELECT e.name, e.id, SUM(amount) AS total
FROM emp e
JOIN expenditure ep
ON e.id = ep.emp_id
GROUP BY e.name, e.id
HAVING SUM(amount) > 100
but it's not logical to have a non-aggregated column (such as the expenditure ID or a single amount) alongside aggregated ones in the result.

Find max salary and name of employee, if multiple records than print all

I want to print the name and salary amount of the employee who has the highest salary. That much works, but if there are multiple such records, I want to print all of them. There are two tables given:
EMPLOYEE TABLE :-
SALARY TABLE:-
my query is: -
SELECT E.NAME, S.AMOUNT
FROM `salary` S,
employee E
WHERE S.EMPLOYEE_ID = E.ID
and S.AMOUNT = (SELECT max(`AMOUNT`)
FROM `salary`)
Is there a better way to do this?
What you're trying to achieve is "with ties" functionality. Unfortunately MySQL doesn't support that (the docs show nothing that can be added to the LIMIT part of the query), so you have no option other than finding the max salary first and filtering the records afterwards.
So, your solution is fine for that case.
Alternatively, if you're on version 8 and newer, you may move the subquery to the with clause
with max_sal as (
select max(amount) ms from salary
)
SELECT E.NAME, S.AMOUNT
FROM salary S
JOIN employee E
ON S.EMPLOYEE_ID = E.ID
JOIN max_sal ms
ON S.AMOUNT = ms.ms
or search for it in the join directly
SELECT E.NAME, S.AMOUNT
FROM salary S
JOIN employee E
ON S.EMPLOYEE_ID = E.ID
JOIN (select max(amount) ms from salary) ms
ON S.AMOUNT = ms.ms
But I'm sure it won't get you any better performance
I like solving them with a join:
WITH M as (select max(amount) as amount from salary)
SELECT E.NAME, S.AMOUNT
FROM M JOIN salary S USING(AMOUNT) JOIN employee E ON S.EMPLOYEE_ID = E.ID
but your solution is perfectly fine..
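That the "filter on MAX" approach returns every tied row can be checked with a small sketch. It uses Python's sqlite3 module instead of MySQL, and the table layout (`salary(id, employee_id, amount)`, `employee(id, name)`) and rows are assumptions, since the original table pictures are not available.

```python
import sqlite3

def top_earners(conn):
    """Return (name, amount) for every employee tied for the highest salary."""
    sql = """
        SELECT e.name, s.amount
        FROM salary s
        JOIN employee e ON s.employee_id = e.id
        WHERE s.amount = (SELECT MAX(amount) FROM salary)
        ORDER BY e.name
    """
    return conn.execute(sql).fetchall()

# Invented sample data: Ann and Bob share the top salary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE salary (id INTEGER PRIMARY KEY, employee_id INTEGER, amount INTEGER);
    INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO salary VALUES (1, 1, 5000), (2, 2, 5000), (3, 3, 4000);
""")
winners = top_earners(conn)
print(winners)
```

Both tied employees come back, which is the behaviour the question asks for.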

MySQL joining tables and functions

I'm trying to answer these questions but I'm not understanding the whole joining and functions part of MySQL. Can someone show me or explain these to me?
this is the link to the employee database we are using - https://github.com/datacharmer/test_db
I want to know how many employees with each title were born after 1965-01-01.
I want to know the average salary per title.
How much money was spent on salary for the marketing department between the years 1990 and 1992?
This is what I have so far for each one.
1.
select count(title) as "Number Of Employees", title from titles GROUP BY title LIMIT 20;
2.
SELECT d.dept_name as "Department", avg(s.salary) as "Average Salary" from departments d
INNER JOIN dept_emp de on de.dept_no = d.dept_no
INNER JOIN salaries s on s.emp_no = de.emp_no
GROUP BY d.dept_name;
3. This one seems like it's just those two put together, so I completely don't understand it.
Join with the employee table so you can get the employee's date of birth.
SELECT t.title, COUNT(*) AS "Number of Employees"
FROM titles AS t
JOIN employees AS e ON e.emp_no = t.emp_no
WHERE e.birth_date > '1965-01-01'
GROUP BY t.title
You need to get the most recent salary for each employee and average that. And you have to join with the titles table so you can average by title.
SELECT t.title, AVG(s.salary)
FROM titles AS t
JOIN employees AS e ON e.emp_no = t.emp_no
JOIN (
-- subquery to get latest salary for each employee
-- See https://stackoverflow.com/questions/7745609/sql-select-only-rows-with-max-value-on-a-column?noredirect=1&lq=1
SELECT s.emp_no, s.salary
FROM salaries AS s
JOIN (
SELECT emp_no, MAX(from_date) AS max_date
FROM salaries
GROUP BY emp_no
) AS ms ON s.emp_no = ms.emp_no AND s.from_date = ms.max_date
) AS s ON s.emp_no = e.emp_no
GROUP BY t.title
I'm not even sure what the third question means. Does it mean the total salaries for all employees during those years? This seems incredibly complex for a beginner exercise, since you have to deal with different start/end dates for employees, and changing salaries during that period. I'm not even sure how to do that in a single query.

Left Join or Group By for Finding Maximum Salary in Mysql

I am trying to understand why it is a popular belief that avoiding a GROUP BY is always beneficial. My problem statement is: from an employee table where department_id is a foreign key, find the departments where the maximum employee salary is 40000.
1 the group by approach :
select d.department_name , e.max_salary
from department d
join ( select department_id, max(salary) as max_salary
from emp
group by 1
having max_salary = 40000 ) e
on (d.department_id = e.department_id)
2 Now the left join approach :
select d.department_name, inner_q.salary
from department d
join
(select e.department_id , e.salary
from emp e
left join emp e_inner
on (e.department_id = e_inner.department_id and e.salary < e_inner.salary)
where e_inner.department_id is null and e.salary = 40000 ) inner_q
on (d.department_id = inner_q.department_id)
Unfortunately explain plan does not make much sense to me. Any help in explaining which one should perform better and why would be much appreciated.
You are working too hard.
SELECT department_id, MAX(salary) AS max_salary
FROM emp
GROUP BY department_id
HAVING max_salary >= 40000
That will be faster than any version with subqueries. (emp holds department_id rather than department_name, so join to department afterwards if you need the name.)
This will make it run faster: INDEX(department_id, salary)
(Perhaps you want >= 40000, not = 40000?)
This version will make a single pass over the entire table (or INDEX, if you add that "covering" index), gathering the max salary for each department. Then it will throw away results that fail the HAVING clause; delivering the rest.
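The single-pass GROUP BY with a HAVING filter can be sketched on toy data. This is a hedged illustration on SQLite through Python's sqlite3 module, not MySQL; it assumes emp carries department_id as in the question, and the index name and rows are invented.

```python
import sqlite3

# Invented sample data; the composite index mirrors the suggested
# "covering" index on (department, salary).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, department_id INTEGER, salary INTEGER);
    CREATE INDEX emp_dept_salary ON emp (department_id, salary);
    INSERT INTO emp (department_id, salary) VALUES
        (1, 30000), (1, 40000), (2, 35000), (3, 45000), (3, 20000);
""")

# One grouped pass: compute the max per department, then drop the groups
# that fail the HAVING condition.
rows = conn.execute("""
    SELECT department_id, MAX(salary) AS max_salary
    FROM emp
    GROUP BY department_id
    HAVING MAX(salary) >= 40000
    ORDER BY department_id
""").fetchall()
print(rows)
```

Department 2 is discarded by HAVING because its maximum is 35000; changing `>=` to `=` would also discard department 3.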
I would have no qualms about running this GROUP BY on a table of 10K rows. A million-row table would take a noticeable, but small, amount of time.

how to calculate from date of birth in a single sql query

I want to find the age of the employee with the highest salary in the database.
I tried this query:
SELECT DATEDIFF(SELECT DATE_FORMAT(SYSDATE(),'%Y-%m-%d'),
(SELECT birth_date FROM salaries as s, employees as e WHERE salary = (SELECT MAX(salary) FROM salaries) and s.emp_no = e.emp_no)/365.25);
But it's not working.
Your original attempt seemed to have a number of minor problems, though the overall approach seems sound to me. Just take the DATEDIFF() between the birth date of the employee with the maximum salary and the current datetime.
SELECT DATEDIFF(SYSDATE(), e.birth_date) / 365.25
FROM salaries s
INNER JOIN employees e
ON s.emp_no = e.emp_no
WHERE s.salary = (SELECT MAX(salary) FROM salaries)
Changes I made include using an explicit inner join between your tables and also computing the date difference in a different way.
Note that this query would return stats for multiple employees should more than one employee tie for the maximum salary. In absence of further requirements, this seems like a reasonable thing to do.
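The tie behaviour mentioned above can be seen in a small sketch. It runs on SQLite through Python's sqlite3 module, so `julianday()` subtraction replaces MySQL's `DATEDIFF()`, and a fixed reference date is passed in instead of `SYSDATE()` to keep the result reproducible. The rows are invented.

```python
import sqlite3

def ages_of_top_earners(conn, today):
    """Return (emp_no, age in years) for everyone tied for the top salary."""
    sql = """
        SELECT e.emp_no,
               (julianday(?) - julianday(e.birth_date)) / 365.25 AS age_years
        FROM salaries s
        JOIN employees e ON s.emp_no = e.emp_no
        WHERE s.salary = (SELECT MAX(salary) FROM salaries)
        ORDER BY e.emp_no
    """
    return conn.execute(sql, (today,)).fetchall()

# Invented sample data: employees 1 and 2 tie for the maximum salary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_no INTEGER PRIMARY KEY, birth_date TEXT);
    CREATE TABLE salaries (emp_no INTEGER, salary INTEGER);
    INSERT INTO employees VALUES
        (1, '1980-01-01'), (2, '1990-01-01'), (3, '1970-01-01');
    INSERT INTO salaries VALUES (1, 90000), (2, 90000), (3, 50000);
""")
ages = ages_of_top_earners(conn, "2020-01-01")
print(ages)
```

Two rows come back, one per tied employee. Dividing by 365.25 gives an approximate age in years rather than an exact calendar age.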