aggregate tables with AVG - mysql

I am new to SQL.
I checked another solution, "SQL JOIN two tables with AVG", posted on Stack Overflow, and I don't understand the meaning of this line in that answer: AVG(score.score * 1.0). Besides, the alternative solution below doesn't work at all:
SELECT songs.id, songs.song, songs.artist,
(SELECT AVG(Score) FROM score WHERE score.id = songs.id) AS AvgScore)
FROM songs
Here are my tables:
[employees]
DEP_ID | SALARY
1 | 500
2 | 200
1 | 300
2 | 1000
2 | 400
3 | 200
3 | 300
[departments]
DEPT_ID_DEP | DEP_NAME
1 | Volcano
2 | ShootingStar
3 | Tsunami
In the end, I want to create a list that looks like this:
Dept Name | Average Salary
Volcano | $$$
ShootingStar| $$
Tsunami | $$$$
I tried various approaches and hunted for hints on Stack Overflow about subqueries and inner joins, but I still can't get it.
Based on the solution in the previously linked question, "SQL JOIN two tables with AVG", this code works:
-- mapping DEPT ID with NAME + average salary by DEPT --
select EMPLOYEES.DEP_ID, DEPARTMENTS.DEP_NAME, AVG(EMPLOYEES.SALARY) as AVG_S
from EMPLOYEES
LEFT JOIN DEPARTMENTS
ON EMPLOYEES.DEP_ID = DEPARTMENTS.DEPT_ID_DEP
group by DEP_ID, DEP_NAME;
However, I want to understand why my original query doesn't work:
select E.DEP_ID, D.DEP_NAME, (select AVG(SALARY) from EMPLOYEES group by DEP_ID) as AVG_S
from EMPLOYEES E, DEPARTMENTS D
where E.DEP_ID = D.DEPT_ID_DEP
group by DEP_ID, DEP_NAME;
Please help!
Thank you very much.

The query you wanted to write:
select
d.dept_id_dep,
d.dep_name,
(select avg(salary) from employees e where e.dep_id = d.dept_id_dep) as avg_s
from departments d;
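The logic of the query is to select from departments only, then use a correlated subquery to compute the average salary of that department's employees. This avoids aggregating in the outer query.
Your query fails in the following ways:
table employees is in the from clause of the outer query, although only the subquery needs it;
the outer query does a group by, which is unnecessary once the average comes from a subquery;
the subquery is not correlated on the department, so it returns one average per department instead of a single value, and MySQL rejects it with "Subquery returns more than 1 row".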
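To check the corrected query, here is a minimal sketch that runs it with Python's built-in sqlite3 module against the sample rows from the question. SQLite is only a stand-in for MySQL (an assumption; the dialects differ in places), and the lower-case table and column names are taken from the queries above. It also runs the LEFT JOIN / GROUP BY version from the question so the two results can be compared.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (dep_id INTEGER, salary INTEGER);
    CREATE TABLE departments (dept_id_dep INTEGER, dep_name TEXT);
    -- Sample rows from the question.
    INSERT INTO employees VALUES (1, 500), (2, 200), (1, 300), (2, 1000),
                                 (2, 400), (3, 200), (3, 300);
    INSERT INTO departments VALUES (1, 'Volcano'), (2, 'ShootingStar'), (3, 'Tsunami');
""")

# Correlated subquery: departments drive the query, and the subquery
# averages only the employees of the current department row.
correlated = conn.execute("""
    SELECT d.dept_id_dep, d.dep_name,
           (SELECT AVG(e.salary) FROM employees e
             WHERE e.dep_id = d.dept_id_dep) AS avg_s
    FROM departments d
    ORDER BY d.dept_id_dep
""").fetchall()

# The join + GROUP BY version from the question, for comparison.
joined = conn.execute("""
    SELECT e.dep_id, d.dep_name, AVG(e.salary) AS avg_s
    FROM employees e
    LEFT JOIN departments d ON e.dep_id = d.dept_id_dep
    GROUP BY e.dep_id, d.dep_name
    ORDER BY e.dep_id
""").fetchall()

for row in correlated:
    print(row)
```

Both queries give 400 for Volcano, about 533.33 for ShootingStar and 250 for Tsunami. The correlated version would also list a department with no employees (with a NULL average), which the join version, driven by employees, cannot do.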


Trouble with SQL query - Search for crew members who never flew the same route two days in a row

I hope the title is not too confusing. I am learning SQL by working on a database for an airline company. The query I will explain involves the following tables:
1. Airplane
plane_number| type | capacity
-------------------------------
I-XX0 | boeing| 200
I-XX1 | airbus| 250
2. Route
route_id | airport1 | airport2
-------------------------------
1 | LAX | CDG
2 | FCO | LAX
3. Flight
flight_id | departure | arrival | plane_number | route_id
-----------------------------------------------------------------------------------------
AC 000 | 2020-02-11T13:10:00 |2020-02-11T15:15:00 | I-XX0 | 1
AC 001 | 2020-02-12T13:10:00 |2020-02-12T15:15:00 | I-XX1 | 2
4. employee
employee_id | name | surname
-------------------------------
1 | bob | black
2 | paul | white
5. service
employee_id | flight_id
-----------------------
1 | AC 000
2 | AC 001
Given this data, is it possible to find the employees who never worked on the same route two days in a row?
I have tried doing a self join, but I'm not sure that's the right approach.
I hope I've been clear enough; if not, please comment so I can edit the question.
Thank you all very much in advance.
EDIT
To make the whole model clearer, here is the ER model (diagram not available):
This isn't terrible once you break it into two parts: a query that returns the employee_ids of employees who have worked the same route on consecutive days, and a query on the employee table that returns every employee whose employee_id does not appear in the first query.
For the first query, we need to find consecutive flights on the same route. The first step is joining Flight to itself on the route_id and the departure date. The join condition for the departure date should check that the second flight departed one day after the first:
FROM Flight f1
JOIN Flight f2 ON f1.route_id = f2.route_id
AND CAST(f1.Departure AS DATE) = CAST(DATE_ADD(f2.Departure, INTERVAL -1 DAY) AS DATE)
Then join each Flight to service on flight_id, and confirm the employee_ids are the same:
JOIN service s1 ON f1.flight_id = s1.flight_id
JOIN service s2 ON f2.flight_id = s2.flight_id
WHERE
s1.employee_id = s2.employee_id
Putting it together, we select the distinct employee_ids and wrap the query in a CTE to join to (CTEs need MySQL 8.0 or later):
WITH B2BRouteEmployees
AS
(
SELECT DISTINCT s1.employee_id
FROM Flight f1
JOIN Flight f2 ON f1.route_id = f2.route_id
AND CAST(f1.Departure AS DATE) = CAST(DATE_ADD(f2.Departure, INTERVAL -1 DAY) AS DATE)
JOIN service s1 ON f1.flight_id = s1.flight_id
JOIN service s2 ON f2.flight_id = s2.flight_id
WHERE
s1.employee_id = s2.employee_id
)
Now we can do a LEFT JOIN between the employee table and our B2BRouteEmployees CTE and keep the employees for whom the join found no match (b.employee_id IS NULL); they do not appear in the list.
SELECT e.employee_id
FROM employee e
LEFT JOIN B2BRouteEmployees b ON e.employee_id = b.employee_id
WHERE
b.employee_id IS NULL
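A runnable sketch of this CTE plus anti-join, using Python's sqlite3 module as a stand-in for MySQL. SQLite has no DATE_ADD, so the date comparison is rewritten with date(x, '-1 day'), which is assumed to be equivalent for these ISO timestamps. Flight AC 002 is a hypothetical extra row, not from the question, added so that bob (employee 1) flies route 1 on two consecutive days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE Flight (flight_id TEXT, departure TEXT, arrival TEXT,
                         plane_number TEXT, route_id INTEGER);
    CREATE TABLE employee (employee_id INTEGER, name TEXT, surname TEXT);
    CREATE TABLE service (employee_id INTEGER, flight_id TEXT);
    INSERT INTO Flight VALUES
        ('AC 000', '2020-02-11T13:10:00', '2020-02-11T15:15:00', 'I-XX0', 1),
        ('AC 001', '2020-02-12T13:10:00', '2020-02-12T15:15:00', 'I-XX1', 2),
        -- Hypothetical row: route 1 again, one day after AC 000.
        ('AC 002', '2020-02-12T09:00:00', '2020-02-12T11:05:00', 'I-XX0', 1);
    INSERT INTO employee VALUES (1, 'bob', 'black'), (2, 'paul', 'white');
    INSERT INTO service VALUES (1, 'AC 000'), (2, 'AC 001'), (1, 'AC 002');
""")

never_b2b = [row[0] for row in conn.execute("""
    WITH B2BRouteEmployees AS (
        -- Employees who flew the same route on consecutive calendar days.
        SELECT DISTINCT s1.employee_id
        FROM Flight f1
        JOIN Flight f2 ON f1.route_id = f2.route_id
         AND date(f1.departure) = date(f2.departure, '-1 day')
        JOIN service s1 ON f1.flight_id = s1.flight_id
        JOIN service s2 ON f2.flight_id = s2.flight_id
        WHERE s1.employee_id = s2.employee_id
    )
    -- Anti-join: keep employees with no match in the CTE.
    SELECT e.employee_id
    FROM employee e
    LEFT JOIN B2BRouteEmployees b ON e.employee_id = b.employee_id
    WHERE b.employee_id IS NULL
    ORDER BY e.employee_id
""")]

print(never_b2b)
```

Only paul (employee 2) is returned, because bob's flights AC 000 and AC 002 are on route 1 on consecutive dates.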
I can't say this is the best way of doing it (I'm just scratching it up in a text editor), but it should give you an idea of one approach. The query below uses SQL Server syntax (#temp tables, DATEDIFF(hour, ...)). I think something like this would work:
-- Use a temp table, a CTE, or a subquery: the goal is to show the next departure for each employee/route combination
WITH flightEmployee AS (
SELECT s.employee_id,
f1.route_id,
f1.departure,
lead(f1.departure,1) OVER (PARTITION BY s.employee_id, f1.route_id ORDER BY f1.departure) AS nextDeparture
FROM #flight f1
INNER JOIN #service s
ON f1.flight_id = s.flight_id
)
-- You can then check in many different ways whether the next departure is within 24 hours; here is one example.
SELECT *
FROM (
SELECT employee_id,
route_id,
SUM(CASE WHEN DATEDIFF(hour,nextDeparture,departure) >= -24 THEN 1 ELSE 0 END) As No_Of_Same_Employee_and_Route_Within_24_Hours
FROM flightEmployee
GROUP BY employee_id,route_id
) x
WHERE x.No_Of_Same_Employee_and_Route_Within_24_Hours= 0
For each employee/route combination, I count the departures that are followed by another departure within 24 hours, then keep only the employee/route combinations with a count of zero.
As I said, there are many ways to solve this, some possibly shorter or more efficient; this is simply food for thought.
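The LEAD() idea can also be tried in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). This sketch deliberately departs from the answer in two assumed ways: it treats "two days in a row" as consecutive calendar dates rather than "within 24 hours", and it decides per employee rather than per employee/route pair, so an employee is excluded as soon as any route has a back-to-back pair. The tables keep only the columns the query needs, and flight AC 002 is a hypothetical row giving employee 1 such a pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # needs SQLite >= 3.25 for window functions
conn.executescript("""
    CREATE TABLE Flight (flight_id TEXT, departure TEXT, route_id INTEGER);
    CREATE TABLE employee (employee_id INTEGER, name TEXT);
    CREATE TABLE service (employee_id INTEGER, flight_id TEXT);
    INSERT INTO Flight VALUES
        ('AC 000', '2020-02-11T13:10:00', 1),
        ('AC 001', '2020-02-12T13:10:00', 2),
        ('AC 002', '2020-02-12T09:00:00', 1);  -- hypothetical back-to-back flight
    INSERT INTO employee VALUES (1, 'bob'), (2, 'paul');
    INSERT INTO service VALUES (1, 'AC 000'), (2, 'AC 001'), (1, 'AC 002');
""")

never_b2b = [row[0] for row in conn.execute("""
    WITH flightEmployee AS (
        -- Next departure of the same employee on the same route.
        SELECT s.employee_id, f.departure,
               LEAD(f.departure, 1) OVER (
                   PARTITION BY s.employee_id, f.route_id
                   ORDER BY f.departure) AS next_departure
        FROM Flight f
        JOIN service s ON f.flight_id = s.flight_id
    )
    SELECT e.employee_id
    FROM employee e
    WHERE NOT EXISTS (
        -- A gap of exactly one calendar day means a back-to-back pair.
        SELECT 1 FROM flightEmployee fe
        WHERE fe.employee_id = e.employee_id
          AND julianday(date(fe.next_departure))
              - julianday(date(fe.departure)) = 1
    )
    ORDER BY e.employee_id
""")]

print(never_b2b)
```

Because rows are sorted by departure within each employee/route partition, flights on two consecutive dates always end up as neighbours somewhere in that order, so comparing each flight with its LEAD() is enough. The last flight in a partition has a NULL next_departure, which never satisfies the comparison.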
SELECT e.name, e.surname
FROM employee e
LEFT JOIN service s ON s.employee_id = e.employee_id
LEFT JOIN (
SELECT DISTINCT f.departure, f.route_id, f.flight_id
FROM flight f) g ON g.flight_id = s.flight_id
Can you try this?

SQL Predicates per aggregate function

I am struggling with a SQL query.
Query:
I want a list of hospitals with the count of dentists (is_dentist = true) and the count of all doctors (including dentists) with monthly_income > 100000.
I have two tables, Hospital and Doctor, with the following schemas:
Hospital
| id | name |
Doctor
| id | name | monthly_income | is_dentist | hospital_id |
The query I came up with is,
select h.name, count(d.is_dentist), sum(d.monthly_income)
from Hospital h inner join Doctor d
on h.id = d.hospital_id
where d.monthly_income > 100000 and d.is_dentist=true
group by h.name;
If I am a dentist with an income below 100000, the hospital should still count me as a dentist.
But the problem with the above query is that its WHERE clause keeps only the doctors who are dentists and have monthly_income above 100000. I want an independent count for each condition, like a separate predicate on each count() column. How can I achieve this in a single query?
You can do conditional aggregation.
Since is_dentist (presumably) contains 0/1 values, you can just sum() this column to count how many dentists belong to each hospital.
On the other hand, you can use another conditional sum() to count how many doctors have an income above the threshold.
select
h.name,
sum(d.is_dentist) no_dentists,
sum(d.monthly_income > 100000) no_doctors_above_100000_income
from Hospital h
inner join Doctor d on h.id = d.hospital_id
group by h.name;
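A minimal sketch of this conditional aggregation using Python's sqlite3 module as a stand-in for MySQL. The hospital names and doctor rows are hypothetical, and is_dentist is stored as 0/1, as the answer presumes. Like MySQL, SQLite evaluates a comparison such as monthly_income > 100000 to 0 or 1, so sum() over it counts the matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE Hospital (id INTEGER, name TEXT);
    CREATE TABLE Doctor (id INTEGER, name TEXT, monthly_income INTEGER,
                         is_dentist INTEGER, hospital_id INTEGER);
    -- Hypothetical rows.
    INSERT INTO Hospital VALUES (1, 'North'), (2, 'South');
    INSERT INTO Doctor VALUES
        (1, 'A', 150000, 1, 1),  -- dentist above the threshold
        (2, 'B',  80000, 1, 1),  -- dentist below the threshold
        (3, 'C', 120000, 0, 1),  -- non-dentist above the threshold
        (4, 'D',  90000, 0, 2);  -- non-dentist below the threshold
""")

rows = conn.execute("""
    SELECT h.name,
           SUM(d.is_dentist) AS no_dentists,
           SUM(d.monthly_income > 100000) AS no_doctors_above_100000_income
    FROM Hospital h
    INNER JOIN Doctor d ON h.id = d.hospital_id
    GROUP BY h.name
    ORDER BY h.name
""").fetchall()

print(rows)
```

Doctor B is counted as a dentist even though their income is below the threshold, which is what the question asks for; each sum() applies its own predicate, independently of the other.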
You have two independent conditions (monthly_income > 100000 and is_dentist = true), which means there are two different data sets. A single WHERE clause cannot filter for both of them in the same grouped query.
So you can split it into two subqueries. Check whether the following query gives the result you want:
select temp3.name, temp1.dentist_count, temp2.income_count from
(select d1.hospital_id, count(*) as dentist_count from Doctor d1 where d1.is_dentist=true group by d1.hospital_id) as temp1
join
(select d2.hospital_id, count(*) as income_count from Doctor d2 where d2.monthly_income>100000 group by d2.hospital_id) as temp2
on temp1.hospital_id=temp2.hospital_id
join
(select h.id, h.name from Hospital h) as temp3
on temp2.hospital_id=temp3.id;

MySQL group by when there is no aggregation

I have a table called booking_details.
id | tour_id | tour_fee| booking_id
1 | 1 | 200 | 1
2 | 2 | 350 | 1
3 | 1 | 200 | 2
4 | 2 | 350 | 3
tour_id refers to the Tours table and booking_id refers to the Bookings table.
tour_id 1 refers to the New york tour and tour_id 2 refers to the Paris tour.
I need to generate a report like this:
tour name | total_income | number_of_bookings
New york tour| 400 | 2
Paris tour | 700 | 2
That is: the tour name, the total income from that tour, and the number of bookings for that tour.
This is what I have done so far, but it gives me an error. It seems I can't group the results.
SELECT booking_details.*,Tours.name as name, count(Tours.id) FROM booking_details
inner join Tours on
booking_details.tour_id = Tours.id group by Tours.name;
How do I achieve this using MySQL?
You have used the aggregate function count() in your query, and your requirement does call for aggregation. When you aggregate, every non-aggregated column you select must also appear in the GROUP BY:
SELECT Tours.name as name,sum(tour_fee) income, count(Tours.id)
FROM booking_details
inner join Tours on
booking_details.tour_id = Tours.id group by Tours.name
You selected booking_details.*, which means every column of the booking_details table, but you did not put those columns in the GROUP BY, so MySQL threw an error (ONLY_FULL_GROUP_BY is enabled by default since MySQL 5.7.5).
You are trying to select non aggregated columns which are not part of your GROUP BY clause.
Change your query as follows:
SELECT t.NAME AS NAME,
Sum(bd.tour_fee) total_income,
Count(t.id) number_of_bookings
FROM booking_details bd
INNER JOIN tours t
ON bd.tour_id = t.id
GROUP BY t.NAME;
A small suggestion: as good practice, use table aliases when joining.
You need to add all non-aggregated columns to the GROUP BY:
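The aggregated report can be checked with a small sketch using Python's sqlite3 module as a stand-in for MySQL. The booking rows are the ones from the question; the Tours rows are assumed from the statement that tour_id 1 is the New york tour and tour_id 2 the Paris tour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE Tours (id INTEGER, name TEXT);
    CREATE TABLE booking_details (id INTEGER, tour_id INTEGER,
                                  tour_fee INTEGER, booking_id INTEGER);
    INSERT INTO Tours VALUES (1, 'New york tour'), (2, 'Paris tour');
    INSERT INTO booking_details VALUES
        (1, 1, 200, 1), (2, 2, 350, 1), (3, 1, 200, 2), (4, 2, 350, 3);
""")

# Only the grouped column (t.name) and aggregates appear in the select list.
report = conn.execute("""
    SELECT t.name AS name,
           SUM(bd.tour_fee) AS total_income,
           COUNT(t.id) AS number_of_bookings
    FROM booking_details bd
    INNER JOIN Tours t ON bd.tour_id = t.id
    GROUP BY t.name
    ORDER BY t.name
""").fetchall()

for row in report:
    print(row)
```

This reproduces the requested report: 400 and 2 bookings for the New york tour, 700 and 2 bookings for the Paris tour.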
SELECT
booking_details.tour_id,
Tours.name AS name,
SUM(tour_fee) AS total_income,
COUNT(Tours.id) AS number_of_bookings
FROM
booking_details
INNER JOIN
Tours ON booking_details.tour_id = Tours.id
GROUP BY
booking_details.tour_id, Tours.name

Sql query with an OR operator between a HAVING and WHERE clause

Would it be possible to combine a having and a where clause in an OR operator within a single SQL query?
Maybe not the best example, but you'll get the idea:
Select the departments from the employee table that are HR (using the WHERE clause) OR that pay all employees more than 25000 (using the HAVING clause).
So how do we get the OR condition into the query below? Or would it be better to split it into two queries?
SELECT dept, SUM (salary)
FROM employee
WHERE dept = "HR"
GROUP BY dept
HAVING SUM (salary) > 25000
The query below will work; the HAVING clause does not have to contain an aggregate, so it can test dept too:
SELECT dept, SUM(salary)
FROM employee
GROUP BY dept
HAVING dept = 'HR' OR SUM(salary) > 25000
But your statement "that pays all employees more than 25000" is not clear. Do you want
departments where each employee earns over 25000, or
departments where the employees earn over 25000 in total?
The query above gives you the second option, as that is closest to your original query.
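Here is a minimal sketch running the HAVING ... OR query with Python's sqlite3 module as a stand-in for MySQL. The salaries are borrowed from the sample data in a later answer (an assumption, since the question gives no data), and 'HR' is written with single quotes, the standard string-literal syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE employee (dept TEXT, salary INTEGER);
    INSERT INTO employee VALUES ('HR', 10000), ('aa', 10000), ('aa', 45000),
                                ('bb', 25000), ('cc', 26000), ('cc', 26000);
""")

# HAVING may test the grouped column as well as an aggregate, so the two
# conditions can be combined with OR after grouping.
rows = conn.execute("""
    SELECT dept, SUM(salary)
    FROM employee
    GROUP BY dept
    HAVING dept = 'HR' OR SUM(salary) > 25000
    ORDER BY dept
""").fetchall()

print(rows)
```

bb is the only department dropped: it is not HR, and its total of 25000 is not strictly greater than 25000.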
Wrap the GROUP BY part up in a derived table. Then apply the conditions to its result:
select dept, salarysum
from
(
SELECT dept, SUM(salary) as salarysum
FROM employee
GROUP BY dept
) dt
where salarysum > 25000 or dept = 'HR'
Or perhaps "that pays all employees more than 25000" means that every employee in the department earns more than 25000?
select dept, minsalary
from
(
SELECT dept, MIN(salary) as minsalary
FROM employee
GROUP BY dept
) dt
where minsalary > 25000 or dept = 'HR'
Maybe you want the departments where all salaries are over 25000:
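Both derived-table variants can be compared in one sketch using Python's sqlite3 module as a stand-in for MySQL, with the sample salaries from the last answer (borrowed, since the question has none). The SUM version and the MIN version answer the two readings of "pays all employees more than 25000".

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE employee (dept TEXT, salary INTEGER);
    INSERT INTO employee VALUES ('HR', 10000), ('aa', 10000), ('aa', 45000),
                                ('bb', 25000), ('cc', 26000), ('cc', 26000);
""")

# Reading 1: the department's total payroll exceeds 25000 (or it is HR).
by_sum = conn.execute("""
    SELECT dept, salarysum
    FROM (SELECT dept, SUM(salary) AS salarysum FROM employee GROUP BY dept) dt
    WHERE salarysum > 25000 OR dept = 'HR'
    ORDER BY dept
""").fetchall()

# Reading 2: every employee in the department earns more than 25000 (or HR).
by_min = conn.execute("""
    SELECT dept, minsalary
    FROM (SELECT dept, MIN(salary) AS minsalary FROM employee GROUP BY dept) dt
    WHERE minsalary > 25000 OR dept = 'HR'
    ORDER BY dept
""").fetchall()

print(by_sum)
print(by_min)
```

Besides HR, only cc passes the MIN version, while the SUM version also keeps aa because its total of 55000 exceeds 25000 even though one of its employees earns 10000.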
drop table if exists employees;
create table employees(id int auto_increment primary key, dept varchar(2), salary int);
insert into employees (dept,salary)
values
('HR',10000),('aa',10000),('aa',45000),('bb',25000),('cc',26000),('cc',26000);
select dept,sum(salary) sumsalary,count(*) obs, sum(case when salary > 25000 then 1 else 0 end) over25000
from employees
group by dept having obs = over25000 or dept = 'hr'
+------+-----------+-----+-----------+
| dept | sumsalary | obs | over25000 |
+------+-----------+-----+-----------+
| cc | 52000 | 2 | 2 |
| HR | 10000 | 1 | 0 |
+------+-----------+-----+-----------+
2 rows in set (0.01 sec)

SQL query with multiple tables, possible to apply group by only to count(*)?

I am trying to list bookjobs info for jobs with jobtype 'N' whose publisher has a creditcode of 'C'. Then I want to add a count of the total number of POs (purchase orders, from table pos) for each row of that output. Can GROUP BY apply only to that count and not to the rest of the query? Do I have to use a join? My attempts so far have been unsuccessful.
These are the tables I am working with:
bookjobs:
+--------+---------+----------+
| job_id | cust_id | jobtype |
+--------+---------+----------+
publishers:
+---------+------------+------------+
| cust_id | name | creditcode |
+---------+------------+------------+
pos:
+--------+-------+------------+-----------+
| job_id | po_id | po_date | vendor_id |
+--------+-------+------------+-----------+
This is what I came up with, although it is wrong (the count is not grouped by job_id):
select b.*, (select count(*) from pos o) as count
from bookjobs b, publishers p, pos o
where b.cust_id=p.cust_id
and b.job_id=o.job_id
and b.jobtype='N'
and p.creditcode='C';
I believe I need the count grouped by job_id, but not the rest of the query. Is this possible, or do I need to use a join? I tried a few joins but couldn't get anything to work. Any help is appreciated.
Try this SQL:
select b.*, (select count(*) from pos o where o.job_id = b.job_id) as count
from bookjobs b, publishers p
where b.cust_id=p.cust_id
and b.jobtype='N'
and p.creditcode='C';
Based on what you describe, I would assume that your original query returns duplicate rows, because joining pos repeats each job once per purchase order. You can fix this by pre-aggregating the pos table and then joining it in:
select b.*, o.cnt
from bookjobs b join
publishers p
on b.cust_id = p.cust_id join
(select job_id, count(*) as cnt
from pos o
group by job_id
) o
on b.job_id = o.job_id
where b.jobtype = 'N' and p.creditcode = 'C';
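A small sketch contrasting the question's query with the pre-aggregated join, using Python's sqlite3 module as a stand-in for MySQL and hypothetical rows (job 1 has two POs, job 2 has one). The question's count column is renamed po_count here. In the question's query the uncorrelated subquery counts every row of pos, and the extra join to pos repeats job 1 once per PO; pre-aggregating gives one row per job with its own count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE bookjobs (job_id INTEGER, cust_id INTEGER, jobtype TEXT);
    CREATE TABLE publishers (cust_id INTEGER, name TEXT, creditcode TEXT);
    CREATE TABLE pos (job_id INTEGER, po_id INTEGER, po_date TEXT, vendor_id INTEGER);
    -- Hypothetical rows: jobs 1 and 2 match the filters, job 1 has two POs.
    INSERT INTO bookjobs VALUES (1, 10, 'N'), (2, 10, 'N'), (3, 20, 'N'), (4, 10, 'R');
    INSERT INTO publishers VALUES (10, 'Acme', 'C'), (20, 'Beta', 'D');
    INSERT INTO pos VALUES (1, 100, '2020-01-01', 5), (1, 101, '2020-01-02', 6),
                           (2, 102, '2020-01-03', 5), (3, 103, '2020-01-04', 5),
                           (4, 104, '2020-01-05', 5);
""")

# The question's query: an uncorrelated count over all of pos, plus a join
# to pos that repeats each job once per purchase order.
naive = conn.execute("""
    SELECT b.*, (SELECT COUNT(*) FROM pos o) AS po_count
    FROM bookjobs b, publishers p, pos o
    WHERE b.cust_id = p.cust_id AND b.job_id = o.job_id
      AND b.jobtype = 'N' AND p.creditcode = 'C'
    ORDER BY b.job_id
""").fetchall()

# Pre-aggregate pos to one row per job_id, then join it in.
pre_aggregated = conn.execute("""
    SELECT b.*, o.cnt
    FROM bookjobs b
    JOIN publishers p ON b.cust_id = p.cust_id
    JOIN (SELECT job_id, COUNT(*) AS cnt FROM pos GROUP BY job_id) o
      ON b.job_id = o.job_id
    WHERE b.jobtype = 'N' AND p.creditcode = 'C'
    ORDER BY b.job_id
""").fetchall()

print(naive)
print(pre_aggregated)
```

The first query returns job 1 twice with a count of 5 (all rows of pos); the pre-aggregated version returns each matching job once, with 2 POs for job 1 and 1 for job 2.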