Complex SQL query over four tables does not fetch wanted result - mysql

Imagine the following scenario: employees of a company can vote on an arbitrary question (each vote is an integer value).
I have a complex request where I want to fetch five pieces of information:
Name of the company
Average vote value per company
Number of employees
Number of votes
Participation (no of votes/no of employees)
The SQL query shall only fetch votes for companies that the current user is employed at.
To do this I am accessing four different tables; below is an excerpt of the table declarations:
User
- id
Company
- id
- name
Employment
- user_id (FK User.id)
- company_id (FK Company.id)
Vote
- company_name
- vote_value
- timestamp
User and Company are related by an Employment (an n:m relation, which requires a separate table). The table Vote shall not be connected by a PK/FK relation, but votes can be related to a company by the company name (Company.name = Vote.company_name).
I managed to fetch all information except for the number of employees correctly by the following SQL query:
SELECT
c.name AS company,
AVG(v.vote_value) AS value,
COUNT(e.user_id) AS employees,
COUNT(v.vote_value) AS votes,
(COUNT(e.user_id) / COUNT(v.vote_value)) AS participation
FROM Company c
JOIN Employment e ON e.company_id = c.id
JOIN User u ON u.id = e.user_id
JOIN Vote v
ON v.company_name = c.name
AND YEAR(v.timestamp) = :year
AND MONTH(v.timestamp) = :month
AND DAY(v.timestamp) = :day
WHERE u.id = :u_id
GROUP BY v.company_name, e.company_id
But instead of fetching the correct number of employees, the employees field is always equal to the number of votes. (And therefore the participation value is also wrong.)
Is there any way to perform this in one query without subqueries [1]? What do I have to change so that the query fetches the correct number of employees?
[1] I am using Doctrine2 and try to avoid subqueries as Doctrine does not support them. I just did not want to pull this into a Doctrine discussion. That's why I broke this topic down to SQL level.

If you want to fetch the number of employees then the issue is that you are filtering by only 1 employee:
WHERE u.id = :u_id
Secondly, bear in mind that once the votes are joined in, every employee row is repeated once per vote, so a plain count of employees returns the number of joined rows rather than the number of employees. So you will have to use a distinct count, as @Przem... mentioned:
COUNT(DISTINCT e.user_id) AS employees,
That way you will uniquely count the employees for the company (getting rid of the repeated employee ids for all the votes the employee has).
As you mentioned in a comment:
It returns the 1 as employee count
This is because the WHERE condition restricts the result to a single employee who has many joined vote rows. The distinct count only sees that one employee, and that is why you get 1. However, that is the correct result (based on your filter condition).
Adding subqueries in the select clause will also get you to the right result but at the expense of performance.
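To make the row multiplication concrete, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for MySQL, the table and column names from the question, and invented sample rows (three employees, two votes); treat it as an illustration under those assumptions rather than as the asker's real data.

```python
import sqlite3

# In-memory SQLite stands in for MySQL. Table and column names follow the
# question; the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Employment (user_id INTEGER, company_id INTEGER);
CREATE TABLE Vote (company_name TEXT, vote_value INTEGER);
INSERT INTO Company VALUES (1, 'Acme');
INSERT INTO Employment VALUES (10, 1), (11, 1), (12, 1);  -- 3 employees
INSERT INTO Vote VALUES ('Acme', 4), ('Acme', 2);         -- 2 votes
""")

# Joining both Employment and Vote to Company produces 3 x 2 = 6 rows,
# so a plain COUNT sees every employee once per vote and every vote once
# per employee.
fanout_row = conn.execute("""
SELECT COUNT(e.user_id),          -- counts joined rows
       COUNT(DISTINCT e.user_id), -- counts employees
       COUNT(v.vote_value)        -- also counts joined rows
FROM Company c
JOIN Employment e ON e.company_id = c.id
JOIN Vote v ON v.company_name = c.name
GROUP BY c.id
""").fetchone()

plain_count, distinct_employees, joined_votes = fanout_row
```

With this data the plain employee count and the vote count both come out as the number of joined rows, while only the distinct count matches the three employees. Note that once the filter on a single user is removed, the vote count is multiplied by the number of employees in the same way, so it needs the same care.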

Try this: it calculates the votes in one subquery and the employees in another, then joins both to the company (using the column names from the question).
SELECT c.name,
       ce.employee_count,
       cv.vote_count,
       cv.vote_count / ce.employee_count AS participation,
       cv.vote_value
FROM
  (SELECT company_id, COUNT(*) AS employee_count
   FROM Employment GROUP BY company_id) ce
INNER JOIN Company c
  ON c.id = ce.company_id
INNER JOIN
  (SELECT company_name, AVG(vote_value) AS vote_value, COUNT(*) AS vote_count
   FROM Vote GROUP BY company_name) cv
  ON c.name = cv.company_name
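A runnable sketch of this pre-aggregation idea, again with SQLite standing in for MySQL and invented sample rows. The extra join on Employment (alias me) is my own addition to restrict the result to the current user's companies; it assumes each (user_id, company_id) pair appears at most once in Employment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Employment (user_id INTEGER, company_id INTEGER);
CREATE TABLE Vote (company_name TEXT, vote_value INTEGER);
INSERT INTO Company VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO Employment VALUES (10, 1), (11, 1), (12, 1), (20, 2);
INSERT INTO Vote VALUES ('Acme', 4), ('Acme', 2), ('Globex', 5);
""")

# Each derived table is aggregated on its own, so neither count is
# inflated by the other join. "me" only checks that the current user
# is employed at the company.
summary_rows = conn.execute("""
SELECT c.name,
       ce.employee_count,
       cv.vote_count,
       1.0 * cv.vote_count / ce.employee_count AS participation,
       cv.avg_vote
FROM Company c
JOIN Employment me ON me.company_id = c.id AND me.user_id = :u_id
JOIN (SELECT company_id, COUNT(*) AS employee_count
      FROM Employment GROUP BY company_id) ce ON ce.company_id = c.id
JOIN (SELECT company_name, COUNT(*) AS vote_count, AVG(vote_value) AS avg_vote
      FROM Vote GROUP BY company_name) cv ON cv.company_name = c.name
""", {"u_id": 10}).fetchall()
```

The multiplication by 1.0 is only there because SQLite would otherwise do integer division; MySQL's / operator already returns a decimal result.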

Well I think with a query defined like that you should add the DISTINCT keyword while counting the number of employees:
SELECT
c.name AS company,
AVG(v.vote_value) AS value,
COUNT(DISTINCT e.user_id) AS employees,
COUNT(v.vote_value) AS votes,
(COUNT(DISTINCT e.user_id) / COUNT(v.vote_value)) AS participation
FROM Company c
JOIN Employment e ON e.company_id = c.id
JOIN User u ON u.id = e.user_id
JOIN Vote v
ON v.company_name = c.name
AND YEAR(v.timestamp) = :year
AND MONTH(v.timestamp) = :month
AND DAY(v.timestamp) = :day
GROUP BY v.company_name, e.company_id;
Not sure if it is possible in MySQL, though.
Edit: as @Mosty Mostacho pointed out, the condition on u.id was the problem. Without it, and with the DISTINCT keyword added, the query returns correct results, so I edited the query above accordingly.

Related

Trying to get a row count in a subquery

I have two tables, one is departments and the other is employees. The department id is a foreign key in the employees table. The employee table has a name and a flag saying if the person is part-time. I can have zero or more employees in a department. I'm trying to figure out how to get a list of all departments that have at least one employee and in which all the employees are part-time. I think this has to be some kind of subquery. Here's what I have so far:
SELECT dept.name
,dept.id
,employee.deptid
,count(employee.is_parttime)
FROM employee
,dept
WHERE dept.id = employee.deptid
AND employee.is_parttime = 1
GROUP BY employee.is_parttime
I would really appreciate any help at this point.
You must join (properly) the tables and group by department with a condition in the HAVING clause:
select d.name, d.id, count(e.id) total
from dept d inner join employee e
on d.id = e.deptid
group by d.name, d.id
having total = sum(e.is_parttime)
The inner join returns only departments with at least 1 employee.
The column is_parttime (I guess) is a flag with values 0 or 1 so by summing it the result is the number of employees that are part time in the department and this number is compared to the total number of employees of the department.
As a preliminary aside, I recommend expressing joins with the JOIN keyword, and segregating join conditions from filter conditions. Doing so would make the original query look like so:
select dept.name, dept.id, employee.deptid, count(employee.is_parttime)
from employee
join dept on dept.id = employee.deptid
where employee.is_parttime = 1
group by employee.is_parttime
It doesn't make much practical difference for inner joins, but it does make the structure of the data and the logic of the query a bit clearer. On the other hand, it does make a difference for outer joins, and there is value in consistency.
As for the actual question, yes, one can rewrite the original query using a subquery or an inline view to produce the requested result. (An "inline view" is technically what one should call an embedded query used as a table in the FROM clause, but some people lump these in with subqueries.)
Example using a subquery
select dept.name, dept.id
from dept
where dept.id in (
select deptid
from employee
group by deptid
having count(*) = sum(is_parttime)
)
Example using an inline view
select dept.name, dept.id
from dept
join (
select deptid
from employee
group by deptid
having count(*) = sum(is_parttime)
) pt_dept
on dept.id = pt_dept.deptid
In each case, the subquery / inline view does most of the work. It aggregates employees by department, then filters the groups (HAVING clause) to select only those in which the part-time employee count is the same as the total count. Naturally, departments without any employees will not be represented. If a list of department IDs would suffice for a list of departments, then that's actually all you need. To get the department names too, however, you need to combine that with data from the dept table, as demonstrated in the two example queries.
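The HAVING comparison described above can be tried out with a small sketch. SQLite is used here in place of the asker's database, and the sample departments and employees are made up; is_parttime is assumed to hold only 0 or 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (id INTEGER PRIMARY KEY, deptid INTEGER, is_parttime INTEGER);
INSERT INTO dept VALUES (1, 'Support'), (2, 'Sales'), (3, 'Empty');
INSERT INTO employee (deptid, is_parttime) VALUES
  (1, 1), (1, 1),   -- Support: everyone is part-time
  (2, 1), (2, 0);   -- Sales: mixed
""")

# The inner join drops departments without employees; the HAVING clause
# keeps only groups where the part-time total equals the head count.
all_parttime = conn.execute("""
SELECT d.name, d.id, COUNT(e.id) AS total
FROM dept d
JOIN employee e ON e.deptid = d.id
GROUP BY d.name, d.id
HAVING COUNT(e.id) = SUM(e.is_parttime)
""").fetchall()
```

Only the Support department survives the filter: Sales has a full-time employee and the empty department never makes it through the inner join.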

How can I get customer data based on the number of users they have?

I want to get customer data from all the businesses with more than 1 user.
For this I think I need a subquery to count more than 1 user and then the outer query to give me their emails.
I have tried subqueries in the WHERE and HAVING clause
SELECT u.mail
FROM users u
WHERE count IN (
SELECT count (u.id_business)
FROM businesses b
INNER JOIN users u ON b.id = u.id_business
GROUP BY b.id, u.id_business
HAVING COUNT (u.id_business) >= 2
)
I believe that you do not need a subquery; everything can be achieved in a joined aggregate query with a HAVING clause. The join has to bring in the other users of the same business, though: joining users to businesses alone yields exactly one row per user, so the count would never reach 2.
SELECT u.mail
FROM users u
INNER JOIN users u2 ON u2.id_business = u.id_business
GROUP BY u.id, u.mail
HAVING COUNT(*) >= 2
NB: in case several users may have the same email, I have added the primary key of users to the GROUP BY clause (I assumed that the pk is called id); you may remove it if mail is a unique field in users.
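For comparison, here is a sketch of the subquery variant the asker was originally aiming for: aggregate per business first, then pick the users of the qualifying businesses. SQLite stands in for the real database, the sample rows are invented, and the column names (mail, id_business) are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE businesses (id INTEGER PRIMARY KEY);
CREATE TABLE users (id INTEGER PRIMARY KEY, mail TEXT, id_business INTEGER);
INSERT INTO businesses VALUES (1), (2);
INSERT INTO users (mail, id_business) VALUES
  ('a@example.test', 1), ('b@example.test', 1),  -- business 1: two users
  ('c@example.test', 2);                         -- business 2: one user
""")

# The subquery returns the businesses with at least two users;
# the outer query lists the mail addresses of their users.
mails = [row[0] for row in conn.execute("""
SELECT u.mail
FROM users u
WHERE u.id_business IN (
    SELECT id_business
    FROM users
    GROUP BY id_business
    HAVING COUNT(*) >= 2
)
ORDER BY u.mail
""")]
```

The key point in either form is that the count has to be taken per business, not per user.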

Use SELECT through three table

I tried to write a query, but unfortunately I didn't succeed.
I want to know how many packages were delivered by a person over a given period.
So I want to know how many packages were delivered by John (user_id = 1) between 01-02-18 and 28-02-18. John drives another car (another plate_id) every day.
(orders_drivers.user_id, plates.plate_name, orders.delivery_date, orders.package_amount)
I have 3 tables:
orders with plate_id delivery_date package_amount
plates with plate_id plate_name
orders_drivers with plate_id plate_date user_id
I tried some solutions but didn't get the expected result. Thanks!
Try using JOINS as shown below:
SELECT SUM(o.package_amount)
FROM orders o INNER JOIN orders_drivers od
ON o.plate_id=od.plate_id
WHERE od.user_id=<the_user_id>;
See MySQL Join Made Easy for insight.
You can also use a subquery:
SELECT SUM(o.package_amount)
FROM orders o
WHERE EXISTS (SELECT 1
FROM orders_drivers od
WHERE user_id=<user_id> AND o.plate_id=od.plate_id);
SELECT SUM(orders.package_amount) AS amount
FROM orders
LEFT JOIN plates ON orders.plate_id = plates.plate_id
LEFT JOIN orders_drivers ON orders.plate_id = orders_drivers.plate_id
WHERE orders.delivery_date > date1 AND orders.delivery_date < date2 AND orders_drivers.user_id = userid
GROUP BY orders_drivers.user_id
But seriously, you need to ask questions that make more sense.
SUM is a function that adds up all the values within each group formed by GROUP BY.
LEFT JOIN connects the tables by id = id. Any other join type would work in this case, as all ids are unique (at least I hope).
The WHERE clause is where you give the dates and the user.
GROUP BY user_id means that if there are several records for the same id, they are returned as one row (with their package amounts summed).
With the AS, your result is returned under the name 'amount'.
If you want the total of package_amount by user in a period, you can use this query:
UPDATE: add a where clause on user_id, to retrieve John related data
SELECT od.user_id
, p.plate_name
, SUM(o.package_amount) AS TotalPackageAmount
FROM orders_drivers od
JOIN plates p
ON p.plate_id = od.plate_id
JOIN orders o
ON o.plate_id = od.plate_id
WHERE o.delivery_date BETWEEN convert(datetime,'01/02/2018',103) AND convert(datetime,'28/02/2018',103)
AND od.user_id = 1
GROUP BY od.user_id
, p.plate_name
It groups rows on user_id and plate_name, filters on a period of delivery_date values, and then calculates the sum of package_amount for each group.
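One thing none of the queries above handles is that John drives a different car every day. If plate_date in orders_drivers is the day on which the user drove that plate (an assumption on my part, the question does not say so explicitly), then joining on plate_id alone also picks up deliveries made by other drivers of the same car. A sketch with SQLite, ISO date strings, and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (plate_id INTEGER, delivery_date TEXT, package_amount INTEGER);
CREATE TABLE orders_drivers (plate_id INTEGER, plate_date TEXT, user_id INTEGER);
INSERT INTO orders VALUES
  (1, '2018-02-01', 5),    -- driven by user 1
  (1, '2018-02-02', 7),    -- same car, but driven by user 2 that day
  (2, '2018-02-02', 3),    -- driven by user 1
  (2, '2018-03-01', 100);  -- user 1, outside the period
INSERT INTO orders_drivers VALUES
  (1, '2018-02-01', 1), (1, '2018-02-02', 2),
  (2, '2018-02-02', 1), (2, '2018-03-01', 1);
""")
params = (1, "2018-02-01", "2018-02-28")

# Joining on plate_id only: every order of a car is attributed to every
# driver who ever drove it.
naive_total = conn.execute("""
SELECT SUM(o.package_amount)
FROM orders_drivers od
JOIN orders o ON o.plate_id = od.plate_id
WHERE od.user_id = ? AND o.delivery_date BETWEEN ? AND ?
""", params).fetchone()[0]

# Joining on plate_id AND the day: an order counts for a user only if
# that user drove the car on the delivery date.
matched_total = conn.execute("""
SELECT SUM(o.package_amount)
FROM orders_drivers od
JOIN orders o ON o.plate_id = od.plate_id
             AND o.delivery_date = od.plate_date
WHERE od.user_id = ? AND o.delivery_date BETWEEN ? AND ?
""", params).fetchone()[0]
```

With this sample data the two totals differ, which shows why the extra join condition matters whenever cars are shared between drivers.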

SQL - Append a count with a where has to query results

I have a Laravel booking app but am currently doing some manual reporting for a client.
I have a SQL query I run in SequelPro:
SELECT t.name
, t.email
, t.trial_ends_at
FROM teams t
ORDER BY t.trial_ends_at DESC
However, what I now wish to do is add another field to each row that shows the client count for that team.
The relationships for clients of a team is:
users can have many bookings,
bookings belong to a user,
bookings have a team_id field
What I wish to do is append the count of users where they have made at least 1 booking of that team id.
In Laravel's eloquent I would do:
return User::whereHas('bookings', function($q) {
$q->where('team_id', THE ID)
})->count();
Join to a subquery which finds the counts:
SELECT
t.name,
t.email,
t.trial_ends_at,
COALESCE(b.cnt, 0) AS client_cnt
FROM teams t
LEFT JOIN
(
SELECT team_id, COUNT(*) AS cnt
FROM bookings
GROUP BY team_id
) b
ON t.id = b.team_id -- this assumes that id joins to team_id
ORDER BY
t.trial_ends_at DESC;
You had the following requirement:
What I wish to do is append the count of users where they have made at least 1 booking of that team id.
It seems to me that a user would only appear in a record in the bookings table if there was a reservation associated with that record. In other words, I don't think we need to do any extra check for this requirement, since if a user does appear, then by default he already appeared once.
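One caveat: COUNT(*) on bookings counts bookings, so a user with several bookings for the same team is counted several times. If the goal is the number of distinct users, as in the Eloquent whereHas example, COUNT(DISTINCT ...) on the user column may be closer. The sketch below uses SQLite and invented rows, and assumes the bookings table has a user_id column (the usual Laravel convention, but not confirmed in the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT, email TEXT, trial_ends_at TEXT);
CREATE TABLE bookings (id INTEGER PRIMARY KEY, user_id INTEGER, team_id INTEGER);
INSERT INTO teams VALUES
  (1, 'Alpha', 'alpha@example.test', '2024-02-01'),
  (2, 'Beta',  'beta@example.test',  '2024-01-01');
-- User 1 booked team 1 twice, user 2 once; team 2 has no bookings.
INSERT INTO bookings (user_id, team_id) VALUES (1, 1), (1, 1), (2, 1);
""")

# bookings_cnt counts rows, client_cnt counts distinct users per team.
team_rows = conn.execute("""
SELECT t.name,
       COALESCE(b.bookings_cnt, 0) AS bookings_cnt,
       COALESCE(b.client_cnt, 0)   AS client_cnt
FROM teams t
LEFT JOIN (
    SELECT team_id,
           COUNT(*) AS bookings_cnt,
           COUNT(DISTINCT user_id) AS client_cnt  -- user_id is assumed
    FROM bookings
    GROUP BY team_id
) b ON b.team_id = t.id
ORDER BY t.trial_ends_at DESC
""").fetchall()
```

For team Alpha the two numbers differ because one user booked twice; the LEFT JOIN with COALESCE keeps team Beta in the list with zero clients.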
If your first query produces $teams, then you could do something like this:
foreach($teams as $team){
$userCount = User::whereHas('bookings', function($q) use($team){
$q->where('team_id', $team->id);
})->count();
if($userCount>0){
$team->userCount = $userCount;
}
}
This appends userCount to every team in the $teams collection that has at least one user with a booking; teams without any are left without the property. Note that it runs one count query per team.

SQL Group by Subquery ignored

I have a database where a company has a number of slots. These slots can be filled with persons.
I want to write a query that shows which companies still have open slots.
This is the query I'm trying, but it's giving me the wrong results.
select
name,
slots,
(select count(*) from persons) as persons
from companies
where city_id = 3
group by companies.id
This should give me a table with the slots and the number of persons filled for that company in the persons table, but it's returning the total number of persons every time.
Thank you!
Like @JoeTaras said, you need to join persons and companies to be able to tell/count which persons belong to which company. If you don't join them somehow, companies and persons will be treated and counted independently, which is normally not very useful.
A different sub-query could indeed be used, but it's not quite how 'you do it', and will probably be less performant than the straight-forward join.
Example:
select
companies.id,
companies.name,
companies.slots,
count(persons.id)
from companies
left outer join persons on companies.id = persons. ...
where companies.city_id = 3
group by companies.id, companies.name, companies.slots
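Since the join column is elided in the example above, here is a complete sketch under an explicit assumption: persons has a hypothetical company_id column pointing at companies.id. It runs on SQLite with invented rows and also adds a HAVING clause to keep only the companies that still have open slots.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, slots INTEGER, city_id INTEGER);
-- company_id is a hypothetical column name; the question does not show it.
CREATE TABLE persons (id INTEGER PRIMARY KEY, company_id INTEGER);
INSERT INTO companies VALUES
  (1, 'A', 2, 3),   -- full: 2 of 2 slots taken
  (2, 'B', 3, 3),   -- open: 1 of 3
  (3, 'C', 1, 3),   -- open: 0 of 1
  (4, 'D', 5, 9);   -- other city
INSERT INTO persons (company_id) VALUES (1), (1), (2), (4);
""")

# LEFT JOIN keeps companies without persons; COUNT(p.id) ignores the
# NULLs from unmatched rows, so such companies get a count of 0.
open_companies = conn.execute("""
SELECT c.name, c.slots, COUNT(p.id) AS filled
FROM companies c
LEFT JOIN persons p ON p.company_id = c.id
WHERE c.city_id = 3
GROUP BY c.id, c.name, c.slots
HAVING COUNT(p.id) < c.slots
ORDER BY c.id
""").fetchall()
```

Counting p.id rather than * is what makes the empty company show up with 0 instead of 1.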