SQL Group by Subquery ignored - mysql

I have a database where each company has a number of slots. These slots can be filled with persons.
I want to write a query that shows which companies still have open slots.
This is the query I'm trying, but it's giving me the wrong results:
select
name,
slots,
(select count(*) from persons) as persons
from companies
where city_id = 3
group by companies.id
This should give me a table with the slots and the number of persons assigned to each company in the persons table, but it's returning the total number of persons every time.
Thank you!

Like @JoeTaras said, you need to join persons and companies to be able to tell (and count) which persons belong to which company. If you don't join them somehow, companies and persons are treated and counted independently, which is normally not very useful.
A different subquery (one correlated with the outer companies row) could be used instead, but it's not the usual way to do it, and it will probably perform worse than the straightforward join.
Example:
select
companies.id,
companies.name,
companies.slots,
count(persons.id)
from companies
left outer join persons on companies.id = persons. ...
where companies.city_id = 3
group by companies.id, companies.name, companies.slots
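A runnable sketch of both approaches, using Python's built-in sqlite3 module as a stand-in for MySQL. The sample data and the persons.company_id column are assumptions (the answer elides the actual join column), and the HAVING clause that keeps only companies with free slots is an addition matching the question's goal:

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables.
# persons.company_id is a HYPOTHETICAL column name: the answer elides the
# real foreign key ("persons. ...").
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, slots INTEGER, city_id INTEGER);
    CREATE TABLE persons (id INTEGER PRIMARY KEY, company_id INTEGER);
    INSERT INTO companies VALUES (1, 'Acme', 3, 3), (2, 'Globex', 1, 3), (3, 'Initech', 2, 3);
    INSERT INTO persons VALUES (1, 1), (2, 1), (3, 2);
""")

# Join version: LEFT JOIN keeps companies with nobody assigned (count 0),
# and HAVING keeps only companies with fewer persons than slots.
open_by_join = conn.execute("""
    SELECT companies.id, companies.name, companies.slots, COUNT(persons.id) AS persons
    FROM companies
    LEFT OUTER JOIN persons ON persons.company_id = companies.id
    WHERE companies.city_id = 3
    GROUP BY companies.id, companies.name, companies.slots
    HAVING COUNT(persons.id) < companies.slots
    ORDER BY companies.id
""").fetchall()

# Correlated-subquery version: unlike the question's (select count(*) from persons),
# the subquery refers to the outer companies row, so it counts per company.
open_by_subquery = conn.execute("""
    SELECT id, name, slots,
           (SELECT COUNT(*) FROM persons WHERE persons.company_id = companies.id) AS persons
    FROM companies
    WHERE city_id = 3
      AND (SELECT COUNT(*) FROM persons WHERE persons.company_id = companies.id) < slots
    ORDER BY id
""").fetchall()

print(open_by_join)  # [(1, 'Acme', 3, 2), (3, 'Initech', 2, 0)]
```

Globex is left out because its single slot is already filled; Initech appears with 0 persons only because of the LEFT JOIN.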

Related

Joining three tables and finding a sum of payments in MySQL

I am struggling with this problem in MySQL. The question asks...
Find the names of the individuals and businesses that have made no more than three payments.
Individuals, Businesses, and Payments are separate tables. The problem I am having is that Payments only contains the columns dateFiled and amountPaid. I tried using COUNT, but the query returns no rows.
Here is my code:
SELECT Individuals.name, Businesses.name, Payments.taxpayerID, COUNT(*) AS 'Payments'
FROM Payments
JOIN Individuals ON Payments.taxpayerID=Individuals.taxpayerID
JOIN Businesses ON Payments.taxpayerID=Businesses.taxpayerID
GROUP BY Businesses.name, Individuals.name, Payments.taxpayerID
HAVING COUNT(*) <= 3;
If anyone can help me solve this it would be greatly appreciated.
I guess what you are looking for is not a join of three tables but a union of two selects:
SELECT Individuals.name, Payments.taxpayerID, COUNT(*) AS 'Payments'
FROM Payments
JOIN Individuals ON Payments.taxpayerID=Individuals.taxpayerID
GROUP BY Individuals.name, Payments.taxpayerID
HAVING COUNT(*) <= 3
UNION
SELECT Businesses.name, Payments.taxpayerID, COUNT(*) AS 'Payments'
FROM Payments
JOIN Businesses ON Payments.taxpayerID=Businesses.taxpayerID
GROUP BY Businesses.name, Payments.taxpayerID
HAVING COUNT(*) <= 3;
Your version returns zero rows because a tax ID is associated with either a business or an individual, never both, so no payment can match both inner joins. Therefore you need to query both tables independently and combine the results with UNION.
That said, you could also do it with joins in a single SELECT, but then you'd need outer joins, and the query would be less readable IMHO.
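To see the difference, here is a small sketch with Python's sqlite3 module standing in for MySQL; the sample taxpayers and payments are made up. The question's three-way join comes back empty, while the UNION of two grouped selects returns one row per qualifying taxpayer:

```python
import sqlite3

# Hypothetical data: taxpayers 1 and 3 are individuals, 2 is a business.
# Bob (3) has four payments, so he should be filtered out by HAVING.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Individuals (taxpayerID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Businesses (taxpayerID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Payments (taxpayerID INTEGER, dateFiled TEXT, amountPaid REAL);
    INSERT INTO Individuals VALUES (1, 'Ann'), (3, 'Bob');
    INSERT INTO Businesses VALUES (2, 'Widgets Inc');
    INSERT INTO Payments VALUES
        (1, '2020-01-01', 10), (1, '2020-02-01', 20),
        (2, '2020-01-15', 99),
        (3, '2020-01-01', 5), (3, '2020-02-01', 5), (3, '2020-03-01', 5), (3, '2020-04-01', 5);
""")

# The question's query: a payment would have to match an individual AND a
# business with the same taxpayerID, which never happens.
three_way = conn.execute("""
    SELECT Individuals.name, Businesses.name, Payments.taxpayerID, COUNT(*)
    FROM Payments
    JOIN Individuals ON Payments.taxpayerID = Individuals.taxpayerID
    JOIN Businesses ON Payments.taxpayerID = Businesses.taxpayerID
    GROUP BY Businesses.name, Individuals.name, Payments.taxpayerID
    HAVING COUNT(*) <= 3
""").fetchall()

# The answer's approach: aggregate each kind of taxpayer separately, then UNION.
union_rows = conn.execute("""
    SELECT Individuals.name, Payments.taxpayerID, COUNT(*) AS payment_count
    FROM Payments
    JOIN Individuals ON Payments.taxpayerID = Individuals.taxpayerID
    GROUP BY Individuals.name, Payments.taxpayerID
    HAVING COUNT(*) <= 3
    UNION
    SELECT Businesses.name, Payments.taxpayerID, COUNT(*) AS payment_count
    FROM Payments
    JOIN Businesses ON Payments.taxpayerID = Businesses.taxpayerID
    GROUP BY Businesses.name, Payments.taxpayerID
    HAVING COUNT(*) <= 3
    ORDER BY 2
""").fetchall()

print(three_way)   # []
print(union_rows)  # [('Ann', 1, 2), ('Widgets Inc', 2, 1)]
```

Taxpayers with no payments at all do not show up in either query, because the inner joins need at least one payment row; starting from Individuals or Businesses with a LEFT JOIN to Payments would include them with a count of 0.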

Best way to structure SQL queries with many inner joins?

I have an SQL query that needs to perform multiple inner joins, as follows:
SELECT DISTINCT adv.Email, adv.Credit, c.credit_id AS creditId, c.creditName AS creditName, a.Ad_id AS adId, a.adName
FROM placementlist pl
INNER JOIN
(SELECT Ad_id, List_id FROM placements) AS p
ON pl.List_id = p.List_id
INNER JOIN
(SELECT Ad_id, Name AS adName, credit_id FROM ad) AS a
ON ...
(few more inner joins)
My question is the following: how can I optimize this query? I was under the impression that, even though querying this way creates small temporary tables (the inner SELECT statements), it would still be better than performing inner joins on the unaltered tables, since those could have about 10,000 to 100,000 rows (not millions). However, I was told that this is not the best way to go about it, but I did not have the opportunity to ask what the recommended approach would be.
What would be the best approach here?
Using derived tables such as
INNER JOIN (SELECT Ad_id, List_id FROM placements) AS p
is not advisable. Let the DBMS work out for itself which values it needs from
INNER JOIN placements AS p
instead of telling it (again) by effectively forcing it to create a view of the table with only those two columns. (Joining the table by name is also much more readable.)
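A minimal sketch, with Python's sqlite3 module standing in for the real DBMS and two hypothetical, stripped-down tables, showing that the derived-table join and the plain join return the same rows; the only difference is who decides how to fetch the columns:

```python
import sqlite3

# Hypothetical, stripped-down tables; only the join between them matters here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE placementlist (List_id INTEGER, Email TEXT);
    CREATE TABLE placements (Ad_id INTEGER, List_id INTEGER, note TEXT);
    INSERT INTO placementlist VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO placements VALUES (10, 1, 'x'), (11, 1, 'y'), (12, 3, 'z');
""")

# Derived table: restricts placements to two columns before the join.
via_derived = conn.execute("""
    SELECT pl.Email, p.Ad_id
    FROM placementlist pl
    INNER JOIN (SELECT Ad_id, List_id FROM placements) AS p ON pl.List_id = p.List_id
    ORDER BY p.Ad_id
""").fetchall()

# Plain join: the DBMS decides for itself which columns it needs.
via_table = conn.execute("""
    SELECT pl.Email, p.Ad_id
    FROM placementlist pl
    INNER JOIN placements AS p ON pl.List_id = p.List_id
    ORDER BY p.Ad_id
""").fetchall()

print(via_table)  # [('a@example.com', 10), ('a@example.com', 11)]
```

Prefixing either query with EXPLAIN (MySQL) or EXPLAIN QUERY PLAN (SQLite) shows the plan each one actually gets; the output format differs between systems.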
With SQL you mainly say what you want to see, not how it is to be achieved. (Of course, this is just a rule of thumb.) So if no columns other than Ad_id and List_id are used from table placements, the DBMS will find its own best way to handle this. Don't try to force your way on it.
The same is true of the IN clause, by the way, where you often see WHERE col IN (SELECT DISTINCT colx FROM ...) instead of simply WHERE col IN (SELECT colx FROM ...). Both return exactly the same result, but with DISTINCT you tell the DBMS "make your subquery's rows distinct before looking for col". Why force it to do so? Why not let it use whatever method it finds most appropriate?
Back to derived tables: Use them when they really do something, especially aggregations, or when they make your query more readable.
Moreover,
SELECT DISTINCT adv.Email, adv.Credit, ...
doesn't look too good either. Yes, sometimes you need SELECT DISTINCT, but usually you don't. Most often it is just a sign that you haven't thought your query through.
An example: you want to select clients that bought product X. In SQL you would say: where a purchase of X EXISTS for the client. Or: where the client is IN the set of the X purchasers.
select * from clients c where exists
(select * from purchases p where p.clientid = c.clientid and product = 'X');
Or
select * from clients where clientid in
(select clientid from purchases where product = 'X');
You don't say: Give me all combinations of clients and X purchases and then boil that down so I just get each client once.
select distinct c.*
from clients c
join purchases p on p.clientid = c.clientid and product = 'X';
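The three formulations can be checked side by side. This sketch uses Python's sqlite3 module as a stand-in for MySQL with made-up data in which Ann bought X twice; EXISTS, IN, IN with DISTINCT and the DISTINCT join all return the same clients, and the plain join shows the duplicates that DISTINCT has to remove:

```python
import sqlite3

# Hypothetical clients/purchases data; Ann bought X twice, Bob never bought X.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (clientid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchases (clientid INTEGER, product TEXT);
    INSERT INTO clients VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO purchases VALUES (1, 'X'), (1, 'X'), (2, 'Y'), (3, 'X');
""")

queries = {
    # "where a purchase of X EXISTS for the client"
    "exists": """select * from clients c where exists
                 (select * from purchases p where p.clientid = c.clientid and product = 'X')
                 order by clientid""",
    # "where the client is IN the set of X purchasers"
    "in": """select * from clients where clientid in
             (select clientid from purchases where product = 'X')
             order by clientid""",
    # Same result; the DISTINCT inside IN adds nothing.
    "in_distinct": """select * from clients where clientid in
                      (select distinct clientid from purchases where product = 'X')
                      order by clientid""",
    # Join first, then boil the duplicates down again.
    "join_distinct": """select distinct c.* from clients c
                        join purchases p on p.clientid = c.clientid and product = 'X'
                        order by c.clientid""",
    # Without DISTINCT the join repeats Ann once per purchase of X.
    "join_plain": """select c.* from clients c
                     join purchases p on p.clientid = c.clientid and product = 'X'
                     order by c.clientid""",
}
results = {label: conn.execute(sql).fetchall() for label, sql in queries.items()}
print(results["exists"])  # [(1, 'Ann'), (3, 'Cy')]
```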
Yes, it is very easy to join all the tables needed, list the columns to select, and put DISTINCT in front. But that makes the query blurry, because you don't write the query the way you would word the task. And it can cause real problems with aggregations. The following query is wrong, because each money-earned value is counted once per matching money-spent row, and vice versa.
select
sum(money_spent.value),
sum(money_earned.value)
from user
join money_spent on money_spent.userid = user.userid
join money_earned on money_earned.userid = user.userid;
And the following may look correct, but is still incorrect (it only works when the values happen to be unique):
select
sum(distinct money_spent.value),
sum(distinct money_earned.value)
from user
join money_spent on money_spent.userid = user.userid
join money_earned on money_earned.userid = user.userid;
Again, you would not say: "I want to combine each purchase with each earning and then ...". You would say: "I want the sum of money spent and the sum of money earned per user". So you are not dealing with single purchases or earnings, but with their sums. As in:
select
(select sum(value) from money_spent where money_spent.userid = user.userid),
(select sum(value) from money_earned where money_earned.userid = user.userid)
from user;
Or:
select
spent.total,
earned.total
from user
join (select userid, sum(value) as total from money_spent group by userid) spent
on spent.userid = user.userid
join (select userid, sum(value) as total from money_earned group by userid) earned
on earned.userid = user.userid;
So you see, this is where derived tables come into play.
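The effect is easy to reproduce. In this sketch (Python's sqlite3 module as a stand-in for MySQL, made-up values, and the table renamed to users to sidestep keyword questions) one user spent 10, 10 and 5 and earned 100 and 50. The plain join multiplies both sums, SUM(DISTINCT ...) drops the repeated 10, and only the correlated subqueries and the pre-aggregated derived tables give 25 and 150:

```python
import sqlite3

# Hypothetical data; "users" stands in for the answer's "user" table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY);
    CREATE TABLE money_spent (userid INTEGER, value INTEGER);
    CREATE TABLE money_earned (userid INTEGER, value INTEGER);
    INSERT INTO users VALUES (1);
    INSERT INTO money_spent VALUES (1, 10), (1, 10), (1, 5);
    INSERT INTO money_earned VALUES (1, 100), (1, 50);
""")

joins = """from users
    join money_spent on money_spent.userid = users.userid
    join money_earned on money_earned.userid = users.userid"""

# 3 spent rows x 2 earned rows = 6 joined rows: every value is repeated.
naive = conn.execute(
    "select sum(money_spent.value), sum(money_earned.value) " + joins).fetchone()

# DISTINCT on values collapses the two separate 10s into one.
with_distinct = conn.execute(
    "select sum(distinct money_spent.value), sum(distinct money_earned.value) " + joins).fetchone()

# Correlated scalar subqueries: one sum per user, no row multiplication.
correlated = conn.execute("""
    select (select sum(value) from money_spent where money_spent.userid = users.userid),
           (select sum(value) from money_earned where money_earned.userid = users.userid)
    from users""").fetchone()

# Derived tables: aggregate first, then join one row per user.
derived = conn.execute("""
    select spent.total, earned.total
    from users
    join (select userid, sum(value) as total from money_spent group by userid) spent
      on spent.userid = users.userid
    join (select userid, sum(value) as total from money_earned group by userid) earned
      on earned.userid = users.userid""").fetchone()

print(naive, with_distinct, correlated, derived)  # (50, 450) (15, 150) (25, 150) (25, 150)
```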

Complex SQL query over four tables does not fetch wanted result

Imagine the following scenario: Employees of a company can give votes to an arbitrary question (integer value).
I have a complex query where I want to fetch five pieces of information:
Name of the company
Average vote value per company
Number of employees
Number of votes
Participation (no of votes/no of employees)
The SQL query should only fetch votes for companies at which the current user is employed.
Therefore I am accessing four different tables; below is an excerpt of the table declarations:
User
- id
Company
- id
- name
Employment
- user_id (FK User.id)
- company_id (FK Company.id)
Vote
- company_name
- vote_value
- timestamp
User and Company are related through Employment (an n:m relation, so it needs a separate table). The table Vote is not connected by a PK/FK relation, but votes can be related to a company by company name (Company.name = Vote.company_name).
With the following SQL query I managed to fetch all the information correctly except for the number of employees:
SELECT
c.name AS company,
AVG(v.vote_value) AS value,
COUNT(e.user_id) AS employees,
COUNT(v.vote_value) AS votes,
(COUNT(v.vote_value) / COUNT(e.user_id)) AS participation
FROM Company c
JOIN Employment e ON e.company_id = c.id
JOIN User u ON u.id = e.user_id
JOIN Vote v
ON v.company_name = c.name
AND YEAR(v.timestamp) = :year
AND MONTH(v.timestamp) = :month
AND DAY(v.timestamp) = :day
WHERE u.id = :u_id
GROUP BY v.company_name, e.company_id
But instead of the correct number of employees, the employees field always equals the number of votes (and therefore the participation value is also wrong).
Is there any way to do this in one query without subqueries (1)? What do I have to change so that the query fetches the correct number of employees?
(1) I am using Doctrine2 and try to avoid subqueries because Doctrine does not support them. I just did not want to pull this into a Doctrine discussion. That's why I broke this topic down to the SQL level.
If you want to fetch the number of employees then the issue is that you are filtering by only 1 employee:
WHERE u.id = :u_id
Secondly, bear in mind that if you want to count the number of employees and you have joined down to the vote level, then of course you will have as many rows as there are votes. So you will have to use a distinct count, as @Przem... mentioned:
COUNT(DISTINCT e.user_id) AS employees,
That way you will uniquely count the employees for the company (getting rid of the repeated employee ids for all the votes the employee has).
As you mentioned in a comment:
It returns the 1 as employee count
This is because the WHERE condition restricts the result to one employee with many votes. The DISTINCT only counts the single employee that passes the WHERE clause, which is why you get 1. However, that is the correct result for your filter condition.
Adding subqueries in the select clause will also get you to the right result but at the expense of performance.
Try this: it calculates the votes in one subquery and the employees in another, then joins both to the company.
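SELECT c.name,
ce.employee_count,
cv.vote_count,
cv.vote_count / ce.employee_count AS participation,
cv.vote_value
FROM
(SELECT company_id, COUNT(*) AS employee_count
FROM Employment GROUP BY company_id) ce
INNER JOIN Company c
ON c.id = ce.company_id
INNER JOIN
(SELECT v.company_name, AVG(v.vote_value) AS vote_value, COUNT(*) AS vote_count
FROM Vote v
WHERE YEAR(v.timestamp) = :year AND MONTH(v.timestamp) = :month AND DAY(v.timestamp) = :day
GROUP BY v.company_name) cv
ON cv.company_name = c.name
WHERE c.id IN (SELECT company_id FROM Employment WHERE user_id = :u_id)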
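A sketch of this subquery approach against made-up data, using Python's sqlite3 module as a stand-in for MySQL. Two substitutions are assumptions of the sketch: SQLite's date() replaces the YEAR/MONTH/DAY tests, and the participation is multiplied by 1.0 because SQLite, unlike MySQL, truncates integer division. Each count is computed at its own level, so neither multiplies the other:

```python
import sqlite3

# Hypothetical data: users 1-3 work at Acme, user 4 at Globex.
# Acme has two votes on 2024-05-01 and one on another day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Employment (user_id INTEGER, company_id INTEGER);
    CREATE TABLE Vote (company_name TEXT, vote_value INTEGER, timestamp TEXT);
    INSERT INTO Company VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Employment VALUES (1, 1), (2, 1), (3, 1), (4, 2);
    INSERT INTO Vote VALUES ('Acme', 4, '2024-05-01 09:00:00'), ('Acme', 2, '2024-05-01 15:30:00'),
                            ('Acme', 5, '2024-05-02 10:00:00'), ('Globex', 3, '2024-05-01 11:00:00');
""")

rows = conn.execute("""
    SELECT c.name,
           ce.employee_count,
           cv.vote_count,
           cv.vote_count * 1.0 / ce.employee_count AS participation,  -- avoid integer division
           cv.vote_value
    FROM (SELECT company_id, COUNT(*) AS employee_count
          FROM Employment GROUP BY company_id) ce
    INNER JOIN Company c ON c.id = ce.company_id
    INNER JOIN (SELECT v.company_name, AVG(v.vote_value) AS vote_value, COUNT(*) AS vote_count
                FROM Vote v
                WHERE date(v.timestamp) = :day          -- SQLite stand-in for YEAR/MONTH/DAY
                GROUP BY v.company_name) cv
      ON cv.company_name = c.name
    WHERE c.id IN (SELECT company_id FROM Employment WHERE user_id = :u_id)
""", {"day": "2024-05-01", "u_id": 1}).fetchall()

print(rows)  # one row for Acme: 3 employees, 2 votes, participation 2/3, average 3.0
```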
Well I think with a query defined like that you should add the DISTINCT keyword while counting the number of employees:
SELECT
c.name AS company,
AVG(v.vote_value) AS value,
COUNT(DISTINCT e.user_id) AS employees,
COUNT(v.vote_value) AS votes,
(COUNT(v.vote_value) / COUNT(DISTINCT e.user_id)) AS participation
FROM Company c
JOIN Employment e ON e.company_id = c.id
JOIN User u ON u.id = e.user_id
JOIN Vote v
ON v.company_name = c.name
AND YEAR(v.timestamp) = :year
AND MONTH(v.timestamp) = :month
AND DAY(v.timestamp) = :day
GROUP BY v.company_name, e.company_id;
Not sure if it is possible in MySQL, though.
Edit: as @Mosty Mostacho pointed out, the condition on u.id was the problem. Without it, and with the DISTINCT keyword added, the query returns the correct number of employees, and I have edited the query above accordingly.
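It is worth checking what the other counts in this query do once every employee row is joined. The sketch below runs the query against made-up data, with Python's sqlite3 module standing in for MySQL, SQLite's date() replacing the YEAR/MONTH/DAY tests, and the User join omitted (without the u.id filter it removes no rows here). COUNT(DISTINCT e.user_id) gives 3 employees for Acme, but each of Acme's 2 votes is repeated once per employee, so COUNT(v.vote_value) reports 6. Globex also appears, even though it is not the current user's company:

```python
import sqlite3

# Same hypothetical data as a sketch of the subquery answer: users 1-3 at Acme,
# user 4 at Globex; Acme has two votes on 2024-05-01.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Employment (user_id INTEGER, company_id INTEGER);
    CREATE TABLE Vote (company_name TEXT, vote_value INTEGER, timestamp TEXT);
    INSERT INTO Company VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Employment VALUES (1, 1), (2, 1), (3, 1), (4, 2);
    INSERT INTO Vote VALUES ('Acme', 4, '2024-05-01 09:00:00'), ('Acme', 2, '2024-05-01 15:30:00'),
                            ('Acme', 5, '2024-05-02 10:00:00'), ('Globex', 3, '2024-05-01 11:00:00');
""")

# Joined rows per company = employees x votes, so plain COUNT is multiplied;
# AVG is unaffected because every vote is repeated the same number of times.
rows = conn.execute("""
    SELECT c.name AS company,
           AVG(v.vote_value) AS value,
           COUNT(DISTINCT e.user_id) AS employees,
           COUNT(v.vote_value) AS votes
    FROM Company c
    JOIN Employment e ON e.company_id = c.id
    JOIN Vote v ON v.company_name = c.name AND date(v.timestamp) = :day
    GROUP BY v.company_name, e.company_id
    ORDER BY c.name
""", {"day": "2024-05-01"}).fetchall()

print(rows)  # [('Acme', 3.0, 3, 6), ('Globex', 3.0, 1, 1)]
```

Because Vote has no key column, COUNT(DISTINCT ...) cannot undo this duplication for the votes. Aggregating votes and employees separately before joining, as in the subquery answer, avoids it.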

SQL how to join these tables

The scenario:
I have a website which lets users vote for the cars they like most. Cars are stored in the table cars, votes in votes, and the column country_id in cars references countries (where the car brand comes from).
I want to show the users which country has the most votes. Simple version of the tables:
CARS
id
name
country_id
Countries
id
name
Votes
id
user_id
car_id
Ideally I would like to show the users the top x countries, along with how many votes each has.
Bonus: would it be possible to use this query for a certain user? So they see their top x with countries they voted on.
And which indexes would you suggest? The votes table can grow beyond 10 million votes, and the cars table can grow fast too.
I think you can achieve this with a LEFT JOIN query using GROUP BY and an aggregate function:
SELECT COUNT(a.id) as total_votes, c.name as country_name
FROM Votes a
LEFT JOIN CARS b
ON a.car_id = b.id
LEFT JOIN Countries c
ON b.country_id = c.id
GROUP BY c.id, c.name
ORDER BY total_votes DESC
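A sketch covering both the top-x list and the bonus per-user variant, using Python's sqlite3 module as a stand-in for MySQL with made-up cars and votes. The function name top_countries is hypothetical, and the sketch uses inner joins on the assumption that every vote references an existing car with a country:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cars (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
    CREATE TABLE votes (id INTEGER PRIMARY KEY, user_id INTEGER, car_id INTEGER);
    INSERT INTO countries VALUES (1, 'Germany'), (2, 'Japan'), (3, 'Italy');
    INSERT INTO cars VALUES (1, 'Golf', 1), (2, 'Polo', 1), (3, 'Civic', 2), (4, 'Panda', 3);
    INSERT INTO votes (user_id, car_id) VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 3), (1, 4);
""")

def top_countries(conn, limit, user_id=None):
    """Top `limit` countries by vote count, optionally counting only one user's votes."""
    params = {"limit": limit}
    user_filter = ""
    if user_id is not None:
        # Only a fixed SQL fragment is added; the value itself is a bound parameter.
        user_filter = "WHERE v.user_id = :user_id"
        params["user_id"] = user_id
    sql = f"""
        SELECT c.name, COUNT(v.id) AS total_votes
        FROM votes v
        JOIN cars ON cars.id = v.car_id
        JOIN countries c ON c.id = cars.country_id
        {user_filter}
        GROUP BY c.id, c.name
        ORDER BY total_votes DESC, c.name
        LIMIT :limit
    """
    return conn.execute(sql, params).fetchall()

print(top_countries(conn, 2))             # [('Germany', 3), ('Japan', 2)]
print(top_countries(conn, 2, user_id=1))  # [('Germany', 2), ('Italy', 1)]
```

Grouping by the country's id and name, not the car's name, is what produces one row per country.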
Indexes on cars.country_id, votes.user_id and votes.car_id seem reasonable. As mzedler suggested, though, once you get up to tens of millions of rows, computing these aggregates on every request can be a bad idea.
There are a number of ways to address that: triggers, a cache, or adding the vote date to votes so that you reduce the number of records you have to count in one go. For example, cache the vote totals daily, then query only the votes made since midnight and add them to the cached totals.
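A minimal sketch of the daily-cache idea, with Python's sqlite3 module standing in for MySQL. The voted_at column (the "date voted" the answer suggests adding), the country_vote_cache table, and the functions refresh_cache and current_totals are all hypothetical names. The cache holds totals for everything before a cutoff, and a live query counts only the newer votes:

```python
import sqlite3

# Hypothetical schema: votes.voted_at is the added "date voted" column,
# country_vote_cache holds per-country totals for votes before the cutoff.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars (id INTEGER PRIMARY KEY, country_id INTEGER);
    CREATE TABLE votes (id INTEGER PRIMARY KEY, car_id INTEGER, voted_at TEXT);
    CREATE TABLE country_vote_cache (country_id INTEGER PRIMARY KEY, total INTEGER);
    INSERT INTO cars VALUES (1, 1), (2, 2);
    INSERT INTO votes (car_id, voted_at) VALUES
        (1, '2024-05-01 10:00'), (1, '2024-05-01 11:00'), (2, '2024-05-01 12:00'),
        (2, '2024-05-02 09:00'), (2, '2024-05-02 10:00');
""")

def refresh_cache(conn, cutoff):
    """Recount all votes before `cutoff` (e.g. last midnight); run once a day."""
    conn.execute("DELETE FROM country_vote_cache")
    conn.execute("""
        INSERT INTO country_vote_cache (country_id, total)
        SELECT cars.country_id, COUNT(*)
        FROM votes JOIN cars ON cars.id = votes.car_id
        WHERE votes.voted_at < :cutoff
        GROUP BY cars.country_id
    """, {"cutoff": cutoff})

def current_totals(conn, cutoff):
    """Cached totals plus a live count of only the votes cast since `cutoff`."""
    return conn.execute("""
        SELECT country_id, SUM(total) AS votes_total
        FROM (
            SELECT country_id, total FROM country_vote_cache
            UNION ALL
            SELECT cars.country_id, COUNT(*)
            FROM votes JOIN cars ON cars.id = votes.car_id
            WHERE votes.voted_at >= :cutoff
            GROUP BY cars.country_id
        ) AS combined
        GROUP BY country_id
        ORDER BY votes_total DESC, country_id
    """, {"cutoff": cutoff}).fetchall()

cutoff = "2024-05-02 00:00"
refresh_cache(conn, cutoff)
totals = current_totals(conn, cutoff)

# For comparison: counting every vote from scratch gives the same answer.
full_recount = conn.execute("""
    SELECT cars.country_id, COUNT(*) FROM votes JOIN cars ON cars.id = votes.car_id
    GROUP BY cars.country_id ORDER BY 2 DESC, 1
""").fetchall()

print(totals)  # [(2, 3), (1, 2)]
```

The string comparison on voted_at works here only because the timestamps use a fixed-width 'YYYY-MM-DD HH:MM' format; a real DATETIME column avoids relying on that.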

MySQL: Selecting from 3 tables

I have the following query:
SELECT
DISTINCT sites.site_id,
sites.site_name,
sites.site_url,
earnings.cust_id
FROM
sites,
earnings
WHERE sites.site_id = earnings.site_id AND sites.site_id IN('8', '1666')
That query gives me very well the information asked. It returns two rows, one for site 8 and another for site 1666, with the information on them from those tables.
Now I want to use the cust_id to select from another table (let's say customers), where customers are stored by id along with other info such as name, last name, etc.
Basically what I need is to expand that query to extract customer name and last name from the table customers, using the ids obtained.
Do it the same way you got the info from two tables: add a comma, add the third table name, and add the relationship to your WHERE clause as you did with the first two tables.
SELECT
DISTINCT sites.site_id,
sites.site_name,
sites.site_url,
earnings.cust_id,
customers.name,
customers.last_name
FROM
sites,
earnings,
customers
WHERE sites.site_id = earnings.site_id AND sites.site_id IN('8', '1666') AND customers.id = earnings.cust_id
I think it's clearer to write out the JOINs though:
SELECT DISTINCT
sites.site_id,
sites.site_name,
sites.site_url,
earnings.cust_id,
customers.name,
customers.last_name
FROM
sites
INNER JOIN
earnings
ON
earnings.site_id = sites.site_id
INNER JOIN
customers
ON
customers.id = earnings.cust_id
WHERE
sites.site_id IN (8, 1666)
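A quick check, with Python's sqlite3 module standing in for MySQL and made-up sites, earnings and customers, that the comma-style join and the explicit INNER JOIN version return the same rows; DISTINCT collapses the two earnings rows that site 8 has for the same customer:

```python
import sqlite3

# Hypothetical sample data; site 9 is outside the IN list and must not appear.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (site_id INTEGER PRIMARY KEY, site_name TEXT, site_url TEXT);
    CREATE TABLE earnings (site_id INTEGER, cust_id INTEGER, amount REAL);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, last_name TEXT);
    INSERT INTO sites VALUES (8, 'Site A', 'a.example'), (1666, 'Site B', 'b.example'), (9, 'Site C', 'c.example');
    INSERT INTO customers VALUES (100, 'Ann', 'Lee'), (200, 'Bob', 'Roe');
    INSERT INTO earnings VALUES (8, 100, 5.0), (8, 100, 7.5), (1666, 200, 3.0), (9, 100, 1.0);
""")

columns = """sites.site_id, sites.site_name, sites.site_url,
             earnings.cust_id, customers.name, customers.last_name"""

# Comma-style join with the relationships in WHERE.
implicit = conn.execute(f"""
    SELECT DISTINCT {columns}
    FROM sites, earnings, customers
    WHERE sites.site_id = earnings.site_id AND sites.site_id IN (8, 1666)
      AND customers.id = earnings.cust_id
    ORDER BY sites.site_id
""").fetchall()

# Explicit INNER JOINs: each relationship sits next to the table it joins.
explicit = conn.execute(f"""
    SELECT DISTINCT {columns}
    FROM sites
    INNER JOIN earnings ON earnings.site_id = sites.site_id
    INNER JOIN customers ON customers.id = earnings.cust_id
    WHERE sites.site_id IN (8, 1666)
    ORDER BY sites.site_id
""").fetchall()

print(explicit)
# [(8, 'Site A', 'a.example', 100, 'Ann', 'Lee'), (1666, 'Site B', 'b.example', 200, 'Bob', 'Roe')]
```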