SQL count() not showing values of 0 - mysql

I just started my journey with SQL and made some tables of cyclists and cycling teams.
The cyclists table contains the columns: ID, Name, Team (a foreign key referencing the teams ID)
The teams table contains the columns: ID, Name, Number of Cyclists
I want to count the number of cyclists in each team using the count() function (or basically any function, I just want to make it work).
After many minutes I figured out this query:
SELECT teams.name,
count(*) AS NumberOfCyclists FROM cyclists
JOIN teams ON cyclists.team = teams.id
group by teams.name;
and I got this:
Which is all good, but when I use LEFT JOIN instead I get:
My question is: how do I get all of the teams (there are 15 of them, not 11), even those where the count of cyclists is 0?

I think you misunderstand how LEFT JOIN works. The order of tables in the join is important. In a LEFT JOIN, the query returns all the rows in the left table, even if there are no matching rows in the right table. In your query, the left table is cyclists, and the right table is teams.
So your query is currently returning all cyclists, including those who have no team (the result shows that there are 3 cyclists who have no team). This is the reverse of what you want, which is all teams, even those with no cyclists.
If you want to return all the teams, then either reverse the tables in your join:
...
FROM teams
LEFT OUTER JOIN cyclists ON cyclists.team = teams.id
...
Or you could achieve the same result by using RIGHT join.
...
FROM cyclists
RIGHT OUTER JOIN teams ON cyclists.team = teams.id
...

You must count not the number of rows (COUNT(*)), which can never be zero for a group that exists, but the number of non-NULL values in a specific column taken from the right table (COUNT(table.column)); using the column from the join condition is recommended. With LEFT JOIN, of course.
But the logic needs the teams table to be the left one. And finally:
SELECT teams.name,
count(cyclists.team) AS NumberOfCyclists
FROM teams
LEFT JOIN cyclists ON cyclists.team = teams.id
group by teams.name;
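To see the difference between the two counts side by side, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The team and cyclist names are made-up sample data, not the asker's.

```python
import sqlite3

# Hypothetical sample data: three teams, one of which has no cyclists.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cyclists (id INTEGER PRIMARY KEY, name TEXT, team INTEGER);
    INSERT INTO teams VALUES (1, 'Alpha'), (2, 'Bravo'), (3, 'Charlie');
    INSERT INTO cyclists VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cid', 2);
""")

# COUNT(*) counts rows, including the NULL-padded row the LEFT JOIN produces
# for a team without cyclists; COUNT(cyclists.team) counts only non-NULL values.
rows = conn.execute("""
    SELECT teams.name,
           COUNT(*)             AS row_count,
           COUNT(cyclists.team) AS cyclist_count
    FROM teams
    LEFT JOIN cyclists ON cyclists.team = teams.id
    GROUP BY teams.name
    ORDER BY teams.name
""").fetchall()

for name, row_count, cyclist_count in rows:
    print(name, row_count, cyclist_count)
```

The team without cyclists still appears, with a row count of 1 but a cyclist count of 0.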

Try this:
SELECT teams.name,
count(cyclists.id) AS NumberOfCyclists
FROM teams
LEFT JOIN cyclists ON cyclists.team = teams.id
group by teams.name;
The reason this works where your version does not is that it selects Teams as the base table to draw results from instead of Cyclists.
In your version, if a Cyclist has no corresponding Team, then the Team columns are NULL, and those cyclists are grouped together (with a NULL name). By going from Teams into Cyclists, you are saying to take each Team and find the Cyclist records corresponding to that Team, of which there could be 0 or more.

With your LEFT JOIN, you get all rows from the cyclists table, each paired with its matching teams row if one exists; where none exists, all the teams columns are NULL.
So you get rows that have no partner.

Related

MySQL Differences with counts caused by joins

I have a problem in MySQL where I use the COUNT function in conditions. However, when combining this with joins, although I use grouping, the COUNT values include ALL rows, even the ones filtered out.
I'm providing a minimal working example, which may not make practical sense or be designed smartly.
So assume I have 3 tables:
products with fields: productId, name, active (boolean)
teams with fields: teamId, name
rel_production with fields: teamId, productId
So basically I have products and teams with ids and names. Products can be active (let's say that means they are still in production).
And then I have a relation that records which team is working on which product.
To explain my problem, assume the tables contain the following minimal data:
products
teams
rel_production
Now the query that I want to do is, in plain English: "I want all teams that are working on exactly 2 products while at least one product must be active."
The query in general works and is the following in mysql:
SELECT
teams.*,
"r_count:",
r_count.*,
COUNT(r_count.productId),
"r_active:",
r_active.*,
"p_active:",
p_active.*
FROM teams
INNER JOIN rel_production r_active ON r_active.teamId = teams.teamId
INNER JOIN products p_active ON p_active.productId = r_active.productId AND p_active.active
INNER JOIN rel_production r_count ON r_count.teamId = teams.teamId
GROUP BY teams.teamId, r_active.teamId
HAVING COUNT(r_count.productId) = 2 #4 is the problem!!!!!!!!!!!!!
Now the problem is with team 1. Because it is working on 2 active products, COUNT(r_count.productId) will be 4 and not 2. So my query will filter it out.
Here is the screenshot with the result without the HAVING clause:
I see why this happens, because the two inner joins on rel_production will cause 4 rows to be generated. But then they are merged always together to one using the GROUP BY. So what I need is the COUNT after the GROUP and not before.
How can I fix this?
Perform the filtering on teams in a separate subquery, and then join to that:
SELECT
t1.teamId,
t1.name
FROM teams t1
INNER JOIN
(
SELECT t1.teamId
FROM rel_production t1
INNER JOIN products t2
ON t1.productId = t2.productId
GROUP BY t1.teamId
HAVING COUNT(DISTINCT t1.productId) = 2 AND SUM(t2.active) > 0
) t2
ON t1.teamId = t2.teamId;
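As a rough check of both the problem and the fix, the sketch below runs a trimmed version of the double-join count and then the subquery version against Python's built-in sqlite3 (standing in for MySQL). The rows are invented for illustration, since the original sample data is not shown, and the aliases t, r, p and f are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (productId INTEGER PRIMARY KEY, name TEXT, active INTEGER);
    CREATE TABLE teams (teamId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE rel_production (teamId INTEGER, productId INTEGER);
    -- Hypothetical data: products 1 and 2 are active, 3 and 4 are not.
    INSERT INTO products VALUES (1, 'A', 1), (2, 'B', 1), (3, 'C', 0), (4, 'D', 0);
    INSERT INTO teams VALUES (1, 'Team 1'), (2, 'Team 2'), (3, 'Team 3'), (4, 'Team 4');
    INSERT INTO rel_production VALUES
        (1, 1), (1, 2),          -- two products, both active
        (2, 3), (2, 4),          -- two products, none active
        (3, 1), (3, 3), (3, 4),  -- three products
        (4, 2), (4, 3);          -- two products, one active
""")

# Joining rel_production twice multiplies the rows: for team 1 each of the
# 2 active-product rows pairs with each of the 2 counting rows.
inflated = conn.execute("""
    SELECT COUNT(r_count.productId)
    FROM teams
    INNER JOIN rel_production r_active ON r_active.teamId = teams.teamId
    INNER JOIN products p_active
            ON p_active.productId = r_active.productId AND p_active.active
    INNER JOIN rel_production r_count ON r_count.teamId = teams.teamId
    WHERE teams.teamId = 1
""").fetchone()[0]

# Filtering in a separate grouped subquery counts each product only once.
matching = conn.execute("""
    SELECT t.teamId, t.name
    FROM teams t
    INNER JOIN (
        SELECT r.teamId
        FROM rel_production r
        INNER JOIN products p ON r.productId = p.productId
        GROUP BY r.teamId
        HAVING COUNT(DISTINCT r.productId) = 2 AND SUM(p.active) > 0
    ) f ON t.teamId = f.teamId
    ORDER BY t.teamId
""").fetchall()

print(inflated)
print(matching)
```

With this data the double join reports 4 for team 1, while the subquery version keeps exactly the teams with two products of which at least one is active.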

MYSQL Use OR in Left Join condition?

I'm not entirely sure that what I'm trying to do is possible. Can you use an OR in the condition of a left join? I start from my users table and then it can either go from week_meal to meal (adding a meal they do not own to their weekly meal plan) or straight to meal (a meal they own). That part appears to be working, but when I include mta.meal_to_add_id in the select, it incorrectly pulls in meals that do NOT meet the criteria in the LEFT JOIN to meal_to_add.
Fiddle with structure: http://sqlfiddle.com/#!2/7bd9c4
SELECT DISTINCT m.*, o.username as owner,i.*, mta.meal_to_add_id, follow_id
FROM webusers wu
LEFT JOIN week_meal wm ON wu.id=wm.user_id
LEFT JOIN meal m ON (wu.id=m.user_id OR wm.meal_id=m.meal_id)
LEFT JOIN webusers o ON m.user_id=o.id
LEFT JOIN meal_to_add mta ON
((wm.user_id = mta.user_id AND wm.meal_id=mta.meal_id)
OR (m.user_id=mta.user_id AND m.meal_id=mta.meal_id))
JOIN ingredient i ON m.meal_id = i.meal_id
LEFT JOIN follow f ON
m.user_id!=wu.id AND
m.user_id=f.followed_webuser_id
AND wu.id=f.followee_webuser_id
WHERE wu.id=5 AND m.meal_id in (138)
ORDER BY m.meal, i.ingredient_id
OUTPUT: It should be just like this only including the field for mta.meal_to_add_id, which in this case should be NULL for all rows (18)
Sample Results
To answer the first part of your question: Yes, you can use an OR clause in a LEFT JOIN.
As for the second part, in plain words, this is what the query seems to say:
Join 'week meals' to users on the user id. Join meals to users on that same user id OR join meals to users on the meal id. Assume now that we have some matching meal/user combinations, where some meal rows are matched on user, and others are matched on the meal id.
Next, join webusers to meals again. Now we have meal rows possibly matching two sets of users. So when mta tries to match meals, it is matching two possible sets of meal/user combinations.
My practice in cases like this is to break up the query into two queries and put the intermediate results in a temp table (using MEMORY engine), then select from that.
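To illustrate both points (OR is allowed in the ON clause, and it can match the same meal through more than one path), here is a small sketch using Python's built-in sqlite3 in place of MySQL. The tables are cut down to a few columns and the rows are invented; they are not the data from the fiddle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE webusers (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE meal (meal_id INTEGER PRIMARY KEY, user_id INTEGER, meal TEXT);
    CREATE TABLE week_meal (user_id INTEGER, meal_id INTEGER);
    -- Hypothetical data: user 5 owns meal 10 and has planned meals 20 and 30,
    -- which are owned by user 6.
    INSERT INTO webusers VALUES (5, 'ann'), (6, 'bob');
    INSERT INTO meal VALUES (10, 5, 'Soup'), (20, 6, 'Stew'), (30, 6, 'Salad');
    INSERT INTO week_meal VALUES (5, 20), (5, 30);
""")

query = """
    SELECT {distinct} wu.id, m.meal_id
    FROM webusers wu
    LEFT JOIN week_meal wm ON wu.id = wm.user_id
    LEFT JOIN meal m ON (wu.id = m.user_id OR wm.meal_id = m.meal_id)
    WHERE wu.id = 5
    ORDER BY m.meal_id
"""

# Without DISTINCT, the owned meal (10) is matched once per week_meal row.
all_rows = conn.execute(query.format(distinct="")).fetchall()
distinct_rows = conn.execute(query.format(distinct="DISTINCT")).fetchall()

print(all_rows)
print(distinct_rows)
```

The owned meal shows up once for every week_meal row, which is the kind of fan-out that makes later joins (such as the one to meal_to_add) match more combinations than expected.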

I can't wrap my head around joins

So, alright, I have a few tables. My current query runs against a "historical" table. I want to do a join of some kind to get the most recent status from my Current table. These tables share a like column, called "ID"
Here's the structure
ddCurrent
-ID
-Location
-Status
-Time
ddHistorical
-CID (AI field to keep multiple records per site)
-ID
-Location
-Status
-Time
My goal now is to do a simple join to get all the variables from ddHistorical and the current Status from ddCurrent.
I know that they can be joined on ID since both of them have the same items in their ID tables, I just can't figure out which kind of join is appropriate or why?
I'm sure someone may provide a specific link that goes into great detail explaining, but I'll try to summarize it this way. When writing a query, I try to list the tables from the position of what table do I want to get data from and have that as my first table in the "FROM" clause. Then, do "JOIN" criteria to other tables based on relationships (such as IDs). In your example
FROM
ddHistorical ddH
INNER JOIN ddCurrent ddC
on ddH.ID = ddC.ID
In this case, with INNER JOIN (same as JOIN), the ddHistorical table is the left table (listed first, for my styling consistency and indentation) and ddCurrent is the right table. Notice that my ON criteria joining them together is also written as left alias.column = right alias.column; again, this is just for mental correlation purposes.
An INNER JOIN (or JOIN) means a record MUST have a match on each side, otherwise it is discarded.
A LEFT JOIN means give me all records in the LEFT table (ddHistorical in this case), regardless of a matching in the right-side table (ddCurrent). Not practical in this example.
A RIGHT JOIN is the reverse... give me all records from the RIGHT-side table REGARDLESS of a matching record in the left side table. Most of the time you will see LEFT-JOINs more frequently than RIGHT-JOINs.
Now, a sample to help you mentally picture the left join. You work at a car dealership and have a master table of 10 cars that are sold. For a given month, you want to know what IS NOT selling. So, start with the master table of all cars and look at the sales table for what DID sell. If there is NO such sales activity, the right-side table's columns will be NULL.
select
M.CarID,
M.CarModel
from
MasterCarsList M
LEFT JOIN CarSales CS
on M.CarID = CS.CarID
AND month( CS.DateSold ) = 4
where
CS.CarID IS NULL
So, my LEFT join is based on a matching car ID -- AND -- the month of sales activity is 4 (April) as I may not care about sales for Jan-Mar -- but would also qualify year too, but this is a simple sample.
If there is no record in the Car Sales table it will have a NULL value for all columns. I just happen to care about the car ID column since that was the join basis. That is why I am including that in the WHERE clause. For all other types of cars that DO have a sale it will have a value.
This is a common approach you will see in querying when someone is looking for all records of one table regardless of the other... Some use a WHERE NOT EXISTS ( subselect ) instead; depending on the MySQL version and the available indexes that can perform slower, so compare the execution plans. In my experience the join is often faster.
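A quick way to convince yourself that the LEFT JOIN ... IS NULL pattern and the NOT EXISTS form return the same rows is to run both on a tiny data set. The sketch below uses Python's built-in sqlite3 as a stand-in; SQLite has no month() function, so strftime('%m', ...) takes its place, and the car models and sale dates are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MasterCarsList (CarID INTEGER PRIMARY KEY, CarModel TEXT);
    CREATE TABLE CarSales (CarID INTEGER, DateSold TEXT);
    INSERT INTO MasterCarsList VALUES (1, 'Sedan'), (2, 'Hatchback'), (3, 'Wagon');
    -- Hypothetical sales: car 1 sold in April, car 2 in March, car 3 never.
    INSERT INTO CarSales VALUES (1, '2024-04-10'), (2, '2024-03-05');
""")

# Anti-join: keep only master rows for which the LEFT JOIN found no April sale.
unsold_join = conn.execute("""
    SELECT M.CarID, M.CarModel
    FROM MasterCarsList M
    LEFT JOIN CarSales CS
           ON M.CarID = CS.CarID
          AND strftime('%m', CS.DateSold) = '04'
    WHERE CS.CarID IS NULL
    ORDER BY M.CarID
""").fetchall()

# The same question asked with NOT EXISTS.
unsold_not_exists = conn.execute("""
    SELECT M.CarID, M.CarModel
    FROM MasterCarsList M
    WHERE NOT EXISTS (
        SELECT 1
        FROM CarSales CS
        WHERE CS.CarID = M.CarID
          AND strftime('%m', CS.DateSold) = '04'
    )
    ORDER BY M.CarID
""").fetchall()

print(unsold_join)
print(unsold_not_exists)
```

Note that the month test sits in the ON clause; moving it to the WHERE clause would also drop car 3, which has no sales row at all.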
Other examples may be you want a list of all employees of a company, and if they had some certification / training to show it... You still want all employees, but LEFT-JOINING to some certification/training table would expose those extra field as needed.
select
Emp.FullName,
Cert.DateCertified
FROM
Employees Emp
Left Join Certifications Cert
on Emp.EmpID = Cert.EmpID
Hopefully these samples help you understand better the relationship for queries, and now to actually provide answer for your needs.
If what you want is a list of all "Current" items and want to look at their historical past, I would use current FIRST. This might be if your current table of things is 50, but historically your table had 420 items. You don't care about the other 370 items, just those that are current and the history of those.
select
ddC.WhateverColumns,
ddH.WhateverHistoricalColumns
from
ddCurrent ddC
JOIN ddHistorical ddH
on ddC.ID = ddH.ID
If there is always a current field then a simple INNER JOIN will do it
SELECT a.CID, a.ID, a.Location, a.Status, a.Time, b.Status
FROM ddHistorical a
INNER JOIN ddCurrent b
ON a.ID = b.ID
An INNER JOIN will omit any ddHistorical rows that don't have a corresponding ID in ddCurrent.
A LEFT JOIN will include all ddHistorical rows, even if they don't have a corresponding ID in ddCurrent, but the ddCurrent values will be null (because they're unknown).
Also note that a LEFT JOIN is just a specific type of outer join. Don't bother with the others yet - 90% or more of what you'll ever do will be INNER or LEFT.
To include only those ddHistorical rows where the ID is in ddCurrent:
SELECT h.CID, h.ID, h.Location, h.Status, c.Status, h.Time
FROM ddHistorical h
INNER JOIN ddCurrent c ON h.ID = c.ID
If you want to include ddHistorical rows even if the ID isn't in ddCurrent:
SELECT h.CID, h.ID, h.Location, h.Status, c.Status, h.Time
FROM ddHistorical h
LEFT JOIN ddCurrent c ON h.ID = c.ID
If all ddHistorical rows happen to match an ID in ddCurrent, note that both queries will return the same result.
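Here is a small sketch of the two variants side by side, run with Python's built-in sqlite3 rather than MySQL. To keep it short the tables carry only CID, ID and Status, and the status values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ddCurrent (ID INTEGER PRIMARY KEY, Status TEXT);
    CREATE TABLE ddHistorical (CID INTEGER PRIMARY KEY AUTOINCREMENT,
                               ID INTEGER, Status TEXT);
    -- Hypothetical data: ID 3 has history but no current row.
    INSERT INTO ddCurrent VALUES (1, 'up'), (2, 'down');
    INSERT INTO ddHistorical (ID, Status)
        VALUES (1, 'down'), (1, 'up'), (2, 'up'), (3, 'up');
""")

select = """
    SELECT h.CID, h.ID, h.Status, c.Status
    FROM ddHistorical h
    {join} ddCurrent c ON h.ID = c.ID
    ORDER BY h.CID
"""

inner_rows = conn.execute(select.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(select.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # the history row for ID 3 is missing
print(left_rows)   # the history row for ID 3 is kept, current status is None
```

The only difference between the two results is the historical row whose ID has no match in ddCurrent.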

MySQL LEFT JOIN?

I have a table cars(id, name) containing 20 rows. The other table carLog(username, car, count) contains rows which count the cars a player has bought (there is no row if they haven't bought the car).
I want my query to return all twenty cars, and the extra join info, if they've got a row in the carLog table but I can't get it to work.
SELECT * FROM cars LEFT JOIN carLog ON cars.id=carLog.car
This is returning hundreds of rows, I want it to return 20 rows (one for each car), and the extra info in the row if the username has purchased the car:
WHERE carLog.username='Juddling'
I have no idea if I'm meant to be using GROUP BY, WHERE or another type of join!
Move the username condition from the WHERE clause to the ON clause.
SELECT *
FROM cars
LEFT JOIN carLog
ON cars.id=carLog.car
AND carLog.username='Juddling'
The WHERE clause is applied after the JOIN has already been completed. This means it will discard the NULL-filled rows that the LEFT JOIN added, because carLog.username is NULL in those rows.
As you are limiting the table from the outer join, you have to put the condition in the on, not the where:
select * from cars
left join carLog on cars.id = carLog.car and carlog.username = 'Juddling'
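The effect of moving the condition is easy to reproduce. The sketch below uses Python's built-in sqlite3 in place of MySQL, with three cars instead of twenty and made-up log rows; the count column is quoted only because it shares its name with a function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE carLog (username TEXT, car INTEGER, "count" INTEGER);
    INSERT INTO cars VALUES (1, 'Coupe'), (2, 'Sedan'), (3, 'Truck');
    -- Hypothetical data: Juddling bought only car 1.
    INSERT INTO carLog VALUES ('Juddling', 1, 2), ('Other', 1, 5), ('Other', 2, 1);
""")

# Condition in WHERE: applied after the join, so unmatched cars are dropped.
where_rows = conn.execute("""
    SELECT cars.id, carLog."count"
    FROM cars
    LEFT JOIN carLog ON cars.id = carLog.car
    WHERE carLog.username = 'Juddling'
    ORDER BY cars.id
""").fetchall()

# Condition in ON: applied while matching, so every car is kept.
on_rows = conn.execute("""
    SELECT cars.id, carLog."count"
    FROM cars
    LEFT JOIN carLog
           ON cars.id = carLog.car
          AND carLog.username = 'Juddling'
    ORDER BY cars.id
""").fetchall()

print(where_rows)
print(on_rows)
```

With the condition in ON, the result has exactly one row per car, and the log columns are None for cars the user has not bought.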

When to use a left outer join?

I don't understand the concept of a left outer join, a right outer join, or indeed why we need to use a join at all! The question I am struggling with and the table I am working from is here: Link
Question 3(b)
Construct a command in SQL to solve the following query, explaining why it had to employ the
(outer) join method. [5 Marks]
“Find the name of each staff member and his/her dependent spouse, if any”
Question 3(c) -
Construct a command in SQL to solve the following query, using (i) the join method, and (ii) the
subquery method. [10 Marks]
“Find the identity name of each staff member who has worked more than 20 hours on the
Computerization Project”
Can anyone please explain this to me simply?
Joins are used to combine two related tables together.
In your example, you can combine the Employee table and the Department table, like so:
SELECT FNAME, LNAME, DNAME
FROM
EMPLOYEE INNER JOIN DEPARTMENT ON EMPLOYEE.DNO=DEPARTMENT.DNUMBER
This would result in a recordset like:
FNAME LNAME DNAME
----- ----- -----
John Smith Research
John Doe Administration
I used an INNER JOIN above. INNER JOINs combine two tables so that only records with matches in both tables are displayed, and they are joined in this case, on the department number (field DNO in Employee, DNUMBER in Department table).
LEFT JOINs allow you to combine two tables when you have records in the first table but might not have records in the second table. For example, let's say you want a list of all the employees, plus any dependents:
SELECT EMPLOYEE.FNAME as employee_first, EMPLOYEE.LNAME as employee_last, DEPENDENT.FNAME as dependent_first, DEPENDENT.LNAME as dependent_last
FROM
EMPLOYEE INNER JOIN DEPENDENT ON EMPLOYEE.SSN=DEPENDENT.ESSN
The problem here is that if an employee doesn't have a dependent, then their record won't show up at all -- because there's no matching record in the DEPENDENT table.
So, you use a left join which keeps all the data on the "left" (i.e. the first table) and pulls in any matching data on the "right" (the second table):
SELECT EMPLOYEE.FNAME as employee_first, EMPLOYEE.LNAME as employee_last, DEPENDENT.FNAME as dependent_first, DEPENDENT.LNAME as dependent_last
FROM
EMPLOYEE LEFT JOIN DEPENDENT ON EMPLOYEE.SSN=DEPENDENT.ESSN
Now we get all of the employee records. If there is no matching dependent(s) for a given employee, the dependent_first and dependent_last fields will be null.
example (not using your example tables :-)
I have a car rental company.
Table car
id: integer primary key autoincrement
licence_plate: varchar
purchase_date: date
Table customer
id: integer primary key autoincrement
name: varchar
Table rental
id: integer primary key autoincrement
car_id: integer
bike_id: integer
customer_id: integer
rental_date: date
Simple right? I have 10 records for cars because I have 10 cars.
I've been running this business for 10 years, so I've got 1000 customers.
And I rent the cars about 20x per year per cars = 10 years x 10 cars x 20 = 2000 rentals.
If I store everything in one big table I've got 10x1000x2000 = 20 million records.
If I store it in 3 tables I've got 10+1000+2000 = 3010 records.
That's 3 orders of magnitude, so that's why I use 3 tables.
But because I use 3 tables (to save space and time) I have to use joins in order to get the data out again
(at least if I want names and licence plates instead of numbers).
Using inner joins
All rentals for customer 345?
SELECT * FROM customer
INNER JOIN rental on (rental.customer_id = customer.id)
INNER JOIN car on (car.id = rental.car_id)
WHERE customer.id = 345;
That's an INNER JOIN, because we only want to know about cars linked to rentals linked to customers that actually happened.
Notice that we also have a bike_id, linking to the bike table, which is pretty similar to the car table but different.
How would we get all bike + car rentals for customer 345.
We can try and do this
SELECT * FROM customer
INNER JOIN rental on (rental.customer_id = customer.id)
INNER JOIN car on (car.id = rental.car_id)
INNER JOIN bike on (bike.id = rental.bike_id)
WHERE customer.id = 345;
But that will give an empty set!!
This is because a rental can either be a bike_rental OR a car_rental, but not both at the same time.
And the non-working inner join query will only give results for all rentals where we rent out both a bike and a car in the same transaction.
We are trying to get a boolean OR relationship using a boolean AND join.
Using outer joins
In order to solve this we need an outer join.
Let's solve it with left join
SELECT * FROM customer
INNER JOIN rental on (rental.customer_id = customer.id) -- link always
LEFT JOIN car on (car.id = rental.car_id) -- link half of the time
LEFT JOIN bike on (bike.id = rental.bike_id) -- link the other half of the time
WHERE customer.id = 345;
Look at it this way. An inner join is an AND and a left join is an OR, as in the following pseudocode:
if a=1 AND a=2 then {this is always false, no result}
if a=1 OR a=2 then {this might be true or not}
If you create the tables and run the query you can see the result.
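If you would rather not set up a database for that, the following sketch does it in memory with Python's built-in sqlite3. The bike table's frame_number column and all the rows are assumptions of mine, since the answer only says the bike table is similar to the car table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE car (id INTEGER PRIMARY KEY, licence_plate TEXT);
    CREATE TABLE bike (id INTEGER PRIMARY KEY, frame_number TEXT);  -- hypothetical column
    CREATE TABLE rental (id INTEGER PRIMARY KEY, car_id INTEGER,
                         bike_id INTEGER, customer_id INTEGER);
    INSERT INTO customer VALUES (345, 'Pat');
    INSERT INTO car VALUES (1, 'AB-123');
    INSERT INTO bike VALUES (1, 'F-001');
    -- One car rental and one bike rental; never both in the same row.
    INSERT INTO rental VALUES (1, 1, NULL, 345), (2, NULL, 1, 345);
""")

select = """
    SELECT rental.id, car.licence_plate, bike.frame_number
    FROM customer
    INNER JOIN rental ON rental.customer_id = customer.id
    {join} car ON car.id = rental.car_id
    {join} bike ON bike.id = rental.bike_id
    WHERE customer.id = 345
    ORDER BY rental.id
"""

inner_rows = conn.execute(select.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(select.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # empty: no rental has both a car and a bike
print(left_rows)   # both rentals, with None on the side that did not match
```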
on terminology
A left join is the same as a left outer join.
A join with no extra prefixes is an inner join.
There's also a full outer join. In 25 years of programming I've never used that.
Why Left join
Well, there are two tables involved. In the example we linked customer to rental with an inner join. In an inner join both tables must link, so there is no difference between the left:customer table and the right:rental table.
The next link was a left join between left:rental and right:car. On the left side all rows are kept, and on the right side a match is optional. This is why it's a left join.
You use outer joins when you need all of the results from one of the join tables, whether there is a matching row in the other table or not.
I think Question 3(b) is confusing because its entire premise is wrong: you don't have to use an outer join to "solve the query". For example, consider this (following the style of syntax in the exam paper is probably wise):
SELECT FNAME, LNAME, DEPENDENT_NAME
FROM EMPLOYEE, DEPENDENT
WHERE SSN = ESSN
AND RELATIONSHIP = 'SPOUSE'
UNION
SELECT FNAME, LNAME, NULL
FROM EMPLOYEE
WHERE SSN NOT IN (
SELECT ESSN
FROM DEPENDENT
WHERE RELATIONSHIP = 'SPOUSE'
)
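As a sanity check on the claim that no outer join is strictly needed, the sketch below compares a LEFT JOIN with one outer-join-free formulation: a UNION whose second branch picks employees without a spouse via NOT IN. It runs on Python's built-in sqlite3, the column names follow the query above, and the employee and dependent rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN INTEGER PRIMARY KEY);
    CREATE TABLE DEPENDENT (ESSN INTEGER NOT NULL, DEPENDENT_NAME TEXT,
                            RELATIONSHIP TEXT);
    -- Hypothetical data: only John has a spouse; Bob has a child only.
    INSERT INTO EMPLOYEE VALUES ('John', 'Smith', 1), ('Jane', 'Doe', 2),
                                ('Bob', 'Lee', 3);
    INSERT INTO DEPENDENT VALUES (1, 'Mary', 'SPOUSE'), (1, 'Tim', 'SON'),
                                 (3, 'Sam', 'SON');
""")

with_outer_join = conn.execute("""
    SELECT FNAME, LNAME, DEPENDENT_NAME
    FROM EMPLOYEE
    LEFT JOIN DEPENDENT ON SSN = ESSN AND RELATIONSHIP = 'SPOUSE'
""").fetchall()

without_outer_join = conn.execute("""
    SELECT FNAME, LNAME, DEPENDENT_NAME
    FROM EMPLOYEE, DEPENDENT
    WHERE SSN = ESSN
      AND RELATIONSHIP = 'SPOUSE'
    UNION
    SELECT FNAME, LNAME, NULL
    FROM EMPLOYEE
    WHERE SSN NOT IN (
        SELECT ESSN
        FROM DEPENDENT
        WHERE RELATIONSHIP = 'SPOUSE'
    )
""").fetchall()

# Row order is not guaranteed, so compare the results as sets.
print(set(with_outer_join) == set(without_outer_join))
```

The NOT IN form relies on ESSN never being NULL, which is why the sketch declares it NOT NULL.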
In general:
JOIN joins two tables together.
Use INNER JOIN when you wanna "look up", like look up detailed information of any specific column.
Use OUTER JOIN when you wanna "demonstrate", like list all the info of the 2 tables.