LEFT JOIN not working to replace a NOT EXISTS in MySQL

I have a table called Documents and a table called Notes that stores notes for the Documents. I need to get all Documents where there are no notes that have a status of 2 or 3.
SELECT * FROM Documents
WHERE NOT EXISTS (
SELECT docID FROM Notes WHERE docId = id AND status IN (2, 3)
)
This is extremely slow but it works. I tried doing an Inner join but if just one note has a status other than 2 or 3, it still shows the Document. I need it to only show Documents where there is no occurrence of 2 or 3 in any of the notes.
Can anyone help!? Thanks!

One way of doing it:
SELECT Documents.*, COUNT(Notes.docID)
FROM Documents
LEFT JOIN Notes ON Notes.docID = Documents.id AND Notes.status IN (2,3)
GROUP BY Documents.id
HAVING COUNT(Notes.docID) = 0
If there's a status=2 or status=3, then the count will be non-zero, and the having will eliminate the document entirely.
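To see the counting approach in action, here is a minimal sketch that runs it from Python's sqlite3 module against an in-memory database. SQLite is only standing in for MySQL here, and the table layout (the title and noteId columns) and the sample rows are assumptions made up for the illustration.

```python
import sqlite3

# Hypothetical sample data: document 1 has a status-2 note, document 2 has
# only a status-1 note, and document 3 has no notes at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Documents (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Notes (noteId INTEGER PRIMARY KEY, docId INTEGER, status INTEGER);
    INSERT INTO Documents VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO Notes VALUES (10, 1, 2), (11, 1, 1), (12, 2, 1);
""")

# COUNT(n.docId) only counts joined notes with status 2 or 3, because the
# status test is part of the join condition.
count_rows = conn.execute("""
    SELECT d.id, COUNT(n.docId) AS flagged
    FROM Documents d
    LEFT JOIN Notes n ON n.docId = d.id AND n.status IN (2, 3)
    GROUP BY d.id
    HAVING COUNT(n.docId) = 0
    ORDER BY d.id
""").fetchall()

print(count_rows)  # [(2, 0), (3, 0)]
```

Document 1 is dropped because its count is 1; documents 2 and 3 survive with a count of 0.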

An anti-join is a familiar SQL pattern.
With that pattern, we use an outer join operation to return all rows, along with matching rows, including rows that don't have a match. The "trick" is to use a predicate in the WHERE clause to filter out all of the rows that found a match, leaving only rows that didn't have a match.
As an example, to retrieve rows from Documents that don't have any matching row in Notes that meet specified criteria:
SELECT d.*
FROM Documents d
LEFT JOIN Notes n
  ON n.docId = d.id
 AND n.status IN (2,3)
WHERE n.docId IS NULL
(I'm guessing that docId and status are references to columns in Notes, and that id is a reference to a column in Documents. The column references in your query aren't qualified, so that leaves us guessing which columns are in which table. Best practice is to qualify all column references in a query that references more than one table. One big benefit is that it makes it possible to decipher the statement without having to look at the table definitions, to figure out which columns are coming from which table.)
That query will return rows from Documents where there isn't any "matching" row in Notes that has a status of 2 or 3.
Because the LEFT JOIN is an outer join, it returns all rows from Documents, along with matching rows from Notes. Any rows from Documents that don't have a matching row will also be returned, with NULL values for the columns from Notes. The equality predicate in the join (n.docId = d.id) guarantees us that any "matching" row will have a non-NULL value for docId.
The "trick" is the predicate in the WHERE clause: n.docId IS NULL
Any rows that had a match will be filtered out, so we're left with rows from Documents that didn't have a match.
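The two stages described above (outer join first, then the IS NULL filter) can be sketched with Python's sqlite3 module. This is only an illustration: SQLite stands in for MySQL, and the title and noteId columns and all sample rows are hypothetical.

```python
import sqlite3

# Hypothetical data: document 1 has a status-2 note (plus a status-1 note),
# document 2 has only a status-1 note, document 3 has no notes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Documents (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Notes (noteId INTEGER PRIMARY KEY, docId INTEGER, status INTEGER);
    INSERT INTO Documents VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO Notes VALUES (10, 1, 2), (11, 1, 1), (12, 2, 1);
""")

# Stage 1: the outer join alone. Unmatched documents get NULL (None) padding.
joined = conn.execute("""
    SELECT d.id, n.docId, n.status
    FROM Documents d
    LEFT JOIN Notes n ON n.docId = d.id AND n.status IN (2, 3)
    ORDER BY d.id
""").fetchall()

# Stage 2: the anti-join keeps only the NULL-padded rows.
anti_join = conn.execute("""
    SELECT d.id
    FROM Documents d
    LEFT JOIN Notes n ON n.docId = d.id AND n.status IN (2, 3)
    WHERE n.docId IS NULL
    ORDER BY d.id
""").fetchall()

# The NOT EXISTS form from the question, for comparison.
not_exists = conn.execute("""
    SELECT d.id
    FROM Documents d
    WHERE NOT EXISTS (
        SELECT 1 FROM Notes n WHERE n.docId = d.id AND n.status IN (2, 3)
    )
    ORDER BY d.id
""").fetchall()

print(joined)      # [(1, 1, 2), (2, None, None), (3, None, None)]
print(anti_join)   # [(2,), (3,)]
print(not_exists)  # [(2,), (3,)]
```

On this data both forms return the same documents; which one is faster on a real MySQL server depends on the version and the available indexes.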
The original query has status NOT IN (2,3). That would essentially ignore rows in Notes that have one of those statuses, and only a row with some other non-NULL value of status would "match". Based on the specification... "no notes with a status of 2 or 3", that seems to imply you'd want status IN (2,3) in the query.

Related

MySQL Joins. match two conditions in two columns

I am trying to fetch data from the 2 tables using joins. But I don't know how to do it. I have read about joins but it wasn't making any sense since I am new.
I have two tables.
Playgrounds        Inspection
id   u_id          id   playground_id
1    2             1    1
2    2             2    2
In playgrounds table id is the playground id and is unique and u_id is the user of the playground.
In inspection table id is unique and playground_id is used as a foreign key from playgrounds table's id.
My problem is that I have to pass id and u_id to the query on the playgrounds table, and it should select the id from the playgrounds table. Then it should select everything from the inspection table based on that id.
Hope I have explained it properly.
Any help is much appreciated.
Thanks in advance
JOIN operations, whether INNER JOIN, LEFT JOIN, or RIGHT JOIN, have this overall syntax.
table_expression JOIN table_expression ON boolean_expression
When the Boolean expression is true, the JOIN concatenates the matching rows of the two tables into a single row in the result.
In your case, you want the ON clause's Boolean expression to say
Playgrounds.id = Inspection.playground_id
because that's how you know a row from Inspection relates to a row in Playgrounds. Accordingly, a statement like this should do what you want:
SELECT Inspection.id AS Inspection_ID,
Playgrounds.id AS Playground_ID,
Playgrounds.u_id AS User_ID
FROM Playgrounds
JOIN Inspection ON Playgrounds.id = Inspection.playground_id
WHERE Playgrounds.u_id = something
AND Playgrounds.id = something_else
See how it goes? Your SELECT makes a new table, sometimes called a result set, by JOINing the rows of two existing tables on the criteria you choose in the ON clause.
You can put anything you want into the ON clause. You could put
Playgrounds.id = Inspection.playground_id OR Inspection.id < 20
or
Playgrounds.id = Inspection.playground_id OR Inspection.scope = 'citywide'
for example (if you had a scope column in the Inspection table).
But, don't go there until you master the basic JOIN operation.
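As a rough sketch of passing id and u_id into that kind of JOIN, here it is run from Python's sqlite3 module with the sample rows from the question. SQLite stands in for MySQL, and the helper name inspections_for is made up for this example.

```python
import sqlite3

# The two tables from the question, loaded with its sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Playgrounds (id INTEGER PRIMARY KEY, u_id INTEGER);
    CREATE TABLE Inspection (id INTEGER PRIMARY KEY, playground_id INTEGER);
    INSERT INTO Playgrounds VALUES (1, 2), (2, 2);
    INSERT INTO Inspection VALUES (1, 1), (2, 2);
""")


def inspections_for(conn, playground_id, user_id):
    """Hypothetical helper: inspections of one playground owned by one user."""
    return conn.execute(
        """
        SELECT Inspection.id  AS Inspection_ID,
               Playgrounds.id AS Playground_ID,
               Playgrounds.u_id AS User_ID
        FROM Playgrounds
        JOIN Inspection ON Playgrounds.id = Inspection.playground_id
        WHERE Playgrounds.u_id = ?
          AND Playgrounds.id = ?
        ORDER BY Inspection.id
        """,
        (user_id, playground_id),  # bound parameters, in placeholder order
    ).fetchall()


print(inspections_for(conn, 1, 2))   # [(1, 1, 2)]
print(inspections_for(conn, 1, 99))  # [] because user 99 owns no playground
```

The ? placeholders keep the values out of the SQL text; MySQL client libraries use their own placeholder style, so treat the syntax here as SQLite-specific.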

include null and zero in count() from related table

I would like to list in table (staging) the number of related records from table (studies).
So far this statement works well but returns only the rows where there are >0 related records:
SELECT staging.*,
COUNT(studies.PMID) AS refcount
FROM studies
LEFT JOIN staging
ON studies.rs_number = staging.rs
GROUP BY staging.idstaging;
How can I adjust this statement to list ALL rows in table (staging) including where there are zero or null related records from table (studies)?
Thank you
You have the tables in the wrong order in the LEFT JOIN:
SELECT staging.*, COUNT(studies.PMID) AS refcount
FROM staging LEFT JOIN
studies
ON studies.rs_number = staging.rs
GROUP BY staging.idstaging;
LEFT JOIN keeps everything in the first ("left") table and all matching rows in the second. If you want to keep everything in the staging table, then put it first.
And, in case anyone wants to complain about the use of staging.* with GROUP BY: this particular usage is (presumably) ANSI compliant because staging.idstaging is (presumably) a unique id in that table.
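The effect of the table order can be checked with a small sketch using Python's sqlite3 module. SQLite stands in for MySQL, the column types are guesses, and the sample rows are invented: one staging row has two studies and the other has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (idstaging INTEGER PRIMARY KEY, rs TEXT);
    CREATE TABLE studies (PMID INTEGER PRIMARY KEY, rs_number TEXT);
    INSERT INTO staging VALUES (1, 'rs100'), (2, 'rs200');
    INSERT INTO studies VALUES (9001, 'rs100'), (9002, 'rs100');
""")

# Original order: studies is the preserved ("left") table, so a staging row
# without studies never appears.
studies_first = conn.execute("""
    SELECT staging.idstaging, COUNT(studies.PMID) AS refcount
    FROM studies
    LEFT JOIN staging ON studies.rs_number = staging.rs
    GROUP BY staging.idstaging
    ORDER BY staging.idstaging
""").fetchall()

# Corrected order: staging is preserved, and COUNT(studies.PMID) skips the
# NULL padding, giving 0 for staging rows without studies.
staging_first = conn.execute("""
    SELECT staging.idstaging, COUNT(studies.PMID) AS refcount
    FROM staging
    LEFT JOIN studies ON studies.rs_number = staging.rs
    GROUP BY staging.idstaging
    ORDER BY staging.idstaging
""").fetchall()

print(studies_first)  # [(1, 2)]
print(staging_first)  # [(1, 2), (2, 0)]
```

Counting studies.PMID rather than * matters here: COUNT(*) would report 1 for the NULL-padded row.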

Using LEFT JOIN to return rows that don't have a match

I have a situation where we have a table that contains all the items, if the item was sold it has an entry in another table.
I am trying to get all the items returned that have not been sold, but it doesn't seem to work.
I have this SQL:
SELECT a.auction_id FROM auctions AS a
LEFT JOIN winners AS w ON a.auction_id=w.auction_id AND w.winner_id IS NULL
WHERE a.owner_id=1234567 AND a.is_draft=0 AND a.creation_in_progress=0;
I thought this would only return items from the auctions table that don't have a matched entry in the winners table since I am doing AND w.winner_id IS NULL.
However it seems to still return the same amount of rows as it does when I leave off AND w.winner_id IS NULL.
SELECT a.auction_id
FROM auctions AS a
LEFT JOIN winners AS w
ON a.auction_id = w.auction_id
WHERE a.owner_id = 1234567
AND a.is_draft = 0
AND a.creation_in_progress = 0
AND w.winner_id IS NULL
This belongs in the WHERE clause:
AND w.winner_id IS NULL
Criteria on the outer-joined table belong in the ON clause when you want to ALLOW nulls. In this case, where you're filtering for nulls, you put that criterion into the WHERE clause. Everything in the ON clause is designed to allow nulls.
Here are some examples using data from a question I answered not long ago:
Proper use of where x is null:
http://sqlfiddle.com/#!2/8936b5/2/0
Same thing but improperly placing that criteria into the ON clause:
http://sqlfiddle.com/#!2/8936b5/3/0
(notice the FUNCTIONAL difference, the result is not the same, because the queries are not functionally equivalent)
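The same functional difference can be reproduced locally with a small sketch using Python's sqlite3 module. SQLite stands in for MySQL, the is_draft and creation_in_progress filters are left out for brevity, and the sample rows are invented: auction 1 was sold, auction 2 was not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auctions (auction_id INTEGER PRIMARY KEY, owner_id INTEGER);
    CREATE TABLE winners (auction_id INTEGER, winner_id INTEGER);
    INSERT INTO auctions VALUES (1, 1234567), (2, 1234567);
    INSERT INTO winners VALUES (1, 42);
""")

# IS NULL inside ON: the sold auction simply fails to match and comes back
# NULL-padded, so every auction is still returned.
null_test_in_on = conn.execute("""
    SELECT a.auction_id
    FROM auctions AS a
    LEFT JOIN winners AS w
           ON a.auction_id = w.auction_id AND w.winner_id IS NULL
    WHERE a.owner_id = 1234567
    ORDER BY a.auction_id
""").fetchall()

# IS NULL inside WHERE: applied after the join, it removes matched rows.
null_test_in_where = conn.execute("""
    SELECT a.auction_id
    FROM auctions AS a
    LEFT JOIN winners AS w ON a.auction_id = w.auction_id
    WHERE a.owner_id = 1234567
      AND w.winner_id IS NULL
    ORDER BY a.auction_id
""").fetchall()

print(null_test_in_on)     # [(1,), (2,)]
print(null_test_in_where)  # [(2,)]
```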

Dependant SubQuery v Left Join

This query displays the correct result, but when doing an EXPLAIN, it lists it as a "DEPENDENT SUBQUERY", which I'm led to believe is bad?
SELECT Competition.CompetitionID, Competition.CompetitionName, Competition.CompetitionStartDate
FROM Competition
WHERE CompetitionID NOT
IN (
SELECT CompetitionID
FROM PicksPoints
WHERE UserID =1
)
I tried changing the query to this:
SELECT Competition.CompetitionID, Competition.CompetitionName, Competition.CompetitionStartDate
FROM Competition
LEFT JOIN PicksPoints ON Competition.CompetitionID = PicksPoints.CompetitionID
WHERE UserID =1
and PicksPoints.PicksPointsID is null
but it displays 0 rows. What is wrong with the above compared to the first query that actually does work?
The second query cannot produce rows. It says:
WHERE UserID =1
and PicksPoints.PicksPointsID is null
But to clarify, I rewrite as follows:
WHERE PicksPoints.UserID =1
and PicksPoints.PicksPointsID is null
So, on one hand, you are asking for rows of PicksPoints where UserID = 1, but on the other hand you expect that row not to exist in the first place. Can you see the contradiction?
Outer joins are tricky that way! Usually you filter using columns from the "outer" table, for example Competition. But you do not wish to do so; you wish to filter on the left-joined table. Try rewriting it as follows:
SELECT Competition.CompetitionID, Competition.CompetitionName, Competition.CompetitionStartDate
FROM Competition
LEFT JOIN PicksPoints ON (Competition.CompetitionID = PicksPoints.CompetitionID AND UserID = 1)
WHERE
PicksPoints.PicksPointsID is null
For more on this, read this nice post.
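Here is a small sketch of the three variants side by side, run through Python's sqlite3 module. SQLite stands in for MySQL, the column types are guesses, and the sample rows are hypothetical: user 1 picked competition 1, user 2 picked competition 2, and nobody picked competition 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Competition (
        CompetitionID INTEGER PRIMARY KEY,
        CompetitionName TEXT,
        CompetitionStartDate TEXT
    );
    CREATE TABLE PicksPoints (
        PicksPointsID INTEGER PRIMARY KEY,
        UserID INTEGER,
        CompetitionID INTEGER
    );
    INSERT INTO Competition VALUES
        (1, 'A', '2024-01-01'), (2, 'B', '2024-02-01'), (3, 'C', '2024-03-01');
    INSERT INTO PicksPoints VALUES (100, 1, 1), (101, 2, 2);
""")

# UserID test in WHERE: contradicts the IS NULL test, so nothing survives.
user_in_where = conn.execute("""
    SELECT c.CompetitionID
    FROM Competition c
    LEFT JOIN PicksPoints p ON c.CompetitionID = p.CompetitionID
    WHERE p.UserID = 1 AND p.PicksPointsID IS NULL
    ORDER BY c.CompetitionID
""").fetchall()

# UserID test in ON: only user 1's picks count as matches.
user_in_on = conn.execute("""
    SELECT c.CompetitionID
    FROM Competition c
    LEFT JOIN PicksPoints p
           ON c.CompetitionID = p.CompetitionID AND p.UserID = 1
    WHERE p.PicksPointsID IS NULL
    ORDER BY c.CompetitionID
""").fetchall()

# The original NOT IN query, for comparison.
not_in = conn.execute("""
    SELECT CompetitionID
    FROM Competition
    WHERE CompetitionID NOT IN (
        SELECT CompetitionID FROM PicksPoints WHERE UserID = 1
    )
    ORDER BY CompetitionID
""").fetchall()

print(user_in_where)  # []
print(user_in_on)     # [(2,), (3,)]
print(not_in)         # [(2,), (3,)]
```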
But, as an additional note, performance-wise you're in some trouble, using either subquery or the left join.
With the subquery you're in trouble because up to 5.6 (where some good work has been done), MySQL is very bad at optimizing inner queries, and your subquery is expected to execute multiple times.
With the LEFT JOIN you are in trouble since a LEFT JOIN dictates the order of the join from left to right. Yet your filtering is on the right table, which means you will not be able to use an index for filtering on the UserID = 1 condition (or you would, and lose the index for the join).
These are two different queries. The first query looks for competitions that are not associated with user id 1 (via the PicksPoints table), while the second joins against the rows that are associated with user id 1 and, in addition, have a null PicksPointsID.
The second query is coming out empty because you are joining against a table called PicksPoints and you are looking for rows in the join result that have PicksPointsID as null. This can only happen if
1. The second table had a row with a null PicksPointsID and a competition id that matched a competition id in the first table, or
2. All the columns in the second table's contribution to the join are null because there is a competition id in the first table that did not appear in the second.
Since PicksPointsID really sounds like a primary key, it's case 2 that is showing up. So all the columns from PicksPoints are null, your where clause (UserID=1 and PicksPoints.PicksPointsID is null) will always be false, and your result will be empty.
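A plain left join might work for you, but note that the filter below keeps competitions that have at least one pick by another user, and drops competitions that have no picks at all: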
select c.CompetitionID, c.CompetitionName, c.CompetitionStartDate
from Competition c
left join PicksPoints p
on (c.CompetitionID = p.CompetitionID)
where p.UserID <> 1
Replacing the final where with an and (making a complex join clause) might also work. I'll leave it to you to analyze the plans for each query. :)
I'm not personally convinced of the need for the is null test. The article linked to by Shlomi Noach is excellent and you may find some tips in there to help you with this.

Do I need NULL values in SQL table for IDs that don't exist

I have a MySQL application that has many-to-many relationships. My main table is my materials table. In all of my other tables I have a material_id to match the table ID. So in my supplier table, I have supplier_id and material_id.
For this application, some materials do not have suppliers. For my SQL SELECT statement to return correctly that there are no suppliers for that material, should I have an entry of NULL for that supplier_id to match the material_id? Or will the SQL JOIN statement not return a result and I can script that accordingly in my PHP?
I wouldn't even create a NULL entry for the mapping.
You need an outer join, something like
select * from materials m left outer join suppliers s on (m.material_id=s.material_id);
It will return null values for the suppliers automatically.
You do not need to populate NULLs. What you want is an outer join. From Wikipedia:
"The result of a left outer join (or simply left join) for table A and B always contains all records of the "left" table (A), even if the join-condition does not find any matching record in the "right" table (B). This means that if the ON clause matches 0 (zero) records in B, the join will still return a row in the result—but with NULL in each column from B. This means that a left outer join returns all the values from the left table, plus matched values from the right table (or NULL in case of no matching join predicate)."
"...should I have an entry of NULL for that supplier_id to match the material_id? Or will the SQL JOIN statement not return a result and I can script that accordingly in my PHP?"
Both. Neither.
The meaning of "null" is "I have no information on this". If there is no supplier, the logical thing to do is to leave the column "null".
However, given that you're talking about a "join table" representing a many-to-many relationship, the best thing is not to insert a record in the join table at all. After all, logically, the absence of the record shows there is no relationship to suppliers.
You may need an outer join to make it all work when querying, though...
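To illustrate that no placeholder rows are needed, here is a minimal sketch using Python's sqlite3 module. SQLite stands in for MySQL, the name column and the sample rows are made up, and material 2 deliberately has no row in suppliers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE materials (material_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE suppliers (supplier_id INTEGER, material_id INTEGER);
    INSERT INTO materials VALUES (1, 'steel'), (2, 'glass');
    INSERT INTO suppliers VALUES (7, 1);
""")

# The outer join supplies the NULL for material 2 by itself; no NULL row
# was ever inserted into suppliers.
rows = conn.execute("""
    SELECT m.material_id, m.name, s.supplier_id
    FROM materials m
    LEFT OUTER JOIN suppliers s ON m.material_id = s.material_id
    ORDER BY m.material_id
""").fetchall()

print(rows)  # [(1, 'steel', 7), (2, 'glass', None)]

# Application code can then branch on the missing supplier.
without_supplier = [name for _, name, supplier_id in rows if supplier_id is None]
print(without_supplier)  # ['glass']
```

The same check works in PHP by testing the supplier_id column of each fetched row for null.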