include null and zero in count() from related table - mysql

I would like to list, for each row in table (staging), the number of related records in table (studies).
So far this statement works well, but it returns only the rows that have more than 0 related records:
SELECT staging.*,
COUNT(studies.PMID) AS refcount
FROM studies
LEFT JOIN staging
ON studies.rs_number = staging.rs
GROUP BY staging.idstaging;
How can I adjust this statement to list ALL rows in table (staging) including where there are zero or null related records from table (studies)?
Thank you

You have the tables in the wrong order in the LEFT JOIN:
SELECT staging.*, COUNT(studies.PMID) AS refcount
FROM staging LEFT JOIN
studies
ON studies.rs_number = staging.rs
GROUP BY staging.idstaging;
LEFT JOIN keeps everything in the first ("left") table and all matching rows in the second. If you want to keep everything in the staging table, then put it first.
And, in case anyone wants to complain about the use of staging.* with GROUP BY: this particular usage is (presumably) ANSI compliant, because staging.idstaging is (presumably) a unique id in that table, so every other staging column is functionally dependent on it.
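A minimal sketch of the fix, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite treats LEFT JOIN and COUNT(column) the same way for this query). The staging and studies rows are made-up sample data. Swapping which table comes first in FROM shows the staging row with no studies disappearing, then reappearing with a count of 0.

```python
import sqlite3

def ref_counts(staging_first=True):
    """Count related studies per staging row (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE staging (idstaging INTEGER PRIMARY KEY, rs TEXT);
        CREATE TABLE studies (PMID INTEGER, rs_number TEXT);
        INSERT INTO staging VALUES (1, 'rs1'), (2, 'rs2'), (3, 'rs3');
        INSERT INTO studies VALUES (100, 'rs1'), (101, 'rs1'), (102, 'rs2');
    """)
    # The preserved ("left") table is whichever one is named first in FROM.
    # Table names come from this fixed pair, never from user input.
    left, right = ("staging", "studies") if staging_first else ("studies", "staging")
    rows = conn.execute(f"""
        SELECT staging.idstaging, COUNT(studies.PMID) AS refcount
        FROM {left} LEFT JOIN {right}
          ON studies.rs_number = staging.rs
        GROUP BY staging.idstaging
        ORDER BY staging.idstaging
    """).fetchall()
    conn.close()
    return rows

print(ref_counts(staging_first=False))  # [(1, 2), (2, 1)]  staging row 3 is lost
print(ref_counts(staging_first=True))   # [(1, 2), (2, 1), (3, 0)]
```

COUNT(studies.PMID) counts only non-NULL values, which is why the unmatched row reports 0 rather than 1.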

Related

LEFT JOIN not working to replace a NOT EXISTS MYSQL

I have a table called Documents and a table called Notes that stores notes for the Documents. I need to get all Documents where there are no notes that have a status of 2 or 3.
SELECT * FROM Documents
WHERE NOT EXISTS (
SELECT docID FROM Notes WHERE docId = id AND status IN (2, 3)
)
This is extremely slow but it works. I tried doing an Inner join but if just one note has a status other than 2 or 3, it still shows the Document. I need it to only show Documents where there is no occurrence of 2 or 3 in any of the notes.
Can anyone help!? Thanks!
One way of doing it:
SELECT *, COUNT(docID)
FROM Documents
LEFT JOIN Notes ON docID = id AND (status IN (2,3))
GROUP BY id
HAVING COUNT(docID) = 0
If a document has a note with status=2 or status=3, then the count will be non-zero, and the HAVING clause will eliminate the document entirely.
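A small check of this approach, using Python's sqlite3 module in place of MySQL and hypothetical rows in Documents and Notes. It runs the question's NOT EXISTS query and the GROUP BY/HAVING version side by side; both should return the same documents.

```python
import sqlite3

SETUP = """
    CREATE TABLE Documents (id INTEGER PRIMARY KEY);
    CREATE TABLE Notes (docId INTEGER, status INTEGER);
    INSERT INTO Documents VALUES (1), (2), (3), (4);
    -- doc 1 has a status-2 note, doc 3 a status-3 note, doc 4 has no notes
    INSERT INTO Notes VALUES (1, 1), (1, 2), (2, 1), (3, 3);
"""

NOT_EXISTS = """
    SELECT id FROM Documents
    WHERE NOT EXISTS (SELECT 1 FROM Notes
                      WHERE Notes.docId = Documents.id AND Notes.status IN (2, 3))
    ORDER BY id
"""

# Only notes with status 2 or 3 take part in the join, so COUNT(n.docId)
# is 0 exactly for the documents that have none of them.
GROUP_HAVING = """
    SELECT d.id
    FROM Documents d
    LEFT JOIN Notes n ON n.docId = d.id AND n.status IN (2, 3)
    GROUP BY d.id
    HAVING COUNT(n.docId) = 0
    ORDER BY d.id
"""

def doc_ids(sql):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    ids = [row[0] for row in conn.execute(sql)]
    conn.close()
    return ids

print(doc_ids(NOT_EXISTS), doc_ids(GROUP_HAVING))  # [2, 4] [2, 4]
```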
An anti-join is a familiar SQL pattern.
With that pattern, we use an outer join operation to return all rows, along with matching rows, including rows that don't have a match. The "trick" is to use a predicate in the WHERE clause to filter out all of the rows that found a match, leaving only rows that didn't have a match.
As an example, to retrieve rows from Documents that don't have any matching row in Notes that meet specified criteria:
SELECT d.*
FROM Documents d
LEFT
JOIN Notes n
ON n.docId = d.id
AND n.status IN (2,3)
WHERE n.docId IS NULL
(I'm guessing that docId and status are references to columns in Notes, and that id is a reference to a column in Documents. The column references in your query aren't qualified, so that leaves us guessing which columns are in which table. Best practice is to qualify all column references in a query that references more than one table. One big benefit is that it makes it possible to decipher the statement without having to look at the table definitions, to figure out which columns are coming from which table.)
That query will return rows from Documents where there isn't any "matching" row in Notes that has a status of 2 or 3.
Because the LEFT JOIN is an outer join, it returns all rows from Documents, along with matching rows from Notes. Any rows from Documents that don't have a matching row will also be returned, with NULL values for the columns from Notes. The equality predicate in the join (n.docId = d.id) guarantees us that any "matching" row will have a non-NULL value for docId.
The "trick" is the predicate in the WHERE clause: n.docId IS NULL
Any rows that had a match will be filtered out, so we're left with rows from Documents that didn't have a match.
If the join condition uses status NOT IN (2,3) instead, that would essentially ignore rows in Notes that have one of those statuses, and only a row with some other non-NULL value of status would "match". Based on the specification ("no notes with a status of 2 or 3"), you want status IN (2,3) in the join condition.
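A sketch of the anti-join, using Python's sqlite3 module as a stand-in for MySQL and made-up rows in Documents and Notes. The helper anti_join (a hypothetical name) takes the status predicate used in the join condition, so the IN (2,3) and NOT IN (2,3) variants can be compared directly.

```python
import sqlite3

def anti_join(status_predicate):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Documents (id INTEGER PRIMARY KEY);
        CREATE TABLE Notes (docId INTEGER, status INTEGER);
        INSERT INTO Documents VALUES (1), (2), (3), (4);
        INSERT INTO Notes VALUES (1, 1), (1, 2), (2, 1), (3, 3);
    """)
    # status_predicate is a fixed string from the caller, not user input.
    sql = f"""
        SELECT d.id
        FROM Documents d
        LEFT JOIN Notes n
          ON n.docId = d.id
         AND n.status {status_predicate}
        WHERE n.docId IS NULL  -- keep only documents that found no match
        ORDER BY d.id
    """
    ids = [row[0] for row in conn.execute(sql)]
    conn.close()
    return ids

print(anti_join("IN (2, 3)"))      # [2, 4]  no note with status 2 or 3
print(anti_join("NOT IN (2, 3)"))  # [3, 4]  no note with any other status
```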

MySQL SELECT from two tables with COUNT

I have two tables:
Table 1 "customer" with fields "Cust_id", "first_name", "last_name" (10 customers)
Table 2 "cust_order" with fields "order_id", "cust_id", (26 orders)
I need to display Cust_id, first_name, last_name and the count of order_id grouped by cust_id, i.e. the total number of orders placed by each customer.
I am running the query below; however, it counts all 26 orders and applies that total of 26 to each customer.
SELECT COUNT(order_id), cus.cust_id, cus.first_name, cus.last_name
FROM cust_order, customer cus
GROUP BY cust_id;
Could you please advise what is wrong with the query?
Your issue here is that you haven't told the database how these two tables are 'connected', or which columns they should be connected by.
A join condition is what effectively allows you to 'join' two tables together and query across them.
So you might want to use something like:
SELECT COUNT(B.order_id), A.cust_id, A.first_name, A.last_name
FROM customer A
LEFT JOIN cust_order B -- this uses a LEFT JOIN, but an INNER JOIN may also be appropriate
ON (A.cust_id = B.cust_id) -- what links the two tables together
GROUP BY A.cust_id; -- the GROUP BY clause
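To see why the original query reports the same count for every customer, here is a sketch using Python's sqlite3 module (standing in for MySQL) with three hypothetical customers and four hypothetical orders. Without a join condition, the comma join pairs every order with every customer, so each group counts all four orders; with the ON clause each order is counted only for its own customer.

```python
import sqlite3

SETUP = """
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE cust_order (order_id INTEGER PRIMARY KEY, cust_id INTEGER);
    INSERT INTO customer VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox');
    INSERT INTO cust_order VALUES (10, 1), (11, 1), (12, 2), (13, 1);
"""

# Question's query: no join condition, so this is a cross join (3 x 4 = 12 rows).
CROSS_JOIN = """
    SELECT cus.cust_id, COUNT(order_id)
    FROM cust_order, customer cus
    GROUP BY cus.cust_id
    ORDER BY cus.cust_id
"""

# Answer's query: the ON clause links each order to its own customer.
LEFT_JOIN = """
    SELECT A.cust_id, COUNT(B.order_id)
    FROM customer A
    LEFT JOIN cust_order B ON A.cust_id = B.cust_id
    GROUP BY A.cust_id
    ORDER BY A.cust_id
"""

def run(query):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

print(run(CROSS_JOIN))  # [(1, 4), (2, 4), (3, 4)]
print(run(LEFT_JOIN))   # [(1, 3), (2, 1), (3, 0)]
```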
As per your comment requesting some further info:
Left Join (right joins are almost identical, only the other way around):
The SQL LEFT JOIN returns all rows from the left table, even if there are no matches in the right table. This means that if the ON clause matches 0 (zero) records in right table, the join will still return a row in the result, but with NULL in each column from right table. ~Tutorials Point.
This means that a left join returns all the values from the left table, plus matched values from the right table or NULL in case of no matching join predicate.
Use a LEFT JOIN when you wish to retrieve all the data from the left-hand table, and only the matching data from the right-hand table.
Execution Time
While the accepted answer may work well on small datasets, it may become 'heavy' on larger databases, because it was not actually designed for this type of operation.
This is the purpose for which joins were introduced.
Much work in database-systems has aimed at efficient implementation of joins, because relational systems commonly call for joins, yet face difficulties in optimising their efficient execution. The problem arises because inner joins operate both commutatively and associatively. ~Wikipedia
In practice, this means that the user merely supplies the list of tables for joining and the join conditions to use, and the database system has the task of determining the most efficient way to perform the operation. A query optimizer determines how to execute a query containing joins. So, by allowing the dbms to choose the way your data is queried, you can save a lot of time.
Other Joins/Summary
An INNER JOIN will return data from both tables where the keys in each table match.
A LEFT JOIN or RIGHT JOIN will return all the rows from one table and the matching data from the other table.
Use a join when you want to query multiple tables.
Joins are much faster than other ways of querying two or more tables (the difference shows up much more on larger datasets).
You could try this one:
SELECT COUNT(cus_order.order_id), cus.cust_id, cus.first_name, cus.last_name
FROM cust_order cus_order, customer cus
WHERE cus_order.cust_id = cus.cust_id
GROUP BY cus.cust_id;
Maybe a LEFT JOIN will help you:
SELECT COUNT(order_id), cus.cust_id, cus.first_name, cus.last_name
FROM customer cus
LEFT JOIN cust_order co
ON (co.cust_id= cus.Cust_id )
GROUP BY cus.cust_id;
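A sketch comparing the last two answers, with Python's sqlite3 module in place of MySQL and hypothetical data in which customer 3 has placed no orders. The comma join with a WHERE condition behaves like an explicit INNER JOIN, so that customer drops out of the result; the LEFT JOIN keeps them with a count of 0.

```python
import sqlite3

SETUP = """
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE cust_order (order_id INTEGER PRIMARY KEY, cust_id INTEGER);
    INSERT INTO customer VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox');
    INSERT INTO cust_order VALUES (10, 1), (11, 1), (12, 2), (13, 1);
"""

IMPLICIT_INNER = """
    SELECT cus.cust_id, COUNT(cus_order.order_id)
    FROM cust_order cus_order, customer cus
    WHERE cus_order.cust_id = cus.cust_id
    GROUP BY cus.cust_id ORDER BY cus.cust_id
"""
EXPLICIT_INNER = """
    SELECT cus.cust_id, COUNT(co.order_id)
    FROM customer cus
    INNER JOIN cust_order co ON co.cust_id = cus.cust_id
    GROUP BY cus.cust_id ORDER BY cus.cust_id
"""
LEFT = """
    SELECT cus.cust_id, COUNT(co.order_id)
    FROM customer cus
    LEFT JOIN cust_order co ON co.cust_id = cus.cust_id
    GROUP BY cus.cust_id ORDER BY cus.cust_id
"""

def counts(query):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

print(counts(IMPLICIT_INNER))  # [(1, 3), (2, 1)]
print(counts(EXPLICIT_INNER))  # [(1, 3), (2, 1)]
print(counts(LEFT))            # [(1, 3), (2, 1), (3, 0)]
```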

Join TableA, TableB and TableC to get data from TableA and TableC

Why am I getting so many records from this query?
SELECT e.OneColumn, fb.OtherColumn
FROM dbo.TABLEA FB
INNER JOIN dbo.TABLEB eo ON Fb.Primary = eo.foreign
INNER JOIN dbo.TABLEC e ON eo.Primary =e.Foreign
WHERE FB.SomeOtherColumn = 0
When I run this I get millions of records, which is not correct; every table has far fewer records than that.
I need columns from TableA and TableC, and because they are not directly related, I have to use TableB as a bridge.
EDIT
Below are the row counts:
TABLEA = 273551
TABLEB = 384412
TABLEC = 13046
Above query = I forcefully canceled it after 2 minutes; by then the count was 11437613.
Any suggestion?
To figure out what is going on in a query whose results are not as expected, I tend to do this. First I change it to SELECT * (note that this is only for figuring out the problem; do not use SELECT * in production, ever!). Then I add an ORDER BY on the ID field from TableA if the query doesn't already have one.
So now I run the query up to the first table including any where conditions that are from the first table. I comment out the rest. I note the number of records returned.
Now I add in the second table and any WHERE conditions on it. If I am expecting a one-to-one relationship and this query doesn't return the same number of records, then I look at the data being returned to see if I can figure out why. Since the rows are ordered by the TableA ID, you can usually spot some duplicated records fairly easily and then scroll across until you find the field that causes the difference. Often this means you need some sort of additional WHERE clause, or an aggregation on the fields in the next table, to limit it to only one record. Just note down the problem at this point, though, as you may be able to make the change more effectively in the next join.
So add in the third table and again note the number of records, then look closely at the data where the ID from TableA is repeated. Look at the columns you intend to return: are they always the same for a given ID? If they are different, then you do not have a one-to-one relationship, and you need to understand whether there is a data integrity problem or you are mistaken in thinking there is a one-to-one relationship. If they are the same, then a derived table may be in order. You only need the IDs from TableB, so the join could look something like this:
JOIN (SELECT MIN(Primary) AS Primary, foreign FROM TABLEB GROUP BY foreign) eo ON Fb.Primary = eo.foreign
Hope this helps.
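A sketch of the debugging steps above, using Python's sqlite3 module instead of SQL Server and three tiny hypothetical tables (table_a, table_b, table_c, whose made-up columns stand in for Primary/Foreign). It counts rows after each join: the bridge table has two rows for the same TableA id, which multiplies the output, and a MIN()/GROUP BY derived table brings it back to one row per TableA row.

```python
import sqlite3

def join_row_counts():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_a (a_id INTEGER PRIMARY KEY, a_col TEXT);
        CREATE TABLE table_b (b_id INTEGER PRIMARY KEY, a_id INTEGER);
        CREATE TABLE table_c (c_id INTEGER PRIMARY KEY, b_id INTEGER, c_col TEXT);
        INSERT INTO table_a VALUES (1, 'x'), (2, 'y');
        -- a_id 1 has two bridge rows: this is what multiplies the output
        INSERT INTO table_b VALUES (10, 1), (11, 1), (12, 2);
        INSERT INTO table_c VALUES (100, 10, 'p'), (101, 11, 'p'), (102, 12, 'q');
    """)
    # Add one table at a time and note the row count after each step.
    stages = {
        "a": "SELECT COUNT(*) FROM table_a",
        "a+b": """SELECT COUNT(*) FROM table_a a
                  JOIN table_b b ON a.a_id = b.a_id""",
        "a+b+c": """SELECT COUNT(*) FROM table_a a
                    JOIN table_b b ON a.a_id = b.a_id
                    JOIN table_c c ON b.b_id = c.b_id""",
        # Derived table keeps a single bridge row (the lowest b_id) per a_id.
        "dedup": """SELECT COUNT(*) FROM table_a a
                    JOIN (SELECT MIN(b_id) AS b_id, a_id
                          FROM table_b GROUP BY a_id) b ON a.a_id = b.a_id
                    JOIN table_c c ON b.b_id = c.b_id""",
    }
    counts = {name: conn.execute(sql).fetchone()[0] for name, sql in stages.items()}
    conn.close()
    return counts

print(join_row_counts())  # {'a': 2, 'a+b': 3, 'a+b+c': 3, 'dedup': 2}
```

Picking MIN(b_id) is only safe when, as the answer says, the columns you return are the same for every bridge row of a given TableA id.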

Dependent Subquery vs. LEFT JOIN

This query displays the correct result, but EXPLAIN lists it as a "DEPENDENT SUBQUERY", which I'm led to believe is bad?
SELECT Competition.CompetitionID, Competition.CompetitionName, Competition.CompetitionStartDate
FROM Competition
WHERE CompetitionID NOT
IN (
SELECT CompetitionID
FROM PicksPoints
WHERE UserID =1
)
I tried changing the query to this:
SELECT Competition.CompetitionID, Competition.CompetitionName, Competition.CompetitionStartDate
FROM Competition
LEFT JOIN PicksPoints ON Competition.CompetitionID = PicksPoints.CompetitionID
WHERE UserID =1
and PicksPoints.PicksPointsID is null
but it displays 0 rows. What is wrong with the above compared to the first query that actually does work?
The second query cannot produce any rows, because it requires:
WHERE UserID =1
and PicksPoints.PicksPointsID is null
To make that clearer, let me rewrite it with the column qualified:
WHERE PicksPoints.UserID =1
and PicksPoints.PicksPointsID is null
So, on one hand, you are asking for rows in PicksPoints where UserID = 1, but on the other hand you expect the row not to exist in the first place. Can you see the problem?
Outer joins are tricky that way! Usually you filter using columns from the "outer" table, for example Competition. But you do not wish to do so here; you wish to filter on the left-joined table. Try rewriting it as follows:
SELECT Competition.CompetitionID, Competition.CompetitionName, Competition.CompetitionStartDate
FROM Competition
LEFT JOIN PicksPoints ON (Competition.CompetitionID = PicksPoints.CompetitionID AND PicksPoints.UserID = 1)
WHERE
PicksPoints.PicksPointsID is null
For more on this, read this nice post.
But, as an additional note, performance-wise you're in some trouble, using either subquery or the left join.
With the subquery you're in trouble because, before 5.6 (where some good work has been done), MySQL is very bad at optimizing subqueries, and your subquery is expected to execute multiple times.
With the LEFT JOIN you are in trouble because a LEFT JOIN dictates the join order from left to right. Yet your filtering is on the right table, which means you will not be able to use an index for the UserID = 1 condition (or you would, and lose the index for the join).
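A sketch comparing the queries in this thread, with Python's sqlite3 module standing in for MySQL and made-up rows: user 1 has picks in competition 1, user 2 in competition 2, and competition 3 has no picks at all. Moving UserID = 1 into the ON clause gives the same result as the NOT IN subquery, while leaving it in the WHERE clause returns nothing.

```python
import sqlite3

QUERIES = {
    "not_in": """
        SELECT CompetitionID FROM Competition
        WHERE CompetitionID NOT IN (
            SELECT CompetitionID FROM PicksPoints WHERE UserID = 1)
        ORDER BY CompetitionID""",
    # UserID = 1 in WHERE: unmatched rows have UserID NULL, so none survive.
    "filter_in_where": """
        SELECT c.CompetitionID FROM Competition c
        LEFT JOIN PicksPoints p ON c.CompetitionID = p.CompetitionID
        WHERE p.UserID = 1 AND p.PicksPointsID IS NULL
        ORDER BY c.CompetitionID""",
    # UserID = 1 in ON: only user 1's picks count as a match.
    "filter_in_on": """
        SELECT c.CompetitionID FROM Competition c
        LEFT JOIN PicksPoints p
          ON c.CompetitionID = p.CompetitionID AND p.UserID = 1
        WHERE p.PicksPointsID IS NULL
        ORDER BY c.CompetitionID""",
}

def run_all():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Competition (CompetitionID INTEGER PRIMARY KEY, CompetitionName TEXT);
        CREATE TABLE PicksPoints (PicksPointsID INTEGER PRIMARY KEY,
                                  CompetitionID INTEGER, UserID INTEGER);
        INSERT INTO Competition VALUES (1, 'Spring'), (2, 'Summer'), (3, 'Autumn');
        INSERT INTO PicksPoints VALUES (1, 1, 1), (2, 2, 2);
    """)
    results = {name: [r[0] for r in conn.execute(sql)] for name, sql in QUERIES.items()}
    conn.close()
    return results

print(run_all())
# {'not_in': [2, 3], 'filter_in_where': [], 'filter_in_on': [2, 3]}
```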
These are two different queries. The first query looks for competitions that are not associated with user id 1 (via the PicksPoints table), while the second joins against the rows associated with user id 1 that in addition have a null PicksPointsID.
The second query is coming out empty because you are joining against a table called PicksPoints and you are looking for rows in the join result that have PicksPointsID as null. This can only happen if
The second table had a row with a null PicksPointsID and a competition id that matched a competition id in the first table, or
All the columns in the second table's contribution to the join are null because there is a competition id in the first table that did not appear in the second.
Since PicksPointsID really sounds like a primary key, it's case 2 that is showing up. So all the columns from PicksPoints are null, your WHERE clause (UserID=1 and PicksPoints.PicksPointsID is null) will never be true, and your result will be empty.
A plain left join should work for you
select c.CompetitionID, c.CompetitionName, c.CompetitionStartDate
from Competition c
left join PicksPoints p
on (c.CompetitionID = p.CompetitionID)
where p.UserID <> 1
Replacing the final where with an and (making a complex join clause) might also work. I'll leave it to you to analyze the plans for each query. :)
I'm not personally convinced of the need for the is null test. The article linked to by Shlomi Noach is excellent and you may find some tips in there to help you with this.

Use of inner join seems to cause entries in a #temp table to disappear

I am brand new to SQL and am working with the following code provided to us by one of our vendors:
SELECT DISTINCT MriPatients.PatientID
INTO #UniquePt
FROM MriPatients
INNER JOIN #TotalPopulation ON MriPatients.PatientID = #TotalPopulation.PatientID
SET @TotalUniquePatients = (SELECT COUNT(*) FROM #UniquePt)
What happens is that the SET line causes @TotalUniquePatients to be set to 0 even though there are many unique patient IDs in our database. That value is later used as a denominator in a division, which causes a divide-by-zero error.
Now it seems to me that this is easy to fix by using COUNT(DISTINCT PatientID) on the MriPatients table; then you don't need to create #UniquePt at all, since this is the only place that table is used. But I don't understand why the code as it is gets a 0 result when counting #UniquePt. If you remove the INNER JOIN, the SET statement returns a correct result, so what does the INNER JOIN do to #UniquePt?
If it matters, we are using SQL Server 2008.
The result is 0 because of one of two situations:
#TotalPopulation is empty
#TotalPopulation contains no records that have the same value for PatientID as the records in MriPatients
How are you populating #TotalPopulation?
A COUNT(DISTINCT) won't necessarily do the same thing. It depends on what you fill #TotalPopulation with. If all you want is the number of unique patients in MriPatients, then yes, COUNT(DISTINCT) will work. But if you're filling #TotalPopulation based on some kind of logic, then COUNT(DISTINCT) won't necessarily give you the same result as the COUNT of the joined tables.
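A sketch of the situations above, using Python's sqlite3 module in place of SQL Server. SQLite has no #temp tables, so a TEMP table named TotalPopulation stands in for #TotalPopulation, and COUNT(DISTINCT ...) over the join replaces the SELECT DISTINCT ... INTO plus COUNT(*) steps (an assumption that both count the same thing here). The patient IDs are made up.

```python
import sqlite3

def unique_patient_counts(population_ids):
    """Return (count via INNER JOIN, plain COUNT(DISTINCT) on MriPatients)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE MriPatients (PatientID INTEGER);
        INSERT INTO MriPatients VALUES (1), (1), (2), (3), (4);
        CREATE TEMP TABLE TotalPopulation (PatientID INTEGER);
    """)
    conn.executemany("INSERT INTO TotalPopulation VALUES (?)",
                     [(p,) for p in population_ids])
    # Only MriPatients rows with a matching PatientID in TotalPopulation survive.
    joined = conn.execute("""
        SELECT COUNT(DISTINCT m.PatientID)
        FROM MriPatients m
        INNER JOIN TotalPopulation t ON m.PatientID = t.PatientID
    """).fetchone()[0]
    all_mri = conn.execute(
        "SELECT COUNT(DISTINCT PatientID) FROM MriPatients").fetchone()[0]
    conn.close()
    return joined, all_mri

print(unique_patient_counts([]))         # (0, 4)  population table is empty
print(unique_patient_counts([7, 8]))     # (0, 4)  no PatientID in common
print(unique_patient_counts([1, 2, 9]))  # (2, 4)  only the overlap is counted
```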
The INNER JOIN causes you to insert ONLY records that have a matching PatientID in the #TotalPopulation table.
I'm guessing you don't have any matching records, or that table isn't populated, which is causing the issue.
Is there a reason you are joining to it in the first place?