Join TableA, TableB and TableC to get data from TableA and TableC - sql-server-2008

Why am I getting so many records for this query?
SELECT e.OneColumn, fb.OtherColumn
FROM dbo.TABLEA FB
INNER JOIN dbo.TABLEB eo ON fb.Primary = eo.Foreign
INNER JOIN dbo.TABLEC e ON eo.Primary = e.Foreign
WHERE FB.SomeOtherColumn = 0
When I run this I get millions of records, which cannot be correct because every table has far fewer records than that.
I need to get columns from TableA and TableC, and because they are not directly related I have to use TableB as a bridge.
EDIT
Below is the count:
TABLEA = 273551
TABLEB = 384412
TABLEC = 13046
Above query = I forcefully canceled the query after 2 minutes; by that time the count was 11437613
Any suggestions?

To figure out what is going on when a query returns unexpected results, I tend to do the following. First I change it to a SELECT * (note this is only for diagnosing the problem; do not use SELECT * in production, ever!). Then I add an ORDER BY on the ID field from TableA if the query does not already have one.
Next I run the query with only the first table, including any WHERE conditions that apply to that table, and comment out the rest. I note the number of records returned.
Now I add in the second table and any WHERE conditions on it. If I am expecting a one-to-one relationship and this query doesn't return the same number of records, I look at the returned data to see if I can figure out why. Since the results are ordered by the TableA ID, you can usually spot duplicated records fairly easily and then scroll across until you find the field that causes the difference. Often this means you need some sort of additional WHERE clause or aggregation on the fields in the next table to limit it to one record. Just note down the problem at this point, though, as you may be able to make the change more effectively in the next join.
Then add in the third table, again note the number of records, and look closely at the data where the ID from TableA is repeated. Look at the columns you intend to return: are they always the same for a given ID? If they are different, you do not have a one-to-one relationship, and you need to understand that either there is a data integrity problem or you are mistaken in thinking the relationship is one-to-one. If they are the same, a derived table may be in order. You only need the IDs from TableB, so the join could look something like this:
JOIN (SELECT MIN(Primary) AS Primary, Foreign FROM TABLEB GROUP BY Foreign) eo ON fb.Primary = eo.Foreign
Hope this helps.
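The step-by-step row counting described above can be tried out on a toy data set. This is only a sketch using Python's built-in sqlite3 module rather than SQL Server; the table and column names (table_a, table_b, table_c, a_id, b_id) and the sample rows are made up for illustration and are not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for TableA, TableB (bridge) and TableC.
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, other_column TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, a_id INTEGER);
    CREATE TABLE table_c (id INTEGER PRIMARY KEY, b_id INTEGER, one_column TEXT);
    INSERT INTO table_a VALUES (1, 'a1'), (2, 'a2');
    -- Row 1 of table_a has three bridge rows, row 2 has one.
    INSERT INTO table_b VALUES (10, 1), (11, 1), (12, 1), (20, 2);
    INSERT INTO table_c VALUES (100, 10, 'c'), (101, 11, 'c'),
                               (102, 12, 'c'), (200, 20, 'c');
""")

def count(sql):
    """Return the number of rows produced by the given SELECT."""
    return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]

# Add one table at a time and note the row count after each step.
rows_a = count("SELECT a.id FROM table_a a")
rows_ab = count("SELECT a.id FROM table_a a JOIN table_b b ON a.id = b.a_id")
rows_abc = count("""
    SELECT a.id FROM table_a a
    JOIN table_b b ON a.id = b.a_id
    JOIN table_c c ON b.id = c.b_id
""")

# Derived table: keep a single bridge row per table_a id.
rows_fixed = count("""
    SELECT a.id FROM table_a a
    JOIN (SELECT MIN(id) AS id, a_id FROM table_b GROUP BY a_id) b
      ON a.id = b.a_id
    JOIN table_c c ON b.id = c.b_id
""")

print(rows_a, rows_ab, rows_abc, rows_fixed)
```

The count jumps as soon as the bridge table is added, which is the point at which you would start looking at the repeated IDs. Whether collapsing the bridge with MIN is valid depends on the columns you return being the same for every bridge row.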

Related

What actually happens during table JOINs?

I'm trying to see if my understanding of JOINs is correct.
For the following query:
SELECT * FROM tableA
join tableB on tableA.someId = tableB.someId
join tableC on tableA.someId = tableC.someId;
Does the RDBMS basically execute pseudocode similar to the following:
List tempResults
for each A_record in tableA
    for each B_record in tableB
        if (A_record.someId = B_record.someId)
            tempResults.add(A_record)
List results
for each Temp_record in tempResults
    for each C_record in tableC
        if (Temp_record.someId = C_record.someId)
            results.add(C_record)
return results;
So basically the more records with the same someId tableA has with tableB and tableC, the more records the RDBMS has to scan? If all 3 tables have records with the same someId, then essentially a full table scan is done on all 3 tables?
Is my understanding correct?
Each vendor's query processor is of course coded slightly differently, but they probably share many common techniques. A join can be implemented in a variety of ways, and which one is chosen in any vendor's implementation depends on the specific situation. Factors considered include whether the data is already sorted by the join attribute and the relative number of records in each table (a join between 20 records in one set and a million records in the other will be done differently than one where both sets are of comparable size). I do not know the internals of MySQL, but SQL Server has three different join techniques: a Merge Join, a Loop Join, and a Hash Join. Take a look at this.
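To make the difference between two of those strategies concrete, here is a toy sketch in plain Python. It is not how any real engine is implemented; the function names and sample rows are invented for illustration. Both functions return combined (left, right) row pairs, which is what a join produces, rather than only one side's record.

```python
from collections import defaultdict

def nested_loop_join(left, right, key):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]

def hash_join(left, right, key):
    """Build a lookup on one input, then probe it once per row of the other."""
    buckets = defaultdict(list)
    for r in right:
        buckets[r[key]].append(r)
    return [(l, r) for l in left for r in buckets.get(l[key], [])]

# Hypothetical sample rows keyed by someId.
table_a = [{"someId": 1, "a": "x"}, {"someId": 2, "a": "y"}]
table_b = [{"someId": 1, "b": "p"}, {"someId": 1, "b": "q"}, {"someId": 3, "b": "r"}]

loop_rows = nested_loop_join(table_a, table_b, "someId")
hash_rows = hash_join(table_a, table_b, "someId")
print(loop_rows == hash_rows, len(loop_rows))
```

The two functions produce the same rows; they differ only in how much work they do to find them, which is the kind of trade-off a query optimizer weighs when it picks a strategy.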

include null and zero in count() from related table

I would like to list in table (staging) the number of related records from table (studies).
So far this statement works well but returns only the rows where there are >0 related records:
SELECT staging.*,
COUNT(studies.PMID) AS refcount
FROM studies
LEFT JOIN staging
ON studies.rs_number = staging.rs
GROUP BY staging.idstaging;
How can I adjust this statement to list ALL rows in table (staging) including where there are zero or null related records from table (studies)?
Thank you
You have the tables in the wrong order in the LEFT JOIN:
SELECT staging.*, COUNT(studies.PMID) AS refcount
FROM staging LEFT JOIN
studies
ON studies.rs_number = staging.rs
GROUP BY staging.idstaging;
LEFT JOIN keeps everything in the first ("left") table and all matching rows in the second. If you want to keep everything in the staging table, then put it first.
And, in case anyone wants to complain about the use of staging.* with GROUP BY: this particular usage is (presumably) ANSI compliant because staging.idstaging is (presumably) a unique id in that table.
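The effect of the table order can be checked on a small example. This sketch uses Python's sqlite3 module instead of MySQL, and the sample rows are made up; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (idstaging INTEGER PRIMARY KEY, rs TEXT);
    CREATE TABLE studies (PMID INTEGER PRIMARY KEY, rs_number TEXT);
    -- Invented sample data: rs300 has no related studies.
    INSERT INTO staging VALUES (1, 'rs100'), (2, 'rs200'), (3, 'rs300');
    INSERT INTO studies VALUES (9001, 'rs100'), (9002, 'rs100'), (9003, 'rs200');
""")

# staging on the left: every staging row survives, unmatched ones count as 0
# because COUNT(column) ignores the NULLs produced by the outer join.
refcounts = conn.execute("""
    SELECT staging.idstaging, COUNT(studies.PMID) AS refcount
    FROM staging
    LEFT JOIN studies ON studies.rs_number = staging.rs
    GROUP BY staging.idstaging
    ORDER BY staging.idstaging
""").fetchall()

# studies on the left (the original order): rs300 never appears.
wrong_order = conn.execute("""
    SELECT staging.idstaging, COUNT(studies.PMID) AS refcount
    FROM studies
    LEFT JOIN staging ON studies.rs_number = staging.rs
    GROUP BY staging.idstaging
    ORDER BY staging.idstaging
""").fetchall()

print(refcounts)
print(wrong_order)
```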

MySQL Select WHERE IN recordset

I'll try to explain my problem. I have two tables. In the first one each record is identified by a unique INT code (counter). In the second the code from the first table is one of the fields (and may be repeated in various records).
I want to make a SELECT CODE in the second table, based on WHERE parameters, knowing I will get as result a recordset with possibly repeated CODES, and use this recordset for another SELECT in the first table, WHERE CODE IN the above recordset (from the second table).
Is this possible ?? And if yes, how to do this ?
Usually, if I use the WHERE IN clause, the list can contain repeated values, like WHERE Code IN (3,4,5,6,3,4,2) ... right? The difference here is that I want to use a previously selected recordset in place of the list.
Is this possible? It sure is.
And if yes, how to do this? Like most questions, the answer is: it depends. There's more than one way to skin this cat, and the best answer can vary with the data (volume of records) and indexes.
You can use DISTINCT or GROUP BY to limit the table A records: because the join from A to B is one-to-many, the values from A would otherwise be repeated. If you need values from B as well, this is the way to do it.
Select A.Code, count(B.A_CODE) countOfBRecords
from B
LEFT JOIN A
on A.Code = B.A_Code
WHERE B.Parameter = 'value'
and B.Parameter2 = 'Value2'
group by A.Code
Or using your approach (works if you ONLY need values/columns from table A.)
Select A.Code
from A
Where A.Code in (Select B.A_CODE
From B WHERE B.Parameter = 'value'
and B.Parameter2 = 'Value2')
But these can be slow depending on data/indexes.
You don't need the DISTINCT on the inner query, as A.Code only exists once and thus wouldn't be repeated. It's the JOIN which would cause the records to repeat, not the WHERE clause.
A correlated subquery returns each A.Code only once; this works if you ONLY need values from table A.
Select A.Code
From A
Where exists (Select 1
from B
where B.Parameter = 'value' ...
AND A.Code = B.A_CODE)
Since there's no join, no A records would be repeated. On larger data sets this generally performs better.
This last approach works because it correlates the outer table with the subselect. Note this can only go one level deep in a relationship. If you tried to join multiple levels deep this way, it wouldn't work.
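The difference between the approaches above shows up clearly on a few rows. This is a sketch with Python's sqlite3 module rather than MySQL; the sample data and the extra Id and Name columns are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Code INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE B (Id INTEGER PRIMARY KEY, A_Code INTEGER, Parameter TEXT);
    INSERT INTO A VALUES (2, 'two'), (3, 'three'), (4, 'four');
    -- Code 3 appears twice among the matching B rows.
    INSERT INTO B VALUES (1, 3, 'value'), (2, 3, 'value'),
                         (3, 4, 'value'), (4, 2, 'other');
""")

# IN with a subquery: repeated codes in the subquery do not repeat A rows.
via_in = conn.execute("""
    SELECT Code FROM A
    WHERE Code IN (SELECT A_Code FROM B WHERE Parameter = 'value')
    ORDER BY Code
""").fetchall()

# EXISTS with a correlated subquery: same rows, each A row at most once.
via_exists = conn.execute("""
    SELECT Code FROM A
    WHERE EXISTS (SELECT 1 FROM B
                  WHERE B.Parameter = 'value' AND B.A_Code = A.Code)
    ORDER BY Code
""").fetchall()

# A plain join repeats the A row once per matching B row.
via_join = conn.execute("""
    SELECT A.Code FROM A
    JOIN B ON A.Code = B.A_Code
    WHERE B.Parameter = 'value'
    ORDER BY A.Code
""").fetchall()

print(via_in, via_exists, via_join)
```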

Conditional Select Statement in MS Access

I want to create a query in MS Access that will display information from two tables based on the values in one table. Both of these tables have the same exact columns. One has set records and the other one has records a visitor can insert/edit/delete. For the purpose of this question I will call the tables TableA and TableB. TableA has the predetermined records and can not be changed. Multiple users will be using these records. Visitors would add records to TableB. I need a query that will display the records from TableA unless a visitor adds a record to TableB and then it displays that record. The field I need to join on is CategoryID. So what I need is basically like this;
If TableB.CategoryID Is Not Null Then
Select * From TableB
Else
Select * From TableA
End If
Thanks for any assistance anyone can provide.
JW
You get part of the way there by unioning the individual table queries; that works if there's nothing in B, but still shows the A records if there is.
So suppose we created a table just like A, say A2, but with an added column: the number of records in B. And then we select all of the records in A2 where this new column is 0, and only the columns originally in A; call this A3.
Now consider the union of A3 and B. If B is empty, we get A. If B is not empty, then none of the records from A2 will be chosen for A3, and we're left with just B.
That is easier than it seems at first. You'll have to join both tables on CategoryID and then conditionally select the right item like this:
SELECT tA.CategoryID, IIF(tB.CategoryID IS NULL, tA.txtEntry, tB.txtEntry) AS EntryText,
tB.CategoryID IS NULL AS bOriginalEntry
FROM TableA AS tA LEFT JOIN TableB AS tB ON tA.CategoryID=tB.CategoryID
However, there is one caveat: if TableB is empty then the join produces an empty set! Just populate TableB with at least one record (preferably one with an invalid CategoryID so it won't join with a valid record in TableA).
The bOriginalEntry is just a boolean expression to show whether the EntryText stems from TableA or TableB.
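The per-category override can be sketched outside Access as well. The example below uses Python's sqlite3 module and swaps Access's IIF for a standard CASE expression; the txtEntry values are invented sample data, and the behaviour of Access itself is not being tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (CategoryID INTEGER PRIMARY KEY, txtEntry TEXT);
    CREATE TABLE TableB (CategoryID INTEGER PRIMARY KEY, txtEntry TEXT);
    -- TableA holds the fixed defaults; a visitor has overridden category 2.
    INSERT INTO TableA VALUES (1, 'default one'), (2, 'default two'),
                              (3, 'default three');
    INSERT INTO TableB VALUES (2, 'visitor two');
""")

entries = conn.execute("""
    SELECT tA.CategoryID,
           CASE WHEN tB.CategoryID IS NULL
                THEN tA.txtEntry ELSE tB.txtEntry END AS EntryText,
           (tB.CategoryID IS NULL) AS bOriginalEntry
    FROM TableA AS tA
    LEFT JOIN TableB AS tB ON tA.CategoryID = tB.CategoryID
    ORDER BY tA.CategoryID
""").fetchall()

for row in entries:
    print(row)
```

SQLite reports the boolean column as 1 or 0, so a 1 in the last position means the text came from TableA.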
I found this thread while searching for a similar problem. Note to self and others:
You can use the join types to cope with potentially different values in a conditional select.
MS Access doesn't have the full range of JOINs that MS SQL has, but you can "fudge" it.
E.g.:
Full outer joins: all the data, combined where feasible
In some systems, an outer join can include all rows from both tables, with rows combined when they correspond. This is called a full outer join, and Access doesn’t explicitly support them. However, you can use a cross join and criteria to achieve the same effect.
https://support.office.com/en-us/article/join-tables-and-queries-3f5838bd-24a0-4832-9bc1-07061a1478f6#typesofjoins

Use of inner join seems to cause entries in a #temp table to disappear

I am brand new to SQL and am working with the following code provided to us by one of our vendors:
SELECT DISTINCT MriPatients.PatientID
INTO #UniquePt
FROM MriPatients
INNER JOIN #TotalPopulation ON MriPatients.PatientID = #TotalPopulation.PatientID
Set #TotalUniquePatients = (Select Count(*) FROM #UniquePt)
What happens is the Set line causes #TotalUniquePatients to be set to 0 even though there are many unique patient ids in our database. That value is then later used as a denominator in a division which causes a divide by 0 error.
Now it seems to me that this is easy to fix by using COUNT DISTINCT on the MriPatients table; then you don't need to create #UniquePt at all...this is the only place that table is used. But, I don't understand why the code as it is gets a 0 result when counting #UniquePt. If you remove the INNER JOIN, the Set returns a correct result...so what does the INNER JOIN do to #UniquePt?
If it matters, we are using SQL Server 2008.
The result is 0 because of 1 of 2 situations:
#TotalPopulation is empty
#TotalPopulation contains no records that have the same value for PatientID as the records in MriPatients
How are you populating #TotalPopulation?
A COUNT DISTINCT won't necessarily do the same thing. It depends on what you fill #TotalPopulation with. If all you want is the number of unique patients in MriPatients then yes, the COUNT DISTINCT will work. But if you're filling #TotalPopulation based on some kind of logic, then the COUNT DISTINCT won't necessarily give you the same results as the COUNT of the joined tables.
The INNER JOIN causes you to insert ONLY records that have a matching PatientID in the #TotalPopulation table.
I'm guessing you don't have any matching records, or that table isn't populated, which is causing the issue.
Is there a reason you are joining to it in the first place?
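Both situations that give a count of 0 can be reproduced in a few lines. This is a sketch with Python's sqlite3 module, not SQL Server: the temp table is named TotalPopulation (without the # prefix, which SQLite does not use), and the patient IDs are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MriPatients (PatientID INTEGER);
    -- Stand-in for #TotalPopulation; it starts out empty.
    CREATE TEMP TABLE TotalPopulation (PatientID INTEGER);
    INSERT INTO MriPatients VALUES (1), (1), (2), (3);
""")

def unique_patient_count():
    """Count distinct patients that survive the INNER JOIN."""
    return conn.execute("""
        SELECT COUNT(*) FROM (
            SELECT DISTINCT MriPatients.PatientID
            FROM MriPatients
            INNER JOIN TotalPopulation
                ON MriPatients.PatientID = TotalPopulation.PatientID
        )
    """).fetchone()[0]

# Situation 1: the joined table is empty.
count_when_empty = unique_patient_count()

# Situation 2: the joined table has rows, but none of them match.
conn.execute("INSERT INTO TotalPopulation VALUES (7), (8)")
count_when_no_match = unique_patient_count()

# Once matching IDs are present, only those patients are counted.
conn.execute("INSERT INTO TotalPopulation VALUES (1), (3)")
count_when_matching = unique_patient_count()

# COUNT(DISTINCT ...) without the join counts every patient instead.
plain_distinct = conn.execute(
    "SELECT COUNT(DISTINCT PatientID) FROM MriPatients"
).fetchone()[0]

print(count_when_empty, count_when_no_match, count_when_matching, plain_distinct)
```

The last two numbers differ, which illustrates why replacing the join with a plain COUNT DISTINCT is only safe if the joined table was never meant to filter the patients.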