SQL LEFT JOIN and WHERE yields unexpected result - mysql

I've got two tables related by a field called 'status'. With table2.location having a value of 'texas', there are 20 instances of table2.status having a value of 'good'.
That is, the following query returns 20 rows of table2.status = 'good'.
[A] SELECT table2.status FROM table2 WHERE table2.location = 'texas';
Further, there are 50 unique table1.id's with table1.status = 'good'.
That is, the following query returns 50 rows of unique table1.id's.
[B] SELECT table1.id FROM table1 WHERE table1.status = 'good';
Now, when I run the following query:
[C] SELECT table1.id FROM table1 LEFT JOIN table2 ON table1.status = table2.status WHERE table2.location = "texas";
I would expect it to return the 50 rows of unique id's. However, it is actually returning the 50 rows of unique id's 20 times (i.e. I get 1000 rows returned).
The quick fix I did was to simply execute SELECT DISTINCT table1.id... which then just returns one set of 50 rows (not 20 sets of 50 rows).
However, I'd like to know why I'm seeing this behavior - maybe there is something wrong with my query [C]?
Thanks!

Your query is wrong if you want to get 50 items: joining on status pairs each of the 50 table1 rows with each of the 20 matching table2 rows, which is effectively a 50x20 cross join, so you get 1000 records as the result.
You cannot join on status (which is not unique): probably your table design is wrong.
IMO you should have id in both tables and join with it...

It is expected. The 1st row from table1 matches 20 rows from table2, the 2nd row from table1 matches the same 20 rows from table2 again, etc.

Your query has two problems
SELECT table1.id
FROM table1
LEFT JOIN table2 ON table1.status = table2.status
WHERE table2.location = "texas";
First, when you use a left join, a condition on the second table must be in the ON clause, not the WHERE clause, or you convert the left join to an inner join (since the condition must be met).
So your query should start looking like this:
SELECT table1.id
FROM table1
LEFT JOIN table2
ON table1.status = table2.status
AND table2.location = "texas";
Now your next problem is that status is unlikely to be the thing you actually want to join on. To help you get the results you want though, we would need to see the table structure of the two tables.
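To see both effects side by side, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The data is a hypothetical miniature of the question (3 'good' ids instead of 50, 2 'good' texas rows instead of 20, plus one 'bad' id and one ohio row that are my own additions), so treat it as an illustration, not as the asker's real schema.

```python
import sqlite3

# Hypothetical miniature of the question's data (assumed, not the real tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE table2 (status TEXT, location TEXT);
    INSERT INTO table1 VALUES (1, 'good'), (2, 'good'), (3, 'good'), (4, 'bad');
    INSERT INTO table2 VALUES ('good', 'texas'), ('good', 'texas'), ('good', 'ohio');
""")

# Filter on table2 in WHERE: every 'good' id pairs with every 'good' texas
# row, and the unmatched 'bad' id is filtered out (inner-join behavior).
where_rows = conn.execute("""
    SELECT table1.id FROM table1
    LEFT JOIN table2 ON table1.status = table2.status
    WHERE table2.location = 'texas'
    ORDER BY table1.id
""").fetchall()

# Same filter moved into ON: the multiplication is unchanged, but id 4
# now survives with NULLs on the table2 side.
on_rows = conn.execute("""
    SELECT table1.id FROM table1
    LEFT JOIN table2 ON table1.status = table2.status
                    AND table2.location = 'texas'
    ORDER BY table1.id
""").fetchall()

print(len(where_rows))  # 3 ids x 2 texas rows
print(len(on_rows))     # the same pairs plus one row for id 4
```

Moving the condition into ON fixes the left-join semantics, but the row multiplication remains as long as the join key is the non-unique status column.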

Related

MySQL. Subtract data from table2 in table1

Simple question, but I can't find the way to accomplish it.
Table 1.
ID Quantity
1 4
2 5
3 2
Table 2
ID Quantity
2 1
3 2
I want the query to obtain the following result:
Table result
ID Quantity
1 4
2 4
I have been looking for something related to the MINUS operator or NOT IN, but the thing is I want to subtract the quantity in the same query.
EDIT: Table 1 is always bigger than Table 2. Table 2 can't contain id's that are not present in table 1.
I hope the example clarifies the question.
Regards!!
Sounds like a classic use-case of a join:
SELECT table1.ID, table1.Quantity - COALESCE(table2.Quantity, 0) AS Quantity
FROM table1
LEFT OUTER JOIN table2
ON table1.ID = table2.ID
WHERE table1.Quantity != COALESCE(table2.Quantity, 0)
-- insert order by clauses/etc if needed
This will compute table1's quantity minus table2's. You can get a good overview of different joins here. This uses a left join, which keeps every row of table1 whether or not table2 has a matching id, and then uses COALESCE to turn the null/non-match from table2 into a 0.
The last statement's purpose is to finally remove results which equate to 0, so this would not include the (3, 0) result. The comparison goes through COALESCE as well, because comparing against a NULL quantity directly would also drop unmatched rows such as ID 1.
You can also use this join to create a view of the output for convenience. Note that MySQL views are not materialized, so a view by itself does not cache results or speed up your lookups.
SELECT table1.ID, table1.Quantity - IFNULL(table2.Quantity, 0) AS Quantity
FROM table1
LEFT JOIN table2
ON table1.ID = table2.ID
WHERE table1.Quantity > IFNULL(table2.Quantity, 0)
To walk you through the above query: you use a LEFT JOIN here to combine your two tables. LEFT JOIN is specifically used since not all table 1 IDs are guaranteed to appear in table 2, but you still want to output these results. You use the ID in your ON condition since that is how you are matching the tables. You need to include the IFNULL function since table 1 IDs with no matching table 2 IDs will result in NULL table 2 values for that joined row. You then subtract these two values to obtain your result. The WHERE clause here will remove rows which would have returned a value equal to or less than zero; it also goes through IFNULL, because a comparison against NULL is never true and would otherwise drop the unmatched rows.
Use this SELECT statement:
SELECT T1.ID, T1.Quantity - COALESCE(T2.Quantity, 0) AS Quantity
FROM T1 LEFT JOIN T2 ON T1.ID = T2.ID
ORDER BY T1.ID;
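As a quick check of the subtraction against the sample data from the question, here is a small sketch run through Python's sqlite3 module (an assumption on my part; the SQL itself is the same in MySQL). It also shows what happens when the WHERE clause compares directly against the nullable T2.Quantity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER PRIMARY KEY, Quantity INTEGER);
    CREATE TABLE T2 (ID INTEGER PRIMARY KEY, Quantity INTEGER);
    INSERT INTO T1 VALUES (1, 4), (2, 5), (3, 2);
    INSERT INTO T2 VALUES (2, 1), (3, 2);
""")

base = """
    SELECT T1.ID, T1.Quantity - COALESCE(T2.Quantity, 0) AS Quantity
    FROM T1 LEFT JOIN T2 ON T1.ID = T2.ID
"""

# No filter: every T1 row is kept, including the one that drops to 0.
all_rows = conn.execute(base + " ORDER BY T1.ID").fetchall()

# Filter through COALESCE: removes the zero row, keeps unmatched ID 1.
nonzero_rows = conn.execute(
    base + " WHERE T1.Quantity > COALESCE(T2.Quantity, 0) ORDER BY T1.ID"
).fetchall()

# Pitfall: comparing against the raw T2.Quantity yields NULL for ID 1,
# so that row is silently filtered out as well.
pitfall_rows = conn.execute(
    base + " WHERE T1.Quantity != T2.Quantity ORDER BY T1.ID"
).fetchall()

print(all_rows)
print(nonzero_rows)
print(pitfall_rows)
```

The second result matches the table the asker wanted; the third is missing ID 1, which is why the filter should not touch T2.Quantity without a NULL guard.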

MySql inner join of 2 tables

I'm sure I'm missing something vital here, so any advice is welcome. I have one base table, let's call it Content_1, and two additional tables called Content_2 and Content_3. What I'm trying to do is get all results from Content_2 matching the id from Content_1, and add to this result set all results from Content_3 which also match the id coming from Content_1. Basically, I want an OR condition in the final results: return everything from Content_2 matching the id from Content_1 OR return everything from Content_3 matching the id from Content_1. However, I see that no results are returned, so my guess is that the first join is made and then the second join is applied to the result set of the first join, not to the initial table.
SELECT * FROM Content_1
JOIN Content_2 ON Content_1.id = Content_2.id
JOIN Content_3 ON Content_1.id = Content_3.id
You probably want UNION instead of joining all 3 tables.
SELECT Col1, Col2
FROM Content_1
JOIN Content_2 ON Content_1.id = Content_2.id
UNION
SELECT Col1, Col2
FROM Content_1
JOIN Content_3 ON Content_1.id = Content_3.id
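A minimal sketch of the difference, using Python's sqlite3 module with made-up rows (the column Col1 and its values are hypothetical). With chained inner joins an id must exist in all three tables, while UNION returns the matches from either table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Content_1 (id INTEGER PRIMARY KEY);
    CREATE TABLE Content_2 (id INTEGER, Col1 TEXT);
    CREATE TABLE Content_3 (id INTEGER, Col1 TEXT);
    INSERT INTO Content_1 VALUES (1), (2);
    INSERT INTO Content_2 VALUES (1, 'from 2');   -- id 1 only in Content_2
    INSERT INTO Content_3 VALUES (2, 'from 3');   -- id 2 only in Content_3
""")

# Chained inner joins: a row needs a match in Content_2 AND Content_3.
chained = conn.execute("""
    SELECT * FROM Content_1
    JOIN Content_2 ON Content_1.id = Content_2.id
    JOIN Content_3 ON Content_1.id = Content_3.id
""").fetchall()

# UNION: matches from Content_2 OR from Content_3.
unioned = sorted(conn.execute("""
    SELECT Content_1.id, Content_2.Col1 FROM Content_1
    JOIN Content_2 ON Content_1.id = Content_2.id
    UNION
    SELECT Content_1.id, Content_3.Col1 FROM Content_1
    JOIN Content_3 ON Content_1.id = Content_3.id
""").fetchall())

print(chained)
print(unioned)
```

Note that UNION removes duplicate rows; UNION ALL keeps them if that is what you need.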

MySQL: JOIN where ON may compare with null

I have:
simple_table
|- first_id
|- second_id
SELECT * FROM table t1 JOIN table t2
ON [many many conditions]
AND t1.id IN (SELECT first_id FROM simple_table)
AND t2.id = (
SELECT second_id FROM simple_table WHERE t1.id = first_id -- 4th row, can return NULL
)
Questions:
How to handle situation where 4th row return null?
Can I use t1 & t2 alias inside subqueries?
Updated [extra explanation]
I have a very big table. I need to iterate through the table and check some conditions. simple_table provides the ids of the table entities whose conditions I should check. I mean:
simple_table
first_id second_id
11 128
table
id <other_fields>
................
11 <other_data>
...............
128 <other_data>
So, I should check whether those two entities in table satisfy the right conditions relative to one another.
The question is unclear, but given the update the query should work better if there is an index on the ID of the big table (probably it's there already as the PK).
As the condition seems to be on the same table the easiest query will be
SELECT ...
FROM bigtable t1
INNER JOIN simple_table st ON t1.ID IN (st.first_id, st.second_id)
or
SELECT ...
FROM bigtable t1
INNER JOIN simple_table st ON t1.ID = st.first_id
INNER JOIN bigtable t2 ON st.second_id = t2.ID
to get the two rows from bigtable on the same row of the result.
The second query will make the checks easier to write; the first will be faster but will most probably need a GROUP BY to return the wanted results.
Some performance tests on the OP machine are needed to get the fastest one.
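If one of the IDs in simple_table is NULL, the first query will only consider the other one, while the second query's INNER JOIN will drop that pair entirely (use a LEFT JOIN for t2 if you need to keep it). Either way, the code will have to check for this.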
You can use the alias of the tables in the subqueries, and you'll need to do that as you'll probably have the same table in the subqueries.
The relative conditions to check are still undisclosed by the OP, so that's all I can help with.
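Here is a small sketch of the pair-per-row form and of what a NULL second_id does to it, run through Python's sqlite3 module. The payload column and the extra (200, NULL) pair are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bigtable (ID INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE simple_table (first_id INTEGER, second_id INTEGER);
    INSERT INTO bigtable VALUES (11, 'a'), (128, 'b'), (200, 'c');
    INSERT INTO simple_table VALUES (11, 128), (200, NULL);
""")

# INNER JOIN on second_id: the pair with a NULL second_id disappears,
# because NULL = t2.ID is never true.
inner_pairs = conn.execute("""
    SELECT t1.ID, t2.ID
    FROM bigtable t1
    INNER JOIN simple_table st ON t1.ID = st.first_id
    INNER JOIN bigtable t2 ON st.second_id = t2.ID
    ORDER BY t1.ID
""").fetchall()

# LEFT JOIN on second_id: the pair is kept, with NULL for the t2 side.
left_pairs = conn.execute("""
    SELECT t1.ID, t2.ID
    FROM bigtable t1
    INNER JOIN simple_table st ON t1.ID = st.first_id
    LEFT JOIN bigtable t2 ON st.second_id = t2.ID
    ORDER BY t1.ID
""").fetchall()

print(inner_pairs)
print(left_pairs)
```

With the LEFT JOIN variant, any relative condition on t2 has to allow for t2's columns being NULL.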

Are these two mysql select queries identical functionality wise?

I have 3 tables. Table 1 and 2 share column 1 and 2. All 3 tables share column 2 (an ID column), but only table 3 contains column 3. I want all rows where tables 1,2 have equal values for columns 1 and 2, but only where table3.Col3 (joined on the ID column 2) is equal to some specific value "X".
I have two queries which appear to be identical and are working for what I want, but I am asking the experts to make sure they are interchangeable :
SELECT *
from Table1 INNER JOIN Table2
ON Table1.Col1 = Table2.Col1 and Table1.Col2 = Table2.Col2
WHERE (Select Col3 from Table3 where Table2.Col2 = Table3.Col2) = "X"
SELECT *
from Table1 INNER JOIN Table2
ON Table1.Col1 = Table2.Col1 and Table1.Col2 = Table2.Col2
INNER JOIN Table3
ON Table1.Col2 = Table3.Col2
WHERE Table3.Col3 = "X"
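I'm going to say yes, they are equivalent in which rows they select, provided Table3 has at most one row per Col2 value (otherwise the scalar subquery in the first query fails, while the join in the second multiplies rows). Note also that SELECT * in the second query additionally returns Table3's columns. Let me try offering an explanation.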
1st query:
The 1st INNER JOIN will only select rows from Table1 and Table2 where both Col1 and Col2 match. The subquery is in fact a correlated subquery, which will be executed for each row of the outer query, which means every row that passes the INNER JOIN. Additionally, you are filtering the outer query on the results of the inner query where Col3 from Table3 = 'X'. This is giving you exactly the data you want.
2nd query:
Slightly different. 1st INNER JOIN works the same way as in case 1. However, then you INNER JOIN this result set with Table3. Again, you are only joining on rows where Table1.Col2 = Table3.Col2. And, since Table1.Col2 = Table2.Col2, it results in an equivalent intermediate result set as defined by the correlated subquery in the 1st case. Finally, you are filtering on Table3.Col3 = 'X', which again results in the exact dataset as you wanted.
Hope this makes sense. Do correct me if my logic is wrong.
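A minimal sketch comparing the two forms on made-up data, using Python's sqlite3 module. It assumes Table3.Col2 is unique (declared as a primary key here) and selects explicit Table1 columns instead of *, so that both queries return the same shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Col1 TEXT, Col2 INTEGER);
    CREATE TABLE Table2 (Col1 TEXT, Col2 INTEGER);
    CREATE TABLE Table3 (Col2 INTEGER PRIMARY KEY, Col3 TEXT);
    INSERT INTO Table1 VALUES ('a', 1), ('b', 2), ('c', 3);
    INSERT INTO Table2 VALUES ('a', 1), ('b', 2), ('z', 3);
    INSERT INTO Table3 VALUES (1, 'X'), (2, 'Y'), (3, 'X');
""")

# Form 1: correlated scalar subquery in WHERE.
subquery_rows = conn.execute("""
    SELECT Table1.Col1, Table1.Col2
    FROM Table1 INNER JOIN Table2
      ON Table1.Col1 = Table2.Col1 AND Table1.Col2 = Table2.Col2
    WHERE (SELECT Col3 FROM Table3 WHERE Table2.Col2 = Table3.Col2) = 'X'
    ORDER BY Table1.Col2
""").fetchall()

# Form 2: second INNER JOIN plus a plain WHERE filter.
join_rows = conn.execute("""
    SELECT Table1.Col1, Table1.Col2
    FROM Table1 INNER JOIN Table2
      ON Table1.Col1 = Table2.Col1 AND Table1.Col2 = Table2.Col2
    INNER JOIN Table3
      ON Table1.Col2 = Table3.Col2
    WHERE Table3.Col3 = 'X'
    ORDER BY Table1.Col2
""").fetchall()

print(subquery_rows)
print(join_rows)
```

Only ('a', 1) survives in both: ('b', 2) has Col3 = 'Y', and ('c', 3) has no Table2 row with the same Col1.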

left join returning more than expected

Using the following query
select *
from table1
left join table2 on table1.name = table2.name
table1 returns 16 rows and table2 returns 35 rows.
I was expecting the above query to return 16 rows because of the left join, but it is returning 35 rows. right join also returns 35 rows
Why is this happening and how do I get it to return 16 rows?
LEFT JOIN can return multiple copies of the data from table1, if the foreign key for a row in table 1 is referenced by multiple rows in table2.
If you want it to only return 16 rows, one for each table 1 row, and with a random data set for table 2, you can use just a plain GROUP BY:
select *
from table1
left join table2 on table1.name = table2.name
group by table1.name
GROUP BY aggregates rows based on a field, so this will collapse all the table1 duplicates into one row. Generally, you specify aggregate functions to explain how the rows should collapse (for example, for a numeric column, you could collapse it using SUM() so the one row would be the total). If you just want one random row though, don't specify any aggregate functions. MySQL will then just choose one row, provided the ONLY_FULL_GROUP_BY SQL mode is disabled (it has been enabled by default since MySQL 5.7.5, in which case this query is rejected; note that this behavior is specific to MySQL, and most databases always require you to specify aggregates when you group). The way it chooses the row is not technically "random", but it is not necessarily predictable to you. I guess by "random" you really just mean "any row will do".
Let's assume you have the following tables:
tbl1:
|Name |
-------
|Name1|
|Name2|
tbl2:
|Name |Value |
--------------
|Name1|Value1|
|Name1|Value2|
|Name3|Value1|
For your LEFT JOIN you'll get:
|tbl1.Name|tbl2.Name|Value |
----------------------------
|Name1 | Name1 |Value1|
|Name1 | Name1 |Value2|
|Name2 | NULL | NULL |
So, LEFT JOIN means that all records from LEFT (first) table will be returned regardless of their presence in right table.
For your question you need to specify some specific fields instead of using "*" and add GROUP BY tbl1.Name - so your query will look like
select tbl1.Name, SOME_AGGREGATE_FUNCTION(tbl2.specific_field), ...
from table1 tbl1
left join table2 tbl2 on tbl1.name = tbl2.name
GROUP BY tbl1.Name
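Using the tbl1/tbl2 sample rows from this answer, here is a sketch of the grouped form with concrete aggregates, run through Python's sqlite3 module. COUNT and MIN are just my picks to stand in for SOME_AGGREGATE_FUNCTION.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (Name TEXT);
    CREATE TABLE tbl2 (Name TEXT, Value TEXT);
    INSERT INTO tbl1 VALUES ('Name1'), ('Name2');
    INSERT INTO tbl2 VALUES ('Name1', 'Value1'), ('Name1', 'Value2'),
                            ('Name3', 'Value1');
""")

# Plain LEFT JOIN: Name1 appears twice, Name2 once with NULLs.
joined = conn.execute("""
    SELECT tbl1.Name, tbl2.Value
    FROM tbl1 LEFT JOIN tbl2 ON tbl1.Name = tbl2.Name
    ORDER BY tbl1.Name, tbl2.Value
""").fetchall()

# Grouped: exactly one row per tbl1.Name, with explicit aggregates
# describing how the tbl2 side collapses.
grouped = conn.execute("""
    SELECT tbl1.Name, COUNT(tbl2.Name) AS matches, MIN(tbl2.Value) AS first_value
    FROM tbl1 LEFT JOIN tbl2 ON tbl1.Name = tbl2.Name
    GROUP BY tbl1.Name
    ORDER BY tbl1.Name
""").fetchall()

print(joined)
print(grouped)
```

COUNT(tbl2.Name) counts only non-NULL values, so an unmatched tbl1 row reports 0 matches rather than 1.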
One way to do this is by using the power of SQL DISTINCT. DISTINCT applies to the whole select list, so select only the table1 columns:
select distinct tbl1.*
from table1 tbl1
left join table2 tbl2 on tbl2.name = tbl1.name
where
....................
Please note that I am also using aliasing.
If the name column is not unique in the tables then you may simply have duplicates on table2.
Try running:
select * from table2 where name not in (select name from table1);
If you get no results back, then duplicates in the name column are the reason for the extra rows coming back.
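A more direct way to look for duplicates is to count rows per name. The sketch below uses Python's sqlite3 module and hypothetical data; the GROUP BY ... HAVING query is a complementary check of my own, not the query from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT);
    CREATE TABLE table2 (name TEXT);
    INSERT INTO table1 VALUES ('a'), ('b');
    INSERT INTO table2 VALUES ('a'), ('a'), ('a'), ('b');
""")

# Names that occur more than once in table2, with their counts.
dupes = conn.execute("""
    SELECT name, COUNT(*) AS n
    FROM table2
    GROUP BY name
    HAVING COUNT(*) > 1
    ORDER BY name
""").fetchall()

# Each table1 row is repeated once per matching table2 row.
join_count = conn.execute("""
    SELECT COUNT(*)
    FROM table1 LEFT JOIN table2 ON table1.name = table2.name
""").fetchone()[0]

print(dupes)
print(join_count)
```

Here table1 has 2 rows but the join returns 4, because 'a' matches three table2 rows.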
Duplication may be the reason. See the example in this post:
https://alexpetralia.com/posts/2017/7/19/more-dangerous-subtleties-of-joins-in-sql
If you want to join the single latest/earliest related row from the right table, you can limit the join data using the min/max primary key and then reduce it to one row per name using GROUP BY, like this:
SELECT * FROM table1
LEFT JOIN (SELECT name, max(tbl2_primary_col), {table2.etc} FROM table2 GROUP BY name) AS tbl2
ON table1.name = tbl2.name
WHERE {condition_for_table1}
And remember don't use * for left join because it will disable min/max and always return first row.
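One concrete way to get the whole latest row is to compute the max key per name in a derived table and then join back to table2 on that key. This is a variation on the query above, not the same query, and the id and note columns are hypothetical. The sketch runs through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT, note TEXT);
    INSERT INTO table1 VALUES ('a'), ('b');
    INSERT INTO table2 VALUES (1, 'a', 'old'), (2, 'a', 'new');
""")

# Step 1 (derived table): the highest id per name.
# Step 2: join back to table2 on that id to fetch the full latest row.
latest_rows = conn.execute("""
    SELECT table1.name, t2.note
    FROM table1
    LEFT JOIN (SELECT name, MAX(id) AS max_id
               FROM table2
               GROUP BY name) AS latest
           ON table1.name = latest.name
    LEFT JOIN table2 AS t2
           ON t2.id = latest.max_id
    ORDER BY table1.name
""").fetchall()

print(latest_rows)
```

Every table1 row appears exactly once: 'a' with its newest table2 row and 'b' with NULL because it has no match.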
As per your comment "A random row from table2, as long as name from table1 matches name from table2", you can use the following:
select table1.name, (select top 1 somecolumn from table2 where table2.name = table1.name)
from table1
Note that top 1 is SQL Server syntax, not MySQL; in MySQL, put LIMIT 1 at the end of the subquery instead.
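For reference, a sketch of the same idea written with LIMIT 1, which both MySQL and SQLite accept. It is run here through Python's sqlite3 module with hypothetical rows; somecolumn is the placeholder name from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT);
    CREATE TABLE table2 (name TEXT, somecolumn TEXT);
    INSERT INTO table1 VALUES ('a'), ('b');
    INSERT INTO table2 VALUES ('a', 'x'), ('a', 'y');
""")

# Correlated scalar subquery: at most one table2 value per table1 row.
# Without an ORDER BY inside the subquery, which matching row is picked
# is not guaranteed.
rows = conn.execute("""
    SELECT table1.name,
           (SELECT somecolumn
            FROM table2
            WHERE table2.name = table1.name
            LIMIT 1) AS somecolumn
    FROM table1
    ORDER BY table1.name
""").fetchall()

print(rows)
```

Because the subquery sits in the select list, the result always has exactly one row per table1 row, with NULL where table2 has no match.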