Natural join works but not with all values - mysql

I can't understand what's happening...
I run two SQL queries which do not return the same thing...
With this one:
SELECT * FROM table1 t1 JOIN table1 t2 on t1.attribute1 = t2.attribute1
I get 10 rows.
With this other one:
SELECT * FROM table1 NATURAL JOIN table1
I get 8 rows.
With the NATURAL JOIN, 2 rows aren't returned... I looked for the missing rows, and they have the same values in the attribute1 column...
This seems impossible to me.
If anyone has an answer I could sleep better ^^
Best regards
Max

As was pointed out in the comments, the reason you are getting a different row count is that the natural join is connecting your self join using all columns. All columns are being compared because the same table appears on both sides of the join. To test this hypothesis, just check the column values from both tables, which should all match.
The moral of the story here is to avoid natural joins. Besides not being clear as to the join condition, the logic of the join could easily change should table structure change, e.g. if a new column gets added.
Follow the link below for a small demo which tries to reproduce your current results. In a table of 8 records, the natural join returns 8 records, whereas the inner join on one attribute returns 10 records because some rows share the same attribute value and match each other.
Demo
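The row counts described in this answer can be reproduced with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the id column and the sample rows are invented for illustration (the question does not show the real table):

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the id column and the rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, attribute1 TEXT)")
# Eight rows; the last two share the same attribute1 value.
rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"),
        (5, "e"), (6, "f"), (7, "g"), (8, "g")]
conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)

# Join on one column: the two "g" rows match themselves and each other.
inner_count = conn.execute(
    "SELECT COUNT(*) FROM table1 t1 JOIN table1 t2 "
    "ON t1.attribute1 = t2.attribute1"
).fetchone()[0]

# Natural self-join: every shared column (id AND attribute1) must be equal.
natural_count = conn.execute(
    "SELECT COUNT(*) FROM table1 t1 NATURAL JOIN table1 t2"
).fetchone()[0]

print(inner_count, natural_count)
```

The explicit join counts the cross matches between the two rows that share an attribute1 value, while the natural join only pairs each row with itself.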

You need to 'project away' the attribute you don't want used in the join e.g. in a derived table (dt):
SELECT *
FROM table1
NATURAL JOIN ( SELECT attribute1 FROM table1 ) dt;
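As a rough check of the derived-table idea, the sketch below compares it with an explicit join on attribute1. It assumes Python's sqlite3 module in place of MySQL, and the id column and rows are made up for the example:

```python
import sqlite3

# SQLite in memory as a stand-in for MySQL; id and the rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, attribute1 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "b")])

# The derived table exposes only attribute1, so that is the only shared column.
projected_count = conn.execute(
    "SELECT COUNT(*) FROM table1 "
    "NATURAL JOIN (SELECT attribute1 FROM table1) dt"
).fetchone()[0]

# Explicit join on the same single column, for comparison.
explicit_count = conn.execute(
    "SELECT COUNT(*) FROM table1 t1 JOIN table1 t2 "
    "ON t1.attribute1 = t2.attribute1"
).fetchone()[0]

print(projected_count, explicit_count)
```

With only attribute1 visible on the right-hand side, the natural join behaves like the explicit single-column join.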

Related

MySQL INNER JOIN - how to add extra IF statement?

I just learned how to use JOINS (so please be gentle ;) ), and wrote this query:
SELECT 1
FROM `T1`
INNER JOIN `T2` ON `T1`.`T1_ID` = `T2`.`REQUIREMENT`
WHERE `T2`.`T1_ID` = XXX
AND `T1`.`STATEMENT` = YYY
AND `T2`.`REQUIREMENT` != 0 -- last row does not work as intended!
It works perfectly without the last condition: if T1_ID from T1 matches REQUIREMENT from T2 for the given T1_ID (XXX), it works. I also wanted STATEMENT from T1 to match (YYY), and that still works.
But then I realized that I need to handle one more case: when T2.REQUIREMENT is equal to 0, I want this query to return 1 regardless of the JOIN condition. The problem is that if T2.REQUIREMENT = 0, I know for sure that there will not be any T1.T1_ID entry that matches the JOIN condition. So I understand why this last condition cannot work the way I'd like it to.
What I need is some kind of IF statement. Something that would work like:
SELECT 1
IF (`T2`.`REQUIREMENT`!=0) //if true, don't even go to join, and return 1
OR (my previous join query)
The thing is that I have no idea how to implement such an IF statement in MySQL.
Any ideas? Thanks.
Sample data:
T1:
id  STATEMENT   T1_ID
1   irrelevant  1
2   irrelevant  5
T2:
id  T1_ID  REQUIREMENT
1   1      0
2   2      0
3   3      1
4   4      3
5   5      4
6   6      5
7   7      6
Such a setup should return 1 for T1_ID equal to 1, 2, 3, 6.
In addition, if it's even possible in a single query, I'd like it to return 1 as well even if T1 is empty, for all T2.REQUIREMENT=0 - in this case for T1_ID equal to 1, 2.
Just FYI, good start on your post and example... Always provide tableName.columnName (or alias.columnName) references to prevent ambiguity, since others don't know your table structure. Also, you only really need the backticks for things like reserved words or column names that contain spaces (which I never like anyhow).
From my reading of your question and sample tables T1 and T2... T1 appears to be some lookup table that has IDs and descriptions associated with those IDs. Your T2 table appears to be your detail/transaction table, and it may or may not have an actual requirement, hence your desire to always include those records without a specific requirement.
If this is the case, it sounds like you want "all detail records that have some condition REGARDLESS of a matched requirement ID as found in the lookup table." If this is accurate, you would be looking for
Select
      T2.T1_ID,
      coalesce( T1.Statement, '' ) StatementFromT1Table
   from
      SomeMainTable T2
         LEFT JOIN SomeLookupTable T1
            on T2.T1_ID = T1.T1_ID
   where
          T2.T1_ID = SomeIdParameterValue
      AND (   T1.T1_ID is NULL
           OR T1.Statement = SomeOtherParameterValue )
For joins, I always try to list the left-side table first, then indent the right-side table, and write the ON condition as left.column = right.column, so you always see the relationship and how table A gets to table B (nesting further as other joins come into play).
The difference between (INNER) JOIN and LEFT JOIN is that (INNER) JOIN REQUIRES a record to match in both tables. LEFT JOIN means I want everything from the table on the left side REGARDLESS of an actual match in the right table.
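To make the INNER JOIN versus LEFT JOIN distinction concrete, here is a small sketch. It assumes Python's sqlite3 module as a stand-in for MySQL and uses the first three T2 rows from the question's sample data, joined on T1_ID as in the query above:

```python
import sqlite3

# SQLite in memory stands in for MySQL; rows come from the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (id INTEGER, STATEMENT TEXT, T1_ID INTEGER)")
conn.execute("CREATE TABLE T2 (id INTEGER, T1_ID INTEGER, REQUIREMENT INTEGER)")
conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)",
                 [(1, "irrelevant", 1), (2, "irrelevant", 5)])
conn.executemany("INSERT INTO T2 VALUES (?, ?, ?)",
                 [(1, 1, 0), (2, 2, 0), (3, 3, 1)])

# INNER JOIN keeps only T2 rows that have a partner in T1.
inner_rows = conn.execute(
    "SELECT T2.T1_ID, T1.STATEMENT FROM T2 "
    "INNER JOIN T1 ON T2.T1_ID = T1.T1_ID ORDER BY T2.T1_ID"
).fetchall()

# LEFT JOIN keeps every T2 row; missing T1 columns come back as NULL (None).
left_rows = conn.execute(
    "SELECT T2.T1_ID, T1.STATEMENT FROM T2 "
    "LEFT JOIN T1 ON T2.T1_ID = T1.T1_ID ORDER BY T2.T1_ID"
).fetchall()

print(inner_rows)
print(left_rows)
```

Only T1_ID 1 has a partner in T1, so the inner join returns a single row, while the left join returns all three T2 rows with None in place of the missing STATEMENT.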
So at this point, I get everything from the T2 alias first, regardless of what is found in the T1 alias. Now, how to deal with the zero REQUIREMENT value: if zero indicates you KNOW there won't be a match in T1, then you can just say you want all records on the T2 side where the T1 side IS NULL... But you also care about a specific statement, hence the OR within the parenthesized test.
The first part of the where is if you were specifically looking for all things of a T1_ID = some value, so that is a primary parameter and only applicable to the T2 side... you are qualifying the T1 side via the AND ( null or other equality test ) condition.
If you have some confidential data, it is OK to randomize it or provide sample data, but real table names and context will help us better understand what you are trying to accomplish. Ambiguous table and column names make the problem harder to follow, and with a better understanding we might find better query solutions.
If I am close and you need additional clarification, please advise and/or edit your original post with additional sample data and final output... such as parameters being filtered for too.
POST CLARIFICATION.
Per your comments, here are the clarifications...
The "SomeMainTable T2" is actually a breakdown of the actual table name within your database and "T2" is the ALIAS reference. Imagine your table name is "SomethingReallyLongDetail". Would you prefer to write your query something like
select
SomethingReallyLongDetail.Column1,
SomethingReallyLongDetail.Column7,
SomethingReallyLongDetail.Column20
from
SomethingReallyLongDetail
OR...
select
SRL.Column1,
SRL.Column7,
SRL.Column20
from
SomethingReallyLongDetail SRL
In this case, I used an alias "SRL" to more easily correlate to the table name as an acronym / abbreviation vs having to type the long value over and over, then have more chance of typing mistakes. Simply for readability providing the "alias" reference within the query. So, I did not know your ACTUAL table name, so I made it up but using the "T1" and "T2" references to stay in-line with your original post.
Next, COALESCE(). Since this query does a LEFT JOIN, the right-side table may or may not have a matching record for the ID, which, as you know, might not always exist. Since I was trying to pull the "Statement" column from that second table (alias T1), that description could be NULL, which you probably would not want to show in any sort of output. To prevent that, COALESCE() says: give me the value of the first parameter in the list, and if that value is NULL, give me the second value. In this case the second value is just an empty string.
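A quick way to see COALESCE() in action is the sketch below. It assumes Python's sqlite3 module rather than MySQL (COALESCE is standard SQL and available in both), and the literal values are only examples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite used here as a stand-in for MySQL

# First argument is NULL, so the fallback (an empty string) is returned.
fallback_used = conn.execute("SELECT COALESCE(NULL, '')").fetchone()[0]

# First argument is not NULL, so it is returned unchanged.
value_kept = conn.execute(
    "SELECT COALESCE('some statement', '')"
).fetchone()[0]

print(repr(fallback_used), repr(value_kept))
```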
Parameters in the query. Your original query had reference to XXX and YYY such as you knew of a specific T1.ID value you wanted to narrow down to pulling out, but a different value YYY as being part of the statement. So the place where you had an "XXX", I just put a place-holder here for you to apply/put any value you were specifically looking for. Similarly for your "YYY" value, another place-holder for that. Just substitute whatever criteria you were looking for.
Finally that AND part of the where clause. This is for the condition of the LEFT-JOIN. Since you KNOW that not all records will have a match in the "T1" secondary table, with the LEFT JOIN, the ID will be found and HAVE a value, or there will not be a value and thus NULL.
If there is no matching record, you would never be able to compare some string, int, date, whatever to a column as it would be null. So I am doing
(T1.T1_ID IS NULL -- as in there was no match
OR T1.Statement = SomeOtherParameterValue ) -- there WAS a match, and I only want where the statement equals a given value.
Per your comments and example results, your query SHOULD be simplified to...
Select
T2.ID,
T2.T1_ID,
T2.Requirement,
coalesce( T1.Statement, '' ) StatementFromT1Table
from
T2
LEFT JOIN T1
on T2.Requirement = T1.T1_ID
where
T2.Requirement = 0
OR T1.T1_ID IS NOT NULL
In your case, the final answer is... I want all records where there is no requirement (thus = 0) OR the record DOES have a match in the T1 table (thus T1.T1_ID IS NOT NULL)
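The simplified query can be checked against the sample data from the question. This is a sketch that assumes Python's sqlite3 module in place of MySQL; the helper name t1_ids_returned is made up for the example:

```python
import sqlite3

T2_ROWS = [(1, 1, 0), (2, 2, 0), (3, 3, 1), (4, 4, 3),
           (5, 5, 4), (6, 6, 5), (7, 7, 6)]

def t1_ids_returned(t1_rows):
    """Run the simplified LEFT JOIN query and return the matching T2.T1_ID values."""
    conn = sqlite3.connect(":memory:")  # stand-in for MySQL
    conn.execute("CREATE TABLE T1 (id INTEGER, STATEMENT TEXT, T1_ID INTEGER)")
    conn.execute("CREATE TABLE T2 (id INTEGER, T1_ID INTEGER, REQUIREMENT INTEGER)")
    conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)", t1_rows)
    conn.executemany("INSERT INTO T2 VALUES (?, ?, ?)", T2_ROWS)
    query = """
        SELECT T2.T1_ID
        FROM T2
        LEFT JOIN T1 ON T2.REQUIREMENT = T1.T1_ID
        WHERE T2.REQUIREMENT = 0        -- no requirement at all
           OR T1.T1_ID IS NOT NULL      -- requirement found in T1
        ORDER BY T2.T1_ID
    """
    return [row[0] for row in conn.execute(query)]

with_sample_t1 = t1_ids_returned([(1, "irrelevant", 1), (2, "irrelevant", 5)])
with_empty_t1 = t1_ids_returned([])
print(with_sample_t1, with_empty_t1)
```

With the sample T1 this yields the T1_ID values 1, 2, 3 and 6 that the question asks for, and with an empty T1 only the two zero-requirement rows remain.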
I am thinking that you want a LEFT JOIN:
SELECT 1
FROM `T2` LEFT JOIN
`T1`
ON `T1`.`T1_ID` = `T2`.`REQUIREMENT`
WHERE (`T2`.`REQUIREMENT` <> 0) OR
(`T1`.`STATEMENT` = YYY AND `T2`.`T1_ID` = XXX);
This returns rows for all T2 values where REQUIREMENT != 0. It also returns the rows generated by the JOIN. Of course, 1 is not very descriptive, so you won't be able to tell which rows are which.
Your question would be much easier to follow with sample data and desired results.
if T2.REQUIREMENT=0 I know for sure, that there will not be any
T1.T1_ID entry that will match the JOIN requirements
So in order to get 1 returned when T2.REQUIREMENT=0, the join must match this condition too:
SELECT 1
FROM `T1` INNER JOIN `T2`
ON `T1`.`T1_ID` = `T2`.`REQUIREMENT` OR `T2`.`REQUIREMENT`=0
WHERE `T2`.`T1_ID`=XXX
AND `T1`.`STATEMENT`=YYY
Edit:
or just append 1s with UNION for all rows that have T2.REQUIREMENT=0:
SELECT 1
FROM `T1` INNER JOIN `T2`
ON `T1`.`T1_ID` = `T2`.`REQUIREMENT`
WHERE `T2`.`T1_ID`=XXX
AND `T1`.`STATEMENT`=YYY
UNION ALL
SELECT 1
FROM `T2`
WHERE `T2`.`REQUIREMENT`=0
This will work even if T1 is empty.
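The UNION ALL variant can be tried against the question's sample data as follows. The sketch assumes Python's sqlite3 module instead of MySQL, and the helper name ones_returned as well as the chosen XXX and YYY values are only for illustration:

```python
import sqlite3

T2_ROWS = [(1, 1, 0), (2, 2, 0), (3, 3, 1), (4, 4, 3),
           (5, 5, 4), (6, 6, 5), (7, 7, 6)]

def ones_returned(t1_rows, xxx, yyy):
    """Return the list of 1s produced by the UNION ALL query."""
    conn = sqlite3.connect(":memory:")  # stand-in for MySQL
    conn.execute("CREATE TABLE T1 (id INTEGER, STATEMENT TEXT, T1_ID INTEGER)")
    conn.execute("CREATE TABLE T2 (id INTEGER, T1_ID INTEGER, REQUIREMENT INTEGER)")
    conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)", t1_rows)
    conn.executemany("INSERT INTO T2 VALUES (?, ?, ?)", T2_ROWS)
    query = """
        SELECT 1
        FROM T1 INNER JOIN T2 ON T1.T1_ID = T2.REQUIREMENT
        WHERE T2.T1_ID = ? AND T1.STATEMENT = ?
        UNION ALL
        SELECT 1
        FROM T2
        WHERE T2.REQUIREMENT = 0
    """
    return [row[0] for row in conn.execute(query, (xxx, yyy))]

with_sample_t1 = ones_returned([(1, "irrelevant", 1), (2, "irrelevant", 5)],
                               3, "irrelevant")
with_empty_t1 = ones_returned([], 3, "irrelevant")
print(with_sample_t1, with_empty_t1)
```

Note that the second branch is not filtered by T1_ID, so it contributes one row for every zero-requirement row in T2 regardless of XXX.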

MySQL: Select query with many AND/OR conditions

I have a very large dataset, and I am trying to do a LEFT outer join, but keep losing some of my left table rows because of where I place my WHERE command, similar to the problem and solution here.
Example 1
SELECT *
FROM table1
LEFT JOIN table2
USING (IDvar)
WHERE table2.var IN(val1, val2,..., val100);
This only selects the rows in the first/left table (table1) that have a matching row in the second/right table (table2). The second example is what is likely to work:
Example 2
SELECT *
FROM table1
LEFT JOIN table2
USING (IDvar)
AND (table2.var = val1 OR table2.var = val2);
But I have around 200 table2.var values that I would like to include, which are sporadic and non-continuous (I can't use syntax like table2.var >= val1).
An example of what I thought should work is to use "AND" and "IN" such as (because I have the values as a comma-separated list):
Example 3
SELECT *
FROM table1
LEFT JOIN table2
USING (IDvar)
AND table2.var IN(val1, val2,..., val100);
So how can I get many many values into an AND command?
I've found a working solution, but it takes way too long to run.
Example 4 - Working Example but takes too long
SELECT *
FROM table1
LEFT JOIN (SELECT IDvar, var FROM table2 WHERE var IN(val1, val2,..., val100)) AS t
USING (IDvar);
Is there any way of optimising this query, it is taking way too long?
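One way to keep every left-table row is to spell the join out with ON instead of USING, so the IN list can sit inside the join condition. The sketch below assumes Python's sqlite3 module as a stand-in for MySQL; the rows and the wanted list are invented, with the list standing in for the roughly 200 real values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("CREATE TABLE table1 (IDvar INTEGER)")
conn.execute("CREATE TABLE table2 (IDvar INTEGER, var TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [(1, "val1"), (2, "other")])

wanted = ["val1", "val2"]  # hypothetical stand-in for the long value list
placeholders = ", ".join("?" for _ in wanted)

# Filter in WHERE: unmatched left rows carry var = NULL and are thrown away.
where_rows = conn.execute(
    "SELECT IDvar, table2.var FROM table1 "
    "LEFT JOIN table2 USING (IDvar) "
    f"WHERE table2.var IN ({placeholders}) ORDER BY IDvar",
    wanted,
).fetchall()

# Filter in ON: the condition only decides which table2 rows may attach.
on_rows = conn.execute(
    "SELECT table1.IDvar, table2.var FROM table1 "
    "LEFT JOIN table2 ON table1.IDvar = table2.IDvar "
    f"AND table2.var IN ({placeholders}) ORDER BY table1.IDvar",
    wanted,
).fetchall()

print(where_rows)
print(on_rows)
```

In this form the length of the IN list does not change the shape of the query. Whether it runs faster than the derived-table version on a large dataset depends on the indexes available, which this sketch does not measure.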

MySQL query for finding rows that are in one table but not another

Let's say I have about 25,000 records in two tables and the data in each should be the same. If I need to find any rows that are in table A but NOT in table B, what's the most efficient way to do this?
We've tried it as a subquery of one table and a NOT IN the result but this runs for over 10 minutes and almost crashes our site.
There must be a better way. Maybe a JOIN?
Hopefully a LEFT OUTER JOIN will do the job:
select t1.similar_ID
, case when t2.similar_ID is not null then 1 else 0 end as row_exists
from table1 t1
left outer join (select distinct similar_ID from table2) t2
on t1.similar_ID = t2.similar_ID -- your WHERE goes here
I would suggest you read the following blog post, which goes into great detail on this question:
Which method is best to select values present in one table but missing in another one?
And after a thorough analysis, arrives at the following conclusion:
However, these three methods [NOT IN, NOT EXISTS, LEFT JOIN]
generate three different plans which are executed by three different
pieces of code. The code that executes EXISTS predicate is about 30%
less efficient than those that execute index_subquery and LEFT JOIN
optimized to use Not exists method.
That's why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.
If the performance you're seeing with NOT IN is not satisfactory, you won't improve this performance by switching to a LEFT JOIN / IS NULL or NOT EXISTS, and instead you'll need to take a different route to optimizing this query, such as adding indexes.
Use NOT EXISTS with a correlated subquery instead:
Select * from A where not exists(select 1 from B where B.id = A.id); -- id is a placeholder for the key column the two tables share
Use a LEFT JOIN. From the MySQL documentation:
If there is no matching row for the right table in the ON or USING
part in a LEFT JOIN, a row with all columns set to NULL is used for
the right table. You can use this fact to find rows in a table that
have no counterpart in another table:
SELECT left_tbl.* FROM left_tbl LEFT JOIN right_tbl ON left_tbl.id =
right_tbl.id WHERE right_tbl.id IS NULL;
This example finds all rows in left_tbl with an id value that is not
present in right_tbl (that is, all rows in left_tbl with no
corresponding row in right_tbl).
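The three approaches discussed in this thread (LEFT JOIN / IS NULL, NOT EXISTS, NOT IN) can be compared side by side in a small sketch. It assumes Python's sqlite3 module instead of MySQL, tables named A and B with a single id key, and made-up rows. It shows only that the results agree, not how fast each form is:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
# PRIMARY KEY gives both tables an index on the compared column.
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE B (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO A VALUES (?)", [(i,) for i in range(1, 6)])
conn.executemany("INSERT INTO B VALUES (?)", [(1,), (2,), (4,)])

queries = {
    "left_join": "SELECT A.id FROM A LEFT JOIN B ON A.id = B.id "
                 "WHERE B.id IS NULL ORDER BY A.id",
    "not_exists": "SELECT A.id FROM A WHERE NOT EXISTS "
                  "(SELECT 1 FROM B WHERE B.id = A.id) ORDER BY A.id",
    "not_in": "SELECT A.id FROM A WHERE A.id NOT IN "
              "(SELECT id FROM B) ORDER BY A.id",
}

# Each query should list the ids present in A but missing from B.
results = {name: [row[0] for row in conn.execute(sql)]
           for name, sql in queries.items()}
print(results)
```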

Case statement in LEFT OUTER JOIN slowing the query in SQL Server 2008

I am having a problem with my LEFT OUTER JOIN. I have a set of queries which gives me about 80,000 to 100,000 records in a #Temp table. Now, when I LEFT OUTER JOIN this #Temp table with another table, I have to put a CASE statement in the join condition, i.e. if no records are found when joining on a particular column, take that column's value and find its corresponding value in another table which has the matching records. The query works fine for a small amount of data, but for larger data it just keeps executing or takes too much time. My query is like:
SELECT * FROM #Temp
LEFT OUTER JOIN TABLE1 ON #Temp.Materialcode =
CASE WHEN TABLE1.MaterialCode LIKE 'HY%'
THEN TABLE1.MaterialCode
ELSE REPLACE(TABLE1.MaterialCode,
TABLE1.MaterialCode,
(SELECT NewMaterialCode
FROM TABLE2
WHERE OldMaterialCode = TABLE1.MaterialCode))
END
Here TABLE2 has only two columns, NewMaterialCode and OldMaterialCode. What I have to do is: if the material code in TABLE1 is not of the LIKE 'HY%' type, take that material code and look up its corresponding NewMaterialCode in TABLE2, so that I get both kinds of records, 'HY' type and non-'HY' type. I hope I have made my problem clear. Any help would be greatly appreciated.
SELECT *
FROM #TEMP TMP
LEFT JOIN Table1 MATERIAL
ON TMP.MaterialCode = MATERIAL.MaterialCode
LEFT JOIN Table2 REPLACEMENT
ON MATERIAL.MaterialCode = REPLACEMENT.OldMaterialCode
WHERE ( COALESCE(MATERIAL.materialcode, '') LIKE 'HY%'
AND TMP.materialCode = MATERIAL.MaterialCode
)
OR MATERIAL.MaterialCode = REPLACEMENT.NewMaterialCode
I think this should do what you're trying to do, but I don't really know how the tables are related except by reverse-engineering your query.
For the record, the OUTER JOIN in your query isn't accomplishing a thing, because an outer condition would produce NULL values for the columns in TABLE1, and the CASE condition wouldn't work (a NULL is not a match for 'HY%', and the ELSE branch then yields NULL as well, which never compares equal to anything). That's counter-intuitive to those not used to working in the three-valued logic of the database world, but that's why we have COALESCE and ISNULL.
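The three-valued logic mentioned here can be observed directly. The sketch assumes Python's sqlite3 module as a stand-in for SQL Server (the NULL handling shown is standard SQL behaviour), and the 'HY100' literal is just an example value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for SQL Server

def scalar(sql):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]

# Comparisons involving NULL are neither true nor false; they are NULL (None).
null_like = scalar("SELECT NULL LIKE 'HY%'")
null_equals = scalar("SELECT NULL = 'HY100'")

# COALESCE swaps the NULL for a real value, so the test gives a definite answer.
coalesced_like = scalar("SELECT COALESCE(NULL, '') LIKE 'HY%'")
real_like = scalar("SELECT 'HY100' LIKE 'HY%'")

print(null_like, null_equals, coalesced_like, real_like)
```

SQLite reports true and false as 1 and 0, and an unknown result as NULL, which Python shows as None.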

MySQL JOIN tables with WHERE clause

I need to gather posts from two mysql tables that have different columns and provide a WHERE clause to each set of tables. I appreciate the help, thanks in advance.
This is what I have tried...
SELECT
blabbing.id,
blabbing.mem_id,
blabbing.the_blab,
blabbing.blab_date,
blabbing.blab_type,
blabbing.device,
blabbing.fromid,
team_blabbing.team_id
FROM
blabbing
LEFT OUTER JOIN
team_blabbing
ON team_blabbing.id = blabbing.id
WHERE
team_id IN ($team_array) ||
mem_id='$id' ||
fromid='$logOptions_id'
ORDER BY
blab_date DESC
LIMIT 20
I know that this is messy, but I'll admit, I am no MySQL veteran. I'm a beginner at best... Any suggestions?
You could put the where-clauses in subqueries:
select
*
from
(select * from ... where ...) as alias1 -- this is a subquery
left outer join
(select * from ... where ...) as alias2 -- this is also a subquery
on
....
order by
....
Note that MySQL versions before 5.7.7 do not allow subqueries like this in a view definition.
You could also combine the where-clauses, as in your example. Use table aliases to distinguish between columns of different tables (it's a good idea to use aliases even when you don't have to, just because it makes things easier to read). Example:
select
*
from
<table> as alias1
left outer join
<othertable> as alias2
on
....
where
alias1.id = ... and alias2.id = ... -- aliases distinguish between ids!!
order by
....
Two suggestions for you, since you are relatively new to SQL. Use "aliases" for your tables to help reduce SuperLongTableNameReferencesForColumns, and always qualify the column names in a query. It makes your life easier and helps anyone AFTER you know which columns come from which table, especially if the same column name exists in different tables. It prevents ambiguity in the query. Your left join, judging from the sample, may be ambiguous; can you confirm the join of B.ID to TB.ID? Typically a "Team_ID" would appear once in a teams table, and each blabbing entry could carry the "Team_ID" that the posting came from, in addition to its OWN "ID" as the blabbing table's unique key.
SELECT
B.id,
B.mem_id,
B.the_blab,
B.blab_date,
B.blab_type,
B.device,
B.fromid,
TB.team_id
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
WHERE
TB.Team_ID IN ( you can't do a direct $team_array here )
OR B.mem_id = SomeParameter
OR B.fromid = AnotherParameter
ORDER BY
B.blab_date DESC
LIMIT 20
Where you were trying the $team_array, you would have to build out the full list as expected, such as
TB.Team_ID IN ( 1, 4, 18, 23, 58 )
Also, use the SQL keyword "OR" rather than the logical "||" operator.
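Building out the IN list can be done with one placeholder per value, so the values themselves are never pasted into the SQL text. The question uses PHP; the sketch below shows the same idea with Python's sqlite3 module as a stand-in, and the table contents and team_ids list are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("CREATE TABLE team_blabbing (id INTEGER, team_id INTEGER)")
conn.executemany("INSERT INTO team_blabbing VALUES (?, ?)",
                 [(1, 1), (2, 4), (3, 7), (4, 18)])

team_ids = [1, 4, 18, 23, 58]  # hypothetical contents of the team array

# One "?" per value gives: IN (?, ?, ?, ?, ?)
placeholders = ", ".join("?" for _ in team_ids)
sql = (f"SELECT id FROM team_blabbing "
       f"WHERE team_id IN ({placeholders}) ORDER BY id")

matching_ids = [row[0] for row in conn.execute(sql, team_ids)]
print(sql)
print(matching_ids)
```

Binding the values as parameters also avoids the quoting problems that come with interpolating variables such as $id directly into the query string.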
EDIT -- per your comment
This could be done in a variety of ways, such as dynamic SQL building and executing, calling multiple times, once for each ID and merging the results, or additionally, by doing a join to yet another temp table that gets cleaned out say... daily.
If you have another table such as "TeamJoins", with say... 3 columns: a date, a sessionid and team_id, you could purge daily any rows more than a day old, and/or clear the rows each time a new query comes in for the same session ID (as it appears to be coming from PHP). Have two indexes, one on the date (to simplify the daily purging), and a second on (sessionID, team_id) for the join.
Then, loop through to do inserts into the "TeamJoins" table with the simple elements identified.
THEN, instead of a hard-coded list IN, you could change that part to
...
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
LEFT JOIN TeamJoins TJ
on TB.Team_ID = TJ.Team_ID
WHERE
TJ.Team_ID IS NOT NULL
OR B.mem_id ... rest of query
What I ended up doing is this:
I added an extra column to my blabbing table called team_id and set it to null as well as another field in my team_blabbing table called mem_id
Then I changed the insert script to also insert a value to the mem_id in team_blabbing.
After doing this I did a simple UNION ALL in the query:
SELECT
*
FROM
blabbing
WHERE
mem_id='$id' OR
fromid='$logOptions_id'
UNION ALL
SELECT
*
FROM
team_blabbing
WHERE
team_id
IN
($team_array)
ORDER BY
blab_date DESC
LIMIT 20
I am open to any thoughts on what I did. Try not to be too harsh though :) Thanks again for all the info.