SQL conditional JOIN - JOIN point defined based on joined_table conditions - mysql

Scenario:
In each of the tables below, "created_date", "other_created_date" and "date" hold a day (e.g. 2012-01-03)
Table 1:
fields:
ID | created_date
Table 2:
fields:
ID | table_1_fk | other_created_date
Table 3:
fields:
date
The Goal:
I want to do the following:
SELECT * FROM table_1
JOIN table_2
ON table_1.id = table_2.table_1_fk
FULL OUTER JOIN table_3
ON table_3.date = (
CASE
WHEN table_1.created_date > table_2.other_created_date THEN table_1.created_date
ELSE table_2.other_created_date
END
)
Basically, I'm interested in (table_1 + table_2) joined to table_3: if table_1's date is the later one we join on table_1's date, otherwise we join on table_2's date.
Is this possible or is there a better way to go at this?

SELECT * FROM table_1
JOIN table_2
ON table_1.id = table_2.table_1_fk
FULL OUTER JOIN table_3
ON table_3.date = GREATEST(table_1.created_date,table_2.other_created_date)
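To check that the CASE version and the GREATEST version pick the same join date, here is a minimal sketch. It is not MySQL: it uses Python's built-in sqlite3 module, SQLite's two-argument max() stands in for GREATEST(), a LEFT JOIN stands in for the FULL OUTER JOIN, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id INTEGER PRIMARY KEY, created_date TEXT);
CREATE TABLE table_2 (id INTEGER PRIMARY KEY, table_1_fk INTEGER, other_created_date TEXT);
CREATE TABLE table_3 (date TEXT);
INSERT INTO table_1 VALUES (1, '2012-01-03'), (2, '2012-01-01');
INSERT INTO table_2 VALUES (10, 1, '2012-01-02'), (20, 2, '2012-01-05');
INSERT INTO table_3 VALUES ('2012-01-03'), ('2012-01-05'), ('2012-01-09');
""")

# Join condition written with CASE, as in the question.
case_sql = """
SELECT table_1.id, table_3.date
FROM table_1
JOIN table_2 ON table_1.id = table_2.table_1_fk
LEFT JOIN table_3 ON table_3.date = (
    CASE
        WHEN table_1.created_date > table_2.other_created_date THEN table_1.created_date
        ELSE table_2.other_created_date
    END
)
ORDER BY table_1.id
"""

# Same join, with SQLite's scalar max() playing the role of GREATEST().
max_sql = """
SELECT table_1.id, table_3.date
FROM table_1
JOIN table_2 ON table_1.id = table_2.table_1_fk
LEFT JOIN table_3 ON table_3.date = max(table_1.created_date, table_2.other_created_date)
ORDER BY table_1.id
"""

case_rows = conn.execute(case_sql).fetchall()
max_rows = conn.execute(max_sql).fetchall()
print(case_rows)
print(max_rows)
```

One difference to keep in mind: in MySQL, GREATEST returns NULL if any argument is NULL, whereas the CASE expression falls through to the ELSE branch, so the two only agree when both dates are present.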

I like Bernd's answer. However, without knowing anything about the content of these tables, I think it's worth your while to evaluate the performance difference between doing what you are suggesting and simply having two separate outer joins. I know I've done creative things in the joins before, and the database will manage it, but HOW it manages it may not be what I had in mind at all, especially if dealing with tens of millions of records.
As an example, this is what the SQL might look like if you used two outer joins instead of trying to merge them into one. It will potentially be a lot more code, which is why you will need to benchmark it to see if it matters.
I know I used left joins here -- I'm always a little suspicious when I see a full outer, but that's not to say it wasn't exactly what you wanted. But this is for illustration purposes only:
SELECT
case
when table_1.created_date > table_2.other_created_date then
t3a.<field_1>
else
t3b.<field_1>
end
FROM
table_1
JOIN table_2
ON table_1.id = table_2.table_1_fk
left join table_3 t3a on
t3a.date = table_1.created_date
left join table_3 t3b on
t3b.date = table_2.other_created_date
-- EDIT --
Here's an example of where a compactly-coded join condition had horrible performance and a workaround that was way more code but worth it:
PostgreSQL Joining Between Two Values

Related

Natural join works but not with all values

I can't understand what's happening...
I use two SQL queries that do not return the same thing...
This one:
SELECT * FROM table1 t1 JOIN table1 t2 on t1.attribute1 = t2.attribute1
I get 10 rows
And this other one:
SELECT * FROM table1 NATURAL JOIN table1
I get 8 rows
With the NATURAL JOIN, 2 rows aren't returned... I looked for the missing rows, and they have the same values for the attribute1 column...
That seems impossible to me.
If anyone has an answer I could sleep better ^^
Best regards
Max
As was pointed out in the comments, the reason you are getting a different row count is that the natural join is connecting your self join using all columns. All columns are being compared because the same table appears on both sides of the join. To test this hypothesis, just check the column values from both tables, which should all match.
The moral of the story here is to avoid natural joins. Besides not being clear as to the join condition, the logic of the join could easily change should table structure change, e.g. if a new column gets added.
Follow the link below for a small demo which tried to reproduce your current results. In a table of 8 records, the natural join returns 8 records, whereas the inner join on one attribute returns 10 records due to some duplicate matching.
Demo
You need to 'project away' the attribute you don't want used in the join e.g. in a derived table (dt):
SELECT *
FROM table1
NATURAL JOIN ( SELECT attribute1 FROM table1 ) dt;
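A small sketch that reproduces the 8-versus-10 row counts, using Python's built-in sqlite3 module rather than the asker's database. The id column and the sample values are assumptions made up for the demo; the point is only that the natural self join compares every column, while the projected derived table compares attribute1 alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, attribute1 TEXT)")
# 8 hypothetical rows; only the value 'a' appears twice.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "c"), (5, "d"), (6, "e"), (7, "f"), (8, "g")],
)


def count(sql):
    """Run a COUNT(*) query and return the single number it produces."""
    return conn.execute(sql).fetchone()[0]


# Explicit self join on one attribute: the two 'a' rows match each other too.
inner_count = count(
    "SELECT COUNT(*) FROM table1 t1 JOIN table1 t2 ON t1.attribute1 = t2.attribute1"
)

# Natural self join: every shared column (id AND attribute1) must match.
natural_count = count("SELECT COUNT(*) FROM table1 t1 NATURAL JOIN table1 t2")

# Natural join against a derived table that only exposes attribute1.
projected_count = count(
    "SELECT COUNT(*) FROM table1 NATURAL JOIN (SELECT attribute1 FROM table1) dt"
)

print(inner_count, natural_count, projected_count)
```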

MySQL LEFT JOIN order of ON conditions

I have a MySQL query with inner joins and one left join and a lot of data in my database, and it's running quite slow. This is roughly my query:
SELECT
main_table.*
FROM
main_table
INNER JOIN
...
LEFT JOIN
second_table ON (main_table.id = second_table.ref_id AND second_table.type = 'foo' AND second_table.bar IS NULL)
WHERE
second_table.id IS NULL
;
An entry from main_table may have one or more referenced entries in second_table. I want to get all results from main_table that either have no results in second_table or only have irrelevant data in the second table (type 'foo' or bar is NULL).
Looking at the EXPLAIN output, MySQL checks bar IS NULL first, followed by type = 'foo', which still leaves many thousands of rows, whereas checking ref_id first would leave only a few rows to test the other conditions against.
I only have an index on ref_id, not for type or bar and I don't feel the need to index them if I could just get the query search for ref_id first.
--EDIT: I noticed that the copy of the database (the one with the actual data, where the query runs slowly) also has individual indexes on type and bar, which is probably why MySQL prefers bar over the other keys. I'm considering a key spanning multiple fields.--
Does anybody have an idea how to optimize this kind of query? Is it possible to force MySQL to use a certain order in the ON conditions?
"Solution": I added an index spanning all the relevant fields.
I don't consider this a real solution, because I believe the query would also have been faster if the JOIN had been done on the indexed ref_id first. It probably was when that was the only index; however, my colleague had the idea to add separate indexes on the other fields as well, probably because they are needed somewhere else in our application.
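The LEFT JOIN ... WHERE second_table.id IS NULL pattern above is an anti-join, and the same filter can also be written with NOT EXISTS, which may be worth comparing in EXPLAIN. Below is a minimal sketch of that equivalence, assuming made-up rows and using Python's built-in sqlite3 module instead of MySQL; the elided INNER JOIN from the question is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_table (id INTEGER PRIMARY KEY);
CREATE TABLE second_table (id INTEGER PRIMARY KEY, ref_id INTEGER, type TEXT, bar TEXT);
INSERT INTO main_table VALUES (1), (2), (3);
INSERT INTO second_table VALUES
    (1, 1, 'foo', NULL),  -- satisfies the ON conditions, so main_table 1 is excluded
    (2, 2, 'foo', 'x'),   -- bar is not NULL, so it does not match
    (3, 2, 'baz', NULL);  -- type is not 'foo', so it does not match
""")

# Anti-join as written in the question.
left_join_sql = """
SELECT main_table.id
FROM main_table
LEFT JOIN second_table
    ON main_table.id = second_table.ref_id
   AND second_table.type = 'foo'
   AND second_table.bar IS NULL
WHERE second_table.id IS NULL
ORDER BY main_table.id
"""

# The same filter expressed with NOT EXISTS.
not_exists_sql = """
SELECT main_table.id
FROM main_table
WHERE NOT EXISTS (
    SELECT 1
    FROM second_table
    WHERE second_table.ref_id = main_table.id
      AND second_table.type = 'foo'
      AND second_table.bar IS NULL
)
ORDER BY main_table.id
"""

left_join_ids = [row[0] for row in conn.execute(left_join_sql)]
not_exists_ids = [row[0] for row in conn.execute(not_exists_sql)]
print(left_join_ids, not_exists_ids)
```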
What happens if you move the "Irrelevant" rows to the where part?
Seems to me the DB should have an easier time joining the tables, and will use the index
Something like
SELECT
main_table.*
FROM
main_table
INNER JOIN
...
LEFT JOIN
second_table ON main_table.id = second_table.ref_id
WHERE
second_table.id IS NULL OR
(second_table.type = 'foo' AND second_table.bar IS NULL)
In MySQL, JOIN can be faster than LEFT JOIN, so you can write your query like this:
SELECT
main_table.*
FROM
main_table
INNER JOIN
...
LEFT JOIN (SELECT second_table.ref_id FROM main_table
JOIN second_table ON main_table.id = second_table.ref_id AND
second_table.type = 'foo' AND second_table.bar IS NULL) AS main_table2 ON
main_table2.ref_id = main_table.id
WHERE
main_table2.ref_id IS NULL;

What is better way to join in mysql?

I want to join 3 or more tables:
table1 - 1 thousand records
table2 - 100 thousand records
table3 - 10 million records
Which of the following is best (speed-wise)?
Note: pk and fk are the primary and foreign keys of the respective tables, and FILTER_CONDITION1 and FILTER_CONDITION2 are the restricting conditions normally found in the WHERE clause.
Case 1: taking the smaller tables first and joining the larger one later
Select table1.*,table2.*,table3.*
from table1
join table2
on table1.fk = table2.pk and FILTER_CONDITION1
join table3
on table2.fk = table3.pk and FILTER_CONDITION2
Case 2
Select table1.*,table2.*,table3.*
from table3
join table2
on table2.fk = table3.pk and FILTER_CONDITION2
join table1
on table1.fk = table2.pk and FILTER_CONDITION1
Case 3
Select table1.*,table2.*,table3.*
from table3
join table2
on table2.fk = table3.pk
join table1
on table1.fk = table2.pk
where FILTER_CONDITION1 and FILTER_CONDITION2
The cases you show are equivalent. What you are describing is in the end the same query and will be seen by the database as such: the database will make a query plan.
The best thing you can do is use EXPLAIN and check what your query actually does: this way you can see that they will probably be run the same way, and whether there might be a bottleneck in there.
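A quick way to convince yourself that the three cases are the same query is to run them against a toy schema and compare the rows. The sketch below does that with Python's built-in sqlite3 module, not MySQL. The column names active and size, and the two filter conditions built on them, are hypothetical stand-ins for FILTER_CONDITION1 and FILTER_CONDITION2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table3 (pk INTEGER PRIMARY KEY, size INTEGER);
CREATE TABLE table2 (pk INTEGER PRIMARY KEY, fk INTEGER);
CREATE TABLE table1 (id INTEGER PRIMARY KEY, fk INTEGER, active INTEGER);
INSERT INTO table3 VALUES (1, 5), (2, 20);
INSERT INTO table2 VALUES (10, 1), (11, 2);
INSERT INTO table1 VALUES (100, 10, 1), (101, 11, 1), (102, 11, 0);
""")

columns = "table1.id, table2.pk, table3.pk"
filter1 = "table1.active = 1"   # hypothetical FILTER_CONDITION1
filter2 = "table3.size > 10"    # hypothetical FILTER_CONDITION2

# Case 1: small tables first, filters inside the ON clauses.
case1 = f"""
SELECT {columns} FROM table1
JOIN table2 ON table1.fk = table2.pk AND {filter1}
JOIN table3 ON table2.fk = table3.pk AND {filter2}
ORDER BY table1.id
"""

# Case 2: large table first, filters inside the ON clauses.
case2 = f"""
SELECT {columns} FROM table3
JOIN table2 ON table2.fk = table3.pk AND {filter2}
JOIN table1 ON table1.fk = table2.pk AND {filter1}
ORDER BY table1.id
"""

# Case 3: large table first, filters in the WHERE clause.
case3 = f"""
SELECT {columns} FROM table3
JOIN table2 ON table2.fk = table3.pk
JOIN table1 ON table1.fk = table2.pk
WHERE {filter1} AND {filter2}
ORDER BY table1.id
"""

results = [conn.execute(sql).fetchall() for sql in (case1, case2, case3)]
print(results)
```

Identical results do not prove identical plans, of course; for that you still need to run EXPLAIN on the real tables.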
As @Nanne noted in his answer, MySQL normally works out the right join order on its own, but in rare cases it reads the tables in the wrong order, which can kill query performance. In that case you can try the approach below.
If you can filter the data in your bulky tables, table2 and table3 (suppose only 500 records remain after joining these tables and applying the filter), then filter that data first and join the filtered result with your small table. This can improve performance, but there are various possible combinations, so you have to check which join filters out the most rows. EXPLAIN will help you find that out, and an index will help you get the filtered data.
After that, you can tell MySQL to join the tables in the order written in your query with the syntax "SELECT STRAIGHT_JOIN .....", just as we sometimes have to use FORCE INDEX when MySQL does not pick the proper index.

Case statement in LEFT OUTER JOIN slowing the query in SQL Server 2008

I have a problem with my LEFT OUTER JOIN. I have a set of queries that gives me about 80,000 to 100,000 records in a #Temp table. When I LEFT OUTER JOIN this #Temp table with another table, I have to use a CASE expression: if no records are found when joining on a particular column, take that column's value and look up its corresponding value in another table that has the matching records. The query works fine for a small set of data, but for larger data it just keeps executing or takes far too much time. My query looks like this:
SELECT * FROM #Temp
LEFT OUTER JOIN TABLE1 ON #Temp.Materialcode =
CASE WHEN TABLE1.MaterialCode LIKE 'HY%'
THEN TABLE1.MaterialCode
ELSE REPLACE(TABLE1.MaterialCode,
TABLE1.MaterialCode,
(SELECT NewMaterialCode
FROM TABLE2
WHERE OldMaterialCode = TABLE1.MaterialCode))
END
Here TABLE2 has only two columns, NewMaterialCode and OldMaterialCode. What I need is this: if the material code in TABLE1 is not of the 'HY%' type, take that material code and look up its corresponding NewMaterialCode in TABLE2, so that I get both the 'HY' and the non-'HY' records. I hope I have made my problem clear. Any help would be greatly appreciated.
SELECT *
FROM #TEMP TMP
LEFT JOIN Table1 MATERIAL
ON TMP.MaterialCode = MATERIAL.MaterialCode
LEFT JOIN Table2 REPLACEMENT
ON MATERIAL.MaterialCode = REPLACEMENT.OldMaterialCode
WHERE ( COALESCE(MATERIAL.materialcode, '') LIKE 'HY%'
AND TMP.materialCode = MATERIAL.MaterialCode
)
OR MATERIAL.MaterialCode = REPLACEMENT.NewMaterialCode
I think this should do what you're trying to do, but I don't really know how the tables are related except by reverse-engineering your query.
For the record, the OUTER JOIN in your query isn't accomplishing a thing, because an outer condition would produce null values for the columns in TABLE1, and the case condition wouldn't work (a NULL would be neither a match for 'HY%' nor an ELSE). That's counter-intuitive to those not used to working in the three-valued logic of the database world, but that's why we have COALESCE and ISNULL.
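To see the three-valued logic in isolation, here is a tiny sketch using Python's built-in sqlite3 module. It is not SQL Server: SQLite lets you select a comparison result directly, which T-SQL does not, so treat it purely as an illustration of how LIKE behaves on NULL and what COALESCE changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def scalar(sql):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]


# NULL LIKE 'HY%' is neither true nor false: the result is NULL (None in Python).
null_like = scalar("SELECT NULL LIKE 'HY%'")

# COALESCE swaps the NULL for an empty string, so the comparison is plainly false.
coalesced_like = scalar("SELECT COALESCE(NULL, '') LIKE 'HY%'")

# A real 'HY' code matches as expected.
match_like = scalar("SELECT 'HY123' LIKE 'HY%'")

print(null_like, coalesced_like, match_like)
```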

MySQL JOIN tables with WHERE clause

I need to gather posts from two mysql tables that have different columns and provide a WHERE clause to each set of tables. I appreciate the help, thanks in advance.
This is what I have tried...
SELECT
blabbing.id,
blabbing.mem_id,
blabbing.the_blab,
blabbing.blab_date,
blabbing.blab_type,
blabbing.device,
blabbing.fromid,
team_blabbing.team_id
FROM
blabbing
LEFT OUTER JOIN
team_blabbing
ON team_blabbing.id = blabbing.id
WHERE
team_id IN ($team_array) ||
mem_id='$id' ||
fromid='$logOptions_id'
ORDER BY
blab_date DESC
LIMIT 20
I know that this is messy, but I'll admit, I am no MySQL veteran. I'm a beginner at best... Any suggestions?
You could put the where-clauses in subqueries:
select
*
from
(select * from ... where ...) as alias1 -- this is a subquery
left outer join
(select * from ... where ...) as alias2 -- this is also a subquery
on
....
order by
....
Note that MySQL versions before 5.7.7 do not allow subqueries like this in a view definition.
You could also combine the where-clauses, as in your example. Use table aliases to distinguish between columns of different tables (it's a good idea to use aliases even when you don't have to, just because it makes things easier to read). Example:
select
*
from
<table> as alias1
left outer join
<othertable> as alias2
on
....
where
alias1.id = ... and alias2.id = ... -- aliases distinguish between ids!!
order by
....
Two suggestions for you, since you are relatively new to SQL: use "aliases" for your tables to cut down on SuperLongTableNameReferencesForColumns, and always qualify the column names in a query. It makes your life easier and helps anyone who comes after you see which columns come from which table, especially when the same column name exists in different tables. It also prevents ambiguity in the query. Judging from the sample, your left join may be wrong; can you confirm that B.ID really joins to TB.ID? Typically a "Team_ID" would appear once in a teams table, and each blabbing entry would carry the "Team_ID" of the team the posting came from, in addition to its own "ID" as the blabbing table's unique key.
SELECT
B.id,
B.mem_id,
B.the_blab,
B.blab_date,
B.blab_type,
B.device,
B.fromid,
TB.team_id
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
WHERE
TB.Team_ID IN ( you can't do a direct $team_array here )
OR B.mem_id = SomeParameter
OR b.FromID = AnotherParameter
ORDER BY
B.blab_date DESC
LIMIT 20
Where you were trying the $team_array, you would have to build out the full list as expected, such as
TB.Team_ID IN ( 1, 4, 18, 23, 58 )
Also, use the SQL keyword "OR" rather than the logical "||" operator.
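One way to build out that list without pasting values into the SQL string is to generate one placeholder per ID and pass the values as parameters. The sketch below shows the idea with Python's built-in sqlite3 module rather than PHP and MySQL; the trimmed-down tables, the sample rows and the variable names team_ids, member_id and from_id are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blabbing (id INTEGER PRIMARY KEY, mem_id INTEGER, fromid INTEGER, blab_date TEXT);
CREATE TABLE team_blabbing (id INTEGER PRIMARY KEY, team_id INTEGER);
INSERT INTO blabbing VALUES
    (1, 7, 0, '2012-01-01'),
    (2, 8, 0, '2012-01-02'),
    (3, 9, 0, '2012-01-03');
INSERT INTO team_blabbing VALUES (2, 4), (3, 99);
""")

team_ids = [1, 4, 18, 23, 58]  # what $team_array would hold
member_id = 7                  # what $id would hold
from_id = 42                   # what $logOptions_id would hold

# One "?" per team ID, so the IN list always matches the number of values.
placeholders = ", ".join("?" for _ in team_ids)

sql = f"""
SELECT B.id, TB.team_id
FROM blabbing B
LEFT JOIN team_blabbing TB ON B.id = TB.id
WHERE TB.team_id IN ({placeholders})
   OR B.mem_id = ?
   OR B.fromid = ?
ORDER BY B.blab_date DESC
LIMIT 20
"""

# Parameters are bound in order: the team IDs first, then the two single values.
rows = conn.execute(sql, [*team_ids, member_id, from_id]).fetchall()
print(rows)
```

An empty team_ids list would need separate handling, since `IN ()` is not valid in MySQL.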
EDIT -- per your comment
This could be done in a variety of ways, such as building and executing dynamic SQL, calling the query multiple times (once for each ID) and merging the results, or joining to yet another temp table that gets cleaned out, say, daily.
If you have another table such as "TeamJoins" with, say, 3 columns (a date, a sessionid and a team_id), you could purge any rows more than a day old once a day, and/or clear a session's rows each time the same session ID runs a new query (as it appears to be coming from PHP). Have two indexes: one on the date (to simplify the daily purging) and a second on (sessionID, team_id) for the join.
Then, loop through to insert the simple elements identified into the "TeamJoins" table.
Then, instead of a hard-coded IN list, you could change that part to
...
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
LEFT JOIN TeamJoins TJ
on TB.Team_ID = TJ.Team_ID
WHERE
TJ.Team_ID IS NOT NULL
OR B.mem_id ... rest of query
What I ended up doing:
I added an extra column called team_id to my blabbing table and set it to NULL, and added another column called mem_id to my team_blabbing table.
Then I changed the insert script to also insert a value into mem_id in team_blabbing.
After doing this, I did a simple UNION ALL in the query:
SELECT
*
FROM
blabbing
WHERE
mem_id='$id' OR
fromid='$logOptions_id'
UNION ALL
SELECT
*
FROM
team_blabbing
WHERE
team_id
IN
($team_array)
ORDER BY
blab_date DESC
LIMIT 20
I am open to any thoughts on what I did. Try not to be too harsh though :) Thanks again for all the info.