I have a left join on a MySQL database like this...
select *
from tableA
left join tableB on (tableA.id=tableB.id1 and tableB.col2='Red')
...which performs OK with >500K rows in tableA and tableB.
However, changing it to this (and assuming the indexes are OK)...
select *
from tableA
left join tableB on
((tableA.id=tableB.id1 and tableB.col2='Red') OR
(tableA.id=tableB.id2 and tableB.col2='Blue') )
...kills it, in terms of performance.
So why the performance hit? Can I do it another way?
EDIT
Not really sure what you need.
Can you show some expected results?
Can you tell us what you mean by "kills it, in terms of performance" (does it go to 20 seconds of execution time)?
I don't believe it's more efficient, but try it.
select
*
from
tableA as a
left join tableB as b1
on a.id=b1.id1
and b1.col2='Red'
left join tableB as b2
on a.id=b2.id2
and b2.col2='Blue'
-- no WHERE clause is needed here: the original filter,
-- (b1.id1 is not null or b2.id2 is not null) or (b1.id1 is null and b2.id2 is null),
-- is true for every row
You then have to pick the right columns in the SELECT with CASE WHEN (b1's columns when the 'Red' join matched, otherwise b2's).
You can compare the performance and put indexes on the appropriate columns (which ones depends on your full tables and query, but here it should be id, id1, id2 and col2).
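To make the "manage the result in the SELECT" step concrete, here is a minimal sketch. It runs on SQLite through Python's sqlite3 module purely as a stand-in for MySQL, and the label column and the sample rows are made up for illustration. COALESCE is used as a shorthand for the CASE WHEN described above.

```python
import sqlite3

# SQLite stands in for MySQL; tableA/tableB, id, id1, id2 and col2 follow the
# question, while the "label" column and the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY);
    CREATE TABLE tableB (id1 INTEGER, id2 INTEGER, col2 TEXT, label TEXT);
    INSERT INTO tableA VALUES (1), (2), (3);
    INSERT INTO tableB VALUES
        (1, 99, 'Red',   'red-for-1'),
        (98, 2, 'Blue',  'blue-for-2'),
        (3, 97, 'Green', 'ignored');
""")

rows = conn.execute("""
    SELECT a.id,
           COALESCE(b1.label, b2.label) AS label  -- take whichever join matched
    FROM tableA AS a
    LEFT JOIN tableB AS b1 ON a.id = b1.id1 AND b1.col2 = 'Red'
    LEFT JOIN tableB AS b2 ON a.id = b2.id2 AND b2.col2 = 'Blue'
    ORDER BY a.id
""").fetchall()

print(rows)
```

One caveat: if a tableA row matches both a 'Red' row and a 'Blue' row, this form returns the combinations of the two matches rather than one output row per match, so it is not a drop-in replacement for the OR join in that case.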
Oh, I didn't notice that. How about this? I can't execute it here, so you'll be better placed to judge.
select *
from tableB, tableA
where (tableA.id=tableB.id1 and tableB.col2='Red') OR
(tableA.id=tableB.id2 and tableB.col2='Blue')
Related
I have the following two queries and noticed that they have a huge performance difference.
Query1
SELECT count(distinct b.id) FROM tableA as a
LEFT JOIN tableB as b on a.id = b.aId
GROUP BY a.id
Query2
SELECT count(distinct b.id) FROM tableA as a
LEFT JOIN (SELECT * FROM tableB) as b on a.id = b.aId
GROUP BY a.id
The queries basically join one table to another. I noticed that Query1 takes about 80ms whereas Query2 takes about 2sec with thousands of rows in my system. Could anyone explain why this happens? Is it a wise choice to use the Query2 style only when I am forced to? Or is there a way to do the same thing that performs better than Query2?
When you replace tableB with (SELECT * FROM tableB) you are forcing the query engine to materialize a subquery, or intermediate table result. In other words, in the second query, you aren't actually joining directly to tableB, you are joining to some intermediate table. As a result of this, any indices which might have existed on tableB to make the query faster would not be available. Based on your current example, I see no reason to use the second version.
Under certain conditions you might be forced to use the second version though. For example, if you needed to transform tableB in some way, you might need a subquery to do that.
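As a quick sanity check that dropping the subquery is safe, the sketch below runs both forms on a tiny made-up dataset and compares the results. It uses SQLite through Python's sqlite3 module only as a stand-in, so it says nothing about MySQL timings; the a.id column in the SELECT list and the index name are additions for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; sample rows and the index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, aId INTEGER);
    CREATE INDEX idx_tableB_aId ON tableB (aId);
    INSERT INTO tableA VALUES (1), (2), (3);
    INSERT INTO tableB VALUES (10, 1), (11, 1), (12, 2);
""")

# Query1 style: join directly to tableB, so an index on aId stays usable.
direct = conn.execute("""
    SELECT a.id, COUNT(DISTINCT b.id)
    FROM tableA AS a
    LEFT JOIN tableB AS b ON a.id = b.aId
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# Query2 style: join to a derived table that just copies tableB.
derived = conn.execute("""
    SELECT a.id, COUNT(DISTINCT b.id)
    FROM tableA AS a
    LEFT JOIN (SELECT * FROM tableB) AS b ON a.id = b.aId
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

print(direct == derived, direct)
```

Both forms return the same counts, so the derived table buys nothing here. To see what MySQL actually does with each form, run EXPLAIN on both queries against your own data.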
Consider two tables, tableA and tableB:
tableA
|id|driver_id|vehicle_id|is_allowed|license_number|driver_name|
tableB
|id|driver_id|vehicle_id|offence|payable_amount|driver_name|
Goal: find the driver_id and vehicle_id of the allowed driver whose name is XYZ.
Query1:SELECT * FROM tableA,tableB {join-condition}{filter-condition}
SELECT tableA.driver_id,tableA.vehicle_id FROM tableA,tableB
WHERE
tableA.driver_id=tableB.driver_id AND
tableA.vehicle_id=tableB.vehicle_id AND
tableA.driver_name='XYZ' AND
tableB.driver_name='XYZ' AND
tableA.is_allowed = 1
Query2:SELECT * FROM (SELECT * FROM tableA {filter-condition}) JOIN (SELECT * FROM tableB {filter-condition}) ON {join-condition}{filter-condition}
SELECT tableAA.driver_id,tableAA.vehicle_id FROM
(SELECT tableA.driver_id,tableA.vehicle_id from tableA WHERE tableA.driver_name='XYZ' AND
tableA.is_allowed = 1) as tableAA
JOIN
(SELECT tableB.driver_id,tableB.vehicle_id from tableB WHERE tableB.driver_name='XYZ') as tableBB
ON
tableAA.driver_id=tableBB.driver_id AND
tableAA.vehicle_id=tableBB.vehicle_id
Which type of query is more readable, better optimized, and in line with the standard?
A correct version would look like this:
SELECT a.driver_id, a.vehicle_id
FROM tableA a JOIN
tableB b
ON a.driver_id = b.driver_id AND
a.vehicle_id = b.vehicle_id
WHERE a.driver_name = 'XYZ' AND
b.driver_name = 'XYZ' AND
a.is_allowed = 1;
Notes:
JOIN is accepted as the right way to combine tables in the FROM clause. Simple rule: Never use commas in the FROM clause.
The ON clause should contain all predicates that contain columns from more than one table.
The use of table aliases is a preference that makes queries easier to write and to read.
You might want to use IN or EXISTS, because your query is not returning columns from TableB.
Do not use unnecessary subqueries in the FROM clause. In some databases (notably MySQL), this impedes the use of indexes and adds additional overhead for materialization of the intermediate table.
And, the answer to your question is that the first version is probably the optimized version (because it does not materialize subqueries unnecessarily). Neither version is preferred.
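The IN / EXISTS suggestion in the notes can be sketched as follows. This runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, and the sample rows are invented; the point is only that a JOIN repeats a driver once per matching tableB row, while EXISTS returns each tableA row at most once.

```python
import sqlite3

# SQLite stands in for MySQL; column names follow the question, rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, driver_id INTEGER, vehicle_id INTEGER,
                         is_allowed INTEGER, license_number TEXT, driver_name TEXT);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, driver_id INTEGER, vehicle_id INTEGER,
                         offence TEXT, payable_amount INTEGER, driver_name TEXT);
    INSERT INTO tableA VALUES
        (1, 100, 500, 1, 'L-1', 'XYZ'),
        (2, 101, 501, 0, 'L-2', 'XYZ'),
        (3, 102, 502, 1, 'L-3', 'ABC');
    INSERT INTO tableB VALUES
        (1, 100, 500, 'speeding', 50, 'XYZ'),
        (2, 100, 500, 'parking',  20, 'XYZ'),
        (3, 101, 501, 'speeding', 50, 'XYZ');
""")

# JOIN version: one output row per matching offence in tableB.
join_rows = conn.execute("""
    SELECT a.driver_id, a.vehicle_id
    FROM tableA a
    JOIN tableB b
      ON a.driver_id = b.driver_id AND a.vehicle_id = b.vehicle_id
    WHERE a.driver_name = 'XYZ' AND b.driver_name = 'XYZ' AND a.is_allowed = 1
""").fetchall()

# EXISTS version: tableB is only used as a filter, so no duplicates appear.
exists_rows = conn.execute("""
    SELECT a.driver_id, a.vehicle_id
    FROM tableA a
    WHERE a.driver_name = 'XYZ' AND a.is_allowed = 1
      AND EXISTS (SELECT 1 FROM tableB b
                  WHERE b.driver_id = a.driver_id
                    AND b.vehicle_id = a.vehicle_id
                    AND b.driver_name = 'XYZ')
""").fetchall()

print(join_rows, exists_rows)
```

Which form you want depends on whether one row per offence or one row per driver is the intended result.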
The first one is better in terms of both standard and performance, but the comma syntax is very old-fashioned, so it can be written this way:
SELECT tableA.driver_id,tableA.vehicle_id
FROM tableA
INNER JOIN tableB ON tableA.driver_id=tableB.driver_id
AND tableA.vehicle_id=tableB.vehicle_id
AND tableA.driver_name='XYZ'
AND tableB.driver_name='XYZ'
AND tableA.is_allowed = 1
I have a little problem with an SQL query: I have 'TableA' with a field 'TableA.b' that contains an ID for 'TableB'. I want to select all rows from 'TableB' that don't have an ID that equals any field 'TableA.b'. In other words, I need every row from TableB that's not referred to by any row from TableA in field 'TableA.b'.
I tried a Query like this :
SELECT DISTINCT TableB.* FROM TableA, TableB Where TableA.b != TableB.ID
But the result contains a row that is also returned by the negation, i.e. where both fields have the same value.
Any ideas?
What you need is LEFT (or RIGHT) JOIN.
SELECT TableB.* FROM TableB
LEFT JOIN TableA on TableA.b = TableB.ID
WHERE TableA.b IS NULL
While it's possible to do the same with a subquery, as in some of the other answers, a join will often be faster.
A LEFT [OUTER] JOIN can be faster than an equivalent subquery because
the server might be able to optimize it better—a fact that is not
specific to MySQL Server alone. Prior to SQL-92, outer joins did not
exist, so subqueries were the only way to do certain things. Today,
MySQL Server and many other modern database systems offer a wide range
of outer join types.
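Here is a small sketch of why the != cross join from the question returns too much and how the outer-join form isolates the unreferenced rows. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL; the name column and the sample rows are made up. Note that TableB has to be on the preserved side of the outer join.

```python
import sqlite3

# SQLite stands in for MySQL; the "name" column and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableB (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, b INTEGER);
    INSERT INTO TableB VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO TableA VALUES (10, 1), (11, 3);   -- rows 1 and 3 are referenced
""")

# The attempt from the question: every TableB row differs from SOME TableA.b,
# so all rows come back.
cross = conn.execute("""
    SELECT DISTINCT TableB.* FROM TableA, TableB
    WHERE TableA.b != TableB.ID
    ORDER BY TableB.ID
""").fetchall()

# Anti-join: keep all of TableB, then keep only rows with no TableA match.
anti = conn.execute("""
    SELECT TableB.* FROM TableB
    LEFT JOIN TableA ON TableA.b = TableB.ID
    WHERE TableA.b IS NULL
""").fetchall()

print(cross, anti)
```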
First, select all the TableB ids that are referenced from TableA:
SELECT DISTINCT b FROM TableA
Then use that result to select all rows in TableB that have an id that does not exist in this set by using the above query as a subquery:
SELECT * FROM TableB WHERE ID NOT IN (SELECT DISTINCT b FROM TableA)
Hope this helps.
You can try this
SELECT TableB.* FROM TableB
WHERE ID NOT IN
(SELECT b from TableA);
Use NOT IN in SELECT Query.
SELECT * FROM TableB t1 WHERE t1.ID NOT IN (SELECT t2.b FROM TableA t2);
You can use right join also.
Try this:
SELECT DISTINCT TableB.* FROM TableA RIGHT JOIN TableB ON TableA.b = TableB.ID WHERE TableA.b IS NULL
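One thing to watch with the NOT IN variants: if TableA.b can be NULL, NOT IN returns no rows at all, because comparing against NULL is never true. The sketch below shows this next to a NOT EXISTS form that is not affected. It assumes a nullable b column and made-up rows, and runs on SQLite through Python's sqlite3 module as a stand-in for MySQL.

```python
import sqlite3

# SQLite stands in for MySQL; sample rows are hypothetical, and TableA.b is
# assumed to be nullable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableB (ID INTEGER PRIMARY KEY);
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, b INTEGER);
    INSERT INTO TableB VALUES (1), (2), (3);
    INSERT INTO TableA VALUES (10, 1), (11, NULL);  -- one reference, one NULL
""")

# NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL, so the row is filtered out.
not_in = conn.execute("""
    SELECT ID FROM TableB
    WHERE ID NOT IN (SELECT b FROM TableA)
    ORDER BY ID
""").fetchall()

# NOT EXISTS: only asks whether a matching row exists, so NULLs do no harm.
not_exists = conn.execute("""
    SELECT ID FROM TableB
    WHERE NOT EXISTS (SELECT 1 FROM TableA WHERE TableA.b = TableB.ID)
    ORDER BY ID
""").fetchall()

print(not_in, not_exists)
```

If b is declared NOT NULL, or the subquery adds WHERE b IS NOT NULL, the NOT IN form gives the same result as the others.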
I have created an SQL query that, in sketch form, looks like this:
select *
from A
where A.id in (select B.id1, B.id2 from B);
The main select should return the rows for which A.id coincides with either B.id1 or B.id2.
Clearly this doesn't work, because the subquery returns two columns while IN compares against a single one. How can I overcome this problem?
One solution would be to make two sub-queries, one for B.id1 and one for B.id2, but as my sub-query is much longer than in this example I was looking for a more elegant solution.
I'm using MySQL.
EDIT 1
As long as the syntax is simpler than using two sub-queries I have no issues using joins
EDIT 2
Thanks @NullSoulException. I tried the first solution and it works as expected!!
Something like the below should do the trick.
select *
From table1 a , (select id1 , id2 from table2 ) b
where (a.id = b.id1) or (a.id = b.id2)
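Or you can join the same table twice, giving each join its own alias. Use LEFT JOIN and keep the rows where either join matched; two INNER JOINs would only return rows where a.id matches both id1 and id2.
select * from table1 a
LEFT JOIN table2 b1 on a.id = b1.id1
LEFT JOIN table2 b2 on a.id = b2.id2
where b1.id1 is not null or b2.id2 is not null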
Please test the above against your datasets/tables.
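To check which rows each form returns, here is a minimal sketch on made-up data. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, and the EXISTS form with a.id IN (b.id1, b.id2) is an alternative spelling of "matches either column" that is not taken from the answers above.

```python
import sqlite3

# SQLite stands in for MySQL; table1/table2 follow the answer, rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (id1 INTEGER, id2 INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3), (4);
    INSERT INTO table2 VALUES (1, 9), (8, 2), (3, 3);
""")

# Match on EITHER column, one output row per table1 row.
either = conn.execute("""
    SELECT a.id FROM table1 a
    WHERE EXISTS (SELECT 1 FROM table2 b WHERE a.id IN (b.id1, b.id2))
    ORDER BY a.id
""").fetchall()

# Two INNER JOINs require a match on BOTH columns.
both = conn.execute("""
    SELECT DISTINCT a.id FROM table1 a
    INNER JOIN table2 b1 ON a.id = b1.id1
    INNER JOIN table2 b2 ON a.id = b2.id2
    ORDER BY a.id
""").fetchall()

print(either, both)
```

With a longer subquery in place of table2, the EXISTS form still needs that subquery only once.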
select * from table t inner join table_3 t3 on (t3.t_id=t.id) where t3.k_id IN(2,3,5);
select * from table t inner join table_3 t3 on (t3.t_id=t.id) where t3.k_id IN(select id from table_2);
How do these two statements differ in performance on big tables? In the 2nd statement, is the inner "select" run again and again, or only once? Thanks
No, these two queries are quite different. Compare the two query execution plans with EXPLAIN: it may show a DEPENDENT SUBQUERY select type for the second query, meaning the inner select is re-evaluated for different outer rows. You can, however, try to optimize the query by rewriting the subquery. It'll be something like
select *
from table t
inner join table_3 t3 on (t3.t_id=t.id)
where exists (select 1 from table_2 where t3.k_id = table_2.id);
Didn't try this out so please verify that both queries are equivalent.
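One way to do that verification is to run both forms on a small dataset and compare the rows. The sketch below does this on SQLite through Python's sqlite3 module as a stand-in for MySQL, with made-up rows and a reduced SELECT list. Since table is a reserved word, it is quoted here with double quotes; MySQL would normally use backticks.

```python
import sqlite3

# SQLite stands in for MySQL; table names follow the question, rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "table" (id INTEGER PRIMARY KEY);
    CREATE TABLE table_2 (id INTEGER PRIMARY KEY);
    CREATE TABLE table_3 (t_id INTEGER, k_id INTEGER);
    INSERT INTO "table" VALUES (1), (2), (3);
    INSERT INTO table_2 VALUES (2), (5);
    INSERT INTO table_3 VALUES (1, 2), (2, 3), (3, 5), (3, 7);
""")

# Original form: filter with an IN subquery.
in_rows = conn.execute("""
    SELECT t.id, t3.k_id
    FROM "table" t
    INNER JOIN table_3 t3 ON t3.t_id = t.id
    WHERE t3.k_id IN (SELECT id FROM table_2)
    ORDER BY t.id, t3.k_id
""").fetchall()

# Rewritten form: filter with EXISTS.
exists_rows = conn.execute("""
    SELECT t.id, t3.k_id
    FROM "table" t
    INNER JOIN table_3 t3 ON t3.t_id = t.id
    WHERE EXISTS (SELECT 1 FROM table_2 WHERE t3.k_id = table_2.id)
    ORDER BY t.id, t3.k_id
""").fetchall()

print(in_rows == exists_rows, in_rows)
```

Matching results on sample data only confirm equivalence for that data; for performance, compare the EXPLAIN output of both forms on the real MySQL tables.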