I have the following two queries and noticed a huge performance difference between them.
Query1
SELECT count(distinct b.id) FROM tableA as a
LEFT JOIN tableB as b on a.id = b.aId
GROUP BY a.id
Query2
SELECT count(distinct b.id) FROM tableA as a
LEFT JOIN (SELECT * FROM tableB) as b on a.id = b.aId
GROUP BY a.id
The queries basically join one table to another. I noticed that Query1 takes about 80 ms whereas Query2 takes about 2 seconds with thousands of rows in my system. Could anyone explain why this happens? Is it wise to use the Query2 style only when I am forced to? Or is there a better way to do the same thing than Query2?
When you replace tableB with (SELECT * FROM tableB), you are no longer joining directly to tableB; you are joining to a derived table. Some engines, notably MySQL (especially older versions), materialize that subquery as an intermediate table result. As a result, any indexes that exist on tableB to make the query faster are not available for the join. Based on your current example, I see no reason to use the second version.
Under certain conditions you might be forced to use the second version though. For example, if you needed to transform tableB in some way, you might need a subquery to do that.
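To make this concrete, here is a minimal sketch using Python's built-in sqlite3 module. The rows and the index name are made up for illustration, and SQLite's optimizer is not MySQL's, so this only shows that the two forms return the same result and what a subquery that genuinely transforms tableB (pre-aggregation) might look like. It says nothing about timings.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, aId INTEGER);
    CREATE INDEX idx_tableB_aId ON tableB (aId);
    INSERT INTO tableA (id) VALUES (1), (2), (3);
    INSERT INTO tableB (id, aId) VALUES (10, 1), (11, 1), (12, 2);
""")

# Query1: join directly to tableB.
direct_join = """
    SELECT a.id, COUNT(DISTINCT b.id)
    FROM tableA AS a
    LEFT JOIN tableB AS b ON a.id = b.aId
    GROUP BY a.id
    ORDER BY a.id
"""

# Query2: join to a pass-through subquery (same result, possibly a worse plan).
wrapped_join = """
    SELECT a.id, COUNT(DISTINCT b.id)
    FROM tableA AS a
    LEFT JOIN (SELECT * FROM tableB) AS b ON a.id = b.aId
    GROUP BY a.id
    ORDER BY a.id
"""

# A subquery that actually transforms tableB: aggregate first, then join.
pre_aggregated = """
    SELECT a.id, COALESCE(b.cnt, 0)
    FROM tableA AS a
    LEFT JOIN (SELECT aId, COUNT(*) AS cnt FROM tableB GROUP BY aId) AS b
        ON a.id = b.aId
    ORDER BY a.id
"""

direct_rows = conn.execute(direct_join).fetchall()
wrapped_rows = conn.execute(wrapped_join).fetchall()
aggregated_rows = conn.execute(pre_aggregated).fetchall()
print(direct_rows, wrapped_rows, aggregated_rows)
```

All three queries return one row per tableA id with the number of matching tableB rows, so the choice between them is about the plan the engine picks, not about the result.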
Related
Consider two tables, tableA and tableB.
tableA
|id|driver_id|vehicle_id|is_allowed|license_number|driver_name|
tableB
|id|driver_id|vehicle_id|offence|payable_amount|driver_name|
Goal: find the driver_id and vehicle_id of the allowed driver whose name is XYZ.
Query1:SELECT * FROM tableA,tableB {join-condition}{filter-condition}
SELECT tableA.driver_id,tableA.vehicle_id FROM tableA,tableB
WHERE
tableA.driver_id=tableB.driver_id AND
tableA.vehicle_id=tableB.vehicle_id AND
tableA.driver_name='XYZ' AND
tableB.driver_name='XYZ' AND
tableA.is_allowed = 1
Query2:SELECT * FROM (SELECT * FROM tableA {filter-condition}) JOIN (SELECT * FROM tableB {filter-condition}) ON {join-condition}{filter-condition}
SELECT tableAA.driver_id,tableAA.vehicle_id FROM
(SELECT tableA.driver_id,tableA.vehicle_id from tableA WHERE tableA.driver_name='XYZ' AND
tableA.is_allowed = 1) as tableAA
JOIN
(SELECT tableB.driver_id,tableB.vehicle_id from tableB WHERE tableB.driver_name='XYZ') as tableBB
ON
tableAA.driver_id=tableBB.driver_id AND
tableAA.vehicle_id=tableBB.vehicle_id
Which type of query is more readable, better optimized, and in line with the standard?
A correct version would look like this:
SELECT a.driver_id, a.vehicle_id
FROM tableA a JOIN
tableB b
ON a.driver_id = b.driver_id AND
a.vehicle_id = b.vehicle_id
WHERE a.driver_name = 'XYZ' AND
b.driver_name = 'XYZ' AND
a.is_allowed = 1;
Notes:
JOIN is accepted as the right way to combine tables in the FROM clause. Simple rule: Never use commas in the FROM clause.
The ON clause should contain all predicates that contain columns from more than one table.
The use of table aliases is a preference that makes queries easier to write and to read.
You might want to use IN or EXISTS, because your query is not returning columns from TableB.
Do not use unnecessary subqueries in the FROM clause. In some databases (notably MySQL), this impedes the use of indexes and adds additional overhead for materialization of the intermediate table.
And, the answer to your question is that the first version is probably the optimized version (because it does not materialize subqueries unnecessarily). Neither version is preferred.
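The IN/EXISTS suggestion in the notes can be sketched as follows, using Python's sqlite3 module with made-up rows and a trimmed-down column list (both are assumptions for illustration). It shows one practical difference: the JOIN repeats a driver once per matching tableB row, while EXISTS returns each tableA row at most once.

```python
import sqlite3

# Hypothetical data; license_number and payable_amount are omitted for brevity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, driver_id INTEGER,
                         vehicle_id INTEGER, is_allowed INTEGER, driver_name TEXT);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, driver_id INTEGER,
                         vehicle_id INTEGER, offence TEXT, driver_name TEXT);
    INSERT INTO tableA VALUES (1, 7, 70, 1, 'XYZ'), (2, 8, 80, 1, 'ABC'),
                              (3, 9, 90, 0, 'XYZ');
    INSERT INTO tableB VALUES (1, 7, 70, 'speeding', 'XYZ'),
                              (2, 7, 70, 'parking', 'XYZ'),
                              (3, 8, 80, 'parking', 'ABC');
""")

# Explicit JOIN: one output row per matching (tableA, tableB) pair.
join_rows = conn.execute("""
    SELECT a.driver_id, a.vehicle_id
    FROM tableA a
    JOIN tableB b
      ON a.driver_id = b.driver_id AND a.vehicle_id = b.vehicle_id
    WHERE a.driver_name = 'XYZ' AND b.driver_name = 'XYZ' AND a.is_allowed = 1
""").fetchall()

# EXISTS: tableB is only used as a filter, so each tableA row appears once.
exists_rows = conn.execute("""
    SELECT a.driver_id, a.vehicle_id
    FROM tableA a
    WHERE a.driver_name = 'XYZ'
      AND a.is_allowed = 1
      AND EXISTS (SELECT 1
                  FROM tableB b
                  WHERE b.driver_id = a.driver_id
                    AND b.vehicle_id = a.vehicle_id
                    AND b.driver_name = 'XYZ')
""").fetchall()

print(join_rows)
print(exists_rows)
```

Whether the duplicates from the JOIN are wanted depends on the goal; if tableB is only there to check that a matching row exists, EXISTS states that intent directly.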
The first one is better in terms of standards and performance, but its syntax is very old-fashioned, so it can be rewritten this way:
SELECT tableA.driver_id,tableA.vehicle_id
FROM tableA
INNER JOIN tableB ON tableA.driver_id=tableB.driver_id
AND tableA.vehicle_id=tableB.vehicle_id
AND tableA.driver_name='XYZ'
AND tableB.driver_name='XYZ'
AND tableA.is_allowed = 1
I'm working on a query and came up with a solution like this
SELECT a.c1, a.c2, b.c3, b.c4
FROM table4 AS a
LEFT JOIN ((
SELECT c1, c2 FROM table1 LEFT JOIN table2 ... something
) UNION ALL (
SELECT c3, c4 FROM table1 LEFT JOIN table3 ... something
)) AS b
ON a.key = b.key
WHERE a.key ... something
ORDER BY a.key
This works exactly as I want and does not take much execution time. But it feels a little weird to me to have the result of a UNION sit inside another SELECT and be JOINed with another table in the same query.
So my question is: is it good practice to use a query like this, or should I find a better solution?
This query is rather complex, but whether this complexity is needed for what you want to achieve, is not the question here. And we cannot answer this either, not knowing your table structures, real query, and desired results.
But as to the technical side: Your query is perfectly okay. If this is the most straight-forward way to get to the desired data, there is nothing against it. There is no rule that you shouldn't outer join a union result or the like. You just put your data together step by step, and if you end up with this query, then that's it.
As to UNION vs. UNION ALL, as mentioned by Gordon Linoff: it often happens that someone writes UNION instead of UNION ALL, unintentionally giving the DBMS more work to do than necessary, because UNION has to remove duplicates. In your case, however, the choice may change the result, so you must consider whether duplicates are possible in your UNION query and whether you want them removed or not.
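A small sketch of the difference, using Python's sqlite3 module. The tables t1 and t2 and their rows are hypothetical and exist only to produce one duplicate across the two branches.

```python
import sqlite3

# Hypothetical tables with one row, (1, 'x'), present in both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (k INTEGER, v TEXT);
    CREATE TABLE t2 (k INTEGER, v TEXT);
    INSERT INTO t1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO t2 VALUES (1, 'x'), (3, 'z');
""")

# UNION ALL simply concatenates both results, duplicates included.
union_all_rows = conn.execute("""
    SELECT k, v FROM t1
    UNION ALL
    SELECT k, v FROM t2
    ORDER BY k, v
""").fetchall()

# UNION additionally removes duplicate rows, which costs extra work.
union_rows = conn.execute("""
    SELECT k, v FROM t1
    UNION
    SELECT k, v FROM t2
    ORDER BY k, v
""").fetchall()

print(union_all_rows)
print(union_rows)
```

If such a union is then outer joined to another table, every duplicate kept by UNION ALL produces an extra joined row, which is why the choice can change the final result and not only the running time.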
Try this; in my opinion it should be better:
SELECT a.c1,a.c2,b.c3,c.c4
FROM table1 as a
LEFT JOIN table2 as b on ...
LEFT JOIN table3 as c on ...
WHERE something
ORDER BY a.key
I'm trying to use WHERE IN in a query and it's going very slowly. When running an explain, it turns out it's actually pulling all the rows at first, then sorting by IN.
Here's the query.
SELECT a.*, b.*
FROM table_a a
INNER JOIN table_b b
ON b.name = a.name
WHERE a.id IN (1,2,3,4,5);
In real life, there are 40-50 ids in the IN list, but when I run an explain, it pulls hundreds of thousands of rows at first.
What's an alternative I can use to this?
I have something in a query that I have to edit, that I don't understand.
There are 4 tables that are joined: tickets, tasks, tickets_users, users. The whole query is not important, but you have an example at the end of the post. What bugs me is this kind of code used many times in relation to other tables:
(SELECT name
FROM users
WHERE users.id=tickets_users.users_id
) AS RequesterName,
Is this a subquery with the tables users and tickets_users joined? What is this?
WHERE users.id=tickets_users.users_id
If this was a join I would have expected to see:
ON users.id = tickets_users.users_id
And how is this different from a typical join? Just use the same column definition: users.name and just join with the users table.
Can anyone enlighten me on the advanced SQL querying prowess of the original author?
The query looks like this:
SELECT
description,
(SELECT name
FROM users
WHERE users.id = tickets_users.users_id) AS RequesterName,
(SELECT description
FROM tickets
WHERE tickets.id = ticket_tasks.tickets_id) AS TicketDescription,
ticket_tasks.content AS TaskDescription
FROM
ticket_tasks
RIGHT JOIN
tickets ON ticket_tasks.tickets_id = tickets.id
INNER JOIN
tickets_users ON tickets_users.tickets_id = ticket_tasks.tickets_id
Thanks,
This is what is called a correlated subquery. In simple terms, it is a SELECT inside a SELECT that refers to columns of the outer query.
However, doing this more than once in a query is generally not recommended; the performance cost can be huge, because each subquery may be evaluated once per row of the outer query.
A correlated subquery does a row-by-row comparison for each row of the outer SELECT. If that doesn't make sense, think of it this way:
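SELECT
id,
-- MIN() keeps this a single value; a scalar subquery must not return more than one row
(SELECT MIN(ta.id) FROM tableA AS ta WHERE ta.id > t.id) AS next_id
FROM
tableB AS t;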
Conceptually, for each row in tableB, the rows in tableA are selected and compared to that tableB id.
NOTE:
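If you have 100 rows in each of 4 tables and you nest a correlated subquery for each one inside the previous one, then in the worst case (no usable indexes) you are doing 100*100*100*100 row comparisons. That's 100,000,000 (one hundred million) comparisons! Independent correlated subqueries in the same SELECT list add up rather than multiply, but each one still runs once per outer row.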
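As a rough illustration of the RequesterName column from the question, the sketch below (Python's sqlite3 module, made-up rows, only two of the four tables) runs the correlated-subquery form and an equivalent LEFT JOIN side by side. It assumes users.id is unique; without that, the scalar subquery and the join would not behave the same.

```python
import sqlite3

# Hypothetical rows; ticket 103 has no requester to show the NULL case.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tickets_users (tickets_id INTEGER, users_id INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO tickets_users VALUES (100, 1), (101, 2), (102, 1), (103, NULL);
""")

# Correlated subquery: evaluated conceptually once per tickets_users row.
correlated_rows = conn.execute("""
    SELECT tickets_users.tickets_id,
           (SELECT name
            FROM users
            WHERE users.id = tickets_users.users_id) AS RequesterName
    FROM tickets_users
    ORDER BY tickets_users.tickets_id
""").fetchall()

# The same result expressed as a LEFT JOIN (NULL when no user matches).
join_rows = conn.execute("""
    SELECT tickets_users.tickets_id, users.name AS RequesterName
    FROM tickets_users
    LEFT JOIN users ON users.id = tickets_users.users_id
    ORDER BY tickets_users.tickets_id
""").fetchall()

print(correlated_rows)
print(join_rows)
```

The WHERE inside the subquery plays the role that ON plays in the join: it ties the inner row to the current outer row.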
A correlated subquery is NOT a join, but rather a subquery.
SELECT *
FROM
(SELECT id FROM t -- this is a subquery
) AS temp
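However, JOINs are different. Generally you can write one in either of these two ways.
This is the explicit (preferred) syntax
SELECT *
FROM t
JOIN t1 ON t1.id = t.id
This is the older comma syntax
SELECT *
FROM t, t1
WHERE t1.id = t.id
Logically, the second form describes the Cartesian product of the two tables and then filters out the extra rows in the WHERE clause, whereas the first states the join condition as part of the join. In practice most optimizers, MySQL included, produce the same plan for both inner joins, so the difference is mainly readability and the risk of accidentally leaving out the join condition.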
There are a few different types of joins, and each is useful in its respective situation:
INNER JOIN (same as JOIN)
LEFT JOIN (same as LEFT OUTER JOIN)
RIGHT JOIN (same as RIGHT OUTER JOIN)
In MySQL, FULL JOIN (FULL OUTER JOIN) does not exist, so in order to do a full join you need to combine a LEFT and a RIGHT join. See this link for a better understanding of what joins do, with Venn diagrams: LINK
REMEMBER: that link is about SQL in general, so it includes the FULL joins as well. Those don't work in MySQL.
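One common way to combine the two outer joins is sketched below with Python's sqlite3 module. The tables t and t1 reuse the names from the join examples above, but their columns and rows are made up. The second branch is written as t1 LEFT JOIN t, which is equivalent to t RIGHT JOIN t1, because older SQLite versions lack RIGHT JOIN; in MySQL either spelling should work.

```python
import sqlite3

# Hypothetical data: id 1 exists only in t, id 3 only in t1, id 2 in both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, t_val TEXT);
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, t1_val TEXT);
    INSERT INTO t VALUES (1, 'a'), (2, 'b');
    INSERT INTO t1 VALUES (2, 'B'), (3, 'C');
""")

full_join_rows = conn.execute("""
    -- All rows of t, with matching t1 rows where they exist.
    SELECT t.id, t.t_val, t1.id, t1.t1_val
    FROM t
    LEFT JOIN t1 ON t1.id = t.id
    UNION ALL
    -- Rows of t1 that have no partner in t (anti-join), so nothing is doubled.
    SELECT t.id, t.t_val, t1.id, t1.t1_val
    FROM t1
    LEFT JOIN t ON t.id = t1.id
    WHERE t.id IS NULL
""").fetchall()

for row in full_join_rows:
    print(row)
```

Using UNION ALL with the IS NULL filter avoids both the duplicate-removal cost of UNION and the risk of collapsing rows that are legitimately identical.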
I have a left join on a MySQL database like this...
select *
from tableA
left join tableB on (tableA.id=tableB.id1 and tableB.col2='Red')
...which performs OK with >500K rows on tableA and tableB
However, changing it to this (and assuming indexes are OK)...
select *
from tableA
left join tableB on
((tableA.id=tableB.id1 and tableB.col2='Red') OR
(tableA.id=tableB.id2 and tableB.col2='Blue') )
...kills it, in terms of performance.
So why the performance hit? Can I do it another way?
EDIT
I'm not really sure what you need.
Can you show some expected results?
Can you tell us what you mean by "kills it, in terms of performance" (does it go to 20 seconds of execution time?)
I don't believe it's more efficient, but try it.
select
*
from
tableA as a
left join tableB as b1
on a.id=b1.id1
and b1.col2='Red'
left join tableB as b2
on a.id=b2.id2
and b2.col2='Blue'
where
(b1.id1 is not null or b2.id2 is not null)
or (b1.id1 is null and b2.id2 is null)
You have to manage the result in the SELECT with CASE WHEN, picking the columns from b1 or b2 depending on which join matched.
You can compare the performance and put indexes on the appropriate columns (it depends on what you have in the full table and query, but here it should be id, id1, id2 and col2).
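A minimal sketch of the split-join idea with CASE WHEN, using Python's sqlite3 module. The rows and index names are made up, and the data is chosen so that each tableA row matches at most one branch; if a row could match both a 'Red' and a 'Blue' row, the OR version would return two rows where the split version returns one, so the two are not interchangeable in general.

```python
import sqlite3

# Hypothetical data: id 1 matches via id1/'Red', id 2 via id2/'Blue', id 3 not at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY);
    CREATE TABLE tableB (id1 INTEGER, id2 INTEGER, col2 TEXT);
    CREATE INDEX idx_b_id1 ON tableB (id1, col2);
    CREATE INDEX idx_b_id2 ON tableB (id2, col2);
    INSERT INTO tableA VALUES (1), (2), (3);
    INSERT INTO tableB VALUES (1, 99, 'Red'), (98, 2, 'Blue'), (3, 97, 'Blue');
""")

# Original form: a single LEFT JOIN with an OR in the join condition.
or_join_rows = conn.execute("""
    SELECT a.id, b.col2
    FROM tableA AS a
    LEFT JOIN tableB AS b
        ON (a.id = b.id1 AND b.col2 = 'Red')
        OR (a.id = b.id2 AND b.col2 = 'Blue')
    ORDER BY a.id
""").fetchall()

# Split form: one LEFT JOIN per branch, merged with CASE WHEN in the SELECT.
split_join_rows = conn.execute("""
    SELECT a.id,
           CASE WHEN b1.id1 IS NOT NULL THEN b1.col2 ELSE b2.col2 END AS col2
    FROM tableA AS a
    LEFT JOIN tableB AS b1 ON a.id = b1.id1 AND b1.col2 = 'Red'
    LEFT JOIN tableB AS b2 ON a.id = b2.id2 AND b2.col2 = 'Blue'
    ORDER BY a.id
""").fetchall()

print(or_join_rows)
print(split_join_rows)
```

Each of the two simple join conditions can use its own index, which is the reason the split form is worth measuring against the OR form on real data.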
Oh, I didn't notice that. How about this? I don't have a way to execute it, so you'll be able to judge better. Note that this is an inner join, so tableA rows without a match are dropped.
select *
from tableB, tableA
where (tableA.id=tableB.id1 and tableB.col2='Red') OR
(tableA.id=tableB.id2 and tableB.col2='Blue')